Scaling Amazon Kinesis and AWS Lambda

Introduction

Amazon Kinesis Data Streams (provisioned mode) and AWS Lambda are commonly paired to build event-driven pipelines. Getting them to scale predictably requires understanding how capacity is allocated (shards), how Lambda consumes shards, and which operational signals reveal problems. This note corrects common terminology and gives concrete actions you can take when streams backlog or show increased latency.


1. Kinesis fundamentals

Shards are the unit of scale.

  • Per-shard limits (provisioned mode):
    • Write: 1 MB/sec or 1,000 records/sec
    • Read: 2 MB/sec (shared across consumers)
  • If the input rate approaches shard limits, increase the shard count; doubling the shard count roughly doubles both throughput and shard-hour cost (see the sizing sketch below).
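
As a back-of-envelope check, you can translate measured peak throughput into a minimum shard count. Here is a minimal sketch in Python; the traffic numbers and headroom factor are hypothetical, not from any specific workload:

```python
# Rough shard sizing for provisioned mode; the traffic numbers are hypothetical.
import math

peak_mb_per_sec = 12.5        # measured peak ingest in MB/s
peak_records_per_sec = 9000   # measured peak ingest in records/s
headroom = 1.25               # keep ~25% spare capacity for spikes

shards_for_bytes = math.ceil(peak_mb_per_sec * headroom / 1.0)          # 1 MB/s write per shard
shards_for_records = math.ceil(peak_records_per_sec * headroom / 1000)  # 1,000 records/s per shard

required_shards = max(shards_for_bytes, shards_for_records)
print(f"Provision at least {required_shards} shards")  # -> 16 for these numbers
```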

Partition-key distribution matters.

  • Low-cardinality keys → hot shards → throttling regardless of total shard count.
  • Best practice: use high-cardinality or composite/hash keys to spread load evenly (see the sketch below).
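
A minimal producer sketch showing one way to keep partition keys high-cardinality; the stream name and record shape are assumptions for illustration:

```python
# Minimal sketch: spread writes across shards with a composite partition key.
# "orders-stream" and the record shape are assumptions.
import json
import uuid
import boto3

kinesis = boto3.client("kinesis")

def put_event(tenant_id: str, payload: dict) -> None:
    # A random suffix keeps keys high-cardinality, so one busy tenant
    # cannot pin all of its traffic to a single (hot) shard.
    partition_key = f"{tenant_id}#{uuid.uuid4()}"
    kinesis.put_record(
        StreamName="orders-stream",
        Data=json.dumps(payload).encode("utf-8"),
        PartitionKey=partition_key,
    )
```

Note that randomising the key trades away per-tenant ordering; if you need in-order processing per tenant, keep the tenant ID as the key and rely on having enough distinct tenants to spread the load.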

Read more about Kinesis in my previous article - link


2. Lambda consumption model

Shard-driven parallelism

  • Each shard is the unit Lambda polls; parallelism is driven by the shard count and the “Concurrent batches per shard” setting. This is analogous to adding consumers for other message brokers such as Kafka or RabbitMQ.
  • Rough mapping:
    • Max concurrent Lambdas = shard_count × concurrent_batches_per_shard (subject to Lambda account-level concurrency limits); see the sketch after this list.
  • Ordering: with one batch at a time, records are processed in shard order; with multiple concurrent batches per shard, ordering is preserved per partition key, not across the whole shard.
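
To see where a deployment sits against that mapping, you can read the event source mapping and the open shard count. A sketch with boto3; the mapping UUID and stream name are placeholders:

```python
# Rough upper bound on concurrent invocations for a Kinesis-triggered function.
# The event source mapping UUID and stream name are placeholders.
import boto3

lambda_client = boto3.client("lambda")
kinesis = boto3.client("kinesis")

mapping = lambda_client.get_event_source_mapping(
    UUID="11111111-2222-3333-4444-555555555555"
)
batches_per_shard = mapping["ParallelizationFactor"]

summary = kinesis.describe_stream_summary(StreamName="orders-stream")
open_shards = summary["StreamDescriptionSummary"]["OpenShardCount"]

print(f"{open_shards} shards x {batches_per_shard} concurrent batches/shard "
      f"= up to {open_shards * batches_per_shard} concurrent invocations")
```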

Batch size and windowing

  • BatchSize controls how many records Lambda receives per invocation.
  • MaximumBatchingWindowInSeconds controls how long the event source mapping waits to accumulate records before invoking the function.
  • Trade-offs:
    • Larger batch → fewer invocations, higher per-invocation work, higher end-to-end latency.
    • Smaller batch → more invocations, lower latency, higher overhead/cost (see the configuration sketch below).
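
A minimal sketch of setting both knobs when creating the trigger; the function name, stream ARN, and chosen values are assumptions:

```python
# Minimal sketch: configure batching for a Kinesis -> Lambda trigger.
# The function name, stream ARN, and values are assumptions.
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/orders-stream",
    FunctionName="orders-processor",
    StartingPosition="LATEST",
    BatchSize=500,                      # invoke with up to 500 records...
    MaximumBatchingWindowInSeconds=5,   # ...or after waiting at most 5 seconds
)
```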

3. The parameter: “Concurrent batches per shard”

  • The AWS console displays this setting on the Kinesis trigger as “Concurrent batches per shard”; in the Lambda event source mapping API it is the ParallelizationFactor parameter. The two names refer to the same setting.
  • Behavior:
    • Default is 1 (one batch processed at a time per shard).
    • Increasing this lets Lambda process multiple batches concurrently from the same shard.
  • Use-case:
    • When a single batch takes long (CPU-heavy work, slow downstream calls), increasing concurrent batches lets you drain the backlog faster without immediately adding shards (a configuration sketch follows this list).
  • Caution:
    • Ordering guarantees weaken: records that share a partition key are still processed in order, but ordering across the whole shard is no longer guaranteed. If you require strict shard-wide ordering, increasing concurrent batches may violate that assumption.
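
Raising the value is a one-call change on the existing event source mapping. A sketch, where the UUID is a placeholder:

```python
# Minimal sketch: raise "Concurrent batches per shard" (ParallelizationFactor,
# valid range 1-10) on an existing event source mapping. The UUID is a placeholder.
import boto3

lambda_client = boto3.client("lambda")

lambda_client.update_event_source_mapping(
    UUID="11111111-2222-3333-4444-555555555555",
    ParallelizationFactor=4,   # process up to 4 batches per shard concurrently
)
```

The same change can be made in the console by editing the Kinesis trigger and adjusting the “Concurrent batches per shard” field.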

(Screenshot: Lambda Kinesis trigger settings in the console)


4. How to recover a stuck/lagging stream

If you see many messages backlogged and GetRecords.IteratorAgeMilliseconds rising, you can:

  1. Confirm whether the consumer is the bottleneck (high Lambda duration, throttles, or low concurrency).
  2. If consumer CPU or downstream latency is the cause, increase Concurrent batches per shard to add concurrency per shard and clear backlog faster.
    • This increases concurrent Lambda invocations and potentially cost — check account concurrency limits and downstream capacity.
  3. If shard-level throughput is saturated (Write/Read throttles, high WriteProvisionedThroughputExceeded), add shards — do not rely on increasing concurrent batches alone.
  4. If multiple consumers exist and read contention is present, evaluate Enhanced Fan-Out (EFO) to give each consumer dedicated read throughput (see the sketch after the note below).

Note: Increasing concurrent batches is a fast operational lever to reduce iterator age when consumer-side parallelism is the limiting factor. If Kinesis itself is slow to deliver records, increasing concurrency will not fix it (see Section 6).
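
For steps 3 and 4, the Kinesis-side levers look roughly like this; the stream name, target shard count, and consumer name are assumptions:

```python
# Minimal sketch of the Kinesis-side levers from steps 3 and 4.
# Stream name, target shard count, and consumer name are assumptions.
import boto3

kinesis = boto3.client("kinesis")

# Step 3: add shards when shard-level throughput is saturated.
kinesis.update_shard_count(
    StreamName="orders-stream",
    TargetShardCount=16,             # cannot exceed double the current count per call
    ScalingType="UNIFORM_SCALING",
)

# Step 4: register an Enhanced Fan-Out consumer for dedicated 2 MB/s per shard.
kinesis.register_stream_consumer(
    StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/orders-stream",
    ConsumerName="orders-processor-efo",
)
```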


5. Provisioned streams — right-size shards for input/output

  • For provisioned mode, ensure shard count can handle both incoming writes and outgoing reads:
    • If incoming rate > shard write capacity → producer-side throttles.
    • If consumers require more read throughput than shard-level read capacity → consider more shards or EFO.
  • Strategy:
    • Measure the incoming rate (IncomingRecords / IncomingBytes); a utilization sketch follows this list.
    • Map required consumer parallelism to shards × concurrent batches per shard, while observing downstream limits.
    • Resize shards only when shard-level saturation is the root cause.
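
A sketch of that measurement with CloudWatch and boto3; the stream name is an assumption, and the limits are the per-shard figures from Section 1:

```python
# Minimal sketch: estimate per-shard write utilization over the last hour.
# "orders-stream" is an assumption; limits are 1 MB/s and 1,000 records/s per shard.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")
kinesis = boto3.client("kinesis")

stream = "orders-stream"
open_shards = kinesis.describe_stream_summary(StreamName=stream)[
    "StreamDescriptionSummary"]["OpenShardCount"]

def hourly_sum(metric: str) -> float:
    datapoints = cloudwatch.get_metric_statistics(
        Namespace="AWS/Kinesis",
        MetricName=metric,
        Dimensions=[{"Name": "StreamName", "Value": stream}],
        StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
        EndTime=datetime.now(timezone.utc),
        Period=3600,
        Statistics=["Sum"],
    )["Datapoints"]
    return datapoints[0]["Sum"] if datapoints else 0.0

bytes_per_sec = hourly_sum("IncomingBytes") / 3600
records_per_sec = hourly_sum("IncomingRecords") / 3600

print(f"Bytes:   {bytes_per_sec / (open_shards * 1_000_000):.0%} of write limit")
print(f"Records: {records_per_sec / (open_shards * 1_000):.0%} of write limit")
```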

6. Key metrics to monitor (CloudWatch), and what they mean

Create dashboards and alerts for these metrics. The ones below are essential:

Kinesis-level

  • IncomingRecords / IncomingBytes — producer write rate into the stream.
  • GetRecords.Records — records returned per GetRecords.
  • GetRecords.IteratorAgeMilliseconds — consumer lag (older values = backlog).
  • GetRecords.Latency (average, milliseconds; shown as “GetRecords latency - average” in the console) — how long Kinesis takes to respond to GetRecords calls.
    • If this metric rises significantly, it indicates Kinesis is slow to deliver records — this can be an AWS-side or stream-level issue. Increasing consumer concurrency will not help if Kinesis itself is the bottleneck.
  • WriteProvisionedThroughputExceeded / ReadProvisionedThroughputExceeded — throttling on shard capacity.
  • You can build a CloudWatch dashboard like the one in the screenshot below to spot issues such as the increased read latency that caused fewer reads; an example alarm on iterator age is sketched after it.

(Screenshot: Kinesis CloudWatch metrics dashboard)
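
As a starting point for the alerts, here is a sketch of an alarm on iterator age; the stream name, alarm name, threshold, and SNS topic are assumptions:

```python
# Minimal sketch: alarm when consumer lag (iterator age) exceeds ~5 minutes.
# Stream name, alarm name, threshold, and SNS topic ARN are assumptions.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="orders-stream-iterator-age",
    Namespace="AWS/Kinesis",
    MetricName="GetRecords.IteratorAgeMilliseconds",
    Dimensions=[{"Name": "StreamName", "Value": "orders-stream"}],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=5,
    Threshold=300_000,                    # 5 minutes of lag, in milliseconds
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:oncall-alerts"],
)
```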

Lambda-level

  • ConcurrentExecutions — how many Lambdas are active; should correlate with shards × concurrent_batches_per_shard.
  • IteratorAge (Lambda) — similar to Kinesis iterator age; useful cross-check.
  • Duration and Throttles — consumer-side slowness and throttling.
  • Invocations, Errors, and DLQ / on-failure destination metrics (see the sketch below).
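
A short sketch pulling the Lambda-side counterparts for one consumer function; the function name is an assumption:

```python
# Minimal sketch: pull consumer-side health metrics for one function.
# "orders-processor" is an assumption.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")

def lambda_stat(metric: str, stat: str):
    return cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName=metric,
        Dimensions=[{"Name": "FunctionName", "Value": "orders-processor"}],
        StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
        EndTime=datetime.now(timezone.utc),
        Period=300,
        Statistics=[stat],
    )["Datapoints"]

print("IteratorAge (max):", lambda_stat("IteratorAge", "Maximum"))
print("Duration (avg):   ", lambda_stat("Duration", "Average"))
print("Throttles (sum):  ", lambda_stat("Throttles", "Sum"))
```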

7. GetRecords latency — special attention

  • GetRecords.Latency (average, milliseconds) measures how long Kinesis takes to return records for a GetRecords call.
  • Operational interpretation:
    • A rising trend can mean Kinesis is under pressure, experiencing internal delays, or there are transient AWS-side issues. When this increases:
      • Check AWS service health & region notifications.
      • Check whether many consumers are contending for read throughput.
      • Investigate shard-level throttling.
    • If GetRecords latency is high and IteratorAgeMilliseconds continues to increase despite raising consumer concurrency, this points to Kinesis/service-side delivery problems rather than your consumer.
    • If, instead, GetRecords latency is normal and the consumer is the slow side, increasing “Concurrent batches per shard” is the quick lever to drain the backlog (see Section 4). The sketch below pulls both metrics together to help you decide.
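
To make that call quickly during an incident, you can pull both metrics side by side; the stream name in this sketch is a placeholder:

```python
# Minimal sketch: compare GetRecords latency with iterator age to decide whether
# the stream or the consumer is the bottleneck. "orders-stream" is a placeholder.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")
dimensions = [{"Name": "StreamName", "Value": "orders-stream"}]

response = cloudwatch.get_metric_data(
    StartTime=datetime.now(timezone.utc) - timedelta(hours=3),
    EndTime=datetime.now(timezone.utc),
    MetricDataQueries=[
        {
            "Id": "get_records_latency",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Kinesis",
                    "MetricName": "GetRecords.Latency",
                    "Dimensions": dimensions,
                },
                "Period": 300,
                "Stat": "Average",
            },
        },
        {
            "Id": "iterator_age",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Kinesis",
                    "MetricName": "GetRecords.IteratorAgeMilliseconds",
                    "Dimensions": dimensions,
                },
                "Period": 300,
                "Stat": "Maximum",
            },
        },
    ],
)

# If iterator age climbs while GetRecords latency stays flat, suspect the consumer;
# if both climb together, suspect stream-side delivery delays.
for result in response["MetricDataResults"]:
    print(result["Id"], result["Values"][:5])
```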

8. Cost considerations

  • Increasing shards increases Kinesis shard-hour costs linearly.
  • Increasing concurrent batches per shard increases Lambda concurrency and invocation count, which raises Lambda costs (more GB-seconds plus per-request charges).
  • EFO brings additional Kinesis costs per consumer.

Conclusion

  • Use Concurrent batches per shard (ParallelizationFactor in the event source mapping API) to increase parallelism per shard when consumer-side slowness is the cause of backlog.
  • Right-size shards for your input and output workloads — don’t rely exclusively on increasing Lambda concurrency.
  • Monitor GetRecords.IteratorAgeMilliseconds and GetRecords.Latency closely: iterator age shows backlog; GetRecords latency indicates whether Kinesis itself is slow to deliver records.
  • Build a focused CloudWatch dashboard with the panels and alerts above so you can confidently distinguish consumer-side vs Kinesis-side issues and act accordingly.
