When a host needs to send many records per second (RPS) to Amazon Kinesis, calling the PutRecord API action once per record in a loop is inefficient. To reduce per-request overhead and increase throughput, the application should batch records (for example with the PutRecords action) and send requests in parallel. This improves overall efficiency and makes better use of the capacity of the available shards.
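As a rough illustration of the batching idea, the sketch below groups records into chunks that could each be sent in one PutRecords request. The helper name `batch_records` is hypothetical, and the default limits (500 records and 5 MB per request) are assumptions based on commonly documented quotas, so check the current limits for your account before relying on them.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def batch_records(
    records: Iterable[Tuple[str, bytes]],
    max_count: int = 500,               # assumed per-request record limit
    max_bytes: int = 5 * 1024 * 1024,   # assumed per-request size limit
) -> Iterator[List[Dict[str, object]]]:
    """Group (partition_key, data) pairs into PutRecords-sized batches."""
    batch: List[Dict[str, object]] = []
    batch_bytes = 0
    for key, data in records:
        # The partition key counts toward the size of a record.
        size = len(key.encode("utf-8")) + len(data)
        # Start a new batch if adding this record would exceed a limit.
        if batch and (len(batch) >= max_count or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append({"PartitionKey": key, "Data": data})
        batch_bytes += size
    if batch:
        yield batch
```

Each batch has the shape of the `Records` argument that boto3's `put_records` expects, so separate batches could be submitted from a thread pool to get parallel HTTP requests. PutRecords can partially fail, so a real sender would also inspect `FailedRecordCount` in the response and retry the failed entries.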
1. What is Amazon Kinesis Data Streams?
Amazon Kinesis Data Streams is a service for collecting, processing, and analyzing real-time streaming data.
A data stream consists of shards, which are like parallel lanes of traffic: each shard processes a portion of the total data.
Each shard has a fixed capacity:
- 1 MB/sec write throughput (data input)
- 2 MB/sec read throughput (data output)
If you have more data flowing than one shard can handle, you need more shards.
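The capacity figures above translate into simple arithmetic. The following is a minimal sketch, assuming throughput is the only constraint and using a hypothetical helper named `shards_needed`; real sizing would also consider per-shard record-count limits and uneven partition key distribution.

```python
import math


def shards_needed(
    write_mb_per_sec: float,
    read_mb_per_sec: float = 0.0,
    write_per_shard: float = 1.0,  # MB/sec per shard, from the text above
    read_per_shard: float = 2.0,   # MB/sec per shard, from the text above
) -> int:
    """Return the smallest shard count covering both write and read rates."""
    by_write = math.ceil(write_mb_per_sec / write_per_shard)
    by_read = math.ceil(read_mb_per_sec / read_per_shard)
    # A stream always has at least one shard.
    return max(1, by_write, by_read)
```

For instance, `shards_needed(4.0)` evaluates to 4, and a stream that ingests 1 MB/sec but is read at 6 MB/sec would need 3 shards because the read side is the bottleneck.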
2. What is Resharding?
Resharding = adjusting the number of shards in a stream to match changing data rates.
AWS considers resharding an advanced operation because it changes the structure of the stream and can affect how applications read and write data.
Resharding lets you:
- Increase shards → handle more data (more capacity)
- Decrease shards → reduce costs if less capacity is needed
3. Two Types of Resharding Operations
a) Shard Split
- Purpose: Increase capacity
- How it works: Take one shard (parent shard) and split it into two shards (child shards).
- Effect: More shards → more parallelism → higher data throughput.
- Example: If your stream has a single shard and the data rate doubles → split it so each child shard handles roughly half of the data (assuming partition keys are evenly distributed).
b) Shard Merge
- Purpose: Reduce capacity
- How it works: Take two shards (parent shards) and merge them into one shard (child shard).
- Effect: Fewer shards → less parallelism → reduced capacity but lower cost.
- Example: If data rate drops → merge shards to save costs.
4. Important Details
Resharding is always pairwise:
- Split: one shard → exactly two child shards
- Merge: exactly two shards → one child shard
The parent shard(s) are the ones being split/merged.
The child shard(s) are the result of the resharding operation.
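To make the parent/child relationship concrete: each shard owns a contiguous range of 128-bit hash keys (the MD5 hash of the partition key), a split cuts one range into two, and a merge joins two adjacent ranges. The sketch below only models that arithmetic locally. The `Shard` type and the `split` and `merge` helpers are hypothetical names for illustration and do not call AWS.

```python
from typing import NamedTuple, Optional, Tuple

MAX_HASH_KEY = 2**128 - 1  # partition keys hash to 128-bit integers


class Shard(NamedTuple):
    start: int  # first hash key owned by the shard
    end: int    # last hash key owned by the shard (inclusive)


def split(parent: Shard, new_start: Optional[int] = None) -> Tuple[Shard, Shard]:
    """Split one parent into exactly two children at new_start (default: midpoint)."""
    if new_start is None:
        new_start = parent.start + (parent.end - parent.start + 1) // 2
    if not parent.start < new_start <= parent.end:
        raise ValueError("split point must fall inside the parent range")
    return Shard(parent.start, new_start - 1), Shard(new_start, parent.end)


def merge(a: Shard, b: Shard) -> Shard:
    """Merge exactly two adjacent parents into one child."""
    low, high = sorted((a, b))
    if low.end + 1 != high.start:
        raise ValueError("only adjacent shards can be merged")
    return Shard(low.start, high.end)
```

Splitting `Shard(0, MAX_HASH_KEY)` at the default midpoint yields two children that each cover half of the key space, and merging those two children restores the original range. The adjacency check mirrors the requirement that only shards with neighbouring hash key ranges can be merged.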
5. Cost Implications
- Splitting increases shards → increases cost (because in provisioned mode AWS charges per shard-hour).
- Merging decreases shards → decreases cost.
6. Throughput Scaling
If your incoming data rate increases:
- You need to increase shards to maintain performance.
- Resharding allows this by adding more shards.
Throughput for Kinesis Data Streams is designed to scale with the number of shards, so capacity keeps growing as long as you add enough shards (subject to the shard quotas on your account).
7. How Resharding Happens
- AWS provides the UpdateShardCount API to adjust the number of shards to a target count.
- This API handles the resharding logic without requiring you to manually split or merge shards with the lower-level SplitShard and MergeShards APIs.
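A minimal sketch of calling UpdateShardCount from Python follows. It assumes a boto3 Kinesis client is passed in; `scale_stream` is a hypothetical wrapper name. The guard that rejects more than doubling or less than halving in one call reflects a limit described in the API documentation at the time of writing, so treat it as an assumption and verify it against the current reference.

```python
from typing import Any, Dict


def scale_stream(client: Any, stream_name: str,
                 current_shards: int, target_shards: int) -> Dict[str, Any]:
    """Ask Kinesis to reshard a stream to target_shards via UpdateShardCount."""
    if target_shards < 1:
        raise ValueError("a stream needs at least one shard")
    # Assumed limit: one call can at most double or halve the shard count.
    if target_shards > 2 * current_shards or 2 * target_shards < current_shards:
        raise ValueError("target must be between half and double the current count")
    return client.update_shard_count(
        StreamName=stream_name,
        TargetShardCount=target_shards,
        ScalingType="UNIFORM_SCALING",
    )


# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("kinesis")
#   scale_stream(client, "my-stream", current_shards=2, target_shards=4)
```

Passing the client in as a parameter keeps the function easy to exercise with a stub. The stream name `my-stream` in the usage comment is a placeholder.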
Example Scenario
Imagine a stream with 2 shards, each accepting up to 1 MB/sec of writes (2 MB/sec in total).
If your data rate jumps from 2 MB/sec to 4 MB/sec:
- Solution: Split both shards → increase shard count to 4.
- Now each of the 4 shards again handles about 1 MB/sec → total write capacity doubled to 4 MB/sec.
If the data rate later drops to 1 MB/sec:
- Solution: Merge adjacent shards pairwise → reduce from 4 to 2 shards → save cost.
Resharding is the way Kinesis Data Streams dynamically adapts capacity to changes in data flow, balancing performance and cost.