Your team needs to build a data‑processing pipeline that reads records from a DynamoDB stream and writes transformed items to another DynamoDB table.
The Lambda function must guarantee exactly‑once processing even if the function is retried due to failures or throttling.
Explain how you would design the Lambda function and its interaction with DynamoDB to achieve exactly‑once semantics, and list any limitations or edge cases you need to handle.
- **Use DynamoDB Streams with `TRIM_HORIZON` and `NEW_IMAGE`**
  - Enable a stream on the source table with the `NEW_IMAGE` view type so each record contains the full item after the write; start the event source mapping at `TRIM_HORIZON` to pick up existing records.
  - The Lambda event source mapping reads the stream in order within each shard (records for the same partition key land in the same shard), preserving the per-key sequence.
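As a configuration sketch, the stream and event source mapping above might be wired up with the AWS CLI as follows (the table name `source-table`, function name `pipeline-transform`, and `<stream-arn>` are placeholders):

```shell
# Enable a NEW_IMAGE stream on the (hypothetical) source table
aws dynamodb update-table \
  --table-name source-table \
  --stream-specification StreamEnabled=true,StreamViewType=NEW_IMAGE

# Attach the Lambda function to the stream, starting at TRIM_HORIZON
aws lambda create-event-source-mapping \
  --function-name pipeline-transform \
  --event-source-arn <stream-arn> \
  --starting-position TRIM_HORIZON \
  --batch-size 100
```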
- **Idempotent Write Logic**
  - In the target table, include a deduplication attribute (e.g., `sourceVersion`, or a hash of the source item's `PK` + `SK` + version).
  - When writing, use a conditional put (`ConditionExpression: attribute_not_exists(sourceVersion)`) so the write succeeds only if that version hasn't been processed before.
  - If the condition fails, treat it as a duplicate and simply acknowledge the record.
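The conditional-put guard can be illustrated with a small in-memory stand-in for the target table; the real function would issue a `PutItem` through the AWS SDK with the same `ConditionExpression`, and the attribute name `sourceVersion` is the placeholder from above:

```go
package main

import (
	"errors"
	"fmt"
)

// errConditionFailed mirrors DynamoDB's ConditionalCheckFailedException.
var errConditionFailed = errors.New("conditional check failed")

// targetTable is an in-memory stand-in for the destination table,
// keyed by the deduplication attribute (sourceVersion).
type targetTable map[string]string

// conditionalPut succeeds only if sourceVersion has not been seen before,
// mimicking ConditionExpression: attribute_not_exists(sourceVersion).
func (t targetTable) conditionalPut(sourceVersion, item string) error {
	if _, exists := t[sourceVersion]; exists {
		return errConditionFailed
	}
	t[sourceVersion] = item
	return nil
}

func main() {
	t := targetTable{}
	fmt.Println(t.conditionalPut("v1", "transformed-item")) // <nil>
	// A retry of the same stream record is rejected as a duplicate.
	fmt.Println(t.conditionalPut("v1", "transformed-item")) // conditional check failed
	fmt.Println(len(t))                                     // 1
}
```

Because the duplicate attempt fails the condition rather than overwriting, replaying the same stream record any number of times leaves exactly one target item.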
- **Leverage the DynamoDB Transaction API**
  - Wrap the transform-and-write in a single `TransactWriteItems` call that includes: a) a condition check on the source item's version attribute (e.g., `sourceVersion`) to ensure the source hasn't changed since the stream event was generated, and b) the put into the destination table, guarded by the deduplication attribute.
  - Transactions are atomic: either every action succeeds or none does, preventing partial updates.
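A sketch of that transaction payload, using plain structs that mirror the shape of the AWS SDK's `TransactWriteItem` (the table names `src-table` and `dst-table` and the attribute names are placeholders, not a real SDK call):

```go
package main

import "fmt"

// Minimal stand-ins for the AWS SDK's ConditionCheck and Put actions.
type conditionCheck struct {
	TableName, ConditionExpression string
	Key, ExpressionAttributeValues map[string]string
}
type put struct {
	TableName, ConditionExpression string
	Item                           map[string]string
}
type transactWriteItem struct {
	ConditionCheck *conditionCheck
	Put            *put
}

// buildTransaction pairs a check on the source item's version with a
// dedup-guarded put into the destination, as one atomic unit.
func buildTransaction(pk, version string, transformed map[string]string) []transactWriteItem {
	return []transactWriteItem{
		{ConditionCheck: &conditionCheck{
			TableName:                 "src-table",
			Key:                       map[string]string{"PK": pk},
			ConditionExpression:       "sourceVersion = :v", // source unchanged since the stream event
			ExpressionAttributeValues: map[string]string{":v": version},
		}},
		{Put: &put{
			TableName:           "dst-table",
			Item:                transformed,
			ConditionExpression: "attribute_not_exists(sourceVersion)", // dedup guard
		}},
	}
}

func main() {
	tx := buildTransaction("user#1", "v7", map[string]string{"PK": "user#1", "sourceVersion": "v7"})
	fmt.Println(len(tx)) // 2
	fmt.Println(tx[1].Put.ConditionExpression)
}
```

If either the version check or the dedup condition fails, DynamoDB cancels the whole transaction, so the destination never receives a write for a stale or already-processed record.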
- **Handle Retries & Throttling**
  - Configure the Lambda event source mapping with a bounded `MaximumRetryAttempts` (by default, failed records are retried until they expire from the stream) and an on-failure destination (dead-letter queue, DLQ) for records that still fail.
  - Because the function is idempotent (step 2), retries cannot create duplicate target items.
- **Optional: use DynamoDB `UpdateItem` with `ADD` on a processed-set attribute**
  - For low-volume streams, maintain a small set of processed stream `SequenceNumber`s in a separate "tracker" item.
  - Use `UpdateItem` with `ADD` and a condition that the sequence number isn't already present. This provides an explicit exactly-once guard without relying on target-table attributes.
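The tracker-item guard reduces to set membership with a conditional add. A minimal in-memory sketch of that semantics (the real implementation would be a single `UpdateItem` with `ADD` on the tracker item's string-set attribute):

```go
package main

import (
	"errors"
	"fmt"
)

var errAlreadyProcessed = errors.New("sequence number already processed")

// tracker stands in for the single "tracker" item whose set attribute is
// grown with UpdateItem + ADD under a not-already-present condition.
type tracker map[string]struct{}

// markProcessed adds seq to the set only if it is not already present,
// mimicking the conditional ADD; a failure means the record is a duplicate.
func (t tracker) markProcessed(seq string) error {
	if _, seen := t[seq]; seen {
		return errAlreadyProcessed
	}
	t[seq] = struct{}{}
	return nil
}

func main() {
	t := tracker{}
	fmt.Println(t.markProcessed("seq-100")) // <nil>
	fmt.Println(t.markProcessed("seq-100")) // sequence number already processed
}
```

Note the trade-off: every record now touches one hot tracker item, which is why this variant only suits low-volume streams.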
**Limitations & Edge Cases**
| Issue | Why it matters | Mitigation |
|---|---|---|
| Out‑of‑order processing across shards | Streams guarantee order only within a partition key. If your logic depends on global ordering, you must add a sequencing layer (e.g., a Kinesis stream). | Design the pipeline to be order‑agnostic per item, or introduce a downstream ordering service. |
| Stream record expiration | DynamoDB Streams records expire after 24 h, and this retention is not configurable – if the Lambda falls behind, records are lost. | Monitor the IteratorAge metric and alarm well before 24 h; scale Lambda concurrency accordingly, or switch to Kinesis Data Streams if you need longer retention. |
| Transaction size limits | A TransactWriteItems call can include up to 100 actions and 4 MB of aggregate data. | Keep each transaction to a single source–target pair; larger batches must be split across transactions. |
| Conditional write race | A separate read-then-write could let two concurrent invocations for the same source key both observe "not yet processed" before either writes. | Rely on conditional puts or the transaction approach, where the condition is evaluated atomically with the write, so only one of the competing writes can succeed. |
| DLQ handling | Records that end up in the DLQ have not been processed at all, let alone exactly once. | Implement a re-processing job that drains the DLQ and applies the same idempotent write logic, so late-arriving records still land exactly once in the target table. |
**Summary Design Steps**

1. Enable a DynamoDB stream (`NEW_IMAGE`) on the source table.
2. Create a Lambda function with a Go handler that:
   - parses the stream record,
   - computes a deterministic deduplication key, and
   - calls `TransactWriteItems` with a condition check on the source version and a conditional put into the destination table.
3. Configure the event source mapping: batch size ~100, bounded retries, and a DLQ (e.g., SQS).
4. Monitor `IteratorAge`, `ThrottledRequests`, and DLQ depth; adjust concurrency or provisioned throughput as needed.
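The deterministic deduplication key mentioned above might be computed as a hash over the source item's identifying attributes, so every retry of the same stream record derives the same key (the `"|"` separator assumes those values never contain `|`):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// dedupKey derives a stable deduplication key from the source item's
// PK, SK, and version; identical inputs always yield the same key.
func dedupKey(pk, sk, version string) string {
	sum := sha256.Sum256([]byte(pk + "|" + sk + "|" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := dedupKey("user#1", "profile", "v7")
	b := dedupKey("user#1", "profile", "v7")
	fmt.Println(a == b) // true: a retry maps to the same key
	fmt.Println(len(a)) // 64 hex characters
}
```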
With this design the pipeline achieves exactly‑once semantics while handling retries, throttling, and potential race conditions.