Duplicate processing is one of those problems that looks small in a diagram and very expensive in production.
I have seen teams build clean event-driven and Lambda-based systems, only to run into duplicate charges, duplicated emails, repeated downstream writes, or inconsistent state once retries and redrives start happening. The tricky part is that the system is often behaving as designed. AWS services are doing what they should do: retrying, buffering, redriving, and favoring delivery durability.
This is exactly why I consider idempotency architecture one of the most important and most underexplained topics in serverless engineering.
In this post, I will walk through how I design idempotency for Lambda-driven systems on AWS, including:
- the exactly-once myth vs the at-least-once reality
- how to choose idempotency keys and dedupe windows
- a DynamoDB-based idempotency store design
- using AWS Lambda Powertools idempotency utility
- handling retries from API Gateway, SQS, and EventBridge
- an end-to-end implementation walkthrough with code
I will keep this practical and architecture-heavy, so you can adapt it to real workloads instead of only toy examples.
The core idea in one sentence
Idempotency means I can safely process the same logical request more than once and still end up with the same intended outcome.
That does not mean the system only receives the request once. It means my system is resilient when it receives it multiple times.
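A minimal in-memory sketch makes the difference concrete (the functions and the toy account store here are illustrative, not part of any AWS API):

```python
# Non-idempotent: every retry repeats the side effect.
def charge_naive(account: dict, amount: int) -> dict:
    account["balance"] -= amount  # a retry double-charges
    return account

# Idempotent: retries of the same logical request are safe no-ops.
def charge_idempotent(account: dict, amount: int, request_id: str) -> dict:
    if request_id in account["processed"]:
        return account  # duplicate delivery: same intended outcome
    account["balance"] -= amount
    account["processed"].add(request_id)
    return account

acct = {"balance": 100, "processed": set()}
charge_idempotent(acct, 30, "req-1")
charge_idempotent(acct, 30, "req-1")  # retried delivery of the same request
assert acct["balance"] == 70          # charged exactly once
```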
Exactly-once is usually not the right mental model
A lot of production mistakes start with the assumption that a serverless flow will process each request exactly once.
In practice, in distributed systems, what I usually get is:
- at-least-once delivery (common with queues/events)
- retries at multiple layers (client SDKs, AWS services, Lambda, Step Functions, etc.)
- timeouts and ambiguous outcomes (did the function finish but the caller timed out?)
- redrives / replay (DLQ reprocessing, archive replay, manual reruns)
- duplicate submissions from clients (double-click, refresh, mobile reconnect)
So instead of trying to force an “exactly-once” guarantee everywhere, I design for:
- at-least-once delivery
- idempotent handlers
- safe retries
- observability around duplicates
That mental shift makes the architecture much more robust.
Where duplicates come from in AWS Lambda-driven systems
Before I show the solution, I like to make the duplicate paths explicit.
API Gateway -> Lambda
Duplicates can happen when:
- the client retries after a timeout
- the network drops after the backend succeeded
- the user taps “Submit” multiple times
- an upstream reverse proxy retries
SQS -> Lambda
Duplicates can happen when:
- Lambda fails and the message becomes visible again
- processing exceeds visibility timeout
- partial batch failures cause some records to be retried
- DLQ redrive sends records back later
- standard queues deliver the same message more than once
EventBridge -> Lambda
Duplicates can happen when:
- target invocation is retried
- a publisher emits semantically duplicate events
- archive/replay is used
- consumers reprocess historical events intentionally
That is why I architect idempotency at the business operation level, not just at one transport layer.
What a good idempotency architecture looks like
At a high level, I want:
- a stable idempotency key for each logical operation
- a dedupe window (TTL) appropriate for the business
- a persistence store to track processing status and cached results
- conditional writes to prevent concurrent duplicate execution
- response replay for safe duplicate requests when appropriate
- clear behavior for mismatched payloads using the same key
Architecture at a glance
End-to-end walkthrough (the scenario I will implement)
To make this concrete, I will use a common example:
“Create payment intent / order processing” style operation.
Flow
1. A client sends `POST /payments` with an `Idempotency-Key` header.
2. API Gateway invokes Lambda.
3. Lambda checks the DynamoDB idempotency table.
4. If the key is new, Lambda acquires an `IN_PROGRESS` lock and processes the request.
5. Lambda writes the business result (for example, a payment record) and stores the response in the idempotency table with status `COMPLETED`.
6. If the same request is retried, Lambda returns the cached response instead of processing again.
Then I will extend the same pattern to:
- SQS worker retries
- EventBridge consumer retries/replay
Designing the idempotency key
This is where a lot of teams accidentally introduce bugs.
A good idempotency key should identify the logical operation, not just the transport envelope.
Good key examples
- `payment:{merchant_id}:{client_request_id}`
- `order-create:{tenant_id}:{cart_checkout_id}`
- `invoice-email:{invoice_id}:{template_version}`
- `event:{event_id}` (if the publisher guarantees a stable event ID)
Risky key choices
- raw timestamp
- Lambda `aws_request_id` (changes every invocation)
- SQS `receiptHandle` (changes on delivery)
- entire payload serialized without normalization (field order / formatting issues)
- keys that are too broad (cause false dedupe)
- keys that are too narrow (miss duplicates)
My rule of thumb
I choose a key from business identity + operation intent, and I define it explicitly in the contract.
For APIs, that often means:
- require an `Idempotency-Key` header from the client, and
- validate that it maps to a stable request intent.
For asynchronous consumers, that often means:
- use the publisher's stable `eventId`, or
- derive a deterministic business key (for example `order_id + action`).
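A deterministic key builder makes the rule of thumb concrete (the function name and key format here are illustrative):

```python
def derive_idempotency_key(operation: str, tenant_id: str, business_id: str) -> str:
    # Built only from business identity + operation intent, so it is
    # stable across client retries, SQS redrives, and event replays
    return f"{operation}:{tenant_id}:{business_id}"

# The same logical operation always yields the same key
key = derive_idempotency_key("order-create", "tenant-1", "cart-42")
assert key == "order-create:tenant-1:cart-42"
```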
Dedupe windows (TTL): how long should I remember a key?
There is no universal value. I set the dedupe window based on business and retry patterns.
What affects the dedupe window
- expected client retry duration
- SQS redrive timing and replay operations
- EventBridge replay windows / operational reruns
- whether duplicates after long periods are still harmful
- cost of storing idempotency records
Practical examples
- API payment creation: 24 hours to 7 days
- Webhook ingestion: 1 to 7 days (depends on provider retry policy)
- Batch event processing: hours to days
- High-volume telemetry: maybe minutes to hours (if duplicate impact is low)
Important nuance
TTL deletion in DynamoDB is eventual, not immediate. I design my logic so that:
- `expiry_timestamp` is authoritative in code, and
- TTL is the cleanup mechanism.
In other words, I do not depend on the item disappearing exactly at expiry time.
DynamoDB-based idempotency store design (recommended pattern)
I prefer DynamoDB for idempotency state in Lambda workloads because it gives me:
- low-latency key lookups
- conditional writes
- TTL support
- simple scaling
- good fit for stateless Lambda functions
Table design (single-purpose table)
A dedicated table keeps the pattern easy to reason about.
Partition key
- `id` (string): the idempotency key
Recommended attributes
- `status` (`IN_PROGRESS`, `COMPLETED`, optionally `EXPIRED`)
- `expiryTimestamp` (epoch seconds for dedupe window)
- `inProgressExpiryTimestamp` (shorter lock expiry to recover from crashed executions)
- `payloadHash` (optional but highly recommended)
- `responseData` (optional, cached result or safe response envelope)
- `createdAt`
- `updatedAt`
- `source` (api / sqs / eventbridge)
- `functionName` (optional for ops visibility)
Why payloadHash matters
If a client reuses the same idempotency key with a different payload, I want to detect that and reject it. Otherwise I can accidentally return a cached response for the wrong request.
This is a subtle but critical best practice.
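The hash has to be computed over a normalized representation, otherwise field order or formatting differences break the comparison. A sketch (field subset is an assumption for this payment example):

```python
import hashlib
import json

def payload_hash(body: dict, fields=("customerId", "amount", "currency")) -> str:
    # Hash a fixed field subset with sorted keys and compact separators,
    # so field order and formatting do not change the result
    subset = {k: body[k] for k in fields}
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = payload_hash({"customerId": "c1", "amount": 10, "currency": "USD"})
b = payload_hash({"currency": "USD", "amount": 10, "customerId": "c1"})
assert a == b  # same intent, same hash, regardless of field order
```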
State transitions I use
Here is the lifecycle I generally implement:
1. No record exists
   - Try conditional write -> create `IN_PROGRESS`
2. `IN_PROGRESS` exists
   - Another invocation is already processing (or crashed recently)
   - Return a retryable outcome or fail fast depending on source
3. `COMPLETED` exists and not expired
   - Return cached result (or safe ack)
4. Expired
   - Treat as a new request (depending on business policy)
That gives me concurrency safety and retry safety.
Infrastructure example (SAM / CloudFormation snippets)
Below is a minimal setup for:
- Lambda function
- DynamoDB idempotency table
- IAM permissions
- env vars for configuration
```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Idempotent Lambda API example

Resources:
  PaymentsFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.12
      Handler: app.lambda_handler
      CodeUri: src/
      MemorySize: 512
      Timeout: 29
      Policies:
        - Statement:
            - Effect: Allow
              Action:
                - dynamodb:GetItem
                - dynamodb:PutItem
                - dynamodb:UpdateItem
                - dynamodb:DeleteItem
              Resource: !GetAtt IdempotencyTable.Arn
      Environment:
        Variables:
          IDEMPOTENCY_TABLE_NAME: !Ref IdempotencyTable
          IDEMPOTENCY_EXPIRES_SECONDS: "86400" # 24h
      Events:
        CreatePaymentApi:
          Type: Api
          Properties:
            Path: /payments
            Method: post

  IdempotencyTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: S
      KeySchema:
        - AttributeName: id
          KeyType: HASH
      TimeToLiveSpecification:
        AttributeName: expiration
        Enabled: true
      PointInTimeRecoverySpecification:
        PointInTimeRecoveryEnabled: true
      SSESpecification:
        SSEEnabled: true

Outputs:
  ApiUrl:
    Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/payments"
```
Notes on the table schema vs Powertools
AWS Lambda Powertools idempotency utility has its own default attribute names and record model. If I use the library, I usually let it manage the item shape and only customize when I truly need to.
That said, I still think through the conceptual schema above so the team understands what is being stored and why.
Implementation option 1 (recommended): AWS Lambda Powertools idempotency utility
If I am using Python Lambda functions, AWS Lambda Powertools is my default choice. It saves me from reimplementing concurrency locks, conditional checks, and record state transitions from scratch.
Install
```bash
pip install "aws-lambda-powertools[boto3]"
```
API Gateway example with idempotency (Python)
This example assumes the client sends:
- an `Idempotency-Key` header
- a JSON body containing payment details
I use:
- `event_key_jmespath` to extract the idempotency key from headers
- `payload_validation_jmespath` to detect key reuse with a different payload
- a DynamoDB persistence layer
```python
import json
import os
from typing import Any, Dict

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.idempotency import (
    DynamoDBPersistenceLayer,
    IdempotencyConfig,
    idempotent,
)
from aws_lambda_powertools.utilities.idempotency.exceptions import (
    IdempotencyValidationError,
)

logger = Logger(service="payments-api")

TABLE_NAME = os.environ["IDEMPOTENCY_TABLE_NAME"]
EXPIRES_SECONDS = int(os.getenv("IDEMPOTENCY_EXPIRES_SECONDS", "86400"))

persistence_layer = DynamoDBPersistenceLayer(table_name=TABLE_NAME)

config = IdempotencyConfig(
    # API Gateway/Lambda proxy event header lookup (headers are normalized
    # to lower-case in lambda_handler below); the header name must be quoted
    # in JMESPath because it contains a hyphen
    event_key_jmespath='headers."idempotency-key"',
    # Payload fields that should remain consistent when reusing the same key
    payload_validation_jmespath="powertools_json(body).[customerId, amount, currency]",
    expires_after_seconds=EXPIRES_SECONDS,
    use_local_cache=True,
)


def create_payment_intent(request: Dict[str, Any]) -> Dict[str, Any]:
    # Replace this with your real business logic / external payment call.
    # The critical point is that this function is wrapped with idempotency.
    payment_id = f"pay_{request['customerId']}_{request['amount']}_{request['currency']}"
    return {
        "paymentId": payment_id,
        "status": "AUTHORIZED",
        "amount": request["amount"],
        "currency": request["currency"],
    }


@idempotent(config=config, persistence_store=persistence_layer)
def process_request(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    body = json.loads(event.get("body") or "{}")
    required = ["customerId", "amount", "currency"]
    missing = [k for k in required if k not in body]
    if missing:
        return {
            "statusCode": 400,
            "body": json.dumps({"message": f"Missing required fields: {missing}"}),
        }
    result = create_payment_intent(body)
    # Powertools persists and returns this response for duplicate calls
    return {
        "statusCode": 200,
        "body": json.dumps(result),
        "headers": {"Content-Type": "application/json"},
    }


@logger.inject_lambda_context(log_event=False)
def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # Normalize headers so event_key_jmespath is predictable
    headers = event.get("headers") or {}
    event["headers"] = {str(k).lower(): v for k, v in headers.items()}
    try:
        return process_request(event, context)
    except IdempotencyValidationError:
        return {
            "statusCode": 409,
            "body": json.dumps(
                {"message": "Idempotency key was reused with a different request payload"}
            ),
            "headers": {"Content-Type": "application/json"},
        }
```
Why this pattern works well
- If the same request is retried, Powertools returns the stored response.
- If the same key is reused with a different body, I return `409 Conflict`.
- I avoid duplicate payment authorization for simple retries.
API client contract (important and often skipped)
Idempotency works much better when the contract is explicit.
For API producers/clients, I document:
- `Idempotency-Key` is required for mutating operations (POST, sometimes PATCH)
- same key + same intent/payload -> safe retry
- same key + different payload -> `409 Conflict`
- dedupe window (for example, 24h)
- response replay behavior (cached result may be returned)
That avoids ambiguity across frontend, mobile, and backend teams.
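On the client side, the contract implies generating the key once per logical attempt and resending the exact same key and body on every retry. A minimal sketch (function name hypothetical):

```python
import uuid

def new_payment_attempt(customer_id: str, amount: int, currency: str):
    # Generate the key ONCE per logical attempt; every retry of this
    # attempt must resend the same key and the same body.
    headers = {
        "Idempotency-Key": str(uuid.uuid4()),
        "Content-Type": "application/json",
    }
    body = {"customerId": customer_id, "amount": amount, "currency": currency}
    return headers, body

headers, body = new_payment_attempt("cust-1", 500, "USD")
retry_headers = headers  # reuse on timeout -- do not regenerate the key
assert retry_headers["Idempotency-Key"] == headers["Idempotency-Key"]
```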
Handling retries from SQS (Lambda event source mapping)
SQS is one of the most common places where teams need idempotency but only discover it after duplicates happen.
What changes for SQS?
- Lambda receives a batch of messages.
- Some messages may succeed while others fail.
- I should use partial batch response so only failed records are retried.
- Each message should still be processed idempotently.
Key design for SQS consumers
I avoid using transient delivery metadata. I prefer:
- a business key in the message body (for example `orderId`)
- an upstream event ID included in the message

Example keys:
- `order-paid:{orderId}`
- `inventory-reservation:{reservationId}`
SQS consumer example (Powertools Batch + per-record idempotency)
Below is a simplified Python example using:
- Powertools Batch utility for partial batch handling
- Powertools idempotency on the per-record processing function
```python
import json
import os
from typing import Any, Dict

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.batch import (
    BatchProcessor,
    EventType,
    process_partial_response,
)
from aws_lambda_powertools.utilities.batch.types import PartialItemFailureResponse
from aws_lambda_powertools.utilities.data_classes.sqs_event import SQSRecord
from aws_lambda_powertools.utilities.idempotency import (
    DynamoDBPersistenceLayer,
    IdempotencyConfig,
    idempotent_function,
)

logger = Logger(service="orders-sqs-worker")

TABLE_NAME = os.environ["IDEMPOTENCY_TABLE_NAME"]
persistence_layer = DynamoDBPersistenceLayer(table_name=TABLE_NAME)

# The key is built from the function argument "data"
# (Powertools hashes the configured subset to create an idempotency key)
idempotency_config = IdempotencyConfig(
    event_key_jmespath="orderId",
    payload_validation_jmespath="[orderId, action, version]",
    expires_after_seconds=3 * 24 * 60 * 60,  # 3 days
)

processor = BatchProcessor(event_type=EventType.SQS)


@idempotent_function(
    data_keyword_argument="data",
    config=idempotency_config,
    persistence_store=persistence_layer,
)
def process_order_event(*, data: Dict[str, Any]) -> Dict[str, Any]:
    # Business logic here (must be safe to retry / replay via idempotency)
    # Example: reserve inventory, update status, publish follow-up event, etc.
    logger.info(
        "Processing order event",
        extra={"orderId": data["orderId"], "action": data["action"]},
    )
    return {"ok": True, "orderId": data["orderId"], "action": data["action"]}


def record_handler(record: SQSRecord) -> Dict[str, Any]:
    payload = json.loads(record.body)
    return process_order_event(data=payload)


def lambda_handler(event, context) -> PartialItemFailureResponse:
    return process_partial_response(
        event=event,
        record_handler=record_handler,
        processor=processor,
        context=context,
    )
```
Best practices I apply with SQS + idempotency
- Use partial batch response to avoid retrying already-successful records.
- Set visibility timeout longer than worst-case processing time (or heartbeat/extend strategy).
- Keep the idempotency key in the message payload, not delivery metadata.
- Use a dedupe window long enough to cover retries, DLQ redrive, and operational replay.
Handling retries from EventBridge targets
EventBridge makes event-driven architecture clean, but duplicate-safe consumers are still my responsibility.
EventBridge-specific considerations
- EventBridge may retry target delivery.
- Archive/replay can intentionally re-send events.
- Different publishers may emit semantically duplicate events unless the contract is strict.
Key strategy for EventBridge consumers
If the event has a stable id or domain event ID, I use it. If not, I derive one from:
- `detail-type`
- source/domain identifier
- business entity ID
- action/version

Example: `eventbridge:{source}:{detailType}:{detail.orderId}:{detail.version}`
EventBridge consumer example (Python Lambda + Powertools idempotency)
```python
import os
from typing import Any, Dict

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.idempotency import (
    DynamoDBPersistenceLayer,
    IdempotencyConfig,
    idempotent,
)

logger = Logger(service="eventbridge-consumer")

TABLE_NAME = os.environ["IDEMPOTENCY_TABLE_NAME"]
persistence_layer = DynamoDBPersistenceLayer(table_name=TABLE_NAME)

config = IdempotencyConfig(
    # Prefer a producer-defined unique event ID if available in detail
    event_key_jmespath="detail.eventId || id",
    # "detail-type" must be quoted in JMESPath because of the hyphen
    payload_validation_jmespath='[source, "detail-type", detail.orderId, detail.version]',
    expires_after_seconds=7 * 24 * 60 * 60,
)


@idempotent(config=config, persistence_store=persistence_layer)
def handle_event(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    detail = event["detail"]
    logger.info(
        "Handling EventBridge event",
        extra={
            "eventId": detail.get("eventId", event.get("id")),
            "orderId": detail.get("orderId"),
        },
    )
    # Execute your domain logic here
    # Example: update read model, trigger notification, call downstream API, etc.
    return {"handled": True, "orderId": detail.get("orderId")}


def lambda_handler(event, context):
    return handle_event(event, context)
```
When I implement idempotency manually (without Powertools)
Powertools is my default, but sometimes I implement manually when:
- I need a custom record schema shared across services/languages
- I need custom conflict behavior beyond the library defaults
- I am in a language/runtime where I am standardizing a platform abstraction
- I want explicit control over lock and result update semantics
The key principle stays the same: use DynamoDB conditional writes.
Manual DynamoDB idempotency pattern (Python + boto3)
The pattern below shows the core idea:
- Try to write `IN_PROGRESS` with `attribute_not_exists(id)`
- If successful, execute business logic
- Update the item to `COMPLETED` and store the result
- If the conditional check fails, inspect the existing item and decide
```python
import hashlib
import json
import time

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
TABLE_NAME = "IdempotencyTable"


def sha256_text(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def acquire_lock(idempotency_key: str, payload_hash: str,
                 ttl_seconds: int = 86400, lock_seconds: int = 120):
    now = int(time.time())
    item = {
        "id": {"S": idempotency_key},
        "status": {"S": "IN_PROGRESS"},
        "payloadHash": {"S": payload_hash},
        "expiration": {"N": str(now + ttl_seconds)},
        "inProgressExpiryTimestamp": {"N": str(now + lock_seconds)},
        "createdAt": {"N": str(now)},
        "updatedAt": {"N": str(now)},
    }
    try:
        dynamodb.put_item(
            TableName=TABLE_NAME,
            Item=item,
            ConditionExpression="attribute_not_exists(id)",
        )
        return {"acquired": True}
    except ClientError as e:
        if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
            raise
        return {"acquired": False}


def get_record(idempotency_key: str):
    resp = dynamodb.get_item(
        TableName=TABLE_NAME,
        Key={"id": {"S": idempotency_key}},
        ConsistentRead=True,
    )
    return resp.get("Item")


def mark_completed(idempotency_key: str, response_obj: dict):
    now = int(time.time())
    dynamodb.update_item(
        TableName=TABLE_NAME,
        Key={"id": {"S": idempotency_key}},
        UpdateExpression="SET #s = :completed, responseData = :resp, updatedAt = :now",
        ExpressionAttributeNames={"#s": "status"},
        ExpressionAttributeValues={
            ":completed": {"S": "COMPLETED"},
            ":resp": {"S": json.dumps(response_obj)},
            ":now": {"N": str(now)},
        },
    )


def handler(event, context):
    body = json.loads(event["body"])
    idem_key = event["headers"]["idempotency-key"]
    payload_hash = sha256_text(json.dumps({
        "customerId": body["customerId"],
        "amount": body["amount"],
        "currency": body["currency"],
    }, sort_keys=True))

    lock = acquire_lock(idem_key, payload_hash)
    if not lock["acquired"]:
        existing = get_record(idem_key)
        if not existing:
            # Rare race / eventual state issue; safe to retry
            raise Exception("Retry request")
        if existing.get("payloadHash", {}).get("S") != payload_hash:
            return {"statusCode": 409,
                    "body": json.dumps({"message": "Idempotency key payload mismatch"})}
        status = existing["status"]["S"]
        if status == "COMPLETED":
            return {"statusCode": 200, "body": existing["responseData"]["S"]}
        if status == "IN_PROGRESS":
            # For API workflows you might return 409 or a retry signal (implementation-specific)
            return {"statusCode": 409,
                    "body": json.dumps({"message": "Request is already in progress"})}

    # Execute business logic after the lock is acquired
    result = {"paymentId": "pay_123", "status": "AUTHORIZED"}
    mark_completed(idem_key, result)
    return {"statusCode": 200, "body": json.dumps(result)}
```
Manual pattern caveats
If I go manual, I also need to think about:
- expired `IN_PROGRESS` lock recovery
- exception handling and safe cleanup
- serialization of cached responses
- metrics for duplicate hits vs fresh requests
- consistent behavior across all event sources
That is exactly why Powertools is usually the better default.
End-to-end implementation discussion (how I wire this in production)
This is the part I care about most in architecture reviews: not just the code, but where idempotency sits in the overall system.
1) Idempotency belongs close to the handler boundary
I usually apply idempotency at the Lambda entry point (or record handler for batches), before business side effects occur.
Why:
- it prevents duplicate external calls
- it keeps the protection broad
- it makes retries safe by default
2) I still design downstream writes carefully
Idempotency at the Lambda layer is great, but if the function can partially succeed before crashing, I also check downstream safety:
- unique constraints in relational DBs
- conditional writes in DynamoDB
- provider-side idempotency support for payment APIs or webhooks
Think in layers, not in a single magic switch.
3) I define duplicate behavior per source
The response to a duplicate is not always the same.
- API Gateway: return cached success response (best UX)
- SQS: ack success for already-processed message, avoid poison-loop
- EventBridge: safely no-op or return success after dedupe
4) I separate “idempotency key” and “correlation ID”
They are related but not identical.
- Correlation ID -> tracing/observability
- Idempotency key -> duplicate suppression for a specific operation
Sometimes they can be the same, but I do not assume that.
Handling edge cases and failure modes
Edge case 1: Same key, different payload
This should be treated as a contract violation.
Best practice: return 409 Conflict (or equivalent domain error) and log it loudly.
Why I do this:
- protects clients from accidental misuse
- prevents serving incorrect cached results
- surfaces integration bugs early
Edge case 2: Function times out after making a side effect
This is the classic ambiguous outcome problem.
Idempotency helps, but only if:
- the side effect can be detected or safely repeated, and/or
- the result gets persisted before timeout
Best practices:
- keep timeouts realistic
- use downstream idempotency where available
- break long operations into Step Functions when needed
- persist progress checkpoints for multi-step work
Edge case 3: IN_PROGRESS records stuck after crashes
If a function crashes after acquiring the lock, duplicates may keep seeing IN_PROGRESS.
Best practices:
- use an in-progress lock expiry
- make retries back off
- alert on sustained `IN_PROGRESS` accumulation
- evaluate whether the operation is safe to re-attempt after lock expiry
Edge case 4: Replay and backfill
Operational replay is common and healthy. I design for it intentionally.
Best practices:
- choose a dedupe window that matches replay expectations
- if replay should re-run effects, use a different idempotency namespace/version
- document replay semantics for ops teams
Example:
- normal key: `invoice-email:{invoiceId}`
- forced replay key namespace: `invoice-email:replay:{jobId}:{invoiceId}`
Observability: what I monitor for idempotency
Idempotency is not just a code concern. I want to know how often it is being exercised and whether it is hiding a deeper issue.
Metrics I like to emit
- `IdempotencyFreshRequests`
- `IdempotencyDuplicateHits`
- `IdempotencyInProgressConflicts`
- `IdempotencyValidationConflicts` (same key, different payload)
- `IdempotencyStoreErrors`
- handler latency split by fresh vs duplicate
Logs I always include
- idempotency key (or redacted/hash if sensitive)
- source (`api`, `sqs`, `eventbridge`)
- dedupe outcome (`fresh`, `duplicate_completed`, `duplicate_in_progress`, `validation_mismatch`)
- business identifier (`orderId`, `paymentId`, etc.)
This makes incident triage much faster.
Practical best practices checklist (the part I use in reviews)
Key selection
- [ ] Key represents a business operation, not transport metadata
- [ ] Key is stable across retries/replays
- [ ] Key cardinality is high enough to avoid false collisions
- [ ] Payload mismatch with same key is detected
Dedupe window
- [ ] TTL matches retry + redrive + replay realities
- [ ] Expiry is checked in code (not only by DynamoDB TTL cleanup)
- [ ] Window is documented in API/event contract
Store design
- [ ] DynamoDB conditional write used for first writer wins
- [ ] `IN_PROGRESS` and `COMPLETED` states are handled explicitly
- [ ] Cached response/ack strategy is defined
- [ ] Encryption, backups/PITR, and least privilege are configured
Source-specific behavior
- [ ] API duplicates return deterministic response
- [ ] SQS uses partial batch response
- [ ] EventBridge consumer supports replay safely
- [ ] Redrive/replay semantics are documented for ops
Operations
- [ ] Metrics and alarms exist for duplicate spikes and store failures
- [ ] Logs include dedupe outcomes and business IDs
- [ ] Runbooks cover stale `IN_PROGRESS` records and replay scenarios
Common mistakes I see (and how I avoid them)
Mistake 1: Using Lambda requestId as the idempotency key
That only identifies the invocation, not the logical request.
Fix: use business operation identity or client-provided idempotency key.
Mistake 2: Assuming FIFO queue means I do not need idempotency
FIFO helps with ordering and deduplication windows, but it does not replace end-to-end idempotency for all side effects and replay paths.
Fix: still make the consumer idempotent.
Mistake 3: Dedupe only at the API layer
Then an async worker downstream duplicates the side effect anyway.
Fix: apply idempotency where side effects happen, especially in SQS/EventBridge consumers.
Mistake 4: No payload validation on key reuse
This can return the wrong cached response and create hidden data integrity issues.
Fix: validate a stable subset of the payload with the idempotency key.
Mistake 5: Too-short TTL
The key expires before retries/redrives finish, so duplicates sneak through.
Fix: pick TTL based on actual operational timelines, not guesswork.
Final thoughts
If I had to summarize production-grade idempotency architecture in one line, it would be this:
Design for duplicate delivery as normal behavior, then make your Lambda handlers safe, deterministic, and observable.
AWS gives us excellent building blocks for this:
- Lambda
- SQS / EventBridge / API Gateway
- DynamoDB conditional writes
- AWS Lambda Powertools idempotency utility
When I combine them intentionally, retries stop being scary and start being a reliability feature instead of a data integrity risk.
If you are building Lambda-driven systems that write to money, inventory, notifications, or customer state, idempotency is not optional. It is a core part of the architecture.
References
- AWS Lambda Powertools (Python) documentation
- AWS Lambda developer guide
- Amazon SQS developer guide (Lambda event source mappings / retries / partial batch response)
- Amazon EventBridge documentation (retries, targets, replay/archive)
- Amazon DynamoDB documentation (conditional writes, TTL, PITR)
