Last month, I watched my third-party API costs triple overnight. The strangest part? My DynamoDB cache was working perfectly, or so I thought.
Turns out, I'd built a textbook race condition into my serverless architecture. The kind that only shows up under load, when it's expensive to debug.
The Setup
Standard serverless caching pattern: Lambda checks DynamoDB, returns cached data if present, otherwise hits an external API and writes the result back. Clean separation of concerns, scales to zero, the usual AWS promise.
In isolation, every request behaved exactly right. Cache miss triggered one API call, wrote to DynamoDB, subsequent requests got cache hits. Perfect.
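In code, the naive read-through pattern looks roughly like this. It's a minimal sketch: a plain dict stands in for the DynamoDB table and a counter stands in for the external API, both my illustrations rather than the original Lambda code:

```python
cache = {}       # stands in for the DynamoDB table
api_calls = 0    # counts hits to the "external API"

def fetch_from_external_api(item_id):
    global api_calls
    api_calls += 1
    return f"data-for-{item_id}"

def get_item(item_id):
    # 1. Check the cache
    if item_id in cache:
        return cache[item_id]
    # 2. Cache miss: call the external API
    data = fetch_from_external_api(item_id)
    # 3. Write the result back for subsequent requests
    cache[item_id] = data
    return data

get_item("user-42")  # miss: one API call
get_item("user-42")  # hit: served from the cache, no API call
```

Sequentially this behaves exactly as described: one fetch, then cache hits forever. The gap between step 1 and step 3 is where the trouble starts.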
Where It Breaks
Hot keys destroy this pattern. When 20 concurrent requests ask for the same uncached item, you'd expect one API call and 19 cache hits. That's not what happens.
Here's the actual sequence:
- Request 1 reads DynamoDB (empty), prepares API call
- Request 2 reads DynamoDB before Request 1 writes (still empty), prepares API call
- Requests 3-20 do the same
You just paid for 20 API calls to fetch identical data. Zero cache benefit.
I tested this with controlled concurrency: 20 parallel Lambda invocations requesting the same key. Every single invocation hit the external API. The race window between read and write is small, but it's large enough.
DynamoDB Conditional Updates
The fix isn't complicated, but it requires thinking about state differently. You can't rely on read-then-write logic, because that gap is where the race lives.
DynamoDB's ConditionExpression parameter makes writes atomic. You specify conditions that must be true for the write to succeed. If they're not, you get ConditionalCheckFailedException immediately.
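The semantics are compare-and-set: the condition check and the write happen as a single atomic step, so there is no gap to race in. A toy in-memory version of that guarantee (a lock-protected dict as a stand-in for DynamoDB, entirely my illustration):

```python
import threading

class ToyTable:
    """In-memory stand-in for DynamoDB's conditional-update semantics."""
    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def update_if(self, key, new_value, condition):
        # Check and write happen under one lock: atomic, like ConditionExpression.
        with self._lock:
            if not condition(self._items.get(key)):
                raise RuntimeError("ConditionalCheckFailedException")
            self._items[key] = new_value

table = ToyTable()
# First caller succeeds: the item doesn't exist yet.
table.update_if("job", "Processing", lambda cur: cur is None or cur == "Pending")
# A second identical attempt now fails: the item is already 'Processing'.
```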
Implementation Pattern
I use a three-state flow: Pending → Processing → Completed.
When a Lambda detects a cache miss, it attempts an atomic state transition:
```python
import time

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('cache')  # your cache table

# Only one Lambda will succeed
try:
    table.update_item(
        Key={'id': item_id},
        UpdateExpression='SET #status = :processing',
        ConditionExpression='attribute_not_exists(id) OR #status = :pending',
        ExpressionAttributeNames={'#status': 'status'},
        ExpressionAttributeValues={
            ':processing': 'Processing',
            ':pending': 'Pending'
        }
    )
    # Winner: call external API
    data = fetch_from_external_api(item_id)
    # Write final result
    table.update_item(
        Key={'id': item_id},
        UpdateExpression='SET #status = :completed, #data = :data',
        ExpressionAttributeNames={
            '#status': 'status',
            '#data': 'data'
        },
        ExpressionAttributeValues={
            ':completed': 'Completed',
            ':data': data
        }
    )
    return data
except ClientError as e:
    if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
        # Loser: wait for winner to finish, then read the cache
        time.sleep(0.1)  # Simple backoff; see below for a more robust approach
        return get_from_cache(item_id)
    raise
```
Only one Lambda succeeds. The others catch ConditionalCheckFailedException and know someone else claimed the work. They can either return a retry message or implement exponential backoff until the data is ready.
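For the losers, "wait until the data is ready" is better expressed as a small polling loop with exponential backoff than a single fixed sleep. A sketch, where `poll_cache` is a hypothetical helper (not from the original code) that returns `None` until the winner has written the Completed item:

```python
import time

def wait_for_cache(item_id, poll_cache, base_delay=0.05, max_attempts=6):
    """Poll the cache with exponential backoff until the winner finishes."""
    delay = base_delay
    for _ in range(max_attempts):
        # e.g. a GetItem that returns the data only when status == 'Completed'
        data = poll_cache(item_id)
        if data is not None:
            return data
        time.sleep(delay)
        delay *= 2  # exponential backoff: 0.05s, 0.1s, 0.2s, ...
    raise TimeoutError(f"cache entry for {item_id} never completed")
```

Bounding the attempts matters in Lambda: an unbounded wait just burns invocation time if the winner crashes mid-fetch.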
Results
Same test, same 20 concurrent requests:
- External API calls: 1
- Cached responses: 19
The race condition is gone. DynamoDB handles the coordination because that's what it's built for: atomic operations at scale.
Key Takeaways
If you're writing to DynamoDB based on a previous read without conditional expressions, you have a race condition. It might not matter at low traffic, but it will surface under load.
The pattern is straightforward:
- Move concurrency control into the database layer
- Use ConditionExpression for atomic state transitions
- Handle ConditionalCheckFailedException in your Lambda code
A few extra lines of code eliminate an entire class of expensive bugs.
Have you run into similar race conditions in your serverless architecture? Drop a comment below. I'd love to hear how you've handled concurrent cache misses.