In many backend systems, user data is fetched on almost every request. A common assumption is that adding an in-memory cache will improve read performance in almost any system.
To validate this assumption, I benchmarked three approaches to accessing user data from DynamoDB in a serverless setup:
- A baseline using AWS Lambda and DynamoDB
- A cache-aside approach using Redis
- AWS Lambda backed by DynamoDB Accelerator (DAX)
I wanted to find out whether it is actually worth it: if you are the CTO of a startup thinking about adding a caching service to your architecture, does the extra layer pay off? I expected the cached approaches to outperform the baseline easily.
Even at 200 requests per second (that's 12,000 requests per minute!), that assumption didn't hold.
Experimental Setup
The architecture is kept as identical as possible across all three approaches, using the cheapest available options in the eu-central-1 AWS region.
Here are the three approaches:
Baseline (DDB + Lambda)
- Lambda: Python 3.12, 256 MB memory, 10s timeout
- DynamoDB: Pay-per-request billing mode
- Access Pattern: Direct reads from DynamoDB without any caching layer
- Latency: Highest (no cache)
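For reference, here is a minimal sketch of what this baseline read path might look like. The table name, environment variable, and event shape are my assumptions, not the benchmark's exact code; the pk/sk format follows the example item shown further down.

```python
import os

import boto3

# TABLE_NAME is an assumption; the benchmark's actual table name may differ.
TABLE_NAME = os.environ.get("TABLE_NAME", "items")

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(TABLE_NAME)


def handler(event, context):
    """Baseline path: read the item straight from DynamoDB, no cache involved."""
    item_id = event["itemId"]
    response = table.get_item(Key={"pk": f"ITEM#{item_id}", "sk": "META"})
    return response.get("Item")
```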
DAX (DynamoDB Accelerator)
- Lambda: Python 3.12, 256 MB memory, 10s timeout
- DAX Cluster:
  - Node type: `dax.t3.small`
  - Replication factor: 1 (single node)
  - Deployed in VPC isolated subnets
- Cache: Managed by DAX automatically (default item TTL ~5 minutes, query cache TTL ~5 minutes)
- Access Pattern: Lambda → DAX → DynamoDB
- Client: Uses the `amazondax` Python client available on PyPI
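A minimal sketch of how the DAX read path could look is below. The endpoint variable and table name are assumptions; the client call follows the pattern in AWS's DAX Python sample, so the exact wiring in the benchmark may differ.

```python
import os

from amazondax import AmazonDaxClient

# Both values are placeholders; the real endpoint comes from the DAX cluster above.
DAX_ENDPOINT = os.environ["DAX_ENDPOINT"]
TABLE_NAME = os.environ.get("TABLE_NAME", "items")

# AmazonDaxClient.resource() mirrors boto3.resource("dynamodb"),
# so the read code stays the same as in the baseline handler.
dax = AmazonDaxClient.resource(endpoint_url=DAX_ENDPOINT)
table = dax.Table(TABLE_NAME)


def handler(event, context):
    """Read through DAX; it serves cached items and falls back to DynamoDB on a miss."""
    item_id = event["itemId"]
    response = table.get_item(Key={"pk": f"ITEM#{item_id}", "sk": "META"})
    return response.get("Item")
```

Because the interface matches boto3, switching between the baseline and DAX is essentially a one-line change in how the table resource is created.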
Redis (AWS ElastiCache)
- Lambda: Python 3.12, 256 MB memory, 10s timeout
- Redis Cluster:
  - Node type: `cache.t4g.micro`
  - Single node (no automatic failover)
  - Deployed in VPC isolated subnets
- Cache TTL: 30 seconds (configurable via `REDIS_TTL_SECONDS`)
- Access Pattern:
  - Lambda checks Redis first
  - On miss: reads from DynamoDB, stores in Redis with 30s TTL
  - On hit: returns cached data directly
- Client: Standard `redis-py` client with 1s connection timeout
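A rough sketch of that cache-aside flow is shown below. The environment variables, Redis key format, and table name are my assumptions; only the 30-second TTL and the 1-second connection timeout come from the setup above.

```python
import json
import os

import boto3
import redis

REDIS_TTL_SECONDS = int(os.environ.get("REDIS_TTL_SECONDS", "30"))
TABLE_NAME = os.environ.get("TABLE_NAME", "items")  # assumption

# 1s connection timeout, as described in the setup.
cache = redis.Redis(
    host=os.environ["REDIS_HOST"],
    port=6379,
    socket_connect_timeout=1,
)
table = boto3.resource("dynamodb").Table(TABLE_NAME)


def handler(event, context):
    item_id = event["itemId"]
    cache_key = f"item:{item_id}"  # key format is an assumption

    # 1. Check Redis first.
    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)  # cache hit: return directly

    # 2. On miss: read from DynamoDB...
    item = table.get_item(Key={"pk": f"ITEM#{item_id}", "sk": "META"}).get("Item")

    # 3. ...and write it back to Redis with the configured TTL.
    if item is not None:
        # default=str handles the Decimal values boto3 returns for numbers.
        cache.set(cache_key, json.dumps(item, default=str), ex=REDIS_TTL_SECONDS)
    return item
```

On a hit, DynamoDB is never touched, which is why Redis reduces table pressure even when it doesn't reduce latency.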
I tested two load levels: 50 reads per second (3,000 reads per minute) and 200 reads per second (12,000 reads per minute), using the same payload size and the same hot/mixed key distribution, with p95 latency as the primary metric. Load was generated with k6, an open-source load-testing tool scripted in JavaScript.
In all three approaches, the same data was fetched from the database. Here is an example item:
```json
{
  "pk": "ITEM#123",
  "sk": "META",
  "itemId": "123",
  "title": "Example Item Title",
  "body": "This is the body content of the item",
  "updatedAt": 1736467200,
  "etag": "a7f8d9e1c2b3a4f5e6d7c8b9a0f1e2d3c4b5a6f7e8d9c0b1a2f3e4d5c6b7a8f9"
}
```
Results at 50 RPS — Establishing the baseline
| Access Pattern | p95 Latency (ms) | Avg Latency (ms) | Dropped Iterations | Notes |
|---|---|---|---|---|
| Lambda + DynamoDB (Baseline) | ~63 ms | ~48 ms | 0 | Fast, stable, no bottlenecks |
| Redis (warmup run) | ~68 ms | ~66 ms | 22 | Cache misses + write-back cost |
| Redis (steady state) | ~63 ms | ~48 ms | 0 | Matches baseline, no latency win |
| DAX (single small node) | ~1040 ms | ~957 ms | 19 | Cache saturation, unusable |
Baseline: Lambda + DynamoDB @ 50 RPS
At 50 RPS, the baseline Lambda + DynamoDB setup achieved a p95 latency of ~63 ms with no dropped requests. Hot and mixed key access patterns showed very similar latency, which indicates that DynamoDB on-demand wasn't under pressure at all.
Redis: Warmup vs Steady State @ 50 RPS
The Redis results clearly show two very different phases. During the first run, average latency was ~66 ms (p95 ~68 ms). This is expected behavior: the Lambda first asks the Redis cluster whether the requested data is already cached, and on a miss it falls back to DynamoDB and writes the item back to Redis. Since the cluster starts empty, every early request pays for that extra round trip.
Once the cache was warm, Redis did its job and achieved nearly identical latency to the baseline. However, it did not outperform it.
DAX: Undersized Cache Failure @ 50 RPS
The DAX configuration performed significantly worse than both the baseline and Redis. With a single small node, the DAX cluster became CPU-bound, leading to request queueing and p95 latencies exceeding one second.
This result highlights an often-overlooked risk: a misconfigured or undersized cache can actively degrade performance. DAX is not a drop-in optimization; it requires careful capacity planning. Keep in mind that this benchmark ran on the smallest DAX node type available.
Conclusion: 50 RPS
At 50 RPS, the baseline configuration performed well and doesn't need any additional caching; a cache only introduces extra network and service hops while giving little in return. Redis did reduce pressure on DynamoDB, but DynamoDB wasn't anywhere near its limits to begin with.
Results at 200 RPS — The expected crossover that didn’t happen
| Access Pattern | p95 Latency (ms) | Avg Latency (ms) | Dropped Iterations | Notes |
|---|---|---|---|---|
| Lambda + DynamoDB (Baseline) | ~63 ms | ~48 ms | 13 | Stable, scales linearly |
| Redis (warmup run) | ~64 ms | ~52 ms | 40 | Cache population under load |
| Redis (steady state) | ~70 ms | ~58 ms | 79 | Slightly worse than baseline |
| DAX (single small node) | ~1050 ms | ~968 ms | 5,399 | Cluster saturation |
Baseline: Lambda + DynamoDB @ 200 RPS
At 200 requests per second, the baseline Lambda + DynamoDB configuration performed nearly identically to the 50 RPS run. p95 latency remained around 63 ms, with only a small number of dropped iterations.
This makes a strong case that the baseline is already a well-optimized choice by default.
Redis @ 200 RPS: Warm vs Steady State
As with the 50 RPS test, Redis exhibited two distinct phases. During the initial run, cache misses caused additional latency and tail spikes as items were fetched from DynamoDB and written back to Redis.
Once the cache was warm, Redis stabilized but did not outperform the baseline. In steady state, p95 latency increased slightly to ~70 ms, and the number of dropped iterations was higher than with DynamoDB alone.
DAX @ 200 RPS: Saturation Under Load
At 200 RPS, the DAX configuration began to collapse. Effective throughput dropped well below the target rate, p95 latency exceeded one second, and thousands of iterations were dropped.
This behavior confirms that DAX is highly sensitive to sizing — smaller instances simply do not provide any benefit.
Conclusion: 200 RPS
Even at 200 RPS, the dominant cost in this system was not database access but network and managed service overhead. Adding a cache did not remove that cost — it added to it, as you still need to pay for the on-demand or serverless Redis instance, depending on your choice.
What These Results Actually Prove
With these benchmarks, we’ve proven:
- DynamoDB on-demand scales extremely well for simple reads
- Redis reduces pressure, not latency
- Cache warmup matters
- Misconfigured DAX is worse than no cache
- Latency optimization and scaling optimization are different problems
Conclusion & Lessons Learned
The results from both the 50 RPS and 200 RPS benchmarks lead to a clear and somewhat counterintuitive conclusion: for this workload, Lambda backed by DynamoDB on-demand was already fast enough that adding a cache did not improve user-visible latency.
At both load levels, DynamoDB was not the bottleneck. End-to-end latency was dominated by network distance and managed service overhead rather than database access time. As a result, introducing Redis added an extra network hop and client-side overhead without removing the dominant cost in the request path.
Redis still served a purpose, but not the one I initially expected. It reduced pressure on DynamoDB and flattened backend load, which can be valuable for cost control and future scaling. What it did not do — at least at these traffic levels — was make requests faster.
DynamoDB Accelerator, based on these results, taught a very different lesson. When undersized, it simply doesn't help at all: the single small node saturated quickly, which drove up latency and dropped requests. DAX requires careful capacity planning and a good understanding of the workload.
And maybe the biggest lesson from this experiment is how important it is to run measurements before optimizing. Even though caching the data makes sense to save resources and ease the load on the backend, it did not lower end-to-end latency here.
This workload is probably just too simple to show the benefits of caching. If you have an expensive operation in your code and its result can be cached, caching is most likely worth testing.
In short: don’t cache because it feels right—cache because the data proves you need it.