In many backend systems, user data is fetched on almost every request. A common assumption is that adding an in-memory cache will improve read performance in almost any system.
To validate this assumption, I benchmarked three approaches to accessing user data from DynamoDB in a serverless setup:
- A baseline using AWS Lambda and DynamoDB
- A cache-aside approach using Redis
- AWS Lambda backed by DynamoDB Accelerator (DAX)
I wanted to find out whether it is actually worth it: if you are the CTO of a startup thinking about adding a caching service to your architecture, does the extra layer pay off? I expected the cached approaches to outperform the baseline easily.
Even at 200 requests per second (that's 12,000 requests per minute!), that assumption didn't hold.
Experimental Setup
The architecture is kept as identical as possible across all three approaches, using the cheapest available options in the eu-central-1 AWS region.
Here are the three approaches:
Baseline (DDB + Lambda)
- Lambda: Python 3.12, 256 MB memory, 10s timeout
- DynamoDB: Pay-per-request billing mode
- Access Pattern: Direct reads from DynamoDB without any caching layer
- Latency: Highest (no cache)
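For reference, here is a minimal sketch of what this baseline read path might look like. The table name, environment variable, and event shape are my assumptions, not the benchmark's exact code; the pk/sk format follows the example item shown further down.

```python
import os

import boto3

# TABLE_NAME is an assumption; the benchmark's actual table name may differ.
TABLE_NAME = os.environ.get("TABLE_NAME", "items")

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(TABLE_NAME)


def handler(event, context):
    """Baseline path: read the item straight from DynamoDB, no cache involved."""
    item_id = event["itemId"]
    response = table.get_item(Key={"pk": f"ITEM#{item_id}", "sk": "META"})
    return response.get("Item")
```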
DAX (DynamoDB Accelerator)
- Lambda: Python 3.12, 256 MB memory, 10s timeout
- DAX Cluster:
  - Node type: `dax.t3.small`
  - Replication factor: 1 (single node)
  - Deployed in VPC isolated subnets
- Cache: Managed by DAX automatically (default item TTL ~5 minutes, query cache TTL ~5 minutes)
- Access Pattern: Lambda → DAX → DynamoDB
- Client: Uses the `amazondax` Python client available on PyPI
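A minimal sketch of how the DAX read path could look is below. The endpoint variable and table name are assumptions; the client call follows the pattern in AWS's DAX Python sample, so the exact wiring in the benchmark may differ.

```python
import os

from amazondax import AmazonDaxClient

# Both values are placeholders; the real endpoint comes from the DAX cluster above.
DAX_ENDPOINT = os.environ["DAX_ENDPOINT"]
TABLE_NAME = os.environ.get("TABLE_NAME", "items")

# AmazonDaxClient.resource() mirrors boto3.resource("dynamodb"),
# so the read code stays the same as in the baseline handler.
dax = AmazonDaxClient.resource(endpoint_url=DAX_ENDPOINT)
table = dax.Table(TABLE_NAME)


def handler(event, context):
    """Read through DAX; it serves cached items and falls back to DynamoDB on a miss."""
    item_id = event["itemId"]
    response = table.get_item(Key={"pk": f"ITEM#{item_id}", "sk": "META"})
    return response.get("Item")
```

Because the interface matches boto3, switching between the baseline and DAX is essentially a one-line change in how the table resource is created.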
Redis (AWS ElastiCache)
- Lambda: Python 3.12, 256 MB memory, 10s timeout
- Redis Cluster:
  - Node type: `cache.t4g.micro`
  - Single node (no automatic failover)
  - Deployed in VPC isolated subnets
- Cache TTL: 30 seconds (configurable via `REDIS_TTL_SECONDS`)
- Access Pattern:
  - Lambda checks Redis first
  - On miss: reads from DynamoDB, stores in Redis with 30s TTL
  - On hit: returns cached data directly
- Client: Standard `redis-py` client with 1s connection timeout
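A rough sketch of that cache-aside flow is shown below. The environment variables, Redis key format, and table name are my assumptions; only the 30-second TTL and the 1-second connection timeout come from the setup above.

```python
import json
import os

import boto3
import redis

REDIS_TTL_SECONDS = int(os.environ.get("REDIS_TTL_SECONDS", "30"))
TABLE_NAME = os.environ.get("TABLE_NAME", "items")  # assumption

# 1s connection timeout, as described in the setup.
cache = redis.Redis(
    host=os.environ["REDIS_HOST"],
    port=6379,
    socket_connect_timeout=1,
)
table = boto3.resource("dynamodb").Table(TABLE_NAME)


def handler(event, context):
    item_id = event["itemId"]
    cache_key = f"item:{item_id}"  # key format is an assumption

    # 1. Check Redis first.
    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)  # cache hit: return directly

    # 2. On miss: read from DynamoDB...
    item = table.get_item(Key={"pk": f"ITEM#{item_id}", "sk": "META"}).get("Item")

    # 3. ...and write it back to Redis with the configured TTL.
    if item is not None:
        # default=str handles the Decimal values boto3 returns for numbers.
        cache.set(cache_key, json.dumps(item, default=str), ex=REDIS_TTL_SECONDS)
    return item
```

On a hit, DynamoDB is never touched, which is why Redis reduces table pressure even when it doesn't reduce latency.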
I tested two load levels: 50 reads per second (3,000 reads per minute) and 200 reads per second (12,000 reads per minute), using the same payload size and the same hot/mixed key distribution, with p95 latency as the primary metric. Load was generated with k6, an open-source load-testing tool scripted in JavaScript.
In all three approaches, the same data was fetched from the database. Here is an example item:
```json
{
  "pk": "ITEM#123",
  "sk": "META",
  "itemId": "123",
  "title": "Example Item Title",
  "body": "This is the body content of the item",
  "updatedAt": 1736467200,
  "etag": "a7f8d9e1c2b3a4f5e6d7c8b9a0f1e2d3c4b5a6f7e8d9c0b1a2f3e4d5c6b7a8f9"
}
```
Results at 50 RPS — Establishing the baseline
| Access Pattern | p95 Latency (ms) | Avg Latency (ms) | Dropped Iterations | Notes |
|---|---|---|---|---|
| Lambda + DynamoDB (Baseline) | ~63 ms | ~48 ms | 0 | Fast, stable, no bottlenecks |
| Redis (warmup run) | ~68 ms | ~66 ms | 22 | Cache misses + write-back cost |
| Redis (steady state) | ~63 ms | ~48 ms | 0 | Matches baseline, no latency win |
| DAX (single small node) | ~1040 ms | ~957 ms | 19 | Cache saturation, unusable |
Baseline: Lambda + DynamoDB @ 50 RPS
At 50 RPS, the baseline Lambda + DynamoDB setup achieved a p95 latency of ~63 ms with no dropped requests. Hot and mixed key access patterns showed very similar latency, which indicates that DynamoDB on-demand wasn't under pressure at all.
Redis: Warmup vs Steady State @ 50 RPS
The Redis results clearly show two very different phases. During the first run, average latency was ~66 ms (p95 ~68 ms). This is expected behavior: the Lambda first asks the Redis cluster whether the requested data is already cached, and on a miss it falls back to DynamoDB and writes the item back to Redis. Since the cluster starts empty, every early request pays for that extra round trip.
Once the cache was warm, Redis did its job and achieved nearly identical latency to the baseline. However, it did not outperform it.
DAX: Undersized Cache Failure @ 50 RPS
The DAX configuration performed significantly worse than both the baseline and Redis. With a single small node, the DAX cluster became CPU-bound, leading to request queueing and p95 latencies exceeding one second.
This result highlights an often-overlooked risk: a misconfigured or undersized cache can actively degrade performance. DAX is not a drop-in optimization; it requires careful capacity planning. Keep in mind that this benchmark ran on the smallest DAX node type available.
Conclusion: 50 RPS
At 50 RPS, the baseline configuration performed well and doesn't need any additional caching; a cache only introduces extra network and service hops while giving little in return. Redis did reduce pressure on DynamoDB, but DynamoDB wasn't anywhere near its limits to begin with.
Results at 200 RPS — The expected crossover that didn’t happen
| Access Pattern | p95 Latency (ms) | Avg Latency (ms) | Dropped Iterations | Notes |
|---|---|---|---|---|
| Lambda + DynamoDB (Baseline) | ~63 ms | ~48 ms | 13 | Stable, scales linearly |
| Redis (warmup run) | ~64 ms | ~52 ms | 40 | Cache population under load |
| Redis (steady state) | ~70 ms | ~58 ms | 79 | Slightly worse than baseline |
| DAX (single small node) | ~1050 ms | ~968 ms | 5,399 | Cluster saturation |
Baseline: Lambda + DynamoDB @ 200 RPS
At 200 requests per second, the baseline Lambda + DynamoDB configuration performed nearly identically to the 50 RPS run. p95 latency remained around 63 ms, with only a small number of dropped iterations.
This makes a strong case that the baseline is already a well-optimized choice by default.
Redis @ 200 RPS: Warm vs Steady State
As with the 50 RPS test, Redis exhibited two distinct phases. During the initial run, cache misses caused additional latency and tail spikes as items were fetched from DynamoDB and written back to Redis.
Once the cache was warm, Redis stabilized but did not outperform the baseline. In steady state, p95 latency increased slightly to ~70 ms, and the number of dropped iterations was higher than with DynamoDB alone.
DAX @ 200 RPS: Saturation Under Load
At 200 RPS, the DAX configuration began to collapse. Effective throughput dropped well below the target rate, p95 latency exceeded one second, and thousands of iterations were dropped.
This behavior confirms that DAX is highly sensitive to sizing — smaller instances simply do not provide any benefit.
Conclusion: 200 RPS
Even at 200 RPS, the dominant cost in this system was not database access but network and managed service overhead. Adding a cache did not remove that cost — it added to it, as you still need to pay for the on-demand or serverless Redis instance, depending on your choice.
What These Results Actually Prove
With these benchmarks, we’ve proven:
- DynamoDB on-demand scales extremely well for simple reads
- Redis reduces pressure, not latency
- Cache warmup matters
- Misconfigured DAX is worse than no cache
- Latency optimization and scaling optimization are different problems
Conclusion & Lessons Learned
The results from both the 50 RPS and 200 RPS benchmarks lead to a clear and somewhat counterintuitive conclusion: for this workload, Lambda backed by DynamoDB on-demand was already fast enough that adding a cache did not improve user-visible latency.
At both load levels, DynamoDB was not the bottleneck. End-to-end latency was dominated by network distance and managed service overhead rather than database access time. As a result, introducing Redis added an extra network hop and client-side overhead without removing the dominant cost in the request path.
Redis still served a purpose, but not the one I initially expected. It reduced pressure on DynamoDB and flattened backend load, which can be valuable for cost control and future scaling. What it did not do — at least at these traffic levels — was make requests faster.
DynamoDB Accelerator, based on these results, taught a very different lesson. When undersized, it simply doesn't help at all: the single small node saturated quickly, which drove up latency and dropped requests. DAX requires careful capacity planning and a good understanding of the workload.
And maybe the biggest lesson from this experiment is how important it is to run measurements before optimizing. Even though caching the data makes sense to save resources and ease the load on the backend, it did not lower end-to-end latency here.
This workload is probably just too simple to show the benefits of caching. If you have an expensive operation in your code and its result can be cached, caching is most likely worth testing.
In short: don’t cache because it feels right—cache because the data proves you need it.