I put Redis in front of a slow SQLite database, ran three app pods behind round-robin load, and hammered one hot key until TTL expiry produced visible origin spikes — then swapped in fixes one at a time and measured the difference. Not a production guide; just the problem, the charts, and honest numbers from a setup small enough to reason about.
The problem, without the jargon pile
You cache expensive reads so most requests never touch the database. That works until a popular key expires. Every client misses at once, they all run the same query, and the database gets a spike it cannot absorb. That is a cache stampede. The generic name is thundering herd — many workers doing the same expensive thing at once.
The averages lie. You can have a 97% cache hit rate and still hammer the origin hundreds of times in 60 seconds, because misses cluster at expiry instead of spreading out.
App → Redis (cache) → on miss → SQLite (~200ms per read)
The fix is never "more cache." It is coordination: who refreshes, when, and whether everyone else waits, reads stale data, or refreshes early.
What I built
| Piece | Role |
|---|---|
| Redis (Docker) | Shared cache — visible to all app instances |
| 3 FastAPI replicas | Round-robin load — multi-pod from the start; scaling servers does not fix cache coordination |
| FastAPI app | Cache-aside: GET Redis → on miss, read SQLite → SETEX with 5s TTL |
| Load generator | 50 concurrent clients, 60 seconds, 95% traffic to one item |
| Metrics | Total DB queries, max DB queries in any 1-second window, latency percentiles |
| Charts | Time series after each run — DB queries/s and latency/s |
Each strategy is a swap-in module (naive, singleflight, swr, xfetch, jitter). Same load script, same TTL, same three pods — compare the charts.
How to read the chart: the top panel is the whole lesson. Naive cache shows flat calm + tall spikes every ~5 seconds. Fixed strategies show a low flat line with occasional single bumps.
Step 1 — Naive cache-aside: the stampede baseline
What I did: Naive cache-aside only — GET → miss → slow DB read → SETEX with 5s TTL. Flush Redis, run load test. Every later strategy compares against this.
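A minimal sketch of that naive path, under stated assumptions: `FakeRedis` is a hypothetical in-memory stand-in for the `get` and `setex` calls of a Redis client so the snippet runs without a server, `slow_db_read` fakes the ~200 ms SQLite read with a sleep, and `concurrent_reads` is an illustrative helper that releases many clients at the same instant, the way traffic behaves right after a hot key expires.

```python
import json
import threading
import time
from concurrent.futures import ThreadPoolExecutor

TTL_SECONDS = 5
DB_CALLS = []  # one entry per origin read, standing in for the DB query metric


class FakeRedis:
    """Hypothetical in-memory stand-in for the two Redis calls used here."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry on the monotonic clock)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= time.monotonic():
            return None  # missing or expired
        return entry[0]

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def slow_db_read(item_id):
    """Stand-in for the ~200 ms SQLite read."""
    DB_CALLS.append(item_id)
    time.sleep(0.2)
    return {"id": item_id, "name": f"item-{item_id}"}


def get_item_naive(cache, item_id):
    key = f"item:{item_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # hit: the database is never touched
    item = slow_db_read(item_id)   # miss: every concurrent caller lands here
    cache.setex(key, TTL_SECONDS, json.dumps(item))
    return item


def concurrent_reads(cache, item_id, clients=50):
    """Release `clients` threads at once, as happens right after a key expires."""
    barrier = threading.Barrier(clients)

    def one_client():
        barrier.wait()
        return get_item_naive(cache, item_id)

    with ThreadPoolExecutor(max_workers=clients) as pool:
        futures = [pool.submit(one_client) for _ in range(clients)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    concurrent_reads(FakeRedis(), item_id=1)
    print("DB reads for one expiry:", len(DB_CALLS))
```

Nothing in `get_item_naive` stops a second caller from missing while the first is still inside `slow_db_read`, so a single expiry on an empty cache produces many origin reads instead of one.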
Results:
| Metric | Value |
|---|---|
| Cache hit rate | ~97% |
| Total DB queries (60s) | 689 |
| Max DB queries in 1s | 61 |
| p99 latency | ~223 ms |
What clicked:
- With a 5s TTL over 60s, you'd hope for ~12 refreshes on one hot key. I got hundreds of DB reads — every concurrent miss triggers its own query.
- Hit rate hides burst damage. The metric that matters is max origin QPS in 1 second.
- All pods see the same expired key. An in-process mutex only protects one process; coordination has to live in shared storage (Redis).
- The chart is the signature: periodic needles at TTL expiry.
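The two numbers in these takeaways can be computed from a plain list of query timestamps. The sketch below is one way to do it, not necessarily how the charts were produced: it assumes fixed one-second buckets (a sliding window would give slightly different peaks), and the function names are illustrative.

```python
from collections import Counter


def max_queries_per_second(timestamps):
    """Largest number of DB queries that fall in the same whole-second bucket."""
    buckets = Counter(int(t) for t in timestamps)
    return max(buckets.values(), default=0)


def ideal_refreshes(duration_s, ttl_s):
    """Refresh count if exactly one caller refilled the key at each expiry."""
    return duration_s // ttl_s


if __name__ == "__main__":
    # Three queries in second 0, four in second 5, one in second 10.
    sample = [0.1, 0.2, 0.9, 5.0, 5.1, 5.2, 5.3, 10.4]
    print("peak queries in one second:", max_queries_per_second(sample))
    print("ideal refreshes, 60s run, 5s TTL:", ideal_refreshes(60, 5))
```

The gap between the ideal refresh count and the observed total is the stampede; the per-second peak is what the origin actually has to survive.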
Step 2 — Single-flight: one refresher per key
Next problem: how do you stop 50 concurrent misses from becoming 50 DB queries?
What I did: Redis distributed lock — SET lock:item:{id} NX EX 10. One winner refreshes; losers wait and retry every 50ms (5s timeout).
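A sketch of that lock-and-wait loop, under the same assumptions as any toy version: `FakeRedis` is a hypothetical in-memory stand-in whose `set(..., nx=True, ex=...)` mirrors the keyword arguments of the redis-py client, `slow_db_read` fakes the origin, and the re-check after acquiring the lock is an addition of this sketch to close the gap between a loser's last GET and its lock attempt.

```python
import json
import threading
import time
from concurrent.futures import ThreadPoolExecutor

TTL_SECONDS = 5
LOCK_TTL_SECONDS = 10
DB_CALLS = []  # one entry per origin read


class FakeRedis:
    """Hypothetical in-memory stand-in for the Redis calls used here."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()  # makes SET NX atomic, as it is in Redis

    def _live(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= time.monotonic():
            return None
        return entry[0]

    def get(self, key):
        with self._mutex:
            return self._live(key)

    def set(self, key, value, nx=False, ex=None):
        with self._mutex:
            if nx and self._live(key) is not None:
                return None  # key already held: SET NX fails
            expiry = time.monotonic() + ex if ex is not None else float("inf")
            self._data[key] = (value, expiry)
            return True

    def setex(self, key, ttl_seconds, value):
        self.set(key, value, ex=ttl_seconds)

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)


def slow_db_read(item_id):
    DB_CALLS.append(item_id)
    time.sleep(0.2)  # stand-in for the ~200 ms SQLite read
    return {"id": item_id, "name": f"item-{item_id}"}


def get_item_singleflight(cache, item_id, wait_timeout=5.0, poll=0.05):
    key, lock_key = f"item:{item_id}", f"lock:item:{item_id}"
    deadline = time.monotonic() + wait_timeout
    while True:
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)
        # SET lock:item:{id} NX EX 10: exactly one caller gets a truthy reply.
        if cache.set(lock_key, "1", nx=True, ex=LOCK_TTL_SECONDS):
            try:
                cached = cache.get(key)  # re-check: a previous winner may have filled it
                if cached is not None:
                    return json.loads(cached)
                item = slow_db_read(item_id)
                cache.setex(key, TTL_SECONDS, json.dumps(item))
                return item
            finally:
                # Plain delete keeps the sketch short; a per-holder token would
                # avoid deleting a lock that expired and was taken by someone else.
                cache.delete(lock_key)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"gave up waiting for {key}")
        time.sleep(poll)  # loser: wait 50 ms, then look in the cache again


def concurrent_reads(cache, item_id, clients=50):
    """Illustrative helper: release `clients` threads at the same instant."""
    barrier = threading.Barrier(clients)

    def one_client():
        barrier.wait()
        return get_item_singleflight(cache, item_id)

    with ThreadPoolExecutor(max_workers=clients) as pool:
        futures = [pool.submit(one_client) for _ in range(clients)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    concurrent_reads(FakeRedis(), item_id=1)
    print("DB reads for one expiry:", len(DB_CALLS))
```

Because the lock lives in the shared store rather than in process memory, the same logic holds across pods; the losers pay with a short wait instead of a database query.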
Results:
| Metric | naive | single-flight |
|---|---|---|
| Max DB/s | 61 | 8 |
| Total DB (60s) | 689 | 111 |
| p99 | 223 ms | 213 ms |
| `wait_hit` responses | n/a | 623 |
What clicked:
- Origin protection works. Spikes flattened.
- Cost shifted to coordination: 623 requests waited for the winner (`wait_hit`). Users blocked ~200ms on an empty cache instead of the DB taking 50 parallel hits.
- p99 barely moved: one wait costs about one refresh time, not a pile-up of DB queries.
Step 3 — SWR: keep users fast at expiry
Next problem: single-flight protects the DB, but users still wait on an empty cache every time the key expires. Can we serve old data while refreshing in the background?
What I did: Stale-while-revalidate in Redis — envelope {item, cached_at}. Fresh for 5s, then serve stale for 5s while a background task refreshes (still with single-flight lock). Hard Redis TTL 10s.
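The envelope and the two expiry horizons can be sketched as follows. This is an illustration under assumptions, not the exact module: `FakeRedis` is a hypothetical in-memory stand-in, the background work runs in a plain thread, the status strings are invented for the example, and hard misses refresh inline where a fuller version would reuse the single-flight wait loop.

```python
import json
import threading
import time

LOCK_TTL_SECONDS = 10
DB_CALLS = []  # one entry per origin read


class FakeRedis:
    """Hypothetical in-memory stand-in for the Redis calls used here."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def _live(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= time.monotonic():
            return None
        return entry[0]

    def get(self, key):
        with self._mutex:
            return self._live(key)

    def set(self, key, value, nx=False, ex=None):
        with self._mutex:
            if nx and self._live(key) is not None:
                return None
            expiry = time.monotonic() + ex if ex is not None else float("inf")
            self._data[key] = (value, expiry)
            return True

    def setex(self, key, ttl_seconds, value):
        self.set(key, value, ex=ttl_seconds)

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)


def slow_db_read(item_id):
    DB_CALLS.append(item_id)
    time.sleep(0.2)  # stand-in for the ~200 ms SQLite read
    return {"id": item_id, "name": f"item-{item_id}"}


def refresh(cache, item_id, hard_ttl_s):
    """Read the origin and store the envelope {item, cached_at}."""
    item = slow_db_read(item_id)
    envelope = {"item": item, "cached_at": time.time()}
    cache.setex(f"item:{item_id}", hard_ttl_s, json.dumps(envelope))
    return item


def refresh_then_unlock(cache, item_id, hard_ttl_s, lock_key):
    try:
        refresh(cache, item_id, hard_ttl_s)
    finally:
        cache.delete(lock_key)


def get_item_swr(cache, item_id, fresh_s=5, stale_s=5):
    """Return (item, status); status is 'fresh', 'stale' or 'miss'."""
    hard_ttl_s = fresh_s + stale_s  # Redis drops the key only after both windows
    raw = cache.get(f"item:{item_id}")
    if raw is None:
        # Hard miss: nothing to serve, so this caller has to wait for the origin.
        return refresh(cache, item_id, hard_ttl_s), "miss"
    envelope = json.loads(raw)
    if time.time() - envelope["cached_at"] < fresh_s:
        return envelope["item"], "fresh"
    # Soft-expired: answer now with the old value, refresh off the request path.
    lock_key = f"lock:item:{item_id}"
    if cache.set(lock_key, "1", nx=True, ex=LOCK_TTL_SECONDS):
        threading.Thread(
            target=refresh_then_unlock,
            args=(cache, item_id, hard_ttl_s, lock_key),
            daemon=True,
        ).start()
    return envelope["item"], "stale"
```

The key design point is that freshness is decided by `cached_at` inside the value, while the Redis TTL only sets the outer limit on how long stale data may be served.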
Results:
| Metric | single-flight | SWR |
|---|---|---|
| Max DB/s | 8 | 7 |
| Total DB (60s) | 111 | 111 |
| p99 | 213 ms | 28 ms |
| `stale` responses | n/a | 1160 |
| `wait_hit` | 623 | 50 |
What clicked:
- Same origin load as single-flight — SWR changes UX, not refresh count.
- p99 collapsed because expiry no longer blocks on the refresher. Users got instant stale instead of waiting.
- The name comes from HTTP caching (`Cache-Control: stale-while-revalidate`). Same semantics; I implemented it in Redis, not response headers.
Step 4 — XFetch: refresh before the cliff
Next problem: SWR handles after expiry. Can we refresh before the key goes empty and avoid the cliff entirely?
What I did: Probabilistic early expiration (XFetch) — as cache age approaches TTL, rising probability of a background refresh (with single-flight lock). User always gets the cached value immediately; refresh happens proactively.
Why probability, not "refresh at 4.5s"?
| Approach | Under load |
|---|---|
| Fixed cutoff (age ≥ 4.5s) | Mini-cliff — many requests try to refresh in the same window |
| XFetch (probability rises with age) | Refresh attempts spread across time; lock ensures one DB read |
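The rising probability in that table can be made concrete with the check usually given for XFetch: refresh early when `now - delta * beta * ln(rand)` reaches the expiry time, where `delta` is how long a recompute takes. The sketch below rewrites that in terms of cache age; the parameter names (`age_s`, `ttl_s`, `delta_s`) and the 0.2 s delta are assumptions for illustration, and the exact formula used in the experiment may differ.

```python
import math
import random


def should_refresh_early(age_s, ttl_s, delta_s, beta=1.0, rand=random.random):
    """XFetch-style check: True means 'start a background refresh now'."""
    remaining = ttl_s - age_s
    u = 1.0 - rand()  # in (0, 1], so log(u) is always defined
    # -log(u) is an exponential random variable: usually small, sometimes large.
    return -delta_s * beta * math.log(u) >= remaining


def early_refresh_probability(age_s, ttl_s, delta_s, beta=1.0):
    """Closed form of the same check: exp(-remaining / (delta * beta))."""
    remaining = ttl_s - age_s
    if remaining <= 0:
        return 1.0
    return math.exp(-remaining / (delta_s * beta))


if __name__ == "__main__":
    for age in (3.0, 4.0, 4.5, 4.9, 5.0):
        p = early_refresh_probability(age, ttl_s=5.0, delta_s=0.2)
        print(f"age {age:.1f}s -> refresh probability {p:.4f}")
```

With a 5s TTL, a 0.2 s recompute time and `beta = 1`, a single request has roughly an 8% chance of triggering a refresh at 4.5 s and about 61% at 4.9 s. Under heavy traffic some request almost always triggers before expiry, and the lock keeps that to one DB read.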
Results:
| Metric | SWR | XFetch |
|---|---|---|
| Max DB/s | 7 | 10 |
| Total DB (60s) | 111 | 174 |
| p99 | 28 ms | 21 ms |
| `early_refresh` triggers | n/a | 6275 |
What clicked:
- More total DB work (174 vs 111 reads), fewer synchronized cliffs. You pay extra origin reads to buy smoothness.
- Complements SWR: SWR is post-expiry UX; XFetch tries to stay ahead of expiry.
- Tune `beta`: higher means earlier, more frequent refreshes.
Hot-key recap — same load throughout Steps 1–4 (95% traffic to one item):
| Strategy | max DB/s | total DB | p99 ms |
|---|---|---|---|
| naive | 61 | 689 | 223 |
| single-flight | 8 | 111 | 213 |
| SWR | 7 | 111 | 28 |
| XFetch | 10 | 174 | 21 |
For one viral key, single-flight (or SWR/XFetch built on top of it) is what moves the needle. That is the main thread of the exercise.
Step 5 — TTL jitter: a different problem (many keys expiring together)
Next problem: Steps 1–4 all hammer one hot key. After deploy, a different stampede shows up: you warm thousands of keys with the same TTL, traffic is spread across them, and they all expire in the same second. Can you smear that cliff without a lock per key?
What jitter does: on each cache write, add random extra TTL — e.g. 5s base + up to 20% → keys live 5–6s instead of all dying at exactly 5s. Cheap; no coordination logic.
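A small sketch of that TTL calculation, plus a helper that shows what it does to a batch of keys written together. Both function names are illustrative, and the TTL is returned in milliseconds on the assumption that the write uses a millisecond-precision expiry (Redis offers `PSETEX` and `SET ... PX` for that) so the spread is continuous rather than whole seconds.

```python
import random
from collections import Counter


def jittered_ttl_ms(base_s=5.0, spread=0.20, rand=random.random):
    """Base TTL plus a random extra of up to `spread` * base, in milliseconds."""
    return int(base_s * 1000 * (1.0 + spread * rand()))


def expiries_per_second(ttls_ms):
    """For keys all written at t=0, count how many expire in each whole second."""
    return Counter(ttl // 1000 for ttl in ttls_ms)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so repeated runs print the same thing
    for spread in (0.0, 0.2, 1.0):
        ttls = [jittered_ttl_ms(spread=spread, rand=rng.random) for _ in range(100)]
        print(f"spread {spread:.0%}:", sorted(expiries_per_second(ttls).items()))
```

With zero spread all 100 keys land in the same second; a wider spread distributes them across more seconds, which is the whole effect jitter has on the origin.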
When it helps — and when it does not:
| Scenario | Jitter useful? | Why |
|---|---|---|
| One hot key (Steps 1–4 load) | No | One key still has one expiry moment per cycle |
| Bulk cache warm after deploy | Yes | Many keys written together → desync their expiries |
| Cold empty cache | No | Nothing to jitter until keys exist |
| Need hard origin cap on hot key | No | Use single-flight / SWR, not jitter alone |
I ran the same hot-key load with jitter added: max DB/s 57 vs 61 — barely changed. Wrong tool, wrong problem. That confirmed jitter is not a substitute for the coordination in Steps 2–4.
Bulk warm experiment: seed 100 items, populate all keys in one burst (simulated deploy warm), even traffic across items 1–100.
| Strategy | max DB/s | seconds >50 DB/s |
|---|---|---|
| naive | 92 | 17 |
| jitter (20% spread) | 83 | 11 |
Modest at 20% — only 0–1 extra seconds on a 5s TTL. I re-ran with 100% spread (TTL 5–10s) to make the chart readable: peak 95 → 45 DB/s, zero seconds above 50 DB/s for jitter. Needle became a hill.
What clicked:
- Jitter randomizes when keys disappear across many keys. XFetch randomizes when refresh starts on one key still alive — different layer, easy to confuse.
- Jitter is a deploy/warm helper, stacked on top of coordination for hot paths — not a replacement for it.
The stack I'd actually use
For most Redis-in-front-of-DB setups:
1. Shared cache (Redis)
2. Single-flight lock (multi-pod essential)
3. SWR or XFetch on hot keys (business choice)
4. TTL jitter on bulk warms (cheap extra)
5. Lock renewal, 503 + client retry — failure polish
That covers the hot-key path and the bulk-warm case I exercised. It does not replace circuit breakers, bulkheads, CDN caching, or negative caching — adjacent layers for failures and edge traffic.
If you remember only five things
- Cache stampede = synchronized misses — often at TTL expiry on a hot key. Fix is coordination, not a bigger cache.
- Measure max origin QPS in 1 second — hit rate and average latency lie.
- Multi-pod needs a distributed lock: an in-process mutex only protects one process.
- Single-flight protects the origin; SWR protects latency — same DB load, very different p99.
- Jitter desynchronizes keys; XFetch desynchronizes refresh time. Different problems, and each uses randomness at a different layer.
Closing
The chart did the teaching. Averages said "fine"; the DB-queries-per-second panel said "stampede every five seconds." Single-flight turned needles into a flat line. SWR changed almost nothing on that panel but made latency tell a different story. Jitter only clicked once I stopped hammering one key and warmed a hundred.





