DEV Community

Kailash Sankar

Cache Stampede: A Primer

I put Redis in front of a slow SQLite database, ran three app pods behind a round-robin load balancer, and hammered one hot key until TTL expiry produced visible origin spikes. Then I swapped in fixes one at a time and measured the difference. Not a production guide; just the problem, the charts, and honest numbers from a setup small enough to reason about.


The problem, without the jargon pile

You cache expensive reads so most requests never touch the database. That works until a popular key expires. Every client misses at once, they all run the same query, and the database gets a spike it cannot absorb. That is a cache stampede. The generic name is thundering herd — many workers doing the same expensive thing at once.

The averages lie. You can have a 97% cache hit rate and still hammer the origin hundreds of times in 60 seconds, because misses cluster at expiry instead of spreading out.

App → Redis (cache) → on miss → SQLite (~200ms per read)

The fix is never "more cache." It is coordination: who refreshes, when, and whether everyone else waits, reads stale data, or refreshes early.


What I built

| Piece | Role |
|---|---|
| Redis (Docker) | Shared cache, visible to all app instances |
| 3 FastAPI replicas | Round-robin load; multi-pod from the start, because scaling servers does not fix cache coordination |
| FastAPI app | Cache-aside: GET Redis → on miss, read SQLite → SETEX with 5s TTL |
| Load generator | 50 concurrent clients, 60 seconds, 95% of traffic to one item |
| Metrics | Total DB queries, max DB queries in any 1-second window, latency percentiles |
| Charts | Time series after each run: DB queries/s and latency/s |

Each strategy is a swap-in module (naive, singleflight, swr, xfetch, jitter). Same load script, same TTL, same three pods — compare the charts.
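The load script itself isn't shown here. As a rough sketch of its traffic mix only, this is a key chooser in Python; `choose_item`, `hot_item=1` and `n_items=100` are hypothetical, and only the 95% split comes from the setup above. Each of the 50 simulated clients would call it in a loop for 60 seconds and request the chosen item.

```python
import random


def choose_item(rng, hot_item=1, n_items=100, hot_share=0.95):
    """Pick the item id for one request (hypothetical helper).

    `hot_share` of requests go to `hot_item`; the rest are spread uniformly
    over the other items 1..n_items.
    """
    if rng.random() < hot_share:
        return hot_item
    # Draw from n_items - 1 slots and skip over the hot item's id.
    i = rng.randint(1, n_items - 1)
    return i if i < hot_item else i + 1
```

Passing a seeded `random.Random` per client keeps runs reproducible, so every strategy sees the same request sequence.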

How to read the chart: the top panel (DB queries/s) is the whole lesson. The naive cache shows a flat baseline with tall spikes every ~5 seconds; fixed strategies show a low flat line with occasional single bumps.


Step 1 — Naive cache-aside: the stampede baseline

What I did: Naive cache-aside only — GET → miss → slow DB read → SETEX with 5s TTL. Flush Redis, run load test. Every later strategy compares against this.
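A minimal sketch of the naive path. It assumes two hypothetical stand-ins so it runs without a server: `FakeRedis` mimics only Redis GET and SETEX expiry semantics (it is not the redis-py client), and `SlowDatabase` sleeps to imitate the ~200 ms SQLite read while counting queries. `get_item_naive` and `run_concurrently` are illustrative names, not the post's code.

```python
import json
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeRedis:
    """Hypothetical in-memory stand-in for Redis GET and SETEX (not redis-py)."""

    def __init__(self):
        self._data = {}                  # key -> (value, expires_at)
        self._mu = threading.Lock()

    def get(self, key):
        with self._mu:
            entry = self._data.get(key)
            if entry is None or time.monotonic() >= entry[1]:
                self._data.pop(key, None)   # expired keys behave as missing
                return None
            return entry[0]

    def setex(self, key, ttl_seconds, value):
        with self._mu:
            self._data[key] = (value, time.monotonic() + ttl_seconds)


class SlowDatabase:
    """Hypothetical stand-in for the ~200 ms SQLite read; counts every query."""

    def __init__(self, latency=0.2):
        self.latency = latency
        self.reads = 0
        self._mu = threading.Lock()

    def read(self, item_id):
        with self._mu:
            self.reads += 1
        time.sleep(self.latency)
        return {"id": item_id, "name": f"item-{item_id}"}


def get_item_naive(cache, db, item_id, ttl=5):
    """Naive cache-aside: on a miss, every caller queries the DB itself."""
    key = f"item:{item_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    item = db.read(item_id)              # no coordination: N concurrent misses -> N reads
    cache.setex(key, ttl, json.dumps(item))
    return item


def run_concurrently(fn, n):
    """Call fn() from n threads at once and collect the results."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda _: fn(), range(n)))
```

Calling `run_concurrently(lambda: get_item_naive(cache, db, 1), 20)` against an empty `FakeRedis()` should leave `db.reads` at 20: nothing stops concurrent misses from each querying the origin.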

Results:

| Metric | Value |
|---|---|
| Cache hit rate | ~97% |
| Total DB queries (60s) | 689 |
| Max DB queries in 1s | 61 |
| p99 latency | ~223 ms |

What clicked:

  • With a 5s TTL over 60s, you'd hope for ~12 refreshes on one hot key. I got hundreds of DB reads — every concurrent miss triggers its own query.
  • Hit rate hides burst damage. The metric that matters is max origin QPS in 1 second.
  • All pods see the same expired key. An in-process mutex only protects one process; coordination has to live in shared storage (Redis).
  • The chart is the signature: periodic needles at TTL expiry.

Naive cache: DB spikes every ~5 seconds at TTL expiry
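Hit rate cannot show clustering, so the peak has to be computed from raw DB-query timestamps. The post doesn't say whether "any 1-second window" meant fixed per-second buckets (what a queries/s chart plots) or a sliding window, so this sketch, with hypothetical helper names, computes both. A sliding window can report a higher peak when a burst straddles a bucket boundary.

```python
from collections import Counter


def per_second_counts(timestamps):
    """Fixed 1-second buckets: the shape a DB-queries/s chart plots."""
    return Counter(int(t) for t in timestamps)


def max_in_any_window(timestamps, window=1.0):
    """Largest number of events spanning less than `window` seconds (sliding window)."""
    ts = sorted(timestamps)
    best, start = 0, 0
    for end, t in enumerate(ts):
        while t - ts[start] >= window:   # drop events that fell out of the window
            start += 1
        best = max(best, end - start + 1)
    return best
```

For 50 queries packed into 0.2 s around t = 6.0, `max_in_any_window` returns 50 while the per-second buckets split the burst roughly in half.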


Step 2 — Single-flight: one refresher per key

Next problem: how do you stop 50 concurrent misses from becoming 50 DB queries?

What I did: Redis distributed lock — SET lock:item:{id} NX EX 10. One winner refreshes; losers wait and retry every 50ms (5s timeout).
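A sketch of that lock-and-poll flow, runnable without a server. A hypothetical thread-safe `FakeRedis` stands in for Redis, and the lock value is a random token so a caller only deletes a lock it still owns. With redis-py the acquire would be `r.set(lock_key, token, nx=True, ex=10)`. The re-check of the cache after winning the lock is an extra guard added in this sketch (it avoids a second DB read when a previous winner finished just before), not something the post describes.

```python
import json
import threading
import time
import uuid


class FakeRedis:
    """Hypothetical thread-safe stand-in for GET, SET [NX] EX and a guarded DEL."""

    def __init__(self):
        self._data = {}                  # key -> (value, expires_at)
        self._mu = threading.Lock()

    def _live(self, key):
        entry = self._data.get(key)
        if entry is not None and time.monotonic() < entry[1]:
            return entry[0]
        self._data.pop(key, None)
        return None

    def get(self, key):
        with self._mu:
            return self._live(key)

    def set(self, key, value, ex, nx=False):
        with self._mu:
            if nx and self._live(key) is not None:
                return None              # NX failed: someone else holds the key
            self._data[key] = (value, time.monotonic() + ex)
            return True

    def delete_if_equal(self, key, token):
        with self._mu:                   # atomic here; on real Redis use a Lua script
            if self._live(key) == token:
                del self._data[key]


class SlowDatabase:
    """Hypothetical slow origin that counts queries."""

    def __init__(self, latency=0.2):
        self.latency, self.reads = latency, 0
        self._mu = threading.Lock()

    def read(self, item_id):
        with self._mu:
            self.reads += 1
        time.sleep(self.latency)
        return {"id": item_id}


def get_item_single_flight(cache, db, item_id, ttl=5, lock_ttl=10, poll=0.05, timeout=5.0):
    """Return (item, status); only the lock winner reads the DB."""
    key, lock_key = f"item:{item_id}", f"lock:item:{item_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached), "hit"
    token = uuid.uuid4().hex
    if cache.set(lock_key, token, ex=lock_ttl, nx=True):    # SET lock:item:{id} NX EX 10
        try:
            cached = cache.get(key)      # a previous winner may have just filled it
            if cached is not None:
                return json.loads(cached), "hit"
            item = db.read(item_id)
            cache.set(key, json.dumps(item), ex=ttl)
            return item, "miss"
        finally:
            cache.delete_if_equal(lock_key, token)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:   # losers poll every 50 ms
        time.sleep(poll)
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached), "wait_hit"
    raise TimeoutError(f"{key} not refreshed within {timeout}s")
```

Twenty concurrent calls on an empty cache should produce one `miss`, one DB read, and nineteen `wait_hit` (or `hit`) results. The `TimeoutError` branch is where a 503 plus client retry would go.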

Results:

| Metric | naive | single-flight |
|---|---|---|
| Max DB/s | 61 | 8 |
| Total DB (60s) | 689 | 111 |
| p99 | 223 ms | 213 ms |
| wait_hit responses | n/a | 623 |

What clicked:

  • Origin protection works. Spikes flattened.
  • Cost shifted to coordination — 623 requests waited for the winner (wait_hit). Users blocked ~200ms on empty cache instead of the DB taking 50 parallel hits.
  • p99 barely moved — one wait is ~one refresh time, not a pile-up of DB queries.

Single-flight: origin spikes flattened


Step 3 — SWR: keep users fast at expiry

Next problem: single-flight protects the DB, but users still wait whenever the key has expired and the cache is empty. Can we serve the old value while refreshing in the background?

What I did: Stale-while-revalidate in Redis — envelope {item, cached_at}. Fresh for 5s, then serve stale for 5s while a background task refreshes (still with single-flight lock). Hard Redis TTL 10s.
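A sketch of the envelope logic, again assuming a hypothetical in-memory `FakeRedis` and `SlowDatabase`. `FRESH_FOR = 5.0` and `HARD_TTL = 10` mirror the numbers above; the background refresh here is a plain thread, whereas a FastAPI app might use a background task instead. `cached_at` uses wall-clock time because the envelope is shared across pods. Function names are illustrative.

```python
import json
import threading
import time
import uuid

FRESH_FOR = 5.0   # seconds an envelope counts as fresh
HARD_TTL = 10     # Redis TTL: 5 s fresh + 5 s stale window


class FakeRedis:
    """Hypothetical thread-safe stand-in for GET, SET [NX] EX and a guarded DEL."""

    def __init__(self):
        self._data, self._mu = {}, threading.Lock()   # key -> (value, expires_at)

    def _live(self, key):
        entry = self._data.get(key)
        if entry is not None and time.monotonic() < entry[1]:
            return entry[0]
        self._data.pop(key, None)
        return None

    def get(self, key):
        with self._mu:
            return self._live(key)

    def set(self, key, value, ex, nx=False):
        with self._mu:
            if nx and self._live(key) is not None:
                return None
            self._data[key] = (value, time.monotonic() + ex)
            return True

    def delete_if_equal(self, key, token):
        with self._mu:
            if self._live(key) == token:
                del self._data[key]


class SlowDatabase:
    def __init__(self, latency=0.2):
        self.latency, self.reads, self._mu = latency, 0, threading.Lock()

    def read(self, item_id):
        with self._mu:
            self.reads += 1
        time.sleep(self.latency)
        return {"id": item_id, "v": "new"}


def _refresh(cache, db, item_id):
    """Single-flight refresh: write a new {item, cached_at} envelope if we win the lock."""
    key, lock_key = f"item:{item_id}", f"lock:item:{item_id}"
    token = uuid.uuid4().hex
    if not cache.set(lock_key, token, ex=10, nx=True):
        return                           # another worker is already refreshing
    try:
        raw = cache.get(key)
        if raw is not None and time.time() - json.loads(raw)["cached_at"] < FRESH_FOR:
            return                       # another worker finished a refresh just before us
        envelope = {"item": db.read(item_id), "cached_at": time.time()}
        cache.set(key, json.dumps(envelope), ex=HARD_TTL)
    finally:
        cache.delete_if_equal(lock_key, token)


def get_item_swr(cache, db, item_id, poll=0.05, timeout=5.0):
    """Return (item, status): fresh hit, stale + background refresh, or blocking fill."""
    key = f"item:{item_id}"
    raw = cache.get(key)
    if raw is not None:
        envelope = json.loads(raw)
        if time.time() - envelope["cached_at"] < FRESH_FOR:
            return envelope["item"], "hit"
        # Stale: answer immediately and refresh off the request path.
        threading.Thread(target=_refresh, args=(cache, db, item_id), daemon=True).start()
        return envelope["item"], "stale"
    # Hard-expired or cold: nothing to serve, so wait single-flight style.
    _refresh(cache, db, item_id)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        raw = cache.get(key)
        if raw is not None:
            return json.loads(raw)["item"], "wait_hit"
        time.sleep(poll)
    raise TimeoutError(f"{key} not refreshed within {timeout}s")
```

A request that finds a 7-second-old envelope gets the old item back immediately with status `stale`; a burst of stale reads in the same moment still causes only one DB read, because the refresh takes the single-flight lock and re-checks freshness.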

Results:

| Metric | single-flight | SWR |
|---|---|---|
| Max DB/s | 8 | 7 |
| Total DB (60s) | 111 | 111 |
| p99 | 213 ms | 28 ms |
| stale responses | n/a | 1160 |
| wait_hit | 623 | 50 |

What clicked:

  • Same origin load as single-flight — SWR changes UX, not refresh count.
  • p99 collapsed because expiry no longer blocks on the refresher. Users got instant stale instead of waiting.
  • The name comes from HTTP caching (Cache-Control: stale-while-revalidate). Same semantics; I implemented it in Redis, not response headers.

SWR: same low DB load, far lower latency spikes


Step 4 — XFetch: refresh before the cliff

Next problem: SWR handles the window after the soft expiry. Can we refresh before the key expires and avoid the cliff entirely?

What I did: Probabilistic early expiration (XFetch) — as cache age approaches TTL, rising probability of a background refresh (with single-flight lock). User always gets the cached value immediately; refresh happens proactively.

Why probability, not "refresh at 4.5s"?

| Approach | Under load |
|---|---|
| Fixed cutoff (age ≥ 4.5s) | Mini-cliff: many requests try to refresh in the same window |
| XFetch (probability rises with age) | Refresh attempts spread across time; the lock ensures one DB read |
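The exact formula isn't spelled out above. The commonly cited XFetch rule refreshes when `now - delta * beta * log(rand()) >= expiry`, where `delta` is how long the last recompute took; this sketch assumes that rule, with `cached_at + ttl` as the expiry and hypothetical function names. Because the draw is uniform, the refresh probability works out to exp(-remaining / (delta * beta)): vanishingly small early in the key's life and certain at expiry, which is the spreading effect in the table.

```python
import math
import random


def should_refresh_early(cached_at, ttl, delta, now, beta=1.0, rng=random):
    """XFetch-style check: True means "start a background refresh now".

    cached_at, now: timestamps in seconds; ttl: logical TTL in seconds;
    delta: duration of the last recompute (e.g. ~0.2 s for the slow read);
    beta: > 1 favours earlier refreshes, < 1 later ones.
    """
    expiry = cached_at + ttl
    u = 1.0 - rng.random()          # uniform in (0, 1], so log(u) is defined and <= 0
    return now - delta * beta * math.log(u) >= expiry


def refresh_probability(remaining, delta, beta=1.0):
    """Closed form of the check above: exp(-remaining / (delta * beta)), capped at 1."""
    return math.exp(-max(remaining, 0.0) / (delta * beta))
```

With `delta = 0.2` s, a 5 s TTL and `beta = 1`, the chance of an early refresh is about 1 in 6 billion at age 0.5 s, about 37% at age 4.8 s, and 100% at 5 s. In the request path the cached value is returned either way; a `True` result only starts a background refresh under the single-flight lock.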

Results:

| Metric | SWR | XFetch |
|---|---|---|
| Max DB/s | 7 | 10 |
| Total DB (60s) | 111 | 174 |
| p99 | 28 ms | 21 ms |
| early_refresh triggers | n/a | 6275 |

What clicked:

  • More total DB work, fewer synchronized cliffs. You trade extra origin reads for smoothness.
  • Complements SWR: SWR is post-expiry UX; XFetch tries to stay ahead of expiry.
  • Tune beta, the XFetch knob that scales how early refreshes start: higher means earlier and more frequent refreshes.

XFetch: proactive refresh, flat DB load

Hot-key recap — same load throughout Steps 1–4 (95% traffic to one item):

| Strategy | max DB/s | total DB | p99 (ms) |
|---|---|---|---|
| naive | 61 | 689 | 223 |
| single-flight | 8 | 111 | 213 |
| SWR | 7 | 111 | 28 |
| XFetch | 10 | 174 | 21 |

For one viral key, single-flight (or SWR/XFetch built on top of it) is what moves the needle. That is the main thread of the exercise.


Step 5 — TTL jitter: a different problem (many keys expiring together)

Next problem: Steps 1–4 all hammer one hot key. After deploy, a different stampede shows up: you warm thousands of keys with the same TTL, traffic is spread across them, and they all expire in the same second. Can you smear that cliff without a lock per key?

What jitter does: on each cache write, add random extra TTL — e.g. 5s base + up to 20% → keys live 5–6s instead of all dying at exactly 5s. Cheap; no coordination logic.
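A sketch of the jitter calculation. It returns milliseconds because SETEX takes whole seconds and would collapse a 5 to 6 s range into just 5 or 6; the value could instead be written with PSETEX, or `r.set(key, value, px=ttl_ms)` in redis-py. `jittered_ttl_ms` is a hypothetical name.

```python
import random


def jittered_ttl_ms(base_ttl_s=5.0, spread=0.2, rng=random):
    """TTL in milliseconds: the base plus a uniform extra of up to `spread` * base.

    spread=0.2 gives 5-6 s for a 5 s base; spread=1.0 gives 5-10 s.
    """
    return round(base_ttl_s * 1000 * (1.0 + rng.uniform(0.0, spread)))
```

Each cache write draws its own TTL, so keys written in the same burst end up with different expiry times without any shared state.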

When it helps — and when it does not:

| Scenario | Jitter useful? | Why |
|---|---|---|
| One hot key (Steps 1–4 load) | No | One key still has one expiry moment per cycle |
| Bulk cache warm after deploy | Yes | Many keys written together; jitter desyncs their expiries |
| Cold empty cache | No | Nothing to jitter until keys exist |
| Need a hard origin cap on a hot key | No | Use single-flight / SWR, not jitter alone |

I ran the same hot-key load with jitter added: max DB/s 57 vs 61 — barely changed. Wrong tool, wrong problem. That confirmed jitter is not a substitute for the coordination in Steps 2–4.

Bulk warm experiment: seed 100 items, populate all keys in one burst (simulated deploy warm), even traffic across items 1–100.

| Strategy | max DB/s | seconds > 50 DB/s |
|---|---|---|
| naive | 92 | 17 |
| jitter (20% spread) | 83 | 11 |

Modest at 20% — only 0–1 extra seconds on a 5s TTL. I re-ran with 100% spread (TTL 5–10s) to make the chart readable: peak 95 → 45 DB/s, zero seconds above 50 DB/s for jitter. Needle became a hill.

Naive bulk warm: sharp periodic spikes

Jitter bulk warm: spikes smeared into rolling hills
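A toy model of the bulk-warm run, assuming every key is refilled the instant it expires (it ignores the ~200 ms read and any traffic gaps). It counts expiries per second for 100 keys warmed at t = 0 over 60 s, with a fixed 5 s TTL versus a 5 to 10 s TTL (the 100% spread case). `expiries_per_second` is a hypothetical helper.

```python
import random
from collections import Counter


def expiries_per_second(n_keys, duration, ttl_fn):
    """Keys warmed together at t=0 and re-cached the moment they expire.

    Returns {second: keys expiring in that second}: the shape of the
    DB-queries/s chart if every expiry costs one origin read.
    """
    buckets = Counter()
    for _ in range(n_keys):
        t = ttl_fn()
        while t < duration:
            buckets[int(t)] += 1
            t += ttl_fn()            # refilled immediately with a fresh TTL
    return buckets


rng = random.Random(0)
naive = expiries_per_second(100, 60, lambda: 5.0)
jitter = expiries_per_second(100, 60, lambda: rng.uniform(5.0, 10.0))  # 100% spread
```

With a fixed TTL every key expires in the same second (100 expiries at t = 5, 10, ..., 55); with 100% spread the per-second count stays well under half of that, the needle-to-hill shape from the charts.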

What clicked:

  • Jitter randomizes when keys disappear across many keys. XFetch randomizes when refresh starts on one key still alive — different layer, easy to confuse.
  • Jitter is a deploy/warm helper, stacked on top of coordination for hot paths — not a replacement for it.

The stack I'd actually use

For most Redis-in-front-of-DB setups:

1. Shared cache (Redis)
2. Single-flight lock (multi-pod essential)
3. SWR or XFetch on hot keys (business choice)
4. TTL jitter on bulk warms (cheap extra)
5. Lock renewal, 503 + client retry — failure polish

That covers the hot-key path and the bulk-warm case I exercised. It does not replace circuit breakers, bulkheads, CDN caching, or negative caching — adjacent layers for failures and edge traffic.


If you remember only five things

  1. Cache stampede = synchronized misses — often at TTL expiry on a hot key. Fix is coordination, not a bigger cache.
  2. Measure max origin QPS in 1 second — hit rate and average latency lie.
  3. Multi-pod needs a distributed lock — in-process mutex does not help.
  4. Single-flight protects the origin; SWR protects latency — same DB load, very different p99.
  5. Jitter desynchronizes expiry across keys; XFetch desynchronizes refresh timing on one key. Different problems, each using randomness in its own way.

Closing

The chart did the teaching. Averages said "fine"; the DB-queries-per-second panel said "stampede every five seconds." Single-flight turned needles into a flat line. SWR changed almost nothing on that panel but made latency tell a different story. Jitter only clicked once I stopped hammering one key and warmed a hundred.
