Kailash Sankar

Posted on Jun 27

Cache Stampede: A Primer

#redis #systemdesign #performance #webdev

I put Redis in front of a slow SQLite database, ran three app pods behind round-robin load, and hammered one hot key until TTL expiry produced visible origin spikes — then swapped in fixes one at a time and measured the difference. Not a production guide; just the problem, the charts, and honest numbers from a setup small enough to reason about.

The problem, without the jargon pile

You cache expensive reads so most requests never touch the database. That works until a popular key expires. Every client misses at once, they all run the same query, and the database gets a spike it cannot absorb. That is a cache stampede. The generic name is thundering herd — many workers doing the same expensive thing at once.

The averages lie. You can have a 97% cache hit rate and still hammer the origin hundreds of times in 60 seconds, because misses cluster at expiry instead of spreading out.

App → Redis (cache) → on miss → SQLite (~200ms per read)

The fix is never "more cache." It is coordination: who refreshes, when, and whether everyone else waits, reads stale data, or refreshes early.

What I built

Piece	Role
Redis (Docker)	Shared cache — visible to all app instances
3 FastAPI replicas	Round-robin load — multi-pod from the start; scaling servers does not fix cache coordination
FastAPI app	Cache-aside: `GET` Redis → on miss, read SQLite → `SETEX` with 5s TTL
Load generator	50 concurrent clients, 60 seconds, 95% traffic to one item
Metrics	Total DB queries, max DB queries in any 1-second window, latency percentiles
Charts	Time series after each run — DB queries/s and latency/s

Each strategy is a swap-in module (naive, singleflight, swr, xfetch, jitter). Same load script, same TTL, same three pods — compare the charts.

How to read the chart: the top panel is the whole lesson. The naive cache shows flat calm broken by tall spikes every ~5 seconds. The fixed strategies show a low flat line with occasional single bumps.

Step 1 — Naive cache-aside: the stampede baseline

What I did: Naive cache-aside only — GET → miss → slow DB read → SETEX with 5s TTL. Flush Redis, run load test. Every later strategy compares against this.
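As a rough illustration of the GET → miss → DB read → SETEX path, here is a minimal sketch. It is not the author's module: `FakeRedis` is an in-memory stand-in so the example runs without a server, and the `item:{id}` key format, `get_item_naive`, and `load_from_db` are hypothetical names. The stand-in uses the same argument order as redis-py's `get` and `setex`.

```python
import json
import time

TTL_SECONDS = 5  # TTL used in the article


class FakeRedis:
    """In-memory stand-in for Redis (assumption: lets the sketch run alone)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, name):
        entry = self._data.get(name)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(name, None)  # expired keys behave like missing keys
            return None
        return entry[0]

    def setex(self, name, ttl_seconds, value):
        self._data[name] = (value, self._clock() + ttl_seconds)


def get_item_naive(cache, item_id, load_from_db):
    """Cache-aside with no coordination: every concurrent miss hits the DB."""
    key = f"item:{item_id}"  # hypothetical key format
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached), "hit"
    item = load_from_db(item_id)  # slow path (~200 ms per read in the article)
    cache.setex(key, TTL_SECONDS, json.dumps(item))
    return item, "miss"
```

Nothing here stops two callers that both see the miss from both calling `load_from_db`, which is exactly the gap the later strategies close.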

Results:

Metric	Value
Cache hit rate	~97%
Total DB queries (60s)	689
Max DB queries in 1s	61
p99 latency	~223 ms

What clicked:

With a 5s TTL over 60s, you'd hope for ~12 refreshes on one hot key. I got hundreds of DB reads — every concurrent miss triggers its own query.
Hit rate hides burst damage. The metric that matters is the peak number of origin queries in any 1-second window.
All pods see the same expired key. An in-process mutex only protects one process; coordination has to live in shared storage (Redis).
The chart is the signature: periodic needles at TTL expiry.
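To make the gap between hit rate and peak origin load concrete, here is a small sketch of how the two numbers could be computed from DB query timestamps. The bucketing by whole second is an assumption about how "max DB queries in any 1-second window" is measured, the function names are hypothetical, and the timestamps are synthetic, not the article's data.

```python
from collections import Counter


def max_queries_per_second(query_timestamps):
    """Largest number of DB queries that landed in the same whole second."""
    per_second = Counter(int(t) for t in query_timestamps)
    return max(per_second.values(), default=0)


def hit_rate(total_requests, db_queries):
    """Fraction of requests that never reached the DB."""
    return 1.0 - db_queries / total_requests


# Synthetic illustration, not measured data: a 5 s TTL over 60 s gives
# 12 expiries, and each expiry lets 50 concurrent misses through within ~200 ms.
timestamps = [5 * cycle + i * 0.004 for cycle in range(12) for i in range(50)]
```

With these made-up numbers, 600 DB queries out of a hypothetical 20,000 requests is still a 97% hit rate, while `max_queries_per_second(timestamps)` reports 50 queries landing in a single second.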

Step 2 — Single-flight: one refresher per key

Next problem: how do you stop 50 concurrent misses from becoming 50 DB queries?

What I did: Redis distributed lock — SET lock:item:{id} NX EX 10. One winner refreshes; losers wait and retry every 50ms (5s timeout).
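A minimal sketch of that lock-and-wait loop, under stated assumptions: `FakeRedis` is an in-memory stand-in so the example runs without a server, the `item:{id}` cache key and the function names are hypothetical, and the status strings follow the `wait_hit` label from the results. The re-check after winning the lock is my addition; it covers the case where a waiter grabs the lock just after the previous winner filled the cache.

```python
import json
import threading
import time

CACHE_TTL_S = 5        # cache TTL from the article
LOCK_TTL_S = 10        # EX 10 on the lock key
RETRY_EVERY_S = 0.05   # losers retry every 50 ms
WAIT_TIMEOUT_S = 5.0   # losers give up after 5 s


class FakeRedis:
    """In-memory stand-in (assumption) with get/set/setex/delete like redis-py."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)
        self._mutex = threading.Lock()  # makes SET NX atomic, as it is in Redis

    def _live(self, name):
        entry = self._data.get(name)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]
        self._data.pop(name, None)
        return None

    def get(self, name):
        with self._mutex:
            return self._live(name)

    def set(self, name, value, ex, nx=False):
        with self._mutex:
            if nx and self._live(name) is not None:
                return None  # redis-py also returns None when NX fails
            self._data[name] = (value, time.monotonic() + ex)
            return True

    def setex(self, name, ttl_seconds, value):
        self.set(name, value, ex=ttl_seconds)

    def delete(self, name):
        with self._mutex:
            self._data.pop(name, None)


def get_item_single_flight(cache, item_id, load_from_db, sleep=time.sleep):
    """One caller refreshes the key; everyone else polls the cache."""
    key, lock_key = f"item:{item_id}", f"lock:item:{item_id}"
    deadline = time.monotonic() + WAIT_TIMEOUT_S
    waited = False
    while True:
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached), ("wait_hit" if waited else "hit")
        if cache.set(lock_key, "1", ex=LOCK_TTL_S, nx=True):
            try:
                cached = cache.get(key)  # re-check: a previous winner may have filled it
                if cached is not None:
                    return json.loads(cached), ("wait_hit" if waited else "hit")
                item = load_from_db(item_id)
                cache.setex(key, CACHE_TTL_S, json.dumps(item))
                return item, "miss"
            finally:
                cache.delete(lock_key)  # simplification: no owner-token check
        if time.monotonic() >= deadline:
            raise TimeoutError(f"gave up waiting for {key}")
        waited = True
        sleep(RETRY_EVERY_S)
```

The plain `delete` in `finally` is a simplification: if a refresh outlives the 10 s lock TTL, it could release a lock that another pod now holds. Storing a unique token in the lock and deleting only on a match is the usual guard.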

Results:

Metric	naive	single-flight
Max DB/s	61	8
Total DB (60s)	689	111
p99	223 ms	213 ms
`wait_hit` responses	—	623

What clicked:

Origin protection works. Spikes flattened.
Cost shifted to coordination — 623 requests waited for the winner (wait_hit). Users blocked ~200ms on empty cache instead of the DB taking 50 parallel hits.
p99 barely moved — one wait is ~one refresh time, not a pile-up of DB queries.

Step 3 — SWR: keep users fast at expiry

Next problem: single-flight protects the DB, but users still wait whenever the key has expired and the cache is empty. Can we serve old data while refreshing in the background?

What I did: Stale-while-revalidate in Redis — envelope {item, cached_at}. Fresh for 5s, then serve stale for 5s while a background task refreshes (still with single-flight lock). Hard Redis TTL 10s.
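The fresh/stale decision on that envelope can be sketched as a pure function. This is an illustration under assumptions, not the author's code: `make_envelope`, `read_swr`, and the `schedule_refresh` callback are hypothetical names, and the callback stands in for whatever starts the background refresh behind the single-flight lock.

```python
import json

FRESH_S = 5                      # fresh window from the article
STALE_S = 5                      # extra window in which stale data is served
HARD_TTL_S = FRESH_S + STALE_S   # hard Redis TTL of 10 s


def make_envelope(item, now):
    """Serialize the value with its write time (wall clock, shared across pods)."""
    return json.dumps({"item": item, "cached_at": now})


def read_swr(raw, now, schedule_refresh):
    """Return (item, status) where status is 'fresh', 'stale' or 'miss'."""
    if raw is None:
        return None, "miss"          # key gone: fall back to the blocking path
    envelope = json.loads(raw)
    age = now - envelope["cached_at"]
    if age < FRESH_S:
        return envelope["item"], "fresh"
    if age < HARD_TTL_S:
        schedule_refresh()           # refresh in the background, do not wait
        return envelope["item"], "stale"
    return None, "miss"              # defensive: Redis should already have expired it
```

In a real handler `now` would come from `time.time()` rather than a monotonic clock, because `cached_at` has to mean the same thing on every pod.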

Results:

Metric	single-flight	SWR
Max DB/s	8	7
Total DB (60s)	111	111
p99	213 ms	28 ms
`stale` responses	—	1160
`wait_hit`	623	50

What clicked:

Same origin load as single-flight — SWR changes UX, not refresh count.
p99 collapsed because expiry no longer blocks on the refresher. Users got instant stale instead of waiting.
The name comes from HTTP caching (Cache-Control: stale-while-revalidate). Same semantics; I implemented it in Redis, not response headers.

Step 4 — XFetch: refresh before the cliff

Next problem: SWR handles what happens after expiry. Can we refresh before the key goes empty and avoid the cliff entirely?

What I did: Probabilistic early expiration (XFetch) — as cache age approaches TTL, rising probability of a background refresh (with single-flight lock). User always gets the cached value immediately; refresh happens proactively.

Why probability, not "refresh at 4.5s"?

Approach	Under load
Fixed cutoff (age ≥ 4.5s)	Mini-cliff — many requests try to refresh in the same window
XFetch (probability rises with age)	Refresh attempts spread across time; lock ensures one DB read
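The rising probability can be sketched with the rule from the published XFetch algorithm: refresh when `now - delta * beta * ln(u) >= expires_at`, with `u` drawn uniformly from (0, 1]. Whether the author's module uses exactly this form is an assumption. `delta` is the time one recompute takes (about 0.2 s for the ~200 ms DB read), and the function names are hypothetical.

```python
import math
import random


def should_refresh_early(now, expires_at, delta, beta=1.0, rand=random.random):
    """XFetch check: True means start a background refresh now."""
    u = 1.0 - rand()  # random.random() is in [0, 1), so u is in (0, 1] and log is safe
    # -log(u) is >= 0, so the check can only fire earlier than expiry, never later.
    return now - delta * beta * math.log(u) >= expires_at


def early_refresh_probability(now, expires_at, delta, beta=1.0):
    """Chance that a single request triggers a refresh at time `now`.

    -log(u) follows an Exponential(1) distribution, so the chance that it
    exceeds gap / (delta * beta) is exp(-gap / (delta * beta)).
    """
    gap = expires_at - now
    if gap <= 0:
        return 1.0
    return math.exp(-gap / (delta * beta))
```

For a key that expires at t = 5 s with `delta = 0.2`, the per-request probability is under 1% at age 4.0 s, about 37% at age 4.8 s, and 100% at expiry. Under heavy traffic many requests roll the dice each second, so the first refresh tends to land well before the cliff.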

Results:

Metric	SWR	XFetch
Max DB/s	7	10
Total DB (60s)	111	174
p99	28 ms	21 ms
`early_refresh` triggers	—	6275

What clicked:

More total DB work, fewer synchronized cliffs. You pay for the smoothness with extra origin reads.
Complements SWR: SWR is post-expiry UX; XFetch tries to stay ahead of expiry.
Tune beta: a higher value triggers refreshes earlier and more often.

Hot-key recap — same load throughout Steps 1–4 (95% traffic to one item):

Strategy	max DB/s	total DB	p99 ms
naive	61	689	223
single-flight	8	111	213
SWR	7	111	28
XFetch	10	174	21

For one viral key, single-flight (or SWR/XFetch built on top of it) is what moves the needle. That is the main thread of the exercise.

Step 5 — TTL jitter: a different problem (many keys expiring together)

Next problem: Steps 1–4 all hammer one hot key. After deploy, a different stampede shows up: you warm thousands of keys with the same TTL, traffic is spread across them, and they all expire in the same second. Can you smear that cliff without a lock per key?

What jitter does: on each cache write, add random extra TTL — e.g. 5s base + up to 20% → keys live 5–6s instead of all dying at exactly 5s. Cheap; no coordination logic.
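A minimal sketch of that TTL calculation, assuming the jitter is uniform and the TTL is written in milliseconds so fractions of a second survive (for example through Redis `PSETEX`). The function name and parameters are hypothetical.

```python
import random

BASE_TTL_MS = 5000  # 5 s base TTL from the article


def ttl_ms_with_jitter(base_ms=BASE_TTL_MS, spread=0.20, rand=random.random):
    """Base TTL plus a uniform random extra of up to `spread` * base."""
    extra_ms = round(base_ms * spread * rand())
    return base_ms + extra_ms
```

With `spread=0.20` every key lands between 5000 and 6000 ms, and `spread=1.0` gives 5000 to 10000 ms. The value would be computed fresh on each cache write so that keys written in the same burst do not share an expiry.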

When it helps — and when it does not:

Scenario	Jitter useful?	Why
One hot key (Steps 1–4 load)	No	One key still has one expiry moment per cycle
Bulk cache warm after deploy	Yes	Many keys written together → desync their expiries
Cold empty cache	No	Nothing to jitter until keys exist
Need hard origin cap on hot key	No	Use single-flight / SWR, not jitter alone

I ran the same hot-key load with jitter added: max DB/s 57 vs 61 — barely changed. Wrong tool, wrong problem. That confirmed jitter is not a substitute for the coordination in Steps 2–4.

Bulk warm experiment: seed 100 items, populate all keys in one burst (simulated deploy warm), even traffic across items 1–100.

Strategy	max DB/s	seconds >50 DB/s
naive	92	17
jitter (20% spread)	83	11

Modest at 20%: the jitter adds only 0–1 s to a 5s TTL. I re-ran with 100% spread (TTL 5–10s) to make the chart readable: the peak fell from 95 to 45 DB/s, and with jitter no second exceeded 50 DB/s. The needle became a hill.

What clicked:

Jitter randomizes when keys disappear, across many keys. XFetch randomizes when the refresh starts, on a single key that is still alive. They work at different layers and are easy to confuse.
Jitter is a deploy/warm helper, stacked on top of coordination for hot paths — not a replacement for it.

The stack I'd actually use

For most Redis-in-front-of-DB setups:

1. Shared cache (Redis)
2. Single-flight lock (multi-pod essential)
3. SWR or XFetch on hot keys (business choice)
4. TTL jitter on bulk warms (cheap extra)
5. Lock renewal, 503 + client retry — failure polish

That covers the hot-key path and the bulk-warm case I exercised. It does not replace circuit breakers, bulkheads, CDN caching, or negative caching — adjacent layers for failures and edge traffic.

If you remember only five things

Cache stampede = synchronized misses — often at TTL expiry on a hot key. Fix is coordination, not a bigger cache.
Measure max origin QPS in 1 second — hit rate and average latency lie.
Multi-pod needs a distributed lock: an in-process mutex only protects one process.
Single-flight protects the origin; SWR protects latency — same DB load, very different p99.
Jitter desynchronizes keys; XFetch desynchronizes refresh time. Different problems, each solved with a different use of randomness.

Closing

The chart did the teaching. Averages said "fine"; the DB-queries-per-second panel said "stampede every five seconds." Single-flight turned needles into a flat line. SWR changed almost nothing on that panel but made latency tell a different story. Jitter only clicked once I stopped hammering one key and warmed a hundred.
