Imagine a popular store opening its doors at 9 AM sharp. Hundreds of customers lined up outside rush in simultaneously, overwhelming the cashiers and causing chaos. This is exactly what happens in distributed systems—the Thundering Herd Problem—when too many requests hit a shared resource at once.
NORMAL OPERATION (Cache Hit, fast path)

┌─────────────┐
│   Clients   │
│  10k users  │
└──────┬──────┘
       │
       ▼
┌─────────────┐   Cache Hit    ┌──────────────┐
│ App Servers │◄───────────────│ Redis Cache  │
│ Node 1, 2   │                │ key=product1 │
└──────┬──────┘                │   TTL=60s    │
       │ Cache Miss            └──────────────┘
       ▼
┌──────────────┐
│   Database   │
│ 1 Query Only │
│ Returns Data │
└──────┬───────┘
       │
       ▼
┌───────────┐
│ Cache Set │ ← serves all 10k clients
└───────────┘

THUNDERING HERD (Cache Miss Stampede, failure path)

┌─────────────┐
│   Clients   │
│  10k users  │
└──────┬──────┘
       │
       ▼
┌──────────────┐
│  10k Cache   │
│    MISSES    │
└──────┬───────┘
       │
       ▼
┌──────────────┐
│   Database   │
│ 10k Queries! │
│  CPU=1000%   │
└──────────────┘
  💥 OVERLOAD
What is the Thundering Herd Problem?
The Thundering Herd Problem occurs when numerous clients or processes simultaneously compete for the same shared resource, like a database or cache, creating a sudden traffic spike that overwhelms the system.
Unlike gradual load increases, this is a synchronized burst—think cache keys expiring at the exact same timestamp across millions of requests.
Where It Commonly Occurs
This issue plagues several system components:
- Caching systems: Popular cache entries expire together, triggering mass backend fetches.
- Databases: Multiple app servers hammer the DB after a cache miss.
- Load balancers: Requests flood a single healthy node during failures elsewhere.
- Lock acquisition: Processes race for mutexes on critical sections.
In a typical app architecture, clients query an app server, which checks Redis cache first. Cache hit? Serve instantly. Miss? Fetch from DB and repopulate.
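That cache-aside read path can be sketched in a few lines of Python; here a plain dict and a stub function stand in for Redis and the database:

```python
import time

cache = {}   # stand-in for Redis: key -> (value, expiry_timestamp)
TTL = 60     # seconds, matching the 60s TTL above

def query_db(key):
    # stand-in for the real database query
    return f"data-for-{key}"

def get(key):
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                       # cache hit: serve instantly
    value = query_db(key)                     # cache miss: fetch from DB
    cache[key] = (value, time.time() + TTL)   # repopulate for later readers
    return value
```

Note the gap: every miss goes straight to the database, and nothing stops 10,000 concurrent misses from becoming 10,000 concurrent queries.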
Real-World Example: Cache Expiry Spike
Consider Netflix releasing a hot new show: millions of clients request episode data cached with a 60-second TTL. The instant the key expires, every in-flight request (say 10,000 req/sec) misses simultaneously:
Normal: Cache serves 10k req/sec at 1ms latency
Expiry moment: 10k DB queries at 100ms each → 5-10x overload
Database connections are exhausted, latency jumps to seconds, and cascading failures hit the entire app.
Similar spikes occur during IPL match ticket sales in India or Black Friday e-commerce rushes.
Normal Spike vs Thundering Herd
| Aspect | Normal Traffic Spike | Thundering Herd |
|---|---|---|
| Cause | Organic growth (marketing, events) | Synchronized event (TTL expiry, cron jobs) |
| Pattern | Gradual ramp-up | Instant burst |
| Impact | Autoscaling handles | Overwhelms even scaled capacity |
| Duration | Minutes-hours | Seconds (but devastating) |
Key difference: the herd is triggered by a predictable but synchronized event, amplifying a tiny window of vulnerability into a full outage.
Why Dangerous in Distributed Systems
Clients → App → DB overload → Timeouts → Retries → More DB load → 💥
- Amplification: 1 cache miss → N DB queries (N=concurrent clients).
- Tail latency: Slowest DB query blocks everyone.
- Cascading failure: Overloaded DB slows apps → more timeouts → retry storms.
- Autoscaling lag: Spikes are too brief for new instances to spin up.
In multi-region setups, one region's stampede ripples globally.
System Impacts Breakdown
CPU Overload
- A sudden explosion of threads thrashes the scheduler; context switches skyrocket.
Database Strain
- Connection pools exhaust; query queues balloon → timeouts cascade.
Cache Ineffectiveness
- Becomes useless during stampede—worse than no cache!
Latency Explosion
- P99 jumps 100x; users abandon sessions.
Prevention Techniques
1. Stale-While-Revalidate
- Only one request refreshes the cache; others serve stale data and reuse the result.
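A minimal in-process sketch of the idea (a plain dict stands in for the cache, and the class name `SWRCache` is illustrative, not a library API):

```python
import threading
import time

class SWRCache:
    """Serve stale data while at most one caller refreshes the entry."""
    def __init__(self, loader, ttl):
        self.loader, self.ttl = loader, ttl
        self.data = {}           # key -> (value, soft_expiry_timestamp)
        self.refreshing = set()  # keys with a refresh already in flight
        self.lock = threading.Lock()

    def get(self, key):
        value, expiry = self.data.get(key, (None, 0))
        if value is not None and time.time() < expiry:
            return value                     # fresh hit
        with self.lock:
            if value is not None and key in self.refreshing:
                return value                 # stale, but a refresh is running
            self.refreshing.add(key)         # this caller wins the refresh
        try:
            value = self.loader(key)         # only this caller hits the DB
            self.data[key] = (value, time.time() + self.ttl)
        finally:
            with self.lock:
                self.refreshing.discard(key)
        return value
```

HTTP caches standardize the same idea as the `stale-while-revalidate` Cache-Control extension.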
2. MUTEX
- Use a distributed lock (e.g., Redis SETNX) so only one request hits DB.
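In production this is usually Redis `SET key token NX EX ttl`; the sketch below mimics SETNX semantics with an in-process dict so the control flow is visible (`get_with_lock` and friends are illustrative names, not a real client API):

```python
import threading
import time
import uuid

_store = {}                    # stand-in for Redis
_guard = threading.Lock()

def setnx(key, token):
    # mimic Redis SETNX: succeed only if the key is absent
    with _guard:
        return _store.setdefault(key, token) == token

def delete(key):
    with _guard:
        _store.pop(key, None)

def get_with_lock(key, cache, query_db, retry_delay=0.01):
    token = str(uuid.uuid4())            # unique per caller
    while True:
        if key in cache:
            return cache[key]            # someone already repopulated it
        if setnx(f"lock:{key}", token):  # only one caller wins the lock
            try:
                cache[key] = query_db(key)   # a single DB query for the herd
                return cache[key]
            finally:
                delete(f"lock:{key}")
        time.sleep(retry_delay)          # losers wait, then re-check the cache
```

With real Redis, always set an expiry on the lock key (the `EX` option) so a crashed winner cannot block everyone forever.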
3. Jitter on TTL
- TTL = base + random(0, maxJitter) to avoid synchronized expiry.
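As a sketch (the numbers are illustrative):

```python
import random

BASE_TTL = 60     # seconds
MAX_JITTER = 15   # spread expirations across a 15-second window

def jittered_ttl():
    # keys now expire somewhere in [60, 75) instead of all at t=60
    return BASE_TTL + random.uniform(0, MAX_JITTER)
```

Set on each write, this turns one synchronized expiry cliff into a smear of small, independent refreshes.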
4. Probabilistic Early Recomputation
- Refresh hot keys early based on access frequency / near-expiry.
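The best-known version of this is the "XFetch" check from Vattani et al.'s work on probabilistic cache stampede prevention: each reader refreshes early with a probability that rises as expiry approaches, so expected load stays at roughly one refresh per key. A sketch:

```python
import math
import random
import time

def should_refresh_early(expiry, delta, beta=1.0, now=None):
    """Decide whether this reader should recompute before `expiry`.
    delta: how long a recomputation takes (seconds).
    beta:  > 1 refreshes earlier on average, < 1 later."""
    now = time.time() if now is None else now
    u = 1.0 - random.random()              # uniform in (0, 1], avoids log(0)
    return now - delta * beta * math.log(u) >= expiry
```

Callers that get `True` recompute and reset the expiry; everyone else keeps serving the cached value, so no synchronized miss ever happens.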
5. Rate Limiting
- Limit requests per key/user to prevent backend overload.
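A per-key token bucket is one common shape for this; here is an in-process sketch (production limiters usually live in Redis or the API gateway):

```python
import time

class TokenBucket:
    """Allow at most `rate` backend requests per second per key."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.buckets = {}  # key -> (tokens, last_seen_timestamp)

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        tokens, last = self.buckets.get(key, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[key] = (tokens - 1, now)
            return True    # this request may hit the backend
        self.buckets[key] = (tokens, now)
        return False       # shed it, or serve stale data instead
```

Rejected requests should get a stale value or a fast error, never a retry loop, or the limiter just delays the stampede.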
6. Cache Warming
- Preload hot keys before traffic spikes or deployments.
Real outage example: Facebook's 2010 cache stampede took hours to resolve.
Final Thoughts
The Thundering Herd turns "working at scale" into outages without proper safeguards. Master these patterns—staggered TTLs + coalescing + backoff.
Next time your cache expires, remember: one cow is fine, the herd is deadly.
