Rhytham Negi

Understanding the Thundering Herd Problem

Imagine a quick commerce app like Zepto, Blinkit, or Instacart announcing a “10-minute Mega Sale – 70% OFF on iPhones” starting exactly at 7:00 PM.

At 7:00:00 PM sharp, hundreds of thousands of users tap Buy Now in the same second.

Servers spike. Databases choke. Orders fail. Payments time out.

This is the Thundering Herd Problem.


What Is the Thundering Herd Problem?

The Thundering Herd Problem happens when a large number of users or processes try to access the same resource at the exact same time.

It’s not just high traffic.
It’s synchronized traffic.

Think of it like:

  • A normal sale → People walk into a store gradually.
  • A flash drop at a fixed second → Everyone rushes the door at once.

That sudden, coordinated rush is the problem.


Where It Happens in Quick Commerce

1. Flash Sales & Limited Stock Drops

Example: 1,000 PlayStations go live at 7:00 PM.

At that exact moment:

  • 200,000 users refresh the product page.
  • All of them check stock simultaneously.
  • All of them try to lock inventory.
  • All of them hit payment APIs.

Result:

  • Inventory service crashes.
  • DB connection pool gets exhausted.
  • Payment retries multiply load.
  • Orders fail randomly.

2. Cache Expiry During Peak Hours

Let’s say the “iPhone Deal” product page is cached for 60 seconds.

During those 60 seconds:

  • Cache serves 20,000 requests per second.
  • Everything is smooth.

At 60 seconds:

  • Cache expires.
  • 20,000 requests instantly miss the cache.
  • All hit the database at once.

Instead of 1 DB query, you now have 20,000 identical DB queries.

This is called a cache stampede (a specific form of the thundering herd problem).


3. Order Status Polling

After placing an order, users keep refreshing:

  • “Is it packed?”
  • “Is it out for delivery?”
  • “Where is my rider?”

If 50,000 users poll the same tracking service every 2 seconds, the backend gets hammered continuously.


Why It’s Dangerous

A thundering herd causes a chain reaction:

1️⃣ Amplification

1 cache miss → 10,000 database calls.

2️⃣ Cascading Failures

DB slows → API times out → Clients retry → More load → System collapses.

3️⃣ Autoscaling Is Too Slow

Autoscaling takes minutes.
A herd spike happens in seconds.

By the time new servers start, the system is already down.


Normal Traffic Spike vs Thundering Herd

| Normal Spike | Thundering Herd |
| --- | --- |
| Gradual increase | Instant burst |
| Marketing campaign | Flash drop / TTL expiry |
| Auto-scaling handles it | System collapses before scaling |
| Predictable pattern | Synchronized chaos |

How Quick Commerce Apps Prevent It

Now let’s look at practical solutions used by companies like Amazon and major grocery delivery platforms.


1. Request Coalescing (One Does the Work, Others Wait)

Instead of allowing 20,000 users to fetch the same product data:

  • First request goes to DB.
  • Other 19,999 wait.
  • When the result returns → all get the same response.

Result:

  • 1 DB query instead of 20,000.
  • Massive load reduction.

Simple but extremely powerful.
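This pattern is often called "single flight" (Go's golang.org/x/sync/singleflight package popularized the name). A minimal in-process sketch in Python, assuming a threaded server; `SingleFlight` and the query function are illustrative names, not a specific library's API:

```python
import threading

class SingleFlight:
    """Coalesce concurrent calls for the same key: the first caller does
    the work, everyone else waits and shares the same result."""

    def __init__(self):
        self._mu = threading.Lock()
        self._calls = {}  # key -> {"done": Event, "result": ...}

    def do(self, key, fn):
        with self._mu:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "result": None}
                self._calls[key] = call
        if not leader:
            call["done"].wait()       # follower: wait for the leader's result
            return call["result"]
        try:
            call["result"] = fn()     # leader: do the expensive work once
        finally:
            with self._mu:
                del self._calls[key]  # next miss after this starts a new flight
            call["done"].set()        # wake every waiting follower
        return call["result"]
```

In a multi-server deployment the same idea needs a shared coordination point (a distributed lock or a request-coalescing proxy), but the shape of the logic is identical.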


2. Cache Locking (Distributed Mutex)

When cache expires:

  • First server acquires a lock.
  • Only that server rebuilds cache.
  • Others either:

    • Wait, or
    • Serve stale data temporarily.

This prevents duplicate recomputation.
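In production the mutex is usually distributed (for example Redis `SET key value NX EX ttl`). A single-process sketch of the same idea, with `rebuild_fn` standing in for the expensive DB query:

```python
import threading
import time

class LockedCache:
    """On a miss, only the lock holder rebuilds the entry; other callers
    serve the stale copy, or wait if no stale copy exists yet."""

    def __init__(self):
        self._data = {}                    # key -> (value, expires_at)
        self._rebuild = threading.Lock()   # stand-in for a distributed mutex

    def get(self, key, ttl, rebuild_fn):
        now = time.monotonic()
        entry = self._data.get(key)
        if entry and entry[1] > now:
            return entry[0]                           # fresh hit
        if self._rebuild.acquire(blocking=False):
            try:                                      # we won the lock: rebuild
                value = rebuild_fn()
                self._data[key] = (value, time.monotonic() + ttl)
                return value
            finally:
                self._rebuild.release()
        if entry:
            return entry[0]                           # expired, but serve stale
        with self._rebuild:                           # cold start: wait for rebuilder
            return self._data[key][0]
```

The "serve stale" branch is what keeps latency flat during the rebuild: one request pays the DB cost, everyone else gets a slightly old answer instead of queueing.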


3. Add Jitter to Cache Expiry

Bad:

TTL = 60 seconds

All keys expire together → crash.

Better:

TTL = 60 + random(0–30 seconds)

Now:

  • Some expire at 61s
  • Some at 75s
  • Some at 88s

Load spreads out naturally.
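A one-liner is enough; the 60 and 30 below mirror the numbers above:

```python
import random

def jittered_ttl(base=60, spread=30):
    """Return a TTL in [base, base + spread) seconds, so keys cached at
    the same moment expire at different moments."""
    return base + random.uniform(0, spread)
```

Wherever your cache client takes an expiry (for example redis-py's `ex=` parameter), pass `int(jittered_ttl())` instead of a fixed constant.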


4. Probabilistic Early Refresh

Before cache expires:

  • Some servers refresh it early (randomly).
  • By the time TTL hits zero, cache is already warm.

No sudden spike.
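One well-known formulation of this is "XFetch" (from Vattani et al.'s paper on probabilistic cache stampede prevention): each request refreshes early with a probability that rises as expiry approaches, scaled by how long a rebuild takes. A sketch, with `now` passed in explicitly for testability:

```python
import math
import random

def should_refresh_early(now, expires_at, rebuild_cost, beta=1.0):
    """XFetch-style check: refresh if now - rebuild_cost * beta * ln(rand)
    has passed the expiry. ln(rand) is negative, so this fires *before*
    expiry, and more often the closer we get to expires_at."""
    return now - rebuild_cost * beta * math.log(random.random()) >= expires_at
```

Because each request decides independently at random, exactly one request tends to refresh the key shortly before expiry, and the herd never sees a cold cache.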


5. Exponential Backoff with Jitter (For Retries)

If payment API fails:

Bad retry:

Retry after 1s
Retry after 2s
Retry after 4s

All users retry at same intervals → new spike.

Better retry:

Retry after random(1–2s)
Retry after random(2–4s)
Retry after random(4–8s)

This spreads retries evenly.
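The "better" schedule above is uniform over a doubling window. A sketch that produces exactly those ranges (attempt 0 → 1–2 s, attempt 1 → 2–4 s, and so on), capped so delays don't grow forever:

```python
import random

def retry_delay(attempt, base=1.0, cap=60.0):
    """Exponential backoff with jitter: a random delay drawn from
    [base * 2^attempt, base * 2^(attempt+1)], capped at `cap` seconds."""
    lo = min(cap, base * 2 ** attempt)
    return random.uniform(lo, min(cap, lo * 2))
```

AWS's well-known backoff analysis goes one step further with "full jitter" (uniform over the whole `[0, base * 2^attempt]` window), which decorrelates clients even harder at the cost of some clients retrying very quickly.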


6. Virtual Waiting Rooms (Traffic Shaping)

Used in extreme cases (concert tickets, iPhone drops).

Instead of letting 200,000 users hit inventory at once:

  • Admit 2,000 users per minute.
  • Others wait in queue.

Spike becomes a smooth line.

Many large platforms, including Ticketmaster, use this approach during high-demand events.
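At its core, admission control is just comparing the number of admitted users against an allowance that grows linearly with time. A toy sketch (`now` is injectable so it can be tested deterministically; a real waiting room would keep queue positions in shared storage like Redis):

```python
import time

class WaitingRoom:
    """Admit at most rate_per_sec users per second; everyone else stays
    queued and retries later (ideally with jittered backoff)."""

    def __init__(self, rate_per_sec):
        self.rate = rate_per_sec
        self.opened_at = time.monotonic()
        self.admitted = 0

    def try_admit(self, now=None):
        if now is None:
            now = time.monotonic()
        allowance = (now - self.opened_at) * self.rate   # grows smoothly
        if self.admitted < allowance:
            self.admitted += 1
            return True
        return False                                     # stay in the queue
```

However fast users arrive, the downstream inventory service never sees more than `rate_per_sec` new sessions per second.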


What Actually Fails During a Stampede

When the herd hits:

  • CPU usage jumps to 100%
  • Thread pools explode
  • DB connections max out
  • P99 latency increases 50–100x
  • Error rates spike
  • Users abandon carts

Worst case: an entire region goes down.


Conclusion

The thundering herd problem is not about high traffic.

It’s about synchronized traffic.

Quick commerce apps are especially vulnerable because they combine:

  • Flash sales
  • Limited inventory
  • Live inventory locking
  • Real-time delivery tracking
  • Heavy retry behavior

If traffic is predictable → you can scale.
If traffic is synchronized → you must control coordination.


Simple Summary

Think of it like this:

  • Normal growth = Water slowly filling a tank.
  • Thundering herd = Fire hydrant blasting full force instantly.

The solution is not just “add more servers.”

The real solution is:

  • Spread traffic over time.
  • Prevent duplicate work.
  • Control retries.
  • Shape the flow.

That’s how modern distributed systems survive flash-sale chaos.
