
Serif COLAKEL


Cache Stampede in Go: Preventing Thundering Herds with Singleflight, Stale Caching, and Request Coalescing

Modern backend systems spend enormous effort optimizing databases, tuning queries, and scaling infrastructure.

Yet some of the most expensive production incidents begin with a single innocent event:

A cache entry expires.

Everything looks healthy.

CPU is low.

Memory is stable.

Latency is acceptable.

Then suddenly:

  • PostgreSQL reaches 100% CPU
  • Redis traffic spikes
  • Request latency explodes
  • Pods start scaling
  • Error rates climb

Nothing is technically "broken."

Your cache simply stopped protecting your database.

This phenomenon is commonly known as a Cache Stampede, and if you've operated high-traffic Go services long enough, you've probably experienced it.

In this article, we'll explore:

  • What cache stampedes actually are
  • Why naïve caching fails under load
  • How to use singleflight correctly
  • Request coalescing patterns
  • Stale-While-Revalidate strategies
  • Distributed cache coordination
  • Production pitfalls and monitoring

The Hidden Production Killer

Most engineers think about cache performance like this:

Request
   ↓
Cache
   ↓
Database

If the cache misses:

1 request
    ↓
1 database query

No problem.

But production systems rarely receive one request.

Imagine:

10,000 concurrent requests

for the same product page.

As long as the cache exists:

10,000 requests
      ↓
Redis
      ↓
Done

Life is good.

Then the cache expires.


What Is a Cache Stampede?

A cache stampede occurs when many requests simultaneously encounter a cache miss and all attempt to rebuild the same cached data.

Example:

Redis Key Expires

        ↓

50,000 Requests

        ↓

50,000 Database Queries

        ↓

Database Collapse

The database becomes the bottleneck exactly when traffic is highest.

Ironically, the cache that was supposed to reduce load now amplifies it.


The Naïve Cache Implementation

Most services start with something similar:

func GetProduct(ctx context.Context, id string) (*Product, error) {
    if product, ok := cache.Get(id); ok {
        return product, nil
    }

    product, err := repository.GetProduct(ctx, id)
    if err != nil {
        return nil, err
    }

    cache.Set(id, product)

    return product, nil
}

Looks perfectly reasonable.

The problem appears under concurrency.

Imagine 5,000 requests arriving simultaneously:

Request 1 → cache miss
Request 2 → cache miss
Request 3 → cache miss
Request 4 → cache miss
...
Request 5000 → cache miss

Every request executes:

repository.GetProduct(...)

Now your database receives:

5,000 identical queries

for the same product.
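The effect is easy to reproduce locally. The sketch below is a minimal simulation, not the article's service: `naiveCache`, `slowQuery`, and `stampede` are hypothetical names, and a 100 ms sleep stands in for the database query. It fires concurrent requests at one cold key and counts how many simulated queries result.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// naiveCache is a hypothetical check-then-load cache with no coalescing.
type naiveCache struct {
	mu      sync.RWMutex
	items   map[string]string
	dbCalls atomic.Int64 // counts simulated database queries
}

func newNaiveCache() *naiveCache {
	return &naiveCache{items: make(map[string]string)}
}

// slowQuery stands in for repository.GetProduct.
func (c *naiveCache) slowQuery(id string) string {
	c.dbCalls.Add(1)
	time.Sleep(100 * time.Millisecond) // simulated query latency
	return "product-" + id
}

// Get mirrors the naive pattern: check the cache, otherwise load and store.
func (c *naiveCache) Get(id string) string {
	c.mu.RLock()
	v, ok := c.items[id]
	c.mu.RUnlock()
	if ok {
		return v
	}
	v = c.slowQuery(id) // every concurrent miss reaches this line
	c.mu.Lock()
	c.items[id] = v
	c.mu.Unlock()
	return v
}

// stampede fires n concurrent Gets for one cold key and returns the
// number of simulated database queries that resulted.
func stampede(n int) int64 {
	c := newNaiveCache()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Get("42")
		}()
	}
	wg.Wait()
	return c.dbCalls.Load()
}

func main() {
	fmt.Printf("50 requests caused %d database queries\n", stampede(50))
}
```

Because every goroutine observes the miss before the first query returns, the query count tracks the request count rather than the number of distinct keys.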


Request Coalescing: The First Real Solution

Instead of allowing every request to rebuild the cache independently, let a single request do the work and share its result:

1000 requests
      ↓
1 database query
      ↓
1000 responses

This concept is known as:

Request Coalescing

Multiple identical requests are merged into a single execution.


Enter singleflight

Go provides a production-proven implementation:

golang.org/x/sync/singleflight

The package originated in groupcache, a caching library open-sourced by Google, and is now used widely throughout the Go ecosystem.


Basic Usage

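var group singleflight.Group

func GetProduct(
    ctx context.Context,
    id string,
) (*Product, error) {

    if product, ok := cache.Get(id); ok {
        return product, nil
    }

    // Concurrent callers with the same key share one execution of this function.
    result, err, _ := group.Do(id, func() (interface{}, error) {

        // Note: ctx belongs to whichever caller started the flight. If that
        // caller is canceled, every waiting caller receives the same error.
        product, err := repository.GetProduct(ctx, id)
        if err != nil {
            return nil, err
        }

        cache.Set(id, product)

        return product, nil
    })

    if err != nil {
        return nil, err
    }

    return result.(*Product), nil
}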

Now:

5000 requests
      ↓
singleflight
      ↓
1 database query
      ↓
shared result

Understanding What Actually Happens

Without singleflight:

R1 → DB
R2 → DB
R3 → DB
R4 → DB
R5 → DB

With singleflight:

R1 → DB

R2 waits
R3 waits
R4 waits
R5 waits

      ↓

all receive same result

The first caller performs the work.

Everyone else waits for the result.
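To make the leader/follower mechanics concrete, here is a stripped-down coalescer built only on the standard library. It is an illustrative stand-in, not the singleflight source: `coalescer`, `call`, and `run` are hypothetical names, values are plain strings, and it omits the panic handling and the `shared` flag that the real package provides.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call represents one in-flight execution for a key.
type call struct {
	done chan struct{} // closed when the leader finishes
	val  string
	err  error
}

// coalescer is a hypothetical, simplified stand-in for singleflight.Group.
type coalescer struct {
	mu    sync.Mutex
	calls map[string]*call
}

// Do runs fn once per key at a time. Concurrent callers with the same key
// block until the leader finishes and then receive the leader's result.
func (g *coalescer) Do(key string, fn func() (string, error)) (string, error) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		<-c.done // follower: wait for the leader
		return c.val, c.err
	}
	c := &call{done: make(chan struct{})}
	g.calls[key] = c
	g.mu.Unlock()

	c.val, c.err = fn() // leader: perform the work

	g.mu.Lock()
	delete(g.calls, key) // later callers start a fresh flight
	g.mu.Unlock()
	close(c.done)
	return c.val, c.err
}

// run fires n concurrent calls for one key and returns how many times
// the loader actually executed.
func run(n int) int64 {
	var g coalescer
	var loads atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Do("42", func() (string, error) {
				loads.Add(1)
				time.Sleep(200 * time.Millisecond) // simulated query
				return "product-42", nil
			})
		}()
	}
	wg.Wait()
	return loads.Load()
}

func main() {
	fmt.Printf("50 callers triggered %d load(s)\n", run(50))
}
```

Note that the key is removed as soon as the leader finishes, so coalescing only covers callers that overlap in time. It complements the cache and does not replace it.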


Observing Shared Calls

The third return value reports whether the result was delivered to more than one caller.

result, err, shared := group.Do(key, fn)

if shared {
    metrics.SharedRequests.Inc()
}

This metric becomes extremely useful in production.

A high shared ratio means:

singleflight is saving your database

The First Production Pitfall

Many teams accidentally create:

var group singleflight.Group

for every request.

Example:

func Handler(w http.ResponseWriter, r *http.Request) {
    var group singleflight.Group

    group.Do(...)
}

This completely defeats the purpose.

Each request gets its own group.

Nothing is shared.

The group must live longer than individual requests.

Typically:

type Service struct {
    group singleflight.Group
}

Slow Request Amplification

singleflight prevents duplicate work.

It does not make slow work faster.

Imagine:

1 DB query = 8 seconds

Now:

5000 requests

all wait:

8 seconds

for the same result.

You reduced database load.

You did not reduce latency.

This distinction matters.


Adding Timeouts

Always combine cache rebuilds with context deadlines.

ctx, cancel := context.WithTimeout(
    ctx,
    2*time.Second,
)
defer cancel()

product, err := repository.GetProduct(ctx, id)

Never allow a cache refresh operation to run forever.


Stale-While-Revalidate

One of the most effective production techniques is:

Stale-While-Revalidate (SWR)

Instead of blocking requests when data expires:

Cache expired
      ↓
Serve stale data
      ↓
Refresh in background

Users receive slightly old data.

The database survives.


Basic Example

type CacheEntry struct {
    Value      *Product
    ExpiresAt  time.Time
}

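// cache is assumed to be a map[string]CacheEntry guarded by mu (a sync.RWMutex).
func GetProduct(id string) *Product {

    mu.RLock()
    entry, ok := cache[id]
    mu.RUnlock()

    if !ok {
        // Cold miss: there is nothing stale to serve, so the caller
        // must fall back to a synchronous load.
        return nil
    }

    if time.Now().Before(entry.ExpiresAt) {
        return entry.Value
    }

    // Expired: refresh in the background and serve the stale value.
    go refreshProduct(id)

    return entry.Value
}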

Notice:

request never waits

This dramatically improves resilience during traffic spikes.
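For a version that can be run end to end, the following sketch fills in the pieces the basic example leaves out. Everything here is an assumption made for illustration: `swrCache`, its string values, the `refreshing` set that limits each key to one background refresh, and the `Wait` helper. It is one possible shape, not the article's implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type entry struct {
	value     string
	expiresAt time.Time
}

// swrCache is a hypothetical stale-while-revalidate cache.
type swrCache struct {
	mu         sync.Mutex
	items      map[string]entry
	refreshing map[string]bool // keys with a background refresh in flight
	ttl        time.Duration
	load       func(id string) (string, error) // stands in for the repository
	wg         sync.WaitGroup                  // tracks background refreshes
}

func newSWRCache(ttl time.Duration, load func(string) (string, error)) *swrCache {
	return &swrCache{
		items:      make(map[string]entry),
		refreshing: make(map[string]bool),
		ttl:        ttl,
		load:       load,
	}
}

// Set stores a value with the cache's TTL.
func (c *swrCache) Set(id, value string) {
	c.mu.Lock()
	c.items[id] = entry{value: value, expiresAt: time.Now().Add(c.ttl)}
	c.mu.Unlock()
}

// Get returns the cached value, stale or not. ok is false on a cold miss,
// where there is nothing stale to serve.
func (c *swrCache) Get(id string) (value string, ok bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, found := c.items[id]
	if !found {
		return "", false
	}
	if time.Now().After(e.expiresAt) && !c.refreshing[id] {
		c.refreshing[id] = true // at most one refresh per key
		c.wg.Add(1)
		go c.refresh(id)
	}
	return e.value, true
}

// refresh reloads one key in the background. On failure the stale
// value stays in place.
func (c *swrCache) refresh(id string) {
	defer c.wg.Done()
	v, err := c.load(id)
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.refreshing, id)
	if err == nil {
		c.items[id] = entry{value: v, expiresAt: time.Now().Add(c.ttl)}
	}
}

// Wait blocks until background refreshes finish (useful in tests and shutdown).
func (c *swrCache) Wait() { c.wg.Wait() }

func main() {
	c := newSWRCache(10*time.Millisecond, func(id string) (string, error) {
		return "fresh-" + id, nil
	})
	c.Set("42", "stale-42")
	time.Sleep(20 * time.Millisecond) // let the entry expire

	v, _ := c.Get("42")
	fmt.Println(v) // stale-42: served immediately
	c.Wait()
	v, _ = c.Get("42")
	fmt.Println(v) // fresh-42: the background refresh has landed
}
```

A production variant would usually also cap how stale an entry may become, so that a long run of failed refreshes cannot serve arbitrarily old data.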


Background Refresh with singleflight

Combining both patterns:

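func refreshProduct(id string) {

    _, err, _ := group.Do(id, func() (interface{}, error) {

        // Bound the refresh so it cannot run forever.
        ctx, cancel := context.WithTimeout(
            context.Background(),
            2*time.Second,
        )
        defer cancel()

        product, err := repository.GetProduct(ctx, id)
        if err != nil {
            return nil, err
        }

        cache.Set(id, product)

        // Return the product rather than nil so that any caller sharing
        // this key receives a usable value.
        return product, nil
    })

    if err != nil {
        log.Printf("refresh product %s: %v", id, err)
    }
}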

Even if multiple refreshes trigger simultaneously:

1 refresh

actually executes.


The Distributed Problem

singleflight only works:

inside one process

But most production systems run:

Pod 1
Pod 2
Pod 3
Pod 4

Each pod has its own memory.

Each pod has its own singleflight group.

Now:

4 pods

may still execute:

4 identical DB queries

Distributed Request Coalescing

A common solution uses Redis locks.

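// rdb is assumed to be a go-redis client.
// SET NX succeeds only if the key does not exist, so one caller wins the lock.
// The expiry releases the lock even if its holder crashes.
ok, err := rdb.SetNX(
    ctx,
    "lock:product:"+id,
    "1",
    10*time.Second,
).Result()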

If lock acquired:

refresh cache

Otherwise:

wait for cache population

This reduces duplicate rebuilds across an entire cluster.
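The full acquire-or-wait flow can be sketched without a Redis server. In the example below, `memStore` is a hypothetical in-memory stand-in whose `TryLock` mimics `SET NX` with an expiry, and `getProduct` and `run` are illustrative names. The goroutines play the role of pods sharing one cache.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// memStore is an in-memory stand-in for Redis so the sketch runs locally.
type memStore struct {
	mu    sync.Mutex
	data  map[string]string
	locks map[string]time.Time // lock key -> expiry time
}

func (s *memStore) Get(key string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[key]
	return v, ok
}

func (s *memStore) Set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

// TryLock mimics SET NX with an expiry: it succeeds only if no live lock exists.
func (s *memStore) TryLock(key string, ttl time.Duration) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if exp, held := s.locks[key]; held && time.Now().Before(exp) {
		return false
	}
	s.locks[key] = time.Now().Add(ttl)
	return true
}

func (s *memStore) Unlock(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.locks, key)
}

// getProduct lets the lock holder rebuild while other callers poll the cache.
func getProduct(ctx context.Context, s *memStore, id string, load func() (string, error)) (string, error) {
	key, lockKey := "product:"+id, "lock:product:"+id
	for {
		if v, ok := s.Get(key); ok {
			return v, nil
		}
		if s.TryLock(lockKey, 10*time.Second) {
			defer s.Unlock(lockKey)
			// Re-check: another holder may have filled the cache already.
			if v, ok := s.Get(key); ok {
				return v, nil
			}
			v, err := load()
			if err != nil {
				return "", err
			}
			s.Set(key, v)
			return v, nil
		}
		select {
		case <-time.After(10 * time.Millisecond): // poll interval
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

// run simulates several pods requesting one cold key and returns the load count.
func run(pods int) int64 {
	s := &memStore{data: map[string]string{}, locks: map[string]time.Time{}}
	var loads atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < pods; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			getProduct(context.Background(), s, "42", func() (string, error) {
				loads.Add(1)
				return "product-42", nil
			})
		}()
	}
	wg.Wait()
	return loads.Load()
}

func main() {
	fmt.Printf("8 pods triggered %d load(s)\n", run(8))
}
```

Against real Redis, the lock value should be a unique token and the release should delete the key only if the token still matches. Otherwise a holder that outlives the expiry can delete a lock now owned by someone else.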


Cache Expiration Is Also Dangerous

Another hidden problem:

1 million keys

all expire at:

12:00 PM

Then:

12:00 PM

becomes a database apocalypse.


Randomized Expiration (TTL Jitter)

Instead of:

TTL = 1 hour

use:

TTL = 1 hour + random jitter

Example:

ttl := time.Hour +
    time.Duration(rand.Intn(300))*time.Second

Now expirations spread naturally over time.
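The snippet above only ever lengthens the TTL, by up to 299 seconds. If the average lifetime should stay at the base value, a symmetric variant is one option. The helper below is a sketch; `jitteredTTL` and its `spread` parameter are names chosen for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// jitteredTTL returns base plus a random offset in [-spread, +spread].
// A non-positive spread returns base unchanged.
func jitteredTTL(base, spread time.Duration) time.Duration {
	if spread <= 0 {
		return base
	}
	// Int63n(2*spread+1) yields [0, 2*spread]; shifting gives [-spread, +spread].
	offset := time.Duration(rand.Int63n(int64(2*spread)+1)) - spread
	return base + offset
}

func main() {
	// Each call lands somewhere between 55 and 65 minutes.
	for i := 0; i < 3; i++ {
		fmt.Println(jitteredTTL(time.Hour, 5*time.Minute))
	}
}
```

The spread should be small relative to the base TTL but large relative to how long a rebuild takes, so that refreshes for keys written together do not overlap.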


Production Metrics That Matter

Most teams only track:

Cache Hit Ratio

That's not enough.

Track:

Cache Hit Ratio

hits / total requests

Shared Request Ratio

singleflight shared requests

Measures stampede prevention effectiveness.


Cache Rebuild Count

cache refreshes / minute

Detects excessive invalidation.


Database Fallback Rate

cache misses hitting DB

Should remain stable.

Spikes indicate problems.


Refresh Latency

time spent rebuilding cache

Slow refreshes often precede incidents.
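As a rough illustration of how these numbers relate, the sketch below keeps them as in-process counters. `cacheStats` and its fields are hypothetical; a real service would more likely export equivalent counters through its metrics library and compute the ratios in dashboards.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cacheStats is a hypothetical in-process counter set.
type cacheStats struct {
	hits, misses atomic.Int64
	shared       atomic.Int64 // misses answered by a coalesced (shared) load
	rebuilds     atomic.Int64 // loads that actually reached the database
}

// ratio guards against division by zero before any traffic arrives.
func ratio(num, den int64) float64 {
	if den == 0 {
		return 0
	}
	return float64(num) / float64(den)
}

// HitRatio is hits / total requests.
func (s *cacheStats) HitRatio() float64 {
	h, m := s.hits.Load(), s.misses.Load()
	return ratio(h, h+m)
}

// SharedRatio is the fraction of misses that were answered by a shared load.
func (s *cacheStats) SharedRatio() float64 {
	return ratio(s.shared.Load(), s.misses.Load())
}

func main() {
	var s cacheStats
	s.hits.Add(90)
	s.misses.Add(10)
	s.shared.Add(8)   // one flight shared by eight callers
	s.rebuilds.Add(3) // that flight plus two unshared loads

	fmt.Printf("hit ratio:    %.2f\n", s.HitRatio())
	fmt.Printf("shared ratio: %.2f\n", s.SharedRatio())
	fmt.Printf("db rebuilds:  %d\n", s.rebuilds.Load())
}
```

Read together, a falling hit ratio with a rising shared ratio suggests coalescing is absorbing a stampede, while a rising rebuild count with a low shared ratio suggests many distinct keys are expiring at once.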


A Real Production Incident

A product catalog service served:

~80k requests/minute

One Redis key expired.

Without request coalescing:

11,000 identical queries

hit PostgreSQL within seconds.

Database CPU jumped:

15% → 100%

Latency increased:

40ms → 9s

After introducing:

  • singleflight
  • stale-while-revalidate
  • TTL jitter

the same event generated:

1 database query

instead of thousands.

The incident never occurred again.


Key Takeaways

✅ Cache stampedes are often more dangerous than cache misses

✅ singleflight is one of the simplest and most effective protections available in Go

✅ Request coalescing turns thousands of duplicate requests into one execution

✅ Stale-While-Revalidate often beats synchronous cache rebuilding

✅ singleflight is process-local and does not solve distributed coordination

✅ Monitor cache rebuilds, shared requests, and fallback rates—not just hit ratio

Most caching discussions focus on speed.

Production caching is really about survivability under load.

Because the most expensive database query is not the slow one.

It's the same query executed 10,000 times simultaneously. 🚀
