Serif COLAKEL

Posted on Jun 20

Cache Stampede in Go: Preventing Thundering Herds with Singleflight, Stale Caching, and Request Coalescing

#go #backend #productivity #software

Modern backend systems spend enormous effort optimizing databases, tuning queries, and scaling infrastructure.

Yet some of the most expensive production incidents begin with a single innocent event:

A cache entry expires.

Everything looks healthy.

CPU is low.

Memory is stable.

Latency is acceptable.

Then suddenly:

PostgreSQL reaches 100% CPU
Redis traffic spikes
Request latency explodes
Pods start scaling
Error rates climb

Nothing is technically "broken."

Your cache simply stopped protecting your database.

This phenomenon is commonly known as a Cache Stampede, and if you've operated high-traffic Go services long enough, you've probably experienced it.

In this article, we'll explore:

What cache stampedes actually are
Why naïve caching fails under load
How to use singleflight correctly
Request coalescing patterns
Stale-While-Revalidate strategies
Distributed cache coordination
Production pitfalls and monitoring

The Hidden Production Killer

Most engineers think about cache performance like this:

Request
   ↓
Cache
   ↓
Database

If the cache misses:

1 request
    ↓
1 database query

No problem.

But production systems rarely receive just one request at a time.

Imagine:

10,000 concurrent requests

for the same product page.

As long as the cache exists:

10,000 requests
      ↓
Redis
      ↓
Done

Life is good.

Then the cache expires.

What Is a Cache Stampede?

A cache stampede occurs when many requests simultaneously encounter a cache miss and all attempt to rebuild the same cached data.

Example:

Redis Key Expires

        ↓

50,000 Requests

        ↓

50,000 Database Queries

        ↓

Database Collapse

The database becomes the bottleneck exactly when traffic is highest.

Ironically, the cache that was supposed to reduce load now amplifies it.

The Naïve Cache Implementation

Most services start with something similar:

func GetProduct(ctx context.Context, id string) (*Product, error) {
    if product, ok := cache.Get(id); ok {
        return product, nil
    }

    product, err := repository.GetProduct(ctx, id)
    if err != nil {
        return nil, err
    }

    cache.Set(id, product)

    return product, nil
}

Looks perfectly reasonable.

The problem appears under concurrency.

Imagine 5,000 requests arriving simultaneously:

Request 1 → cache miss
Request 2 → cache miss
Request 3 → cache miss
Request 4 → cache miss
...
Request 5000 → cache miss

Every request executes:

repository.GetProduct(...)

Now your database receives:

5,000 identical queries

for the same product.

Request Coalescing: The First Real Solution

Instead of allowing every request to rebuild the cache independently, let one request do the work and share its result with the rest:

1000 requests
      ↓
1 database query
      ↓
1000 responses

This concept is known as:

Request Coalescing

Multiple identical requests are merged into a single execution.

Enter singleflight

The Go project maintains a production-proven implementation in its extended x/sync module:

golang.org/x/sync/singleflight

The pattern originated in groupcache, and the standard library keeps an internal copy that the net package uses to deduplicate concurrent DNS lookups.

Basic Usage

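var group singleflight.Group // shared by all requests, never per request

func GetProduct(
    ctx context.Context,
    id string,
) (*Product, error) {

    if product, ok := cache.Get(id); ok {
        return product, nil
    }

    // All concurrent callers with the same key share one execution of fn.
    result, err, _ := group.Do(id, func() (interface{}, error) {
        // Re-check: a call that finished just before we joined may
        // already have populated the cache.
        if product, ok := cache.Get(id); ok {
            return product, nil
        }

        // Caution: ctx belongs to whichever caller arrived first. If that
        // client disconnects, every waiter receives its cancellation error.
        product, err := repository.GetProduct(ctx, id)
        if err != nil {
            return nil, err
        }

        cache.Set(id, product)

        return product, nil
    })

    if err != nil {
        return nil, err
    }

    return result.(*Product), nil
}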

Now:

5000 requests
      ↓
singleflight
      ↓
1 database query
      ↓
shared result

Understanding What Actually Happens

Without singleflight:

R1 → DB
R2 → DB
R3 → DB
R4 → DB
R5 → DB

With singleflight:

R1 → DB

R2 waits
R3 waits
R4 waits
R5 waits

      ↓

all receive same result

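The first caller for a key performs the work.

Every other caller that asks for the same key while that call is still in flight waits for, and receives, the same result. Once the call completes, the next caller starts a fresh execution.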

Observing Shared Calls

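The third return value reports whether the result was given to multiple callers. It is true for every caller that received a shared result, including the one that actually did the work.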

result, err, shared := group.Do(key, fn)

if shared {
    metrics.SharedRequests.Inc()
}

This metric becomes extremely useful in production.

A high shared ratio means:

singleflight is saving your database

The First Production Pitfall

Many teams accidentally create:

var group singleflight.Group

for every request.

Example:

func Handler(w http.ResponseWriter, r *http.Request) {
    var group singleflight.Group

    group.Do(...)
}

This completely defeats the purpose.

Each request gets its own group.

Nothing is shared.

The group must live longer than individual requests.

Typically:

type Service struct {
    group singleflight.Group
}

Slow Request Amplification

singleflight prevents duplicate work.

It does not make slow work faster.

Imagine:

1 DB query = 8 seconds

Now:

5000 requests

all wait:

8 seconds

for the same result.

You reduced database load.

You did not reduce latency.

This distinction matters.

Adding Timeouts

Always combine cache rebuilds with context deadlines.

ctx, cancel := context.WithTimeout(
    ctx,
    2*time.Second,
)
defer cancel()

product, err := repository.GetProduct(ctx, id)

Never allow a cache refresh operation to run forever.

Stale-While-Revalidate

One of the most effective production techniques is:

Stale-While-Revalidate (SWR)

Instead of blocking requests when data expires:

Cache expired
      ↓
Serve stale data
      ↓
Refresh in background

Users receive slightly old data.

The database survives.

Basic Example

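type CacheEntry struct {
    Value     *Product
    ExpiresAt time.Time
}

// GetProduct serves fresh or stale data without waiting on the database.
// cache.GetEntry is an assumed concurrency-safe lookup: reading a plain map
// here would race with the background refresh that writes to it.
func GetProduct(id string) (*Product, bool) {
    entry, ok := cache.GetEntry(id)
    if !ok {
        // Cold miss: nothing stale to serve, so the caller must load
        // synchronously (ideally through singleflight).
        return nil, false
    }

    if time.Now().Before(entry.ExpiresAt) {
        return entry.Value, true // still fresh
    }

    // Expired: start a background refresh and serve the stale value now.
    go refreshProduct(id)

    return entry.Value, true
}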

Notice:

the request never waits (as long as a stale entry exists)

This dramatically improves resilience during traffic spikes.

Background Refresh with singleflight

Combining both patterns:

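func refreshProduct(id string) {

    // This shares keys with GetProduct's group.Do calls, so it must return
    // the same type: GetProduct asserts result.(*Product), and a nil
    // result would make that assertion panic for any caller sharing it.
    _, _, _ = group.Do(id, func() (interface{}, error) {

        // Background refreshes get their own deadline, too.
        ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
        defer cancel()

        product, err := repository.GetProduct(ctx, id)
        if err != nil {
            return nil, err // keep serving the stale entry
        }

        cache.Set(id, product)

        return product, nil
    })
}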

Even if multiple refreshes trigger simultaneously:

1 refresh per key (in this process)

actually executes.

The Distributed Problem

singleflight only works:

inside one process

But most production systems run:

Pod 1
Pod 2
Pod 3
Pod 4

Each pod has its own memory.

Each pod has its own singleflight group.

Now:

4 pods

may still execute:

4 identical DB queries

Distributed Request Coalescing

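A common solution uses a short-lived Redis lock. With go-redis, SetNX succeeds only for the first caller to create the key (here rdb is the client and token is a value unique to this caller; both names are illustrative):

ok, err := rdb.SetNX(
    ctx,
    "lock:product:"+id,
    token,          // unique per caller, so a release can check ownership
    10*time.Second, // keep this longer than the rebuild timeout
).Result()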

If lock acquired:

refresh cache

Otherwise:

poll the cache until it is populated, with a deadline

This reduces duplicate rebuilds across an entire cluster.

Cache Expiration Is Also Dangerous

Another hidden problem:

1 million keys

all expire at:

12:00 PM

Then:

12:00 PM

becomes a database apocalypse.

Probabilistic Expiration

Instead of:

TTL = 1 hour

use:

TTL = 1 hour ± random jitter

Example:

// 1 hour ± 5 minutes, uniformly distributed (math/rand)
ttl := time.Hour - 5*time.Minute +
    time.Duration(rand.Int63n(int64(10*time.Minute)))

Now expirations spread naturally over time.

Production Metrics That Matter

Most teams only track:

Cache Hit Ratio

That's not enough.

Track:

Cache Hit Ratio

hits / total requests

Shared Request Ratio

shared singleflight calls / total singleflight calls

Measures stampede prevention effectiveness.

Cache Rebuild Count

cache refreshes / minute

Detects excessive invalidation.

Database Fallback Rate

cache misses hitting DB

Should remain stable.

Spikes indicate problems.

Refresh Latency

time spent rebuilding cache

Slow refreshes often precede incidents.

A Real Production Incident

A product catalog service served:

~80k requests/minute

One Redis key expired.

Without request coalescing:

11,000 identical queries

hit PostgreSQL within seconds.

Database CPU jumped:

15% → 100%

Latency increased:

40ms → 9s

After introducing:

singleflight
stale-while-revalidate
TTL jitter

the same event generated:

1 database query

instead of thousands.

The incident never occurred again.

Key Takeaways

✅ Cache stampedes are often more dangerous than cache misses

✅ singleflight is one of the simplest and most effective protections available in Go

✅ Request coalescing turns thousands of duplicate requests into one execution

✅ Stale-While-Revalidate often beats synchronous cache rebuilding

✅ singleflight is process-local and does not solve distributed coordination

✅ Monitor cache rebuilds, shared requests, and fallback rates—not just hit ratio

Most caching discussions focus on speed.

Production caching is really about survivability under load.

Because the most expensive database query is not the slow one.

It's the same query executed 10,000 times simultaneously. 🚀
