Modern backend systems spend enormous effort optimizing databases, tuning queries, and scaling infrastructure.
Yet some of the most expensive production incidents begin with a single innocent event:
A cache entry expires.
Everything looks healthy.
CPU is low.
Memory is stable.
Latency is acceptable.
Then suddenly:
- PostgreSQL reaches 100% CPU
- Redis traffic spikes
- Request latency explodes
- Pods start scaling
- Error rates climb
Nothing is technically "broken."
Your cache simply stopped protecting your database.
This phenomenon is commonly known as a Cache Stampede, and if you've operated high-traffic Go services long enough, you've probably experienced it.
In this article, we'll explore:
- What cache stampedes actually are
- Why naïve caching fails under load
- How to use singleflight correctly
- Request coalescing patterns
- Stale-While-Revalidate strategies
- Distributed cache coordination
- Production pitfalls and monitoring
The Hidden Production Killer
Most engineers think about cache performance like this:
Request
↓
Cache
↓
Database
If the cache misses:
1 request
↓
1 database query
No problem.
But production systems rarely receive one request.
Imagine:
10,000 concurrent requests
for the same product page.
As long as the cache exists:
10,000 requests
↓
Redis
↓
Done
Life is good.
Then the cache expires.
What Is a Cache Stampede?
A cache stampede occurs when many requests simultaneously encounter a cache miss and all attempt to rebuild the same cached data.
Example:
Redis Key Expires
↓
50,000 Requests
↓
50,000 Database Queries
↓
Database Collapse
The database becomes the bottleneck exactly when traffic is highest.
Ironically, the cache that was supposed to reduce load now amplifies it.
The Naïve Cache Implementation
Most services start with something similar:
func GetProduct(ctx context.Context, id string) (*Product, error) {
	if product, ok := cache.Get(id); ok {
		return product, nil
	}
	product, err := repository.GetProduct(ctx, id)
	if err != nil {
		return nil, err
	}
	cache.Set(id, product)
	return product, nil
}
Looks perfectly reasonable.
The problem appears under concurrency.
Imagine 5,000 requests arriving simultaneously:
Request 1 → cache miss
Request 2 → cache miss
Request 3 → cache miss
Request 4 → cache miss
...
Request 5000 → cache miss
Every request executes:
repository.GetProduct(...)
Now your database receives:
5,000 identical queries
for the same product.
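To make the failure mode concrete, here is a minimal, stdlib-only sketch that counts how many loads the check-then-load pattern triggers for one cold key. The names naiveCache and countNaiveQueries are invented for this illustration, the counter stands in for repository.GetProduct, and the barrier is an assumption that simulates every request arriving before the first rebuild finishes.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// naiveCache is a minimal in-memory cache guarded by a mutex.
type naiveCache struct {
	mu   sync.Mutex
	data map[string]string
}

func (c *naiveCache) get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.data[key]
	return v, ok
}

func (c *naiveCache) set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = value
}

// countNaiveQueries sends n concurrent lookups for one cold key through the
// check-then-load pattern and returns how many "database" queries ran.
func countNaiveQueries(n int) int64 {
	cache := &naiveCache{data: make(map[string]string)}
	var queries int64

	// The barrier holds each goroutine after its cache check, simulating
	// requests that all arrive before the first rebuild completes.
	var arrived, done sync.WaitGroup
	arrived.Add(n)
	done.Add(n)

	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			_, hit := cache.get("product:42")
			arrived.Done()
			arrived.Wait()
			if hit {
				return
			}
			atomic.AddInt64(&queries, 1) // stand-in for repository.GetProduct
			cache.set("product:42", "widget")
		}()
	}
	done.Wait()
	return atomic.LoadInt64(&queries)
}
```

Calling countNaiveQueries(100) reports one query per request: the cache gives no protection while the key is cold.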
Request Coalescing: The First Real Solution
Instead of allowing every request to rebuild the cache independently:
1000 requests
↓
1 database query
↓
1000 responses
This concept is known as:
Request Coalescing
Multiple identical requests are merged into a single execution.
Enter singleflight
Go has a production-proven implementation, maintained outside the standard library in the golang.org/x/sync module:
golang.org/x/sync/singleflight
It was originally extracted from Google's infrastructure and is heavily used throughout the Go ecosystem.
Basic Usage
var group singleflight.Group

func GetProduct(ctx context.Context, id string) (*Product, error) {
	if product, ok := cache.Get(id); ok {
		return product, nil
	}
	// Only one caller per id runs the function; concurrent callers wait for its result.
	// Note: ctx belongs to the first caller, so its cancellation fails every waiter.
	result, err, _ := group.Do(id, func() (interface{}, error) {
		product, err := repository.GetProduct(ctx, id)
		if err != nil {
			return nil, err
		}
		cache.Set(id, product)
		return product, nil
	})
	if err != nil {
		return nil, err
	}
	return result.(*Product), nil
}
Now:
5000 requests
↓
singleflight
↓
1 database query
↓
shared result
Understanding What Actually Happens
Without singleflight:
R1 → DB
R2 → DB
R3 → DB
R4 → DB
R5 → DB
With singleflight:
R1 → DB
R2 waits
R3 waits
R4 waits
R5 waits
↓
all receive same result
The first caller performs the work.
Everyone else waits for the result.
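The leader-and-followers mechanics can be sketched with nothing but a mutex and a WaitGroup. The block below is a simplified illustration of the idea, not the actual golang.org/x/sync/singleflight source: the Group here works on strings only, and inFlightWaiters and countCoalescedQueries are helpers invented for the demo.

```go
package main

import (
	"runtime"
	"sync"
)

// call tracks one in-flight execution and the followers waiting on it.
type call struct {
	wg      sync.WaitGroup
	val     string
	err     error
	waiters int
}

// Group coalesces concurrent calls that use the same key.
type Group struct {
	mu sync.Mutex
	m  map[string]*call
}

// Do runs fn at most once per key at a time. Callers that arrive while fn is
// running block and receive the leader's result. shared reports whether the
// result went to more than one caller.
func (g *Group) Do(key string, fn func() (string, error)) (val string, err error, shared bool) {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok {
		c.waiters++
		g.mu.Unlock()
		c.wg.Wait() // follower: wait for the leader to finish
		return c.val, c.err, true
	}
	c := &call{}
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	c.val, c.err = fn() // leader: do the work once

	g.mu.Lock()
	delete(g.m, key) // later callers start a fresh execution
	shared = c.waiters > 0
	g.mu.Unlock()
	c.wg.Done()
	return c.val, c.err, shared
}

// inFlightWaiters reports how many followers are parked on key.
func (g *Group) inFlightWaiters(key string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	if c, ok := g.m[key]; ok {
		return c.waiters
	}
	return 0
}

// countCoalescedQueries sends n concurrent callers through one Group and
// returns how many times the loader actually ran.
func countCoalescedQueries(n int) int {
	var g Group
	queries := 0
	started := make(chan struct{})
	release := make(chan struct{})
	loader := func() (string, error) {
		queries++ // only the leader runs this
		close(started)
		<-release // keep the "query" open until every follower has joined
		return "widget", nil
	}

	var done sync.WaitGroup
	done.Add(n)
	caller := func() {
		defer done.Done()
		g.Do("product:42", loader)
	}

	go caller()
	<-started
	for i := 1; i < n; i++ {
		go caller()
	}
	for g.inFlightWaiters("product:42") < n-1 {
		runtime.Gosched()
	}
	close(release)
	done.Wait()
	return queries
}
```

With 50 callers, countCoalescedQueries(50) returns 1: the key stays registered in the map while the leader works, so every later arrival becomes a follower.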
Observing Shared Calls
The third return value reports whether the result was delivered to more than one caller.
result, err, shared := group.Do(key, fn)
if shared {
	metrics.SharedRequests.Inc()
}
This metric becomes extremely useful in production.
A high shared ratio means:
singleflight is saving your database
The First Production Pitfall
Many teams accidentally create:
var group singleflight.Group
for every request.
Example:
func Handler(w http.ResponseWriter, r *http.Request) {
	var group singleflight.Group
	group.Do(...)
}
This completely defeats the purpose.
Each request gets its own group.
Nothing is shared.
The group must live longer than individual requests.
Typically:
type Service struct {
	group singleflight.Group
}
Slow Request Amplification
singleflight prevents duplicate work.
It does not make slow work faster.
Imagine:
1 DB query = 8 seconds
Now:
5000 requests
all wait:
8 seconds
for the same result.
You reduced database load.
You did not reduce latency.
This distinction matters.
Adding Timeouts
Always combine cache rebuilds with context deadlines.
ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
defer cancel()

product, err := repository.GetProduct(ctx, id)
Never allow a cache refresh operation to run forever.
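A deadline on the rebuild bounds the work, but each waiting caller may also want to bound its own wait. One way to sketch that is to receive the shared result over a channel and select against the caller's context; singleflight's DoChan returns a channel that suits this pattern. The block below uses a plain channel as a stand-in, and awaitResult and demoBoundedWait are names made up for this example.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// awaitResult waits for a shared rebuild but gives up when the caller's own
// context expires. The rebuild itself is not cancelled by this function.
func awaitResult(ctx context.Context, result <-chan string) (string, error) {
	select {
	case v := <-result:
		return v, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// demoBoundedWait reports whether an impatient caller timed out, and the
// value a patient caller later received from the same slow rebuild.
func demoBoundedWait() (timedOut bool, value string) {
	result := make(chan string, 1) // buffered so the rebuild never blocks on send
	release := make(chan struct{})
	go func() {
		<-release // stand-in for a slow query
		result <- "widget"
	}()

	impatient, cancel := context.WithTimeout(context.Background(), 5*time.Millisecond)
	defer cancel()
	_, err := awaitResult(impatient, result)
	timedOut = errors.Is(err, context.DeadlineExceeded)

	close(release) // the slow query finally completes
	value, _ = awaitResult(context.Background(), result)
	return timedOut, value
}
```

The impatient caller gets context.DeadlineExceeded while the rebuild carries on, so a later caller still receives the value.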
Stale-While-Revalidate
One of the most effective production techniques is:
Stale-While-Revalidate (SWR)
Instead of blocking requests when data expires:
Cache expired
↓
Serve stale data
↓
Refresh in background
Users receive slightly old data.
The database survives.
Basic Example
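type CacheEntry struct {
	Value     *Product
	ExpiresAt time.Time
}

// GetProduct serves the cached value even when it is stale and refreshes it
// in the background. cache is assumed to be a map[string]CacheEntry whose
// access is synchronized elsewhere.
func GetProduct(id string) *Product {
	entry, ok := cache[id]
	if !ok {
		return nil // cold miss: nothing stale to serve, load synchronously instead
	}
	if time.Now().Before(entry.ExpiresAt) {
		return entry.Value
	}
	go refreshProduct(id)
	return entry.Value
}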
Notice:
request never waits
This dramatically improves resilience during traffic spikes.
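A slightly fuller sketch of the same idea, made safe for concurrent use and limited to one background refresh per key, might look like the following. SWRCache, its fields, and demoSWR are all hypothetical names; the injectable clock and the pending WaitGroup exist only so the behaviour can be observed deterministically.

```go
package main

import (
	"sync"
	"time"
)

type entry struct {
	value      string
	expiresAt  time.Time
	refreshing bool
}

// SWRCache serves stale values while one background goroutine per key
// refreshes them.
type SWRCache struct {
	mu      sync.Mutex
	entries map[string]*entry
	ttl     time.Duration
	now     func() time.Time                 // injectable clock (assumption)
	load    func(key string) (string, error) // stand-in for the repository
	pending sync.WaitGroup                   // tracks running refreshes
}

func (c *SWRCache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = &entry{value: value, expiresAt: c.now().Add(c.ttl)}
}

// Get returns the cached value and whether the key exists at all. An expired
// value is still returned, and at most one refresh per key is started.
func (c *SWRCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok {
		return "", false // cold miss: the caller must load synchronously
	}
	if c.now().After(e.expiresAt) && !e.refreshing {
		e.refreshing = true
		c.pending.Add(1)
		go c.refresh(key)
	}
	return e.value, true
}

func (c *SWRCache) refresh(key string) {
	defer c.pending.Done()
	value, err := c.load(key)
	c.mu.Lock()
	defer c.mu.Unlock()
	if err != nil {
		// Keep serving the stale value and allow a later retry.
		if e, ok := c.entries[key]; ok {
			e.refreshing = false
		}
		return
	}
	c.entries[key] = &entry{value: value, expiresAt: c.now().Add(c.ttl)}
}

// demoSWR expires an entry, reads it once (stale), waits for the background
// refresh, and reads it again (fresh).
func demoSWR() (stale, fresh string, loads int) {
	clock := time.Unix(0, 0)
	c := &SWRCache{
		entries: make(map[string]*entry),
		ttl:     time.Minute,
		now:     func() time.Time { return clock },
		load:    func(string) (string, error) { loads++; return "v2", nil },
	}
	c.Set("product:42", "v1")
	clock = clock.Add(2 * time.Minute) // the entry is now expired

	stale, _ = c.Get("product:42") // returns immediately with the old value
	c.pending.Wait()
	fresh, _ = c.Get("product:42")
	return stale, fresh, loads
}
```

The first read after expiry returns "v1" without waiting, and the next read after the refresh returns "v2" from a single load. On a failed refresh the stale value stays in place, which is usually the behaviour you want during a database incident.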
Background Refresh with singleflight
Combining both patterns:
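func refreshProduct(id string) {
	_, _, _ = group.Do(id, func() (interface{}, error) {
		// Bound the refresh so a slow query cannot run forever.
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		defer cancel()
		product, err := repository.GetProduct(ctx, id)
		if err != nil {
			return nil, err
		}
		cache.Set(id, product)
		return nil, nil
	})
}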
Even if multiple refreshes trigger simultaneously:
1 refresh
actually executes.
The Distributed Problem
singleflight only works:
inside one process
But most production systems run:
Pod 1
Pod 2
Pod 3
Pod 4
Each pod has its own memory.
Each pod has its own singleflight group.
Now:
4 pods
may still execute:
4 identical DB queries
Distributed Request Coalescing
A common solution uses Redis locks.
// redis here is the client value, not the package.
ok, err := redis.SetNX(ctx, "lock:product:"+id, "1", 10*time.Second).Result()
If lock acquired:
refresh cache
Otherwise:
wait for cache population
This reduces duplicate rebuilds across an entire cluster.
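The acquire-or-wait flow can be sketched without a Redis dependency by hiding the cache and the lock behind an interface. Everything named here (Backend, getWithLock, memBackend, countClusterLoads) is hypothetical: the in-memory fake ignores the lock TTL, and a real Redis lock should also store a unique token so that a holder only deletes a lock it still owns.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// Backend is a hypothetical interface over the shared cache plus a
// SET NX style lock. It is not a real Redis client API.
type Backend interface {
	Get(key string) (string, bool)
	Set(key, value string)
	TryLock(key string, ttl time.Duration) bool
	Unlock(key string)
}

// getWithLock returns the cached value, rebuilding it under a cluster-wide
// lock when needed. Losers poll until the cache fills or maxPolls runs out.
func getWithLock(b Backend, key string, maxPolls int, load func() (string, error)) (string, error) {
	lockKey := "lock:" + key
	for attempt := 0; attempt < maxPolls; attempt++ {
		if v, ok := b.Get(key); ok {
			return v, nil
		}
		if b.TryLock(lockKey, 10*time.Second) {
			defer b.Unlock(lockKey)
			// Re-check: an earlier lock holder may already have filled the cache.
			if v, ok := b.Get(key); ok {
				return v, nil
			}
			v, err := load()
			if err != nil {
				return "", err
			}
			b.Set(key, v)
			return v, nil
		}
		time.Sleep(time.Millisecond) // fixed poll interval, for brevity
	}
	return "", errors.New("timed out waiting for cache population")
}

// memBackend is an in-memory fake used only for this demo; it ignores ttl.
type memBackend struct {
	mu    sync.Mutex
	data  map[string]string
	locks map[string]bool
}

func (m *memBackend) Get(key string) (string, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	v, ok := m.data[key]
	return v, ok
}

func (m *memBackend) Set(key, value string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
}

func (m *memBackend) TryLock(key string, _ time.Duration) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	held := m.locks[key]
	m.locks[key] = true
	return !held
}

func (m *memBackend) Unlock(key string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.locks, key)
}

// countClusterLoads simulates pods racing for one cold key and returns how
// many of them actually ran the loader.
func countClusterLoads(pods int) int {
	b := &memBackend{data: map[string]string{}, locks: map[string]bool{}}
	loads := 0
	load := func() (string, error) {
		loads++ // safe: only the current lock holder runs the loader
		return "widget", nil
	}
	var wg sync.WaitGroup
	wg.Add(pods)
	for i := 0; i < pods; i++ {
		go func() {
			defer wg.Done()
			getWithLock(b, "product:42", 1000, load)
		}()
	}
	wg.Wait()
	return loads
}
```

The re-check after acquiring the lock is what keeps the count at one: a pod that missed the cache, then won the lock only after the first holder released it, finds the value already populated and skips the load.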
Cache Expiration Is Also Dangerous
Another hidden problem:
1 million keys
all expire at:
12:00 PM
Then:
12:00 PM
becomes a database apocalypse.
Randomized Expiration (TTL Jitter)
Instead of:
TTL = 1 hour
use:
TTL = 1 hour + random jitter
Example:
ttl := time.Hour + time.Duration(rand.Intn(300))*time.Second // adds 0 to 299 seconds
Now expirations spread naturally over time.
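Wrapped in a small helper, the jitter calculation might look like this. jitteredTTL is a hypothetical name, and the guard is there because rand.Int63n panics when its argument is not positive.

```go
package main

import (
	"math/rand"
	"time"
)

// jitteredTTL returns base plus a random extra duration in [0, maxJitter).
func jitteredTTL(base, maxJitter time.Duration) time.Duration {
	if maxJitter <= 0 {
		return base // rand.Int63n panics on non-positive arguments
	}
	return base + time.Duration(rand.Int63n(int64(maxJitter)))
}
```

A call such as jitteredTTL(time.Hour, 5*time.Minute) yields a TTL between 60 and 65 minutes, so keys written in the same instant no longer expire in the same instant.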
Production Metrics That Matter
Most teams only track:
Cache Hit Ratio
That's not enough.
Track:
Cache Hit Ratio
hits / total requests
Shared Request Ratio
singleflight shared requests
Measures stampede prevention effectiveness.
Cache Rebuild Count
cache refreshes / minute
Detects excessive invalidation.
Database Fallback Rate
cache misses hitting DB
Should remain stable.
Spikes indicate problems.
Refresh Latency
time spent rebuilding cache
Slow refreshes often precede incidents.
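As a rough sketch, the counters behind these ratios can be kept with sync/atomic and exported to whatever metrics system you use. CacheMetrics and its methods are invented for illustration, and defining the shared ratio as shared results divided by misses is an assumption of this example rather than a standard definition.

```go
package main

import "sync/atomic"

// CacheMetrics holds the raw counters behind the ratios listed above.
type CacheMetrics struct {
	hits, misses, shared, rebuilds atomic.Int64
}

// RecordHit counts a request served from the cache.
func (m *CacheMetrics) RecordHit() { m.hits.Add(1) }

// RecordMiss counts a request that fell through the cache. shared should be
// the third return value of singleflight's Do for that request.
func (m *CacheMetrics) RecordMiss(shared bool) {
	m.misses.Add(1)
	if shared {
		m.shared.Add(1)
	}
}

// RecordRebuild counts one actual load from the database.
func (m *CacheMetrics) RecordRebuild() { m.rebuilds.Add(1) }

// ratio returns part/total, or 0 when nothing has been recorded yet.
func ratio(part, total int64) float64 {
	if total == 0 {
		return 0
	}
	return float64(part) / float64(total)
}

// HitRatio is hits divided by all lookups.
func (m *CacheMetrics) HitRatio() float64 {
	hits, misses := m.hits.Load(), m.misses.Load()
	return ratio(hits, hits+misses)
}

// SharedRatio is the fraction of misses whose result was shared.
func (m *CacheMetrics) SharedRatio() float64 {
	return ratio(m.shared.Load(), m.misses.Load())
}

// Rebuilds returns the number of database loads recorded so far.
func (m *CacheMetrics) Rebuilds() int64 { return m.rebuilds.Load() }
```

Comparing Rebuilds against the miss count over a time window shows how many database queries coalescing actually avoided.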
A Real Production Incident
A product catalog service served:
~80k requests/minute
One Redis key expired.
Without request coalescing:
11,000 identical queries
hit PostgreSQL within seconds.
Database CPU jumped:
15% → 100%
Latency increased:
40ms → 9s
After introducing:
- singleflight
- stale-while-revalidate
- TTL jitter
the same event generated:
1 database query
instead of thousands.
The incident never occurred again.
Key Takeaways
✅ Cache stampedes are often more dangerous than cache misses
✅ singleflight is one of the simplest and most effective protections available in Go
✅ Request coalescing turns thousands of duplicate requests into one execution
✅ Stale-While-Revalidate often beats synchronous cache rebuilding
✅ singleflight is process-local and does not solve distributed coordination
✅ Monitor cache rebuilds, shared requests, and fallback rates—not just hit ratio
Most caching discussions focus on speed.
Production caching is really about survivability under load.
Because the most expensive database query is not the slow one.
It's the same query executed 10,000 times simultaneously. 🚀