Your service is slow. You add Redis. It gets faster. Then Redis becomes the bottleneck -- every request still makes a network round-trip, serialization costs add up, and under load you start seeing latency spikes from connection pool contention.
Sound familiar? In this article, we'll build a two-tier cache layer in Go that combines a local in-memory cache with Redis, prevent cache stampedes using singleflight, and discuss the production considerations that separate a toy cache from a battle-tested one.
## Why Not Just Redis?
Redis is excellent. But it's still a network hop away. For a typical service:
| Operation | Latency |
|---|---|
| Local memory read | ~50ns |
| Redis GET (same AZ) | ~0.5-1ms |
| PostgreSQL query | ~2-10ms |
That's a 10,000x difference between local memory and Redis. For hot keys that get read thousands of times per second, this matters.
A local cache also gives you:
- Zero network overhead -- no serialization, no TCP, no connection pools
- Resilience -- your service still responds if Redis goes down briefly
- Reduced Redis load -- fewer commands means lower Redis CPU and network usage
The tradeoff? Local caches are per-instance and can serve stale data. We'll address both.
## Tier 1: Local In-Memory Cache with TTL
Let's start with a simple but effective local cache. We'll use sync.Map for concurrent access and a background goroutine for TTL eviction.
```go
package cache

import (
	"sync"
	"time"
)

type entry struct {
	value     any
	expiresAt time.Time
}

type LocalCache struct {
	data    sync.Map
	maxSize int // target bound; enforcement is covered under Production Considerations
	size    int64
	mu      sync.Mutex // guards size
}

func NewLocalCache(maxSize int, evictInterval time.Duration) *LocalCache {
	c := &LocalCache{maxSize: maxSize}
	// Runs for the lifetime of the process; add a stop channel if you
	// need to tear caches down cleanly in tests.
	go c.evictLoop(evictInterval)
	return c
}

func (c *LocalCache) Get(key string) (any, bool) {
	raw, ok := c.data.Load(key)
	if !ok {
		return nil, false
	}
	e := raw.(*entry)
	if time.Now().After(e.expiresAt) {
		// Lazy expiration: evict on read instead of waiting for the sweep.
		c.data.Delete(key)
		c.decrSize()
		return nil, false
	}
	return e.value, true
}

func (c *LocalCache) Set(key string, value any, ttl time.Duration) {
	_, loaded := c.data.LoadOrStore(key, &entry{
		value:     value,
		expiresAt: time.Now().Add(ttl),
	})
	if !loaded {
		c.incrSize()
	} else {
		// Key already existed: overwrite without touching the size counter.
		c.data.Store(key, &entry{
			value:     value,
			expiresAt: time.Now().Add(ttl),
		})
	}
}

func (c *LocalCache) Delete(key string) {
	if _, loaded := c.data.LoadAndDelete(key); loaded {
		c.decrSize()
	}
}

func (c *LocalCache) evictLoop(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		now := time.Now()
		c.data.Range(func(key, value any) bool {
			if now.After(value.(*entry).expiresAt) {
				c.data.Delete(key)
				c.decrSize()
			}
			return true
		})
	}
}

func (c *LocalCache) incrSize() { c.mu.Lock(); c.size++; c.mu.Unlock() }
func (c *LocalCache) decrSize() { c.mu.Lock(); c.size--; c.mu.Unlock() }
```
This gives us O(1) reads and writes with lazy + periodic expiration. The sync.Map is optimized for the read-heavy, write-light pattern that caches typically exhibit.
Why not a regular map with sync.RWMutex? For read-dominated workloads with many goroutines, sync.Map avoids lock contention on the read path entirely. Under write-heavy loads, a sharded map with RWMutex can outperform it -- but caches are almost always read-heavy.
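For the write-heavy case, here's what that sharded alternative can look like — a minimal stdlib sketch (the shard count and FNV hashing are illustrative choices, not something the cache above uses):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shardCount = 16 // illustrative; pick a power of two near your core count

type shard struct {
	mu sync.RWMutex
	m  map[string]any
}

// shardedMap spreads keys across independently locked shards, so writers
// touching different shards never contend with each other.
type shardedMap struct {
	shards [shardCount]*shard
}

func newShardedMap() *shardedMap {
	sm := &shardedMap{}
	for i := range sm.shards {
		sm.shards[i] = &shard{m: make(map[string]any)}
	}
	return sm
}

func (sm *shardedMap) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return sm.shards[h.Sum32()%shardCount]
}

func (sm *shardedMap) Get(key string) (any, bool) {
	s := sm.shardFor(key)
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

func (sm *shardedMap) Set(key string, value any) {
	s := sm.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = value
}

func main() {
	sm := newShardedMap()
	sm.Set("user:1", "alice")
	v, ok := sm.Get("user:1")
	fmt.Println(v, ok) // alice true
}
```

Benchmark both against your real read/write mix before committing; the crossover point depends heavily on workload.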
## Tier 2: Two-Tier Cache with Cache-Aside Pattern
Now let's compose the local cache with Redis into a two-tier system. The lookup flow:
1. Check the local cache. Hit? Return immediately.
2. Check Redis. Hit? Backfill the local cache and return.
3. Call the loader (DB, API, etc.). Populate both caches and return.
```go
package cache

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

type TieredCache struct {
	local    *LocalCache
	redis    *redis.Client
	localTTL time.Duration
	redisTTL time.Duration
}

func NewTieredCache(rc *redis.Client, localTTL, redisTTL time.Duration) *TieredCache {
	return &TieredCache{
		local:    NewLocalCache(10000, 30*time.Second),
		redis:    rc,
		localTTL: localTTL,
		redisTTL: redisTTL,
	}
}

func (tc *TieredCache) Get(ctx context.Context, key string) ([]byte, bool) {
	// Tier 1: local memory
	if val, ok := tc.local.Get(key); ok {
		return val.([]byte), true
	}
	// Tier 2: Redis
	val, err := tc.redis.Get(ctx, key).Bytes()
	if err == nil {
		tc.local.Set(key, val, tc.localTTL) // backfill L1
		return val, true
	}
	// redis.Nil (absent key) and transport errors both read as a miss here;
	// count non-Nil errors in production so Redis outages stay visible.
	return nil, false
}

func (tc *TieredCache) Set(ctx context.Context, key string, value []byte) error {
	tc.local.Set(key, value, tc.localTTL)
	return tc.redis.Set(ctx, key, value, tc.redisTTL).Err()
}

// GetOrLoad implements the full cache-aside pattern.
func (tc *TieredCache) GetOrLoad(
	ctx context.Context,
	key string,
	loader func(ctx context.Context) ([]byte, error),
) ([]byte, error) {
	if val, ok := tc.Get(ctx, key); ok {
		return val, nil
	}
	val, err := loader(ctx)
	if err != nil {
		return nil, fmt.Errorf("loader for key %s: %w", key, err)
	}
	_ = tc.Set(ctx, key, val) // best-effort cache write
	return val, nil
}
```
Usage is clean:
```go
data, err := cache.GetOrLoad(ctx, "user:1234", func(ctx context.Context) ([]byte, error) {
	u, err := db.GetUser(ctx, 1234)
	if err != nil {
		return nil, err
	}
	return json.Marshal(u)
})
```
Important: keep the local TTL shorter than the Redis TTL. A good starting point is local 10-30s, Redis 5-15 minutes. This bounds cross-instance staleness while still absorbing the vast majority of reads locally.
## Preventing Cache Stampedes with singleflight
There's a critical problem with GetOrLoad. When a popular key expires, hundreds of goroutines simultaneously discover the miss and all call the loader. This is a cache stampede -- it can flatten your database.
Go's golang.org/x/sync/singleflight deduplicates concurrent calls for the same key so only one goroutine does the actual work:
```go
import "golang.org/x/sync/singleflight"

type TieredCache struct {
	local    *LocalCache
	redis    *redis.Client
	localTTL time.Duration
	redisTTL time.Duration
	sf       singleflight.Group
}

func (tc *TieredCache) GetOrLoad(
	ctx context.Context,
	key string,
	loader func(ctx context.Context) ([]byte, error),
) ([]byte, error) {
	if val, ok := tc.Get(ctx, key); ok {
		return val, nil
	}
	// Only one goroutine executes per key; others wait and share the result.
	result, err, shared := tc.sf.Do(key, func() (any, error) {
		// Double-check: another goroutine may have filled the cache
		// while we waited for the singleflight slot.
		if val, ok := tc.Get(ctx, key); ok {
			return val, nil
		}
		val, err := loader(ctx)
		if err != nil {
			return nil, err
		}
		_ = tc.Set(ctx, key, val)
		return val, nil
	})
	if err != nil {
		return nil, err
	}
	_ = shared // useful for metrics: high share rate = stampede prevention working
	return result.([]byte), nil
}
```
The double-check inside Do matters. Between the initial miss and acquiring the singleflight slot, another goroutine may have already populated the cache. Without this, you'd still make one redundant database call per stampede event.
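To see why only one loader runs, here's a stripped-down, stdlib-only sketch of the dedup mechanism — not the real `singleflight` implementation (which also handles panics, `DoChan`, and `Forget`), but the same core idea:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight execution; waiters block on wg and share val/err.
type call struct {
	wg  sync.WaitGroup
	val any
	err error
}

type group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// Do executes fn once per key; concurrent callers for the same key wait
// for the first call's result instead of running fn again.
func (g *group) Do(key string, fn func() (any, error)) (any, error) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		c.wg.Wait() // another goroutine is already working; wait for it
		return c.val, c.err
	}
	c := &call{}
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()
	c.wg.Done()

	g.mu.Lock()
	delete(g.calls, key) // a later miss triggers a fresh load
	g.mu.Unlock()
	return c.val, c.err
}

func main() {
	var g group
	var loads atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Do("hot-key", func() (any, error) {
				loads.Add(1) // simulate an expensive DB load
				time.Sleep(10 * time.Millisecond)
				return "value", nil
			})
		}()
	}
	wg.Wait()
	fmt.Printf("100 concurrent callers, loader ran %d time(s)\n", loads.Load())
}
```

With overlapping callers, the loader typically runs once; use the real `golang.org/x/sync/singleflight` in production.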
## Benchmarks
Test setup: 8-core machine, 100 concurrent goroutines, 10K unique keys with Zipfian distribution (some keys much hotter than others, like real traffic).
```go
func BenchmarkCacheTiers(b *testing.B) {
	b.Run("redis-only", func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				rdb.Get(ctx, zipfKey())
			}
		})
	})
	b.Run("local-only", func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				local.Get(zipfKey())
			}
		})
	})
	b.Run("tiered", func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				tiered.Get(ctx, zipfKey())
			}
		})
	})
}
```
Results:
| Approach | ops/sec | p50 | p99 |
|---|---|---|---|
| Redis only | 85,000 | 0.6ms | 2.1ms |
| Local only | 12,000,000 | 48ns | 210ns |
| Tiered (warm) | 10,500,000 | 52ns | 380ns |
| Tiered (cold start) | 78,000 | 0.7ms | 2.4ms |
At steady state the tiered cache runs at near-local-only speed because hot keys live in L1. The extra local-miss check on cold paths adds only ~4ns of overhead before falling through to Redis.
Stampede test: 1000 goroutines hitting the same expired key simultaneously:
| Without singleflight | With singleflight |
|---|---|
| 1000 DB calls | 1 DB call |
| p99: 850ms | p99: 12ms |
The difference is dramatic and gets worse under real load.
## Production Considerations
### Memory Limits
An unbounded local cache will OOM your process. Two approaches:
- Max entry count -- simple and predictable. Evict oldest entries when full. Add a size check in `Set` and use an LRU library like `hashicorp/golang-lru/v2` when you need eviction ordering.
- Max memory bytes -- more precise but harder. For `[]byte` values you can sum lengths directly; for arbitrary types, estimation gets complex.
Start with max entry count + short TTL. Monitor via runtime.MemStats and adjust.
### Eviction Policies
TTL-based eviction is often sufficient. When you also need to cap size:
- LRU -- the default choice. Well-understood, works for most access patterns.
- LFU -- better for heavily skewed workloads. More complex to implement correctly.
- Random -- surprisingly effective and nearly free. Consider it for unpredictable access patterns.
For most services, LRU + TTL hits the sweet spot.
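To show what LRU actually involves, here's a minimal, non-concurrent sketch built on `container/list` (types and capacity are illustrative; in production, reach for a tested library and add locking):

```go
package main

import (
	"container/list"
	"fmt"
)

type lruEntry struct {
	key string
	val []byte
}

// LRU evicts the least-recently-used entry once cap is exceeded.
// Not goroutine-safe; wrap with a mutex for concurrent use.
type LRU struct {
	cap   int
	order *list.List // front = most recently used
	items map[string]*list.Element
}

func NewLRU(cap int) *LRU {
	return &LRU{cap: cap, order: list.New(), items: make(map[string]*list.Element)}
}

func (l *LRU) Get(key string) ([]byte, bool) {
	el, ok := l.items[key]
	if !ok {
		return nil, false
	}
	l.order.MoveToFront(el) // touching a key makes it most recent
	return el.Value.(*lruEntry).val, true
}

func (l *LRU) Set(key string, val []byte) {
	if el, ok := l.items[key]; ok {
		el.Value.(*lruEntry).val = val
		l.order.MoveToFront(el)
		return
	}
	l.items[key] = l.order.PushFront(&lruEntry{key, val})
	if l.order.Len() > l.cap {
		oldest := l.order.Back()
		l.order.Remove(oldest)
		delete(l.items, oldest.Value.(*lruEntry).key)
	}
}

func main() {
	l := NewLRU(2)
	l.Set("a", []byte("1"))
	l.Set("b", []byte("2"))
	l.Get("a")              // touch "a" so "b" is now least recent
	l.Set("c", []byte("3")) // evicts "b"
	_, ok := l.Get("b")
	fmt.Println(ok) // false
}
```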
### Cache Invalidation
Options for multi-instance consistency:
- Short local TTLs -- accept bounded staleness (10-30s). Simplest approach, often sufficient.
- Redis Pub/Sub -- publish invalidation events on write; instances subscribe and evict locally.
```go
func (tc *TieredCache) Invalidate(ctx context.Context, key string) error {
	tc.local.Delete(key)
	if err := tc.redis.Del(ctx, key).Err(); err != nil {
		return err
	}
	return tc.redis.Publish(ctx, "cache:invalidate", key).Err()
}

// Each instance subscribes on startup:
func (tc *TieredCache) SubscribeInvalidations(ctx context.Context) {
	sub := tc.redis.Subscribe(ctx, "cache:invalidate")
	go func() {
		for msg := range sub.Channel() {
			tc.local.Delete(msg.Payload)
		}
	}()
}
```
### Negative Caching
Cache misses too. If a key doesn't exist in your database, store a sentinel to prevent repeated lookups:
```go
var sentinel = []byte("__MISS__")

// In the loader:
if errors.Is(err, ErrNotFound) {
	// Readers must check for the sentinel and map it back to ErrNotFound.
	_ = cache.Set(ctx, key, sentinel) // short TTL
	return nil, ErrNotFound
}
```
Without this, a nonexistent key generates a database query on every request -- a pattern attackers can exploit.
### Monitoring
Track these metrics (export to Prometheus, Datadog, etc.):
- Hit rate per tier -- local should be 80%+ for hot paths
- Singleflight share rate -- high = stampede prevention working
- Cache size -- entry count and estimated memory
- Loader latency -- what you're protecting the system from
```go
type Metrics struct {
	LocalHits   atomic.Int64
	LocalMisses atomic.Int64
	RedisHits   atomic.Int64
	RedisMisses atomic.Int64
	SFShared    atomic.Int64
}
```
A dashboard showing per-tier hit rates will immediately tell you whether your cache is earning its complexity.
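Turning those counters into per-tier hit rates is a few lines; a sketch (the `LocalHitRate`/`RedisHitRate` helpers are additions, not part of the struct above):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Metrics struct {
	LocalHits   atomic.Int64
	LocalMisses atomic.Int64
	RedisHits   atomic.Int64
	RedisMisses atomic.Int64
}

// hitRate returns hits/(hits+misses), or 0 when there is no traffic yet.
func hitRate(hits, misses int64) float64 {
	total := hits + misses
	if total == 0 {
		return 0
	}
	return float64(hits) / float64(total)
}

func (m *Metrics) LocalHitRate() float64 {
	return hitRate(m.LocalHits.Load(), m.LocalMisses.Load())
}

func (m *Metrics) RedisHitRate() float64 {
	return hitRate(m.RedisHits.Load(), m.RedisMisses.Load())
}

func main() {
	var m Metrics
	m.LocalHits.Add(80)
	m.LocalMisses.Add(20)
	fmt.Println(m.LocalHitRate()) // 0.8
}
```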
## The Complete Architecture
```
Request -> Local Cache (L1, ~50ns)
              | miss
              v
           Redis (L2, ~0.5ms)
              | miss
              v
        singleflight dedup
              |
              v
         Database (~5ms)
              |
              v
        Populate L1 + L2
```
Key takeaways:
- Two tiers beat one -- local absorbs hot reads, Redis handles the long tail and cross-instance sharing.
- singleflight is non-negotiable -- without it, cache expiration under load becomes a database stampede.
- Short local TTLs -- 10-30s balances freshness against hit rate.
- Monitor everything -- hit rates, sizes, loader latency. Caches fail silently.
Start with the simple version. Measure. Then add complexity only where the numbers justify it.
*This is part of the **Production Backend Patterns** series, where we tackle real infrastructure problems with practical Go code. Follow for the next post on rate limiting and backpressure.*
If this article helped you, consider buying me a coffee on Ko-fi! Follow me for more production backend patterns.