Aishwarya B R
Caching Patterns That Scale

You're writing a research paper. Every time you need a fact, you walk to the library (10 minutes away), search for the book (20 minutes), read the page (5 minutes), return the book, walk home (10 minutes).

For 20 facts, that's 15 hours of walking and searching.

Or: First trip, photocopy the relevant pages. Keep them on your desk. Look up facts instantly. One trip, 45 minutes total.

That's caching. Stop making the same expensive trip repeatedly.

Core idea: Trade memory for speed.


The Real-World Pain

You're building a social media feed. Users refresh constantly. Right now:

  • 12.7 seconds per feed load
  • Database melts down at 10K concurrent users
  • Paying $50K/month for oversized instances
  • Database connection pool exhausted (max 100 connections)
  • Database CPU hits 100%

The platform grinds to a halt.


The Refactoring Journey

You can't keep hitting the database for every request. Here's how most teams evolve their caching strategy.

Attempt 1: Cache Everything With Fixed TTL (25%)

The idea: Just cache it all. Redis is fast. Problem solved, right?

feed = cache.get("user:123:feed")
if (!feed) {
  feed = database.query(...)
  cache.set("user:123:feed", feed, ttl=300)  // 5 minutes
}
return feed

What happens: It works! Data refreshes. Memory stays bounded. But then you notice something odd. Every 5 minutes, database load spikes to 100%. All cache entries expire at once. It's like a dam breaking every 5 minutes.

Also, user profiles (which barely change) get the same 5-minute TTL as rapidly updating feeds. Wasteful.

| ✅ Achieved | ❌ Still Pending |
| --- | --- |
| Basic caching works | No invalidation strategy |
| Reduced initial DB load | Thundering herd (all expire together) |
| Faster repeated requests | No granular control |
| Data eventually refreshes | Cache hit rate drops on mass expiry |
| Memory stays bounded | |
| Automatic cleanup | |

Attempt 2: Smart Caching with Invalidation (50%)

The evolution: Different TTLs for different data. Cache-aside pattern. Write-through when needed. Event-driven invalidation.

// Read with cache-aside
feed = cache.get(key)
if (!feed) {
  feed = database.query(...)
  cache.set(key, feed, ttl=30)  // Short TTL for feeds
}

// Write-through for critical data
database.update(userId, data)
cache.set(`user:${userId}`, data)  // Update cache immediately
eventBus.publish("UserUpdated", userId)  // Notify dependents

// Event handler invalidates related caches
onUserUpdated((userId) => {
  cache.delete(`user:${userId}:feed`)
  // Invalidate followers' feeds too
  followers = getFollowers(userId)
  followers.forEach(f => cache.delete(`user:${f}:feed`))
})

What happens: Much better! Fresh data after updates. But edge cases emerge.

| ✅ Achieved | ❌ Still Pending |
| --- | --- |
| Fresh data after updates | Race conditions (write during read) |
| Granular TTL control | No thundering herd prevention |
| Event-driven invalidation | Cold start hammers DB |
| Write-through for consistency | Single point of failure |

Attempt 3: Thundering Herd Prevention (75%)

The breakthrough: The database dies when popular cache keys expire simultaneously. 10,000 requests hit the database at once. You need to ensure only ONE request fetches from the database while others wait.

Key innovations:

  • Distributed locks: Only one request refreshes expired data
  • Probabilistic early expiration: Stagger refreshes before TTL expires
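
A minimal sketch of both ideas, assuming the ioredis client and a caller-supplied fetchFeedFromDb function; the linear refresh probability is a simplified stand-in for the usual exponential (XFetch-style) formula:

const Redis = require("ioredis");
const redis = new Redis();

const TTL_SECONDS = 300;   // base TTL for the cached feed
const EARLY_WINDOW = 30;   // start refreshing up to 30s before expiry

async function getFeed(key, fetchFeedFromDb) {
  const cached = await redis.get(key);
  const ttlLeft = await redis.ttl(key);  // seconds remaining, negative if missing

  // Probabilistic early expiration: the closer the key is to expiring,
  // the more likely this request volunteers to refresh it early.
  const refreshEarly =
    cached && ttlLeft >= 0 && ttlLeft < EARLY_WINDOW &&
    Math.random() < (EARLY_WINDOW - ttlLeft) / EARLY_WINDOW;

  if (cached && !refreshEarly) return JSON.parse(cached);

  // Distributed lock: only the request that wins SET NX queries the database.
  const lockKey = `${key}:lock`;
  const gotLock = await redis.set(lockKey, "1", "EX", 10, "NX");

  if (!gotLock) {
    // Another request is already refreshing: serve stale data if we have it,
    // otherwise wait briefly and re-check the cache.
    if (cached) return JSON.parse(cached);
    await new Promise((resolve) => setTimeout(resolve, 100));
    return getFeed(key, fetchFeedFromDb);
  }

  try {
    const fresh = await fetchFeedFromDb();
    await redis.set(key, JSON.stringify(fresh), "EX", TTL_SECONDS);
    return fresh;
  } finally {
    await redis.del(lockKey);  // the lock's own TTL prevents deadlocks if we crash first
  }
}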

What happens: Traffic spike at noon. Your most popular user's feed (1M followers) expires from cache. 50,000 requests arrive in the next second. Without protection, that's 50,000 database queries. Instant death.

Impact:

  • Without protection: 10,000 concurrent requests = 10,000 DB queries = database death
  • With distributed lock: 10,000 concurrent requests = 1 DB query = smooth sailing
  • With probabilistic refresh: Refreshes spread over 30 seconds = no expiration spike

| ✅ Achieved | ❌ Still Pending |
| --- | --- |
| Thundering herd eliminated | Cold start still painful |
| Smooth cache refresh | No failure handling |
| Database protected during spikes | Single-layer cache |
| Staggered expirations | |

Attempt 4: Production-Ready Caching (100%) ✨

The final form: Everything from Attempt 3, plus multi-layer caching, intelligent warming, circuit breakers, and graceful degradation.


Architecture:

Request → L1 (memory, 1ms) → L2 (Redis, 7ms) → Database (200ms)
                                ↓ (if down)
                            Fallback to DB

Key additions:

Multi-layer caching:

  • L1 (in-memory): 100ms TTL, prevents Redis hammering during traffic spikes
  • L2 (Redis): 5min TTL, shared across all servers
  • L3 (Database): Source of truth
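
A sketch of the layered read path, assuming ioredis for L2 and a plain in-process Map for L1; the circuit breaker is reduced to a single boolean flag for brevity, and loadFromDb stands in for the real database query:

const Redis = require("ioredis");
const redis = new Redis();

const l1 = new Map();       // L1: per-server memory, very short TTL
let redisHealthy = true;    // flipped to false by the circuit breaker on Redis errors

async function getCached(key, loadFromDb) {
  // L1: in-process memory (~1ms)
  const entry = l1.get(key);
  if (entry && entry.expiresAt > Date.now()) return entry.value;

  // L2: Redis (~7ms), skipped entirely while the breaker is open
  if (redisHealthy) {
    try {
      const hit = await redis.get(key);
      if (hit) {
        const value = JSON.parse(hit);
        l1.set(key, { value, expiresAt: Date.now() + 100 });  // 100ms L1 TTL
        return value;
      }
    } catch (err) {
      redisHealthy = false;                              // open the breaker
      setTimeout(() => { redisHealthy = true; }, 5000);  // retry Redis after 5s
    }
  }

  // L3: database (~200ms), the source of truth; repopulate the layers above
  const value = await loadFromDb(key);
  l1.set(key, { value, expiresAt: Date.now() + 100 });
  if (redisHealthy) {
    await redis.set(key, JSON.stringify(value), "EX", 300).catch(() => {});
  }
  return value;
}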

Cache warming on startup
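
A sketch of that warming step, assuming hypothetical getMostActiveUserIds and fetchFeedFromDb helpers; batches load in parallel, but run one after another so the warm-up doesn't become its own thundering herd against the database:

const Redis = require("ioredis");
const redis = new Redis();

// Pre-load feeds for the hottest users before the server starts taking traffic.
async function warmCache(batchSize = 100) {
  const userIds = await getMostActiveUserIds(10000);  // hypothetical: hottest users first

  for (let i = 0; i < userIds.length; i += batchSize) {
    const batch = userIds.slice(i, i + batchSize);
    // Each batch loads in parallel; batches run sequentially to limit DB load.
    await Promise.all(
      batch.map(async (userId) => {
        const feed = await fetchFeedFromDb(userId);    // hypothetical DB helper
        await redis.set(`user:${userId}:feed`, JSON.stringify(feed), "EX", 300);
      })
    );
  }
}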

What happens: L1 cache hits return in 1ms. L2 hits in 7ms. Database queries only happen for cache misses or cold data. No thundering herd. Redis failures degrade gracefully to database. App restarts are smooth. You scale to 100K concurrent users without breaking a sweat.

Real-world latency:

| Request Type | Latency | What Happened |
| --- | --- | --- |
| L1 hit (same server) | 1ms | Read from Node.js memory |
| L2 hit (Redis) | 7ms | Network + Redis lookup |
| Database miss | 207ms | Full query + cache populate |

Impact: 80% L1 hit rate + 15% L2 hit rate = 95% of requests under 10ms. Only 5% hit database.

✅ Production Ready:

  • Multi-layer caching (L1: memory, L2: Redis, L3: DB)
  • Cache warming on startup with parallel batch processing
  • Thundering herd prevention (distributed locks)
  • Probabilistic refresh (stagger expirations)
  • Circuit breaker for Redis failures
  • Graceful degradation (fallback to DB)
  • Lock safety (TTL prevents deadlocks)
  • Proper error handling
  • Monitoring & alerting

Sizing Your Cache

The 80/20 Rule (Pareto Principle): 20% of your data gets 80% of the traffic.

You have 10 million users. Do you cache all feeds? No. Cache the active users.

Real example calculation:

Total users: 10,000,000
Active daily users: 2,000,000 (20%)
Active hourly users: 500,000 (5%)

Average feed size: 50 posts × 500 bytes = 25 KB
Cache budget: 16 GB (Redis instance size)

Option 1: Cache all users
10M × 25 KB = 250 GB ❌ (exceeds budget by 15x)

Option 2: Cache active hourly users
500K × 25 KB = 12.5 GB ✅ (fits with room to spare)

Hit rate: 95% (active users cached)
Miss rate: 5% (inactive users hit DB directly)

L1 vs L2 sizing:

| Layer | Size | TTL | Purpose |
| --- | --- | --- | --- |
| L1 (memory) | 100 MB | 100ms | Prevent Redis hammering during traffic spikes |
| L2 (Redis) | 16-64 GB | 5-30min | Main cache layer, shared across servers |
| L3 (Database) | Unlimited | Forever | Source of truth |

Rule of thumb: L1 should hold ~1000 of your hottest items. L2 should hold 10-20% of your dataset.
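
A minimal L1 sketch that enforces both the item cap and the short TTL; the numbers follow the rule of thumb above and the sizing table:

// Tiny LRU-style L1 cache: a hard item cap plus a short TTL.
class L1Cache {
  constructor(maxItems = 1000, ttlMs = 100) {
    this.maxItems = maxItems;
    this.ttlMs = ttlMs;
    this.map = new Map();  // Map preserves insertion order: oldest entry first
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry || entry.expiresAt < Date.now()) return undefined;
    // Re-insert to mark the entry as most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    if (!this.map.has(key) && this.map.size >= this.maxItems) {
      // Evict the least recently used entry (the first key in the Map).
      this.map.delete(this.map.keys().next().value);
    }
    this.map.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}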


Cache Invalidation Strategies

| Strategy | Example | When to Use |
| --- | --- | --- |
| TTL-based | Profiles (5min), Posts (30s), Static (24hr) | Different data change frequencies |
| Event-based | User updates profile → publish event → delete cache | Immediate consistency needed |
| Dependency-based | New post → invalidate author + all 1000 followers | Complex relationships |
| Tag-based | cache.invalidateByTag('user') invalidates all user-related keys | Bulk invalidation |
| Probabilistic | Refresh randomly before expiration | Prevent thundering herd |
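
Note that invalidateByTag isn't a Redis built-in; one common way to approximate it is a Redis set per tag that records which keys belong to it. A sketch assuming ioredis:

const Redis = require("ioredis");
const redis = new Redis();

// Store the value and register its key under each tag.
async function setWithTags(key, value, ttlSeconds, tags) {
  await redis.set(key, JSON.stringify(value), "EX", ttlSeconds);
  for (const tag of tags) {
    await redis.sadd(`tag:${tag}`, key);   // the tag set remembers which keys it owns
  }
}

// Delete every key registered under a tag, then the tag set itself.
async function invalidateByTag(tag) {
  const keys = await redis.smembers(`tag:${tag}`);
  if (keys.length > 0) await redis.del(...keys);
  await redis.del(`tag:${tag}`);
}

// Usage:
//   setWithTags("user:123:profile", profile, 300, ["user", "user:123"])
//   invalidateByTag("user")   // drops every key tagged "user"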

Key Patterns Compared

Caching Patterns

| Pattern | Use When | Consistency | What You Trade |
| --- | --- | --- | --- |
| Cache-Aside | Default for most reads | Eventual | Simple, but cold cache penalty |
| Write-Through | Need strong consistency | Strong | 2× write latency |
| Write-Behind | High write volume | Eventual | Risk of data loss if cache fails |
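
Cache-aside and write-through appeared in the earlier attempts; write-behind is the least common of the three. A sketch that buffers writes in memory and flushes them in batches, reusing the database.update call from the earlier pseudocode; the flush interval is illustrative, and anything still buffered is lost if the process dies before flushing:

const Redis = require("ioredis");
const redis = new Redis();

// Write-behind: acknowledge the write from cache immediately, flush to the DB later.
const pendingWrites = new Map();   // userId -> latest data (coalesces rapid updates)

async function writeBehind(userId, data) {
  await redis.set(`user:${userId}`, JSON.stringify(data), "EX", 300);  // cache updated first
  pendingWrites.set(userId, data);                                     // queued for the next flush
}

// Flush the buffer once per second. If the process dies between flushes,
// whatever was buffered is lost - that's the trade-off in the table above.
setInterval(async () => {
  const batch = [...pendingWrites.entries()];
  pendingWrites.clear();
  for (const [userId, data] of batch) {
    await database.update(userId, data);   // database.update as in the earlier pseudocode
  }
}, 1000);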

When NOT to Cache

| Don't Cache | Why |
| --- | --- |
| Financial transactions | Always read from source of truth. Stale balance = wrong charge. |
| Real-time stock prices | Price changes every second. Cache = misleading data. |
| Passwords / PII | Security risk. Always fetch from secure storage. |
| Rarely accessed data | Cache memory wasted. Cost > benefit. |
| User permissions/auth | Stale permissions = security hole. Cache for max 30s if needed. |
| Audit logs | Legal requirement for immutability. No caching. |
| Shopping cart totals | Needs transactional consistency. Cache items, not totals. |

Redis Cluster: Scaling Beyond Single Instance

When you hit 100K+ concurrent users, single-instance Redis becomes the bottleneck. Redis Cluster distributes data across multiple nodes for horizontal scaling.

When You Need Clustering

Single Instance Limits:

  • Memory: ~64GB per instance (AWS/GCP limits)
  • Throughput: ~100K ops/sec
  • Network: ~1-2 Gbps

Signs you need clustering:

  • Redis memory usage >70% and growing
  • CPU consistently >60%
  • Network bandwidth saturating
  • Single point of failure unacceptable

How Redis Cluster Works

// Redis uses CRC16 hash slot algorithm
hash_slot = CRC16(key) % 16384

// Keys are distributed across nodes
Node 1: slots 0-5460
Node 2: slots 5461-10922
Node 3: slots 10923-16383

Critical: Hash Tags for Multi-Key Operations

// ❌ Keys may land on different nodes - multi-key commands fail (CROSSSLOT error)
redis.mget('user:123:feed', 'user:123:profile')

// ✅ Use hash tags {} to force the same slot
redis.mget('user:{123}:feed', 'user:{123}:profile')
// Only "123" inside the braces is hashed, so both keys land on the same node
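
On the client side, switching to a cluster is mostly a constructor change; a sketch assuming the ioredis Cluster client, which discovers the remaining nodes and routes each command to the node that owns its hash slot:

const Redis = require("ioredis");

// Point the client at any subset of nodes; it discovers the rest of the cluster.
const cluster = new Redis.Cluster([
  { host: "redis-node-1", port: 6379 },
  { host: "redis-node-2", port: 6379 },
  { host: "redis-node-3", port: 6379 },
]);

// Same API as a single instance: the client computes the hash slot for each key
// and routes the command to the node that owns that slot.
async function getUserData(userId) {
  // The {userId} hash tag keeps both keys in the same slot, so MGET is allowed.
  return cluster.mget(`user:{${userId}}:feed`, `user:{${userId}}:profile`);
}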

More on this in Part 2


Monitoring: What to Track

| Metric | Target | Alert If | Why | Check in Redis CLI |
| --- | --- | --- | --- | --- |
| Cache hit rate | >80% | <70% | Cache not effective | INFO stats |
| Memory usage | <80% | >90% | Running out of RAM | INFO memory |
| L1 latency | <1ms | >5ms | Memory pressure or GC issues | N/A (app-level) |
| L2 latency | <5ms | >20ms | Redis overloaded or network issues | --latency |
| Eviction rate | <100/sec | >1000/sec | Cache too small for working set | INFO stats |
| Miss rate | <20% | >30% | Ineffective caching strategy | INFO stats |
| DB query time | <50ms | >200ms | Database struggling | N/A (DB-level) |
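
Redis doesn't report the hit rate directly; it's derived from the keyspace_hits and keyspace_misses counters in INFO stats. A sketch of the calculation, assuming ioredis:

const Redis = require("ioredis");
const redis = new Redis();

// INFO stats returns plain text containing keyspace_hits and keyspace_misses counters.
async function getCacheHitRate() {
  const stats = await redis.info("stats");
  const hits = Number(stats.match(/keyspace_hits:(\d+)/)[1]);
  const misses = Number(stats.match(/keyspace_misses:(\d+)/)[1]);
  return hits / (hits + misses);   // e.g. 0.83 -> 83% hit rate; alert below 0.70
}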

The Bottom Line

Production caching is about resilience and observability, not just performance.

| Challenge | Solution |
| --- | --- |
| Single point of failure | Redis Cluster with replicas |
| Cache goes down | Circuit breakers, fallback to DB |
| Mystery slowness | Comprehensive monitoring + debug toolkit |
| Traffic spikes | Multi-layer cache + distributed locks |
| Can't reproduce bugs | Metric dashboards + structured logging |

Start simple. Evolve deliberately. Your database will thank you.


Building scalable systems? I write about architecture patterns and clean code. Follow me on Dev.to | GitHub
