Aishwarya B R
Caching Patterns That Scale

You're writing a research paper. Every time you need a fact, you walk to the library (10 minutes away), search for the book (20 minutes), read the page (5 minutes), return the book, walk home (10 minutes).

For 20 facts, that's 15 hours of walking and searching.

Or: First trip, photocopy the relevant pages. Keep them on your desk. Look up facts instantly. One trip, 45 minutes total.

That's caching. Stop making the same expensive trip repeatedly.

Core idea: Trade memory for speed.


The Real-World Pain

You're building a social media feed. Users refresh constantly. Right now:

  • 12.7 seconds per feed load
  • Database melts down at 10K concurrent users
  • Paying $50K/month for oversized instances
  • Database connection pool exhausted (max 100 connections)
  • Database CPU hits 100%

The platform grinds to a halt.


The Refactoring Journey

You can't keep hitting the database for every request. Here's how most teams evolve their caching strategy.

Attempt 1: Cache Everything With Fixed TTL (25%)

The idea: Just cache it all. Redis is fast. Problem solved, right?

feed = cache.get("user:123:feed")
if (!feed) {
  feed = database.query(...)
  cache.set("user:123:feed", feed, ttl=300)  // 5 minutes
}
return feed

What happens: It works! Data refreshes. Memory stays bounded. But then you notice something odd. Every 5 minutes, database load spikes to 100%. All cache entries expire at once. It's like a dam breaking every 5 minutes.

Also, user profiles (which barely change) get the same 5-minute TTL as rapidly updating feeds. Wasteful.

| ✅ Achieved | ❌ Still Pending |
| --- | --- |
| Basic caching works | No invalidation strategy |
| Reduced initial DB load | Thundering herd (all expire together) |
| Faster repeated requests | No granular control |
| Data eventually refreshes | Cache hit rate drops on mass expiry |
| Memory stays bounded | |
| Automatic cleanup | |

Attempt 2: Smart Caching with Invalidation (50%)

The evolution: Different TTLs for different data. Cache-aside pattern. Write-through when needed. Event-driven invalidation.

// Read with cache-aside
feed = cache.get(key)
if (!feed) {
  feed = database.query(...)
  cache.set(key, feed, ttl=30)  // Short TTL for feeds
}

// Write-through for critical data
database.update(userId, data)
cache.set(`user:${userId}`, data)  // Update cache immediately
eventBus.publish("UserUpdated", userId)  // Notify dependents

// Event handler invalidates related caches
onUserUpdated((userId) => {
  cache.delete(`user:${userId}:feed`)
  // Invalidate followers' feeds too
  followers = getFollowers(userId)
  followers.forEach(f => cache.delete(`user:${f}:feed`))
})

What happens: Much better! Fresh data after updates. But edge cases emerge.

| ✅ Achieved | ❌ Still Pending |
| --- | --- |
| Fresh data after updates | Race conditions (write during read) |
| Granular TTL control | No thundering herd prevention |
| Event-driven invalidation | Cold start hammers DB |
| Write-through for consistency | Single point of failure |

Attempt 3: Thundering Herd Prevention (75%)

The breakthrough: The database dies when popular cache keys expire simultaneously. 10,000 requests hit the database at once. You need to ensure only ONE request fetches from the database while others wait.

Key innovations:

  • Distributed locks: Only one request refreshes expired data
  • Probabilistic early expiration: Stagger refreshes before TTL expires
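
A minimal sketch of both ideas, assuming the ioredis client and a caller-supplied fetchFeedFromDb function; the linear refresh probability is a simplified stand-in for the usual exponential (XFetch-style) formula:

const Redis = require("ioredis");
const redis = new Redis();

const TTL_SECONDS = 300;   // base TTL for the cached feed
const EARLY_WINDOW = 30;   // start refreshing up to 30s before expiry

async function getFeed(key, fetchFeedFromDb) {
  const cached = await redis.get(key);
  const ttlLeft = await redis.ttl(key);  // seconds remaining, negative if missing

  // Probabilistic early expiration: the closer the key is to expiring,
  // the more likely this request volunteers to refresh it early.
  const refreshEarly =
    cached && ttlLeft >= 0 && ttlLeft < EARLY_WINDOW &&
    Math.random() < (EARLY_WINDOW - ttlLeft) / EARLY_WINDOW;

  if (cached && !refreshEarly) return JSON.parse(cached);

  // Distributed lock: only the request that wins SET NX queries the database.
  const lockKey = `${key}:lock`;
  const gotLock = await redis.set(lockKey, "1", "EX", 10, "NX");

  if (!gotLock) {
    // Another request is already refreshing: serve stale data if we have it,
    // otherwise wait briefly and re-check the cache.
    if (cached) return JSON.parse(cached);
    await new Promise((resolve) => setTimeout(resolve, 100));
    return getFeed(key, fetchFeedFromDb);
  }

  try {
    const fresh = await fetchFeedFromDb();
    await redis.set(key, JSON.stringify(fresh), "EX", TTL_SECONDS);
    return fresh;
  } finally {
    await redis.del(lockKey);  // the lock's own TTL prevents deadlocks if we crash first
  }
}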

What happens: Traffic spike at noon. Your most popular user's feed (1M followers) expires from cache. 50,000 requests arrive in the next second. Without protection, that's 50,000 database queries. Instant death.

Impact:

  • Without protection: 10,000 concurrent requests = 10,000 DB queries = database death
  • With distributed lock: 10,000 concurrent requests = 1 DB query = smooth sailing
  • With probabilistic refresh: Refreshes spread over 30 seconds = no expiration spike

| ✅ Achieved | ❌ Still Pending |
| --- | --- |
| Thundering herd eliminated | Cold start still painful |
| Smooth cache refresh | No failure handling |
| Database protected during spikes | Single-layer cache |
| Staggered expirations | |

Attempt 4: Production-Ready Caching (100%) ✨

The final form: Everything from Attempt 3, plus multi-layer caching, intelligent warming, circuit breakers, and graceful degradation.


Architecture:

Request → L1 (memory, 1ms) → L2 (Redis, 7ms) → Database (200ms)
                                ↓ (if down)
                            Fallback to DB

Key additions:

Multi-layer caching:

  • L1 (in-memory): 100ms TTL, prevents Redis hammering during traffic spikes
  • L2 (Redis): 5min TTL, shared across all servers
  • L3 (Database): Source of truth
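
A sketch of the layered read path, assuming ioredis for L2 and a plain in-process Map for L1; the circuit breaker is reduced to a single boolean flag for brevity, and loadFromDb stands in for the real database query:

const Redis = require("ioredis");
const redis = new Redis();

const l1 = new Map();       // L1: per-server memory, very short TTL
let redisHealthy = true;    // flipped to false by the circuit breaker on Redis errors

async function getCached(key, loadFromDb) {
  // L1: in-process memory (~1ms)
  const entry = l1.get(key);
  if (entry && entry.expiresAt > Date.now()) return entry.value;

  // L2: Redis (~7ms), skipped entirely while the breaker is open
  if (redisHealthy) {
    try {
      const hit = await redis.get(key);
      if (hit) {
        const value = JSON.parse(hit);
        l1.set(key, { value, expiresAt: Date.now() + 100 });  // 100ms L1 TTL
        return value;
      }
    } catch (err) {
      redisHealthy = false;                              // open the breaker
      setTimeout(() => { redisHealthy = true; }, 5000);  // retry Redis after 5s
    }
  }

  // L3: database (~200ms), the source of truth; repopulate the layers above
  const value = await loadFromDb(key);
  l1.set(key, { value, expiresAt: Date.now() + 100 });
  if (redisHealthy) {
    await redis.set(key, JSON.stringify(value), "EX", 300).catch(() => {});
  }
  return value;
}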

Cache warming on startup
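
A sketch of that warming step, assuming hypothetical getMostActiveUserIds and fetchFeedFromDb helpers; batches load in parallel, but run one after another so the warm-up doesn't become its own thundering herd against the database:

const Redis = require("ioredis");
const redis = new Redis();

// Pre-load feeds for the hottest users before the server starts taking traffic.
async function warmCache(batchSize = 100) {
  const userIds = await getMostActiveUserIds(10000);  // hypothetical: hottest users first

  for (let i = 0; i < userIds.length; i += batchSize) {
    const batch = userIds.slice(i, i + batchSize);
    // Each batch loads in parallel; batches run sequentially to limit DB load.
    await Promise.all(
      batch.map(async (userId) => {
        const feed = await fetchFeedFromDb(userId);    // hypothetical DB helper
        await redis.set(`user:${userId}:feed`, JSON.stringify(feed), "EX", 300);
      })
    );
  }
}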

What happens: L1 cache hits return in 1ms. L2 hits in 7ms. Database queries only happen for cache misses or cold data. No thundering herd. Redis failures degrade gracefully to database. App restarts are smooth. You scale to 100K concurrent users without breaking a sweat.

Real-world latency:

| Request Type | Latency | What Happened |
| --- | --- | --- |
| L1 hit (same server) | 1ms | Read from Node.js memory |
| L2 hit (Redis) | 7ms | Network + Redis lookup |
| Database miss | 207ms | Full query + cache populate |

Impact: 80% L1 hit rate + 15% L2 hit rate = 95% of requests under 10ms. Only 5% hit database.

✅ Production Ready:

  • Multi-layer caching (L1: memory, L2: Redis, L3: DB)
  • Cache warming on startup with parallel batch processing
  • Thundering herd prevention (distributed locks)
  • Probabilistic refresh (stagger expirations)
  • Circuit breaker for Redis failures
  • Graceful degradation (fallback to DB)
  • Lock safety (TTL prevents deadlocks)
  • Proper error handling
  • Monitoring & alerting

Sizing Your Cache

The 80/20 Rule (Pareto Principle): 20% of your data gets 80% of the traffic.

You have 10 million users. Do you cache all feeds? No. Cache the active users.

Real example calculation:

Total users: 10,000,000
Active daily users: 2,000,000 (20%)
Active hourly users: 500,000 (5%)

Average feed size: 50 posts × 500 bytes = 25 KB
Cache budget: 16 GB (Redis instance size)

Option 1: Cache all users
10M × 25 KB = 250 GB ❌ (exceeds budget by 15x)

Option 2: Cache active hourly users
500K × 25 KB = 12.5 GB ✅ (fits with room to spare)

Hit rate: 95% (active users cached)
Miss rate: 5% (inactive users hit DB directly)

L1 vs L2 sizing:

| Layer | Size | TTL | Purpose |
| --- | --- | --- | --- |
| L1 (memory) | 100 MB | 100ms | Prevent Redis hammering during traffic spikes |
| L2 (Redis) | 16-64 GB | 5-30min | Main cache layer, shared across servers |
| L3 (Database) | Unlimited | Forever | Source of truth |

Rule of thumb: L1 should hold ~1000 of your hottest items. L2 should hold 10-20% of your dataset.
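
A minimal L1 sketch that enforces both the item cap and the short TTL; the numbers follow the rule of thumb above and the sizing table:

// Tiny LRU-style L1 cache: a hard item cap plus a short TTL.
class L1Cache {
  constructor(maxItems = 1000, ttlMs = 100) {
    this.maxItems = maxItems;
    this.ttlMs = ttlMs;
    this.map = new Map();  // Map preserves insertion order: oldest entry first
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry || entry.expiresAt < Date.now()) return undefined;
    // Re-insert to mark the entry as most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    if (!this.map.has(key) && this.map.size >= this.maxItems) {
      // Evict the least recently used entry (the first key in the Map).
      this.map.delete(this.map.keys().next().value);
    }
    this.map.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}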


Cache Invalidation Strategies

| Strategy | Example | When to Use |
| --- | --- | --- |
| TTL-based | Profiles (5min), Posts (30s), Static (24hr) | Different data change frequencies |
| Event-based | User updates profile → publish event → delete cache | Immediate consistency needed |
| Dependency-based | New post → invalidate author + all 1000 followers | Complex relationships |
| Tag-based | cache.invalidateByTag('user') invalidates all user-related keys | Bulk invalidation |
| Probabilistic | Refresh randomly before expiration | Prevent thundering herd |
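
Note that invalidateByTag isn't a Redis built-in; one common way to approximate it is a Redis set per tag that records which keys belong to it. A sketch assuming ioredis:

const Redis = require("ioredis");
const redis = new Redis();

// Store the value and register its key under each tag.
async function setWithTags(key, value, ttlSeconds, tags) {
  await redis.set(key, JSON.stringify(value), "EX", ttlSeconds);
  for (const tag of tags) {
    await redis.sadd(`tag:${tag}`, key);   // the tag set remembers which keys it owns
  }
}

// Delete every key registered under a tag, then the tag set itself.
async function invalidateByTag(tag) {
  const keys = await redis.smembers(`tag:${tag}`);
  if (keys.length > 0) await redis.del(...keys);
  await redis.del(`tag:${tag}`);
}

// Usage:
//   setWithTags("user:123:profile", profile, 300, ["user", "user:123"])
//   invalidateByTag("user")   // drops every key tagged "user"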

Key Patterns Compared

Caching Patterns

| Pattern | Use When | Consistency | What You Trade |
| --- | --- | --- | --- |
| Cache-Aside | Default for most reads | Eventual | Simple, but cold cache penalty |
| Write-Through | Need strong consistency | Strong | 2× write latency |
| Write-Behind | High write volume | Eventual | Risk of data loss if cache fails |
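
Cache-aside and write-through appeared in the earlier attempts; write-behind is the least common of the three. A sketch that buffers writes in memory and flushes them in batches, reusing the database.update call from the earlier pseudocode; the flush interval is illustrative, and anything still buffered is lost if the process dies before flushing:

const Redis = require("ioredis");
const redis = new Redis();

// Write-behind: acknowledge the write from cache immediately, flush to the DB later.
const pendingWrites = new Map();   // userId -> latest data (coalesces rapid updates)

async function writeBehind(userId, data) {
  await redis.set(`user:${userId}`, JSON.stringify(data), "EX", 300);  // cache updated first
  pendingWrites.set(userId, data);                                     // queued for the next flush
}

// Flush the buffer once per second. If the process dies between flushes,
// whatever was buffered is lost - that's the trade-off in the table above.
setInterval(async () => {
  const batch = [...pendingWrites.entries()];
  pendingWrites.clear();
  for (const [userId, data] of batch) {
    await database.update(userId, data);   // database.update as in the earlier pseudocode
  }
}, 1000);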

When NOT to Cache

| Don't Cache | Why |
| --- | --- |
| Financial transactions | Always read from source of truth. Stale balance = wrong charge. |
| Real-time stock prices | Price changes every second. Cache = misleading data. |
| Passwords / PII | Security risk. Always fetch from secure storage. |
| Rarely accessed data | Cache memory wasted. Cost > benefit. |
| User permissions/auth | Stale permissions = security hole. Cache for max 30s if needed. |
| Audit logs | Legal requirement for immutability. No caching. |
| Shopping cart totals | Needs transactional consistency. Cache items, not totals. |

Redis Cluster: Scaling Beyond Single Instance

When you hit 100K+ concurrent users, single-instance Redis becomes the bottleneck. Redis Cluster distributes data across multiple nodes for horizontal scaling.

When You Need Clustering

Single Instance Limits:

  • Memory: ~64GB per instance (AWS/GCP limits)
  • Throughput: ~100K ops/sec
  • Network: ~1-2 Gbps

Signs you need clustering:

  • Redis memory usage >70% and growing
  • CPU consistently >60%
  • Network bandwidth saturating
  • Single point of failure unacceptable

How Redis Cluster Works

// Redis uses CRC16 hash slot algorithm
hash_slot = CRC16(key) % 16384

// Keys are distributed across nodes
Node 1: slots 0-5460
Node 2: slots 5461-10922
Node 3: slots 10923-16383

Critical: Hash Tags for Multi-Key Operations

// ❌ Keys may land on different nodes - multi-key commands fail (CROSSSLOT error)
redis.mget('user:123:feed', 'user:123:profile')

// ✅ Use hash tags {} to force the same slot
redis.mget('user:{123}:feed', 'user:{123}:profile')
// Only "123" inside the braces is hashed, so both keys land on the same node
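
On the client side, switching to a cluster is mostly a constructor change; a sketch assuming the ioredis Cluster client, which discovers the remaining nodes and routes each command to the node that owns its hash slot:

const Redis = require("ioredis");

// Point the client at any subset of nodes; it discovers the rest of the cluster.
const cluster = new Redis.Cluster([
  { host: "redis-node-1", port: 6379 },
  { host: "redis-node-2", port: 6379 },
  { host: "redis-node-3", port: 6379 },
]);

// Same API as a single instance: the client computes the hash slot for each key
// and routes the command to the node that owns that slot.
async function getUserData(userId) {
  // The {userId} hash tag keeps both keys in the same slot, so MGET is allowed.
  return cluster.mget(`user:{${userId}}:feed`, `user:{${userId}}:profile`);
}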

More on this in Part 2


Monitoring: What to Track

| Metric | Target | Alert If | Why | Check in Redis CLI |
| --- | --- | --- | --- | --- |
| Cache hit rate | >80% | <70% | Cache not effective | INFO stats |
| Memory usage | <80% | >90% | Running out of RAM | INFO memory |
| L1 latency | <1ms | >5ms | Memory pressure or GC issues | N/A (app-level) |
| L2 latency | <5ms | >20ms | Redis overloaded or network issues | --latency |
| Eviction rate | <100/sec | >1000/sec | Cache too small for working set | INFO stats |
| Miss rate | <20% | >30% | Ineffective caching strategy | INFO stats |
| DB query time | <50ms | >200ms | Database struggling | N/A (DB-level) |
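
Redis doesn't report the hit rate directly; it's derived from the keyspace_hits and keyspace_misses counters in INFO stats. A sketch of the calculation, assuming ioredis:

const Redis = require("ioredis");
const redis = new Redis();

// INFO stats returns plain text containing keyspace_hits and keyspace_misses counters.
async function getCacheHitRate() {
  const stats = await redis.info("stats");
  const hits = Number(stats.match(/keyspace_hits:(\d+)/)[1]);
  const misses = Number(stats.match(/keyspace_misses:(\d+)/)[1]);
  return hits / (hits + misses);   // e.g. 0.83 -> 83% hit rate; alert below 0.70
}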

The Bottom Line

Production caching is about resilience and observability, not just performance.

| Challenge | Solution |
| --- | --- |
| Single point of failure | Redis Cluster with replicas |
| Cache goes down | Circuit breakers, fallback to DB |
| Mystery slowness | Comprehensive monitoring + debug toolkit |
| Traffic spikes | Multi-layer cache + distributed locks |
| Can't reproduce bugs | Metric dashboards + structured logging |

Start simple. Evolve deliberately. Your database will thank you.


Building scalable systems? I write about architecture patterns and clean code. Follow me on Dev.to | GitHub
