You're writing a research paper. Every time you need a fact, you walk to the library (10 minutes away), search for the book (20 minutes), read the page (5 minutes), return the book, walk home (10 minutes).
For 20 facts, that's 15 hours of walking and searching.
Or: First trip, photocopy the relevant pages. Keep them on your desk. Look up facts instantly. One trip, 45 minutes total.
That's caching. Stop making the same expensive trip repeatedly.
Core idea: Trade memory for speed.
The Real-World Pain
You're building a social media feed. Users refresh constantly. Right now:
- 12.7 seconds per feed load
- Database melts down at 10K concurrent users
- Paying $50K/month for oversized instances
- Database connection pool is exhausted (max 100 connections)
- Database CPU hits 100%
The platform grinds to a halt.
The Refactoring Journey
You can't keep hitting the database for every request. Here's how most teams evolve their caching strategy.
Attempt 1: Cache Everything With Fixed TTL (25%)
The idea: Just cache it all. Redis is fast. Problem solved, right?
```javascript
let feed = await cache.get("user:123:feed")
if (!feed) {
  feed = await database.query(/* ... */)
  await cache.set("user:123:feed", feed, { ttl: 300 }) // 5 minutes
}
return feed
```
What happens: It works! Data refreshes. Memory stays bounded. But then you notice something odd. Every 5 minutes, database load spikes to 100%. All cache entries expire at once. It's like a dam breaking every 5 minutes.
Also, user profiles (which barely change) get the same 5-minute TTL as rapidly updating feeds. Wasteful.
| ✅ Achieved | ❌ Still Pending |
|---|---|
| Basic caching works | No invalidation strategy |
| Reduced initial DB load | Thundering herd (all expire together) |
| Faster repeated requests | No granular control |
| Data eventually refreshes | Cache hit rate drops on mass expiry |
| Memory stays bounded | |
| Automatic cleanup | |
Attempt 2: Smart Caching with Invalidation (50%)
The evolution: Different TTLs for different data. Cache-aside pattern. Write-through when needed. Event-driven invalidation.
```javascript
// Read with cache-aside
let feed = await cache.get(key)
if (!feed) {
  feed = await database.query(/* ... */)
  await cache.set(key, feed, { ttl: 30 }) // Short TTL for feeds
}

// Write-through for critical data
await database.update(userId, data)
await cache.set(`user:${userId}`, data) // Update cache immediately
eventBus.publish("UserUpdated", userId) // Notify dependents

// Event handler invalidates related caches
onUserUpdated(async (userId) => {
  await cache.delete(`user:${userId}:feed`)
  // Invalidate followers' feeds too
  const followers = await getFollowers(userId)
  await Promise.all(followers.map(f => cache.delete(`user:${f}:feed`)))
})
```
What happens: Much better! Fresh data after updates. But edge cases emerge.
| ✅ Achieved | ❌ Still Pending |
|---|---|
| Fresh data after updates | Race conditions (write during read) |
| Granular TTL control | No thundering herd prevention |
| Event-driven invalidation | Cold start hammers DB |
| Write-through for consistency | Single point of failure |
Attempt 3: Thundering Herd Prevention (75%)
The breakthrough: The database dies when popular cache keys expire simultaneously. 10,000 requests hit the database at once. You need to ensure only ONE request fetches from the database while others wait.
Key innovations (both sketched in the code below):
- Distributed locks: Only one request refreshes expired data
- Probabilistic early expiration: Stagger refreshes before TTL expires
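A minimal sketch of both ideas, assuming an ioredis-style client named `redis` and a caller-supplied `loadFromDb` function (both are placeholders, not the exact implementation used here):

```javascript
// Sketch only: `redis` is an ioredis-style client, `loadFromDb` is your real loader.
// This is a simple single-node lock, not a full Redlock implementation.
async function getWithLock(key, loadFromDb, ttlSeconds = 300) {
  const cached = await redis.get(key)
  if (cached) return JSON.parse(cached)

  // Only one request wins the lock; the others briefly wait and re-read the cache.
  const lockKey = `lock:${key}`
  const gotLock = await redis.set(lockKey, '1', 'EX', 10, 'NX') // lock TTL prevents deadlocks
  if (gotLock) {
    try {
      const fresh = await loadFromDb()
      await redis.set(key, JSON.stringify(fresh), 'EX', ttlSeconds)
      return fresh
    } finally {
      await redis.del(lockKey)
    }
  }

  // Lost the lock: wait a moment, then retry the cache (fall back to the DB if still empty)
  await new Promise(r => setTimeout(r, 100))
  const retry = await redis.get(key)
  return retry ? JSON.parse(retry) : loadFromDb()
}

// Probabilistic early expiration: occasionally refresh *before* the TTL runs out,
// so refreshes of hot keys spread out instead of all landing on one expiry instant.
// This is a simple linear ramp over the last 10% of the TTL, not a specific algorithm.
function shouldRefreshEarly(remainingTtlSeconds, fullTtlSeconds) {
  const fractionLeft = remainingTtlSeconds / fullTtlSeconds
  return fractionLeft < 0.1 && Math.random() < (0.1 - fractionLeft) * 10
}
```

The lock TTL matters: if the refreshing process crashes, the lock expires on its own instead of deadlocking every other request.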
What happens: Traffic spike at noon. Your most popular user's feed (1M followers) expires from cache. 50,000 requests arrive in the next second. Without protection, that's 50,000 database queries. Instant death.
Impact:
- Without protection: 10,000 concurrent requests = 10,000 DB queries = database death
- With distributed lock: 10,000 concurrent requests = 1 DB query = smooth sailing
- With probabilistic refresh: Refreshes spread over 30 seconds = no expiration spike
| ✅ Achieved | ❌ Still Pending |
|---|---|
| Thundering herd eliminated | Cold start still painful |
| Smooth cache refresh | No failure handling |
| Database protected during spikes | Single-layer cache |
| Staggered expirations | |
Attempt 4: Production-Ready Caching (100%) ✨
The final form: Everything from Attempt 3, plus multi-layer caching, intelligent warming, circuit breakers, and graceful degradation.
Architecture:
```
Request → L1 (memory, 1ms) → L2 (Redis, 7ms) → Database (200ms)
                                  ↓ (if down)
                             Fallback to DB
```
Key additions:
Multi-layer caching (the read path is sketched in code below):
- L1 (in-memory): 100ms TTL, prevents Redis hammering during traffic spikes
- L2 (Redis): 5min TTL, shared across all servers
- L3 (Database): Source of truth
Cache warming on startup
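A condensed sketch of that read path, assuming a plain in-process Map for L1, an ioredis-style client for L2, and a placeholder `queryDatabase` function for L3:

```javascript
// Sketch only: `l1` is a plain in-process Map, `redis` is an ioredis-style client,
// and `queryDatabase` stands in for your real data access layer.
const l1 = new Map() // key -> { value, expiresAt }

async function getFeed(userId) {
  const key = `user:${userId}:feed`

  // L1: in-process memory, very short TTL (~100ms) to absorb traffic spikes
  const local = l1.get(key)
  if (local && local.expiresAt > Date.now()) return local.value

  // L2: Redis, shared across all servers
  const remote = await redis.get(key)
  if (remote) {
    const value = JSON.parse(remote)
    l1.set(key, { value, expiresAt: Date.now() + 100 })
    return value
  }

  // L3: the database is the source of truth; populate both layers on the way back
  const value = await queryDatabase(userId)
  await redis.set(key, JSON.stringify(value), 'EX', 300)
  l1.set(key, { value, expiresAt: Date.now() + 100 })
  return value
}
```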
What happens: L1 cache hits return in 1ms. L2 hits in 7ms. Database queries only happen for cache misses or cold data. No thundering herd. Redis failures degrade gracefully to database. App restarts are smooth. You scale to 100K concurrent users without breaking a sweat.
Real-world latency:
| Request Type | Latency | What Happened |
|---|---|---|
| L1 hit (same server) | 1ms | Read from Node.js memory |
| L2 hit (Redis) | 7ms | Network + Redis lookup |
| Database miss | 207ms | Full query + cache populate |
Impact: 80% L1 hit rate + 15% L2 hit rate = 95% of requests under 10ms. Only 5% hit database.
✅ Production Ready:
- Multi-layer caching (L1: memory, L2: Redis, L3: DB)
- Cache warming on startup with parallel batch processing
- Thundering herd prevention (distributed locks)
- Probabilistic refresh (stagger expirations)
- Circuit breaker for Redis failures (sketched below)
- Graceful degradation (fallback to DB)
- Lock safety (TTL prevents deadlocks)
- Proper error handling
- Monitoring & alerting
Sizing Your Cache
The 80/20 Rule (Pareto Principle): 20% of your data gets 80% of the traffic.
You have 10 million users. Do you cache all feeds? No. Cache the active users.
Real example calculation:
- Total users: 10,000,000
- Active daily users: 2,000,000 (20%)
- Active hourly users: 500,000 (5%)
- Average feed size: 50 posts × 500 bytes = 25 KB
- Cache budget: 16 GB (Redis instance size)

Option 1: Cache all users
- 10M × 25 KB = 250 GB ❌ (exceeds budget by ~15×)

Option 2: Cache active hourly users
- 500K × 25 KB = 12.5 GB ✅ (fits with room to spare)
- Hit rate: 95% (active users cached)
- Miss rate: 5% (inactive users hit the DB directly)
L1 vs L2 sizing:
| Layer | Size | TTL | Purpose |
|---|---|---|---|
| L1 (memory) | 100 MB | 100ms | Prevent Redis hammering during traffic spikes |
| L2 (Redis) | 16-64 GB | 5-30min | Main cache layer, shared across servers |
| L3 (Database) | Unlimited | Forever | Source of truth |
Rule of thumb: L1 should hold ~1000 of your hottest items. L2 should hold 10-20% of your dataset.
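One simple way to enforce that L1 bound is a tiny LRU; a sketch using a plain Map, where the 1000-entry cap is just the rule of thumb above:

```javascript
// Sketch only: a small LRU for L1, capped at ~1000 entries.
// A Map preserves insertion order, so the first key is the least recently used.
class LruCache {
  constructor(maxEntries = 1000) {
    this.maxEntries = maxEntries
    this.map = new Map()
  }

  get(key) {
    if (!this.map.has(key)) return undefined
    const value = this.map.get(key)
    this.map.delete(key) // re-insert to mark as most recently used
    this.map.set(key, value)
    return value
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key)
    this.map.set(key, value)
    if (this.map.size > this.maxEntries) {
      this.map.delete(this.map.keys().next().value) // evict the least recently used entry
    }
  }
}
```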
Cache Invalidation Strategies
| Strategy | Example | When to Use |
|---|---|---|
| TTL-based | Profiles (5min), Posts (30s), Static (24hr) | Different data change frequencies |
| Event-based | User updates profile → publish event → delete cache | Immediate consistency needed |
| Dependency-based | New post → invalidate author + all 1000 followers | Complex relationships |
| Tag-based | `cache.invalidateByTag('user')` invalidates all user-related keys (see the sketch after this table) | Bulk invalidation |
| Probabilistic | Refresh randomly before expiration | Prevent thundering herd |
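`invalidateByTag` is not a built-in Redis command; one common way to approximate it is to track each tag's keys in a Redis set. A rough sketch, assuming an ioredis-style client:

```javascript
// Sketch only: tag membership is tracked in one Redis set per tag.
async function setWithTags(key, value, ttlSeconds, tags) {
  await redis.set(key, JSON.stringify(value), 'EX', ttlSeconds)
  for (const tag of tags) {
    await redis.sadd(`tag:${tag}`, key) // remember which keys carry this tag
  }
}

async function invalidateByTag(tag) {
  const keys = await redis.smembers(`tag:${tag}`)
  if (keys.length > 0) await redis.del(...keys) // drop every key carrying the tag
  await redis.del(`tag:${tag}`)                 // then drop the tag set itself
}
```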
Key Patterns Compared
| Pattern | Use When | Consistency | What You Trade |
|---|---|---|---|
| Cache-Aside | Default for most reads | Eventual | Simple but cold cache penalty |
| Write-Through | Need strong consistency | Strong | 2× write latency |
| Write-Behind | High write volume | Eventual | Risk of data loss if cache fails (see the sketch after this table) |
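Cache-aside and write-through already appear in the attempts above; write-behind is the one pattern not yet shown. A rough sketch (the in-memory queue and one-second flush interval are assumptions, reusing the `redis` and `database` placeholders from earlier):

```javascript
// Sketch only: write-behind updates the cache immediately and flushes writes
// to the database in batches.
const pendingWrites = new Map() // userId -> latest data

async function writeBehind(userId, data) {
  await redis.set(`user:${userId}`, JSON.stringify(data), 'EX', 300)
  pendingWrites.set(userId, data) // the DB write is deferred
}

// Flush queued writes once per second
setInterval(async () => {
  const batch = [...pendingWrites.entries()]
  pendingWrites.clear()
  for (const [userId, data] of batch) {
    await database.update(userId, data)
  }
}, 1000)
```

The trade-off in the table is visible in the code: anything still sitting in `pendingWrites` when the process dies never reaches the database.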
When NOT to Cache
| Don't Cache | Why |
|---|---|
| Financial transactions | Always read from source of truth. Stale balance = wrong charge. |
| Real-time stock prices | Price changes every second. Cache = misleading data. |
| Passwords / PII | Security risk. Always fetch from secure storage. |
| Rarely accessed data | Cache memory wasted. Cost > benefit. |
| User permissions/auth | Stale permissions = security hole. Cache for max 30s if needed. |
| Audit logs | Legal requirement for immutability. No caching. |
| Shopping cart totals | Needs transactional consistency. Cache items, not totals. |
Redis Cluster: Scaling Beyond Single Instance
When you hit 100K+ concurrent users, single-instance Redis becomes the bottleneck. Redis Cluster distributes data across multiple nodes for horizontal scaling.
When You Need Clustering
Single Instance Limits:
- Memory: ~64 GB per instance (a practical ceiling on common managed tiers)
- Throughput: ~100K ops/sec
- Network: ~1-2 Gbps
Signs you need clustering:
- Redis memory usage >70% and growing
- CPU consistently >60%
- Network bandwidth saturating
- Single point of failure unacceptable
How Redis Cluster Works
```
// Redis uses a CRC16 hash slot algorithm
hash_slot = CRC16(key) % 16384

// Keys are distributed across nodes by slot range, e.g. with 3 nodes:
Node 1: slots 0-5460
Node 2: slots 5461-10922
Node 3: slots 10923-16383
```
Critical: Hash Tags for Multi-Key Operations
```javascript
// ❌ Keys hash to different slots - cross-slot MGET (and MULTI/EXEC) fails in cluster mode
redis.mget('user:123:feed', 'user:123:profile')

// ✅ Use hash tags {} to force both keys into the same slot
redis.mget('user:{123}:feed', 'user:{123}:profile')
// Both keys are hashed on "123", so they are guaranteed to land on the same node
```
More on this in Part 2
Monitoring: What to Track
| Metric | Target | Alert If | Why | Check in Redis CLI |
|---|---|---|---|---|
| Cache hit rate | >80% | <70% | Cache not effective (see the sketch after this table) | INFO stats |
| Memory usage | <80% | >90% | Running out of RAM | INFO memory |
| L1 latency | <1ms | >5ms | Memory pressure or GC issues | N/A (app-level) |
| L2 latency | <5ms | >20ms | Redis overloaded or network issues | --latency |
| Eviction rate | <100/sec | >1000/sec | Cache too small for working set | INFO stats |
| Miss rate | <20% | >30% | Ineffective caching strategy | INFO stats |
| DB query time | <50ms | >200ms | Database struggling | N/A (DB-level) |
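The hit rate in the first row comes straight from Redis's own counters; a small sketch of computing it from `INFO stats`, assuming an ioredis-style client:

```javascript
// Sketch only: compute the cache hit rate from Redis's cumulative counters.
// `INFO stats` exposes keyspace_hits and keyspace_misses as raw totals.
async function getHitRate() {
  const stats = await redis.info('stats') // raw INFO text
  const parse = (name) => Number(stats.match(new RegExp(`${name}:(\\d+)`))[1])
  const hits = parse('keyspace_hits')
  const misses = parse('keyspace_misses')
  return hits / (hits + misses) // e.g. 0.83 means 83%; alert if this drops below 0.7
}
```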
The Bottom Line
Production caching is about resilience and observability, not just performance.
| Challenge | Solution |
|---|---|
| Single point of failure | Redis Cluster with replicas |
| Cache goes down | Circuit breakers, fallback to DB |
| Mystery slowness | Comprehensive monitoring + debug toolkit |
| Traffic spikes | Multi-layer cache + distributed locks |
| Can't reproduce bugs | Metric dashboards + structured logging |
Start simple. Evolve deliberately. Your database will thank you.
Building scalable systems? I write about architecture patterns and clean code. Follow me on Dev.to | GitHub
