Caching 101: The Ultimate Guide to Caching
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
If you've ever wondered why some apps feel instant while others make you stare at a spinner — the answer is usually caching. It's one of the most powerful performance tools in a developer's toolkit, and one of the most misunderstood.
This guide covers everything — from what caching is and why it matters, to production-grade strategies used by Netflix, Twitter, and Facebook. Whether you're building a side project or designing systems at scale, this is your complete reference.
Table of Contents
- What Is Caching and Why It Matters
- The Latency Numbers Every Developer Should Know
- Types of Caching
- Caching Strategies
- Cache Eviction Policies
- Where to Cache
- HTTP Caching Deep Dive
- Redis Caching Patterns
- Cache Invalidation — The Hard Problem
- Cache Stampede / Thundering Herd
- Real-World Examples
- Decision Framework
- Common Mistakes
- TL;DR — Caching Cheat Sheet
What Is Caching and Why It Matters
Caching is storing a copy of data in a faster storage layer so future requests for that data are served quicker. That's it. The concept is dead simple — the execution is where it gets interesting.
Think of it like your desk. You could walk to the filing cabinet every time you need a document, or you could keep the ones you use frequently right on your desk. That's caching.
Without caching:
Client → Server → Database → Server → Client
Total: ~200-500ms
With caching:
Client → Cache (hit!) → Client
Total: ~1-5ms
The performance impact is massive:
- Reduced latency — Serve responses in milliseconds instead of seconds
- Lower database load — Your DB handles 10x fewer queries
- Better scalability — Handle more users without scaling infrastructure
- Cost savings — Fewer DB reads = lower cloud bills
- Improved UX — Users don't wait, users don't leave
The Latency Numbers Every Developer Should Know
These are approximate numbers worth internalizing:
| Operation | Latency |
|---|---|
| L1 cache reference | ~1 ns |
| L2 cache reference | ~4 ns |
| RAM reference | ~100 ns |
| SSD random read | ~16 μs |
| HDD random read | ~2 ms |
| Redis GET (local) | ~0.1 ms |
| Redis GET (network) | ~0.5-1 ms |
| PostgreSQL simple query | ~1-5 ms |
| PostgreSQL complex query | ~10-100 ms |
| HTTP request (same region) | ~1-10 ms |
| HTTP request (cross-region) | ~50-150 ms |
| HTTP request (cross-continent) | ~100-300 ms |
Notice the gap: a RAM reference (~100 ns) is roughly 20,000x faster than an HDD random read (~2 ms), and a Redis GET is often 10-100x faster than a non-trivial database query. That's why caching works so well — you're moving data closer to where it's needed.
Speed Hierarchy:
┌─────────────┐
│ CPU Cache │ ← ~1-4 ns (fastest, smallest)
│ (L1/L2/L3) │
├─────────────┤
│ RAM │ ← ~100 ns
├─────────────┤
│ In-Process │ ← ~0.01 ms (HashMap, local cache)
│ Cache │
├─────────────┤
│ Distributed │ ← ~0.5-1 ms (Redis, Memcached)
│ Cache │
├─────────────┤
│ Disk / SSD  │  ← ~0.01-2 ms
├─────────────┤
│  Database   │  ← ~1-100 ms
├─────────────┤
│ Network │ ← ~50-300 ms (external API calls)
│ Calls │
└─────────────┘
▲
Slower, larger
Types of Caching
Caching happens at many layers of the stack. Let's walk through each one.
1. Browser Cache
The closest cache to the user. The browser stores static assets (images, CSS, JS) locally so it doesn't re-download them on every page visit.
Controlled via HTTP headers — we'll go deep on this later.
First visit: Browser → Server → downloads bundle.js (200 OK)
Second visit: Browser → checks local cache → serves bundle.js (from cache)
2. CDN Cache
Content Delivery Networks (Cloudflare, CloudFront, Fastly) cache content at edge locations worldwide. Instead of every request hitting your origin server, the CDN serves cached content from the node closest to the user.
Without CDN:
User in Tokyo → Origin in Virginia → 200ms round trip
With CDN:
User in Tokyo → CDN edge in Tokyo → 5ms round trip
Great for: static assets, API responses that don't change often, entire pages (with SSG/ISR).
3. Application-Level Cache
This is what most developers think of — caching data inside your application or in a dedicated cache store like Redis.
┌─────────┐ ┌───────────────┐ ┌──────────┐
│ Client │────▶│ Application │────▶│ Database │
│ │ │ │ │ │
│ │ │ ┌─────────┐ │ │ │
│ │ │ │ Cache │ │ │ │
│ │ │ │ (Redis) │ │ │ │
│ │ │ └─────────┘ │ │ │
└─────────┘ └───────────────┘ └──────────┘
Two flavors:
- In-process cache — Data stored in your app's memory (fast, but limited to that instance)
- Distributed cache — Shared cache like Redis or Memcached (slightly slower, but shared across instances)
4. Database Cache
Most databases have their own internal caching:
- MySQL Query Cache — Cached exact query results (removed in MySQL 8.0; it rarely helped and hurt write concurrency)
- PostgreSQL Shared Buffers — Caches frequently accessed pages in memory
- MongoDB WiredTiger Cache — Keeps working set in RAM
- Connection pooling — Not exactly caching, but reusing database connections avoids expensive handshakes
5. CPU Cache (Brief)
At the hardware level, CPUs have L1, L2, and L3 caches that store frequently accessed data from RAM. You can't control this directly, but writing cache-friendly code (sequential memory access, avoiding pointer chasing) can dramatically improve performance.
This matters more in systems programming (C, C++, Rust), but it's good to know the concept applies all the way down.
Caching Strategies
This is where it gets interesting. There are several patterns for how your application reads from and writes to the cache.
1. Cache-Aside (Lazy Loading)
The most common strategy. The application manages the cache directly.
Read flow:
1. App checks cache
2. If HIT → return cached data
3. If MISS → query database → store in cache → return data
Write flow:
1. App writes to database
2. App invalidates/deletes the cache entry
// Cache-Aside pattern with Redis + Node.js
async function getUser(userId) {
// 1. Check cache first
const cached = await redis.get(`user:${userId}`);
if (cached) {
return JSON.parse(cached); // Cache HIT
}
// 2. Cache MISS — query database
const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
// 3. Store in cache with TTL
await redis.set(`user:${userId}`, JSON.stringify(user), 'EX', 3600); // 1 hour
return user;
}
// On update — invalidate cache
async function updateUser(userId, data) {
await db.query('UPDATE users SET name = $1 WHERE id = $2', [data.name, userId]);
await redis.del(`user:${userId}`); // Invalidate
}
Pros: Simple, only caches what's actually requested (no wasted memory), app has full control.
Cons: Cache miss = slower first request, potential for stale data between write and invalidation.
2. Read-Through
Similar to cache-aside, but the cache itself is responsible for loading data on a miss. The application only talks to the cache.
Read flow:
1. App asks cache for data
2. If HIT → cache returns data
3. If MISS → cache loads from database → stores it → returns to app
App ←→ Cache ←→ Database
(App never talks to DB directly for reads)
Pros: Simpler application code — the cache handles everything.
Cons: Requires a cache library that supports it (like Caffeine in Java), first request is still slow.
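To make the flow concrete, here's a minimal in-memory sketch of read-through. The loader function stands in for your real database call; the class name and TTL default are illustrative, not from any particular library:

```javascript
// Read-through sketch: the application only ever calls cache.get().
// The cache itself invokes the loader on a miss, so the app never
// talks to the backing store directly for reads.
class ReadThroughCache {
  constructor(loader, ttlMs = 60000) {
    this.loader = loader; // e.g. async (key) => db.query(...)
    this.ttlMs = ttlMs;
    this.store = new Map();
  }

  async get(key) {
    const entry = this.store.get(key);
    if (entry && Date.now() < entry.expiresAt) {
      return entry.value; // HIT, the app never sees the loader
    }
    // MISS: the cache (not the app) loads from the backing store
    const value = await this.loader(key);
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}
```

The application code shrinks to `await cache.get(key)`; all the miss-handling logic lives in one place.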
3. Write-Through
Every write goes through the cache before hitting the database. The cache and DB are always in sync.
Write flow:
1. App writes to cache
2. Cache synchronously writes to database
3. Returns success only after both succeed
App → Cache → Database (synchronous)
┌─────┐ write ┌───────┐ write ┌──────────┐
│ App │────────────▶│ Cache │────────────▶│ Database │
│ │ │ │ (sync) │ │
│ │◀────────────│ │◀────────────│ │
│ │ success │ │ success │ │
└─────┘ └───────┘ └──────────┘
Pros: Cache is always consistent with DB, no stale reads.
Cons: Higher write latency (writing to two places), every write goes through the cache even if data isn't read often.
Best paired with: Read-Through (the combo gives you consistent reads and writes).
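The write flow above fits in a few lines. This is an illustrative in-memory sketch, not a library API: `dbWrite` stands in for your real persistence call, and rolling back the cache entry on a failed DB write is one way to keep the "success only after both succeed" guarantee:

```javascript
// Write-through sketch: a write is acknowledged only after both the
// cache and the backing store succeed, so reads are never stale.
class WriteThroughCache {
  constructor(dbWrite) {
    this.dbWrite = dbWrite; // async (key, value) => persists to the DB
    this.store = new Map();
  }

  async set(key, value) {
    const previous = this.store.get(key);
    this.store.set(key, value);       // 1. write to the cache
    try {
      await this.dbWrite(key, value); // 2. synchronously write to the DB
    } catch (err) {
      // Roll back the cache so it never disagrees with the DB
      if (previous === undefined) this.store.delete(key);
      else this.store.set(key, previous);
      throw err;
    }
  }

  get(key) {
    return this.store.get(key); // always consistent with the DB
  }
}
```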
4. Write-Behind (Write-Back)
The app writes to the cache, and the cache asynchronously writes to the database later. The fastest write strategy.
Write flow:
1. App writes to cache → returns immediately
2. Cache batches writes and flushes to database asynchronously
App → Cache → [async queue] → Database
┌─────┐ write ┌───────┐ ┌──────────┐
│ App │────────────▶│ Cache │- - - - ▶│ Database │
│ │ │ │ async │ │
│ │◀────────────│ │ batch │ │
│ │ instant │ │ write │ │
└─────┘ └───────┘ └──────────┘
Pros: Blazing fast writes, batching reduces DB load.
Cons: Risk of data loss if cache crashes before flushing, eventual consistency, more complex.
Use case: High write throughput systems — logging, analytics, social media activity feeds.
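A rough sketch of the pattern, assuming an in-memory store and a periodic flush; `dbBatchWrite` is a stand-in for your real batched persistence call, and a production version would also need crash-safety for the unflushed queue:

```javascript
// Write-behind sketch: set() returns instantly; a background timer
// batches pending writes and flushes them to the backing store.
class WriteBehindCache {
  constructor(dbBatchWrite, flushIntervalMs = 100) {
    this.dbBatchWrite = dbBatchWrite; // async (entries) => persist batch
    this.store = new Map();
    this.dirty = new Map();           // keys awaiting flush
    this.timer = setInterval(() => this.flush(), flushIntervalMs);
  }

  set(key, value) {
    this.store.set(key, value); // instant, no DB round trip
    this.dirty.set(key, value); // queued for the next batch
  }

  get(key) {
    return this.store.get(key);
  }

  async flush() {
    if (this.dirty.size === 0) return;
    const batch = [...this.dirty.entries()];
    this.dirty.clear();
    await this.dbBatchWrite(batch); // one DB write for many sets
  }

  stop() {
    clearInterval(this.timer);
    return this.flush(); // final drain
  }
}
```

Note how many rapid `set()` calls collapse into a single batched DB write, which is exactly where the throughput win comes from.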
5. Refresh-Ahead
The cache proactively refreshes entries before they expire, predicting which keys will be needed soon.
Flow:
1. Cache tracks access patterns
2. Before TTL expires on a hot key, cache refreshes it in the background
3. Users always get a cache HIT with fresh data
Pros: Eliminates cache miss latency for hot data.
Cons: Complex to implement, wastes resources if predictions are wrong.
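Here's one way to sketch the idea in-process. The 20% refresh threshold is an arbitrary illustrative choice, and a real implementation would track access patterns rather than refreshing on every near-expiry read:

```javascript
// Refresh-ahead sketch: a get() on a key whose TTL is nearly up serves
// the cached value immediately and refreshes it in the background, so
// keys that are re-read before expiry never produce a miss.
class RefreshAheadCache {
  constructor(loader, ttlMs = 60000) {
    this.loader = loader;
    this.ttlMs = ttlMs;
    this.store = new Map();
    this.refreshing = new Set(); // avoid duplicate background refreshes
  }

  async get(key) {
    const entry = this.store.get(key);
    const now = Date.now();
    if (entry && now < entry.expiresAt) {
      // Heuristic: refresh early when less than 20% of the TTL remains
      if (entry.expiresAt - now < this.ttlMs * 0.2 && !this.refreshing.has(key)) {
        this.refreshing.add(key);
        this.loader(key)
          .then((value) => this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs }))
          .finally(() => this.refreshing.delete(key));
      }
      return entry.value; // always a HIT for hot keys
    }
    const value = await this.loader(key); // cold miss, load synchronously
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
    return value;
  }
}
```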
Strategy Comparison
| Strategy | Read Perf | Write Perf | Consistency | Complexity | Best For |
|---|---|---|---|---|---|
| Cache-Aside | Good (after warm) | Good | Eventual | Low | General purpose |
| Read-Through | Good (after warm) | N/A | Eventual | Medium | Read-heavy workloads |
| Write-Through | Great | Slower | Strong | Medium | Read-heavy + consistency |
| Write-Behind | Great | Fastest | Eventual | High | Write-heavy workloads |
| Refresh-Ahead | Best (no misses) | N/A | Near-real-time | High | Hot data, predictable access |
Cache Eviction Policies
Your cache has limited memory. When it's full and a new entry needs to come in, something has to go. That's what eviction policies decide.
LRU — Least Recently Used
Evicts the entry that hasn't been accessed for the longest time.
Cache state (capacity = 3):
Access A → [A]
Access B → [A, B]
Access C → [A, B, C] ← Full
Access D → [B, C, D] ← A evicted (least recently used)
Access B → [C, D, B] ← B moved to front (recently used)
Access E → [D, B, E] ← C evicted
The most popular eviction policy. Used by Redis, Memcached, and most in-memory caches. Works well for most workloads because recently accessed data is likely to be accessed again.
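A minimal LRU can be built on a JavaScript Map, whose iteration order is insertion order: re-inserting a key on access moves it to the back, so the first key in the Map is always the least recently used. This sketch reproduces the trace above:

```javascript
// Minimal LRU sketch exploiting Map's insertion-order iteration.
class LRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);     // refresh recency:
    this.map.set(key, value); // move to most-recently-used position
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) {
      this.map.delete(key);
    } else if (this.map.size >= this.capacity) {
      // Evict least recently used = first key in insertion order
      const lru = this.map.keys().next().value;
      this.map.delete(lru);
    }
    this.map.set(key, value);
  }
}

// Replaying the trace: A,B,C fill the cache; D evicts A;
// touching B protects it; E then evicts C.
const c = new LRUCache(3);
c.set('A', 1); c.set('B', 2); c.set('C', 3);
c.set('D', 4); // A evicted
c.get('B');    // B becomes most recently used
c.set('E', 5); // C evicted
```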
LFU — Least Frequently Used
Evicts the entry that has been accessed the fewest times overall.
Cache state (capacity = 3):
A accessed 10 times, B accessed 2 times, C accessed 5 times
New entry D arrives → B evicted (lowest frequency)
Better for: Workloads with some items that are consistently popular (trending posts, popular products). But it can be slow to adapt — a once-popular item stays cached even after it stops being relevant.
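For intuition, here's a naive in-memory LFU sketch. It does a linear scan to find the eviction victim, with ties broken arbitrarily; real implementations (including Redis's approximate LFU) use smarter bookkeeping and also age counters over time so once-popular items can eventually fall out:

```javascript
// Naive LFU sketch: track an access count per key, evict the
// lowest-count key when the cache is full.
class LFUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = new Map(); // key => { value, freq }
  }

  get(key) {
    const e = this.entries.get(key);
    if (!e) return undefined;
    e.freq += 1; // every access bumps the frequency counter
    return e.value;
  }

  set(key, value) {
    const existing = this.entries.get(key);
    if (existing) {
      existing.value = value;
      existing.freq += 1;
      return;
    }
    if (this.entries.size >= this.capacity) {
      // Evict the least frequently used key (linear scan for clarity)
      let victim, minFreq = Infinity;
      for (const [k, e] of this.entries) {
        if (e.freq < minFreq) { minFreq = e.freq; victim = k; }
      }
      this.entries.delete(victim);
    }
    this.entries.set(key, { value, freq: 1 });
  }
}
```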
FIFO — First In, First Out
Evicts the oldest entry regardless of access patterns. Simple but naive.
Cache state (capacity = 3):
Insert A → [A]
Insert B → [A, B]
Insert C → [A, B, C] ← Full
Insert D → [B, C, D] ← A evicted (first in)
Rarely used in production caches but shows up in specific scenarios (message queues, circular buffers).
TTL-Based (Time-To-Live)
Entries expire after a fixed time period, regardless of access patterns. Not really an eviction policy but a complementary mechanism.
// Redis: key expires after 1 hour
await redis.set('user:123', JSON.stringify(user), 'EX', 3600);
// Redis: key expires at a specific timestamp
await redis.expireat('user:123', Math.floor(Date.now() / 1000) + 3600);
TTL is almost always used alongside another eviction policy. For example: LRU eviction + 1-hour TTL ensures data is evicted when memory is full and doesn't get too stale.
Random Eviction
Evicts a random entry. Surprisingly effective in some workloads and very cheap to implement.
Redis supports this as allkeys-random or volatile-random.
Eviction Policy Comparison
| Policy | Best For | Weakness |
|---|---|---|
| LRU | General purpose, recency matters | Scan pollution (one-time scans evict hot data) |
| LFU | Stable popularity patterns | Slow to adapt to changing patterns |
| FIFO | Simple, predictable eviction | Ignores access patterns entirely |
| TTL | Time-sensitive data | Doesn't consider memory pressure |
| Random | When access is unpredictable | No intelligence, pure luck |
Where to Cache
Client-Side Caching
HTTP Cache Headers
The browser cache is controlled by HTTP response headers. This is the first line of defense.
Cache-Control: public, max-age=31536000 ← Cache for 1 year
Cache-Control: private, max-age=0 ← Don't cache in shared caches
Cache-Control: no-store ← Never cache this
Service Workers
Service workers can intercept network requests and serve cached responses — even when offline.
// service-worker.js — Cache-first strategy
self.addEventListener('fetch', (event) => {
event.respondWith(
caches.match(event.request).then((cached) => {
// Return cached version or fetch from network
return cached || fetch(event.request).then((response) => {
const clone = response.clone();
caches.open('v1').then((cache) => cache.put(event.request, clone));
return response;
});
})
);
});
LocalStorage / SessionStorage
Good for caching non-sensitive, small data on the client.
// Simple client-side cache
async function getCachedData(key, fetchFn, ttlMs = 300000) { // 5 min default
  const cached = localStorage.getItem(key);
  if (cached) {
    const { data, timestamp } = JSON.parse(cached);
    if (Date.now() - timestamp < ttlMs) {
      return data; // Cache HIT
    }
  }
  // Cache MISS — fetch and store
  const data = await fetchFn();
  localStorage.setItem(key, JSON.stringify({ data, timestamp: Date.now() }));
  return data;
}
Server-Side Caching
Redis
The king of server-side caching. An in-memory data structure store that supports strings, hashes, lists, sets, sorted sets, and more.
- Speed: ~100,000 operations/second on a single node
- Persistence: Optional (RDB snapshots, AOF logging)
- Clustering: Redis Cluster for horizontal scaling
- Pub/Sub: Built-in for cache invalidation events
Memcached
Simpler than Redis — a pure key-value cache. No data structures, no persistence, no clustering (by itself).
Use Memcached when: You need a dead-simple, multi-threaded cache for string key-value pairs.
Use Redis when: You need data structures, persistence, pub/sub, Lua scripting, or more advanced features. (This is almost always the answer in 2026.)
In-Memory (Application-Level)
Sometimes the fastest cache is just a Map in your application:
// Simple in-memory cache with TTL
class InMemoryCache {
constructor() {
this.cache = new Map();
}
get(key) {
const entry = this.cache.get(key);
if (!entry) return null;
if (Date.now() > entry.expiresAt) {
this.cache.delete(key);
return null;
}
return entry.value;
}
set(key, value, ttlMs = 60000) {
this.cache.set(key, {
value,
expiresAt: Date.now() + ttlMs,
});
}
invalidate(key) {
this.cache.delete(key);
}
}
const cache = new InMemoryCache();
cache.set('config', { theme: 'dark' }, 300000); // 5 minutes
Warning: In-memory caches are per-process. If you have 4 app instances, each has its own cache — leading to inconsistency. Use Redis for shared state.
CDN Caching
CDNs like Cloudflare, AWS CloudFront, and Fastly cache responses at edge locations.
┌──────────┐ ┌──────────┐ ┌──────────┐
│ User in │────▶│ CDN Edge │────▶│ Origin │
│ London │ │ London │ │ Virginia │
└──────────┘ └──────────┘ └──────────┘
│
Cache HIT?
┌───┴───┐
Yes No
│ │
Return │ Forward to
cached │ origin, cache
data │ response
You control CDN caching via HTTP headers or CDN-specific rules:
Cache-Control: public, s-maxage=86400, max-age=3600
│ │ │
│ │ └─ Browser caches for 1 hour
│ └─ CDN caches for 24 hours
└─ Both shared and private caches can store this
HTTP Caching Deep Dive
HTTP caching is one of the most impactful and least understood caching layers. Let's fix that.
Cache-Control Header
The main header that controls caching behavior:
# Cache publicly for 1 year (static assets with hashed filenames)
Cache-Control: public, max-age=31536000, immutable
# Cache privately (user-specific data), revalidate after 60 seconds
Cache-Control: private, max-age=60
# Don't cache at all (sensitive data, real-time data)
Cache-Control: no-store
# Cache but always revalidate before using
Cache-Control: no-cache
# CDN caches for 1 day, browser for 5 minutes
Cache-Control: public, s-maxage=86400, max-age=300
Common confusion: no-cache does NOT mean "don't cache." It means "cache it, but check with the server before using it." Use no-store to truly prevent caching.
ETag (Entity Tag)
A fingerprint of the response content. The server generates it, the browser sends it back on subsequent requests.
First request:
GET /api/user/123
Response:
ETag: "abc123"
Cache-Control: no-cache
Body: { "name": "Alice" }
Second request:
GET /api/user/123
If-None-Match: "abc123" ← "Hey server, has this changed?"
If unchanged:
304 Not Modified ← No body sent! Browser uses cached version
(saves bandwidth)
If changed:
200 OK
ETag: "def456" ← New fingerprint
Body: { "name": "Alice W." }
Last-Modified / If-Modified-Since
Similar to ETag but uses timestamps instead of fingerprints:
First request:
Response:
Last-Modified: Mon, 01 Mar 2026 12:00:00 GMT
Second request:
If-Modified-Since: Mon, 01 Mar 2026 12:00:00 GMT
If unchanged: 304 Not Modified
If changed: 200 OK with new data
ETag vs Last-Modified: ETag is more precise (content-based), Last-Modified can have issues with sub-second changes. Use ETag when possible.
stale-while-revalidate
A game-changer for perceived performance. Serve stale content immediately while fetching fresh content in the background.
Cache-Control: max-age=60, stale-while-revalidate=300
Timeline:
0-60s: Serve from cache (fresh)
60-360s: Serve STALE from cache instantly + revalidate in background
360s+: Cache expired, must fetch from origin
┌────────────────┬─────────────────────┬────────────────┐
│ 0-60 seconds │ 60-360 seconds │ After 360s │
│ │ │ │
│ Fresh cache │ Stale cache served │ Must │
│ served │ instantly + │ revalidate │
│ directly │ background refresh │ (full wait) │
└────────────────┴─────────────────────┴────────────────┘
Express.js Example — Setting Cache Headers
const express = require('express');
const app = express();
// Static assets — cache aggressively (files have hashed names)
app.use('/static', express.static('public', {
maxAge: '1y',
immutable: true,
}));
// API response — short cache with revalidation
app.get('/api/products', (req, res) => {
const products = getProducts();
const etag = generateETag(products);
// Check if client has current version
if (req.headers['if-none-match'] === etag) {
return res.status(304).end();
}
res.set({
'Cache-Control': 'public, max-age=60, stale-while-revalidate=300',
'ETag': etag,
});
res.json(products);
});
// User-specific data — private cache
app.get('/api/me', authMiddleware, (req, res) => {
res.set('Cache-Control', 'private, max-age=0, no-cache');
res.json(req.user);
});
// Sensitive data — never cache
app.get('/api/payment-methods', authMiddleware, (req, res) => {
res.set('Cache-Control', 'no-store');
res.json(getPaymentMethods(req.user.id));
});
Redis Caching Patterns
Redis is the go-to caching solution for most production systems. Here are battle-tested patterns.
Basic Key-Value Caching
const Redis = require('ioredis');
const redis = new Redis();
// Simple GET/SET with TTL
async function cacheGet(key) {
const data = await redis.get(key);
return data ? JSON.parse(data) : null;
}
async function cacheSet(key, data, ttlSeconds = 3600) {
await redis.set(key, JSON.stringify(data), 'EX', ttlSeconds);
}
Hash-Based Caching (For Objects)
Instead of serializing entire objects, use Redis hashes to cache individual fields:
// Store user as a hash — update individual fields without re-caching everything
async function cacheUser(user) {
await redis.hset(`user:${user.id}`, {
name: user.name,
email: user.email,
role: user.role,
});
await redis.expire(`user:${user.id}`, 3600);
}
// Get specific fields
async function getUserName(userId) {
return await redis.hget(`user:${userId}`, 'name');
}
// Get all fields
async function getUser(userId) {
return await redis.hgetall(`user:${userId}`);
}
Sorted Set for Leaderboards / Top-N
// Add scores
await redis.zadd('leaderboard', 1500, 'player:alice');
await redis.zadd('leaderboard', 2300, 'player:bob');
await redis.zadd('leaderboard', 1800, 'player:charlie');
// Top 10 players (highest scores first)
const top10 = await redis.zrevrange('leaderboard', 0, 9, 'WITHSCORES');
// ['player:bob', '2300', 'player:charlie', '1800', 'player:alice', '1500']
Cache Warming
Pre-populate the cache before traffic hits — especially useful after deployments.
async function warmCache() {
console.log('Warming cache...');
// Fetch most popular items and pre-cache them
const popularProducts = await db.query(
'SELECT * FROM products ORDER BY view_count DESC LIMIT 100'
);
const pipeline = redis.pipeline();
for (const product of popularProducts) {
pipeline.set(
`product:${product.id}`,
JSON.stringify(product),
'EX',
3600
);
}
await pipeline.exec(); // Execute all at once (much faster than individual calls)
console.log(`Warmed ${popularProducts.length} products`);
}
Multi-Layer Cache (L1 + L2)
Use an in-memory cache for ultra-hot data and Redis for everything else:
const NodeCache = require('node-cache');
const l1Cache = new NodeCache({ stdTTL: 30 }); // 30 second local cache
async function getWithMultiLayerCache(key, fetchFn) {
// L1: In-memory (fastest)
const l1 = l1Cache.get(key);
if (l1) return l1;
// L2: Redis (fast, shared)
const l2 = await redis.get(key);
if (l2) {
const parsed = JSON.parse(l2);
l1Cache.set(key, parsed); // Promote to L1
return parsed;
}
// L3: Database (slow, source of truth)
const data = await fetchFn();
await redis.set(key, JSON.stringify(data), 'EX', 3600); // L2
l1Cache.set(key, data); // L1
return data;
}
Request flow:
┌────┐ miss ┌───────┐ miss ┌──────────┐
│ L1 │────────▶│ L2 │────────▶│ Database │
│Mem │ │ Redis │ │ │
└──┬─┘ └───┬───┘ └────┬─────┘
│ │ │
│ hit │ hit │ fetch
▼ ▼ ▼
Return Promote Populate
instantly to L1 + L1 + L2 +
return return
Cache Invalidation — The Hard Problem
This is the genuinely hard part. Your cache is only useful if it serves correct data. Here's how to keep it consistent.
Strategy 1: TTL-Based Expiration
The simplest approach — set a TTL and accept that data might be stale for up to that duration.
// Data is eventually consistent within a 5-minute window
await redis.set('product:123', JSON.stringify(product), 'EX', 300);
When to use: Data where slight staleness is acceptable (product catalogs, blog posts, search results).
Strategy 2: Event-Driven Invalidation
Invalidate the cache when the underlying data changes. This is the most common production approach.
// When a product is updated, invalidate its cache
async function updateProduct(productId, data) {
await db.query('UPDATE products SET ... WHERE id = $1', [productId]);
// Invalidate all related cache keys
await redis.del(`product:${productId}`);
await redis.del(`products:category:${data.categoryId}`);
await redis.del('products:featured');
// Or publish an invalidation event
await redis.publish('cache:invalidate', JSON.stringify({
type: 'product',
id: productId,
}));
}
Strategy 3: Version-Based Keys
Instead of invalidating, change the cache key:
// On write: bump the version so old cached copies become unreachable
const version = await redis.incr('product:123:version'); // e.g., 42

// On read: look up the current version, then build the versioned key
const currentVersion = await redis.get('product:123:version');
const cacheKey = `product:123:v${currentVersion}`;
await redis.set(cacheKey, JSON.stringify(product), 'EX', 3600);

// Old versions are never explicitly deleted; they expire via TTL
Strategy 4: Pub/Sub Invalidation (Multi-Instance)
When you have multiple app instances, each with their own L1 cache:
// Subscriber — every app instance listens
const subscriber = new Redis();
subscriber.subscribe('cache:invalidate');
subscriber.on('message', (channel, message) => {
const { type, id } = JSON.parse(message);
l1Cache.del(`${type}:${id}`); // Invalidate local cache
});
// Publisher — when data changes
async function onDataChange(type, id) {
await redis.del(`${type}:${id}`); // Invalidate Redis
await redis.publish('cache:invalidate', JSON.stringify({ type, id })); // Notify all instances
}
The Invalidation Complexity Spectrum
Simple ◄────────────────────────────────────────────► Complex
TTL Event-driven Version keys Pub/Sub +
only invalidation with rollback multi-layer
invalidation
│ │ │ │
│ Stale for │ Consistent │ Zero-downtime │ Real-time
│ up to TTL │ after event │ cache updates │ consistency
│ │ │ │ across
│ Simplest │ Most common │ More complex │ all nodes
Cache Stampede / Thundering Herd
One of the nastiest caching problems. Here's the scenario:
1. A popular cache key expires
2. 1000 requests arrive simultaneously
3. All 1000 see a cache MISS
4. All 1000 query the database at the same time
5. Database gets crushed
6. Everything is slow or crashes
Normal operation: Cache stampede:
Req → Cache HIT ✓ Req₁ → Cache MISS → DB query
Req → Cache HIT ✓ Req₂ → Cache MISS → DB query
Req → Cache HIT ✓ Req₃ → Cache MISS → DB query
Req → Cache HIT ✓ ...
Req₁₀₀₀ → Cache MISS → DB query
↓
DATABASE DIES
Solution 1: Locking (Mutex)
Only one request fetches from the database. Others wait for the cache to be populated.
async function getWithLock(key, fetchFn, ttl = 3600) {
// Try cache first
const cached = await redis.get(key);
if (cached) return JSON.parse(cached);
// Try to acquire a lock
const lockKey = `lock:${key}`;
const acquired = await redis.set(lockKey, '1', 'EX', 10, 'NX'); // 10s lock timeout
if (acquired) {
try {
// We got the lock — fetch from DB and populate cache
const data = await fetchFn();
await redis.set(key, JSON.stringify(data), 'EX', ttl);
return data;
    } finally {
      // Release lock. Production code should store a unique token and
      // verify ownership before deleting, since the lock may have expired
      // and been acquired by someone else.
      await redis.del(lockKey);
}
} else {
// Someone else is fetching — wait and retry
await new Promise((resolve) => setTimeout(resolve, 100));
return getWithLock(key, fetchFn, ttl); // Retry
}
}
Solution 2: Early Expiration (Probabilistic)
Randomly refresh the cache before it actually expires. The idea: if 1% of requests refresh the cache when it's close to expiry, you avoid the stampede.
async function getWithEarlyExpiry(key, fetchFn, ttl = 3600) {
const cached = await redis.get(key);
const cacheTTL = await redis.ttl(key);
if (cached && cacheTTL > 0) {
// Probabilistically refresh if TTL is low
const shouldRefresh = cacheTTL < (ttl * 0.1) && Math.random() < 0.1;
if (shouldRefresh) {
// Refresh in background — don't block the response
fetchFn().then((data) => {
redis.set(key, JSON.stringify(data), 'EX', ttl);
});
}
return JSON.parse(cached);
}
// Cache miss — fetch and store
const data = await fetchFn();
await redis.set(key, JSON.stringify(data), 'EX', ttl);
return data;
}
Solution 3: stale-while-revalidate (Never Expire)
Always serve stale data while refreshing in the background. The cache never truly "expires" — it just gets refreshed.
async function getWithSWR(key, fetchFn, freshTTL = 60, staleTTL = 3600) {
const entry = await redis.hgetall(`swr:${key}`);
if (entry.data) {
const age = Date.now() - parseInt(entry.timestamp);
if (age < freshTTL * 1000) {
return JSON.parse(entry.data); // Fresh
}
if (age < staleTTL * 1000) {
// Stale — serve immediately, refresh in background
fetchFn().then((data) => {
redis.hset(`swr:${key}`, {
data: JSON.stringify(data),
timestamp: Date.now().toString(),
});
});
return JSON.parse(entry.data); // Serve stale
}
}
  // No data or too stale — fetch synchronously
  const data = await fetchFn();
  await redis.hset(`swr:${key}`, {
    data: JSON.stringify(data),
    timestamp: Date.now().toString(),
  });
  // Cap the key's lifetime so abandoned entries don't live in Redis forever
  await redis.expire(`swr:${key}`, staleTTL * 2);
  return data;
}
Real-World Examples
Netflix
Netflix serves 250 million+ subscribers and caches aggressively at every layer:
- EVCache — Their custom distributed caching layer built on Memcached. Handles 30 million requests per second with sub-millisecond latency.
- Edge caching — Movie artwork, metadata, and personalized recommendations are cached at CDN edge locations.
- Tiered caching — L1 (in-process) → L2 (EVCache) → L3 (database). Most reads never hit the database.
- Precomputed caches — Recommendations are computed offline and stored in cache, not generated in real-time.
Twitter (X)
Twitter's timeline is one of the most cache-intensive systems:
- Timeline fanout — When a user tweets, their tweet is written to the cached timeline of all followers (write-behind pattern).
- Redis clusters — Twitter runs one of the world's largest Redis deployments for timeline caching.
- Celebrity problem — Users with millions of followers can't use fanout (too expensive), so their tweets are merged at read time (hybrid approach).
Regular user tweets (fanout-on-write):
Tweet → Write to 500 follower timelines in cache
Celebrity tweets (fanout-on-read):
Tweet → Store once → Merge into timeline when follower reads
Facebook (Meta)
Facebook's caching system is legendary:
- Memcache and TAO — A massive look-aside Memcached tier, plus TAO, a custom graph-aware caching layer that sits in front of their MySQL databases. Together they handle billions of requests per second.
- Regional caching — Data is cached in the region closest to the user, with cross-region invalidation.
- Cache lease — Their solution to thundering herd: when a cache miss occurs, the first request gets a "lease" (permission to refresh), and all other requests wait.
- McRouter — A Memcached protocol router that handles cache sharding, replication, and failover.
Decision Framework
What to Cache
Good candidates for caching:
✓ Database query results (especially complex JOINs)
✓ API responses from external services
✓ Computed/aggregated data (dashboards, reports, analytics)
✓ Session data
✓ Configuration and feature flags
✓ Static assets (images, CSS, JS)
✓ User profiles and preferences
✓ Product catalogs / search results
Bad candidates for caching:
✗ Rapidly changing data (stock prices, live scores)
✗ Sensitive data (passwords, tokens, PII — unless encrypted)
✗ Write-heavy data with low read frequency
✗ Large blobs that rarely repeat (file uploads)
✗ Data that MUST be real-time consistent (financial transactions)
Decision Flowchart
Is the data read frequently?
/ \
Yes No
| |
Is staleness Don't cache.
acceptable? Not worth it.
/ \
Yes No (must be real-time)
| |
Is it expensive Consider write-through
to compute/fetch? or event-driven
| invalidation
Yes
|
┌─────┴──────┐
│ CACHE IT │
└─────┬──────┘
|
Choose your layer:
┌──────────────────────────┐
│ Static content → CDN │
│ User-specific → Redis │
│ Hot data → In-memory │
│ API responses → HTTP │
│ cache headers │
└──────────────────────────┘
Choosing a Cache TTL
| Data Type | Recommended TTL | Rationale |
|---|---|---|
| Static assets (hashed) | 1 year | Filename changes on update |
| Product catalog | 5-15 min | Changes infrequently |
| Search results | 1-5 min | Acceptable staleness |
| User profile | 1-24 hours | Rarely changes |
| Session data | 30 min - 24 hours | Depends on security needs |
| Feature flags | 1-5 min | Want changes to propagate quickly |
| API rate limit counters | Match the rate limit window | Must be accurate |
| Real-time data | Don't cache, or < 5 seconds | Staleness = bad UX |
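To make the "match the rate limit window" row concrete: a fixed-window counter whose lifetime equals the window. In Redis the same idea is INCR on the counter key plus an EXPIRE equal to the window length when the counter is first created; this sketch keeps counters in process memory so it runs standalone:

```javascript
// Fixed-window rate limiter sketch: the counter "expires" exactly when
// its window does, which is why the TTL must match the window length.
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windows = new Map(); // key => { count, resetAt }
  }

  allow(key, now = Date.now()) {
    let w = this.windows.get(key);
    if (!w || now >= w.resetAt) {
      // New window: start a fresh counter that lives for one window
      w = { count: 0, resetAt: now + this.windowMs };
      this.windows.set(key, w);
    }
    w.count += 1;
    return w.count <= this.limit;
  }
}
```

The `now` parameter is only there to make the behavior easy to verify; in real use you'd call `allow(key)` and let it default.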
Common Mistakes
1. Caching Sensitive Data Without Encryption
Never cache passwords, tokens, credit card numbers, or PII in plain text. If you must cache sensitive data, encrypt it and use short TTLs.
// BAD — plain text sensitive data in cache
await redis.set('user:123:payment', JSON.stringify({ cardNumber: '4111...' }));
// BETTER — encrypt before caching
const encrypted = encrypt(JSON.stringify(paymentData));
await redis.set('user:123:payment', encrypted, 'EX', 300); // Short TTL
2. Not Setting TTL
Cache entries without a TTL live forever. This leads to stale data and memory exhaustion.
// BAD — no TTL, lives forever
await redis.set('user:123', JSON.stringify(user));
// GOOD — always set a TTL
await redis.set('user:123', JSON.stringify(user), 'EX', 3600);
Also set maxmemory-policy in Redis to handle overflow:
# redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru
3. Cache Poisoning
If an attacker can control what gets cached, they can serve malicious content to all users.
Attack: Manipulate request headers/params → server generates bad response → CDN caches it
Prevention:
- Normalize cache keys
- Validate all inputs before caching
- Use Vary headers correctly
- Don't cache error responses
4. Not Thinking About Cache Warming
After a deployment or cache flush, every request is a miss. Your database gets slammed.
Fix: Warm the cache with popular data before routing traffic.
5. Caching Too Much or Too Little
- Too much caching → Memory waste, stale data issues, complex invalidation
- Too little caching → Performance bottlenecks, high DB load
Start with caching your hottest paths — the 20% of queries that generate 80% of traffic. Measure, then expand.
6. Ignoring Cache Metrics
You can't improve what you don't measure. Track:
- Hit rate — Aim for 90%+. Below 80% means your caching strategy needs work.
- Miss rate — Every miss is a database hit.
- Eviction rate — High evictions mean your cache is too small or TTLs are wrong.
- Latency — p50, p95, p99 for cache operations.
- Memory usage — Know when you're approaching limits.
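Hit and miss counts are trivial to bolt onto any cache wrapper. A minimal sketch, where the Map stands in for your real cache client:

```javascript
// Tiny instrumentation sketch for tracking cache hit rate.
class CacheMetrics {
  constructor() {
    this.hits = 0;
    this.misses = 0;
  }

  record(found) {
    if (found) this.hits += 1;
    else this.misses += 1;
  }

  hitRate() {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}

// Wrap any lookup so every call is counted
function instrumentedGet(cacheMap, metrics, key) {
  const value = cacheMap.get(key); // stand-in for a real cache lookup
  metrics.record(value !== undefined);
  return value;
}
```

Export `hitRate()` to your metrics system (Prometheus, CloudWatch, etc.) and alert when it drifts below your target.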
7. Not Handling Cache Failures Gracefully
Redis is down. Now what? If your app crashes because the cache is unavailable, you've made the cache a single point of failure.
// GOOD — fall back to database if cache fails
async function getUserSafe(userId) {
try {
const cached = await redis.get(`user:${userId}`);
if (cached) return JSON.parse(cached);
} catch (err) {
console.warn('Cache unavailable, falling back to DB', err.message);
// Don't throw — fall through to DB
}
return db.query('SELECT * FROM users WHERE id = $1', [userId]);
}
TL;DR — Caching Cheat Sheet
┌──────────────────────────────────────────────────────────────┐
│ CACHING CHEAT SHEET │
├──────────────────────────────────────────────────────────────┤
│ │
│ STRATEGIES │
│ ───────── │
│ Cache-Aside .... Most common, app manages cache │
│ Read-Through ... Cache loads data on miss │
│ Write-Through .. Write to cache + DB synchronously │
│ Write-Behind ... Write to cache, async flush to DB │
│ Refresh-Ahead .. Proactively refresh before expiry │
│ │
│ EVICTION POLICIES │
│ ───────────────── │
│ LRU ............ Best default choice │
│ LFU ............ Stable popularity patterns │
│ TTL ............ Always use alongside other policies │
│ │
│ WHERE TO CACHE │
│ ────────────── │
│ Browser ........ HTTP headers, service workers │
│ CDN ............ Static assets, cacheable API responses │
│ Redis .......... Server-side, shared across instances │
│ In-memory ...... Ultra-hot data, single instance only │
│ │
│ GOLDEN RULES │
│ ──────────── │
│ 1. Always set a TTL │
│ 2. Never cache sensitive data unencrypted │
│ 3. Handle cache failures gracefully (fallback to DB) │
│ 4. Monitor hit rate — aim for 90%+ │
│ 5. Start with hot paths, expand from there │
│ 6. Cache invalidation > letting data go stale │
│ │
└──────────────────────────────────────────────────────────────┘
Let's Connect!
If this guide helped you level up your caching game, I'd love to connect! I regularly share deep dives on system design, backend architecture, and web performance.
Connect with me on LinkedIn — let's grow together.
Share this with a developer who's still hitting the database on every request!