Ishaan Pandey

Posted on • Originally published at ishaaan.hashnode.dev

Caching 101: The Ultimate Guide to Caching — From Basics to Production-Grade Strategies

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

If you've ever wondered why some apps feel instant while others make you stare at a spinner — the answer is usually caching. It's one of the most powerful performance tools in a developer's toolkit, and one of the most misunderstood.

This guide covers everything — from what caching is and why it matters, to production-grade strategies used by Netflix, Twitter, and Facebook. Whether you're building a side project or designing systems at scale, this is your complete reference.


Table of Contents

  1. What Is Caching and Why It Matters
  2. The Latency Numbers Every Developer Should Know
  3. Types of Caching
  4. Caching Strategies
  5. Cache Eviction Policies
  6. Where to Cache
  7. HTTP Caching Deep Dive
  8. Redis Caching Patterns
  9. Cache Invalidation — The Hard Problem
  10. Cache Stampede / Thundering Herd
  11. Real-World Examples
  12. Decision Framework
  13. Common Mistakes

What Is Caching and Why It Matters

Caching is storing a copy of data in a faster storage layer so future requests for that data are served quicker. That's it. The concept is dead simple — the execution is where it gets interesting.

Think of it like your desk. You could walk to the filing cabinet every time you need a document, or you could keep the ones you use frequently right on your desk. That's caching.

Without caching:
  Client → Server → Database → Server → Client
  Total: ~200-500ms

With caching:
  Client → Cache (hit!) → Client
  Total: ~1-5ms

The performance impact is massive:

  • Reduced latency — Serve responses in milliseconds instead of seconds
  • Lower database load — Your DB handles 10x fewer queries
  • Better scalability — Handle more users without scaling infrastructure
  • Cost savings — Fewer DB reads = lower cloud bills
  • Improved UX — Users don't wait, users don't leave

The Latency Numbers

These are approximate numbers every developer should internalize:

| Operation | Latency |
| --- | --- |
| L1 cache reference | ~1 ns |
| L2 cache reference | ~4 ns |
| RAM reference | ~100 ns |
| SSD random read | ~16 μs |
| HDD random read | ~2 ms |
| Redis GET (local) | ~0.1 ms |
| Redis GET (network) | ~0.5-1 ms |
| PostgreSQL simple query | ~1-5 ms |
| PostgreSQL complex query | ~10-100 ms |
| HTTP request (same region) | ~1-10 ms |
| HTTP request (cross-region) | ~50-150 ms |
| HTTP request (cross-continent) | ~100-300 ms |

Notice the difference: a RAM lookup is roughly 20,000x faster than an HDD read. A Redis GET is often 10-100x faster than a database query. That's why caching works so well — you're moving data closer to where it's needed.

Speed Hierarchy:

  ┌──────────────┐
  │  CPU Cache   │  ← ~1-4 ns (fastest, smallest)
  │  (L1/L2/L3)  │
  ├──────────────┤
  │     RAM      │  ← ~100 ns
  ├──────────────┤
  │  In-Process  │  ← ~0.01 ms (HashMap, local cache)
  │    Cache     │
  ├──────────────┤
  │  Distributed │  ← ~0.5-1 ms (Redis, Memcached)
  │    Cache     │
  ├──────────────┤
  │  Disk / SSD  │  ← ~0.01-2 ms
  ├──────────────┤
  │   Database   │  ← ~1-100 ms
  ├──────────────┤
  │   Network    │  ← ~50-300 ms (external API calls)
  │    Calls     │
  └──────────────┘
         ▼
    Slower, larger

Types of Caching

Caching happens at many layers of the stack. Let's walk through each one.

1. Browser Cache

The closest cache to the user. The browser stores static assets (images, CSS, JS) locally so it doesn't re-download them on every page visit.

Controlled via HTTP headers — we'll go deep on this later.

First visit:  Browser → Server → downloads bundle.js (200 OK)
Second visit: Browser → checks local cache → serves bundle.js (from cache)

2. CDN Cache

Content Delivery Networks (Cloudflare, CloudFront, Fastly) cache content at edge locations worldwide. Instead of every request hitting your origin server, the CDN serves cached content from the node closest to the user.

Without CDN:
  User in Tokyo → Origin in Virginia → 200ms round trip

With CDN:
  User in Tokyo → CDN edge in Tokyo → 5ms round trip

Great for: static assets, API responses that don't change often, entire pages (with SSG/ISR).

3. Application-Level Cache

This is what most developers think of — caching data inside your application or in a dedicated cache store like Redis.

┌─────────┐     ┌───────────────┐     ┌──────────┐
│  Client  │────▶│  Application  │────▶│ Database │
│         │     │               │     │          │
│         │     │  ┌─────────┐  │     │          │
│         │     │  │  Cache   │  │     │          │
│         │     │  │ (Redis)  │  │     │          │
│         │     │  └─────────┘  │     │          │
└─────────┘     └───────────────┘     └──────────┘

Two flavors:

  • In-process cache — Data stored in your app's memory (fast, but limited to that instance)
  • Distributed cache — Shared cache like Redis or Memcached (slightly slower, but shared across instances)

4. Database Cache

Most databases have their own internal caching:

  • MySQL Query Cache — Caches exact query results (deprecated in MySQL 5.7 and removed in MySQL 8.0 — it rarely helped)
  • PostgreSQL Shared Buffers — Caches frequently accessed pages in memory
  • MongoDB WiredTiger Cache — Keeps working set in RAM
  • Connection pooling — Not exactly caching, but reusing database connections avoids expensive handshakes

5. CPU Cache (Brief)

At the hardware level, CPUs have L1, L2, and L3 caches that store frequently accessed data from RAM. You can't control this directly, but writing cache-friendly code (sequential memory access, avoiding pointer chasing) can dramatically improve performance.

This matters more in systems programming (C, C++, Rust), but it's good to know the concept applies all the way down.


Caching Strategies

This is where it gets interesting. There are several patterns for how your application reads from and writes to the cache.

1. Cache-Aside (Lazy Loading)

The most common strategy. The application manages the cache directly.

Read flow:
  1. App checks cache
  2. If HIT → return cached data
  3. If MISS → query database → store in cache → return data

Write flow:
  1. App writes to database
  2. App invalidates/deletes the cache entry
// Cache-Aside pattern with Redis + Node.js
async function getUser(userId) {
  // 1. Check cache first
  const cached = await redis.get(`user:${userId}`);
  if (cached) {
    return JSON.parse(cached); // Cache HIT
  }

  // 2. Cache MISS — query database
  const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);

  // 3. Store in cache with TTL
  await redis.set(`user:${userId}`, JSON.stringify(user), 'EX', 3600); // 1 hour

  return user;
}

// On update — invalidate cache
async function updateUser(userId, data) {
  await db.query('UPDATE users SET name = $1 WHERE id = $2', [data.name, userId]);
  await redis.del(`user:${userId}`); // Invalidate
}

Pros: Simple, only caches what's actually requested (no wasted memory), app has full control.

Cons: Cache miss = slower first request, potential for stale data between write and invalidation.

2. Read-Through

Similar to cache-aside, but the cache itself is responsible for loading data on a miss. The application only talks to the cache.

Read flow:
  1. App asks cache for data
  2. If HIT → cache returns data
  3. If MISS → cache loads from database → stores it → returns to app

  App ←→ Cache ←→ Database
  (App never talks to DB directly for reads)

Pros: Simpler application code — the cache handles everything.

Cons: Requires a cache library that supports it (like Caffeine in Java), first request is still slow.
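In application code, read-through means the cache owns the loader, so callers never touch the database directly. A minimal sketch of the idea — the Map-based store and the loader callback are illustrative stand-ins, not a specific library's API:

```javascript
// Read-through sketch: the cache is constructed with a loader function
// and fetches from the source of truth itself on a miss.
class ReadThroughCache {
  constructor(loader, ttlMs = 60000) {
    this.loader = loader;   // async function that fetches from the source of truth
    this.ttlMs = ttlMs;
    this.store = new Map();
  }

  async get(key) {
    const entry = this.store.get(key);
    if (entry && Date.now() < entry.expiresAt) {
      return entry.value;                    // HIT — app never sees the DB
    }
    const value = await this.loader(key);    // MISS — the cache loads it itself
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}
```

The application only ever calls `cache.get(key)`; where the data comes from is the cache's problem.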

3. Write-Through

Every write goes through the cache before hitting the database. The cache and DB are always in sync.

Write flow:
  1. App writes to cache
  2. Cache synchronously writes to database
  3. Returns success only after both succeed

  App → Cache → Database (synchronous)
┌─────┐    write    ┌───────┐    write    ┌──────────┐
│ App │────────────▶│ Cache │────────────▶│ Database │
│     │             │       │   (sync)    │          │
│     │◀────────────│       │◀────────────│          │
│     │   success   │       │   success   │          │
└─────┘             └───────┘             └──────────┘

Pros: Cache is always consistent with DB, no stale reads.

Cons: Higher write latency (writing to two places), every write goes through the cache even if data isn't read often.

Best paired with: Read-Through (the combo gives you consistent reads and writes).
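A write-through layer fits in a few lines; in this sketch the Map and the `db` object are stand-ins for Redis and a real database. One design choice worth noting: persisting to the store before updating the cache means a failed write never leaves the cache ahead of the database.

```javascript
// Write-through sketch: every write updates both the backing store and
// the cache before returning, so cached reads are never stale.
class WriteThroughCache {
  constructor(db) {
    this.db = db;                       // stand-in for the real database
    this.cache = new Map();
  }

  async set(key, value) {
    await this.db.write(key, value);    // 1. persist first; if this throws, cache untouched
    this.cache.set(key, value);         // 2. then update the cache
  }

  get(key) {
    return this.cache.get(key);         // reads always reflect the last successful write
  }
}
```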

4. Write-Behind (Write-Back)

The app writes to the cache, and the cache asynchronously writes to the database later. The fastest write strategy.

Write flow:
  1. App writes to cache → returns immediately
  2. Cache batches writes and flushes to database asynchronously

  App → Cache → [async queue] → Database
┌─────┐    write    ┌───────┐         ┌──────────┐
│ App │────────────▶│ Cache │- - - - ▶│ Database │
│     │             │       │  async  │          │
│     │◀────────────│       │  batch  │          │
│     │   instant   │       │  write  │          │
└─────┘             └───────┘         └──────────┘

Pros: Blazing fast writes, batching reduces DB load.

Cons: Risk of data loss if cache crashes before flushing, eventual consistency, more complex.

Use case: High write throughput systems — logging, analytics, social media activity feeds.
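A toy write-behind implementation makes the trade-offs concrete: writes return instantly, and a timer flushes dirty keys to the store in batches. The Map-based `db` stub and the 1-second flush interval are illustrative choices.

```javascript
// Write-behind sketch: writes land in memory immediately; a background
// timer flushes dirty keys to the backing store in one batched call.
class WriteBehindCache {
  constructor(db, flushIntervalMs = 1000) {
    this.db = db;
    this.cache = new Map();
    this.dirty = new Set();             // keys written but not yet persisted
    this.timer = setInterval(() => this.flush(), flushIntervalMs);
  }

  set(key, value) {
    this.cache.set(key, value);         // instant — no DB round trip
    this.dirty.add(key);
  }

  async flush() {
    if (this.dirty.size === 0) return;
    const batch = [...this.dirty].map((k) => [k, this.cache.get(k)]);
    this.dirty.clear();                 // writes during the flush get re-marked dirty
    await this.db.batchWrite(batch);    // one batched write instead of many
  }

  stop() { clearInterval(this.timer); }
}
```

The data-loss risk from the Cons list is visible here: anything still in `dirty` when the process dies never reaches the database.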

5. Refresh-Ahead

The cache proactively refreshes entries before they expire, predicting which keys will be needed soon.

Flow:
  1. Cache tracks access patterns
  2. Before TTL expires on a hot key, cache refreshes it in the background
  3. Users always get a cache HIT with fresh data

Pros: Eliminates cache miss latency for hot data.

Cons: Complex to implement, wastes resources if predictions are wrong.
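One way to sketch refresh-ahead without a full prediction model: when a read arrives late in a key's TTL window, serve the cached value immediately and refresh it in the background. The 80% threshold below is an arbitrary illustrative choice, not a standard.

```javascript
// Refresh-ahead sketch: hot keys read past 80% of their TTL are
// refreshed in the background, so the next reader still gets a HIT.
class RefreshAheadCache {
  constructor(loader, ttlMs = 60000, refreshAt = 0.8) {
    this.loader = loader;
    this.ttlMs = ttlMs;
    this.refreshAt = refreshAt;
    this.store = new Map();
  }

  async get(key) {
    const entry = this.store.get(key);
    const now = Date.now();
    if (entry && now < entry.expiresAt) {
      // Late in the TTL window? Kick off a background refresh once.
      if (now > entry.refreshAfter && !entry.refreshing) {
        entry.refreshing = true;
        this.loader(key).then((value) => this.put(key, value));
      }
      return entry.value;                   // serve current value with no wait
    }
    const value = await this.loader(key);   // cold miss — load synchronously
    this.put(key, value);
    return value;
  }

  put(key, value) {
    const now = Date.now();
    this.store.set(key, {
      value,
      expiresAt: now + this.ttlMs,
      refreshAfter: now + this.ttlMs * this.refreshAt,
      refreshing: false,
    });
  }
}
```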

Strategy Comparison

| Strategy | Read Perf | Write Perf | Consistency | Complexity | Best For |
| --- | --- | --- | --- | --- | --- |
| Cache-Aside | Good (after warm) | Good | Eventual | Low | General purpose |
| Read-Through | Good (after warm) | N/A | Eventual | Medium | Read-heavy workloads |
| Write-Through | Great | Slower | Strong | Medium | Read-heavy + consistency |
| Write-Behind | Great | Fastest | Eventual | High | Write-heavy workloads |
| Refresh-Ahead | Best (no misses) | N/A | Near-real-time | High | Hot data, predictable access |

Cache Eviction Policies

Your cache has limited memory. When it's full and a new entry needs to come in, something has to go. That's what eviction policies decide.

LRU — Least Recently Used

Evicts the entry that hasn't been accessed for the longest time.

Cache state (capacity = 3):

  Access A → [A]
  Access B → [A, B]
  Access C → [A, B, C]       ← Full
  Access D → [B, C, D]       ← A evicted (least recently used)
  Access B → [C, D, B]       ← B moved to front (recently used)
  Access E → [D, B, E]       ← C evicted

The most popular eviction policy. Used by Memcached and most in-memory caches; Redis uses an approximated LRU for efficiency. Works well for most workloads because recently accessed data is likely to be accessed again.
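The trace above maps directly onto a minimal LRU implementation. In JavaScript, a Map iterates in insertion order, so deleting and re-inserting a key on access moves it to the most-recently-used end:

```javascript
// Minimal LRU cache: the first key in the Map's iteration order is
// always the least recently used, so eviction pops it.
class LRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);    // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      const oldest = this.map.keys().next().value; // least recently used
      this.map.delete(oldest);
    }
  }
}
```

Running the trace from the diagram against this class produces the same evictions (A first, then C).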

LFU — Least Frequently Used

Evicts the entry that has been accessed the fewest times overall.

Cache state (capacity = 3):

  A accessed 10 times, B accessed 2 times, C accessed 5 times
  New entry D arrives → B evicted (lowest frequency)

Better for: Workloads with some items that are consistently popular (trending posts, popular products). But it can be slow to adapt — a once-popular item stays cached even after it stops being relevant.
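A naive LFU sketch just tracks a counter per key and scans for the minimum on eviction. This is fine for illustration; production implementations use more efficient structures (frequency buckets, aging) to avoid the linear scan and the slow-adaptation problem noted above.

```javascript
// Naive LFU cache: each entry carries an access count; when full,
// evict the entry with the lowest count (ties broken arbitrarily).
class LFUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = new Map(); // key -> { value, count }
  }

  get(key) {
    const e = this.entries.get(key);
    if (!e) return undefined;
    e.count++;
    return e.value;
  }

  set(key, value) {
    const existing = this.entries.get(key);
    if (existing) {
      existing.value = value;
      existing.count++;
      return;
    }
    if (this.entries.size >= this.capacity) {
      let minKey, minCount = Infinity;
      for (const [k, e] of this.entries) {
        if (e.count < minCount) { minCount = e.count; minKey = k; }
      }
      this.entries.delete(minKey);        // least frequently used goes
    }
    this.entries.set(key, { value, count: 1 });
  }
}
```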

FIFO — First In, First Out

Evicts the oldest entry regardless of access patterns. Simple but naive.

Cache state (capacity = 3):

  Insert A → [A]
  Insert B → [A, B]
  Insert C → [A, B, C]       ← Full
  Insert D → [B, C, D]       ← A evicted (first in)

Rarely used in production caches but shows up in specific scenarios (message queues, circular buffers).

TTL-Based (Time-To-Live)

Entries expire after a fixed time period, regardless of access patterns. Not really an eviction policy but a complementary mechanism.

// Redis: key expires after 1 hour
await redis.set('user:123', JSON.stringify(user), 'EX', 3600);

// Redis: key expires at a specific timestamp
await redis.expireat('user:123', Math.floor(Date.now() / 1000) + 3600);

TTL is almost always used alongside another eviction policy. For example: LRU eviction + 1-hour TTL ensures data is evicted when memory is full and doesn't get too stale.

Random Eviction

Evicts a random entry. Surprisingly effective in some workloads and very cheap to implement.

Redis supports this as allkeys-random or volatile-random.
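In Redis, the eviction policy is a server-level setting rather than per-key code. A typical configuration caps memory and picks a policy in redis.conf (the directives are standard Redis options; the 2gb limit is an arbitrary example):

```conf
# redis.conf — cap memory and choose an eviction policy
maxmemory 2gb
maxmemory-policy allkeys-lru   # alternatives: allkeys-lfu, allkeys-random,
                               # volatile-lru, volatile-ttl, noeviction, ...
```

The `allkeys-*` policies may evict any key; the `volatile-*` policies only evict keys that have a TTL set.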

Eviction Policy Comparison

| Policy | Best For | Weakness |
| --- | --- | --- |
| LRU | General purpose, recency matters | Scan pollution (one-time scans evict hot data) |
| LFU | Stable popularity patterns | Slow to adapt to changing patterns |
| FIFO | Simple, predictable eviction | Ignores access patterns entirely |
| TTL | Time-sensitive data | Doesn't consider memory pressure |
| Random | When access is unpredictable | No intelligence, pure luck |

Where to Cache

Client-Side Caching

HTTP Cache Headers

The browser cache is controlled by HTTP response headers. This is the first line of defense.

Cache-Control: public, max-age=31536000    ← Cache for 1 year
Cache-Control: private, max-age=0          ← Don't cache in shared caches
Cache-Control: no-store                     ← Never cache this

Service Workers

Service workers can intercept network requests and serve cached responses — even when offline.

// service-worker.js — Cache-first strategy
self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request).then((cached) => {
      // Return cached version or fetch from network
      return cached || fetch(event.request).then((response) => {
        const clone = response.clone();
        caches.open('v1').then((cache) => cache.put(event.request, clone));
        return response;
      });
    })
  );
});

LocalStorage / SessionStorage

Good for caching non-sensitive, small data on the client.

// Simple client-side cache
function getCachedData(key, fetchFn, ttlMs = 300000) { // 5 min default
  const cached = localStorage.getItem(key);
  if (cached) {
    const { data, timestamp } = JSON.parse(cached);
    if (Date.now() - timestamp < ttlMs) {
      return data; // Cache HIT
    }
  }
  // Cache MISS — fetch and store
  return fetchFn().then((data) => {
    localStorage.setItem(key, JSON.stringify({ data, timestamp: Date.now() }));
    return data;
  });
}

Server-Side Caching

Redis

The king of server-side caching. An in-memory data structure store that supports strings, hashes, lists, sets, sorted sets, and more.

  • Speed: ~100,000 operations/second on a single node
  • Persistence: Optional (RDB snapshots, AOF logging)
  • Clustering: Redis Cluster for horizontal scaling
  • Pub/Sub: Built-in for cache invalidation events

Memcached

Simpler than Redis — a pure key-value cache. No data structures, no persistence, no clustering (by itself).

Use Memcached when: You need a dead-simple, multi-threaded cache for string key-value pairs.

Use Redis when: You need data structures, persistence, pub/sub, Lua scripting, or more advanced features. (This is almost always the answer in 2026.)

In-Memory (Application-Level)

Sometimes the fastest cache is just a Map in your application:

// Simple in-memory cache with TTL
class InMemoryCache {
  constructor() {
    this.cache = new Map();
  }

  get(key) {
    const entry = this.cache.get(key);
    if (!entry) return null;
    if (Date.now() > entry.expiresAt) {
      this.cache.delete(key);
      return null;
    }
    return entry.value;
  }

  set(key, value, ttlMs = 60000) {
    this.cache.set(key, {
      value,
      expiresAt: Date.now() + ttlMs,
    });
  }

  invalidate(key) {
    this.cache.delete(key);
  }
}

const cache = new InMemoryCache();
cache.set('config', { theme: 'dark' }, 300000); // 5 minutes

Warning: In-memory caches are per-process. If you have 4 app instances, each has its own cache — leading to inconsistency. Use Redis for shared state.

CDN Caching

CDNs like Cloudflare, AWS CloudFront, and Fastly cache responses at edge locations.

┌──────────┐     ┌──────────┐     ┌──────────┐
│  User in  │────▶│ CDN Edge │────▶│  Origin  │
│  London   │     │ London   │     │ Virginia │
└──────────┘     └──────────┘     └──────────┘
                      │
                  Cache HIT?
                  ┌───┴───┐
                 Yes      No
                  │        │
          Return  │   Forward to
          cached  │   origin, cache
          data    │   response

You control CDN caching via HTTP headers or CDN-specific rules:

Cache-Control: public, s-maxage=86400, max-age=3600
│                       │                │
│                       │                └─ Browser caches for 1 hour
│                       └─ CDN caches for 24 hours
└─ Both shared and private caches can store this

HTTP Caching Deep Dive

HTTP caching is one of the most impactful and least understood caching layers. Let's fix that.

Cache-Control Header

The main header that controls caching behavior:

# Cache publicly for 1 year (static assets with hashed filenames)
Cache-Control: public, max-age=31536000, immutable

# Cache privately (user-specific data), revalidate after 60 seconds
Cache-Control: private, max-age=60

# Don't cache at all (sensitive data, real-time data)
Cache-Control: no-store

# Cache but always revalidate before using
Cache-Control: no-cache

# CDN caches for 1 day, browser for 5 minutes
Cache-Control: public, s-maxage=86400, max-age=300

Common confusion: no-cache does NOT mean "don't cache." It means "cache it, but check with the server before using it." Use no-store to truly prevent caching.

ETag (Entity Tag)

A fingerprint of the response content. The server generates it, the browser sends it back on subsequent requests.

First request:
  GET /api/user/123
  Response:
    ETag: "abc123"
    Cache-Control: no-cache
    Body: { "name": "Alice" }

Second request:
  GET /api/user/123
  If-None-Match: "abc123"      ← "Hey server, has this changed?"

  If unchanged:
    304 Not Modified             ← No body sent! Browser uses cached version
    (saves bandwidth)

  If changed:
    200 OK
    ETag: "def456"               ← New fingerprint
    Body: { "name": "Alice W." }

Last-Modified / If-Modified-Since

Similar to ETag but uses timestamps instead of fingerprints:

First request:
  Response:
    Last-Modified: Mon, 01 Mar 2026 12:00:00 GMT

Second request:
  If-Modified-Since: Mon, 01 Mar 2026 12:00:00 GMT

  If unchanged: 304 Not Modified
  If changed: 200 OK with new data

ETag vs Last-Modified: ETag is more precise (content-based), Last-Modified can have issues with sub-second changes. Use ETag when possible.

stale-while-revalidate

A game-changer for perceived performance. Serve stale content immediately while fetching fresh content in the background.

Cache-Control: max-age=60, stale-while-revalidate=300

Timeline:
  0-60s:    Serve from cache (fresh)
  60-360s:  Serve STALE from cache instantly + revalidate in background
  360s+:    Cache expired, must fetch from origin
┌────────────────┬─────────────────────┬────────────────┐
│  0-60 seconds  │  60-360 seconds     │  After 360s    │
│                │                     │                │
│  Fresh cache   │  Stale cache served │  Must          │
│  served        │  instantly +        │  revalidate    │
│  directly      │  background refresh │  (full wait)   │
└────────────────┴─────────────────────┴────────────────┘

Express.js Example — Setting Cache Headers

const express = require('express');
const app = express();

// Static assets — cache aggressively (files have hashed names)
app.use('/static', express.static('public', {
  maxAge: '1y',
  immutable: true,
}));

// API response — short cache with revalidation
app.get('/api/products', (req, res) => {
  const products = getProducts();
  const etag = generateETag(products);

  // Check if client has current version
  if (req.headers['if-none-match'] === etag) {
    return res.status(304).end();
  }

  res.set({
    'Cache-Control': 'public, max-age=60, stale-while-revalidate=300',
    'ETag': etag,
  });
  res.json(products);
});

// User-specific data — private cache
app.get('/api/me', authMiddleware, (req, res) => {
  res.set('Cache-Control', 'private, max-age=0, no-cache');
  res.json(req.user);
});

// Sensitive data — never cache
app.get('/api/payment-methods', authMiddleware, (req, res) => {
  res.set('Cache-Control', 'no-store');
  res.json(getPaymentMethods(req.user.id));
});

Redis Caching Patterns

Redis is the go-to caching solution for most production systems. Here are battle-tested patterns.

Basic Key-Value Caching

const Redis = require('ioredis');
const redis = new Redis();

// Simple GET/SET with TTL
async function cacheGet(key) {
  const data = await redis.get(key);
  return data ? JSON.parse(data) : null;
}

async function cacheSet(key, data, ttlSeconds = 3600) {
  await redis.set(key, JSON.stringify(data), 'EX', ttlSeconds);
}

Hash-Based Caching (For Objects)

Instead of serializing entire objects, use Redis hashes to cache individual fields:

// Store user as a hash — update individual fields without re-caching everything
async function cacheUser(user) {
  await redis.hset(`user:${user.id}`, {
    name: user.name,
    email: user.email,
    role: user.role,
  });
  await redis.expire(`user:${user.id}`, 3600);
}

// Get specific fields
async function getUserName(userId) {
  return await redis.hget(`user:${userId}`, 'name');
}

// Get all fields
async function getUser(userId) {
  return await redis.hgetall(`user:${userId}`);
}

Sorted Set for Leaderboards / Top-N

// Add scores
await redis.zadd('leaderboard', 1500, 'player:alice');
await redis.zadd('leaderboard', 2300, 'player:bob');
await redis.zadd('leaderboard', 1800, 'player:charlie');

// Top 10 players (highest scores first)
const top10 = await redis.zrevrange('leaderboard', 0, 9, 'WITHSCORES');
// ['player:bob', '2300', 'player:charlie', '1800', 'player:alice', '1500']

Cache Warming

Pre-populate the cache before traffic hits — especially useful after deployments.

async function warmCache() {
  console.log('Warming cache...');

  // Fetch most popular items and pre-cache them
  const popularProducts = await db.query(
    'SELECT * FROM products ORDER BY view_count DESC LIMIT 100'
  );

  const pipeline = redis.pipeline();
  for (const product of popularProducts) {
    pipeline.set(
      `product:${product.id}`,
      JSON.stringify(product),
      'EX',
      3600
    );
  }

  await pipeline.exec(); // Execute all at once (much faster than individual calls)
  console.log(`Warmed ${popularProducts.length} products`);
}

Multi-Layer Cache (L1 + L2)

Use an in-memory cache for ultra-hot data and Redis for everything else:

const NodeCache = require('node-cache');
const l1Cache = new NodeCache({ stdTTL: 30 }); // 30 second local cache

async function getWithMultiLayerCache(key, fetchFn) {
  // L1: In-memory (fastest)
  const l1 = l1Cache.get(key);
  if (l1) return l1;

  // L2: Redis (fast, shared)
  const l2 = await redis.get(key);
  if (l2) {
    const parsed = JSON.parse(l2);
    l1Cache.set(key, parsed); // Promote to L1
    return parsed;
  }

  // L3: Database (slow, source of truth)
  const data = await fetchFn();
  await redis.set(key, JSON.stringify(data), 'EX', 3600); // L2
  l1Cache.set(key, data); // L1
  return data;
}
Request flow:

  ┌────┐   miss   ┌───────┐   miss   ┌──────────┐
  │ L1 │────────▶│  L2   │────────▶│ Database │
  │Mem │         │ Redis │         │          │
  └──┬─┘         └───┬───┘         └────┬─────┘
     │               │                  │
     │ hit           │ hit              │ fetch
     ▼               ▼                  ▼
  Return          Promote           Populate
  instantly       to L1 +           L1 + L2 +
                  return            return

Cache Invalidation

This is the genuinely hard part. Your cache is only useful if it serves correct data. Here's how to keep it consistent.

Strategy 1: TTL-Based Expiration

The simplest approach — set a TTL and accept that data might be stale for up to that duration.

// Data is eventually consistent within a 5-minute window
await redis.set('product:123', JSON.stringify(product), 'EX', 300);

When to use: Data where slight staleness is acceptable (product catalogs, blog posts, search results).

Strategy 2: Event-Driven Invalidation

Invalidate the cache when the underlying data changes. This is the most common production approach.

// When a product is updated, invalidate its cache
async function updateProduct(productId, data) {
  await db.query('UPDATE products SET ... WHERE id = $1', [productId]);

  // Invalidate all related cache keys
  await redis.del(`product:${productId}`);
  await redis.del(`products:category:${data.categoryId}`);
  await redis.del('products:featured');

  // Or publish an invalidation event
  await redis.publish('cache:invalidate', JSON.stringify({
    type: 'product',
    id: productId,
  }));
}

Strategy 3: Version-Based Keys

Instead of invalidating, change the cache key:

// Store a version counter
let productVersion = await redis.incr('product:123:version'); // e.g., 42

// Cache key includes the version
const cacheKey = `product:123:v${productVersion}`;
await redis.set(cacheKey, JSON.stringify(product), 'EX', 3600);

// Old versions naturally expire via TTL

Strategy 4: Pub/Sub Invalidation (Multi-Instance)

When you have multiple app instances, each with their own L1 cache:

// Subscriber — every app instance listens
const subscriber = new Redis();
subscriber.subscribe('cache:invalidate');

subscriber.on('message', (channel, message) => {
  const { type, id } = JSON.parse(message);
  l1Cache.del(`${type}:${id}`); // Invalidate local cache
});

// Publisher — when data changes
async function onDataChange(type, id) {
  await redis.del(`${type}:${id}`);  // Invalidate Redis
  await redis.publish('cache:invalidate', JSON.stringify({ type, id })); // Notify all instances
}

The Invalidation Complexity Spectrum

Simple ◄────────────────────────────────────────────► Complex

  TTL        Event-driven     Version keys    Pub/Sub +
  only       invalidation     with rollback   multi-layer
                                              invalidation
   │              │                │                │
   │  Stale for   │  Consistent    │  Zero-downtime │  Real-time
   │  up to TTL   │  after event   │  cache updates │  consistency
   │              │                │                │  across
   │  Simplest    │  Most common   │  More complex  │  all nodes

Cache Stampede / Thundering Herd

One of the nastiest caching problems. Here's the scenario:

1. A popular cache key expires
2. 1000 requests arrive simultaneously
3. All 1000 see a cache MISS
4. All 1000 query the database at the same time
5. Database gets crushed
6. Everything is slow or crashes
Normal operation:         Cache stampede:

  Req → Cache HIT ✓       Req₁ → Cache MISS → DB query
  Req → Cache HIT ✓       Req₂ → Cache MISS → DB query
  Req → Cache HIT ✓       Req₃ → Cache MISS → DB query
  Req → Cache HIT ✓       ...
                           Req₁₀₀₀ → Cache MISS → DB query
                                                    ↓
                                              DATABASE DIES

Solution 1: Locking (Mutex)

Only one request fetches from the database. Others wait for the cache to be populated.

async function getWithLock(key, fetchFn, ttl = 3600) {
  // Try cache first
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  // Try to acquire a lock
  const lockKey = `lock:${key}`;
  const acquired = await redis.set(lockKey, '1', 'EX', 10, 'NX'); // 10s lock timeout

  if (acquired) {
    try {
      // We got the lock — fetch from DB and populate cache
      const data = await fetchFn();
      await redis.set(key, JSON.stringify(data), 'EX', ttl);
      return data;
    } finally {
      await redis.del(lockKey); // Release lock
    }
  } else {
    // Someone else is fetching — wait and retry
    await new Promise((resolve) => setTimeout(resolve, 100));
    return getWithLock(key, fetchFn, ttl); // Retry
  }
}

Solution 2: Early Expiration (Probabilistic)

Randomly refresh the cache before it actually expires. The idea: if 1% of requests refresh the cache when it's close to expiry, you avoid the stampede.

async function getWithEarlyExpiry(key, fetchFn, ttl = 3600) {
  const cached = await redis.get(key);
  const cacheTTL = await redis.ttl(key);

  if (cached && cacheTTL > 0) {
    // Probabilistically refresh if TTL is low
    const shouldRefresh = cacheTTL < (ttl * 0.1) && Math.random() < 0.1;

    if (shouldRefresh) {
      // Refresh in background — don't block the response
      fetchFn().then((data) => {
        redis.set(key, JSON.stringify(data), 'EX', ttl);
      });
    }

    return JSON.parse(cached);
  }

  // Cache miss — fetch and store
  const data = await fetchFn();
  await redis.set(key, JSON.stringify(data), 'EX', ttl);
  return data;
}

Solution 3: stale-while-revalidate (Never Expire)

Always serve stale data while refreshing in the background. The cache never truly "expires" — it just gets refreshed.

async function getWithSWR(key, fetchFn, freshTTL = 60, staleTTL = 3600) {
  const entry = await redis.hgetall(`swr:${key}`);

  if (entry.data) {
    const age = Date.now() - parseInt(entry.timestamp);

    if (age < freshTTL * 1000) {
      return JSON.parse(entry.data); // Fresh
    }

    if (age < staleTTL * 1000) {
      // Stale — serve immediately, refresh in background
      fetchFn().then((data) => {
        redis.hset(`swr:${key}`, {
          data: JSON.stringify(data),
          timestamp: Date.now().toString(),
        });
      });
      return JSON.parse(entry.data); // Serve stale
    }
  }

  // No data or expired — fetch synchronously
  const data = await fetchFn();
  await redis.hset(`swr:${key}`, {
    data: JSON.stringify(data),
    timestamp: Date.now().toString(),
  });
  return data;
}

Real-World Examples

Netflix

Netflix serves 250 million+ subscribers and caches aggressively at every layer:

  • EVCache — Their custom distributed caching layer built on Memcached. Handles 30 million requests per second with sub-millisecond latency.
  • Edge caching — Movie artwork, metadata, and personalized recommendations are cached at CDN edge locations.
  • Tiered caching — L1 (in-process) → L2 (EVCache) → L3 (database). Most reads never hit the database.
  • Precomputed caches — Recommendations are computed offline and stored in cache, not generated in real-time.
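
The tiered read path can be sketched with plain Maps standing in for each layer. This is an illustration only: the function and layer names are invented here, and EVCache's real client API is different.

```javascript
// Hypothetical tiered lookup: L1 (in-process) → L2 (distributed) → L3 (DB).
// On a miss, the value is backfilled into the faster layers ("promotion").
function tieredGet(key, l1, l2, loadFromDb) {
  if (l1.has(key)) return l1.get(key);   // L1 hit: fastest path
  if (l2.has(key)) {
    const value = l2.get(key);
    l1.set(key, value);                  // promote into L1
    return value;
  }
  const value = loadFromDb(key);         // L3: the database, rarely reached
  l2.set(key, value);
  l1.set(key, value);
  return value;
}

// After the first read, subsequent reads never touch the database.
const l1 = new Map(), l2 = new Map();
let dbReads = 0;
const load = (key) => { dbReads += 1; return `row-for-${key}`; };
tieredGet('movie:42', l1, l2, load);
tieredGet('movie:42', l1, l2, load);
console.log(dbReads); // 1
```

The promotion step is the key design choice: each layer shields the one below it, which is why most reads never reach the database.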

Twitter (X)

Twitter's timeline is one of the most cache-intensive systems:

  • Timeline fanout — When a user tweets, the tweet is pushed into the cached timeline of every follower (fanout-on-write).
  • Redis clusters — Twitter runs one of the world's largest Redis deployments for timeline caching.
  • Celebrity problem — Users with millions of followers can't use fanout (too expensive), so their tweets are merged at read time (hybrid approach).
Regular user tweets (fanout-on-write):
  Tweet → Write to 500 follower timelines in cache

Celebrity tweets (fanout-on-read):
  Tweet → Store once → Merge into timeline when follower reads
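The hybrid decision above can be sketched in a few lines. Everything here is illustrative: the threshold value and the Map-based "timelines" are invented for the example, and Twitter's real pipeline is far more involved.

```javascript
// Sketch of hybrid fanout: small accounts fan out on write,
// celebrity accounts store once and merge on read.
const FANOUT_LIMIT = 10000; // hypothetical cutoff for fanout-on-write

function push(map, key, item) {
  if (!map.has(key)) map.set(key, []);
  map.get(key).push(item);
}

function postTweet(author, tweet, followers, timelines, celebrityFeeds) {
  if (followers.length <= FANOUT_LIMIT) {
    for (const f of followers) push(timelines, f, tweet); // fanout-on-write
  } else {
    push(celebrityFeeds, author, tweet); // stored once, merged at read time
  }
}

function readTimeline(user, followedCelebrities, timelines, celebrityFeeds) {
  const merged = [...(timelines.get(user) ?? [])];
  for (const c of followedCelebrities) merged.push(...(celebrityFeeds.get(c) ?? []));
  return merged;
}

const timelines = new Map(), celebrityFeeds = new Map();
postTweet('alice', 'hi', ['bob', 'carol'], timelines, celebrityFeeds);
postTweet('celeb', 'yo', Array.from({ length: 20000 }, (_, i) => `u${i}`), timelines, celebrityFeeds);
console.log(readTimeline('bob', ['celeb'], timelines, celebrityFeeds)); // ['hi', 'yo']
```

The trade-off: a regular user's tweet is written N times (once per follower), while a celebrity's tweet is written once and merged N times on read.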

Facebook (Meta)

Facebook's caching system is legendary:

  • Memcache + TAO — A massive look-aside Memcache tier, plus TAO, a graph-aware caching layer that sits in front of their MySQL databases. Together they handle billions of requests per second.
  • Regional caching — Data is cached in the region closest to the user, with cross-region invalidation.
  • Cache lease — Their solution to thundering herd: when a cache miss occurs, the first request gets a "lease" (permission to refresh), and all other requests wait.
  • Mcrouter — A Memcached protocol router that handles cache sharding, replication, and failover.
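
The lease idea can be shown in miniature with in-memory Maps. This is a hedged sketch: Facebook's real version is built into their Memcache protocol, and the names here are invented.

```javascript
// On a miss, the first caller is granted the lease and recomputes the value;
// everyone else is told to back off and retry instead of hitting the DB.
function getWithLease(store, leases, key, compute) {
  if (store.has(key)) return { value: store.get(key) };
  if (leases.has(key)) return { retry: true }; // another caller holds the lease
  leases.set(key, true);                       // we hold the lease now
  try {
    const value = compute(key);
    store.set(key, value);
    return { value };
  } finally {
    leases.delete(key);                        // always release the lease
  }
}

const store = new Map(), leases = new Map();
console.log(getWithLease(store, leases, 'user:1', () => 'profile')); // { value: 'profile' }
leases.set('user:2', true); // simulate a concurrent caller holding the lease
console.log(getWithLease(store, leases, 'user:2', () => 'profile')); // { retry: true }
```

In a real system the lease would also carry a timeout, so a crashed lease holder can't block refreshes forever.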

Decision Framework

What to Cache

Good candidates for caching:
  ✓ Database query results (especially complex JOINs)
  ✓ API responses from external services
  ✓ Computed/aggregated data (dashboards, reports, analytics)
  ✓ Session data
  ✓ Configuration and feature flags
  ✓ Static assets (images, CSS, JS)
  ✓ User profiles and preferences
  ✓ Product catalogs / search results

Bad candidates for caching:
  ✗ Rapidly changing data (stock prices, live scores)
  ✗ Sensitive data (passwords, tokens, PII — unless encrypted)
  ✗ Write-heavy data with low read frequency
  ✗ Large blobs that rarely repeat (file uploads)
  ✗ Data that MUST be real-time consistent (financial transactions)

Decision Flowchart

                    Is the data read frequently?
                         /            \
                       Yes              No
                        |                |
                  Is staleness        Don't cache.
                  acceptable?         Not worth it.
                   /        \
                 Yes          No (must be real-time)
                  |                    |
           Is it expensive       Consider write-through
           to compute/fetch?     or event-driven
                  |              invalidation
                Yes
                  |
            ┌──────┴─────┐
            │  CACHE IT  │
            └──────┬─────┘
                  |
           Choose your layer:
           ┌──────────────────────────┐
           │ Static content → CDN     │
           │ User-specific → Redis    │
           │ Hot data → In-memory     │
           │ API responses → HTTP     │
           │   cache headers          │
           └──────────────────────────┘

Choosing a Cache TTL

| Data Type | Recommended TTL | Rationale |
|---|---|---|
| Static assets (hashed) | 1 year | Filename changes on update |
| Product catalog | 5-15 min | Changes infrequently |
| Search results | 1-5 min | Acceptable staleness |
| User profile | 1-24 hours | Rarely changes |
| Session data | 30 min - 24 hours | Depends on security needs |
| Feature flags | 1-5 min | Want changes to propagate quickly |
| API rate limit counters | Match the rate limit window | Must be accurate |
| Real-time data | Don't cache, or < 5 seconds | Staleness = bad UX |

Common Mistakes

1. Caching Sensitive Data Without Encryption

Never cache passwords, tokens, credit card numbers, or PII in plain text. If you must cache sensitive data, encrypt it and use short TTLs.

// BAD — plain text sensitive data in cache
await redis.set('user:123:payment', JSON.stringify({ cardNumber: '4111...' }));

// BETTER — encrypt before caching
const encrypted = encrypt(JSON.stringify(paymentData));
await redis.set('user:123:payment', encrypted, 'EX', 300); // Short TTL

2. Not Setting TTL

Cache entries without a TTL live forever. This leads to stale data and memory exhaustion.

// BAD — no TTL, lives forever
await redis.set('user:123', JSON.stringify(user));

// GOOD — always set a TTL
await redis.set('user:123', JSON.stringify(user), 'EX', 3600);

Also set maxmemory-policy in Redis to handle overflow:

# redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru

3. Cache Poisoning

If an attacker can control what gets cached, they can serve malicious content to all users.

Attack: Manipulate request headers/params → server generates bad response → CDN caches it

Prevention:
  - Normalize cache keys
  - Validate all inputs before caching
  - Use Vary headers correctly
  - Don't cache error responses
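Key normalization might look like this. The allow-list is hypothetical; tailor it to the params your endpoint actually supports.

```javascript
// Drop unknown query params (attacker-added or tracking params) and sort
// the rest, so equivalent requests map to one canonical cache key.
const ALLOWED_PARAMS = ['page', 'q', 'sort']; // hypothetical allow-list

function normalizeCacheKey(rawUrl) {
  const url = new URL(rawUrl);
  const kept = [...url.searchParams.entries()]
    .filter(([name]) => ALLOWED_PARAMS.includes(name))
    .sort(([a], [b]) => a.localeCompare(b));
  const query = kept.map(([name, value]) => `${name}=${value}`).join('&');
  return `${url.hostname}${url.pathname}?${query}`; // URL lowercases the host
}

console.log(normalizeCacheKey('https://API.example.com/search?utm_source=evil&q=redis&page=2'));
// → api.example.com/search?page=2&q=redis
```

Because unrecognized params are dropped rather than echoed into the key, an attacker can't mint unlimited cache entries or poison a key that other users will hit.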

4. Not Thinking About Cache Warming

After a deployment or cache flush, every request is a miss. Your database gets slammed.

Fix: Warm the cache with popular data before routing traffic.
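
A warming pass can be as simple as iterating your known-hot keys before the instance takes traffic. This is a sketch: `loadPopularItems`, the key format, and the cache client shape are stand-ins for whatever your stack provides.

```javascript
// Pre-populate the cache with popular items so the first real requests hit.
async function warmCache(cache, loadPopularItems, ttlSeconds = 900) {
  const items = await loadPopularItems(); // e.g. top products by recent views
  for (const item of items) {
    await cache.set(`product:${item.id}`, JSON.stringify(item), ttlSeconds);
  }
  return items.length; // how many entries were warmed
}

// Example with an in-memory stand-in for the cache client:
const warmStore = new Map();
const fakeCache = { set: async (key, value, ttl) => { warmStore.set(key, value); } };
warmCache(fakeCache, async () => [{ id: 1 }, { id: 2 }])
  .then((count) => console.log(`warmed ${count} entries`)); // warmed 2 entries
```

Run this during startup and only mark the instance healthy once it finishes, so the load balancer never routes traffic to a cold instance.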

5. Caching Too Much or Too Little

  • Too much caching → Memory waste, stale data issues, complex invalidation
  • Too little caching → Performance bottlenecks, high DB load

Start with caching your hottest paths — the 20% of queries that generate 80% of traffic. Measure, then expand.

6. Ignoring Cache Metrics

You can't improve what you don't measure. Track:

  • Hit rate — Aim for 90%+. Below 80% means your caching strategy needs work.
  • Miss rate — Every miss is a database hit.
  • Eviction rate — High evictions mean your cache is too small or TTLs are wrong.
  • Latency — p50, p95, p99 for cache operations.
  • Memory usage — Know when you're approaching limits.
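
Even a tiny in-process tally beats flying blind. A minimal sketch; in production you would export these counters to your metrics system (StatsD, Prometheus, or similar) rather than keep them in memory.

```javascript
// Track hits and misses around every cache lookup.
class CacheMetrics {
  constructor() {
    this.hits = 0;
    this.misses = 0;
  }
  recordHit() { this.hits += 1; }
  recordMiss() { this.misses += 1; }
  hitRate() {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total; // aim for 0.9+
  }
}

const metrics = new CacheMetrics();
metrics.recordHit(); metrics.recordHit(); metrics.recordHit();
metrics.recordMiss();
console.log(metrics.hitRate()); // 0.75
```

Call `recordHit()`/`recordMiss()` inside your cache-aside helper, and alert when the hit rate drops: a sudden fall usually means a deploy flushed the cache or a TTL changed.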

7. Not Handling Cache Failures Gracefully

Redis is down. Now what? If your app crashes because the cache is unavailable, you've made the cache a single point of failure.

// GOOD — fall back to database if cache fails
async function getUserSafe(userId) {
  try {
    const cached = await redis.get(`user:${userId}`);
    if (cached) return JSON.parse(cached);
  } catch (err) {
    console.warn('Cache unavailable, falling back to DB', err.message);
    // Don't throw — fall through to DB
  }

  return db.query('SELECT * FROM users WHERE id = $1', [userId]);
}

TL;DR — Caching Cheat Sheet

┌──────────────────────────────────────────────────────────────┐
│                     CACHING CHEAT SHEET                       │
├──────────────────────────────────────────────────────────────┤
│                                                               │
│  STRATEGIES                                                   │
│  ─────────                                                    │
│  Cache-Aside .... Most common, app manages cache              │
│  Read-Through ... Cache loads data on miss                    │
│  Write-Through .. Write to cache + DB synchronously           │
│  Write-Behind ... Write to cache, async flush to DB           │
│  Refresh-Ahead .. Proactively refresh before expiry           │
│                                                               │
│  EVICTION POLICIES                                            │
│  ─────────────────                                            │
│  LRU ............ Best default choice                         │
│  LFU ............ Stable popularity patterns                  │
│  TTL ............ Always use alongside other policies         │
│                                                               │
│  WHERE TO CACHE                                               │
│  ──────────────                                               │
│  Browser ........ HTTP headers, service workers               │
│  CDN ............ Static assets, cacheable API responses      │
│  Redis .......... Server-side, shared across instances        │
│  In-memory ...... Ultra-hot data, single instance only        │
│                                                               │
│  GOLDEN RULES                                                 │
│  ────────────                                                 │
│  1. Always set a TTL                                          │
│  2. Never cache sensitive data unencrypted                    │
│  3. Handle cache failures gracefully (fallback to DB)         │
│  4. Monitor hit rate — aim for 90%+                           │
│  5. Start with hot paths, expand from there                   │
│  6. Cache invalidation > letting data go stale                │
│                                                               │
└──────────────────────────────────────────────────────────────┘

Let's Connect!

If this guide helped you level up your caching game, I'd love to connect! I regularly share deep dives on system design, backend architecture, and web performance.

Connect with me on LinkedIn — let's grow together.

Share this with a developer who's still hitting the database on every request!
