DEV Community

SpicyCode

Cache Strategies Explained: Part 1 - The Fundamentals

How tech giants (Netflix, Facebook, Google, Twitter) serve billions of requests per second using caching


The Incident That Changed Everything

Netflix, Production Incident (Reported September 2025)

An experienced developer types an ALTER TABLE command in their terminal. This is routine work, something they've done hundreds of times. They hit Enter.

ALTER TABLE user_preferences...

Three seconds later, the alert fires.

Dashboards light up red. The primary database just suffered massive corruption. Critical user preference data (profiles, watch lists, personalized recommendations) became unusable.

In a typical company, this is where you start calculating the millions of dollars the incident will cost, and where careers hang in the balance.

But at Netflix, something unexpected happens.

No customer noticed anything. No complaints, no service interruption. 200+ million subscribers kept watching their shows peacefully.

How is this possible?

Two silent technologies saved the day:

  1. A cache continuing to serve valid data
  2. A Write-Ahead Log (WAL) that had captured all mutations before the corruption

Engineers simply extended the cache TTL, replayed mutations from Kafka, cleaned up the corruption, and resumed operations. Result: zero data loss, zero downtime.

Transparency note: Netflix hasn't publicly disclosed the exact number of affected records or full incident details. Information comes from their official blog post (September 2025) demonstrating the critical importance of their cache + WAL architecture for resilience.


Why Caching Is Not Optional

This incident proves that caching isn't just a performance optimization. It's a critical protection layer that can mean the difference between a minor incident and a multi-million dollar catastrophe.

In this two-part series, we'll explore:

  • Part 1: Fundamental strategies every developer should know
  • Part 2: Enterprise-grade advanced architectures (WAL, multi-region, resilience)

The 6 Fundamental Strategies

1. TTL (Time-To-Live) - Temporal Expiration

TTL defines how long data remains valid in cache before being automatically deleted or refreshed.

Implementation example:

# Redis with TTL (pseudocode; with redis-py this would be r.set("user:123", user_data, ex=3600))
cache.set("user:123", user_data, ttl=3600)  # Expires after 1 hour

Ideal use cases:

  • Weather data (hourly refresh)
  • News feeds (updated every 5 minutes)
  • Product prices (daily changes)
  • User sessions

TTL is universal. Every major tech company uses it in some form.

Important: TTL and eviction policies work together

In production, TTL and LRU/LFU operate simultaneously in Redis/Memcached:

# Redis configuration: maxmemory-policy allkeys-lru
cache.set("user:123", data, ttl=3600)

# This item will expire in 1 hour OR be evicted earlier if cache is full (LRU)

Data can disappear from cache for two reasons:

  • TTL expired: time elapsed (3600 seconds in the example)
  • Eviction: cache full, least recently used item removed (LRU)

This combination ensures both data freshness (TTL) and optimal memory usage (LRU).
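To make the interplay concrete, here is a toy in-memory cache (illustrative only, not production code) where an entry can disappear for either reason: its TTL elapsed, or LRU eviction reclaimed the slot when the cache filled up:

```python
import time
from collections import OrderedDict

class MiniCache:
    """Toy cache combining per-key TTL with LRU eviction (illustration only)."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._store = OrderedDict()  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)
        self._store.move_to_end(key)           # mark as most recently used
        if len(self._store) > self.maxsize:    # cache full -> LRU eviction
            self._store.popitem(last=False)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None                        # miss: never stored, or evicted
        value, expires_at = item
        if time.monotonic() >= expires_at:     # miss: TTL expired
            del self._store[key]
            return None
        self._store.move_to_end(key)           # a hit refreshes recency, not TTL
        return value
```

Either mechanism can remove an entry first, exactly as described above: a rarely used key may be evicted long before its TTL, and a hot key still expires on schedule.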


2. LRU (Least Recently Used) - Priority to Recent Items

When cache is full, LRU removes the least recently accessed data. It's like organizing your desk: you keep what you use often within reach.

Visual workflow:

Cache (capacity: 3 items)
1. Access A → [A]
2. Access B → [A, B]
3. Access C → [A, B, C]
4. Access D → [B, C, D]  // A evicted (least recently used)
5. Access B → [C, D, B]  // B becomes most recently used

Ideal use cases:

  • Web pages (repeated navigation)
  • Active user sessions
  • Browsing history

Used in production by: Netflix (EVCache with client-side LRU)
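The workflow above maps almost one-to-one onto Python's OrderedDict; here is a minimal sketch (illustrative, not how EVCache is implemented):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache mirroring the workflow above (capacity: 3 items)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # least recently used first

    def access(self, key, value=None):
        if key in self._items:
            self._items.move_to_end(key)         # refresh recency on every access
        else:
            self._items[key] = value
            if len(self._items) > self.capacity:
                self._items.popitem(last=False)  # drop the least recently used
        return list(self._items)

cache = LRUCache(3)
for key in ["A", "B", "C"]:
    cache.access(key)
cache.access("D")          # A evicted
print(cache.access("B"))   # ['C', 'D', 'B']
```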


3. LFU (Least Frequently Used) - Priority to Popularity

LFU keeps the most frequently requested data, regardless of last access time.

LRU vs LFU difference:

  • LRU: "When did you last use this?"
  • LFU: "How many times have you used this total?"

Concrete example:

Data: A (used 10x), B (used 2x), C (used 5x)
Cache full → Remove B (least frequent)

Ideal use cases:

  • E-commerce best-sellers
  • Viral content with lasting popularity
  • Repetitive search queries
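A minimal LFU sketch that mirrors the A/B/C example above (counts are kept in a Counter here for clarity; real implementations use more efficient structures such as frequency buckets):

```python
from collections import Counter

class LFUCache:
    """Minimal LFU sketch: evict the key with the lowest access count."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}
        self._counts = Counter()

    def get(self, key):
        if key in self._data:
            self._counts[key] += 1
        return self._data.get(key)

    def set(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            # Evict the least frequently used key (ties broken arbitrarily)
            victim, _ = min(self._counts.items(), key=lambda kv: kv[1])
            del self._data[victim]
            del self._counts[victim]
        self._data[key] = value
        self._counts[key] += 1

# Mirror the example above: A used 10x, B used 2x, C used 5x
cache = LFUCache(capacity=3)
for key in ("A", "B", "C"):
    cache.set(key, key.lower())
for _ in range(9):
    cache.get("A")          # A: 1 set + 9 gets = 10 accesses
cache.get("B")              # B: 2 accesses
for _ in range(4):
    cache.get("C")          # C: 5 accesses
cache.set("D", "d")         # cache full -> B (least frequent) evicted
print(cache.get("B"))       # None
```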

4. Write-Through vs Write-Behind - Write Strategies

Write-Through (Synchronous Write)

Application writes to the cache AND the database synchronously, within the same operation.

def save_user(user):
    cache.set(f"user:{user.id}", user)
    database.save(user)  # Both writes complete before returning

Pros: guaranteed data consistency
Cons: higher write latency
Use case: banking, financial transactions, critical data

Used by: Facebook TAO (synchronous cache + DB writes)


Write-Behind / Write-Back (Asynchronous Write)

Application writes to cache first, then to database asynchronously.

def save_user(user):
    cache.set(f"user:{user.id}", user)
    queue.add_job("save_to_db", user)  # Async (via message queue)

Pros: ultra-fast writes
Cons: risk of loss if crash before DB save
Use case: logs, analytics, non-critical metrics

Important note: Simple Write-Behind has production limitations. In Part 2, we'll see how Netflix transformed it into Write-Ahead Log (WAL) for enterprise-grade durability guarantees.


5. Cache-Aside (Lazy Loading) - The Most Common Pattern

This is the dominant strategy in the industry. The application manages the cache itself.

def get_user(user_id):
    # 1. Check cache
    user = cache.get(f"user:{user_id}")

    if user:
        return user  # Cache HIT

    # 2. Not in cache? Fetch from DB
    user = database.get_user(user_id)  # Cache MISS

    # 3. Store in cache for next time
    cache.set(f"user:{user_id}", user, ttl=3600)

    return user

Used by: Netflix, Spotify, Twitter, and most web applications


6. Read-Through Cache - Delegation to Cache

The cache itself automatically manages database reads (transparent to the application).

# Application simply asks the cache
user = cache.get("user:123")
# Cache automatically fetches from DB if needed

Used by: Facebook (evolution of their architecture)
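A sketch of the idea: the cache is constructed with a loader function and performs the database fetch itself, so callers only ever talk to the cache. The `load_from_db` helper and dict-backed "database" here are purely illustrative:

```python
class ReadThroughCache:
    """Sketch of a read-through cache: the cache owns the DB lookup,
    so the application never queries the database directly."""

    def __init__(self, loader):
        self._loader = loader   # e.g. a function that queries the database
        self._store = {}

    def get(self, key):
        if key not in self._store:
            # Cache miss: the cache itself fetches from the backing store
            self._store[key] = self._loader(key)
        return self._store[key]

# Hypothetical backing "database"
db = {"user:123": {"name": "Ada"}}
calls = []

def load_from_db(key):
    calls.append(key)           # track how often the DB is actually hit
    return db[key]

cache = ReadThroughCache(load_from_db)
cache.get("user:123")           # miss: the cache goes to the DB
cache.get("user:123")           # hit: served from cache
print(len(calls))               # 1
```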


Comparison Table

| Strategy | Pros | Cons | Use Case |
| --- | --- | --- | --- |
| TTL | Simple, predictable | May serve stale data | Weather, news |
| LRU | Adapts to temporal patterns | May evict important data | Sessions, navigation |
| LFU | Keeps popular data | More complex to implement | Best-sellers |
| Write-Through | Guaranteed consistency | Write latency | Banking, critical data |
| Write-Behind | Very fast | Risk of loss | Logs, analytics |
| Cache-Aside | Flexible, full control | App manages logic | Most cases |
| Read-Through | Transparent to app | Requires middleware | Complex systems |

How Giants Use Caching

Netflix - EVCache: Billions of Requests/Second

Infrastructure:

  • Distributed cache based on Memcached
  • Combined strategies: TTL + LRU + Cache-Aside
  • Geographic replication across 4 global regions
  • Some clusters with 2 copies, others with 9 (depending on criticality)

Verified performance:

  • Handles billions of requests per second
  • Cache warming: reduced 45 GB/s → 100 MB/s network traffic

Multi-tier architecture:

L1: Local memory cache (client-side LRU)
    ↓
L2: EVCache distributed (TTL)
    ↓
L3: Multi-zone replication
    ↓
Database

Key lesson: Netflix pre-calculates and pre-loads cache before putting servers in production (cache warming).


Facebook/Meta - TAO: 1 Billion Reads/Second

Architectural evolution:

  1. Phase 1: Memcache + MySQL (look-aside caching, i.e. Cache-Aside)
  2. Phase 2: TAO (The Associations and Objects) - abstraction layer
  3. Current strategy: Write-Through (synchronous cache + DB writes)

Verified performance:

  • 96.4% hit rate on reads
  • Over 1 billion read requests/second
  • Millions of writes/second

Technical innovation: "Leases"
To avoid the thundering herd problem (massive rush when cache expires):

  • Only one request can hit the database every 10 seconds per key
  • Other requests wait or retrieve the freshly calculated value

Concrete result: reduction from 17,000 req/s → 1,300 req/s to database during peaks.


Twitter/X - Manhattan + Redis: Consistency at Scale

Infrastructure:

  • Manhattan (distributed key-value store)
  • Redis (Haplo) as primary cache for Timeline
  • Strategy: Cache-Aside + eventual consistency by default

Verified performance:

  • 320 million packets/second
  • 120 GB/s network throughput
  • Tens of millions of read QPS
  • Cache represents only 3% of infrastructure but is critical

Particularity: strong consistency option available via consensus for critical data.


Google - Bigtable + Spanner: Multi-Tier Cache

Sophisticated architecture:

L1: Row cache (in-memory) → Reduces CPU by 25%
    ↓
L2: Block cache (local SSD)
    ↓
L3: Colossus Flash Cache (datacenter)
    ↓
Persistent storage

Verified performance:

  • Bigtable: 17,000 point reads/second per node (1.7x improvement)
  • Colossus Flash Cache: over 5 billion requests/second
  • Spanner automatically caches query execution plans

Innovation: CacheSack
Intelligent admission algorithm for flash cache that optimizes total cost of ownership (TCO).


Real-World Challenges

1. The Thundering Herd

The problem:
When a popular key expires, thousands of requests simultaneously hit the database.

Cache expires at 12:00:00
    ↓
10,000 requests arrive at 12:00:01
    ↓
All go to DB simultaneously → CRASH

Facebook solution (Leases):

  • Only one request authorized every 10 seconds
  • Others wait or read the freshly calculated value

Measured result: 17,000 req/s → 1,300 req/s
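Facebook's leases live inside memcached itself, but a simplified application-side version of the same idea is a per-key lock: only one caller recomputes a missing value while concurrent callers wait and reuse the result. A sketch (the `expensive_query` helper stands in for a slow database read):

```python
import threading

class SingleFlight:
    """Simplified, app-side take on the lease idea: per key, only one
    caller recomputes a missing value; concurrent callers wait and reuse it."""

    def __init__(self):
        self._cache = {}
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key, recompute):
        if key in self._cache:
            return self._cache[key]
        with self._lock_for(key):        # only one thread per key enters here
            if key not in self._cache:   # re-check: another thread may have filled it
                self._cache[key] = recompute(key)
            return self._cache[key]

db_calls = []

def expensive_query(key):
    db_calls.append(key)                 # stands in for a slow DB read
    return f"value-for-{key}"

sf = SingleFlight()
threads = [threading.Thread(target=sf.get, args=("hot-key", expensive_query))
           for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(len(db_calls))                     # 1: a single DB hit despite 100 readers
```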


2. Cache Warming

The problem:
Starting with empty cache = terrible latency during first few minutes/hours.

Netflix solution:

  • Copy data from EBS snapshots
  • Load cache BEFORE putting servers in production
  • Avoids "warm-up" period

Measured result: 45 GB/s → 100 MB/s network traffic saved


3. Geographic Consistency

The problem:
How to synchronize caches across multiple continents?

Adopted solutions:

  • Eventual consistency by default (few seconds delay acceptable)
  • Optional strong consistency for critical data
  • Asynchronous replication between regions

Examples:

  • Spotify: EU ↔ NA replication
  • Netflix: 4 global regions
  • Facebook: global datacenters with synchronization

The Invalidation Problem

As Phil Karlton's famous quote says:

"There are only 2 hard problems in computer science: cache invalidation and naming things."

The 4 Invalidation Strategies

1. TTL (Time-To-Live)

cache.set("product:123", data, ttl=3600)  # Auto-expires
  • Simple, predictable
  • May serve stale data

2. Manual Invalidation

def update_user(user_id, new_data):
    database.update(user_id, new_data)
    cache.delete(f"user:{user_id}")  # Explicit deletion
  • Full control
  • Risk of missing some keys

3. Event-Based

# When an event occurs
event_bus.on("user_updated", lambda user_id: cache.delete(f"user:{user_id}"))
  • Automatic, decoupled
  • System complexity

4. Version Tagging

cache.set(f"user:{user_id}:v{version}", data)
# On update, increment `version`: new reads use the new key,
# and the old versioned entry simply ages out of the cache
  • No need to delete old one
  • Uses more memory

Getting Started Guide

Decision Tree: Which Strategy Should You Choose?

Are your data critical (banking, healthcare, user profiles)?
│
├─ YES → Zero data loss tolerable?
│   │
│   ├─ YES → Multi-region replication necessary?
│   │   │
│   │   ├─ YES → Write-Through + WAL (Netflix-style)
│   │   │         Example: Banking, Healthcare
│   │   │
│   │   └─ NO → Write-Through (synchronous cache + DB)
│   │             Example: E-commerce, B2B SaaS
│   │
│   └─ NO → Loss of a few seconds acceptable?
│       │
│       └─ YES → Write-Behind (asynchronous)
│                 Example: Analytics, metrics
│
└─ NO → Highly unequal popularity (few items very popular)?
    │
    ├─ YES → Cache-Aside + LFU
    │         Example: E-commerce (best-selling products)
    │
    └─ NO → Data with limited lifetime?
        │
        ├─ YES → Cache-Aside + TTL
        │         Example: Weather API, RSS feeds
        │
        └─ NO → Cache-Aside + LRU (universal default)
                  Example: Majority of web applications

Concrete use cases by company size:

| Size | Users | Recommended Stack | Example |
| --- | --- | --- | --- |
| Startup | < 100K | Cache-Aside + Redis + TTL | Blog, MVP, early-stage SaaS |
| Scale-up | 100K-1M | Cache-Aside + Redis Cluster + LRU | E-commerce, growth SaaS |
| Enterprise | 1M-10M | Write-Through + Multi-region | Fintech, Healthcare |
| Hyper-scale | 10M+ | Write-Through + WAL + Flash Cache | Netflix, Facebook |

Simple rule:

  • Don't know what to choose? → Start with Cache-Aside + TTL + LRU
  • This is what 80% of web applications use successfully

To Start: Cache-Aside + TTL

Why this choice?

  1. It's the most used pattern in the industry
  2. Used by Netflix, Spotify, Twitter, and most startups
  3. Easy to understand and implement
  4. Works for the vast majority of use cases

Universal starting pattern:

def get_data(key):
    # 1. Check cache
    data = cache.get(key)

    if data:
        return data  # Cache HIT

    # 2. Cache MISS → go to DB
    data = database.query(key)

    # 3. Store in cache
    cache.set(key, data, ttl=300)  # 5 minutes

    return data

Progressive Evolution: The Maturity Curve

Phase 1: Early Days (1-100K users)

  • Simple cache: Redis or Memcached
  • Pattern: Cache-Aside + TTL
  • Infrastructure: 1-2 cache servers

Phase 2: Growth (100K-1M users)

  • Distributed cache (Redis/Memcached cluster)
  • Monitoring: hit rate, latency
  • Add cache warming for popular data

Phase 3: Scale (1M-10M users)

  • Multi-tier architecture (memory + distributed)
  • Geographic replication
  • Anti-thundering herd system
  • Event-based invalidation

Phase 4: Hyper-scale (10M+ users)

  • Flash cache (SSD)
  • Sophisticated admission algorithms
  • Global replication
  • Strong consistency for critical data

Essential Metrics

1. Hit Rate

Hit Rate = (Cache Hits / Total Requests) × 100

Targets:

  • Excellent: >95%
  • Good: 90-95%
  • Needs improvement: <90%

Hit rate measured at Facebook: 96.4%
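The same formula in code, using Facebook's reported figure as the example:

```python
def hit_rate(hits, total_requests):
    """Hit Rate = (Cache Hits / Total Requests) x 100"""
    if total_requests == 0:
        return 0.0
    return hits * 100 / total_requests

# Facebook-scale example: 964 hits out of 1,000 requests
print(hit_rate(964, 1_000))  # 96.4
```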


2. Latency (P50, P95, P99)

P50: 50% of requests respond in less than X ms
P95: 95% of requests respond in less than Y ms
P99: 99% of requests respond in less than Z ms

Typical targets:

  • Cache hit: <1ms
  • Cache miss: <50ms (including DB)

3. Eviction Rate

How many entries per second are evicted from the cache due to lack of space?

If too high: increase cache size or optimize TTL


Part 1 Conclusion

In this first part, we covered the fundamental caching strategies used by all web giants.

You now understand:

  • The 6 basic strategies (TTL, LRU, LFU, Write-Through/Write-Behind, Cache-Aside, Read-Through)
  • How Netflix, Facebook, Google, and Twitter use caching
  • Real-world challenges (thundering herd, cache warming, consistency)
  • Where to start for your own project

In Part 2: Advanced Architectures, we'll discover:

  • Netflix's Write-Ahead Log (WAL) in detail
  • How to survive database corruption with zero downtime
  • Multi-region replication
  • Tradeoffs and lessons learned at enterprise scale

Up next: Part 2 - From Write-Behind to Write-Ahead Log: How Netflix Guarantees Zero Data Loss
