Rajkiran

Posted on May 31

System Design - 8. Caching, Caching Explained: The Single Biggest Performance Win in Any System

#architecture #backend #performance #systemdesign

The Two Hard Problems in Computer Science
Phil Karlton famously said:

"There are only two hard things in Computer Science: cache invalidation and naming things."

He said this decades ago. It's still true. Because caching is deceptively simple — until it isn't.

At its core, caching is just: *store the result of an expensive operation so you don't have to repeat it.
*
But in production, at scale, caching becomes a discipline of its own. Get it wrong and you get stale data, cache stampedes, poisoned caches, or a system that's actually slower because of the cache overhead.

Get it right? Your p99 latency drops from 400ms to 4ms. Your database load drops by 80%. Your infrastructure bill halves.
Let's get it right.

Why Caching Works: The 80/20 Rule of Data Access
In almost every system, data access follows a *power law distribution:
*

20% of your data gets 80% of the reads
The most popular tweet gets read millions of times per hour
The most popular product page gets visited thousands of times per minute
Most database rows are rarely read; a few are read constantly

This means: if you cache that hot 20% in memory, you can serve 80% of your traffic from the cache — without touching the database at all.
Without cache:
100,000 reads → 100,000 DB queries → DB overwhelmed

With cache (20% hot data cached):
100,000 reads → 80,000 from cache (0.2ms) + 20,000 DB queries
DB load reduced by 80%. Response time improved by 100x for cache hits.
That's the power of caching. Now let's learn the four strategies.

The 4 Cache Strategies

Strategy 1: Cache-Aside (Lazy Loading)
The most common pattern. The application manages the cache explicitly.

Read flow:

App checks cache for key
Cache HIT → return cached value (fast path)
Cache MISS → query database → store result in cache → return value

Write flow:

1. App writes to database
2. App invalidates (deletes) the cache entry (next read will be a miss and repopulate from DB)

Pseudocode:
pythondef get_user(user_id):
# 1. Check cache
cached = redis.get(f"user:{user_id}")
if cached:
return json.loads(cached) # Cache hit — fast path

# 2. Cache miss — go to database
user = db.query("SELECT * FROM users WHERE id = ?", user_id)

# 3. Store in cache with TTL
redis.setex(f"user:{user_id}", 3600, json.dumps(user))  # TTL: 1 hour

return user

def update_user(user_id, data):
db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)
redis.delete(f"user:{user_id}") # Invalidate cache

Advantages:

Cache only contains data that's actually requested (no wasted memory)
Resilient to cache failures (app falls back to DB seamlessly)
Works well for read-heavy workloads

Disadvantages:

First request after a miss is slow (cold start)
Cache and database can be briefly inconsistent (the window between write and invalidation)
Race condition: two simultaneous misses can both query DB and both write to cache

Best for: General-purpose read caching. User profiles, product details, configuration data.

Strategy 2: Write-Through
Every write goes to the cache AND the database simultaneously. Cache is always in sync.
Write path:
App → [Cache + Database simultaneously] → Confirm write

Read path:
App → Cache → always hits (cache is always populated)

Pseudocode:
pythondef update_user(user_id, data):
# Write to BOTH simultaneously
db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)
redis.set(f"user:{user_id}", json.dumps(data)) # Always in sync

Advantages:

Cache is never stale — writes are immediately reflected
Read performance is always good (cache is fully populated)

Disadvantages:

Write latency increases (must write to both cache and DB)
Cache fills with data that might never be read again (wasted memory)
If cache fails during write, you have a consistency problem

Best for: Systems where stale reads are completely unacceptable. Financial dashboards, real-time inventory counts.

**Strategy 3: Write-Behind (Write-Back)
**The application writes to cache only. The cache asynchronously flushes to the database later.
Write path:
App → Cache (immediate) → [async, delayed] → Database

Read path:
App → Cache → fast reads

Advantages:
Blazing fast writes (just a cache write, no DB round-trip)
DB is protected from write spikes (cache absorbs bursts, flushes in batches)
Dramatically higher write throughput

Disadvantages:

Data loss risk: if cache fails before flushing to DB, writes are lost
More complex implementation
Not suitable when write durability is required

Best for: High-velocity writes where some data loss is tolerable. Analytics event counters, game scores, view counts. YouTube's view counter uses this pattern — it increments a Redis counter and periodically syncs to the database.

Strategy 4: Read-Through
The cache sits directly in front of the database. The application only talks to the cache, never the database directly.
App → Cache → (on miss) → Database
↑
Cache manages its own population

Difference from Cache-Aside: In Cache-Aside, the application handles miss logic. In Read-Through, the cache layer handles it — the app just asks the cache and never knows whether it was a hit or miss.

Advantages:

Simpler application code (no miss-handling logic in app)
Consistent cache population logic

Disadvantages:

Cold start problem (first request for any key is always slow)
Adds an abstraction layer that can obscure behavior

Best for: Systems where you want to keep cache logic out of your application code. Works well with managed cache services.

Cache Eviction Policies: What Gets Thrown Out?
When your cache is full, something must be evicted to make room for new data. The policy determines what.

LRU — Least Recently Used
Evicts the item that was accessed longest ago. The assumption: recently accessed data is more likely to be accessed again.
Cache: [A(10min ago), B(5min ago), C(1min ago), D(30sec ago)]
New item arrives, cache is full → evict A (least recently used)

Most common policy. Redis uses LRU by default. Works well for general-purpose caching.

LFU — Least Frequently Used
Evicts the item that has been accessed the fewest total times.
Cache: [A(1 access), B(50 accesses), C(200 accesses), D(3 accesses)]
New item arrives → evict A (least frequently used)
Better for long-running caches where access frequency matters more than recency. A product page viewed 200 times should stay in cache even if it wasn't accessed in the last hour.

TTL — Time to Live
Items expire automatically after a set duration.
redis.setex("user:1234", 3600, data) # Expires in 3600 seconds (1 hour)

Not a replacement for LRU/LFU — TTL is used alongside them. TTL ensures stale data doesn't live in cache forever, even if it's "popular."

Rule of thumb: Set TTL based on how often the underlying data changes. User profiles (change rarely) → 1 hour TTL. Stock prices (change every second) → 1-2 second TTL or no caching.

FIFO — First In, First Out
Evicts the oldest-added item, regardless of how recently or frequently it was accessed.
Rarely used in practice — LRU almost always performs better.

Cache Stampede: The Hidden Failure Mode
This is one of the most dangerous cache failure patterns, and most engineers don't see it coming.

The scenario:

A popular cache key expires (TTL reached)
10,000 concurrent requests arrive at exactly that moment
ALL 10,000 see a cache miss simultaneously
ALL 10,000 query the database
Database gets 10,000 simultaneous queries it cannot handle
Database crashes
Every subsequent request also misses → cascade continues

This is the thundering herd problem — a single expiry event stampedes your database.

Solution 1: Probabilistic Early Expiration
Instead of expiring exactly at TTL, expire a little before TTL with increasing probability as TTL approaches:
pythonimport random
import math

def get_with_early_expiry(key, ttl, beta=1.0):
cached = redis.get(key)
remaining_ttl = redis.ttl(key)

# Recompute early with increasing probability as TTL shrinks
if cached and remaining_ttl - beta * math.log(random.random()) < 0:
    # Time to refresh proactively (while old value still serves requests)
    refresh_cache_async(key)

return cached or fetch_from_db_and_cache(key, ttl)

Solution 2: Cache Locking (Mutex)
Only one process fetches from DB on a miss. Others wait.
pythondef get_with_lock(key):
cached = redis.get(key)
if cached:
return cached

lock_key = f"lock:{key}"
if redis.set(lock_key, "1", nx=True, ex=5):  # Acquire lock
    try:
        data = db.fetch(key)
        redis.setex(key, 3600, data)
        return data
    finally:
        redis.delete(lock_key)
else:
    # Lock held by someone else — wait and retry
    time.sleep(0.1)
    return get_with_lock(key)  # Retry (hopefully cache is warm now)

Solution 3: Staggered TTLs
Add random jitter to TTL so keys don't all expire simultaneously:
pythonimport random
base_ttl = 3600
jitter = random.randint(0, 600) # 0-10 minutes of jitter
redis.setex(key, base_ttl + jitter, data)

Cache Poisoning and Cache Miss Storm
Cache poisoning: Invalid, malicious, or corrupted data gets stored in the cache and is served to all subsequent requests.

Prevention:

Validate data before caching it
Use cache namespacing to isolate different data types
Implement cache versioning (if schema changes, old cached data is ignored)

Cache miss storm (different from stampede): An attacker sends requests for millions of unique keys that don't exist in cache — forcing every request to hit the database.
Prevention:

Cache negative results too: redis.setex("user:99999999", 60, "NOT_FOUND")
Rate limit requests per IP
Use Bloom filters to check if a key exists before querying the database (Day 8 topic)

Redis vs Memcached: The Right Choice

Both Redis and Memcached are popular in-memory key-value stores used to improve application performance by reducing database load and serving frequently accessed data at lightning speed.

While most modern applications choose Redis, Memcached still has its place in certain scenarios. Let's compare them feature by feature.

1. Data Structures

One of Redis's biggest strengths is its support for multiple data structures. In addition to simple strings, Redis provides Lists, Sets, Sorted Sets, Hashes, Streams, Bitmaps, and HyperLogLog.

Memcached, on the other hand, only supports simple string-based key-value pairs.

If your application needs more than basic caching, Redis offers significantly greater flexibility.

2. Persistence

Redis can persist data to disk using RDB snapshots and AOF (Append Only File) logs. This means data can survive server restarts and failures.

Memcached stores everything purely in memory. If the server restarts, all cached data is lost.

For applications requiring data durability or recovery, Redis is the clear winner.

3. Replication and High Availability

Redis provides built-in replication features through Redis Replicas, Redis Sentinel, and Redis Cluster.

These features help applications achieve high availability, fault tolerance, and horizontal scaling.

Memcached offers limited support in this area and typically relies on external solutions for replication and failover.

**4. Publish/Subscribe Messaging

**
Redis includes a built-in Publish/Subscribe (Pub/Sub) system, allowing applications to exchange real-time messages between producers and consumers.

This makes Redis useful beyond caching, enabling chat systems, notifications, event-driven architectures, and real-time updates.

Memcached does not provide Pub/Sub capabilities.

5. Lua Scripting

Redis supports Lua scripting, allowing developers to execute complex operations atomically on the server side.

This reduces network overhead and ensures consistency for multi-step operations.

Memcached does not support server-side scripting.

6. Memory Efficiency

Both systems are memory efficient, but Memcached is often slightly more efficient when storing large volumes of simple string data.

Since Memcached focuses solely on caching, it has lower overhead and can achieve marginally better memory utilization in specific use cases.

Redis trades a small amount of memory efficiency for additional features and flexibility.

7. Multi-Threading

Traditionally, Redis used a single-threaded execution model. Starting with Redis 6.0, it introduced I/O threading to improve network performance while maintaining its simple architecture.

Memcached is fully multi-threaded and can utilize multiple CPU cores effectively.

For extremely high-throughput workloads involving simple key-value operations, Memcached may have a performance advantage.

When Should You Choose Redis?

Choose Redis when you need:

Advanced data structures
Persistence and durability
Replication and high availability
Publish/Subscribe messaging
Lua scripting
Distributed caching capabilities
Real-time applications

When Should You Choose Memcached?

Choose Memcached when:

You only need simple key-value caching
Maximum memory efficiency is important
Data persistence is not required
Simplicity and raw cache performance are the primary goals

Final Verdict

For most modern applications, Redis is the preferred choice because it provides much more than caching. Its rich data structures, persistence options, replication support, and messaging capabilities make it a versatile platform for building scalable systems.

Memcached remains a solid option for lightweight, high-performance caching workloads where simplicity and memory efficiency matter most.

Choose Redis when:

You need anything beyond simple string storage (sorted sets for leaderboards, pub/sub for notifications, lists for queues)
You need persistence (data survives cache restarts)
You want a cache + simple message broker in one

Choose Memcached when:

You need pure, maximum-throughput string caching with multi-core scaling
Simplicity matters and you'll never need persistence or complex data types
Running on very memory-constrained servers

In practice: >90% of new systems choose Redis. The versatility is worth it.

Real Production Pattern: Twitter's Timeline Cache
Twitter pre-computes and caches the home timeline for every active user.

Without caching:
User opens Twitter feed
→ Query: find all accounts this user follows (500 follows)
→ For each: fetch their last 20 tweets
→ Merge, sort, deduplicate 10,000 tweets
→ Return top 20
Time: 300-500ms per request

With caching (fan-out on write):
When @elonmusk posts a tweet:
→ Find all his 100M followers
→ For each active follower: push tweet ID to their timeline cache
→ Cache key: "timeline:user_id" → [tweet_id_1, tweet_id_2, ...]

User opens feed:
→ Read pre-computed timeline from Redis: 2ms
→ Return top 20 tweet IDs → fetch actual content in parallel
Time: 2-10ms per request

The cache does the aggregation work at write time so reads are instant. This is the core architectural insight behind every social media feed system — and it only works because of caching.

Key Takeaways

Caching works because 20% of data receives 80% of reads — cache that 20%.
4 strategies: Cache-Aside (lazy, most common), Write-Through (always consistent), Write-Behind (fastest writes), Read-Through (clean app code).
3 eviction policies: LRU (recency), LFU (frequency), TTL (time-based). Use LRU + TTL together in most cases.
Cache stampede is deadly — prevent it with probabilistic expiry, mutex locks, or TTL jitter.
Cache negative results to prevent miss storms from attackers.
Redis wins for almost every use case — persistence, data structures, pub/sub, and replication in one package.
"Cache invalidation is hard" isn't a meme — it's a real engineering challenge. Always ask: "when does this cached value become wrong, and how do I know?"

DEV Community

System Design - 8. Caching, Caching Explained: The Single Biggest Performance Win in Any System

1. Data Structures

2. Persistence

3. Replication and High Availability

**4. Publish/Subscribe Messaging

5. Lua Scripting

6. Memory Efficiency

7. Multi-Threading

When Should You Choose Redis?

When Should You Choose Memcached?

Final Verdict

Top comments (0)