Skip the theory rabbit holes. This is the caching knowledge that shows up in system design interviews, code reviews, and the 2 AM production incidents nobody warned you about.
Table of Contents
- Why Caching — The 30-Second Version
- Where Do You Actually Cache?
- Cache-Aside — The Pattern You'll Use 80% of the Time
- Write Strategies — The Other Side of the Coin
- Eviction Policies — LRU, LFU, and When It Matters
- TTL — Getting Expiry Right
- Cache Invalidation — The Hard Problem
- Cache Stampede — The Failure Mode That Kills Systems
- What NOT to Cache
- Practical Decision Framework
- Summary Cheat Sheet
1. Why Caching
Your database query takes 50ms. Your Redis lookup takes 0.5ms. If that same data is read 10,000 times before it changes, you're doing 10,000 × 50ms of database work (500 seconds) — or you could do 1 × 50ms + 9,999 × 0.5ms (about 5 seconds). That's the entire case for caching.
Caching works because of read-heavy, write-light data patterns. Most applications read the same data far more than they change it. A product page might be read 100,000 times a day and updated once. A user profile is fetched on every page load and changed when the user updates their settings.
The fundamental tradeoff is always speed vs. freshness. A cache is a copy of data. That copy can be stale. Every caching decision is asking: how stale is acceptable, and for how long?
2. Where Do You Actually Cache?
In real applications, you'll encounter caches at multiple layers. Know them, because they interact:
In-process cache — data stored in your application's memory (a dict, a functools.lru_cache, Guava Cache in Java). Sub-millisecond. Doesn't survive restarts. Not shared across instances. Great for configuration, rarely-changing reference data, or memoization.
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_country_code(country_name: str) -> str:
    return db.query("SELECT code FROM countries WHERE name = ?", country_name)
Distributed cache — Redis or Memcached. Your whole fleet of servers shares the same cache. 0.5–5ms per call. This is what people mean when they say "add a cache layer." It's the workhorse.
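For concreteness, here is a minimal redis-py setup; the snippets in the rest of this article assume a shared client named redis like this one (the host and key are placeholders):

from redis import Redis

# One client per process; every app instance talks to the same Redis
# server, so the cache is shared across the whole fleet
redis = Redis(host="cache.internal", port=6379, decode_responses=True)

redis.setex("feature_flags", 60, '{"new_checkout": true}')  # value with a 60s TTL
print(redis.get("feature_flags"))  # any instance in the fleet sees this value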
CDN — Cloudflare, CloudFront, Fastly. Caches HTTP responses at edge servers close to users. Milliseconds instead of hundreds of milliseconds. Invaluable for static assets, public API responses, and rendered pages.
Database-level caching — Databases cache heavily on their own: PostgreSQL keeps hot pages in shared buffers, MySQL's InnoDB keeps them in its buffer pool. (They cache pages and indexes rather than whole query results; MySQL's dedicated query cache was removed in 8.0.) You get this for free and generally don't think about it directly, but it's why the second run of a cold query is faster than the first.
In a system design interview, when the interviewer asks "how do you handle scale?", the answer often involves: add Redis between your app and your database. Know when that's appropriate and when it isn't.
3. Cache-Aside
This is the pattern you will use most. Learn it cold.
The read flow:
- Check the cache
- If HIT → return the cached value
- If MISS → query the database, store the result in cache with a TTL, return it
The write flow:
- Write to the database
- Delete (invalidate) the cache entry
def get_user(user_id: int) -> User:
    cache_key = f"user:{user_id}"

    # 1. Check cache
    cached = redis.get(cache_key)
    if cached:
        return User.from_json(cached)

    # 2. Cache miss — hit the DB
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # 3. Store in cache for next time (5 minute TTL)
    redis.setex(cache_key, 300, user.to_json())
    return user

def update_user(user_id: int, data: dict):
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    redis.delete(f"user:{user_id}")  # Invalidate — don't update, just delete
Why delete instead of update on writes?
Updating the cache on a write sounds sensible but creates race conditions. Two concurrent writes can race each other, leaving the cache in the wrong state. Deleting is safe — the next read will fetch fresh data from the database and re-populate the cache correctly.
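To make the race concrete, here is one possible interleaving (names, values, and timestamps are illustrative):

# Update-on-write race: two writers updating the same user
# T1: Writer A updates DB: name = "Alice"
# T2: Writer B updates DB: name = "Alicia"    (final DB state)
# T3: Writer B sets cache: name = "Alicia"
# T4: Writer A's slower cache write lands: name = "Alice"
# Result: DB says "Alicia", cache says "Alice" until the entry expires.
# If both writers had deleted instead, the next read would fetch "Alicia".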
Why cache-aside is the safe default:
- If Redis goes down, your application still works (just slower — it falls back to the DB)
- Only data that's actually requested gets cached (no wasted memory)
- Simple to reason about and debug
The main downside: the first request after a cache miss always pays the full DB cost. For a freshly deployed service, all traffic misses until the cache warms up.
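One mitigation is cache warming: pre-populate the hottest keys at deploy time. A minimal sketch, assuming recently active users are a good proxy for hot keys (the query and key format are hypothetical):

def warm_cache():
    # Pre-load the most recently active users so the first wave of
    # traffic after a deploy mostly hits the cache
    for user in db.query("SELECT * FROM users ORDER BY last_seen DESC LIMIT 1000"):
        redis.setex(f"user:{user.id}", 300, user.to_json())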
4. Write Strategies
Cache-aside handles reads well. For writes, you have three meaningful choices depending on your consistency requirements.
Write-Through — Always Consistent, Always Slower
On every write: update the DB and update the cache in the same operation. The cache is never stale.
def update_product_price(product_id: int, new_price: float):
    db.execute("UPDATE products SET price = %s WHERE id = %s", new_price, product_id)

    # Also update cache immediately
    product = db.get_product(product_id)
    redis.setex(f"product:{product_id}", 300, product.to_json())
Use when: you have strict read-after-write requirements. A user updates their profile — they expect to immediately see the new version on their next page load.
The cost: every write is now double the latency (DB write + cache write). In write-heavy systems, this adds up.
Write-Behind (Write-Back) — Fast Writes, Risk of Data Loss
Write to the cache immediately, acknowledge success to the user, flush to the DB asynchronously in the background.
Write → Cache (instant ACK to user)
↓ (background, batched)
Database
Use when: you're writing high-frequency counters, view counts, analytics events — data where you can tolerate eventual consistency and need sub-millisecond write latency.
The real risk: if Redis crashes before flushing to the DB, that data is gone. Only use this when data loss is survivable.
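A minimal write-behind sketch for view counters, assuming one background flusher per service (key names and the interval are illustrative; the HGETALL/DELETE pair is deliberately simple and can itself drop a few increments, which only underlines the tradeoff):

import time

def record_view(post_id: int):
    # Hot path: a single cache increment, instant acknowledgement
    redis.hincrby("pending_views", str(post_id), 1)

def flush_views(interval_s: float = 5.0):
    # Background worker: periodically batch pending counts into the DB.
    # If Redis crashes before a flush, the unflushed counts are gone.
    while True:
        time.sleep(interval_s)
        pending = redis.hgetall("pending_views")
        if not pending:
            continue
        redis.delete("pending_views")
        for post_id, count in pending.items():
            db.execute(
                "UPDATE posts SET views = views + %s WHERE id = %s",
                int(count), int(post_id),
            )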
Write-Around — Skip the Cache on Write
Writes go straight to the DB. The cache is only populated on reads. Use this when data is written once and rarely (or never) re-read soon after writing.
Classic example: log entries, uploaded files, audit records. You write them, they go to storage, and they get cached only if someone actually requests them.
In Practice
Most applications use cache-aside for reads + delete-on-write for invalidation. Write-through is added for critical paths where consistency matters. Write-behind is reserved for high-frequency write scenarios like analytics. You're rarely choosing just one pattern — you're mixing them based on data type.
5. Eviction Policies
When your cache is full and a new entry needs to go in, something has to be evicted. Redis gives you several policies. Here's what actually matters:
LRU (Least Recently Used): evict the entry that hasn't been accessed for the longest time. The default and the right choice for most applications. Rooted in a simple truth: if you haven't touched it recently, you probably don't need it.
LFU (Least Frequently Used): evict the entry with the fewest total accesses. Better when your popular data is stable and long-lived (a top-10 product list that's been hot for months). Worse for new content — it starts with zero frequency and gets evicted immediately even if it's about to go viral.
FIFO: evict the oldest inserted entry. Ignores access patterns. Rarely the right choice.
No eviction: Redis returns errors when full instead of evicting. Useful when your cache is a source of truth and data loss is unacceptable.
Redis Configuration
# In redis.conf or via CONFIG SET:
# Evict the least recently used keys across all keys
maxmemory-policy allkeys-lru
# Evict the least frequently used keys across all keys
maxmemory-policy allkeys-lfu
# Only evict keys that have a TTL set (leave persistent keys alone)
maxmemory-policy volatile-lru
# Set max memory (Redis won't use more than this)
maxmemory 2gb
Interview answer: "I'd use allkeys-lru for a general-purpose cache and allkeys-lfu if the dataset has clear long-term hot items like a top products list."
6. TTL
TTL (Time-To-Live) is how long a cache entry lives before Redis automatically deletes it. Getting this right matters more than most engineers think.
Too short: high miss rate, unnecessary database pressure, cache barely helps.
Too long: users see stale data, which is a product problem.
Practical TTL by Data Type
# Configuration / reference data — changes almost never
redis.setex("country_codes", 86400, data) # 24 hours
# User profiles — changes occasionally
redis.setex(f"user:{id}", 300, data) # 5 minutes
# Product catalog — changes regularly
redis.setex(f"product:{id}", 60, data) # 1 minute
# Live inventory / pricing — changes frequently
redis.setex(f"inventory:{id}", 10, data) # 10 seconds
# Trending content — freshness matters
redis.setex("trending_posts", 30, data) # 30 seconds
# Sessions — live as long as the session
redis.setex(f"session:{token}", 3600, data) # 1 hour
The Jitter Rule
If you deploy a service and populate 10,000 cache entries all with TTL=300, they will all expire at exactly T+300 simultaneously. Every single one misses at once. This is a thundering herd problem — more on that in the next section.
Fix it by adding randomness to your TTL:
import random

def cache_set(key: str, value, base_ttl: int):
    # Add ±10% random jitter
    jitter = random.randint(-base_ttl // 10, base_ttl // 10)
    redis.setex(key, base_ttl + jitter, value)
This one habit prevents an entire class of production incidents.
7. Cache Invalidation
Phil Karlton famously said: "There are only two hard things in computer science: cache invalidation and naming things."
It's hard for a real reason: the data in your cache can go out of sync with your database, and getting it back in sync across distributed systems without race conditions is genuinely tricky.
Strategy 1: Let TTL Handle It
The simplest approach. Set a TTL, serve slightly stale data, and accept that users might see data that's up to TTL seconds old.
Works for: product descriptions, blog posts, user profiles, catalog data — anything where "slightly stale" is a product-acceptable answer.
# Just use TTL and don't overthink it
redis.setex(f"product:{product_id}", 60, product.to_json())
# Worst case: 60 seconds of stale data
This is underused. Many engineers reach for complex invalidation schemes when a TTL is perfectly fine.
Strategy 2: Explicit Delete on Write
When data changes, delete the cache entry immediately. The next read re-populates it fresh.
def update_product(product_id: int, data: dict):
    db.update(product_id, data)
    redis.delete(f"product:{product_id}")  # Done — next read will be fresh
Works for: any data where you own both the write path and the cache. Simple and reliable when the same service writes and caches.
The problem it doesn't solve: in microservices, Service A writes the data and Service B has it cached. Service B doesn't know about Service A's write.
Strategy 3: Event-Driven Invalidation
Service A writes to the DB, publishes an event. Service B subscribes and invalidates its cache.
# Service A (Order service) — after a status change:
def update_order_status(order_id: int, new_status: str):
    db.update_order(order_id, new_status)
    kafka.publish("order.updated", {"order_id": order_id, "status": new_status})

# Service B (Dashboard service) — subscribes to events:
@kafka.subscribe("order.updated")
def handle_order_updated(event):
    redis.delete(f"order:{event['order_id']}")
Works for: microservice architectures where multiple services cache the same data. More complex, but necessary at scale.
One important nuance: after deleting from cache, don't immediately re-populate it. Let the next natural read repopulate it. If you delete and re-populate immediately inside the event handler, you might re-populate with the old data if the DB replica hasn't caught up yet.
The Race Condition to Know
This comes up in interviews. The "double write" race:
Time 1: Request A reads product:123 → cache miss → queries DB → gets price $10
Time 2: Request B updates price to $20, deletes cache:product:123
Time 3: Request A writes the old $10 value back to cache
Cache now has stale $10 data
The standard fix: set a short TTL even when doing explicit invalidation. The window for stale data is bounded by the TTL, not infinite.
8. Cache Stampede
This is the failure mode that takes down production systems. You need to understand it, and you need to know one way to prevent it.
What Happens
A popular cache key expires. Instead of one request rebuilding it, thousands of simultaneous requests all see a cache miss and all independently query the database at the same time. Your database, which was handling 100 queries/second because the cache was absorbing the load, suddenly receives 5,000 simultaneous identical queries.
Normal: 10,000 req/s → cache (99% hit) → ~100 req/s to DB ✅
Stampede: popular key expires → all 10,000 req/s hit DB at once 💥
This is a real outage pattern, not a theoretical one. Picture a retailer's recommendation cache expiring in the middle of Black Friday: tens of thousands of concurrent requests slam the database, and checkout pages time out during the highest-traffic hours of the year.
Prevention 1: TTL Jitter (Mandatory)
Covered in the TTL section — always add randomness. This prevents mass synchronized expiry but doesn't protect a single hot key from expiring under high traffic.
Prevention 2: Distributed Lock (The Reliable Fix)
Only let one request rebuild the cache. Everyone else either waits or gets served slightly stale data.
import json
import time

def get_trending_products():
    cache_key = "trending:products"
    lock_key = "lock:trending:products"

    # Fast path — cache hit
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)

    # Cache miss — try to acquire a rebuild lock
    acquired = redis.set(lock_key, "1", nx=True, ex=10)  # 10s lock TTL
    if acquired:
        try:
            # This process rebuilds the cache
            products = db.get_trending_products()
            redis.setex(cache_key, 300, json.dumps(products))
            return products
        finally:
            redis.delete(lock_key)
    else:
        # Another process is rebuilding — wait briefly and try cache again
        time.sleep(0.1)
        cached = redis.get(cache_key)
        return json.loads(cached) if cached else db.get_trending_products()
Key rules for the lock:
- Lock TTL must exceed the maximum time the DB query takes (if it takes 2s, your lock can't be 1s)
- Use atomic SET key value NX EX ttl — never two separate commands
- Always release the lock in a finally block
Prevention 3: Stale-While-Revalidate
Serve the expired (stale) data immediately, trigger a background refresh. Users never see a slow response.
import json
import threading
import time

def _store(key: str, value, ttl: int):
    # Hard TTL = soft TTL + a 60s grace window for serving stale data
    redis.setex(key, ttl + 60, json.dumps({
        "value": value,
        "refresh_at": time.time() + ttl,  # soft TTL
    }))

def _refresh(key: str, fetch_fn, ttl: int):
    _store(key, fetch_fn(), ttl)

def get_with_stale(key: str, fetch_fn, ttl: int):
    raw = redis.get(key)
    if raw:
        payload = json.loads(raw)
        # Soft TTL expired — refresh in background but still return stale data.
        # (Concurrent requests may each spawn a refresh; the lock from the
        # previous section dedupes that if it matters.)
        if time.time() > payload["refresh_at"]:
            threading.Thread(target=_refresh, args=(key, fetch_fn, ttl)).start()
        return payload["value"]

    # True miss — must fetch synchronously
    value = fetch_fn()
    _store(key, value, ttl)
    return value
This is also built into HTTP via Cache-Control: stale-while-revalidate=N — CDNs and browsers serve the stale response for up to N seconds while refreshing in the background.
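A hedged example of setting that header from a Flask-style handler (the framework choice and directive values are illustrative; the header itself is standard HTTP):

from flask import Flask, make_response

app = Flask(__name__)

@app.route("/trending")
def trending():
    resp = make_response(render_trending())  # render_trending is a hypothetical view
    # Serve a cached copy for 300s; after that, serve it stale for up to
    # 60 more seconds while the CDN/browser revalidates in the background
    resp.headers["Cache-Control"] = "public, max-age=300, stale-while-revalidate=60"
    return resp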
Interview answer for "how do you handle cache stampede?": "I'd add TTL jitter to prevent synchronized expiry, and for high-traffic keys I'd use a distributed lock so only one process rebuilds the cache while others serve stale data or wait briefly."
9. What NOT to Cache
Knowing what to skip is as important as knowing what to add.
Unique or rarely repeated queries. If every user has a unique, personalized query that's never repeated, you'll have a 0% hit rate. You're paying cache overhead for no benefit. Search queries are a classic trap — "nike red shoes size 10" is unlikely to be queried identically again anytime soon.
Highly mutable data. If a value changes every few seconds, the TTL has to be so short that it barely helps. Invalidating it on every write negates the performance gain.
Large objects that take up disproportionate cache memory. Caching a 5MB JSON blob means that single entry uses the memory of 5,000 small entries. Cache the small things — IDs, scalars, lightweight structs — not entire serialized graphs.
User-specific data at scale. Caching each user's dashboard state for 50 million users means 50 million cache entries, most of which will be evicted before anyone reads them again. Profile data for frequently active users: cache it. Personalized dashboard content for casual users: recalculate on demand.
Data with complex invalidation dependencies. If invalidating one piece of data requires you to figure out and invalidate 20 related cache keys, you'll eventually miss one. The cache becomes inconsistently stale in ways that are very hard to debug. Sometimes the right answer is: don't cache this, compute it fresh.
Sensitive data without careful thought. PII, payment info, auth tokens — caching these in a shared Redis instance requires careful access control and encryption. The risk surface is real.
10. Practical Decision Framework
Before adding a cache, run through this:
1. Is the data read frequently and written infrequently?
→ No: caching probably won't help. Don't bother.
→ Yes: continue.
2. Is the data expensive to fetch or compute?
→ No (< ~5ms DB query, simple lookup): don't cache.
→ Yes (complex joins, external API calls, heavy computation): good candidate.
3. How stale can this data be?
→ Never stale (balances, inventory): use write-through or invalidate on every write.
→ Slightly stale OK (profiles, catalog): cache-aside + TTL (seconds to minutes).
→ Very stale OK (blog content, reference data): long TTL (hours to days).
4. Who writes the data?
→ Same service: delete from cache in the write path. Simple.
→ Different service: event-driven invalidation via Pub/Sub or message queue.
→ Multiple writers: use a short TTL as a safety net. Accept bounded staleness.
5. What's the failure mode if cache goes down?
→ Application falls back to DB (slower but works): cache-aside, safe to proceed.
→ Application dies: reconsider the architecture before adding more caching.
6. What's the expected hit rate?
→ < 50%: the cache is barely helping. Rethink the key design or don't cache.
→ > 80%: worth it.
→ > 95%: excellent. (Redis reports hits and misses directly; see the snippet below.)
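For that last question you don't have to guess: Redis tracks keyspace hits and misses in its INFO stats, so you can read the hit rate straight off the server (using the same shared redis client as above):

stats = redis.info("stats")  # keyspace_hits / keyspace_misses are standard INFO fields
hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
total = hits + misses
print(f"hit rate: {hits / total:.1%}" if total else "no lookups recorded yet")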
11. Summary Cheat Sheet
Caching Patterns at a Glance
| Pattern | Reads from | Writes to | Consistency | Cache fails → | Use when |
|---|---|---|---|---|---|
| Cache-Aside | Cache, then DB | DB (delete cache) | Eventual | App still works | Default choice — most scenarios |
| Write-Through | Cache | Cache + DB (sync) | Strong | App still works | Read-after-write consistency required |
| Write-Behind | Cache | Cache → DB (async) | Eventual | Data loss risk | High write throughput, counters |
| Write-Around | Cache, then DB | DB only | Eventual | App still works | Write-once, rarely re-read data |
Eviction Policies at a Glance
| Policy | Evicts | Best For |
|---|---|---|
| `allkeys-lru` | Least recently used | General-purpose cache (safe default) |
| `allkeys-lfu` | Least frequently used | Stable hot datasets (top products, popular content) |
| `volatile-lru` | LRU among TTL-bearing keys | When some keys must never be evicted |
| `noeviction` | Nothing (returns error when full) | When data loss is unacceptable |
TTL Quick Reference
| Data Type | Suggested TTL |
|---|---|
| Static config / country codes | 24 hours |
| User profiles | 5 minutes |
| Product catalog | 1–5 minutes |
| Session tokens | Matches session lifetime |
| Trending / real-time feeds | 30–60 seconds |
| Live pricing / inventory | 5–15 seconds |
| Auth rate limit counters | 1–60 seconds (by design) |
Invalidation Strategies at a Glance
| Strategy | When to Use | Tradeoff |
|---|---|---|
| TTL only | Staleness is acceptable | Simplest; stale window = TTL |
| Delete on write | You own both read and write paths | Fresh on next read (bound rare races with a TTL); extra step on writes |
| Event-driven | Multiple services cache the same data | Decoupled; slight propagation delay |
Stampede Prevention
| Technique | Best For |
|---|---|
| TTL jitter | Preventing synchronized mass expiry (always do this) |
| Distributed lock | Single hot key under high concurrent traffic |
| Stale-while-revalidate | When serving slightly stale data during refresh is acceptable |
| Cache warming | Preventing cold cache after deploy or restart |
Closing Thoughts
Caching is one of the highest-ROI changes you can make to a system — but only when applied to the right problems. A well-placed Redis cache in front of an expensive query can slash p99 latency by 100x. A poorly placed cache adds complexity, stale data bugs, and operational overhead for no gain.
The rules to commit to memory:
- Cache-aside is the default. Don't start with anything else unless you have a specific reason.
- Delete, don't update, on writes. Prevents race conditions.
- Always add TTL jitter. It's two lines of code that prevent a class of production incidents.
- Know the stampede problem. It's the single most common cache-related outage pattern and it always hits at the worst moment (high traffic).
- Measure hit rate. If it's below 80%, the cache is doing less than you think.
- Design for cache failure. The system should degrade gracefully, not collapse, when Redis goes down.
Tags: #database #systemdesign #backend #performance #redis