Skip the theory rabbit holes. This is the caching knowledge that shows up in system design interviews, code reviews, and the 2 AM production incidents nobody warned you about.
Table of Contents
- Why Caching — The 30-Second Version
- Where Do You Actually Cache?
- Cache-Aside — The Pattern You'll Use 80% of the Time
- Write Strategies — The Other Side of the Coin
- Eviction Policies — LRU, LFU, and When It Matters
- TTL — Getting Expiry Right
- Cache Invalidation — The Hard Problem
- Cache Stampede — The Failure Mode That Kills Systems
- What NOT to Cache
- Practical Decision Framework
- Summary Cheat Sheet
1. Why Caching
Your database query takes 50ms. Your Redis lookup takes 0.5ms. If that same data is read 10,000 times before it changes, you're doing 10,000 × 50ms of database work (500 seconds) — or you could do 1 × 50ms + 9,999 × 0.5ms (about 5 seconds). That's the entire case for caching.
Caching works because of read-heavy, write-light data patterns. Most applications read the same data far more than they change it. A product page might be read 100,000 times a day and updated once. A user profile is fetched on every page load and changed when the user updates their settings.
The fundamental tradeoff is always speed vs. freshness. A cache is a copy of data. That copy can be stale. Every caching decision is asking: how stale is acceptable, and for how long?
2. Where Do You Actually Cache?
In real applications, you'll encounter caches at multiple layers. Know them, because they interact:
In-process cache — data stored in your application's memory (a dict, a functools.lru_cache, Guava Cache in Java). Sub-millisecond. Doesn't survive restarts. Not shared across instances. Great for configuration, rarely-changing reference data, or memoization.
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_country_code(country_name: str) -> str:
    return db.query("SELECT code FROM countries WHERE name = ?", country_name)
Distributed cache — Redis or Memcached. Your whole fleet of servers shares the same cache. 0.5–5ms per call. This is what people mean when they say "add a cache layer." It's the workhorse.
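For concreteness, here is a minimal redis-py setup; the snippets in the rest of this article assume a shared client named redis like this one (the host and key are placeholders):

from redis import Redis

# One client per process; every app instance talks to the same Redis
# server, so the cache is shared across the whole fleet
redis = Redis(host="cache.internal", port=6379, decode_responses=True)

redis.setex("feature_flags", 60, '{"new_checkout": true}')  # value with a 60s TTL
print(redis.get("feature_flags"))  # any instance in the fleet sees this value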
CDN — Cloudflare, CloudFront, Fastly. Caches HTTP responses at edge servers close to users. Milliseconds instead of hundreds of milliseconds. Invaluable for static assets, public API responses, and rendered pages.
Database-level caching — Databases cache heavily on their own: PostgreSQL keeps hot pages in shared buffers, MySQL's InnoDB keeps them in its buffer pool. (They cache pages and indexes rather than whole query results; MySQL's dedicated query cache was removed in 8.0.) You get this for free and generally don't think about it directly, but it's why the second run of a cold query is faster than the first.
In a system design interview, when the interviewer asks "how do you handle scale?", the answer often involves: add Redis between your app and your database. Know when that's appropriate and when it isn't.
3. Cache-Aside
This is the pattern you will use most. Learn it cold.
The read flow:
- Check the cache
- If HIT → return the cached value
- If MISS → query the database, store the result in cache with a TTL, return it
The write flow:
- Write to the database
- Delete (invalidate) the cache entry
def get_user(user_id: int) -> User:
    cache_key = f"user:{user_id}"

    # 1. Check cache
    cached = redis.get(cache_key)
    if cached:
        return User.from_json(cached)

    # 2. Cache miss — hit the DB
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # 3. Store in cache for next time (5 minute TTL)
    redis.setex(cache_key, 300, user.to_json())
    return user

def update_user(user_id: int, data: dict):
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    redis.delete(f"user:{user_id}")  # Invalidate — don't update, just delete
Why delete instead of update on writes?
Updating the cache on a write sounds sensible but creates race conditions. Two concurrent writes can race each other, leaving the cache in the wrong state. Deleting is safe — the next read will fetch fresh data from the database and re-populate the cache correctly.
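To make the race concrete, here is one possible interleaving (names, values, and timestamps are illustrative):

# Update-on-write race: two writers updating the same user
# T1: Writer A updates DB: name = "Alice"
# T2: Writer B updates DB: name = "Alicia"    (final DB state)
# T3: Writer B sets cache: name = "Alicia"
# T4: Writer A's slower cache write lands: name = "Alice"
# Result: DB says "Alicia", cache says "Alice" until the entry expires.
# If both writers had deleted instead, the next read would fetch "Alicia".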
Why cache-aside is the safe default:
- If Redis goes down, your application still works (just slower — it falls back to the DB)
- Only data that's actually requested gets cached (no wasted memory)
- Simple to reason about and debug
The main downside: the first request after a cache miss always pays the full DB cost. For a freshly deployed service, all traffic misses until the cache warms up.
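One mitigation is cache warming: pre-populate the hottest keys at deploy time. A minimal sketch, assuming recently active users are a good proxy for hot keys (the query and key format are hypothetical):

def warm_cache():
    # Pre-load the most recently active users so the first wave of
    # traffic after a deploy mostly hits the cache
    for user in db.query("SELECT * FROM users ORDER BY last_seen DESC LIMIT 1000"):
        redis.setex(f"user:{user.id}", 300, user.to_json())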
4. Write Strategies
Cache-aside handles reads well. For writes, you have three meaningful choices depending on your consistency requirements.
Write-Through — Always Consistent, Always Slower
On every write: update the DB and update the cache in the same operation. The cache is never stale.
def update_product_price(product_id: int, new_price: float):
    db.execute("UPDATE products SET price = %s WHERE id = %s", new_price, product_id)

    # Also update cache immediately
    product = db.get_product(product_id)
    redis.setex(f"product:{product_id}", 300, product.to_json())
Use when: you have strict read-after-write requirements. A user updates their profile — they expect to immediately see the new version on their next page load.
The cost: every write is now double the latency (DB write + cache write). In write-heavy systems, this adds up.
Write-Behind (Write-Back) — Fast Writes, Risk of Data Loss
Write to the cache immediately, acknowledge success to the user, flush to the DB asynchronously in the background.
Write → Cache (instant ACK to user)
↓ (background, batched)
Database
Use when: you're writing high-frequency counters, view counts, analytics events — data where you can tolerate eventual consistency and need sub-millisecond write latency.
The real risk: if Redis crashes before flushing to the DB, that data is gone. Only use this when data loss is survivable.
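A minimal write-behind sketch for view counters, assuming one background flusher per service (key names and the interval are illustrative; the HGETALL/DELETE pair is deliberately simple and can itself drop a few increments, which only underlines the tradeoff):

import time

def record_view(post_id: int):
    # Hot path: a single cache increment, instant acknowledgement
    redis.hincrby("pending_views", str(post_id), 1)

def flush_views(interval_s: float = 5.0):
    # Background worker: periodically batch pending counts into the DB.
    # If Redis crashes before a flush, the unflushed counts are gone.
    while True:
        time.sleep(interval_s)
        pending = redis.hgetall("pending_views")
        if not pending:
            continue
        redis.delete("pending_views")
        for post_id, count in pending.items():
            db.execute(
                "UPDATE posts SET views = views + %s WHERE id = %s",
                int(count), int(post_id),
            )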
Write-Around — Skip the Cache on Write
Writes go straight to the DB. The cache is only populated on reads. Use this when data is written once and rarely (or never) re-read soon after writing.
Classic example: log entries, uploaded files, audit records. You write them, they go to storage, and they get cached only if someone actually requests them.
In Practice
Most applications use cache-aside for reads + delete-on-write for invalidation. Write-through is added for critical paths where consistency matters. Write-behind is reserved for high-frequency write scenarios like analytics. You're rarely choosing just one pattern — you're mixing them based on data type.
5. Eviction Policies
When your cache is full and a new entry needs to go in, something has to be evicted. Redis gives you several policies. Here's what actually matters:
LRU (Least Recently Used): evict the entry that hasn't been accessed for the longest time. The default and the right choice for most applications. Rooted in a simple truth: if you haven't touched it recently, you probably don't need it.
LFU (Least Frequently Used): evict the entry with the fewest total accesses. Better when your popular data is stable and long-lived (a top-10 product list that's been hot for months). Worse for new content — it starts with zero frequency and gets evicted immediately even if it's about to go viral.
FIFO: evict the oldest inserted entry. Ignores access patterns. Rarely the right choice.
No eviction: Redis returns errors when full instead of evicting. Useful when your cache is a source of truth and data loss is unacceptable.
Redis Configuration
# In redis.conf or via CONFIG SET:
# Evict the least recently used keys across all keys
maxmemory-policy allkeys-lru
# Evict the least frequently used keys across all keys
maxmemory-policy allkeys-lfu
# Only evict keys that have a TTL set (leave persistent keys alone)
maxmemory-policy volatile-lru
# Set max memory (Redis won't use more than this)
maxmemory 2gb
Interview answer: "I'd use allkeys-lru for a general-purpose cache and allkeys-lfu if the dataset has clear long-term hot items like a top products list."
6. TTL
TTL (Time-To-Live) is how long a cache entry lives before Redis automatically deletes it. Getting this right matters more than most engineers think.
Too short: high miss rate, unnecessary database pressure, cache barely helps.
Too long: users see stale data, which is a product problem.
Practical TTL by Data Type
# Configuration / reference data — changes almost never
redis.setex("country_codes", 86400, data) # 24 hours
# User profiles — changes occasionally
redis.setex(f"user:{id}", 300, data) # 5 minutes
# Product catalog — changes regularly
redis.setex(f"product:{id}", 60, data) # 1 minute
# Live inventory / pricing — changes frequently
redis.setex(f"inventory:{id}", 10, data) # 10 seconds
# Trending content — freshness matters
redis.setex("trending_posts", 30, data) # 30 seconds
# Sessions — live as long as the session
redis.setex(f"session:{token}", 3600, data) # 1 hour
The Jitter Rule
If you deploy a service and populate 10,000 cache entries all with TTL=300, they will all expire at exactly T+300 simultaneously. Every single one misses at once. This is a thundering herd problem — more on that in the next section.
Fix it by adding randomness to your TTL:
import random

def cache_set(key: str, value, base_ttl: int):
    # Add ±10% random jitter
    jitter = random.randint(-base_ttl // 10, base_ttl // 10)
    redis.setex(key, base_ttl + jitter, value)
This one habit prevents an entire class of production incidents.
7. Cache Invalidation
Phil Karlton famously said: "There are only two hard things in computer science: cache invalidation and naming things."
It's hard for a real reason: the data in your cache can go out of sync with your database, and getting it back in sync across distributed systems without race conditions is genuinely tricky.
Strategy 1: Let TTL Handle It
The simplest approach. Set a TTL, serve slightly stale data, and accept that users might see data that's up to TTL seconds old.
Works for: product descriptions, blog posts, user profiles, catalog data — anything where "slightly stale" is a product-acceptable answer.
# Just use TTL and don't overthink it
redis.setex(f"product:{product_id}", 60, product.to_json())
# Worst case: 60 seconds of stale data
This is underused. Many engineers reach for complex invalidation schemes when a TTL is perfectly fine.
Strategy 2: Explicit Delete on Write
When data changes, delete the cache entry immediately. The next read re-populates it fresh.
def update_product(product_id: int, data: dict):
    db.update(product_id, data)
    redis.delete(f"product:{product_id}")  # Done — next read will be fresh
Works for: any data where you own both the write path and the cache. Simple and reliable when the same service writes and caches.
The problem it doesn't solve: in microservices, Service A writes the data and Service B has it cached. Service B doesn't know about Service A's write.
Strategy 3: Event-Driven Invalidation
Service A writes to the DB, publishes an event. Service B subscribes and invalidates its cache.
# Service A (Order service) — after a status change:
def update_order_status(order_id: int, new_status: str):
    db.update_order(order_id, new_status)
    kafka.publish("order.updated", {"order_id": order_id, "status": new_status})

# Service B (Dashboard service) — subscribes to events:
@kafka.subscribe("order.updated")
def handle_order_updated(event):
    redis.delete(f"order:{event['order_id']}")
Works for: microservice architectures where multiple services cache the same data. More complex, but necessary at scale.
One important nuance: after deleting from cache, don't immediately re-populate it. Let the next natural read repopulate it. If you delete and re-populate immediately inside the event handler, you might re-populate with the old data if the DB replica hasn't caught up yet.
The Race Condition to Know
This comes up in interviews. The "double write" race:
Time 1: Request A reads product:123 → cache miss → queries DB → gets price $10
Time 2: Request B updates price to $20, deletes cache:product:123
Time 3: Request A writes the old $10 value back to cache
Cache now has stale $10 data
The standard fix: set a short TTL even when doing explicit invalidation. The window for stale data is bounded by the TTL, not infinite.
8. Cache Stampede
This is the failure mode that takes down production systems. You need to understand it, and you need to know one way to prevent it.
What Happens
A popular cache key expires. Instead of one request rebuilding it, thousands of simultaneous requests all see a cache miss and all independently query the database at the same time. Your database, which was handling 100 queries/second because the cache was absorbing the load, suddenly receives 5,000 simultaneous identical queries.
Normal: 10,000 req/s → cache (99% hit) → ~100 req/s to DB ✅
Stampede: popular key expires → all 10,000 req/s hit DB at once 💥
This is a real outage pattern, not a theoretical one. Picture a retailer's recommendation cache expiring in the middle of Black Friday: tens of thousands of concurrent requests slam the database, and checkout pages time out during the highest-traffic hours of the year.
Prevention 1: TTL Jitter (Mandatory)
Covered in the TTL section — always add randomness. This prevents mass synchronized expiry but doesn't protect a single hot key from expiring under high traffic.
Prevention 2: Distributed Lock (The Reliable Fix)
Only let one request rebuild the cache. Everyone else either waits or gets served slightly stale data.
import json
import time

def get_trending_products():
    cache_key = "trending:products"
    lock_key = "lock:trending:products"

    # Fast path — cache hit
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)

    # Cache miss — try to acquire a rebuild lock
    acquired = redis.set(lock_key, "1", nx=True, ex=10)  # 10s lock TTL
    if acquired:
        try:
            # This process rebuilds the cache
            products = db.get_trending_products()
            redis.setex(cache_key, 300, json.dumps(products))
            return products
        finally:
            redis.delete(lock_key)
    else:
        # Another process is rebuilding — wait briefly and try cache again
        time.sleep(0.1)
        cached = redis.get(cache_key)
        return json.loads(cached) if cached else db.get_trending_products()
Key rules for the lock:
- Lock TTL must exceed the maximum time the DB query takes (if it takes 2s, your lock can't be 1s)
- Use atomic SET key value NX EX ttl — never two separate commands
- Always release the lock in a finally block
Prevention 3: Stale-While-Revalidate
Serve the expired (stale) data immediately, trigger a background refresh. Users never see a slow response.
import json
import threading
import time

def _store(key: str, value, ttl: int):
    # Hard TTL = soft TTL + a 60s grace window for serving stale data
    redis.setex(key, ttl + 60, json.dumps({
        "value": value,
        "refresh_at": time.time() + ttl,  # soft TTL
    }))

def _refresh(key: str, fetch_fn, ttl: int):
    _store(key, fetch_fn(), ttl)

def get_with_stale(key: str, fetch_fn, ttl: int):
    raw = redis.get(key)
    if raw:
        payload = json.loads(raw)
        # Soft TTL expired — refresh in background but still return stale data.
        # (Concurrent requests may each spawn a refresh; the lock from the
        # previous section dedupes that if it matters.)
        if time.time() > payload["refresh_at"]:
            threading.Thread(target=_refresh, args=(key, fetch_fn, ttl)).start()
        return payload["value"]

    # True miss — must fetch synchronously
    value = fetch_fn()
    _store(key, value, ttl)
    return value
This is also built into HTTP via Cache-Control: stale-while-revalidate=N — CDNs and browsers serve the stale response for up to N seconds while refreshing in the background.
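A hedged example of setting that header from a Flask-style handler (the framework choice and directive values are illustrative; the header itself is standard HTTP):

from flask import Flask, make_response

app = Flask(__name__)

@app.route("/trending")
def trending():
    resp = make_response(render_trending())  # render_trending is a hypothetical view
    # Serve a cached copy for 300s; after that, serve it stale for up to
    # 60 more seconds while the CDN/browser revalidates in the background
    resp.headers["Cache-Control"] = "public, max-age=300, stale-while-revalidate=60"
    return resp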
Interview answer for "how do you handle cache stampede?": "I'd add TTL jitter to prevent synchronized expiry, and for high-traffic keys I'd use a distributed lock so only one process rebuilds the cache while others serve stale data or wait briefly."
9. What NOT to Cache
Knowing what to skip is as important as knowing what to add.
Unique or rarely repeated queries. If every user has a unique, personalized query that's never repeated, you'll have a 0% hit rate. You're paying cache overhead for no benefit. Search queries are a classic trap — "nike red shoes size 10" is unlikely to be queried identically again anytime soon.
Highly mutable data. If a value changes every few seconds, the TTL has to be so short that it barely helps. Invalidating it on every write negates the performance gain.
Large objects that take up disproportionate cache memory. Caching a 5MB JSON blob means that single entry uses the memory of 5,000 small entries. Cache the small things — IDs, scalars, lightweight structs — not entire serialized graphs.
User-specific data at scale. Caching each user's dashboard state for 50 million users means 50 million cache entries, most of which will be evicted before anyone reads them again. Profile data for frequently active users: cache it. Personalized dashboard content for casual users: recalculate on demand.
Data with complex invalidation dependencies. If invalidating one piece of data requires you to figure out and invalidate 20 related cache keys, you'll eventually miss one. The cache becomes inconsistently stale in ways that are very hard to debug. Sometimes the right answer is: don't cache this, compute it fresh.
Sensitive data without careful thought. PII, payment info, auth tokens — caching these in a shared Redis instance requires careful access control and encryption. The risk surface is real.
10. Practical Decision Framework
Before adding a cache, run through this:
1. Is the data read frequently and written infrequently?
→ No: caching probably won't help. Don't bother.
→ Yes: continue.
2. Is the data expensive to fetch or compute?
→ No (< ~5ms DB query, simple lookup): don't cache.
→ Yes (complex joins, external API calls, heavy computation): good candidate.
3. How stale can this data be?
→ Never stale (balances, inventory): use write-through or invalidate on every write.
→ Slightly stale OK (profiles, catalog): cache-aside + TTL (seconds to minutes).
→ Very stale OK (blog content, reference data): long TTL (hours to days).
4. Who writes the data?
→ Same service: delete from cache in the write path. Simple.
→ Different service: event-driven invalidation via Pub/Sub or message queue.
→ Multiple writers: use a short TTL as a safety net. Accept bounded staleness.
5. What's the failure mode if cache goes down?
→ Application falls back to DB (slower but works): cache-aside, safe to proceed.
→ Application dies: reconsider the architecture before adding more caching.
6. What's the expected hit rate?
→ < 50%: the cache is barely helping. Rethink the key design or don't cache.
→ > 80%: worth it.
→ > 95%: excellent. (Redis reports hits and misses directly; see the snippet below.)
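For that last question you don't have to guess: Redis tracks keyspace hits and misses in its INFO stats, so you can read the hit rate straight off the server (using the same shared redis client as above):

stats = redis.info("stats")  # keyspace_hits / keyspace_misses are standard INFO fields
hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
total = hits + misses
print(f"hit rate: {hits / total:.1%}" if total else "no lookups recorded yet")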
11. Summary Cheat Sheet
Caching Patterns at a Glance
| Pattern | Reads from | Writes to | Consistency | Cache fails → | Use when |
|---|---|---|---|---|---|
| Cache-Aside | Cache, then DB | DB (delete cache) | Eventual | App still works | Default choice — most scenarios |
| Write-Through | Cache | Cache + DB (sync) | Strong | App still works | Read-after-write consistency required |
| Write-Behind | Cache | Cache → DB (async) | Eventual | Data loss risk | High write throughput, counters |
| Write-Around | Cache, then DB | DB only | Eventual | App still works | Write-once, rarely re-read data |
Eviction Policies at a Glance
| Policy | Evicts | Best For |
|---|---|---|
| `allkeys-lru` | Least recently used | General-purpose cache (safe default) |
| `allkeys-lfu` | Least frequently used | Stable hot datasets (top products, popular content) |
| `volatile-lru` | LRU among TTL-bearing keys | When some keys must never be evicted |
| `noeviction` | Nothing (returns error when full) | When data loss is unacceptable |
TTL Quick Reference
| Data Type | Suggested TTL |
|---|---|
| Static config / country codes | 24 hours |
| User profiles | 5 minutes |
| Product catalog | 1–5 minutes |
| Session tokens | Matches session lifetime |
| Trending / real-time feeds | 30–60 seconds |
| Live pricing / inventory | 5–15 seconds |
| Auth rate limit counters | 1–60 seconds (by design) |
Invalidation Strategies at a Glance
| Strategy | When to Use | Tradeoff |
|---|---|---|
| TTL only | Staleness is acceptable | Simplest; stale window = TTL |
| Delete on write | You own both read and write paths | Fresh on next read (bound rare races with a TTL); extra step on writes |
| Event-driven | Multiple services cache the same data | Decoupled; slight propagation delay |
Stampede Prevention
| Technique | Best For |
|---|---|
| TTL jitter | Preventing synchronized mass expiry (always do this) |
| Distributed lock | Single hot key under high concurrent traffic |
| Stale-while-revalidate | When serving slightly stale data during refresh is acceptable |
| Cache warming | Preventing cold cache after deploy or restart |
Closing Thoughts
Caching is one of the highest-ROI changes you can make to a system — but only when applied to the right problems. A well-placed Redis cache in front of an expensive query can slash p99 latency by 100x. A poorly placed cache adds complexity, stale data bugs, and operational overhead for no gain.
The rules to commit to memory:
- Cache-aside is the default. Don't start with anything else unless you have a specific reason.
- Delete, don't update, on writes. Prevents race conditions.
- Always add TTL jitter. It's two lines of code that prevent a class of production incidents.
- Know the stampede problem. It's the single most common cache-related outage pattern and it always hits at the worst moment (high traffic).
- Measure hit rate. If it's below 80%, the cache is doing less than you think.
- Design for cache failure. The system should degrade gracefully, not collapse, when Redis goes down.
Tags: #database #systemdesign #backend #performance #redis