Distributed caching is one of the most frequently asked system design interview questions — and one of the most important real-world backend optimizations.
If you're preparing for backend roles (especially senior-level), you should be able to clearly explain:
- Why we need a distributed cache
- How it works internally
- How it scales
- What trade-offs exist
- How to handle failures
Let’s walk through it step by step in a clean, practical, interview-ready way.
The Problem - Why Do We Need a Cache?
Imagine a system serving millions of users.
Every request:
Client → App Server → Database
Problems:
- Database becomes a bottleneck
- High latency (disk I/O is slow)
- Expensive horizontal scaling
- Increased infrastructure cost
Now imagine 90% of requests are reads for the same popular data (product details, user profile, configuration, etc.).
We are repeatedly fetching the same data from disk.
That’s inefficient.
The Core Idea of Caching
Instead of hitting the database every time:
Client → App → Cache → Database
Flow:
- App checks cache.
- If data exists → return immediately.
- If not → fetch from DB.
- Store in cache.
- Return response.
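The five steps above can be sketched in a few lines of Python (a plain dict stands in for the cache node and `db_fetch` is a stub for the real database query — both are illustrative, not a real client):

```python
cache = {}

def db_fetch(key):
    # Stub for a slow read from the source of truth.
    return f"value-for-{key}"

def get(key):
    # 1. App checks cache.
    if key in cache:
        return cache[key]          # 2. Hit -> return immediately.
    value = db_fetch(key)          # 3. Miss -> fetch from DB.
    cache[key] = value             # 4. Store in cache.
    return value                   # 5. Return response.
```

The second call for the same key never touches the database.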
Memory access is extremely fast compared to disk.
This reduces:
- Database load
- Latency
- Cost
When Single Cache Is Not Enough
A single cache server works for small systems.
But at scale:
- Memory is capped by a single machine
- One node is a single point of failure
- A single server cannot absorb massive traffic
So we move to a Distributed Cache Cluster.
High-Level Architecture
Here’s the production-ready architecture at a glance:
Clients → Load Balancer → App Servers → Distributed Cache Cluster → Database
Component Breakdown
Clients
Users sending requests (mobile/web).
Load Balancer
Distributes traffic across multiple app servers.
App Servers
Contain business logic.
They:
- Check cache first
- On miss → fetch from DB
- Store in cache
Distributed Cache Cluster
Multiple cache nodes working together.
Features:
- Key distribution
- Replication
- TTL-based expiration
- Monitoring
Database
Source of truth.
Key Distribution
When we have multiple cache nodes:
How do we decide which key goes to which server?
Naive Approach
hash(key) % number_of_servers
Problem:
- If a server is added/removed → almost all keys get remapped.
This causes massive cache misses.
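A quick way to see the problem is to count how many keys change servers when a fourth node joins a three-node cluster (md5 is used here only as a deterministic stand-in for the hash function):

```python
import hashlib

def server_for(key, n_servers):
    # Naive placement: hash(key) % number_of_servers
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n_servers

keys = [f"key-{i}" for i in range(10_000)]
before = {k: server_for(k, 3) for k in keys}  # cluster of 3 nodes
after = {k: server_for(k, 4) for k in keys}   # one node added
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 75%
```

Adding a single node invalidates roughly three quarters of the cache.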
Better Approach - Consistent Hashing
Idea:
- Arrange servers on a hash ring.
- Hash each key onto the same ring.
- Store the key on the first server found moving clockwise.
If one node fails:
- Only small portion of keys move.
- System remains mostly stable.
Memcached client libraries (e.g., Ketama) use consistent hashing, and Redis Cluster uses a related fixed hash-slot scheme.
This makes scaling predictable and stable.
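A minimal ring can be sketched with a sorted list and binary search (node names are hypothetical, and real implementations add virtual nodes so load spreads more evenly):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring (no virtual nodes, for clarity)."""

    def __init__(self, nodes):
        # Each node is placed at a fixed position on the ring.
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # First node position >= the key's hash, wrapping around
        # to the start of the ring (i.e., the nearest node clockwise).
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
node = ring.node_for("user:42")
```

Removing a node only relocates the keys that lived on it; every other key keeps its clockwise successor.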
Caching Strategies
Different systems use different write strategies.
Cache Aside (Lazy Loading)
Most common.
Flow:
- App checks cache.
- On miss → fetch from DB.
- Store in cache.
Pros:
- Simple
- Efficient
- Only caches frequently used data
Cons:
- First request slower (cold start)
Write Through
- Every write goes synchronously to both the cache and the DB.
- Stronger consistency.
- Higher write latency.
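A write-through update can be sketched like this (plain dicts stand in for the cache and the DB):

```python
cache = {}
db = {}

def write_through(key, value):
    # Write to the source of truth first, then the cache.
    # The call does not return until both succeed, which is
    # where the extra write latency comes from.
    db[key] = value
    cache[key] = value
```

Reads after a write always see the new value in the cache, which is the consistency win.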
Write Back (Write Behind)
- Write to cache first.
- DB updated asynchronously.
Pros:
- Fast writes
Cons:
- Risk of data loss if cache crashes before DB sync.
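Write-back can be sketched with a queue and a background flusher (dicts again stand in for the cache and DB; a real system would batch writes and handle flush failures):

```python
import queue
import threading

cache = {}
db = {}
pending = queue.Queue()

def write_back(key, value):
    # Fast path: update the cache and enqueue the DB write.
    cache[key] = value
    pending.put((key, value))

def flusher():
    # Background worker applies queued writes to the DB asynchronously.
    while True:
        key, value = pending.get()
        db[key] = value
        pending.task_done()

threading.Thread(target=flusher, daemon=True).start()
```

Anything still sitting in `pending` when the cache node dies is lost — that is the data-loss risk in code form.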
Eviction Policies
Cache memory is limited.
When full, what do we remove?
Common policies:
- LRU - Least Recently Used
- LFU - Least Frequently Used
- FIFO - First In First Out
- TTL - Time-based expiration
Most production systems combine LRU with TTL-based expiration.
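A minimal sketch of that combination, using an `OrderedDict` to track recency (the size and TTL values are illustrative):

```python
import time
from collections import OrderedDict

class LruTtlCache:
    """LRU eviction plus per-entry TTL expiration."""

    def __init__(self, max_size, ttl_seconds):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._data = OrderedDict()   # key -> (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() > expires_at:    # TTL expired
            del self._data[key]
            return None
        self._data.move_to_end(key)          # mark as recently used
        return value

    def put(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)   # evict least recently used
```

TTL bounds staleness; LRU decides what to drop when memory runs out.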
Replication for High Availability
What happens if a cache node fails?
Without Replication
- Data lost
- Rebuilt from DB
- Temporary latency spike
With Replication
- Primary + replica nodes
- Failover possible
- Higher availability
Trade-off:
Memory cost vs Availability.
In real-world production systems, replication is common.
Common Problems in Distributed Cache
Cache Stampede
Many requests miss at the same time (e.g., a popular key just expired) and all hit the DB at once.
Solutions:
- Mutex locking
- Request coalescing
- Pre-warming cache
Hot Keys
One key receives massive traffic.
Solutions:
- Replication
- Sharding the hot key (e.g., key suffixing)
- Rate limiting
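Key suffixing — one way to shard a hot key — can be sketched like this (the copy count and key format are illustrative):

```python
import random

N_COPIES = 4  # illustrative: fan a known hot key out over 4 suffixed copies

def write_hot(cache, key, value):
    # Write every copy so any suffix serves the same value.
    for i in range(N_COPIES):
        cache[f"{key}#{i}"] = value

def read_hot(cache, key):
    # Pick a random copy; in a distributed cluster the suffixed keys
    # hash to different nodes, spreading the read traffic.
    return cache.get(f"{key}#{random.randrange(N_COPIES)}")
```

The trade-off is N writes per update in exchange for N-way read fan-out.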
Stale Data
Cache and DB become inconsistent.
Solutions:
- Invalidate cache on update
- Use short TTL
- Event-driven invalidation
Most systems accept eventual consistency.
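Invalidate-on-update pairs naturally with cache-aside reads: delete the cached copy when the source of truth changes and let the next read repopulate it (dicts stand in for the cache and DB):

```python
cache = {}
db = {}

def update(key, value):
    # Write to the source of truth, then delete (not update)
    # the cached copy.
    db[key] = value
    cache.pop(key, None)

def get(key):
    # Cache-aside read: the next read after an update repopulates.
    if key not in cache:
        cache[key] = db[key]
    return cache[key]
```

Deleting rather than updating avoids racing writers leaving a stale value behind.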
Non-Functional Requirements
A distributed cache must provide:
- Very low latency (sub-millisecond)
- High availability
- Horizontal scalability
- Fault tolerance
- High throughput
In most large systems:
- Read-to-write ratios are often on the order of 100:1
- Cache dramatically reduces DB pressure
Trade-Offs
Every choice above trades one property for another:
- Consistency vs latency (write-through vs write-back)
- Memory cost vs availability (replication)
- Freshness vs hit rate (short vs long TTL)
End-to-End Flow
- Client sends request
- Load balancer routes to app server
- App checks cache
- If hit → return instantly
- If miss → fetch from DB
- Store in cache
- Return response
Database remains protected. System scales smoothly.