Distributed caching is one of the most frequently asked system design interview questions — and one of the most important real-world backend optimizations.
If you're preparing for backend roles (especially senior-level), you should be able to clearly explain:
- Why we need a distributed cache
- How it works internally
- How it scales
- What trade-offs exist
- How to handle failures
Let’s walk through it step by step in a clean, practical, interview-ready way.
The Problem - Why Do We Need a Cache?
Imagine a system serving millions of users.
Every request:
Client → App Server → Database
Problems:
- Database becomes a bottleneck
- High latency (disk I/O is slow)
- Expensive horizontal scaling
- Increased infrastructure cost
Now imagine 90% of requests are reads for the same popular data (product details, user profile, configuration, etc.).
We are repeatedly fetching the same data from disk.
That’s inefficient.
The Core Idea of Caching
Instead of hitting the database every time:
Client → App → Cache → Database
Flow:
- App checks cache.
- If data exists → return immediately.
- If not → fetch from DB.
- Store in cache.
- Return response.
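The five steps above can be sketched in a few lines of Python (a plain dict stands in for the cache node and `db_fetch` is a stub for the real database query — both are illustrative, not a real client):

```python
cache = {}

def db_fetch(key):
    # Stub for a slow read from the source of truth.
    return f"value-for-{key}"

def get(key):
    # 1. App checks cache.
    if key in cache:
        return cache[key]          # 2. Hit -> return immediately.
    value = db_fetch(key)          # 3. Miss -> fetch from DB.
    cache[key] = value             # 4. Store in cache.
    return value                   # 5. Return response.
```

The second call for the same key never touches the database.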
Memory access is extremely fast compared to disk.
This reduces:
- Database load
- Latency
- Cost
When Single Cache Is Not Enough
A single cache server works for small systems.
But at scale:
- Memory is capped by a single machine
- One node is a single point of failure
- A single server cannot absorb massive traffic
So we move to a Distributed Cache Cluster.
High-Level Architecture
Here’s the production-ready architecture at a glance:
Clients → Load Balancer → App Servers → Distributed Cache Cluster → Database
Component Breakdown
Clients
Users sending requests (mobile/web).
Load Balancer
Distributes traffic across multiple app servers.
App Servers
Contain business logic.
They:
- Check cache first
- On miss → fetch from DB
- Store in cache
Distributed Cache Cluster
Multiple cache nodes working together.
Features:
- Key distribution
- Replication
- TTL-based expiration
- Monitoring
Database
Source of truth.
Key Distribution
When we have multiple cache nodes:
How do we decide which key goes to which server?
Naive Approach
hash(key) % number_of_servers
Problem:
- If a server is added/removed → almost all keys get remapped.
This causes massive cache misses.
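A quick way to see the problem is to count how many keys change servers when a fourth node joins a three-node cluster (md5 is used here only as a deterministic stand-in for the hash function):

```python
import hashlib

def server_for(key, n_servers):
    # Naive placement: hash(key) % number_of_servers
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n_servers

keys = [f"key-{i}" for i in range(10_000)]
before = {k: server_for(k, 3) for k in keys}  # cluster of 3 nodes
after = {k: server_for(k, 4) for k in keys}   # one node added
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 75%
```

Adding a single node invalidates roughly three quarters of the cache.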
Better Approach - Consistent Hashing
Idea:
- Arrange servers on a hash ring.
- Hash each key onto the same ring.
- Store the key on the first server found moving clockwise.
If one node fails:
- Only small portion of keys move.
- System remains mostly stable.
Memcached client libraries (e.g., Ketama) use consistent hashing, and Redis Cluster uses a related fixed hash-slot scheme.
This makes scaling predictable and stable.
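A minimal ring can be sketched with a sorted list and binary search (node names are hypothetical, and real implementations add virtual nodes so load spreads more evenly):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring (no virtual nodes, for clarity)."""

    def __init__(self, nodes):
        # Each node is placed at a fixed position on the ring.
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # First node position >= the key's hash, wrapping around
        # to the start of the ring (i.e., the nearest node clockwise).
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
node = ring.node_for("user:42")
```

Removing a node only relocates the keys that lived on it; every other key keeps its clockwise successor.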
Caching Strategies
Different systems use different write strategies.
Cache Aside (Lazy Loading)
Most common.
Flow:
- App checks cache.
- On miss → fetch from DB.
- Store in cache.
Pros:
- Simple
- Efficient
- Only caches frequently used data
Cons:
- First request slower (cold start)
Write Through
- Every write goes synchronously to both the cache and the DB.
- Stronger consistency.
- Higher write latency.
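A write-through update can be sketched like this (plain dicts stand in for the cache and the DB):

```python
cache = {}
db = {}

def write_through(key, value):
    # Write to the source of truth first, then the cache.
    # The call does not return until both succeed, which is
    # where the extra write latency comes from.
    db[key] = value
    cache[key] = value
```

Reads after a write always see the new value in the cache, which is the consistency win.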
Write Back (Write Behind)
- Write to cache first.
- DB updated asynchronously.
Pros:
- Fast writes
Cons:
- Risk of data loss if cache crashes before DB sync.
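Write-back can be sketched with a queue and a background flusher (dicts again stand in for the cache and DB; a real system would batch writes and handle flush failures):

```python
import queue
import threading

cache = {}
db = {}
pending = queue.Queue()

def write_back(key, value):
    # Fast path: update the cache and enqueue the DB write.
    cache[key] = value
    pending.put((key, value))

def flusher():
    # Background worker applies queued writes to the DB asynchronously.
    while True:
        key, value = pending.get()
        db[key] = value
        pending.task_done()

threading.Thread(target=flusher, daemon=True).start()
```

Anything still sitting in `pending` when the cache node dies is lost — that is the data-loss risk in code form.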
Eviction Policies
Cache memory is limited.
When full, what do we remove?
Common policies:
- LRU - Least Recently Used
- LFU - Least Frequently Used
- FIFO - First In First Out
- TTL - Time-based expiration
Most production systems combine LRU with TTL-based expiration.
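A minimal sketch of that combination, using an `OrderedDict` to track recency (the size and TTL values are illustrative):

```python
import time
from collections import OrderedDict

class LruTtlCache:
    """LRU eviction plus per-entry TTL expiration."""

    def __init__(self, max_size, ttl_seconds):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._data = OrderedDict()   # key -> (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() > expires_at:    # TTL expired
            del self._data[key]
            return None
        self._data.move_to_end(key)          # mark as recently used
        return value

    def put(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)   # evict least recently used
```

TTL bounds staleness; LRU decides what to drop when memory runs out.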
Replication for High Availability
What happens if a cache node fails?
Without Replication
- Data lost
- Rebuilt from DB
- Temporary latency spike
With Replication
- Primary + replica nodes
- Failover possible
- Higher availability
Trade-off:
Memory cost vs Availability.
In real-world production systems, replication is common.
Common Problems in Distributed Cache
Cache Stampede
Many requests miss at the same time (e.g., a popular key just expired) and all hit the DB at once.
Solutions:
- Mutex locking
- Request coalescing
- Pre-warming cache
Hot Keys
One key receives massive traffic.
Solutions:
- Replication
- Sharding the hot key (e.g., key suffixing)
- Rate limiting
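Key suffixing — one way to shard a hot key — can be sketched like this (the copy count and key format are illustrative):

```python
import random

N_COPIES = 4  # illustrative: fan a known hot key out over 4 suffixed copies

def write_hot(cache, key, value):
    # Write every copy so any suffix serves the same value.
    for i in range(N_COPIES):
        cache[f"{key}#{i}"] = value

def read_hot(cache, key):
    # Pick a random copy; in a distributed cluster the suffixed keys
    # hash to different nodes, spreading the read traffic.
    return cache.get(f"{key}#{random.randrange(N_COPIES)}")
```

The trade-off is N writes per update in exchange for N-way read fan-out.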
Stale Data
Cache and DB become inconsistent.
Solutions:
- Invalidate cache on update
- Use short TTL
- Event-driven invalidation
Most systems accept eventual consistency.
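Invalidate-on-update pairs naturally with cache-aside reads: delete the cached copy when the source of truth changes and let the next read repopulate it (dicts stand in for the cache and DB):

```python
cache = {}
db = {}

def update(key, value):
    # Write to the source of truth, then delete (not update)
    # the cached copy.
    db[key] = value
    cache.pop(key, None)

def get(key):
    # Cache-aside read: the next read after an update repopulates.
    if key not in cache:
        cache[key] = db[key]
    return cache[key]
```

Deleting rather than updating avoids racing writers leaving a stale value behind.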
Non-Functional Requirements
A distributed cache must provide:
- Very low latency (sub-millisecond)
- High availability
- Horizontal scalability
- Fault tolerance
- High throughput
In most large systems:
- Read-to-write ratios are often on the order of 100:1
- Cache dramatically reduces DB pressure
Trade-Offs
Every choice above trades one property for another:
- Consistency vs latency (write-through vs write-back)
- Memory cost vs availability (replication)
- Freshness vs hit rate (short vs long TTL)
End-to-End Flow
- Client sends request
- Load balancer routes to app server
- App checks cache
- If hit → return instantly
- If miss → fetch from DB
- Store in cache
- Return response
Database remains protected. System scales smoothly.