DEV Community

Sreya Satheesh
SD #010 - Designing a Distributed Cache System

Distributed caching is one of the most frequently asked system design interview questions — and one of the most important real-world backend optimizations.

If you're preparing for backend roles (especially senior-level), you should be able to clearly explain:

  • Why we need a distributed cache
  • How it works internally
  • How it scales
  • What trade-offs exist
  • How to handle failures

Let’s walk through it step by step in a clean, practical, interview-ready way.

The Problem - Why Do We Need a Cache?

Imagine a system serving millions of users.

Every request:

Client → App Server → Database

Problems:

  • Database becomes a bottleneck
  • High latency (disk I/O is slow)
  • Expensive horizontal scaling
  • Increased infrastructure cost

Now imagine 90% of requests are reads for the same popular data (product details, user profile, configuration, etc.).

We are repeatedly fetching the same data from disk.

That’s inefficient.

The Core Idea of Caching

Instead of hitting the database every time:

Client → App → Cache → Database

Flow:

  1. App checks cache.
  2. If data exists → return immediately.
  3. If not → fetch from DB.
  4. Store in cache.
  5. Return response.
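This read path (commonly called cache-aside) can be sketched in a few lines of Python. This is a minimal illustration, assuming a plain dict stands in for both the cache and the database; the key names are made up:

```python
import time

# Illustrative stand-ins: a dict as the cache, a dict as the database.
cache = {}                                  # key -> (value, expires_at)
database = {"user:42": {"name": "Alice"}}

TTL_SECONDS = 300

def db_fetch(key):
    return database.get(key)                # stands in for a real DB query

def get(key):
    entry = cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:        # steps 1-2: hit -> return immediately
            return value
        del cache[key]                      # expired entry counts as a miss
    value = db_fetch(key)                   # step 3: miss -> fetch from DB
    if value is not None:
        cache[key] = (value, time.time() + TTL_SECONDS)  # step 4: store in cache
    return value                            # step 5: return the response
```

The first call for a key pays the DB round-trip; every call after that (until the TTL expires) is served from memory.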

Memory access is extremely fast compared to disk.

This reduces:

  • Database load
  • Latency
  • Cost

When a Single Cache Is Not Enough

A single cache server works for small systems.

But at scale:

  • Memory becomes limited
  • It becomes a single point of failure
  • Cannot handle massive traffic

So we move to a Distributed Cache Cluster.

High-Level Architecture

Here's the production-ready architecture at a glance:

Clients → Load Balancer → App Servers → Distributed Cache Cluster → Database

Component Breakdown

Clients

Users sending requests (mobile/web).

Load Balancer

Distributes traffic across multiple app servers.

App Servers

Contain business logic.

They:

  • Check cache first
  • On miss → fetch from DB
  • Store in cache

Distributed Cache Cluster

Multiple cache nodes working together.

Features:

  • Key distribution
  • Replication
  • TTL-based expiration
  • Monitoring

Database

Source of truth.

Key Distribution

When we have multiple cache nodes:

How do we decide which key goes to which server?

Naive Approach

hash(key) % number_of_servers

Problem:

  • If a server is added/removed → almost all keys get remapped.

This causes massive cache misses.
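A quick simulation makes the damage concrete. This sketch uses a stable MD5-based hash so the result is reproducible; the keys themselves are arbitrary:

```python
import hashlib

def server_for(key, n):
    # Stable hash so the placement is reproducible across runs.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"key-{i}" for i in range(2000)]
before = {k: server_for(k, 4) for k in keys}
after = {k: server_for(k, 5) for k in keys}    # one server added

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys remapped")   # typically close to 80%
```

Going from 4 to 5 servers remaps roughly 4 out of every 5 keys, and each remapped key is a fresh cache miss hitting the database.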

Better Approach - Consistent Hashing

Idea:

  • Arrange servers in a hash ring.
  • Hash each key.
  • Store key on nearest server clockwise.

If one node fails:

  • Only a small portion of keys moves.
  • The system remains mostly stable.

Variants of this idea are used in Memcached client libraries, Cassandra, and DynamoDB (Redis Cluster uses a related fixed hash-slot scheme).

This makes scaling predictable and stable.
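A minimal hash ring can be sketched as follows. This is an illustrative implementation with virtual nodes (vnodes) to smooth out the key distribution; the node names are made up:

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._keys = []    # sorted hash positions on the ring
        self._map = {}     # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            bisect.insort(self._keys, pos)
            self._map[pos] = node

    def remove(self, node):
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            self._keys.remove(pos)
            del self._map[pos]

    def node_for(self, key):
        pos = _hash(key)
        # Walk clockwise to the nearest node position, wrapping at the top.
        idx = bisect.bisect(self._keys, pos) % len(self._keys)
        return self._map[self._keys[idx]]
```

Removing a node only remaps the keys that lived on it; every other key stays exactly where it was, which is why the misses after a failure stay bounded.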

Caching Strategies

Different systems use different write strategies.

Cache Aside (Lazy Loading)

Most common.

Flow:

  • App checks cache.
  • On miss → fetch from DB.
  • Store in cache.

Pros:

  • Simple
  • Efficient
  • Only caches frequently used data

Cons:

  • The first request for each key is slower (cold start)

Write Through

  • Writes go to the cache and the DB synchronously, in the same request.
  • Stronger consistency.
  • Higher latency.

Write Back (Write Behind)

  • Write to cache first.
  • DB updated asynchronously.

Pros:

  • Fast writes

Cons:

  • Risk of data loss if cache crashes before DB sync.
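The two write strategies can be contrasted in a short sketch. Dicts stand in for the cache and DB, and a queue holds the pending write-back flushes:

```python
import queue

cache = {}
database = {}

# Write through: the cache and the DB are updated in the same request.
def write_through(key, value):
    cache[key] = value
    database[key] = value        # the caller waits for the DB write

# Write back: acknowledge after the cache write, flush to the DB later.
_pending = queue.Queue()

def write_back(key, value):
    cache[key] = value
    _pending.put((key, value))   # durable only once the flusher runs

def flush_pending():
    while not _pending.empty():
        key, value = _pending.get()
        database[key] = value
```

If the process dies before `flush_pending` runs, the write-back data is gone — exactly the risk noted above.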

Eviction Policies

Cache memory is limited.

When full, what do we remove?

Common policies:

  • LRU - Least Recently Used
  • LFU - Least Frequently Used
  • FIFO - First In First Out
  • TTL - Time-based expiration

Most production systems use LRU + TTL combination.
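That combination is easy to sketch with an `OrderedDict`. This is illustrative and not thread-safe; the capacity and TTL values are arbitrary:

```python
import time
from collections import OrderedDict

class LRUTTLCache:
    """Sketch of an LRU cache with per-entry TTL (not thread-safe)."""

    def __init__(self, capacity=128, ttl=60.0):
        self.capacity = capacity
        self.ttl = ttl
        self._data = OrderedDict()   # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:       # TTL expiration
            del self._data[key]
            return None
        self._data.move_to_end(key)         # mark as most recently used
        return value

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (value, time.time() + self.ttl)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # LRU eviction: drop the oldest
```

TTL bounds staleness, while LRU keeps the working set of hot keys in memory once capacity is reached.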

Replication for High Availability

What happens if a cache node fails?

Without Replication

  • Data lost
  • Rebuilt from DB
  • Temporary latency spike

With Replication

  • Primary + replica nodes
  • Failover possible
  • Higher availability

Trade-off:

Memory cost vs Availability.

In real-world production systems, replication is common.

Common Problems in Distributed Cache

Cache Stampede

Many requests miss at the same time and all hit the DB at once.

Solutions:

  • Mutex locking
  • Request coalescing
  • Pre-warming cache
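Mutex locking plus a re-check is enough to collapse a stampede into a single DB hit. A minimal sketch, assuming a process-local cache and a caller-supplied `loader` function:

```python
import threading

cache = {}
_locks = {}
_locks_guard = threading.Lock()

def _lock_for(key):
    # One lock per key, created lazily under a guard lock.
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())

def get_with_stampede_protection(key, loader):
    value = cache.get(key)
    if value is not None:
        return value
    with _lock_for(key):          # only one caller rebuilds the entry
        value = cache.get(key)    # re-check: another thread may have filled it
        if value is None:
            value = loader(key)   # a single DB hit instead of a thundering herd
            cache[key] = value
    return value
```

The re-check inside the lock is the important part: every thread that lost the race finds the freshly cached value instead of calling the loader again.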

Hot Keys

One key receives massive traffic.

Solutions:

  • Replication
  • Sharding hot key
  • Rate limiting
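Sharding a hot key is often done by appending a small suffix so copies land on different cache nodes and reads spread across them. A sketch with made-up names and an arbitrary fan-out:

```python
import random

N_COPIES = 4   # arbitrary fan-out for the example

def write_hot(cache, key, value):
    # Write every copy so any suffix can serve a read.
    for i in range(N_COPIES):
        cache[f"{key}#{i}"] = value

def read_hot(cache, key):
    # Each reader picks a random copy, spreading the load.
    return cache.get(f"{key}#{random.randrange(N_COPIES)}")
```

With consistent hashing, the suffixed keys hash to different ring positions, so no single node absorbs all the traffic for the hot key.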

Stale Data

Cache and DB become inconsistent.

Solutions:

  • Invalidate cache on update
  • Use short TTL
  • Event-driven invalidation

Most systems accept eventual consistency.
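Invalidate-on-update, the simplest option above, looks like this. A sketch with dict stand-ins; in production the `cache.pop` would often be a delete broadcast over an event bus:

```python
database = {}
cache = {}

def update_product(product_id, fields):
    key = f"product:{product_id}"
    # Write the source of truth first...
    database[key] = {**database.get(key, {}), **fields}
    # ...then drop the cached copy; the next read repopulates it from the DB.
    cache.pop(key, None)
```

Ordering matters: updating the DB before dropping the cache entry keeps the window for serving stale data as small as possible.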

Non-Functional Requirements

A distributed cache must provide:

  • Very low latency (sub-millisecond)
  • High availability
  • Horizontal scalability
  • Fault tolerance
  • High throughput

In many large systems:

  • The read-to-write ratio is on the order of 100:1
  • Caching dramatically reduces DB pressure

Trade-Offs

Every choice above trades something away:

  • Write through: stronger consistency vs higher write latency
  • Write back: faster writes vs risk of data loss
  • Replication: higher availability vs extra memory cost
  • Short TTLs: fresher data vs more cache misses

End-to-End Flow

  1. Client sends request
  2. Load balancer routes to app server
  3. App checks cache
  4. If hit → return instantly
  5. If miss → fetch from DB
  6. Store in cache
  7. Return response

Database remains protected. System scales smoothly.
