Jaspreet singh

Posted on Jun 20

HLD Fundamentals #6 : Consistent Hashing: The Smart Way to Scale Distributed Systems

#algorithms #architecture #distributedsystems #systemdesign

Consistent Hashing Explained: The Smart Way to Scale Distributed Systems

Most system design interviews eventually reach a point where the interviewer asks:

"What happens when a new server is added?"

At first, scaling sounds easy—just add another machine.

But in distributed systems, adding or removing servers can cause massive data movement, cache misses, and performance degradation if data distribution isn't handled correctly.

This is exactly the problem Consistent Hashing solves.

It is one of the most important concepts behind systems like Amazon DynamoDB, Cassandra, Redis Cluster, Memcached, Akamai CDN, and many large-scale distributed databases.

In this article, we'll understand:

Why Consistent Hashing exists
Problems with traditional hashing
How Consistent Hashing works
Virtual Nodes
Real-world use cases
Interview-focused explanations

Why Do We Need Consistent Hashing?

Imagine you have:

3 cache servers
Millions of user sessions
Data distributed across servers

A simple approach would be:

Server = Hash(Key) % NumberOfServers

Example:

Hash(User123) % 3 = Server 1
Hash(User456) % 3 = Server 2
Hash(User789) % 3 = Server 0

This works perfectly...

Until your traffic increases.

Now you add a fourth server.

Server = Hash(Key) % 4

Suddenly almost every key gets remapped.

The Big Problem

Before:

Hash(User123) % 3 = Server 1

After adding Server 4:

Hash(User123) % 4 = Server 3

The same user now points to a completely different server.

As a result:

Cache becomes useless
Data must be migrated
Massive cache misses occur
System performance drops

This process is called Rebalancing.

The Rebalancing Problem

Suppose you have:

100 Million Records
3 Servers

You add a new server.

With modulo-based hashing:

Almost all records need redistribution

That's extremely expensive.

For large-scale systems:

Data transfer takes time
Network bandwidth increases
Cache hit ratio drops
User latency increases

We need a better solution.

Enter Consistent Hashing

Consistent Hashing is a technique designed to minimize data movement when servers are added or removed.

Main Goal

Instead of moving all data:

Move only a small portion of data

In a properly balanced system:

Only 1/N of keys are reassigned

Where:

N = Number of Servers

This makes scaling significantly cheaper.

Core Idea: The Hash Ring

Unlike normal hashing, Consistent Hashing places both:

Servers
Data Keys

on a virtual circular ring.

Step 1: Create a Ring

Imagine a hash space:

0 → 360°

0 → 2^32 - 1

Both servers and keys are hashed into this space.

Example:

Server A → 50
Server B → 150
Server C → 300

Ring:

          0
       /     \
    300       50
       \     /
        150

Step 2: Place Keys on the Ring

Suppose:

User1 → 70
User2 → 180
User3 → 320

Now each key searches clockwise for the next available server.

Data Assignment Rule

A key belongs to:

The first server encountered while moving clockwise.

Example:

User1 → 70

Clockwise:

70 → Server B(150)

Assigned to:

Server B

Another Example

User2 → 180

Clockwise:

180 → Server C(300)

Assigned to:

Server C

Another Example

User3 → 320

Clockwise:

320 → wrap around → Server A(50)

Assigned to:

Server A

This wrapping behavior is why it's called a ring.

What Happens When a New Server Is Added?

Suppose a new server appears:

Server D → 220

Before:

Server B(150) → Server C(300)

After:

Server B(150) → Server D(220) → Server C(300)

Only keys between:

150 and 220

move to Server D.

Everything else remains untouched.

Result

Instead of moving:

100% of data

We move roughly:

1/N of data

This is the biggest advantage of Consistent Hashing.

What Happens When a Server Fails?

Suppose:

Server B crashes

Only the keys owned by Server B need reassignment.

Those keys move to the next clockwise server.

Example:

Server B → Down

Keys automatically move to:

Server C

The rest of the ring remains unchanged.

This makes Consistent Hashing naturally fault tolerant.

The Uneven Distribution Problem

There is still one issue.

Suppose servers are placed like:

Server A → 10
Server B → 15
Server C → 250

Now Server C owns a huge portion of the ring.

Result:

Uneven Load Distribution

Some servers become overloaded while others sit idle.

Virtual Nodes (VNodes)

To solve uneven distribution, we use Virtual Nodes.

Instead of placing a server once:

Server A → 50

Place it multiple times:

Server A-1 → 50
Server A-2 → 120
Server A-3 → 280

Similarly:

Server B-1
Server B-2
Server B-3

Server C-1
Server C-2
Server C-3

Now the ring becomes much more balanced.

Why Virtual Nodes Matter

Better Load Balancing

Requests spread more evenly.

Easier Scaling

Adding a new machine impacts smaller portions of the ring.

Better Fault Tolerance

Failure impact is distributed across the cluster.

Real-World Example: Redis Cluster

Imagine:

5 Redis Nodes

User sessions are stored using Consistent Hashing.

When traffic grows:

Add Redis Node 6

Without Consistent Hashing:

Almost entire cache gets reshuffled

With Consistent Hashing:

Only a small subset of keys move

Cache hit rate remains high.

System stays stable.

Real-World Example: Database Sharding

Suppose Amazon stores customer data across many database shards.

Customer IDs are hashed onto a ring.

When storage grows:

Add New Shard

Only nearby data migrates.

No massive rebalancing operation is required.

Advantages vs Disadvantages

Advantages	Disadvantages
Minimal data movement	More complex than modulo hashing
Easy horizontal scaling	Requires hash ring management
High cache hit ratio	Virtual nodes add implementation complexity
Fault tolerant	Debugging can be harder
Widely used in distributed systems	Uneven distribution without VNodes

Interview Questions You Should Be Ready For

Why not use modulo hashing?

Because adding or removing a server changes the modulo value, causing almost all keys to be remapped.

What problem does Consistent Hashing solve?

It minimizes rebalancing when servers are added or removed.

How are keys assigned?

A key is assigned to the first server encountered in the clockwise direction on the hash ring.

What are Virtual Nodes?

Multiple logical representations of the same physical server placed on the ring to improve load distribution.

How much data moves when a server is added?

Approximately:

1/N of the total data

instead of almost all data.

Interview Answer in 60 Seconds

Consistent Hashing is a data distribution technique used in distributed systems to minimize rebalancing when servers are added or removed. Instead of using Hash(Key) % N, both servers and keys are placed on a virtual hash ring. A key is assigned to the next server in the clockwise direction. When a server joins or leaves, only a small subset of keys is reassigned, typically around 1/N of the total data. To ensure even distribution, Virtual Nodes are used, where each physical server is represented multiple times on the ring. It is commonly used in Redis, Cassandra, DynamoDB, Memcached, and large-scale sharded databases.

TL;DR

Traditional hashing uses Hash(Key) % N
Adding/removing servers causes massive rebalancing
Consistent Hashing uses a virtual ring
Keys move clockwise to the nearest server
Only ~1/N of data moves during scaling
Virtual Nodes solve uneven distribution
Used in Redis, Cassandra, DynamoDB, Memcached, CDNs, and database sharding
One of the most frequently asked distributed systems interview topics

Scaling isn't just about adding servers.

The real challenge is adding servers without moving everything.

That's exactly why Consistent Hashing became a foundational building block of modern distributed systems.

Have you ever implemented Consistent Hashing in a real project, or only discussed it in interviews? Share your experience in the comments.

DEV Community

HLD Fundamentals #6 : Consistent Hashing: The Smart Way to Scale Distributed Systems

Consistent Hashing Explained: The Smart Way to Scale Distributed Systems

Why Do We Need Consistent Hashing?

The Big Problem

The Rebalancing Problem

Enter Consistent Hashing

Main Goal

Core Idea: The Hash Ring

Step 1: Create a Ring

Step 2: Place Keys on the Ring

Data Assignment Rule

Another Example

Another Example

What Happens When a New Server Is Added?

Result

What Happens When a Server Fails?

The Uneven Distribution Problem

Virtual Nodes (VNodes)

Why Virtual Nodes Matter

Better Load Balancing

Easier Scaling

Better Fault Tolerance

Real-World Example: Redis Cluster

Real-World Example: Database Sharding

Advantages vs Disadvantages

Interview Questions You Should Be Ready For

Why not use modulo hashing?

What problem does Consistent Hashing solve?

How are keys assigned?

What are Virtual Nodes?

How much data moves when a server is added?

Interview Answer in 60 Seconds

TL;DR

Top comments (0)