DEV Community

Cover image for Hashing - Consistent Hashing: All about Hashing with Example
ZeeshanAli-0704
ZeeshanAli-0704

Posted on • Edited on

Hashing - Consistent Hashing: All about Hashing with Example

Consistent Hashing: A Resilient Approach to Distributed Storage

Table of Contents

Introduction

The world is generating massive amounts of data every second. It’s practically impossible to store all of it on a single server. To handle this, we rely on horizontal scaling, where data is distributed across multiple servers.

While this helps with capacity, it also introduces a key challenge:
πŸ‘‰ How do we distribute and retrieve data efficiently across servers when servers can be added or removed over time?

This is where consistent hashing comes in β€” a powerful technique widely used in distributed systems, load balancers, databases, and caching systems (e.g., Amazon DynamoDB, Apache Cassandra, Memcached, etc.).

In this article, we’ll start with the basic hashing approach, discuss its limitations, and then introduce consistent hashing with examples, diagrams, and enhancements like virtual nodes.


The Problem: Data Distribution with Hashing

Imagine you have n data objects that need to be stored across k servers.

A simple approach is to use a hash function:

serverIndex = hash(key) % k
Enter fullscreen mode Exit fullscreen mode
  • Each key is hashed.
  • The hash value determines which server stores that key.

Example:

  • 4 servers (S0, S1, S2, S3)
  • Keys: A, B, C, D, E, F

If hash(A) % 4 = 2, then key A is stored on server S2.

βœ… This works fine as long as the number of servers (k) doesn’t change.


The Rehashing Problem

Now imagine:

  • Server S3 shuts down
  • OR a new server S4 is added

Suddenly, our hashing formula (hash(key) % k) changes because k has changed.

πŸ‘‰ This means almost all keys must be remapped to different servers.
πŸ‘‰ For large-scale systems with billions of keys, this rehashing operation is extremely expensive.

This problem is called the Rehashing Problem.

Diagram – Normal Hashing vs Server Failure:

Before (k=4):           After S3 down (k=3):

hash(key)%4             hash(key)%3
Enter fullscreen mode Exit fullscreen mode

Nearly all keys are redistributed β†’ ❌ Inefficient.


Consistent Hashing to the Rescue

Consistent hashing was introduced to minimize data movement when servers are added or removed.

Instead of mapping keys directly to servers using %, both keys and servers are hashed into the same hash space (imagine values between 0 and 2^32-1).

We arrange this hash space into a circle (called a hash ring).

How it Works

  1. Hash servers onto the ring
  • Each server is assigned a position on the ring based on its hash.
  1. Hash keys onto the ring
  • Each key is assigned a position on the ring using the same hash function.
  1. Assign key to nearest server clockwise
  • A key belongs to the first server you encounter clockwise from its hash position.

Example

Suppose we have 3 servers: S1, S2, S3.

  • hash(S1) = 10
  • hash(S2) = 30
  • hash(S3) = 50

Keys: K1=12, K2=32, K3=45.

Placement:

  • K1=12 β†’ goes to S2 (30)
  • K2=32 β†’ goes to S3 (50)
  • K3=45 β†’ goes to S3 (50)

Diagram – Consistent Hash Ring:

       0 ---------------- 100
       |                 |
     S1(10)            S3(50)
        \              /
         \            /
           K1=12   K3=45
                S2(30)  
                K2=32
Enter fullscreen mode Exit fullscreen mode

Server Removal (Resilience)

If S2 (30) fails:

  • Only keys between S1 and S2 need to be moved to S3.
  • All other keys remain unaffected.

βœ… Unlike normal hashing, only a fraction of keys need to be moved.


Server Addition (Scalability)

If a new server S4(40) is added:

  • Keys between S2(30) and S4(40) move to S4.
  • All other keys remain untouched.

βœ… Adding a server also moves only a small set of keys.


Uneven Distribution Problem

Consistent hashing solves the rehashing problem, but it can still suffer from uneven load distribution.

Why?

  • The hash function might assign servers unevenly spaced positions on the ring.
  • Some servers may get a large portion of the keys, while others get very few.

Virtual Nodes (VNodes)

To solve uneven load distribution, we introduce virtual nodes.

πŸ‘‰ Instead of placing each server once on the ring, we place it multiple times (at different hash positions).

  • Each server is hashed with multiple hash functions (e.g., hash(S1+1), hash(S1+2), …).
  • These become virtual nodes belonging to the same physical server.

Example

If each server has 3 virtual nodes:

  • S1 β†’ positions at 10, 40, 70
  • S2 β†’ positions at 20, 50, 80
  • S3 β†’ positions at 30, 60, 90

Now keys are distributed more evenly, since each server controls multiple smaller segments of the ring.

βœ… Virtual nodes smoothen key distribution.
βœ… Adding/removing servers becomes even more balanced.

Diagram – Virtual Nodes:

       Ring with VNodes (10 slots)

S1: {10,40,70}
S2: {20,50,80}
S3: {30,60,90}
Enter fullscreen mode Exit fullscreen mode

Real World Applications

Consistent hashing is used in:

  • CDNs (Akamai, Cloudflare, Netflix) β†’ request routing to nearest edge servers.
  • Distributed Databases (Cassandra, DynamoDB, Riak) β†’ partitioning and replication.
  • Caching Systems (Memcached, Redis Cluster) β†’ distributing keys across cache nodes.
  • Load Balancers β†’ evenly distributing traffic to backend servers.

Key Advantages

βœ”οΈ Minimal data movement during server changes
βœ”οΈ Scalable and fault-tolerant
βœ”οΈ Works well with dynamic clusters
βœ”οΈ Virtual nodes fix uneven distribution

Key Trade offs

❌ Slightly more complex to implement than modulo hashing
❌ Metadata about servers & virtual nodes must be stored
❌ Very high number of configuration changes may still cause imbalance


Conclusion

Consistent hashing is one of the cornerstone techniques in distributed system design. By arranging servers and keys on a hash ring, it allows scalable, fault-tolerant, and efficient data distribution with minimal disruption during server changes.

With enhancements like virtual nodes, it ensures even distribution of data, making it the backbone of modern distributed databases, caching systems, and CDNs.

⚑ Next time you look up a video on Netflix or fetch a value from Redis, remember that consistent hashing is working silently in the background to keep things fast and efficient.


More Details:

Get all articles related to system design
Hastag: SystemDesignWithZeeshanAli

Git: https://github.com/ZeeshanAli-0704/SystemDesignWithZeeshanAli

Top comments (0)