Consistent Hashing — System Design Deep Dive
Consistent hashing is a technique that lets distributed systems distribute requests and data evenly across a cluster of servers, with minimal disruption when the cluster changes size.
At scale, it is a fundamental building block for reliable, horizontally scalable systems.
Why Do We Need Consistent Hashing?
Consistent hashing provides several important benefits:
- ✅ Minimizes data redistribution when servers join or leave
- ✅ Promotes even distribution of load across servers
- ✅ Protects systems from cascading failures during topology changes
- ✅ Reduces unnecessary cache misses at scale
Real-World Examples
Consistent hashing appears everywhere in modern distributed systems:
- Amazon DynamoDB uses it to partition and replicate data
- Apache Cassandra uses it to distribute data across nodes
- Akamai CDN uses it to route requests to edge servers
In reality, the exact implementation depends entirely on your system's access patterns and infrastructure requirements.
The Rehashing Problem
Before understanding consistent hashing, we need to understand the problem it solves.
There are n servers in your cluster. The server index for each key is computed using:
server_idx = hash(key) % n
hash = hash function applied to the key
n = size of the server pool
With a pool of 4 servers, fetching a key means computing:
hash(key) % 4
The output is the index of the server on which that key is stored.
This approach works fine when the server pool is fixed and data is distributed evenly.
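The modulo scheme above takes only a few lines. A minimal sketch (the choice of MD5 as the hash function is an assumption; any stable hash works):

```python
import hashlib

def server_index(key: str, n: int) -> int:
    """Map a key to one of n servers with simple modulo hashing."""
    # MD5 is used as a stable hash; Python's built-in hash() is
    # randomized per process, so it cannot be used across machines.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n

# With a pool of 4 servers, every key maps to a fixed index in [0, 3].
print(server_index("user:1001", 4))
```

The same key always yields the same index, so any node in the cluster can compute the owner of a key without coordination.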
Problem: Server Changes
When a server is added or removed, nearly all keys must be remapped.
Suppose a server goes offline and the pool shrinks from 4 servers to 3. The formula becomes hash(key) % 3, and the key-to-server mapping changes for almost every key.
Most keys are redistributed, not only those that were previously stored on the server that went offline. For a cache, this means a flood of simultaneous cache misses hitting the backing store.
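The scale of the problem is easy to measure: count how many keys change owners when the pool shrinks from 4 servers to 3 (again assuming MD5 as the hash function):

```python
import hashlib

def server_index(key: str, n: int) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"key{i}" for i in range(1000)]
moved = sum(1 for k in keys if server_index(k, 4) != server_index(k, 3))
# A key stays put only when hash % 4 == hash % 3, i.e. when
# hash % 12 is 0, 1, or 2 -- so roughly 3 out of every 4 keys move.
print(f"{moved} of {len(keys)} keys changed servers")
```

Only about a quarter of the keys survive the change; everything else must be rehashed and moved.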
Core Consistent Hashing Concepts
Most consistent hashing implementations are built on a few foundational ideas:
- Hash Space and Hash Ring
- Hash Servers
- Hash Keys
- Server Lookup
Let's walk through each one.
Hash Space and Hash Ring
Imagine a hash function whose output range runs from x0 to xn. This range is called the hash space.
For example, if our hash function is hash(key) % 100, the hash space runs from 0 to 99. In real systems, the space is far larger — SHA-1 runs from 0 to 2^160 - 1.
We then connect the two ends of the hash space to form a hash ring.
Hash Servers
We use our hashing function to map servers onto the ring using either the server name or IP address.
Note: Some implementations use a different hash function for servers than for keys, though the same function can serve both.
Hash Keys
Keys are hashed and placed onto the same ring.
Server Lookup
To determine which server a key belongs to, we move clockwise from the key's position on the ring until we hit the nearest server.
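The clockwise lookup can be implemented with a sorted list and binary search. A sketch under the same MD5 assumption as above (the class name `HashRing` is illustrative, not a reference implementation):

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    """Position on the ring; MD5 gives a large, evenly spread hash space."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers):
        # The ring is a sorted list of (position, server) pairs.
        self._ring = sorted((ring_hash(s), s) for s in servers)
        self._positions = [p for p, _ in self._ring]

    def lookup(self, key: str) -> str:
        # Move clockwise from the key's position to the nearest server;
        # wrapping past the end of the sorted list closes the ring.
        idx = bisect.bisect_right(self._positions, ring_hash(key))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing(["server-a", "server-b", "server-c"])
print(ring.lookup("user:1001"))
```

Lookups cost O(log n) in the number of points on the ring, which is why sorted arrays or balanced trees are the usual data structures here.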
Adding and Removing Servers
Add a Server
When a new server is added, only a subset of keys need to be redistributed — those whose clockwise path now reaches the new server first.
Suppose the new server lands on the ring between key0 and the server that previously owned it: only key0 is redistributed, and all other keys remain intact.
Remove a Server
When a server is removed, only the keys that were mapped to it are redistributed to the next server clockwise.
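Both properties can be checked empirically: add a server to a ring and count how many keys change owners. A self-contained sketch (node names and MD5 are assumptions):

```python
import bisect
import hashlib

def ring_hash(v: str) -> int:
    return int(hashlib.md5(v.encode()).hexdigest(), 16)

def build(servers):
    ring = sorted((ring_hash(s), s) for s in servers)
    return ring, [p for p, _ in ring]

def lookup(ring, positions, key):
    idx = bisect.bisect_right(positions, ring_hash(key))
    return ring[idx % len(ring)][1]

keys = [f"key{i}" for i in range(1000)]
before = build(["node-a", "node-b", "node-c"])
after = build(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if lookup(*before, k) != lookup(*after, k)]
# Every key that moved now belongs to the new server; nothing else shifted.
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

Contrast this with the modulo example earlier, where adding or removing a single server remapped the majority of keys.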
Two Issues in the Basic Approach
The basic consistent hashing approach introduces two problems:
1. Uneven partition sizes — When servers are added or removed, partitions can become imbalanced, leading to uneven load.
2. Non-uniform key distribution — Keys may cluster on certain servers, leaving others underutilized.
A technique called virtual nodes is used to solve both of these problems.
Virtual Nodes
Instead of mapping each server to a single point, we place multiple virtual nodes for each server, spread across the hash ring. Each server is now responsible for multiple partitions.
To find a key's server, we locate the nearest virtual node clockwise from the key's position.
As the number of virtual nodes increases, key distribution becomes more balanced.
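A common way to place virtual nodes is to hash derived labels like `"name#i"` for each server; that labeling convention, like MD5, is an assumption in this sketch:

```python
import bisect
import hashlib

def ring_hash(v: str) -> int:
    return int(hashlib.md5(v.encode()).hexdigest(), 16)

class VirtualNodeRing:
    def __init__(self, servers, vnodes=200):
        # Each physical server appears vnodes times on the ring under
        # derived labels ("name#0", "name#1", ...), spreading its load.
        self._ring = sorted(
            (ring_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._positions = [p for p, _ in self._ring]

    def lookup(self, key: str) -> str:
        idx = bisect.bisect_right(self._positions, ring_hash(key))
        return self._ring[idx % len(self._ring)][1]

ring = VirtualNodeRing(["node-a", "node-b", "node-c"])
counts = {}
for i in range(3000):
    owner = ring.lookup(f"key{i}")
    counts[owner] = counts.get(owner, 0) + 1
print(counts)  # counts converge toward even as vnodes grows
```

Raising `vnodes` tightens the balance at the cost of a larger ring and more metadata per server, which is the tuning trade-off mentioned later.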
Finding Affected Keys
When a server is added:
Move anticlockwise from the new server's position to the nearest existing server. All keys in that range are redistributed onto the new server.
When a server is removed:
The affected range starts at the removed server's position and moves anticlockwise until the next server is found. Keys in that range are redistributed to the next server clockwise from the removed one.
Algorithm Comparison
| Algorithm | Redistribution on Change | Handles Hotspots | Complexity |
|---|---|---|---|
| Simple Modulo Hashing | All keys remapped | ❌ No | Low |
| Basic Consistent Hashing | Subset of keys | ⚠️ Partial | Medium |
| Consistent Hashing + Virtual Nodes | Subset of keys | ✅ Yes | Medium-High |
Request Flow
- Client sends a request with a key
- Key is hashed onto the ring
- System moves clockwise to find the nearest server (or virtual node)
- Request is forwarded to that server
- On topology change, only affected keys are redistributed
Consistent Hashing in Distributed Systems
Scaling introduces new challenges even with consistent hashing.
Hotspot Problem
Even with virtual nodes, certain keys (e.g. celebrity data in social networks) can generate disproportionate traffic to one server.
Solutions:
- Further subdivide hotspot partitions
- Add dedicated servers for high-traffic keys
- Apply application-level caching in front of the ring
Replication
For fault tolerance, keys are typically replicated to the next N servers clockwise on the ring. This ensures data survives individual node failures without a full redistribution.
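Walking the ring for replicas must skip duplicate physical servers, since with virtual nodes the next few points clockwise may belong to the same machine. A sketch of that preference-list walk (node names and MD5 are assumptions):

```python
import bisect
import hashlib

def ring_hash(v: str) -> int:
    return int(hashlib.md5(v.encode()).hexdigest(), 16)

def replicas(ring, positions, key, n):
    """Collect the next n distinct servers clockwise from the key."""
    idx = bisect.bisect_right(positions, ring_hash(key))
    owners = []
    for step in range(len(ring)):
        server = ring[(idx + step) % len(ring)][1]
        if server not in owners:
            owners.append(server)
        if len(owners) == n:
            break
    return owners

servers = ["node-a", "node-b", "node-c", "node-d"]
ring = sorted((ring_hash(s), s) for s in servers)
positions = [p for p, _ in ring]
# The first entry is the primary owner; the rest hold the replicas.
print(replicas(ring, positions, "user:1001", n=3))
```

If the primary fails, the key's data is already on the next servers clockwise, so reads can fail over without any ring-wide redistribution.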
Monitoring and Observability
After deployment, monitoring consistent hashing behavior is critical.
Track:
- Load distribution across nodes
- Redistribution events on topology changes
- Virtual node count and balance metrics
- Replication lag across replicas
- Hotspot frequency and severity
Consistent hashing is not a "set and forget" mechanism — it requires continuous tuning of virtual node counts and replication factors as your cluster evolves.
Final Thoughts
Consistent hashing is more than just a smarter way to distribute keys. It is a core scalability mechanism that:
- stabilizes systems during cluster topology changes,
- ensures fair distribution of load across nodes,
- and enables systems to scale horizontally with confidence.
Choosing the right configuration of virtual nodes and replication strategy depends heavily on your traffic patterns, data size, and fault tolerance requirements.
Design it carefully — because at scale, consistent hashing becomes part of your system's scalability strategy.