Consistent Hashing in Distributed Systems
When systems scale to millions of users, storing all data on a single database becomes impractical.
Problems appear quickly:
- Storage limits
- Performance bottlenecks
- Single point of failure
- Expensive vertical scaling
The solution is horizontal scaling — distributing data across multiple servers.
But this introduces a new question:
How do we decide which server should store a particular piece of data?
Traditional Hashing
A simple approach is:
node = hash(key) % N
Where:
- key → userId, productId, etc.
- N → number of servers
This distributes keys fairly well across servers.
However, it has a major flaw.
If the number of servers changes, most keys get reassigned.
At large scale, this can trigger massive cache misses and overload databases.
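A quick experiment makes the flaw concrete. The sketch below (key names and the MD5-based hash are illustrative choices, not from the original) counts how many keys change servers when the cluster grows from 4 to 5 nodes:

```python
import hashlib

def node_for(key: str, n: int) -> int:
    # Use a stable hash; Python's built-in hash() is salted per process.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"user:{i}" for i in range(10_000)]
before = {k: node_for(k, 4) for k in keys}  # 4 servers
after = {k: node_for(k, 5) for k in keys}   # add a 5th server

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys moved")
```

With uniform hashing, a key stays put only when `h % 4 == h % 5`, so roughly 80% of keys are reassigned by adding just one server.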
Consistent Hashing
Consistent hashing solves this by mapping both servers and keys onto a circular hash ring.
A key is assigned to:
The first server encountered while moving clockwise on the ring.
The big advantage:
- Adding or removing a server only affects a small portion of keys
- Most of the system remains untouched
This makes scaling far more stable.
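The ring can be sketched with a sorted list of server positions and a binary search for the first server clockwise of a key. This is a minimal illustration (class and function names are my own, not a standard API):

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Map any string to a position on the ring.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Place each server once on the ring, sorted by position.
        pairs = sorted((_hash(n), n) for n in nodes)
        self._hashes = [h for h, _ in pairs]
        self._nodes = [n for _, n in pairs]

    def get(self, key: str) -> str:
        # First server clockwise from the key; wrap around at the end.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._nodes)
        return self._nodes[idx]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.get("user:42"))
```

Removing a server from the list only remaps the keys that landed on it; every other key still finds the same clockwise successor.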
Virtual Nodes
Basic consistent hashing can create uneven load.
Systems like Amazon Dynamo solve this using virtual nodes.
Instead of placing each server once on the ring, the system places it multiple times, creating better load distribution and smoother rebalancing.
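A rough sketch of that idea: hash each server under several derived labels (the `"{node}#{i}"` naming scheme here is an assumption for illustration, not Dynamo's actual format), so each physical server owns many small arcs of the ring instead of one large one:

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=100):
    # Place each server `vnodes` times under distinct labels.
    ring = []
    for node in nodes:
        for i in range(vnodes):
            ring.append((_hash(f"{node}#{i}"), node))
    ring.sort()
    return ring

def lookup(ring, key: str) -> str:
    hashes = [h for h, _ in ring]
    idx = bisect.bisect(hashes, _hash(key)) % len(ring)
    return ring[idx][1]

ring = build_ring(["node-a", "node-b", "node-c"])
print(lookup(ring, "user:42"))
```

More virtual nodes per server means more, smaller arcs, which evens out load and spreads the rebalancing work across all remaining servers when one leaves.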
Where It’s Used
Consistent hashing powers many large-scale systems:
- Distributed databases (Cassandra, DynamoDB)
- Distributed caches (Memcached, Redis)
- CDNs and load distribution systems
It’s one of the key ideas that makes modern distributed infrastructure scalable.