Madhukar Vissapragada

Posted on Jun 16 • Edited on Jul 3

Consistent Hashing — A System Design Deep Dive

#systemdesign #interview #backend #showdev

Consistent Hashing — The Algorithm Behind Scalable Systems

Section 1: The Problem

We have customers from Amazon and Flipkart who want to wishlist products from these platforms. Assume that this wishlist feature doesn't exist natively on Amazon or Flipkart. We have built a service where users can save and manage their wishlist from any e-commerce platform — all in one place.

Our service is gaining popularity and experiencing massive scale. To handle this, we first tried vertical scaling — replacing our existing server with a more powerful one. But even that wasn't enough to meet our scale expectations.

So we decided to go with horizontal scaling — adding more servers of the same configuration to share the load.

Vertical scaling — Replacing the existing server with a more powerful server.

Horizontal scaling — Adding more servers of the same existing configuration.

Horizontal scaling solves two problems at once:

Too much traffic — a single server cannot handle all the incoming requests
Too much data — a single server cannot store all the wishlist data

Both of these force us to distribute across multiple servers. But now we have a new problem — we have multiple servers, so which server handles which request? And how do we know which server holds which user's data?

Section 2: What is Sharding?

Before we talk about routing, we need to understand how data is distributed across servers.

If we normalize a table into two separate tables — say users and wishlist — on the same server, we call it normalization.

If we do the same thing but place each table on a different server, we call it separation of concerns or microservices.

If we split the data across multiple tables where the schema remains the same but the data is divided based on a sharding key, and we do this on the same server, we call it horizontal partitioning.

If we do the same thing across different servers, we call it Data Sharding. Data sharding is always done using a sharding key.

Section 3: What makes a good routing algorithm?

Now that we have sharded the data across multiple servers, we need to understand how to route requests correctly.

If user A stores their wishlist data and we route them to Server 5, then the next time user A wants to view their wishlist, we must route that request to Server 5 again — because that is where their data lives. This is the core problem a good routing algorithm must solve.

Before we pick an algorithm, let's define what makes a routing algorithm good:

Fast — computationally efficient to determine which request goes to which server
Equal distribution — load must be distributed equally across all servers
Minimal data movement — when we add or remove servers, the amount of data that needs to move should be minimal
Deterministic — all load balancers must always be in sync without constant communication between them, and requests of the same type must always go to the same server

Section 4: Round Robin

Round Robin sends each request to the next server in sequence using a simple modulo operation.

servers = ["A", "B", "C", "D"]

def handle_request(request):
    key = request.user_id
    total_servers = len(servers)
    server_id = int(key) % total_servers
    server = servers[server_id]
    forward_request(request, server)

Initial distribution with 4 servers:

A → user 0, 4, 8 ...
B → user 1, 5, 9 ...
C → user 2, 6, 10 ...
D → user 3, 7, 11 ...

When server B crashes, N changes from 4 to 3:

A → user 0, 3, 6, 9 ...
C → user 1, 4, 7, 10 ...
D → user 2, 5, 8, 11 ...

Only users 1, 5, 9... should have moved since their server B crashed. But almost every user got remapped to a different server — causing massive unnecessary data movement.

	Round Robin
Fast	✓
Equal distribution	✓
Minimal data movement	✕
Deterministic	✓

Verdict — Round Robin fails on minimal data movement. It works only when the number of servers is fixed and never changes.

Section 5: Bucketing (Range Based)

Bucketing assigns a range of user IDs to each server rather than using modulo.

server_list = ["A", "B", "C", "D"]
total_users = 400

def handle_request(request):
    key = request.user_id
    total_servers = len(server_list)           # 4
    bucket_size = total_users // total_servers  # 400 / 4 = 100
    server_id = math.floor(key / bucket_size)
    server = server_list[server_id]
    forward_request(server, request)

Initial distribution with 4 servers and 400 users:

A → user 0   – 99
B → user 100 – 199
C → user 200 – 299
D → user 300 – 399

When server B crashes, the remaining 3 servers must cover all 400 users:

A → user 0   – 132
C → user 133 – 265
D → user 266 – 399

All buckets change — causing massive unnecessary data movement across every server, not just server B's users.

When new users sign up, we cannot accommodate them on existing servers because the bucket ranges are already decided. It is impossible to add new users without first buying more servers.

	Bucketing
Fast	✓
Equal distribution	✓
Minimal data movement	✕
Deterministic	✓

Verdict — Bucketing is even worse than Round Robin. Not only does it reshuffle all data when a server is added or removed, it cannot even accommodate new users without buying new servers first.

Section 6: Mapping Table

What if instead of a formula-based solution we simply track which user is on which server? The load balancer maintains a hashmap from user_id to server_id.

mapping = {
    "user_A": "server_1",
    "user_B": "server_3",
    "user_C": "server_2",
    ...
}

def handle_request(request):
    server = mapping[request.user_id]
    forward_request(server, request)

When a server crashes, only the users on that server are reassigned to a random available server. No other users are affected.

When a new server is added, we take the total user count, pick a proportional number of random users, and move them to the new server to balance the load.

This gives us minimal data movement — only the affected users move, everyone else stays put.

The fatal flaw — all load balancers must always have the same mapping table. Keeping this table in sync across all load balancers at low latency is practically impossible. The CAP theorem tells us we cannot achieve perfect consistency with low latency at the same time.

	Mapping Table
Fast	✓
Equal distribution	✓
Minimal data movement	✓
Deterministic	✕

Verdict — Mapping Table finally solves minimal data movement but fails on the hardest problem — keeping all load balancers in sync at low latency is impossible.

Section 7: Consistent Hashing

Consistent Hashing is a hash based technique to route requests. It satisfies all four criteria — making it the algorithm of choice for distributed systems.

	Consistent Hashing
Fast	✓
Equal distribution	✓
Minimal data movement	✓
Deterministic	✓

How it works

A hash function is a digest function — it takes anything as input and returns an output within a fixed range. Common hash functions include MD5, SHA256, SHA512, and Murmur3.

In Consistent Hashing, we use a logical hash ring — the output space of the hash function bent into a circle.

We use k hash functions to place each server at k virtual spots on the ring. A good value of k is 32 or 64.
We use 1 hash function to place each user at one spot on the ring.
When a request arrives, we find the first server clockwise from the user's position on the ring using binary search.
That server handles the request and stores the user's data. This means:
All load balancers run the same hash functions and see the same ring — no sync needed
When a server is added or removed, only the users nearest to it on the ring are remapped — minimal data movement
The same user always hashes to the same spot — deterministic ### Implementation

import mmh3
import bisect

def custom_hash(data: str, i: int) -> int:
    # Murmur3 hash with seed i, output range 0..9
    return mmh3.hash(data, seed=i, signed=False) % 10

def build_ring(servers, k):
    ring = []
    for server in servers:
        for index in range(1, k + 1):
            h = custom_hash(str(server), index)
            ring.append((h, server))
    ring = sorted(ring)
    return ring

def route(user):
    user_hash = custom_hash(str(user), 0)
    # Find the first server clockwise using binary search
    index = bisect.bisect_right(ring, (user_hash, user))
    index = index % len(ring)  # wrap around the ring
    server_hash, server = ring[index]
    print(f"{user} hashed to {user_hash} → routed to {server} at position {server_hash}")

servers = ['10.11.1.13', '10.11.2.17', '10.11.13.167', '10.11.12.255', '10.11.1.0']
ring = build_ring(servers, 32)

route('Vishal')
route('Sanjana')

Notice that no matter how many times we call route('Vishal'), he always lands on the same server — that is Consistent Hashing being deterministic.

Adding a new server

When a new server K is added to the ring between user A and server H:

Before: User A → Server H
New server K added between User A and Server H
After:  User A → Server K

User A's existing data moves from Server H → Server K. All other users are completely unaffected. This is minimal data movement — only the users who fall between the new server and its predecessor on the ring are impacted.

When a server crashes

When a server crashes, all the users whose data lived on that server are automatically remapped to the next server clockwise on the ring.

Before: User A → Server H
Server H crashes
After:  User A → Server K  (next server clockwise)

User A's data moves from Server H → Server K
All other users → untouched

In both cases — adding or removing a server — only a small slice of users near that server on the ring are affected. Everyone else continues routing to the same server as before.

This is what makes Consistent Hashing fundamentally better than Round Robin and Bucketing — where any change in the number of servers causes a full reshuffle of almost all users.

Conclusion

Every routing algorithm we evaluated satisfied some criteria but failed on others.

Round Robin uses a simple modulo operation — user_id % N — which means any change in N causes almost all keys to remap to different servers, making it unusable in dynamic environments. Bucketing improves readability but suffers the same remapping problem and additionally cannot accommodate new users without pre-purchasing servers. Mapping Table achieves true minimal data movement but introduces a distributed consistency problem — keeping the hashmap in sync across all load balancers at low latency violates the guarantees of the CAP theorem.

Consistent Hashing resolves all four criteria in a single algorithm. By placing both servers and users on a logical hash ring using k independent hash functions, it guarantees:

O(log Nk) routing time via binary search on a sorted ring
Near-perfect load distribution as k approaches 32 or 64
1/N data movement when a server is added or removed — only the affected segment of the ring remaps
Zero inter-LB communication — since all load balancers share the same hash functions and server list via heartbeat, they independently compute identical routing decisions This is why Consistent Hashing is the foundation of distributed systems at scale — Cassandra, DynamoDB, Redis Cluster, and CDN routing systems like Cloudflare all rely on it.

DEV Community