Consistent Hashing — System Design Deep Dive
Consistent hashing is a technique that lets distributed systems distribute requests and data evenly across a cluster of servers, with minimal disruption when the cluster changes size.
At scale, it is a fundamental building block for reliable, horizontally scalable systems.
Why Do We Need Consistent Hashing?
Consistent hashing provides several important benefits:
- ✅ Minimizes data redistribution when servers join or leave
- ✅ Promotes even distribution of load across servers
- ✅ Protects systems from cascading failures during topology changes
- ✅ Reduces unnecessary cache misses at scale
Real-World Examples
Consistent hashing appears everywhere in modern distributed systems:
- Amazon DynamoDB uses it to partition and replicate data
- Apache Cassandra uses it to distribute data across nodes
- Akamai CDN uses it to route requests to edge servers
In reality, the exact implementation depends entirely on your system's access patterns and infrastructure requirements.
The Rehashing Problem
Before understanding consistent hashing, we need to understand the problem it solves.
There are n servers in your cluster. The server index for each key is computed using:
server_idx = hash(key) % n
hash = hash function applied to the key
n = size of the server pool
With a pool of 4 servers, fetching a key means computing:
hash(key) % 4
The output is the index of the server on which that key is stored.
This approach works fine when the server pool is fixed and data is distributed evenly.
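The modulo scheme above takes only a few lines. A minimal sketch (the choice of MD5 as the hash function is an assumption; any stable hash works):

```python
import hashlib

def server_index(key: str, n: int) -> int:
    """Map a key to one of n servers with simple modulo hashing."""
    # MD5 is used as a stable hash; Python's built-in hash() is
    # randomized per process, so it cannot be used across machines.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n

# With a pool of 4 servers, every key maps to a fixed index in [0, 3].
print(server_index("user:1001", 4))
```

The same key always yields the same index, so any node in the cluster can compute the owner of a key without coordination.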
Problem: Server Changes
When a server is added or removed, nearly all keys must be remapped.
Suppose a server goes offline and the pool shrinks from 4 servers to 3. The formula becomes hash(key) % 3, and the key-to-server mapping changes for almost every key.
Most keys are redistributed, not only those that were previously stored on the server that went offline. For a cache, this means a flood of simultaneous cache misses hitting the backing store.
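The scale of the problem is easy to measure: count how many keys change owners when the pool shrinks from 4 servers to 3 (again assuming MD5 as the hash function):

```python
import hashlib

def server_index(key: str, n: int) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"key{i}" for i in range(1000)]
moved = sum(1 for k in keys if server_index(k, 4) != server_index(k, 3))
# A key stays put only when hash % 4 == hash % 3, i.e. when
# hash % 12 is 0, 1, or 2 -- so roughly 3 out of every 4 keys move.
print(f"{moved} of {len(keys)} keys changed servers")
```

Only about a quarter of the keys survive the change; everything else must be rehashed and moved.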
Core Consistent Hashing Concepts
Most consistent hashing implementations are built on a few foundational ideas:
- Hash Space and Hash Ring
- Hash Servers
- Hash Keys
- Server Lookup
Let's walk through each one.
Hash Space and Hash Ring
Imagine a hash function whose output range runs from x0 to xn. This range is called the hash space.
For example, if our hash function is hash(key) % 100, the hash space runs from 0 to 99. In real systems, the space is far larger — SHA-1 runs from 0 to 2^160 - 1.
We then connect the two ends of the hash space to form a hash ring.
Hash Servers
We use our hashing function to map servers onto the ring using either the server name or IP address.
Note: Some implementations use a different hash function for servers than for keys, though the same function can serve both.
Hash Keys
Keys are hashed and placed onto the same ring.
Server Lookup
To determine which server a key belongs to, we move clockwise from the key's position on the ring until we hit the nearest server.
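The clockwise lookup can be implemented with a sorted list and binary search. A sketch under the same MD5 assumption as above (the class name `HashRing` is illustrative, not a reference implementation):

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    """Position on the ring; MD5 gives a large, evenly spread hash space."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers):
        # The ring is a sorted list of (position, server) pairs.
        self._ring = sorted((ring_hash(s), s) for s in servers)
        self._positions = [p for p, _ in self._ring]

    def lookup(self, key: str) -> str:
        # Move clockwise from the key's position to the nearest server;
        # wrapping past the end of the sorted list closes the ring.
        idx = bisect.bisect_right(self._positions, ring_hash(key))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing(["server-a", "server-b", "server-c"])
print(ring.lookup("user:1001"))
```

Lookups cost O(log n) in the number of points on the ring, which is why sorted arrays or balanced trees are the usual data structures here.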
Adding and Removing Servers
Add a Server
When a new server is added, only a subset of keys need to be redistributed — those whose clockwise path now reaches the new server first.
Suppose the new server lands on the ring between key0 and the server that previously owned it: only key0 is redistributed, and all other keys remain intact.
Remove a Server
When a server is removed, only the keys that were mapped to it are redistributed to the next server clockwise.
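Both properties can be checked empirically: add a server to a ring and count how many keys change owners. A self-contained sketch (node names and MD5 are assumptions):

```python
import bisect
import hashlib

def ring_hash(v: str) -> int:
    return int(hashlib.md5(v.encode()).hexdigest(), 16)

def build(servers):
    ring = sorted((ring_hash(s), s) for s in servers)
    return ring, [p for p, _ in ring]

def lookup(ring, positions, key):
    idx = bisect.bisect_right(positions, ring_hash(key))
    return ring[idx % len(ring)][1]

keys = [f"key{i}" for i in range(1000)]
before = build(["node-a", "node-b", "node-c"])
after = build(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if lookup(*before, k) != lookup(*after, k)]
# Every key that moved now belongs to the new server; nothing else shifted.
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

Contrast this with the modulo example earlier, where adding or removing a single server remapped the majority of keys.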
Two Issues in the Basic Approach
The basic consistent hashing approach introduces two problems:
1. Uneven partition sizes — When servers are added or removed, partitions can become imbalanced, leading to uneven load.
2. Non-uniform key distribution — Keys may cluster on certain servers, leaving others underutilized.
A technique called virtual nodes is used to solve both of these problems.
Virtual Nodes
Instead of mapping each server to a single point, we place multiple virtual nodes for each server, spread across the hash ring. Each server is now responsible for multiple partitions.
To find a key's server, we locate the nearest virtual node clockwise from the key's position.
As the number of virtual nodes increases, key distribution becomes more balanced.
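A common way to place virtual nodes is to hash derived labels like `"name#i"` for each server; that labeling convention, like MD5, is an assumption in this sketch:

```python
import bisect
import hashlib

def ring_hash(v: str) -> int:
    return int(hashlib.md5(v.encode()).hexdigest(), 16)

class VirtualNodeRing:
    def __init__(self, servers, vnodes=200):
        # Each physical server appears vnodes times on the ring under
        # derived labels ("name#0", "name#1", ...), spreading its load.
        self._ring = sorted(
            (ring_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._positions = [p for p, _ in self._ring]

    def lookup(self, key: str) -> str:
        idx = bisect.bisect_right(self._positions, ring_hash(key))
        return self._ring[idx % len(self._ring)][1]

ring = VirtualNodeRing(["node-a", "node-b", "node-c"])
counts = {}
for i in range(3000):
    owner = ring.lookup(f"key{i}")
    counts[owner] = counts.get(owner, 0) + 1
print(counts)  # counts converge toward even as vnodes grows
```

Raising `vnodes` tightens the balance at the cost of a larger ring and more metadata per server, which is the tuning trade-off mentioned later.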
Finding Affected Keys
When a server is added:
Move anticlockwise from the new server's position to the nearest existing server. All keys in that range are redistributed onto the new server.
When a server is removed:
The affected range starts at the removed server's position and moves anticlockwise until the next server is found. Keys in that range are redistributed to the next server clockwise from the removed one.
Algorithm Comparison
| Algorithm | Redistribution on Change | Handles Hotspots | Complexity |
|---|---|---|---|
| Simple Modulo Hashing | All keys remapped | ❌ No | Low |
| Basic Consistent Hashing | Subset of keys | ⚠️ Partial | Medium |
| Consistent Hashing + Virtual Nodes | Subset of keys | ✅ Yes | Medium-High |
Request Flow
- Client sends a request with a key
- Key is hashed onto the ring
- System moves clockwise to find the nearest server (or virtual node)
- Request is forwarded to that server
- On topology change, only affected keys are redistributed
Consistent Hashing in Distributed Systems
Scaling introduces new challenges even with consistent hashing.
Hotspot Problem
Even with virtual nodes, certain keys (e.g. celebrity data in social networks) can generate disproportionate traffic to one server.
Solutions:
- Further subdivide hotspot partitions
- Add dedicated servers for high-traffic keys
- Apply application-level caching in front of the ring
Replication
For fault tolerance, keys are typically replicated to the next N servers clockwise on the ring. This ensures data survives individual node failures without a full redistribution.
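Walking the ring for replicas must skip duplicate physical servers, since with virtual nodes the next few points clockwise may belong to the same machine. A sketch of that preference-list walk (node names and MD5 are assumptions):

```python
import bisect
import hashlib

def ring_hash(v: str) -> int:
    return int(hashlib.md5(v.encode()).hexdigest(), 16)

def replicas(ring, positions, key, n):
    """Collect the next n distinct servers clockwise from the key."""
    idx = bisect.bisect_right(positions, ring_hash(key))
    owners = []
    for step in range(len(ring)):
        server = ring[(idx + step) % len(ring)][1]
        if server not in owners:
            owners.append(server)
        if len(owners) == n:
            break
    return owners

servers = ["node-a", "node-b", "node-c", "node-d"]
ring = sorted((ring_hash(s), s) for s in servers)
positions = [p for p, _ in ring]
# The first entry is the primary owner; the rest hold the replicas.
print(replicas(ring, positions, "user:1001", n=3))
```

If the primary fails, the key's data is already on the next servers clockwise, so reads can fail over without any ring-wide redistribution.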
Monitoring and Observability
After deployment, monitoring consistent hashing behavior is critical.
Track:
- Load distribution across nodes
- Redistribution events on topology changes
- Virtual node count and balance metrics
- Replication lag across replicas
- Hotspot frequency and severity
Consistent hashing is not a "set and forget" mechanism — it requires continuous tuning of virtual node counts and replication factors as your cluster evolves.
Final Thoughts
Consistent hashing is more than just a smarter way to distribute keys. It is a core scalability mechanism that:
- stabilizes systems during cluster topology changes,
- ensures fair distribution of load across nodes,
- and enables systems to scale horizontally with confidence.
Choosing the right configuration of virtual nodes and replication strategy depends heavily on your traffic patterns, data size, and fault tolerance requirements.
Design it carefully — because at scale, consistent hashing becomes part of your system's scalability strategy.