I Built a Consistent Hashing Ring in Pure Python and Finally Understood How Cassandra Distributes Data
I've been using Cassandra and Redis Cluster for years. I knew consistent hashing was "how they work." But I never truly got it until I built one myself from scratch, in pure Python, with zero dependencies.
This post is about what I learned doing that.
The Problem Consistent Hashing Solves
Imagine you have 3 servers and 1 million keys. The naive approach: server = hash(key) % 3.
It works great until you add or remove a server. Change 3 to 4, and almost every key remaps to a different server. In a caching layer, that means near 100% cache miss. In a database, it means massive data movement.
That's the problem consistent hashing solves. When you add or remove a node, only a fraction of keys move. Specifically, 1/n of the keys, where n is the number of nodes.
Building the Ring
The core idea: place both nodes and keys on a circular number line from 0 to 2^32 (or any large integer). To find which node owns a key, walk clockwise until you hit a node.
Here's the minimal version:
import hashlib
import bisect
class ConsistentHashRing:
def __init__(self, replicas=150):
self.replicas = replicas
self.ring = {} # hash -> node name
self.sorted_keys = [] # sorted hash positions
def _hash(self, key: str) -> int:
return int(hashlib.md5(key.encode()).hexdigest(), 16)
def add_node(self, node: str):
for i in range(self.replicas):
virtual_key = f"{node}:vnode:{i}"
h = self._hash(virtual_key)
self.ring[h] = node
bisect.insort(self.sorted_keys, h)
def remove_node(self, node: str):
for i in range(self.replicas):
virtual_key = f"{node}:vnode:{i}"
h = self._hash(virtual_key)
del self.ring[h]
idx = bisect.bisect_left(self.sorted_keys, h)
self.sorted_keys.pop(idx)
def get_node(self, key: str) -> str:
if not self.ring:
raise ValueError("Ring is empty")
h = self._hash(key)
idx = bisect.bisect_right(self.sorted_keys, h)
if idx == len(self.sorted_keys):
idx = 0
return self.ring[self.sorted_keys[idx]]
That's it. The entire routing logic fits in about 30 lines.
The bisect module does the heavy lifting. bisect_right finds the position where the key's hash would be inserted to keep the list sorted. That gives us the "next node clockwise." If we fall off the end, we wrap around to index 0.
Why Virtual Nodes Matter
Without virtual nodes, three physical servers sit at three points on the ring. The space between them is uneven. One server might own 60% of the ring, another 15%.
Virtual nodes fix this. Each physical node gets replicas positions on the ring. 150 replicas per node is common in production systems. The positions scatter across the ring uniformly (not perfectly, but well enough).
Let's verify the distribution:
ring = ConsistentHashRing(replicas=150)
ring.add_node("node-A")
ring.add_node("node-B")
ring.add_node("node-C")
counts = {"node-A": 0, "node-B": 0, "node-C": 0}
for i in range(100_000):
key = f"user:{i}"
owner = ring.get_node(key)
counts[owner] += 1
for node, count in counts.items():
print(f"{node}: {count} keys ({count/1000:.1f}%)")
Output:
node-A: 33241 keys (33.2%)
node-B: 33489 keys (33.5%)
node-C: 33270 keys (33.3%)
Roughly one third each. That's the goal.
Testing Node Addition
Now let's see what happens when we add a fourth node:
# Record original placement for 10,000 keys
original = {}
for i in range(10_000):
key = f"session:{i}"
original[key] = ring.get_node(key)
ring.add_node("node-D")
moved = sum(1 for k, v in original.items() if ring.get_node(k) != v)
print(f"Keys moved: {moved} / {len(original)} = {moved/len(original)*100:.1f}%")
Output:
Keys moved: 2498 / 10000 = 25.0%
Adding 1 node to a 3-node ring moved 25% of keys. The ideal is 25% (1 in 4 keys belongs to the new node). That's consistent hashing working correctly.
Compare that to modulo hashing: adding one server remaps roughly 75% of keys.
Replication: Storing Keys on Multiple Nodes
Production systems store each key on multiple nodes for fault tolerance. That's easy to add:
def get_nodes(self, key: str, count: int) -> list[str]:
if not self.ring:
raise ValueError("Ring is empty")
h = self._hash(key)
idx = bisect.bisect_right(self.sorted_keys, h)
nodes = []
seen = set()
checked = 0
while len(nodes) < count and checked < len(self.sorted_keys):
real_idx = (idx + checked) % len(self.sorted_keys)
node = self.ring[self.sorted_keys[real_idx]]
if node not in seen:
nodes.append(node)
seen.add(node)
checked += 1
return nodes
Walk clockwise from the key's position, collect unique nodes until you have the replication factor. Cassandra uses replication factor 3 by default. If a node is down, your application reads from the next one clockwise.
Weighted Nodes
Not all servers are equal. A 32-core machine should own more of the ring than a 4-core box. Handle this by scaling replicas:
def add_node(self, node: str, weight: float = 1.0):
count = int(self.replicas * weight)
for i in range(count):
virtual_key = f"{node}:vnode:{i}"
h = self._hash(virtual_key)
self.ring[h] = node
bisect.insort(self.sorted_keys, h)
A node with weight=2.0 gets twice as many virtual nodes and owns roughly twice the keyspace.
What I Got Wrong at First
My first version had a bug: I used bisect_left instead of bisect_right. With bisect_left, if the key's hash exactly matches a node position (rare but possible), the key routes to the node at that position. With bisect_right, it routes to the next node clockwise. The difference is subtle, but bisect_right is the correct semantic for "walk clockwise until you hit a node."
I also initially forgot to handle the wraparound case. If a key hashes to a value larger than all node positions, bisect_right returns len(sorted_keys). Wrap to 0. Easy fix, but easy to miss.
Where This Shows Up in the Real World
Cassandra uses consistent hashing with virtual nodes (called vnodes). Their default was 256 vnodes per node until Cassandra 4, which moved to a more sophisticated token allocation strategy. But the principle is the same.
Redis Cluster uses a related approach called hash slots. Instead of a continuous ring, it divides the keyspace into 16,384 slots and assigns ranges to nodes. Closer to consistent hashing than to modulo, but not identical.
Amazon DynamoDB's original paper (2007) is where most people first encountered consistent hashing at scale. Reading it after building this made the paper click in a way it never had before.
The Toy You Should Build
If you depend on a distributed data store, build the ring. It takes two hours. You'll understand:
- Why rebalancing is cheap when you add nodes
- Why virtual nodes exist and what count to use
- Why Cassandra recommends specific replication factors for multi-datacenter setups
- Why hash slot approaches exist as an alternative
Build the toy version of the thing you trust with production data.
What's your replication factor in production? 2, 3, or something else, and why?
Top comments (0)