When a cache node joins or leaves a cluster, naive modulo hashing remaps almost every key, but consistent hashing keeps the data storm localized.
What We're Building
We are building a minimal consistent hash ring implementation in Python. The scope covers the hash function, virtual node mapping, key routing logic, and rebalancing logic for node churn. We avoid heavy frameworks to show the underlying mechanics that systems like Amazon's Dynamo, Cassandra, and memcached's ketama clients rely on. (Redis Cluster takes a related but distinct approach, partitioning keys into 16,384 fixed hash slots rather than an open-ended ring.) By stripping away the abstractions of a full database library, we expose the core algorithm that enables horizontal scaling with minimal data movement, focusing purely on the deterministic placement of data.
Step 1 — Virtualizing Physical Nodes
A single physical node can host multiple virtual slots on the ring to ensure even distribution across the hash space. This is the first major divergence from naive hashing. In a standard modulo setup, if a node fails, nearly every key in the cluster suddenly maps to a different node. By assigning a physical server such as server-1 to, say, 128 virtual nodes, we spread its ownership across many small arcs of the hash space instead of one large one.
import hashlib
from collections import namedtuple

VirtualNode = namedtuple("VirtualNode", ["node", "index", "weight"])

def stable_hash(label):
    # Python's built-in hash() is randomized per process (PYTHONHASHSEED),
    # so we derive a stable 64-bit position from MD5 instead.
    return int(hashlib.md5(label.encode()).hexdigest(), 16) & ((1 << 64) - 1)

def create_virtual_nodes(physical_node, virtual_count=128):
    """
    Expands one physical node into N virtual nodes.
    """
    nodes = []
    for i in range(virtual_count):
        nodes.append(VirtualNode(
            node=physical_node,
            index=stable_hash(f"{physical_node}:{i}"),
            weight=1,
        ))
    return nodes
Using multiple virtual nodes per physical instance smooths out hash bucket sizes and reduces skew. This prevents a scenario where one node becomes a bottleneck for all the "warm" keys while another sits idle. The index is calculated by hashing the physical node name joined with a virtual ID.
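To see the smoothing effect in numbers, the following sketch measures what fraction of the hash space each physical server owns. It assumes the MD5-based 64-bit positions used in this article; the `arc_share` helper is illustrative, not part of the ring class:

```python
import hashlib

def position(label):
    # Stable 64-bit ring position derived from MD5 (matches the article's scheme)
    return int(hashlib.md5(label.encode()).hexdigest(), 16) & ((1 << 64) - 1)

def arc_share(servers, virtual_count=128):
    # Place every virtual node on the ring, then sum the arc each physical
    # server owns (the gap counter-clockwise back to the previous node).
    ring = sorted(
        (position(f"{s}:{i}"), s)
        for s in servers
        for i in range(virtual_count)
    )
    total = 1 << 64
    share = {s: 0 for s in servers}
    for idx, (pos, server) in enumerate(ring):
        prev_pos = ring[idx - 1][0]
        share[server] += (pos - prev_pos) % total  # modulo handles the wrap at idx == 0
    return {s: share[s] / total for s in servers}

shares = arc_share(["server-1", "server-2", "server-3"])
# With 128 virtual nodes each, every share clusters near 1/3
```

Dropping `virtual_count` to 1 or 2 makes the shares wildly uneven, which is exactly the skew virtual nodes exist to fix.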
Step 2 — Mapping Keys to the Ring
Keys must be hashed to 64-bit integers before mapping to a virtual node. This step determines which bucket a specific data item belongs to. The hash function (MD5 here; MurmurHash3 or xxHash in production) returns a large integer, and we use this integer to locate the key's position on the circle.
Hashing into a wide integer space gives every key a well-spread numeric position regardless of how the raw string keys are distributed. Note that, unlike naive sharding, we never take the hash modulo the node count; we truncate it to 64 bits and compare the result against the sorted list of virtual node positions.
def hash_key(key):
    # MD5 truncated to 64 bits; a production ring would typically use a
    # faster non-cryptographic hash such as MurmurHash3 or xxHash.
    import hashlib
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h & ((1 << 64) - 1)
def find_owner(virtual_nodes, key_hash):
    """
    Finds the virtual node immediately clockwise from the key hash.
    """
    nodes = sorted(virtual_nodes, key=lambda v: v.index)
    if not nodes:
        return None
    # Find the first node where node.index >= key_hash.
    # If no such node exists, we wrap around to the first node.
    for node in nodes:
        if node.index >= key_hash:
            return node
    return nodes[0]
This scan returns the first virtual node at or past the key's position; if the key hashes beyond every node, ownership wraps around to the first node on the ring. We keep the list sorted so a production implementation can swap the linear scan for an O(log N) binary search, though a linear scan works fine for small rings.
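The binary-search variant can be sketched with the standard library's `bisect` module; `find_owner_bisect` is a hypothetical name for this illustration:

```python
import bisect
from collections import namedtuple

VirtualNode = namedtuple("VirtualNode", ["node", "index", "weight"])

def find_owner_bisect(sorted_nodes, key_hash):
    # sorted_nodes must already be sorted by index.
    positions = [v.index for v in sorted_nodes]
    i = bisect.bisect_left(positions, key_hash)  # first position >= key_hash
    if i == len(positions):
        i = 0  # wrap around past the largest position
    return sorted_nodes[i]

ring = [VirtualNode("a", 10, 1), VirtualNode("b", 50, 1), VirtualNode("c", 90, 1)]
# A key hashed to 60 lands on the node at 90; 95 wraps around to the node at 10.
```

In a real ring you would cache the `positions` list alongside the nodes instead of rebuilding it on every lookup.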
Step 3 — Clockwise Routing Logic
We visualize the ring as a circle. When a key is added, we find its position. When a node is added, we find the keys that fall between the new node and the previous one. These keys are moved to the new node. When a node is removed, the keys immediately following it in the clockwise direction are handed over to the next surviving node.
The ring structure inherently handles the "wrap-around" case where a key sits on the far right of the hash space and the only candidate node sits on the far left.
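The arc handed to a joining node can be computed directly. In this sketch the helper name `keys_moved_on_join` and the small integer positions are illustrative:

```python
def keys_moved_on_join(old_positions, new_position, key_hashes):
    """Return the key hashes that move to a node inserted at new_position.

    A key moves iff it lies on the arc (predecessor, new_position], i.e.
    the new node is now the first node clockwise from the key.
    """
    ring = sorted(old_positions + [new_position])
    i = ring.index(new_position)
    pred = ring[i - 1]  # wraps to the largest position when i == 0
    moved = []
    for k in key_hashes:
        if pred < new_position:
            in_arc = pred < k <= new_position
        else:  # the arc crosses the zero point of the ring
            in_arc = k > pred or k <= new_position
        if in_arc:
            moved.append(k)
    return moved

# Nodes at 100, 200, 300; a node joins at 250:
# only keys hashing into (200, 250] change owners.
```

Every key outside that single arc keeps its owner, which is the whole point of the scheme.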
class ConsistentHashRing:
    def __init__(self, key_factory=hash_key, virtual_count=128):
        self.key_factory = key_factory
        self.virtual_count = virtual_count
        self.ring = []

    def add_node(self, physical_node):
        nodes = create_virtual_nodes(physical_node, virtual_count=self.virtual_count)
        self.ring.extend(nodes)
        self.ring.sort(key=lambda v: v.index)
        self._rebalance()

    def remove_node(self, physical_node):
        # Remove virtual nodes belonging to this physical node;
        # the survivors remain sorted, so no re-sort is needed.
        self.ring = [v for v in self.ring if v.node != physical_node]
        self._rebalance()

    def get_node(self, key):
        # Route a key to its physical owner using find_owner from Step 2.
        owner = find_owner(self.ring, self.key_factory(key))
        return owner.node if owner else None
Step 4 — Handling Node Churn Gracefully
When a node is removed, only its neighbors take over its keys. This mechanism guarantees that we only reassign a fraction of keys rather than the entire dataset. In a real-world scenario, this happens asynchronously. The rebalance process iterates through the keys currently owned by the removed node. Each key is re-registered to the next available node clockwise.
This is where the concept of partial rebalancing comes in. We don't move every key in the world. We only move the keys that were physically adjacent to the failed node in the hash space.
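This claim is easy to check empirically. The sketch below removes one of three servers and counts how many of 1,000 keys change owners; it assumes the MD5-based positions and clockwise rule from Step 2, and `build_ring` / `owner` are ad-hoc helpers for the experiment:

```python
import hashlib

def pos(label):
    return int(hashlib.md5(label.encode()).hexdigest(), 16) & ((1 << 64) - 1)

def build_ring(servers, vnodes=64):
    # Sorted list of (position, physical_server) pairs
    return sorted((pos(f"{s}:{i}"), s) for s in servers for i in range(vnodes))

def owner(ring, key_hash):
    # First virtual node at or past the key's position, wrapping around
    for p, s in ring:
        if p >= key_hash:
            return s
    return ring[0][1]

servers = ["server-1", "server-2", "server-3"]
before = build_ring(servers)
after = build_ring(["server-1", "server-3"])  # server-2 leaves

keys = [pos(f"user:{n}") for n in range(1000)]
moved = [k for k in keys if owner(before, k) != owner(after, k)]
# Roughly a third of the keys move, and every one of them
# previously belonged to server-2.
```

With naive modulo hashing, the same experiment would move close to two thirds of the keys.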
    def _rebalance(self):
        """
        Simulates moving keys owned by removed nodes to the next available.
        In a real system, we would iterate over the keys owned by each node.
        """
        # Placeholder for key iteration logic
        pass
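If the ring also tracked stored data, the rebalance could work roughly like this sketch. The standalone `rebalance` function, its `data` dict, and the returned transfer tuples are all illustrative assumptions, since the article's class stores no keys:

```python
def rebalance(ring_positions, data):
    """Reassign every stored key to the owner implied by the current ring.

    ring_positions: sorted list of (position, physical_node) pairs
    data: dict mapping key_hash -> currently assigned physical_node
    Returns a list of (key_hash, old_node, new_node) transfers.
    """
    transfers = []
    for key_hash, old_node in data.items():
        new_node = next(
            (node for p, node in ring_positions if p >= key_hash),
            ring_positions[0][1],  # wrap around to the first node
        )
        if new_node != old_node:
            transfers.append((key_hash, old_node, new_node))
            data[key_hash] = new_node
    return transfers
```

This brute-force sweep visits every key; a production system would instead walk only the arcs adjacent to the nodes that changed.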
Key Takeaways
- Virtual Nodes allow us to treat a single server as multiple points on the ring.
- Hash Uniformity emerges from scattering many virtual nodes around the circle, keeping spacing approximately even.
- Clockwise Search is the fundamental routing algorithm that finds the nearest owner.
- Partial Rebalancing is the safety net that prevents massive data migrations during node maintenance.
What's Next?
We have implemented the core ring logic. Next, we will discuss Vector Clocks to handle eventual consistency in distributed logs. We will then explore Ceph’s implementation of CRUSH, which extends consistent hashing to handle erasure coding and placement group topology.
About this Series
This is part of the Async Patterns series.