Mustafa Veysi Soyvural
Consistent Hashing in Go: From the Math to Production-Grade Code

When you scale a cache cluster from 5 to 6 servers using modulo hashing, 83% of your keys remap to different servers. All at once. Every one of those remapped keys hits your database simultaneously — the thundering herd problem that has taken down production systems at every scale.

Before:  hash("user:1001") % 5 = 3  → Server C
After:   hash("user:1001") % 6 = 1  → Server A  ← cache miss

Consistent hashing solves this. It's the algorithm behind DynamoDB's partitioning, Cassandra's token ring, Discord's chat server routing, and Netflix's CDN distribution. This post walks through how it works, why it works, and the implementation details that separate a textbook explanation from production-ready code — backed by benchmarks and chaos testing.

The full implementation is available on GitHub: ~800 lines of Go, zero external dependencies.


The Algorithm

Consistent hashing was introduced by Karger et al. in 1997 to solve cache distribution for the early web. The original paper defined four formal properties that any consistent hash function must satisfy:

  • Balance — Keys distribute evenly across all nodes. No single node should be disproportionately loaded.
  • Monotonicity — When a new node joins, keys only move to the new node. Keys never reshuffle between existing nodes.
  • Spread — Across different client views of the cluster, a key maps to a small number of distinct nodes.
  • Load — No node receives more than its fair share of keys, regardless of which subset of nodes a client sees.

These aren't aspirational goals — they're the mathematical contract. If your implementation violates any of them, you don't have consistent hashing; you have a hash ring with bugs.

How the Ring Works

Instead of hash(key) % N, both nodes and keys are placed on a circular ring of size 2^32. To find which node owns a key:

  1. Hash the key to a position on the ring
  2. Walk clockwise until you hit a node
  3. That node owns the key

The critical property: adding or removing a node only affects the keys in the arc between the changed node and its predecessor. Everything else stays put.

Strategy          │ Keys Remapped (5→6 nodes, 100K keys)
──────────────────┼─────────────────────────────────────
Modulo (hash%N)   │  83,803  (83.8%)
Consistent Hash   │  15,723  (15.7%)  ← ≈ K/(N+1) expected movement

That's monotonicity in action. Adding a sixth node should move about K/(N+1) keys (100,000/6 ≈ 16,667), all of them to the new node. The measured 15,723 falls within expected variance.


Virtual Nodes: From Theory to Practice

The raw ring algorithm satisfies monotonicity but fails badly on balance. With one position per physical node on a 5-node ring:

Node A:  32,014 keys
Node B:  27,412 keys
Node C:  18,805 keys
Node D:  19,914 keys
Node E:   1,855 keys  ← 17x less than Node A

A 17:1 imbalance. In production, Node E sits idle while Node A melts.

The fix is virtual nodes: each physical node gets multiple positions on the ring. Node-A#0 through Node-A#149 are each hashed independently to different ring positions. A lookup that lands on any of Node A's virtual nodes routes to the physical Node A.

Here's how balance improves as you add virtual nodes (measured across 100K keys, 5 physical nodes):

Virtual Nodes │ Std Deviation │ Worst Node Ratio
──────────────┼───────────────┼─────────────────
            1 │        11,353 │ 3.20x average
           50 │         4,601 │ 1.83x average
          150 │         2,824 │ 1.47x average
          500 │           976 │ 1.17x average

Diminishing returns kick in hard after 150. Going from 1 to 150 vnodes cuts standard deviation by 75%. Going from 150 to 500 cuts it by another 65%, but at the cost of 3.3x more ring entries, more memory, and slower binary searches. 150 is the sweet spot for most deployments — and it's what production systems like Cassandra converge on.


Proving Correctness Under Chaos

Benchmarks show the happy path. Chaos testing proves the guarantees hold when things break.

Failure Isolation

When a node dies, only its keys are affected. This is a direct consequence of monotonicity — surviving nodes' arc boundaries don't change.

Testing with 10,000 keys across five Redis nodes:

redis-1     2,197 keys
redis-2     1,731 keys
redis-3     1,559 keys  ← killed
redis-4     1,730 keys
redis-5     2,783 keys

After redis-3 failure:
  Cache hit rate:  84.4%
  Cache misses:    1,559  ← exactly redis-3's key count
  Keys remapped from surviving nodes: 0

Zero remapping for surviving nodes. The blast radius is perfectly contained.

Catastrophic Scenarios

Five tests verified the formal properties under extreme conditions; three highlights:

  • Mass failure (50% node loss): Removed 5 of 10 nodes simultaneously. Zero keys remapped between surviving nodes. All misses traced to removed nodes only — monotonicity holds.
  • Rapid churn (20 add/remove cycles): Each operation remapped within 1.5x of the K/N theoretical bound. No cumulative drift — balance recovers after each operation.
  • Concurrent chaos: 2.4 million reads during simultaneous node additions and removals. Zero data races. 100% correctness on every read that targeted a live node.

These aren't unit tests — they're property-based proofs that the implementation satisfies Karger's guarantees under adversarial conditions.


Implementation Deep-Dive

Hash Function Selection

The implementation supports pluggable hash functions: FNV-1a, MD5, and CRC32. The choice matters more than you'd expect.

FNV-1a is the default, and for good reason. Consistent hashing doesn't need collision resistance — there's no adversary trying to forge hash collisions on your cache keys. What it needs is uniform distribution and speed. FNV-1a delivers both: non-cryptographic, fast, and well-distributed across the 32-bit space.

MD5 works but wastes cycles on cryptographic properties you don't need. You're paying for collision resistance that buys you nothing in a hash ring context. Use it only if you need compatibility with an existing system that already uses MD5 hashes.

CRC32 is the fastest option but has known distribution weaknesses with certain input patterns. Fine for benchmarking, risky for production.

If you need maximum throughput, consider xxHash — it's faster than FNV-1a with comparable distribution quality. The implementation's pluggable interface makes swapping trivial.
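A pluggable interface can be as simple as a function type over the standard library's hashes; the `Hasher` name here is illustrative:

```go
package main

import (
	"fmt"
	"hash/crc32"
	"hash/fnv"
)

// Hasher is the pluggable contract: anything that maps bytes to a
// 32-bit ring position qualifies.
type Hasher func(data []byte) uint32

// fnv1a: non-cryptographic, fast, well-distributed. A sane default.
func fnv1a(data []byte) uint32 {
	h := fnv.New32a()
	h.Write(data)
	return h.Sum32()
}

// crc32ieee: fastest, but weaker distribution on some input patterns.
func crc32ieee(data []byte) uint32 {
	return crc32.ChecksumIEEE(data)
}

func main() {
	key := []byte("user:1001")
	for _, h := range []struct {
		name string
		fn   Hasher
	}{{"fnv-1a", fnv1a}, {"crc32", crc32ieee}} {
		fmt.Printf("%-7s %d\n", h.name, h.fn(key))
	}
}
```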

Collision Handling in Virtual Node Space

With 150 virtual nodes per physical node across 5 nodes, you're placing 750 points in a 2^32 space. Collisions are rare but inevitable at fleet scale. The birthday problem applies: 750 points form 750 × 749 / 2 ≈ 281,000 pairs, each colliding with probability 2^-32, so roughly 1 deployment in 15,000 will see a duplicate position.

The implementation detects and skips duplicate positions rather than overwriting. Overwriting silently transfers ownership of a key range from one node to another — a violation of the balance property that produces no error and no log entry. A silent data correctness bug that only manifests under load.
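The detect-and-skip guard can be sketched in a few lines (the `placeVNode` name is illustrative):

```go
package main

import "fmt"

// placeVNode records a virtual node position, skipping duplicates:
// if the position is already taken, the existing owner keeps its arc
// instead of having it silently transferred.
func placeVNode(owner map[uint32]string, pos uint32, node string) bool {
	if _, taken := owner[pos]; taken {
		return false // collision detected; caller can log and move on
	}
	owner[pos] = node
	return true
}

func main() {
	owner := map[uint32]string{}
	fmt.Println(placeVNode(owner, 42, "node-a")) // true: position claimed
	fmt.Println(placeVNode(owner, 42, "node-b")) // false: node-a keeps the arc
	fmt.Println(owner[42])                       // node-a
}
```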

Concurrency: RWMutex, Not Mutex

The hash ring has a heavily skewed read/write ratio. Key lookups (reads) happen on every cache operation — potentially millions per second. Node additions and removals (writes) happen during deployments and failures — maybe a few times per day.

sync.RWMutex allows concurrent readers with exclusive writers. A plain sync.Mutex would serialize every lookup behind every other lookup, destroying throughput for no safety benefit.
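The pattern in miniature (a sketch, not the repo's actual types): reads take `RLock`, the rare membership change takes the exclusive `Lock`.

```go
package main

import (
	"fmt"
	"sync"
)

type Ring struct {
	mu    sync.RWMutex
	nodes []string
}

// Count is the read path: RLock lets any number of lookups run in parallel.
func (r *Ring) Count() int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return len(r.nodes)
}

// Add is the write path: Lock is exclusive, but membership changes are rare.
func (r *Ring) Add(node string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.nodes = append(r.nodes, node)
}

func main() {
	r := &Ring{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = r.Count() // concurrent readers never block each other
		}()
	}
	r.Add("node-a")
	wg.Wait()
	fmt.Println(r.Count())
}
```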

Ring Lookup: Sorted Array + Binary Search

The ring is stored as a sorted array of virtual node positions. Key lookup uses sort.Search (binary search) to find the first node position clockwise from the key's hash — O(log n) where n is the total number of virtual nodes.

An alternative is a self-balancing BST (red-black tree, etc.), which gives O(log n) insertion and deletion without re-sorting. But Go's sort.Search on a slice is cache-friendly and fast in practice. Re-sorting 750 elements on the rare node change is negligible compared to the millions of lookups that benefit from contiguous memory layout.
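The wrap-around is the only subtle part of the lookup. A sketch over a sorted position slice:

```go
package main

import (
	"fmt"
	"sort"
)

// lookup finds the first virtual node clockwise from hash h.
// positions must be sorted ascending.
func lookup(positions []uint32, h uint32) uint32 {
	i := sort.Search(len(positions), func(i int) bool {
		return positions[i] >= h
	})
	if i == len(positions) {
		i = 0 // past the top of the ring: wrap to the first position
	}
	return positions[i]
}

func main() {
	ring := []uint32{100, 2000, 4_000_000_000}
	fmt.Println(lookup(ring, 150))           // clockwise successor: 2000
	fmt.Println(lookup(ring, 4_100_000_000)) // wraps around to 100
}
```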

Error Handling on Empty Rings

An empty ring — no nodes registered — must return an explicit error, not a zero value. A zero-value return silently routes all keys to... nothing. No crash, no log, no alert. Data just disappears.

if len(r.nodes) == 0 {
    return "", ErrEmptyRing
}

This is a system boundary. Fail loudly.


When NOT to Use Consistent Hashing

Consistent hashing is not universally optimal. Knowing when to reach for something else is as important as knowing the algorithm itself.

Jump Consistent Hash

Google's Jump Consistent Hash (Lamping & Veach, 2014) uses O(1) space and O(ln n) time with near-perfect balance — no virtual nodes needed. The catch: nodes must be identified by sequential integers (0, 1, 2, ...), not names or addresses. You can't remove node 3 without renumbering nodes 4+.

Use jump hash when: nodes are numbered slots (e.g., sharded database partitions) and you only add/remove from the tail. Use ring-based consistent hashing when: nodes have identities (IP addresses, hostnames) and any node can fail independently.
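The whole algorithm fits in a handful of lines; this is the function from the Lamping & Veach paper, transliterated to Go:

```go
package main

import "fmt"

// JumpHash: O(1) space, O(ln n) expected time, near-perfect balance.
// The constraint: buckets are the integers 0..numBuckets-1.
func JumpHash(key uint64, numBuckets int32) int32 {
	var b, j int64 = -1, 0
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int32(b)
}

func main() {
	// Growing from 5 to 6 buckets: a key either stays put or moves
	// to the new bucket 5, never between old buckets.
	for key := uint64(1); key <= 5; key++ {
		fmt.Println(JumpHash(key, 5), JumpHash(key, 6))
	}
}
```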

Bounded-Load Consistent Hashing

Google's bounded-load variant (Mirrokni et al., 2018) caps each node's load at ceil(average_load * (1 + epsilon)). When a node would exceed its cap, the algorithm continues clockwise to the next under-capacity node.

This solves the "celebrity problem" — when a small number of keys receive massively disproportionate traffic. Standard consistent hashing distributes keys evenly but not load if key popularity is skewed. A single viral cache key can overwhelm its assigned node while others idle.

Use bounded-load when: key access patterns are highly skewed (social media feeds, trending content, celebrity profiles).
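The cap check and clockwise spill-over can be sketched as follows (names and signatures are illustrative; a real implementation also has to release load as keys expire):

```go
package main

import (
	"fmt"
	"sort"
)

// boundedGet walks clockwise from the key's hash, skipping any node
// already at its load cap, and records the assignment it makes.
func boundedGet(positions []uint32, owner map[uint32]string,
	load map[string]int, maxLoad int, h uint32) string {
	i := sort.Search(len(positions), func(i int) bool {
		return positions[i] >= h
	})
	for n := 0; n < len(positions); n++ {
		pos := positions[(i+n)%len(positions)] // wrap around the ring
		node := owner[pos]
		if load[node] < maxLoad {
			load[node]++
			return node
		}
	}
	return "" // every node at capacity
}

func main() {
	positions := []uint32{100, 200, 300}
	owner := map[uint32]string{100: "a", 200: "b", 300: "c"}
	load := map[string]int{}
	// Three hot keys that all hash near position 100 spread out
	// instead of piling onto node a.
	for k := 0; k < 3; k++ {
		fmt.Println(boundedGet(positions, owner, load, 1, 50))
	}
}
```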

Rendezvous Hashing (Highest Random Weight)

Each key computes a weight for every node; the highest-weight node wins. O(n) per lookup, but n is typically small (tens of nodes, not thousands). No ring, no virtual nodes, no rebalancing logic. Elegant for small clusters.

Use rendezvous when: your node count is small (< 50) and you value simplicity over lookup speed.
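A sketch of the core idea, hashing node and key together (FNV-1a chosen here for the example; the separator and names are illustrative):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// rendezvous (highest-random-weight): every node computes a weight for
// the key, and the highest weight wins. O(n) per lookup, no ring state.
func rendezvous(nodes []string, key string) string {
	var best string
	var bestW uint32
	for _, n := range nodes {
		h := fnv.New32a()
		h.Write([]byte(n + "/" + key))
		if w := h.Sum32(); best == "" || w > bestW {
			best, bestW = n, w
		}
	}
	return best
}

func main() {
	nodes := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}
	fmt.Println(rendezvous(nodes, "user:1001"))
	// Removing a losing node cannot change a key's winner, since the
	// surviving nodes' weights are unchanged: failure isolation for free.
}
```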

Plain Modulo with Planned Migration

If you can tolerate a maintenance window, hash % N with a full data migration on resize is simpler, faster per-lookup, and has zero overhead. Don't reach for consistent hashing if your cluster never changes size in production or if downtime during resizing is acceptable.

Use modulo when: the cluster is static, or downtime during scaling is acceptable.


Where Consistent Hashing Runs in Production

  • Amazon DynamoDB — The 2007 Dynamo paper popularized consistent hashing for key-value partitioning. Each node owns a range of the ring; virtual nodes handle heterogeneous hardware.
  • Apache Cassandra — Token ring partitioning assigns each node a set of token ranges. Virtual nodes (vnodes) were added in Cassandra 1.2 for automatic rebalancing.
  • Discord — Routes users to chat servers using consistent hashing, ensuring session stickiness during server scaling events.
  • Netflix — CDN edge servers use consistent hashing to route content requests, minimizing cache misses during server pool changes.
  • Memcached clients — libmemcached and most client libraries use consistent hashing by default for key-to-server routing.

Code & Resources

The full implementation: github.com/soyvural/consistent-hashing

~800 lines of Go. Zero external dependencies. Pluggable hash functions (FNV-1a, MD5, CRC32). Includes the benchmarks and chaos tests described above.

git clone https://github.com/soyvural/consistent-hashing.git
cd consistent-hashing
go run cmd/demo/main.go
