DEV Community

pickuma
pickuma

Posted on • Originally published at pickuma.com

Consistent Hashing, Explained Through the Problem It Actually Solves

Most explanations of consistent hashing start with the ring. That's backwards. The ring only makes sense once you've felt the pain it removes, so we'll start with the pain.

The remapping problem

You have 40 million cached objects spread across 4 cache servers. The rule that decides which server holds which object is one line: server = hash(key) % 4. It works. Reads land on the right box, load is roughly even, and you move on.

Then traffic grows and you add a fifth server. The rule becomes hash(key) % 5. That one-character change is far more violent than it looks. A key stays on its old server only if hash(key) % 4 and hash(key) % 5 happen to agree, and across the full keyspace they agree about 20% of the time. The other 80% — roughly 32 of your 40 million objects — now resolve to a different server.

For a cache, that's close to a cold start. Thirty-two million lookups miss, fall through to the database, and refetch at the same moment you were trying to reduce load by adding capacity. The act of scaling up triggers a thundering herd.

The root cause is that modulo ties every key's location to the exact value of N. Change N at all and you've rewritten the placement of nearly the entire dataset. What you actually want is for adding one server to disturb roughly one server's worth of keys — no more.

How the ring fixes it

Consistent hashing changes the question. Instead of mapping keys into "one of N slots," you map both keys and servers onto the same large circular number space — say 0 to 2³² − 1, with the top wrapped around to meet the bottom to form a ring.

Hash each server's name to a point on the ring. Hash each key to a point too. A key belongs to the first server you hit walking clockwise from the key's position. That's the entire ownership rule.

Now watch what happens when you add a server. It lands at a single point on the ring and claims only the keys sitting in the arc between it and the next server counter-clockwise. Every other key on the ring keeps the exact owner it had before. On average the new server pulls in K/N keys — about 8 million of our 40 million, not 32 million — and those are the only keys that move.

The useful property has a precise statement: adding or removing a node relocates on average K/N keys, where K is the total number of keys and N the number of nodes. Movement is proportional to the change you made, not to the size of the system. That single guarantee is the whole reason the technique exists.

Removal is the mirror image. When a server leaves the ring, the keys it owned fall to the next server clockwise, and nothing else is touched. No global reshuffle, no cache stampede across the fleet — just one arc changing hands.

Virtual nodes, and where this runs in production

The basic ring has two weaknesses, and both come from the same source: with only N points scattered on a huge ring, the arcs between them are uneven. One server might randomly own a 30% slice of the ring while another owns 5%. Worse, when a server dies, its entire slice lands on a single clockwise neighbor, which can see its load jump sharply in an instant.

Virtual nodes solve both at once. Instead of placing each physical server at one point, you place it at many — 100 to 200 is a common range — by hashing server-name#1, server-name#2, and so on. Each physical server now owns a hundred small arcs scattered around the ring rather than one big one.

Two things improve. First, distribution smooths out: more points means the law of large numbers works in your favor, and the gap between the busiest and least-busy server typically shrinks into the single-digit percentage range. Second, failure is graceful — a dead server's hundred small arcs are inherited by many different neighbors instead of dumping everything on one, so the recovered load spreads across the fleet.

More virtual nodes is not free. Each one is an entry in the sorted structure you binary-search on every lookup, and the ring has to be rebuilt or copied when membership changes. A few hundred per node buys you smooth balancing; tens of thousands buys you a slower, memory-heavier ring for a balancing gain you won't measure. Pick a number, load-test it, and stop tuning.

This isn't academic. Amazon's 2007 Dynamo paper put consistent hashing into wide industrial use, and Cassandra and Riak inherited the design directly. memcached client libraries distribute keys across servers with it through the "ketama" algorithm. CDNs and L7 load balancers use it to pin a given client to the same backend so sessions and warm caches survive, even as the backend pool changes underneath.

If you're implementing it yourself, the core is small: a sorted list of (hash, node) pairs and a binary search to find the first entry clockwise of a key's hash. The subtle parts are choosing a hash with good distribution and getting the wrap-around case right at the top of the ring.

The mental model to keep: modulo hashing optimizes for even distribution at a fixed size and pays for it catastrophically when the size changes. Consistent hashing trades a little structural complexity for stability under change. If your node count never moves, plain modulo is simpler and fine. The moment nodes come and go — autoscaling, failures, rolling deploys — the ring earns its keep.


Originally published at pickuma.com. Subscribe to the RSS or follow @pickuma.bsky.social for new reviews.

Top comments (0)