DEV Community

Rupesh Konduru

Load Balancing & Consistent Hashing — The Art of Splitting Work Fairly

You hired ten servers. Now someone needs to hand out the work — fairly, intelligently, and without breaking when one of them disappears.


In the last post, we talked about horizontal scaling — adding more servers to handle more traffic. It sounds simple enough. But here's the question nobody asks out loud: how does a user's request know which server to go to?

If you just point everyone at the same IP address, they all pile into Server 1 while Server 2 and Server 3 sit there doing nothing. You've spent money on more machines and gained absolutely nothing.

You need a traffic director. That's what this post is about.


The Load Balancer

A Load Balancer sits in front of all your servers and acts as the single point of contact for every incoming request. Users talk to it, it decides which server handles the work, and the server responds.

          ┌───────────────┐ ──→ Server 1
Users ──→ │ Load Balancer │ ──→ Server 2
          └───────────────┘ ──→ Server 3

From the user's perspective, they're talking to one address. They have no idea ten servers exist behind it. That invisibility is intentional — and it's one of the most elegant things about how the web works.

How Does It Actually Decide?

There are several routing strategies, each with a different personality:

🔄 Round Robin — Take turns

Request 1 → Server 1. Request 2 → Server 2. Request 3 → Server 3. Request 4 → back to Server 1.

✅ Dead simple, zero overhead
❌ Doesn't account for request weight — a heavy video upload and a tiny ping get treated identically

⚖️ Weighted Round Robin — Not all workers are equal

Same rotation, but servers get weights based on capacity. A powerful server might get 3 out of every 5 requests while a smaller one gets 2.

✅ Great when your servers have different specs
❌ Still doesn't account for what's actually happening on each server right now

🔗 Least Connections — Go to whoever is least busy

The load balancer tracks active connections in real time and always routes to the least busy server.

✅ Smart and dynamic — handles variable request durations well
❌ Slightly more overhead to track connection counts continuously

🔒 IP Hashing — Same user, same server

The user's IP address gets hashed and always maps to the same server.

✅ Useful for stateful sessions that can't be refactored
❌ If that server goes down, re-routing gets complicated — and this leads us to our next topic
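
The four strategies above can be sketched in a few lines of Python. This is a toy model, not a real load balancer; the server names, connection counts, and weights are invented for illustration:

```python
import hashlib
from itertools import cycle

servers = ["server-1", "server-2", "server-3"]

# Round Robin: an endless rotation over the server list.
round_robin = cycle(servers)

def route_round_robin():
    return next(round_robin)

# Weighted Round Robin: repeat each server by its capacity weight.
weights = {"server-1": 3, "server-2": 1, "server-3": 1}
weighted_rotation = cycle([s for s, w in weights.items() for _ in range(w)])

# Least Connections: pick the server with the fewest active connections.
active_connections = {"server-1": 12, "server-2": 3, "server-3": 7}

def route_least_connections():
    return min(active_connections, key=active_connections.get)

# IP Hashing: the same client IP always hashes to the same server.
def route_ip_hash(client_ip):
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

A real load balancer would also update `active_connections` as requests open and close, and health-check servers out of the rotation; this sketch only shows the selection logic.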


Layer 4 vs Layer 7 — Two Kinds of Intelligence

Load balancers can operate at different levels of the network stack:

| Type    | What it sees                              | Best for                     |
|---------|-------------------------------------------|------------------------------|
| Layer 4 | IP addresses and TCP info only            | Raw speed, simple routing    |
| Layer 7 | Full HTTP content — URL, headers, cookies | Smart, content-aware routing |

Layer 7 is where things get powerful. You can route requests to completely different server clusters based on what the request is actually asking for:

/api/videos  ──→  Video processing servers
/api/auth    ──→  Auth servers
/api/search  ──→  Search servers

This is called path-based routing and it's the backbone of how microservices are structured at real companies.
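
At its core, path-based routing is just a prefix match. Here's a minimal sketch; the path prefixes come from the diagram above, while the cluster names and the default fallback are made up for illustration:

```python
# Map path prefixes to backend clusters (toy Layer 7 dispatch table).
ROUTES = {
    "/api/videos": "video-cluster",
    "/api/auth": "auth-cluster",
    "/api/search": "search-cluster",
}

def route_by_path(path, default="web-cluster"):
    # First matching prefix wins; anything unmatched goes to the default pool.
    for prefix, cluster in ROUTES.items():
        if path.startswith(prefix):
            return cluster
    return default
```

Real Layer 7 proxies (nginx, Envoy, an ALB) express the same idea declaratively in config rather than code, and can also match on headers, cookies, or hostnames.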

Wait — isn't the load balancer itself a single point of failure?
Yes. The fix: run multiple load balancers. One active, one on standby. If the active one goes silent, the standby takes over automatically. This is called active-passive failover and it shows up everywhere in resilient system design.


Consistent Hashing — When Servers Come and Go

IP hashing introduced a sneaky problem: what happens when a server dies? Let me show you why this is nastier than it sounds.

The Catastrophe of Simple Hashing

Say you have 3 cache servers and a simple formula:

server_index = hash(key) % number_of_servers

hash("user_123") % 3 = 1  → Server 1
hash("user_456") % 3 = 2  → Server 2
hash("user_789") % 3 = 0  → Server 0

Works perfectly. Until Server 1 crashes. Now you have 2 servers, so the formula divides by 2, and the survivors (Server 0 and Server 2) get renumbered as indices 0 and 1:

hash("user_123") % 2 = 1  → Server 2 (was on the dead Server 1 💀)
hash("user_456") % 2 = 0  → Server 0 (was on Server 2!)
hash("user_789") % 2 = 1  → Server 2 (was on Server 0!)

Almost every key remaps to a different server. In a cache, this triggers a massive wave of cache misses — every request now hits your database directly. Your database gets hammered. Your system crawls to a halt. All because one server went down.
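
You can measure the damage directly. The snippet below simulates 1,000 cache keys placed with `hash(key) % n` and counts how many land on a different server when one of three servers dies (MD5 stands in for the hash function, since Python's built-in `hash()` is salted per process):

```python
import hashlib

def server_for(key, n_servers):
    # Simple modulo placement: hash(key) % number_of_servers.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n_servers

keys = [f"user_{i}" for i in range(1000)]
before = {k: server_for(k, 3) for k in keys}  # 3 healthy servers
after = {k: server_for(k, 2) for k in keys}   # one server died

moved = sum(1 for k in keys if before[k] != after[k])
# Roughly two-thirds of all keys end up on a different server.
```

A key survives the shrink only when `h % 3 == h % 2`, which happens for just 2 of every 6 hash values, so about two-thirds of keys remap no matter how many keys you have.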

This is the problem Consistent Hashing was invented to solve.

The Hash Ring

Imagine a ring — like a clock face — numbered 0 to 360 degrees. This is called the hash ring. Both your servers and your data keys get hashed onto this same ring.

The rule is beautifully simple: to find which server handles a key, start at that key's position and walk clockwise until you hit a server.

user_123 at 120° → walks clockwise → hits Server B at 180° ✅
user_456 at 200° → walks clockwise → hits Server C at 270° ✅
user_789 at 300° → walks clockwise → wraps around → Server A at 90° ✅

Now Watch What Happens When a Server Dies

Server B at 180° crashes. What happens to user_123 at 120°?

Before: user_123 at 120° → Server B at 180° ✅
After:  user_123 at 120° → Server B GONE
                         → keeps walking → Server C at 270° ✅

Only the keys pointing to Server B get reassigned — flowing to the next server clockwise. Every other key? Completely undisturbed.

With simple hashing, one server dying reshuffles everything. With consistent hashing, it only affects about 1/N of your keys.

That difference is the entire reason this algorithm exists.
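
The ring itself is tiny to implement: keep the server positions sorted and binary-search for the first position clockwise of the key. Here's a minimal sketch (server names are placeholders, and MD5 is just one convenient stand-in for the ring's hash function):

```python
import bisect
import hashlib

def position(name):
    # Map any string onto the ring: an integer in [0, 2**32).
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % (2 ** 32)

class HashRing:
    def __init__(self, servers):
        # The ring, flattened: (position, server) pairs sorted clockwise.
        self.nodes = sorted((position(s), s) for s in servers)

    def server_for(self, key):
        # Walk clockwise from the key's position to the next server,
        # wrapping back to the start of the list if we pass the end.
        positions = [p for p, _ in self.nodes]
        i = bisect.bisect_right(positions, position(key)) % len(self.nodes)
        return self.nodes[i][1]

    def remove(self, server):
        self.nodes = [(p, s) for p, s in self.nodes if s != server]
```

Removing a server is just deleting its ring positions; only keys that used to walk into those positions get new homes, which is exactly the 1/N behavior described above.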

Virtual Nodes — Fixing Uneven Distribution

If servers land unevenly on the ring, some get far more traffic than others. The fix: place each server on the ring many times, at different positions, by hashing several variants of its name (A#0, A#1, A#2, …) or by using multiple hash functions. These extra positions are called virtual nodes.

Server A → hashes to 60°, 180°, 300°
Server B → hashes to 30°, 150°, 270°
Server C → hashes to 90°, 210°, 330°

Result:
30°[B] 60°[A] 90°[C] 150°[B] 180°[A] 210°[C] 270°[B] 300°[A] 330°[C]

Servers interleave evenly around the ring. When one dies, its load spreads across all remaining servers proportionally. This is what Amazon DynamoDB, Apache Cassandra, and most large CDNs actually use.
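
Virtual nodes are a one-line change to the ring: instead of hashing `"A"` once, hash `"A#0"`, `"A#1"`, … and record all of those positions. A self-contained sketch (server names and the vnode count are arbitrary; real systems tune the count, as the comment below notes):

```python
import bisect
import hashlib

def position(name):
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % (2 ** 32)

def build_ring(servers, vnodes=200):
    # Each server appears vnodes times, hashed under labels like "A#0", "A#1", ...
    return sorted((position(f"{s}#{i}"), s) for s in servers for i in range(vnodes))

def server_for(ring, key):
    positions = [p for p, _ in ring]
    i = bisect.bisect_right(positions, position(key)) % len(ring)
    return ring[i][1]

ring = build_ring(["A", "B", "C"])
counts = {"A": 0, "B": 0, "C": 0}
for i in range(3000):
    counts[server_for(ring, f"user_{i}")] += 1
# With enough virtual nodes, each server owns roughly a third of the keys.
```

More virtual nodes means smoother balance but a bigger ring to store and search, which is the tradeoff production systems tune.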

| Scenario           | Simple Hashing       | Consistent Hashing        |
|--------------------|----------------------|---------------------------|
| Server removed     | ~100% of keys remap  | ~1/N keys remap           |
| Server added       | ~100% of keys remap  | ~1/N keys remap           |
| Load distribution  | Even (if lucky)      | Even with virtual nodes   |
| Used in production | Rarely at scale      | DynamoDB, Cassandra, CDNs |

The Architecture So Far

Post 1:  User → Server → Database

Post 1:  User → [Server 1]
              → [Server 2] → Database
              → [Server 3]
         (but how do requests get routed?)

Post 2:  User → Load Balancer → [Server 1] → Database
                             → [Server 2] → Database
                             → [Server 3] → Database
         (with a routing strategy deciding the distribution, and
          consistent hashing keeping it stable as servers come and go)

Each solution creates the next problem. That rhythm is exactly how distributed systems evolved historically.


Next in the series → Message Queues — The Superpower That Makes Systems Resilient

Top comments (1)

Andre Cytryn

the virtual nodes section is the part most beginner articles skip over. the basic ring is intuitive but the uneven distribution problem is what actually bites you in production. one thing worth mentioning: the number of virtual nodes per server (the vnodes count) is usually tunable in systems like Cassandra. too few and you get hotspots, too many and the memory overhead for tracking ring positions adds up. have you looked at how Cassandra specifically balances this tradeoff?