You hired ten servers. Now someone needs to hand out the work — fairly, intelligently, and without breaking when one of them disappears.
In the last post, we talked about horizontal scaling — adding more servers to handle more traffic. It sounds simple enough. But here's the question nobody asks out loud: how does a user's request know which server to go to?
If you just point everyone at the same IP address, they all pile into Server 1 while Server 2 and Server 3 sit there doing nothing. You've spent money on more machines and gained absolutely nothing.
You need a traffic director. That's what this post is about.
The Load Balancer
A Load Balancer sits in front of all your servers and acts as the single point of contact for every incoming request. Users talk to it, it decides which server handles the work, and the server responds.
            ┌───────────────┐ ──→ Server 1
Users ──→   │ Load Balancer │ ──→ Server 2
            └───────────────┘ ──→ Server 3
From the user's perspective, they're talking to one address. They have no idea ten servers exist behind it. That invisibility is intentional — and it's one of the most elegant things about how the web works.
How Does It Actually Decide?
There are several routing strategies, each with a different personality:
🔄 Round Robin — Take turns
Request 1 → Server 1. Request 2 → Server 2. Request 3 → Server 3. Request 4 → back to Server 1.
✅ Dead simple, zero overhead
❌ Doesn't account for request weight — a heavy video upload and a tiny ping get treated identically
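In Python, round robin is nothing more than a repeating cycle (server names here are made up for illustration):

```python
from itertools import cycle

# Round robin: hand requests to servers in a fixed, repeating rotation.
servers = ["server-1", "server-2", "server-3"]
rotation = cycle(servers)

def pick_server():
    # Each call advances the rotation by one position.
    return next(rotation)

picks = [pick_server() for _ in range(4)]
print(picks)  # ['server-1', 'server-2', 'server-3', 'server-1']
```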
⚖️ Weighted Round Robin — Not all workers are equal
Same rotation, but servers get weights based on capacity. A powerful server might get 3 out of every 5 requests while a smaller one gets 2.
✅ Great when your servers have different specs
❌ Still doesn't account for what's actually happening on each server right now
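The naive version is just the same rotation with repetition — a sketch with made-up weights:

```python
# Weighted round robin, naively: repeat each server in the rotation
# according to its weight. Server names and weights are illustrative.
weights = [("big-server", 3), ("small-server", 2)]
rotation = [name for name, w in weights for _ in range(w)]

print(rotation)
# ['big-server', 'big-server', 'big-server', 'small-server', 'small-server']

def pick_server(i):
    # The i-th request wraps around the weighted rotation.
    return rotation[i % len(rotation)]
```

Production balancers (nginx's "smooth" weighted round robin, for example) interleave the picks instead of sending bursts of three in a row, but the proportion per cycle is the same: 3 out of every 5 requests to the bigger machine.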
🔗 Least Connections — Go to whoever is least busy
The load balancer tracks active connections in real time and always routes to the least busy server.
✅ Smart and dynamic — handles variable request durations well
❌ Slightly more overhead to track connection counts continuously
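A minimal sketch of the idea — the connection counts here are simulated, not real sockets:

```python
# Least connections: track open connections per server, pick the minimum.
active = {"server-1": 0, "server-2": 0, "server-3": 0}

def route():
    # Pick the server with the fewest open connections right now.
    server = min(active, key=active.get)
    active[server] += 1
    return server

def finish(server):
    # Called when a response completes; frees a connection slot.
    active[server] -= 1

# Three overlapping requests spread across all three servers:
assert {route(), route(), route()} == {"server-1", "server-2", "server-3"}

# server-2 finishes early, so the next request goes straight back to it:
finish("server-2")
assert route() == "server-2"
```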
🔒 IP Hashing — Same user, same server
The user's IP address gets hashed and always maps to the same server.
✅ Useful for stateful sessions that can't be refactored
❌ If that server goes down, re-routing gets complicated — and this leads us to our next topic
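The mapping itself is one line. A sketch using `md5` as an illustrative, stable hash (real balancers use their own hash functions, and Python's built-in `hash()` is randomized per process for strings, which would break the "same user, same server" guarantee):

```python
import hashlib

servers = ["server-1", "server-2", "server-3"]

def route(ip):
    # Hash the client IP into a big integer, then map it to a server slot.
    digest = int(hashlib.md5(ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

# The same client IP always maps to the same server:
assert route("203.0.113.7") == route("203.0.113.7")
```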
Layer 4 vs Layer 7 — Two Kinds of Intelligence
Load balancers can operate at different levels of the network stack:
| Type | What it sees | Best for |
|---|---|---|
| Layer 4 | IP addresses and TCP/UDP ports only | Raw speed, simple routing |
| Layer 7 | Full HTTP content — URL, headers, cookies | Smart, content-aware routing |
Layer 7 is where things get powerful. You can route requests to completely different server clusters based on what the request is actually asking for:
/api/videos ──→ Video processing servers
/api/auth ──→ Auth servers
/api/search ──→ Search servers
This is called path-based routing and it's the backbone of how microservices are structured at real companies.
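At its core, path-based routing is a prefix match. A sketch in Python with hypothetical pool names:

```python
# Layer 7 path-based routing: the first matching prefix decides
# which backend pool handles the request. Pool names are hypothetical.
routes = [
    ("/api/videos", "video-pool"),
    ("/api/auth", "auth-pool"),
    ("/api/search", "search-pool"),
]

def pick_pool(path, default="web-pool"):
    # Check prefixes in order; fall through to a default pool.
    for prefix, pool in routes:
        if path.startswith(prefix):
            return pool
    return default

assert pick_pool("/api/videos/upload") == "video-pool"
assert pick_pool("/home") == "web-pool"
```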
Wait — isn't the load balancer itself a single point of failure?
Yes. The fix: run multiple load balancers. One active, one on standby. If the active one goes silent, the standby takes over automatically. This is called active-passive failover and it shows up everywhere in resilient system design.
Consistent Hashing — When Servers Come and Go
IP hashing introduced a sneaky problem: what happens when a server dies? Let me show you why this is nastier than it sounds.
The Catastrophe of Simple Hashing
Say you have 3 cache servers and a simple formula:
server_index = hash(key) % number_of_servers
hash("user_123") % 3 = 1 → Server 1
hash("user_456") % 3 = 2 → Server 2
hash("user_789") % 3 = 0 → Server 0
Works perfectly. Until Server 1 crashes. Now you have 2 servers, so the formula becomes `hash(key) % 2` — and nearly every key lands on a different server than before:
hash("user_123") % 2 = 0 → different server (its cached data died with Server 1 💀)
hash("user_456") % 2 = 0 → different server (its cached data is still sitting on Server 2!)
hash("user_789") % 2 = 1 → different server (its cached data is still on Server 0!)
Almost every key remaps to a different server. In a cache, this triggers a massive wave of cache misses — every request now hits your database directly. Your database gets hammered. Your system crawls to a halt. All because one server went down.
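You can watch the remap wave happen with a quick simulation (`md5` stands in for the hash function):

```python
import hashlib

def server_index(key, num_servers):
    # hash(key) % number_of_servers, with a stable hash.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_servers

keys = [f"user_{i}" for i in range(1000)]
before = {k: server_index(k, 3) for k in keys}  # 3 servers
after = {k: server_index(k, 2) for k in keys}   # one dies

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys changed servers")
```

The modulo arithmetic means roughly two-thirds of keys move when you go from 3 servers to 2 — only the keys whose hash happens to agree mod 6 stay put. Every moved key is a cache miss.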
This is the problem Consistent Hashing was invented to solve.
The Hash Ring
Imagine a ring — like a clock face — numbered 0 to 360 degrees. This is called the hash ring. Both your servers and your data keys get hashed onto this same ring.
The rule is beautifully simple: to find which server handles a key, start at that key's position and walk clockwise until you hit a server.
user_123 at 120° → walks clockwise → hits Server B at 180° ✅
user_456 at 200° → walks clockwise → hits Server C at 270° ✅
user_789 at 300° → walks clockwise → wraps around → Server A at 90° ✅
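The clockwise walk is just a sorted lookup. A minimal sketch using the positions from this example (a real implementation would derive positions from a hash function, and `bisect` keeps lookups at O(log n)):

```python
import bisect

# Server positions from the example: A at 90°, B at 180°, C at 270°.
ring = [(90, "A"), (180, "B"), (270, "C")]
positions = [p for p, _ in ring]

def lookup(key_pos):
    # First server at or after the key's position; wrap past 360° to the start.
    i = bisect.bisect_left(positions, key_pos) % len(ring)
    return ring[i][1]

assert lookup(120) == "B"  # user_123
assert lookup(200) == "C"  # user_456
assert lookup(300) == "A"  # user_789 wraps around
```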
Now Watch What Happens When a Server Dies
Server B at 180° crashes. What happens to user_123 at 120°?
Before: user_123 at 120° → Server B at 180° ✅
After: user_123 at 120° → Server B GONE
→ keeps walking → Server C at 270° ✅
Only the keys pointing to Server B get reassigned — flowing to the next server clockwise. Every other key? Completely undisturbed.
With simple hashing, one server dying reshuffles everything. With consistent hashing, it only affects about 1/N of your keys.
That difference is the entire reason this algorithm exists.
Virtual Nodes — Fixing Uneven Distribution
If servers land unevenly on the ring, some get far more traffic than others. The fix: place each server on the ring several times, at positions produced by hashing multiple labels per server (for example "A#1", "A#2", "A#3"). These extra positions are called virtual nodes.
Server A → hashes to 30°, 160°, 250°
Server B → hashes to 70°, 190°, 330°
Server C → hashes to 100°, 220°, 290°
Result:
30°[A] 70°[B] 100°[C] 160°[A] 190°[B] 220°[C] 250°[A] 290°[C] 330°[B]
Each server now owns many small arcs scattered around the ring instead of one big one. When a server dies, each of its arcs is inherited by whichever server sits next clockwise at that spot — and because the arcs are scattered, the load spreads across the remaining servers instead of dumping onto a single neighbor. This is what Amazon DynamoDB, Apache Cassandra, and most large CDNs actually use.
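A sketch with illustrative virtual-node positions, removing one server to see who inherits its arcs:

```python
import bisect

# Illustrative virtual-node positions: each server owns three arcs.
vnodes = {"A": [30, 160, 250], "B": [70, 190, 330], "C": [100, 220, 290]}
ring = sorted((pos, server) for server, spots in vnodes.items() for pos in spots)

def lookup(key_pos, ring):
    # Clockwise walk: first entry at or past key_pos, wrapping to the start.
    points = [p for p, _ in ring]
    i = bisect.bisect_left(points, key_pos) % len(ring)
    return ring[i][1]

# Kill B and see which servers inherit the degrees B used to own:
survivors = [(p, s) for p, s in ring if s != "B"]
inherited = {lookup(k, survivors) for k in range(360) if lookup(k, ring) == "B"}
print(inherited)  # B's arcs split between A and C, not one neighbor
```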
| Scenario | Simple Hashing | Consistent Hashing |
|---|---|---|
| Server removed | ~100% of keys remap | ~1/N keys remap |
| Server added | ~100% of keys remap | ~1/N keys remap |
| Load distribution | Even (if lucky) | Even with virtual nodes |
| Used in production | Rarely at scale | DynamoDB, Cassandra, CDNs |
The Architecture So Far
Post 1: User → Server → Database

Post 1 (scaled):
        User → [Server 1]
               [Server 2] → Database
               [Server 3]
        (but how do requests get routed?)

Post 2:
        User → Load Balancer → [Server 1]
                               [Server 2] → Database
                               [Server 3]
        (with consistent hashing deciding distribution)
Each solution creates the next problem. That rhythm is exactly how distributed systems evolved historically.
Next in the series → Message Queues — The Superpower That Makes Systems Resilient