Load Balancing
What Is It?
A Load Balancer (LB) distributes incoming network traffic across multiple backend servers to ensure no single server is overwhelmed. It acts as the traffic cop sitting between clients and your backend servers.
Why Do You Need It?
- Scalability: Distributes load horizontally so you can scale by adding more machines rather than buying one massive expensive server.
- High Availability (HA): Automatically detects dead servers (via health checks) and redirects traffic to healthy ones.
- Redundancy & Failover: Prevents your system from having a Single Point of Failure (SPOF).
L4 vs L7 Load Balancers (The Core Division)
The names L4 and L7 come from the OSI Model (Open Systems Interconnection). To understand how they differ, think of a physical mail package:
L4 (Layer 4 - Transport Layer): "The Mailman"
- How it works: An L4 load balancer only looks at the outside of the envelope: the IP addresses (Layer 3) and the port numbers (Layer 4, TCP/UDP). It never opens the payload.
- Pros: Blazing fast. Because it never decrypts traffic or parses application protocols, it simply forwards raw packets.
- Cons ("dumb"): It cannot make decisions based on content (e.g., it cannot read a cookie or direct `/images` to a separate server).
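The "only the envelope" idea can be sketched in a few lines. This is a toy decision function, not a real balancer, and the backend addresses are invented; the point is what the function *doesn't* receive: no URL, no headers, no cookies, only the connection 5-tuple.

```python
import hashlib

# Hypothetical backend pool; an L4 balancer knows only addresses, never content.
BACKENDS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

def pick_backend(src_ip: str, src_port: int,
                 dst_ip: str, dst_port: int, proto: str) -> str:
    """Choose a backend using only the connection 5-tuple.

    Note what is absent: no HTTP parsing, no TLS decryption, no cookies.
    The payload bytes are never touched.
    """
    five_tuple = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}"
    digest = int(hashlib.md5(five_tuple.encode()).hexdigest(), 16)
    return BACKENDS[digest % len(BACKENDS)]

# The same connection always lands on the same backend:
assert pick_backend("203.0.113.7", 51000, "198.51.100.1", 443, "tcp") == \
       pick_backend("203.0.113.7", 51000, "198.51.100.1", 443, "tcp")
```

Because the decision is pure arithmetic on addresses, this style of balancing can run at line rate in hardware or in the kernel.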
Real-World Applications Where L4 Makes Sense:
- High-Throughput Raw Streaming (UDP/TCP): Media streaming services (VoIP, Zoom, RTSP video feeds) where raw speed is critical, and there are no HTTP headers/cookies to read.
- Multiplayer Online Gaming Servers: Real-time games stream player coordinates using custom, lightweight TCP/UDP binary packets. An L4 balancer handles this easily, whereas an L7 HTTP-bound balancer would not understand the gaming binary protocol.
- Database Read-Replica Routing: Directing SQL/NoSQL database TCP connections (MySQL, Redis, PostgreSQL) across replica nodes. Database traffic has no HTTP headers, so it is routed purely on IP addresses and TCP ports.
- Edge Gateway (The Front Door): Massive platforms (like Google or Netflix) put a super-fast L4 load balancer at the absolute front of their datacenter to absorb billions of requests and spread them across a fleet of smart L7 balancers.
L7 (Layer 7 - Application Layer): "The Receptionist"
- How it works: An L7 load balancer opens the envelope and reads your letter. It performs TLS Termination (decrypts HTTPS) and parses the HTTP/HTTPS request to read cookies, headers, URL paths, and query parameters.
- What it can do: It can make highly intelligent routing decisions:
  - "If the URL starts with `/api`, send to the API Cluster. If it starts with `/video`, send to the Video Cluster."
  - "If there is a cookie called `session_id`, use Consistent Hashing to keep the user on the same server."
- Pros: Extremely smart, highly precise, and handles content-aware routing.
- Cons: Slower than L4 because decrypting TLS and parsing strings requires more CPU overhead.
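The routing rules above boil down to simple conditionals on the parsed request. Here is a minimal sketch; the cluster names are hypothetical, and real L7 balancers (Nginx, Envoy, AWS ALB) express the same logic as configuration rather than code.

```python
import hashlib

def hash_to_bucket(key: str, n: int) -> int:
    """Map a string key onto one of n buckets."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

def route(path: str, cookies: dict) -> str:
    """Decide a target cluster from the decrypted, parsed HTTP request."""
    if path.startswith("/api"):
        return "api-cluster"
    if path.startswith("/video"):
        return "video-cluster"
    if "session_id" in cookies:
        # Sticky session: the same cookie always hashes to the same server.
        return f"web-server-{hash_to_bucket(cookies['session_id'], 4)}"
    return "default-cluster"
```

Notice that none of these decisions are possible at L4: the path and cookies only exist after TLS termination and HTTP parsing.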
Real-World Applications Where L7 Makes Sense:
- E-Commerce & SaaS User Logins (Sticky Sessions): Shopping carts and login systems require users to stay "stuck" to the same server holding their temporary state. L7 uses Consistent Hashing on the Session Cookie to achieve this.
- Microservices & API Gateways (Path-Based Routing): Routing client requests based on URL path. For example, a single domain `api.company.com` needs to route `/payments` to the Payments Microservice and `/auth` to the Identity Microservice.
- Header-Based Security & Optimization: Routing users to specific backend pools based on their device (`User-Agent` header) or language (`Accept-Language` header).
- Canary Deployments & A/B Testing: Routing 10% of users with an experimental cookie (`group=canary`) to a new test server, and the other 90% to the stable production server.
The TCP/IP Reality: Why No L5 or L6 Balancers?
The theoretical OSI Model has 7 layers, including Layer 5 (Session) and Layer 6 (Presentation). However, the real-world internet runs on the TCP/IP Model.
In the TCP/IP model, Layers 5, 6, and 7 are consolidated under a single layer: Layer 7 (Application).
```
THEORETICAL (OSI Model)           PRACTICAL (TCP/IP Model)
┌──────────────────────────┐      ┌──────────────────────────┐
│ Layer 7: Application     │ ──►  │                          │
├──────────────────────────┤      │                          │
│ Layer 6: Presentation    │ ──►  │  Layer 7: Application    │
├──────────────────────────┤      │                          │
│ Layer 5: Session         │ ──►  │                          │
└──────────────────────────┘      └──────────────────────────┘
```
Because of this consolidation, when an L7 load balancer decrypts SSL/TLS (Layer 6) or manages sessions and connections (Layer 5), all of that work happens in the same software process that parses HTTP (Layer 7). That is why we refer to it simply as an L7 Load Balancer.
Load Balancing Algorithms
| Algorithm | How It Works | Best For |
|---|---|---|
| Round Robin | Routes requests sequentially (Server 1 → 2 → 3 → 1). | Stateless servers (simplest approach). |
| Least Connections | Routes to the server with the fewest active requests. | Long-running queries, downloads, or video streaming. |
| Weighted Round Robin | Sends more requests to beefier, higher-capacity servers. | Mixed-capacity server environments. |
| Consistent Hashing | Maps requests and servers clockwise onto a virtual ring. | Stateful servers (Sticky Sessions) and distributed caches. |
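The first two algorithms in the table are simple enough to sketch directly. This is illustrative pseudocode-made-runnable, with invented server names; production balancers also handle health checks, weights, and concurrency.

```python
from itertools import count

class RoundRobin:
    """Cycle through servers in order: 1 -> 2 -> 3 -> 1 -> ..."""
    def __init__(self, servers):
        self.servers = servers
        self._i = count()

    def pick(self):
        return self.servers[next(self._i) % len(self.servers)]

class LeastConnections:
    """Route each new request to the server with the fewest live connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}   # live connection counts

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1                # connection opened
        return server

    def release(self, server):
        self.active[server] -= 1                # connection closed

rr = RoundRobin(["s1", "s2", "s3"])
assert [rr.pick() for _ in range(4)] == ["s1", "s2", "s3", "s1"]
```

The key difference: Round Robin is stateless and ignores how long requests take, while Least Connections tracks per-server state, which is why it suits long-running downloads and streams.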
Consistent Hashing for "Sticky Sessions"
When servers are stateful (e.g., they hold local user WebSocket connections or in-memory chat buffers), the load balancer must achieve Session Stickiness: sending the same user to the same server.
The Modulo Hazard (Why Modulo Fails)
If you hash user IPs using standard modulo:
$$\text{Server ID} = \text{hash}(\text{user\_ip}) \pmod N$$
When a server crashes or you auto-scale (changing $N$ by just 1), nearly every key maps to a new result: roughly $(N-1)/N$ of your users (about 90% for $N = 10$) suddenly get routed to the wrong server and logged out.
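You can measure the hazard directly. The client IPs below are fabricated for the demo; the experiment simply compares `mod 10` against `mod 11` assignments.

```python
import hashlib

def bucket(key: str, n: int) -> int:
    """Naive modulo placement: hash the key, take it mod the server count."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

ips = [f"10.1.{i // 256}.{i % 256}" for i in range(5000)]   # fake client IPs

before = {ip: bucket(ip, 10) for ip in ips}   # 10 servers
after  = {ip: bucket(ip, 11) for ip in ips}   # one server added

moved = sum(before[ip] != after[ip] for ip in ips) / len(ips)
print(f"{moved:.0%} of users were remapped")  # roughly 90%
```

Only keys whose hash happens to give the same remainder mod 10 and mod 11 stay put, which is about 1 key in 11.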
The Consistent Hashing Solution
By mapping servers and client requests onto a circular virtual ring, adding or removing a server only remaps a tiny fraction ($\approx 1/N$) of users. Everyone else stays connected with their sessions completely unbroken!
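A minimal ring can be built with a sorted list and binary search. This is a sketch under two assumptions: 100 virtual nodes per server (real systems tune this), and MD5 as the hash (any uniform hash works). It omits replication, weights, and locking.

```python
import bisect
import hashlib

def _point(key: str) -> int:
    """Hash a string to a position on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """A toy consistent-hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self.ring = []                      # sorted list of (point, server)
        for s in servers:
            self.add(s)

    def add(self, server):
        for i in range(self.vnodes):        # scatter virtual nodes on the ring
            bisect.insort(self.ring, (_point(f"{server}#{i}"), server))

    def get(self, key):
        idx = bisect.bisect(self.ring, (_point(key), ""))
        return self.ring[idx % len(self.ring)][1]   # first server clockwise

ring = HashRing([f"s{i}" for i in range(10)])
users = [f"user-{i}" for i in range(2000)]
before = {u: ring.get(u) for u in users}

ring.add("s10")                             # scale out by one server
moved = sum(before[u] != ring.get(u) for u in users) / len(users)
print(f"only {moved:.0%} of users moved")   # close to 1/11, not 90%
```

The virtual nodes matter: with only one point per server, the arcs are uneven and one unlucky server can absorb a huge slice of the ring.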
The Corporate "NAT / Proxy" Problem (L4 vs L7 Hashing)
> [!IMPORTANT]
> The Golden Law of Sticky Sessions:
- L4 Layer Limitation: A unique User UUID or Session ID is simply not visible at the L4 layer. These values live inside the HTTP headers/payload, which L4 cannot decrypt or read, so they are only available at Layer 7.
- The 2,000-Employee Hotspot Risk: If you establish sticky sessions at Layer 4, you are forced to hash on the Client IP Address. However, client IP addresses are not always unique! If 2,000 employees in a corporate office connect to your app, they will all share the exact same public IP (due to NAT). This causes the L4 load balancer to stick all 2,000 employees onto a single backend server, overloading it instantly while other servers sit idle.
- IP Hash at L4: Fast and simple, but risks overloading servers if multiple users connect from shared corporate networks or public hotspots (the NAT/Proxy bottleneck).
- Cookie/UUID Hash at L7: Opens the packet, decrypts TLS, extracts the unique Session Cookie or User UUID, and hashes on it. Even if 2,000 users connect from the same office IP, they will have 2,000 distinct session IDsβresulting in perfect, uniform load balancing across your cluster.
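The hotspot is easy to reproduce in a few lines. The office IP and session IDs below are invented; the contrast is between hashing one shared value (the NAT'd IP) versus 2,000 distinct values (the cookies).

```python
import hashlib
from collections import Counter

def bucket(key: str, n: int) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

SERVERS = 5
office_ip = "198.51.100.42"                             # one NAT'd public IP
session_ids = [f"sess-{i:04d}" for i in range(2000)]    # 2,000 unique cookies

# L4: can only hash the source IP -> every employee lands on one server.
l4 = Counter(bucket(office_ip, SERVERS) for _ in session_ids)
print("L4 IP hash:", dict(l4))          # a single server takes all 2,000

# L7: hashes the unique session cookie -> load spreads evenly.
l7 = Counter(bucket(s, SERVERS) for s in session_ids)
print("L7 cookie hash:", dict(l7))      # roughly 400 per server
```

The L4 counter has exactly one bucket with 2,000 connections; the L7 counter spreads them almost uniformly across all five servers.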
Architectural Case Studies: When to Choose L4 vs. L7
Case Study A: The Stateless Public App (e.g., News Reading Apps / Wikipedia)
- The Scenario: You are building a static public consumption app like a news reading app or a public wiki. Every visitor reads the exact same articles, and there is no user login state or shopping cart.
- The Architecture:
- No Sticky Sessions Needed: Because the app is 100% stateless, any server can answer any request at any millisecond.
- L4 is the Perfect Choice: Since sticky sessions are not required, we put a blazing-fast L4 Load Balancer in front running simple Round Robin or Least Connections. It is highly cost-effective, has minimal CPU requirements, and handles massive traffic surges effortlessly.
Case Study B: The Internal Enterprise Network (Corporate Intranets / VPCs)
- The Scenario: You are building a stateful application (like an internal corporate chat tool) deployed strictly on an internal corporate intranet or secure Virtual Private Cloud (VPC).
- The Architecture:
- Private IP Uniqueness: Unlike the public internet, every corporate device inside an enterprise is assigned a unique, non-shared Private IP Address by the internal DHCP server. There are no external NAT gateway collisions!
- L4 IP-Based Hashing Wins: Because IP addresses are guaranteed to be unique inside the private intranet, you can safely use L4 Consistent Hashing based on Client IP to achieve sticky sessions. It gives you session stickiness at blazing-fast, hardware-level speeds with zero SSL decryption/TLS parsing overhead!
Case Study C: The Public Consumer App (e.g., E-Commerce / Chat Apps)
- The Scenario: You are building a public stateful app (like a retail site or messaging client) exposed to a large consumer base on the open internet.
- The Architecture:
- NAT Bottlenecks: Users connect from shared networks (public Wi-Fi, cellular towers, home routers) where thousands of users share a single public IP. Hashing on IP at L4 will cause severe server hotspots.
- L7 Cookie-Based Hashing is Mandatory: To ensure session stickiness without overload, you must use an L7 Load Balancer that decrypts the traffic and hashes on unique User UUIDs or Session Cookies.
Deep Dive: How Zoom/Video Calls Load Balance (The Signaling vs. Media Split)
Video calls pose a massive architectural challenge. In a video call platform (like Zoom, Google Meet, or Discord Voice), all participants in the same video room MUST connect to the exact same server instance (called an SFU, a Selective Forwarding Unit) so the server can forward their audio/video streams to one another.
However, raw video streams are incredibly heavy. If you tried to route gigabytes of encrypted video traffic through a single smart L7 HTTP load balancer to read a "Room UUID," the L7 load balancer's CPU would instantly melt from the decryption overhead.
To solve this, real-world platforms use a brilliant architectural pattern called The Signaling vs. Media Plane Split (Two-Tier Routing):
```
 1. Join Room 123 (HTTP)  ┌───────────────────┐   3. Direct Media Server IP
─────────────────────────►│ L7 Load Balancer  │───────────────────────────┐
                          └─────────┬─────────┘                           │
                           2. Route │                                     │
                                    ▼                                     │
                          ┌─────────────────────┐    ┌─────────────────┐  │
                          │  Signaling Server   │    │   Zoom Client   │◄─┘
                          │ (Room Orchestrator  │    └────────┬────────┘
                          │  finds Server B)    │             │ 4. Direct Stream
                          └─────────────────────┘             │    (UDP IP:Port)
                                                              ▼
                          ┌─────────────────────┐    ┌─────────────────────┐
                          │   Media Server A    │    │   Media Server B    │
                          │ (Room 456 Active)   │    │ (Room 123 Active)   │
                          └─────────────────────┘    └─────────────────────┘
```
Tier 1: The Signaling Plane (L7 HTTP/WebSockets): "The Booking Agent"
- When you click "Join Room `123`", your client does not send video yet. It first sends a tiny, lightweight HTTP request through the L7 Load Balancer.
- Because this is L7, the load balancer parses the payload and sends it to a Room Orchestrator Service.
- The Orchestrator looks at its database and says: "Room `123` is currently hosted on Media Server B (IP: `54.12.34.56`, Port: `50005`)."
- The Signaling Server replies to your client: "Welcome to Room 123! Go stream your video directly to IP `54.12.34.56` on Port `50005`!"
Tier 2: The Media Plane (L4 UDP): "The Direct Highway"
- Now, your Zoom client opens a raw UDP socket directly to `54.12.34.56:50005`.
- It bypasses the L7 HTTP load balancers entirely!
- The raw, heavy video packets are routed directly at Layer 4 (Transport/IP level) straight to that dedicated media server.
Why this is so brilliant:
- Zero L7 CPU Overhead for Video: Your expensive L7 load balancers only handle the lightweight "handshakes" (signaling). They never touch a single frame of the heavy video data.
- Perfect Stickiness Without Packet Reading: You get perfect room-level stickiness because the client was explicitly told the exact IP address of the target server during the handshake phase!
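The signaling tier boils down to a lookup plus a reply. Here is a toy sketch: the room registry and the second server's address are invented for illustration (the `54.12.34.56:50005` pair reuses the article's example), and a real orchestrator would back this with a database and health checks.

```python
# Maintained by the Room Orchestrator: which media server hosts which room.
ROOM_REGISTRY = {
    "123": ("54.12.34.56", 50005),    # Room 123 lives on Media Server B
    "456": ("54.12.34.99", 50006),    # Room 456 lives on Media Server A
}

def handle_join(room_id: str) -> dict:
    """Tier 1 (L7 HTTP): answer a lightweight 'join room' request.

    The response tells the client exactly where to open its raw UDP
    stream (Tier 2, L4). The video itself never passes through here.
    """
    ip, port = ROOM_REGISTRY[room_id]
    return {"room": room_id, "media_ip": ip, "media_port": port}

reply = handle_join("123")
# The client would now open a UDP socket straight to
# reply["media_ip"]:reply["media_port"], bypassing the L7 tier entirely.
```

Stickiness falls out for free: every client asking for room `123` is handed the same address, so no per-packet routing decision is ever needed.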
Interview Tips & Talking Points
- Never assume stateless: If an interviewer asks you to scale a WebSocket-based chat app, immediately mention: "Since connections are stateful, we will use an L7 load balancer with Consistent Hashing on the User UUID to maintain sticky sessions."
- Mention NAT Limitation: When asked about using IP Hash for routing, bring up the Corporate NAT/Proxy problemβit proves you understand real-world network traffic.
- Default to L7 for API Gateways: In microservices, put an L7 load balancer (like Nginx/Envoy) at the front to handle path-based routing (`/api/v1/users` → User Service).
Study Resources
- Gaurav Sen: "Load Balancing" – Excellent coverage of L4 vs L7.
- Gaurav Sen: "Consistent Hashing" – Critical video to watch.
- ByteByteGo: "Load Balancer" – High-quality animated visual guide.