Modern applications don't fail because of bad code; they fail because of traffic. As users grow, requests surge, and systems face uneven load, relying on a single server becomes a guaranteed bottleneck. This is where load balancing comes in. Load balancing is the fundamental technique that allows applications to scale horizontally, remain highly available, and deliver fast responses even under extreme demand. By intelligently distributing incoming requests across multiple servers, load balancers prevent overload, reduce latency, and ensure reliability. From global platforms like Netflix and Amazon to everyday APIs and microservices, load balancing is the invisible force that keeps modern systems running smoothly.
1. Load Balancing: The One-Line Idea
Load balancing distributes incoming requests across multiple servers so no single server becomes overloaded, slow, or crashes.
Why this matters:
- Computers have limits
- Traffic is uneven
- Failures are inevitable
Load balancing is how real-world systems survive.
2. Mental Model (Very Important)
Without Load Balancer

Users → Server
          → overloaded
          → slow
          → crash

With Load Balancer

Users
  ↓
Load Balancer
  ↓
Server A   Server B   Server C
👉 The load balancer is the brain + traffic cop.
3. What a Load Balancer Actually Does
At runtime, a load balancer:
- Receives client requests
- Checks which servers are available
- Applies a routing algorithm
- Forwards the request
- Monitors server health
- Removes failed servers automatically
At scale, this cycle repeats millions of times per second.
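To make that loop concrete, here is a minimal Node.js sketch of a load balancer: it receives a request, picks a backend from a hard-coded pool, and forwards it. The ports and the simple rotation are assumptions for illustration; a real load balancer adds health checks, timeouts, and retries.

```js
// load-balancer-sketch.js - a minimal reverse proxy (illustrative only)
const http = require("http");

// Assumed backend pool - matches the example servers used later in this post
const backends = [
  { host: "localhost", port: 3001 },
  { host: "localhost", port: 3002 },
  { host: "localhost", port: 3003 },
];

let next = 0;

http
  .createServer((clientReq, clientRes) => {
    // 1. Receive the request, 2. pick a backend, 3. forward it
    const target = backends[next];
    next = (next + 1) % backends.length;

    const proxyReq = http.request(
      {
        host: target.host,
        port: target.port,
        path: clientReq.url,
        method: clientReq.method,
        headers: clientReq.headers,
      },
      (proxyRes) => {
        clientRes.writeHead(proxyRes.statusCode, proxyRes.headers);
        proxyRes.pipe(clientRes);
      }
    );

    // If the chosen backend is down, fail this request instead of crashing
    proxyReq.on("error", () => {
      if (!clientRes.headersSent) clientRes.writeHead(502);
      clientRes.end("Bad gateway");
    });

    clientReq.pipe(proxyReq);
  })
  .listen(8080, () => console.log("Load balancer listening on :8080"));
```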
4. When You Need Load Balancing (The "When")
You need load balancing when:
🔥 High Traffic
- Netflix
- Amazon sales
- Social media feeds
📈 Scaling
- One server → many servers
- Horizontal scaling
🛡️ Fault Tolerance
- One server fails → traffic rerouted
- No downtime
⚡ Performance
- Route to fastest / closest server
🧠 Parallel Workloads
- APIs
- ML inference
- Data processing
5. Where Load Balancing Lives (OSI Layers)
Layer 4 (Transport Layer)
- Works with IP + Port
- TCP / UDP
- Very fast
- Less intelligent
Example:
Send traffic on port 443 to the server with the fewest active connections
Used for:
- Databases
- Simple services
Layer 7 (Application Layer)
- Understands HTTP/HTTPS
- Looks at:
  - URL paths
  - Headers
  - Cookies
Example:
/login → auth servers
/video → streaming servers
/api → API servers
Netflix, Google, AWS → heavy Layer 7 usage
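In code, the Layer 7 idea is simply: look at the request before choosing a pool. A tiny sketch, where the pool names and ports are made-up placeholders:

```js
// Pick a backend pool based on the URL path - the core of Layer 7 routing
const pools = {
  auth: ["localhost:4001"],                  // assumed auth servers
  video: ["localhost:5001"],                 // assumed streaming servers
  api: ["localhost:3001", "localhost:3002"], // assumed API servers
};

function routeByPath(url) {
  if (url.startsWith("/login")) return pools.auth;
  if (url.startsWith("/video")) return pools.video;
  return pools.api; // default: API servers
}

console.log(routeByPath("/video/intro.mp4")); // [ 'localhost:5001' ]
```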
6. Load Balancing Algorithms (The Brain)
| Algorithm | How It Works | When to Use |
|---|---|---|
| Round Robin | One by one | Equal servers |
| Least Connections | Fewest active users | Real-world traffic |
| Least Response Time | Fastest server | Low latency apps |
| IP Hash | Same user β same server | Sessions |
| Weighted | Strong servers get more traffic | Mixed hardware |
👉 Most common in practice:
Least Connections + Health Checks
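As a rough sketch, the two most common algorithms fit in a few lines of JavaScript each. The server names and connection counts below are invented for illustration; a real load balancer updates the counts as requests start and finish.

```js
// Round Robin: hand out servers one by one, in order
function makeRoundRobin(servers) {
  let i = 0;
  return () => servers[i++ % servers.length];
}

// Least Connections: pick the server with the fewest active connections
function leastConnections(servers) {
  return servers.reduce((best, s) => (s.active < best.active ? s : best));
}

const pool = [
  { name: "A", active: 12 },
  { name: "B", active: 3 },
  { name: "C", active: 7 },
];

const nextRR = makeRoundRobin(pool);
console.log(nextRR().name, nextRR().name); // A B
console.log(leastConnections(pool).name);  // B
```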
7. Types of Load Balancers (Real Systems)
1️⃣ Hardware
- Physical devices
- Very fast
- Very expensive
Used by banks, telecoms
2️⃣ Software
Runs on normal machines:
- NGINX
- HAProxy
- Envoy
Flexible, popular, powerful
3️⃣ Cloud Load Balancers (Most Used Today)
- AWS → ELB / ALB / NLB
- Google → Cloud LB
- Azure → Azure LB
Benefits:
- Auto-scaling
- Built-in redundancy
- Easy setup
4️⃣ DNS Load Balancing
- DNS returns different IPs
- Good for global traffic
- No real-time health awareness
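You can see DNS-based balancing from the client side: one hostname resolves to several IPs, and the client simply picks one. A small sketch using Node's built-in resolver; the hostname is a placeholder for any domain that publishes multiple A records.

```js
const dns = require("dns").promises;

async function pickAddress(hostname) {
  // resolve4 returns every IPv4 address published for the hostname
  const addresses = await dns.resolve4(hostname);
  // Naive client-side choice: pick one at random
  return addresses[Math.floor(Math.random() * addresses.length)];
}

pickAddress("example.com").then((ip) => console.log("Connecting to", ip));
```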
8. Health Checks (Why Systems Don't Collapse)
The load balancer repeatedly asks each server:
GET /health
If server:
- Fails
- Times out
- Returns errors
❌ Removed from rotation
✅ Added back when healthy
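A health checker is essentially a timer that probes every backend and flips it in or out of the pool. A minimal sketch, assuming a /health endpoint and a 5-second interval (both are illustrative choices):

```js
const http = require("http");

const backends = [
  { host: "localhost", port: 3001, healthy: true },
  { host: "localhost", port: 3002, healthy: true },
  { host: "localhost", port: 3003, healthy: true },
];

function checkAll() {
  for (const b of backends) {
    const req = http.get(
      { host: b.host, port: b.port, path: "/health", timeout: 2000 },
      (res) => {
        b.healthy = res.statusCode === 200; // anything else -> out of rotation
        res.resume();                       // discard the response body
      }
    );
    req.on("timeout", () => req.destroy()); // treat a slow server as failed
    req.on("error", () => { b.healthy = false; });
  }
}

setInterval(checkAll, 5000); // re-probe every 5 seconds (assumed interval)

// The routing code then only considers backends where healthy === true
```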
9. Code Example 1: Simple Backend Servers (Node.js)
Let's create 3 servers.
server.js
const http = require("http");
const PORT = process.env.PORT;
const NAME = process.env.NAME;
http.createServer((req, res) => {
res.end(`Hello from ${NAME}\n`);
}).listen(PORT, () => {
console.log(`${NAME} running on port ${PORT}`);
});
Run:
PORT=3001 NAME=Server-A node server.js
PORT=3002 NAME=Server-B node server.js
PORT=3003 NAME=Server-C node server.js
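Each server answers with its own name, so you can spot-check them individually (assuming curl is installed):

```bash
curl http://localhost:3001   # Hello from Server-A
curl http://localhost:3002   # Hello from Server-B
curl http://localhost:3003   # Hello from Server-C
```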
10. Code Example 2: NGINX Load Balancer
nginx.conf
events {}

http {
    upstream backend_servers {
        # least_conn: send each request to the server with the fewest active connections
        least_conn;
        server localhost:3001;
        server localhost:3002;
        server localhost:3003;
    }

    server {
        listen 80;

        location / {
            # Forward requests to the pool defined above
            proxy_pass http://backend_servers;
        }
    }
}
What's happening?
- `upstream` = the server pool
- `least_conn` = the algorithm
- NGINX distributes traffic automatically
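Assuming NGINX is installed and the three Node servers above are running, you can point NGINX at this file and watch responses rotate across the backends (the config path below is a placeholder, and binding port 80 may require elevated privileges):

```bash
nginx -c /full/path/to/nginx.conf   # start NGINX with this config (absolute path)
curl http://localhost/              # Hello from Server-A (for example)
curl http://localhost/              # Hello from Server-B
```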
11. Session Persistence (Sticky Sessions)
For logged-in users:
upstream backend_servers {
    ip_hash;
    server localhost:3001;
    server localhost:3002;
}
👉 Same user → same server
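Conceptually, ip_hash maps each client IP to a fixed backend. Here is a simplified sketch of the idea in JavaScript; this is not NGINX's exact hashing, just the principle:

```js
const crypto = require("crypto");

const servers = ["localhost:3001", "localhost:3002", "localhost:3003"];

// Hash the client IP and map it onto the pool, so the same IP always
// lands on the same backend (as long as the pool doesn't change).
function pickServer(clientIp) {
  const digest = crypto.createHash("md5").update(clientIp).digest();
  return servers[digest.readUInt32BE(0) % servers.length];
}

console.log(pickServer("203.0.113.7"));  // same output on every run
console.log(pickServer("198.51.100.9"));
```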
12. Load Balancer + Auto Scaling (Real Production)
| Load Balancer | Auto Scaling |
|---|---|
| Distributes traffic | Adds/removes servers |
| Prevents overload | Handles growth |
| Always on | Trigger-based |
Used together in:
- AWS
- Kubernetes
- Netflix
13. Netflix Example (End-to-End)
Netflix uses multiple layers:
- DNS → nearest region
- Edge Load Balancers → CDN
- Regional Load Balancers
- Service-to-service balancing
- Microservices
Each click:
User → DNS → Load Balancer → Service → Cache → Stream
Result:
- No overload
- Fast startup
- Global scale
14. Security Benefits
Load balancers can:
- Terminate SSL
- Hide backend IPs
- Rate limit traffic
- Mitigate DDoS
- Integrate WAF
So they're also security gates.
15. Common Pitfalls
❌ Single load balancer (SPOF)
❌ Sticky sessions everywhere
❌ No health checks
❌ Poor monitoring
❌ Wrong algorithm choice
16. What You Should Remember (Exam / Interview Gold)
✅ Load balancing distributes requests
✅ Prevents overload & downtime
✅ Uses algorithms to route traffic
✅ Health checks are critical
✅ Layer 7 = smarter routing
✅ Essential for scalable systems
17. How This Fits in Modern Systems
- Microservices → service mesh load balancing
- Kubernetes → Ingress + Services
- Cloud apps → Managed LBs
- CDNs → Global load balancing
In today's distributed world, load balancing is no longer an optimization; it's a necessity. Whether you're serving a small web app or a global platform with millions of users, effective load balancing ensures resilience, performance, and scalability. By combining the right algorithms, health checks, and infrastructure choices, systems can handle traffic spikes, survive failures, and grow without disruption. Understanding load balancing isn't just about infrastructure knowledge; it's about learning how real-world software stays alive under pressure. Master this concept, and you'll be thinking like a true systems engineer.
Thanks for reading!
Until next time, 🫡
Usman Awan (your friendly dev)