Load balancing distributes incoming traffic across multiple servers. It's essential for scalability and reliability. Here's a practical tour of the core techniques.
Why Load Balance?
- Handle more traffic than one server can manage
- Eliminate single points of failure
- Enable zero-downtime deployments
- Distribute load geographically
Load Balancing Algorithms
Round Robin
Simplest approach — rotate through servers:
```nginx
upstream backend {
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
    server 10.0.0.3:8000;
}
```
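The rotation itself is trivial to express in code. A minimal sketch in Python, using a hypothetical server pool that mirrors the upstream block above:

```python
from itertools import cycle

# Hypothetical pool mirroring the upstream block above.
servers = ["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"]
rotation = cycle(servers)

def next_server() -> str:
    """Return the next server in strict rotation."""
    return next(rotation)

# Six requests walk through the pool exactly twice.
picks = [next_server() for _ in range(6)]
```

Every server gets an equal share of requests regardless of how loaded it is, which is exactly why the weighted and least-connections variants below exist.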
Weighted Round Robin
Give more traffic to stronger servers:
```nginx
upstream backend {
    server 10.0.0.1:8000 weight=5;  # gets 5x traffic
    server 10.0.0.2:8000 weight=3;
    server 10.0.0.3:8000 weight=1;
}
```
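One naive way to sketch this in Python: repeat each server in the pool once per weight unit, then rotate over the expanded pool. (Nginx actually uses a "smooth" weighted round robin that interleaves picks more evenly, but the long-run 5:3:1 split is the same. The weights here are hypothetical, copied from the config above.)

```python
from collections import Counter

# Hypothetical weights mirroring the upstream block above (5:3:1).
weights = {"10.0.0.1:8000": 5, "10.0.0.2:8000": 3, "10.0.0.3:8000": 1}

# Expand each server into the pool weight-many times, then rotate.
pool = [server for server, w in weights.items() for _ in range(w)]

def pick(i: int) -> str:
    return pool[i % len(pool)]

# One full rotation (9 picks) reproduces the 5:3:1 split.
split = Counter(pick(i) for i in range(9))
```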
Least Connections
Send to the server with fewest active connections:
```nginx
upstream backend {
    least_conn;
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
    server 10.0.0.3:8000;
}
```
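Conceptually, the balancer keeps a count of in-flight connections per server and always picks the minimum. A minimal sketch, with assumed starting counts for illustration:

```python
# Assumed in-flight connection counts per server.
active = {"10.0.0.1:8000": 0, "10.0.0.2:8000": 4, "10.0.0.3:8000": 2}

def acquire() -> str:
    # Pick the server with the fewest active connections.
    server = min(active, key=active.get)
    active[server] += 1
    return server

def release(server: str) -> None:
    # Call when the connection finishes.
    active[server] -= 1
```

Unlike round robin, this adapts automatically when some requests are slow: a server stuck on long requests accumulates connections and stops receiving new ones.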
IP Hash (Sticky Sessions)
Same client always goes to same server:
```nginx
upstream backend {
    ip_hash;
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
}
```
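The idea is to hash the client address into a stable bucket, so the mapping never changes between requests. A simplified Python sketch (nginx's `ip_hash` actually hashes only the first three octets of an IPv4 address; this hashes the whole string):

```python
import hashlib

servers = ["10.0.0.1:8000", "10.0.0.2:8000"]

def server_for(client_ip: str) -> str:
    # Stable hash of the client address -> same client, same backend.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

The trade-off: if a server drops out of the pool, every client hashed to it gets remapped, losing its session state.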
Full Nginx Load Balancer Config
```nginx
http {
    upstream api_servers {
        least_conn;
        server 10.0.0.1:8000 max_fails=3 fail_timeout=30s;
        server 10.0.0.2:8000 max_fails=3 fail_timeout=30s;
        server 10.0.0.3:8000 backup;  # only used when the others are down
    }

    server {
        listen 80;
        server_name api.example.com;

        location / {
            proxy_pass http://api_servers;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_connect_timeout 5s;
            proxy_read_timeout 30s;
        }

        location /health {
            access_log off;
            return 200 "OK";
        }
    }
}
```
Health Checks
Open-source Nginx only does passive health checks (the max_fails / fail_timeout settings above mark a server down after failed requests). For active checks, expose a health endpoint from the application and probe it:
```python
# Application health endpoint (FastAPI). The check_* helpers below are
# illustrative stubs — swap in real probes for your stack.
import shutil

from fastapi import FastAPI

app = FastAPI()

async def check_db() -> bool:
    return True  # e.g. run SELECT 1 against your database

async def check_redis() -> bool:
    return True  # e.g. await redis.ping()

def check_disk_space(min_free_bytes: int = 1_000_000_000) -> bool:
    return shutil.disk_usage("/").free > min_free_bytes

@app.get("/health")
async def health():
    checks = {
        "database": await check_db(),
        "cache": await check_redis(),
        "disk": check_disk_space(),
    }
    healthy = all(checks.values())
    return {"status": "healthy" if healthy else "unhealthy", "checks": checks}
```
Application-Level Load Balancing
```python
import asyncio
import random

import httpx

class LoadBalancer:
    def __init__(self, servers: list[str]):
        self.servers = servers
        self.healthy = set(servers)  # servers currently believed healthy

    async def request(self, path: str) -> dict:
        # Try healthy servers in random order; evict any that fail.
        available = list(self.healthy)
        random.shuffle(available)
        for server in available:
            try:
                async with httpx.AsyncClient() as client:
                    resp = await client.get(f"{server}{path}", timeout=5.0)
                    return resp.json()
            except httpx.HTTPError:
                self.healthy.discard(server)
                self._schedule_health_check(server)
        raise RuntimeError("No healthy servers available")

    def _schedule_health_check(self, server: str) -> None:
        # Re-probe the failed server in the background.
        asyncio.create_task(self._health_check(server))

    async def _health_check(self, server: str):
        try:
            async with httpx.AsyncClient() as client:
                resp = await client.get(f"{server}/health", timeout=2.0)
            if resp.status_code == 200:
                self.healthy.add(server)
        except httpx.HTTPError:
            pass

lb = LoadBalancer(["http://10.0.0.1:8000", "http://10.0.0.2:8000", "http://10.0.0.3:8000"])
```
Layer 4 vs Layer 7
- Layer 4 (TCP): Faster, no content inspection. Use for databases, raw TCP.
- Layer 7 (HTTP): Can route by URL, headers, cookies. Use for web apps.
```nginx
# Layer 7: route by URL path
location /api/ {
    proxy_pass http://api_servers;
}

location /static/ {
    proxy_pass http://cdn_servers;
}
```
Key Takeaways
- Start with round robin, switch to least connections under load
- Always configure health checks and failure thresholds
- Use backup servers for failover
- Layer 7 for HTTP apps, Layer 4 for databases
- Monitor server response times to detect imbalances
- Use sticky sessions only when absolutely necessary (they reduce distribution)