
郑沛沛

Load Balancing Explained: Algorithms, Patterns, and Real Configs

Load balancing distributes traffic across multiple servers. It's essential for scalability and reliability. Here's everything you need to know.

Why Load Balance?

  • Handle more traffic than one server can manage
  • Eliminate single points of failure
  • Enable zero-downtime deployments
  • Distribute load geographically

Load Balancing Algorithms

Round Robin

Simplest approach — rotate through servers:

upstream backend {
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
    server 10.0.0.3:8000;
}
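The same rotation is easy to sketch in Python with `itertools.cycle` (server addresses are illustrative, matching the config above):

```python
from itertools import cycle

# Round robin: rotate through the servers in order, wrapping around.
servers = ["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"]
rotation = cycle(servers)

def next_server() -> str:
    return next(rotation)

# Six requests hit each server twice, in order.
picks = [next_server() for _ in range(6)]
```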

Weighted Round Robin

Give more traffic to stronger servers:

upstream backend {
    server 10.0.0.1:8000 weight=5;  # 5x the share of the weight=1 server
    server 10.0.0.2:8000 weight=3;
    server 10.0.0.3:8000 weight=1;
}
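One way to sketch weighted selection in Python is `random.choices`, which draws proportionally to the weights (a randomized approximation, not nginx's deterministic rotation; addresses are illustrative):

```python
import random

# Weights mirror the nginx config above: 5, 3, 1.
weighted = {"10.0.0.1:8000": 5, "10.0.0.2:8000": 3, "10.0.0.3:8000": 1}

def pick_server() -> str:
    # Draw a server with probability proportional to its weight,
    # so the weight=5 server receives ~5/9 of requests over time.
    servers, weights = zip(*weighted.items())
    return random.choices(servers, weights=weights)[0]

counts = {s: 0 for s in weighted}
for _ in range(9000):
    counts[pick_server()] += 1
# counts will be roughly 5000 / 3000 / 1000
```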

Least Connections

Send to the server with fewest active connections:

upstream backend {
    least_conn;
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
    server 10.0.0.3:8000;
}
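The idea reduces to tracking in-flight connections per backend and always picking the minimum. A minimal sketch (addresses illustrative):

```python
# Active-connection counts per backend.
active = {"10.0.0.1:8000": 0, "10.0.0.2:8000": 0, "10.0.0.3:8000": 0}

def acquire() -> str:
    # Pick the server with the fewest in-flight connections.
    server = min(active, key=active.get)
    active[server] += 1
    return server

def release(server: str) -> None:
    active[server] -= 1

# Three concurrent connections spread across all three servers.
picked = {acquire() for _ in range(3)}
```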

IP Hash (Sticky Sessions)

Same client always goes to same server:

upstream backend {
    ip_hash;
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
}
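Under the hood this is just a stable hash of the client address modulo the server count. A simplified illustration in Python (nginx's `ip_hash` uses its own hash over the first octets of the IPv4 address; this only shows the principle):

```python
import hashlib

servers = ["10.0.0.1:8000", "10.0.0.2:8000"]

def server_for(client_ip: str) -> str:
    # Hash the client address so the same client always maps
    # to the same backend.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]
```

Note the trade-off: if the server list changes, most clients get remapped to a different backend.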

Full Nginx Load Balancer Config

http {
    upstream api_servers {
        least_conn;
        server 10.0.0.1:8000 max_fails=3 fail_timeout=30s;
        server 10.0.0.2:8000 max_fails=3 fail_timeout=30s;
        server 10.0.0.3:8000 backup;  # only used when others are down
    }

    server {
        listen 80;
        server_name api.example.com;

        location / {
            proxy_pass http://api_servers;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_connect_timeout 5s;
            proxy_read_timeout 30s;
        }

        location /health {
            access_log off;
            return 200 "OK";
        }
    }
}

Health Checks

# Application health endpoint
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

@app.get("/health")
async def health():
    # check_db, check_redis, and check_disk_space are app-specific
    # helpers returning True/False; implement them for your stack.
    checks = {
        "database": await check_db(),
        "cache": await check_redis(),
        "disk": check_disk_space(),
    }
    healthy = all(checks.values())
    # Return 503 when unhealthy so the load balancer pulls this node
    # out of rotation -- a 200 body saying "unhealthy" won't.
    return JSONResponse(
        status_code=200 if healthy else 503,
        content={"status": "healthy" if healthy else "unhealthy", "checks": checks},
    )

Application-Level Load Balancing

import asyncio
import random
import httpx

class LoadBalancer:
    def __init__(self, servers: list[str]):
        self.servers = servers
        self.healthy = set(servers)

    async def request(self, path: str) -> dict:
        available = list(self.healthy)
        random.shuffle(available)

        for server in available:
            try:
                async with httpx.AsyncClient() as client:
                    resp = await client.get(f"{server}{path}", timeout=5.0)
                    return resp.json()
            except httpx.HTTPError:
                # Mark the server unhealthy and re-probe it later.
                self.healthy.discard(server)
                self._schedule_health_check(server)

        raise RuntimeError("No healthy servers available")

    def _schedule_health_check(self, server: str, delay: float = 10.0) -> None:
        # Re-check the failed server in the background after a delay.
        async def probe():
            await asyncio.sleep(delay)
            await self._health_check(server)

        asyncio.create_task(probe())

    async def _health_check(self, server: str):
        try:
            async with httpx.AsyncClient() as client:
                resp = await client.get(f"{server}/health", timeout=2.0)
                if resp.status_code == 200:
                    self.healthy.add(server)
        except httpx.HTTPError:
            pass

lb = LoadBalancer(["http://10.0.0.1:8000", "http://10.0.0.2:8000", "http://10.0.0.3:8000"])

Layer 4 vs Layer 7

  • Layer 4 (TCP): Faster, no content inspection. Use for databases, raw TCP.
  • Layer 7 (HTTP): Can route by URL, headers, cookies. Use for web apps.

# Layer 7: Route by path
location /api/ {
    proxy_pass http://api_servers;
}
location /static/ {
    proxy_pass http://cdn_servers;
}
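For Layer 4, nginx balances raw TCP in a `stream` block instead of `http`. A minimal sketch for a database pool (addresses and ports are illustrative):

```nginx
stream {
    upstream postgres_servers {
        least_conn;
        server 10.0.0.1:5432;
        server 10.0.0.2:5432;
    }

    server {
        listen 5432;
        proxy_pass postgres_servers;
        proxy_connect_timeout 5s;
    }
}
```

Since there's no HTTP here, routing by path or header isn't possible; the trade-off is lower overhead per connection.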

Key Takeaways

  1. Start with round robin, switch to least connections under load
  2. Always configure health checks and failure thresholds
  3. Use backup servers for failover
  4. Layer 7 for HTTP apps, Layer 4 for databases
  5. Monitor server response times to detect imbalances
  6. Sticky sessions only when absolutely necessary (they reduce distribution)

