Introduction
A load balancer is a traffic director that sits between clients and a pool of backend servers, deciding which server handles each request and distributing traffic according to a policy the developer configures.
Load balancers exist to distribute workload, scale with traffic dynamically, and improve service availability: three properties you don't get from a single server, no matter how big or sophisticated you make it.
Load balancers come in two main kinds:
- L4 (Transport Layer): Often hardware-based and found in data centers. Routes based on IP, port, and TCP/UDP connection state. Doesn't inspect content, which keeps it extremely fast; best suited to simple load-balancing needs.
- L7 (Application Layer): Inspects HTTP headers, paths, cookies, even request bodies. Enables content-based routing and richer security controls. Slower per request than L4 but vastly more flexible, and it runs on commodity hardware, which makes it suitable for a wider range of applications.
A third category is Global Server Load Balancing (GSLB). It operates above both, distributing traffic across regions using DNS-based or anycast routing. That's out of scope for this post, but it's worth knowing that GSLB usually sits on top of the L4/L7 layer.
Both L4 and L7 can be managed or self-hosted. With managed L7 you trade low-level control for less ops work. With managed L4 you get a limited feature set with minimal ops, without the complexity of running the load balancer yourself.
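To make the L4/L7 split concrete, here is a minimal sketch of an L4 (TCP) load balancer using nginx's stream module (assuming your nginx build includes it; the backend names and port are placeholders). Unlike the L7 http configs later in this post, this one forwards raw bytes and never parses the request:
events { worker_connections 1024; }
stream {
    upstream tcp_backend {
        server backend1:5678;
        server backend2:5678;
    }
    server {
        listen 9000;
        # raw TCP forwarding: no headers, paths, or cookies are inspected
        proxy_pass tcp_backend;
    }
}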
One design area worth exploring in depth is the algorithms a load balancer uses to decide how to pick the destination.
Types of Load Balancer Algorithms
Load balancer algorithms come in two flavors: stateful and stateless. Stateful algorithms track per-backend metrics to make a decision. Stateless algorithms decide from configuration alone. Stateful algorithms are smarter but require the LB to track more state, while stateless algorithms are faster but fit narrower use cases.
- Round Robin: Rotate through the available servers in a loop.
- IP Hash: We want the same client to always hit the same backend. A function like hash(client_ip) % servers_count does this.
- Weighted Round Robin: Send proportionally more traffic to more capable servers. If A has weight 5 and B and C each have weight 1, A gets ~5/7 of the traffic.
- URL Hash: Certain paths always go to the same backend. One use case is cache locality: if /users has a hot cache of user data, we want requests for it to hit the backend holding that cache, saving memory everywhere else.
- Least Connections: Route to the backend with the fewest active connections right now.
- Random Two with Least Connections: Pick two backends at random and send to whichever has fewer connections.
One algorithm missing from that list is the sticky cookie, where the load balancer pins each client's session to one backend via a cookie. The use case sounds identical to IP hashing, but they are different: a cookie identifies the client itself, so routing stays accurate even when many clients share one IP.
As the saying goes, "talk is cheap, show me the code." It's actually really easy to do all of this with a well-known load balancer such as nginx.
- Round Robin with Nginx
events { worker_connections 1024; }
http {
    upstream backend {
        server backend1:5678;
        server backend2:5678;
        server backend3:5678;
    }
    server {
        listen 80;
        location / { proxy_pass http://backend; }
    }
}
- IP Hashing
events { worker_connections 1024; }
http {
    upstream backend {
        ip_hash;
        server backend1:5678;
        server backend2:5678;
        server backend3:5678;
    }
    server {
        listen 80;
        location / { proxy_pass http://backend; }
    }
}
- Weighted Round Robin
events { worker_connections 1024; }
http {
    upstream backend {
        server backend1:5678 weight=5;
        server backend2:5678 weight=1;
        server backend3:5678 weight=1;
    }
    server {
        listen 80;
        location / { proxy_pass http://backend; }
    }
}
- URL Hashing
events { worker_connections 1024; }
http {
    upstream backend {
        hash $request_uri consistent;
        server backend1:5678;
        server backend2:5678;
        server backend3:5678;
    }
    server {
        listen 80;
        location / { proxy_pass http://backend; }
    }
}
- Least Connections
events { worker_connections 1024; }
http {
    upstream backend {
        least_conn;
        server backend1:5678;
        server backend2:5678;
        server backend3:5678;
    }
    server {
        listen 80;
        location / { proxy_pass http://backend; }
    }
}
- The more modern approach (Power of Two Choices)
events { worker_connections 1024; }
http {
    upstream backend {
        random two least_conn;
        server backend1:5678;
        server backend2:5678;
        server backend3:5678;
    }
    server {
        listen 80;
        location / { proxy_pass http://backend; }
    }
}
Finally, the sticky cookie (note: this one is NGINX Plus only):
events { worker_connections 1024; }
http {
    upstream backend {
        sticky cookie srv_id expires=1h path=/;
        server backend1:5678;
        server backend2:5678;
        server backend3:5678;
    }
    server {
        listen 80;
        location / { proxy_pass http://backend; }
    }
}
Bonus: Observability
Algorithms are only part of the story; the other part is knowing what's actually happening in production. Load balancers give you huge leverage when it comes to visibility. Here is what they help you see about your services (a minimal config sketch follows the list):
- Performance: response time, throughput, P50/P99 latency, the LB's own metrics.
- Health: failed health checks, active health check status.
- Errors: HTTP error rates, dropped connections.
- Traffic: total connections, request rate.
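As a sketch of the free tier, open-source nginx ships the stub_status module for basic counters, and the access log can record per-request upstream latency. The port, log path, and format name here are placeholders I picked for illustration:
events { worker_connections 1024; }
http {
    # record per-request latency next to the usual fields
    log_format lb_metrics '$remote_addr "$request" $status '
                          'rt=$request_time urt=$upstream_response_time';
    access_log /var/log/nginx/lb_metrics.log lb_metrics;
    server {
        listen 8081;
        location /status {
            stub_status;        # active connections, accepted/handled, total requests
            allow 127.0.0.1;    # keep the metrics endpoint internal
            deny all;
        }
    }
}
Note that nginx doesn't compute P50/P99 percentiles itself; you derive those from these logs or from your metrics pipeline.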
All of this can be achieved for free with open-source, self-managed load balancers. If you want more, though, you have to pay. Pay for what exactly? Here are some examples of what paid LB versions give you:
- Active health checks: OSS nginx only does passive checks; it marks a backend down after real requests to it fail. Active checks probe backends proactively, letting you detect states like slow-but-alive (see the sketch after this list).
- More stateful algorithms: Nginx OSS gives you IP hash, which people often mistake for session-based load balancing. The edge case they miss is that multiple clients can share one IP (NAT, corporate proxies), so an IP address is a poor proxy for a session.
- Least Time: NGINX Plus can route by observed response latency.
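For illustration, here is roughly what those last two features look like in NGINX Plus, based on its documented health_check and least_time directives; the interval and thresholds are made-up values:
events { worker_connections 1024; }
http {
    upstream backend {
        zone backend 64k;      # shared memory zone, required for active checks
        least_time header;     # route to the backend with the fastest response time
        server backend1:5678;
        server backend2:5678;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://backend;
            # actively probe each backend instead of waiting for real requests to fail
            health_check interval=5s fails=2 passes=3;
        }
    }
}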
Best regards,
Ahmed