João Godinho
Load Balancing 101

Overview

  • Have you ever noticed how websites stay up and running even when many people visit at once? That’s thanks to load balancing: it spreads the work across several servers, keeping everything running smoothly.
  • In this article, I will cover what load balancing is, how it works, and why it’s important. By the end, you’ll have a solid grasp of the theory and be ready to dive into the practical side!


What is Load Balancing?

Load balancing is the process of distributing computational workloads across two or more servers. It prevents any single resource from becoming overwhelmed, which enhances both performance and reliability. A load balancer is the service (or device) that performs this distribution.

Example: When choosing a checkout line at the grocery store, you have multiple lines (servers) and can select the one you believe will be the fastest based on some logic.

(Image: Load Balancer diagram. Source: CloudFront)

Use cases of Load Balancing

  1. High Traffic Websites: Distribute incoming traffic across multiple servers to prevent overload and ensure high availability.
    • Adds an extra layer of protection against DDoS attacks (though it is not a silver bullet).
  2. API Version Management: Route API requests to different servers based on versioning, allowing for concurrent deployment of multiple API versions.
  3. SSL Termination: Offload SSL processing from backend servers to the load balancer, improving performance and simplifying certificate management.
    • Instead of handling each server individually, do it once on Load Balancer.
  4. Geographic Load Balancing: Direct traffic to the nearest server location based on the user's geographic location, reducing latency.
    • Global Server Load Balancing (GSLB).
  5. A/B Testing: Route a percentage of traffic to different application versions for testing and gathering user feedback.
    • Also great for deploying updates to a selected percentage of users.
  6. Session Persistence: Maintain user sessions on the same server for consistency, useful for applications that require stateful interactions.
    • Ensures that all requests from a specific user session are routed to the same server.
  7. Failover and Redundancy: Automatically reroute traffic to healthy servers in case of failure, ensuring high availability and reliability.
  8. Resource Scaling: Dynamically allocate traffic to new servers as demand increases, facilitating horizontal scaling.
  9. Content Delivery: Improve content delivery speed by directing users to servers that host cached content or static resources.
  10. Microservices Architecture: Balance traffic between various microservices in an application, optimizing resource usage and performance.

In which "layers" can we have Load Balancing?

  • Software:
    • Transport Layer (Layer 4): This layer distributes traffic based on network protocols, such as TCP/UDP. Load balancers at this layer manage traffic using information from network packets (IP addresses, port numbers) without inspecting the payload, making them faster and more efficient in scenarios where deep inspection is unnecessary.
    • Application Layer (Layer 7): This layer focuses on distributing traffic based on application-specific data, such as HTTP headers, cookies, or request types. Load balancers operating at this layer can make more intelligent routing decisions and can handle features like SSL termination, session persistence, and content-based routing.
  • We can also have Hardware Load Balancing, though it's not the focus of this article. It's worth knowing, however, that load balancers started out as dedicated hardware devices.
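The difference between the two layers can be sketched in a few lines of Python. This is a minimal illustration, not a real proxy: the backend addresses, `l4_pick`, and `l7_pick` are hypothetical names, and a production balancer would work on live sockets rather than plain values.

```python
# Hypothetical backend pool used by both examples.
BACKENDS = ["10.0.0.1:8080", "10.0.0.2:8080"]

def l4_pick(client_ip: str, client_port: int) -> str:
    """Layer 4: decide using packet metadata only (IP/port), no payload inspection."""
    return BACKENDS[hash((client_ip, client_port)) % len(BACKENDS)]

def l7_pick(raw_http_request: bytes) -> str:
    """Layer 7: decide after parsing application data (here, the HTTP path)."""
    request_line = raw_http_request.split(b"\r\n", 1)[0]  # e.g. b"GET /api/v1 HTTP/1.1"
    path = request_line.split()[1].decode()
    # Content-based routing: send API traffic to one pool, everything else elsewhere.
    return BACKENDS[0] if path.startswith("/api") else BACKENDS[1]
```

Note how `l4_pick` never looks inside the request, which is exactly why Layer 4 balancing is cheaper, while `l7_pick` must parse the HTTP request before it can route.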

Static Load Balancing Algorithms

Static algorithms distribute workloads based on a fixed plan, ignoring the current state of the servers. A static load balancer doesn't know which servers are slow or underutilized, which can lead to inefficiencies. These algorithms are quick to set up, but they may not optimize resource use effectively.

  • Round-robin: Each request is sent to the next server in the list sequentially. (round-robin fashion)
    • Example with 2 servers: Req1 -> server1, req2 -> server2, req3 -> server1, req4 -> server2...
    • DNS Round-robin: You can achieve round-robin without a load balancer in front of your servers by creating multiple A records for the same domain on your DNS server.
    • This approach has drawbacks:
      • DNS responses are cached at several levels (client side, ISPs, etc.). Once a DNS resolution happens, the resolved IP address is cached for the duration of its TTL (Time To Live).
      • DNS servers can't do health checks, so if one of your servers is down, DNS round-robin can still send requests to it.
  • Sticky round-robin: Requests from the same client are consistently sent to the same server while still using a round-robin method for different clients.
  • Weighted Round Robin: Each server is assigned a weight that determines the percentage of requests allocated to it, enabling differentiated load distribution based on server capacity or capability. This method can also be used for A/B testing and gradual deployments.
    • Example: If server 1 has a weight of 0.8 (80%) and server 2 has a weight of 0.2 (20%), then for every 10 requests, 8 would be sent to server 1, and 2 would be sent to server 2.
  • IP hash: The load balancer hashes the client IP address, converting it to a number that is then mapped to an individual server. (the same client always goes to the same server)
    • hash = a mathematical function that transforms input data into a fixed-size value that represents the original data.

Dynamic Load Balancing Algorithms

Consider the availability, workload, and health of servers. They shift traffic from overloaded or slow servers to underutilized ones, ensuring efficient distribution. However, they are harder to configure, as server availability depends on health, capacity, and task size. The load balancer itself can become a bottleneck due to monitoring the servers.

  • Least connection: Forward the traffic to the server with the least open connections. This method assumes that all connections require equal processing power for all servers. (open connection = still processing the request)
  • Weighted least connection: Accounts for the fact that some servers can handle more active connections than others (weights are configured by administrators, as in weighted round-robin). Unlike static round-robin, it considers both the server's capacity and its current number of open connections.

  • Least response time: This combines the server with the shortest response time and fewest connections.

    • response time = time to process each request.
  • Resource-based: Traffic is distributed by analyzing the current server load. An agent (a small piece of software) runs on each server to monitor resource usage (CPU, memory), and the load balancer directs traffic to the server with the most available resources.

    • The Resource-based method uses real-time data, while the weighted least connection method relies on pre-configured settings set by administrators.
  • Note: To better visualize these load balancing algorithms, check out the ByteByteGo video.

Failover and Monitoring

  • Monitoring: Dynamic load balancers must be aware of server health: their current status, how well they are performing, and so on.
  • Failover: If a server or group of servers is performing slowly, the load balancer distributes less traffic to it. If a server or group of servers fails, the load balancer reroutes traffic to another group of servers, a process known as "failover."
    • Strategies of Server Failover
      • Active-active: Both servers are working and sharing the load at the same time. If one fails, the other takes over.
      • Active-standby: One server handles the load while the other waits to take over if the primary fails.
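An active-standby failover can be sketched as follows. The server names and the `is_healthy` probe are hypothetical; a real load balancer would perform periodic TCP or HTTP health checks with timeouts instead of reading a dictionary:

```python
# Priority-ordered backends: primary first, standby second.
servers = ["primary:8080", "standby:8080"]

# Simulated health-check results (a real probe would hit each server).
health = {"primary:8080": False, "standby:8080": True}

def is_healthy(server: str) -> bool:
    return health[server]

def route_request() -> str:
    """Active-standby failover: use the first healthy server in priority order."""
    for server in servers:
        if is_healthy(server):
            return server
    raise RuntimeError("no healthy backend available")
```

While the primary is marked unhealthy, every request is routed to the standby; once monitoring reports the primary healthy again, traffic moves back to it automatically.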


Thanks for Reading!

  • Feel free to reach out if you have any questions, feedback, or suggestions. Your engagement is appreciated!
