Load balancing is one of those “system design” terms that sounds complex… but the idea is actually very simple.
If a lot of people use your app at the same time, a single server can get overloaded and crash. Load balancing solves this by distributing traffic across multiple servers so the system stays fast and reliable.
This post explains load balancing in a beginner-friendly way — no heavy jargon, no complicated diagrams.
What is Load Balancing?
Load balancing means:
Distributing incoming requests across multiple servers, so no single server becomes a bottleneck.
Instead of sending all users to a single server:
Users → Server 1 (overloaded ❌)
Load balancing routes requests to multiple servers:
Users → Load Balancer → Server 1 ✅
→ Server 2 ✅
→ Server 3 ✅
The load balancer is like a traffic manager that sits in front of your servers.
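To make this concrete, here is a tiny toy “traffic manager” in Python. It listens on one port and forwards each request to one of a few backend servers (the addresses below are made up, and a real load balancer like Nginx or HAProxy does much more, but the core idea is the same):

```python
# A minimal sketch of a load balancer: forward each request to one backend.
# The backend addresses are hypothetical - run your app servers there first.
import random
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKENDS = ["http://127.0.0.1:8001", "http://127.0.0.1:8002", "http://127.0.0.1:8003"]

class TrafficManager(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = random.choice(BACKENDS)            # pick a server (naive strategy)
        with urllib.request.urlopen(backend + self.path) as resp:
            body = resp.read()
        self.send_response(resp.status)              # relay the backend's response
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("0.0.0.0", 8000), TrafficManager).serve_forever()
```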
Why Do We Need Load Balancing?
Here are the big reasons:
1) Prevent server overload
If too many requests hit the same server, it can:
- slow down
- return 500 errors
- crash completely
Load balancing spreads traffic out.
2) Improve performance
When requests are distributed evenly:
- servers respond faster
- users get better experience
3) High availability (fault tolerance)
If one server goes down:
✅ load balancer detects it
✅ stops sending traffic to it
✅ routes users to healthy servers
This is one of the most important benefits.
Types of Load Balancers
There are two common categories:
✅ Hardware load balancer
- Physical device used in data centers
- Powerful but expensive
✅ Software load balancer (most common today)
- A service or program that runs on machines/cloud
Examples:
- Nginx
- HAProxy
- AWS Elastic Load Balancer (ELB)
- Google Cloud Load Balancer
- Azure Load Balancer
Most developers use cloud load balancers.
How Does a Load Balancer Choose a Server?
A load balancer uses routing strategies (algorithms) to decide where each request goes.
Here are the most common ones:
1) Round Robin (simplest)
Requests are sent in rotation:
Req1 → Server 1
Req2 → Server 2
Req3 → Server 3
Req4 → Server 1
✅ Great when servers are identical
❌ Not ideal if one server is slower than others
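Here’s a minimal sketch of round robin in Python (the server names are placeholders):

```python
# A minimal round-robin picker: rotate through the server list in order.
from itertools import cycle

servers = cycle(["Server 1", "Server 2", "Server 3"])

for request_id in range(1, 5):
    print(f"Req{request_id} -> {next(servers)}")
# Req1 -> Server 1, Req2 -> Server 2, Req3 -> Server 3, Req4 -> Server 1
```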
2) Least Connections (smarter)
Sends requests to the server with the least active connections.
✅ Great when:
- requests take different durations
- traffic is uneven
Example:
- Server 1 has 120 active users
- Server 2 has 50 active users
➡️ New request goes to Server 2
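A minimal sketch of that decision (the connection counts are the made-up numbers from the example):

```python
# Least connections: send the new request to the server with the fewest
# active connections right now.
active_connections = {"Server 1": 120, "Server 2": 50}

def pick_server(counts):
    return min(counts, key=counts.get)   # fewest active connections wins

chosen = pick_server(active_connections)
active_connections[chosen] += 1          # the new request is now counted too
print(chosen)                            # Server 2
```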
3) IP Hash / Sticky routing
Routes the same user to the same server.
✅ Useful for session-based systems
❌ Not always ideal for scaling
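A minimal sketch of IP-hash routing (a real load balancer hashes more carefully, but the idea is the same):

```python
# IP hash: hash the client's IP and map it to a server index, so the
# same client always lands on the same server.
import hashlib

servers = ["Server 1", "Server 2", "Server 3"]

def pick_server(client_ip):
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(pick_server("203.0.113.7"))   # always the same server for this IP
```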
A Real Example (Why Load Balancing Matters)
Let’s say:
- 1 server can handle 1000 users
- your app suddenly gets 10,000 users
One server will die.
With load balancing:
- you run 10 servers
- each handles ~1000 users
✅ app stays stable
✅ users don’t suffer
✅ the system scales
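The back-of-the-envelope math is just a ceiling division:

```python
# How many servers do we need for the example above?
users = 10_000
capacity_per_server = 1_000
servers_needed = -(-users // capacity_per_server)   # ceiling division
print(servers_needed)   # 10
```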
Sessions Problem (Very Important)
Load balancing creates one common issue: sessions.
The issue
If login/session is stored inside server memory:
- User logs in → routed to Server 1 ✅
- Next request → routed to Server 2 ❌ (Server 2 doesn’t know the session)
So the user suddenly feels “logged out”.
Fix options
✅ Option A: Sticky sessions
The load balancer ensures the same user always goes to the same server.
This works, but has drawbacks in large systems.
✅ Option B (best practice): Central session storage
Store sessions in:
- Redis
- database
Now any server can handle the user.
✅ stateless servers
✅ better scaling
✅ easy server replacement
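Here’s a minimal sketch of Option B using the redis-py client (the host, port, and key names are assumptions):

```python
# Central session storage: any app server can read/write the same session.
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def save_session(session_id, data):
    # written by whichever server handled the login
    r.setex(f"session:{session_id}", 3600, json.dumps(data))   # expires in 1 hour

def load_session(session_id):
    # read by whichever server handles the next request
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None

save_session("abc123", {"user_id": 42, "logged_in": True})
print(load_session("abc123"))   # {'user_id': 42, 'logged_in': True}
```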
Health Checks (How LB Detects Broken Servers)
Load balancers constantly check server health using an endpoint like:
/health or /status
Example health check:
GET /health
200 OK
If a server stops responding:
❌ Load balancer removes it temporarily
✅ Sends traffic only to healthy servers
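A minimal sketch of the checking side (the backend URLs are made up):

```python
# The load balancer's view: call /health on each backend and keep only
# the ones that answer 200 OK.
import urllib.request

backends = ["http://127.0.0.1:8001", "http://127.0.0.1:8002", "http://127.0.0.1:8003"]

def is_healthy(base_url):
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=2) as resp:
            return resp.status == 200
    except Exception:
        return False   # timeout, connection refused, 5xx... all count as unhealthy

healthy = [b for b in backends if is_healthy(b)]
print(healthy)   # traffic only goes to these servers
```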
Where Load Balancers Fit in System Architecture
Typical setup looks like this:
Users
↓
CDN (optional)
↓
Load Balancer
↓
App Servers (multiple)
↓
Cache (Redis) + Database
This is how modern apps handle scale.
Load Balancer vs API Gateway (Common Confusion)
They sound similar but do different jobs.
✅ Load Balancer
Main job:
- distribute traffic between servers
✅ API Gateway
Main job:
- manage API requests (smart routing + security)
Includes:
- authentication
- rate limiting
- analytics
- routing to microservices (e.g., /users, /payments)
In large systems, both can exist together.
TL;DR
✅ Load balancing distributes traffic across multiple servers.
Benefits:
- prevents overload
- improves performance
- provides fault tolerance
Common algorithms:
- Round Robin
- Least Connections
- IP Hash / Sticky Sessions