Mourya Vamsi Modugula

Load Balancing Explained (Simple Guide for Beginners)

Load balancing is one of those “system design” terms that sounds complex… but the idea is actually very simple.

If a lot of people use your app at the same time, a single server can get overloaded and crash. Load balancing solves this by distributing traffic across multiple servers so the system stays fast and reliable.

This post explains load balancing in a beginner-friendly way — no heavy jargon, no complicated diagrams.


What is Load Balancing?

Load balancing means:

Distributing incoming requests across multiple servers, so no single server becomes a bottleneck.

Instead of sending all users to a single server:

Users → Server 1 (overloaded ❌)

Load balancing routes requests to multiple servers:

Users → Load Balancer → Server 1 ✅
                     → Server 2 ✅
                     → Server 3 ✅

The load balancer is like a traffic manager that sits in front of your servers.


Why Do We Need Load Balancing?

Here are the big reasons:

1) Prevent server overload

If too many requests hit the same server, it can:

  • slow down
  • return 500 errors
  • crash completely

Load balancing spreads traffic out.


2) Improve performance

When requests are distributed evenly:

  • servers respond faster
  • users get a better experience

3) High availability (fault tolerance)

If one server goes down:

✅ load balancer detects it
✅ stops sending traffic to it
✅ routes users to healthy servers

This is one of the most important benefits.


Types of Load Balancers

There are two common categories:

✅ Hardware load balancer

  • Physical device used in data centers
  • Powerful but expensive

✅ Software load balancer (most common today)

  • A service or program that runs on regular machines or in the cloud

Examples:
  • Nginx
  • HAProxy
  • AWS Elastic Load Balancing (ELB)
  • Google Cloud Load Balancer
  • Azure Load Balancer

Most developers use cloud load balancers.


How Does a Load Balancer Choose a Server?

A load balancer uses routing strategies (algorithms) to decide where each request goes.

Here are the most common ones:


1) Round Robin (simplest)

Requests are sent in rotation:

Req1 → Server 1
Req2 → Server 2
Req3 → Server 3
Req4 → Server 1

✅ Great when servers are identical
❌ Not ideal if one server is slower than others
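
To make the rotation concrete, here is a minimal Python sketch of round-robin selection (the server names are placeholders, not a real setup):

from itertools import cycle

servers = ["server1", "server2", "server3"]  # placeholder names
pool = cycle(servers)                        # endlessly rotates through the list

def route():
    # Pick the next server in rotation, ignoring current load.
    return next(pool)

for req in range(1, 5):
    print(f"Req{req} -> {route()}")
# Req1 -> server1, Req2 -> server2, Req3 -> server3, Req4 -> server1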


2) Least Connections (smarter)

Sends each request to the server with the fewest active connections.

✅ Great when:

  • requests take different durations
  • traffic is uneven

Example:

  • Server 1 has 120 active users
  • Server 2 has 50 active users

➡️ New request goes to Server 2
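
Here is the same idea as a tiny Python sketch, using a hypothetical snapshot of connection counts:

# Hypothetical snapshot of active connections per server.
active = {"server1": 120, "server2": 50, "server3": 80}

def route():
    # Pick the server with the fewest active connections right now.
    server = min(active, key=active.get)
    active[server] += 1  # the new request occupies a connection slot
    return server

print(route())  # server2, since it has only 50 active connections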


3) IP Hash / Sticky routing

Routes the same user to the same server.

✅ Useful for session-based systems
❌ Not always ideal for scaling (adding or removing servers can remap users to different servers)
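
A rough Python sketch of the hashing idea (the server names and sample IP are placeholders; this hash-and-modulo scheme is the naive variant, and real balancers often use consistent hashing instead):

import hashlib

servers = ["server1", "server2", "server3"]  # placeholder names

def route(client_ip):
    # Hash the client IP so the same user always maps to the same server.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(route("203.0.113.7"))  # the same IP returns the same server every call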


A Real Example (Why Load Balancing Matters)

Let’s say:

  • 1 server can handle 1000 users
  • your app suddenly gets 10,000 users

One server will die.

With load balancing:

  • you run 10 servers
  • each handles ~1000 users

✅ app stays stable
✅ users don’t suffer
✅ the system scales


Sessions Problem (Very Important)

Load balancing creates one common issue: sessions.

The issue

If login/session is stored inside server memory:

  • User logs in → routed to Server 1
  • Next request → routed to Server 2 ❌ (Server 2 doesn’t know the session)

So the user suddenly appears to be “logged out”.


Fix options

✅ Option A: Sticky sessions

The load balancer ensures the same user always goes to the same server.

This works, but has drawbacks in large systems.


✅ Option B (best practice): Central session storage

Store sessions in:

  • Redis
  • database

Now any server can handle the user.

✅ stateless servers
✅ better scaling
✅ easy server replacement
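
As a rough sketch of what central session storage could look like with the redis-py client (the host, port, key format, and 30-minute expiry are assumptions, not a prescribed setup):

import json
import redis  # pip install redis

# Assumed connection details; adjust for your environment.
r = redis.Redis(host="localhost", port=6379)

def save_session(session_id, data):
    # Keep the session for 30 minutes; any app server can read it back.
    r.setex(f"session:{session_id}", 1800, json.dumps(data))

def load_session(session_id):
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None

save_session("abc123", {"user_id": 42})
print(load_session("abc123"))  # works no matter which server handles the request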


Health Checks (How the Load Balancer Detects Broken Servers)

Load balancers constantly check server health using an endpoint like:

  • /health
  • /status

Example health check:

GET /health
200 OK

If a server stops responding:

❌ Load balancer removes it temporarily
✅ Sends traffic only to healthy servers
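
Here is a simplified Python sketch of the kind of polling a load balancer performs behind the scenes (the server addresses, the /health path, and the 2-second timeout are assumptions):

import requests  # pip install requests

# Placeholder addresses for the app servers behind the balancer.
servers = ["http://10.0.0.1", "http://10.0.0.2", "http://10.0.0.3"]

def healthy_servers():
    alive = []
    for base in servers:
        try:
            resp = requests.get(f"{base}/health", timeout=2)
            if resp.status_code == 200:
                alive.append(base)  # answered 200 OK, so it is healthy
        except requests.RequestException:
            pass  # timeout or refused connection: treat the server as down
    return alive

# Route new requests only to healthy_servers(); re-check the others later.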


Where Load Balancers Fit in System Architecture

Typical setup looks like this:

Users
  ↓
CDN (optional)
  ↓
Load Balancer
  ↓
App Servers (multiple)
  ↓
Cache (Redis) + Database

This is how modern apps handle scale.


Load Balancer vs API Gateway (Common Confusion)

They sound similar but do different jobs.

✅ Load Balancer

Main job:

  • distribute traffic between servers

✅ API Gateway

Main job:

  • manage API requests (smart routing + security)

Includes:
  • authentication
  • rate limiting
  • analytics
  • routing microservices (e.g., /users, /payments)

In large systems, both can exist together.


TL;DR

Load balancing distributes traffic across multiple servers.

Benefits:

  • prevents overload
  • improves performance
  • provides fault tolerance

Common algorithms:

  • Round Robin
  • Least Connections
  • IP Hash / Sticky Sessions
