DEV Community: Niraj Mourya

Load Balancers Explained

Niraj Mourya — Fri, 20 Feb 2026 18:44:01 +0000

As your application's traffic grows, a single server eventually stops being enough.

At scale, distributing traffic correctly becomes one of the most important architectural decisions.

In this article, we’ll break down:

What a load balancer is
Why it’s needed
Types of load balancers
Algorithms used
Health checks
Real-world architecture patterns

The Scaling Problem

Imagine:

1 server
500,000 users
Peak traffic

Even if the server is powerful, it will eventually hit limits:

CPU saturation
Memory exhaustion
Network bottlenecks

The solution is horizontal scaling.

But how do users know which server to hit?

That’s where load balancers come in.

What Is a Load Balancer?

A load balancer is a system that:

Accepts incoming client requests
Distributes them across multiple backend servers
Ensures no single server is overloaded

Basic architecture:

Clients → Load Balancer → Server Pool

Why Use Load Balancers?

1️⃣ Scalability
Add more servers without changing client logic.

2️⃣ High Availability
If one server fails, traffic is redirected to the remaining healthy servers.

3️⃣ Fault Tolerance
Helps contain failures, so a single bad server doesn't take the whole service down.

4️⃣ Zero-Downtime Deployments
You can remove a server from rotation during updates.

Load Balancing Algorithms

Round Robin
Requests distributed sequentially.
Simple, effective for uniform workloads.

Least Connections
Traffic goes to the server with the fewest active connections.
Good for uneven workloads.

IP Hash
Same client IP → same backend server.
Useful for session stickiness.
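Here is a minimal Python sketch that makes the three strategies concrete. It is an in-memory illustration, not how NGINX or HAProxy actually implement them. The server names are hypothetical, and IP Hash uses SHA-256 as an assumed stable hash.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out servers in a fixed, repeating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Send each new connection to the server with the fewest active ones."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self, client_ip=None):
        # min() returns the first server with the lowest count (ties go to list order)
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # connection opened
        return server

    def release(self, server):
        self.active[server] -= 1  # connection closed


class IPHash:
    """Map each client IP to the same server while the pool is unchanged."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]


if __name__ == "__main__":
    pool = ["app-1", "app-2", "app-3"]  # hypothetical backend names

    rr = RoundRobin(pool)
    print([rr.pick() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']

    lc = LeastConnections(pool)
    first = lc.pick()   # app-1
    lc.pick()           # app-2
    lc.release(first)   # app-1's request finishes
    print(lc.pick())    # app-1 (0 active, ahead of app-3 in list order)

    ih = IPHash(pool)
    print(ih.pick("203.0.113.7") == ih.pick("203.0.113.7"))  # True
```

IP Hash is only sticky while the pool stays the same. Adding or removing a server changes `len(self.servers)`, which remaps many clients to different servers.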

Layer 4 vs Layer 7 Load Balancing

Layer 4 (Transport Layer)

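Works at the TCP/UDP level (routes on IP address and port)
Faster, since it doesn't parse request contents
Doesn't inspect HTTP content (paths, headers, cookies)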

Layer 7 (Application Layer)

Works at HTTP level
Can route based on:
- URL path
- Headers
- Cookies
Enables smarter routing
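The function below is a rough illustration of what Layer 7 routing enables. It picks a backend pool from the URL path, a header, or a cookie. The pool names, the `canary` cookie, and the `MyMobileApp` User-Agent prefix are all hypothetical. A Layer 4 balancer only sees IP addresses and ports, so it could not make any of these decisions.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    path: str
    headers: dict = field(default_factory=dict)
    cookies: dict = field(default_factory=dict)


def route(req: Request) -> str:
    """Choose a (hypothetical) backend pool using application-level data."""
    # Cookie-based: users who opted into a canary release
    if req.cookies.get("canary") == "1":
        return "canary-pool"
    # Header-based: a dedicated pool for the mobile client
    if req.headers.get("User-Agent", "").startswith("MyMobileApp"):
        return "mobile-api-pool"
    # Path-based
    if req.path.startswith("/api/"):
        return "api-pool"
    if req.path.startswith("/static/"):
        return "static-pool"
    return "web-pool"


if __name__ == "__main__":
    print(route(Request("/api/orders")))                          # api-pool
    print(route(Request("/", headers={"User-Agent": "MyMobileApp/2.1"})))  # mobile-api-pool
```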

Health Checks

Load balancers periodically probe each backend server (for example, with a TCP connection attempt or an HTTP request to a health endpoint).

If a server:

Stops responding
Returns errors

It is removed from rotation until it passes health checks again.

This prevents sending traffic to unhealthy instances.
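Below is a minimal sketch of the bookkeeping behind health checks. It assumes a probe that counts any 2xx response from a hypothetical `/health` endpoint as healthy. A server is removed only after several consecutive failures and re-added only after several consecutive successes, a common way to avoid flapping on one slow response. The thresholds are illustrative defaults, not values from any particular load balancer.

```python
import http.client
import urllib.request


def probe(url: str, timeout: float = 1.0) -> bool:
    """One health check: healthy means a 2xx response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Refused connection, timeout, DNS failure, or 4xx/5xx (HTTPError is an OSError)
        return False


class HealthChecker:
    """Remove a server after `fail_threshold` consecutive failed probes;
    re-add it after `pass_threshold` consecutive successful ones."""

    def __init__(self, servers, fail_threshold=3, pass_threshold=2):
        self.fail_threshold = fail_threshold
        self.pass_threshold = pass_threshold
        self.healthy = {s: True for s in servers}
        self._fails = {s: 0 for s in servers}
        self._passes = {s: 0 for s in servers}

    def record(self, server, ok: bool):
        if ok:
            self._fails[server] = 0
            self._passes[server] += 1
            if not self.healthy[server] and self._passes[server] >= self.pass_threshold:
                self.healthy[server] = True
        else:
            self._passes[server] = 0
            self._fails[server] += 1
            if self.healthy[server] and self._fails[server] >= self.fail_threshold:
                self.healthy[server] = False

    def in_rotation(self):
        return [s for s, ok in self.healthy.items() if ok]


if __name__ == "__main__":
    # In a real loop you would call, every few seconds:
    #   hc.record(server, probe(f"http://{server}/health"))
    hc = HealthChecker(["app-1", "app-2"])
    for ok in (False, False, False):  # app-1 fails three probes in a row
        hc.record("app-1", ok)
    print(hc.in_rotation())  # ['app-2']
    for ok in (True, True):  # ...and then recovers
        hc.record("app-1", ok)
    print(hc.in_rotation())  # ['app-1', 'app-2']
```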

Real-World Setup

In production systems, architecture often looks like this:

Users
   ↓
Load Balancer
   ↓
Web Servers (Auto-scaled)
   ↓
Database / Cache Layer

Common implementations:

AWS ELB / ALB
Google Cloud Load Balancer
NGINX
HAProxy

Load Balancer vs Reverse Proxy

A reverse proxy:

Sits in front of servers
Forwards requests

A load balancer:

Distributes requests across a pool of servers

Many tools (like NGINX) do both.

Key Takeaways

Load balancers distribute traffic across servers
Enable horizontal scaling
Improve availability
Support multiple routing strategies
Critical in system design interviews

Database Concepts Every System Design Interview Expects You to Know

Niraj Mourya — Fri, 13 Feb 2026 21:29:41 +0000

Scaling applications is not just about adding more servers.

At scale, the database often becomes the bottleneck long before your application code does.

In system design interviews — and in real-world production systems — understanding database concepts like replication, sharding, indexing, and CAP theorem is critical.

Let’s break them down clearly.

1️⃣ Vertical vs Horizontal Scaling

Before we talk about distributed databases, we need to understand scaling.

🔼 Vertical Scaling (Scale Up)

You increase the capacity of a single machine:

More CPU
More RAM
Faster SSD

Pros

Simple
No architectural changes required

Cons

Hardware limit
Expensive
Single point of failure

Vertical scaling works — but only up to a point.

🔁 Horizontal Scaling (Scale Out)

You add more machines.

Instead of 1 large server → 10 smaller servers.

Pros

Virtually unlimited scale
Better fault tolerance

Cons

Complexity increases
Requires distributed architecture

Horizontal scaling is where replication and sharding come in.

2️⃣ Database Replication

Replication means copying the same data to multiple database servers.

Why do we replicate?

Improve read scalability
Improve availability
Provide fault tolerance

🔹 Primary–Replica Architecture

A common setup:

          Primary (Writes)
               |
        -----------------
        |               |
     Replica 1       Replica 2

All writes go to the Primary
Reads can go to Replicas

Replication Types

🔹 Synchronous Replication

The primary waits for confirmation from replicas before confirming a write.

✔ Strong consistency
❌ Slower writes

🔹 Asynchronous Replication

The primary does not wait for replicas.

✔ Faster writes
❌ Possible replication lag

This introduces eventual consistency.
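The two modes differ in one thing: when the primary acknowledges a write. The in-memory simulation below is a sketch under simplifying assumptions. There is no network and no failure handling, and replicas are plain dictionaries. The class names are hypothetical.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Primary:
    def __init__(self, replicas, synchronous: bool):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.backlog = deque()  # changes not yet shipped (async mode only)

    def write(self, key, value) -> str:
        self.data[key] = value
        if self.synchronous:
            # Acknowledge only after every replica has applied the change
            for replica in self.replicas:
                replica.apply(key, value)
        else:
            # Acknowledge immediately; replicas catch up later
            self.backlog.append((key, value))
        return "ack"

    def ship_backlog(self):
        """Simulate the background replication stream catching up."""
        while self.backlog:
            key, value = self.backlog.popleft()
            for replica in self.replicas:
                replica.apply(key, value)


if __name__ == "__main__":
    replica = Replica()
    primary = Primary([replica], synchronous=False)
    primary.write("stock:42", 7)
    print(replica.read("stock:42"))  # None: the replica is lagging
    primary.ship_backlog()
    print(replica.read("stock:42"))  # 7: eventually consistent
```

In synchronous mode, the same read would return `7` right away. The cost is that every write waits for all replicas.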

When to Use Replication?

Read-heavy applications
High availability systems
Systems that cannot tolerate downtime

Example:

News websites
E-commerce platforms
Social media feeds

3️⃣ Database Sharding

Replication copies data.

Sharding splits data.

Instead of one massive database:

Shard 1 → Users 1–1,000,000
Shard 2 → Users 1,000,001–2,000,000
Shard 3 → Users 2,000,001–3,000,000

Each shard contains only part of the data.

Why Shard?

Handle massive datasets
Improve write scalability
Avoid single database bottleneck

Sharding Strategies

🔹 Range-Based Sharding

User ID 1–1000 → Shard A
User ID 1001–2000 → Shard B

Simple, but can cause hotspots (for example, if the newest and most active users all land on the latest shard).

🔹 Hash-Based Sharding

hash(user_id) % N

Distributes load evenly.

Harder to rebalance later: changing N remaps most keys to different shards (consistent hashing is the usual mitigation).
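Both strategies fit in a few lines of Python. This sketch assumes string keys and uses SHA-256 as a stable hash. Python's built-in `hash()` for strings is randomized per process, so it can't be used for shard placement. `fraction_moved` shows why hash-based sharding is hard to rebalance. When going from 4 to 5 shards, a key stays put only when `h % 4 == h % 5`. That holds for 4 out of every 20 hash values, so roughly 80% of keys move.

```python
import hashlib


def range_shard(user_id: int, shard_size: int = 1000) -> int:
    """Range-based: IDs 1-1000 -> shard 0, 1001-2000 -> shard 1, and so on."""
    return (user_id - 1) // shard_size


def hash_shard(key: str, num_shards: int) -> int:
    """Hash-based: hash(key) % N, using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Share of keys whose shard changes when N goes from old_n to new_n."""
    moved = sum(hash_shard(k, old_n) != hash_shard(k, new_n) for k in keys)
    return moved / len(keys)


if __name__ == "__main__":
    print(range_shard(1000), range_shard(1001))  # 0 1
    keys = [f"user:{i}" for i in range(10_000)]
    print(f"{fraction_moved(keys, 4, 5):.0%} of keys move going from 4 to 5 shards")
```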

🔹 Geo-Based Sharding

US Users → US Database
EU Users → EU Database

Useful for:

Latency optimization
Regulatory compliance

When to Use Sharding?

Massive write traffic
Large datasets
Clear partitioning strategy

Example:

Instagram user data
Large SaaS platforms
Messaging systems

4️⃣ Replication vs Sharding

Concept	Replication	Sharding
Data	Copied	Split
Improves	Read scalability	Write scalability
Complexity	Moderate	High
Use case	High availability	Massive scale

They solve different problems — and are often used together.

5️⃣ Indexing

Without an index:

SELECT * FROM users WHERE email = 'x';

The database scans every row.

With an index:

It uses the index (typically a B-tree) to jump to the matching rows instead of scanning the whole table.
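SQLite's `EXPLAIN QUERY PLAN` shows this difference, run here through Python's built-in `sqlite3` module. The table and index names are made up for the example. The exact wording of the plan varies between SQLite versions, but it should change from a full scan to a search that uses the index.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's plan description for a query (the last column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(10_000)],
)

sql = "SELECT * FROM users WHERE email = ?"
before = query_plan(conn, sql, ("user42@example.com",))

conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, sql, ("user42@example.com",))

print(before)  # e.g. "SCAN users": every row is examined
print(after)   # e.g. "SEARCH users USING INDEX idx_users_email (email=?)"
```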

Benefits

Faster reads
Efficient lookups

Trade-offs

Slower writes
More storage
Index maintenance overhead

Indexes are not free — they are a trade-off.

6️⃣ Read-Write Splitting

Often used with replication:

Application
   ├── Writes → Primary
   └── Reads  → Replicas

This reduces load on the primary database.

But introduces:

Consistency concerns
Replication lag issues
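One way to implement read-write splitting is a small router in the data-access layer. The sketch below is deliberately naive: it classifies statements by their first keyword, and the host names are hypothetical. The `require_fresh` flag sends a read to the primary when the caller must see its own recent write. This is a common workaround for replication lag.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas (round robin)."""

    WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP")

    def __init__(self, primary: str, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def target(self, sql: str, in_transaction: bool = False,
               require_fresh: bool = False) -> str:
        statement = sql.lstrip().upper()
        # Writes, transactional statements, and read-your-own-write reads use the primary
        if in_transaction or require_fresh or statement.startswith(self.WRITE_PREFIXES):
            return self.primary
        return next(self._replicas)


if __name__ == "__main__":
    router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
    print(router.target("INSERT INTO orders VALUES (1)"))           # db-primary
    print(router.target("SELECT * FROM orders"))                    # db-replica-1
    print(router.target("SELECT * FROM orders"))                    # db-replica-2
    print(router.target("SELECT * FROM orders", require_fresh=True))  # db-primary
```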

7️⃣ CAP Theorem

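In a distributed system, network partitions will eventually happen, so partition tolerance is not really optional.

The real choice comes during a partition, between:

Consistency (every read sees the latest write, or returns an error)
Availability (every request gets a non-error response, possibly stale)

Many large-scale, user-facing systems favor availability during a partition.

Which means accepting eventual consistency. Others, such as coordination services like ZooKeeper or etcd, favor consistency and reject requests they cannot answer correctly.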

8️⃣ Caching (Bonus but Critical)

Sometimes the best database optimization is:

👉 Not hitting the database at all.

Tools:

Redis
Memcached

Used for:

Session storage
Frequently accessed queries
Rate limiting

Caching drastically reduces database load.
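The most common pattern here is cache-aside: check the cache first, and query the database only on a miss. In this sketch, a plain dictionary with expiry timestamps stands in for Redis or Memcached. `load_from_db` is a hypothetical function you would supply.

```python
import time


class CacheAside:
    """Cache-aside (lazy loading) with a per-entry TTL."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]                     # cache hit
        value = self.load_from_db(key)          # miss or expired: query the database
        self._store[key] = (value, self.clock() + self.ttl)
        return value

    def invalidate(self, key):
        # Call after writing to the database so readers don't keep stale data
        self._store.pop(key, None)


if __name__ == "__main__":
    def load_user(user_id):  # hypothetical slow database query
        print(f"DB query for {user_id}")
        return {"id": user_id}

    cache = CacheAside(load_user, ttl_seconds=30)
    cache.get("u1")  # miss: prints "DB query for u1"
    cache.get("u1")  # hit: no database query
```

The TTL limits how long stale data can survive if an invalidation is missed. The trade-off is an occasional extra database query when an entry expires.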

Real-World Architecture Example

Large systems often combine:

Sharding (scale writes)
Replication (scale reads)
Caching (reduce load)
Indexing (speed queries)

There is no single silver bullet.

Key Takeaways

Replication improves read scalability and availability
Sharding improves write scalability and handles large datasets
Indexing speeds up queries but slows writes
CAP theorem forces trade-offs
Caching reduces database pressure
Database design decisions often shape system scalability more than application code does.

Monolithic vs Microservices Architecture (Explained Simply)

Niraj Mourya — Sun, 01 Feb 2026 15:42:37 +0000

Software architecture isn’t about trends —
it’s about choosing the right trade-offs.

Two common approaches dominate modern systems:

Monolithic architecture
Microservices architecture

Let’s break them down 👇

🧱 What is Monolithic Architecture?

In a monolith, the entire application is built as one single unit.

UI
Business logic
Database access
Authentication
APIs

👉 All live in one codebase, deployed together.

Example

An e-commerce app where the following are all part of one application, deployed as one service:

User management
Orders
Payments
Inventory

🔗 What is Microservices Architecture?

In microservices, the application is split into small, independent services.

Each service:

Owns a single responsibility
Has its own codebase
Can be deployed independently
Communicates via APIs (HTTP, gRPC, events)

Example

Same e-commerce app, but:

User Service
Order Service
Payment Service
Inventory Service

Each runs independently and talks over the network.
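The sketch below makes the difference tangible using only Python's standard library. It performs the same payment step twice: once as an in-process function call (monolith) and once as an HTTP call to a separate payment service (microservice). For brevity, both run in one process and the service reuses the same `charge` function. In a real system, the Payment Service would be its own codebase and deployment. The `/charge` endpoint and JSON fields are hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def charge(order_id: str, amount_cents: int) -> dict:
    return {"order_id": order_id, "status": "charged", "amount_cents": amount_cents}


# Monolith: payments is just another module in the same process
def place_order_monolith(order_id: str, amount_cents: int) -> dict:
    return charge(order_id, amount_cents)  # plain function call, no network


# Microservice: payments runs behind its own HTTP API
class PaymentHandler(BaseHTTPRequestHandler):
    def do_POST(self):  # single endpoint for the demo; the path is not checked
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        payload = json.dumps(charge(body["order_id"], body["amount_cents"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_payment_service() -> HTTPServer:
    server = HTTPServer(("127.0.0.1", 0), PaymentHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def place_order_microservice(base_url: str, order_id: str, amount_cents: int) -> dict:
    data = json.dumps({"order_id": order_id, "amount_cents": amount_cents}).encode()
    req = urllib.request.Request(f"{base_url}/charge", data=data, method="POST",
                                 headers={"Content-Type": "application/json"})
    # A network call can time out or fail even when both services are healthy
    with urllib.request.urlopen(req, timeout=2) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    server = start_payment_service()
    url = f"http://127.0.0.1:{server.server_address[1]}"
    print(place_order_monolith("order-1", 4999))
    print(place_order_microservice(url, "order-1", 4999))
    server.shutdown()
    server.server_close()
```

Both calls return the same result. The HTTP version, however, needs serialization, a timeout, and error handling that the plain function call never did. That extra work is the operational overhead listed in the comparison below.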

🧠 Core Difference (High Level)

Aspect	Monolith	Microservices
Codebase	Single	Multiple
Deployment	One unit	Independent
Scaling	Whole app	Per service
Complexity	Low initially	High initially
Operational overhead	Low	High
Failure isolation	Poor	Better

✅ When Monolithic Architecture Makes Sense

Monoliths are not bad. They’re often the right choice.

Use monoliths when:

You’re building an MVP
Team size is small
Domain complexity is low
You want faster development
Operational simplicity matters

Real-world example

Early-stage startup
Internal tools
Small SaaS products

Many successful companies started as monoliths (and some still are).

✅ When Microservices Make Sense

Microservices shine at scale, not at the beginning.

Use microservices when:

You have large engineering teams
Domain boundaries are clear
Services need to scale independently
Availability requirements are high
Multiple teams deploy frequently

Real-world example

Large e-commerce platforms
Streaming services
Financial systems
Companies such as Netflix and Amazon (at scale)

⚠️ Common Misconception

❌ “Microservices are better than monoliths”

✅ Reality:

Microservices solve organizational and scaling problems, not small-codebase problems.

Many teams move too early — and pay the price in:

DevOps complexity
Network failures
Debugging difficulty

🧭 A Practical Rule of Thumb

Start with a monolith.
Move to microservices when pain forces you to.

Good architecture evolves — it isn’t chosen on Day 1.

🔑 Key Takeaways

✔ Monoliths are simple and fast to build
✔ Microservices add flexibility but complexity
✔ Scale and team size matter more than trends
✔ Architecture should match your problem

💬 If you’ve worked with both:

What trade-offs did you face?
Would you choose differently next time?