<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rupesh Konduru</title>
    <description>The latest articles on DEV Community by Rupesh Konduru (@rupesh_konduru_7516122dd2).</description>
    <link>https://dev.to/rupesh_konduru_7516122dd2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843412%2F44f8b6ef-e384-4587-b429-c43005491eeb.png</url>
      <title>DEV Community: Rupesh Konduru</title>
      <link>https://dev.to/rupesh_konduru_7516122dd2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rupesh_konduru_7516122dd2"/>
    <language>en</language>
    <item>
      <title>Message Queues &amp; Why Async Changes Everything</title>
      <dc:creator>Rupesh Konduru</dc:creator>
      <pubDate>Mon, 30 Mar 2026 13:11:00 +0000</pubDate>
      <link>https://dev.to/rupesh_konduru_7516122dd2/message-queues-why-async-changes-everything-5eod</link>
      <guid>https://dev.to/rupesh_konduru_7516122dd2/message-queues-why-async-changes-everything-5eod</guid>
      <description>&lt;p&gt;What if the two sides of a conversation don't need to be available at the same time? That one idea unlocks a completely different way of building systems.&lt;/p&gt;




&lt;p&gt;Think about the difference between a phone call and an email.&lt;/p&gt;

&lt;p&gt;A phone call requires both people present at the exact same moment. If the other person is busy, you're blocked. Nothing happens until they pick up.&lt;/p&gt;

&lt;p&gt;An email is different. You write it, send it, move on. The other person reads it when they're ready. You're not waiting. Life continues.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;That difference — synchronous vs asynchronous — is the entire soul of Message Queues.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And once you understand it, you'll see it everywhere in systems you use every day.&lt;/p&gt;




&lt;h2&gt;The Problem With Talking Directly&lt;/h2&gt;

&lt;p&gt;In a typical system, when Service A needs Service B to do something, it calls it directly and waits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Service A ──→ HTTP Request ──→ Service B
              (waits...)
Service A ←── Response    ←── Service B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean, simple — and fragile in ways that only reveal themselves at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tight Coupling:&lt;/strong&gt; Both services must be running simultaneously. If B goes down, A's calls fail or hang, and the failure spreads upstream. Two independent services become dependent on each other's heartbeat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speed Mismatch:&lt;/strong&gt; What if Service A fires 10,000 requests per second but B can only process 500? Requests pile up, time out, and fail. A is screaming into a bottleneck it has no control over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No Safety Net:&lt;/strong&gt; If B is temporarily down and A's request fails, that work is just gone. A needs complex retry logic or accepts data loss.&lt;/p&gt;

&lt;p&gt;Message Queues solve all three simultaneously.&lt;/p&gt;




&lt;h2&gt;The Diner Analogy&lt;/h2&gt;

&lt;p&gt;Picture a busy restaurant. When a customer orders, the waiter doesn't march into the kitchen and stand there waiting until the food is ready before taking another order. The entire front of house would grind to a halt.&lt;/p&gt;

&lt;p&gt;Instead, the waiter writes the order on a ticket, clips it to the rail, and goes back to take more orders. The kitchen picks up tickets at its own pace.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Waiter ──→ Order ticket rail ──→ Kitchen
(Producer)  (Message Queue)    (Consumer)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The waiter doesn't care how long the kitchen takes&lt;/li&gt;
&lt;li&gt;The kitchen doesn't get overwhelmed by 50 simultaneous verbal orders&lt;/li&gt;
&lt;li&gt;If a chef calls in sick, tickets pile up briefly — nothing is lost&lt;/li&gt;
&lt;li&gt;You can hire more chefs independently of the front of house&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly how message queues work in software.&lt;/p&gt;
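&lt;p&gt;The ticket rail can be sketched in a few lines with Python's standard-library &lt;code&gt;queue.Queue&lt;/code&gt;, an in-process stand-in for a real broker. The names &lt;code&gt;waiter&lt;/code&gt; and &lt;code&gt;kitchen&lt;/code&gt; are just the analogy carried over:&lt;/p&gt;

```python
import queue
import threading

orders = queue.Queue()     # the ticket rail (a real broker in production)

def waiter(order_id):
    """Producer: clip the ticket to the rail and move on immediately."""
    orders.put(order_id)

def kitchen(cooked):
    """Consumer: pick up tickets at its own pace."""
    while True:
        order_id = orders.get()
        if order_id is None:       # sentinel: shift is over
            break
        cooked.append(order_id)    # "cook" the order

cooked = []
chef = threading.Thread(target=kitchen, args=(cooked,))
chef.start()

for i in range(5):
    waiter(i)                      # the waiter never waits on the kitchen

orders.put(None)                   # close up shop
chef.join()
print(cooked)                      # [0, 1, 2, 3, 4]
```

&lt;p&gt;The producer never blocks on the consumer. The queue is the only thing the two sides share.&lt;/p&gt;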




&lt;h2&gt;The Three Superpowers&lt;/h2&gt;

&lt;h3&gt;⚡ Superpower 1 — Decoupling&lt;/h3&gt;

&lt;p&gt;Without a queue, your User Service has direct wires to your Email Service, Analytics Service, and Notification Service. If any one of them goes down, your User Service feels it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❌ Without Queue:
User Service ──→ Email Service
User Service ──→ Analytics Service
User Service ──→ Notification Service
(breaks if ANY of these go down)

✅ With Queue:
User Service ──→ [QUEUE]
                    ↓
               Email Service reads when ready
               Analytics Service reads when ready
               Notification Service reads when ready
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can add new consumers — new services that react to events — without touching the producer at all. Plug-and-play architecture.&lt;/p&gt;
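&lt;p&gt;Here's a toy illustration of that plug-and-play property: a tiny in-memory "broker" with one queue per consumer. Real brokers (RabbitMQ exchanges, SNS plus SQS fan-out) do this far more robustly, but the shape is the same:&lt;/p&gt;

```python
import queue

subscribers = {}               # one queue per consumer service

def subscribe(service_name):
    subscribers[service_name] = queue.Queue()
    return subscribers[service_name]

def publish(event):
    """Producer: hand the event to every subscriber queue and move on."""
    for q in subscribers.values():
        q.put(event)

email_q = subscribe("email")
analytics_q = subscribe("analytics")   # added later, producer untouched

signup = {"type": "user_signed_up", "user_id": 42}
publish(signup)

# Each service reads from its own queue, whenever it is ready:
email_seen = email_q.get()
analytics_seen = analytics_q.get()
print(email_seen == signup, analytics_seen == signup)   # True True
```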

&lt;h3&gt;🛡️ Superpower 2 — Durability&lt;/h3&gt;

&lt;p&gt;Messages sit in the queue until they're successfully processed. If the consumer crashes mid-task, the message doesn't disappear — it gets redelivered when the consumer comes back up.&lt;/p&gt;

&lt;p&gt;This works through &lt;strong&gt;acknowledgements&lt;/strong&gt;. The queue only deletes a message after the consumer explicitly says "I handled this successfully." Your system can crash and restart without losing a single unit of work.&lt;/p&gt;
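&lt;p&gt;A rough sketch of the acknowledgement idea, again with an in-memory queue standing in for a broker. Real redelivery involves visibility timeouts and retry limits, which this toy skips:&lt;/p&gt;

```python
import queue

tasks = queue.Queue()
tasks.put("send_welcome_email")

def process(task, should_fail):
    if should_fail:
        raise RuntimeError("consumer crashed mid-task")
    return f"done: {task}"

def consume(should_fail):
    """At-least-once delivery: only 'ack' (drop) the message after success."""
    task = tasks.get()
    try:
        result = process(task, should_fail)
        tasks.task_done()        # the ack: the broker may now delete it
        return result
    except RuntimeError:
        tasks.put(task)          # no ack: message goes back for redelivery
        return None

first_try = consume(should_fail=True)      # consumer "crashes", no ack
second_try = consume(should_fail=False)    # redelivered and processed
print(first_try, second_try)               # None done: send_welcome_email
```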

&lt;h3&gt;🚀 Superpower 3 — Load Leveling&lt;/h3&gt;

&lt;p&gt;Imagine a sudden surge: Black Friday, a viral post, a TV segment about your app. Without a queue, 10,000 requests per second hitting a service that handles 500 means collapse.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Without Queue:
10,000 req/sec → Service B (handles 500/sec) → 💀

With Queue:
10,000 req/sec → Queue (holds patiently)
                   ↓
              Service B processes at 500/sec → ✅ everything handled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The queue acts as a &lt;strong&gt;shock absorber&lt;/strong&gt;. Your system bends instead of breaks. You can also spin up more consumers automatically when the backlog grows — scaling in direct response to real demand.&lt;/p&gt;
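&lt;p&gt;You can see the shock absorber in a few lines of simulation: a 10,000-message spike arrives at once, the consumer drains 500 per tick, and nothing is lost:&lt;/p&gt;

```python
from collections import deque

RATE = 500                 # Service B can only process 500 messages per tick
backlog = deque()

for i in range(10_000):    # a one-tick Black Friday spike
    backlog.append(i)

processed = 0
ticks = 0
while backlog:
    for _ in range(min(RATE, len(backlog))):   # drain at most RATE per tick
        backlog.popleft()
        processed += 1
    ticks += 1

print(processed, ticks)    # 10000 20 : every request handled, none dropped
```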




&lt;h2&gt;What Actually Goes Into a Queue?&lt;/h2&gt;

&lt;p&gt;Anything that doesn't need an instant response:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Trigger&lt;/th&gt;
&lt;th&gt;Producer&lt;/th&gt;
&lt;th&gt;Consumer(s)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;User signs up&lt;/td&gt;
&lt;td&gt;Auth Service&lt;/td&gt;
&lt;td&gt;Email Service → welcome email&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Video uploaded&lt;/td&gt;
&lt;td&gt;Upload Service&lt;/td&gt;
&lt;td&gt;Transcoding Service → compression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Order placed&lt;/td&gt;
&lt;td&gt;Order Service&lt;/td&gt;
&lt;td&gt;Inventory, Billing, Notifications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image posted&lt;/td&gt;
&lt;td&gt;App Server&lt;/td&gt;
&lt;td&gt;Thumbnail generator, content moderation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Any log event&lt;/td&gt;
&lt;td&gt;Any service&lt;/td&gt;
&lt;td&gt;Analytics and monitoring pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern: anything that can happen &lt;em&gt;a moment after&lt;/em&gt; the user gets their response belongs in a queue. The user doesn't need to wait for their welcome email before they see the dashboard.&lt;/p&gt;

&lt;h2&gt;When NOT to Use a Queue&lt;/h2&gt;

&lt;p&gt;Async isn't always better. Sometimes you genuinely need a direct answer right now.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use synchronous when&lt;/th&gt;
&lt;th&gt;Use async (queue) when&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;User is waiting for the result&lt;/td&gt;
&lt;td&gt;It can happen in the background&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast, simple operations&lt;/td&gt;
&lt;td&gt;Long-running or heavy processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Checking login credentials&lt;/td&gt;
&lt;td&gt;Sending a welcome email&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Payment confirmation&lt;/td&gt;
&lt;td&gt;Generating a monthly PDF statement&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A payment confirmation needs to be synchronous — the user is staring at a spinner. Generating their statement PDF? Queue it. Learning to tell the difference is one of the core instincts of a backend engineer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tools you'll encounter:&lt;/strong&gt; Kafka, RabbitMQ, Amazon SQS, Google Pub/Sub. The concept is identical across all of them — producer, queue, consumer. The details differ.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;The Full Picture — Everything Together&lt;/h2&gt;

&lt;p&gt;Here's how our architecture evolved across all three posts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Post 1 — The Beginning:
  User → Server → Database
  Works great. Until 500,000 people show up.

Post 1 — Scaling:
  User → [Server 1]
       → [Server 2] → Database
       → [Server 3]
  More capacity. But how do requests get routed?

Post 2 — Load Balancing:
  User → Load Balancer → [Server 1] → Database
                      → [Server 2] → Database
                      → [Server 3] → Database
  Traffic distributes intelligently now.

Post 2 — Consistent Hashing:
  Same setup, but servers and caches use a hash ring.
  A node dying reshuffles ~1/N keys instead of everything.

Post 3 — Message Queues:
  User → Load Balancer → [Servers]
                              |
                       [MESSAGE QUEUE]
                              |
                    Worker Services (async)
                              |
                           Database
  Heavy work moves off the critical path.
  The system absorbs spikes. Nothing is lost.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That final architecture isn't exotic. It's the baseline of how most production systems you interact with every day are built — Instagram, Spotify, WhatsApp. The specific implementations differ, but the principles are exactly these.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Every solution introduces the next problem. That's not a bug — that's the game. And once you see the pattern, you can't unsee it.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;What Comes Next&lt;/h2&gt;

&lt;p&gt;We've covered the foundation layer. But there's a whole second layer waiting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Databases at scale&lt;/strong&gt; — SQL vs NoSQL, replication, sharding, CAP theorem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caching&lt;/strong&gt; — Redis, Memcached, cache invalidation strategies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CDNs&lt;/strong&gt; — How static content gets served from 50ms away no matter where you are&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate Limiting&lt;/strong&gt; — How systems protect themselves from being overwhelmed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these connects back to the five concepts we covered in this series. The vocabulary you've built here is the foundation everything else sits on.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is Part 3 of the System Design from First Principles series.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;← Part 1: What Is System Design, Really?&lt;/em&gt;&lt;br&gt;
&lt;em&gt;← Part 2: Load Balancing &amp;amp; Consistent Hashing&lt;/em&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>webdev</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
    <item>
      <title>Load Balancing &amp; Consistent Hashing — The Art of Splitting Work Fairly</title>
      <dc:creator>Rupesh Konduru</dc:creator>
      <pubDate>Thu, 26 Mar 2026 17:58:00 +0000</pubDate>
      <link>https://dev.to/rupesh_konduru_7516122dd2/load-balancing-consistent-hashing-the-art-of-splitting-work-fairly-43ma</link>
      <guid>https://dev.to/rupesh_konduru_7516122dd2/load-balancing-consistent-hashing-the-art-of-splitting-work-fairly-43ma</guid>
      <description>&lt;p&gt;You hired ten servers. Now someone needs to hand out the work — fairly, intelligently, and without breaking when one of them disappears.&lt;/p&gt;




&lt;p&gt;In the last post, we talked about horizontal scaling — adding more servers to handle more traffic. It sounds simple enough. But here's the question nobody asks out loud: &lt;em&gt;how does a user's request know which server to go to?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you just point everyone at the same IP address, they all pile into Server 1 while Server 2 and Server 3 sit there doing nothing. You've spent money on more machines and gained absolutely nothing.&lt;/p&gt;

&lt;p&gt;You need a traffic director. That's what this post is about.&lt;/p&gt;




&lt;h2&gt;The Load Balancer&lt;/h2&gt;

&lt;p&gt;A Load Balancer sits in front of all your servers and acts as the single point of contact for every incoming request. Users talk to it, it decides which server handles the work, and the server responds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                     ┌─────────────────┐
                     │                 │──→ Server 1
Users ──→ Load Balancer                │──→ Server 2
                     │                 │──→ Server 3
                     └─────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the user's perspective, they're talking to one address. They have no idea ten servers exist behind it. That invisibility is intentional — and it's one of the most elegant things about how the web works.&lt;/p&gt;

&lt;h2&gt;How Does It Actually Decide?&lt;/h2&gt;

&lt;p&gt;There are several routing strategies, each with a different personality:&lt;/p&gt;

&lt;h3&gt;🔄 Round Robin — Take turns&lt;/h3&gt;

&lt;p&gt;Request 1 → Server 1. Request 2 → Server 2. Request 3 → Server 3. Request 4 → back to Server 1.&lt;/p&gt;

&lt;p&gt;✅ Dead simple, zero overhead&lt;br&gt;
❌ Doesn't account for request weight — a heavy video upload and a tiny ping get treated identically&lt;/p&gt;
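&lt;p&gt;Round robin is essentially one line of Python with &lt;code&gt;itertools.cycle&lt;/code&gt; (the server names are placeholders):&lt;/p&gt;

```python
import itertools

servers = ["server1", "server2", "server3"]
rotation = itertools.cycle(servers)

def route():
    """Each request simply takes the next server in the rotation."""
    return next(rotation)

first_four = [route() for _ in range(4)]
print(first_four)   # ['server1', 'server2', 'server3', 'server1']
```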
&lt;h3&gt;⚖️ Weighted Round Robin — Not all workers are equal&lt;/h3&gt;

&lt;p&gt;Same rotation, but servers get weights based on capacity. A powerful server might get 3 out of every 5 requests while a smaller one gets 2.&lt;/p&gt;

&lt;p&gt;✅ Great when your servers have different specs&lt;br&gt;
❌ Still doesn't account for what's actually happening on each server right now&lt;/p&gt;
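&lt;p&gt;A naive sketch of the weighted version: expand each server into as many rotation slots as its weight. Production balancers like nginx use a smoother interleaving, but the proportions come out the same:&lt;/p&gt;

```python
import itertools

# Assumed weights: 3 of every 5 requests to the big box, 2 to the small one.
weights = {"big_server": 3, "small_server": 2}

def weighted_rotation(weights):
    slots = []
    for server, weight in weights.items():
        slots.extend([server] * weight)    # one slot per unit of capacity
    return itertools.cycle(slots)

rotation = weighted_rotation(weights)
first_five = [next(rotation) for _ in range(5)]
print(first_five.count("big_server"), first_five.count("small_server"))  # 3 2
```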
&lt;h3&gt;🔗 Least Connections — Go to whoever is least busy&lt;/h3&gt;

&lt;p&gt;The load balancer tracks active connections in real time and always routes to the least busy server.&lt;/p&gt;

&lt;p&gt;✅ Smart and dynamic — handles variable request durations well&lt;br&gt;
❌ Slightly more overhead to track connection counts continuously&lt;/p&gt;
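&lt;p&gt;In sketch form, least connections is just a &lt;code&gt;min()&lt;/code&gt; over live connection counts (the counts below are made up):&lt;/p&gt;

```python
# In-flight connection counts the balancer tracks in real time (illustrative).
active = {"server1": 7, "server2": 2, "server3": 5}

def route_least_connections(active):
    # Pick the server with the fewest in-flight connections.
    return min(active, key=active.get)

target = route_least_connections(active)
active[target] += 1        # the new request is now in flight there
print(target)              # server2
```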
&lt;h3&gt;🔒 IP Hashing — Same user, same server&lt;/h3&gt;

&lt;p&gt;The user's IP address gets hashed and always maps to the same server.&lt;/p&gt;

&lt;p&gt;✅ Useful for stateful sessions that can't be refactored&lt;br&gt;
❌ If that server goes down, re-routing gets complicated — and this leads us to our next topic&lt;/p&gt;
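&lt;p&gt;A sketch of IP hashing. One subtlety worth encoding: use a stable hash like MD5, because Python's built-in &lt;code&gt;hash()&lt;/code&gt; is randomly salted per process and would break the "same user, same server" guarantee across restarts:&lt;/p&gt;

```python
import hashlib

servers = ["server1", "server2", "server3"]

def route_by_ip(ip):
    # md5 is stable across processes, so the mapping never drifts.
    digest = hashlib.md5(ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

same = route_by_ip("203.0.113.9") == route_by_ip("203.0.113.9")
print(same)   # True: same user, same server, every time
```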


&lt;h2&gt;Layer 4 vs Layer 7 — Two Kinds of Intelligence&lt;/h2&gt;

&lt;p&gt;Load balancers can operate at different levels of the network stack:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;What it sees&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Layer 4&lt;/td&gt;
&lt;td&gt;IP addresses and TCP info only&lt;/td&gt;
&lt;td&gt;Raw speed, simple routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layer 7&lt;/td&gt;
&lt;td&gt;Full HTTP content — URL, headers, cookies&lt;/td&gt;
&lt;td&gt;Smart, content-aware routing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Layer 7 is where things get powerful. You can route requests to completely different server clusters based on what the request is actually asking for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/api/videos  ──→  Video processing servers
/api/auth    ──→  Auth servers
/api/search  ──→  Search servers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is called &lt;strong&gt;path-based routing&lt;/strong&gt; and it's the backbone of how microservices are structured at real companies.&lt;/p&gt;
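&lt;p&gt;A minimal sketch of path-based routing as a prefix lookup. The cluster names are hypothetical, and real L7 proxies support regexes, header matches, and much more:&lt;/p&gt;

```python
ROUTES = {
    "/api/videos": "video-cluster",
    "/api/auth": "auth-cluster",
    "/api/search": "search-cluster",
}

def route_by_path(path, default="web-cluster"):
    # First matching prefix wins; anything else goes to the default cluster.
    for prefix, cluster in ROUTES.items():
        if path.startswith(prefix):
            return cluster
    return default

print(route_by_path("/api/videos/upload"))   # video-cluster
print(route_by_path("/about"))               # web-cluster
```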

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Wait — isn't the load balancer itself a single point of failure?&lt;/strong&gt;&lt;br&gt;
Yes. The fix: run multiple load balancers. One active, one on standby. If the active one goes silent, the standby takes over automatically. This is called &lt;strong&gt;active-passive failover&lt;/strong&gt; and it shows up everywhere in resilient system design.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;Consistent Hashing — When Servers Come and Go&lt;/h2&gt;

&lt;p&gt;IP hashing introduced a sneaky problem: what happens when a server dies? Let me show you why this is nastier than it sounds.&lt;/p&gt;

&lt;h3&gt;The Catastrophe of Simple Hashing&lt;/h3&gt;

&lt;p&gt;Say you have 3 cache servers and a simple formula:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;server_index = hash(key) % number_of_servers

hash("user_123") % 3 = 1  → Server 1
hash("user_456") % 3 = 2  → Server 2
hash("user_789") % 3 = 0  → Server 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works perfectly. Until Server 1 crashes. Now you have 2 servers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hash("user_123") % 2 = 1  → Server 1 (gone 💀)
hash("user_456") % 2 = 0  → Server 0 (was on Server 2!)
hash("user_789") % 2 = 1  → Server 1 (gone 💀)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Almost every key remaps to a different server. In a cache, this triggers a massive wave of cache misses — every request now hits your database directly. Your database gets hammered. Your system crawls to a halt. All because one server went down.&lt;/p&gt;

&lt;p&gt;This is the problem Consistent Hashing was invented to solve.&lt;/p&gt;
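&lt;p&gt;You can measure the damage yourself. Hash 10,000 keys mod 3, then mod 2, and count how many land on a different index; roughly two thirds of them move:&lt;/p&gt;

```python
import hashlib

def stable_hash(key):
    # Stable across runs (Python's built-in hash() is salted per process).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"user_{i}" for i in range(10_000)]

before = {k: stable_hash(k) % 3 for k in keys}   # 3 servers
after = {k: stable_hash(k) % 2 for k in keys}    # one server dies, mod 2 now

moved = sum(1 for k in keys if before[k] != after[k])
print(round(moved / len(keys), 2))   # roughly two thirds change servers
```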

&lt;h3&gt;The Hash Ring&lt;/h3&gt;

&lt;p&gt;Imagine a ring — like a clock face — numbered 0 to 360 degrees. This is called the &lt;strong&gt;hash ring&lt;/strong&gt;. Both your servers and your data keys get hashed onto this same ring.&lt;/p&gt;

&lt;p&gt;The rule is beautifully simple: to find which server handles a key, start at that key's position and &lt;strong&gt;walk clockwise until you hit a server.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user_123 at 120° → walks clockwise → hits Server B at 180° ✅
user_456 at 200° → walks clockwise → hits Server C at 270° ✅
user_789 at 300° → walks clockwise → wraps around → Server A at 90° ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Now Watch What Happens When a Server Dies&lt;/h3&gt;

&lt;p&gt;Server B at 180° crashes. What happens to user_123 at 120°?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before: user_123 at 120° → Server B at 180° ✅
After:  user_123 at 120° → Server B GONE
                         → keeps walking → Server C at 270° ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only the keys pointing to Server B get reassigned — flowing to the next server clockwise. Every other key? Completely undisturbed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;With simple hashing, one server dying reshuffles everything. With consistent hashing, it only affects about 1/N of your keys.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That difference is the entire reason this algorithm exists.&lt;/p&gt;

&lt;h3&gt;Virtual Nodes — Fixing Uneven Distribution&lt;/h3&gt;

&lt;p&gt;If servers land unevenly on the ring, some get far more traffic than others. The fix: place each server on the ring at several positions, typically by hashing labels like "ServerA#1", "ServerA#2", and so on. These extra positions are called &lt;strong&gt;virtual nodes&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Server A → hashes to 60°, 180°, 300°
Server B → hashes to 30°, 150°, 270°
Server C → hashes to 90°, 210°, 330°

Result:
30°[B] 60°[A] 90°[C] 150°[B] 180°[A] 210°[C] 270°[B] 300°[A] 330°[C]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Servers interleave evenly around the ring. When one dies, its load spreads across &lt;em&gt;all&lt;/em&gt; remaining servers proportionally. This is what Amazon DynamoDB, Apache Cassandra, and most large CDNs actually use.&lt;/p&gt;
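&lt;p&gt;Here's a compact hash ring with virtual nodes, built on &lt;code&gt;hashlib&lt;/code&gt; and &lt;code&gt;bisect&lt;/code&gt;. It's a sketch of the idea rather than a production implementation, but it demonstrates the ~1/N property directly:&lt;/p&gt;

```python
import bisect
import hashlib

def ring_hash(value):
    # Stable hash onto the ring (the "degrees" from the analogy).
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server appears vnodes times, by hashing labels like "A#17".
        self.ring = sorted((ring_hash(f"{s}#{i}"), s)
                           for s in servers for i in range(vnodes))
        self.positions = [pos for pos, _ in self.ring]

    def lookup(self, key):
        # Walk clockwise: first virtual node at or after the key's position.
        idx = bisect.bisect(self.positions, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]

keys = [f"user_{i}" for i in range(10_000)]
full = HashRing(["A", "B", "C"])
degraded = HashRing(["A", "C"])        # Server B dies

moved = sum(1 for k in keys if full.lookup(k) != degraded.lookup(k))
print(round(moved / len(keys), 2))     # about a third of keys move, not all
```

&lt;p&gt;Only the keys Server B owned get reassigned; everything else maps exactly as before.&lt;/p&gt;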

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Simple Hashing&lt;/th&gt;
&lt;th&gt;Consistent Hashing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Server removed&lt;/td&gt;
&lt;td&gt;~100% of keys remap&lt;/td&gt;
&lt;td&gt;~1/N keys remap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Server added&lt;/td&gt;
&lt;td&gt;~100% of keys remap&lt;/td&gt;
&lt;td&gt;~1/N keys remap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load distribution&lt;/td&gt;
&lt;td&gt;Even (if lucky)&lt;/td&gt;
&lt;td&gt;Even with virtual nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Used in production&lt;/td&gt;
&lt;td&gt;Rarely at scale&lt;/td&gt;
&lt;td&gt;DynamoDB, Cassandra, CDNs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;The Architecture So Far&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Post 1:  User → Server → Database

Post 1:  User → [Server 1]
              → [Server 2] → Database
              → [Server 3]
         (but how do requests get routed?)

Post 2:  User → Load Balancer → [Server 1] → Database
                             → [Server 2] → Database
                             → [Server 3] → Database
         (with consistent hashing deciding distribution)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each solution creates the next problem. That rhythm is exactly how distributed systems evolved historically.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next in the series → Message Queues — The Superpower That Makes Systems Resilient&lt;/em&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>webdev</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
    <item>
      <title>What Is System Design, Really?</title>
      <dc:creator>Rupesh Konduru</dc:creator>
      <pubDate>Wed, 25 Mar 2026 19:58:04 +0000</pubDate>
      <link>https://dev.to/rupesh_konduru_7516122dd2/what-is-system-design-really-3j9n</link>
      <guid>https://dev.to/rupesh_konduru_7516122dd2/what-is-system-design-really-3j9n</guid>
      <description>&lt;p&gt;And why your perfectly working code can still fail spectacularly at scale.&lt;/p&gt;




&lt;p&gt;Let me start with something honest: I used to think system design was something only senior engineers needed to worry about. Write clean code, pass the tests, ship the feature. Done.&lt;/p&gt;

&lt;p&gt;Then I started actually thinking about what happens when your app goes from 500 users to 500,000 — and I realized good code alone doesn't save you. The &lt;em&gt;structure&lt;/em&gt; of your system is what either holds or collapses under pressure.&lt;/p&gt;

&lt;p&gt;This is the first post in a three-part series where I break down the foundations of system design the way I wish someone had explained them to me — through real analogies, simple diagrams, and plain English.&lt;/p&gt;




&lt;h2&gt;The Restaurant That Went Viral&lt;/h2&gt;

&lt;p&gt;Imagine you open a small restaurant. Day one, it's just you — you cook, you serve, you clean. Ten customers walk in. Everything runs smoothly. You're happy.&lt;/p&gt;

&lt;p&gt;Now imagine a food blogger with a million followers posts about your place. The next morning, 10,000 people show up.&lt;/p&gt;

&lt;p&gt;Suddenly you need multiple chefs. A system for taking orders without everyone shouting at once. A pantry that restocks itself. A way to handle the dinner rush without the kitchen catching fire.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;System design is the art of building software that doesn't fall apart when the world shows up at your door.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's it. That's the whole field. Everything else is just details of &lt;em&gt;how&lt;/em&gt; to do that well.&lt;/p&gt;

&lt;h2&gt;What System Design Actually Asks&lt;/h2&gt;

&lt;p&gt;When you solve a LeetCode problem, you're asking: &lt;em&gt;does this work?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When you do system design, you're asking something completely different: &lt;em&gt;does this work for ten million people, reliably, cheaply, and without going down at 2am on a Sunday?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;These are two different kinds of thinking. The first is about correctness. The second is about architecture — and that's what this series is about.&lt;/p&gt;

&lt;p&gt;The two goals every system must balance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt; — Can it handle growth?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliability&lt;/strong&gt; — Does it keep working when things go wrong?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every design decision you ever make is a trade-off between these two (and cost). There's no perfect answer — only informed choices.&lt;/p&gt;




&lt;h2&gt;Your Starter Kit — The Building Blocks&lt;/h2&gt;

&lt;p&gt;Think of system design like LEGO. Before you build a castle, you need to know what pieces exist. Here's the vocabulary you need before anything else makes sense:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Restaurant analogy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Client&lt;/td&gt;
&lt;td&gt;The browser or app making requests&lt;/td&gt;
&lt;td&gt;The customer walking in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Server&lt;/td&gt;
&lt;td&gt;Processes incoming requests&lt;/td&gt;
&lt;td&gt;The kitchen&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;Stores data persistently&lt;/td&gt;
&lt;td&gt;The pantry and fridge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache&lt;/td&gt;
&lt;td&gt;Fast, temporary storage&lt;/td&gt;
&lt;td&gt;Pre-prepped ingredients on the counter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load Balancer&lt;/td&gt;
&lt;td&gt;Distributes traffic across servers&lt;/td&gt;
&lt;td&gt;The host who seats customers evenly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Message Queue&lt;/td&gt;
&lt;td&gt;Holds tasks to be processed later&lt;/td&gt;
&lt;td&gt;The order ticket rail in a diner&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We'll go deep on each of these. For now, just know they exist and roughly what job they do.&lt;/p&gt;




&lt;h2&gt;Scaling: What Happens When Your App Blows Up&lt;/h2&gt;

&lt;p&gt;So your app got popular. Great problem to have. Now what?&lt;/p&gt;

&lt;p&gt;You have exactly two moves. The mental model: your server is a worker in a factory.&lt;/p&gt;

&lt;h3&gt;Option 1 — Vertical Scaling&lt;/h3&gt;

&lt;p&gt;Make the worker stronger. Give your existing server more RAM, a faster CPU, more storage. Simple, no code changes needed, works immediately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before:  [ Server: 8GB RAM,  4 cores  ]
After:   [ Server: 64GB RAM, 32 cores ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works — until it doesn't. There's a physical ceiling to how powerful one machine can get. And here's the silent killer: if that one giant server goes down, &lt;em&gt;everything&lt;/em&gt; goes down. You've built a very expensive single point of failure.&lt;/p&gt;

&lt;h3&gt;Option 2 — Horizontal Scaling&lt;/h3&gt;

&lt;p&gt;Instead of making one worker stronger, hire more workers. Add more servers and split the work between them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before:  [ Server 1 ]

After:   [ Server 1 ]  [ Server 2 ]  [ Server 3 ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is how Google, Amazon, and Netflix operate. Theoretically infinite — just keep adding machines. And if one dies, the others keep running. No single point of failure.&lt;/p&gt;

&lt;p&gt;The downside? Complexity. Now you need something to coordinate these servers. And a new question emerges: if a user logs in on Server 1, does Server 3 know who they are?&lt;/p&gt;




&lt;h2&gt;The Stateless Insight That Makes It All Work&lt;/h2&gt;

&lt;p&gt;When you have multiple servers, a user might hit Server 1 on their first request and Server 3 on their next. If their login session was stored &lt;em&gt;inside&lt;/em&gt; Server 1, Server 3 has no idea who they are.&lt;/p&gt;

&lt;p&gt;The elegant fix: make your servers &lt;strong&gt;stateless&lt;/strong&gt;. They don't remember anything about the user themselves. All session data lives in a shared database or cache that every server can reach.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❌ Stateful — bad for scaling:
User → Server 1 (remembers session) ✅
User → Server 3 (no memory)         ❌

✅ Stateless — good for scaling:
User → Server 1 → reads from shared DB ✅
User → Server 3 → reads from shared DB ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every server becomes interchangeable — like identical chefs who all read from the same recipe book. It doesn't matter which one handles your order. The output is the same.&lt;/p&gt;
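&lt;p&gt;A toy version of the stateless pattern, with a plain dict standing in for the shared session store (Redis or a database in real systems):&lt;/p&gt;

```python
# Shared session store every server can reach (a dict stands in here).
session_store = {"token_abc": {"user_id": 42}}

def handle_request(server_name, session_token):
    # The server holds no per-user state of its own: it looks the session
    # up in the shared store on every request, so any server can serve
    # any user.
    session = session_store.get(session_token)
    if session is None:
        return f"{server_name}: 401 unauthorized"
    return f"{server_name}: hello user {session['user_id']}"

print(handle_request("server1", "token_abc"))   # server1: hello user 42
print(handle_request("server3", "token_abc"))   # server3: hello user 42
```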

&lt;h2&gt;Don't Forget: Your Database Scales Too&lt;/h2&gt;

&lt;p&gt;Here's a mistake beginners almost always make. You scale your servers to 100 instances — but they're all hammering the same single database. That database becomes your new bottleneck. You've just moved the problem downstream.&lt;/p&gt;

&lt;p&gt;Two techniques to know for now:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication&lt;/strong&gt; — Copy your database across multiple machines. Reads get faster and you get built-in backups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sharding&lt;/strong&gt; — Split your database into chunks. User IDs 1–1M on DB1, 1M+1–2M on DB2. Each machine handles a slice of the data.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;em&gt;every layer of your system can become a bottleneck, and every layer can be scaled.&lt;/em&gt;&lt;/p&gt;
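&lt;p&gt;Range-based shard selection is a small lookup. The boundaries below are the illustrative ones from the sharding example above:&lt;/p&gt;

```python
# Range-based sharding: each database holds a slice of the user IDs.
SHARDS = [
    (1, 1_000_000, "db1"),
    (1_000_001, 2_000_000, "db2"),
]

def shard_for(user_id):
    # range() membership is an O(1) containment check for ints.
    for low, high, db in SHARDS:
        if user_id in range(low, high + 1):
            return db
    raise KeyError(f"no shard covers user {user_id}")

print(shard_for(123))          # db1
print(shard_for(1_500_000))    # db2
```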




&lt;h2&gt;The Mental Model to Keep&lt;/h2&gt;

&lt;p&gt;Whenever someone asks "how would you scale X?" — think in layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traffic surge hits →
  → Scale your servers (horizontal)
  → Put a Load Balancer in front
  → Make servers stateless
  → Scale your database (replication / sharding)
  → Add a Cache to reduce DB load
  → Add a CDN for static content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each fix reveals the next bottleneck. That's not a bug — that's the game.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Anyone can write code. Not everyone can think about what happens when 10 million people run that code simultaneously.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's what system design is training you to do.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next in the series → Load Balancing &amp;amp; Consistent Hashing — The Art of Splitting Work Fairly&lt;/em&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>webdev</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
