<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rupesh Konduru</title>
    <description>The latest articles on DEV Community by Rupesh Konduru (@rupesh_konduru_7516122dd2).</description>
    <link>https://dev.to/rupesh_konduru_7516122dd2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843412%2F44f8b6ef-e384-4587-b429-c43005491eeb.png</url>
      <title>DEV Community: Rupesh Konduru</title>
      <link>https://dev.to/rupesh_konduru_7516122dd2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rupesh_konduru_7516122dd2"/>
    <language>en</language>
    <item>
      <title>Message Queues &amp; Why Async Changes Everything</title>
      <dc:creator>Rupesh Konduru</dc:creator>
      <pubDate>Mon, 30 Mar 2026 13:11:00 +0000</pubDate>
      <link>https://dev.to/rupesh_konduru_7516122dd2/message-queues-why-async-changes-everything-5eod</link>
      <guid>https://dev.to/rupesh_konduru_7516122dd2/message-queues-why-async-changes-everything-5eod</guid>
      <description>&lt;p&gt;What if the two sides of a conversation don't need to be available at the same time? That one idea unlocks a completely different way of building systems.&lt;/p&gt;




&lt;p&gt;Think about the difference between a phone call and an email.&lt;/p&gt;

&lt;p&gt;A phone call requires both people present at the exact same moment. If the other person is busy, you're blocked. Nothing happens until they pick up.&lt;/p&gt;

&lt;p&gt;An email is different. You write it, send it, move on. The other person reads it when they're ready. You're not waiting. Life continues.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;That difference — synchronous vs asynchronous — is the entire soul of Message Queues.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And once you understand it, you'll see it everywhere in systems you use every day.&lt;/p&gt;




&lt;h2&gt;The Problem With Talking Directly&lt;/h2&gt;

&lt;p&gt;In a typical system, when Service A needs Service B to do something, it calls it directly and waits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Service A ──→ HTTP Request ──→ Service B
              (waits...)
Service A ←── Response    ←── Service B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean, simple — and fragile in ways that only reveal themselves at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tight Coupling:&lt;/strong&gt; Both services must be running simultaneously. If B goes down, A's calls fail or hang, and the failure spreads upstream. Two independent services become dependent on each other's heartbeat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speed Mismatch:&lt;/strong&gt; What if Service A fires 10,000 requests per second but B can only process 500? Requests pile up, time out, and fail. A is screaming into a bottleneck it has no control over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No Safety Net:&lt;/strong&gt; If B is temporarily down and A's request fails, that work is just gone. A needs complex retry logic or accepts data loss.&lt;/p&gt;

&lt;p&gt;Message Queues solve all three simultaneously.&lt;/p&gt;




&lt;h2&gt;The Diner Analogy&lt;/h2&gt;

&lt;p&gt;Picture a busy restaurant. When a customer orders, the waiter doesn't march into the kitchen and stand there waiting until the food is ready before taking another order. The entire front of house would grind to a halt.&lt;/p&gt;

&lt;p&gt;Instead, the waiter writes the order on a ticket, clips it to the rail, and goes back to take more orders. The kitchen picks up tickets at its own pace.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Waiter ──→ Order ticket rail ──→ Kitchen
(Producer)  (Message Queue)    (Consumer)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The waiter doesn't care how long the kitchen takes&lt;/li&gt;
&lt;li&gt;The kitchen doesn't get overwhelmed by 50 simultaneous verbal orders&lt;/li&gt;
&lt;li&gt;If a chef calls in sick, tickets pile up briefly — nothing is lost&lt;/li&gt;
&lt;li&gt;You can hire more chefs independently of the front of house&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly how message queues work in software.&lt;/p&gt;
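&lt;p&gt;The ticket rail can be sketched in a few lines with Python's standard-library &lt;code&gt;queue.Queue&lt;/code&gt;, an in-process stand-in for a real broker. The names &lt;code&gt;waiter&lt;/code&gt; and &lt;code&gt;kitchen&lt;/code&gt; are just the analogy carried over:&lt;/p&gt;

```python
import queue
import threading

orders = queue.Queue()     # the ticket rail (a real broker in production)

def waiter(order_id):
    """Producer: clip the ticket to the rail and move on immediately."""
    orders.put(order_id)

def kitchen(cooked):
    """Consumer: pick up tickets at its own pace."""
    while True:
        order_id = orders.get()
        if order_id is None:       # sentinel: shift is over
            break
        cooked.append(order_id)    # "cook" the order

cooked = []
chef = threading.Thread(target=kitchen, args=(cooked,))
chef.start()

for i in range(5):
    waiter(i)                      # the waiter never waits on the kitchen

orders.put(None)                   # close up shop
chef.join()
print(cooked)                      # [0, 1, 2, 3, 4]
```

&lt;p&gt;The producer never blocks on the consumer. The queue is the only thing the two sides share.&lt;/p&gt;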




&lt;h2&gt;The Three Superpowers&lt;/h2&gt;

&lt;h3&gt;⚡ Superpower 1 — Decoupling&lt;/h3&gt;

&lt;p&gt;Without a queue, your User Service has direct wires to your Email Service, Analytics Service, and Notification Service. If any one of them goes down, your User Service feels it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❌ Without Queue:
User Service ──→ Email Service
User Service ──→ Analytics Service
User Service ──→ Notification Service
(breaks if ANY of these go down)

✅ With Queue:
User Service ──→ [QUEUE]
                    ↓
               Email Service reads when ready
               Analytics Service reads when ready
               Notification Service reads when ready
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can add new consumers — new services that react to events — without touching the producer at all. Plug-and-play architecture.&lt;/p&gt;
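&lt;p&gt;Here's a toy illustration of that plug-and-play property: a tiny in-memory "broker" with one queue per consumer. Real brokers (RabbitMQ exchanges, SNS plus SQS fan-out) do this far more robustly, but the shape is the same:&lt;/p&gt;

```python
import queue

subscribers = {}               # one queue per consumer service

def subscribe(service_name):
    subscribers[service_name] = queue.Queue()
    return subscribers[service_name]

def publish(event):
    """Producer: hand the event to every subscriber queue and move on."""
    for q in subscribers.values():
        q.put(event)

email_q = subscribe("email")
analytics_q = subscribe("analytics")   # added later, producer untouched

signup = {"type": "user_signed_up", "user_id": 42}
publish(signup)

# Each service reads from its own queue, whenever it is ready:
email_seen = email_q.get()
analytics_seen = analytics_q.get()
print(email_seen == signup, analytics_seen == signup)   # True True
```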

&lt;h3&gt;🛡️ Superpower 2 — Durability&lt;/h3&gt;

&lt;p&gt;Messages sit in the queue until they're successfully processed. If the consumer crashes mid-task, the message doesn't disappear — it gets redelivered when the consumer comes back up.&lt;/p&gt;

&lt;p&gt;This works through &lt;strong&gt;acknowledgements&lt;/strong&gt;. The queue only deletes a message after the consumer explicitly says "I handled this successfully." Your system can crash and restart without losing a single unit of work.&lt;/p&gt;
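&lt;p&gt;A rough sketch of the acknowledgement idea, again with an in-memory queue standing in for a broker. Real redelivery involves visibility timeouts and retry limits, which this toy skips:&lt;/p&gt;

```python
import queue

tasks = queue.Queue()
tasks.put("send_welcome_email")

def process(task, should_fail):
    if should_fail:
        raise RuntimeError("consumer crashed mid-task")
    return f"done: {task}"

def consume(should_fail):
    """At-least-once delivery: only 'ack' (drop) the message after success."""
    task = tasks.get()
    try:
        result = process(task, should_fail)
        tasks.task_done()        # the ack: the broker may now delete it
        return result
    except RuntimeError:
        tasks.put(task)          # no ack: message goes back for redelivery
        return None

first_try = consume(should_fail=True)      # consumer "crashes", no ack
second_try = consume(should_fail=False)    # redelivered and processed
print(first_try, second_try)               # None done: send_welcome_email
```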

&lt;h3&gt;🚀 Superpower 3 — Load Leveling&lt;/h3&gt;

&lt;p&gt;Imagine a sudden surge: Black Friday, a viral post, a TV segment about your app. Without a queue, 10,000 requests per second hitting a service that handles 500 means collapse.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Without Queue:
10,000 req/sec → Service B (handles 500/sec) → 💀

With Queue:
10,000 req/sec → Queue (holds patiently)
                   ↓
              Service B processes at 500/sec → ✅ everything handled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The queue acts as a &lt;strong&gt;shock absorber&lt;/strong&gt;. Your system bends instead of breaks. You can also spin up more consumers automatically when the backlog grows — scaling in direct response to real demand.&lt;/p&gt;
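&lt;p&gt;You can see the shock absorber in a few lines of simulation: a 10,000-message spike arrives at once, the consumer drains 500 per tick, and nothing is lost:&lt;/p&gt;

```python
from collections import deque

RATE = 500                 # Service B can only process 500 messages per tick
backlog = deque()

for i in range(10_000):    # a one-tick Black Friday spike
    backlog.append(i)

processed = 0
ticks = 0
while backlog:
    for _ in range(min(RATE, len(backlog))):   # drain at most RATE per tick
        backlog.popleft()
        processed += 1
    ticks += 1

print(processed, ticks)    # 10000 20 : every request handled, none dropped
```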




&lt;h2&gt;What Actually Goes Into a Queue?&lt;/h2&gt;

&lt;p&gt;Anything that doesn't need an instant response:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Trigger&lt;/th&gt;
&lt;th&gt;Producer&lt;/th&gt;
&lt;th&gt;Consumer(s)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;User signs up&lt;/td&gt;
&lt;td&gt;Auth Service&lt;/td&gt;
&lt;td&gt;Email Service → welcome email&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Video uploaded&lt;/td&gt;
&lt;td&gt;Upload Service&lt;/td&gt;
&lt;td&gt;Transcoding Service → compression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Order placed&lt;/td&gt;
&lt;td&gt;Order Service&lt;/td&gt;
&lt;td&gt;Inventory, Billing, Notifications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image posted&lt;/td&gt;
&lt;td&gt;App Server&lt;/td&gt;
&lt;td&gt;Thumbnail generator, content moderation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Any log event&lt;/td&gt;
&lt;td&gt;Any service&lt;/td&gt;
&lt;td&gt;Analytics and monitoring pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern: anything that can happen &lt;em&gt;a moment after&lt;/em&gt; the user gets their response belongs in a queue. The user doesn't need to wait for their welcome email before they see the dashboard.&lt;/p&gt;

&lt;h2&gt;When NOT to Use a Queue&lt;/h2&gt;

&lt;p&gt;Async isn't always better. Sometimes you genuinely need a direct answer right now.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use synchronous when&lt;/th&gt;
&lt;th&gt;Use async (queue) when&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;User is waiting for the result&lt;/td&gt;
&lt;td&gt;It can happen in the background&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast, simple operations&lt;/td&gt;
&lt;td&gt;Long-running or heavy processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Checking login credentials&lt;/td&gt;
&lt;td&gt;Sending a welcome email&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Payment confirmation&lt;/td&gt;
&lt;td&gt;Generating a monthly PDF statement&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A payment confirmation needs to be synchronous — the user is staring at a spinner. Generating their statement PDF? Queue it. Learning to tell the difference is one of the core instincts of a backend engineer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tools you'll encounter:&lt;/strong&gt; Kafka, RabbitMQ, Amazon SQS, Google Pub/Sub. The concept is identical across all of them — producer, queue, consumer. The details differ.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;The Full Picture — Everything Together&lt;/h2&gt;

&lt;p&gt;Here's how our architecture evolved across all three posts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Post 1 — The Beginning:
  User → Server → Database
  Works great. Until 500,000 people show up.

Post 1 — Scaling:
  User → [Server 1]
       → [Server 2] → Database
       → [Server 3]
  More capacity. But how do requests get routed?

Post 2 — Load Balancing:
  User → Load Balancer → [Server 1] → Database
                      → [Server 2] → Database
                      → [Server 3] → Database
  Traffic distributes intelligently now.

Post 2 — Consistent Hashing:
  Same setup, but servers and caches use a hash ring.
  A node dying reshuffles ~1/N keys instead of everything.

Post 3 — Message Queues:
  User → Load Balancer → [Servers]
                              |
                       [MESSAGE QUEUE]
                              |
                    Worker Services (async)
                              |
                           Database
  Heavy work moves off the critical path.
  The system absorbs spikes. Nothing is lost.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That final architecture isn't exotic. It's the baseline of how most production systems you interact with every day are built — Instagram, Spotify, WhatsApp. The specific implementations differ, but the principles are exactly these.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Every solution introduces the next problem. That's not a bug — that's the game. And once you see the pattern, you can't unsee it.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;What Comes Next&lt;/h2&gt;

&lt;p&gt;We've covered the foundation layer. But there's a whole second layer waiting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Databases at scale&lt;/strong&gt; — SQL vs NoSQL, replication, sharding, CAP theorem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caching&lt;/strong&gt; — Redis, Memcached, cache invalidation strategies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CDNs&lt;/strong&gt; — How static content gets served from 50ms away no matter where you are&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate Limiting&lt;/strong&gt; — How systems protect themselves from being overwhelmed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these connects back to the five concepts we covered in this series. The vocabulary you've built here is the foundation everything else sits on.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is Part 3 of the System Design from First Principles series.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;← Part 1: What Is System Design, Really?&lt;/em&gt;&lt;br&gt;
&lt;em&gt;← Part 2: Load Balancing &amp;amp; Consistent Hashing&lt;/em&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>webdev</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
    <item>
      <title>Load Balancing &amp; Consistent Hashing — The Art of Splitting Work Fairly</title>
      <dc:creator>Rupesh Konduru</dc:creator>
      <pubDate>Thu, 26 Mar 2026 17:58:00 +0000</pubDate>
      <link>https://dev.to/rupesh_konduru_7516122dd2/load-balancing-consistent-hashing-the-art-of-splitting-work-fairly-43ma</link>
      <guid>https://dev.to/rupesh_konduru_7516122dd2/load-balancing-consistent-hashing-the-art-of-splitting-work-fairly-43ma</guid>
      <description>&lt;p&gt;You hired ten servers. Now someone needs to hand out the work — fairly, intelligently, and without breaking when one of them disappears.&lt;/p&gt;




&lt;p&gt;In the last post, we talked about horizontal scaling — adding more servers to handle more traffic. It sounds simple enough. But here's the question nobody asks out loud: &lt;em&gt;how does a user's request know which server to go to?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you just point everyone at the same IP address, they all pile into Server 1 while Server 2 and Server 3 sit there doing nothing. You've spent money on more machines and gained absolutely nothing.&lt;/p&gt;

&lt;p&gt;You need a traffic director. That's what this post is about.&lt;/p&gt;




&lt;h2&gt;The Load Balancer&lt;/h2&gt;

&lt;p&gt;A Load Balancer sits in front of all your servers and acts as the single point of contact for every incoming request. Users talk to it, it decides which server handles the work, and the server responds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                     ┌─────────────────┐
                     │                 │──→ Server 1
Users ──→ Load Balancer                │──→ Server 2
                     │                 │──→ Server 3
                     └─────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the user's perspective, they're talking to one address. They have no idea ten servers exist behind it. That invisibility is intentional — and it's one of the most elegant things about how the web works.&lt;/p&gt;

&lt;h2&gt;How Does It Actually Decide?&lt;/h2&gt;

&lt;p&gt;There are several routing strategies, each with a different personality:&lt;/p&gt;

&lt;h3&gt;🔄 Round Robin — Take turns&lt;/h3&gt;

&lt;p&gt;Request 1 → Server 1. Request 2 → Server 2. Request 3 → Server 3. Request 4 → back to Server 1.&lt;/p&gt;

&lt;p&gt;✅ Dead simple, zero overhead&lt;br&gt;
❌ Doesn't account for request weight — a heavy video upload and a tiny ping get treated identically&lt;/p&gt;
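&lt;p&gt;Round robin is essentially one line of Python with &lt;code&gt;itertools.cycle&lt;/code&gt; (the server names are placeholders):&lt;/p&gt;

```python
import itertools

servers = ["server1", "server2", "server3"]
rotation = itertools.cycle(servers)

def route():
    """Each request simply takes the next server in the rotation."""
    return next(rotation)

first_four = [route() for _ in range(4)]
print(first_four)   # ['server1', 'server2', 'server3', 'server1']
```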
&lt;h3&gt;⚖️ Weighted Round Robin — Not all workers are equal&lt;/h3&gt;

&lt;p&gt;Same rotation, but servers get weights based on capacity. A powerful server might get 3 out of every 5 requests while a smaller one gets 2.&lt;/p&gt;

&lt;p&gt;✅ Great when your servers have different specs&lt;br&gt;
❌ Still doesn't account for what's actually happening on each server right now&lt;/p&gt;
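&lt;p&gt;A naive sketch of the weighted version: expand each server into as many rotation slots as its weight. Production balancers like nginx use a smoother interleaving, but the proportions come out the same:&lt;/p&gt;

```python
import itertools

# Assumed weights: 3 of every 5 requests to the big box, 2 to the small one.
weights = {"big_server": 3, "small_server": 2}

def weighted_rotation(weights):
    slots = []
    for server, weight in weights.items():
        slots.extend([server] * weight)    # one slot per unit of capacity
    return itertools.cycle(slots)

rotation = weighted_rotation(weights)
first_five = [next(rotation) for _ in range(5)]
print(first_five.count("big_server"), first_five.count("small_server"))  # 3 2
```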
&lt;h3&gt;🔗 Least Connections — Go to whoever is least busy&lt;/h3&gt;

&lt;p&gt;The load balancer tracks active connections in real time and always routes to the least busy server.&lt;/p&gt;

&lt;p&gt;✅ Smart and dynamic — handles variable request durations well&lt;br&gt;
❌ Slightly more overhead to track connection counts continuously&lt;/p&gt;
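&lt;p&gt;In sketch form, least connections is just a &lt;code&gt;min()&lt;/code&gt; over live connection counts (the counts below are made up):&lt;/p&gt;

```python
# In-flight connection counts the balancer tracks in real time (illustrative).
active = {"server1": 7, "server2": 2, "server3": 5}

def route_least_connections(active):
    # Pick the server with the fewest in-flight connections.
    return min(active, key=active.get)

target = route_least_connections(active)
active[target] += 1        # the new request is now in flight there
print(target)              # server2
```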
&lt;h3&gt;🔒 IP Hashing — Same user, same server&lt;/h3&gt;

&lt;p&gt;The user's IP address gets hashed and always maps to the same server.&lt;/p&gt;

&lt;p&gt;✅ Useful for stateful sessions that can't be refactored&lt;br&gt;
❌ If that server goes down, re-routing gets complicated — and this leads us to our next topic&lt;/p&gt;
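&lt;p&gt;A sketch of IP hashing. One subtlety worth encoding: use a stable hash like MD5, because Python's built-in &lt;code&gt;hash()&lt;/code&gt; is randomly salted per process and would break the "same user, same server" guarantee across restarts:&lt;/p&gt;

```python
import hashlib

servers = ["server1", "server2", "server3"]

def route_by_ip(ip):
    # md5 is stable across processes, so the mapping never drifts.
    digest = hashlib.md5(ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

same = route_by_ip("203.0.113.9") == route_by_ip("203.0.113.9")
print(same)   # True: same user, same server, every time
```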


&lt;h2&gt;Layer 4 vs Layer 7 — Two Kinds of Intelligence&lt;/h2&gt;

&lt;p&gt;Load balancers can operate at different levels of the network stack:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;What it sees&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Layer 4&lt;/td&gt;
&lt;td&gt;IP addresses and TCP info only&lt;/td&gt;
&lt;td&gt;Raw speed, simple routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layer 7&lt;/td&gt;
&lt;td&gt;Full HTTP content — URL, headers, cookies&lt;/td&gt;
&lt;td&gt;Smart, content-aware routing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Layer 7 is where things get powerful. You can route requests to completely different server clusters based on what the request is actually asking for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/api/videos  ──→  Video processing servers
/api/auth    ──→  Auth servers
/api/search  ──→  Search servers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is called &lt;strong&gt;path-based routing&lt;/strong&gt; and it's the backbone of how microservices are structured at real companies.&lt;/p&gt;
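&lt;p&gt;A minimal sketch of path-based routing as a prefix lookup. The cluster names are hypothetical, and real L7 proxies support regexes, header matches, and much more:&lt;/p&gt;

```python
ROUTES = {
    "/api/videos": "video-cluster",
    "/api/auth": "auth-cluster",
    "/api/search": "search-cluster",
}

def route_by_path(path, default="web-cluster"):
    # First matching prefix wins; anything else goes to the default cluster.
    for prefix, cluster in ROUTES.items():
        if path.startswith(prefix):
            return cluster
    return default

print(route_by_path("/api/videos/upload"))   # video-cluster
print(route_by_path("/about"))               # web-cluster
```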

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Wait — isn't the load balancer itself a single point of failure?&lt;/strong&gt;&lt;br&gt;
Yes. The fix: run multiple load balancers. One active, one on standby. If the active one goes silent, the standby takes over automatically. This is called &lt;strong&gt;active-passive failover&lt;/strong&gt; and it shows up everywhere in resilient system design.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;Consistent Hashing — When Servers Come and Go&lt;/h2&gt;

&lt;p&gt;IP hashing introduced a sneaky problem: what happens when a server dies? Let me show you why this is nastier than it sounds.&lt;/p&gt;

&lt;h3&gt;The Catastrophe of Simple Hashing&lt;/h3&gt;

&lt;p&gt;Say you have 3 cache servers and a simple formula:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;server_index = hash(key) % number_of_servers

hash("user_123") % 3 = 1  → Server 1
hash("user_456") % 3 = 2  → Server 2
hash("user_789") % 3 = 0  → Server 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works perfectly. Until Server 1 crashes. Now you have 2 servers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hash("user_123") % 2 = 1  → Server 1 (gone 💀)
hash("user_456") % 2 = 0  → Server 0 (was on Server 2!)
hash("user_789") % 2 = 1  → Server 1 (gone 💀)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Almost every key remaps to a different server. In a cache, this triggers a massive wave of cache misses — every request now hits your database directly. Your database gets hammered. Your system crawls to a halt. All because one server went down.&lt;/p&gt;

&lt;p&gt;This is the problem Consistent Hashing was invented to solve.&lt;/p&gt;
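&lt;p&gt;You can measure the damage yourself. Hash 10,000 keys mod 3, then mod 2, and count how many land on a different index; roughly two thirds of them move:&lt;/p&gt;

```python
import hashlib

def stable_hash(key):
    # Stable across runs (Python's built-in hash() is salted per process).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"user_{i}" for i in range(10_000)]

before = {k: stable_hash(k) % 3 for k in keys}   # 3 servers
after = {k: stable_hash(k) % 2 for k in keys}    # one server dies, mod 2 now

moved = sum(1 for k in keys if before[k] != after[k])
print(round(moved / len(keys), 2))   # roughly two thirds change servers
```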

&lt;h3&gt;The Hash Ring&lt;/h3&gt;

&lt;p&gt;Imagine a ring — like a clock face — numbered 0 to 360 degrees. This is called the &lt;strong&gt;hash ring&lt;/strong&gt;. Both your servers and your data keys get hashed onto this same ring.&lt;/p&gt;

&lt;p&gt;The rule is beautifully simple: to find which server handles a key, start at that key's position and &lt;strong&gt;walk clockwise until you hit a server.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user_123 at 120° → walks clockwise → hits Server B at 180° ✅
user_456 at 200° → walks clockwise → hits Server C at 270° ✅
user_789 at 300° → walks clockwise → wraps around → Server A at 90° ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Now Watch What Happens When a Server Dies&lt;/h3&gt;

&lt;p&gt;Server B at 180° crashes. What happens to user_123 at 120°?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before: user_123 at 120° → Server B at 180° ✅
After:  user_123 at 120° → Server B GONE
                         → keeps walking → Server C at 270° ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only the keys pointing to Server B get reassigned — flowing to the next server clockwise. Every other key? Completely undisturbed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;With simple hashing, one server dying reshuffles everything. With consistent hashing, it only affects about 1/N of your keys.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That difference is the entire reason this algorithm exists.&lt;/p&gt;

&lt;h3&gt;Virtual Nodes — Fixing Uneven Distribution&lt;/h3&gt;

&lt;p&gt;If servers land unevenly on the ring, some get far more traffic than others. The fix: place each server on the ring at several positions, typically by hashing labels like "ServerA#1", "ServerA#2", and so on. These extra positions are called &lt;strong&gt;virtual nodes&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Server A → hashes to 60°, 180°, 300°
Server B → hashes to 30°, 150°, 270°
Server C → hashes to 90°, 210°, 330°

Result:
30°[B] 60°[A] 90°[C] 150°[B] 180°[A] 210°[C] 270°[B] 300°[A] 330°[C]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Servers interleave evenly around the ring. When one dies, its load spreads across &lt;em&gt;all&lt;/em&gt; remaining servers proportionally. This is what Amazon DynamoDB, Apache Cassandra, and most large CDNs actually use.&lt;/p&gt;
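&lt;p&gt;Here's a compact hash ring with virtual nodes, built on &lt;code&gt;hashlib&lt;/code&gt; and &lt;code&gt;bisect&lt;/code&gt;. It's a sketch of the idea rather than a production implementation, but it demonstrates the ~1/N property directly:&lt;/p&gt;

```python
import bisect
import hashlib

def ring_hash(value):
    # Stable hash onto the ring (the "degrees" from the analogy).
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server appears vnodes times, by hashing labels like "A#17".
        self.ring = sorted((ring_hash(f"{s}#{i}"), s)
                           for s in servers for i in range(vnodes))
        self.positions = [pos for pos, _ in self.ring]

    def lookup(self, key):
        # Walk clockwise: first virtual node at or after the key's position.
        idx = bisect.bisect(self.positions, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]

keys = [f"user_{i}" for i in range(10_000)]
full = HashRing(["A", "B", "C"])
degraded = HashRing(["A", "C"])        # Server B dies

moved = sum(1 for k in keys if full.lookup(k) != degraded.lookup(k))
print(round(moved / len(keys), 2))     # about a third of keys move, not all
```

&lt;p&gt;Only the keys Server B owned get reassigned; everything else maps exactly as before.&lt;/p&gt;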

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Simple Hashing&lt;/th&gt;
&lt;th&gt;Consistent Hashing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Server removed&lt;/td&gt;
&lt;td&gt;~100% of keys remap&lt;/td&gt;
&lt;td&gt;~1/N keys remap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Server added&lt;/td&gt;
&lt;td&gt;~100% of keys remap&lt;/td&gt;
&lt;td&gt;~1/N keys remap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load distribution&lt;/td&gt;
&lt;td&gt;Even (if lucky)&lt;/td&gt;
&lt;td&gt;Even with virtual nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Used in production&lt;/td&gt;
&lt;td&gt;Rarely at scale&lt;/td&gt;
&lt;td&gt;DynamoDB, Cassandra, CDNs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;The Architecture So Far&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Post 1:  User → Server → Database

Post 1:  User → [Server 1]
              → [Server 2] → Database
              → [Server 3]
         (but how do requests get routed?)

Post 2:  User → Load Balancer → [Server 1] → Database
                             → [Server 2] → Database
                             → [Server 3] → Database
         (with consistent hashing deciding distribution)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each solution creates the next problem. That rhythm is exactly how distributed systems evolved historically.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next in the series → Message Queues — The Superpower That Makes Systems Resilient&lt;/em&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>webdev</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
    <item>
      <title>What Is System Design, Really?</title>
      <dc:creator>Rupesh Konduru</dc:creator>
      <pubDate>Wed, 25 Mar 2026 19:58:04 +0000</pubDate>
      <link>https://dev.to/rupesh_konduru_7516122dd2/what-is-system-design-really-3j9n</link>
      <guid>https://dev.to/rupesh_konduru_7516122dd2/what-is-system-design-really-3j9n</guid>
      <description>&lt;p&gt;And why your perfectly working code can still fail spectacularly at scale.&lt;/p&gt;




&lt;p&gt;Let me start with something honest: I used to think system design was something only senior engineers needed to worry about. Write clean code, pass the tests, ship the feature. Done.&lt;/p&gt;

&lt;p&gt;Then I started actually thinking about what happens when your app goes from 500 users to 500,000 — and I realized good code alone doesn't save you. The &lt;em&gt;structure&lt;/em&gt; of your system is what either holds or collapses under pressure.&lt;/p&gt;

&lt;p&gt;This is the first post in a three-part series where I break down the foundations of system design the way I wish someone had explained them to me — through real analogies, simple diagrams, and plain English.&lt;/p&gt;




&lt;h2&gt;The Restaurant That Went Viral&lt;/h2&gt;

&lt;p&gt;Imagine you open a small restaurant. Day one, it's just you — you cook, you serve, you clean. Ten customers walk in. Everything runs smoothly. You're happy.&lt;/p&gt;

&lt;p&gt;Now imagine a food blogger with a million followers posts about your place. The next morning, 10,000 people show up.&lt;/p&gt;

&lt;p&gt;Suddenly you need multiple chefs. A system for taking orders without everyone shouting at once. A pantry that restocks itself. A way to handle the dinner rush without the kitchen catching fire.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;System design is the art of building software that doesn't fall apart when the world shows up at your door.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's it. That's the whole field. Everything else is just details of &lt;em&gt;how&lt;/em&gt; to do that well.&lt;/p&gt;

&lt;h2&gt;What System Design Actually Asks&lt;/h2&gt;

&lt;p&gt;When you solve a LeetCode problem, you're asking: &lt;em&gt;does this work?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When you do system design, you're asking something completely different: &lt;em&gt;does this work for ten million people, reliably, cheaply, and without going down at 2am on a Sunday?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;These are two different kinds of thinking. The first is about correctness. The second is about architecture — and that's what this series is about.&lt;/p&gt;

&lt;p&gt;The two goals every system must balance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt; — Can it handle growth?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliability&lt;/strong&gt; — Does it keep working when things go wrong?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every design decision you ever make is a trade-off between these two (and cost). There's no perfect answer — only informed choices.&lt;/p&gt;




&lt;h2&gt;Your Starter Kit — The Building Blocks&lt;/h2&gt;

&lt;p&gt;Think of system design like LEGO. Before you build a castle, you need to know what pieces exist. Here's the vocabulary you need before anything else makes sense:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Restaurant analogy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Client&lt;/td&gt;
&lt;td&gt;The browser or app making requests&lt;/td&gt;
&lt;td&gt;The customer walking in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Server&lt;/td&gt;
&lt;td&gt;Processes incoming requests&lt;/td&gt;
&lt;td&gt;The kitchen&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;Stores data persistently&lt;/td&gt;
&lt;td&gt;The pantry and fridge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache&lt;/td&gt;
&lt;td&gt;Fast, temporary storage&lt;/td&gt;
&lt;td&gt;Pre-prepped ingredients on the counter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load Balancer&lt;/td&gt;
&lt;td&gt;Distributes traffic across servers&lt;/td&gt;
&lt;td&gt;The host who seats customers evenly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Message Queue&lt;/td&gt;
&lt;td&gt;Holds tasks to be processed later&lt;/td&gt;
&lt;td&gt;The order ticket rail in a diner&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We'll go deep on each of these. For now, just know they exist and roughly what job they do.&lt;/p&gt;




&lt;h2&gt;Scaling: What Happens When Your App Blows Up&lt;/h2&gt;

&lt;p&gt;So your app got popular. Great problem to have. Now what?&lt;/p&gt;

&lt;p&gt;You have exactly two moves. The mental model: your server is a worker in a factory.&lt;/p&gt;

&lt;h3&gt;Option 1 — Vertical Scaling&lt;/h3&gt;

&lt;p&gt;Make the worker stronger. Give your existing server more RAM, a faster CPU, more storage. Simple, no code changes needed, works immediately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before:  [ Server: 8GB RAM,  4 cores  ]
After:   [ Server: 64GB RAM, 32 cores ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works — until it doesn't. There's a physical ceiling to how powerful one machine can get. And here's the silent killer: if that one giant server goes down, &lt;em&gt;everything&lt;/em&gt; goes down. You've built a very expensive single point of failure.&lt;/p&gt;

&lt;h3&gt;Option 2 — Horizontal Scaling&lt;/h3&gt;

&lt;p&gt;Instead of making one worker stronger, hire more workers. Add more servers and split the work between them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before:  [ Server 1 ]

After:   [ Server 1 ]  [ Server 2 ]  [ Server 3 ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is how Google, Amazon, and Netflix operate. Theoretically infinite — just keep adding machines. And if one dies, the others keep running. No single point of failure.&lt;/p&gt;

&lt;p&gt;The downside? Complexity. Now you need something to coordinate these servers. And a new question emerges: if a user logs in on Server 1, does Server 3 know who they are?&lt;/p&gt;




&lt;h2&gt;The Stateless Insight That Makes It All Work&lt;/h2&gt;

&lt;p&gt;When you have multiple servers, a user might hit Server 1 on their first request and Server 3 on their next. If their login session was stored &lt;em&gt;inside&lt;/em&gt; Server 1, Server 3 has no idea who they are.&lt;/p&gt;

&lt;p&gt;The elegant fix: make your servers &lt;strong&gt;stateless&lt;/strong&gt;. They don't remember anything about the user themselves. All session data lives in a shared database or cache that every server can reach.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❌ Stateful — bad for scaling:
User → Server 1 (remembers session) ✅
User → Server 3 (no memory)         ❌

✅ Stateless — good for scaling:
User → Server 1 → reads from shared DB ✅
User → Server 3 → reads from shared DB ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every server becomes interchangeable — like identical chefs who all read from the same recipe book. It doesn't matter which one handles your order. The output is the same.&lt;/p&gt;
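&lt;p&gt;A toy version of the stateless pattern, with a plain dict standing in for the shared session store (Redis or a database in real systems):&lt;/p&gt;

```python
# Shared session store every server can reach (a dict stands in here).
session_store = {"token_abc": {"user_id": 42}}

def handle_request(server_name, session_token):
    # The server holds no per-user state of its own: it looks the session
    # up in the shared store on every request, so any server can serve
    # any user.
    session = session_store.get(session_token)
    if session is None:
        return f"{server_name}: 401 unauthorized"
    return f"{server_name}: hello user {session['user_id']}"

print(handle_request("server1", "token_abc"))   # server1: hello user 42
print(handle_request("server3", "token_abc"))   # server3: hello user 42
```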

&lt;h2&gt;Don't Forget: Your Database Scales Too&lt;/h2&gt;

&lt;p&gt;Here's a mistake beginners almost always make. You scale your servers to 100 instances — but they're all hammering the same single database. That database becomes your new bottleneck. You've just moved the problem downstream.&lt;/p&gt;

&lt;p&gt;Two techniques to know for now:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication&lt;/strong&gt; — Copy your database across multiple machines. Reads get faster and you get built-in backups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sharding&lt;/strong&gt; — Split your database into chunks. User IDs 1–1M on DB1, 1M+1–2M on DB2. Each machine handles a slice of the data.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;em&gt;every layer of your system can become a bottleneck, and every layer can be scaled.&lt;/em&gt;&lt;/p&gt;
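&lt;p&gt;Range-based shard selection is a small lookup. The boundaries below are the illustrative ones from the sharding example above:&lt;/p&gt;

```python
# Range-based sharding: each database holds a slice of the user IDs.
SHARDS = [
    (1, 1_000_000, "db1"),
    (1_000_001, 2_000_000, "db2"),
]

def shard_for(user_id):
    # range() membership is an O(1) containment check for ints.
    for low, high, db in SHARDS:
        if user_id in range(low, high + 1):
            return db
    raise KeyError(f"no shard covers user {user_id}")

print(shard_for(123))          # db1
print(shard_for(1_500_000))    # db2
```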




&lt;h2&gt;The Mental Model to Keep&lt;/h2&gt;

&lt;p&gt;Whenever someone asks "how would you scale X?" — think in layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traffic surge hits →
  → Scale your servers (horizontal)
  → Put a Load Balancer in front
  → Make servers stateless
  → Scale your database (replication / sharding)
  → Add a Cache to reduce DB load
  → Add a CDN for static content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each fix reveals the next bottleneck. That's not a bug — that's the game.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Anyone can write code. Not everyone can think about what happens when 10 million people run that code simultaneously.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's what system design is training you to do.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next in the series → Load Balancing &amp;amp; Consistent Hashing — The Art of Splitting Work Fairly&lt;/em&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>webdev</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
