<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: aashna mahajan</title>
    <description>The latest articles on DEV Community by aashna mahajan (@aashna_mahajan).</description>
    <link>https://dev.to/aashna_mahajan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3948100%2F5dc6a428-d0fc-415d-a2ec-46ce21d948d8.png</url>
      <title>DEV Community: aashna mahajan</title>
      <link>https://dev.to/aashna_mahajan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aashna_mahajan"/>
    <language>en</language>
    <item>
      <title>I Failed My First System Design Interviews. These 5 Concepts Were Why.</title>
      <dc:creator>aashna mahajan</dc:creator>
      <pubDate>Sun, 24 May 2026 22:37:04 +0000</pubDate>
      <link>https://dev.to/aashna_mahajan/i-bombed-my-first-system-design-interviews-these-5-concepts-were-why-4nb3</link>
      <guid>https://dev.to/aashna_mahajan/i-bombed-my-first-system-design-interviews-these-5-concepts-were-why-4nb3</guid>
      <description>&lt;p&gt;I failed three system design interviews in a row.&lt;/p&gt;

&lt;p&gt;Not because I didn't know the concepts. I knew them cold. Caching, sharding, consistent hashing, CAP theorem, message queues — I could define every one.&lt;/p&gt;

&lt;p&gt;What I couldn't do: answer what came next.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"What happens when the cache gets stale?"&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"Why are you sharding this?"&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"So you'd ignore partition tolerance?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Every time, I had the surface answer. Every time, the follow-up question exposed that I'd never thought one level deeper.&lt;/p&gt;

&lt;p&gt;These are the 5 places that gap cost me. Each one explained from scratch — with real code and real failure modes — so the follow-up question doesn't catch you off guard.&lt;/p&gt;


&lt;h2&gt;
  
  
  1. Everyone Adds a Cache. Almost Nobody Thinks About What Comes Next.
&lt;/h2&gt;

&lt;p&gt;Your phone saves images from apps you visit so it doesn't re-download them every time. That's a cache — a faster copy of data, closer to the user.&lt;/p&gt;

&lt;p&gt;In backend systems, instead of hitting a database on every request, you store hot data in something like &lt;strong&gt;Redis&lt;/strong&gt; — an in-memory store that responds in under a millisecond. Instagram uses it to serve your profile. Reddit uses it to serve hot posts to millions of readers simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A well-placed cache can absorb the bulk of read traffic, often 80–95% on read-heavy workloads.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's how cache-aside — the most common pattern — looks in code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 1: Check cache first
&lt;/span&gt;    &lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# Cache hit ✓
&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 2: Cache miss — go to the database
&lt;/span&gt;    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_one&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Store result for next time (expires in 1 hour)
&lt;/span&gt;    &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;

&lt;span class="c1"&gt;# ⚠️ The hidden danger: what if the user updates their profile?
# If you forget: redis.delete(f"user:{user_id}")
# ...they'll see stale data for up to an hour.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxrwnm6fqo1v8y9qe0y0j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxrwnm6fqo1v8y9qe0y0j.png" alt="Cache write strategies: Cache-Aside, Write-Through, and Write-Behind compared" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Three strategies — each is right in one situation and quietly catastrophic in the wrong one.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I once watched a team spend three days debugging incorrect pricing at checkout. The cache populated correctly on product creation — but silently failed to invalidate when the price &lt;em&gt;changed&lt;/em&gt;. Wrong prices served for six weeks. Code looked fine. Tests passed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A cache without an invalidation strategy isn't a performance win. It's a time bomb with a clean interface.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💬 &lt;strong&gt;Interviewer follow-up you must answer:&lt;/strong&gt; &lt;em&gt;"How does the cache know when to invalidate?"&lt;/em&gt; Have that answer ready before they ask.&lt;/p&gt;
&lt;/blockquote&gt;
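
&lt;p&gt;One way to answer that follow-up is to invalidate on write: update the database first, then delete the cached key so the next read repopulates it. The sketch below is a minimal illustration, not a production pattern. &lt;code&gt;FakeCache&lt;/code&gt;, &lt;code&gt;DB&lt;/code&gt;, and &lt;code&gt;update_user&lt;/code&gt; are hypothetical in-memory stand-ins for Redis and a database, chosen so the example runs on its own.&lt;/p&gt;

```python
import json

# Hypothetical in-memory stand-ins for Redis and a database (assumptions for illustration).
class FakeCache:
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value):
        self.store[key] = value

    def delete(self, key):
        self.store.pop(key, None)

cache = FakeCache()
DB = {"42": {"name": "Ada", "city": "London"}}

def get_user(user_id: str):
    """Cache-aside read: try the cache, fall back to the database."""
    cached = cache.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)
    user = DB.get(user_id)
    if user is not None:
        cache.set(f"user:{user_id}", json.dumps(user))
    return user

def update_user(user_id: str, fields: dict):
    """Write path: update the source of truth, then drop the cached copy."""
    DB[user_id] = {**DB[user_id], **fields}
    cache.delete(f"user:{user_id}")  # the line that is easy to forget

get_user("42")                        # warms the cache
update_user("42", {"city": "Paris"})
fresh = get_user("42")                # cache miss, reads the updated row
```

&lt;p&gt;Deleting the key rather than rewriting it keeps the write path simple: the next read rebuilds the entry from the database. A TTL is still worth keeping as a safety net for any write path that forgets the delete.&lt;/p&gt;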


&lt;h2&gt;
  
  
  2. Sharding: Impressive on a Whiteboard, Painful in Production
&lt;/h2&gt;

&lt;p&gt;I once watched a candidate spend 25 minutes designing a sharding strategy for a system serving &lt;strong&gt;3,000 daily users&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Custom shard keys. Cross-shard routing. Resharding logic. The interviewer stopped him mid-sentence.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Why are you sharding this?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;He didn't have an answer. He didn't get the job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Over-engineering isn't ambition. It's anxiety wearing the mask of thoroughness.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sharding splits your database across multiple servers — each one owns a slice of the data. It's real and powerful. It's also genuinely hard: joins across shards become painful, transactions need distributed coordination, and debugging requires knowing which shard holds your data.&lt;/p&gt;

&lt;p&gt;The right order before you even think about sharding:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3ogblh32mu6jyba8t5g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3ogblh32mu6jyba8t5g.png" alt="Scale ladder showing progression from single DB through read replicas and caching to sharding" width="800" height="1131"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Exhaust every step before moving to the next. Most systems never need to go past step 2.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;zlib&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="c1"&gt;# ❌ Bad shard key: bucketing by creation day creates a "hot shard"
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_shard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toordinal&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;NUM_SHARDS&lt;/span&gt;
    &lt;span class="c1"&gt;# All of today's writes hit the same shard. Others sit idle.
&lt;/span&gt;
&lt;span class="c1"&gt;# ✅ Good shard key: user_id distributes evenly
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_shard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;zlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;crc32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;NUM_SHARDS&lt;/span&gt;
    &lt;span class="c1"&gt;# crc32 is stable across processes; built-in hash() of a str is salted per run.
    # Load spreads evenly. No single shard gets hammered.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
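
&lt;p&gt;To see the difference in numbers, here is a small simulation, offered as a sketch under assumptions rather than a benchmark. The shard count, the 1,000 synthetic writes, and the helper names &lt;code&gt;shard_by_day&lt;/code&gt; and &lt;code&gt;shard_by_user&lt;/code&gt; are all made up for the demo.&lt;/p&gt;

```python
import zlib
from collections import Counter
from datetime import datetime, timedelta

NUM_SHARDS = 4  # assumed shard count for the demo

def shard_by_day(created_at: datetime) -> int:
    # Time-bucketed key: every row created on the same day lands together.
    return created_at.toordinal() % NUM_SHARDS

def shard_by_user(user_id: str) -> int:
    # crc32 is stable across processes, unlike Python's salted hash() for str.
    return zlib.crc32(user_id.encode()) % NUM_SHARDS

# Simulate 1,000 writes from 1,000 different users within a single day.
start = datetime(2026, 1, 1)
writes = [(f"user_{i}", start + timedelta(seconds=60 * i)) for i in range(1000)]

by_day = Counter(shard_by_day(ts) for _, ts in writes)
by_user = Counter(shard_by_user(uid) for uid, _ in writes)

print("by day: ", dict(by_day))    # a single shard takes all 1,000 writes
print("by user:", dict(by_user))   # the same writes spread over several shards
```

&lt;p&gt;The day-based key sends all of the day's traffic to one shard, and tomorrow the hot spot simply moves to the next one. The user-based key spreads the same writes out because user IDs are unrelated to arrival time.&lt;/p&gt;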



&lt;p&gt;Instagram ran on a single &lt;strong&gt;Postgres&lt;/strong&gt; instance far longer than most people realize. They only sharded when simpler options genuinely couldn't keep up.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💬 &lt;strong&gt;The answer that lands:&lt;/strong&gt; Not &lt;em&gt;"here's my sharding strategy."&lt;/em&gt; But &lt;em&gt;"here's why I'd exhaust vertical scaling, read replicas, and caching first — and here's the signal that would tell me it's time."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  3. Why Adding One Server Can Break Your Entire Cache
&lt;/h2&gt;

&lt;p&gt;This one surprises people. Let's see it in code first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Naive modulo — breaks the moment you add a server
&lt;/span&gt;&lt;span class="n"&gt;servers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;server_1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;server_2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;server_3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;servers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;servers&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="nf"&gt;get_server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# → e.g. "server_2"
&lt;/span&gt;
&lt;span class="c1"&gt;# You add a 4th server to handle more load...
&lt;/span&gt;&lt;span class="n"&gt;servers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;server_4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;get_server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# → e.g. "server_1"  ← probably a different server!
&lt;/span&gt;
&lt;span class="c1"&gt;# Going from 3 to 4 servers remaps about 3 in 4 keys.
# Most of your cache just went cold. Enjoy the database stampede.
# (Python salts hash() for strings per process, so real routing needs a stable hash.)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a real problem at scale. Adding one server to a cache cluster can remap most of your cached keys at once, so most requests miss the cache and hit the database simultaneously. That's an outage, not a scaling win.&lt;/p&gt;
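
&lt;p&gt;How many keys actually move is easy to check. The sketch below uses plain integers in place of hash values, an assumption made only to keep the result deterministic, and &lt;code&gt;moved_fraction&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
def moved_fraction(num_keys: int, old_n: int, new_n: int) -> float:
    """Fraction of keys whose server index changes under hash % N."""
    # Integer keys stand in for hash values to keep the demo deterministic.
    moved = sum(1 for h in range(num_keys) if h % old_n != h % new_n)
    return moved / num_keys

fraction = moved_fraction(12_000, old_n=3, new_n=4)
print(f"{fraction:.0%} of keys moved")  # prints "75% of keys moved"
```

&lt;p&gt;A key stays put only when its hash gives the same remainder for both 3 and 4, which happens for 3 out of every 12 consecutive values. The other 9 out of 12 land on a different server.&lt;/p&gt;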

&lt;p&gt;&lt;strong&gt;Consistent hashing&lt;/strong&gt; is the fix. Instead of &lt;code&gt;key % N&lt;/code&gt;, both servers and keys are mapped onto a circular ring. Each key belongs to the nearest server clockwise.&lt;/p&gt;

&lt;p&gt;Add a server? It takes only the keys between itself and its neighbor — roughly 1/N of data. Everything else stays put.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb55yz2czj2rjefpevhqp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb55yz2czj2rjefpevhqp.png" alt="Consistent hashing ring showing Node A, B, C with new Node D being inserted and only a small arc of keys remapped" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Adding one node displaces ~1/N keys. With naive modulo hashing, you'd be moving almost everything.&lt;/em&gt;&lt;/p&gt;
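
&lt;p&gt;The ring can be sketched in a few lines. This is a minimal illustration under assumptions, not how any particular database implements it: &lt;code&gt;HashRing&lt;/code&gt; and &lt;code&gt;stable_hash&lt;/code&gt; are hypothetical names, SHA-256 is used only as a convenient stable hash, and each server is placed on the ring at 100 "virtual" points, a common refinement that evens out the load.&lt;/p&gt;

```python
import bisect
import hashlib

def stable_hash(value: str) -> int:
    # SHA-256 gives a stable, well-spread integer; it is not used for security here.
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self.positions = []  # sorted ring positions
        self.owners = []     # owners[i] is the node sitting at positions[i]
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Place the node at several virtual points so its share of keys evens out.
        for i in range(self.vnodes):
            pos = stable_hash(f"{node}#{i}")
            idx = bisect.bisect(self.positions, pos)
            self.positions.insert(idx, pos)
            self.owners.insert(idx, node)

    def get(self, key: str) -> str:
        # First point clockwise from the key's position, wrapping past the end.
        idx = bisect.bisect(self.positions, stable_hash(key)) % len(self.positions)
        return self.owners[idx]

keys = [f"user_{i}" for i in range(10_000)]
ring = HashRing(["server_1", "server_2", "server_3"])
before = {k: ring.get(k) for k in keys}

ring.add("server_4")
after = {k: ring.get(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")  # expect roughly a quarter
```

&lt;p&gt;Every key that moves ends up on the new server, and keys owned by the other three servers never shuffle among themselves. That is the property modulo hashing lacks.&lt;/p&gt;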

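&lt;p&gt;Cassandra and DynamoDB use consistent hashing under the hood. Redis Cluster solves the same problem with a related scheme: 16,384 fixed hash slots that are reassigned between nodes. Akamai, one of the world's largest CDNs, was built on consistent hashing, and its founders co-authored the original academic paper on it.&lt;/p&gt;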

&lt;blockquote&gt;
&lt;p&gt;💬 &lt;strong&gt;Why this matters in an interview:&lt;/strong&gt; Most candidates skip this. Knowing &lt;em&gt;why&lt;/em&gt; consistent hashing exists — not just what it is — signals you've thought seriously about distributed systems.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  4. CAP Theorem Is Taught Wrong. Here's What It Actually Means.
&lt;/h2&gt;

&lt;p&gt;I once confidently explained CAP theorem in an interview. The interviewer looked up and asked:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"So you'd consider building a system that ignores partition tolerance?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I had nothing. The conversation got uncomfortable fast.&lt;/p&gt;

&lt;p&gt;Here's the problem: &lt;strong&gt;CAP is almost always taught as "pick any two." That's misleading.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Partition tolerance — the ability to keep working when servers can't talk to each other — isn't optional. Networks fail. It will happen to your system. So the real choice is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When a network partition occurs, do you prioritize consistency or availability?&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;CP (Consistency first)&lt;/th&gt;
&lt;th&gt;AP (Availability first)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Behaviour&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Refuses requests until partition heals&lt;/td&gt;
&lt;td&gt;Keeps responding, may return stale data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Risk&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Downtime during failures&lt;/td&gt;
&lt;td&gt;Stale reads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use when&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Payments, inventory, anything financial&lt;/td&gt;
&lt;td&gt;Social feeds, recommendations, analytics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Examples&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Postgres, CockroachDB, ZooKeeper&lt;/td&gt;
&lt;td&gt;Cassandra, DynamoDB, CouchDB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Facebook chose AP for their social graph — a slightly stale follower count beats an app that won't load. A payments system needs CP — you'd rather reject a transaction than double-charge someone.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💬 &lt;strong&gt;The move:&lt;/strong&gt; Don't just define CAP — apply it. &lt;em&gt;"This is a payments system, so I'd use Postgres with synchronous replication. I'd rather reject a write during a network failure than risk charging someone twice."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
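
&lt;p&gt;The trade-off can be made concrete with a toy model. The sketch below is an illustration under assumptions, not a real replication protocol: &lt;code&gt;Replica&lt;/code&gt;, &lt;code&gt;Unavailable&lt;/code&gt;, and &lt;code&gt;write&lt;/code&gt; are hypothetical names, and a partition is simulated by flagging replicas as unreachable.&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    reachable: bool = True
    data: dict = field(default_factory=dict)

class Unavailable(Exception):
    """Raised when a CP write cannot reach a majority of replicas."""

def write(replicas, key, value, mode: str) -> int:
    """Write to every reachable replica; return how many acknowledged."""
    up = [r for r in replicas if r.reachable]
    majority = len(replicas) // 2 + 1
    if mode == "CP" and majority > len(up):
        # Consistency first: refuse rather than let replicas diverge.
        raise Unavailable(f"only {len(up)} of {len(replicas)} replicas reachable")
    for r in up:
        # Reached replicas apply the write; unreachable ones stay stale
        # until the partition heals.
        r.data[key] = value
    return len(up)

# Simulated partition: this client can only reach replica "a".
replicas = [Replica("a"), Replica("b", reachable=False), Replica("c", reachable=False)]

acks = write(replicas, "balance", 100, mode="AP")  # accepted by one replica
try:
    write(replicas, "balance", 90, mode="CP")
    cp_result = "accepted"
except Unavailable:
    cp_result = "rejected"                          # an error instead of divergent data
```

&lt;p&gt;In AP mode the write succeeds, but replicas b and c will serve old data until they catch up. In CP mode the same partition produces an error the client has to handle, which is the behaviour you want for a balance or a charge.&lt;/p&gt;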


&lt;h2&gt;
  
  
  5. Message Queues Don't Guarantee What You Think They Guarantee
&lt;/h2&gt;

&lt;p&gt;Picture a restaurant on a Friday night. Orders are flying in faster than the kitchen can handle. If every waiter walked directly to a chef and demanded immediate attention, the kitchen collapses.&lt;/p&gt;

&lt;p&gt;Instead: orders go on a ticket rail. Chefs work through them steadily. The kitchen stays calm no matter how busy the front gets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's a message queue.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Uber's trip events flow through Kafka. Netflix triggers encoding jobs through queues. Slack's notification pipeline is async. The pattern is everywhere.&lt;/p&gt;

&lt;p&gt;Most candidates in interviews draw a queue, say "this decouples the services," and move on. That's not wrong. But here's what nobody mentions until they're paged at 3am:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Most queues guarantee at-least-once delivery. Not exactly-once.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your consumer will sometimes process the same message twice — a network timeout triggers a retry, or a crash causes redelivery. Without protection, a user gets charged twice, an email sends twice, a report generates twice.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_payment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;payment_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payment_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# ✅ Idempotency check — safe to receive this twice
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;payment_already_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payment_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Already processed &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;payment_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Skipping.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="c1"&gt;# Process only if we haven't seen this before
&lt;/span&gt;    &lt;span class="n"&gt;stripe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;charge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;card_token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mark_payment_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payment_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

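&lt;span class="c1"&gt;# Without this: duplicate charge on retry.
# With this: a redelivery after completion is a no-op. ✓
# Caveat: a crash between the charge and mark_payment_complete can still
# double-charge, so also send payment_id to the provider as an idempotency key.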
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60r8iljzmun6vpp3bjn1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60r8iljzmun6vpp3bjn1.png" alt="Queue vs Pub/Sub: Queue routes one message to one consumer, Pub/Sub fans out one event to multiple consumers" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Queue delivers to one. Pub/Sub delivers to all. Get this wrong and you'll either starve consumers or duplicate work across all of them.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The property that saves you is called &lt;strong&gt;idempotency&lt;/strong&gt;: running an operation twice produces the same result as running it once. Stripe's API supports idempotency keys for exactly this reason. Every queue-based incident I've seen had idempotency missing somewhere.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💬 &lt;strong&gt;The line that stands out:&lt;/strong&gt; &lt;em&gt;"Consumers will check for duplicate message IDs before processing — queues guarantee at-least-once delivery, not exactly-once."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
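
&lt;p&gt;A check-then-act test like "have I seen this ID?" can still race when two consumers receive the same message at once. One way to close that gap, sketched here under assumptions, is to let a unique constraint do the check atomically. The example uses Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module; the &lt;code&gt;processed&lt;/code&gt; table and &lt;code&gt;charge_card&lt;/code&gt; are hypothetical, with &lt;code&gt;charge_card&lt;/code&gt; standing in for a real payment call.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (payment_id TEXT PRIMARY KEY)")

charges = []  # records calls that would go to a payment provider

def charge_card(amount: int, card_token: str) -> None:
    # Hypothetical placeholder for a real payment API call.
    charges.append((amount, card_token))

def process_payment(message: dict) -> bool:
    """Return True if the payment was processed, False if it was a duplicate."""
    with conn:  # commits on success, rolls back if charge_card raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed (payment_id) VALUES (?)",
            (message["payment_id"],),
        )
        if cur.rowcount == 0:
            return False  # the primary key already exists: duplicate delivery
        charge_card(message["amount"], message["card_token"])
    return True

msg = {"payment_id": "pay_1", "amount": 500, "card_token": "tok_test"}
first = process_payment(msg)    # processes the payment
second = process_payment(msg)   # redelivery is skipped
```

&lt;p&gt;The insert either claims the payment ID or reports that someone else already did, so there is no window between checking and acting. If the charge raises, the claim is rolled back and the message can be retried. A crash after the provider has charged but before the commit is still possible, which is why passing the same ID to the provider as an idempotency key is worth doing as well.&lt;/p&gt;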




&lt;h2&gt;
  
  
  The Real Interview Starts Before You Pick Up the Pen
&lt;/h2&gt;

&lt;p&gt;Every concept here has a surface answer and a real answer.&lt;/p&gt;

&lt;p&gt;Surface answers get you through the definition check. Real answers — knowing that cache invalidation is harder than caching, that sharding is a last resort, that partition tolerance isn't optional, that queues will deliver your message twice — that's what separates candidates who studied from engineers who've built and broken these systems.&lt;/p&gt;

&lt;p&gt;The candidate who got the interviewer to say &lt;em&gt;"exactly right"&lt;/em&gt; out loud didn't know more patterns than I did. He asked one question before drawing a single box:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"What scale are we actually targeting here?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That question. Every time. Before the pen touches the whiteboard.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this useful? I write about system design, engineering interviews, and real production systems. Follow along — more coming.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>webdev</category>
      <category>career</category>
      <category>interview</category>
    </item>
  </channel>
</rss>
