<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rachit Misra</title>
    <description>The latest articles on DEV Community by Rachit Misra (@hungryformore).</description>
    <link>https://dev.to/hungryformore</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3852556%2F81a358ca-0516-4761-a62f-e462e280b523.jpg</url>
      <title>DEV Community: Rachit Misra</title>
      <link>https://dev.to/hungryformore</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hungryformore"/>
    <language>en</language>
    <item>
      <title>Designing Instagram at Scale: A Complete System Design Deep Dive</title>
      <dc:creator>Rachit Misra</dc:creator>
      <pubDate>Fri, 03 Apr 2026 03:59:26 +0000</pubDate>
      <link>https://dev.to/hungryformore/designing-instagram-at-scale-a-complete-system-design-deep-dive-49ef</link>
      <guid>https://dev.to/hungryformore/designing-instagram-at-scale-a-complete-system-design-deep-dive-49ef</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;From a ₹800/month server to 500M daily users — every component, every trade-off, every edge case.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Why Instagram is a perfect system design problem&lt;/li&gt;
&lt;li&gt;The numbers that define the problem&lt;/li&gt;
&lt;li&gt;The scaling journey — Stage by Stage&lt;/li&gt;
&lt;li&gt;Component Deep Dive: Feed Generation&lt;/li&gt;
&lt;li&gt;Component Deep Dive: Stories &amp;amp; Expiry&lt;/li&gt;
&lt;li&gt;Component Deep Dive: Media Upload &amp;amp; CDN&lt;/li&gt;
&lt;li&gt;Component Deep Dive: Notifications&lt;/li&gt;
&lt;li&gt;Component Deep Dive: Search &amp;amp; Discovery&lt;/li&gt;
&lt;li&gt;Component Deep Dive: Likes &amp;amp; Comments&lt;/li&gt;
&lt;li&gt;Database Design — Every Decision Justified&lt;/li&gt;
&lt;li&gt;API Design — Full Contracts&lt;/li&gt;
&lt;li&gt;Edge Cases Nobody Draws on Their Diagram&lt;/li&gt;
&lt;li&gt;Key Trade-offs Summary&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  1. Why Instagram is a Perfect System Design Problem
&lt;/h2&gt;

&lt;p&gt;Instagram sits at the intersection of every hard distributed systems problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Read-heavy&lt;/strong&gt; (people scroll more than they post)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write-heavy at peaks&lt;/strong&gt; (52,000 likes per second)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Media-intensive&lt;/strong&gt; (photos, videos, reels, stories)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time&lt;/strong&gt; (stories expire, feeds update, notifications land)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Socially connected&lt;/strong&gt; (the graph makes everything harder)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Globally distributed&lt;/strong&gt; (500M users across every timezone)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s not one hard problem. It’s six hard problems running simultaneously, sharing infrastructure, with users who notice every hiccup.&lt;/p&gt;

&lt;p&gt;This is why it appears in almost every senior system design interview. And why most candidates fail it — not because they don’t know the components, but because they don’t know &lt;em&gt;why&lt;/em&gt; each component exists and what breaks without it.&lt;/p&gt;

&lt;p&gt;This article covers everything. By the end you’ll be able to design Instagram from first principles, justify every decision, and handle every curveball an interviewer can throw.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. The Numbers That Define the Problem
&lt;/h2&gt;

&lt;p&gt;Before writing a single box on your architecture diagram, establish the scale. This isn’t optional ceremony — it determines every design decision you make.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User scale:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2B registered users&lt;/li&gt;
&lt;li&gt;500M Daily Active Users (DAU)&lt;/li&gt;
&lt;li&gt;Peak concurrent users: ~50M&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Content scale:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100M photos/videos uploaded per day → &lt;strong&gt;~1,150 uploads/second&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;500M stories created per day&lt;/li&gt;
&lt;li&gt;4.5B likes per day → &lt;strong&gt;~52,000 likes/second&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;100M comments per day → ~1,150 comments/second&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Read scale:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each user opens the app ~7x/day&lt;/li&gt;
&lt;li&gt;3.5B feed loads/day → &lt;strong&gt;~40,000 feed requests/second&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Feed load is your most expensive operation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Storage scale:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Average photo: 3MB (after compression)&lt;/li&gt;
&lt;li&gt;100M photos/day × 3MB × 3 sizes = ~900TB new storage per day&lt;/li&gt;
&lt;li&gt;Video and reels multiply this significantly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Derived constraints:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read:Write ratio ≈ 80:20 (mostly read)&lt;/li&gt;
&lt;li&gt;Feed generation is the critical path&lt;/li&gt;
&lt;li&gt;Like storage needs write-optimised infrastructure&lt;/li&gt;
&lt;li&gt;Media storage needs a CDN — serving from origin is impossible at this scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now you can design. Everything flows from these numbers.&lt;/p&gt;
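&lt;p&gt;The derived rates are simple division and worth sanity-checking yourself. A minimal Python sketch of the arithmetic, using only the estimates above:&lt;/p&gt;

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(daily_total):
    """Average per-second rate for a daily event count."""
    return daily_total / SECONDS_PER_DAY

uploads_ps = per_second(100_000_000)          # 100M uploads/day
likes_ps = per_second(4_500_000_000)          # 4.5B likes/day
feed_loads_ps = per_second(500_000_000 * 7)   # 500M DAU x ~7 opens/day

# Daily media storage: 100M photos x 3MB x 3 rendered sizes, in TB
storage_tb_per_day = 100_000_000 * 3 * 3 / 1_000_000

print(round(uploads_ps))     # 1157  -> "~1,150 uploads/second"
print(round(likes_ps))       # 52083 -> "~52,000 likes/second"
print(round(feed_loads_ps))  # 40509 -> "~40,000 feed requests/second"
print(storage_tb_per_day)    # 900.0 -> "~900TB new storage per day"
```

&lt;p&gt;Peaks run well above these averages, so capacity planning multiplies them by a peak factor, but the averages are what anchor the design.&lt;/p&gt;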

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1uoabdrsawgigy79p3rb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1uoabdrsawgigy79p3rb.png" alt=" " width="800" height="1006"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  3. The Scaling Journey — Stage by Stage
&lt;/h2&gt;

&lt;p&gt;The biggest mistake in system design interviews is jumping straight to the 500M DAU architecture. Real systems don’t start there. Understanding the journey is what separates a junior answer from a senior one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 1 — 1K DAU: Ship Fast
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure:&lt;/strong&gt; Single server, single PostgreSQL instance, S3 for photos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What works:&lt;/strong&gt; Everything. At 1K users, you have no scaling problems. Your only job is shipping features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What breaks first:&lt;/strong&gt; PostgreSQL connection limits. The default &lt;code&gt;max_connections&lt;/code&gt; is 100. At ~80 concurrent connections hitting the DB, you start seeing &lt;code&gt;sorry, too many clients already&lt;/code&gt; errors. Fix: &lt;strong&gt;PgBouncer&lt;/strong&gt; for connection pooling. Trade-off: one more component to operate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client → Single Server (App + Postgres + PgBouncer) → S3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Stage 2 — 100K DAU: The First Real Pain
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What breaks:&lt;/strong&gt; Feed queries. &lt;code&gt;SELECT * FROM posts WHERE user_id IN (list_of_500_followings) ORDER BY created_at DESC LIMIT 10&lt;/code&gt; becomes a slow scan as posts accumulate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fixes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Redis&lt;/strong&gt; for pre-computed feeds (cache-aside pattern, TTL 10 min)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read replicas&lt;/strong&gt; so reads don’t compete with writes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CDN (CloudFront)&lt;/strong&gt; in front of S3 — stop serving media from origin&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Story expiry cron&lt;/strong&gt; — a job every 15 minutes that marks expired stories as deleted&lt;/li&gt;
&lt;/ul&gt;
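&lt;p&gt;The cache-aside feed read above fits in a few lines. This is an illustrative Python sketch: a dict stands in for Redis, and &lt;code&gt;fetch_feed_from_db&lt;/code&gt; is a hypothetical placeholder for the replica query:&lt;/p&gt;

```python
import time

FEED_TTL_SECONDS = 600  # the 10-minute TTL described above
_cache = {}             # key -> (expires_at, feed); stands in for Redis

def fetch_feed_from_db(user_id):
    # Placeholder for the read-replica query (hypothetical helper).
    return [f"post_{user_id}_{i}" for i in range(10)]

def get_feed(user_id):
    key = f"feed:{user_id}"
    entry = _cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]                                   # cache hit
    feed = fetch_feed_from_db(user_id)                    # miss: hit the DB
    _cache[key] = (time.time() + FEED_TTL_SECONDS, feed)  # repopulate
    return feed
```

&lt;p&gt;The invalidation question in the next list is exactly the weak spot of this pattern: the cache only refreshes on expiry or explicit delete.&lt;/p&gt;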

&lt;p&gt;&lt;strong&gt;New problems introduced:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cache invalidation: whose feed do you invalidate when someone posts?&lt;/li&gt;
&lt;li&gt;Read replica lag: users might briefly see stale data (eventual consistency)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client → Load Balancer → App Servers → {Postgres Primary, Redis, S3+CDN}
                                     ↓
                              Postgres Read Replicas
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Stage 3 — 10M DAU: Real Distributed Systems
&lt;/h3&gt;

&lt;p&gt;This is the interesting stage. Three things break simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What breaks:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Monolith deployment slows down feature development — team coordination hell&lt;/li&gt;
&lt;li&gt;Like/comment write throughput saturates PostgreSQL&lt;/li&gt;
&lt;li&gt;Text search with &lt;code&gt;LIKE&lt;/code&gt; queries is unusably slow&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Fixes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Split into &lt;strong&gt;microservices&lt;/strong&gt; (User, Post, Feed, Comment, Notification, Search)&lt;/li&gt;
&lt;li&gt;Introduce &lt;strong&gt;Kafka&lt;/strong&gt; as the event backbone — services stop calling each other synchronously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cassandra&lt;/strong&gt; for likes and comments (write-optimised, no transactions needed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elasticsearch&lt;/strong&gt; for search, hashtags, and explore&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client → API Gateway → Microservices → Kafka → Consumers
                              ↓
              {Postgres, Redis, Cassandra, Elasticsearch, S3+CDN}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Stage 4 — 500M DAU: Planetary Scale
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What changes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Geo-distribution: data centres in US, EU, Asia-Pacific&lt;/li&gt;
&lt;li&gt;ML-powered feed ranking replaces chronological ordering&lt;/li&gt;
&lt;li&gt;Sharding Postgres by user_id across multiple instances&lt;/li&gt;
&lt;li&gt;Cassandra runs as a multi-region cluster&lt;/li&gt;
&lt;li&gt;Kafka handles millions of events per second with consumer groups&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  4. Component Deep Dive: Feed Generation
&lt;/h2&gt;

&lt;p&gt;Feed generation is the hardest problem in the Instagram system design. Get this wrong and every other component is irrelevant.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Core Question: Push vs Pull
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Fan-out on Write (Push):&lt;/strong&gt;&lt;br&gt;
When a user posts, immediately write that post to every follower’s feed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Feed reads are O(1) — just read the pre-computed list&lt;/li&gt;
&lt;li&gt;❌ Write amplification: 1 post × 10,000 followers = 10,000 writes&lt;/li&gt;
&lt;li&gt;❌ Catastrophic for celebrities (Ronaldo posting = 600M writes)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Fan-out on Read (Pull):&lt;/strong&gt;&lt;br&gt;
When a user opens their feed, fetch posts from everyone they follow in real-time.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ No write amplification — posts are written once&lt;/li&gt;
&lt;li&gt;❌ Read is expensive: fetch from 500 followings, merge, sort, rank&lt;/li&gt;
&lt;li&gt;❌ Slow for power users with many followings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Instagram’s Solution: Hybrid Fan-out&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regular users (&amp;lt; 1M followers): &lt;strong&gt;push model&lt;/strong&gt; — fan-out on write to their followers’ feeds&lt;/li&gt;
&lt;li&gt;Celebrity users (&amp;gt; 1M followers): &lt;strong&gt;pull model&lt;/strong&gt; — merge their latest posts at read time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On feed load:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read pre-computed feed from Redis ZSET (sorted by ML ranking score)&lt;/li&gt;
&lt;li&gt;For any celebrity accounts the user follows, fetch their latest posts&lt;/li&gt;
&lt;li&gt;Merge, re-rank, serve&lt;/li&gt;
&lt;/ol&gt;
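&lt;p&gt;The three-step read path can be sketched in plain Python, with in-memory stand-ins for the Redis ZSET and the celebrity post store, and a score sort standing in for the ML re-rank:&lt;/p&gt;

```python
# Hedged sketch of the hybrid read path; all stores are dict stand-ins.
precomputed_feed = {1: [("post_a", 0.91), ("post_b", 0.75)]}  # feed:{user_id} ZSET
celebrity_posts = {"ronaldo": [("post_c", 0.88)]}             # pulled at read time
follows_celebrities = {1: ["ronaldo"]}

def load_feed(user_id, limit=10):
    candidates = list(precomputed_feed.get(user_id, []))   # step 1: push entries
    for celeb in follows_celebrities.get(user_id, []):     # step 2: pull celebs
        candidates.extend(celebrity_posts.get(celeb, []))
    candidates.sort(key=lambda c: c[1], reverse=True)      # step 3: re-rank
    return [post_id for post_id, _ in candidates[:limit]]

print(load_feed(1))  # ['post_a', 'post_c', 'post_b']
```

&lt;p&gt;In production the first step is a &lt;code&gt;ZREVRANGE&lt;/code&gt; over &lt;code&gt;feed:{user_id}&lt;/code&gt; and the second a timeline read per celebrity, but the merge shape is the same.&lt;/p&gt;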

&lt;p&gt;&lt;strong&gt;Feed Storage in Redis:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;feed:{user_id}&lt;/span&gt;
&lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ZSET (sorted set)&lt;/span&gt;
&lt;span class="na"&gt;Score&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ML ranking score (not timestamp)&lt;/span&gt;
&lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;post_id&lt;/span&gt;
&lt;span class="na"&gt;TTL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10 minutes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On cache miss → fall back to Cassandra user_timeline table → re-rank → re-cache.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ML Ranking Signals:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recency (newer posts scored higher)&lt;/li&gt;
&lt;li&gt;Relationship strength (how often you interact with this account)&lt;/li&gt;
&lt;li&gt;Post engagement velocity (likes/comments in first hour)&lt;/li&gt;
&lt;li&gt;Content type preference (video vs photo history)&lt;/li&gt;
&lt;li&gt;Session context (what you’ve engaged with this session)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Feed Edge Cases
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Offline user returning after 2 weeks:&lt;/strong&gt;&lt;br&gt;
Don’t backfill 14 days of fan-out events. Their feed cache is cold and stale. Generate fresh on first open from Cassandra. Accept that the first load is slightly slower.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User unfollows someone mid-request:&lt;/strong&gt;&lt;br&gt;
Eventual consistency means you might briefly surface one post from an unfollowed account. Don’t try to prevent this at the storage layer — the complexity isn’t worth it. Filter at the display layer if it’s a concern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deleted post in cached feed:&lt;/strong&gt;&lt;br&gt;
Store &lt;code&gt;is_deleted&lt;/code&gt; flag. Check at serve time. Never serve deleted content from cache regardless of what the feed list says.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;New user with zero followings (cold start):&lt;/strong&gt;&lt;br&gt;
Show explore/trending content until they follow enough accounts for a meaningful feed.&lt;/p&gt;


&lt;h2&gt;
  
  
  5. Component Deep Dive: Stories &amp;amp; Expiry
&lt;/h2&gt;

&lt;p&gt;Stories feel deceptively simple — post a photo, it disappears after 24 hours. The distributed expiry pipeline behind this is non-trivial.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Storage Architecture
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Story metadata  → PostgreSQL (story_id, user_id, expires_at, is_deleted)
Story TTL       → Redis SET (key: story:{user_id}, TTL: 24h)
Story media     → S3 (deleted async after expiry)
Story views     → Redis SET (key: viewed:{user_id}:{story_id}) + async counter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  The Expiry Pipeline
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Story uploaded → &lt;code&gt;expires_at = NOW() + 24h&lt;/code&gt; written to Postgres&lt;/li&gt;
&lt;li&gt;Redis key set with matching TTL&lt;/li&gt;
&lt;li&gt;On Redis TTL expiry (surfaced via keyspace notifications, which are best-effort) → Kafka &lt;code&gt;story.expired&lt;/code&gt; event published&lt;/li&gt;
&lt;li&gt;Kafka consumer: soft-delete in Postgres (&lt;code&gt;is_deleted = true&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Kafka consumer: issue S3 delete for the media file&lt;/li&gt;
&lt;li&gt;Kafka consumer: invalidate CDN cache for the media URL&lt;/li&gt;
&lt;/ol&gt;
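&lt;p&gt;Steps 4, 5 and 6 fold into one idempotent handler. A hedged Python sketch, with dicts standing in for the Postgres row, the S3 bucket, and the CDN client:&lt;/p&gt;

```python
# In-memory stand-ins for the real stores.
stories = {"s1": {"is_deleted": False, "media_key": "u1/s1.jpg"}}
s3_objects = {"u1/s1.jpg": b"..."}
cdn_invalidated = []

def handle_story_expired(story_id):
    story = stories.get(story_id)
    if story is None or story["is_deleted"]:
        return  # idempotent: a re-delivered Kafka event is a no-op
    story["is_deleted"] = True                 # step 4: soft-delete in Postgres
    s3_objects.pop(story["media_key"], None)   # step 5: delete media from S3
    cdn_invalidated.append(story["media_key"]) # step 6: invalidate the CDN path
```

&lt;p&gt;The idempotency guard is what lets the reconciliation cron below replay the same cleanup safely.&lt;/p&gt;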

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; What if the Kafka consumer is down when the TTL fires?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Reconciliation cron job running every 15 minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;story_id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;stories&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;expires_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;is_deleted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Anything this finds is cleaned up. Eventual deletion — not real-time. Acceptable for stories.&lt;/p&gt;

&lt;h3&gt;
  
  
  Story Feed
&lt;/h3&gt;

&lt;p&gt;When a user opens stories:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fetch user IDs they follow from Postgres (or Redis cache)&lt;/li&gt;
&lt;li&gt;For each, check if &lt;code&gt;story:{user_id}&lt;/code&gt; key exists in Redis&lt;/li&gt;
&lt;li&gt;Return story IDs sorted by recency&lt;/li&gt;
&lt;li&gt;Mark viewed: &lt;code&gt;SADD viewed:{viewer_id}:{story_id}&lt;/code&gt; in Redis (idempotent)&lt;/li&gt;
&lt;li&gt;Increment view count async (avoid hot write on every view)&lt;/li&gt;
&lt;/ol&gt;
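&lt;p&gt;The story-feed read above can be sketched like this (illustrative Python; dicts and a set stand in for the Redis keys and the follow graph):&lt;/p&gt;

```python
import time

active_stories = {"alice": ("story_9", time.time())}  # story:{user_id} -> (id, posted_at)
following = {"viewer_1": ["alice", "bob"]}            # bob has no active story
viewed = set()                                        # viewed:{viewer}:{story} keys

def open_stories(viewer_id):
    tray = []
    for author in following.get(viewer_id, []):
        entry = active_stories.get(author)        # step 2: key-exists check
        if entry:
            tray.append(entry)
    tray.sort(key=lambda e: e[1], reverse=True)   # step 3: newest first
    return [story_id for story_id, _ in tray]

def mark_viewed(viewer_id, story_id):
    viewed.add(f"viewed:{viewer_id}:{story_id}")  # step 4: idempotent, like SADD
```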




&lt;h2&gt;
  
  
  6. Component Deep Dive: Media Upload &amp;amp; CDN
&lt;/h2&gt;

&lt;p&gt;At 1,150 uploads per second, your servers cannot be in the media path. Every byte going through your application servers is wasted CPU and network.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pre-Signed S3 Upload Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Client → POST /v1/posts  { caption, media_type }
2. Server → generates pre-signed S3 URL (valid 15 min) + post_id
3. Server → returns { post_id, upload_url } to client
4. Client → PUT directly to S3 using upload_url
5. S3 → fires s3:ObjectCreated event
6. Lambda/consumer → publishes media.uploaded to Kafka
7. Kafka consumer → generates thumbnails, updates post status, triggers feed fan-out
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your servers touch zero bytes of media. They handle only metadata.&lt;/p&gt;
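&lt;p&gt;Step 2 can be sketched as below. This is not real AWS Signature V4; a production service would call an SDK helper such as S3&amp;#39;s pre-signed URL generator. The HMAC here only illustrates the shape (key path, expiry, signature), and the domain and secret are hypothetical:&lt;/p&gt;

```python
import hashlib
import hmac
import time
import uuid

SECRET = b"server-side-signing-key"  # hypothetical signing key

def create_upload_url(user_id, media_type, valid_seconds=900):
    """Mint a post_id plus a short-lived, signed upload URL (step 2)."""
    post_id = str(uuid.uuid4())
    expires = int(time.time()) + valid_seconds  # valid for 15 minutes
    key = f"{user_id}/{post_id}/original.{media_type}"
    sig = hmac.new(SECRET, f"{key}:{expires}".encode(), hashlib.sha256).hexdigest()
    token = f"{expires}.{sig}"  # single query param keeps the sketch simple
    return {"post_id": post_id,
            "upload_url": f"https://uploads.example.com/{key}?token={token}"}
```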

&lt;h3&gt;
  
  
  Media Sizes
&lt;/h3&gt;

&lt;p&gt;Every photo is stored in three serving sizes, plus the original:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Thumbnail:&lt;/strong&gt; 150×150px — profile grids, search results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feed:&lt;/strong&gt; 720px wide — home feed display&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full:&lt;/strong&gt; 1080px wide — post detail view&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Original:&lt;/strong&gt; preserved for potential future use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stored at: &lt;code&gt;s3://ig-media-{region}/{user_id}/{post_id}/{size}.jpg&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  CDN Strategy
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;CloudFront in front of all S3 buckets.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cache-Control headers: &lt;code&gt;max-age=31536000&lt;/code&gt; (1 year) for immutable media&lt;/li&gt;
&lt;li&gt;Edge locations serve 95%+ of media requests — origin never gets hit&lt;/li&gt;
&lt;li&gt;On post delete: CDN invalidation API call (small window of stale serving — acceptable)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The edge case:&lt;/strong&gt; Client successfully uploads to S3 but dies before confirming to your API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; S3 event notification independently triggers the Kafka event. Your Post Service confirms the upload without waiting for client confirmation. The client can poll &lt;code&gt;GET /v1/posts/:post_id&lt;/code&gt; to check status.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Component Deep Dive: Notifications
&lt;/h2&gt;

&lt;p&gt;Notifications are a fan-out problem dressed in a UX problem’s clothing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Notification Types &amp;amp; Channels
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Trigger&lt;/th&gt;
&lt;th&gt;Channel&lt;/th&gt;
&lt;th&gt;Latency Target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Like on your post&lt;/td&gt;
&lt;td&gt;Push (FCM/APNs) + In-app&lt;/td&gt;
&lt;td&gt;&amp;lt; 5 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Comment on your post&lt;/td&gt;
&lt;td&gt;Push + In-app&lt;/td&gt;
&lt;td&gt;&amp;lt; 5 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New follower&lt;/td&gt;
&lt;td&gt;Push + In-app&lt;/td&gt;
&lt;td&gt;&amp;lt; 10 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Story view&lt;/td&gt;
&lt;td&gt;In-app only&lt;/td&gt;
&lt;td&gt;&amp;lt; 30 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mention in caption&lt;/td&gt;
&lt;td&gt;Push + In-app&lt;/td&gt;
&lt;td&gt;&amp;lt; 5 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Action → Kafka event → Notification Service consumer
       → Enrich (fetch user prefs, device tokens, do-not-disturb)
       → Route (push? in-app? email? all?)
       → Send via FCM (Android) / APNs (iOS)
       → Store in notifications DB for in-app feed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Hard Edge Cases
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Notification storm — viral post:&lt;/strong&gt;&lt;br&gt;
A post gets 10M likes. Without batching, your Notification Service receives 10M &lt;code&gt;like.created&lt;/code&gt; events and tries to push 10M individual notifications to the post author.&lt;/p&gt;

&lt;p&gt;Fix: &lt;strong&gt;Debouncing in the Notification Service.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Window: 60 seconds&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;like.created&lt;/code&gt; events for the same &lt;code&gt;post_id&lt;/code&gt; + &lt;code&gt;user_id&lt;/code&gt; (recipient) exceed threshold → batch into “X and 9,999 others liked your post”&lt;/li&gt;
&lt;li&gt;Store the count in Redis, flush as single notification at window close&lt;/li&gt;
&lt;/ul&gt;
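&lt;p&gt;The debounce reduces to a per-(recipient, post) counter flushed once per window. Illustrative Python; the dict stands in for the Redis counters described above, and the flush would be driven by the 60-second window timer:&lt;/p&gt;

```python
# Pending like events, keyed per (recipient, post); stands in for Redis.
pending = {}  # (recipient_id, post_id) -> (first_liker, count)

def on_like(recipient_id, post_id, liker):
    """Accumulate a like event instead of pushing immediately."""
    key = (recipient_id, post_id)
    first, count = pending.get(key, (liker, 0))
    pending[key] = (first, count + 1)

def flush(recipient_id, post_id):
    """Called at window close (every 60s): emit at most one notification."""
    first, count = pending.pop((recipient_id, post_id), (None, 0))
    if count == 0:
        return None
    if count == 1:
        return f"{first} liked your post"
    return f"{first} and {count - 1:,} others liked your post"
```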

&lt;p&gt;&lt;strong&gt;Dead device tokens:&lt;/strong&gt;&lt;br&gt;
User uninstalls app. FCM/APNs return &lt;code&gt;NotRegistered&lt;/code&gt; or &lt;code&gt;BadDeviceToken&lt;/code&gt; on delivery attempt.&lt;/p&gt;

&lt;p&gt;Fix: Notification Service listens for delivery failure callbacks → marks device token as invalid in DB → stops sending to that token.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User preference: notifications off:&lt;/strong&gt;&lt;br&gt;
Check user notification preferences &lt;em&gt;before&lt;/em&gt; publishing to Kafka. Don’t generate events for users who have disabled that notification type. Saves downstream processing entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do Not Disturb windows:&lt;/strong&gt;&lt;br&gt;
Store user timezone + DND preferences. Notification Service checks at delivery time — if in DND window, store notification, deliver at window end.&lt;/p&gt;


&lt;h2&gt;
  
  
  8. Component Deep Dive: Search &amp;amp; Discovery
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Why Not Postgres?
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;SELECT * FROM posts WHERE caption LIKE '%golden gate%'&lt;/code&gt; is a sequential scan on a table with billions of rows. At any meaningful scale, this query will time out before returning.&lt;/p&gt;

&lt;p&gt;You need an inverted index. That’s Elasticsearch.&lt;/p&gt;
&lt;h3&gt;
  
  
  Elasticsearch Index Design
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Posts Index:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"post_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"keyword"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"caption"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text (analyzed, english stemming)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hashtags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"keyword"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"geo_point"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"like_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"keyword"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"is_deleted"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"boolean"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Users Index:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"keyword"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"keyword"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"bio"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"follower_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"is_private"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"boolean"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"is_verified"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"boolean"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Keeping Elasticsearch in Sync
&lt;/h3&gt;

&lt;p&gt;Elasticsearch is updated &lt;strong&gt;asynchronously&lt;/strong&gt; from Postgres via Kafka:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Postgres write → Kafka (post.created / post.updated / post.deleted)
              → Elasticsearch consumer → index update
              → Lag: ~1-2 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The dual-source pattern:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Post appears in owner’s &lt;strong&gt;profile&lt;/strong&gt; immediately (read from Postgres — source of truth)&lt;/li&gt;
&lt;li&gt;Post appears in &lt;strong&gt;search results&lt;/strong&gt; after ~2 seconds (read from Elasticsearch)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two read models for two different use cases, with Postgres remaining the source of truth. This is intentional, not a bug.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trending Hashtags
&lt;/h3&gt;

&lt;p&gt;Trending is a sliding window count problem. Redis handles it elegantly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;On hashtag used: ZINCRBY trending:1h &amp;lt;tag&amp;gt; 1
                 ZINCRBY trending:24h &amp;lt;tag&amp;gt; 1
                 ZINCRBY trending:7d &amp;lt;tag&amp;gt; 1

Expire keys: trending:1h → TTL 1 hour (rolling via scheduled reset)
             trending:24h → TTL 24 hours
             trending:7d → TTL 7 days

Read trending: ZREVRANGE trending:1h 0 9 WITHSCORES
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For true sliding windows (not fixed-window resets), use a sorted set with timestamps as members and prune periodically with &lt;code&gt;ZREMRANGEBYSCORE&lt;/code&gt;.&lt;/p&gt;
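&lt;p&gt;A minimal sketch of that true sliding window, with a per-tag list of timestamps standing in for the sorted set and the pruning done on read (the &lt;code&gt;ZREMRANGEBYSCORE&lt;/code&gt; step):&lt;/p&gt;

```python
import time

WINDOW = 3600  # 1-hour window
_uses = {}     # tag -> list of use timestamps

def record_use(tag, now=None):
    """One timestamped entry per hashtag use (the ZADD step)."""
    _uses.setdefault(tag, []).append(now if now is not None else time.time())

def count_in_window(tag, now=None):
    """Prune entries older than the window, then count what remains."""
    now = now if now is not None else time.time()
    fresh = [t for t in _uses.get(tag, []) if t > now - WINDOW]
    _uses[tag] = fresh  # the ZREMRANGEBYSCORE prune
    return len(fresh)
```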

&lt;h3&gt;
  
  
  Explore / Discover
&lt;/h3&gt;

&lt;p&gt;Explore isn’t search — it’s recommendation. ML-powered, personalised, continuously reranked.&lt;/p&gt;

&lt;p&gt;Pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Candidate generation: posts with high engagement velocity in last 24h&lt;/li&gt;
&lt;li&gt;User interest modelling: what content types has this user engaged with?&lt;/li&gt;
&lt;li&gt;Collaborative filtering: what are similar users engaging with?&lt;/li&gt;
&lt;li&gt;Re-ranking: apply diversity, freshness, safety filters&lt;/li&gt;
&lt;li&gt;Serve top 50 candidates per request&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Infrastructure: Apache Spark for batch feature computation, TensorFlow Serving for real-time scoring, Redis for caching ranked candidate lists per user.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. Component Deep Dive: Likes &amp;amp; Comments
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Postgres Can’t Handle Likes
&lt;/h3&gt;

&lt;p&gt;52,000 likes per second. In Postgres, each like is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An &lt;code&gt;INSERT&lt;/code&gt; into the likes table&lt;/li&gt;
&lt;li&gt;An &lt;code&gt;UPDATE&lt;/code&gt; on the post’s &lt;code&gt;like_count&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Potentially a row lock while updating the count&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At 52K/second, you’ll hit write contention, lock timeouts, and deadlocks. Postgres wasn’t built for this write pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cassandra for Likes
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Cassandra table design&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;likes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;post_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;reaction_type&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMEUUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this schema:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;post_id&lt;/code&gt; as partition key → all likes for a post on one node&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;user_id&lt;/code&gt; as clustering key → single-partition point read for “has this user liked this post?”&lt;/li&gt;
&lt;li&gt;TIMEUUID for ordering without separate timestamp column&lt;/li&gt;
&lt;li&gt;INSERT is idempotent → same &lt;code&gt;(post_id, user_id)&lt;/code&gt; twice = one like (handles retries)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Like count:&lt;/strong&gt;&lt;br&gt;
Don’t store count in Cassandra (COUNTER type has consistency quirks). Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Atomic &lt;code&gt;INCR&lt;/code&gt; in Redis: &lt;code&gt;like_ct:{post_id}&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Write-back to Postgres &lt;code&gt;posts.like_count&lt;/code&gt; every 30 seconds async&lt;/li&gt;
&lt;li&gt;Accept: count shown may be ~30s behind actual. Nobody notices.&lt;/li&gt;
&lt;/ul&gt;
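&lt;p&gt;The counter flow can be sketched like this (plain dicts stand in for Redis and Postgres; &lt;code&gt;flush()&lt;/code&gt; plays the 30-second async write-back job):&lt;/p&gt;

```python
class LikeCounter:
    """Redis-style atomic counters with periodic write-back to the
    relational store. In production, incr() is a Redis INCR/DECR and
    flush() is a scheduled job updating posts.like_count."""

    def __init__(self):
        self.redis = {}   # like_ct:{post_id} -> running total
        self.db = {}      # posts.like_count, lagging by one flush

    def incr(self, post_id, delta=1):
        key = f"like_ct:{post_id}"
        self.redis[key] = self.redis.get(key, 0) + delta
        return self.redis[key]

    def flush(self):
        # write-back job: copy hot counters into the durable store
        for key, val in self.redis.items():
            post_id = key.split(":", 1)[1]
            self.db[post_id] = val
```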
&lt;h3&gt;
  
  
  Comments in Cassandra
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;comments&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;post_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;comment_id&lt;/span&gt; &lt;span class="n"&gt;TIMEUUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;like_count&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;comment_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;CLUSTERING&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;comment_id&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Why TIMEUUID as clustering key:&lt;/strong&gt;&lt;br&gt;
Ordering is built into the key — no &lt;code&gt;ORDER BY&lt;/code&gt; at query time. Comments are naturally sorted chronologically. Pagination with &lt;code&gt;WHERE comment_id &amp;gt; &amp;lt;last_seen&amp;gt;&lt;/code&gt; is efficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query pattern:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;comments&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;post_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;comment_id&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="c1"&gt;-- cursor&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Efficient. No full scans. Scales to millions of comments per post.&lt;/p&gt;
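&lt;p&gt;The cursor contract can be sketched in Python (a sorted list stands in for the Cassandra partition; since TIMEUUIDs sort chronologically, integer IDs model them here):&lt;/p&gt;

```python
def paginate_comments(comments, cursor=None, limit=20):
    """Cursor pagination mirroring the CQL above:
    WHERE comment_id > cursor LIMIT n over a pre-sorted partition.
    Returns (page, next_cursor); next_cursor is None on the last page."""
    if cursor is not None:
        comments = [c for c in comments if c["comment_id"] > cursor]
    page = comments[:limit]
    next_cursor = page[-1]["comment_id"] if len(page) == limit else None
    return page, next_cursor
```

&lt;p&gt;Unlike OFFSET pagination, the cursor never re-scans skipped rows, so page 1,000 costs the same as page 1.&lt;/p&gt;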




&lt;h2&gt;
  
  
  10. Database Design — Every Decision Justified {#database-design}
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsnswzg0c0n4edu1z3dwx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsnswzg0c0n4edu1z3dwx.png" alt=" " width="800" height="1040"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  PostgreSQL — The Relational Core
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;users table:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt;      &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;username&lt;/span&gt;     &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;email&lt;/span&gt;        &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;password_hash&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;bio&lt;/span&gt;          &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;profile_pic_url&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;follower_count&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;following_count&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;is_private&lt;/span&gt;   &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;is_verified&lt;/span&gt;  &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt;   &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_users_username&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_users_email&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;posts table:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;post_id&lt;/span&gt;      &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt;      &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;caption&lt;/span&gt;      &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;media_urls&lt;/span&gt;   &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="n"&gt;media_type&lt;/span&gt;   &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;media_type&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'photo'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'video'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'reel'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
  &lt;span class="n"&gt;location_lat&lt;/span&gt; &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;location_lng&lt;/span&gt; &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;like_count&lt;/span&gt;   &lt;span class="nb"&gt;BIGINT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;comment_count&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;is_deleted&lt;/span&gt;   &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt;   &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_posts_user_created&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;follows table:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;follows&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;follower_id&lt;/span&gt;  &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;following_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt;       &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'active'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'blocked'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt;   &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;follower_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;following_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_follows_following&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;follows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;following_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Redis Key Design
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key Pattern&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;TTL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;feed:{user_id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;ZSET&lt;/td&gt;
&lt;td&gt;Pre-computed ranked feed&lt;/td&gt;
&lt;td&gt;10 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;story:{user_id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SET of story_ids&lt;/td&gt;
&lt;td&gt;Active stories&lt;/td&gt;
&lt;td&gt;24h&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;session:{token}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;STRING&lt;/td&gt;
&lt;td&gt;Auth session → user_id&lt;/td&gt;
&lt;td&gt;7 days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;rate:{uid}:{action}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;COUNTER&lt;/td&gt;
&lt;td&gt;Rate limit window&lt;/td&gt;
&lt;td&gt;1 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;like_ct:{post_id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;STRING&lt;/td&gt;
&lt;td&gt;Atomic like counter&lt;/td&gt;
&lt;td&gt;No TTL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;trending:{window}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;ZSET&lt;/td&gt;
&lt;td&gt;Hashtag trending scores&lt;/td&gt;
&lt;td&gt;Window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;viewed:{uid}:{story_id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;STRING&lt;/td&gt;
&lt;td&gt;Story viewed flag&lt;/td&gt;
&lt;td&gt;24h&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Storage Selection Rationale
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Data&lt;/th&gt;
&lt;th&gt;Storage&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Users, Posts, Follows, Stories&lt;/td&gt;
&lt;td&gt;PostgreSQL&lt;/td&gt;
&lt;td&gt;Relational, consistency required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Likes, Comments, Timelines&lt;/td&gt;
&lt;td&gt;Cassandra&lt;/td&gt;
&lt;td&gt;Write-heavy, no JOINs needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Feeds, Sessions, Counters&lt;/td&gt;
&lt;td&gt;Redis&lt;/td&gt;
&lt;td&gt;Speed, TTL support, atomic ops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search, Hashtags, Explore&lt;/td&gt;
&lt;td&gt;Elasticsearch&lt;/td&gt;
&lt;td&gt;Inverted index, full-text, geo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Photos, Videos, Stories&lt;/td&gt;
&lt;td&gt;S3 + CDN&lt;/td&gt;
&lt;td&gt;Cheap, durable, globally distributed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  11. API Design — Full Contracts {#api-design}
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4le71rfyu8dn3b7dvupf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4le71rfyu8dn3b7dvupf.png" alt=" " width="800" height="1086"&gt;&lt;/a&gt;&lt;br&gt;
All APIs are RESTful, JWT-authenticated, cursor-paginated, and rate-limited at the API Gateway via Redis.&lt;/p&gt;

&lt;h3&gt;
  
  
  Auth APIs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /v1/auth/register
Body: { username, email, password, full_name }
Returns: { access_token, refresh_token, user }

POST /v1/auth/login
Body: { email, password }
Returns: { access_token, refresh_token }

POST /v1/auth/refresh
Header: Authorization: Bearer &amp;lt;refresh_token&amp;gt;
Returns: { access_token }

POST /v1/auth/logout
Header: Authorization: Bearer &amp;lt;access_token&amp;gt;
Action: DEL session:{token} from Redis
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  User APIs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET  /v1/users/:username
Returns: { user_id, username, bio, follower_count, following_count,
           posts_count, is_followed, is_private, is_verified }

PATCH /v1/users/me
Body: { bio?, profile_pic_url?, is_private? }

GET  /v1/users/:id/followers?cursor=&amp;lt;cursor&amp;gt;&amp;amp;limit=20
Returns: { users[], next_cursor }

POST   /v1/users/:id/follow     → idempotent, Kafka: follow.created
DELETE /v1/users/:id/follow     → Kafka: follow.removed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Post APIs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /v1/posts
Body: { caption, media_type, location? }
Returns: { post_id, upload_url }   ← pre-signed S3 URL
Action: client uploads directly to S3, S3 event triggers Kafka

GET  /v1/posts/:post_id
Returns: { post, author, like_count, comment_count, is_liked, is_saved }

DELETE /v1/posts/:post_id
Action: is_deleted=true, Kafka: post.deleted → CDN purge

POST   /v1/posts/:post_id/like
Header: Idempotency-Key: &amp;lt;uuid&amp;gt;
Action: Redis INCR + Cassandra write + Kafka: like.created

DELETE /v1/posts/:post_id/like
Action: Redis DECR + Cassandra delete + Kafka: like.removed

GET  /v1/posts/:post_id/comments?cursor=&amp;lt;cursor&amp;gt;&amp;amp;limit=20
Returns: { comments[], next_cursor }   ← from Cassandra

POST /v1/posts/:post_id/comments
Body: { text }
Action: Cassandra write + Kafka: comment.created → notification
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Feed &amp;amp; Stories APIs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /v1/feed?cursor=&amp;lt;cursor&amp;gt;&amp;amp;limit=10
Action: Redis ZSET read → cache miss → Cassandra rebuild → re-cache
Returns: { posts[], next_cursor }

GET /v1/explore?page=1&amp;amp;limit=20
Action: Elasticsearch + ML ranking
Returns: { posts[], next_page }

POST /v1/stories
Body: { media_type }
Returns: { story_id, upload_url }
Action: Redis TTL set + Kafka: story.created

GET /v1/stories/feed
Returns: { stories[] }   ← unviewed, sorted by recency

POST /v1/stories/:id/view
Action: Redis SET viewed:{uid}:{story_id} 1 EX 86400 + async view_count INCR

DELETE /v1/stories/:id
Action: is_deleted=true + Kafka: story.deleted → S3 purge
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Search APIs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /v1/search/users?q=rachit&amp;amp;limit=10
Action: Elasticsearch users_index, fuzzy match, boost by follower_count

GET /v1/search/posts?q=sunset&amp;amp;hashtag=travel&amp;amp;lat=28.6&amp;amp;lng=77.2&amp;amp;radius=10km
Action: Elasticsearch posts_index, geo-filter + text match

GET /v1/search/trending?window=1h
Action: ZREVRANGE trending:1h 0 9 WITHSCORES
Returns: { tags: [{ tag, post_count, delta }] }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Rate Limits
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Limit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;POST /posts&lt;/td&gt;
&lt;td&gt;10/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POST /like&lt;/td&gt;
&lt;td&gt;60/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET /feed&lt;/td&gt;
&lt;td&gt;30/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POST /comments&lt;/td&gt;
&lt;td&gt;20/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET /search&lt;/td&gt;
&lt;td&gt;20/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POST /stories&lt;/td&gt;
&lt;td&gt;5/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POST /follow&lt;/td&gt;
&lt;td&gt;30/min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
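&lt;p&gt;A minimal fixed-window sketch of the gateway limiter (a dict stands in for the &lt;code&gt;rate:{uid}:{action}&lt;/code&gt; Redis counters; in production the first hit in a window would INCR with EXPIRE 60):&lt;/p&gt;

```python
class RateLimiter:
    """Fixed-window rate limiter: bump the counter for the current
    1-minute window and reject once the per-action limit is exceeded."""

    def __init__(self, limits):
        self.limits = limits    # action -> max requests per 60s window
        self.counters = {}      # (uid, action, window) -> count

    def allow(self, uid, action, now):
        window = int(now // 60)           # fixed 1-minute buckets
        key = (uid, action, window)
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key] <= self.limits[action]
```

&lt;p&gt;Fixed windows allow a brief burst at window boundaries; the sliding-window sorted-set approach from the search section closes that gap at higher memory cost.&lt;/p&gt;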




&lt;h2&gt;
  
  
  12. Edge Cases Nobody Draws on Their Diagram {#edge-cases}
&lt;/h2&gt;

&lt;p&gt;This section is what turns a good system design into a great one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Celebrity Fan-out Storm
&lt;/h3&gt;

&lt;p&gt;Cristiano Ronaldo posts. 600M followers. Fan-out on write to all of them simultaneously would generate 600M Cassandra writes in seconds — your cluster dies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Celebrity detection at post time (follower_count &amp;gt; 1M). Skip fan-out. At feed read time, fetch celebrity’s latest posts separately and merge. The merge happens in the Feed Service, in memory, before Redis caching.&lt;/p&gt;
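&lt;p&gt;The read-time merge can be sketched like this (both inputs arrive sorted newest-first; field names are illustrative):&lt;/p&gt;

```python
import heapq

def assemble_feed(fanout_feed, celebrity_posts, limit=10):
    """Merge the precomputed fan-out feed with celebrity posts fetched
    at read time. Since both lists are already sorted newest-first by
    created_at, heapq.merge combines them without a full re-sort."""
    merged = heapq.merge(fanout_feed, celebrity_posts,
                         key=lambda p: p["created_at"], reverse=True)
    return list(merged)[:limit]
```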

&lt;h3&gt;
  
  
  The Disappearing Story
&lt;/h3&gt;

&lt;p&gt;Redis TTL fires (story expires). Kafka consumer is restarting at that exact moment. The &lt;code&gt;story.expired&lt;/code&gt; event is consumed, but the consumer crashes before committing the offset. The event replays. The delete runs twice on S3.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; S3 delete is idempotent (deleting a non-existent object returns 204). The Cassandra write is idempotent (same &lt;code&gt;story_id&lt;/code&gt; soft-delete runs twice = same result). Design all consumers to handle duplicate events safely.&lt;/p&gt;
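&lt;p&gt;A sketch of such a duplicate-safe consumer (dicts stand in for S3 and Cassandra; the event fields are illustrative):&lt;/p&gt;

```python
def handle_story_expired(event, s3, cassandra):
    """Idempotent consumer for story.expired: both side effects are
    safe to repeat, so a replayed event (offset not committed before
    a crash) converges to the same final state."""
    # delete media: an absent key is a no-op, like S3's 204 response
    s3.pop(event["media_key"], None)
    # soft-delete is an upsert: running it twice yields the same row
    cassandra[event["story_id"]] = {"is_deleted": True}
```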

&lt;h3&gt;
  
  
  The Double Like
&lt;/h3&gt;

&lt;p&gt;Network is flaky. User taps like. Request times out client-side. Client retries. Server receives two &lt;code&gt;POST /like&lt;/code&gt; requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; &lt;code&gt;Idempotency-Key: &amp;lt;uuid&amp;gt;&lt;/code&gt; header on every like request. Server runs &lt;code&gt;SET idempotency:{key} 1 NX EX 86400&lt;/code&gt; in Redis before processing (&lt;code&gt;SETNX&lt;/code&gt; alone can’t set a TTL). If the key already exists, return the cached response. If not, process and cache. Same key = same result, always.&lt;/p&gt;
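&lt;p&gt;The guard can be sketched like this (dicts stand in for Redis; in production the claim would be a single atomic SET-with-NX-and-expiry call):&lt;/p&gt;

```python
def handle_like(key, process, store, responses):
    """Idempotency-Key guard: claim the key before processing; if it
    was already claimed, return the cached response instead of
    running the side effect again."""
    if key in store:                 # claim failed -> duplicate request
        return responses[key]
    store[key] = 1                   # claim the key (NX semantics)
    responses[key] = process()       # process once, cache the result
    return responses[key]
```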

&lt;h3&gt;
  
  
  Comment on Deleted Post
&lt;/h3&gt;

&lt;p&gt;Post is soft-deleted. User (who has the post open on their screen) tries to comment. Request hits Comment Service before the deletion propagates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Comment Service calls Post Service to validate &lt;code&gt;is_deleted&lt;/code&gt; before writing. Or: API Gateway checks post status. Or: accept the race condition and clean up orphaned comments in a background job. Third option is usually right — the complexity of synchronous cross-service validation isn’t worth the edge case frequency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Notification Flood
&lt;/h3&gt;

&lt;p&gt;10M likes in 10 minutes on a viral reel. Without batching, the post author gets 10M push notifications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Debounce in Notification Service. Redis counter per &lt;code&gt;(recipient_id, post_id, notification_type)&lt;/code&gt; with 60-second window. At window close, fire one notification: “Priya and 9,999 others liked your reel.” Reset counter.&lt;/p&gt;
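&lt;p&gt;A sketch of that debounce window (in-memory state stands in for the Redis counters; the message wording is illustrative):&lt;/p&gt;

```python
class NotificationDebouncer:
    """Count events per (recipient, post, type) inside a 60-second
    window; at window close, emit one aggregated notification that
    names the first actor, then reset the counter."""

    def __init__(self, window=60):
        self.window = window
        self.pending = {}   # key -> (count, first_actor, window_start)

    def record(self, recipient, post_id, actor, now):
        key = (recipient, post_id, "like")
        count, first, start = self.pending.get(key, (0, actor, now))
        self.pending[key] = (count + 1, first, start)

    def flush(self, now):
        out = []
        for key, (count, first, start) in list(self.pending.items()):
            if now - start >= self.window:     # window closed
                others = count - 1
                msg = (f"{first} liked your post" if others == 0
                       else f"{first} and {others:,} others liked your post")
                out.append((key[0], msg))
                del self.pending[key]          # reset the counter
        return out
```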

&lt;h3&gt;
  
  
  Cold Start Feed
&lt;/h3&gt;

&lt;p&gt;New user. Zero followings. Feed is empty.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Onboarding flow → interest selection → seed feed with high-engagement posts matching selected interests from Elasticsearch. After 5+ follows, switch to normal feed generation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Geo-Replication Lag
&lt;/h3&gt;

&lt;p&gt;User in Mumbai follows someone in New York. The follow write goes to primary (US). Mumbai’s read replica is 800ms behind. User immediately views the newly-followed account’s profile — replica says “not following.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; For follow-status checks that are user-initiated immediately after a follow action, route the read to primary (or use a read-your-own-writes cache in Redis). This is the one case where eventual consistency is genuinely confusing to users.&lt;/p&gt;
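&lt;p&gt;A sketch of that routing rule (the 5-second pin window is an illustrative choice, sized to exceed worst-case replication lag):&lt;/p&gt;

```python
class FollowReads:
    """Read-your-own-writes routing: after a user performs a follow,
    pin their follow-status reads to the primary for a short freshness
    window so a lagging replica can't show 'not following'."""

    def __init__(self, pin_seconds=5):
        self.pin = pin_seconds
        self.last_write = {}   # (follower, followee) -> write time

    def on_follow(self, follower, followee, now):
        self.last_write[(follower, followee)] = now

    def route(self, follower, followee, now):
        t = self.last_write.get((follower, followee))
        # recent write by this user -> primary; otherwise any replica
        return "primary" if t is not None and now - t < self.pin else "replica"
```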




&lt;h2&gt;
  
  
  13. Key Trade-offs Summary {#trade-offs}
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Decision&lt;/th&gt;
&lt;th&gt;Trade-off&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cassandra for likes&lt;/td&gt;
&lt;td&gt;Write speed vs. no ACID, no JOINs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Push feed fan-out&lt;/td&gt;
&lt;td&gt;Fast reads vs. write amplification for popular accounts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Async Elasticsearch sync&lt;/td&gt;
&lt;td&gt;Search features vs. 1-2 second indexing lag&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Redis like counters&lt;/td&gt;
&lt;td&gt;Speed vs. 30-second write-back delay&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Eventual consistency on replicas&lt;/td&gt;
&lt;td&gt;Read scale vs. briefly stale data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Soft deletes everywhere&lt;/td&gt;
&lt;td&gt;Safety / auditability vs. storage overhead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pre-signed S3 uploads&lt;/td&gt;
&lt;td&gt;Scalable media ingestion vs. more complex client logic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hybrid fan-out&lt;/td&gt;
&lt;td&gt;Balanced throughput vs. more complex feed assembly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Instagram at 500M DAU isn’t one system. It’s eight systems — feed, stories, media, notifications, search, likes, comments, and the graph — running in parallel, sharing Kafka as the connective tissue, each independently scalable.&lt;/p&gt;

&lt;p&gt;The principles that hold across all of them:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Design for the read path first&lt;/strong&gt; — the read:write ratio is roughly 80:20&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Async everything that doesn’t need to be sync&lt;/strong&gt; — Kafka is your friend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Name your trade-offs explicitly&lt;/strong&gt; — “we accept 2-second search lag for write simplicity”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design for idempotency everywhere&lt;/strong&gt; — networks fail, retries happen, duplicates arrive&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The cache is not the source of truth&lt;/strong&gt; — always have a fallback to the DB&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s what Instagram-scale system design looks like.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next in this series: Why SQL beats NoSQL for 90% of startups — the data, the nuance, and why the benchmarks lie.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;About the author:&lt;/strong&gt; Rachit writes about system design, backend engineering, and the real trade-offs nobody talks about. Follow for weekly deep dives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;system-design&lt;/code&gt; &lt;code&gt;backend&lt;/code&gt; &lt;code&gt;distributed-systems&lt;/code&gt; &lt;code&gt;instagram&lt;/code&gt; &lt;code&gt;software-architecture&lt;/code&gt; &lt;code&gt;database&lt;/code&gt; &lt;code&gt;kafka&lt;/code&gt; &lt;code&gt;redis&lt;/code&gt; &lt;code&gt;elasticsearch&lt;/code&gt; &lt;code&gt;cassandra&lt;/code&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>backend</category>
      <category>distributedsystems</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Your Shiny New Kafka Cluster is a Ticking Time Bomb</title>
      <dc:creator>Rachit Misra</dc:creator>
      <pubDate>Tue, 31 Mar 2026 02:10:16 +0000</pubDate>
      <link>https://dev.to/hungryformore/your-shiny-new-kafka-cluster-is-a-ticking-time-bomb-50a</link>
      <guid>https://dev.to/hungryformore/your-shiny-new-kafka-cluster-is-a-ticking-time-bomb-50a</guid>
      <description>&lt;p&gt;What nobody tells you about event-driven architecture until it’s 3 AM and your database is corrupted&lt;br&gt;
A war story about distributed systems, offset commits, and the most expensive lesson of my engineering career.&lt;/p&gt;

&lt;p&gt;Everyone said Kafka would fix our feed latency.&lt;br&gt;
They were right.&lt;br&gt;
For exactly 31 days.&lt;br&gt;
Then PagerDuty fired at 3:17 AM, consumer lag hit 2.4 million messages, and we discovered something that no architecture diagram had ever shown us — Kafka doesn’t care about your database state. It just delivers. Faithfully. Mercilessly.&lt;br&gt;
This is that story.&lt;/p&gt;

&lt;p&gt;The Setup — How We Got Here&lt;br&gt;
Q3. Product team screaming for faster feeds. Our engineering lead had just come back from a conference. Someone in the architecture meeting said the word Kafka.&lt;br&gt;
The room lit up. Senior engineers nodded. Someone opened a laptop and pulled up the Confluent documentation.&lt;br&gt;
We had a simple Redis-based event queue that was “too slow.” Our P99 latency was 800ms on feed generation. Product wanted 200ms. Kafka promised sub-100ms. The decision felt obvious.&lt;br&gt;
Three weeks later, we shipped it. Latency dropped to 80ms. The product team celebrated. Engineering celebrated. We had graphs going the right direction.&lt;br&gt;
We had no idea what we’d just invited into our system.&lt;/p&gt;

&lt;p&gt;What Kafka Actually Is&lt;br&gt;
Before we get to the 3 AM incident, let’s talk about what Kafka actually is — because most teams get this wrong before they write a single line of code.&lt;br&gt;
Kafka is not a message queue.&lt;br&gt;
This is the most dangerous misconception in distributed systems today. When engineers hear “message queue,” they think: send a message, someone receives it, it’s gone. Like an email inbox.&lt;br&gt;
Kafka is a distributed commit log.&lt;br&gt;
Every event is written to a log. The log is partitioned across brokers. Each partition is an ordered, immutable sequence of records. Consumers read from the log at their own pace, tracked by an offset — a pointer to their position in the log.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Partition 0: [event1] [event2] [event3] [event4] [event5]
                                          ↑
                                    Consumer offset
                                    (Consumer has read up to here)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The crucial difference from a traditional queue: Kafka doesn’t delete messages after consumption. Messages are retained for a configured retention period (default 7 days). Consumers can seek to any offset and replay any portion of the log.&lt;br&gt;
This is both Kafka’s superpower and its most dangerous property.&lt;br&gt;
Superpower: Replay events for new consumers, rebuild state, debug issues by replaying history.&lt;br&gt;
Danger: If your consumers are not idempotent and your system state has mutated, replay doesn’t recover your system. It corrupts it further.&lt;/p&gt;
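&lt;p&gt;The danger is easy to reproduce in a few lines. The sketch below is plain Java with illustrative names (no real Kafka API): the same three events are delivered twice, simulating a replay from an uncommitted offset, and a naive handler is compared against one that tracks processed event IDs:&lt;/p&gt;

```java
import java.util.HashSet;

// Toy model of a replay after a crash. Names are illustrative; no real
// Kafka API is used. The same three events are delivered twice, as they
// would be when a consumer dies after processing but before committing
// its offset.
public class ReplayDemo {
    static int inventory = 100;               // state behind the naive handler
    static int inventoryIdem = 100;           // state behind the idempotent handler
    static HashSet processed = new HashSet(); // event IDs already handled

    // Non-idempotent: every delivery mutates state.
    static void handleNaive(String eventId) {
        inventory -= 1;
    }

    // Idempotent: duplicate deliveries are detected and skipped.
    static void handleIdempotent(String eventId) {
        if (processed.contains(eventId)) return;
        inventoryIdem -= 1;
        processed.add(eventId);
    }

    public static void main(String[] args) {
        String[] log = {"order-1", "order-2", "order-3"};
        // First delivery, then a full replay from the last committed offset.
        for (String id : log) { handleNaive(id); handleIdempotent(id); }
        for (String id : log) { handleNaive(id); handleIdempotent(id); }

        System.out.println("naive inventory: " + inventory);          // 94 (corrupted)
        System.out.println("idempotent inventory: " + inventoryIdem); // 97 (correct)
    }
}
```

&lt;p&gt;Same log, same replay. One handler ends up with corrupted state; the other doesn’t care how many times the log is replayed.&lt;/p&gt;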

&lt;p&gt;The Architecture We Built&lt;br&gt;
Our feed generation system looked elegant on the whiteboard:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Action
    ↓
[Producer] → [Kafka Topic: user-events]
                    ↓
            [Consumer Group A] → Updates user feed cache
            [Consumer Group B] → Updates recommendation engine
            [Consumer Group C] → Updates activity timeline
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Three consumer groups. Each consuming the same events for different purposes. Decoupled. Independent. Horizontally scalable.&lt;br&gt;
The architecture review went smoothly. “What about consumer failures?” someone asked.&lt;br&gt;
“Kafka handles that,” we said. “It retains messages. We can replay.”&lt;br&gt;
We were right. And catastrophically wrong.&lt;/p&gt;

&lt;p&gt;Month 2 — The Nightmare Begins&lt;br&gt;
3:17 AM&lt;br&gt;
PagerDuty fires.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ALERT: Consumer lag critical
Topic: order-events
Consumer Group: order-processor
Lag: 2,400,000 messages
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;2.4 million messages behind. Not a small spike — the order-processor consumer group had been silently falling behind for 6 hours. A memory leak in one of the consumer instances had caused it to slow down. Kubernetes hadn’t restarted it because it was still running — just slowly.&lt;br&gt;
By the time the alert fired, we had 2.4 million unprocessed order events.&lt;br&gt;
The Decision That Made Everything Worse&lt;br&gt;
“No problem,” the on-call engineer said. “We’ll just restart the consumers and let them catch up.”&lt;br&gt;
This is where the real disaster began.&lt;br&gt;
The consumers restarted. They picked up from their last committed offset. They began processing 2.4 million events.&lt;br&gt;
Thirty minutes later, our database was in a state of chaos:&lt;br&gt;
    ∙ Orders were being marked as “processing” that had already been delivered&lt;br&gt;
    ∙ Inventory counters were going negative&lt;br&gt;
    ∙ Users were being charged twice for orders they’d already received&lt;br&gt;
    ∙ Recommendation scores were being recalculated with stale data, overwriting fresh data&lt;br&gt;
We stopped the consumers. But the damage was done.&lt;br&gt;
What Actually Happened&lt;br&gt;
Let me reconstruct the failure precisely, because understanding the exact mechanism is everything.&lt;br&gt;
Step 1 — The initial processing:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Event: OrderPlaced {orderId: 12345, amount: 599, userId: user456}
Consumer processes event:
  → Creates order record in DB
  → Charges payment gateway
  → Updates inventory: item_count -= 1
  → Commits offset
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This worked correctly for months.&lt;br&gt;
Step 2 — The slow consumer:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Event: OrderPlaced {orderId: 67890, amount: 1299, userId: user789}
Consumer receives event
Consumer starts processing...
  → Creates order record in DB ✅
  → Charges payment gateway ✅
  → Updates inventory: item_count -= 1 ✅
Consumer crashes before committing offset ❌
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The offset for this batch was never committed. From Kafka’s perspective, these events were never successfully consumed.&lt;br&gt;
Step 3 — The restart:&lt;br&gt;
When the consumer restarted, it read from the last committed offset — before the crash. It received the same events again.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Event: OrderPlaced {orderId: 67890, amount: 1299, userId: user789}
Consumer processes event AGAIN:
  → Creates order record in DB ← DUPLICATE
  → Charges payment gateway ← DOUBLE CHARGE
  → Updates inventory: item_count -= 1 ← WRONG AGAIN
  → Commits offset ✅
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Step 4 — The 2.4 million event replay:&lt;br&gt;
Multiply this across 2.4 million events, many of which had been partially processed or fully processed but whose offsets hadn’t been committed. Some events were processed once, some twice, some partially.&lt;br&gt;
The database was now in an indeterminate state. We couldn’t tell which operations had run once and which had run twice.&lt;br&gt;
Kafka didn’t fail. It worked exactly as designed.&lt;br&gt;
We had built a perfectly reliable pipeline for delivering chaos at scale.&lt;/p&gt;

&lt;p&gt;The Root Cause Analysis&lt;br&gt;
Our postmortem identified three fundamental mistakes.&lt;br&gt;
Mistake 1: No Idempotency&lt;br&gt;
Idempotency means that running the same operation multiple times produces the same result as running it once.&lt;br&gt;
Our consumers were not idempotent. Processing the same OrderPlaced event twice created two orders, two charges, two inventory decrements.&lt;br&gt;
The fix:&lt;br&gt;
Every event handler must check: “Have I already processed this event?”&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@KafkaListener(topics = "order-events")
public void handleOrderEvent(OrderEvent event) {
    // Idempotency check FIRST
    if (eventProcessingRepository.exists(event.getEventId())) {
        log.info("Event {} already processed, skipping", event.getEventId());
        return;
    }

    try {
        // Process the event
        orderService.processOrder(event);

        // Mark as processed ATOMICALLY with the business operation
        // Use a DB transaction to ensure both happen or neither happens
        eventProcessingRepository.markProcessed(event.getEventId());

    } catch (Exception e) {
        // Don't mark as processed — allow retry
        throw e;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE processed_events (
    event_id VARCHAR(255) PRIMARY KEY,
    processed_at TIMESTAMP DEFAULT NOW(),
    consumer_group VARCHAR(100)
);

-- Index to support periodic cleanup of old rows (keeps the table bounded)
CREATE INDEX idx_processed_events_time ON processed_events(processed_at);
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Every unique event ID is stored. Before processing, check if it exists. If yes — skip. If no — process and insert atomically.&lt;br&gt;
The idempotency key design matters:&lt;br&gt;
Your event IDs must be stable and unique across retries. Use a combination of business identifiers:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Good: stable, business-meaningful
String eventId = "order-placed-" + orderId + "-" + userId;

// Bad: changes on retry
String eventId = UUID.randomUUID().toString();
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Mistake 2: Auto-Commit Was a Lie&lt;br&gt;
Kafka consumers have two offset commit strategies:&lt;br&gt;
Auto-commit (the default):&lt;/p&gt;

&lt;p&gt;props.put("enable.auto.commit", "true");&lt;br&gt;
props.put("auto.commit.interval.ms", "5000");&lt;/p&gt;

&lt;p&gt;Every 5 seconds, Kafka automatically commits the current offset regardless of whether your application has finished processing. If your consumer crashes between the auto-commit and finishing processing — the message is considered consumed but your application never finished handling it. Silent data loss.&lt;br&gt;
Alternatively, if your consumer crashes after processing but before the next auto-commit — the message replays on restart. Duplicate processing.&lt;br&gt;
Auto-commit gives you the worst of both worlds: potential data loss AND potential duplicates, with no control over which failure mode you experience.&lt;br&gt;
Manual commit (the correct approach):&lt;/p&gt;

&lt;p&gt;props.put("enable.auto.commit", "false");&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@KafkaListener(topics = "order-events")
public void handleOrderEvent(
        OrderEvent event,
        Acknowledgment acknowledgment) {

    try {
        // Process the event fully
        orderService.processOrder(event);

        // Only commit offset AFTER successful processing
        acknowledgment.acknowledge();

    } catch (Exception e) {
        // Don't acknowledge — message will be redelivered
        // Make sure your handler is idempotent!
        log.error("Failed to process event {}", event.getEventId(), e);
        throw e;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;With manual commit, you control exactly when an offset is committed. The offset only advances when you’ve confirmed successful processing.&lt;br&gt;
At-least-once vs exactly-once:&lt;br&gt;
Manual commit gives you at-least-once delivery — messages are never lost, but may be delivered more than once. Combined with idempotency, this is safe and practical.&lt;br&gt;
Exactly-once delivery is possible with Kafka transactions but comes with significant complexity and performance overhead. For most systems, at-least-once + idempotency is the right trade-off.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Exactly-once with Kafka transactions (complex, high overhead)
@Transactional
public void processWithExactlyOnce(OrderEvent event) {
    // Kafka transaction spans both the consumer offset commit
    // and the producer write — atomic
    kafkaTemplate.executeInTransaction(operations -&amp;gt; {
        orderService.processOrder(event);
        operations.send("order-processed", event.getOrderId());
        return null;
    });
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Mistake 3: No Replay Strategy&lt;br&gt;
When we decided to replay 2.4 million events, we hadn’t asked a fundamental question:&lt;br&gt;
Is the current state of our database compatible with replaying these events?&lt;br&gt;
The answer was no. The database had moved forward. Events that assumed “inventory = 100” were being replayed against a database where “inventory = 47.”&lt;br&gt;
The correct replay strategy:&lt;br&gt;
Before replaying events, you must answer:&lt;br&gt;
    1.  What is the current state of the system? Take a snapshot before replay begins.&lt;br&gt;
    2.  Are the events you’re replaying compatible with the current state? If an event says “deduct 1 from inventory” and inventory is already at the post-event value — replaying it will corrupt state.&lt;br&gt;
    3.  Can you replay to a shadow system first? Replay events against a read replica or a staging environment to validate the outcome before applying to production.&lt;br&gt;
    4.  Do you have a compensation mechanism? If replay causes inconsistency, can you detect and correct it?&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public class SafeReplayService {

    public void replayEvents(String topic, long fromOffset, long toOffset) {
        // Step 1: Take DB snapshot for rollback capability
        String snapshotId = snapshotService.createSnapshot();

        // Step 2: Enable replay mode (idempotency is critical here)
        replayModeFlag.set(true);

        // Step 3: Replay in small batches with validation
        for (long offset = fromOffset; offset &amp;lt; toOffset; offset += BATCH_SIZE) {
            List&amp;lt;ConsumerRecord&amp;gt; batch = fetchBatch(topic, offset, BATCH_SIZE);

            // Validate state compatibility before processing
            if (!stateCompatibilityChecker.isCompatible(batch)) {
                log.error("State incompatibility detected at offset {}", offset);
                rollbackService.rollback(snapshotId);
                throw new ReplayException("Cannot safely replay at offset " + offset);
            }

            processBatch(batch);
            validateBatchOutcome(batch);
        }

        replayModeFlag.set(false);
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The Kafka Failure Modes Nobody Talks About&lt;br&gt;
Consumer Group Rebalancing&lt;br&gt;
When a consumer joins or leaves a consumer group, Kafka triggers a rebalance — reassigning partitions across consumers.&lt;br&gt;
During rebalance, all consumers in the group stop processing. For a group of 10 consumers handling a high-throughput topic, a rebalance can pause processing for 30-60 seconds.&lt;br&gt;
Causes of unexpected rebalances:&lt;br&gt;
    ∙ Consumer takes longer than max.poll.interval.ms to process a batch&lt;br&gt;
    ∙ Consumer fails to send heartbeat within session.timeout.ms&lt;br&gt;
    ∙ Deployment rolling update adds/removes consumer instances&lt;br&gt;
Mitigation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Increase poll interval for slow processors
props.put("max.poll.interval.ms", "600000"); // 10 minutes

// Reduce batch size to ensure processing within interval
props.put("max.poll.records", "100"); // Process 100 records at a time

// Use static membership to reduce rebalances during restarts
props.put("group.instance.id", "consumer-instance-1");
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Log Compaction Surprises&lt;br&gt;
Kafka supports log compaction on certain topics — retaining only the latest message for each key. This is useful for event sourcing and change data capture.&lt;br&gt;
The surprise: if you’re consuming a compacted topic and your consumer falls behind, some of the events you missed may have been compacted away. You’ll never see intermediate states.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before compaction:
[key:user1, value:name=Alice] [key:user1, value:name=Bob] [key:user1, value:name=Carol]

After compaction:
[key:user1, value:name=Carol]
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
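&lt;p&gt;Compaction is easy to model: keep only the latest value per key. The toy sketch below (plain Java, illustrative only — not the broker’s actual cleaner logic) shows what a lagging consumer is left with after intermediate updates are compacted away:&lt;/p&gt;

```java
import java.util.LinkedHashMap;

// Toy model of log compaction: Kafka retains only the latest record
// per key, so a lagging consumer never sees intermediate updates.
public class CompactionDemo {
    // Compacts "key=value" records down to the latest value per key.
    static String[] compact(String[] records) {
        LinkedHashMap latest = new LinkedHashMap();
        for (String r : records) {
            String[] kv = r.split("=", 2);
            latest.put(kv[0], kv[1]); // later values overwrite earlier ones
        }
        String[] out = new String[latest.size()];
        int i = 0;
        for (Object k : latest.keySet()) {
            out[i++] = k + "=" + latest.get(k);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] log = {"user1=Alice", "user1=Bob", "user1=Carol", "user2=Dave"};
        // Only user1=Carol and user2=Dave survive; the Alice and Bob
        // updates are gone, exactly what a lagging consumer observes.
        for (String r : compact(log)) System.out.println(r);
    }
}
```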

&lt;p&gt;Your consumer that missed the first two events will only see “Carol.” If your system expected to process every name change — you’ve silently lost data.&lt;/p&gt;

&lt;p&gt;Partition Hot Spots&lt;br&gt;
Kafka distributes messages across partitions using a partitioning key. If your partitioning key has low cardinality — for example, partitioning order events by country in a primarily US-based app — one partition receives 90% of the traffic.&lt;br&gt;
The consumers reading that partition are overloaded. Others are idle. Horizontal scaling doesn’t help: within a consumer group, any consumers beyond the partition count simply sit idle.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Bad: low cardinality key
producer.send(new ProducerRecord&amp;lt;&amp;gt;("orders", order.getCountry(), order));

// Good: high cardinality key
producer.send(new ProducerRecord&amp;lt;&amp;gt;("orders", order.getOrderId(), order));
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
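&lt;p&gt;To see why key cardinality matters, here is a small simulation. It is illustrative only: Kafka’s default partitioner actually hashes keys with murmur2, not String.hashCode, but any deterministic hash-mod-N scheme shows the same skew:&lt;/p&gt;

```java
// Toy illustration of partition skew: a deterministic hash-mod-N scheme
// sends every record with the same key to the same partition, so a
// low-cardinality key concentrates traffic on one partition.
public class PartitionSkewDemo {
    static int partitionFor(String key, int numPartitions) {
        // Normalize a possibly negative hashCode into the range 0..numPartitions-1
        return (key.hashCode() % numPartitions + numPartitions) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        int[] byCountry = new int[numPartitions];
        int[] byOrderId = new int[numPartitions];

        for (int i = 0; i != 10_000; i++) {
            // 90% of orders share one country key: one hot partition
            String country = (i % 10 == 0) ? "CA" : "US";
            byCountry[partitionFor(country, numPartitions)]++;
            // order IDs are unique: load spreads across partitions
            byOrderId[partitionFor("order-" + i, numPartitions)]++;
        }

        System.out.println("by country:  " + java.util.Arrays.toString(byCountry));
        System.out.println("by order id: " + java.util.Arrays.toString(byOrderId));
    }
}
```

&lt;p&gt;The country-keyed histogram piles 9,000 of the 10,000 records onto a single partition; the order-ID-keyed histogram is roughly uniform.&lt;/p&gt;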

&lt;p&gt;The Poison Pill&lt;br&gt;
A single malformed message that causes your consumer to crash on every processing attempt. The consumer dies before the offset can be committed, so the offset never advances. Your consumer restarts, fetches the same message, crashes again. Infinite loop.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@KafkaListener(topics = "order-events")
public void handleOrderEvent(
        OrderEvent event,
        Acknowledgment acknowledgment) {
    try {
        orderService.processOrder(event);
        acknowledgment.acknowledge();
    } catch (PoisonPillException e) {
        // Dead letter queue for messages that can't be processed
        deadLetterProducer.send("order-events-dlq", event);
        acknowledgment.acknowledge(); // Move past the poison pill
        log.error("Moved poison pill to DLQ: {}", event.getEventId());
    } catch (RetryableException e) {
        // Don't acknowledge — allow retry
        throw e;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Always implement a dead letter queue (DLQ) for messages that consistently fail processing. Without it, a single bad message can halt your entire consumer group.&lt;/p&gt;
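&lt;p&gt;The retry-then-park policy behind a DLQ can be captured in a few lines. This sketch uses hypothetical names (it is not a Spring Kafka API): each message gets a bounded number of attempts, and a message that keeps failing is parked so the offset can advance:&lt;/p&gt;

```java
import java.util.ArrayList;

// Sketch of a retry-then-dead-letter policy. Class and method names are
// hypothetical: a message gets a bounded number of processing attempts
// before being parked so the partition can keep moving.
public class DlqPolicy {
    static final int MAX_ATTEMPTS = 3;
    final ArrayList deadLetters = new ArrayList(); // parked messages

    interface Handler { void handle(String message) throws Exception; }

    // Returns true when the message was handled or parked,
    // i.e. when it is safe for the offset to advance.
    boolean process(String message, Handler handler) {
        for (int attempt = 1; attempt != MAX_ATTEMPTS + 1; attempt++) {
            try {
                handler.handle(message);
                return true; // success: safe to acknowledge
            } catch (Exception e) {
                // transient failure: retry until attempts are exhausted
            }
        }
        deadLetters.add(message); // poison pill: park it and move on
        return true;
    }
}
```

&lt;p&gt;A real implementation would publish the parked message to a DLQ topic instead of an in-memory list, so it can be inspected and replayed later.&lt;/p&gt;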

&lt;p&gt;The Three Questions You Must Answer Before Adding Kafka&lt;br&gt;
After our incident, we created a pre-Kafka checklist. Every team considering Kafka must answer these three questions before writing a single line of producer code.&lt;br&gt;
Question 1: Are Your Consumers Idempotent?&lt;br&gt;
Can the same event be processed twice without corrupting state?&lt;br&gt;
If you can’t answer yes with confidence — don’t add Kafka yet. Build idempotency first.&lt;br&gt;
Test this explicitly:&lt;/p&gt;

&lt;p&gt;&lt;a class="mentioned-user" href="https://dev.to/test"&gt;@test&lt;/a&gt;&lt;br&gt;
public void processingSameEventTwiceProducesSameResult() {&lt;br&gt;
    OrderEvent event = createTestOrderEvent();&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;orderConsumer.handle(event);
orderConsumer.handle(event); // Process twice

// State should be identical to processing once
Order order = orderRepository.findById(event.getOrderId());
assertEquals(1, orderRepository.countByUserId(event.getUserId()));
assertEquals(OrderStatus.PLACED, order.getStatus());
// Payment should only be captured once
assertEquals(1, paymentRepository.countByOrderId(event.getOrderId()));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;}&lt;/p&gt;

&lt;p&gt;Question 2: Is Your Offset Commit Strategy Deliberate?&lt;br&gt;
Have you explicitly chosen between at-least-once, at-most-once, and exactly-once?&lt;br&gt;
Have you disabled auto-commit and implemented manual acknowledgment?&lt;br&gt;
Have you tested what happens when your consumer crashes mid-batch?&lt;br&gt;
Question 3: Do You Have a Replay Strategy That Accounts for Current DB State?&lt;br&gt;
If you need to replay 3 days of events tomorrow, can you do it safely?&lt;br&gt;
Do you have the tooling to:&lt;br&gt;
    ∙ Check state compatibility before replay?&lt;br&gt;
    ∙ Replay to a shadow environment for validation?&lt;br&gt;
    ∙ Roll back if replay causes inconsistency?&lt;br&gt;
If you can’t answer yes to all three — you have a time bomb, not a pipeline.&lt;/p&gt;

&lt;p&gt;When Kafka IS the Right Answer&lt;br&gt;
After all of this, I want to be clear: Kafka is extraordinary for the right problems.&lt;br&gt;
Use Kafka when:&lt;br&gt;
    ∙ You need event replay. Building a new analytics service that needs to process 6 months of historical events? Kafka’s retention makes this trivial. A traditional queue can’t do this.&lt;br&gt;
    ∙ Multiple consumers need the same events. Order placed → update inventory, send email, update recommendations, charge payment. Each consumer group processes independently at their own pace.&lt;br&gt;
    ∙ You need high throughput with durability. Kafka handles millions of messages per second with persistence guarantees. Traditional queues struggle here.&lt;br&gt;
    ∙ You’re building event sourcing. Kafka’s log is a natural fit for storing the complete history of state changes.&lt;br&gt;
    ∙ You need decoupling between services. Producers don’t know about consumers. Services can be added without modifying existing code.&lt;br&gt;
Don’t use Kafka when:&lt;br&gt;
    ∙ You just need a simple task queue. Redis queues, RabbitMQ, or AWS SQS are simpler and sufficient.&lt;br&gt;
    ∙ Your team doesn’t understand distributed systems fundamentals. Kafka amplifies your architecture’s weaknesses.&lt;br&gt;
    ∙ You need simple request-response patterns. Kafka’s async nature adds latency and complexity for synchronous workflows.&lt;br&gt;
    ∙ You’re a startup with 100 users. Your feed latency problem is probably a missing database index, not a missing message broker.&lt;/p&gt;

&lt;p&gt;The Checklist We Wish We Had&lt;/p&gt;

&lt;p&gt;Pre-Kafka Production Checklist:&lt;/p&gt;

&lt;p&gt;Consumer Design&lt;br&gt;
□ Idempotency implemented and tested&lt;br&gt;
□ Auto-commit disabled&lt;br&gt;
□ Manual offset commit with acknowledgment pattern&lt;br&gt;
□ Dead letter queue configured&lt;br&gt;
□ Poison pill handling implemented&lt;br&gt;
□ Consumer lag alerting configured&lt;/p&gt;

&lt;p&gt;Operations&lt;br&gt;
□ Replay strategy documented&lt;br&gt;
□ State compatibility validation tooling built&lt;br&gt;
□ Shadow replay environment available&lt;br&gt;
□ Kafka cluster monitoring configured&lt;br&gt;
  □ Consumer lag per group/topic&lt;br&gt;
  □ Broker disk usage&lt;br&gt;
  □ Under-replicated partitions&lt;br&gt;
  □ Controller election rate&lt;/p&gt;

&lt;p&gt;Architecture&lt;br&gt;
□ Partition count matches consumer scaling requirements&lt;br&gt;
□ Partitioning key has high cardinality&lt;br&gt;
□ Retention period matches replay requirements&lt;br&gt;
□ Log compaction behavior understood for topic&lt;br&gt;
□ Rebalance frequency monitored&lt;/p&gt;

&lt;p&gt;Testing&lt;br&gt;
□ Consumer crash mid-batch tested&lt;br&gt;
□ Double-processing tested (idempotency verification)&lt;br&gt;
□ Consumer lag recovery tested&lt;br&gt;
□ Replay tested against production-like data volume&lt;/p&gt;

&lt;p&gt;What Distributed Systems Actually Teach You&lt;br&gt;
The 3 AM incident taught us something that no architecture talk had ever communicated clearly:&lt;br&gt;
Distributed systems don’t punish bad architecture immediately.&lt;br&gt;
They let you deploy. They let you celebrate the latency improvements. They let you present the graphs to product. They let you write the blog post about how you scaled.&lt;br&gt;
Then, weeks or months later, under exactly the right (wrong) conditions — a slow consumer, an unexpected traffic spike, a network partition — they collect.&lt;br&gt;
The bill is always paid at 3 AM.&lt;br&gt;
The engineers who truly understand distributed systems aren’t the ones who avoided incidents. They’re the ones who’ve been humbled by them, understood exactly why they happened, and built systems that fail gracefully instead of catastrophically.&lt;br&gt;
Kafka isn’t a silver bullet. It’s a distributed consistency nightmare dressed in a hoodie that says “low latency.”&lt;br&gt;
Respect it, or it will humble you.&lt;/p&gt;

&lt;p&gt;What’s Next&lt;br&gt;
Next post: Why SQL quietly beats NoSQL for 90% of startups — despite everything the hype machine told you. We’ll look at the actual data, the benchmark lies, and the cases where PostgreSQL outperforms MongoDB at scale.&lt;br&gt;
This one is going to make people uncomfortable.&lt;br&gt;
Follow to stay updated. 🔔&lt;/p&gt;

&lt;p&gt;If this saved you from a 3 AM incident, share it with your team. The best time to learn this lesson is before it’s your production database.&lt;/p&gt;

&lt;p&gt;Tags: Kafka, Distributed Systems, Backend Engineering, System Design, Software Architecture, Java, Event-Driven Architecture, Microservices, Interview Prep&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>softwaredevelopment</category>
      <category>career</category>
      <category>systemdesign</category>
    </item>
  </channel>
</rss>
