DEV Community

Rachit Misra

Designing Instagram at Scale: A Complete System Design Deep Dive

From a ₹800/month server to 500M daily users — every component, every trade-off, every edge case.


Table of Contents

  1. Why Instagram is a perfect system design problem
  2. The numbers that define the problem
  3. The scaling journey — Stage by Stage
  4. Component Deep Dive: Feed Generation
  5. Component Deep Dive: Stories & Expiry
  6. Component Deep Dive: Media Upload & CDN
  7. Component Deep Dive: Notifications
  8. Component Deep Dive: Search & Discovery
  9. Component Deep Dive: Likes & Comments
  10. Database Design — Every Decision Justified
  11. API Design — Full Contracts
  12. Edge Cases Nobody Draws on Their Diagram
  13. Key Trade-offs Summary

1. Why Instagram is a Perfect System Design Problem {#why-instagram}

Instagram sits at the intersection of every hard distributed systems problem:

  • Read-heavy (people scroll more than they post)
  • Write-heavy at peaks (52,000 likes per second)
  • Media-intensive (photos, videos, reels, stories)
  • Real-time (stories expire, feeds update, notifications land)
  • Socially connected (the graph makes everything harder)
  • Globally distributed (500M users across every timezone)

It’s not one hard problem. It’s eight hard problems running simultaneously, sharing infrastructure, with users who notice every hiccup.

This is why it appears in almost every senior system design interview. And why most candidates fail it — not because they don’t know the components, but because they don’t know why each component exists and what breaks without it.

This article covers everything. By the end you’ll be able to design Instagram from first principles, justify every decision, and handle every curveball an interviewer can throw.


2. The Numbers That Define the Problem {#the-numbers}

Before writing a single box on your architecture diagram, establish the scale. This isn’t optional ceremony — it determines every design decision you make.

User scale:

  • 2B registered users
  • 500M Daily Active Users (DAU)
  • Peak concurrent users: ~50M

Content scale:

  • 100M photos/videos uploaded per day → ~1,150 uploads/second
  • 500M stories created per day
  • 4.5B likes per day → ~52,000 likes/second
  • 100M comments per day → ~1,150 comments/second

Read scale:

  • Each user opens the app ~7x/day
  • 3.5B feed loads/day → ~40,000 feed requests/second
  • Feed load is your most expensive operation

Storage scale:

  • Average photo: 3MB (after compression)
  • 100M photos/day × 3MB × 3 sizes = ~900TB new storage per day
  • Video and reels multiply this significantly

Derived constraints:

  • Read:Write ratio ≈ 80:20 (mostly read)
  • Feed generation is the critical path
  • Like storage needs write-optimised infrastructure
  • Media storage needs a CDN — serving from origin is impossible at this scale

Now you can design. Everything flows from these numbers.


3. The Scaling Journey — Stage by Stage {#scaling-journey}

The biggest mistake in system design interviews is jumping straight to the 500M DAU architecture. Real systems don’t start there. Understanding the journey is what separates a junior answer from a senior one.

Stage 1 — 1K DAU: Ship Fast

Infrastructure: Single server, single PostgreSQL instance, S3 for photos.

What works: Everything. At 1K users, you have no scaling problems. Your only job is shipping features.

What breaks first: PostgreSQL connection limits. The default is 100 max connections. At ~80 concurrent connections hitting the DB, you start seeing "too many connections" errors. Fix: PgBouncer for connection pooling. Trade-off: one more component to operate.
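As a rough sketch, a minimal PgBouncer config for this stage might look like the following; the database name, host, and pool sizes here are illustrative, not recommendations:

```ini
; Minimal PgBouncer sketch -- database name, host, and pool sizes are illustrative
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction      ; reuse server connections between transactions
max_client_conn = 2000       ; clients the pooler will accept
default_pool_size = 20       ; real Postgres connections per database/user pair
```

With pool_mode = transaction, thousands of client connections share a pool of ~20 real Postgres connections, which is why the 100-connection ceiling stops hurting.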

Architecture:

Client → Single Server (App + Postgres + PgBouncer) → S3

Stage 2 — 100K DAU: The First Real Pain

What breaks: Feed queries. SELECT * FROM posts WHERE user_id IN (list_of_500_followings) ORDER BY created_at DESC LIMIT 10 becomes a slow full table scan as posts accumulate.

Fixes:

  • Redis for pre-computed feeds (cache-aside pattern, TTL 10 min)
  • Read replicas so reads don’t compete with writes
  • CDN (CloudFront) in front of S3 — stop serving media from origin
  • Story expiry cron — a job every 15 minutes that marks expired stories as deleted

New problems introduced:

  • Cache invalidation: whose feed do you invalidate when someone posts?
  • Read replica lag: users might briefly see stale data (eventual consistency)
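The cache-aside read path above can be sketched in a few lines of Python; a plain dict stands in for Redis, and build_feed_from_db is a placeholder for the expensive feed query, not a real function in any library:

```python
import time

# In-memory stand-in for Redis; key shapes follow the article's feed:{user_id}.
CACHE: dict = {}            # key -> (expires_at, value)
TTL_SECONDS = 600           # the 10-minute TTL from the text

def build_feed_from_db(user_id):
    # Placeholder for the expensive SQL over followed users' posts.
    return [f"post:{user_id}:{i}" for i in range(3)]

def get_feed(user_id):
    """Cache-aside: try cache, fall back to DB, repopulate on miss."""
    key = f"feed:{user_id}"
    entry = CACHE.get(key)
    if entry and entry[0] > time.time():
        return entry[1]                      # cache hit
    feed = build_feed_from_db(user_id)       # cache miss -> expensive path
    CACHE[key] = (time.time() + TTL_SECONDS, feed)
    return feed

def invalidate_followers(follower_ids):
    """On new post: drop each follower's cached feed so it rebuilds on next read."""
    for fid in follower_ids:
        CACHE.pop(f"feed:{fid}", None)
```

The invalidation question from the list above is exactly what invalidate_followers answers: when someone posts, you delete their followers' cached feeds rather than trying to patch them in place.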

Architecture:

Client → Load Balancer → App Servers → {Postgres Primary, Redis, S3+CDN}
                                     ↓
                              Postgres Read Replicas

Stage 3 — 10M DAU: Real Distributed Systems

This is the interesting stage. Three things break simultaneously.

What breaks:

  1. Monolith deployment slows down feature development — team coordination hell
  2. Like/comment write throughput saturates PostgreSQL
  3. Text search with LIKE queries is unusably slow

Fixes:

  • Split into microservices (User, Post, Feed, Comment, Notification, Search)
  • Introduce Kafka as the event backbone — services stop calling each other synchronously
  • Cassandra for likes and comments (write-optimised, no transactions needed)
  • Elasticsearch for search, hashtags, and explore

Architecture:

Client → API Gateway → Microservices → Kafka → Consumers
                              ↓
              {Postgres, Redis, Cassandra, Elasticsearch, S3+CDN}

Stage 4 — 500M DAU: Planetary Scale

What changes:

  • Geo-distribution: data centres in US, EU, Asia-Pacific
  • ML-powered feed ranking replaces chronological ordering
  • Sharding Postgres by user_id across multiple instances
  • Cassandra runs as a multi-region cluster
  • Kafka handles millions of events per second with consumer groups

4. Component Deep Dive: Feed Generation {#feed-generation}

Feed generation is the hardest problem in the Instagram system design. Get this wrong and every other component is irrelevant.

The Core Question: Push vs Pull

Fan-out on Write (Push):
When a user posts, immediately write that post to every follower’s feed.

  • ✅ Feed reads are O(1) — just read the pre-computed list
  • ❌ Write amplification: 1 post × 10,000 followers = 10,000 writes
  • ❌ Catastrophic for celebrities (Ronaldo posting = 600M writes)

Fan-out on Read (Pull):
When a user opens their feed, fetch posts from everyone they follow in real-time.

  • ✅ No write amplification — posts are written once
  • ❌ Read is expensive: fetch from 500 followings, merge, sort, rank
  • ❌ Slow for power users with many followings

Instagram’s Solution: Hybrid Fan-out

  • Regular users (< 1M followers): push model — fan-out on write to their followers’ feeds
  • Celebrity users (> 1M followers): pull model — merge their latest posts at read time

On feed load:

  1. Read pre-computed feed from Redis ZSET (sorted by ML ranking score)
  2. For any celebrity accounts the user follows, fetch their latest posts
  3. Merge, re-rank, serve
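The three-step merge can be sketched as follows; the lists standing in for the Redis ZSET read and the celebrity pull, and the scores themselves, are illustrative:

```python
# Hybrid fan-out read path, sketched with plain lists standing in for
# a Redis ZSET and a Cassandra/primary read of celebrity posts.

def load_feed(precomputed, celebrity_posts, limit=10):
    """Merge the pushed (pre-computed) feed with pulled celebrity posts,
    re-rank by score descending, and serve the top page.

    Both inputs are lists of (post_id, ranking_score) tuples."""
    merged = {pid: score for pid, score in precomputed}
    for pid, score in celebrity_posts:
        # Keep the higher score if a post somehow appears in both sources.
        merged[pid] = max(score, merged.get(pid, float("-inf")))
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return [pid for pid, _ in ranked[:limit]]
```

The point of the sketch: the celebrity pull is just another scored candidate list, so the merge is an in-memory sort in the Feed Service, not a storage-layer operation.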

Feed Storage in Redis:

Key: feed:{user_id}
Type: ZSET (sorted set)
Score: ML ranking score (not timestamp)
Value: post_id
TTL: 10 minutes

On cache miss → fall back to Cassandra user_timeline table → re-rank → re-cache.

ML Ranking Signals:

  • Recency (newer posts scored higher)
  • Relationship strength (how often you interact with this account)
  • Post engagement velocity (likes/comments in first hour)
  • Content type preference (video vs photo history)
  • Session context (what you’ve engaged with this session)
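To make the signals concrete, here is a toy linear combination of them; the weights, decay constant, and normalisations are invented for illustration, and production ranking uses a learned model rather than hand-set weights:

```python
import math

# Invented weights for illustration only -- not Instagram's actual model.
WEIGHTS = {"recency": 0.3, "affinity": 0.3, "velocity": 0.25, "type_pref": 0.15}

def rank_score(post_age_hours, affinity, likes_first_hour, type_pref):
    """Toy ranking score: each signal normalised to [0, 1], then weighted.

    affinity: relationship strength, 0..1
    type_pref: how much this user engages with this content type, 0..1"""
    recency = math.exp(-post_age_hours / 24)          # decays over ~a day
    velocity = min(likes_first_hour / 1000, 1.0)      # cap the engagement signal
    return (WEIGHTS["recency"] * recency
            + WEIGHTS["affinity"] * affinity
            + WEIGHTS["velocity"] * velocity
            + WEIGHTS["type_pref"] * type_pref)
```

This is the score that would populate the ZSET below: higher score, earlier in the feed.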

Feed Edge Cases

Offline user returning after 2 weeks:
Don’t backfill 14 days of fan-out events. Their feed cache is cold and stale. Generate fresh on first open from Cassandra. Accept that the first load is slightly slower.

User unfollows someone mid-request:
Eventual consistency means you might briefly surface one post from an unfollowed account. Don’t try to prevent this at the storage layer — the complexity isn’t worth it. Filter at the display layer if it’s a concern.

Deleted post in cached feed:
Store is_deleted flag. Check at serve time. Never serve deleted content from cache regardless of what the feed list says.

New user with zero followings (cold start):
Show explore/trending content until they follow enough accounts for a meaningful feed.


5. Component Deep Dive: Stories & Expiry {#stories}

Stories feel deceptively simple — post a photo, it disappears after 24 hours. The distributed expiry pipeline behind this is non-trivial.

The Storage Architecture

Story metadata  → PostgreSQL (story_id, user_id, expires_at, is_deleted)
Story TTL       → Redis SET (key: story:{user_id}, TTL: 24h)
Story media     → S3 (deleted async after expiry)
Story views     → Redis SET (key: viewed:{viewer_id}, members: story_ids) + async counter

The Expiry Pipeline

  1. Story uploaded → expires_at = NOW() + 24h written to Postgres
  2. Redis key set with matching TTL
  3. On Redis TTL expiry → Kafka story.expired event published
  4. Kafka consumer: soft-delete in Postgres (is_deleted = true)
  5. Kafka consumer: issue S3 delete for the media file
  6. Kafka consumer: invalidate CDN cache for the media URL

The problem: What if the Kafka consumer is down when the TTL fires?

The fix: Reconciliation cron job running every 15 minutes:

SELECT story_id FROM stories
WHERE expires_at < NOW()
AND is_deleted = false;

Anything this finds is cleaned up. Eventual deletion — not real-time. Acceptable for stories.

Story Feed

When a user opens stories:

  1. Fetch user IDs they follow from Postgres (or Redis cache)
  2. For each, check if story:{user_id} key exists in Redis
  3. Return story IDs sorted by recency
  4. Mark viewed: SADD viewed:{viewer_id} {story_id} in Redis (idempotent)
  5. Increment view count async (avoid hot write on every view)

6. Component Deep Dive: Media Upload & CDN {#media-upload}

At 1,150 uploads per second, your servers cannot be in the media path. Every byte going through your application servers is wasted CPU and network.

Pre-Signed S3 Upload Flow

1. Client → POST /v1/posts  { caption, media_type }
2. Server → generates pre-signed S3 URL (valid 15 min) + post_id
3. Server → returns { post_id, upload_url } to client
4. Client → PUT directly to S3 using upload_url
5. S3 → fires s3:ObjectCreated event
6. Lambda/consumer → publishes media.uploaded to Kafka
7. Kafka consumer → generates thumbnails, updates post status, triggers feed fan-out

Your servers touch zero bytes of media. They handle only metadata.
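To show the mechanics of a signed, expiring upload URL, here is a deliberately simplified HMAC sketch. Real S3 pre-signed URLs use AWS Signature V4 via the SDK; the secret, hostname, and query format below are invented for illustration:

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"server-side-secret"   # illustrative; never hard-code real secrets

def make_upload_url(post_id, valid_seconds=900):
    """Sign (post_id, expiry) so the storage layer can verify the upload
    without a round-trip to the application server. 900s = the 15-minute
    validity window from the text."""
    expires = int(time.time()) + valid_seconds
    payload = f"{post_id}:{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    qs = urlencode({"expires": expires, "signature": sig})
    return f"https://ig-media.example.com/{post_id}?{qs}"

def verify_upload(post_id, expires, signature):
    """Storage layer checks expiry and signature before accepting the PUT."""
    if expires < time.time():
        return False
    payload = f"{post_id}:{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The key property is that verification is stateless: the storage layer trusts the signature, so your app servers only ever issue URLs, never proxy bytes.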

Media Sizes

Every photo is stored in three serving sizes, plus the original:

  • Thumbnail: 150×150px — profile grids, search results
  • Feed: 720px wide — home feed display
  • Full: 1080px wide — post detail view
  • Original: preserved for potential future use

Stored at: s3://ig-media-{region}/{user_id}/{post_id}/{size}.jpg

CDN Strategy

CloudFront in front of all S3 buckets.

  • Cache-Control headers: max-age=31536000 (1 year) for immutable media
  • Edge locations serve 95%+ of media requests — origin is rarely hit
  • On post delete: CDN invalidation API call (small window of stale serving — acceptable)

The edge case: Client successfully uploads to S3 but dies before confirming to your API.

The fix: S3 event notification independently triggers the Kafka event. Your Post Service confirms the upload without waiting for client confirmation. The client can poll GET /v1/posts/:post_id to check status.


7. Component Deep Dive: Notifications {#notifications}

Notifications are a fan-out problem dressed in a UX problem’s clothing.

Notification Types & Channels

Trigger               Channel                    Latency target
-------               -------                    --------------
Like on your post     Push (FCM/APNs) + In-app   < 5 seconds
Comment on your post  Push + In-app              < 5 seconds
New follower          Push + In-app              < 10 seconds
Story view            In-app only                < 30 seconds
Mention in caption    Push + In-app              < 5 seconds

The Pipeline

Action → Kafka event → Notification Service consumer
       → Enrich (fetch user prefs, device tokens, do-not-disturb)
       → Route (push? in-app? email? all?)
       → Send via FCM (Android) / APNs (iOS)
       → Store in notifications DB for in-app feed

The Hard Edge Cases

Notification storm — viral post:
A post gets 10M likes. Without batching, your Notification Service receives 10M like.created events and tries to push 10M individual notifications to the post author.

Fix: Debouncing in the Notification Service.

  • Window: 60 seconds
  • If like.created events for the same post_id + user_id (recipient) exceed threshold → batch into “X and 9,999 others liked your post”
  • Store the count in Redis, flush as single notification at window close
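A sketch of the debounce window, with a plain dict in place of the Redis counters; the 60-second window and the message wording follow the text, everything else is illustrative:

```python
import time

WINDOW_SECONDS = 60
# (recipient_id, post_id) -> [window_start, count, first_actor]
_windows = {}

def on_like_event(recipient_id, post_id, actor, now=None):
    """Accumulate likes inside the window; return a batched notification
    string when a new event arrives after the window has closed.
    (In production a timer flushes at window close; here the flush is
    lazy, triggered by the next event, to keep the sketch small.)"""
    now = time.time() if now is None else now
    key = (recipient_id, post_id)
    win = _windows.get(key)
    if win is None or now - win[0] >= WINDOW_SECONDS:
        flushed = None
        if win is not None:
            start, count, first = win
            flushed = (f"{first} liked your post" if count == 1
                       else f"{first} and {count - 1:,} others liked your post")
        _windows[key] = [now, 1, actor]      # start a new window
        return flushed
    win[1] += 1                              # same window: just count
    return None
```

One window, one push: 10M like events collapse into one "X and N others" notification per window per recipient.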

Dead device tokens:
User uninstalls app. FCM/APNs return NotRegistered or BadDeviceToken on delivery attempt.

Fix: Notification Service listens for delivery failure callbacks → marks device token as invalid in DB → stops sending to that token.

User preference: notifications off:
Check user notification preferences before publishing to Kafka. Don’t generate events for users who have disabled that notification type. Saves downstream processing entirely.

Do Not Disturb windows:
Store user timezone + DND preferences. Notification Service checks at delivery time — if in DND window, store notification, deliver at window end.


8. Component Deep Dive: Search & Discovery {#search}

Why Not Postgres?

SELECT * FROM posts WHERE caption LIKE '%golden gate%' is a sequential scan on a table with billions of rows. At any meaningful scale, this query will timeout before returning.

You need an inverted index. That’s Elasticsearch.

Elasticsearch Index Design

Posts Index:

{
  "post_id": "keyword",
  "caption": "text (analyzed, english stemming)",
  "hashtags": ["keyword"],
  "location": "geo_point",
  "created_at": "date",
  "like_count": "integer",
  "user_id": "keyword",
  "is_deleted": "boolean"
}

Users Index:

{
  "user_id": "keyword",
  "username": "keyword",
  "bio": "text",
  "follower_count": "integer",
  "is_private": "boolean",
  "is_verified": "boolean"
}

Keeping Elasticsearch in Sync

Elasticsearch is updated asynchronously from Postgres via Kafka:

Postgres write → Kafka (post.created / post.updated / post.deleted)
              → Elasticsearch consumer → index update
              → Lag: ~1-2 seconds

The dual-source pattern:

  • Post appears in owner’s profile immediately (read from Postgres — source of truth)
  • Post appears in search results after ~2 seconds (read from Elasticsearch)

Two read paths for two different use cases, with Postgres remaining the source of truth. This is intentional, not a bug.

Trending Hashtags

Trending is a sliding window count problem. Redis handles it elegantly:

On hashtag used: ZINCRBY trending:1h <tag> 1
                 ZINCRBY trending:24h <tag> 1
                 ZINCRBY trending:7d <tag> 1

Expire keys: trending:1h → TTL 1 hour (rolling via scheduled reset)
             trending:24h → TTL 24 hours
             trending:7d → TTL 7 days

Read trending: ZREVRANGE trending:1h 0 9 WITHSCORES

For true sliding windows (not fixed-window resets), use a sorted set with timestamps as members and prune periodically with ZREMRANGEBYSCORE.
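That true sliding window can be sketched with a sorted Python list standing in for the Redis sorted set, where the prune step plays the role of ZREMRANGEBYSCORE:

```python
import bisect

class SlidingWindowCounter:
    """Per-tag event timestamps kept sorted; counting prunes anything
    older than the window, giving a true rolling count rather than a
    fixed-window reset."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = {}   # tag -> sorted list of timestamps

    def record(self, tag, ts):
        bisect.insort(self.events.setdefault(tag, []), ts)

    def count(self, tag, now):
        stamps = self.events.get(tag, [])
        cutoff = bisect.bisect_left(stamps, now - self.window)
        del stamps[:cutoff]        # prune, as ZREMRANGEBYSCORE would
        return len(stamps)
```

In Redis the members would be (timestamp, event_id) pairs in one sorted set per tag; the dict here just keeps the sketch self-contained.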

Explore / Discover

Explore isn’t search — it’s recommendation. ML-powered, personalised, continuously reranked.

Pipeline:

  1. Candidate generation: posts with high engagement velocity in last 24h
  2. User interest modelling: what content types has this user engaged with?
  3. Collaborative filtering: what are similar users engaging with?
  4. Re-ranking: apply diversity, freshness, safety filters
  5. Serve top 50 candidates per request

Infrastructure: Apache Spark for batch feature computation, TensorFlow Serving for real-time scoring, Redis for caching ranked candidate lists per user.


9. Component Deep Dive: Likes & Comments {#likes-comments}

Why Postgres Can’t Handle Likes

52,000 likes per second. In Postgres, each like is:

  • An INSERT into the likes table
  • An UPDATE on the post’s like_count
  • Potentially a row lock while updating the count

At 52K/second, you’ll hit write contention, lock timeouts, and deadlocks. Postgres wasn’t built for this write pattern.

Cassandra for Likes

-- Cassandra table design
CREATE TABLE likes (
  post_id UUID,
  user_id UUID,
  reaction_type TEXT,
  created_at TIMEUUID,
  PRIMARY KEY (post_id, user_id)
);

Why this schema:

  • post_id as partition key → all likes for a post on one node
  • user_id as clustering key → O(1) check “has this user liked this post?”
  • TIMEUUID for ordering without separate timestamp column
  • INSERT is idempotent → same (post_id, user_id) twice = one like (handles retries)

Like count:
Don’t store count in Cassandra (COUNTER type has consistency quirks). Instead:

  • Atomic INCR in Redis: like_ct:{post_id}
  • Write-back to Postgres posts.like_count every 30 seconds async
  • Accept: count shown may be ~30s behind actual. Nobody notices.
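The INCR-plus-write-back pattern can be sketched like this; dicts stand in for Redis and the Postgres posts table, and flush_to_postgres is assumed to run from a ~30-second scheduler:

```python
# Dicts standing in for Redis (pending deltas) and posts.like_count.
redis_counts = {}      # like_ct:{post_id} -> delta not yet flushed
pg_like_count = {}     # durable count in Postgres

def on_like(post_id):
    key = f"like_ct:{post_id}"
    redis_counts[key] = redis_counts.get(key, 0) + 1   # atomic INCR in real Redis

def flush_to_postgres():
    """Runs every ~30s: fold accumulated deltas into the durable count,
    then reset each delta. The UI reads pg + pending, so it can lag ~30s."""
    for key, delta in list(redis_counts.items()):
        post_id = key.removeprefix("like_ct:")
        pg_like_count[post_id] = pg_like_count.get(post_id, 0) + delta
        redis_counts[key] = 0
```

52K likes/second become one UPDATE per post per flush interval, which is the whole point: contention moves from row locks to a cheap batch job.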

Comments in Cassandra

CREATE TABLE comments (
  post_id UUID,
  comment_id TIMEUUID,
  user_id UUID,
  text TEXT,
  like_count INT,
  PRIMARY KEY (post_id, comment_id)
) WITH CLUSTERING ORDER BY (comment_id ASC);

Why TIMEUUID as clustering key:
Ordering is built into the key — no ORDER BY at query time. Comments are naturally sorted chronologically. Pagination with WHERE comment_id > <last_seen> is efficient.

Query pattern:

SELECT * FROM comments
WHERE post_id = ?
AND comment_id > ? -- cursor
LIMIT 20;

Efficient. No full scans. Scales to millions of comments per post.
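The cursor pagination can be sketched with plain tuples standing in for Cassandra rows already sorted by clustering key; page_comments is an illustrative helper, not a driver API:

```python
def page_comments(comments, cursor, limit=20):
    """comments: list of (comment_id, text), sorted ascending by comment_id
    (the natural TIMEUUID clustering order). Returns (page, next_cursor)."""
    start = 0
    if cursor is not None:
        # Skip everything up to and including the cursor row,
        # as WHERE comment_id > ? would.
        start = next((i + 1 for i, (cid, _) in enumerate(comments)
                      if cid == cursor), len(comments))
    page = comments[start:start + limit]
    # A full page means there may be more; a short page ends pagination.
    next_cursor = page[-1][0] if len(page) == limit else None
    return page, next_cursor
```

Cassandra does this with an index seek on the clustering key rather than a scan, which is why the pattern stays cheap at millions of comments per post.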


10. Database Design — Every Decision Justified {#database-design}

PostgreSQL — The Relational Core

users table:

CREATE TABLE users (
  user_id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  username     VARCHAR(30) UNIQUE NOT NULL,
  email        VARCHAR(255) UNIQUE NOT NULL,
  password_hash TEXT NOT NULL,
  bio          TEXT,
  profile_pic_url TEXT,
  follower_count INT DEFAULT 0,
  following_count INT DEFAULT 0,
  is_private   BOOLEAN DEFAULT false,
  is_verified  BOOLEAN DEFAULT false,
  created_at   TIMESTAMPTZ DEFAULT NOW()
);

-- The UNIQUE constraints on username and email already create the
-- indexes needed for lookups; no separate CREATE INDEX is required.

posts table:

CREATE TABLE posts (
  post_id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id      UUID NOT NULL REFERENCES users(user_id),
  caption      TEXT,
  media_urls   TEXT[],
  media_type   VARCHAR(10) CHECK (media_type IN ('photo','video','reel')),
  location_lat DECIMAL(9,6),
  location_lng DECIMAL(9,6),
  like_count   BIGINT DEFAULT 0,
  comment_count INT DEFAULT 0,
  is_deleted   BOOLEAN DEFAULT false,
  created_at   TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_posts_user_created ON posts(user_id, created_at DESC);

follows table:

CREATE TABLE follows (
  follower_id  UUID NOT NULL REFERENCES users(user_id),
  following_id UUID NOT NULL REFERENCES users(user_id),
  status       VARCHAR(10) CHECK (status IN ('active','pending','blocked')),
  created_at   TIMESTAMPTZ DEFAULT NOW(),
  PRIMARY KEY (follower_id, following_id)
);

CREATE INDEX idx_follows_following ON follows(following_id);

Redis Key Design

Key pattern           Type     Purpose                    TTL
-----------           ----     -------                    ---
feed:{user_id}        ZSET     Pre-computed ranked feed   10 min
story:{user_id}       SET      Active story IDs           24h
session:{token}       STRING   Auth session → user_id     7 days
rate:{uid}:{action}   COUNTER  Rate-limit window          1 min
like_ct:{post_id}     STRING   Atomic like counter        No TTL
trending:{window}     ZSET     Hashtag trending scores    Window
viewed:{viewer_id}    SET      Story IDs viewed           24h

Storage Selection Rationale

Data                            Storage        Why
----                            -------        ---
Users, Posts, Follows, Stories  PostgreSQL     Relational, consistency required
Likes, Comments, Timelines      Cassandra      Write-heavy, no JOINs needed
Feeds, Sessions, Counters       Redis          Speed, TTL support, atomic ops
Search, Hashtags, Explore       Elasticsearch  Inverted index, full-text, geo
Photos, Videos, Stories         S3 + CDN       Cheap, durable, globally distributed

11. API Design — Full Contracts {#api-design}


All APIs are RESTful, JWT-authenticated, cursor-paginated, and rate-limited at the API Gateway via Redis.

Auth APIs

POST /v1/auth/register
Body: { username, email, password, full_name }
Returns: { access_token, refresh_token, user }

POST /v1/auth/login
Body: { email, password }
Returns: { access_token, refresh_token }

POST /v1/auth/refresh
Header: Authorization: Bearer <refresh_token>
Returns: { access_token }

POST /v1/auth/logout
Header: Authorization: Bearer <access_token>
Action: DEL session:{token} from Redis

User APIs

GET  /v1/users/:username
Returns: { user_id, username, bio, follower_count, following_count,
           posts_count, is_followed, is_private, is_verified }

PATCH /v1/users/me
Body: { bio?, profile_pic_url?, is_private? }

GET  /v1/users/:id/followers?cursor=<cursor>&limit=20
Returns: { users[], next_cursor }

POST   /v1/users/:id/follow     → idempotent, Kafka: follow.created
DELETE /v1/users/:id/follow     → Kafka: follow.removed

Post APIs

POST /v1/posts
Body: { caption, media_type, location? }
Returns: { post_id, upload_url }   ← pre-signed S3 URL
Action: client uploads directly to S3, S3 event triggers Kafka

GET  /v1/posts/:post_id
Returns: { post, author, like_count, comment_count, is_liked, is_saved }

DELETE /v1/posts/:post_id
Action: is_deleted=true, Kafka: post.deleted → CDN purge

POST   /v1/posts/:post_id/like
Header: Idempotency-Key: <uuid>
Action: Redis INCR + Cassandra write + Kafka: like.created

DELETE /v1/posts/:post_id/like
Action: Redis DECR + Cassandra delete + Kafka: like.removed

GET  /v1/posts/:post_id/comments?cursor=<cursor>&limit=20
Returns: { comments[], next_cursor }   ← from Cassandra

POST /v1/posts/:post_id/comments
Body: { text }
Action: Cassandra write + Kafka: comment.created → notification

Feed & Stories APIs

GET /v1/feed?cursor=<cursor>&limit=10
Action: Redis ZSET read → cache miss → Cassandra rebuild → re-cache
Returns: { posts[], next_cursor }

GET /v1/explore?page=1&limit=20
Action: Elasticsearch + ML ranking
Returns: { posts[], next_page }

POST /v1/stories
Body: { media_type }
Returns: { story_id, upload_url }
Action: Redis TTL set + Kafka: story.created

GET /v1/stories/feed
Returns: { stories[] }   ← unviewed, sorted by recency

POST /v1/stories/:id/view
Action: Redis SADD viewed:{uid} {story_id} + async view_count INCR

DELETE /v1/stories/:id
Action: is_deleted=true + Kafka: story.deleted → S3 purge

Search APIs

GET /v1/search/users?q=rachit&limit=10
Action: Elasticsearch users_index, fuzzy match, boost by follower_count

GET /v1/search/posts?q=sunset&hashtag=travel&lat=28.6&lng=77.2&radius=10km
Action: Elasticsearch posts_index, geo-filter + text match

GET /v1/search/trending?window=1h
Action: ZREVRANGE trending:1h 0 9 WITHSCORES
Returns: { tags: [{ tag, post_count, delta }] }

Rate Limits

Endpoint          Limit
--------          -----
POST /posts       10/min
POST /like        60/min
GET /feed         30/min
POST /comments    20/min
GET /search       20/min
POST /stories     5/min
POST /follow      30/min

12. Edge Cases Nobody Draws on Their Diagram {#edge-cases}

This section is what turns a good system design into a great one.

Celebrity Fan-out Storm

Cristiano Ronaldo posts. 600M followers. Fan-out on write to all of them simultaneously would generate 600M Cassandra writes in seconds — your cluster dies.

Fix: Celebrity detection at post time (follower_count > 1M). Skip fan-out. At feed read time, fetch celebrity’s latest posts separately and merge. The merge happens in the Feed Service, in memory, before Redis caching.

The Disappearing Story

Redis TTL fires (story expires). Kafka consumer is restarting at that exact moment. The story.expired event is consumed, but the consumer crashes before committing the offset. The event replays. The delete runs twice on S3.

Fix: S3 delete is idempotent (deleting a non-existent object returns 204). The Cassandra write is idempotent (same story_id soft-delete runs twice = same result). Design all consumers to handle duplicate events safely.

The Double Like

Network is flaky. User taps like. Request times out client-side. Client retries. Server receives two POST /like requests.

Fix: Idempotency-Key: <uuid> header on every like request. Server checks SETNX idempotency:{key} 1 EX 86400 in Redis before processing. If key exists, return cached response. If not, process and cache. Same key = same result, always.
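A sketch of that SETNX-style check, with an expiring dict standing in for Redis; handle_like and the response caching are illustrative (in real Redis you would also store the response, not just the flag, so replays can return it):

```python
import time

_seen = {}           # idempotency key -> (expires_at, cached response)
TTL = 86400          # the 24h window from the text

def handle_like(idempotency_key, do_like):
    """Process a like at most once per key; a retried request with the
    same key replays the cached response instead of re-processing."""
    entry = _seen.get(idempotency_key)
    if entry and entry[0] > time.time():
        return entry[1]                      # duplicate: replay prior response
    response = do_like()                     # first time: actually process
    _seen[idempotency_key] = (time.time() + TTL, response)
    return response
```

Note the check-then-set here is not atomic; in Redis, SETNX (or SET ... NX) gives you that atomicity for free, which is why the text reaches for it.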

Comment on Deleted Post

Post is soft-deleted. User (who has the post open on their screen) tries to comment. Request hits Comment Service before the deletion propagates.

Fix: Comment Service calls Post Service to validate is_deleted before writing. Or: API Gateway checks post status. Or: accept the race condition and clean up orphaned comments in a background job. Third option is usually right — the complexity of synchronous cross-service validation isn’t worth the edge case frequency.

Notification Flood

10M likes in 10 minutes on a viral reel. Without batching, the post author gets 10M push notifications.

Fix: Debounce in Notification Service. Redis counter per (recipient_id, post_id, notification_type) with 60-second window. At window close, fire one notification: “Priya and 9,999 others liked your reel.” Reset counter.

Cold Start Feed

New user. Zero followings. Feed is empty.

Fix: Onboarding flow → interest selection → seed feed with high-engagement posts matching selected interests from Elasticsearch. After 5+ follows, switch to normal feed generation.

Geo-Replication Lag

User in Mumbai follows someone in New York. The follow write goes to primary (US). Mumbai’s read replica is 800ms behind. User immediately views the newly-followed account’s profile — replica says “not following.”

Fix: For follow-status checks that are user-initiated immediately after a follow action, route the read to primary (or use a read-your-own-writes cache in Redis). This is the one case where eventual consistency is genuinely confusing to users.


13. Key Trade-offs Summary {#trade-offs}

Decision                          Trade-off
--------                          ---------
Cassandra for likes               Write speed vs. no ACID, no JOINs
Push feed fan-out                 Fast reads vs. write amplification for popular accounts
Async Elasticsearch sync          Search features vs. 1-2 second indexing lag
Redis like counters               Speed vs. 30-second write-back delay
Eventual consistency on replicas  Read scale vs. briefly stale data
Soft deletes everywhere           Safety/auditability vs. storage overhead
Pre-signed S3 uploads             Scalable media ingestion vs. more complex client logic
Hybrid fan-out                    Balanced throughput vs. more complex feed assembly

Final Thoughts

Instagram at 500M DAU isn’t one system. It’s eight systems — feed, stories, media, notifications, search, likes, comments, and the graph — running in parallel, sharing Kafka as the connective tissue, each independently scalable.

The principles that hold across all of them:

  1. Design for the read path first — reads outnumber writes 80:20
  2. Async everything that doesn’t need to be sync — Kafka is your friend
  3. Name your trade-offs explicitly — “we accept 2-second search lag for write simplicity”
  4. Design for idempotency everywhere — networks fail, retries happen, duplicates arrive
  5. The cache is not the source of truth — always have a fallback to the DB

That’s what Instagram-scale system design looks like.


Next in this series: Why SQL beats NoSQL for 90% of startups — the data, the nuance, and why the benchmarks lie.


About the author: Rachit writes about system design, backend engineering, and the real trade-offs nobody talks about. Follow for weekly deep dives.

Tags: system-design backend distributed-systems instagram software-architecture database kafka redis elasticsearch cassandra
