DEV Community

Rachit Misra

Designing Instagram at Scale: A Complete System Design Deep Dive

From a ₹800/month server to 500M daily users — every component, every trade-off, every edge case.


Table of Contents

  1. Why Instagram is a perfect system design problem
  2. The numbers that define the problem
  3. The scaling journey — Stage by Stage
  4. Component Deep Dive: Feed Generation
  5. Component Deep Dive: Stories & Expiry
  6. Component Deep Dive: Media Upload & CDN
  7. Component Deep Dive: Notifications
  8. Component Deep Dive: Search & Discovery
  9. Component Deep Dive: Likes & Comments
  10. Database Design — Every Decision Justified
  11. API Design — Full Contracts
  12. Edge Cases Nobody Draws on Their Diagram
  13. Key Trade-offs Summary

1. Why Instagram is a Perfect System Design Problem {#why-instagram}

Instagram sits at the intersection of every hard distributed systems problem:

  • Read-heavy (people scroll more than they post)
  • Write-heavy at peaks (52,000 likes per second)
  • Media-intensive (photos, videos, reels, stories)
  • Real-time (stories expire, feeds update, notifications land)
  • Socially connected (the graph makes everything harder)
  • Globally distributed (500M users across every timezone)

It’s not one hard problem. It’s eight hard problems running simultaneously, sharing infrastructure, with users who notice every hiccup.

This is why it appears in almost every senior system design interview. And why most candidates fail it — not because they don’t know the components, but because they don’t know why each component exists and what breaks without it.

This article covers everything. By the end you’ll be able to design Instagram from first principles, justify every decision, and handle every curveball an interviewer can throw.


2. The Numbers That Define the Problem {#the-numbers}

Before writing a single box on your architecture diagram, establish the scale. This isn’t optional ceremony — it determines every design decision you make.

User scale:

  • 2B registered users
  • 500M Daily Active Users (DAU)
  • Peak concurrent users: ~50M

Content scale:

  • 100M photos/videos uploaded per day → ~1,150 uploads/second
  • 500M stories created per day
  • 4.5B likes per day → ~52,000 likes/second
  • 100M comments per day → ~1,150 comments/second

Read scale:

  • Each user opens the app ~7x/day
  • 3.5B feed loads/day → ~40,000 feed requests/second
  • Feed load is your most expensive operation

Storage scale:

  • Average photo: 3MB (after compression)
  • 100M photos/day × 3MB × 3 sizes = ~900TB new storage per day
  • Video and reels multiply this significantly

Derived constraints:

  • Read:Write ratio ≈ 80:20 (mostly read)
  • Feed generation is the critical path
  • Like storage needs write-optimised infrastructure
  • Media storage needs a CDN — serving from origin is impossible at this scale

Now you can design. Everything flows from these numbers.


3. The Scaling Journey — Stage by Stage {#scaling-journey}

The biggest mistake in system design interviews is jumping straight to the 500M DAU architecture. Real systems don’t start there. Understanding the journey is what separates a junior answer from a senior one.

Stage 1 — 1K DAU: Ship Fast

Infrastructure: Single server, single PostgreSQL instance, S3 for photos.

What works: Everything. At 1K users, you have no scaling problems. Your only job is shipping features.

What breaks first: PostgreSQL connection limits. The default is 100 max connections. At ~80 concurrent connections hitting the DB, you start seeing "too many connections" errors. Fix: PgBouncer for connection pooling. Trade-off: one more component to operate.
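As a rough sketch, a minimal PgBouncer config for this stage might look like the following; the database name, host, and pool sizes here are illustrative, not recommendations:

```ini
; Minimal PgBouncer sketch -- database name, host, and pool sizes are illustrative
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction      ; reuse server connections between transactions
max_client_conn = 2000       ; clients the pooler will accept
default_pool_size = 20       ; real Postgres connections per database/user pair
```

With pool_mode = transaction, thousands of client connections share a pool of ~20 real Postgres connections, which is why the 100-connection ceiling stops hurting.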

Architecture:

Client → Single Server (App + Postgres + PgBouncer) → S3

Stage 2 — 100K DAU: The First Real Pain

What breaks: Feed queries. SELECT * FROM posts WHERE user_id IN (list_of_500_followings) ORDER BY created_at DESC LIMIT 10 becomes a slow full table scan as posts accumulate.

Fixes:

  • Redis for pre-computed feeds (cache-aside pattern, TTL 10 min)
  • Read replicas so reads don’t compete with writes
  • CDN (CloudFront) in front of S3 — stop serving media from origin
  • Story expiry cron — a job every 15 minutes that marks expired stories as deleted

New problems introduced:

  • Cache invalidation: whose feed do you invalidate when someone posts?
  • Read replica lag: users might briefly see stale data (eventual consistency)
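The cache-aside read path above can be sketched in a few lines of Python; a plain dict stands in for Redis, and build_feed_from_db is a placeholder for the expensive feed query, not a real function in any library:

```python
import time

# In-memory stand-in for Redis; key shapes follow the article's feed:{user_id}.
CACHE: dict = {}            # key -> (expires_at, value)
TTL_SECONDS = 600           # the 10-minute TTL from the text

def build_feed_from_db(user_id):
    # Placeholder for the expensive SQL over followed users' posts.
    return [f"post:{user_id}:{i}" for i in range(3)]

def get_feed(user_id):
    """Cache-aside: try cache, fall back to DB, repopulate on miss."""
    key = f"feed:{user_id}"
    entry = CACHE.get(key)
    if entry and entry[0] > time.time():
        return entry[1]                      # cache hit
    feed = build_feed_from_db(user_id)       # cache miss -> expensive path
    CACHE[key] = (time.time() + TTL_SECONDS, feed)
    return feed

def invalidate_followers(follower_ids):
    """On new post: drop each follower's cached feed so it rebuilds on next read."""
    for fid in follower_ids:
        CACHE.pop(f"feed:{fid}", None)
```

The invalidation question from the list above is exactly what invalidate_followers answers: when someone posts, you delete their followers' cached feeds rather than trying to patch them in place.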

Architecture:

Client → Load Balancer → App Servers → {Postgres Primary, Redis, S3+CDN}
                                     ↓
                              Postgres Read Replicas

Stage 3 — 10M DAU: Real Distributed Systems

This is the interesting stage. Three things break simultaneously.

What breaks:

  1. Monolith deployment slows down feature development — team coordination hell
  2. Like/comment write throughput saturates PostgreSQL
  3. Text search with LIKE queries is unusably slow

Fixes:

  • Split into microservices (User, Post, Feed, Comment, Notification, Search)
  • Introduce Kafka as the event backbone — services stop calling each other synchronously
  • Cassandra for likes and comments (write-optimised, no transactions needed)
  • Elasticsearch for search, hashtags, and explore

Architecture:

Client → API Gateway → Microservices → Kafka → Consumers
                              ↓
              {Postgres, Redis, Cassandra, Elasticsearch, S3+CDN}

Stage 4 — 500M DAU: Planetary Scale

What changes:

  • Geo-distribution: data centres in US, EU, Asia-Pacific
  • ML-powered feed ranking replaces chronological ordering
  • Sharding Postgres by user_id across multiple instances
  • Cassandra runs as a multi-region cluster
  • Kafka handles millions of events per second with consumer groups

4. Component Deep Dive: Feed Generation {#feed-generation}

Feed generation is the hardest problem in the Instagram system design. Get this wrong and every other component is irrelevant.

The Core Question: Push vs Pull

Fan-out on Write (Push):
When a user posts, immediately write that post to every follower’s feed.

  • ✅ Feed reads are O(1) — just read the pre-computed list
  • ❌ Write amplification: 1 post × 10,000 followers = 10,000 writes
  • ❌ Catastrophic for celebrities (Ronaldo posting = 600M writes)

Fan-out on Read (Pull):
When a user opens their feed, fetch posts from everyone they follow in real-time.

  • ✅ No write amplification — posts are written once
  • ❌ Read is expensive: fetch from 500 followings, merge, sort, rank
  • ❌ Slow for power users with many followings

Instagram’s Solution: Hybrid Fan-out

  • Regular users (< 1M followers): push model — fan-out on write to their followers’ feeds
  • Celebrity users (> 1M followers): pull model — merge their latest posts at read time

On feed load:

  1. Read pre-computed feed from Redis ZSET (sorted by ML ranking score)
  2. For any celebrity accounts the user follows, fetch their latest posts
  3. Merge, re-rank, serve
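The three-step merge can be sketched as follows; the lists standing in for the Redis ZSET read and the celebrity pull, and the scores themselves, are illustrative:

```python
# Hybrid fan-out read path, sketched with plain lists standing in for
# a Redis ZSET and a Cassandra/primary read of celebrity posts.

def load_feed(precomputed, celebrity_posts, limit=10):
    """Merge the pushed (pre-computed) feed with pulled celebrity posts,
    re-rank by score descending, and serve the top page.

    Both inputs are lists of (post_id, ranking_score) tuples."""
    merged = {pid: score for pid, score in precomputed}
    for pid, score in celebrity_posts:
        # Keep the higher score if a post somehow appears in both sources.
        merged[pid] = max(score, merged.get(pid, float("-inf")))
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return [pid for pid, _ in ranked[:limit]]
```

The point of the sketch: the celebrity pull is just another scored candidate list, so the merge is an in-memory sort in the Feed Service, not a storage-layer operation.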

Feed Storage in Redis:

Key: feed:{user_id}
Type: ZSET (sorted set)
Score: ML ranking score (not timestamp)
Value: post_id
TTL: 10 minutes

On cache miss → fall back to Cassandra user_timeline table → re-rank → re-cache.

ML Ranking Signals:

  • Recency (newer posts scored higher)
  • Relationship strength (how often you interact with this account)
  • Post engagement velocity (likes/comments in first hour)
  • Content type preference (video vs photo history)
  • Session context (what you’ve engaged with this session)
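To make the signals concrete, here is a toy linear combination of them; the weights, decay constant, and normalisations are invented for illustration, and production ranking uses a learned model rather than hand-set weights:

```python
import math

# Invented weights for illustration only -- not Instagram's actual model.
WEIGHTS = {"recency": 0.3, "affinity": 0.3, "velocity": 0.25, "type_pref": 0.15}

def rank_score(post_age_hours, affinity, likes_first_hour, type_pref):
    """Toy ranking score: each signal normalised to [0, 1], then weighted.

    affinity: relationship strength, 0..1
    type_pref: how much this user engages with this content type, 0..1"""
    recency = math.exp(-post_age_hours / 24)          # decays over ~a day
    velocity = min(likes_first_hour / 1000, 1.0)      # cap the engagement signal
    return (WEIGHTS["recency"] * recency
            + WEIGHTS["affinity"] * affinity
            + WEIGHTS["velocity"] * velocity
            + WEIGHTS["type_pref"] * type_pref)
```

This is the score that would populate the ZSET below: higher score, earlier in the feed.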

Feed Edge Cases

Offline user returning after 2 weeks:
Don’t backfill 14 days of fan-out events. Their feed cache is cold and stale. Generate fresh on first open from Cassandra. Accept that the first load is slightly slower.

User unfollows someone mid-request:
Eventual consistency means you might briefly surface one post from an unfollowed account. Don’t try to prevent this at the storage layer — the complexity isn’t worth it. Filter at the display layer if it’s a concern.

Deleted post in cached feed:
Store is_deleted flag. Check at serve time. Never serve deleted content from cache regardless of what the feed list says.

New user with zero followings (cold start):
Show explore/trending content until they follow enough accounts for a meaningful feed.


5. Component Deep Dive: Stories & Expiry {#stories}

Stories feel deceptively simple — post a photo, it disappears after 24 hours. The distributed expiry pipeline behind this is non-trivial.

The Storage Architecture

Story metadata  → PostgreSQL (story_id, user_id, expires_at, is_deleted)
Story TTL       → Redis SET (key: story:{user_id}, TTL: 24h)
Story media     → S3 (deleted async after expiry)
Story views     → Redis SET (key: viewed:{viewer_id}, members: story_ids) + async counter

The Expiry Pipeline

  1. Story uploaded → expires_at = NOW() + 24h written to Postgres
  2. Redis key set with matching TTL
  3. On Redis TTL expiry → Kafka story.expired event published
  4. Kafka consumer: soft-delete in Postgres (is_deleted = true)
  5. Kafka consumer: issue S3 delete for the media file
  6. Kafka consumer: invalidate CDN cache for the media URL

The problem: What if the Kafka consumer is down when the TTL fires?

The fix: Reconciliation cron job running every 15 minutes:

SELECT story_id FROM stories
WHERE expires_at < NOW()
AND is_deleted = false;

Anything this finds is cleaned up. Eventual deletion — not real-time. Acceptable for stories.

Story Feed

When a user opens stories:

  1. Fetch user IDs they follow from Postgres (or Redis cache)
  2. For each, check if story:{user_id} key exists in Redis
  3. Return story IDs sorted by recency
  4. Mark viewed: SADD viewed:{viewer_id} {story_id} in Redis (idempotent)
  5. Increment view count async (avoid hot write on every view)

6. Component Deep Dive: Media Upload & CDN {#media-upload}

At 1,150 uploads per second, your servers cannot be in the media path. Every byte going through your application servers is wasted CPU and network.

Pre-Signed S3 Upload Flow

1. Client → POST /v1/posts  { caption, media_type }
2. Server → generates pre-signed S3 URL (valid 15 min) + post_id
3. Server → returns { post_id, upload_url } to client
4. Client → PUT directly to S3 using upload_url
5. S3 → fires s3:ObjectCreated event
6. Lambda/consumer → publishes media.uploaded to Kafka
7. Kafka consumer → generates thumbnails, updates post status, triggers feed fan-out

Your servers touch zero bytes of media. They handle only metadata.
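To show the mechanics of a signed, expiring upload URL, here is a deliberately simplified HMAC sketch. Real S3 pre-signed URLs use AWS Signature V4 via the SDK; the secret, hostname, and query format below are invented for illustration:

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"server-side-secret"   # illustrative; never hard-code real secrets

def make_upload_url(post_id, valid_seconds=900):
    """Sign (post_id, expiry) so the storage layer can verify the upload
    without a round-trip to the application server. 900s = the 15-minute
    validity window from the text."""
    expires = int(time.time()) + valid_seconds
    payload = f"{post_id}:{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    qs = urlencode({"expires": expires, "signature": sig})
    return f"https://ig-media.example.com/{post_id}?{qs}"

def verify_upload(post_id, expires, signature):
    """Storage layer checks expiry and signature before accepting the PUT."""
    if expires < time.time():
        return False
    payload = f"{post_id}:{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The key property is that verification is stateless: the storage layer trusts the signature, so your app servers only ever issue URLs, never proxy bytes.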

Media Sizes

Every photo is stored in three serving sizes, plus the original:

  • Thumbnail: 150×150px — profile grids, search results
  • Feed: 720px wide — home feed display
  • Full: 1080px wide — post detail view
  • Original: preserved for potential future use

Stored at: s3://ig-media-{region}/{user_id}/{post_id}/{size}.jpg

CDN Strategy

CloudFront in front of all S3 buckets.

  • Cache-Control headers: max-age=31536000 (1 year) for immutable media
  • Edge locations serve 95%+ of media requests — origin is rarely hit
  • On post delete: CDN invalidation API call (small window of stale serving — acceptable)

The edge case: Client successfully uploads to S3 but dies before confirming to your API.

The fix: S3 event notification independently triggers the Kafka event. Your Post Service confirms the upload without waiting for client confirmation. The client can poll GET /v1/posts/:post_id to check status.


7. Component Deep Dive: Notifications {#notifications}

Notifications are a fan-out problem dressed in a UX problem’s clothing.

Notification Types & Channels

Trigger               Channel                    Latency target
-------               -------                    --------------
Like on your post     Push (FCM/APNs) + In-app   < 5 seconds
Comment on your post  Push + In-app              < 5 seconds
New follower          Push + In-app              < 10 seconds
Story view            In-app only                < 30 seconds
Mention in caption    Push + In-app              < 5 seconds

The Pipeline

Action → Kafka event → Notification Service consumer
       → Enrich (fetch user prefs, device tokens, do-not-disturb)
       → Route (push? in-app? email? all?)
       → Send via FCM (Android) / APNs (iOS)
       → Store in notifications DB for in-app feed

The Hard Edge Cases

Notification storm — viral post:
A post gets 10M likes. Without batching, your Notification Service receives 10M like.created events and tries to push 10M individual notifications to the post author.

Fix: Debouncing in the Notification Service.

  • Window: 60 seconds
  • If like.created events for the same post_id + user_id (recipient) exceed threshold → batch into “X and 9,999 others liked your post”
  • Store the count in Redis, flush as single notification at window close
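A sketch of the debounce window, with a plain dict in place of the Redis counters; the 60-second window and the message wording follow the text, everything else is illustrative:

```python
import time

WINDOW_SECONDS = 60
# (recipient_id, post_id) -> [window_start, count, first_actor]
_windows = {}

def on_like_event(recipient_id, post_id, actor, now=None):
    """Accumulate likes inside the window; return a batched notification
    string when a new event arrives after the window has closed.
    (In production a timer flushes at window close; here the flush is
    lazy, triggered by the next event, to keep the sketch small.)"""
    now = time.time() if now is None else now
    key = (recipient_id, post_id)
    win = _windows.get(key)
    if win is None or now - win[0] >= WINDOW_SECONDS:
        flushed = None
        if win is not None:
            start, count, first = win
            flushed = (f"{first} liked your post" if count == 1
                       else f"{first} and {count - 1:,} others liked your post")
        _windows[key] = [now, 1, actor]      # start a new window
        return flushed
    win[1] += 1                              # same window: just count
    return None
```

One window, one push: 10M like events collapse into one "X and N others" notification per window per recipient.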

Dead device tokens:
User uninstalls app. FCM/APNs return NotRegistered or BadDeviceToken on delivery attempt.

Fix: Notification Service listens for delivery failure callbacks → marks device token as invalid in DB → stops sending to that token.

User preference: notifications off:
Check user notification preferences before publishing to Kafka. Don’t generate events for users who have disabled that notification type. Saves downstream processing entirely.

Do Not Disturb windows:
Store user timezone + DND preferences. Notification Service checks at delivery time — if in DND window, store notification, deliver at window end.


8. Component Deep Dive: Search & Discovery {#search}

Why Not Postgres?

SELECT * FROM posts WHERE caption LIKE '%golden gate%' is a sequential scan on a table with billions of rows. At any meaningful scale, this query will timeout before returning.

You need an inverted index. That’s Elasticsearch.

Elasticsearch Index Design

Posts Index:

{
  "post_id": "keyword",
  "caption": "text (analyzed, english stemming)",
  "hashtags": ["keyword"],
  "location": "geo_point",
  "created_at": "date",
  "like_count": "integer",
  "user_id": "keyword",
  "is_deleted": "boolean"
}

Users Index:

{
  "user_id": "keyword",
  "username": "keyword",
  "bio": "text",
  "follower_count": "integer",
  "is_private": "boolean",
  "is_verified": "boolean"
}

Keeping Elasticsearch in Sync

Elasticsearch is updated asynchronously from Postgres via Kafka:

Postgres write → Kafka (post.created / post.updated / post.deleted)
              → Elasticsearch consumer → index update
              → Lag: ~1-2 seconds

The dual-source pattern:

  • Post appears in owner’s profile immediately (read from Postgres — source of truth)
  • Post appears in search results after ~2 seconds (read from Elasticsearch)

Two read paths for two different use cases, with Postgres remaining the source of truth. This is intentional, not a bug.

Trending Hashtags

Trending is a sliding window count problem. Redis handles it elegantly:

On hashtag used: ZINCRBY trending:1h <tag> 1
                 ZINCRBY trending:24h <tag> 1
                 ZINCRBY trending:7d <tag> 1

Expire keys: trending:1h → TTL 1 hour (rolling via scheduled reset)
             trending:24h → TTL 24 hours
             trending:7d → TTL 7 days

Read trending: ZREVRANGE trending:1h 0 9 WITHSCORES

For true sliding windows (not fixed-window resets), use a sorted set with timestamps as members and prune periodically with ZREMRANGEBYSCORE.
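That true sliding window can be sketched with a sorted Python list standing in for the Redis sorted set, where the prune step plays the role of ZREMRANGEBYSCORE:

```python
import bisect

class SlidingWindowCounter:
    """Per-tag event timestamps kept sorted; counting prunes anything
    older than the window, giving a true rolling count rather than a
    fixed-window reset."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = {}   # tag -> sorted list of timestamps

    def record(self, tag, ts):
        bisect.insort(self.events.setdefault(tag, []), ts)

    def count(self, tag, now):
        stamps = self.events.get(tag, [])
        cutoff = bisect.bisect_left(stamps, now - self.window)
        del stamps[:cutoff]        # prune, as ZREMRANGEBYSCORE would
        return len(stamps)
```

In Redis the members would be (timestamp, event_id) pairs in one sorted set per tag; the dict here just keeps the sketch self-contained.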

Explore / Discover

Explore isn’t search — it’s recommendation. ML-powered, personalised, continuously reranked.

Pipeline:

  1. Candidate generation: posts with high engagement velocity in last 24h
  2. User interest modelling: what content types has this user engaged with?
  3. Collaborative filtering: what are similar users engaging with?
  4. Re-ranking: apply diversity, freshness, safety filters
  5. Serve top 50 candidates per request

Infrastructure: Apache Spark for batch feature computation, TensorFlow Serving for real-time scoring, Redis for caching ranked candidate lists per user.


9. Component Deep Dive: Likes & Comments {#likes-comments}

Why Postgres Can’t Handle Likes

52,000 likes per second. In Postgres, each like is:

  • An INSERT into the likes table
  • An UPDATE on the post’s like_count
  • Potentially a row lock while updating the count

At 52K/second, you’ll hit write contention, lock timeouts, and deadlocks. Postgres wasn’t built for this write pattern.

Cassandra for Likes

-- Cassandra table design
CREATE TABLE likes (
  post_id UUID,
  user_id UUID,
  reaction_type TEXT,
  created_at TIMEUUID,
  PRIMARY KEY (post_id, user_id)
);

Why this schema:

  • post_id as partition key → all likes for a post on one node
  • user_id as clustering key → O(1) check “has this user liked this post?”
  • TIMEUUID for ordering without separate timestamp column
  • INSERT is idempotent → same (post_id, user_id) twice = one like (handles retries)

Like count:
Don’t store count in Cassandra (COUNTER type has consistency quirks). Instead:

  • Atomic INCR in Redis: like_ct:{post_id}
  • Write-back to Postgres posts.like_count every 30 seconds async
  • Accept: count shown may be ~30s behind actual. Nobody notices.
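The INCR-plus-write-back pattern can be sketched like this; dicts stand in for Redis and the Postgres posts table, and flush_to_postgres is assumed to run from a ~30-second scheduler:

```python
# Dicts standing in for Redis (pending deltas) and posts.like_count.
redis_counts = {}      # like_ct:{post_id} -> delta not yet flushed
pg_like_count = {}     # durable count in Postgres

def on_like(post_id):
    key = f"like_ct:{post_id}"
    redis_counts[key] = redis_counts.get(key, 0) + 1   # atomic INCR in real Redis

def flush_to_postgres():
    """Runs every ~30s: fold accumulated deltas into the durable count,
    then reset each delta. The UI reads pg + pending, so it can lag ~30s."""
    for key, delta in list(redis_counts.items()):
        post_id = key.removeprefix("like_ct:")
        pg_like_count[post_id] = pg_like_count.get(post_id, 0) + delta
        redis_counts[key] = 0
```

52K likes/second become one UPDATE per post per flush interval, which is the whole point: contention moves from row locks to a cheap batch job.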

Comments in Cassandra

CREATE TABLE comments (
  post_id UUID,
  comment_id TIMEUUID,
  user_id UUID,
  text TEXT,
  like_count INT,
  PRIMARY KEY (post_id, comment_id)
) WITH CLUSTERING ORDER BY (comment_id ASC);

Why TIMEUUID as clustering key:
Ordering is built into the key — no ORDER BY at query time. Comments are naturally sorted chronologically. Pagination with WHERE comment_id > <last_seen> is efficient.

Query pattern:

SELECT * FROM comments
WHERE post_id = ?
AND comment_id > ? -- cursor
LIMIT 20;

Efficient. No full scans. Scales to millions of comments per post.
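The cursor pagination can be sketched with plain tuples standing in for Cassandra rows already sorted by clustering key; page_comments is an illustrative helper, not a driver API:

```python
def page_comments(comments, cursor, limit=20):
    """comments: list of (comment_id, text), sorted ascending by comment_id
    (the natural TIMEUUID clustering order). Returns (page, next_cursor)."""
    start = 0
    if cursor is not None:
        # Skip everything up to and including the cursor row,
        # as WHERE comment_id > ? would.
        start = next((i + 1 for i, (cid, _) in enumerate(comments)
                      if cid == cursor), len(comments))
    page = comments[start:start + limit]
    # A full page means there may be more; a short page ends pagination.
    next_cursor = page[-1][0] if len(page) == limit else None
    return page, next_cursor
```

Cassandra does this with an index seek on the clustering key rather than a scan, which is why the pattern stays cheap at millions of comments per post.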


10. Database Design — Every Decision Justified {#database-design}

PostgreSQL — The Relational Core

users table:

CREATE TABLE users (
  user_id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  username     VARCHAR(30) UNIQUE NOT NULL,
  email        VARCHAR(255) UNIQUE NOT NULL,
  password_hash TEXT NOT NULL,
  bio          TEXT,
  profile_pic_url TEXT,
  follower_count INT DEFAULT 0,
  following_count INT DEFAULT 0,
  is_private   BOOLEAN DEFAULT false,
  is_verified  BOOLEAN DEFAULT false,
  created_at   TIMESTAMPTZ DEFAULT NOW()
);

-- The UNIQUE constraints on username and email already create the
-- indexes needed for lookups; no separate CREATE INDEX is required.

posts table:

CREATE TABLE posts (
  post_id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id      UUID NOT NULL REFERENCES users(user_id),
  caption      TEXT,
  media_urls   TEXT[],
  media_type   VARCHAR(10) CHECK (media_type IN ('photo','video','reel')),
  location_lat DECIMAL(9,6),
  location_lng DECIMAL(9,6),
  like_count   BIGINT DEFAULT 0,
  comment_count INT DEFAULT 0,
  is_deleted   BOOLEAN DEFAULT false,
  created_at   TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_posts_user_created ON posts(user_id, created_at DESC);

follows table:

CREATE TABLE follows (
  follower_id  UUID NOT NULL REFERENCES users(user_id),
  following_id UUID NOT NULL REFERENCES users(user_id),
  status       VARCHAR(10) CHECK (status IN ('active','pending','blocked')),
  created_at   TIMESTAMPTZ DEFAULT NOW(),
  PRIMARY KEY (follower_id, following_id)
);

CREATE INDEX idx_follows_following ON follows(following_id);

Redis Key Design

Key pattern           Type     Purpose                    TTL
-----------           ----     -------                    ---
feed:{user_id}        ZSET     Pre-computed ranked feed   10 min
story:{user_id}       SET      Active story IDs           24h
session:{token}       STRING   Auth session → user_id     7 days
rate:{uid}:{action}   COUNTER  Rate-limit window          1 min
like_ct:{post_id}     STRING   Atomic like counter        No TTL
trending:{window}     ZSET     Hashtag trending scores    Window
viewed:{viewer_id}    SET      Story IDs viewed           24h

Storage Selection Rationale

Data                            Storage        Why
----                            -------        ---
Users, Posts, Follows, Stories  PostgreSQL     Relational, consistency required
Likes, Comments, Timelines      Cassandra      Write-heavy, no JOINs needed
Feeds, Sessions, Counters       Redis          Speed, TTL support, atomic ops
Search, Hashtags, Explore       Elasticsearch  Inverted index, full-text, geo
Photos, Videos, Stories         S3 + CDN       Cheap, durable, globally distributed

11. API Design — Full Contracts {#api-design}


All APIs are RESTful, JWT-authenticated, cursor-paginated, and rate-limited at the API Gateway via Redis.

Auth APIs

POST /v1/auth/register
Body: { username, email, password, full_name }
Returns: { access_token, refresh_token, user }

POST /v1/auth/login
Body: { email, password }
Returns: { access_token, refresh_token }

POST /v1/auth/refresh
Header: Authorization: Bearer <refresh_token>
Returns: { access_token }

POST /v1/auth/logout
Header: Authorization: Bearer <access_token>
Action: DEL session:{token} from Redis

User APIs

GET  /v1/users/:username
Returns: { user_id, username, bio, follower_count, following_count,
           posts_count, is_followed, is_private, is_verified }

PATCH /v1/users/me
Body: { bio?, profile_pic_url?, is_private? }

GET  /v1/users/:id/followers?cursor=<cursor>&limit=20
Returns: { users[], next_cursor }

POST   /v1/users/:id/follow     → idempotent, Kafka: follow.created
DELETE /v1/users/:id/follow     → Kafka: follow.removed

Post APIs

POST /v1/posts
Body: { caption, media_type, location? }
Returns: { post_id, upload_url }   ← pre-signed S3 URL
Action: client uploads directly to S3, S3 event triggers Kafka

GET  /v1/posts/:post_id
Returns: { post, author, like_count, comment_count, is_liked, is_saved }

DELETE /v1/posts/:post_id
Action: is_deleted=true, Kafka: post.deleted → CDN purge

POST   /v1/posts/:post_id/like
Header: Idempotency-Key: <uuid>
Action: Redis INCR + Cassandra write + Kafka: like.created

DELETE /v1/posts/:post_id/like
Action: Redis DECR + Cassandra delete + Kafka: like.removed

GET  /v1/posts/:post_id/comments?cursor=<cursor>&limit=20
Returns: { comments[], next_cursor }   ← from Cassandra

POST /v1/posts/:post_id/comments
Body: { text }
Action: Cassandra write + Kafka: comment.created → notification

Feed & Stories APIs

GET /v1/feed?cursor=<cursor>&limit=10
Action: Redis ZSET read → cache miss → Cassandra rebuild → re-cache
Returns: { posts[], next_cursor }

GET /v1/explore?page=1&limit=20
Action: Elasticsearch + ML ranking
Returns: { posts[], next_page }

POST /v1/stories
Body: { media_type }
Returns: { story_id, upload_url }
Action: Redis TTL set + Kafka: story.created

GET /v1/stories/feed
Returns: { stories[] }   ← unviewed, sorted by recency

POST /v1/stories/:id/view
Action: Redis SADD viewed:{uid} {story_id} + async view_count INCR

DELETE /v1/stories/:id
Action: is_deleted=true + Kafka: story.deleted → S3 purge

Search APIs

GET /v1/search/users?q=rachit&limit=10
Action: Elasticsearch users_index, fuzzy match, boost by follower_count

GET /v1/search/posts?q=sunset&hashtag=travel&lat=28.6&lng=77.2&radius=10km
Action: Elasticsearch posts_index, geo-filter + text match

GET /v1/search/trending?window=1h
Action: ZREVRANGE trending:1h 0 9 WITHSCORES
Returns: { tags: [{ tag, post_count, delta }] }

Rate Limits

Endpoint          Limit
--------          -----
POST /posts       10/min
POST /like        60/min
GET /feed         30/min
POST /comments    20/min
GET /search       20/min
POST /stories     5/min
POST /follow      30/min

12. Edge Cases Nobody Draws on Their Diagram {#edge-cases}

This section is what turns a good system design into a great one.

Celebrity Fan-out Storm

Cristiano Ronaldo posts. 600M followers. Fan-out on write to all of them simultaneously would generate 600M Cassandra writes in seconds — your cluster dies.

Fix: Celebrity detection at post time (follower_count > 1M). Skip fan-out. At feed read time, fetch celebrity’s latest posts separately and merge. The merge happens in the Feed Service, in memory, before Redis caching.

The Disappearing Story

Redis TTL fires (story expires). Kafka consumer is restarting at that exact moment. The story.expired event is consumed, but the consumer crashes before committing the offset. The event replays. The delete runs twice on S3.

Fix: S3 delete is idempotent (deleting a non-existent object returns 204). The Cassandra write is idempotent (same story_id soft-delete runs twice = same result). Design all consumers to handle duplicate events safely.

The Double Like

Network is flaky. User taps like. Request times out client-side. Client retries. Server receives two POST /like requests.

Fix: Idempotency-Key: <uuid> header on every like request. Server checks SETNX idempotency:{key} 1 EX 86400 in Redis before processing. If key exists, return cached response. If not, process and cache. Same key = same result, always.
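A sketch of that SETNX-style check, with an expiring dict standing in for Redis; handle_like and the response caching are illustrative (in real Redis you would also store the response, not just the flag, so replays can return it):

```python
import time

_seen = {}           # idempotency key -> (expires_at, cached response)
TTL = 86400          # the 24h window from the text

def handle_like(idempotency_key, do_like):
    """Process a like at most once per key; a retried request with the
    same key replays the cached response instead of re-processing."""
    entry = _seen.get(idempotency_key)
    if entry and entry[0] > time.time():
        return entry[1]                      # duplicate: replay prior response
    response = do_like()                     # first time: actually process
    _seen[idempotency_key] = (time.time() + TTL, response)
    return response
```

Note the check-then-set here is not atomic; in Redis, SETNX (or SET ... NX) gives you that atomicity for free, which is why the text reaches for it.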

Comment on Deleted Post

Post is soft-deleted. User (who has the post open on their screen) tries to comment. Request hits Comment Service before the deletion propagates.

Fix: Comment Service calls Post Service to validate is_deleted before writing. Or: API Gateway checks post status. Or: accept the race condition and clean up orphaned comments in a background job. Third option is usually right — the complexity of synchronous cross-service validation isn’t worth the edge case frequency.

Notification Flood

10M likes in 10 minutes on a viral reel. Without batching, the post author gets 10M push notifications.

Fix: Debounce in Notification Service. Redis counter per (recipient_id, post_id, notification_type) with 60-second window. At window close, fire one notification: “Priya and 9,999 others liked your reel.” Reset counter.

Cold Start Feed

New user. Zero followings. Feed is empty.

Fix: Onboarding flow → interest selection → seed feed with high-engagement posts matching selected interests from Elasticsearch. After 5+ follows, switch to normal feed generation.

Geo-Replication Lag

User in Mumbai follows someone in New York. The follow write goes to primary (US). Mumbai’s read replica is 800ms behind. User immediately views the newly-followed account’s profile — replica says “not following.”

Fix: For follow-status checks that are user-initiated immediately after a follow action, route the read to primary (or use a read-your-own-writes cache in Redis). This is the one case where eventual consistency is genuinely confusing to users.


13. Key Trade-offs Summary {#trade-offs}

Decision                          Trade-off
--------                          ---------
Cassandra for likes               Write speed vs. no ACID, no JOINs
Push feed fan-out                 Fast reads vs. write amplification for popular accounts
Async Elasticsearch sync          Search features vs. 1-2 second indexing lag
Redis like counters               Speed vs. 30-second write-back delay
Eventual consistency on replicas  Read scale vs. briefly stale data
Soft deletes everywhere           Safety/auditability vs. storage overhead
Pre-signed S3 uploads             Scalable media ingestion vs. more complex client logic
Hybrid fan-out                    Balanced throughput vs. more complex feed assembly

Final Thoughts

Instagram at 500M DAU isn’t one system. It’s eight systems — feed, stories, media, notifications, search, likes, comments, and the graph — running in parallel, sharing Kafka as the connective tissue, each independently scalable.

The principles that hold across all of them:

  1. Design for the read path first — reads outnumber writes 80:20
  2. Async everything that doesn’t need to be sync — Kafka is your friend
  3. Name your trade-offs explicitly — “we accept 2-second search lag for write simplicity”
  4. Design for idempotency everywhere — networks fail, retries happen, duplicates arrive
  5. The cache is not the source of truth — always have a fallback to the DB

That’s what Instagram-scale system design looks like.


Next in this series: Why SQL beats NoSQL for 90% of startups — the data, the nuance, and why the benchmarks lie.


About the author: Rachit writes about system design, backend engineering, and the real trade-offs nobody talks about. Follow for weekly deep dives.

Tags: system-design backend distributed-systems instagram software-architecture database kafka redis elasticsearch cassandra
