Who is this for? Mid-to-senior engineers preparing for system design interviews, or anyone curious how a short-video platform at billion-user scale actually works under the hood.
Scale We're Designing For
| Metric | Number |
|---|---|
| Monthly active users | 1B+ |
| Videos uploaded per day | ~34 million |
| Target feed latency (P99) | ~167ms |
| Average egress bandwidth | ~31 Tbps |
1. Requirements
Before drawing a single box, nail down what the system must do — and what it doesn't need to do perfectly on day one.
Functional requirements:
- Upload and transcode short videos
- Serve a personalized "For You" feed
- Like, comment, share, follow
- Search videos and creators
- Live streaming
Non-functional requirements:
- High availability (99.99% uptime)
- Sub-200ms feed latency
- Horizontal scalability
- Global CDN video delivery
- Eventual consistency for counters and feeds, strong consistency for auth tokens and billing
2. High-Level Architecture
The system splits into four major domains: ingestion (upload pipeline), serving (read path), recommendation (ML feed), and social graph.
┌─────────────────────────────────────────────────┐
│              Mobile / Web Clients               │
└─────────────────────┬───────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────┐
│             Global CDN / Edge PoPs              │
│   Video delivery, static assets, geo-routing    │
└─────────────────────┬───────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────┐
│           API Gateway + Load Balancer           │
│  Auth, rate limiting, routing, TLS termination  │
└────────┬────────────┴────────────────┬──────────┘
         │                             │
   ┌─────▼──────┐  ┌──────────────┐   ┌▼────────────────┐
   │   Upload   │  │ Feed Service │   │  Social Graph   │
   │  Service   │  │(pre-compute  │   │    Service      │
   │            │  │ + real-time) │   │                 │
   └─────┬──────┘  └──────┬───────┘   └┬────────────────┘
         │                │            │
   ┌─────▼──────┐  ┌──────▼───────┐   ┌▼────────────────┐
   │ Transcode  │  │Recommendation│   │  Notification   │
   │  Workers   │  │    Engine    │   │    Service      │
   └─────┬──────┘  └──────┬───────┘   └┬────────────────┘
         │                │            │
   ┌─────▼──────┐  ┌──────▼───────┐   ┌▼────────────────┐
   │   Object   │  │ Feature Store│   │ Search Service  │
   │  Storage   │  │(Redis+Cassie)│   │ (Elasticsearch) │
   └─────┬──────┘  └──────┬───────┘   └┬────────────────┘
         │                │            │
┌────────▼────────────────▼────────────▼──────────────┐
│              Async Message Bus (Kafka)              │
└──────────┬──────────────┬──────────────┬────────────┘
           │              │              │
    ┌──────▼─────┐ ┌──────▼────┐  ┌──────▼──────┐
    │MySQL/Vitess│ │   Redis   │  │  Cassandra  │
    │(user data, │ │ (counters,│  │ (timelines, │
    │ metadata)  │ │  cache)   │  │  history)   │
    └────────────┘ └───────────┘  └─────────────┘
On non-critical paths, services communicate asynchronously via Kafka.
3. Key Components Explained
CDN + Edge PoPs
TikTok's secret weapon. ~70% of video traffic is served directly from edge nodes in 150+ cities, bypassing the origin entirely. Anycast routing sends each user to the nearest PoP. Manifest files (playlist URLs) are invalidated within seconds of a video going viral.
Upload Pipeline
Chunked multi-part upload (5 MB chunks) tolerates flaky mobile connections. Workers deduplicate uploads by SHA-256 hash before writing to object storage. Transcode jobs run on GPU fleets and produce 360p, 720p, and 1080p renditions plus HEVC variants. Thumbnails and stills are extracted for ML feature generation.
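A minimal sketch of the chunk-then-dedup step, assuming the hash is taken over the whole reassembled video (the text doesn't say whether dedup is per chunk or per file). `DedupStore`, `split_chunks`, and `assemble_and_store` are hypothetical names, and the in-memory dict stands in for object storage.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, the chunk size quoted in the text


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Client side: yield fixed-size chunks; the last one may be shorter."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


class DedupStore:
    """Hypothetical in-memory stand-in for object storage, keyed by SHA-256."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> Tuple[str, bool]:
        """Write data unless its digest already exists.

        Returns (digest, was_written).
        """
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.blobs:
            return digest, False  # duplicate upload: skip the write
        self.blobs[digest] = data
        return digest, True


def assemble_and_store(chunks: List[bytes], store: DedupStore) -> Tuple[str, bool]:
    """Server side: reassemble chunks in order, then store the video once."""
    return store.put(b"".join(chunks))
```

Because each chunk is independent, a client on a flaky connection only has to resend the chunk that failed, not the whole file.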
Recommendation Engine
A two-tower neural network:
- Tower 1 — encodes user state (watch history, device, time of day, location)
- Tower 2 — encodes video features (visual embeddings, audio, caption text)
Dot product gives a relevance score. The model runs online for top-k retrieval, then a ranker applies real-time signals (trending, friend activity) before the feed is assembled.
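To make the scoring step concrete, here is a rough sketch of dot-product retrieval over precomputed embeddings. It is a brute-force scan in plain Python; a production system would more likely use an approximate nearest-neighbor index. The function names and the toy two-dimensional vectors are illustrative assumptions, not TikTok's actual model outputs.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Relevance score: dot product of a user and a video embedding."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def top_k(
    user_vec: Sequence[float],
    video_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (video_id, score) pairs, best first."""
    scored = ((vid, dot(user_vec, vec)) for vid, vec in video_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    user = [1.0, 0.0]  # toy output of the user tower
    videos = {         # toy outputs of the video tower
        "a": [0.9, 0.1],
        "b": [0.1, 0.9],
        "c": [0.5, 0.5],
    }
    print(top_k(user, videos, k=2))
```

The two towers never see each other's inputs, so video embeddings can be computed offline and only the user embedding needs to be computed at request time.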
Feed Assembly (Pre-compute + Real-time Merge)
This is where TikTok differs from Twitter/Instagram:
- Regular accounts: fan-out on write (posts pushed to follower inboxes eagerly)
- Celebrity/high-follow accounts: fan-out on read (merged at request time), because pushing one post into millions of inboxes is too expensive
The feed service merges both lists, injects ML-recommended videos, and applies diversity rules to avoid repetition. Final feed is cached in Redis with a 300s TTL.
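The merge step might look roughly like the sketch below. The `Video` type, the `rec_every` interleaving interval, and the per-creator cap used as the "diversity rule" are all assumptions made for illustration; the text does not specify the real rules.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Video:
    video_id: str
    creator_id: str
    created_at: int  # Unix seconds


def merge_followed(inbox: List[Video], pulled: List[Video]) -> List[Video]:
    """Merge two newest-first lists into one newest-first list."""
    return list(
        heapq.merge(inbox, pulled, key=lambda v: v.created_at, reverse=True)
    )


def assemble_feed(
    inbox: List[Video],        # pre-computed list (fan-out on write)
    pulled: List[Video],       # fetched at request time (fan-out on read)
    recommended: List[Video],  # ML-ranked candidates
    rec_every: int = 2,        # assumed: one recommendation per 2 followed posts
    max_per_creator: int = 2,  # assumed diversity rule
    limit: int = 10,
) -> List[Video]:
    feed: List[Video] = []
    seen: Set[str] = set()
    per_creator: Dict[str, int] = {}

    def try_add(video: Video) -> None:
        # Skip duplicates and creators that already hit the cap.
        if video.video_id in seen:
            return
        if per_creator.get(video.creator_id, 0) >= max_per_creator:
            return
        seen.add(video.video_id)
        per_creator[video.creator_id] = per_creator.get(video.creator_id, 0) + 1
        feed.append(video)

    recs = iter(recommended)
    for i, video in enumerate(merge_followed(inbox, pulled), start=1):
        try_add(video)
        if i % rec_every == 0:
            rec = next(recs, None)
            if rec is not None:
                try_add(rec)
    for rec in recs:  # use leftover recommendations to fill the page
        try_add(rec)
    return feed[:limit]
```

The assembled list is what would then be written to the feed cache, so the merge cost is paid once per cache miss rather than once per request.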
Kafka Message Bus
All write events (upload complete, like, follow, watch-complete) are published to Kafka topics. Downstream consumers include:
- Analytics pipeline
- Notification fan-out
- ML feature store updater
- Search indexer
Topics are partitioned by user_id for ordered processing per user. This decouples services and allows independent scaling.
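Why keying by user_id gives per-user ordering can be shown with a small simulation. This is not Kafka client code: Kafka clients apply their own key hashing, and MD5 is used here only as a stable stand-in. `partition_for` and `route` are hypothetical helpers.

```python
import hashlib
from typing import List, Tuple

Event = Tuple[str, str]  # (user_id, event_type)


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a key to a partition deterministically (illustrative hash only)."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def route(events: List[Event], num_partitions: int) -> List[List[Event]]:
    """Append each event to its partition's log in arrival order."""
    partitions: List[List[Event]] = [[] for _ in range(num_partitions)]
    for user_id, event_type in events:
        partitions[partition_for(user_id, num_partitions)].append(
            (user_id, event_type)
        )
    return partitions


if __name__ == "__main__":
    events = [
        ("u1", "like"),
        ("u2", "follow"),
        ("u1", "watch-complete"),
        ("u1", "upload complete"),
    ]
    for index, log in enumerate(route(events, num_partitions=8)):
        if log:
            print(index, log)
```

Every event for a given user lands in the same partition, and a partition is an ordered log, so a consumer sees that user's events in the order they were produced. Ordering across different users is not guaranteed.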
Database Strategy
| Store | Use Case | Why |
|---|---|---|
| MySQL / Vitess | User profiles, video metadata, social graph | ACID, sharded by user_id |
| Redis Cluster | Counters (likes, views), session tokens, feed cache | Sub-millisecond reads |
| Cassandra | Watch history, timelines, notification logs | Wide-row reads, high write throughput |
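The feed cache row relies on expiry: the assembled feed is kept for 300 seconds and then rebuilt. Below is a minimal in-process sketch of that behavior. `TTLCache` is a hypothetical class written for illustration, and the injectable `clock` exists only to make expiry easy to test.

```python
import time
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,  # the feed TTL quoted in the text
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, V]] = {}

    def set(self, key: str, value: V) -> None:
        """Store value with an expiry timestamp of now + ttl."""
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[V]:
        """Return the value, or None if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazy eviction on read
            return None
        return value
```

With Redis itself, the same effect comes from setting the key with an expiry, for example `SET` with the `EX 300` option, so the server drops stale feeds without any application-side bookkeeping.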
4. Key Design Trade-offs
Fan-out on Write vs Read
The classic dilemma in social feed systems. TikTok uses a hybrid approach (the "celebrity problem" split):
Fan-out on write (for regular accounts):
- Read path is O(1): just read the inbox
- Fast feed assembly at serving time
- Massive write amplification if applied to a celebrity's posts, which is why those accounts are excluded
Fan-out on read (for accounts with millions of followers):
- No write amplification on post
- Storage-efficient
- Slower feed assembly if following thousands of accounts
Eventual vs Strong Consistency
Like/view counts can lag by a few seconds — nobody notices. But user authentication tokens and billing events require strong consistency. TikTok segments these into separate storage tiers with different consistency guarantees, accepting complexity for throughput on hot paths.
Push vs Pull for Notifications
Likes and comments use WebSocket push for real-time delivery. Less critical notifications (weekly summaries, suggested follows) use a pull-based batch pipeline that runs every few hours — no need to maintain a persistent connection for a weekly digest email.
5. Back-of-Envelope Estimates
Assumptions: 1B MAU, 500M DAU, avg user watches 45 min/day, avg video = 30 sec ~= 8 MB (720p). 34M uploads/day ~= 400 uploads/sec on average; peak rates will be higher.
Storage:
34M uploads/day x 8 MB x 3 resolutions = ~816 TB/day of new video
With 3x replication over 5 years (1,825 days) = ~4.5 EB total raw storage
Feed reads:
500M DAU x 20 feed refreshes/day / 86,400 sec = ~115,000 feed reads/sec
With 95% Redis cache hit rate -> recommendation backend sees ~5,750 rps
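The storage and feed-read figures can be re-derived in a few lines, which is handy for checking the arithmetic or trying different assumptions. All inputs are the article's stated assumptions; decimal units (1 TB = 1,000,000 MB) are assumed.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the assumptions in the text
uploads_per_day = 34_000_000
video_mb = 8            # one 30-second 720p video
resolutions = 3
replication = 3
years = 5
dau = 500_000_000
refreshes_per_day = 20
cache_hit_rate = 0.95

# Uploads: average rate over the day (not peak)
avg_uploads_per_sec = uploads_per_day / SECONDS_PER_DAY

# Storage: MB -> TB -> EB using decimal units
new_video_tb_per_day = uploads_per_day * video_mb * resolutions / 1_000_000
raw_storage_eb = new_video_tb_per_day * replication * 365 * years / 1_000_000

# Feed reads: total rate, then the share that misses the cache
feed_reads_per_sec = dau * refreshes_per_day / SECONDS_PER_DAY
backend_rps = feed_reads_per_sec * (1 - cache_hit_rate)

if __name__ == "__main__":
    print(f"avg uploads/sec:      {avg_uploads_per_sec:,.0f}")
    print(f"new video per day:    {new_video_tb_per_day:,.0f} TB")
    print(f"raw storage, 5 years: {raw_storage_eb:.2f} EB")
    print(f"feed reads/sec:       {feed_reads_per_sec:,.0f}")
    print(f"backend rps:          {backend_rps:,.0f}")
```

Small changes to the cache hit rate move the backend load a lot: dropping from 95% to 90% doubles the traffic the recommendation backend has to absorb.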
Bandwidth:
500M users x 45 min x 60 sec/min x 2 Mbps (720p) / 86,400 sec = ~31 Tbps average egress; peak-hour traffic runs higher
This is why TikTok operates its own backbone in many regions and has deep-peering agreements with major ISPs.
6. What Makes TikTok's Architecture Special?
Most social platforms optimize for social graph traversal — show me what people I follow posted. TikTok inverted this: the algorithm is the product. The architecture is built around a recommendation pipeline that must be both blazing-fast and constantly learning from watch signals.
Three things stand out:
- Aggressive edge caching: they push video delivery as close to the user as physically possible. The CDN is not a performance optimization; it is the entire delivery strategy.
- Real-time ML feedback loops: a video's trajectory is decided in the first 30 minutes based on completion rate signals. A new creator can go viral without any followers.
- Microservice isolation: upload, serving, recommendation, and social graph are independently deployable and scalable, preventing any single bottleneck from cascading.
Interview Tips
If you're using this for a system design interview:
- Start with requirements — always clarify scale before designing anything
- Estimate first — back-of-envelope math shows you understand the constraints
- Sketch the high-level diagram — then dive into the component your interviewer cares about
- Talk through trade-offs — interviewers want reasoning, not a list of technologies
- Bottleneck hunt — proactively identify where the system will break and how you'd fix it
Found this useful? Follow for more system design deep dives — next up: designing YouTube's upload pipeline at scale.