Daniel Keya

Posted on May 25

Designing TikTok from Scratch — A System Design Deep Dive

#architecture #backend #distributedsystems #systemdesign

Who is this for? Mid-to-senior engineers preparing for system design interviews, or anyone curious how a short-video platform at billion-user scale actually works under the hood.

Scale We're Designing For

Metric	Number
Monthly active users	1B+
Videos uploaded per day	~34 million
Target feed latency (P99)	~167ms
Average egress bandwidth	~31 Tbps

1. Requirements

Before drawing a single box, nail down what the system must do — and what it doesn't need to do perfectly on day one.

Functional requirements:

Upload and transcode short videos
Serve a personalized "For You" feed
Like, comment, share, follow
Search videos and creators
Live streaming

Non-functional requirements:

High availability (99.99% uptime)
Sub-200ms feed latency
Horizontal scalability
Global CDN video delivery
Eventual consistency where staleness is tolerable (counters, feeds); strong consistency for auth and billing

2. High-Level Architecture

The system splits into four major domains: ingestion (upload pipeline), serving (read path), recommendation (ML feed), and social graph.

┌─────────────────────────────────────────────────┐
│              Mobile / Web Clients                │
└─────────────────────┬───────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────┐
│         Global CDN / Edge PoPs                   │
│   Video delivery, static assets, geo-routing    │
└─────────────────────┬───────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────┐
│       API Gateway + Load Balancer                │
│   Auth, rate limiting, routing, TLS termination │
└────────┬────────────┴────────────────┬──────────┘
         │                             │
   ┌─────▼──────┐  ┌──────────────┐  ┌▼────────────────┐
   │  Upload    │  │ Feed Service │  │  Social Graph   │
   │  Service   │  │(pre-compute  │  │    Service      │
   │            │  │ + real-time) │  │                 │
   └─────┬──────┘  └──────┬───────┘  └┬────────────────┘
         │                │            │
   ┌─────▼──────┐  ┌──────▼───────┐  ┌▼────────────────┐
   │ Transcode  │  │Recommendation│  │  Notification   │
   │  Workers   │  │   Engine     │  │    Service      │
   └─────┬──────┘  └──────┬───────┘  └┬────────────────┘
         │                │            │
   ┌─────▼──────┐  ┌──────▼───────┐  ┌▼────────────────┐
   │  Object    │  │ Feature Store│  │  Search Service │
   │  Storage   │  │(Redis+Cassie)│  │ (Elasticsearch) │
   └─────┬──────┘  └──────┬───────┘  └┬────────────────┘
         │                │            │
┌────────▼────────────────▼────────────▼──────────────┐
│              Async Message Bus (Kafka)               │
└──────────┬──────────────┬──────────────┬────────────┘
           │              │              │
    ┌──────▼─────┐ ┌──────▼────┐ ┌──────▼──────┐
    │MySQL/Vitess│ │   Redis   │ │  Cassandra  │
    │(user data, │ │ (counters,│ │ (timelines, │
    │ metadata)  │ │  cache)   │ │  history)   │
    └────────────┘ └───────────┘ └─────────────┘

On non-critical paths, services communicate asynchronously via Kafka.

3. Key Components Explained

CDN + Edge PoPs

The CDN is TikTok's secret weapon. Roughly 70% of video traffic is served directly from edge nodes in 150+ cities, bypassing origin entirely. Anycast routing sends each user to the nearest PoP. Manifest files (playlist URLs) are invalidated within seconds of a video going viral.

Upload Pipeline

Chunked multi-part upload (5 MB chunks) tolerates flaky mobile connections. Workers deduplicate uploads via SHA-256 before writing. Transcode jobs run on GPU fleets, and outputs include 360p, 720p, and 1080p renditions plus HEVC-encoded variants. Thumbnails and stills are extracted for ML feature generation.
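The chunking and dedup steps can be sketched in a few lines. This is a minimal illustration, not TikTok's pipeline: `split_chunks` and `DedupStore` are hypothetical names, the store is an in-memory dict standing in for object storage, and the hash is assumed to cover the whole reassembled video.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, the chunk size quoted in the text


def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield fixed-size chunks; the last chunk may be shorter."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


class DedupStore:
    """Hypothetical in-memory stand-in for object storage keyed by content hash."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, chunks: List[bytes]) -> Tuple[str, bool]:
        """Hash the chunks in order and write only if the digest is unseen.

        Returns (sha256_hex_digest, was_new).
        """
        hasher = hashlib.sha256()
        for chunk in chunks:
            hasher.update(chunk)  # incremental hashing, no full copy needed
        digest = hasher.hexdigest()
        if digest in self.blobs:
            return digest, False  # duplicate upload: skip the write
        self.blobs[digest] = b"".join(chunks)
        return digest, True
```

Because each chunk is independent, a client on a flaky connection only has to resend the chunk that failed rather than the whole file.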

Recommendation Engine

A two-tower neural network:

Tower 1 — encodes user state (watch history, device, time of day, location)
Tower 2 — encodes video features (visual embeddings, audio, caption text)

Dot product gives a relevance score. The model runs online for top-k retrieval, then a ranker applies real-time signals (trending, friend activity) before the feed is assembled.
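As a rough sketch of the retrieval step, the snippet below scores every video by the dot product of two already-computed embeddings and keeps the top k. The embeddings, the `top_k` helper, and the brute-force scan are all assumptions for illustration; a production system would typically replace the scan with an approximate nearest-neighbor index.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Relevance score: dot product of a user vector and a video vector."""
    return sum(a * b for a, b in zip(u, v))


def top_k(
    user_vec: Sequence[float],
    video_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (video_id, score) pairs, best first."""
    scored = ((vid, dot(user_vec, vec)) for vid, vec in video_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical 2-dimensional embeddings, purely for illustration.
user = [1.0, 0.0]
videos = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.5, 0.5]}
candidates = top_k(user, videos, k=2)
```

The ranker described above would then re-score `candidates` with real-time signals before feed assembly.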

Feed Assembly (Pre-compute + Real-time Merge)

This is where TikTok differs from Twitter/Instagram:

Regular accounts: fan-out on write (posts pushed to follower inboxes eagerly)
Celebrity/high-follower accounts: fan-out on read (posts pulled and merged at request time, which avoids writing to millions of inboxes per post)

The feed service merges both lists, injects ML-recommended videos, and applies diversity rules to avoid repetition. Final feed is cached in Redis with a 300s TTL.
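The merge, inject, and diversity steps might look roughly like the sketch below. The `Video` type, the "one recommendation after every two followed videos" rule, and the per-creator cap are all hypothetical choices made for illustration; the text does not specify the actual rules.

```python
import heapq
from typing import Dict, List, NamedTuple, Set


class Video(NamedTuple):
    video_id: str
    creator_id: str
    posted_at: int  # Unix seconds


def merge_followed(inbox: List[Video], pulled: List[Video]) -> List[Video]:
    """Merge two newest-first lists into one newest-first list."""
    return list(heapq.merge(inbox, pulled, key=lambda v: v.posted_at, reverse=True))


def inject(followed: List[Video], recommended: List[Video], every: int = 2) -> List[Video]:
    """Insert one recommended video after every `every` followed videos."""
    out: List[Video] = []
    recs = iter(recommended)
    for i, video in enumerate(followed, start=1):
        out.append(video)
        if i % every == 0:
            rec = next(recs, None)
            if rec is not None:
                out.append(rec)
    out.extend(recs)  # leftover recommendations go at the end
    return out


def diversify(feed: List[Video], max_per_creator: int = 2) -> List[Video]:
    """Drop duplicate videos and cap how many one creator contributes."""
    seen: Set[str] = set()
    per_creator: Dict[str, int] = {}
    out: List[Video] = []
    for video in feed:
        count = per_creator.get(video.creator_id, 0)
        if video.video_id in seen or count >= max_per_creator:
            continue
        seen.add(video.video_id)
        per_creator[video.creator_id] = count + 1
        out.append(video)
    return out
```

The result of `diversify(inject(merge_followed(inbox, pulled), recommended))` is what would be written to the cache under the TTL mentioned above.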

Kafka Message Bus

All write events (upload complete, like, follow, watch-complete) are published to Kafka topics. Downstream consumers include:

Analytics pipeline
Notification fan-out
ML feature store updater
Search indexer

Topics are partitioned by user_id for ordered processing per user. This decouples services and allows independent scaling.
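To see why keying by user_id preserves per-user order, consider this toy partitioner. It is only a sketch: Kafka's default Java partitioner hashes the key bytes with murmur2, whereas this example uses CRC32 as a stand-in, and `partition_for` and `route` are hypothetical helpers rather than Kafka APIs.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str]  # (user_id, event_name)


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 here, as an assumption)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def route(events: Iterable[Event], num_partitions: int) -> Dict[int, List[Event]]:
    """Append each event to its partition's log, preserving arrival order."""
    logs: Dict[int, List[Event]] = defaultdict(list)
    for user_id, name in events:
        logs[partition_for(user_id, num_partitions)].append((user_id, name))
    return dict(logs)


events = [("u1", "like"), ("u2", "follow"), ("u1", "watch-complete")]
logs = route(events, num_partitions=8)
```

Every event for `u1` lands in the same partition in arrival order, so a single consumer sees that user's events sequentially. Ordering across different users is not guaranteed, which is the price of parallelism.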

Database Strategy

Store	Use Case	Why
MySQL / Vitess	User profiles, video metadata, social graph	ACID, sharded by `user_id`
Redis Cluster	Counters (likes, views), session tokens, feed cache	Sub-millisecond reads
Cassandra	Watch history, timelines, notification logs	Wide-row reads, high write throughput

4. Key Design Trade-offs

Fan-out on Write vs Read

The classic dilemma in social feed systems. TikTok uses a hybrid approach (the "celebrity problem" split):

Fan-out on write (for regular accounts):

Read path is O(1): just read the inbox
Fast feed assembly at serving time
Massive write amplification if applied to an account with millions of followers, which is why celebrity posts are not pushed

Fan-out on read (for celebrity accounts, whose posts are pulled at request time):

No write amplification on post
Storage-efficient
Slower feed assembly if following thousands of accounts
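The two cost profiles can be made concrete with a tiny model. This is a simplification under stated assumptions: one inbox write per follower per post for fan-out on write, one timeline lookup per followed account per refresh for fan-out on read, and hypothetical follower counts chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FanoutCost:
    inbox_writes_per_post: int       # extra writes triggered by one new post
    timeline_reads_per_refresh: int  # lookups needed to build one feed


def fan_out_on_write(follower_count: int) -> FanoutCost:
    """Push model: every follower's inbox is written; reading is one inbox fetch."""
    return FanoutCost(inbox_writes_per_post=follower_count,
                      timeline_reads_per_refresh=1)


def fan_out_on_read(following_count: int) -> FanoutCost:
    """Pull model: nothing is pushed; each refresh queries every followed account."""
    return FanoutCost(inbox_writes_per_post=0,
                      timeline_reads_per_refresh=following_count)


# Hypothetical numbers: an account with 10M followers, a viewer following 2,000.
celebrity_push = fan_out_on_write(10_000_000)
heavy_follower_pull = fan_out_on_read(2_000)
```

A single post from the 10M-follower account costs ten million inbox writes under the push model, while the pull model shifts the cost to viewers who follow many accounts.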

Eventual vs Strong Consistency

Like/view counts can lag by a few seconds — nobody notices. But user authentication tokens and billing events require strong consistency. TikTok segments these into separate storage tiers with different consistency guarantees, accepting complexity for throughput on hot paths.

Push vs Pull for Notifications

Likes and comments use WebSocket push for real-time delivery. Less critical notifications (weekly summaries, suggested follows) use a pull-based batch pipeline that runs every few hours — no need to maintain a persistent connection for a weekly digest email.
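A minimal routing sketch for this split follows. The `NotificationRouter` class, the dict-shaped notifications, and the in-memory queue and buffer are all hypothetical stand-ins for the WebSocket gateway and the batch pipeline.

```python
from collections import deque
from typing import Deque, Dict, List

REALTIME_TYPES = {"like", "comment"}  # delivered by push, per the text


class NotificationRouter:
    def __init__(self) -> None:
        self.push_queue: Deque[dict] = deque()         # stand-in for WebSocket delivery
        self.batch_buffer: Dict[str, List[dict]] = {}  # per-user digest buffer

    def route(self, notification: dict) -> str:
        """Send real-time types to the push queue; buffer everything else."""
        if notification["type"] in REALTIME_TYPES:
            self.push_queue.append(notification)
            return "push"
        user_id = notification["user_id"]
        self.batch_buffer.setdefault(user_id, []).append(notification)
        return "batch"

    def flush_batch(self) -> Dict[str, List[dict]]:
        """Called by the periodic batch job: return buffered digests and reset."""
        digests, self.batch_buffer = self.batch_buffer, {}
        return digests
```

Only the push path needs a live connection to the client; the batch path can run on whatever schedule the digest requires.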

5. Back-of-Envelope Estimates

Assumptions: 1B MAU, 500M DAU, avg user watches 45 min/day, avg video = 30 sec ~= 8 MB (720p). 34M uploads/day ~= 400 uploads/sec on average (34M / 86,400 sec); peak rates are higher.

Storage:

34M uploads/day x 8 MB x 3 resolutions = ~816 TB/day of new video
With 3x replication over 5 years: 816 TB x 3 x 1,825 days = ~4.5 EB total raw storage

Feed reads:

500M DAU x 20 feed refreshes/day / 86,400 sec = ~115,000 feed reads/sec
With 95% Redis cache hit rate -> recommendation backend sees ~5,750 rps

Bandwidth:

500M users x 45 min x 60 sec/min x 2 Mbps (720p) / 86,400 sec = ~31 Tbps average egress, with peak-hour egress higher still

This is why TikTok operates its own backbone in many regions and has deep-peering agreements with major ISPs.
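The estimates are easy to reproduce with explicit units, which helps catch a dropped conversion factor in an interview. The sketch below uses only the assumptions stated in this section, with decimal units (1 TB = 10^12 bytes); the bandwidth result is a daily average with minutes converted to seconds.

```python
SECONDS_PER_DAY = 86_400

# Assumptions taken from the section above.
uploads_per_day = 34_000_000
video_size_bytes = 8 * 10**6           # 8 MB per 30-second 720p video
renditions = 3
replication = 3
years = 5
dau = 500_000_000
refreshes_per_day = 20
cache_hit_rate = 0.95
watch_seconds_per_day = 45 * 60        # 45 minutes, converted to seconds
bitrate_bps = 2 * 10**6                # 2 Mbps for 720p

# Uploads
uploads_per_sec = uploads_per_day / SECONDS_PER_DAY

# Storage
new_video_tb_per_day = uploads_per_day * video_size_bytes * renditions / 10**12
raw_storage_eb = new_video_tb_per_day * replication * 365 * years / 10**6

# Feed reads
feed_reads_per_sec = dau * refreshes_per_day / SECONDS_PER_DAY
backend_rps = feed_reads_per_sec * (1 - cache_hit_rate)

# Bandwidth: total bits watched per day, spread evenly over the day
avg_egress_tbps = dau * watch_seconds_per_day * bitrate_bps / SECONDS_PER_DAY / 10**12

print(f"uploads/sec (avg):  {uploads_per_sec:,.0f}")
print(f"new video per day:  {new_video_tb_per_day:,.0f} TB")
print(f"5-year raw storage: {raw_storage_eb:.2f} EB")
print(f"feed reads/sec:     {feed_reads_per_sec:,.0f}")
print(f"backend rps:        {backend_rps:,.0f}")
print(f"avg egress:         {avg_egress_tbps:.2f} Tbps")
```

These are averages; real traffic is bursty, so capacity planning would apply a peak-to-average multiplier on top of each figure.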

6. What Makes TikTok's Architecture Special?

Most social platforms optimize for social graph traversal — show me what people I follow posted. TikTok inverted this: the algorithm is the product. The architecture is built around a recommendation pipeline that must be both blazing-fast and constantly learning from watch signals.

Three things stand out:

Aggressive edge caching — they push video delivery as close to the user as physically possible. The CDN is not a performance optimization; it is the entire delivery strategy.
Real-time ML feedback loops — a video's trajectory is decided in the first 30 minutes based on completion rate signals. A new creator can go viral without any followers.
Microservice isolation — upload, serving, recommendation, and social graph are independently deployable and scalable, preventing any single bottleneck from cascading.

Interview Tips

If you're using this for a system design interview:

Start with requirements — always clarify scale before designing anything
Estimate first — back-of-envelope math shows you understand the constraints
Sketch the high-level diagram — then dive into the component your interviewer cares about
Talk through trade-offs — interviewers want reasoning, not a list of technologies
Bottleneck hunt — proactively identify where the system will break and how you'd fix it

Found this useful? Follow for more system design deep dives — next up: designing YouTube's upload pipeline at scale.

DEV Community