Introduction
Netflix streams video — much of it in 4K — to over 200 million subscribers, and playback starts in seconds with virtually no buffering. On the surface it is just playing a video file. Underneath it is one of the most sophisticated distributed systems ever built — spanning global content delivery, adaptive streaming, parallel encoding pipelines, geo licensing, and personalized recommendations all working together seamlessly. This post walks through every challenge, question by question, including wrong turns and how to navigate out of them.
Challenge 1: Global Video Delivery
Interview Question: Suppose one central server in the US stores all Netflix movies, and a user in Mumbai, a user in London, and a user in Tokyo all click play simultaneously. What is the most fundamental problem they all face?
Navigation: Serving 100GB video files from a single central server to users across the world means every user pays the cost of geographic distance — high latency, slow start times, and saturated long distance network links. The solution is obvious once you frame it correctly — video data needs to physically live close to the user.
Solution: Content Delivery Network — CDN with regional servers.
Netflix built their own CDN called Open Connect. They place servers called Open Connect Appliances directly inside ISP data centers worldwide. Instead of video traveling from a US central server to Mumbai, it travels from a Mumbai ISP server a few milliseconds away.
- Central server stores master copy of all content
- Regional CDN servers cache popular content close to users
- Mumbai user streams from Mumbai CDN server — low latency
- London user streams from London CDN server — low latency
- Tokyo user streams from Tokyo CDN server — low latency
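The routing idea above can be sketched in a few lines. This is a toy illustration, not Netflix's actual steering logic — the region names and hostnames are made up:

```python
# Toy illustration of CDN routing: map each user's region to the nearest
# edge server instead of the central origin. All names are hypothetical.
EDGE_SERVERS = {
    "mumbai": "oca-mumbai.example.net",
    "london": "oca-london.example.net",
    "tokyo":  "oca-tokyo.example.net",
}
ORIGIN = "origin-us.example.net"

def stream_host(user_region: str) -> str:
    """Serve from the regional edge when one exists, else fall back to origin."""
    return EDGE_SERVERS.get(user_region, ORIGIN)

print(stream_host("mumbai"))      # oca-mumbai.example.net
print(stream_host("antarctica"))  # origin-us.example.net
```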
Key Insight: Video data must live geographically close to the user. A CDN is not an optimization — it is a fundamental requirement for global video streaming at scale.
Challenge 2: Smart CDN Cache Management
Interview Question: CDN storage is expensive. You cannot cache every title on every regional server. How do you decide what content to cache on which regional server — and what happens when a user requests a title not cached on their nearest CDN?
Solution: Two caching strategies working together — push and pull.
Push based caching — proactive:
- New blockbuster release approaching launch day
- Netflix proactively pushes content to all relevant CDN servers before launch
- Day one release — content already cached everywhere — zero cache misses on launch
Pull based caching — reactive:
- User requests title not cached on nearest CDN server
- CDN pulls content from central server on first request
- Caches it locally with TTL for future requests
- All subsequent users in that region hit the cache
Unpopular titles:
- Rarely requested content never gets cached
- Served directly from central server
- CDN storage reserved for content that justifies caching
Dynamic TTL based on viewing patterns:
A fixed TTL for all content is naive. A show extremely popular during its launch week but dead 3 months later should not hold CDN space indefinitely.
Solution: Viewing Analytics Service collects watch events and publishes to Kafka. TTL Management Service consumes from Kafka, computes viewing frequency per title per region, and dynamically adjusts TTL accordingly.
- Avengers launch week — 10 million views — TTL 30 days
- Avengers 3 months later — 1000 views — TTL 3 days
- Obscure documentary — 10 views — TTL 0, evict immediately
All TTL adjustments happen asynchronously — never blocking the streaming experience.
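The TTL logic above boils down to a mapping from recent demand to cache lifetime. Here is a minimal sketch of what the TTL Management Service might compute — the function name and thresholds are illustrative assumptions, not Netflix internals:

```python
# Hypothetical sketch of dynamic TTL: map a title's recent regional view
# count to a CDN cache TTL. Thresholds mirror the examples in the text.
from datetime import timedelta

def cache_ttl(weekly_views: int) -> timedelta:
    """Return how long a title should stay cached on a regional CDN server."""
    if weekly_views >= 1_000_000:   # blockbuster at peak demand
        return timedelta(days=30)
    if weekly_views >= 1_000:       # steady long-tail demand
        return timedelta(days=3)
    return timedelta(0)             # too cold to cache: evict immediately

print(cache_ttl(10_000_000))  # launch-week hit -> 30 days
print(cache_ttl(1_000))       # fading title    -> 3 days
print(cache_ttl(10))          # obscure title   -> evict
```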
Key Insight: Smart CDN management combines proactive push for known blockbusters, reactive pull for long tail content, and dynamic TTL adjustment based on real viewing patterns. Static caching strategies waste expensive CDN storage.
Challenge 3: Video Chunking and Instant Playback
Interview Question: A 4K HDR movie is 100GB. A user has a 50 Mbps connection. Downloading 100GB takes 4.4 hours. But Netflix starts playing in under 3 seconds. How is this physically possible?
Navigation: The user does not need all 100GB before playback starts. They only need the first few seconds of video. If you split the video into small chunks and start playing the first chunk while the rest download in the background, playback starts almost instantly.
Solution: Video Chunking — split video into 2 to 10 second chunks.
- 3 hour movie split into roughly 2000 individual chunks
- Each chunk is 2 to 10 seconds of video
- User needs only the first chunk to start playing — a few megabytes
- Subsequent chunks download in background while current chunk plays
- Playback starts in under 3 seconds regardless of file size
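The arithmetic behind this is worth checking. Using the numbers from the question — a 100GB file, a 50 Mbps link, and a roughly 4MB first chunk at 4K:

```python
# Back-of-the-envelope check: full file vs first chunk on a 50 Mbps link.
file_gb = 100
link_mbps = 50
chunk_mb = 4  # ~4 MB first chunk at 4K (8 Mbps x ~4 s of video)

full_file_hours = file_gb * 8 * 1000 / link_mbps / 3600
first_chunk_seconds = chunk_mb * 8 / link_mbps

print(f"full file:   {full_file_hours:.1f} hours")  # ~4.4 hours
print(f"first chunk: {first_chunk_seconds:.2f} s")  # well under a second
```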
Key Insight: You never need the whole file to start playing. Chunking transforms an impossible 100GB download problem into a trivial few megabyte first chunk problem.
Challenge 4: Adaptive Bitrate Streaming
Interview Question: Netflix stores the same chunk at 5 different quality levels. Why — and who decides which quality to request?
Netflix stores every chunk at multiple quality levels:
- 4K HDR — 8 Mbps — 4MB per chunk
- 1080p — 4 Mbps — 2MB per chunk
- 720p — 2 Mbps — 1MB per chunk
- 480p — 1 Mbps — 0.5MB per chunk
- 360p — 0.5 Mbps — 0.25MB per chunk
Different users have different bandwidth. The same user has different bandwidth at different moments — strong WiFi at home, weak signal in the kitchen, mobile data on the commute.
With a single quality video: connection degrades → buffering → terrible experience.
With multiple quality versions: connection degrades → seamlessly switch to lower quality chunk → no buffering.
The client decides — not the server. The Netflix app runs an adaptive bitrate (ABR) algorithm that continuously monitors the download speed of recent chunks and the current buffer level, then decides which quality chunk to request next.
ABR decision logic:
- Buffer above 30 seconds and speed above 20 Mbps — request 4K chunk
- Buffer above 15 seconds and speed above 8 Mbps — request 1080p chunk
- Buffer below 10 seconds and speed dropping — request 720p chunk
- Buffer below 5 seconds — request 360p immediately to prevent buffering
Quality adjustments are seamless and invisible to the user. The app makes hundreds of these decisions per viewing session.
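The decision table above can be expressed as a small pure function. This is a simplified sketch — real ABR algorithms also consider bandwidth trends and switching penalties — with the thresholds taken from the rules listed:

```python
def next_quality(buffer_s: float, speed_mbps: float) -> str:
    """Pick the next chunk's quality from buffer level and measured speed.
    A simplified client-side ABR rule; thresholds from the decision table."""
    if buffer_s < 5:
        return "360p"   # emergency: smallest chunk to prevent a stall
    if buffer_s < 10:
        return "720p"   # buffer draining: step down
    if buffer_s > 30 and speed_mbps > 20:
        return "4K"
    if buffer_s > 15 and speed_mbps > 8:
        return "1080p"
    return "720p"       # safe default between thresholds

print(next_quality(40, 25))  # 4K
print(next_quality(3, 25))   # 360p
```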
Key Insight: Multiple quality versions per chunk combined with client side adaptive bitrate selection eliminates buffering under any network condition. The client always has the right quality for current bandwidth.
Challenge 5: Parallel Video Encoding Pipeline
Interview Question: A raw master file could be 1TB of uncompressed footage. Netflix needs to create thousands of chunks at 5 quality levels each. Encoding a 3 hour movie sequentially on one machine could take days. Netflix adds thousands of new titles every year. How do you process them fast enough?
Navigation: The key insight is that chunks are independent of each other. You do not need to encode chunk 1 before encoding chunk 2. If you can encode all chunks simultaneously across thousands of machines, a process that took days takes minutes.
Solution: Parallel chunk encoding across a distributed encoding farm.
- Raw master file arrives
- Chunking Service splits movie into 2000 independent chunks
- Each chunk published as a job to Kafka job queue
- 2000 encoding machines each pick up one job
- Every chunk encodes simultaneously across the farm
- Each machine produces 5 quality versions of its chunk
- All 2000 chunks complete — Merge Service reassembles into final encoded movie
- CDN Distribution pushes encoded content to regional servers
A 3 hour movie that took days on one machine now takes minutes across 2000 machines.
Key Insight: Chunking solves two problems simultaneously — instant playback for users and parallel encoding for Netflix. The same chunk boundaries that enable streaming also enable massively parallel encoding.
Challenge 6: Fault Tolerant Encoding Pipeline
Interview Question: Your encoding farm has 2000 machines encoding chunks simultaneously. One machine fails mid encoding. 1999 chunks complete successfully but chunk 847 is lost. The entire movie is incomplete. At scale machine failures happen constantly. How do you make the pipeline fault tolerant?
Navigation: The failed chunk needs to be detected and retried on a different machine automatically. This requires a coordinator that tracks the state of every chunk job and reassigns failed jobs without human intervention.
Solution: Kafka job queue with Coordinator Service tracking chunk states.
Every chunk job has a state:
- PENDING — waiting to be picked up by a worker
- IN PROGRESS — currently being encoded by a worker
- COMPLETED — successfully encoded
- FAILED — encoding failed, needs retry
Coordinator Service monitors all job states:
- Job stuck IN PROGRESS too long — worker crashed — set back to PENDING — reassigned to healthy worker
- All 2000 jobs COMPLETED — trigger Merge Service automatically
- Job fails repeatedly — alert engineering team
TTL on the IN PROGRESS state — the same pattern used in the WhatsApp and Stock Exchange designs:
- Worker picks up chunk — job marked IN PROGRESS with TTL of expected encoding time plus buffer
- Worker crashes — TTL expires — job automatically returns to PENDING — reassigned
- No manual intervention needed — pipeline self heals
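The self-healing rule is simple enough to sketch directly. Field names and the TTL value below are illustrative assumptions:

```python
# Minimal sketch of the coordinator's TTL check: any job stuck IN_PROGRESS
# past its TTL is assumed lost and flipped back to PENDING for reassignment.
JOB_TTL_S = 600  # expected encoding time plus buffer (illustrative)

def requeue_stalled(jobs: dict, now: float) -> list:
    """Return ids of jobs returned to PENDING because their worker died."""
    requeued = []
    for job_id, job in jobs.items():
        if job["state"] == "IN_PROGRESS" and now - job["started_at"] > JOB_TTL_S:
            job["state"] = "PENDING"
            requeued.append(job_id)
    return requeued

jobs = {
    846: {"state": "COMPLETED", "started_at": 0},
    847: {"state": "IN_PROGRESS", "started_at": 0},     # worker crashed
    848: {"state": "IN_PROGRESS", "started_at": 1000},  # still healthy
}
print(requeue_stalled(jobs, now=1100))  # [847]
```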
Key Insight: A job queue with explicit state tracking and TTL based failure detection makes distributed encoding pipelines self healing. Individual machine failures never block movie processing.
Challenge 7: Geo Licensing Checks
Interview Question: Netflix operates in 190 countries with different licensing rules per title per country. When a user clicks play Netflix must instantly verify if content is licensed in their country. This check must happen in milliseconds. How do you design it?
Solution: Redis Set per title with DynamoDB fallback and fail open strategy.
Data structure — Redis Set per title:
- Key is title ID
- Value is Redis Set containing all countries where title is licensed
- SISMEMBER titleID countryCode returns true or false in O(1)
- 10000 titles times 190 countries — entirely manageable in Redis memory
Licensing updates — eventual consistency is acceptable:
- Licensing changes happen a few times per day not in milliseconds
- DynamoDB stores licensing rules as source of truth
- Async background job syncs DynamoDB to Redis every few minutes
- Eventual consistency is perfectly fine — nobody needs sub-second licensing propagation
- This is deliberate under-engineering — Kafka streaming for licensing updates would be overkill
Fallback strategy — graceful degradation:
- Redis available — O(1) licensing check — instant response
- Redis down — fall back to DynamoDB — slightly slower but always available
- Both down — fail open — allow playback — minor licensing risk accepted
Fail open philosophy: Netflix prioritizes user experience over minor licensing violations. Blocking 200 million users from watching anything during a 30 second Redis outage causes massive revenue loss and reputational damage. Serving unlicensed content to a small number of users for 30 seconds is an acceptable tradeoff.
This is called Graceful Degradation — system degrades to a slower but functional state rather than failing completely.
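The layered check reads naturally as a try/except chain. This is a sketch under assumptions: the client objects are stand-ins (real code would use redis-py and boto3), and `get_licensing` is a hypothetical helper, not a real DynamoDB API:

```python
# Sketch of the graceful-degradation chain: Redis first, DynamoDB on
# Redis failure, fail open if both layers are down.
def is_licensed(title_id: str, country: str, redis_client, dynamo_client) -> bool:
    try:
        # O(1) set membership: SISMEMBER license:{title_id} {country}
        return redis_client.sismember(f"license:{title_id}", country)
    except Exception:
        pass  # Redis down: degrade to the source of truth
    try:
        item = dynamo_client.get_licensing(title_id)  # hypothetical helper
        return country in item["countries"]
    except Exception:
        return True  # both layers down: fail open, allow playback
```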
Key Insight: Redis Set gives O(1) geo licensing checks. DynamoDB fallback means Redis downtime is never user facing. Fail open philosophy ensures Netflix never goes dark over a licensing check infrastructure failure.
Challenge 8: Personalized Recommendations
Interview Question: Netflix shows every user a completely personalized homepage based on watch history, ratings, similar users, trending content, and time of day patterns. Generating this in real time for 200 million users simultaneously seems impossible. How do you make personalized recommendations appear instantly?
Navigation: Recommendations do not need to be real time. Your taste does not change between 2pm and 2:05pm. Generating recommendations once per day and caching them is indistinguishable from real time generation — but vastly cheaper and faster.
Solution: Offline precomputed recommendations with DynamoDB plus Redis cache.
Offline ML pipeline runs continuously in background:
- Analyzes watch history of all 200 million users
- Runs collaborative filtering — users with similar taste to you watched X
- Runs content based filtering — you liked action movies so here are more
- Computes personalized top 100 recommendations per user per region
- Stores results in DynamoDB
- Invalidates Redis cache so fresh recommendations load on next session
User opens Netflix app:
- Fetch precomputed recommendations from Redis cache — instant O(1) read
- Cache miss — load from DynamoDB — populate Redis — return results
- No ML computation at request time — pure database read
- Homepage loads in under 200ms
Cache invalidation strategy:
- ML pipeline reruns — new recommendations computed — DynamoDB updated
- Redis cache invalidated per user
- Next app open — cache miss — fresh recommendations loaded from DynamoDB — cached again
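The read path above is the classic cache-aside pattern. A minimal sketch, with plain dicts standing in for Redis and DynamoDB:

```python
# Cache-aside read: Redis hit is O(1); on a miss, load the precomputed
# recommendations from DynamoDB and repopulate the cache for next time.
def get_recommendations(user_id: str, redis_cache: dict, dynamo_table: dict) -> list:
    key = f"recs:{user_id}"
    if key in redis_cache:
        return redis_cache[key]           # cache hit: instant read
    recs = dynamo_table.get(user_id, [])  # cache miss: fetch precomputed list
    redis_cache[key] = recs               # repopulate for the next app open
    return recs

cache, table = {}, {"u1": ["Stranger Things", "Dark"]}
print(get_recommendations("u1", cache, table))  # miss -> loaded from table
print(get_recommendations("u1", cache, table))  # hit  -> served from cache
```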
Key Insight: Precomputing recommendations offline transforms an impossibly complex real time ML problem into a trivial database read. User taste changes slowly — daily recomputation is indistinguishable from real time for the user experience.
Full Architecture Summary
Global video delivery — CDN with regional servers, Netflix Open Connect inside ISPs
CDN cache management — Push for blockbusters, pull on demand, dynamic TTL via Kafka
Video chunking — 2 to 10 second chunks for instant playback start
Adaptive bitrate — 5 quality versions per chunk, client side ABR algorithm
Encoding pipeline — Parallel chunk encoding across distributed farm
Fault tolerant encoding — Kafka job queue with coordinator and TTL based failure detection
Geo licensing — Redis Set per title with DynamoDB fallback and fail open strategy
Personalized recommendations — Offline ML pipeline stored in DynamoDB plus Redis cache
Final Thoughts
Netflix is a masterclass in knowing when to compute eagerly and when to compute lazily. Recommendations are precomputed because real time ML at 200 million users is impossible. Chunks are encoded in parallel because sequential encoding is too slow. Licensing checks use eventual consistency because sub-second propagation is unnecessary overkill.
The recurring theme across every layer is that the right architecture matches the actual requirements — not a theoretical ideal. Netflix does not need real time recommendations. It does not need synchronous licensing updates. It does not need to serve the full 100GB file before playback. Recognizing what you do not need is just as important as knowing what you do.
Happy building. 🚀