What you'll learn: How Netflix streams video to 230 million users without buffering. How DRM actually protects content (and why it's still not perfect). How WebRTC works peer-to-peer. Why you need LiveKit instead of raw WebRTC. And how to design all of it in a system design interview. Written in plain English with real architecture decisions.
Table of Contents
- The Core Problem — Why All This Exists
- Netflix End-to-End Pipeline
- Video Encoding & Adaptive Streaming
- DRM — How Content Protection Actually Works
- CDN — Why Playback Feels Instant
- WebRTC — Real-Time Video from Scratch
- WebRTC vs Socket.IO — When to Use What
- LiveKit — Why You Should Not Build Raw WebRTC
- Mini Netflix System Design (Interview Ready)
- Real Challenges Engineers Face
- Architecture Decision Guide
- Interview-Ready Answers
1. The Core Problem — Why All This Exists
Every piece of technology in this article exists to solve a specific, painful problem.
What "just serving a video file" looks like
Imagine you're YouTube in 2005. You put a .mp4 file on a server. Users download it. Simple.
Now scale to 230 million users across every device, internet speed, and geography. Suddenly you have:
| Problem | What happens without a solution |
|---|---|
| High latency | Users wait 10–30 seconds before video starts |
| Bandwidth variation | HD video freezes on slow connections |
| Piracy | Anyone can download and redistribute your paid content |
| Global delivery | A server in the US is slow for users in India |
| Real-time interaction | Standard HTTP can't do live two-way video |
| Scale | One server cannot handle a million concurrent viewers |
The solutions:
- Adaptive Streaming (HLS/DASH) → handles bandwidth variation
- CDN → solves global delivery and scale
- DRM → solves piracy
- WebRTC → solves real-time interaction
- LiveKit / SFU → solves WebRTC at scale
Each technology solves exactly one problem. That's why they all exist together in a modern streaming system.
2. Netflix End-to-End Pipeline
Before diving into individual components, here's the complete journey a video takes from creation to your screen:
Creator uploads raw video
↓
[Encoding Service]
Convert to multiple resolutions (240p → 4K)
Convert to streaming formats (HLS, DASH)
↓
[DRM Service]
Encrypt every video segment
Store encryption keys in Key Management System
↓
[Object Storage — S3]
Store all encoded, encrypted segments
↓
[CDN — Cloudflare / Akamai]
Cache segments at edge servers globally
↓
[User clicks Play]
↓
[Backend API]
Authenticate user
Return CDN URL for video
Return license server URL
↓
[Video Player]
Fetch .m3u8 manifest (HLS) from CDN
Request DRM license (decryption key)
Fetch encrypted video segments from CDN
Decrypt segments → decode → render frames
↓
[User watches video]
Every step in this pipeline is solving a specific problem. Let's break each one down.
3. Video Encoding & Adaptive Streaming
Why raw video cannot be served directly
A raw 4K video file is enormous — often 50–100GB for a 2-hour movie. You cannot stream that over the internet. You need to:
- Compress it — reduce file size while keeping acceptable quality
- Create multiple versions — serve different quality based on available bandwidth
- Chunk it — break it into small segments so streaming can start immediately
How encoding works
Tools like FFmpeg take the raw video and produce multiple renditions:
# FFmpeg command to create multiple quality levels from one source
ffmpeg -i original_movie.mp4 \
-vf scale=426:240 -b:v 400k output_240p.mp4 \
-vf scale=854:480 -b:v 1000k output_480p.mp4 \
-vf scale=1280:720 -b:v 2500k output_720p.mp4 \
-vf scale=1920:1080 -b:v 5000k output_1080p.mp4
Each resolution is then cut into 2–6 second segments — small chunks that can be fetched independently.
HLS (HTTP Live Streaming) — The format that makes Netflix work
HLS is Apple's streaming protocol, now universally supported. It works like this:
Step 1 — The manifest file (.m3u8)
#EXTM3U
#EXT-X-VERSION:3
# Available quality levels
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=426x240
/video/240p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
/video/480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
/video/720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
/video/1080p/playlist.m3u8
Step 2 — Quality-specific playlist
#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-VERSION:3
#EXTINF:6.0,
/video/720p/segment_001.ts
#EXTINF:6.0,
/video/720p/segment_002.ts
#EXTINF:6.0,
/video/720p/segment_003.ts
Step 3 — The player reads bandwidth and switches quality
// This is what the video player does internally
// HLS.js does this automatically for you
const hls = new Hls({
startLevel: -1, // -1 means auto-detect best starting quality
abrBandWidthFactor: 0.95, // use 95% of measured bandwidth for safety margin
});
// Note: there is no `autoLevelEnabled` config option. ABR is on by
// default, and hls.autoLevelEnabled is a read-only property that
// reports whether auto switching is currently active.
hls.loadSource("https://cdn.netflix.com/videos/movie123/master.m3u8");
hls.attachMedia(videoElement);
hls.on(Hls.Events.LEVEL_SWITCHED, (event, data) => {
console.log(`Quality switched to level ${data.level}`);
// Level 0 = 240p, Level 1 = 480p, Level 2 = 720p, Level 3 = 1080p
});
This is Adaptive Bitrate Streaming (ABR) — the player constantly monitors download speed and switches quality in real time. Your internet slows down → player quietly drops to 480p. It speeds up → player bumps back to 1080p. No buffering, no interruption.
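A minimal sketch of the decision HLS.js makes internally: pick the highest rendition whose bitrate fits within a safety fraction of the measured bandwidth. The ladder mirrors the manifest above; `pickLevel` is an illustrative name, not an HLS.js API.

```javascript
// Bitrate ladder, matching the master.m3u8 above.
const ladder = [
  { level: 0, height: 240,  bitrate: 400_000 },
  { level: 1, height: 480,  bitrate: 1_000_000 },
  { level: 2, height: 720,  bitrate: 2_500_000 },
  { level: 3, height: 1080, bitrate: 5_000_000 },
];

// Pick the highest rendition that fits inside a safety margin of the
// measured bandwidth (0.95 matches abrBandWidthFactor above).
function pickLevel(measuredBps, safetyFactor = 0.95) {
  const budget = measuredBps * safetyFactor;
  const fit = [...ladder].reverse().find((r) => r.bitrate <= budget);
  return fit ?? ladder[0]; // fall back to the lowest rendition
}
```

On a 3 Mbps connection the budget is 2.85 Mbps, so the player lands on 720p; drop to 1 Mbps and it quietly steps down.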
4. DRM — How Content Protection Actually Works
DRM (Digital Rights Management) is the system that prevents you from downloading and redistributing Netflix content. Understanding how it actually works (not just "it encrypts the video") is what separates junior devs from senior ones.
The Three Major DRM Systems
| DRM System | Created By | Used On |
|---|---|---|
| Widevine | Google | Chrome, Android, most browsers |
| FairPlay | Apple | Safari, iOS, macOS |
| PlayReady | Microsoft | Edge, Windows, Xbox |
Netflix uses all three — which browser/device you're on determines which DRM system is used.
How DRM Actually Works (Step by Step)
Video before DRM:
[Raw segment data — anyone can play this]
Video after DRM encryption:
[Encrypted segment — looks like garbage without the key]
The key never travels with the video.
The key only comes from the License Server after authorization.
The complete DRM flow:
1. Studio delivers movie to Netflix
2. Netflix encrypts every video segment with AES-128 encryption
- A unique content key (CEK) is generated
- CEK is stored in a Key Management System (KMS) — NOT in the video
3. Encrypted segments uploaded to CDN
4. User signs up, pays for subscription
5. User clicks Play
6. Player sends License Request to DRM License Server:
{
userId: "user_123",
contentId: "movie_abc",
sessionToken: "jwt_...",
deviceFingerprint: "browser_xyz"
}
7. License Server checks:
✓ Is user subscribed?
✓ Is this content available in user's region?
✓ Is device authorized?
✓ Has concurrent stream limit been exceeded?
8. If all checks pass, License Server returns:
{
contentKey: "AES_KEY_encrypted_for_this_device",
expiresAt: "2026-01-01T12:00:00Z",
allowOffline: false,
hdcpRequired: true // requires HDCP-compliant monitor for 4K
}
9. Player decrypts segments in memory using the content key
(key never written to disk)
10. Decoded frames rendered to screen
(OS-level protection prevents screen capture in Widevine L1)
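The step-7 checks and step-8 response can be sketched as two plain functions. Names and fields are illustrative; a real license server builds the encrypted key blob with a DRM vendor SDK, never in application code like this.

```javascript
// Step 7 — authorization checks. Returns null when playback is allowed,
// or a denial reason string otherwise.
function authorizePlayback(user, content) {
  if (!user.subscriptionActive) return "not_subscribed";
  if (!content.regions.includes(user.region)) return "region_blocked";
  if (user.activeStreams >= user.maxStreams) return "stream_limit_exceeded";
  if (!user.deviceAuthorized) return "device_not_authorized";
  return null; // all checks passed
}

// Step 8 — license response. `contentKey` would come from the KMS,
// already encrypted for this device's CDM.
function buildLicenseResponse(contentKey, { fourK = false } = {}) {
  return {
    contentKey,
    expiresAt: new Date(Date.now() + 60 * 60 * 1000).toISOString(),
    allowOffline: false,
    hdcpRequired: fourK, // 4K playback demands an HDCP-compliant output
  };
}

// In an Express route this becomes roughly:
//   const denial = authorizePlayback(req.user, content);
//   if (denial) return res.status(403).json({ error: denial });
//   res.json(buildLicenseResponse(keyFromKms));
```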
DRM Security Levels — Why Netflix is 1080p on Chrome but 4K on the App
Widevine Security Levels:
L1 — Highest security
- Decryption happens in hardware (TEE — Trusted Execution Environment)
- Screen capture blocked at hardware level
- 4K + HDR content allowed
- Available on: Android apps, Netflix Windows app, Chromecast
L2 — Medium security
- Rarely used
L3 — Software only
- Decryption in software (can be intercepted in theory)
- Netflix limits to 1080p
- Available on: Chrome, Firefox, Edge browsers
This is why you can watch 4K on the Netflix app but the same show is only 1080p in Chrome. It's not a technical limitation — it's a content protection policy based on security level.
What DRM Cannot Prevent
Here's the honest truth that every developer should know:
| DRM prevents | DRM cannot prevent |
|---|---|
| ✅ Direct file download | ❌ Screen recording (at L3) |
| ✅ HTTP segment download | ❌ Analog capture (HDMI splitter) |
| ✅ URL sharing | ❌ Camcorder pointed at screen |
| ✅ Unauthorized players | ❌ A determined attacker with hardware |
DRM's real goal is not to make piracy impossible.
It's to make piracy hard enough that casual users don't bother,
and to satisfy the legal requirements of content studios.
DRM Frontend Integration
// Encrypted Media Extensions (EME) — the browser API for DRM
const video = document.querySelector("video");
// Configuration tells browser which DRM systems you support
const config = [{
initDataTypes: ["cenc"],
videoCapabilities: [{
contentType: 'video/mp4; codecs="avc1.42E01E"',
robustness: "SW_SECURE_CRYPTO", // Widevine L3 (browser)
}],
}];
// Request access to DRM system
const access = await navigator.requestMediaKeySystemAccess(
"com.widevine.alpha", // Widevine
config
);
const mediaKeys = await access.createMediaKeys();
await video.setMediaKeys(mediaKeys);
// When video encounters encrypted segments, this fires
video.addEventListener("encrypted", async (event) => {
const session = mediaKeys.createSession();
// Send license request to YOUR license server
session.addEventListener("message", async (msgEvent) => {
const licenseResponse = await fetch("/api/drm/license", {
method: "POST",
headers: {
"Content-Type": "application/octet-stream",
"Authorization": `Bearer ${userToken}`,
},
body: msgEvent.message, // Widevine license request blob
});
const license = await licenseResponse.arrayBuffer();
await session.update(license); // Give key to player
});
await session.generateRequest(event.initDataType, event.initData);
});
In practice, you would use Shaka Player or Video.js with DRM plugins rather than writing EME code directly — but understanding what happens underneath is what counts in an interview.
5. CDN — Why Playback Feels Instant
The Geography Problem
Netflix's servers are in the US. You're in Mumbai. Without a CDN, every video request crosses the Pacific Ocean — adding ~300ms of network latency to every segment fetch. With 6-second segments, that adds noticeable buffering.
How CDN Works
Without CDN:
User in Mumbai → Request → Netflix Server (Virginia, USA)
~300ms round trip per request
Video starts after 3-5 seconds
With CDN:
User in Mumbai → Request → CDN Edge Server (Mumbai)
~10ms round trip
Video starts in < 1 second
How CDN caches video segments:
First user in Mumbai plays Movie X (720p, segment 001):
1. Player requests segment_001.ts from CDN
2. CDN edge (Mumbai) doesn't have it → fetches from Netflix origin
3. CDN caches segment_001.ts in Mumbai edge
4. Delivers to user
1000th user in Mumbai plays Movie X (720p, segment 001):
1. Player requests segment_001.ts from CDN
2. CDN edge (Mumbai) has it cached → delivers instantly
3. Zero load on Netflix origin servers
Popular video segments (new releases, trending shows) are cached at hundreds of edge locations worldwide. Netflix goes further: it runs its own CDN, Open Connect, and pre-positions content it predicts will be popular, pushing it to edge appliances inside ISP networks before release day.
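A toy model of that edge-cache behaviour, assuming a synchronous `fetchFromOrigin` callback: the first request for a segment misses and fills the cache, and every later request hits locally.

```javascript
// Minimal cache-aside model of a CDN edge node.
class EdgeCache {
  constructor(fetchFromOrigin) {
    this.store = new Map();
    this.fetchFromOrigin = fetchFromOrigin; // slow cross-ocean fetch
    this.hits = 0;
    this.misses = 0;
  }

  get(key) {
    if (this.store.has(key)) {
      this.hits++;                       // served from the edge
      return this.store.get(key);
    }
    this.misses++;                       // fetch from origin, then cache
    const data = this.fetchFromOrigin(key);
    this.store.set(key, data);
    return data;
  }
}
```

With a popular title, thousands of viewers share the single origin fetch, which is why origin load stays near zero.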
CDN Cache-Control headers for video:
Cache-Control: public, max-age=86400 # Static segments → cache 24 hours
Cache-Control: no-cache # Manifest (.m3u8) → always re-fetch
(because quality levels may change)
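The same rule as a small routing helper (a sketch; the `immutable` directive is an optional extra that is safe because published segments never change):

```javascript
// Decide the Cache-Control header for a video asset path.
function cacheHeaderFor(path) {
  if (path.endsWith(".ts") || path.endsWith(".m4s")) {
    // Segments are immutable once published: cache for 24 hours.
    return "public, max-age=86400, immutable";
  }
  if (path.endsWith(".m3u8")) {
    // Manifests can change (new quality levels, live updates).
    return "no-cache";
  }
  return "no-store"; // anything else: don't cache
}
```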
6. WebRTC — Real-Time Video from Scratch
Why WebRTC Was Invented
Before WebRTC (pre-2012), doing a video call in a browser required Flash or a plugin. Video data traveled: User A → Server → User B. This added latency at both hops and put enormous load on servers.
WebRTC was designed to enable browser-to-browser direct video/audio with no plugins and no server in the data path. The result: sub-100ms latency, no server bandwidth costs for video data, and true peer-to-peer communication.
How WebRTC Establishes a Connection (Step by Step)
The tricky part: how do two browsers find each other on the internet? They're both behind NAT (your home router). They don't know each other's public IP. This is what makes WebRTC setup complex.
The connection process (called "signaling"):
User A (wants to call User B)
↓
Step 1: Create RTCPeerConnection
Step 2: Get local media (camera/mic)
Step 3: Create an "offer" (SDP — Session Description Protocol)
The offer says: "I can send/receive video in these formats..."
Step 4: Send offer to User B via signaling server (Socket.IO)
User B receives offer
↓
Step 5: Create RTCPeerConnection
Step 6: Set A's offer as remote description
Step 7: Create an "answer" (their own SDP)
Step 8: Send answer back to User A via signaling server
Both users exchange ICE candidates
↓
Step 9: ICE (Interactive Connectivity Establishment) finds the best path:
Option 1: Direct P2P (if no NAT issues) — best
Option 2: Via STUN server (discover public IP) — good
Option 3: Via TURN server (relay) — fallback
Step 10: Connection established — video flows directly peer-to-peer
Real WebRTC Code
// ============================================
// Complete WebRTC video call implementation
// ============================================
const servers = {
iceServers: [
{ urls: "stun:stun.l.google.com:19302" }, // STUN: discover public IP
{
urls: "turn:turn.yourserver.com:3478", // TURN: relay fallback
username: "user",
credential: "password",
},
],
};
let localStream;
let peerConnection;
// Step 1 — Get camera and microphone
const startCall = async () => {
localStream = await navigator.mediaDevices.getUserMedia({
video: { width: 1280, height: 720 },
audio: true,
});
document.getElementById("localVideo").srcObject = localStream;
// Step 2 — Create peer connection
peerConnection = new RTCPeerConnection(servers);
// Add local stream tracks to peer connection
localStream.getTracks().forEach((track) => {
peerConnection.addTrack(track, localStream);
});
// Step 3 — When remote stream arrives, display it
peerConnection.ontrack = (event) => {
document.getElementById("remoteVideo").srcObject = event.streams[0];
};
// Step 4 — Send ICE candidates to remote peer via signaling
peerConnection.onicecandidate = (event) => {
if (event.candidate) {
socket.emit("ice-candidate", {
roomId,
candidate: event.candidate,
});
}
};
// Step 5 — Create and send offer
const offer = await peerConnection.createOffer();
await peerConnection.setLocalDescription(offer);
socket.emit("offer", { roomId, offer });
};
// ── On the receiving side ──────────────────────
socket.on("offer", async ({ offer }) => {
peerConnection = new RTCPeerConnection(servers);
// Set up the same event handlers...
peerConnection.ontrack = (event) => {
document.getElementById("remoteVideo").srcObject = event.streams[0];
};
peerConnection.onicecandidate = (event) => {
if (event.candidate) {
socket.emit("ice-candidate", { roomId, candidate: event.candidate });
}
};
// Get local media and add tracks
localStream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
localStream.getTracks().forEach(track => peerConnection.addTrack(track, localStream));
// Set remote description from offer
await peerConnection.setRemoteDescription(offer);
// Create answer
const answer = await peerConnection.createAnswer();
await peerConnection.setLocalDescription(answer);
socket.emit("answer", { roomId, answer });
});
socket.on("answer", async ({ answer }) => {
await peerConnection.setRemoteDescription(answer);
});
socket.on("ice-candidate", async ({ candidate }) => {
await peerConnection.addIceCandidate(candidate);
});
STUN vs TURN — Understanding NAT Traversal
STUN Server (Session Traversal Utilities for NAT):
- Tells you your own public IP address
- Your router hides your real IP — STUN reveals it
- Used 80% of the time for P2P connections
- Cheap — almost no bandwidth cost
TURN Server (Traversal Using Relays around NAT):
- Full relay server — all data goes through it
- Used when P2P fails (symmetric NAT, strict firewalls)
- Expensive — you pay for all video bandwidth
- Used ~20% of the time
- Without TURN: ~15% of calls fail to connect
Real ICE process:
1. Try direct P2P → success 40% of the time
2. Try STUN → success 40% more
3. Fall back to TURN → handles remaining 20%
7. WebRTC vs Socket.IO — When to Use What
This is a common interview question and source of confusion. They solve completely different problems.
Socket.IO — Server-based real-time messaging
How Socket.IO works:
User A → Server → User B
↑ ↓
Message Message
All data goes through your server. Always.
// Socket.IO — perfect for chat, notifications, presence
// Server
io.on("connection", (socket) => {
socket.on("message", (data) => {
io.to(data.room).emit("message", data); // relay to room
});
});
// Client
socket.emit("message", { room: "general", text: "Hello!" });
socket.on("message", (data) => {
displayMessage(data);
});
WebRTC — Peer-to-peer binary data (video/audio/files)
How WebRTC works:
User A ←──────────────→ User B
Direct connection
No server in video path
// WebRTC — perfect for video calls, file sharing
// Video travels directly browser-to-browser
// Server is only needed for initial handshake (signaling)
The Real Comparison
| Feature | Socket.IO | WebRTC |
|---|---|---|
| Data path | Client → Server → Client | Client → Client (direct) |
| Latency | 50–200ms | 20–100ms |
| Video/audio | ❌ Not designed for it | ✅ Built for it |
| Text/events | ✅ Perfect | ✅ Also possible (DataChannel) |
| Server cost | High (all data through server) | Low (only signaling) |
| Setup complexity | Simple | Complex |
| Scales to millions | Hard without clustering | Harder (need SFU) |
| Firewall issues | Rare | TURN needed sometimes |
When to use each
Use Socket.IO for:
✅ Chat messages
✅ Notifications ("3 people liked your post")
✅ Collaborative cursors (Google Docs style)
✅ Live dashboards (stock prices, sports scores)
✅ Game state synchronization (turn-based)
✅ WebRTC signaling itself!
Use WebRTC for:
✅ Video calls (Google Meet, Zoom)
✅ Voice calls
✅ Live streaming with < 500ms latency
✅ Screen sharing
✅ Real-time file transfer
✅ Online gaming with low-latency requirements
The trick: Many production apps use BOTH.
Socket.IO handles signaling for WebRTC + chat/notifications.
WebRTC handles video/audio stream.
8. LiveKit — Why You Should Not Build Raw WebRTC
The Problem with Raw WebRTC at Scale
Raw WebRTC is P2P — it works great for 2 people. But what happens in a video call with 10 people?
Naive P2P Mesh approach (10 users):
User 1 sends video to → Users 2, 3, 4, 5, 6, 7, 8, 9, 10 (9 streams)
User 2 sends video to → Users 1, 3, 4, 5, 6, 7, 8, 9, 10 (9 streams)
...
Total connections: n(n-1)/2 = 45 connections
Each user uploads 9 streams simultaneously.
On a typical 10 Mbps upload connection: 9 × 1 Mbps video = 9 Mbps used.
This alone saturates the user's upload bandwidth.
At 20 users (19 outgoing streams each) it becomes impossible.
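The arithmetic above as a quick sanity check (1 Mbps per stream, matching the example; numbers are illustrative):

```javascript
// P2P mesh: every pair is connected, every user uploads to everyone else.
function meshStats(users, streamMbps = 1) {
  return {
    connections: (users * (users - 1)) / 2,        // unique pairs
    uploadPerUserMbps: (users - 1) * streamMbps,   // one stream per peer
  };
}

// SFU: each user uploads once; the SFU fans out to all subscribers.
function sfuStats(users, streamMbps = 1) {
  return {
    uploadPerUserMbps: streamMbps,                       // one stream to the SFU
    sfuEgressMbps: users * (users - 1) * streamMbps,     // SFU pays the fan-out
  };
}
```

For 10 users: 45 mesh connections and 9 Mbps of upload per user, versus 1 Mbps per user with an SFU. The fan-out cost doesn't disappear, it moves to the SFU's (much larger) pipe.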
The Solution: SFU (Selective Forwarding Unit)
An SFU is a media server that sits between participants and intelligently forwards streams:
With SFU (LiveKit):
User 1 → SFU → User 2
→ User 3
→ User 4
→ ...
User 1 only uploads ONE stream (to SFU)
SFU forwards it to everyone else
User 1's upload: 1 × 1 Mbps = 1 Mbps — sustainable at any scale
SFU benefits:
✅ Each user uploads only 1 stream
✅ SFU can selectively forward (only send video of visible participants)
✅ SFU can transcode for different quality subscribers
✅ Recording is handled at the SFU (not the client)
✅ Simulcast: user uploads 3 quality levels, SFU picks the right one per subscriber
Why LiveKit Specifically
Building your own SFU from scratch is a months-long project. LiveKit gives you:
LiveKit provides out of the box:
✅ SFU infrastructure
✅ React/React Native SDKs
✅ Recording and egress (stream to RTMP/YouTube/Twitch)
✅ End-to-end encryption
✅ Simulcast support
✅ Screen sharing
✅ Data channels (for chat alongside video)
✅ Kubernetes-ready deployment
✅ Self-hostable (open source) or cloud service
LiveKit in a React App
// Install: npm install @livekit/components-react livekit-client
import { useEffect, useState } from "react";
import {
LiveKitRoom,
VideoConference,
RoomAudioRenderer,
} from "@livekit/components-react";
import "@livekit/components-styles";
// That's it — complete video conference UI in ~15 lines
const VideoCall = ({ roomName, userId }) => {
const [token, setToken] = useState("");
useEffect(() => {
// Fetch a room token from YOUR backend
fetch(`/api/livekit/token?room=${roomName}&user=${userId}`)
.then(res => res.json())
.then(data => setToken(data.token));
}, [roomName, userId]);
if (!token) return <div>Connecting...</div>;
return (
<LiveKitRoom
video={true}
audio={true}
token={token}
serverUrl={process.env.REACT_APP_LIVEKIT_URL}
style={{ height: "100vh" }}
>
<VideoConference />
<RoomAudioRenderer />
</LiveKitRoom>
);
};
// Backend: Generate a room access token (Node.js)
const { AccessToken } = require("livekit-server-sdk");
app.get("/api/livekit/token", (req, res) => {
const { room, user } = req.query;
const token = new AccessToken(
process.env.LIVEKIT_API_KEY,
process.env.LIVEKIT_API_SECRET,
{ identity: user }
);
token.addGrant({
roomJoin: true,
room: room,
canPublish: true,
canSubscribe: true,
});
res.json({ token: token.toJwt() }); // note: in livekit-server-sdk v2, toJwt() is async (await it there)
});
WebRTC vs LiveKit Decision
Use raw WebRTC when:
✅ 1-to-1 calls only
✅ You want zero infrastructure cost
✅ Simplest possible setup
✅ Learning/experimenting
Use LiveKit when:
✅ 3+ participants in a call
✅ You need recording
✅ You need live streaming egress (RTMP to YouTube)
✅ You need transcription/AI integration
✅ Production application at any real scale
9. Mini Netflix System Design (Interview Ready)
This is how you answer "Design Netflix" or "Design a video streaming platform" in a system design interview.
Clarify Requirements First (Always do this)
Functional Requirements:
✅ Users can watch videos
✅ Multiple video quality levels
✅ Adaptive quality (auto-switch based on connection)
✅ Resume playback from where user left off
✅ Content protection (paid content)
✅ Search and browse catalog
✅ Recommendations based on watch history
Non-Functional Requirements:
✅ Scale: 100 million daily active users
✅ Availability: 99.99% uptime
✅ Latency: Video starts playing within 2 seconds
✅ Global: Users worldwide
High-Level Architecture
┌─────────────────┐
│ Load Balancer │
└────────┬────────┘
│
┌──────────────────┼──────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Auth API │ │ Metadata API│ │ License API │
│ Service │ │ Service │ │ Service(DRM)│
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
┌──────▼───────┐ ┌──────▼───────┐ ┌──────▼───────┐
│ User DB │ │ Video DB │ │ Key Mgmt │
│ (PostgreSQL) │ │ (PostgreSQL/ │ │ System │
└──────────────┘ │ Cassandra) │ └──────────────┘
└──────────────┘
┌──────────────────────────────────────────────────┐
│ Upload Pipeline │
│ Upload Service → Message Queue → Encoding Worker│
│ (FFmpeg on EC2) │
└──────────────────────────┬───────────────────────┘
│
┌────────▼────────┐
│ S3 Storage │
│ (encoded files) │
└────────┬────────┘
│
┌────────▼────────┐
│ CDN (Akamai / │
│ Cloudflare) │
└────────┬────────┘
│
┌────────▼────────┐
│ Video Player │
│ (HLS + EME/DRM) │
└─────────────────┘
Database Design
-- Users table
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
plan VARCHAR(50) DEFAULT 'basic', -- basic, standard, premium
created_at TIMESTAMP DEFAULT NOW()
);
-- Videos table (metadata only — actual files in S3)
CREATE TABLE videos (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title VARCHAR(500) NOT NULL,
description TEXT,
duration_seconds INTEGER,
genres TEXT[], -- PostgreSQL array
status VARCHAR(50) DEFAULT 'processing', -- processing, ready, failed
content_id VARCHAR(255), -- DRM content ID
created_at TIMESTAMP DEFAULT NOW()
);
-- Video files (multiple quality levels per video)
CREATE TABLE video_files (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
video_id UUID REFERENCES videos(id),
quality VARCHAR(20), -- 240p, 480p, 720p, 1080p, 4k
cdn_url VARCHAR(1000), -- HLS manifest URL
file_size_bytes BIGINT,
created_at TIMESTAMP DEFAULT NOW()
);
-- Watch history (for resume playback)
CREATE TABLE watch_history (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES users(id),
video_id UUID REFERENCES videos(id),
position_seconds INTEGER DEFAULT 0, -- resume position
completed BOOLEAN DEFAULT FALSE,
watched_at TIMESTAMP DEFAULT NOW(),
UNIQUE(user_id, video_id)
);
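The `UNIQUE(user_id, video_id)` constraint lets the progress write be a single upsert instead of a select-then-update. A sketch that builds the query for a driver like node-postgres (the function name is illustrative):

```javascript
// Build a parameterized upsert against the watch_history table above.
// ON CONFLICT relies on the UNIQUE(user_id, video_id) constraint.
function saveProgressQuery(userId, videoId, positionSeconds) {
  return {
    text: `
      INSERT INTO watch_history (user_id, video_id, position_seconds)
      VALUES ($1, $2, $3)
      ON CONFLICT (user_id, video_id)
      DO UPDATE SET position_seconds = EXCLUDED.position_seconds,
                    watched_at = NOW()`,
    values: [userId, videoId, positionSeconds],
  };
}

// Usage with node-postgres would be roughly:
//   await pool.query(saveProgressQuery(userId, videoId, 754));
```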
Video Upload & Processing Flow
// 1. Upload Service — handles multipart upload to S3
app.post("/api/upload", async (req, res) => {
const { title, description } = req.body;
// Create video record in DB
const video = await db.video.create({
data: { title, description, status: "processing" }
});
// Get pre-signed S3 URL (client uploads directly to S3, bypassing our server)
const uploadUrl = await s3.getSignedUploadUrl({
key: `raw/${video.id}/original.mp4`,
contentType: "video/mp4",
expiresIn: 3600,
});
// Publish encoding job to queue
await messageQueue.publish("encoding-jobs", {
videoId: video.id,
s3Key: `raw/${video.id}/original.mp4`,
});
res.json({ videoId: video.id, uploadUrl });
});
// 2. Encoding Worker (separate service, runs on powerful EC2)
messageQueue.consume("encoding-jobs", async (job) => {
const { videoId, s3Key } = job;
// Download from S3
await s3.download(s3Key, "/tmp/original.mp4");
// Encode to multiple qualities using FFmpeg
const qualities = ["240p", "480p", "720p", "1080p"];
for (const quality of qualities) {
await ffmpeg.encode({
input: "/tmp/original.mp4",
output: `/tmp/${quality}/`,
quality,
format: "hls",
segmentDuration: 6,
});
// Apply DRM encryption to segments
await drm.encryptHls({
inputDir: `/tmp/${quality}/`,
contentId: videoId,
});
// Upload encrypted segments to S3
await s3.uploadDirectory(`/tmp/${quality}/`, `videos/${videoId}/${quality}/`);
}
// Update video status
await db.video.update({ id: videoId, status: "ready" });
});
Full Playback Flow (Backend + Frontend)
// Backend — Play endpoint
app.get("/api/videos/:id/play", authenticate, async (req, res) => {
const video = await db.video.findById(req.params.id);
// Check subscription for premium content
if (video.isPremium && req.user.plan === "basic") {
return res.status(403).json({ error: "Upgrade to watch this content" });
}
// Get resume position
const history = await db.watchHistory.findOne({
userId: req.user.id,
videoId: video.id,
});
// Return playback info
res.json({
manifestUrl: `https://cdn.netflix.com/videos/${video.id}/master.m3u8`,
licenseServerUrl: `https://license.netflix.com/widevine`,
resumePosition: history?.positionSeconds ?? 0,
contentId: video.contentId,
});
});
// Frontend — Video Player Component
const VideoPlayer = ({ videoId }) => {
const playerRef = useRef(null);
const hlsRef = useRef(null);
useEffect(() => {
const initPlayer = async () => {
// Get playback info from API
const { manifestUrl, licenseServerUrl, resumePosition, contentId } =
await fetch(`/api/videos/${videoId}/play`).then(r => r.json());
const video = playerRef.current;
// Initialize HLS.js
const hls = new Hls({ autoLevelEnabled: true });
hlsRef.current = hls;
hls.loadSource(manifestUrl);
hls.attachMedia(video);
// Set up DRM (Widevine)
video.addEventListener("encrypted", async (event) => {
if (!video.mediaKeys) {
const access = await navigator.requestMediaKeySystemAccess(
"com.widevine.alpha",
[{ initDataTypes: ["cenc"], videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1"' }] }]
);
const mediaKeys = await access.createMediaKeys();
await video.setMediaKeys(mediaKeys);
}
const session = video.mediaKeys.createSession();
session.addEventListener("message", async (msg) => {
const license = await fetch(licenseServerUrl, {
method: "POST",
headers: { Authorization: `Bearer ${userToken}` },
body: msg.message,
}).then(r => r.arrayBuffer());
await session.update(license);
});
await session.generateRequest(event.initDataType, event.initData);
});
hls.on(Hls.Events.MANIFEST_PARSED, () => {
video.currentTime = resumePosition; // resume where user left off
video.play();
});
};
initPlayer();
// Save position every 5 seconds
const saveInterval = setInterval(() => {
if (playerRef.current) {
fetch(`/api/videos/${videoId}/progress`, {
method: "POST",
body: JSON.stringify({ position: Math.floor(playerRef.current.currentTime) }),
});
}
}, 5000);
return () => {
hlsRef.current?.destroy();
clearInterval(saveInterval);
};
}, [videoId]);
return (
<video
ref={playerRef}
controls
style={{ width: "100%", maxHeight: "80vh" }}
/>
);
};
Scaling Strategy
Handling 100 million daily users:
1. CDN handles ~95% of video traffic
- No origin servers involved in most plays
- Cloudflare / Akamai has edge nodes everywhere
2. Horizontal scaling for APIs
- Stateless services behind load balancer
- Auto-scaling groups (more traffic → more instances)
3. Database scaling
- Read replicas for metadata queries
- Redis caching for popular video metadata
- Cassandra/DynamoDB for watch history (write-heavy, needs sharding)
4. Async encoding
- Message queue (SQS/Kafka) for encoding jobs
- Encoding workers scale independently
- Slow encoding doesn't block uploads
5. Microservices
- Upload Service
- Encoding Service
- Streaming Service (CDN management)
- License Service (DRM)
- Recommendation Service
- Auth Service
Each scales independently based on its own load.
10. Real Challenges Engineers Face
Challenge 1 — Buffering Under Poor Networks
Problem: User has 2 Mbps connection. 1080p requires 5 Mbps.
Solution stack:
1. Adaptive Bitrate (ABR) — auto-switch to 480p
2. Pre-buffering — download 3 segments ahead
3. CDN edge selection — pick nearest server
4. Connection quality prediction — use network API to detect before buffering
// Browser Network Information API
const connection = navigator.connection;
if (connection.downlink < 2) {
// Proactively cap quality at 480p
hls.currentLevel = 1; // Index of 480p in levels array
}
Challenge 2 — DRM Integration Complexity
Problem: Three different DRM systems for three different browsers.
Solution — Multi-DRM with a single abstraction:
Shaka Player handles DRM detection automatically:
const player = new shaka.Player(videoElement);
await player.configure({
drm: {
servers: {
"com.widevine.alpha": "https://license.example.com/widevine", // Chrome
"com.apple.fps": "https://license.example.com/fairplay", // Safari
"com.microsoft.playready": "https://license.example.com/playready" // Edge
}
}
});
// Shaka picks the right DRM for the current browser automatically
await player.load("https://cdn.example.com/video/master.m3u8");
Challenge 3 — WebRTC ICE Failure (Calls Don't Connect)
Problem: ~15% of WebRTC calls fail to connect without TURN.
Root cause: Symmetric NAT — user's router blocks incoming P2P connections.
Solution:
1. Deploy TURN server (coturn is open source)
2. Use paid TURN providers (Twilio STUN/TURN, Xirsys)
3. Budget: TURN bandwidth = all video passing through it (~$0.10/GB)
Detection:
peerConnection.oniceconnectionstatechange = () => {
if (peerConnection.iceConnectionState === "failed") {
// restartIce() re-runs candidate gathering. To force a TURN-only
// retry, recreate the connection with iceTransportPolicy: "relay".
peerConnection.restartIce();
}
};
Challenge 4 — Storage Costs
Problem: 1 hour of raw 4K video = ~50GB
1 movie encoded to all qualities = ~15GB
Netflix catalog = 15,000+ titles
Math:
15,000 titles × 15GB = 225TB just for video files
Plus CDN cache across 200+ edge locations...
Solutions:
1. S3 Intelligent-Tiering — automatically move rarely-accessed content to cheaper storage
2. Compress older content more aggressively
3. Delete source files after encoding (only keep encoded versions)
4. Use per-title encoding — Netflix uses ML to find optimal bitrate per-scene
(action scenes need higher bitrate; static scenes can use lower)
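The catalog math above as a one-liner (decimal TB; 15 GB per encoded title matches the figure given, and the numbers are illustrative):

```javascript
// Catalog storage estimate: titles × encoded size per title, in TB.
function catalogStorageTB(titles, gbPerTitle) {
  return (titles * gbPerTitle) / 1000;
}
```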
11. Architecture Decision Guide
"Which architecture do I need?"
Is it real-time (< 500ms latency)?
YES → WebRTC (+ LiveKit if 3+ people)
NO ↓
Is it live streaming (one-to-many, < 5s delay acceptable)?
YES → RTMP ingest + HLS output + CDN
NO ↓
Is it on-demand video (Netflix type)?
YES → HLS + CDN + DRM + Adaptive Bitrate
NO ↓
Is it just real-time events/messages?
YES → Socket.IO / WebSockets
"Do I need DRM?"
Is this paid premium content? → YES → DRM required
Could redistribution harm business? → YES → DRM required
Is this free/public content? → NO → DRM optional
Is this internal/corporate video? → NO → DRM probably not needed
"WebRTC or LiveKit?"
Is it 1-to-1 call only? → Raw WebRTC is fine
Is it 3+ participants? → LiveKit / SFU required
Do you need recording? → LiveKit
Do you need RTMP output? → LiveKit
Is this production? > 100 users? → LiveKit
Are you learning/prototyping? → Raw WebRTC to understand concepts
12. Interview-Ready Answers
Q: How does Netflix serve video to millions of users without buffering?
Netflix uses a combination of adaptive bitrate streaming (HLS/DASH), a global CDN that caches video segments at edge servers worldwide, and DRM-encrypted content delivery. When a user plays a video, the player fetches a manifest file listing available quality levels, then the HLS.js player monitors bandwidth and automatically switches quality. The CDN serves segments from the nearest edge node — often less than 10ms away — while the origin servers only need to serve the small percentage of requests that miss the CDN cache.
Q: How does DRM work and why can't it be bypassed?
DRM encrypts video segments using AES-128. The encryption key is never included with the video — it only comes from a License Server after verifying the user's subscription and device authorization. In browsers, the decryption happens inside the browser's Content Decryption Module (CDM), a closed-source component that protects the key in a sandbox. At the hardware level (Widevine L1), decryption happens inside a Trusted Execution Environment where even the OS cannot access the key. However, DRM isn't foolproof — screen recording is possible at the software level (L3), and analog capture is always possible. DRM's goal is to raise the effort bar high enough for casual users and to satisfy legal studio requirements.
Q: Why use WebRTC instead of WebSockets for video calls?
WebSockets route all data through a server, which adds latency and creates enormous server bandwidth costs for video. WebRTC establishes a direct peer-to-peer connection between browsers using ICE/STUN/TURN for NAT traversal, so video data travels directly between users without touching a server. This reduces latency from 100–200ms to 20–100ms and eliminates server bandwidth costs for the video stream. The server (using Socket.IO) is still used for signaling — exchanging the SDP offer/answer and ICE candidates — but once the connection is established, all media flows peer-to-peer.
Q: What is an SFU and why does LiveKit use one?
In a multi-user video call, pure peer-to-peer creates an N×(N-1) mesh — with 10 users, each user uploads 9 video streams simultaneously, which saturates their upload bandwidth. An SFU (Selective Forwarding Unit) is a media server where each user uploads one stream to the SFU, and the SFU forwards it to all subscribers. This reduces each user's upload requirement from N-1 streams to 1 stream. The SFU can also selectively forward — only sending video for visible participants — and support simulcast, where users upload multiple quality levels and the SFU sends the appropriate quality to each subscriber based on their bandwidth.
If this helped you understand the real architecture behind streaming and real-time video, drop a ❤️. Questions? Put them in the comments — especially if you're preparing for a system design interview.
Next: Low-latency live streaming architecture, how to build your own DRM license server, and debugging WebRTC connection failures.