What you'll learn: How Netflix streams video to 230 million users without buffering. How DRM actually protects content (and why it's still not perfect). How WebRTC works peer-to-peer. Why you need LiveKit instead of raw WebRTC. And how to design all of it in a system design interview. Written in plain English with real architecture decisions.
Table of Contents
- The Core Problem — Why All This Exists
- Netflix End-to-End Pipeline
- Video Encoding & Adaptive Streaming
- DRM — How Content Protection Actually Works
- CDN — Why Playback Feels Instant
- WebRTC — Real-Time Video from Scratch
- WebRTC vs Socket.IO — When to Use What
- LiveKit — Why You Should Not Build Raw WebRTC
- Mini Netflix System Design (Interview Ready)
- Real Challenges Engineers Face
- Architecture Decision Guide
- Interview-Ready Answers
1. The Core Problem — Why All This Exists
Every piece of technology in this article exists to solve a specific, painful problem.
What "just serving a video file" looks like
Imagine you're YouTube in 2005. You put a .mp4 file on a server. Users download it. Simple.
Now scale to 230 million users across every device, internet speed, and geography. Suddenly you have:
| Problem | What happens without a solution |
|---|---|
| High latency | Users wait 10–30 seconds before video starts |
| Bandwidth variation | HD video freezes on slow connections |
| Piracy | Anyone can download and redistribute your paid content |
| Global delivery | A server in the US is slow for users in India |
| Real-time interaction | Standard HTTP can't do live two-way video |
| Scale | One server cannot handle a million concurrent viewers |
The solutions:
- Adaptive Streaming (HLS/DASH) → handles bandwidth variation
- CDN → solves global delivery and scale
- DRM → solves piracy
- WebRTC → solves real-time interaction
- LiveKit / SFU → solves WebRTC at scale
Each technology solves exactly one problem. That's why they all exist together in a modern streaming system.
2. Netflix End-to-End Pipeline
Before diving into individual components, here's the complete journey a video takes from creation to your screen:
Creator uploads raw video
↓
[Encoding Service]
Convert to multiple resolutions (240p → 4K)
Convert to streaming formats (HLS, DASH)
↓
[DRM Service]
Encrypt every video segment
Store encryption keys in Key Management System
↓
[Object Storage — S3]
Store all encoded, encrypted segments
↓
[CDN — Cloudflare / Akamai]
Cache segments at edge servers globally
↓
[User clicks Play]
↓
[Backend API]
Authenticate user
Return CDN URL for video
Return license server URL
↓
[Video Player]
Fetch .m3u8 manifest (HLS) from CDN
Request DRM license (decryption key)
Fetch encrypted video segments from CDN
Decrypt segments → decode → render frames
↓
[User watches video]
Every step in this pipeline is solving a specific problem. Let's break each one down.
3. Video Encoding & Adaptive Streaming
Why raw video cannot be served directly
A raw 4K video file is enormous — often 50–100GB for a 2-hour movie. You cannot stream that over the internet. You need to:
- Compress it — reduce file size while keeping acceptable quality
- Create multiple versions — serve different quality based on available bandwidth
- Chunk it — break it into small segments so streaming can start immediately
How encoding works
Tools like FFmpeg take the raw video and produce multiple renditions:
# FFmpeg command to create multiple quality levels from one source
ffmpeg -i original_movie.mp4 \
-vf scale=426:240 -b:v 400k output_240p.mp4 \
-vf scale=854:480 -b:v 1000k output_480p.mp4 \
-vf scale=1280:720 -b:v 2500k output_720p.mp4 \
-vf scale=1920:1080 -b:v 5000k output_1080p.mp4
Each resolution is then cut into 2–6 second segments — small chunks that can be fetched independently.
HLS (HTTP Live Streaming) — The format that makes Netflix work
HLS is Apple's streaming protocol, now universally supported. It works like this:
Step 1 — The manifest file (.m3u8)
#EXTM3U
#EXT-X-VERSION:3
# Available quality levels
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=426x240
/video/240p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
/video/480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
/video/720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
/video/1080p/playlist.m3u8
Step 2 — Quality-specific playlist
#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-VERSION:3
#EXTINF:6.0,
/video/720p/segment_001.ts
#EXTINF:6.0,
/video/720p/segment_002.ts
#EXTINF:6.0,
/video/720p/segment_003.ts
Step 3 — The player reads bandwidth and switches quality
// This is what the video player does internally
// HLS.js does this automatically for you
const hls = new Hls({
startLevel: -1, // -1 means auto-detect best starting quality
abrBandWidthFactor: 0.95, // use 95% of measured bandwidth for safety margin
});
// Note: there is no `autoLevelEnabled` config option. ABR is on by
// default, and hls.autoLevelEnabled is a read-only property that
// reports whether auto switching is currently active.
hls.loadSource("https://cdn.netflix.com/videos/movie123/master.m3u8");
hls.attachMedia(videoElement);
hls.on(Hls.Events.LEVEL_SWITCHED, (event, data) => {
console.log(`Quality switched to level ${data.level}`);
// Level 0 = 240p, Level 1 = 480p, Level 2 = 720p, Level 3 = 1080p
});
This is Adaptive Bitrate Streaming (ABR) — the player constantly monitors download speed and switches quality in real time. Your internet slows down → player quietly drops to 480p. It speeds up → player bumps back to 1080p. No buffering, no interruption.
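A minimal sketch of the decision HLS.js makes internally: pick the highest rendition whose bitrate fits within a safety fraction of the measured bandwidth. The ladder mirrors the manifest above; `pickLevel` is an illustrative name, not an HLS.js API.

```javascript
// Bitrate ladder, matching the master.m3u8 above.
const ladder = [
  { level: 0, height: 240,  bitrate: 400_000 },
  { level: 1, height: 480,  bitrate: 1_000_000 },
  { level: 2, height: 720,  bitrate: 2_500_000 },
  { level: 3, height: 1080, bitrate: 5_000_000 },
];

// Pick the highest rendition that fits inside a safety margin of the
// measured bandwidth (0.95 matches abrBandWidthFactor above).
function pickLevel(measuredBps, safetyFactor = 0.95) {
  const budget = measuredBps * safetyFactor;
  const fit = [...ladder].reverse().find((r) => r.bitrate <= budget);
  return fit ?? ladder[0]; // fall back to the lowest rendition
}
```

On a 3 Mbps connection the budget is 2.85 Mbps, so the player lands on 720p; drop to 1 Mbps and it quietly steps down.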
4. DRM — How Content Protection Actually Works
DRM (Digital Rights Management) is the system that prevents you from downloading and redistributing Netflix content. Understanding how it actually works (not just "it encrypts the video") is what separates junior devs from senior ones.
The Three Major DRM Systems
| DRM System | Created By | Used On |
|---|---|---|
| Widevine | Google | Chrome, Android, most browsers |
| FairPlay | Apple | Safari, iOS, macOS |
| PlayReady | Microsoft | Edge, Windows, Xbox |
Netflix uses all three — which browser/device you're on determines which DRM system is used.
How DRM Actually Works (Step by Step)
Video before DRM:
[Raw segment data — anyone can play this]
Video after DRM encryption:
[Encrypted segment — looks like garbage without the key]
The key never travels with the video.
The key only comes from the License Server after authorization.
The complete DRM flow:
1. Studio delivers movie to Netflix
2. Netflix encrypts every video segment with AES-128 encryption
- A unique content key (CEK) is generated
- CEK is stored in a Key Management System (KMS) — NOT in the video
3. Encrypted segments uploaded to CDN
4. User signs up, pays for subscription
5. User clicks Play
6. Player sends License Request to DRM License Server:
{
userId: "user_123",
contentId: "movie_abc",
sessionToken: "jwt_...",
deviceFingerprint: "browser_xyz"
}
7. License Server checks:
✓ Is user subscribed?
✓ Is this content available in user's region?
✓ Is device authorized?
✓ Has concurrent stream limit been exceeded?
8. If all checks pass, License Server returns:
{
contentKey: "AES_KEY_encrypted_for_this_device",
expiresAt: "2026-01-01T12:00:00Z",
allowOffline: false,
hdcpRequired: true // requires HDCP-compliant monitor for 4K
}
9. Player decrypts segments in memory using the content key
(key never written to disk)
10. Decoded frames rendered to screen
(OS-level protection prevents screen capture in Widevine L1)
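The step-7 checks and step-8 response can be sketched as two plain functions. Names and fields are illustrative; a real license server builds the encrypted key blob with a DRM vendor SDK, never in application code like this.

```javascript
// Step 7 — authorization checks. Returns null when playback is allowed,
// or a denial reason string otherwise.
function authorizePlayback(user, content) {
  if (!user.subscriptionActive) return "not_subscribed";
  if (!content.regions.includes(user.region)) return "region_blocked";
  if (user.activeStreams >= user.maxStreams) return "stream_limit_exceeded";
  if (!user.deviceAuthorized) return "device_not_authorized";
  return null; // all checks passed
}

// Step 8 — license response. `contentKey` would come from the KMS,
// already encrypted for this device's CDM.
function buildLicenseResponse(contentKey, { fourK = false } = {}) {
  return {
    contentKey,
    expiresAt: new Date(Date.now() + 60 * 60 * 1000).toISOString(),
    allowOffline: false,
    hdcpRequired: fourK, // 4K playback demands an HDCP-compliant output
  };
}

// In an Express route this becomes roughly:
//   const denial = authorizePlayback(req.user, content);
//   if (denial) return res.status(403).json({ error: denial });
//   res.json(buildLicenseResponse(keyFromKms));
```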
DRM Security Levels — Why Netflix is 1080p on Chrome but 4K on the App
Widevine Security Levels:
L1 — Highest security
- Decryption happens in hardware (TEE — Trusted Execution Environment)
- Screen capture blocked at hardware level
- 4K + HDR content allowed
- Available on: Android apps, Netflix Windows app, Chromecast
L2 — Medium security
- Rarely used
L3 — Software only
- Decryption in software (can be intercepted in theory)
- Netflix limits to 1080p
- Available on: Chrome, Firefox, Edge browsers
This is why you can watch 4K on the Netflix app but the same show is only 1080p in Chrome. It's not a technical limitation — it's a content protection policy based on security level.
What DRM Cannot Prevent
Here's the honest truth that every developer should know:
| DRM prevents | DRM cannot prevent |
|---|---|
| ✅ Direct file download | ❌ Screen recording (at L3) |
| ✅ HTTP segment download | ❌ Analog capture (HDMI splitter) |
| ✅ URL sharing | ❌ Camcorder pointed at screen |
| ✅ Unauthorized players | ❌ A determined attacker with hardware |
DRM's real goal is not to make piracy impossible.
It's to make piracy hard enough that casual users don't bother,
and to satisfy the legal requirements of content studios.
DRM Frontend Integration
// Encrypted Media Extensions (EME) — the browser API for DRM
const video = document.querySelector("video");
// Configuration tells browser which DRM systems you support
const config = [{
initDataTypes: ["cenc"],
videoCapabilities: [{
contentType: 'video/mp4; codecs="avc1.42E01E"',
robustness: "SW_SECURE_CRYPTO", // Widevine L3 (browser)
}],
}];
// Request access to DRM system
const access = await navigator.requestMediaKeySystemAccess(
"com.widevine.alpha", // Widevine
config
);
const mediaKeys = await access.createMediaKeys();
await video.setMediaKeys(mediaKeys);
// When video encounters encrypted segments, this fires
video.addEventListener("encrypted", async (event) => {
const session = mediaKeys.createSession();
// Send license request to YOUR license server
session.addEventListener("message", async (msgEvent) => {
const licenseResponse = await fetch("/api/drm/license", {
method: "POST",
headers: {
"Content-Type": "application/octet-stream",
"Authorization": `Bearer ${userToken}`,
},
body: msgEvent.message, // Widevine license request blob
});
const license = await licenseResponse.arrayBuffer();
await session.update(license); // Give key to player
});
await session.generateRequest(event.initDataType, event.initData);
});
In practice, you would use Shaka Player or Video.js with DRM plugins rather than writing EME code directly — but understanding what happens underneath is what counts in an interview.
5. CDN — Why Playback Feels Instant
The Geography Problem
Netflix's servers are in the US. You're in Mumbai. Without a CDN, every video request crosses the Pacific Ocean — adding ~300ms of network latency to every segment fetch. With 6-second segments, that adds noticeable buffering.
How CDN Works
Without CDN:
User in Mumbai → Request → Netflix Server (Virginia, USA)
~300ms round trip per request
Video starts after 3-5 seconds
With CDN:
User in Mumbai → Request → CDN Edge Server (Mumbai)
~10ms round trip
Video starts in < 1 second
How CDN caches video segments:
First user in Mumbai plays Movie X (720p, segment 001):
1. Player requests segment_001.ts from CDN
2. CDN edge (Mumbai) doesn't have it → fetches from Netflix origin
3. CDN caches segment_001.ts in Mumbai edge
4. Delivers to user
1000th user in Mumbai plays Movie X (720p, segment 001):
1. Player requests segment_001.ts from CDN
2. CDN edge (Mumbai) has it cached → delivers instantly
3. Zero load on Netflix origin servers
Popular video segments (new releases, trending shows) are cached at hundreds of edge locations worldwide. Netflix goes further: it runs its own CDN, Open Connect, and pre-positions content it predicts will be popular, pushing it to edge appliances inside ISP networks before release day.
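A toy model of that edge-cache behaviour, assuming a synchronous `fetchFromOrigin` callback: the first request for a segment misses and fills the cache, and every later request hits locally.

```javascript
// Minimal cache-aside model of a CDN edge node.
class EdgeCache {
  constructor(fetchFromOrigin) {
    this.store = new Map();
    this.fetchFromOrigin = fetchFromOrigin; // slow cross-ocean fetch
    this.hits = 0;
    this.misses = 0;
  }

  get(key) {
    if (this.store.has(key)) {
      this.hits++;                       // served from the edge
      return this.store.get(key);
    }
    this.misses++;                       // fetch from origin, then cache
    const data = this.fetchFromOrigin(key);
    this.store.set(key, data);
    return data;
  }
}
```

With a popular title, thousands of viewers share the single origin fetch, which is why origin load stays near zero.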
CDN Cache-Control headers for video:
Cache-Control: public, max-age=86400 # Static segments → cache 24 hours
Cache-Control: no-cache # Manifest (.m3u8) → always re-fetch
(because quality levels may change)
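The same rule as a small routing helper (a sketch; the `immutable` directive is an optional extra that is safe because published segments never change):

```javascript
// Decide the Cache-Control header for a video asset path.
function cacheHeaderFor(path) {
  if (path.endsWith(".ts") || path.endsWith(".m4s")) {
    // Segments are immutable once published: cache for 24 hours.
    return "public, max-age=86400, immutable";
  }
  if (path.endsWith(".m3u8")) {
    // Manifests can change (new quality levels, live updates).
    return "no-cache";
  }
  return "no-store"; // anything else: don't cache
}
```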
6. WebRTC — Real-Time Video from Scratch
Why WebRTC Was Invented
Before WebRTC (pre-2012), doing a video call in a browser required Flash or a plugin. Video data traveled: User A → Server → User B. This added latency at both hops and put enormous load on servers.
WebRTC was designed to enable browser-to-browser direct video/audio with no plugins and no server in the data path. The result: sub-100ms latency, no server bandwidth costs for video data, and true peer-to-peer communication.
How WebRTC Establishes a Connection (Step by Step)
The tricky part: how do two browsers find each other on the internet? They're both behind NAT (your home router). They don't know each other's public IP. This is what makes WebRTC setup complex.
The connection process (called "signaling"):
User A (wants to call User B)
↓
Step 1: Create RTCPeerConnection
Step 2: Get local media (camera/mic)
Step 3: Create an "offer" (SDP — Session Description Protocol)
The offer says: "I can send/receive video in these formats..."
Step 4: Send offer to User B via signaling server (Socket.IO)
User B receives offer
↓
Step 5: Create RTCPeerConnection
Step 6: Set A's offer as remote description
Step 7: Create an "answer" (their own SDP)
Step 8: Send answer back to User A via signaling server
Both users exchange ICE candidates
↓
Step 9: ICE (Interactive Connectivity Establishment) finds the best path:
Option 1: Direct P2P (if no NAT issues) — best
Option 2: Via STUN server (discover public IP) — good
Option 3: Via TURN server (relay) — fallback
Step 10: Connection established — video flows directly peer-to-peer
Real WebRTC Code
// ============================================
// Complete WebRTC video call implementation
// ============================================
const servers = {
iceServers: [
{ urls: "stun:stun.l.google.com:19302" }, // STUN: discover public IP
{
urls: "turn:turn.yourserver.com:3478", // TURN: relay fallback
username: "user",
credential: "password",
},
],
};
let localStream;
let peerConnection;
// Step 1 — Get camera and microphone
const startCall = async () => {
localStream = await navigator.mediaDevices.getUserMedia({
video: { width: 1280, height: 720 },
audio: true,
});
document.getElementById("localVideo").srcObject = localStream;
// Step 2 — Create peer connection
peerConnection = new RTCPeerConnection(servers);
// Add local stream tracks to peer connection
localStream.getTracks().forEach((track) => {
peerConnection.addTrack(track, localStream);
});
// Step 3 — When remote stream arrives, display it
peerConnection.ontrack = (event) => {
document.getElementById("remoteVideo").srcObject = event.streams[0];
};
// Step 4 — Send ICE candidates to remote peer via signaling
peerConnection.onicecandidate = (event) => {
if (event.candidate) {
socket.emit("ice-candidate", {
roomId,
candidate: event.candidate,
});
}
};
// Step 5 — Create and send offer
const offer = await peerConnection.createOffer();
await peerConnection.setLocalDescription(offer);
socket.emit("offer", { roomId, offer });
};
// ── On the receiving side ──────────────────────
socket.on("offer", async ({ offer }) => {
peerConnection = new RTCPeerConnection(servers);
// Set up the same event handlers...
peerConnection.ontrack = (event) => {
document.getElementById("remoteVideo").srcObject = event.streams[0];
};
peerConnection.onicecandidate = (event) => {
if (event.candidate) {
socket.emit("ice-candidate", { roomId, candidate: event.candidate });
}
};
// Get local media and add tracks
localStream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
localStream.getTracks().forEach(track => peerConnection.addTrack(track, localStream));
// Set remote description from offer
await peerConnection.setRemoteDescription(offer);
// Create answer
const answer = await peerConnection.createAnswer();
await peerConnection.setLocalDescription(answer);
socket.emit("answer", { roomId, answer });
});
socket.on("answer", async ({ answer }) => {
await peerConnection.setRemoteDescription(answer);
});
socket.on("ice-candidate", async ({ candidate }) => {
await peerConnection.addIceCandidate(candidate);
});
STUN vs TURN — Understanding NAT Traversal
STUN Server (Session Traversal Utilities for NAT):
- Tells you your own public IP address
- Your router hides your real IP — STUN reveals it
- Used 80% of the time for P2P connections
- Cheap — almost no bandwidth cost
TURN Server (Traversal Using Relays around NAT):
- Full relay server — all data goes through it
- Used when P2P fails (symmetric NAT, strict firewalls)
- Expensive — you pay for all video bandwidth
- Used ~20% of the time
- Without TURN: ~15% of calls fail to connect
Real ICE process:
1. Try direct P2P → success 40% of the time
2. Try STUN → success 40% more
3. Fall back to TURN → handles remaining 20%
7. WebRTC vs Socket.IO — When to Use What
This is a common interview question and source of confusion. They solve completely different problems.
Socket.IO — Server-based real-time messaging
How Socket.IO works:
User A → Server → User B
↑ ↓
Message Message
All data goes through your server. Always.
// Socket.IO — perfect for chat, notifications, presence
// Server
io.on("connection", (socket) => {
socket.on("message", (data) => {
io.to(data.room).emit("message", data); // relay to room
});
});
// Client
socket.emit("message", { room: "general", text: "Hello!" });
socket.on("message", (data) => {
displayMessage(data);
});
WebRTC — Peer-to-peer binary data (video/audio/files)
How WebRTC works:
User A ←──────────────→ User B
Direct connection
No server in video path
// WebRTC — perfect for video calls, file sharing
// Video travels directly browser-to-browser
// Server is only needed for initial handshake (signaling)
The Real Comparison
| Feature | Socket.IO | WebRTC |
|---|---|---|
| Data path | Client → Server → Client | Client → Client (direct) |
| Latency | 50–200ms | 20–100ms |
| Video/audio | ❌ Not designed for it | ✅ Built for it |
| Text/events | ✅ Perfect | ✅ Also possible (DataChannel) |
| Server cost | High (all data through server) | Low (only signaling) |
| Setup complexity | Simple | Complex |
| Scales to millions | Hard without clustering | Harder (need SFU) |
| Firewall issues | Rare | TURN needed sometimes |
When to use each
Use Socket.IO for:
✅ Chat messages
✅ Notifications ("3 people liked your post")
✅ Collaborative cursors (Google Docs style)
✅ Live dashboards (stock prices, sports scores)
✅ Game state synchronization (turn-based)
✅ WebRTC signaling itself!
Use WebRTC for:
✅ Video calls (Google Meet, Zoom)
✅ Voice calls
✅ Live streaming with < 500ms latency
✅ Screen sharing
✅ Real-time file transfer
✅ Online gaming with low-latency requirements
The trick: Many production apps use BOTH.
Socket.IO handles signaling for WebRTC + chat/notifications.
WebRTC handles video/audio stream.
8. LiveKit — Why You Should Not Build Raw WebRTC
The Problem with Raw WebRTC at Scale
Raw WebRTC is P2P — it works great for 2 people. But what happens in a video call with 10 people?
Naive P2P Mesh approach (10 users):
User 1 sends video to → Users 2, 3, 4, 5, 6, 7, 8, 9, 10 (9 streams)
User 2 sends video to → Users 1, 3, 4, 5, 6, 7, 8, 9, 10 (9 streams)
...
Total connections: n(n-1)/2 = 45 connections
Each user uploads 9 streams simultaneously.
On a typical 10 Mbps upload connection: 9 × 1 Mbps video = 9 Mbps used.
This alone saturates the user's upload bandwidth.
At 20 users (19 outgoing streams each) it becomes impossible.
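The arithmetic above as a quick sanity check (1 Mbps per stream, matching the example; numbers are illustrative):

```javascript
// P2P mesh: every pair is connected, every user uploads to everyone else.
function meshStats(users, streamMbps = 1) {
  return {
    connections: (users * (users - 1)) / 2,        // unique pairs
    uploadPerUserMbps: (users - 1) * streamMbps,   // one stream per peer
  };
}

// SFU: each user uploads once; the SFU fans out to all subscribers.
function sfuStats(users, streamMbps = 1) {
  return {
    uploadPerUserMbps: streamMbps,                       // one stream to the SFU
    sfuEgressMbps: users * (users - 1) * streamMbps,     // SFU pays the fan-out
  };
}
```

For 10 users: 45 mesh connections and 9 Mbps of upload per user, versus 1 Mbps per user with an SFU. The fan-out cost doesn't disappear, it moves to the SFU's (much larger) pipe.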
The Solution: SFU (Selective Forwarding Unit)
An SFU is a media server that sits between participants and intelligently forwards streams:
With SFU (LiveKit):
User 1 → SFU → User 2
→ User 3
→ User 4
→ ...
User 1 only uploads ONE stream (to SFU)
SFU forwards it to everyone else
User 1's upload: 1 × 1 Mbps = 1 Mbps — sustainable at any scale
SFU benefits:
✅ Each user uploads only 1 stream
✅ SFU can selectively forward (only send video of visible participants)
✅ SFU can transcode for different quality subscribers
✅ Recording is handled at the SFU (not the client)
✅ Simulcast: user uploads 3 quality levels, SFU picks the right one per subscriber
Why LiveKit Specifically
Building your own SFU from scratch is a months-long project. LiveKit gives you:
LiveKit provides out of the box:
✅ SFU infrastructure
✅ React/React Native SDKs
✅ Recording and egress (stream to RTMP/YouTube/Twitch)
✅ End-to-end encryption
✅ Simulcast support
✅ Screen sharing
✅ Data channels (for chat alongside video)
✅ Kubernetes-ready deployment
✅ Self-hostable (open source) or cloud service
LiveKit in a React App
// Install: npm install @livekit/components-react livekit-client
import { useEffect, useState } from "react";
import {
LiveKitRoom,
VideoConference,
RoomAudioRenderer,
} from "@livekit/components-react";
import "@livekit/components-styles";
// That's it — complete video conference UI in ~15 lines
const VideoCall = ({ roomName, userId }) => {
const [token, setToken] = useState("");
useEffect(() => {
// Fetch a room token from YOUR backend
fetch(`/api/livekit/token?room=${roomName}&user=${userId}`)
.then(res => res.json())
.then(data => setToken(data.token));
}, [roomName, userId]);
if (!token) return <div>Connecting...</div>;
return (
<LiveKitRoom
video={true}
audio={true}
token={token}
serverUrl={process.env.REACT_APP_LIVEKIT_URL}
style={{ height: "100vh" }}
>
<VideoConference />
<RoomAudioRenderer />
</LiveKitRoom>
);
};
// Backend: Generate a room access token (Node.js)
const { AccessToken } = require("livekit-server-sdk");
app.get("/api/livekit/token", (req, res) => {
const { room, user } = req.query;
const token = new AccessToken(
process.env.LIVEKIT_API_KEY,
process.env.LIVEKIT_API_SECRET,
{ identity: user }
);
token.addGrant({
roomJoin: true,
room: room,
canPublish: true,
canSubscribe: true,
});
res.json({ token: token.toJwt() }); // note: in livekit-server-sdk v2, toJwt() is async (await it there)
});
WebRTC vs LiveKit Decision
Use raw WebRTC when:
✅ 1-to-1 calls only
✅ You want zero infrastructure cost
✅ Simplest possible setup
✅ Learning/experimenting
Use LiveKit when:
✅ 3+ participants in a call
✅ You need recording
✅ You need live streaming egress (RTMP to YouTube)
✅ You need transcription/AI integration
✅ Production application at any real scale
9. Mini Netflix System Design (Interview Ready)
This is how you answer "Design Netflix" or "Design a video streaming platform" in a system design interview.
Clarify Requirements First (Always do this)
Functional Requirements:
✅ Users can watch videos
✅ Multiple video quality levels
✅ Adaptive quality (auto-switch based on connection)
✅ Resume playback from where user left off
✅ Content protection (paid content)
✅ Search and browse catalog
✅ Recommendations based on watch history
Non-Functional Requirements:
✅ Scale: 100 million daily active users
✅ Availability: 99.99% uptime
✅ Latency: Video starts playing within 2 seconds
✅ Global: Users worldwide
High-Level Architecture
┌─────────────────┐
│ Load Balancer │
└────────┬────────┘
│
┌──────────────────┼──────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Auth API │ │ Metadata API│ │ License API │
│ Service │ │ Service │ │ Service(DRM)│
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
┌──────▼───────┐ ┌──────▼───────┐ ┌──────▼───────┐
│ User DB │ │ Video DB │ │ Key Mgmt │
│ (PostgreSQL) │ │ (PostgreSQL/ │ │ System │
└──────────────┘ │ Cassandra) │ └──────────────┘
└──────────────┘
┌──────────────────────────────────────────────────┐
│ Upload Pipeline │
│ Upload Service → Message Queue → Encoding Worker│
│ (FFmpeg on EC2) │
└──────────────────────────┬───────────────────────┘
│
┌────────▼────────┐
│ S3 Storage │
│ (encoded files) │
└────────┬────────┘
│
┌────────▼────────┐
│ CDN (Akamai / │
│ Cloudflare) │
└────────┬────────┘
│
┌────────▼────────┐
│ Video Player │
│ (HLS + EME/DRM) │
└─────────────────┘
Database Design
-- Users table
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
plan VARCHAR(50) DEFAULT 'basic', -- basic, standard, premium
created_at TIMESTAMP DEFAULT NOW()
);
-- Videos table (metadata only — actual files in S3)
CREATE TABLE videos (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title VARCHAR(500) NOT NULL,
description TEXT,
duration_seconds INTEGER,
genres TEXT[], -- PostgreSQL array
status VARCHAR(50) DEFAULT 'processing', -- processing, ready, failed
content_id VARCHAR(255), -- DRM content ID
created_at TIMESTAMP DEFAULT NOW()
);
-- Video files (multiple quality levels per video)
CREATE TABLE video_files (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
video_id UUID REFERENCES videos(id),
quality VARCHAR(20), -- 240p, 480p, 720p, 1080p, 4k
cdn_url VARCHAR(1000), -- HLS manifest URL
file_size_bytes BIGINT,
created_at TIMESTAMP DEFAULT NOW()
);
-- Watch history (for resume playback)
CREATE TABLE watch_history (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES users(id),
video_id UUID REFERENCES videos(id),
position_seconds INTEGER DEFAULT 0, -- resume position
completed BOOLEAN DEFAULT FALSE,
watched_at TIMESTAMP DEFAULT NOW(),
UNIQUE(user_id, video_id)
);
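The `UNIQUE(user_id, video_id)` constraint lets the progress write be a single upsert instead of a select-then-update. A sketch that builds the query for a driver like node-postgres (the function name is illustrative):

```javascript
// Build a parameterized upsert against the watch_history table above.
// ON CONFLICT relies on the UNIQUE(user_id, video_id) constraint.
function saveProgressQuery(userId, videoId, positionSeconds) {
  return {
    text: `
      INSERT INTO watch_history (user_id, video_id, position_seconds)
      VALUES ($1, $2, $3)
      ON CONFLICT (user_id, video_id)
      DO UPDATE SET position_seconds = EXCLUDED.position_seconds,
                    watched_at = NOW()`,
    values: [userId, videoId, positionSeconds],
  };
}

// Usage with node-postgres would be roughly:
//   await pool.query(saveProgressQuery(userId, videoId, 754));
```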
Video Upload & Processing Flow
// 1. Upload Service — handles multipart upload to S3
app.post("/api/upload", async (req, res) => {
const { title, description } = req.body;
// Create video record in DB
const video = await db.video.create({
data: { title, description, status: "processing" }
});
// Get pre-signed S3 URL (client uploads directly to S3, bypassing our server)
const uploadUrl = await s3.getSignedUploadUrl({
key: `raw/${video.id}/original.mp4`,
contentType: "video/mp4",
expiresIn: 3600,
});
// Publish encoding job to queue
await messageQueue.publish("encoding-jobs", {
videoId: video.id,
s3Key: `raw/${video.id}/original.mp4`,
});
res.json({ videoId: video.id, uploadUrl });
});
// 2. Encoding Worker (separate service, runs on powerful EC2)
messageQueue.consume("encoding-jobs", async (job) => {
const { videoId, s3Key } = job;
// Download from S3
await s3.download(s3Key, "/tmp/original.mp4");
// Encode to multiple qualities using FFmpeg
const qualities = ["240p", "480p", "720p", "1080p"];
for (const quality of qualities) {
await ffmpeg.encode({
input: "/tmp/original.mp4",
output: `/tmp/${quality}/`,
quality,
format: "hls",
segmentDuration: 6,
});
// Apply DRM encryption to segments
await drm.encryptHls({
inputDir: `/tmp/${quality}/`,
contentId: videoId,
});
// Upload encrypted segments to S3
await s3.uploadDirectory(`/tmp/${quality}/`, `videos/${videoId}/${quality}/`);
}
// Update video status
await db.video.update({ id: videoId, status: "ready" });
});
Full Playback Flow (Backend + Frontend)
// Backend — Play endpoint
app.get("/api/videos/:id/play", authenticate, async (req, res) => {
const video = await db.video.findById(req.params.id);
// Check subscription for premium content
if (video.isPremium && req.user.plan === "basic") {
return res.status(403).json({ error: "Upgrade to watch this content" });
}
// Get resume position
const history = await db.watchHistory.findOne({
userId: req.user.id,
videoId: video.id,
});
// Return playback info
res.json({
manifestUrl: `https://cdn.netflix.com/videos/${video.id}/master.m3u8`,
licenseServerUrl: `https://license.netflix.com/widevine`,
resumePosition: history?.positionSeconds ?? 0,
contentId: video.contentId,
});
});
// Frontend — Video Player Component
const VideoPlayer = ({ videoId }) => {
const playerRef = useRef(null);
const hlsRef = useRef(null);
useEffect(() => {
const initPlayer = async () => {
// Get playback info from API
const { manifestUrl, licenseServerUrl, resumePosition, contentId } =
await fetch(`/api/videos/${videoId}/play`).then(r => r.json());
const video = playerRef.current;
// Initialize HLS.js
const hls = new Hls({ autoLevelEnabled: true });
hlsRef.current = hls;
hls.loadSource(manifestUrl);
hls.attachMedia(video);
// Set up DRM (Widevine)
video.addEventListener("encrypted", async (event) => {
if (!video.mediaKeys) {
const access = await navigator.requestMediaKeySystemAccess(
"com.widevine.alpha",
[{ initDataTypes: ["cenc"], videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1"' }] }]
);
const mediaKeys = await access.createMediaKeys();
await video.setMediaKeys(mediaKeys);
}
const session = video.mediaKeys.createSession();
session.addEventListener("message", async (msg) => {
const license = await fetch(licenseServerUrl, {
method: "POST",
headers: { Authorization: `Bearer ${userToken}` },
body: msg.message,
}).then(r => r.arrayBuffer());
await session.update(license);
});
await session.generateRequest(event.initDataType, event.initData);
});
hls.on(Hls.Events.MANIFEST_PARSED, () => {
video.currentTime = resumePosition; // resume where user left off
video.play();
});
};
initPlayer();
// Save position every 5 seconds
const saveInterval = setInterval(() => {
if (playerRef.current) {
fetch(`/api/videos/${videoId}/progress`, {
method: "POST",
body: JSON.stringify({ position: Math.floor(playerRef.current.currentTime) }),
});
}
}, 5000);
return () => {
hlsRef.current?.destroy();
clearInterval(saveInterval);
};
}, [videoId]);
return (
<video
ref={playerRef}
controls
style={{ width: "100%", maxHeight: "80vh" }}
/>
);
};
Scaling Strategy
Handling 100 million daily users:
1. CDN handles ~95% of video traffic
- No origin servers involved in most plays
- Cloudflare / Akamai has edge nodes everywhere
2. Horizontal scaling for APIs
- Stateless services behind load balancer
- Auto-scaling groups (more traffic → more instances)
3. Database scaling
- Read replicas for metadata queries
- Redis caching for popular video metadata
- Cassandra/DynamoDB for watch history (write-heavy, needs sharding)
4. Async encoding
- Message queue (SQS/Kafka) for encoding jobs
- Encoding workers scale independently
- Slow encoding doesn't block uploads
5. Microservices
- Upload Service
- Encoding Service
- Streaming Service (CDN management)
- License Service (DRM)
- Recommendation Service
- Auth Service
Each scales independently based on its own load.
10. Real Challenges Engineers Face
Challenge 1 — Buffering Under Poor Networks
Problem: User has 2 Mbps connection. 1080p requires 5 Mbps.
Solution stack:
1. Adaptive Bitrate (ABR) — auto-switch to 480p
2. Pre-buffering — download 3 segments ahead
3. CDN edge selection — pick nearest server
4. Connection quality prediction — use network API to detect before buffering
// Browser Network Information API
const connection = navigator.connection;
if (connection.downlink < 2) {
// Proactively cap quality at 480p
hls.currentLevel = 1; // Index of 480p in levels array
}
Challenge 2 — DRM Integration Complexity
Problem: Three different DRM systems for three different browsers.
Solution — Multi-DRM with a single abstraction:
Shaka Player handles DRM detection automatically:
const player = new shaka.Player(videoElement);
await player.configure({
drm: {
servers: {
"com.widevine.alpha": "https://license.example.com/widevine", // Chrome
"com.apple.fps": "https://license.example.com/fairplay", // Safari
"com.microsoft.playready": "https://license.example.com/playready" // Edge
}
}
});
// Shaka picks the right DRM for the current browser automatically
await player.load("https://cdn.example.com/video/master.m3u8");
Challenge 3 — WebRTC ICE Failure (Calls Don't Connect)
Problem: ~15% of WebRTC calls fail to connect without TURN.
Root cause: Symmetric NAT — user's router blocks incoming P2P connections.
Solution:
1. Deploy TURN server (coturn is open source)
2. Use paid TURN providers (Twilio STUN/TURN, Xirsys)
3. Budget: TURN bandwidth = all video passing through it (~$0.10/GB)
Detection:
peerConnection.oniceconnectionstatechange = () => {
if (peerConnection.iceConnectionState === "failed") {
// restartIce() re-runs candidate gathering. To force a TURN-only
// retry, recreate the connection with iceTransportPolicy: "relay".
peerConnection.restartIce();
}
};
Challenge 4 — Storage Costs
Problem: 1 hour of raw 4K video = ~50GB
1 movie encoded to all qualities = ~15GB
Netflix catalog = 15,000+ titles
Math:
15,000 titles × 15GB = 225TB just for video files
Plus CDN cache across 200+ edge locations...
Solutions:
1. S3 Intelligent-Tiering — automatically move rarely-accessed content to cheaper storage
2. Compress older content more aggressively
3. Delete source files after encoding (only keep encoded versions)
4. Use per-title encoding — Netflix uses ML to find optimal bitrate per-scene
(action scenes need higher bitrate; static scenes can use lower)
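The catalog math above as a one-liner (decimal TB; 15 GB per encoded title matches the figure given, and the numbers are illustrative):

```javascript
// Catalog storage estimate: titles × encoded size per title, in TB.
function catalogStorageTB(titles, gbPerTitle) {
  return (titles * gbPerTitle) / 1000;
}
```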
11. Architecture Decision Guide
"Which architecture do I need?"
Is it real-time (< 500ms latency)?
YES → WebRTC (+ LiveKit if 3+ people)
NO ↓
Is it live streaming (one-to-many, < 5s delay acceptable)?
YES → RTMP ingest + HLS output + CDN
NO ↓
Is it on-demand video (Netflix type)?
YES → HLS + CDN + DRM + Adaptive Bitrate
NO ↓
Is it just real-time events/messages?
YES → Socket.IO / WebSockets
"Do I need DRM?"
Is this paid premium content? → YES → DRM required
Could redistribution harm business? → YES → DRM required
Is this free/public content? → NO → DRM optional
Is this internal/corporate video? → NO → DRM probably not needed
"WebRTC or LiveKit?"
Is it 1-to-1 call only? → Raw WebRTC is fine
Is it 3+ participants? → LiveKit / SFU required
Do you need recording? → LiveKit
Do you need RTMP output? → LiveKit
Is this production? > 100 users? → LiveKit
Are you learning/prototyping? → Raw WebRTC to understand concepts
12. Interview-Ready Answers
Q: How does Netflix serve video to millions of users without buffering?
Netflix uses a combination of adaptive bitrate streaming (HLS/DASH), a global CDN that caches video segments at edge servers worldwide, and DRM-encrypted content delivery. When a user plays a video, the player fetches a manifest file listing available quality levels, then the HLS.js player monitors bandwidth and automatically switches quality. The CDN serves segments from the nearest edge node — often less than 10ms away — while the origin servers only need to serve the small percentage of requests that miss the CDN cache.
Q: How does DRM work and why can't it be bypassed?
DRM encrypts video segments using AES-128. The encryption key is never included with the video — it only comes from a License Server after verifying the user's subscription and device authorization. In browsers, the decryption happens inside the browser's Content Decryption Module (CDM), a closed-source component that protects the key in a sandbox. At the hardware level (Widevine L1), decryption happens inside a Trusted Execution Environment where even the OS cannot access the key. However, DRM isn't foolproof — screen recording is possible at the software level (L3), and analog capture is always possible. DRM's goal is to raise the effort bar high enough for casual users and to satisfy legal studio requirements.
Q: Why use WebRTC instead of WebSockets for video calls?
WebSockets route all data through a server, which adds latency and creates enormous server bandwidth costs for video. WebRTC establishes a direct peer-to-peer connection between browsers using ICE/STUN/TURN for NAT traversal, so video data travels directly between users without touching a server. This reduces latency from 100–200ms to 20–100ms and eliminates server bandwidth costs for the video stream. The server (using Socket.IO) is still used for signaling — exchanging the SDP offer/answer and ICE candidates — but once the connection is established, all media flows peer-to-peer.
Q: What is an SFU and why does LiveKit use one?
In a multi-user video call, pure peer-to-peer creates an N×(N-1) mesh — with 10 users, each user uploads 9 video streams simultaneously, which saturates their upload bandwidth. An SFU (Selective Forwarding Unit) is a media server where each user uploads one stream to the SFU, and the SFU forwards it to all subscribers. This reduces each user's upload requirement from N-1 streams to 1 stream. The SFU can also selectively forward — only sending video for visible participants — and support simulcast, where users upload multiple quality levels and the SFU sends the appropriate quality to each subscriber based on their bandwidth.
If this helped you understand the real architecture behind streaming and real-time video, drop a ❤️. Questions? Put them in the comments — especially if you're preparing for a system design interview.
Next: Low-latency live streaming architecture, how to build your own DRM license server, and debugging WebRTC connection failures.