Netflix · Live Streaming · 17 May 2026
November 15, 2024: 65 million people log on to watch Mike Tyson fight Jake Paul, the largest live sports stream in history. Behind the scenes, Netflix engineers are white-knuckling a system they built from scratch — one where a single bad video segment, a CDN request storm, or a missed 2-second write deadline means millions of viewers see a black screen.
- 65M concurrent streams
- 113ms → 25ms p50 latency
- 200Gbps+ read throughput
- 2-second segment SLA
- 90%+ cache hit on 404 storms
- 38M events/sec monitored
The Story
Our back-of-the-envelope calculations showed worst-case read throughput in the O(100Gbps) range, which would normally be extremely expensive for a strongly-consistent storage engine like Apache Cassandra.
— — Xiaomei Liu, Joseph Lynch, Chris Newton — via Netflix Engineering Blog
On November 15, 2024, Netflix did something it had been quietly engineering toward for three years: it streamed the biggest live sports event the internet had ever seen. 65 million concurrent viewers watched Mike Tyson and Jake Paul trade punches at AT&T Stadium in Arlington, Texas — a number that dwarfs any single streaming event Netflix had previously attempted. Open Connect (Netflix's proprietary global CDN, a network of hardware appliances co-located inside ISPs worldwide, purpose-built for video delivery) nodes around the world hammered the origin for segments every two seconds, each chunk potentially several megabytes, from every timezone simultaneously. The pressure on what Netflix engineers call Live Origin — the custom-built microservice bridging the cloud encoding pipeline and the CDN — was unlike anything in the company's history. This was not a load test. There was no rollback button if the system buckled.
Netflix's engineering challenge with live video is categorically different from its on-demand catalog. SVOD (Subscription Video on Demand — Netflix's traditional business, where content is pre-encoded, cached extensively, and served from ISP-colocated appliances at near-zero origin load) content is encoded once, uploaded once, and then served almost entirely from the edge with the origin barely involved. Live video destroys this model entirely. Every 2-second segment is brand new — it must be encoded, packaged, DRM-encrypted, and written to the origin within a hard real-time deadline, while simultaneously dozens of CDN nodes are requesting that same segment the moment it should exist. Netflix's existing infrastructure, including its massive Open Connect network, was built for static content; live content required the engineers to rethink storage, traffic management, and quality control from first principles.
📡
Netflix's early live events used plain AWS S3 buckets as the segment store — and the results were brutal. Median write latency of 113ms against a 2-second publishing deadline meant the system was spending nearly 6% of every segment's entire window just waiting on storage acknowledgment, with p99 latencies of 267ms making late segments a near-certainty at scale.
The original Live Origin architecture relied on S3 (Amazon Simple Storage Service — a general-purpose object store, highly durable and scalable but not optimized for the strict latency SLAs of real-time live streaming) as the backing store for video segments. When the packager finished encoding a segment, it issued a PUT request to S3; when an Open Connect node needed that segment, it issued a GET. The problem was that S3 is designed for general-purpose durability, not microsecond consistency. High latency variation on writes meant segments frequently missed their publishing window. At high request rates exceeding 100 RPS per event , S3 throttled the origin, causing playback latency spikes visible to viewers as buffering. The team knew that scaling to tens of millions of concurrent streams would make this completely untenable — a generic storage solution cannot serve as the foundation of a real-time broadcast.
The Origin Storm Problem
Even after replacing S3, the engineers faced a second failure mode that they called the Origin Storm (the scenario where many Open Connect CDN nodes simultaneously request the same segment from the origin at once, generating read throughput that can overwhelm the storage system). When a new segment becomes available — at the top of every 2-second clock tick — potentially dozens of top-tier Open Connect nodes across different geographic sites all issue GET requests simultaneously. Each segment can be several megabytes of encoded video. Back-of-envelope calculations put worst-case read throughput at over 100 Gbps — a volume that would obliterate write performance on any strongly-consistent database, including Apache Cassandra. The engineers had traded one problem for another: a write-optimized store that couldn't survive its own read traffic.
Problem
S3 Can't Keep Up
Early live events reveal S3 segment writes hitting 113ms median latency and 267ms at p99, against a hard 2-second publishing deadline. CDN nodes requesting segments early get throttled responses; playback stalls and buffering appear for viewers in real time.
Cause
Generic Storage Meets Real-Time Deadlines
S3 lacks the write SLA (a guaranteed maximum time within which a write operation will be acknowledged, critical in live streaming where a missed segment publish directly causes viewer-visible buffering) Netflix requires for live. Its request throttling (S3's automatic rate-limiting when a single prefix receives too many requests per second, designed to protect shared infrastructure) kicks in at the exact request rates a live event generates. No amount of tuning can make a general-purpose object store behave like a real-time media system.
Solution
Custom KeyValue Store on Cassandra + EVCache
Netflix builds a custom KeyValue abstraction layered on Apache Cassandra with LSM (Log-Structured Merge Tree — Cassandra's write-optimized storage engine that buffers writes in memory and flushes sequentially, achieving high write throughput with predictable latency) for durability, and adds EVCache (Memcached-based) as a write-through read cache. Large segment payloads are chunked to enable idempotent retries and load distribution across the Cassandra cluster. Separate EC2 stacks and storage clusters are provisioned for publishing and CDN-facing traffic.
Result
65M Streams Without a Dropped Segment
Median write latency drops from 113ms to 25ms ; p99 improves from 267ms to 129ms. The EVCache layer absorbs nearly all read traffic, allowing the system to sustain 200Gbps+ read throughput without touching the write path. The Tyson vs. Paul fight streams successfully to 65 million concurrent viewers — the largest live sports event ever delivered over the internet.
⚠️
The Dual-Write Tension
Cassandra is excellent for writes, but its local-quorum consistency (requiring acknowledgment from a majority of nodes in the local datacenter before a write is confirmed, trading some latency for guaranteed durability even if one availability zone fails) model meant that high concurrent reads from dozens of CDN nodes created resource contention that degraded writes. Netflix had to explicitly separate read and write paths at the infrastructure level — not just logically, but physically — to prevent the CDN read storm from destroying the publisher's ability to deliver new segments.
The elegant solution to the origin storm was a write-through cache : every segment written to Cassandra is simultaneously cached in EVCache (Netflix's distributed Memcached layer). When an Open Connect node requests a segment, it hits EVCache first. Cache hits serve at network speed; only misses reach Cassandra. This achieves read-write separation without separate infrastructure — the write path remains clean and fast through Cassandra's LSM engine, while reads are absorbed almost entirely by the in-memory cache. The team validated this against its own back-of-envelope: if the cache hits 90% of reads, then only 10% of the theoretical 100Gbps storm ever reaches Cassandra, putting it comfortably in the sustainable range. In practice the system exceeded 200Gbps sustained read throughput with no write degradation observed.
THE REDUNDANT PIPELINE
Netflix runs two completely independent live encoding pipelines across separate AWS regions, with separate contribution feeds, encoders, and packagers. Epoch locking (a mechanism where both pipelines use synchronized timestamps derived from UTC timecodes embedded in each video frame, ensuring their output segments are interchangeable without direct inter-pipeline communication) ensures the two pipelines produce interchangeable segments. When the Live Origin receives a CDN request, it selects the first valid segment from either pipeline — providing transparent, automatic failover without any client involvement.
Netflix spent three years quietly solving a problem nobody knew it had, and the 65 million people watching the fight had no idea any of this was happening — which is exactly how it was supposed to work.
TechLogStack — built at scale, broken in public, rebuilt by engineers
The Fix
- 113ms→25ms — Segment write median latency: S3 baseline vs. Cassandra+EVCache — a 4.5x improvement that brought writes inside the 2-second segment publishing budget with headroom to spare
- 200Gbps+ — Sustained read throughput the EVCache layer absorbs from Open Connect CDN nodes, preventing origin storms from reaching the Cassandra write path during peak live events
- 90%+ — Cache hit ratio for control-plane metadata during 404 storms — in-memory caching of event and rendition metadata means non-existent segment requests are rejected before they ever touch the storage layer
- 5s TTL — Max-age returned with HTTP 503 to low-priority DVR traffic under load — deliberately instructing Open Connect to back off and batch repeated requests, dampening traffic storms at the CDN layer
The Storage Architecture Fix
The core fix was replacing S3 with a purpose-built KeyValue abstraction (a storage API layer Netflix built internally, originally for cloud game-save state, adapted for Live Origin to provide chunked storage of multi-megabyte video segments with idempotent retry semantics and strict latency guarantees) layered on Apache Cassandra. The existing system, originally built for gaming cloud saves, needed three significant enhancements for live video: write availability through AZ failures (solved by Cassandra's local-quorum model), handling of large MiB-scale payloads (solved by the chunking algorithm), and read throughput during CDN storms (solved by EVCache write-through caching). Netflix engineers noted the solution was significantly more expensive than continuing with S3, but explicitly deprioritized cost — at 65 million concurrent streams, a $5 latency spike per viewer is a service-ending event, not a cost optimization tradeoff.
# Netflix Live Origin: Simplified segment write path with chunking and priority rate limiting
def write_segment(segment: LiveSegment, priority: Priority) -> WriteResult:
# 1. Break large MiB segment into small chunks for parallel writes
# Each chunk can be independently retried without re-sending the whole segment
chunks = chunk_payload(segment.data, chunk_size_kb=256)
# 2. Write to Cassandra with local-quorum consistency
# local-quorum = majority of nodes in this AZ must ack before returning
# This survives a full AZ failure while keeping latency predictable
for chunk in chunks:
cassandra_kv.put(
key=segment.url_path + f":chunk:{chunk.index}",
value=chunk.data,
consistency=Consistency.LOCAL_QUORUM # AZ-resilient, ~25ms median
)
# 3. Simultaneously warm the EVCache read layer (write-through)
# This means the very first CDN GET for this segment hits cache, not Cassandra
# absorbing the origin storm before it reaches the write path
evcache.set(
key=segment.url_path,
value=segment.data,
ttl_seconds=segment.duration + BUFFER # expire after segment is "old"
)
return WriteResult.OK
def handle_cdn_get(url: str, request_type: RequestType) -> HttpResponse:
# Priority rate limiting: protect write path when storage is under stress
if storage_platform.is_stressed():
if request_type == RequestType.DVR: # low priority: time-shifted playback
# Tell CDN to back off for 5 seconds and retry
return HttpResponse(503, headers={"Cache-Control": "max-age=5"})
# Live edge traffic (request_type == LIVE_EDGE) is ALWAYS allowed through
# Check EVCache first — ~90%+ hit rate during normal operation
cached = evcache.get(url)
if cached:
return HttpResponse(200, body=cached) # fast path: no Cassandra touch
# Cache miss: reconstruct from Cassandra chunks
chunks = cassandra_kv.get_all_chunks(url) # only ~10% of requests reach here
return HttpResponse(200, body=reassemble(chunks))
ℹ️
The 404 Storm Defense
Live origin structures metadata hierarchically as event → stream rendition → segment. When CDN nodes request segments that don't yet exist — because the encoder hasn't finished them yet — the origin rejects the request using cached event and rendition metadata , achieving a 90%+ cache hit ratio on these control-plane lookups and preventing the 404 flood from ever reaching the Cassandra storage layer.
Path isolation was the second major fix. Netflix built completely separate EC2 compute stacks for publishing traffic (from the cloud packager) and CDN-facing traffic (from Open Connect nodes). At the storage layer, separate KeyValue clusters serve read and write operations independently. This means a CDN traffic surge — which happens at the exact moment a high-profile event begins — cannot physically reach the publishing path. The packager's write operations run on dedicated infrastructure that is invisible to the CDN. Blast radius (the scope of damage if one component fails — by isolating publishing from CDN retrieval, Netflix ensured that an origin storm could degrade DVR delivery without ever threatening the live edge) is contained at the architecture level rather than by runtime throttling.
✅
Dual-Pipeline Segment Selection
The Live Origin runs two independent encoding pipelines across separate AWS regions. When a CDN node requests a segment, the origin checks both pipelines in deterministic order and returns the first valid one. Segment defects — detected by lightweight media inspection at the packager — are communicated as metadata, allowing the origin to skip bad segments and serve good ones transparently, with no client-side logic required.
⚠️
Cost Was Not a Constraint
Netflix engineers explicitly called out that the Cassandra+EVCache solution is more expensive than S3. The architectural choice to separate read and write paths with dedicated compute stacks and dual Cassandra clusters means materially higher infrastructure spend per event. Netflix accepted this as an explicit design constraint: for a 65-million-viewer live event, reliability is a product requirement, not a cost optimization target.
MILLISECOND-GRAIN CACHING
Standard HTTP Cache-Control headers work only at one-second granularity — a lifetime when segments are generated every 2 seconds. Netflix added millisecond-grain caching to nginx at the edge to enable sub-second backoff signals. This prevents CDN nodes from hammering the origin during the brief window between when they expect a segment and when it actually arrives, significantly reducing origin-facing request chatter without modifying any client device code.
Architecture
The Netflix Live Origin sits at the exact intersection of the cloud encoding pipeline and the global Open Connect CDN — a position that makes it simultaneously the last step for publishers and the first step for hundreds of millions of viewer requests. Before the Cassandra rebuild, this was a single S3 bucket per region: simple, cheap, and completely unable to handle the latency requirements and read-write contention of a live event at scale. The post-rebuild architecture separates publishing and CDN retrieval into physically isolated compute stacks, introduces a write-optimized Cassandra backing store with EVCache write-through caching, and adds intelligent traffic prioritization at every layer. REaP (Redundant Encoding and Packaging — an ISO/IEC standard (23009-9) for dual-pipeline live streaming where both pipelines produce interchangeable segments without inter-pipeline communication) compliance ensures both encoding pipelines produce segments that any downstream component can select from interchangeably.
Before: S3-Based Origin — Single Store, No Path Isolation
View interactive diagram on TechLogStack →
Interactive diagram available on TechLogStack (link above).
THE WRITE-READ CONFLICT
In the S3 architecture, the same storage endpoint served both segment publishers (writing every 2 seconds) and Open Connect nodes (reading simultaneously from dozens of global locations). There was no isolation : a CDN request surge at event launch directly competed with the packager's write operations, and S3's throttling kicked in at exactly the wrong moment — when the event started and both loads peaked simultaneously.
After: Live Origin with Cassandra+EVCache and Full Path Isolation
View interactive diagram on TechLogStack →
Interactive diagram available on TechLogStack (link above).
The architecture diagram reveals the key insight: publishing and CDN retrieval share no infrastructure path. The packager writes to a dedicated EC2 stack with its own KeyValue write cluster backed by Cassandra's LSM engine. Open Connect nodes read from a completely separate EC2 stack with its own KeyValue read cluster, serviced almost entirely by EVCache. The Cassandra cluster sits beneath both, but the write-through cache ensures reads almost never reach it. When the system is under storage stress, the origin's priority rate limiter (a per-request traffic shaper that categorizes incoming requests by user impact — live edge (highest priority) vs. DVR (lower priority) — and sheds lower-priority traffic first when resources are constrained) sheds DVR traffic with HTTP 503 + 5-second max-age headers, protecting live edge delivery for every viewer currently watching the event.
🔒
Epoch Locking: How Two Pipelines Stay Synchronized Without Talking
Both encoding pipelines embed UTC timecodes (precise wall-clock time embedded in each video frame as SEI metadata by the contribution encoder, ensuring both pipelines produce segments with identical timing boundaries despite running completely independently) as SEI messages in each video frame. This allows both pipelines to produce segments with identical duration boundaries and interchangeable segment numbers — so the Live Origin can transparently switch between pipelines on a per-segment basis with no viewer-visible discontinuity.
Live Origin storage architecture: before vs. after, by key operational dimension
| Dimension | S3 (Before) | Cassandra + EVCache (After) |
|---|---|---|
| Write latency p50 | 113ms | 25ms |
| Write latency p99 | 267ms | 129ms |
| Max read throughput | ~S3 limit (throttled) | 200Gbps+ (EVCache) |
| Write-read isolation | None — shared bucket | Fully isolated EC2 stacks |
| AZ failure resilience | S3 replication | Cassandra local-quorum |
| DVR storm protection | None | 503 + 5s TTL backpressure |
| 404 storm defense | None | In-memory metadata cache (90%+ hit) |
Lessons
Netflix's Live Origin story is a masterclass in what happens when you try to use general-purpose infrastructure for a real-time system with hard latency deadlines. Every decision — from choosing Cassandra over S3, to physically separating publishing and CDN stacks, to the priority rate limiter — came from a specific failure mode the engineers encountered in production. These are the principles that survive contact with 65 million simultaneous viewers.
- 01. General-purpose storage cannot serve as the foundation of a real-time system. S3 is brilliant at what it does — durable, scalable object storage — but its p99 latency (the response time that 99% of requests fall under; if p99 is 267ms on a 2-second deadline, roughly 1-in-100 segments publishes dangerously close to the boundary) and request throttling behavior are incompatible with hard 2-second segment deadlines. If your system has a real-time SLA, audit every storage dependency and ask whether it was designed to meet that SLA — not just whether it typically does.
- 02. Read storms and write requirements are incompatible on shared storage without explicit isolation. The Origin Storm (when many CDN nodes simultaneously request the same fresh segment, generating burst read throughput that competes with the write path on shared storage infrastructure) only became visible at live-event scale, but the architectural vulnerability existed from day one. Separate your write path from your read path at the infrastructure level — not just logically — before you discover the contention in production.
- 03. Write-through caching is a write-protection strategy, not just a read-acceleration one. Netflix's EVCache layer absorbs over 90% of CDN reads , meaning the write path in Cassandra operates in near-silence even during a 65-million-viewer storm. When you add a cache to a system, design it explicitly to protect your write path — not merely to speed up reads.
- 04. Priority-based degradation is an active reliability tool, not a failure mode. Deliberately returning HTTP 503 with a 5-second TTL to low-priority DVR traffic is not a bug — it is an engineered backpressure mechanism that protects the live edge for viewers watching right now. Build explicit traffic tiers into your architecture, with defined behaviors for what gets shed first when resources are constrained.
- 05. Redundancy only works if it is tested at the actual failure granularity. Netflix runs two fully independent encoding pipelines across separate AWS regions, contribution feeds, encoders, and packagers — not just two copies of the same pipeline in the same region. Epoch locking (synchronizing both pipelines via UTC timecodes embedded at the contribution encoder, so they produce interchangeable segments without any inter-pipeline communication) makes them interchangeable at the segment level, not just the stream level. True redundancy must be tested at the smallest unit of failure in your system, not the largest.
ℹ️
The Cost Admission
Netflix engineers explicitly wrote that the Cassandra+EVCache architecture is more expensive than S3. This honesty is rare and valuable: reliability at scale sometimes costs more, and pretending otherwise leads to systems that are cheap until the moment they matter, then catastrophically expensive. Know your constraints before you optimize.
OBSERVABILITY AT LIVE SCALE
Netflix processed 38 million telemetry events per second during its largest live events, using a mix of internal tools (Atlas, Mantis, Lumen) and open-source tech (Kafka, Druid) to deliver critical metrics to the Control Center in seconds. Live streaming is not just an engineering problem — it is a real-time operations problem. Build your observability layer before the event, not during it.
Netflix built a multi-year, multi-million-dollar storage architecture so that 65 million people could watch two men punch each other — and the highest praise it will ever receive is that nobody noticed.
TechLogStack — built at scale, broken in public, rebuilt by engineers
This case is a plain-English retelling of publicly available engineering material.
Read the full case on TechLogStack → (interactive diagrams, source links, and the full reader experience).
Top comments (0)