- 65M concurrent streams — the largest live sports event ever delivered over the internet
- 113ms → 25ms p50 write latency — S3 baseline vs Cassandra+EVCache
- 200Gbps+ sustained read throughput absorbed by EVCache, not touching the write path
- 2-second segment SLA — every video chunk must be encoded, packaged, and written within this window
- 90%+ cache hit ratio on 404 storms — control-plane metadata cached in-memory
- Two completely independent encoding pipelines — dual-region, epoch-locked, interchangeable at the segment level
November 15, 2024: 65 million people log on to watch Mike Tyson fight Jake Paul, the largest live sports stream in history. Behind the scenes, Netflix engineers are white-knuckling a system they built from scratch — one where a single bad video segment, a CDN request storm, or a missed 2-second write deadline means millions of viewers see a black screen.
The Story
Our back-of-the-envelope calculations showed worst-case read throughput in the O(100Gbps) range, which would normally be extremely expensive for a strongly-consistent storage engine like Apache Cassandra.
— Xiaomei Liu, Joseph Lynch, Chris Newton, via Netflix Engineering Blog
On November 15, 2024, Netflix streamed the biggest live sports event the internet had ever seen. The fight at AT&T Stadium drew 65 million concurrent viewers, a number that dwarfs any single streaming event Netflix had previously attempted. Nodes of Open Connect (Netflix's proprietary global CDN, a network of hardware appliances co-located inside ISPs worldwide and purpose-built for video delivery) hammered the origin for a new segment every two seconds, each chunk potentially several megabytes, from every timezone simultaneously. The pressure on Live Origin, the custom-built microservice bridging the cloud encoding pipeline and the CDN, was unlike anything in the company's history. There was no rollback button.
Netflix's engineering challenge with live video is categorically different from that of its on-demand catalogue. SVOD content (Subscription Video on Demand: Netflix's traditional business, where content is pre-encoded, cached extensively, and served from ISP-colocated appliances at near-zero origin load) is encoded once and served almost entirely from the edge. Live video destroys this model. Every 2-second segment is brand new: it must be encoded, packaged, DRM-encrypted, and written to the origin within a hard real-time deadline, while dozens of CDN nodes request that same segment the moment it should exist.
Problem
S3 Can't Keep Up
Early live events revealed S3 segment writes hitting 113ms median latency and 267ms at p99, against a hard 2-second publishing deadline. CDN nodes requesting segments early got throttled responses; playback stalls and buffering appeared for viewers in real time.
Cause
Generic Storage Meets Real-Time Deadlines
S3 does not offer the write SLA that Netflix requires for live: a guaranteed maximum time within which a write is acknowledged, which matters because a missed segment publish directly causes viewer-visible buffering. Its request throttling (automatic rate-limiting when a single prefix receives too many requests per second) kicks in at exactly the request rates a live event generates. No amount of tuning can make a general-purpose object store behave like a real-time media system.
Solution
Custom KeyValue Store on Cassandra + EVCache
Netflix built a custom KeyValue abstraction layered on Apache Cassandra with LSM (Log-Structured Merge Tree — Cassandra's write-optimised storage engine that buffers writes in memory and flushes sequentially, achieving high write throughput with predictable latency) for durability, and added EVCache (Memcached-based) as a write-through read cache. Large segment payloads are chunked to enable idempotent retries. Separate EC2 stacks and storage clusters are provisioned for publishing and CDN-facing traffic.
Result
65M Streams Without a Dropped Segment
Median write latency dropped from 113ms to 25ms; p99 improved from 267ms to 129ms. The EVCache layer absorbs nearly all read traffic, allowing the system to sustain 200Gbps+ read throughput without touching the write path.
The Fix
The Storage Architecture Fix
The core fix was replacing S3 with a purpose-built KeyValue abstraction (a storage API layer Netflix built internally, adapted for Live Origin to provide chunked storage of multi-megabyte video segments with idempotent retry semantics and strict latency guarantees) layered on Apache Cassandra. Netflix engineers noted the solution was significantly more expensive than continuing with S3, but explicitly deprioritised cost — at 65 million concurrent streams, reliability is a product requirement, not a cost optimisation target.
- 113ms → 25ms — segment write median latency: S3 baseline vs Cassandra+EVCache
- 200Gbps+ — sustained read throughput absorbed by EVCache, preventing origin storms from reaching the write path
- 90%+ — cache hit ratio for control-plane metadata during 404 storms
- 5s TTL — max-age returned with HTTP 503 to low-priority DVR traffic; instructs CDN to back off and batch requests
# Netflix Live Origin: simplified segment write path with chunking and priority rate limiting
# Illustrative pseudocode: cassandra_kv, evcache, storage_platform, and the types are placeholders
def write_segment(segment: LiveSegment, priority: Priority) -> WriteResult:
    # 1. Break the multi-MiB segment into chunks
    #    Each chunk is independently retryable without re-sending the whole segment
    chunks = chunk_payload(segment.data, chunk_size_kb=256)

    # 2. Write to Cassandra with local-quorum consistency
    #    local-quorum: a majority of replicas in the local datacenter (region) must ack
    #    Assuming replicas are spread across AZs, this survives a full AZ failure
    #    while keeping latency predictable (~25ms median)
    for chunk in chunks:
        cassandra_kv.put(
            key=segment.url_path + f":chunk:{chunk.index}",
            value=chunk.data,
            consistency=Consistency.LOCAL_QUORUM,
        )

    # 3. Then warm the EVCache read layer (write-through)
    #    The very first CDN GET for this segment hits cache, not Cassandra
    #    This absorbs the origin storm before it reaches the write path
    evcache.set(
        key=segment.url_path,
        value=segment.data,
        ttl_seconds=segment.duration + BUFFER,
    )
    return WriteResult.OK


def handle_cdn_get(url: str, request_type: RequestType) -> HttpResponse:
    # Priority rate limiting: protect the write path when storage is under stress
    if storage_platform.is_stressed():
        if request_type == RequestType.DVR:  # low priority: time-shifted playback
            # Tell the CDN to back off 5 seconds; dampens traffic storms at the CDN layer
            return HttpResponse(503, headers={"Cache-Control": "max-age=5"})
    # Live edge traffic is ALWAYS allowed through, never shed

    # Check EVCache first: the write-through set above means most reads are served here
    cached = evcache.get(url)
    if cached is not None:
        return HttpResponse(200, body=cached)  # fast path: no Cassandra touch

    # Cache miss: reconstruct the segment from its Cassandra chunks
    chunks = cassandra_kv.get_all_chunks(url)
    return HttpResponse(200, body=reassemble(chunks))
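The write path above calls `chunk_payload` and `reassemble` without showing them. A minimal sketch of what such helpers might look like follows, assuming fixed-size chunks and deterministic per-chunk keys. The `Chunk` type, `chunk_key`, `put_chunks`, and the in-memory `store` dictionary are hypothetical stand-ins for illustration, not Netflix's KeyValue API. The point is that a retried put lands on the same key with the same bytes, which is what makes the retry idempotent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Chunk:
    index: int
    data: bytes


def chunk_payload(data: bytes, chunk_size_kb: int = 256) -> List[Chunk]:
    """Split a payload into fixed-size chunks (the last one may be shorter)."""
    size = chunk_size_kb * 1024
    return [Chunk(i // size, data[i:i + size]) for i in range(0, len(data), size)]


def chunk_key(url_path: str, index: int) -> str:
    # Deterministic key: a retry overwrites the same entry with identical bytes.
    return f"{url_path}:chunk:{index}"


def reassemble(chunks: List[Chunk]) -> bytes:
    """Rebuild the payload regardless of the order the chunks were read in."""
    return b"".join(c.data for c in sorted(chunks, key=lambda c: c.index))


# Hypothetical in-memory stand-in for the KeyValue store.
store: Dict[str, bytes] = {}


def put_chunks(url_path: str, data: bytes, chunk_size_kb: int = 256) -> int:
    """Write every chunk; safe to call again after a partial failure."""
    chunks = chunk_payload(data, chunk_size_kb)
    for c in chunks:
        store[chunk_key(url_path, c.index)] = c.data
    return len(chunks)
```

Calling `put_chunks` twice for the same segment leaves the store unchanged after the second call, so a publisher that times out can simply retry.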
Path isolation was the second major fix. Netflix built completely separate EC2 compute stacks for publishing traffic (from the cloud packager) and CDN-facing traffic (from Open Connect nodes). At the storage layer, separate KeyValue clusters serve read and write operations independently. A CDN traffic surge cannot physically reach the publishing path. Blast radius (the scope of damage if one component fails) is therefore contained at the architecture level rather than by runtime throttling: because publishing is isolated from CDN retrieval, an origin storm can degrade DVR delivery without ever threatening the live edge.
The redundant pipeline: epoch locking without communication
Netflix runs two completely independent live encoding pipelines across separate AWS regions, with separate contribution feeds, encoders, and packagers. Epoch locking (a mechanism where both pipelines use synchronised timestamps derived from UTC timecodes embedded in each video frame, ensuring their output segments are interchangeable without direct inter-pipeline communication) ensures the two pipelines produce interchangeable segments. When the Live Origin receives a CDN request, it selects the first valid segment from either pipeline — providing transparent, automatic failover without any client involvement.
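The idea behind epoch locking can be sketched in a few lines: if both pipelines derive the segment number from the same UTC timecode and the same reference epoch, they agree on segment boundaries without ever talking to each other. The sketch below is an illustration under stated assumptions, not Netflix's implementation. `EVENT_EPOCH_MS`, the `Segment` and `Origin` classes, the `valid` flag, and the pipeline names "A" and "B" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

SEGMENT_DURATION_MS = 2_000  # 2-second segments, as described in the article
EVENT_EPOCH_MS = 0           # assumed shared reference point (hypothetical)


def segment_index(frame_utc_ms: int, epoch_ms: int = EVENT_EPOCH_MS) -> int:
    """Map a frame's UTC timecode to a segment number, identically in both pipelines."""
    return (frame_utc_ms - epoch_ms) // SEGMENT_DURATION_MS


@dataclass
class Segment:
    index: int
    data: bytes
    valid: bool = True  # assumed outcome of some upstream validity check


class Origin:
    """Keeps segments per (pipeline, index) and serves the first valid one."""

    def __init__(self) -> None:
        self._segments: Dict[Tuple[str, int], Segment] = {}

    def publish(self, pipeline: str, segment: Segment) -> None:
        self._segments[(pipeline, segment.index)] = segment

    def select(self, index: int, pipelines: Sequence[str] = ("A", "B")) -> Optional[Segment]:
        for name in pipelines:
            seg = self._segments.get((name, index))
            if seg is not None and seg.valid:
                return seg
        return None  # neither pipeline has a usable segment yet
```

Because the index is a pure function of the timecode, a segment from pipeline B can stand in for a bad or missing segment from pipeline A at the same index, and the client never has to know which pipeline served it.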
The 404 storm defence: metadata caching
Live Origin structures metadata hierarchically as event → stream rendition → segment. When CDN nodes request segments that don't yet exist — because the encoder hasn't finished them yet — the origin rejects the request using cached event and rendition metadata, achieving a 90%+ cache hit ratio on these control-plane lookups and preventing the 404 flood from ever reaching the Cassandra storage layer.
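A rough sketch of how such a metadata gate could work is shown below. It assumes the origin keeps, per rendition, the highest segment index published so far and that segments are published in order; both are simplifying assumptions for illustration. `MetadataGate`, `RenditionMeta`, `EventMeta`, and the `storage_lookups` counter are hypothetical names, and the counter simply stands in for a real storage read.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RenditionMeta:
    latest_published_index: int = -1  # nothing published yet


@dataclass
class EventMeta:
    renditions: Dict[str, RenditionMeta] = field(default_factory=dict)


class MetadataGate:
    """Answers 'cannot exist yet' from in-memory metadata (event -> rendition -> segment)."""

    def __init__(self) -> None:
        self.events: Dict[str, EventMeta] = {}
        self.storage_lookups = 0  # counts requests allowed through to storage

    def on_publish(self, event: str, rendition: str, index: int) -> None:
        rend = self.events.setdefault(event, EventMeta()).renditions.setdefault(
            rendition, RenditionMeta()
        )
        rend.latest_published_index = max(rend.latest_published_index, index)

    def handle_get(self, event: str, rendition: str, index: int) -> int:
        ev = self.events.get(event)
        rend = ev.renditions.get(rendition) if ev is not None else None
        if rend is None or index > rend.latest_published_index:
            return 404  # rejected from metadata alone, storage is never touched
        self.storage_lookups += 1  # only plausible requests reach storage
        return 200
```

With this shape, a flood of early requests for a segment that the encoder has not finished costs only dictionary lookups, while requests for published segments proceed to the cache and storage layers as usual.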
Architecture
The Netflix Live Origin sits at the intersection of the cloud encoding pipeline and the global Open Connect CDN — simultaneously the last step for publishers and the first step for hundreds of millions of viewer requests. Before the Cassandra rebuild, this was a single S3 bucket per region: simple, cheap, and completely unable to handle the latency requirements and read-write contention of a live event at scale.
Before: S3-Based Origin — Single Store, No Path Isolation
Interactive diagram available on TechLogStack.
After: Live Origin with Cassandra+EVCache and Full Path Isolation
Interactive diagram available on TechLogStack.
Storage architecture before vs after:
| Dimension | S3 (Before) | Cassandra + EVCache (After) |
|---|---|---|
| Write latency p50 | 113ms | 25ms |
| Write latency p99 | 267ms | 129ms |
| Max read throughput | ~S3 limit (throttled) | 200Gbps+ (EVCache) |
| Write-read isolation | None — shared bucket | Fully isolated EC2 stacks |
| AZ failure resilience | S3 replication | Cassandra local-quorum |
| DVR storm protection | None | 503 + 5s TTL backpressure |
| 404 storm defence | None | In-memory metadata cache (90%+ hit) |
Lessons
General-purpose storage cannot serve as the foundation of a real-time system. S3 is brilliant at what it does — durable, scalable object storage — but its p99 latency and request throttling behaviour are incompatible with hard 2-second segment deadlines. If your system has a real-time SLA, audit every storage dependency and ask whether it was designed to meet that SLA — not just whether it typically does.
Read storms and write requirements are incompatible on shared storage without explicit isolation. The Origin Storm only became visible at live-event scale, but the architectural vulnerability existed from day one. Separate your write path from your read path at the infrastructure level — not just logically — before you discover the contention in production.
Write-through caching is a write-protection strategy, not just a read-acceleration one. Netflix's EVCache layer absorbs nearly all CDN reads, meaning the write path in Cassandra operates in near-silence even during a 65-million-viewer storm. When you add a cache to a system, design it explicitly to protect your write path, not merely to speed up reads.
Priority-based degradation is an active reliability tool, not a failure mode. Deliberately returning HTTP 503 with a 5-second TTL to low-priority DVR traffic is an engineered backpressure mechanism that protects the live edge for viewers watching right now. Build explicit traffic tiers into your architecture, with defined behaviours for what gets shed first when resources are constrained.
Redundancy only works if it is tested at the actual failure granularity. Netflix runs two fully independent encoding pipelines across separate AWS regions, contribution feeds, encoders, and packagers. Epoch locking (synchronising both pipelines via UTC timecodes embedded at the contribution encoder, so they produce interchangeable segments without inter-pipeline communication) makes them interchangeable at the segment level. True redundancy must be tested at the smallest unit of failure in your system, not the largest.
Engineering Glossary
Epoch locking — a synchronisation mechanism where both live encoding pipelines use timestamps derived from UTC timecodes embedded in each video frame, ensuring their output segments are interchangeable without direct inter-pipeline communication. Enables per-segment failover transparency.
EVCache — Netflix's distributed Memcached layer, used as a write-through read cache in the Live Origin architecture. Absorbs nearly all CDN read requests, keeping read traffic from reaching the Cassandra write path.
Live Origin — the custom-built microservice bridging Netflix's cloud encoding pipeline and the Open Connect CDN. The last step for publishers (writing encoded segments) and the first step for CDN nodes (requesting those segments).
LSM (Log-Structured Merge Tree) — Cassandra's write-optimised storage engine that buffers writes in memory and flushes sequentially to disk, achieving high write throughput with predictable latency.
Open Connect — Netflix's proprietary global CDN, a network of hardware appliances co-located inside ISPs worldwide. Each appliance caches video content close to viewers; for live content, they must request new segments from Live Origin every 2 seconds.
Origin Storm — the scenario where many Open Connect CDN nodes request the same segment from the origin at once, generating burst read throughput that can overwhelm the storage system. Mitigated by EVCache write-through caching, which leaves only a small fraction of the storm to reach Cassandra.
Priority rate limiter — a per-request traffic shaper in Live Origin that categorises incoming requests by user impact (live edge vs DVR) and sheds lower-priority traffic first when resources are constrained. Returns HTTP 503 + 5-second max-age TTL to DVR requests, instructing CDN to back off.
SVOD (Subscription Video on Demand) — Netflix's traditional on-demand content model where content is pre-encoded, cached extensively at CDN edges, and served with near-zero origin load. Architecturally opposite to live streaming where every segment is brand new.
This case is a plain-English retelling of publicly available engineering material.