TechLogStack

Posted on May 20 • Originally published at techlogstack.com on May 17

65 Million Streams: How Netflix Rebuilt Its Guts for Live

#backend #architecture #devops #webdev

65M concurrent streams — the largest live sports event ever delivered over the internet
113ms → 25ms p50 write latency — S3 baseline vs Cassandra+EVCache
200Gbps+ sustained read throughput absorbed by EVCache, not touching the write path
2-second segment SLA — every video chunk must be encoded, packaged, and written within this window
90%+ cache hit ratio on 404 storms — control-plane metadata cached in-memory
Two completely independent encoding pipelines — dual-region, epoch-locked, interchangeable at the segment level

November 15, 2024: 65 million people log on to watch Mike Tyson fight Jake Paul, the largest live sports stream in history. Behind the scenes, Netflix engineers are white-knuckling a system they built from scratch — one where a single bad video segment, a CDN request storm, or a missed 2-second write deadline means millions of viewers see a black screen.

The Story

Our back-of-the-envelope calculations showed worst-case read throughput in the O(100Gbps) range, which would normally be extremely expensive for a strongly-consistent storage engine like Apache Cassandra.

— Xiaomei Liu, Joseph Lynch, Chris Newton, via Netflix Engineering Blog

On November 15, 2024, Netflix streamed the biggest live sports event the internet had ever seen. The fight at AT&T Stadium drew 65 million concurrent streams, a number that dwarfs any single streaming event Netflix had previously attempted. Around the world, nodes of Open Connect (Netflix's proprietary global CDN, a network of hardware appliances co-located inside ISPs worldwide, purpose-built for video delivery) requested new segments from the origin every two seconds, each chunk potentially several megabytes, from every timezone simultaneously. The pressure on Live Origin, the custom-built microservice bridging the cloud encoding pipeline and the CDN, was unlike anything in the company's history. There was no rollback button.

Netflix's engineering challenge with live video is categorically different from its on-demand catalogue. SVOD (Subscription Video on Demand — Netflix's traditional business, where content is pre-encoded, cached extensively, and served from ISP-colocated appliances at near-zero origin load) content is encoded once and served almost entirely from the edge. Live video destroys this model. Every 2-second segment is brand new — it must be encoded, packaged, DRM-encrypted, and written to the origin within a hard real-time deadline, while simultaneously dozens of CDN nodes request that same segment the moment it should exist.

The Origin Storm Problem

Replacing S3 exposed a second failure mode: the Origin Storm (the scenario where many Open Connect CDN nodes request the same segment at the same moment, generating read throughput that can overwhelm the storage system). When a new segment becomes available at each 2-second tick, potentially dozens of CDN nodes across different geographic sites issue GET requests for it at once. Back-of-envelope calculations put worst-case read throughput on the order of 100 Gbps, a volume that would be extremely expensive to serve from a strongly consistent store such as Apache Cassandra and would compete directly with segment writes. The engineers had traded one problem for another.

Problem

S3 Can't Keep Up

Early live events revealed S3 segment writes hitting 113ms median latency and 267ms at p99, against a hard 2-second publishing deadline. CDN nodes requesting segments early got throttled responses; playback stalls and buffering appeared for viewers in real time.

Cause

Generic Storage Meets Real-Time Deadlines

S3 lacks the write SLA (a guaranteed maximum time within which a write operation will be acknowledged, critical in live streaming where a missed segment publish directly causes viewer-visible buffering) Netflix requires for live. Its request throttling (S3's automatic rate-limiting when a single prefix receives too many requests per second) kicks in at the exact request rates a live event generates. No amount of tuning can make a general-purpose object store behave like a real-time media system.

Solution

Custom KeyValue Store on Cassandra + EVCache

Netflix built a custom KeyValue abstraction layered on Apache Cassandra for durable storage, relying on its LSM engine (Log-Structured Merge Tree: Cassandra's write-optimised storage engine, which buffers writes in memory and flushes them sequentially, achieving high write throughput with predictable latency), and added EVCache (Memcached-based) as a write-through read cache. Large segment payloads are chunked to enable idempotent retries. Separate EC2 stacks and storage clusters are provisioned for publishing and CDN-facing traffic.

Result

65M Streams Without a Dropped Segment

Median write latency dropped from 113ms to 25ms; p99 improved from 267ms to 129ms. The EVCache layer absorbs nearly all read traffic, allowing the system to sustain 200Gbps+ read throughput without touching the write path.

The Fix

The Storage Architecture Fix

The core fix was replacing S3 with a purpose-built KeyValue abstraction (a storage API layer Netflix built internally, adapted for Live Origin to provide chunked storage of multi-megabyte video segments with idempotent retry semantics and strict latency guarantees) layered on Apache Cassandra. Netflix engineers noted the solution was significantly more expensive than continuing with S3, but explicitly deprioritised cost — at 65 million concurrent streams, reliability is a product requirement, not a cost optimisation target.

113ms → 25ms — segment write median latency: S3 baseline vs Cassandra+EVCache
200Gbps+ — sustained read throughput absorbed by EVCache, preventing origin storms from reaching the write path
90%+ — cache hit ratio for control-plane metadata during 404 storms
5s TTL — max-age returned with HTTP 503 to low-priority DVR traffic; instructs CDN to back off and batch requests

# Netflix Live Origin: Simplified segment write path with chunking and priority rate limiting

def write_segment(segment: LiveSegment, priority: Priority) -> WriteResult:
    # 1. Break the multi-MiB segment into chunks (written one by one here for simplicity)
    # Each chunk is independently retryable without re-sending the whole segment
    chunks = chunk_payload(segment.data, chunk_size_kb=256)

    # 2. Write each chunk to Cassandra with local-quorum consistency
    # local-quorum: a majority of replicas in the local datacenter must ack before returning
    # With replicas spread across AZs, this survives an AZ failure while keeping latency predictable (~25ms median)
    for chunk in chunks:
        cassandra_kv.put(
            key=segment.url_path + f":chunk:{chunk.index}",
            value=chunk.data,
            consistency=Consistency.LOCAL_QUORUM
        )

    # 3. Warm the EVCache read layer as part of the same write (write-through)
    # The very first CDN GET for this segment can then hit cache, not Cassandra
    # This absorbs the origin storm before it reaches the write path
    evcache.set(
        key=segment.url_path,
        value=segment.data,
        # BUFFER: extra retention in seconds, assumed to be a constant defined elsewhere
        ttl_seconds=segment.duration + BUFFER
    )

    return WriteResult.OK

def handle_cdn_get(url: str, request_type: RequestType) -> HttpResponse:
    # Priority rate limiting: protect write path when storage is under stress
    if storage_platform.is_stressed():
        if request_type == RequestType.DVR:  # low priority: time-shifted playback
            # Tell CDN to back off 5 seconds — dampens traffic storms at CDN layer
            return HttpResponse(503, headers={"Cache-Control": "max-age=5"})
        # Live edge traffic is ALWAYS allowed through — never shed

    # Check EVCache first: the write-through on publish means most reads are served here
    cached = evcache.get(url)
    if cached is not None:
        return HttpResponse(200, body=cached)  # fast path: no Cassandra touch

    # Cache miss: reconstruct from Cassandra chunks
    chunks = cassandra_kv.get_all_chunks(url)
    if not chunks:
        return HttpResponse(404)  # nothing stored under this URL (yet)
    return HttpResponse(200, body=reassemble(chunks))
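
The listing above calls chunk_payload and reassemble without defining them. A minimal sketch of what such helpers might look like follows. The Chunk type, the validation rules, and the 256 KB default (taken from the listing, not from Netflix's published numbers) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Chunk:
    index: int   # position of the chunk within the segment
    data: bytes  # raw payload slice


def chunk_payload(data: bytes, chunk_size_kb: int = 256) -> List[Chunk]:
    """Split a segment payload into fixed-size chunks (the last may be shorter)."""
    if chunk_size_kb <= 0:
        raise ValueError("chunk_size_kb must be positive")
    size = chunk_size_kb * 1024
    return [
        Chunk(index=i, data=data[offset:offset + size])
        for i, offset in enumerate(range(0, len(data), size))
    ]


def reassemble(chunks: Iterable[Chunk]) -> bytes:
    """Rebuild the payload; chunks may arrive in any order."""
    ordered = sorted(chunks, key=lambda c: c.index)
    # Reject gaps or duplicates in the index sequence. A missing trailing
    # chunk would need a stored chunk count to detect, which is omitted here.
    if [c.index for c in ordered] != list(range(len(ordered))):
        raise ValueError("missing or duplicate chunk index")
    return b"".join(c.data for c in ordered)
```

Because each chunk is keyed by its index, re-sending a chunk after a timeout overwrites the same key with the same bytes, which is what makes the retry idempotent.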

Write-Through Caching as Write Protection

The elegant solution to the origin storm was a write-through cache: every segment written to Cassandra is also written to EVCache. When an Open Connect node requests a segment, it hits EVCache first. Cache hits are served from memory; only misses reach Cassandra. The team validated this against its own back-of-envelope estimate: if the cache serves 90% of reads, only 10% of the theoretical 100Gbps storm, roughly 10Gbps, ever reaches Cassandra. Reads are kept off the write path at the cache layer, so the write path operates in near-silence even during a 65-million-viewer event.
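
The back-of-envelope reasoning can be written down as a one-line formula: residual load equals storm throughput times the miss ratio. The sketch below is only that arithmetic. The function name is hypothetical, and the 95% and 99% rows are illustrative inputs, not figures from Netflix.

```python
def residual_read_gbps(storm_gbps: float, cache_hit_ratio: float) -> float:
    """Read throughput that falls through the cache to the backing store."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return storm_gbps * (1.0 - cache_hit_ratio)


# 100 Gbps is the worst-case storm estimate quoted in the article.
for hit_ratio in (0.90, 0.95, 0.99):
    residual = residual_read_gbps(100, hit_ratio)
    print(f"hit ratio {hit_ratio:.0%}: {residual:.1f} Gbps reaches Cassandra")
```

The relationship is linear in the miss ratio, so each additional point of hit ratio removes about 1 Gbps of read load from the storage layer in this scenario.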

Path isolation was the second major fix. Netflix built completely separate EC2 compute stacks for publishing traffic (from the cloud packager) and CDN-facing traffic (from Open Connect nodes). At the storage layer, separate KeyValue clusters serve read and write operations independently. A CDN traffic surge cannot physically reach the publishing path. Blast radius (the scope of damage if one component fails — by isolating publishing from CDN retrieval, Netflix ensured that an origin storm could degrade DVR delivery without ever threatening the live edge) is contained at the architecture level rather than by runtime throttling.

The redundant pipeline: epoch locking without communication

Netflix runs two completely independent live encoding pipelines across separate AWS regions, with separate contribution feeds, encoders, and packagers. Epoch locking (a mechanism where both pipelines use synchronised timestamps derived from UTC timecodes embedded in each video frame, ensuring their output segments are interchangeable without direct inter-pipeline communication) ensures the two pipelines produce interchangeable segments. When the Live Origin receives a CDN request, it selects the first valid segment from either pipeline — providing transparent, automatic failover without any client involvement.
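
One way to picture epoch locking and per-segment failover is the sketch below. It assumes segment numbers are a pure function of the frame's UTC timecode and a shared epoch, and that the origin can tell whether a segment is valid. The names segment_index, Segment, select_segment, the epoch value, and the valid flag are all hypothetical; the article does not describe how Netflix detects a defective segment.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

SEGMENT_DURATION_MS = 2_000  # 2-second segments, as in the article
EVENT_EPOCH_MS = 0           # assumption: a shared, pre-agreed epoch (Unix epoch here)


def segment_index(frame_utc_ms: int) -> int:
    """Map a frame's embedded UTC timecode to a segment number.

    This is a pure function of the timecode, so two pipelines that never
    talk to each other still assign the same frame to the same segment.
    """
    return (frame_utc_ms - EVENT_EPOCH_MS) // SEGMENT_DURATION_MS


@dataclass(frozen=True)
class Segment:
    pipeline: str
    index: int
    data: bytes
    valid: bool = True  # hypothetical flag: False when a defect was detected


def select_segment(candidates: Sequence[Optional[Segment]]) -> Optional[Segment]:
    """Return the first present, valid candidate, or None if no pipeline has one."""
    for seg in candidates:
        if seg is not None and seg.valid:
            return seg
    return None
```

Since both pipelines derive the same index for the same instant, the origin can answer a request for segment N with either pipeline's copy, and the client never learns which one it received.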

The 404 storm defence: metadata caching

Live Origin structures metadata hierarchically as event → stream rendition → segment. When CDN nodes request segments that don't yet exist — because the encoder hasn't finished them yet — the origin rejects the request using cached event and rendition metadata, achieving a 90%+ cache hit ratio on these control-plane lookups and preventing the 404 flood from ever reaching the Cassandra storage layer.
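
A minimal sketch of the idea, assuming the cached metadata records the highest published segment index per rendition. That layout, the MetadataCache class, and the status strings are assumptions for illustration; the article only states that event and rendition metadata are cached.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class MetadataCache:
    """In-memory view of event -> rendition -> highest published segment index."""
    # Hypothetical layout: {event_id: {rendition_id: latest_segment_index}}
    latest: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def publish(self, event: str, rendition: str, index: int) -> None:
        """Record that a segment has been published for this rendition."""
        renditions = self.latest.setdefault(event, {})
        renditions[rendition] = max(index, renditions.get(rendition, -1))

    def lookup(self, event: str, rendition: str, index: int) -> Tuple[int, str]:
        """Return (http_status, reason) without touching segment storage."""
        renditions = self.latest.get(event)
        if renditions is None:
            return 404, "unknown event"
        if rendition not in renditions:
            return 404, "unknown rendition"
        if index > renditions[rendition]:
            return 404, "segment not published yet"
        return 200, "proceed to cache/storage"
```

Every early request for a segment that does not exist yet is answered from process memory, so a burst of such requests costs dictionary lookups rather than storage reads.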

Architecture

The Netflix Live Origin sits at the intersection of the cloud encoding pipeline and the global Open Connect CDN — simultaneously the last step for publishers and the first step for hundreds of millions of viewer requests. Before the Cassandra rebuild, this was a single S3 bucket per region: simple, cheap, and completely unable to handle the latency requirements and read-write contention of a live event at scale.

Before: S3-Based Origin — Single Store, No Path Isolation

View interactive diagram on TechLogStack →

After: Live Origin with Cassandra+EVCache and Full Path Isolation

View interactive diagram on TechLogStack →

Storage architecture before vs after:

Dimension	S3 (Before)	Cassandra + EVCache (After)
Write latency p50	113ms	25ms
Write latency p99	267ms	129ms
Max read throughput	~S3 limit (throttled)	200Gbps+ (EVCache)
Write-read isolation	None — shared bucket	Fully isolated EC2 stacks
AZ failure resilience	S3 replication	Cassandra local-quorum
DVR storm protection	None	503 + 5s TTL backpressure
404 storm defence	None	In-memory metadata cache (90%+ hit)

Priority Rate Limiting: Engineering Backpressure Into the System

When the system is under storage stress, the origin's priority rate limiter sheds DVR traffic with HTTP 503 and a 5-second max-age header. The limiter is a per-request traffic shaper that categorises incoming requests by user impact, with live edge as the highest priority and DVR below it, and sheds the lower-priority traffic first when resources are constrained. Deliberately returning HTTP 503 to low-priority traffic is not a bug: it is an engineered backpressure mechanism that protects the live edge for every viewer currently watching the event.

Lessons

General-purpose storage cannot serve as the foundation of a real-time system. S3 is brilliant at what it does — durable, scalable object storage — but its p99 latency and request throttling behaviour are incompatible with hard 2-second segment deadlines. If your system has a real-time SLA, audit every storage dependency and ask whether it was designed to meet that SLA — not just whether it typically does.
Read storms and write requirements are incompatible on shared storage without explicit isolation. The Origin Storm only became visible at live-event scale, but the architectural vulnerability existed from day one. Separate your write path from your read path at the infrastructure level — not just logically — before you discover the contention in production.
Write-through caching is a write-protection strategy, not just a read-acceleration one. Netflix's EVCache layer absorbs nearly all CDN reads, meaning the write path in Cassandra operates in near-silence even during a 65-million-viewer storm. When you add a cache to a system, design it explicitly to protect your write path, not merely to speed up reads.
Priority-based degradation is an active reliability tool, not a failure mode. Deliberately returning HTTP 503 with a 5-second TTL to low-priority DVR traffic is an engineered backpressure mechanism that protects the live edge for viewers watching right now. Build explicit traffic tiers into your architecture, with defined behaviours for what gets shed first when resources are constrained.
Redundancy only works if it is tested at the actual failure granularity. Netflix runs two fully independent encoding pipelines across separate AWS regions, contribution feeds, encoders, and packagers. Epoch locking (synchronising both pipelines via UTC timecodes embedded at the contribution encoder, so they produce interchangeable segments without inter-pipeline communication) makes them interchangeable at the segment level. True redundancy must be tested at the smallest unit of failure in your system, not the largest.

Engineering Glossary

Epoch locking — a synchronisation mechanism where both of Netflix's live encoding pipelines use timestamps derived from UTC timecodes embedded in each video frame, ensuring their output segments are interchangeable without direct inter-pipeline communication. Enables transparent per-segment failover.

EVCache — Netflix's distributed Memcached layer, used as a write-through read cache in the Live Origin architecture. Absorbs nearly all CDN read requests, keeping read traffic from reaching the Cassandra write path.

Live Origin — the custom-built microservice bridging Netflix's cloud encoding pipeline and the Open Connect CDN. The last step for publishers (writing encoded segments) and the first step for CDN nodes (requesting those segments).

LSM (Log-Structured Merge Tree) — Cassandra's write-optimised storage engine that buffers writes in memory and flushes sequentially to disk, achieving high write throughput with predictable latency.

Open Connect — Netflix's proprietary global CDN, a network of hardware appliances co-located inside ISPs worldwide. Each appliance caches video content close to viewers; for live content, they must request new segments from Live Origin every 2 seconds.

Origin Storm — the scenario where many Open Connect CDN nodes request the same segment from the origin at once, generating burst read throughput that can overwhelm the storage system. Mitigated by EVCache write-through caching: at a 90% hit ratio, only about 10% of the storm reaches Cassandra.

Priority rate limiter — a per-request traffic shaper in Live Origin that categorises incoming requests by user impact (live edge vs DVR) and sheds lower-priority traffic first when resources are constrained. Returns HTTP 503 + 5-second max-age TTL to DVR requests, instructing CDN to back off.

SVOD (Subscription Video on Demand) — Netflix's traditional on-demand content model where content is pre-encoded, cached extensively at CDN edges, and served with near-zero origin load. Architecturally opposite to live streaming where every segment is brand new.

This case is a plain-English retelling of publicly available engineering material.
