Netflix VOD is a solved problem. Content is encoded ahead of time, cached on edge servers, and served from the closest node. You press play, it works.
Live streaming is an entirely different beast. Every video segment must be encoded, packaged, and delivered to millions of viewers within seconds. There's no "ahead of time." There's only "right now."
When Netflix introduced live streaming for events like the Tyson vs. Paul fight (65 million concurrent streams), they built a custom system called Live Origin to sit between their cloud encoding pipelines and Open Connect, their CDN. Here's how it works.
The Live Origin operates as a multi-tenant microservice on Amazon EC2 instances. The communication model is straightforward:
Encoder → Packager → [HTTP PUT] → Live Origin → [HTTP GET] → Open Connect → Viewers
Two architectural decisions shaped everything:
Redundant pipelines. Netflix runs two independent encoding pipelines simultaneously, each in a different cloud region with its own encoder, packager, and video feed. If one produces a bad segment, the other typically produces a good one.
Predictable segment templates. Instead of constantly updating a manifest to list available segments, each segment has a fixed duration of 2 seconds. The Origin can predict exactly when each segment should be published.
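Under a fixed-duration template, segment availability is pure arithmetic. A minimal Python sketch of the idea (function names are mine, not Netflix's):

```python
SEGMENT_DURATION_S = 2  # fixed segment duration from the template

def expected_publish_time(stream_start: float, segment_index: int) -> float:
    """A segment becomes available once its 2-second window has been encoded."""
    return stream_start + (segment_index + 1) * SEGMENT_DURATION_S

def live_edge_index(stream_start: float, now: float) -> int:
    """Index of the most recent segment that should already be published."""
    return int((now - stream_start) // SEGMENT_DURATION_S) - 1
```

With this, every node in the delivery path can agree on what should exist at any instant without consulting a manifest.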
Live video feeds inevitably produce defects — short segments with missing frames, missing segments entirely, or timing discontinuities. Two independent pipelines in different regions substantially reduce the chance that both produce defective segments at the same time.
When Open Connect requests a segment, the Origin:
Pipeline A (us-east-1): segment_1042 → ✓ valid
Pipeline B (us-west-2): segment_1042 → ✗ missing frames
Origin decision: serve Pipeline A's segment
In the rare case where both are defective, the defect metadata passes downstream so clients can handle the error gracefully. No silent corruption.
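The selection logic above can be sketched in Python (the `Segment` type and its fields are illustrative, not Netflix's actual schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    data: bytes
    defective: bool = False
    defect_reason: Optional[str] = None

def select_segment(a: Optional[Segment], b: Optional[Segment]):
    """Prefer any healthy copy; if both are bad, surface defect metadata."""
    for seg in (a, b):
        if seg is not None and not seg.defective:
            return seg, None
    # Both defective or missing: pass the defects downstream rather than
    # serving silent corruption.
    reasons = [s.defect_reason if s else "missing" for s in (a, b)]
    return a or b, reasons
```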
Open Connect was built for VOD — years of tuning nginx for pre-positioned content. Live streaming didn't fit that model, so Netflix extended nginx's proxy-caching with several optimizations:
Reject invalid requests early. Open Connect nodes know the segment templates. If a request asks for a segment outside the legitimate range, it's rejected immediately without hitting the Origin.
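With the template in hand, this validation is a bounds check. A sketch (the `lookahead` allowance for slightly-early requests is my assumption):

```python
def is_valid_request(segment_index: int, live_edge: int, lookahead: int = 1) -> bool:
    """Reject indexes outside the legitimate range before they reach the Origin."""
    return 0 <= segment_index <= live_edge + lookahead
```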
Cache 404s intelligently. When a segment isn't available yet, the Origin returns a 404 with an expiration policy. Open Connect caches this 404 until just before the segment is expected, preventing repeated failed requests.
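The 404 expiry falls out of the same template arithmetic. A sketch (the safety margin is an assumed tuning knob, not a published Netflix value):

```python
def not_found_ttl(now: float, expected_publish: float, margin: float = 0.25) -> float:
    """How long to cache a 404: until just before the segment should appear."""
    return max(0.0, expected_publish - now - margin)
```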
Hold requests at the live edge. When a request arrives for the next segment that's about to be published, instead of returning a 404 that propagates back to the client, the Origin holds the request open. Once the segment is published, it responds immediately. This eliminates a full round-trip for requests that arrive slightly early.
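The actual hold behavior lives inside Netflix's nginx extension; as a rough model of the idea, here is an asyncio sketch (class and method names are mine):

```python
import asyncio

class LiveEdgeHold:
    """Park requests for an about-to-publish segment instead of returning 404."""
    def __init__(self):
        self._events: dict[int, asyncio.Event] = {}

    async def wait_for(self, index: int, timeout: float) -> bool:
        # Hold the request open until the segment is published or we time out.
        ev = self._events.setdefault(index, asyncio.Event())
        try:
            await asyncio.wait_for(ev.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False

    def publish(self, index: int) -> None:
        # Wake every request parked on this segment immediately.
        self._events.setdefault(index, asyncio.Event()).set()
```

A held request that resolves this way saves the client a full round-trip it would otherwise spend retrying.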
Millisecond-grain caching. Standard HTTP Cache-Control works at second granularity — too coarse when segments are generated every 2 seconds. Netflix added millisecond-precision caching to nginx.
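To illustrate why whole-second granularity is too coarse, here is a toy cache with millisecond-precision expiry (this models the idea, not Netflix's actual nginx patch):

```python
import time

class MsCache:
    """Cache with millisecond-precision expiry. Whole-second max-age loses up
    to a second of freshness per entry, which is huge on a 2-second cadence."""
    def __init__(self):
        self._entries = {}  # key -> (value, expiry as float epoch seconds)

    def put(self, key, value, ttl_ms: int) -> None:
        self._entries[key] = (value, time.time() + ttl_ms / 1000.0)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if time.time() >= expiry:
            del self._entries[key]  # expired: evict and report a miss
            return None
        return value
```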
Netflix uses custom HTTP headers to broadcast live streaming events at scale. The encoding pipeline sends notifications to the Origin, which inserts them as headers on subsequent segments. These headers are cumulative — they persist to future segments.
When a segment arrives at an Open Connect node, notifications are extracted from response headers and stored in memory. When serving to clients, the latest notification data is attached, regardless of where the viewer is in the stream.
This efficiently communicates ad breaks, content warnings, or live event updates to millions of devices, independent of playback position. No separate notification channel needed.
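The cumulative-header mechanics can be modeled like this (the `X-Live-Event-*` prefix is a made-up stand-in for whatever header names Netflix actually uses):

```python
class NotificationCache:
    """Latest event notifications extracted from segment response headers."""
    def __init__(self):
        self._latest: dict[str, str] = {}

    def ingest(self, headers: dict[str, str]) -> None:
        # Cumulative: newer notification headers overwrite older values,
        # everything else persists.
        for k, v in headers.items():
            if k.lower().startswith("x-live-event-"):
                self._latest[k.lower()] = v

    def attach(self, headers: dict[str, str]) -> dict[str, str]:
        # Merge the latest notifications into an outgoing response,
        # regardless of where in the stream the viewer is.
        return {**headers, **self._latest}
```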
Netflix initially used AWS S3, similar to their VOD infrastructure. It worked for low-traffic events. Then they discovered live streaming's unique requirements.
S3 met its uptime guarantees, but the strict timing budget meant any delay was catastrophic. The requirements were closer to a global, low-latency, highly available database than object storage.
Netflix built on their Key-Value Storage Abstraction using Apache Cassandra.
The results: median latency dropped from 113ms to 25ms. P99 latency improved from 267ms to 129ms.
But there was still the Origin Storm problem — dozens of top-tier caches simultaneously requesting large segments. Worst-case read throughput could reach 100+ Gbps. Concurrent reads degraded write performance unacceptably.
The solution: write-through caching with EVCache (Netflix's distributed caching layer based on Memcached). Almost all reads are served from cache at 200+ Gbps throughput without affecting the write path. Only cache misses hit Cassandra.
Write path: Packager → Origin → KeyValue → Cassandra (+ EVCache write-through)
Read path: Open Connect → Origin → EVCache (hit) → response
→ Cassandra (miss) → EVCache → response
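The write-through pattern above, reduced to its essentials (plain dicts stand in for Cassandra and EVCache; this is a model of the pattern, not Netflix's code):

```python
class WriteThroughOrigin:
    """Writes land in both the datastore and the cache; reads hit the cache first."""
    def __init__(self, store: dict, cache: dict):
        self.store, self.cache = store, cache

    def put(self, key, value) -> None:
        self.store[key] = value   # durable write (stand-in for Cassandra)
        self.cache[key] = value   # write-through (stand-in for EVCache)

    def get(self, key):
        if key in self.cache:     # almost all reads stop here
            return self.cache[key]
        value = self.store.get(key)  # cache miss falls back to the datastore
        if value is not None:
            self.cache[key] = value  # repopulate so the next read is a hit
        return value
```

Because writes populate the cache up front, the read storm never contends with the write path in the datastore.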
Not all requests have the same importance: live-edge requests from viewers at the head of the stream matter more than DVR requests for older segments. Netflix implemented complete path isolation between the two classes.
During stress, priority-based rate limiting kicks in. Live edge traffic gets prioritized over DVR traffic. Detection uses the predictable segment template cached in memory — no datastore access needed, which is especially valuable during datastore stress.
When low-priority traffic must be shed during a surge, the Origin returns HTTP 503 with max-age=5, telling Open Connect to cache the rejection for 5 seconds. This dampens traffic storms without manual intervention.
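Both behaviors, template-based classification and the cacheable 503, fit in a few lines. A sketch (the classification threshold and the `dvr`/`live-edge` labels are mine):

```python
def classify(segment_index: int, live_edge: int) -> str:
    """Classify a request using only the in-memory segment template."""
    return "live-edge" if segment_index >= live_edge - 1 else "dvr"

def handle(segment_index: int, live_edge: int, under_stress: bool):
    if under_stress and classify(segment_index, live_edge) == "dvr":
        # Cacheable rejection: Open Connect holds the 503 for 5 seconds,
        # so retries die at the edge instead of hammering the Origin.
        return 503, {"Cache-Control": "max-age=5"}
    return 200, {}
```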
Traffic storms from requests for non-existent segments are handled through hierarchical metadata. Event- and rendition-level metadata is served from the Origin's in-memory cache; during 404 storms, the control-plane cache hit ratio exceeds 90%, so the datastore sees no added stress.
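The hierarchical short-circuit can be sketched as follows (the data shapes are illustrative; bogus events and renditions die in the in-memory control-plane cache without touching the datastore):

```python
def lookup(event_cache: dict, segment_store: dict,
           event_id: str, rendition: str, index: int):
    """Resolve event -> rendition -> segment, cheapest checks first."""
    event = event_cache.get(event_id)           # in-memory control-plane data
    if event is None or rendition not in event["renditions"]:
        return 404                              # unknown event or rendition
    if index > event["live_edge"]:
        return 404                              # beyond the template's range
    return segment_store.get((event_id, rendition, index), 404)
```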
Netflix's Live Origin succeeds because it respects the fundamental constraint of live streaming: time is the budget. Every design decision — redundant pipelines, predictive templates, held requests, write-through caching, priority rate limiting — exists to protect the 2-second window between encoding and delivery.
At 65 million concurrent streams during the Tyson vs. Paul fight, the system held. Not because it was clever, but because it was deliberately designed around its failure modes.
Design for the 2-second budget. Everything else follows.
— blanho