Originally published on pritamroy.com
Setting the Stage: What Actually Happened on March 8, 2026
Before we talk infrastructure, let's appreciate the scale of the event that stress-tested it.
India defeated New Zealand by 96 runs in the ICC Men's T20 World Cup 2026 Final at the Narendra Modi Stadium in Ahmedabad, posting a mammoth 255/5 - the highest total ever in a T20 World Cup final. India became the first team in history to retain their T20 World Cup title, and the first to win three T20 World Cup titles overall. The stadium held 86,000 roaring fans. Hundreds of millions watched on screens across every corner of India and the world.
And JioHotstar? It didn't just survive. It rewrote history.
The Numbers That Made Engineers Sweat (And Then Celebrate)
The concurrent viewership peaked at 82.1 crore simultaneous streams during the post-match presentation ceremony. Let that number sink in - 821 million streams at a single moment, from a single platform, from a single country.
Here's how the demand curve looked throughout the match:
| Moment | Concurrent Viewers |
|---|---|
| Ricky Martin's opening performance | 2.1 crore |
| At the toss | 4.2 crore |
| End of India's innings (255/5) | 43.9 crore |
| Innings break | 44.3 crore |
| New Zealand start chasing 255 | 49.9 crore |
| End of the 1st over of the chase | 50.3 crore |
| The moment the last wicket fell | 74.5 crore |
| Post-match presentation ceremony | 82.1 crore |
This is a textbook demand curve for any DevOps engineer to study - a slow warm-up, a steep mid-event ramp, and a vertical spike at the moment of maximum drama. Every system design decision JioHotstar made had to account for exactly this shape.
For context: the 2024 T20 WC Final peaked at just 5.3 crore on Disney+ Hotstar. In two years, they scaled peak concurrency by more than 15x. That is not an accident - that is an engineering masterclass.
They also came into the Final having broken a world record just days earlier. During the India vs England semi-final on March 5, JioHotstar recorded 65.2 million peak concurrent viewers - the highest concurrency ever achieved for a live event across any digital platform in the world. The Final obliterated even that number.
Part 1: The Foundation - Understanding JioHotstar's Architecture Origins
To understand the engineering decisions, you need to understand the entity first.
By late 2024, Reliance Industries (through Viacom18) and The Walt Disney Company announced an $8.5 billion joint venture called JioStar, combining Viacom18's media assets with Disney's Star India and Hotstar operations in India.
This merger gave the DevOps teams something rare: two battle-hardened streaming backends to draw lessons from. Disney+ Hotstar had years of cricket-at-scale experience, having served the 2023 ODI World Cup and multiple IPL seasons. JioCinema had cracked the 4K pipeline and aggressive CDN work. The 2026 World Cup was the first true test of whether the combined architecture could handle something neither had ever attempted alone.
Part 2: Pre-Tournament Planning - How You Prepare for an 82-Crore Spike
In DevOps, you never wait for production to find your limits. JioHotstar's SRE teams began capacity planning months before the first ball was bowled.
Traffic Forecasting Using Historical Data
The SRE teams forecast traffic using a predictive model trained on data from previous major streaming events: the 2024 T20 WC Final (5.3 crore), the 2023 ODI WC Final (5.9 crore), Asia Cup peaks, and IPL finals. Engineers built regression models accounting for factors like: is India playing? What stage of the tournament is it? What time of day? What are the network conditions across India's diverse geography?
The key insight: the Final was always going to be the largest event, and the models needed to be revised upward after each knockout match. After the semi-final had already set a world record at 65.2 million concurrent, the capacity plan for the Final had to be re-evaluated entirely.
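As a sketch of what revision-aware forecasting might look like - with hypothetical figures and a deliberately simple log-linear trend, not JioHotstar's actual regression model - consider:

```python
# Toy capacity forecast: fit log(peak) = a + b * t by least squares over past
# event peaks, project the next event, and pad with provisioning headroom.
# HISTORY values are illustrative, not real JioHotstar data.
import math

# (event index, peak concurrency in arbitrary units) - hypothetical history
HISTORY = [(1, 5.3), (2, 5.9), (3, 12.0), (4, 26.0)]

def fit_log_linear(points):
    """Least-squares fit of log(y) = a + b * x."""
    n = len(points)
    xs = [x for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (yv - my) for x, yv in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b

def forecast_peak(points, next_x, headroom=1.3):
    """Project the next event's peak, padded 30% for safe provisioning."""
    a, b = fit_log_linear(points)
    return math.exp(a + b * next_x) * headroom

plan = forecast_peak(HISTORY, next_x=5)
print(f"provision for ~{plan:.1f} units of peak concurrency")
```

Re-running the fit after every knockout match - exactly the "revise upward" discipline described above - is what keeps the provisioning target ahead of the demand curve.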
Load Testing at Scale - Project HULK
JioHotstar created an in-house project called "Project HULK" specifically to stress-test their platform before major events. The load generation infrastructure used c5.9xlarge machines distributed across 8 different AWS regions to simultaneously hit the CDN, load balancers, and application layers.
The reason for distributing across 8 regions is subtle but important: cloud providers share underlying physical infrastructure. A massive synthetic load originating from a single region could inadvertently impact other customers co-located on the same hardware. By spreading synthetic load across regions, you simulate a real-world distributed user base while being a responsible cloud tenant.
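A minimal sketch of a region-sharded load generator in the spirit of Project HULK - the region names are hypothetical and `probe` is a stub standing in for a real HTTP request against the CDN and load balancers:

```python
# Region-sharded synthetic load generator sketch. Each "shard" fires its
# requests concurrently, approximating a geographically distributed user base.
import asyncio
import random

REGIONS = ["region-a", "region-b", "region-c"]  # hypothetical shard list

async def probe(region: str) -> float:
    """Stand-in for one synthetic request; returns a simulated latency (ms)."""
    await asyncio.sleep(0)            # yield, as a real network call would
    return random.uniform(20.0, 120.0)

async def run_shard(region: str, requests: int) -> dict:
    latencies = sorted([await probe(region) for _ in range(requests)])
    return {"region": region, "p50_ms": latencies[requests // 2]}

async def fire_all(requests_per_shard: int = 100) -> list[dict]:
    # All regional shards run concurrently, like HULK's distributed fleet
    return await asyncio.gather(
        *(run_shard(r, requests_per_shard) for r in REGIONS)
    )

results = asyncio.run(fire_all())
for shard in results:
    print(shard["region"], f"{shard['p50_ms']:.1f} ms p50")
```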
Pre-warming: The Underrated Hero
Every time a major cricket match was about to begin under the old architecture, the operations team had to manually pre-warm hundreds of load balancers. In the new architecture, this process was fully automated.
But the discipline of pre-warming remained: before the Ricky Martin opening performance even started, JioHotstar's edge nodes, CDN caches, and application clusters were already scaled up and warm. Pre-warming CDN caches with the stream's initial HLS segments, spinning up Kubernetes node pools ahead of anticipated demand, pre-populating authentication session caches - all of this is part of the playbook.
You don't wait for traffic to arrive. You meet it at the door.
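The shape of a cache pre-warm job is simple enough to sketch. Here the edge cache is an in-memory dict and the origin fetch is a stub - in a real job both sides would be HTTP calls against CDN PoPs, and the paths are illustrative:

```python
# CDN pre-warm sketch: fetch a stream's first HLS segments so the edge cache
# is hot before viewers arrive. Paths and the fetcher are illustrative stubs.
SEGMENTS = [f"live/final/seg_{i:05d}.ts" for i in range(5)]  # first few chunks

edge_cache: dict[str, bytes] = {}  # stand-in for a CDN edge PoP cache

def fetch_from_origin(path: str) -> bytes:
    return b"\x47" * 188  # stub: a real segment is MPEG-TS packets (0x47 sync)

def prewarm(paths) -> int:
    for p in paths:
        if p not in edge_cache:          # idempotent: safe to re-run
            edge_cache[p] = fetch_from_origin(p)
    return len(edge_cache)

warmed = prewarm(SEGMENTS)
print(f"warmed {warmed} segments")
```

The idempotency matters: an automated pre-warm pipeline may be triggered repeatedly as match start approaches, and re-runs must not hammer the origin.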
Part 3: The Kubernetes Architecture - DataCenter Abstraction
This is the most significant architectural evolution in JioHotstar's history, and the one with the most lessons for any platform engineering team.
The Old World
Previously, Hotstar managed its workloads on two large, self-managed Kubernetes clusters built using KOPS (Kubernetes Operations), running 800+ microservices across them. Every microservice had its own AWS Application Load Balancer (ALB) using NodePort services.
The request flow looked like this:
```
Client → CDN → ALB → NodePort → kube-proxy → Pod
```
The problems were multiple. Hundreds of ALBs needed to be manually pre-warmed before every major match - an error-prone, time-consuming process. The old Cluster Autoscaler was too slow to release or consolidate nodes efficiently during off-peak periods. And scaling beyond 400 nodes simultaneously caused API server throttling - a hard ceiling on their peak capacity.
The New Model: DataCenter Abstraction
The new model introduced a concept called DataCenter Abstraction. A "data center" in this model doesn't refer to a physical building - it's a logical grouping of multiple Kubernetes clusters within a specific region. Together, these clusters behave like a single large compute unit, with each application team given a single logical namespace.
What this means in practice for the World Cup Final:
- JioHotstar could treat its AWS infrastructure across Mumbai, Hyderabad, and Delhi as a single logical pool
- A central Envoy proxy replaced hundreds of individual ALBs, unifying traffic routing, authentication, and rate-limiting in one place
- Services moved from NodePort to ClusterIP + ALB Ingress, eliminating hard port limits
- Developers deploy one YAML manifest per service; the platform handles failover and routing behind the scenes
They also migrated from self-managed KOPS clusters to Amazon EKS, offloading Kubernetes control plane management to AWS. Combined with Karpenter, nodes now provision in seconds rather than minutes - critical when viewership goes from 44 crore to 74 crore in the final 4 overs of a chase.
```yaml
# Karpenter NodePool - simplified example
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: live-streaming-pool
spec:
  template:
    spec:
      requirements:
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["c6i.8xlarge", "c6g.8xlarge", "c5.9xlarge"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand", "spot"]
        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["ap-south-1a", "ap-south-1b", "ap-south-1c"]
      kubelet:
        maxPods: 110
  limits:
    cpu: "8000"
    memory: "16Ti"
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h
```
The `capacity-type` requirement includes both `on-demand` and `spot`, meaning Karpenter intelligently places stateless, fault-tolerant workloads on cheaper Spot instances while keeping critical session services on On-Demand. The `consolidationPolicy: WhenUnderutilized` setting ensures underused nodes are promptly released during the innings break, saving cost in real time.
IP Address Management - A Lesson from 2023
A critical incident during the 2023 World Cup involved running out of IP addresses. The VPC CNI plugin's WARM_IP_TARGET and MINIMUM_IP_TARGET settings were over-allocating IPs per node. For 2026, engineers used larger CIDR blocks (/18 instead of /20) and fine-tuned these settings, allowing clusters to scale beyond 400 nodes without hitting IP exhaustion.
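The arithmetic behind the CIDR change is worth making explicit. This back-of-envelope check uses an illustrative per-node IP budget (the real figure depends on the tuned `WARM_IP_TARGET`/`MINIMUM_IP_TARGET` values, which aren't public):

```python
# Why CIDR sizing matters: usable IPs per subnet vs. achievable node count.
# AWS reserves 5 addresses in every subnet; IPS_PER_NODE is hypothetical.
def usable_ips(prefix: int) -> int:
    return 2 ** (32 - prefix) - 5   # AWS reserves 5 IPs per subnet

IPS_PER_NODE = 30  # hypothetical: node IP plus the VPC CNI's warm IP pool

for prefix in (20, 18):
    cap = usable_ips(prefix) // IPS_PER_NODE
    print(f"/{prefix}: {usable_ips(prefix)} usable IPs -> ~{cap} nodes")
```

Under these assumptions a /20 caps out well below the 400-node scaling ceiling mentioned earlier, while a /18 clears it comfortably - which is exactly the failure mode the 2023 incident exposed.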
Part 4: Infrastructure Scaling - Eliminating Every Bottleneck
Kubernetes architecture is only part of the picture. The network infrastructure underneath also needed surgery.
NAT Gateway Scaling
Monitoring with VPC Flow Logs surfaced a frightening finding during a pre-tournament load test: a single Kubernetes cluster was consuming 50% of its NAT Gateway throughput at just 10% of expected peak load. At full Final traffic, this would have been a catastrophic bottleneck.
The fix: scale out from one NAT Gateway per Availability Zone to one NAT Gateway per subnet. This distributed the external traffic load evenly and eliminated the pressure point entirely.
Worker Node Network Optimization
Load tests showed that internal API Gateway pods were consuming 8–9 Gbps of network bandwidth on individual nodes, causing severe contention with other services.
Two fixes were implemented in parallel:
- Deploy high-throughput nodes with a minimum capacity of 10 Gbps for API Gateway workloads
- Use Kubernetes topology spread constraints to ensure only one API Gateway pod runs per node
```yaml
# Topology spread constraint for API Gateway pods
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: api-gateway
```
This constraint ensures Kubernetes never schedules two API Gateway pods on the same physical node. The result: throughput stabilized at 2–3 Gbps per node even at peak, rather than saturating at 8–9 Gbps on a few overloaded nodes.
Part 5: The Video Pipeline - From Camera to 82 Crore Phones in Under 5 Seconds
Most people think of streaming as "just sending video." For a live match at this scale, it is an extraordinarily intricate real-time data pipeline with multiple stages, each completing in sub-second timeframes.
Stage 1 - Ingestion: Getting the Feed from the Ground
At the Narendra Modi Stadium, production crews captured the match using multiple HD and 4K cameras. The raw feed travels via dedicated broadcast fiber links using the SRT (Secure Reliable Transport) protocol. SRT can recover from roughly 20% packet loss - resilience the older RTMP protocol cannot match, and critical given India's network variability.
Stage 2 - Transcoding: One Feed, 100 Million Devices
Raw feeds hit AWS Elemental MediaLive on p4d.24xlarge GPU instances, transcoding multiple adaptive renditions in under 2 seconds. A single 4K broadcast feed is simultaneously converted into:
| Profile | Target Audience |
|---|---|
| 360p | 2G/3G users in rural India |
| 480p | Moderate connections |
| 720p | Standard HD |
| 1080p | Good broadband |
| 4K HDR | Premium fiber/5G subscribers |
The 2026 World Cup featured true 4K HDR streaming - not upscaled 1080p - at genuinely high bitrates. Every rendition is generated in real time, in parallel, with sub-2-second latency.
Stage 3 - Packaging: HLS, DASH, and DRM
AWS MediaPackage segments outputs into HLS/DASH chunks at over 100,000 chunks per second, applies DRM encryption through Widevine and PlayReady, and dynamically adds captions and regional subtitles. MediaPackage does just-in-time packaging - eliminating the need to pre-generate format-specific segments for every device type.
Stage 4 - Storage and Delivery
Amazon S3 Intelligent-Tiering stores HLS/DASH chunks with multi-AZ replication. CloudFront delivers them via 300+ edge locations worldwide. Live stream segments are accessed billions of times in their first few seconds and then almost never again - S3 Intelligent-Tiering handles this access pattern perfectly, automatically reducing storage costs.
Part 6: The CDN Layer - The True Workhorse of 82 Crore Streams
If the video pipeline is the heart, the CDN is the circulatory system. No single origin server can serve 82 crore simultaneous streams.
Multi-CDN Strategy
JioHotstar employs a multi-CDN strategy with an in-house CDN load optimizer that dynamically chooses between Akamai, CloudFront, and others, always routing viewers through the least congested path. If one CDN faces an issue, another picks up the slack - completely transparent to the viewer.
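A toy version of such a CDN load optimizer: route each playback session to the healthiest provider by a score built from live probe telemetry. The provider names, metrics, and scoring weights below are illustrative, not JioHotstar's real algorithm:

```python
# Multi-CDN selection sketch: lower score wins. The weights blend latency,
# error rate, and utilization into a single congestion signal (illustrative).
def score(cdn: dict) -> float:
    return cdn["p95_ms"] * 0.5 + cdn["error_rate"] * 5000 + cdn["util"] * 100

def pick_cdn(cdns: list[dict]) -> str:
    return min(cdns, key=score)["name"]

telemetry = [
    {"name": "cdn-a", "p95_ms": 180, "error_rate": 0.001, "util": 0.92},
    {"name": "cdn-b", "p95_ms": 140, "error_rate": 0.002, "util": 0.60},
    {"name": "cdn-c", "p95_ms": 300, "error_rate": 0.030, "util": 0.40},
]
print(pick_cdn(telemetry))  # → cdn-b
```

Because the decision runs per session against fresh telemetry, a degrading provider bleeds traffic automatically - the "another picks up the slack" behavior, with no operator in the loop.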
Traffic Segregation
| Traffic Type | Routing Strategy |
|---|---|
| Cacheable (scorecards, stats, highlights) | Dedicated CDN domain, aggressive cache TTLs |
| Non-cacheable (sessions, personalization) | Separate routing path, correctness-first |
| Non-video (images, metadata) | Cost-efficient CDN providers |
This segregation preserves high-performance CDN capacity specifically for video segment delivery.
The Jio Network Advantage: A Moat No Competitor Can Copy
JioHotstar is part of a company that also owns the physical network delivering the stream. Jio's 5G network works with Jio's own Mobile Edge Computing (MEC) servers, placing compute resources physically inside the telecom network - at the base station layer - rather than in a distant cloud data center.
For 500 million+ Jio subscribers, the World Cup Final was served from their own carrier's edge - a fundamentally different and faster delivery path than what any competitor can offer.
Part 7: Microservices at Scale - 800+ Services Serving One Match
The microservices architecture means video playback, authentication, personalization, live chat, multilingual commentary routing, payment processing, and analytics are all independent services. This isolation is critical: if the live emoji reaction feature crashes during Bumrah's 4th wicket, it should crash without affecting the video stream.
Feature Flags: The Safety Net
Feature flags allow gradual rollout and instant kill-switches without any deployment. In a worst-case scenario - say, a memory leak in the live chat microservice - engineers flip a single flag to disable chat for all users, immediately reducing load without any restart or deployment.
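The kill-switch pattern is easy to sketch. Here the flag store is an in-process dict; real systems back it with a config service so a flip propagates to every pod in seconds. All names are illustrative:

```python
# Feature-flag kill switch sketch: gating a non-critical feature (chat)
# while the critical path (video) is never gated.
FLAGS = {"live_chat": True, "emoji_reactions": True}

def flag_enabled(name: str) -> bool:
    return FLAGS.get(name, False)   # default-off: unknown flags stay dark

def handle_request(user_id: int) -> dict:
    response = {"video": "stream-url"}     # the stream is never gated
    if flag_enabled("live_chat"):          # chat only if the flag is on
        response["chat"] = f"chat-shard-{user_id % 64}"
    return response

FLAGS["live_chat"] = False    # ops flips one flag: no deploy, no restart
assert "chat" not in handle_request(42)
print("chat disabled, stream unaffected:", handle_request(42))
```

Note the default-off behavior for unknown flags: a typo in a flag name fails safe rather than accidentally enabling a feature.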
The Kafka and Flink Real-Time Pipeline
Every viewer generates continuous telemetry events. At 82 crore concurrent users, this is billions of messages per second.
- Apache Kafka - distributed, fault-tolerant message queue absorbing event bursts
- Apache Flink - real-time processing for dashboards, anomaly detection, and adaptive algorithms
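The core computation a Flink job performs on this firehose is windowed aggregation. A toy tumbling-window version in plain Python - the event shape is hypothetical, and a real Flink job would also handle out-of-order events and watermarks:

```python
# Tumbling-window aggregation sketch: count telemetry events per event type
# in fixed 10-second windows, the kind of rollup that feeds live dashboards.
from collections import defaultdict

def tumbling_window_counts(events, window_s=10):
    """Group (timestamp, event_type) pairs into fixed-size windows."""
    windows: dict[tuple[int, str], int] = defaultdict(int)
    for ts, kind in events:
        windows[(ts // window_s, kind)] += 1
    return dict(windows)

events = [(1, "rebuffer"), (4, "play"), (11, "rebuffer"), (12, "rebuffer")]
counts = tumbling_window_counts(events)
print(counts)  # {(0, 'rebuffer'): 1, (0, 'play'): 1, (1, 'rebuffer'): 2}
```

A spike in the `rebuffer` count for the latest window is precisely the anomaly signal that would page the war room.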
Part 8: Observability - The SRE War Room During the Final
The monitoring stack ran three layers simultaneously:
| Tool | Purpose |
|---|---|
| AWS CloudWatch | Infrastructure metrics (EC2 CPU, RDS connections, NAT throughput) |
| Prometheus | Application-level and custom business metrics |
| Grafana | Real-time visualization - latency, throughput, rebuffer trends |
The single most important metric: rebuffer rate - the percentage of viewers experiencing playback interruption.
```
# Prometheus alert rule for rebuffer rate
sum(rate(media_rebuffer_events[5m])) / sum(rate(media_play_time[5m])) > 0.004
```
At 82 crore viewers, 0.4% means roughly 32.8 lakh people buffering simultaneously - an unacceptable outcome. Every metric had an automated alert. Every alert had a documented runbook. Every runbook had been practiced.
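The arithmetic behind that threshold can be checked directly:

```python
# Working the 0.4% alert boundary out explicitly (pure arithmetic).
VIEWERS = 82.1e7        # 82.1 crore concurrent viewers (1 crore = 10 million)
THRESHOLD = 0.004       # the 0.4% rebuffer-rate alert boundary

affected = VIEWERS * THRESHOLD
print(f"{affected / 100_000:.1f} lakh viewers buffering")  # 32.8 lakh
```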
Chaos Engineering: Breaking Things Before Match Day
Before major events, JioHotstar's teams ran chaos drills at 2 AM:
- Deliberately killing an entire Availability Zone
- Simulating a CDN provider outage
- Injecting latency into the authentication service
- Validating automated failover and recovery
Good SRE teams don't wait for production failures - they engineer them deliberately.
Part 9: Caching Strategy - Keeping 82 Crore Sessions Alive
The solution is an aggressive multi-layer caching hierarchy:
Layer 1 - CDN Edge Cache
The video segment cached at the CDN. If served from a CloudFront edge PoP, JioHotstar's origin never sees that request at all. This is the most important cache hit in the entire system.
Layer 2 - Application-Level Redis Cache
User session tokens and subscription entitlements cached in Redis clusters. Subscription verified once at playback start, cached for the match duration. Subsequent requests bypass the database entirely.
Layer 3 - Database Read Replicas
Multiple read replicas spread across AZs serve preferences and recommendation data. Write traffic goes only to the primary.
A well-designed caching layer means 82 crore viewers might generate fewer database queries than 5 lakh viewers on a poorly designed system.
Part 10: Adaptive Bitrate and AI Optimization - Client Intelligence at Scale
The ABR player constantly measures download speed, buffer health, and network latency - running entirely on the client side. For 82 crore simultaneous viewers, even a 1ms server-side computation per quality decision would be catastrophic: roughly 820,000 CPU-seconds - more than nine days of compute - per decision cycle.
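A stripped-down version of the decision each client makes locally. The ladder bitrates below are illustrative values matching the rendition profiles, and the heuristic (highest rendition fitting within 80% of measured throughput, dropping to the floor when the buffer runs low) is a common textbook rule, not JioHotstar's actual algorithm:

```python
# Client-side ABR selection sketch: pick a rendition from throughput and
# buffer health. Ladder bitrates (kbps) are illustrative per-profile values.
LADDER = [
    ("360p", 700), ("480p", 1200), ("720p", 2800),
    ("1080p", 5000), ("4K HDR", 16000),
]

def choose_rendition(throughput_kbps: float, buffer_s: float) -> str:
    if buffer_s < 5:                      # buffer nearly empty: play it safe
        return LADDER[0][0]
    safe = throughput_kbps * 0.8          # keep 20% headroom for variance
    candidates = [name for name, kbps in LADDER if kbps <= safe]
    return candidates[-1] if candidates else LADDER[0][0]

print(choose_rendition(throughput_kbps=8000, buffer_s=20))  # 1080p
print(choose_rendition(throughput_kbps=8000, buffer_s=2))   # 360p
```

Because every one of those decisions executes on the viewer's device, the server-side cost of quality adaptation is zero regardless of concurrency.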
JioHotstar's AI-powered bitrate optimization achieves:
- 25% average bitrate reduction without compromising perceived quality
- 12% more watch time due to reduced buffering
- Proactive network condition prediction before rebuffering begins
Part 11: Cost Architecture - 15x Scale Without 15x the Bill
| Metric | Value |
|---|---|
| Cost per 1M viewers | ~$0.87–$0.92 |
| Budget variance | ~22% under budget |
| Spot instance discount | Up to 90% vs On-Demand |
Spot Instances were used for all stateless, fault-tolerant workloads: transcoding workers, telemetry processors, recommendation engines. Session-critical services ran on On-Demand or Reserved capacity.
Karpenter's bin-packing and consolidation continuously released underutilized nodes between matches, reducing running costs to near-zero between sessions.
Part 12: Multi-Language, Multi-Format - Serving Every Indian
India is not one market. It is 22 official languages, hundreds of dialects, and a spectrum from 2G feature phones in rural UP to 5G flagship devices in Bangalore.
Commentary was available in Hindi, English, Tamil, Telugu, Kannada, Malayalam, Bengali, Marathi, and more - each a separate audio track dynamically stitched into the HLS manifest at request time based on viewer preference.
JioHotstar simultaneously ran four distinct product experiences from the same underlying stream:
- Standard player
- Hype Mode (vertical video with real-time stat overlays)
- Multi-cam view
- Highlights scrubber
The platform also deployed low-latency chunked CMAF (Common Media Application Format) delivery at massive scale, achieving end-to-end delay of only a few seconds - crucial when millions of viewers are watching simultaneously with stadium audio bleeding through their windows.
Part 13: Graceful Degradation - Planning for What You Don't Plan For
In the event of unexpected traffic spikes beyond provisioned capacity, instead of showing a blank screen or error, the system pre-caches and serves static still images (scoreboard, static broadcast frame) as a temporary placeholder while the video pipeline catches up.
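The fallback pattern reduces to a try/except around segment delivery. Here the fetcher and asset names are illustrative stand-ins:

```python
# Static-fallback sketch: if fetching the next video segment fails, serve a
# pre-cached still (scoreboard frame) instead of an error page.
STATIC_FALLBACK = b"<scoreboard-frame>"   # pre-cached still image bytes

def fetch_segment(path: str) -> bytes:
    if path.endswith("broken.ts"):        # simulate an origin failure
        raise TimeoutError("origin timeout")
    return b"\x47" * 188                  # normal MPEG-TS segment bytes

def serve(path: str) -> bytes:
    try:
        return fetch_segment(path)
    except Exception:
        return STATIC_FALLBACK            # degrade, never show an error page

assert serve("seg_00042.ts") != STATIC_FALLBACK
assert serve("seg_broken.ts") == STATIC_FALLBACK
print("fallback path verified")
```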
The engineering philosophy is clear:
Protect the stream above everything else.
Key Takeaways for DevOps and SRE Engineers
1. Automate pre-warming and scale playbooks.
At 82 crore scale, there is no time for human intervention in the scaling loop.
2. Data-driven capacity planning beats gut feel every time.
Use past events to forecast. Validate with load tests. Revise upward after each knockout match.
3. Layered optimization covers every tier.
CDN edge → Kubernetes node pool → NAT gateway → database read replica. A bottleneck at any tier collapses the stack.
4. Managed services let teams focus on workloads, not infrastructure.
Moving from KOPS to EKS freed the platform team to focus on the microservices that actually differentiate their product.
5. Infrastructure as Code is non-negotiable at 800+ microservices.
Every load balancer, CDN config, autoscaling policy, and node pool declared in code, version-controlled in Git, deployed through CI/CD.
6. Observability is not optional.
CloudWatch + Prometheus + Grafana + documented runbooks + practiced responses. This is what separates platforms that survive scale from platforms that become post-mortems.
7. Plan for graceful failure, not just successful scale.
Feature flags as kill switches, static fallback images, circuit breakers - the difference between "lower quality for 30 seconds" and "error page for 82 crore people."
The Final Score
86,000 fans sang Vande Mataram inside the Narendra Modi Stadium as India lifted their third T20 World Cup. And 82.1 crore people watched it happen - simultaneously, on a single platform, without a single major outage, without viral complaints of buffering, and without the platform going down at the moment of the winning wicket.
India won on the field. JioHotstar won in the server room. Both victories were built the same way: with preparation, with execution under pressure, and with a team that had practiced for exactly this moment.
The next time you're tempted to skip the chaos drill or leave the pre-warming script manual, remember: someone at JioHotstar ran that drill at 2 AM so that 82 crore people could watch Bumrah take his 4th wicket on the smoothest stream of their lives.
Let's Discuss 💬
Have you worked on large-scale streaming infrastructure, CDN optimization, or SRE for real-time systems? What architectural choices did your team make differently - especially around multi-CDN routing, Kubernetes autoscaling, or observability at high concurrency?
Drop a comment below - I'd love to hear your experience. 👇