Originally published on pritamroy.com
Setting the Stage: What Actually Happened on March 8, 2026
Before we talk infrastructure, let's appreciate the scale of the event that stress-tested it.
India defeated New Zealand by 96 runs in the ICC Men's T20 World Cup 2026 Final at the Narendra Modi Stadium in Ahmedabad, posting a mammoth 255/5 - the highest total ever in a T20 World Cup final. India became the first team in history to retain their T20 World Cup title, and the first to win three T20 World Cup titles overall. The stadium held 86,000 roaring fans. Hundreds of millions watched on screens across every corner of India and the world.
And JioHotstar? It didn't just survive. It rewrote history.
The Numbers That Made Engineers Sweat (And Then Celebrate)
The concurrent viewership peaked at 82.1 crore simultaneous streams during the post-match presentation ceremony. Let that number sink in - 821 million streams at a single moment, from a single platform, from a single country.
Here's how the demand curve looked throughout the match:
| Moment | Concurrent Viewers |
|---|---|
| Ricky Martin's opening performance | 2.1 crore |
| At the toss | 4.2 crore |
| End of India's innings (255/5) | 43.9 crore |
| Innings break | 44.3 crore |
| New Zealand start chasing 255 | 49.9 crore |
| End of the 1st over of the chase | 50.3 crore |
| The moment the last wicket fell | 74.5 crore |
| Post-match presentation ceremony | 82.1 crore |
This is a textbook demand curve for any DevOps engineer to study - a slow warm-up, a steep mid-event ramp, and a vertical spike at the moment of maximum drama. Every system design decision JioHotstar made had to account for exactly this shape.
For context: the 2024 T20 WC Final peaked at just 5.3 crore on Disney+ Hotstar. In two years, they scaled peak concurrency by more than 15x. That is not an accident - that is an engineering masterclass.
They also came into the Final having broken a world record just days earlier. During the India vs England semi-final on March 5, JioHotstar recorded 65.2 million peak concurrent viewers - the highest concurrency ever achieved for a live event across any digital platform in the world. The Final obliterated even that number.
Part 1: The Foundation - Understanding JioHotstar's Architecture Origins
To understand the engineering decisions, you need to understand the entity first.
By late 2024, Reliance Industries (through Viacom18) and The Walt Disney Company announced an $8.5 billion joint venture called JioStar, combining Viacom18's media assets with Disney's Star India and Hotstar operations in India.
This merger gave the DevOps teams something rare: two battle-hardened streaming backends to draw lessons from. Disney+ Hotstar had years of cricket-at-scale experience, having served the 2023 ODI World Cup and multiple IPL seasons. JioCinema had cracked the 4K pipeline and aggressive CDN work. The 2026 World Cup was the first true test of whether the combined architecture could handle something neither had ever attempted alone.
Part 2: Pre-Tournament Planning - How You Prepare for an 82-Crore Spike
In DevOps, you never wait for production to find your limits. JioHotstar's SRE teams began capacity planning months before the first ball was bowled.
Traffic Forecasting Using Historical Data
The SRE teams forecast traffic using a predictive model trained on data from previous major streaming events: the 2024 T20 WC Final (5.3 crore), the 2023 ODI WC Final (5.9 crore), Asia Cup peaks, and IPL finals. Engineers built regression models accounting for factors like: is India playing? What stage of the tournament is it? What time of day? What are the network conditions across India's diverse geography?
The key insight: the Final was always going to be the largest event, and the models needed to be revised upward after each knockout match. After the semi-final had already set a world record at 65.2 million concurrent, the capacity plan for the Final had to be re-evaluated entirely.
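As a sketch of what revision-aware forecasting might look like - with hypothetical figures and a deliberately simple log-linear trend, not JioHotstar's actual regression model - consider:

```python
# Toy capacity forecast: fit log(peak) = a + b * t by least squares over past
# event peaks, project the next event, and pad with provisioning headroom.
# HISTORY values are illustrative, not real JioHotstar data.
import math

# (event index, peak concurrency in arbitrary units) - hypothetical history
HISTORY = [(1, 5.3), (2, 5.9), (3, 12.0), (4, 26.0)]

def fit_log_linear(points):
    """Least-squares fit of log(y) = a + b * x."""
    n = len(points)
    xs = [x for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (yv - my) for x, yv in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b

def forecast_peak(points, next_x, headroom=1.3):
    """Project the next event's peak, padded 30% for safe provisioning."""
    a, b = fit_log_linear(points)
    return math.exp(a + b * next_x) * headroom

plan = forecast_peak(HISTORY, next_x=5)
print(f"provision for ~{plan:.1f} units of peak concurrency")
```

Re-running the fit after every knockout match - exactly the "revise upward" discipline described above - is what keeps the provisioning target ahead of the demand curve.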
Load Testing at Scale - Project HULK
JioHotstar created an in-house project called "Project HULK" specifically to stress-test their platform before major events. The load generation infrastructure used c5.9xlarge machines distributed across 8 different AWS regions to simultaneously hit the CDN, load balancers, and application layers.
The reason for distributing across 8 regions is subtle but important: cloud providers share underlying physical infrastructure. A massive synthetic load originating from a single region could inadvertently impact other customers co-located on the same hardware. By spreading synthetic load across regions, you simulate a real-world distributed user base while being a responsible cloud tenant.
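A minimal sketch of a region-sharded load generator in the spirit of Project HULK - the region names are hypothetical and `probe` is a stub standing in for a real HTTP request against the CDN and load balancers:

```python
# Region-sharded synthetic load generator sketch. Each "shard" fires its
# requests concurrently, approximating a geographically distributed user base.
import asyncio
import random

REGIONS = ["region-a", "region-b", "region-c"]  # hypothetical shard list

async def probe(region: str) -> float:
    """Stand-in for one synthetic request; returns a simulated latency (ms)."""
    await asyncio.sleep(0)            # yield, as a real network call would
    return random.uniform(20.0, 120.0)

async def run_shard(region: str, requests: int) -> dict:
    latencies = sorted([await probe(region) for _ in range(requests)])
    return {"region": region, "p50_ms": latencies[requests // 2]}

async def fire_all(requests_per_shard: int = 100) -> list[dict]:
    # All regional shards run concurrently, like HULK's distributed fleet
    return await asyncio.gather(
        *(run_shard(r, requests_per_shard) for r in REGIONS)
    )

results = asyncio.run(fire_all())
for shard in results:
    print(shard["region"], f"{shard['p50_ms']:.1f} ms p50")
```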
Pre-warming: The Underrated Hero
Every time a major cricket match was about to begin under the old architecture, the operations team had to manually pre-warm hundreds of load balancers. In the new architecture, this process was fully automated.
But the discipline of pre-warming remained: before the Ricky Martin opening performance even started, JioHotstar's edge nodes, CDN caches, and application clusters were already scaled up and warm. Pre-warming CDN caches with the stream's initial HLS segments, spinning up Kubernetes node pools ahead of anticipated demand, pre-populating authentication session caches - all of this is part of the playbook.
You don't wait for traffic to arrive. You meet it at the door.
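The shape of a cache pre-warm job is simple enough to sketch. Here the edge cache is an in-memory dict and the origin fetch is a stub - in a real job both sides would be HTTP calls against CDN PoPs, and the paths are illustrative:

```python
# CDN pre-warm sketch: fetch a stream's first HLS segments so the edge cache
# is hot before viewers arrive. Paths and the fetcher are illustrative stubs.
SEGMENTS = [f"live/final/seg_{i:05d}.ts" for i in range(5)]  # first few chunks

edge_cache: dict[str, bytes] = {}  # stand-in for a CDN edge PoP cache

def fetch_from_origin(path: str) -> bytes:
    return b"\x47" * 188  # stub: a real segment is MPEG-TS packets (0x47 sync)

def prewarm(paths) -> int:
    for p in paths:
        if p not in edge_cache:          # idempotent: safe to re-run
            edge_cache[p] = fetch_from_origin(p)
    return len(edge_cache)

warmed = prewarm(SEGMENTS)
print(f"warmed {warmed} segments")
```

The idempotency matters: an automated pre-warm pipeline may be triggered repeatedly as match start approaches, and re-runs must not hammer the origin.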
Part 3: The Kubernetes Architecture - DataCenter Abstraction
This is the most significant architectural evolution in JioHotstar's history, and the one with the most lessons for any platform engineering team.
The Old World
Previously, Hotstar managed its workloads on two large, self-managed Kubernetes clusters built using KOPS (Kubernetes Operations), running 800+ microservices across them. Every microservice had its own AWS Application Load Balancer (ALB) using NodePort services.
The request flow looked like this:
```
Client → CDN → ALB → NodePort → kube-proxy → Pod
```
The problems were multiple. Hundreds of ALBs needed to be manually pre-warmed before every major match - an error-prone, time-consuming process. The old Cluster Autoscaler was too slow to release or consolidate nodes efficiently during off-peak periods. And scaling beyond 400 nodes simultaneously caused API server throttling - a hard ceiling on their peak capacity.
The New Model: DataCenter Abstraction
The new model introduced a concept called DataCenter Abstraction. A "data center" in this model doesn't refer to a physical building - it's a logical grouping of multiple Kubernetes clusters within a specific region. Together, these clusters behave like a single large compute unit, with each application team given a single logical namespace.
What this means in practice for the World Cup Final:
- JioHotstar could treat its AWS infrastructure across Mumbai, Hyderabad, and Delhi as a single logical pool
- A central Envoy proxy replaced hundreds of individual ALBs, unifying traffic routing, authentication, and rate-limiting in one place
- Services moved from NodePort to ClusterIP + ALB Ingress, eliminating hard port limits
- Developers deploy one YAML manifest per service; the platform handles failover and routing behind the scenes
They also migrated from self-managed KOPS clusters to Amazon EKS, offloading Kubernetes control plane management to AWS. Combined with Karpenter, nodes now provision in seconds rather than minutes - critical when viewership goes from 44 crore to 74 crore in the final 4 overs of a chase.
```yaml
# Karpenter NodePool - simplified example
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: live-streaming-pool
spec:
  template:
    spec:
      requirements:
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["c6i.8xlarge", "c6g.8xlarge", "c5.9xlarge"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand", "spot"]
        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["ap-south-1a", "ap-south-1b", "ap-south-1c"]
      kubelet:
        maxPods: 110
  limits:
    cpu: "8000"
    memory: "16Ti"
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h
```
The `capacity-type` requirement includes both `on-demand` and `spot`, meaning Karpenter intelligently places stateless, fault-tolerant workloads on cheaper Spot instances while keeping critical session services on On-Demand. The `consolidationPolicy: WhenUnderutilized` setting ensures underused nodes are promptly released during the innings break, saving cost in real time.
IP Address Management - A Lesson from 2023
A critical incident during the 2023 World Cup involved running out of IP addresses. The VPC CNI plugin's WARM_IP_TARGET and MINIMUM_IP_TARGET settings were over-allocating IPs per node. For 2026, engineers used larger CIDR blocks (/18 instead of /20) and fine-tuned these settings, allowing clusters to scale beyond 400 nodes without hitting IP exhaustion.
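The arithmetic behind the CIDR change is worth making explicit. This back-of-envelope check uses an illustrative per-node IP budget (the real figure depends on the tuned `WARM_IP_TARGET`/`MINIMUM_IP_TARGET` values, which aren't public):

```python
# Why CIDR sizing matters: usable IPs per subnet vs. achievable node count.
# AWS reserves 5 addresses in every subnet; IPS_PER_NODE is hypothetical.
def usable_ips(prefix: int) -> int:
    return 2 ** (32 - prefix) - 5   # AWS reserves 5 IPs per subnet

IPS_PER_NODE = 30  # hypothetical: node IP plus the VPC CNI's warm IP pool

for prefix in (20, 18):
    cap = usable_ips(prefix) // IPS_PER_NODE
    print(f"/{prefix}: {usable_ips(prefix)} usable IPs -> ~{cap} nodes")
```

Under these assumptions a /20 caps out well below the 400-node scaling ceiling mentioned earlier, while a /18 clears it comfortably - which is exactly the failure mode the 2023 incident exposed.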
Part 4: Infrastructure Scaling - Eliminating Every Bottleneck
Kubernetes architecture is only part of the picture. The network infrastructure underneath also needed surgery.
NAT Gateway Scaling
Monitoring with VPC Flow Logs surfaced a frightening finding during a pre-tournament load test: a single Kubernetes cluster was consuming 50% of its NAT Gateway throughput at just 10% of expected peak load. At full Final traffic, this would have been a catastrophic bottleneck.
The fix: scale out from one NAT Gateway per Availability Zone to one NAT Gateway per subnet. This distributed the external traffic load evenly and eliminated the pressure point entirely.
Worker Node Network Optimization
Load tests showed that internal API Gateway pods were consuming 8–9 Gbps of network bandwidth on individual nodes, causing severe contention with other services.
Two fixes were implemented in parallel:
- Deploy high-throughput nodes with a minimum capacity of 10 Gbps for API Gateway workloads
- Use Kubernetes topology spread constraints to ensure only one API Gateway pod runs per node
```yaml
# Topology spread constraint for API Gateway pods
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: api-gateway
```
This constraint ensures Kubernetes never schedules two API Gateway pods on the same physical node. The result: throughput stabilized at 2–3 Gbps per node even at peak, rather than saturating at 8–9 Gbps on a few overloaded nodes.
Part 5: The Video Pipeline - From Camera to 82 Crore Phones in Under 5 Seconds
Most people think of streaming as "just sending video." For a live match at this scale, it is an extraordinarily intricate real-time data pipeline with multiple stages, each completing in sub-second timeframes.
Stage 1 - Ingestion: Getting the Feed from the Ground
At the Narendra Modi Stadium, production crews captured the match using multiple HD and 4K cameras. The raw feed travels via dedicated broadcast fiber links using the SRT (Secure Reliable Transport) protocol. SRT can recover from roughly 20% packet loss - resilience the older RTMP protocol cannot match, and critical given India's network variability.
Stage 2 - Transcoding: One Feed, 100 Million Devices
Raw feeds hit AWS Elemental MediaLive on p4d.24xlarge GPU instances, transcoding multiple adaptive renditions in under 2 seconds. A single 4K broadcast feed is simultaneously converted into:
| Profile | Target Audience |
|---|---|
| 360p | 2G/3G users in rural India |
| 480p | Moderate connections |
| 720p | Standard HD |
| 1080p | Good broadband |
| 4K HDR | Premium fiber/5G subscribers |
The 2026 World Cup featured true 4K HDR streaming - not upscaled 1080p - at genuinely high bitrates. Every rendition is generated in real time, in parallel, with sub-2-second latency.
Stage 3 - Packaging: HLS, DASH, and DRM
AWS MediaPackage segments outputs into HLS/DASH chunks at over 100,000 chunks per second, applies DRM encryption through Widevine and PlayReady, and dynamically adds captions and regional subtitles. MediaPackage does just-in-time packaging - eliminating the need to pre-generate format-specific segments for every device type.
Stage 4 - Storage and Delivery
Amazon S3 Intelligent-Tiering stores HLS/DASH chunks with multi-AZ replication. CloudFront delivers them via 300+ edge locations worldwide. Live stream segments are accessed billions of times in their first few seconds and then almost never again - S3 Intelligent-Tiering handles this access pattern perfectly, automatically reducing storage costs.
Part 6: The CDN Layer - The True Workhorse of 82 Crore Streams
If the video pipeline is the heart, the CDN is the circulatory system. No single origin server can serve 82 crore simultaneous streams.
Multi-CDN Strategy
JioHotstar employs a multi-CDN strategy with an in-house CDN load optimizer that dynamically chooses between Akamai, CloudFront, and others, always routing viewers through the least congested path. If one CDN faces an issue, another picks up the slack - completely transparent to the viewer.
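A toy version of such a CDN load optimizer: route each playback session to the healthiest provider by a score built from live probe telemetry. The provider names, metrics, and scoring weights below are illustrative, not JioHotstar's real algorithm:

```python
# Multi-CDN selection sketch: lower score wins. The weights blend latency,
# error rate, and utilization into a single congestion signal (illustrative).
def score(cdn: dict) -> float:
    return cdn["p95_ms"] * 0.5 + cdn["error_rate"] * 5000 + cdn["util"] * 100

def pick_cdn(cdns: list[dict]) -> str:
    return min(cdns, key=score)["name"]

telemetry = [
    {"name": "cdn-a", "p95_ms": 180, "error_rate": 0.001, "util": 0.92},
    {"name": "cdn-b", "p95_ms": 140, "error_rate": 0.002, "util": 0.60},
    {"name": "cdn-c", "p95_ms": 300, "error_rate": 0.030, "util": 0.40},
]
print(pick_cdn(telemetry))  # → cdn-b
```

Because the decision runs per session against fresh telemetry, a degrading provider bleeds traffic automatically - the "another picks up the slack" behavior, with no operator in the loop.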
Traffic Segregation
| Traffic Type | Routing Strategy |
|---|---|
| Cacheable (scorecards, stats, highlights) | Dedicated CDN domain, aggressive cache TTLs |
| Non-cacheable (sessions, personalization) | Separate routing path, correctness-first |
| Non-video (images, metadata) | Cost-efficient CDN providers |
This segregation preserves high-performance CDN capacity specifically for video segment delivery.
The Jio Network Advantage: A Moat No Competitor Can Copy
JioHotstar is part of a company that also owns the physical network delivering the stream. Jio's 5G network works with Jio's own Mobile Edge Computing (MEC) servers, placing compute resources physically inside the telecom network - at the base station layer - rather than in a distant cloud data center.
For 500 million+ Jio subscribers, the World Cup Final was served from their own carrier's edge - a fundamentally different and faster delivery path than what any competitor can offer.
Part 7: Microservices at Scale - 800+ Services Serving One Match
The microservices architecture means video playback, authentication, personalization, live chat, multilingual commentary routing, payment processing, and analytics are all independent services. This isolation is critical: if the live emoji reaction feature crashes during Bumrah's 4th wicket, it should crash without affecting the video stream.
Feature Flags: The Safety Net
Feature flags allow gradual rollout and instant kill-switches without any deployment. In a worst-case scenario - say, a memory leak in the live chat microservice - engineers flip a single flag to disable chat for all users, immediately reducing load without any restart or deployment.
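The kill-switch pattern is easy to sketch. Here the flag store is an in-process dict; real systems back it with a config service so a flip propagates to every pod in seconds. All names are illustrative:

```python
# Feature-flag kill switch sketch: gating a non-critical feature (chat)
# while the critical path (video) is never gated.
FLAGS = {"live_chat": True, "emoji_reactions": True}

def flag_enabled(name: str) -> bool:
    return FLAGS.get(name, False)   # default-off: unknown flags stay dark

def handle_request(user_id: int) -> dict:
    response = {"video": "stream-url"}     # the stream is never gated
    if flag_enabled("live_chat"):          # chat only if the flag is on
        response["chat"] = f"chat-shard-{user_id % 64}"
    return response

FLAGS["live_chat"] = False    # ops flips one flag: no deploy, no restart
assert "chat" not in handle_request(42)
print("chat disabled, stream unaffected:", handle_request(42))
```

Note the default-off behavior for unknown flags: a typo in a flag name fails safe rather than accidentally enabling a feature.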
The Kafka and Flink Real-Time Pipeline
Every viewer generates continuous telemetry events. At 82 crore concurrent users, this is billions of messages per second.
- Apache Kafka - distributed, fault-tolerant message queue absorbing event bursts
- Apache Flink - real-time processing for dashboards, anomaly detection, and adaptive algorithms
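The core computation a Flink job performs on this firehose is windowed aggregation. A toy tumbling-window version in plain Python - the event shape is hypothetical, and a real Flink job would also handle out-of-order events and watermarks:

```python
# Tumbling-window aggregation sketch: count telemetry events per event type
# in fixed 10-second windows, the kind of rollup that feeds live dashboards.
from collections import defaultdict

def tumbling_window_counts(events, window_s=10):
    """Group (timestamp, event_type) pairs into fixed-size windows."""
    windows: dict[tuple[int, str], int] = defaultdict(int)
    for ts, kind in events:
        windows[(ts // window_s, kind)] += 1
    return dict(windows)

events = [(1, "rebuffer"), (4, "play"), (11, "rebuffer"), (12, "rebuffer")]
counts = tumbling_window_counts(events)
print(counts)  # {(0, 'rebuffer'): 1, (0, 'play'): 1, (1, 'rebuffer'): 2}
```

A spike in the `rebuffer` count for the latest window is precisely the anomaly signal that would page the war room.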
Part 8: Observability - The SRE War Room During the Final
The monitoring stack ran three layers simultaneously:
| Tool | Purpose |
|---|---|
| AWS CloudWatch | Infrastructure metrics (EC2 CPU, RDS connections, NAT throughput) |
| Prometheus | Application-level and custom business metrics |
| Grafana | Real-time visualization - latency, throughput, rebuffer trends |
The single most important metric: rebuffer rate - the percentage of viewers experiencing playback interruption.
```
# Prometheus alert rule for rebuffer rate
sum(rate(media_rebuffer_events[5m])) / sum(rate(media_play_time[5m])) > 0.004
```
At 82 crore viewers, 0.4% means roughly 32.8 lakh people buffering simultaneously - an unacceptable outcome. Every metric had an automated alert. Every alert had a documented runbook. Every runbook had been practiced.
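The arithmetic behind that threshold can be checked directly:

```python
# Working the 0.4% alert boundary out explicitly (pure arithmetic).
VIEWERS = 82.1e7        # 82.1 crore concurrent viewers (1 crore = 10 million)
THRESHOLD = 0.004       # the 0.4% rebuffer-rate alert boundary

affected = VIEWERS * THRESHOLD
print(f"{affected / 100_000:.1f} lakh viewers buffering")  # 32.8 lakh
```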
Chaos Engineering: Breaking Things Before Match Day
Before major events, JioHotstar's teams ran chaos drills at 2 AM:
- Deliberately killing an entire Availability Zone
- Simulating a CDN provider outage
- Injecting latency into the authentication service
- Validating automated failover and recovery
Good SRE teams don't wait for production failures - they engineer them deliberately.
Part 9: Caching Strategy - Keeping 82 Crore Sessions Alive
The solution is an aggressive multi-layer caching hierarchy:
Layer 1 - CDN Edge Cache
The video segment cached at the CDN. If served from a CloudFront edge PoP, JioHotstar's origin never sees that request at all. This is the most important cache hit in the entire system.
Layer 2 - Application-Level Redis Cache
User session tokens and subscription entitlements cached in Redis clusters. Subscription verified once at playback start, cached for the match duration. Subsequent requests bypass the database entirely.
Layer 3 - Database Read Replicas
Multiple read replicas spread across AZs serve preferences and recommendation data. Write traffic goes only to the primary.
A well-designed caching layer means 82 crore viewers might generate fewer database queries than 5 lakh viewers on a poorly designed system.
Part 10: Adaptive Bitrate and AI Optimization - Client Intelligence at Scale
The ABR player constantly measures download speed, buffer health, and network latency - running entirely on the client side. For 82 crore simultaneous viewers, even a 1ms server-side computation per quality decision would be catastrophic: roughly 820,000 CPU-seconds - more than nine days of compute - per decision cycle.
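A stripped-down version of the decision each client makes locally. The ladder bitrates below are illustrative values matching the rendition profiles, and the heuristic (highest rendition fitting within 80% of measured throughput, dropping to the floor when the buffer runs low) is a common textbook rule, not JioHotstar's actual algorithm:

```python
# Client-side ABR selection sketch: pick a rendition from throughput and
# buffer health. Ladder bitrates (kbps) are illustrative per-profile values.
LADDER = [
    ("360p", 700), ("480p", 1200), ("720p", 2800),
    ("1080p", 5000), ("4K HDR", 16000),
]

def choose_rendition(throughput_kbps: float, buffer_s: float) -> str:
    if buffer_s < 5:                      # buffer nearly empty: play it safe
        return LADDER[0][0]
    safe = throughput_kbps * 0.8          # keep 20% headroom for variance
    candidates = [name for name, kbps in LADDER if kbps <= safe]
    return candidates[-1] if candidates else LADDER[0][0]

print(choose_rendition(throughput_kbps=8000, buffer_s=20))  # 1080p
print(choose_rendition(throughput_kbps=8000, buffer_s=2))   # 360p
```

Because every one of those decisions executes on the viewer's device, the server-side cost of quality adaptation is zero regardless of concurrency.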
JioHotstar's AI-powered bitrate optimization achieves:
- 25% average bitrate reduction without compromising perceived quality
- 12% more watch time due to reduced buffering
- Proactive network condition prediction before rebuffering begins
Part 11: Cost Architecture - 15x Scale Without 15x the Bill
| Metric | Value |
|---|---|
| Cost per 1M viewers | ~$0.87–$0.92 |
| Budget variance | ~22% under budget |
| Spot instance discount | Up to 90% vs On-Demand |
Spot Instances were used for all stateless, fault-tolerant workloads: transcoding workers, telemetry processors, recommendation engines. Session-critical services ran on On-Demand or Reserved capacity.
Karpenter's bin-packing and consolidation continuously released underutilized nodes between matches, reducing running costs to near-zero between sessions.
Part 12: Multi-Language, Multi-Format - Serving Every Indian
India is not one market. It is 22 official languages, hundreds of dialects, and a spectrum from 2G feature phones in rural UP to 5G flagship devices in Bangalore.
Commentary was available in Hindi, English, Tamil, Telugu, Kannada, Malayalam, Bengali, Marathi, and more - each a separate audio track dynamically stitched into the HLS manifest at request time based on viewer preference.
JioHotstar simultaneously ran four distinct product experiences from the same underlying stream:
- Standard player
- Hype Mode (vertical video with real-time stat overlays)
- Multi-cam view
- Highlights scrubber
The platform also deployed low-latency chunked CMAF (Common Media Application Format) delivery at massive scale, achieving end-to-end delay of only a few seconds - crucial when millions of viewers are watching simultaneously with stadium audio bleeding through their windows.
Part 13: Graceful Degradation - Planning for What You Don't Plan For
In the event of unexpected traffic spikes beyond provisioned capacity, instead of showing a blank screen or error, the system pre-caches and serves static still images (scoreboard, static broadcast frame) as a temporary placeholder while the video pipeline catches up.
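The fallback pattern reduces to a try/except around segment delivery. Here the fetcher and asset names are illustrative stand-ins:

```python
# Static-fallback sketch: if fetching the next video segment fails, serve a
# pre-cached still (scoreboard frame) instead of an error page.
STATIC_FALLBACK = b"<scoreboard-frame>"   # pre-cached still image bytes

def fetch_segment(path: str) -> bytes:
    if path.endswith("broken.ts"):        # simulate an origin failure
        raise TimeoutError("origin timeout")
    return b"\x47" * 188                  # normal MPEG-TS segment bytes

def serve(path: str) -> bytes:
    try:
        return fetch_segment(path)
    except Exception:
        return STATIC_FALLBACK            # degrade, never show an error page

assert serve("seg_00042.ts") != STATIC_FALLBACK
assert serve("seg_broken.ts") == STATIC_FALLBACK
print("fallback path verified")
```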
The engineering philosophy is clear:
Protect the stream above everything else.
Key Takeaways for DevOps and SRE Engineers
1. Automate pre-warming and scale playbooks.
At 82 crore scale, there is no time for human intervention in the scaling loop.
2. Data-driven capacity planning beats gut feel every time.
Use past events to forecast. Validate with load tests. Revise upward after each knockout match.
3. Layered optimization covers every tier.
CDN edge → Kubernetes node pool → NAT gateway → database read replica. A bottleneck at any tier collapses the stack.
4. Managed services let teams focus on workloads, not infrastructure.
Moving from KOPS to EKS freed the platform team to focus on the microservices that actually differentiate their product.
5. Infrastructure as Code is non-negotiable at 800+ microservices.
Every load balancer, CDN config, autoscaling policy, and node pool declared in code, version-controlled in Git, deployed through CI/CD.
6. Observability is not optional.
CloudWatch + Prometheus + Grafana + documented runbooks + practiced responses. This is what separates platforms that survive scale from platforms that become post-mortems.
7. Plan for graceful failure, not just successful scale.
Feature flags as kill switches, static fallback images, circuit breakers - the difference between "lower quality for 30 seconds" and "error page for 82 crore people."
The Final Score
86,000 fans sang Vande Mataram inside the Narendra Modi Stadium as India lifted their third T20 World Cup. And 82.1 crore people watched it happen - simultaneously, on a single platform, without a single major outage, without viral complaints of buffering, and without the platform going down at the moment of the winning wicket.
India won on the field. JioHotstar won in the server room. Both victories were built the same way: with preparation, with execution under pressure, and with a team that had practiced for exactly this moment.
The next time you're tempted to skip the chaos drill or leave the pre-warming script manual, remember: someone at JioHotstar ran that drill at 2 AM so that 82 crore people could watch Bumrah take his 4th wicket on the smoothest stream of their lives.
Let's Discuss 💬
Have you worked on large-scale streaming infrastructure, CDN optimization, or SRE for real-time systems? What architectural choices did your team make differently - especially around multi-CDN routing, Kubernetes autoscaling, or observability at high concurrency?
Drop a comment below - I'd love to hear your experience. 👇