Hotstar · Live Streaming · 17 May 2026
July 9th, 2019. India vs New Zealand, Cricket World Cup semi-final. MS Dhoni walks to the crease and 1.1 million new viewers join Hotstar every single minute. Then he gets run out — and 24 million people hit the back button almost simultaneously.
- 25.3M peak concurrent
- 1.1M users/min growth
- 5.7 Tbps bandwidth
- 1M requests/sec
- 108,000 CPU test rig
- <90s scale reaction
The Story
🏏
At the peak of the 2019 ICC World Cup semi-final between India and New Zealand, 25.3 million people were simultaneously streaming live on Hotstar — a global record for any streaming platform. The platform consumed 5.7 Tbps of bandwidth: roughly 70% of India's total internet capacity at the time.
Cricket is not just a sport in India — it is synchronized national emotion. When MS Dhoni walks to the crease, tens of millions of people who had given up on the match suddenly reconsider. They reach for their phones. They open Hotstar. At one point during the semi-final, viewership was growing at 1.1 million users per minute — a rate that would overwhelm most cloud architectures before the first over was complete. Hotstar's engineering team had spent months anticipating exactly this moment, building an entirely custom infrastructure stack designed around one insight: the biggest danger in live sports streaming is not the peak load itself, but the speed of arrival at that peak.
THE DHONI PROBLEM
The growth rate was not the hardest engineering challenge — it was the drop. When Dhoni was run out, Hotstar went from 25.3 million concurrent viewers to under 1 million in minutes. Millions of users hit the back button simultaneously. They didn't leave the app — they returned to the homepage, where personalization, recommendation, and content-discovery APIs were suddenly hammered by a traffic tsunami that had nothing to do with video streaming. A system built only for video delivery would have collapsed on the exit.
The platform's architecture in 2019 ran on AWS EC2 instances with Akamai as the primary CDN (Content Delivery Network — a distributed system of edge servers that caches and delivers content from locations close to the user, reducing latency and offloading origin servers), backed by Apache Kafka for real-time event streaming and Apache Flink for stream processing. Hotstar had already made the strategic shift to microservices (an architecture where an application is built as a collection of small, independently deployable services, each responsible for a specific function), migrating from a monolith in 2018, which allowed individual services to be scaled independently. But AWS's native Auto Scaling Groups (AWS's built-in mechanism for automatically adding or removing EC2 instances based on metrics like CPU usage) had a fundamental problem: when you need to go from 1.5 million to 15 million concurrent viewers in under ten minutes, sequential instance provisioning isn't fast enough.
Problem
The Speed of Arrival
During the IPL Final and World Cup, Hotstar's traffic grew at +500,000 concurrent users per minute during peak moments — triggered by push notifications sent by the marketing team when the match became exciting. AWS native auto-scaling groups couldn't provision new EC2 instances fast enough: they experienced insufficient capacity errors in specific availability zones, and their step-size mechanism added nodes in fixed batches rather than responding to the rate of change.
Cause
AWS ASG's Three Fatal Flaws for Live Events
The first flaw was Availability Zone imbalance. (An Availability Zone is one or more physically separate data centers within an AWS region, designed to be independent of failures in other zones.) Adding capacity to one zone could leave another undersupplied. The second was ASG step size: adding a fixed number of instances per scaling action couldn't keep up with growth of 1.1M users per minute. The third was insufficient capacity errors in a zone at peak, which were unrecoverable: there was simply no spare on-demand inventory to allocate during a national cricket final.
Solution
Pre-warming + Custom Scaling Engine
Hotstar stopped relying on AWS ASG's reactive scaling for live events. Instead, they pre-warmed the full expected infrastructure before each match based on Project HULK predictions, maintaining a 2-million-user capacity buffer at all times. A custom internal scaling engine, driven by request rate and concurrency rather than CPU, could spin up new capacity in under 90 seconds. A secondary ASG was kept on standby with a different instance-type mix, as a fallback if the primary cluster hit hard limits.
Result
25.3 Million — Zero Downtime
The IND vs NZ semi-final became the largest concurrent live stream in history at the time: 25.3 million viewers, zero reported downtime. The platform handled both the spike to peak and the catastrophic drop when Dhoni got out — the homepage API layer, pre-scaled and tested via chaos engineering, absorbed the sudden exit traffic without incident. Hotstar's engineering approach became a canonical reference talk for scaling live event infrastructure on AWS.
The most consequential engineering decision Hotstar made was building Project HULK, an internal load-testing platform with a footprint bigger than most companies' production environments. At full scale, Project HULK deployed 108,000 CPUs, 216 TB of RAM, and 200 Gbps of outbound network across 8 AWS regions, running geo-distributed load generators to simulate realistic user journeys. It performed four categories of tests: load generation to establish baseline capacity; Tsunami tests that simulated the sudden spike-and-drop profile of a Dhoni innings; chaos engineering (the practice of deliberately injecting failures into a system to discover weaknesses before they manifest in production) to test resilience when an availability zone went down; and ML-driven traffic pattern modelling to predict load curves for upcoming events. The insight that emerged was that Hotstar needed to play the match at full scale in rehearsal before facing it in production.
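As a rough sanity check on those footprint numbers, the sketch below assumes, purely for illustration, that the entire rig was built from c5.9xlarge instances (36 vCPUs and 72 GB of RAM each, the instance type the article names for the load generation cluster). The real fleet mix is not stated, so the instance count is an estimate, not a reported figure.

```python
# Back-of-envelope check of the Project HULK footprint.
# Assumption: every vCPU comes from a c5.9xlarge (36 vCPUs, 72 GB RAM).
VCPUS_PER_INSTANCE = 36
RAM_GB_PER_INSTANCE = 72


def hulk_footprint(total_vcpus: int) -> tuple[int, float]:
    """Return (instance_count, total_ram_tb) for a rig of identical instances."""
    instances = total_vcpus // VCPUS_PER_INSTANCE
    ram_tb = instances * RAM_GB_PER_INSTANCE / 1000
    return instances, ram_tb


instances, ram_tb = hulk_footprint(108_000)
print(f"{instances} instances, {ram_tb} TB RAM")
```

Under that assumption the 108,000 CPU and 216 TB figures are mutually consistent: both correspond to the same number of identical instances.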
ℹ️
Bandwidth at the Edge of India's Internet
At 25.3 million concurrent users, Hotstar's bandwidth consumption hit 5.7 Tbps — approximately 70% of India's total available internet capacity at the time. This is not a platform metric; it is a national infrastructure metric. Hotstar engineers were not just managing application scale, they were managing demand at the level of physical network capacity across an entire country.
Race against time: +500K growth rate per minute concurrency. Fully baked AMIs: 4 minutes. Application boot-up time: 90 seconds. Reaction time: push notifications.
Gaurav Kamboj, Cloud Architect at Hotstar (AWS Community Day Bengaluru 2018)
⚠️
The Push Notification Trap
The marketing team's push notifications — sent to bring users back to the app during exciting match moments — were a hidden traffic generator that engineering had no advance warning of. Every notification blast created an immediate, synchronized spike in both video and API traffic. Hotstar had to build a feedback loop where marketing intent was translated into infrastructure capacity decisions before the notification was sent, not after.
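One way to picture that feedback loop is a capacity gate that runs before a campaign is sent. The sketch below is hypothetical: the `Campaign` fields, the expected open rate, and the `notification_decision` function are illustrative names and inputs, not Hotstar's actual tooling. Only the 2-million-user buffer comes from the article.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    audience_size: int         # users who will receive the push
    expected_open_rate: float  # assumed fraction who open the app shortly after


def expected_returning_users(campaign: Campaign) -> int:
    """Estimate how many users the notification brings back."""
    return round(campaign.audience_size * campaign.expected_open_rate)


def notification_decision(campaign: Campaign, current_concurrency: int,
                          provisioned_capacity: int,
                          buffer: int = 2_000_000) -> tuple[str, int]:
    """Return ('send', 0) or ('scale_first', extra_capacity_needed)."""
    projected = current_concurrency + expected_returning_users(campaign)
    shortfall = projected + buffer - provisioned_capacity
    if shortfall <= 0:
        return ("send", 0)
    return ("scale_first", shortfall)


blast = Campaign(audience_size=20_000_000, expected_open_rate=0.15)
print(notification_decision(blast, 8_000_000, 12_000_000))
```

In this example the blast is held back until another 1,000,000 users' worth of capacity is online, which is the "before, not after" ordering described above.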
🔄
The Graceful Degradation Contract
When a service hit its capacity limits, Hotstar implemented panic mode (a defined degradation state where non-essential services are deliberately disabled to preserve capacity for the most critical user path — live video delivery). Recommendations, personalization, and social features were the first to shed load. Video streaming was always the last service standing. The contract was explicit: degrade gracefully rather than fail catastrophically.
The Infradashboard — Hotstar's internal capacity planning tool — gave the operations team a real-time view of infrastructure headroom and allowed proactive scale-up decisions hours before a match began. The team maintained a permanent buffer of 2 million concurrent user capacity above current load at all times during live events. Because AWS ASG couldn't add new nodes fast enough during the actual match, the buffer had to already exist before the first ball was bowled. The combination of pre-warming, buffer maintenance, and custom scaling logic turned a reactive system into a predictive one — and it held at 25.3 million.
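The arithmetic behind the buffer can be sketched in a few lines. The calculation below uses figures quoted in the article (1.1 million users per minute, a 90-second reaction time, a lag of roughly 4 minutes for cold provisioning, and a 2-million-user buffer); treating growth as constant over the reaction window is a simplifying assumption.

```python
def headroom_consumed(growth_per_minute: int, reaction_seconds: int) -> int:
    """Users who arrive while new capacity is still coming online."""
    return growth_per_minute * reaction_seconds // 60


BUFFER = 2_000_000       # standing headroom, in concurrent users
GROWTH = 1_100_000       # users per minute at the steepest point

for label, seconds in [("custom engine, 90 s", 90), ("cold provisioning, 4 min", 240)]:
    used = headroom_consumed(GROWTH, seconds)
    print(f"{label}: {used:,} arrivals, buffer holds: {used <= BUFFER}")
```

With a 90-second reaction time the buffer absorbs the arrivals; with a 4-minute lag the same growth rate would need more than twice the buffer.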
✅
Record After Record — The Architecture That Kept Scaling
The 2019 record of 25.3 million concurrent viewers was the first proof of concept. By 2023, Hotstar hit 59 million during the Cricket World Cup final. In 2025, the ICC Champions Trophy Final drew 61 million simultaneous streams — more than the entire population of Italy, watching live on a single app. Each generation of the architecture was built directly on the engineering lessons of the one before.
The Fix
How Hotstar Replaced AWS Autoscaling with a Custom Scaling Engine
AWS Auto Scaling Groups were built for the average web application: traffic grows gradually, CPU rises, new instances are added over minutes. Live sports streaming is the opposite. Traffic can double in 90 seconds when MS Dhoni appears on screen. The AWS ASG step-size mechanism — adding instances in fixed batches — was simply too slow for this pattern. Hotstar's engineering team identified three specific failure modes in ASG behavior during live events: insufficient capacity errors (AWS returning an error when the requested EC2 instance type has no available inventory in a specific availability zone at that moment) during peak demand, availability zone skew (a condition where one AZ accumulates more load than others, exhausting its capacity while other AZs still have headroom) when scaling across AZs, and the lag between a scaling trigger and a serving-ready instance that could run to 4+ minutes when including AMI bake time and application boot.
    # Hotstar's custom scaling logic: a conceptual sketch, not production code.
    # Key insight: scale on REQUEST RATE and CONCURRENCY, not CPU utilization.
    # ml_model, primary_asg, secondary_asg, infrastructure, sns and
    # lambda_handler are placeholder collaborators supplied by the caller.

    class HotstarScaler:
        CAPACITY_BUFFER = 2_000_000   # always maintain 2M concurrent-user headroom
        REACTION_TIME_TARGET = 90     # seconds to new serving capacity
        BOOT_TIME = 75                # seconds for app boot from a pre-baked AMI

        def __init__(self, ml_model, primary_asg, secondary_asg,
                     infrastructure, sns, lambda_handler, active_regions):
            self.ml_model = ml_model
            self.primary_asg = primary_asg
            self.secondary_asg = secondary_asg
            self.infrastructure = infrastructure
            self.sns = sns
            self.lambda_handler = lambda_handler
            self.active_regions = active_regions

        def compute_required_capacity(self, current_concurrency, request_rate):
            # Project forward 5 minutes using the ML traffic model for this match
            projected_peak = self.ml_model.predict_peak(
                current=current_concurrency,
                rate=request_rate,
                event_type="cricket_live",
            )
            # Add the mandatory buffer: headroom never drops below 2M
            return projected_peak + self.CAPACITY_BUFFER

        def pre_warm_for_event(self, event_metadata):
            # Called hours before match start, with AMIs pre-baked in all regions.
            # Uses HULK traffic model predictions, not current load.
            predicted_peak = self.ml_model.predict_peak_from_event(event_metadata)
            target_capacity = predicted_peak + self.CAPACITY_BUFFER
            for region in self.active_regions:
                # PRIMARY ASG holds the bulk of capacity on the main instance mix
                self.primary_asg.scale_to(target_capacity * 0.8, region)
                # SECONDARY ASG uses diverse instance types as a fallback,
                # protecting against insufficient capacity in one instance family
                self.secondary_asg.scale_to(target_capacity * 0.2, region)

        def scale_on_signal(self, concurrency_metric, request_rate):
            required = self.compute_required_capacity(concurrency_metric, request_rate)
            current = self.infrastructure.get_active_capacity()
            if current < required:
                delta = required - current
                # SNS alert triggers Lambda to activate the secondary ASG immediately
                self.sns.publish("scale_required", delta=delta)
                self.lambda_handler.trigger_secondary_asg(delta)
THE SECONDARY ASG PATTERN
Hotstar ran two auto-scaling groups simultaneously for every live event. The primary ASG held the bulk of capacity, pre-warmed before the match. The secondary ASG — with a different mix of instance types — acted as an emergency reserve. An AWS SNS alert triggered a Lambda function to activate the secondary ASG when the primary hit limits. This avoided the single-point failure of depending on one instance family being available in a specific AZ during peak national demand.
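The fallback idea can be illustrated with a small in-memory simulation. Everything here is hypothetical: `FakePool` stands in for an auto-scaling group, `InsufficientCapacity` stands in for the provider's capacity error, and no real AWS API is called. In the article the hand-off is driven by an SNS alert and a Lambda function; this sketch shows only the decision logic.

```python
class InsufficientCapacity(Exception):
    """Stand-in for the cloud provider's 'no inventory available' error."""


class FakePool:
    """Hypothetical in-memory pool standing in for an auto-scaling group."""

    def __init__(self, name: str, available: int):
        self.name = name
        self.available = available  # instances the provider can still supply
        self.running = 0

    def launch(self, count: int) -> None:
        if count > self.available:
            raise InsufficientCapacity(
                f"{self.name}: wanted {count}, only {self.available} available")
        self.available -= count
        self.running += count


def launch_with_fallback(count: int, primary: FakePool, secondary: FakePool) -> dict:
    """Launch on the primary pool; on a capacity error, take what the primary
    still has and request the remainder from the secondary pool."""
    try:
        primary.launch(count)
        return {primary.name: count, secondary.name: 0}
    except InsufficientCapacity:
        from_primary = primary.available
        primary.launch(from_primary)
        # If the secondary is also exhausted, the error propagates to the caller.
        secondary.launch(count - from_primary)
        return {primary.name: from_primary, secondary.name: count - from_primary}


primary = FakePool("primary", available=100)
secondary = FakePool("secondary", available=500)
print(launch_with_fallback(150, primary, secondary))
```

Because the secondary pool draws on different instance types, a shortage in one instance family does not block the whole scale-up.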
- 25.3M — Peak concurrent viewers on July 9, 2019 — a global record for any streaming platform at the time
- <90s — Target reaction time to new serving capacity with pre-baked AMIs — vs 4+ minutes with cold AWS ASG provisioning
- 2M buffer — Permanent capacity headroom maintained above current load during all live events — never let it drop below this floor
- 0 — Reported downtime during the 25.3M peak — Project HULK's chaos engineering paid off when the moment arrived
The Infradashboard was the operational nerve centre during live events. Engineers could see capacity headroom in real time, trigger manual scale-up actions hours in advance of anticipated spikes, and monitor which services were approaching their degradation thresholds. The key operational insight was that live sports infrastructure management begins the morning of the match, not when the concurrency alert fires. Pre-warming full capacity ahead of push notifications (the external traffic amplifiers that marketing controlled) meant the infrastructure was already provisioned at near-peak levels before the surge arrived. The Infradashboard made proactive capacity management a team sport between engineering and marketing.
✅
Kubernetes Migration: The Long-Term Fix
The 2019 architecture was built on raw EC2 instances with a custom autoscaler. The engineering team recognized this was not indefinitely scalable. In 2018, Hotstar had begun migrating to containerized microservices on a self-managed Kubernetes cluster, which enabled pod-level scaling in seconds rather than minutes. By 2023, this evolution reached EKS with Data Center Abstraction, the infrastructure that carried 59 million concurrent viewers that year and 61 million in 2025.
ℹ️
Graceful Degradation Protocol
Hotstar defined a tiered degradation order for when services approached capacity limits. Tier 1 (shed first): recommendations, social features, personalization. Tier 2 (shed next): match statistics, secondary content APIs. Tier 3 (never shed): live video delivery, playback APIs, authentication. The protocol was automated and tested via Project HULK's chaos engineering scenarios, so when the real moment came, the system degraded on its own without human intervention.
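The tier order translates naturally into a small lookup. In the sketch below the tier membership follows the list above, but the utilization thresholds (80% and 90%) and the service identifiers are assumptions chosen for illustration; the article does not state the actual trigger values.

```python
# Tiers in shedding order: (tier name, services, utilization threshold).
# Thresholds are illustrative assumptions, not published Hotstar values.
SHED_ORDER = [
    ("tier1", ["recommendations", "social", "personalization"], 0.80),
    ("tier2", ["match_statistics", "secondary_content"], 0.90),
]

# Tier 3 is never shed, so it does not appear in SHED_ORDER at all.
PROTECTED = ["live_video", "playback_api", "authentication"]


def services_to_shed(utilization: float) -> list[str]:
    """Return the services to disable at the given capacity utilization (0 to 1)."""
    shed: list[str] = []
    for _tier, services, threshold in SHED_ORDER:
        if utilization >= threshold:
            shed.extend(services)
    return shed


for level in (0.50, 0.85, 0.95):
    print(level, services_to_shed(level))
```

Keeping the protected services out of the shedding table entirely means no threshold misconfiguration can ever switch off video delivery.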
📡
The Client-Side Backoff Contract
When backend latency exceeded thresholds, Hotstar's client applications were programmed to increase the interval between retry requests — backing off rather than hammering the server. This client-side behaviour was the last line of defense: when 25 million devices all experience a glitch simultaneously, the difference between a recoverable spike and a cascading failure can come down to whether the clients know to wait before retrying.
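A common way to implement this kind of client behaviour is exponential backoff with random jitter. The sketch below is that general technique, not Hotstar's client code: the 1-second base, the 60-second cap, and the use of jitter are all assumptions (the article only says clients increased the interval between retries).

```python
import random


def backoff_ceiling(attempt: int, base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Upper bound on the wait before retry number `attempt` (0-based).

    The bound doubles with each attempt until it reaches the cap.
    """
    return min(cap_s, base_s * 2 ** attempt)


def retry_delay(attempt: int, base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Pick a random wait in [0, ceiling] so devices do not retry in lockstep."""
    return random.uniform(0.0, backoff_ceiling(attempt, base_s, cap_s))


for attempt in range(8):
    print(attempt, backoff_ceiling(attempt))
```

The jitter matters at this scale: if millions of devices all waited exactly 2, 4, then 8 seconds, their retries would arrive as synchronized waves rather than a spread-out trickle.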
Architecture
Hotstar's architecture in 2019 was built for one governing constraint: traffic does not arrive smoothly. A cricket match involving India can go from zero to 10 million concurrent viewers in under ten minutes — faster than any traditional auto-scaling system can respond. The platform ran on AWS EC2 across multiple regions, with Akamai as the CDN (Content Delivery Network — a distributed network of edge servers that caches video segments and static assets close to users, absorbing the majority of load before it reaches origin servers) delivering video segments. Apache Kafka ingested 10 billion-plus clickstream events per match day. Microservices handled video playback, match statistics, personalization, and authentication independently — each able to be scaled or degraded without affecting the others. The three-layer architecture — CDN edge, application tier, data tier — had to be specifically engineered so that no layer became a bottleneck at the velocity of a cricket crowd.
Hotstar 2019 architecture — the path from a viewer's phone to live video, and where traffic concentrations hit
PROJECT HULK: THE PRODUCTION REHEARSAL
Project HULK was not a staging environment — it was a separate production-scale load testing infrastructure that ran geo-distributed tests from 8 AWS regions simultaneously. Its load generation cluster alone used c5.9xlarge instances (36 vCPUs, 72 GB RAM each) to generate realistic concurrent user traffic. Simulations ran four test types: baseline load, tsunami testing (a stress test pattern that simulates sudden extreme spikes and drops in traffic, matching the pattern of a cricket match when a wicket falls), chaos engineering with AZ failures, and ML-trained traffic pattern replay of previous match profiles. The goal: there should be no mode of failure in production that HULK has not already triggered in a test.
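A tsunami test needs a target load curve for the generators to follow. The function below is a minimal sketch of such a curve; `tsunami_profile` and its parameters are hypothetical. The example values reuse figures from the article (1.5 million baseline, 25.3 million peak, a drop to 1 million), while the ramp, hold, and drop durations are made up for illustration.

```python
def tsunami_profile(baseline: int, peak: int, ramp_minutes: int,
                    hold_minutes: int, drop_minutes: int, floor: int) -> list[int]:
    """Target concurrency per minute: linear ramp, hold at peak, sharp drop."""
    ramp = [baseline + (peak - baseline) * m // ramp_minutes
            for m in range(1, ramp_minutes + 1)]
    hold = [peak] * hold_minutes
    drop = [peak - (peak - floor) * m // drop_minutes
            for m in range(1, drop_minutes + 1)]
    return [baseline] + ramp + hold + drop


profile = tsunami_profile(
    baseline=1_500_000, peak=25_300_000,
    ramp_minutes=10, hold_minutes=5, drop_minutes=3, floor=1_000_000,
)
for minute, target in enumerate(profile):
    print(f"t+{minute:02d} min: {target:,} concurrent users")
```

The drop phase is deliberately shorter than the ramp: it is the part that sends users back to the homepage APIs, so it is the part most worth rehearsing.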
Project HULK — load testing architecture that simulated the real match before it happened
⚠️
The Bandwidth Ceiling No One Talks About
At 5.7 Tbps peak, Hotstar was approaching a hard physical limit — not a software limit. Adding more servers would not help if India's total internet capacity couldn't carry the bytes. This ceiling forced Hotstar's engineers to think about CDN efficiency and adaptive bitrate streaming (a technology that dynamically switches video quality based on the viewer's network conditions, delivering the best quality the connection can support at any moment) not just as user experience decisions, but as existential infrastructure constraints.
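Two short calculations make the constraint concrete. The first divides the quoted peak bandwidth by the quoted peak audience, assuming both figures describe the same moment. The second is a minimal adaptive-bitrate selector; the bitrate ladder and the 0.8 safety factor are hypothetical values, not Hotstar's encoding settings.

```python
def average_bitrate_kbps(total_tbps: float, viewers: int) -> float:
    """Average delivered bitrate per viewer, in kilobits per second."""
    return total_tbps * 1e12 / viewers / 1e3


# Hypothetical rendition ladder, in kbps, from lowest to highest quality.
LADDER_KBPS = [180, 360, 720, 1500, 3000]


def pick_rendition(throughput_kbps: float, safety: float = 0.8) -> int:
    """Choose the highest rendition that fits within a safety margin of the
    measured throughput; fall back to the lowest rung if none fits."""
    budget = throughput_kbps * safety
    usable = [rate for rate in LADDER_KBPS if rate <= budget]
    return max(usable) if usable else LADDER_KBPS[0]


print(round(average_bitrate_kbps(5.7, 25_300_000), 1), "kbps per viewer on average")
print(pick_rendition(1000), "kbps chosen for a 1000 kbps connection")
```

On the article's numbers the average works out to a few hundred kilobits per second per viewer, which shows how much of the national-scale total depends on the bitrate each client selects.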
🌐
The Microservices Migration That Made It Possible
Hotstar migrated from a monolith to microservices in 2018 — just one year before the 25.3M record. This was the architectural prerequisite for everything else: without independent service scaling, graceful degradation tiers, and per-service capacity controls, the custom autoscaling and Infradashboard capabilities described here would have been impossible to build. You cannot surgically shed load from a monolith.
Lessons
Hotstar's 2019 story is not about surviving a traffic spike. It is about redesigning the relationship between infrastructure and time. The core lesson is that reactive systems cannot serve live events. When your traffic can double in 90 seconds, you need infrastructure that is already there. Every engineering decision Hotstar made (Project HULK, pre-warming, the custom scaling engine, the buffer, graceful degradation) was aimed at the same goal: transforming a system that responds to load into one that anticipates it.
- 01. For live events, autoscaling solves the wrong problem. AWS ASG is designed for gradual load growth — it cannot provision capacity at the rate that 1.1 million users per minute demands. The correct model for predictable traffic spikes (load patterns where the timing and approximate magnitude of peak traffic is known in advance — like scheduled live events) is pre-warming based on predicted load, not reactive scaling based on current metrics. If you have a scheduled high-traffic event, provision for it before it starts.
- 02. Scale on the metrics that reflect user experience, not server health. Hotstar rejected CPU and memory utilization as scaling signals in favor of request rate and concurrency — the metrics that directly reflect how many users the system is currently serving. Build your autoscaler around the business-level constraint, not the infrastructure-level symptom. A server at 30% CPU can still be serving users who are getting a degraded experience.
- 03. Test the exit, not just the entry. Every load test rehearses the traffic spike. Almost none rehearse the drop. When 24 million users hit back simultaneously, the homepage and recommendation APIs absorbed a wave that was entirely different in character from streaming traffic. Hotstar's tsunami testing (a load test pattern that simulates sudden, extreme traffic spikes followed by equally extreme drops, named for the wave that recedes before it strikes) explicitly rehearsed both the spike and the collapse. Design graceful degradation tiers, and test that they activate correctly.
- 04. Marketing is an unplanned infrastructure event. Push notifications sent at peak match moments created synchronized traffic spikes that engineering had no advance knowledge of. Build a feedback loop where marketing decisions trigger infrastructure responses before the notification is sent, not after. The Infradashboard gave engineers visibility; the next step is giving marketing a capacity-aware interface for campaign timing.
- 05. A capacity buffer is not waste — it is the cost of reliability for live events. Hotstar maintained a permanent 2-million-user capacity buffer above current load throughout every match. This headroom meant that unexpected spikes — a viral moment, an unexpected partnership, a push notification — could be absorbed without triggering a scaling event. For live events, the cost of over-provisioning is always lower than the cost of the three minutes when it fails.
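Lessons 01, 02 and 05 combine into a simple sizing rule: provision for concurrency plus the buffer, and ignore CPU as the trigger. The sketch below shows that rule under one loud assumption: `users_per_instance` (5,000 here) is a made-up figure that would in practice come from load testing, and it is not a number from the article.

```python
import math


def instances_for_concurrency(concurrency: int, users_per_instance: int,
                              buffer: int = 2_000_000) -> int:
    """Instances needed to serve current concurrency plus the standing buffer."""
    return math.ceil((concurrency + buffer) / users_per_instance)


# Assumed per-instance capacity; a real value would be measured, not guessed.
USERS_PER_INSTANCE = 5_000

for concurrency in (10_000_000, 25_300_000):
    needed = instances_for_concurrency(concurrency, USERS_PER_INSTANCE)
    print(f"{concurrency:,} users -> {needed:,} instances")
```

The input is a count of users, so the fleet size tracks what viewers experience directly instead of waiting for CPU to rise as a lagging symptom.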
THE DHONI EFFECT — AND WHAT CAME NEXT
The 2019 record of 25.3 million concurrent viewers held until 2023, when Hotstar hit 59 million during the Cricket World Cup — then broke its own record again in 2025 with 61 million during the ICC Champions Trophy Final. Each leap required a new generation of architecture: from EC2 to Kubernetes, from Kubernetes to EKS, from EKS to Data Center Abstraction with Envoy-based gateways. The 2019 engineering story was chapter one.
🌍
The Template That Changed Streaming Architecture
Hotstar's 2019 engineering approach — pre-warming, custom autoscaling based on concurrency, graceful degradation tiers, and game-day chaos testing — became a reference architecture cited in AWS re:Invent talks and engineering conference circuits worldwide. The project that started as a solution to one cricket match became the blueprint for live-event streaming infrastructure globally.
One cricket match used roughly 70% of India's internet capacity. The engineering team called that a success.
TechLogStack — built at scale, broken in public, rebuilt by engineers
This case is a plain-English retelling of publicly available engineering material.