TechLogStack

Posted on May 20 • Originally published at techlogstack.com on May 17

When MS Dhoni Got Out: How Hotstar Survived 25 Million Concurrent Users

#backend #architecture #devops #webdev

25.3M peak concurrent viewers — a global streaming record on July 9 2019
1.1M users/minute growth rate at peak — faster than AWS Auto Scaling Groups could provision
5.7 Tbps bandwidth — roughly 70% of India's total internet capacity at the time
<90 seconds reaction time to new serving capacity with pre-baked AMIs
2M user buffer maintained above current load at all times during live events
0 reported downtime during the 25.3M peak

July 9th, 2019. India vs New Zealand, Cricket World Cup semi-final. MS Dhoni walks to the crease and 1.1 million new viewers join Hotstar every single minute. Then he gets run out — and 24 million people hit the back button almost simultaneously. This is the engineering story of the platform that survived both.

The Story

Cricket is not just a sport in India — it is synchronised national emotion. When MS Dhoni walks to the crease, tens of millions of people who had given up on the match suddenly reconsider. They reach for their phones. They open Hotstar. At one point during the semi-final, viewership was growing at 1.1 million users per minute — a rate that would overwhelm most cloud architectures before the first over was complete. Hotstar's engineering team had spent months anticipating exactly this moment, building a custom infrastructure stack designed around one insight: the biggest danger in live sports streaming is not the peak load itself, but the speed of arrival at that peak.

Race against time: +500K growth rate per minute concurrency. Fully baked AMIs: 4 minutes. Application boot-up time: 90 seconds. Reaction time: push notifications.

— Gaurav Kamboj, Cloud Architect at Hotstar, AWS Community Day Bengaluru 2018

The platform's architecture in 2019 ran on AWS EC2 instances with Akamai as the primary CDN (Content Delivery Network — a distributed system of edge servers that caches and delivers content from locations close to the user, reducing latency and offloading origin servers), backed by Apache Kafka for real-time event streaming and Apache Flink for stream processing. Hotstar had migrated from a monolith to microservices (an architecture where an application is built as a collection of small, independently deployable services, each responsible for a specific function) in 2018, which allowed individual services to be scaled independently. But AWS's native Auto Scaling Groups had a fundamental problem: when you need to go from 1.5 million to 15 million concurrent viewers in under ten minutes, sequential instance provisioning isn't fast enough.

The Dhoni Problem: It Was the Drop, Not the Peak

The hardest engineering challenge was not the growth rate but the exit. When Dhoni was run out, Hotstar went from 25.3 million concurrent viewers to under 1 million in minutes. Millions of users hit the back button simultaneously. They didn't leave the app; they returned to the homepage, where personalisation, recommendation, and content-discovery APIs were suddenly hammered by a traffic tsunami that had nothing to do with video streaming. A system built only for video delivery would have collapsed on the exit.

Problem

The Speed of Arrival

During the IPL Final and World Cup, Hotstar's traffic grew at 500,000 concurrent users per minute during peak moments, triggered by push notifications that the marketing team sent when the match became exciting. AWS native Auto Scaling Groups couldn't provision new EC2 instances fast enough: scaling requests hit insufficient capacity errors in specific availability zones, and the step-size mechanism added nodes in fixed batches rather than responding to the rate of change.

Cause

AWS ASG's Three Fatal Flaws for Live Events

Availability Zone (an isolated, physically separate data centre within an AWS region, designed to be independent of failures in other zones) imbalance meant adding capacity to one zone could leave another undersupplied. ASG step size — adding a fixed number of instances per scaling action — couldn't respond fast enough to 1.1M user/minute growth. And insufficient capacity errors in a zone at peak were unrecoverable: there was simply no spare inventory to allocate on-demand during a national cricket final.
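To see why a fixed step size falls behind, here is a minimal simulation. Only the 1.1M users/minute arrival rate comes from the article; the per-instance capacity, step size, and scaling interval are assumptions picked for illustration, not Hotstar or AWS figures.

```python
# Illustrative simulation: fixed-step scaling vs. a fast arrival rate.
# USERS_PER_INSTANCE, STEP_INSTANCES and STEP_INTERVAL_S are assumptions;
# only ARRIVALS_PER_MIN is taken from the article.

ARRIVALS_PER_MIN = 1_100_000   # peak growth rate quoted above
USERS_PER_INSTANCE = 5_000     # assumed capacity of one instance
STEP_INSTANCES = 50            # assumed instances added per scaling action
STEP_INTERVAL_S = 60           # assumed time between scaling actions


def capacity_gap_after(minutes: int, start_users: int, start_instances: int) -> int:
    """Return users minus serving capacity after `minutes` of sustained growth."""
    users = start_users
    instances = start_instances
    for _ in range(minutes):
        users += ARRIVALS_PER_MIN
        instances += STEP_INSTANCES * (60 // STEP_INTERVAL_S)
    return users - instances * USERS_PER_INSTANCE


# Start balanced: 1.5M users on exactly enough instances.
gap = capacity_gap_after(minutes=5, start_users=1_500_000, start_instances=300)
print(f"Unserved users after 5 minutes: {gap:,}")
```

Under these assumptions each scaling action adds capacity for 250,000 users while 1.1 million arrive, so the gap widens every minute no matter how healthy the existing instances look.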

Solution

Pre-warming + Custom Scaling Engine

Hotstar stopped relying on reactive, metric-driven ASG scaling for live events. Instead, they pre-warmed the full expected infrastructure before each match based on predictions from Project HULK (their internal load-testing and traffic-modelling platform), maintaining a 2-million-user capacity buffer at all times. A custom internal scaling engine, driven by request rate and concurrency rather than CPU, could spin up new capacity in under 90 seconds. A secondary ASG with a different instance-type mix served as a fallback if the primary cluster hit hard limits.
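The relationship between the buffer and the reaction time can be checked with simple arithmetic. The sketch below uses only figures quoted in this article (2M buffer, 1.1M users/minute growth, 90-second reaction time, 4-minute cold path); the function names are made up for illustration.

```python
# Worked arithmetic: does the capacity buffer outlast the reaction time?
# All numbers are the ones quoted in the article.

BUFFER_USERS = 2_000_000
GROWTH_PER_MIN = 1_100_000


def buffer_survives(reaction_time_s: float) -> bool:
    """True if the buffer absorbs all arrivals during the reaction window."""
    arrivals = GROWTH_PER_MIN * reaction_time_s / 60
    return arrivals <= BUFFER_USERS


def seconds_of_headroom() -> float:
    """How long the buffer lasts at peak growth if no capacity is added."""
    return BUFFER_USERS / GROWTH_PER_MIN * 60


print(f"Buffer lasts {seconds_of_headroom():.0f} s at peak growth")
print("90 s reaction time OK:", buffer_survives(90))
print("240 s cold provisioning OK:", buffer_survives(240))
```

At peak growth the buffer lasts a little under two minutes, which is why a 90-second reaction time fits inside it and a 4-minute cold provisioning path does not.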

Result

25.3 Million — Zero Downtime

The IND vs NZ semi-final became the largest concurrent live stream in history at the time: 25.3 million viewers, zero reported downtime. The platform handled both the spike to peak and the catastrophic drop when Dhoni got out — the homepage API layer, pre-scaled and tested via chaos engineering, absorbed the sudden exit traffic without incident.

The Fix

How Hotstar Replaced AWS Autoscaling with a Custom Scaling Engine

AWS Auto Scaling Groups were built for the average web application: traffic grows gradually, CPU rises, new instances are added over minutes. Live sports streaming is the opposite. Traffic can double in 90 seconds when MS Dhoni appears on screen. Hotstar's engineering team identified three specific failure modes in ASG behaviour during live events: insufficient capacity errors (AWS returning an error when the requested EC2 instance type has no available inventory in a specific AZ at that moment) during peak demand, availability zone skew (a condition where one AZ accumulates more load than others, exhausting its capacity while other AZs still have headroom) when scaling across AZs, and the lag between a scaling trigger and a serving-ready instance that could run to 4+ minutes when including AMI bake time and application boot.

25.3M — peak concurrent viewers, July 9 2019 — a global streaming record at the time
<90s — target reaction time to new serving capacity with pre-baked AMIs vs 4+ minutes with cold ASG provisioning
2M buffer — permanent capacity headroom maintained above current load during all live events
0 — reported downtime during the 25.3M peak; Project HULK's chaos engineering paid off

# Hotstar's custom scaling logic: conceptual sketch, not production code
# Key insight: scale on REQUEST RATE and CONCURRENCY, not CPU utilisation
# ml_model, primary_asg, secondary_asg, infrastructure and sns are
# hypothetical collaborators supplied by the caller.

class HotstarScaler:
    CAPACITY_BUFFER = 2_000_000  # always maintain 2M concurrent user headroom
    REACTION_TIME_TARGET = 90    # seconds to new serving capacity
    BOOT_TIME = 75               # seconds for app boot from pre-baked AMI

    def __init__(self, ml_model, primary_asg, secondary_asg,
                 infrastructure, sns, regions):
        self.ml_model = ml_model
        self.primary_asg = primary_asg
        self.secondary_asg = secondary_asg
        self.infrastructure = infrastructure
        self.sns = sns
        self.regions = list(regions)

    def compute_required_capacity(self, current_concurrency, request_rate):
        # Project forward 5 minutes using ML traffic model for this match
        projected_peak = self.ml_model.predict_peak(
            current=current_concurrency,
            rate=request_rate,
            event_type='cricket_live',
        )
        return projected_peak + self.CAPACITY_BUFFER  # never let buffer drop below 2M

    def pre_warm_for_event(self, event_metadata):
        # Called hours before match start; AMIs are pre-baked in all regions
        # Uses HULK traffic model predictions, NOT current load
        predicted_peak = self.ml_model.predict_peak_from_event(event_metadata)
        target_capacity = predicted_peak + self.CAPACITY_BUFFER
        # Assumption: the target is split evenly across regions
        per_region = target_capacity / len(self.regions)

        for region in self.regions:
            # PRIMARY ASG: main instance type mix, 80% of capacity
            self.primary_asg.scale_to(per_region * 0.8, region)
            # SECONDARY ASG: diverse instance types as fallback
            # Protects against insufficient capacity in any single instance family
            self.secondary_asg.scale_to(per_region * 0.2, region)

    def scale_on_signal(self, concurrency_metric, request_rate):
        required = self.compute_required_capacity(concurrency_metric, request_rate)
        current = self.infrastructure.get_active_capacity()
        if current < required:
            delta = required - current
            # The SNS alert is what triggers the Lambda that activates the
            # secondary ASG, so the scaler only publishes and never calls
            # the Lambda directly (which would trigger it twice)
            self.sns.publish('scale_required', delta=delta)

Project HULK: The Production Rehearsal

The most consequential engineering decision was building Project HULK — an internal load-testing platform with a footprint bigger than most companies' production environments. At full scale: 108,000 CPUs, 216 TB of RAM, and 200 Gbps of outbound network across 8 geographically distributed AWS regions. It performed four test categories: baseline load, tsunami tests (simulating sudden spike-and-drop profiles matching a Dhoni innings), chaos engineering with AZ failures, and ML-driven traffic pattern modelling. The goal: there should be no mode of failure in production that HULK has not already triggered in a test.
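A tsunami test needs a load profile with a steep ramp, a short hold, and an even steeper drop. The generator below is a minimal sketch of such a profile; the function name and every parameter value are illustrative assumptions, not HULK's actual configuration.

```python
# Sketch of a "tsunami" load profile: fast ramp, short hold, sharp drop.
# Parameter values are illustrative assumptions, not HULK's real settings.

def tsunami_profile(baseline: int, peak: int, ramp_min: int,
                    hold_min: int, drop_min: int) -> list[int]:
    """Return the target concurrent users for each minute of the test."""
    profile = []
    # Linear ramp from baseline up to peak.
    for m in range(1, ramp_min + 1):
        profile.append(baseline + (peak - baseline) * m // ramp_min)
    # Hold at peak.
    profile.extend([peak] * hold_min)
    # Linear drop back to baseline, modelling the exit after a wicket.
    for m in range(1, drop_min + 1):
        profile.append(peak - (peak - baseline) * m // drop_min)
    return profile


profile = tsunami_profile(baseline=1_000_000, peak=25_000_000,
                          ramp_min=20, hold_min=5, drop_min=3)
for minute, users in enumerate(profile, start=1):
    print(f"minute {minute:2d}: {users:>10,} users")
```

Feeding a profile like this to a load generator exercises the drop as deliberately as the ramp, which is the part most load tests leave out.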

The secondary ASG pattern: emergency reserve against insufficient capacity

Hotstar ran two auto-scaling groups simultaneously for every live event. The primary ASG held the bulk of capacity, pre-warmed before the match. The secondary ASG, with a different mix of instance types, acted as an emergency reserve. An AWS SNS alert triggered a Lambda function to activate the secondary ASG when the primary hit limits. This avoided the single point of failure of depending on one instance family being available in a specific AZ during peak national demand. Instance diversity is the defence against AWS's "insufficient capacity" errors at peak.
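The idea of instance diversity can be sketched as a fallback loop over instance types. This is not Hotstar's code: `launch` is a hypothetical callable standing in for a cloud API, `InsufficientCapacityError` is a made-up exception, and the instance type names and inventory are only examples.

```python
# Sketch of instance-type fallback under capacity shortages.
# `launch`, the exception class and the inventory are all hypothetical.

from typing import Callable


class InsufficientCapacityError(Exception):
    """Raised by the (hypothetical) launcher when a pool has no inventory."""


def launch_with_fallback(count: int, instance_types: list[str],
                         launch: Callable[[str, int], int]) -> dict[str, int]:
    """Try each instance type in order until `count` instances are launched.

    `launch(instance_type, n)` returns how many instances it started and
    raises InsufficientCapacityError if the pool is empty.
    """
    launched: dict[str, int] = {}
    remaining = count
    for itype in instance_types:
        if remaining == 0:
            break
        try:
            started = launch(itype, remaining)
        except InsufficientCapacityError:
            continue  # this family is exhausted; try the next one
        if started:
            launched[itype] = started
            remaining -= started
    if remaining:
        raise InsufficientCapacityError(f"{remaining} instances still unplaced")
    return launched


# Fake inventory for demonstration: the preferred pool is empty.
inventory = {"c5.4xlarge": 0, "m5.4xlarge": 60, "r5.4xlarge": 100}


def fake_launch(itype: str, n: int) -> int:
    available = inventory[itype]
    if available == 0:
        raise InsufficientCapacityError(itype)
    started = min(n, available)
    inventory[itype] -= started
    return started


result = launch_with_fallback(
    100, ["c5.4xlarge", "m5.4xlarge", "r5.4xlarge"], fake_launch)
print(result)
```

With a single instance type in the list, the same request would fail outright; the longer the list, the less a shortage in any one family matters.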

The graceful degradation protocol: video delivery last

Hotstar defined a tiered degradation order for when services approached capacity limits. Tier 1 (shed first): recommendations, social features, personalisation. Tier 2: match statistics, secondary content APIs. Tier 3 (never shed): live video delivery, playback APIs, authentication. The protocol was automated and tested via Project HULK's chaos engineering scenarios — so when the real moment came, the system degraded on its own without human intervention. The contract was explicit: degrade gracefully rather than fail catastrophically.
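The tier order above maps naturally onto a small lookup. The sketch below follows the tier membership described in the text, but the utilisation thresholds and feature identifiers are assumptions chosen for illustration.

```python
# Sketch of tiered load shedding. Thresholds are illustrative assumptions;
# tier membership follows the order described above.

TIERS = {
    1: ["recommendations", "social", "personalisation"],   # shed first
    2: ["match_statistics", "secondary_content"],          # shed next
    3: ["live_video", "playback_api", "authentication"],   # never shed
}

# Assumed utilisation (fraction of capacity) at which each tier is shed.
SHED_THRESHOLDS = {1: 0.80, 2: 0.90}


def active_features(utilisation: float) -> list[str]:
    """Return the features that stay enabled at a given utilisation."""
    enabled = []
    for tier, features in TIERS.items():
        threshold = SHED_THRESHOLDS.get(tier)  # tier 3 has no threshold
        if threshold is None or utilisation < threshold:
            enabled.extend(features)
    return enabled


for load in (0.50, 0.85, 0.95):
    print(f"{load:.0%} utilisation -> {active_features(load)}")
```

Because tier 3 has no threshold at all, no utilisation value can switch off video delivery, which encodes the "never shed" rule in the data rather than in a conditional someone might later change.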

The push notification trap: marketing as an unplanned infrastructure event

The marketing team's push notifications — sent to bring users back to the app during exciting match moments — were a hidden traffic generator that engineering had no advance warning of. Every notification blast created an immediate, synchronised spike in both video and API traffic. Hotstar had to build a feedback loop where marketing intent was translated into infrastructure capacity decisions before the notification was sent, not after. The Infradashboard gave engineers visibility into the capacity impact of planned notification campaigns — proactive management, not reactive firefighting.
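One way to picture that feedback loop is a capacity check that runs before a campaign is sent. This is a sketch under stated assumptions: the `Campaign` type, the open rate, and the audience size are hypothetical, and only the 2M buffer comes from the article.

```python
# Sketch of a capacity check run before a push notification is sent.
# Campaign, expected_open_rate and the example numbers are hypothetical;
# the 2M buffer is the figure quoted in the article.

from dataclasses import dataclass

CAPACITY_BUFFER = 2_000_000


@dataclass
class Campaign:
    recipients: int
    expected_open_rate: float  # fraction expected to open the app promptly


def extra_capacity_needed(campaign: Campaign, current_users: int,
                          provisioned_users: int) -> int:
    """Users' worth of capacity to add before sending; 0 if safe to send now."""
    expected_arrivals = round(campaign.recipients * campaign.expected_open_rate)
    required = current_users + expected_arrivals + CAPACITY_BUFFER
    return max(0, required - provisioned_users)


campaign = Campaign(recipients=50_000_000, expected_open_rate=0.05)
shortfall = extra_capacity_needed(campaign, current_users=10_000_000,
                                  provisioned_users=12_000_000)
print(f"Scale up by {shortfall:,} users of capacity before sending")
```

A non-zero result means the send should wait until the extra capacity is serving traffic, turning the notification from a surprise into a scheduled scaling event.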

Architecture

Hotstar's architecture in 2019 was built for one governing constraint: traffic does not arrive smoothly. A cricket match involving India can go from zero to 10 million concurrent viewers in under ten minutes — faster than any traditional auto-scaling system can respond. Apache Kafka ingested 10 billion-plus clickstream events per match day. Microservices handled video playback, match statistics, personalisation, and authentication independently — each able to be scaled or degraded without affecting the others.

Diagram: Hotstar 2019 Architecture: The Path from Phone to Live Video (interactive version on TechLogStack)

Diagram: Project HULK: Load Testing Architecture That Simulated the Real Match (interactive version on TechLogStack)

Bandwidth at the Edge of India's Internet

At 25.3 million concurrent users, Hotstar's bandwidth consumption hit 5.7 Tbps — approximately 70% of India's total available internet capacity at the time. This is not a platform metric; it is a national infrastructure metric. Adding more servers would not help if India's total internet capacity couldn't carry the bytes. This ceiling forced engineers to think about CDN efficiency and adaptive bitrate streaming (a technology that dynamically switches video quality based on the viewer's network conditions) not just as user experience decisions, but as existential infrastructure constraints.
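Two back-of-the-envelope numbers follow directly from the figures above. The arithmetic below takes the quoted 5.7 Tbps, 25.3 million viewers, and 70% share at face value; the variable names are just for illustration.

```python
# Back-of-the-envelope arithmetic from the figures quoted above.

PEAK_BANDWIDTH_BPS = 5.7e12        # 5.7 Tbps
PEAK_VIEWERS = 25_300_000
SHARE_OF_NATIONAL_CAPACITY = 0.70  # "approximately 70%"

avg_kbps_per_viewer = PEAK_BANDWIDTH_BPS / PEAK_VIEWERS / 1e3
implied_national_tbps = PEAK_BANDWIDTH_BPS / SHARE_OF_NATIONAL_CAPACITY / 1e12

print(f"Average bitrate per viewer: {avg_kbps_per_viewer:.0f} kbps")
print(f"Implied national capacity: {implied_national_tbps:.1f} Tbps")
```

If the quoted figures hold, the average works out to roughly 225 kbps per viewer, which shows how little room there was to raise video quality without hitting the national ceiling.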

Lessons

For live events, autoscaling solves the wrong problem. AWS ASG is designed for gradual load growth; it cannot provision capacity at the rate that 1.1 million users per minute demands. The correct model for predictable traffic spikes (load patterns where the timing and approximate magnitude of peak traffic are known in advance) is pre-warming based on predicted load, not reactive scaling based on current metrics. If you have a scheduled high-traffic event, provision for it before it starts.
Scale on the metrics that reflect user experience, not server health. Hotstar rejected CPU and memory utilisation as scaling signals in favour of request rate and concurrency — the metrics that directly reflect how many users the system is currently serving. A server at 30% CPU can still be serving users who are getting a degraded experience. Build your autoscaler around the business-level constraint, not the infrastructure-level symptom.
Test the exit, not just the entry. Every load test rehearses the traffic spike. Almost none rehearse the drop. When 24 million users hit back simultaneously, the homepage and recommendation APIs absorbed a wave entirely different in character from streaming traffic. Tsunami testing (a load test pattern simulating sudden, extreme traffic spikes followed by equally extreme drops) explicitly rehearses both the spike and the collapse. Design graceful degradation tiers, and test that they activate correctly.
Marketing is an unplanned infrastructure event. Push notifications sent at peak match moments created synchronised traffic spikes that engineering had no advance knowledge of. Build a feedback loop where marketing decisions trigger infrastructure responses before the notification is sent, not after. The Infradashboard gave engineers visibility; the next step is giving marketing a capacity-aware interface for campaign timing.
A capacity buffer is not waste; it is the cost of reliability for live events. Hotstar maintained a permanent 2-million-user capacity buffer above current load throughout every match. For live events, the cost of over-provisioning is always lower than the cost of the three minutes when the platform fails.

Engineering Glossary

Adaptive bitrate streaming (ABR) — a technology that dynamically switches video quality based on the viewer's available network bandwidth, delivering the best quality the connection can support at any moment. Critical at Hotstar's scale where India's total internet capacity was the binding constraint.

Availability Zone (AZ) skew — a condition where one AWS Availability Zone accumulates more load than others, exhausting its capacity while other AZs still have headroom. Hotstar's secondary ASG with diverse instance types was the defence against this.

CDN (Content Delivery Network) — a distributed system of edge servers that caches and delivers content from locations close to the user, reducing latency and offloading origin servers. Akamai served as Hotstar's CDN, delivering video segments and absorbing the majority of load before it reached origin infrastructure.

Chaos engineering — the practice of deliberately injecting failures into a system to discover weaknesses before they manifest in production. Project HULK's chaos engineering tests simulated AZ failures during load, validating that Hotstar's infrastructure could survive them during a match.

Graceful degradation — a system design principle where, under capacity pressure, non-essential features are disabled in a defined order to preserve capacity for the most critical user path. For Hotstar: recommendations and social features shed first; live video delivery sheds last.

Insufficient capacity error — an AWS error returned when the requested EC2 instance type has no available inventory in a specific Availability Zone at that moment. Unrecoverable during peak national demand — the defence is pre-warming with a diverse instance type mix before the event.

Pre-warming — provisioning infrastructure to expected peak capacity before load arrives, based on predicted demand rather than current metrics. The correct model for predictable traffic spikes like scheduled live sports events.

Tsunami test — a load test pattern that simulates sudden, extreme traffic spikes followed by equally extreme drops — matching the profile of a cricket match when a wicket falls. Tests both the spike response and the exit traffic handling. Named for the wave pattern.

This case is a plain-English retelling of publicly available engineering material.

TechLogStack — built at scale, broken in public, rebuilt by engineers.
