<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pritam Roy</title>
    <description>The latest articles on DEV Community by Pritam Roy (@pritamroy-devops).</description>
    <link>https://dev.to/pritamroy-devops</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3801847%2F59724c13-747f-4073-82f4-b28400eb0ce7.png</url>
      <title>DEV Community: Pritam Roy</title>
      <link>https://dev.to/pritamroy-devops</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pritamroy-devops"/>
    <language>en</language>
    <item>
      <title>The AI Platform Wars 2026: Stop Asking Which AI Is Best - Ask This Instead</title>
      <dc:creator>Pritam Roy</dc:creator>
      <pubDate>Mon, 23 Mar 2026 12:48:03 +0000</pubDate>
      <link>https://dev.to/pritamroy-devops/the-ai-platform-wars-2026-stop-asking-which-ai-is-best-ask-this-instead-2hc8</link>
      <guid>https://dev.to/pritamroy-devops/the-ai-platform-wars-2026-stop-asking-which-ai-is-best-ask-this-instead-2hc8</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;🔗 &lt;strong&gt;This is a curated excerpt. The full deep-dive - with detailed comparison tables, architecture-level insights, performance breakdowns, and final verdicts - is on my blog:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;👉 &lt;a href="https://www.pritamroy.com/blog/posts/the-ai-platform-wars-2026-edition-chatgpt-vs-claude-vs-gemini-vs-copilot-vs-grok.html" rel="noopener noreferrer"&gt;pritamroy.com - The AI Platform Wars 2026&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Question Everyone Is Asking (And Getting Wrong)
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Which AI is the best right now?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I see this on Twitter, Reddit, LinkedIn, every tech forum - every single day.&lt;/p&gt;

&lt;p&gt;And almost every answer misses the point entirely.&lt;/p&gt;

&lt;p&gt;Because &lt;strong&gt;ChatGPT, Claude, Gemini, Copilot, Grok, Perplexity, and DeepSeek are not on the same battlefield.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They are solving different problems.&lt;br&gt;&lt;br&gt;
For different users.&lt;br&gt;&lt;br&gt;
With different architectural bets.&lt;/p&gt;

&lt;p&gt;Comparing them without that context is like asking:&lt;br&gt;&lt;br&gt;
&lt;em&gt;"Is a scalpel better than a hammer?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It depends on what you are trying to build - or cut open.&lt;/p&gt;


&lt;h2&gt;
  
  
  ⚔️ We Are Not in an AI Tool Race
&lt;/h2&gt;

&lt;p&gt;We are in a &lt;strong&gt;Platform Ecosystem War.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each of these companies is not just building a smarter chatbot.&lt;br&gt;&lt;br&gt;
They are building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔗 &lt;strong&gt;Integrations&lt;/strong&gt; - into your IDE, your email, your browser, your cloud&lt;/li&gt;
&lt;li&gt;🏗️ &lt;strong&gt;Developer workflows&lt;/strong&gt; - where you code, how you ship&lt;/li&gt;
&lt;li&gt;🗄️ &lt;strong&gt;Data pipelines&lt;/strong&gt; - who owns your enterprise context&lt;/li&gt;
&lt;li&gt;🌐 &lt;strong&gt;Ecosystems&lt;/strong&gt; - that lock you in, gently but completely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI response quality? That's table stakes now.&lt;/p&gt;

&lt;p&gt;The real war is about &lt;strong&gt;who controls your workflow.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  🧠 The 7 Platforms - What They Actually Are
&lt;/h2&gt;
&lt;h3&gt;
  
  
  ⚡ ChatGPT - The Swiss Army Knife
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Balanced across coding, writing, analysis, and structured tasks&lt;/li&gt;
&lt;li&gt;Largest ecosystem of plugins and integrations&lt;/li&gt;
&lt;li&gt;GPT-4o brings strong multimodal capability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; General-purpose AI work, teams that need one tool for everything&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  🧩 Claude - The Deep Thinker
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Best-in-class long-context reasoning (up to 200K tokens)&lt;/li&gt;
&lt;li&gt;Exceptional for nuanced, complex, multi-step problems&lt;/li&gt;
&lt;li&gt;Strong focus on safety, alignment, and reducing hallucinations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Deep research, long-form writing, architecture reviews, anything requiring sustained reasoning&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  🌐 Gemini - The Google Insider
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Native integration with Google Workspace (Docs, Sheets, Gmail, Drive)&lt;/li&gt;
&lt;li&gt;Strong multimodal capabilities - image, audio, video&lt;/li&gt;
&lt;li&gt;Rapidly closing the gap in reasoning benchmarks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Teams living in the Google ecosystem, multimodal workflows&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  💻 Microsoft Copilot - The Developer's Co-Pilot
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Embedded directly into VS Code, GitHub, Azure, M365&lt;/li&gt;
&lt;li&gt;Context-aware - understands your repo, your codebase, your tickets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Enterprise developers, Microsoft-stack teams, productivity workflows&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  ⚡ Grok - The Real-Time Reactor
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Direct access to X (Twitter) real-time data&lt;/li&gt;
&lt;li&gt;Fast, opinionated, less filtered responses&lt;/li&gt;
&lt;li&gt;Still maturing in depth and accuracy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Current events, social data analysis, quick real-time lookups&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  🔍 Perplexity - The Research Engine
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Hybrid of search engine + AI reasoning&lt;/li&gt;
&lt;li&gt;Citation-backed answers - you can verify every claim&lt;/li&gt;
&lt;li&gt;Strong factual accuracy for technical and news queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Research, fact-checking, staying current with lower hallucination risk&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  🚀 DeepSeek - The Emerging Disruptor
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Exceptional coding performance, often competing with GPT-4 class models&lt;/li&gt;
&lt;li&gt;Significantly lower cost&lt;/li&gt;
&lt;li&gt;Open-weight models available for self-hosting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Coding-heavy teams, cost-conscious deployments, developers who want control&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  📊 The Cheat Sheet (Quick Reference)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Core Strength&lt;/th&gt;
&lt;th&gt;Biggest Weakness&lt;/th&gt;
&lt;th&gt;Best Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ChatGPT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Balanced, versatile&lt;/td&gt;
&lt;td&gt;Jack of all trades&lt;/td&gt;
&lt;td&gt;General-purpose, coding, structured tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Long reasoning, nuance&lt;/td&gt;
&lt;td&gt;Slower, fewer integrations&lt;/td&gt;
&lt;td&gt;Deep thinking, long-form content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemini&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google integration, multimodal&lt;/td&gt;
&lt;td&gt;Still catching up on reasoning&lt;/td&gt;
&lt;td&gt;Google Workspace, image/video work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Copilot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IDE + enterprise integration&lt;/td&gt;
&lt;td&gt;Narrow outside Microsoft stack&lt;/td&gt;
&lt;td&gt;Dev productivity, code reviews&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Grok&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real-time data&lt;/td&gt;
&lt;td&gt;Depth and accuracy&lt;/td&gt;
&lt;td&gt;Live news, social trends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Perplexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Factual accuracy, citations&lt;/td&gt;
&lt;td&gt;Not for creative tasks&lt;/td&gt;
&lt;td&gt;Research, fact-checking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DeepSeek&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Coding, cost efficiency&lt;/td&gt;
&lt;td&gt;Smaller ecosystem&lt;/td&gt;
&lt;td&gt;Budget-conscious coding teams&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  🧩 The Real Framework: Map Tool to Task
&lt;/h2&gt;

&lt;p&gt;Stop asking &lt;strong&gt;"which is best?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Start asking &lt;strong&gt;"which is best for THIS?"&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Top Picks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🧑‍💻 &lt;strong&gt;Writing code&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Copilot → DeepSeek → ChatGPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;📝 &lt;strong&gt;Long-form writing&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Claude → ChatGPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔍 &lt;strong&gt;Research &amp;amp; fact-checking&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Perplexity → Claude&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;📊 &lt;strong&gt;Data analysis&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;ChatGPT → Gemini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;⚡ &lt;strong&gt;Real-time information&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Grok → Perplexity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🏢 &lt;strong&gt;Enterprise Microsoft stack&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Copilot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🌐 &lt;strong&gt;Google Workspace&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🏗️ &lt;strong&gt;Architecture reviews&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;💰 &lt;strong&gt;Budget-conscious deployment&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  🔥 The Insight Most Engineers Miss
&lt;/h2&gt;

&lt;p&gt;Here is what is actually happening beneath the surface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI Quality Gap → Closing Fast
Ecosystem Control Gap → Widening Fast
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A year ago, GPT-4 had a clear quality lead.&lt;br&gt;&lt;br&gt;
Today? DeepSeek matches it on coding. Claude matches it on reasoning. Gemini is closing the gap on reasoning benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The model quality differentiation is shrinking.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But the &lt;strong&gt;ecosystem lock-in is growing.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Copilot understands your GitHub repo.&lt;br&gt;&lt;br&gt;
Gemini reads your Google Drive.&lt;br&gt;&lt;br&gt;
Claude remembers your enterprise documents.&lt;/p&gt;

&lt;p&gt;The question is no longer just "which AI gives better answers?"&lt;/p&gt;

&lt;p&gt;It is: &lt;strong&gt;"Which AI is embedded so deep in your workflow that switching becomes painful?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is the real Platform War.&lt;/p&gt;




&lt;h2&gt;
  
  
  💬 What's Your Daily Driver?
&lt;/h2&gt;

&lt;p&gt;I am genuinely curious:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Which AI platform do you actually use most in your engineering or creative workflow?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And more importantly - &lt;strong&gt;why that one?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Real usage patterns beat benchmark comparisons every time.&lt;br&gt;&lt;br&gt;
Drop it in the comments. 👇&lt;/p&gt;




&lt;blockquote&gt;
&lt;h2&gt;
  
  
  👉 Want the Full Deep-Dive?
&lt;/h2&gt;

&lt;p&gt;This post covered the high-level landscape.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;full article on my blog&lt;/strong&gt; includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📊 Detailed comparison tables across 12+ dimensions&lt;/li&gt;
&lt;li&gt;🏗️ Architecture-level analysis of each platform&lt;/li&gt;
&lt;li&gt;⚡ Real-world performance breakdown by engineering task&lt;/li&gt;
&lt;li&gt;🔮 Where each platform is heading in 2026&lt;/li&gt;
&lt;li&gt;🎯 My personal verdict after daily usage of all 7&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;👉 &lt;a href="https://www.pritamroy.com/blog/posts/the-ai-platform-wars-2026-edition-chatgpt-vs-claude-vs-gemini-vs-copilot-vs-grok.html" rel="noopener noreferrer"&gt;Read the full analysis on pritamroy.com&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Written by &lt;a href="https://www.pritamroy.com" rel="noopener noreferrer"&gt;Pritam Roy&lt;/a&gt; - Senior AWS Cloud &amp;amp; DevOps Engineer. I write about cloud architecture, real-world infrastructure, and the tools engineers actually use.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If this was useful, drop a ❤️ - it helps other engineers find it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>devops</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How JioHotstar Engineered 82.1 Crore Concurrent Streams - A DevOps Deep Dive into the T20 World Cup 2026 Final</title>
      <dc:creator>Pritam Roy</dc:creator>
      <pubDate>Tue, 10 Mar 2026 10:31:03 +0000</pubDate>
      <link>https://dev.to/pritamroy-devops/how-jiohotstar-engineered-821-crore-concurrent-streams-a-devops-deep-dive-into-the-t20-world-cup-2h93</link>
      <guid>https://dev.to/pritamroy-devops/how-jiohotstar-engineered-821-crore-concurrent-streams-a-devops-deep-dive-into-the-t20-world-cup-2h93</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://www.pritamroy.com/blog/posts/how-jiohotstar-engineered-821-crore-concurrent-streams-a-devops-deep-dive-into-t.html" rel="noopener noreferrer"&gt;pritamroy.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Setting the Stage: What Actually Happened on March 8, 2026
&lt;/h2&gt;

&lt;p&gt;Before we talk infrastructure, let's appreciate the scale of the event that stress-tested it.&lt;/p&gt;

&lt;p&gt;India defeated New Zealand by 96 runs in the ICC Men's T20 World Cup 2026 Final at the Narendra Modi Stadium in Ahmedabad, posting a mammoth 255/5 - the highest total ever in a T20 World Cup final. India became the first team in history to retain their T20 World Cup title, and the first to win three T20 World Cup titles overall. The stadium held 86,000 roaring fans. Hundreds of millions watched on screens across every corner of India and the world.&lt;/p&gt;

&lt;p&gt;And JioHotstar? It didn't just survive. It rewrote history.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Numbers That Made Engineers Sweat (And Then Celebrate)
&lt;/h2&gt;

&lt;p&gt;The concurrent viewership peaked at &lt;strong&gt;82.1 crore simultaneous streams&lt;/strong&gt; during the post-match presentation ceremony. Let that number sink in - 821 million streams at a single moment, from a single platform, from a single country.&lt;/p&gt;

&lt;p&gt;Here's how the demand curve looked throughout the match:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Moment&lt;/th&gt;
&lt;th&gt;Concurrent Viewers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ricky Martin's opening performance&lt;/td&gt;
&lt;td&gt;2.1 crore&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;At the toss&lt;/td&gt;
&lt;td&gt;4.2 crore&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;End of India's innings (255/5)&lt;/td&gt;
&lt;td&gt;43.9 crore&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Innings break&lt;/td&gt;
&lt;td&gt;44.3 crore&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New Zealand start chasing 255&lt;/td&gt;
&lt;td&gt;49.9 crore&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;End of the 1st over of the chase&lt;/td&gt;
&lt;td&gt;50.3 crore&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The moment the last wicket fell&lt;/td&gt;
&lt;td&gt;74.5 crore&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Post-match presentation ceremony&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;82.1 crore&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is a textbook demand curve for any DevOps engineer to study - a slow warm-up, a steep mid-event ramp, and a vertical spike at the moment of maximum drama. Every system design decision JioHotstar made had to account for exactly this shape.&lt;/p&gt;

&lt;p&gt;For context: the 2024 T20 WC Final peaked at just 5.3 crore on Disney+ Hotstar. In two years, they scaled peak concurrency by more than &lt;strong&gt;15x&lt;/strong&gt;. That is not an accident - that is an engineering masterclass.&lt;/p&gt;

&lt;p&gt;They also came into the Final having broken a world record just days earlier. During the India vs England semi-final on March 5, JioHotstar recorded 65.2 million (6.52 crore) peak concurrent viewers - the highest concurrency ever achieved for a live event on any digital platform in the world. The Final obliterated even that number.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 1: The Foundation - Understanding JioHotstar's Architecture Origins
&lt;/h2&gt;

&lt;p&gt;To understand the engineering decisions, you first need to understand the company behind them.&lt;/p&gt;

&lt;p&gt;By late 2024, Reliance Industries (through Viacom18) and The Walt Disney Company had completed an &lt;strong&gt;$8.5 billion joint venture&lt;/strong&gt; called JioStar, combining Viacom18's media assets with Disney's Star India and Hotstar operations in India.&lt;/p&gt;

&lt;p&gt;This merger gave the DevOps teams something rare: two battle-hardened streaming backends to learn from. Disney+ Hotstar had years of cricket-at-scale experience, having served the 2023 ODI World Cup and multiple IPL seasons. JioCinema had built a 4K streaming pipeline and an aggressive CDN strategy. The 2026 World Cup was the first true test of whether the combined architecture could handle something neither platform had ever attempted alone.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 2: Pre-Tournament Planning - How You Prepare for an 82-Crore Spike
&lt;/h2&gt;

&lt;p&gt;In DevOps, you never wait for production to find your limits. JioHotstar's SRE teams began capacity planning months before the first ball was bowled.&lt;/p&gt;

&lt;h3&gt;
  
  
  Traffic Forecasting Using Historical Data
&lt;/h3&gt;

&lt;p&gt;The SRE teams forecast traffic using a predictive model trained on data from previous major streaming events: the 2024 T20 WC Final (5.3 crore), the 2023 ODI WC Final (5.9 crore), Asia Cup peaks, and IPL finals. Engineers built regression models accounting for factors like: is India playing? What stage of the tournament is it? What time of day? What are the network conditions across India's diverse geography?&lt;/p&gt;

&lt;p&gt;The key insight: the Final was always going to be the largest event, and the models needed to be revised upward after each knockout match. Once the semi-final had set a world record of 65.2 million concurrent viewers, the capacity plan for the Final had to be re-evaluated entirely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Load Testing at Scale - Project HULK
&lt;/h3&gt;

&lt;p&gt;JioHotstar created an in-house project called &lt;strong&gt;"Project HULK"&lt;/strong&gt; specifically to stress-test their platform before major events. The load generation infrastructure used &lt;code&gt;c5.9xlarge&lt;/code&gt; machines distributed across 8 different AWS regions to simultaneously hit the CDN, load balancers, and application layers.&lt;/p&gt;

&lt;p&gt;The reason for distributing across 8 regions is subtle but important: cloud providers share underlying physical infrastructure. A massive synthetic load originating from a single region could inadvertently impact other customers co-located on the same hardware. By spreading synthetic load across regions, you simulate a real-world distributed user base while being a responsible cloud tenant.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pre-warming: The Underrated Hero
&lt;/h3&gt;

&lt;p&gt;Every time a major cricket match was about to begin under the old architecture, the operations team had to manually pre-warm hundreds of load balancers. In the new architecture, this process was fully automated.&lt;/p&gt;

&lt;p&gt;But the discipline of pre-warming remained: before the Ricky Martin opening performance even started, JioHotstar's edge nodes, CDN caches, and application clusters were already scaled up and warm. Pre-warming CDN caches with the stream's initial HLS segments, spinning up Kubernetes node pools ahead of anticipated demand, pre-populating authentication session caches - all of this is part of the playbook.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;You don't wait for traffic to arrive. You meet it at the door.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
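
&lt;p&gt;The automation behind this pre-warming is not spelled out in detail, so the following is a generic sketch rather than JioHotstar's setup. It uses the &lt;code&gt;cron&lt;/code&gt; scaler from the open-source KEDA project, which holds a workload at a fixed replica count during a scheduled window. The Deployment name, time window, and replica counts are hypothetical placeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Generic pre-warming sketch using KEDA's cron scaler (not JioHotstar's config).
# "playback-api" and every number below are hypothetical placeholders.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: playback-api-prewarm
spec:
  scaleTargetRef:
    name: playback-api          # Deployment to scale
  minReplicaCount: 50           # floor outside the match window
  maxReplicaCount: 3000
  triggers:
    - type: cron
      metadata:
        timezone: Asia/Kolkata  # IANA time zone name
        start: 0 16 8 3 *       # 16:00 IST on 8 March, hours before the first ball
        end: 59 23 8 3 *        # 23:59 IST the same day
        desiredReplicas: "1500" # replicas held during the window
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;KEDA implements this through a standard Horizontal Pod Autoscaler, so the extra pods created at the start of the window are what prompt a node autoscaler such as Karpenter to add capacity before real viewers arrive.&lt;/p&gt;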




&lt;h2&gt;
  
  
  Part 3: The Kubernetes Architecture - DataCenter Abstraction
&lt;/h2&gt;

&lt;p&gt;This is the most significant architectural evolution in JioHotstar's history, and the one with the most lessons for any platform engineering team.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Old World
&lt;/h3&gt;

&lt;p&gt;Previously, Hotstar managed its workloads on two large, self-managed Kubernetes clusters built using KOPS (Kubernetes Operations), running 800+ microservices across them. Every microservice had its own AWS Application Load Balancer (ALB) using NodePort services.&lt;/p&gt;

&lt;p&gt;The request flow looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client → CDN → ALB → NodePort → kube-proxy → Pod
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problems were multiple. Hundreds of ALBs needed to be manually pre-warmed before every major match - an error-prone, time-consuming process. The old Cluster Autoscaler was too slow to release or consolidate nodes efficiently during off-peak periods. And scaling beyond 400 nodes simultaneously caused API server throttling - a hard ceiling on their peak capacity.&lt;/p&gt;

&lt;h3&gt;
  
  
  The New Model: DataCenter Abstraction
&lt;/h3&gt;

&lt;p&gt;The new model introduced a concept called &lt;strong&gt;DataCenter Abstraction&lt;/strong&gt;. A "data center" in this model doesn't refer to a physical building - it's a logical grouping of multiple Kubernetes clusters within a specific region. Together, these clusters behave like a single large compute unit, with each application team given a single logical namespace.&lt;/p&gt;

&lt;p&gt;What this means in practice for the World Cup Final:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JioHotstar could treat the many clusters at each of its AWS locations (Mumbai, Hyderabad, and Delhi) as a &lt;strong&gt;single logical pool&lt;/strong&gt; per location
&lt;/li&gt;
&lt;li&gt;A central &lt;strong&gt;Envoy proxy&lt;/strong&gt; replaced hundreds of individual ALBs, unifying traffic routing, authentication, and rate-limiting in one place&lt;/li&gt;
&lt;li&gt;Services moved from NodePort to &lt;strong&gt;ClusterIP + ALB Ingress&lt;/strong&gt;, eliminating hard port limits&lt;/li&gt;
&lt;li&gt;Developers deploy one YAML manifest per service; the platform handles failover and routing behind the scenes&lt;/li&gt;
&lt;/ul&gt;
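
&lt;p&gt;As a generic illustration of the ClusterIP + ALB Ingress pattern (not JioHotstar's manifests), here is how it looks with the AWS Load Balancer Controller. The &lt;code&gt;target-type: ip&lt;/code&gt; annotation registers pod IPs directly as ALB targets, so the Service can stay &lt;code&gt;ClusterIP&lt;/code&gt;; the controller's instance target mode is the one that needs NodePort. The &lt;code&gt;group.name&lt;/code&gt; annotation lets many Ingress resources share one ALB. Service names, paths, and the group name are hypothetical, and the IngressClass is assumed to be named &lt;code&gt;alb&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Generic ClusterIP + ALB Ingress sketch (AWS Load Balancer Controller).
# "playback-api", "/playback", and "shared-edge" are hypothetical names.
apiVersion: v1
kind: Service
metadata:
  name: playback-api
spec:
  type: ClusterIP              # no NodePort needed in IP target mode
  selector:
    app: playback-api
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: playback-api
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip           # register pod IPs as ALB targets
    alb.ingress.kubernetes.io/group.name: shared-edge   # Ingresses in one group share an ALB
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /playback
            pathType: Prefix
            backend:
              service:
                name: playback-api
                port:
                  number: 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;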

&lt;p&gt;They also migrated from self-managed KOPS clusters to &lt;strong&gt;Amazon EKS&lt;/strong&gt;, offloading Kubernetes control plane management to AWS. Combined with &lt;strong&gt;Karpenter&lt;/strong&gt;, which reacts to pending pods within seconds and launches right-sized EC2 instances directly, new capacity comes online far faster than it did with the old Cluster Autoscaler - critical when viewership goes from 44 crore to 74 crore in the final 4 overs of a chase.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Karpenter NodePool - simplified example&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;karpenter.sh/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NodePool&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;live-streaming-pool&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;requirements&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node.kubernetes.io/instance-type"&lt;/span&gt;
          &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;In&lt;/span&gt;
          &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;c6i.8xlarge"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;c6g.8xlarge"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;c5.9xlarge"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;karpenter.sh/capacity-type"&lt;/span&gt;
          &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;In&lt;/span&gt;
          &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;on-demand"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spot"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topology.kubernetes.io/zone"&lt;/span&gt;
          &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;In&lt;/span&gt;
          &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ap-south-1a"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ap-south-1b"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ap-south-1c"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;kubelet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;maxPods&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;110&lt;/span&gt;
  &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8000"&lt;/span&gt;
    &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;16Ti"&lt;/span&gt;
  &lt;span class="na"&gt;disruption&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;consolidationPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;WhenUnderutilized&lt;/span&gt;
    &lt;span class="na"&gt;expireAfter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;720h&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because &lt;code&gt;capacity-type&lt;/code&gt; allows both &lt;code&gt;on-demand&lt;/code&gt; and &lt;code&gt;spot&lt;/code&gt;, Karpenter can prioritize cheaper Spot instances and fall back to On-Demand when Spot capacity is unavailable. Keeping critical session services on On-Demand, while stateless, fault-tolerant workloads use Spot, additionally requires those workloads to select a capacity type through a node selector or affinity. The &lt;code&gt;consolidationPolicy: WhenUnderutilized&lt;/code&gt; setting lets Karpenter remove or replace underutilized nodes as demand falls, for example during the innings break, cutting cost in near real time.&lt;/p&gt;
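
&lt;p&gt;Allowing both capacity types in one NodePool does not, on its own, decide which workloads land on Spot. One common way to express that split is a &lt;code&gt;nodeSelector&lt;/code&gt; on the well-known &lt;code&gt;karpenter.sh/capacity-type&lt;/code&gt; label in the workload's pod template. The sketch below is illustrative only: &lt;code&gt;session-service&lt;/code&gt; and its image are hypothetical, not JioHotstar's actual services.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Hypothetical critical session service pinned to On-Demand capacity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: session-service
spec:
  selector:
    matchLabels:
      app: session-service
  template:
    metadata:
      labels:
        app: session-service
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: on-demand   # never scheduled onto Spot nodes
      containers:
        - name: session-service
          image: example.com/session-service:1.0   # placeholder image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Workloads that omit the selector can land on either capacity type, and Karpenter will prefer Spot for them when the NodePool allows it.&lt;/p&gt;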

&lt;h3&gt;
  
  
  IP Address Management - A Lesson from 2023
&lt;/h3&gt;

&lt;p&gt;A critical incident during the 2023 World Cup involved running out of IP addresses: the VPC CNI plugin's warm-pool settings (&lt;code&gt;WARM_IP_TARGET&lt;/code&gt; and &lt;code&gt;MINIMUM_IP_TARGET&lt;/code&gt;) were configured in a way that over-allocated IPs per node. For 2026, engineers used larger CIDR blocks (&lt;code&gt;/18&lt;/code&gt; instead of &lt;code&gt;/20&lt;/code&gt;, which means 16,384 addresses instead of 4,096) and fine-tuned these settings, allowing clusters to scale beyond 400 nodes without hitting IP exhaustion.&lt;/p&gt;
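
&lt;p&gt;As a generic illustration of that tuning, the Amazon VPC CNI reads these settings as environment variables on the &lt;code&gt;aws-node&lt;/code&gt; DaemonSet in &lt;code&gt;kube-system&lt;/code&gt;. The values below are placeholders, not the ones JioHotstar used; the right numbers depend on pod density and churn per node.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Strategic merge patch for the aws-node DaemonSet (placeholder values).
# WARM_IP_TARGET: free IPs each node keeps ready for new pods.
# MINIMUM_IP_TARGET: floor of IPs allocated per node, sized near typical pod count.
spec:
  template:
    spec:
      containers:
        - name: aws-node
          env:
            - name: WARM_IP_TARGET
              value: "5"
            - name: MINIMUM_IP_TARGET
              value: "30"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This could be applied with &lt;code&gt;kubectl -n kube-system patch daemonset aws-node --patch-file cni-env.yaml&lt;/code&gt; (the file name is hypothetical). Where the VPC CNI runs as an EKS managed add-on, the same variables are usually set through the add-on's configuration values instead, so an add-on upgrade does not overwrite them.&lt;/p&gt;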




&lt;h2&gt;
  
  
  Part 4: Infrastructure Scaling - Eliminating Every Bottleneck
&lt;/h2&gt;

&lt;p&gt;Kubernetes architecture is only part of the picture. The network infrastructure underneath also needed surgery.&lt;/p&gt;

&lt;h3&gt;
  
  
  NAT Gateway Scaling
&lt;/h3&gt;

&lt;p&gt;Monitoring with VPC Flow Logs revealed a frightening discovery during a pre-tournament load test: a single Kubernetes cluster was consuming &lt;strong&gt;50% of its NAT Gateway throughput at just 10% of expected peak load&lt;/strong&gt;. At full Final traffic, this would have been a catastrophic bottleneck.&lt;/p&gt;

&lt;p&gt;The fix: scale out from one NAT Gateway per Availability Zone to &lt;strong&gt;one NAT Gateway per subnet&lt;/strong&gt;. This distributed the external traffic load evenly and eliminated the pressure point entirely.&lt;/p&gt;
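
&lt;p&gt;In CloudFormation terms, "one NAT Gateway per subnet" roughly means each private subnet gets its own route table whose default route points at its own NAT Gateway. Below is a sketch for a single private subnet, assuming &lt;code&gt;Vpc&lt;/code&gt;, &lt;code&gt;PublicSubnetA1&lt;/code&gt;, and &lt;code&gt;PrivateSubnetA1&lt;/code&gt; are defined elsewhere in the same template; all logical IDs are hypothetical, and the block would be repeated for each private subnet.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;Resources:
  # Elastic IP for the NAT Gateway dedicated to PrivateSubnetA1
  NatEipA1:
    Type: AWS::EC2::EIP
    Properties:
      Domain: vpc
  # The NAT Gateway itself lives in a public subnet
  NatGatewayA1:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NatEipA1.AllocationId
      SubnetId: !Ref PublicSubnetA1
  # Route table used only by PrivateSubnetA1
  PrivateRouteTableA1:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref Vpc
  # Default route sends this subnet's egress through its own NAT Gateway
  PrivateDefaultRouteA1:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTableA1
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGatewayA1
  PrivateSubnetA1RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnetA1
      RouteTableId: !Ref PrivateRouteTableA1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The trade-off is cost: every NAT Gateway carries its own hourly charge, so this pattern buys egress headroom by multiplying a fixed fee.&lt;/p&gt;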

&lt;h3&gt;
  
  
  Worker Node Network Optimization
&lt;/h3&gt;

&lt;p&gt;Load tests showed that internal API Gateway pods were consuming 8–9 Gbps of network bandwidth on individual nodes, causing severe contention with other services.&lt;/p&gt;

&lt;p&gt;Two fixes were implemented in parallel:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deploy high-throughput nodes with a minimum capacity of &lt;strong&gt;10 Gbps&lt;/strong&gt; for API Gateway workloads&lt;/li&gt;
&lt;li&gt;Use Kubernetes topology spread constraints to distribute API Gateway pods evenly, aiming for one pod per node
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Topology spread constraint for API Gateway pods&lt;/span&gt;
&lt;span class="na"&gt;topologySpreadConstraints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;maxSkew&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;topologyKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kubernetes.io/hostname&lt;/span&gt;
    &lt;span class="na"&gt;whenUnsatisfiable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DoNotSchedule&lt;/span&gt;
    &lt;span class="na"&gt;labelSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-gateway&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



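&lt;p&gt;This constraint keeps the number of API Gateway pods on any two eligible nodes within one of each other, so they cannot pile up on a few hosts; as long as there are no more replicas than nodes, that works out to at most one pod per node. The result: throughput stabilized at &lt;strong&gt;2–3 Gbps per node&lt;/strong&gt; even at peak, rather than saturating at 8–9 Gbps on a few overloaded nodes.&lt;/p&gt;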
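
&lt;p&gt;If a strict one-pod-per-node guarantee is the goal, the usual Kubernetes tool is a required pod anti-affinity rule on &lt;code&gt;kubernetes.io/hostname&lt;/code&gt;. The pod-spec fragment below combines that with a node affinity on Karpenter's &lt;code&gt;karpenter.k8s.aws/instance-network-bandwidth&lt;/code&gt; label (expressed in megabits per second) to keep API Gateway pods on 10 Gbps-class instances. It assumes nodes are launched by Karpenter on AWS, which applies that label; the threshold is illustrative, so check the label values on your own nodes.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Pod spec fragment for API Gateway pods (illustrative).
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            # Only nodes whose advertised bandwidth exceeds 9,999 Mbps
            - key: karpenter.k8s.aws/instance-network-bandwidth
              operator: Gt
              values: ["9999"]
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      # Never co-locate two api-gateway pods on the same node
      - labelSelector:
          matchLabels:
            app: api-gateway
        topologyKey: kubernetes.io/hostname
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The strict rule has a cost: when there are more replicas than eligible nodes, the extra pods stay Pending until new nodes arrive, which is exactly the signal that prompts an autoscaler like Karpenter to add capacity.&lt;/p&gt;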




&lt;h2&gt;
  
  
  Part 5: The Video Pipeline - From Camera to 82 Crore Phones in Under 5 Seconds
&lt;/h2&gt;

&lt;p&gt;Most people think of streaming as "just sending video." For a live match at this scale, it is an intricate real-time data pipeline with multiple stages, each working within a tight latency budget so that the whole chain stays under five seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 1 - Ingestion: Getting the Feed from the Ground
&lt;/h3&gt;

&lt;p&gt;At the Narendra Modi Stadium, production crews captured the match with multiple HD and 4K cameras. The raw feed traveled over dedicated broadcast fiber links using the &lt;strong&gt;SRT (Secure Reliable Transport)&lt;/strong&gt; protocol. SRT runs over UDP and retransmits lost packets within a fixed latency window, giving roughly 20% better packet-loss recovery than the older, TCP-based RTMP protocol - critical given India's network variability.&lt;/p&gt;
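
&lt;p&gt;SRT's loss recovery rests on ARQ: the receiver tracks packet sequence numbers, reports gaps back to the sender with NAKs, and holds packets in a latency buffer so that retransmissions can still arrive before playout. The toy Python model below shows just those two ideas; it is not SRT's wire format, and the function names and millisecond values are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy model of NAK-based ARQ, the loss-recovery idea behind SRT.
# Not SRT's packet format; names and numbers are illustrative only.


def nak_list(received: set[int], highest_seq: int) -> list[int]:
    """Sequence numbers up to highest_seq that never arrived (what a receiver would NAK)."""
    return [seq for seq in range(highest_seq + 1) if seq not in received]


def retransmit_in_time(detect_ms: float, rtt_ms: float, latency_ms: float) -> bool:
    """A retransmission is useful only if it lands before the packet's playout deadline.

    detect_ms:  time after the original send at which the receiver noticed the gap
    rtt_ms:     round trip for the NAK plus the resent packet
    latency_ms: how long the receiver buffers each packet before playout
    """
    return latency_ms >= detect_ms + rtt_ms


received = {0, 1, 2, 4, 5, 7}
print(nak_list(received, highest_seq=7))                   # [3, 6]
print(retransmit_in_time(20, rtt_ms=40, latency_ms=120))   # True: recovered
print(retransmit_in_time(20, rtt_ms=150, latency_ms=120))  # False: too late, dropped
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The trade-off sits in the second function: a larger latency buffer recovers more losses on long or lossy paths, at the cost of added end-to-end delay.&lt;/p&gt;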

&lt;h3&gt;
  
  
  Stage 2 - Transcoding: One Feed, 100 Million Devices
&lt;/h3&gt;

&lt;p&gt;Raw feeds hit &lt;strong&gt;AWS Elemental MediaLive&lt;/strong&gt; on &lt;code&gt;p4d.24xlarge&lt;/code&gt; GPU instances, transcoding multiple adaptive renditions in under 2 seconds. A single 4K broadcast feed is simultaneously converted into:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Profile&lt;/th&gt;
&lt;th&gt;Target Audience&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;360p&lt;/td&gt;
&lt;td&gt;2G/3G users in rural India&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;480p&lt;/td&gt;
&lt;td&gt;Moderate connections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;720p&lt;/td&gt;
&lt;td&gt;Standard HD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1080p&lt;/td&gt;
&lt;td&gt;Good broadband&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4K HDR&lt;/td&gt;
&lt;td&gt;Premium fiber/5G subscribers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 2026 World Cup featured &lt;strong&gt;true 4K HDR&lt;/strong&gt; streaming (not upscaled 1080p) at genuinely high bitrates. Every rendition was generated in real time and in parallel, with sub-2-second latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 3 - Packaging: HLS, DASH, and DRM
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AWS Elemental MediaPackage&lt;/strong&gt; segments the outputs into HLS/DASH chunks at over 100,000 chunks per second, encrypts them for Widevine and PlayReady DRM, and dynamically adds captions and regional subtitles. MediaPackage packages just in time, which eliminates the need to pre-generate format-specific segments for every device type.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 4 - Storage and Delivery
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Amazon S3 Intelligent-Tiering&lt;/strong&gt; stores the HLS/DASH chunks redundantly across multiple Availability Zones, and &lt;strong&gt;CloudFront&lt;/strong&gt; delivers them through 300+ edge locations worldwide. Live segments are requested billions of times in their first few seconds (largely absorbed by CloudFront caches) and then almost never again. Intelligent-Tiering suits that long tail: objects not accessed for 30 consecutive days move automatically to a cheaper access tier, reducing storage costs without manual intervention.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 6: The CDN Layer - The True Workhorse of 82 Crore Streams
&lt;/h2&gt;

&lt;p&gt;If the video pipeline is the heart, the CDN is the circulatory system. No single origin server can serve 82 crore simultaneous streams.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-CDN Strategy
&lt;/h3&gt;

&lt;p&gt;JioHotstar employs a &lt;strong&gt;multi-CDN strategy&lt;/strong&gt; with an in-house CDN load optimizer that dynamically chooses between Akamai, CloudFront, and others, always routing viewers through the least congested path. If one CDN faces an issue, another picks up the slack - completely transparent to the viewer.&lt;/p&gt;
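
&lt;p&gt;A minimal sketch of that kind of selector, assuming each CDN reports a health flag, error rate, p95 latency, and utilization: drop unhealthy CDNs, score the rest, and route to the lowest score. The &lt;code&gt;CdnStatus&lt;/code&gt; fields, the scoring weights, and the third provider name are hypothetical, not JioHotstar's actual optimizer.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass
class CdnStatus:
    name: str
    healthy: bool          # e.g. from synthetic probes of a test segment
    error_rate: float      # fraction of failed segment requests, 0.0 to 1.0
    p95_latency_ms: float  # time to first byte seen by real clients
    utilization: float     # fraction of committed capacity in use, 0.0 to 1.0


def congestion_score(s: CdnStatus) -> float:
    """Lower is better. Hypothetical weighting: latency inflated by load, plus an error penalty."""
    return s.p95_latency_ms * (1 + s.utilization) + 1000 * s.error_rate


def pick_cdn(statuses: list[CdnStatus]) -> str:
    healthy = [s for s in statuses if s.healthy]
    if not healthy:
        raise RuntimeError("no healthy CDN available")
    return min(healthy, key=congestion_score).name


fleet = [
    CdnStatus("Akamai", True, 0.002, 80.0, 0.70),
    CdnStatus("CloudFront", True, 0.001, 70.0, 0.90),
    CdnStatus("ExampleCDN", False, 0.050, 60.0, 0.20),  # hypothetical provider failing probes
]
print(pick_cdn(fleet))  # CloudFront (score 134 vs Akamai 138)

fleet[1].healthy = False  # CloudFront incident: traffic shifts automatically
print(pick_cdn(fleet))  # Akamai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One reasonable design re-evaluates this choice periodically per region, so a failover simply shows up as the next segment request going to a different hostname.&lt;/p&gt;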

&lt;h3&gt;
  
  
  Traffic Segregation
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Traffic Type&lt;/th&gt;
&lt;th&gt;Routing Strategy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cacheable (scorecards, stats, highlights)&lt;/td&gt;
&lt;td&gt;Dedicated CDN domain, aggressive cache TTLs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Non-cacheable (sessions, personalization)&lt;/td&gt;
&lt;td&gt;Separate routing path, correctness-first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Non-video (images, metadata)&lt;/td&gt;
&lt;td&gt;Cost-efficient CDN providers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This segregation preserves high-performance CDN capacity specifically for video segment delivery.&lt;/p&gt;
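
&lt;p&gt;One way to express this segregation is a prefix-based routing table that gives each request class its own hostname and caching policy. The Python sketch below assumes hypothetical hostnames, path prefixes, and &lt;code&gt;Cache-Control&lt;/code&gt; values; in production the same rules would more likely live in CDN or edge configuration than in application code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical hostnames, prefixes, and TTLs. First matching prefix wins.
ROUTES: list[tuple[str, str, str]] = [
    # (path prefix, hostname, Cache-Control)
    ("/live/segments/", "video.cdn.example.com", "public, max-age=4"),
    ("/api/scorecard", "static.cdn.example.com", "public, max-age=5"),
    ("/highlights/", "static.cdn.example.com", "public, max-age=300"),
    ("/images/", "img.cdn.example.com", "public, max-age=86400"),
    ("/api/session", "api.example.com", "private, no-store"),
]

# Anything unmatched is treated as non-cacheable: correctness first.
DEFAULT_ROUTE = ("api.example.com", "private, no-store")


def route(path: str) -> tuple[str, str]:
    """Return (hostname, Cache-Control) for a request path."""
    for prefix, host, cache_control in ROUTES:
        if path.startswith(prefix):
            return host, cache_control
    return DEFAULT_ROUTE


print(route("/live/segments/1080p/000123.m4s"))  # ('video.cdn.example.com', 'public, max-age=4')
print(route("/api/session/refresh"))             # ('api.example.com', 'private, no-store')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;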

&lt;h3&gt;
  
  
  The Jio Network Advantage: A Moat No Competitor Can Copy
&lt;/h3&gt;

&lt;p&gt;JioHotstar is part of a company that also &lt;strong&gt;owns the physical network&lt;/strong&gt; delivering the stream. Jio's 5G network works with Jio's own &lt;strong&gt;Mobile Edge Computing (MEC)&lt;/strong&gt; servers, placing compute resources physically inside the telecom network - at the base station layer - rather than in a distant cloud data center.&lt;/p&gt;

&lt;p&gt;For Jio's 500 million+ subscribers, the World Cup Final could be served from their own carrier's edge - a fundamentally different, shorter, and faster delivery path than any competitor can offer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 7: Microservices at Scale - 800+ Services Serving One Match
&lt;/h2&gt;

&lt;p&gt;The microservices architecture means video playback, authentication, personalization, live chat, multilingual commentary routing, payment processing, and analytics are all independent services. This isolation is critical: if the live emoji reaction feature crashes during Bumrah's 4th wicket, &lt;strong&gt;it should crash without affecting the video stream&lt;/strong&gt;.&lt;/p&gt;
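
&lt;p&gt;Circuit breakers, which the takeaways below list among the safety nets, are one common way to enforce this isolation at call sites: after repeated failures, callers stop waiting on the broken dependency and serve a fallback instead. Below is a minimal Python sketch with hypothetical thresholds and names, not JioHotstar's implementation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time


class CircuitBreaker:
    """Closed: calls pass through. Open: fail fast with the fallback.
    Half-open: once reset_after_s has elapsed, let one trial call through."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.reset_after_s > self.clock() - self.opened_at:
                return fallback()  # still open: do not touch the failing dependency
            # reset window elapsed: half-open, fall through and try once
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open the breaker
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the breaker
        return result


# Usage: a failing reactions service must never block the player.
def fetch_reactions():
    raise ConnectionError("reactions service unavailable")  # simulated outage


breaker = CircuitBreaker(failure_threshold=2, reset_after_s=10.0)
for _ in range(3):
    print(breaker.call(fetch_reactions, fallback=lambda: []))  # [] each time; 3rd call skips the service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;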

&lt;h3&gt;
  
  
  Feature Flags: The Safety Net
&lt;/h3&gt;

&lt;p&gt;Feature flags allow gradual rollout and instant kill-switches without any deployment. In a worst-case scenario - say, a memory leak in the live chat microservice - engineers flip a single flag to disable chat for all users, immediately reducing load without any restart or deployment.&lt;/p&gt;
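
&lt;p&gt;A minimal Python sketch of both uses, assuming flags are held in an in-memory dictionary (a real deployment would fetch them from a flag service or config store at runtime); the flag names are hypothetical. Hashing the user ID gives each viewer a stable bucket, so a 25% rollout keeps the same viewers enabled across requests, while setting &lt;code&gt;enabled&lt;/code&gt; to &lt;code&gt;False&lt;/code&gt; acts as the kill switch.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib

# Hypothetical flag store; in production this would be refreshed from a flag service.
FLAGS = {
    "live_chat": {"enabled": True, "rollout_percent": 100},
    "hype_mode": {"enabled": True, "rollout_percent": 25},
}


def bucket(flag: str, user_id: str) -> int:
    """Stable bucket 0-99 per (flag, user), so rollouts don't flip between requests."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) % 100


def is_enabled(flag: str, user_id: str) -> bool:
    cfg = FLAGS.get(flag)
    if cfg is None or not cfg["enabled"]:  # unknown flag or kill switch thrown: off for everyone
        return False
    return cfg["rollout_percent"] > bucket(flag, user_id)


print(is_enabled("live_chat", "viewer-42"))  # True (100% rollout)

# Incident: memory leak in live chat. Flip one value, no deploy, no restart.
FLAGS["live_chat"]["enabled"] = False
print(is_enabled("live_chat", "viewer-42"))  # False for every viewer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;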

&lt;h3&gt;
  
  
  The Kafka and Flink Real-Time Pipeline
&lt;/h3&gt;

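&lt;p&gt;Every viewer generates continuous telemetry events. At 82 crore concurrent viewers, even one event per viewer every few seconds means hundreds of millions of messages per second, and richer telemetry pushes that into the &lt;strong&gt;billions of messages per second&lt;/strong&gt;.&lt;/p&gt;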

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Apache Kafka&lt;/strong&gt; - distributed, fault-tolerant message queue absorbing event bursts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache Flink&lt;/strong&gt; - real-time processing for dashboards, anomaly detection, and adaptive algorithms&lt;/li&gt;
&lt;/ul&gt;
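
&lt;p&gt;The core Flink pattern behind live dashboards is a keyed tumbling window: group events by key and count them per fixed interval. The plain-Python sketch below shows only that aggregation step over an in-memory list; the event tuple layout, region codes, and 10-second window are assumptions, and it ignores the out-of-order events and watermarks that Flink handles in a real stream.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter

# Hypothetical telemetry events: (timestamp_seconds, region, event_type)
Event = tuple[float, str, str]


def tumbling_window_counts(events: list[Event], window_s: int = 10) -> Counter:
    """Count events per (window_start, region, event_type), like a keyed tumbling window."""
    counts: Counter = Counter()
    for ts, region, event_type in events:
        window_start = int(ts // window_s) * window_s  # each event falls in exactly one window
        counts[(window_start, region, event_type)] += 1
    return counts


events = [
    (0.5, "MH", "rebuffer"),
    (3.2, "MH", "rebuffer"),
    (9.9, "UP", "play"),
    (10.1, "MH", "rebuffer"),
]
for key, n in sorted(tumbling_window_counts(events).items()):
    print(key, n)
# (0, 'MH', 'rebuffer') 2
# (0, 'UP', 'play') 1
# (10, 'MH', 'rebuffer') 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;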




&lt;h2&gt;
  
  
  Part 8: Observability - The SRE War Room During the Final
&lt;/h2&gt;

&lt;p&gt;The monitoring stack ran three layers simultaneously:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AWS CloudWatch&lt;/td&gt;
&lt;td&gt;Infrastructure metrics (EC2 CPU, RDS connections, NAT throughput)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prometheus&lt;/td&gt;
&lt;td&gt;Application-level and custom business metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grafana&lt;/td&gt;
&lt;td&gt;Real-time visualization - latency, throughput, rebuffer trends&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The single most important metric: &lt;strong&gt;rebuffer rate&lt;/strong&gt;, a proxy for the percentage of viewers experiencing playback interruption, computed here as rebuffer events relative to play time over a five-minute window:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Prometheus alert rule for rebuffer rate&lt;/span&gt;
&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;rate&lt;span class="o"&gt;(&lt;/span&gt;media_rebuffer_events[5m]&lt;span class="o"&gt;))&lt;/span&gt; / &lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;rate&lt;span class="o"&gt;(&lt;/span&gt;media_play_time[5m]&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 0.004
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At 82 crore viewers, 0.4% means roughly &lt;strong&gt;32.8 lakh (3.28 million) people buffering simultaneously&lt;/strong&gt; - an unacceptable outcome. Every metric had an automated alert. Every alert had a documented runbook. Every runbook had been practiced.&lt;/p&gt;
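
&lt;p&gt;The alert arithmetic can be checked with a short Python sketch. It mirrors the shape of the PromQL expression above using two counter samples taken five minutes apart; the sample values are made up, and unlike the real &lt;code&gt;rate()&lt;/code&gt; it ignores counter resets and window-edge extrapolation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;WINDOW_S = 300.0   # the [5m] range in the PromQL expression
THRESHOLD = 0.004  # 0.4%


def per_second_rate(start: float, end: float, window_s: float = WINDOW_S) -> float:
    """Average per-second increase of a monotonically increasing counter."""
    return (end - start) / window_s


def rebuffer_ratio(rebuf_start, rebuf_end, play_start, play_end) -> float:
    """Same shape as sum(rate(media_rebuffer_events[5m])) / sum(rate(media_play_time[5m]))."""
    return per_second_rate(rebuf_start, rebuf_end) / per_second_rate(play_start, play_end)


# Made-up samples: 500 rebuffer events against 100,000 units of play time in 5 minutes.
ratio = rebuffer_ratio(1_000, 1_500, 2_000_000, 2_100_000)
print(round(ratio, 6), ratio > THRESHOLD)  # 0.005 True, so the alert fires

# Scale check: 0.4% of 82 crore viewers, in integer arithmetic to avoid float rounding.
viewers = 82 * 10**7
print(viewers * 4 // 1000)  # 3280000, i.e. 32.8 lakh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;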

&lt;h3&gt;
  
  
  Chaos Engineering: Breaking Things Before Match Day
&lt;/h3&gt;

&lt;p&gt;Before major events, JioHotstar's teams ran chaos drills at 2 AM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deliberately killing an entire Availability Zone&lt;/li&gt;
&lt;li&gt;Simulating a CDN provider outage&lt;/li&gt;
&lt;li&gt;Injecting latency into the authentication service&lt;/li&gt;
&lt;li&gt;Validating automated failover and recovery&lt;/li&gt;
&lt;/ul&gt;
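
&lt;p&gt;The latency-injection drill can be approximated in-process with a decorator that delays a fraction of calls. This is a simplified Python stand-in for infrastructure-level fault injection; &lt;code&gt;inject_latency&lt;/code&gt; and &lt;code&gt;verify_token&lt;/code&gt; are hypothetical names, and the &lt;code&gt;rng&lt;/code&gt; and &lt;code&gt;sleep&lt;/code&gt; parameters exist so a drill, or a test, can control randomness and time.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import functools
import random
import time


def inject_latency(delay_s: float, probability: float, rng=random.random, sleep=time.sleep):
    """Decorator: before each call, sleep delay_s with the given probability (0.0 to 1.0)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if probability > rng():  # random.random() returns a float in [0.0, 1.0)
                sleep(delay_s)
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@inject_latency(delay_s=0.2, probability=0.1)  # delay roughly 10% of auth calls by 200 ms
def verify_token(token: str) -> bool:
    return token.startswith("valid-")  # hypothetical stand-in for the real auth check


print(verify_token("valid-abc"))  # True, sometimes after a 200 ms delay
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;What the drill really tests is the callers: whether the playback path has timeouts and fallbacks that keep the stream running while authentication is slow.&lt;/p&gt;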

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Good SRE teams don't wait for production failures - they engineer them deliberately.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Part 9: Caching Strategy - Keeping 82 Crore Sessions Alive
&lt;/h2&gt;

&lt;p&gt;Keeping that many sessions alive without overwhelming the databases takes an aggressive multi-layer caching hierarchy:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1 - CDN Edge Cache&lt;/strong&gt;&lt;br&gt;
Video segments are cached at the CDN edge. When a segment is served from a CloudFront edge PoP, JioHotstar's origin never sees that request at all. This is the most important cache hit in the entire system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2 - Application-Level Redis Cache&lt;/strong&gt;&lt;br&gt;
User session tokens and subscription entitlements are cached in Redis clusters. A subscription is verified once at playback start and cached for the duration of the match, so subsequent requests bypass the database entirely.&lt;/p&gt;
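
&lt;p&gt;The Layer 2 pattern is cache-aside: check the cache, fall back to the database on a miss, then store the result with a TTL. The Python sketch below uses a small in-memory &lt;code&gt;TTLCache&lt;/code&gt; as a stand-in for Redis so it runs anywhere; the key format, the four-hour TTL (roughly one match), and &lt;code&gt;load_from_db&lt;/code&gt; are assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time


class TTLCache:
    """In-memory stand-in for Redis GET / SET with an expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:  # expired: behave like a miss
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_s):
        self._data[key] = (value, self._clock() + ttl_s)


MATCH_TTL_S = 4 * 3600  # assumption: cache entitlements for roughly one match


def get_entitlement(user_id, cache, load_from_db, ttl_s=MATCH_TTL_S):
    """Cache-aside: only the first request per user (per TTL) reaches the database."""
    key = f"entitlement:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)
    cache.set(key, value, ttl_s)
    return value


db_queries = []


def load_from_db(user_id):  # hypothetical database lookup
    db_queries.append(user_id)
    return {"plan": "premium", "max_quality": "4K"}


cache = TTLCache()
for _ in range(1000):  # e.g. repeated entitlement checks during one match
    get_entitlement("viewer-42", cache, load_from_db)
print(len(db_queries))  # 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;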

&lt;p&gt;&lt;strong&gt;Layer 3 - Database Read Replicas&lt;/strong&gt;&lt;br&gt;
Multiple read replicas spread across AZs serve preferences and recommendation data. Write traffic goes only to the primary.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A well-designed caching layer means 82 crore viewers might generate fewer database queries than 5 lakh viewers on a poorly designed system.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Part 10: Adaptive Bitrate and AI Optimization - Client Intelligence at Scale
&lt;/h2&gt;

&lt;p&gt;The ABR player runs entirely on the client, constantly measuring download speed, buffer health, and network latency. Server-side decisions would be catastrophic at this scale: for 82 crore simultaneous viewers, even 1 ms of server-side computation per quality decision adds up to &lt;strong&gt;820,000 CPU-seconds (about 228 CPU-hours) per decision cycle&lt;/strong&gt;.&lt;/p&gt;
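
&lt;p&gt;A minimal sketch of a throughput-based ABR decision, the kind of logic that runs on each device. The ladder reuses the renditions from the transcoding table, but the bitrates, the 0.8 safety factor, and the low-buffer rule are hypothetical; production players (and the AI-driven optimization described below) use far richer models.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Renditions from the transcoding table; bitrates (kbps) are hypothetical.
LADDER = [
    ("360p", 800),
    ("480p", 1_400),
    ("720p", 3_000),
    ("1080p", 6_000),
    ("4K HDR", 16_000),
]


def choose_rendition(throughput_kbps: float, buffer_s: float,
                     safety: float = 0.8, low_buffer_s: float = 10.0) -> str:
    """Highest rendition whose bitrate fits a safety margin of measured throughput.

    When the playback buffer is low, halve the margin to avoid a stall.
    """
    factor = safety if buffer_s >= low_buffer_s else safety / 2
    budget_kbps = throughput_kbps * factor
    choice = LADDER[0][0]  # never go below the lowest rung
    for name, kbps in LADDER:  # ladder is sorted ascending by bitrate
        if budget_kbps >= kbps:
            choice = name
    return choice


print(choose_rendition(5_000, buffer_s=20))   # 720p   (budget 4,000 kbps)
print(choose_rendition(5_000, buffer_s=4))    # 480p   (budget 2,000 kbps)
print(choose_rendition(25_000, buffer_s=30))  # 4K HDR (budget 20,000 kbps)
print(choose_rendition(500, buffer_s=2))      # 360p   (lowest-rung fallback)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;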

&lt;p&gt;JioHotstar's &lt;strong&gt;AI-powered bitrate optimization&lt;/strong&gt; achieves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;25% average bitrate reduction without compromising perceived quality&lt;/li&gt;
&lt;li&gt;12% more watch time due to reduced buffering&lt;/li&gt;
&lt;li&gt;Proactive network condition prediction before rebuffering begins&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Part 11: Cost Architecture - 15x Scale Without 15x the Bill
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost per 1M viewers&lt;/td&gt;
&lt;td&gt;~$0.87–$0.92&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Budget variance&lt;/td&gt;
&lt;td&gt;~22% under budget&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spot instance discount&lt;/td&gt;
&lt;td&gt;Up to 90% vs On-Demand&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Spot Instances&lt;/strong&gt; were used for all stateless, fault-tolerant workloads: transcoding workers, telemetry processors, recommendation engines. Session-critical services ran on On-Demand or Reserved capacity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Karpenter's bin-packing and consolidation&lt;/strong&gt; continuously released underutilized nodes between matches, shrinking the cluster back toward its baseline and cutting compute costs sharply between sessions.&lt;/p&gt;
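
&lt;p&gt;Consolidation is essentially a bin-packing problem: fit the pods' resource requests onto as few nodes as possible, then drain and remove the rest. Karpenter's actual logic is its own scheduling simulation, so the first-fit-decreasing heuristic below is a simplified stand-in that only considers CPU; the pod requests and node size are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def first_fit_decreasing(requests_mcpu: list[int], node_capacity_mcpu: int) -> list[list[int]]:
    """Pack CPU requests (millicores) onto nodes; returns the requests placed on each node."""
    nodes: list[list[int]] = []
    free: list[int] = []
    for req in sorted(requests_mcpu, reverse=True):  # place the biggest pods first
        if req > node_capacity_mcpu:
            raise ValueError(f"request {req}m does not fit on any node")
        for i, capacity_left in enumerate(free):
            if capacity_left >= req:  # first node with room wins
                nodes[i].append(req)
                free[i] -= req
                break
        else:  # no existing node had room: provision a new one
            nodes.append([req])
            free.append(node_capacity_mcpu - req)
    return nodes


# Hypothetical between-match load: 10 pods at 1.5 vCPU and 5 at 0.5 vCPU, 4-vCPU nodes.
pods = [1500] * 10 + [500] * 5
packed = first_fit_decreasing(pods, node_capacity_mcpu=4000)
print(len(packed))  # 5; any running node beyond these is a consolidation candidate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;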




&lt;h2&gt;
  
  
  Part 12: Multi-Language, Multi-Format - Serving Every Indian
&lt;/h2&gt;

&lt;p&gt;India is not one market. It is 22 official languages, hundreds of dialects, and a spectrum from 2G feature phones in rural UP to 5G flagship devices in Bangalore.&lt;/p&gt;

&lt;p&gt;Commentary was available in Hindi, English, Tamil, Telugu, Kannada, Malayalam, Bengali, Marathi, and more - each a &lt;strong&gt;separate audio track dynamically stitched&lt;/strong&gt; into the HLS manifest at request time based on viewer preference.&lt;/p&gt;
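
&lt;p&gt;In HLS, alternate commentary tracks are typically declared as &lt;code&gt;EXT-X-MEDIA&lt;/code&gt; audio renditions that the video variants reference through a shared &lt;code&gt;GROUP-ID&lt;/code&gt;. The Python sketch below builds such a multivariant playlist at request time and marks the viewer's preferred language as &lt;code&gt;DEFAULT&lt;/code&gt;; the URIs, bitrates, and group name are hypothetical, and &lt;code&gt;CODECS&lt;/code&gt; attributes are omitted for brevity.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical languages, variants, and URIs.
LANGUAGES = [("hi", "Hindi"), ("en", "English"), ("ta", "Tamil"), ("te", "Telugu")]
VARIANTS = [(800_000, "640x360", "360p"), (3_000_000, "1280x720", "720p")]


def master_playlist(preferred_lang: str) -> str:
    """Multivariant playlist with one audio rendition per commentary language."""
    lines = ["#EXTM3U"]
    for code, name in LANGUAGES:
        default = "YES" if code == preferred_lang else "NO"  # at most one DEFAULT=YES per group
        lines.append(
            f'#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="commentary",LANGUAGE="{code}",'
            f'NAME="{name}",DEFAULT={default},AUTOSELECT=YES,URI="audio/{code}/index.m3u8"'
        )
    for bandwidth, resolution, label in VARIANTS:
        # Every video variant points at the same audio group, so any language pairs with any quality.
        lines.append(f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution},AUDIO="commentary"')
        lines.append(f"video/{label}/index.m3u8")
    return "\n".join(lines) + "\n"


print(master_playlist("ta"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;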

&lt;p&gt;JioHotstar simultaneously ran four distinct product experiences from the same underlying stream:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard player&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hype Mode&lt;/strong&gt; (vertical video with real-time stat overlays)&lt;/li&gt;
&lt;li&gt;Multi-cam view&lt;/li&gt;
&lt;li&gt;Highlights scrubber&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The platform also used low-latency &lt;strong&gt;CMAF (Common Media Application Format)&lt;/strong&gt; chunked delivery at massive scale, keeping end-to-end delay to only a few seconds. That matters when millions of viewers are watching simultaneously and the cheering from nearby screens bleeds through their windows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 13: Graceful Degradation - Planning for What You Don't Plan For
&lt;/h2&gt;

&lt;p&gt;In the event of unexpected traffic spikes beyond provisioned capacity, instead of showing a blank screen or error, the system pre-caches and serves &lt;strong&gt;static still images&lt;/strong&gt; (scoreboard, static broadcast frame) as a temporary placeholder while the video pipeline catches up.&lt;/p&gt;

&lt;p&gt;The engineering philosophy is clear:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Protect the stream above everything else.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
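
&lt;p&gt;That philosophy can be expressed as a degradation ladder keyed on load: shed optional features first, then cap quality, and fall back to the static still image only as a last resort. The Python sketch below is illustrative; the load ratio (demand divided by provisioned capacity), the thresholds, and the action names are assumptions, not JioHotstar's actual policy.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical degradation tiers, checked from most to least severe.
# load_ratio = current demand / provisioned capacity
DEGRADATION_TIERS = [
    (1.20, ["disable_chat", "disable_reactions", "serve_static_image"]),  # last resort
    (1.00, ["disable_chat", "disable_reactions", "cap_bitrate_720p"]),
    (0.85, ["disable_chat", "disable_reactions"]),  # shed extras, keep full-quality video
]


def degradation_actions(load_ratio: float) -> list[str]:
    """Return the actions for the most severe tier whose threshold the load has reached."""
    for threshold, actions in DEGRADATION_TIERS:
        if load_ratio >= threshold:
            return actions
    return []  # normal operation: everything on


for load in (0.6, 0.9, 1.05, 1.3):
    print(load, degradation_actions(load))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;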




&lt;h2&gt;
  
  
  Key Takeaways for DevOps and SRE Engineers
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Automate pre-warming and scale playbooks.&lt;/strong&gt;&lt;br&gt;
At 82 crore scale, there is no time for human intervention in the scaling loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Data-driven capacity planning beats gut feel every time.&lt;/strong&gt;&lt;br&gt;
Use past events to forecast. Validate with load tests. Revise upward after each knockout match.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Layered optimization must cover every tier.&lt;/strong&gt;&lt;br&gt;
CDN edge → Kubernetes node pool → NAT gateway → database read replica. A bottleneck at any tier collapses the stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Managed services let teams focus on workloads, not infrastructure.&lt;/strong&gt;&lt;br&gt;
Moving from KOPS to EKS freed the platform team to focus on the microservices that actually differentiate their product.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Infrastructure as Code is non-negotiable at 800+ microservices.&lt;/strong&gt;&lt;br&gt;
Every load balancer, CDN config, autoscaling policy, and node pool declared in code, version-controlled in Git, deployed through CI/CD.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Observability is not optional.&lt;/strong&gt;&lt;br&gt;
CloudWatch + Prometheus + Grafana + documented runbooks + practiced responses. This is what separates platforms that survive scale from platforms that become post-mortems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Plan for graceful failure, not just successful scale.&lt;/strong&gt;&lt;br&gt;
Feature flags as kill switches, static fallback images, circuit breakers - the difference between "lower quality for 30 seconds" and "error page for 82 crore people."&lt;/p&gt;




&lt;h2&gt;
  
  
  The Final Score
&lt;/h2&gt;

&lt;p&gt;86,000 fans sang Vande Mataram inside the Narendra Modi Stadium as India lifted their third T20 World Cup. And 82.1 crore people watched it happen - simultaneously, on a single platform, without a single major outage, without viral complaints of buffering, and without the platform going down at the moment of the winning wicket.&lt;/p&gt;

&lt;p&gt;India won on the field. JioHotstar won in the server room. Both victories were built the same way: with preparation, with execution under pressure, and with a team that had practiced for exactly this moment.&lt;/p&gt;

&lt;p&gt;The next time you're tempted to skip the chaos drill or leave the pre-warming script manual, remember: &lt;strong&gt;someone at JioHotstar ran that drill at 2 AM so that 82 crore people could watch Bumrah take his 4th wicket on the smoothest stream of their lives.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.pritamroy.com/blog/posts/how-jiohotstar-engineered-821-crore-concurrent-streams-a-devops-deep-dive-into-t.html" rel="noopener noreferrer"&gt;pritamroy.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Let's Discuss 💬
&lt;/h2&gt;

&lt;p&gt;Have you worked on large-scale streaming infrastructure, CDN optimization, or SRE for real-time systems? What architectural choices did your team make differently - especially around multi-CDN routing, Kubernetes autoscaling, or observability at high concurrency?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Drop a comment below - I'd love to hear your experience. 👇&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>aws</category>
      <category>kubernetes</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>Pritam Roy: From Network Engineer to AWS DevOps &amp; Cloud Engineer</title>
      <dc:creator>Pritam Roy</dc:creator>
      <pubDate>Mon, 02 Mar 2026 14:33:41 +0000</pubDate>
      <link>https://dev.to/pritamroy-devops/pritam-roy-from-network-engineer-to-aws-devops-cloud-engineer-jgb</link>
      <guid>https://dev.to/pritamroy-devops/pritam-roy-from-network-engineer-to-aws-devops-cloud-engineer-jgb</guid>
      <description>&lt;p&gt;When I started my career in IT, I didn’t begin in the cloud or with automation tools. I began by understanding how systems behave when they fail.&lt;/p&gt;

&lt;p&gt;I’m Pritam Roy, and over the past 9+ years in IT, my journey has taken me from working in network operations to designing and managing large-scale AWS cloud infrastructure and DevOps pipelines for production environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where It Started
&lt;/h2&gt;

&lt;p&gt;I began my career in a Network Operations Center, monitoring enterprise MPLS and ILL networks. Working in a 24×7 operations environment taught me something that no certification can teach:&lt;/p&gt;

&lt;p&gt;Reliability matters more than theory.&lt;/p&gt;

&lt;p&gt;Seeing outages, latency spikes, and configuration issues in real time helped me understand how critical stability, monitoring, and structured processes are in infrastructure engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Transitioning to Cloud and DevOps
&lt;/h2&gt;

&lt;p&gt;I used the pandemic period to invest deeply in learning AWS, Linux systems, infrastructure automation, and deployment pipelines.&lt;/p&gt;

&lt;p&gt;Instead of focusing only on courses, I practiced by building real environments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated deployments on AWS&lt;/li&gt;
&lt;li&gt;CI/CD pipelines for application delivery&lt;/li&gt;
&lt;li&gt;Secure networking setups&lt;/li&gt;
&lt;li&gt;Infrastructure provisioning using Terraform&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That shift changed the trajectory of my career.&lt;/p&gt;

&lt;h2&gt;
  
  
  Working in Production Cloud Environments
&lt;/h2&gt;

&lt;p&gt;Today, I work as a Senior AWS Cloud and DevOps Engineer managing real production infrastructure.&lt;/p&gt;

&lt;p&gt;My work includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Managing hundreds of EC2 instances across environments&lt;/li&gt;
&lt;li&gt;Designing secure AWS VPC architectures&lt;/li&gt;
&lt;li&gt;Building CI/CD pipelines for Java, Node.js, React, Angular, and mobile apps&lt;/li&gt;
&lt;li&gt;Running containerized workloads with Docker and Kubernetes&lt;/li&gt;
&lt;li&gt;Implementing monitoring with Prometheus, Grafana, and CloudWatch&lt;/li&gt;
&lt;li&gt;Strengthening infrastructure security with IAM, GuardDuty, and WAF&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fintech environments in particular taught me how critical reliability, observability, and automation are.&lt;/p&gt;

&lt;p&gt;Infrastructure in such environments cannot afford downtime or weak security models.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Approach to DevOps
&lt;/h2&gt;

&lt;p&gt;For me, DevOps isn’t about tools. It’s about how systems are designed.&lt;/p&gt;

&lt;p&gt;I focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automation: anything repeatable should be automated&lt;/li&gt;
&lt;li&gt;Security: infrastructure must be secure by design&lt;/li&gt;
&lt;li&gt;Scalability: systems should grow without breaking&lt;/li&gt;
&lt;li&gt;Observability: if you can’t see it, you can’t fix it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a system requires constant manual intervention, it isn’t engineered yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Focus On Today
&lt;/h2&gt;

&lt;p&gt;Currently, my core areas include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS infrastructure design and optimization&lt;/li&gt;
&lt;li&gt;Kubernetes and container orchestration&lt;/li&gt;
&lt;li&gt;Infrastructure as Code&lt;/li&gt;
&lt;li&gt;Secure cloud networking&lt;/li&gt;
&lt;li&gt;CI/CD automation&lt;/li&gt;
&lt;li&gt;Monitoring and reliability engineering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I enjoy working on systems that are built to last, not just built to run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s Connect
&lt;/h2&gt;

&lt;p&gt;If you’re interested in cloud engineering, DevOps practices, or infrastructure automation, feel free to connect.&lt;/p&gt;

&lt;p&gt;Website: &lt;a href="https://www.pritamroy.com" rel="noopener noreferrer"&gt;https://www.pritamroy.com&lt;/a&gt;&lt;br&gt;
LinkedIn: &lt;a href="https://www.linkedin.com/in/pritam-roy-2a55b684/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/pritam-roy-2a55b684/&lt;/a&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/pritamrai99" rel="noopener noreferrer"&gt;https://github.com/pritamrai99&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading my journey. The cloud ecosystem keeps evolving, and I’m excited to keep building, learning, and improving the systems I work on.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>linux</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
