<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Apoorv Gupta</title>
    <description>The latest articles on DEV Community by Apoorv Gupta (@apoorv_gupta_a6a859429e14).</description>
    <link>https://dev.to/apoorv_gupta_a6a859429e14</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1732714%2F52b163af-a407-48c6-b65f-f016e3e689d7.jpg</url>
      <title>DEV Community: Apoorv Gupta</title>
      <link>https://dev.to/apoorv_gupta_a6a859429e14</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/apoorv_gupta_a6a859429e14"/>
    <language>en</language>
    <item>
      <title>Building CarbonSaathi: A Visible-Reasoning Carbon Companion for Indian Metro Professionals</title>
      <dc:creator>Apoorv Gupta</dc:creator>
      <pubDate>Sun, 21 Jun 2026 09:23:52 +0000</pubDate>
      <link>https://dev.to/apoorv_gupta_a6a859429e14/building-carbonsaathi-a-visible-reasoning-carbon-companion-for-indian-metro-professionals-35om</link>
      <guid>https://dev.to/apoorv_gupta_a6a859429e14/building-carbonsaathi-a-visible-reasoning-carbon-companion-for-indian-metro-professionals-35om</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://carbonsaathi-ahkpdce5pa-el.a.run.app" rel="noopener noreferrer"&gt;CarbonSaathi&lt;/a&gt; in roughly 60 hours for PromptWars Challenge 3: a carbon footprint companion for Indian metro professionals that logs activities in plain English and surfaces AI-generated insights. The differentiator is that the reasoning behind each insight and recommendation is streamed live to the UI via Server-Sent Events while the agents are still generating — not appended after the fact. Stack: FastAPI + Gemini 2.5 Flash/Pro + Firestore + Cloud Run, built with Claude Code via GitHub Copilot. &lt;a href="https://github.com/apoorvgpt9/carbonsaathi" rel="noopener noreferrer"&gt;Source on GitHub&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem (and why "awareness" isn't "tracking")
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Build an application that helps people track and reduce their everyday carbon footprint through simple actions and personalized insights.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The word that matters in that brief is &lt;em&gt;personalized insights&lt;/em&gt;, not &lt;em&gt;track&lt;/em&gt;. Most carbon apps already track. They have a chart, maybe a calculator, a tally of your CO₂e by category. What they lack is a reason to trust the number and a specific action to take because of it.&lt;/p&gt;

&lt;p&gt;The persona I designed for is Riya or Rahul, 28, a software engineer in a Tier-1 Indian metro — Bengaluru, Mumbai, Pune, Hyderabad, Delhi NCR. They pay their own electricity bill, commute by metro some days and Uber others, sometimes work from home. They're vaguely climate-aware but track nothing today, and wouldn't open a dedicated app to do it. The design follows from this: low-friction logging (plain English, not forms), no guilt-tripping copy, specific actionable advice, and Indian context everywhere.&lt;/p&gt;

&lt;p&gt;The framing that shaped the architecture: an insight that says "your transport is 71% of this week's footprint" is a tracker. An insight where I can see &lt;em&gt;how the AI bucketed 14 days of activities, what pattern it flagged, and why it proposed switching from a cab to metro&lt;/em&gt; — that's awareness. The difference is whether the reasoning is visible.&lt;/p&gt;

&lt;p&gt;Almost every submission in a category like this shows the final output. I decided early to surface the reasoning that produced it. That decision drove every architectural choice that followed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The differentiator: visible agent reasoning streamed live
&lt;/h2&gt;

&lt;p&gt;The insights endpoint returns Server-Sent Events. Here's what it actually looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-N&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Accept: text/event-stream"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     http://localhost:8080/api/insights/stream
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;event: phase_start
data: {"event":"phase_start","phase":"analyst"}

event: reasoning
data: {"event":"reasoning","phase":"analyst","step":"Bucketed 14 days into this_week (8 activities) and last_week (6)."}

event: reasoning
data: {"event":"reasoning","phase":"analyst","step":"Transport is 71% of this week's 19.8 kg CO2e; weekday cab rides dominate."}

event: phase_complete
data: {"event":"phase_complete","phase":"analyst","status":"success","reason":null}

event: phase_start
data: {"event":"phase_start","phase":"coach"}

event: reasoning
data: {"event":"reasoning","phase":"coach","step":"Largest controllable bucket: 8 km weekday cab commute."}

event: reasoning
data: {"event":"reasoning","phase":"coach","step":"emission_service: petrol cab 0.170 vs metro 0.031 kg/km. Computed saving = 0.139 x 8 km x 2 x 5 = 11.1 kg/week."}

event: phase_complete
data: {"event":"phase_complete","phase":"coach","status":"success","reason":null}

event: done
data: {"event":"done","insights":[...],"recommendations":[...],"analyst_status":"success","coach_status":"success"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;reasoning&lt;/code&gt; text is model-generated and varies per run. The protocol structure (&lt;code&gt;phase_start&lt;/code&gt; → &lt;code&gt;reasoning&lt;/code&gt; → &lt;code&gt;phase_complete&lt;/code&gt; → &lt;code&gt;done&lt;/code&gt;) is fixed by the orchestrator. The SSE path adds 80 ms of pacing between events; a request with a JSON-only &lt;code&gt;Accept&lt;/code&gt; header gets a consolidated payload with no pacing.&lt;/p&gt;
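&lt;p&gt;To make the frame format and the pacing concrete, here is a minimal stdlib sketch of how paced SSE frames could be produced. The names &lt;code&gt;format_sse&lt;/code&gt;, &lt;code&gt;paced_frames&lt;/code&gt;, and &lt;code&gt;collect&lt;/code&gt; are hypothetical and not taken from the CarbonSaathi source; only the event shapes and the 80 ms figure come from the text above.&lt;/p&gt;

```python
import asyncio
import json


def format_sse(event):
    """Serialize one event dict as an SSE frame: event line, data line, blank line."""
    payload = json.dumps(event, separators=(",", ":"))
    return f"event: {event['event']}\ndata: {payload}\n\n"


async def paced_frames(events, delay_s=0.08):
    """Yield SSE frames with a fixed pause between them (80 ms by default)."""
    for index, event in enumerate(events):
        if index != 0:
            await asyncio.sleep(delay_s)
        yield format_sse(event)


async def collect(events, delay_s=0.08):
    """Drain the generator into a list (illustrative helper, handy in tests)."""
    return [frame async for frame in paced_frames(events, delay_s)]


# Hypothetical event sequence in the shape shown in the stream above.
EVENTS = [
    {"event": "phase_start", "phase": "analyst"},
    {"event": "phase_complete", "phase": "analyst", "status": "success", "reason": None},
    {"event": "done", "analyst_status": "success"},
]

if __name__ == "__main__":
    for frame in asyncio.run(collect(EVENTS)):
        print(frame, end="")
```

&lt;p&gt;Keeping the delay as a parameter of the generator, rather than of the code that creates the events, is one way to keep pacing a transport concern: a JSON-only response can consume the same events with no delay at all.&lt;/p&gt;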

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Floqbochacscj3s480kcz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Floqbochacscj3s480kcz.png" alt="Insights view streaming Analyst and Coach reasoning steps mid-generation" width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each reasoning step and the final insight/recommendation are persisted to Firestore with an &lt;code&gt;agentReasoning&lt;/code&gt; field. The UI renders the live stream during generation and falls back to the persisted trace on subsequent views — so it's auditable, not just a visual effect.&lt;/p&gt;

&lt;p&gt;Notice the Coach's arithmetic in the stream above: &lt;code&gt;petrol cab 0.170 vs metro 0.031 kg/km&lt;/code&gt;. Those numbers come from a local factor table, not from the model's output. The model never sets a carbon number — it proposes the &lt;em&gt;swap&lt;/em&gt;, and the agent validates and computes the saving from the emission service. This invariant is one of the most important design decisions in the project, and I'll explain how it emerged in the prompts section.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;Three sequential AI agents behind an async FastAPI service on Cloud Run. All emission arithmetic runs locally from cited Indian factor data; every model output is validated before it's trusted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    U["User browser&amp;lt;br/&amp;gt;Tailwind + vanilla JS"]
    FB["Firebase Auth&amp;lt;br/&amp;gt;Google Sign-In"]
    SM["Secret Manager&amp;lt;br/&amp;gt;gemini-api-key · firebase-api-key"]
    FS[("Firestore&amp;lt;br/&amp;gt;users / activities /&amp;lt;br/&amp;gt;insights / recommendations")]

    U --&amp;gt;|"Google Sign-In"| FB
    FB --&amp;gt;|"ID token"| U
    U --&amp;gt;|"Bearer ID token"| AUTH

    subgraph CR["FastAPI service · Cloud Run · asia-south1"]
        direction TB
        AUTH["verify_firebase_token&amp;lt;br/&amp;gt;uniform 401"]
        R["/api routes"]
        LOG["Logger Agent&amp;lt;br/&amp;gt;Gemini 2.5 Flash"]
        ORCH["Insight Orchestrator"]
        AN["Analyst Agent&amp;lt;br/&amp;gt;Gemini 2.5 Pro"]
        CO["Coach Agent&amp;lt;br/&amp;gt;Gemini 2.5 Flash"]
        EM["Emission Service&amp;lt;br/&amp;gt;grid · transport · food factors"]
        AUTH --&amp;gt; R
        R --&amp;gt; LOG
        R --&amp;gt; ORCH
        ORCH --&amp;gt; AN
        ORCH --&amp;gt; CO
        LOG --&amp;gt; EM
        CO --&amp;gt; EM
    end

    LOG --&amp;gt; FS
    ORCH --&amp;gt; FS
    SM -.-&amp;gt;|"runtime env vars"| CR
    ORCH ==&amp;gt;|"SSE: phase_start · reasoning ·&amp;lt;br/&amp;gt;phase_complete · done"| U
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Firestore data model carries &lt;code&gt;agentReasoning&lt;/code&gt; on every user-facing document — it's what powers the "show your work" UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;users/{uid}
  email, displayName, state, homeProfile{ bhk, hasAC, fridgeClass, dietary }
  onboardingComplete, createdAt, lastActive
  ├── activities/{id}      type, rawInput, structuredData, emissionKgCo2e,
  │                        confidence, emissionFactorSource, agentReasoning
  ├── insights/{id}        type, title, description, supportingActivities,
  │                        agentReasoning
  ├── recommendations/{id} type, title, expectedSavingKg, difficulty,
  │                        accepted, agentReasoning
  └── state/generation     analystStatus, coachStatus, lastCompletedAt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All user-facing time aggregations — "today", "this week", the activity streak — are computed in IST (&lt;code&gt;Asia/Kolkata&lt;/code&gt;) at read time. Timestamps are stored UTC. The streak uses a Duolingo-style same-day grace: if today has no activity yet, the streak counts backward from yesterday so it doesn't read as broken before you've logged.&lt;/p&gt;
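&lt;p&gt;The streak rule can be sketched as follows. This is an illustration under stated assumptions, not the project's code: &lt;code&gt;current_streak&lt;/code&gt; and &lt;code&gt;utc&lt;/code&gt; are hypothetical names, and a fixed +05:30 offset stands in for &lt;code&gt;Asia/Kolkata&lt;/code&gt; (equivalent in practice, since IST observes no daylight saving).&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

# IST has no daylight saving, so a fixed +05:30 offset matches Asia/Kolkata.
IST = timezone(timedelta(hours=5, minutes=30))


def current_streak(activity_times_utc, now_utc):
    """Count consecutive IST days with activity, ending today or yesterday."""
    active_days = {ts.astimezone(IST).date() for ts in activity_times_utc}
    day = now_utc.astimezone(IST).date()
    if day not in active_days:
        # Same-day grace: nothing logged yet today, so start from yesterday.
        day -= timedelta(days=1)
    streak = 0
    while day in active_days:
        streak += 1
        day -= timedelta(days=1)
    return streak


def utc(year, month, day, hour, minute=0):
    """Build a timezone-aware UTC timestamp, as stored."""
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc)


# 20:00 UTC on 18 June is already 01:30 IST on 19 June.
logged = [utc(2026, 6, 17, 9), utc(2026, 6, 18, 20), utc(2026, 6, 20, 4)]
print(current_streak(logged, utc(2026, 6, 20, 6)))  # 2: today and yesterday in IST
print(current_streak(logged, utc(2026, 6, 21, 3)))  # 2: grace, counted from yesterday
```

&lt;p&gt;The same three timestamps grouped by UTC date would fall on 17, 18, and 20 June, which is why the conversion to IST has to happen before the dates are compared.&lt;/p&gt;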

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F1ayw7jpalc98p1dk69p1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F1ayw7jpalc98p1dk69p1.png" alt="CarbonSaathi dashboard showing today's footprint, 7-day breakdown, and activity streak" width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The agent pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    START(["GET /api/insights/stream"]) --&amp;gt; STALE{"is_pipeline_stale?"}
    STALE --&amp;gt;|"No — cache fresh"| CACHED["2x phase_complete status=cached&amp;lt;br/&amp;gt;+ done · zero agent calls"]
    STALE --&amp;gt;|"Yes"| A1["emit phase_start: analyst"]
    A1 --&amp;gt; AN["Analyst · Gemini Pro&amp;lt;br/&amp;gt;buckets 14d activity by week"]
    AN --&amp;gt; AOUT{"analyst status"}
    AOUT --&amp;gt;|"empty / failed"| ASKIP["phase_complete: analyst&amp;lt;br/&amp;gt;coach skipped → done"]
    AOUT --&amp;gt;|"success"| AR["stream reasoning steps&amp;lt;br/&amp;gt;persist insights"]
    AR --&amp;gt; C1["emit phase_start: coach"]
    C1 --&amp;gt; CO["Coach · Gemini Flash&amp;lt;br/&amp;gt;proposes swaps"]
    CO --&amp;gt; CC["validate saving_basis ·&amp;lt;br/&amp;gt;compute kg from emission_service"]
    CC --&amp;gt; CR2["stream reasoning steps&amp;lt;br/&amp;gt;persist recommendations"]
    CR2 --&amp;gt; DONE(["done: insights + recommendations"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Job&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Logger&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini 2.5 Flash&lt;/td&gt;
&lt;td&gt;Parse free-text input into a typed activity via function calling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Analyst&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini 2.5 Pro&lt;/td&gt;
&lt;td&gt;Find patterns, trends, and milestones over a 14-day activity window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Coach&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini 2.5 Flash&lt;/td&gt;
&lt;td&gt;Propose &lt;code&gt;swap&lt;/code&gt; / &lt;code&gt;reduce&lt;/code&gt; / &lt;code&gt;challenge&lt;/code&gt; recommendations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three design rules underpin the pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Coach computes savings; it never trusts the model for a number.&lt;/strong&gt; The model returns a typed &lt;code&gt;saving_basis&lt;/code&gt; — a discriminated union describing &lt;em&gt;what&lt;/em&gt; swap to make. The agent validates that description against the emission factor table and computes &lt;code&gt;expectedSavingKg&lt;/code&gt; locally. A model that says "save 3.2 kg/week" is often hallucinating; a model that says "swap petrol cab for metro, 8 km weekday commute" gives the agent everything it needs to compute the real number from real factors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every agent outcome is a typed discriminated union.&lt;/strong&gt; &lt;code&gt;LoggerOutcome&lt;/code&gt;, &lt;code&gt;AnalystOutcome&lt;/code&gt;, &lt;code&gt;CoachOutcome&lt;/code&gt; are all &lt;code&gt;Annotated[Union[Success, Empty, Rejected, Failed], Field(discriminator="status")]&lt;/code&gt;. Expected failures — governance rejection, low data volume, malformed JSON from the model — are values, not exceptions. Routes pattern-match the status field to an HTTP response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Staleness caching short-circuits the pipeline.&lt;/strong&gt; If nothing has changed since the last run (IST-day aligned, with separate 10-minute empty-result TTLs for Analyst and Coach), the orchestrator returns cached results and calls zero agents. The cached path emits &lt;code&gt;phase_complete(status="cached")&lt;/code&gt; events, so the UI response shape is consistent regardless of whether generation ran.&lt;/p&gt;




&lt;h2&gt;
  
  
  Built for India: the spec-alignment core
&lt;/h2&gt;

&lt;p&gt;This is where most carbon apps fail the India spec. Every number is India-specific and source-cited.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Electricity — state grid factors&lt;/strong&gt; (kg CO₂e/kWh, CEA CO₂ Baseline Database v19.0, 2023–24). The same kWh of electricity has very different carbon depending on your state's generation mix:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;State&lt;/th&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Note&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sikkim&lt;/td&gt;
&lt;td&gt;0.38&lt;/td&gt;
&lt;td&gt;Hydro-dominant (Teesta cascade)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kerala&lt;/td&gt;
&lt;td&gt;0.58&lt;/td&gt;
&lt;td&gt;Hydro + renewable mix&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Karnataka / Tamil Nadu&lt;/td&gt;
&lt;td&gt;0.74&lt;/td&gt;
&lt;td&gt;Southern grid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maharashtra&lt;/td&gt;
&lt;td&gt;0.79&lt;/td&gt;
&lt;td&gt;Western regional baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delhi&lt;/td&gt;
&lt;td&gt;0.82&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bihar / West Bengal / Odisha&lt;/td&gt;
&lt;td&gt;0.96&lt;/td&gt;
&lt;td&gt;Eastern thermal grid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jharkhand&lt;/td&gt;
&lt;td&gt;1.05&lt;/td&gt;
&lt;td&gt;Coal-belt, highest modelled&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All 28 states and 8 UTs are in the factor file. Users who enter their electricity bill in rupees get a conversion using &lt;code&gt;AVG_INR_PER_KWH = 8.0&lt;/code&gt; — a flat average that's coarser than real slab-based DISCOM tariffs, so any bill-derived activity is forced to &lt;code&gt;confidence = "estimated"&lt;/code&gt; regardless of the grid factor's confidence level. The assumption is typed and documented, not buried in a comment.&lt;/p&gt;
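&lt;p&gt;A worked example of the bill conversion, as a sketch: &lt;code&gt;AVG_INR_PER_KWH&lt;/code&gt; and the three grid factors are the values quoted above, while &lt;code&gt;bill_to_emission&lt;/code&gt; and the returned field layout are hypothetical.&lt;/p&gt;

```python
AVG_INR_PER_KWH = 8.0  # flat average quoted above; real tariffs are slab-based

# Subset of the state grid factors from the table above (kg CO2e per kWh).
GRID_FACTORS = {"Karnataka": 0.74, "Maharashtra": 0.79, "Delhi": 0.82}


def bill_to_emission(bill_inr, state):
    """Convert a rupee bill to kWh and kg CO2e; always marked as estimated."""
    if state not in GRID_FACTORS:
        raise KeyError(f"no grid factor for state: {state}")
    kwh = bill_inr / AVG_INR_PER_KWH
    return {
        "kwh": round(kwh, 2),
        "emissionKgCo2e": round(kwh * GRID_FACTORS[state], 2),
        # Bill-derived values are forced to "estimated" whatever the factor's quality.
        "confidence": "estimated",
    }


# A 1600 rupee bill in Karnataka: 1600 / 8.0 = 200 kWh, 200 x 0.74 = 148 kg CO2e.
print(bill_to_emission(1600, "Karnataka"))
```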

&lt;p&gt;&lt;strong&gt;Transport&lt;/strong&gt; (kg CO₂e/km — ICCT India, DMRC, India GHG Inventory):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Metro&lt;/td&gt;
&lt;td&gt;0.031&lt;/td&gt;
&lt;td&gt;Auto-rickshaw (CNG)&lt;/td&gt;
&lt;td&gt;0.066&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Public bus&lt;/td&gt;
&lt;td&gt;0.039&lt;/td&gt;
&lt;td&gt;CNG taxi&lt;/td&gt;
&lt;td&gt;0.095&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Two-wheeler (EV)&lt;/td&gt;
&lt;td&gt;0.047&lt;/td&gt;
&lt;td&gt;Petrol taxi / cab&lt;/td&gt;
&lt;td&gt;0.170&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Two-wheeler (petrol)&lt;/td&gt;
&lt;td&gt;0.060&lt;/td&gt;
&lt;td&gt;Petrol car&lt;/td&gt;
&lt;td&gt;0.192&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Walking and WFH are zero by definition. The Logger prompt explicitly recognises auto-rickshaw, metro, bus, Uber/Ola/Rapido, two-wheeler, and WFH — the transport vocabulary of urban India.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Food&lt;/strong&gt; (kg CO₂e/serving — FAO Food Emissions Database + Indian dietary survey data; rice includes paddy-field methane via IRRI):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Veg thali&lt;/td&gt;
&lt;td&gt;0.90&lt;/td&gt;
&lt;td&gt;Egg (1)&lt;/td&gt;
&lt;td&gt;0.25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chicken meal&lt;/td&gt;
&lt;td&gt;2.10&lt;/td&gt;
&lt;td&gt;Dal serving&lt;/td&gt;
&lt;td&gt;0.35&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mutton (goat) meal&lt;/td&gt;
&lt;td&gt;4.50&lt;/td&gt;
&lt;td&gt;Rice serving&lt;/td&gt;
&lt;td&gt;0.43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fish meal&lt;/td&gt;
&lt;td&gt;1.20&lt;/td&gt;
&lt;td&gt;Dairy (250 ml)&lt;/td&gt;
&lt;td&gt;0.63&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Mutton is goat, not sheep — the relevant Indian market context. Dietary categories (vegetarian / non-vegetarian / eggetarian) are shaped for Indian dietary patterns. Paneer appears separately from dairy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tools and selection rationale
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude Code via GitHub Copilot.&lt;/strong&gt; I used deliberate model rotation across the build. Sonnet 4.6 handled scaffolding-heavy phases (1A, 1C, 1D, 2, 3, 5A, 5B, 6, 7, 8, 9) where output volume mattered and specs were well-defined. Opus 4.8 handled high-stakes phases requiring structural reasoning (1B FastAPI core, 4A base agent + Logger, 4B Analyst + Coach, 5C SSE orchestration, 10 README polish). The logic: Opus costs more per token but makes fewer architectural errors on complex cross-file reasoning tasks where a wrong early decision compounds across the entire session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini 2.5 Flash&lt;/strong&gt; for the Logger: function calling to parse free-text into a typed activity. Flash is fast enough for this use case and meaningfully cheaper than Pro at logging frequency. The Coach agent also runs on Flash — originally specified as Pro, but Flash was retained after Phase 9 end-to-end validation showed acceptable recommendation quality. More on this in the war stories section.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini 2.5 Pro&lt;/strong&gt; for the Analyst: pattern detection across a 14-day activity window is the one task where the quality difference between Flash and Pro is consistent enough to be worth the cost. The Analyst receives pre-bucketed activity data (&lt;code&gt;this_week&lt;/code&gt;, &lt;code&gt;last_week&lt;/code&gt;, &lt;code&gt;earlier&lt;/code&gt;) already grouped in Python — date math is not delegated to the model.&lt;/p&gt;
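&lt;p&gt;Pre-bucketing in Python might look like the sketch below. The bucket names come from the text; the rolling 7-day boundaries, the fixed-offset IST zone, and the &lt;code&gt;bucket_activities&lt;/code&gt; helper are assumptions, since the exact week definition is not stated here.&lt;/p&gt;

```python
from datetime import date, datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # fixed offset; IST has no DST
BUCKET_NAMES = ("this_week", "last_week")


def bucket_activities(activities, today_ist):
    """Group (timestamp_utc, label) pairs into rolling 7-day buckets by IST date."""
    buckets = {"this_week": [], "last_week": [], "earlier": []}
    for ts_utc, label in activities:
        days_ago = (today_ist - ts_utc.astimezone(IST).date()).days
        # Days 0-6 are this week, 7-13 last week, anything older is "earlier".
        week_index = max(days_ago, 0) // 7
        name = BUCKET_NAMES[week_index] if week_index in (0, 1) else "earlier"
        buckets[name].append(label)
    return buckets


sample = [
    (datetime(2026, 6, 20, 4, 0, tzinfo=timezone.utc), "cab 8 km"),
    (datetime(2026, 6, 12, 10, 0, tzinfo=timezone.utc), "metro 8 km"),
    (datetime(2026, 6, 1, 10, 0, tzinfo=timezone.utc), "veg thali"),
]
print(bucket_activities(sample, date(2026, 6, 21)))
```

&lt;p&gt;The model then only sees three labelled lists, so it never has to decide which week a timestamp belongs to.&lt;/p&gt;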

&lt;p&gt;&lt;strong&gt;FastAPI + Pydantic v2.&lt;/strong&gt; Async throughout, which matters when a single request path involves multiple Firestore reads and a Gemini call. Frozen Pydantic models for immutable domain objects. Discriminated unions for agent outcomes — the pattern that makes failure handling exhaustive rather than defensive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Firestore on the Spark free tier.&lt;/strong&gt; Integrates natively with Firebase Auth (same project, same uid namespace), handles demo traffic at zero cost, and has no server to manage. The constraint is real: it's sized for the demo, not for scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloud Run in asia-south1.&lt;/strong&gt; Lower latency for Indian users than any US region. &lt;code&gt;min-instances=1&lt;/code&gt; keeps one instance warm so requests during the submission window never hit a cold start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Firebase Authentication with Google Sign-In.&lt;/strong&gt; The challenge required persistent user data. Firebase handles identity without a password database or JWT signing infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tailwind via CDN + vanilla ES modules, no build step.&lt;/strong&gt; The original spec had HTMX for progressive enhancement. HTMX is designed for hypermedia APIs that return HTML fragments — it expects &lt;code&gt;hx-get&lt;/code&gt;/&lt;code&gt;hx-post&lt;/code&gt; targets to respond with rendered HTML, not JSON. Every API endpoint in this app is a JSON API. Replacing HTMX with vanilla &lt;code&gt;fetch()&lt;/code&gt; + ES modules added hand-rolled client code but was the only design that worked cleanly with both the JSON API and the SSE stream that requires a Bearer header.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the prompts evolved
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The numbered-plan gate
&lt;/h3&gt;

&lt;p&gt;Early in the build, describing a feature to Copilot and pressing Enter would generate 8 files in one shot. Some of what it built was right; some was premature scaffolding for features two phases away. After Phase 1A, every prompt was given a mandatory final instruction: &lt;em&gt;"Output a numbered plan of what you will build and STOP. Do not write any files until I confirm."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The plan gate had a higher return on effort than any other single change I made to the prompting workflow. It paid off in two specific cases: in one, the plan revealed that Copilot intended to put business logic inside the route handler rather than the agent (caught before a line was written); in the other, it exposed a Firestore schema design that would have required a migration in Phase 5. Both were fixed with a clarification at the plan stage, not a refactor.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Coach-computes-savings invariant
&lt;/h3&gt;

&lt;p&gt;The original Coach prompt asked the model to propose recommendations including expected kg savings. Testing the first implementation immediately revealed the problem: the model confidently returned specific numbers — "switching to metro saves 3.2 kg/week" — that had no relationship to the actual emission factors. The numbers varied between runs and were sometimes off by an order of magnitude.&lt;/p&gt;

&lt;p&gt;The fix was architectural, not just prompt-level. I changed the Coach's response schema to a &lt;code&gt;saving_basis&lt;/code&gt; discriminated union: either a transport swap (specify modes and distance), an electricity reduction (specify kWh or hours), or a food swap (specify categories and frequency). The model is never asked for a number. The agent receives the &lt;code&gt;saving_basis&lt;/code&gt;, validates it against &lt;code&gt;EmissionService&lt;/code&gt;, and computes &lt;code&gt;expectedSavingKg&lt;/code&gt; from the real factors. The model shapes the &lt;em&gt;description&lt;/em&gt; of a recommendation; it may not set its &lt;em&gt;carbon impact&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This pattern generalises: if you need quantitative output from an agent, design a schema that has the model return typed parameters and compute the quantity locally. It removes hallucination from the number and concentrates model responsibility on the part it's actually good at — understanding what kind of action to recommend.&lt;/p&gt;
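&lt;p&gt;As a sketch of the pattern, the snippet below recomputes the 11.1 kg/week figure from the stream example using the transport factors quoted in this article. The &lt;code&gt;basis&lt;/code&gt; field names and the &lt;code&gt;weekly_transport_saving&lt;/code&gt; helper are hypothetical; the real &lt;code&gt;saving_basis&lt;/code&gt; schema may differ.&lt;/p&gt;

```python
# Transport factors quoted in this article (kg CO2e per km).
TRANSPORT_FACTORS = {"metro": 0.031, "public_bus": 0.039, "petrol_cab": 0.170}


def weekly_transport_saving(basis):
    """Validate a transport-swap basis and compute kg CO2e saved per week."""
    for key in ("from_mode", "to_mode"):
        if basis[key] not in TRANSPORT_FACTORS:
            raise ValueError(f"unknown transport mode: {basis[key]}")
    per_km = TRANSPORT_FACTORS[basis["from_mode"]] - TRANSPORT_FACTORS[basis["to_mode"]]
    trips = basis["trips_per_day"] * basis["days_per_week"]
    # Clamp at zero so a swap to a dirtier mode never reports a saving.
    return round(max(per_km, 0.0) * basis["distance_km"] * trips, 1)


# What the model is allowed to return: parameters only, no carbon number.
basis = {
    "kind": "transport_swap",
    "from_mode": "petrol_cab",
    "to_mode": "metro",
    "distance_km": 8,
    "trips_per_day": 2,
    "days_per_week": 5,
}
print(weekly_transport_saving(basis))  # 11.1
```

&lt;p&gt;A model that names a mode missing from the table is rejected at validation, and a swap in the wrong direction yields zero rather than a negative or invented saving.&lt;/p&gt;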

&lt;h3&gt;
  
  
  The 307-before-auth information leak
&lt;/h3&gt;

&lt;p&gt;During Phase 5B security review, I tested what happens when an unauthenticated client calls &lt;code&gt;/api/activities/&lt;/code&gt; (with trailing slash) instead of &lt;code&gt;/api/activities&lt;/code&gt;. FastAPI's default &lt;code&gt;redirect_slashes=True&lt;/code&gt; issues a 307 redirect to the canonical path. That redirect fires &lt;strong&gt;before&lt;/strong&gt; the auth dependency evaluates. An unauthenticated caller learns the route exists with a 307 instead of a 401 — a small but real information leak about the application's route map.&lt;/p&gt;

&lt;p&gt;The fix: set &lt;code&gt;redirect_slashes=False&lt;/code&gt; on the &lt;code&gt;FastAPI()&lt;/code&gt; constructor and register all bare-resource routes with an empty string (&lt;code&gt;@router.post("")&lt;/code&gt;, not &lt;code&gt;@router.post("/")&lt;/code&gt;). With this in place, the slashed variant returns 404 to all callers regardless of auth state. Every test client, every curl command, and the entire frontend had to match the slashless paths. Auditing and propagating this took about 20 minutes; not catching it would have been a security finding.&lt;/p&gt;

&lt;h3&gt;
  
  
  EventSource can't send Bearer headers
&lt;/h3&gt;

&lt;p&gt;The original SSE design used the browser's native &lt;code&gt;EventSource&lt;/code&gt; API. Writing the frontend for Phase 6 revealed the constraint: &lt;code&gt;EventSource&lt;/code&gt; does not support custom request headers. There is no standard way to send an &lt;code&gt;Authorization: Bearer&lt;/code&gt; token. But the uniform auth contract required every protected route to authenticate the same way — a Bearer header in every request, with a consistent 401 response on failure.&lt;/p&gt;

&lt;p&gt;The replacement was &lt;code&gt;fetch()&lt;/code&gt; + &lt;code&gt;ReadableStream&lt;/code&gt;. The client calls &lt;code&gt;fetch("/api/insights/stream", { headers: { Authorization: "Bearer ..." } })&lt;/code&gt;, reads the response body as a stream, decodes chunks, and parses the SSE event format manually. More boilerplate than &lt;code&gt;EventSource&lt;/code&gt;, but the security contract was not negotiable. The orchestrator has no knowledge of the transport — the 80 ms inter-event pacing lives in the route layer specifically so the orchestrator stays framework-agnostic.&lt;/p&gt;
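&lt;p&gt;The manual parsing step is the same in any language. To stay consistent with the backend, here is a Python sketch of the buffering logic a stream reader needs, since a chunk boundary can fall in the middle of an event. &lt;code&gt;SSEParser&lt;/code&gt; is a hypothetical name, and the sketch assumes LF line endings and JSON payloads as in the stream shown earlier; it is not the project's frontend code.&lt;/p&gt;

```python
import json


class SSEParser:
    """Incrementally parse SSE text that may arrive split across chunks."""

    def __init__(self):
        self._buffer = ""

    def feed(self, chunk):
        """Add a decoded chunk; return the (event_name, data) pairs it completes."""
        self._buffer += chunk
        # A blank line ends each event; the last piece may be incomplete.
        *complete, self._buffer = self._buffer.split("\n\n")
        events = []
        for block in complete:
            name, data_lines = "message", []
            for line in block.split("\n"):
                if line.startswith("event:"):
                    name = line[len("event:"):].strip()
                elif line.startswith("data:"):
                    data_lines.append(line[len("data:"):].strip())
            if data_lines:
                events.append((name, json.loads("\n".join(data_lines))))
        return events


parser = SSEParser()
print(parser.feed('event: phase_start\ndata: {"event":"phase_st'))
print(parser.feed('art","phase":"coach"}\n\nevent: done\n'))
print(parser.feed('data: {"event":"done"}\n\n'))
```

&lt;p&gt;The first call returns nothing because the event is still incomplete; the parser only emits an event once its terminating blank line has arrived.&lt;/p&gt;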




&lt;h2&gt;
  
  
  GenAI vs designed: where the boundary is
&lt;/h2&gt;

&lt;p&gt;This distinction matters for understanding what "vibe coding" actually means when the output is a production-quality service.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI generated:&lt;/strong&gt; Route handler skeletons and request/response DTOs. Pydantic model definitions for all domain objects. Test scaffolding and fixture files. Jinja2 HTML templates and CSS. Large portions of the agent prompt structures — GOOD/BAD examples, response schema definitions, system instruction text. Mermaid architecture diagrams. GCP deployment shell scripts. CI workflow YAML.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I designed:&lt;/strong&gt; The three-agent split (Logger/Analyst/Coach vs a single mega-prompt or a different decomposition). Visible reasoning as the specific rubric differentiator — streaming agent traces to the UI while generation is in flight is not a standard pattern, and manual evaluators notice it. The India-specific data inclusion: deciding that state-level grid factors, INR→kWh conversion, IST timezones, and Indian dietary categories were worth the implementation cost, and sourcing the actual numbers. Every load-bearing convention: slashless routes, uniform 401 contract, Coach computes savings, IST everywhere. The phase-by-phase build sequence and the validation gauntlet at each phase boundary. The model rotation strategy (Sonnet for volume phases, Opus for reasoning-heavy phases). The deliberate trade-offs documented in the amendments log — each is a decision I made, not a default the framework imposed.&lt;/p&gt;

&lt;p&gt;The line isn't "AI wrote the code, I had the idea." AI is a capable implementation partner for well-specified subtasks. The hard work is specifying those subtasks accurately, sequencing them correctly, knowing when the output has drifted from the spec, and knowing which decisions require a human to own them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Engineering war stories
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The CSP hole that made Firebase Auth fail generically
&lt;/h3&gt;

&lt;p&gt;Firebase Auth's &lt;code&gt;signInWithPopup&lt;/code&gt; was failing with &lt;code&gt;auth/internal-error&lt;/code&gt;. That's one of the least informative error codes in the Firebase SDK — it means "something went wrong in the network layer, and I'm not going to tell you what." I checked credentials, verified the Firebase project configuration, tested the token exchange flow manually. Nothing pointed at the cause.&lt;/p&gt;

&lt;p&gt;The actual problem was visible in the browser's Network tab: a request to &lt;code&gt;apis.google.com&lt;/code&gt; was flagged as &lt;code&gt;(blocked:csp)&lt;/code&gt;. My Content Security Policy's &lt;code&gt;connect-src&lt;/code&gt; directive covered &lt;code&gt;*.googleapis.com&lt;/code&gt; but missed &lt;code&gt;apis.google.com&lt;/code&gt; — a separate hostname Firebase uses for some Auth operations. The fix was one CSP directive. The debugging cost was about an hour of checking the wrong layer.&lt;/p&gt;

&lt;p&gt;The lesson: when Firebase Auth fails with a generic error, open the browser Network tab before touching credentials. CSP blocks appear as &lt;code&gt;(blocked:csp)&lt;/code&gt; there and are invisible at the application layer. The auth SDK wraps them as "internal-error" with no further context.&lt;/p&gt;

&lt;h3&gt;
  
  
  The timestamp format that silently zeroed Firestore range queries
&lt;/h3&gt;

&lt;p&gt;Dashboard range queries were returning empty lists in production against real data. The symptom: &lt;code&gt;list_activities_in_range(uid, start, end)&lt;/code&gt; returned nothing even with activities logged in the window. The test suite at 99.7% coverage had never caught it because every Firestore test mocked the client — the mocks returned pre-populated results and never exercised the real storage path.&lt;/p&gt;

&lt;p&gt;The root cause was a serialization mismatch. &lt;code&gt;model_dump(mode="json")&lt;/code&gt; — used for Firestore writes — serializes UTC datetimes as &lt;code&gt;"2026-06-20T14:30:00Z"&lt;/code&gt; (Z suffix). &lt;code&gt;datetime.isoformat()&lt;/code&gt; — used to generate the range bounds passed to the filter — produces &lt;code&gt;"2026-06-20T14:30:00+00:00"&lt;/code&gt; (+00:00 suffix). Firestore was comparing these as strings, and &lt;code&gt;"Z"&lt;/code&gt; ≠ &lt;code&gt;"+00:00"&lt;/code&gt; lexicographically. Every range query silently matched zero documents.&lt;/p&gt;

&lt;p&gt;The fix was to pass &lt;code&gt;datetime&lt;/code&gt; objects directly to Firestore rather than pre-serialized strings, letting the SDK own format consistency. High test coverage on mocked dependencies does not validate the interface contract with the real system. For each data type that involves datetime comparison, it is worth adding at least one "write then read" path through actual storage, or an emulator, before the demo.&lt;/p&gt;
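&lt;p&gt;The two formats can be reproduced with the standard library alone. This is a sketch, not a Firestore reproduction: it builds the Z-suffixed form with &lt;code&gt;strftime&lt;/code&gt; as a stand-in for the stored value, and it only shows that the two strings for one instant differ and order differently at a boundary, whereas &lt;code&gt;datetime&lt;/code&gt; objects compare by instant.&lt;/p&gt;

```python
from datetime import datetime, timezone

instant = datetime(2026, 6, 20, 14, 30, tzinfo=timezone.utc)

# Z-suffixed form, matching the stored value quoted above.
stored = instant.strftime("%Y-%m-%dT%H:%M:%SZ")
# isoformat() on an aware UTC datetime uses the +00:00 suffix.
bound = instant.isoformat()

# Same instant, different text: string equality fails, and because "Z"
# sorts after "+", an inclusive upper bound at this instant excludes it.
same_text = stored == bound
inside_inclusive_upper_bound = bound >= stored

# Comparing datetime objects instead gives the chronological answer.
parsed = datetime.fromisoformat(stored.replace("Z", "+00:00"))
same_instant = parsed == instant

print(stored, bound, same_text, inside_inclusive_upper_bound, same_instant)
```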

&lt;h3&gt;
  
  
  The Gemini Pro quota wall at 11 PM Saturday
&lt;/h3&gt;

&lt;p&gt;Phase 7 was the first phase to hit the live Gemini API from a browser frontend. Around 11 PM on Saturday, the insights stream started returning 429 errors specifically for &lt;code&gt;gemini-2.5-pro&lt;/code&gt;, with zero daily quota remaining. The free tier has a per-day limit on Pro; I'd exhausted it building and testing Phase 7.&lt;/p&gt;

&lt;p&gt;Waiting for midnight wasn't an option — submission was Sunday evening and there was still a full day of work left. Options: switch to Flash temporarily, or push billing changes immediately and hope quota propagated in time (billing changes are not instant). I switched both Analyst and Coach to Flash and continued.&lt;/p&gt;

&lt;p&gt;The next morning, after confirming billing was live and Pro quota had reset, I restored the Analyst to Pro. The Coach stayed on Flash. That wasn't emergency triage; it was a validation outcome. After a full end-to-end test with Coach on Flash, recommendation quality was acceptable. Switching Coach back to Pro would have introduced deployment risk during the submission window, with no measurable quality gain for this structured-output use case.&lt;/p&gt;

&lt;p&gt;The lesson is about model selection order: test with Flash first, and only upgrade to Pro when Flash demonstrably fails the quality bar. Flash nearly always meets it for structured-output tasks where the schema is tight and the reasoning depth requirement is moderate.&lt;/p&gt;
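&lt;p&gt;That selection order can be written down as a small helper. This is a sketch under assumptions: &lt;code&gt;pick_model&lt;/code&gt; and the evaluator callables are hypothetical, and a real quality check would score actual end-to-end outputs rather than return a fixed answer.&lt;/p&gt;

```python
def pick_model(candidates, passes_quality_bar):
    """Return the first candidate (cheapest first) that meets the bar."""
    for model_name in candidates:
        if passes_quality_bar(model_name):
            return model_name
    raise ValueError("no candidate model met the quality bar")


# Ordered cheapest first, so Pro is only reached if Flash fails.
CANDIDATES = ["gemini-2.5-flash", "gemini-2.5-pro"]

# Stand-in evaluators: one where Flash is good enough, one where it is not.
coach_model = pick_model(CANDIDATES, lambda name: True)
strict_model = pick_model(CANDIDATES, lambda name: name.endswith("pro"))

print(coach_model, strict_model)
```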




&lt;h2&gt;
  
  
  Limitations and trade-offs
&lt;/h2&gt;

&lt;p&gt;These are real engineering trade-offs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Food factors are modelled estimates, not measurements.&lt;/strong&gt; Every food activity carries &lt;code&gt;confidence: "estimated"&lt;/code&gt;. A veg thali is modelled at 0.90 kg CO₂e on a representative dal + sabzi + roti + rice basis; real meals vary widely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Electricity-from-bill uses a flat ₹8/kWh.&lt;/strong&gt; Any bill→kWh conversion is forced to &lt;code&gt;confidence: "estimated"&lt;/code&gt; regardless of the grid factor's confidence level. Real tariffs are slab-based and DISCOM-specific.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grid factors are state-level annual averages&lt;/strong&gt; — no DISCOM-level or time-of-day resolution. Coal-belt outliers (Jharkhand 1.05) are modelled adjustments above the regional average, not directly measured.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The reasoning stream is a paced replay&lt;/strong&gt; (80 ms/event) of the agent's real structured trace, not token-level model streaming. It faithfully shows the reasoning steps the agent produced; it is not raw Gemini output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One reasoning trace is denormalised across the 1–3 items&lt;/strong&gt; a single Gemini call produces. The UI reads the first item's trace as canonical for the session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three agents, not four.&lt;/strong&gt; Recommendations are not adversarially reviewed by a Devil's Advocate model — cut for time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;English-only, India-only, web-only&lt;/strong&gt; — deliberate for v1, but a real limit for non-metro and non-English users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coach runs on Gemini 2.5 Flash, not Pro&lt;/strong&gt;, as a pragmatic deploy-stability trade-off from the submission window. Recommendation quality is acceptable end-to-end, though in principle it is capped below what Pro could produce. Reverting to Pro is a single-line change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;min-instances=1&lt;/code&gt;&lt;/strong&gt; on Cloud Run — small standing cost chosen over cold-start latency for the demo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Firestore on the Spark free tier&lt;/strong&gt; is sized for the demo, not for load.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coverage is 99.68%, not 100%&lt;/strong&gt; — five defensive branches in &lt;code&gt;firestore_service.py&lt;/code&gt; remain intentionally uncovered.&lt;/li&gt;
&lt;/ul&gt;
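&lt;p&gt;The paced replay in the list above can be sketched as an async generator that walks an already-produced trace and sleeps between events. The names &lt;code&gt;paced_replay&lt;/code&gt; and &lt;code&gt;collect&lt;/code&gt; and the trace contents are hypothetical; only the 80 ms pacing comes from the project.&lt;/p&gt;

```python
import asyncio
import time

PACE_SECONDS = 0.08  # 80 ms per event, as stated above


async def paced_replay(trace_steps, pace=PACE_SECONDS):
    """Yield already-produced trace steps one at a time with a fixed delay."""
    for index, step in enumerate(trace_steps):
        if index:
            await asyncio.sleep(pace)  # pause between events, not before the first
        yield step


async def collect(trace_steps):
    """Drain the replay and report how long it took."""
    started = time.monotonic()
    seen = [step async for step in paced_replay(trace_steps)]
    return seen, time.monotonic() - started


# Hypothetical trace content, for illustration only.
TRACE = ["load activities", "compare to baseline", "rank actions"]
steps, elapsed = asyncio.run(collect(TRACE))
print(steps, round(elapsed, 2))
```

&lt;p&gt;Because the trace exists before the replay starts, the pacing is purely presentational: order and content are fixed, and only the arrival times change.&lt;/p&gt;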




&lt;h2&gt;
  
  
  Tech stack and quality bar
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Choice&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Language&lt;/td&gt;
&lt;td&gt;Python 3.13.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;FastAPI (async, Pydantic v2)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Server-rendered Jinja2 + Tailwind (CDN) + vanilla ES-module JS, no build step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI — Logger&lt;/td&gt;
&lt;td&gt;Gemini 2.5 Flash (function calling)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI — Analyst&lt;/td&gt;
&lt;td&gt;Gemini 2.5 Pro&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI — Coach&lt;/td&gt;
&lt;td&gt;Gemini 2.5 Flash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;Firestore (Spark free tier)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth&lt;/td&gt;
&lt;td&gt;Firebase Authentication — Google Sign-In&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hosting&lt;/td&gt;
&lt;td&gt;Cloud Run, &lt;code&gt;asia-south1&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Secrets&lt;/td&gt;
&lt;td&gt;Google Secret Manager → env at runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logging&lt;/td&gt;
&lt;td&gt;Structured JSON → Cloud Logging&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;483 tests passing, 99.68% line + branch coverage against a 95% gate. &lt;code&gt;mypy --strict&lt;/code&gt; clean across all source files. &lt;code&gt;ruff&lt;/code&gt; clean, &lt;code&gt;bandit&lt;/code&gt; clean, &lt;code&gt;pip-audit&lt;/code&gt; clean. Per-user rate limiting via &lt;code&gt;slowapi&lt;/code&gt;. Full OWASP Top 10 (2021) walkthrough shipped as &lt;code&gt;SECURITY.md&lt;/code&gt;. CI runs ruff, black, mypy, pytest, bandit, and a Docker build on every push.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://carbonsaathi-ahkpdce5pa-el.a.run.app" rel="noopener noreferrer"&gt;Live demo&lt;/a&gt; — sign in with any Google account, complete the onboarding (state + home profile), log an activity, and open Insights to watch the pipeline run live.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/apoorvgpt9/carbonsaathi" rel="noopener noreferrer"&gt;Source code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Built solo for PromptWars Challenge 3 (June 2026). Data sources: CEA CO₂ Baseline Database v19.0, ICCT India, DMRC, FAO Food Emissions Database, IRRI, Indian dietary survey data. License: MIT.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>gemini</category>
      <category>fastapi</category>
      <category>hackathon</category>
    </item>
    <item>
      <title>Captain Cool: How I Built a Multi-Agent IPL Strategist with Gemini in 3 Hours</title>
      <dc:creator>Apoorv Gupta</dc:creator>
      <pubDate>Sun, 17 May 2026 12:38:41 +0000</pubDate>
      <link>https://dev.to/apoorv_gupta_a6a859429e14/captain-cool-how-i-built-a-multi-agent-ipl-strategist-with-gemini-in-3-hours-4egc</link>
      <guid>https://dev.to/apoorv_gupta_a6a859429e14/captain-cool-how-i-built-a-multi-agent-ipl-strategist-with-gemini-in-3-hours-4egc</guid>
      <description>&lt;p&gt;Cricket is a captain's game. Every over, someone has to make a call — who bowls, who bats next, when to take the timeout, when to play the Impact Player. These decisions live in the gap between data and instinct.&lt;/p&gt;

&lt;p&gt;I wanted to build an AI that operates in that gap — not just analysing the match, but &lt;em&gt;arguing about it&lt;/em&gt;. An AI that proposes a call, gets challenged on it, defends or revises, and then explains the whole debate in cricket language any fan can understand.&lt;/p&gt;

&lt;p&gt;This is Captain Cool. Here's how I built it at the Agentic Premier League in Pune, powered by Google Gemini.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem with Most "AI" Sports Tools
&lt;/h2&gt;

&lt;p&gt;Before I built anything, I looked at what other people had built at similar events. The pattern was consistent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A polished UI&lt;/li&gt;
&lt;li&gt;Hardcoded or mock data&lt;/li&gt;
&lt;li&gt;A README full of words like "AI-powered", "neural analytics", "predictive intelligence"&lt;/li&gt;
&lt;li&gt;Zero actual Gemini API calls in the codebase&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ones that &lt;em&gt;did&lt;/em&gt; use Gemini made a single API call and formatted the response into multiple labelled sections — calling it "multi-agent" when it was really just one prompt with four JSON fields.&lt;/p&gt;

&lt;p&gt;I wanted to build something where you could open the code and see four distinct agents, each making their own Gemini call, each with their own system prompt, each contributing something the others can't.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture: The Debate Pipeline
&lt;/h2&gt;

&lt;p&gt;The system runs a 5-step sequential pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Stats Analyst (with function calling)
    ↓ statistical report
Strategist (proposes tactical call)
    ↓ proposal
Devil's Advocate (challenges the proposal)
    ↓ challenge
Strategist (defends or revises)
    ↓ final decision
Commentator (translates to cricket talk)
    ↓ broadcast commentary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each step is a separate Gemini call. Each agent has its own system prompt. The Strategist runs &lt;em&gt;twice&lt;/em&gt; — once to propose and once to revise after the challenge.&lt;/p&gt;

&lt;p&gt;The entire pipeline streams to the frontend via Server-Sent Events, so you watch the debate unfold in real time, agent by agent.&lt;/p&gt;




&lt;h2&gt;
  
  
  Agent 1: Stats Analyst — Real Function Calling
&lt;/h2&gt;

&lt;p&gt;The Stats Analyst is the only agent with tool access. This is where the genuine agentic behavior lives.&lt;/p&gt;

&lt;p&gt;I defined four tool functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_player_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;player_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stat_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;phase&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get batting or bowling stats for a player in powerplay/middle/death&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_matchup_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bowler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get head-to-head stats between a specific batter and bowler&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_win_probability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batting_team&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wickets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;overs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;innings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Calculate win probability with momentum indicator&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_venue_conditions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;venue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get pitch conditions, avg scores, altitude/dew effects&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I passed these to Gemini as native tools — not as injected text in the prompt. Gemini sees the function signatures, decides which ones to call, calls them with appropriate arguments, receives the results, and synthesizes a statistical report.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ALL_TOOLS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function-calling loop runs until Gemini stops requesting tools — typically 4-7 calls per analysis. Every call is logged so you can watch Gemini's reasoning chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;TOOL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;CALL&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;get_player_stats('Priyansh&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Arya',&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;'batting',&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;'powerplay')&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;TOOL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;RESULT&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"sr"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;226.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"avg"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;36.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"vs_pace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;218.0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;TOOL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;CALL&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;get_matchup_data('Priyansh&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Arya',&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;'Bhuvneshwar&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Kumar')&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;TOOL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;RESULT&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"balls"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"runs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"dismissals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"sr"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;66.7&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;TOOL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;CALL&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;calculate_win_probability('PBKS',&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;209&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="err"&gt;')&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;TOOL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;RESULT&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"batting_win_prob"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;31.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"required_run_rate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.44&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
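&lt;p&gt;The dispatch half of that loop can be sketched without the Gemini SDK. Everything here is an assumption made for illustration: the tool bodies are stubs that return the values from the log above, &lt;code&gt;run_tool_calls&lt;/code&gt; is a hypothetical name, and the requests are hard-coded stand-ins for what the model would ask for.&lt;/p&gt;

```python
import json


def get_player_stats(player_name, stat_type, phase):
    # Stub result copied from the log above; a real tool would query match data.
    return {"sr": 226.5, "avg": 36.4, "vs_pace": 218.0}


def get_matchup_data(batter, bowler):
    # Stub result copied from the log above.
    return {"balls": 12, "runs": 8, "dismissals": 2, "sr": 66.7}


TOOLS = {"get_player_stats": get_player_stats, "get_matchup_data": get_matchup_data}


def run_tool_calls(requests):
    """Execute each requested tool by name and log the call and its result."""
    results = []
    for request in requests:
        tool = TOOLS[request["name"]]
        result = tool(*request["args"])
        print("[TOOL CALL]", request["name"], request["args"])
        print("[TOOL RESULT]", json.dumps(result))
        results.append(result)
    return results


# Hard-coded stand-ins for the calls the model would request.
requested = [
    {"name": "get_player_stats", "args": ["Priyansh Arya", "batting", "powerplay"]},
    {"name": "get_matchup_data", "args": ["Priyansh Arya", "Bhuvneshwar Kumar"]},
]
tool_results = run_tool_calls(requested)
```

&lt;p&gt;In the real loop the results go back to the model, which either requests more tools or writes the report; the sketch stops after one round.&lt;/p&gt;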



&lt;p&gt;&lt;strong&gt;System Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are the Stats Analyst for an IPL team's think tank. Your role is 
to process match data and provide ONLY statistical analysis — no opinions, 
no recommendations.

You have access to tools. USE THEM before answering. Do not guess statistics.
Call multiple tools to build a comprehensive picture.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Agent 2: Strategist — The Captain's Call
&lt;/h2&gt;

&lt;p&gt;The Strategist uses &lt;code&gt;gemini-2.5-pro&lt;/code&gt; rather than the &lt;code&gt;gemini-2.5-flash&lt;/code&gt; model behind the Stats Analyst, because the tactical call is the highest-stakes output in the system.&lt;/p&gt;

&lt;p&gt;It receives the stats report and proposes one specific decision in a structured format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DECISION: [One clear sentence]
REASONING: [2-3 sentences in cricket language, not ML jargon]
CONFIDENCE: [High/Medium/Low] — [one-line justification]
RISK: [What could go wrong — specific player and scenario]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The structured format was deliberate. It forces the model to commit to one call and own the risk, rather than hedging with "it depends."&lt;/p&gt;
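&lt;p&gt;A fixed format also makes the reply easy to check in code. The parser below is a sketch, not part of Captain Cool: &lt;code&gt;parse_call&lt;/code&gt; is a hypothetical helper, it assumes each field fits on one line, and the sample reply is abridged from the demo output later in this post.&lt;/p&gt;

```python
FIELDS = ("DECISION", "REASONING", "CONFIDENCE", "RISK")


def parse_call(text):
    """Split 'LABEL: value' lines into a dict, rejecting incomplete replies."""
    parsed = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() in FIELDS:
            parsed[label.strip()] = value.strip()
    missing = [field for field in FIELDS if field not in parsed]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return parsed


# Abridged from the demo output; each field is kept to a single line.
reply = """DECISION: Promote Shreyas Iyer to No. 3
REASONING: Iyer averages 54.7 at SR 163 this season.
CONFIDENCE: High - matchup data and Bhuvi's form both point this way.
RISK: Kills scoring rate if Arya survives."""

call = parse_call(reply)
print(call["DECISION"])
```

&lt;p&gt;Raising on a missing field means a reply that hedges or drops the risk line fails loudly instead of flowing on to the next agent.&lt;/p&gt;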

&lt;p&gt;&lt;strong&gt;System Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are the Captain. Think step by step:
1. What phase is the match in?
2. What does the data say about available options?
3. What is the highest-leverage decision right now?

Your decision must cover exactly ONE of:
- Who bowls the next over
- Batting order change
- Field placement shift
- Strategic timeout timing
- Impact Player deployment

Be bold. A captain who hedges every call loses matches.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Agent 3: Devil's Advocate — The Contrarian
&lt;/h2&gt;

&lt;p&gt;This is the innovation that makes Captain Cool different from everything else at the event.&lt;/p&gt;

&lt;p&gt;Most AI systems give you one answer. Captain Cool shows you the argument &lt;em&gt;against&lt;/em&gt; that answer.&lt;/p&gt;

&lt;p&gt;The Devil's Advocate receives the Strategist's proposal and must:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identify the biggest assumption the Strategist made&lt;/li&gt;
&lt;li&gt;Present a specific counter-scenario where the decision fails (with player names and stats)&lt;/li&gt;
&lt;li&gt;Suggest a concrete alternative&lt;/li&gt;
&lt;li&gt;Deliver a verdict: AGREE / DISAGREE / CONDITIONAL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;System Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are the Devil's Advocate in the IPL captain's think tank.
Your ONLY job is to challenge the Strategist's proposal.

You are not contrarian for fun. You genuinely find where this decision fails.
Think like a batting coach who knows exactly how the opposition exploits this.

CHALLENGE: [Core weakness — specific, with stats]
COUNTER-SCENARIO: [Specific failure mode — over, player, numbers]
ALTERNATIVE: [Different decision + why it's better]
VERDICT: [AGREE / DISAGREE / CONDITIONAL]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Agent 4: Commentator — Cricket for Everyone
&lt;/h2&gt;

&lt;p&gt;The Commentator receives the full debate — all four prior outputs — and translates everything into 4-5 sentences of broadcast-style commentary.&lt;/p&gt;

&lt;p&gt;No ML jargon. No bullet points. Just cricket talk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a cricket commentator in the style of Harsha Bhogle.

Rules:
- Write in present tense, as if commentating live
- Reference specific players by name
- Include the why-this-not-that explanation
- Sound like an IPL broadcast, not an academic paper
- End with one sentence about what to watch for next
- Under 100 words, pure flowing prose
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Orchestrator — Wiring It Together
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;orchestrator.py&lt;/code&gt; runs the pipeline sequentially, yielding SSE events at each step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_debate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;match_state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MatchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AsyncGenerator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 1
&lt;/span&gt;    &lt;span class="n"&gt;stats_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stats_analyst&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;analyze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match_context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_sse_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stats_analyst&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...})&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2
&lt;/span&gt;    &lt;span class="n"&gt;proposal_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strategist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;propose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match_context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stats_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_sse_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strategist&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...})&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3
&lt;/span&gt;    &lt;span class="n"&gt;challenge_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;devils_advocate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;challenge&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_sse_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;devils_advocate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...})&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 4 — Strategist reviews and revises
&lt;/span&gt;    &lt;span class="n"&gt;revision_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strategist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;revise&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_sse_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strategist&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...})&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 5
&lt;/span&gt;    &lt;span class="n"&gt;commentary_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;commentator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;narrate&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_sse_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;commentator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The frontend subscribes to the SSE stream and renders each agent's output as it arrives — color-coded, labeled, and displayed sequentially.&lt;/p&gt;
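&lt;p&gt;The &lt;code&gt;_sse_event&lt;/code&gt; helper is not shown above. A plausible shape, assuming it follows the standard Server-Sent Events wire format and sends JSON payloads, is sketched here as a standalone function; the &lt;code&gt;step&lt;/code&gt; key is made up for the example.&lt;/p&gt;

```python
import json


def sse_event(event, data):
    """Format one Server-Sent Events message: event name, JSON data, blank line."""
    payload = json.dumps(data)  # no indent, so the payload stays on one line
    return "event: " + event + "\ndata: " + payload + "\n\n"


message = sse_event("agent_output", {"agent": "stats_analyst", "step": 1})
print(message, end="")
```

&lt;p&gt;The trailing blank line is what tells the browser's &lt;code&gt;EventSource&lt;/code&gt; that the message is complete and can be dispatched.&lt;/p&gt;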




&lt;h2&gt;
  
  
  Live Demo: PBKS vs RCB, Dharamsala
&lt;/h2&gt;

&lt;p&gt;Here's what the system produced for today's real match:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input:&lt;/strong&gt; PBKS chasing 209, 0/0 in 0.0 overs. Bhuvneshwar Kumar to open the bowling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stats Analyst output&lt;/strong&gt; (after 5 tool calls):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Priyansh Arya SR 226.5 overall — but drops to 66.7 vs Bhuvneshwar with 2 dismissals in just 12 balls. Bhuvneshwar leads IPL 2026 wicket charts at 22 wickets, economy 7.46. HPCA Dharamsala altitude amplifies his swing. PBKS win probability: 31% at required RRR of 10.44.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Strategist proposal:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;DECISION: Promote Shreyas Iyer to No. 3 — ready to walk in on first powerplay wicket.&lt;br&gt;
REASONING: Iyer averages 54.7 at SR 163 this season. The Arya-Bhuvi matchup (SR 66.7, 2 dismissals) makes an early wicket almost certain. PBKS need consolidation followed by acceleration.&lt;br&gt;
CONFIDENCE: High — matchup data and Bhuvi's form both point this way.&lt;br&gt;
RISK: Kills scoring rate if Arya survives. Iyer coming in cold vs Hazlewood at altitude.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Devil's Advocate challenge:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;CHALLENGE: Iyer at 3 kills the aggression PBKS need at 10.44 RRR.&lt;br&gt;
COUNTER-SCENARIO: Stoinis as Impact Player provides left-arm option vs right-arm Bhuvi — disrupts the angle immediately.&lt;br&gt;
ALTERNATIVE: Stoinis at 3 on early wicket, not Iyer. Left-right combination.&lt;br&gt;
VERDICT: CONDITIONAL — Iyer at 3 works only if wicket falls after over 5.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Strategist revision:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;VERDICT: REVISE. Stoinis as Impact Player at 3 preserves the left-right combination. Iyer holds at 4. The Devil's Advocate correctly identified the left-hander gap.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Commentator:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The think tank goes for Stoinis as the impact card — left-hander at three to counter Bhuvneshwar's angle. They looked at Iyer, but with a run rate needing to touch double digits from ball one, this isn't a day for safety first. Watch for the Stoinis signal from the dugout early — if Punjab's openers go inside five, the whole innings shape changes."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Prompt structure matters more than model size.&lt;/strong&gt; The Devil's Advocate prompt was the hardest to write: making it genuinely contrarian without being destructively negative took four iterations. The final version uses the framing "think like a batting coach who knows exactly how the opposition exploits this", which produced the best challenges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SSE streaming is the right output format for agent systems.&lt;/strong&gt; Emitting each agent's output as soon as that agent finishes, rather than waiting for all five, made the system feel alive in a way a JSON dump never could. The judges watched the debate unfold in real time. That experience is the demo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The gap between function calling and RAG shows in the output quality.&lt;/strong&gt; When Gemini chooses its own tools and sequences them autonomously, the analysis cites specific numbers from specific tool calls. When data is injected into the prompt manually, the model tends to summarize it. The difference in specificity is noticeable.&lt;/p&gt;
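&lt;p&gt;To make the tool-calling side concrete, here is a rough, SDK-free sketch of the dispatch step: the model names a tool and its arguments, and the backend runs it and returns a structured result. The tool name &lt;code&gt;get_matchup&lt;/code&gt;, the call shape, and the in-memory data are all hypothetical; in a real build the name and arguments would come from the model's function-call response. The figures mirror the matchup quoted in the demo.&lt;/p&gt;

```python
from typing import Callable

# Hypothetical in-memory data. 12 balls and 2 dismissals come from the demo;
# 8 runs is the value implied by the quoted 66.7 strike rate.
MATCHUPS = {
    ("Priyansh Arya", "Bhuvneshwar Kumar"): {"runs": 8, "balls": 12, "dismissals": 2},
}


def get_matchup(batter: str, bowler: str) -> dict:
    """Hypothetical tool: head-to-head record with a derived strike rate."""
    record = dict(MATCHUPS[(batter, bowler)])
    record["strike_rate"] = round(100 * record["runs"] / record["balls"], 1)
    return record


# Registry mapping tool names to plain Python callables.
TOOLS: dict[str, Callable[..., dict]] = {"get_matchup": get_matchup}


def dispatch(call: dict) -> dict:
    """Run one model-requested call shaped like {"name": ..., "args": {...}}."""
    tool = TOOLS.get(call["name"])
    if tool is None:
        return {"error": f"unknown tool: {call['name']}"}
    return {"tool": call["name"], "result": tool(**call["args"])}


result = dispatch({
    "name": "get_matchup",
    "args": {"batter": "Priyansh Arya", "bowler": "Bhuvneshwar Kumar"},
})
print(result)
```

&lt;p&gt;Because each result comes back tagged with the tool that produced it, the model has exact figures to quote instead of a blob of context to paraphrase.&lt;/p&gt;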

&lt;p&gt;&lt;strong&gt;The Devil's Advocate pattern is underused.&lt;/strong&gt; Every AI project I've seen shows one answer. Showing the disagreement — the argument against the recommended call — builds more trust than a confident single response. Judges found it more interesting than the decision itself.&lt;/p&gt;
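&lt;p&gt;The pattern itself is small. Below is a minimal sketch of the propose, challenge, revise round with stub functions standing in for the model calls; the class and function names are hypothetical, and the strings only paraphrase the demo above.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Debate:
    """Transcript of one propose, challenge, revise round."""
    turns: list = field(default_factory=list)

    def add(self, agent: str, text: str) -> None:
        self.turns.append((agent, text))


def run_debate(propose, challenge, revise) -> Debate:
    """Run the three stages in order and keep every turn visible."""
    debate = Debate()
    proposal = propose()
    debate.add("strategist", proposal)
    objection = challenge(proposal)
    debate.add("devils_advocate", objection)
    debate.add("strategist_revision", revise(proposal, objection))
    return debate


# Stub agents standing in for model calls (assumption: each returns plain text).
debate = run_debate(
    propose=lambda: "Promote Iyer to No. 3",
    challenge=lambda p: f"Challenge to '{p}': Stoinis at 3 instead",
    revise=lambda p, o: "REVISE: Stoinis at 3, Iyer holds at 4",
)

for agent, text in debate.turns:
    print(f"{agent}: {text}")
```

&lt;p&gt;Keeping the objection in the transcript, rather than only the revised call, is what makes the disagreement visible to the user.&lt;/p&gt;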




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 2.5 Pro&lt;/strong&gt; — Strategist agent (proposal phase)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 2.5 Flash&lt;/strong&gt; — All other agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini native function calling&lt;/strong&gt; — Stats Analyst tool use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python FastAPI&lt;/strong&gt; — Backend with SSE streaming&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;React 18 + Tailwind&lt;/strong&gt; — Frontend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Cloud Run&lt;/strong&gt; — Deployment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Antigravity&lt;/strong&gt; — Primary IDE throughout the build&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Live Demo Screenshots:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hp4xtalqlil15w5colk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hp4xtalqlil15w5colk.png" alt=" " width="800" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1evde72aiykble2h90j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1evde72aiykble2h90j.png" alt=" " width="800" height="652"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vk9mspbhbxelzc1vcc2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vk9mspbhbxelzc1vcc2.png" alt=" " width="800" height="728"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6nzge8tkial2o7mwtzmz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6nzge8tkial2o7mwtzmz.png" alt=" " width="800" height="994"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/apoorvgpt9/APL-multi-agent-match-strategist.git" rel="noopener noreferrer"&gt;https://github.com/apoorvgpt9/APL-multi-agent-match-strategist.git&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live Demo:&lt;/strong&gt; Upcoming&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google AI Studio Prompts:&lt;/strong&gt; &lt;a href="https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%5B%221c-J9lQnmzb454DidqyrkeZOgOKWPDqMs%22%5D,%22action%22:%22open%22,%22userId%22:%22101674726802407992704%22,%22resourceKeys%22:%7B%7D%7D&amp;amp;usp=sharing" rel="noopener noreferrer"&gt;https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%5B%221c-J9lQnmzb454DidqyrkeZOgOKWPDqMs%22%5D,%22action%22:%22open%22,%22userId%22:%22101674726802407992704%22,%22resourceKeys%22:%7B%7D%7D&amp;amp;usp=sharing&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Huge thanks to GDG Cloud Pune for putting together the Agentic Premier League. I built Captain Cool, a multi-agent IPL strategist powered by Gemini, while watching PBKS chase 209 at Dharamsala live on the big screen.&lt;/p&gt;

&lt;p&gt;#GoogleCloud #GoogleCloudAPL #BuildwithAI #GDGCloudPune&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Built at Agentic Premier League, GDG Cloud Pune, May 17, 2026&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Vibe-coded with Google Antigravity in 3 hours&lt;/em&gt;&lt;/p&gt;

</description>
      <category>gemini</category>
      <category>googlecloud</category>
      <category>agenticai</category>
      <category>gdgcloudpune</category>
    </item>
  </channel>
</rss>
