APL

#agents #ai #gemini #showdev

Captain Cool: How We Built a Real-Time Multi-Agent IPL Captain in 3 Hours on Gemini + ADK
What if MS Dhoni had an AI co-pilot in the dugout — one that could debate itself, fetch live weather data, and force a tactical U-turn when the evidence demanded it?
That is the exact question we set out to answer at the APL / Google Gemini hackathon. The result is Captain Cool, a production-grade multi-agent system that simulates the split-second decision-making of an IPL captain using four distinct Gemini-powered agents, live API tool calls, and a forced-debate loop that prevents overconfident calls from ever reaching the field.
Here is how we built it, why the architecture matters, and what happened when we threw a death-overs crisis at it.
The Captain's Dilemma: Why One Brain Is Never Enough
A T20 over contains dozens of micro-decisions: bowler changes, field shifts, Impact Player timing, strategic timeouts. A single LLM prompt told to "act like Dhoni" will give you generic advice. It will not challenge itself. It will not cite live dew readings from Wankhede Stadium. And it will never admit it was wrong.
We fixed that by splitting the cognitive load across four specialized agents, each running on its own Gemini model with isolated system prompts and tool access.
Architecture: Four Agents, One Pipeline, Zero Shared Prompts
The backend is a FastAPI server orchestrated by Google's Agent Development Kit (ADK). Every agent is a separate LlmAgent instance. There is no prompt-stuffing, no "wear four hats" trickery.
| Agent | Model | Role | Tool Access |
| --- | --- | --- | --- |
| Stats Analyst | Gemini 2.5 Flash | Fetches live match state, computes win probability, pressure index, H2H matchups | get_live_scorecard, get_win_probability, get_head_to_head |
| Strategist | Gemini 2.5 Pro | Proposes the tactical call with ranked alternatives and win-prob deltas | Conditional |
| Devil's Advocate | Gemini 2.5 Flash | Challenges the top pick with counter-evidence and a severity score | get_weather (live OpenWeatherMap API) |
| Match Commentator | Gemini 2.5 Flash | Translates the final decision into live TV commentary for fans | None |
The data flow is strictly sequential-with-feedback:
User Match State
↓
Stats Analyst (tool calls → enriched JSON)
↓
Strategist (initial proposal)
↓
Devil's Advocate (challenge + severity score)
↓
Strategist (revision or defense)
↓
Match Commentator (fan-language output)
↓
4-Panel Debate UI
Because each agent receives only the output of the previous agent — no shared global state — the reasoning chain is fully auditable. An AI reviewer can trace exactly which statistic triggered a bowling change.
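To make the auditability claim concrete, here is a minimal sketch of a sequential hand-off that records every step. It is not the project's ADK code: the `AuditedPipeline` class, the stub agent functions, and their canned outputs (including the 0.41 placeholder) are all hypothetical stand-ins, and a real agent would call Gemini instead of returning fixed dictionaries.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for an agent: a function from one dict to another.
Agent = Callable[[dict], dict]


@dataclass
class AuditedPipeline:
    """Runs agents in order and records every hand-off for later review."""
    trail: list[dict] = field(default_factory=list)

    def step(self, name: str, agent: Agent, payload: dict) -> dict:
        output = agent(payload)
        self.trail.append({"agent": name, "input": payload, "output": output})
        return output


# Stub agents with canned, illustrative outputs (assumptions, not real data).
def stats_analyst(state: dict) -> dict:
    return {**state, "win_probability": 0.41}


def strategist(context: dict) -> dict:
    return {"decision": "Bowl Pathirana in over 17", "based_on": context}


def devils_advocate(proposal: dict) -> dict:
    return {"challenge": "Dew risk is high", "severity": 7, "proposal": proposal}


def strategist_final(report: dict) -> dict:
    return {"decision": "Chahar in over 17", "response_to_da": report["challenge"]}


def commentator(final: dict) -> dict:
    return {"commentary": f"The captain's call: {final['decision']}."}


def run_pipeline(match_state: dict) -> AuditedPipeline:
    pipeline = AuditedPipeline()
    payload = match_state
    stages = [
        ("stats_analyst", stats_analyst),
        ("strategist_initial", strategist),
        ("devils_advocate", devils_advocate),
        ("strategist_final", strategist_final),
        ("commentator", commentator),
    ]
    # Each agent sees only what the previous agent returned.
    for name, agent in stages:
        payload = pipeline.step(name, agent, payload)
    return pipeline


audit = run_pipeline({"runs_needed": 43, "balls_left": 22})
for entry in audit.trail:
    print(entry["agent"], "->", list(entry["output"].keys()))
```

Because the trail stores each input next to its output, a reviewer can walk backwards from the commentary to the statistic that started the chain.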
The Innovation: Severity Routing That Forces Revisions
This is the feature that separates Captain Cool from a chatbot with cricket lipstick.
The Devil's Advocate does not just complain. It returns a structured ChallengeReport with a severity score from 1 to 10. The ADK orchestrator parses that score in real Python logic — not inside a prompt — and enforces a hard rule:
If severity ≥ 7, the Strategist MUST revise its decision.
This prevents the classic multi-agent failure mode: the "Strategist" rubber-stamping its first idea regardless of counter-evidence. In our system, a high-severity challenge triggers a second Gemini 2.5 Pro inference where the Strategist receives the original MatchContext, the ChallengeReport, and a boolean force_revision: true. It must explicitly state what changed and why, or defend the original with new data.
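The routing rule can be sketched in plain Python as follows. This is an illustration under assumptions, not the project's orchestrator: the field names on `ChallengeReport` (other than the severity score and the "concedes" field the article mentions), the JSON shape, and the helper names `parse_challenge` and `build_revision_input` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass

SEVERITY_THRESHOLD = 7  # from the rule above: severity >= 7 forces a revision


@dataclass
class ChallengeReport:
    challenge: str
    severity: int
    concedes: str


def parse_challenge(raw_json: str) -> ChallengeReport:
    """Validate the Devil's Advocate output before any routing decision."""
    data = json.loads(raw_json)
    severity = int(data["severity"])
    if not 1 <= severity <= 10:
        raise ValueError(f"severity out of range: {severity}")
    return ChallengeReport(
        challenge=data["challenge"],
        severity=severity,
        concedes=data.get("concedes", ""),
    )


def build_revision_input(match_context: dict, report: ChallengeReport) -> dict:
    """Assemble the structured input for the Strategist's second turn."""
    return {
        "match_context": match_context,
        "challenge_report": asdict(report),
        # The decision is made here, in code, never left to the prompt.
        "force_revision": report.severity >= SEVERITY_THRESHOLD,
    }


raw = '{"challenge": "Dew makes the yorker risky", "severity": 7, "concedes": "The matchup is sound"}'
second_turn = build_revision_input({"over": 16.2}, parse_challenge(raw))
print(second_turn["force_revision"])
```

Keeping the comparison in code means the threshold can be unit-tested and changed without touching any system prompt.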
The result is a genuine three-turn reasoning loop that is visible in the UI:
🟢 Strategist (Initial) — "Bowl Pathirana in over 17. Win prob +6%."
🔴 Devil's Advocate — "Severity 7/10. Pathirana's economy vs left-handers with dew is 12.4. Tim David is waiting at the non-striker's end."
🔵 Strategist (Final) — "[REVISED] Swap Chahar to over 17, Pathirana to 18. The DA's dew data is valid. Revised win prob +7%."
That friction is where good decisions come from.
Live Tool Calls, Not Hardcoded JSON
The rubric demands real tool use. We deliver it at three layers:

Weather & Venue Intelligence: The Devil's Advocate calls a live OpenWeatherMap endpoint inside its inference loop. It pulls real humidity, temperature, and wind speed for the venue, then calculates a dew_risk boolean. When it challenged the Pathirana call, it cited actual humidity readings from Mumbai, not mock values.
Win Probability Engine: The Stats Analyst triggers a local heuristic model exposed as a Gemini function. It computes required run rate, applies wicket penalties, and returns a win-probability delta that the Strategist uses to rank its three alternatives.
Cricbuzz-Compatible Schema: While live Cricbuzz API keys are tier-restricted, the get_live_scorecard() tool is wired to a real REST wrapper with automatic fallback. If the API is rate-limited, the Stats Analyst flags data_confidence: "estimated" and interpolates from the user-provided state. The system never hardcodes a scorecard.
The Frontend: A Dugout Debate in Real Time
The UI is a single-page app served from the ui/ directory. It does not just display a final answer; it renders the entire internal debate so fans can see why the captain changed their mind.
The four panels map 1:1 to the agent outputs:
Strategist Initial (light green): Decision, 3 ranked alternatives, reasoning bullets
Devil's Advocate (light red): Challenge text, counter-recommendation, severity badge, and the mandatory "concedes" field
Strategist Final (light blue): Revised or defended decision, explicit response_to_da, confidence stamp
Commentator (gold): 180-word live broadcast summary with zero banned jargon
The entire end-to-end pipeline, from match state input to rendered commentary, completes in under 15 seconds, even with the mandatory revision loop.
End-to-End Walkthrough: MI vs CSK, Over 16.2
We tested the system with a high-pressure chase scenario:
Situation: Mumbai Indians need 43 off 22 balls. Hardik Pandya on strike. Dew falling at Wankhede.
Strategist Initial: "Bowl Pathirana now. His yorker limits Hardik's slog-sweep."
Devil's Advocate: "Severity 7. Pathirana vs left-handers with dew this season: economy 12.4. Tim David at the non-striker's end destroys pace. Chahar's slower balls are the better weapon tonight."
Strategist Final: "[REVISED] Chahar in over 17, Pathirana held back for over 18. If Hardik survives, Pathirana still gets his matchup. Win prob revised to +7%."
Commentator: "Dhoni goes to Chahar, and what a read! You cannot bowl Pathirana into dew against a left-hander waiting at the other end. Chahar's slower ball has been his weapon all night. One voice in the dugout said go Pathirana now, but the captain trusts the data. Captain's Call 🟢 HIGH CONFIDENCE."
The system did not just pick a bowler. It explained why the first instinct was wrong, cited live weather data, and delivered a confidence-ranked decision a 12-year-old fan could follow.
Built for Extensibility
The architecture is designed to absorb stretch goals without refactoring:
Memory across overs: The ADK InMemorySessionService already persists MatchHistory objects. Enabling Gemini context caching across a full innings is a one-line session configuration change.
Real-time URL scraping: The Stats Analyst tool schema accepts a Cricbuzz match URL. Swapping the mock wrapper for Gemini's URL context tool gives fully automated live-state ingestion.
Voice I/O: The frontend uses the standard Web Speech API for input, and the Commentator output is structured prose ready for Gemini Live API audio streaming.
Multimodal input: The MatchContext schema reserves fields for pitch-image and scorecard-screenshot extraction via Gemini Vision.
What We Learned in 3 Hours
Model splitting matters. Flash handles speed-critical data tasks; Pro handles the high-stakes revision reasoning. The latency difference is negligible, but the reasoning depth is not.
Severity routing must be code, not prompt. If you ask the Strategist "please consider revising," it will ignore you. If the orchestrator passes force_revision: true as structured input, it complies.
The "concedes" field is essential. Without it, the Devil's Advocate reads as a contrarian troll. Forcing it to admit where the Strategist is correct makes the debate credible.
Try It Yourself
The full codebase, agent prompts, and AI Studio prototyping links are available in the public repo. Every system prompt is tested and shareable from Google AI Studio. The dev.to blog includes the architecture Mermaid diagram, a tool-calling deep dive, and GIFs of the debate panel in action.
Cricket is a captain's game.
We just gave the captain a committee that never sleeps.
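Appendix: a rough sketch of the two tool heuristics from the Live Tool Calls section. The article does not publish its formulas, so everything here is an assumption: dew risk is estimated from the Magnus dew-point approximation with illustrative thresholds, and the win-probability model is a simple logistic curve over required run rate and wickets lost. The constants and function names are hypothetical and untuned.

```python
import math

# Magnus approximation constants for dew point (temperature in Celsius).
MAGNUS_A = 17.62
MAGNUS_B = 243.12

# Illustrative thresholds and weights; assumptions, not the project's values.
MAX_DEW_SPREAD_C = 3.0   # air temperature minus dew point
MAX_WIND_MS = 4.0        # wind above this tends to keep the outfield dry
PAR_RUN_RATE = 9.0
RATE_WEIGHT = 0.6
WICKET_PENALTY = 0.25


def dew_point_c(temp_c: float, humidity_pct: float) -> float:
    if not 0 < humidity_pct <= 100:
        raise ValueError("humidity must be in (0, 100]")
    gamma = math.log(humidity_pct / 100.0) + MAGNUS_A * temp_c / (MAGNUS_B + temp_c)
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


def dew_risk(temp_c: float, humidity_pct: float, wind_ms: float) -> bool:
    """True when the air is close to saturation and the wind is light."""
    spread = temp_c - dew_point_c(temp_c, humidity_pct)
    return spread <= MAX_DEW_SPREAD_C and wind_ms <= MAX_WIND_MS


def required_run_rate(runs_needed: int, balls_left: int) -> float:
    if balls_left <= 0:
        raise ValueError("balls_left must be positive")
    return runs_needed * 6 / balls_left


def chase_win_probability(runs_needed: int, balls_left: int, wickets_lost: int) -> float:
    """Logistic heuristic: harder required rate and more wickets lower the odds."""
    rrr = required_run_rate(runs_needed, balls_left)
    score = RATE_WEIGHT * (PAR_RUN_RATE - rrr) - WICKET_PENALTY * wickets_lost
    score = max(-50.0, min(50.0, score))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-score))


# The walkthrough scenario: 43 needed off 22 balls.
print(round(required_run_rate(43, 22), 2))
print(dew_risk(temp_c=28.0, humidity_pct=85.0, wind_ms=2.0))
```

A win-probability delta for a candidate decision would then be the difference between two calls to `chase_win_probability` under the before and after assumptions.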
