Aditya Yuvraj Patil
Building a Multi-Agent AI Strategist with Gemini Function Calling, SSE & Vision API

I Built an AI Captain That Debates Before It Decides — Multi-Agent Cricket Strategy with Gemini Function Calling

"In cricket, the best captains don't react — they think three overs ahead."
Captain Cool is the AI version of that captain.


The Spark: What if AI Could Captain an IPL Team?

It was late in the IPL season. MI needed 44 off 24 balls. Dew was heavy. Pathirana had 2 overs left. Jadeja had bowled 2. The question every cricket fan was yelling at their TV: who bowls the next over?

That question became Captain Cool.

I didn't want to build another cricket statistics dashboard. I wanted to build something that argues with itself — a system where AI agents with different philosophies debate a live tactical decision, challenge each other's reasoning, and only then hand down a verdict.

The result is a multi-agent war room, powered by Gemini 2.5 Flash Function Calling, OpenWeatherMap, the Sportmonks Cricket API, real-time Server-Sent Events, and a Vision API that reads match screenshots. This post is the complete technical deep-dive into how I built it, what broke spectacularly, and what genuinely surprised me about modern agentic AI.


The Architecture: A War Room, Not a Chatbot

Most "AI" apps in the sports space are wrappers — they shove stats into a prompt and return a response. Captain Cool is fundamentally different. It operates as a multi-agent deliberation pipeline with four distinct agents that are architecturally separated by role, personality, and access to tools.

User Input (Match State)
        │
        ▼
┌──────────────────────────────────────────────────────┐
│              Stats Analyst (Gemini FC)               │
│  ┌────────────┐  ┌───────────────┐  ┌─────────────┐ │
│  │ Sportmonks │  │ OpenWeatherMap│  │ Win Prob Fn │ │
│  └────────────┘  └───────────────┘  └─────────────┘ │
└─────────────────────────┬────────────────────────────┘
                          │ Fact Sheet (structured)
                          ▼
              ┌───────────────────┐
              │    Strategist     │  ← proposes decision
              └─────────┬─────────┘
                        │
                        ▼
              ┌───────────────────┐
              │  Devil's Advocate │  ← challenges decision
              └─────────┬─────────┘
                        │
                        ▼
              ┌───────────────────┐
              │ Strategist Revised│  ← defends or pivots
              └─────────┬─────────┘
                        │
                        ▼
              ┌───────────────────┐
              │   Commentator     │  ← final verdict, drama
              └─────────┬─────────┘
                        │
                        ▼
              🏏 Captain's Call JSON

Every agent speaks through Server-Sent Events (SSE) — so the frontend gets tokens word-by-word, as if each agent is genuinely thinking and typing in real time.


The Stack

| Layer | Technology | Why |
|---|---|---|
| Backend | FastAPI + Python | Async SSE streaming, clean Pydantic schemas |
| LLM | Gemini 2.5 Flash | Sub-2s TTFT, function calling, vision |
| Cricket Data | Sportmonks Cricket API | Live scores, player stats, ball-by-ball |
| Weather | OpenWeatherMap API | Venue-specific humidity, dew point |
| Frontend | React 19 + Vite | Hooks-based SSE consumer, fast HMR |
| Markdown | react-markdown + remark-gfm | Render agent output as formatted prose |
| Styling | Vanilla CSS (glassmorphism) | No framework tax, full control |

Part 1: The Stats Analyst — True Gemini Function Calling

This is the most important technical piece. Early versions of this project had a fake Stats Analyst — it just received the user's input and wrote a summary. That's not agentic. That's a dressed-up prompt.

The v2 Stats Analyst is genuinely agentic. It uses Gemini's Function Calling API to decide — by itself — which tools it needs to call, in which order, and how to synthesize the results.

Here's how the tool definitions look:

TOOLS = [
    {
        "name": "get_cricket_scorecard",
        "description": "Fetches live or recent match data from Sportmonks — score, wickets, player stats, and ball-by-ball history.",
        "parameters": {
            "type": "object",
            "properties": {
                "match_id": {"type": "string", "description": "Sportmonks match ID"},
                "batting_team": {"type": "string"},
                "bowling_team": {"type": "string"},
            },
            "required": ["batting_team", "bowling_team"],
        },
    },
    {
        "name": "get_weather_conditions",
        "description": "Fetches humidity, dew point, and temperature from OpenWeatherMap for a given venue city.",
        "parameters": {
            "type": "object",
            "properties": {
                "venue": {"type": "string", "description": "Full venue name, e.g. Wankhede Stadium, Mumbai"},
            },
            "required": ["venue"],
        },
    },
    {
        "name": "calculate_win_probability",
        "description": "Calculates chase win probability based on target, current score, wickets, overs, and dew factor.",
        "parameters": {
            "type": "object",
            "properties": {
                "target": {"type": "integer"},
                "current_score": {"type": "integer"},
                "wickets": {"type": "integer"},
                "balls_remaining": {"type": "integer"},
                "dew_factor": {"type": "string", "enum": ["none", "light", "heavy"]},
            },
            "required": ["target", "current_score", "wickets", "balls_remaining"],
        },
    },
    {
        "name": "get_player_matchup",
        "description": "Gets head-to-head stats between a specific batter and bowler.",
        "parameters": {
            "type": "object",
            "properties": {
                "batter": {"type": "string"},
                "bowler": {"type": "string"},
            },
            "required": ["batter", "bowler"],
        },
    },
]
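The body of `calculate_win_probability` never made it into the post, so here is a minimal heuristic sketch. The par-rate and dew weights are illustrative guesses of mine, not the project's actual model:

```python
def calculate_win_probability(target, current_score, wickets,
                              balls_remaining, dew_factor="none"):
    """Rough chase win probability (0-100). Weights are illustrative only."""
    runs_needed = target - current_score
    if runs_needed <= 0:
        return 100.0
    if balls_remaining <= 0 or wickets >= 10:
        return 0.0

    required_rr = runs_needed * 6 / balls_remaining
    wickets_in_hand = 10 - wickets

    # Base: compare required rate against a nominal T20 death-overs par of ~9 rpo
    prob = 50.0 + (9.0 - required_rr) * 8.0
    # Fewer wickets in hand drags the estimate toward zero
    prob *= wickets_in_hand / 10 * 0.5 + 0.5
    # Dew makes gripping the ball harder, favouring the chasing side
    prob += {"none": 0, "light": 4, "heavy": 8}.get(dew_factor, 0)

    return round(max(1.0, min(99.0, prob)), 1)
```

Anything this crude is only a conversation-starter for the agents; the point is that it returns a single number the Stats Analyst can cite.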

The agent loop runs like this:

async def run_stats_analyst_fc(match_state: MatchStateInput) -> str:
    """True function-calling loop — Gemini decides which tools to invoke."""

    messages = [
        {
            "role": "user",
            "parts": [{"text": build_analyst_prompt(match_state)}]
        }
    ]

    # Multi-turn FC loop
    for attempt in range(MAX_FC_TURNS):
        response = genai_client.models.generate_content(
            model="gemini-2.5-flash",
            contents=messages,
            config=GenerateContentConfig(tools=[Tool(function_declarations=TOOLS)])
        )

        candidate = response.candidates[0]

        # Check if the model wants to call a function
        if candidate.content.parts and candidate.content.parts[0].function_call:
            fc = candidate.content.parts[0].function_call
            tool_name = fc.name
            tool_args = dict(fc.args)

            # Execute the tool
            result = await dispatch_tool(tool_name, tool_args)

            # Feed result back into the conversation
            messages.append({"role": "model", "parts": [{"function_call": fc}]})
            messages.append({
                "role": "user",
                "parts": [{"function_response": {"name": tool_name, "response": result}}]
            })
        else:
            # Model is done — return the text output
            return candidate.content.parts[0].text

    return "Analysis complete (max turns reached)."
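The loop above leans on `dispatch_tool`, which isn't shown. A plausible sketch of that router, with stubbed handlers standing in for the real Sportmonks and OpenWeatherMap wrappers (the stub return values are made up):

```python
import asyncio

# Stub handlers standing in for the real API wrappers
async def get_cricket_scorecard(**kwargs):
    return {"score": "156/4", "overs": 16.0}

async def get_weather_conditions(**kwargs):
    return {"humidity": 82, "dew_point": 21}

TOOL_REGISTRY = {
    "get_cricket_scorecard": get_cricket_scorecard,
    "get_weather_conditions": get_weather_conditions,
}

async def dispatch_tool(name: str, args: dict) -> dict:
    """Route a Gemini function call to its async handler by name."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        # Return the error as data so the model can recover instead of crashing
        return {"error": f"unknown tool: {name}"}
    try:
        return await handler(**args)
    except Exception as exc:
        return {"error": str(exc)}
```

Returning errors as tool output rather than raising is deliberate: Gemini can read the error in the `function_response` and try a different tool or argument.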

What blew my mind: I gave the agent a match where a left-arm spinner was bowling in the 18th over with heavy dew. Without me hardcoding anything, the model called get_weather_conditions first, then get_player_matchup with the spinner's name against the danger batter. It figured out the relevant questions by itself.


Part 2: Real-Time Streaming with SSE + Auto-Retry

All five agent turns stream to the frontend via Server-Sent Events (the Stats Analyst's fact sheet arrives as one chunk; the other four stream token-by-token). This creates the feeling of watching each agent "think" in real time, which is genuinely more engaging than waiting for a full response.

The SSE endpoint:

@app.post("/api/strategize")
async def strategize(match_state: MatchStateInput):
    async def event_stream():
        try:
            # 1. Stats Analyst (function calling — not streamed, but tool events are)
            yield f"event: agent_start\ndata: {json.dumps({'agent': 'stats_analyst'})}\n\n"
            fact_sheet = await run_stats_analyst_fc(match_state)
            yield f"event: agent_token\ndata: {json.dumps({'agent': 'stats_analyst', 'token': fact_sheet})}\n\n"
            yield f"event: agent_complete\ndata: {json.dumps({'agent': 'stats_analyst'})}\n\n"

            # 2–5. Stream each remaining agent
            for agent_id, prompt_fn in AGENT_PIPELINE:
                yield f"event: agent_start\ndata: {json.dumps({'agent': agent_id})}\n\n"
                full_text = ""
                async for token in stream_gemini_with_retry(prompt_fn(fact_sheet, match_state)):
                    full_text += token
                    yield f"event: agent_token\ndata: {json.dumps({'agent': agent_id, 'token': token})}\n\n"
                yield f"event: agent_complete\ndata: {json.dumps({'agent': agent_id})}\n\n"

            # Final structured Captain's Call
            call = extract_captains_call(full_text)
            yield f"event: captains_call\ndata: {json.dumps(call)}\n\n"
            yield f"event: done\ndata: {{}}\n\n"

        except Exception as e:
            yield f"event: error\ndata: {json.dumps({'message': str(e)})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
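The endpoint calls `extract_captains_call`, which isn't shown in the post. A plausible sketch of the schema-plus-regex-fallback approach, where the field names in the last-resort dict are my assumptions rather than the project's actual schema:

```python
import json
import re

def extract_captains_call(text: str) -> dict:
    """Pull the structured Captain's Call JSON out of the Commentator's prose.

    Tries a fenced json block first, then any brace-delimited blob, then
    falls back to wrapping the raw prose so the frontend always gets a dict.
    """
    candidates = []
    fenced = re.search(r"`{3}(?:json)?\s*(\{.*?\})\s*`{3}", text, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1))
    brace = re.search(r"\{.*\}", text, re.DOTALL)
    if brace:
        candidates.append(brace.group(0))

    for blob in candidates:
        try:
            return json.loads(blob)
        except json.JSONDecodeError:
            continue

    # Last resort: hand back truncated prose so the UI has something to render
    return {"decision": text.strip()[:200], "confidence": None}
```
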

The Auto-Retry Challenge

Running 5 sequential Gemini calls per "Strategize" click is aggressive. When you're testing rapidly (as you do in a hackathon), you hit 429 Too Many Requests constantly. I built an exponential backoff retry wrapper:

async def stream_gemini_with_retry(prompt: str, model="gemini-2.5-flash", max_retries=3):
    """Streams Gemini with exponential backoff on 429/503."""

    for attempt in range(max_retries):
        try:
            response = genai_client.models.generate_content_stream(
                model=model,
                contents=[{"role": "user", "parts": [{"text": prompt}]}]
            )
            for chunk in response:
                if chunk.text:
                    yield chunk.text
            return  # success

        except Exception as e:
            if "429" in str(e) or "503" in str(e):
                wait = 2 ** attempt  # 1s, 2s, 4s
                yield f"\n\n[Retrying in {wait}s...]\n\n"
                await asyncio.sleep(wait)
            else:
                raise

    # All retries exhausted: fail loudly instead of ending the stream silently
    raise RuntimeError(f"Gemini unavailable after {max_retries} retries")

The frontend displays a dismissible retry banner whenever this fires, so users understand what's happening rather than staring at a frozen screen.


Part 3: Gemini Vision — Turning Screenshots Into Strategy

This is the feature that gets the most "wow" reaction in demos. You can drop a screenshot from JioCinema, Hotstar, or Cricbuzz directly into the app, and Gemini Vision extracts the full match state — score, wickets, bowlers, venue, even the dew factor — mapped to our internal JSON schema.

@app.post("/api/vision-extract")
async def vision_extract(file: UploadFile = File(...)):
    """Uses Gemini Vision to extract match state from a screenshot."""

    image_bytes = await file.read()
    image_b64 = base64.b64encode(image_bytes).decode()

    response = genai_client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[
            {
                "role": "user",
                "parts": [
                    {
                        "inline_data": {
                            "mime_type": file.content_type,
                            "data": image_b64,
                        }
                    },
                    {"text": VISION_EXTRACT_PROMPT},
                ],
            }
        ],
    )

    raw = response.text.strip()
    # Strip markdown code fences if the model adds them
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]

    try:
        match_state = json.loads(raw.strip())
    except json.JSONDecodeError:
        return {"success": False, "error": "Could not parse match state from the image"}

    return {"success": True, "matchState": match_state}

The prompt engineering for vision extraction required a lot of iteration. The key insight: you must give the model the exact JSON schema you expect, or it will invent field names. Once I added the full schema to the prompt, accuracy jumped dramatically.

The frontend drop zone:

const onDrop = e => {
  e.preventDefault()
  setDragOver(false)
  const file = e.dataTransfer.files[0]
  if (file) handleVisionFile(file)
}

const handleVisionFile = async file => {
  setVisionLoading(true)
  try {
    const formData = new FormData()
    formData.append('file', file)
    const resp = await fetch(`${API_BASE}/api/vision-extract`, {
      method: 'POST',
      body: formData
    })
    const data = await resp.json()
    if (data.success) setState(prev => ({ ...prev, ...data.matchState }))
  } finally {
    setVisionLoading(false)  // reset the spinner even if the upload fails
  }
}

Part 4: The Captain's Override — Human-in-the-Loop

One of the most powerful features for hackathon judges (and for cricket fans who think they know better than the AI) is Captain's Override. Before you click "Strategize", you can type a tactical constraint:

"What if we deploy the Impact Player as a pinch hitter right now?"
"Assume Pathirana is carrying a niggle — only 1 more over available."

This gets injected as a system message into the debate pipeline. The agents must acknowledge and respond to it — the Devil's Advocate in particular will often push back hard if the override seems tactically wrong.

def build_strategist_prompt(fact_sheet: str, match_state, override: str = None) -> str:
    base_prompt = STRATEGIST_PROMPT.format(fact_sheet=fact_sheet, match_state=match_state)

    if override:
        base_prompt += f"""

⚠️ CAPTAIN'S OVERRIDE — Human tactical input:
"{override}"

You MUST address this override directly in your decision. Either incorporate it,
argue against it with data, or explain why conditions make it suboptimal.
Do not ignore it.
"""
    return base_prompt

Part 5: Real-Time Agent UI with Auto-Scroll and Markdown

The Debate Theater renders 5 AgentCard components that stream in sequentially. When a debate is running, the UI auto-scrolls to the currently-typing agent using a useRef anchor:

const bottomRef = useRef(null)

useEffect(() => {
  if (isRunning && bottomRef.current) {
    bottomRef.current.scrollIntoView({ behavior: 'smooth', block: 'nearest' })
  }
}, [debate, isRunning])

When an agent completes its response, the raw text (which often contains Markdown like **Decision:**, bullet points, and block quotes) is rendered with react-markdown:

function StreamingMarkdown({ text, isDone }) {
  if (isDone) {
    return (
      <div className="prose">
        <ReactMarkdown remarkPlugins={[remarkGfm]}>{text}</ReactMarkdown>
      </div>
    )
  }
  // While streaming — word-by-word animation
  const words = text.split(/(\s+)/)
  return (
    <p className="agent-card__text">
      {words.map((w, i) => (
        <span key={i} className="word" style={{ animationDelay: `${Math.min(i * 12, 400)}ms` }}>{w}</span>
      ))}
    </p>
  )
}

This dual-mode approach is important: while streaming, you want the word-fade animation for that live typewriter feel. Once complete, you want properly rendered Markdown for readability. If you render Markdown during streaming, you get broken syntax as asterisks appear before the closing pair — ugly.


Part 6: Persistent History with localStorage

After a debate completes, the Captain's Call is auto-saved to localStorage with a 20-entry circular buffer. This survives page refreshes — so when you're testing across multiple scenarios, your decision history is preserved.

const HISTORY_KEY = 'captain_cool_history_v1'

function loadHistory() {
  try { return JSON.parse(localStorage.getItem(HISTORY_KEY) || '[]') } catch { return [] }
}
function saveHistory(h) {
  try { localStorage.setItem(HISTORY_KEY, JSON.stringify(h.slice(-20))) } catch {}
}

// Auto-save when captainsCall arrives
if (captainsCall && !savedThisRun) {
  setSavedThisRun(true)
  const entry = { ...captainsCall, id: Date.now(), time: new Date().toLocaleTimeString() }
  setHistory(prev => {
    const updated = [...prev, entry]
    saveHistory(updated)
    return updated
  })
}

Part 7: The Design — Glassmorphism War Room Aesthetic

The UI needed to feel like a tactical war room, not a dashboard. Every design decision was made with that metaphor in mind.

The Color System

:root {
  --bg-base:  #080E1A;  /* Deep navy — near-black */
  --gold:     #F5A623;  /* IPL gold — primary accent */
  --coral:    #FF5566;  /* Devil's Advocate / danger */
  --blue:     #4D9FFF;  /* Data / tool calls */
  --green:    #35E07A;  /* Completion / confidence */
  --purple:   #A78BFA;  /* Analysis / ambient */
}

Glassmorphism Panels

Every panel in the app uses backdrop-filter to create depth:

.mcc__panel--left {
  background: rgba(10, 18, 40, 0.7);
  backdrop-filter: blur(12px) saturate(160%);
  -webkit-backdrop-filter: blur(12px) saturate(160%);
  border-right: 1px solid rgba(255, 255, 255, 0.05);
}

The ambient background gradient on body:

body {
  background-image:
    radial-gradient(ellipse 80% 60% at 50% -10%, rgba(77,159,255,0.08) 0%, transparent 60%),
    radial-gradient(ellipse 60% 40% at 80% 80%, rgba(245,166,35,0.05) 0%, transparent 55%),
    radial-gradient(ellipse 50% 40% at 20% 90%, rgba(167,139,250,0.04) 0%, transparent 50%);
}

Three invisible colored light sources give the dark background a subtle, premium depth.

The Animated Cricket Ball

The landing page canvas renders a rotating 3D cricket ball using the 2D Canvas API — no Three.js, no WebGL. Just radial gradients, ellipse() arcs, and requestAnimationFrame:

// Ball gradient — dark red with specular highlight
const ballGrd = ctx.createRadialGradient(bx-12, by-12, 4, bx, by, br)
ballGrd.addColorStop(0, '#8B2020')
ballGrd.addColorStop(0.5, '#6B1515')
ballGrd.addColorStop(1, '#3D0A0A')
ctx.beginPath()
ctx.arc(bx, by, br, 0, Math.PI * 2)
ctx.fillStyle = ballGrd
ctx.fill()

// Seam (rotating ellipse)
ctx.save()
ctx.translate(bx, by)
ctx.rotate(frame * 0.015)
ctx.beginPath()
ctx.ellipse(0, 0, br * 0.95, br * 0.15, 0, 0, Math.PI * 2)
ctx.strokeStyle = 'rgba(255,220,180,0.7)'
ctx.stroke()
ctx.restore()

What I Learned That Will Stick With Me

1. Function Calling is the Difference Between Fake and Real Agentic AI

Before implementing FC, the Stats Analyst was a prompt that always called every API regardless of match context. After FC, the agent dynamically decides. In a 1st innings powerplay with no target, it skips calculate_win_probability entirely. In a death over chase, it calls player matchups. The intelligence is emergent, not hardcoded.

2. SSE is Underrated for LLM Apps

Every tutorial pushes WebSockets. But for unidirectional streams (server → client), SSE is simpler: it runs over plain HTTP/1.1 and needs no extra library. FastAPI's StreamingResponse with the text/event-stream media type covers the server side, and the browser's fetch with a ReadableStream reader is all the client needs. (One caveat: unlike the native EventSource, a fetch-based reader doesn't auto-reconnect, so you handle retries yourself.)
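The wire format is simple enough to verify by hand. Here's a tiny parser (a sketch, not the app's actual frontend hook) that covers only the `event:`/`data:` fields this backend emits, which is enough to smoke-test `/api/strategize` from a plain Python script:

```python
def parse_sse(raw: str):
    """Parse a raw SSE stream into (event, data) pairs.

    Handles only single-line `event:` and `data:` fields, which is all the
    Captain Cool backend emits; it is not a full SSE-spec parser (no
    comments, retry fields, or multi-line data).
    """
    events = []
    for block in raw.split("\n\n"):
        event, data = "message", ""
        for line in block.split("\n"):
            if line.startswith("event: "):
                event = line[len("event: "):]
            elif line.startswith("data: "):
                data = line[len("data: "):]
        if data:
            events.append((event, data))
    return events
```

Feed it the accumulated body of a streamed response (e.g. from `httpx` or `requests` with `stream=True`) and you can assert on the event sequence in tests.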

3. The Devil's Advocate is the Most Valuable Agent

Counter-intuitively, the agent I added mostly as a "cool hackathon feature" turned out to produce the most tactically interesting outputs. When the Strategist confidently calls for Pathirana in the 18th, the Devil's Advocate often surfaces the correct counter — "the dew will neutralize his yorkers; you need a cutters bowler instead." The debate structure actually finds better answers than a single-pass LLM call.

4. Prompt Schema Discipline Prevents JSON Hell

Every agent that produces structured output (captainsCall) was given a rigid JSON schema in the system prompt, plus a regex extraction fallback. Without the schema, Gemini returns beautifully written prose that is impossible to parse consistently. With it, you get clean JSON 95%+ of the time.

5. Vision Models Need the Target Schema in the Prompt

This was a hard lesson. When I first wrote the Vision extraction prompt, I described the fields in English. The model returned a different JSON structure every time. When I included the exact schema as a code block in the prompt and said "Return ONLY this JSON", consistency jumped from ~50% to ~95%.
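To make the lesson concrete, here's the shape such a prompt can take. The field names and values below are illustrative assumptions, not the project's actual VISION_EXTRACT_PROMPT or schema:

```python
import json

# Illustrative schema example; the real project's field names may differ.
EXAMPLE_SCHEMA = """{
  "batting_team": "MI",
  "bowling_team": "CSK",
  "target": 210,
  "current_score": 166,
  "wickets": 5,
  "overs_completed": 16.0,
  "venue": "Wankhede Stadium, Mumbai",
  "dew_factor": "heavy"
}"""

VISION_EXTRACT_PROMPT = (
    "Extract the match state from this screenshot. "
    "Return ONLY a JSON object with EXACTLY these keys and value types, "
    "no prose, no markdown fences:\n" + EXAMPLE_SCHEMA
)
```

Embedding a literal, parseable example (rather than an English description of the fields) is what pins the model to one output shape.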


Challenges and Things That Are Still Broken

429 Rate Limits: With 5 Gemini calls per "Strategize" click, you will hit rate limits on the free tier within minutes of testing. The auto-retry helps, but a proper production version needs caching, debouncing, and API key rotation.
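One of those mitigations is cheap to sketch. A short-lived memo on identical prompts would absorb rapid repeated "Strategize" clicks during testing; this is a possible approach, not something the project currently ships:

```python
import functools
import hashlib
import time

def ttl_cache(seconds=120):
    """Memoize a prompt->response function for a short window.

    Identical prompts within `seconds` return the cached response instead of
    hitting the Gemini quota again. In-memory only; fine for dev/testing.
    """
    def decorator(fn):
        store = {}

        @functools.wraps(fn)
        def wrapper(prompt: str):
            key = hashlib.sha256(prompt.encode()).hexdigest()
            hit = store.get(key)
            if hit and time.monotonic() - hit[0] < seconds:
                return hit[1]          # cache hit: skip the API call
            result = fn(prompt)
            store[key] = (time.monotonic(), result)
            return result
        return wrapper
    return decorator
```
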

Sportmonks Live Match IDs: The live cricket endpoint requires knowing the exact match ID. I currently use a search endpoint + fuzzy match on team names, which fails for multi-day fixtures. Mock fallback data keeps it usable.

PDF Export: I scoped a "Dressing Room Report" PDF that exports the full debate transcript. Not implemented yet — jspdf + Markdown rendering in a canvas is more complex than it looks.

Mobile UX: The three-panel layout collapses to tabs on mobile, but the debate theater on a phone is too tight. A dedicated mobile view would help significantly.


The Full Tech Architecture (at a Glance)

Frontend (React 19 + Vite)
├── Landing.jsx          — Canvas animation, hero section
├── MatchCommandCenter   — 3-panel layout, SSE consumer
│   ├── MatchInputForm   — Vision drop zone, URL scrape, Override
│   ├── DebateTheater    — 5 AgentCard components, auto-scroll
│   └── CaptainCallCard  — Markdown verdict, Confidence, WinProb gauge
├── hooks/useDebateStream.js — SSE reader, event dispatch, localStorage
└── styles/globals.css   — Design tokens, glassmorphism, animations

Backend (FastAPI + Python)
├── main.py              — /api/strategize SSE, /api/vision-extract, /api/scrape
├── prompts.py           — All agent prompts + VISION_EXTRACT_PROMPT
└── tools/
    ├── cricket_api.py   — Sportmonks integration + mock fallback
    └── weather.py       — OpenWeatherMap + venue-lat/lng map

External APIs
├── Google Gemini 2.5 Flash — Function Calling + Vision + Streaming
├── Sportmonks Cricket API  — Live scores, player stats
└── OpenWeatherMap          — Real-time dew/humidity for 11 IPL venues

Try It Yourself

The full source is on GitHub: github.com/AdityaPatil2549/APL-2026-GDGOC-Pune

# Clone
git clone https://github.com/AdityaPatil2549/APL-2026-GDGOC-Pune.git
cd APL-2026-GDGOC-Pune

# Backend
cd backend
pip install -r requirements.txt
cp ../.env.example ../.env  # add your API keys
uvicorn main:app --reload --port 8000

# Frontend (new terminal)
cd frontend
npm install
npm run dev
# → http://localhost:5174

You'll need:

  • A Google Gemini API key
  • A Sportmonks Cricket API key (the mock fallback works without one)
  • An OpenWeatherMap API key


What's Next

  • PDF Dressing Room Report — Full debate transcript exportable as a branded PDF
  • WebSocket Live Match Sync — Poll Sportmonks every ball, trigger auto-re-debate if match state changes significantly
  • Historical Decision Analytics — Track how the AI's decisions compare to what actually happened ball-by-ball
  • Multi-language Commentary — Commentator agent in Hindi, Tamil, or Bengali for regional IPL fan bases

Final Thought

The most interesting thing I discovered building Captain Cool is that structured AI debate produces better tactical answers than a single, confident AI response.

The Strategist, left to itself, is competent. But once the Devil's Advocate forces it to defend its reasoning — and the Strategist has to either hold its position or revise with new logic — the final answer is almost always sharper, more nuanced, and more tactically sound.

That's not an AI insight. That's just how good decisions get made. We've always known that the best thinking happens when smart people argue with each other. We just hadn't applied it to AI systems in real-time, at the scale of a live cricket match.

Captain Cool is my attempt to do exactly that.


Built for the GDGoC Pune APL AI Hackathon 2026. If you found this useful, drop a ❤️ and share with your cricket-obsessed developer friends.

Tags: #gemini #ai #python #react #cricket #multiagent #functioncalling #hackathon #fastapi #llm
