This article was written as an entry for the Gemini Live Agent Challenge 2026. #GeminiLiveAgentChallenge
Every founder has been there. You rehearse your pitch until it sounds bulletproof. You walk into the room. An investor asks one question and the whole thing falls apart.
Not because the idea is bad. Because you never had anyone argue back.
That's what I built for the Gemini Live Agent Challenge: PitchFire, a real-time AI pitch steelmanning agent that challenges every weak claim you make, validates every strong one, and generates a battle-hardened pitch deck from only the arguments you successfully defended.
Live at: pitchfire.up.railway.app
Code: github.com/iam25th1/pitchfire
How It Works
You tap the orb and start speaking your pitch. PitchFire listens using Voice Activity Detection: the moment you pause, it captures the segment, sends it to Gemini 2.5 Flash, and fires challenge cards within 2–3 seconds.
Say "our TAM is $50 billion" with no source, and a red challenge card fires:
"A $50B TAM from what source? What year? What percentage can you realistically capture in 24 months? TAM without SAM/SOM is theater."
Your conviction score drops. Say "we have 3 paying pilots at $5K/month" — a green validation card fires, score goes up. At the end, hit END and Gemini generates a pitch deck containing only the claims that survived.
Two modes:
- Interrupt Mode — fires the moment it detects an inconsistency or an awkward silence.
- Full Pitch Mode — waits for 3 seconds of silence before responding. Deliver your entire pitch, get the full breakdown after.
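The difference between the two modes comes down to how long a pause must last before a segment is cut. A minimal sketch of that tuning (the constant names and the 800 ms interrupt threshold are my assumptions, not values from the repo):

```javascript
// Hypothetical per-mode VAD tuning. Only the 3s Full Pitch value comes from
// the article; the interrupt threshold is an illustrative guess.
const MODES = {
  interrupt: { silenceMs: 800 },   // cut a segment at the first meaningful pause
  fullPitch: { silenceMs: 3000 },  // wait 3s so the whole pitch is one segment
};

// Decide whether the current run of silence should end the segment.
function segmentEnded(mode, msOfSilence) {
  return msOfSilence >= MODES[mode].silenceMs;
}
```

With this, a 900 ms pause ends the segment in Interrupt Mode but not in Full Pitch Mode.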
Every card has three actions: READ the full challenge, ▶ LISTEN to hear it spoken aloud, or ↩ RESPOND to type a direct defense — which goes back through Gemini, keeping the conversation anchored to that specific claim.
The Technical Stack
The audio pipeline was the interesting part.
The browser captures raw PCM16 at 16kHz using a ScriptProcessorNode. Per buffer, I compute RMS volume to detect voice activity. When voice is detected, chunks accumulate. When silence crosses the threshold, the chunks are concatenated and wrapped in a 44-byte WAV header before being base64 encoded and sent to Gemini's multimodal REST endpoint.
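The two core pieces of that pipeline, RMS voice detection and the 44-byte WAV header, can be sketched like this (my reconstruction of the approach described above, not the exact repo code):

```javascript
const SAMPLE_RATE = 16000;

// Root-mean-square volume of one Float32 audio buffer; compared against a
// tuned threshold (e.g. 0.01) to decide "voice" vs "silence".
function rms(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Wrap raw PCM16 bytes in a standard 44-byte WAV header (mono, 16-bit).
function wrapWav(pcmBytes, sampleRate = SAMPLE_RATE) {
  const header = new ArrayBuffer(44);
  const v = new DataView(header);
  const writeStr = (off, s) => {
    for (let i = 0; i < s.length; i++) v.setUint8(off + i, s.charCodeAt(i));
  };
  writeStr(0, "RIFF");
  v.setUint32(4, 36 + pcmBytes.length, true);  // RIFF chunk size: file size - 8
  writeStr(8, "WAVE");
  writeStr(12, "fmt ");
  v.setUint32(16, 16, true);                   // fmt chunk size
  v.setUint16(20, 1, true);                    // audio format: PCM
  v.setUint16(22, 1, true);                    // channels: mono
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, sampleRate * 2, true);       // byte rate for 16-bit mono
  v.setUint16(32, 2, true);                    // block align
  v.setUint16(34, 16, true);                   // bits per sample
  writeStr(36, "data");
  v.setUint32(40, pcmBytes.length, true);      // data chunk size
  const wav = new Uint8Array(44 + pcmBytes.length);
  wav.set(new Uint8Array(header), 0);
  wav.set(pcmBytes, 44);
  return wav;
}
```

The resulting bytes are what gets base64 encoded and shipped to the API.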
What Gemini Made Possible
The entire product is one well-engineered Gemini prompt. The model transcribes the audio AND analyzes every claim in a single call. It identifies whether a claim is weak or strong, generates a sharp investor-style challenge, cites relevant counterevidence, scores the claim, and categorizes it across six pitch dimensions, all in one response.
Without a model that can handle multimodal input and structured reasoning simultaneously, this product wouldn't exist. The Gemini 2.5 Flash API made it a 2–5 day build instead of a 2-month one.
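The shape of that single multimodal call follows the Gemini REST API: audio goes in as an inlineData part next to the analysis prompt, and structured output is requested via responseMimeType. A minimal sketch (the prompt text here is illustrative, not PitchFire's actual prompt):

```javascript
// Build the request body for one audio segment (base64 WAV from the pipeline).
// POSTed to the v1beta generateContent endpoint for gemini-2.5-flash.
function buildRequest(base64Wav) {
  return {
    contents: [{
      parts: [
        { inlineData: { mimeType: "audio/wav", data: base64Wav } },
        { text: "Transcribe this pitch segment, then for each claim: " +
                "label it weak or strong, write an investor-style challenge, " +
                "score it, and assign one of six pitch dimensions. Reply as JSON." },
      ],
    }],
    // Ask for JSON back so the challenge cards can be rendered directly.
    generationConfig: { responseMimeType: "application/json" },
  };
}
```

Because transcription and analysis happen in the same request, there is no separate speech-to-text hop, which is what keeps the card latency in the 2–3 second range.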
What's Next
- Investor persona modes: VC, angel, strategic
- Team practice mode with multiple founders
- Integration with pitch deck tools
Built solo for the Gemini Live Agent Challenge 2026. #GeminiLiveAgentChallenge