Mohammed Ayaan Adil Ahmed

I Built a Live AI First Aid Agent with Gemini 2.5 Flash in 3 Days

How I Built CalmAid — A Live AI First Aid Agent with Gemini 2.5 Flash and Google Cloud Run

This post is my entry for the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge


The Idea

In an emergency, people panic. They fumble with Google, get walls of text, and waste critical seconds. I wanted to build something that could just talk to you — calmly, instantly, while also seeing what you're dealing with.

That became CalmAid: speak the emergency, show the injury, hear step-by-step instructions streaming back in real time.


The Stack

  • Gemini 2.5 Flash — multimodal vision + text generation with streaming
  • Google GenAI SDK (google-genai) — the new SDK, not the deprecated one
  • FastAPI — async Python backend
  • Server-Sent Events (SSE) — real-time streaming to the browser
  • Google Cloud Run — serverless hosting
  • Google Secret Manager — secure API key storage
  • Web Speech API + Speech Synthesis — browser-native voice in and out
  • GSAP 3 — animations

How Streaming Works

The key insight that makes CalmAid feel live is that text renders and TTS speaks simultaneously while Gemini is still generating.

The backend streams via SSE:

import asyncio
import json

from google import genai
from google.genai import types

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

async def stream_gemini(parts):
    response = client.models.generate_content_stream(
        model="gemini-2.5-flash",
        contents=parts,
        config=types.GenerateContentConfig(
            system_instruction=SYSTEM_PROMPT,
            max_output_tokens=300,
        ),
    )
    for chunk in response:
        if chunk.text:
            yield f"data: {json.dumps({'chunk': chunk.text})}\n\n"
            await asyncio.sleep(0)  # yield control so the event loop can flush the event
    yield f"data: {json.dumps({'done': True})}\n\n"
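Each yielded string is one Server-Sent Event: a data: prefix, a JSON payload, and a blank line terminating the event. As a minimal illustration of that framing (the helper name is mine, not from the CalmAid code):

```python
import json

def sse_event(payload: dict) -> str:
    """Frame a JSON payload as a single Server-Sent Event."""
    return f"data: {json.dumps(payload)}\n\n"

# The generator above emits events shaped like:
#   sse_event({"chunk": "Apply firm pressure"})
#   sse_event({"done": True})
```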

The frontend reads the stream and feeds sentences to a TTS queue the moment a sentence boundary (., !, ?) is detected:

function enqueueSentences(newText) {
  ttsBuffer += newText;
  const sentences = ttsBuffer.split(/(?<=[.!?])\s+/);
  ttsBuffer = sentences.pop() || "";
  sentences.forEach(s => { if (s.trim()) ttsQueue.push(s.trim()); });
  if (!ttsActive) drainTTSQueue();
}

The result: the agent starts speaking before the full response arrives. That's what makes it feel genuinely live.


Vision Integration

When a user snaps a photo, it's sent as base64 and converted to a Pillow image on the backend:

import base64
import io

from PIL import Image
from google.genai import types

if req.image_b64:
    img_bytes = base64.b64decode(req.image_b64)
    # Normalize to RGB JPEG so Gemini always receives a consistent mime type
    img = Image.open(io.BytesIO(img_bytes)).convert("RGB")
    buf = io.BytesIO()
    img.save(buf, format="JPEG")
    img_part = types.Part.from_bytes(data=buf.getvalue(), mime_type="image/jpeg")
    parts.append(img_part)

Gemini then describes what it sees and tailors the first aid advice accordingly.
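On the wire, the photo is nothing more than base64 text inside the JSON request body. A minimal sketch of the round trip (function names are illustrative, not from the CalmAid code):

```python
import base64

def encode_image(jpeg_bytes: bytes) -> str:
    # What the browser effectively does before POSTing the JSON body
    return base64.b64encode(jpeg_bytes).decode("ascii")

def decode_image(image_b64: str) -> bytes:
    # What the backend does before handing the bytes to Pillow
    return base64.b64decode(image_b64)
```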


Deploying to Cloud Run

The whole deploy is one command; the --source . flag triggers Cloud Build automatically:

gcloud run deploy calmaid-agent \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --set-secrets="GEMINI_API_KEY=gemini-api-key:latest" \
  --memory 512Mi

The API key lives in Secret Manager and gets injected at runtime — never hardcoded, never in the repo.
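Inside the application this reduces to reading an environment variable at startup. A small sketch of that check (the helper name is mine; GEMINI_API_KEY is the variable mapped by --set-secrets above):

```python
import os

def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the key that Cloud Run injects from Secret Manager at runtime."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; check the --set-secrets mapping")
    return key
```

Failing fast here turns a misconfigured secret mapping into an obvious startup error instead of a mysterious 500 on the first request.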


Challenges

SSE buffer management was trickier than expected. Chunks from the stream reader arrive mid-line, so you have to hold incomplete lines across read cycles:

// value is a Uint8Array from reader.read(); decoder is a TextDecoder
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop(); // hold incomplete line across read cycles
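The same hold-the-tail idea, sketched as a standalone Python parser (the function name is mine) so the buffering behavior is easy to unit-test:

```python
import json

def parse_sse(buffer: str, chunk: str):
    """Accumulate a network chunk; return (leftover buffer, completed events)."""
    buffer += chunk
    lines = buffer.split("\n")
    buffer = lines.pop()  # hold the incomplete trailing line
    events = []
    for line in lines:
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return buffer, events
```

A chunk that ends mid-JSON stays in the buffer until the next read completes it, which is exactly the failure mode that bites you if you parse each chunk naively.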

Python 3.13 compatibility broke several pinned packages. Pillow 10.x and pydantic 2.7.x don't have prebuilt wheels for 3.13 — bumping to Pillow 11.1.0 and pydantic 2.10.0 fixed it.

SDK migration — the google-generativeai package is fully deprecated and streaming was unreliable. Switching to google-genai resolved it completely.


What I Learned

  • Streaming + TTS together is what makes AI feel live vs turn-based
  • Browser-native Web Speech API and Speech Synthesis are underrated — zero dependencies, instant
  • python:3.11-alpine cuts Docker image vulnerabilities dramatically vs slim
  • Cloud Run + Secret Manager is the cleanest production pattern for API keys
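As a rough illustration of the Alpine point above, a minimal Dockerfile along these lines (file layout and commands are assumptions, not the actual CalmAid Dockerfile):

```dockerfile
# Alpine keeps the image small and trims CVE surface vs python:3.11-slim
FROM python:3.11-alpine
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Cloud Run supplies $PORT at runtime
CMD ["sh", "-c", "uvicorn main:app --host 0.0.0.0 --port ${PORT:-8080}"]
```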

Try It


Built for the Gemini Live Agent Challenge. #GeminiLiveAgentChallenge
