I wrote this post as part of my entry to the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge
The Problem
285 million people worldwide are visually impaired. Screen readers make digital interfaces accessible, but far fewer tools help them "see" the physical world around them. I wanted to change that.
The Solution: EyeGuide
EyeGuide is a real-time AI companion that sees through a phone camera and talks naturally with the user. It describes surroundings, reads text, warns about hazards, and can be interrupted mid-sentence — just like talking to a friend.
Tech Stack
- Gemini 2.5 Flash Native Audio — for real-time bidirectional audio + vision
- Google ADK (Agent Development Kit) — bidi-streaming runtime with LiveRequestQueue
- FastAPI + WebSocket — real-time server communication
- Google Cloud Run — serverless deployment
- Google Cloud Firestore — user preferences
How I Built It
The core challenge was connecting the browser's camera and microphone to Google's Gemini Live API in real time. Here's the flow:
- Browser captures audio (16kHz PCM) and camera frames (1 FPS JPEG)
- WebSocket sends both to a FastAPI backend on Cloud Run
- ADK's `LiveRequestQueue` feeds the data to Gemini's Live API
- Gemini processes audio + video simultaneously and responds with voice
- Audio response streams back through WebSocket to the browser
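To make the upstream half of that flow concrete, here is a minimal sketch of how the server might decode browser messages and queue them for the model. It assumes a JSON wire format (`type` plus base64 `data`) and an `audio/pcm;rate=16000` MIME string; neither is taken from the EyeGuide source. `Blob` and `pump` are hypothetical names, and an `asyncio.Queue` stands in for ADK's `LiveRequestQueue`. In ADK's own streaming samples, the equivalent step hands a `Blob` to `LiveRequestQueue.send_realtime(...)`.

```python
import asyncio
import base64
import json
from dataclasses import dataclass

# Capture format from the post: 16 kHz mono PCM. 16-bit samples are assumed.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2


@dataclass
class Blob:
    """Hypothetical stand-in for the media blob passed to the live session."""
    mime_type: str
    data: bytes


def pcm_duration_ms(num_bytes: int) -> float:
    """Playback length in milliseconds of a 16 kHz, 16-bit mono PCM chunk."""
    return num_bytes * 1000 / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def decode_client_message(raw: str) -> Blob:
    """Decode one (assumed) JSON WebSocket message from the browser."""
    msg = json.loads(raw)
    payload = base64.b64decode(msg["data"])
    if msg["type"] == "audio":
        return Blob("audio/pcm;rate=16000", payload)  # MIME string is an assumption
    if msg["type"] == "image":
        return Blob("image/jpeg", payload)  # one camera frame, sent about once per second
    raise ValueError(f"unknown message type: {msg['type']!r}")


async def pump(messages, queue: asyncio.Queue) -> None:
    """Forward every decoded blob, in arrival order, to the model-facing queue."""
    for raw in messages:
        await queue.put(decode_client_message(raw))


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    audio = json.dumps({"type": "audio", "data": base64.b64encode(bytes(3200)).decode()})
    frame = json.dumps({"type": "image", "data": base64.b64encode(b"\xff\xd8\xff").decode()})
    await pump([audio, frame], queue)
    while not queue.empty():
        blob = queue.get_nowait()
        print(blob.mime_type, len(blob.data))


if __name__ == "__main__":
    asyncio.run(main())
```

Audio and video share one ordered queue, so the model receives them interleaved as they arrive. At this format the mic produces 16,000 samples/s × 2 bytes = 32,000 bytes/s, so a 3,200-byte chunk is 100 ms of speech.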
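The most impressive part? Barge-in support: users can interrupt the AI mid-sentence. Gemini's built-in Voice Activity Detection (VAD) handles the hard part. It notices the user speaking, cancels the rest of the response, and reports the interruption, so the client only has to stop playing any audio it has already buffered.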
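The client's share of barge-in is small but easy to get wrong: audio that has already arrived must be discarded, or the old answer keeps playing after the user starts talking. Below is a minimal sketch of that playback logic. `ServerEvent` and `PlaybackQueue` are hypothetical names, the real player would live in the browser (in JavaScript), and the `interrupted` flag mirrors the one ADK's streaming samples read from each event. Treat it as an illustration of the idea, not EyeGuide's code.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class ServerEvent:
    """Hypothetical, simplified view of one event relayed from the model."""
    audio: bytes = b""
    interrupted: bool = False


@dataclass
class PlaybackQueue:
    """Models the player's buffer of audio chunks received but not yet played."""
    pending: Deque[bytes] = field(default_factory=deque)

    def handle(self, event: ServerEvent) -> None:
        if event.interrupted:
            # The user started talking: discard everything already received
            # but not yet played, so the old answer stops immediately.
            self.pending.clear()
            return
        if event.audio:
            self.pending.append(event.audio)

    def next_chunk(self) -> Optional[bytes]:
        """Return the next chunk to play, or None when the buffer is empty."""
        return self.pending.popleft() if self.pending else None


if __name__ == "__main__":
    player = PlaybackQueue()
    events = [ServerEvent(b"a"), ServerEvent(b"b"), ServerEvent(interrupted=True), ServerEvent(b"c")]
    for ev in events:
        player.handle(ev)
    print(player.next_chunk())  # b'c': chunks a and b were dropped by the interruption
    print(player.next_chunk())  # None
```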
Key Learnings
- Only `gemini-2.5-flash-native-audio-latest` supports `bidiGenerateContent` on the Google AI API
- ADK's `RunConfig` with `StreamingMode.BIDI` is the correct way to configure live streaming
- 1 FPS video is surprisingly sufficient for scene understanding
- System prompt engineering makes or breaks a voice agent's persona
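Since 1 FPS turned out to be enough, it can also be worth enforcing that rate on the server, so a misbehaving client cannot flood the session with frames. The post only says the browser captures at 1 FPS; the server-side guard below is a hypothetical addition, sketched under that assumption. `FrameThrottle` is an illustrative name, and the injectable clock exists only to make the behavior testable.

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Lets through at most one camera frame per `interval_s` seconds."""

    def __init__(self, interval_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s
        self.clock = clock
        self._last_sent: Optional[float] = None

    def should_send(self) -> bool:
        """True if enough time has passed since the last forwarded frame."""
        now = self.clock()
        if self._last_sent is None or now - self._last_sent >= self.interval_s:
            self._last_sent = now
            return True
        return False  # drop this frame; the model already has a recent one


if __name__ == "__main__":
    ticks = iter([0.0, 0.3, 0.9, 1.0, 1.5, 2.2])  # simulated frame arrival times (s)
    throttle = FrameThrottle(clock=lambda: next(ticks))
    print([throttle.should_send() for _ in range(6)])
```

Dropping frames rather than queuing them keeps the model looking at the most recent view, which matters more for hazard warnings than seeing every frame.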
Try It
- Live App: https://eyeguide-966189115030.us-central1.run.app/
- GitHub: https://github.com/sharmaachintya/EyeGuide