DEV Community

Achintya Sharma

Building EyeGuide: A Real-Time AI Visual Companion for the Blind with Gemini Live API

This blog post was created for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge

The Problem

An estimated 285 million people worldwide are visually impaired. Screen readers help with digital interfaces, but few tools help them "see" the physical world around them in real time. I wanted to change that.

The Solution: EyeGuide

EyeGuide is a real-time AI companion that sees through a phone camera and talks naturally with the user. It describes surroundings, reads text, warns about hazards, and can be interrupted mid-sentence — just like talking to a friend.

Tech Stack

  • Gemini 2.5 Flash Native Audio — for real-time bidirectional audio + vision
  • Google ADK (Agent Development Kit) — bidi-streaming runtime with LiveRequestQueue
  • FastAPI + WebSocket — real-time server communication
  • Google Cloud Run — serverless deployment
  • Google Cloud Firestore — user preferences

How I Built It

The core challenge was connecting the browser's camera and microphone to Google's Gemini Live API in real time. Here's the flow:

  1. Browser captures audio (16 kHz PCM) and camera frames (JPEG at 1 FPS)
  2. WebSocket sends both to a FastAPI backend on Cloud Run
  3. ADK's LiveRequestQueue feeds data to Gemini's Live API
  4. Gemini processes audio + video simultaneously and responds with voice
  5. Audio response streams back through WebSocket to the browser
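
The first two steps of the flow can be sketched without any Google dependencies. The snippet below is a minimal illustration, not EyeGuide's actual code: it assumes a hypothetical one-byte tag in front of each binary WebSocket message so the server can tell audio chunks from camera frames, and it works out how large a chunk of 16 kHz PCM is. The tag values, the 16-bit mono sample format, and the function names are all assumptions made for the example.

```python
# Hypothetical wire format: 1 tag byte + raw payload per binary WebSocket message.
AUDIO_TAG = b"\x01"  # payload is raw PCM audio
VIDEO_TAG = b"\x02"  # payload is one JPEG camera frame

SAMPLE_RATE_HZ = 16_000  # from the post: 16 kHz PCM
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono samples


def pcm_chunk_bytes(duration_ms):
    """Size in bytes of a PCM chunk of the given duration."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * duration_ms // 1000


def pack(tag, payload):
    """Prefix the payload with its one-byte type tag (browser side)."""
    return tag + payload


def unpack(message):
    """Split a message into (mime_type, payload) on the server side."""
    tag, payload = message[:1], message[1:]
    if tag == AUDIO_TAG:
        return "audio/pcm", payload
    if tag == VIDEO_TAG:
        return "image/jpeg", payload
    raise ValueError(f"unknown message tag: {tag!r}")
```

Under these assumptions a 100 ms audio chunk is 3,200 bytes, so audio messages stay small. The (mime_type, payload) pair returned by `unpack` is the kind of data the server would then wrap and hand to the LiveRequestQueue in step 3.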

The most impressive part? Barge-in support: users can interrupt the AI mid-sentence, and Gemini's built-in Voice Activity Detection picks up the new speech and cuts the current response short.
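
On the playback side, barge-in mostly comes down to discarding audio that was queued but not yet played once an interruption is reported. Here is a minimal sketch of that idea, assuming the backend forwards an interruption signal to the client. The `PlaybackBuffer` class and its method names are hypothetical and not part of Gemini or ADK.

```python
from collections import deque


class PlaybackBuffer:
    """Queue of audio chunks waiting to be played (hypothetical helper)."""

    def __init__(self):
        self._chunks = deque()

    def enqueue(self, chunk):
        """Add a chunk of model audio received from the server."""
        self._chunks.append(chunk)

    def next_chunk(self):
        """Return the next chunk to play, or None if nothing is queued."""
        return self._chunks.popleft() if self._chunks else None

    def on_interrupted(self):
        """Drop all unplayed audio; return how many chunks were discarded."""
        dropped = len(self._chunks)
        self._chunks.clear()
        return dropped
```

Without the flush in `on_interrupted`, the user would keep hearing stale audio for as long as the buffer lasts, even though the model has already stopped generating.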

Key Learnings

  1. Of the models I tried, only gemini-2.5-flash-native-audio-latest supported bidiGenerateContent on the Google AI API
  2. ADK's RunConfig with StreamingMode.BIDI is the correct way to configure live streaming
  3. 1 FPS video is surprisingly sufficient for scene understanding
  4. System prompt engineering makes or breaks a voice agent's persona
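
The 1 FPS figure in point 3 implies that most camera frames are dropped before they reach the network. One way to do that is a small time-based gate. This is a sketch under assumptions rather than the code EyeGuide uses, and `FrameThrottle` and its parameters are hypothetical names.

```python
import time


class FrameThrottle:
    """Lets through at most one frame per `interval_s` seconds."""

    def __init__(self, interval_s=1.0):
        self.interval_s = interval_s
        self._last_sent = None  # timestamp of the last frame that was sent

    def should_send(self, now=None):
        """Return True if enough time has passed since the last sent frame."""
        if now is None:
            now = time.monotonic()  # monotonic clock is immune to wall-clock jumps
        if self._last_sent is None or now - self._last_sent >= self.interval_s:
            self._last_sent = now
            return True
        return False
```

The capture loop would call `should_send()` for every frame the camera produces and only encode and transmit a JPEG when it returns True. The optional `now` argument exists so the gate can be exercised with fixed timestamps.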

Try It


