This afternoon I was standing outside my house in Lusaka, Zambia, testing an app on my brother's phone because mine is too slow for real-time AI.
I pointed the camera at my gate and asked: "Is my gate locked? Do I look safe?"
Gemini described the gate, the car parked nearby, and the surroundings, then gave me safety advice I hadn't even asked for, based on what it saw.
The Hackathon Challenge
I wrote this post as my entry to the Gemini Live Agent Challenge 2026.
Most tools built for visual impairment are either expensive, complicated, or require a specialist device. I wanted to build something that works on any phone, in any browser, right now.
Gemini Live made that possible. It watches through a camera, hears your voice, and speaks back, all in real time. That's SightLine.
The stack
- Next.js 14 for the frontend
- FastAPI for the backend
- Gemini 2.0 Flash Live on Vertex AI for the AI
- WebSocket for the real-time connection
- Google Cloud Run for deployment
- Google Cloud Build for the container pipeline
Straightforward on paper. The reality was messier.
What actually broke and how I fixed it
IAM permissions killed half a day.
Cloud Run and Cloud Build each require specific roles to access Artifact Registry, and the documentation doesn't give you the exact combination up front. I got there through trial and error: artifactregistry.reader on the compute service account, artifactregistry.admin on the Cloud Build service account.
Next.js bakes environment variables at build time.
This means that if your backend URL lives in a .env file that's excluded from Git (as it should be), the production build never sees it, and the deployed frontend keeps trying to connect to localhost. I wasted hours debugging WebSocket failures before I understood what was happening. The fix was hardcoding the URL directly in the hook. Not elegant, but it works.
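A minimal sketch of that workaround (the URL and variable name below are placeholders, not SightLine's actual values): because Next.js replaces `process.env.NEXT_PUBLIC_*` references with literal strings at build time, the production fallback has to be a literal that lives in the source itself.

```typescript
// Placeholder production endpoint -- not SightLine's real URL.
const FALLBACK_WS_URL = "wss://example-backend.run.app/ws";

// Next.js replaces `process.env.NEXT_PUBLIC_BACKEND_WS_URL` with a
// literal string during `next build`. If the .env file was absent at
// build time, that literal is `undefined` in production forever, so
// the fallback must be hardcoded in the source.
export function resolveBackendUrl(bakedValue: string | undefined): string {
  return bakedValue ?? FALLBACK_WS_URL;
}

// Inside a hook you would call:
//   resolveBackendUrl(process.env.NEXT_PUBLIC_BACKEND_WS_URL)
```

A cleaner long-term fix is to inject the variable at build time (for example as a build argument in the Cloud Build pipeline), so the baked value is correct and the fallback never fires.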
The audio pipeline was three problems pretending to be one.
Getting clean real-time PCM16 audio out of a mobile browser was problem one. Stopping the microphone from picking up Gemini's voice and feeding it back as input was problem two. Recovering smoothly at the end of each exchange without the session dropping was problem three. I solved them with an isSpeakingRef that mutes the mic while Gemini is talking, a 400 ms cooldown before reopening it, and a WebSocket ping/pong keepalive every 20 seconds.
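The fixes above can be modelled as a small framework-free sketch. MicGate and floatToPcm16 are illustrative names, not SightLine's actual code; in the app the speaking flag lives in a React ref, and time is passed in explicitly here so the cooldown behaviour is easy to follow.

```typescript
// Echo prevention: mute the mic while playback runs, then hold it
// closed for a short cooldown after playback ends.
export class MicGate {
  private speaking = false;
  private cooldownUntil = 0;

  constructor(private cooldownMs: number = 400) {}

  // Gemini's audio started playing: mute the mic immediately.
  onPlaybackStart(): void {
    this.speaking = true;
  }

  // Playback finished: keep the mic closed for a short cooldown so the
  // tail of the speaker output isn't captured and fed back as input.
  onPlaybackEnd(nowMs: number): void {
    this.speaking = false;
    this.cooldownUntil = nowMs + this.cooldownMs;
  }

  // Should the current audio frame be forwarded to the model?
  micOpen(nowMs: number): boolean {
    return !this.speaking && nowMs >= this.cooldownUntil;
  }
}

// Problem one, in miniature: Web Audio hands you Float32 samples in
// [-1, 1]; the Live API expects 16-bit PCM.
export function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Problem three, sketched: a periodic keepalive on the socket, e.g.
//   setInterval(() => ws.send(JSON.stringify({ type: "ping" })), 20_000);
```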
The latency reality
I'm in Lusaka, Zambia. My Cloud Run service is in us-east4, Virginia. Every Gemini response has to travel across the Atlantic twice.
That latency is noticeable. It doesn't break the app, but it does slow it down compared with what someone in the US would experience. When Gemini Live becomes available in African GCP regions, this gets dramatically better. Right now SightLine works in spite of the distance, and working is what matters for a week-one build.
Who SightLine is actually for
I want to be straight about this. SightLine is not currently built for people with complete blindness. Pressing START, navigating the UI and switching cameras require enough vision to use a phone screen.
The real users today are people with low vision or partial sight. People who can use a phone but struggle with fine detail. People with deteriorating vision from age or a medical condition. People in situations where even a sighted person would struggle: bad lighting, tiny print, unfamiliar text.
Making SightLine work for users with complete blindness is the next step: voice-activated start, audio-guided onboarding, full screen reader support. That's the roadmap.
What I actually learned
I came into this as a data analyst with some Python experience. I left with a working knowledge of Vertex AI, Cloud Run, Docker, real-time audio streaming, WebSocket session management, and IAM configuration, all learned under deadline pressure.
The thing nobody tells you about building real AI applications is that the AI part is often the easiest bit. It's the infrastructure, the deployment pipeline, the browser APIs and the edge cases that take the time.
But the tools are genuinely good right now. As a data analyst from Lusaka, I can build and deploy a real-time multimodal AI app in one week. That still surprises me a little.
Try it
Live: https://sightline-frontend-59597652459.us-east4.run.app
GitHub: https://github.com/rkchellah/Sightline
Point it at small text. Ask what it sees. It works.
If you're building accessibility tools or have thoughts on where SightLine should go, I'd like to hear from you.