DEV Community

Pacy
Pacy

Posted on

From Clinic to Code: Engineering GozAI, a Clinically- grounded AI Copilot for Low Vision & Visually impaired

For the 2.2 billion people experiencing vision impairment globally, standard assistive technology often feels like a burden rather than a bridge.
The seed for my latest project, GozAI, was planted back in 2022 during my 500-level Optometry externship at the University of Ilorin Teaching Hospital (UITH) in Nigeria. Working under the guidance of a low-vision and pediatric specialist, I spent my days with countless patients navigating life with severe visual impairments.
Day after day, I witnessed the same frustrating reality: traditional low-vision aids are incredibly fragmented and prohibitively expensive. Patients were being asked to carry and pay a premium for clunky, single-purpose digital magnifiers.
Even then, it raised a glaring question: We are in the digital age. Why hasn't this been solved using the devices already in our pockets? At the time, I had the clinical domain expertise to know exactly what the solution should look like, but I lacked the software engineering skills to actually build it. The idea stayed on the shelf.

The Catalyst: Agentic Coding and Clinical Reality

Fast forward to 2026. Recent clinical research has completely validated what I saw in the clinic years ago, showing a staggering abandonment rate of up to 50% for specialized assistive gadgets. The primary culprits? Cost, severe battery drain, and the social stigma of carrying them. People don’t want another heavy device to manage. They just want the smartphone they already own to act as their eyes.
At the same time, we entered the era of agentic coding. With recent leaps in AI development tools, the technical barriers to entry that once held me back vanished. By building entirely with Google's Antigravity, I was finally able to take the clinical blueprints from my externship experience and code them into reality.

Building GozAI: A Multimodal Accessibility Copilot

When I entered the Gemini Live Agent Challenge, I set out with an uncompromising goal: build an accessibility copilot that is battery-optimized, emotionally reassuring, and rooted entirely in clinical reality.
GozAI is a real-time voice and vision assistant powered by the Gemini 2.0 Flash Multimodal Live API and hosted on Google Cloud Run. Instead of just building a "talking camera" that overwhelms the user, I engineered it to act as an intelligent, empathetic filter.

Bridging the Gap: Clinical Need & Technical Execution

  • Verified Clinical Grounding: To ensure GozAI wasn't just another tech novelty, I didn't want it just guessing. I engineered the backend RAG architecture (built using the Google ADK) to strictly rely on real, verified optometry data and textbooks. It operates under strict clinical guardrails, ensuring the advice and spatial orientation it provides are safe and accurate.
  • Semantic Vibro-Acoustics: Continuous audio feedback causes massive cognitive fatigue for low-vision users. GozAI captures frames at a battery-saving 1 FPS and uses synchronized haptics for navigation. It only delivers audio warnings for immediate hazards, keeping the user's mental bandwidth clear.
  • The Glaucoma Protocol: Progressive vision loss carries a high risk of depression and anxiety. GozAI’s persona is strictly constrained to be an anchor of psychological safety—calm, steady, and reassuring. It doesn't just see for the user; it acts as a supportive companion.
  • The UI Navigator: For digital environments, GozAI acts as a bridge where standard screen readers fail. Using Gemini’s multimodal capabilities, it can interpret unlabelled buttons on a screen, read medical labels offline (via Google ML Kit), and even inject synthetic screen taps for the user. ## The Technical Backbone To make this vision a reality, we leveraged a robust Google-native stack:

Flutter PWA / Mobile
└── Gemini Multimodal Live API (Bidirectional WebSocket)
└── Google ADK Agent (Cloud Run)
├── Tool: Optometry Guidelines (Verified RAG)
├── Tool: Medication Lookup (OpenFDA)
├── Tool: SOS → Firestore
└── Tool: Clinical Stats
└── Firebase / Firestore (Companion Memory)
The mobile client connects directly to the Gemini Multimodal Live API via a bidirectional WebSocket, streaming real-time audio and camera frames simultaneously — ensuring near-zero latency for those critical safety moments.

Closing the Gap
GozAI proves that when you combine deep clinical empathy with bleeding-edge tools like the Gemini Live API and Google's Antigravity, you can build solutions that actually restore independence, rather than just adding another gadget to a patient's bag.

Everyone deserves to navigate the world with confidence.

This piece was created for the purposes of entering the Gemini Live Agent Challenge. 🚀

Check out the code: https://github.com/Kryptopacy/GozAI
🌐 Try the Demo: gozai-app.web.app
🚀 Built for: Gemini Live Agent Challenge | Tracks: Live Agents + UI Navigator

Top comments (0)