This is a submission for the Google AI Studio Multimodal Challenge
What I Built
I built VisionVoice: Multilingual Visual Aid for the Visually Impaired, an applet designed to break language and accessibility barriers.
The app helps visually impaired users by detecting emergency and public signs, translating them into multiple languages, and narrating them aloud. This supports safety and independence in real-world scenarios, such as navigating public spaces or understanding critical instructions.
Demo
Live App: https://visionvoice-1073180550844.us-west1.run.app/
GitHub Repo: https://github.com/vikasmukhiya1999/VisionVoice---Multilingual-Visual-Aid-for-the-Visually-Impaired
Video Demo: https://youtu.be/N95jVdkpWbo
How I Used Google AI Studio
I leveraged Google AI Studio with Gemini 2.5 Flash Image to process multilingual visual inputs.
The model reads text from uploaded or live camera images.
It translates the detected text into the user's preferred language.
It converts the translated text into audio narration, making the content accessible to visually impaired users (a minimal sketch of the first two steps follows below).
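Under the hood this maps to a single multimodal Gemini call: an image part plus a text prompt asking for extraction and translation in one step. Below is a minimal sketch assuming the @google/genai JavaScript SDK; the model id, prompt wording, and the readAndTranslateSign helper are illustrative assumptions, not the app's actual code.

```typescript
// Sketch only: read sign text from an image and translate it in one Gemini call.
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// base64Image: a JPEG captured from the camera or uploaded by the user,
// already base64-encoded. targetLanguage: e.g. "Spanish", "Hindi".
async function readAndTranslateSign(
  base64Image: string,
  targetLanguage: string
): Promise<string> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash", // assumed model id, not necessarily the app's config
    contents: [
      {
        role: "user",
        parts: [
          { inlineData: { mimeType: "image/jpeg", data: base64Image } },
          {
            text:
              "Read any emergency or public sign text in this image and " +
              `translate it into ${targetLanguage}. Reply with the translation only.`,
          },
        ],
      },
    ],
  });
  return response.text ?? "";
}
```

The returned string can then be handed straight to the narration step, so the user hears the translated sign without any manual copying.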
Multimodal Features
Image-to-Text Extraction: Captures emergency signs, directions, or public notices.
Text Translation: Supports multiple languages for global accessibility.
Text-to-Speech Narration: Gives voice output so users can understand signs without needing to read them (see the narration sketch after this list).
Mobile-First UI: Simple, modern, and accessible design optimized for quick use.
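For the narration step, the sketch below uses the browser's standard Web Speech API; the narrateAloud helper name and the choice of Web Speech over any other TTS backend are assumptions for illustration, not a description of the app's internals.

```typescript
// Sketch only: speak the translated text aloud in the user's chosen language.
function narrateAloud(translatedText: string, langCode: string): void {
  // langCode is a BCP-47 tag such as "es-ES" or "hi-IN".
  const utterance = new SpeechSynthesisUtterance(translatedText);
  utterance.lang = langCode;
  utterance.rate = 0.9; // slightly slower speech can help comprehension
  window.speechSynthesis.cancel(); // stop any earlier narration first
  window.speechSynthesis.speak(utterance);
}
```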
This combination of multimodal features changes how visually impaired individuals interact with their environment, closing gaps in accessibility and inclusivity.