James Hoang


🌞 👁️ Sunbeam: AI-Powered Visual Assistant for the Visually Impaired

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

Sunbeam is an AI-powered visual assistant designed for the estimated 1.3 billion visually impaired people worldwide. Built as a voice-first web application, it turns any smartphone camera into intelligent eyes, letting users navigate and understand their environment through natural conversation with AI.

The app addresses critical daily challenges faced by visually impaired individuals:

  • Scene Understanding: Describing surroundings in real-time
  • Text Reading: OCR for menus, signs, labels, and documents
  • Object Detection: Identifying and locating items with precision
  • People Finding: Locating people nearby with spatial guidance
  • Currency Recognition: Identifying money and denominations
  • Color Detection: Determining colors at fingertip precision
  • Document Analysis: Processing PDFs, Word documents, and images
  • Conversational AI: Natural dialogue about the environment

Sunbeam goes beyond basic accessibility tools by acting as a humanized AI companion that offers warm, supportive interaction rather than robotic responses. The app features haptic feedback patterns, voice-first navigation, and enterprise-grade accessibility compliance.

Demo

🎬 ⭐️ Sunbeam Production Video: A Vision Story ⭐️: https://www.youtube.com/watch?v=WShs1iW2LJg

🎥 Technical Demo Video: https://youtu.be/Pek2vHmnQXo

🚀 Live App: https://sunbeam-55403884521.us-west1.run.app

How I Used Google AI Studio

Sunbeam is built entirely on Google AI Studio's Gemini ecosystem, leveraging multiple cutting-edge capabilities:

Core Integration:

  • Gemini 2.5 Flash API via @google/genai package
  • Structured Output: JSON Schema implementation for consistent AI responses
  • Custom System Instructions: "Sunbeam" personality with accessibility-focused prompting

Google AI Studio Features Utilized:

  • Multimodal Content Understanding for image, text, and audio processing
  • Real-time Streaming for conversational mode
  • Context-Aware Responses with conversation memory
  • Safety Controls for user protection and appropriate guidance

Multimodal Features

Sunbeam showcases comprehensive multimodal AI capabilities across three primary domains:

🖼️ Visual Intelligence

  • Scene Analysis: Gemini processes camera feeds to describe environments with contextual detail
  • Object Detection: Real-time identification with normalized bounding box coordinates
  • Text Recognition (OCR): Extraction and reading of text from images, documents, and signage
  • Document Processing: Multi-format analysis supporting JPEG, PNG, PDF, and DOCX files
  • Color Recognition: Pixel-level color analysis with mathematical precision using Euclidean distance algorithms
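
To make the Euclidean-distance idea concrete, here is a minimal sketch that maps a sampled pixel to the nearest named color. The post does not say which color space or palette Sunbeam uses, so plain RGB and the small `PALETTE` below are assumptions, as is the function name `nearestColorName`.

```typescript
type Rgb = [number, number, number];

// Hypothetical reference palette; a real app would likely use a larger one.
const PALETTE: Record<string, Rgb> = {
  black: [0, 0, 0],
  white: [255, 255, 255],
  gray: [128, 128, 128],
  red: [255, 0, 0],
  green: [0, 128, 0],
  blue: [0, 0, 255],
  yellow: [255, 255, 0],
  orange: [255, 165, 0],
  purple: [128, 0, 128],
};

// Straight-line distance between two colors in RGB space.
function colorDistance(a: Rgb, b: Rgb): number {
  return Math.hypot(a[0] - b[0], a[1] - b[1], a[2] - b[2]);
}

// Returns the palette name whose reference color is closest to the pixel.
function nearestColorName(pixel: Rgb): string {
  let bestName = "";
  let bestDistance = Infinity;
  for (const name of Object.keys(PALETTE)) {
    const d = colorDistance(pixel, PALETTE[name]);
    if (d < bestDistance) {
      bestDistance = d;
      bestName = name;
    }
  }
  return bestName;
}
```

RGB distance is simple but does not match human perception very well, so colors near a boundary (for example yellow versus orange) can be labeled unexpectedly. A perceptual space such as CIELAB is a common alternative if that matters.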

🎤 Audio Intelligence

  • Speech Recognition: Web Speech API integration for natural voice commands
  • Text-to-Speech: Custom audio synthesis with Web Audio API
  • Voice Commands: "Hey Sunbeam" wake word with intelligent timeout management
  • Conversational AI: Real-time streaming chat with spatial awareness
  • Audio Feedback: Sophisticated haptic patterns (tap, success, error) designed like musical notes
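
The wake-word and timeout behavior can be sketched as a small pure function over recognized transcripts. This is an illustration only: the 8-second listening window, the `WakeState` shape, and the name `handleTranscript` are assumptions, and Sunbeam's actual timeout rules are not described in the post.

```typescript
const WAKE_WORD = "hey sunbeam";
const LISTEN_WINDOW_MS = 8000; // assumed: how long the app stays attentive

interface WakeState {
  awakeUntil: number; // timestamp in ms; 0 means asleep
}

interface WakeResult {
  state: WakeState;
  command: string | null; // null when nothing should be acted on
}

// Decides whether a transcript should be treated as a command.
// Commands are returned lowercased with the wake word stripped.
function handleTranscript(state: WakeState, transcript: string, now: number): WakeResult {
  const text = transcript.trim().toLowerCase();
  const idx = text.indexOf(WAKE_WORD);

  if (idx !== -1) {
    // Wake word heard: open the window and use any trailing words as the command.
    const rest = text.slice(idx + WAKE_WORD.length).replace(/^[\s,.!?]+/, "");
    return {
      state: { awakeUntil: now + LISTEN_WINDOW_MS },
      command: rest.length > 0 ? rest : null,
    };
  }
  if (now >= state.awakeUntil) {
    // Window expired (or never opened): ignore speech and go back to sleep.
    return { state: { awakeUntil: 0 }, command: null };
  }
  if (text.length === 0) {
    // Still awake, but nothing was said.
    return { state, command: null };
  }
  // Follow-up command inside the window: accept it and extend the window.
  return { state: { awakeUntil: now + LISTEN_WINDOW_MS }, command: text };
}
```

In a browser, the transcripts would typically come from the Web Speech API's `SpeechRecognition` result events and `now` from `Date.now()`. Keeping the decision logic free of browser APIs makes the timeout rules easy to test on their own.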

🧠 Cross-Modal Intelligence

  • Spatial Awareness: Visual object detection → Audio spatial guidance ("to your right", "very close")
  • Context Preservation: Visual analysis informs conversational responses
  • Multi-Input Processing: Simultaneous camera, voice, and file input handling
  • Real-time Coordination: Background process management ensuring smooth multimodal experience
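
The step from a detected bounding box to a spoken cue such as "to your right" or "very close" can be sketched as follows. The box format (top-left origin, values from 0 to 1), the thresholds, and the name `describePosition` are all assumptions for illustration; converting from whatever coordinate format the model returns is left out.

```typescript
// Hypothetical box format: top-left corner plus size, each normalized to 0..1.
interface NormalizedBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Turns a detection into a short phrase suitable for text-to-speech.
function describePosition(label: string, box: NormalizedBox): string {
  const centerX = box.x + box.width / 2;
  const area = box.width * box.height; // fraction of the frame the object fills

  // Split the frame into left / center / right thirds.
  const direction =
    centerX < 1 / 3 ? "to your left" :
    centerX > 2 / 3 ? "to your right" :
    "ahead of you";

  // Use apparent size as a rough stand-in for distance (assumed thresholds).
  const proximity =
    area > 0.4 ? "very close" :
    area > 0.1 ? "nearby" :
    "farther away";

  return `${label} ${direction}, ${proximity}`;
}
```

Apparent size is only a rough proxy for distance, since a small object held close can fill the frame while a large one far away does not. Thresholds like these would need tuning against real camera footage.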

Enhanced User Experience Through Multimodality:

1. Independence Enablement: Users can navigate environments hands-free using voice while receiving rich visual information through audio
2. Natural Interaction: Combines visual AI understanding with conversational dialogue, mimicking human assistance
3. Accessibility Excellence: Voice-first design eliminates visual interface dependency while maintaining visual feedback for sighted companions
4. Emotional Connection: "Sunbeam" AI personality creates supportive, warm interactions rather than clinical tool usage


Impact Statement

Sunbeam represents more than technological achievement: it's a bridge to independence for millions of visually impaired individuals worldwide. By combining Google AI Studio's powerful multimodal capabilities with accessibility-first design, I've created a solution that doesn't just process information but transforms lives through empowering technology.

Built with ❤️ for accessibility

