This is a submission for the Google AI Studio Multimodal Challenge
What I Built
MediVision Assistant - An AI-powered healthcare companion that makes medical assistance accessible to everyone, especially those with visual impairments or accessibility needs. The app combines computer vision, voice recognition, and AI chat to provide comprehensive health monitoring and assistance.
Key Features:
- 🖼️ AI Skin Analysis - Upload photos for instant skin condition assessment
- 🎤 Voice Symptom Logger - Record and transcribe health symptoms using speech-to-text
- 💊 Medication Scanner - OCR-powered medication identification and management
- 💬 AI Health Chat - Conversational AI for health questions and guidance
- ♿ Full Accessibility Support - Voice navigation, screen reader compatibility, high contrast mode
- 📱 Progressive Web App - Works offline, installable on any device
Demo
Live Application: https://medivision-assistant-ov3t3b7vaa-uc.a.run.app
GitHub Repository: https://github.com/omkardongre/medi-vision-assistant-ai
Screenshots
Homepage Dashboard: Clean, accessible dashboard with health summary and quick actions
Skin Analysis: AI-powered skin condition analysis with detailed insights
Voice Logger: Voice-to-text symptom recording with transcription
Health Chat: Conversational AI for health questions
Accessibility Features: Comprehensive accessibility toolbar with voice navigation
How I Used Google AI Studio
I leveraged Google AI Studio extensively to power the multimodal capabilities:
1. Gemini 2.0 Flash Experimental for Skin Analysis
- Integrated Gemini's vision capabilities to analyze uploaded skin photos
- Provides detailed assessments of skin conditions, moles, rashes, and other dermatological concerns
- Returns structured health insights with confidence scores and recommendations
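Roughly, that vision call looks like the sketch below, using the official @google/generative-ai SDK. The function name, prompt wording, and base64 input are illustrative, not the app's exact code:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash-exp" });

// Hypothetical helper: sends a base64-encoded photo plus an instruction prompt
// and returns Gemini's textual assessment.
export async function analyzeSkinPhoto(base64Jpeg: string): Promise<string> {
  const result = await model.generateContent([
    "Describe any visible skin findings (moles, rashes, discoloration), " +
      "estimate a confidence level, and suggest whether a doctor visit is advisable. " +
      "This is informational, not a diagnosis.",
    { inlineData: { data: base64Jpeg, mimeType: "image/jpeg" } },
  ]);
  return result.response.text();
}
```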
2. Gemini 2.0 Flash Experimental for Health Chat
- Powers the conversational AI health assistant
- Processes natural language health questions and provides evidence-based responses
- Maintains conversation context for follow-up questions
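Conversation context comes from threading earlier turns into a chat session. A simplified sketch (the example history is made up):

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash-exp" });

export async function askHealthQuestion(question: string): Promise<string> {
  // startChat keeps earlier turns so follow-up questions stay in context.
  const chat = model.startChat({
    history: [
      { role: "user", parts: [{ text: "I've had a mild headache since yesterday." }] },
      { role: "model", parts: [{ text: "Noted. Is the pain constant or intermittent?" }] },
    ],
  });
  const reply = await chat.sendMessage(question);
  return reply.response.text();
}
```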
3. Multimodal Integration
- Combined text, image, and voice inputs for comprehensive health monitoring
Multimodal Features
🖼️ Image + Text Analysis
- Skin Photo Analysis: Users upload photos of skin conditions, and Gemini analyzes them for potential health concerns
- Medication OCR: Scans medication labels and bottles to extract drug names, dosages, and instructions (see the sketch after this list)
- Visual Health Monitoring: Tracks changes in skin conditions over time with AI-powered insights
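For the medication scanner, the label photo goes to the same model with a prompt asking for structured JSON. The field names and schema below are a sketch of the approach, not the app's exact types:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const ocrModel = genAI.getGenerativeModel({
  model: "gemini-2.0-flash-exp",
  generationConfig: { responseMimeType: "application/json" },
});

// Hypothetical shape for the extracted label data.
interface MedicationInfo {
  name: string;
  dosage: string;
  instructions: string;
}

export async function scanMedicationLabel(base64Jpeg: string): Promise<MedicationInfo> {
  const result = await ocrModel.generateContent([
    "Extract the medication name, dosage, and usage instructions from this label. " +
      'Respond as JSON with keys "name", "dosage", and "instructions".',
    { inlineData: { data: base64Jpeg, mimeType: "image/jpeg" } },
  ]);
  return JSON.parse(result.response.text()) as MedicationInfo;
}
```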
🎤 Voice + Text Processing
- Voice Symptom Logger: Records audio descriptions of symptoms and converts them to structured text
- Voice Navigation: Complete app navigation using voice commands ("go home", "skin analysis", "emergency")
- Audio Feedback: Text-to-speech responses for accessibility
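Voice input and audio feedback both run in the browser through the Web Speech API. A simplified sketch (the onSymptom callback is hypothetical):

```typescript
// Voice symptom logging with the browser Web Speech API.
// SpeechRecognition is vendor-prefixed in Chromium-based browsers.
export function startSymptomRecording(onSymptom: (transcript: string) => void) {
  const Recognition =
    (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
  const recognition = new Recognition();
  recognition.lang = "en-US";
  recognition.interimResults = false;

  recognition.onresult = (event: any) => {
    const transcript = event.results[0][0].transcript;
    onSymptom(transcript);
    // Speak the result back for accessibility (text-to-speech).
    window.speechSynthesis.speak(
      new SpeechSynthesisUtterance(`Logged symptom: ${transcript}`)
    );
  };

  recognition.start();
}
```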
💬 Conversational AI
- Contextual Health Chat: AI remembers previous conversations and provides personalized health guidance
- Multimodal Queries: Users can ask questions about their uploaded images, voice recordings, or general health topics
- Emergency Response: Voice-activated emergency protocols with immediate AI assistance
♿ Accessibility-First Design
- Screen Reader Compatible: Full ARIA labels and semantic HTML
- Voice Commands: Navigate the entire app using voice ("skin analysis", "medication scanner", "help"); a routing sketch follows this list
- High Contrast Mode: Enhanced visibility for users with visual impairments
- Font Scaling: Adjustable text size up to 300%
- Keyboard Navigation: Complete app functionality without mouse
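Under the hood, the voice commands above boil down to a phrase-to-route map handed to the Next.js router. A rough sketch (the route paths are assumptions, not the app's actual URLs):

```typescript
import { useRouter } from "next/navigation";

// Hypothetical phrase-to-route table; the real app may use different paths.
const VOICE_ROUTES: Record<string, string> = {
  "go home": "/",
  "skin analysis": "/skin-analysis",
  "medication scanner": "/medications",
  "emergency": "/emergency",
  "help": "/help",
};

// Runs client-side with a recognized transcript and the app router instance.
export function handleVoiceCommand(
  transcript: string,
  router: ReturnType<typeof useRouter>
) {
  const route = VOICE_ROUTES[transcript.trim().toLowerCase()];
  if (route) {
    router.push(route);
  } else {
    // Spoken fallback so non-sighted users know the command was not understood.
    window.speechSynthesis.speak(
      new SpeechSynthesisUtterance("Sorry, I didn't catch that command.")
    );
  }
}
```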
🔄 Data Integration
- Health Records: All multimodal inputs (images, voice, chat) are stored as health records for later review
- Export Capabilities: Users can export their health data for medical consultations
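Export can be as simple as pulling the user's records out of Supabase and serializing them. A sketch assuming a hypothetical health_records table; the table and column names are illustrative:

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!
);

// Fetches all records for a user and returns them as a downloadable JSON string.
export async function exportHealthRecords(userId: string): Promise<string> {
  const { data, error } = await supabase
    .from("health_records")
    .select("*")
    .eq("user_id", userId)
    .order("created_at", { ascending: true });

  if (error) throw error;
  return JSON.stringify(data, null, 2);
}
```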
Technical Implementation
- Frontend: Next.js 15 with TypeScript and Tailwind CSS
- AI Integration: Google AI Studio with Gemini 2.0 Flash Experimental
- Voice Processing: Web Speech API for speech-to-text and text-to-speech
- Image Processing: Canvas API for image optimization and preprocessing (sketched below)
- Deployment: Google Cloud Run with automatic scaling
- Database: Supabase for health records and user data
- Accessibility: WCAG 2.1 AA compliant with comprehensive testing
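As one example from that stack, photos are downscaled client-side with the Canvas API before being sent to Gemini. A sketch; the size limit and JPEG quality are illustrative values:

```typescript
// Downscale and re-encode a photo in the browser before upload, returning
// base64 JPEG data ready for Gemini's inlineData field.
export async function preprocessImage(file: File, maxSize = 1024): Promise<string> {
  const bitmap = await createImageBitmap(file);
  const scale = Math.min(1, maxSize / Math.max(bitmap.width, bitmap.height));

  const canvas = document.createElement("canvas");
  canvas.width = Math.round(bitmap.width * scale);
  canvas.height = Math.round(bitmap.height * scale);

  const ctx = canvas.getContext("2d")!;
  ctx.drawImage(bitmap, 0, 0, canvas.width, canvas.height);

  // 0.85 JPEG quality is a reasonable trade-off between upload size and detail.
  return canvas.toDataURL("image/jpeg", 0.85).split(",")[1];
}
```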
Impact & Accessibility
This project demonstrates how AI can make healthcare more accessible to everyone, particularly:
- Visually impaired users who can navigate entirely by voice
- Elderly users who may have difficulty with complex interfaces
- Users with motor disabilities who rely on voice commands
- Non-native speakers who can describe symptoms in their own words
The multimodal approach ensures that health monitoring is not limited by traditional input methods, making medical assistance truly inclusive.
Built with ❤️ for the Google AI Studio Multimodal Challenge