This is a submission for the Google AI Studio Multimodal Challenge
What I Built
MediVision Assistant - An AI-powered healthcare companion that makes medical assistance accessible to everyone, especially those with visual impairments or accessibility needs. The app combines computer vision, voice recognition, and AI chat to provide comprehensive health monitoring and assistance.
Key Features:
- 🖼️ AI Skin Analysis - Upload photos and videos for instant skin condition assessment
- 🎨 AI Health Infographics - Generate professional medical infographics using Imagen 4.0
- 🎤 Voice Symptom Logger - Record and transcribe health symptoms using speech-to-text
- 💊 Medication Scanner - OCR-powered medication identification and management
- 💬 AI Health Chat - Conversational AI for health questions and guidance
- 🔗 Seamless Analysis-to-Chat Integration - Continue conversations with AI based on analysis results
- ♿ Full Accessibility Support - Voice navigation, screen reader compatibility, high contrast mode
- 📱 Progressive Web App - Works offline, installable on any device
Demo
Live Application: https://medivision.omkard.site (Custom domain mapped via Google Cloud Run)
Backup Link: https://medivision-assistant-968390101733.us-central1.run.app (Direct Cloud Run URL)
GitHub Repository: https://github.com/omkardongre/medi-vision-assistant-ai
Demo : https://youtu.be/kxGtnp9X_48?si=rvrUcb-HwdogB7pS
Screenshots
Homepage Dashboard: Clean, accessible dashboard with health summary and quick actions
Skin Analysis: AI-powered skin condition analysis with detailed insights
Voice Logger: Voice-to-text symptom recording with transcription
Health Chat: Conversational AI for health questions
AI Health Infographics: Professional medical infographics
Accessibility Features: Comprehensive accessibility toolbar with voice navigation
How I Used Google AI Studio
I leveraged Google AI Studio extensively to power the multimodal capabilities:
1. Gemini 2.5 Flash for Skin Analysis (Image + Video)
- Integrated Gemini's vision capabilities to analyze uploaded skin photos and videos
- Provides detailed assessments of skin conditions, moles, rashes, and other dermatological concerns
- Supports video analysis for dynamic skin condition monitoring and movement patterns
- Returns structured health insights with confidence scores and recommendations
- Supports multiple video formats (MP4, MOV, AVI, WebM) up to 25MB
2. Gemini 2.5 Flash for Health Chat
- Powers the conversational AI health assistant
- Processes natural language health questions and provides evidence-based responses
- Maintains conversation context for follow-up questions
3. Imagen 4.0 for Health Infographics
- Integrated Google Imagen 4.0 for professional medical infographic generation
- Creates medication schedules, health progress charts, and symptom tracking visuals
- Generates accessible, high-contrast infographics with professional medical styling
- Supports download and sharing of AI-generated health content
- Uses latest Imagen for cutting-edge image generation
4. Multimodal Integration
- Combined text, image, video, voice, and AI-generated visual content for comprehensive health monitoring
Multimodal Features
🎥 Video + Text Analysis (Skin Analysis Page)
- Video Skin Monitoring: Users upload videos for dynamic skin condition analysis and movement patterns
- Symptom Documentation: Video recordings of skin symptoms for detailed medical assessment
🖼️ Image + Text Analysis
- Skin Photo Analysis: Users upload photos of skin conditions, and Gemini analyzes them for potential health concerns
- Medication OCR: Scans medication labels and bottles to extract drug information, dosages, and instructions
🎤 Voice + Text Processing
- Voice Symptom Logger: Records audio descriptions of symptoms and converts them to structured text
- Voice Navigation: Complete app navigation using voice commands ("go home", "skin analysis", "emergency")
- Audio Feedback: Text-to-speech responses for accessibility
💬 Conversational AI
- Contextual Health Chat: AI remembers previous conversations and provides personalized health guidance
- Seamless Analysis Integration: After any analysis (skin, medication, voice logger), users can click "Discuss with AI Assistant" to continue the conversation with full context of their analysis results
♿ Accessibility-First Design
- Screen Reader Compatible: Full ARIA labels and semantic HTML
- Voice Commands: Navigate the entire app using voice ("skin analysis", "medication scanner", "help")
- High Contrast Mode: Enhanced visibility for users with visual impairments
- Font Scaling: Adjustable text size up to 300%
- Keyboard Navigation: Complete app functionality without mouse
🎨 AI-Generated Visual Content
- Health Infographics: Professional medical charts and schedules generated by Imagen 4.0
- Medication Schedules: Visual medication timing and dosage charts
- Progress Tracking: Health milestone and achievement visualizations
- Symptom Charts: Color-coded symptom monitoring and tracking graphics
- Download & Share: Export AI-generated infographics for medical consultations
🔄 Data Integration
- Health Records: All multimodal inputs (videos, images, voice, chat, infographics) are stored and organized
- Export Capabilities: Users can export their health data and AI-generated infographics for medical consultations
- Video Storage: Secure video analysis results
Technical Implementation
- Frontend: Next.js 15 with TypeScript and Tailwind CSS
- AI Integration: Google AI Studio with Gemini 2.5 Flash (video, image, text, audio) and Imagen 4.0 (infographics)
- Voice Processing: Web Speech API for speech-to-text and text-to-speech
- Image Processing: Canvas API for image optimization and preprocessing
- Deployment: Google Cloud Run with automatic scaling
- Database: Supabase for health records and user data
- Accessibility: WCAG 2.1 AA compliant with comprehensive testing
Impact & Accessibility
This project demonstrates how AI can make healthcare more accessible to everyone, particularly:
- Visually impaired users who can navigate entirely by voice
- Elderly users who may have difficulty with complex interfaces
- Users with motor disabilities who rely on voice commands
- Non-native speakers who can describe symptoms in their own words
The multimodal approach ensures that health monitoring is not limited by traditional input methods, making medical assistance truly inclusive.
Built with ❤️ for the Google AI Studio Multimodal Challenge
Top comments (17)
Awesome 😍
Thank you 😊
Overall a good submission and nice article
Thank you 🙏
Simple and useful 👍
Can we collaborate, i have few ideas to make this super useful
Yes sure
i think with some more feature integration and enhancements, this can be real product as well. Good work, all the best 👍
Thank you 😊
Very resourceful !
Thank you 😊
Good one
Thank you 😊
Couple of useful features, great work
Thank you
Simple idea, but you clubbed it well with multiple features.
Is the code public on github?
Thank you