Healthcare workers in rural clinics and home-care settings often face critical situations without immediate access to specialist guidance. MediSense is an AI-powered remote emergency co-pilot that provides real-time voice, video, and screen analysis to support clinical decision-making — built entirely on Google AI and Google Cloud.
Live Demo: https://medisense-130810972151.us-central1.run.app
GitHub: https://github.com/lakshay0007/MediSense
The Problem
A junior nurse in a remote clinic encounters a wound she hasn't seen before. There's no senior doctor on-site. She needs guidance — right now — not in 30 minutes when someone calls back. MediSense bridges that gap with an AI co-pilot that can see what she sees and talk her through it in real time.
Google AI Models Used
- Gemini Live 2.5 Flash (Native Audio): Real-Time Multimodal Streaming
  The core of MediSense is the Gemini Multimodal Live API (gemini-live-2.5-flash-native-audio). It enables:
  - Live voice conversation: the nurse speaks naturally and gets spoken responses
  - Real-time camera analysis: point a phone camera at a wound, equipment, or patient and Gemini analyzes the video feed live
  - Screen-sharing analysis: share an EHR screen or vital-signs monitor for AI interpretation
- Gemini 2.0 Flash (Image Generation): Visual Aid Generation
  MediSense also uses gemini-2.0-flash-preview-image-generation to generate visual aids on demand: anatomical diagrams, procedure illustrations, or reference images that help guide clinical procedures.
Google Cloud Architecture
Vertex AI
All Gemini API calls go through Vertex AI, Google Cloud's managed ML platform.
Using Vertex AI gives us enterprise-grade authentication (OAuth 2.0), regional endpoints, and production-ready reliability.
Cloud Run
The app is containerized with Docker and deployed to Google Cloud Run for serverless, auto-scaling hosting.
Cloud Run handles scaling automatically — zero instances when idle, scaling up under load — perfect for a healthcare tool that needs to be always available but cost-efficient.
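Deployment is a single command; the service name, region, and instance limits below are illustrative, not MediSense's exact settings:

```shell
# Build from source and deploy to Cloud Run (values are examples)
gcloud run deploy medisense \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --min-instances 0 \
  --max-instances 10
```

`--min-instances 0` is what gives the scale-to-zero cost profile described above.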
Cloud Build
CI/CD is handled by Google Cloud Build, which builds the Docker image directly from source.
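As a sketch (the project ID and image name are placeholders), a build submission looks like:

```shell
# Cloud Build reads the Dockerfile, builds the image, and pushes it to the registry
gcloud builds submit --tag gcr.io/MY_PROJECT/medisense .
```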
What I Learned
The Multimodal Live API is a game-changer. Streaming video and audio bidirectionally opens up use cases that weren't possible with traditional request-response APIs, and healthcare is among the most impactful of them.
Vertex AI simplifies production deployment. OAuth-based auth, regional endpoints, and the google-genai SDK made it straightforward to go from prototype to production.
Cloud Run + Cloud Build = fast iteration. Push code, build container, deploy — all in under 2 minutes. For a hackathon, this speed is essential.