This is a submission for the Google AI Studio Multimodal Challenge
What I Built
I built MedLens AI, an educational web applet designed to demystify complex medical imaging for patients and medical students. The problem it solves is the significant knowledge gap and anxiety that come with viewing an X-ray, MRI, or other scan. These images are often intimidating and filled with technical details that are incomprehensible to the untrained eye.
MedLens AI provides a safe and accessible bridge to understanding. A user uploads a medical image, and the applet leverages Gemini's powerful multimodal capabilities to generate a clear, structured explanation.
Crucially, MedLens AI is not a diagnostic tool. Its primary design goal is safety and education. The core experience is built around three pillars:
Objective Description: It describes what is visible in the image using neutral, anatomical terms.
Educational Context: It explains the relevant body parts to help the user understand the 'where' and 'what'.
Empowerment through Questions: It generates a list of intelligent, relevant questions the user can then ask their doctor or professor, facilitating a more meaningful and informed conversation.
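The three pillars above map naturally onto a structured response object in the backend. Here is a minimal sketch in Python; the field names are my own illustration, not the applet's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Explanation:
    """One analysis result, mirroring the three pillars.

    Field names are hypothetical stand-ins for illustration."""
    description: str  # objective, anatomical description of what is visible
    context: str      # educational background on the relevant body parts
    questions: list[str] = field(default_factory=list)  # questions to bring to a doctor

    def is_complete(self) -> bool:
        # A usable explanation needs all three pillars populated.
        return bool(self.description and self.context and self.questions)
```

Keeping the three pillars as separate fields also makes it easy to render each one in its own UI section.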
Before a user can even upload an image, they must acknowledge a prominent disclaimer that the tool is for educational purposes only and is not a substitute for professional medical advice.
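That acknowledgment gate can be enforced server-side as well as in the UI. A minimal sketch, assuming a hypothetical session flag (the deployed applet's flow may differ):

```python
def can_upload(session: dict) -> bool:
    """Allow image upload only after the user has explicitly
    acknowledged the educational-use-only disclaimer.

    `disclaimer_acknowledged` is a hypothetical session key used
    here for illustration; requiring `is True` rejects truthy
    junk values and forces an explicit opt-in."""
    return session.get("disclaimer_acknowledged") is True

# The upload handler would reject requests until the flag is set:
session = {}
assert not can_upload(session)            # no acknowledgment yet -> blocked
session["disclaimer_acknowledged"] = True
assert can_upload(session)                # acknowledged -> upload allowed
```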
Demo
https://medical-imaging-explainer-185762573786.us-west1.run.app/
How I Used Google AI Studio
Google AI Studio was the central workbench for this entire project. My development process was:
Prototyping and Iteration: I used the AI Studio interface to craft, test, and refine the core logic of the application: the system prompt. I experimented with different instructions to ensure the model's output was not only accurate but also adhered to my strict safety guardrails (most importantly, never offering a diagnosis).
Model Selection: AI Studio allowed me to easily test and compare Gemini 1.5 Flash for speed and Gemini 1.5 Pro for its deep analytical power. For this complex, specialized task, I opted for Gemini 1.5 Pro to ensure the highest quality analysis.
Prompt Engineering: The ability to quickly provide an image and text prompt in the same interface was essential. I fine-tuned the prompt to request a specific XML-style output format with named tags, which made parsing the model's response in my Python backend predictable and reliable.
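Parsing that kind of XML-style reply can be done with the standard library alone. A minimal sketch, using hypothetical tag names (the post's actual tags were lost in formatting) and a regex that tolerates extra prose the model may emit around the tags:

```python
import re

def parse_sections(reply: str,
                   tags=("description", "context", "questions")) -> dict:
    """Pull named <tag>...</tag> sections out of the model's reply.

    The tag names are hypothetical stand-ins for illustration.
    Missing tags map to an empty string rather than raising,
    so a malformed reply degrades gracefully."""
    sections = {}
    for tag in tags:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", reply, re.DOTALL)
        sections[tag] = match.group(1).strip() if match else ""
    return sections

# Example with a canned model reply:
reply = (
    "<description>Frontal chest radiograph.</description>"
    "<context>The ribs and lung fields are visible.</context>"
    "<questions>1. What does this region show?</questions>"
)
parsed = parse_sections(reply)
# parsed["description"] == "Frontal chest radiograph."
```

Asking the model for delimited sections like this is what makes the backend's parsing step deterministic even though the generation itself is not.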
Multimodal Features
The core of MedLens AI is its advanced multimodal medical image analysis. This is where Gemini's power truly shines, enhancing the user experience in a way that wouldn't otherwise be possible.
What it does: The applet doesn't just "see" an image; it performs a deep, contextual analysis. It takes the visual data from a scan and cross-references it with its vast knowledge base to identify anatomical structures and describe visual features in precise, technical language.
Why it enhances the experience: A user could try to search for their symptoms with text, but that is generic and unreliable. MedLens AI provides an explanation that is directly tied to their specific visual data. It transforms a static, intimidating medical image into a dynamic, interactive learning tool. By generating both a detailed textual description and a list of contextual questions from a single image, Gemini creates a holistic educational package that empowers the user and fosters greater health literacy.