Svet

Gemini's Paw-sitive Insight: An AI Assistant for Your Pet's Health

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

I built AI Veterinary Assistant, an empathetic and interactive tool designed to help pet owners understand their pet's condition before a vet visit. The app goes beyond a simple information-gathering tool by providing a preliminary analysis, including potential differential diagnoses and immediate, actionable recommendations to alleviate discomfort. It uses multimodal AI to synthesize a pet's image/video, the owner's observations, and a live voice conversation into a clear, preliminary report. The goal is to empower pet owners with information and a sense of preparedness, turning a stressful situation into a more manageable one.

Demo

Deployed Applet: https://ai-veterinary-assistant-162545858215.us-west1.run.app/
Video Demo:
Screenshots:

How I Used Google AI Studio

I used Google AI Studio as the core platform for both development and deployment. Its intuitive environment allowed me to quickly prototype the multimodal and conversational flow. I implemented a detailed System Instruction to guide the Gemini model: it prompts the model not only to summarize the reported symptoms but also to generate a list of likely differential diagnoses and safe, immediate care recommendations.
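Here is a minimal sketch of what wiring up such a system instruction can look like with the @google/genai SDK. The model name, instruction text, and example prompt below are illustrative placeholders, not the app's actual values:

```typescript
// A minimal sketch of passing a system instruction with @google/genai.
// The model name, instruction text, and prompt are placeholders.
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const SYSTEM_INSTRUCTION = `
You are an empathetic veterinary assistant. For every case:
1. Summarize the reported symptoms.
2. List the most likely differential diagnoses.
3. Suggest safe, immediate care steps the owner can take at home.
Always close with a disclaimer that this is not a substitute for a vet visit.
`;

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "My cat has been limping on her front left paw since yesterday.",
  config: { systemInstruction: SYSTEM_INSTRUCTION },
});

console.log(response.text);
```

Keeping the output format in the system instruction rather than in each user prompt is what lets every result, including the disclaimer, stay consistent across the conversation.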

The seamless integration with Cloud Run was instrumental. By using the "Deploy app" feature, I was able to transform my prototype into a live, publicly accessible web service without having to build and manage a separate backend. This allowed me to focus on the frontend user experience and the core AI logic, including the critical disclaimer that accompanies every result to ensure responsible use.

Multimodal Features

This app is a robust demonstration of Gemini's multimodal capabilities, combining three key modalities to deliver a unique user experience:

Image and Video Analysis: The applet accepts visual input (a photo or video of the pet) and uses Gemini to analyze it for visual cues. For example, it can spot a rash in a photo, or a limp and behavioral changes in a video, and these observations feed into the preliminary analysis alongside the owner's notes (see the combined sketch below).

Text Understanding: The AI synthesizes the owner's written observations and notes to understand the history and context of the pet's condition. This text input is crucial for producing a detailed, personalized preliminary report.
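As a rough sketch of how these two modalities can travel in a single request (the file name, notes, and model here are hypothetical):

```typescript
// Sketch of one multimodal request that pairs a pet photo with the owner's
// written notes. The file name, notes, and model are hypothetical.
import { readFileSync } from "node:fs";
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Inline the image as base64; this suits small files, while larger media
// would go through the Files API instead.
const imageBase64 = readFileSync("pet-photo.jpg").toString("base64");

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: [
    { inlineData: { mimeType: "image/jpeg", data: imageBase64 } },
    {
      text:
        "Owner's notes: 3-year-old beagle, limping on the right hind leg " +
        "for two days, eating normally, no visible wounds.",
    },
  ],
});

console.log(response.text);
```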

Real-time Voice Interaction (Live API): This is the core interactive component. The pet owner can have a natural, real-time conversation with the AI assistant, which acts as a guide. It listens, transcribes, and uses the live dialogue to ask clarifying questions and build the most accurate preliminary report possible.
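A stripped-down sketch of opening such a session with the Live API through @google/genai might look like the following. The model name, system instruction, and audio plumbing are illustrative; the real applet captures microphone audio in the browser:

```typescript
// Sketch of opening a real-time voice session with the Live API via
// @google/genai. Model name and audio plumbing are illustrative.
import { GoogleGenAI, Modality } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const session = await ai.live.connect({
  model: "gemini-2.0-flash-live-001",
  config: {
    responseModalities: [Modality.AUDIO],
    systemInstruction:
      "Ask clarifying questions about the pet's symptoms, one at a time.",
  },
  callbacks: {
    // Audio chunks and transcriptions arrive here to be played or rendered.
    onmessage: (message) => console.log("server message:", message),
    onerror: (e) => console.error("live session error:", e),
    onclose: () => console.log("session closed"),
  },
});

// Stream captured microphone audio as base64-encoded 16 kHz PCM.
declare const pcmChunkBase64: string; // hypothetical: one captured audio chunk
session.sendRealtimeInput({
  audio: { data: pcmChunkBase64, mimeType: "audio/pcm;rate=16000" },
});
```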

This combination of image, text, and voice allows the AI to produce a comprehensive, deeply personalized assessment that the owner can bring straight to the vet visit.
