
ruturaj khandale

Smart Elderly Care Assistant

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

I built the "Smart Elderly Care Assistant," a web application that helps elderly individuals make sense of visual information. Users upload an image of a prescription, a product label, or any object they have questions about, then ask a question about the image by typing or speaking, and the application provides a clear, text-based answer. To further improve accessibility, the application can also read the generated answer aloud. This addresses a common problem: small print on medication bottles and dense product information are hard to read, and removing that barrier makes daily life safer and more independent for elderly users.

Demo

You can try the deployed application here:

https://smart-elderly-care-assistant-743601729048.us-central1.run.app

Screenshots and a short video of the application in action, especially the image upload and voice interaction, would give the best sense of its features.

How I Used Google AI Studio

I used Google AI Studio to prototype and test the core functionality of my application. It was instrumental in crafting the prompts and exploring the capabilities of the Gemini 2.5 Flash model to handle multimodal inputs (image and text). AI Studio allowed me to quickly iterate on different prompts and see how the model would respond to various images and questions, which was crucial for building a reliable and helpful application. The ability to easily test the API in a web interface before writing any code saved a significant amount of development time.
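Once the prompts worked well in AI Studio, moving to code is mostly a matter of sending the same image and question to the API. Below is a minimal sketch of what that image-plus-text call might look like using the `@google/genai` JavaScript SDK; the SDK choice, the `askAboutImage` helper, the `base64Image` variable, and the example prompt are assumptions for illustration, not the application's actual source.

```ts
import { GoogleGenAI } from "@google/genai";

// Assumes GEMINI_API_KEY is set in the server environment.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Hypothetical helper: send an uploaded image plus the user's question to Gemini.
// `base64Image` is assumed to hold the image bytes encoded as base64
// (for example, read from a file input on the page).
async function askAboutImage(base64Image: string, question: string): Promise<string> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      { inlineData: { mimeType: "image/jpeg", data: base64Image } },
      { text: question },
    ],
  });
  // `response.text` aggregates the model's text output.
  return response.text ?? "";
}

// Example: const answer = await askAboutImage(labelImage, "What is the dosage on this label?");
```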

Multimodal Features

The Smart Elderly Care Assistant leverages several multimodal features to create an intuitive and accessible user experience:

Image and Text Input: The core feature of the application is its ability to accept both an image and a text prompt as input. This allows users to provide visual context (the image) and then ask specific questions about it in natural language. This is a powerful combination that enables a wide range of use cases, from understanding medication instructions to identifying objects.
Speech-to-Text: To make the application even more accessible, I implemented a voice input feature. Users can simply click a button and speak their question instead of typing. This is particularly helpful for users who may have difficulty with keyboards.
Text-to-Speech: The application also includes a "read aloud" button that uses the browser's built-in speech synthesis to read the generated answer. This is a critical feature for users with visual impairments or those who simply prefer to listen to the response. A short sketch of both voice features appears after this list.
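Both voice features can be built on standard browser APIs, with no extra service required. The sketch below shows the general shape, assuming a hypothetical `listenForQuestion` callback wiring and a `question` input element; `webkitSpeechRecognition` covers Chrome, where the speech recognition API is still prefixed, and it is not available in every browser.

```ts
// Speech-to-text: capture a spoken question with the Web Speech API.
// (Prefixed as webkitSpeechRecognition in Chrome; unsupported browsers get undefined.)
const SpeechRecognitionCtor =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

function listenForQuestion(onQuestion: (text: string) => void): void {
  const recognition = new SpeechRecognitionCtor();
  recognition.lang = "en-US";
  recognition.interimResults = false;
  recognition.onresult = (event: any) => {
    // Use the first final result as the question text.
    onQuestion(event.results[0][0].transcript);
  };
  recognition.start();
}

// Text-to-speech: read the generated answer aloud with speechSynthesis.
function readAloud(answer: string): void {
  const utterance = new SpeechSynthesisUtterance(answer);
  utterance.rate = 0.9; // slightly slower than default, for clarity
  window.speechSynthesis.speak(utterance);
}

// Example wiring (hypothetical element id):
// listenForQuestion((q) => {
//   (document.getElementById("question") as HTMLInputElement).value = q;
// });
```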
