Building an App Just by Talking to AI: My Google AI Studio Multimodal Challenge

#devchallenge #googleaichallenge #ai #gemini

This is a submission for the Google AI Studio Multimodal Challenge.

What I Built

I built a voice assistant app to make conversations with AI more natural. It's designed for AI chat tools that lack built-in microphone input, such as Google Gemini. Instead of typing, users can simply speak to the AI.

The app converts spoken words into text in real time, and the text can then be copied to the clipboard with one click. This makes it quick and intuitive to paste the text into any AI chat, such as Gemini.
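The speech-to-text and one-click copy flow described above can be sketched with the browser's Web Speech API and Clipboard API. This is a minimal sketch under assumptions, not the app's actual generated code: the element IDs `transcript` and `copy` and the helper name `collectTranscript` are hypothetical, and speech recognition support varies by browser.

```javascript
// Pure helper: split a speech result list into final and interim text.
// `results` is array-like; each entry has an `isFinal` flag and a first
// alternative with a `transcript` string (the shape the Web Speech API uses).
function collectTranscript(results) {
  let finalText = '';
  let interimText = '';
  for (let i = 0; i < results.length; i++) {
    const text = results[i][0].transcript;
    if (results[i].isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText, interimText };
}

// Browser-only wiring; skipped where the APIs are unavailable (e.g. Node).
if (typeof window !== 'undefined') {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const output = document.getElementById('transcript'); // hypothetical <textarea>
  const copyButton = document.getElementById('copy');   // hypothetical <button>

  if (Recognition && output && copyButton) {
    const recognition = new Recognition();
    recognition.continuous = true;     // keep listening across pauses
    recognition.interimResults = true; // show words while they are being spoken

    recognition.onresult = (event) => {
      const { finalText, interimText } = collectTranscript(event.results);
      output.value = finalText + interimText;
    };

    copyButton.addEventListener('click', () => {
      // writeText returns a Promise and needs a secure context (HTTPS or localhost).
      navigator.clipboard.writeText(output.value).catch(console.error);
    });

    recognition.start();
  }
}
```

Keeping the transcript logic in a plain function makes it possible to check it without a microphone, while the browser wiring stays a thin layer on top.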

My project is based on a simple concept: anyone, even without programming knowledge, can use voice to interact with an AI and solve a real-world problem.

Demo

This is a short video demonstrating the app's functionality:
YouTube
GitHub page

How I Used Google AI Studio

This app is a story of co-creation with AI. I'm not a programmer; the project began when I simply started a conversation with Google AI Studio, saying, "I want to build an app that can take voice input."

Initial Ideation: I explained the app's concept and the necessary functions—voice recognition, text conversion, and a copy feature—to AI Studio.

Code Generation: AI Studio understood my instructions and generated the initial HTML, CSS, and JavaScript code.

Feature Refinement: I requested further improvements from AI Studio, such as "auto-start voice recognition" and "character count display," to enhance the user experience.
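The two refinements mentioned above, auto-start and a character count, might look roughly like this. Again, this is only a sketch and not the code AI Studio produced: the element IDs `transcript` and `count` and the helper `countCharacters` are hypothetical, and the browser will still ask for microphone permission before listening.

```javascript
// Count Unicode code points rather than UTF-16 units, so an emoji or a
// character outside the basic plane counts once instead of twice.
function countCharacters(text) {
  return Array.from(text).length;
}

// Browser-only wiring; skipped where the APIs are unavailable (e.g. Node).
if (typeof window !== 'undefined') {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const output = document.getElementById('transcript'); // hypothetical <textarea>
  const counter = document.getElementById('count');     // hypothetical <span>

  if (Recognition && output && counter) {
    const recognition = new Recognition();
    recognition.continuous = true;
    recognition.interimResults = true;

    recognition.onresult = (event) => {
      let text = '';
      for (let i = 0; i < event.results.length; i++) {
        text += event.results[i][0].transcript;
      }
      output.value = text;
      // Character count display, refreshed on every recognition result.
      counter.textContent = `${countCharacters(text)} characters`;
    };

    // Auto-start: begin listening once the page has finished loading.
    window.addEventListener('load', () => recognition.start());
  }
}
```

For example, `countCharacters('こんにちは')` returns 5, and `countCharacters('a😀')` returns 2 where `'a😀'.length` would give 3.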

This project proves that AI Studio is more than just a code generator; it's a creative partner that helps turn ideas into reality.

Multimodal Features

My app leverages "voice" as a new modality for interacting with AI.

Voice Input: Users can speak to the app instead of typing, making the interaction with AI feel more human and natural.

Extending AI Tools: My app expands the use of AI tools like Gemini by enabling them to be controlled with voice. This creates a richer user experience by combining two distinct modalities: voice input and text-based AI output.

This project is a small attempt to build a new type of "AI assistant" that combines voice and AI in a novel way.