*This is a submission: AI Toy Companion (https://ai.studio/apps/drive/1uoVd69qmdyCLy-Ju1v1681cUWT4myEdD)*
What I Built
I built an AI Toy Companion — a magical app that turns any toy photo into a living, talking, and smiling virtual friend.
This companion isn’t just a chatbot — it’s a multimodal experience powered by Google AI Studio that makes toys come alive with voice, actions, and stories.
Here’s what it can do:
See & Understand: Upload any toy photo, and the AI describes it in simple, playful words.
Talk & Listen: The toy responds in a cheerful, child-like voice to text or voice commands.
Act & Animate: Commands like “dance”, “jump”, or “be happy” trigger fun cartoon-style animations.
Tell Stories: On request, the toy invents short, imaginative stories with matching visuals.
Stay Positive: If asked to do something sad (like “cry”), the toy gently refuses and instead does something joyful (like smiling or hugging).
Why it’s special:
This project focuses on fun + safety + imagination. It transforms ordinary toys into interactive companions that always spread positivity, making it delightful for kids and even nostalgic for adults.
Demo
Demo video: https://drive.google.com/file/d/1S68icqkJGnwhg8sJfTscWbGkI1Jyd-LW/view?usp=sharing
How I Used Google AI Studio
I used Google AI Studio to bring my AI Toy Companion to life.
Gemini 2.5 Pro/Flash interprets the toy photo and the user's commands.
I created a prompt system that returns two things on every turn (a minimal sketch follows this list):
A playful reply from the toy (text + voice).
An image generation prompt so the toy can act (like dancing, jumping, smiling).
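Here is a minimal sketch of that prompt system, assuming the `@google/genai` TypeScript SDK with a JSON response schema. The function name, system instruction, and field names are my own illustration, not the app's actual code:

```typescript
import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// One call returns both outputs: the toy's in-character reply
// and an image prompt describing the requested action.
async function askToy(base64ToyPhoto: string, command: string) {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [{
      role: "user",
      parts: [
        { inlineData: { mimeType: "image/jpeg", data: base64ToyPhoto } },
        { text: command }, // e.g. "dance" or "tell me a story"
      ],
    }],
    config: {
      systemInstruction:
        "You are this toy, brought to life. Reply cheerfully in one or two " +
        "short sentences, and also write a cartoon-style image prompt that " +
        "shows the toy performing the requested action.",
      responseMimeType: "application/json",
      responseSchema: {
        type: Type.OBJECT,
        properties: {
          reply: { type: Type.STRING },       // spoken aloud via TTS
          imagePrompt: { type: Type.STRING }, // fed to the image model
        },
        required: ["reply", "imagePrompt"],
      },
    },
  });
  return JSON.parse(response.text ?? "{}") as {
    reply: string;
    imagePrompt: string;
  };
}
```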
For voice interaction, I used Speech-to-Text to understand users and Text-to-Speech to give the toy its own voice.
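In a browser app this can be wired up with the built-in Web Speech API. That choice is an assumption on my part, since the post doesn't name the exact STT/TTS services, so treat this as one possible sketch:

```typescript
// Speech-to-Text: capture a spoken command like "jump" or "be happy".
const SpeechRecognitionCtor =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

function listenForCommand(onCommand: (command: string) => void) {
  const recognition = new SpeechRecognitionCtor();
  recognition.lang = "en-US";
  recognition.onresult = (event: any) => {
    onCommand(event.results[0][0].transcript);
  };
  recognition.start();
}

// Text-to-Speech: read the toy's reply in a cheerful, child-like voice.
function speakAsToy(reply: string) {
  const utterance = new SpeechSynthesisUtterance(reply);
  utterance.pitch = 1.8; // raised pitch sounds more toy-like
  utterance.rate = 1.1;
  window.speechSynthesis.speak(utterance);
}
```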
For visuals, I used Imagen/Veo to generate fun cartoon-style images and animations of the toy.
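For the still images, the same SDK exposes an Imagen endpoint. A sketch of that step, with the model id as an assumption (swap in whichever Imagen version your project uses):

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Turn the imagePrompt from the chat step into a cartoon-style picture.
async function renderAction(imagePrompt: string): Promise<string> {
  const result = await ai.models.generateImages({
    model: "imagen-3.0-generate-002", // assumed model id
    prompt: `${imagePrompt}, bright cartoon style, kid-friendly`,
    config: { numberOfImages: 1, outputMimeType: "image/jpeg" },
  });
  // Base64 image bytes, ready to display as an <img> data URL.
  return result.generatedImages?.[0]?.image?.imageBytes ?? "";
}
```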
Finally, I deployed everything with Cloud Run so people can try it live.
With Google AI Studio’s multimodal power, the toy is no longer just a photo — it can see, talk, act, and even perform in generated images.
Multimodal Features
My AI Toy Companion uses multiple AI modes together to create a fun and interactive experience:
Image Understanding → The AI looks at toy photos and describes them in simple words.
Voice Interaction → Users can give commands by speaking, and the toy replies in a playful voice.
Action Generation → Every command creates a short image/video prompt so the toy can perform actions like dancing, jumping, or smiling in cartoon style.
Storytelling & Emotions → The toy can tell short stories or show happy/angry expressions, making it feel alive.
Positive Twist → For negative commands (like “cry”), the toy gently changes them into something positive (like “smile” or “hug”); a small sketch of this remap follows below.
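Here is that sketch. The lookup table and function name are hypothetical, not the app's real code; the idea is simply to remap a negative command before it ever reaches the model:

```typescript
// Hypothetical remap of negative commands to joyful alternatives.
const POSITIVE_SWAPS: Record<string, string> = {
  cry: "smile",
  fight: "dance",
  "be sad": "give a big hug",
};

function keepItPositive(command: string): string {
  const normalized = command.trim().toLowerCase();
  return POSITIVE_SWAPS[normalized] ?? command;
}

keepItPositive("cry");   // -> "smile"
keepItPositive("dance"); // -> "dance" (unchanged)
```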
These multimodal features make the toy feel more like a real companion — it can see, listen, talk, act, and express emotions, which is far more engaging than a normal chatbot.
Since I built this project alone, this is an individual submission (no team members).
Thank You