*This is a submission: AI Toy Companion (https://ai.studio/apps/drive/1uoVd69qmdyCLy-Ju1v1681cUWT4myEdD)*
What I Built
I built an AI Toy Companion — a magical app that turns any toy photo into a living, talking, and smiling virtual friend.
This companion isn’t just a chatbot — it’s a multimodal experience powered by Google AI Studio that makes toys come alive with voice, actions, and stories.
Here’s what it can do:
See & Understand: Upload any toy photo, and the AI describes it in simple, playful words.
Talk & Listen: The toy responds in a cheerful, child-like voice to text or voice commands.
Act & Animate: Commands like “dance”, “jump”, or “be happy” trigger fun cartoon-style animations.
Tell Stories: On request, the toy invents short, imaginative stories with matching visuals.
Stay Positive: If asked to do something sad (like “cry”), the toy gently refuses and instead does something joyful (like smiling or hugging).
Why it’s special:
This project focuses on fun + safety + imagination. It transforms ordinary toys into interactive companions that always spread positivity, making it delightful for kids and even nostalgic for adults.
Demo
Demo video: https://drive.google.com/file/d/1S68icqkJGnwhg8sJfTscWbGkI1Jyd-LW/view?usp=sharing
How I Used Google AI Studio
I used Google AI Studio to bring my AI Toy Companion to life.
Gemini 2.5 Pro/Flash interprets the toy photo and the user's commands.
I created a prompt system that returns two things on every turn (a minimal sketch follows this list):
A playful reply from the toy (text + voice).
An image generation prompt so the toy can act (like dancing, jumping, smiling).
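Here is a minimal sketch of that prompt system, assuming the `@google/genai` TypeScript SDK with a JSON response schema. The function name, system instruction, and field names are my own illustration, not the app's actual code:

```typescript
import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// One call returns both outputs: the toy's in-character reply
// and an image prompt describing the requested action.
async function askToy(base64ToyPhoto: string, command: string) {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [{
      role: "user",
      parts: [
        { inlineData: { mimeType: "image/jpeg", data: base64ToyPhoto } },
        { text: command }, // e.g. "dance" or "tell me a story"
      ],
    }],
    config: {
      systemInstruction:
        "You are this toy, brought to life. Reply cheerfully in one or two " +
        "short sentences, and also write a cartoon-style image prompt that " +
        "shows the toy performing the requested action.",
      responseMimeType: "application/json",
      responseSchema: {
        type: Type.OBJECT,
        properties: {
          reply: { type: Type.STRING },       // spoken aloud via TTS
          imagePrompt: { type: Type.STRING }, // fed to the image model
        },
        required: ["reply", "imagePrompt"],
      },
    },
  });
  return JSON.parse(response.text ?? "{}") as {
    reply: string;
    imagePrompt: string;
  };
}
```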
For voice interaction, I used Speech-to-Text to understand users and Text-to-Speech to give the toy its own voice.
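In a browser app this can be wired up with the built-in Web Speech API. That choice is an assumption on my part, since the post doesn't name the exact STT/TTS services, so treat this as one possible sketch:

```typescript
// Speech-to-Text: capture a spoken command like "jump" or "be happy".
const SpeechRecognitionCtor =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

function listenForCommand(onCommand: (command: string) => void) {
  const recognition = new SpeechRecognitionCtor();
  recognition.lang = "en-US";
  recognition.onresult = (event: any) => {
    onCommand(event.results[0][0].transcript);
  };
  recognition.start();
}

// Text-to-Speech: read the toy's reply in a cheerful, child-like voice.
function speakAsToy(reply: string) {
  const utterance = new SpeechSynthesisUtterance(reply);
  utterance.pitch = 1.8; // raised pitch sounds more toy-like
  utterance.rate = 1.1;
  window.speechSynthesis.speak(utterance);
}
```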
For visuals, I used Imagen/Veo to generate fun cartoon-style images and animations of the toy.
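For the still images, the same SDK exposes an Imagen endpoint. A sketch of that step, with the model id as an assumption (swap in whichever Imagen version your project uses):

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Turn the imagePrompt from the chat step into a cartoon-style picture.
async function renderAction(imagePrompt: string): Promise<string> {
  const result = await ai.models.generateImages({
    model: "imagen-3.0-generate-002", // assumed model id
    prompt: `${imagePrompt}, bright cartoon style, kid-friendly`,
    config: { numberOfImages: 1, outputMimeType: "image/jpeg" },
  });
  // Base64 image bytes, ready to display as an <img> data URL.
  return result.generatedImages?.[0]?.image?.imageBytes ?? "";
}
```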
Finally, I deployed everything with Cloud Run so people can try it live.
With Google AI Studio’s multimodal power, the toy is no longer just a photo — it can see, talk, act, and even perform in generated images.
Multimodal Features
My AI Toy Companion uses multiple AI modes together to create a fun and interactive experience:
Image Understanding → The AI looks at toy photos and describes them in simple words.
Voice Interaction → Users can give commands by speaking, and the toy replies in a playful voice.
Action Generation → Every command creates a short image/video prompt so the toy can perform actions like dancing, jumping, or smiling in cartoon style.
Storytelling & Emotions → The toy can tell short stories or show happy/angry expressions, making it feel alive.
Positive Twist → For negative commands (like “cry”), the toy gently changes them into something positive (like “smile” or “hug”); a small sketch of this remap follows below.
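Here is that sketch. The lookup table and function name are hypothetical, not the app's real code; the idea is simply to remap a negative command before it ever reaches the model:

```typescript
// Hypothetical remap of negative commands to joyful alternatives.
const POSITIVE_SWAPS: Record<string, string> = {
  cry: "smile",
  fight: "dance",
  "be sad": "give a big hug",
};

function keepItPositive(command: string): string {
  const normalized = command.trim().toLowerCase();
  return POSITIVE_SWAPS[normalized] ?? command;
}

keepItPositive("cry");   // -> "smile"
keepItPositive("dance"); // -> "dance" (unchanged)
```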
These multimodal features make the toy feel more like a real companion — it can see, listen, talk, act, and express emotions, which is far more engaging than a normal chatbot.
Since I built this project alone, this is an individual submission (no team members).
Thank You