This post is my submission for DEV Education Track: Build Multi-Agent Systems with ADK and Gemini Live Agent Challenge. I created this content specifically to document how the project was built with Google AI models and Google Cloud. #GeminiLiveAgentChallenge
What I Built
I built Gemini Tales, an interactive storytelling experience that blends real-time AI conversation with physical activity verification. It tackles a sobering statistic: 80% of children today don't move enough. Technology is usually blamed for sedentary behavior, so I set out to turn the screen into a catalyst for movement instead.
Gemini Tales now offers two distinct ways to experience the magic:
- 🎙️ Live Mode: Spontaneous, highly interactive, and evolving based on every word the child says (powered by Gemini Live 2.5 Flash with native audio/vision).
- 🤖 Agent Mode: A structured narrative epic pre-generated by a specialized agent network (Gemini 3.1 Pro for orchestration, Gemini 3.1 Flash-Lite for research & safety) before the curtain rises, then narrated by Puck with Gemini Live 2.5 Flash.
📹 Watch the Vision: See how we turn sedentary screen time into an active adventure.
Early concept:
Latest demo with full Agent Mode:
Gemini Tales doesn't just tell a story: it sees your child, hears their voice, and asks them to ACT. Every physical movement becomes part of the magic. The story literally pauses until Puck visually verifies the Magic Sign (two fingers up) via the camera feed.
Cloud Run Embed
The project is currently running on Google Cloud Run (with the dev label):
Note: The live demo relies on experimental Gemini Live BIDI WebSockets. Due to hackathon API quota limits and strict browser audio-context policies, the live connection might occasionally drop. For the guaranteed, stable experience, please watch the Demo Video above!
🧞 The Experience: Live Multimodal Storytelling
The frontend is a direct bridge to Gemini Live API, enabling unified Voice + Vision interaction in real-time.
Features That Create Magic ✨
| Feature | What It Does | Tech Stack |
|---|---|---|
| 🎙️ Stable Voice Live | Interruption-aware, low-latency conversation. | Gemini Live 2.5 Flash |
| 📸 Visual Awareness | Real-time video stream (1 FPS) lets the AI "see" movement. | Gemini Live 2.5 Flash + Camera |
| 🎬 Cinematic Animation | Magical video previews that bring Puck to life. | Veo 3.1 (NEW in final version) |
| 🎨 Dynamic Illustrations | Watercolor-style art that evolves with the plot. | Gemini 2.5 Flash-Image |
| ⚡ Agent-Driven Context | Deep research & narrative weaving before the show. | Gemini 3.1 Pro + ADK A2A |
| 🎮 Physical Verification | AI confirms movement via vision, not just voice claims. | Multi-Agent Verification |
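The 1 FPS figure in the Visual Awareness row is a deliberate trade-off: streaming every camera frame to the model would be wasteful, so frames are throttled before they are sent. A minimal, library-free sketch of that throttling idea (the timestamps and frame payloads here are invented for illustration, not the project's actual capture code):

```python
def throttle_frames(timed_frames, min_interval=1.0):
    """Yield only frames spaced at least `min_interval` seconds apart.

    `timed_frames` is an iterable of (timestamp_seconds, frame) pairs;
    intermediate frames are dropped so the model sees roughly 1 FPS.
    """
    last_sent = None
    for ts, frame in timed_frames:
        if last_sent is None or ts - last_sent >= min_interval:
            last_sent = ts
            yield frame

# Five frames captured over ~2.2 seconds; only three survive throttling.
frames = [(0.0, "f0"), (0.4, "f1"), (1.1, "f2"), (1.5, "f3"), (2.2, "f4")]
print(list(throttle_frames(frames)))  # ['f0', 'f2', 'f4']
```

The same generator works regardless of where frames come from, which keeps the capture layer and the rate-limiting concern separate.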
🤖 The Brain: Multi-Agent Story Engine
The backend is a distributed multi-agent system built with the Google Agent Development Kit (ADK) and the A2A (Agent-to-Agent) protocol. This ensures specialization, reliability, and scalability.
NOTE: Early versions used raw Vertex AI calls. The final architecture pivots to ADK's SequentialAgent + RemoteA2aAgent pattern for cleaner orchestration.
🎭 Meet the Agents
| Agent | Role | Model |
|---|---|---|
| 🔍 Adventure Seeker | Physical activity planning & legend research | Gemini 3.1 Flash-Lite + google_search |
| ⚖️ Guardian of Balance | Safety & activity density validation | Gemini 3.1 Flash-Lite |
| ✒️ Storysmith | Narrative weaving & character depth | Gemini 3.1 Pro |
| 🧚 Puck (Root Agent) | Live Narrator: voice, vision, tool coordination | Gemini Live 2.5 Flash + FastAPI |
| 🪄 Orchestrator | Multi-agent coordination & loop escalation | ADK SequentialAgent |
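Conceptually, the pre-story pipeline runs these specialists in sequence, each enriching a shared context before Puck goes live. A toy, ADK-free sketch of that sequential pattern (every function body and context key below is invented for illustration; the real agents are LLM-backed):

```python
def adventure_seeker(ctx):
    # Stub: would research a local legend and propose movement breaks.
    ctx["legend"] = f"The legend of the {ctx['theme']} guardian"
    ctx["activities"] = ["jump like a frog", "stretch to the sky"]
    return ctx

def guardian_of_balance(ctx):
    # Stub: would validate safety and activity density.
    ctx["approved"] = len(ctx["activities"]) >= 2
    return ctx

def storysmith(ctx):
    # Stub: would weave the approved research into a narrative.
    if ctx["approved"]:
        ctx["story"] = (f"{ctx['legend']}, told with "
                        f"{len(ctx['activities'])} movement breaks")
    return ctx

def run_sequential(ctx, stages):
    """Run each stage in order, threading the shared context through."""
    for stage in stages:
        ctx = stage(ctx)
    return ctx

result = run_sequential({"theme": "forest"},
                        [adventure_seeker, guardian_of_balance, storysmith])
print(result["story"])
```

The point of the pattern is that each stage only reads and writes the shared context, so agents can be reordered, swapped, or moved to remote services without touching their neighbors.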
Architecture Highlights
┌──────────────────────────────────────┐
│  Frontend (React 19 + Gemini Live)   │
│  Voice • Vision • Real-time Feedback │
└─────────────────┬────────────────────┘
                  │ WebSocket (OAuth2 secured)
┌─────────────────┴────────────────────┐
│  FastAPI Gateway (Port 8000)         │
│  • WebSocket Proxy to Vertex AI      │
│  • OAuth2 Token Generation           │
│  • OpenTelemetry Tracing             │
└─────────────────┬────────────────────┘
                  │ A2A Protocol (HTTP + OAuth2)
   ┌──────────────┼─────────────┬──────────────┐
   │              │             │              │
┌──┴───┐     ┌────┴───┐   ┌─────┴────┐   ┌─────┴───┐
│ 8001 │     │  8002  │   │   8003   │   │  8004   │
│Seeker│     │Guardian│   │Storysmith│   │  Orch.  │
└──────┘     └────────┘   └──────────┘   └─────────┘
 (A2A)         (A2A)         (A2A)         (Root)
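Each hop along those A2A edges carries an OAuth2 identity token, which is how Cloud Run services authenticate to one another. A minimal sketch of assembling such a request with the standard library (the URL, payload shape, and token value are placeholders, not the project's actual wire format):

```python
import json
import urllib.request

def build_a2a_request(agent_url: str, payload: dict, id_token: str):
    """Build (but do not send) an authenticated JSON POST to a remote agent."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        agent_url,
        data=body,
        headers={
            "Authorization": f"Bearer {id_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_a2a_request(
    "https://adventure-seeker.example.run.app/tasks",  # hypothetical endpoint
    {"theme": "forest"},
    "FAKE_TOKEN",
)
print(req.get_header("Authorization"))  # Bearer FAKE_TOKEN
```

In the real deployment the token would come from the Cloud Run metadata server rather than a literal string; the header shape is the part that matters.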
Key Design Decision: Instead of scripting Puck's behavior, Puck runs as an ADK Agent with its own tool set (generateIllustration, awardBadge, verifyPhysicalChallenge). This means:
- Puck's responses are AI-driven, not hardcoded
- The Orchestrator only manages the pre-story context via the other agents
- Live narration is genuinely adaptive to the child's input
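To make the tool-calling idea concrete, here is a tiny registry-and-dispatch sketch in plain Python. The tool names come from the post; every body is a stand-in for the real implementation, and the decorator pattern is my own illustration rather than ADK's API:

```python
TOOLS = {}

def tool(fn):
    """Register a function so the model can invoke it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def generateIllustration(scene: str) -> str:
    return f"illustration-for:{scene}"   # stub: would call an image model

@tool
def awardBadge(badge: str) -> str:
    return f"badge-awarded:{badge}"      # stub: would update the child profile

@tool
def verifyPhysicalChallenge(sign_visible: bool) -> bool:
    return sign_visible                  # stub: would inspect camera frames

def dispatch(tool_name: str, **kwargs):
    """Route a model-issued tool call to the registered handler."""
    return TOOLS[tool_name](**kwargs)

print(dispatch("verifyPhysicalChallenge", sign_visible=True))  # True
```

Because the model only emits a tool name plus arguments, swapping a stub for the real implementation never changes the narration logic.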
For a detailed deep-dive into the system design, see ARCHITECTURE.md.
🏗️ Evolution: From Tutorial to Hackathon
The Learning Journey
This project started as a journey through the Build Multi-Agent Systems with ADK track in mid-February. I took those core architectural patterns and pushed them toward something bigger: an AI nanny that inspires children to move.
Mid-Build Pivot (4 days before deadline): After completing the Way Back Home interactive course series (featured in the official Gemini Live Agent Challenge resources) and the official Build your agent with ADK tutorial, I rewrote the entire agent orchestration layer to use ADK's SequentialAgent and RemoteA2aAgent instead of raw HTTP calls. This was risky that close to submission, but it produced a fundamentally cleaner architecture.
The transition to Gemini 2.5 and 3.1 models has drastically improved the latency and reasoning capabilities of the "Puck" avatar, making it feel less like a bot and more like a magical forest sprite living in the mirror.
🛠️ Tech Stack
| Layer | Technology | Details |
|---|---|---|
| Frontend | React 19, Vite, TypeScript, Tailwind CSS | "Magic Mirror" dashboard with dual-stream chat |
| AI Models | Gemini Live 2.5 Flash, Gemini 3.1 Pro, Gemini 3.1 Flash-Lite, Veo 3.1, Gemini 2.5 Flash-Image | Real-time voice/vision, orchestration, research, animation, illustration |
| Backend Framework | Google ADK, Agent-to-Agent (A2A) Protocol | Distributed multi-agent system with structured loop escalation |
| Infrastructure | FastAPI (Python), WebSockets, Google Cloud Run | Serverless, containerized, OAuth2-authenticated inter-service comms |
| Observability | OpenTelemetry, Google Cloud Trace | Full request tracing from frontend through agent pipeline |
| Dev Tools | Antigravity IDE, uv package manager, gcloud CLI, PowerShell automation | Local dev with 5-service orchestration, one-command Cloud Run deploy |
🚀 Getting Started
Prerequisites
- Python 3.10+ and `uv` installed
- Node.js 18+ and npm
- Google Cloud Project with Gemini API access
Local Development (3 Terminals)
Terminal 1: Start ADK Agents (The Brain)
cd backend/agents
.\run_local.ps1
This starts the sub-agents on ports 8001–8004 required for Agent Mode.
Terminal 2: Start Main Agent (Puck)
cd backend
uv sync
uv run python app/main.py
Starts Puck, the Live Narrator, ready to see and hear you (Port 8000).
Terminal 3: Start Frontend
cd frontend
npm install
npm run dev
Visit http://localhost:5173 and start creating stories! ✨
Cloud Deployment (Google Cloud Run)
Prerequisites for Cloud
- Google Cloud CLI installed and authenticated (`gcloud auth login`)
- Active Google Cloud Project with billing enabled
- `.env` file in `backend/app/` with `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION`
Deploy Supporting Agents
cd backend/agents
.\deploy.ps1
Automatically deploys 4 microservices (Researcher, Judge, Storysmith, Orchestrator) to Cloud Run with security configured.
Deploy Main App
# Run from repository root
.\deploy_app.ps1
Handles dual-stage build: compiles React 19 frontend and wraps it with FastAPI/Puck bridge into a single production-ready container.
Pro-Tip: After deployment, manage AI parameters (Model IDs, API Keys) directly through Cloud Run environment variables without re-deploying.
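One way to honor that tip in code is to read every tunable at startup with a safe fallback, so a Cloud Run env-var change is all it takes to reconfigure the service. The variable names and defaults below are hypothetical, not the project's actual configuration keys:

```python
import os

def load_model_config(env=None):
    """Read model IDs and tuning knobs from the environment, with defaults."""
    env = os.environ if env is None else env
    return {
        "live_model": env.get("LIVE_MODEL_ID", "gemini-live-2.5-flash"),
        "orchestrator_model": env.get("ORCHESTRATOR_MODEL_ID", "gemini-pro"),
        "frame_rate": float(env.get("VISION_FPS", "1.0")),
    }

# Simulate a Cloud Run override without touching the real environment.
print(load_model_config({"VISION_FPS": "0.5"})["frame_rate"])  # 0.5
```

Passing the environment mapping as a parameter keeps the function trivially testable and makes the override path explicit.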
🎓 Key Learnings
🛡️ Infrastructure is Code (and Risk)
While completing the Way Back Home course, I discovered a critical issue in the workshop setup scripts (specifically `billing-enablement.py`). The automation silently defaulted to the user's first personal billing account (`open_accounts[0]`) if a workshop-specific one wasn't found, renaming it and forcing a link, ignoring any existing promotional credits.
The Reality Check:
- Metadata Hijacking: The script programmatically renamed my personal billing profile without consent.
- Financial Impact: It detached my project from my credit-funded account and linked it to my personal Mastercard, resulting in unauthorized charges of $10.13.
- Silent Execution: The script lacks any `input()` prompts, making it impossible to intercept these changes in real time.
The Lesson: I had to manually "patch" the workshop files by adding `exit(0)` to the billing scripts to prevent further damage. While the educational content of the course was exceptional (10/10), the infrastructure automation was a harsh reminder: always audit third-party setup scripts before running them with elevated permissions. This project taught me that "Agentic Orchestration" starts with the environment, and that interactive confirmation in automated DevOps pipelines is not optional.
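The fix this lesson points to is small: any script that touches billing should require an explicit yes before acting. A dependency-free sketch with an injectable reader so it stays testable (the prompt wording and function names are mine, not the workshop's):

```python
def confirm(action: str, reader=input) -> bool:
    """Ask before a destructive step; default to 'no' on anything unclear."""
    answer = reader(f"About to {action}. Proceed? [y/N] ")
    return answer.strip().lower() in ("y", "yes")

def link_billing_account(project: str, account: str, reader=input) -> str:
    if not confirm(f"link {project} to billing account {account}", reader):
        return "aborted"
    return "linked"  # stub: the real call would shell out to gcloud here

# Simulated run: the user just presses Enter, so nothing is changed.
print(link_billing_account("my-project", "XXXXXX-XXXXXX", reader=lambda _: ""))
# aborted
```

Defaulting to "no" on empty or unexpected input is the whole point: an unattended pipeline run stops instead of silently relinking accounts.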
Specialization > Monoliths
I was surprised at how much more reliable the system became when I stopped relying on one giant prompt and started treating agents like a specialized team with distinct responsibilities. The Guardian of Balance agent alone caught narrative safety issues that a single monolithic prompt would have missed.
The Power of A2A Protocol
Implementing Agent-to-Agent communication was challenging, especially handling Google Application Default Credentials (ADC) on Windows. But once it clicked, the elegance of distributed agents became clear. The Orchestrator only needs to know the agent card URLโnot the implementation. This enables independent scaling and future language-agnostic composition.
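"Only needs the agent card URL" maps to the A2A convention of publishing a card at a well-known path. A small sketch of resolving and summarizing such a card (the base URL and card payload are placeholders; the `/.well-known/agent.json` location follows the A2A convention as I understand it, and may differ across protocol versions):

```python
import json

def agent_card_url(base_url: str) -> str:
    """Derive the well-known A2A agent-card location from a service URL."""
    return base_url.rstrip("/") + "/.well-known/agent.json"

def summarize_card(card_json: str) -> str:
    """Pull the fields an orchestrator actually needs from a fetched card."""
    card = json.loads(card_json)
    return f"{card['name']} @ {card['url']}"

# A made-up card payload standing in for a real HTTP fetch.
sample = '{"name": "Storysmith", "url": "https://storysmith.example.run.app"}'
print(agent_card_url("https://storysmith.example.run.app/"))
print(summarize_card(sample))  # Storysmith @ https://storysmith.example.run.app
```

Because discovery is just a URL plus a JSON document, the Orchestrator never needs to import the remote agent's code, which is what enables the language-agnostic composition mentioned above.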
Movement Changes Everything
The most rewarding part? Seeing a child leap off the couch when Puck asked them to "show me how you jump." Screen time transformed from sedentary consumption into active play. Verifying movement through a 1 FPS video stream keeps latency acceptable while being behaviorally transformative.
🌐 Open Source & Reproducibility
The full source codeโincluding ADK orchestration logic, deployment scripts, and frontendโis available on GitHub:
🔗 GitHub: vero-code/gemini-tales
Features:
- ✅ Full Docker & Cloud Run deployment with OAuth2 inter-service auth
- ✅ Multi-agent architecture with A2A protocol and structured loop escalation
- ✅ Live API integration with WebSocket proxy for credential security
- ✅ Comprehensive ARCHITECTURE.md for deep-dives
- ✅ 111+ commits documenting the evolution from raw API calls to ADK agents
🏆 If Gemini Tales Wins...
If this project wins the Gemini Live Agent Challenge, here's what I'm committing to build:
Phase 1: Educator Adoption (Months 1-2)
- Educator Dashboard: Teachers configure story themes, movement goals, and age-appropriate challenges per session
- School Deployment Pack: A simplified Cloud Run setup guide + Docker image for schools to self-host
- Movement Metrics: Track physical activity data per child (with parental consent) to prove the "screen time → move time" transformation
- Free Tier for Non-profits: Educational institutions get free Cloud Run quota for one academic year
Phase 2: Global Scale (Months 2-4)
- Multiplayer Mode: Two children, one story, coordinated physical challenges. Puck asks "Can you BOTH hop together?" and uses vision to verify synchronized movement
- Multilingual Support: Core stories in 10 languages
- Cultural Localization: Agent Mode story themes adapt to regional legends, holidays, and cultural values
- Mobile App: Native iOS/Android for living-room play without a laptop (React Native port of the "Magic Mirror")
Phase 3: Premium Tier (Month 4+)
- Gemini Tales Premium: Parent dashboard exposing the raw agent pipeline (Researcher, Judge, Storysmith working in real-time) so adults can see exactly how each story was crafted
- Custom Character Library: Upload your own character art (pet, stuffed animal, superhero OC) and have Puck transform it into the main character
- Extended Story Packs: Professionally written, multi-session adventures (The Dragon's Lair, The Enchanted Forest, The Lost Temple) with persistent progression across sessions
- Gamification API: Developers can integrate their own movement tracking devices (Fitbit, Apple Watch, smart scales) to unlock story-specific achievements
Phase 4: Research & Impact (Ongoing)
- Peer-Reviewed Study: Partner with pediatricians and child psychologists to measure sedentary reduction and cognitive engagement metrics
- Open Data Initiative: Anonymized, aggregated movement data shared with research institutions studying childhood activity
- ADK Extensibility: Document the multi-agent orchestration pattern as a reusable template for other child-safe AI applications
- Google Cloud Starter Kit: Contribute a "Gemini Tales Architecture" as an official Google Cloud Solution template for educational AI
The Bigger Vision 🌍
If we can prove that AI can inspire movement instead of sedentary consumption, we unlock a new category of tech: AI that optimizes for human health, not screen time.
Winning this challenge means the resources to show that Gemini Tales is reproducible, scalable, and genuinely life-changing for kids. It's not just a hackathon project; it's a proof of concept for the next generation of responsible AI.
The goal: 10,000 children moving more because of this app by end of 2026.
🎯 Why This Matters
Technology is often the villain in this story. But what if it could be the hero?
Gemini Tales proves that with the right architecture and intention, we can build AI experiences that:
- ✅ Entertain (magical storytelling powered by Gemini 3.1)
- ✅ Engage (real-time interaction with Gemini Live)
- ✅ Activate (physical movement required and verified)
- ✅ Educate (safe, age-appropriate learning with agent-driven safety review)
This is technology in service of human health.
📄 License
MIT (see LICENSE)
Created with ❤️ for the next generation of active explorers.
Tags: #GeminiLiveAgentChallenge #GoogleAI #MultiAgentSystems #ADK #ChildHealth #InteractiveTech #A2AProtocol #Veo #GeminiLive #EducationTech

