Veronika Kashtanova
Gemini Tales: Turning Screen Time Into Active Adventure 🧸

Education Track: Build Multi-Agent Systems with ADK

This post is my submission for the DEV Education Track: Build Multi-Agent Systems with ADK and the Gemini Live Agent Challenge. I created this content specifically to document how the project was built with Google AI models and Google Cloud. #GeminiLiveAgentChallenge

What I Built

I built Gemini Tales, an interactive storytelling experience that blends real-time AI conversation with physical activity verification. It tackles a sobering statistic: 80% of children today don't move enough. While technology is often blamed for sedentary behavior, I wanted to turn the screen into a catalyst for movement.

Gemini Tales now offers two distinct ways to experience the magic:

  • ๐ŸŽ™๏ธ Live Mode: Spontaneous, highly interactive, and evolving based on every word the child says (Powered by Gemini Live 2.5 Flash with native audio/vision).
  • ๐Ÿค– Agent Mode: A structured narrative epic pre-generated by a specialized agent network (Gemini 3.1 Pro for orchestration, Gemini 3.1 Flash-Lite for research & safety) before the curtain rises, then narrated by Puck with Gemini Live 2.5 Flash.

📹 Watch the Vision: See how we turn sedentary screen time into an active adventure.

Early concept:

Latest demo with full Agent Mode:

Gemini Tales doesn't just tell a story: it sees your child, hears their voice, and asks them to ACT. Every physical movement becomes part of the magic. The story literally pauses until Puck visually verifies the Magic Sign (two fingers up) via the camera feed.

Wizard casting magic with children in a cozy living room, golden sparkles and stars filling the air.

Cloud Run Embed

The project is currently running in Google Cloud Run (with the dev label):

Note: The live demo relies on experimental Gemini Live BIDI WebSockets. Due to hackathon API quota limits and strict browser audio-context policies, the live connection might occasionally drop. For the guaranteed, stable experience, please watch the Demo Video above!


🧚 The Experience: Live Multimodal Storytelling

The frontend is a direct bridge to the Gemini Live API, enabling unified Voice + Vision interaction in real time.

Features That Create Magic ✨

| Feature | What It Does | Tech Stack |
| --- | --- | --- |
| 🎙️ Stable Voice Live | Interruption-aware, low-latency conversation | Gemini Live 2.5 Flash |
| 📸 Visual Awareness | Real-time video stream (1 FPS) lets the AI "see" movement | Gemini Live 2.5 Flash + camera |
| 🎬 Cinematic Animation | Magical video previews that bring Puck to life | Veo 3.1 (NEW in final version) |
| 🎨 Dynamic Illustrations | Watercolor-style art that evolves with the plot | Gemini 2.5 Flash-Image |
| ⚡ Agent-Driven Context | Deep research & narrative weaving before the show | Gemini 3.1 Pro + ADK A2A |
| 🎮 Physical Verification | AI confirms movement via vision, not just voice claims | Multi-agent verification |

🤖 The Brain: Multi-Agent Story Engine

The backend is a distributed multi-agent system built with the Google Agent Development Kit (ADK) and the A2A (Agent-to-Agent) protocol. This ensures specialization, reliability, and scalability.

NOTE: Early versions used raw Vertex AI calls. The final architecture pivots to ADK's SequentialAgent + RemoteA2aAgent pattern for cleaner orchestration.
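To illustrate what that pattern buys you: a SequentialAgent runs its sub-agents in order over shared session state, with remote agents wrapped behind the A2A protocol. The stdlib sketch below mimics that flow with plain callables; all names and stub outputs are hypothetical stand-ins, not the real ADK API.

```python
# Minimal sketch of the SequentialAgent pattern: each "agent" is a
# callable that reads and enriches a shared state dict, and the
# orchestrator simply runs them in a fixed order. In the real system
# each function would call a RemoteA2aAgent over HTTP instead.

def adventure_seeker(state: dict) -> dict:
    # Stand-in for the remote Seeker agent (port 8001).
    state["activities"] = ["jump like a frog", "tiptoe past the dragon"]
    return state

def guardian_of_balance(state: dict) -> dict:
    # Stand-in for the remote Guardian (port 8002): safety/density check.
    state["approved"] = all(len(a) < 80 for a in state["activities"])
    return state

def storysmith(state: dict) -> dict:
    # Stand-in for the remote Storysmith (port 8003): narrative weaving.
    if state["approved"]:
        state["story"] = "Once upon a time... " + state["activities"][0]
    return state

def run_pipeline(state: dict, agents) -> dict:
    """Run sub-agents in order over shared state, like a SequentialAgent."""
    for agent in agents:
        state = agent(state)
    return state

result = run_pipeline({}, [adventure_seeker, guardian_of_balance, storysmith])
```

The real orchestration layer swaps each stub for a RemoteA2aAgent, but the control flow is the same: fixed order, shared state, no agent knowing about the others' internals.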

🎭 Meet the Agents

| Agent | Role | Model |
| --- | --- | --- |
| 🔍 Adventure Seeker | Physical activity planning & legend research | Gemini 3.1 Flash-Lite + google_search |
| ⚖️ Guardian of Balance | Safety & activity density validation | Gemini 3.1 Flash-Lite |
| ✍️ Storysmith | Narrative weaving & character depth | Gemini 3.1 Pro |
| 🧚 Puck (Root Agent) | Live narrator: voice, vision, tool coordination | Gemini Live 2.5 Flash + FastAPI |
| 🪄 Orchestrator | Multi-agent coordination & loop escalation | ADK SequentialAgent |

Architecture Highlights

```
┌─────────────────────────────────────┐
│  Frontend (React 19 + Gemini Live)  │
│ Voice • Vision • Real-time Feedback │
└────────────────┬────────────────────┘
                 │ WebSocket (OAuth2 secured)
┌────────────────▼────────────────────┐
│   FastAPI Gateway (Port 8000)       │
│   • WebSocket Proxy to Vertex AI    │
│   • OAuth2 Token Generation         │
│   • OpenTelemetry Tracing           │
└────────────────┬────────────────────┘
                 │ A2A Protocol (HTTP + OAuth2)
    ┌────────────┼────────────┬────────────┐
    │            │            │            │
┌───▼──┐    ┌────▼───┐  ┌─────▼────┐  ┌────▼───┐
│ 8001 │    │  8002  │  │   8003   │  │  8004  │
│Seeker│    │Guardian│  │Storysmith│  │ Orch.  │
└──────┘    └────────┘  └──────────┘  └────────┘
 (A2A)        (A2A)        (A2A)        (Root)
```

Key Design Decision: Instead of scripting Puck's behavior, Puck runs as an ADK Agent with its own tool set (generateIllustration, awardBadge, verifyPhysicalChallenge). This means:

  • Puck's responses are AI-driven, not hardcoded
  • The Orchestrator only manages the pre-story context via the other agents
  • Live narration is genuinely adaptive to the child's input
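To make the tool-driven side concrete, here is a hypothetical sketch of tool dispatch: the Live model emits a tool call by name, and the server routes it to a Python function. Only the three tool names come from the project; every body below is a stub for illustration.

```python
# Hypothetical tool implementations for Puck. In the real agent these
# would generate art, persist badges, and inspect the vision stream.

def generate_illustration(scene: str) -> dict:
    # Stub: would request a watercolor-style image for the scene.
    return {"status": "ok", "prompt": f"watercolor: {scene}"}

def award_badge(badge: str) -> dict:
    # Stub: would persist the badge to the child's session.
    return {"status": "ok", "badge": badge}

def verify_physical_challenge(observed: str, expected: str) -> dict:
    # Stub: compares what the vision stream "saw" against the requested
    # movement (e.g. the two-finger Magic Sign).
    return {"verified": expected.lower() in observed.lower()}

TOOLS = {
    "generateIllustration": generate_illustration,
    "awardBadge": award_badge,
    "verifyPhysicalChallenge": verify_physical_challenge,
}

def dispatch(tool_call: dict) -> dict:
    """Route a model-issued tool call to the matching Python function."""
    return TOOLS[tool_call["name"]](**tool_call["args"])

result = dispatch({
    "name": "verifyPhysicalChallenge",
    "args": {"observed": "child holds two fingers up", "expected": "two fingers"},
})
```

The key point is that the model, not a script, decides *when* to call `verifyPhysicalChallenge`; the server only supplies the capabilities.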

For a detailed deep-dive into the system design, see ARCHITECTURE.md.

Young knight Lily with sword on a magical meadow path, mushroom houses and flowers surrounding her.


๐Ÿ—๏ธ Evolution: From Tutorial to Hackathon

The Learning Journey

This project started as a journey through the Build Multi-Agent Systems with ADK track in mid-February. I took those core architectural patterns and pivoted toward something bigger: an AI Nanny that inspires children to move.

Mid-Build Pivot (4 days before deadline): After completing the Way Back Home interactive course series (featured in the official Gemini Live Agent Challenge resources) and the official Build your agent with ADK tutorial, I rewrote the entire agent orchestration layer to use ADK's SequentialAgent and RemoteA2aAgent instead of raw HTTP calls. This was risky that close to submission, but it produced a fundamentally cleaner architecture.

The transition to Gemini 2.5 and 3.1 models has drastically improved the latency and reasoning capabilities of the "Puck" avatar, making it feel less like a bot and more like a magical forest sprite living in the mirror.


๐Ÿ› ๏ธ Tech Stack

| Layer | Technology | Details |
| --- | --- | --- |
| Frontend | React 19, Vite, TypeScript, Tailwind CSS | "Magic Mirror" dashboard with dual-stream chat |
| AI Models | Gemini Live 2.5 Flash, Gemini 3.1 Pro, Gemini 3.1 Flash-Lite, Veo 3.1, Gemini 2.5 Flash-Image | Real-time voice/vision, orchestration, research, animation, illustration |
| Backend Framework | Google ADK, Agent-to-Agent (A2A) Protocol | Distributed multi-agent system with structured loop escalation |
| Infrastructure | FastAPI (Python), WebSockets, Google Cloud Run | Serverless, containerized, OAuth2-authenticated inter-service comms |
| Observability | OpenTelemetry, Google Cloud Trace | Full request tracing from frontend through agent pipeline |
| Dev Tools | Antigravity IDE, uv package manager, gcloud CLI, PowerShell automation | Local dev with 5-service orchestration, one-command Cloud Run deploy |

📂 Getting Started

Prerequisites

  • Python 3.10+ and uv installed
  • Node.js 18+ and npm
  • Google Cloud Project with Gemini API access

Local Development (3 Terminals)

Terminal 1: Start ADK Agents (The Brain)

```powershell
cd backend/agents
.\run_local.ps1
```

This starts the sub-agents on ports 8001–8004 required for Agent Mode.

Terminal 2: Start Main Agent (Puck)

```powershell
cd backend
uv sync
uv run python app/main.py
```

Starts Puck, the Live Narrator, ready to see and hear you (Port 8000).

Terminal 3: Start Frontend

```powershell
cd frontend
npm install
npm run dev
```

Visit http://localhost:5173 and start creating stories! โœจ

Cloud Deployment (Google Cloud Run)

Prerequisites for Cloud

  • Google Cloud CLI installed and authenticated (gcloud auth login)
  • Active Google Cloud Project with Billing enabled
  • .env file in backend/app/ with GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION

Deploy Supporting Agents

```powershell
cd backend/agents
.\deploy.ps1
```

Automatically deploys 4 microservices (Researcher, Judge, Storysmith, Orchestrator) to Cloud Run with security configured.

Deploy Main App

```powershell
# Run from repository root
.\deploy_app.ps1
```

Handles a dual-stage build: compiles the React 19 frontend and wraps it with the FastAPI/Puck bridge into a single production-ready container.

Pro-Tip: After deployment, manage AI parameters (Model IDs, API Keys) directly through Cloud Run environment variables without re-deploying.


📚 Key Learnings

🛡️ Infrastructure is Code (and Risk)

While completing the Way Back Home course, I discovered a critical issue in the workshop setup scripts (specifically billing-enablement.py). The automation silently defaulted to the user's first personal billing account (open_accounts[0]) if a workshop-specific one wasn't found, renaming it and forcing a link while ignoring any existing promotional credits.

The Reality Check:

  • Metadata Hijacking: The script programmatically renamed my personal billing profile without consent.

  • Financial Impact: It detached my project from my credit-funded account and linked it to my personal Mastercard, resulting in unauthorized charges of $10.13.

  • Silent Execution: The script lacks any input() prompts, making it impossible to intercept these changes in real-time.

The Lesson: I had to manually "patch" the workshop files by adding exit(0) to the billing scripts to prevent further damage. While the educational content of the course was exceptional (10/10), the infrastructure automation was a harsh reminder: always audit third-party setup scripts before running them with elevated permissions. This project taught me that "Agentic Orchestration" starts with the environment, and interactive confirmation in automated DevOps pipelines is not optional.

Specialization > Monoliths

I was surprised at how much more reliable the system became when I stopped relying on one giant prompt and started treating agents like a specialized team with distinct responsibilities. The Guardian of Balance agent alone caught narrative safety issues that a single monolithic prompt would have missed.
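The "structured loop escalation" this team pattern enables can be sketched in plain Python: regenerate the story until the Guardian approves, escalating to a stricter rewrite after a failed check. Everything below is a hypothetical stand-in for the real agent calls, not the project's actual logic.

```python
# Sketch of structured loop escalation between a writer agent and a
# validator agent, with a hard cap on retries.

MAX_ATTEMPTS = 3

def guardian_check(story: str) -> bool:
    # Stand-in for the Guardian of Balance's safety/density validation.
    return "scary" not in story

def write_story(draft: str, strict: bool) -> str:
    # Stand-in for the Storysmith; strict mode rewrites flagged content.
    return draft.replace("scary", "silly") if strict else draft

def generate_with_escalation(draft: str) -> str:
    story = draft
    for _attempt in range(MAX_ATTEMPTS):
        if guardian_check(story):
            return story
        # Escalate: after a failed check, regenerate in strict mode.
        story = write_story(story, strict=True)
    raise RuntimeError("Guardian rejected all attempts")

safe = generate_with_escalation("A scary forest adventure")
```

The bounded loop is the important part: a monolithic prompt has no second chance, while a validator in the loop gets a fixed number of structured retries before failing loudly.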

The Power of A2A Protocol

Implementing Agent-to-Agent communication was challenging, especially handling Google Application Default Credentials (ADC) on Windows. But once it clicked, the elegance of distributed agents became clear. The Orchestrator only needs to know the agent card URL, not the implementation. This enables independent scaling and future language-agnostic composition.
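For illustration, the discovery handshake is tiny: the Orchestrator derives the card URL from the agent's base URL and reads the card to learn its name, skills, and endpoint. The card path below follows the A2A convention of a well-known JSON document, and the card contents are a minimal hypothetical example, not the project's real cards.

```python
# Build the agent card URL an A2A client would fetch for discovery.
from urllib.parse import urljoin

# Well-known card path used by A2A discovery (assumed convention here).
AGENT_CARD_PATH = "/.well-known/agent.json"

def agent_card_url(base_url: str) -> str:
    """Derive the discovery URL from an agent's base URL."""
    return urljoin(base_url, AGENT_CARD_PATH)

# A minimal, hypothetical card the Seeker agent might serve:
seeker_card = {
    "name": "adventure_seeker",
    "url": "http://localhost:8001",
    "skills": [
        {"id": "legend_research", "description": "Research local legends"},
    ],
}

url = agent_card_url("http://localhost:8001")
```

Because the card carries everything the caller needs, the Orchestrator never imports the Seeker's code; it could be rewritten in another language without the caller noticing.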

Movement Changes Everything

The most rewarding part? Seeing a child leap off the couch when Puck asked them to "show me how you jump." Screen time transformed from sedentary consumption into active play. Physical verification via 1 FPS vision keeps latency acceptable while transforming behavior.


📂 Open Source & Reproducibility

The full source code, including the ADK orchestration logic, deployment scripts, and frontend, is available on GitHub:

👉 GitHub: vero-code/gemini-tales

Features:

  • ✅ Full Docker & Cloud Run deployment with OAuth2 inter-service auth
  • ✅ Multi-agent architecture with A2A protocol and structured loop escalation
  • ✅ Live API integration with WebSocket proxy for credential security
  • ✅ Comprehensive ARCHITECTURE.md for deep-dives
  • ✅ 111+ commits documenting the evolution from raw API calls to ADK agents

๐Ÿ† If Gemini Tales Wins...

If this project wins the Gemini Live Agent Challenge, here's what I'm committing to build:

Phase 1: Educator Adoption (Months 1-2)

  • Educator Dashboard: Teachers configure story themes, movement goals, and age-appropriate challenges per session
  • School Deployment Pack: A simplified Cloud Run setup guide + Docker image for schools to self-host
  • Movement Metrics: Track physical activity data per child (with parental consent) to prove the "screen time → move time" transformation
  • Free Tier for Non-profits: Educational institutions get free Cloud Run quota for one academic year

Phase 2: Global Scale (Months 2-4)

  • Multiplayer Mode: Two children, one story, coordinated physical challenges. Puck asks "Can you BOTH hop together?" and uses vision to verify synchronized movement
  • Multilingual Support: Core stories in 10 languages
  • Cultural Localization: Agent Mode story themes adapt to regional legends, holidays, and cultural values
  • Mobile App: Native iOS/Android for living-room play without a laptop (React Native port of the "Magic Mirror")

Phase 3: Premium Tier (Month 4+)

  • Gemini Tales Premium: Parent dashboard exposing the raw agent pipeline (Researcher, Judge, Storysmith working in real-time) so adults can see exactly how each story was crafted
  • Custom Character Library: Upload your own character art (pet, stuffed animal, superhero OC) and have Puck transform it into the main character
  • Extended Story Packs: Professionally written, multi-session adventures (The Dragon's Lair, The Enchanted Forest, The Lost Temple) with persistent progression across sessions
  • Gamification API: Developers can integrate their own movement tracking devices (Fitbit, Apple Watch, smart scales) to unlock story-specific achievements

Phase 4: Research & Impact (Ongoing)

  • Peer-Reviewed Study: Partner with pediatricians and child psychologists to measure sedentary reduction and cognitive engagement metrics
  • Open Data Initiative: Anonymized, aggregated movement data shared with research institutions studying childhood activity
  • ADK Extensibility: Document the multi-agent orchestration pattern as a reusable template for other child-safe AI applications
  • Google Cloud Starter Kit: Contribute a "Gemini Tales Architecture" as an official Google Cloud Solution template for educational AI

The Bigger Vision 🌍

If we can prove that AI can inspire movement instead of sedentary consumption, we unlock a new category of tech: AI that optimizes for human health, not screen time.

Winning this challenge means the resources to show that Gemini Tales is reproducible, scalable, and genuinely life-changing for kids. It's not just a hackathon project; it's a proof-of-concept for the next generation of responsible AI.

The goal: 10,000 children moving more because of this app by end of 2026.


🎯 Why This Matters

Technology is often the villain in this story. But what if it could be the hero?

Gemini Tales proves that with the right architecture and intention, we can build AI experiences that:

  • ✅ Entertain (magical storytelling powered by Gemini 3.1)
  • ✅ Engage (real-time interaction with Gemini Live)
  • ✅ Activate (physical movement required and verified)
  • ✅ Educate (safe, age-appropriate learning with agent-driven safety review)

This is technology in service of human health.


📜 License

MIT. See LICENSE.

Created with โค๏ธ for the next generation of active explorers.


Tags: #GeminiLiveAgentChallenge #GoogleAI #MultiAgentSystems #ADK #ChildHealth #InteractiveTech #A2AProtocol #Veo #GeminiLive #EducationTech
