Veronika Kashtanova
Gemini Tales: Turning Screen Time Into Active Adventure 🧸

Education Track: Build Multi-Agent Systems with ADK

This post is my submission for the DEV Education Track: Build Multi-Agent Systems with ADK and the Gemini Live Agent Challenge. I created this content specifically to document how the project was built with Google AI models and Google Cloud. #GeminiLiveAgentChallenge

What I Built

I built Gemini Tales, an interactive storytelling experience that blends real-time AI conversation with physical activity verification. It tackles a sobering statistic: 80% of children today don't move enough. While technology is often blamed for sedentary behavior, I wanted to turn the screen into a catalyst for movement.

Gemini Tales now offers two distinct ways to experience the magic:

  • ๐ŸŽ™๏ธ Live Mode: Spontaneous, highly interactive, and evolving based on every word the child says (Powered by Gemini Live 2.5 Flash with native audio/vision).
  • ๐Ÿค– Agent Mode: A structured narrative epic pre-generated by a specialized agent network (Gemini 3.1 Pro for orchestration, Gemini 3.1 Flash-Lite for research & safety) before the curtain rises, then narrated by Puck with Gemini Live 2.5 Flash.

📹 Watch the Vision: See how we turn sedentary screen time into an active adventure.

Early concept:

Latest demo with full Agent Mode:

Gemini Tales doesn't just tell a story: it sees your child, hears their voice, and asks them to ACT. Every physical movement becomes part of the magic. The story literally pauses until Puck visually verifies the Magic Sign (two fingers up) via the camera feed.

Wizard casting magic with children in a cozy living room, golden sparkles and stars filling the air.

Cloud Run Embed

The project is currently running in Google Cloud Run (with the dev label):

Note: The live demo relies on experimental Gemini Live BIDI WebSockets. Due to hackathon API quota limits and strict browser audio-context policies, the live connection might occasionally drop. For the guaranteed, stable experience, please watch the Demo Video above!


🧚 The Experience: Live Multimodal Storytelling

The frontend is a direct bridge to the Gemini Live API, enabling unified Voice + Vision interaction in real time.

Features That Create Magic ✨

| Feature | What It Does | Tech Stack |
| --- | --- | --- |
| 🎙️ Stable Voice Live | Interruption-aware, low-latency conversation | Gemini Live 2.5 Flash |
| 📸 Visual Awareness | Real-time video stream (1 FPS) lets the AI "see" movement | Gemini Live 2.5 Flash + camera |
| 🎬 Cinematic Animation | Magical video previews that bring Puck to life | Veo 3.1 (NEW in final version) |
| 🎨 Dynamic Illustrations | Watercolor-style art that evolves with the plot | Gemini 2.5 Flash-Image |
| ⚡ Agent-Driven Context | Deep research & narrative weaving before the show | Gemini 3.1 Pro + ADK A2A |
| 🎮 Physical Verification | AI confirms movement via vision, not just voice claims | Multi-agent verification |

🤖 The Brain: Multi-Agent Story Engine

The backend is a distributed multi-agent system built with the Google Agent Development Kit (ADK) and the A2A (Agent-to-Agent) protocol. This ensures specialization, reliability, and scalability.

NOTE: Early versions used raw Vertex AI calls. The final architecture pivots to ADK's SequentialAgent + RemoteA2aAgent pattern for cleaner orchestration.
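To illustrate what that pattern buys you: a SequentialAgent runs its sub-agents in order over shared session state, with remote agents wrapped behind the A2A protocol. The stdlib sketch below mimics that flow with plain callables; all names and stub outputs are hypothetical stand-ins, not the real ADK API.

```python
# Minimal sketch of the SequentialAgent pattern: each "agent" is a
# callable that reads and enriches a shared state dict, and the
# orchestrator simply runs them in a fixed order. In the real system
# each function would call a RemoteA2aAgent over HTTP instead.

def adventure_seeker(state: dict) -> dict:
    # Stand-in for the remote Seeker agent (port 8001).
    state["activities"] = ["jump like a frog", "tiptoe past the dragon"]
    return state

def guardian_of_balance(state: dict) -> dict:
    # Stand-in for the remote Guardian (port 8002): safety/density check.
    state["approved"] = all(len(a) < 80 for a in state["activities"])
    return state

def storysmith(state: dict) -> dict:
    # Stand-in for the remote Storysmith (port 8003): narrative weaving.
    if state["approved"]:
        state["story"] = "Once upon a time... " + state["activities"][0]
    return state

def run_pipeline(state: dict, agents) -> dict:
    """Run sub-agents in order over shared state, like a SequentialAgent."""
    for agent in agents:
        state = agent(state)
    return state

result = run_pipeline({}, [adventure_seeker, guardian_of_balance, storysmith])
```

The real orchestration layer swaps each stub for a RemoteA2aAgent, but the control flow is the same: fixed order, shared state, no agent knowing about the others' internals.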

🎭 Meet the Agents

| Agent | Role | Model |
| --- | --- | --- |
| 🔍 Adventure Seeker | Physical activity planning & legend research | Gemini 3.1 Flash-Lite + google_search |
| ⚖️ Guardian of Balance | Safety & activity density validation | Gemini 3.1 Flash-Lite |
| ✍️ Storysmith | Narrative weaving & character depth | Gemini 3.1 Pro |
| 🧚 Puck (Root Agent) | Live narrator: voice, vision, tool coordination | Gemini Live 2.5 Flash + FastAPI |
| 🪄 Orchestrator | Multi-agent coordination & loop escalation | ADK SequentialAgent |

Architecture Highlights

```
┌─────────────────────────────────────┐
│  Frontend (React 19 + Gemini Live)  │
│ Voice • Vision • Real-time Feedback │
└────────────────┬────────────────────┘
                 │ WebSocket (OAuth2 secured)
┌────────────────▼────────────────────┐
│   FastAPI Gateway (Port 8000)       │
│   • WebSocket Proxy to Vertex AI    │
│   • OAuth2 Token Generation         │
│   • OpenTelemetry Tracing           │
└────────────────┬────────────────────┘
                 │ A2A Protocol (HTTP + OAuth2)
    ┌────────────┼────────────┬────────────┐
    │            │            │            │
┌───▼──┐    ┌────▼───┐  ┌─────▼────┐  ┌────▼───┐
│ 8001 │    │  8002  │  │   8003   │  │  8004  │
│Seeker│    │Guardian│  │Storysmith│  │ Orch.  │
└──────┘    └────────┘  └──────────┘  └────────┘
 (A2A)        (A2A)        (A2A)        (Root)
```

Key Design Decision: Instead of scripting Puck's behavior, Puck runs as an ADK Agent with its own tool set (generateIllustration, awardBadge, verifyPhysicalChallenge). This means:

  • Puck's responses are AI-driven, not hardcoded
  • The Orchestrator only manages the pre-story context via the other agents
  • Live narration is genuinely adaptive to the child's input
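To make the tool-driven side concrete, here is a hypothetical sketch of tool dispatch: the Live model emits a tool call by name, and the server routes it to a Python function. Only the three tool names come from the project; every body below is a stub for illustration.

```python
# Hypothetical tool implementations for Puck. In the real agent these
# would generate art, persist badges, and inspect the vision stream.

def generate_illustration(scene: str) -> dict:
    # Stub: would request a watercolor-style image for the scene.
    return {"status": "ok", "prompt": f"watercolor: {scene}"}

def award_badge(badge: str) -> dict:
    # Stub: would persist the badge to the child's session.
    return {"status": "ok", "badge": badge}

def verify_physical_challenge(observed: str, expected: str) -> dict:
    # Stub: compares what the vision stream "saw" against the requested
    # movement (e.g. the two-finger Magic Sign).
    return {"verified": expected.lower() in observed.lower()}

TOOLS = {
    "generateIllustration": generate_illustration,
    "awardBadge": award_badge,
    "verifyPhysicalChallenge": verify_physical_challenge,
}

def dispatch(tool_call: dict) -> dict:
    """Route a model-issued tool call to the matching Python function."""
    return TOOLS[tool_call["name"]](**tool_call["args"])

result = dispatch({
    "name": "verifyPhysicalChallenge",
    "args": {"observed": "child holds two fingers up", "expected": "two fingers"},
})
```

The key point is that the model, not a script, decides *when* to call `verifyPhysicalChallenge`; the server only supplies the capabilities.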

For a detailed deep-dive into the system design, see ARCHITECTURE.md.

Young knight Lily with sword on a magical meadow path, mushroom houses and flowers surrounding her.


๐Ÿ—๏ธ Evolution: From Tutorial to Hackathon

The Learning Journey

This project started as a journey through the Build Multi-Agent Systems with ADK track in mid-February. I took those core architectural patterns and pivoted toward something bigger: an AI Nanny that inspires children to move.

Mid-Build Pivot (4 days before deadline): After completing the Way Back Home interactive course series (featured in the official Gemini Live Agent Challenge resources) and the official Build your agent with ADK tutorial, I rewrote the entire agent orchestration layer to use ADK's SequentialAgent and RemoteA2aAgent instead of raw HTTP calls. This was risky that close to submission, but it produced a fundamentally cleaner architecture.

The transition to Gemini 2.5 and 3.1 models has drastically improved the latency and reasoning capabilities of the "Puck" avatar, making it feel less like a bot and more like a magical forest sprite living in the mirror.


๐Ÿ› ๏ธ Tech Stack

| Layer | Technology | Details |
| --- | --- | --- |
| Frontend | React 19, Vite, TypeScript, Tailwind CSS | "Magic Mirror" dashboard with dual-stream chat |
| AI Models | Gemini Live 2.5 Flash, Gemini 3.1 Pro, Gemini 3.1 Flash-Lite, Veo 3.1, Gemini 2.5 Flash-Image | Real-time voice/vision, orchestration, research, animation, illustration |
| Backend Framework | Google ADK, Agent-to-Agent (A2A) Protocol | Distributed multi-agent system with structured loop escalation |
| Infrastructure | FastAPI (Python), WebSockets, Google Cloud Run | Serverless, containerized, OAuth2-authenticated inter-service comms |
| Observability | OpenTelemetry, Google Cloud Trace | Full request tracing from frontend through agent pipeline |
| Dev Tools | Antigravity IDE, uv package manager, gcloud CLI, PowerShell automation | Local dev with 5-service orchestration, one-command Cloud Run deploy |

📂 Getting Started

Prerequisites

  • Python 3.10+ and uv installed
  • Node.js 18+ and npm
  • Google Cloud Project with Gemini API access

Local Development (3 Terminals)

Terminal 1: Start ADK Agents (The Brain)

```powershell
cd backend/agents
.\run_local.ps1
```

This starts the sub-agents on ports 8001–8004 required for Agent Mode.

Terminal 2: Start Main Agent (Puck)

```powershell
cd backend
uv sync
uv run python app/main.py
```

Starts Puck, the Live Narrator, ready to see and hear you (Port 8000).

Terminal 3: Start Frontend

```powershell
cd frontend
npm install
npm run dev
```

Visit http://localhost:5173 and start creating stories! โœจ

Cloud Deployment (Google Cloud Run)

Prerequisites for Cloud

  • Google Cloud CLI installed and authenticated (gcloud auth login)
  • Active Google Cloud Project with Billing enabled
  • .env file in backend/app/ with GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION

Deploy Supporting Agents

```powershell
cd backend/agents
.\deploy.ps1
```

Automatically deploys 4 microservices (Researcher, Judge, Storysmith, Orchestrator) to Cloud Run with security configured.

Deploy Main App

```powershell
# Run from repository root
.\deploy_app.ps1
```

Handles a dual-stage build: compiles the React 19 frontend and wraps it with the FastAPI/Puck bridge into a single production-ready container.

Pro-Tip: After deployment, manage AI parameters (Model IDs, API Keys) directly through Cloud Run environment variables without re-deploying.


📚 Key Learnings

🛡️ Infrastructure is Code (and Risk)

While completing the Way Back Home course, I discovered a critical issue in the workshop setup scripts (specifically billing-enablement.py). The automation silently defaulted to the user's first personal billing account (open_accounts[0]) if a workshop-specific one wasn't found, renaming it and forcing a link while ignoring any existing promotional credits.

The Reality Check:

  • Metadata Hijacking: The script programmatically renamed my personal billing profile without consent.

  • Financial Impact: It detached my project from my credit-funded account and linked it to my personal Mastercard, resulting in unauthorized charges of $10.13.

  • Silent Execution: The script lacks any input() prompts, making it impossible to intercept these changes in real-time.

The Lesson: I had to manually "patch" the workshop files by adding exit(0) to the billing scripts to prevent further damage. While the educational content of the course was exceptional (10/10), the infrastructure automation was a harsh reminder: always audit third-party setup scripts before running them with elevated permissions. This project taught me that "Agentic Orchestration" starts with the environment, and interactive confirmation in automated DevOps pipelines is not optional.

Specialization > Monoliths

I was surprised at how much more reliable the system became when I stopped relying on one giant prompt and started treating agents like a specialized team with distinct responsibilities. The Guardian of Balance agent alone caught narrative safety issues that a single monolithic prompt would have missed.
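The "structured loop escalation" this team pattern enables can be sketched in plain Python: regenerate the story until the Guardian approves, escalating to a stricter rewrite after a failed check. Everything below is a hypothetical stand-in for the real agent calls, not the project's actual logic.

```python
# Sketch of structured loop escalation between a writer agent and a
# validator agent, with a hard cap on retries.

MAX_ATTEMPTS = 3

def guardian_check(story: str) -> bool:
    # Stand-in for the Guardian of Balance's safety/density validation.
    return "scary" not in story

def write_story(draft: str, strict: bool) -> str:
    # Stand-in for the Storysmith; strict mode rewrites flagged content.
    return draft.replace("scary", "silly") if strict else draft

def generate_with_escalation(draft: str) -> str:
    story = draft
    for _attempt in range(MAX_ATTEMPTS):
        if guardian_check(story):
            return story
        # Escalate: after a failed check, regenerate in strict mode.
        story = write_story(story, strict=True)
    raise RuntimeError("Guardian rejected all attempts")

safe = generate_with_escalation("A scary forest adventure")
```

The bounded loop is the important part: a monolithic prompt has no second chance, while a validator in the loop gets a fixed number of structured retries before failing loudly.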

The Power of A2A Protocol

Implementing Agent-to-Agent communication was challenging, especially handling Google Application Default Credentials (ADC) on Windows. But once it clicked, the elegance of distributed agents became clear. The Orchestrator only needs to know the agent card URL, not the implementation. This enables independent scaling and future language-agnostic composition.
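For illustration, the discovery handshake is tiny: the Orchestrator derives the card URL from the agent's base URL and reads the card to learn its name, skills, and endpoint. The card path below follows the A2A convention of a well-known JSON document, and the card contents are a minimal hypothetical example, not the project's real cards.

```python
# Build the agent card URL an A2A client would fetch for discovery.
from urllib.parse import urljoin

# Well-known card path used by A2A discovery (assumed convention here).
AGENT_CARD_PATH = "/.well-known/agent.json"

def agent_card_url(base_url: str) -> str:
    """Derive the discovery URL from an agent's base URL."""
    return urljoin(base_url, AGENT_CARD_PATH)

# A minimal, hypothetical card the Seeker agent might serve:
seeker_card = {
    "name": "adventure_seeker",
    "url": "http://localhost:8001",
    "skills": [
        {"id": "legend_research", "description": "Research local legends"},
    ],
}

url = agent_card_url("http://localhost:8001")
```

Because the card carries everything the caller needs, the Orchestrator never imports the Seeker's code; it could be rewritten in another language without the caller noticing.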

Movement Changes Everything

The most rewarding part? Seeing a child leap off the couch when Puck asked them to "show me how you jump." Screen time transformed from sedentary consumption into active play. Physical verification via 1 FPS vision keeps latency acceptable while transforming behavior.


📂 Open Source & Reproducibility

The full source code, including the ADK orchestration logic, deployment scripts, and frontend, is available on GitHub:

👉 GitHub: vero-code/gemini-tales

Features:

  • ✅ Full Docker & Cloud Run deployment with OAuth2 inter-service auth
  • ✅ Multi-agent architecture with A2A protocol and structured loop escalation
  • ✅ Live API integration with WebSocket proxy for credential security
  • ✅ Comprehensive ARCHITECTURE.md for deep-dives
  • ✅ 111+ commits documenting the evolution from raw API calls to ADK agents

๐Ÿ† If Gemini Tales Wins...

If this project wins the Gemini Live Agent Challenge, here's what I'm committing to build:

Phase 1: Educator Adoption (Months 1-2)

  • Educator Dashboard: Teachers configure story themes, movement goals, and age-appropriate challenges per session
  • School Deployment Pack: A simplified Cloud Run setup guide + Docker image for schools to self-host
  • Movement Metrics: Track physical activity data per child (with parental consent) to prove the "screen time → move time" transformation
  • Free Tier for Non-profits: Educational institutions get free Cloud Run quota for one academic year

Phase 2: Global Scale (Months 2-4)

  • Multiplayer Mode: Two children, one story, coordinated physical challenges. Puck asks "Can you BOTH hop together?" and uses vision to verify synchronized movement
  • Multilingual Support: Core stories in 10 languages
  • Cultural Localization: Agent Mode story themes adapt to regional legends, holidays, and cultural values
  • Mobile App: Native iOS/Android for living-room play without a laptop (React Native port of the "Magic Mirror")

Phase 3: Premium Tier (Month 4+)

  • Gemini Tales Premium: Parent dashboard exposing the raw agent pipeline (Researcher, Judge, Storysmith working in real-time) so adults can see exactly how each story was crafted
  • Custom Character Library: Upload your own character art (pet, stuffed animal, superhero OC) and have Puck transform it into the main character
  • Extended Story Packs: Professionally written, multi-session adventures (The Dragon's Lair, The Enchanted Forest, The Lost Temple) with persistent progression across sessions
  • Gamification API: Developers can integrate their own movement tracking devices (Fitbit, Apple Watch, smart scales) to unlock story-specific achievements

Phase 4: Research & Impact (Ongoing)

  • Peer-Reviewed Study: Partner with pediatricians and child psychologists to measure sedentary reduction and cognitive engagement metrics
  • Open Data Initiative: Anonymized, aggregated movement data shared with research institutions studying childhood activity
  • ADK Extensibility: Document the multi-agent orchestration pattern as a reusable template for other child-safe AI applications
  • Google Cloud Starter Kit: Contribute a "Gemini Tales Architecture" as an official Google Cloud Solution template for educational AI

The Bigger Vision 🌍

If we can prove that AI can inspire movement instead of sedentary consumption, we unlock a new category of tech: AI that optimizes for human health, not screen time.

Winning this challenge means the resources to show that Gemini Tales is reproducible, scalable, and genuinely life-changing for kids. It's not just a hackathon project; it's a proof-of-concept for the next generation of responsible AI.

The goal: 10,000 children moving more because of this app by end of 2026.


🎯 Why This Matters

Technology is often the villain in this story. But what if it could be the hero?

Gemini Tales proves that with the right architecture and intention, we can build AI experiences that:

  • ✅ Entertain (magical storytelling powered by Gemini 3.1)
  • ✅ Engage (real-time interaction with Gemini Live)
  • ✅ Activate (physical movement required and verified)
  • ✅ Educate (safe, age-appropriate learning with agent-driven safety review)

This is technology in service of human health.


📜 License

MIT. See LICENSE.

Created with โค๏ธ for the next generation of active explorers.


Tags: #GeminiLiveAgentChallenge #GoogleAI #MultiAgentSystems #ADK #ChildHealth #InteractiveTech #A2AProtocol #Veo #GeminiLive #EducationTech
