Subtitle: Built for the Gemini Live Agent Challenge 2026 #GeminiLiveAgentChallenge
1. The Problem: AI stories are boring
Most AI storytelling experiences still feel transactional. You type into a box, get a paragraph back, maybe copy it into a document, and the magic ends there. These tools do not see, hear, speak, or remember, and they do not feel like a living world.
That gap became the starting point for SAGA. I wanted something that felt less like prompting an API and more like stepping into a creative chamber where prose, visuals, narration, and world memory move together.
2. The Vision: What if a story could See, Hear, Speak, and Remember?
SAGA is built around a simple belief: stories should not be output; they should be environments.
So the product became a story universe engine:
- See through inline illustrations and cinematic clips
- Hear through narration and ambient score
- Speak through Gemini Live as a co-author
- Remember through persistent world state in Firestore and vector memory in Qdrant
That one framing decision drove the entire architecture.
3. Architecture: The 6-model stack
SAGA uses a layered Google AI stack:
- Gemini 2.0 Flash as the primary story engine
- Gemini Live API for real-time voice co-authoring
- Imagen 4 for scene illustrations
- Veo 2 for short cinematic beats
- Gemini TTS for narration
- Lyria 2 for ambient score generation
The backend runs on FastAPI and Cloud Run. Firestore stores story sessions and return-state. Cloud Storage stores media artifacts. Terraform provisions the infrastructure. Secret Manager handles secrets. Qdrant stores vector memory for continuity.
The key design choice was interleaving. Text, image, narration, and music do not appear in separate tabs. They arrive in one manuscript stream so the user experiences a single living artifact.
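To make the interleaving idea concrete, here is a minimal sketch of how a client-side model could fold one mixed event stream into a single manuscript. The event shape (`type`, `section_id`, `payload`) and the `Section` and `Manuscript` names are assumptions for illustration, not SAGA's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    text: str = ""
    media: dict[str, str] = field(default_factory=dict)  # kind -> artifact URL


@dataclass
class Manuscript:
    sections: dict[int, Section] = field(default_factory=dict)

    def apply(self, event: dict) -> None:
        """Fold one stream event into the section it belongs to."""
        # Media can arrive before or after its text, so create the section on demand.
        section = self.sections.setdefault(event["section_id"], Section())
        if event["type"] == "text":
            section.text = event["payload"]
        else:
            section.media[event["type"]] = event["payload"]


manuscript = Manuscript()
manuscript.apply({"type": "text", "section_id": 1, "payload": "The river rose at dusk."})
manuscript.apply({"type": "image", "section_id": 1, "payload": "gs://example-bucket/scene-1.png"})
```

Keying every event by section is what lets an illustration or narration clip that finishes late still land next to the paragraph it belongs to.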
4. The Hard Parts
There were a few technical pieces that mattered more than expected:
PCM-to-WAV wrapping for live audio
Gemini Live returns raw audio chunks, so browser-safe playback required clean PCM handling and scheduling. Once chunk playback was scheduled in a persistent audio context instead of one context per chunk, the speaking voice stopped sounding broken.
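The wrapping step itself can be sketched with Python's standard `wave` module. This is an illustration, not the code SAGA ships: the function names are hypothetical, and the defaults assume 16-bit mono PCM at 24 kHz, which should be checked against the audio format the Live session actually reports.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV (RIFF) container so a browser can decode them."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


def chunk_duration_seconds(pcm: bytes, sample_rate: int = 24000,
                           channels: int = 1, sample_width: int = 2) -> float:
    """Playback length of a chunk, useful for scheduling the next chunk's start time."""
    return len(pcm) / (sample_rate * channels * sample_width)
```

Knowing each chunk's duration is what allows chunks to be queued back-to-back on one shared playback clock rather than started independently.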
Lyria REST workaround
The current Lyria path uses Vertex REST because the SDK path had a proto/runtime mismatch for this use case. That made the music layer slightly different from the other model integrations, but it kept the product stable and demoable.
Background world extraction
The story could not wait for map extraction, narration, or video to finish. The manuscript needed to keep moving. So world extraction, narration, music, and cinematic clip generation were pushed into non-blocking background tasks, then streamed back into the same WebSocket session.
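The non-blocking pattern described above can be sketched with plain `asyncio`. Everything here is an assumption for illustration: the `outbox` queue stands in for the WebSocket send path, and `fake_job` stands in for the real world-extraction, narration, and music calls.

```python
import asyncio
from typing import Awaitable, Callable


async def run_in_background(kind: str, job: Callable[[], Awaitable[str]],
                            outbox: asyncio.Queue) -> None:
    """Run one slow job and push its result (or its error) onto the shared stream."""
    try:
        payload = await job()
        await outbox.put({"type": kind, "payload": payload})
    except Exception as exc:  # a failed side task must not stall the story
        await outbox.put({"type": f"{kind}_error", "payload": str(exc)})


async def handle_section(text: str, jobs: dict, outbox: asyncio.Queue) -> list:
    """Emit the prose immediately, then start the slow work without awaiting it."""
    await outbox.put({"type": "text", "payload": text})
    return [asyncio.create_task(run_in_background(kind, job, outbox))
            for kind, job in jobs.items()]


async def fake_job(result: str, delay: float) -> str:
    await asyncio.sleep(delay)  # placeholder for a model call
    return result


async def demo() -> list[str]:
    outbox: asyncio.Queue = asyncio.Queue()
    jobs = {
        "world": lambda: fake_job("map data", 0.02),
        "narration": lambda: fake_job("audio url", 0.01),
        "music": lambda: fake_job("score url", 0.03),
    }
    tasks = await handle_section("The gates opened.", jobs, outbox)
    await asyncio.gather(*tasks)  # a real session would keep draining the queue instead
    events = []
    while not outbox.empty():
        events.append(outbox.get_nowait()["type"])
    return events
```

The text event is queued before any task is created, so the manuscript always moves first and the media follows whenever it is ready.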
5. The ADK Layer: Why SAGA is an agent
I wanted SAGA to be legible as an agent, not just a collection of API calls.
So I added an explicit Google ADK surface with tool definitions for:
- generating the next story section
- applying director commands
- extracting world locations
That matters for the architecture story. Gemini Live does not just transcribe voice. It listens, understands intent, then says GENERATING: ... when it is ready to trigger the next action. That is an agent moment.
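The hand-off from voice to action can be sketched as a small parser over the Live transcript. This is plain Python rather than ADK code, and the function name, the marker handling, and the assumption that the text after the marker describes the requested action are all hypothetical.

```python
from typing import Optional

MARKER = "GENERATING:"


def extract_generation_request(transcript: str) -> Optional[str]:
    """Return the text that follows the marker, or None if no action was signalled."""
    index = transcript.find(MARKER)
    if index == -1:
        return None
    request = transcript[index + len(MARKER):].strip()
    return request or None  # a bare marker with nothing after it is not a request


request = extract_generation_request("Understood. GENERATING: the siege at dawn")
```

A non-empty result would then be routed to whichever tool handles generating the next story section.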
6. The Demo Moment: "Three days have passed in Hastinapur..."
The most emotionally important feature is the return experience.
If you close the browser and come back later, SAGA restores the story world and writes you a welcome-back message that references your characters and locations. That single interaction reframes the product. The system no longer feels stateless. It feels like the world kept breathing while you were away.
That is the moment most people immediately understand the product.
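One way to drive a return message like this is to build a prompt from the persisted world state. The sketch below assumes the elapsed time comes from a stored last-seen timestamp and that the state holds `characters` and `locations` lists; those field names and the prompt wording are illustrative, not SAGA's actual Firestore schema.

```python
from datetime import datetime, timezone


def days_away(last_seen: datetime, now: datetime) -> int:
    """Whole days between the last visit and now, never negative."""
    return max(0, (now - last_seen).days)


def build_return_prompt(world: dict, now: datetime) -> str:
    """Compose a prompt asking the story model for a welcome-back passage."""
    elapsed = days_away(world["last_seen"], now)
    characters = ", ".join(world["characters"])
    locations = ", ".join(world["locations"])
    return (
        f"The reader has been away for {elapsed} day(s). "
        f"Write a short welcome-back passage that mentions these characters: {characters}; "
        f"and these locations: {locations}."
    )


world = {
    "last_seen": datetime(2026, 1, 1, tzinfo=timezone.utc),
    "characters": ["Character A", "Character B"],  # placeholder names
    "locations": ["Hastinapur"],
}
prompt = build_return_prompt(world, datetime(2026, 1, 4, tzinfo=timezone.utc))
```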
7. What's Next
If I keep building SAGA, the next steps are clear:
- multi-user shared worlds
- mobile companion app
- collaborative writer rooms
- publishable world libraries
- a marketplace for stories, universes, and generated artifacts
8. Try It Yourself
- Demo Video: https://youtu.be/mdONC55NxEU
I created this content for the purposes of entering the Gemini Live Agent Challenge 2026. #GeminiLiveAgentChallenge








