This article was written as part of my submission to the Gemini Live Agent Challenge. When sharing on social media, I'll be using the hashtag #GeminiLiveAgentChallenge.
If you've ever been a Game Master (GM) for a tabletop RPG like Dungeons & Dragons, you know the deal. You're a storyteller, an actor, a referee, and... an exhausted bookkeeper. I love crafting epic narratives, but the cognitive load of tracking every NPC, quest status, and inventory item in a messy notebook was burning me out.
I thought: what if an AI could handle the bookkeeping, leaving the creativity to the humans?
I didn't just want a chatbot. I wanted a "World Steward": an agent that listens to the story and silently updates a structured database of the world in the background. That's why I built LoreForge.
What is LoreForge?
LoreForge is an AI-powered campaign companion that transforms unstructured storytelling into structured data and cinematic visuals.
- Automated State Tracking: It listens to gameplay and maintains a live JSON database of the "World State" (NPCs, Factions, Quests) and "Inventory."
- Cinematic Visualization: When a prompt calls for a scene, it uses Imagen to generate atmospheric fantasy illustrations in real time.
- Session Recaps on Autopilot: At the end of a session, it generates a fully styled Reveal.js slide deck with an outline, summaries, and custom background art for an instant "Previously on..." presentation.
The Tech Stack
The project is built on a modern, asynchronous Python backend, leaning heavily on Google's ecosystem:
- Backend: Python with FastAPI.
- Database: Google Cloud Firestore for persisting session data.
- AI Models:
- Gemini API: The core LLM for reasoning, state derivation, and content generation.
- Imagen API: For generating all the cinematic visuals.
How It Works: A Look Under the Hood
The magic of LoreForge is in how it orchestrates these services to create something more than a simple chat interface.
1. The "World Steward": State Derivation with Gemini
This is the core of the project. Instead of just having a long chat history, I needed the AI to maintain a canonical, machine-readable "source of truth" for the campaign.
I implemented a "State Derivation" pattern. Periodically, I bundle up the recent gameplay events, the current JSON state, and a complex system prompt, and send it all to Gemini. The model's job isn't to chat, but to return a new, updated JSON object representing the new reality of the game world.
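A minimal sketch of this derivation step, in Python to match the backend. It is not LoreForge's actual code: call_model is a hypothetical stand-in for whatever function sends a prompt to Gemini and returns the raw text, and the prompt wording is illustrative. If the reply does not parse as a JSON object, the sketch keeps the previous state instead of overwriting the source of truth.

```python
import json
from typing import Callable

# Hypothetical stand-in for the real Gemini call: prompt in, raw text out.
ModelFn = Callable[[str], str]

# Illustrative system prompt; the real one is described as far more complex.
SYSTEM_PROMPT = (
    "You are the World Steward for a tabletop RPG campaign. "
    "Do not chat. Read the current world state and the new events, "
    "then return ONLY the complete, updated world state as one JSON object."
)


def build_derivation_prompt(state: dict, events: list[str]) -> str:
    """Bundle the system prompt, the current JSON state, and recent events."""
    event_log = "\n".join(f"- {e}" for e in events)
    return (
        f"{SYSTEM_PROMPT}\n\n"
        f"CURRENT STATE:\n{json.dumps(state, indent=2)}\n\n"
        f"RECENT EVENTS:\n{event_log}\n\n"
        "UPDATED STATE (JSON only):"
    )


def derive_state(state: dict, events: list[str], call_model: ModelFn) -> dict:
    """Ask the model for the new world state; keep the old one on failure."""
    if not events:
        return state  # nothing happened, no need to spend a model call
    raw = call_model(build_derivation_prompt(state, events))
    try:
        new_state = json.loads(raw)
    except json.JSONDecodeError:
        return state  # the repair step described in the next section would go here
    return new_state if isinstance(new_state, dict) else state
```

Because the model function is injected, the derivation logic can be exercised with a fake model that returns a canned JSON string, without any network access.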
The prompt includes a "schema hint" to guide the model's output. When a player says "I take the 3 healing potions from the chest," the AI doesn't just acknowledge it. It processes the event log and updates the inventory array in the JSON state.
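One way to use a schema hint is twice: render it into the prompt, and check the model's reply against it before accepting the new state. The sketch below assumes a hypothetical hint covering the collections mentioned earlier (NPCs, factions, quests, inventory); the key names and the validation step are illustrative assumptions, not taken from LoreForge.

```python
import json

# Hypothetical schema hint: each top-level key and the type it must have.
SCHEMA_HINT = {
    "npcs": list,
    "factions": list,
    "quests": list,
    "inventory": list,
}

_JSON_TYPE_NAMES = {list: "array", dict: "object", str: "string"}


def schema_hint_text(hint: dict) -> str:
    """Render the hint as compact JSON text to embed in the prompt."""
    return json.dumps({key: _JSON_TYPE_NAMES[t] for key, t in hint.items()})


def matches_hint(state: dict, hint: dict) -> bool:
    """Accept a derived state only if every hinted key has the expected type."""
    return all(isinstance(state.get(key), t) for key, t in hint.items())
```

For the healing-potion example, a reply such as {"npcs": [], "factions": [], "quests": [], "inventory": [{"item": "healing potion", "qty": 3}]} passes the check, while a reply that drops the inventory key is rejected and the previous state stays in place.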
2. Taming the LLM: Forcing Valid JSON
Anyone who has worked with LLMs knows they sometimes get creative, even when you ask for structured data. A stray Markdown fence, a trailing comma, or a truncated response can break your application.
To make LoreForge robust, I wrote a dedicated function, coerce_json_object, that cleans up the model's output before parsing. It is a series of defensive heuristics, and it has been a lifesaver: it first tries standard parsing, then applies fixes for common LLM mistakes, and as a last resort tries parsing the string as a Python literal, which is more forgiving (it accepts single-quoted strings, for example).
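The sequence described above can be reconstructed roughly as follows. This is a sketch of the same idea, not the author's implementation; the specific heuristics (fence stripping, brace slicing, trailing-comma removal) are assumptions about what "common LLM mistakes" covers. It uses only the standard library, with ast.literal_eval as the forgiving last resort.

```python
import ast
import json
import re

# Leading/trailing Markdown fences: three backticks, optionally followed by "json".
_FENCE = re.compile(r"^\s*`{3}(?:json)?\s*|\s*`{3}\s*$", re.IGNORECASE)
# A comma directly before a closing brace or bracket.
_TRAILING_COMMA = re.compile(r",\s*([}\]])")


def coerce_json_object(raw: str) -> dict:
    """Best-effort conversion of an LLM reply into a JSON object (dict)."""
    # 1. Standard parsing first: well-formed replies take the fast path.
    try:
        value = json.loads(raw)
        if isinstance(value, dict):
            return value
    except json.JSONDecodeError:
        pass

    # 2. Strip Markdown code fences wrapped around the payload.
    text = _FENCE.sub("", raw.strip())

    # 3. Keep only the span from the first "{" to the last "}",
    #    dropping chatty preambles like "Sure! Here is the JSON:".
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        text = text[start : end + 1]

    # 4. Remove trailing commas before a closing brace or bracket.
    text = _TRAILING_COMMA.sub(r"\1", text)

    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        # 5. Last resort: a Python literal tolerates single quotes and
        #    True/False/None, which models sometimes emit.
        try:
            value = ast.literal_eval(text)
        except (ValueError, SyntaxError):
            raise ValueError("could not coerce model output to a JSON object") from None

    if not isinstance(value, dict):
        raise ValueError("model output is not a JSON object")
    return value
```

The trailing-comma regex is blunt: it could also alter a string value that happens to contain ", }". That trade-off seems acceptable here because the heuristics only run after a strict parse has already failed.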
3. From Words to Worlds: Cinematic Visuals & Recaps
- Visuals with Imagen: When the user's prompt contains the word "scene," LoreForge triggers Imagen to generate a visual. The key here was prompt engineering. I had to explicitly tell the model not to include text, UI elements, or logos to maintain a clean, cinematic feel.
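A sketch of the trigger and the prompt wrapper, assuming a whole-word, case-insensitive match on "scene" or "scenes" (so that "obscene" does not fire) and a fixed style suffix that carries the negative instructions. The function names and suffix wording are hypothetical, and the actual Imagen call is left out.

```python
import re

# Whole-word, case-insensitive match on "scene" or "scenes".
_SCENE_WORD = re.compile(r"\bscenes?\b", re.IGNORECASE)

# Hypothetical wording for the style and negative instructions.
_STYLE_SUFFIX = (
    "Cinematic fantasy illustration, atmospheric lighting. "
    "Do not include any text, captions, UI elements, watermarks, or logos."
)


def should_generate_scene(user_prompt: str) -> bool:
    """Return True when the user's prompt asks for a scene."""
    return bool(_SCENE_WORD.search(user_prompt))


def build_image_prompt(description: str) -> str:
    """Append the fixed style and negative instructions to a scene description."""
    return f"{description.strip()} {_STYLE_SUFFIX}"
```

Keeping the negative instructions in one constant means every image request gets the same "no text, no UI, no logos" guard, instead of relying on each description to repeat it.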
- Automated Recaps: This is my favorite feature. The presentation service reads the entire session history and the final world state, then uses a multi-step agentic workflow with Gemini:
- Generate Outline: Ask Gemini to create a JSON outline for a slide deck, summarizing key events.
- Generate Image Prompts: For each slide in the outline, ask Gemini to create a new, specific prompt for a background image.
- Render HTML: Use Jinja2 to render the final outline and image URLs into a Reveal.js HTML file.
The result is a beautiful, shareable slide deck, created with a single button click.
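The three recap steps can be sketched as plain functions with the model and image calls injected. gen (returns parsed JSON for a Gemini prompt) and make_image (returns a URL for a generated image) are hypothetical stand-ins, and the final step builds the Reveal.js markup with html.escape and f-strings instead of Jinja2 so the sketch has no dependencies. data-background-image is the Reveal.js attribute for a per-slide background image.

```python
import html
import json
from typing import Callable

# Hypothetical stand-ins: a Gemini call returning parsed JSON, and an
# Imagen call returning the URL of the generated image.
GenerateJson = Callable[[str], dict]
MakeImage = Callable[[str], str]


def generate_outline(history: list[str], state: dict, gen: GenerateJson) -> list[dict]:
    """Step 1: ask for a JSON outline whose 'slides' have a title and summary."""
    log = "\n".join(history)
    prompt = (
        "Create a recap slide deck outline as JSON with a 'slides' list; "
        "each slide has 'title' and 'summary'.\n"
        f"SESSION HISTORY:\n{log}\n"
        f"FINAL STATE:\n{json.dumps(state)}"
    )
    return gen(prompt)["slides"]


def add_image_prompts(slides: list[dict], gen: GenerateJson) -> list[dict]:
    """Step 2: ask for one specific background-image prompt per slide."""
    return [
        {
            **slide,
            "image_prompt": gen(
                "Return JSON with an 'image_prompt' key describing a background "
                f"image for this slide: {slide['title']}: {slide['summary']}"
            )["image_prompt"],
        }
        for slide in slides
    ]


def render_reveal(slides: list[dict], image_urls: list[str]) -> str:
    """Step 3: render Reveal.js slide markup, escaping all model-written text."""
    sections = "\n".join(
        f'<section data-background-image="{html.escape(url)}">'
        f"<h2>{html.escape(s['title'])}</h2><p>{html.escape(s['summary'])}</p>"
        "</section>"
        for s, url in zip(slides, image_urls)
    )
    return f'<div class="reveal"><div class="slides">\n{sections}\n</div></div>'


def build_recap(history: list[str], state: dict, gen: GenerateJson, make_image: MakeImage) -> str:
    """Run the outline -> image prompts -> render pipeline end to end."""
    slides = add_image_prompts(generate_outline(history, state, gen), gen)
    urls = [make_image(s["image_prompt"]) for s in slides]
    return render_reveal(slides, urls)
```

A complete deck would wrap this markup in a page that loads reveal.js and calls Reveal.initialize(). Keeping each step a separate call also means a bad outline or a failed image prompt can be retried on its own.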
The "It's Alive!" Moment
The first time I described the party finding a treasure chest and then saw the inventory JSON update automatically in my debug view... that was magical. It wasn't just a chatbot anymore; it was an agent that understood the game's state. Clicking the "Generate Presentation" button and seeing a fully-formed slide deck appear moments later felt like pure science fiction.
What's Next?
This project was a deep dive into agentic workflows and state management with LLMs. The next logical step is to integrate Gemini Live to remove the keyboard entirely. I want to be able to simply speak my narration, and have LoreForge listen in as a silent, helpful scribe, updating the world state from my voice alone.
Thanks for reading! Building this has been an incredible learning experience.