Oleksiy

Posted on Mar 15

I Built an AI Dungeon Master with Gemini to Automate My D&D Campaigns

#google #gemini #python #geminiliveagentchallenge

This article was written as part of my submission to the Gemini Live Agent Challenge. When sharing on social media, I'll be using the hashtag #GeminiLiveAgentChallenge.

If you've ever been a Game Master (GM) for a tabletop RPG like Dungeons & Dragons, you know the deal. You're a storyteller, an actor, a referee, and... an exhausted bookkeeper. I love crafting epic narratives, but the cognitive load of tracking every NPC, quest status, and inventory item in a messy notebook was burning me out.

I thought: what if an AI could handle the bookkeeping, leaving the creativity to the humans?

I didn't just want a chatbot. I wanted a "World Steward"—an agent that listens to the story and silently updates a structured database of the world in the background. That's why I built LoreForge.

What is LoreForge?

LoreForge is an AI-powered campaign companion that transforms unstructured storytelling into structured data and cinematic visuals.

- Automated State Tracking: It listens to gameplay and maintains a live JSON database of the "World State" (NPCs, Factions, Quests) and "Inventory."
- Cinematic Visualization: It detects when a scene is being described and uses Imagen to generate atmospheric fantasy illustrations in real time.
- Session Recaps on Autopilot: At the end of a session, it generates a fully styled Reveal.js slide deck with an outline, summaries, and custom background art for an instant "Previously on..." presentation.

The Tech Stack

The project is built on a modern, asynchronous Python backend, leaning heavily on Google's ecosystem:

- Backend: Python with FastAPI.
- Database: Google Cloud Firestore for persisting session data.
- AI Models:
  - Gemini API: The core LLM for reasoning, state derivation, and content generation.
  - Imagen API: For generating all the cinematic visuals.

How It Works: A Look Under the Hood

The magic of LoreForge is in how it orchestrates these services to create something more than a simple chat interface.

1. The "World Steward": State Derivation with Gemini

This is the core of the project. Instead of just having a long chat history, I needed the AI to maintain a canonical, machine-readable "source of truth" for the campaign.

I implemented a "State Derivation" pattern. Periodically, I bundle up the recent gameplay events, the current JSON state, and a detailed system prompt, and send it all to Gemini. The model's job isn't to chat, but to return an updated JSON object that represents the new reality of the game world.

The prompt includes a "schema hint" to guide the model's output. When a player says "I take the 3 healing potions from the chest," the AI doesn't just acknowledge it. It processes the event log and updates the inventory array in the JSON state.
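A minimal sketch of this pattern might look like the following. It is not the actual LoreForge code: the schema hint, the system prompt wording, and the function names are all hypothetical, and `call_model` is a stand-in for whatever function sends the prompt to Gemini and returns its text reply.

```python
import json

# Hypothetical schema hint; the real one is not shown in this post.
SCHEMA_HINT = {
    "world_state": {"npcs": [], "factions": [], "quests": []},
    "inventory": [{"name": "string", "quantity": 0}],
}

# Hypothetical system prompt wording.
SYSTEM_PROMPT = (
    "You are the World Steward for a tabletop RPG campaign. "
    "Given the current state and recent events, return ONLY a JSON object "
    "with the full updated state. Do not add commentary."
)


def build_derivation_prompt(current_state: dict, events: list[str]) -> str:
    """Bundle the system prompt, schema hint, current state, and events."""
    event_lines = "\n".join(f"- {event}" for event in events)
    return "\n\n".join([
        SYSTEM_PROMPT,
        "Schema hint:\n" + json.dumps(SCHEMA_HINT, indent=2),
        "Current state:\n" + json.dumps(current_state, indent=2),
        "Recent events:\n" + event_lines,
    ])


def derive_state(current_state: dict, events: list[str], call_model) -> dict:
    """Ask the model for a new state; keep the old one if the reply is unusable."""
    reply = call_model(build_derivation_prompt(current_state, events))
    try:
        new_state = json.loads(reply)
    except json.JSONDecodeError:
        return current_state
    # Only accept a JSON object that still has the expected top-level keys.
    if not isinstance(new_state, dict) or not set(SCHEMA_HINT) <= set(new_state):
        return current_state
    return new_state
```

Falling back to the previous state on a bad reply is one possible design choice: a failed derivation then costs one update cycle instead of corrupting the campaign record.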

2. Taming the LLM: Forcing Valid JSON

Anyone who has worked with LLMs knows they sometimes get creative, even when you ask for structured data. A stray Markdown fence, a trailing comma, or a truncated response can break your application.

To make LoreForge robust, I wrote a dedicated function, coerce_json_object, to clean up the model's output before parsing. This function is a lifesaver. It is a chain of defensive heuristics that has proven very effective in practice: it tries standard JSON parsing first, then applies fixes for common LLM mistakes, and as a last resort tries parsing the string as a Python literal, which is more forgiving (for example, it accepts single-quoted strings and trailing commas).
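The function itself is not reproduced in this post, so the following is only a sketch of how such a coercion chain could be written with the standard library. The specific heuristics (fence stripping, brace extraction, trailing-comma removal) and their order are assumptions, not the actual LoreForge implementation.

```python
import ast
import json
import re

FENCE = "`" * 3  # a Markdown code fence marker


def coerce_json_object(raw: str) -> dict:
    """Best-effort conversion of model output into a dict; raises ValueError on failure."""
    stripped = raw.strip()
    text = stripped

    # Drop a surrounding Markdown fence, with or without a "json" language tag.
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        text = fenced.group(1).strip()

    # Keep only the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        text = text[start:end + 1]

    # Remove trailing commas before a closing brace or bracket.
    # Heuristic only: it would also alter such a sequence inside a string value.
    no_trailing = re.sub(r",\s*([}\]])", r"\1", text)

    # Standard parsing first, then each progressively repaired candidate.
    for candidate in (stripped, text, no_trailing):
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed

    # Last resort: Python literal syntax (single quotes, True/False/None).
    try:
        parsed = ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
        parsed = None
    if isinstance(parsed, dict):
        return parsed
    raise ValueError("could not coerce model output into a JSON object")
```

Raising `ValueError` when every heuristic fails lets the caller decide whether to retry the request or keep the previous state.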

3. From Words to Worlds: Cinematic Visuals & Recaps

Visuals with Imagen: When the user's prompt contains the word "scene," LoreForge triggers Imagen to generate a visual. The key here was prompt engineering. I had to explicitly tell the model not to include text, UI elements, or logos to maintain a clean, cinematic feel.
Automated Recaps: This is my favorite feature. The presentation service reads the entire session history and the final world state, then uses a multi-step agentic workflow with Gemini:
1. Generate Outline: Ask Gemini to create a JSON outline for a slide deck, summarizing key events.
2. Generate Image Prompts: For each slide in the outline, ask Gemini to create a new, specific prompt for a background image.
3. Render HTML: Use Jinja2 to render the final outline and image URLs into a Reveal.js HTML file.
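The three steps above could be wired together roughly as follows. This is a sketch under several assumptions: the function and parameter names are hypothetical, the three callables stand in for the Gemini and Imagen calls, the outline is assumed to be a list of slides with "title" and "summary" keys, and plain string formatting stands in for the Jinja2 template so the example needs only the standard library.

```python
from html import escape
from typing import Callable


def build_recap_html(
    history: list[str],
    world_state: dict,
    generate_outline: Callable[[list[str], dict], list[dict]],
    generate_image_prompt: Callable[[dict], str],
    generate_image_url: Callable[[str], str],
) -> str:
    """Run the outline -> image prompt -> render workflow and return slide HTML."""
    # Step 1: one model call produces the outline for the whole session.
    outline = generate_outline(history, world_state)

    sections = []
    for slide in outline:
        # Step 2: a separate model call per slide writes a dedicated image prompt,
        # which is then turned into an image URL.
        image_url = generate_image_url(generate_image_prompt(slide))
        # Step 3: render the slide; Reveal.js reads the background image
        # from the data-background-image attribute of each section.
        sections.append(
            f'<section data-background-image="{escape(image_url, quote=True)}">'
            f"<h2>{escape(slide['title'])}</h2>"
            f"<p>{escape(slide['summary'])}</p>"
            "</section>"
        )
    return '<div class="reveal"><div class="slides">' + "".join(sections) + "</div></div>"
```

Passing the model calls in as parameters keeps the orchestration testable without network access, since each one can be replaced by a simple stub.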

The result is a beautiful, shareable slide deck, created with a single button click.
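For the visuals, the keyword trigger and the "no text, no UI, no logos" instruction described earlier in this section can be sketched in a few lines. The function names and the exact style wording are hypothetical, and matching "scene" as a whole word (so that "scenery" does not trigger an image) is an assumption about how the check might be done.

```python
import re

# Hypothetical wording; the post only says text, UI elements, and logos are excluded.
STYLE_SUFFIX = (
    "Cinematic, atmospheric fantasy illustration. "
    "Do not include any text, UI elements, or logos."
)


def wants_scene_image(user_prompt: str) -> bool:
    """Return True when the prompt contains the standalone word 'scene'."""
    return re.search(r"\bscene\b", user_prompt, re.IGNORECASE) is not None


def build_image_prompt(description: str) -> str:
    """Append the style and exclusion instructions to the scene description."""
    return f"{description.strip()}\n\n{STYLE_SUFFIX}"
```

The resulting string would then be passed to the image model whenever `wants_scene_image` returns True.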

The "It's Alive!" Moment

The first time I described the party finding a treasure chest and then saw the inventory JSON update automatically in my debug view... that was magical. It wasn't just a chatbot anymore; it was an agent that understood the game's state. Clicking the "Generate Presentation" button and seeing a fully-formed slide deck appear moments later felt like pure science fiction.

What's Next?

This project was a deep dive into agentic workflows and state management with LLMs. The next logical step is to integrate Gemini Live to remove the keyboard entirely. I want to be able to simply speak my narration, and have LoreForge listen in as a silent, helpful scribe, updating the world state from my voice alone.

Thanks for reading! Building this has been an incredible learning experience.
