DEV Community

austin amento

Building SoloQuest: Letting Gemini Be the DM, and Code Be the Referee

This is a submission for the Built with Google Gemini: Writing Challenge


What I Built with Google Gemini

SoloQuest started with a question that felt a little absurd and a little irresistible:

Could I build a solo D&D 5e campaign where an AI acts as the Dungeon Master, and have it actually feel coherent for more than a few turns?

At first, this was just a prompt experiment in Google AI Studio. I wanted to see whether Gemini could stay in character as a rules-aware DM over a long conversation. The early results were exciting. Gemini was great at scene-setting, roleplay, and making even simple encounters feel dramatic.

Then the cracks started to show.

A goblin would attack twice because initiative state drifted. A defeated enemy would still be described as active. A spell slot would get consumed in the narration but not in the actual game state. Death saves sounded correct and still ended up inconsistent.

That was the turning point. I realized I was trying to use Gemini for two jobs at once:

  1. storytelling
  2. stateful rules execution

It was excellent at the first and unreliable at the second.

That insight became the architecture for SoloQuest.

Today, SoloQuest is a full-stack Next.js app with Firebase Auth and Firestore, Stripe subscriptions, Google Text-to-Speech for narration, and Gemini powering the core game experience. The model handles narration, roleplay, encounter flavor, NPC dialogue, and player choice framing. My code handles the deterministic layer: combat state, initiative, HP math, spell slots, inventory, conditions, progression, and persistence.

The core loop works like this:

  1. The player enters an action
  2. The app sends Gemini the current campaign context, including character state, combat state, inventory, quest context, and recent session history
  3. Gemini returns both narrative text and a structured MECHANICS block
  4. A deterministic rules engine parses that block and applies the actual state transitions
  5. Everything is saved back to Firestore so the campaign can continue across sessions
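The loop above can be sketched roughly as follows. This is an illustrative sketch, not SoloQuest's actual code: `GameState`, `fakeGemini`, `applyMechanics`, and `playTurn` are hypothetical names, the model call is stubbed, and Firestore persistence is reduced to a return value.

```typescript
// Hypothetical sketch of the turn loop described above.
type GameState = {
  hp: number;
  round: number;
  log: string[];
};

type ModelTurn = { narrative: string; mechanics: string };

// Steps 1–2: build the context the model sees from current state plus the action.
function buildContext(state: GameState, action: string): string {
  return `ROUND ${state.round}\nHP ${state.hp}\nRECENT: ${state.log
    .slice(-3)
    .join(" | ")}\nPLAYER: ${action}`;
}

// Step 3: the model returns narrative plus a MECHANICS block (stubbed here).
function fakeGemini(context: string): ModelTurn {
  return {
    narrative: "The goblin lunges from the shadows.",
    mechanics: "ACTION_TYPE: ATTACK\nHP_CHANGE: -3",
  };
}

// Step 4: the deterministic engine applies state transitions from the block.
function applyMechanics(state: GameState, mechanics: string): GameState {
  let hp = state.hp;
  for (const line of mechanics.split("\n")) {
    const m = line.match(/^HP_CHANGE:\s*(-?\d+)$/);
    if (m) hp += parseInt(m[1], 10);
  }
  return { ...state, hp, round: state.round + 1 };
}

// Step 5: in the real app the result is persisted to Firestore; here it is
// simply returned so the next turn can continue from it.
function playTurn(state: GameState, action: string): GameState {
  const turn = fakeGemini(buildContext(state, action));
  const next = applyMechanics(state, turn.mechanics);
  next.log = [...state.log, turn.narrative];
  return next;
}
```

The point of the shape is that the model's output never touches state directly; it only passes through `applyMechanics`.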

Gemini plays the DM. My code plays the referee.

That division of labor turned a fun prototype into a real product.

Screenshot of the game


Demo

You can try SoloQuest at thesoloquest.com


What I Learned

The Most Important Design Decision Was Deciding What the AI Was Allowed to Do

The biggest lesson from this project had nothing to do with clever prompts. It was about system boundaries.

Early on, I kept trying to fix mechanical inconsistency with better instructions. I added more rules to the system prompt, more formatting requirements, more reminders about initiative order, death saves, conditions, and spell slots. It helped a little, but not enough. The problem was not that Gemini was failing to read my instructions. The problem was that I was assigning deterministic responsibilities to a probabilistic system.

Once I accepted that, progress became much faster.

Over eight phases of development, I moved game mechanics out of Gemini and into a rules engine. Gemini still decides how the world sounds and feels. The engine decides what actually happens.

That changed everything.

Here are a few examples:

  • Initiative and turn order are fully engine-controlled. My code rolls initiative, sorts combatants, advances turns, and enforces who can act.
  • Attack resolution uses real AC and stat data. Gemini can narrate a hit, but if the math says it missed, the engine wins.
  • Conditions like stunned, grappled, and paralyzed are stored as structured state with start-of-turn and end-of-turn hooks.
  • Spells are resolved by deterministic handlers based on spell category, such as attack roll, saving throw, auto-hit, or area effect.
  • Reactions like Shield, Uncanny Dodge, and Opportunity Attacks are surfaced as interactive prompts instead of left to narration.
  • Encounter generation is constrained by SRD-based balance rules so the model has creative freedom without producing absurd difficulty spikes.
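To make the first bullet concrete, engine-owned initiative can be as small as this. The names and shapes are illustrative, not SoloQuest's engine; the die roller is injected so the logic stays deterministic and testable.

```typescript
// Illustrative sketch of engine-controlled initiative and turn order.
type Combatant = { id: string; dexMod: number; initiative: number };

// Roll d20 + DEX modifier for each combatant, then sort descending.
function rollInitiative(
  combatants: { id: string; dexMod: number }[],
  d20: () => number,
): Combatant[] {
  return combatants
    .map((c) => ({ ...c, initiative: d20() + c.dexMod }))
    .sort((a, b) => b.initiative - a.initiative);
}

// Turn order is just an index the engine advances; the model never
// decides whose turn it is.
function nextTurn(order: Combatant[], current: number): number {
  return (current + 1) % order.length;
}
```

Because the engine holds the order, a model response that narrates out of turn simply has no effect on who acts next.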

The key mental model became:

Gemini should generate intent and drama. The engine should own truth.

That idea ended up being useful beyond games. It feels like a general rule for building with LLMs. If the model is responsible for something that must be correct every time, you should seriously consider whether that responsibility belongs in code instead.

The most valuable non-technical skill here was learning how to design the handoff between AI output and deterministic systems. That handoff is easy to underestimate, but in a product like this it defines the user experience.


Google Gemini Feedback

What Worked Really Well

Gemini is genuinely strong at long-form narrative interaction. It does a good job maintaining tone, pacing, and scene continuity across many turns, which matters a lot in a solo RPG where the model is effectively carrying the voice of the world.

It was also strong enough at structured output to support a hybrid architecture. I instructed it to always include a MECHANICS block alongside the narrative, and that gave me a consistent enough interface to build a parser and rules engine around.

A typical response pattern looks something like this:

```
NARRATIVE:
The goblin lunges from behind the crates, dagger flashing in the torchlight.

MECHANICS:
ACTION_TYPE: ATTACK
SOURCE: goblin_1
TARGET: player
ATTACK_TYPE: dagger
```

The important part is that Gemini is not trusted to finalize the outcome. It declares intent. The engine resolves whether the attack hits, how much damage is applied, whether a reaction is available, and what the updated combat state becomes.

I also liked working with the @google/genai SDK. It felt straightforward to integrate, and one design choice that paid off early was versioning my system instruction. I keep a SYSTEM_INSTRUCTION_VERSION constant so I can evolve the DM prompt over time without corrupting older save files or creating mismatches across campaigns.
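The versioning idea is simple enough to show in a few lines. This is a hypothetical sketch of the pattern, not SoloQuest's code: the version constant is stamped onto each save so older campaigns can be detected and migrated instead of silently running against a newer prompt.

```typescript
// Illustrative versioned-prompt pattern: stamp each save with the
// system-instruction version it was created under.
const SYSTEM_INSTRUCTION_VERSION = 3;

type CampaignSave = {
  promptVersion: number;
  // ...rest of the persisted game state...
};

// A save written under an older prompt version needs migration (or at
// least review) before it runs against the current DM instructions.
function needsMigration(save: CampaignSave): boolean {
  return save.promptVersion < SYSTEM_INSTRUCTION_VERSION;
}
```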

Where I Hit Friction

The biggest friction point was long-context consistency.

As campaigns grew, Gemini would sometimes drift from earlier constraints. It might quietly invent a mechanic, forget a restriction, or phrase a response in a way that implied a state change that never actually happened. This forced me to constantly ask whether a bug belonged to prompt design, parser design, or the underlying architecture.

The second major friction point was output normalization. Even when Gemini followed the general structure, the exact formatting could vary. I ended up building a fairly defensive parser that handles misplaced directives, missing separators, whitespace variation, partial formatting failures, and narrative text leaking into the mechanics section.

That parser exists because the model was good enough to be useful, but not rigid enough to be treated like a strict protocol.
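A minimal parser in that defensive spirit might look like this. The field names follow the example response above; everything else is an assumption, and the real parser handles many more failure modes.

```typescript
// Defensive sketch: extract KEY: value directives from a model response,
// tolerating narrative text around the MECHANICS block and stray whitespace.
function parseMechanics(raw: string): Record<string, string> {
  const out: Record<string, string> = {};
  // If a MECHANICS header exists, only read what follows it; otherwise
  // fall back to scanning the whole response.
  const start = raw.indexOf("MECHANICS:");
  const body = start >= 0 ? raw.slice(start + "MECHANICS:".length) : raw;
  for (const line of body.split("\n")) {
    // Accept only lines shaped like an uppercase directive with a value;
    // leaked prose fails the match and is ignored.
    const m = line.trim().match(/^([A-Z_]+):\s*(.+)$/);
    if (m) out[m[1]] = m[2].trim();
  }
  return out;
}
```

Ignoring anything that does not match the directive shape is the core trick: the parser extracts what it can instead of failing on the first formatting wobble.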

I also felt the absence of a more native function-style interface for state mutations in this workflow. I effectively created a domain-specific command layer for the model to speak through:

```
HP_CHANGE: -12
CONDITION_ADD: poisoned
SPELL_SLOT_USED: 2
```

That worked, but it shifted complexity into my application. A more direct way to return strongly structured actions would have simplified a lot of infrastructure.
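One way to picture that command layer: each directive maps to a deterministic handler, and unknown directives are dropped rather than trusted. This is a hypothetical sketch with illustrative names, not the actual SoloQuest handlers.

```typescript
// Hypothetical command layer: directives dispatch to pure handlers that
// return an updated copy of the character state.
type CharState = { hp: number; conditions: string[]; slotsUsed: number[] };

const handlers: Record<string, (s: CharState, v: string) => CharState> = {
  HP_CHANGE: (s, v) => ({ ...s, hp: s.hp + parseInt(v, 10) }),
  CONDITION_ADD: (s, v) =>
    s.conditions.includes(v) ? s : { ...s, conditions: [...s.conditions, v] },
  SPELL_SLOT_USED: (s, v) => {
    const level = parseInt(v, 10);
    const slotsUsed = [...s.slotsUsed];
    slotsUsed[level] = (slotsUsed[level] ?? 0) + 1;
    return { ...s, slotsUsed };
  },
};

function applyCommands(state: CharState, lines: string[]): CharState {
  return lines.reduce((s, line) => {
    const m = line.match(/^([A-Z_]+):\s*(.+)$/);
    // Directives without a registered handler are ignored, not executed.
    return m && handlers[m[1]] ? handlers[m[1]](s, m[2].trim()) : s;
  }, state);
}
```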

The messiest failure mode was hallucinated state. In early versions, Gemini would occasionally introduce enemies I had never spawned, add items that were never earned, or award XP for events that had not actually occurred. To contain that, I added a validation layer that checks model responses before any game state is mutated. If validation fails, the app sends a repair prompt asking Gemini to correct the output.

That helped reliability, but it also introduced latency because a bad response can require an additional model call.
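The validation idea can be reduced to a small gate in front of every state mutation. This sketch only checks one class of hallucination, unknown entity IDs, and all names here are assumptions rather than SoloQuest's real validator.

```typescript
// Illustrative validation gate: reject mechanics that reference entities
// the engine never spawned, before any state is mutated.
type Validation = { ok: true } | { ok: false; reason: string };

function validateMechanics(
  mech: Record<string, string>,
  knownEntities: Set<string>,
): Validation {
  for (const key of ["SOURCE", "TARGET"]) {
    const id = mech[key];
    if (id && id !== "player" && !knownEntities.has(id)) {
      // In the real flow, a failure like this triggers a repair prompt
      // asking the model to correct its output.
      return { ok: false, reason: `unknown entity: ${id}` };
    }
  }
  return { ok: true };
}
```

A real validator would also check items, XP awards, and condition names, but the structure is the same: the engine's registry, not the model's prose, is the source of truth.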

In the end, Gemini got much better as soon as I narrowed its job description. The more precise I was about what belonged to the model and what belonged to the engine, the better the overall system behaved.


Where I'm Headed Next

The engine is now through eight phases of rules implementation. The remaining work is less about core feasibility and more about completeness and production hardening: item economy, attunement, ammunition tracking, encumbrance, error dashboards, combat-friction telemetry, feature flags, and migration tooling for older campaigns.

What is interesting is that as the engine becomes more capable, the prompt becomes simpler.

Rules that Gemini used to carry in prompt instructions are now enforced directly in code. That means I can remove prompt complexity instead of adding to it. The system gets more reliable while the model's role becomes more focused.

That is where I think this architecture is heading:

Gemini handles prose, pacing, roleplay, and choice design. The engine handles rules, state, and truth.

That separation is what made SoloQuest viable, and it is the main thing I would tell anyone building with Gemini from day one:

Use the model for what it is uniquely good at. Protect that space. Move everything deterministic out of it as quickly as you can.


You can try SoloQuest at thesoloquest.com. Your first 50 turns are free. If you are building with Gemini or experimenting with AI-driven game systems, I would love to compare notes in the comments.
