Most AI demos look impressive when 5 people are using them.
Things get very different when hundreds of people are simultaneously asking your system to:
- maintain long-running conversational memory
- coordinate shared game state across multiple players
- handle real-time Discord interactions
- retrieve contextual campaign knowledge
- stream AI responses quickly enough to feel conversational
- and do all of it without your API bill exploding
Over the last year, I built Scrollbook, an AI-powered tabletop RPG platform designed to act as a persistent AI Game Master inside Discord.
At its peak, Scrollbook had organically grown to 867 Discord servers and more than 1,000 active users.
I built the entire system solo.
And eventually, I had to shut it down.
Not because the product failed.
Because the infrastructure and model costs became unsustainable for me to continue operating alone.
That experience completely changed how I think about AI systems engineering.
The hardest part was never prompting Claude.
The hardest part was orchestration.
⸻
The Architecture
At a high level, Scrollbook combined:
- a Discord bot for real-time gameplay
- a FastAPI backend
- shared business logic used by both the API and bot
- PostgreSQL + pgvector for persistent state and semantic retrieval
- Redis for streaming and event coordination
- multiple Next.js frontends
- AWS infrastructure running on ECS Fargate
The bot, API, and frontends all shared a common service layer and repository architecture.
I also recorded a deeper code and architecture walkthrough for anyone interested in the implementation details.
One thing I learned very quickly:
Naive AI architectures collapse under state complexity.
A normal SaaS app can usually treat requests as isolated events.
AI-driven multiplayer systems cannot.
⸻
The Real Problem: Stateful AI
Players do not interact with tabletop games like normal chatbot users.
A single campaign can contain:
- recurring NPCs
- evolving world state
- session history
- player-specific memory
- custom rules
- hidden DM information
- faction relationships
- long-running narrative arcs
The AI needs enough context to feel intelligent, but not so much context that:
- latency becomes unusable
- token costs become catastrophic
- responses drift or hallucinate
Very quickly, I stopped thinking like “an LLM app developer” and started thinking more like a distributed systems engineer.
One important architectural decision was that every Discord channel maintained its own shared conversation session.
All players inside that channel contributed to the same narrative thread and campaign memory. Sessions automatically expired after a configurable amount of time and rotated into fresh sessions to prevent runaway context growth.
Conceptually, that sounds simple.
Operationally, it changed everything.
⸻
Multiplayer AI Systems Need Real Isolation Boundaries
One subtle but important problem was multiplayer privacy and campaign isolation.
Players could privately DM Cypher, the AI Game Master persona, outside the public campaign channel.
That immediately created a security and orchestration problem. The AI should not accidentally:
- expose hidden campaign information
- mutate another player’s character
- leak private narrative details
- access tools that should only exist in shared sessions
So private DM sessions operated under restricted tool availability.
The API boundary itself enforced:
- campaign isolation
- character ownership
- session scoping
- allowed mutation operations
Every Discord server effectively acted as an isolated tenant with its own:
- conversation state
- campaign memory
- active narrative context
- AI session lifecycle
Without those boundaries, the system quickly stopped feeling like a multiplayer tabletop experience and started feeling like a chaotic shared chatbot.
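The checks themselves can be boiled down to a small guard function. This is a hedged sketch with an invented data model, not the real API code; in production these checks sat behind the FastAPI boundary rather than in a module-level dict.

```python
class IsolationError(Exception):
    """Raised when a request crosses a campaign or ownership boundary."""

# Stand-in for the real character store (PostgreSQL in production).
CHARACTERS = {
    "char-1": {"owner_id": "player-a", "campaign_id": "camp-1"},
    "char-2": {"owner_id": "player-b", "campaign_id": "camp-2"},
}

def authorize_mutation(player_id: str, campaign_id: str, character_id: str) -> dict:
    """Reject any mutation that crosses campaign isolation or character ownership."""
    character = CHARACTERS.get(character_id)
    if character is None:
        raise IsolationError("unknown character")
    if character["campaign_id"] != campaign_id:
        raise IsolationError("character belongs to a different campaign")
    if character["owner_id"] != player_id:
        raise IsolationError("player does not own this character")
    return character

owner_check = authorize_mutation("player-a", "camp-1", "char-1")
```

Because the check runs at the API boundary rather than in the prompt, a confused or manipulated model cannot mutate state it was never authorized to touch.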
⸻
Why the Shared Service Layer Became Critical
One of the best decisions I made early was centralizing business logic into a shared Python library used by both the Discord bot and REST API.
The architecture looked roughly like this:
- Discord bot receives slash commands
- API serves frontend requests
- both call shared services/repositories
- repositories manage PostgreSQL access
- services orchestrate AI context building
This let me keep:
- campaign state consistent
- rules engines centralized
- AI orchestration reusable
- validation logic unified
Without that shared layer, I would have ended up maintaining two separate versions of the product logic.
⸻
The AI Only Knows the World Model You Give It
Before every AI interaction, Scrollbook built a structured context object containing:
- campaign setting and tone
- current quests
- active NPCs
- encounter state
- party information
- previously discovered locations
- the player’s full character sheet
- inventory, spells, equipment, and HP
All of that was assembled into a typed Pydantic model called AIContext.
That model effectively became the AI’s world model.
One lesson I learned quickly:
LLMs behave much more predictably when they operate against strongly structured context boundaries instead of giant unstructured prompts.
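The production model was a typed Pydantic model; the dependency-free dataclass sketch below shows the same idea with invented field names, serializing the context into one stable, bounded block instead of an ad-hoc prompt string.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class CharacterSheet:
    name: str
    hp: int
    inventory: list[str] = field(default_factory=list)

@dataclass
class AIContext:
    """Structured world model handed to the LLM before each interaction."""
    campaign_setting: str
    tone: str
    active_quests: list[str]
    active_npcs: list[str]
    character: CharacterSheet

    def to_prompt_block(self) -> str:
        # Deterministic serialization: stable key order keeps the prompt
        # cache-friendly and makes drift easy to diff.
        return json.dumps(asdict(self), sort_keys=True)

ctx = AIContext(
    campaign_setting="haunted coastal barony",
    tone="grim",
    active_quests=["Find the lighthouse keeper"],
    active_npcs=["Maren the smuggler"],
    character=CharacterSheet(name="Ilya", hp=17, inventory=["rope", "lantern"]),
)
```

Typing the context means malformed state fails loudly at construction time, long before it reaches the model.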
⸻
AI Reliability Required Explicit Behavioral Protocols
One thing I underestimated early was how unreliable LLMs become when they are allowed to improvise state mutations.
For example:
When a player drinks a healing potion, the AI cannot simply narrate:
“You regain 10 HP.”
It must:
- remove the potion from inventory
- wait for database confirmation
- update character HP
- confirm the mutation succeeded
- only then narrate the result
I eventually encoded explicit multi-step behavioral protocols directly into the system prompt.
The prompt literally instructed Claude to:
- never narrate mutations before tool confirmation
- wait for successful tool execution
- sequence actions deterministically
- avoid leaking private player information
- respect campaign boundaries
This dramatically improved reliability.
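The enforcement side of the protocol can be sketched as code. A hard-coded in-memory store stands in for the real tool layer here, and the function names are invented; the point is the ordering: mutate, confirm, and only then narrate.

```python
class MutationFailed(Exception):
    """Raised when a state change could not be confirmed."""

# Stand-ins for database-backed state.
INVENTORY = {"player-a": ["healing potion", "dagger"]}
HP = {"player-a": 12}

def use_healing_potion(player_id: str, heal: int = 10) -> dict:
    """Tool call: perform both mutations, then return a confirmed result."""
    items = INVENTORY[player_id]
    if "healing potion" not in items:
        raise MutationFailed("no potion in inventory")
    items.remove("healing potion")            # 1. remove the item
    HP[player_id] += heal                      # 2. update HP
    return {"ok": True, "hp": HP[player_id]}   # 3. confirmed mutation result

def narrate_potion(player_id: str) -> str:
    try:
        result = use_healing_potion(player_id)
    except MutationFailed as exc:
        # Never narrate a mutation that did not actually happen.
        return f"Nothing happens: {exc}"
    return f"You drink the potion and regain 10 HP (now {result['hp']})."
```

The narration function is deliberately downstream of the tool result, so the model's prose can never get ahead of the database.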
One of the more surprising lessons was that LLMs become far more dependable when treated less like chatbots and more like orchestration engines operating against constrained system contracts.
⸻
The “AI Features” Were Actually Infrastructure Problems
The interesting engineering problems were never:
“How do I call Claude?”
They were:
- how do I avoid context duplication?
- how do I partition campaign memory?
- how do I maintain low latency during concurrent interactions?
- how do I isolate Discord guild data safely?
- how do I prevent race conditions during combat/session updates?
- how do I make async orchestration debuggable?
The platform became heavily async very quickly.
Both the Discord bot and API were built around async execution because synchronous request handling would bottleneck almost immediately under concurrent usage.
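One concrete pattern for the race-condition problem: serialize state updates with a per-channel asyncio lock so two players acting at once cannot interleave a read-modify-write. A sketch under invented names (the real updates hit PostgreSQL, not a dict):

```python
import asyncio
from collections import defaultdict

# One lock per channel; unrelated channels never block each other.
_channel_locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
COMBAT_ROUND = {"channel-1": 0}

async def advance_round(channel_id: str) -> int:
    async with _channel_locks[channel_id]:
        # The read-modify-write below is now atomic per channel.
        current = COMBAT_ROUND[channel_id]
        await asyncio.sleep(0)  # yield point that would otherwise race
        COMBAT_ROUND[channel_id] = current + 1
        return COMBAT_ROUND[channel_id]

async def main() -> int:
    # Ten "players" acting simultaneously in the same channel.
    await asyncio.gather(*(advance_round("channel-1") for _ in range(10)))
    return COMBAT_ROUND["channel-1"]

final_round = asyncio.run(main())
```

Without the lock, the deliberate yield point lets concurrent tasks read the same stale value and lose increments; with it, ten concurrent actions produce exactly ten round advances.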
⸻
The AI Was Writing Its Own Knowledge Graph
One of the most interesting systems in Scrollbook ran after the AI responded.
After every generated narrative response, I ran a second AI pass that extracted structured game events from the text:
- new NPCs introduced
- quests formed
- locations discovered
- faction relationships
- important campaign events
Only events above a confidence threshold were persisted.
The result was that campaigns gradually built their own structured knowledge graph automatically while players interacted with the world.
Players never needed to manually maintain campaign state, update quest logs, or organize notes.
The AI handled the operational bookkeeping behind the scenes.
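The persistence gate is the simple but load-bearing part. In this sketch the extraction result is hard-coded in place of the real second model pass, and the event schema is invented:

```python
import json

CONFIDENCE_THRESHOLD = 0.8

def persist_extracted_events(raw_extraction: str,
                             threshold: float = CONFIDENCE_THRESHOLD) -> list[dict]:
    """Keep only structured events the extraction pass is confident about."""
    events = json.loads(raw_extraction)
    return [e for e in events if e.get("confidence", 0.0) >= threshold]

# Stand-in for the second AI pass over a narrative response.
extraction = json.dumps([
    {"type": "npc_introduced", "name": "Maren", "confidence": 0.93},
    {"type": "quest_formed", "name": "Lighthouse", "confidence": 0.55},
])
kept = persist_extracted_events(extraction)
```

Low-confidence events are dropped rather than persisted, which trades a little recall for a knowledge graph that stays clean enough to feed back into future context.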
⸻
Cost Optimization Became Existential
This was ultimately the part that forced me to sunset the platform.
Long-running multiplayer AI systems generate enormous amounts of context.
And AI costs compound frighteningly fast when users expect conversational responsiveness.
At one point I reduced Claude API costs by roughly 90% through aggressive prompt caching and context deduplication strategies.
The biggest gains came from:
- caching stable system prompts
- partitioning static vs dynamic context
- reducing repeated narrative payloads
- loading only relevant campaign state
- aggressively trimming conversational history
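The static/dynamic partition looks roughly like this when shaped for the Anthropic Messages API's cache_control blocks. This builds the request payload only (no network call), the prompt text and model string are placeholders, and you should verify field names against the current API docs before relying on them:

```python
STATIC_SYSTEM_PROMPT = "You are Cypher, the AI Game Master. <rules and protocols here>"

def build_request(dynamic_context: str, user_message: str) -> dict:
    """Stable content goes in a cacheable block; per-turn state stays out of it."""
    return {
        "model": "claude-3-5-sonnet-20241022",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STATIC_SYSTEM_PROMPT,
                # Identical across turns: eligible for provider-side caching.
                "cache_control": {"type": "ephemeral"},
            },
            # Dynamic campaign state changes every turn, so it is kept
            # outside the cached block.
            {"type": "text", "text": dynamic_context},
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("Party HP: 17/21. Location: lighthouse.", "I search the room.")
```

The discipline that makes this work is upstream of the API call: anything that changes per turn must never be interleaved into the stable prefix, or the cache never hits.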
Even with those optimizations, the economics were still difficult as a solo operator.
That experience forced me to think much more seriously about sustainable AI infrastructure design.
⸻
The Production Reality
The production issues ended up being far more educational than the initial implementation.
A few examples:
- malformed AI responses breaking downstream parsing
- Discord interaction timing limits
- async race conditions during simultaneous campaign updates
- runaway token growth from recursive context loading
- queue spikes from high-traffic servers
- retry storms during partial provider failures
- embedding retrieval returning semantically correct but narratively terrible results
One thing I underestimated was how much operational maturity AI products require once real users show up.
The “prompt engineering” phase ends very quickly.
After that, it becomes:
- observability
- retries
- orchestration
- failure isolation
- caching
- cost control
- schema discipline
- defensive engineering
- and building systems that fail gracefully
⸻
What I’d Do Differently
I’m currently rebuilding Scrollbook with many of these lessons in mind.
The platform is now slowly coming back online through a gated public waitlist while I continue refining the architecture and cost model.
If I rebuilt the original system from scratch today, I would:
- introduce stricter event-driven boundaries earlier
- formalize observability sooner
- reduce coupling between orchestration and retrieval layers
- implement structured AI output contracts from day one
- invest earlier in replay/debug tooling for AI workflows
I would also spend less time chasing “perfect AI behavior” and more time designing systems that fail gracefully.
Because production AI systems fail constantly.
The goal is not perfection.
The goal is controlled failure.
⸻
Final Thoughts
Building Scrollbook fundamentally changed the way I think about AI systems engineering.
The LLM is rarely the hard part.
The hard part is everything around it:
- state management
- orchestration
- memory systems
- concurrency
- operational reliability
- retrieval quality
- multiplayer isolation
- and keeping the whole thing economically viable
That’s where the real engineering starts.
⸻
Tech Stack
Python, FastAPI, PostgreSQL, pgvector, Redis, Next.js, TypeScript, AWS ECS Fargate, Bedrock, Anthropic Claude, Docker, SQLAlchemy, Discord.py