Most AI demos look impressive when 5 people are using them.
Things get very different when hundreds of people are simultaneously asking your system to:
- maintain long-running conversational memory
- coordinate shared game state across multiple players
- handle real-time Discord interactions
- retrieve contextual campaign knowledge
- stream AI responses quickly enough to feel conversational
- and do all of it without your API bill exploding
Over the last year, I built Scrollbook, an AI-powered tabletop RPG platform designed to act as a persistent AI Game Master inside Discord.
At its peak, Scrollbook had organically grown to 867 Discord servers and more than 1,000 active users.
I built the entire system solo.
And eventually, I had to shut it down.
Not because the product failed.
Because the infrastructure and model costs became unsustainable for me to continue operating alone.
That experience completely changed how I think about AI systems engineering.
The hardest part was never prompting Claude.
The hardest part was orchestration.
⸻
The Architecture
At a high level, Scrollbook combined:
- a Discord bot for real-time gameplay
- a FastAPI backend
- shared business logic used by both the API and bot
- PostgreSQL + pgvector for persistent state and semantic retrieval
- Redis for streaming and event coordination
- multiple Next.js frontends
- AWS infrastructure running on ECS Fargate
The bot, API, and frontends all shared a common service layer and repository architecture.
I also recorded a deeper code and architecture walkthrough for anyone interested in the implementation details.
One thing I learned very quickly:
Naive AI architectures collapse under state complexity.
A normal SaaS app can usually treat requests as isolated events.
AI-driven multiplayer systems cannot.
⸻
The Real Problem: Stateful AI
Players do not interact with tabletop games like normal chatbot users.
A single campaign can contain:
- recurring NPCs
- evolving world state
- session history
- player-specific memory
- custom rules
- hidden DM information
- faction relationships
- long-running narrative arcs
The AI needs enough context to feel intelligent, but not so much context that:
- latency becomes unusable
- token costs become catastrophic
- responses drift or hallucinate
Very quickly, I stopped thinking like “an LLM app developer” and started thinking more like a distributed systems engineer.
One important architectural decision was that every Discord channel maintained its own shared conversation session.
All players inside that channel contributed to the same narrative thread and campaign memory. Sessions automatically expired after a configurable amount of time and rotated into fresh sessions to prevent runaway context growth.
Conceptually, that sounds simple.
Operationally, it changed everything.
⸻
Multiplayer AI Systems Need Real Isolation Boundaries
One subtle but important problem was multiplayer privacy and campaign isolation.
Players could privately DM Cypher, the AI Game Master persona, outside the public campaign channel.
That immediately created a security and orchestration problem. The AI should not accidentally:
- expose hidden campaign information
- mutate another player’s character
- leak private narrative details
- access tools that should only exist in shared sessions
So private DM sessions operated under restricted tool availability.
The API boundary itself enforced:
- campaign isolation
- character ownership
- session scoping
- allowed mutation operations
Every Discord server effectively acted as an isolated tenant with its own:
- conversation state
- campaign memory
- active narrative context
- AI session lifecycle
Without those boundaries, the system quickly stopped feeling like a multiplayer tabletop experience and started feeling like a chaotic shared chatbot.
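The checks themselves can be boiled down to a small guard function. This is a hedged sketch with an invented data model, not the real API code; in production these checks sat behind the FastAPI boundary rather than in a module-level dict.

```python
class IsolationError(Exception):
    """Raised when a request crosses a campaign or ownership boundary."""

# Stand-in for the real character store (PostgreSQL in production).
CHARACTERS = {
    "char-1": {"owner_id": "player-a", "campaign_id": "camp-1"},
    "char-2": {"owner_id": "player-b", "campaign_id": "camp-2"},
}

def authorize_mutation(player_id: str, campaign_id: str, character_id: str) -> dict:
    """Reject any mutation that crosses campaign isolation or character ownership."""
    character = CHARACTERS.get(character_id)
    if character is None:
        raise IsolationError("unknown character")
    if character["campaign_id"] != campaign_id:
        raise IsolationError("character belongs to a different campaign")
    if character["owner_id"] != player_id:
        raise IsolationError("player does not own this character")
    return character

owner_check = authorize_mutation("player-a", "camp-1", "char-1")
```

Because the check runs at the API boundary rather than in the prompt, a confused or manipulated model cannot mutate state it was never authorized to touch.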
⸻
Why the Shared Service Layer Became Critical
One of the best decisions I made early was centralizing business logic into a shared Python library used by both the Discord bot and REST API.
The architecture looked roughly like this:
- Discord bot receives slash commands
- API serves frontend requests
- both call shared services/repositories
- repositories manage PostgreSQL access
- services orchestrate AI context building
This let me keep:
- campaign state consistent
- rules engines centralized
- AI orchestration reusable
- validation logic unified
Without that shared layer, I would have ended up maintaining two separate versions of the product logic.
⸻
The AI Only Knows the World Model You Give It
Before every AI interaction, Scrollbook built a structured context object containing:
- campaign setting and tone
- current quests
- active NPCs
- encounter state
- party information
- previously discovered locations
- the player’s full character sheet
- inventory, spells, equipment, and HP
All of that was assembled into a typed Pydantic model called AIContext.
That model effectively became the AI’s world model.
One lesson I learned quickly:
LLMs behave much more predictably when they operate against strongly structured context boundaries instead of giant unstructured prompts.
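The production model was a typed Pydantic model; the dependency-free dataclass sketch below shows the same idea with invented field names, serializing the context into one stable, bounded block instead of an ad-hoc prompt string.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class CharacterSheet:
    name: str
    hp: int
    inventory: list[str] = field(default_factory=list)

@dataclass
class AIContext:
    """Structured world model handed to the LLM before each interaction."""
    campaign_setting: str
    tone: str
    active_quests: list[str]
    active_npcs: list[str]
    character: CharacterSheet

    def to_prompt_block(self) -> str:
        # Deterministic serialization: stable key order keeps the prompt
        # cache-friendly and makes drift easy to diff.
        return json.dumps(asdict(self), sort_keys=True)

ctx = AIContext(
    campaign_setting="haunted coastal barony",
    tone="grim",
    active_quests=["Find the lighthouse keeper"],
    active_npcs=["Maren the smuggler"],
    character=CharacterSheet(name="Ilya", hp=17, inventory=["rope", "lantern"]),
)
```

Typing the context means malformed state fails loudly at construction time, long before it reaches the model.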
⸻
AI Reliability Required Explicit Behavioral Protocols
One thing I underestimated early was how unreliable LLMs become when they are allowed to improvise state mutations.
For example:
When a player drinks a healing potion, the AI cannot simply narrate:
“You regain 10 HP.”
It must:
- remove the potion from inventory
- wait for database confirmation
- update character HP
- confirm the mutation succeeded
- only then narrate the result
I eventually encoded explicit multi-step behavioral protocols directly into the system prompt.
The prompt literally instructed Claude to:
- never narrate mutations before tool confirmation
- wait for successful tool execution
- sequence actions deterministically
- avoid leaking private player information
- respect campaign boundaries
This dramatically improved reliability.
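The enforcement side of the protocol can be sketched as code. A hard-coded in-memory store stands in for the real tool layer here, and the function names are invented; the point is the ordering: mutate, confirm, and only then narrate.

```python
class MutationFailed(Exception):
    """Raised when a state change could not be confirmed."""

# Stand-ins for database-backed state.
INVENTORY = {"player-a": ["healing potion", "dagger"]}
HP = {"player-a": 12}

def use_healing_potion(player_id: str, heal: int = 10) -> dict:
    """Tool call: perform both mutations, then return a confirmed result."""
    items = INVENTORY[player_id]
    if "healing potion" not in items:
        raise MutationFailed("no potion in inventory")
    items.remove("healing potion")            # 1. remove the item
    HP[player_id] += heal                      # 2. update HP
    return {"ok": True, "hp": HP[player_id]}   # 3. confirmed mutation result

def narrate_potion(player_id: str) -> str:
    try:
        result = use_healing_potion(player_id)
    except MutationFailed as exc:
        # Never narrate a mutation that did not actually happen.
        return f"Nothing happens: {exc}"
    return f"You drink the potion and regain 10 HP (now {result['hp']})."
```

The narration function is deliberately downstream of the tool result, so the model's prose can never get ahead of the database.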
One of the more surprising lessons was that LLMs become far more dependable when treated less like chatbots and more like orchestration engines operating against constrained system contracts.
⸻
The “AI Features” Were Actually Infrastructure Problems
The interesting engineering problems were never:
“How do I call Claude?”
They were:
- how do I avoid context duplication?
- how do I partition campaign memory?
- how do I maintain low latency during concurrent interactions?
- how do I isolate Discord guild data safely?
- how do I prevent race conditions during combat/session updates?
- how do I make async orchestration debuggable?
The platform became heavily async very quickly.
Both the Discord bot and API were built around async execution because synchronous request handling would bottleneck almost immediately under concurrent usage.
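One concrete pattern for the race-condition problem: serialize state updates with a per-channel asyncio lock so two players acting at once cannot interleave a read-modify-write. A sketch under invented names (the real updates hit PostgreSQL, not a dict):

```python
import asyncio
from collections import defaultdict

# One lock per channel; unrelated channels never block each other.
_channel_locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
COMBAT_ROUND = {"channel-1": 0}

async def advance_round(channel_id: str) -> int:
    async with _channel_locks[channel_id]:
        # The read-modify-write below is now atomic per channel.
        current = COMBAT_ROUND[channel_id]
        await asyncio.sleep(0)  # yield point that would otherwise race
        COMBAT_ROUND[channel_id] = current + 1
        return COMBAT_ROUND[channel_id]

async def main() -> int:
    # Ten "players" acting simultaneously in the same channel.
    await asyncio.gather(*(advance_round("channel-1") for _ in range(10)))
    return COMBAT_ROUND["channel-1"]

final_round = asyncio.run(main())
```

Without the lock, the deliberate yield point lets concurrent tasks read the same stale value and lose increments; with it, ten concurrent actions produce exactly ten round advances.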
⸻
The AI Was Writing Its Own Knowledge Graph
One of the most interesting systems in Scrollbook ran after the AI responded.
After every generated narrative response, I ran a second AI pass that extracted structured game events from the text:
- new NPCs introduced
- quests formed
- locations discovered
- faction relationships
- important campaign events
Only events above a confidence threshold were persisted.
The result was that campaigns gradually built their own structured knowledge graph automatically while players interacted with the world.
Players never needed to manually maintain campaign state, update quest logs, or organize notes.
The AI handled the operational bookkeeping behind the scenes.
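The persistence gate is the simple but load-bearing part. In this sketch the extraction result is hard-coded in place of the real second model pass, and the event schema is invented:

```python
import json

CONFIDENCE_THRESHOLD = 0.8

def persist_extracted_events(raw_extraction: str,
                             threshold: float = CONFIDENCE_THRESHOLD) -> list[dict]:
    """Keep only structured events the extraction pass is confident about."""
    events = json.loads(raw_extraction)
    return [e for e in events if e.get("confidence", 0.0) >= threshold]

# Stand-in for the second AI pass over a narrative response.
extraction = json.dumps([
    {"type": "npc_introduced", "name": "Maren", "confidence": 0.93},
    {"type": "quest_formed", "name": "Lighthouse", "confidence": 0.55},
])
kept = persist_extracted_events(extraction)
```

Low-confidence events are dropped rather than persisted, which trades a little recall for a knowledge graph that stays clean enough to feed back into future context.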
⸻
Cost Optimization Became Existential
This was ultimately the part that forced me to sunset the platform.
Long-running multiplayer AI systems generate enormous amounts of context.
And AI costs compound frighteningly fast when users expect conversational responsiveness.
At one point I reduced Claude API costs by roughly 90% through aggressive prompt caching and context deduplication strategies.
The biggest gains came from:
- caching stable system prompts
- partitioning static vs dynamic context
- reducing repeated narrative payloads
- loading only relevant campaign state
- aggressively trimming conversational history
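The static/dynamic partition looks roughly like this when shaped for the Anthropic Messages API's cache_control blocks. This builds the request payload only (no network call), the prompt text and model string are placeholders, and you should verify field names against the current API docs before relying on them:

```python
STATIC_SYSTEM_PROMPT = "You are Cypher, the AI Game Master. <rules and protocols here>"

def build_request(dynamic_context: str, user_message: str) -> dict:
    """Stable content goes in a cacheable block; per-turn state stays out of it."""
    return {
        "model": "claude-3-5-sonnet-20241022",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STATIC_SYSTEM_PROMPT,
                # Identical across turns: eligible for provider-side caching.
                "cache_control": {"type": "ephemeral"},
            },
            # Dynamic campaign state changes every turn, so it is kept
            # outside the cached block.
            {"type": "text", "text": dynamic_context},
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("Party HP: 17/21. Location: lighthouse.", "I search the room.")
```

The discipline that makes this work is upstream of the API call: anything that changes per turn must never be interleaved into the stable prefix, or the cache never hits.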
Even with those optimizations, the economics were still difficult as a solo operator.
That experience forced me to think much more seriously about sustainable AI infrastructure design.
⸻
The Production Reality
The production issues ended up being far more educational than the initial implementation.
A few examples:
- malformed AI responses breaking downstream parsing
- Discord interaction timing limits
- async race conditions during simultaneous campaign updates
- runaway token growth from recursive context loading
- queue spikes from high-traffic servers
- retry storms during partial provider failures
- embedding retrieval returning semantically correct but narratively terrible results
One thing I underestimated was how much operational maturity AI products require once real users show up.
The “prompt engineering” phase ends very quickly.
After that, it becomes:
- observability
- retries
- orchestration
- failure isolation
- caching
- cost control
- schema discipline
- defensive engineering
- and building systems that fail gracefully
⸻
What I’d Do Differently
I’m currently rebuilding Scrollbook with many of these lessons in mind.
The platform is now slowly coming back online through a gated public waitlist while I continue refining the architecture and cost model.
If I rebuilt the original system from scratch today, I would:
- introduce stricter event-driven boundaries earlier
- formalize observability sooner
- reduce coupling between orchestration and retrieval layers
- implement structured AI output contracts from day one
- invest earlier in replay/debug tooling for AI workflows
I would also spend less time chasing “perfect AI behavior” and more time designing systems that fail gracefully.
Because production AI systems fail constantly.
The goal is not perfection.
The goal is controlled failure.
⸻
Final Thoughts
Building Scrollbook fundamentally changed the way I think about AI systems engineering.
The LLM is rarely the hard part.
The hard part is everything around it:
- state management
- orchestration
- memory systems
- concurrency
- operational reliability
- retrieval quality
- multiplayer isolation
- and keeping the whole thing economically viable
That’s where the real engineering starts.
⸻
Tech Stack
Python, FastAPI, PostgreSQL, pgvector, Redis, Next.js, TypeScript, AWS ECS Fargate, Bedrock, Anthropic Claude, Docker, SQLAlchemy, Discord.py