Memory in AI games is a harder problem than it looks.
Most AI RPG tools handle it the same way. They pass the conversation history into the context window and hope the model keeps track. Works fine for one session. By session three, names are wrong, plot threads have vanished, and NPCs are contradicting things they said two weeks ago.
We ran into this building Questsmith. The naive approach failed fast.
What We Tried First
Passing full conversation history was the obvious starting point. It worked until sessions got long. Context windows have limits and when you hit them, the oldest content drops off. The things that happened first — which are often the most important for continuity — are exactly what gets lost.
Summarization was the next attempt. Compress old sessions into a shorter format and prepend it to each new session. Better, but summaries lose specificity. "The player helped an NPC" is not the same as "The player helped a guard named Aldric who owes them a debt and has a sister in the city watch."
Details matter in RPGs. Summaries kill details.
What Actually Worked
We shifted to structured memory extraction instead of summarization.
After each session, a separate process reads the conversation and pulls out specific facts — character names, relationships, decisions, consequences, world-state changes, NPC attitudes. These get stored as discrete memory entries, not as prose.
At the start of each new session, relevant memories get injected back into the context based on what is likely to matter for the current scene. Not everything — just what is relevant. That keeps the context lean while keeping the important details accessible.
Questsmith tracks up to 500 of these memory entries per adventure. In practice, a long campaign builds up 200 to 300 entries over several weeks of play. The system references them without the player needing to re-establish context manually.
What This Feels Like from the Player Side
Week three of a campaign is when players notice it.
An NPC reappears and remembers what the player did for them in week one. A decision that seemed minor in session two comes back as a consequence in session five. A companion's trust meter shifts based on patterns across multiple sessions, not just the last thing the player said.
That kind of continuity is what separates a campaign from a series of disconnected sessions. It is also what makes players come back.
The Tradeoff
Structured extraction is more expensive than summarization. You are running an additional process after every session. The extraction needs to be accurate — a misremembered detail is worse than no detail because it actively contradicts the story.
We spent a significant amount of time on the extraction prompt and validation logic. Getting it right took longer than building the storage layer.
If You Are Building Something Similar
A few things that helped:
- Extract facts as discrete entries, not paragraphs. Structured data is easier to retrieve selectively than prose.
- Tag entries by entity. Memories about a specific NPC should be retrievable by that NPC's name, not just by session number.
- Build a relevance filter. Injecting all 300 memories into every context is wasteful and noisy. Inject what is relevant to the current scene.
- Let players see their memory log. Transparency builds trust and helps players understand why the AI remembers what it does.
The persistent memory problem in AI games is solvable. It just requires treating memory as a data problem, not a prompting problem.
Questsmith is live at thequestsmith.com if you want to see how it plays out in practice. Free tier available, no credit card needed.
Top comments (0)