How I built an AI Dungeon Master that actually remembers what you did three sessions ago

#cognee #ai #python #hackathon

% embed https://youtu.be/emnUvx4V8as %
## The demo moment
Player types: "I push open the door of The Salt Lantern and step inside, shaking the rain off my cloak. I scan the room for Bren."
The AI Dungeon Master narrates back a paragraph in which the barkeep Bren shoots them a wary look — because in Session 3, the party sold his brother Aldric out to Captain Vell. In the corner, Mirek the Hawk silently watches — because in Session 2, the party quietly paid Joren's 80-gold gambling debt to him. On the wall, a wanted poster shows a scar shaped like the one on Kara's cheek — the rogue who stole the Amulet of Vohr in Session 1.
Three canon threads. One player action. Zero canon written into the narration prompt.
The tagline the whole build ships behind: The world remembers.

## What I built
An AI Dungeon Master with persistent world memory, powered by Cognee. Every player action gets written to a Cognee graph via remember(). Every subsequent turn queries that graph via recall() before narrating. The graph accumulates canon organically as the campaign runs — NPCs, factions, debts, betrayals, stolen amulets — and future narration pulls whatever's contextually relevant.
Solo build. Seven-day sprint. Public repo: captainebru84-sudo/cognee-ai-dm (MIT).

## The one problem I did not see coming
Day 1 smoke test. Chat loop wired end-to-end. Player pushes open the tavern door. Cognee's recall() returns:
text: 'Got it.'
Not the canon. Not the barkeep's name. Just: "Got it."
Turns out Cognee's GRAPH_COMPLETION search type reads first-person player declarations as instructions, not queries. When you feed it "I push open the door" it dutifully replies as an assistant would to an instruction. Which is: nothing.
The fix that took Day 2:


python
WORLD_CAST_FOR_QUERY = "Salt Lantern, Bren, Mirek, Joren, Aldric, Vell, Vohr, Kara..."
def _rewrite_for_recall(player_action: str) -> str:
      return f"What canon exists about: {WORLD_CAST_FOR_QUERY}?"
A deterministic Python rewriter that turns every player declaration into an explicit canon lookup. Not an LLM rewriter — llama-3.1-8b couldn't hold the whole cast in its context under Groq's TPM ceiling. Just a string template.
Canon-hit rate on a 12-keyword smoke: 1/12 → 12/12 on the same session. Same graph, same LLM, same seeds. Just a better question.
That's the moment the project actually became a thing.
How I used Cognee (the honest read)
I leaned hard on two of the four lifecycle APIs:
- remember() — fires on every player turn (write-back), plus one-shot seeding of Sessions 1-3 from a canon file. The graph grows organically.
- recall() — fires before every narration, via the rewriter above. Local backend hits 12/12 canon on the pinned smoke; Cognee Cloud hits 9/12 (their extraction weights differently, still demo-quality).
I did not wire improve() / memify() or forget(). That was a deliberate scoping call, not an oversight. "The world remembers" is fundamentally an accumulate-and-recall pitch — forget() fights the tagline, and improve() fits a "session recap consolidates canon" beat that belongs in v2. Adding lifecycle calls just to broaden the API surface would have been cargo-culting a rubric.
The Cloud swap that actually worked
Wednesday, I swapped in Cognee Cloud behind an env var:
MEMORY_BACKEND=cloud
CLOUD_DATASET=ravenhollow
dm.py picks up CloudClient.recall() and CloudClient.remember() on the cloud path, routing to the same tenant dataset. Zero code duplication between local and cloud modes.

**Two things worth knowing if you try this:**
1. Always pass dataset_name to remember(). Writing to main_dataset threw a 409 hash-mismatch after multiple prior writes. Routing everything through a named dataset (ravenhollow) fixed it clean.
2. Cognee Cloud content-hash-dedupes writes. Re-running my seed script against the same dataset was a ~1s no-op instead of the ~70s it took the first time. Nice for repeatable demos.
What broke on D6
I burned Groq's daily token budget (100K TPD on llama-3.3-70b-versatile) three times across the shoot. The rolling window means you get capacity back ~24h from peak burn — I did the final take at 3 AM Lagos.
I also probed 23 OpenRouter :free models mid-crisis. Every Venice-backed one was upstream-throttled. Nvidia's Nemotron responded but was reasoning-style and dumped its planning trace instead of narrating.
Groq at 3 AM won.

**Try it**
git clone https://github.com/captainebru84-sudo/cognee-ai-dm
cd cognee-ai-dm
cp .env.example .env  # add your Groq key
./.venv/Scripts/python.exe seed_world.py  # ~30s
./.venv/Scripts/python.exe -m uvicorn api.main:app --port 8000
npm --prefix web run dev  # chat UI on :3000
Push open the door of the Salt Lantern.

**AI disclosure**
Built with Claude Code (Anthropic) as a coding assistant. All design decisions, Cognee integration architecture, the Ravenhollow campaign canon, and the recall rewriter concept are original. No AI-generated PRs were submitted to the Cognee repository.

Built for The Hangover Part AI: Where's My Context?  (https://www.wemakedevs.org/hackathons/cognee) by WeMakeDevs, powered by Cognee (https://cognee.ai). The world remembers.

DEV Community

How I built an AI Dungeon Master that actually remembers what you did three sessions ago

Top comments (0)