Artem M

Your LLM Forgets Everything. Give It a Wiki!

Every new chat with your LLM starts the same way. Hi, here's the context. Here's the stack. Here's what we tried last week. Here's the constraint nobody wrote down. By the time the model is caught up, you've burned ten minutes paying for ground you already covered.

Then you close the tab and it forgets all of it.

This is the part of "AI workflows" nobody has really solved. Context windows got bigger. Agent frameworks got smarter. MCP servers sprouted everywhere. The model still wakes up amnesiac every morning.

Bigger context windows are not memory

People keep treating context length as a memory solution. It isn't. A 1M-token window means you can paste more into one conversation — not that anything carries over to the next one. The moment the chat ends, or compaction kicks in, you're back at zero.

The standard fix is retrieval-augmented generation (RAG): dump your docs into a vector store, retrieve chunks at query time. Better than nothing, but the model is rediscovering knowledge from scratch on every question. Nothing accumulates. Two weeks in, your agent still doesn't know that the "billing" service is actually called payments-v2 internally, even though you've told it five times.

Karpathy's pitch: compile, don't retrieve

Andrej Karpathy published a short gist about a different pattern. The pitch:

Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources.

Three layers:

  • Raw sources — articles, papers, notes, transcripts. Immutable. The LLM reads them but never edits them.
  • The wiki — LLM-maintained markdown files. Entity pages, concept pages, summaries, an index. The LLM owns this layer entirely.
  • The schema — one doc (CLAUDE.md, AGENTS.md, whatever your agent reads) that tells the LLM the conventions for ingesting, querying, and maintaining the wiki.

When a new source comes in, the LLM doesn't just chunk and embed it. It reads it, integrates the relevant facts into existing pages, flags contradictions with what's already there, updates cross-references, and appends to the log. One source might touch ten or fifteen wiki pages in one pass. By the time you ask a question, the synthesis already reflects everything you've read.
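
To make the contrast with chunk-and-embed concrete, here is a toy sketch of one ingest pass. It is not Karpathy's code and not how kb-wiki works internally; in the real pattern the LLM does the reading and the judgment. The `Wiki` class, the `(page, key, value)` fact triples, the sample facts, and the log format are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Wiki:
    """Toy in-memory wiki: page name -> {fact key: (value, source id)}."""
    pages: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def ingest(self, source_id, facts):
        """Merge (page, key, value) facts from one source; return contradictions."""
        contradictions = []
        touched = set()
        for page, key, value in facts:
            entries = self.pages.setdefault(page, {})
            if key in entries and entries[key][0] != value:
                # Keep the old value and flag the clash instead of overwriting it.
                contradictions.append((page, key, entries[key][0], value))
                continue
            entries[key] = (value, source_id)
            touched.add(page)
        self.log.append(
            f"ingested {source_id}: touched {sorted(touched)}, "
            f"{len(contradictions)} contradiction(s)"
        )
        return contradictions


wiki = Wiki()
wiki.ingest("notes-week-1", [("billing", "internal_name", "payments-v2")])
clashes = wiki.ingest(
    "old-readme",
    [("billing", "internal_name", "billing-svc"), ("auth", "flow", "device code")],
)
print(clashes)   # the naming clash is surfaced rather than silently overwritten
print(wiki.log)
```

The point of the sketch is the shape of the work: one source updates several pages, conflicts are reported, and the log records what happened, so later questions read the merged result instead of the raw source.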

It's not retrieval. It's compilation.

Karpathy reaches back to Vannevar Bush's 1945 Memex for the lineage — a personal, associative knowledge store with curated trails between documents. Bush couldn't solve the maintenance problem. LLMs can.

What kb-wiki actually is

I built kb-wiki as the practical version of this pattern. A local-first CLI that gives your agent a wiki it can actually read, search, and edit.

The stack:

  • SQLite for metadata + FTS5 for keyword search
  • sqlite-vec for vector search
  • bge-base-en-v1.5 for local embeddings (~200 MB, runs on Apple Silicon, no API keys)
  • Hybrid search combining BM25 + cosine similarity via Reciprocal Rank Fusion
  • Markdown files as the source of truth — git-friendly, Obsidian-compatible, the SQLite index is fully rebuildable from disk
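
For readers who haven't met Reciprocal Rank Fusion, here is a minimal sketch of the idea in Python. It is not kb-wiki's implementation. The function name, the sample page ids, and the constant `k=60` (the value commonly used in the RRF literature) are assumptions; kb-wiki may weight or tune things differently.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids; each list is ordered best-first.

    score(d) = sum over lists of 1 / (k + rank of d in that list), rank from 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["payments-v2", "billing-faq", "auth-flow"]        # keyword ranking
vector_hits = ["caching-notes", "payments-v2", "billing-faq"]  # embedding ranking
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

RRF only looks at rank positions, so BM25 scores and cosine similarities never need to be put on a common scale. A page that ranks well in both lists beats a page that tops only one.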

Everything runs on your machine. No cloud, no rate limits, no leaking your notes to a third party. The embedding model downloads once on first run and caches in ~/.kb/.models/. After that, you're offline.

The part that makes it click is kb setup. It writes a short memo into your agent's instruction file — CLAUDE.md for Claude Code, AGENTS.md for Codex / Cursor / Cline / Windsurf — telling the agent: "you have a wiki, here's how to use it." From that point on, the agent reaches for kb search, kb add, kb update on its own when you say things like "remember this" or "what did we decide about X."
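
One way a setup step like this could write its memo is as a marked block that gets replaced on re-run instead of appended twice. The sketch below is a guess at that mechanism, not kb-wiki's actual code: the marker comments, the memo wording, and the `install_memo` function are hypothetical.

```python
from pathlib import Path
import tempfile

# Hypothetical markers and memo text; kb-wiki's real ones may differ.
BEGIN, END = "<!-- kb-wiki:begin -->", "<!-- kb-wiki:end -->"
MEMO = f"""{BEGIN}
You have a wiki. Use `kb search` before answering questions about past
decisions, and `kb add` / `kb update` when asked to remember something.
{END}"""


def install_memo(instruction_file: Path) -> None:
    """Insert or refresh the memo block without touching the rest of the file."""
    text = instruction_file.read_text() if instruction_file.exists() else ""
    if BEGIN in text and END in text:
        # Replace the existing block so repeated runs do not stack duplicates.
        head, rest = text.split(BEGIN, 1)
        tail = rest.split(END, 1)[1]
        text = head + MEMO + tail
    else:
        prefix = text.rstrip("\n") + "\n\n" if text.strip() else ""
        text = prefix + MEMO + "\n"
    instruction_file.write_text(text)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "CLAUDE.md"
    target.write_text("# Project notes\n")
    install_memo(target)
    install_memo(target)  # second run leaves exactly one memo block
    result = target.read_text()
    print(result)
```

Keeping the block delimited means your own instructions in the same file survive, and the memo can be updated or removed later without hand-editing.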

You stop being the bridge between your agent and its memory. The agent walks across on its own.

Three commands to give your agent memory

Install globally:

npm install -g kb-wiki

Create a wiki and set it as default:

kb wiki create my-notes
kb wiki use my-notes

Wire it into your agent (pick the one you use):

kb setup --agents claude --global    # Claude Code, user-scope
kb setup --agents cursor             # Cursor, project-scope
kb setup --agents codex  --global    # Codex CLI, user-scope

That's it. Open your agent and try:

"Remember that the billing service is actually called payments-v2 internally."

"Ingest this article and link it to our notes on caching."

"What did we decide about the auth flow last week?"

The agent will reach for kb on its own.

One performance note: the embedding model is ~200 MB and takes 2-3 seconds to load on every cold call. If you're going to use it a lot in a session, start the local server once:

kb serve --detached

All subsequent kb calls auto-route through the warm server — no flags needed on the callers. Search drops from ~2-3 s to ~50-150 ms.
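
A quick back-of-the-envelope check shows why the warm server matters over a session. The numbers are midpoints of the ranges quoted above (2.5 s cold, 0.1 s warm), and the sketch assumes the server pays the model load once at startup; `total_seconds` is just an illustrative helper.

```python
def total_seconds(n_calls, per_call, one_time=0.0):
    """Total wall time for n_calls at a fixed per-call cost plus a one-time cost."""
    return one_time + n_calls * per_call


n = 20
cold = total_seconds(n, per_call=2.5)                # model reloads on every call
warm = total_seconds(n, per_call=0.1, one_time=2.5)  # load once, then ~100 ms each
print(f"{n} searches: cold ~ {cold:.1f} s, warm ~ {warm:.1f} s")
```

Under those assumptions, twenty searches cost roughly 50 s cold versus about 4.5 s warm, and the gap widens with every additional call.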

Why this actually sticks

The annoying part of any knowledge base is not the reading. It's the bookkeeping. Updating cross-references when something gets renamed. Keeping summaries current. Noticing that a new note contradicts an old one. Humans abandon wikis because that maintenance cost grows faster than the value.

LLMs don't get bored. They don't forget to update a cross-reference. They can touch fifteen files in one pass without complaining. The wiki stays alive because the cost of keeping it alive is near zero.
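
The rename case gives a feel for the bookkeeping involved. Below is a small sketch that renames a page and rewrites every link to it, assuming Obsidian-style `[[page]]` and `[[page|alias]]` wikilinks. The `rename_links` function and the sample pages are hypothetical; in practice the agent would make the equivalent edits to the markdown files.

```python
import re


def rename_links(pages, old, new):
    """Rename a page and rewrite [[old]] / [[old|alias]] links in every page.

    pages maps page name -> markdown text. Returns (new pages dict, pages changed).
    """
    pattern = re.compile(r"\[\[" + re.escape(old) + r"(\|[^\]]*)?\]\]")
    updated, changed = {}, 0
    for name, text in pages.items():
        # Keep any |alias part so the visible link text does not change.
        new_text = pattern.sub(lambda m: f"[[{new}{m.group(1) or ''}]]", text)
        if new_text != text:
            changed += 1
        updated[new if name == old else name] = new_text
    return updated, changed


pages = {
    "billing": "Handles invoices. See [[auth-flow]].",
    "auth-flow": "Tokens are checked by [[billing]] and [[billing|the billing service]].",
    "caching": "No links to the renamed page here.",
}
pages, changed = rename_links(pages, "billing", "payments-v2")
print(changed, pages["auth-flow"])
```

Nothing here is hard. It is just tedious to do by hand across dozens of files, which is exactly the kind of work that gets skipped.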

The job split is clean: you curate sources and ask good questions. The model does the filing.

If you've explained the same project context to your agent five times this month, this is the thing that fixes it.

npm install -g kb-wiki
kb wiki create my-notes
kb setup --agents claude --global

Three commands. Your agent now remembers.
