Artem M

Posted on May 26

Your LLM Forgets Everything. Give It a Wiki!

#ai #llm #opensource #productivity

Every new chat with your LLM starts the same way. Hi, here's the context. Here's the stack. Here's what we tried last week. Here's the constraint nobody wrote down. By the time the model is caught up, you've burned ten minutes paying for ground you already covered.

Then you close the tab and it forgets all of it.

This is the part of "AI workflows" nobody has really solved. Context windows got bigger. Agent frameworks got smarter. MCP servers sprouted everywhere. The model still wakes up amnesiac every morning.

Bigger context windows are not memory

People keep treating context length as a memory solution. It isn't. A 1M-token window means you can paste more into one conversation — not that anything carries over to the next one. The moment the chat ends, or compaction kicks in, you're back at zero.

The standard fix is RAG: dump your docs into a vector store, retrieve chunks at query time. Better than nothing, but the model is rediscovering knowledge from scratch on every question. Nothing accumulates. Two weeks in, your agent still doesn't know that the "billing" service is actually called payments-v2 internally, even though you've told it five times.

Karpathy's pitch: compile, don't retrieve

Andrej Karpathy published a short gist about a different pattern. The pitch:

Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources.

Three layers:

Raw sources — articles, papers, notes, transcripts. Immutable. The LLM reads them but never edits them.
The wiki — LLM-maintained markdown files. Entity pages, concept pages, summaries, an index. The LLM owns this layer entirely.
The schema — one doc (CLAUDE.md, AGENTS.md, whatever your agent reads) that tells the LLM the conventions for ingesting, querying, and maintaining the wiki.

When a new source comes in, the LLM doesn't just chunk and embed it. It reads it, integrates the relevant facts into existing pages, flags contradictions with what's already there, updates cross-references, appends to the log. One source might touch ten or fifteen wiki pages in one pass. The synthesis already reflects everything you've read.
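To make the bookkeeping in that ingest pass concrete, here is a toy sketch in Python. In the real pattern the LLM does the reading and rewriting; here the facts arrive pre-extracted, and every name (`Wiki`, `ingest`, the page and log layout, the sample facts) is hypothetical, not taken from Karpathy's gist or from kb-wiki.

```python
from dataclasses import dataclass, field


@dataclass
class Wiki:
    # page name -> {fact key: fact value}; stands in for markdown files on disk
    pages: dict[str, dict[str, str]] = field(default_factory=dict)
    # append-only record of what was ingested
    log: list[str] = field(default_factory=list)


def ingest(wiki: Wiki, source: str, facts: list[tuple[str, str, str]]) -> list[str]:
    """Merge (page, key, value) facts from one source; return flagged contradictions."""
    contradictions = []
    touched = set()
    for page, key, value in facts:
        entries = wiki.pages.setdefault(page, {})
        old = entries.get(key)
        if old is not None and old != value:
            # Flag the conflict instead of silently overwriting it
            contradictions.append(f"{page}.{key}: {old!r} vs {value!r} (from {source})")
        entries[key] = value
        touched.add(page)
    wiki.log.append(f"ingested {source}: touched {len(touched)} page(s)")
    return contradictions


# Made-up sources and facts, purely for illustration
wiki = Wiki()
ingest(wiki, "notes-1.md", [("billing-service", "internal_name", "billing")])
flags = ingest(wiki, "notes-2.md", [
    ("billing-service", "internal_name", "payments-v2"),
    ("auth-flow", "status", "under review"),
])
print(flags)
print(wiki.log)
```

The second source touches two pages and surfaces one contradiction, which is the kind of flag you would want a human (or the agent) to resolve rather than lose.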

It's not retrieval. It's compilation.

Karpathy reaches back to Vannevar Bush's 1945 Memex for the lineage — a personal, associative knowledge store with curated trails between documents. Bush couldn't solve the maintenance problem. LLMs can.

What kb-wiki actually is

I built kb-wiki as the practical version of this pattern. A local-first CLI that gives your agent a wiki it can actually read, search, and edit.

The stack:

SQLite for metadata + FTS5 for keyword search
sqlite-vec for vector search
bge-base-en-v1.5 for local embeddings (~200 MB, runs on Apple Silicon, no API keys)
Hybrid search combining BM25 + cosine similarity via Reciprocal Rank Fusion
Markdown files as the source of truth — git-friendly, Obsidian-compatible, the SQLite index is fully rebuildable from disk
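For readers who haven't met Reciprocal Rank Fusion: it merges ranked lists by summing 1 / (k + rank) for each document across the lists, so a page that ranks well in both keyword and vector search floats to the top. The sketch below is a generic version of the technique, not kb-wiki's code; `k = 60` is the constant commonly used in the literature and the page names are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of doc ids into one list sorted by RRF score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); docs missing from a list get nothing
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists, best match first
bm25_hits = ["payments-v2", "auth-flow", "caching"]
vector_hits = ["caching", "payments-v2", "deploy-notes"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

RRF only looks at ranks, never raw scores, which is why it can combine BM25 and cosine similarity without normalizing two unrelated scales.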

Everything runs on your machine. No cloud, no rate limits, no leaking your notes to a third party. The embedding model downloads once on first run and caches in ~/.kb/.models/. After that, you're offline.
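"Rebuildable from disk" means the database is a cache, not the record: delete it and you lose nothing. A minimal sketch of that idea using Python's standard `sqlite3` module follows. The `pages` table and `rebuild_index` function are assumptions for illustration; kb-wiki's real schema, with FTS5 and sqlite-vec, is not shown in this post.

```python
import sqlite3
import tempfile
from pathlib import Path


def rebuild_index(wiki_dir: Path, db_path: Path) -> int:
    """Recreate the page index from the markdown files under wiki_dir."""
    rows = []
    for md in sorted(wiki_dir.rglob("*.md")):
        body = md.read_text(encoding="utf-8")
        lines = body.splitlines()
        has_heading = bool(lines) and lines[0].startswith("#")
        # Use the first heading as the title, else fall back to the file name
        title = lines[0].lstrip("#").strip() if has_heading else md.stem
        rows.append((md.relative_to(wiki_dir).as_posix(), title, body))

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("DROP TABLE IF EXISTS pages")
            conn.execute("CREATE TABLE pages (path TEXT PRIMARY KEY, title TEXT, body TEXT)")
            conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", rows)
    finally:
        conn.close()
    return len(rows)


with tempfile.TemporaryDirectory() as tmp:
    wiki_dir = Path(tmp) / "wiki"
    wiki_dir.mkdir()
    (wiki_dir / "payments-v2.md").write_text("# payments-v2\nKnown as billing.\n", encoding="utf-8")
    (wiki_dir / "auth-flow.md").write_text("Notes without a heading.\n", encoding="utf-8")

    db_path = Path(tmp) / "index.db"
    first = rebuild_index(wiki_dir, db_path)
    db_path.unlink()  # throw the index away...
    second = rebuild_index(wiki_dir, db_path)  # ...and get it back from the files

    conn = sqlite3.connect(db_path)
    titles = [row[0] for row in conn.execute("SELECT title FROM pages ORDER BY path")]
    conn.close()

print(first, second, titles)
```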

The part that makes it click is kb setup. It writes a short memo into your agent's instruction file — CLAUDE.md for Claude Code, AGENTS.md for Codex / Cursor / Cline / Windsurf — telling the agent: "you have a wiki, here's how to use it." From that point on, the agent reaches for kb search, kb add, kb update on its own when you say things like "remember this" or "what did we decide about X."
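Conceptually, a setup step like this is small: write a marked block into the instruction file, and replace that block in place on later runs so it never duplicates. The sketch below shows one way to do that. The marker comments, the memo wording, and `install_memo` are all hypothetical; what `kb setup` actually writes is not reproduced here.

```python
import tempfile
from pathlib import Path

# Hypothetical markers and memo text, for illustration only
BEGIN = "<!-- kb-memo:begin -->"
END = "<!-- kb-memo:end -->"
MEMO = (
    "You have a wiki. Use `kb search` to look things up, "
    "and `kb add` / `kb update` to record what you learn."
)


def install_memo(instruction_file: Path, memo: str = MEMO) -> None:
    """Insert the memo block, or replace an existing one, without touching other text."""
    block = f"{BEGIN}\n{memo}\n{END}"
    text = instruction_file.read_text(encoding="utf-8") if instruction_file.exists() else ""
    start, end = text.find(BEGIN), text.find(END)
    if start != -1 and end > start:
        # A block is already there: swap it in place so reruns stay idempotent
        text = text[:start] + block + text[end + len(END):]
    else:
        if text and not text.endswith("\n"):
            text += "\n"
        separator = "\n" if text else ""
        text += separator + block + "\n"
    instruction_file.write_text(text, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    claude_md = Path(tmp) / "CLAUDE.md"
    claude_md.write_text("# Project rules\n", encoding="utf-8")
    install_memo(claude_md)
    once = claude_md.read_text(encoding="utf-8")
    install_memo(claude_md)  # second run must not duplicate the block
    twice = claude_md.read_text(encoding="utf-8")

print(twice)
```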

You stop being the bridge between your agent and its memory. The agent walks across on its own.

Three steps to give your agent memory

Install globally:

npm install -g kb-wiki

Create a wiki and set it as default:

kb wiki create my-notes
kb wiki use my-notes

Wire it into your agent (pick the one you use):

kb setup --agents claude --global    # Claude Code, user-scope
kb setup --agents cursor             # Cursor, project-scope
kb setup --agents codex  --global    # Codex CLI, user-scope

That's it. Open your agent and try:

"Remember that the billing service is actually called payments-v2 internally."

"Ingest this article and link it to our notes on caching."

"What did we decide about the auth flow last week?"

The agent will reach for kb on its own.

One performance note: the embedding model is ~200 MB and takes 2-3 seconds to load on every cold call. If you're going to use it a lot in a session, start the local server once:

kb serve --detached

All subsequent kb calls auto-route through the warm server — no flags needed on the callers. Search drops from ~2-3 s to ~50-150 ms.
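How a caller might decide between the warm server and a cold start can be sketched with a plain TCP probe: if something is listening on the local port, send the request there; otherwise load the model in-process. This is an assumption about the mechanism, not a description of how kb detects its server, and the function names are made up.

```python
import socket


def server_is_warm(host: str, port: int, timeout: float = 0.2) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_route(host: str, port: int) -> str:
    # Warm path: reuse the already-loaded model; cold path: pay the load time again
    return "warm-server" if server_is_warm(host, port) else "cold-start"


# Simulate a running server by listening on a free local port
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

warm = choose_route("127.0.0.1", port)
listener.close()  # the "server" goes away
cold = choose_route("127.0.0.1", port)

print(warm, cold)
```

The point of the design is that callers need no flag: the probe is cheap, and the fallback keeps every command working when no server is running.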

Why this actually sticks

The annoying part of any knowledge base is not the reading. It's the bookkeeping. Updating cross-references when something gets renamed. Keeping summaries current. Noticing that a new note contradicts an old one. Humans abandon wikis because that maintenance cost grows faster than the value.

LLMs don't get bored. They don't forget to update a cross-reference. They can touch fifteen files in one pass without complaining. The wiki stays alive because the cost of keeping it alive is near zero.

The job split is clean: you curate sources and ask good questions. The model does the filing.

If you've explained the same project context to your agent five times this month, this is the thing that fixes it.

npm install -g kb-wiki
kb wiki create my-notes
kb setup --agents claude --global

Three commands. Your agent now remembers.

Top comments (2)

Harjot Singh • May 31

"Bigger context windows are not memory" is the line that needs to be repeated until it sticks, because conflating the two is the most common mistake in this space. A 1M-token window is working memory you have to re-fill every session at full token cost; it's the opposite of persistence, it forgets the instant the tab closes. The wiki framing is right because real memory has properties a context dump doesn't: it's structured, addressable, editable, and most importantly it persists and gets curated over time. The detail that makes or breaks it is retrieval, a wiki the agent never consults is just a folder, so the win is the agent pulling the relevant page at the relevant moment, not loading the whole thing (which would just recreate the context-stuffing problem you're escaping). And the highest-value pages aren't facts, they're the constraints nobody wrote down, the exact thing you mention re-explaining every morning. I run almost this exact pattern, durable structured memory plus disciplined retrieval, and it's the single biggest quality lever for a long-running agent. It's core to how I build Moonshift. How does the agent decide which wiki page to pull, embedding search, or explicit links it maintains itself?

Artem M • Jun 1

The agent just runs a bash command, kb search "something I want to learn", and sees the best matches from vector search combined with full-text search. When reading the matched documents, the agent can crawl related docs by following links.
The agent also has a skill explaining how to search for the best results: for example, it can use only full-text search to match exact terms, or only vector search to match on semantics. By default it searches both and combines the results, so pages matched by both approaches get a higher score.
But I believe the key ingredient is "writing for retrieval": the agent is taught to write and link docs so that they are easy to search and retrieve. The kb lint command is useful here: it highlights potentially weak paragraphs, so the agent can either rewrite them or accept them as they are.