When Andrej Karpathy published his LLM Wiki pattern on April 4, 2026, I had a strange feeling of déjà vu. I'd been building exactly this — a persistent, compounding knowledge system maintained by AI agents — for weeks on my homelab server in Bavaria. Only mine had evolved into something he didn't describe: a multi-agent cognitive engine that fact-checks itself at 2 AM.
## What Karpathy Described
Karpathy's core insight is elegant: RAG is stateless. Every query starts from scratch, searching raw documents. A wiki compounds knowledge — each query refines and connects what the system knows.
He proposes three layers (raw sources, wiki entries, schema) and three operations:
- Ingest — extract structured knowledge from raw sources, cross-reference with existing entries
- Query — search the wiki, synthesize an answer, optionally write back
- Lint — periodically health-check for staleness, contradictions, gaps
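The three operations can be sketched as a toy in-memory wiki. This is a minimal illustration of the pattern, not Karpathy's code or BrainDB's — the `MiniWiki` class and its naive keyword matching are stand-ins for LLM-backed ingest, synthesis, and linting:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Entry:
    title: str
    body: str
    updated: datetime = field(default_factory=datetime.now)

class MiniWiki:
    """Toy three-operation wiki: ingest, query, lint."""

    def __init__(self):
        self.entries: dict[str, Entry] = {}

    def ingest(self, title: str, text: str) -> None:
        # Merge into an existing entry instead of duplicating it —
        # this is the "cross-reference" half of ingest.
        if title in self.entries:
            self.entries[title].body += "\n" + text
            self.entries[title].updated = datetime.now()
        else:
            self.entries[title] = Entry(title, text)

    def query(self, term: str) -> list[str]:
        # Naive keyword match standing in for LLM synthesis.
        return [e.title for e in self.entries.values()
                if term.lower() in e.body.lower()]

    def lint(self, max_age_days: int = 30) -> list[str]:
        # Flag stale entries; a real lint would also check
        # coverage gaps and contradictions.
        cutoff = datetime.now() - timedelta(days=max_age_days)
        return [e.title for e in self.entries.values() if e.updated < cutoff]

wiki = MiniWiki()
wiki.ingest("firewall", "nftables rules live in /etc/nftables.conf")
wiki.ingest("firewall", "port 22 is rate-limited")
print(wiki.query("nftables"))  # ['firewall']
```

The point of the pattern is that ingest writes into the same store that query reads — knowledge compounds instead of being recomputed per request.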
It's a beautiful pattern. And I'd been living it for a month before he published it.
## What I Built (Before Reading His Post)
BrainDB started in March 2026 as a dumb JSON store so my Claude Code sessions could remember things between conversations. "Just save the SSH password somewhere," I thought.
Five weeks later, it had grown into this:
- 5,420+ memories across 11 types: `credential`, `service`, `project`, `feedback`, `lesson`, `issue`, `reference`, `user`, `note`, `research`, `decision`
- Hybrid search — SQLite FTS5 + semantic embeddings (Ollama on a local GPU) + Reciprocal Rank Fusion
- Knowledge graph — 551 relations with automatic entity extraction
- Multi-agent coordination — advisory locks, heartbeats, session handovers
- 105+ API endpoints, 40+ MCP tools wired directly into Claude Code
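For readers who haven't met Reciprocal Rank Fusion: it merges ranked lists from different retrievers (here, keyword and vector search) using only ranks, not scores. A minimal sketch of the standard k=60 variant, with hypothetical memory IDs — not BrainDB's actual code:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fts_hits    = ["mem_42", "mem_7", "mem_99"]   # keyword (FTS5) ranking
vector_hits = ["mem_42", "mem_13", "mem_7"]   # embedding ranking
print(rrf([fts_hits, vector_hits]))
# ['mem_42', 'mem_7', 'mem_13', 'mem_99']
```

Because RRF only looks at ranks, it needs no score normalization between FTS5's BM25 values and cosine similarities — which is exactly why it's the usual glue for hybrid search.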
The architecture wasn't planned. It was grown, one annoying problem at a time. "Why does Claude keep forgetting how my firewall works?" led to memory types. "Why did two agents just edit the same file?" led to coordination. "Why is this credential from last week wrong?" led to contradiction detection.
## The Mapping — His Pattern, My Implementation
When I read Karpathy's gist, I started mapping concepts:
| Karpathy's Concept | BrainDB's Implementation |
|---|---|
| Raw Sources -> Wiki -> Schema | `research` memories -> recall/search -> CLAUDE.md bootstrap |
| Ingest (extract + cross-reference) | `/learn` + `/wiki/ingest` (entity extraction, relation creation) |
| Query (search wiki + synthesize) | `/ask` RAG pipeline + `/wiki/synthesize` |
| Lint (health check) | `/wiki/lint` + autoDream + contradiction detection |
| Index file | `/recall` + `/hybrid-search` (FTS5 + vectors) |
| Log file | `/wiki/log` (append-only activity chronicle) |
| "The LLM maintains everything" | 6 AI agents, each with a specialty |
The overlap was uncanny. But the differences were more interesting.
## Where BrainDB Goes Further
Karpathy describes a single LLM maintaining a wiki. BrainDB does that, but it also does things his pattern doesn't cover:
### 1. Inception Knowledge — The 2 AM Fact-Checker
Every night at 2:30 AM, a cron job kicks off a dream cycle:

```
Wake GPU PC via Wake-on-LAN
  -> Query SearXNG for facts to verify
  -> Pre-gate with local 14B model (free)
  -> Fact-check survivors with Mistral API (free tier)
  -> Store validated findings back in BrainDB
  -> Shut down GPU PC
```
Last week it caught that a Docker image I'd pinned had a CVE published two days prior. I woke up to a memory tagged type: issue with the CVE number and a suggested fix. Self-healing knowledge isn't a feature — it's the whole point.
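The economics of the cycle come from two-stage gating: a free local model filters, and only survivors reach the API. A sketch of that shape — both functions here are hypothetical stand-ins (a keyword heuristic for the local 14B pre-gate, a stub for the Mistral call), not the real pipeline:

```python
def local_model_suspects(claim: str) -> bool:
    """Cheap pre-gate: only escalate claims that look time-sensitive.
    Stand-in for asking a local 14B model 'could this be stale?'."""
    return any(w in claim.lower() for w in ("version", "cve", "port", "latest"))

def api_fact_check(claim: str) -> dict:
    """Placeholder for the external fact-checking API call."""
    return {"claim": claim, "verified": True}

claims = [
    "The server is in Bavaria",              # stable fact, never escalated
    "Image nginx:1.25 has a recent CVE",     # time-sensitive, escalated
]
survivors = [c for c in claims if local_model_suspects(c)]
results = [api_fact_check(c) for c in survivors]
print(len(results))  # 1 — only one claim cost an API call
```

The gate doesn't need to be smart, only conservative: a false positive costs one free-tier API call, while a false negative just means the claim waits for the next night's cycle.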
### 2. Contradiction Detection
Five automated strategies scan for conflicts: port collisions, status mismatches, decision reversals, credential drift, and temporal impossibilities. When two memories disagree, the system flags it and optionally routes to Mistral for arbitration.
```json
{
  "strategy": "port_conflict",
  "memory_a": "Grafana runs on port 3000",
  "memory_b": "Brain dashboard runs on port 3000",
  "severity": "high",
  "auto_resolved": false
}
```
This is Karpathy's "lint" on steroids. His lint checks for staleness and gaps. Mine checks whether the knowledge is internally consistent.
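The port-collision strategy is the easiest of the five to picture. A minimal sketch of how such a scan could work — the memory shape and `find_port_conflicts` helper are my illustration, not BrainDB's internals:

```python
import re

def find_port_conflicts(memories: list[dict]) -> list[dict]:
    """Flag memories that claim different services on the same port."""
    by_port: dict[str, list[dict]] = {}
    for mem in memories:
        for port in re.findall(r"port (\d+)", mem["text"]):
            by_port.setdefault(port, []).append(mem)
    conflicts = []
    for port, mems in by_port.items():
        # Two memories about the same port are only a conflict
        # if they name different services.
        if len({m["service"] for m in mems}) > 1:
            conflicts.append({
                "strategy": "port_conflict",
                "port": port,
                "memories": [m["text"] for m in mems],
                "severity": "high",
            })
    return conflicts

memories = [
    {"service": "grafana", "text": "Grafana runs on port 3000"},
    {"service": "brain-dashboard", "text": "Brain dashboard runs on port 3000"},
]
print(find_port_conflicts(memories)[0]["port"])  # 3000
```

The other strategies (status mismatches, decision reversals, credential drift, temporal impossibilities) follow the same pattern: group memories by a shared key, then check the group for mutually exclusive claims.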
### 3. Multi-Agent Coordination
Not one LLM — six, each with a role:
| Agent | Cost | Role |
|---|---|---|
| Claude Code (Opus) | expensive | Orchestration, complex multi-step tasks |
| Mistral | free | Strategy, analysis, fact-checking |
| Codestral | free | Code review, refactoring, security |
| Codex | subscription | Autonomous multi-file coding |
| Local 14B (Ollama) | free | Batch processing, embeddings, offline fallback |
| Vibe | free | Quick prototyping |
They coordinate through BrainDB: advisory locks prevent two agents from editing the same project, heartbeats track who's alive, and handover records let a morning session pick up exactly where the 2 AM dream cycle left off.
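An advisory lock over a SQLite table is simple to sketch. This is a minimal illustration of the idea — table name, TTL, and agent names are my assumptions, not BrainDB's schema:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE locks (
    resource TEXT PRIMARY KEY,
    agent    TEXT,
    acquired REAL)""")

def try_lock(agent: str, resource: str, ttl: float = 300.0) -> bool:
    """Advisory lock: succeeds if the resource is free or its lock expired."""
    now = time.time()
    # Reap stale locks first, so a crashed agent can't hold a lock forever.
    conn.execute("DELETE FROM locks WHERE resource = ? AND acquired < ?",
                 (resource, now - ttl))
    try:
        # The PRIMARY KEY makes acquisition atomic: second insert fails.
        conn.execute("INSERT INTO locks VALUES (?, ?, ?)",
                     (resource, agent, now))
        return True
    except sqlite3.IntegrityError:
        return False

assert try_lock("claude", "project:braindb")       # acquired
assert not try_lock("codex", "project:braindb")    # already held
```

The lock is advisory in the literal sense: nothing stops a misbehaving agent from editing anyway, but every well-behaved agent checks before touching a project, and the TTL plus heartbeats keep dead agents from blocking the rest.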
### 4. Temporal Decay
Every memory has a freshness score with a 30-day half-life. A credential stored yesterday scores 1.0. The same credential after 60 days scores 0.25. Search results factor in freshness, so you naturally get current information first.
Karpathy's wiki doesn't age. Mine does. Because facts spoil.
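A 30-day half-life is a one-line exponential decay. A sketch of the scoring function as described (how it's combined with relevance inside BrainDB's ranking is my assumption):

```python
def freshness(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: score halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

print(freshness(0))    # 1.0  — stored just now
print(freshness(30))   # 0.5  — one half-life old
print(freshness(60))   # 0.25 — matches the 60-day credential above

# One plausible way to blend it into ranking (assumption, not BrainDB's formula):
final_score = 0.9 * freshness(60)  # relevance 0.9, 60 days old -> 0.225
```

Note the article's "stored yesterday scores 1.0" is an approximation — one day into a 30-day half-life actually scores about 0.977.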
### 5. Context Budget API
Ask for 4,000 tokens of context about "firewall configuration" and get exactly that — the most relevant memories packed into your budget, scored and ranked:
```bash
curl braindb:3197/compact \
  -d '{"query": "firewall rules", "budget": 4000}'
```
This turns BrainDB from a search engine into a context engine. The LLM doesn't drown in irrelevant results — it gets a curated briefing, every time.
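Under the hood, budget-aware packing is essentially a greedy knapsack over scored memories. A sketch of the idea — the memory shape and the whitespace token estimate are simplifications, not the `/compact` implementation:

```python
def pack_context(memories: list[dict], budget_tokens: int) -> list[dict]:
    """Greedy knapsack: highest-scored memories first, until the budget is full."""
    chosen, used = [], 0
    for mem in sorted(memories, key=lambda m: m["score"], reverse=True):
        cost = len(mem["text"].split())  # crude token estimate for the sketch
        if used + cost <= budget_tokens:
            chosen.append(mem)
            used += cost
    return chosen

memories = [
    {"text": "nftables drops inbound except 22/443", "score": 0.9},
    {"text": "old iptables notes " * 300,            "score": 0.4},  # huge, low value
    {"text": "port 22 rate-limited to 4/min",        "score": 0.8},
]
print([m["score"] for m in pack_context(memories, budget_tokens=50)])
# [0.9, 0.8] — the bulky low-score memory doesn't fit
```

A real implementation would use the model's tokenizer for `cost` and might deduplicate overlapping memories, but the contract is the same: the caller names a budget and gets back the best briefing that fits.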
## The Wiki Integration (After Reading His Post)
Credit where it's due: Karpathy's framing gave me four endpoints I was missing.
- `/wiki/synthesize` — takes related memories and asks Mistral to produce a synthesis. Connections I hadn't seen emerged.
- `/wiki/lint` — a proper health score: orphan detection, staleness audit, coverage gaps, contradiction count.
- `/wiki/ingest` — cascading ingest with entity extraction and auto-linking to the knowledge graph.
- `/wiki/log` — an append-only chronicle of every knowledge mutation.
The lint run was humbling. First pass flagged 20,647 contradictions. After reviewing them, 99.7% were false positives — mostly memories that described the same thing at different points in time. After dismissing those, the health score went from 0 to 69. There's work to do.
## Lessons Learned
**Knowledge compounds, but only if maintenance cost is zero.** This is Karpathy's core insight and it's exactly right. BrainDB's maintenance is automated — dream cycles, contradiction detection, temporal decay. If I had to manually curate 5,420 memories, I'd have stopped at 50.

**A single LLM is not enough.** Different models have different strengths and cost profiles. Claude orchestrates. Mistral analyzes. Codestral reviews code. The local model handles bulk work for free. The ensemble is smarter and cheaper than any single model.

**Contradiction detection is the killer feature nobody talks about.** Everyone builds RAG. Nobody builds systems that notice when their own knowledge is wrong. This is where the real reliability comes from.

**The wiki pattern works better as an API than as markdown files.** Karpathy describes a folder of markdown files. That works for a prototype. But once you need search, scoring, freshness, coordination, and budget-aware retrieval, you need a database with an API.
## Try It Yourself
Karpathy's LLM Wiki pattern is a great mental model. But the real magic happens when you add agents, contradictions, and a system that learns while you sleep. BrainDB started as a hack to save SSH passwords and became the nervous system for everything I build.
If you want to try the lightweight version: BrainDB Light+ is open source — single Docker container, SQLite-backed, under 500 KB, with hybrid search and the full wiki pattern built in.
What does your AI remember between sessions? And more importantly — does it know when it's wrong?
This is the second article in my BrainDB series. The first one, "I Built an AI Memory That Fact-Checks Itself While You Sleep", covers the Inception Knowledge system in detail.