When Andrej Karpathy published his LLM Wiki pattern on April 4, 2026, I had a strange feeling of déjà vu. I'd been building exactly this — a persistent, compounding knowledge system maintained by AI agents — for weeks on my homelab server in Bavaria. Only mine had evolved into something he didn't describe: a multi-agent cognitive engine that fact-checks itself at 2 AM.
## What Karpathy Described
Karpathy's core insight is elegant: RAG is stateless. Every query starts from scratch, searching raw documents. A wiki compounds knowledge — each query refines and connects what the system knows.
He proposes three layers (raw sources, wiki entries, schema) and three operations:
- Ingest — extract structured knowledge from raw sources, cross-reference with existing entries
- Query — search the wiki, synthesize an answer, optionally write back
- Lint — periodically health-check for staleness, contradictions, gaps
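The three operations can be sketched as a toy in-memory wiki. This is a minimal illustration of the pattern, not Karpathy's code or BrainDB's — the `MiniWiki` class and its naive keyword matching are stand-ins for LLM-backed ingest, synthesis, and linting:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Entry:
    title: str
    body: str
    updated: datetime = field(default_factory=datetime.now)

class MiniWiki:
    """Toy three-operation wiki: ingest, query, lint."""

    def __init__(self):
        self.entries: dict[str, Entry] = {}

    def ingest(self, title: str, text: str) -> None:
        # Merge into an existing entry instead of duplicating it —
        # this is the "cross-reference" half of ingest.
        if title in self.entries:
            self.entries[title].body += "\n" + text
            self.entries[title].updated = datetime.now()
        else:
            self.entries[title] = Entry(title, text)

    def query(self, term: str) -> list[str]:
        # Naive keyword match standing in for LLM synthesis.
        return [e.title for e in self.entries.values()
                if term.lower() in e.body.lower()]

    def lint(self, max_age_days: int = 30) -> list[str]:
        # Flag stale entries; a real lint would also check
        # coverage gaps and contradictions.
        cutoff = datetime.now() - timedelta(days=max_age_days)
        return [e.title for e in self.entries.values() if e.updated < cutoff]

wiki = MiniWiki()
wiki.ingest("firewall", "nftables rules live in /etc/nftables.conf")
wiki.ingest("firewall", "port 22 is rate-limited")
print(wiki.query("nftables"))  # ['firewall']
```

The point of the pattern is that ingest writes into the same store that query reads — knowledge compounds instead of being recomputed per request.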
It's a beautiful pattern. And I'd been living it for a month before he published it.
## What I Built (Before Reading His Post)
BrainDB started in March 2026 as a dumb JSON store so my Claude Code sessions could remember things between conversations. "Just save the SSH password somewhere," I thought.
Five weeks later, it had grown into this:
- 5,420+ memories across 11 types: `credential`, `service`, `project`, `feedback`, `lesson`, `issue`, `reference`, `user`, `note`, `research`, `decision`
- Hybrid search — SQLite FTS5 + semantic embeddings (Ollama on a local GPU) + Reciprocal Rank Fusion
- Knowledge graph — 551 relations with automatic entity extraction
- Multi-agent coordination — advisory locks, heartbeats, session handovers
- 105+ API endpoints, 40+ MCP tools wired directly into Claude Code
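For readers who haven't met Reciprocal Rank Fusion: it merges ranked lists from different retrievers (here, keyword and vector search) using only ranks, not scores. A minimal sketch of the standard k=60 variant, with hypothetical memory IDs — not BrainDB's actual code:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fts_hits    = ["mem_42", "mem_7", "mem_99"]   # keyword (FTS5) ranking
vector_hits = ["mem_42", "mem_13", "mem_7"]   # embedding ranking
print(rrf([fts_hits, vector_hits]))
# ['mem_42', 'mem_7', 'mem_13', 'mem_99']
```

Because RRF only looks at ranks, it needs no score normalization between FTS5's BM25 values and cosine similarities — which is exactly why it's the usual glue for hybrid search.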
The architecture wasn't planned. It was grown, one annoying problem at a time. "Why does Claude keep forgetting how my firewall works?" led to memory types. "Why did two agents just edit the same file?" led to coordination. "Why is this credential from last week wrong?" led to contradiction detection.
## The Mapping — His Pattern, My Implementation
When I read Karpathy's gist, I started mapping concepts:
| Karpathy's Concept | BrainDB's Implementation |
|---|---|
| Raw Sources -> Wiki -> Schema | `research` memories -> recall/search -> CLAUDE.md bootstrap |
| Ingest (extract + cross-reference) | `/learn` + `/wiki/ingest` (entity extraction, relation creation) |
| Query (search wiki + synthesize) | `/ask` RAG pipeline + `/wiki/synthesize` |
| Lint (health check) | `/wiki/lint` + autoDream + contradiction detection |
| Index file | `/recall` + `/hybrid-search` (FTS5 + vectors) |
| Log file | `/wiki/log` (append-only activity chronicle) |
| "The LLM maintains everything" | 6 AI agents, each with a specialty |
The overlap was uncanny. But the differences were more interesting.
## Where BrainDB Goes Further
Karpathy describes a single LLM maintaining a wiki. BrainDB does that, but it also does things his pattern doesn't cover:
### 1. Inception Knowledge — The 2 AM Fact-Checker
Every night at 2:30 AM, a cron job kicks off a dream cycle:

```
Wake GPU PC via Wake-on-LAN
  -> Query SearXNG for facts to verify
  -> Pre-gate with local 14B model (free)
  -> Fact-check survivors with Mistral API (free tier)
  -> Store validated findings back in BrainDB
  -> Shut down GPU PC
```
Last week it caught that a Docker image I'd pinned had a CVE published two days prior. I woke up to a memory tagged type: issue with the CVE number and a suggested fix. Self-healing knowledge isn't a feature — it's the whole point.
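The economics of the cycle come from two-stage gating: a free local model filters, and only survivors reach the API. A sketch of that shape — both functions here are hypothetical stand-ins (a keyword heuristic for the local 14B pre-gate, a stub for the Mistral call), not the real pipeline:

```python
def local_model_suspects(claim: str) -> bool:
    """Cheap pre-gate: only escalate claims that look time-sensitive.
    Stand-in for asking a local 14B model 'could this be stale?'."""
    return any(w in claim.lower() for w in ("version", "cve", "port", "latest"))

def api_fact_check(claim: str) -> dict:
    """Placeholder for the external fact-checking API call."""
    return {"claim": claim, "verified": True}

claims = [
    "The server is in Bavaria",              # stable fact, never escalated
    "Image nginx:1.25 has a recent CVE",     # time-sensitive, escalated
]
survivors = [c for c in claims if local_model_suspects(c)]
results = [api_fact_check(c) for c in survivors]
print(len(results))  # 1 — only one claim cost an API call
```

The gate doesn't need to be smart, only conservative: a false positive costs one free-tier API call, while a false negative just means the claim waits for the next night's cycle.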
### 2. Contradiction Detection
Five automated strategies scan for conflicts: port collisions, status mismatches, decision reversals, credential drift, and temporal impossibilities. When two memories disagree, the system flags it and optionally routes to Mistral for arbitration.
```json
{
  "strategy": "port_conflict",
  "memory_a": "Grafana runs on port 3000",
  "memory_b": "Brain dashboard runs on port 3000",
  "severity": "high",
  "auto_resolved": false
}
```
This is Karpathy's "lint" on steroids. His lint checks for staleness and gaps. Mine checks whether the knowledge is internally consistent.
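The port-collision strategy is the easiest of the five to picture. A minimal sketch of how such a scan could work — the memory shape and `find_port_conflicts` helper are my illustration, not BrainDB's internals:

```python
import re

def find_port_conflicts(memories: list[dict]) -> list[dict]:
    """Flag memories that claim different services on the same port."""
    by_port: dict[str, list[dict]] = {}
    for mem in memories:
        for port in re.findall(r"port (\d+)", mem["text"]):
            by_port.setdefault(port, []).append(mem)
    conflicts = []
    for port, mems in by_port.items():
        # Two memories about the same port are only a conflict
        # if they name different services.
        if len({m["service"] for m in mems}) > 1:
            conflicts.append({
                "strategy": "port_conflict",
                "port": port,
                "memories": [m["text"] for m in mems],
                "severity": "high",
            })
    return conflicts

memories = [
    {"service": "grafana", "text": "Grafana runs on port 3000"},
    {"service": "brain-dashboard", "text": "Brain dashboard runs on port 3000"},
]
print(find_port_conflicts(memories)[0]["port"])  # 3000
```

The other strategies (status mismatches, decision reversals, credential drift, temporal impossibilities) follow the same pattern: group memories by a shared key, then check the group for mutually exclusive claims.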
### 3. Multi-Agent Coordination
Not one LLM — six, each with a role:
| Agent | Cost | Role |
|---|---|---|
| Claude Code (Opus) | expensive | Orchestration, complex multi-step tasks |
| Mistral | free | Strategy, analysis, fact-checking |
| Codestral | free | Code review, refactoring, security |
| Codex | subscription | Autonomous multi-file coding |
| Local 14B (Ollama) | free | Batch processing, embeddings, offline fallback |
| Vibe | free | Quick prototyping |
They coordinate through BrainDB: advisory locks prevent two agents from editing the same project, heartbeats track who's alive, and handover records let a morning session pick up exactly where the 2 AM dream cycle left off.
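An advisory lock over a SQLite table is simple to sketch. This is a minimal illustration of the idea — table name, TTL, and agent names are my assumptions, not BrainDB's schema:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE locks (
    resource TEXT PRIMARY KEY,
    agent    TEXT,
    acquired REAL)""")

def try_lock(agent: str, resource: str, ttl: float = 300.0) -> bool:
    """Advisory lock: succeeds if the resource is free or its lock expired."""
    now = time.time()
    # Reap stale locks first, so a crashed agent can't hold a lock forever.
    conn.execute("DELETE FROM locks WHERE resource = ? AND acquired < ?",
                 (resource, now - ttl))
    try:
        # The PRIMARY KEY makes acquisition atomic: second insert fails.
        conn.execute("INSERT INTO locks VALUES (?, ?, ?)",
                     (resource, agent, now))
        return True
    except sqlite3.IntegrityError:
        return False

assert try_lock("claude", "project:braindb")       # acquired
assert not try_lock("codex", "project:braindb")    # already held
```

The lock is advisory in the literal sense: nothing stops a misbehaving agent from editing anyway, but every well-behaved agent checks before touching a project, and the TTL plus heartbeats keep dead agents from blocking the rest.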
### 4. Temporal Decay
Every memory has a freshness score with a 30-day half-life. A credential stored yesterday scores 1.0. The same credential after 60 days scores 0.25. Search results factor in freshness, so you naturally get current information first.
Karpathy's wiki doesn't age. Mine does. Because facts spoil.
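A 30-day half-life is a one-line exponential decay. A sketch of the scoring function as described (how it's combined with relevance inside BrainDB's ranking is my assumption):

```python
def freshness(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: score halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

print(freshness(0))    # 1.0  — stored just now
print(freshness(30))   # 0.5  — one half-life old
print(freshness(60))   # 0.25 — matches the 60-day credential above

# One plausible way to blend it into ranking (assumption, not BrainDB's formula):
final_score = 0.9 * freshness(60)  # relevance 0.9, 60 days old -> 0.225
```

Note the article's "stored yesterday scores 1.0" is an approximation — one day into a 30-day half-life actually scores about 0.977.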
### 5. Context Budget API
Ask for 4,000 tokens of context about "firewall configuration" and get exactly that — the most relevant memories packed into your budget, scored and ranked:
```bash
curl braindb:3197/compact \
  -d '{"query": "firewall rules", "budget": 4000}'
```
This turns BrainDB from a search engine into a context engine. The LLM doesn't drown in irrelevant results — it gets a curated briefing, every time.
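Under the hood, budget-aware packing is essentially a greedy knapsack over scored memories. A sketch of the idea — the memory shape and the whitespace token estimate are simplifications, not the `/compact` implementation:

```python
def pack_context(memories: list[dict], budget_tokens: int) -> list[dict]:
    """Greedy knapsack: highest-scored memories first, until the budget is full."""
    chosen, used = [], 0
    for mem in sorted(memories, key=lambda m: m["score"], reverse=True):
        cost = len(mem["text"].split())  # crude token estimate for the sketch
        if used + cost <= budget_tokens:
            chosen.append(mem)
            used += cost
    return chosen

memories = [
    {"text": "nftables drops inbound except 22/443", "score": 0.9},
    {"text": "old iptables notes " * 300,            "score": 0.4},  # huge, low value
    {"text": "port 22 rate-limited to 4/min",        "score": 0.8},
]
print([m["score"] for m in pack_context(memories, budget_tokens=50)])
# [0.9, 0.8] — the bulky low-score memory doesn't fit
```

A real implementation would use the model's tokenizer for `cost` and might deduplicate overlapping memories, but the contract is the same: the caller names a budget and gets back the best briefing that fits.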
## The Wiki Integration (After Reading His Post)
Credit where it's due: Karpathy's framing gave me four endpoints I was missing.
- `/wiki/synthesize` — takes related memories and asks Mistral to produce a synthesis. Connections I hadn't seen emerged.
- `/wiki/lint` — a proper health score: orphan detection, staleness audit, coverage gaps, contradiction count.
- `/wiki/ingest` — cascading ingest with entity extraction and auto-linking to the knowledge graph.
- `/wiki/log` — an append-only chronicle of every knowledge mutation.
The lint run was humbling. First pass flagged 20,647 contradictions. After reviewing them, 99.7% were false positives — mostly memories that described the same thing at different points in time. After dismissing those, the health score went from 0 to 69. There's work to do.
## Lessons Learned
**Knowledge compounds, but only if maintenance cost is zero.** This is Karpathy's core insight and it's exactly right. BrainDB's maintenance is automated — dream cycles, contradiction detection, temporal decay. If I had to manually curate 5,420 memories, I'd have stopped at 50.

**A single LLM is not enough.** Different models have different strengths and cost profiles. Claude orchestrates. Mistral analyzes. Codestral reviews code. The local model handles bulk work for free. The ensemble is smarter and cheaper than any single model.

**Contradiction detection is the killer feature nobody talks about.** Everyone builds RAG. Nobody builds systems that notice when their own knowledge is wrong. This is where the real reliability comes from.

**The wiki pattern works better as an API than as markdown files.** Karpathy describes a folder of markdown files. That works for a prototype. But once you need search, scoring, freshness, coordination, and budget-aware retrieval, you need a database with an API.
## Try It Yourself
Karpathy's LLM Wiki pattern is a great mental model. But the real magic happens when you add agents, contradictions, and a system that learns while you sleep. BrainDB started as a hack to save SSH passwords and became the nervous system for everything I build.
If you want to try the lightweight version: BrainDB Light+ is open source — single Docker container, SQLite-backed, under 500 KB, with hybrid search and the full wiki pattern built in.
What does your AI remember between sessions? And more importantly — does it know when it's wrong?
This is the second article in my BrainDB series. The first one, "I Built an AI Memory That Fact-Checks Itself While You Sleep", covers the Inception Knowledge system in detail.