I'm not an ML researcher. I don't have a CS degree. I built a read-later app called Burn 451 because my bookmarks pile kept embarrassing me. Yesterday I shipped the ninth vault — 25 essays by Tiago Forte, our in-house PKM canon. The week before that, Andrej Karpathy published a gist describing how he organizes his personal research with an LLM. 16M views in 48 hours. I read it and realized he had just described my database schema.
What Karpathy said
The pattern is three folders. raw/ holds immutable source material — papers, transcripts, articles. The model reads it, never writes to it. wiki/ holds LLM-authored markdown, one file per concept, with backlinks and provenance tracked in git. outputs/ holds synthesized answers the model writes back after a query.
No vector database. No embeddings. No chunking. The LLM reads the markdown directly in its context window. Karpathy frames it as a compiler: raw is source, wiki is the compiled binary, outputs is the runtime. His own wiki on one research topic is around 100 articles and 400K words.
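The read step above can be sketched in a few lines. This is a minimal illustration of the pattern, not Karpathy's actual code — the folder convention follows the gist, everything else (function name, character budget) is assumed:

```python
from pathlib import Path

def build_context(wiki_dir: str, budget_chars: int = 600_000) -> str:
    """Concatenate every wiki page into one prompt context.

    No embeddings, no chunking: the model receives the raw
    markdown and uses its own context window as the index.
    """
    parts = []
    used = 0
    for page in sorted(Path(wiki_dir).glob("*.md")):
        text = page.read_text(encoding="utf-8")
        if used + len(text) > budget_chars:
            break  # corpus exceeds the context envelope; stop here
        parts.append(f"## {page.name}\n\n{text}")
        used += len(text)
    return "\n\n".join(parts)
```

The O(n) directory walk that the critics point at is exactly this loop — which is why the pattern lives or dies on corpus size.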
The top Hacker News thread on the gist hit 296 points. The main objection was model collapse — won't LLM-generated notes degrade over iterations? Karpathy's answer in the comments: he uses already-trained models to curate, not to retrain. Curation is a one-way flow. It compounds.
The counter-arguments matter. Epsilla's team pointed out that walking a directory is O(n) — it works at 100 pages, strains at 1,000, breaks at 10,000. Towards AI noted that vector embeddings are still the cheapest way to answer "what's relevant" across arbitrary corpus sizes. Both are right. The pattern only works inside a narrow envelope: small corpus, stable content, one human owner, model with a 200K+ context window. That envelope is the exact shape of a personal read-later archive.
The accidental reference implementation
I read the gist and opened my own codebase. The mapping is one-to-one.
My bookmarks table in Supabase is the raw layer. Every article a user saves lands there with the full fetched text, the canonical URL, and a timestamp. The model never edits these rows. They're append-only by construction.
My collections table is the wiki layer. Each collection is a topic — Karpathy, Simon Willison, Paul Graham, vibe coding, Tiago Forte. Nine are live right now. Each collection holds a curated list of bookmark IDs, plus an AI-written summary and a three-bullet digest for each article. The full compiled output lives at website/data/vault-content/<slug>/<id>.md. Today that directory holds 140 markdown files and 263,000 words of source text.
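The compile step — collection rows in, vault-content files out — can be sketched like this. The field names and output layout are illustrative guesses at the schema described above, not the actual Burn code:

```python
from pathlib import Path

def export_vault(slug: str, articles: list[dict],
                 out_root: str = "website/data/vault-content") -> int:
    """Write one markdown file per bookmark: frontmatter, then full body."""
    out_dir = Path(out_root) / slug
    out_dir.mkdir(parents=True, exist_ok=True)
    for a in articles:
        front = (
            "---\n"
            f"title: {a['title']}\n"
            f"url: {a['url']}\n"
            f"fetched: {a['fetched_at']}\n"
            "---\n\n"
        )
        (out_dir / f"{a['id']}.md").write_text(front + a["text"], encoding="utf-8")
    return len(articles)
```

Because the output is plain markdown with provenance in the frontmatter, the same files work as a Claude Project upload, a git-tracked wiki, or an MCP-served corpus.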
The outputs layer is what the reader sees. Every vault renders as a static page at burn451.cloud/vault, regenerated hourly. Each article has its own public URL with structured data, an AI summary, and a link back to the original. The new one is burn451.cloud/vault/tiago-forte.
The piece Karpathy doesn't talk about in the gist is ingest. In his version, he drops files into raw/ manually. In Burn, ingest is automatic. A user taps save on their phone. The article is fetched through a clean-reader pipeline, deduped, summarized, and filed under a collection. I ship this as burn-mcp-server on npm — 26 tools for Claude and Cursor to query the vault in real time. Tools like search_vault, list_vault, get_article_content. It's the automated pipeline into Karpathy's raw/ folder. I didn't know that when I built it.
Demo: Tiago Forte vault in Claude Project
Here's how to run it yourself. Takes five minutes.
Step 1. Open the Tiago Forte vault. Twenty-five essays covering PARA, the CODE method, Building a Second Brain, and his AI-era writing. The vault exports as a zip of markdown files — frontmatter with title, URL, and fetched date, then the full article body. Total size is around 40K words, roughly 53K tokens, well under Claude's 200K-token context ceiling.
Step 2. Open Claude, create a new Project, name it something like "Tiago Forte Second Brain". Drag the zip into the Knowledge panel. Claude unpacks the markdown and adds each file to the project index. No embedding step. No chunk configuration.
Step 3. Ask things you actually want to know. Three queries I ran:
- "What is Forte's CODE method in one paragraph?" Claude returned a clean synthesis across the CODE introduction essay and the book chapters: Capture what resonates, Organize by actionability with PARA, Distill into the notes you'll actually use, Express by turning the notes into creative output. The answer named the four steps in order and cited which essays each was drawn from.
- "How does PARA differ from Zettelkasten?" This is where the no-RAG pattern earned its keep. Claude pulled from three different essays and produced a contrast — PARA organizes by actionability (what will I use this for?), Zettelkasten organizes by atomic idea (what concept does this belong to?). A chunk-based RAG would have returned a top passage from one essay. The full-context read produced the comparison.
- "How has Forte's view on AI changed between his 2022 writing and 2026?" Claude walked the timeline. Early caution about AI commoditizing notes. Later reframing around AI as a distillation partner. The answer tracked the shift across essays — something RAG's top-k retrieval struggles with by design.
If you're a developer, skip the export. Install burn-mcp-server in Cursor or Claude Code. The vault is already served as live MCP tools — the model calls search_vault and get_article_content directly. No file drag. New saves appear within seconds. This is the version I use day-to-day.
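Registering the server looks roughly like a standard MCP client config entry. The exact file location depends on your client, and the npx invocation here is an assumption — check the package README for the real command:

```json
{
  "mcpServers": {
    "burn": {
      "command": "npx",
      "args": ["-y", "burn-mcp-server"]
    }
  }
}
```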
Where no-RAG breaks
I'm going to be honest about where this falls apart, because the pattern is being oversold on X right now.
Corpus size. Claude's context window is 200K tokens. Roughly 150K words. Past that, Claude Projects silently flips into RAG mode under the hood — there's an open issue documenting this. My Karpathy vault is already at 120K words. Merge all nine of my vaults into one project — 263K words — and you're not running pure no-RAG anymore.
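A quick way to check whether a vault still fits the envelope is to estimate tokens from word count. The 0.75 words-per-token ratio is a rough heuristic for English prose, not a real tokenizer:

```python
def fits_context(word_count: int, context_tokens: int = 200_000,
                 words_per_token: float = 0.75) -> bool:
    """Rough check: does this corpus fit in a single context window?"""
    estimated_tokens = word_count / words_per_token
    return estimated_tokens <= context_tokens

# 120K words ~ 160K tokens: still inside a 200K window
# 263K words ~ 350K tokens: past the ceiling, Projects falls back to RAG
```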
Non-text. Images, audio, video. The pattern assumes markdown. A YouTube transcript works. A diagram doesn't. Forte has great slide decks. They don't make it in.
Live data. Claude Project knowledge is frozen at upload time. New essays need a re-upload. This is why the MCP version matters — it queries live Supabase, so the corpus updates the moment you save something new.
Who this isn't for. Enterprise search. Multi-tenant. Row-level ACL. Regulatory audit. A folder of markdown can't answer "which employees saw this document." If your requirements include any of that, use a real semantic graph.
What this means for you
If you save articles to a read-later app, you're already halfway to an LLM Wiki. You have the raw layer. You probably even have the wiki layer — it's called your tag system. The missing piece is the last mile: an export that an LLM can read, or an MCP server that hands it over in real time. Pocket never shipped either. Readwise ships an API but no MCP. The rest of the category is still optimizing for save count.
That's the whole product thesis behind Burn. Read-later is the natural shape of the pattern. The 24-hour timer handles triage. The vault handles the wiki. The MCP server handles ingest. Try it free. Point it at your bookmarks. See what your own reading looks like when you can ask it questions.
FAQ
What is an LLM Wiki?
An LLM Wiki is a human-curated set of markdown files that a language model reads directly at query time. Instead of chunking documents and retrieving embeddings, the model walks the directory and answers from the raw text. Karpathy proposed the pattern in April 2026 as an alternative to RAG for small, stable corpora owned by one person.
How is this different from RAG?
RAG chunks documents, embeds the chunks, and retrieves top-k matches at query time. An LLM Wiki skips all of that. The model reads the full markdown, uses its own context window as the index, and answers with cross-document synthesis. No vector database, no embedding costs, no chunk boundaries to fight.
Do I need to be technical to use this pattern?
No. If you can drag files into a Claude Project or ChatGPT Project, you have everything you need. The pattern works with any chat tool that accepts a knowledge folder and has a 200K-token context window. The technical version — MCP servers, CLI tools — is optional and only useful if you want live data.
What's Burn 451's role in this?
Burn 451 is a read-later app that accidentally matches the LLM Wiki shape. The bookmarks table is the raw layer. The vault collections and markdown files are the wiki layer. The MCP server automates ingest, so saving an article in Burn is the same as filing a new note in Karpathy's raw folder.
When should I still use RAG?
Use RAG when your corpus is bigger than your context window, when content changes every few minutes, or when you need multi-tenant access control. The LLM Wiki pattern breaks past roughly 200K words, fails on non-text content like images and audio, and has no concept of user permissions.
Can I use this with Claude Code or Cursor?
Yes. Install burn-mcp-server from npm, connect it to Claude Code or Cursor, and your saved articles become live tools the model can call. You skip the export step entirely. New saves in Burn appear as new queryable documents within seconds.
How big can the corpus be before the pattern breaks?
The hard ceiling is the model's context window. Claude handles 200K tokens, roughly 150K words. Past that, Claude Projects silently flip into RAG mode under the hood. In practice, keep a single wiki under 100K words for pure no-RAG behavior. Split larger corpora into topic-specific projects.
Originally published at burn451.cloud. Written by Fisher — @hawking520.