
Zafer Dace

Posted on • Originally published at dev.to

Karpathy's Obsidian Wiki Broke at 100 Articles - RAG Fixed It


When your note system gets smart enough to confuse itself.

When Andrej Karpathy shared his LLM wiki workflow, I built one the same week. Obsidian vault, raw documents, Claude Code compiling everything into a structured wiki with backlinks and cross-references. I wrote about it, people loved it, and I kept feeding the beast.

Then somewhere around article 80, things started breaking.

Not breaking in an obvious way. Breaking in a way where Claude would confidently tell me something from my own wiki — and be wrong. Ask it "what's the difference between ReAct and Chain of Thought?" and it would tell me ReAct was a step inside Chain of Thought reasoning, stitching my [[react-pattern]] note to my [[cot-overview]] note into a confident hybrid that no source document actually contained.

Not hallucinating. Worse. The context window had become a blender.


The Problem Nobody Warns You About

Every tutorial about LLM knowledge bases shows you the happy path: 10 articles, beautiful graph view, perfect answers. But nobody tells you what happens at scale.

Here's the math. A single Obsidian wiki article averages ~500 tokens. At 100 articles, that's 50K tokens — well within Claude's 200K context window. Sounds fine, right?

Except you also have:

  • Raw source documents (often 2-5x longer than the compiled articles)
  • The _index.md master file growing with every addition
  • Your CLAUDE.md instructions
  • The actual conversation context
  • The question you're asking and the reasoning needed to answer it

By the time you hit 100 articles, you're actually pushing 200-400K tokens. The model isn't reading your wiki anymore — it's skimming it. And skimming leads to exactly the kind of "confident but wrong" answers I was getting.
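That budget is easy to sanity-check yourself. A rough back-of-envelope sketch (the per-article token count, the raw-source multiplier, and the fixed overhead are assumptions for illustration, not measurements):

```python
# Rough context-budget estimate for an LLM wiki loaded whole into context.
# Assumptions: ~500 tokens per compiled article, raw sources ~3x larger,
# ~5K tokens of overhead (index file, CLAUDE.md, conversation).

def context_budget(articles, tokens_per_article=500, raw_multiplier=3,
                   overhead=5_000):
    """Estimate total tokens when the entire vault goes into context."""
    compiled = articles * tokens_per_article
    raw = compiled * raw_multiplier          # raw/ source documents
    return compiled + raw + overhead         # + index, instructions, chat

# Under these assumptions, 100 articles already exceed a 200K window:
print(context_budget(100))  # 205000
```

Tweak the multipliers to match your own vault; the shape of the curve is what matters.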

Karpathy's approach works brilliantly. But he didn't mention what happens when your wiki outgrows the context window. So I had to figure it out myself.


More context is not the same as better memory.


The Fix: RAG in 50 Lines

RAG — Retrieval Augmented Generation. Instead of stuffing everything into the context window, you search first and only load what's relevant.

The concept is simple:

OLD: Load entire wiki → Ask question → Hope the model finds the right article
NEW: Ask question → Search finds the 5 most relevant chunks → Load only those → Get precise answer

I built this in 50 lines of Python using ChromaDB (a local vector database) and a tiny embedding model. No cloud services, no API costs for the retrieval part, everything runs locally. Full implementation is in the appendix; the workflow is what matters.

Step 1: Install dependencies

pip install chromadb

Step 2: Index your vault

python index_vault.py ~/path/to/obsidian/vault

The indexer walks your vault, splits every markdown file into section-level chunks (one per # / ## / ### heading), computes embeddings, and stores them in a local ChromaDB with file path, heading, and line numbers as metadata.

Step 3: Query

python query_vault.py "how does the ReAct pattern work"
python query_vault.py "what are the salary ranges for AI roles"

Semantic search returns the top N most relevant chunks. You see exactly which files and headings matched, plus each chunk's distance score (lower means a closer match).

That's the whole loop. The 50 lines of Python at the end of this post cover both the indexer and the query tool.
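The query tool prints chunks; getting an answer means folding those chunks into a prompt for the model. A minimal sketch of that last step (`build_prompt` is my own hypothetical helper, not part of the post's code; the chunk dicts mirror the metadata the indexer stores):

```python
# Fold retrieved chunks into a grounded prompt for the LLM.
# build_prompt is a hypothetical helper; each chunk dict carries the
# same fields the indexer stores as metadata (file, heading, text).

def build_prompt(question, chunks):
    """Format top-N retrieved chunks as context, then ask the question."""
    context = "\n\n".join(
        f"[{c['file']} > {c['heading']}]\n{c['text']}" for c in chunks
    )
    return (
        "Answer using ONLY the wiki excerpts below. "
        "Cite the file you used.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

chunks = [
    {"file": "wiki/concepts/react-pattern.md", "heading": "ReAct",
     "text": "ReAct interleaves reasoning steps with tool actions."},
]
print(build_prompt("How does the ReAct pattern work?", chunks))
```

Pass the result to whatever model you use; the "ONLY the wiki excerpts" instruction is what keeps the answer grounded in what retrieval found.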


The Difference is Night and Day

Before RAG, my 100-article wiki couldn't answer the ReAct vs Chain of Thought question cleanly — the model would blend five articles into a plausible-sounding mess.

After RAG, the same question retrieves exactly two chunks — the ReAct article and the Chain of Thought article — and the answer is precise.

The key insight: RAG doesn't replace the LLM's intelligence. It replaces the LLM's memory. Instead of trying to remember everything it skimmed, it gets exactly what it needs.

Token comparison

Approach                               Tokens loaded   Answer quality
Full wiki in context (50 articles)     ~25,000         Good
Full wiki in context (100 articles)    ~50,000         Degrading
Full wiki in context (200+ articles)   ~100,000+       Unreliable
RAG (top 5 chunks)                     ~2,500          Excellent

That's a 20-40x reduction in tokens with better results.


The expensive part was never generation. It was dragging the whole library into the room.


Keeping It Fresh: Auto-Reindex on Edit

A wiki is alive — you add articles, edit existing ones, reorganize sections. If your RAG index is stale, you get stale answers.

I set up a simple hook: every time I save a markdown file, it automatically re-indexes that file in ChromaDB. If you're using Claude Code, add this to your hooks:

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Write|Edit",
      "command": "python3 /path/to/reindex_file.py /path/to/vault \"$FILE_PATH\"",
      "timeout": 5000
    }]
  }
}

Now your RAG index stays in sync without you thinking about it.
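The hook calls a reindex_file.py that isn't shown above. Here's a minimal sketch of what mine looks like, assuming the same section chunker and md5 ID scheme as index_vault.py (delete-then-add keeps a single file's chunks in sync without rebuilding the whole index):

```python
#!/usr/bin/env python3
"""reindex_file.py — re-index one markdown file (sketch).

Assumes index_vault.py sits alongside this script so its chunker can be
reused, and uses ChromaDB's standard delete(where=...) / add() calls.
"""
import hashlib, os, sys

def chunk_id(rel_path, heading, start_line):
    """Stable chunk ID, identical to the one index_vault.py generates."""
    return hashlib.md5(
        f"{rel_path}::{heading}::{start_line}".encode()
    ).hexdigest()

def reindex(vault_path, filepath):
    from index_vault import extract_sections  # reuse the indexer's chunker
    import chromadb
    client = chromadb.PersistentClient(path="chroma_db")
    collection = client.get_collection("wiki")
    rel_path = os.path.relpath(filepath, vault_path)
    # Drop every chunk that came from this file, then re-add fresh ones.
    collection.delete(where={"file": rel_path})
    with open(filepath, errors="ignore") as fh:
        chunks = extract_sections(fh.read(), rel_path)
    if chunks:
        collection.add(
            documents=[c["text"] for c in chunks],
            ids=[chunk_id(rel_path, c["heading"], c["start_line"])
                 for c in chunks],
            metadatas=[{"file": c["file"], "heading": c["heading"],
                        "start_line": c["start_line"],
                        "end_line": c["end_line"]} for c in chunks],
        )

if __name__ == "__main__" and len(sys.argv) == 3:
    reindex(sys.argv[1], sys.argv[2])
```

Because the IDs are deterministic, re-running the hook on an unchanged file is harmless.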


What I Changed About My Workflow

Karpathy's original approach — dump documents, let the LLM compile — still works perfectly for the writing part. But the reading part needed to change.

Before:

  1. Add raw document to raw/
  2. Ask Claude to compile into wiki articles
  3. Ask questions by loading the entire wiki into context

After:

  1. Add raw document to raw/
  2. Ask Claude to compile into wiki articles
  3. Auto-reindex the vault into ChromaDB
  4. Ask questions using RAG to retrieve relevant chunks first

Step 3 is invisible (hook does it). Step 4 is just a different command. The workflow barely changed, but the quality at scale is dramatically better.


The Graph Gets More Valuable, Not Less

One thing I worried about: would RAG make Obsidian's graph view irrelevant? If I'm searching by meaning instead of following links, why bother with [[wiki links]]?

Turns out, they serve different purposes:

  • Graph view = exploring connections you didn't know existed ("oh, these two concepts are linked through this third one")
  • RAG search = finding exactly what you need when you know what you're looking for

The graph is for discovery. RAG is for retrieval. You need both.


When You Don't Need RAG

Let me save you some effort. You probably don't need RAG if:

  • Your wiki is under 50 articles
  • You're using a model with 200K+ context (Claude, Gemini)
  • Your articles are short (under 300 tokens each)
  • You mainly browse the wiki in Obsidian, not through LLM queries

The sweet spot where RAG becomes necessary: 80-100 articles, or whenever you notice the LLM's answers getting fuzzy.


The Real Lesson

The failure mode nobody talks about: LLM workflows fail first at retrieval, not generation.

When Claude gave me a wrong answer from my own wiki, the model wasn't broken. The retrieval was. The "intelligence" of a knowledge base isn't in the LLM — it's in what you choose to put in front of the LLM. Bigger context windows just let you hide this problem longer. Eventually you'll hit the wall.

Karpathy showed us how to build the wiki. He forgot to ship the search engine.

If you take one thing from this post: before you upgrade to a bigger model or try to fit more into context, look at what you're loading. Most of it is noise. RAG isn't a cleverness trick — it's just respecting your model's attention.


Have you built your own LLM wiki? I'd love to hear how you're handling scale — drop a comment below.


Appendix A: Full Implementation

index_vault.py

#!/usr/bin/env python3
"""index_vault.py — Index Obsidian vault into ChromaDB"""

import os, re, hashlib, chromadb

DB_PATH = "chroma_db"

def extract_sections(content, filepath):
    """Split markdown into section-level chunks."""
    chunks = []
    lines = content.split("\n")
    current_section = []
    current_heading = "intro"
    start_line = 1

    for i, line in enumerate(lines):
        if re.match(r'^#{1,3}\s+', line):
            if current_section:
                text = "\n".join(current_section).strip()
                if text:
                    chunks.append({
                        "text": text,
                        "heading": current_heading,
                        "file": filepath,
                        "start_line": start_line,
                        "end_line": i
                    })
            current_heading = re.sub(r'^#{1,3}\s+', '', line).strip()
            current_section = [line]
            start_line = i + 1
        else:
            current_section.append(line)

    if current_section:
        text = "\n".join(current_section).strip()
        if text:
            chunks.append({
                "text": text,
                "heading": current_heading,
                "file": filepath,
                "start_line": start_line,
                "end_line": len(lines)
            })
    return chunks


def index_vault(vault_path):
    client = chromadb.PersistentClient(path=DB_PATH)
    try:
        client.delete_collection("wiki")
    except Exception:
        pass  # collection may not exist yet on first run
    collection = client.create_collection("wiki")

    all_chunks = []
    for root, dirs, files in os.walk(vault_path):
        dirs[:] = [d for d in dirs if d not in {".obsidian", ".git"}]
        for f in files:
            if not f.endswith(".md"):
                continue
            filepath = os.path.join(root, f)
            rel_path = os.path.relpath(filepath, vault_path)
            with open(filepath, "r", errors="ignore") as fh:
                content = fh.read()
            chunks = extract_sections(content, rel_path)
            all_chunks.extend(chunks)

    print(f"Indexing {len(all_chunks)} chunks from {vault_path}")

    batch_size = 64
    for i in range(0, len(all_chunks), batch_size):
        batch = all_chunks[i:i+batch_size]
        texts = [c["text"] for c in batch]
        ids = [hashlib.md5(f"{c['file']}::{c['heading']}::{c['start_line']}".encode()).hexdigest() for c in batch]
        metadatas = [{"file": c["file"], "heading": c["heading"],
                      "start_line": c["start_line"], "end_line": c["end_line"]} for c in batch]
        collection.add(documents=texts, ids=ids, metadatas=metadatas)

    print(f"Done! {len(all_chunks)} chunks indexed.")


if __name__ == "__main__":
    import sys
    vault = sys.argv[1] if len(sys.argv) > 1 else "."
    index_vault(vault)

query_vault.py

#!/usr/bin/env python3
"""query_vault.py — Semantic search over your Obsidian wiki"""

import sys, chromadb

def query(question, n_results=5):
    client = chromadb.PersistentClient(path="chroma_db")
    collection = client.get_collection("wiki")
    results = collection.query(query_texts=[question], n_results=n_results)

    for i in range(len(results["ids"][0])):
        meta = results["metadatas"][0][i]
        dist = results["distances"][0][i]
        doc = results["documents"][0][i]
        print(f"\n{'='*60}")
        print(f"#{i+1} | {meta['file']} > {meta['heading']} | distance: {dist:.4f}")
        print(f"{'='*60}")
        lines = doc.split("\n")
        print("\n".join(lines[:20]))
        if len(lines) > 20:
            print(f"... ({len(lines)-20} more lines)")

if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit('usage: query_vault.py "question" [n_results]')
    question = sys.argv[1]
    n = int(sys.argv[2]) if len(sys.argv) > 2 else 5
    query(question, n)

Appendix B: The Setup Prompt

If you want Claude Code to set up this entire system for you, paste this prompt:

I want to set up an Obsidian knowledge base with RAG-powered search. Here's what I need:

1. Create a vault folder structure:
   - raw/ (source documents, never modified by LLM)
   - wiki/concepts/ (atomic concept articles, one per file)
   - wiki/topics/ (broader topic articles connecting concepts)
   - output/ (generated summaries, reports)
   - _index.md (master index of all articles)

2. Create a CLAUDE.md with these rules:
   - Articles use YAML frontmatter (title, created, updated, tags, sources)
   - Use [[wiki links]] for cross-referencing
   - Tags: [list your domains, e.g., ai, career, tools, security]
   - Keep concepts atomic, topics can synthesize
   - Update _index.md after changes

3. Create index_vault.py:
   - Uses ChromaDB + sentence-transformers
   - Splits markdown into section-level chunks
   - Stores file path, heading, line numbers as metadata
   - Skips .obsidian and .git folders

4. Create query_vault.py:
   - Semantic search over the indexed wiki
   - Returns top N results with file, heading, distance

5. Add a sample raw document and compile it into wiki articles with backlinks.

6. Index the vault and test a query.

Vault location: ~/obsidian-vault
