Zafer Dace

Posted on Apr 7

Build Your Own AI-Powered Knowledge Base with LLMs and Obsidian

#ai #llm #productivity #tutorial

A practical guide to Andrej Karpathy's approach for turning raw research into a living, LLM-maintained wiki.

Last week, Andrej Karpathy shared a fascinating workflow on X: instead of using LLMs primarily for code, he's been using them to build and maintain personal knowledge bases. Raw documents go in, and the LLM compiles them into a structured markdown wiki — complete with summaries, backlinks, concept articles, and cross-references.

The idea is simple but powerful: you rarely touch the wiki yourself. The LLM writes it, maintains it, and answers questions from it.

I loved this concept and decided to build my own version. In this post, I'll walk you through exactly how to set it up using Obsidian as your viewer and Claude Code (or any LLM coding agent) as the engine that manages everything.

The Architecture

The system has four layers:

There's no fancy integration or plugin needed. Obsidian and Claude Code simply share the same directory. Obsidian watches the files and renders them beautifully. Claude Code reads and writes them. That's it.

Step 1: Set Up the Vault

Create a folder structure for your knowledge base:

mkdir -p ~/knowledge-base/{raw,wiki/concepts,wiki/topics,output}
cd ~/knowledge-base

Create a CLAUDE.md file at the root — this tells Claude Code how to behave in this project:

# Knowledge Base Instructions

## Structure
- `raw/` — Source documents (articles, papers, notes). Never modify these.
- `wiki/` — LLM-maintained wiki. All articles are markdown with YAML frontmatter.
- `wiki/concepts/` — Individual concept articles.
- `wiki/topics/` — Broader topic overviews.
- `output/` — Generated outputs (comparisons, slides, charts).
- `_index.md` — Master index of all wiki articles with one-line summaries.

## Article Format
Every wiki article must have:
- YAML frontmatter with: title, tags, sources (list of raw/ files), last_updated
- A brief summary (2-3 sentences) at the top
- Backlinks to related concepts using [[wiki links]]
- Sources section at the bottom linking to raw/ documents

## Rules
- Always update `_index.md` when creating or modifying articles.
- Use [[double bracket]] links for cross-references.
- Never delete or modify files in `raw/`.
- When adding new information, cite the source file from `raw/`.
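
The article format in CLAUDE.md above is something you can also spot-check yourself, without involving the LLM. Here is a minimal Python sketch, assuming frontmatter is delimited by `---` lines and that keys sit at the start of a line. The names `REQUIRED_KEYS`, `frontmatter_keys`, and `missing_keys` are my own, not part of Claude Code or Obsidian, and the parser is deliberately naive (it is not a full YAML parser).

```python
from pathlib import Path

# Keys the CLAUDE.md "Article Format" section asks for.
REQUIRED_KEYS = ("title", "tags", "sources", "last_updated")


def frontmatter_keys(text: str) -> set[str]:
    """Return top-level keys found in a leading '---' frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()  # no frontmatter at all
    keys: set[str] = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys  # closing delimiter reached
        # Naive rule: a top-level key starts in column 0 and has a colon.
        is_top_level = line and not line[0].isspace() and line[0] not in "-#"
        if is_top_level and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return set()  # frontmatter never closed, so treat it as missing


def missing_keys(text: str) -> list[str]:
    """List the required keys an article does not declare."""
    found = frontmatter_keys(text)
    return [key for key in REQUIRED_KEYS if key not in found]


def check_wiki(wiki_dir: Path) -> dict[str, list[str]]:
    """Map each article path to its missing keys (only articles with gaps)."""
    report = {}
    for path in sorted(wiki_dir.rglob("*.md")):
        gaps = missing_keys(path.read_text(encoding="utf-8"))
        if gaps:
            report[path.as_posix()] = gaps
    return report
```

Calling `check_wiki(Path("wiki"))` from the vault root returns an empty dict when every article carries all four keys.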

Now open this folder as an Obsidian vault:

1. Open Obsidian
2. "Open folder as vault" → select ~/knowledge-base
3. Done. Obsidian is now your viewer

Step 2: Collect Raw Data

This is the "data ingest" phase. You have several options:

Obsidian Web Clipper (Recommended)

Install the Obsidian Web Clipper browser extension. Configure it to save clipped articles into your raw/ folder. One click saves any web article as clean markdown.

Manual Copy

For PDFs, papers, or notes — just drop markdown files into raw/:

---
title: "Attention Is All You Need"
source: https://arxiv.org/abs/1706.03762
type: paper
date_added: 2025-04-07
---

# Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks...
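
If you add raw documents by hand often, a small helper can stamp out the frontmatter shown above. This is only a sketch: `raw_note` and `slug` are hypothetical helper names, the field set simply mirrors the example, and it assumes the title contains no double quotes (which would need escaping in YAML).

```python
import re
from datetime import date
from pathlib import Path


def slug(title: str) -> str:
    """Turn a title into a lowercase, hyphenated file name stem."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def raw_note(title: str, source: str, kind: str, body: str, added: date) -> str:
    """Render a raw/ document with the same frontmatter fields as the example."""
    frontmatter = [
        "---",
        f'title: "{title}"',  # assumes no double quotes inside the title
        f"source: {source}",
        f"type: {kind}",
        f"date_added: {added.isoformat()}",
        "---",
    ]
    return "\n".join(frontmatter) + f"\n\n# {title}\n\n{body.strip()}\n"


def save_raw_note(raw_dir: Path, title: str, source: str, kind: str, body: str) -> Path:
    """Write the note into raw/ and return its path; refuses to overwrite."""
    path = raw_dir / f"{slug(title)}.md"
    if path.exists():
        raise FileExistsError(path)  # raw/ files should never be modified
    path.write_text(raw_note(title, source, kind, body, date.today()), encoding="utf-8")
    return path
```

Refusing to overwrite is a design choice that matches the "never modify `raw/`" rule in CLAUDE.md.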

Images

Save related images into raw/images/ and reference them in your markdown. Obsidian renders them inline, and Claude Code can analyze them too.

Step 3: Compile the Wiki

This is where the magic happens. Open Claude Code in your knowledge base directory and ask it to compile:

Read all files in raw/ and compile a wiki:
- Create concept articles in wiki/concepts/ for each key concept
- Create topic overviews in wiki/topics/ for broader themes
- Add backlinks between related articles
- Update _index.md with all articles and one-line summaries

Claude Code will:

1. Read every document in raw/
2. Identify key concepts and themes
3. Create structured markdown articles with frontmatter
4. Cross-link everything with [[wiki links]]
5. Build a master index

In Obsidian's graph view, the result shows up as a connected web of knowledge that you never had to organize manually.

Incremental Updates

When you add new documents to raw/, you don't need to rebuild everything:

I added 3 new articles to raw/. Read them and integrate into the existing wiki.
Update existing articles if there's new info, create new ones if needed,
and update _index.md.

The LLM reads the new sources, figures out what's new vs. what's already covered, and surgically updates the wiki.
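
If you lose track of which raw documents have already been integrated, you can work it out from the citations. The sketch below assumes wiki articles cite sources by their path relative to the vault root (for example `raw/attention.md`), as the CLAUDE.md rules ask. The function names are hypothetical, and the substring match is intentionally simple.

```python
from pathlib import Path


def uncited(raw_files: list[str], wiki_texts: list[str]) -> list[str]:
    """Return raw/ paths that no wiki article mentions anywhere in its text."""
    return sorted(
        path for path in raw_files
        if not any(path in text for text in wiki_texts)
    )


def scan(root: Path) -> list[str]:
    """Collect raw/ and wiki/ markdown under the vault root and compare them."""
    raw_files = [p.relative_to(root).as_posix() for p in (root / "raw").rglob("*.md")]
    wiki_texts = [p.read_text(encoding="utf-8") for p in (root / "wiki").rglob("*.md")]
    return uncited(raw_files, wiki_texts)
```

You could paste the output of `scan(Path.home() / "knowledge-base")` straight into the incremental-update prompt instead of saying "3 new articles".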

Step 4: Ask Questions

Once your wiki reaches a decent size, you can query it like a research assistant:

Based on the wiki, compare the training approaches of GPT-4 and Llama 3.
Write the comparison as output/gpt4-vs-llama3.md with a summary table.

What are the main unsolved problems in RLHF according to our sources?
Write a brief report to output/rlhf-challenges.md

Create a Marp slide deck summarizing the key concepts in wiki/topics/
Save as output/overview-slides.md

The LLM reads the relevant wiki articles, synthesizes an answer, and writes it as a markdown file — which you immediately see in Obsidian.

Pro tip: File the best outputs back into the wiki. Your explorations compound over time.

Step 5: Lint and Maintain

As Karpathy mentioned, you can run "health checks" on your wiki:

Scan the entire wiki for:
- Inconsistent information between articles
- Missing backlinks (concepts mentioned but not linked)
- Articles that reference deleted or missing sources
- Stub articles that need expansion
Report findings in output/health-check.md

Look at the wiki and suggest 5 new article topics that would
fill gaps in our coverage. Explain why each would be valuable.

This is surprisingly useful — the LLM often finds connections and gaps you wouldn't notice yourself.
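
Some of these checks are mechanical enough to run without the LLM. As a rough sketch, the code below finds [[wiki links]] that point at articles that don't exist. It assumes links resolve by note name (the file name without `.md`), and it handles the `[[Note#Heading]]` and `[[Note|alias]]` forms. `link_targets`, `broken_links`, and `load_articles` are names I made up for illustration.

```python
import re
from pathlib import Path

# Captures the note name, ignoring an optional #heading and |alias.
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def link_targets(text: str) -> set[str]:
    """Return the set of note names an article links to."""
    return {match.group(1).strip() for match in WIKI_LINK.finditer(text)}


def broken_links(articles: dict[str, str]) -> dict[str, list[str]]:
    """Map article name -> linked names that have no matching article."""
    known = set(articles)
    report = {}
    for name, text in articles.items():
        missing = sorted(link_targets(text) - known)
        if missing:
            report[name] = missing
    return report


def load_articles(wiki_dir: Path) -> dict[str, str]:
    """Read every markdown file under wiki/, keyed by file name stem."""
    return {p.stem: p.read_text(encoding="utf-8") for p in wiki_dir.rglob("*.md")}
```

Checks like "inconsistent information between articles" still need the LLM; this only covers the structural part.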

Tips and Tricks

Use CLAUDE.md Wisely

The CLAUDE.md file is your control plane. As your wiki grows, refine the instructions. Add domain-specific terminology, preferred article structure, or naming conventions.

Keep _index.md Updated

This is the LLM's "table of contents." When the wiki gets large (100+ articles), the LLM reads _index.md first to understand what exists before diving into specific files. Keep it clean and current.
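
To check whether the index has drifted, you can regenerate a draft from the articles and diff it against the real one. This is a sketch under assumptions: the index layout (`- [[Name]]: summary`) is my guess at a reasonable format rather than anything Karpathy or Claude Code prescribes, and the "summary" is simply the first non-heading line after the frontmatter.

```python
def first_summary_line(text: str) -> str:
    """Return the first non-empty, non-heading line after any frontmatter."""
    lines = text.splitlines()
    start = 0
    if lines and lines[0].strip() == "---":
        # Skip the frontmatter block up to its closing delimiter.
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                start = i + 1
                break
    for line in lines[start:]:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped
    return ""


def build_index(articles: dict[str, str]) -> str:
    """Render an index with one wiki-linked line per article, sorted by name."""
    rows = [
        f"- [[{name}]]: {first_summary_line(text)}"
        for name, text in sorted(articles.items())
    ]
    return "# Index\n\n" + "\n".join(rows) + "\n"
```

Here `articles` maps each note name to its file contents. The LLM's own summaries will usually be better, so treat the generated draft as a completeness check rather than a replacement.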

Obsidian Graph View

Enable Obsidian's graph view to visualize connections. The [[wiki links]] that the LLM creates show up as edges in the graph. It's a great way to spot isolated articles or missing connections.
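
The same "isolated article" check can be approximated in a few lines of Python. This sketch treats an article as isolated when it has no links to or from any other existing article. That is an assumption on my part: links to missing notes and self-links are ignored here, so the result may differ slightly from what the graph view displays. The function name `isolated` is hypothetical.

```python
import re

# Captures just the note name at the start of a [[wiki link]].
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def isolated(articles: dict[str, str]) -> list[str]:
    """Return article names with no inbound or outbound links to other articles."""
    known = set(articles)
    linked_to: set[str] = set()
    has_outbound: set[str] = set()
    for name, text in articles.items():
        targets = {m.group(1).strip() for m in WIKI_LINK.finditer(text)}
        targets = (targets - {name}) & known  # drop self-links and missing notes
        if targets:
            has_outbound.add(name)
        linked_to |= targets
    return sorted(n for n in articles if n not in linked_to and n not in has_outbound)
```

Here `articles` maps each note name (file name without `.md`) to its contents. Any names it returns are good candidates for a "find connections for these articles" prompt.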

Marp for Presentations

Install the Marp plugin for Obsidian to render slide decks. Ask Claude Code to generate presentations in Marp format — instant slides from your knowledge base.

Scale Considerations

Karpathy reports his wiki works well at ~100 articles and ~400K words without needing RAG. The key is the _index.md with brief summaries — the LLM reads this first, then dives into relevant articles. At much larger scales, you might need a search tool or embeddings-based retrieval.
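
To see where your own wiki sits relative to those numbers, and to get a feel for what a first "search tool" might look like, here is a small sketch. It counts articles and words, and ranks `_index.md` lines by how often the query terms appear. This is plain keyword matching, not embeddings-based retrieval, and both function names are hypothetical.

```python
def wiki_stats(articles: dict[str, str]) -> tuple[int, int]:
    """Return (article count, total whitespace-separated word count)."""
    return len(articles), sum(len(text.split()) for text in articles.values())


def search_index(index_text: str, query: str, limit: int = 5) -> list[str]:
    """Rank index lines by how many times the query terms occur in them."""
    terms = query.lower().split()
    scored = []
    for line in index_text.splitlines():
        lowered = line.lower()
        score = sum(lowered.count(term) for term in terms)
        if score:
            scored.append((score, line))
    # Highest score first; ties fall back to alphabetical order for stability.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [line for _, line in scored[:limit]]
```

Something like this only narrows down which articles to hand to the LLM. It does not replace the LLM reading them.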

Why This Works

The insight behind this approach is subtle: LLMs tend to be more diligent about maintaining structured knowledge than we are. They rarely forget to add backlinks. They rarely leave articles half-finished (unless you tell them to). They can read 50 articles and produce a consistent summary faster than we can read 5. They still make mistakes, which is why the health checks in Step 5 matter.

You bring the judgment — which sources to add, which questions to ask, which outputs to keep. The LLM handles the grunt work of organizing, linking, summarizing, and maintaining.

As Karpathy put it:

"You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts."

Until that product exists, Obsidian + Claude Code gets you 90% of the way there today, with tools you might already have. Obsidian is free for personal use; Claude Code requires a paid Claude plan or API credits.

Getting Started

1. Create a folder, add CLAUDE.md with your wiki rules
2. Open it as an Obsidian vault
3. Clip or drop 5-10 articles into raw/
4. Run claude in the folder and ask it to compile
5. Explore the result in Obsidian
6. Start asking questions

The beauty of this system is that it compounds. Every article you add, every question you ask, every health check you run — they all make the knowledge base richer and more connected. After a few weeks, you'll have a personal research assistant that actually knows your domain.

Credit: This approach was originally described by Andrej Karpathy. This post is a practical implementation guide based on his concept.
