## The Inspiration
I saw Andrej Karpathy's viral post about using LLMs to build personal knowledge bases — no vector database, no chunking pipeline. Just markdown files, Obsidian, and Claude Code.
The core idea blew my mind:
- Create a folder with `raw/` and `wiki/` subfolders
- Drop in source documents, articles, transcripts
- Tell the LLM to ingest the raw files and build wiki pages with relationships, tags, and backlinks
I immediately thought: I need to build this, but better.
## What I Built
I took Karpathy's concept and extended it into a full-featured Personal Second Brain with several improvements:
### The Original Concept (Credit: Andrej Karpathy)
- Markdown-based wiki with a `raw/` → `wiki/` pipeline
- LLM reads source material and generates structured wiki pages
- Pages link to each other via `[[backlinks]]`
- Graph view in Obsidian shows connections
### My Improvements

#### 1. Multi-Format Ingestion
The original handles text/markdown. I added support for:
- PDF files → converted via Marker to markdown before processing
- YouTube transcripts → auto-fetched and ingested
- Web articles → fetched and cleaned automatically
- Any text-based format
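Conceptually, ingestion routes each dropped file to a converter before the LLM sees it. A minimal sketch of that dispatch, with hypothetical converter names (only Marker is actually named in this post):

```python
from pathlib import Path

# Hypothetical converter table: the names are illustrative stand-ins,
# not the project's actual tooling (only Marker is mentioned above).
CONVERTERS = {
    ".pdf": "marker",           # PDF -> markdown via Marker
    ".url": "web_fetch",        # saved link -> fetched + cleaned article
    ".vtt": "transcript_clean", # YouTube transcript -> plain markdown
}

def pick_converter(path: str) -> str:
    """Route a dropped file to a converter; any text-based format passes through."""
    return CONVERTERS.get(Path(path).suffix.lower(), "passthrough")

print(pick_converter("raw/pdfs/attention.pdf"))  # marker
print(pick_converter("raw/notes/todo.txt"))      # passthrough
```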
#### 2. Smart Duplicate Detection
Before creating a new wiki page, the system checks if a similar topic already exists. If so, it merges the new information instead of creating duplicates.
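In this setup the overlap check is done by the LLM itself, but the logic can be approximated in a few lines: compare the candidate title against existing page filenames and only create a new page if nothing is close enough. A sketch, assuming slug-style filenames:

```python
import difflib
from pathlib import Path

def find_similar_page(title: str, wiki_dir: str, threshold: float = 0.8):
    """Return the existing wiki page whose filename best matches `title`,
    or None if nothing clears the threshold. A cheap stand-in for the
    LLM's own overlap check; the threshold is an assumed tuning knob."""
    slug = title.lower().replace(" ", "-")
    best, best_score = None, 0.0
    for page in Path(wiki_dir).glob("*.md"):
        score = difflib.SequenceMatcher(None, slug, page.stem.lower()).ratio()
        if score > best_score:
            best, best_score = page, score
    return best if best_score >= threshold else None
```

If this returns a page, the new information gets merged into it rather than spawning a near-duplicate.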
#### 3. Auto-Generated Index

A master `_Index.md` file is automatically maintained with:
- Categorized links to all wiki pages
- Quick-reference descriptions
- Last-updated timestamps
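The index regeneration is straightforward to picture: walk the wiki folder and emit one link per page under a dated header. A minimal sketch (the real index also groups pages by category and adds descriptions):

```python
from datetime import date
from pathlib import Path

def build_index(wiki_dir: str) -> str:
    """Rebuild the _Index.md body: dated header plus one [[link]] per page.
    Categorized sections and per-page descriptions are omitted here."""
    lines = ["# Index", f"*Last updated: {date.today().isoformat()}*", ""]
    for page in sorted(Path(wiki_dir).glob("*.md")):
        if page.name == "_Index.md":  # don't index the index itself
            continue
        lines.append(f"- [[{page.stem}]]")
    return "\n".join(lines)
```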
#### 4. Relationship Mapping
Every wiki page includes:
- `related_topics` in frontmatter
- Inline `[[backlinks]]` to connected concepts
- Tags for cross-cutting themes
#### 5. Source Tracking
Each wiki page tracks which raw file(s) it was generated from, so you can always trace back to the original source.
## Project Structure
```
knowledge-base/
├── raw/                 # Drop files here
│   ├── articles/
│   ├── transcripts/
│   ├── notes/
│   └── pdfs/
├── wiki/                # Auto-generated wiki pages
│   ├── _Index.md        # Master index
│   ├── concept-name.md  # Individual pages
│   └── ...
├── .claude/
│   └── commands/
│       └── ingest.md    # The ingestion prompt
└── CLAUDE.md            # Project instructions
```
## How the Ingestion Works
The magic is in the ingestion prompt. When you run it, Claude Code:
1. Scans `raw/` for new/modified files
2. Reads each file and extracts key concepts, entities, and relationships
3. Checks existing wiki pages for overlap
4. Creates or updates wiki pages with proper frontmatter, backlinks, and tags
5. Updates the master index
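The first step, detecting new or modified files, could be sketched like this. The mtime manifest is my assumption about how change detection might work; the post doesn't specify a mechanism:

```python
import json
from pathlib import Path

def changed_files(raw_dir: str, manifest_path: str) -> list[str]:
    """List files under raw/ that are new or modified since the last run,
    tracked via a small mtime manifest (an assumed mechanism, not
    necessarily what Claude Code does under the hood)."""
    mp = Path(manifest_path)
    manifest = json.loads(mp.read_text()) if mp.exists() else {}
    changed = []
    for f in sorted(Path(raw_dir).rglob("*")):
        if f.is_file():
            key, mtime = str(f), f.stat().st_mtime
            if manifest.get(key) != mtime:  # unseen or touched since last run
                changed.append(key)
                manifest[key] = mtime
    mp.write_text(json.dumps(manifest))
    return changed
```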
Here's what a generated wiki page looks like:
```markdown
---
title: Transformer Architecture
tags: [deep-learning, nlp, attention]
source: raw/articles/attention-is-all-you-need.md
related_topics: [[Self-Attention]], [[BERT]], [[GPT]]
created: 2026-04-07
---

# Transformer Architecture

The transformer is a neural network architecture that relies
entirely on self-attention mechanisms...

## Key Concepts

- **Self-Attention** — see [[Self-Attention]]
- **Multi-Head Attention** — parallel attention layers
- **Positional Encoding** — since transformers have no recurrence

## Related

- [[BERT]] — encoder-only transformer
- [[GPT]] — decoder-only transformer
```
## The Results
After ingesting ~50 files:
- 44 interconnected wiki pages generated automatically
- Graph view in Obsidian shows meaningful clusters
- Token savings: ~90% reduction vs. feeding raw files to an LLM
- Retrieval: follows index → links instead of similarity search, so relationships are meaningful, not just "these chunks seem similar"
## Try It Yourself

### Prerequisites
- Claude Code (CLI)
- Obsidian (for viewing)
- A folder of documents you want to organize
### Quick Start
```bash
mkdir -p ~/knowledge-base/{raw,wiki}
cd ~/knowledge-base

# Drop your files into raw/
cp ~/Documents/interesting-article.md raw/

# Start Claude Code and ingest
claude
# Then type: "Ingest all files in raw/ and create wiki pages in wiki/"
```
## Key Takeaway
You don't need a vector database, embeddings pipeline, or RAG infrastructure to give AI persistent, organized memory. A folder of markdown files gets you surprisingly far.
The real insight from Karpathy's approach: let the LLM do what it's good at — reading, understanding, and organizing — while you use simple, human-readable files as the storage layer.
## Credits
Full credit to Andrej Karpathy for the original concept and inspiration. His viral post about LLM-powered knowledge bases sparked this project. I've simply extended the idea with multi-format support, duplicate detection, and automated indexing.
Have questions or built something similar? Drop a comment below!