## The Inspiration
I saw Andrej Karpathy's viral post about using LLMs to build personal knowledge bases — no vector database, no chunking pipeline. Just markdown files, Obsidian, and Claude Code.
The core idea blew my mind:
- Create a folder with `raw/` and `wiki/` subfolders
- Drop in source documents, articles, transcripts
- Tell the LLM to ingest the raw files and build wiki pages with relationships, tags, and backlinks
I immediately thought: I need to build this, but better.
## What I Built
I took Karpathy's concept and extended it into a full-featured Personal Second Brain with several improvements:
### The Original Concept (Credit: Andrej Karpathy)
- Markdown-based wiki with a `raw/` → `wiki/` pipeline
- LLM reads source material and generates structured wiki pages
- Pages link to each other via `[[backlinks]]`
- Graph view in Obsidian shows connections
### My Improvements

#### 1. Multi-Format Ingestion
The original handles text/markdown. I added support for:
- PDF files → converted via Marker to markdown before processing
- YouTube transcripts → auto-fetched and ingested
- Web articles → fetched and cleaned automatically
- Any text-based format
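Conceptually, ingestion routes each dropped file to a converter before the LLM sees it. A minimal sketch of that dispatch, with hypothetical converter names (only Marker is actually named in this post):

```python
from pathlib import Path

# Hypothetical converter table: the names are illustrative stand-ins,
# not the project's actual tooling (only Marker is mentioned above).
CONVERTERS = {
    ".pdf": "marker",           # PDF -> markdown via Marker
    ".url": "web_fetch",        # saved link -> fetched + cleaned article
    ".vtt": "transcript_clean", # YouTube transcript -> plain markdown
}

def pick_converter(path: str) -> str:
    """Route a dropped file to a converter; any text-based format passes through."""
    return CONVERTERS.get(Path(path).suffix.lower(), "passthrough")

print(pick_converter("raw/pdfs/attention.pdf"))  # marker
print(pick_converter("raw/notes/todo.txt"))      # passthrough
```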
#### 2. Smart Duplicate Detection
Before creating a new wiki page, the system checks if a similar topic already exists. If so, it merges the new information instead of creating duplicates.
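In this setup the overlap check is done by the LLM itself, but the logic can be approximated in a few lines: compare the candidate title against existing page filenames and only create a new page if nothing is close enough. A sketch, assuming slug-style filenames:

```python
import difflib
from pathlib import Path

def find_similar_page(title: str, wiki_dir: str, threshold: float = 0.8):
    """Return the existing wiki page whose filename best matches `title`,
    or None if nothing clears the threshold. A cheap stand-in for the
    LLM's own overlap check; the threshold is an assumed tuning knob."""
    slug = title.lower().replace(" ", "-")
    best, best_score = None, 0.0
    for page in Path(wiki_dir).glob("*.md"):
        score = difflib.SequenceMatcher(None, slug, page.stem.lower()).ratio()
        if score > best_score:
            best, best_score = page, score
    return best if best_score >= threshold else None
```

If this returns a page, the new information gets merged into it rather than spawning a near-duplicate.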
#### 3. Auto-Generated Index

A master `_Index.md` file is automatically maintained with:
- Categorized links to all wiki pages
- Quick-reference descriptions
- Last-updated timestamps
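The index regeneration is straightforward to picture: walk the wiki folder and emit one link per page under a dated header. A minimal sketch (the real index also groups pages by category and adds descriptions):

```python
from datetime import date
from pathlib import Path

def build_index(wiki_dir: str) -> str:
    """Rebuild the _Index.md body: dated header plus one [[link]] per page.
    Categorized sections and per-page descriptions are omitted here."""
    lines = ["# Index", f"*Last updated: {date.today().isoformat()}*", ""]
    for page in sorted(Path(wiki_dir).glob("*.md")):
        if page.name == "_Index.md":  # don't index the index itself
            continue
        lines.append(f"- [[{page.stem}]]")
    return "\n".join(lines)
```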
#### 4. Relationship Mapping
Every wiki page includes:
- `related_topics` in frontmatter
- Inline `[[backlinks]]` to connected concepts
- Tags for cross-cutting themes
#### 5. Source Tracking
Each wiki page tracks which raw file(s) it was generated from, so you can always trace back to the original source.
## Project Structure
```
knowledge-base/
├── raw/                 # Drop files here
│   ├── articles/
│   ├── transcripts/
│   ├── notes/
│   └── pdfs/
├── wiki/                # Auto-generated wiki pages
│   ├── _Index.md        # Master index
│   ├── concept-name.md  # Individual pages
│   └── ...
├── .claude/
│   └── commands/
│       └── ingest.md    # The ingestion prompt
└── CLAUDE.md            # Project instructions
```
## How the Ingestion Works
The magic is in the ingestion prompt. When you run it, Claude Code:
1. Scans `raw/` for new/modified files
2. Reads each file and extracts key concepts, entities, and relationships
3. Checks existing wiki pages for overlap
4. Creates or updates wiki pages with proper frontmatter, backlinks, and tags
5. Updates the master index
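The first step, detecting new or modified files, could be sketched like this. The mtime manifest is my assumption about how change detection might work; the post doesn't specify a mechanism:

```python
import json
from pathlib import Path

def changed_files(raw_dir: str, manifest_path: str) -> list[str]:
    """List files under raw/ that are new or modified since the last run,
    tracked via a small mtime manifest (an assumed mechanism, not
    necessarily what Claude Code does under the hood)."""
    mp = Path(manifest_path)
    manifest = json.loads(mp.read_text()) if mp.exists() else {}
    changed = []
    for f in sorted(Path(raw_dir).rglob("*")):
        if f.is_file():
            key, mtime = str(f), f.stat().st_mtime
            if manifest.get(key) != mtime:  # unseen or touched since last run
                changed.append(key)
                manifest[key] = mtime
    mp.write_text(json.dumps(manifest))
    return changed
```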
Here's what a generated wiki page looks like:
```markdown
---
title: Transformer Architecture
tags: [deep-learning, nlp, attention]
source: raw/articles/attention-is-all-you-need.md
related_topics: [[Self-Attention]], [[BERT]], [[GPT]]
created: 2026-04-07
---

# Transformer Architecture

The transformer is a neural network architecture that relies
entirely on self-attention mechanisms...

## Key Concepts

- **Self-Attention** — see [[Self-Attention]]
- **Multi-Head Attention** — parallel attention layers
- **Positional Encoding** — since transformers have no recurrence

## Related

- [[BERT]] — encoder-only transformer
- [[GPT]] — decoder-only transformer
```
## The Results
After ingesting ~50 files:
- 44 interconnected wiki pages generated automatically
- Graph view in Obsidian shows meaningful clusters
- Token savings: ~90% reduction vs. feeding raw files to an LLM
- Retrieval: follows index → links instead of similarity search, so relationships are meaningful, not just "these chunks seem similar"
## Try It Yourself

### Prerequisites
- Claude Code (CLI)
- Obsidian (for viewing)
- A folder of documents you want to organize
### Quick Start
```bash
mkdir -p ~/knowledge-base/{raw,wiki}
cd ~/knowledge-base

# Drop your files into raw/
cp ~/Documents/interesting-article.md raw/

# Start Claude Code and ingest
claude
# Then type: "Ingest all files in raw/ and create wiki pages in wiki/"
```
## Key Takeaway
You don't need a vector database, embeddings pipeline, or RAG infrastructure to give AI persistent, organized memory. A folder of markdown files gets you surprisingly far.
The real insight from Karpathy's approach: let the LLM do what it's good at — reading, understanding, and organizing — while you use simple, human-readable files as the storage layer.
## Credits
Full credit to Andrej Karpathy for the original concept and inspiration. His viral post about LLM-powered knowledge bases sparked this project. I've simply extended the idea with multi-format support, duplicate detection, and automated indexing.
Have questions or built something similar? Drop a comment below!