Xunxing Mao
Practicing Karpathy's Personal Knowledge Base Method with a Git Repository

This article was originally published on maoxunxing.com. Follow me there for more on AI-assisted workflows, Hugo, and knowledge systems.


What Karpathy Shared

Andrej Karpathy recently shared a practical approach on X/Twitter and published a complete LLM Wiki Gist: using LLMs to build personal knowledge bases for research topics. The core workflow:

  1. Dump source files (articles, papers, screenshots) into a raw/ directory
  2. Use an LLM to "compile" them into structured Markdown knowledge entries
  3. Browse everything in Obsidian
  4. Query the knowledge base — the LLM searches and answers autonomously
  5. Periodically run LLM "health checks" to fix contradictions and fill gaps

His knowledge base has grown to ~100 entries and 400K words. No RAG needed — the LLM maintains indexes and summaries to handle all queries.

In one sentence: raw materials in, structured knowledge out, LLM does the heavy lifting.

Why Not Obsidian

Karpathy uses Obsidian as his viewer. But if you already have a Hugo blog repository, you don't need any extra software:

| Need | Obsidian Approach | Hugo Repo Approach |
| --- | --- | --- |
| View Markdown | Obsidian editor | hugo server -D local preview |
| Link knowledge | [[]] backlinks + graph | Hugo tags + Algolia search |
| Publish output | Requires extra export | Remove draft: true, push |
| Version control | Needs Obsidian Git plugin | It's already a Git repo |
| Multi-device sync | Obsidian Sync or iCloud | git pull |
| Search | Built-in Obsidian search | grep / Algolia / LLM |

The key advantage: knowledge refined into articles publishes directly — zero migration cost. One repo, full pipeline from collection to publication.

Three-Layer Knowledge Pipeline

Build three content tiers inside your repository:

content/
  raw/        <- Inbox: see something good, dump it here
  notes/      <- Knowledge base: LLM-compiled structured entries
  posts/      <- Blog: polished, published articles

raw/ — Zero-Friction Inbox

This is the system's entry point. Key principle: don't fuss over formatting or classification — just capture it.

Each raw entry is a Markdown file with frontmatter:

---
title: "Some article about RAG pipelines"
date: 2026-04-09
draft: true
tags: [AI, RAG]
source: "https://original-url"
---

Paste the original text / summary / screenshot / notes here. Whatever is fastest.

draft: true ensures these materials never appear on your live blog — only visible locally with hugo server -D.

notes/ — Compiled Knowledge Entries

When raw/ accumulates enough material on a topic, let the LLM:

  • Merge and synthesize related materials
  • Extract core insights
  • Add structured summaries
  • Tag with cross-references

This turns raw/ fragments into complete knowledge entries in notes/.
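For concreteness, a compiled entry's frontmatter might look like this. The field layout is my own illustration (the post doesn't prescribe one), and sources is a hypothetical field for linking back to the merged raw/ entries:

```yaml
# content/notes/ai-fundamentals/index.md -- illustrative layout
---
title: "AI Fundamentals"
date: 2026-04-09
draft: true            # stays local-only until promoted to posts/
tags: [AI, RAG]
sources:               # hypothetical field: the raw/ entries this merges
  - /raw/some-article-about-rag-pipelines/
---
```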

posts/ — Published Blog Articles

When a notes/ entry reaches sufficient depth and you're ready to write a full article, polish it, remove draft: true, and publish.

Flow is always one-directional: raw -> notes -> posts. Materials only get more refined, never regress.
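The promotion step itself is mechanical. A minimal shell sketch, assuming GNU sed, index.md page bundles, and an illustrative slug (the fixture note stands in for a real compiled entry):

```shell
#!/bin/sh
# Sketch of the one-way promotion step: notes/ -> posts/.
# The slug and paths are illustrative, not part of Karpathy's spec.
set -e
SLUG="ai-fundamentals"

# Demo fixture: a compiled note still marked as draft.
mkdir -p "content/notes/${SLUG}"
printf -- '---\ntitle: "AI Fundamentals"\ndraft: true\n---\nBody.\n' \
  > "content/notes/${SLUG}/index.md"

# Promote: copy into posts/ and flip the draft flag for publication.
mkdir -p "content/posts/${SLUG}"
cp "content/notes/${SLUG}/index.md" "content/posts/${SLUG}/index.md"
sed -i 's/^draft: true/draft: false/' "content/posts/${SLUG}/index.md"
```

Copying (rather than moving) keeps the note in place, so the knowledge base stays intact while the post evolves separately.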

Step-by-Step Setup

Step 1: Create the raw directory

mkdir -p content/raw

cat <<'EOF' > content/raw/_index.md
---
title: "Raw"
description: "Knowledge inbox"
draft: true
---
EOF

Step 2: Add a Hugo archetype template

Create archetypes/raw.md:

---
title: "{{ replace .Name "-" " " | title }}"
date: {{ .Date }}
draft: true
tags: []
source: ""
---

Now hugo new raw/topic-name/index.md auto-generates entries with the template.

Step 3: Configure Hugo permalinks

Add raw to the permalinks section in config.toml:

[permalinks]
raw = "/:slugorcontentbasename/"

Step 4: Start collecting

See a good article or have an idea? Create a raw entry immediately:

hugo new raw/interesting-topic/index.md

Paste in the content. No formatting needed, no perfection required — raw state is fine.

Compiling with LLM

This is the heart of Karpathy's method and the highest-value step.

Materials -> Knowledge Entries

Have the LLM read multiple related materials from raw/ and synthesize a notes/ entry:

"Read all raw entries tagged with AI, synthesize them into a structured knowledge entry under content/notes/ai-fundamentals/. Requirements: extract core concepts, add cross-references, cite sources."

Knowledge Entries -> Blog Posts

When a notes entry has accumulated enough depth:

"Based on the knowledge entry in content/notes/ai-fundamentals/, write a developer-facing blog post for content/posts/. Requirements: include opinions, real examples, and actionable advice."

Health Checks

Periodically audit the knowledge base:

"Scan all entries in content/raw/ and content/notes/. Find: 1) duplicate topics that should merge 2) entries missing tags 3) raw materials ready to compile into notes"
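Part of that audit can be done mechanically before spending LLM calls. A small sketch for the "entries missing tags" case, assuming tags sit on a single tags: [...] frontmatter line as in the archetype above (the two demo files are fixtures for illustration):

```shell
#!/bin/sh
# Mechanical pre-check: list entries whose frontmatter still has an
# empty tag list. Fixture files stand in for a real content/raw/ tree.
set -e
mkdir -p content/raw/tagged content/raw/untagged
printf -- '---\ntags: [AI]\n---\n' > content/raw/tagged/index.md
printf -- '---\ntags: []\n---\n'   > content/raw/untagged/index.md

# Every path printed here is a "missing tags" candidate for the LLM audit.
grep -rl '^tags: \[\]' content/raw
```

Feeding the LLM this shortlist instead of the whole tree keeps the audit prompt small and focused.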

Automate with a Qoder Skill

Take it further with a Qoder Skill — one sentence does it all:

  • /kb collect https://example.com/article — fetch and create a raw entry
  • /kb collect I learned today that LoRA fine-tuning's key is... — quick-capture a thought
  • /kb compile AI — compile AI-related raw materials into a notes entry
  • /kb preview — start local preview with all materials visible
  • /kb check — LLM health check
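Qoder Skills aren't required for the capture half. A plain-shell stand-in for the quick-capture form of /kb collect (the timestamp slug scheme and output format here are my own, not the skill's actual behavior):

```shell
#!/bin/sh
# Quick-capture stand-in for '/kb collect <thought>'. Creates a raw/
# entry matching the archetype; the "-inbox" slug is a made-up scheme.
set -e
SLUG="$(date +%Y%m%d-%H%M%S)-inbox"
DIR="content/raw/${SLUG}"
mkdir -p "$DIR"
cat > "${DIR}/index.md" <<EOF
---
title: "Inbox ${SLUG}"
date: $(date +%Y-%m-%d)
draft: true
tags: []
source: ""
---
$*
EOF
echo "captured -> ${DIR}/index.md"
```

Invoke it as, say, kb-collect "some quick thought"; a URL-clipping variant would wrap the same script with a downloader.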

Daily Workflow

The visual flow:

See a great article / Have an insight
       |
       v
  /kb collect "content"     <-- One sentence, zero friction
       |
       v
  content/raw/xxx/          <-- Auto-created, draft:true
       |
       v (accumulate enough)
  /kb compile "topic"       <-- LLM synthesizes
       |
       v
  content/notes/xxx/        <-- Structured knowledge entry
       |
       v (polish & refine)
  content/posts/xxx/        <-- Published blog post, draft removed
       |
       v
  git push -> live on the web

The entire process:

  • Collection: Zero friction, one sentence
  • Compilation: LLM handles the grunt work
  • Publishing: Remove draft: true, push to deploy
  • No extra software: Git + Hugo + LLM, that's it

Comparison with Karpathy's Original

| Aspect | Karpathy's Version | This Approach |
| --- | --- | --- |
| Storage | Standalone knowledge repo | Embedded in blog repo |
| Viewer | Obsidian | hugo server -D |
| Raw materials | raw/ directory | content/raw/ (draft) |
| Compilation | LLM generates .md | LLM generates notes/ |
| Output | Markdown/Marp/charts | Directly published as blog posts |
| Search | Custom search engine | grep + Algolia + LLM |
| Health checks | LLM audit | Same LLM audit |

The biggest difference: Karpathy's knowledge base is standalone — output requires manual migration. In this approach, the knowledge base and blog are unified. Collection to publication happens in one repository, with zero migration cost.

Summary

The core of Karpathy's method isn't about which tools you use — it's about establishing a "collect -> compile -> output" knowledge pipeline and letting the LLM handle compilation and maintenance.

If you already have a blog repository, you can implement this method right inside it: add content/raw/ as an inbox, use draft: true to control visibility, and let the LLM drive the flow from raw materials to knowledge to published articles.

No Obsidian. No Notion. No new software. One Git repo is your knowledge base.


If you're interested in AI-assisted development workflows, check out my AI Coding Playbook for tool selection and prompt templates.

I also wrote AI Rewriting Workflow on how knowledge workers can adapt when AI multiplies leverage.


References

Primary Sources

  • Andrej Karpathy's original post (X/Twitter thread on LLM knowledge bases): the original announcement describing the raw/ -> wiki compilation workflow.
  • LLM Wiki Gist (github.com/karpathy/442a6bf...): Karpathy's complete LLM Wiki pattern specification, defining the three-layer architecture (source materials, AI-generated wiki, configuration).



Felix Mao | maoxunxing.com | @maoxunxing
