How I Turned 6 Cognitive Science Principles Into an AI Agent That Builds Obsidian Vaults

Eason XUE — Wed, 20 May 2026 01:51:19 +0000

In 30 years, computers got 100x faster. Our learning method stayed the same: read → highlight → re-read → forget → re-read again.

I've spoken with EMBA students who have to digest hundreds of pages of material every month: textbooks, case studies, industry reports. They have demanding jobs, families, and a finite number of hours. The ones who fall behind aren't less capable. They just don't have a system.

A few months ago I found a way to break that cycle. It combines a century-old insight about how knowledge works with modern AI. I open-sourced the result as VaultForge.

This is the story of what it does and how it's built.

The Method Was Always There — But It Had Two Bottlenecks

Management consultants face the same problem my audience does. When a consulting firm lands a client in an unfamiliar industry, they have one week to get smart enough to talk to senior executives. They can't read everything. So they don't try.

They do three things:

Build a framework first: map out which dimensions matter before reading
Define boundaries: decide what they don't need to know, then search actively around a few core questions, connecting what they learn to real-world problems
Find experts: get someone to pinpoint the core consensus, debates, and unknowns in 60 minutes, so they aren't limited to readily available knowledge but start from the industry frontier, where they can explore its direction and boundaries

Cognitive science backs this up. Passive re-reading creates familiarity, not memory. Active retrieval (explaining, questioning, connecting) is far more effective, by some estimates 3-5x.

And then there's Niklas Luhmann. The German sociologist wrote 70 books and 400+ papers using 90,000 index cards. His secret wasn't intelligence. It was connection. Every card linked to others. Ideas fermented over years. He said he didn't think first and then write — he wrote to discover what he was thinking.

That's the Zettelkasten method, which digital tools like Obsidian now support.

But here's the problem. The consulting method requires access to experts and years of framework-building practice. Luhmann's system requires a lifetime of manual card maintenance. Both are inaccessible to normal people with normal schedules.

Two bottlenecks that AI could break.

The Six-Phase Pipeline

VaultForge is an AI agent skill that loads into Claude Code, Codex, or Cursor. Give it a PDF or Markdown file, and it produces a complete Obsidian vault. The engineering is organized into six phases, preceded by a vault scan (Phase 0):

Phase 0: Vault Scan and Mode Selection

When you trigger VaultForge on a folder that already contains VaultForge-generated notes, the skill auto-detects them via a vf: true field in their frontmatter. It counts notes by status (pristine: auto-updatable; user-modified: hands-off; locked: frozen by the user) and offers three paths:

Incremental update — add new notes only, never touch what you've edited
Full regenerate — rebuild from scratch
Skip — do nothing

New files are pre-selected; previously processed files default to unselected. After processing, an Update Report lists what was created, which wikilinks were established, and which existing notes have fresh source content available for refresh.
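A minimal sketch of the Phase 0 scan, assuming notes carry flat key: value YAML frontmatter and that the pristine / user-modified / locked status lives in a field named vf_status. That field name and the helper names are hypothetical; VaultForge's actual frontmatter schema may differ.

```python
from collections import Counter
from pathlib import Path


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse flat 'key: value' pairs from a leading '---' frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter of the frontmatter
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def scan_vault(vault: Path) -> Counter:
    """Count VaultForge-generated notes (vf: true) by status.

    Assumption: status is stored in a hypothetical 'vf_status' field with values
    such as 'pristine', 'user-modified', or 'locked'; missing values count as 'unknown'.
    """
    counts: Counter = Counter()
    for note in vault.rglob("*.md"):
        meta = read_frontmatter(note.read_text(encoding="utf-8"))
        if meta.get("vf") == "true":
            counts[meta.get("vf_status", "unknown")] += 1
    return counts
```

In this sketch, a non-empty counter is what would trigger the incremental / full / skip choice; hand-written notes without vf: true are never counted.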

Phase 1: Roadmap Generation

The agent reads the full material and produces two outputs: an outline version (H2/H3 hierarchy with bullet points) and a detailed version with cases, citations, and source page ranges. This replaces the "build a framework first" consulting step.

Every H2 category must have at least 2 H3 topics. This prevents hierarchy from being a facade — if you can't break a category into at least two subtopics, your schema isn't well-formed.
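The H2 ≥ 2 H3 rule can be checked deterministically against the roadmap file. A sketch, assuming the roadmap uses ## for categories and ### for topics, one heading per line; the function name is illustrative, not VaultForge's.

```python
def branching_violations(roadmap_md: str, minimum: int = 2) -> dict[str, int]:
    """Return each H2 category that has fewer than `minimum` H3 topics.

    Assumes '## ' marks a category and '### ' marks a topic.
    """
    h3_counts: dict[str, int] = {}
    current = None
    for line in roadmap_md.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            h3_counts[current] = 0
        elif line.startswith("### ") and current is not None:
            h3_counts[current] += 1
    return {h2: n for h2, n in h3_counts.items() if n < minimum}
```

An empty result means the schema passes; any returned category needs splitting or merging before Phase 2.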

Phase 2: File Structure

Batch-creates folders, Maps of Content (MOC) pages, and blank atomic note stubs. MOCs sit at the H3 level, so every cluster of notes has an index page.
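A sketch of Phase 2 under assumed conventions: the outline arrives as a nested dict (H2 → H3 → note titles), each H3 becomes a folder holding a MOC page named "MOC - <topic>.md", and each stub starts with status: draft. The folder layout, file naming, and frontmatter fields are all hypothetical.

```python
from pathlib import Path

# Hypothetical outline shape: {H2 category: {H3 topic: [atomic note titles]}}
Outline = dict[str, dict[str, list[str]]]


def build_structure(vault: Path, outline: Outline) -> list[Path]:
    """Create H2/H3 folders, one MOC per H3 topic, and blank note stubs.

    Existing files are left untouched, so a rerun does not clobber user edits.
    Returns the paths that were newly created.
    """
    created: list[Path] = []
    for h2, topics in outline.items():
        for h3, titles in topics.items():
            folder = vault / h2 / h3
            folder.mkdir(parents=True, exist_ok=True)
            moc = folder / f"MOC - {h3}.md"
            if not moc.exists():
                links = "\n".join(f"- [[{title}]]" for title in titles)
                moc.write_text(f"---\nvf: true\n---\n# {h3}\n\n{links}\n", encoding="utf-8")
                created.append(moc)
            for title in titles:
                stub = folder / f"{title}.md"
                if not stub.exists():
                    stub.write_text(
                        f"---\nvf: true\nstatus: draft\n---\n# {title}\n", encoding="utf-8"
                    )
                    created.append(stub)
    return created
```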

Phase 3: Parallel Content Fill (The Interesting Part)

This is where the engineering gets specific. Each atomic note goes through a five-state state machine:

draft → filling → filled → reviewed
                  ↘ needs_review

The filling state covers two failure modes: a crash before the .md.tmp is written (no partial file), and a crash after the rename but before the metadata update (the .md exists, but its status is still filling). Recovery handles the two differently.
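The state machine and the two recovery paths can be sketched in Python. This assumes the fill agent renames <note>.md.tmp over the blank stub and that a stub has no body text beyond its heading; both assumptions and all function names are illustrative, not taken from the repo. As a conservative extra case, the sketch also discards any leftover .md.tmp.

```python
from enum import Enum
from pathlib import Path


class NoteState(Enum):
    DRAFT = "draft"
    FILLING = "filling"
    FILLED = "filled"
    REVIEWED = "reviewed"
    NEEDS_REVIEW = "needs_review"


# Allowed transitions: draft -> filling -> filled -> reviewed, or filled -> needs_review.
TRANSITIONS = {
    NoteState.DRAFT: {NoteState.FILLING},
    NoteState.FILLING: {NoteState.FILLED},
    NoteState.FILLED: {NoteState.REVIEWED, NoteState.NEEDS_REVIEW},
    NoteState.REVIEWED: set(),
    NoteState.NEEDS_REVIEW: set(),
}


def advance(current: NoteState, new: NoteState) -> NoteState:
    """Move to `new`, rejecting transitions the state machine does not allow."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


def body_text(markdown: str) -> str:
    """Return note text minus a leading '---' frontmatter block and heading lines."""
    lines = markdown.splitlines()
    if lines and lines[0].strip() == "---":
        closing = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
        if closing is not None:
            lines = lines[closing + 1:]
    return "\n".join(line for line in lines if not line.startswith("#")).strip()


def recover_filling(note: Path) -> str:
    """Decide how to resume a note whose recorded state is still 'filling'."""
    tmp = note.with_name(note.name + ".tmp")
    if tmp.exists():
        # Leftover temp file: discard it and regenerate (conservative choice).
        tmp.unlink()
        return "refill"
    if note.exists() and body_text(note.read_text(encoding="utf-8")):
        # Rename finished but the status update did not: verify, then mark filled.
        return "mark_filled"
    # Crash happened before any temp file existed: nothing to salvage.
    return "refill"
```

A caller that gets "mark_filled" would then call advance(NoteState.FILLING, NoteState.FILLED) and record the new status.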

A Python script (context-extractor.py) pre-fetches source paragraphs per source_range. Instead of passing the full PDF to every agent, each note agent receives only the pages it needs. Token consumption drops 5x or more.
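A sketch of per-note context extraction, assuming source_range is a 1-based, inclusive page range string such as "12-15". This is not context-extractor.py itself; the function names and the range format are hypothetical. Text extraction uses pypdf's PdfReader, imported lazily so the range logic works without pypdf installed.

```python
def parse_source_range(source_range: str) -> range:
    """Convert a 1-based inclusive page range like '12-15' or '7' to 0-based indices.

    The range format is an assumption made for this sketch.
    """
    start_text, _, end_text = source_range.partition("-")
    start = int(start_text)
    end = int(end_text) if end_text.strip() else start
    if start < 1 or end < start:
        raise ValueError(f"invalid source_range: {source_range!r}")
    return range(start - 1, end)


def load_page_texts(pdf_path: str) -> list[str]:
    """Extract text from every page of a PDF with pypdf (pip install pypdf)."""
    from pypdf import PdfReader  # lazy import: only needed when reading a real PDF

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def context_for_note(page_texts: list[str], source_range: str) -> str:
    """Return only the pages a single note needs, instead of the whole document."""
    return "\n\n".join(
        page_texts[i] for i in parse_source_range(source_range) if i < len(page_texts)
    )
```

Usage would be along the lines of pages = load_page_texts("textbook.pdf") once per document, then context_for_note(pages, "12-15") per note, so each note agent sees a few pages rather than the full PDF.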

After filling, every note is reviewed by a separate agent. If it fails quality thresholds — 200+ words for core concepts, 150+ for case studies, 50-150 for original citations — it gets up to 2 repair attempts before being flagged as needs_review.
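The length part of that review, plus the two-repair budget, might look like the sketch below. The note-type keys are hypothetical labels, repair stands in for whatever agent call rewrites a note, and a real reviewer would check far more than word counts.

```python
from collections.abc import Callable

# Word-count thresholds quoted in the article: (minimum, maximum or None).
# The note-type keys are hypothetical labels.
THRESHOLDS = {
    "core_concept": (200, None),
    "case_study": (150, None),
    "original_citation": (50, 150),
}
MAX_REPAIRS = 2


def passes_length(note_type: str, text: str) -> bool:
    minimum, maximum = THRESHOLDS[note_type]
    words = len(text.split())
    return words >= minimum and (maximum is None or words <= maximum)


def review_with_repairs(
    note_type: str, text: str, repair: Callable[[str], str]
) -> tuple[str, str]:
    """Check a note, allowing up to MAX_REPAIRS rewrites before flagging it.

    Returns (final_text, state), where state is 'reviewed' or 'needs_review'.
    """
    for attempt in range(MAX_REPAIRS + 1):
        if passes_length(note_type, text):
            return text, "reviewed"
        if attempt < MAX_REPAIRS:
            text = repair(text)
    return text, "needs_review"
```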

Phase 4: Wikilink Building

A three-stage funnel:

Structural affinity (zero cost) — notes in the same H2 category get linked
TF-IDF semantics (low cost) — pure Python, zero external dependencies, scores term overlap
LLM classification (controlled cost) — five relationship types: derivation, analogy, contradiction, application, context
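The first two funnel stages need no model calls. A minimal pure-Python sketch, assuming each note is known by title, H2 category, and body text; the 0.1 cosine threshold, the tokenizer, and all names are illustrative. Pairs that survive would go to the LLM stage for one of the five relationship types, which is not shown.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Term frequency times log(N / document frequency), one sparse vector per note."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    vectors: dict[str, dict[str, float]] = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        total = len(toks) or 1
        vectors[name] = {
            term: (c / total) * math.log(n_docs / doc_freq[term]) for term, c in counts.items()
        }
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_links(notes: dict[str, tuple[str, str]], threshold: float = 0.1):
    """Stages 1-2 of the funnel. `notes` maps title -> (H2 category, body text).

    Returns structural pairs (same H2) and semantic pairs (TF-IDF cosine >= threshold).
    """
    structural = {(a, b) for a, b in combinations(notes, 2) if notes[a][0] == notes[b][0]}
    vectors = tfidf_vectors({title: body for title, (_, body) in notes.items()})
    semantic = []
    for a, b in combinations(notes, 2):
        if (a, b) in structural:
            continue  # already linked for free in stage 1
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            semantic.append((a, b, score))
    return structural, semantic
```

Only the semantic pairs would be sent to the model, which is what keeps the LLM stage's cost bounded.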

Most existing wikilink tools stop at "similar words match." The five-type classification maps to Bloom's taxonomy levels from comprehension to evaluation.

Phase 5: Core Questions

Generates ≤5 guiding questions per topic. These anchor the learner's retrieval practice — not "what does this term mean" but "how would you apply this principle to a situation where X and Y conflict?"

Phase 6: Deep Research

Web search for controversy analysis. Output uses a three-layer framework: consensus, disputes, and context-dependent claims. This is the closest the pipeline gets to the "expert interview" consulting step.

The Engineering Decisions That Matter

Three design choices separate VaultForge from most "AI agent" projects.

Atomic writes. Every note is written as .md.tmp first, verified on disk, then renamed. If the agent crashes mid-write — which happens constantly — the partial file is invisible to Obsidian and detectable on resume.
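The write-then-rename pattern can be sketched with the standard library. os.replace performs an atomic rename on POSIX systems when source and target are on the same filesystem; the read-back verification and the function name are illustrative, not VaultForge's code.

```python
import os
from pathlib import Path


def atomic_write(path: Path, text: str) -> None:
    """Write to '<name>.tmp', verify it on disk, then rename over the target."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())  # force the bytes to disk before verifying
    # Verify the temp file before committing it.
    if tmp.read_text(encoding="utf-8") != text:
        tmp.unlink()
        raise OSError(f"verification failed for {tmp}")
    # Atomic rename on POSIX: readers see either the old file or the new one.
    os.replace(tmp, path)
```

If a crash interrupts the write, only the .tmp file is incomplete, and the target note keeps its previous content.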

File-based statistics. Phase 1 counts roadmap topics by parsing the output file, not by asking the model "how many did you write." Model self-report is unreliable. Deterministic file parsing is not.
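Counting roadmap topics from the file rather than trusting the model's answer can be as simple as a regex over the Markdown. A sketch, assuming topics are ### headings; the names are hypothetical.

```python
import re
from pathlib import Path

# A topic is assumed to be any line starting with '### ' (an H3 heading).
H3_HEADING = re.compile(r"^###[ \t]+\S", re.MULTILINE)


def count_topics(roadmap_path: Path) -> int:
    """Count roadmap topics by parsing the file the model actually wrote."""
    return len(H3_HEADING.findall(roadmap_path.read_text(encoding="utf-8")))


def reconcile(roadmap_path: Path, model_claimed: int) -> int:
    """Prefer the file count; report when the model's self-report disagrees."""
    actual = count_topics(roadmap_path)
    if actual != model_claimed:
        print(f"model reported {model_claimed} topics, file contains {actual}; using {actual}")
    return actual
```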

English instructions, bilingual output. All agent instructions are in English because LLMs tend to follow English prompts most precisely. The first prompt to the user, however, asks them to choose an output language, and the templates support English and Chinese section headers side by side. It's a small thing that matters a lot for non-English users.

What the Cognitive Science Evaluation Said

I had a final evaluation done against cognitive science standards. The key findings:

Design Element	Principle	Score
Atomic notes + length rules	Cognitive Load Theory — chunking in both directions	5/5
MOC at H3 level	Ausubel's advance organizers + Schema Theory	5/5
Five-type wikilinks	Constructivist semantic network + Bloom's	4.5/5
Core questions (≤5)	Elaborative Interrogation	4/5
Controversy analysis	Piaget cognitive conflict + critical thinking	5/5
H2 ≥ 2 H3 constraint	Taxonomy minimum branching	4/5

The overall judgment: "VaultForge is the project with the most solid educational theory and the most rigorous engineering architecture in the current Agent Skill ecosystem."

I don't disagree, but I'll let readers decide by examining the repo.

Limits and Next Steps

It's not for everything. VaultForge works best with structured, high-quality materials — textbooks, academic papers, industry reports. It's unsuited for fragmented reading like news feeds or social media. For that, a different tool is needed.

The evaluation identified one clear upgrade path: adding a retrieval practice feedback loop — answer input, LLM assessment, weakness tagging, spaced repetition scheduling. That would take the educational dimension from 4.5/5 to 5/5.

How to Use It

```bash
git clone https://github.com/Easonnotsing/VaultForge.git ~/.agents/skills/VaultForge
pip install pypdf
```

Then trigger it in any Claude Code, Codex, or Cursor session. That's it: the skill handles the rest.

The repo has 58 automated tests, a compatibility guide for different clients, and a changelog of every failure pattern encountered during development.

The Larger Pattern

Tools will keep accelerating. What matters isn't which note-taking app you use or which model you call. What matters is what happens in your head while you use them.

I built VaultForge because I wanted us to spend our limited time thinking — not formatting, not organizing, not wrestling with folder structures. The AI handles the production work. The human handles the judgment.

That division of labor is going to become the default. This is one version of what it looks like.

VaultForge is open source under MIT. Repo: https://github.com/Easonnotsing/VaultForge

DEV Community: Eason XUE