Eason XUE

Posted on • Originally published at github.com

How I Turned 6 Cognitive Science Principles Into an AI Agent That Builds Obsidian Vaults

In 30 years, computers got 100x faster. Our learning method stayed the same: read → highlight → re-read → forget → re-read again.

I work with EMBA students who have to digest hundreds of pages of material every month: textbooks, case studies, industry reports. They have demanding jobs, families, and a finite number of hours. The ones who fall behind aren't less capable. They just don't have a system.

A few months ago I found a way to break that cycle. It combines a century-old insight about how knowledge works with modern AI. I open-sourced the result as VaultForge.

This is the story of what it does and how it's built.

The Method Was Always There — But It Had Two Bottlenecks

Management consultants face the same problem my students do. When a consulting firm lands a client in an unfamiliar industry, the team has one week to get smart enough to talk to senior executives. They can't read everything. So they don't try.

They do three things:

  1. Build a framework first: map out which dimensions matter before reading.
  2. Define boundaries: decide what they don't need to know, then search for information around a few core questions, connecting what they learn to real-world problems.
  3. Find experts: get someone to pinpoint the core consensus, debates, and unknowns in 60 minutes, so they aren't limited to readily available knowledge and can explore where the industry is heading.

Cognitive science backs this up. Passive re-reading creates familiarity, not memory. Active retrieval — explaining, questioning, connecting — is 3-5x more effective.

And then there's Niklas Luhmann. The German sociologist wrote 70 books and 400+ papers using 90,000 index cards. His secret wasn't intelligence. It was connection. Every card linked to others. Ideas fermented over years. He said he didn't think first and then write — he wrote to discover what he was thinking.

That's the Zettelkasten method, which digital tools like Obsidian now implement.

But here's the problem. The consulting method requires access to experts and years of framework-building practice. Luhmann's system requires a lifetime of manual card maintenance. Both are inaccessible to normal people with normal schedules.

Two bottlenecks that AI could break.

The Six-Phase Pipeline

VaultForge is an AI agent skill that loads into Claude Code, Codex, or Cursor. Give it a PDF or Markdown file, and it produces a complete Obsidian vault. The engineering is organized into six phases:

Phase 1: Roadmap Generation

The agent reads the full material and produces two outputs: an outline version (H2/H3 hierarchy with bullet points) and a detailed version with cases, citations, and source page ranges. This replaces the "build a framework first" consulting step.

Every H2 category must have at least 2 H3 topics. This prevents hierarchy from being a facade — if you can't break a category into at least two subtopics, your schema isn't well-formed.
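A minimal sketch of how that constraint can be checked by parsing the roadmap file itself. This is illustrative, not the repo's actual code: the function names are made up for this example, and it assumes the roadmap uses plain `## ` and `### ` Markdown headings with unique H2 titles and no headings inside code fences.

```python
def h3_counts(markdown_text):
    """Return {H2 title: number of H3 topics under it}, in document order."""
    counts = {}
    current = None
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            # A new H2 category starts; "### " lines do not match this prefix.
            current = line[3:].strip()
            counts[current] = 0
        elif line.startswith("### ") and current is not None:
            counts[current] += 1
    return counts


def underfilled_categories(markdown_text, minimum=2):
    """List the H2 categories that have fewer than `minimum` H3 topics."""
    return [title for title, n in h3_counts(markdown_text).items() if n < minimum]
```

Any category returned by `underfilled_categories` is a signal to send the roadmap back for restructuring before Phase 2 starts.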

Phase 2: File Structure

The agent batch-creates folders, Maps of Content (MOC) pages, and blank atomic note stubs. The MOC sits at the H3 level, ensuring every cluster of notes has an index page.

Phase 3: Parallel Content Fill (The Interesting Part)

This is where the engineering gets specific. Each atomic note goes through a five-state state machine:

draft → filling → filled → reviewed
                  ↘ needs_review

The filling state splits into two failure modes: crash before the .md.tmp is written (no partial file) and crash after the rename but before metadata update (.md exists but status is still filling). Both are handled differently during recovery.
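Here is a minimal sketch of how a resume step could tell these cases apart for a note whose recorded status is still filling. It is illustrative rather than VaultForge's actual code: the function name and the returned action labels are made up for this example, and it assumes the final .md appears only through the rename.

```python
from pathlib import Path


def classify_filling_note(note_path):
    """Pick a recovery action for a note stuck in the 'filling' state."""
    note = Path(note_path)
    tmp = note.with_name(note.name + ".tmp")
    if note.exists():
        # The rename happened, so the content is complete;
        # only the metadata update was lost.
        return "verify_and_mark_filled"
    if tmp.exists():
        # The crash happened during or right after the temp write;
        # the temp file cannot be trusted.
        return "discard_tmp_and_refill"
    # Nothing was written at all, so the fill simply starts over.
    return "refill"
```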

A Python script (context-extractor.py) pre-fetches source paragraphs per source_range. Instead of passing the full PDF to every agent, each note agent receives only the pages it needs. Token consumption drops by a factor of 5 or more.
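The idea can be sketched in a few lines. This is not the contents of context-extractor.py: the function names are hypothetical, and the source_range format ("12-15" or "7", 1-based and inclusive) is an assumption made for the example. The list of page texts could come from pypdf, for instance by calling `extract_text()` on each page of a `PdfReader`.

```python
def parse_source_range(source_range):
    """Parse '12-15' or '7' into a (first, last) pair of 1-based page numbers."""
    first, _, last = source_range.partition("-")
    first = int(first)
    last = int(last) if last else first
    if first < 1 or last < first:
        raise ValueError(f"invalid source_range: {source_range!r}")
    return first, last


def extract_context(page_texts, source_range):
    """Return only the text of the pages a single note needs."""
    first, last = parse_source_range(source_range)
    # page_texts is 0-indexed, page numbers are 1-based and inclusive.
    return "\n\n".join(page_texts[first - 1:last])
```

Each note agent then gets the output of `extract_context` in its prompt instead of the whole document.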

After filling, every note is reviewed by a separate agent. If it fails quality thresholds — 200+ words for core concepts, 150+ for case studies, 50-150 for original citations — it gets up to 2 repair attempts before being flagged as needs_review.
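A sketch of the length thresholds and the repair budget, under stated assumptions: the note-type keys, the function names, and the `repair` callback are hypothetical, words are counted by whitespace splitting (which would undercount Chinese text), and the real review agent presumably checks more than length.

```python
# (minimum words, maximum words or None) per note type, from the thresholds above.
LENGTH_RULES = {
    "core_concept": (200, None),
    "case_study": (150, None),
    "original_citation": (50, 150),
}


def meets_length_rule(note_type, text):
    """Check a note's word count against the rule for its type."""
    minimum, maximum = LENGTH_RULES[note_type]
    words = len(text.split())
    return words >= minimum and (maximum is None or words <= maximum)


def review_note(note_type, text, repair, max_repairs=2):
    """Return (status, text) after at most `max_repairs` repair attempts."""
    attempts = 0
    while not meets_length_rule(note_type, text):
        if attempts == max_repairs:
            return "needs_review", text
        text = repair(text)  # in practice, another agent call
        attempts += 1
    return "reviewed", text
```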

Phase 4: Wikilink Building

A three-stage funnel:

  1. Structural affinity (zero cost) — notes in the same H2 category get linked
  2. TF-IDF semantics (low cost) — pure Python, zero external dependencies, scores term overlap
  3. LLM classification (controlled cost) — five relationship types: derivation, analogy, contradiction, application, context

Most existing wikilink tools stop at "similar words match." The five-type classification maps to Bloom's taxonomy levels from comprehension to evaluation.
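Stage 2 of the funnel can be sketched with the standard library alone. This is one plausible way to score term overlap, not the repo's implementation: the function names, the smoothed IDF formula, and the ASCII-only tokenizer (which would need replacing for Chinese text) are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(notes):
    """Map {title: text} to {title: {term: tf-idf weight}}."""
    term_counts = {title: Counter(tokenize(text)) for title, text in notes.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())  # each note counts once per term
    n = len(notes)
    vectors = {}
    for title, counts in term_counts.items():
        total = sum(counts.values()) or 1
        vectors[title] = {
            # Smoothed IDF keeps weights positive even for ubiquitous terms.
            term: (count / total) * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0
```

Only pairs whose cosine score clears a chosen cutoff would move on to the LLM classification stage, which is what keeps that stage's cost controlled.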

Phase 5: Core Questions

Generates ≤5 guiding questions per topic. These anchor the learner's retrieval practice — not "what does this term mean" but "how would you apply this principle to a situation where X and Y conflict?"

Phase 6: Deep Research

The agent runs web searches for controversy analysis. Output uses a three-layer framework: consensus, disputes, and context-dependent findings. This is the closest the pipeline gets to the "expert interview" consulting step.

The Engineering Decisions That Matter

Three design choices separate VaultForge from most "AI agent" projects.

Atomic writes. Every note is written as .md.tmp first, verified on disk, then renamed. If the agent crashes mid-write — which happens constantly — the partial file is invisible to Obsidian and detectable on resume.
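The write-verify-rename pattern looks roughly like this. It is a minimal sketch, not the repo's code: the function name is hypothetical, and the verification step here is a simple read-back comparison. `os.replace` is atomic on POSIX systems when both paths are on the same filesystem.

```python
import os


def atomic_write_note(path, content):
    """Write `content` to `path` so readers never see a partial file."""
    tmp_path = path + ".tmp"
    # newline="" disables newline translation so the read-back matches exactly.
    with open(tmp_path, "w", encoding="utf-8", newline="") as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before renaming
    with open(tmp_path, encoding="utf-8", newline="") as f:
        if f.read() != content:
            raise OSError(f"verification failed for {tmp_path}")
    # Swap the finished file into place in a single step.
    os.replace(tmp_path, path)
```

A crash at any point before the last line leaves only the .md.tmp behind, which is exactly what the resume logic looks for.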

File-based statistics. Phase 1 counts roadmap topics by parsing the output file, not by asking the model "how many did you write." Model self-report is unreliable. Deterministic file parsing is not.

English instructions, bilingual output. All agent instructions are in English because LLMs tend to follow English prompts most precisely. But the first user prompt asks for language selection. The templates support English and Chinese section headers side by side. It's a small thing that matters a lot for non-English users.

What the Cognitive Science Evaluation Said

I had a final evaluation done against cognitive science standards. The key findings:

| Design Element | Principle | Score |
| --- | --- | --- |
| Atomic notes + length rules | Cognitive Load Theory (chunking in both directions) | 5/5 |
| MOC at H3 level | Ausubel's advance organizers + Schema Theory | 5/5 |
| Five-type wikilinks | Constructivist semantic network + Bloom's | 4.5/5 |
| Core questions (≤5) | Elaborative Interrogation | 4/5 |
| Controversy analysis | Piaget cognitive conflict + critical thinking | 5/5 |
| H2 ≥ 2 H3 constraint | Taxonomy minimum branching | 4/5 |

The overall judgment: "VaultForge is the project with the most solid educational theory and the most rigorous engineering architecture in the current Agent Skill ecosystem."

I don't disagree, but I'll let readers decide by examining the repo.

Limits and Next Steps

It's not for everything. VaultForge works best with structured, high-quality materials — textbooks, academic papers, industry reports. It's unsuited for fragmented reading like news feeds or social media. For that, a different tool is needed.

The biggest missing piece: incremental updates. If you add a new PDF to a processed vault, the current version requires re-running the full pipeline. Incremental mode is the next major feature.

And the evaluation identified one clear upgrade path: adding a retrieval practice feedback loop — answer input, LLM assessment, weakness tagging, spaced repetition scheduling. That would take the educational dimension from 4.5/5 to 5/5.

How to Use It

git clone https://github.com/Easonnotsing/VaultForge.git ~/.agents/skills/VaultForge
pip install pypdf

Then trigger the skill in any Claude Code, Codex, or Cursor session. That's it. The skill handles the rest.

The repo has 51 automated tests, a compatibility guide for different clients, and a changelog of every failure pattern encountered during development.

The Larger Pattern

Tools will keep accelerating. What matters isn't which note-taking app you use or which model you call. What matters is what happens in your head while you use them.

I built VaultForge because I wanted my students to spend their limited time thinking — not formatting, not organizing, not wrestling with folder structures. The AI handles the production work. The human handles the judgment.

That division of labor is going to become the default. This is one version of what it looks like.


VaultForge is open source under MIT. Repo: https://github.com/Easonnotsing/VaultForge
