DEV Community

gpt ai clips

From abandoned repos to a $87K Obsidian vault: a three-pass extraction pattern

Most of us have a folder full of repos that haven't been opened in months. Here is a pattern I have been using to turn those repos into a packaged developer product instead of a source of guilt.

The three-pass pipeline

The core idea is to extract decisions, not descriptions. A plain file-level summary does little for a future reader; what they want is the implicit reasoning the original author left unstated in the code.

Pass 1 — file-level extraction

For every file, ask the model for four things: purpose, public surface, hidden invariants, and a risk score from 1 to 5. The risk score is the secret ingredient — it forces the model to find load-bearing logic.
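
A minimal sketch of the pass 1 contract. The prompt wording, the FileRecord type, and the JSON field names are hypothetical (the article does not publish its pass 1 prompt). The point is to pin the four outputs to a fixed schema and to reject risk scores outside 1 to 5 before anything reaches pass 2. The actual model call is left out.

```python
import json
from dataclasses import dataclass

# Hypothetical prompt wording; adapt it to your own repo and model.
PASS1_PROMPT = """You are reviewing one file from an abandoned repository.
Return JSON with exactly these keys:
  "purpose": one sentence on why this file exists,
  "public_surface": list of exported functions, classes, or endpoints,
  "hidden_invariants": list of assumptions the code relies on but never states,
  "risk": integer 1-5, where 5 means a change here is most likely to break something.

File: {path}
---
{source}
"""


@dataclass
class FileRecord:
    path: str
    purpose: str
    public_surface: list[str]
    hidden_invariants: list[str]
    risk: int


def build_pass1_prompt(path: str, source: str) -> str:
    """Fill the template for one file; the source text is inserted verbatim."""
    return PASS1_PROMPT.format(path=path, source=source)


def parse_pass1_reply(path: str, reply: str) -> FileRecord:
    """Validate the model's JSON reply and turn it into a FileRecord."""
    data = json.loads(reply)
    risk = int(data["risk"])
    if not 1 <= risk <= 5:
        raise ValueError(f"{path}: risk score {risk} is outside 1-5")
    return FileRecord(
        path=path,
        purpose=data["purpose"],
        public_surface=list(data["public_surface"]),
        hidden_invariants=list(data["hidden_invariants"]),
        risk=risk,
    )
```

Validating at this stage is cheap insurance: a malformed reply fails loudly at the file that produced it instead of quietly skewing the clusters in pass 2.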

Pass 2 — module-level clustering

Feed all of the pass 1 output back in and ask for clusters of files that share invariants. Each cluster becomes an Architecture Decision Record (ADR) with four sections: status, context, decision, and consequences.
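
In the pipeline above the model does the clustering. As a hypothetical deterministic cross-check, the sketch below groups files that share at least one invariant using union-find, then drafts one ADR skeleton per multi-file cluster with the four sections named above. Matching invariants by exact, case-insensitive text is a simplifying assumption; in practice the model is what judges that two differently worded invariants are the same. The ADR type and the ADR-NNN naming are illustrative, not a standard.

```python
from dataclasses import dataclass


@dataclass
class ADR:
    title: str
    files: list[str]
    # Retroactive ADRs record decisions already in the code, hence "Accepted".
    status: str = "Accepted"
    context: str = ""       # context, decision, and consequences are written
    decision: str = ""      # by the model, not by this code
    consequences: str = ""


def cluster_by_shared_invariants(invariants: dict[str, list[str]]) -> list[list[str]]:
    """Group files that share at least one invariant string, via union-find.

    invariants maps file path -> hidden invariants from pass 1.
    """
    parent = {path: path for path in invariants}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    first_owner: dict[str, str] = {}  # normalized invariant -> first file stating it
    for path, invs in invariants.items():
        for inv in invs:
            key = inv.strip().lower()
            if key in first_owner:
                parent[find(path)] = find(first_owner[key])
            else:
                first_owner[key] = path

    groups: dict[str, list[str]] = {}
    for path in invariants:
        groups.setdefault(find(path), []).append(path)
    return sorted(sorted(group) for group in groups.values())


def draft_adrs(clusters: list[list[str]]) -> list[ADR]:
    """One ADR skeleton per multi-file cluster; singletons are skipped here."""
    multi = [c for c in clusters if len(c) > 1]
    return [ADR(title=f"ADR-{i:03d}", files=c) for i, c in enumerate(multi, start=1)]
```

Union-find makes the grouping transitive: if a.py shares an invariant with b.py, and b.py shares a different one with c.py, all three land in one cluster.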

Pass 3 — architecture-level graph

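Build a graph over the ADRs and apply Leiden clustering to it to surface the cross-cutting concepts. Each graph node carries a one-paragraph 'why this matters to a maintainer' note. Leiden gives stabler cluster boundaries than plain Louvain-style modularity optimization on small graphs, partly because it guarantees that every community is internally connected, which Louvain does not.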
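
Leiden itself is best taken from an existing implementation, such as the leidenalg package for python-igraph, rather than hand-rolled. The stdlib sketch below covers the parts around it, under assumptions made for illustration: ADRs are nodes, an edge's weight is the number of concepts two ADRs share, and a candidate partition is scored with Newman modularity, the objective Leiden optimizes when run in modularity mode. The connectivity check tests the property Leiden guarantees for each community.

```python
from collections import deque
from itertools import combinations

Edge = tuple[str, str]


def build_adr_graph(adr_concepts: dict[str, set[str]]) -> dict[Edge, int]:
    """Undirected weighted edges; weight = number of concepts two ADRs share."""
    edges: dict[Edge, int] = {}
    for a, b in combinations(sorted(adr_concepts), 2):
        shared = len(adr_concepts[a] & adr_concepts[b])
        if shared:
            edges[(a, b)] = shared
    return edges


def modularity(edges: dict[Edge, int], communities: list[set[str]]) -> float:
    """Newman modularity: Q = sum over c of [L_c / m - (d_c / 2m)^2].

    m is the total edge weight, L_c the edge weight inside community c,
    and d_c the summed weighted degree of c's nodes.
    """
    m = sum(edges.values())
    if m == 0:
        return 0.0
    degree: dict[str, int] = {}
    for (a, b), w in edges.items():
        degree[a] = degree.get(a, 0) + w
        degree[b] = degree.get(b, 0) + w
    q = 0.0
    for c in communities:
        internal = sum(w for (a, b), w in edges.items() if a in c and b in c)
        d_c = sum(degree.get(node, 0) for node in c)
        q += internal / m - (d_c / (2 * m)) ** 2
    return q


def is_connected(edges: dict[Edge, int], community: set[str]) -> bool:
    """True if the subgraph induced by the community is connected (BFS)."""
    if not community:
        return True
    adjacency: dict[str, set[str]] = {node: set() for node in community}
    for a, b in edges:
        if a in community and b in community:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(community))
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in adjacency[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == community
```

On a toy input where ADR-001 and ADR-002 share two concepts and ADR-003 and ADR-004 share one, the split {ADR-001, ADR-002} / {ADR-003, ADR-004} scores about 0.08, while putting everything in one community always scores 0.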

Why Sonnet 4.6 matters here

With a 1M-token context window I can run the whole-repo pass without first summarizing per file. Per-file summarization is where cross-references die — once you compress, you lose the links the graph step depends on.
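
A rough way to check whether a repo fits the single-pass approach before paying for a run. The ratio of about 4 characters per token is a common rule of thumb for English text, not a property of any specific tokenizer, and code often tokenizes differently. The output headroom and the file-suffix filter are hypothetical values to tune.

```python
from pathlib import Path
from typing import Iterable, Iterator

CHARS_PER_TOKEN = 4         # rule-of-thumb ratio; real tokenizers vary, especially on code
CONTEXT_TOKENS = 1_000_000  # the window size quoted above
OUTPUT_HEADROOM = 64_000    # hypothetical reserve for instructions and the model's reply


def read_sources(root: str,
                 suffixes: tuple[str, ...] = (".py", ".ts", ".go", ".md")) -> Iterator[str]:
    """Yield the text of every matching file under root, in a stable order."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            yield path.read_text(encoding="utf-8", errors="ignore")


def estimate_tokens(texts: Iterable[str]) -> int:
    """Approximate token count from total character count."""
    return sum(len(text) for text in texts) // CHARS_PER_TOKEN


def fits_in_one_pass(token_estimate: int) -> bool:
    """True if the estimate leaves the reserved headroom inside the window."""
    return token_estimate <= CONTEXT_TOKENS - OUTPUT_HEADROOM
```

Usage would look like fits_in_one_pass(estimate_tokens(read_sources("path/to/repo"))). Because the ratio is a heuristic, treat a result close to the limit as "measure with the real tokenizer" rather than as a yes.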

Packaging into Obsidian

Graphify (the Claude Code skill, ~37K stars) has an --obsidian flag that writes the graph as a markdown vault with backlinks already wired up. Add ADR templates and you have a product, not a dump.
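
Graphify's --obsidian output format is not reproduced here. As a hypothetical illustration of the ADR-template half of the packaging step, the sketch below renders each ADR as a markdown note with YAML frontmatter and [[wikilinks]] to related ADRs; Obsidian treats those links as backlinks and draws them in its graph view. The ADRs folder name and the template layout are assumptions.

```python
from pathlib import Path

# Hypothetical note layout: frontmatter holds the status, one heading per ADR section.
ADR_TEMPLATE = """---
status: {status}
---
# {title}

## Context
{context}

## Decision
{decision}

## Consequences
{consequences}

## Related
{links}
"""


def render_adr(title: str, status: str, context: str, decision: str,
               consequences: str, related: list[str]) -> str:
    # [[Note name]] is Obsidian's wikilink syntax; each link becomes a backlink.
    links = "\n".join(f"- [[{name}]]" for name in related) or "- (none)"
    return ADR_TEMPLATE.format(title=title, status=status, context=context,
                               decision=decision, consequences=consequences,
                               links=links)


def write_vault(root: str, adrs: list[dict]) -> list[Path]:
    """Write one note per ADR into root/ADRs/ and return the paths written."""
    folder = Path(root) / "ADRs"
    folder.mkdir(parents=True, exist_ok=True)
    written = []
    for adr in adrs:
        # The file name doubles as the note name that wikilinks resolve to.
        path = folder / f"{adr['title']}.md"
        path.write_text(render_adr(**adr), encoding="utf-8")
        written.append(path)
    return written
```

Keeping the H1 title and the file name identical means a link like [[ADR-002]] resolves without aliases.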

Three-tier pricing

  • $0 — sampler vault with two ADRs and the graph view
  • $49 — full vault with all ADRs and concept notes
  • $149 — full vault plus raw prompts and ADR templates so the buyer can run the pipeline on their own code

Top developer vaults on Gumroad clear 3K+ copies a year, so the ceiling is real.

Try it on your own repo

Check out the Graphify project and the longer walkthroughs at cptdigital.com. The bottleneck is almost always the prompt for pass 1, not the model.
