DEV Community

Cover image for My AI corrections used to be tribal knowledge. I shipped the fix into my own OSS.
LimSangkyu
LimSangkyu

Posted on

My AI corrections used to be tribal knowledge. I shipped the fix into my own OSS.

TL;DR — I ran a personal wiki inside Claude Code for a month and shipped it as Hypomnema (npm install -g hypomnema). It's MIT, plain markdown + git, no vector DB, no API keys. The core is 14 lifecycle hooks that wire the wiki into Claude Code so it maintains itself — auto-commit, auto-push, session-state injection, lookup signals, compact-time guards. The most acute thing a month of dogfooding surfaced — AI behavior corrections rotting across three hand-synced storage layers — gets a fix that ships with this first drop. This is the field report.

(There's a Korean retrospective on velog if you read Korean — same author, longer narrative.)

The thing every note tool fails at

I've cycled through every personal-knowledge tool you've heard of. They all break in roughly the same way:

The pain
Note vaults (Obsidian, Notion) You write fine. You never re-read. 100 notes = 100 islands.
RAG / vector DBs Chunks grow forever. Knowledge doesn't compound — it accumulates.
AI notebooks (Mem, Reflect, etc.) Closed format. No git. I don't trust them in five years.
Code wikis (auto-generated from a repo) Code only. Can't capture decisions, research, or AI behavior corrections.

The thing every one of them fails at is synthesis. They store. They retrieve. They never update what you already know with what you just read.

That's exactly the gap Karpathy pointed at in his Gist this April. RAG (Retrieval-Augmented Generation) re-reads chunks of your sources on every query and pastes them into the prompt; a wiki persists synthesis between queries:

"RAG re-reads sources every query. A wiki persists synthesis. The bottleneck was always bookkeeping, and LLMs reduce that cost to zero."

So I built one — small at first, then aggressive — and ran it daily for a month inside Claude Code before opening it up. The result is Hypomnema.

What it is, in one paragraph

Hypomnema is a personal wiki that lives in ~/hypomnema/ as plain markdown + git. You feed it sources with one slash command. Claude reads them, synthesizes structured pages, and updates existing pages instead of creating new islands. Fourteen lifecycle hooks wire it into Claude Code so the wiki maintains itself: auto-commit, auto-push, session-state injection, lookup signals, compact-time guards. There's no vector DB. There's no API key. The whole stack is "Node.js script + markdown + git + Claude."

npm install -g hypomnema
# inside Claude Code:
/hypo:init
Enter fullscreen mode Exit fullscreen mode

That's the entire setup.


Six walls, six fixes

Wall 1 — "Every new session forgets where I was"

Coming back to a paused project after a week, Claude had no memory of it. I had to retype context for ten minutes.

The fix took three attempts.

  1. Print hot.md to stderr from a SessionStart hook. Doesn't work — Claude Code's TUI captures hook stderr during init. You see nothing.
  2. Inject via additionalContext from SessionStart. Also doesn't work in a multi-hook setup — when more than one SessionStart hook runs, Claude Code keeps only the last hook's additionalContext. Earlier hooks get silently overwritten.
  3. The pattern that finally stuck: SessionStart writes a marker file (os.tmpdir()/wiki-session-marker.json); the first UserPromptSubmit reads the marker and injects via its additionalContext (which is reliable). 10-minute TTL, single-shot consumption.
SessionStart
  ↳ hypo-session-start.mjs
      ↳ git pull --ff-only
      ↳ write tmpdir marker {hot, session-state, ts}

UserPromptSubmit (first one only, within 10 min)
  ↳ hypo-first-prompt.mjs
      ↳ read marker → inject hot.md + session-state.md as additionalContext
      ↳ delete marker
Enter fullscreen mode Exit fullscreen mode

Resume cost is now effectively O(1). The takeaway: you can't rely on prompts; you also can't rely on a single hook channel — you have to triangulate around platform constraints. I lost a day to "why isn't stderr printing" before I understood that.

Wall 2 — "/compact deletes my work-in-progress learnings"

Claude Code's /compact summarizes your conversation. It also blows away anything you didn't write down. I lost two productive afternoons before I understood that.

Fix: a PreCompact hook that refuses to compact if the session-close checklist is unfinished or the wiki lints fail.

PreCompact
  ↳ hypo-personal-check.mjs
      ↳ if lint errors → block + show what's broken
      ↳ if session-close incomplete → block + show next step
Enter fullscreen mode Exit fullscreen mode

The same machinery covers /clear: a Stop-chain auto-minimal-crystallize step offers /hypo:crystallize --apply-session-close --minimal on non-trivial session ends, and a SessionEnd marker + SessionStart source=clear recovery makes the /clear-then-restart round-trip clean (ADR — Architecture Decision Record — 0022 Layer 3 in the repo).

Wall 3 — "The 3-mode privacy matrix didn't survive contact with reality"

An early prototype had personal / shared / public modes per page. Sounded fine on paper. In practice every privacy decision was about which paths to exclude, not about a mode label.

Fix: the public release deletes the mode concept entirely. There's a single .hypoignore file with glob patterns. One file, one source of truth. Code shrunk by half. Mental model doubled in clarity.

Wall 4 — "BM25 libraries can't find Korean substrings"

Most BM25 (Best Matching 25 — the standard term-frequency ranking function for full-text search) implementations split on whitespace. Korean ("장애 보고서") doesn't always have whitespace where you'd expect, so a search for "장애" missed the page that contained "장애보고서".

Fix: a hand-rolled bi-gram BM25 tokenizer for CJK. Search suddenly worked.

Wall 5 — "I corrected the AI three times, and it still made the same mistake tomorrow"

Every AI tool today accepts corrections for the current chat. None of them persist.

I built /hypo:feedback "stop doing X because Y" — stores the rule in pages/feedback/<slug>.md, which is inside the wiki, so it auto-syncs via the Stop hook to every machine. Good first step.

But: Claude Code itself has two more places where behavior rules live — ~/.claude/projects/<id>/memory/MEMORY.md (per-project, machine-local) and ~/.claude/CLAUDE.md <learned_behaviors> (global, machine-local). Both live outside the wiki. Both are read on every prompt. Hand-syncing three layers rots fast.

Wall 6 — "Three storage layers, hand-synced, rotting fast"

A month of dogfooding compressed this into a sharp shape: I had 14 hand-written entries in ~/.claude/CLAUDE.md <learned_behaviors>. Nine of them had the same rule in pages/feedback/ already. I'd edited the wiki page and forgotten the global rules file. I'd added a global rule and forgotten to write the page. The drift was invisible until I counted.

The response that became the headline: feedback is the single source of truth, <learned_behaviors> and MEMORY.md are derived.

  • One place to edit: pages/feedback/<slug>.md.
  • One command to project: hypomnema feedback-sync --check (dry-run) / --write (apply).
  • <learned_behaviors> and MEMORY.md get filled inside marker-fenced managed blocks (HYPO:FEEDBACK-SYNC:START / END). Hand-edits inside those blocks surface as CONFLICT_MANUAL_EDIT.
  • Strict gate for the global rules layer: a feedback page must declare scope: global + tier: L1 + targets: claude-learned + promote_to_global: true + sensitivity ∈ {public, sanitized}. Five gates. Plus a hard cap of 10 entries so global rules can't quietly bloat the token budget.
  • One-way projection. The wiki is the SoT (single source of truth, SoT); the two downstreams are derived. No back-flow.
  • sensitivity: private is forbidden — the wiki is git-pushed, so private data must stay outside the wiki entirely.
# Author the SoT page
/hypo:feedback "<rule>"

# Project to downstreams
hypomnema feedback-sync --check       # dry-run
hypomnema feedback-sync --write       # apply (managed blocks only)
hypomnema feedback-sync --bootstrap   # scaffold drafts from existing state
Enter fullscreen mode Exit fullscreen mode

This is the spine of the release and the reason it's the first public one: any earlier and the projection design would have shipped with the assumption that hand-sync works, which it doesn't.


SCHEMA — why we refused to auto-stub

The new type: feedback page requires nine fields unconditionally: status, scope, tier, targets, sensitivity, priority, memory_summary, reason, source. If targets includes claude-learned, add global_summary + promote_to_global.

When you run hypomnema upgrade --apply, the upgrade writes a fresh backfill checklist into your wiki root. It deliberately does not auto-stub.

The reason: scope / tier / targets / sensitivity / reason / source are meaning decisions, not formatting decisions. Auto-stubbing them with defaults like scope: project:? would silently project wrong behavior into two downstream surfaces (MEMORY.md and <learned_behaviors>). We chose loud manual work over silent wrong work.

SCHEMA.md itself stays byte-equal across upgrades (Option C preservation). Migration report tag stays [schema] — the only token historically valid across every shipped Meta vocabulary. One honest caveat: lint regex ^project:[a-z0-9][a-z0-9-]*$ and the default cwd-derived project-id format are incompatible — to use scope: project:* you must pass --project-id=<slug>. Full reconciliation comes in the next minor.

A related cross-project leak in the projection filter is also closed: a feedback page scoped to project:other no longer projects into the current project's MEMORY.md. Only scope: global or exact scope: project:${projectId} entries pass through.


The rest in one section

Extensions companion sync. The wiki ships extensions/{agents, commands, hooks, skills}/. init scaffolds; upgrade mirrors into ~/.claude/ (and with --codex, the hooks + commands subset into ~/.codex/). hypomnema doctor extensions audits orphans, matcher drift, non-registrable orphans. This means your personal agents / skills / commands ride along with the wiki and stay in sync across machines. What's left for a dotfiles tool like chezmoi is ~/.claude/CLAUDE.md body + settings.json deltas.

Auto-project on cwd match. Open a repo with a project marker (package.json, Cargo.toml, go.mod, pyproject.toml, …) but no wiki project? SessionStart offers to create one. "Y" scaffolds from templates/projects/_template/. "N" gets recorded; 5-minute per-cwd cooldown.

PostToolUse WebFetch / WebSearch auto-ingest signal. Claude fetches a URL or runs WebSearch → PostToolUse hook injects a nudge so Claude considers /hypo:ingest. Privacy: URL query / hash / userinfo are stripped before injection on WebFetch URLs (WebSearch isn't URL-redacted because there's no URL being called).

Update notifier + --codex core hook mirror + W8 lint stale design-history + code comment cleanup. Update banner at SessionStart (npm channel + Claude Code plugin channel; pick the one you're on). upgrade --codex now mirrors core hooks (not just extensions). Lint emits W8 for stale design-history.md. A comment-only cleanup pass landed across 13 files: rot-prone refs and codex verdict shorthand stripped while contract / spec / Layer anchors stay. New policy: time-bound cross-references belong in PR descriptions, not in code comments. A second cleanup pass is queued for the next track.


What you actually get

8 slash commands

Command Purpose
/hypo:ingest <url-or-path> Save raw to sources/, synthesize a page in pages/. Updates existing page if topic exists.
/hypo:query "..." BM25 retrieval + LLM synthesis. Answers cite [[wikilink]]s.
/hypo:crystallize End-of-session: 11-step checklist that locks today's learnings into the wiki.
/hypo:resume Reload the most recent state of an active project.
/hypo:feedback "..." Capture an AI behavior correction — writes the SoT page directly.
/hypo:verify Audit pages whose verify_by date has passed.
/hypo:lint Frontmatter / wikilink / schema validation.
/hypo:graph Generate a wikilink dependency graph.

5 CLI subcommands

hypomnema init                        # bootstrap a wiki
hypomnema upgrade [--apply] [--codex] # hooks / SCHEMA / extensions / migration report
hypomnema doctor [extensions]         # integrity audit
hypomnema uninstall                   # remove companion files
hypomnema feedback-sync --check|--write|--bootstrap
Enter fullscreen mode Exit fullscreen mode

14 lifecycle hooks

Event Hook What it does
SessionStart hypo-session-start Inject hot.md / session-state.md, git pull --ff-only, offer auto-project, update notifier
UserPromptSubmit hypo-lookup BM25 top-3 HIT inject / MISS → closest-slug signal
UserPromptSubmit hypo-compact-guard Detect /compact → enforce checklist
UserPromptSubmit hypo-first-prompt First-prompt forced resume summary (marker-driven, 10-min TTL)
CwdChanged hypo-cwd-change Inject the matching project's hot.md
FileChanged hypo-file-watch Notify on wiki-file changes (honors .hypoignore)
PostToolUse(Write/Edit) hypo-auto-stage Auto git add
PostToolUse(WebFetch/WebSearch) hypo-web-fetch-ingest Inject /hypo:ingest nudge (URL redacted)
Stop hypo-auto-commit Auto commit + pull --rebase + push
Stop hypo-hot-rebuild Rebuild hot.md from the latest session activity
Stop hypo-session-record Record session metadata for the observability score
Stop hypo-auto-minimal-crystallize Offer minimal crystallize on non-trivial session end
SessionEnd hypo-session-end Stash a /clear-survivable marker so the next SessionStart can recover
PreCompact hypo-personal-check Block compact on lint failures or unfinished session-close

The synthesis-heavy commands also ship as Claude Agent Skills, so they auto-trigger by description match — no slash required.

Directory shape

~/hypomnema/
├── hypo-config.md       ← root marker
├── index.md             ← page catalog
├── hot.md               ← active project pointers
├── log.md               ← append-only activity log
├── SCHEMA.md            ← type system (v2.0, user-owned)
├── MIGRATION-*.md       ← created on schema bumps with a backfill checklist
├── .hypoignore          ← glob patterns to exclude
├── pages/
│   └── feedback/        ← AI behavior corrections (the SoT)
├── projects/<name>/
│   ├── hot.md
│   ├── session-state.md
│   └── session-log/
├── journal/{daily,weekly,monthly}/
├── extensions/{agents,commands,hooks,skills}/   ← mirrored to ~/.claude/
└── sources/             ← raw ingested sources, never edited
Enter fullscreen mode Exit fullscreen mode

How it stacks up against the other LLM-wiki OSS

After Karpathy's Gist, ten-plus implementations appeared in a couple of weeks. I read all of them. Short version:

Project Strongest dimension Where Hypomnema differs
nvk/llm-wiki --mode thesis (parallel for/against agents) We don't have thesis mode yet — on the roadmap
SamurAIGPT/llm-wiki-agent Multi-format ingest (PDF/Word/PPT) + contradiction detection We do contradictions at lint time, not ingest
swarmclawai/swarmvault Graph RAG with Louvain clustering We use Obsidian's graph view; no custom RAG
nashsu/llm_wiki (6.6k ⭐) Electron desktop GUI, multi-language We're CLI + Obsidian, no GUI
Tencent/WeKnora Enterprise RAG + ReAct + WeChat Mini Different category; we're personal-scale
lucasastorian/llmwiki Hosted web app (SQLite FTS5, MCP) We're local-first, file-based
OmegaWiki 9-entity typed graph We have lighter typed relations

Where Hypomnema is alone:

  • Session lifecycle automation — no other project hooks SessionStart, PreCompact, Stop to keep the wiki maintained.
  • Source isolationsources/ is immutable. Other projects blur source and synthesis.
  • AI feedback as a single source of truth with one-way projections/hypo:feedback writes a typed page; hypomnema feedback-sync projects into Claude Code's memory and global rules layers.
  • Companion sync for ~/.claude/{agents,commands,hooks,skills}/ — your personal Claude Code companion files ride along with the wiki.
  • Bi-gram BM25 for Korean — small thing, shipped by default.

The summary I keep coming back to: other projects are about how to structure wiki content. Hypomnema is about automating the loop of working with an AI assistant over months. Different problem.


Things deliberately left out

  • No vector DB. BM25 + LLM synthesis covers personal-scale corpora. Vector DBs are a future failure mode (they get bought, deprecated, or leak credentials).
  • No API keys. Claude Code is the only AI dependency, and you already have it.
  • No GUI. Obsidian already exists and is excellent.
  • No mode matrix. A single .hypoignore beats personal / shared / public.
  • No auto-stub on SCHEMA bumps. Wrong defaults silently project wrong behavior. Loud manual work beats silent wrong work.

The whole stack is intentionally boring. The interesting part is what Claude does on top of it.


Dogfooding the SoT engine on ourselves

The day the release was ready to publish, I ran hypomnema feedback-sync --check against my own machine for the first real time. Three things broke at once:

  • The engine reported claude-target candidates = 11, cap = 10, overCap=true, dirty=true.
  • My ~/.claude/CLAUDE.md <learned_behaviors> had 14 hand-written entries.
  • Of those 14, 9 had a backing page already in the wiki (i.e., duplicate hand-syncs that should have been managed blocks). 5 had no backing page (hand-written-only rules I'd never page-promoted).

So "14 vs cap 10" wasn't an over-cap problem. It was three problems compressed: (a) one demote needed to land under the cap, (b) nine duplicates needed cleanup, (c) five page-less rules needed a separate promotion track.

I ran codex through three candidates for the demote and picked the lowest-operating-loss one. I edited its frontmatter (targets removed claude-learned, promote_to_global: false). I ran --check again — green. I ran --write<learned_behaviors> now had a HYPO:FEEDBACK-SYNC:START / END fenced block with 10 managed entries. I removed the 9 duplicate hand-writtens. End state: 5 hand-written + 10 managed = 15 entries. --check clean. Vault commit 18dd4e8. Engine transitioned from dormant to active.

What's worth saying out loud about that moment: a self-operating system surfaces its operator's accumulated drift on first run. The engine was fine. The data was the failure mode — a month of hand-sync I hadn't kept clean. That's the shape of dogfood that actually catches something — not "does the binary run," but "what does the binary surface about the operator."

The five hand-written-only rules become next-minor work (page them, decide demote, run --write). The nine cleanup wasn't a one-off; it's evidence the cap + gate enforcement was right.


Ready to try?

npm install -g hypomnema
# in Claude Code:
/hypo:init
Enter fullscreen mode Exit fullscreen mode

That's it. The hooks register themselves. First commit happens automatically. If you set a remote, Stop will keep every machine in sync.

If you've been frustrated by the "I write notes I never re-read" loop, give it a week. The compounding shows up around day 10 — the moment a third article on the same topic updates an existing page instead of creating a new one. That's when the model in your head clicks.

The next-minor contribution magnet: a chezmoi bridge for ~/.claude/CLAUDE.md and settings.json so the feedback-sync projection works across machines without manual sync. The Extensions companion already covers agents / skills / commands / hooks, but the CLAUDE.md body itself is still machine-local. Solving it has impact on the whole LLM-wiki OSS ecosystem — no project there has done it yet.

Feedback, issues, PRs welcome. If you build something on top of it, drop a comment — I'd love to see it.

Top comments (0)