just_an_electron

Posted on Jul 1 • Edited on Jul 14

A self-updating knowledge base for my terminal AI assistant (Claude Code hooks)

#claudecode #ai #claude

I spend most of my day in the terminal with an AI coding assistant. Every session I would solve something worth remembering: a tricky fix, a config gotcha, a small runbook. Then I would lose it. It lived in a scrollback buffer that vanished when I closed the tab. A month later I would re-solve the same thing.

The idea

Make the assistant maintain its own notes. Three moving parts:

On every prompt, search a small Markdown knowledge base (KB) and inject the relevant entries, so the assistant answers with prior context instead of re-deriving it.
When a session ends, capture anything worth keeping back into the KB.
On session start, load the index so the assistant knows what exists.

Claude Code has hooks: shell commands that fire on lifecycle events. Three of them cover the whole loop.

{
  "hooks": {
    "SessionStart":     [{ "hooks": [{ "type": "command", "command": "/path/kb-load.sh" }] }],
    "UserPromptSubmit": [{ "hooks": [{ "type": "command", "command": "/path/kb-search.sh" }] }],
    "SessionEnd":       [{ "hooks": [{ "type": "command", "command": "/path/kb-enqueue.sh" }] }]
  }
}

Everything below is the useful stuff I learned wiring these up.

Retrieval: UserPromptSubmit runs on every message

UserPromptSubmit fires for each prompt you send, receives the prompt text on stdin, and can inject context before the model runs. That is the hook that makes the KB get checked automatically, instead of leaving it to the model to decide whether to look.

Three design rules: it must be cheap (grep, no LLM), lean (inject only the top matches, never dump whole files), and it must never block the prompt.

#!/usr/bin/env bash
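# NOTE: no `set -e`. A UserPromptSubmit hook that exits with code 2 BLOCKS the prompt,
# and other non-zero exits surface a hook error. Every failure path must fall
# through to `exit 0` with no output.
set -uo pipefail
KB="${KB:-}"; [ -d "$KB" ] || exit 0   # KB root is assumed to come from the environment; bail quietly without it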

PROMPT="$(cat | python3 -c 'import sys,json;print(json.load(sys.stdin).get("prompt",""))')"
[ -n "$PROMPT" ] || exit 0

python3 - "$KB" "$PROMPT" <<'PY' || true
import sys, os, re, glob, json
kb, prompt = sys.argv[1], sys.argv[2]

# tokenize the prompt, drop stopwords/short tokens, keep ticket-ish IDs like abc-123
terms = [t for t in re.findall(r"[a-z][a-z0-9-]{2,}", prompt.lower()) if t not in STOP]
if not terms: sys.exit(0)

# rank every KB file by (distinct terms matched, then total matches)
scored = []
for f in glob.glob(os.path.join(kb, "**/*.md"), recursive=True):
    if excluded(f): continue                          # skip templates, tooling, backup mirror
    text = open(f, errors="ignore").read().lower()
    hits = sum(1 for t in terms if t in text)
    if not hits: continue
    # a term matched in the title/tags/path counts double, so the file that is ABOUT a
    # topic outranks a big generic file that merely mentions the word once.
    hits += sum(1 for t in terms if t in header_of(f, text))
    scored.append((hits, sum(text.count(t) for t in terms), f))
if not scored: sys.exit(0)
scored.sort(reverse=True)

# inject ONLY the top few: title + path + the one matching line. The model opens the file for more.
lines = ["# Relevant KB entries (consult before answering; open the file for detail):", ""]
for _, _, f in scored[:5]:
    lines.append(f"- {title_of(f)} [{os.path.relpath(f, kb)}]")
    snip = snippet_of(f, terms)                       # compute once, reuse below
    if snip: lines.append(f"    {snip}")
print(json.dumps({"hookSpecificOutput":
    {"hookEventName": "UserPromptSubmit", "additionalContext": "\n".join(lines)}}))
PY
exit 0
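
The script above leans on a few names it does not show: STOP, excluded, header_of, title_of and snippet_of. What follows is only a sketch of what they could look like, not the versions behind this post. It assumes each entry starts with a Markdown `# Title` line with any tags in the first few lines, and the stopword list and excluded directory names are hypothetical placeholders.

```python
import os

# Hypothetical stopword list; extend it with whatever is noisy in your prompts.
STOP = {"the", "and", "for", "with", "that", "this", "how", "what", "why", "can", "you"}

# Hypothetical directory names for non-knowledge content inside the KB tree.
EXCLUDED_DIRS = {"templates", "tools", "mirror"}

def excluded(path):
    """True if any directory component of the path is on the exclusion list."""
    parts = os.path.normpath(path).split(os.sep)
    return any(p in EXCLUDED_DIRS for p in parts[:-1])

def header_of(path, text, max_lines=5):
    """Lowercased path plus the first few lines, where title and tags are assumed to live."""
    head = "\n".join(text.splitlines()[:max_lines])
    return (path + "\n" + head).lower()

def title_of(path):
    """First Markdown heading in the file, falling back to the file name."""
    try:
        with open(path, errors="ignore") as fh:
            for line in fh:
                if line.startswith("#"):
                    return line.lstrip("#").strip()
    except OSError:
        pass
    return os.path.basename(path)

def snippet_of(path, terms, width=160):
    """First non-heading line containing any search term, trimmed to `width` characters."""
    try:
        with open(path, errors="ignore") as fh:
            for line in fh:
                s = line.strip()
                if s and not s.startswith("#") and any(t in s.lower() for t in terms):
                    return s[:width]
    except OSError:
        pass
    return ""
```

Note that `excluded` checks every directory component, including those above the KB root, so a KB stored under a directory that happens to be called `tools` would need a relative path here.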

The point is not that grep is clever. It is that the hook does the searching once, ranks the results, and hands the model the specific files to read. The model stops guessing which files or tools to check, because the relevant entry is already in front of it. Scanning a few dozen small Markdown files per prompt takes under 100ms, and the win is injecting only the top five, not the whole KB.

Two things I got wrong in the naive version, both about ranking. First, raw term-frequency let my biggest, most cross-referenced file win almost every query: it mentions everything once, so it always scored a hit. The fix is to weight matches by where they land — a term in the title, tags, or path counts double, so the entry that is genuinely about the topic beats the file that just name-drops it. Second, I keep a one-way backup mirror of some config files inside the KB tree, and search kept surfacing those stale duplicates over the real runbooks. Anything that is a copy, not knowledge, has to be excluded from the scan. A ranked retriever is only as good as the stuff you let it rank.
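
To make the first ranking fix concrete, here is a toy comparison. The two entries are made up, and the scoring mirrors the (distinct terms, total matches) key from the script, with the header bonus as the only difference between the two functions.

```python
def naive_key(body, terms):
    """Raw scoring: distinct terms matched, then total occurrences."""
    return (sum(1 for t in terms if t in body), sum(body.count(t) for t in terms))

def weighted_key(header, body, terms):
    """Same, but a term that also appears in the header (title/tags/path) counts twice."""
    distinct, total = naive_key(body, terms)
    return (distinct + sum(1 for t in terms if t in header), total)

# Made-up entries: a focused runbook and a large index file that mentions everything.
terms = ["nginx", "reload"]
focused = {"header": "nginx reload gotcha",
           "body": "nginx reload gotcha\nrun nginx -t before reload"}
generic = {"header": "everything index",
           "body": "see also: nginx, reload, redis, nginx, reload, nginx, cron"}

entries = [focused, generic]
naive_winner = max(entries, key=lambda e: naive_key(e["body"], terms))
weighted_winner = max(entries, key=lambda e: weighted_key(e["header"], e["body"], terms))
```

Under the naive key both files match two distinct terms, so the tie is broken by raw counts and the index file wins. With the header bonus, the focused runbook scores four against two and comes out on top.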

Capture: the mistake, and the fix

My first version captured on the Stop hook, which fires every time the assistant finishes a reply. The capture spawns a headless AI call (claude -p ...) to read the transcript and write KB entries. It worked, and then the assistant got painfully slow. Every turn ended with "running stop hook" for minutes.

Here is why. Stop runs synchronously, once per turn. A 30-turn session fired the headless capture about 30 times, each one re-reading a bigger transcript. I was re-capturing the same conversation over and over.
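
A back-of-the-envelope version of that cost, as a sketch. It assumes every turn adds roughly the same amount of transcript and that each capture re-reads everything so far; real sessions will be lumpier.

```python
def turns_read_with_stop_hook(n_turns):
    """Capture after every turn: the k-th capture re-reads all k turns so far."""
    return sum(range(1, n_turns + 1))  # 1 + 2 + ... + n = n(n+1)/2

def turns_read_with_session_capture(n_turns):
    """Capture once at session end: each turn is read exactly once."""
    return n_turns

n = 30
per_turn = turns_read_with_stop_hook(n)            # 465 turn-reads
per_session = turns_read_with_session_capture(n)   # 30 turn-reads
```

Under those assumptions a 30-turn session reads 465 turns' worth of transcript instead of 30, about 15 times the work, and the gap grows quadratically with session length.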

The unit I actually wanted was the session (start until /clear), captured once. That is SessionEnd, which fires on /clear and on exit. So I moved the capture there, and hit the real lesson:

SessionEnd cannot run a long or backgrounded job. It is documented as non-blocking (side effects only), and a process you background from it is not guaranteed to survive. The session is terminating, and its child processes can be killed with it.

So a multi-minute headless capture launched from SessionEnd gets cut off partway.

The fix is to split it across two hooks.

SessionEnd does something trivial and instant: it enqueues the ended transcript's path.

#!/usr/bin/env bash
set -euo pipefail
[ -n "${KB_CAPTURE:-}" ] && exit 0          # recursion guard (see below)
T="$(cat | python3 -c 'import sys,json;print(json.load(sys.stdin).get("transcript_path",""))')"
{ [ -n "$T" ] && [ -f "$T" ]; } && echo "$T" >> "$KB/tools/queue.txt"
exit 0

SessionStart (of the next session) drains the queue and runs the capture in the background. That is safe here because this session stays alive.

QUEUE="$KB/tools/queue.txt"; LOCK="$KB/tools/.lock"
if [ -s "$QUEUE" ] && mkdir "$LOCK" 2>/dev/null; then   # mkdir is an atomic single-drain lock
  mv "$QUEUE" "$QUEUE.wip"                               # claim atomically so new enqueues are not lost
  ( sort -u "$QUEUE.wip" | while read -r t; do
        [ -f "$t" ] && KB_BG=1 KB_TRANSCRIPT="$t" "$KB/tools/kb-capture.sh" >>"$KB/log" 2>&1
      done
      rm -f "$QUEUE.wip"; rmdir "$LOCK"
  ) >/dev/null 2>&1 &        # backgrounded off a LIVE session, so it survives
  disown 2>/dev/null || true
fi
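
The two guarantees in that snippet are the single-drain lock and the atomic claim. The Python sketch below isolates them so they can be exercised without a live session. The function names are hypothetical, and it only returns the paths to process rather than running the capture.

```python
import os

def claim_queue(queue_path, lock_dir):
    """Return the unique transcript paths to process, or None if there is
    nothing queued or another drainer already holds the lock."""
    if not os.path.isfile(queue_path) or os.path.getsize(queue_path) == 0:
        return None
    try:
        os.mkdir(lock_dir)           # fails if the directory exists, so only one caller wins
    except FileExistsError:
        return None
    wip = queue_path + ".wip"
    os.replace(queue_path, wip)      # later enqueues start a fresh queue file
    with open(wip) as fh:
        return sorted({line.strip() for line in fh if line.strip()})

def release(queue_path, lock_dir):
    """Remove the claimed batch and drop the lock once the captures are done."""
    wip = queue_path + ".wip"
    if os.path.exists(wip):
        os.remove(wip)
    os.rmdir(lock_dir)
```

One thing the sketch shares with the shell version: if the drainer dies before `release`, the lock directory stays behind and later sessions will skip draining until it is removed by hand.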

Net effect: one capture per session, off the critical path. The only trade-off is timing: the write lands at the start of the next session (a few seconds later after a /clear, longer if you exited) rather than the instant the session ends, because a live session is the one place a long job is guaranteed to survive.

"One capture per session" needs one more guard, because the queue can hand you the same transcript twice — a repeat /clear, a resumed session, a re-enqueue. So the worker keeps a tiny ledger keyed by transcript_path + byte-size and skips anything it has already seen at that size. The size is the trick: a resumed session that did more work grows its transcript, so the key changes and it re-captures the new part; an unchanged re-enqueue has the same size and is skipped. Cheap idempotency without diffing content.

Two smaller notes. macOS has no setsid, so detach with nohup ... & plus disown (a bash builtin) rather than setsid. And read stdin before backgrounding, because the detached copy has none.
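
The path-plus-size ledger described above can be sketched in a few lines. This is an illustration under assumptions, not the worker from this post: the ledger is a JSON list in a file whose location you choose, and the function names are hypothetical.

```python
import json
import os

def ledger_key(transcript_path):
    """Key = path plus current byte size, so a transcript that grew gets a new key."""
    return f"{transcript_path}:{os.path.getsize(transcript_path)}"

def should_capture(transcript_path, ledger_path):
    """Return True (and record the key) the first time a path is seen at this size."""
    try:
        with open(ledger_path) as fh:
            seen = set(json.load(fh))
    except (OSError, ValueError):
        seen = set()             # missing or corrupt ledger: treat as empty
    key = ledger_key(transcript_path)
    if key in seen:
        return False
    seen.add(key)
    with open(ledger_path, "w") as fh:
        json.dump(sorted(seen), fh)
    return True
```

The sketch records the key before the capture runs, which keeps it short. A real worker would more likely record it only after the capture succeeds, so a failed run is retried on the next drain.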

Three things that made it cheap and safe

1. A cheaper model for the capture. It is a summarize-and-write task, so it does not need your most expensive model. Pin it.

KB_CAPTURE=1 claude -p "$(cat capture-prompt.md)" --model <cheap-fast-model> \
  --allowedTools "Read,Edit,Write,Grep,Glob"

Roughly 5x cheaper, and it finishes fast, so queued captures clear quickly.

2. A pre-filter before spending a token — and the trap in it. Most sessions are not worth capturing, so a quick grep should gate the AI call:

grep -qiE 'root cause|next step|blocked on|draft|fix|<ticket-pattern>' "$TRANSCRIPT" || exit 0

That looks right and does nothing. The transcript is JSONL of the whole session, not just what was said — it includes tool-call metadata, tool names, and the context the harness injects on every turn (system reminders, memory, prior hook output). All of that mentions my trigger words in essentially every session, so the filter matched every time and I paid for a headless call on empty sessions anyway.

The fix is to filter on the conversation, not the transcript. Parse the JSONL, keep only user and assistant text blocks, drop tool blocks and injected reminders, and grep that:

CONVO="$(python3 - "$TRANSCRIPT" <<'PY'
import json, sys
for line in open(sys.argv[1], errors="ignore"):
    try: o = json.loads(line)
    except Exception: continue
    if o.get("type") not in ("user", "assistant"): continue
    c = (o.get("message") or {}).get("content")
    blocks = [c] if isinstance(c, str) else [
        b.get("text","") for b in (c or []) if isinstance(b, dict) and b.get("type") == "text"]
    for t in blocks:
        if t and not t.lstrip().startswith("<system-reminder"): print(t)
PY
)"
printf '%s' "$CONVO" | grep -qiE 'root cause|next step|blocked on|<ticket-pattern>' || exit 0

Almost the same grep (minus the draft and fix terms), but now it sees only the human/assistant prose. A gate that inspects the raw transcript is worse than no gate: it costs you the scan and still fires every time.
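
A small synthetic transcript shows the difference. The three JSONL records below are invented for illustration and follow only the fields the parser above reads (`type`, `message.content`, text blocks). The `<ticket-pattern>` placeholder is left out of the trigger list.

```python
import json
import re

TRIGGERS = re.compile(r"root cause|next step|blocked on", re.IGNORECASE)

def conversation_text(jsonl_lines):
    """Keep only user/assistant text blocks; drop tool blocks and injected reminders."""
    out = []
    for line in jsonl_lines:
        try:
            o = json.loads(line)
        except ValueError:
            continue
        if not isinstance(o, dict) or o.get("type") not in ("user", "assistant"):
            continue
        c = (o.get("message") or {}).get("content")
        blocks = [c] if isinstance(c, str) else [
            b.get("text", "") for b in (c or [])
            if isinstance(b, dict) and b.get("type") == "text"]
        out += [t for t in blocks if t and not t.lstrip().startswith("<system-reminder")]
    return "\n".join(out)

# Synthetic session: the only trigger phrases sit in an injected reminder and a tool call.
transcript = [
    json.dumps({"type": "user", "message": {"content": "what time is it in UTC?"}}),
    json.dumps({"type": "user", "message": {"content": [
        {"type": "text",
         "text": "<system-reminder>Record the root cause of any fix.</system-reminder>"}]}}),
    json.dumps({"type": "assistant", "message": {"content": [
        {"type": "tool_use", "name": "Bash", "input": {"command": "echo next step"}},
        {"type": "text", "text": "It is 14:02 UTC."}]}}),
]

raw_gate = bool(TRIGGERS.search("\n".join(transcript)))             # matches the noise
convo_gate = bool(TRIGGERS.search(conversation_text(transcript)))   # sees only the prose
```

The raw gate fires on a session where nothing was learned, while the conversation-only gate correctly stays closed.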

3. A recursion guard, and set it first. The headless claude you spawn inherits your environment, including the hook registration. So it fires its own hooks, which spawn another headless claude, and so on without end. One env var breaks the loop: put [ -n "${KB_CAPTURE:-}" ] && exit 0 at the top of every hook, and set KB_CAPTURE=1 when you spawn the headless call. Without it, one real session fans out into an unbounded tree of AI calls.
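
For hooks written in Python rather than bash, the same guard might look like the sketch below. KB_CAPTURE is the variable name from this post, and the helper names are hypothetical.

```python
import os

GUARD = "KB_CAPTURE"   # the environment variable used as the recursion guard

def is_nested(env=None):
    """True when this hook is running inside a capture we spawned ourselves."""
    env = os.environ if env is None else env
    return bool(env.get(GUARD))

def capture_env(env=None):
    """Copy of the environment with the guard set, for the headless child process."""
    child = dict(os.environ if env is None else env)
    child[GUARD] = "1"
    return child
```

A hook would call `is_nested()` before doing anything else and exit with status 0 if it returns True. The code that spawns the headless call would pass `env=capture_env()` to `subprocess.run`.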

The economics

Per prompt: a sub-100ms grep plus a small context injection of about five entries, and only when there is a match. Per session: one cheap capture, and only if the session actually learned something (the pre-filter plus a "did anything change" check before writing).
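
The post does not show its "did anything change" check. One simple way to get that behaviour, offered purely as an assumption about how it could be done, is to compare the new entry with what is already on disk and skip the write when they match. The function name is hypothetical.

```python
import os

def write_if_changed(path, new_text):
    """Write new_text to path only when it differs from what is already there.
    Returns True if the file was written, False if it was left untouched."""
    try:
        with open(path, encoding="utf-8") as fh:
            if fh.read() == new_text:
                return False
    except FileNotFoundError:
        pass
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(new_text)
    os.replace(tmp, path)   # swap into place so readers never see a half-written file
    return True
```

Skipping identical writes also keeps file modification times stable, which helps if the KB is synced or version-controlled.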

That is the trade. You pay a tiny, bounded cost to cache knowledge, so future sessions skip the expensive part: re-reading a large codebase, re-querying tools, re-investigating a problem you already solved. One avoided re-derivation pays for a lot of captures.

The lessons, condensed

UserPromptSubmit is where you inject retrieval. It is the only hook that runs on every prompt and can add context before the model runs.
A ranked retriever is only as good as what you let it rank: weight matches by where they land so the file about a topic wins, and exclude anything that is a copy rather than knowledge.
Anything on Stop runs on the critical path of every turn. Keep it instant.
SessionEnd cannot run long or background work. Enqueue there, and do the heavy lifting on the next SessionStart, where the process survives.
Make capture idempotent. The queue will hand you the same session twice; key on transcript path plus size so resumed work re-captures and unchanged work does not.
A pre-filter must inspect the conversation, not the raw transcript. Tool metadata and injected context contain your trigger words in every session, so a naive grep never actually filters.
Guard against recursion before you ever run a hook that spawns the assistant.

Top comments (1)

Shoogar • Jul 2

The SessionEnd-enqueue / SessionStart-drain split is a legitimately clever workaround for the hook process-survival constraint — and documenting the Stop-hook mistake that got you there made this ten times more useful than the usual hooks tutorial. Curious: have you hit KB rot yet, where two captured "lessons" from different sessions contradict each other and search surfaces the wrong one at exactly the wrong moment? That's where my own memory layer ended up needing an explicit superseding mechanism.