YHH

Designing a Hooks System for AI Agent CLIs: 4 Lifecycle Points That Cover Everything

Designing Hooks for an AI Agent CLI: 4 Lifecycle Points, Shell-Only, and What I Cut

A few weeks ago I shipped Reasonix — a DeepSeek-native TypeScript agent framework. The first article was about the cache-first prompt structure that pushed cache hit rates to 85-95% on real sessions. That got picked up on dev.to.

Two weeks later, I needed hooks. Specifically I wanted my coding agent to:

  • Refuse rm -rf even if the model decided that was a great idea
  • Auto-format files after every edit
  • Auto-commit at the end of a session if the diff was clean

I could have hardcoded each of these. But every time I did, three days later I'd want a fourth thing — and the agent would grow a "configuration ballast" problem.

So I designed a hooks system. This post is the design doc, including what I cut and the bugs that shaped it.

Why hooks at all (and not the obvious alternatives)

Three things that almost made it in but didn't:

1. A TypeScript plugin system.
You'd npm install reasonix-hook-prettier, the framework would import it, hooks would be JS callbacks. I rejected this because hooks are a sysadmin concern, not a JS concern. The user who wants "run prettier after every edit" already has prettier on $PATH and knows how to invoke it. Forcing them through npm and a TS API is gatekeeping.

2. A middleware chain like Express.
loop.use((event, next) => ...). I rejected this because hooks aren't transformers. They observe, and sometimes veto. Middleware semantics imply "you can rewrite the payload and pass it on" — a much bigger contract than what most users actually want.

3. Webhooks (HTTP callouts).
Too much infrastructure. "After every edit, run prettier" should not require standing up an HTTP server.

What won: a hook is a shell command. Reasonix invokes it with a JSON payload on stdin and reads the exit code.

// .reasonix/settings.json
{
  "hooks": {
    "PostToolUse": [
      { "match": "edit_file", "command": "prettier --write \"$(jq -r .toolArgs.path)\"" }
    ]
  }
}

Language-agnostic. Composable with everything in the user's terminal. Mental model is identical to "what would I run in my shell?"
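Here is a minimal sketch of that contract, not the actual runner from src/hooks.ts. It assumes Node's child_process and a POSIX shell; the helper name spawnHook and the HookRun shape are made up for illustration. The payload goes to stdin as one JSON line, and only the exit code and stderr come back.

```typescript
import { spawn } from "node:child_process";

// Hypothetical result shape for this sketch; not a Reasonix type.
interface HookRun {
  exitCode: number | null;
  stderr: string;
}

// Run one hook command through the shell, feeding the payload on stdin.
function spawnHook(command: string, payload: unknown): Promise<HookRun> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, { shell: true, stdio: ["pipe", "ignore", "pipe"] });
    let stderr = "";
    child.stderr.on("data", (chunk: Buffer) => {
      stderr += chunk.toString("utf8");
    });
    child.on("error", reject); // the shell itself could not be started
    child.on("close", (code) => resolve({ exitCode: code, stderr: stderr.trim() }));
    // A hook that never reads stdin may close the pipe early; ignore that EPIPE.
    child.stdin.on("error", () => {});
    child.stdin.end(JSON.stringify(payload) + "\n");
  });
}

// Example: a gating hook that always vetoes.
spawnHook("echo 'denied: demo' >&2; exit 2", {
  event: "PreToolUse",
  cwd: process.cwd(),
  toolName: "shell",
  toolArgs: { command: "rm -rf /" },
}).then((run) => console.log(run)); // exitCode is 2, stderr is "denied: demo"
```

Everything the hook prints to stdout is ignored here, which mirrors the "exit code is the protocol" idea described later in the post.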

The 4 events — split into 2 categories

| Event | Fires when | Category |
|---|---|---|
| PreToolUse | The model decided to call a tool, before dispatch | Gating |
| PostToolUse | A tool returned, before the result reaches the model | Observing |
| UserPromptSubmit | The user typed a prompt, before it goes to the model | Gating |
| Stop | The loop finished a turn (assistant returned final text) | Observing |

Two categories, not four. The split drives every other decision in the design:

const BLOCKING_EVENTS: ReadonlySet<HookEvent> = new Set(["PreToolUse", "UserPromptSubmit"]);

const DEFAULT_TIMEOUTS_MS: Record<HookEvent, number> = {
  PreToolUse: 5_000,
  UserPromptSubmit: 5_000,
  PostToolUse: 30_000,
  Stop: 30_000,
};

Gating events can refuse to let the loop proceed (exit code 2 = block). They have a tight 5-second timeout because they hold up forward progress.

Observing events fire after the action is already done, so blocking is meaningless — exit 2 from a PostToolUse hook just becomes a warning. They get 30 seconds because nobody is waiting on them.

This asymmetry is hardcoded in the decision matrix:

export function decideOutcome(event: HookEvent, raw: HookSpawnResult) {
  if (raw.spawnError) return "error";
  if (raw.timedOut) return BLOCKING_EVENTS.has(event) ? "block" : "warn";
  if (raw.exitCode === 0) return "pass";
  if (raw.exitCode === 2 && BLOCKING_EVENTS.has(event)) return "block";
  return "warn";
}
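
To see the asymmetry in action, here is a self-contained version of the same function with a few inputs run through it. The HookSpawnResult shape and the Outcome type are assumptions: only the three fields that decideOutcome reads are modeled.

```typescript
type HookEvent = "PreToolUse" | "PostToolUse" | "UserPromptSubmit" | "Stop";
type Outcome = "pass" | "block" | "warn" | "error";

// Assumed shape: just the fields the decision function looks at.
interface HookSpawnResult {
  spawnError?: Error;
  timedOut: boolean;
  exitCode: number | null;
}

const BLOCKING_EVENTS: ReadonlySet<HookEvent> = new Set<HookEvent>([
  "PreToolUse",
  "UserPromptSubmit",
]);

function decideOutcome(event: HookEvent, raw: HookSpawnResult): Outcome {
  if (raw.spawnError) return "error";
  if (raw.timedOut) return BLOCKING_EVENTS.has(event) ? "block" : "warn";
  if (raw.exitCode === 0) return "pass";
  if (raw.exitCode === 2 && BLOCKING_EVENTS.has(event)) return "block";
  return "warn";
}

const exit2: HookSpawnResult = { timedOut: false, exitCode: 2 };
const slow: HookSpawnResult = { timedOut: true, exitCode: null };

console.log(decideOutcome("PreToolUse", exit2)); // block
console.log(decideOutcome("PostToolUse", exit2)); // warn
console.log(decideOutcome("UserPromptSubmit", slow)); // block
console.log(decideOutcome("Stop", slow)); // warn
console.log(decideOutcome("PreToolUse", { timedOut: false, exitCode: 1 })); // warn
```

Note that a timed-out gating hook blocks rather than passes: a hook that hangs is treated like a veto, not like an approval.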

Events I cut

  • SessionStart / SessionEnd — solved by the user's shell. If you want to run something before reasonix chat, you write pre-reasonix && reasonix chat. I'm not the right system to schedule that.
  • PreCompact — Reasonix doesn't do automatic context compaction (the cache-first design works against compaction). No event to hook.
  • Per-token streaming — would have been a hook firing for every token of model output. Use cases (PII redaction, profanity filter) exist, but the per-call cost would be brutal and post-processing the final text is more sensible.
  • OnError — error handling lives at the loop level; a hook can't meaningfully recover from "the model returned malformed JSON." A subprocess can't fix what only the loop sees.

The protocol — JSON in, exit code out

Every hook gets a single-line JSON envelope on stdin (pretty-printed here for readability):

{
  "event": "PreToolUse",
  "cwd": "/Users/me/my-project",
  "toolName": "edit_file",
  "toolArgs": { "path": "src/foo.ts", "content": "..." }
}

Fields differ by event. PostToolUse adds toolResult. UserPromptSubmit has prompt. Stop has lastAssistantText and turn. The shape is documented in hooks.ts.
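
One way to picture those shapes is a discriminated union keyed on event. This is a sketch, not the types from hooks.ts: the type names are hypothetical, and the types of toolResult (unknown) and turn (number) are guesses, since the article only names the fields.

```typescript
interface BasePayload {
  cwd: string;
}
interface PreToolUsePayload extends BasePayload {
  event: "PreToolUse";
  toolName: string;
  toolArgs: Record<string, unknown>;
}
interface PostToolUsePayload extends BasePayload {
  event: "PostToolUse";
  toolName: string;
  toolArgs: Record<string, unknown>;
  toolResult: unknown; // assumed type
}
interface UserPromptSubmitPayload extends BasePayload {
  event: "UserPromptSubmit";
  prompt: string;
}
interface StopPayload extends BasePayload {
  event: "Stop";
  lastAssistantText: string;
  turn: number; // assumed type
}
type HookPayload =
  | PreToolUsePayload
  | PostToolUsePayload
  | UserPromptSubmitPayload
  | StopPayload;

// JSON.stringify without an indent argument never emits raw newlines,
// so the envelope is always exactly one line.
function toEnvelope(payload: HookPayload): string {
  return JSON.stringify(payload) + "\n";
}

// The `event` tag lets TypeScript narrow to the right fields.
function summarize(p: HookPayload): string {
  switch (p.event) {
    case "PreToolUse":
      return `about to call ${p.toolName}`;
    case "PostToolUse":
      return `${p.toolName} returned`;
    case "UserPromptSubmit":
      return `prompt of ${p.prompt.length} chars`;
    case "Stop":
      return `turn ${p.turn} finished`;
  }
}

const stop: HookPayload = { event: "Stop", cwd: "/tmp", lastAssistantText: "line 1\nline 2", turn: 3 };
console.log(summarize(stop)); // turn 3 finished
console.log(toEnvelope(stop).trimEnd().includes("\n")); // false
```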

Exit code is the protocol:

| Exit code | Gating event | Observing event |
|---|---|---|
| 0 | pass (continue) | pass |
| 2 | block (stop the chain) | warn (action already done) |
| anything else | warn (continue, log) | warn |

Why exit code instead of structured stdout JSON? Because I wanted hooks to be writable in one line of bash:

! test -f .git/MERGE_HEAD || exit 2  # block any tool call during a merge; exits 0 otherwise

Versus:

test -f .git/MERGE_HEAD && echo '{"decision":"block","reason":"merge in progress"}' || echo '{"decision":"pass"}'

The second is cleaner from a typed-API perspective and worse from a "I'll write this in 30 seconds" perspective. I optimized for 30 seconds.

The cost: hooks can't return structured data back to the model. A PreToolUse hook can only veto or pass — it can't say "let me rewrite these args first." That's a real limitation. The upgrade path, if demand grows, is opt-in: a parseStdoutAsJson: true flag in settings. For now, not worth the complexity.
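
For a sense of what that opt-in path might involve, here is a purely hypothetical sketch. Nothing like it exists in Reasonix today; the function name parseDecision is invented, and the decision/reason shape is borrowed from the stdout example above. Anything that doesn't parse returns null, so the caller can fall back to the exit code.

```typescript
// Hypothetical decision shape, modeled on the stdout example in this post.
interface Decision {
  decision: "pass" | "block";
  reason?: string;
}

// Returns null for anything unrecognized so exit-code handling stays the default.
function parseDecision(stdout: string): Decision | null {
  let value: unknown;
  try {
    value = JSON.parse(stdout);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== "object" || value === null) return null;
  const obj = value as { decision?: unknown; reason?: unknown };
  const decision =
    obj.decision === "block" ? "block" : obj.decision === "pass" ? "pass" : null;
  if (decision === null) return null;
  return typeof obj.reason === "string"
    ? { decision, reason: obj.reason }
    : { decision };
}

console.log(parseDecision('{"decision":"block","reason":"merge in progress"}'));
// { decision: 'block', reason: 'merge in progress' }
console.log(parseDecision("prettier: formatted 1 file")); // null
```

Even this small version needs a fallback rule, a validation rule, and a precedence rule between stdout and the exit code, which is roughly the complexity the exit-code-only protocol avoids.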

Two surprising design calls

Call 1 — match is anchored regex, not substring

export function matchesTool(hook: ResolvedHook, toolName: string): boolean {
  if (hook.event !== "PreToolUse" && hook.event !== "PostToolUse") return true;
  const m = hook.match;
  if (!m || m === "*") return true;
  try {
    const re = new RegExp(`^(?:${m})$`);
    return re.test(toolName);
  } catch {
    return false; // malformed regex: the hook never fires (see "The bugs that shaped the design")
  }
}

So "match": "file" does not trigger on read_file. You have to write ".*file" or "read_file|write_file|edit_file".

I tried substring first. Within a day I had a hook configured as match: "edit" firing on tool_called_edit_text, git_edit_commit, and audit_log_edit — all unintentional. Substring matching is intuitive in the small, dangerous in the large.

The cost: more typing. The benefit: when you write match: "shell", that's exactly what fires. No surprises.
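
A quick side-by-side of the two strategies, as a standalone sketch. The tool names are illustrative, and substringMatch and anchoredMatch are helper names made up for this example; the anchored pattern is the same `^(?:...)$` wrapper used in matchesTool above.

```typescript
const tools = ["edit_file", "read_file", "git_edit_commit", "audit_log_edit", "shell"];

// Substring: fires whenever the pattern appears anywhere in the tool name.
const substringMatch = (pattern: string, tool: string): boolean => tool.includes(pattern);

// Anchored: the whole tool name must match. The non-capturing group keeps
// alternations like "a|b" inside the anchors.
const anchoredMatch = (pattern: string, tool: string): boolean =>
  new RegExp(`^(?:${pattern})$`).test(tool);

console.log(tools.filter((t) => substringMatch("edit", t)));
// [ 'edit_file', 'git_edit_commit', 'audit_log_edit' ]
console.log(tools.filter((t) => anchoredMatch("edit", t)));
// []
console.log(tools.filter((t) => anchoredMatch("edit_file|read_file", t)));
// [ 'edit_file', 'read_file' ]
console.log(tools.filter((t) => anchoredMatch(".*file", t)));
// [ 'edit_file', 'read_file' ]
```

Without the `(?:...)` group, a pattern like `edit_file|read_file` would anchor only its outer ends (`^edit_file` or `read_file$`), which quietly brings back prefix and suffix matching.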

I expect this to be the most-debated decision in this post. Feel free to argue with me in the comments.

Call 2 — events fire from two different layers

src/loop.ts        → fires PreToolUse, PostToolUse
src/cli/ui/App.tsx → fires UserPromptSubmit, Stop

The loop is a library. It runs headless, in tests, in scripts. It doesn't know what a "user prompt submission" is — it just knows about step(text). The text could come from a TUI, a JSON RPC call, or a for loop in a script.

So UserPromptSubmit lives at the App boundary — the TUI is what decides "the user just hit enter on this." Same for Stop — the loop emits assistant_final events, but "the turn is done from the user's perspective" is a UI concept.

Practically: if you embed CacheFirstLoop in your own app, you get tool-related hooks for free. You wire prompt and stop hooks yourself if you want them.

// Embedded usage — tool hooks "just work"
const loop = new CacheFirstLoop({ client, prefix, hooks });
for await (const ev of loop.step("...")) { /* ... */ }

// You decide what counts as a "submitted prompt"
const promptReport = await runHooks({
  hooks,
  payload: { event: "UserPromptSubmit", cwd, prompt: text }
});

Slightly more boilerplate. Way cleaner separation.
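
If you wire gating hooks yourself, the part your app has to respect is small: run the hooks in order and don't proceed once one blocks. The sketch below is not the real runHooks. The names runChain, HookSpec and ChainReport are hypothetical, the runner is injected so the example stays self-contained, and it assumes "stop the chain" means later hooks for the same event are skipped.

```typescript
type Outcome = "pass" | "block" | "warn" | "error";

interface HookSpec {
  command: string;
}

interface ChainReport {
  outcomes: Outcome[];
  blocked: boolean;
}

// Run hooks sequentially; the first "block" ends the chain.
async function runChain(
  hooks: HookSpec[],
  run: (hook: HookSpec) => Promise<Outcome>,
): Promise<ChainReport> {
  const outcomes: Outcome[] = [];
  for (const hook of hooks) {
    const outcome = await run(hook);
    outcomes.push(outcome);
    if (outcome === "block") return { outcomes, blocked: true };
  }
  return { outcomes, blocked: false };
}

// Fake runner for the demo: any command containing "deny" blocks.
const fakeRun = async (hook: HookSpec): Promise<Outcome> =>
  hook.command.includes("deny") ? "block" : "pass";

runChain(
  [{ command: "lint" }, { command: "deny-all" }, { command: "never-runs" }],
  fakeRun,
).then((report) => console.log(report));
// { outcomes: [ 'pass', 'block' ], blocked: true }
```

In an embedding app, the only decision left is what to do with blocked: for UserPromptSubmit, that usually means not calling loop.step at all.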

Real configurations that do useful things

All copy-pasteable. Drop them in .reasonix/settings.json (project) or ~/.reasonix/settings.json (global).

Block dangerous shell commands

{
  "hooks": {
    "PreToolUse": [
      {
        "match": "shell",
        "command": "jq -r .toolArgs.command | grep -qE '^(rm -rf|sudo|curl.*\\| bash)' && { echo 'denied: dangerous command' >&2; exit 2; } || exit 0",
        "description": "block rm -rf / sudo / curl|bash"
      }
    ]
  }
}

The >&2 matters — Reasonix surfaces stderr as the block reason in both the UI and the tool result the model sees. The model gets a structured refusal, not silence.

Auto-format after edits

{
  "hooks": {
    "PostToolUse": [
      {
        "match": "edit_file|write_file",
        "command": "path=$(jq -r .toolArgs.path) && prettier --write \"$path\" 2>/dev/null || true"
      }
    ]
  }
}

The || true is intentional — if prettier doesn't recognize the file type, the hook still passes. We don't want a yellow warning row for every .txt edit.

Daily cost ceiling

{
  "hooks": {
    "UserPromptSubmit": [
      {
        "command": "reasonix stats --today --json | jq -e '.totalCostUsd < 5' > /dev/null || { echo 'daily budget hit ($5)' >&2; exit 2; }",
        "description": "stop accepting prompts after $5/day"
      }
    ]
  }
}

Composes the /stats feature with the hooks system. reasonix stats --today --json returns running cost; the hook compares and blocks. Zero code changes to Reasonix needed.

Auto-commit on Stop, only if there are changes

{
  "hooks": {
    "Stop": [
      {
        "command": "git diff --quiet || (git add -A && git commit -m \"reasonix: $(jq -r .lastAssistantText | head -c 60)\" --no-verify)"
      }
    ]
  }
}

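If you trust the agent end-to-end, this turns every turn that modifies tracked files into a commit. Two caveats: git diff --quiet only sees unstaged changes to tracked files, so a turn that only creates new files won't trigger a commit, and --no-verify skips your pre-commit hooks. I personally don't enable it, but several Reasonix users do, and it's been stable enough to mention.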

The bugs that shaped the design

Three real bugs that turned into permanent design decisions.

1. SIGTERM doesn't always land on Windows shell children

First version killed timed-out hooks with child.kill("SIGTERM"). On Windows, when the hook ran through cmd.exe /c, SIGTERM was caught by the shell but never propagated to the actual hook process. The shell exited; the hook kept running. The fix was to follow the polite signal with a hard kill:

const timer = setTimeout(() => {
  timedOut = true;
  child.kill("SIGTERM");
  setTimeout(() => {
    try { child.kill("SIGKILL"); } catch { /* gone */ }
  }, 500);
}, input.timeoutMs);

500ms grace, then a hard kill. Slightly inelegant. Actually works.
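
The fragment above depends on surrounding state (child, timedOut, input). Here is a self-contained sketch of the same escalation, assuming Node's child_process and a POSIX shell for the demo command. The names runWithTimeout and TimedRun are hypothetical, and like the fragment it only signals the process that spawn() returned.

```typescript
import { spawn } from "node:child_process";

interface TimedRun {
  exitCode: number | null; // null when the process was killed by a signal
  timedOut: boolean;
}

function runWithTimeout(command: string, timeoutMs: number, graceMs = 500): Promise<TimedRun> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, { shell: true, stdio: "ignore" });
    let timedOut = false;
    let hardKill: ReturnType<typeof setTimeout> | undefined;

    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGTERM"); // polite request first
      hardKill = setTimeout(() => {
        child.kill("SIGKILL"); // escalate if it is still alive after the grace period
      }, graceMs);
    }, timeoutMs);

    // Clear both timers so a finished hook never receives a late signal.
    const cleanup = (): void => {
      clearTimeout(timer);
      if (hardKill) clearTimeout(hardKill);
    };
    child.on("error", (err) => {
      cleanup();
      reject(err);
    });
    child.on("exit", (code) => {
      cleanup();
      resolve({ exitCode: code, timedOut });
    });
  });
}

// A 2-second sleep against a 200ms budget: the result reports timedOut as true.
runWithTimeout("sleep 2", 200).then((result) => console.log(result));
```

Listening for exit rather than close is a deliberate choice in this sketch: it resolves as soon as the direct child is gone, even if a grandchild is still holding inherited file descriptors open.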

2. Malformed regex in match used to fire on every tool

new RegExp("[unclosed") throws. The first version caught the throw and returned true — assuming the user's intent was permissive. That's wrong: a typo in a regex shouldn't suddenly cause a PreToolUse hook to fire on every tool call.

} catch {
  // malformed regex → don't fire (safer than firing on every tool)
  return false;
}

A typo in match now makes the hook silently inactive (visible in /hooks list) instead of suddenly gating every tool call.

3. A typo in settings.json crashed the entire CLI

JSON.parse throws on malformed JSON. The first version let it propagate. One missing comma in ~/.reasonix/settings.json and reasonix chat exited with a stack trace before showing the TUI.

import { existsSync, readFileSync } from "node:fs";

function readSettingsFile(path: string): HookSettings | null {
  if (!existsSync(path)) return null;
  try {
    const raw = readFileSync(path, "utf8");
    const parsed = JSON.parse(raw);
    if (parsed && typeof parsed === "object") return parsed as HookSettings;
  } catch {
    /* malformed JSON → treat as no hooks; don't lose the whole CLI to a typo */
  }
  return null;
}

The principle: a tool that's broken should still be openable, even if degraded. Configuration is the most fragile part of any system; it should never take down the part that lets you fix the configuration.

What's still missing

  • No structured rewrite from hooks. A PreToolUse hook can block or pass; it can't rewrite arguments. Want to redact secrets from toolArgs.path? You can't — only refuse the call.
  • No hook timing in /stats. Hook duration is captured per-outcome (durationMs) but isn't aggregated. A 30-second formatter hook on every edit is a real productivity tax — should be visible.
  • No match for non-tool events. UserPromptSubmit always fires for every prompt. There's an argument for match working as a regex over the prompt text. Haven't been convinced yet.
  • Composition with skills. Reasonix 0.4.26 added subagents-via-skills. Subagent invocations don't currently fire PreToolUse (a subagent isn't a tool in the model's eyes). Should they? Probably yes, separate event: PreSubagent. On the roadmap.

Open questions I'd love feedback on

  • Anchored vs substring match. Strong opinion, weakly held. Substring crowd has a point about ergonomics. Anchored crowd has a point about predictability. Vote in the comments.
  • Should hooks be allowed to mutate the payload? I deliberately said no. But "redact this argument before the tool runs" is real. Worth a parseStdoutAsJson: true flag?
  • Per-event vs per-tool default timeouts. I picked 5s for gating, 30s for observing — uniform across tools. But a PostToolUse hook on web_search (already 10s of latency) is different from one on edit_file (instant). Should defaults adapt?

Quick start

npm install -g reasonix
mkdir -p .reasonix
cat > .reasonix/settings.json <<'EOF'
{
  "hooks": {
    "Stop": [
      { "command": "echo 'turn done.' && date" }
    ]
  }
}
EOF
reasonix chat
# /hooks         list active
# /hooks reload  re-read after editing settings.json

Full source: github.com/esengine/reasonix, specifically src/hooks.ts and tests/hooks.test.ts.

If you want the bigger picture on Reasonix (cache-first loop, R1 thought harvesting, branching), the first article covers that.

Issues, design arguments, and counter-examples especially welcome.
