Designing Hooks for an AI Agent CLI: 4 Lifecycle Points, Shell-Only, and What I Cut
A few weeks ago I shipped Reasonix — a DeepSeek-native TypeScript agent framework. The first article was about the cache-first prompt structure that pushed cache hit rates to 85-95% on real sessions. That got picked up on dev.to.
Two weeks later, I needed hooks. Specifically I wanted my coding agent to:
- Refuse rm -rf even if the model decided that was a great idea
- Auto-format files after every edit
- Auto-commit at the end of a session if the diff was clean
I could have hardcoded each of these. But every time I did, three days later I'd want a fourth thing — and the agent would grow a "configuration ballast" problem.
So I designed a hooks system. This post is the design doc, including what I cut and the bugs that shaped it.
Why hooks at all (and not the obvious alternatives)
Three things that almost made it in but didn't:
1. A TypeScript plugin system.
You'd npm install reasonix-hook-prettier, the framework would import it, hooks would be JS callbacks. I rejected this because hooks are a sysadmin concern, not a JS concern. The user who wants "run prettier after every edit" already has prettier on $PATH and knows how to invoke it. Forcing them through npm and a TS API is gatekeeping.
2. A middleware chain like Express.
loop.use((event, next) => ...). I rejected this because hooks aren't transformers. They observe, and sometimes veto. Middleware semantics imply "you can rewrite the payload and pass it on" — a much bigger contract than what most users actually want.
3. Webhooks (HTTP callouts).
Too much infrastructure. "After every edit, run prettier" should not require standing up an HTTP server.
What won: a hook is a shell command. Reasonix invokes it with a JSON payload on stdin and reads the exit code.
// .reasonix/settings.json
{
"hooks": {
"PostToolUse": [
{ "match": "edit_file", "command": "prettier --write \"$(jq -r .toolArgs.path)\"" }
]
}
}
Language-agnostic. Composable with everything in the user's terminal. Mental model is identical to "what would I run in my shell?"
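To make the contract concrete, here is a minimal sketch of the invoking side, assuming Node's built-in child_process module and a POSIX shell. The names runHookSync and HookRunResult are hypothetical and exist only for this illustration; the real runner in src/hooks.ts is presumably asynchronous.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical result shape for this sketch; not Reasonix's real type.
interface HookRunResult {
  exitCode: number | null; // null when the process was killed by a signal
  stderr: string;
  timedOut: boolean;
}

// Run one hook command through the system shell, feed it the JSON payload
// on stdin, and report exit code and stderr. Synchronous for brevity.
function runHookSync(command: string, payload: object, timeoutMs = 5_000): HookRunResult {
  const res = spawnSync(command, {
    shell: true, // same mental model as typing the command in a terminal
    input: JSON.stringify(payload) + "\n", // single-line JSON envelope
    encoding: "utf8",
    timeout: timeoutMs,
  });
  const timedOut =
    res.error !== undefined && (res.error as { code?: string }).code === "ETIMEDOUT";
  return { exitCode: res.status, stderr: res.stderr ?? "", timedOut };
}

const payload = {
  event: "PreToolUse",
  cwd: process.cwd(),
  toolName: "shell",
  toolArgs: { command: "ls" },
};

// A hook that drains stdin, explains itself on stderr, and vetoes with exit 2.
const result = runHookSync("cat > /dev/null; echo 'denied' >&2; exit 2", payload);
console.log(result.exitCode, result.stderr.trim());
```

The hook itself never needs to know it is talking to a TypeScript program: it reads stdin, optionally writes to stderr, and exits.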
The 4 events — split into 2 categories
| Event | Fires when | Category |
|---|---|---|
| PreToolUse | The model decided to call a tool, before dispatch | Gating |
| PostToolUse | A tool returned, before the result reaches the model | Observing |
| UserPromptSubmit | The user typed a prompt, before it goes to the model | Gating |
| Stop | The loop finished a turn (assistant returned final text) | Observing |
Two categories, not four. The split drives every other decision in the design:
const BLOCKING_EVENTS: ReadonlySet<HookEvent> = new Set(["PreToolUse", "UserPromptSubmit"]);
const DEFAULT_TIMEOUTS_MS: Record<HookEvent, number> = {
PreToolUse: 5_000,
UserPromptSubmit: 5_000,
PostToolUse: 30_000,
Stop: 30_000,
};
Gating events can refuse to let the loop proceed (exit code 2 = block). They have a tight 5-second timeout because they hold up forward progress.
Observing events fire after the action is already done, so blocking is meaningless — exit 2 from a PostToolUse hook just becomes a warning. They get 30 seconds because nobody is waiting on them.
This asymmetry is hardcoded in the decision matrix:
export function decideOutcome(event: HookEvent, raw: HookSpawnResult) {
if (raw.spawnError) return "error";
if (raw.timedOut) return BLOCKING_EVENTS.has(event) ? "block" : "warn";
if (raw.exitCode === 0) return "pass";
if (raw.exitCode === 2 && BLOCKING_EVENTS.has(event)) return "block";
return "warn";
}
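Running the matrix over a few raw results shows the asymmetry directly. The HookEvent, Outcome, and HookSpawnResult shapes below are assumptions reconstructed from the snippets in this post, and the function body is repeated only so the example is self-contained.

```typescript
type HookEvent = "PreToolUse" | "PostToolUse" | "UserPromptSubmit" | "Stop";
type Outcome = "pass" | "block" | "warn" | "error";

// Assumed shape: only the fields decideOutcome reads.
interface HookSpawnResult {
  exitCode: number | null;
  timedOut: boolean;
  spawnError?: Error;
}

const BLOCKING_EVENTS: ReadonlySet<HookEvent> = new Set<HookEvent>([
  "PreToolUse",
  "UserPromptSubmit",
]);

function decideOutcome(hookEvent: HookEvent, raw: HookSpawnResult): Outcome {
  if (raw.spawnError) return "error";
  if (raw.timedOut) return BLOCKING_EVENTS.has(hookEvent) ? "block" : "warn";
  if (raw.exitCode === 0) return "pass";
  if (raw.exitCode === 2 && BLOCKING_EVENTS.has(hookEvent)) return "block";
  return "warn";
}

// Helpers to build raw results for the walkthrough.
const exited = (code: number): HookSpawnResult => ({ exitCode: code, timedOut: false });
const timedOutResult: HookSpawnResult = { exitCode: null, timedOut: true };

const cases: Array<[HookEvent, string, HookSpawnResult]> = [
  ["PreToolUse", "exit 0", exited(0)],
  ["PreToolUse", "exit 2", exited(2)],
  ["PreToolUse", "exit 1", exited(1)],
  ["PreToolUse", "timeout", timedOutResult],
  ["PostToolUse", "exit 2", exited(2)],
  ["Stop", "timeout", timedOutResult],
];

for (const [hookEvent, label, raw] of cases) {
  console.log(`${hookEvent} / ${label} -> ${decideOutcome(hookEvent, raw)}`);
}
```

Note that a timeout on a gating event blocks: a gate that cannot answer in time is treated as a refusal, not as a pass.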
Events I cut
- SessionStart / SessionEnd: solved by the user's shell. If you want to run something before reasonix chat, you write pre-reasonix && reasonix chat. I'm not the right system to schedule that.
- PreCompact: Reasonix doesn't do automatic context compaction (the cache-first design works against compaction). No event to hook.
- Per-token streaming: would have been a hook firing for every token of model output. Use cases (PII redaction, profanity filter) exist, but the per-call cost would be brutal and post-processing the final text is more sensible.
- OnError: error handling lives at the loop level; a hook can't meaningfully recover from "the model returned malformed JSON." A subprocess can't fix what only the loop sees.
The protocol — JSON in, exit code out
Every hook gets a single-line JSON envelope on stdin (pretty-printed here for readability):
{
"event": "PreToolUse",
"cwd": "/Users/me/my-project",
"toolName": "edit_file",
"toolArgs": { "path": "src/foo.ts", "content": "..." }
}
Fields differ by event. PostToolUse adds toolResult. UserPromptSubmit has prompt. Stop has lastAssistantText and turn. The shape is documented in hooks.ts.
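For anyone writing a hook in TypeScript, the per-event fields map naturally onto a discriminated union. This is a sketch, not the definition in hooks.ts: the field names come from this post, but the field types (for example turn as a number and toolResult as unknown) are my assumptions.

```typescript
// Assumed field types: the post names the fields but not their types.
interface BasePayload {
  cwd: string;
}
interface PreToolUsePayload extends BasePayload {
  event: "PreToolUse";
  toolName: string;
  toolArgs: Record<string, unknown>;
}
interface PostToolUsePayload extends BasePayload {
  event: "PostToolUse";
  toolName: string;
  toolArgs: Record<string, unknown>;
  toolResult: unknown;
}
interface UserPromptSubmitPayload extends BasePayload {
  event: "UserPromptSubmit";
  prompt: string;
}
interface StopPayload extends BasePayload {
  event: "Stop";
  lastAssistantText: string;
  turn: number;
}
type HookPayload =
  | PreToolUsePayload
  | PostToolUsePayload
  | UserPromptSubmitPayload
  | StopPayload;

// Switching on `event` narrows the payload to the fields that event carries.
function summarize(p: HookPayload): string {
  switch (p.event) {
    case "PreToolUse":
      return `about to call ${p.toolName}`;
    case "PostToolUse":
      return `${p.toolName} returned`;
    case "UserPromptSubmit":
      return `prompt: ${p.prompt.slice(0, 40)}`;
    case "Stop":
      return `turn ${p.turn} finished`;
  }
}

// The envelope as it would arrive on stdin: one line of JSON.
const line =
  '{"event":"PreToolUse","cwd":"/Users/me/my-project","toolName":"edit_file","toolArgs":{"path":"src/foo.ts","content":"..."}}';
const parsedPayload = JSON.parse(line) as HookPayload;
console.log(summarize(parsedPayload));
```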
Exit code is the protocol:
| Exit code | Gating event | Observing event |
|---|---|---|
| 0 | pass (continue) | pass |
| 2 | block (stop the chain) | warn (action already done) |
| anything else | warn (continue, log) | warn |
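"Stop the chain" can be sketched in a few lines. This assumes hooks for an event run in configuration order and that the first block short-circuits the rest; runChain and HookReport are hypothetical names, and the runner is injected so the sketch does not spawn anything.

```typescript
type Outcome = "pass" | "block" | "warn" | "error";

// Hypothetical report shape for this sketch.
interface HookReport {
  outcomes: Outcome[];
  blocked: boolean;
}

// Run hooks in order; a "block" ends the chain, anything else continues.
function runChain(commands: string[], run: (command: string) => Outcome): HookReport {
  const outcomes: Outcome[] = [];
  for (const command of commands) {
    const outcome = run(command);
    outcomes.push(outcome);
    if (outcome === "block") return { outcomes, blocked: true };
  }
  return { outcomes, blocked: false };
}

// Fake runner: the second hook vetoes, so the third never executes.
const executed: string[] = [];
const report = runChain(["lint-check", "merge-guard", "notify"], (command) => {
  executed.push(command);
  if (command === "merge-guard") return "block";
  return command === "lint-check" ? "warn" : "pass";
});
console.log(executed, report);
```

A warn is logged and the chain keeps going; only a block from a gating event stops it.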
Why exit code instead of structured stdout JSON? Because I wanted hooks to be writable in one line of bash:
test -f .git/MERGE_HEAD && exit 2 # block any tool call during a merge
Versus:
test -f .git/MERGE_HEAD && echo '{"decision":"block","reason":"merge in progress"}' || echo '{"decision":"pass"}'
The second is cleaner from a typed-API perspective and worse from a "I'll write this in 30 seconds" perspective. I optimized for 30 seconds.
The cost: hooks can't return structured data back to the model. A PreToolUse hook can only veto or pass — it can't say "let me rewrite these args first." That's a real limitation. The upgrade path, if demand grows, is opt-in: a parseStdoutAsJson: true flag in settings. For now, not worth the complexity.
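If that opt-in flag ever ships, the parsing side might look roughly like this. Everything here is hypothetical: parseHookStdout and the HookDecision shape are invented for illustration, and nothing like this exists in Reasonix today. The key property is that non-JSON output falls back to plain exit-code semantics.

```typescript
// Hypothetical decision shape; nothing like this exists in Reasonix today.
interface HookDecision {
  decision: "pass" | "block";
  reason?: string;
  toolArgs?: Record<string, unknown>; // rewritten args, if the hook supplies them
}

// Parse a hook's stdout as a decision. Returns null when the output is not a
// well-formed decision, so the caller can fall back to the exit code.
function parseHookStdout(stdout: string): HookDecision | null {
  let value: unknown;
  try {
    value = JSON.parse(stdout);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) return null;
  const obj = value as Record<string, unknown>;

  const rawDecision = obj["decision"];
  const decision = rawDecision === "block" ? "block" : rawDecision === "pass" ? "pass" : null;
  if (decision === null) return null;

  const out: HookDecision = { decision };
  const reason = obj["reason"];
  if (typeof reason === "string") out.reason = reason;
  const toolArgs = obj["toolArgs"];
  if (typeof toolArgs === "object" && toolArgs !== null && !Array.isArray(toolArgs)) {
    out.toolArgs = toolArgs as Record<string, unknown>;
  }
  return out;
}

console.log(parseHookStdout('{"decision":"block","reason":"merge in progress"}'));
console.log(parseHookStdout("prettier: formatted 1 file"));
```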
Two surprising design calls
Call 1 — match is anchored regex, not substring
export function matchesTool(hook: ResolvedHook, toolName: string): boolean {
if (hook.event !== "PreToolUse" && hook.event !== "PostToolUse") return true;
const m = hook.match;
if (!m || m === "*") return true;
try {
const re = new RegExp(`^(?:${m})$`);
return re.test(toolName);
} catch {
return false; // fail closed — see "Bugs that shaped the design"
}
}
So "match": "file" does not trigger on read_file. You have to write ".*file" or "read_file|write_file|edit_file".
I tried substring first. Within a day I had a hook configured as match: "edit" firing on tool_called_edit_text, git_edit_commit, and audit_log_edit — all unintentional. Substring matching is intuitive in the small, dangerous in the large.
The cost: more typing. The benefit: when you write match: "shell", that's exactly what fires. No surprises.
I expect this to be the most-debated decision in this post. Feel free to argue with me in the comments.
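The difference is easy to see side by side. This sketch reuses the tool names from the story above; substringMatch is the behavior I abandoned and anchoredMatch mirrors the wrapping that matchesTool applies.

```typescript
const toolNames = ["edit_file", "tool_called_edit_text", "git_edit_commit", "audit_log_edit"];

// The abandoned behavior: fire if the pattern appears anywhere in the name.
const substringMatch = (pattern: string, toolName: string): boolean =>
  toolName.includes(pattern);

// The shipped behavior: the pattern must match the whole tool name.
const anchoredMatch = (pattern: string, toolName: string): boolean =>
  new RegExp(`^(?:${pattern})$`).test(toolName);

const bySubstring = toolNames.filter((n) => substringMatch("edit", n));
const byAnchoredExact = toolNames.filter((n) => anchoredMatch("edit_file", n));
const byAnchoredWildcard = toolNames.filter((n) => anchoredMatch(".*edit.*", n));

console.log("substring 'edit':", bySubstring);
console.log("anchored 'edit_file':", byAnchoredExact);
console.log("anchored '.*edit.*':", byAnchoredWildcard);
```

With anchoring, the broad match is still available, but you have to ask for it explicitly with .* on both sides.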
Call 2 — events fire from two different layers
src/loop.ts → fires PreToolUse, PostToolUse
src/cli/ui/App.tsx → fires UserPromptSubmit, Stop
The loop is a library. It runs headless, in tests, in scripts. It doesn't know what a "user prompt submission" is — it just knows about step(text). The text could come from a TUI, a JSON RPC call, or a for loop in a script.
So UserPromptSubmit lives at the App boundary — the TUI is what decides "the user just hit enter on this." Same for Stop — the loop emits assistant_final events, but "the turn is done from the user's perspective" is a UI concept.
Practically: if you embed CacheFirstLoop in your own app, you get tool-related hooks for free. You wire prompt and stop hooks yourself if you want them.
// Embedded usage — tool hooks "just work"
const loop = new CacheFirstLoop({ client, prefix, hooks });
for await (const ev of loop.step("...")) { /* ... */ }
// You decide what counts as a "submitted prompt"
const promptReport = await runHooks({
hooks,
payload: { event: "UserPromptSubmit", cwd, prompt: text }
});
Slightly more boilerplate. Way cleaner separation.
Real configurations that do useful things
All copy-pasteable. Drop them in .reasonix/settings.json (project) or ~/.reasonix/settings.json (global).
Block dangerous shell commands
{
"hooks": {
"PreToolUse": [
{
"match": "shell",
"command": "jq -r .toolArgs.command | grep -qE '^(rm -rf|sudo|curl.*\\| bash)' && { echo 'denied: dangerous command' >&2; exit 2; } || exit 0",
"description": "block rm -rf / sudo / curl|bash"
}
]
}
}
The >&2 matters — Reasonix surfaces stderr as the block reason in both the UI and the tool result the model sees. The model gets a structured refusal, not silence.
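It is worth checking what that pattern does and does not catch before relying on it. The sketch below ports the grep -E expression to a JavaScript regex, which I believe behaves the same for this particular pattern; the sample commands are made up for illustration.

```typescript
// JavaScript port of the grep -E pattern from the hook above.
const dangerous = /^(rm -rf|sudo|curl.*\| bash)/;

const samples: Array<[string, boolean]> = [
  ["rm -rf /tmp/build", dangerous.test("rm -rf /tmp/build")],
  ["sudo apt install jq", dangerous.test("sudo apt install jq")],
  ["curl https://example.com/install.sh | bash", dangerous.test("curl https://example.com/install.sh | bash")],
  ["ls -la", dangerous.test("ls -la")],
  // The leading ^ means only the start of the command line is inspected.
  ["cd /tmp && rm -rf .", dangerous.test("cd /tmp && rm -rf .")],
  // Flag order matters to a literal pattern.
  ["rm -fr build", dangerous.test("rm -fr build")],
];

for (const [command, denied] of samples) {
  console.log(`${denied ? "deny " : "allow"}  ${command}`);
}
```

The last two samples slip through, so treat a hook like this as a seatbelt against the obvious cases, not as a sandbox.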
Auto-format after edits
{
"hooks": {
"PostToolUse": [
{
"match": "edit_file|write_file",
"command": "path=$(jq -r .toolArgs.path) && prettier --write \"$path\" 2>/dev/null || true"
}
]
}
}
The || true is intentional — if prettier doesn't recognize the file type, the hook still passes. We don't want a yellow warning row for every .txt edit.
Daily cost ceiling
{
"hooks": {
"UserPromptSubmit": [
{
"command": "reasonix stats --today --json | jq -e '.totalCostUsd < 5' > /dev/null || { echo 'daily budget hit ($5)' >&2; exit 2; }",
"description": "stop accepting prompts after $5/day"
}
]
}
}
Composes the /stats feature with the hooks system. reasonix stats --today --json returns running cost; the hook compares and blocks. Zero code changes to Reasonix needed.
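Whatever tool does the comparison, it has to treat the cost as a number and not as text. A quick TypeScript illustration of the failure mode (illustrative only, not Reasonix code; isUnderBudget is a hypothetical helper):

```typescript
// Numeric check: the comparison a budget gate actually wants.
function isUnderBudget(totalCostUsd: number, limitUsd: number): boolean {
  return totalCostUsd < limitUsd;
}

const spentText = "10.00"; // a cost value after it has been printed as text
const limitText = "5.00";

// String comparison is lexicographic: "1" sorts before "5".
const asStrings = spentText < limitText;
// Numeric comparison after parsing the same two values.
const asNumbers = isUnderBudget(Number(spentText), Number(limitText));

console.log({ asStrings, asNumbers });
```

Compared as strings, a $10.00 day looks like it is under a $5.00 limit, which is exactly the day the gate is supposed to close.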
Auto-commit on Stop, only if something changed
{
"hooks": {
"Stop": [
{
"command": "git diff --quiet || (git add -A && git commit -m \"reasonix: $(jq -r .lastAssistantText | head -c 60)\" --no-verify)"
}
]
}
}
If you trust the agent end-to-end, this turns every chat into a commit. I personally don't enable it — but several Reasonix users do, and it's been stable enough to mention.
The bugs that shaped the design
Three real bugs that turned into permanent design decisions.
1. SIGTERM doesn't always land on Windows shell children
First version killed timed-out hooks with child.kill("SIGTERM"). On Windows, when the hook ran through cmd.exe /c, SIGTERM was caught by the shell but never propagated to the actual hook process. The shell exited; the hook kept running.
const timer = setTimeout(() => {
timedOut = true;
child.kill("SIGTERM");
setTimeout(() => {
try { child.kill("SIGKILL"); } catch { /* gone */ }
}, 500);
}, input.timeoutMs);
500ms grace, then a hard kill. Slightly inelegant. Actually works.
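The escalation is easier to reason about (and to test) when the kill function and the timer are injected. This is a sketch of the same two-stage pattern, not the code in src/hooks.ts; armTimeout, Schedule, and the fake clock are all hypothetical names.

```typescript
type Signal = "SIGTERM" | "SIGKILL";
type Schedule = (fn: () => void, delayMs: number) => void;

interface TimeoutState {
  timedOut: boolean;
}

// Arm a two-stage kill: SIGTERM at timeoutMs, SIGKILL graceMs after that.
function armTimeout(
  kill: (signal: Signal) => void,
  schedule: Schedule,
  timeoutMs: number,
  graceMs = 500,
): TimeoutState {
  const state: TimeoutState = { timedOut: false };
  schedule(() => {
    state.timedOut = true;
    kill("SIGTERM");
    schedule(() => {
      try {
        kill("SIGKILL");
      } catch {
        /* process already gone */
      }
    }, graceMs);
  }, timeoutMs);
  return state;
}

// Fake clock: collect callbacks, then fire them in due-time order.
const pending: Array<{ at: number; fn: () => void }> = [];
let clockMs = 0;
const fakeSchedule: Schedule = (fn, delayMs) => {
  pending.push({ at: clockMs + delayMs, fn });
};

const signalsSent: string[] = [];
const timeoutState = armTimeout(
  (signal) => {
    signalsSent.push(`${signal}@${clockMs}`);
  },
  fakeSchedule,
  5_000,
);

while (pending.length > 0) {
  pending.sort((a, b) => a.at - b.at);
  const next = pending.shift();
  if (!next) break;
  clockMs = next.at;
  next.fn();
}
console.log(signalsSent, timeoutState.timedOut);
```

In real use you would pass (signal) => child.kill(signal) and setTimeout, and clear the timers when the child exits on its own; that cleanup is omitted here.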
2. Malformed regex in match used to fire on every tool
new RegExp("[unclosed") throws. The first version caught the throw and returned true — assuming the user's intent was permissive. That's wrong: a typo in a regex shouldn't suddenly cause a PreToolUse hook to fire on every tool call.
} catch {
// malformed regex → don't fire (safer than firing on every tool)
return false;
}
A typo in match now makes the hook silently inactive (visible in /hooks list) instead of suddenly gating every tool call.
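One way to make an inactive hook visible is to validate each pattern once at load time and keep the error message around. This is a sketch under that assumption; checkMatch and MatchCheck are hypothetical names, and I am not claiming this is how /hooks computes its listing.

```typescript
// Hypothetical per-hook status that a listing command could display.
interface MatchCheck {
  pattern: string;
  active: boolean;
  error?: string;
}

// Compile the pattern the same way matching does and record any failure.
function checkMatch(pattern: string | undefined): MatchCheck {
  if (!pattern || pattern === "*") return { pattern: pattern || "*", active: true };
  try {
    new RegExp(`^(?:${pattern})$`);
    return { pattern, active: true };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { pattern, active: false, error };
  }
}

for (const pattern of ["edit_file|write_file", "[unclosed", undefined]) {
  const check = checkMatch(pattern);
  console.log(check.active ? "active  " : "INACTIVE", check.pattern, check.error ?? "");
}
```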
3. A typo in settings.json crashed the entire CLI
JSON.parse throws on malformed JSON. The first version let it propagate. One missing comma in ~/.reasonix/settings.json and reasonix chat exited with a stack trace before showing the TUI.
import { existsSync, readFileSync } from "node:fs";

// Returns the parsed settings, or null when the file is missing or unusable.
function readSettingsFile(path: string): HookSettings | null {
  if (!existsSync(path)) return null;
  try {
    const raw = readFileSync(path, "utf8");
    const parsed = JSON.parse(raw);
    if (parsed && typeof parsed === "object") return parsed as HookSettings;
  } catch {
    /* malformed JSON → treat as no hooks; don't lose the whole CLI to a typo */
  }
  return null;
}
The principle: a tool that's broken should still be openable, even if degraded. Configuration is the most fragile part of any system; it should never take down the part that lets you fix the configuration.
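"Degraded but openable" works best when the degradation is reported somewhere. A sketch of a parser that never throws and carries a warning alongside the result; parseSettings, LoadResult, and the minimal HookSettings shape are assumptions for illustration, not the code in the repository.

```typescript
// Assumed minimal shape; the real HookSettings type lives in the Reasonix source.
interface HookSettings {
  hooks?: Record<string, unknown>;
}

interface LoadResult {
  settings: HookSettings | null;
  warning?: string; // something the UI can show instead of crashing
}

// Parse settings text without ever throwing; bad input disables hooks only.
function parseSettings(raw: string, path: string): LoadResult {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
      return { settings: parsed as HookSettings };
    }
    return { settings: null, warning: `${path}: expected a JSON object, ignoring` };
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    return { settings: null, warning: `${path}: ${detail}; hooks disabled` };
  }
}

// A trailing comma, the classic typo in hand-edited JSON.
const broken = parseSettings('{ "hooks": { "Stop": [], } }', ".reasonix/settings.json");
const valid = parseSettings('{ "hooks": { "Stop": [] } }', ".reasonix/settings.json");
console.log(broken.settings, broken.warning !== undefined, valid.settings);
```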
What's still missing
- No structured rewrite from hooks. A PreToolUse hook can block or pass; it can't rewrite arguments. Want to redact secrets from toolArgs.path? You can't; you can only refuse the call.
- No hook timing in /stats. Hook duration is captured per-outcome (durationMs) but isn't aggregated. A 30-second formatter hook on every edit is a real productivity tax and should be visible.
- No match for non-tool events. UserPromptSubmit always fires for every prompt. There's an argument for match working as a regex over the prompt text. Haven't been convinced yet.
- Composition with skills. Reasonix 0.4.26 added subagents-via-skills. Subagent invocations don't currently fire PreToolUse (a subagent isn't a tool in the model's eyes). Should they? Probably yes, as a separate event: PreSubagent. On the roadmap.
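The timing gap is mostly an aggregation problem. A sketch of what a per-hook rollup could look like, assuming each outcome record carries the hook's command alongside durationMs; the record shape and aggregateTimings are hypothetical.

```typescript
// Assumed record shape: durationMs is mentioned above, command is my addition.
interface HookOutcomeRecord {
  command: string;
  durationMs: number;
}

interface HookTiming {
  command: string;
  runs: number;
  totalMs: number;
  maxMs: number;
}

// Roll up per-outcome durations into one row per hook, slowest total first.
function aggregateTimings(records: HookOutcomeRecord[]): HookTiming[] {
  const byCommand = new Map<string, HookTiming>();
  for (const record of records) {
    const timing =
      byCommand.get(record.command) ??
      { command: record.command, runs: 0, totalMs: 0, maxMs: 0 };
    timing.runs += 1;
    timing.totalMs += record.durationMs;
    timing.maxMs = Math.max(timing.maxMs, record.durationMs);
    byCommand.set(record.command, timing);
  }
  return Array.from(byCommand.values()).sort((a, b) => b.totalMs - a.totalMs);
}

const timings = aggregateTimings([
  { command: "prettier --write", durationMs: 800 },
  { command: "merge-guard", durationMs: 12 },
  { command: "prettier --write", durationMs: 1_200 },
]);
console.log(timings);
```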
Open questions I'd love feedback on
- Anchored vs substring match. Strong opinion, weakly held. Substring crowd has a point about ergonomics. Anchored crowd has a point about predictability. Vote in the comments.
- Should hooks be allowed to mutate the payload? I deliberately said no. But "redact this argument before the tool runs" is real. Worth a parseStdoutAsJson: true flag?
- Per-event vs per-tool default timeouts. I picked 5s for gating, 30s for observing, uniform across tools. But a PostToolUse hook on web_search (already 10s of latency) is different from one on edit_file (instant). Should defaults adapt?
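For the timeout question, one possible shape is a three-level lookup: an explicit per-hook value, then a per-tool override, then the per-event default. This is only a sketch to anchor the discussion; the timeoutMs field on a hook and the perTool table are assumptions, not current Reasonix settings.

```typescript
type HookEvent = "PreToolUse" | "PostToolUse" | "UserPromptSubmit" | "Stop";

const DEFAULT_TIMEOUTS_MS: Record<HookEvent, number> = {
  PreToolUse: 5_000,
  UserPromptSubmit: 5_000,
  PostToolUse: 30_000,
  Stop: 30_000,
};

// Hypothetical hook config with an optional explicit timeout.
interface HookConfig {
  command: string;
  timeoutMs?: number;
}

// Resolve in order: explicit hook setting, per-tool override, event default.
function resolveTimeout(
  hookEvent: HookEvent,
  hook: HookConfig,
  toolName?: string,
  perTool: Record<string, number> = {},
): number {
  if (hook.timeoutMs !== undefined) return hook.timeoutMs;
  const override = toolName !== undefined ? perTool[toolName] : undefined;
  if (override !== undefined) return override;
  return DEFAULT_TIMEOUTS_MS[hookEvent];
}

const perToolOverrides = { web_search: 60_000 };
console.log(resolveTimeout("PostToolUse", { command: "log-it" }, "web_search", perToolOverrides));
console.log(resolveTimeout("PostToolUse", { command: "log-it" }, "edit_file", perToolOverrides));
console.log(resolveTimeout("PreToolUse", { command: "guard", timeoutMs: 2_000 }, "web_search", perToolOverrides));
```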
Quick start
npm install -g reasonix
mkdir -p .reasonix
cat > .reasonix/settings.json <<'EOF'
{
"hooks": {
"Stop": [
{ "command": "echo 'turn done.' && date" }
]
}
}
EOF
reasonix chat
# /hooks list active
# /hooks reload re-read after editing settings.json
Full source: github.com/esengine/reasonix, specifically src/hooks.ts and tests/hooks.test.ts.
If you want the bigger picture on Reasonix (cache-first loop, R1 thought harvesting, branching), the first article covers that.
Issues, design arguments, and counter-examples especially welcome.