Your AI coding agent will eventually run rm -rf — so I built a fuse for it

#ai #devtools #cli #security

AI coding agents are great until the moment they aren't.

They don't just write text — they run real commands in your terminal, with your permissions. Most of the time that's the whole point of using one. But every so often an agent hits an error, decides the fix is to "clean things up," and runs something like:

rm -rf .

No malice, it genuinely thinks it's helping. That's actually what makes it dangerous: a human would freeze the second they realized "wait, that's the prod folder," but the agent just keeps going.

I got tired of babysitting mine, so I ended up building a fuse for it.

Telling it to "be careful" doesn't work

The obvious first move is to tell the agent to behave: "Never run destructive commands."

But that's just a request, not a control. The model forgets it halfway through a task, and honestly, if it could reliably tell what counts as destructive in context, you wouldn't need the rule at all. What you actually need is something that doesn't rely on the agent's judgment, or its mood that day.

A hook that checks the command before it runs

Claude Code has a PreToolUse hook that fires before a tool call actually executes, and it can block that call. That's really the whole trick.

agent-fuse is a small script wired into that hook. Before your agent runs any shell command, the fuse looks at the command text and returns one of three decisions:

deny — irreversible damage, blocked outright
ask — risky but sometimes intended, so it pauses for a human
allow — normal work, passes straight through

It talks to Claude Code using its JSON protocol. Input on stdin looks like:

{ "tool_name": "Bash", "tool_input": { "command": "rm -rf /" } }

And the output:

{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "deny",
    "permissionDecisionReason": "[agent-fuse:rm-rf-root-or-home] rm -rf targeting / , ~ , $HOME or a top-level path."
  }
}

The rules themselves are just JSON: a pattern, a severity, and a message.

{
  "id": "gcloud-delete-infra",
  "severity": "block",
  "pattern": "gcloud\\s+(sql\\s+instances\\s+delete|firestore\\s+databases\\s+delete|run\\s+services\\s+delete|projects\\s+delete|secrets\\s+delete)\\b",
  "message": "Deleting managed cloud infrastructure. Almost always irreversible."
}

The default set covers the usual footguns: rm -rf on /, ~, or $HOME; DROP/TRUNCATE; DELETE/UPDATE with no WHERE; git push --force to main; terraform destroy; gcloud … delete; recursive bucket deletes; dd of=/dev/…; mkfs; curl … | bash. You can add your own by pointing AGENT_FUSE_RULES at a file.

It judges the command itself, not how confident the agent sounds, so it doesn't matter how sure the agent is that this time it's fine.

The part I'd rather admit up front

You're a developer, you'd spot this in five seconds anyway, so I'll just say it directly: this matches command text with regex. It's not a shell parser.

Which means:

echo "DROP TABLE" trips the SQL rule (a false positive).
a deliberately obfuscated command — r""m -rf, or something base64-decoded and built up through variables — can slip past it (a false negative).

What it does stop is the far more common problem: an agent acting in good faith that runs a plainly-written destructive command. Think of it as a seatbelt, not armor plating, and it's no substitute for backups. If you're worried about a determined adversary, this isn't the tool for that. If you're worried about your agent enthusiastically deleting something at 2am, it is.

Right now it also only hooks Claude Code's Bash tool specifically. Cursor/Codex support and file-write coverage are next on the list.

Build it yourself, or just grab mine

You can absolutely build this yourself, the hook part isn't hard.

What takes actual time is the rest of it: sitting down to list out every destructive pattern you can think of, deciding block vs. ask for each one, testing that you haven't broken normal usage, and writing an installer that backs up existing settings and can be run more than once safely. That's basically the afternoon I spent on this.

The code itself is open to read, one Python file, no dependencies, so you can check the "no network calls" claim yourself. I also sell a packaged, curated, kept-updated version for $8: agent-fuse on Gumroad. If you'd rather just take the idea and build your own version, that's a perfectly good outcome too.

Either way, put something between your agent and the commands you can't undo. Giving an autonomous process root access on your machine probably deserves at least as much safety as we give a household circuit breaker.

Do you already guard your agents somehow, allowlists, sandboxes, a devcontainer? Curious what's actually working for people, drop it in the comments.