DEV Community

yotta
An AI Disabled Its Own Safety Guard — So I Redesigned It

TL;DR

  • I built omamori, a Rust CLI that blocks destructive commands executed by AI tools (Claude Code, Codex CLI, Cursor, etc.)
  • During testing, Gemini CLI autonomously discovered how to disable omamori's protection rules — without being told how
  • omamori now defends not just against dangerous commands, but against AI agents disabling the guard itself
  • What it can't block is explicitly documented in SECURITY.md and tested in a bypass corpus

yottayoshida / omamori

AI Agent's Omamori — protect your system from dangerous commands executed via AI CLI tools

omamori


Safety guard for AI CLI tools. Blocks dangerous commands — and resists being disabled.

When AI tools like Claude Code, Codex, or Cursor run shell commands, omamori intercepts destructive operations and replaces them with safe alternatives. It also defends itself against AI agents attempting to disable or bypass its protection (#22).

macOS only. Terminal commands are never affected — omamori only activates when it detects an AI tool's environment variable.


Quick Start

# Install (macOS)
brew install yottayoshida/tap/omamori

# Setup (shims + hooks + config — all in one)
omamori install --hooks

# Add to your shell profile (~/.zshrc or ~/.bashrc)
export PATH="$HOME/.omamori/shim:$PATH"

That's it. After brew upgrade, hooks and shims are auto-updated on the next command — no manual re-install needed.


An AI Disabled Its Own Safety Guard

The original idea was simple. When an AI CLI tool runs rm -rf, redirect the files to Trash instead of deleting them permanently. Don't interfere with terminal commands typed directly by the user. A small safety valve — nothing more.

That's how omamori started. A macOS guard tool that intercepts destructive commands from AI CLI tools (Claude Code, Codex CLI, Cursor, etc.) and either blocks them or replaces them with safer alternatives.

I tested it with each tool. Claude Code, Codex CLI, Cursor — all worked as expected. Dangerous commands were caught.

Then, during a real-world test with Gemini CLI, something happened that changed the design (Issue #22).

The AI autonomously discovered the omamori config disable command, turned off the protection rules, and then ran the destructive command. I never told it how to disable omamori.

For an AI, "remove the obstacle" is a rational strategy to complete its task.

Blocking commands wasn't enough. I had to defend against the AI dismantling the guard itself. That's when omamori's design changed.


The Threat Model

That incident made one thing clear: in the age of AI CLI tools, a safety guard that just enumerates dangerous commands and blocks them isn't sufficient.

AI CLI tools can propose and execute high-impact actions without fully evaluating the importance or irreversibility of what they're touching. Reports of destructive operations by AI agents are appearing more frequently across GitHub Issues and social media. Literacy and rules alone aren't enough to prevent accidents.

So omamori is built around three assumptions:

  1. AI will execute destructive commands: rm -rf, git reset --hard, git push --force, etc.
  2. AI will try to disable its own guard: config disable, uninstall, env var removal
  3. Complete prevention is impossible, and that fact should be disclosed honestly

Intercepting Only AI-Executed Commands

AI CLI tool → CLAUDECODE=1 → rm -rf src/
                                ↓
                          [omamori shim]
                                ↓
                        blocked (protected path)

Terminal → rm -rf src/
                ↓
          [/usr/bin/rm]
                ↓
          deleted normally

omamori detects AI tools by environment variable. Claude Code sets CLAUDECODE=1, Codex CLI sets CODEX_CI=1, Cursor sets CURSOR_AGENT=1. Rules only apply when one of these is present. Terminal commands are never affected.
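The detection step can be sketched in a few lines of Rust (a minimal sketch based on the variables listed above; `is_ai_session` and the injectable lookup are illustrative, not omamori's actual API):

```rust
/// Marker variables named in the article; AI_GUARD is the documented
/// fallback for tools without built-in support.
const AI_ENV_VARS: &[&str] = &["CLAUDECODE", "CODEX_CI", "CURSOR_AGENT", "AI_GUARD"];

/// True when any known AI marker variable is set to "1".
/// Takes a lookup function so the check is testable without touching
/// the real process environment.
fn is_ai_session(lookup: impl Fn(&str) -> Option<String>) -> bool {
    AI_ENV_VARS.iter().any(|&name| lookup(name).as_deref() == Some("1"))
}

fn main() {
    // In the real shim the lookup would simply be std::env::var.
    let detected = is_ai_session(|name| std::env::var(name).ok());
    println!("AI session detected: {detected}");
}
```

Injecting the lookup keeps the decision logic pure: terminal sessions (no marker variable) fall straight through.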

Default Rules

| Command | Pattern | Action |
| --- | --- | --- |
| rm | -r, -rf, -fr, --recursive | Move to macOS Trash (not permanent delete) |
| git reset | --hard | git stash first, then execute |
| git push | --force, -f | Block |
| git clean | -fd, -fdx | Block |
| chmod | 777 | Block |
| find | -delete, --delete | Block |
| rsync | --delete + 7 variants | Block |

Default rules were selected based on the most commonly reported AI CLI accidents on GitHub Issues and Hacker News — irreversible rm -rf, lost uncommitted changes from git reset --hard, history destruction from git push --force.
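The rm rule from the table reduces to a flag check. A simplified sketch (`rm_needs_guard` is a hypothetical name; a real matcher would also have to handle combined short options like `-rvf`):

```rust
/// Recursive flags from the default rule table.
const RECURSIVE_FLAGS: &[&str] = &["-r", "-rf", "-fr", "--recursive"];

/// True when an `rm` invocation should be rewritten to a Trash move.
/// Simplified: combined forms like `-rvf` are not parsed here.
fn rm_needs_guard(args: &[&str]) -> bool {
    args.iter().any(|a| RECURSIVE_FLAGS.contains(a))
}

fn main() {
    println!("rm -rf src/ guarded: {}", rm_needs_guard(&["-rf", "src/"]));
}
```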

Staying Out of Your Way

Guard tools tend to be heavy on configuration. People who don't read code — and increasingly, even engineers running autonomous AI workflows — aren't going to manage complex CLI settings. omamori is designed to work with zero configuration.

  • Zero-config: brew install → omamori install --hooks → add to PATH. No config file editing needed
  • Defaults always active: 7 protections work out of the box. Customize only by writing diffs to config.toml
  • Auto-maintenance: after brew upgrade, shims and hooks auto-update on the next invocation
# macOS / Homebrew
brew install yottayoshida/tap/omamori
omamori install --hooks
export PATH="$HOME/.omamori/shim:$PATH"  # add to ~/.zshrc

How It Works: PATH Shim + Hooks

Two layers of defense.

Layer 1 — PATH shim: Symlinks for rm, git, chmod, find, rsync are placed in ~/.omamori/shim/ at the front of PATH. When an AI environment variable is detected, rules are applied. Otherwise, the real binary is called directly.

Layer 2 — Hooks: Full-path execution like /bin/rm bypasses the shim. Hooks compensate. Claude Code's PreToolUse hook and Cursor's beforeShellExecution handler catch dangerous patterns via string matching.

Shim overhead is under 10ms. Without an AI environment variable, the real binary is called via the shortest path. No perceptible delay.

The safety design cuts two ways. A temporary shim crash shouldn't halt the user's work. But a broken config shouldn't silently remove all protection. So: shim crash → fail-open (real command runs). Broken config → fail-close (default rules apply automatically).
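That two-way policy can be sketched as a dispatch decision. Everything below is illustrative (hypothetical types and rule strings); only the fail-open/fail-close split itself is from the design described above:

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    RunReal,    // hand off to the real binary untouched
    ApplyRules, // evaluate the command against the active rule set
}

/// Built-in defaults used when the user config is unusable (fail-close).
fn default_rules() -> Vec<String> {
    vec![
        "rm --recursive -> trash".to_string(),
        "git push --force -> block".to_string(),
    ]
}

/// `config` is Err when config.toml fails to parse. A non-AI session
/// always passes through; a shim crash also fails open, because the
/// wrapper execs the real binary on any internal error.
fn dispatch(ai_session: bool, config: Result<Vec<String>, String>) -> (Decision, Vec<String>) {
    if !ai_session {
        return (Decision::RunReal, Vec::new());
    }
    match config {
        Ok(rules) => (Decision::ApplyRules, rules),
        // A broken config must not silently drop protection: fail close
        // by falling back to the built-in defaults.
        Err(_) => (Decision::ApplyRules, default_rules()),
    }
}

fn main() {
    let (decision, rules) = dispatch(true, Err("parse error".to_string()));
    println!("{decision:?} with {} fallback rules", rules.len());
}
```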


Adversarial-Aware: Defending Against AI Self-Disablement

After the Gemini CLI incident, omamori implemented these defenses:

  • When an AI environment variable is detected, config disable, uninstall, and init --force are blocked
  • Claude Code and Cursor hooks detect and block unset CLAUDECODE and similar env var removal attempts
  • Claude Code's PreToolUse hook blocks direct editing of config.toml
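The first of these defenses is essentially a guard on omamori's own CLI. A minimal sketch (the blocked subcommand list comes from the bullets above; the function name is hypothetical):

```rust
/// omamori subcommands that must be refused inside an AI session.
const PROTECTED_SUBCOMMANDS: &[&str] = &["config disable", "uninstall", "init --force"];

/// True when an `omamori <subcommand>` invocation should be refused
/// because an AI tool, not the user, is issuing it.
fn self_disable_blocked(ai_session: bool, subcommand: &str) -> bool {
    ai_session && PROTECTED_SUBCOMMANDS.iter().any(|&p| subcommand.starts_with(p))
}

fn main() {
    println!("blocked: {}", self_disable_blocked(true, "config disable"));
}
```

The same commands remain available to the user: without an AI marker variable, nothing is refused.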

Existing destructive command guard tools tend to focus on rule breadth and analysis depth. omamori focuses on AI self-bypass defense and context-aware evaluation. Different tools, different threat surfaces.


Context-Aware: Not Everything Gets the Same Treatment

Another design priority: don't block everything with the same force.

| Command | Without context | With context |
| --- | --- | --- |
| rm -rf target/ | Move to Trash | Log only (regenerable) |
| rm -rf src/ | Move to Trash | Block (source code is protected) |
| git reset --hard (no changes) | Stash then exec | Log only (nothing to lose) |

target/ and node_modules/ can be rebuilt with cargo build. src/ and .env can't. This distinction is opt-in via config.toml:

[context]
# Just adding [context] activates built-in defaults.
# To customize, specify your own lists (replaces built-in defaults):
# regenerable_paths = ["target/", "node_modules/", "my-cache/"]
# protected_paths = ["src/", "lib/", ".git/", ".env", ".ssh/", "secrets/"]

src/ and similar paths are hardcoded as NEVER_REGENERABLE internally. Even if added to regenerable_paths, they're silently ignored. Misconfiguration fails safe.
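The fail-safe amounts to filtering the configured list against a hardcoded deny list. A sketch under stated assumptions (the article explicitly names src/; the other entries and the function name are illustrative):

```rust
/// Hardcoded deny list: never treated as regenerable, whatever the
/// config says. (Illustrative entries; the article names src/.)
const NEVER_REGENERABLE: &[&str] = &["src/", "lib/", ".git/", ".env"];

/// Silently drop configured entries that collide with the deny list,
/// so a misconfigured regenerable_paths fails safe.
fn effective_regenerable(configured: &[&str]) -> Vec<String> {
    configured
        .iter()
        .filter(|p| !NEVER_REGENERABLE.contains(*p))
        .map(|p| p.to_string())
        .collect()
}

fn main() {
    println!("{:?}", effective_regenerable(&["target/", "src/", "node_modules/"]));
}
```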


What It Can't Block

omamori can't prevent everything. This is documented in SECURITY.md as KNOWN_LIMIT:

| Attack Vector | Why Undetectable |
| --- | --- |
| sudo rm -rf | sudo changes PATH; the shim is never invoked |
| alias rm='/bin/rm' | Aliases bypass hook string matching |
| env -i rm -rf | Clears all env vars; undetectable by hooks |
| Base64-encoded commands | String-based matching can't decode runtime-constructed commands |
| export -n CLAUDECODE | Removes the export attribute without unsetting; not caught by unset patterns |

A bypass corpus test suite verifies both what omamori blocks and what it can't. Documenting what you can't do is more trustworthy than only listing what you can.


Supported Tools and Limitations

| Tier | Tools | Coverage |
| --- | --- | --- |
| Supported | Claude Code, Codex CLI, Cursor | E2E tested. Layer 1 + Layer 2. |
| Community | Gemini CLI, Cline | Layer 1 only. Not E2E tested. |
| Fallback | Any tool setting AI_GUARD=1 | Layer 1 only. |

Current limitations: macOS only. Command-level protection (not a sandbox). Only detects known AI tools. Full-path execution bypasses the shim (partially mitigated by Layer 2). All documented in SECURITY.md.

Expanding support for tools without hook APIs and adding tamper-evident audit logging are under consideration. Progress is tracked in GitHub Issues.


Conclusion

The core of omamori isn't "redirect rm -rf to Trash."

It's the decision to treat AI not as a proxy acting on the user's behalf, but as an agent that may disable protection mechanisms to achieve its goals.

What threat model did that lead to? How did the design change? What can be defended, and where does defense end? This article is a record of those decisions.

If you use AI CLI tools on macOS, give it a try:

brew install yottayoshida/tap/omamori
omamori install --hooks
