Your AI Agent Just Ran rm -rf / — Here's How to Stop It

#ai #security #claude #opensource

AI coding agents like Claude Code, Cursor, and Copilot can execute shell commands, edit files, and call APIs on your behalf. That's powerful — until the agent runs something dangerous.

I built BodAIGuard to solve this. It's a guardrail that sits between AI agents and your system, blocking dangerous actions before they execute.

The Problem

AI agents can be tricked through prompt injection: malicious text hidden in tool results, emails, or web pages that hijacks the agent's behavior.

Without protection, the agent might actually execute dangerous commands.

How BodAIGuard Works

BodAIGuard evaluates every action before it runs:

AI Agent → BodAIGuard → System
            ↑
    45 block rules
    45 confirm rules
    12 categories
    Prompt injection scanner

5 enforcement modes:

Claude Code hooks — intercepts tool calls via PreToolUse
MCP Proxy — wraps any MCP server with security
HTTP Proxy — sits between agent and API
Shell Guard — bash integration via extdebug
REST API — evaluate commands programmatically

Quick Start

Download the binary for your platform from GitHub Releases.

# Make it executable
chmod +x bodaiguard-linux-x64
sudo mv bodaiguard-linux-x64 /usr/local/bin/bodaiguard

# Install Claude Code hooks
bodaiguard install hooks

# Test it
bodaiguard test 'rm -rf /'
# → BLOCK: Filesystem root destruction

bodaiguard test 'ls -la'
# → ALLOW

MCP Proxy Mode (New in v0.8.0)

Wrap any MCP server with security:

{
  "mcpServers": {
    "my-tool": {
      "command": "bodaiguard",
      "args": ["mcp-proxy", "node", "/path/to/mcp-server.js"]
    }
  }
}

Every tools/call is evaluated against the rules. Dangerous calls get a JSON-RPC error instead of executing.

Prompt Injection Detection

BodAIGuard scans content for injection attacks:

bodaiguard scan 'Check encoded payloads'
# → Detects base64-encoded attacks, hidden HTML, zero-width obfuscation

Detection covers:

Instruction overrides and role hijacking
ChatML delimiter injection
Base64-encoded attacks
Hidden HTML injection
Zero-width character obfuscation

Building Your Own Detector (Tutorial)

Want to understand the concepts? Here's a minimal 3-tier detector in Python:

import re, base64, unicodedata

# Tier 1: Regex patterns for known attack types
PATTERNS = [
    (r'you\s+are\s+now\s+(DAN|evil|unrestricted)', 'role_hijack'),
    (r'<\|im_start\|>', 'chatml_delimiter'),
    (r'pretend\s+you\s+have\s+no\s+restrictions', 'jailbreak'),
]

def detect(content: str) -> dict:
    # Normalize unicode
    text = unicodedata.normalize('NFKC', content)
    text = re.sub(r'[\u200b-\u200d\u2060\ufeff]', '', text)

    # Tier 1: Fast regex
    for pattern, category in PATTERNS:
        if re.search(pattern, text, re.IGNORECASE):
            return {'blocked': True, 'reason': category}

    # Tier 2: Base64 decode + re-scan
    for match in re.findall(r'[A-Za-z0-9+/]{24,}={0,2}', text):
        try:
            decoded = base64.b64decode(match, validate=True).decode()
            for pattern, category in PATTERNS:
                if re.search(pattern, decoded, re.IGNORECASE):
                    return {'blocked': True, 'reason': f'encoded_{category}'}
        except Exception:
            pass

    return {'blocked': False}

This catches the basics. BodAIGuard goes much further with DOM parsing, Unicode lookalike detection, and YAML-driven rules.

YAML-Driven Rules

All rules live in default.yaml — no code changes needed:

actions:
  destructive:
    block:
      - pattern: 'rm\\s+(-[a-zA-Z]*)?\\s*/'
        reason: Filesystem root destruction
      - pattern: 'mkfs\\.'
        reason: Filesystem format
    confirm:
      - pattern: 'rm\\s+-r'
        reason: Recursive delete

paths:
  block:
    - ~/.ssh/**
    - /etc/shadow
  readonly:
    - /etc/passwd

Top comments (2)

MaxxMini • Feb 25

The 5-mode enforcement architecture is smart — especially Shell Guard via extdebug. I've been running a 24/7 AI agent on a Mac Mini for weeks now, and the security layering we arrived at independently mirrors several of your categories.

A few observations from the trenches:

The "confirm" tier is underrated. We started with binary block/allow, but the interesting commands are the ambiguous ones. git push --force isn't destructive to the filesystem, but it can destroy history. brew services restart seems harmless but once silently overwrote our Ollama plist config. Having a "pause and ask" tier for these gray-zone commands would have saved us hours of debugging.

Credential path protection needs recursive expansion. Your ~/.ssh/** glob is good, but cat $(find ~ -name "*.pem") bypasses path-based blocks entirely. Does BodAIGuard evaluate expanded arguments or just the raw command string?

The base64 decode + re-scan is essential. We've seen exactly this in production: fetched web pages containing encoded instructions targeting our agent. Question — do you handle nested encoding? Base64-wrapped URL-encoded payloads require multi-pass decode.

Curious about YAML rule hot-reload too. We have to restart our entire gateway to update security policies. Being able to SIGHUP new rules into a running guardrail without dropping sessions would be a huge operational win.

Etienne • Feb 25

Thanks for the detailed feedback — glad the architecture resonates
with your real-world experience.
On the confirm tier: Exactly right. We have 45 confirm rules
specifically for these gray-zone commands like git push --force,
recursive deletes, service restarts, etc. Binary block/allow misses
too much.
On path expansion and $(find ...): Yes, BodAIGuard uses
shell-aware parsing (shell-quote) to extract and evaluate $()
substitutions and backticks separately — not just the raw command
string. Each sub-expression is evaluated independently.
On nested encoding: Currently we handle base64 decode +
recursive re-scan. Multi-pass decoding (URL-encode wrapped in
base64, etc.) is on the roadmap.
On YAML hot-reload: Rules are re-read on every hook invocation
— no persistent cache between calls. So you can edit default.yaml
and the next tool call picks up the changes immediately. No
restart, no SIGHUP needed.
Appreciate you sharing the Mac Mini setup context — always
interesting to hear how others are handling agent security in
production.