DEV Community

Mavericksantander

Your AI Agent Just Deleted Something It Shouldn't Have? Here's How to Prevent It

You gave your agent access to the filesystem.

It was supposed to clean up temp files.

Instead, it deleted something important.

Or it called an external API using production credentials when you only meant to test it. Or executed a shell command that made sense in isolation — but was catastrophic in context.

These aren't edge cases. They're predictable failure modes.


The Missing Layer in Most Agent Architectures

When building an AI agent, most developers focus on three things:

  1. The model (Claude, GPT-4, etc.)
  2. The tools it can access
  3. The system prompt

But there's a layer that consistently gets skipped:

What happens when the agent does something it can do… but shouldn't?

This is subtle, and that's what makes it dangerous:

  • ✅ The model is working correctly
  • ✅ The tool is functioning as expected
  • ✅ The instruction is valid

And yet — the action is still wrong in context.

This isn't a model alignment problem. It's a policy enforcement problem.


Why Prompts Aren't Enough

The natural instinct is to write something like:

"Don't do anything destructive. Never delete files in production. Be careful with credentials."

The problem:

  • Prompts are not guarantees — they influence behavior, they don't enforce it
  • Prompt injection is real — malicious content in the environment can override your instructions
  • Context is incomplete — the agent doesn't always know what it doesn't know
  • Reasoning fails under edge cases — even well-designed prompts break in unexpected situations

You need a hard control point between the agent and the real world.
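The "control point" idea fits in a few lines of plain Python. This is a sketch of the pattern, not Canopy's implementation — `gate` and `run_action` are hypothetical names standing in for whatever authorization layer you use:

```python
def gate(action_type: str, payload: dict) -> str:
    """Toy policy: deny obviously destructive shell commands."""
    if action_type == "execute_shell" and "rm -rf" in payload.get("command", ""):
        return "DENY"
    return "ALLOW"

def run_action(action_type: str, payload: dict) -> str:
    # The decision happens BEFORE the side effect — that's the whole point.
    decision = gate(action_type, payload)
    if decision != "ALLOW":
        return f"blocked: {action_type}"
    return f"executed: {action_type}"

print(run_action("execute_shell", {"command": "rm -rf /"}))  # blocked: execute_shell
print(run_action("execute_shell", {"command": "ls /tmp"}))   # executed: execute_shell
```

The key property: the agent never calls the side-effecting API directly; every action is routed through one chokepoint you control.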


Runtime Authorization: The Core Idea

I built a small library called Canopy Runtime around one simple principle:

Before an agent executes any action, it must be explicitly authorized.

The entire API surface is a single function:

```python
authorize_action(agent_ctx, action_type, action_payload)
```

It returns one of three decisions:

| Decision | Meaning |
| --- | --- |
| `ALLOW` | Proceed |
| `DENY` | Block immediately |
| `REQUIRE_APPROVAL` | Pause for human review |

Before vs. After

Without guardrails:

```python
import subprocess

subprocess.run(command)  # 🤞 hope for the best
```

With runtime authorization:

```python
from canopy import authorize_action
import subprocess

result = authorize_action(
    agent_ctx={"env": "production"},
    action_type="execute_shell",
    action_payload={"command": command},
)

match result["decision"]:
    case "ALLOW":
        subprocess.run(command)
    case "DENY":
        print(f"Blocked: {result['reason']}")
    case "REQUIRE_APPROVAL":
        request_human_approval(result)
```

Same command. Completely different safety profile.


Default Safety Policies (Out of the Box)

Canopy ships with conservative defaults so you're protected immediately:

Shell commands:

  • Destructive patterns (rm -rf, mkfs) → DENY
  • Network operations (curl, wget) → REQUIRE_APPROVAL

File operations:

  • Protected system paths → DENY
  • Allowlisted paths → ALLOW

External APIs:

  • Default → REQUIRE_APPROVAL

No configuration required to get a secure baseline.
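To make those defaults concrete, here's a rough re-implementation of the shell classification described above. The patterns are illustrative only — Canopy's actual rule set may be broader or stricter:

```python
import re

# Illustrative stand-ins for the default shell rules described above.
DENY_PATTERNS = [r"\brm\s+-rf\b", r"\bmkfs\b"]          # destructive → DENY
APPROVAL_PATTERNS = [r"\bcurl\b", r"\bwget\b"]          # network → REQUIRE_APPROVAL

def classify_shell(command: str) -> str:
    if any(re.search(p, command) for p in DENY_PATTERNS):
        return "DENY"
    if any(re.search(p, command) for p in APPROVAL_PATTERNS):
        return "REQUIRE_APPROVAL"
    return "ALLOW"

print(classify_shell("rm -rf /var/tmp"))       # DENY
print(classify_shell("curl https://x.test"))   # REQUIRE_APPROVAL
print(classify_shell("ls -la"))                # ALLOW
```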


Tamper-Evident Audit Logging

Every authorization decision is logged with a cryptographic hash chain — each entry links to the previous one:

```json
{
  "timestamp": "2026-04-03T00:04:54Z",
  "action_type": "execute_shell",
  "decision": "DENY",
  "reason": "destructive pattern detected",
  "entry_hash": "a3f9...",
  "prev_hash": "cc81..."
}
```

What this gives you:

  • Logs cannot be silently modified — any tampering is detectable
  • Full post-incident traceability — replay exactly what happened and why
  • Compliance-ready — useful for security audits, regulated environments
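The hash-chain mechanism itself is simple. Here's a minimal sketch (field names mirror the sample log entry above; this is not Canopy's code):

```python
import hashlib
import json

def append_entry(log: list, entry: dict) -> None:
    # Each entry embeds the previous entry's hash, forming a chain.
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    entry = {**entry, "prev_hash": prev_hash}
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)

def verify_chain(log: list) -> bool:
    prev_hash = "0" * 64
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        payload = json.dumps(body, sort_keys=True).encode()
        if hashlib.sha256(payload).hexdigest() != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True

log = []
append_entry(log, {"action_type": "execute_shell", "decision": "DENY"})
append_entry(log, {"action_type": "call_external_api", "decision": "ALLOW"})
print(verify_chain(log))   # True

log[0]["decision"] = "ALLOW"   # tamper with an old entry...
print(verify_chain(log))   # False — the chain detects it
```

Because every hash depends on the previous one, rewriting any entry invalidates everything after it.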

Custom Policies via YAML

For fine-grained control, define your own rules:

```shell
export CANOPY_POLICY_FILE=/path/to/policy.yaml
```

```yaml
rules:
  - action_type: "execute_shell"
    when_all:
      - 'agent_ctx.env == "production"'
    deny_regex: 'rm\s+-rf'

  - action_type: "call_external_api"
    require_approval: true
```

Policies are explicit, versionable, and enforced at runtime — not just suggested in a prompt.
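Once parsed, a rule like the one above is just data. A toy evaluator might look like this — the field names mirror the YAML example, but the condition parsing is deliberately naive and this is not Canopy's engine:

```python
import re

# Parsed equivalent of the YAML rules shown above.
rules = [
    {
        "action_type": "execute_shell",
        "when_all": ['agent_ctx.env == "production"'],
        "deny_regex": r"rm\s+-rf",
    },
    {"action_type": "call_external_api", "require_approval": True},
]

def evaluate(agent_ctx: dict, action_type: str, payload: dict) -> str:
    for rule in rules:
        if rule["action_type"] != action_type:
            continue
        # Naive check: only supports conditions of the form
        # 'agent_ctx.<key> == "<value>"'.
        matched = all(
            agent_ctx.get(cond.split(".")[1].split(" ")[0]) == cond.split('"')[1]
            for cond in rule.get("when_all", [])
        )
        if not matched:
            continue
        if rule.get("require_approval"):
            return "REQUIRE_APPROVAL"
        pattern = rule.get("deny_regex")
        if pattern and re.search(pattern, payload.get("command", "")):
            return "DENY"
    return "ALLOW"

print(evaluate({"env": "production"}, "execute_shell", {"command": "rm -rf /"}))  # DENY
print(evaluate({"env": "dev"}, "execute_shell", {"command": "rm -rf /"}))         # ALLOW
print(evaluate({}, "call_external_api", {"url": "https://x.test"}))               # REQUIRE_APPROVAL
```

Note how the environment condition changes the outcome: the same destructive command is denied in production but passes in dev, which is exactly the kind of context-sensitivity a prompt can't guarantee.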


Optional: Run as a Gateway Service

If you have multiple agents or services, you can run Canopy as a standalone HTTP gateway:

```shell
pip install canopy-runtime[gateway]
python -m uvicorn canopy.service:app --port 8010
```

Agents post to:

```
POST /authorize_action
```

This makes the system language-agnostic — any agent, any stack, same enforcement layer.


When Do You Actually Need This?

If your agent does any of the following, you need runtime authorization:

  • ☐ Executes shell commands
  • ☐ Reads or modifies files
  • ☐ Calls external APIs
  • ☐ Uses credentials (API keys, tokens, secrets)
  • ☐ Runs in a production or staging environment

Rule of thumb: if it can affect real systems, it needs enforceable controls.


Try It

```shell
pip install canopy-runtime
```

```python
from canopy import authorize_action

result = authorize_action(
    agent_ctx={"env": "production"},
    action_type="call_external_api",
    action_payload={"url": "https://api.stripe.com/v1/charges"},
)

print(result["decision"])
```

Then check audit.log. That's where the real value shows up.


Final Thought

AI agents are moving from assistants to actors.

Once they can take actions — not just generate text — the risk profile changes completely. Smarter models help, but they're not the answer to this problem.

We need enforceable boundaries at runtime, not just well-crafted prompts.


How are you handling runtime safety in your agent stack today? Drop a comment — genuinely curious what approaches people are using.

Top comments (1)

Ali Muwwakkil

One surprising insight is that configuring access controls at a more granular level is often overlooked. In our experience with enterprise teams, the key isn't just setting permissions broadly, but leveraging system tools like AppArmor or SELinux to fine-tune what your AI agent can actually touch. This goes beyond traditional file permission settings and can drastically reduce unintended file deletions. - Ali Muwwakkil (ali-muwwakkil on LinkedIn)