DEV Community

Mavericksantander

Your AI Agent Just Deleted Something It Shouldn't Have? Here's How to Prevent It

You gave your agent access to the filesystem.

It was supposed to clean up temp files.

Instead, it deleted something important.

Or it called an external API using production credentials when you only meant to test it. Or executed a shell command that made sense in isolation — but was catastrophic in context.

These aren't edge cases. They're predictable failure modes.


The Missing Layer in Most Agent Architectures

When building an AI agent, most developers focus on three things:

  1. The model (Claude, GPT-4, etc.)
  2. The tools it can access
  3. The system prompt

But there's a layer that consistently gets skipped:

What happens when the agent does something it can do… but shouldn't?

This is subtle, and that's what makes it dangerous:

  • ✅ The model is working correctly
  • ✅ The tool is functioning as expected
  • ✅ The instruction is valid

And yet — the action is still wrong in context.

This isn't a model alignment problem. It's a policy enforcement problem.


Why Prompts Aren't Enough

The natural instinct is to write something like:

"Don't do anything destructive. Never delete files in production. Be careful with credentials."

The problem:

  • Prompts are not guarantees — they influence behavior, they don't enforce it
  • Prompt injection is real — malicious content in the environment can override your instructions
  • Context is incomplete — the agent doesn't always know what it doesn't know
  • Reasoning fails under edge cases — even well-designed prompts break in unexpected situations

You need a hard control point between the agent and the real world.
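The "control point" idea fits in a few lines of plain Python. This is a sketch of the pattern, not Canopy's implementation — `gate` and `run_action` are hypothetical names standing in for whatever authorization layer you use:

```python
def gate(action_type: str, payload: dict) -> str:
    """Toy policy: deny obviously destructive shell commands."""
    if action_type == "execute_shell" and "rm -rf" in payload.get("command", ""):
        return "DENY"
    return "ALLOW"

def run_action(action_type: str, payload: dict) -> str:
    # The decision happens BEFORE the side effect — that's the whole point.
    decision = gate(action_type, payload)
    if decision != "ALLOW":
        return f"blocked: {action_type}"
    return f"executed: {action_type}"

print(run_action("execute_shell", {"command": "rm -rf /"}))  # blocked: execute_shell
print(run_action("execute_shell", {"command": "ls /tmp"}))   # executed: execute_shell
```

The key property: the agent never calls the side-effecting API directly; every action is routed through one chokepoint you control.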


Runtime Authorization: The Core Idea

I built a small library called Canopy Runtime around one simple principle:

Before an agent executes any action, it must be explicitly authorized.

The entire API surface is a single function:

```python
authorize_action(agent_ctx, action_type, action_payload)
```

It returns one of three decisions:

| Decision | Meaning |
| --- | --- |
| `ALLOW` | Proceed |
| `DENY` | Block immediately |
| `REQUIRE_APPROVAL` | Pause for human review |

Before vs. After

Without guardrails:

```python
import subprocess

subprocess.run(command)  # 🤞 hope for the best
```

With runtime authorization:

```python
from canopy import authorize_action
import subprocess

result = authorize_action(
    agent_ctx={"env": "production"},
    action_type="execute_shell",
    action_payload={"command": command},
)

match result["decision"]:
    case "ALLOW":
        subprocess.run(command)
    case "DENY":
        print(f"Blocked: {result['reason']}")
    case "REQUIRE_APPROVAL":
        request_human_approval(result)
```

Same command. Completely different safety profile.


Default Safety Policies (Out of the Box)

Canopy ships with conservative defaults so you're protected immediately:

Shell commands:

  • Destructive patterns (rm -rf, mkfs) → DENY
  • Network operations (curl, wget) → REQUIRE_APPROVAL

File operations:

  • Protected system paths → DENY
  • Allowlisted paths → ALLOW

External APIs:

  • Default → REQUIRE_APPROVAL

No configuration required to get a secure baseline.
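To make those defaults concrete, here's a rough re-implementation of the shell classification described above. The patterns are illustrative only — Canopy's actual rule set may be broader or stricter:

```python
import re

# Illustrative stand-ins for the default shell rules described above.
DENY_PATTERNS = [r"\brm\s+-rf\b", r"\bmkfs\b"]          # destructive → DENY
APPROVAL_PATTERNS = [r"\bcurl\b", r"\bwget\b"]          # network → REQUIRE_APPROVAL

def classify_shell(command: str) -> str:
    if any(re.search(p, command) for p in DENY_PATTERNS):
        return "DENY"
    if any(re.search(p, command) for p in APPROVAL_PATTERNS):
        return "REQUIRE_APPROVAL"
    return "ALLOW"

print(classify_shell("rm -rf /var/tmp"))       # DENY
print(classify_shell("curl https://x.test"))   # REQUIRE_APPROVAL
print(classify_shell("ls -la"))                # ALLOW
```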


Tamper-Evident Audit Logging

Every authorization decision is logged with a cryptographic hash chain — each entry links to the previous one:

```json
{
  "timestamp": "2026-04-03T00:04:54Z",
  "action_type": "execute_shell",
  "decision": "DENY",
  "reason": "destructive pattern detected",
  "entry_hash": "a3f9...",
  "prev_hash": "cc81..."
}
```

What this gives you:

  • Logs cannot be silently modified — any tampering is detectable
  • Full post-incident traceability — replay exactly what happened and why
  • Compliance-ready — useful for security audits, regulated environments
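The hash-chain mechanism itself is simple. Here's a minimal sketch (field names mirror the sample log entry above; this is not Canopy's code):

```python
import hashlib
import json

def append_entry(log: list, entry: dict) -> None:
    # Each entry embeds the previous entry's hash, forming a chain.
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    entry = {**entry, "prev_hash": prev_hash}
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)

def verify_chain(log: list) -> bool:
    prev_hash = "0" * 64
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        payload = json.dumps(body, sort_keys=True).encode()
        if hashlib.sha256(payload).hexdigest() != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True

log = []
append_entry(log, {"action_type": "execute_shell", "decision": "DENY"})
append_entry(log, {"action_type": "call_external_api", "decision": "ALLOW"})
print(verify_chain(log))   # True

log[0]["decision"] = "ALLOW"   # tamper with an old entry...
print(verify_chain(log))   # False — the chain detects it
```

Because every hash depends on the previous one, rewriting any entry invalidates everything after it.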

Custom Policies via YAML

For fine-grained control, define your own rules:

```shell
export CANOPY_POLICY_FILE=/path/to/policy.yaml
```

```yaml
rules:
  - action_type: "execute_shell"
    when_all:
      - 'agent_ctx.env == "production"'
    deny_regex: 'rm\s+-rf'

  - action_type: "call_external_api"
    require_approval: true
```

Policies are explicit, versionable, and enforced at runtime — not just suggested in a prompt.
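Once parsed, a rule like the one above is just data. A toy evaluator might look like this — the field names mirror the YAML example, but the condition parsing is deliberately naive and this is not Canopy's engine:

```python
import re

# Parsed equivalent of the YAML rules shown above.
rules = [
    {
        "action_type": "execute_shell",
        "when_all": ['agent_ctx.env == "production"'],
        "deny_regex": r"rm\s+-rf",
    },
    {"action_type": "call_external_api", "require_approval": True},
]

def evaluate(agent_ctx: dict, action_type: str, payload: dict) -> str:
    for rule in rules:
        if rule["action_type"] != action_type:
            continue
        # Naive check: only supports conditions of the form
        # 'agent_ctx.<key> == "<value>"'.
        matched = all(
            agent_ctx.get(cond.split(".")[1].split(" ")[0]) == cond.split('"')[1]
            for cond in rule.get("when_all", [])
        )
        if not matched:
            continue
        if rule.get("require_approval"):
            return "REQUIRE_APPROVAL"
        pattern = rule.get("deny_regex")
        if pattern and re.search(pattern, payload.get("command", "")):
            return "DENY"
    return "ALLOW"

print(evaluate({"env": "production"}, "execute_shell", {"command": "rm -rf /"}))  # DENY
print(evaluate({"env": "dev"}, "execute_shell", {"command": "rm -rf /"}))         # ALLOW
print(evaluate({}, "call_external_api", {"url": "https://x.test"}))               # REQUIRE_APPROVAL
```

Note how the environment condition changes the outcome: the same destructive command is denied in production but passes in dev, which is exactly the kind of context-sensitivity a prompt can't guarantee.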


Optional: Run as a Gateway Service

If you have multiple agents or services, you can run Canopy as a standalone HTTP gateway:

```shell
pip install canopy-runtime[gateway]
python -m uvicorn canopy.service:app --port 8010
```

Agents post to:

```
POST /authorize_action
```

This makes the system language-agnostic — any agent, any stack, same enforcement layer.


When Do You Actually Need This?

If your agent does any of the following, you need runtime authorization:

  • ☐ Executes shell commands
  • ☐ Reads or modifies files
  • ☐ Calls external APIs
  • ☐ Uses credentials (API keys, tokens, secrets)
  • ☐ Runs in a production or staging environment

Rule of thumb: if it can affect real systems, it needs enforceable controls.


Try It

```shell
pip install canopy-runtime
```

```python
from canopy import authorize_action

result = authorize_action(
    agent_ctx={"env": "production"},
    action_type="call_external_api",
    action_payload={"url": "https://api.stripe.com/v1/charges"},
)

print(result["decision"])
```

Then check audit.log. That's where the real value shows up.


Final Thought

AI agents are moving from assistants to actors.

Once they can take actions — not just generate text — the risk profile changes completely. Smarter models help, but they're not the answer to this problem.

We need enforceable boundaries at runtime, not just well-crafted prompts.


How are you handling runtime safety in your agent stack today? Drop a comment — genuinely curious what approaches people are using.

Top comments (1)

Ali Muwwakkil

One surprising insight is that configuring access controls at a more granular level is often overlooked. In our experience with enterprise teams, the key isn't just setting permissions broadly, but leveraging system tools like AppArmor or SELinux to fine-tune what your AI agent can actually touch. This goes beyond traditional file permission settings and can drastically reduce unintended file deletions. - Ali Muwwakkil (ali-muwwakkil on LinkedIn)