Mavericksantander
Your AI Agent Just Deleted Something It Shouldn't Have. Here's How to Prevent It.

You gave your agent access to the filesystem. It was supposed to clean up temp files. Instead, it deleted something important.

Or maybe it called an external API with production credentials when you only meant to test it. Or executed a shell command that made sense in isolation but was catastrophic in context.

These aren't hypotheticals. They're the kinds of failures that happen when we give agents power without governance.

Today I want to show you a small library I built to solve exactly this: Canopy Runtime — a minimal agent safety runtime that adds ALLOW / DENY / REQUIRE_APPROVAL decisions to any action your agent wants to take, with a tamper-evident audit trail.


The Core Problem

When you build an autonomous agent, you typically think about:

  • What model to use
  • What tools to give it
  • What the system prompt should say

What most developers don't think about until it's too late:

What happens when the agent does something it's allowed to do... but shouldn't?

The model isn't broken. The tool works as designed. But the action — in this context, with this payload, at this moment — should have been blocked or at least reviewed.

This is a policy problem, not a model problem. And it needs a runtime solution.


Enter Canopy: One Function, Three Outcomes

Canopy is built around a single primitive:

authorize_action(agent_ctx, action_type, action_payload)

It returns one of three decisions:

  • ALLOW — proceed
  • DENY — blocked, with a reason
  • REQUIRE_APPROVAL — pause and wait for human sign-off

Every decision is appended to a hash-chain audit log — meaning each entry is cryptographically linked to the previous one. You can't quietly edit history.
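Hash chaining is a standard tamper-evidence pattern. As a minimal sketch of the idea (my own illustration, not Canopy's internal format), each entry stores the previous entry's hash and is then hashed itself, so every record is pinned to everything before it:

```python
import hashlib
import json

def append_entry(log, entry):
    """Append an audit entry, linking it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {**entry, "prev_hash": prev_hash}
    # Hash the canonical JSON of the entry (prev_hash included, own hash excluded).
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return log

log = []
append_entry(log, {"action_type": "execute_shell", "decision": "DENY"})
append_entry(log, {"action_type": "modify_file", "decision": "ALLOW"})
print(log[1]["prev_hash"] == log[0]["hash"])  # True
```

Changing any earlier entry changes its hash, which no longer matches the `prev_hash` stored downstream.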


Installation

pip install canopy-runtime

That's it. No server to spin up, no config required to get started.


A Real Example

Let's say your agent needs to execute a shell command. Without Canopy:

import subprocess
subprocess.run(command)  # 🙈 fingers crossed

With Canopy:

import subprocess

from canopy import authorize_action

command = "rm -rf /tmp/logs"

result = authorize_action(
    agent_ctx={"env": "production"},
    action_type="execute_shell",
    action_payload={"command": command},
)

if result["decision"] == "ALLOW":
    subprocess.run(command, shell=True)
elif result["decision"] == "DENY":
    print(f"Blocked: {result['reason']}")
elif result["decision"] == "REQUIRE_APPROVAL":
    # notify a human, pause the workflow, log and wait
    request_human_approval(result)

The default policy ships with sensible, conservative rules out of the box:

  • execute_shell: destructive patterns (rm -rf, mkfs, etc.) → DENY. Network/install commands → REQUIRE_APPROVAL.
  • modify_file: protected system paths → DENY. Safe paths you allowlist → ALLOW.
  • call_external_api: always REQUIRE_APPROVAL unless you explicitly loosen the policy.
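To make the rule evaluation concrete, here's a sketch of how pattern-based decisions like these can work. This is my own illustration of the technique, not Canopy's actual engine or rule format:

```python
import re

# Illustrative rules in the spirit of the defaults above (not the real policy file).
RULES = {
    "execute_shell": [
        (re.compile(r"\brm\s+-rf\b|\bmkfs\b"), "DENY"),
        (re.compile(r"\bcurl\b|\bwget\b|\bpip\s+install\b"), "REQUIRE_APPROVAL"),
    ],
}

def decide(action_type, payload):
    """Return the first matching rule's decision, defaulting to ALLOW."""
    for pattern, decision in RULES.get(action_type, []):
        if pattern.search(payload.get("command", "")):
            return decision
    return "ALLOW"

print(decide("execute_shell", {"command": "rm -rf /tmp/logs"}))      # DENY
print(decide("execute_shell", {"command": "pip install requests"}))  # REQUIRE_APPROVAL
print(decide("execute_shell", {"command": "ls -la"}))                # ALLOW
```

Rule order matters: the first match wins, so the hardest denials go first.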

The Audit Log

Every call to authorize_action writes a line to audit.log (path configurable via CANOPY_AUDIT_LOG_PATH). Each entry is chained to the previous one:

{"ts": "2026-03-16T10:22:01Z", "action_type": "execute_shell", "decision": "DENY", "reason": "matches destructive pattern", "avid": "c3f2a1...", "prev_hash": "9e4b12..."}

The prev_hash field means any tampering with the log is immediately detectable. This matters when you need to audit what an agent did — and why — after something goes wrong.
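Verification is a straightforward walk down the chain. Here's a generic self-contained sketch (again my own illustration of the scheme, not Canopy's code) that builds a tiny log, then shows that rewriting an old entry is caught:

```python
import hashlib
import json

def entry_hash(entry):
    # Hash the canonical JSON of everything except the entry's own hash.
    body = {k: v for k, v in entry.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def verify_chain(entries):
    """Return the index of the first broken entry, or None if the log is intact."""
    prev = "0" * 64
    for i, entry in enumerate(entries):
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(entry):
            return i
        prev = entry["hash"]
    return None

# Build a tiny two-entry chain, then quietly rewrite history.
log = []
for decision in ("DENY", "ALLOW"):
    entry = {"decision": decision,
             "prev_hash": log[-1]["hash"] if log else "0" * 64}
    entry["hash"] = entry_hash(entry)
    log.append(entry)

print(verify_chain(log))      # None — intact
log[0]["decision"] = "ALLOW"  # tamper with the first entry
print(verify_chain(log))      # 0 — tampering detected at entry 0
```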


Custom Policies

The default policy lives in src/canopy/policies/default.yaml. You can override it completely:

CANOPY_POLICY_FILE=/path/to/my-policy.yaml python my_agent.py

Your policy file lets you define rules per action type, with pattern matching on the payload and conditions based on agent_ctx (like environment, safe paths, capabilities).
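As a rough sense of what such a file could look like, here is a purely hypothetical sketch. The field names below are my invention; the actual schema is whatever the shipped `src/canopy/policies/default.yaml` defines, so copy that file as your starting point:

```yaml
# Hypothetical shape — check the shipped default.yaml for the real schema.
execute_shell:
  deny_patterns:
    - "rm -rf"
    - "mkfs"
  approval_patterns:
    - "pip install"
modify_file:
  deny_paths:
    - "/etc"
  allow_paths:
    - "/tmp"
call_external_api:
  default: REQUIRE_APPROVAL
```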


Optional HTTP Gateway

If you want to run Canopy as a sidecar service that multiple agents call over HTTP:

pip install canopy-runtime[gateway]
CANOPY_AUDIT_LOG_PATH=/tmp/canopy_audit.log python -m uvicorn canopy.service:app --port 8010

Now any agent — regardless of language or framework — can POST /authorize_action and get a decision back.
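A client call might look something like this. I'm mirroring the field names from the Python signature shown earlier; confirm the exact request shape against the project's docs before relying on it:

```python
import json
import urllib.request

GATEWAY = "http://localhost:8010"  # the sidecar started above

def authorize_remote(agent_ctx, action_type, action_payload):
    """POST an authorization request to the Canopy sidecar and return its decision.

    Field names are assumed from the library's Python signature.
    """
    body = json.dumps({
        "agent_ctx": agent_ctx,
        "action_type": action_type,
        "action_payload": action_payload,
    }).encode()
    req = urllib.request.Request(
        f"{GATEWAY}/authorize_action",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The same request works from any language; it's just JSON over HTTP.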


Why Not Just Use a System Prompt?

A fair question. The answer is: defense in depth.

System prompts are great for guiding behavior. But they're not guarantees. Models hallucinate. Prompt injection is real. Context windows have limits. A runtime check is a hard boundary that doesn't care about what the model "understood."

Think of it like seatbelts and airbags. The prompt is driver training. Canopy is the safety hardware.


What's Next

Canopy is the minimal safety primitive. If you need the full governance layer — agent identity, JWT auth, reputation scoring, a registry of verified agents, multi-agent communication, and an observability dashboard — that's what I'm building in Apex Protocol.

Canopy is the right place to start: zero infrastructure, one function call, immediate value.


Try It

pip install canopy-runtime

from canopy import authorize_action

decision = authorize_action(
    agent_ctx={"env": "production"},
    action_type="call_external_api",
    action_payload={"url": "https://api.stripe.com/v1/charges"},
)
print(decision["decision"])  # REQUIRE_APPROVAL

Check audit.log after running. That's the whole point — every decision, traceable, tamper-evident.


GitHub: https://github.com/Mavericksantander/Canopy

If you're building agents and want to talk about safety patterns, I'm all ears in the comments.
