George Belsky

Posted on Apr 2

Your AI Agent Did Something It Wasn't Supposed To. Now What?

#ai #agents #governance #security

Your agent deleted production data.

Not because someone told it to. Because the LLM decided that DROP TABLE customers was a reasonable step in a data cleanup task. Your system prompt said "never modify production data." The LLM read that prompt. And then it ignored it.

This is the fundamental problem with AI agent security today: the thing you're trying to restrict is the same thing checking the restrictions.

How Agent Permissions Work Today

Every framework does it the same way. You put rules in the system prompt:

You are a data analysis agent.
You may ONLY read data. Never write, update, or delete.
If asked to modify data, refuse and explain why.

This works in demos. Then in production:

The LLM decides the task requires a write operation and does it anyway
A prompt injection in user input overrides the system prompt
The agent calls a tool that has side effects the prompt didn't anticipate
A multi-step reasoning chain "justifies" breaking the rule

The system prompt is a suggestion, not a boundary. It's like writing "do not enter" on a door with no lock.

Some frameworks add tool-level restrictions. LangGraph lets you control tool_choice. OpenAI Agents SDK has tool filtering. CrewAI has allow_delegation. These help - but they're all enforced inside the same process as the agent. If the agent's runtime is compromised, the restrictions go with it.

The Missing Layer: External Enforcement

What if permissions weren't checked by the agent at all?

Agent sends intent  -->  Gateway  -->  Check policy  -->  Deliver or block
                                          |
                                    403 + audit log

The agent never sees the blocked request. There is no prompt to inject around. The policy lives outside the agent, outside the LLM, outside the framework. It's enforced at the network level.

This is what AXME action policies do. Every intent (action request) passes through the AXME gateway before reaching any agent. The gateway checks the action policy for that agent and blocks anything that doesn't match.

Three Modes

Open (default) - everything passes through. No restrictions.

Allowlist - only explicitly listed intent types are allowed. Everything else is blocked.

Denylist - everything is allowed except explicitly listed intent types.

Each policy has a direction: send (what the agent can initiate) or receive (what the agent can be asked to do). You can set both.

What This Looks Like

Set the policy

import httpx
import os

resp = httpx.put(
    "https://api.axme.ai/v1/mesh/agents/analytics-agent/policies/action",
    headers={"x-api-key": os.environ["AXME_API_KEY"]},
    json={
        "direction": "receive",
        "mode": "allowlist",
        "patterns": [
            "intent.data.read.*",
            "intent.data.query.*",
        ],
    },
)
print(resp.json())
# {"ok": true, "policy_id": "pol_...", "mode": "allowlist", ...}

Now the analytics agent can only receive data read and query intents. Nothing else.

What happens when a blocked intent is sent

resp = httpx.post(
    "https://api.axme.ai/v1/mesh/intents",
    headers={"x-api-key": os.environ["AXME_API_KEY"]},
    json={
        "intent_type": "intent.data.delete.v1",
        "to_agent": "agent://myorg/production/analytics-agent",
        "payload": {"table": "customers", "filter": "all"},
    },
)
print(resp.status_code)  # 403
print(resp.json())
# {
#   "error": "action_policy_violation",
#   "message": "Intent type 'intent.data.delete.v1' not in receive allowlist",
#   "direction": "receive",
#   "address_id": "analytics-agent"
# }

The delete intent never reaches the agent. The gateway returns 403. The violation is logged in the audit trail with timestamp, caller identity, blocked intent type, and the policy that blocked it.

Why This Matters More Than You Think

The difference between prompt-based restrictions and gateway-enforced policies is the same difference between a "please knock" sign and a locked door.

	System prompt restrictions	Gateway-enforced policies
Enforced by	The LLM itself	Network gateway
Prompt injection	Vulnerable	Cannot bypass
Change without redeploy	Edit prompt, redeploy agent	API call or dashboard click
Audit trail	None	Every violation logged
Multi-agent	Configure each agent separately	Centralized policy management
Framework dependency	Framework-specific	Works with any framework

Real scenarios this prevents

Scenario 1: Scope creep. Your analytics agent starts as read-only. Over time, someone adds a "fix data quality issues" tool. The agent now has write access that was never intended. With an allowlist policy, the new tool's intents are blocked until explicitly added.

Scenario 2: Multi-tenant isolation. Customer A's agent should never send intents to Customer B's agents. Denylist the cross-tenant intent patterns. Done at the gateway, not in every agent's prompt.

Scenario 3: Gradual rollout. New agent capability goes to staging first. Production policy blocks the new intent type until you're ready. Toggle it with one API call.

Patterns Support Wildcards

You don't need to list every version of every intent type:

Pattern	Matches
`intent.data.read.v1`	Exact match
`intent.data.read.*`	Any version of data read
`intent.data.*`	Any data intent
`intent.billing.refund.*`	Any refund intent

A single allowlist entry like intent.data.read.* covers current and future versions of that intent type.

CLI and Dashboard

For teams that prefer not to write code for policy management:

# Set allowlist via CLI
axme mesh policies set analytics-agent \
    --direction receive \
    --mode allowlist \
    --patterns "intent.data.read.*,intent.data.query.*"

# View policies
axme mesh policies get analytics-agent

# Remove policy (reverts to open)
axme mesh policies delete analytics-agent --direction receive

Or use the visual dashboard at mesh.axme.ai - select an agent, set policies, and see violations in real time.

Policy configuration and violation history are managed from the same interface:

Works With Any Framework

AXME action policies operate at the transport layer. The agent framework, LLM provider, and programming language don't matter.

LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Google ADK, Pydantic AI, raw Python, TypeScript, Go, Java, .NET - all of them send intents through the same gateway. All of them are subject to the same policies.

The agent framework handles reasoning. AXME handles permissions.

Try It

Full working example with scenario, agent, and policy setup:

github.com/AxmeAI/ai-agent-policy-enforcement

Built with AXME - durable execution and governance for AI agents. Alpha - feedback welcome.

DEV Community