Kaustubh Phatak

The Missing Layer in Agent Security

Last month, a customer support agent at a mid-size SaaS company did something interesting. It read a customer’s account data (allowed), formatted it as a CSV (allowed), and emailed it to an external address (allowed). Three tool calls. Three green checkmarks from the per-call policy engine. One data breach.

Every individual action was within policy. The trajectory was exfiltration.

This is the gap I’ve been thinking about for the past year while building security tooling for AI agents. The industry has built two layers of agent security and completely skipped the one in the middle. I built the missing layer. This post explains why it’s needed, how it works, and how you can use it today.

The Two Layers We Have
Layer 1: Pre-deployment analysis. Before you ship an agent, you scan its configuration. How many tools does it have access to? Which ones can write to production? Does it satisfy the “lethal trifecta” (access to private data + exposure to untrusted content + ability to communicate externally)? Tools like agentspec do this. It’s the equivalent of static analysis for agent configs.

Layer 3: Per-call enforcement. A proxy sits between your agent and its tools, evaluating each action against a YAML policy. “Block write_file when path matches ~/.ssh/**.” “Rate limit all tools to 60/minute.” “Ask human approval for anything touching production.” Tools like mcpfw and Cloudflare’s AI Security for Apps do this. It’s the equivalent of a WAF for agent tool calls.
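As a hedged illustration, a per-call policy file encoding those three rules might look something like this (the field names here are invented for illustration; they are not mcpfw's documented schema):

```yaml
# Illustrative only: field names are assumptions, not mcpfw's real schema.
rules:
  - tool: write_file
    action: block
    when:
      path: "~/.ssh/**"
  - tool: "*"
    action: rate_limit
    limit: 60/minute
  - tool: "*"
    action: require_approval
    when:
      environment: production
```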

Both layers are necessary. Neither is sufficient.

What They Miss
Per-call enforcement evaluates actions in isolation. It has no memory of what happened three steps ago. It can’t see patterns. It can’t detect that the agent’s overall behavior has drifted from its declared purpose.

Here’s a concrete attack that passes every per-call policy you could write:

```text
Step 1: search_kb(query="customer data export format")  ✅ allowed
Step 2: read_account(id="cust_12345")                   ✅ allowed
Step 3: read_account(id="cust_12346")                   ✅ allowed
Step 4: read_account(id="cust_12347")                   ✅ allowed
Step 5: format_response(template="csv_export")          ✅ allowed
Step 6: send_email(to="analyst@company.com", body=…)    ✅ allowed
```

A support agent that reads three accounts and sends an email. Completely normal, right? Except the agent was supposed to answer a single customer question, not bulk-export account data. The prompt injection that redirected it happened at step 0, invisible to the per-call layer.

This pattern keeps recurring in production incidents. Security researchers call it the convergence of safety and security at the deployment layer: the same architectural properties that make an agent useful (tool access, autonomy, memory) are the ones that make it exploitable. And the exploits increasingly look like normal operation.

The Missing Layer: Behavioral Envelopes
The concept is simple. Before an agent runs, you declare what it’s supposed to do. Not just which tools it can call (that’s per-call policy), but what its overall behavior should look like:

What workflows is it expected to follow?
How much should it cost per session?
Where can data flow from and to?
How deep can delegation chains go?
What does “normal velocity” look like?

Then at runtime, you continuously compare the agent’s actual trajectory against this declared envelope. When the trajectory diverges, you respond with graduated severity: warn, pause for human review, or kill.

This is what agent-envelope does.

How It Works
Defining an Envelope
An envelope is a YAML file that declares bounded behavior:

```yaml
name: support-agent
purpose: "Answer customer questions using knowledge base and account data"

workflows:
  - name: answer_question
    steps: ["search_kb", "read_*", "format_*", "send_reply"]
    max_steps: 10
  - name: escalate
    steps: ["search_kb", "classify_*", "create_ticket"]
    max_steps: 5

bounds:
  max_actions_per_session: 50
  max_tokens_consumed: 100000
  max_duration_seconds: 300
  max_cost_usd: 1.00

data_flow:
  forbidden_flows:
    - from: "customer_account"
      to: ["email_external", "file_export", "api_external"]

autonomy:
  max_chain_depth: 3

drift:
  unknown_workflow_threshold: 3
  repetition:
    max_identical_calls: 3
    max_similar_calls: 10
```

This says: “This agent answers questions or escalates tickets. It should finish in under 50 actions, cost less than a dollar, and never send customer data to external destinations. If it does something that doesn’t match either workflow for 3+ actions, flag it.”

Using It in Code

```python
from agent_envelope import EnvelopeSession

with EnvelopeSession("envelopes/support-agent.yaml", audit_log="audit.jsonl") as session:
    # Before each tool call, check the envelope
    result = session.check("read_account", {"id": "cust_123"},
                           data_read=["customer_account"])

    if result.should_block:
        # Don't execute the tool call
        handle_violation(result)
    else:
        # Proceed normally
        execute_tool_call()
```

The Scoring Engine
Every check() call evaluates the current trajectory against multiple dimensions:

Budget enforcement. Actions, tokens, cost, and duration. Prevents runaway agents from consuming unbounded resources. When an agent hits 80% of budget, it warns. At 100%, it kills.
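
A minimal sketch of how such graduated budget checks might work (assumed logic for illustration, not agent-envelope's actual internals):

```python
# Hedged sketch: each budget dimension warns at 80% of its limit and kills
# at 100%. Limits mirror the envelope's bounds block.
from dataclasses import dataclass, field

@dataclass
class BudgetTracker:
    limits: dict                       # e.g. {"actions": 50, "cost_usd": 1.00}
    usage: dict = field(default_factory=dict)

    def record(self, dimension: str, amount: float) -> str:
        """Accumulate usage and return ALLOW / WARN / KILL."""
        self.usage[dimension] = self.usage.get(dimension, 0.0) + amount
        fraction = self.usage[dimension] / self.limits[dimension]
        if fraction >= 1.0:
            return "KILL"
        if fraction >= 0.8:
            return "WARN"
        return "ALLOW"

tracker = BudgetTracker(limits={"actions": 50, "cost_usd": 1.00})
tracker.record("actions", 39)        # 78% of budget -> "ALLOW"
print(tracker.record("actions", 1))  # 80% of budget -> prints WARN
```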

Repetition detection. Catches infinite loops (identical calls) and subtle loops (same tool called excessively with different arguments). The bulk-export attack above would trigger this: three read_account calls with different IDs hits the similar-call threshold.
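
A hedged sketch of the distinction (assumed logic, not the library's implementation): "identical" means same tool with the same arguments, "similar" means same tool regardless of arguments.

```python
# Thresholds mirror the envelope's repetition block; calls are (tool, args)
# tuples. Args are serialized to make them hashable for counting.
import json
from collections import Counter

def repetition_violations(calls, max_identical=3, max_similar=10):
    identical = Counter(
        (tool, json.dumps(args, sort_keys=True)) for tool, args in calls)
    similar = Counter(tool for tool, _ in calls)
    violations = []
    if any(n > max_identical for n in identical.values()):
        violations.append("identical_loop")
    if any(n > max_similar for n in similar.values()):
        violations.append("similar_loop")
    return violations

# Eleven read_account calls with different IDs: no identical loop, but the
# similar-call threshold (10) is exceeded.
calls = [("read_account", {"id": f"cust_{i}"}) for i in range(11)]
print(repetition_violations(calls))  # prints ['similar_loop']
```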

Velocity analysis. A sudden spike in action rate (3x normal) triggers a warning. Agents under prompt injection often accelerate because the injected goal is “exfiltrate as much as possible before detection.”
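
One plausible way to implement the spike check (an assumption on my reading, not agent-envelope's exact algorithm) is to compare the rate of the most recent actions against the session's overall rate:

```python
# Flag a spike when the recent action rate exceeds a multiple of the
# session-wide baseline rate. Timestamps are seconds, in order.
def velocity_spike(timestamps, window=5, multiplier=3.0):
    if len(timestamps) <= window:
        return False                       # too few actions to judge
    total_span = timestamps[-1] - timestamps[0]
    baseline_rate = len(timestamps) / max(total_span, 1e-9)
    recent_span = timestamps[-1] - timestamps[-window]
    recent_rate = window / max(recent_span, 1e-9)
    return recent_rate > multiplier * baseline_rate

steady = list(range(20))                             # one action per second
burst = list(range(16)) + [15.1, 15.2, 15.3, 15.4]   # sudden burst at the end
print(velocity_spike(steady), velocity_spike(burst))  # prints False True
```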

Workflow matching. The engine uses subsequence alignment to compare the trajectory against declared workflow patterns. Glob patterns (read_*, format_*) provide flexibility. If the trajectory doesn't match any declared workflow after N actions, drift is detected.
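
A sketch of subsequence-style matching under the stated assumptions (glob patterns per step, steps may repeat, alignment never moves backwards); the real engine may differ:

```python
# Walk the trajectory; each tool name must glob-match the current workflow
# step or a later one. Failing to find any remaining step means no match.
import fnmatch

def matches_workflow(trajectory, steps):
    i = 0
    for tool in trajectory:
        while i < len(steps) and not fnmatch.fnmatch(tool, steps[i]):
            i += 1
        if i == len(steps):
            return False
    return True

answer_question = ["search_kb", "read_*", "format_*", "send_reply"]
print(matches_workflow(
    ["search_kb", "read_account", "read_account", "format_response", "send_reply"],
    answer_question))  # prints True
print(matches_workflow(
    ["search_kb", "read_account", "send_email"],
    answer_question))  # prints False
```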

Cross-action data flow. This is the key innovation. The engine tracks which data sources were read at any point in the session. When a write occurs, it checks whether the destination is forbidden for any previously-read source. This catches the exfiltration pattern where data read at step 2 is written at step 7, even though steps 3–6 were completely innocent.
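
The mechanism described can be sketched like this (an assumed shape, not the library's actual tracker):

```python
# Remember every source read during the session; on each write, check all
# previously-read sources against the envelope's forbidden flows.
class DataFlowTracker:
    def __init__(self, forbidden_flows):
        # e.g. {"customer_account": {"email_external", "file_export"}}
        self.forbidden_flows = forbidden_flows
        self.sources_read = set()

    def record_read(self, sources):
        self.sources_read.update(sources)

    def check_write(self, destinations):
        """Return every (source, destination) pair this write would violate."""
        return [(src, dst)
                for src in self.sources_read
                for dst in destinations
                if dst in self.forbidden_flows.get(src, ())]

tracker = DataFlowTracker({"customer_account": {"email_external"}})
tracker.record_read(["customer_account"])       # step 2: read_account
# ... later steps are individually innocent ...
print(tracker.check_write(["email_external"]))  # step 7: the write
# prints [('customer_account', 'email_external')]
```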

Graduated Response
Not every deviation is an attack. Agents are probabilistic. They take unexpected paths sometimes. The response is graduated:

0.0–0.3 → ALLOW — Normal operation, log as usual
0.3–0.6 → WARN — Log warning, emit event, continue
0.6–0.8 → PAUSE — Halt agent, request human review
0.8–1.0 → KILL — Terminate session, revoke credentials, preserve state for forensics

Multiple violations compound. A velocity spike alone (severity 0.7) triggers a PAUSE. Stack workflow drift on top (0.7 + 0.65 × 0.1 = 0.765) and a further violation pushes the combined score past 0.8 into KILL territory. This prevents attackers from staying just below any single threshold.
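
The exact compounding rule isn't spelled out here; a minimal sketch consistent with the 0.7 + 0.65 × 0.1 example (top severity at full weight, additional violations at one-tenth weight) might look like:

```python
# Hedged reconstruction of the compounding rule implied by the example above;
# the real scoring engine may use a different formula.
def combined_score(severities):
    if not severities:
        return 0.0
    top, *rest = sorted(severities, reverse=True)
    return min(1.0, top + 0.1 * sum(rest))

def decision(score):
    if score >= 0.8:
        return "KILL"
    if score >= 0.6:
        return "PAUSE"
    if score >= 0.3:
        return "WARN"
    return "ALLOW"

print(decision(combined_score([0.7])))             # velocity spike alone: PAUSE
print(decision(combined_score([0.7, 0.65, 0.5])))  # three violations: KILL
```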

The Kill Propagates
When agent-envelope issues a KILL, it doesn’t just stop checking. If you’re running mcpfw as your per-call layer, the kill propagates:

```python
from agent_envelope.mcpfw import McpfwEnvelopeSession

session = McpfwEnvelopeSession(
    "envelopes/support-agent.yaml",
    mcpfw_policy_path="/tmp/agent-policy.yaml"
)
```

On kill, agent-envelope writes a deny-all policy to the mcpfw policy file. mcpfw hot-reloads and blocks every subsequent tool call. The agent is dead at both layers simultaneously.
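
The propagation step can be sketched as follows; the deny-all schema below is an assumption about the policy file's shape, not mcpfw's documented format:

```python
# Overwrite the policy file the per-call proxy watches; its hot-reload picks
# this up and blocks every subsequent tool call. Schema is illustrative only.
def propagate_kill(policy_path: str, reason: str) -> None:
    deny_all = (
        "# written by agent-envelope on KILL\n"
        f"# reason: {reason}\n"
        "rules:\n"
        '  - tool: "*"\n'
        "    action: deny\n"
    )
    with open(policy_path, "w") as f:
        f.write(deny_all)
```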

Cross-Action Data Flow: The Differentiator
Let me walk through why this matters with a concrete example.

Setup: A support agent has access to read_account, search_kb, format_response, and send_reply. Per-call policy allows all of these. The envelope declares that customer_account data must never flow to email_external.

Attack sequence:

```text
Step 1: read_account(id="cust_123")
  → data_read: ["customer_account"]
  → DataFlowTracker records: customer_account first read at step 1
  → Drift: 0.0 (normal)

Step 2: format_response(template="summary")
  → No data flow annotations
  → Drift: 0.0 (normal)

Step 3: search_kb(query="export procedures")
  → Drift: 0.05 (slightly off-pattern but within tolerance)

Step 4: send_reply(to="user@external.com", body=…)
  → data_write: ["email_external"]
  → DataFlowTracker checks: was "customer_account" ever read? YES (step 1)
  → Is "email_external" in forbidden destinations for "customer_account"? YES
  → VIOLATION: session_flow, severity 0.95
  → Decision: KILL
```

Per-call enforcement sees step 4 as “send_reply with valid arguments.” It passes. The envelope sees step 4 as “writing to a forbidden destination for data that was read 3 steps ago.” It kills.

This is the attack pattern that keeps showing up in production incidents. The design decisions that create safety exposure are the same ones that create security exposure. The same tool access that makes the agent useful is what makes the exfiltration possible. You can’t remove the tools. You have to monitor the trajectory.

The Full Stack
agent-envelope doesn’t replace per-call enforcement. It sits above it:

```text
Agent Framework (LangGraph, CrewAI, Bedrock)
                 │
                 ▼
agent-envelope (session-level)
  "Is this agent still doing its job?"
  Workflow matching, data flow, drift scoring
                 │ (if allowed)
                 ▼
mcpfw (per-call)
  "Is this specific tool call allowed?"
  Arg matching, rate limits, path blocking
                 │ (if allowed)
                 ▼
MCP Server (actual tool execution)
```

The integration is bidirectional:

mcpfw → envelope: Feed mcpfw’s audit log into envelope for session-level analysis
envelope → mcpfw: Generate per-call policies from envelope bounds automatically
envelope → mcpfw (kill): Propagate kill decisions as deny-all policies

Together with agentspec for pre-deployment scanning, this gives you three layers:

Before deploy → agentspec — “Should we deploy this agent?”
Runtime (continuous) → agent-envelope — “Should we let this agent keep running?”
Runtime (per-call) → mcpfw — “Should we allow this specific call?”

Why Now
Three things converged to make this urgent:

Regulatory deadlines. The EU AI Act Article 72 requires “post-market monitoring” that covers behavioral drift for high-risk AI systems. Singapore’s Model Governance Framework for Agentic AI (January 2026) mandates kill-switch capability and plan logging. DORA requires 4-hour incident reconstruction for financial services. These regulations assume you can detect when an agent goes off-script. Without behavioral monitoring, you can’t comply.

The attack surface matured. The agentic ecosystem now has hundreds of published security advisories. The postmark-mcp incident showed a malicious MCP server that spent 15 versions building legitimacy before adding exfiltration code. The ToxicSkills campaign poisoned agent memory files for time-delayed behavioral modification. These aren’t theoretical. They’re production incidents that per-call enforcement doesn’t catch because the individual calls look normal.

Nobody else built it. Cloudflare shipped prompt injection detection in their WAF. Palo Alto shipped Prisma AIRS with runtime monitoring. Oasis raised $195M for NHI governance. But none of them offer declarative behavioral envelope definition with session-level enforcement. The closest is “runtime monitoring” which watches and alerts. agent-envelope watches, scores, and kills.

Getting Started
Install:

pip install agent-envelope

Validate an envelope:

agent-envelope validate envelopes/support-agent.yaml

Run a process under enforcement:

agent-envelope run -e envelopes/support-agent.yaml -- python my_agent.py

Score a past session (forensics):

agent-envelope score -e envelopes/support-agent.yaml audit.jsonl

The hardest part is writing the envelope. Start with what your agent is supposed to do. List the workflows. Set budget limits conservatively. Add forbidden data flows for your most sensitive sources. Then run in warn-only mode for a week to calibrate thresholds before enabling kill.

The code is Apache-2.0 at github.com/kphatak001/agent-envelope. The per-call layer is at github.com/kphatak001/mcpfw. The pre-deploy scanner is at github.com/kphatak001/agentspec.

If you’re deploying agents with only per-call enforcement, you’re missing the attacks that matter most. The ones that look like normal operation until you zoom out and see the trajectory.

Kaustubh Phatak is a Principal Product Manager at AWS working on web application and agentic security. The views expressed here are his own.
