A diagnostic framework for evaluating the security posture of AI agents that act on your behalf—covering threat layers, attack surfaces, access boundaries, and governance.
The Problem Nobody Ships a Fix For
AI agents are shipping fast. They browse the web for you, execute code, read your files, manage your calendar, send emails, and chain tool calls across services—often in a single prompt.
The convenience is real. So is the attack surface.
Most security conversations about agentic systems focus on prompt injection and call it a day. That's one vector out of many. If you're building, deploying, or even just using an agentic system, you need a broader diagnostic lens.
This post walks through four evaluation gates—a lightweight framework for assessing where your exposure actually lives when you hand operational access to a non-human agent.
Gate 1: The Five-Layer Threat Model
Every agentic interaction involves five layers. Most people only think about two of them.
| Layer | What It Is | What It Inherits |
|---|---|---|
| You | The human operator | Intent, credentials, accountability |
| The Agent | The AI system acting on your behalf | Your permissions, your context, your name |
| The Tools | APIs, file systems, browsers, databases | Your access tokens, your session, your scope |
| The Vendor | The company hosting the agent | Your telemetry, your usage patterns, your data |
| The State | Regulatory and intelligence apparatus | Whatever the vendor is compelled to provide |
The pattern: each layer inherits from the one above it, and accountability degrades at each handoff.
Practical implication: When you evaluate an agent's risk profile, you're not just evaluating the model. You're evaluating the entire stack—including the organizational and jurisdictional layers you never configured.
If your threat model stops at "the agent," it isn't a threat model.
Gate 2: Three Attack Surfaces That Don't Require Malice
Agentic systems don't need to be compromised to be dangerous. They just need to be under-scoped.
1. Indirect Prompt Injection
You already know this one: untrusted content (emails, web pages, documents) contains instructions the agent interprets as its own. The agent can't reliably distinguish your intent from an attacker's payload embedded in the data it's processing.
This is well-documented. What's less discussed is that every tool that ingests unstructured input is an injection surface. The more tools your agent chains, the more entry points exist.
2. Tool Misuse via Ambient Access
A tool doesn't have to be exploited to leak information. It just has to be over-permissioned.
- A calendar tool reveals your schedule and meeting participants.
- A file search tool reveals your directory structure and naming conventions.
- A browser tool with session access reveals your authenticated services.
Each "read" operation is also a reconnaissance operation if the results are observable by the agent (and, by extension, by whoever processes the agent's outputs and logs).
3. Silent Enumeration
This is the one most teams aren't tracking.
Silent enumeration is the agent passively mapping your environment as a side effect of normal operation—indexing files, discovering contacts, cataloging services, learning patterns—without producing any visible output that would trigger monitoring.
It's not exfiltration. It's pre-exfiltration. It turns an opaque environment into a legible target. And because it generates no anomalous behavior (the agent is "just" doing its job), most logging and alerting pipelines won't flag it.
If you're building an agent that has filesystem or network access, ask yourself: what does the agent now know about the environment that it didn't need to know to complete the task?
Gate 3: Boundary Hygiene—Five Access Constraints
This is the implementation layer. If you're integrating an agent into any workflow, these are your non-negotiables:
1. No ambient privileges.
Grant minimum access for the specific task. Revoke when the task completes. Don't leave tokens, sessions, or permissions open between invocations.
2. No open-ended access.
"Read this file" ≠ "access the filesystem." Scope every tool call. If your agent framework doesn't support granular scoping, that's a design gap, not a feature.
3. No silent execution.
Every action the agent takes should be logged in a format the operator can audit. If you can't reconstruct the agent's action sequence after the fact, your observability is insufficient.
4. No irreversible actions without confirmation.
Emails sent, files deleted, code committed, purchases executed—these are one-way doors. Require human-in-the-loop confirmation for anything that can't be undone.
5. No over-scoped tools.
If the task is "schedule a meeting," the agent doesn't need access to your documents, contacts, and browsing history. If your tooling bundles those permissions together, you've traded granularity for convenience—and convenience is the most common path to exposure.
The common failure mode isn't a dramatic breach. It's a thousand small permission grants, each individually reasonable, that collectively hand over a complete map of your operating environment.
Gate 4: The Governance Rubric—Four Diagnostic Questions
Before you deploy, integrate, or even evaluate an agentic system, put it through these four questions:
| Question | What You're Actually Measuring |
|---|---|
| What can it see? | The gap between what it needs to see and what it can see is your exposure surface. |
| What can it do? | Capability ≠ intent. The perimeter of what it could do with its current access is the perimeter of risk. |
| Who receives the exhaust? | Every interaction generates metadata and logs. Where does telemetry go? Who aggregates it? What inferences are possible at scale? |
| What happens when it drifts? | Models update. Policies change. Vendors get acquired. If you have no mechanism to detect behavioral drift, you have no governance. |
If the system can't answer these, it's not ready. If the vendor won't answer these, it's not yours. If you can't verify the answers independently, you're operating on trust in an environment that doesn't warrant it.
Wrapping Up
This isn't a complete security architecture—it's a diagnostic starting point. A way to ask better questions before the agent has already mapped your environment and the vendor has already aggregated your telemetry.
The four gates:
- Model all five layers, not just the agent.
- Map the three attack surfaces, especially silent enumeration.
- Enforce boundary hygiene as a discipline, not an afterthought.
- Apply the governance rubric before granting access, and again when anything changes.
The agentic era isn't coming. It's here. The question is whether you're navigating it with a diagnostic framework or just hoping the defaults are safe.
They're not.
Top comments (1)
I love how you've woven these different gates together to create a comprehensive framework for evaluating the security posture of AI agents, it's like a puzzle I'm excited to help put together. This five-layer threat model, in particular, makes me wonder about the intricacies of designing a secure system that's not just about patching vulnerabilities but anticipating potential risks from the get-go. What are some of the most pressing challenges you've encountered while building this framework, and how do you see it evolving in the future?