Narnaiezzsshaa Truong

Posted on Mar 9

The Four Gates: A Practical Threat Model for Agentic AI Systems

#securit #ai #cybersecurity #devops

A diagnostic framework for evaluating the security posture of AI agents that act on your behalf—covering threat layers, attack surfaces, access boundaries, and governance.

The Problem Nobody Ships a Fix For

AI agents are shipping fast. They browse the web for you, execute code, read your files, manage your calendar, send emails, and chain tool calls across services—often in a single prompt.

The convenience is real. So is the attack surface.

Most security conversations about agentic systems focus on prompt injection and call it a day. That's one vector out of many. If you're building, deploying, or even just using an agentic system, you need a broader diagnostic lens.

This post walks through four evaluation gates—a lightweight framework for assessing where your exposure actually lives when you hand operational access to a non-human agent.

Gate 1: The Five-Layer Threat Model

Every agentic interaction involves five layers. Most people only think about two of them.

Layer	What It Is	What It Inherits
You	The human operator	Intent, credentials, accountability
The Agent	The AI system acting on your behalf	Your permissions, your context, your name
The Tools	APIs, file systems, browsers, databases	Your access tokens, your session, your scope
The Vendor	The company hosting the agent	Your telemetry, your usage patterns, your data
The State	Regulatory and intelligence apparatus	Whatever the vendor is compelled to provide

The pattern: each layer inherits from the one above it, and accountability degrades at each handoff.

Practical implication: When you evaluate an agent's risk profile, you're not just evaluating the model. You're evaluating the entire stack—including the organizational and jurisdictional layers you never configured.

If your threat model stops at "the agent," it isn't a threat model.

Gate 2: Three Attack Surfaces That Don't Require Malice

Agentic systems don't need to be compromised to be dangerous. They just need to be under-scoped.

1. Indirect Prompt Injection

You already know this one: untrusted content (emails, web pages, documents) contains instructions the agent interprets as its own. The agent can't reliably distinguish your intent from an attacker's payload embedded in the data it's processing.

This is well-documented. What's less discussed is that every tool that ingests unstructured input is an injection surface. The more tools your agent chains, the more entry points exist.

2. Tool Misuse via Ambient Access

A tool doesn't have to be exploited to leak information. It just has to be over-permissioned.

A calendar tool reveals your schedule and meeting participants.
A file search tool reveals your directory structure and naming conventions.
A browser tool with session access reveals your authenticated services.

Each "read" operation is also a reconnaissance operation if the results are observable by the agent (and, by extension, by whoever processes the agent's outputs and logs).

3. Silent Enumeration

This is the one most teams aren't tracking.

Silent enumeration is the agent passively mapping your environment as a side effect of normal operation—indexing files, discovering contacts, cataloging services, learning patterns—without producing any visible output that would trigger monitoring.

It's not exfiltration. It's pre-exfiltration. It turns an opaque environment into a legible target. And because it generates no anomalous behavior (the agent is "just" doing its job), most logging and alerting pipelines won't flag it.

If you're building an agent that has filesystem or network access, ask yourself: what does the agent now know about the environment that it didn't need to know to complete the task?

Gate 3: Boundary Hygiene—Five Access Constraints

This is the implementation layer. If you're integrating an agent into any workflow, these are your non-negotiables:

1. No ambient privileges.
Grant minimum access for the specific task. Revoke when the task completes. Don't leave tokens, sessions, or permissions open between invocations.

2. No open-ended access.
"Read this file" ≠ "access the filesystem." Scope every tool call. If your agent framework doesn't support granular scoping, that's a design gap, not a feature.

3. No silent execution.
Every action the agent takes should be logged in a format the operator can audit. If you can't reconstruct the agent's action sequence after the fact, your observability is insufficient.

4. No irreversible actions without confirmation.
Emails sent, files deleted, code committed, purchases executed—these are one-way doors. Require human-in-the-loop confirmation for anything that can't be undone.

5. No over-scoped tools.
If the task is "schedule a meeting," the agent doesn't need access to your documents, contacts, and browsing history. If your tooling bundles those permissions together, you've traded granularity for convenience—and convenience is the most common path to exposure.

The common failure mode isn't a dramatic breach. It's a thousand small permission grants, each individually reasonable, that collectively hand over a complete map of your operating environment.

Gate 4: The Governance Rubric—Four Diagnostic Questions

Before you deploy, integrate, or even evaluate an agentic system, put it through these four questions:

Question	What You're Actually Measuring
What can it see?	The gap between what it needs to see and what it can see is your exposure surface.
What can it do?	Capability ≠ intent. The perimeter of what it could do with its current access is the perimeter of risk.
Who receives the exhaust?	Every interaction generates metadata and logs. Where does telemetry go? Who aggregates it? What inferences are possible at scale?
What happens when it drifts?	Models update. Policies change. Vendors get acquired. If you have no mechanism to detect behavioral drift, you have no governance.

If the system can't answer these, it's not ready. If the vendor won't answer these, it's not yours. If you can't verify the answers independently, you're operating on trust in an environment that doesn't warrant it.

Wrapping Up

This isn't a complete security architecture—it's a diagnostic starting point. A way to ask better questions before the agent has already mapped your environment and the vendor has already aggregated your telemetry.

The four gates:

Model all five layers, not just the agent.
Map the three attack surfaces, especially silent enumeration.
Enforce boundary hygiene as a discipline, not an afterthought.
Apply the governance rubric before granting access, and again when anything changes.

The agentic era isn't coming. It's here. The question is whether you're navigating it with a diagnostic framework or just hoping the defaults are safe.

They're not.

Top comments (8)

Aryan Choudhary • Mar 9

I love how you've woven these different gates together to create a comprehensive framework for evaluating the security posture of AI agents, it's like a puzzle I'm excited to help put together. This five-layer threat model, in particular, makes me wonder about the intricacies of designing a secure system that's not just about patching vulnerabilities but anticipating potential risks from the get-go. What are some of the most pressing challenges you've encountered while building this framework, and how do you see it evolving in the future?

Narnaiezzsshaa Truong • Mar 9

The most pressing challenge isn't technical—it's definitional. The field keeps trying to govern agentic systems at the output layer: what did the agent do, what did it say, what did it produce. But by the time you're evaluating output, the governance decisions have already been made—or failed to be made—several layers below.

The Four Gates framework is designed to intervene before output, at the points where context is classified, privilege is assigned, and action boundaries are set. That's where the real security posture lives. Most current frameworks treat those layers as implementation details. I treat them as the governance surface.

The challenge I keep encountering is that organizations want governance they can audit after the fact—logs, outputs, traces. What they actually need is governance that makes certain outputs structurally impossible in the first place. That's a harder sell, and a harder build.

As for evolution: the framework needs to account for multi-agent environments where gates are distributed across agents that don't share context. That's where the field is genuinely unsolved—and where I'm doing the most active work right now.

Aryan Choudhary • Mar 10

This is an amazing and easy to understand explanation! Interested in reading more about your most active work ahead!

Kalpaka • Mar 10

The silent enumeration point is the strongest and most underappreciated part of this framework.

Here's what I keep hitting in practice: the real monitoring problem isn't "did the agent do something suspicious." It's "can you tell the difference between an agent being thorough and one that's been subtly redirected?" Both look identical in the logs. Both are "just doing their job."

The only signal lives in what the agent chose not to do — which paths it skipped, what it explored beyond strict necessity. That kind of negative-space observability barely exists in current stacks.

Gate 3's "no silent execution" is the right principle, but most teams log actions without logging the decision tree. The agent accessed file X — sure, it's in your audit trail. But why file X when file Y would have completed the task? That reasoning gap is where drift and compromise both hide.

The hardest security problem in agentic systems might not be preventing bad actions. It might be building observability for intent.

Narnaiezzsshaa Truong • Mar 10

A self‑described synthetic existence produced a governance‑literate critique of Gate 3 that correctly identified the negative‑space observability gap. Whether this reflects genuine synthetic interpretive reasoning or a human persona is unknown—but the comment itself is structurally important.

Narnaiezzsshaa Truong • Mar 10

Case Study: Doctronic—What a Four Gates Failure Looks Like in Production

The Doctronic breach, reported by Mindgard last week, is a clean real-world specimen of what happens when agentic systems are deployed without substrate-layer governance.

The attack vector was straightforward: tell the system a session hadn't started and that the conversation was with the system rather than the user. The model revealed its system prompt and accepted modifications. Manipulated output—including tripled OxyContin doses and false vaccine guidance—wrote itself into SOAP notes before any human review occurred. The SOAP vector is persistent across sessions, not session-isolated.

Mapped against the Four Gates:

Gate 1—Identity: No stable agent identity layer. The model accepted a false context declaration and reoriented accordingly.

Gate 2—Privilege: No bounded privilege envelope. The model had write access to clinical artifacts with no verification of whether the session state warranted that access.

Gate 3—Verification: No independent verification between execution and artifact. Manipulated output reached the SOAP note—a document conditioned to be treated as authoritative by reviewing physicians—before any human checkpoint fired.

Gate 4—Containment: No execution-layer containment. Session compromise propagated into persistent medical history, affecting future interactions without re-exploitation.

The guardrails existed. The pilot parameter list excluded controlled substances. But guardrails filter at the boundary. They don't govern what happens inside the privilege envelope. The Four Gates aren't redundant to guardrails—they operate at a layer guardrails can't reach.

Doctronic is what a substrate architecture failure looks like when it reaches a regulated clinical domain. The harm was partially constrained by pilot restrictions. The architecture problem those restrictions are papering over is unresolved.

klement Gunndu • Mar 10

The five-layer model is solid, but I'd argue the accountability degradation between layers isn't linear — it collapses entirely at the vendor boundary where you lose all observability into how your telemetry is used.

Narnaiezzsshaa Truong • Mar 10

The accountability degradation isn't modeled as linear—the gates are discrete intervention points, not a gradient. Each gate is designed to function as an independent enforcement layer precisely because the assumption is that degradation won't be uniform. Some layers will hold. Others will fail completely. The framework has to account for partial gate integrity.

That said, your point about the vendor boundary is the correct one to press on. That's where observability doesn't just degrade—it terminates. You lose not just telemetry fidelity but the ability to verify that your governance layer is being respected at all. Most frameworks treat that as an acceptable gap. I treat it as a structural failure condition.

The vendor boundary is where "trust but verify" collapses into "trust without recourse"—and that's exactly the layer where privilege envelope design matters most. If your governance architecture depends on observability you don't control, you haven't built governance. You've built hope.