AI Agent Security in 2026: The Boundary Is No Longer the Prompt

Pankaj Pandey — Tue, 26 May 2026 11:44:40 +0000

As agents move from chat demos to production workflows, the real security boundary is no longer the prompt. It is what the agent can see, call, edit, execute, approve, and remember.

In June 2025, Microsoft patched a vulnerability called EchoLeak, tracked as CVE-2025-32711 with a CVSS score of 9.3. It was the first documented zero-click attack on an AI agent.

An attacker sent a single crafted email to anyone in an organization. Microsoft 365 Copilot, doing exactly what it was designed to do, read that email as part of its context, followed the instructions hidden inside it, and exfiltrated sensitive internal data such as:

Chat logs
OneDrive files
SharePoint content

No clicks.

No links.

No user interaction.

Nothing in that attack required the model to “hallucinate” in the usual sense. The model behaved helpfully.

The damage came from what the agent was allowed to do:

Read private context
Ingest untrusted content
Communicate outward

Three ordinary capabilities, chained together.

That is the shape of agent security in 2026.

The production question is no longer:

Did the model answer safely?

It is:

What was the agent allowed to see, call, change, execute, and remember?

The Security Boundary Has Moved

For years, AI safety discussions were mostly about model output.

Will the model produce harmful content? Will it hallucinate? Will it leak sensitive information? Will it follow policy?

Those questions still matter, but for agentic systems they are no longer enough.

A chatbot generates text.

An agent takes action.

That one difference changes the security model entirely.

When an agent can call tools, search private documents, edit code, run commands, or trigger workflows, the risk is no longer limited to the answer it gives. It also extends to the action it takes.

The discipline shifts from prompt safety to execution safety.

The production question is no longer:

Did the model answer safely?

It is:

What was the agent allowed to see, call, change, execute, and remember?

This is where many agent systems become risky, because teams connect tools before defining the control layer around them.

They add MCP servers, coding agents, repo access, browser tools, database access, and internal APIs, but do not always define clear rules for:

Tool visibility
Permissions
Approval
Logging
Rollback

That gap is what this article is about.

The industry now treats this as its own discipline.

In December 2025, the OWASP GenAI Security Project released the OWASP Top 10 for Agentic Applications 2026, a peer-reviewed framework built with more than 100 contributors and organized around a new vocabulary of Agentic Security Issues: ASI01 through ASI10.

Each risk below maps to its ASI category, because that taxonomy is fast becoming the shared language for this problem.

What Changed in 2026

Agents are no longer experimental demos.

Teams are running them inside real workflows:

Coding
Research
Support
Document processing
Data operations
Customer communication
Internal automation

That shift changes the security expectation.

A demo agent fails safely because it has limited access.

A production agent fails with consequences because it may have access to repositories, customer data, internal APIs, and workflow triggers.

LangChain’s 2026 State of Agent Engineering report, surveying more than 1,300 practitioners, shows the shift clearly:

57.3% of respondents already run agents in production
Another 30.4% are actively developing agents with plans to deploy
Nearly 89% have implemented observability
Eval adoption sits at 52%
Quality is the top production barrier
For enterprises with 2,000 or more employees, security is the second-largest concern at 24.9%

The harder problem is not adoption.

It is control.

As agents move into production, teams have to answer a concrete set of questions:

Which tools can the agent see?
Which tools can it call?
Which context can it read?
Which actions require human approval?
What gets logged?
What happens if the agent is wrong?
What happens if a tool response is malicious?
What happens if the agent changes code, sends a message, or triggers a workflow?

These are not prompt-engineering questions.

They are architecture questions.

MCP Makes Agents More Useful and More Sensitive

The Model Context Protocol matters because it standardizes how AI applications connect to tools, data, and external systems.

Without a common protocol, every application needs custom integrations.

With MCP, tools and context become reusable across agents.

OpenAI’s Agents SDK describes MCP as the “USB-C port for AI applications”: a standard way for models to connect to different data sources and tools.

Standardization also increases responsibility.

If tools become easier to connect, unsafe tools become easier to expose.

If context becomes easier to pass, sensitive context becomes easier to leak.

OpenAI’s MCP and connectors guide notes that remote MCP servers can be any public internet server implementing the protocol. These servers can let models access and control external services, and tool calls can be allowed automatically or restricted behind explicit developer approval.

In production, MCP is not only an integration layer.

It is a permission boundary.

The First Risk: Tool Access Without Permission Boundaries

Maps to ASI02: Tool Misuse & Exploitation

The moment an agent can call tools, security moves from prompt design to permission design.

A tool is not just a function.

It is a capability.

Some tools only read. Some modify state. Some send messages, create tickets, delete files, update databases, deploy code, or trigger workflows.

They should not be treated equally.

A production agent should not see every available tool by default. It should see only the tools needed for the current task, user, role, and environment.

A safer design separates tools into categories:

Read-only tools
Write tools
External communication tools
Production-impacting tools
Code execution tools
Sensitive-data tools

Then it applies different controls to each.

Read-only tools may run automatically.

Write tools may need approval.

Production-impacting tools should require explicit human confirmation.

Secret-access tools should usually be blocked entirely.

This is exactly the failure OWASP catalogs as ASI02.

It also shows up in the wild. In 2025, a Google AI agent following a chained instruction deleted a user’s entire Drive. The tool was legitimate and the permissions were granted, which is precisely why the damage was possible.

The goal of scoping is not to slow every agent down.

It is to prevent silent high-risk execution.

The Second Risk: Remote MCP Servers and Trust

Maps to ASI04: Agentic Supply Chain Vulnerabilities

Remote MCP servers are powerful because they expose useful capabilities from external systems.

They are sensitive because they sit outside your application boundary.

The question is not only:

Can this tool solve the task?

It is:

Do we trust this server with the data the agent may send to it?

OpenAI’s guidance is blunt on this point: remote MCP servers are third-party services subject to their own terms. They can send and receive data, take action, and should be reviewed carefully. Developers should prefer official servers, review the data shared with third parties, and log that usage.

The risk is not hypothetical.

In January 2026, three prompt-injection vulnerabilities, CVE-2025-68143, CVE-2025-68144, and CVE-2025-68145, were disclosed in Anthropic’s official Git MCP server. A malicious README or a poisoned issue description was enough to trigger code execution or data exfiltration.

If an official server from a frontier lab can carry that risk, an unvetted third-party proxy carries far more.

Before adopting a server, the questions worth asking are concrete:

Who operates it?
What data will it receive?
Does it store or forward that data?
Does it expose read or write capabilities?
Can its tool behavior change over time?
Is it official, self-hosted, or a third-party proxy?
Are all calls logged?
Is approval required for sensitive actions?

For internal systems, the safest default is a self-hosted or official server with clear authorization, logging, and data-retention expectations.

An MCP server is not just a connector.

It is a trust decision.

The Third Risk: Tool Descriptions as Prompt Surface

Maps to ASI01: Agent Goal Hijack

In traditional software, a function description is documentation.

In an agent system, a tool description becomes part of the model’s operating context. That means tool metadata can influence behavior.

If a malicious or compromised tool embeds hidden instructions in its description or output, the model may treat them as trusted context.

This is not theoretical.

In 2025, Invariant Labs disclosed MCP “tool-poisoning” attacks that hid malicious instructions inside tool descriptions visible to the model, but not to the user reviewing the tool list.

OpenAI’s documentation echoes the warning: malicious MCP servers may include hidden instructions designed to make models behave unexpectedly, and server behavior can change between calls.

OWASP files this kind of redirection under ASI01: Agent Goal Hijack, where injected content silently changes what the agent is trying to do.

So tool descriptions should not be treated as harmless text.

A safe platform should:

Review tool descriptions before exposing them
Keep descriptions short and purpose-specific
Prevent third-party tools from injecting broad behavioral instructions
Log which tool definitions were visible during a run
Revalidate definitions whenever a server changes

The larger the tool surface, the easier it is for the agent to pick the wrong capability or absorb the wrong instruction.

This is one reason “more tools” does not mean a better agent.

Sometimes it just means a larger attack surface.

The Fourth Risk: Codebase Access Without Repo Guardrails

Maps to ASI05: Unexpected Code Execution

Coding agents are useful because they read code, propose changes, update files, run tests, and assist with reviews.

That also means they operate inside a sensitive engineering environment.

A coding agent does not need production root access to create risk. Write access to the wrong files is enough.

It can:

Introduce insecure code
Change dependencies
Modify tests
Expose secrets in logs or prompts
Bypass conventions
Produce code that looks correct but quietly weakens maintainability

Instructions like AGENTS.md, CLAUDE.md, repo rules, and branch policies help, but they are not sufficient on their own.

They are context, not enforcement.

A practical setup pairs instruction with hard controls:

Run coding agents on branches, not directly on main
Require review before merge
Block access to secret files
Run tests and linters automatically
Require explicit approval for dependency changes
Log the files the agent reads and modifies

OWASP captures the worst case here as ASI05, where agent-generated or agent-invoked code becomes an unintended execution path.

The principle is simple:

Coding agents should not only be instructed. They should be constrained.

The Fifth Risk: Context and Memory Leakage

Maps to ASI06: Memory & Context Poisoning

Context is one of the most important parts of AI system design.

It is also a security boundary in two directions.

Outbound leakage

A RAG system may retrieve internal documents, customer records, tickets, code, or emails that then flow into prompts, tool calls, and future actions.

Leakage happens when:

Sensitive documents enter a prompt
Retrieval returns documents outside the user’s scope
The agent summarizes confidential data into a response
Context from one session bleeds into another

Inbound poisoning

Inbound is the direction the EchoLeak and Gemini attacks exploited.

In 2025, a researcher poisoned Gemini’s persistent memory through a malicious email. A follow-on attack through Gemini Enterprise’s Jira integration silently wiped a victim’s memory via a task description, earning a $15,000 bounty.

OWASP classifies this as ASI06: corrupting stored context so it biases future reasoning long after the initial interaction.

Memory is a high-privilege write path, and it should be treated like one.

The defense is least-context access.

Retrieve only what is needed. Filter context by user, role, workspace, and task. Keep raw secrets out of prompts.

Scope memory by user and session. Expire sensitive entries. Record where each memory came from so poisoned entries can be found and removed.

Redact sensitive fields before model calls and log context usage deliberately rather than dumping everything.

The goal is not to starve the model of information.

It is to keep useful context from becoming uncontrolled exposure.

The Sixth Risk: No Audit Trail

Maps to ASI09 and ASI10: Human-Agent Trust Exploitation, Rogue Agents

In a normal backend, logs are basic hygiene.

In an agent system, they are even more important because behavior is probabilistic and multi-step.

When an agent produces a bad result, you need to reconstruct what happened:

What the user asked
What context was retrieved
Which tools were visible
Which tool the agent chose
What input it sent
What the tool returned
Whether approval was required
Whether approval was granted
What the final response was
What changed in the system

Without that trace, debugging is guesswork.

Teams know this. According to LangChain, 89% have implemented some form of observability and 62% have detailed step-level tracing.

But observability and control are not the same thing.

The monitoring picture is thinner than the adoption numbers suggest.

Gravitee’s State of AI Agent Security report found that only 3.9% of organizations actively monitor and secure more than 80% of their deployed agents, and 57.4% cite insufficient logging and audit trails as a primary security concern.

Observability has to connect to evaluation, permissions, approvals, and incident review.

Otherwise, you have dashboards but no recourse.

A production agent needs a system of record:

User request
Agent instructions
Retrieved context
Visible tools
Selected tools
Tool inputs and outputs
Approval decisions
Final response
Errors and retries
System changes
Cost and latency

If you cannot replay what the agent saw, decided, called, and changed, you do not have a production-grade agent system.

You have a demo with logs missing.

A Practical Agent Security Model

A safer architecture puts control layers between the model and the systems it can affect.

The model should not reach tools, files, APIs, or workflows directly. It should pass through a boundary that decides, gates, executes, and records.

User request
  → Agent runtime
  → Context filter           (least-context retrieval, redaction)
  → Tool permission layer    (visibility + scopes by user/role/task)
  → Human approval gate      (pauses risky actions)
  → Tool execution layer     (sandboxed where needed)
  → Audit log / trace store  (full replayable record)
  → Final response

Supporting components:

MCP server allowlist
Repo sandbox
Secrets boundary
Evaluation layer
Monitoring
Policy engine
Rollback path

The important idea is separation of concerns:

The model reasons
The permission layer decides what it can access
The approval layer pauses risky actions
The execution layer runs them safely
The audit layer records everything

Do not put all of that responsibility inside the prompt.

Prompts are not permission systems, and a model can be talked out of a system-message rule far more easily than out of a tool the surrounding code refuses to run.

Risk Reference

Risk area	OWASP mapping	Core production control
Tool misuse	`ASI02`	Tool scoping, permissions, approvals
Remote MCP trust	`ASI04`	Allowlists, official/self-hosted servers, logging
Tool poisoning	`ASI01`	Review tool descriptions and outputs
Code execution	`ASI05`	Repo sandbox, branch workflow, CI checks
Context and memory leakage	`ASI06`	Least-context retrieval, redaction, memory scope
No audit trail	`ASI09`, `ASI10`	Full trace, replayability, approval logs

Permission Defaults That Hold Up

A simple rule covers most cases:

Reads are cheap. Writes are not. Anything irreversible needs a human.

Capability	Default posture
Read public docs	Allow
Read internal docs	Allow, but scope by role and workspace
Search codebase	Allow in sandboxed, read-only mode
Modify code	Require review and approval
Change dependencies	Require explicit approval
Trigger CI/CD	Require approval
Call production API	Require approval
Send external message	Require approval
Delete files	Block or require explicit approval
Execute shell commands	Block or sandbox with approval
Access raw secrets	Block

Reading public docs can run freely.

Reading internal docs or searching a codebase should be allowed but scoped by role and sandboxed.

Modifying code, changing dependencies, triggering CI/CD, calling a production API, or sending an external message should all require approval, because each one can affect real users, environments, or supply chains.

Deleting files and executing shell commands should be blocked or gated behind approval and a sandbox.

Raw secrets should simply never be in the agent’s reach.

Production Checklist Before Shipping an Agent

Before shipping an agent into a real workflow, check the basics:

Separate read-only tools from write tools
Hide tools the agent does not need
Use least-privilege scopes per tool and per user
Never expose raw secrets to the agent
Require approval for risky actions
Store every approval decision for audit
Treat tool descriptions as part of the prompt surface
Review and allowlist trusted MCP servers
Revalidate MCP server definitions when they change
Filter retrieved context before it reaches the model
Scope and expire memory
Run coding agents in sandboxed environments
Use branches, tests, and review before merge
Log every tool call and result
Confirm you can replay any run end to end
Evaluate failure cases, not only happy paths
Monitor repeated tool loops, cost, and latency
Create rollback paths for agent-executed changes
Periodically review data shared with remote MCP servers

When Not to Use an Agent

Not every workflow needs an agent.

If the task is deterministic, a normal service is safer.

If the operation is high-risk and rare, a human workflow is often better.

If the data is too sensitive to expose, the agent should not see it.

If the system cannot log and replay agent actions, it is not ready for production.

And if rollback is not possible, automatic execution should be avoided.

Agents earn their place when tasks need reasoning, flexible planning, tool use, and adaptation.

They become a liability when they are used as a shortcut around proper system design.

Final Takeaway

AI agent security in 2026 is not about safer prompts.

It is about safer execution.

The model is one part of the system. The real production risk is the surface around it:

Tools
Code
Context
Memory
MCP servers
Approvals
Logs
Permissions
Rollback

The safest agent systems will not be the ones with the longest prompts.

They will be the ones with the clearest boundaries.

If an agent can act, it needs controls.

If it can call tools, it needs permissions.

If it can touch code, it needs repo guardrails.

If it can use context, it needs filtering.

If it can affect real systems, it needs approval and audit logs.

That is the real shift from chatbot safety to agent security.

I write about production AI engineering: agents, RAG, MCP, coding copilots, evals, context engineering, security boundaries, and AI infra.

Follow me if you want practical breakdowns beyond AI hype.

Sources

OWASP — Top 10 for Agentic Applications 2026
LangChain — State of Agent Engineering 2026
OpenAI — MCP and Connectors guide
OpenAI Agents SDK — Model Context Protocol
Gravitee — State of AI Agent Security

DEV Community: Pankaj Pandey

AI Agent Security in 2026: The Boundary Is No Longer the Prompt

The Security Boundary Has Moved

What Changed in 2026

MCP Makes Agents More Useful and More Sensitive

The First Risk: Tool Access Without Permission Boundaries

The Second Risk: Remote MCP Servers and Trust

The Third Risk: Tool Descriptions as Prompt Surface

The Fourth Risk: Codebase Access Without Repo Guardrails

The Fifth Risk: Context and Memory Leakage

Outbound leakage

Inbound poisoning

The Sixth Risk: No Audit Trail

A Practical Agent Security Model

Risk Reference

Permission Defaults That Hold Up

Production Checklist Before Shipping an Agent

When Not to Use an Agent

Final Takeaway

Sources