<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pankaj Pandey</title>
    <description>The latest articles on DEV Community by Pankaj Pandey (@pankaj_km_pandey).</description>
    <link>https://dev.to/pankaj_km_pandey</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3952472%2Ffe4fc09b-d91b-48a0-9efd-4c41322e3d2c.png</url>
      <title>DEV Community: Pankaj Pandey</title>
      <link>https://dev.to/pankaj_km_pandey</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pankaj_km_pandey"/>
    <language>en</language>
    <item>
      <title>AI Agent Security in 2026: The Boundary Is No Longer the Prompt</title>
      <dc:creator>Pankaj Pandey</dc:creator>
      <pubDate>Tue, 26 May 2026 11:44:40 +0000</pubDate>
      <link>https://dev.to/pankaj_km_pandey/ai-agent-security-in-2026-the-boundary-is-no-longer-the-prompt-11cd</link>
      <guid>https://dev.to/pankaj_km_pandey/ai-agent-security-in-2026-the-boundary-is-no-longer-the-prompt-11cd</guid>
      <description>&lt;p&gt;&lt;em&gt;As agents move from chat demos to production workflows, the real security boundary is no longer the prompt. It is what the agent can see, call, edit, execute, approve, and remember.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In June 2025, Microsoft patched a vulnerability called EchoLeak, tracked as &lt;code&gt;CVE-2025-32711&lt;/code&gt; with a CVSS score of 9.3. It was the first documented zero-click attack on an AI agent.&lt;/p&gt;

&lt;p&gt;An attacker sent a single crafted email to anyone in an organization. Microsoft 365 Copilot, doing exactly what it was designed to do, read that email as part of its context, followed the instructions hidden inside it, and exfiltrated sensitive internal data such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chat logs&lt;/li&gt;
&lt;li&gt;OneDrive files&lt;/li&gt;
&lt;li&gt;SharePoint content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No clicks.&lt;br&gt;&lt;br&gt;
No links.&lt;br&gt;&lt;br&gt;
No user interaction.&lt;/p&gt;

&lt;p&gt;Nothing in that attack required the model to “hallucinate” in the usual sense. The model behaved helpfully.&lt;/p&gt;

&lt;p&gt;The damage came from what the agent was allowed to do:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read private context&lt;/li&gt;
&lt;li&gt;Ingest untrusted content&lt;/li&gt;
&lt;li&gt;Communicate outward&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Three ordinary capabilities, chained together.&lt;/p&gt;

&lt;p&gt;That is the shape of agent security in 2026.&lt;/p&gt;

&lt;p&gt;The production question is no longer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Did the model answer safely?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What was the agent allowed to see, call, change, execute, and remember?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  The Security Boundary Has Moved
&lt;/h2&gt;

&lt;p&gt;For years, AI safety discussions were mostly about model output.&lt;/p&gt;

&lt;p&gt;Will the model produce harmful content? Will it hallucinate? Will it leak sensitive information? Will it follow policy?&lt;/p&gt;

&lt;p&gt;Those questions still matter, but for agentic systems they are no longer enough.&lt;/p&gt;

&lt;p&gt;A chatbot generates text.&lt;br&gt;&lt;br&gt;
An agent takes action.&lt;/p&gt;

&lt;p&gt;That one difference changes the security model entirely.&lt;/p&gt;

&lt;p&gt;When an agent can call tools, search private documents, edit code, run commands, or trigger workflows, the risk is no longer limited to the answer it gives. It also extends to the action it takes.&lt;/p&gt;

&lt;p&gt;The discipline shifts from &lt;strong&gt;prompt safety&lt;/strong&gt; to &lt;strong&gt;execution safety&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The production question is no longer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Did the model answer safely?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What was the agent allowed to see, call, change, execute, and remember?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is where many agent systems become risky, because teams connect tools before defining the control layer around them.&lt;/p&gt;

&lt;p&gt;They add MCP servers, coding agents, repo access, browser tools, database access, and internal APIs, but do not always define clear rules for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool visibility&lt;/li&gt;
&lt;li&gt;Permissions&lt;/li&gt;
&lt;li&gt;Approval&lt;/li&gt;
&lt;li&gt;Logging&lt;/li&gt;
&lt;li&gt;Rollback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gap is what this article is about.&lt;/p&gt;

&lt;p&gt;The industry now treats this as its own discipline.&lt;/p&gt;

&lt;p&gt;In December 2025, the OWASP GenAI Security Project released the &lt;strong&gt;OWASP Top 10 for Agentic Applications 2026&lt;/strong&gt;, a peer-reviewed framework built with more than 100 contributors and organized around a new vocabulary of Agentic Security Issues: &lt;code&gt;ASI01&lt;/code&gt; through &lt;code&gt;ASI10&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Each risk below maps to its ASI category, because that taxonomy is fast becoming the shared language for this problem.&lt;/p&gt;
&lt;h2&gt;
  
  
  What Changed in 2026
&lt;/h2&gt;

&lt;p&gt;Agents are no longer experimental demos.&lt;/p&gt;

&lt;p&gt;Teams are running them inside real workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Coding&lt;/li&gt;
&lt;li&gt;Research&lt;/li&gt;
&lt;li&gt;Support&lt;/li&gt;
&lt;li&gt;Document processing&lt;/li&gt;
&lt;li&gt;Data operations&lt;/li&gt;
&lt;li&gt;Customer communication&lt;/li&gt;
&lt;li&gt;Internal automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That shift changes the security expectation.&lt;/p&gt;

&lt;p&gt;A demo agent fails safely because it has limited access.&lt;br&gt;&lt;br&gt;
A production agent fails with consequences because it may have access to repositories, customer data, internal APIs, and workflow triggers.&lt;/p&gt;

&lt;p&gt;LangChain’s 2026 State of Agent Engineering report, surveying more than 1,300 practitioners, shows the shift clearly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;57.3% of respondents already run agents in production&lt;/li&gt;
&lt;li&gt;Another 30.4% are actively developing agents with plans to deploy&lt;/li&gt;
&lt;li&gt;Nearly 89% have implemented observability&lt;/li&gt;
&lt;li&gt;Eval adoption sits at 52%&lt;/li&gt;
&lt;li&gt;Quality is the top production barrier&lt;/li&gt;
&lt;li&gt;For enterprises with 2,000 or more employees, security is the second-largest concern at 24.9%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The harder problem is not adoption.&lt;/p&gt;

&lt;p&gt;It is control.&lt;/p&gt;

&lt;p&gt;As agents move into production, teams have to answer a concrete set of questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which tools can the agent see?&lt;/li&gt;
&lt;li&gt;Which tools can it call?&lt;/li&gt;
&lt;li&gt;Which context can it read?&lt;/li&gt;
&lt;li&gt;Which actions require human approval?&lt;/li&gt;
&lt;li&gt;What gets logged?&lt;/li&gt;
&lt;li&gt;What happens if the agent is wrong?&lt;/li&gt;
&lt;li&gt;What happens if a tool response is malicious?&lt;/li&gt;
&lt;li&gt;What happens if the agent changes code, sends a message, or triggers a workflow?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not prompt-engineering questions.&lt;/p&gt;

&lt;p&gt;They are architecture questions.&lt;/p&gt;
&lt;h2&gt;
  
  
  MCP Makes Agents More Useful and More Sensitive
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol matters because it standardizes how AI applications connect to tools, data, and external systems.&lt;/p&gt;

&lt;p&gt;Without a common protocol, every application needs custom integrations.&lt;/p&gt;

&lt;p&gt;With MCP, tools and context become reusable across agents.&lt;/p&gt;

&lt;p&gt;OpenAI’s Agents SDK describes MCP as the “USB-C port for AI applications”: a standard way for models to connect to different data sources and tools.&lt;/p&gt;

&lt;p&gt;Standardization also increases responsibility.&lt;/p&gt;

&lt;p&gt;If tools become easier to connect, unsafe tools become easier to expose.&lt;/p&gt;

&lt;p&gt;If context becomes easier to pass, sensitive context becomes easier to leak.&lt;/p&gt;

&lt;p&gt;OpenAI’s MCP and connectors guide notes that remote MCP servers can be any public internet server implementing the protocol. These servers can let models access and control external services, and tool calls can be allowed automatically or restricted behind explicit developer approval.&lt;/p&gt;

&lt;p&gt;In production, MCP is not only an integration layer.&lt;/p&gt;

&lt;p&gt;It is a permission boundary.&lt;/p&gt;
&lt;h2&gt;
  
  
  The First Risk: Tool Access Without Permission Boundaries
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Maps to &lt;code&gt;ASI02: Tool Misuse &amp;amp; Exploitation&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The moment an agent can call tools, security moves from prompt design to permission design.&lt;/p&gt;

&lt;p&gt;A tool is not just a function.&lt;/p&gt;

&lt;p&gt;It is a capability.&lt;/p&gt;

&lt;p&gt;Some tools only read. Some modify state. Some send messages, create tickets, delete files, update databases, deploy code, or trigger workflows.&lt;/p&gt;

&lt;p&gt;They should not be treated equally.&lt;/p&gt;

&lt;p&gt;A production agent should not see every available tool by default. It should see only the tools needed for the current task, user, role, and environment.&lt;/p&gt;

&lt;p&gt;A safer design separates tools into categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read-only tools&lt;/li&gt;
&lt;li&gt;Write tools&lt;/li&gt;
&lt;li&gt;External communication tools&lt;/li&gt;
&lt;li&gt;Production-impacting tools&lt;/li&gt;
&lt;li&gt;Code execution tools&lt;/li&gt;
&lt;li&gt;Sensitive-data tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then it applies different controls to each.&lt;/p&gt;

&lt;p&gt;Read-only tools may run automatically.&lt;br&gt;&lt;br&gt;
Write tools may need approval.&lt;br&gt;&lt;br&gt;
Production-impacting tools should require explicit human confirmation.&lt;br&gt;&lt;br&gt;
Secret-access tools should usually be blocked entirely.&lt;/p&gt;

&lt;p&gt;This is exactly the failure OWASP catalogs as &lt;code&gt;ASI02&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It also shows up in the wild. In 2025, a Google AI agent following a chained instruction deleted a user’s entire Drive. The tool was legitimate and the permissions were granted, which is precisely why the damage was possible.&lt;/p&gt;

&lt;p&gt;The goal of scoping is not to slow every agent down.&lt;/p&gt;

&lt;p&gt;It is to prevent silent high-risk execution.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Second Risk: Remote MCP Servers and Trust
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Maps to &lt;code&gt;ASI04: Agentic Supply Chain Vulnerabilities&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Remote MCP servers are powerful because they expose useful capabilities from external systems.&lt;/p&gt;

&lt;p&gt;They are sensitive because they sit outside your application boundary.&lt;/p&gt;

&lt;p&gt;The question is not only:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can this tool solve the task?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do we trust this server with the data the agent may send to it?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;OpenAI’s guidance is blunt on this point: remote MCP servers are third-party services subject to their own terms. They can send and receive data, take action, and should be reviewed carefully. Developers should prefer official servers, review the data shared with third parties, and log that usage.&lt;/p&gt;

&lt;p&gt;The risk is not hypothetical.&lt;/p&gt;

&lt;p&gt;In January 2026, three prompt-injection vulnerabilities, &lt;code&gt;CVE-2025-68143&lt;/code&gt;, &lt;code&gt;CVE-2025-68144&lt;/code&gt;, and &lt;code&gt;CVE-2025-68145&lt;/code&gt;, were disclosed in Anthropic’s official Git MCP server. A malicious README or a poisoned issue description was enough to trigger code execution or data exfiltration.&lt;/p&gt;

&lt;p&gt;If an official server from a frontier lab can carry that risk, an unvetted third-party proxy carries far more.&lt;/p&gt;

&lt;p&gt;Before adopting a server, the questions worth asking are concrete:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who operates it?&lt;/li&gt;
&lt;li&gt;What data will it receive?&lt;/li&gt;
&lt;li&gt;Does it store or forward that data?&lt;/li&gt;
&lt;li&gt;Does it expose read or write capabilities?&lt;/li&gt;
&lt;li&gt;Can its tool behavior change over time?&lt;/li&gt;
&lt;li&gt;Is it official, self-hosted, or a third-party proxy?&lt;/li&gt;
&lt;li&gt;Are all calls logged?&lt;/li&gt;
&lt;li&gt;Is approval required for sensitive actions?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For internal systems, the safest default is a self-hosted or official server with clear authorization, logging, and data-retention expectations.&lt;/p&gt;

&lt;p&gt;An MCP server is not just a connector.&lt;/p&gt;

&lt;p&gt;It is a trust decision.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Third Risk: Tool Descriptions as Prompt Surface
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Maps to &lt;code&gt;ASI01: Agent Goal Hijack&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In traditional software, a function description is documentation.&lt;/p&gt;

&lt;p&gt;In an agent system, a tool description becomes part of the model’s operating context. That means tool metadata can influence behavior.&lt;/p&gt;

&lt;p&gt;If a malicious or compromised tool embeds hidden instructions in its description or output, the model may treat them as trusted context.&lt;/p&gt;

&lt;p&gt;This is not theoretical.&lt;/p&gt;

&lt;p&gt;In 2025, Invariant Labs disclosed MCP “tool-poisoning” attacks that hid malicious instructions inside tool descriptions visible to the model, but not to the user reviewing the tool list.&lt;/p&gt;

&lt;p&gt;OpenAI’s documentation echoes the warning: malicious MCP servers may include hidden instructions designed to make models behave unexpectedly, and server behavior can change between calls.&lt;/p&gt;

&lt;p&gt;OWASP files this kind of redirection under &lt;code&gt;ASI01: Agent Goal Hijack&lt;/code&gt;, where injected content silently changes what the agent is trying to do.&lt;/p&gt;

&lt;p&gt;So tool descriptions should not be treated as harmless text.&lt;/p&gt;

&lt;p&gt;A safe platform should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review tool descriptions before exposing them&lt;/li&gt;
&lt;li&gt;Keep descriptions short and purpose-specific&lt;/li&gt;
&lt;li&gt;Prevent third-party tools from injecting broad behavioral instructions&lt;/li&gt;
&lt;li&gt;Log which tool definitions were visible during a run&lt;/li&gt;
&lt;li&gt;Revalidate definitions whenever a server changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The larger the tool surface, the easier it is for the agent to pick the wrong capability or absorb the wrong instruction.&lt;/p&gt;

&lt;p&gt;This is one reason “more tools” does not mean a better agent.&lt;/p&gt;

&lt;p&gt;Sometimes it just means a larger attack surface.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Fourth Risk: Codebase Access Without Repo Guardrails
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Maps to &lt;code&gt;ASI05: Unexpected Code Execution&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Coding agents are useful because they read code, propose changes, update files, run tests, and assist with reviews.&lt;/p&gt;

&lt;p&gt;That also means they operate inside a sensitive engineering environment.&lt;/p&gt;

&lt;p&gt;A coding agent does not need production root access to create risk. Write access to the wrong files is enough.&lt;/p&gt;

&lt;p&gt;It can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Introduce insecure code&lt;/li&gt;
&lt;li&gt;Change dependencies&lt;/li&gt;
&lt;li&gt;Modify tests&lt;/li&gt;
&lt;li&gt;Expose secrets in logs or prompts&lt;/li&gt;
&lt;li&gt;Bypass conventions&lt;/li&gt;
&lt;li&gt;Produce code that looks correct but quietly weakens maintainability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instructions like &lt;code&gt;AGENTS.md&lt;/code&gt;, &lt;code&gt;CLAUDE.md&lt;/code&gt;, repo rules, and branch policies help, but they are not sufficient on their own.&lt;/p&gt;

&lt;p&gt;They are context, not enforcement.&lt;/p&gt;

&lt;p&gt;A practical setup pairs instruction with hard controls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run coding agents on branches, not directly on &lt;code&gt;main&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Require review before merge&lt;/li&gt;
&lt;li&gt;Block access to secret files&lt;/li&gt;
&lt;li&gt;Run tests and linters automatically&lt;/li&gt;
&lt;li&gt;Require explicit approval for dependency changes&lt;/li&gt;
&lt;li&gt;Log the files the agent reads and modifies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OWASP captures the worst case here as &lt;code&gt;ASI05&lt;/code&gt;, where agent-generated or agent-invoked code becomes an unintended execution path.&lt;/p&gt;

&lt;p&gt;The principle is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Coding agents should not only be instructed. They should be constrained.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  The Fifth Risk: Context and Memory Leakage
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Maps to &lt;code&gt;ASI06: Memory &amp;amp; Context Poisoning&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Context is one of the most important parts of AI system design.&lt;/p&gt;

&lt;p&gt;It is also a security boundary in two directions.&lt;/p&gt;
&lt;h3&gt;
  
  
  Outbound leakage
&lt;/h3&gt;

&lt;p&gt;A RAG system may retrieve internal documents, customer records, tickets, code, or emails that then flow into prompts, tool calls, and future actions.&lt;/p&gt;

&lt;p&gt;Leakage happens when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sensitive documents enter a prompt&lt;/li&gt;
&lt;li&gt;Retrieval returns documents outside the user’s scope&lt;/li&gt;
&lt;li&gt;The agent summarizes confidential data into a response&lt;/li&gt;
&lt;li&gt;Context from one session bleeds into another&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Inbound poisoning
&lt;/h3&gt;

&lt;p&gt;Inbound is the direction the EchoLeak and Gemini attacks exploited.&lt;/p&gt;

&lt;p&gt;In 2025, a researcher poisoned Gemini’s persistent memory through a malicious email. A follow-on attack through Gemini Enterprise’s Jira integration silently wiped a victim’s memory via a task description, earning a $15,000 bounty.&lt;/p&gt;

&lt;p&gt;OWASP classifies this as &lt;code&gt;ASI06&lt;/code&gt;: corrupting stored context so it biases future reasoning long after the initial interaction.&lt;/p&gt;

&lt;p&gt;Memory is a high-privilege write path, and it should be treated like one.&lt;/p&gt;

&lt;p&gt;The defense is &lt;strong&gt;least-context access&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Retrieve only what is needed. Filter context by user, role, workspace, and task. Keep raw secrets out of prompts.&lt;/p&gt;

&lt;p&gt;Scope memory by user and session. Expire sensitive entries. Record where each memory came from so poisoned entries can be found and removed.&lt;/p&gt;

&lt;p&gt;Redact sensitive fields before model calls and log context usage deliberately rather than dumping everything.&lt;/p&gt;

&lt;p&gt;The goal is not to starve the model of information.&lt;/p&gt;

&lt;p&gt;It is to keep useful context from becoming uncontrolled exposure.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Sixth Risk: No Audit Trail
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Maps to &lt;code&gt;ASI09&lt;/code&gt; and &lt;code&gt;ASI10: Human-Agent Trust Exploitation, Rogue Agents&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In a normal backend, logs are basic hygiene.&lt;/p&gt;

&lt;p&gt;In an agent system, they are even more important because behavior is probabilistic and multi-step.&lt;/p&gt;

&lt;p&gt;When an agent produces a bad result, you need to reconstruct what happened:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What the user asked&lt;/li&gt;
&lt;li&gt;What context was retrieved&lt;/li&gt;
&lt;li&gt;Which tools were visible&lt;/li&gt;
&lt;li&gt;Which tool the agent chose&lt;/li&gt;
&lt;li&gt;What input it sent&lt;/li&gt;
&lt;li&gt;What the tool returned&lt;/li&gt;
&lt;li&gt;Whether approval was required&lt;/li&gt;
&lt;li&gt;Whether approval was granted&lt;/li&gt;
&lt;li&gt;What the final response was&lt;/li&gt;
&lt;li&gt;What changed in the system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that trace, debugging is guesswork.&lt;/p&gt;

&lt;p&gt;Teams know this. According to LangChain, 89% have implemented some form of observability and 62% have detailed step-level tracing.&lt;/p&gt;

&lt;p&gt;But observability and control are not the same thing.&lt;/p&gt;

&lt;p&gt;The monitoring picture is thinner than the adoption numbers suggest.&lt;/p&gt;

&lt;p&gt;Gravitee’s State of AI Agent Security report found that only 3.9% of organizations actively monitor and secure more than 80% of their deployed agents, and 57.4% cite insufficient logging and audit trails as a primary security concern.&lt;/p&gt;

&lt;p&gt;Observability has to connect to evaluation, permissions, approvals, and incident review.&lt;/p&gt;

&lt;p&gt;Otherwise, you have dashboards but no recourse.&lt;/p&gt;

&lt;p&gt;A production agent needs a system of record:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User request&lt;/li&gt;
&lt;li&gt;Agent instructions&lt;/li&gt;
&lt;li&gt;Retrieved context&lt;/li&gt;
&lt;li&gt;Visible tools&lt;/li&gt;
&lt;li&gt;Selected tools&lt;/li&gt;
&lt;li&gt;Tool inputs and outputs&lt;/li&gt;
&lt;li&gt;Approval decisions&lt;/li&gt;
&lt;li&gt;Final response&lt;/li&gt;
&lt;li&gt;Errors and retries&lt;/li&gt;
&lt;li&gt;System changes&lt;/li&gt;
&lt;li&gt;Cost and latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you cannot replay what the agent saw, decided, called, and changed, you do not have a production-grade agent system.&lt;/p&gt;

&lt;p&gt;You have a demo with logs missing.&lt;/p&gt;
&lt;h2&gt;
  
  
  A Practical Agent Security Model
&lt;/h2&gt;

&lt;p&gt;A safer architecture puts control layers between the model and the systems it can affect.&lt;/p&gt;

&lt;p&gt;The model should not reach tools, files, APIs, or workflows directly. It should pass through a boundary that decides, gates, executes, and records.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User request
  → Agent runtime
  → Context filter           (least-context retrieval, redaction)
  → Tool permission layer    (visibility + scopes by user/role/task)
  → Human approval gate      (pauses risky actions)
  → Tool execution layer     (sandboxed where needed)
  → Audit log / trace store  (full replayable record)
  → Final response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Supporting components:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MCP server allowlist
Repo sandbox
Secrets boundary
Evaluation layer
Monitoring
Policy engine
Rollback path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important idea is separation of concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model reasons&lt;/li&gt;
&lt;li&gt;The permission layer decides what it can access&lt;/li&gt;
&lt;li&gt;The approval layer pauses risky actions&lt;/li&gt;
&lt;li&gt;The execution layer runs them safely&lt;/li&gt;
&lt;li&gt;The audit layer records everything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not put all of that responsibility inside the prompt.&lt;/p&gt;

&lt;p&gt;Prompts are not permission systems, and a model can be talked out of a system-message rule far more easily than out of a tool the surrounding code refuses to run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Risk Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Risk area&lt;/th&gt;
&lt;th&gt;OWASP mapping&lt;/th&gt;
&lt;th&gt;Core production control&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tool misuse&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ASI02&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tool scoping, permissions, approvals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remote MCP trust&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ASI04&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Allowlists, official/self-hosted servers, logging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool poisoning&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ASI01&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Review tool descriptions and outputs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code execution&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ASI05&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Repo sandbox, branch workflow, CI checks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context and memory leakage&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ASI06&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Least-context retrieval, redaction, memory scope&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No audit trail&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ASI09&lt;/code&gt;, &lt;code&gt;ASI10&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Full trace, replayability, approval logs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Permission Defaults That Hold Up
&lt;/h2&gt;

&lt;p&gt;A simple rule covers most cases:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Reads are cheap. Writes are not. Anything irreversible needs a human.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Default posture&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Read public docs&lt;/td&gt;
&lt;td&gt;Allow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read internal docs&lt;/td&gt;
&lt;td&gt;Allow, but scope by role and workspace&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search codebase&lt;/td&gt;
&lt;td&gt;Allow in sandboxed, read-only mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Modify code&lt;/td&gt;
&lt;td&gt;Require review and approval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Change dependencies&lt;/td&gt;
&lt;td&gt;Require explicit approval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trigger CI/CD&lt;/td&gt;
&lt;td&gt;Require approval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Call production API&lt;/td&gt;
&lt;td&gt;Require approval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Send external message&lt;/td&gt;
&lt;td&gt;Require approval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delete files&lt;/td&gt;
&lt;td&gt;Block or require explicit approval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execute shell commands&lt;/td&gt;
&lt;td&gt;Block or sandbox with approval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Access raw secrets&lt;/td&gt;
&lt;td&gt;Block&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Reading public docs can run freely.&lt;/p&gt;

&lt;p&gt;Reading internal docs or searching a codebase should be allowed but scoped by role and sandboxed.&lt;/p&gt;

&lt;p&gt;Modifying code, changing dependencies, triggering CI/CD, calling a production API, or sending an external message should all require approval, because each one can affect real users, environments, or supply chains.&lt;/p&gt;

&lt;p&gt;Deleting files and executing shell commands should be blocked or gated behind approval and a sandbox.&lt;/p&gt;

&lt;p&gt;Raw secrets should simply never be in the agent’s reach.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production Checklist Before Shipping an Agent
&lt;/h2&gt;

&lt;p&gt;Before shipping an agent into a real workflow, check the basics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Separate read-only tools from write tools&lt;/li&gt;
&lt;li&gt;Hide tools the agent does not need&lt;/li&gt;
&lt;li&gt;Use least-privilege scopes per tool and per user&lt;/li&gt;
&lt;li&gt;Never expose raw secrets to the agent&lt;/li&gt;
&lt;li&gt;Require approval for risky actions&lt;/li&gt;
&lt;li&gt;Store every approval decision for audit&lt;/li&gt;
&lt;li&gt;Treat tool descriptions as part of the prompt surface&lt;/li&gt;
&lt;li&gt;Review and allowlist trusted MCP servers&lt;/li&gt;
&lt;li&gt;Revalidate MCP server definitions when they change&lt;/li&gt;
&lt;li&gt;Filter retrieved context before it reaches the model&lt;/li&gt;
&lt;li&gt;Scope and expire memory&lt;/li&gt;
&lt;li&gt;Run coding agents in sandboxed environments&lt;/li&gt;
&lt;li&gt;Use branches, tests, and review before merge&lt;/li&gt;
&lt;li&gt;Log every tool call and result&lt;/li&gt;
&lt;li&gt;Confirm you can replay any run end to end&lt;/li&gt;
&lt;li&gt;Evaluate failure cases, not only happy paths&lt;/li&gt;
&lt;li&gt;Monitor repeated tool loops, cost, and latency&lt;/li&gt;
&lt;li&gt;Create rollback paths for agent-executed changes&lt;/li&gt;
&lt;li&gt;Periodically review data shared with remote MCP servers&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When Not to Use an Agent
&lt;/h2&gt;

&lt;p&gt;Not every workflow needs an agent.&lt;/p&gt;

&lt;p&gt;If the task is deterministic, a normal service is safer.&lt;/p&gt;

&lt;p&gt;If the operation is high-risk and rare, a human workflow is often better.&lt;/p&gt;

&lt;p&gt;If the data is too sensitive to expose, the agent should not see it.&lt;/p&gt;

&lt;p&gt;If the system cannot log and replay agent actions, it is not ready for production.&lt;/p&gt;

&lt;p&gt;And if rollback is not possible, automatic execution should be avoided.&lt;/p&gt;

&lt;p&gt;Agents earn their place when tasks need reasoning, flexible planning, tool use, and adaptation.&lt;/p&gt;

&lt;p&gt;They become a liability when they are used as a shortcut around proper system design.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Takeaway
&lt;/h2&gt;

&lt;p&gt;AI agent security in 2026 is not about safer prompts.&lt;/p&gt;

&lt;p&gt;It is about safer execution.&lt;/p&gt;

&lt;p&gt;The model is one part of the system. The real production risk is the surface around it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tools&lt;/li&gt;
&lt;li&gt;Code&lt;/li&gt;
&lt;li&gt;Context&lt;/li&gt;
&lt;li&gt;Memory&lt;/li&gt;
&lt;li&gt;MCP servers&lt;/li&gt;
&lt;li&gt;Approvals&lt;/li&gt;
&lt;li&gt;Logs&lt;/li&gt;
&lt;li&gt;Permissions&lt;/li&gt;
&lt;li&gt;Rollback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The safest agent systems will not be the ones with the longest prompts.&lt;/p&gt;

&lt;p&gt;They will be the ones with the clearest boundaries.&lt;/p&gt;

&lt;p&gt;If an agent can act, it needs controls.&lt;br&gt;&lt;br&gt;
If it can call tools, it needs permissions.&lt;br&gt;&lt;br&gt;
If it can touch code, it needs repo guardrails.&lt;br&gt;&lt;br&gt;
If it can use context, it needs filtering.&lt;br&gt;&lt;br&gt;
If it can affect real systems, it needs approval and audit logs.&lt;/p&gt;

&lt;p&gt;That is the real shift from chatbot safety to agent security.&lt;/p&gt;




&lt;p&gt;I write about production AI engineering: agents, RAG, MCP, coding copilots, evals, context engineering, security boundaries, and AI infra.&lt;/p&gt;

&lt;p&gt;Follow me if you want practical breakdowns beyond AI hype.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/" rel="noopener noreferrer"&gt;OWASP — Top 10 for Agentic Applications 2026&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;LangChain — State of Agent Engineering 2026&lt;/a&gt;&lt;br&gt;
&lt;a href="https://developers.openai.com/api/docs/guides/tools-connectors-mcp" rel="noopener noreferrer"&gt;OpenAI — MCP and Connectors guide&lt;/a&gt;&lt;br&gt;
&lt;a href="https://openai.github.io/openai-agents-python/mcp/" rel="noopener noreferrer"&gt;OpenAI Agents SDK — Model Context Protocol&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.gravitee.io/state-of-ai-agent-security" rel="noopener noreferrer"&gt;Gravitee — State of AI Agent Security&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>mcp</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
