Traditional observability tells you what broke. Securing MCP-enabled agentic AI requires understanding why the agent decided to act — and that requires a fundamentally different engineering approach.
Views and opinions are my own.
The reliability engineering community has spent decades building frameworks for understanding why systems fail. Error budgets. Blast radius analysis. Reversibility constraints. Safe degradation patterns.
None of these were designed with AI agents in mind.
And that gap is becoming one of the most important unsolved problems in production infrastructure.
What MCP Actually Is — and Why It Changes Everything
The Model Context Protocol (MCP) is the emerging standard that gives AI agents the ability to invoke tools, access data, and execute operations at machine speed. It is not simply an API integration layer.
MCP is a capability delegation framework. When your AI agent connects to an MCP server, it gains the authority to act on behalf of your systems — reading data, writing records, triggering workflows — with minimal human intervention between decisions.
That fundamental shift in what software can do autonomously is what makes MCP security categorically different from traditional application security.
The Failure Modes Traditional SRE Doesn't See
SRE practice is built around observable failure. A service goes down. Latency spikes. Error rates climb. Dashboards turn red. Alerts fire.
MCP introduces a class of failures that produce none of these signals:
Poisoned tool outputs — A malicious or compromised MCP server returns data designed to manipulate the agent's reasoning rather than serve its stated purpose. The agent doesn't throw an error. It simply makes different decisions — quietly, at machine speed, across every subsequent action in the workflow.
Rug pull attacks — An MCP tool's behavior, schema, or permissions change after your security review approved it. The tool still responds. Requests still succeed. But what the tool actually does has changed in ways your authorization model never accounted for.
Context contamination — In multi-server MCP deployments, data from an untrusted server can influence the agent's reasoning about a completely separate trusted system. There is no network boundary violation. No access control failure. The contamination happens at the semantic layer — inside the agent's context window.
These are not failures that observability platforms are built to detect. They don't produce stack traces. They don't increment error counters. They manifest as the agent making decisions that appear locally reasonable but are globally wrong.
What SRE Principles Actually Map To in MCP Security
The Cloud Security Alliance AI Safety Working Group is currently developing "The Six Pillars of MCP Security" — a framework I'm contributing to through research and writing focused specifically on the SRE and operational resilience angle.
Here's how the core SRE concepts translate directly into MCP security primitives:
Decision lineage instead of just logs
Traditional logging captures what happened — which service was called, what response was returned, what error was thrown. MCP security requires capturing why the agent decided to act — which tool was selected, which context influenced that selection, which prior tool output shaped the current reasoning step.
This is decision lineage: a tamper-evident record of the agent's reasoning pathway that makes it possible to reconstruct exactly how a sequence of actions came to occur. Without it, forensic investigation of an MCP security incident is essentially impossible.
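One way to make such a record tamper-evident is to hash-chain each reasoning step to the previous one, so any after-the-fact edit to the history is detectable. This is a minimal illustrative sketch, not a production design; the field names are assumptions:

```python
import hashlib
import json
import time

class DecisionLineage:
    """Append-only, hash-chained record of an agent's tool decisions.

    Each entry captures not just which tool was called, but the agent's
    stated reasoning and which prior outputs shaped the step. Entries are
    chained by SHA-256, so rewriting history breaks verification.
    """

    def __init__(self):
        self.entries = []

    def record(self, tool, reasoning, context_refs):
        prev_hash = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {
            "ts": time.time(),
            "tool": tool,                  # which tool the agent selected
            "reasoning": reasoning,        # the agent's stated reasoning
            "context_refs": context_refs,  # prior outputs that influenced this step
            "prev": prev_hash,
        }
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append({**body, "hash": digest})
        return digest

    def verify(self):
        # Recompute every hash; any edit to an earlier entry breaks the chain.
        prev = "genesis"
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if body["prev"] != prev or e["hash"] != recomputed:
                return False
            prev = e["hash"]
        return True
```

In a real deployment the chain would be anchored in an external append-only store, but even this shape shows the key property: forensics can replay not just what the agent did, but what it was reasoning from.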
Error budgets applied to unsafe autonomy
SRE error budgets define the acceptable threshold for unreliable behavior — the point at which reliability risk outweighs the cost of moving slower. The same concept applies directly to agent autonomy.
An agent operating within normal behavioral bounds earns the right to act autonomously. An agent whose tool invocation patterns, context window composition, or decision sequences drift outside established baselines should have its autonomy progressively constrained — moving toward human-in-the-loop confirmation for high-impact actions until normal patterns are restored.
This is error budgets applied not to uptime, but to trustworthiness.
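A sketch of what that might look like in code, with the same caveat SRE applies to error budgets: the thresholds below are illustrative placeholders, not recommended values. The idea is simply that the "budget" counts behavioral anomalies rather than failed requests, and spending it steps autonomy down:

```python
import time
from collections import deque

class AutonomyBudget:
    """SRE-style error budget applied to agent trustworthiness.

    Anomalies (tool-invocation drift, unusual context composition, odd
    decision sequences) are recorded as they are detected. The remaining
    budget in a sliding window determines how autonomously the agent may act.
    """

    def __init__(self, budget=5, window_s=3600):
        self.budget = budget      # anomalies tolerated per window
        self.window_s = window_s  # sliding-window length in seconds
        self.anomalies = deque()

    def record_anomaly(self, now=None):
        self.anomalies.append(now if now is not None else time.time())

    def spent(self, now=None):
        now = now if now is not None else time.time()
        # Drop anomalies that have aged out of the window.
        while self.anomalies and now - self.anomalies[0] > self.window_s:
            self.anomalies.popleft()
        return len(self.anomalies)

    def autonomy_level(self, now=None):
        spent = self.spent(now)
        if spent <= self.budget:
            return "autonomous"
        if spent <= 2 * self.budget:
            return "confirm_writes"  # human-in-the-loop for high-impact actions
        return "read_only"
```

Because the window slides, autonomy is restored automatically once anomalous behavior stops, which mirrors how an error budget replenishes.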
Safe degradation for agentic systems
When a microservice degrades, it fails gracefully — returning cached responses, shedding load, activating circuit breakers. When an MCP-enabled agent degrades, the equivalent is reducing its capability surface: restricting which tools it can invoke, requiring explicit approval for write operations, limiting the scope of context it can access.
Safe degradation for agentic systems means defining the progressive capability reduction path — from full autonomy to supervised operation to read-only mode to complete suspension — and automating the transitions based on observable behavioral signals.
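That path can be expressed as a simple ladder with automated transitions. The mode names and trigger signals here are illustrative assumptions; the point is that stepping down is automatic and stepping back up is deliberate, one level at a time:

```python
# Progressive capability-reduction ladder for an agent, from full autonomy
# down to complete suspension. Signal names are hypothetical examples of
# the behavioral signals a detection layer might emit.
DEGRADATION_PATH = ["full_autonomy", "supervised", "read_only", "suspended"]

def next_mode(current, signals):
    """Step down the ladder when a behavioral signal fires; step back up
    only one level at a time once baselines are restored."""
    i = DEGRADATION_PATH.index(current)
    if signals.get("tool_drift") or signals.get("context_anomaly"):
        return DEGRADATION_PATH[min(i + 1, len(DEGRADATION_PATH) - 1)]
    if signals.get("baseline_restored"):
        return DEGRADATION_PATH[max(i - 1, 0)]
    return current
```

The asymmetry is deliberate: degradation is cheap and fast, recovery is gradual, which is the same bias circuit breakers encode for microservices.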
The Observability Gap
The hardest part of this problem is not the controls. It's the detection.
Traditional observability tells you what broke. A request failed. A threshold was crossed. A dependency went down.
MCP security requires understanding why the agent made a particular decision — and that requires a fundamentally different instrumentation approach. You need to capture not just the inputs and outputs of each tool call, but the semantic context that surrounded it. What was in the agent's context window? What prior tool outputs influenced this decision? What was the agent's stated reasoning before it chose this action?
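As a rough sketch of that instrumentation approach, a tool call can be wrapped so that every invocation records the surrounding semantic context alongside its inputs and outputs. The `agent` object and its `context_window` and `last_reasoning` attributes are assumed interfaces for illustration, not part of any real MCP SDK:

```python
import functools

def instrument_tool(tool_fn, agent, sink):
    """Wrap a tool function so each call snapshots the semantic context
    that surrounded the agent's decision, not just inputs and outputs."""
    @functools.wraps(tool_fn)
    def wrapper(*args, **kwargs):
        record = {
            "tool": tool_fn.__name__,
            "args": args,
            "kwargs": kwargs,
            # What was in the agent's context window at decision time.
            "context_window": list(agent.context_window),
            # The agent's stated reasoning before it chose this action.
            "stated_reasoning": agent.last_reasoning,
        }
        result = tool_fn(*args, **kwargs)
        record["result"] = result
        sink.append(record)  # in production: a tamper-evident store
        return result
    return wrapper
```

This is the instrumentation gap in miniature: the wrapper is trivial, but deciding what counts as "the context that influenced this decision," and capturing it without prohibitive overhead, is the unsolved part.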
This is not a solved problem in the current observability tooling landscape. It is the gap that makes MCP security genuinely difficult — and genuinely important to get right before agentic AI is operating at scale in regulated production environments.
What This Means for Your Team Right Now
If your team is deploying AI agents that touch production infrastructure, the question isn't whether you need an MCP security strategy.
It's whether you're already operating with one without realizing it needs a formal name.
Start with three questions:
Can you reconstruct why your agent took a specific action? If not, you don't have decision lineage — and you can't do forensics on an MCP security incident.
Do you have behavioral baselines for your agents? If not, you can't detect drift — and context contamination and tool poisoning both manifest as behavioral drift before they manifest as anything else.
Do you have a defined capability reduction path? If your agent starts behaving outside expected parameters, what happens? If the answer is "we'd have to manually intervene," you don't have safe degradation — you have a manual kill switch, which is not the same thing.
These are solvable engineering problems. They require applying reliability engineering discipline to a new domain — which is exactly what SRE has always done.
I shared a shorter version of these ideas on LinkedIn (https://www.linkedin.com/posts/ajay-devineni_agenticai-mcp-aisecurity-activity-7446992069618913281-dnPv?utm_source=share&utm_medium=member_desktop&rcm=ACoAACIp55QBRGVmAcEbf0D-1PaR5vEbm2yMcJU). This research is part of my contribution to the Cloud Security Alliance AI Safety Working Group's Six Pillars of MCP Security framework.
What challenges are you seeing when bringing agentic AI safely into production? Are observability gaps or control gaps the bigger problem for your team?