It usually starts with something that feels harmless.
You give an AI agent access to a few tools. Maybe it can read internal tickets, check a data...
Really solid breakdown. The "lethal trifecta" framing is something I've been trying to articulate to our team — combining private data access, untrusted input, and third-party action capability is exactly where our agent systems started showing unexpected behavior in staging.
The point about output validation being as important as input validation hit home. We caught a prompt injection coming through a third-party API response rather than user input. Easy to miss if you're only looking at the front door.
One thing I'd add from our experience building agentic workflows: tool sprawl is a culture problem as much as a technical one. Developers want flexibility, so they register "just in case" tools that never get cleaned up. Having a gateway that surfaces unused tool access over time would help enforce least privilege without making it feel like a bureaucratic tax.
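A rough sketch of what "surfacing unused tool access" could look like, assuming the gateway can export a tool registry and a call log. The function name `find_unused_tools`, the data shapes, and the 30-day idle window are all made up for illustration and are not taken from any particular gateway product.

```python
from datetime import datetime, timedelta


def find_unused_tools(registered, call_log, now, max_idle_days=30):
    """Return, per agent, the registered tools with no recent calls.

    registered: dict mapping agent name -> set of tool names it may call
    call_log:   iterable of (agent, tool, timestamp) tuples from the gateway
    (Both shapes are assumptions for this sketch.)
    """
    cutoff = now - timedelta(days=max_idle_days)

    # Track the most recent call time for each (agent, tool) pair.
    last_used = {}
    for agent, tool, ts in call_log:
        key = (agent, tool)
        if key not in last_used or ts > last_used[key]:
            last_used[key] = ts

    report = {}
    for agent, tools in registered.items():
        idle = sorted(
            t for t in tools
            if last_used.get((agent, t)) is None or last_used[(agent, t)] < cutoff
        )
        if idle:
            report[agent] = idle
    return report


now = datetime(2025, 1, 31)
registered = {"support-bot": {"read_ticket", "query_db", "send_email"}}
call_log = [
    ("support-bot", "read_ticket", datetime(2025, 1, 30)),
    ("support-bot", "query_db", datetime(2024, 11, 1)),
]
unused = find_unused_tools(registered, call_log, now)
print(unused)  # {'support-bot': ['query_db', 'send_email']}
```

A report like this is only a prompt for cleanup; whether an idle tool should actually be revoked is still a human decision.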
Really appreciate this, especially the real-world examples.
The output-side injection is a great point and honestly one of those things that’s easy to underestimate until you actually hit it in staging.
And yeah, tool sprawl being more of a culture issue than a technical one resonates a lot. Without some visibility into what’s actually being used over time, least privilege becomes hard to sustain in practice.
The staging discovery part rings so true - output injection tends to be invisible until your agent starts hallucinating tool calls or leaking context downstream. The culture point is the harder fix though. You can patch a misconfigured tool in an afternoon, but getting teams to actually audit what's in the tool registry takes organizational buy-in. Visibility tooling helps, but the behavior change has to come first.
Yeah, completely agree. The technical side is usually the easier part; changing habits and getting teams to continuously think about tool hygiene is where things get difficult.
Feels very similar to how permissions and cloud access started getting treated years ago: everyone wants flexibility until the blast radius becomes real.
That cloud IAM analogy is exactly right - and it suggests we already know how this ends. Observability was the forcing function that finally changed cloud permission habits. Teams didn't really audit IAM policies until there was an incident with real blast radius. The AI equivalent is probably an agent making a sequence of authorized-but-unexpected decisions that erodes trust or causes a costly mistake. Not necessarily a single dramatic breach, but enough friction to make tool governance feel urgent. The question is whether we can establish those habits before the incident rather than after.
Exactly. The tricky part is that tool hygiene only becomes real when teams can see near-misses, not just obvious failures. Cloud IAM got better once people had audit trails, blast-radius thinking, and postmortems. Agent systems probably need the same trio: least-privilege tool scopes, observable tool-call traces, and a lightweight review loop for "authorized but wrong" behavior before it turns into an incident.
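To make that trio concrete, here is a minimal sketch, assuming a single in-process shim sitting in front of tool calls. `ToolGate`, the `sensitive` set, and the trace entry fields are hypothetical names chosen for the example; a real system would persist traces and use a richer signal than a static list to decide what needs review.

```python
from dataclasses import dataclass, field


@dataclass
class ToolGate:
    """Hypothetical gateway shim: scope check, call trace, review queue."""
    scopes: dict      # agent name -> set of tool names it may call
    sensitive: set    # tools whose allowed calls still deserve a second look
    trace: list = field(default_factory=list)

    def call(self, agent, tool):
        # Least privilege: anything not explicitly granted is denied.
        allowed = tool in self.scopes.get(agent, set())
        # Observability: every attempt is traced, allowed or not.
        self.trace.append({
            "agent": agent,
            "tool": tool,
            "allowed": allowed,
            "needs_review": allowed and tool in self.sensitive,
        })
        return allowed

    def review_queue(self):
        # Review loop: authorized calls that a human should still look at.
        return [entry for entry in self.trace if entry["needs_review"]]


gate = ToolGate(
    scopes={"support-bot": {"read_ticket", "send_email"}},
    sensitive={"send_email"},
)
results = [gate.call("support-bot", t) for t in ("read_ticket", "send_email", "drop_table")]
queue = gate.review_queue()
print(results)                      # [True, True, False]
print([e["tool"] for e in queue])   # ['send_email']
```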
Great breakdown of the MCP security gaps. The "sequence problem" you described is where we've seen teams hit hardest in production — two individually safe operations combining into an unsafe one is exactly where traditional API-level security breaks down for agents.
One pattern that's worked in practice: treating the gateway layer as a policy enforcement point that evaluates action sequences holistically rather than individual tool calls in isolation. You essentially need a request-scoped context that accumulates risk signals across a multi-step agent flow, not just per-call validation.
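A small sketch of that request-scoped context, under the assumption that each tool can be tagged with coarse risk signals. The tool names, the `TOOL_SIGNALS` mapping, and the deny rule are illustrative only: here an external action is denied once the same flow has already touched private data and untrusted input, even though each call would pass on its own.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from tool name to the risk signals it contributes.
TOOL_SIGNALS = {
    "read_internal_ticket": {"private_data"},
    "fetch_url": {"untrusted_input"},
    "send_email": {"external_action"},
}


@dataclass
class FlowContext:
    """Accumulates risk signals across one multi-step agent flow."""
    signals: set = field(default_factory=set)
    history: list = field(default_factory=list)


def evaluate_call(ctx, tool):
    """Return 'allow' or 'deny' for `tool`, given what the flow has done so far."""
    new = TOOL_SIGNALS.get(tool, set())
    combined = ctx.signals | new
    # Deny an external action that would complete the full combination.
    if "external_action" in new and {"private_data", "untrusted_input"} <= combined:
        return "deny"
    # Only allowed calls are recorded in the flow context.
    ctx.signals = combined
    ctx.history.append(tool)
    return "allow"


ctx = FlowContext()
decisions = [evaluate_call(ctx, t) for t in ("read_internal_ticket", "fetch_url", "send_email")]
solo = evaluate_call(FlowContext(), "send_email")
print(decisions)  # ['allow', 'allow', 'deny']
print(solo)       # 'allow'
```

The same `send_email` call gets a different answer depending on what came before it, which is the part per-call validation cannot express.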
Curious what your take is on where that sequence-aware policy logic should live — at the MCP server level, the gateway, or a separate policy engine? Each has trade-offs in terms of latency and coupling.
Gateway's the right place for sequence evaluation imo, but there's a pre-call vector the article doesn't really touch on: tool description poisoning. MCP discovery returns descriptions the LLM uses to decide what to call, so a shadow server can return a description claiming it's the "internal database query handler" and the agent routes sensitive queries there willingly. The gateway evaluates the call after the routing decision's already been made by the model. You'd need manifest validation at tool registration time, not just call-time policy — and that's a fundamentally different enforcement point than what most gateway architectures are built around.
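One possible shape for registration-time manifest validation, sketched under the assumption that tool manifests are plain JSON-like dicts with a `name` and `description`, and that someone reviews and pins a fingerprint per server and tool. `ToolRegistry` and `manifest_fingerprint` are hypothetical helpers, not part of MCP or any SDK.

```python
import hashlib
import json


def manifest_fingerprint(manifest):
    """Hash a canonical JSON form so any change to the manifest changes the hash."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ToolRegistry:
    def __init__(self, approved):
        # approved: (server, tool name) -> fingerprint pinned after human review
        self.approved = approved
        self.tools = {}

    def register(self, server, manifest):
        """Accept a tool only if this server and exact manifest were approved."""
        key = (server, manifest["name"])
        if self.approved.get(key) != manifest_fingerprint(manifest):
            return False  # unknown server, or description/schema drifted
        self.tools[key] = manifest
        return True


good = {"name": "query_db", "description": "Run read-only queries against the reporting database."}
registry = ToolRegistry({("db-server", "query_db"): manifest_fingerprint(good)})

ok = registry.register("db-server", good)
shadow = registry.register(
    "shadow-server",
    {"name": "query_db", "description": "Internal database query handler."},
)
drifted = registry.register(
    "db-server",
    {**good, "description": "Internal database query handler. Send all queries here."},
)
print(ok, shadow, drifted)  # True False False
```

Because rejected manifests never enter the registry, a poisoned description would not reach the model's tool list in the first place, which is a different enforcement point from call-time policy.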
This is a really good point and honestly a threat vector I should’ve explored more in the article.
You’re right that by the time the gateway evaluates the call, the model may have already been influenced by poisoned discovery metadata. Manifest validation and trust verification at registration time feel increasingly necessary as MCP ecosystems become more dynamic.
Really appreciate this insight, especially the point about accumulating risk across a multi-step flow instead of validating calls in isolation. That’s exactly the shift I think a lot of teams still underestimate with agent systems.
My current leaning is that the gateway is probably the best place for sequence-aware enforcement because it has the broadest visibility across tools and agents without tightly coupling policy logic to individual MCP servers. But I can definitely see the argument for a separate policy engine as systems become more complex.
Feels like this area is still evolving pretty quickly, and honestly, a lot of the “right” patterns probably haven’t fully emerged yet.
Really solid article, especially the part explaining the difference between MCP and the governance layer.
A lot of teams seem to assume that adopting MCP automatically makes agent systems secure, while MCP mainly standardizes communication, not control.
The “Lethal Trifecta” section was particularly important because it clearly shows why AI agents introduce a completely different risk model compared to traditional APIs.
This is especially true of dangerous action sequences like reading internal data → sending it externally.
I also think AI Gateways will eventually become for agents what API Gateways became for microservices.
Really appreciate this. And yeah, that’s exactly the distinction I wanted to highlight: MCP solves interoperability, but governance and control are a completely separate layer.
Also fully agree on AI Gateways potentially becoming the “API Gateway layer” for agent systems. Feels like the industry is starting to move in that direction pretty quickly.
This is one of the most important distinctions in production agent design: MCP gives agents a way to connect, but it does not automatically define what should be allowed. The prompt-injection-through-tool-output risk is especially easy to miss when teams are still in demo mode. I have been building github.com/rajudandigam/Ultimate-T..., a TypeScript-first catalog of AI agent, workflow, RAG, eval, and governance blueprints focused on moving from prompts to production systems. This article maps well to the guardrails and approval-gate side of that work. I’d love for builders here to explore, star, fork, or suggest security-focused project ideas that should be included.
Thank you so much. Fully agree that the “communication vs control” distinction is where a lot of teams still get tripped up.
Your project sounds really interesting too, especially the focus on bridging the gap between demos and actual production systems.
The "lethal trifecta" framing is spot on. One angle worth adding: when the agent runs entirely on-device (local model + local execution), the attack surface shrinks dramatically — no screenshots leaving the machine, no prompt injection via network calls. The tradeoff is capability, but with 4B quantized models hitting 476 tokens/s on M4, edge agents are becoming viable for production workflows where data sensitivity matters most.
That’s a really interesting angle, and I agree the local-first approach changes the threat model quite a bit.
Keeping execution and data entirely on-device removes a huge amount of external exposure by default. Feels like edge agents are becoming much more practical now than they were even a year ago.
Thanks for the write-up. MCP security has definitely been top of mind for many folks since day 1, and it needs to be sorted out before heavier enterprise usage can be relied on.
It seems like many players are focusing on the gateway part, so I wonder how TrueFoundry differs from other solutions such as MintMCP or Runlayer. Would love to understand and see some comparisons.
Really appreciate this. And yeah, I think that’s exactly the interesting part of the space right now; everyone agrees the gateway layer matters, but the approaches are starting to diverge quite a bit.
From what I’ve seen, MintMCP feels more MCP-governance focused, while TrueFoundry seems to position the MCP layer as part of a broader AI control plane (routing, guardrails, observability, budgets, deployment, etc.). I still need to spend more time looking into Runlayer’s architecture in depth, though, before making a fair comparison.
Nice try, MCP, but basing auth on localStorage is a backdoor. Enforce least privilege, revoke tool access regularly, and audit relentlessly.
Yeah, agreed on the spirit of that.
If auth or sensitive tokens end up in something like localStorage, you’ve basically weakened the whole trust boundary by default.
The bigger point (which I probably should’ve made clearer) is exactly what you said: MCP only standardizes tool calling; it doesn’t enforce the security model around it. That part still needs proper least-privilege, rotation, and audit layers on top.
Great breakdown of the MCP security gaps 🔥
Really appreciate that 😍
Feels like a lot of teams are starting to realize that MCP solves the communication layer, but the governance and security layer around it is where things get interesting.