Hadil Ben Abdallah

Posted on May 11

How to Secure AI Agents in Production: What MCP Gets Right (and What It Doesn’t)

#ai #machinelearning #security #backend

The lethal trifecta of agent risk

It usually starts with something that feels harmless.

You give an AI agent access to a few tools. Maybe it can read internal tickets, check a database, and send Slack messages. You wire things up, test a few flows, and everything works.

Then someone asks a simple question:

“What stops this agent from doing something it shouldn’t?”

That’s where things get uncomfortable.

The “Lethal Trifecta” (Why This Gets Risky Fast)

There’s a concept from recent security research that’s been getting a lot of attention.

It’s sometimes called the “lethal trifecta.”

An AI agent becomes dangerous when it combines three capabilities:

Access to private data
Exposure to untrusted input
Ability to take external actions

Each of these is fine on its own.

Together, they’re a problem.

Imagine this:

Your agent reads internal support tickets.
It also processes external content, like GitHub issues.
And it can send messages to Slack.

Now someone posts a malicious prompt inside a public GitHub issue.

The agent reads it, follows the instructions, and sends sensitive internal data to an external channel.

No exploit. No broken auth. Just… the system doing exactly what it was allowed to do.

This isn’t theoretical; recent security research has already demonstrated variations of this in real systems.

Where MCP Fits (and Where It Doesn’t)

To be fair, the Model Context Protocol (MCP) solves a real problem.

It standardizes how agents talk to tools.

Instead of building custom integrations for every system, you get a consistent interface. That’s a big win for developer productivity.

But MCP was never meant to be a security framework.

It’s a protocol, not a control plane.

And that distinction matters a lot in production.

This is the part most teams miss: MCP standardizes communication, but the gateway layer is what actually enforces governance and security.

Diagram of a production AI agent architecture using an AI Gateway, MCP servers, guardrails, access control, governance, routing, and observability to secure enterprise AI agents across AWS, Azure, GCP, on-prem, and air-gapped deployments. — AI agent security architecture showing how an AI Gateway, MCP servers, guardrails, and governance layers work together to secure production AI agents across cloud and on-prem infrastructure (Adapted from the TrueFoundry website)

What MCP Deliberately Doesn’t Handle

Once you start looking closely, the gaps become obvious.

MCP defines how communication happens. It doesn’t define what should be allowed.

Here’s what it leaves to you:

No built-in authentication

There’s no default mechanism enforcing identity between agents and tools. You’re responsible for implementing and managing that layer yourself.

No access control model

By default, any agent can discover and call any registered tool. There’s no concept of scoped visibility unless you build it.

No observability

Direct MCP connections give you very little insight into what’s actually happening. You don’t get a clear trace of agent behavior across tools.

No guardrails

Tools execute with whatever permissions they have. MCP doesn’t inspect inputs or outputs for risky behavior.

None of this is a flaw. It’s a design choice.

But it means MCP alone is not enough once you move beyond demos.

The Real Threat Model for Agent Systems

Agent systems introduce risks that don’t exist in traditional APIs.

If you treat them the same way, you miss what actually matters.

1. Prompt injection via tool responses

This one catches teams off guard.

You secure your prompts. You validate inputs. Everything looks fine.

But the attack comes from the tool output.

A Jira ticket. A web page. A GitHub issue.

If that content contains instructions, the agent may follow them as if they were part of the original task.

That’s how data gets exfiltrated without breaking any rules.

2. Tool permission creep

This usually starts with good intentions.

“Let’s just give the agent access to everything it might need.”

A few weeks later, it has access to 40 or 50 tools.

Most of them aren’t used.

But every unused tool increases your blast radius.

You don’t get breached because of what you use.
You get breached because of what you forgot was there.

3. The sequence problem

Two actions can be safe individually and dangerous together.

Read internal data → safe
Send data externally → safe

Combine them:

Read internal data → send externally → not safe

Traditional systems struggle with this because they evaluate actions in isolation.

Agent systems execute sequences. That’s where the risk lives.

4. Shadow MCP servers

This one is more of an organizational issue.

Developers spin up their own MCP servers to move faster.

No review. No governance. No centralized visibility.

Now you have tools in your system that your security team doesn’t even know exist.

And agents can talk to them.

This is exactly where a gateway layer becomes necessary.

MCP defines how tools are called.
A gateway defines what is allowed, monitored, and enforced.

Without that layer, you’re relying on application logic for security, and that doesn’t scale.

What a Production-Ready Security Model Looks Like

Once you accept that MCP doesn’t handle security, the next question is:

What does a secure setup actually look like?

At a high level, you need a layer that enforces control, visibility, and policy across every tool interaction.

Let’s break down the key controls.

Least-privilege tool access

Agents shouldn’t discover tools and then get blocked.

They shouldn’t see tools they’re not allowed to use in the first place.

This is a subtle but important difference.

In practice, this means each agent interacts with a filtered view of the tool registry.

This is exactly how TrueFoundry implements least-privilege access in production, using Virtual MCP Servers to control what each agent can even see.

In production, secure agent systems usually expose a filtered tool registry instead of giving agents global visibility into every MCP server.

TrueFoundry MCP Gateway interface showing Virtual MCP Servers, GitHub MCP integration, Atlassian tools, and controlled AI agent access for secure enterprise MCP deployments. — Example of Virtual MCP Servers in TrueFoundry, where AI agents only see the tools and integrations they are explicitly authorized to access.

Per-agent RBAC

Not all agents are equal.

A compliance agent and a customer support agent should operate in completely different scopes.

That separation should be enforced at the infrastructure level, not buried inside application logic.

Otherwise, it becomes fragile and hard to audit.

In mature deployments, security policies are enforced centrally instead of being scattered across application code.

TrueFoundry AI Gateway dashboard showing rate limiting policies, per-team governance rules, model access controls, and centralized security enforcement for enterprise AI systems. — Centralized AI Gateway controls for enforcing per-team rate limits, model governance, and policy enforcement across production AI agents and MCP tools (source: TrueFoundry platform)

Guardrails on both paths

Most teams think about validating inputs.

Fewer think about validating outputs.

You need both.

Inspect inputs before they reach a tool (to prevent prompt injection)
Inspect outputs before they reach the agent (to prevent data exfiltration)

This creates a controlled boundary around every tool call.

Human-in-the-loop gates

Some actions shouldn’t be fully automated.

Deleting data. Sending external communications. Triggering financial operations.

For these, you need approval steps.

A secure system doesn’t assume agents are always right. It gives humans the ability to intervene when it matters.

Immutable audit trails

When something goes wrong, you need answers.

Not guesses.

You need to know:

Which agent made the call
Which tool it used
What parameters were passed
What the tool returned
What happened next

Without this, debugging becomes impossible and compliance becomes a nightmare.

Deployment: Where Does Your Data Actually Go?

This is the part that security teams care about immediately.

In many setups, requests flow through third-party infrastructure.

That means your data leaves your environment.

For some teams, that’s acceptable.

For many enterprises, it isn’t.

A different approach is to run everything inside your own infrastructure.

Platforms like TrueFoundry support deployment in your VPC, on-prem, or even air-gapped environments, so data never leaves your domain.

In practice, this translates into infrastructure that’s already running at a serious scale.

Enterprise AI agent deployment architecture with AI Gateway, MCP Gateway, guardrails, audit logs, RBAC, observability, and secure model routing inside a customer VPC or on-prem environment. — Enterprise AI agent security architecture showing AI Gateway, MCP Gateway, guardrails, RBAC, audit logging, and observability running entirely inside customer-controlled infrastructure.

TrueFoundry is recognized in the 2026 Gartner® Market Guide for AI Gateways and handles production-scale workloads, processing 10B+ requests per month while maintaining 350+ RPS on a single vCPU with sub-3ms latency.
It’s compliant with SOC 2, HIPAA, GDPR, ITAR, and the EU AI Act and is trusted by enterprises including Siemens Healthineers, NVIDIA, Resmed, and Automation Anywhere.

A Practical Security Checklist (Before You Ship)

If you’re moving agents to production, this is the checklist I’d actually use:

[ ] Are all tool interactions going through a centralized MCP gateway?
[ ] Does each agent only see the tools it’s allowed to use?
[ ] Are tool inputs and outputs inspected for risky behavior?
[ ] Do high-risk actions require human approval?
[ ] Can you trace every agent action end-to-end?
[ ] Is everything running inside your own infrastructure (not a third-party SaaS)?

If you answer “no” to more than one of these, you’re not production-ready yet.

The Real Takeaway

MCP is a solid foundation.

It makes tool integration cleaner, faster, and more consistent.

But it doesn’t make your system secure.

Security comes from the layer that controls everything around that interaction.

That’s the difference most teams miss.

They adopt MCP, see things working, and assume they’re done.

In reality, they’ve only solved the communication problem, not the control problem.

MCP standardizes communication.
The gateway standardizes control.

Final Thoughts

AI agents change how systems behave.

They don’t just respond to requests. They take actions, make decisions, and interact with multiple systems in sequence.

That’s powerful.

But it also means the risk model is different.

If you treat agents like simple APIs, you’ll miss the failure modes that actually matter.

The teams that get this right don’t just add tools; they add structure around how those tools are used.

If you’re starting to think seriously about security, that’s a good sign. It usually means your system is moving from demo to something real.

If you want to explore what a unified control plane for models, tools, and agents looks like in practice, you can try TrueFoundry free, no credit card required, and deploy it in your own cloud in under 10 minutes.

Thanks for reading! 🙏🏻 I hope you found this useful ✅ Please react and follow for more 😍 Made with 💙 by Hadil Ben Abdallah

Hadil Ben Abdallah

Software Engineer • Technical Writer (250K+ readers) I turn brands into websites people 💙 to use

Top comments (25)

Vic Chen • May 11

Really solid breakdown. The "lethal trifecta" framing is something I've been trying to articulate to our team — combining private data access, untrusted input, and third-party action capability is exactly where our agent systems started showing unexpected behavior in staging.

The point about output validation being as important as input validation hit home. We caught a prompt injection coming through a third-party API response rather than user input. Easy to miss if you're only looking at the front door.

One thing I'd add from our experience building agentic workflows: tool sprawl is a culture problem as much as a technical one. Developers want flexibility, so they register "just in case" tools that never get cleaned up. Having a gateway that surfaces unused tool access over time would help enforce least privilege without making it feel like a bureaucratic tax.

Hadil Ben Abdallah • May 11

Really appreciate this, especially the real-world examples.

The output-side injection is a great point and honestly one of those things that’s easy to underestimate until you actually hit it in staging.

And yeah, tool sprawl being more of a culture issue than a technical one resonates a lot. Without some visibility into what’s actually being used over time, least privilege becomes hard to sustain in practice.

Vic Chen • May 12

The staging discovery part rings so true - output injection tends to be invisible until your agent starts hallucinating tool calls or leaking context downstream. The culture point is the harder fix though. You can patch a misconfigured tool in an afternoon, but getting teams to actually audit what's in the tool registry takes organizational buy-in. Visibility tooling helps, but the behavior change has to come first.

Hadil Ben Abdallah • May 12

Yeah, completely agree. The technical side is usually the easier part; changing habits and getting teams to continuously think about tool hygiene is where things get difficult.

Feels very similar to how permissions and cloud access started getting treated years ago: everyone wants flexibility until the blast radius becomes real.

Vic Chen • May 13 • Edited

That cloud IAM analogy is exactly right - and it suggests we already know how this ends. Observability was the forcing function that finally changed cloud permission habits. Teams didn't really audit IAM policies until there was an incident with real blast radius. The AI equivalent is probably an agent making a sequence of authorized-but-unexpected decisions that erodes trust or causes a costly mistake. Not necessarily a single dramatic breach, but enough friction to make tool governance feel urgent. The question is whether we can establish those habits before the incident rather than after.

Vic Chen • May 16

Exactly. The tricky part is that tool hygiene only becomes real when teams can see near-misses, not just obvious failures. Cloud IAM got better once people had audit trails, blast-radius thinking, and postmortems. Agent systems probably need the same trio: least-privilege tool scopes, observable tool-call traces, and a lightweight review loop for "authorized but wrong" behavior before it turns into an incident.

Xidao • May 11

Great breakdown of the MCP security gaps. The "sequence problem" you described is where we've seen teams hit hardest in production — two individually safe operations combining into an unsafe one is exactly where traditional API-level security breaks down for agents.

One pattern that's worked in practice: treating the gateway layer as a policy enforcement point that evaluates action sequences holistically rather than individual tool calls in isolation. You essentially need a request-scoped context that accumulates risk signals across a multi-step agent flow, not just per-call validation.

Curious what your take is on where that sequence-aware policy logic should live — at the MCP server level, the gateway, or a separate policy engine? Each has trade-offs in terms of latency and coupling.

Rahul S • May 12

Gateway's the right place for sequence evaluation imo, but there's a pre-call vector the article doesn't really touch on: tool description poisoning. MCP discovery returns descriptions the LLM uses to decide what to call, so a shadow server can return a description claiming it's the "internal database query handler" and the agent routes sensitive queries there willingly. The gateway evaluates the call after the routing decision's already been made by the model. You'd need manifest validation at tool registration time, not just call-time policy — and that's a fundamentally different enforcement point than what most gateway architectures are built around.

Hadil Ben Abdallah • May 12

This is a really good point and honestly a threat vector I should’ve explored more in the article.

You’re right that by the time the gateway evaluates the call, the model may have already been influenced by poisoned discovery metadata. Manifest validation and trust verification at registration time feel increasingly necessary as MCP ecosystems become more dynamic.

Hadil Ben Abdallah • May 11

Really appreciate this insight, especially the point about accumulating risk across a multi-step flow instead of validating calls in isolation. That’s exactly the shift I think a lot of teams still underestimate with agent systems.

My current leaning is that the gateway is probably the best place for sequence-aware enforcement because it has the broadest visibility across tools and agents without tightly coupling policy logic to individual MCP servers. But I can definitely see the argument for a separate policy engine as systems become more complex.

Feels like this area is still evolving pretty quickly, and honestly, a lot of the “right” patterns probably haven’t fully emerged yet.

Mahdi Jazini • May 12

Really solid article, especially the part explaining the difference between MCP and the governance layer.

A lot of teams seem to assume that adopting MCP automatically makes agent systems secure, while MCP mainly standardizes communication, not control.

The “Lethal Trifecta” section was particularly important because it clearly shows why AI agents introduce a completely different risk model compared to traditional APIs.
Especially dangerous action sequences like:
reading internal data → sending externally.

I also think AI Gateways will eventually become for agents what API Gateways became for microservices.

Hadil Ben Abdallah • May 12

Really appreciate this. And yeah, that’s exactly the distinction I wanted to highlight: MCP solves interoperability, but governance and control are a completely separate layer.

Also fully agree on AI Gateways potentially becoming the “API Gateway layer” for agent systems. Feels like the industry is starting to move in that direction pretty quickly.

Raju Dandigam • May 15

This is one of the most important distinctions in production agent design: MCP gives agents a way to connect, but it does not automatically define what should be allowed. The prompt-injection-through-tool-output risk is especially easy to miss when teams are still in demo mode. I have been building github.com/rajudandigam/Ultimate-T..., a TypeScript-first catalog of AI agent, workflow, RAG, eval, and governance blueprints focused on moving from prompts to production systems. This article maps well to the guardrails and approval-gate side of that work. I’d love for builders here to explore, star, fork, or suggest security-focused project ideas that should be included.

Hadil Ben Abdallah • May 16

Thank you so much. Fully agree that the “communication vs control” distinction is where a lot of teams still get tripped up.
Your project sounds really interesting too, especially the focus on bridging the gap between demos and actual production systems.

Mininglamp • May 12

The "lethal trifecta" framing is spot on. One angle worth adding: when the agent runs entirely on-device (local model + local execution), the attack surface shrinks dramatically — no screenshots leaving the machine, no prompt injection via network calls. The tradeoff is capability, but with 4B quantized models hitting 476 tokens/s on M4, edge agents are becoming viable for production workflows where data sensitivity matters most.

Hadil Ben Abdallah • May 12

That’s a really interesting angle, and I agree the local-first approach changes the threat model quite a bit.

Keeping execution and data entirely on-device removes a huge amount of external exposure by default. Feels like edge agents are becoming much more practical now than they were even a year ago.

Gavin Lin • May 13

Thanks for the write up and definitely MCP security has been top of mind for many folks since day 1 before heavier enterprise usage can be relied on.

It seems like many players are focusing on the Gateway part, so i wonder how TrueFoundry differs from other solutions such as MintMCP or Runlayer? Would love to understand and see some comparisons

Hadil Ben Abdallah • May 16

Really appreciate this. And yeah, I think that’s exactly the interesting part of the space right now; everyone agrees the gateway layer matters, but the approaches are starting to diverge quite a bit.

From what I’ve seen, MintMCP feels more MCP-governance focused, while TrueFoundry seems to position the MCP layer as part of a broader AI control plane (routing, guardrails, observability, budgets, deployment, etc.). I still need to spend more time looking into Runlayer’s architecture in depth, though, before making a fair comparison.

Yunetzi • May 11

Nice try, MCP, but basing auth on localStorage is a backdoor. Enforce least privilege, revoke tool access regularly, and audit relentlessly.

Hadil Ben Abdallah • May 11

Yeah, agreed on the spirit of that.
If auth or sensitive tokens end up in something like localStorage, you’ve basically weakened the whole trust boundary by default.

The bigger point (which I probably should’ve made clearer) is exactly what you said: MCP only standardizes tool calling; it doesn’t enforce the security model around it. That part still needs proper least-privilege, rotation, and audit layers on top.

Ben Abdallah Hanadi • May 12

Great breakdown of the MCP security gaps 🔥

Hadil Ben Abdallah • May 12

Really appreciate that 😍

Feels like a lot of teams are starting to realize that MCP solves the communication layer, but the governance and security layer around it is where things get interesting.

View full discussion (25 comments)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.

The “Lethal Trifecta” (Why This Gets Risky Fast)

Where MCP Fits (and Where It Doesn’t)

What MCP Deliberately Doesn’t Handle

No built-in authentication

No access control model

No observability

No guardrails

The Real Threat Model for Agent Systems

1. Prompt injection via tool responses

2. Tool permission creep

3. The sequence problem

4. Shadow MCP servers

What a Production-Ready Security Model Looks Like

Least-privilege tool access

Per-agent RBAC

Guardrails on both paths

Human-in-the-loop gates

Immutable audit trails

Deployment: Where Does Your Data Actually Go?

A Practical Security Checklist (Before You Ship)

The Real Takeaway

Final Thoughts

Hadil Ben AbdallahFollow

Hadil Ben Abdallah