DEV Community

Cover image for How to Secure AI Agents in Production: What MCP Gets Right (and What It Doesn’t)
Hadil Ben Abdallah
Hadil Ben Abdallah

Posted on

How to Secure AI Agents in Production: What MCP Gets Right (and What It Doesn’t)

It usually starts with something that feels harmless.

You give an AI agent access to a few tools. Maybe it can read internal tickets, check a database, and send Slack messages. You wire things up, test a few flows, and everything works.

Then someone asks a simple question:

“What stops this agent from doing something it shouldn’t?”

That’s where things get uncomfortable.


The “Lethal Trifecta” (Why This Gets Risky Fast)

There’s a concept from recent security research that’s been getting a lot of attention.

It’s sometimes called the “lethal trifecta.”

An AI agent becomes dangerous when it combines three capabilities:

  • Access to private data
  • Exposure to untrusted input
  • Ability to take external actions

Each of these is fine on its own.

Together, they’re a problem.

Imagine this:

Your agent reads internal support tickets.
It also processes external content, like GitHub issues.
And it can send messages to Slack.

Now someone posts a malicious prompt inside a public GitHub issue.

The agent reads it, follows the instructions, and sends sensitive internal data to an external channel.

No exploit. No broken auth. Just… the system doing exactly what it was allowed to do.

This isn’t theoretical; recent security research has already demonstrated variations of this in real systems.


Where MCP Fits (and Where It Doesn’t)

To be fair, the Model Context Protocol (MCP) solves a real problem.

It standardizes how agents talk to tools.

Instead of building custom integrations for every system, you get a consistent interface. That’s a big win for developer productivity.

But MCP was never meant to be a security framework.

It’s a protocol, not a control plane.

And that distinction matters a lot in production.

This is the part most teams miss: MCP standardizes communication, but the gateway layer is what actually enforces governance and security.

Diagram of a production AI agent architecture using an AI Gateway, MCP servers, guardrails, access control, governance, routing, and observability to secure enterprise AI agents across AWS, Azure, GCP, on-prem, and air-gapped deployments.

AI agent security architecture showing how an AI Gateway, MCP servers, guardrails, and governance layers work together to secure production AI agents across cloud and on-prem infrastructure (Adapted from the TrueFoundry website)

What MCP Deliberately Doesn’t Handle

Once you start looking closely, the gaps become obvious.

MCP defines how communication happens. It doesn’t define what should be allowed.

Here’s what it leaves to you:

No built-in authentication

There’s no default mechanism enforcing identity between agents and tools. You’re responsible for implementing and managing that layer yourself.

No access control model

By default, any agent can discover and call any registered tool. There’s no concept of scoped visibility unless you build it.

No observability

Direct MCP connections give you very little insight into what’s actually happening. You don’t get a clear trace of agent behavior across tools.

No guardrails

Tools execute with whatever permissions they have. MCP doesn’t inspect inputs or outputs for risky behavior.

None of this is a flaw. It’s a design choice.

But it means MCP alone is not enough once you move beyond demos.


The Real Threat Model for Agent Systems

Agent systems introduce risks that don’t exist in traditional APIs.

If you treat them the same way, you miss what actually matters.

1. Prompt injection via tool responses

This one catches teams off guard.

You secure your prompts. You validate inputs. Everything looks fine.

But the attack comes from the tool output.

A Jira ticket. A web page. A GitHub issue.

If that content contains instructions, the agent may follow them as if they were part of the original task.

That’s how data gets exfiltrated without breaking any rules.

2. Tool permission creep

This usually starts with good intentions.

“Let’s just give the agent access to everything it might need.”

A few weeks later, it has access to 40 or 50 tools.

Most of them aren’t used.

But every unused tool increases your blast radius.

You don’t get breached because of what you use.
You get breached because of what you forgot was there.

3. The sequence problem

Two actions can be safe individually and dangerous together.

  • Read internal data → safe
  • Send data externally → safe

Combine them:

  • Read internal data → send externally → not safe

Traditional systems struggle with this because they evaluate actions in isolation.

Agent systems execute sequences. That’s where the risk lives.

4. Shadow MCP servers

This one is more of an organizational issue.

Developers spin up their own MCP servers to move faster.

No review. No governance. No centralized visibility.

Now you have tools in your system that your security team doesn’t even know exist.

And agents can talk to them.

This is exactly where a gateway layer becomes necessary.

MCP defines how tools are called.
A gateway defines what is allowed, monitored, and enforced.

Without that layer, you’re relying on application logic for security, and that doesn’t scale.


What a Production-Ready Security Model Looks Like

Once you accept that MCP doesn’t handle security, the next question is:

What does a secure setup actually look like?

At a high level, you need a layer that enforces control, visibility, and policy across every tool interaction.

Let’s break down the key controls.

Least-privilege tool access

Agents shouldn’t discover tools and then get blocked.

They shouldn’t see tools they’re not allowed to use in the first place.

This is a subtle but important difference.

In practice, this means each agent interacts with a filtered view of the tool registry.

This is exactly how TrueFoundry implements least-privilege access in production, using Virtual MCP Servers to control what each agent can even see.

In production, secure agent systems usually expose a filtered tool registry instead of giving agents global visibility into every MCP server.

TrueFoundry MCP Gateway interface showing Virtual MCP Servers, GitHub MCP integration, Atlassian tools, and controlled AI agent access for secure enterprise MCP deployments.

Example of Virtual MCP Servers in TrueFoundry, where AI agents only see the tools and integrations they are explicitly authorized to access.

 

Per-agent RBAC

Not all agents are equal.

A compliance agent and a customer support agent should operate in completely different scopes.

That separation should be enforced at the infrastructure level, not buried inside application logic.

Otherwise, it becomes fragile and hard to audit.

In mature deployments, security policies are enforced centrally instead of being scattered across application code.

TrueFoundry AI Gateway dashboard showing rate limiting policies, per-team governance rules, model access controls, and centralized security enforcement for enterprise AI systems.

Centralized AI Gateway controls for enforcing per-team rate limits, model governance, and policy enforcement across production AI agents and MCP tools (source: TrueFoundry platform)

 

Guardrails on both paths

Most teams think about validating inputs.

Fewer think about validating outputs.

You need both.

  • Inspect inputs before they reach a tool (to prevent prompt injection)
  • Inspect outputs before they reach the agent (to prevent data exfiltration)

This creates a controlled boundary around every tool call.

Human-in-the-loop gates

Some actions shouldn’t be fully automated.

Deleting data. Sending external communications. Triggering financial operations.

For these, you need approval steps.

A secure system doesn’t assume agents are always right. It gives humans the ability to intervene when it matters.

Immutable audit trails

When something goes wrong, you need answers.

Not guesses.

You need to know:

  • Which agent made the call
  • Which tool it used
  • What parameters were passed
  • What the tool returned
  • What happened next

Without this, debugging becomes impossible and compliance becomes a nightmare.


Deployment: Where Does Your Data Actually Go?

This is the part that security teams care about immediately.

In many setups, requests flow through third-party infrastructure.

That means your data leaves your environment.

For some teams, that’s acceptable.

For many enterprises, it isn’t.

A different approach is to run everything inside your own infrastructure.

Platforms like TrueFoundry support deployment in your VPC, on-prem, or even air-gapped environments, so data never leaves your domain.

In practice, this translates into infrastructure that’s already running at a serious scale.

Enterprise AI agent deployment architecture with AI Gateway, MCP Gateway, guardrails, audit logs, RBAC, observability, and secure model routing inside a customer VPC or on-prem environment.

Enterprise AI agent security architecture showing AI Gateway, MCP Gateway, guardrails, RBAC, audit logging, and observability running entirely inside customer-controlled infrastructure.

 

TrueFoundry is recognized in the 2026 Gartner® Market Guide for AI Gateways and handles production-scale workloads, processing 10B+ requests per month while maintaining 350+ RPS on a single vCPU with sub-3ms latency.
It’s compliant with SOC 2, HIPAA, GDPR, ITAR, and the EU AI Act and is trusted by enterprises including Siemens Healthineers, NVIDIA, Resmed, and Automation Anywhere.


A Practical Security Checklist (Before You Ship)

If you’re moving agents to production, this is the checklist I’d actually use:

  • [ ] Are all tool interactions going through a centralized MCP gateway?
  • [ ] Does each agent only see the tools it’s allowed to use?
  • [ ] Are tool inputs and outputs inspected for risky behavior?
  • [ ] Do high-risk actions require human approval?
  • [ ] Can you trace every agent action end-to-end?
  • [ ] Is everything running inside your own infrastructure (not a third-party SaaS)?

If you answer “no” to more than one of these, you’re not production-ready yet.


The Real Takeaway

MCP is a solid foundation.

It makes tool integration cleaner, faster, and more consistent.

But it doesn’t make your system secure.

Security comes from the layer that controls everything around that interaction.

That’s the difference most teams miss.

They adopt MCP, see things working, and assume they’re done.

In reality, they’ve only solved the communication problem, not the control problem.

MCP standardizes communication.
The gateway standardizes control.


Final Thoughts

AI agents change how systems behave.

They don’t just respond to requests. They take actions, make decisions, and interact with multiple systems in sequence.

That’s powerful.

But it also means the risk model is different.

If you treat agents like simple APIs, you’ll miss the failure modes that actually matter.

The teams that get this right don’t just add tools; they add structure around how those tools are used.

If you’re starting to think seriously about security, that’s a good sign. It usually means your system is moving from demo to something real.

If you want to explore what a unified control plane for models, tools, and agents looks like in practice, you can try TrueFoundry free, no credit card required, and deploy it in your own cloud in under 10 minutes.


Thanks for reading! 🙏🏻
I hope you found this useful ✅
Please react and follow for more 😍
Made with 💙 by Hadil Ben Abdallah
LinkedIn GitHub Twitter

Top comments (0)