MCP Gives Agents the Keys — Who's Watching the Door?
The Model Context Protocol (MCP) is rapidly becoming the standard way AI agents interact with external tools — databases, file systems, APIs, code repositories. Instead of hardcoding integrations, developers expose MCP servers that agents discover and call dynamically.
This is powerful. It's also a massive expansion of your attack surface.
MCP effectively gives an AI model the ability to read files, query databases, make HTTP requests, and execute code — all based on instructions it receives in its context window. If an attacker can influence that context, they can influence what the agent does with your tools.
Most MCP security guidance focuses on building secure servers. That's important, but it's only half the picture. If your team consumes third-party MCP servers — or even internal ones you didn't write — you need security at the point where traffic flows: the gateway.
The MCP Attack Surface
Before diving into defenses, let's map the threats. MCP introduces four categories of risk that didn't exist with traditional API calls:
Tool poisoning — A malicious MCP server can advertise tools with deceptive descriptions. The tool named read_file might actually exfiltrate data to an external endpoint. Since the model selects tools based on their descriptions, a poisoned description can redirect agent behavior without any visible change to the user.
Data exfiltration via tool outputs — An MCP server returns data to the model as tool results. If the server has access to sensitive systems (databases, internal APIs), it can surface PII, credentials, or proprietary data into the model's context — where it may leak into logs, responses, or downstream tool calls.
Prompt injection through tool descriptions — MCP tool descriptions are included in the model's system context. An attacker who controls a tool description can inject instructions that override the user's intent. This is indirect prompt injection applied to tool metadata.
Over-permissive server configurations — MCP servers often expose more capabilities than needed. A file system server might grant read/write access to the entire disk when the agent only needs one directory. There's no built-in permission model in MCP itself.
Why Server-Side Security Isn't Enough
OWASP published a practical guide for secure MCP server development that covers input validation, output sanitization, and least-privilege configurations. It's solid guidance — for server authors.
But here's the thing: most teams aren't writing their own MCP servers. They're consuming them. Community-built servers for GitHub, Slack, Jira, databases, and file systems are being plugged into agent workflows with minimal review. You're trusting that every server you connect:
- Validates its inputs correctly
- Doesn't leak sensitive data in tool results
- Has descriptions that accurately reflect behavior
- Doesn't phone home with your data
That's a lot of trust. And even for internal servers you do control, there's no centralized visibility into what's actually flowing through MCP connections at runtime.
This is the same problem that API gateways solved for microservices a decade ago: you need a chokepoint where you can inspect, log, and enforce policy on all traffic — regardless of what's on either end.
Gateway-Level MCP Security
An AI gateway sitting between your agent and its MCP servers can provide controls that neither the client nor the server can enforce alone:
Inspect tool call arguments — Before a tool call reaches the MCP server, the gateway can scan arguments for PII (names, emails, credit card numbers, API keys) and either redact them or block the call entirely. This prevents your agent from accidentally sending customer data to a third-party tool.
Audit all MCP traffic — Every tool call, every result, every error — logged with full context including the trace ID, the originating prompt, and the model's reasoning. This creates the audit trail that compliance teams need and that MCP doesn't provide natively.
Detect and block injection — If the gateway detects prompt injection patterns in tool descriptions or tool results, it can block the response before it reaches the model. This is the critical difference between logging an attack and preventing it.
Rate-limit tool calls — An agent stuck in a loop can burn through API quotas and rack up costs. Gateway-level rate limiting per tool, per server, or per trace prevents runaway agents from causing damage.
Enforce allowlists — Only permit tool calls to approved MCP servers. If a poisoned tool description tries to redirect the agent to an unauthorized endpoint, the gateway blocks it.
Using Evals to Understand MCP Tool Usage
Logging tells you what happened. Evals tell you whether it was the right thing.
When MCP traffic flows through your gateway, you can run LLM-as-a-judge evaluations on tool call patterns to answer questions that logs alone can't:
Which tools is the model actually calling? — Track tool call distribution across your MCP servers. If a model suddenly starts calling a tool it's never used before, that's worth investigating.
Are tool calls relevant to the user's request? — An eval can score whether each tool call was necessary and appropriate given the original prompt. A low relevance score might indicate the model is being manipulated via indirect injection or is simply confused.
Is the model leaking data across tool calls? — Evaluate whether sensitive information from one tool's output is being passed into another tool's input. This catches data exfiltration patterns that per-call inspection might miss.
Quality scoring for tool results — Not all MCP servers are equal. Eval scores on tool result quality help you identify servers that return noisy, incomplete, or misleading data — before your users notice.
Running evals on production MCP traffic turns your gateway from a passive observer into an active quality and security monitor.
Inspect, Audit, and Block — Not Just Log
Most observability tools treat MCP traffic as just another set of log entries. That's not enough when your agent has write access to production systems.
The security model for MCP needs three layers:
Layer 1: Real-time inspection — Every tool call is scanned in-flight. PII detection runs on arguments and results. Injection patterns are matched against tool descriptions and outputs. This happens synchronously, before the data reaches its destination.
Layer 2: Active blocking — When inspection finds a threat, the gateway doesn't just flag it — it blocks the call. The model receives an error response, the trace records the blocked call with the reason, and an alert fires. This is the difference between "we detected an injection attempt in our logs" and "we stopped an injection attempt before it executed."
Layer 3: Continuous evaluation — Evals run asynchronously on completed traces, catching patterns that real-time inspection can't — like gradually escalating privilege across a chain of tool calls, or a model being slowly steered toward a specific tool by repeated subtle injections.
// Example: MCP tool call flowing through a gateway
// The gateway inspects, logs, and can block at each step
const response = await fetch("https://proxy.grepture.com/v1/chat/completions", {
method: "POST",
headers: {
"Authorization": "Bearer gpt_your_key",
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "claude-sonnet-4-5-20250514",
messages: [
{ role: "user", content: "Summarize the Q1 sales report" }
],
tools: [
{
type: "function",
function: {
name: "read_document",
description: "Read a document from the company drive",
parameters: {
type: "object",
properties: {
path: { type: "string" }
}
}
}
}
]
}),
});
// The gateway:
// 1. Logs the full tool call chain in a trace
// 2. Scans tool arguments for PII/secrets before forwarding
// 3. Checks tool descriptions for injection patterns
// 4. Blocks the call if a threat is detected
// 5. Runs async evals on tool call relevance and data flow
How Grepture Helps
Grepture sits in the request path as an AI gateway — which means MCP tool calls that flow through your LLM API already pass through Grepture. Here's what you get out of the box:
Full trace visibility — Every tool call appears in the trace waterfall, showing the complete chain of tool invocations with timing, arguments, and results. You can see exactly what your agent did and in what order.
PII detection on tool traffic — Grepture's detection rules run on tool call arguments and results, catching sensitive data before it leaves your infrastructure or enters your model's context. Over 50 built-in PII patterns, plus custom rules.
Injection detection and blocking — Prompt injection detection applies to the full request context, including tool descriptions and results. When an injection is detected, Grepture can block the request and log the attempt.
Evals on tool call patterns — Run evaluators on your MCP traffic to score tool call relevance, detect anomalous patterns, and track quality over time. Custom eval prompts let you define domain-specific quality criteria for your agent's tool usage.
Cost and usage tracking — Track token usage and cost per trace, so you know exactly how much each MCP-powered workflow costs — including the overhead of tool call chains.
Key Takeaways
- MCP security doesn't stop at the server. If you consume MCP servers you didn't write, you need visibility and control at the gateway layer.
- Inspect and block, don't just log. Real-time PII scanning and injection detection on tool call traffic prevents attacks instead of documenting them.
- Evals add the "why" layer. Logging shows what tools were called; evals reveal whether those calls were appropriate, relevant, and safe.
- Treat MCP like any other API surface. Gateway-level controls (rate limiting, allowlists, audit trails) are the same patterns that secured microservices — applied to AI agent workflows.
- The August 2026 EU AI Act deadline makes this urgent. Article 14 requires human oversight of high-risk AI systems. An unmonitored agent with MCP tool access is the opposite of oversight.
Top comments (0)