Athreya aka Maneshwar

What Building with MCP Taught Me About Its Biggest Gap

Missing security and audit controls

I spent the last few weeks wiring up MCP at my org, stitching a handful of internal tools (GitHub, Slack, Datadog) into a shared layer that multiple teams' AI agents could call into.

Useful. Powerful. And, about a week in, slightly alarming.

The same four or five "wait, doesn't MCP handle this?" questions kept coming up. Who's allowed to call this tool? What happens if a tool returns 50MB of data? Where are we logging any of this? How do I give Team A read-only access when Team B needs write?

Turns out: MCP doesn't handle any of it. Not because it's broken, but because that's not what it's for.

MCP standardizes how agents talk to tools. It says nothing about who gets to, how much they can pull, or whether anyone's keeping receipts.

I can't drop my org's internal code into a blog post, so I rebuilt the same shape of problem in a tiny public repo.

Three MCP servers, one Gemini-driven agent, one minimal gateway, all runnable in five minutes.

So, MCP. What is it, again?

A thirty-second version of MCP, straight from the official docs:

MCP (Model Context Protocol) is an open-source standard for connecting AI applications to external systems. Think of it like a USB-C port for AI applications — a standardized way to plug data sources, tools, and workflows into Claude, ChatGPT, or whatever model you're wiring up.

modelcontextprotocol.io

The mental model that finally made it click for me: MCP standardizes the plug, not the power grid.

Your agent speaks MCP.

Your tools (GitHub, Slack, Datadog, your database) speak MCP.

They meet in the middle and everything Just Works.

Well. Almost everything.

The demo: one agent, three MCP servers, a Gemini brain

To make this concrete, I built the smallest possible setup, a repo anyone can clone and run in five minutes:

  • A GitHub MCP server that exposes get_readme, get_latest_commit, get_repo_files
  • A Slack MCP server that exposes send_message
  • A SQLite MCP server that exposes log_event, get_logs
  • A Gemini-driven agent that picks a tool, calls it, summarizes the result, and posts to Slack
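Each server is intentionally boring: an HTTP endpoint that maps a namespaced tool name to a handler and answers with { output, error }. As a rough sketch of that shape (simplified; the port number and the better-sqlite3 choice are illustrative, not necessarily what's in the repo), the SQLite server boils down to something like this:

```js
// Sketch of the SQLite tool server: one endpoint, { tool, args } in, { output, error } out.
// The port and storage library are illustrative choices for this sketch.
import express from "express";
import Database from "better-sqlite3";

const db = new Database("audit.db");
db.exec("CREATE TABLE IF NOT EXISTS events (ts TEXT, tool TEXT, detail TEXT)");

const app = express();
app.use(express.json());

app.post("/mcp", (req, res) => {
  const { tool, args = {} } = req.body;
  try {
    if (tool === "db.log_event") {
      db.prepare("INSERT INTO events VALUES (?, ?, ?)").run(
        new Date().toISOString(),
        String(args.tool ?? "unknown"),
        JSON.stringify(args.detail ?? null)
      );
      return res.json({ output: { ok: true }, error: null });
    }
    if (tool === "db.get_logs") {
      const rows = db.prepare("SELECT * FROM events ORDER BY ts DESC LIMIT 50").all();
      return res.json({ output: rows, error: null });
    }
    return res.status(400).json({ output: null, error: { message: `Unknown tool: ${tool}` } });
  } catch (err) {
    return res.status(500).json({ output: null, error: { message: err.message } });
  }
});

app.listen(4003, () => console.log("[db] SQLite tool server listening on :4003"));
```

The GitHub and Slack servers follow the same pattern, just with different tool names and a real API behind each handler.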

Five processes, one loop. Here's what that actually looks like on screen:

All five terminals running together

Point the agent at a repo and off it goes:

taco@TCSIND-4shZvZXk:~/mcp$ node agent/agent.js
[agent] starting one-loop run
[agent] chosen tool: github.get_readme
[agent] summary: Ragfolio is an AI-powered portfolio template that uses RAG to answer professional questions based on your resume data. It is built with a modern stack including React, FastAPI, and Google Gemini for high-performance retrieval and generation.
[agent] demo loop complete

Try a bigger, more famous repo? Same agent, no code change:

taco@TCSIND-4shZvZXk:~/mcp$ node agent/agent.js https://github.com/vercel/next.js
[agent] starting one-loop run
[agent] target repo: https://github.com/vercel/next.js
[agent] objective: Summarize the project highlights for a dev audience.
[agent] chosen tool: github.get_readme
[agent] summary: Next.js is a full-stack React framework designed for building high-performance web applications with integrated Rust-based tooling. It extends the latest React features while providing optimized build processes and a robust ecosystem for enterprise-scale development.
[agent] demo loop complete

Slack confirms the summaries landed:

Slack showing ragfolio and Next.js summaries

Everything works. High-fives all around. I'm ready to ship this to teams.


And then I actually think about what I just built.

What MCP quietly does not give you

MCP is a protocol. That's wonderful and that's also exactly the problem. Out of the box, vanilla MCP has:

  • No auth. Anyone who can reach port 4001 can call every tool on the GitHub server. In prod, that's a problem.
  • No RBAC. Every caller gets every tool, or no tools.
  • No audit. Unless you add logging to every server, by hand, there is no record of who called what.
  • No guardrails on outputs. If a tool returns a 2MB README, your agent happily eats 2MB of its context window. If a tool's output smuggles in something like rm -rf /, nothing in the protocol stops your agent from acting on it either.
  • No shared policy layer. Every team ends up copy-pasting the same "validate tool name, wrap in { output, error }" boilerplate, each with its own subtly different bugs.

This is not a knock on MCP.

USB-C also doesn't come with a surge protector.

Those are separate products for good reasons.

But if you're running agents in an environment where the blast radius of "oops" is meaningful, you need that separate product.

The obvious place for that product to live? A gateway, sitting between every agent and every MCP server.

Putting a tiny gateway in front of everything

In my demo repo, the gateway is a single 90-line Express file (gateway/gateway.js). It does three things. Together, they cover 80% of the complaints above.

1. An allowlist — capability control in one Set

Every tool call is namespaced (github.get_readme, slack.send_message, db.log_event).

The allowlist is quite literally a JS Set:

const TOOL_ALLOWLIST = new Set([
  TOOL_NAMES.GITHUB_GET_README,
  TOOL_NAMES.SLACK_SEND_MESSAGE,
  TOOL_NAMES.DB_LOG_EVENT
]);

If an agent (or an agent that's been prompt-injected into misbehaving) tries to call github.delete_repo, the call never reaches the GitHub server.

The gateway refuses in three lines, logs the attempt, and sends back a polite error.
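In gateway terms, that refusal is just an early return at the top of the request handler. Spelled out (a simplified sketch; the logEvent helper name is illustrative, not necessarily what gateway/gateway.js calls it):

```js
// Allowlist check at the front of the gateway's /mcp handler (sketch).
app.post("/mcp", async (req, res, next) => {
  const { tool } = req.body;

  if (!TOOL_ALLOWLIST.has(tool)) {
    // Record the rejected attempt, best-effort.
    logEvent({ tool, outcome: "allowlist_rejected" }).catch(() => {});
    return res.status(403).json({
      output: null,
      error: { message: `Tool not allowed: ${tool}`, details: null }
    });
  }

  next(); // Allowed: fall through to routing and guardrails.
});
```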

Notice what this isn't: a prompt that says "please don't call delete_repo."

Prompts are suggestions.

Allowlists are rules.

2. A guardrail — the content contract

Some tools return unbounded blobs.

READMEs in particular love to be 40KB of badges and marketing copy.

The gateway has a hardcoded cap:

if (
  tool === TOOL_NAMES.GITHUB_GET_README &&
  serverResponse.output.content.length > 5000
) {
  return { output: null, error: createError("README content exceeded 5000 characters") };
}

Here's that guardrail earning its keep on a deliberately gnarly repo:

taco@TCSIND-4shZvZXk:~/mcp$ node agent/agent.js https://github.com/juice-shop/juice-shop 
[agent] starting one-loop run
[agent] target repo: https://github.com/juice-shop/juice-shop
[agent] objective: Summarize the project highlights for a dev audience.
[agent] chosen tool: github.get_readme
[agent] first tool call failed: {
  message: 'Guardrail blocked response: README content exceeded 5000 characters',
  details: null
}

Juice Shop's README is enormous.

Without the guardrail, my agent would've burned half its context on emoji-laden marketing.

With the guardrail, the agent got a clean "nope, try something else" and my context window stayed intact.

Gandalf "you shall not pass" meme

3. Logging — audit trail for free

Every single call through the gateway (success, failure, allowlist rejection, guardrail block) gets recorded to SQLite via the db.log_event tool.

Best-effort, fire-and-forget, one await in the middleware.
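In code, "best-effort" means the audit write can never take a tool call down with it; failures get swallowed on purpose. A sketch (the URL constant and field names are illustrative):

```js
// Fire-and-forget audit write: if the logging server is down, tool calls still go through.
const DB_SERVER_URL = "http://localhost:4003/mcp"; // illustrative address for the SQLite server

async function logEvent(entry) {
  try {
    await fetch(DB_SERVER_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        tool: "db.log_event",
        args: { ...entry, ts: new Date().toISOString() }
      })
    });
  } catch {
    // Audit is best-effort by design; never block the main request on it.
  }
}
```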

Now when someone asks "what did the agent do yesterday?" the answer is a query, not a shrug.

That's it.

That's the whole governance layer.

An allowlist, a guardrail, a log: roughly 200 lines of Node, no framework, readable in a single sitting.

But a toy gateway is still a toy

Here's where I have to be honest with myself.

My gateway works for the demo.

It would not survive contact with a real organization.

  • The allowlist is one Set shared by everyone. No per-team, per-agent, per-use-case scoping.
  • Guardrails are hardcoded conditionals. Adding a new one means a code change and a redeploy.
  • Authentication is nonexistent. Anyone who can curl :3000/mcp is an agent now.
  • Routing is three localhost URLs in a map. No service discovery, no health checks, no retries.
  • Adding a new tool means editing three files.

Solving each of those is a weekend project.

Solving all of them, operating them, and keeping them maintained as tools come and go across teams: that's a platform team's full-time job.

The feature I wish I'd built first: the Virtual MCP Server

While researching what a grown-up version of this gateway looks like, I came across TrueFoundry's MCP Gateway and specifically their concept of a Virtual MCP Server.

It's one of those ideas that's obvious in retrospect and I'm mildly annoyed I didn't think of it first.

Expanding-brain / galaxy-brain meme

The idea:

You have a bunch of real MCP servers, each exposing lots of tools.

Some tools are safe.

Some are dangerous.

Some are fine for one team and a footgun for another.

Rather than giving teams access to whole servers, you compose a Virtual MCP Server: a curated, named collection of tools pulled from whichever upstream servers you want.

Concretely:

  • Your doc-summary-bot Virtual MCP Server exposes just github.get_readme and slack.send_message. That's the full surface area.
  • Your release-bot Virtual MCP Server exposes github.create_release, github.tag_commit, and slack.send_message — but not github.delete_repo, even though the upstream GitHub server technically supports it.

No new deployments.

The virtual server is just configuration on the gateway, and each one gets its own allowlist, its own guardrails, its own auth scope.
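I'm not reproducing TrueFoundry's actual config syntax here, but the shape of the idea is easy to sketch: a named bundle of tools with its own policy attached. Everything below is illustrative, not their real schema:

```js
// Illustrative only: the shape of a virtual-server config, not TrueFoundry's real format.
const virtualServers = {
  "doc-summary-bot": {
    tools: ["github.get_readme", "slack.send_message"],
    guardrails: { maxOutputChars: 5000 },
    auth: { serviceTokens: ["svc-doc-summary"] } // hypothetical token scope
  },
  "release-bot": {
    tools: ["github.create_release", "github.tag_commit", "slack.send_message"],
    // github.delete_repo exists on the upstream server, but because it isn't
    // listed here, this virtual server simply doesn't have it.
    guardrails: { requireApprovalOverDiffLines: 10000 },
    auth: { serviceTokens: ["svc-release"] }
  }
};
```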

This matters because of the failure mode it quietly prevents.

Here's a solid demo video explaining Virtual MCP Server.

What this looks like in a real workflow

Let me walk through the kind of agent I'd actually want to run in production: a compliance automation bot, operating entirely through a TrueFoundry MCP Gateway endpoint:

  1. A PR merges to main. A webhook wakes the agent.
  2. The agent calls github.get_diff via its Virtual MCP Server. Authenticated, not with a bare PAT pasted into an env var, but with a service token the gateway issued and can rotate.
  3. The diff comes back. The gateway's guardrail notices it's 12,000 lines, well over the "unsupervised review" threshold, and pauses the run, requesting human approval before continuing. (Try getting that out of a lone MCP server.)
  4. A reviewer approves. The agent writes the diff plus metadata to MongoDB via db.store_diff.
  5. It opens a Jira ticket via jira.create_issue, linking back to the diff.
  6. It posts a summary to Slack via slack.send_message.

Six tool calls.

Four different upstream MCP servers.

One endpoint. Every call authenticated. Every call logged to the audit trail.

The single dangerous tool the agent isn't supposed to touch (even if a prompt injection convinces it to try) isn't prompted against.

It's simply not in the Virtual MCP Server, so calling it is a 404, not a judgment call.
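From the agent's side, none of that machinery is visible: the whole run is a handful of calls against one endpoint with one token. Roughly (the gateway URL, token plumbing, and response fields below are hypothetical, not TrueFoundry's actual API):

```js
// Hypothetical agent-side view of the compliance-bot run: one endpoint, one rotatable token.
const GATEWAY = "https://mcp-gateway.internal/virtual/compliance-bot"; // illustrative URL
const TOKEN = process.env.GATEWAY_SERVICE_TOKEN; // issued and rotated by the gateway, not a raw PAT

async function call(tool, args) {
  const res = await fetch(GATEWAY, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${TOKEN}` },
    body: JSON.stringify({ tool, args })
  });
  // A tool that isn't in this virtual server (github.delete_repo) surfaces here as a 404.
  if (!res.ok) throw new Error(`${tool} failed with ${res.status}`);
  return res.json();
}

async function runComplianceBot(prNumber) {
  const diff = await call("github.get_diff", { pr: prNumber }); // gateway may pause here for human approval
  await call("db.store_diff", { pr: prNumber, diff: diff.output });
  const ticket = await call("jira.create_issue", { summary: `Compliance review for PR #${prNumber}` });
  await call("slack.send_message", {
    channel: "#compliance",
    text: `Filed ${ticket.output.key} for PR #${prNumber}`
  });
}
```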

That, to me, is the jump from protocol to platform.

Wrapping up

MCP gave us a clean, shared language for AI agents to talk to tools.

That's a big deal, and it's easy to underrate how much of a pain this was before MCP existed.

But a shared language isn't a shared policy.

If you're running more than one agent, or letting more than one team build agents, you will need the thing that sits between "call the tool" and "did we mean to let it call the tool."

That thing is a gateway.

You can build a toy version in an afternoon.

My demo repo is proof.

But for anything real (auth, RBAC, audit, per-scope capability boundaries, and the Virtual MCP Server trick) you want a platform that treats governance as the product, not the afterthought.

Take a look at TrueFoundry's MCP Gateway and the Virtual MCP Server feature if you're at the "I'm giving real agents real tools and someone in security wants to talk to me" stage.

If you build something interesting on top of either, I'd love to see it. Happy gatewaying.

Top comments (10)

Victor Okefie

The USB-C analogy is the one that sticks. MCP standardized the plug, not the power grid. That's not a flaw — it's a scope decision. But most teams discover the gap at 2 AM when an agent calls a tool it shouldn't have and there's no audit trail to answer "who did that." The virtual MCP server pattern is the fix because it moves governance from "trust the agent" to "configure the gateway." Allowlists over prompts every time. Prompts are suggestions. Allowlists are physics.

Pranay Jain

Please do not blindly copy paste AI generated content here.

mote

The USB-C analogy is spot-on, and I think it extends further than you mentioned.

In embedded AI systems — think robots running inference on-device — we hit a version of this gap that's even more acute. MCP tells you how to call a sensor tool, but nothing about what happens when that tool returns a 200KB point cloud at 30Hz. You don't just need auth and RBAC — you need backpressure, rate limiting, and data shaping before anything reaches the agent's context window.

We ended up building something like your gateway idea, but with a twist: the gateway also normalizes tool outputs. A camera returns raw bytes, a GPS returns coordinates, a LiDAR returns structured frames — the agent shouldn't care about any of that. It should just get a uniform, pre-processed payload. Think of it as a middleware serialization layer on top of MCP.

Curious if you've looked at the MCP spec's proposed streaming extensions at all. That seems like it could partially address the "too much data" problem, but I'm not convinced it handles the policy side.

Have you benchmarked the latency overhead of your gateway in production? In robotics, even a 50ms hop between agent and tool can be the difference between real-time and too-late.

mote

The USB-C analogy is perfect, and your observation about "the plug, not the power grid" captures something I keep running into from a different angle.

I work on embedded systems where agents run on-device (think drones, mobile robots) and the calculus inverts in an interesting way. Your five complaints — no auth, no RBAC, no audit, unbounded outputs, no shared policy — are all valid for an enterprise deployment. But when the agent is the only caller and the tools are local, most of these become features, not bugs. No auth overhead means lower latency for a real-time control loop. No RBAC means simpler deployment. Unbounded outputs are fine when you control the entire stack.

The problem surfaces when people try to use the same MCP tooling designed for local-first agents in a multi-team server environment. The protocol doesn't distinguish between these deployment modes, so you end up bolting on enterprise concerns that were never in scope.

Your gateway approach is solid for the multi-tenant case. I'd argue the real gap in the MCP ecosystem is deployment-aware tooling: a way to declare "this tool is local-only, single-caller" vs "this tool needs auth and rate limiting." Right now everything gets the same treatment, which means either local users over-engineer or enterprise users under-protect.

Have you looked at whether the MCP spec is tracking any of this, or is it still firmly in the "just the protocol" camp?

Valentin Monteiro

The virtual MCP pattern is the right technical shape, but the ops question nobody asks: who owns the allowlist at scale? When three teams each ship ten agents, the gateway config becomes a governance artifact that needs a maintainer, review process, and deprecation policy. Same pain I've seen with data catalogs. The tooling works, the human process around it is what collapses first.

Archit Mittal

The "plug, not the power grid" framing is spot on. I've been building MCP integrations for a few months and the governance gap is real — MCP gives you transport and discovery, but auth, rate limits, audit logs, and per-team scoping are all on you. The pattern that works for me: put a thin gateway between agents and MCP servers that handles auth tokens, logs every tool call with the caller identity, and enforces per-tool quotas. Curious whether the spec will ever absorb some of this or if gateways become a permanent part of the stack.

The Data Nerd

The authorization gap you're describing is exactly what OAuth 2.0 scopes solved for REST APIs in 2012. The pattern repeats: every new protocol ships permissive-by-default, then gets hardened after the first multi-tenant breach. MCP tools returning 50MB is a cursor/pagination decision, not an MCP spec problem — same as GraphQL N+1 before DataLoader.

mako

Thanks for sharing! It's an architectural consideration!

Socials Megallm

we tried this too and hit the same wall. spent more time figuring out what we could connect to than actually building the integrations. some kinda registry or directory would solve so much friction

Mykola Kondratiuk

ran into the same logging gap - no cross-agent trace when 3 agents all call the same MCP tool in the same minute. ended up building a lightweight event bus just to answer who called this and when
