MCP Governance: What It Actually Means in Production (And the Four Walls We Had to Build)

#ai #security #mcp #devops

TL;DR: MCP governance is the set of controls that determine which agents can access which tools, under which identities, with what limits, and with what audit trail. Raw MCP has none of this - it's a protocol for structured tool calls, not a policy engine. The governance layer is something teams have to build deliberately, and most don't start until something breaks. This post is the thing I wish I'd read before our first incident.

When we first deployed MCP servers in production, our governance story was: don't do anything obviously stupid. No access controls beyond "you have the server URL or you don't." No audit trail beyond whatever logs the downstream API generated. No limits on which tools any agent could call.

That story lasted about four months, until three things happened in quick succession:

A support agent that had access to a Jira MCP server ended up being tested with a write-enabled configuration. It created 47 duplicate tickets before someone noticed.

A contractor finished an engagement and we revoked their GitHub and Slack access. Three weeks later their Jira MCP token was still active because it wasn't on the offboarding checklist - MCP credentials weren't part of our standard process.

A research agent pulling competitor content via a web-fetch MCP tool retrieved a page containing injected instructions that the agent executed. Nothing catastrophic, but the blast radius could have been significant with a differently-configured agent.

All three of these are governance failures. None of them required sophisticated attacks. They just required an absence of controls that should have existed.

What MCP governance actually is

MCP governance is the controls layer on top of the MCP protocol that determines:

Who can connect to which MCP servers (identity + access control)
What they can do once connected (tool-level RBAC)
Under which limits (rate limits, token budgets, workflow step caps)
With what record (audit trail per invocation)
With what policy enforcement on content (input and output guardrails)

The MCP protocol itself doesn't provide any of this. It defines how agents call tools and how tools return results. The governance is infrastructure you build on top of it or don't, and learn why you should have.

The four walls

After our three incidents, we mapped the governance problem into four distinct control areas. Each has different failure modes and different mitigations.

Wall 1: Access control

The failure mode: Any agent with a server URL can call any tool. No access scoping, no identity requirement, no per-team restrictions.

What we needed: Tool-level RBAC. Not just "team A can connect to the Jira server" but "team A can call search_issues and create_issue but not delete_issue, delete_project, or bulk_update."

The second requirement is what most early MCP setups miss. Controlling server access at the connection level doesn't help if the server exposes both read and write tools and you want different agents to have different capabilities.

How we implemented it: All MCP access routes through a central gateway. RBAC policies are defined per server, per tool, per role. Agents receive filtered tool listings - tools/list returns only what they're authorized to call. They never see tools they can't use, which removes the surface area for misconfiguration entirely.

We use TrueFoundry's MCP Gateway for this. The RBAC configuration took roughly a day per server to set up properly. We also use Virtual MCP Servers - curated logical endpoints that expose only the tool subset a given team persona needs, so a research agent and a customer support agent see completely different tool surfaces even if they're both authorized for the same underlying servers.

Wall 2: Identity and credential governance

The failure mode: Credentials are managed individually per developer per server. When someone joins, they set up credentials manually. When someone leaves, you hope someone revokes them everywhere.

What we needed: Centralized credential management with offboarding that propagates.

The contractor credential problem from our incident is almost universal. MCP servers are typically a separate system from everything else in your onboarding/offboarding workflow. Unless you deliberately connect them, they won't be covered.

How we implemented it: Every developer and service agent authenticates to a central gateway with a single token (PAT for humans, VAT for service agents). The gateway manages all downstream credentials — GitHub OAuth, Jira API keys, Confluence tokens, internal API service accounts and auto-refreshes them before expiry.

Offboarding became one action: revoke the gateway token. Every downstream MCP server's access is cut automatically because the credentials the gateway manages are under the gateway's control, not the individual's.

For tools that should act on behalf of a specific user (post to Slack as a person, not as a bot), we use OAuth 2.0 with our Okta integration. The gateway handles token exchange and refresh - agents don't manage OAuth flows directly.

Wall 3: Audit trail

The failure mode: Tool calls generate logs in two places - the MCP server logs and the downstream API logs — and neither one tells you which agent or user triggered the call.

What we needed: A structured audit trail per tool invocation that records: which agent, which authenticated user, which tool, what input parameters, what the response was, at what time.

This is what the compliance team actually asks for. "Tool X was called 400 times last month" is not useful. "Agent Y, under service account Z, called delete_record on table T at 14:23:07 with these parameters" is what makes an incident investigation recoverable.

How we implemented it: The gateway logs every tool call with structured metadata before routing it to the downstream server. Logs export via OpenTelemetry to our Datadog setup. Queryable. When our security team asked "which agents accessed the production data API in the last 30 days," it went from a two-day investigation to a ten-minute query.

Wall 4: Content guardrails

The failure mode: An agent retrieves content from an external system via an MCP tool. That content contains injected instructions. The agent processes it and executes the injected instructions. This is prompt injection via tool response and it's the one that caught us with the web-fetch agent.

What we needed: Post-tool-call inspection of what the tool returns before it enters the agent's context.

This is different from input guardrails on LLM calls (which inspect what the user or application sends to the model). Tool response guardrails inspect what the MCP server sends back before the agent sees it. The injection happens in the tool's output, so the defense has to be at that layer.

How we implemented it: Gateway-level post-execution guardrails that inspect MCP tool responses before returning them to the calling agent. We run PII detection and prompt injection pattern matching on tool responses. For high-risk tools (web fetch, document retrieval, external API responses), we also apply content sanitization in mutate mode — the response is modified to strip detected injections rather than just being flagged.

This is the governance layer most teams skip until they've had the injection near-miss. We wish we'd built it first.

The four walls in a table

	What breaks without it	What implements it
Access control	Any agent calls any tool, including destructive ones	Tool-level RBAC, Virtual MCP Servers, filtered `tools/list`
Identity + credentials	Offboarding misses MCP access; shared tokens have no user attribution	Centralized gateway with PAT/VAT auth, IdP integration
Audit trail	Incidents aren't reconstructable; compliance questions take days to answer	Structured log per tool call with user identity, exported to SIEM
Content guardrails	Prompt injection via tool responses executes in agent context	Post-tool-call inspection before response enters agent context

What governance doesn't require

A few things I'd push back on from early conversations about MCP governance:

You don't need to solve all four walls at once. Wall 1 (access control) has the highest return on investment because it prevents the most common class of incident - over-permissioned agents. Start there. Wall 4 (content guardrails) is the most technically involved and matters most when agents are retrieving external, untrusted content. If your agents only call internal APIs with trusted responses, Wall 4 can wait.

You don't need custom tooling for every server. The pattern that scales is a central gateway that handles governance for all servers, rather than baking governance into each server individually. One config to update when policy changes, one place to check when something goes wrong.

Governance doesn't have to slow agents down. Auth checks, RBAC evaluation, and audit log writes can all happen in-memory on the hot path with async log flushing. TrueFoundry's gateway adds sub-3ms latency under load. If governance is adding seconds per tool call, something is wrong with the architecture, not with governance in principle.

What governance controls did you build first and which incident made you prioritize it? The "agent called a tool it shouldn't have" pattern seems to be the most common forcing function. Drop it in the comments.