DEV Community

Ricardo Rodrigues

Why MCP Needs a Governance Layer

The MCP ecosystem is at an inflection point. What started as a protocol for connecting AI assistants to external tools has become the default integration layer for a generation of AI-powered engineering tools — Claude, Cursor, Windsurf, GitHub Copilot. Thousands of MCP servers exist. Tens of thousands of developers have installed them.

Almost none of them have thought about what happens when this goes to production.


The Individual Problem Is Solved

For a solo developer, MCP is close to perfect. You find a server, copy a JSON snippet, paste it into your config file, restart your AI client, and you have a new capability. GitHub integration, database access, Slack messaging, web search — all available in natural language. The friction is low enough that exploration is easy.

The MCP ecosystem solved the individual problem well.

The Team Problem Is Not

The moment a second person joins, the model breaks.

Authentication. You have a workspace token. You share it with your team. Now everyone has the same access to every tool. You cannot differentiate between what Alice is allowed to do and what Bob is allowed to do. You cannot revoke Bob's access without rotating the token — which breaks every AI client on the team simultaneously.

Deployment. Most MCP servers run as local stdio processes via npx. They exist only on the machine where they were installed. They cannot be shared across a team. They cannot be put behind a gateway. They cannot be monitored or audited. When a developer leaves, their MCP servers leave with them.

Visibility. When something goes wrong — when a production database gets queried unexpectedly, when a CI/CD pipeline is triggered by an agent, when sensitive data appears in a context where it should not — you cannot answer the most basic post-incident question: "which agent called which tool, and when?" There is no log. There is no audit trail. There is no answer.

Quality. 7,500+ MCP servers exist. GitHub stars measure historical interest, not current health. A server with 3,000 stars may not have had a commit in 18 months. There is no quality signal. There is no trust layer.


Why This Matters Now

AI agents are moving toward production infrastructure. They are not just answering questions — they are writing code, querying databases, triggering deployments, sending messages. The tools they use via MCP are real tools with real access to real systems.

The governance problem is not theoretical. It is the same problem that made API keys dangerous before OAuth, that made server access chaotic before centralised identity providers, that made network access ungovernable before Zero Trust. Every time a powerful capability becomes accessible to teams, governance follows — because it has to.

MCP is at the API key moment. The capability exists. The governance does not.


What Governance for MCP Looks Like

A governance layer for MCP needs to answer four questions consistently:

1. Who is using which tools?
Every tool call must be attributable to a specific person. Not "the workspace called something" — "Alice called the database query tool." Member-level attribution requires per-member tokens and protocol-level logging.

2. What is each person allowed to do?
Access control must operate at the tool level, not the workspace level. The security engineer should not have the same MCP permissions as the junior developer. Tool allowlists per member enforce least privilege at the protocol layer.

3. How do you revoke access instantly?
When someone leaves the team, their access must be gone in seconds. Without touching anyone else's configuration. Without rotating credentials that break the whole team. Per-member tokens make this possible.

4. Where do the servers run?
Local stdio processes are ungovernable by design. MCP servers need to run in isolated environments with defined lifecycles — deployable, monitorable, and terminatable without touching developer machines.
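The first three questions can be sketched together in a few lines. This is a minimal illustration, not any product's implementation: the `Workspace` and `Member` classes, the method names, and the tool names are all hypothetical, and storing only a SHA-256 digest of each token is an assumption about how per-member tokens might be kept.

```python
import hashlib
import secrets
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Set


def hash_token(token: str) -> str:
    """Return the SHA-256 hex digest of a token (only the digest is stored)."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Member:
    member_id: str
    allowed_tools: FrozenSet[str]  # per-member tool allowlist


@dataclass
class Workspace:
    # Hypothetical in-memory registry: token digest -> member.
    members_by_digest: Dict[str, Member] = field(default_factory=dict)

    def issue_token(self, member_id: str, allowed_tools: Set[str]) -> str:
        """Create a token for one member; the raw token is returned once."""
        token = secrets.token_urlsafe(32)
        self.members_by_digest[hash_token(token)] = Member(
            member_id, frozenset(allowed_tools)
        )
        return token

    def revoke(self, member_id: str) -> None:
        """Drop only this member's tokens; nobody else's config changes."""
        self.members_by_digest = {
            digest: member
            for digest, member in self.members_by_digest.items()
            if member.member_id != member_id
        }

    def authorize(self, token: str, tool_name: str) -> Optional[str]:
        """Return the member_id if the call is allowed, otherwise None."""
        member = self.members_by_digest.get(hash_token(token))
        if member is None or tool_name not in member.allowed_tools:
            return None
        return member.member_id
```

Because each token maps to exactly one member, attribution, least privilege, and revocation all fall out of the same lookup: revoking Bob removes his entry and leaves Alice's token valid.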


The Category That Does Not Exist Yet

Enterprise MCP governance is not a product category yet. It will be.

The same pattern has played out in every previous infrastructure layer. API management was chaos before control planes emerged. Identity was chaos before centralised providers. Network access was chaos before Zero Trust made it manageable.

MCP is the next layer. The governance problem is structural, not optional. And the window to define this category is open now — before the major platforms build their own solutions, before the ecosystem consolidates, before the standard emerges.

The teams that govern MCP today will not be scrambling to retrofit security into production deployments tomorrow.


Ricardo Rodrigues is the founder of MCPNest.io and a Platform Engineer at a large financial institution in Portugal. MCPNest.io is the enterprise governance layer for MCP servers — Gateway, per-member access control, hosted infrastructure, and audit logging.

mcpnest.io

Top comments (5)

Ken W Alger

You’ve hit on the exact pressure point where early MCP adoption will struggle in the enterprise: we’ve solved for interoperability, but we haven’t yet standardized on intent.

The 'fenceless' nature of MCP is exactly why a Sovereign System approach is necessary. In this model, the MCP server isn't just a data provider; it acts as a Governance Gate. We have to move from 'Reactive Enforcement' (catching a bad prompt after it’s already been processed) to a 'Proactive Negotiation' layer that sits between the agent and the resource.

This isn't just a safety requirement; it’s a Fiscal Architecture necessity. Without a governance gate, companies pay an 'Infrastructure Tax' in three ways:

  • Redundant Discovery: Agents burning expensive cloud tokens to repeatedly 'discover' tools they already have access to.

  • Hallucination Labor: The sunk cost of high-value engineers debugging agentic errors caused by ungrounded tool calls.

  • Unmanaged Burn: Giving an agent a 'corporate credit card' (API access) with no spending limit or audit trail.

By implementing a 'Sieve-and-Sign' pattern, where a local-first gateway inspects the tool call before it reaches the reasoning engine, we can optimize the Unit Economics of Intelligence. We turn a black-box expense into a predictable, high-yield infrastructure asset.

I’m curious: do you see this governance living within the individual MCP servers, or as a centralized 'Intelligent Sieve' that proxies all tool traffic?

Ricardo Rodrigues

Ken, this framing is sharp — and the three-part Infrastructure Tax
is a better way to explain the cost than anything I've written.

On your question: centralized proxy, not server-side governance.

Embedding governance inside individual MCP servers doesn't scale —
you'd need every server author to implement the same policy engine,
and you lose the cross-server view that makes audit meaningful.
"Which agent called which tools across all servers in the last hour"
is only answerable at the proxy layer.

The gateway approach (mcpnest.io/api/gw/{workspace-slug}) sits
between the AI client and all upstream servers. Every tools/call
passes through it — SHA-256 Bearer auth, per-member token
attribution, tool allowlists enforced before the call reaches
the server, latency and status logged on every call.

Your "Sieve-and-Sign" pattern is exactly this. The sieve is the
allowlist enforcement (does this member have permission to call
this tool?). The sign is the audit log entry (member_id, tool_name,
latency_ms, status — immutable record of what was authorized).
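As a rough sketch of the sieve and sign steps described above (not MCPNest's actual code), a gateway could resolve the Bearer token to a member, check the allowlist, forward the call, and append an audit entry. The `Gateway` class, the `upstream` callable, and the reading of "SHA-256 Bearer auth" as "tokens are looked up by their SHA-256 digest" are assumptions made for illustration.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set


@dataclass(frozen=True)
class AuditEntry:
    member_id: str
    tool_name: str
    latency_ms: float
    status: str  # "ok", "denied", or "error"


class Gateway:
    def __init__(
        self,
        token_digests: Dict[str, str],        # sha256(token) hex -> member_id
        allowlists: Dict[str, Set[str]],      # member_id -> allowed tool names
        upstream: Callable[[str, dict], Any], # hypothetical upstream tools/call
    ) -> None:
        self.token_digests = token_digests
        self.allowlists = allowlists
        self.upstream = upstream
        self.audit_log: List[AuditEntry] = []

    def handle_tools_call(self, bearer_token: str, tool_name: str, arguments: dict) -> Any:
        digest = hashlib.sha256(bearer_token.encode("utf-8")).hexdigest()
        member_id = self.token_digests.get(digest)
        if member_id is None:
            # No member to attribute to; this sketch does not log the attempt.
            raise PermissionError("unknown or revoked token")

        # Sieve: enforce the allowlist before the call reaches the server.
        if tool_name not in self.allowlists.get(member_id, set()):
            self.audit_log.append(AuditEntry(member_id, tool_name, 0.0, "denied"))
            raise PermissionError(f"{member_id} may not call {tool_name}")

        # Sign: record latency and status for every forwarded call.
        start = time.perf_counter()
        status = "error"
        try:
            result = self.upstream(tool_name, arguments)
            status = "ok"
            return result
        finally:
            latency_ms = (time.perf_counter() - start) * 1000.0
            self.audit_log.append(AuditEntry(member_id, tool_name, latency_ms, status))
```

Note that the entry holds only the four fields named above and no payload bodies, so the log can answer "who called what, and when" without storing tool arguments.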

The "Redundant Discovery" tax is one we're working on — tools/list
is currently called upstream on every request. Caching the tool
manifest per workspace with invalidation on server changes is the
fix, not yet shipped.
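One way the manifest cache could look, purely as a sketch: the `ToolManifestCache` class and the `fetch_upstream` callable are hypothetical names, and the invalidation trigger (a server being added, removed, or changed in the workspace) is assumed to call `invalidate` explicitly.

```python
from typing import Callable, Dict, List


class ToolManifestCache:
    """Cache the tools/list result per workspace until it is invalidated."""

    def __init__(self, fetch_upstream: Callable[[str], List[str]]) -> None:
        self._fetch = fetch_upstream          # hypothetical upstream tools/list
        self._cache: Dict[str, List[str]] = {}

    def tools_list(self, workspace: str) -> List[str]:
        # Only go upstream on a cache miss.
        if workspace not in self._cache:
            self._cache[workspace] = self._fetch(workspace)
        return self._cache[workspace]

    def invalidate(self, workspace: str) -> None:
        # Call this when the workspace's server set changes.
        self._cache.pop(workspace, None)
```

With this shape, repeated discovery requests for an unchanged workspace cost one upstream call instead of one per request.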

Curious whether you see the Sovereign System living at the
workspace level (per team) or at the organization level
(single gateway for the whole company)?

Ken W Alger

Ricardo, this is a vital distinction. Seeing the 'Sieve-and-Sign' pattern live in mcpnest.io makes the 'Infrastructure Tax' feel much more manageable.

Regarding the Workspace vs. Org level:

I lean toward a Hierarchical Governance model.

The Workspace as the Sieve: Governance is most effective when it’s close to the work. Individual teams (Workspaces) know which tools are 'dangerous' or 'costly' in their specific context. They need the autonomy to set allowlists and token attributions without waiting for a central IT ticket.

The Org as the Sign: The audit trail (the 'Sign') must roll up to the Org level. For a Sovereign System, the forensic integrity of the log is only useful if it’s immutable and centralized. You want a single pane of glass to answer 'What did our agents do today?' across all departments.

The 'Sovereign' sweet spot is likely a Federated Gateway:
The gateway sits at the Workspace level for latency and local control, but it pipes its 'Sign' events (the immutable audit logs) to a central Org-level forensic store.

If it’s only at the Org level, the 'Sieve' becomes a bottleneck. If it’s only at the Workspace level, the forensic audit becomes a fragmented mess of 'missing ledgers' when a team deletes a workspace.
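The federated shape can be sketched in a few lines, assuming an in-memory stand-in for the central store. `SignEvent`, `OrgAuditStore`, and `WorkspaceGateway` are hypothetical names; a real deployment would ship events over the network to durable storage.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class SignEvent:
    workspace: str
    member_id: str
    tool_name: str
    status: str  # "ok" or "denied"


class OrgAuditStore:
    """Append-only stand-in for the central Org-level forensic store."""

    def __init__(self) -> None:
        self._events: List[SignEvent] = []

    def append(self, event: SignEvent) -> None:
        self._events.append(event)

    def query(self, workspace: Optional[str] = None) -> List[SignEvent]:
        return [e for e in self._events if workspace is None or e.workspace == workspace]


class WorkspaceGateway:
    """Sieve decisions stay local; every Sign event is piped to the Org store."""

    def __init__(self, workspace: str, allowlists: Dict[str, Set[str]],
                 org_store: OrgAuditStore) -> None:
        self.workspace = workspace
        self.allowlists = allowlists  # owned and edited by the team
        self.org_store = org_store

    def check(self, member_id: str, tool_name: str) -> bool:
        allowed = tool_name in self.allowlists.get(member_id, set())
        status = "ok" if allowed else "denied"
        self.org_store.append(SignEvent(self.workspace, member_id, tool_name, status))
        return allowed
```

Because the ledger lives outside the workspace object, deleting a workspace gateway does not delete its history from the Org-level store.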

How are you thinking about log retention? In a forensic-first system, those SHA-256 attributed logs are arguably more valuable than the tools themselves over the long term.

Ricardo Rodrigues

Ken, the Federated Gateway model is exactly right — and
better articulated than the framing I had internally.

Today MCPNest sits at the Workspace level for both Sieve
and Sign. Each workspace has its own gateway endpoint
(mcpnest.io/api/gw/{slug}), per-member SHA-256 Bearer
tokens, tool allowlists, and an audit log scoped to that
workspace. Four columns on every tools/call: member_id,
tool_name, latency_ms, status. No payload bodies stored.

The Org-level Sign rollup is the gap. Right now if a
customer has five workspaces, they have five audit ledgers —
no single pane of glass. That's the Phase 2 build:
Enterprise tenant with cross-workspace audit aggregation,
immutable storage with content-addressed hashing, and
optional pipe-to-customer-SIEM for the regulated industries.

On log retention specifically: 90 days hot retention
currently, served from Postgres. The honest answer is
that 90 days is wrong for forensic-first. For real
governance you need at minimum the compliance window
(SOX is 7 years, HIPAA is 6, GDPR is "as long as
necessary"). The roadmap there is two-tier — hot for
30-90 days, cold archival to object storage with
Merkle-tree integrity proofs, retrievable on demand.
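To make "Merkle-tree integrity proofs" concrete, here is a minimal sketch of computing a Merkle root over a batch of serialized log entries. The construction is an assumption chosen for brevity (SHA-256, one-byte leaf and node prefixes, last node duplicated on odd levels) and is not taken from the roadmap; the entry format in the usage example is also hypothetical.

```python
import hashlib
from typing import List


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(entries: List[bytes]) -> bytes:
    """Return the Merkle root of a non-empty list of serialized log entries."""
    if not entries:
        raise ValueError("cannot build a Merkle tree over zero entries")
    # Prefix leaves with 0x00 and inner nodes with 0x01 so that a leaf
    # hash can never be confused with an inner-node hash.
    level = [_sha256(b"\x00" + entry) for entry in entries]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # simplification: duplicate the last node
        level = [
            _sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Usage: archive the batch, publish or separately store the root.
batch = [b"alice|db.query|12|ok", b"bob|search|8|ok", b"alice|deploy|40|denied"]
root = merkle_root(batch)
tampered = merkle_root([batch[0], b"bob|search|8|error", batch[2]])
assert root != tampered  # any edited entry changes the root
```

Duplicating the last node keeps the sketch short, but it lets two different batches share a root (a batch ending in a repeated entry), so a production design would need a stricter tree construction.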

The cold-tier design is where it gets interesting.
Append-only WORM storage for SHA-256 attributed logs has
specific compliance value — but it also creates a
defensibility moat that's stronger than the gateway
itself. The gateway is the wedge. The audit ledger is
the asset.

A question back: do you see the immutable log primarily
as a compliance artifact (regulators ask, you produce),
or as an active forensic surface that security teams
query continuously? The product shape differs materially
between those two — one wants S3 + retrieval, the other
wants real-time queryable indices with anomaly detection.

Ken W Alger

Ricardo, this is a masterful breakdown. Seeing how you're structuring MCPNest at the workspace level, and identifying the cross-workspace Org rollup as the Phase 2 asset, is pure validation of the Federated Gateway model. You're entirely right: the gateway is the wedge, but the unalterable audit ledger is the enterprise moat.

To your question about the shape of that log: The answer is both, but strictly decoupled.

If you build it only as a compliance artifact (Cold S3/WORM with Merkle proofs), you satisfy the legal team, but you leave the security team blind in real-time. If you build it only as an active forensic surface (Hot OpenSearch/Postgres), the storage costs over a 7-year SOX/HIPAA window will completely destroy your margins.

The architecture requires a split pipeline:

  1. The Active Surface (The Hot Tier): Ingest tool call signatures, latencies, and statuses into a high-speed, queryable index for 30–90 days. This is where your real-time anomaly detection lives—monitoring for sudden spikes in tool invocation, data exfiltration, or credential harvesting.

  2. The Compliance Anchor (The Cold Tier): Stream those same logs asynchronously into an append-only, content-addressed WORM storage tier (like S3 with Object Lock) bound by Merkle-tree integrity proofs.

By isolating the two, your security engine can query the hot tier continuously without impacting production performance, while your cold tier sits immutably in the background, ready to prove system integrity to a regulator 6 years from now.
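A toy version of the split pipeline, under several stated assumptions: events arrive in timestamp order, the cold tier is an in-memory list standing in for WORM object storage, ingestion is synchronous rather than asynchronous, and "anomaly detection" is reduced to a per-member call-count threshold. All names are hypothetical.

```python
from collections import Counter, deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class CallEvent:
    ts: float        # seconds since some epoch
    member_id: str
    tool_name: str
    status: str


class SplitPipeline:
    def __init__(self, hot_retention_s: float) -> None:
        self.hot_retention_s = hot_retention_s
        self.hot: Deque[CallEvent] = deque()  # queryable, bounded by age
        self.cold: List[CallEvent] = []       # append-only, never pruned

    def ingest(self, event: CallEvent) -> None:
        # Every event goes to both tiers; only the hot tier is trimmed.
        self.cold.append(event)
        self.hot.append(event)
        cutoff = event.ts - self.hot_retention_s
        while self.hot and self.hot[0].ts < cutoff:
            self.hot.popleft()

    def spiking_members(self, now: float, window_s: float, threshold: int) -> List[str]:
        """Members with more than `threshold` calls in the last `window_s` seconds."""
        counts = Counter(e.member_id for e in self.hot if e.ts >= now - window_s)
        return sorted(m for m, c in counts.items() if c > threshold)
```

The point of the split is visible even at this scale: the detection query only ever scans the hot tier, while the cold tier keeps growing untouched for the full compliance window.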

The fact that you're already mapping out Merkle-tree proofs for cold archival storage tells me MCPNest is positioning itself exactly where the enterprise market needs it. This is real governance.