Compile-time vs runtime: where MCP security actually lives

Disclosure: I'm the author of capgate, a compile-time policy compiler for MCP servers. capgate appears as the worked example in the compile-time section. The other three sections describe categories, not specific products.

The goal isn't to argue that any one layer is best — it's to give you a way to figure out which layer your team actually needs, so you stop bolting the wrong tool onto the wrong problem.

The four layers

A tool call through an MCP server passes through, conceptually, four points where security work can happen:

```
manifest → [1] compile-time policy → [2] sandbox runtime → [3] tool invocation → [4] decision log
              emission                 inspection             gateway / auth        signed receipts
```

Each of these is its own discipline with its own tooling and its own people who care deeply about it. Lumping them together as "MCP security" is what causes teams to evaluate one tool for a problem it doesn't solve.

1. Compile-time policy emission

What it does. Reads the MCP server's manifest before the server runs and emits a concrete sandbox policy — bwrap argv, docker run flags, an egress allowlist, a list of environment variables the server is allowed to see. The output is a static artifact. It does not execute, does not speak MCP on the wire, does not watch traffic.

The argument for it. Most sandboxes a team actually runs are hand-written: someone reads the README, makes their best guess at what the server needs, and writes --cap-drop ALL --network host --volume ... from memory. That hand-written sandbox is the de facto security policy for the server, and it's invisible to code review. A compile-time policy makes the sandbox a reviewable artifact: it lives in the repo, it changes when the manifest changes, and it can be diffed in a PR.

When you'd reach for it. Your team runs more than two or three MCP servers, the people running them aren't the people who wrote them, and "what is this server allowed to do?" is a question that has to be answerable from the repo, not from someone's memory.

Concrete example. Here's a minimal manifest and the policy capgate emits for it:

```typescript
import { compile, lowerToDocker } from 'capgate';

const docker = lowerToDocker(compile({
  name: 'my-server',
  version: '0.1.0',
  tools: [{ name: 'read_file', capabilities: ['fs:read:/workspace/**'] }],
}));

console.log(docker.argv.join(' '));
// → --rm --cap-drop ALL --security-opt no-new-privileges --read-only
//   --tmpfs /tmp --network none --volume /workspace:/workspace:ro
```

One capability in, one container policy out. The server declared no network → --network none. Read-only filesystem declared → :ro mount. No environment variables declared → none cross the boundary. A real manifest with several tools merges per-tool capabilities into a server-level policy.
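
Conceptually, that merge is a union over per-tool capability strings. Here's a standalone sketch of the shape (this is not capgate's actual compiler, and the fs:write string below is an illustrative extension of the fs:read pattern above):

```typescript
// Standalone sketch of per-tool → server-level capability merging. This is
// the conceptual shape, not capgate's implementation; the fs:write
// capability string is an illustrative extension of the fs:read pattern.
type Tool = { name: string; capabilities: string[] };

function mergeCapabilities(tools: Tool[]): Set<string> {
  const merged = new Set<string>();
  for (const tool of tools) {
    for (const cap of tool.capabilities) merged.add(cap);
  }
  return merged;
}

const serverCaps = mergeCapabilities([
  { name: 'read_file', capabilities: ['fs:read:/workspace/**'] },
  { name: 'write_scratch', capabilities: ['fs:write:/tmp/scratch/**'] },
]);
// The emitted sandbox covers the union: the server-level policy is the
// least-privilege envelope over every tool the server exposes.
console.log([...serverCaps]);
```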

The limits. Compile-time emission doesn't watch what the server actually does. It enforces what was declared. A manifest that over-declares is silently over-granted at runtime; that bug lives in the manifest, not the compiler. If you need to catch a server doing something its manifest didn't say it would, that's layer 2.

2. Runtime sandbox inspection

What it does. Watches the MCP server as it runs, inspects its tool definitions and call traces against a catalog of known threat techniques, and surfaces risky behavior — prompt injection patterns in tool descriptions, indirect tool-chaining, unexpected outbound calls.

The argument for it. You did not write this server. You found it on a registry, or a vendor handed it to you, or your CI agent installed it. You don't know what its tool descriptions look like. You don't know whether one of its tools chains into another in a way the author didn't anticipate. You want a sensor in the path that says "this looks wrong" — preferably one that maps what it sees onto a recognized taxonomy (STRIDE, MITRE ATT&CK, OWASP LLM Top 10) so your security team can interpret the signal.
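
To make "sensor in the path" concrete, here's a toy sketch of the static half of that check: screening tool descriptions for injection-style phrasing. The patterns and type names are illustrative, not any product's catalog:

```typescript
// Toy, product-agnostic screen over static tool definitions. Real runtime
// inspectors also watch live call traces and map findings onto taxonomies
// (STRIDE, MITRE ATT&CK, OWASP LLM Top 10); this only shows the shape.
type ToolDef = { name: string; description: string };

const SUSPICIOUS: RegExp[] = [
  /ignore (all )?previous instructions/i,
  /do not (tell|inform) the user/i,
  /<\/?(system|assistant)>/i, // role-tag smuggling inside a description
];

function screenToolDefs(tools: ToolDef[]): { tool: string; pattern: string }[] {
  const findings: { tool: string; pattern: string }[] = [];
  for (const t of tools) {
    for (const p of SUSPICIOUS) {
      if (p.test(t.description)) {
        findings.push({ tool: t.name, pattern: String(p) });
      }
    }
  }
  return findings;
}
```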

When you'd reach for it. Your threat model assumes the server might be adversarial or buggy in ways its manifest doesn't reveal. Common cases: third-party servers, large in-house catalogs of servers you can't deeply audit, regulated environments where "we have a sensor watching for X" is a compliance requirement.

The limits. Runtime inspection catches what's already happening. By the time the sensor sees the suspicious outbound request, the request has been made. It's a detection layer, not a prevention layer. Most teams that adopt runtime inspection also have layer 1 (compile-time policy) running underneath it — the sandbox prevents the bulk of the bad things, and inspection catches what gets through the gaps in declaration coverage.

3. API gateway / per-request authorization

What it does. Sits in front of the MCP server as a network proxy. Every request from the agent to a tool passes through it, and the gateway decides — based on identity, headers, role, policy rules — whether that request is allowed. This is the same shape as a regular service mesh, applied to MCP traffic.

The argument for it. Your concern isn't what the server can do. Your concern is who is asking it to do things. You have multiple agents, multiple users, multiple environments hitting the same MCP server, and the right thing to enforce is "developer Alice can call apply_patch, but the deploy agent can only call search_code" — a per-caller policy, not a per-server one. This is identity-and-access work, and the mature tooling for it is exactly the kind of API-gateway / authorization-proxy software that solved the same problem for HTTP APIs ten years ago.
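
The decision at the core of that is small; the hard parts, identity verification and policy distribution, are what the gateway products actually sell. A minimal sketch, with hypothetical caller identifiers and a deliberately naive policy shape:

```typescript
// Sketch of the per-caller allow decision an authorization gateway makes
// before a tool call reaches the MCP server. Caller identifiers and the
// policy shape are hypothetical; real gateways verify identity (mTLS,
// OIDC) and load policy from a central store.
type Policy = Record<string, Set<string>>; // caller → allowed tool names

const policy: Policy = {
  'user:alice': new Set(['apply_patch', 'search_code']),
  'agent:deploy': new Set(['search_code']),
};

function authorize(caller: string, tool: string): boolean {
  return policy[caller]?.has(tool) ?? false; // deny by default
}

authorize('user:alice', 'apply_patch'); // true
authorize('agent:deploy', 'apply_patch'); // false: blocked at the gateway
```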

When you'd reach for it. Multiple callers share an MCP server, and the boundary you care about is between callers, not between the server and the host. Your security team already runs an authorization gateway for everything else; extending it to MCP is cheaper than evaluating a new category.

The limits. A gateway can stop a call before it reaches the server. It can't stop the server, once invoked, from doing something the gateway didn't anticipate. If the agent is allowed to call read_file('/etc/passwd'), the gateway authorized that call; the sandbox in layer 1 is what prevents the underlying read from succeeding.

4. Decision logs / signed receipts

What it does. Cryptographically logs each tool invocation — what was called, with what arguments, by whom, with what result. The log is tamper-evident: a downstream auditor can verify, weeks later, that the recorded history hasn't been edited.
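
As a minimal sketch of the tamper-evidence mechanism (not any particular product's receipt format), here's a hash chain over invocation records using Node's built-in crypto:

```typescript
import { createHash } from 'node:crypto';

// Hash-chain sketch of tamper-evident decision logs: each receipt commits
// to the previous one, so editing any past entry breaks every hash after
// it. The record shape is hypothetical; real receipt systems also sign
// each link and anchor the chain externally.
type Receipt = { tool: string; args: unknown; caller: string; prev: string; hash: string };

function appendReceipt(log: Receipt[], tool: string, args: unknown, caller: string): void {
  const prev = log.length > 0 ? log[log.length - 1].hash : 'genesis';
  const hash = createHash('sha256')
    .update(JSON.stringify({ tool, args, caller, prev }))
    .digest('hex');
  log.push({ tool, args, caller, prev, hash });
}

function verifyChain(log: Receipt[]): boolean {
  return log.every((r, i) => {
    const prev = i === 0 ? 'genesis' : log[i - 1].hash;
    const expected = createHash('sha256')
      .update(JSON.stringify({ tool: r.tool, args: r.args, caller: r.caller, prev }))
      .digest('hex');
    return r.prev === prev && r.hash === expected;
  });
}
```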

The argument for it. This is for environments where "we need a trustworthy record of what the agent did" is a hard requirement — regulated industries, agents that touch financial state, anything where the conversation with the auditor begins with "prove to me this didn't happen." It's the audit-trail layer, and it doesn't make sense to evaluate it by the same criteria as layers 1–3: it solves a different problem.

When you'd reach for it. Your compliance posture requires a verifiable record of agent actions. Or you're building tooling for other people's agents and you need to give them a way to prove what their agents did. Or you're sufficiently far ahead of the curve that you're trying to build the artifact regulators will eventually ask for.

The limits. Receipts prove what happened. They don't prevent anything. A signed log of an agent exfiltrating a secret is still a log of an agent exfiltrating a secret. Like layer 2, this is almost always run alongside layer 1, not instead of it.

A decision matrix

If you're trying to figure out which layer to invest in first, the question I'd ask is:

| Question you're asking | Layer that answers it |
| --- | --- |
| "What is this server allowed to do?" | 1 — compile-time policy |
| "What is this server actually doing right now?" | 2 — runtime inspection |
| "Who is allowed to call which tool?" | 3 — API gateway |
| "What did the agent do, and can we prove it?" | 4 — decision logs |

Most production teams running MCP servers at scale end up with layers 1 + 2, or 1 + 3, or 1 + 4. Layer 1 is the load-bearing one — it prevents the bulk of bad outcomes by construction. The others are sensors and policy controls that ride on top of that prevention.

The wrong answer is buying a layer-2 product because "we need MCP security" and then discovering six months later that you never wrote a sandbox underneath it, so the sensor is logging the smoke from fires you could have prevented.

What this means for tool selection

The reason it's worth being precise about these four layers is that the tools in this space are, by and large, not trying to do all four. The honest ones are explicit about which layer they live at. The ones to be cautious of are the ones that pitch themselves as "MCP security" without telling you which layer.

When you're evaluating something in this space, the first question to ask is: which of the four does this live at? If the answer is "all of them," push harder — usually it means one of the four is the actual product and the rest are marketing. There's nothing wrong with focused tools; the wrong shape is a tool that pretends to cover layers it doesn't.

Where this is going

The four layers will look more distinct, not less, as the MCP ecosystem matures. Right now most teams pick one tool and accept that it's covering 60% of what they need. In a year, expect to see specialized tools at each layer, expect to see them composed, and expect the question "which layer is this?" to be the first one in any procurement conversation.

If you're picking a layer to start with, start with layer 1. It's the cheapest prevention and the foundation everything else assumes is in place.


Please feel free to leave questions for me in the comments.

capgate is open source under Apache 2.0. If you're running MCP servers in production and want to compare what your hand-written sandbox grants vs what a manifest-derived policy would, the repo has a CLI you can point at a manifest in 30 seconds.
