## The Problem: Your Agent Already Did the Thing
AI agents are no longer chat toys. They're executing financial transactions, modifying production databases, deploying code, and calling external APIs. The MCP ecosystem has made it trivially easy to give an agent access to powerful tools.
The standard governance approach? Logging. Let the agent act, write it to an audit trail, review later. Maybe throw an alert if something looks off.
This is the equivalent of reviewing security camera footage after someone has already walked out of the building with your servers.
By the time you see the log entry showing your agent made an unauthorized API call, or deleted records it shouldn't have touched, or spent budget in a way that doesn't match your intent -- the damage is done. You're in recovery mode, not prevention mode.
## Defining Pre-Execution Governance
Pre-execution governance means intercepting every agent action before it executes and making a deterministic allow/block/hold decision based on rules. Not after. Not eventually. Before the tool call happens.
This is a specific design pattern, distinct from:
- Post-hoc auditing (logging what happened and reviewing later)
- Guardrails (prompt engineering to discourage bad behavior)
- Rate limiting (throttling volume without inspecting intent)
Pre-execution governance is deterministic. If an agent is halted, it cannot execute. Period. Not "it probably won't" or "the LLM was instructed not to." The code path literally does not reach the tool execution function.
The key properties:
- Deterministic blocking -- Rules produce the same result every time, regardless of the LLM's reasoning
- Zero-latency relative to the operation -- Validation is orders of magnitude faster than the tool call itself
- Human-in-the-loop holds -- Risky operations pause for human approval, not just logging
- Behavioral drift detection -- Deviations from baseline behavior trigger holds or halts
- Chain validation -- Child agents cannot weaken parent constraints
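The "deterministic blocking" property can be illustrated with a minimal gate sketch. All names here are hypothetical, not PromptSpeak's actual API; the point is that a blocked or held operation structurally cannot reach the execution function.

```typescript
// Minimal sketch of a deterministic pre-execution gate.
// Illustrative only -- not PromptSpeak's real implementation.
type Decision = "allow" | "block" | "hold";

interface GateRule {
  // Produces the same decision for the same input, every time,
  // regardless of what the LLM was reasoning about.
  evaluate(tool: string, args: Record<string, unknown>): Decision;
}

async function governedCall(
  rules: GateRule[],
  tool: string,
  args: Record<string, unknown>,
  execute: (tool: string, args: Record<string, unknown>) => Promise<unknown>,
): Promise<unknown> {
  for (const rule of rules) {
    const decision = rule.evaluate(tool, args);
    // On "block" or "hold", the code path never reaches execute().
    if (decision === "block") throw new Error(`blocked: ${tool}`);
    if (decision === "hold") throw new Error(`held for approval: ${tool}`);
  }
  return execute(tool, args); // reached only if every rule said "allow"
}
```

This is the whole idea in miniature: the decision happens before the call, not in a log entry afterward.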
## Why Blocking Before Beats Auditing After
Consider an agent tasked with managing a cloud deployment. Under post-hoc auditing:

1. Agent receives instruction
2. Agent calls `delete_production_database`
3. Database is deleted
4. Audit log records the deletion
5. Alert fires
6. You start the incident response process
Under pre-execution governance:

1. Agent receives instruction
2. Agent calls `delete_production_database`
3. Gatekeeper intercepts the call
4. Validation pipeline runs (0.103ms)
5. Operation is blocked or held for human approval
6. Database is still running
The math is simple: prevention is cheaper than recovery. Every time.
## The 5-Stage Validation Pipeline
PromptSpeak implements pre-execution governance as an MCP server. Every tool call passes through five stages before it can execute. If any stage fails, execution stops.
```
Agent Tool Call Request
          |
          v
+---------------------+
| 1. CIRCUIT BREAKER  |  Halted agents blocked immediately.
+---------------------+  No validation needed -- just stop.
          |
          | (agent not halted)
          v
+---------------------+
| 2. FRAME VALIDATION |  Structural, semantic, and chain
+---------------------+  rules checked against the operation.
          |
          | (valid frame)
          v
+---------------------+
| 3. DRIFT DETECTION  |  Compare to baseline behavior.
+---------------------+  Flag anomalies before they execute.
          |
          | (within baseline)
          v
+---------------------+
| 4. HOLD MANAGER     |  Should a human review this?
+---------------------+  Financial ops, deletions, external
          |              calls can require approval.
          | (no hold needed or hold approved)
          v
+---------------------+
| 5. INTERCEPTOR      |  Final permission check: tool
+---------------------+  bindings, rate limits, coverage
          |              confidence, forbidden constraints.
          | (all checks pass)
          v
     EXECUTE TOOL
```
The order matters. The circuit breaker is first because a halted agent should be blocked immediately, with zero computation wasted on validation. This is the "deterministic stop" guarantee.
### What Each Stage Does
Stage 1: Circuit Breaker. If an agent has been halted (manually, or by automatic drift detection), all tool calls are rejected instantly. This is the kill switch. It doesn't evaluate the request at all -- it checks a boolean flag and returns.
Stage 2: Frame Validation. Operations are encoded as symbolic frames with mode, domain, action, and constraint markers. Validation runs three tiers: structural rules (is the frame well-formed?), semantic rules (do the symbols make sense together?), and chain rules (does this agent have the right to do this given its parent's constraints?).
Stage 3: Drift Detection. The engine compares the current operation to the agent's behavioral baseline. If an agent that normally makes read-only queries suddenly tries to execute a write operation, the drift score will be elevated. Includes tripwire injection for proactive anomaly detection.
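As a toy illustration of the drift idea (not PromptSpeak's actual scoring engine), a baseline can be as simple as a frequency count over the agent's past action types:

```typescript
// Toy drift score: how novel is this action relative to the agent's
// recorded history? 0 = routine, 1 = never seen before.
// Illustrative only -- real drift detection is more sophisticated.
function driftScore(baseline: string[], action: string): number {
  if (baseline.length === 0) return 1; // no history: maximally novel
  const seen = baseline.filter((a) => a === action).length;
  return 1 - seen / baseline.length;
}

// An agent that has only ever made reads:
const baseline = ["read", "read", "read", "read"];
driftScore(baseline, "read");  // 0 -- routine
driftScore(baseline, "write"); // 1 -- elevated; candidate for a hold or halt
```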
Stage 4: Hold Manager. Configurable rules determine which operations need human approval. You decide what gets held: all financial operations, any external API calls, operations above a confidence threshold, or specific tool names. Held operations are queued until a human approves or rejects them.
Stage 5: Interceptor. The final gate checks tool bindings (is this tool allowed by the current frame?), rate limits, coverage confidence (does the frame adequately describe this operation?), and forbidden constraints. Default policy is deny -- if a tool isn't explicitly allowed, it's blocked.
## Performance
Pre-execution governance only works if it's fast enough to be invisible. If your validation layer adds 500ms to every tool call, developers will disable it.
| Metric | Value |
|---|---|
| Average validation latency | 0.103ms |
| P95 validation latency | 0.121ms |
| Operations per second | 6,977 |
| Test coverage | 951 tests across 30 files |
For context, a typical MCP tool call (file read, API request, database query) takes 50-500ms. The governance layer adds 0.1ms. That's noise.
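To make that concrete, here is the overhead arithmetic using the numbers above:

```typescript
// Overhead of a 0.103ms validation pass relative to typical tool calls.
const validationMs = 0.103;
const overheadAtFastCall = (validationMs / 50) * 100;  // vs. a 50ms tool call
const overheadAtSlowCall = (validationMs / 500) * 100; // vs. a 500ms tool call
// Roughly 0.2% at the fast end, 0.02% at the slow end.
```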
## Quick Start
PromptSpeak is a TypeScript MCP server. No npm package yet -- you clone the repo and build it.
```bash
git clone https://github.com/chrbailey/promptspeak.git
cd promptspeak/mcp-server
npm install
npm run build
npm test   # 951 tests
```
### Claude Desktop Configuration

Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "promptspeak": {
      "command": "node",
      "args": ["/path/to/promptspeak/mcp-server/dist/server.js"]
    }
  }
}
```
### Claude Code Configuration
Add to your Claude Code MCP settings:
```json
{
  "mcpServers": {
    "promptspeak": {
      "command": "node",
      "args": ["/path/to/promptspeak/mcp-server/dist/server.js"],
      "env": {}
    }
  }
}
```
Once connected, the server exposes governance tools that your agent (or you) can call:
- `ps_execute` -- Execute a tool call through the full validation pipeline
- `ps_validate` -- Dry-run validation without executing
- `ps_hold_list` / `ps_hold_approve` / `ps_hold_reject` -- Human-in-the-loop management
- `ps_state_halt` / `ps_state_resume` -- Emergency stop and resume
- `ps_drift_history` -- Review behavioral drift events
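For example, a raw MCP `tools/call` request targeting `ps_validate` would look roughly like this. The JSON-RPC envelope is standard MCP; the `arguments` payload is an assumption for illustration -- check the repo for the real input schema:

```typescript
// Standard MCP tools/call envelope. The "arguments" shape below is a
// guess at ps_validate's input, not taken from PromptSpeak's docs.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "ps_validate",
    arguments: {
      tool: "delete_production_database", // operation to dry-run
    },
  },
};
```

In practice your MCP client (Claude Desktop, Claude Code, or an SDK) builds this envelope for you; the agent just sees `ps_validate` as another tool.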
## How It Compares to Gateway Approaches
There are other projects working on AI agent governance. The approaches differ architecturally:
Network proxy / gateway pattern (e.g., Lasso Guard, MintMCP): These sit between the agent and the MCP server as a network intermediary. They intercept traffic at the transport layer, inspecting requests as they pass through. This is similar to how API gateways work in traditional architectures.
In-process governance pattern (PromptSpeak): PromptSpeak operates inside the agent's tool ecosystem as an MCP server itself. The agent calls PromptSpeak tools directly. The validation pipeline runs in the same process context, with access to agent state, behavioral history, and drift baselines.
Tradeoffs:
| Aspect | Gateway Pattern | In-Process Pattern |
|---|---|---|
| Setup | Separate proxy process | MCP server config |
| Latency | Network hop overhead | Sub-millisecond (in-process) |
| Agent state access | Limited (sees requests) | Full (behavioral baselines, drift history) |
| Multi-agent coordination | Harder (stateless proxy) | Native (shared state) |
| Deployment | Standalone service | Co-located with agent |
Neither approach is universally better. Gateways are good when you need to govern agents you don't control. In-process governance is better when you need deep behavioral monitoring and want to avoid network overhead.
## What It Doesn't Do (Yet)
In the interest of honesty: PromptSpeak is early-stage and has real limitations:
- No npm package. You clone and build from source. Packaging is planned but not done.
- TypeScript/Node.js only. If your stack is Python, you'll need to run it as a separate process.
- No GUI dashboard. Configuration is code and JSON. A visual dashboard is not yet built.
- Symbolic frame system has a learning curve. The frame encoding is compact but unfamiliar. Natural language translation is included but it's an extra step.
- Single-node only. No distributed deployment, no clustering. It runs on one machine.
The core governance pipeline is solid -- 951 tests, sub-millisecond latency, production-validated architecture. But the developer experience around setup and configuration needs work.
## The Bigger Picture
MCP is growing fast. Agents are getting access to more tools, more APIs, more real-world systems. The governance conversation needs to shift from "how do we log what agents did" to "how do we prevent agents from doing things they shouldn't."
Pre-execution governance is one answer to that question. It's not the only answer, and it works best as part of a layered strategy (combine it with post-hoc auditing, prompt engineering, and operational monitoring). But it fills a gap that nothing else covers: deterministic, sub-millisecond blocking of agent actions before they execute.
If you're building agents that touch anything you care about -- production systems, financial data, user-facing services -- you should think about what happens when the agent makes a bad call. And you should decide whether you want to find out from a log entry or from a held operation waiting for your approval.
PromptSpeak is open source: github.com/chrbailey/promptspeak
Built by Christopher Bailey -- 25+ years in enterprise systems. Questions, issues, and contributions welcome.