Alex Garden

Building Proactive AI Agent Governance: Policy Engines in the Request Pipeline

It’s becoming increasingly clear to me that the world needs governance for complex, highly autonomous AI systems such as self-driving vehicles. But current governance systems all have one thing in common: they react after something has already happened. They record everything to a log file in the vague hope that someone will eventually read it and spot a pattern. This reactive approach is like a bank that records every transaction but has no preventative controls to stop a fraudulent one: you only find out something went wrong after the damage is done.

I want to give you a preview of a new system we are building at Mnemom. The core idea is to shift governance earlier in the request pipeline, before an agent actually acts.

The Monitoring vs Governance Problem

Most AI governance solutions are like security cameras. They show who did what, alert on unusual activity, and sometimes surface dashboards to help executives decide on next steps. By then it’s usually too late: the damage is already done.

This creates several problems:

  • Reactive response: Damage happens before detection
  • Human bottleneck: Every decision requires human review
  • Configuration confusion: Tool additions look like policy violations
  • Trust erosion: Once a score drops, it stays dropped

What we really needed was a firewall: a set of rules that prevents bad actions from happening in the first place.

The Semantic Gap Challenge

The core technical challenge is the semantic gap. Agent alignment cards declare capabilities in a human-friendly format:

{
  "bounded_actions": ["inference", "read", "write", "web_fetch"]
}

But actual tools have specific, implementation-dependent names:

mcp__browser__navigate
execute_python_code
search_web
read_file

There has always been a disconnect between the semantic intent of an action (web_fetch) and the way that action is actually performed (mcp__browser__navigate). Looking at traces, an observer has to guess which card action a given tool call corresponds to, and that guess can lead to all sorts of wrong conclusions.

Policy DSL: Bridging Intent and Implementation

Our solution is a policy Domain Specific Language that explicitly maps semantic capabilities to tool patterns:

capability_mappings:
  web_browsing:
    tools:
      - "mcp__browser__*"
    card_actions:
      - "web_fetch"
      - "web_search"

  file_reading:
    tools:
      - "mcp__filesystem__read*"
      - "mcp__filesystem__list*"
    card_actions:
      - "read_file"

forbidden:
  - pattern: "mcp__filesystem__delete*"
    reason: "File deletion not permitted"
    severity: "critical"
  - pattern: "mcp__shell__*"
    reason: "Shell execution not permitted"
    severity: "high"

defaults:
  unmapped_tool_action: "warn"
  grace_period_hours: 24

This is more than configuration; it is a governance contract. The alignment card describes our intent, the policy maps that intent to the platform, and the engine is the enforcement mechanism.
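To make the contract concrete, here is a minimal sketch of how a verdict could be derived from mappings like the ones above. The shapes and names (`CapabilityPolicy`, `evaluateTool`) are illustrative, not the actual @mnemom/smoltbot API:

```typescript
type Verdict = { action: "allow" | "warn" | "reject"; reason?: string };

interface CapabilityPolicy {
  capabilityMappings: Record<string, { tools: string[]; cardActions: string[] }>;
  forbidden: { pattern: string; reason: string; severity: string }[];
  defaults: { unmappedToolAction: "warn" | "reject" };
}

// Convert a glob like "mcp__browser__*" into an anchored regex.
const globToRegex = (glob: string): RegExp =>
  new RegExp(
    "^" +
      glob
        .split("*")
        .map((s) => s.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join(".*") +
      "$"
  );

function evaluateTool(
  policy: CapabilityPolicy,
  tool: string,
  cardActions: string[]
): Verdict {
  // Forbidden patterns are checked first and always win.
  for (const rule of policy.forbidden) {
    if (globToRegex(rule.pattern).test(tool)) {
      return { action: "reject", reason: rule.reason };
    }
  }
  // Allow the tool if a mapped capability covers it AND the card declares
  // at least one of that capability's semantic actions.
  for (const cap of Object.values(policy.capabilityMappings)) {
    const toolMatches = cap.tools.some((p) => globToRegex(p).test(tool));
    const cardCovers = cap.cardActions.some((a) => cardActions.includes(a));
    if (toolMatches && cardCovers) return { action: "allow" };
  }
  // Unmapped tools fall through to the configured default (e.g. grace-period warn).
  return { action: policy.defaults.unmappedToolAction, reason: `Unmapped tool: ${tool}` };
}
```

The same lookup works for an alignment card, a live request, or a trace, which is what makes the three-checkpoint design below possible.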

Three-Checkpoint Architecture

The same policy engine evaluates the same rules at three different points:

1. CI/CD (Prevention)

Check that the agent’s tool manifest is consistent with the alignment card and policy it is being deployed with:

npx @mnemom/smoltbot policy evaluate --card card.json --tools manifest.json

This check will fail the build if there is a policy mismatch. The idea is to catch misconfigured agents before they are deployed.

2. Gateway (Enforcement)

This makes life easier for both DevOps and developers: the gateway integrates into the Kestrel request pipeline, extracts the names of the tools invoked in each API request, and validates them in real time.

// Supports Anthropic, OpenAI, Gemini formats
const verdict = await policyEngine.evaluate({
  toolNames: extractToolNames(request.body),
  agentId: request.headers['x-agent-id'],
  mode: 'enforce' // or 'warn', 'off'
});

if (verdict.action === 'reject') {
  return Response.json(
    { error: verdict.reason },
    { status: 403, headers: { 'X-Policy-Verdict': 'rejected' }}
  );
}

Policy evaluation runs concurrently with quota allocation, adding no latency to the hot path.
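A minimal sketch of that concurrency pattern, with hypothetical stand-ins for the policy and quota calls: both checks start immediately, so total wall time is the max of the two, not the sum.

```typescript
// Illustrative only: evaluatePolicy and resolveQuota are assumed callables,
// not actual gateway functions.
async function gatewayChecks(
  evaluatePolicy: () => Promise<{ action: string }>,
  resolveQuota: () => Promise<{ allowed: boolean }>
): Promise<{ status: number }> {
  // Start both checks concurrently and join on the results.
  const [verdict, quota] = await Promise.all([evaluatePolicy(), resolveQuota()]);
  if (verdict.action === "reject") return { status: 403 }; // policy rejection wins
  if (!quota.allowed) return { status: 429 };              // quota exhausted
  return { status: 200 };                                   // request may proceed
}
```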

3. Observer (Verification)

After execution, traces are compared against the same policy rules. Together, we call these three steps the Auditing Process, and we distinguish between:

  • Card gaps: Tools the agent uses that aren’t in its card (configuration error)
  • Behavior gaps: Tools the agent uses that the policy explicitly forbids (policy violation)

The distinction matters for what happens next.

Trust Recovery: Learning from Configuration Errors

Before this system, a common scenario would break trust scores permanently:

  1. Developer adds new MCP tool to agent
  2. Agent starts using the tool
  3. Observer flags UNBOUNDED_ACTION violations
  4. Mnemom Trust Rating drops
  5. Developer fixes alignment card
  6. Trust score stays low (damage was permanent)

The violations were technically valid, but they stemmed from a configuration gap rather than agent misbehavior. Our new system addresses this:

// When violations are reclassified from behavior gaps to card gaps
POST /v1/agents/{agent_id}/reclassify

Reclassification triggers:

  • Trust score recalculation (violations excluded)
  • Downstream agent score updates (transitive trust recovery)
  • Proof reissuance (if affected violations were in proven sessions)
  • OTel event emission with before/after scores

Grace periods help here: if the gateway encounters a tool that is not in the inventory, that tool gets a 24-hour grace period (configurable) before it is reported as a violation.
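A minimal in-memory sketch of the grace-period logic (in production this state lives in Redis with TTL expiry, as noted under Implementation Notes below); the class and method names are illustrative:

```typescript
class GracePeriodTracker {
  private firstSeen = new Map<string, number>();

  // Inject the clock so the behavior is testable; defaults to wall time.
  constructor(private graceMs: number, private now: () => number = Date.now) {}

  // Returns true when a sighting of this tool should be reported as a violation.
  shouldReport(agentId: string, tool: string): boolean {
    const key = `${agentId}:${tool}`;
    const seen = this.firstSeen.get(key);
    if (seen === undefined) {
      this.firstSeen.set(key, this.now()); // first sighting starts the grace window
      return false;
    }
    return this.now() - seen > this.graceMs; // report only after the window expires
  }
}
```

The window gives the developer time to update the alignment card before the trust score is touched at all.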

Intelligence Layer: Predicting Multi-Agent Failures

Detection tells you what happened. Prediction tells you what will happen.

We built an intelligence layer on top of N-way coherence analysis that:

  1. **Extracts fault lines** from agent value conflicts
  2. **Forecasts risks** for specific team compositions
  3. **Recommends policies** to mitigate predicted failures
  4. **Applies transaction-scoped guardrails** without permanent changes

Example API flow:

// Find where agents conflict
const faultLines = await fetch('/v1/teams/fault-lines', {
  method: 'POST',
  body: JSON.stringify({ agents: ['agent-a', 'agent-b', 'agent-c'] })
});

// Predict failure modes for a task
const forecast = await fetch('/v1/teams/forecast', {
  method: 'POST',
  body: JSON.stringify({
    agents: ['agent-a', 'agent-b'],
    context: 'data_processing_pipeline'
  })
});

// Get policy recommendation with cryptographic proof
const recommendation = await fetch('/v1/teams/recommend-policy', {
  method: 'POST',
  body: JSON.stringify({
    team: 'data-team',
    context: 'high_value_transaction'
  })
});

The recommendation is attested with a STARK proof spanning the entire derivation chain: fault-line extraction, risk forecast, policy generation, and expected enforcement outcomes.

Implementation Notes

Performance Considerations

  • Policy evaluation is parallelized with quota resolution
  • Tool name extraction supports streaming request bodies
  • Grace period lookups use Redis with TTL expiry
  • Reclassification uses BFS traversal with 50-agent cap
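The capped BFS behind transitive trust recovery can be sketched as follows, assuming a simple adjacency map of which agents trust which; the function name and graph shape are assumptions, not the shipped code:

```typescript
function collectDownstream(
  start: string,
  edges: Map<string, string[]>, // agent -> agents that transitively trust it
  cap = 50                       // bound the recomputation blast radius
): string[] {
  const visited = new Set<string>([start]);
  const queue: string[] = [start];
  const affected: string[] = [];
  while (queue.length > 0 && affected.length < cap) {
    const current = queue.shift()!;
    for (const next of edges.get(current) ?? []) {
      if (!visited.has(next) && affected.length < cap) {
        visited.add(next);
        affected.push(next); // this agent's score must be recalculated
        queue.push(next);
      }
    }
  }
  return affected;
}
```

BFS order means the agents closest to the reclassified one are recomputed first, and the cap keeps a pathological trust graph from turning one reclassification into an unbounded recomputation.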

Security Design

  • Policy DSL uses safe YAML subset (no code execution)
  • Tool patterns use glob matching with escape sequence validation
  • STARK proofs use SP1 guest programs with deterministic execution
  • On-chain anchoring uses ERC-8004 aligned contracts
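The glob-matching safety property can be illustrated with a defensive pattern compiler: every non-wildcard character is escaped, so a pattern from the policy file can never inject regex syntax. A sketch under those assumptions, not the shipped implementation:

```typescript
function compileToolPattern(glob: string): RegExp {
  // Reject control characters outright rather than trying to escape them.
  if (/[\x00-\x1f]/.test(glob)) throw new Error("invalid pattern");
  const escaped = glob
    .split("*")
    // Escape all regex metacharacters in the literal segments.
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    // Restrict the wildcard to identifier characters, since tool names
    // like mcp__browser__navigate are identifiers.
    .join("[A-Za-z0-9_]*");
  return new RegExp(`^${escaped}$`);
}
```

Anchoring with `^…$` matters too: without it, `mcp__shell__*` would match any string that merely contains that prefix.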

Integration Points

// Express middleware
app.use('/v1/agents', policyEnforcement({ mode: 'enforce' }));

// Next.js API route
export default async function handler(req, res) {
  const verdict = await evaluatePolicy(req);
  if (verdict.action === 'reject') {
    return res.status(403).json({ error: verdict.reason });
  }
  // Continue with agent request...
}

// Fastify plugin
await fastify.register(require('@mnemom/policy-plugin'), {
  mode: 'warn',
  gracePeriodHours: 48
});

What This Enables

Moving governance into the request pipeline fundamentally changes what's possible:

  • Proactive enforcement instead of reactive monitoring
  • Configuration error recovery instead of permanent trust damage
  • Predictive risk management for multi-agent teams
  • Cryptographically verified policy recommendations
  • Zero-latency policy evaluation in production

The engine that validated your build in CI/CD is the same engine enforcing at your production gateway and auditing your observability traces. Same rules at every checkpoint: no drift, no gaps, no surprises.

Getting Started

Governance shouldn’t obstruct the autonomy your agents were designed for in the first place. If you’re planning to run large teams of AI agents and need governance that doesn’t kill their autonomy in the process:

# Generate starter policy
smoltbot policy init

# Validate existing agents
smoltbot policy evaluate --card ./alignment-card.json --tools ./tool-manifest.json

# Test policy against live agents
curl -X POST https://api.mnemom.ai/v1/policies/evaluate \
-H "Content-Type: application/json" \
-d '{"agentId": "your-agent", "tools": ["tool-name"]}'

The full technical specification and integration guides are in our docs.


Originally published on mnemom.ai
