Same question, three answers: a governed MCP server with receipts

#ai #llm #mcp #python

Ask my agent "what's the open pipeline for Acme Corp?" as an admin and it answers $125,000 across two deals, with a table. Ask the exact same question as a support agent and it says, politely and correctly, that it can't see pipeline data and suggests who to ask instead.

The model didn't decide that. It never gets the chance to. That's the whole project.

The problem

Give an AI agent tool access to company data and you get two questions you can't dodge:

Who is the agent acting as? A support rep must not get answers the human behind the keyboard isn't allowed to see.
How do you know it behaved? "Seemed fine in testing" doesn't survive a security review.

I kept seeing these two questions in every AI-infra and forward-deployed engineering job posting, so I built a complete answer and put it on the public internet: Warden, a governed MCP server with an agent, traces, and evals on top. You can fire a real (rate-limited) agent run yourself.

Three ideas worth stealing

1. Governance lives outside the model. The role comes from session identity (the MCP server reads it at spawn, like OAuth scopes). Every read passes through one GovernedStore choke point that applies resource access, region row-scoping, and field redaction before the model sees a byte. Prompting harder widens nothing, because there's nothing on the model's side of the wall to widen.
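A minimal sketch of what such a choke point could look like, not Warden's actual code. The `Policy` shape, role names, regions, field names, and the sample rows (chosen so the two deals total $125,000) are all illustrative assumptions; only the `GovernedStore` name and the three checks come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Policy:
    resources: frozenset            # resources the role may read at all
    regions: Optional[frozenset]    # None means every region is visible
    redacted: frozenset             # fields masked before rows leave the store

# Hypothetical policy table, keyed by the session's role.
POLICIES = {
    "admin": Policy(frozenset({"accounts", "deals"}), None, frozenset()),
    "support": Policy(frozenset({"accounts"}), frozenset({"us"}), frozenset({"arr"})),
}

# Hypothetical sample data.
TABLES = {
    "accounts": [
        {"name": "Acme Corp", "region": "us", "arr": 90_000},
        {"name": "Globex", "region": "eu", "arr": 40_000},
    ],
    "deals": [
        {"account": "Acme Corp", "region": "us", "amount": 75_000},
        {"account": "Acme Corp", "region": "us", "amount": 50_000},
    ],
}

class GovernedStore:
    """Single read path: the role is fixed at construction, never per call."""

    def __init__(self, tables, role):
        self._tables = tables
        self._policy = POLICIES[role]
        self.role = role

    def read(self, resource):
        p = self._policy
        # 1. Resource access: deny before touching any rows.
        if resource not in p.resources:
            return {"access_denied": True, "resource": resource, "role": self.role}
        # 2. Region row-scoping.
        rows = [r for r in self._tables[resource]
                if p.regions is None or r["region"] in p.regions]
        # 3. Field redaction.
        rows = [{k: ("[redacted]" if k in p.redacted else v) for k, v in r.items()}
                for r in rows]
        return {"access_denied": False, "rows": rows}

admin = GovernedStore(TABLES, "admin")
support = GovernedStore(TABLES, "support")
```

Because the role is bound when the store is built, a tool handler has no argument through which the model could request a different role.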

2. The eval oracle has to obey the rules too. This was the design moment of the build. If your reference answers come from the raw database, then a correctly denied answer scores as a failure: the support agent honestly says "I can't see pipeline" and your eval compares that to $125,000 and marks it wrong. So Warden's oracle computes ground truth through the same governance layer as the agent. An honest denial becomes a passing grade. The judge is then a stronger model than the one answering (Opus judging Sonnet), anchored to that reference. Unanchored LLM judges grade on vibes; anchored ones measure. All 12 cases pass, and the scorecard is public.
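The trap can be shown in a few lines. This is a sketch under stated assumptions: the data, the one-rule policy, and every function name are hypothetical, and `grade` is a plain equality check standing in for the anchored LLM judge.

```python
# Hypothetical data: two open Acme deals totalling $125,000.
DEALS = [
    {"account": "Acme Corp", "amount": 75_000},
    {"account": "Acme Corp", "amount": 50_000},
]
CAN_READ_DEALS = {"admin": True, "support": False}  # assumed one-rule policy

def governed_deals(role):
    """Stand-in for the governance layer: deny first, then return rows."""
    if not CAN_READ_DEALS[role]:
        return {"access_denied": True, "resource": "deals"}
    return {"access_denied": False, "rows": DEALS}

def raw_oracle(account):
    """Naive reference: reads the data directly and ignores the role."""
    total = sum(d["amount"] for d in DEALS if d["account"] == account)
    return {"kind": "value", "total": total}

def governed_oracle(role, account):
    """Reference computed through the same layer the agent reads through."""
    result = governed_deals(role)
    if result["access_denied"]:
        return {"kind": "denied"}
    total = sum(d["amount"] for d in result["rows"] if d["account"] == account)
    return {"kind": "value", "total": total}

def grade(agent_answer, reference):
    """Equality check; a real run would hand both to a judge model."""
    return agent_answer == reference

# The support agent honestly reports that it cannot see pipeline data.
support_answer = {"kind": "denied"}
```

Graded against `raw_oracle`, the honest denial fails; graded against `governed_oracle("support", ...)`, it passes, while the admin reference still carries the real total.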

3. Denials are data, not error strings. Tools return a structured access_denied object. That's what lets the eval layer check "did the agent report the limit honestly instead of guessing," and it's what makes the same-question-three-roles diff page work.
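One way a structured denial might look, as a hedged sketch. Only the `access_denied` name comes from the post; the other fields, the `get_pipeline` tool, and the dollar-sign heuristic in `reported_honestly` are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class AccessDenied:
    # Hypothetical fields; the post only says the object is structured.
    resource: str
    role: str
    reason: str
    type: str = "access_denied"

def get_pipeline(role, account):
    """Hypothetical tool: returns data for admins, a denial object otherwise."""
    if role != "admin":
        return asdict(AccessDenied("pipeline", role, "role lacks pipeline access"))
    return {"type": "result", "account": account, "open_pipeline": 125_000}

def reported_honestly(tool_result, agent_text):
    """Eval-side check: after a denial, the answer must not state a dollar figure.

    A crude heuristic standing in for a real judge.
    """
    if tool_result.get("type") != "access_denied":
        return True
    return "$" not in agent_text

denied = get_pipeline("support", "Acme Corp")
```

Because the denial is a typed object and not a string, the eval layer can branch on `type` without parsing prose, and a diff view can line up the three roles' results field by field.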

Every run also emits real OpenTelemetry spans (GenAI semantic conventions) that the dashboard replays as a timeline, with the enforcing role stamped on every tool result.

What bit me

Markdown tables from the agent rendered as pipe-soup until I learned Tailwind's prose classes silently do nothing without @tailwindcss/typography, and react-markdown needs remark-gfm for tables at all. Found it by clicking the deployed site, not in the build.
The official MCP Python SDK ships FastMCP at mcp.server.fastmcp. The standalone fastmcp package is a different thing. Know which one you're importing.
A public endpoint that burns real model tokens forces the unglamorous work: per-IP limits keyed on the CDN's forwarded header, a global daily budget, a single-flight lock, hard timeouts. That's the difference between a demo and a toy.
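The first three guards from that last point can be sketched with the standard library alone. This is an illustration, not Warden's implementation: the `RunGuard` class, its limits, and the `x-forwarded-for` header name are assumptions (the post does not say which header the CDN sets, and the value is only trustworthy if the CDN overwrites it). Hard timeouts and the daily reset are left out.

```python
import threading
import time
from collections import defaultdict, deque

class RunGuard:
    """Gate for an endpoint where every accepted request costs model tokens."""

    def __init__(self, per_ip=3, window_s=3600, daily_budget=200):
        self.per_ip = per_ip
        self.window_s = window_s
        self.daily_budget = daily_budget
        self._hits = defaultdict(deque)   # ip -> timestamps of accepted runs
        self._spent = 0                   # runs accepted today (reset not shown)
        self._state = threading.Lock()    # protects the counters above
        self._flight = threading.Lock()   # single-flight: one agent run at a time

    def try_start(self, headers, now=None):
        now = time.time() if now is None else now
        # Assumed header; take the first hop as the client address.
        ip = headers.get("x-forwarded-for", "unknown").split(",")[0].strip()
        with self._state:
            hits = self._hits[ip]
            while hits and now - hits[0] >= self.window_s:
                hits.popleft()            # drop runs that fell out of the window
            if len(hits) >= self.per_ip:
                return "rate_limited"
            if self._spent >= self.daily_budget:
                return "budget_exhausted"
            if not self._flight.acquire(blocking=False):
                return "busy"             # rejected runs are not counted
            hits.append(now)
            self._spent += 1
            return "ok"

    def finish(self):
        """Call when the run ends, including on error, to release the lock."""
        self._flight.release()
```

A handler would call `try_start(request_headers)`, run the agent only on `"ok"`, and call `finish()` in a `finally` block so a crashed run cannot hold the lock forever.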

Links

Live console: warden.alexlaguardia.dev
Source: github.com/AlexlaGuardia/warden
Full build write-up: alexlaguardia.dev/writing/warden

Built solo as a working answer to "how do you let an agent touch real data without trusting it blindly?" If you're building agents over data someone cares about, the choke point, the governance-aware oracle, and structured denials all carry straight over.

Top comments (1)

Max Quimby • Jun 13

The "governance outside the model, one GovernedStore choke point" decision is the part I wish more people internalized — once enforcement sits on the model's side of the wall, every prompt-injection write-up becomes whack-a-mole. Routing the eval oracle through the same layer is the subtler win; that "a correct denial scores as failure" trap catches almost everyone who bolts evals on after the fact. One leak vector worth stress-testing: the shape of a denial. If access_denied comes back faster, or structurally different, than a genuine empty result ("no pipeline for Acme" vs "you can't see pipeline"), a probing agent can infer the existence of records it isn't allowed to read — an oracle of absence. Constant-time, constant-shape denials close that. The other is caching: if GovernedStore ever memoizes a read with a key that omits the enforcing role, that's a cross-role leak waiting to happen. Did role end up in the cache key, or do you redact post-cache?