Ask my agent "what's the open pipeline for Acme Corp?" as an admin and it answers $125,000 across two deals, with a table. Ask the exact same question as a support agent and it says, politely and correctly, that it can't see pipeline data and suggests who to ask instead.
The model didn't decide that. It never gets the chance to. That's the whole project.
The problem
Give an AI agent tool access to company data and you get two questions you can't dodge:
- Who is the agent acting as? A support rep must not get answers the human behind the keyboard isn't allowed to see.
- How do you know it behaved? "Seemed fine in testing" doesn't survive a security review.
I kept seeing these two questions in every AI-infra and forward-deployed engineering job posting, so I built a complete answer and put it on the public internet: Warden, a governed MCP server with an agent, traces, and evals on top. You can fire a real (rate-limited) agent run yourself.
Three ideas worth stealing
1. Governance lives outside the model. The role comes from session identity (the MCP server reads it at spawn, like OAuth scopes). Every read passes through one GovernedStore choke point that applies resource access, region row-scoping, and field redaction before the model sees a byte. Prompting harder widens nothing, because there's nothing on the model's side of the wall to widen.
2. The eval oracle has to obey the rules too. This was the design moment of the build. If your reference answers come from the raw database, then a correctly denied answer scores as a failure: the support agent honestly says "I can't see pipeline" and your eval compares that to $125,000 and marks it wrong. So Warden's oracle computes ground truth through the same governance layer as the agent. An honest denial becomes a passing grade. The judge is then a stronger model than the one answering (Opus judging Sonnet), anchored to that reference. Unanchored LLM judges grade on vibes; anchored ones measure. 12/12 cases pass, and the scorecard is public.
3. Denials are data, not error strings. Tools return a structured access_denied object. That's what lets the eval layer check "did the agent report the limit honestly instead of guessing," and it's what makes the same-question-three-roles diff page work.
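Here is a minimal sketch of how the three ideas fit together. It is not Warden's actual code. The GovernedStore name and the access_denied shape come from the list above; everything else (POLICY, RAW, open_pipeline, grade, the field names, and the $75,000/$50,000 split of the two deals) is invented for illustration. The exact-match grade function is a stand-in for the anchored LLM judge.

```python
# Hypothetical policy table: which resources, regions, and fields each role may see.
POLICY = {
    "admin": {"resources": {"deals", "tickets"}, "regions": None, "redact": set()},
    "support": {"resources": {"tickets"}, "regions": {"EMEA"}, "redact": {"email"}},
}

# Made-up rows. Only the $125,000 total across two deals comes from the article.
RAW = {
    "deals": [
        {"account": "Acme Corp", "region": "NA", "amount": 75_000, "stage": "open"},
        {"account": "Acme Corp", "region": "EMEA", "amount": 50_000, "stage": "open"},
    ],
    "tickets": [
        {"id": 1, "region": "EMEA", "email": "a@example.com", "subject": "Login loop"},
        {"id": 2, "region": "NA", "email": "b@example.com", "subject": "Billing"},
    ],
}


def access_denied(resource, role):
    """A denial is a structured object, not an error string."""
    return {
        "type": "access_denied",
        "resource": resource,
        "role": role,
        "reason": f"role '{role}' may not read '{resource}'",
    }


class GovernedStore:
    """Single choke point: every read is filtered before the model sees it."""

    def __init__(self, raw, policy, role):
        # The role is fixed at construction (session identity), never taken from a prompt.
        self._raw = raw
        self._rules = policy[role]
        self.role = role

    def read(self, resource):
        rules = self._rules
        # 1. Resource access
        if resource not in rules["resources"]:
            return access_denied(resource, self.role)
        rows = self._raw[resource]
        # 2. Region row-scoping (None means all regions)
        if rules["regions"] is not None:
            rows = [r for r in rows if r["region"] in rules["regions"]]
        # 3. Field redaction
        rows = [{k: v for k, v in r.items() if k not in rules["redact"]} for r in rows]
        return {"type": "rows", "role": self.role, "rows": rows}


def open_pipeline(store, account):
    """Used by both the agent's tool and the eval oracle."""
    result = store.read("deals")
    if result["type"] == "access_denied":
        return result
    deals = [d for d in result["rows"] if d["account"] == account and d["stage"] == "open"]
    return {"type": "answer", "total": sum(d["amount"] for d in deals), "deals": len(deals)}


def grade(agent_output, role, account):
    """Oracle: compute the reference through the same governance layer as the agent."""
    reference = open_pipeline(GovernedStore(RAW, POLICY, role), account)
    return agent_output == reference  # stand-in for the anchored LLM judge
```

With this setup, a support-role agent that returns the structured denial passes grade, while one that somehow "answers" $125,000 fails it. An oracle built on the raw rows would get both of those backwards.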
Every run also emits real OpenTelemetry spans (GenAI semantic conventions) that the dashboard replays as a timeline, with the enforcing role stamped on every tool result.
What bit me
- Markdown tables from the agent rendered as pipe-soup until I learned Tailwind's prose classes silently do nothing without @tailwindcss/typography, and react-markdown needs remark-gfm for tables at all. Found it by clicking the deployed site, not in the build.
- The official MCP Python SDK ships FastMCP at mcp.server.fastmcp. The standalone fastmcp package is a different thing. Know which one you're importing.
- A public endpoint that burns real model tokens forces the unglamorous work: per-IP limits off the CDN's forwarded header, a global daily budget, a single-flight lock, hard timeouts. That's the difference between a demo and a toy.
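The admission checks in that last item can be sketched roughly as follows, using only the standard library. This is an illustration under assumptions, not Warden's implementation: RunGate, client_ip, the limits, and the X-Forwarded-For header name are all hypothetical, and which header (and which entry in it) is trustworthy depends on your CDN. Hard timeouts are left out; they belong around the agent run itself.

```python
import threading
import time
from collections import defaultdict, deque


def client_ip(headers, header_name="X-Forwarded-For"):
    """Take the first address from the forwarded header.

    Assumption: the CDN sets or overwrites this header. If it only appends,
    the first entry is client-supplied and can be spoofed.
    """
    value = headers.get(header_name, "")
    return value.split(",")[0].strip() or "unknown"


class RunGate:
    """Admission control for an endpoint that spends real model tokens."""

    def __init__(self, per_ip_limit=3, window_s=3600, daily_budget=50, clock=time.time):
        self.per_ip_limit = per_ip_limit
        self.window_s = window_s
        self.daily_budget = daily_budget
        self._clock = clock
        self._hits = defaultdict(deque)   # ip -> timestamps of admitted runs
        self._day = None                  # current day bucket (days since epoch)
        self._spent = 0                   # runs admitted today, across all IPs
        self._state = threading.Lock()    # guards the counters above
        self._flight = threading.Lock()   # single-flight: one run at a time

    def try_acquire(self, ip):
        """Return (ok, reason). When ok is True the caller must call release()."""
        now = self._clock()
        with self._state:
            # Reset the global budget when the day rolls over.
            day = int(now // 86400)
            if day != self._day:
                self._day, self._spent = day, 0
            # Drop this IP's timestamps that have left the sliding window.
            hits = self._hits[ip]
            while hits and now - hits[0] >= self.window_s:
                hits.popleft()
            if len(hits) >= self.per_ip_limit:
                return False, "ip_rate_limited"
            if self._spent >= self.daily_budget:
                return False, "daily_budget_exhausted"
            # Non-blocking: reject instead of queueing behind a run in progress.
            if not self._flight.acquire(blocking=False):
                return False, "run_in_progress"
            hits.append(now)
            self._spent += 1
            return True, "ok"

    def release(self):
        """Call in a finally block once the agent run ends or times out."""
        self._flight.release()
```

A handler would call try_acquire(client_ip(request_headers)) before starting a run and release() in a finally block. The state here is in-memory and per-process, so a multi-worker deployment would need a shared store for the counters and the lock.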
Links
- Live console: warden.alexlaguardia.dev
- Source: github.com/AlexlaGuardia/warden
- Full build write-up: alexlaguardia.dev/writing/warden
Built solo as a working answer to "how do you let an agent touch real data without trusting it blindly?" If you're building agents over data someone cares about, the choke point, the governance-aware oracle, and structured denials all carry straight over.