Agents Lie. That's the Problem.
Here's a truth most multi-agent frameworks won't tell you: AI agents lie. They'll report success when they failed. They'll confirm they followed your guidelines while silently violating them. They'll tell you everything is fine — and it isn't.
I run 40+ autonomous agents that manage everything from family logistics to content pipelines to client projects. They make thousands of decisions daily without human oversight. The only reason this works is that I stopped trusting context-level instructions and started governing at the action layer.
Most "governance" in the AI agent space means adding more instructions, more context, more tokens — more suggestions that the model may or may not follow. That's not governance. That's hope. True governance means deterministic control over the actions an agent can take, plus the ability to steer behavior strategically and verifiably.
The 3-Layer Governance Framework
After months of iteration — and plenty of spectacular failures — I settled on a three-layer architecture that separates what you suggest, what you enforce, and what you deny:

The governance pattern: deny the raw way, provide a governed tool, steer the agent toward it
Layer 1: Instructions (Steering)
Instructions are suggestions. They guide the agent toward the right path without wasting tokens on trial-and-error. Think of them as guardrails in a bowling alley — they keep the ball roughly on track, but they don't guarantee a strike.
What belongs here:
- Style preferences and conventions
- Decision frameworks for ambiguous situations
- Workflow sequences ("do A before B")
- Communication tone and formatting rules
The limitation: Instructions are probabilistic. A model might follow them 95% of the time, but at scale that 5% failure rate adds up fast. When an agent makes 200 decisions per session, a 5% miss rate works out to roughly 10 violations per session on average, so you'll hit instruction violations in practically every run.
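The arithmetic behind that claim can be checked in a few lines. This is a minimal sketch that assumes each decision follows the instruction independently with the same probability, which is a simplification; the function name is illustrative.

```python
# Assumption: every decision independently follows the instruction
# with probability p_follow. Real agents are unlikely to be this tidy.
def violation_stats(p_follow: float, decisions: int) -> tuple[float, float]:
    """Return (expected violations, probability of at least one violation)."""
    expected = (1.0 - p_follow) * decisions
    p_any = 1.0 - p_follow ** decisions
    return expected, p_any


expected, p_any = violation_stats(0.95, 200)
print(f"expected violations per session: {expected:.1f}")
print(f"P(at least one violation): {p_any:.5f}")
```

Under these assumptions, the chance of a completely clean 200-decision session is well under one in ten thousand.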
Layer 2: Extensions (Deterministic Tools)
When you need something done right every time, you define it as a tool. Extensions replace free-form agent behavior with deterministic workflows that produce consistent results regardless of model temperature, prompt drift, or context window overflow.
The pattern: Deny the raw way → define the governed way → steer the agent toward the governed tool.
Here's a real example from my system: I don't let agents run raw git commit. Instead, I built a dev_commit extension tool that enforces commit message formatting, adds co-author trailers, validates branch protection, and logs the operation. The agent calls one tool, and all four governance concerns are handled automatically.
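To make the shape of such a tool concrete, here is a rough sketch of what a dev_commit-style wrapper could look like. It is not the actual implementation: the protected branch names, the message pattern, the co-author trailer, and the injectable `runner` parameter are all assumptions made for illustration.

```python
import logging
import re
import subprocess
from typing import Callable, Sequence

log = logging.getLogger("dev_commit")

# Hypothetical policy values; the real ones are not given in this article.
PROTECTED_BRANCHES = {"main", "release"}
MESSAGE_PATTERN = re.compile(r"^(feat|fix|docs|chore|refactor|test): \S.{0,70}$")
CO_AUTHOR = "Co-authored-by: Agent <agent@example.com>"


class GovernanceError(Exception):
    """Raised when a commit request violates policy."""


def _run_git(argv: Sequence[str]) -> None:
    subprocess.run(list(argv), check=True)


def dev_commit(
    message: str,
    branch: str,
    runner: Callable[[Sequence[str]], object] = _run_git,
) -> list[str]:
    """Validate the request, log it, then run one governed `git commit`."""
    if branch in PROTECTED_BRANCHES:
        raise GovernanceError(f"direct commits to '{branch}' are not allowed")
    if not MESSAGE_PATTERN.match(message):
        raise GovernanceError("commit message must look like 'type: summary'")
    # A second -m becomes its own paragraph, which is where git expects trailers.
    argv = ["git", "commit", "-m", message, "-m", CO_AUTHOR]
    log.info("dev_commit branch=%s message=%r", branch, message)
    runner(argv)
    return argv
```

Passing `runner` as a parameter keeps the policy checks testable without touching a real repository; the default shells out to git.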
What belongs here:
- Workflows that require multiple coordinated steps
- Operations with side effects (file writes, API calls, deployments)
- Processes that need audit trails or consistent formatting
- Anything where "close enough" isn't good enough
Layer 3: Hookflows (Deny/Block)
Hookflows are the immune system. They fire deterministically on every tool call — before execution — and can deny, modify, or gate any action. The agent never gets a chance to make the mistake because the action is blocked at the infrastructure level.
What belongs here:
- Security boundaries (no secrets in outputs, no raw API calls)
- Brand protection rules (never mention competitors negatively)
- Data governance (no writes to protected files without extension tools)
- Safety-critical operations (never state a child's location without a staleness caveat)
The key insight: hookflows are the only layer that provides zero-trust guarantees. Instructions can be ignored. Tools can be misused. But a pre-execution hook that denies a tool call? That's physics, not suggestion.
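A pre-execution hook can be sketched as a function that inspects every tool call and returns a denial reason or nothing. The names below (`ToolCall`, `execute`, `deny_raw_git_commit`, the `shell` tool) are hypothetical and do not come from any particular agent framework.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict[str, Any]


# A hook returns a denial reason, or None to allow the call.
Hook = Callable[[ToolCall], Optional[str]]


class ToolDenied(Exception):
    """Raised when a hook blocks a tool call before it runs."""


def deny_raw_git_commit(call: ToolCall) -> Optional[str]:
    command = str(call.args.get("command", ""))
    # Deliberately naive match; a real hook would parse the command properly.
    if call.tool == "shell" and command.strip().startswith("git commit"):
        return "raw git commit is blocked; use the dev_commit tool"
    return None


def execute(
    call: ToolCall,
    hooks: list[Hook],
    tools: dict[str, Callable[..., Any]],
) -> Any:
    """Run every hook before the tool; the first denial stops execution."""
    for hook in hooks:
        reason = hook(call)
        if reason is not None:
            raise ToolDenied(reason)
    return tools[call.tool](**call.args)
```

The denial happens in the dispatcher rather than in the prompt, so it holds no matter what the model decided to attempt. The prefix check here would miss variations such as a chained command, which is why the matching logic deserves more care than this sketch gives it.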
The Decision Framework
When I encounter a new governance requirement, I run it through this decision tree:

How to choose the right governance layer for each new requirement
| Question | Answer → Layer |
|---|---|
| Is this an activity you don't want happening? | → Hookflow (deny) |
| Is this something that must be done correctly every time? | → Extension tool + hookflow to block the raw way + instruction to steer toward the tool |
| Is this a non-deterministic judgment call (taste, review, prioritization)? | → Instructions only |
The token waste problem illustrates why you need all three layers working together. If I only use a hookflow to block git commit, the agent wastes tokens attempting it, receiving the denial, then figuring out an alternative. Adding an instruction ("always use dev_commit instead of raw git") prevents the wasted attempt. The hook remains as the safety net for when instructions fail — and they will fail.
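The table above is small enough to encode directly, which can help when triaging a backlog of requirements. This is only an illustrative sketch; the enum and function names are made up for the example.

```python
from enum import Enum, auto


class Requirement(Enum):
    UNWANTED_ACTIVITY = auto()           # something that should not happen
    MUST_BE_CORRECT_EVERY_TIME = auto()  # needs a consistent result on every run
    JUDGMENT_CALL = auto()               # taste, review, prioritization


# Mirrors the decision table row by row.
LAYERS: dict[Requirement, list[str]] = {
    Requirement.UNWANTED_ACTIVITY: ["hookflow (deny)"],
    Requirement.MUST_BE_CORRECT_EVERY_TIME: [
        "extension tool",
        "hookflow (block the raw way)",
        "instruction (steer toward the tool)",
    ],
    Requirement.JUDGMENT_CALL: ["instruction"],
}


def layers_for(requirement: Requirement) -> list[str]:
    """Return the governance layers the table assigns to a requirement."""
    return LAYERS[requirement]


print(layers_for(Requirement.MUST_BE_CORRECT_EVERY_TIME))
```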
Autonomy Without Anarchy: The Escalation Model
Governance isn't just about blocking — it's about knowing when agents should act freely versus when they should escalate. My framework uses a filter-based approach:
Act autonomously when:
- The action has a deterministic tool governing it
- The action is within the agent's declared domain
- The action is reversible or low-stakes
Escalate when:
- The agent isn't confident in its decision
- The action crosses domain boundaries
- The action is irreversible and high-stakes (major purchases, medical decisions, data deletion)
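The escalation conditions above can be expressed as a simple filter. This sketch assumes that any action matching none of the escalation conditions is allowed to proceed; the field names and the 0.8 confidence threshold are hypothetical, and the confidence value is the agent's own estimate, so it is better treated as a hint than as proof.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    domain: str
    reversible: bool
    high_stakes: bool
    confidence: float  # hypothetical 0..1 self-estimate from the agent


def should_escalate(
    action: ProposedAction,
    agent_domain: str,
    min_confidence: float = 0.8,
) -> bool:
    """Return True if any escalation condition applies."""
    if action.confidence < min_confidence:
        return True  # the agent isn't confident in its decision
    if action.domain != agent_domain:
        return True  # the action crosses domain boundaries
    if not action.reversible and action.high_stakes:
        return True  # irreversible and high-stakes
    return False
```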
The scale challenge is real — you can't review everything. My solution: review agents that review other agents, with continuous augmentation to the governance layer based on what the review agents find. It's quality assurance all the way down, with humans only entering the loop for genuinely novel situations.
The Hard-Won Lesson: Proof Over Trust
The most expensive architectural mistake I made was relying on context to enforce correctness. Context-heavy governance is fragile because:
- Context windows overflow — long-running agents lose early instructions
- Model updates change behavior — what worked with one model version may not work with the next
- Agents confabulate — they'll generate convincing confirmation of actions they never took
The fix: require proof that a workflow was executed. Design the system so that certain content can appear in an agent's response only if a known deterministic flow produced it. I built cryptographic approval gates as a proof-of-concept for this pattern: digital signatures that prove a human or review process actually approved an action, not just that the agent claims it was approved.
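As a rough illustration of an approval gate, the sketch below uses an HMAC from Python's standard library. Note the substitution: an HMAC is a shared-secret message authentication code, not a public-key digital signature, and it stands in here only to keep the example dependency-free. The function names and the action fields are assumptions. The key has to live with the approver and the verifying hook, never in the agent's context.

```python
import hashlib
import hmac
import json


def sign_approval(key: bytes, action: dict) -> str:
    """Approver side: produce a tag bound to the exact action contents."""
    # Canonical JSON so the same action always serializes to the same bytes.
    payload = json.dumps(action, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def is_approved(key: bytes, action: dict, tag: str) -> bool:
    """Gate side: accept the action only if the tag matches its contents."""
    expected = sign_approval(key, action)
    # Constant-time comparison on bytes avoids timing leaks.
    return hmac.compare_digest(expected.encode(), tag.encode())


key = b"held-by-the-approver-only"
action = {"tool": "deploy", "target": "prod"}
tag = sign_approval(key, action)
print(is_approved(key, action, tag))
print(is_approved(key, {"tool": "deploy", "target": "staging"}, tag))
```

An agent that merely claims approval cannot produce a valid tag, and a tag issued for one action does not carry over to a modified one.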
This is the same principle behind the Cloud Security Alliance's Agentic Trust Framework: zero-trust governance applied to AI agents, where trust is verified through evidence rather than assumed through instructions.
The Governance Maturity Model
If you're building governed agent systems from scratch, here's the progression I recommend:

The governance maturity progression — from simple context steering to self-improving meta-governance
Level 1: Context-Level Steering
Master the ability to articulate guardrails and document them effectively. Write clear instructions. Learn what the model follows reliably and what it doesn't. This is where 90% of builders stay — and it works fine for simple, single-agent systems.
Graduate when: You notice the agent NOT following instructions consistently. That's your signal that context-level governance has reached its ceiling.
Level 2: Simple Deterministic Guards
Add basic hookflows — deny patterns that should never happen (secrets in output, writes to protected paths). These are your first zero-trust guarantees.
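A first deny pattern for secrets in output might look like the following sketch. The patterns are illustrative only and far from exhaustive, and the function names are invented for the example.

```python
import re

# Illustrative patterns only; a real deployment would maintain its own list.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # shape of an AWS access key ID
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)\b(?:api[_-]?key|token|password)\s*[:=]\s*\S+"),
]


def contains_secret(text: str) -> bool:
    """Return True if any known secret pattern appears in the text."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)


def guard_output(text: str) -> str:
    """Pass clean text through unchanged; block anything that looks like a secret."""
    if contains_secret(text):
        raise PermissionError("output blocked: possible secret detected")
    return text
```

Pattern matching like this produces both false positives and misses, so it works best as a floor under the other layers and not as the only control.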
Level 3: Governed Tool Workflows
Replace free-form behaviors with extension tools. This is the highest-leverage layer — you're not just blocking bad actions, you're making the right action the only action.
Level 4: Adaptive Governance
Policies that learn. When a new failure mode emerges, the governance layer updates itself — new hookflows, new tool constraints, updated instructions. The system gets stronger from every mistake. Research on runtime governance for AI agents is formalizing this as "policies on paths" — adaptive policy selection based on execution state.
Level 5: Meta-Governance
Governance of the governance layer itself. Review agents that audit your hookflows. Quality agents that validate your extension tools still work correctly. Meta-governance architectures are emerging as the frontier for multi-agent system safety — and in my experience, you need them sooner than you think.
What This Looks Like in Practice
My production system runs with 60+ reusable skills, 44 extension tools, and a growing set of hookflows governing 40+ agents. The layered approach means:
- New agents inherit governance automatically (hooks fire on all tool calls)
- Common mistakes are impossible (not just discouraged)
- Quality improves with scale (more review data → better review agents)
- The system is auditable (per-turn evaluation provides dynamic governance at runtime)
Microsoft's Agent Governance Toolkit and Azure's Cloud Adoption Framework for AI agents suggest that enterprises are moving in the same direction: policy-driven, auditable, layered governance rather than monolithic prompt engineering.
The Bottom Line
If your AI governance strategy is "write better prompts," you're building on sand. Prompts are suggestions. Governance is infrastructure.
Start with instructions to steer cheaply. Graduate to hookflows when instructions fail. Build extension tools when you need workflows done right every time. And never, ever trust an agent's self-report — require deterministic proof that the right thing happened.
The maturity curve applies here too: early governance feels like overhead. Mature governance feels like freedom — because you can grant agents more autonomy when you have confidence in the guardrails underneath.
Your agents will lie to you. Build systems that make that lie impossible.