
Logan for Waxell

Originally published at waxell.ai

Microsoft Agent Governance Toolkit vs Waxell: Toolkit vs Platform

On April 2, 2026, Microsoft released the Agent Governance Toolkit — an open-source library for enforcing policy on AI agent actions before they execute. It is the first tool from a major platform vendor that takes the governance problem seriously at the runtime layer, and it's a significant piece of engineering: sub-millisecond policy evaluation, post-quantum cryptography already shipped, and a 9,500+ test corpus with continuous fuzzing.

If you're evaluating agent governance infrastructure, AGT belongs in the conversation. This post is for the enterprise architect who has read the announcement and is now asking the natural follow-up question: where does this fit, what does it cover, and what does it leave open?

The answer matters because "governance" means different things at different layers of the stack. AGT solves one well-defined version of the problem. Waxell solves the full version. Understanding the boundary between them is how you make the right infrastructure decision.


TL;DR

| | AGT | Waxell |
|---|---|---|
| Product type | Open-source library | Hosted SaaS platform |
| Governance timing | Pre-execution only | Pre, mid, and post-execution |
| Agent scope | Framework-attached agents | External agents, framework agents, agentic runtime |
| Policy management | Developer-authored YAML, code deployment required | Dynamic engine — non-technical users, runtime injection |
| Policy categories | Open-ended rule authoring | 26 structured policy categories with scoping |
| Incident disposition | Allow / deny | Warn, block, or redact — scoped per category |
| Data layer governance | Tool call level | Tool call + database + vector database (Signals / Domains) |
| Observability | Audit log + flight recorder | Full span-level tracing, RunEdge causal DAG |
| Cost tracking | None | Per-call, per-user, per-tenant, with BudgetLedger enforcement |
| Durable execution | Saga orchestrator (in-session only) | Suspend, resume, human gates across session boundaries |
| Multi-language | Python, TS, .NET, Rust, Go | Python |
| Post-quantum crypto | Ed25519 + ML-DSA-65 | Ed25519 |
| OWASP attestation CLI | Yes (agt verify) | No equivalent CLI |
| Built on | Threat model and whitepaper | Millions of production agentic executions |

What AGT Is

AGT is structured as a monorepo of nine independently installable packages. The core components:

Agent OS is the policy engine. It intercepts agent tool calls before they execute and evaluates declarative rules written in YAML, OPA/Rego, or Cedar. Microsoft's own numbers: 0.012ms on a single rule, 0.029ms at 100 rules. The evaluation happens in-process, in the same Python (or TS/.NET/Rust/Go) runtime as your agent.
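The control flow is simple to picture in code. The sketch below is illustrative only: the rule shape, tool names, and the evaluate function are invented stand-ins, not AGT's actual API.

```python
# Illustrative pre-execution policy check: intercept a proposed tool call,
# evaluate declarative deny rules in-process, and return a verdict before
# anything executes. Names and rule shape are hypothetical, not AGT's API.

RULES = [
    # deny file writes that escape the sandbox
    {"tool": "file_write", "deny_if": lambda args: not args["path"].startswith("/sandbox/")},
    # deny shell execution outright
    {"tool": "shell_exec", "deny_if": lambda args: True},
]

def evaluate(tool: str, args: dict) -> str:
    """Return 'allow' or 'deny' for a proposed tool call."""
    for rule in RULES:
        if rule["tool"] == tool and rule["deny_if"](args):
            return "deny"
    return "allow"

print(evaluate("file_write", {"path": "/sandbox/out.txt"}))  # allow
print(evaluate("file_write", {"path": "/etc/passwd"}))       # deny
```

In AGT the rules live in declarative files and the engine is heavily optimized; the point here is only the shape of the contract: the verdict is computed before the call fires, and nothing after that moment is governed.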

AgentMesh handles agent identity using the SPIFFE/SVID standard — the same cryptographic workload identity model used for service-to-service mTLS across cloud-native infrastructure. Messages between agents are encrypted with the Signal protocol. AgentMesh also provides the infrastructure for the kill switch: a signal that propagates across the mesh and halts a target agent.

Agent Runtime implements four privilege rings, a saga orchestrator for multi-step rollback, and the kill switch. Privilege rings control what classes of action an agent can take; the saga orchestrator ensures that if a multi-step workflow fails partway through, the completed steps can be reversed.
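The saga pattern behind that rollback guarantee is worth seeing concretely. This is a generic Python sketch of saga-style compensation, assuming nothing about AGT's actual interfaces:

```python
# Saga-style rollback sketch: each completed step registers a compensating
# action; on failure, completed steps are reversed in LIFO order.
# Function names and the step list are illustrative.

def run_saga(steps):
    """steps: list of (action, compensate) callable pairs."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()  # undo completed work, newest first
        raise

log = []
def failing_step():
    raise RuntimeError("step 3 failed")

try:
    run_saga([
        (lambda: log.append("reserve"), lambda: log.append("release")),
        (lambda: log.append("charge"),  lambda: log.append("refund")),
        (failing_step,                  lambda: log.append("noop")),
    ])
except RuntimeError:
    pass

print(log)  # ['reserve', 'charge', 'refund', 'release']
```

Because compensation runs in reverse order, the refund lands before the release: the workflow unwinds the way it wound up.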

Agent SRE provides reliability engineering primitives: SLO definitions, chaos injection hooks for testing, and circuit breakers that can pause an agent workflow when error rates breach a threshold.

Agent Compliance ships the agt verify CLI, which maps your agent stack against the OWASP Agentic Top 10 and generates a signed attestation on every deployment.

Agent Hypervisor does reversibility verification: before a potentially irreversible action executes, the Hypervisor checks whether it can be undone. Actions that can't be reversed are blocked or require explicit override.

Agent Discovery scans processes, configs, and repositories for AI agents that haven't been registered in your governance system — the "shadow agent" detection problem.

Agent Marketplace handles plugin lifecycle management — Ed25519 signing, verification, trust-tiered capability gating, and supply-chain security for third-party agent plugins.

Agent Lightning provides governance for reinforcement learning training: policy-enforced RL runners and reward shaping that enforces zero policy violations during training.

The multi-language support is real and broad: Python has full support; TypeScript, .NET, Rust, and Go have subsets. The test corpus is 9,500+ tests with ClusterFuzzLite fuzzing running continuously against the policy engine. Post-quantum cryptography is already shipped: agent identities are signed with both Ed25519 and ML-DSA-65.

AGT is also clear about what it does not do. From the documentation: "This is not a prompt guardrail or content moderation system. It governs agent actions, not LLM inputs or outputs." The policy engine runs in-process — AGT's own documentation recommends container isolation as a compensating control for higher-risk deployments. Workflow-level policies and intent declaration are on the roadmap but not yet available.


What Waxell Is

Waxell is a hosted, multi-tenant SaaS platform built across three product planes.

Runtime provides durable execution primitives for AI agents: spawn sub-agents, suspend for arbitrary durations, wait for human approval, resume after a signal or timer. The Envelope state machine checkpoints every agent run to Postgres after each await. If the worker process crashes, the run resumes automatically from the last checkpoint — no deterministic replay required.

Observe is the distributed tracing and cost layer. It auto-instruments 157 libraries at process start via pip install waxell-observe[all] — LangChain, CrewAI, AutoGen, the Anthropic SDK, the OpenAI SDK, and 151 others. Every LLM call produces a span with token counts, latency, model, and cost. Every tool call is recorded with its arguments and output. The RunEdge DAG links every spawn, signal, resume, and cross-session bridge causally — so when Agent A spawns Agent B which calls a tool that triggers Agent C, the full causal chain is browsable in the trace explorer.
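A span record of that kind is easy to picture. The field names and per-token prices below are hypothetical, not waxell-observe's actual schema:

```python
# Span-record sketch: one record per LLM call with tokens, latency, model,
# and computed cost. Field names and per-token prices are assumptions,
# not waxell-observe's actual schema.

MODEL_PRICE = {"gpt-4o": {"in": 2.50e-6, "out": 10.00e-6}}  # assumed $/token

def llm_span(model: str, tokens_in: int, tokens_out: int, latency_ms: int, parent=None):
    price = MODEL_PRICE[model]
    return {
        "kind": "llm_call", "model": model, "parent": parent,
        "tokens_in": tokens_in, "tokens_out": tokens_out,
        "latency_ms": latency_ms,
        "cost_usd": round(tokens_in * price["in"] + tokens_out * price["out"], 6),
    }

span = llm_span("gpt-4o", tokens_in=1200, tokens_out=300, latency_ms=840)
print(span["cost_usd"])  # 0.006
```

With every span carrying its own cost and a parent pointer, per-user and per-tenant attribution becomes an aggregation query rather than a reporting project.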

The Dynamic Policy Engine is the governance layer. Unlike AGT's static YAML deployment model, Waxell's policy engine is injectable at runtime without redeployment. Waxell ships 26 policy categories — covering data handling, cost, tool access, output content, identity, inter-agent communication, and more — each with scoping controls. Policy assignment is dynamic: different agents and fleets can run under different policy sets. The incident disposition model mirrors cloud infrastructure security: warn, block, or redact, scoped per category. A compliance officer can push a policy change through the platform UI without opening a terminal or filing a deployment ticket.

The Governed Data Access Layer extends policy enforcement beyond tool calls to the data retrieval layer. Waxell's Signals and Domains schema lets teams declare which agents can access which data sources, at what granularity, under what conditions. Enforcement happens at the retrieval boundary — before the data enters the agent's context — closing the gap that tool-call governance alone cannot close.


Where AGT Has an Advantage

Multi-language support. AGT ships working policy enforcement for Python, TypeScript, .NET, Rust, and Go. If your agent fleet isn't Python-only, AGT is the policy layer that works for your whole stack. Waxell is currently Python SDK only.

Post-quantum cryptography, shipped. AGT signs agent identity with both Ed25519 and ML-DSA-65 (CRYSTALS-Dilithium). For organizations with a post-quantum compliance timeline, that checkbox is already ticked. Waxell's AXID uses Ed25519; ML-DSA-65 is on the roadmap.

Formal OWASP attestation. AGT ships the agt verify CLI, which produces a signed attestation mapping your deployment against all ten OWASP Agentic Top 10 risk categories. Both AGT and Waxell are built on the same underlying standards — the difference is the formal mapping and the auditable CLI artifact. If your compliance team needs that specific deliverable, AGT produces it.

Test depth and fuzzing. 9,500+ tests plus continuous ClusterFuzzLite fuzzing against the policy engine. For security-critical deployments where test coverage is an auditable artifact, that corpus is a meaningful signal.

Open-source, no vendor dependency. AGT is Apache 2.0 licensed. No usage cost, no API key, no hosted infrastructure. If your organization's policy is to not depend on external SaaS for security-critical functions, AGT's deployment model is compatible with that constraint in a way Waxell's is not.

Microsoft distribution. If your organization runs Azure, Semantic Kernel, or AutoGen, AGT ships with native adapters and Microsoft's distribution behind it.


Where Waxell Has the Advantage

The Execution Arc

The most fundamental difference is when governance applies.

AGT governs the pre-execution moment: a tool call is about to fire, the policy engine evaluates it, and the outcome is allow or deny. If the call is allowed, AGT has done its job. Everything that happens after that — what the agent does mid-run, what it outputs, what the next agent in the chain receives — is outside AGT's enforcement surface.

Waxell governs the full arc. Pre-execution policy evaluation works the same way. But Waxell also enforces mid-execution: BudgetLedger tracks spend across the entire spawn tree in real time and can halt a run the moment a cost threshold is crossed, not just at the next discrete tool call. Human review gates suspend a run mid-execution until a reviewer acts. And Waxell governs post-execution: output gates, audit closure, causal graph completion.

The six failure modes that matter in production — runaway loops, scope creep, data leakage, hallucination-in-action, prompt injection, and cascade failures — are not primarily pre-execution failures. They unfold during execution. A policy that only fires before a tool call can't stop a loop that's unfolding across turns. It can't gate output before it reaches the next agent in a chain. It can't enforce a review step between what the agent decided and what the agent dispatched.

The Dynamic Policy Engine

AGT's policy model is static: YAML, OPA/Rego, or Cedar rules deployed in a policies/ directory. Changing a policy means editing a file, running tests, and deploying. That is a developer task. Every policy change goes through the engineering queue.

Waxell's policy engine is dynamic. The 26 policy categories are structured around the actual violation types that surface in production — each with scoping controls that let compliance and security teams configure enforcement without writing code. Policies are injectable at runtime. AGT makes governance an engineering concern. Waxell makes it an organizational concern.

The incident disposition model adds another dimension AGT doesn't have. Where AGT's enforcement is binary (allow or deny), Waxell's options are warn, block, or redact — scoped per policy category. A tool call that trips a budget threshold can generate a warning and route to human review before hard blocking. A response containing sensitive data can be redacted before it reaches the next agent in the chain rather than halting the run entirely. Proportionate response is how mature security infrastructure works.
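A minimal sketch of per-category dispositions, with hypothetical category names and a toy email redaction pattern standing in for Waxell's actual schema:

```python
import re

# Per-category incident disposition sketch (warn / block / redact).
# Category names and the redaction pattern are hypothetical stand-ins.

DISPOSITIONS = {
    "cost_threshold": "warn",
    "tool_access":    "block",
    "pii_in_output":  "redact",
}

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def apply_disposition(category: str, payload: str):
    mode = DISPOSITIONS.get(category, "block")   # default to the strictest
    if mode == "warn":
        return ("warn", payload)                 # deliver, but raise an incident
    if mode == "redact":
        return ("redact", EMAIL.sub("[REDACTED]", payload))
    return ("block", None)                       # halt delivery entirely

print(apply_disposition("pii_in_output", "contact: jane@example.com"))
# ('redact', 'contact: [REDACTED]')
```

The run keeps moving when the violation is survivable and stops when it isn't, which is the proportionate-response property described above.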

The Data Layer

AGT governs tool calls. It has no mechanism to enforce policy on what data an agent retrieves. An agent with permission to call a search or retrieval function can surface anything those systems return — the tool call is allowed, the policy was satisfied, and the governance layer never sees what the agent is about to read.

Waxell's Signals and Domains schema closes this gap at the retrieval boundary. Declare which agents can access which data sources, at what granularity, under what conditions. Enforcement happens before the data enters the agent's context. An agent can be perfectly well-governed at the AGT tool-call level and still exfiltrate data through an unguarded retrieval path. The governed data access layer is the answer to that.
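What enforcement at the retrieval boundary means in practice can be sketched in a few lines. The grant table and field names here are hypothetical stand-ins, not the actual Signals / Domains schema:

```python
# Retrieval-boundary enforcement sketch: check the (agent, source) pair
# against a declared grant table, then filter rows before they enter the
# agent's context. Table shape and field names are invented.

GRANTS = {
    ("support-agent", "tickets_db"): {"granularity": "row", "condition": "tenant_match"},
    ("support-agent", "vector_kb"):  {"granularity": "chunk", "condition": None},
}

def retrieve(agent: str, source: str, query, tenant: str, fetch):
    grant = GRANTS.get((agent, source))
    if grant is None:
        raise PermissionError(f"{agent} has no grant for {source}")
    rows = fetch(query)
    if grant["condition"] == "tenant_match":
        # enforce before the data reaches the model context
        rows = [r for r in rows if r.get("tenant") == tenant]
    return rows

rows = retrieve("support-agent", "tickets_db", "open tickets", "acme",
                lambda q: [{"tenant": "acme", "id": 1}, {"tenant": "other", "id": 2}])
print(rows)  # only the acme row survives
```

The tool call itself never needed to be denied; the cross-tenant row simply never reached the agent.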

Observability and Causal Lineage

AGT's observability surface is an audit log of policy events and a flight recorder for post-mortem policy replay. Both are valuable. Neither is a span-level trace.

When debugging a production incident — an agent that ran for 40 turns, consumed significant budget across LLM calls, spawned several sub-agents, and then failed — AGT tells you which policy rule fired. Waxell tells you every turn, every token, every tool call, every spawn edge, in a browsable causal graph.

Production agent failures rarely announce themselves through policy violations. Policy violations are rare by design — they're the catch, not the signal. The failures that actually hurt — cost overruns, reasoning regressions, emergent behavior — don't trigger any rule. They only become visible in spans.

The RunEdge DAG goes further: when Agent A spawns Agent B which calls a tool that triggers Agent C across a different session boundary, the full causal chain is recorded with typed edge kinds (spawn, signal_fire, domain_callback, cross_session_bridge). An incident that traces back through four spawn levels across three sessions is navigable in the UI in under a minute. A sequential audit log can't answer "what caused this?" — it can only answer "what happened?"
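Answering "what caused this?" is a walk up parent edges. A minimal sketch, using the edge kinds named above but an invented storage shape:

```python
# Causal lineage sketch over typed edges, using the edge kinds named in
# the text. Storage shape and run IDs are invented.

EDGES = [  # (parent_run, child_run, kind)
    ("run-A", "run-B", "spawn"),
    ("run-B", "run-C", "signal_fire"),
    ("run-C", "run-D", "cross_session_bridge"),
]

def causal_chain(run_id: str) -> list:
    """Walk parent edges back to the root: 'what caused this run?'"""
    parents = {child: (parent, kind) for parent, child, kind in EDGES}
    chain = [run_id]
    while chain[0] in parents:
        parent, kind = parents[chain[0]]
        chain.insert(0, f"--{kind}-->")
        chain.insert(0, parent)
    return chain

print(" ".join(causal_chain("run-D")))
# run-A --spawn--> run-B --signal_fire--> run-C --cross_session_bridge--> run-D
```

A flat audit log stores the same events but not the parent pointers, which is why it can answer "what happened?" without being able to answer "what caused this?".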

Cost Tracking and BudgetLedger

AGT has no model cost table, no token aggregation, and no billing attribution. This is documented scope, not a criticism. But cost attribution becomes a real operational need the moment you run agents at scale on behalf of customers.

Waxell's SystemModelCost records every LLM call with tokens and cost. ModelCostOverride handles custom model endpoints. Pass a session ID and you get per-user, per-tenant cost attribution without building the reporting layer yourself.

The BudgetLedger primitive adds enforcement: it's a real-time, tree-scoped cost ledger that agents can query mid-run. A policy rule that says "block this tool call if the spawn tree has spent over $10" queries the live BudgetLedger as its condition. The policy team writes the threshold; the enforcement is real-time.
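A minimal sketch of the idea: one ledger shared by the whole spawn tree, queried as a live policy condition. Class and method names are illustrative, not Waxell's SDK:

```python
# Tree-scoped budget ledger sketch: every run in a spawn tree debits one
# shared ledger, and a policy rule queries it before the next tool call.
# Class and method names are assumptions, not Waxell's SDK.

class BudgetLedger:
    def __init__(self, limit_usd: float):
        self.limit = limit_usd
        self.spent = 0.0

    def debit(self, usd: float):
        """Record spend from any run in the spawn tree."""
        self.spent += usd

    def allows(self, projected_usd: float = 0.0) -> bool:
        """Policy condition: would this spend stay within the tree limit?"""
        return self.spent + projected_usd <= self.limit

tree = BudgetLedger(limit_usd=10.0)
tree.debit(6.50)   # parent agent's LLM calls
tree.debit(3.25)   # sub-agent's LLM calls

print(tree.allows(0.10))  # True: still under the $10 limit
print(tree.allows(1.00))  # False: this call would cross it
```

The threshold is a policy value; the condition is evaluated against live spend, not a stale report.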

Durable Execution

AGT's saga orchestrator handles multi-step rollback within a single session. It doesn't provide suspend-for-days. The use cases that need cross-session durable execution — wait for payment confirmation, pause for human approval, nightly batch workflows that sleep until an event — aren't addressable with a saga orchestrator.

Waxell Runtime provides native durable execution with no determinism requirement. Agent state checkpoints to Postgres after each await. If the worker crashes during a sleep, the run resumes automatically from the last checkpoint when a worker comes back up.
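The checkpoint-resume mechanic can be sketched with a JSON file standing in for Postgres. Everything here is illustrative, not Waxell's implementation:

```python
import json, os, tempfile

# Checkpoint-resume sketch: persist run state after every step so a crashed
# worker resumes from the last checkpoint instead of replaying. A JSON file
# stands in for Postgres; names and shapes are invented.

CKPT = os.path.join(tempfile.gettempdir(), "waxell-sketch-run-42.json")
if os.path.exists(CKPT):
    os.remove(CKPT)                     # start this demo from a clean slate

def checkpoint(state):
    with open(CKPT, "w") as f:
        json.dump(state, f)

def load():
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return json.load(f)
    return {"step": 0, "results": []}

def run(steps, crash_at=None):
    state = load()                      # resume from the last checkpoint
    for i in range(state["step"], len(steps)):
        if i == crash_at:
            raise RuntimeError("worker died")
        state["results"].append(steps[i]())
        state["step"] = i + 1
        checkpoint(state)               # durable after every step
    return state["results"]

steps = [lambda: "fetched", lambda: "approved", lambda: "dispatched"]
try:
    run(steps, crash_at=2)              # crashes before the final step
except RuntimeError:
    pass
print(run(steps))                       # resumes and finishes from step 2
```

Because the state is reloaded rather than replayed, the steps need no determinism guarantee; the second invocation simply picks up at the recorded step index.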

External Agent Coverage

AGT instruments agents running inside its supported frameworks. It has no surface for agents running in external environments — developer tooling, CI pipelines, third-party integrations — that operate outside framework instrumentation.

Waxell's external agent observability covers these cases via the Waxell installer, which drops configuration that routes structured events from external agents into the same governance surface as your framework-built agents. All three contexts — external agents, framework agents, and the agentic runtime — appear under one observability plane with unified attribution.


The Trust Boundary Question

AGT's own documentation is direct about the same-process model: the policy engine and the agent run in the same Python process. If a compromised dependency patches the evaluation function to always return allow, an in-process check can't detect it. AGT recommends container isolation per agent as a compensating control.

This is an honest statement of a real trade-off, not a weakness unique to AGT. Every in-process policy library faces the same boundary.

Waxell's out-of-process enforcement for domain endpoints works differently: for risky actions, the agent SDK sends an intent to a server-side endpoint. The server re-verifies the AXID, re-checks policy, and debits the BudgetLedger before returning. The agent process cannot bypass this by patching the SDK — it cannot proceed without the server response. The enforcement is strong for the action classes covered by domain endpoints; it is not a full OS isolation layer either.
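The trust property is easiest to see in a sketch: the token is minted with a key only the platform holds, and the policy decision runs server-side. HMAC stands in for Ed25519 here, and all names are invented:

```python
import hmac, hashlib

# Out-of-process enforcement sketch: the platform mints a run token with a
# key the agent process never holds; the server re-verifies the token and
# re-checks policy before a risky action proceeds. HMAC stands in for
# Ed25519; all names are illustrative, not Waxell's API.

PLATFORM_KEY = b"held-by-the-platform-only"

def mint_axid(run_id: str) -> str:
    """Platform side: mint a signed run token at run start."""
    sig = hmac.new(PLATFORM_KEY, run_id.encode(), hashlib.sha256).hexdigest()
    return f"{run_id}.{sig}"

def server_authorize(axid: str, action: str) -> bool:
    """Server side: re-verify the token, then apply server-held policy."""
    run_id, _, sig = axid.partition(".")
    expected = hmac.new(PLATFORM_KEY, run_id.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False                    # tampered or forged token
    return action != "delete_production_db"   # policy lives server-side

axid = mint_axid("run-1")
print(server_authorize(axid, "send_email"))             # True
print(server_authorize(axid, "delete_production_db"))   # False
```

The agent can patch its local SDK however it likes; without a valid server response for the intent, the action never proceeds.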

For internal agent workloads with vetted dependencies and container-per-agent deployment, AGT's same-process enforcement is sufficient. For regulated environments where an external audit trail of enforcement is required, or for workloads running third-party tools, out-of-process enforcement for the highest-risk actions is the more defensible architecture.


Decision Framework

AGT is the right choice when:

  • Your agent stack includes TypeScript, .NET, Rust, or Go and needs cross-language policy enforcement
  • You have a near-term post-quantum compliance audit requirement
  • Your compliance team needs a signed OWASP Agentic Top 10 attestation produced by a CLI
  • Your organization has a policy against external SaaS dependencies for security-critical functions
  • You're already standardized on Microsoft infrastructure and want native integration

Waxell is the right choice when:

  • You need governance across the full execution arc — pre, mid, and post-execution
  • Policy changes need to happen without engineering deployments — compliance and security teams need to own their own policies
  • Your risk surface includes data retrieval, not just tool dispatch
  • You need hosted observability, cost tracking, and causal lineage without building the infrastructure
  • Your agents need durable execution across session boundaries
  • Your agent fleet spans external environments that can't run framework adapters

Frequently Asked Questions

Is AGT a replacement for Waxell?
No. AGT is a pre-execution policy enforcement library. Waxell is a hosted platform for running, observing, and governing agents across the full execution arc. AGT doesn't ship observability, cost tracking, durable execution, or data layer governance. The gap between them is real and documented.

Does Waxell have its own policy layer?
Yes — and it goes further than a companion library. Waxell's dynamic policy engine supports 26 policy categories with scoping controls, runtime-injectable policies, per-agent and per-fleet policy assignment, and warn/block/redact disposition options. It's manageable by non-technical users directly through the platform UI.

What is AXID?
AXID (Agent Execution Identity) is an Ed25519-signed JWT minted per run by Waxell. It carries tenant ID, agent slug, run ID, delegated sub-user ID, spawn-chain parent AXID, and a 5-minute TTL. It's attached as an HTTP header on every outbound action. AXID is distinct from AGT's AgentMesh/SPIFFE identity: SPIFFE identifies the service; AXID identifies the specific run and its causal chain.
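A hypothetical sketch of the claims listed above and the TTL check; the real AXID is an Ed25519-signed JWT, which this plain dict does not attempt to model:

```python
import time

# Hypothetical AXID claim set and TTL check. Field names mirror the claims
# described in the text; the signing step is omitted entirely.

def mint_axid_claims(tenant, agent, run_id, parent=None, sub_user=None):
    now = int(time.time())
    return {
        "tenant_id": tenant, "agent_slug": agent, "run_id": run_id,
        "sub_user_id": sub_user, "parent_axid": parent,
        "iat": now, "exp": now + 300,   # 5-minute TTL
    }

def is_live(claims) -> bool:
    return int(time.time()) < claims["exp"]

claims = mint_axid_claims("acme", "billing-agent", "run-7")
print(is_live(claims))  # True within the 5-minute window
```

The parent_axid claim is what makes the spawn chain reconstructible from tokens alone: each run's identity points at the run that caused it.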

If AGT only covers pre-execution, what happens to mid-run failures?
They unfold undetected until they surface as an outcome. A loop that's building across turns, a spawn tree that's accumulating cost, an agent that retrieved data it shouldn't have — none of these trigger a pre-execution policy rule. Waxell's mid-execution enforcement and continuous span-level tracing are designed specifically for this category of failure.

What does "toolkit vs platform" mean in practice?
A toolkit is a set of components you integrate and operate. A platform is a service that manages infrastructure on your behalf. AGT requires you to deploy, configure, upgrade, and maintain it. Waxell is a SaaS product: you instrument, the platform operates the rest. The distinction matters most when you're evaluating build vs. buy on observability, cost attribution, durable execution, and data layer governance — all of which you'd need to build and operate yourself on top of AGT alone.

Is AGT production-ready?
Microsoft labels it Public Preview as of April 2026, version 3.2.2. The test corpus and performance numbers suggest it's production-ready for early adopters. AGT's own documentation recommends container isolation for higher-risk workloads given the same-process trust boundary.


Waxell is the hosted platform for running, observing, and governing AI agents in production. Built on millions of agentic executions. See the platform overview or book a reference architecture review.


