Imagine this scenario: An AI agent is asked to “clean up old records,” and it interprets “old” as “everything older than today.” There is no policy engine to intercept the action, no approval workflow to pause and ask a human, and no kill switch to stop it mid-execution. The agent has been given unrestricted tool access — the equivalent of handing a new employee the root password on their first day and saying, “figure it out.”
This hypothetical illustrates a real and growing concern. As AI agents have evolved from simple chatbots into autonomous systems that book flights, execute trades, write code, and manage infrastructure, a gap has emerged: Many of the popular frameworks that power these agents focus on orchestration and have not yet built in runtime security governance. Frameworks like LangChain, AutoGen, and CrewAI do an excellent job of orchestrating agent behavior, but the industry as a whole is still developing answers to a fundamental question: What happens when an agent does something it shouldn’t?
Note: The Agent Governance Toolkit is currently available as a community preview release. Packages published to PyPI and npm are not official Microsoft-signed releases; official signed packages via ESRP Release will follow in a future release. All security policy rules and detection patterns ship as configurable sample configurations that users must review and customize before production use.
That question sent me down a path that eventually became the Agent Governance Toolkit — an open-source framework, now released by Microsoft, that brings operating system-level security concepts to the world of AI agents. In this article, I walk through the problem we set out to solve, the architectural decisions that shaped our approach, and the technical details of how we built a system that enforces policy, verifies identity, isolates execution, and engineers reliability for autonomous AI agents.
The problem: AI agents operate in a security vacuum
To understand why agent governance matters, consider how a typical AI agent works today. A developer writes a prompt, connects a set of tools (database access, web browsing, file system operations), and hands control to an LLM. The agent reasons about what to do, selects tools, and executes actions — often in a loop, sometimes spawning sub-agents to handle subtasks.
The challenge is that in many current implementations, agent actions are unmediated. When an agent calls a tool, there is typically no security layer checking whether that call is within policy. There is often no identity verification when one agent communicates with another. There may be no resource limit preventing an agent from making 10,000 API calls in a minute. And there is frequently no circuit breaker to stop a failing agent from cascading failures across a system.
In February 2026, OWASP published the Agentic AI Top 10 (see owasp.org/www-project-agentic-ai-threats), the first formal taxonomy of risks specific to autonomous AI agents. The list highlights serious concerns for anyone running agents in production: goal hijacking through prompt injection, tool misuse, identity abuse, memory poisoning, cascading failures, and rogue agents. My team realized that addressing these risks required more than a guardrail library. It required a fundamentally new abstraction layer.
The insight: What if we treated AI agents like processes?
The key insight came from an analogy that seems obvious in hindsight: operating systems already solved this problem decades ago.
In the 1970s, when multi-user computing emerged, engineers faced a similar challenge: multiple untrusted programs sharing resources on a single machine. The solution they developed was the OS kernel — a privileged layer that mediates every interaction between a process and the outside world. Processes can’t directly access hardware; they make syscalls. They can’t read each other’s memory; they have isolated address spaces. They can’t consume unlimited resources; the scheduler enforces quotas.
So, we asked ourselves: What would an “operating system for AI agents” look like?
The answer became the four-layer architecture of the Agent Governance Toolkit:
- Agent OS: The kernel. Every agent action passes through a policy engine before execution, just as every process action passes through the OS kernel via syscalls.
- AgentMesh: The identity layer. Agents have cryptographic identities (DIDs with Ed25519 key pairs) and must verify each other before communicating, similar to how mTLS works in service meshes.
- Agent Runtime: The isolation layer. Agents are assigned to execution rings based on their trust scores, with resource limits enforced per ring — inspired by CPU privilege rings.
- Agent SRE: The reliability layer. SLOs, error budgets, circuit breakers, and chaos testing — all the practices that keep distributed services reliable, applied to agent systems.
Under the hood: How policy enforcement actually works
Let me show you what runtime policy enforcement looks like in practice, because it’s the piece that distinguishes this toolkit from existing approaches.
Most “guardrail” systems work by filtering inputs or outputs — they check the prompt before the LLM sees it, or they scan the response after the LLM generates it. The problem is that agent actions happen between those two points. An agent might receive a perfectly safe prompt, reason correctly about it, and then call a tool in a way that violates policy. Input/output filtering misses this entirely.
Agent OS intercepts at the action layer. When an agent calls a tool, the call passes through the policy engine before reaching the tool:
```python
from agent_os import StatelessKernel, ExecutionContext, Policy

kernel = StatelessKernel()

# Define what this agent is allowed to do
ctx = ExecutionContext(
    agent_id="analyst-001",
    policies=[
        Policy.read_only(),            # Default: no writes
        Policy.rate_limit(100, "1m"),  # Max 100 calls/minute
        Policy.require_approval(
            actions=["delete_*", "write_production_*"],
            min_approvals=2,
            approval_timeout_minutes=30,
        ),
    ],
)

# This call gets intercepted by the policy engine
result = await kernel.execute(
    action="delete_user_record",
    params={"user_id": 12345},
    context=ctx,
)
# result.signal == "ESCALATE" → approval workflow initiated
```
The key design decision here was to make the kernel stateless. Each request carries its own context — policies, history, identity — rather than storing state in the kernel. We chose this pattern because it enables horizontal scaling: You can run the kernel behind a load balancer, in a serverless function, or as a sidecar container, with no shared state to manage.
The policy engine itself has two layers. The first is configurable pattern matching with sample rule sets for detecting dangerous strings like “ignore previous instructions” or SQL injection patterns. The second is a semantic intent classifier that detects dangerous goals even when the exact phrasing does not match a pattern. When an agent’s action is classified as DESTRUCTIVE_DATA, DATA_EXFILTRATION, or PRIVILEGE_ESCALATION, the policy engine flags it for intervention regardless of how the request was worded.
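To make the two-layer idea concrete, here is a minimal, self-contained sketch. The rule patterns, keyword cues, and category mapping below are illustrative stand-ins: a real semantic classifier would use an LLM or embedding model, and the toolkit's actual rule sets are configurable samples, not this code.

```python
import re

# Layer 1: configurable pattern rules (sample patterns, meant to be customized)
PATTERN_RULES = [
    r"ignore (all )?previous instructions",
    r";\s*drop\s+table",  # crude SQL-injection signal
]

# Layer 2: a stand-in for the semantic intent classifier. Keyword cues map to
# intent categories here purely for illustration.
INTENT_KEYWORDS = {
    "DESTRUCTIVE_DATA": ["wipe", "purge", "delete every"],
    "DATA_EXFILTRATION": ["export all records to", "send the database to"],
    "PRIVILEGE_ESCALATION": ["grant me admin", "make me root"],
}

def evaluate(action_text: str) -> dict:
    """Flag an action if either layer trips, regardless of exact phrasing."""
    text = action_text.lower()
    pattern_hits = [p for p in PATTERN_RULES if re.search(p, text)]
    intents = [
        intent
        for intent, cues in INTENT_KEYWORDS.items()
        if any(cue in text for cue in cues)
    ]
    return {
        "flagged": bool(pattern_hits) or bool(intents),
        "patterns": pattern_hits,
        "intents": intents,
    }

# Flagged by layer 2 even though no blocklisted phrase appears verbatim
print(evaluate("Please wipe every table and delete every backup"))
```

The point of the second layer is exactly this last case: a destructive goal expressed in wording no pattern anticipated still gets categorized and flagged.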
Zero-trust identity: TLS for AI agents
When we started looking at multi-agent systems — scenarios where multiple agents collaborate on a task — the identity challenge became clear. In many frameworks, agents communicate as simple function calls. Agent A calls Agent B, and Agent B processes whatever it receives, because identity verification is not yet a standard feature of agent communication protocols.
AgentMesh introduces a protocol we call IATP — the Inter-Agent Trust Protocol. Think of it as TLS for AI agents: encryption, authentication, and authorization in one handshake.
Every agent gets a cryptographic DID (Decentralized Identifier) backed by an Ed25519 key pair:
```python
from agentmesh import AgentIdentity, TrustBridge

# Create identity with a human sponsor for accountability
identity = AgentIdentity.create(
    name="data-analyst",
    sponsor="alice@company.com",
    capabilities=["read:data", "write:reports"],
)
# identity.did → "did:mesh:data-analyst:a7f3b2..."

# Before communicating, verify the peer
bridge = TrustBridge()
verification = await bridge.verify_peer(
    peer_id="did:mesh:other-agent",
    required_trust_score=700,  # Must score ≥ 700/1000
)
if verification.verified:
    await bridge.send_message(peer_id, encrypted_message)
```
One design choice that proved critical was trust decay. An agent’s trust score isn’t static — it decays over time without positive signals. An agent that was trusted yesterday but has been silent for a week gradually becomes untrusted. This models reality: In the physical world, trust requires ongoing demonstration of good behavior, and our system reflects that.
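One simple way to model that behavior is exponential half-life decay toward a floor. The half-life and floor values below are illustrative choices for the sketch, not the toolkit's defaults:

```python
import math

def decayed_trust(score: float, days_idle: float,
                  half_life_days: float = 7.0, floor: float = 100.0) -> float:
    """Halve the trust score every `half_life_days` of silence, down to a floor."""
    decayed = score * math.pow(0.5, days_idle / half_life_days)
    return max(decayed, floor)

# An agent trusted at 800/1000 that stays silent for a week drops to 400
print(decayed_trust(800, days_idle=7))  # 400.0
print(decayed_trust(800, days_idle=0))  # 800.0
```

Positive signals (successful verified interactions) would reset or boost the score; decay alone only ever pulls it down toward the floor.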
Delegation chains solve another real-world problem: When an orchestrator agent delegates a task to a worker agent, the worker should have only the permissions needed for that specific task. AgentMesh enforces scope narrowing — a parent with read and write capabilities can delegate only read access to a child, and that child cannot re-delegate broader permissions than it received.
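The scope-narrowing invariant is compact enough to state directly in code. This is a generic sketch of the rule — subset-or-reject — not AgentMesh's implementation:

```python
def delegate(parent_caps: set[str], requested: set[str]) -> set[str]:
    """Grant a child agent only capabilities the parent itself holds."""
    if not requested <= parent_caps:
        raise PermissionError(
            f"cannot delegate beyond own scope: {requested - parent_caps}"
        )
    return set(requested)

orchestrator = {"read:data", "write:reports"}
worker = delegate(orchestrator, {"read:data"})  # OK: strict subset
# delegate(worker, {"write:reports"})           # raises PermissionError:
#                                               # the child can't re-broaden
```

Because every hop in the chain applies the same check, permissions can only shrink as a task moves down through delegations.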
Execution rings: Hardware security concepts for software agents
The Agent Runtime borrows from CPU architecture. Intel processors have privilege rings (Ring 0 for the kernel, Ring 3 for user processes) that prevent unprivileged code from accessing protected resources. We applied the same concept to agents, but with a twist: Ring assignment is dynamic, based on behavioral trust scores.
- Ring 0 (Privileged): Trust score ≥ 0.95. Can modify system policies. Reserved for human-verified orchestrators.
- Ring 1 (Trusted): Trust score ≥ 0.80. Standard operations with full tool access.
- Ring 2 (Standard): Trust score ≥ 0.60. Limited resource access, rate-limited.
- Ring 3 (Sandbox): Trust score < 0.60. Heavily restricted. New or untrusted agents start here.
Each ring enforces resource limits: maximum execution time per step, memory caps, CPU throttling, and request rate limits. An agent in Ring 3 might be limited to 10 API calls per minute with a five-second execution timeout, while a Ring 0 agent has no such restrictions.
The runtime also provides saga orchestration for multi-step operations. When an agent executes a sequence of actions — draft an email, send it, update the CRM — and the final step fails, the saga engine automatically calls compensating actions in reverse order. The email gets recalled, the draft gets deleted. This pattern, borrowed from distributed transaction processing, prevents the partial-completion failures that plague agentic workflows.
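The compensation pattern itself is framework-agnostic. Here is a minimal sketch of reverse-order rollback (not the Agent Runtime's actual API), using the email/CRM example from above:

```python
class Saga:
    """Run steps in order; on failure, run compensations in reverse."""

    def __init__(self):
        self.steps = []  # (action, compensation) pairs

    def add(self, action, compensation):
        self.steps.append((action, compensation))
        return self

    def run(self):
        done = []
        for action, comp in self.steps:
            try:
                action()
                done.append(comp)
            except Exception:
                for comp in reversed(done):
                    comp()  # undo completed steps, newest first
                raise

log = []

def fail_crm_update():
    raise RuntimeError("CRM update failed")

saga = (Saga()
        .add(lambda: log.append("draft-email"), lambda: log.append("delete-draft"))
        .add(lambda: log.append("send-email"),  lambda: log.append("recall-email"))
        .add(fail_crm_update, lambda: None))

try:
    saga.run()
except RuntimeError:
    pass

print(log)  # ['draft-email', 'send-email', 'recall-email', 'delete-draft']
```

The final failure unwinds the completed steps in reverse, so the system never sits in a half-finished state.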
Reliability engineering for agents
When we built the Agent SRE package, we started with a question: How do you define “reliable” for an AI agent? Traditional SRE metrics like uptime and latency matter, but agents introduce new dimensions. An agent might be fast and available but produce incorrect results. It might be accurate but cost $500 per hour in API calls. It might work perfectly in isolation but cause cascading failures when it interacts with other agents.
We defined seven Service Level Indicators (SLIs) specific to AI agents: correctness, safety, latency, cost, availability, throughput, and delegation success rate. Each SLI gets a threshold, and together they form an error budget — a quantified tolerance for failure.
Here’s where it gets interesting: The error budget drives automated remediation. When an agent’s safety SLI drops below 99 percent (meaning more than 1 percent of its actions violate policy), the system can automatically trigger a kill switch, downgrade the agent’s execution ring, or activate a circuit breaker that rejects new requests until the agent recovers.
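The remediation ladder can be sketched as a simple policy function. The tiers and cutoffs below are illustrative examples of escalating responses, not the toolkit's shipped policy:

```python
def remediation(safety_sli: float, threshold: float = 0.99) -> str:
    """Pick an escalating response as the safety SLI burns its error budget."""
    if safety_sli >= threshold:
        return "none"                  # within budget: no action
    if safety_sli >= 0.95:
        return "open_circuit_breaker"  # reject new requests until recovery
    if safety_sli >= 0.90:
        return "downgrade_ring"        # push the agent to a lower ring
    return "kill_switch"               # stop the agent outright

print(remediation(0.999))  # none
print(remediation(0.92))   # downgrade_ring
```

The key property is that remediation is automatic and proportional: a mild budget burn throttles the agent, while a severe one removes it from service entirely.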
We also built nine chaos engineering fault injection templates — network delays, LLM provider failures, tool timeouts, trust score manipulation, memory corruption, concurrent access races — because the only way to know if your agent system is resilient is to break it on purpose in controlled conditions.
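A fault-injection template is, at its core, a wrapper that makes a dependency misbehave on demand. This generic sketch (not one of the toolkit's nine templates) shows the shape for tool timeouts and network delay:

```python
import random
import time

def with_fault(tool, failure_rate=0.3, delay_ms=0, rng=None):
    """Wrap a tool so it randomly fails or stalls, for controlled chaos tests."""
    rng = rng or random.Random()

    def chaotic(*args, **kwargs):
        if delay_ms:
            time.sleep(delay_ms / 1000)           # injected network delay
        if rng.random() < failure_rate:
            raise TimeoutError("injected fault")  # injected tool failure
        return tool(*args, **kwargs)

    return chaotic

# Seeded RNG so a chaos run is reproducible
lookup = with_fault(lambda uid: {"id": uid}, failure_rate=0.5,
                    rng=random.Random(42))

results = []
for uid in range(10):
    try:
        results.append(lookup(uid))
    except TimeoutError:
        results.append(None)
# Roughly half the calls fail; the real test is whether the caller
# (circuit breaker, retry policy, saga rollback) degrades gracefully.
```

Seeding the injector makes chaos runs reproducible, which matters when you want to confirm a fix actually handled the failure mode you observed.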
Covering the OWASP Agentic AI Top 10
When OWASP published their Agentic AI Top 10, we mapped each risk to our toolkit’s capabilities and found that the architecture provides mitigations for all ten categories:
- Goal hijacking is addressed by the policy engine’s semantic intent classifier.
- Tool misuse is mitigated by capability sandboxing and the MCP proxy.
- Identity abuse is addressed by DID-based identity and trust scoring.
- Supply chain risks are tracked by AI-BOM v2.0, which records model provenance, dataset lineage, and weight versioning.
- Code execution is constrained by execution rings and resource limits.
- Memory poisoning is detected by the Cross-Model Verification Kernel, which runs claims through multiple LLMs and uses majority voting to identify manipulation.
- Insecure communications are mitigated by the IATP protocol’s encryption layer.
- Cascading failures are addressed by circuit breakers and SLO enforcement.
- Human-agent trust exploitation is mitigated by approval workflows with quorum logic.
- Rogue agents are addressed by ring isolation, behavioral trust decay, and the kill switch.
This alignment was by design, not by accident. The OS-inspired architecture creates defense in depth — multiple independent layers that each address different threat categories. No security system can guarantee absolute protection, but by layering complementary defenses, the toolkit significantly reduces the attack surface for autonomous AI agents.
The interoperability challenge
A governance toolkit is only useful if it works with the frameworks people actually use. We designed the toolkit to be framework-agnostic, with adapters that interoperate with LangChain, CrewAI, Google ADK, AutoGen, LlamaIndex, and others. Each adapter hooks into the framework’s native extension points — LangChain’s callback handlers, CrewAI’s task decorators, Google ADK’s plugin system — so that adding governance does not require rewriting existing agent code.
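Stripped of framework specifics, an adapter reduces to intercepting the tool-invocation hook and consulting a policy before the tool runs. This generic decorator sketch is not any adapter's real code; `demo_policy` and its single rule are made up for illustration:

```python
import fnmatch
import functools

def governed(policy_check):
    """Route a tool call through a policy check before the tool executes."""
    def wrap(tool):
        @functools.wraps(tool)
        def guarded(*args, **kwargs):
            verdict = policy_check(tool.__name__, kwargs)
            if verdict != "ALLOW":
                raise PermissionError(f"{tool.__name__}: {verdict}")
            return tool(*args, **kwargs)
        return guarded
    return wrap

def demo_policy(action, params):
    # Sample rule: escalate anything matching delete_*
    return "ESCALATE" if fnmatch.fnmatch(action, "delete_*") else "ALLOW"

@governed(demo_policy)
def delete_user_record(user_id):
    return f"deleted {user_id}"

@governed(demo_policy)
def read_report(report_id):
    return f"report {report_id}"

print(read_report(report_id=7))      # allowed
# delete_user_record(user_id=12345)  # raises PermissionError: ESCALATE
```

Because the interception happens at the framework's own extension point, existing agent code keeps calling its tools unchanged; only the registration step differs per framework.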
Several of these adapters are already working with production frameworks: Dify (65K+ GitHub stars) has the governance plugin in its marketplace, LlamaIndex (47K+ stars) has a TrustedAgentWorker, and proposals are active for AutoGen, CrewAI, Google ADK, and Haystack.
What we learned
Building this toolkit reinforced several lessons that apply beyond agent governance:
Borrow from solved problems. The OS kernel, service mesh, and SRE playbook all addressed security and reliability challenges in other domains. Translating those patterns to AI agents was more effective than inventing from scratch.
Make security the default, not an add-on. The reason we built governance into the execution path (intercepting actions) rather than as an optional wrapper is that optional security tends to go unadopted. If adding governance requires changing agent code, many teams will defer it. That said, no security layer is a silver bullet — defense in depth and ongoing monitoring remain essential.
Trust is dynamic, not static. A binary trusted/untrusted model doesn’t capture reality. Trust scoring with behavioral decay and ring-based privilege assignment turned out to be a much better model for systems where agents are constantly changing.
Statelessness enables everything. By making the kernel stateless, we got horizontal scaling, containerized deployment, and perfect auditability for free. Every decision we agonized over early in the architecture became easier once we committed to statelessness.
Getting started
The Agent Governance Toolkit is now open source under the MIT license at github.com/microsoft/agent-governance-toolkit. You can install it with a single command:
```shell
pip install ai-agent-compliance[full]
```
This installs all four packages — Agent OS, AgentMesh, Agent Runtime, and Agent SRE — with version compatibility guaranteed. Individual packages are also available for teams that want to adopt governance incrementally.
The toolkit runs at sub-millisecond governance latency (< 0.1ms p99), so it adds negligible overhead to agent execution. It exports metrics to OpenTelemetry-compatible platforms (Datadog, Prometheus, Grafana, Arize, Langfuse), and it works with Python 3.10+.
AI agents are becoming autonomous decision-makers in high-stakes domains — finance, healthcare, infrastructure, security. The question is not whether we need governance for these systems, but whether we will build it proactively, before incidents occur, or reactively, after them. We’ve chosen to be proactive. We hope you join us.
Imran Siddique is on LinkedIn.
The Agent Governance Toolkit is open source under the MIT license. Contributions welcome at github.com/microsoft/agent-governance-toolkit.
The author used AI-assisted tools during the drafting of this article. All technical content, code examples, and architectural descriptions reflect the actual capabilities of the Agent Governance Toolkit and have been reviewed for accuracy.
