<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arnaud Perret</title>
    <description>The latest articles on DEV Community by Arnaud Perret (@arnaud-agent-rail).</description>
    <link>https://dev.to/arnaud-agent-rail</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3897824%2Fb7654941-da84-43f5-9c9c-0188fc1ba329.png</url>
      <title>DEV Community: Arnaud Perret</title>
      <link>https://dev.to/arnaud-agent-rail</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arnaud-agent-rail"/>
    <language>en</language>
    <item>
      <title>Microsoft Agent 365 vs AgentRail: Governing Agents vs Governing Actions</title>
      <dc:creator>Arnaud Perret</dc:creator>
      <pubDate>Wed, 29 Apr 2026 17:24:56 +0000</pubDate>
      <link>https://dev.to/arnaud-agent-rail/microsoft-agent-365-vs-agentrail-governing-agents-vs-governing-actions-1dpk</link>
      <guid>https://dev.to/arnaud-agent-rail/microsoft-agent-365-vs-agentrail-governing-agents-vs-governing-actions-1dpk</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.tourl"&gt;&lt;/a&gt;&lt;a href="https://agent-rail.dev/blog/microsoft-agent-365-vs-agentrail-governing-agents-vs-actions" rel="noopener noreferrer"&gt;https://agent-rail.dev/blog/microsoft-agent-365-vs-agentrail-governing-agents-vs-actions&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On May 1st 2026, Microsoft launches Agent 365 — its control plane for enterprise AI agents. It covers agent registry, lifecycle management, security policies, and audit logging across the Microsoft 365 ecosystem.&lt;br&gt;
It's a significant move. And it raises an important question for every enterprise team deploying AI agents right now:&lt;br&gt;
If Microsoft is building the governance layer, does AgentRail still matter?&lt;br&gt;
The answer is yes. Not because AgentRail competes with Agent 365 — but because they solve fundamentally different problems at fundamentally different layers.&lt;/p&gt;

&lt;p&gt;What Microsoft Agent 365 Actually Does&lt;br&gt;
Agent 365 is a control plane for agents as entities. Think of it as the identity and lifecycle management layer for AI agents — the equivalent of what Active Directory and Intune do for human users and devices.&lt;br&gt;
Concretely, Agent 365 lets enterprise IT teams:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Discover every agent deployed across the organization — who built it, what it has access to, when it was last active&lt;/li&gt;
&lt;li&gt;Register agents via governed workflows with security policy templates applied at onboarding&lt;/li&gt;
&lt;li&gt;Manage lifecycle — automatically expire inactive agents, identify agents without owners, block agents flagged as high-risk&lt;/li&gt;
&lt;li&gt;Audit agent interactions at the platform level — logs of what agents accessed and when&lt;/li&gt;
&lt;li&gt;Enforce security through Microsoft Defender, Entra conditional access policies, and Purview data governance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is genuinely valuable infrastructure. If your organization is running dozens or hundreds of agents across departments, knowing which agents exist and who owns them is a real problem that Agent 365 solves well.&lt;br&gt;
But here's what Agent 365 does not do.&lt;/p&gt;

&lt;p&gt;The Gap Agent 365 Doesn't Close&lt;br&gt;
Agent 365 governs agents. It answers the question: "Is this agent authorized to exist in our environment?"&lt;br&gt;
It does not answer the question that matters in the moment an incident happens: "Should this specific action, with this specific payload, on this production system, execute right now?"&lt;br&gt;
These are different questions. And confusing them is expensive.&lt;br&gt;
Here's a concrete example.&lt;br&gt;
Your support agent is registered in Agent 365. It has an Entra identity. It passed your IT onboarding workflow. Its permissions are configured correctly. Lifecycle governance is in place.&lt;br&gt;
That same agent receives a task: "Update account status for all customers inactive for 90 days."&lt;br&gt;
Agent 365 does not intercept this action before it executes. It does not score the risk — 847 records, production environment, customer PII, bulk write operation. It does not apply a policy that says "bulk CRM writes in production require human approval." It does not route this to a reviewer with full context. It does not produce a cryptographically signed, replayable record of what the agent was trying to accomplish, what policy should have caught it, and what the outcome was.&lt;br&gt;
It logs that the action happened. After it happened.&lt;br&gt;
That's the gap.&lt;/p&gt;

&lt;p&gt;Two Layers. Two Questions. Both Necessary.&lt;br&gt;
The clearest way to understand the relationship between Agent 365 and AgentRail is through the questions each layer answers:&lt;br&gt;
Microsoft Agent 365 asks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which agents are authorized to operate in our environment?&lt;/li&gt;
&lt;li&gt;What systems and data can each agent access?&lt;/li&gt;
&lt;li&gt;Who owns each agent and when does its authorization expire?&lt;/li&gt;
&lt;li&gt;What did agents access, at the platform level?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AgentRail asks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Should this specific action execute right now?&lt;/li&gt;
&lt;li&gt;What is the risk score of this action, given its context?&lt;/li&gt;
&lt;li&gt;Which policy applies — and what decision does it produce?&lt;/li&gt;
&lt;li&gt;Who approved this action, with what rationale, at what time?&lt;/li&gt;
&lt;li&gt;Can we replay this action with a different policy to understand what should have happened?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agent 365 is the agent identity and lifecycle layer. AgentRail is the runtime action control layer. They operate at different points in the flow:&lt;br&gt;
Agent → [Agent 365: Is this agent authorized?] → Action → [AgentRail: Should this action execute?] → System&lt;br&gt;
Both checks are necessary. Neither replaces the other.&lt;/p&gt;
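&lt;p&gt;The two checks can be sketched as a sequence. Everything below is an illustrative Python sketch; the agent registry, function names, and decision strings are hypothetical, not the actual Agent 365 or AgentRail APIs:&lt;/p&gt;

```python
# Layer 1 (identity/lifecycle, the Agent 365 role): is this agent allowed
# to exist in our environment? The registry here is a stand-in.
REGISTERED_AGENTS = {"support-agent"}

def agent_is_authorized(agent: str) -> bool:
    """Identity-layer check on the agent as an entity."""
    return agent in REGISTERED_AGENTS

# Layer 2 (runtime action control, the AgentRail role): should this
# specific action execute right now? One invented rule for illustration.
def action_may_execute(operation: str, environment: str) -> bool:
    """Runtime-layer check on the individual action."""
    return not (operation == "write" and environment == "production")

def gate(agent: str, operation: str, environment: str) -> str:
    if not agent_is_authorized(agent):
        return "rejected_by_identity_layer"
    if not action_may_execute(operation, environment):
        return "held_by_action_layer"   # e.g. routed for human approval
    return "executed"
```

&lt;p&gt;An unregistered agent never reaches the action check, and a registered agent's risky action is still held at the second gate: the checks compose rather than substitute.&lt;/p&gt;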

&lt;p&gt;What "Governing Actions" Actually Means&lt;br&gt;
When AgentRail intercepts a high-risk agent action, here is what happens in under 200 milliseconds — before a single record is touched, before a single API call completes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Capture — The action's full context is recorded: which agent, which user triggered it, what the intent was, what the prompt contained, which tool is being called, what the payload looks like, which environment is targeted.&lt;/li&gt;
&lt;li&gt;Evaluate — A risk score is computed based on multiple dimensions simultaneously: action type (read vs. write vs. delete), environment (dev vs. staging vs. production), volume of records affected, data classification of the target, historical behavior patterns of this agent.&lt;/li&gt;
&lt;li&gt;Policy match — The action is evaluated against versioned policy rules. If it matches a rule — "bulk writes to production CRM affecting more than 50 records require approval" — the appropriate decision is made automatically.&lt;/li&gt;
&lt;li&gt;Decide — Three possible outcomes: Allow (low-risk, proceed automatically), Require approval (high-impact, route to a human reviewer with full context), or Block (forbidden by policy, stopped at the edge).&lt;/li&gt;
&lt;li&gt;Record — Every action, regardless of outcome, produces an immutable Action Passport: a structured, cryptographically signed, replayable record of what happened, why, who decided, and what the outcome was.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is not logging. Logs show that an event occurred. An Action Passport proves what was intended, what was decided, why, by whom, and what changed — in a form that can be replayed, exported, and used as compliance evidence.&lt;/p&gt;
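&lt;p&gt;The five steps above can be compressed into a toy pipeline. This is an illustrative sketch, not the AgentRail implementation: the dataclass fields, scoring weights, and policy rule are invented for the example, and real cryptographic signing is reduced to a SHA-256 digest.&lt;/p&gt;

```python
# Toy capture -> evaluate -> match -> decide -> record pipeline.
from dataclasses import dataclass
import hashlib
import json
import time

@dataclass
class ActionContext:
    agent: str
    tool: str
    operation: str         # "read" | "write" | "delete"
    environment: str       # "dev" | "staging" | "production"
    records_affected: int
    data_class: str        # e.g. "public" | "internal" | "pii"

def risk_score(ctx: ActionContext) -> int:
    """Toy multi-dimensional score: higher is riskier."""
    score = {"read": 0, "write": 30, "delete": 50}[ctx.operation]
    score += {"dev": 0, "staging": 10, "production": 40}[ctx.environment]
    score += 20 if ctx.records_affected > 50 else 0
    score += 20 if ctx.data_class == "pii" else 0
    return score

def decide(ctx: ActionContext) -> str:
    """Three outcomes: allow, require_approval, block."""
    # Invented policy rule: deletes in production are forbidden outright.
    if ctx.operation == "delete" and ctx.environment == "production":
        return "block"
    if risk_score(ctx) >= 60:
        return "require_approval"
    return "allow"

def record(ctx: ActionContext, decision: str) -> dict:
    """Evidence entry with a content digest (real signing elided)."""
    entry = {"context": vars(ctx), "decision": decision, "ts": time.time()}
    entry["digest"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()).hexdigest()
    return entry

# The bulk CRM update from the example: 847 records, production, customer PII.
ctx = ActionContext("support-agent", "crm.update", "write",
                    "production", 847, "pii")
passport = record(ctx, decide(ctx))
print(passport["decision"])  # require_approval
```

&lt;p&gt;The point of the sketch is the shape of the decision, not the weights: every action flows through the same score-match-decide path, and every outcome leaves a verifiable record behind.&lt;/p&gt;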

&lt;p&gt;Why This Matters for AI Act Compliance&lt;br&gt;
Both Microsoft and AgentRail address AI Act requirements — but at different levels of granularity.&lt;br&gt;
The EU AI Act requires, for high-risk AI systems: traceability of every decision, explainability of reasoning, meaningful human oversight before consequential actions, and auditability of evidence.&lt;br&gt;
Microsoft Agent 365 addresses the platform-level compliance requirements: agent identity, access logging, data governance through Purview.&lt;br&gt;
AgentRail addresses the action-level compliance requirements: per-action evidence of intent and context, per-decision policy traceability, per-action human oversight record, and cryptographic proof that can be exported for regulatory examination.&lt;br&gt;
A useful analogy: Agent 365 is to AI Act compliance what Active Directory is to general security compliance — necessary infrastructure, but not sufficient on its own for the specific requirements that apply to consequential autonomous actions.&lt;/p&gt;

&lt;p&gt;The Runtime Stack That Enterprise Teams Are Building&lt;br&gt;
The most sophisticated enterprise teams we talk to are not choosing between Agent 365 and AgentRail. They are building a layered stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identity and lifecycle → Microsoft Agent 365 (or Okta, or Entra natively)&lt;/li&gt;
&lt;li&gt;Runtime action control → AgentRail&lt;/li&gt;
&lt;li&gt;Agent framework → Claude Code, LangChain, Dust, n8n, or custom&lt;/li&gt;
&lt;li&gt;Enterprise systems → GitHub, Salesforce, HubSpot, Stripe, internal APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each layer does one thing well. The governance gap isn't filled by any single tool — it's filled by the right combination of purpose-built layers.&lt;br&gt;
AgentRail fits into this stack without replacing anything. It works alongside Agent 365, not instead of it. If your agents are registered in Agent 365 and governed at the platform level, AgentRail adds the action-level control that the platform layer doesn't provide.&lt;/p&gt;

&lt;p&gt;A Note on Ecosystem&lt;br&gt;
Microsoft Agent 365 is built for the Microsoft ecosystem — agents with Entra identities, published through Microsoft 365 channels, integrated with Defender and Purview.&lt;br&gt;
AgentRail is runtime-agnostic. It works with Claude Code, LangChain, CrewAI, Dust, Glean, n8n, and custom-built agents — regardless of which cloud they run on, which identity provider manages them, or which framework built them.&lt;br&gt;
For organizations running mixed environments — some Microsoft-native agents alongside custom-built or third-party agents — AgentRail provides action-level governance across the entire fleet, not just the Microsoft-registered portion.&lt;/p&gt;

&lt;p&gt;The Bottom Line&lt;br&gt;
Microsoft Agent 365 is a significant and welcome addition to enterprise AI infrastructure. It solves a real problem: the proliferation of ungoverned agents across organizations that have no visibility into what agents exist, who owns them, or whether they're still active.&lt;br&gt;
AgentRail solves a different problem: what happens when those agents act. The moment a governed agent calls a production API, sends a bulk write to a CRM, modifies permissions in an identity system, or triggers a financial transaction — that's when action-level governance matters.&lt;br&gt;
Microsoft governs your agents.&lt;br&gt;
AgentRail governs their actions.&lt;br&gt;
Both questions need answers. The enterprises that answer both will be the ones that scale AI agent deployment with confidence — and the ones that can prove it when their auditors, regulators, or legal counsel ask.&lt;/p&gt;

&lt;p&gt;AgentRail is the runtime action control layer for enterprise AI agents — independent of framework, cloud, and identity provider. It works alongside Microsoft Agent 365, not instead of it.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Claude Code in Enterprise Production: What Risks to Control</title>
      <dc:creator>Arnaud Perret</dc:creator>
      <pubDate>Sat, 25 Apr 2026 17:28:28 +0000</pubDate>
      <link>https://dev.to/arnaud-agent-rail/claude-code-in-enterprise-production-what-risks-to-control-15dm</link>
      <guid>https://dev.to/arnaud-agent-rail/claude-code-in-enterprise-production-what-risks-to-control-15dm</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh80nsd3dvemtx3q8ezdz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh80nsd3dvemtx3q8ezdz.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;a href="https://dev.tourl"&gt;&lt;/a&gt;&lt;a href="https://agent-rail.dev/blog/claude-code-enterprise-production-risks" rel="noopener noreferrer"&gt;https://agent-rail.dev/blog/claude-code-enterprise-production-risks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Claude Code can deploy code, merge pull requests, and modify production systems autonomously. Here's what enterprise teams need to govern before deploying it at scale.&lt;/p&gt;

&lt;p&gt;Claude Code is one of the most capable coding agents available today. It can write code, run tests, open pull requests, merge branches, interact with CI/CD pipelines, and — with the right tools — deploy directly to production environments.&lt;/p&gt;

&lt;p&gt;For individual developers, this is transformative. For enterprise teams, it introduces a governance question that most organizations are not yet equipped to answer: when Claude Code acts autonomously on your production systems, who is in control?&lt;/p&gt;

&lt;p&gt;What Claude Code Can Actually Do&lt;br&gt;
It is worth being precise about Claude Code's capabilities in an enterprise context, because the gap between "coding assistant" and "autonomous production actor" is larger than many teams realize.&lt;/p&gt;

&lt;p&gt;With standard integrations, Claude Code can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read and write files across your codebase&lt;/li&gt;
&lt;li&gt;Execute shell commands and scripts&lt;/li&gt;
&lt;li&gt;Interact with Git — commits, branches, pull requests, merges&lt;/li&gt;
&lt;li&gt;Call APIs through MCP (Model Context Protocol) tools&lt;/li&gt;
&lt;li&gt;Interact with GitHub Actions, CI/CD pipelines, and deployment systems&lt;/li&gt;
&lt;li&gt;Access databases and internal APIs through configured tool integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a well-configured enterprise environment, this means Claude Code can autonomously take actions that directly affect production systems — merging code, triggering deployments, modifying configuration, or running scripts that change live data.&lt;/p&gt;

&lt;p&gt;This is not a criticism of Claude Code; it is the point of Claude Code. The capability is the value.&lt;/p&gt;

&lt;p&gt;But capability without governance is risk.&lt;/p&gt;

&lt;p&gt;The Four Risk Categories for Claude Code in Enterprise&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Production Code Deployment Risk
The most direct risk is that Claude Code, operating on a task, makes changes that reach production environments in ways that were not intended or reviewed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This can happen through several paths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Merging a pull request that triggers an automatic deployment pipeline&lt;/li&gt;
&lt;li&gt;Pushing directly to a branch with auto-deploy configured&lt;/li&gt;
&lt;li&gt;Modifying infrastructure-as-code files that trigger cloud resource changes&lt;/li&gt;
&lt;li&gt;Interacting with CI/CD systems in ways that initiate production workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In each case, the action is technically authorized — Claude Code has the credentials and permissions to perform it — but the organization may not have intended for an autonomous agent to make this class of decision without human review.&lt;/p&gt;

&lt;p&gt;What governance looks like: Policy rules that require human approval for any action involving production branch merges, deployment triggers, or infrastructure modifications. Risk scoring based on the target environment (development vs. staging vs. production) and the type of change.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Codebase Integrity Risk
Claude Code operating across a codebase can make changes that are individually reasonable but collectively problematic — refactoring that introduces subtle bugs, dependency updates that create compatibility issues, or architectural changes that conflict with decisions made in other parts of the codebase.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The risk compounds when Claude Code is operating autonomously across multiple tasks simultaneously, or when it is working in a codebase where the full context of prior decisions is not captured in the code itself.&lt;/p&gt;

&lt;p&gt;What governance looks like: Audit trails that capture the full context of each code change — what Claude Code was trying to accomplish, what files were modified, what tests were run, what the outcome was. This context is essential for debugging when something goes wrong.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Secrets and Sensitive Data Risk
Claude Code, in the course of working on a codebase, may encounter or need to handle sensitive information — API keys, database credentials, customer data in test fixtures, internal system addresses, or proprietary business logic.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The risk is not primarily that Claude Code will exfiltrate this information maliciously. The risk is that it might inadvertently include sensitive data in outputs, logs, pull request descriptions, or comments in ways that expand exposure beyond the intended scope.&lt;/p&gt;

&lt;p&gt;What governance looks like: Policy rules that flag actions involving files known to contain sensitive data, require review for pull requests that touch configuration or secrets management code, and capture payload context in a way that can be audited without reproducing the sensitive content itself.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Scope Creep Risk
AI agents operating autonomously tend to take the actions necessary to complete their assigned task — which sometimes means actions that were not explicitly authorized but that the agent judges necessary to achieve the goal.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For Claude Code, this might mean: opening additional pull requests to fix issues discovered while working on the primary task, modifying files outside the explicitly specified scope, or interacting with systems beyond the immediate task context in order to gather information or complete a prerequisite.&lt;/p&gt;

&lt;p&gt;This is often useful behavior. It is also behavior that can reach beyond the organizational intent of the original task.&lt;/p&gt;

&lt;p&gt;What governance looks like: Clear scope boundaries enforced at the policy level, with alerts or approval requirements when Claude Code attempts to take actions outside the defined task scope.&lt;/p&gt;

&lt;p&gt;What Enterprise Governance for Claude Code Looks Like in Practice&lt;br&gt;
Here is a concrete example of how governance changes the risk profile of a Claude Code deployment.&lt;/p&gt;

&lt;p&gt;Scenario: A developer asks Claude Code to refactor a module and open a pull request for review.&lt;/p&gt;

&lt;p&gt;Without governance: Claude Code works through the task, makes the changes, opens the pull request, and — noticing that the tests were failing on main — also merges an unrelated bug fix to unblock the CI pipeline. The merge triggers a deployment. The deployment includes an unreviewed change. A production incident follows.&lt;/p&gt;

&lt;p&gt;Every individual action Claude Code took was technically authorized. The sequence of actions was not what the organization intended.&lt;/p&gt;

&lt;p&gt;With governance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code opens the pull request as requested — low risk, allowed automatically&lt;/li&gt;
&lt;li&gt;Claude Code attempts to merge the unrelated bug fix — production branch merge, risk score elevated, routed for human approval&lt;/li&gt;
&lt;li&gt;The reviewer sees the context: which agent, which task, what merge, what the CI status is&lt;/li&gt;
&lt;li&gt;The reviewer approves or blocks with full information&lt;/li&gt;
&lt;li&gt;Every action is recorded with intent, payload, and outcome as immutable evidence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The developer still gets the value of Claude Code. The organization maintains control over production-impacting decisions.&lt;/p&gt;

&lt;p&gt;The MCP Surface Area&lt;br&gt;
Claude Code's MCP (Model Context Protocol) integration significantly expands its tool access. Through MCP, Claude Code can be connected to virtually any API or system — databases, internal tools, cloud platforms, communication systems, external services.&lt;/p&gt;

&lt;p&gt;Each MCP connection expands what Claude Code can do autonomously. Without governance at the MCP action layer, each new tool integration also expands the potential blast radius of an unintended action.&lt;/p&gt;

&lt;p&gt;Effective governance for MCP-connected Claude Code deployments requires policy coverage at the tool level — not just "Claude Code is allowed to use the database MCP" but "Claude Code is allowed to read from the database MCP in development, and requires approval to write to any database in production."&lt;/p&gt;
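&lt;p&gt;One way to express that kind of tool-level rule is a default-deny lookup keyed on (tool, operation, environment). The keys and decision strings below are invented for illustration, not an AgentRail policy schema:&lt;/p&gt;

```python
# Hypothetical tool-level policy for MCP-connected tools: permissions are
# scoped per tool, per operation, and per environment, not per agent.
POLICY = {
    ("database-mcp", "read",  "development"): "allow",
    ("database-mcp", "write", "development"): "allow",
    ("database-mcp", "read",  "production"):  "allow",
    ("database-mcp", "write", "production"):  "require_approval",
}

def evaluate(tool: str, operation: str, environment: str) -> str:
    # Default-deny: any tool/operation/environment combination not
    # explicitly granted is blocked rather than silently allowed.
    return POLICY.get((tool, operation, environment), "block")
```

&lt;p&gt;The design choice that matters here is the default: each new MCP integration starts fully blocked, and every grant is an explicit, reviewable entry rather than an inherited capability.&lt;/p&gt;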

&lt;p&gt;Building the Right Trust Model&lt;br&gt;
The goal of governance for Claude Code is not to slow it down or to add friction to every action. It is to build the right trust model — one where the level of human oversight is proportional to the potential impact of the action.&lt;/p&gt;

&lt;p&gt;Low-risk actions (reading code, running tests, creating branches) should proceed automatically. Medium-risk actions (opening pull requests, modifying configuration) should be logged and monitored. High-risk actions (merging to production branches, triggering deployments, modifying infrastructure) should require explicit human approval.&lt;/p&gt;

&lt;p&gt;This graduated trust model allows Claude Code to operate at full speed on the vast majority of its work, while ensuring that the decisions with real production impact remain under meaningful human control.&lt;/p&gt;
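&lt;p&gt;The graduated trust model reduces to a small mapping from action class to oversight level. A minimal sketch, with action names and decision strings invented for the example:&lt;/p&gt;

```python
# Graduated trust model: oversight proportional to potential impact.
LOW_RISK    = {"read_code", "run_tests", "create_branch"}
MEDIUM_RISK = {"open_pull_request", "modify_config"}
HIGH_RISK   = {"merge_production_branch", "trigger_deployment",
               "modify_infrastructure"}

def oversight_for(action: str) -> str:
    """Map an action class to the level of human oversight it requires."""
    if action in HIGH_RISK:
        return "require_approval"  # explicit human sign-off before execution
    if action in MEDIUM_RISK:
        return "log_and_monitor"   # proceeds, but with full visibility
    if action in LOW_RISK:
        return "allow"             # full speed, no added friction
    return "require_approval"      # unknown action classes default to the safe side
```

&lt;p&gt;Note the last line: an action class the policy has never seen falls to the approval tier, so new capabilities are slow until someone classifies them, not fast until someone notices them.&lt;/p&gt;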

&lt;p&gt;Practical Steps for Enterprise Teams&lt;br&gt;
If you are deploying Claude Code in an enterprise environment, here are the immediate steps that reduce risk most significantly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Inventory Claude Code's tool access. List every system Claude Code can interact with — Git repositories, CI/CD systems, databases, APIs. This is your governance surface area.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Classify actions by environment and impact. Separate read actions from write actions. Separate development environment actions from production environment actions. These two dimensions drive most of your risk assessment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define approval requirements for high-impact actions. At minimum, production branch merges, deployment triggers, and infrastructure changes should require human review before execution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Establish audit trails for every action. Every action Claude Code takes should be captured with full context — what it was trying to do, what it did, and what the outcome was. This is essential for incident investigation and compliance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Test your policies before you need them. Run Claude Code against historical tasks with your governance policies active in simulation mode to validate that they catch the right actions before you rely on them in production.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
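&lt;p&gt;Step 5, simulation mode, amounts to replaying historical actions through a policy function and tallying what it would have decided, without enforcing anything. A hedged sketch (the policy and action names are invented):&lt;/p&gt;

```python
# Simulation mode: evaluate a policy against historical actions and tally
# the decisions it would have made. Nothing is executed or blocked.
from collections import Counter

def simulate(policy, history):
    """Return a decision tally for a policy run over past actions."""
    return Counter(policy(action) for action in history)

# Toy policy: production merges and deploys need approval; all else allowed.
def policy(action):
    if action in {"merge_production_branch", "trigger_deployment"}:
        return "require_approval"
    return "allow"

history = ["read_code", "run_tests", "open_pull_request",
           "merge_production_branch", "trigger_deployment"]
print(simulate(policy, history))
# Counter({'allow': 3, 'require_approval': 2})
```

&lt;p&gt;If the tally shows the policy approving actions you know caused past incidents, or holding dozens of routine ones, you learn that before the policy gates anything real.&lt;/p&gt;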

&lt;p&gt;Claude Code is genuinely powerful technology. Deploying it with governance in place does not reduce that power — it makes the power safe to use at enterprise scale.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;AgentRail works with Claude Code and other agent runtimes to provide the control layer that makes autonomous coding agents safe to deploy in enterprise production environments.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agent-rail.dev/" rel="noopener noreferrer"&gt;https://agent-rail.dev/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>claude</category>
      <category>devops</category>
      <category>security</category>
    </item>
  </channel>
</rss>
