TL;DR
- Enterprise-Managed Authorization (EMA) centralizes access provisioning and eliminates per-server consent prompts. It is the right solution for connection-time governance. It was not designed to authorize each individual tool call, and it does not.
- AI workflows need per-action authorization to limit the blast radius of prompt injection, because attacks exploit the gap between "this agent is allowed to connect" and "this specific action should execute right now."
- A secure authorization layer must evaluate the intersection of organization policies, user delegation, and agent capability boundaries immediately before an action executes.
- Production-grade deployments use a pre-execution interceptor and credential isolation to guarantee that large language models never access raw authentication tokens directly.
- High-risk production deployments need action-level runtime enforcement, implemented in-house or through an action runtime such as Arcade, without replacing existing corporate identity infrastructure, including EMA.
What Enterprise-Managed Authorization (EMA) Solves for MCP
Enterprise-Managed Authorization is now stable. The extension, adopted by Anthropic, Microsoft, Okta, and a growing number of MCP servers, solves the per-server OAuth consent tax that slowed enterprise MCP adoption.
Before EMA, every employee had to authorize every MCP server individually. Security teams had no centralized control. Work and personal accounts bled together. EMA eliminates all of this by making the organization's IdP the authoritative decision-maker for MCP server access. Administrators define policy once. Users authenticate through single sign-on and inherit every server their role permits. No per-app OAuth, nothing to configure as a one-off.
Under the hood, as part of the SSO-based authorization flow, the client obtains an identity assertion and uses it to request an Identity Assertion JWT Authorization Grant (ID-JAG), which it exchanges for access tokens from each MCP server's authorization server. Three properties follow: authorize once and inherit everywhere, centralized policy and audit for access decisions, and elimination of personal/enterprise account mixups.
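The two token requests in that flow can be sketched as form-encoded parameter sets. This is an illustration under stated assumptions, not a reference client: the parameter names follow RFC 8693 token exchange and the JWT bearer grant as the ID-JAG draft uses them, while the URLs, token values, and function names are hypothetical placeholders. Client authentication is omitted, and exact requirements should be checked against the current specification text.

```python
from urllib.parse import urlencode

# Hypothetical placeholders; real values come from IdP and MCP server metadata.
MCP_AUTH_SERVER = "https://auth.mcp.example.com"
MCP_RESOURCE = "https://mcp.example.com"


def build_id_jag_request(id_token: str) -> dict:
    """Step 1: ask the enterprise IdP to exchange the SSO assertion for an ID-JAG."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "requested_token_type": "urn:ietf:params:oauth:token-type:id-jag",
        "subject_token": id_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:id_token",
        "audience": MCP_AUTH_SERVER,
        "resource": MCP_RESOURCE,
    }


def build_access_token_request(id_jag: str) -> dict:
    """Step 2: present the ID-JAG to the MCP server's authorization server."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": id_jag,
    }


# Form-encoded bodies as they would be POSTed to each token endpoint.
step1 = urlencode(build_id_jag_request("<id-token-from-sso>"))
step2 = urlencode(build_access_token_request("<id-jag-from-idp>"))
```

Both requests happen before any tool is invoked, which is why nothing in them can describe an individual tool call.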
This is valuable infrastructure. It is also, by design, a grant-time decision. EMA's IdP evaluates policy when tokens are issued (and may re-evaluate on renewal), but its standardized authorization visibility does not extend to individual tool calls. EMA determines who may connect to what. It has nothing to say about whether a specific tool call, proposed by a potentially compromised agent five minutes after the token was issued, should actually execute.
That gap is where the real attacks live.
How Prompt Injection Exploits Authenticated AI Agents
In early 2025, security researcher Johann Rehberger demonstrated SpAIware: a single indirect prompt injection, delivered through a malicious website, planted persistent instructions in ChatGPT's memory store. Those instructions survived logouts and browser restarts. The compromised instance then acted as a command-and-control relay, polling a public GitHub repository for attacker commands and writing exfiltrated data to Azure Blob Storage request logs. The CSA's March 2026 Promptware report generalized this into a broader class of agent C2 attacks.
The agent's built-in capabilities (web access, memory, code execution) were all legitimately available to its runtime. EMA-style centralized provisioning would not have changed the outcome. The injected instructions exploited capabilities already present in the agent's environment, not separately provisioned OAuth connections. No authorization layer distinguished a user-initiated action from an injection-initiated one. Connection-time governance was powerless because the problem was never authentication. The agent was who it claimed to be.
In mid-2026, researchers demonstrated prompt-injection attacks through GitHub comments, issue bodies, and PR titles that hijacked Claude Code, Gemini CLI, and GitHub Copilot Agent. Across the three products, the attacks exploited pre-authorized tool capabilities to exfiltrate CI secrets; some variants also induced shell-command execution. A related academic study documented similar injection vectors across 15 GitHub Actions. Anthropic's remediation was telling: they disallowed the ps tool rather than restricting broad tool access. The response was a band-aid on a connection-level wound.
These are not isolated demonstrations. F5 describes a banking scenario in which threat actors use prompt injection against an AI chatbot to initiate unauthorized financial transactions, with the bank identifying the loss only after multiple accounts are impacted. The AI Red Teaming Guide catalogs a growing body of MCP-related vulnerabilities disclosed through 2025. Simon Willison, who has tracked prompt injection since 2022, coined the "lethal trifecta" for this pattern: private data, untrusted content, and external communication converging in the same system.
The common thread across every attack: attackers induced agents to misuse capabilities already available to their runtimes. No authorization layer asked whether the specific action matched the user's intent.
Per-action authorization evaluates whether a specific tool call should proceed based on the intersection of organization policy, user delegation, and agent capability, checked at execution time, after the prompt, for every action independently. It is distinct from grant-time authorization (evaluated at token issuance, which is what EMA provides) and session-level authorization (checked once per conversation).
Per-action authorization is not itself a prompt-injection detector. It limits blast radius by denying or escalating actions that violate deterministic constraints. An injected action that remains within those constraints may still execute, so provenance controls, content isolation, and human approval remain necessary for sensitive operations.
EMA vs. Per-Action Authorization: Provisioning vs. Runtime
EMA and per-action authorization are not competing solutions. They operate at different points in the execution lifecycle and address different threat models.
| Concern | EMA (Connection-Time) | Per-Action Authorization (Runtime) |
|---|---|---|
| Decision point | Before the agent connects to a server | Before the agent executes a specific tool call |
| What it answers | "Is this user/agent allowed to access this MCP server?" | "Should this specific action execute in this context?" |
| Policy inputs | IdP groups, roles, conditional access rules | Organization policy + user delegation + agent capability + tool arguments + trusted provenance and risk signals |
| Threat model | Unauthorized connections, personal/enterprise mixups, shadow IT | Prompt injection, permission abuse, lateral movement through valid connections |
| Evaluation frequency | At token issuance/renewal | Every tool call |
| Audit trail | "User X connected to Server Y at time T" | "Agent A attempted action B with parameters C, evaluated against policy D, outcome E" |
EMA provides the outer gate. It ensures that only authorized users connect to approved servers through managed corporate identities. But EMA itself adds no per-tool-call semantic policy. Individual MCP servers may enforce scopes, ACLs, or rate limits on each request, but those controls are server-specific, inconsistent across the ecosystem, and unaware of whether a tool call originated from user intent or injected instructions.
The NSA's May 2026 Cybersecurity Information document on MCP security is blunt: "MCP itself cannot enforce these security principles at the protocol level." This applies equally to EMA. The extension centralizes provisioning decisions. It does not, and cannot, evaluate whether the tool call an agent is about to make was triggered by the user's intent or by a malicious instruction embedded in a GitHub comment.
Why OAuth Scopes Are Not Enough for AI Agent Authorization
OAuth scopes are space-delimited strings and are often too coarse for transaction-specific authorization. A mail.send scope grants the ability to email any recipient. It cannot encode which recipient, in what context, whether the user intended this specific email, or whether the conversation was corrupted by an injection.
RFC 9396 (Rich Authorization Requests) partially addresses this by using JSON objects to describe API access with type, locations, and actions fields. RAR can constrain later operations using transaction-specific authorization details (recipient, amount, resource), and resource servers can enforce those details. But RAR does not standardize provenance-aware evaluation of whether an agent's later action still reflects the user's current intent. When an agent makes a tool call from a potentially compromised conversation, RAR constrains the parameters but cannot determine whether the call was user-initiated or injection-initiated.
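A small sketch makes the boundary concrete. The `type`, `locations`, and `actions` fields come from RFC 9396; the `recipients` field, the URLs, and the function name are hypothetical and exist only for illustration.

```python
# Hypothetical authorization_details entry in the RFC 9396 shape.
# "recipients" is an illustrative API-specific field, not a standard one.
AUTHORIZATION_DETAIL = {
    "type": "email_send",
    "locations": ["https://mail.example.com/api"],
    "actions": ["send"],
    "recipients": ["alice@example.com"],
}


def call_within_detail(detail: dict, action: str, location: str, recipient: str) -> bool:
    """Return True when the call stays inside the granted authorization detail."""
    return (
        action in detail["actions"]
        and location in detail["locations"]
        and recipient in detail["recipients"]
    )


in_bounds = call_within_detail(
    AUTHORIZATION_DETAIL, "send", "https://mail.example.com/api", "alice@example.com"
)
out_of_bounds = call_within_detail(
    AUTHORIZATION_DETAIL, "send", "https://mail.example.com/api", "attacker@evil.example"
)
```

The check rejects the second call because the recipient falls outside the grant. It would accept an injected email to alice@example.com just as readily as a user-requested one, which is the provenance gap described above.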
The MCP specification's auth extensions face the same structural limitation. As of June 2026, both EMA and Client Credentials operate at the transport/connection level. The ext-auth repository contains no per-action authorization extension. Final MCP SEP-2468 recommends that authorization servers include the OAuth authorization-response iss parameter and requires clients to validate it, mitigating authorization-server mix-up attacks. This is a transport-security measure, not per-action evaluation. MCP's core authorization does support runtime insufficient-scope challenges and step-up authorization, where scopes may depend on request arguments and context. These are valuable server-side controls, but they remain server-defined scope enforcement, not standardized provenance-aware authorization.
This is not an oversight in the protocol or the extension. It reflects an architectural boundary. Authentication answers "who is this?" Connection-level authorization (including EMA) answers "what can this entity access?" Per-action authorization answers "should this specific action happen right now?" Zero-touch OAuth establishes the first two. The third requires an additional application- or runtime-level mechanism.
OAuth has progressively added defenses across the authorization and token lifecycle. RFC 6749 (2012) and RFC 6750 defined bearer tokens without sender-constraining. PKCE (2015) mitigated authorization-code interception. DPoP (2023) sender-constrained tokens to reduce replay. RFC 9700 (2025) updated the entire threat model based on "practical experiences gathered since OAuth 2.0 was published." These mechanisms are not per-action authorization, but they illustrate the broader movement away from relying on bearer credentials alone. Each addition responded to real attacks that exploited assumptions about what grant-time credentials could safely cover.
The Three-Layer Authorization Model for AI Agents
Agents operate at the intersection of three distinct permission sets, not one.
AWS IAM provides a useful precedent for this model. The following table simplifies IAM's full evaluation logic (which combines identity-based and resource-based grants, then constrains them by permissions boundaries and SCPs) to illustrate the intersection principle:
| IAM Layer | Agent Authorization Analog | What It Controls |
|---|---|---|
| Service Control Policy (Organization) | Organization policy | Maximum permissions any agent in this org can possess |
| Identity-based policy (User) | User delegation | What this specific user has delegated to the agent |
| Permissions boundary (Entity) | Agent capability boundary | What this agent type is designed and permitted to do |
The identity or resource policy must grant the action, while the permissions boundary and SCP must permit it. An explicit deny overrides an allow, and adding a permissions boundary can only reduce effective permissions.
EMA maps cleanly onto the first two layers at connection time. The IdP enforces organization-level policy (which servers are approved) and user-level access (which roles and groups the user belongs to). But it evaluates these layers at token issuance, not per tool call, and it does not standardize an agent-specific capability boundary. OAuth authorization servers can apply client-specific policy, but EMA itself does not define how agent capabilities should be constrained beyond what scopes and roles permit.
Suppose your organization policy says "no agent may delete production databases." A user has delegated broad access to their calendar, email, and project management tools. The agent is a triage-bot designed to label issues and assign them. The effective permission is the intersection: the triage-bot can label and assign issues in the user's projects, and nothing else. It cannot send email (outside its capability boundary), cannot delete databases (blocked by org policy), and cannot access another user's calendar (not delegated).
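The triage-bot example reduces to set arithmetic. The sketch below is a simplification with hypothetical action names; a real policy engine would evaluate richer conditions than membership in a set.

```python
# Hypothetical action names for the triage-bot scenario.
ORG_DENIED = {"db.delete_production"}
USER_DELEGATED = {"calendar.read", "email.send", "issues.label", "issues.assign"}
AGENT_CAPABILITIES = {"issues.label", "issues.assign"}


def effective_permissions(org_denied: set, user_delegated: set, agent_capabilities: set) -> set:
    """Intersect user delegation with agent capability, then apply org-level denies."""
    # An explicit organization deny wins even if both other layers allow the action.
    return (user_delegated & agent_capabilities) - org_denied


def is_permitted(action: str) -> bool:
    return action in effective_permissions(ORG_DENIED, USER_DELEGATED, AGENT_CAPABILITIES)


effective = effective_permissions(ORG_DENIED, USER_DELEGATED, AGENT_CAPABILITIES)
```

Here `effective` contains only `issues.label` and `issues.assign`: email is delegated by the user but outside the agent's capability boundary, and the production-database deletion is removed by organization policy regardless of the other two layers.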
Oso's 2026 Least Privilege Report (analyzing 2.4 million workers and 3.6 billion permissions) found that 96% of enterprise permissions go unused over 90 days. Employees typically possess 10 times the access they actually need. Thirty-one percent of workers can modify or delete sensitive data. Thirteen percent can reach regulated data including financial and health records.
Humans often leave dormant permissions unused because of judgment, habit, and professional accountability. Agents do not share those natural constraints and can operate continuously at machine speed. When an agent inherits a human's permission set through a grant-time OAuth token (whether provisioned manually or through EMA), it may exercise capabilities the human rarely touches, turning latent over-provisioning into active attack surface.
OpenFGA (built on Google Zanzibar's principles) has formalized this by modeling agents as first-class principals, identical to human users, with explicit authorization tuples like user: agent:triage-bot, relation: member, object: project:alpha. But the intersection model must be augmented with runtime evaluation: not just "does this agent have the permission?" but "does this agent's current context justify exercising this permission?"
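The difference between the two questions can be shown with a toy relationship store. This is not the OpenFGA SDK or its API; it is an in-memory stand-in, and the `initiated_by` context key is a hypothetical provenance signal that would have to come from the harness rather than from the model.

```python
from typing import NamedTuple


class RelationTuple(NamedTuple):
    user: str
    relation: str
    object: str


# The tuple from the text: the agent is a member of project alpha.
TUPLES = {RelationTuple("agent:triage-bot", "member", "project:alpha")}


def has_relation(user: str, relation: str, obj: str) -> bool:
    """Static question: does this principal hold the relation at all?"""
    return RelationTuple(user, relation, obj) in TUPLES


def may_act(user: str, relation: str, obj: str, context: dict) -> bool:
    """Runtime question: does the current context justify using the relation?"""
    if not has_relation(user, relation, obj):
        return False
    # Hypothetical condition: only actions traced to a user request proceed.
    return context.get("initiated_by") == "user"
```

With the same tuple in place, `may_act` returns `True` for a context of `{"initiated_by": "user"}` and `False` for `{"initiated_by": "tool_output"}`. Producing that signal reliably is the hard part and is outside what this sketch covers.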
Zero-Touch OAuth vs. Runtime Security for AI Agents
The zero-touch reflex and the security reflex are both right, and they pull in opposite directions.
One view holds that the protocol should stay out of application-level authorization. Before EMA, users completed one authorization flow per MCP server; afterward, the client included a bearer token that the server validated on every HTTP request. EMA centralizes that initial provisioning without changing the server's responsibility to validate requests.
The opposing view holds that user-visible friction can still serve a purpose. A per-server consent prompt is not approval of each transaction, but it does show the user what access is being granted. In hosts that expose connected tools across conversations, pre-connecting a high-stakes server can make it reachable from any such conversation. That argues for separate transaction-specific controls, not for preserving per-server OAuth prompts as their substitute.
Some security teams value explicit user consent for accountability, while others prefer centrally administered access with fine-grained agent policies. Both needs can be met by combining centralized provisioning with runtime enforcement and targeted human approval.
Without a runtime enforcement layer, zero-touch provisioning can leave an action-level authorization gap. Authorization should therefore be separated from model decision-making and enforced by the harness or execution layer, whether in-process, in a sidecar, or as a remote service.
How to Implement Per-Action Authorization with a Pre-Execution Interceptor
Insert a policy evaluation point between the LLM's tool-call decision and the actual tool execution. This is the "post-prompt, pre-execution" gap that EMA and zero-touch OAuth leave open by design.
The common objection is latency. Three implementations demonstrate that per-action policy evaluation is feasible at low cost relative to typical LLM inference:
- Microsoft's Agent Governance Toolkit (April 2026), which Microsoft describes as the first toolkit addressing all 10 OWASP agentic AI risks: a stateless policy engine with a ToolCallInterceptor that hooks into native framework extension points. Microsoft's own benchmarks report p99 under 0.1 milliseconds.
- OPA/Rego sidecar: suitable local policies can evaluate in single-digit milliseconds, although teams should benchmark their own policy complexity and deployment topology.
- Google Zanzibar: per-request authorization serving many large-scale Google services. Reported p95 under 10 milliseconds at millions of checks per second.
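Because published figures depend on policy complexity and topology, a team can measure its own numbers with a few lines of code. The harness below is a minimal sketch: `toy_policy` is a hypothetical stand-in for a real policy call, and in-process timing will not capture the network cost of a sidecar or remote service.

```python
import statistics
import time


def toy_policy(call: dict) -> bool:
    """Hypothetical stand-in for a real policy evaluation."""
    return call["tool"] in {"issues.label", "issues.assign"} and len(call["args"]) <= 8


def measure_latency_ns(policy, call: dict, iterations: int = 10_000) -> dict:
    """Time repeated policy evaluations and report latency percentiles in nanoseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        policy(call)
        samples.append(time.perf_counter_ns() - start)
    # n=100 yields 99 cut points; index 98 is the 99th percentile.
    cuts = statistics.quantiles(samples, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


result = measure_latency_ns(toy_policy, {"tool": "issues.label", "args": {"issue": 42}})
```

Swapping `toy_policy` for a function that calls the actual engine gives a like-for-like comparison against typical LLM inference time in the same deployment.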
The minimal viable architecture has three components:
- Interceptor hooking between the LLM's tool-call output and tool execution. Frameworks provide native extension points (LangChain callbacks, CrewAI middleware).
- Stateless policy engine evaluating each call against organization, user, and agent policy layers. OPA, Cedar, or equivalent, running locally or as a sidecar.
- Credential store isolated from the LLM. Raw tokens are never exposed to the model's context window. Credentials are injected only after policy allows execution.
The interceptor pattern in practice looks like this:
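```python
# opa_evaluate, execute_tool, and request_human_approval are assumed to be
# async helpers provided by the surrounding harness.
async def authorized_tool_call(tool_name, args, agent_id, delegation_chain):
    """Evaluate policy for one tool call, then execute, deny, or escalate."""
    decision = await opa_evaluate({
        "tool": tool_name,
        "args": args,
        "agent_id": agent_id,
        "delegation_chain": delegation_chain,
    })
    outcome = decision.get("outcome")
    if outcome == "allow":
        return await execute_tool(tool_name, args)
    if outcome == "deny":
        # Return a reason and code so the model can replan.
        return {"error": decision["reason"], "code": decision["reason_code"]}
    if outcome == "escalate":
        return await request_human_approval(tool_name, args, decision["reason"])
    # Fail closed on any outcome the interceptor does not recognize.
    return {"error": "Unknown policy outcome", "code": "unknown_outcome"}
```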
Production implementations should canonicalize tool arguments, bind policy decisions and human approvals to a hash of the exact tool name and arguments, and re-evaluate policy after an asynchronous approval. This prevents arguments, credentials, or policy state from changing between authorization and execution.
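One way to bind an approval to the exact call is to hash a canonical serialization of the tool name and arguments. The sketch below uses sorted-key JSON as a simple canonical form; that is an assumption made for illustration, and a stricter scheme such as RFC 8785 may be preferable where number and Unicode handling matter. The function names are hypothetical.

```python
import hashlib
import hmac
import json


def action_digest(tool_name: str, args: dict) -> str:
    """Hash a canonical form of the call so key order cannot change the digest."""
    canonical = json.dumps(
        {"tool": tool_name, "args": args},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approval_matches(approved_digest: str, tool_name: str, args: dict) -> bool:
    """Check, just before execution, that the call is the one that was approved."""
    return hmac.compare_digest(approved_digest, action_digest(tool_name, args))


approved = action_digest("payments.transfer", {"to": "acct-1", "amount": 100})
```

At execution time, `approval_matches(approved, ...)` succeeds for the same call with keys in any order and fails if the amount or recipient changed after the reviewer approved it.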
When Rego policies are written to return structured decisions (reason code, deciding policy rule), OPA can surface that context to the caller. A safe, user-facing reason code can be returned to the model so it can replan. Detailed policy rules and sensitive denial context should remain in internal audit logs rather than being exposed to the model.
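That split between model-facing feedback and internal audit detail might look like the following. The decision shape (`outcome`, `reason_code`, `rule`, `detail`) is a hypothetical structure a policy could be written to return; it is not a built-in OPA format.

```python
def split_decision(decision: dict) -> tuple[dict, dict]:
    """Separate what the model may see from what only the audit log keeps."""
    model_feedback = {
        "outcome": decision["outcome"],
        # Only a coarse, safe code goes back to the model for replanning.
        "code": decision.get("reason_code", "policy_denied"),
    }
    # The full decision, including the deciding rule, stays internal.
    audit_record = dict(decision)
    return model_feedback, audit_record


feedback, audit = split_decision({
    "outcome": "deny",
    "reason_code": "recipient_not_allowed",
    "rule": "org.email.external_recipients",
    "detail": "recipient domain is not on the approved list",
})
```

The model receives `recipient_not_allowed` and can choose another action, while the rule name and denial detail never enter its context window.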
Production implementations use RFC 8693 OAuth 2.0 Token Exchange to issue short-lived, least-privilege credentials bound to the current user and session. The LLM never sees any token; the execution layer receives the attenuated credential. This means a successful prompt injection that exfiltrates the agent's context window yields no actionable credentials. EMA's ID-JAG flow establishes the user's identity; credential isolation reduces the risk of that identity being exploited through token theft. Action-level policy and containment remain necessary to prevent the execution layer itself from being used as a confused deputy.
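The isolation property can be illustrated with a toy broker. Everything here is hypothetical: `CredentialBroker` stands in for an RFC 8693 exchange against a real authorization server, and the downstream API call is left as a comment.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedCredential:
    subject: str
    scope: str
    token: str
    expires_at: float


class CredentialBroker:
    """Hypothetical stand-in for a token exchange that mints short-lived credentials."""

    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self._ttl = ttl_seconds

    def issue(self, user_id: str, scope: str) -> ScopedCredential:
        return ScopedCredential(
            subject=user_id,
            scope=scope,
            token=secrets.token_urlsafe(16),
            expires_at=time.time() + self._ttl,
        )


def run_tool(broker: CredentialBroker, user_id: str, tool_name: str,
             args: dict, policy_allows: bool) -> dict:
    """Execution layer: fetch a credential only after policy allows the call."""
    if not policy_allows:
        return {"error": "denied", "code": "policy_denied"}
    credential = broker.issue(user_id, scope=tool_name)
    # A real implementation would call the downstream API with credential.token here.
    assert credential.expires_at > time.time()
    # Only the tool result is returned to the model; the credential is not.
    return {"tool": tool_name, "status": "executed", "args": args}
```

Whatever the model's context window contains after this call, it does not include the token, so exfiltrating that context yields nothing that can be replayed against the downstream service.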
Different risk levels warrant different patterns:
| Pattern | When to Use | Latency | Human Required? |
|---|---|---|---|
| Synchronous policy check | Read operations, low-risk tool calls | < 10ms | No |
| Asynchronous human-in-the-loop (HITL) approval | Financial transactions, data deletion | Minutes to hours | Yes |
| Deny-with-replan | Agent can choose an alternative action | < 10ms + inference | No |
The asynchronous pattern draws from financial services' four-eyes principle (maker-checker): one party prepares an action, another independently reviews and approves before execution. The agent is the "maker." When a human independently reviews the agent's proposed action, this is literal maker-checker. Automated policy enforcement provides an analogous independent control but is not, by itself, the four-eyes principle.
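Choosing among the three patterns can be expressed as a small routing function. The tool names and the high-risk set are hypothetical; in practice the classification would come from organization policy rather than a hard-coded set.

```python
from enum import Enum


class Pattern(Enum):
    SYNC_CHECK = "synchronous policy check"
    HITL_APPROVAL = "asynchronous human approval"
    DENY_WITH_REPLAN = "deny with replan"


# Hypothetical classification of consequential tools.
HIGH_RISK_TOOLS = {"payments.transfer", "records.delete"}


def choose_pattern(tool_name: str, policy_allows: bool) -> Pattern:
    """Route a proposed tool call to one of the three enforcement patterns."""
    if not policy_allows:
        # The model gets a reason code and may pick an alternative action.
        return Pattern.DENY_WITH_REPLAN
    if tool_name in HIGH_RISK_TOOLS:
        # Allowed by policy, but a human must review before execution.
        return Pattern.HITL_APPROVAL
    return Pattern.SYNC_CHECK
```

Under these assumptions, a permitted `issues.read` call proceeds after the synchronous check, a permitted `payments.transfer` waits for a human reviewer, and any call that policy rejects goes back to the model for replanning.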
Why Per-Action Authorization Is Inevitable for Enterprise AI
The industry has repeatedly moved from coarse upfront grants toward narrower runtime controls, and each time, it wasn't optional for long.
Android permissions. Before Android 6.0 Marshmallow (2015), apps received all requested permissions at install time. Users faced an all-or-nothing choice. Android 6.0 moved "dangerous permissions" to a contextual, just-in-time model: apps must request them at the moment of use, and users can deny or revoke specific permissions. Once granted, permissions persist until revoked, so this is not per-action authorization. But the shift from blanket install-time grants to contextual, revocable runtime grants is the same directional move. In this analogy, install-time permissions correspond to connection-time provisioning (EMA's domain).
Google BeyondCorp. After Operation Aurora (2010) demonstrated that perimeter-based trust was insufficient, Google replaced its castle-and-moat model with per-request evaluation based on device state, user identity, and context, regardless of network location. The lesson: "connected" (on the corporate network) was not an authorization decision.
OAuth's own evolution. OAuth retained bearer-token deployments while adding PKCE, DPoP, and updated security guidance to harden different stages of the flow. Neither PKCE nor DPoP is per-action authorization, but both responded to attacks that exploited assumptions about what grant-time credentials could safely cover.
AI agent authorization is the next instance. EMA represents the maturation of the connection layer, the same way centralized SSO matured enterprise web app access. The CSA, NSA, and OWASP already emphasize action-level controls, least privilege, deterministic validation, and explicit approval for consequential operations. The question is how quickly the industry will build the runtime layer that complements centralized provisioning.
Compliance pressure is accelerating the timeline. SOC 2 Trust Services Criteria map naturally to per-action controls. CC6.1 (logical and physical access controls) can be supported when audit trails capture each agent action, not just token issuance. CC6.6 (system boundary protection) is strengthened when policy enforcement operates at the tool-call level, not just the network perimeter. CC7.2 (anomaly monitoring) benefits from granular agent telemetry that reveals unusual tool-call patterns in real time. Per-tool-call logging is not a verbatim SOC 2 requirement, but it can provide useful evidence when auditors assess how agent access and actions are controlled.
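A per-action audit record of the kind described here could be as simple as the structure below. The field names are hypothetical and do not follow any SOC 2 or OpenTelemetry schema; the arguments are stored as a digest on the assumption that raw parameters may contain sensitive data.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ActionAuditEvent:
    agent_id: str
    user_id: str
    tool: str
    args_digest: str
    policy_id: str
    outcome: str
    timestamp: str


def make_event(agent_id: str, user_id: str, tool: str,
               args_digest: str, policy_id: str, outcome: str) -> ActionAuditEvent:
    """Record one evaluated tool call, whether it was allowed, denied, or escalated."""
    return ActionAuditEvent(
        agent_id=agent_id,
        user_id=user_id,
        tool=tool,
        args_digest=args_digest,
        policy_id=policy_id,
        outcome=outcome,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


def to_log_line(event: ActionAuditEvent) -> str:
    """Serialize the event as one JSON line for export to a log pipeline."""
    return json.dumps(asdict(event), sort_keys=True)


event = make_event("triage-bot", "user-1", "issues.label", "sha256:abc123", "org-policy-7", "allow")
line = to_log_line(event)
```

Each line answers the questions in the comparison table's audit row: which agent attempted which action, under which policy, with what outcome.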
On the analyst side, Gartner's Market Guide for Guardian Agents and Forrester's 2026 Technology and Security Predictions both signal that agent governance is now an enterprise category. Forrester predicts enterprises will defer 25% of planned AI spending to 2027 as financial scrutiny intensifies and organizations struggle to demonstrate ROI.
Building a Production Per-Action Authorization Architecture
A production-grade implementation requires seven components:
- Connection-time provisioning (EMA, centralized IdP) controlling which users and agents access which servers.
- Pre-execution interceptor between the LLM's tool-call output and execution.
- Policy engine evaluating the three-layer intersection (org ∩ user ∩ agent) per call.
- Credential isolation from the LLM, with tokens injected only after policy allows.
- Deny-by-default stance with structured reason feedback for model replanning.
- Human-in-the-loop (HITL) approval for high-risk actions via Slack, email, or equivalent out-of-band flow.
- Per-action audit logging supporting SOC 2 Trust Services Criteria (CC6.1, CC6.6, CC7.2).
None of these components require novel technology. Microsoft AGT delivers sub-millisecond policy enforcement. OPA handles deny-with-reason in single-digit milliseconds. Zanzibar processes millions of authorization checks per second. EMA handles centralized provisioning today. The necessary building blocks exist. The gap is in connecting them: applying policies consistently across all agents as they scale to more users and systems. That is the central gap an action runtime fills. Without infrastructure for secure action, organizations often restrict agents to analysis and recommendations, keeping realized ROI incremental.
Arcade.dev evaluates agent scope and user scope together on every tool call. Its Contextual Access capability adds customer-defined organization policy through pre-execution hooks that can allow, deny, or modify tool calls. Credentials remain isolated from the LLM, and the model never receives raw tokens. Arcade's catalog includes 8,000+ agent-optimized tools designed around natural-language intent rather than raw API passthrough.
Arcade goes beyond routing. Its MCP Gateway federates multiple servers behind a single controlled endpoint. For governance, Arcade generates structured, OpenTelemetry-compatible audit events for every agent action, attributable to the requesting user and exportable to enterprise SIEM systems.
Arcade integrates with existing OAuth and IdP flows, including Microsoft Entra and Okta, rather than replacing them. It can be deployed in Arcade Cloud, in a customer VPC, on-premises, or in a fully air-gapped environment, allowing organizations to control data residency and network isolation.
Other tools in this space (OPA, Cedar, Microsoft AGT, Kontext, AuthZed) address individual pieces: policy engines, credential management, or governance overlays. Arcade provides all of these capabilities out of the box. By uniting agent authorization (policy and credentials), agent-optimized tools, and lifecycle governance into a single runtime, Arcade solves the complete execution-time security challenge. That matters because these three concerns interact at execution time.
Conclusion
EMA is the right answer to one authorization problem, but not the complete answer for agent runtime security.
The industry has repeatedly moved from coarse upfront grants toward narrower runtime controls. Each time, early adopters avoided the painful retrofit that the rest of the industry eventually endured.
The teams building continuous authorization into their agent architecture now, complementing EMA with runtime policy enforcement, make the same bet the Android, BeyondCorp, and OAuth security teams made: that "provisioned" was never the same as "authorized," and that the gap between them is where real attacks live.
FAQ
What is Enterprise-Managed Authorization (EMA) for MCP?
Enterprise-Managed Authorization is an MCP extension that allows organizations to centrally manage which MCP servers their users can access. It uses the organization's identity provider (IdP) to provision access based on groups, roles, and conditional access rules. Users authenticate once through SSO and automatically connect to all approved MCP servers without per-server consent prompts.
How does EMA relate to per-action authorization?
EMA and per-action authorization solve different problems at different points in the execution lifecycle. EMA governs who connects to what (provisioning). Per-action authorization governs whether a specific tool call should execute (runtime enforcement). EMA is the outer gate; per-action authorization is the inner gate. A complete enterprise architecture needs both centralized provisioning and runtime enforcement; EMA is one way to provide the provisioning layer.
What is per-action authorization for AI agents?
Per-action authorization is a security model that evaluates whether a specific AI agent tool call should proceed based on organization policy, user delegation, and agent capability. It checks permissions at execution time, immediately after the prompt and before the action occurs. This limits the blast radius of prompt injection by blocking policy-violating actions, even when the underlying permissions were legitimately provisioned through EMA or standard OAuth.
Why is EMA not sufficient for AI agent security?
EMA centralizes access provisioning, which is valuable. But it evaluates access at token issuance (not per tool call) and cannot detect if a specific runtime action was genuinely requested by the user or triggered by a prompt injection. Because AI agents execute tasks at machine speed, they can rapidly exercise latent over-provisioning inherent in standard OAuth scopes, even when those scopes were provisioned through a centrally managed, policy-governed flow.
How can prompt injection abuse access granted through EMA and OAuth?
Prompt injection abuses EMA- and OAuth-granted access by planting malicious instructions within untrusted content that an authenticated AI agent processes. Because the agent's connection to tools like GitHub or Azure is already authorized via valid, centrally provisioned tokens, the tool calls that the injected instructions trigger use valid credentials and remain within granted scopes, so they can pass conventional token, scope, and ACL checks. Those checks do not establish whether the user intended the particular action.
Does per-action authorization add latency to AI agents?
Per-action authorization typically adds low latency when evaluated locally or in-process. Suitable local policies can complete in single-digit milliseconds, though results vary with policy complexity and network topology. For local policies this overhead is usually small relative to LLM inference, but remote services and complex policies should be benchmarked in the target deployment.
How do you implement per-action authorization alongside EMA?
You implement per-action authorization by inserting a pre-execution interceptor between the LLM tool call output and the actual tool execution. This interceptor uses a stateless policy engine to evaluate the requested action against organization, user, and agent policies. EMA continues to handle grant-time provisioning through the IdP. Developers can build this architecture manually or use an action runtime platform like Arcade to enforce runtime checks across their agent infrastructure while preserving their existing EMA and IdP flows.
What does Arcade do for AI agent authorization?
Arcade is an action runtime platform that provides per-action authorization, managed tools, and governance for AI agents in a single unified system. It evaluates agent and user scopes on every tool call and can enforce customer-defined organization policy through pre-execution hooks immediately before execution. Arcade integrates with existing IdP infrastructure (such as Microsoft Entra and Okta via OIDC) rather than replacing it, adding the runtime enforcement layer that grant-time provisioning cannot provide. It also isolates credentials from the LLM so that the model never sees raw tokens, reducing credential-exfiltration risk during prompt injection attacks. Action-level policy and containment remain necessary to prevent the execution layer from being used as a confused deputy.