<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arcade.dev</title>
    <description>The latest articles on DEV Community by Arcade.dev (arcade).</description>
    <link>https://dev.to/arcade</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F12915%2Febc942d3-5ae5-44e5-9a4a-06829aad6a1a.png</url>
      <title>DEV Community: Arcade.dev</title>
      <link>https://dev.to/arcade</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arcade"/>
    <language>en</language>
    <item>
      <title>Claude Tag: How to Build Your Own Slack AI Agent with Arcade.dev</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Thu, 25 Jun 2026 20:21:44 +0000</pubDate>
      <link>https://dev.to/arcade/claude-tag-how-to-build-your-own-slack-ai-agent-with-arcadedev-3724</link>
      <guid>https://dev.to/arcade/claude-tag-how-to-build-your-own-slack-ai-agent-with-arcadedev-3724</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;"Today, 65% of our product team's code is created by our internal version of Claude Tag."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's Anthropic, talking about its own engineering team. And this is not code autocomplete or a chatbot generating snippets in isolation. Claude Tag is a shared agent inside Slack that teammates mention by name to investigate bugs, pull metrics, work support tickets, and complete longer-running tasks. It reads thread context, connects to approved tools and codebases, and posts results back in the same conversation.&lt;/p&gt;

&lt;p&gt;The question is not whether Claude Tag is impressive. It is: what would your team delegate if you had one?&lt;/p&gt;

&lt;p&gt;You do not need to recreate Anthropic's entire product to find out. This tutorial reproduces Claude Tag's core interaction pattern rather than the proprietary product itself. Start with one high-value Slack workflow, give the agent a small toolset, and use &lt;a href="https://www.arcade.dev" rel="noopener noreferrer"&gt;Arcade.dev&lt;/a&gt; for the action layer: tool connectivity, authorization, and controlled access to external systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways: Claude Tag and building your own Slack AI agent
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Tag is Anthropic's shared AI agent for Slack.&lt;/strong&gt; It lets teams mention &lt;code&gt;@Claude&lt;/code&gt; in selected channels to complete multi-step work using conversation context, connected tools, and codebases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Tag turns Slack into the agent interface.&lt;/strong&gt; It can remember relevant channel context, work asynchronously, use a dedicated identity, and return results in the thread where the request began.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You can recreate the core Claude Tag pattern.&lt;/strong&gt; This tutorial builds a Claude Tag-style Slack AI agent with Python, Slack Bolt, OpenAI, and Arcade.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Arcade provides secure tool access.&lt;/strong&gt; The example connects the agent to read-only GitHub, Datadog, and PagerDuty tools while Arcade handles authorization, credentials, tool execution, and access controls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Start with one bounded workflow.&lt;/strong&gt; Incident triage is a strong first use case because it crosses multiple systems, produces reviewable evidence, and does not require irreversible actions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production agents need explicit safeguards.&lt;/strong&gt; Restrict the agent to approved Slack channels, use dedicated or per-user identities, require human approval for consequential writes, log its actions, and maintain a kill switch.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is Claude Tag and why does your team want it?
&lt;/h2&gt;

&lt;p&gt;Anthropic launched &lt;a href="https://www.anthropic.com/news/introducing-claude-tag" rel="noopener noreferrer"&gt;Claude Tag&lt;/a&gt; on June 23, 2026, as a beta for Enterprise and Team customers. The operating model is simple: Claude joins selected Slack channels as a teammate. Anyone in the channel can tag &lt;code&gt;@Claude&lt;/code&gt; with a request. Claude breaks the task into stages, works through them using connected tools, and replies in-thread with what it produced. Once a thread is active, anyone there can steer it without re-mentioning the agent.&lt;/p&gt;

&lt;p&gt;What makes this different from a personal chatbot is that the work happens in public. The channel is the interface, the context, and the audit trail. A single shared Claude instance serves an entire channel, building persistent memory as it follows along. It can work asynchronously, schedule its own follow-up tasks, and combine context from Slack threads, Google Drive docs, ticketing systems, and data warehouses into a single answer.&lt;/p&gt;

&lt;p&gt;The underlying insight is not about AI capabilities. It is about where work starts. Most cross-functional tasks begin as a Slack message. Someone asks a question, flags a problem, or requests information that lives across three systems. The real value of a shared agent is that it does useful work in the place where that work already begins.&lt;/p&gt;

&lt;h2&gt;
  
  
  Do not build an AI employee. Pick one workflow.
&lt;/h2&gt;

&lt;p&gt;The fastest way to stall an agent project is to scope it as "an AI that can do anything." Start with one workflow. Choose something that is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frequent.&lt;/strong&gt; The team does it every week, ideally every day.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-system.&lt;/strong&gt; It requires pulling context from two or more tools (Slack, GitHub, a dashboard, a CRM).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tedious to investigate manually.&lt;/strong&gt; Someone has to copy-paste between tabs, summarize findings, and post an update.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy for a human to review.&lt;/strong&gt; The agent produces a summary or recommendation, not a final irreversible action.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some high-value starting points:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incident triage&lt;/strong&gt; across Slack, GitHub, and observability tools. When errors spike after a deployment, the agent pulls recent commits, queries Datadog for error rates and latency, checks PagerDuty for related incidents, and posts a structured summary with evidence links.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Support escalation summaries&lt;/strong&gt; using your ticketing system, CRM, and internal docs. Instead of an engineer spending 15 minutes rebuilding context on an escalated ticket, the agent does it in seconds and posts the summary in the escalation channel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Product-feedback triage&lt;/strong&gt; that reads a Slack thread, extracts the core request, checks for duplicates in Linear or Jira, and creates a properly tagged issue with the original thread linked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Account research&lt;/strong&gt; that pulls together CRM data, recent email threads, product usage metrics, and internal notes before a customer call.&lt;/p&gt;

&lt;p&gt;Start narrow. A focused agent earns trust faster than a broadly capable one.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does a Claude Tag-style Slack agent work?
&lt;/h2&gt;

&lt;p&gt;The architecture behind a Claude Tag-style agent has four layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Slack is the interface.&lt;/strong&gt; Users tag the agent in a thread. Slack delivers the triggering event; your application retrieves thread context via the API and displays results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The model is the reasoning layer.&lt;/strong&gt; It understands the request, decides what information it needs, and synthesizes a response. Use whatever LLM and agent framework fits your stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Arcade is the action layer.&lt;/strong&gt; It connects the agent to approved tools, handles authorization and token management, and enforces access policy. The model never sees credentials.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your app handles orchestration.&lt;/strong&gt; It manages task state, retries, and async job processing, and it posts updates back to Slack.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fx54ag558ryuzh4oecx79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fx54ag558ryuzh4oecx79.png" alt="Slack AI agent architecture showing the five stages from a Slack @mention, through the agent's reasoning loop and the Arcade API MCP Gateway, to approved tools and the result returned in Slack" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each layer is independently replaceable. Swap the model, change the framework, add tools. The boundaries stay clean.&lt;/p&gt;

&lt;p&gt;What we are building is a shared agent, not a multi-user agent. Every tool call runs under a single service identity regardless of who tagged the bot. Step 4 covers how to add per-user authorization if your use case requires it.&lt;/p&gt;

&lt;p&gt;This prototype starts a run only when mentioned. Claude Tag's production experience supports unmentioned follow-ups within an active thread. To add that behavior, subscribe to &lt;code&gt;message.channels&lt;/code&gt; and &lt;code&gt;message.groups&lt;/code&gt;, track active thread IDs, and filter out bot-generated messages. That is a production extension beyond the scope of this walkthrough.&lt;/p&gt;
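
&lt;p&gt;As a rough illustration of that extension, the filter below decides whether a plain &lt;code&gt;message&lt;/code&gt; event should be treated as a follow-up in an active thread. It is a minimal sketch and not part of this tutorial's code: the function name &lt;code&gt;should_handle_followup&lt;/code&gt; and the in-memory &lt;code&gt;active_threads&lt;/code&gt; set are assumptions made for illustration, and a real deployment would likely keep active threads in a shared store with an expiry.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def should_handle_followup(event, active_threads, bot_user_id):
    """Return True for a human reply in an active thread that does not mention the bot.

    active_threads is a set of (channel_id, thread_ts) tuples (hypothetical store).
    """
    # Skip bot posts and any message carrying a subtype (edits, joins, and so on).
    if event.get("bot_id") or event.get("subtype"):
        return False
    thread_ts = event.get("thread_ts")
    if thread_ts is None:
        # Top-level messages still require an explicit mention.
        return False
    if (event.get("channel"), thread_ts) not in active_threads:
        return False
    # Replies that mention the bot also arrive as app_mention events,
    # so leave them to the mention handler to avoid running twice.
    if bot_user_id and f"@{bot_user_id}" in event.get("text", ""):
        return False
    return True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under these assumptions, the mention handler would add &lt;code&gt;(channel, thread_ts)&lt;/code&gt; to the set when a run starts, and a separate message handler would call this function before invoking the agent.&lt;/p&gt;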

&lt;h2&gt;
  
  
  How to build a Claude Tag-style Slack agent with Arcade
&lt;/h2&gt;

&lt;p&gt;This walkthrough uses Python with Slack's Bolt framework and the Arcade Python SDK. The same pattern works with any language or agent framework that supports MCP or Arcade's REST API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;You need Python 3.8+, permission to create and install a Slack app, an &lt;a href="https://docs.arcade.dev/home/api-keys" rel="noopener noreferrer"&gt;Arcade account and API key&lt;/a&gt;, and an &lt;a href="https://platform.openai.com/api-keys" rel="noopener noreferrer"&gt;OpenAI API key&lt;/a&gt;. For local Slack Events API testing, also install and authenticate the &lt;a href="https://ngrok.com/docs/getting-started" rel="noopener noreferrer"&gt;ngrok CLI&lt;/a&gt; or another public HTTPS tunnel.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
python &lt;span class="nt"&gt;-m&lt;/span&gt; pip &lt;span class="nb"&gt;install &lt;/span&gt;slack-bolt arcadepy openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1: Create the Slack app and event trigger
&lt;/h3&gt;

&lt;p&gt;Create a Slack app at &lt;a href="https://api.slack.com/apps" rel="noopener noreferrer"&gt;api.slack.com/apps&lt;/a&gt;. Under &lt;strong&gt;OAuth &amp;amp; Permissions&lt;/strong&gt;, add the bot scopes &lt;code&gt;app_mentions:read&lt;/code&gt;, &lt;code&gt;chat:write&lt;/code&gt;, &lt;code&gt;channels:history&lt;/code&gt;, and &lt;code&gt;groups:history&lt;/code&gt;. Install the app to your workspace, then copy the Bot User OAuth Token (&lt;code&gt;xoxb-...&lt;/code&gt;) and Signing Secret from the app settings.&lt;/p&gt;

&lt;p&gt;You now have everything needed to set the environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SLACK_BOT_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"xoxb-..."&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SLACK_SIGNING_SECRET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ARCADE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ARCADE_USER_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"you@company.com"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SLACK_ALLOWED_CHANNEL_IDS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"C0123456789"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For &lt;code&gt;ARCADE_USER_ID&lt;/code&gt;, use the email associated with your Arcade account. Arcade's &lt;a href="https://docs.arcade.dev/home/quickstart" rel="noopener noreferrer"&gt;default development verifier&lt;/a&gt; expects that identity. This is the single shared identity under which every tool call executes: all mentions in all approved channels resolve to this one account. Setting this value does not create GitHub or PagerDuty service accounts on its own. If the agent must act under a dedicated downstream identity, sign in with dedicated accounts during the OAuth flows in Step 2.&lt;/p&gt;

&lt;p&gt;Replace &lt;code&gt;C0123456789&lt;/code&gt; with your actual Slack channel ID. Open the channel in Slack's web or desktop app and copy the &lt;code&gt;C...&lt;/code&gt; portion of its URL (&lt;code&gt;https://app.slack.com/client/T.../C...&lt;/code&gt;). See Slack's &lt;a href="https://slack.com/help/articles/221769328-Locate-your-Slack-URL-or-ID" rel="noopener noreferrer"&gt;guide to locating IDs&lt;/a&gt; for details.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SLACK_ALLOWED_CHANNEL_IDS&lt;/code&gt; restricts the agent to specific channels, enforcing the per-channel scoping that Claude Tag uses. Comma-separate multiple channel IDs. If different channels need different permissions or toolsets, you will need a &lt;code&gt;channel_id&lt;/code&gt;-to-identity mapping or separate deployments.&lt;/p&gt;
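
&lt;p&gt;One possible shape for a &lt;code&gt;channel_id&lt;/code&gt;-to-identity mapping is a small JSON object read from the environment. This is a sketch under stated assumptions: the variable name &lt;code&gt;SLACK_CHANNEL_IDENTITIES&lt;/code&gt; and both helper functions are hypothetical and are not used elsewhere in this tutorial.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import os


def load_channel_identities(raw=None):
    """Parse a JSON object that maps Slack channel IDs to Arcade user IDs."""
    if raw is None:
        # SLACK_CHANNEL_IDENTITIES is a hypothetical variable name.
        raw = os.environ.get("SLACK_CHANNEL_IDENTITIES", "{}")
    mapping = json.loads(raw)
    if not isinstance(mapping, dict):
        raise ValueError("expected a JSON object of channel_id to user_id")
    return {str(key).strip(): str(value).strip() for key, value in mapping.items()}


def identity_for_channel(channel_id, mapping):
    """Return the Arcade user ID for a channel, or None if the channel is not approved."""
    return mapping.get(channel_id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a mapping like this, the keys double as the channel allowlist: a handler could ignore any channel that resolves to &lt;code&gt;None&lt;/code&gt; and pass the resolved identity to the agent in place of the single &lt;code&gt;ARCADE_USER_ID&lt;/code&gt;.&lt;/p&gt;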

&lt;p&gt;Slack's three-second rule is the critical implementation detail. Your endpoint must return HTTP 200 within three seconds, or Slack marks the delivery as failed and retries up to three times. Bolt handles acknowledgement automatically when you use the standard decorator pattern. For production workloads where agent processing takes longer, offload work to a task queue. Deduplicate on Slack's top-level &lt;code&gt;event_id&lt;/code&gt; before enqueueing work; otherwise, retries can execute the same tools twice.&lt;/p&gt;
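
&lt;p&gt;The deduplication step can be sketched with a small bounded cache of recent event IDs. This is an illustrative, single-process version: the &lt;code&gt;EventDeduplicator&lt;/code&gt; class is an assumption and is not part of &lt;code&gt;app.py&lt;/code&gt;. A deployment with several workers would need a shared store instead of process memory.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import deque


class EventDeduplicator:
    """Remembers the most recent Slack event IDs (in memory, one process only)."""

    def __init__(self, max_size=10000):
        # max_size must be a positive integer.
        self._order = deque(maxlen=max_size)
        self._seen = set()

    def is_duplicate(self, event_id):
        """Return True if event_id was seen recently; otherwise record it."""
        if event_id in self._seen:
            return True
        if len(self._order) == self._order.maxlen:
            # The deque is full, so the append below evicts the oldest ID.
            # Forget that ID in the lookup set as well.
            self._seen.discard(self._order[0])
        self._order.append(event_id)
        self._seen.add(event_id)
        return False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In a Bolt listener, the top-level ID is available on the &lt;code&gt;body&lt;/code&gt; argument as &lt;code&gt;body["event_id"]&lt;/code&gt;, so a handler could return early when &lt;code&gt;is_duplicate&lt;/code&gt; reports a repeat.&lt;/p&gt;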

&lt;p&gt;Save this as &lt;code&gt;app.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;slack_bolt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;App&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;run_agent&lt;/span&gt;  &lt;span class="c1"&gt;# Step 3
&lt;/span&gt;
&lt;span class="n"&gt;ALLOWED_CHANNEL_IDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SLACK_ALLOWED_CHANNEL_IDS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SLACK_BOT_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;signing_secret&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SLACK_SIGNING_SECRET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@app.event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app_mention&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_mention&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;say&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;channel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ALLOWED_CHANNEL_IDS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ignoring mention from unauthorized channel %s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;channel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="c1"&gt;# Ignore messages from bots (including this one) to prevent loops
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bot_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="n"&gt;thread_ts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_ts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Retrieve up to 50 messages of thread context.
&lt;/span&gt;        &lt;span class="c1"&gt;# Production implementations should follow
&lt;/span&gt;        &lt;span class="c1"&gt;# response_metadata.next_cursor for longer threads.
&lt;/span&gt;        &lt;span class="n"&gt;replies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;conversations_replies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;channel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread_ts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;bot_user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bot_user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;transcript&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;replies&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
            &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;bot_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;@&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bot_user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;speaker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bot_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;speaker&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;On it. Gathering context...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_ts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread_ts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ARCADE_USER_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Slack recommends keeping messages under 4,000 characters.
&lt;/span&gt;        &lt;span class="c1"&gt;# Truncate or chunk longer responses in production.
&lt;/span&gt;        &lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_ts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread_ts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I couldn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t complete that investigation. Check the application logs.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;thread_ts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread_ts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# This is Bolt's built-in development server. For production,
&lt;/span&gt;    &lt;span class="c1"&gt;# deploy through a supported web-framework adapter (e.g. Flask + Gunicorn).
&lt;/span&gt;    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PORT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to note. Bolt handles signing-secret verification automatically when you pass &lt;code&gt;signing_secret&lt;/code&gt; to the &lt;code&gt;App&lt;/code&gt; constructor. The allowlist check at the top of the handler enforces per-channel scoping, so the agent responds only in channels you have explicitly approved. The &lt;code&gt;conversations_replies&lt;/code&gt; call retrieves one page of thread context (up to 50 messages) so the agent sees more than just the triggering message. Slack's &lt;a href="https://docs.slack.dev/apis/events-api" rel="noopener noreferrer"&gt;Events API&lt;/a&gt; delivers only the triggering event, not the thread history, so your app must fetch it. Finally, the &lt;code&gt;event.get("bot_id")&lt;/code&gt; guard prevents the agent from responding to its own messages and creating an infinite loop.&lt;/p&gt;
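
&lt;p&gt;For threads longer than one page, the code comment in &lt;code&gt;app.py&lt;/code&gt; suggests following &lt;code&gt;response_metadata.next_cursor&lt;/code&gt;. A minimal sketch of that loop follows. The helper name &lt;code&gt;fetch_thread_messages&lt;/code&gt; and the &lt;code&gt;max_messages&lt;/code&gt; cap are assumptions added for illustration, and the helper expects the same &lt;code&gt;client&lt;/code&gt; object that Bolt passes to the handler.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def fetch_thread_messages(client, channel, thread_ts, page_size=50, max_messages=200):
    """Collect thread messages across pages by following next_cursor."""
    messages = []
    cursor = None
    while True:
        kwargs = {"channel": channel, "ts": thread_ts, "limit": page_size}
        if cursor:
            kwargs["cursor"] = cursor
        response = client.conversations_replies(**kwargs)
        messages.extend(response.get("messages", []))
        # Enforce the cap: keep only the first max_messages returned.
        del messages[max_messages:]
        cursor = (response.get("response_metadata") or {}).get("next_cursor")
        # Slack returns an empty cursor on the last page.
        if not cursor or len(messages) == max_messages:
            return messages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The cap keeps a very long thread from inflating the prompt. Whether to keep the earliest or the most recent messages is a design choice this sketch does not settle.&lt;/p&gt;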

&lt;h3&gt;
  
  
  Step 2: Connect GitHub, Datadog, and PagerDuty with Arcade
&lt;/h3&gt;

&lt;p&gt;Arcade connects your agent to external systems through a curated set of tools. For incident triage, you need read-only tools from GitHub, Datadog, and PagerDuty. Select specific tools rather than loading entire toolkits. Toolkits include write operations that contradict a read-only agent's scope, and a narrower tool list helps the model pick the right tool more reliably.&lt;/p&gt;

&lt;p&gt;These tool names match Arcade's current &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/github" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/datadog" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt;, and &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/pagerduty" rel="noopener noreferrer"&gt;PagerDuty&lt;/a&gt; catalogs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;TOOL_NAMES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Github.ListRepositoryActivities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Github.GetPullRequest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Datadog.AggregateEvents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Datadog.SearchLogs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pagerduty.ListIncidents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pagerduty.ListLogEntries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Authorize tools before first use.&lt;/strong&gt; GitHub and PagerDuty require OAuth authorization. Datadog requires API credentials configured as Arcade secrets (&lt;code&gt;DATADOG_API_KEY&lt;/code&gt;, &lt;code&gt;DATADOG_APPLICATION_KEY&lt;/code&gt;, and &lt;code&gt;DATADOG_SITE&lt;/code&gt;). Configure the Datadog secrets in the &lt;a href="https://api.arcade.dev/dashboard/auth/secrets" rel="noopener noreferrer"&gt;Arcade secrets dashboard&lt;/a&gt;, then save the following as &lt;code&gt;authorize.py&lt;/code&gt; and run it once to complete the OAuth flows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;arcadepy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Arcade&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;arcade&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Arcade&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ARCADE_USER_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;OAUTH_TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Github.ListRepositoryActivities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Github.GetPullRequest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pagerduty.ListIncidents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pagerduty.ListLogEntries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;OAUTH_TOOLS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arcade&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;authorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorize &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;arcade&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;All OAuth-backed tools authorized.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open each URL and complete the OAuth consent. Arcade stores the tokens and refreshes them automatically. Subsequent calls reuse the authorization until it expires, is revoked, or a tool requires additional permissions. See Arcade's &lt;a href="https://docs.arcade.dev/en/guides/tool-calling/custom-apps/auth-tool-calling" rel="noopener noreferrer"&gt;authorization guide&lt;/a&gt; for the full setup flow.&lt;/p&gt;

&lt;p&gt;If your agent framework supports MCP natively, you can alternatively create an &lt;a href="https://docs.arcade.dev/en/guides/mcp-gateways" rel="noopener noreferrer"&gt;Arcade MCP Gateway&lt;/a&gt; that federates these tools behind a single Streamable-HTTP endpoint. The gateway serves tool definitions over MCP, so your agent discovers exactly the tools you curated. The direct SDK approach shown here works with any framework.&lt;/p&gt;

&lt;p&gt;Tool selection is both a technical and a product decision. The fewer tools the agent sees, the more reliably it picks the right one.&lt;/p&gt;
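&lt;p&gt;One way to keep the curated list honest as it grows is a small startup check on the tool names. This is a heuristic sketch only: the verb prefixes in &lt;code&gt;READ_ONLY_VERBS&lt;/code&gt; and the function &lt;code&gt;non_read_only&lt;/code&gt; are assumptions for illustration, and a name prefix does not prove that a tool is read-only, so the tool documentation remains the source of truth.&lt;/p&gt;

```python
# Assumed naming convention: read-style tools start with one of these verbs.
READ_ONLY_VERBS = ("List", "Get", "Search", "Aggregate")


def non_read_only(tool_names, allowed_verbs=READ_ONLY_VERBS):
    """Return the tool names whose action does not start with an allowed verb."""
    flagged = []
    for name in tool_names:
        # Tool names in this tutorial look like "Toolkit.ToolName".
        _, _, action = name.partition(".")
        if not action.startswith(allowed_verbs):
            flagged.append(name)
    return flagged


TOOL_NAMES = [
    "Github.ListRepositoryActivities",
    "Github.GetPullRequest",
    "Datadog.AggregateEvents",
    "Datadog.SearchLogs",
    "Pagerduty.ListIncidents",
    "Pagerduty.ListLogEntries",
]

flagged = non_read_only(TOOL_NAMES)
if flagged:
    raise ValueError("Tools that do not look read-only: " + ", ".join(flagged))
```

&lt;p&gt;Failing at startup turns an accidental write-capable addition into a visible configuration error rather than a surprise during an incident.&lt;/p&gt;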

&lt;h3&gt;
  
  
  Step 3: Build the tool-calling agent loop
&lt;/h3&gt;

&lt;p&gt;This is the piece that connects the Slack trigger to the tools. Your agent runtime sits between Slack and Arcade: it receives the thread transcript, uses an LLM to decide what tools to call, and executes them through Arcade.&lt;/p&gt;

&lt;p&gt;Arcade is framework-agnostic. It works with LangGraph, the OpenAI Agents SDK, CrewAI, Mastra, Pydantic AI, Google ADK, or any MCP-compatible client. The integration has two touchpoints, both through the &lt;code&gt;arcadepy&lt;/code&gt; SDK: &lt;code&gt;tools.formatted.get&lt;/code&gt; to load tool definitions, and &lt;code&gt;tools.execute&lt;/code&gt; to run them.&lt;/p&gt;

&lt;p&gt;Save the following as &lt;code&gt;agent.py&lt;/code&gt;. This is the &lt;code&gt;run_agent&lt;/code&gt; function imported in Step 1, using the OpenAI Chat Completions API directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;arcadepy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Arcade&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;arcade&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Arcade&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;   &lt;span class="c1"&gt;# reads ARCADE_API_KEY from env
&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;      &lt;span class="c1"&gt;# reads OPENAI_API_KEY from env
&lt;/span&gt;
&lt;span class="c1"&gt;# Load tools once at startup, not on every request
&lt;/span&gt;&lt;span class="n"&gt;TOOL_NAMES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Github.ListRepositoryActivities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Github.GetPullRequest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Datadog.AggregateEvents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Datadog.SearchLogs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pagerduty.ListIncidents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pagerduty.ListLogEntries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;OPENAI_TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;ARCADE_NAME_BY_FUNCTION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;arcade_name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;TOOL_NAMES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;definition&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arcade&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arcade_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;OPENAI_TOOLS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;definition&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ARCADE_NAME_BY_FUNCTION&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;definition&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arcade_name&lt;/span&gt;

&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You investigate production incidents using only the supplied read-only &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools. Return a concise summary, evidence with source identifiers or &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;links, a recommended next step, and an Actions taken section. Never &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claim a query succeeded unless its tool result confirms success.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;MAX_TOOL_ROUNDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MAX_TOOL_ROUNDS&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_MODEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;OPENAI_TOOLS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No response was produced.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;arcade_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ARCADE_NAME_BY_FUNCTION&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arcade&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arcade_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;
                    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unknown tool error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent exceeded the maximum number of tool rounds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things worth noting. Tools are loaded once at module level using &lt;code&gt;formatted.get&lt;/code&gt; for each specific tool, which avoids pulling in unwanted write operations and eliminates per-request overhead. The &lt;code&gt;ARCADE_NAME_BY_FUNCTION&lt;/code&gt; mapping handles the translation between OpenAI's function names and Arcade's tool names. The loop caps at &lt;code&gt;MAX_TOOL_ROUNDS&lt;/code&gt; to prevent runaway execution. Structured tool failures returned by Arcade are fed back to the model as tool results, so it can report them in its summary instead of the run aborting. Network and SDK exceptions still bubble to the outer Slack handler. And &lt;code&gt;store=False&lt;/code&gt; disables storage of the Chat Completion as application state. It does not itself enable Zero Data Retention; API requests may still generate abuse-monitoring logs according to your organization's &lt;a href="https://developers.openai.com/api/docs/guides/your-data" rel="noopener noreferrer"&gt;data-control settings&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Arcade documents &lt;code&gt;formatted.get&lt;/code&gt;, &lt;code&gt;formatted.list&lt;/code&gt;, and the OpenAI format in its &lt;a href="https://docs.arcade.dev/en/guides/tool-calling/custom-apps/get-tool-definitions" rel="noopener noreferrer"&gt;tool definitions guide&lt;/a&gt;. Chat Completions remains supported, and GPT-4.1 supports function calling. OpenAI recommends the Responses API for new projects, but the pattern above remains valid. For a complete Slack-to-Arcade reference implementation using LangGraph, see &lt;a href="https://github.com/ArcadeAI/SlackAgent" rel="noopener noreferrer"&gt;ArcadeAI/SlackAgent&lt;/a&gt;. For other frameworks, see Arcade's &lt;a href="https://docs.arcade.dev/en/get-started/agent-frameworks/openai-agents/setup-python" rel="noopener noreferrer"&gt;framework-specific setup guides&lt;/a&gt;.&lt;/p&gt;
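&lt;p&gt;The loop above trusts the model to request a known tool with well-formed JSON arguments. A possible hardening step, sketched here under assumptions, validates both before execution and returns an error dict that can be sent back as the tool result. &lt;code&gt;safe_tool_call&lt;/code&gt; and its &lt;code&gt;execute&lt;/code&gt; callback are hypothetical names; in &lt;code&gt;agent.py&lt;/code&gt; the callback would wrap &lt;code&gt;arcade.tools.execute&lt;/code&gt; and &lt;code&gt;name_map&lt;/code&gt; would be &lt;code&gt;ARCADE_NAME_BY_FUNCTION&lt;/code&gt;.&lt;/p&gt;

```python
import json


def safe_tool_call(function_name, raw_arguments, name_map, execute):
    """Validate the model's tool call, then run it through the execute callback."""
    arcade_name = name_map.get(function_name)
    if arcade_name is None:
        # The model asked for a function that was never offered to it.
        return {"error": f"Unknown tool requested: {function_name}"}
    try:
        # Treat missing or empty arguments as an empty JSON object.
        arguments = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError as exc:
        return {"error": f"Arguments were not valid JSON: {exc.msg}"}
    if not isinstance(arguments, dict):
        return {"error": "Arguments must be a JSON object"}
    return execute(arcade_name, arguments)


# Usage with a stand-in executor; a real one would call Arcade.
name_map = {"Github_GetPullRequest": "Github.GetPullRequest"}
echo = lambda tool, args: {"tool": tool, "args": args}
print(safe_tool_call("Github_GetPullRequest", '{"pull_number": 7}', name_map, echo))
print(safe_tool_call("Github_DeleteRepo", "{}", name_map, echo))
```

&lt;p&gt;Returning the error as data keeps the behavior consistent with how structured Arcade failures are already handled: the model sees what went wrong and can say so in its summary.&lt;/p&gt;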

&lt;h3&gt;
  
  
  Step 4: Run and test the agent
&lt;/h3&gt;

&lt;p&gt;With all three files saved:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run &lt;code&gt;python authorize.py&lt;/code&gt; once to complete the OAuth flows.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;python app.py&lt;/code&gt; to start the Bolt development server.&lt;/li&gt;
&lt;li&gt;In another terminal, run &lt;code&gt;ngrok http 3000&lt;/code&gt; to expose the server.&lt;/li&gt;
&lt;li&gt;In your Slack app settings, set the Request URL to &lt;code&gt;https://&amp;lt;your-ngrok-host&amp;gt;/slack/events&lt;/code&gt;, subscribe to &lt;code&gt;app_mention&lt;/code&gt;, and reinstall the app if Slack prompts you.&lt;/li&gt;
&lt;li&gt;Invite the bot to your test channel with &lt;code&gt;/invite @YourBot&lt;/code&gt; and try a mention.&lt;/li&gt;
&lt;/ol&gt;
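&lt;p&gt;Most first-run failures in a setup like this come from a missing environment variable. A small preflight check, sketched below, can be run before step 1. &lt;code&gt;ARCADE_API_KEY&lt;/code&gt;, &lt;code&gt;ARCADE_USER_ID&lt;/code&gt;, and &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; appear in the code above; &lt;code&gt;SLACK_BOT_TOKEN&lt;/code&gt; and &lt;code&gt;SLACK_SIGNING_SECRET&lt;/code&gt; are assumed names, so adjust them to whatever your &lt;code&gt;app.py&lt;/code&gt; actually reads. The helper names are hypothetical.&lt;/p&gt;

```python
import os

# The two SLACK_* names are assumptions; match them to your app.py.
REQUIRED_ENV = [
    "ARCADE_API_KEY",
    "ARCADE_USER_ID",
    "OPENAI_API_KEY",
    "SLACK_BOT_TOKEN",
    "SLACK_SIGNING_SECRET",
]


def missing_env(required=REQUIRED_ENV, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def check_or_exit():
    """Stop with a readable message if any required variable is missing."""
    missing = missing_env()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("Environment looks complete.")
```

&lt;p&gt;Calling &lt;code&gt;check_or_exit()&lt;/code&gt; at the top of &lt;code&gt;authorize.py&lt;/code&gt; or &lt;code&gt;app.py&lt;/code&gt; surfaces configuration gaps before any Slack or Arcade request is made.&lt;/p&gt;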

&lt;h3&gt;
  
  
  Step 5: Configure identity and secure tool access
&lt;/h3&gt;

&lt;p&gt;The prototype above is a shared agent: one fixed service identity (&lt;code&gt;ARCADE_USER_ID&lt;/code&gt;) handles every tool call, no matter which teammate tagged the bot. That is the right starting point for a read-only agent, but it is not the only option. A multi-user agent, where each person authorizes tools under their own identity, requires a different auth pattern. Which identity the agent uses, and whether users need to authorize tools themselves, depends on the access model you choose.&lt;/p&gt;

&lt;p&gt;A useful architecture for recreating the Claude Tag pattern uses two identity models. Public launch material confirms Claude Tag's channel-scoped shared identity, and the DM model extends naturally from it:&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;shared channels&lt;/strong&gt;, the agent acts under its own dedicated identity, not the tagging user's. Permissions are scoped per-channel.&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;DMs&lt;/strong&gt;, the agent runs with the user's own connectors and credentials.&lt;/p&gt;
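&lt;p&gt;The two models can be expressed as one routing decision per incoming event. The sketch below is illustrative: &lt;code&gt;resolve_arcade_user_id&lt;/code&gt; is a hypothetical helper, the &lt;code&gt;slack:&lt;/code&gt; prefix is an arbitrary naming scheme for per-user Arcade identities, and it assumes the event carries Slack's &lt;code&gt;channel_type&lt;/code&gt; field, which message events set to &lt;code&gt;im&lt;/code&gt; for direct messages.&lt;/p&gt;

```python
def resolve_arcade_user_id(event, service_user_id):
    """Pick the identity used for tool calls based on where the event came from."""
    if event.get("channel_type") == "im":
        # DM: act as the individual user, under their own authorizations.
        return f"slack:{event['user']}"
    # Shared channel: act as the fixed service identity.
    return service_user_id


dm_event = {"channel_type": "im", "user": "U123", "channel": "D456"}
channel_event = {"user": "U123", "channel": "C789"}
print(resolve_arcade_user_id(dm_event, "incident-bot"))
print(resolve_arcade_user_id(channel_event, "incident-bot"))
```

&lt;p&gt;Keeping the decision in one function makes it easy to audit which identity a given conversation can ever act under.&lt;/p&gt;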

&lt;p&gt;Replicate this with Arcade's auth patterns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For shared-channel agents&lt;/strong&gt; (like &lt;code&gt;#eng-incidents&lt;/code&gt;), use a fixed service identity as shown in Steps 1 through 3. If you are connecting through an MCP Gateway instead of the direct SDK, &lt;a href="https://docs.arcade.dev/en/guides/mcp-gateways" rel="noopener noreferrer"&gt;Arcade Headers&lt;/a&gt; authenticates the gateway connection. An important distinction: Arcade Headers authenticates the connection to the gateway itself, but it does not bypass OAuth authorization required by individual tools like GitHub or PagerDuty. Gateway authentication and &lt;a href="https://docs.arcade.dev/en/learn/server-level-vs-tool-level-auth" rel="noopener noreferrer"&gt;tool-level authorization&lt;/a&gt; are separate layers. That is why the one-time setup in Step 2 is necessary regardless of which auth mode you choose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For personal DM agents&lt;/strong&gt;, the tools change too. Instead of shared incident-response tools, a DM agent might access a user's own Gmail, Calendar, or Drive. Use per-user OAuth through Arcade's &lt;a href="https://docs.arcade.dev/en/guides/tool-calling/custom-apps/auth-tool-calling" rel="noopener noreferrer"&gt;&lt;code&gt;tools.authorize&lt;/code&gt;&lt;/a&gt; flow. When a tool requires the user's own credentials, Arcade returns an authorization URL. Your app posts that URL to the user in Slack, waits for consent, then resumes execution. The model never sees the token.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;authorize_and_execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arcade&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slack_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;channel_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Authorize a tool for a specific user and execute it.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arcade&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;authorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gmail.ListEmails&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# In a DM, use a persistent message (no need for ephemeral)
&lt;/span&gt;        &lt;span class="n"&gt;slack_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_postMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;channel_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please authorize Gmail access: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;arcade&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;arcade&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gmail.ListEmails&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Arcade stores and refreshes OAuth tokens automatically. Subsequent calls reuse the authorization until it expires or is revoked, or until a tool requires additional permissions.&lt;/p&gt;

&lt;p&gt;Note that Step 1 does not currently implement DM support. To add it, you need the bot scope &lt;code&gt;im:history&lt;/code&gt;, the bot event &lt;code&gt;message.im&lt;/code&gt;, and a separate &lt;code&gt;@app.event("message")&lt;/code&gt; handler that checks &lt;code&gt;event["channel_type"] == "im"&lt;/code&gt; and filters out bot messages. Slack does not deliver DMs as &lt;code&gt;app_mention&lt;/code&gt; events. See Slack's &lt;a href="https://docs.slack.dev/reference/events/message.im/" rel="noopener noreferrer"&gt;&lt;code&gt;message.im&lt;/code&gt; documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For a per-user identity without requiring email scopes in Slack, Arcade accepts any consistent unique identifier. A composite Slack identity like &lt;code&gt;f"{body['team_id']}:{event['user']}"&lt;/code&gt; works and avoids the need for &lt;code&gt;users:read&lt;/code&gt; or &lt;code&gt;users:read.email&lt;/code&gt; permissions.&lt;/p&gt;
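&lt;p&gt;The DM filter and the composite identity described above can be sketched as two plain helpers. This is a minimal sketch, not code from the Step 1 app: the function names &lt;code&gt;is_user_dm&lt;/code&gt; and &lt;code&gt;arcade_user_id&lt;/code&gt; are hypothetical, and the bot filter assumes that bot posts and edits carry a &lt;code&gt;bot_id&lt;/code&gt; or &lt;code&gt;subtype&lt;/code&gt; field on the Slack message event.&lt;/p&gt;

```python
def is_user_dm(event):
    """Return True only for a human-authored direct message event."""
    if event.get("channel_type") != "im":
        return False  # channel, private group, or multi-person DM
    # Assumption: skip every subtype (bot_message, edits, deletions).
    # Relax this check if file uploads in DMs should be handled too.
    if event.get("bot_id") or event.get("subtype"):
        return False
    return bool(event.get("user"))


def arcade_user_id(body, event):
    """Build a stable per-user identifier from workspace and user IDs."""
    return f"{body['team_id']}:{event['user']}"


if __name__ == "__main__":
    body = {"team_id": "T123"}
    event = {"channel_type": "im", "user": "U456", "text": "hi"}
    if is_user_dm(event):
        print(arcade_user_id(body, event))
```

&lt;p&gt;In a Bolt app, the &lt;code&gt;@app.event("message")&lt;/code&gt; handler would call &lt;code&gt;is_user_dm(event)&lt;/code&gt; first and return early when it is false, then pass the composite identifier as the &lt;code&gt;user_id&lt;/code&gt; for Arcade calls.&lt;/p&gt;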

&lt;p&gt;For production multi-user agents, use Arcade's &lt;a href="https://docs.arcade.dev/en/guides/user-facing-agents/secure-auth-production" rel="noopener noreferrer"&gt;custom user verifier&lt;/a&gt; so end-user identity is verified against your own identity system rather than relying on Slack ID mapping alone. Note that production multi-user OAuth also requires your own provider OAuth app credentials, since Arcade's default OAuth apps use the Arcade verifier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Return auditable results in Slack
&lt;/h3&gt;

&lt;p&gt;Trustworthy agents show their work. Structure every response so a human can verify what happened before acting on it.&lt;/p&gt;

&lt;p&gt;Here is what a good incident-triage response looks like in Slack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Summary: Checkout error rate increased 340% starting at 14:32 UTC, correlating with deployment v2.41.3 merged at 14:28.
Evidence:
- Datadog: p99 latency spiked from 220ms to 1,400ms at 14:32
- GitHub: PR #1847 modified the payment validation middleware
- PagerDuty: No prior incidents on checkout-service in the last 7 days
Recommended next step: Review the diff in PR #1847, specifically checkout/validation.py lines 84-112. Consider a rollback if error rate does not stabilize within 15 minutes.
Actions taken: Read-only queries to GitHub, Datadog, and PagerDuty. No writes performed.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "actions taken" line matters. It tells the team exactly what the agent did and, just as importantly, what it did not do.&lt;/p&gt;
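&lt;p&gt;One way to keep that structure consistent is to build every reply from a small data object instead of free-form model output. The sketch below is an assumption about how you might do it, not part of the tutorial's code: the &lt;code&gt;AgentReport&lt;/code&gt; class and its field names are hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class AgentReport:
    summary: str
    next_step: str
    evidence: list = field(default_factory=list)  # "Source: finding" strings
    actions: list = field(default_factory=list)   # what the agent actually did

    def to_slack_text(self):
        """Render the four audit sections in a fixed order."""
        lines = [f"Summary: {self.summary}", "Evidence:"]
        lines += [f"- {item}" for item in self.evidence] or ["- None collected"]
        lines.append(f"Recommended next step: {self.next_step}")
        # Always state actions explicitly, even when nothing was done.
        actions = " ".join(self.actions) or "None."
        lines.append(f"Actions taken: {actions}")
        return "\n".join(lines)


if __name__ == "__main__":
    report = AgentReport(
        summary="Checkout error rate increased after deployment v2.41.3.",
        next_step="Review the diff in PR #1847.",
        evidence=["Datadog: p99 latency spiked at 14:32"],
        actions=["Read-only queries to GitHub and Datadog.", "No writes performed."],
    )
    print(report.to_slack_text())
```

&lt;p&gt;Because the "Actions taken" line is always emitted, a reply can never silently omit what the agent did.&lt;/p&gt;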

&lt;h2&gt;
  
  
  How to secure and govern a Claude Tag-style Slack agent
&lt;/h2&gt;

&lt;p&gt;Governance is not a compliance afterthought. It is what lets teams deploy useful agents in the first place. Without clear controls, security teams will block the project before it ships.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start read-only.&lt;/strong&gt; Give the agent query access to GitHub, Datadog, and PagerDuty. Do not grant write access until the team has confidence in the agent's judgment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Require approval before consequential writes.&lt;/strong&gt; Opening a PR, acknowledging a PagerDuty incident, posting to a customer-facing channel: these should require a human to confirm. Arcade's &lt;a href="https://docs.arcade.dev/en/guides/contextual-access" rel="noopener noreferrer"&gt;Contextual Access&lt;/a&gt; hooks let you enforce this with pre-execution webhooks that allow, deny, or modify tool execution. Your application collects the human approval and resumes the job; Contextual Access handles the policy-enforcement layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scope tool access by workflow.&lt;/strong&gt; The incident agent should not see CRM tools. The support agent should not see deployment tooling. Separate tool sets per workflow enforce this structurally, whether you use explicit tool lists in the SDK or separate MCP Gateways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Log what the agent did.&lt;/strong&gt; Arcade's audit logs capture administrative actions by default. Combine these with your application-level logs and downstream SaaS audit trails so you can always answer: what did the agent do, under which identity, in which system?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make it easy to stop.&lt;/strong&gt; A kill switch is a feature. Revoking the agent's dedicated API key or disabling the Slack app should take seconds.&lt;/p&gt;
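&lt;p&gt;The read-only default, the approval requirement, and the per-workflow scoping above can be sketched as one application-level decision function. This is a minimal illustration under stated assumptions: the workflow names and tool names are hypothetical placeholders, and the function is not Arcade's Contextual Access webhook schema, only the kind of logic such a hook or your own app could apply.&lt;/p&gt;

```python
# Hypothetical workflows and tool names, for illustration only.
WORKFLOW_TOOLS = {
    "incident": {"github_read_pr", "datadog_query", "pagerduty_list", "pagerduty_ack"},
    "support": {"tickets_read"},
}
# Tools with side effects that a human must confirm first.
WRITE_TOOLS = {"pagerduty_ack"}


def decide(workflow, tool, approved_by=None):
    """Return "allow", "deny", or "needs_approval" for one tool call."""
    allowed = WORKFLOW_TOOLS.get(workflow, set())
    if tool not in allowed:
        return "deny"  # tool is outside this workflow's toolset
    if tool in WRITE_TOOLS and not approved_by:
        return "needs_approval"  # consequential write without a human sign-off
    return "allow"


if __name__ == "__main__":
    print(decide("incident", "datadog_query"))
    print(decide("incident", "pagerduty_ack"))
    print(decide("incident", "pagerduty_ack", approved_by="U456"))
    print(decide("support", "datadog_query"))
```

&lt;p&gt;Unknown workflows fall through to an empty toolset, so the default outcome is a denial rather than an accidental allow.&lt;/p&gt;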

&lt;h2&gt;
  
  
  Build the Slack agent your team will actually tag
&lt;/h2&gt;

&lt;p&gt;The goal is not an AI agent that can do everything. It is one dependable agent that removes friction from a workflow your team performs every week.&lt;/p&gt;

&lt;p&gt;Pick the workflow. Define the toolset. Wire up the Slack trigger. Connect the tools through &lt;a href="https://www.arcade.dev" rel="noopener noreferrer"&gt;Arcade.dev&lt;/a&gt;. Start read-only, return inspectable results, and expand scope as trust builds.&lt;/p&gt;

&lt;p&gt;The team that ships a useful agent in one channel next week will learn more than the team that spends a quarter designing a platform for every channel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start here:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Identify one recurring, cross-system workflow your team performs in Slack&lt;/li&gt;
&lt;li&gt;[ ] Pick a small read-only toolset from Arcade's &lt;a href="https://docs.arcade.dev/en/resources/integrations" rel="noopener noreferrer"&gt;tool catalog&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Authorize those tools for your service identity (&lt;code&gt;python authorize.py&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[ ] Build the Slack trigger with thread context retrieval and error handling&lt;/li&gt;
&lt;li&gt;[ ] Deploy, observe, and expand deliberately&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Explore Arcade's &lt;a href="https://docs.arcade.dev/en/resources/integrations" rel="noopener noreferrer"&gt;tool catalog&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/guides/tool-calling/custom-apps/auth-tool-calling" rel="noopener noreferrer"&gt;authorization guides&lt;/a&gt;, and &lt;a href="https://docs.arcade.dev/en/guides/mcp-gateways" rel="noopener noreferrer"&gt;MCP Gateway documentation&lt;/a&gt; to get started. The code from this guide is on &lt;a href="https://github.com/manveer/open-claude-tag" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Fork it and build something useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Claude Tag?
&lt;/h3&gt;

&lt;p&gt;Claude Tag is Anthropic's shared AI agent for Slack, launched on June 23, 2026 for Enterprise and Team customers. Unlike the previous Claude in Slack integration, which ran as a personal assistant under each user's own account, Claude Tag operates as a shared teammate in channels. Anyone can tag &lt;code&gt;@claude&lt;/code&gt;, and the entire exchange is visible to the channel. It reads thread context, uses connected tools, and posts structured results in-thread.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is Claude Tag different from Claude in Slack?
&lt;/h3&gt;

&lt;p&gt;Claude in Slack gave each user a private instance that acted under their personal permissions and usage quota. Claude Tag replaces that with a single shared identity per channel, scoped by an admin. Work is visible to the whole channel, anyone can pick up a conversation where someone else left off, and Claude builds persistent context as it follows along. Anthropic will automatically migrate existing Claude in Slack workspaces to Claude Tag on August 3, 2026.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can you build your own version of Claude Tag?
&lt;/h3&gt;

&lt;p&gt;Yes. Claude Tag's core interaction pattern is reproducible: a Slack event trigger, an LLM reasoning loop, and authorized access to external tools. This tutorial builds that pattern with Python, Slack Bolt, and Arcade. Arcade handles tool connectivity and OAuth token management so you can connect to systems like GitHub, Datadog, and PagerDuty without managing credentials yourself. The result is not Anthropic's proprietary product, but a Claude Tag-style agent you fully control.&lt;/p&gt;

&lt;h3&gt;
  
  
  What does Arcade do in a Slack AI agent?
&lt;/h3&gt;

&lt;p&gt;Arcade is the action layer between your agent and external tools. It handles three things: loading tool definitions formatted for your LLM, executing tool calls with the correct credentials injected at runtime, and managing OAuth authorization flows so the model never sees tokens or API keys. You choose which tools the agent can access, and Arcade enforces that scope on every request.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does my Slack AI agent have access to user passwords or API keys?
&lt;/h3&gt;

&lt;p&gt;No. Arcade manages all credentials on the server side. When a tool requires OAuth (like GitHub or PagerDuty), the user completes a consent flow once and Arcade stores and refreshes the token. When a tool requires API keys (like Datadog), those are configured as secrets in the Arcade dashboard. The LLM and your application code never see raw credentials. Arcade injects the right token at execution time.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>agents</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Enterprise-Managed Authorization Is a Foundation, Not a Ceiling: Why Connected Agents Need Per-Action Authorization</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Tue, 23 Jun 2026 20:19:06 +0000</pubDate>
      <link>https://dev.to/arcade/enterprise-managed-authentication-mcp-per-action-authorization-for-enterprise-ai-agents-3hd1</link>
      <guid>https://dev.to/arcade/enterprise-managed-authentication-mcp-per-action-authorization-for-enterprise-ai-agents-3hd1</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;TL;DR&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Enterprise-Managed Authorization (EMA) centralizes access provisioning and eliminates per-server consent prompts. It is the right solution for connection-time governance. It was not designed to authorize each individual tool call, and it does not.
&lt;/li&gt;
&lt;li&gt;AI workflows need per-action authorization to limit the blast radius of prompt injection, because attacks exploit the gap between "this agent is allowed to connect" and "this specific action should execute right now."
&lt;/li&gt;
&lt;li&gt;A secure authorization layer must evaluate the intersection of organization policies, user delegation, and agent capability boundaries immediately before an action executes.
&lt;/li&gt;
&lt;li&gt;Production-grade deployments use a pre-execution interceptor and credential isolation to guarantee that large language models never access raw authentication tokens directly.
&lt;/li&gt;
&lt;li&gt;High-risk production deployments need action-level runtime enforcement, implemented in-house or through an action runtime such as Arcade, without replacing existing corporate identity infrastructure, including EMA.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What Enterprise-Managed Authorization (EMA) Solves for MCP&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://modelcontextprotocol.io/extensions/auth/enterprise-managed-authorization" rel="noopener noreferrer"&gt;Enterprise-Managed Authorization&lt;/a&gt; is now stable. The extension, adopted by Anthropic, Microsoft, Okta, and a growing number of MCP servers, solves the per-server OAuth consent tax that slowed enterprise MCP adoption.&lt;/p&gt;

&lt;p&gt;Before EMA, every employee had to authorize every MCP server individually. Security teams had no centralized control. Work and personal accounts bled together. EMA eliminates all of this by making the organization's IdP the authoritative decision-maker for MCP server access. Administrators define policy once. Users authenticate through single sign-on and inherit every server their role permits. No per-app OAuth, nothing to configure as a one-off.&lt;/p&gt;

&lt;p&gt;Under the hood, as part of the SSO-based authorization flow, the client obtains an identity assertion and uses it to request an Identity Assertion JWT Authorization Grant (ID-JAG), which it exchanges for access tokens from each MCP server's authorization server. Three properties follow: authorize once and inherit everywhere, centralized policy and audit for access decisions, and elimination of personal/enterprise account mixups.&lt;/p&gt;

&lt;p&gt;This is valuable infrastructure. It is also, by design, a grant-time decision. EMA's IdP evaluates policy when tokens are issued (and may re-evaluate on renewal), but its standardized authorization visibility does not extend to individual tool calls. EMA determines &lt;em&gt;who may connect to what&lt;/em&gt;. It has nothing to say about whether a specific tool call, proposed by a potentially compromised agent five minutes after the token was issued, should actually execute.&lt;/p&gt;

&lt;p&gt;That gap is where the real attacks live.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How Prompt Injection Exploits Authenticated AI Agents&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In early 2025, security researcher Johann Rehberger demonstrated &lt;a href="https://embracethered.com/blog/posts/2025/spaiware-and-chatgpt-command-and-control-via-prompt-injection-zombai/" rel="noopener noreferrer"&gt;SpAIware&lt;/a&gt;: a single indirect prompt injection, delivered through a malicious website, planted persistent instructions in ChatGPT's memory store. Those instructions survived logouts and browser restarts. The compromised instance then acted as a command-and-control relay, polling a public GitHub repository for attacker commands and writing exfiltrated data to Azure Blob Storage request logs. The CSA's March 2026 &lt;a href="https://labs.cloudsecurityalliance.org/research/csa-research-note-promptware-agent-commander-c2-20260317-csa/" rel="noopener noreferrer"&gt;Promptware report&lt;/a&gt; generalized this into a broader class of agent C2 attacks.&lt;/p&gt;

&lt;p&gt;The agent's built-in capabilities (web access, memory, code execution) were all legitimately available to its runtime. EMA-style centralized provisioning would not have changed the outcome. The injected instructions exploited capabilities already present in the agent's environment, not separately provisioned OAuth connections. No authorization layer distinguished a user-initiated action from an injection-initiated one. Connection-time governance was powerless because the problem was never authentication. The agent was who it claimed to be.&lt;/p&gt;

&lt;p&gt;In mid-2026, researchers demonstrated prompt-injection attacks through GitHub comments, issue bodies, and PR titles that &lt;a href="https://www.securityweek.com/claude-code-gemini-cli-github-copilot-agents-vulnerable-to-prompt-injection-via-comments/" rel="noopener noreferrer"&gt;hijacked Claude Code, Gemini CLI, and GitHub Copilot Agent&lt;/a&gt;. Across the three products, the attacks exploited pre-authorized tool capabilities to exfiltrate CI secrets; some variants also induced shell-command execution. A related &lt;a href="https://arxiv.org/abs/2605.11229" rel="noopener noreferrer"&gt;academic study&lt;/a&gt; documented similar injection vectors across 15 GitHub Actions. Anthropic's remediation was telling: they disallowed the &lt;code&gt;ps&lt;/code&gt; tool rather than restricting broad tool access. The response was a band-aid on a connection-level wound.&lt;/p&gt;

&lt;p&gt;These are not isolated demonstrations. &lt;a href="https://www.f5.com/resources/articles/top-agentic-ai-security-vulnerabilities-in-banking" rel="noopener noreferrer"&gt;F5&lt;/a&gt; describes a banking scenario in which threat actors use prompt injection against an AI chatbot to initiate unauthorized financial transactions, with the bank identifying the loss only after multiple accounts are impacted. &lt;a href="https://github.com/requie/AI-Red-Teaming-Guide" rel="noopener noreferrer"&gt;The AI Red Teaming Guide&lt;/a&gt; catalogs a growing body of MCP-related vulnerabilities disclosed through 2025. Simon Willison, who has tracked prompt injection since 2022, coined the "&lt;a href="https://simonw.substack.com/p/the-lethal-trifecta-for-ai-agents" rel="noopener noreferrer"&gt;lethal trifecta&lt;/a&gt;" for this pattern: private data, untrusted content, and external communication converging in the same system.&lt;/p&gt;

&lt;p&gt;The common thread across every attack: attackers induced agents to misuse capabilities already available to their runtimes. No authorization layer asked whether the specific action matched the user's intent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Per-action authorization&lt;/strong&gt; evaluates whether a specific tool call should proceed based on the intersection of organization policy, user delegation, and agent capability, checked at execution time, after the prompt, for every action independently. It is distinct from grant-time authorization (evaluated at token issuance, which is what EMA provides) and session-level authorization (checked once per conversation).&lt;/p&gt;

&lt;p&gt;Per-action authorization is not itself a prompt-injection detector. It limits blast radius by denying or escalating actions that violate deterministic constraints. An injected action that remains within those constraints may still execute, so provenance controls, content isolation, and human approval remain necessary for sensitive operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;EMA vs. Per-Action Authorization: Provisioning vs. Runtime&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;EMA and per-action authorization are not competing solutions. They operate at different points in the execution lifecycle and address different threat models.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;EMA (Connection-Time)&lt;/th&gt;
&lt;th&gt;Per-Action Authorization (Runtime)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Decision point&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Before the agent connects to a server&lt;/td&gt;
&lt;td&gt;Before the agent executes a specific tool call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;What it answers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Is this user/agent allowed to access this MCP server?"&lt;/td&gt;
&lt;td&gt;"Should this specific action execute in this context?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Policy inputs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IdP groups, roles, conditional access rules&lt;/td&gt;
&lt;td&gt;Organization policy + user delegation + agent capability + tool arguments + trusted provenance and risk signals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Threat model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unauthorized connections, personal/enterprise mixups, shadow IT&lt;/td&gt;
&lt;td&gt;Prompt injection, permission abuse, lateral movement through valid connections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Evaluation frequency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;At token issuance/renewal&lt;/td&gt;
&lt;td&gt;Every tool call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Audit trail&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"User X connected to Server Y at time T"&lt;/td&gt;
&lt;td&gt;"Agent A attempted action B with parameters C, evaluated against policy D, outcome E"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;EMA provides the outer gate. It ensures that only authorized users connect to approved servers through managed corporate identities. But EMA itself adds no per-tool-call semantic policy. Individual MCP servers may enforce scopes, ACLs, or rate limits on each request, but those controls are server-specific, inconsistent across the ecosystem, and unaware of whether a tool call originated from user intent or injected instructions.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.nsa.gov/Press-Room/Press-Releases-Statements/Press-Release-View/Article/4496698/nsa-releases-security-design-considerations-for-ai-driven-automation-leveraging/" rel="noopener noreferrer"&gt;NSA's May 2026 Cybersecurity Information document&lt;/a&gt; on MCP security is blunt: "MCP itself cannot enforce these security principles at the protocol level." This applies equally to EMA. The extension centralizes provisioning decisions. It does not, and cannot, evaluate whether the tool call an agent is about to make was triggered by the user's intent or by a malicious instruction embedded in a GitHub comment.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why OAuth Scopes Are Not Enough for AI Agent Authorization&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;OAuth scopes are space-delimited strings and are often too coarse for transaction-specific authorization. A &lt;code&gt;mail.send&lt;/code&gt; scope grants the ability to email any recipient. It cannot encode which recipient, in what context, whether the user intended this specific email, or whether the conversation was corrupted by an injection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.rfc-editor.org/info/rfc9396/" rel="noopener noreferrer"&gt;RFC 9396&lt;/a&gt; (Rich Authorization Requests) partially addresses this by using JSON objects to describe API access with &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;locations&lt;/code&gt;, and &lt;code&gt;actions&lt;/code&gt; fields. RAR can constrain later operations using transaction-specific authorization details (recipient, amount, resource), and resource servers can enforce those details. But RAR does not standardize provenance-aware evaluation of whether an agent's later action still reflects the user's current intent. When an agent makes a tool call from a potentially compromised conversation, RAR constrains the parameters but cannot determine whether the call was user-initiated or injection-initiated.&lt;/p&gt;
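&lt;p&gt;To make the limitation concrete, here is a small sketch of an RFC 9396 style &lt;code&gt;authorization_details&lt;/code&gt; entry and the check a resource server might run against it. The &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;locations&lt;/code&gt;, and &lt;code&gt;actions&lt;/code&gt; fields come from the RFC; the &lt;code&gt;mail_send&lt;/code&gt; type value, the &lt;code&gt;recipients&lt;/code&gt; extension field, the URL, and the &lt;code&gt;permits&lt;/code&gt; function are assumptions made up for this example.&lt;/p&gt;

```python
import json

# Hypothetical authorization_details entry for a mail API. "type", "locations",
# and "actions" are RFC 9396 fields; "recipients" is an API-specific extension.
AUTHORIZATION_DETAILS = [
    {
        "type": "mail_send",
        "locations": ["https://mail.example.com"],
        "actions": ["send"],
        "recipients": ["oncall@example.com"],
    }
]


def permits(details, detail_type, action, recipient):
    """Resource-server style check of one request against granted details."""
    for entry in details:
        if entry.get("type") != detail_type:
            continue
        if action in entry.get("actions", []) and recipient in entry.get("recipients", []):
            return True
    return False


if __name__ == "__main__":
    print(json.dumps(AUTHORIZATION_DETAILS, indent=2))
    print(permits(AUTHORIZATION_DETAILS, "mail_send", "send", "oncall@example.com"))
    print(permits(AUTHORIZATION_DETAILS, "mail_send", "send", "attacker@example.net"))
```

&lt;p&gt;The check blocks the unlisted recipient, but it returns the same answer for a permitted recipient whether the user asked for the email or an injected instruction did. Nothing in the grant carries that provenance.&lt;/p&gt;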

&lt;p&gt;The MCP specification's auth extensions face the same structural limitation. As of June 2026, both EMA and Client Credentials operate at the transport/connection level. The ext-auth repository contains no per-action authorization extension. Final MCP SEP-2468 recommends that authorization servers include the OAuth authorization-response &lt;code&gt;iss&lt;/code&gt; parameter and requires clients to validate it, mitigating authorization-server mix-up attacks. This is a transport-security measure, not per-action evaluation. MCP's core authorization does support runtime insufficient-scope challenges and step-up authorization, where scopes may depend on request arguments and context. These are valuable server-side controls, but they remain server-defined scope enforcement, not standardized provenance-aware authorization.&lt;/p&gt;

&lt;p&gt;This is not an oversight in the protocol or the extension. It reflects an architectural boundary. Authentication answers "who is this?" Connection-level authorization (including EMA) answers "what can this entity access?" Per-action authorization answers "should this specific action happen right now?" Zero-touch OAuth establishes the first two. The third requires an additional application- or runtime-level mechanism.&lt;/p&gt;

&lt;p&gt;OAuth has progressively added defenses across the authorization and token lifecycle. &lt;a href="https://www.rfc-editor.org/info/rfc6749/" rel="noopener noreferrer"&gt;RFC 6749&lt;/a&gt; (2012) and &lt;a href="https://www.rfc-editor.org/info/rfc6750/" rel="noopener noreferrer"&gt;RFC 6750&lt;/a&gt; defined bearer tokens without sender-constraining. PKCE (2015) mitigated authorization-code interception. DPoP (2023) sender-constrained tokens to reduce replay. &lt;a href="https://www.rfc-editor.org/info/rfc9700/" rel="noopener noreferrer"&gt;RFC 9700&lt;/a&gt; (2025) updated the entire threat model based on "practical experiences gathered since OAuth 2.0 was published." These mechanisms are not per-action authorization, but they illustrate the broader movement away from relying on bearer credentials alone. Each addition responded to real attacks that exploited assumptions about what grant-time credentials could safely cover.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Three-Layer Authorization Model for AI Agents&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Agents operate at the intersection of three distinct permission sets, not one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_evaluation-logic.html" rel="noopener noreferrer"&gt;AWS IAM&lt;/a&gt; provides a useful precedent for this model. The following table simplifies IAM's full evaluation logic (which combines identity-based and resource-based grants, then constrains them by permissions boundaries and SCPs) to illustrate the intersection principle:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;IAM Layer&lt;/th&gt;
&lt;th&gt;Agent Authorization Analog&lt;/th&gt;
&lt;th&gt;What It Controls&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Service Control Policy (Organization)&lt;/td&gt;
&lt;td&gt;Organization policy&lt;/td&gt;
&lt;td&gt;Maximum permissions any agent in this org can possess&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Identity-based policy (User)&lt;/td&gt;
&lt;td&gt;User delegation&lt;/td&gt;
&lt;td&gt;What this specific user has delegated to the agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Permissions boundary (Entity)&lt;/td&gt;
&lt;td&gt;Agent capability boundary&lt;/td&gt;
&lt;td&gt;What this agent type is designed and permitted to do&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The identity or resource policy must grant the action, while the permissions boundary and SCP must permit it. An explicit deny overrides an allow, and adding a permissions boundary can only reduce effective permissions.&lt;/p&gt;

&lt;p&gt;EMA maps cleanly onto the first two layers at connection time. The IdP enforces organization-level policy (which servers are approved) and user-level access (which roles and groups the user belongs to). But it evaluates these layers at token issuance, not per tool call, and it does not standardize an agent-specific capability boundary. OAuth authorization servers can apply client-specific policy, but EMA itself does not define how agent capabilities should be constrained beyond what scopes and roles permit.&lt;/p&gt;

&lt;p&gt;Suppose your organization policy says "no agent may delete production databases." A user has delegated broad access to their calendar, email, and project management tools. The agent is a triage-bot designed to label issues and assign them. The effective permission is the intersection: the triage-bot can label and assign issues in the user's projects, and nothing else. It cannot send email (outside its capability boundary), cannot delete databases (blocked by org policy), and cannot access another user's calendar (not delegated).&lt;/p&gt;
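&lt;p&gt;The triage-bot example reduces to set arithmetic. The sketch below is a deliberate simplification: the action names are hypothetical, and organization policy is modeled as a plain deny set rather than a full policy language.&lt;/p&gt;

```python
# Hypothetical action names for the triage-bot scenario.
ORG_DENIED = {"db.delete_production"}            # no agent may do this
USER_DELEGATED = {                               # what the user handed over
    "calendar.read", "email.send", "issues.label", "issues.assign",
}
AGENT_CAPABILITIES = {"issues.label", "issues.assign"}  # what the bot is for


def effective_permissions(org_denied, user_delegated, agent_capabilities):
    """An action is effective only if every layer permits it."""
    granted = user_delegated.intersection(agent_capabilities)
    return granted.difference(org_denied)  # an explicit deny always wins


if __name__ == "__main__":
    allowed = effective_permissions(ORG_DENIED, USER_DELEGATED, AGENT_CAPABILITIES)
    for action in sorted(allowed):
        print(action)
```

&lt;p&gt;Widening any single layer does not widen the result on its own. Even if the user delegated database access and the agent claimed it as a capability, the organization deny would still remove it.&lt;/p&gt;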

&lt;p&gt;&lt;a href="https://www.osohq.com/research" rel="noopener noreferrer"&gt;Oso's 2026 Least Privilege Report&lt;/a&gt; (analyzing 2.4 million workers and 3.6 billion permissions) found that 96% of enterprise permissions go unused over 90 days. Employees typically possess 10 times the access they actually need. Thirty-one percent of workers can modify or delete sensitive data. Thirteen percent can reach regulated data including financial and health records.&lt;/p&gt;

&lt;p&gt;Humans often leave dormant permissions unused because of judgment, habit, and professional accountability. Agents do not share those natural constraints and can operate continuously at machine speed. When an agent inherits a human's permission set through a grant-time OAuth token (whether provisioned manually or through EMA), it may exercise capabilities the human rarely touches, turning latent over-provisioning into active attack surface.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://openfga.dev/" rel="noopener noreferrer"&gt;OpenFGA&lt;/a&gt; (built on &lt;a href="https://research.google/pubs/zanzibar-googles-consistent-global-authorization-system/" rel="noopener noreferrer"&gt;Google Zanzibar's principles&lt;/a&gt;) has formalized this by modeling agents as first-class principals, identical to human users, with explicit authorization tuples like &lt;code&gt;user: agent:triage-bot, relation: member, object: project:alpha&lt;/code&gt;. But the intersection model must be augmented with runtime evaluation: not just "does this agent have the permission?" but "does this agent's current context justify exercising this permission?"&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Zero-Touch OAuth vs. Runtime Security for AI Agents&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The zero-touch reflex and the security reflex are both right, and they pull in opposite directions.&lt;/p&gt;

&lt;p&gt;One view holds that the protocol should stay out of application-level authorization. Before EMA, users completed one authorization flow per MCP server; afterward, the client included a bearer token that the server validated on every HTTP request. EMA centralizes that initial provisioning without changing the server's responsibility to validate requests.&lt;/p&gt;

&lt;p&gt;The opposing view holds that user-visible friction can still serve a purpose. A per-server consent prompt is not approval of each transaction, but it does show the user what access is being granted. In hosts that expose connected tools across conversations, pre-connecting a high-stakes server can make it reachable from any such conversation. That argues for separate transaction-specific controls, not for preserving per-server OAuth prompts as their substitute.&lt;/p&gt;

&lt;p&gt;Some security teams value explicit user consent for accountability, while others prefer centrally administered access with fine-grained agent policies. Both needs can be met by combining centralized provisioning with runtime enforcement and targeted human approval.&lt;/p&gt;

&lt;p&gt;Without a runtime enforcement layer, zero-touch provisioning can leave an action-level authorization gap. Authorization should therefore be separated from model decision-making and enforced by the harness or execution layer, whether in-process, in a sidecar, or as a remote service.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How to Implement Per-Action Authorization with a Pre-Execution Interceptor&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Insert a policy evaluation point between the LLM's tool-call decision and the actual tool execution. This is the "post-prompt, pre-execution" gap that EMA and zero-touch OAuth leave open by design.&lt;/p&gt;

&lt;p&gt;The common objection is latency. Three implementations demonstrate that per-action policy evaluation is feasible at low cost relative to typical LLM inference:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://opensource.microsoft.com/blog/2026/04/02/introducing-the-agent-governance-toolkit-open-source-runtime-security-for-ai-agents/" rel="noopener noreferrer"&gt;&lt;strong&gt;Microsoft's Agent Governance Toolkit&lt;/strong&gt;&lt;/a&gt; (April 2026), which Microsoft describes as the first toolkit addressing all 10 OWASP agentic AI risks: a stateless policy engine with a &lt;code&gt;ToolCallInterceptor&lt;/code&gt; that hooks into native framework extension points. &lt;strong&gt;Microsoft's own benchmarks report p99 under 0.1 milliseconds.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OPA/Rego sidecar&lt;/strong&gt;: suitable local policies can evaluate in single-digit milliseconds, although teams should benchmark their own policy complexity and deployment topology.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Zanzibar&lt;/strong&gt;: per-request authorization serving many large-scale Google services. &lt;strong&gt;Reported p95 under 10 milliseconds at millions of checks per second.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The minimal viable architecture has three components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Interceptor&lt;/strong&gt; hooking between the LLM's tool-call output and tool execution. Frameworks provide native extension points (&lt;a href="https://www.arcade.dev/blog/agent-authorization-langgraph-guide/" rel="noopener noreferrer"&gt;LangChain callbacks&lt;/a&gt;, CrewAI middleware).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stateless policy engine&lt;/strong&gt; evaluating each call against organization, user, and agent policy layers. &lt;a href="https://www.openpolicyagent.org/" rel="noopener noreferrer"&gt;OPA&lt;/a&gt;, &lt;a href="https://cedarpolicy.com/" rel="noopener noreferrer"&gt;Cedar&lt;/a&gt;, or equivalent, running locally or as a sidecar.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credential store&lt;/strong&gt; isolated from the LLM. Raw tokens are never exposed to the model's context window. Credentials are injected only after policy allows execution.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The interceptor pattern in practice looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;authorized_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delegation_chain&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;opa_evaluate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delegation_chain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;delegation_chain&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;outcome&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;allow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;outcome&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deny&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;outcome&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;request_human_approval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unknown policy outcome&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown_outcome&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
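
&lt;p&gt;The snippet above leaves &lt;code&gt;execute_tool&lt;/code&gt; undefined. One way to honor the third component (credentials isolated from the LLM) is to look up the token only after policy returns "allow". The sketch below is illustrative: &lt;code&gt;CredentialStore&lt;/code&gt;, &lt;code&gt;execute_with_credentials&lt;/code&gt;, and the dictionary-backed vault are assumptions, not any specific product's API.&lt;/p&gt;

```python
class CredentialStore:
    """Hypothetical token vault that lives outside the model's context."""

    def __init__(self, tokens):
        # tokens maps (user_id, tool_name) to a short-lived credential.
        self._tokens = dict(tokens)

    def token_for(self, user_id, tool_name):
        return self._tokens[(user_id, tool_name)]


def execute_with_credentials(tool_fn, tool_name, args, user_id, store, outcome):
    """Fetch the credential only after policy has returned "allow"."""
    if outcome != "allow":
        # No lookup happens on deny or escalate, so no token is materialized.
        return {"error": "not allowed", "code": outcome}
    token = store.token_for(user_id, tool_name)
    result = tool_fn(args, token)
    # Only the tool's result goes back toward the model; the token does not.
    return {"result": result}
```

&lt;p&gt;Because the token is passed to the tool function and never placed in the returned payload, nothing the model later reads contains the credential.&lt;/p&gt;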



&lt;p&gt;Production implementations should canonicalize tool arguments, bind policy decisions and human approvals to a hash of the exact tool name and arguments, and re-evaluate policy after an asynchronous approval. This prevents arguments, credentials, or policy state from changing between authorization and execution.&lt;/p&gt;
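
&lt;p&gt;A minimal sketch of that binding, using only the Python standard library. The function names are hypothetical, and sorted-key JSON is a simplification: a production system might prefer a formal scheme such as RFC 8785 (JSON Canonicalization Scheme).&lt;/p&gt;

```python
import hashlib
import hmac
import json


def canonical_call(tool_name, args):
    """Serialize a call deterministically: sorted keys, fixed separators."""
    return json.dumps(
        {"tool": tool_name, "args": args},
        sort_keys=True,
        separators=(",", ":"),
    )


def call_digest(tool_name, args):
    """SHA-256 over the canonical form; an approval is bound to this value."""
    return hashlib.sha256(canonical_call(tool_name, args).encode("utf-8")).hexdigest()


def approval_still_valid(approved_digest, tool_name, args):
    """Re-hash at execution time and compare in constant time."""
    return hmac.compare_digest(approved_digest, call_digest(tool_name, args))
```

&lt;p&gt;If the arguments change between approval and execution, the digest no longer matches and the call can be rejected or sent back through policy evaluation.&lt;/p&gt;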

&lt;p&gt;When Rego policies are written to return structured decisions (reason code, deciding policy rule), OPA can surface that context to the caller. A safe, user-facing reason code can be returned to the model so it can replan. Detailed policy rules and sensitive denial context should remain in internal audit logs rather than being exposed to the model.&lt;/p&gt;
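
&lt;p&gt;One way to enforce that split is to derive two views from each decision. In this sketch the allowlist of reason codes and the shape of the decision dictionary are assumptions for illustration, not an OPA output format.&lt;/p&gt;

```python
# Hypothetical allowlist of reason codes considered safe to show the model.
SAFE_REASON_CODES = {"needs_approval", "out_of_scope", "read_only_mode"}


def split_decision(decision):
    """Return (model_view, audit_view) for one policy decision."""
    code = decision.get("reason_code", "denied")
    if code not in SAFE_REASON_CODES:
        code = "denied"  # collapse anything unrecognized to a generic code
    model_view = {"outcome": decision["outcome"], "code": code}
    audit_view = dict(decision)  # full detail, intended for internal logs only
    return model_view, audit_view
```

&lt;p&gt;The model receives enough to replan, while the deciding rule and any sensitive denial context stay in the audit view.&lt;/p&gt;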

&lt;p&gt;Production implementations use &lt;a href="https://www.rfc-editor.org/info/rfc8693/" rel="noopener noreferrer"&gt;RFC 8693&lt;/a&gt; OAuth 2.0 Token Exchange to issue short-lived, least-privilege credentials bound to the current user and session. The LLM never sees any token; the execution layer receives the attenuated credential. This means a successful prompt injection that exfiltrates the agent's context window yields no actionable credentials. EMA's ID-JAG flow establishes the user's identity; credential isolation reduces the risk of that identity being exploited through token theft. Action-level policy and containment remain necessary to prevent the execution layer itself from being used as a confused deputy.&lt;/p&gt;
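
&lt;p&gt;For reference, the form body of a token exchange request can be sketched as follows. The parameter names and URN values come from RFC 8693; the function name and the example audience and scope are placeholders, and client authentication and the HTTP call to the token endpoint are deliberately left out.&lt;/p&gt;

```python
from urllib.parse import urlencode

# Identifiers defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_token_exchange_body(subject_token, audience, scope):
    """Build the form-encoded body for a token exchange request.

    The execution layer would POST this to the authorization server's
    token endpoint; the model never sees the input or the output token.
    """
    params = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,  # the one downstream service this call targets
        "scope": scope,        # the narrowest scope the action needs
    }
    return urlencode(params)
```

&lt;p&gt;Requesting a single audience and a narrow scope per call is what makes the resulting credential "attenuated" relative to the user's original grant.&lt;/p&gt;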

&lt;p&gt;Different risk levels warrant different patterns:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;When to Use&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;th&gt;Human Required?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Synchronous policy check&lt;/td&gt;
&lt;td&gt;Read operations, low-risk tool calls&lt;/td&gt;
&lt;td&gt;&amp;lt; 10ms&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Asynchronous human-in-the-loop (HITL) approval&lt;/td&gt;
&lt;td&gt;Financial transactions, data deletion&lt;/td&gt;
&lt;td&gt;Minutes to hours&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deny-with-replan&lt;/td&gt;
&lt;td&gt;Agent can choose an alternative action&lt;/td&gt;
&lt;td&gt;&amp;lt; 10ms + inference&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
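
&lt;p&gt;The table can be read as a small routing decision. The sketch below assumes a hand-maintained set of high-risk tool names and a boolean policy result; both are hypothetical simplifications of what a real policy engine would return.&lt;/p&gt;

```python
from enum import Enum


class Pattern(Enum):
    SYNC_CHECK = "synchronous policy check"
    HITL = "asynchronous human approval"
    DENY_REPLAN = "deny with replan"


# Hypothetical examples of tools that always need a human reviewer.
HIGH_RISK_TOOLS = {"payments.transfer", "records.delete"}


def choose_pattern(tool_name, policy_allows):
    """Pick the enforcement pattern for one proposed tool call."""
    if tool_name in HIGH_RISK_TOOLS:
        return Pattern.HITL
    if policy_allows:
        return Pattern.SYNC_CHECK
    # Low-risk call that policy rejected: return a reason so the agent can replan.
    return Pattern.DENY_REPLAN
```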

&lt;p&gt;The asynchronous pattern draws from &lt;a href="https://www.arcade.dev/blog/build-ai-agents-for-financial-services-banking/" rel="noopener noreferrer"&gt;financial services' four-eyes principle&lt;/a&gt; (maker-checker): one party prepares an action, another independently reviews and approves before execution. The agent is the "maker." When a human independently reviews the agent's proposed action, this is literal maker-checker. Automated policy enforcement provides an analogous independent control but is not, by itself, the four-eyes principle.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why Per-Action Authorization Is Inevitable for Enterprise AI&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The industry has repeatedly moved from coarse upfront grants toward narrower runtime controls, and each time, it wasn't optional for long.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Android permissions.&lt;/strong&gt; Before Android 6.0 Marshmallow (2015), apps received all requested permissions at install time. Users faced an all-or-nothing choice. Android 6.0 moved "dangerous permissions" to a contextual, just-in-time model: apps must request them at the moment of use, and users can deny or revoke specific permissions. Once granted, permissions persist until revoked, so this is not per-action authorization. But the shift from blanket install-time grants to contextual, revocable runtime grants is the same directional move. In this analogy, install-time permissions correspond to connection-time provisioning (EMA's domain), and runtime permission prompts correspond to the enforcement layer that sits after it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google BeyondCorp.&lt;/strong&gt; After Operation Aurora (2010) demonstrated that perimeter-based trust was insufficient, Google replaced its castle-and-moat model with per-request evaluation based on device state, user identity, and context, regardless of network location. The lesson: "connected" (on the corporate network) was not an authorization decision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OAuth's own evolution.&lt;/strong&gt; OAuth retained bearer-token deployments while adding PKCE, DPoP, and updated security guidance to harden different stages of the flow. Neither PKCE nor DPoP is per-action authorization, but both responded to attacks that exploited assumptions about what grant-time credentials could safely cover.&lt;/p&gt;

&lt;p&gt;AI agent authorization is the next instance. EMA represents the maturation of the connection layer, the same way centralized SSO matured enterprise web app access. The CSA, NSA, and OWASP already emphasize action-level controls, least privilege, deterministic validation, and explicit approval for consequential operations. The question is how quickly the industry will build the runtime layer that complements centralized provisioning.&lt;/p&gt;

&lt;p&gt;Compliance pressure is accelerating the timeline. SOC 2 Trust Services Criteria map naturally to per-action controls. CC6.1 (logical and physical access controls) can be supported when audit trails capture each agent action, not just token issuance. CC6.6 (system boundary protection) is strengthened when policy enforcement operates at the tool-call level, not just the network perimeter. CC7.2 (anomaly monitoring) benefits from granular agent telemetry that reveals unusual tool-call patterns in real time. Per-tool-call logging is not a verbatim SOC 2 requirement, but it can provide useful evidence when auditors assess how agent access and actions are controlled.&lt;/p&gt;
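
&lt;p&gt;As an illustration of what "each agent action" might mean in an audit trail, here is a minimal sketch of a per-call record. The field names are assumptions, not a SOC 2 schema or any vendor's event format.&lt;/p&gt;

```python
import json
from datetime import datetime, timezone


def audit_event(user_id, agent_id, tool_name, args_digest, outcome, reason_code):
    """Build one JSON audit line for a single tool call."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,          # the human the agent acted for
        "agent_id": agent_id,
        "tool": tool_name,
        "args_digest": args_digest,  # a hash of the arguments, not raw values
        "outcome": outcome,          # allow, deny, or escalate
        "reason_code": reason_code,
    }
    return json.dumps(record, sort_keys=True)
```

&lt;p&gt;Logging a digest rather than raw arguments keeps sensitive values out of the log while still letting reviewers tie a record to an exact call.&lt;/p&gt;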

&lt;p&gt;On the analyst side, Gartner's Market Guide for Guardian Agents and Forrester's 2026 Technology and Security Predictions both signal that agent governance is now an enterprise category. &lt;a href="https://www.forrester.com/press-newsroom/forrester-tech-security-2026-predictions/" rel="noopener noreferrer"&gt;Forrester predicts&lt;/a&gt; enterprises will defer 25% of planned AI spending to 2027 as financial scrutiny intensifies and organizations struggle to demonstrate ROI.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Building a Production Per-Action Authorization Architecture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A production-grade implementation requires seven components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Connection-time provisioning&lt;/strong&gt; (EMA, centralized IdP) controlling which users and agents access which servers.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-execution interceptor&lt;/strong&gt; between the LLM's tool-call output and execution.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy engine&lt;/strong&gt; evaluating the three-layer intersection (org x user x agent) per call.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credential isolation&lt;/strong&gt; from the LLM, with tokens injected only after policy allows.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deny-by-default&lt;/strong&gt; stance with structured reason feedback for model replanning.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-in-the-loop (HITL) approval&lt;/strong&gt; for high-risk actions via Slack, email, or equivalent out-of-band flow.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-action audit logging&lt;/strong&gt; supporting SOC 2 Trust Services Criteria (CC6.1, CC6.6, CC7.2).&lt;/li&gt;
&lt;/ol&gt;
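
&lt;p&gt;The three-layer intersection in item 3, combined with the deny-by-default stance in item 5, can be sketched with plain sets. The permission strings and function names are hypothetical; a real policy engine would evaluate richer rules than set membership.&lt;/p&gt;

```python
def effective_permissions(org_allowed, user_granted, agent_scoped):
    """An action is available only if all three layers include it."""
    return set(org_allowed).intersection(user_granted, agent_scoped)


def is_permitted(action, org_allowed, user_granted, agent_scoped):
    """Deny by default: anything outside the intersection is rejected."""
    return action in effective_permissions(org_allowed, user_granted, agent_scoped)
```

&lt;p&gt;With an organization that allows &lt;code&gt;issues:read&lt;/code&gt; and &lt;code&gt;issues:write&lt;/code&gt;, a user who has granted both, and an agent scoped to &lt;code&gt;issues:read&lt;/code&gt; only, the effective set is just &lt;code&gt;issues:read&lt;/code&gt;.&lt;/p&gt;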

&lt;p&gt;None of these components requires novel technology. Microsoft reports sub-millisecond policy evaluation for AGT. Suitable local OPA policies can return deny-with-reason decisions in single-digit milliseconds. Zanzibar is reported to process millions of authorization checks per second. EMA handles centralized provisioning today. The building blocks exist; the gap is in connecting them: applying policies consistently across all agents as they scale to more users and systems. That is the gap an action runtime fills. Without infrastructure for secure action, organizations often restrict agents to analysis and recommendations, which keeps realized ROI incremental.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.arcade.dev/get-started/authorization/" rel="noopener noreferrer"&gt;Arcade.dev&lt;/a&gt; evaluates agent scope and user scope together on every tool call. Its &lt;a href="https://docs.arcade.dev/en/guides/contextual-access" rel="noopener noreferrer"&gt;Contextual Access&lt;/a&gt; capability adds customer-defined organization policy through pre-execution hooks that can allow, deny, or modify tool calls. Credentials remain isolated from the LLM, and the model never receives raw tokens. Arcade's catalog includes 8,000+ agent-optimized tools designed around natural-language intent rather than raw API passthrough.&lt;/p&gt;

&lt;p&gt;Arcade goes beyond routing. Its &lt;a href="https://docs.arcade.dev/en/guides/mcp-gateways" rel="noopener noreferrer"&gt;MCP Gateway&lt;/a&gt; federates multiple servers behind a single controlled endpoint. For governance, Arcade generates structured, OpenTelemetry-compatible &lt;a href="https://www.arcade.dev/blog/ai-agent-governance-compliance/" rel="noopener noreferrer"&gt;audit events&lt;/a&gt; for every agent action, attributable to the requesting user and exportable to enterprise SIEM systems.&lt;/p&gt;

&lt;p&gt;Arcade integrates with existing OAuth and IdP flows, including Microsoft Entra and Okta, rather than replacing them. It can be &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;deployed in Arcade Cloud, in a customer VPC, on-premises, or in a fully air-gapped environment&lt;/a&gt;, allowing organizations to control data residency and network isolation.&lt;/p&gt;

&lt;p&gt;Other tools in this space (OPA, Cedar, Microsoft AGT, Kontext, &lt;a href="https://authzed.com/" rel="noopener noreferrer"&gt;AuthZed&lt;/a&gt;) address individual pieces: policy engines, credential management, or governance overlays. Arcade combines these capabilities in a single runtime: agent authorization (policy and credentials), agent-optimized tools, and lifecycle governance. That matters because these three concerns interact at execution time.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;EMA is the right answer to one authorization problem, but not the complete answer for agent runtime security.&lt;/p&gt;

&lt;p&gt;The industry has repeatedly moved from coarse upfront grants toward narrower runtime controls. Each time, early adopters avoided the painful retrofit that the rest of the industry eventually endured.&lt;/p&gt;

&lt;p&gt;The teams building continuous authorization into their agent architecture now, complementing EMA with runtime policy enforcement, make the same bet the Android, BeyondCorp, and OAuth security teams made: that "provisioned" was never the same as "authorized," and that the gap between them is where real attacks live.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;FAQ&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What is Enterprise-Managed Authorization (EMA) for MCP?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Enterprise-Managed Authorization is an MCP extension that allows organizations to centrally manage which MCP servers their users can access. It uses the organization's identity provider (IdP) to provision access based on groups, roles, and conditional access rules. Users authenticate once through SSO and automatically connect to all approved MCP servers without per-server consent prompts.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How does EMA relate to per-action authorization?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;EMA and per-action authorization solve different problems at different points in the execution lifecycle. EMA governs who connects to what (provisioning). Per-action authorization governs whether a specific tool call should execute (runtime enforcement). EMA is the outer gate; per-action authorization is the inner gate. A complete enterprise architecture needs both centralized provisioning and runtime enforcement; EMA is one way to provide the provisioning layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What is per-action authorization for AI agents?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Per-action authorization is a security model that evaluates whether a specific AI agent tool call should proceed based on organization policy, user delegation, and agent capability. It checks permissions at execution time, after the model proposes a tool call and before the action runs. This limits the blast radius of prompt injection by blocking policy-violating actions, even when the underlying permissions were legitimately provisioned through EMA or standard OAuth.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why is EMA not sufficient for AI agent security?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;EMA centralizes access provisioning, which is valuable. But it evaluates access at token issuance (not per tool call) and cannot detect if a specific runtime action was genuinely requested by the user or triggered by a prompt injection. Because AI agents execute tasks at machine speed, they can rapidly exercise latent over-provisioning inherent in standard OAuth scopes, even when those scopes were provisioned through a centrally managed, policy-governed flow.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How can prompt injection abuse access granted through EMA and OAuth?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Prompt injection abuses EMA- and OAuth-granted access by planting malicious instructions within untrusted content that an authenticated AI agent processes. Because the agent's connection to tools like GitHub or Azure is already authorized via valid, centrally provisioned tokens, the resulting tool calls use valid credentials and remain within granted scopes, so they can pass conventional token, scope, and ACL checks. Those checks do not establish whether the user intended the particular action.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Does per-action authorization add latency to AI agents?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Per-action authorization typically adds little latency when evaluated locally or in-process. Suitable local policies can complete in single-digit milliseconds, though results vary with policy complexity and network topology. For local policies this overhead is usually small relative to LLM inference, but remote services and complex policies should be benchmarked in the target deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do you implement per-action authorization alongside EMA?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You implement per-action authorization by inserting a pre-execution interceptor between the LLM's tool-call output and the actual tool execution. This interceptor uses a stateless policy engine to evaluate the requested action against organization, user, and agent policies. EMA continues to handle grant-time provisioning through the IdP. Developers can build this architecture manually or use an action runtime platform like Arcade to enforce runtime checks across their agent infrastructure while preserving their existing EMA and IdP flows.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What does Arcade do for AI agent authorization?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Arcade is an action runtime platform that provides per-action authorization, managed tools, and governance for AI agents in a single unified system. It evaluates agent and user scopes on every tool call and can enforce customer-defined organization policy through pre-execution hooks immediately before execution. Arcade integrates with existing IdP infrastructure (such as Microsoft Entra and Okta via OIDC) rather than replacing it, adding the runtime enforcement layer that grant-time provisioning cannot provide. It also isolates credentials from the LLM so that the model never sees raw tokens, reducing credential-exfiltration risk during prompt injection attacks. Action-level policy and containment remain necessary to prevent the execution layer from being used as a confused deputy.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>agents</category>
      <category>security</category>
    </item>
    <item>
      <title>Lessons from building 20 MCP Apps in 2 days</title>
      <dc:creator>Teal Larson</dc:creator>
      <pubDate>Fri, 19 Jun 2026 21:56:10 +0000</pubDate>
      <link>https://dev.to/arcade/lessons-from-building-20-mcp-apps-in-2-days-1f98</link>
      <guid>https://dev.to/arcade/lessons-from-building-20-mcp-apps-in-2-days-1f98</guid>
      <description>&lt;p&gt;A few weeks back, my team sat down for two days and built around twenty MCP Apps. I came out with a much better idea of what they are, what they aren't (yet), and where the duct tape is currently holding things together. Here's the brain dump.&lt;/p&gt;

&lt;p&gt;If you haven't run into them yet: MCP Apps is the first official extension to the MCP spec. It lets a tool return a UI resource alongside its result. The host renders that UI inline as a sandboxed iframe. Tables, charts, forms, branded layouts, little interactive bits that can call back into other tools. Real, actual UI in the middle of a chat-based experience. Very cool.&lt;/p&gt;

&lt;p&gt;They matter because some information is just better visually. A neatly grouped list of pull requests is much easier to scan than a wall of bulleted text. A chart beats a CSV. And as more of our day-to-day work shifts into chat, "bring your brand and your product surface into the conversation" stops being a nice-to-have.&lt;/p&gt;

&lt;p&gt;OK. Lessons.&lt;/p&gt;

&lt;h2&gt;
  
  
  The call is coming from inside the house
&lt;/h2&gt;

&lt;p&gt;MCP Apps live inside the server. This was the first thing that surprised me. An MCP App isn't a hosted URL you point your tool at or some third-party iframe you embed. It is fetched via MCP, not HTTP, so the UI code ships with the MCP server and is served via the ui:// resource scheme.&lt;/p&gt;

&lt;p&gt;There are a number of different ways you could go about organizing this. For example, you could co-locate the files with the tools themselves:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-mcp-server/
  tools/
    list_projects/
      list_projects.py
      project-summary.html
    list_project_patterns/
      list_project_patterns.py
      pattern-card.html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or keep all the UI files together in one directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-mcp-server/
  tools/
    list_projects.py
    list_project_patterns.py
  ui/
    pattern-card.html
    project-summary.html
    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We were using React because we wanted to leverage the existing internal design system components. So we landed on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-mcp-server/
   tools/
   ui/
     pattern-card.tsx
     project-summary.tsx
     package.json
     vite.config.mjs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A single Vite project at the root of /ui, configured to output an HTML file per TSX file at build time.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP Apps are enrichment-only
&lt;/h2&gt;

&lt;p&gt;If a host supports MCP Apps, your user sees the rich UI. If it doesn't (Claude Code, most terminal-based clients, anything that isn't on the new extension), the _meta.ui property is silently ignored and your user just gets the text response.&lt;/p&gt;

&lt;p&gt;So the text response is still the contract. Your MCP App is enrichment on top. If you stuff the actual answer into the UI and leave your text response empty, congratulations: you've shipped a tool that works in some clients and silently breaks in others. Always design as if half your users will never see the app.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keep these things STUPID simple
&lt;/h2&gt;

&lt;p&gt;I am going to be the first one to tell you: keep your MCP App components dumb. Pure. Boring. All data passed in as props from the tool result. No fetches from inside the app, no state machines, no calls back to your API.&lt;/p&gt;

&lt;p&gt;The tool runs, computes its answer, hands the data to the UI as props, and the UI is just a deterministic render of that. This made our apps fast to build, easy to reason about, and very simple to test in isolation. It also kept us honest about what exactly we were sending into a sandbox we don't fully control (more on that in a sec).&lt;/p&gt;
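
&lt;p&gt;Here's roughly what that looks like on the tool side, as a hedged Python sketch. The &lt;code&gt;content&lt;/code&gt; and &lt;code&gt;structuredContent&lt;/code&gt; keys follow the general MCP tool-result shape; the function name, the pull request fields, and the idea that a host hands the structured data to the component as props are assumptions for illustration.&lt;/p&gt;

```python
def build_tool_result(pull_requests):
    """Return a result whose text stands alone; structured data is extra."""
    lines = ["Open pull requests: {}".format(len(pull_requests))]
    for pr in pull_requests:
        lines.append("- #{} {} ({})".format(pr["number"], pr["title"], pr["author"]))
    return {
        # Text content: the contract every host can render.
        "content": [{"type": "text", "text": "\n".join(lines)}],
        # Structured data: what a UI-capable host could pass to the component.
        "structuredContent": {"pullRequests": pull_requests},
    }
```

&lt;p&gt;The text is built from the same data the UI gets, so a client that never renders the app still sees the full answer.&lt;/p&gt;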

&lt;h2&gt;
  
  
  Host quirks are real
&lt;/h2&gt;

&lt;p&gt;There's a spec, but hosts implement it with their own opinions. Container width, padding, default typography, dark/light handling, the whole vibe varies. ChatGPT renders wide in a browser. Claude renders narrow in a chat panel. Mobile is mobile. VS Code's side panel is its own little adventure.&lt;/p&gt;

&lt;p&gt;There's no standardized testing harness yet, so our iteration loop was: build, install in client A, eyeball it, install in client B, eyeball it, adjust, repeat. Compared to ordinary frontend dev, where you'd just spin up a Storybook or run Playwright across browsers in CI, it felt slow. Like, painfully slow.&lt;/p&gt;

&lt;p&gt;Two things helped:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
Designing layouts that gracefully reflow at narrow widths from the start, rather than fixing them after the fact.
&lt;/li&gt;
&lt;li&gt;
Using a file watcher to rebuild on save, so the inner loop wasn't quite so brutal.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tooling will catch up. For now: plan for visual QA across multiple hosts, and accept the dev loop is going to feel slow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The host can see everything
&lt;/h2&gt;

&lt;p&gt;MCP Apps run in a sandboxed iframe, but the content of that iframe is visible to the host. This has a real implication and I don't want to bury it: don't use MCP Apps to collect secrets. No API keys in form fields. No OAuth tokens. Nothing you wouldn't want logged.&lt;/p&gt;

&lt;p&gt;If you need to collect secrets, use URL elicitation or a separate secure form outside the MCP App. You can pair that with an MCP App that polls the external endpoint for completion status. The secret itself just shouldn't live inside the rendered iframe.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you're starting from zero:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
Bundle your UI inside your server. Multi-page Vite, one HTML per surface, your existing design system imported directly.
&lt;/li&gt;
&lt;li&gt;
Always make the text response stand on its own.
&lt;/li&gt;
&lt;li&gt;
Pure components, props in, no client-side state.
&lt;/li&gt;
&lt;li&gt;
Test on every host you care about, by hand, until tooling catches up.
&lt;/li&gt;
&lt;li&gt;
Don't put secrets in the app.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The patterns that do still feel impossible (gathering tool inputs via UI before the call, for example) might not stay that way for long. MCP Tasks are in the experimental phase and look like they could open that door.&lt;/p&gt;

&lt;p&gt;For now: MCP Apps are early, the spec is moving, the tooling is thin, and they're already worth shipping. &lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>frontend</category>
    </item>
    <item>
      <title>Best Composio Alternatives in 2026 for Production AI Agents</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Thu, 11 Jun 2026 19:25:27 +0000</pubDate>
      <link>https://dev.to/arcade/best-composio-alternatives-in-2026-for-production-ai-agents-446p</link>
      <guid>https://dev.to/arcade/best-composio-alternatives-in-2026-for-production-ai-agents-446p</guid>
      <description>&lt;p&gt;Composio offers over 1,000 toolkits and 20,000 tools through MCP and direct APIs.&lt;/p&gt;

&lt;p&gt;It's great for rapid prototyping, but scaling AI agents to production requires a different architecture.&lt;/p&gt;

&lt;p&gt;This guide evaluates four production-ready alternatives, covering authorization models, governance, deployment options, and real migration complexity, for engineering teams moving beyond the prototype stage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;p&gt;When evaluating Composio alternatives for production, prioritize per-user delegated authorization (just-in-time user consent), agent-optimized tools with constrained schemas that reduce hallucination, and centralized governance with immutable audit logs, ideally OpenTelemetry-compatible. Deployment model (cloud, VPC, or air-gapped) is also an important consideration for enterprise environments.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Best overall for secure multi-user production:&lt;/strong&gt; Arcade.dev&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for AWS-native ecosystems:&lt;/strong&gt; AWS AgentCore&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for B2B data sync:&lt;/strong&gt; Merge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for shadow AI discovery and governance:&lt;/strong&gt; Natoma&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to evaluate Composio vs. production-ready alternatives
&lt;/h2&gt;

&lt;p&gt;Composio is an MCP gateway and integration wrapper; it works well for early prototyping, single-user internal utilities, or budget-constrained projects. Its extensive integration catalog and low per-call pricing make it the fastest way to wire up a multi-app agent for a proof of concept.&lt;/p&gt;

&lt;p&gt;Moving beyond prototypes, and routing multiple real users through agent workflows, exposes architectural limitations around identity, blast radius, observability, and multi-user AI agent authorization.&lt;/p&gt;

&lt;p&gt;Evaluating a production-ready alternative comes down to three questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Where do my users' OAuth tokens and API keys live, and what is the blast radius if the platform is breached?&lt;/li&gt;
&lt;li&gt;Who can register and run tool definitions, and is execution governed and versioned?&lt;/li&gt;
&lt;li&gt;If something goes wrong, can I prove exactly what every agent did?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Adopting a runtime like Arcade or a unified data layer like Merge doesn't replace your agent orchestration loops. Teams still bring their own orchestration layers, like &lt;a href="https://docs.langchain.com/oss/python/langchain/overview" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; or &lt;a href="https://mastra.ai/" rel="noopener noreferrer"&gt;Mastra&lt;/a&gt;, to manage reasoning and maintain contextual state. The platforms evaluated below operate as execution runtimes and gateways, securing and standardizing the tool layer that orchestration frameworks call.&lt;/p&gt;

&lt;p&gt;When evaluating authorization and blast radius, look for delegated authorization models that evaluate the intersection of agent and user permissions for each action at runtime, scoped to that action, with credentials never exposed to the LLM. The weaker pattern, common in prototyping-first tools, is pre-authorized tokens with broad, static permissions that are fast to wire up, but widen the blast radius the moment an agent is compromised.&lt;/p&gt;
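
&lt;p&gt;As a rough illustration of that intersection check, here is a minimal Python sketch. The scope strings and the &lt;code&gt;authorize&lt;/code&gt; helper are hypothetical and do not represent any platform's actual API; a real runtime would resolve both permission sets from an IdP and a token vault rather than from literals.&lt;/p&gt;

```python
# Hypothetical sketch: function name and scope strings are illustrative only.


def authorize(agent_scopes, user_scopes, required_scopes):
    """Allow an action only if every scope it needs is granted to BOTH parties."""
    # Effective permissions are the intersection of agent and user grants.
    effective = set(agent_scopes).intersection(user_scopes)
    missing = set(required_scopes).difference(effective)
    return {"allowed": not missing, "missing": sorted(missing)}


agent = {"github:read", "github:write", "slack:post"}
user = {"github:read", "slack:post", "gmail:send"}

# Both parties hold github:read, so this action is allowed.
print(authorize(agent, user, {"github:read"}))
# Only the agent holds github:write, so this action is denied.
print(authorize(agent, user, {"github:write"}))
```

&lt;p&gt;Evaluating the check per action, rather than once at connection time, is what keeps a compromised agent from acting beyond what the current user could do.&lt;/p&gt;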

&lt;p&gt;On &lt;a href="https://composio.dev/blog/composio-may-2026-security-incident" rel="noopener noreferrer"&gt;May 21, 2026, an attacker&lt;/a&gt; moved from Composio's internal monitoring tools into its automated remediation systems, registered malicious tool definitions inside the tool-execution sandbox, and executed arbitrary code. The attacker separately abused compromised employee Gmail OAuth tokens via magic-link sign-in. Roughly 0.3% of active connections were exposed, including about 5,001 GitHub tokens, a small number of Gmail and other service tokens, and an auxiliary cache that held about 5,241 API keys during the breach window. The full scope was not yet known at the time of disclosure.&lt;/p&gt;

&lt;p&gt;Composio responded with credential rotation and OAuth revocation across roughly 100 toolkits, and is introducing customer-key self-custody (a Zero Trust Proxy KMS), keys that are visible only at creation, and IP allowlisting. This incident maps directly onto the authorization, blast-radius, and governance dimensions: the criteria most critical to production-readiness are exactly the ones that breadth-and-price comparisons tend to ignore.&lt;/p&gt;

&lt;p&gt;Tool reliability is another critical axis of evaluation. You need to differentiate between intent-level tools and raw API wrappers. Tools with constrained, intention-aligned schemas reduce the surface area for hallucinations and map more reliably to API calls than raw wrappers do. Raw API wrappers force the LLM to guess the exact schema structure, leading to endless retry loops and excessive token usage.&lt;/p&gt;

&lt;p&gt;Production workloads demand strict MCP and agent governance. Composio lets teams build custom tools through its SDK, but does not support connecting external MCP servers, including official vendor-published servers. This locks teams into Composio's catalog for pre-built integrations. Look for governed tool registration that lets teams connect external MCP servers and manage their own tool definitions alongside pre-built catalogs, with pre- and post-tool-call policy enforcement and immutable audit logs. OpenTelemetry (OTel) is the emerging standard for production AI observability. Platforms should support &lt;a href="https://opentelemetry.io/docs/specs/semconv/gen-ai/mcp/" rel="noopener noreferrer"&gt;OTel with GenAI and MCP semantic conventions&lt;/a&gt;, capturing exact tool execution states to provide a reliable audit substrate.&lt;/p&gt;

&lt;p&gt;Pricing structure, deployment and self-hosting support, developer experience, and documentation quality should also guide your final platform choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Composio alternatives comparison table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Arcade&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;AWS AgentCore&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Merge&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Natoma&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Secure multi-user production&lt;/td&gt;
&lt;td&gt;AWS-native ecosystems&lt;/td&gt;
&lt;td&gt;B2B data sync&lt;/td&gt;
&lt;td&gt;Shadow AI discovery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pricing model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Platform + usage-based&lt;/td&gt;
&lt;td&gt;Usage-based (Complex)&lt;/td&gt;
&lt;td&gt;Platform / Linked accounts&lt;/td&gt;
&lt;td&gt;Seat-based / Enterprise&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP gateway/capability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runtime + Gateway&lt;/td&gt;
&lt;td&gt;Partial (BYO servers)&lt;/td&gt;
&lt;td&gt;Gateway Only&lt;/td&gt;
&lt;td&gt;Gateway Only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;User and agent authorization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Delegated per-user auth, scoped agent permissions, runtime intersection enforcement&lt;/td&gt;
&lt;td&gt;IAM and workload identities; end-user delegation depends on implementation&lt;/td&gt;
&lt;td&gt;Linked account credentials for data access; limited agent-specific authorization&lt;/td&gt;
&lt;td&gt;ABAC and role-based profiles across AI clients&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Key differentiator vs Composio&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unified MCP runtime: auth + agent-optimized tools + governance&lt;/td&gt;
&lt;td&gt;Deep AWS compliance integration&lt;/td&gt;
&lt;td&gt;Normalized data schemas&lt;/td&gt;
&lt;td&gt;Shadow AI discovery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment options&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cloud, VPC, Air-gapped&lt;/td&gt;
&lt;td&gt;Cloud (AWS only)&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Cloud, VPC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Audit logs support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Immutable runtime audit logs&lt;/td&gt;
&lt;td&gt;CloudWatch/X-Ray via AWS setup&lt;/td&gt;
&lt;td&gt;Linked-account audit trail&lt;/td&gt;
&lt;td&gt;Tool-call and activity logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenTelemetry (OTel) compliance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  In-depth reviews of the best Composio alternatives
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Arcade: Composio alternative for secure, multi-user production
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Best for
&lt;/h4&gt;

&lt;p&gt;Engineering and AI product teams deploying secure, governed, multi-user agents in production environments.&lt;/p&gt;

&lt;h4&gt;
  
  
  Overview
&lt;/h4&gt;

&lt;p&gt;Arcade.dev is the MCP runtime for building and deploying multi-user AI agents that take real actions across enterprise systems. It unifies agent authorization, agent-optimized tools, and lifecycle governance into a single execution layer, on the principle that a runtime is the best gateway. The layer that brokers identity and routes traffic should also enforce policy and capture audit, rather than leaving teams to bolt those concerns onto a thin proxy.&lt;/p&gt;

&lt;p&gt;This means engineering teams don't have to rebuild security plumbing, complex token management, and logging infrastructure for every new software integration.&lt;/p&gt;

&lt;h4&gt;
  
  
  Arcade vs. Composio: Key differences
&lt;/h4&gt;

&lt;p&gt;Composio focuses on breadth with a large catalog of tools auto-generated from OpenAPI specifications. Arcade focuses on depth with &lt;a href="https://www.arcade.dev/compare/arcade-vs-composio/" rel="noopener noreferrer"&gt;tools built to agent-experience principles and validated with evals before release&lt;/a&gt;, and provides the full runtime stack of authorization, agent-optimized tools, and governance in a single execution layer. That architectural difference drives three major advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Centralized Governance:&lt;/strong&gt; Arcade is the central enforcement point for policies your organization has already defined in IdPs, SaaS tools, and security systems, rather than asking teams to recreate them. Unlike Composio's Tool Router, Arcade can register and govern built-in, custom, and external MCP servers via a single control plane. That control plane covers every tool, agent, and auth provider, with strict versioning, a shared registry that prevents teams from rebuilding what already exists, visibility filtering so that agents only see tools their users are permitted to invoke, and immutable, OpenTelemetry-compatible audit logs. Pre- and post-tool-call hooks let compliance teams drop in custom variables (workflow state, time windows, request volume, session context) that the runtime treats as first-class enforcement primitives. Arcade's SOC 2 Type 2 certification validates these controls through an independent audit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delegated Authorization:&lt;/strong&gt; Arcade uses a &lt;a href="https://www.arcade.dev/blog/ai-agent-authentication-authorization/" rel="noopener noreferrer"&gt;multi-user, post-prompt authorization model&lt;/a&gt; with just-in-time permissions mapping. The runtime evaluates the exact intersection of what the agent and user are allowed to do, per action, at execution time. Tokens are managed through Arcade's &lt;a href="https://www.arcade.dev/get-started/authorization/" rel="noopener noreferrer"&gt;automated token vault&lt;/a&gt;, keeping credentials isolated from the underlying language model and removing prompt injection as a direct credential-theft vector. Destructive actions can be routed through out-of-band approvals before they execute.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intent-Level Reliability:&lt;/strong&gt; Arcade bypasses raw API wrappers by offering a &lt;a href="https://www.arcade.dev/tools/" rel="noopener noreferrer"&gt;catalog of 8,000+ agent-optimized MCP tools&lt;/a&gt; with constrained schemas that map reliably to API calls, reducing hallucination surface area. These tools select only the fields an agent requests and flatten responses into key-value pairs, which sharply reduces token consumption. In Arcade's &lt;a href="https://www.arcade.dev/blog/attio-mcp-toolkit-benchmark/" rel="noopener noreferrer"&gt;head-to-head Attio CRM benchmark&lt;/a&gt;, Composio returned roughly 100x more response tokens than Arcade across identical queries (747,083 vs. 7,426), a gap that can reach six figures in monthly token spend at enterprise scale. Built-in parallelized execution, intelligent retries with developer-defined context, and automatic failover sit alongside the catalog.&lt;/li&gt;
&lt;/ul&gt;
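
&lt;p&gt;To make the field-selection and flattening idea concrete, here is a minimal Python sketch. It is not Arcade's implementation: the &lt;code&gt;flatten&lt;/code&gt; and &lt;code&gt;select_fields&lt;/code&gt; helpers, the dotted-key convention, and the sample record are all assumptions for illustration.&lt;/p&gt;

```python
# Hypothetical sketch: helper names, dotted keys, and sample data are made up.


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted key-value pairs (lists are left as-is)."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def select_fields(record, wanted):
    """Return only the flattened keys the agent asked for."""
    flat = flatten(record)
    return {key: flat[key] for key in wanted if key in flat}


raw = {
    "id": "rec_1",
    "name": {"first": "Ada", "last": "Lovelace"},
    "company": {"name": "Analytical Engines", "meta": {"created_by": "sync-job"}},
}

# The model receives two short pairs instead of the whole nested payload.
print(select_fields(raw, ["name.first", "company.name"]))
```

&lt;p&gt;The token savings come from the second step: whatever the upstream API returns, only the requested pairs are serialized back into the model's context.&lt;/p&gt;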

&lt;h4&gt;
  
  
  Pros: What you gain with Arcade
&lt;/h4&gt;

&lt;p&gt;Arcade delivers production-grade security. Teams pass stringent enterprise security reviews by using vaulted tokens, just-in-time user consent flows, and out-of-band approvals for destructive actions, backed by SOC 2 Type 2 certification. Arcade can be deployed in the cloud, in a customer VPC, on-prem, or in fully air-gapped environments, which matters for regulated industries and for teams running sensitive or legacy systems, where the "I do not want to personally be on the hook for this" risk is highest.&lt;/p&gt;

&lt;p&gt;Arcade also eliminates configuration sprawl. Organizations manage all custom, third-party, and built-in tools from one centralized control plane with strict versioning. Since Arcade uses specialized intent-level tools, you'll see lower token usage and &lt;a href="https://www.arcade.dev/blog/connect-ai-agents-enterprise-tools/" rel="noopener noreferrer"&gt;fewer parameter hallucinations&lt;/a&gt; compared to basic API wrappers.&lt;/p&gt;

&lt;h4&gt;
  
  
  Cons: What you give up with Arcade
&lt;/h4&gt;

&lt;p&gt;Arcade is purpose-built for multi-user production. Teams in the earliest single-user prototyping phase, where per-user authorization, governance, and audit are not yet requirements, may not need the full runtime on day one. In practice, most teams that reach Arcade start exactly there and switch once the agent meets real users.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pricing: How Arcade is priced
&lt;/h4&gt;

&lt;p&gt;Arcade uses a platform fee plus usage-based pricing on tool calls and auth events, designed for predictable scaling at enterprise volumes.&lt;/p&gt;

&lt;h4&gt;
  
  
  Migration considerations
&lt;/h4&gt;

&lt;p&gt;For an existing Composio-backed agent, the main work is replacing Composio tool calls with Arcade's agent-optimized tools, connecting existing OAuth and IdP providers, and validating that each workflow preserves the right user consent, tool permissions, and audit trail. Because Arcade exposes a standard MCP runtime endpoint, teams can keep their orchestration layer while moving tool execution into Arcade.&lt;/p&gt;




&lt;h3&gt;
  
  
  AWS AgentCore: Composio alternative for AWS-native agent stacks
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Best for
&lt;/h4&gt;

&lt;p&gt;Enterprise engineering teams fully entrenched in the AWS ecosystem that require tight integration with their existing infrastructure and strict compliance models, and that have the expertise and resources to manage the integrations themselves.&lt;/p&gt;

&lt;h4&gt;
  
  
  Overview
&lt;/h4&gt;

&lt;p&gt;Amazon Bedrock AgentCore is a platform for building, connecting, and optimizing AI agents. Unlike standalone third-party tools, it connects agents to enterprise systems via MCP servers, internal APIs, and Lambda functions, leveraging the massive scale of AWS's broader security, identity, and networking infrastructure.&lt;/p&gt;

&lt;h4&gt;
  
  
  AWS AgentCore vs. Composio: Key differences
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deep AWS-native integration:&lt;/strong&gt; AgentCore inherits AWS's massive enterprise compliance halo. That gives teams access to SOC 2- and ISO-certified, HIPAA-eligible infrastructure, alongside resilient, multi-region availability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS identity and security controls:&lt;/strong&gt; AgentCore can use &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/security-iam.html" rel="noopener noreferrer"&gt;AWS Identity and Access Management (IAM)&lt;/a&gt; for access policies, AWS Security Token Service (STS) for short-lived role assumption, and Key Management Service (KMS) for secret encryption during tool execution. These controls are powerful, but teams must configure and connect them across the agent execution path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS ecosystem evaluation tooling:&lt;/strong&gt; AWS offers experimentation and evaluation tooling around Bedrock agent workflows, so teams can test agent variations and tool-call reliability within the AWS environment. These capabilities still require setup across the surrounding AWS services.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Pros: What you gain with AWS AgentCore
&lt;/h4&gt;

&lt;p&gt;You get compliance and alignment with AWS architectures. If your organization already mandates strict VPC boundaries, private subnets, and granular IAM roles, AgentCore fits into that secure paradigm.&lt;/p&gt;

&lt;p&gt;Combine it with AWS CloudWatch and X-Ray, and you get debugging and trace correlation for every agent action across your cloud footprint.&lt;/p&gt;

&lt;h4&gt;
  
  
  Cons: What you give up with AWS AgentCore
&lt;/h4&gt;

&lt;p&gt;The primary tradeoff is operational assembly and management overhead. Building a secure agent environment in AgentCore requires configuring and stitching together multiple AWS services, such as IAM, CloudWatch, X-Ray, Step Functions, and Lambda, whereas a purpose-built runtime such as Arcade bundles per-user authorization, lifecycle governance, OpenTelemetry-compatible audit, and execution into a single layer that maps cleanly across clouds.&lt;/p&gt;

&lt;p&gt;This assembly burden introduces hidden logging and compute costs that are difficult to forecast. It also creates significant ecosystem lock-in. Once you build your agent architecture tightly around AWS IAM and Bedrock routing, you lose the portability that independent, cloud-agnostic runtimes provide.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pricing: How AWS AgentCore is priced
&lt;/h4&gt;

&lt;p&gt;AgentCore relies on a complex, usage-based AWS pricing model spanning multiple underlying compute and logging services. Forecasting total costs accurately is difficult.&lt;/p&gt;

&lt;h4&gt;
  
  
  Migration considerations
&lt;/h4&gt;

&lt;p&gt;Moving a Composio-backed agent to AWS AgentCore requires substantial AWS-specific implementation work. Teams need to translate integration logic into Lambda functions, AWS-hosted MCP servers, or other AWS services, then configure IAM, workload identities, logging, and tracing around those execution paths.&lt;/p&gt;




&lt;h3&gt;
  
  
  Merge: Composio alternative for unified APIs and B2B data sync
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Best for
&lt;/h4&gt;

&lt;p&gt;B2B SaaS companies focused on data-centric integration and normalizing data across hundreds of third-party platforms, like HRIS, ATS, and CRM systems.&lt;/p&gt;

&lt;h4&gt;
  
  
  Overview
&lt;/h4&gt;

&lt;p&gt;Merge originally established itself as a leading Unified API provider, and has recently expanded to include an Agent Handler and Gateway. It connects AI tools to enterprise applications not just by routing raw requests, but by normalizing business data into standard, predictable schemas.&lt;/p&gt;

&lt;h4&gt;
  
  
  Merge vs. Composio: Key differences
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Normalized Data Models:&lt;/strong&gt; Instead of connecting raw APIs and returning varied JSON structures, Merge standardizes data across entire software categories. All ticket data looks the same whether it comes from Jira, Zendesk, or Salesforce. This predictable schema benefits both Retrieval-Augmented Generation (RAG) and massive B2B data-syncing operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified API focus:&lt;/strong&gt; Merge has a stronger legacy in rigorous B2B data synchronization compared to Composio's primary focus on raw, varied action execution.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Pros: What you gain with Merge
&lt;/h4&gt;

&lt;p&gt;Engineering teams get built-in data syncing capabilities that form the bedrock of contextual, data-heavy RAG pipelines.&lt;/p&gt;

&lt;p&gt;Merge also brings a mature compliance posture for data-sync workloads, including SOC 2 Type II, HIPAA support, and GDPR alignment. Its dedicated Security Gateway can &lt;a href="https://docs.merge.dev/merge-agent-handler/overview" rel="noopener noreferrer"&gt;scan and redact Personally Identifiable Information (PII)&lt;/a&gt; before data ever reaches your underlying language models, though this is also achievable in runtime platforms like Arcade via pre- and post-tool-call hooks.&lt;/p&gt;

&lt;h4&gt;
  
  
  Cons: What you give up with Merge
&lt;/h4&gt;

&lt;p&gt;Merge is strongest when the agent needs standardized data access across categories like HRIS, ATS, ticketing, CRM, and accounting. Compared with Composio, it is less of a broad action-execution layer for quickly calling many vendor APIs. Merge also comes from the Unified API and B2B data-sync category, so its AI capabilities are layered onto a data integration foundation rather than designed first as an agent execution runtime. Teams that need agents to perform varied actions across many apps should confirm the required actions are supported by Merge's normalized models and Agent Handler, rather than assuming the breadth of a tool-wrapper catalog.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pricing: How Merge is priced
&lt;/h4&gt;

&lt;p&gt;Merge operates on a premium B2B SaaS pricing model focused on platform usage and the total volume of active linked accounts.&lt;/p&gt;

&lt;h4&gt;
  
  
  Migration considerations
&lt;/h4&gt;

&lt;p&gt;Moving from Composio to Merge is less about swapping an agent runtime and more about changing the integration layer. Teams need to map existing tool calls to Merge's normalized data models and adjust agent code that expects raw vendor-specific API responses.&lt;/p&gt;




&lt;h3&gt;
  
  
  Natoma: Composio alternative for shadow AI discovery
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Best for
&lt;/h4&gt;

&lt;p&gt;IT and Security teams that need to discover and govern unmanaged AI clients and rogue MCP servers across enterprise networks.&lt;/p&gt;

&lt;h4&gt;
  
  
  Overview
&lt;/h4&gt;

&lt;p&gt;Natoma is an enterprise MCP gateway focused on discovering and governing AI tool access across fragmented clients like Claude Code, Cursor, ChatGPT, and custom internal agents. Its strongest fit is shadow AI discovery: finding unmanaged AI clients and rogue MCP servers, then applying identity-aware access controls so security teams can see and govern how agents connect to enterprise systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.snowflake.com/en/news/press-releases/snowflake-announces-intent-to-acquire-natoma-providing-secure-connectivity-for-the-agentic-enterprise/" rel="noopener noreferrer"&gt;Snowflake announced a definitive agreement to acquire Natoma&lt;/a&gt; on May 27, 2026. Buyers should validate the standalone product roadmap, support model, and integration coverage before standardizing on it.&lt;/p&gt;

&lt;h4&gt;
  
  
  Natoma vs. Composio: Key differences
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Policy at the tool layer:&lt;/strong&gt; Natoma emphasizes Attribute-Based Access Control (ABAC) and bundles toolkits into strict, role-based Profiles. It focuses on rigorous policy enforcement and the integration of AWS Cedar policies rather than on basic API routing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shadow AI discovery:&lt;/strong&gt; Unlike Composio, Natoma offers dedicated network-level tools to discover and govern unmanaged AI clients and rogue MCP servers across an enterprise network.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Pros: What you gain with Natoma
&lt;/h4&gt;

&lt;p&gt;Organizations get high visibility into exactly which AI clients are active in their enterprise environments.&lt;/p&gt;

&lt;p&gt;You can secure existing AI coding assistants and internal agent builds without changing the underlying language models or orchestration frameworks that those tools rely on. Extensive SIEM and EDR integrations ensure your security operations center stays fully informed.&lt;/p&gt;

&lt;h4&gt;
  
  
  Cons: What you give up with Natoma
&lt;/h4&gt;

&lt;p&gt;Natoma focuses primarily on authorization and identity mapping. Like other governance-focused overlays, it doesn't include a catalog of pre-built, agent-optimized tools.&lt;/p&gt;

&lt;p&gt;For built-in execution-reliability features like automatic failover and intelligent retries that stabilize fragile API connections, teams typically pair it with a dedicated runtime.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pricing: How Natoma is priced
&lt;/h4&gt;

&lt;p&gt;Natoma uses a custom enterprise SaaS pricing model with tiered seat licensing; organizations must contact Natoma's sales team for a quote.&lt;/p&gt;

&lt;h4&gt;
  
  
  Migration considerations
&lt;/h4&gt;

&lt;p&gt;Moving from Composio to Natoma depends on whether the goal is replacing tool execution or adding governance over existing AI clients and MCP servers. Teams should validate supported integrations, policy coverage, and the product roadmap following Snowflake's announced intent to acquire Natoma.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Choosing the best Composio alternative for production
&lt;/h2&gt;

&lt;p&gt;Governance determines whether you can safely scale AI agents beyond a single user, and the foundational layer you pick makes that governance enforceable rather than aspirational.&lt;/p&gt;

&lt;p&gt;Choose &lt;strong&gt;Arcade&lt;/strong&gt; for a full multi-user production runtime with built-in governance and agent-optimized tools. Choose &lt;strong&gt;AWS AgentCore&lt;/strong&gt; for strict AWS-native integrations. Go for &lt;strong&gt;Merge&lt;/strong&gt; if your priority is B2B data syncing and normalized schemas. Consider &lt;strong&gt;Natoma&lt;/strong&gt; for shadow AI discovery across enterprise networks.&lt;/p&gt;

&lt;p&gt;If you're transitioning from a prototype to a secure, multi-user production environment, &lt;a href="https://app.arcade.dev/register" rel="noopener noreferrer"&gt;explore Arcade.dev to see how a unified MCP runtime natively solves authorization and governance&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Composio best for?
&lt;/h3&gt;

&lt;p&gt;Composio works best for rapid prototyping and early-stage agents where you want quick access to a large catalog of integrations and don't need strict multi-user authorization, governance, and production-level auditability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Composio production-ready for multi-user AI agents?
&lt;/h3&gt;

&lt;p&gt;Composio can support limited production scenarios, but teams typically outgrow it when they need per-user delegated authorization, blast-radius controls, and standardized observability and audit logs across many users and tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  What should I look for in a production-ready alternative to Composio?
&lt;/h3&gt;

&lt;p&gt;Prioritize per-user delegated authorization with tokens kept out of model context, governance controls for tool registration and policy enforcement, and audit logs and traceability (ideally OpenTelemetry) for every tool call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which Composio alternative is best for secure, multi-user production agents?
&lt;/h3&gt;

&lt;p&gt;Arcade is the best choice for teams that need a unified MCP runtime with just-in-time authorization and centralized governance for multi-user production deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should I choose Arcade instead of Composio?
&lt;/h3&gt;

&lt;p&gt;Choose Arcade when you need a unified MCP runtime for multi-user production agents with per-user delegated authorization, centralized governance, and agent-optimized tools in a single execution layer. It fits teams moving beyond prototyping that require vaulted credentials, immutable audit logs, and flexible deployment (cloud, VPC, or air-gapped).&lt;/p&gt;

&lt;h3&gt;
  
  
  When should I choose AWS AgentCore instead of a standalone runtime?
&lt;/h3&gt;

&lt;p&gt;Choose AWS AgentCore when you're all-in on AWS (IAM, VPC, CloudWatch/X-Ray) and have the engineering resourcing and expertise to assemble and manage multiple AWS services to meet your security, compliance, and operational requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  When is Merge a better choice than Composio?
&lt;/h3&gt;

&lt;p&gt;Choose Merge when your primary need is B2B data integration, especially normalized schemas and data sync across categories like HRIS, ATS, and CRM, rather than governed, multi-step action execution for many end users.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is MCP (Model Context Protocol), and why does it matter for these tools?
&lt;/h3&gt;

&lt;p&gt;MCP is a standard way for agents to call tools and servers. It matters because a production setup needs consistent authorization, governance, and observability around those tool calls, especially when many users share the same agent system.&lt;/p&gt;

&lt;h3&gt;
  
  
  What does "delegated authorization" mean for AI agents?
&lt;/h3&gt;

&lt;p&gt;Delegated authorization means the agent performs actions on behalf of a specific end user. Each tool call is evaluated against both the agent's permissions and the user's permissions at runtime, reducing the risk of shared credentials and oversized access.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>security</category>
      <category>agents</category>
    </item>
    <item>
      <title>Best Natoma Alternatives in 2026 After the Snowflake Acquisition</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Thu, 11 Jun 2026 19:20:22 +0000</pubDate>
      <link>https://dev.to/arcade/best-natoma-alternatives-in-2026-after-the-snowflake-acquisition-425k</link>
      <guid>https://dev.to/arcade/best-natoma-alternatives-in-2026-after-the-snowflake-acquisition-425k</guid>
      <description>&lt;p&gt;On May 27, 2026, Snowflake &lt;a href="https://www.snowflake.com/en/news/press-releases/snowflake-announces-intent-to-acquire-natoma-providing-secure-connectivity-for-the-agentic-enterprise/" rel="noopener noreferrer"&gt;announced its intent to acquire Natoma&lt;/a&gt;. This validates both Natoma and the enterprise Model Context Protocol governance category. Still, the acquisition prompts engineering leaders, AI platform teams, and security buyers to reassess their multi-user agent infrastructure.&lt;/p&gt;

&lt;p&gt;When evaluating MCP runtime alternatives, you face a real architectural decision: stay tethered to an ecosystem-native gateway, or adopt an independent, vendor-neutral MCP runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Choose Arcade.dev&lt;/strong&gt; if you need an independent MCP runtime with secure agent authorization via On-Behalf-Of (OBO), agent-optimized tools, agent lifecycle governance, and flexible deployment (cloud, VPC, and air-gapped).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose AWS AgentCore&lt;/strong&gt; if you're all-in on AWS/Bedrock and accept AWS-only constraints.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose WorkOS&lt;/strong&gt; if your main gap is enterprise SSO/directory sync (identity), not agent execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose Merge&lt;/strong&gt; if your main need is normalized integrations and bulk data sync, not multi-step agent workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Key differentiator&lt;/th&gt;
&lt;th&gt;Deployment flexibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Arcade&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unified auth, tools, and governance runtime&lt;/td&gt;
&lt;td&gt;Cloud, VPC, Air-gapped&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS AgentCore&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native AWS IAM and Bedrock integration&lt;/td&gt;
&lt;td&gt;AWS-only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;WorkOS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Developer-first human identity auth APIs&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Merge&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unified API data normalization&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What the Snowflake acquisition means for Natoma users
&lt;/h2&gt;

&lt;p&gt;Snowflake's acquisition of Natoma signals that strict AI governance is now a core enterprise requirement. Natoma is a fully managed enterprise MCP gateway that enforces Cedar-based attribute-based access control (ABAC) and provides shadow-AI discovery, SSO and SCIM support, and SIEM/EDR integrations.&lt;/p&gt;

&lt;p&gt;Enterprises currently discover an &lt;a href="https://natoma.ai/platform" rel="noopener noreferrer"&gt;average of 225 unmanaged shadow AI instances per organization&lt;/a&gt;. That makes centralized governance an immediate security priority. But this acquisition is likely to shift the product roadmap toward the native Snowflake Intelligence and Cortex ecosystems.&lt;/p&gt;

&lt;p&gt;Under the agreement, Snowflake will build Natoma into its governance and identity layer for AI agents and MCP tool access, using it as the centralized gateway enforcing identity, policy, and audit at the tool-call level.&lt;/p&gt;

&lt;p&gt;This raises real questions for current and prospective Natoma users. Will Natoma remain available and supported as a standalone product, or be folded into Snowflake's stack? Will the roadmap orient toward Snowflake Intelligence, Cortex, and the broader Snowflake ecosystem?&lt;/p&gt;

&lt;h3&gt;
  
  
  When Natoma still makes sense after the acquisition
&lt;/h3&gt;

&lt;p&gt;Natoma makes sense for enterprises already embedded in the Snowflake ecosystem, with internal role-based access control (RBAC) as their primary governance layer. It also suits platform teams that prioritize native integration with Cortex tools such as search and analyst services.&lt;/p&gt;

&lt;p&gt;Enterprise buyers who prefer their agent governance bundled with their core data warehouse procurement will find the combined offering a natural fit.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to evaluate Natoma alternatives in 2026
&lt;/h2&gt;

&lt;p&gt;An enterprise agent setup rests on three core pillars: agent authorization, agent-optimized tool reliability, and agent lifecycle governance. Any production runtime must address all three at once, with deployment flexibility as a cross-cutting requirement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1os2hksbqo14vi1jekko.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1os2hksbqo14vi1jekko.jpg" alt="A detailed architectural diagram illustrating the workflow and components of an MCP Runtime system within a B2B SaaS environment. A Client Application sends a Tool Request to the central MCP Runtime hub, which orchestrates three branches: Identity Context and Authorization (to Per-User Delegated Authorization, then to an OAuth / Identity Provider for Policy Evaluation); Tool Catalog and Execution (to an Agent-Optimized Tool Catalog that Invokes actions, leading to execution on External Enterprise SaaS); and Governance and Auditing (to Lifecycle Governance and Audit Logs to Emit Telemetry). The diagram uses a hierarchical structure with rounded nodes and a navy, teal, and gray color scheme." width="799" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How agent authorization and OBO execution work
&lt;/h3&gt;

&lt;p&gt;Teams typically either give agents their own identity with broad credentials or let them inherit the user's full access. Both approaches create an excessive blast radius. Any failure, whether from misconfiguration, hallucinated tool calls, or adversarial input, propagates across every connected system.&lt;/p&gt;

&lt;p&gt;The runtime must enforce the exact intersection of agent and user permissions for each action, evaluating at execution time both what the agent is allowed to do and what the user is allowed to do. Doing so requires managing the complete OAuth token lifecycle in isolation from the language model.&lt;/p&gt;

&lt;p&gt;Make sure the system supports pre- and post-call policy hooks to dynamically evaluate granular access requests at runtime.&lt;/p&gt;
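
&lt;p&gt;To make the permission intersection and the policy hooks concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Arcade's or Natoma's API: &lt;code&gt;ToolCall&lt;/code&gt;, &lt;code&gt;effective_scopes&lt;/code&gt;, &lt;code&gt;execute&lt;/code&gt;, the scope strings, and both example hooks are hypothetical names chosen for this example.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


class PolicyDenied(Exception):
    """Raised when a tool call falls outside the permitted scope."""


@dataclass(frozen=True)
class ToolCall:
    # Hypothetical shape of a tool request; a real runtime carries more context.
    user_id: str
    agent_id: str
    tool: str
    required_scopes: FrozenSet[str]


def effective_scopes(agent_scopes: Iterable[str], user_scopes: Iterable[str]) -> FrozenSet[str]:
    """Return only the scopes granted to BOTH the agent and the user."""
    return frozenset(agent_scopes).intersection(user_scopes)


def execute(call, agent_scopes, user_scopes, run_tool, pre_hooks=(), post_hooks=()):
    # 1. Enforce the permission intersection at execution time.
    missing = call.required_scopes - effective_scopes(agent_scopes, user_scopes)
    if missing:
        raise PolicyDenied(f"{call.tool}: missing scopes {sorted(missing)}")
    # 2. Pre-call hooks may veto the call by raising PolicyDenied.
    for hook in pre_hooks:
        hook(call)
    # 3. Run the tool. In a real runtime, token handling would live here,
    #    outside the language model's context.
    result = run_tool(call)
    # 4. Post-call hooks may redact or replace the result before it is returned.
    for hook in post_hooks:
        result = hook(call, result)
    return result


def deny_deletes(call: ToolCall) -> None:
    """Example pre-call hook: block any tool whose name ends in '.delete'."""
    if call.tool.endswith(".delete"):
        raise PolicyDenied(f"{call.tool}: deletes are blocked by policy")


def redact_email(call: ToolCall, result: dict) -> dict:
    """Example post-call hook: mask the 'email' field in a result dict."""
    return {key: ("[redacted]" if key == "email" else value) for key, value in result.items()}
```

&lt;p&gt;In this sketch, an agent holding &lt;code&gt;crm.read&lt;/code&gt; and &lt;code&gt;crm.write&lt;/code&gt; that acts for a user holding only &lt;code&gt;crm.read&lt;/code&gt; can run a read tool but is denied a write tool, because the write scope is missing from the intersection.&lt;/p&gt;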

&lt;h3&gt;
  
  
  How to evaluate agent-optimized tool reliability
&lt;/h3&gt;

&lt;p&gt;Most MCP servers wrap APIs designed for structured inputs, such as &lt;code&gt;recipient_user_id&lt;/code&gt; or &lt;code&gt;file_id&lt;/code&gt;, not for natural language like "send this to Finance." The mismatch arises because tool schemas are written for machine consumers rather than language models. Verbose schemas bloat the context window, and mismatched parameter names can cause the model to hallucinate values.&lt;/p&gt;

&lt;p&gt;Evaluate whether the runtime provides curated tools optimized for natural-language intent rather than rigid machine interfaces. The runtime execution layer must also support intelligent retries, automatic schema validation, and automated failover capabilities.&lt;/p&gt;
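
&lt;p&gt;As a rough illustration of what schema validation and retries mean at the execution layer, the following Python sketch validates arguments before a call and retries transient failures with exponential backoff. The names &lt;code&gt;validate_args&lt;/code&gt;, &lt;code&gt;call_with_retries&lt;/code&gt;, and &lt;code&gt;TransientToolError&lt;/code&gt; are hypothetical, the schema format is deliberately simplified, and failover is left out.&lt;/p&gt;

```python
import time


class TransientToolError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def validate_args(schema, args):
    """Return a list of problems; schema maps each parameter name to its expected type."""
    problems = []
    for name, expected in schema.items():
        if name not in args:
            problems.append(f"missing parameter: {name}")
        elif not isinstance(args[name], expected):
            problems.append(
                f"{name} should be {expected.__name__}, got {type(args[name]).__name__}"
            )
    for name in args:
        if name not in schema:
            problems.append(f"unexpected parameter: {name}")
    return problems


def call_with_retries(tool, args, schema, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Validate args, then call the tool, retrying transient errors with backoff."""
    problems = validate_args(schema, args)
    if problems:
        # Reject before calling the tool so that bad values never reach the API.
        raise ValueError("; ".join(problems))
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(**args)
        except TransientToolError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** (attempt - 1))
```

&lt;p&gt;Rejecting invalid arguments before the call gives the model a specific error to correct, rather than letting a guessed &lt;code&gt;recipient_user_id&lt;/code&gt; reach the downstream API.&lt;/p&gt;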

&lt;h3&gt;
  
  
  What agent lifecycle governance should include
&lt;/h3&gt;

&lt;p&gt;Every tool execution requires immutable, &lt;a href="https://opentelemetry.io/docs/concepts/semantic-conventions/" rel="noopener noreferrer"&gt;OpenTelemetry-compatible audit logs&lt;/a&gt; that trace each agent action to a specific user and connected service.&lt;/p&gt;
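
&lt;p&gt;One common way to make an audit trail tamper-evident is hash chaining, where each record includes the hash of the record before it. The Python sketch below shows that technique with the standard library only. It is an assumption for illustration, not the OpenTelemetry API and not a description of how any vendor in this article stores logs; the field names are hypothetical.&lt;/p&gt;

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body):
    """Hash a record body deterministically (sorted keys, UTF-8)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(log, timestamp, user_id, agent_id, service, tool, outcome):
    """Append one audit record that is chained to the previous record's hash."""
    body = {
        "timestamp": timestamp,
        "user_id": user_id,
        "agent_id": agent_id,
        "service": service,
        "tool": tool,
        "outcome": outcome,
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    record = dict(body, hash=_digest(body))
    log.append(record)
    return record


def verify_chain(log):
    """Return True only if no record was altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for record in log:
        body = {key: value for key, value in record.items() if key != "hash"}
        if body["prev_hash"] != prev_hash or record["hash"] != _digest(body):
            return False
        prev_hash = record["hash"]
    return True
```

&lt;p&gt;Editing any earlier record changes its hash, which breaks the link to every record after it, so &lt;code&gt;verify_chain&lt;/code&gt; detects the change.&lt;/p&gt;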

&lt;p&gt;The runtime must enforce visibility filtering so that agents discover only the specific, approved tools permitted by the active human user session. It should also provide version control for safe upgrades and a shared registry with team-level access controls to prevent tool sprawl across projects.&lt;/p&gt;
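
&lt;p&gt;Visibility filtering can be pictured as a lookup against the shared registry before the agent ever sees a tool list. The Python sketch below assumes a simple team-based access model; &lt;code&gt;ToolEntry&lt;/code&gt;, &lt;code&gt;Session&lt;/code&gt;, and &lt;code&gt;visible_tools&lt;/code&gt; are hypothetical names, and a real registry would likely key on richer attributes.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ToolEntry:
    # One registry entry: a pinned tool version plus the teams allowed to use it.
    name: str
    version: str
    allowed_teams: FrozenSet[str]


@dataclass(frozen=True)
class Session:
    # The active human user session the agent is acting for.
    user_id: str
    teams: FrozenSet[str]


def visible_tools(registry: List[ToolEntry], session: Session) -> List[str]:
    """Return 'name@version' for each tool that shares a team with the session."""
    return sorted(
        f"{entry.name}@{entry.version}"
        for entry in registry
        if entry.allowed_teams.intersection(session.teams)
    )
```

&lt;p&gt;Because the agent receives only the filtered list, a tool the user is not approved for is never presented to the agent in the first place, and the pinned version makes upgrades an explicit registry change.&lt;/p&gt;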

&lt;h3&gt;
  
  
  How to assess deployment flexibility and vendor independence
&lt;/h3&gt;

&lt;p&gt;Enterprise architecture demands deployment versatility. Can the runtime operate as a vendor-neutral layer that runs across any major cloud provider? Can you self-host it on a private network or securely deploy it in air-gapped environments?&lt;/p&gt;

&lt;p&gt;Systems tied to a broader data warehouse or cloud provider ecosystem will dictate your downstream infrastructure choices and limit cross-platform integrations.&lt;/p&gt;

&lt;h2&gt;
  
  
  In-depth reviews of the best Natoma alternatives
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Alternative 1: Arcade (independent action runtime)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Best for
&lt;/h4&gt;

&lt;p&gt;Enterprise engineering teams needing a complete, vendor-neutral action runtime for multi-user production agents. Security-conscious organizations requiring per-user delegated authorization and air-gapped deployments.&lt;/p&gt;

&lt;h4&gt;
  
  
  Overview
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade.dev is an independent action runtime&lt;/a&gt; that unifies agent authorization, agent-optimized tools, and continuous lifecycle governance into a single execution layer.&lt;/p&gt;

&lt;p&gt;While standalone gateways or specialized registries often focus primarily on routing traffic, Arcade handles the direct, parallelized execution of a catalog of over 8,000 agent-optimized MCP tools. It enforces access controls at the intersection of agent and user permissions, ensuring secure downstream actions.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key differentiators vs. Natoma
&lt;/h4&gt;

&lt;p&gt;Arcade is a full actions runtime, not only a routing gateway. It directly executes and manages the runtime reliability of the tools, whereas Natoma routes requests to your existing deployed servers.&lt;/p&gt;

&lt;p&gt;It maintains platform independence through cloud-agnostic, flexible deployment models and provides an extensive, curated catalog of agent-optimized tools built for language-model intent. This often reduces parameter-hallucination issues found in standard interface wrappers.&lt;/p&gt;

&lt;p&gt;Arcade co-authored the MCP auth specification alongside Microsoft and Okta/Auth0, and authored the URL Elicitation specification with Anthropic. This standards-level involvement shapes how the protocol itself handles identity and consent.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pros (what you gain)
&lt;/h4&gt;

&lt;p&gt;You get a centralized control plane for authorization, reliable tool execution, and continuous governance without stitching together multiple fragmented point solutions.&lt;/p&gt;

&lt;p&gt;Arcade enforces a permission-intersection model in which every action is authorized at the strict intersection of the agent's permissions and the specific human user's permissions. This two-identity approach isolates credentials from the language model, preventing privilege escalation.&lt;/p&gt;

&lt;p&gt;The runtime acquires credentials only when an action is required, requesting minimum OAuth permissions scoped to that specific tool. For irreversible actions, out-of-band approvals enforce a mandatory human approval step. You also get detailed, &lt;a href="https://docs.arcade.dev/en/guides/audit-logs" rel="noopener noreferrer"&gt;OpenTelemetry-compatible audit logging&lt;/a&gt; for every agent action executed across the runtime. Arcade holds SOC 2 Type II certification, with coverage that extends from the underlying cloud infrastructure through to every tool call an agent executes.&lt;/p&gt;

&lt;h4&gt;
  
  
  Cons (what you give up)
&lt;/h4&gt;

&lt;p&gt;You give up the likely future advantage of Natoma being built into Snowflake Intelligence, Cortex, and Snowflake-native governance workflows. Snowflake governance policies can still be applied to workflows running through Arcade, but not natively by default. You also lose the administrative convenience of bundled procurement and unified billing if your company already purchases significant Snowflake infrastructure.&lt;/p&gt;

&lt;h4&gt;
  
  
  Deployment and flexibility
&lt;/h4&gt;

&lt;p&gt;Arcade provides maximum environmental adaptability. It &lt;a href="https://docs.arcade.dev/en/guides/deployment-hosting" rel="noopener noreferrer"&gt;supports cloud deployments, self-hosted deployments within your own virtual private cloud, and air-gapped environments&lt;/a&gt; designed for regulated industries. Arcade is also agnostic to models, agent frameworks, and clients, so your team can use any combination of LLM providers and orchestration tools without runtime constraints. Post-acquisition, Natoma will likely become more opinionated toward Snowflake-supported models and tooling.&lt;/p&gt;

&lt;p&gt;Arcade brokers authorization protocols with your existing identity providers, including Okta and Microsoft Entra, enforcing existing policies rather than requiring duplication.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alternative 2: WorkOS (enterprise identity and SSO)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Best for
&lt;/h4&gt;

&lt;p&gt;SaaS application developers whose primary roadblock is managing human user identity synchronization rather than handling agent-specific tool execution.&lt;/p&gt;

&lt;h4&gt;
  
  
  Overview
&lt;/h4&gt;

&lt;p&gt;WorkOS is a developer platform with APIs designed to make applications enterprise-ready. It offers AuthKit, single sign-on, automated directory synchronization, and standard role-based access control.&lt;/p&gt;

&lt;p&gt;It is a foundational identity building block, not a full AI agent platform.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key differentiators vs. Natoma
&lt;/h4&gt;

&lt;p&gt;WorkOS maintains a pure focus on identity, providing robust infrastructure for human identity management.&lt;/p&gt;

&lt;p&gt;It delivers a great developer experience through comprehensive documentation, software development kits, and integrated drop-in interface components that accelerate time-to-market for standard authentication flows.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pros (what you gain)
&lt;/h4&gt;

&lt;p&gt;You get the fastest available path to implementing enterprise single sign-on and automated directory synchronization. WorkOS provides an off-the-shelf administrative portal that empowers enterprise buyers to manage their own user provisioning.&lt;/p&gt;

&lt;h4&gt;
  
  
  Cons (what you give up)
&lt;/h4&gt;

&lt;p&gt;WorkOS has no native understanding of AI agents, the Model Context Protocol (MCP), or tool-calling security primitives.&lt;/p&gt;

&lt;p&gt;Your engineering team must build the agent authorization layer from scratch, mapping WorkOS identities to individual agent scope boundaries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alternative 3: AWS AgentCore (AWS-native agent platform)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Best for
&lt;/h4&gt;

&lt;p&gt;Large enterprises committed to Amazon Web Services as their exclusive cloud provider, seeking to build native AI agents directly within Amazon Bedrock.&lt;/p&gt;

&lt;h4&gt;
  
  
  Overview
&lt;/h4&gt;

&lt;p&gt;AgentCore is the dedicated agent platform layer within Amazon Bedrock. It connects foundation models to enterprise systems while enforcing access policies and tracing agent workflows.&lt;/p&gt;

&lt;p&gt;It delivers a secure, scalable environment backed by existing Amazon identity and access management infrastructure and automated reasoning primitives.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key differentiators vs. Natoma
&lt;/h4&gt;

&lt;p&gt;AgentCore offers cloud-native integration with deep, structural ties to AWS-native serverless functions, isolated virtual private clouds, and existing identity infrastructure.&lt;/p&gt;

&lt;p&gt;It also includes built-in evaluations, providing robust native tooling for experimenting with and evaluating agent behavior under high-volume production traffic.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pros (what you gain)
&lt;/h4&gt;

&lt;p&gt;You achieve strong compliance and security inheritance if your critical workloads already operate within the Amazon ecosystem.&lt;/p&gt;

&lt;p&gt;AgentCore provides secure connectivity to other AWS-hosted services, including storage buckets, relational databases, and internal private APIs, without routing sensitive traffic over the public internet.&lt;/p&gt;

&lt;h4&gt;
  
  
  Cons (what you give up)
&lt;/h4&gt;

&lt;p&gt;You sacrifice vendor neutrality. AgentCore locks your agent architecture into the Amazon ecosystem and Bedrock execution paradigms.&lt;/p&gt;

&lt;p&gt;This architecture is difficult to deploy across multi-cloud environments or hybrid on-premises setups outside the prescribed footprint, and it imposes a heavy engineering burden to manage the separate services.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alternative 4: Merge (unified API for data sync)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Best for
&lt;/h4&gt;

&lt;p&gt;Engineering teams building products requiring standardized data synchronization across common software categories rather than executing multi-step agent operations.&lt;/p&gt;

&lt;h4&gt;
  
  
  Overview
&lt;/h4&gt;

&lt;p&gt;Merge is a unified API for normalized business data, providing a single integration point for hundreds of third-party tools. It's the most narrowly scoped option in this list, but the right fit for data-heavy use cases. Their &lt;a href="https://merge.dev/blog/agent-handler" rel="noopener noreferrer"&gt;agent handler product&lt;/a&gt; allows large language models to query and push structured data through these normalized interfaces.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key differentiators vs. Natoma
&lt;/h4&gt;

&lt;p&gt;Merge excels at data normalization, translating disparate external interfaces into a unified data schema. It focuses on aggregating standard integration layers rather than managing protocol-level execution governance for custom-deployed servers.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pros (what you gain)
&lt;/h4&gt;

&lt;p&gt;You get access to hundreds of standard external platforms without having to work through each platform's API documentation. Merge also automatically handles authentication for end-user application integrations.&lt;/p&gt;

&lt;h4&gt;
  
  
  Cons (what you give up)
&lt;/h4&gt;

&lt;p&gt;Merge offers less granular control over per-user delegated execution policies, which are required for enterprise protocol governance.&lt;/p&gt;

&lt;p&gt;Its integrations optimize for bulk data synchronization rather than natural-language intent, increasing the risk of token bloat during complex reasoning loops.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Choosing the right Natoma alternative after the acquisition
&lt;/h2&gt;

&lt;p&gt;The Snowflake acquisition of Natoma pushes engineering leaders to evaluate whether their agent infrastructure solves authorization, tool reliability, and governance together, while maintaining the deployment flexibility their architecture demands.&lt;/p&gt;

&lt;p&gt;The best alternative depends on your architectural philosophy and whether you stay tethered to a Snowflake-native gateway, piece together governance tools, or adopt an independent runtime. These choices are not mutually exclusive. A two-layer approach keeps data-proximate agents operating natively inside Snowflake for internally governed analytics while deploying an external, vendor-neutral runtime such as Arcade to handle cross-cloud tool execution.&lt;/p&gt;

&lt;p&gt;Prioritize platforms that solve agent authorization, agent-optimized tool reliability, and lifecycle governance simultaneously. Addressing only one or two of these pillars will create gaps that slow your production rollout.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.arcade.dev/contact" rel="noopener noreferrer"&gt;Book a demo with the Arcade.dev team today&lt;/a&gt; to see the permission intersection model execute in a live production environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What changed for Natoma users after the Snowflake acquisition?
&lt;/h3&gt;

&lt;p&gt;Natoma's roadmap will likely align more tightly with Snowflake's ecosystem, which can reduce cross-cloud portability. Teams should reassess whether they want Snowflake-native governance or an independent runtime for multi-cloud agent execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Natoma still a good choice after the acquisition?
&lt;/h3&gt;

&lt;p&gt;Yes, if your agents primarily run in Snowflake and you want governance tightly coupled to Snowflake RBAC and Cortex workflows. If you need multi-cloud execution or non-Snowflake toolchains, an independent layer may be a better fit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why choose Arcade as a Natoma alternative?
&lt;/h3&gt;

&lt;p&gt;Arcade is a vendor-neutral action runtime that combines per-user delegated authorization, a catalog of over 8,000 agent-optimized tools, and lifecycle governance in a single layer. It supports cloud, VPC, and air-gapped deployments, and is agnostic to models, frameworks, and clients. For teams that need cross-cloud portability and production-grade agent infrastructure without ecosystem lock-in, Arcade covers authorization, execution, and audit without requiring additional point solutions.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between an MCP gateway and an action runtime?
&lt;/h3&gt;

&lt;p&gt;A gateway routes requests and enforces access policies for tool calls. A runtime also executes the tools themselves: in addition to enforcing policy and recording an audit trail, it handles reliability (retries, failover, validation), delegated auth flows, and telemetry during execution. For production multi-user deployments, a runtime is architecturally superior because it owns the full execution lifecycle rather than just the routing layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should I choose an independent Natoma alternative instead of a Snowflake-native option?
&lt;/h3&gt;

&lt;p&gt;Choose an independent option when you need multi-cloud portability, want to avoid data-cloud lock-in, or must support VPC, on-prem, and air-gapped deployments. An independent option also fits better when agents need to call many external SaaS tools outside Snowflake.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is per-user delegated authorization and why does it matter for agents?
&lt;/h3&gt;

&lt;p&gt;Per-user delegated authorization means each tool action is authorized using the intersection of the end user's permissions and the agent's allowed scope. This approach reduces the blast radius compared with shared service accounts and improves auditability for enterprise security reviews.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which alternative is best if I already have an agent execution stack and only need governance?
&lt;/h3&gt;

&lt;p&gt;A governance overlay fits best. Focus on registry, threat detection, and audit controls layered on top of your existing runtime rather than replacing execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which option is best if my company is all-in on AWS?
&lt;/h3&gt;

&lt;p&gt;If your agents run on Bedrock and you rely on AWS IAM and native AWS networking controls, an AWS-native agent platform is the most straightforward choice. Keep in mind that the trade-off is reduced multi-cloud portability.&lt;/p&gt;

&lt;h3&gt;
  
  
  What should I look for in an enterprise action runtime evaluation?
&lt;/h3&gt;

&lt;p&gt;Prioritize per-user delegated authorization, agent-optimized tool reliability, centralized audit and governance, and deployment flexibility (cloud, VPC, and air-gapped). These criteria directly determine whether your agent infrastructure can scale securely across users, tools, and environments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>security</category>
      <category>identity</category>
    </item>
    <item>
      <title>AI agent governance and runtime compliance framework for CISOs</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Tue, 09 Jun 2026 20:50:33 +0000</pubDate>
      <link>https://dev.to/arcade/ai-agent-governance-compliance-5841</link>
      <guid>https://dev.to/arcade/ai-agent-governance-compliance-5841</guid>
      <description>&lt;p&gt;AI agents are now in production across healthcare, financial services, and critical SaaS systems. They mutate data, trigger workflows, and call external APIs on behalf of real users. These are autonomous actors, not the read-only recommendation engines that security teams already know how to govern. The business is shipping them, and saying no won't pause that. The CISO question is no longer whether to allow agents into production. It's how to say yes safely, fast enough that security isn't the reason the business can't ship.&lt;/p&gt;

&lt;p&gt;The honest answer is that traditional enterprise security models don't survive contact with this workload. Governance-as-logging assembles evidence after the breach. Governance-as-spreadsheet drifts the moment code ships. Governance-as-policy-PDF answers an auditor's question about intent, not the runtime question of what an agent actually did at 03:14 on a Tuesday. None of these are governance. They are documentation. And building bespoke security infrastructure to close the gap is the same mistake in engineering form: months of plumbing while the actual governance gap stays open.&lt;/p&gt;

&lt;p&gt;Governance is a runtime contract enforced at the exact millisecond of every tool call, paired with an immutable audit trail that an auditor can replay end-to-end. Every agent action must be attributable, policy-governed, immutably audit-replayable, and revocable across user, agent, tenant, and task. Enforced at runtime, provable after the fact.&lt;/p&gt;

&lt;p&gt;What follows is a CISO-grade rubric organized around the four concerns CISOs surface in the field, with the six runtime capabilities that address them. Hand it to your AI/ML team. Hand it to the security architects and IAM leads pulled into the project. Both audiences should be able to verify against it before any agent reaches production.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;A CISO-grade rubric for AI agent governance organizes around the four concerns every CISO surfaces in the field. The six capabilities that address them sit beneath each:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Identity and attribution: the service account problem.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Capability 1: Agent and tool registry under version control&lt;/li&gt;
&lt;li&gt;Capability 2: Delegated agent authorization with scoped, just-in-time credentials&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Active prevention at the tool call: saying yes safely.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Capability 3: Centralized policy enforcement at runtime&lt;/li&gt;
&lt;li&gt;Capability 4: Action-layer guardrails (parameter validation, rate limits, output filtering, prompt injection interception, step-up authorization for high-impact actions)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Observability your SIEM can use: after the fact.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Capability 5: Immutable, replayable audit trail&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Continuous audit-readiness: the verification rubric itself.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Capability 6: Compliance attestation at the agent and action layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No patchwork of SIEMs, policy engines, GRC platforms, identity providers, or MCP gateways answers all four concerns end-to-end. A unified MCP runtime does.&lt;/p&gt;




&lt;h2&gt;
  
  
  Mapping AI agent governance to NIST, ISO/IEC 42001, and the EU AI Act
&lt;/h2&gt;

&lt;p&gt;The policy layer of enterprise AI governance is defined by a small set of converging international frameworks: ISO/IEC 42001, the NIST AI Risk Management Framework, ISO/IEC 42005, the EU AI Act, and the CSA/NIST Agentic Profile, which extends them for autonomous systems. Technical controls need to anchor here.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.iso.org/standard/42001" rel="noopener noreferrer"&gt;ISO/IEC 42001&lt;/a&gt; sets the foundational requirements for an enterprise AI Management System. It demands continuous system monitoring, strict event logging, and traceable data provenance.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.nist.gov/itl/ai-risk-management-framework" rel="noopener noreferrer"&gt;NIST AI Risk Management Framework&lt;/a&gt;, extended by the GenAI Profile, provides a risk operating model that emphasizes continuous testing, evaluation, verification, and validation throughout the agent lifecycle.&lt;/p&gt;

&lt;p&gt;ISO/IEC 42005 builds on these by mandating rigorous AI system impact assessments. You'll need documented, immutable evidence of risk treatments and architectural safeguards.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://artificialintelligenceact.eu/the-act/" rel="noopener noreferrer"&gt;EU AI Act&lt;/a&gt; is what's actually driving urgency. Its phased timeline doesn't just turn best practices into legally binding requirements. It turns the gap between policy and runtime into a legal liability that the security team is responsible for.&lt;/p&gt;

&lt;p&gt;Prohibited practices and AI literacy obligations became applicable on February 2, 2025. Strict obligations for General Purpose AI providers took effect on August 2, 2025, requiring detailed technical documentation and systemic risk monitoring. By August 2, 2026, the broad applicability phase requires that every high-risk AI system implement automatic, immutable logging and strict human oversight.&lt;/p&gt;

&lt;p&gt;These are not deadlines on the security team's roadmap. They are legal exposure that attaches whenever an agent in a regulated workload acts without governance evidence the runtime can produce on demand. "We're planning to address this" is not a defensible position when an auditor asks why the agent that processed yesterday's PHI access can't be replayed.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://labs.cloudsecurityalliance.org/agentic/agentic-nist-ai-rmf-profile-v1/" rel="noopener noreferrer"&gt;Cloud Security Alliance and NIST Agentic Profile&lt;/a&gt; bridges the gap between broad regulatory mandates and technical implementation. This profile explicitly extends the NIST AI RMF to address threats specific to autonomous systems.&lt;/p&gt;

&lt;p&gt;It introduces autonomy-tier classification, tool-use risk modeling, and continuous delegation-chain monitoring, giving you the vocabulary to assess multi-agent interactions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Standards define what. A runtime defines how.
&lt;/h3&gt;

&lt;p&gt;These standards are rigorous about the policy layer. They are silent about execution. NIST AI RMF tells you to monitor; it doesn't intercept a prompt injection. ISO/IEC 42001 tells you to log; it doesn't block an undesired API call. The EU AI Act requires human oversight; it doesn't mandate cryptographic approval for a specific tool-call payload.&lt;/p&gt;

&lt;p&gt;Closing the gap between legal requirement and technical reality is a runtime problem, not a documentation problem. The control point is the action layer (the moment an agent tries to call a tool), not the infrastructure boundary, the network perimeter, or a policy document. Runtime enforcement is what turns the standards into active security controls at the place the action actually happens.&lt;/p&gt;




&lt;h2&gt;
  
  
  Classifying AI agent autonomy tiers (1–4)
&lt;/h2&gt;

&lt;p&gt;Governance strictness has to scale with agent autonomy. A structured classification gives you the vocabulary to do that.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://labs.cloudsecurityalliance.org/agentic/agentic-nist-ai-rmf-profile-v1/" rel="noopener noreferrer"&gt;CSA / NIST AI RMF Agentic Profile&lt;/a&gt; defines a four-tier classification aligned with the operational characteristics that drive governance requirements. Data sensitivity, action reversibility, and potential legal or customer impact should dictate the maximum acceptable tier for any workload.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Agent autonomy-tier classification (CSA / NIST Agentic Profile)&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Autonomy tier&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Governance requirement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tier 1: fully supervised&lt;/td&gt;
&lt;td&gt;Agent generates outputs that require human approval before any action is taken.&lt;/td&gt;
&lt;td&gt;Governance structures equivalent to non-agentic generative AI.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tier 2: constrained autonomy&lt;/td&gt;
&lt;td&gt;Agent executes pre-approved action types within a predefined scope. Actions outside that scope require human escalation.&lt;/td&gt;
&lt;td&gt;Formal action scope documentation, approval authority delegation policies, defined escalation triggers, action-consequence mapping.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tier 3: broad autonomy within boundaries&lt;/td&gt;
&lt;td&gt;Agent operates with broad autonomy within a defined operational boundary. Bounded by hard constraints on resource access, action scope, and time horizon, and subject to continuous monitoring.&lt;/td&gt;
&lt;td&gt;Continuous behavioral monitoring, defined response playbooks, real-time agent registries integrated with IAM.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tier 4: full autonomy within constrained environment&lt;/td&gt;
&lt;td&gt;Agent operates at full autonomy within a constrained environment, capable of spawning sub-agents, acquiring new tool capabilities, and executing long-horizon plans with minimal human interaction.&lt;/td&gt;
&lt;td&gt;All Tier 3 requirements plus formal oversight board review at defined intervals.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Tier 1 is the starting point for high-stakes workflows where you can't easily reverse an action; every output is gated by human approval before it executes. Tier 2 is the practical default for most current enterprise deployments. Agents act autonomously within pre-approved scopes and escalate anything outside them. Tier 3 introduces broad autonomy within a defined operational boundary and is appropriate where continuous monitoring and well-bounded behavioral envelopes are in place. Tier 4 introduces sub-agent orchestration and long-horizon planning. It requires formal oversight board review and is rarely appropriate outside controlled research environments or specialized workloads.&lt;/p&gt;
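
&lt;p&gt;One way to make the tier table machine-checkable is to encode it as data. The sketch below is illustrative only: the enum names and the &lt;code&gt;may_deploy&lt;/code&gt; helper are hypothetical and not part of the CSA / NIST profile, and the requirement strings are paraphrased from the table above.&lt;/p&gt;

```python
from enum import IntEnum


class AutonomyTier(IntEnum):
    FULLY_SUPERVISED = 1
    CONSTRAINED = 2
    BROAD_WITHIN_BOUNDARIES = 3
    FULL_WITHIN_CONSTRAINED_ENV = 4


# Governance requirements paraphrased from the table above.
TIER_REQUIREMENTS = {
    AutonomyTier.FULLY_SUPERVISED: [
        "governance equivalent to non-agentic generative AI",
    ],
    AutonomyTier.CONSTRAINED: [
        "formal action scope documentation",
        "approval authority delegation policies",
        "defined escalation triggers",
        "action-consequence mapping",
    ],
    AutonomyTier.BROAD_WITHIN_BOUNDARIES: [
        "continuous behavioral monitoring",
        "defined response playbooks",
        "real-time agent registry integrated with IAM",
    ],
}
# Tier 4 inherits every Tier 3 requirement and adds oversight board review.
TIER_REQUIREMENTS[AutonomyTier.FULL_WITHIN_CONSTRAINED_ENV] = (
    TIER_REQUIREMENTS[AutonomyTier.BROAD_WITHIN_BOUNDARIES]
    + ["formal oversight board review at defined intervals"]
)


def may_deploy(requested: AutonomyTier, max_allowed: AutonomyTier) -> bool:
    """Allow a deployment only if its tier does not exceed the workload's cap."""
    return max_allowed >= requested
```

&lt;p&gt;How &lt;code&gt;max_allowed&lt;/code&gt; is derived from data sensitivity, reversibility, and impact is left to your own risk assessment; the sketch only enforces the cap once you have chosen it.&lt;/p&gt;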




&lt;h2&gt;
  
  
  Action-layer risk taxonomy for AI agents (tool calls, identity, and delegation)
&lt;/h2&gt;

&lt;p&gt;Relying on generic vulnerability lists, such as the &lt;a href="https://owasp.org/www-project-top-10-for-large-language-model-applications" rel="noopener noreferrer"&gt;OWASP Top 10&lt;/a&gt;, isn't enough to secure autonomous systems.&lt;/p&gt;

&lt;p&gt;Prompt injections and training data poisoning are real concerns. But when you deploy agents, focus on the action layer. When an AI system can mutate data, trigger workflows, and interact with external APIs, the threat model changes.&lt;/p&gt;

&lt;p&gt;Once an agent can act, the threat model collapses into a set of operational questions: &lt;em&gt;which tool, with which parameters, on whose behalf, under which policy version, with what approval, and for how long that authority holds.&lt;/em&gt; The last one, time-bounded authority, is the dimension most often missed. A token issued for a session must expire when the session ends, not linger for hours or days as a residual credential that outlives the workflow that produced it. Just-in-time issuance and tight TTLs are part of the threat model, not part of the infrastructure details.&lt;/p&gt;
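
&lt;p&gt;Time-bounded authority can be sketched in a few lines. The example below is a minimal illustration, not any vendor's API: &lt;code&gt;ScopedToken&lt;/code&gt; and &lt;code&gt;issue_token&lt;/code&gt; are hypothetical names, and the token is bound to one user, one tool, and an explicit expiry.&lt;/p&gt;

```python
import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ScopedToken:
    user_id: str
    tool: str
    expires_at: float  # seconds since the epoch
    value: str = field(default_factory=lambda: secrets.token_urlsafe(32))

    def is_valid(self, tool: str, now: float) -> bool:
        """Valid only for the tool it was issued for, and only before expiry."""
        return tool == self.tool and self.expires_at > now


def issue_token(user_id: str, tool: str, ttl_seconds: float, now: float) -> ScopedToken:
    """Issue a just-in-time token that expires ttl_seconds after `now`."""
    return ScopedToken(user_id=user_id, tool=tool, expires_at=now + ttl_seconds)
```

&lt;p&gt;Passing &lt;code&gt;now&lt;/code&gt; explicitly keeps the check deterministic and testable; a real runtime would read a trusted clock and would also end the token when the session ends, not only when the TTL lapses.&lt;/p&gt;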

&lt;p&gt;An action-layer risk taxonomy maps specific agentic threats directly to architectural mitigations in the runtime, moving security teams from theoretical vulnerabilities to deterministic system design.&lt;/p&gt;

&lt;h3&gt;
  
  
  Threat-to-mitigation mapping for tool calls
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Action-layer threat-to-mitigation mapping&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Threat vector&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Primary mitigation (from 6-capability framework)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tool-call hijacking&lt;/td&gt;
&lt;td&gt;Malicious input manipulates the agent into calling a tool with malicious or manipulated parameters.&lt;/td&gt;
&lt;td&gt;Capability 4: action-layer guardrails (parameter validation)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delegated prompt injection&lt;/td&gt;
&lt;td&gt;An agent is compromised by malicious data it retrieves from an external source, leading to undesired actions within its authorized scope.&lt;/td&gt;
&lt;td&gt;Capability 2: delegated agent authorization (scoped credentials limit blast radius)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credential exfiltration&lt;/td&gt;
&lt;td&gt;An agent with overly broad permissions leaks or misuses sensitive credentials to which it has access.&lt;/td&gt;
&lt;td&gt;Capability 2: delegated agent authorization (per-agent identity, rapid revocation)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shadow tool execution&lt;/td&gt;
&lt;td&gt;Developers connect unauthorized tools or external APIs to an agent without centralized security oversight.&lt;/td&gt;
&lt;td&gt;Capability 1: agent and tool registry under version control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unattributable automation&lt;/td&gt;
&lt;td&gt;An agent executes a destructive action, but security teams cannot definitively prove which user or policy authorized it.&lt;/td&gt;
&lt;td&gt;Capability 5: immutable, replayable audit trail&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context window poisoning&lt;/td&gt;
&lt;td&gt;Sensitive information reaches the agent's context when it shouldn't: secrets or PII in tool outputs, retrieved data, or memory shared across user sessions.&lt;/td&gt;
&lt;td&gt;Capability 4: action-layer guardrails (output filtering and redaction)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Categorizing risks by tool invocation and identity lets security leaders build active defenses that intercept malicious intent before it reaches the resource server.&lt;/p&gt;

&lt;p&gt;This taxonomy shows that defending an agentic system requires structural controls at the exact moment a tool is called. Legacy approaches that rely solely on model-level alignment or generic network firewalls don't cut it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Service accounts: the universal failure mode
&lt;/h2&gt;

&lt;p&gt;Every CISO has been burned by shared service accounts. They break attribution. They block revocation. They're how a single misconfigured credential ends up holding access to half the data lake six months after the engineer who provisioned it left the company. Every agent project that ships on a service account repeats that mistake faster.&lt;/p&gt;

&lt;p&gt;The failure modes are predictable. Give the agent its own identity with broad permissions, and any user behind that agent (including an intern) can bypass their own access controls. Lock those permissions down to be safe, and the agent can't do anything useful, which is how most agent projects stall before reaching production. Let the agent inherit the user's full permissions instead, and one prompt injection cascades through every system that user can touch. Three patterns, all breaking least privilege the moment an agent acts on behalf of more than one person.&lt;/p&gt;

&lt;p&gt;The fix is not better service account hygiene. It's per-agent identity tied to the requesting user, scoped to the specific tool and action, acquired just-in-time, and revocable in isolation when that user is offboarded or compromised. This is the foundation every other governance capability rests on. The rest of the rubric assumes you've fixed this problem first.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 6-capability rubric for AI agent governance at runtime
&lt;/h2&gt;

&lt;p&gt;Neutralizing those risks requires an architecture that controls the full lifecycle of an agent action. Fragmented observability tools don't get you there. You need a unified &lt;a href="https://modelcontextprotocol.io/docs/getting-started/intro" rel="noopener noreferrer"&gt;MCP&lt;/a&gt; runtime that addresses all four CISO concerns through six specific capabilities.&lt;/p&gt;

&lt;p&gt;This is the rubric a CISO hands to their AI/ML team to verify before any agent is deployed to production. Skipping any capability creates a gap that an auditor or attacker will find. &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade.dev&lt;/a&gt; is the reference implementation.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The 6-capability rubric, organized by CISO concern&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;CISO concern&lt;/th&gt;
&lt;th&gt;Capabilities&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Identity and attribution:&lt;/strong&gt; the service account problem&lt;/td&gt;
&lt;td&gt;1. Agent and tool registry under version control&lt;br&gt;2. Delegated agent authorization with scoped, just-in-time credentials&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Active prevention at the tool call:&lt;/strong&gt; saying yes safely&lt;/td&gt;
&lt;td&gt;3. Centralized policy enforcement at runtime&lt;br&gt;4. Action-layer guardrails (parameter validation, rate limits, output filtering, prompt injection interception, step-up authorization for high-impact actions)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Observability your SIEM can use:&lt;/strong&gt; after the fact&lt;/td&gt;
&lt;td&gt;5. Immutable, replayable audit trail&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Continuous audit-readiness:&lt;/strong&gt; the verification rubric itself&lt;/td&gt;
&lt;td&gt;6. Compliance attestation at the agent and tool plane&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Identity and attribution: the service account problem
&lt;/h3&gt;

&lt;p&gt;If service accounts are the universal failure mode, this concern is the resolution. The registry establishes which agents and tools exist; delegated authorization with scoped credentials binds every action to the specific user, tool, and scope that authorized it.&lt;/p&gt;

&lt;h4&gt;
  
  
  Capability 1: Agent and tool registry under version control
&lt;/h4&gt;

&lt;p&gt;A centralized agent and tool registry under strict version control is the foundation of any governance stack. The registry ensures agents can only discover and invoke vetted, approved tools, preventing shadow servers and duplicated effort across teams.&lt;/p&gt;

&lt;p&gt;Every agent, every tool, and every connected MCP server should appear in the registry with their owners, purposes, model versions, autonomy tiers, and approved user populations. If you can't produce this list on demand, your governance posture is already drifting.&lt;/p&gt;
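
&lt;p&gt;As a rough sketch of what a registry entry might hold, the example below records the fields listed above and refuses any tool that is not registered. &lt;code&gt;RegistryEntry&lt;/code&gt; and &lt;code&gt;ToolRegistry&lt;/code&gt; are hypothetical names for illustration; a production registry would live under version control instead of in memory.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    name: str
    owner: str
    purpose: str
    model_version: str
    autonomy_tier: int
    approved_users: frozenset


class ToolRegistry:
    def __init__(self) -> None:
        self._entries = {}

    def register(self, entry: RegistryEntry) -> None:
        self._entries[entry.name] = entry

    def authorize(self, name: str, user_id: str) -> bool:
        """Unregistered tools are denied by default (no shadow tools)."""
        entry = self._entries.get(name)
        return entry is not None and user_id in entry.approved_users

    def inventory(self) -> list:
        """The on-demand list an auditor would ask for, sorted by name."""
        return sorted(self._entries.values(), key=lambda e: e.name)
```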

&lt;h4&gt;
  
  
  Capability 2: Delegated agent authorization with scoped, just-in-time credentials
&lt;/h4&gt;

&lt;p&gt;Treating the agent as a distinct security principal is the architectural commitment that resolves the service account problem. That means per-agent identity, scoped credentials acquired just-in-time, and a credential scope bound to a specific user, tool, and action context. Identity answers who the agent is acting as. It does not, on its own, decide whether any particular request is safe to execute. That decision is the job of policy and enforcement, which come next.&lt;/p&gt;

&lt;p&gt;Static API keys and shared service accounts break attribution and force you into all-or-nothing access decisions. Per-user, per-tool, just-in-time scoped tokens preserve least privilege without bottlenecking the agent. They also let security teams isolate and end a compromised agent's access immediately, without impacting other users or the broader system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Active prevention at the tool call: saying yes safely
&lt;/h3&gt;

&lt;p&gt;Identity is necessary but not sufficient. The CISO needs assurance that even a correctly identified action will be blocked if it falls outside policy or requires human authorization. This concern is the active defense layer between the agent's intent and the resource server.&lt;/p&gt;

&lt;h4&gt;
  
  
  Capability 3: Centralized policy enforcement at runtime
&lt;/h4&gt;

&lt;p&gt;Identity says what the agent's credentials permit. Policy says what the organization permits. Different decisions, same tool call.&lt;/p&gt;

&lt;p&gt;An agent might have valid credentials to call the trade API (Cap 2) but still be blocked by a policy that requires human approval for trades over $10K, denies trades for restricted instruments, or restricts production changes outside business hours. Centralized policy-as-code, evaluated at every tool call, keeps these business rules consistent across teams.&lt;/p&gt;

&lt;p&gt;Each decision records the exact policy version that authorized it. Without strict version pinning, you get silent compliance breaks when authorization rules are modified or rolled back. An auditor investigating an action three months after the fact must be able to replay the exact decision matrix that authorized it.&lt;/p&gt;
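
&lt;p&gt;The trade example can be sketched as policy-as-code. Everything named here is an assumption for illustration: the version label, the restricted-instrument list, and the &lt;code&gt;evaluate_trade&lt;/code&gt; function are hypothetical, and the business-hours rule is omitted for brevity. The point is that every decision carries the policy version that produced it.&lt;/p&gt;

```python
from dataclasses import dataclass

POLICY_VERSION = "trade-policy@v7"            # hypothetical version label
APPROVAL_THRESHOLD_USD = 10_000               # from the $10K example above
RESTRICTED_INSTRUMENTS = frozenset({"XYZ"})   # hypothetical restricted list


@dataclass(frozen=True)
class Decision:
    outcome: str          # "allow", "deny", or "require_approval"
    reason: str
    policy_version: str   # pinned so the decision can be replayed later


def evaluate_trade(instrument: str, amount_usd: float) -> Decision:
    """Evaluate one trade tool call against the organization's rules."""
    if instrument in RESTRICTED_INSTRUMENTS:
        return Decision("deny", "restricted instrument", POLICY_VERSION)
    if amount_usd > APPROVAL_THRESHOLD_USD:
        return Decision("require_approval", "amount over threshold", POLICY_VERSION)
    return Decision("allow", "within pre-approved scope", POLICY_VERSION)
```

&lt;p&gt;In practice these rules would more likely live in a dedicated policy engine; the sketch only shows the shape of a versioned decision.&lt;/p&gt;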

&lt;h4&gt;
  
  
  Capability 4: Action-layer guardrails
&lt;/h4&gt;

&lt;p&gt;Identity tells you who the agent is. Policy tells you whether the action is allowed. Enforcement is what actually intercepts the request and either blocks it, modifies it, escalates it to a human, or otherwise transforms it. This is the layer that catches what identity and policy don't.&lt;/p&gt;

&lt;p&gt;Pre-tool-call enforcement validates parameters and applies rate limits before the request reaches the resource server. Post-tool-call enforcement filters and redacts outputs before they re-enter the agent's context window. This is where threats like tool-call hijacking and context window poisoning are caught at the exact moment a tool is called.&lt;/p&gt;
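
&lt;p&gt;A minimal sketch of the two enforcement points might look like the following. The channel allowlist, the length limit, and the secret pattern are all hypothetical placeholders; real guardrails would be driven by your own tool schemas and DLP rules, and rate limiting is left out to keep the example short.&lt;/p&gt;

```python
import re

ALLOWED_CHANNELS = frozenset({"#support", "#eng"})   # hypothetical allowlist
MAX_TEXT_LENGTH = 2000                               # hypothetical limit
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{16,}")  # hypothetical key format


def validate_params(params: dict) -> list:
    """Pre-tool-call check: return a list of violations (empty means pass)."""
    violations = []
    if params.get("channel") not in ALLOWED_CHANNELS:
        violations.append("channel not on allowlist")
    if len(params.get("text", "")) > MAX_TEXT_LENGTH:
        violations.append("text too long")
    return violations


def redact_output(text: str) -> str:
    """Post-tool-call filter: mask secret-shaped strings before they reach the model."""
    return SECRET_PATTERN.sub("[REDACTED]", text)
```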

&lt;p&gt;For irreversible or high-impact actions, enforcement should escalate the request out of band for human approval. The list of actions that trigger step-up authorization includes sending external email, modifying production data, executing code, transferring money, changing permissions, deleting records, and any decision affecting employment, credit, health, or legal status. Approval thresholds scale with the agent's autonomy tier, with stricter requirements at Tier 2 and above.&lt;/p&gt;

&lt;p&gt;Relying on an agent to request permission in a chat interface is unsafe, because prompt injections can bypass in-band checks. Process approvals out of band using a standard such as JSON Web Signature (JWS), cryptographically binding the human approval to the specific tool call's hash and context. That gives you cryptographic evidence that a human authorized the exact payload the agent intends to send.&lt;/p&gt;
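
&lt;p&gt;To illustrate the binding, the sketch below hashes a canonical form of the tool call and wraps the approval in a compact JWS using only the Python standard library. It assumes HS256 with a shared key purely for brevity; a production system would more likely use an asymmetric algorithm and a vetted JOSE library. The claim names (&lt;code&gt;approver&lt;/code&gt;, &lt;code&gt;call_hash&lt;/code&gt;) and function names are hypothetical.&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url without padding, as used by JWS compact serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def payload_hash(tool: str, params: dict) -> str:
    """SHA-256 over a canonical JSON form of the exact tool call."""
    canonical = json.dumps({"tool": tool, "params": params},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sign_approval(approver: str, call_hash: str, key: bytes) -> str:
    """Produce header.payload.signature, signed with HMAC-SHA256."""
    header = _b64url(json.dumps({"alg": "HS256"}, separators=(",", ":")).encode("utf-8"))
    claims = {"approver": approver, "call_hash": call_hash}
    body = _b64url(json.dumps(claims, sort_keys=True, separators=(",", ":")).encode("utf-8"))
    signature = hmac.new(key, f"{header}.{body}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url(signature)}"


def verify_approval(token: str, tool: str, params: dict, key: bytes) -> bool:
    """Accept only if the signature is valid and the hash matches this call."""
    try:
        header, body, signature = token.split(".")
    except ValueError:
        return False
    expected = hmac.new(key, f"{header}.{body}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64url(expected), signature):
        return False
    padded = body + "=" * (-len(body) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return hmac.compare_digest(claims["call_hash"], payload_hash(tool, params))
```

&lt;p&gt;Because the approval covers the hash of the call and not a free-text description, changing any parameter after approval makes verification fail.&lt;/p&gt;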

&lt;h3&gt;
  
  
  Observability your SIEM can use: after the fact
&lt;/h3&gt;

&lt;p&gt;Even with prevention in place, incidents happen. The CISO comes in after the fact and needs an audit trail that their existing SIEM can query and replay, plus a detection layer that surfaces drift before it becomes the next incident.&lt;/p&gt;

&lt;h4&gt;
  
  
  Capability 5: Immutable, replayable audit trail
&lt;/h4&gt;

&lt;p&gt;Governance requires tamper-proof, replayable evidence. You need immutable audit logs that support full replay of any agent interaction.&lt;/p&gt;

&lt;p&gt;The minimum for attribution is five fields: Agent ID, User ID, Tool Call, Target System, and Timestamp. With those, you can prove who triggered which action, against which system, and when. Full replay (which an auditor will ask for) requires the runtime to capture nine more: Tenant, Task, Prompt Hash, Retrieved-Context References, Model Version, Policy Version, Decision, Approval, and Output Hash. All of it must be stored immutably.&lt;/p&gt;
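
&lt;p&gt;One common way to make a log tamper-evident is a hash chain, where each record embeds the hash of the one before it. The sketch below applies that technique to the five attribution fields. It is an assumption about how immutability could be approximated in application code, not a description of any specific product; the &lt;code&gt;AuditLog&lt;/code&gt; class is hypothetical, and real deployments typically add write-once storage on top.&lt;/p&gt;

```python
import hashlib
import json


class AuditLog:
    """Append-only log; each record embeds the hash of the one before it."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._records = []

    @staticmethod
    def _digest(record: dict) -> str:
        # Hash every field except the record's own hash.
        body = {k: v for k, v in record.items() if k != "record_hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

    def append(self, agent_id, user_id, tool_call, target_system, timestamp) -> dict:
        prev = self._records[-1]["record_hash"] if self._records else self.GENESIS
        record = {
            "agent_id": agent_id,
            "user_id": user_id,
            "tool_call": tool_call,
            "target_system": target_system,
            "timestamp": timestamp,
            "prev_hash": prev,
        }
        record["record_hash"] = self._digest(record)
        self._records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered record breaks it."""
        prev = self.GENESIS
        for record in self._records:
            if record["prev_hash"] != prev or record["record_hash"] != self._digest(record):
                return False
            prev = record["record_hash"]
        return True
```

&lt;p&gt;The full-replay fields (tenant, prompt hash, policy version, and so on) would be added to the record in the same way.&lt;/p&gt;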

&lt;p&gt;This stream should follow the &lt;a href="https://opentelemetry.io/docs/specs/semconv/gen-ai" rel="noopener noreferrer"&gt;OpenTelemetry GenAI semantic conventions&lt;/a&gt; for export to an enterprise SIEM. Include key attributes, such as the operation name and requested model, to ensure interoperability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Continuous audit-readiness: the verification rubric itself
&lt;/h3&gt;

&lt;p&gt;The first three concerns address what the runtime does. This one addresses how you continuously prove it, without manual evidence assembly at audit time.&lt;/p&gt;

&lt;h4&gt;
  
  
  Capability 6: Compliance attestation at the agent and tool plane
&lt;/h4&gt;

&lt;p&gt;Compliance attestation becomes native to the runtime. Because every action is authenticated, evaluated, and immutably logged, the system continuously generates the exact evidence required for SOC 2 Type II attestation at the agent and tool plane.&lt;/p&gt;

&lt;p&gt;The same audit stream maps to ISO/IEC 42001 management-system controls, NIST AI RMF risk functions, ISO/IEC 42005 impact assessments, EU AI Act jurisdictional obligations, OWASP LLM Top 10 risk categories, and CSA Agentic Profile autonomy classification.&lt;/p&gt;

&lt;p&gt;Governance requires an integrated runtime. Security treated as an afterthought in observability won't survive a regulator's first replay request.&lt;/p&gt;




&lt;h2&gt;
  
  
  Observability and SIEM integration
&lt;/h2&gt;

&lt;p&gt;A runtime governance layer doesn't sit parallel to your security stack. It extends the SIEM, IAM, and DLP investments you've already made. The "I already have too many tools" objection is the right one for a CISO to lead with. The answer is that the runtime is not another tool. It's a layer that extends the ones you have into the place where agents actually act, and enforces your policies there. An extension, not a parallel stack.&lt;/p&gt;

&lt;p&gt;Every agent action emits a structured event that follows the &lt;a href="https://opentelemetry.io/docs/specs/semconv/gen-ai" rel="noopener noreferrer"&gt;OpenTelemetry GenAI semantic conventions&lt;/a&gt;. Your security operations team queries these events in the same SIEM they already use (Datadog, Splunk, New Relic, Sumo Logic), using the same query syntax and dashboards. Identity flows from the same IdP that handles human login. Sensitive-payload detection is built on the same DLP that classifies your file shares. Nothing parallel; everything already familiar to the security team.&lt;/p&gt;

&lt;p&gt;That distinction matters at audit time. Auditors don't ask for a binder of policies and screenshots. They query the audit log for the exact action, time window, or policy version they are interested in. A runtime that emits OpenTelemetry GenAI events lets your security operations team answer that query in the tools they already use to query everything else.&lt;/p&gt;

&lt;p&gt;It also closes a compliance gap most programs hit at audit time. SOC 2 Type II or HIPAA attestation on the underlying cloud doesn't extend to the agent or tool plane unless the runtime layer is explicitly in scope. The agent plane is where the action actually happens: every tool call, every credential resolution, every policy decision. Compliance evidence has to follow the action, not stop at the infrastructure boundary. A governance runtime that ships SOC 2 Type II coverage at the agent and tool plane closes that gap directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  10 AI agent governance anti-patterns that break runtime compliance
&lt;/h2&gt;

&lt;p&gt;The right architecture matters, but knowing what breaks it matters as much. These ten traps are the patterns that render compliance efforts useless at audit time or during incident response.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. AI governance by spreadsheet
&lt;/h3&gt;

&lt;p&gt;Treating compliance as static documentation rather than a dynamic byproduct of runtime execution guarantees your security posture will drift from reality the moment code is deployed. Security controls must be expressed as code and enforced automatically, not verified manually through periodic spreadsheet updates.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Mutable application logs
&lt;/h3&gt;

&lt;p&gt;Storing audit trails in standard, editable relational databases exposes you to massive compliance risks. Regulators and auditors demand immutable, replayable ledgers that prove an audit trail hasn't been tampered with to hide a rogue agent's actions or a developer's mistake.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Identity collapse and shadow MCP servers
&lt;/h3&gt;

&lt;p&gt;Using a single shared API key for all users interacting with an agent breaks attribution. When a breach occurs, you can't tell which user triggered the action. Without centralized identity, shadow servers proliferate without oversight, creating invisible, ungoverned attack surfaces.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Credentials exposed to the agent
&lt;/h3&gt;

&lt;p&gt;Storing API keys in the agent's system prompt, passing OAuth tokens as parameters the LLM can see, or letting credentials touch the agent's context window at any point creates a leak vector that no audit log can fix. A prompt injection that exfiltrates the credential is no different from one that exfiltrates user data. Credentials must be brokered by the runtime, scoped to the specific tool call, and never enter the agent's context.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. In-band chat approvals
&lt;/h3&gt;

&lt;p&gt;Delivering human-in-the-loop approval prompts within the agent's own chat interface creates a critical vulnerability. Adversarial prompt injections can forge these interfaces or trick the model into bypassing approval logic, authorizing destructive actions without genuine user consent.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Agent-side policy enforcement
&lt;/h3&gt;

&lt;p&gt;Trusting the agent to enforce its own policy is like trusting a process to enforce its own permissions. LLMs can be prompt-injected to override their own guardrails, are non-deterministic about when they apply them, and produce no audit trail of what they decided or why. Policy enforcement must sit outside the agent, deterministic and auditable. The agent calls; the runtime decides.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Decentralized policy enforcement
&lt;/h3&gt;

&lt;p&gt;When policy is enforced in multiple places (the agent's system prompt, ad-hoc rules in tool wrappers, separate policy engines per team), there's no single source of truth. Each enforcement point drifts. Each runs its own version. Auditors can't replay decisions consistently, because no one can prove which policy authorized an action three months ago. Centralized, version-pinned policy enforcement at runtime is the only way to keep agent behavior consistent across teams and to make it replayable across audits.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. No autonomy-tier classification
&lt;/h3&gt;

&lt;p&gt;Treating every agent identically, regardless of risk, forces overinvestment in low-stakes workloads while underprotecting high-impact ones. Without a clear tier classification mapped to data sensitivity and action reversibility, governance strictness can't scale with operational risk. Your security posture stays uniform when it should be proportional.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. No revoke-and-rotate workflow
&lt;/h3&gt;

&lt;p&gt;When an employee is offboarded or a credential is suspected to be compromised, you need to instantly rotate that user's tokens and revoke their delegated agent access without disrupting the rest of the user base. Architectures built on shared service accounts can't selectively revoke a single user's access, which leaves security teams with an all-or-nothing choice: break the agent for everyone or leave the risky access in place.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. Compliance attestation that covers the cloud but not the agent
&lt;/h3&gt;

&lt;p&gt;A SOC 2 report or HIPAA attestation on your cloud provider does not cover your agent fleet. Many programs only discover this gap mid-audit, when the auditor asks for evidence of agent actions, policy decisions, and approvals, and the answer is "our cloud is certified," which doesn't address the question. The agent plane needs its own attestation scope, or it will remain uncovered no matter how many infrastructure-layer reports you produce.&lt;/p&gt;




&lt;h2&gt;
  
  
  Implementation patterns for AI agent governance in regulated industries
&lt;/h2&gt;

&lt;p&gt;Those are the failure modes. The flip side is what running the framework actually looks like in production, across highly regulated environments with distinct autonomy requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Healthcare (HIPAA): Tier 1 approval and PHI logging
&lt;/h3&gt;

&lt;p&gt;Consider a Tier 1 clinical note-summarization agent deployed under HIPAA. The contractual layer (Business Associate Agreements between covered entities and processors) sits outside the runtime, but the obligations it creates live within it: strict data boundaries, demonstrable PHI access controls, and an audit trail that proves who accessed what, when, and under whose authority.&lt;/p&gt;

&lt;p&gt;Before any summarized note is committed back to an electronic health record system, the framework requires human-in-the-loop approval. The runtime logs the physician's cryptographic signature alongside the exact prompt hash, the retrieved patient context, and the output hash. Every instance of access to Protected Health Information is immutably logged.&lt;/p&gt;

&lt;p&gt;This pattern exercises three CISO concerns simultaneously: identity and attribution (the physician is the security principal), active prevention at the tool call (no PHI write without signed approval), and observability that the SIEM can use (every access is replayable). The contract still has to be signed, but the evidence that its obligations are being met is generated automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Financial services: Tier 2 bounded trade execution
&lt;/h3&gt;

&lt;p&gt;Consider a Tier 2 bounded agent for compliant trade execution. The agent operates autonomously, but only within the pre-approved scope defined by policy-as-code.&lt;/p&gt;

&lt;p&gt;A trader might ask the agent to rebalance a portfolio based on specific market signals. When the agent attempts the tool call to the trading API, the runtime intercepts the request and evaluates it against trading limits and risk parameters. The system records the exact policy version used to make the decision. If an auditor or regulator questions a trade later, you can replay the exact decision matrix, from the initial user prompt hash through the policy evaluation to the final output hash.&lt;/p&gt;

&lt;p&gt;This pattern leans hardest on active prevention at the tool call (policy-as-code enforcement, version-pinned) and continuous audit-readiness (replayable decision matrices). Identity is the trader; attribution is the policy version. A regulator asking "why was this trade allowed?" has a deterministic answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cross-industry: enterprise offboarding (Tier 1–2)
&lt;/h3&gt;

&lt;p&gt;For large enterprises and public sector deployments running Tier 1 to Tier 2 agents, strict data residency and identity lifecycle management matter most.&lt;/p&gt;

&lt;p&gt;Consider an employee undergoing an unexpected HR offboarding event. In a traditional setup with shared API keys, revoking access without breaking the agent for other users is nearly impossible. With an integrated runtime treating the agent as a distinct security principal tied to the user's identity, the offboarding event automatically triggers token rotation. The system revokes that specific user's delegated agent access instantly, ending in-flight operations and neutralizing the credential risk. No disruption to the rest of the organization.&lt;/p&gt;
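
&lt;p&gt;The selective revocation described above depends on tokens being keyed per user and per tool. A minimal sketch, assuming an in-memory store and a hypothetical &lt;code&gt;TokenBroker&lt;/code&gt; class, shows why one user's offboarding leaves everyone else untouched. Terminating in-flight operations is out of scope for this example.&lt;/p&gt;

```python
class TokenBroker:
    """Holds delegated tokens keyed by (user_id, tool); never a shared key."""

    def __init__(self) -> None:
        self._tokens = {}

    def grant(self, user_id: str, tool: str, token: str) -> None:
        self._tokens[(user_id, tool)] = token

    def lookup(self, user_id: str, tool: str):
        """Return the token, or None if the user has no access to the tool."""
        return self._tokens.get((user_id, tool))

    def revoke_user(self, user_id: str) -> int:
        """Drop every token held on behalf of one user; return how many."""
        doomed = [key for key in self._tokens if key[0] == user_id]
        for key in doomed:
            del self._tokens[key]
        return len(doomed)
```

&lt;p&gt;With a shared service account there is only one entry to delete, which is exactly the all-or-nothing situation the per-user keying avoids.&lt;/p&gt;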

&lt;p&gt;This pattern is the clearest demonstration of identity and attribution doing their jobs. The service account problem is the failure mode that this prevents; per-agent identity tied to the requesting user is the resolution. If your runtime can't pass an offboarding fire drill in under a minute, the rest of the rubric doesn't matter.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where each vendor category fits the agent governance rubric
&lt;/h2&gt;

&lt;p&gt;The AI security market is highly fragmented. Enterprise architects end up stitching together disparate tools that were never designed for autonomous agents. The vendor landscape sorts into categories that Arcade integrates with, displaces at the agent action layer, or treats as out of scope. The difference matters.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Vendor categories and their relationship to Arcade&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Architectural layer&lt;/th&gt;
&lt;th&gt;Example vendors&lt;/th&gt;
&lt;th&gt;Relationship to Arcade&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SIEM and observability platforms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Datadog, Splunk, New Relic, Sumo Logic&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Feed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Arcade exports OpenTelemetry GenAI events. Your security operations team queries them in the same SIEM they already use, with the same query syntax. The SIEM stays; the runtime sends it the agent-action layer it currently can't see.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Policy engines and FGA platforms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Open Policy Agent, Cedar, OpenFGA, Oso (Polar DSL), WorkOS FGA&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Complement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Define and evaluate fine-grained authorization rules. The runtime integrates with your policy engine and enforces those rules at the agent action layer, applying them in the per-user, per-tool, per-action context where agents actually act.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GRC platforms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Vanta, Drata, Secureframe&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Complement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Map theoretical controls and automate attestation paperwork. Don't govern the actual API tool calls an agent makes. Arcade integrates with your GRC platform and enforces those controls on every tool call. GRC declares; Arcade enforces.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Identity providers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Okta, Auth0, WorkOS, Clerk&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Complement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Authenticate the human user. Stop at the human login boundary. Arcade brokers delegated agent tokens against the same IdP and extends identity into the agent action plane.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP gateways and integration wrappers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Composio&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Displaces&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Connect language models to tools for rapid prototyping. Lacks enterprise-grade identity isolation, just-in-time consent, out-of-band approval routing, and immutable audit at the agent plane.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent frameworks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LangChain, Mastra&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Complement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Operate at the reasoning layer (deciding what the agent should do). Arcade governs the underlying action layer, decoupled from the framework. Pick a framework for reasoning and a runtime for action. They combine.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP runtimes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade.dev&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;The unifying layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ships the complete 6-capability rubric natively. SOC 2 Type II coverage extends from the underlying cloud through to every tool call an agent makes.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Relying on a patchwork of these vendor classes leaves significant security gaps and integration liabilities. A unified MCP runtime brings agent registration, per-user authorization, policy-as-code enforcement, immutable audit, and runtime attestation under a single, cohesive operating model. It extends the SIEM, IdP, and DLP investments your security team already runs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Next steps to migrate to an MCP runtime for agent governance
&lt;/h2&gt;

&lt;p&gt;Closing the gap between governance policy and runtime enforcement is a concrete engineering exercise. Four moves get you most of the way:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit your current agent and tool registry.&lt;/strong&gt; Inventory every agent, every connected tool, every shadow MCP server, and every shared service account. If you can't produce an authoritative list with owner, purpose, model version, autonomy tier, and approved users in under a day, your governance posture is already drifting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stop building bespoke audit infrastructure.&lt;/strong&gt; Custom event-bus schemas, mutable application logs masquerading as audit trails, hand-rolled OpenTelemetry pipelines for agent traces. This is undifferentiated technical debt. Your engineers should ship governed agents, not maintain compliance plumbing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test revoke-and-rotate aggressively.&lt;/strong&gt; Run an offboarding fire drill with a real test user. Verify that a single offboarding event rotates that user's tokens, terminates in-flight agent operations on their behalf, and leaves every other user's workflow undisturbed. If the workflow can't do this in under a minute, your runtime can't survive a real credential incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evaluate an MCP runtime.&lt;/strong&gt; Look for a runtime that ships the six governance capabilities natively, with SOC 2 Type II attestation that covers the agent and tool plane, not just the underlying cloud.&lt;/p&gt;

&lt;p&gt;Stitching together passive observability tools and standalone policy engines can't satisfy this requirement. &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade.dev&lt;/a&gt; is the first MCP runtime to ship the complete agent governance rubric natively (runtime enforcement plus immutable, replayable audit), with SOC 2 Type II that extends from the underlying cloud to every tool call an agent makes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently asked questions (FAQ)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is AI agent governance, and how is it different from LLM governance?
&lt;/h3&gt;

&lt;p&gt;AI agent governance controls and proves what an agent can &lt;em&gt;do&lt;/em&gt;, especially tool/API calls, using runtime policy enforcement, identity, and immutable audit trails. LLM governance often focuses on model behavior and outputs rather than execution-layer actions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why isn't logging enough for AI agent compliance?
&lt;/h3&gt;

&lt;p&gt;Logs are passive and written after the fact. They can't stop an undesired tool call or prompt-injection-driven action. Regulated environments require deterministic, pre-execution enforcement plus tamper-proof, replayable evidence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does an MCP runtime replace our SIEM?
&lt;/h3&gt;

&lt;p&gt;No. It extends it. Every agent action emits an OpenTelemetry GenAI event into the same SIEM your security operations team already queries (Datadog, Splunk, New Relic, Sumo Logic), so agent activity shows up there rather than in a parallel system. The same model applies to your IdP (Okta, Entra, etc.) and DLP: the runtime extends those investments rather than running parallel to them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does this lock us into a specific agent framework?
&lt;/h3&gt;

&lt;p&gt;No. A runtime governance layer is decoupled from the agent framework. LangChain, Mastra, your own in-house framework: whichever your AI/ML team picks for agent reasoning, the runtime governs the action layer underneath it the same way. Frameworks decide what the agent should do; the runtime governs whether and how it gets to do it.&lt;/p&gt;

&lt;h3&gt;
  
  
  What does "runtime enforcement at the tool-call boundary" mean?
&lt;/h3&gt;

&lt;p&gt;Every tool/API request is intercepted, evaluated against policy-as-code, and either blocked, modified (e.g., redacted), or allowed before it reaches the resource server. Then it's logged with the decision and policy version.&lt;/p&gt;
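&lt;p&gt;The intercept, evaluate, decide, and log sequence can be sketched in a few lines of Python. This is a minimal illustration, not Arcade's implementation: the &lt;code&gt;POLICY&lt;/code&gt; table, the &lt;code&gt;enforce&lt;/code&gt; function, the version label, and the in-memory &lt;code&gt;AUDIT_LOG&lt;/code&gt; are all hypothetical names standing in for a real policy engine and an append-only audit sink.&lt;/p&gt;

```python
import json
import time

POLICY_VERSION = "2026-06-01.1"  # hypothetical version label

# Hypothetical policy table: (tool, action) maps to a decision.
POLICY = {
    ("crm", "read_contact"): "allow",
    ("crm", "export_contacts"): "redact",
    ("payroll", "update_salary"): "block",
}

AUDIT_LOG = []  # stand-in for an append-only audit sink


def redact(args):
    """Return a copy of args with sensitive fields masked."""
    sensitive = {"email", "ssn"}
    return {k: ("[REDACTED]" if k in sensitive else v) for k, v in args.items()}


def enforce(user_id, tool, action, args, execute):
    """Evaluate policy before the call reaches the resource server, then log."""
    decision = POLICY.get((tool, action), "block")  # unknown calls are denied
    result = None
    if decision == "redact":
        args = redact(args)
    if decision != "block":
        result = execute(tool, action, args)
    # Record the decision together with the policy version that produced it.
    AUDIT_LOG.append(json.dumps({
        "user_id": user_id,
        "tool": tool,
        "action": action,
        "decision": decision,
        "policy_version": POLICY_VERSION,
        "timestamp": time.time(),
    }))
    return decision, result
```

&lt;p&gt;The point of the shape is that &lt;code&gt;execute&lt;/code&gt; is only reachable through &lt;code&gt;enforce&lt;/code&gt;, so a blocked call never touches the downstream system and every outcome, including a denial, leaves a log entry.&lt;/p&gt;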

&lt;h3&gt;
  
  
  How do I choose the right autonomy tier for an agent?
&lt;/h3&gt;

&lt;p&gt;Classify autonomy by data sensitivity, reversibility of actions, and potential legal/customer impact. Use Tier 1 (fully supervised, human approval per action) for high-stakes irreversible workloads. Tier 2 (constrained autonomy within pre-approved scope) is the default for the enterprise.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the minimum audit log fields required for agent governance?
&lt;/h3&gt;

&lt;p&gt;See the canonical schema in Capability 5 above: at minimum, Agent ID, User ID, Tool Call, System, and Timestamp, all stored immutably. Richer fields (prompt hash, retrieved-context references, policy version, output hash) enable full replay.&lt;/p&gt;
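&lt;p&gt;As a rough illustration, the minimum and richer fields could be modeled as a single record type. The class name, field names, and sample values below are assumptions made for this sketch, not the canonical schema itself, and a frozen dataclass only prevents in-memory mutation; immutable storage still has to come from the audit sink.&lt;/p&gt;

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional
import hashlib
import json


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class AuditRecord:
    # Minimum fields named in the text
    agent_id: str
    user_id: str
    tool_call: str
    system: str
    timestamp: str
    # Richer fields that enable replay (optional in this sketch)
    prompt_hash: Optional[str] = None
    context_refs: tuple = ()
    policy_version: Optional[str] = None
    output_hash: Optional[str] = None


def sha256_hex(text):
    """Hash text so the record can reference it without storing it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


record = AuditRecord(
    agent_id="agent-42",                      # hypothetical identifiers
    user_id="user-7",
    tool_call="update_opportunity",
    system="salesforce",
    timestamp=datetime.now(timezone.utc).isoformat(),
    prompt_hash=sha256_hex("summarize the call and update the opp"),
    policy_version="2026-06-01.1",
)

# One JSON line per action, ready to ship to an append-only store.
line = json.dumps(asdict(record), sort_keys=True)
```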

&lt;h3&gt;
  
  
  What is "policy version pinning" and why does it matter?
&lt;/h3&gt;

&lt;p&gt;Policy version pinning records the exact policy version that authorized a specific action at that time. It prevents "silent compliance breaks" when policies change and enables auditors to accurately replay historical decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why are in-chat human approvals unsafe for agents?
&lt;/h3&gt;

&lt;p&gt;In-band chat approvals can be spoofed or bypassed via prompt injection. Use out-of-band approvals (e.g., signed approvals bound to the tool-call hash) to cryptographically prove a human authorized the exact payload.&lt;/p&gt;
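&lt;p&gt;One way to picture an approval bound to the tool-call hash is the sketch below. It assumes a hypothetical approval service that holds &lt;code&gt;APPROVER_KEY&lt;/code&gt; and uses an HMAC for brevity; because an HMAC is symmetric, anything that can verify can also sign, so a production design would more likely use asymmetric signatures. All function names here are illustrative.&lt;/p&gt;

```python
import hashlib
import hmac
import json

# Hypothetical key held by the approval service, never by the agent.
APPROVER_KEY = b"demo-only-secret"


def tool_call_hash(tool, args):
    """Hash a canonical encoding of the exact payload to be executed."""
    canonical = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sign_approval(call_hash, approver_id):
    """Run by the out-of-band approval service after a human approves."""
    message = (approver_id + ":" + call_hash).encode("utf-8")
    return hmac.new(APPROVER_KEY, message, hashlib.sha256).hexdigest()


def verify_approval(tool, args, approver_id, signature):
    """Run just before execution: recompute the hash from the real payload."""
    expected = sign_approval(tool_call_hash(tool, args), approver_id)
    return hmac.compare_digest(expected, signature)
```

&lt;p&gt;Because the signature covers the payload hash, an approval granted for one set of arguments does not verify if the agent, or an injected prompt, later changes the amount, the recipient, or the tool.&lt;/p&gt;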

&lt;h3&gt;
  
  
  What does "agent-as-a-security-principal" mean?
&lt;/h3&gt;

&lt;p&gt;Each agent gets its own identity and scoped credentials tied to the requesting user and tenant. This enables least privilege, clear attribution, and rapid revocation without relying on shared API keys.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can we just use a service account per agent?
&lt;/h3&gt;

&lt;p&gt;No. Shared or pooled service accounts break attribution (you can't tell which user triggered an action), block selective revocation (you can't rotate one user's access without breaking everyone's), and force all-or-nothing access decisions. The category requires per-agent identity tied to the requesting user, scoped to specific tools and actions, acquired just-in-time, and revocable in isolation.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does this map to the EU AI Act, NIST AI RMF, and ISO 42001?
&lt;/h3&gt;

&lt;p&gt;Those frameworks require traceability, monitoring, risk controls, and oversight. The runtime governance stack implements them operationally via identity, policy-as-code, immutable logs, human-in-the-loop (HITL) approvals, and continuous assurance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Our cloud provider is SOC 2 Type II certified. Isn't that enough?
&lt;/h3&gt;

&lt;p&gt;No. Cloud attestation doesn't extend to the agent and tool plane unless the runtime layer is explicitly in scope. Auditors will ask for evidence of every agent action, every policy decision, and every approval. If your stack only attests at the infrastructure layer, the agent plane is unattested and inadmissible, regardless of your cloud provider's certificate.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the most common anti-patterns in agent governance?
&lt;/h3&gt;

&lt;p&gt;Ten patterns break runtime compliance: spreadsheet governance, mutable logs, identity collapse with shadow MCP servers, credentials exposed to the agent, in-band chat approvals, agent-side policy enforcement, decentralized policies, no autonomy-tier classification, no revoke-and-rotate workflow, and compliance attestation that covers the cloud but not the agent or tool plane. Each one breaks auditability or allows unauthorized actions to slip through.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>mcp</category>
      <category>security</category>
      <category>ai</category>
    </item>
    <item>
      <title>6 Signs Your In-House AI Agents Need an MCP Runtime</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Tue, 09 Jun 2026 20:44:55 +0000</pubDate>
      <link>https://dev.to/arcade/when-ai-agents-need-mcp-runtime-431p</link>
      <guid>https://dev.to/arcade/when-ai-agents-need-mcp-runtime-431p</guid>
      <description>&lt;p&gt;Someone on your revenue operations team got tired of nagging account executives about CRM hygiene. So they wired up an agent. Salesforce has an MCP server, the model can call tools, and the workflow is obvious: take the meeting transcript, pull out the next steps, update the opportunity, log the activity, push a follow-up task. An afternoon of work, one API token in a &lt;code&gt;.env&lt;/code&gt; file, and the thing runs.&lt;/p&gt;

&lt;p&gt;It works. AEs stop complaining. The demo gets passed around. Within a week, two other teams want the same thing for Zendesk and Jira, and you have quietly become the owner of production agentic AI inside the company.&lt;/p&gt;

&lt;p&gt;Then it stops being an afternoon project. Not because the agent got worse, but because the moment it acts on behalf of other people, every shortcut that made the prototype fast turns into a question you cannot answer with a &lt;code&gt;print()&lt;/code&gt; statement.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;You need an MCP runtime for your AI agents when auth, permissions, audit logs, integrations, reuse, or risk ownership start moving out of the prototype phase.&lt;/li&gt;
&lt;li&gt;MCP standardizes tool connections, but it does not, by itself, solve production governance.&lt;/li&gt;
&lt;li&gt;An MCP runtime centralizes identity, policy, tool execution, and evidence so that you do not need to rebuild those layers from scratch to deploy AI agents in production.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The wall is predictable, not a failure
&lt;/h2&gt;

&lt;p&gt;You built the right thing. The prototype-first path is the correct first move. Prototypes are cheap to assemble once a model can call tools and an &lt;a href="https://dev.to/blog/announcing-native-support-for-mcp-servers"&gt;MCP server&lt;/a&gt; can expose capabilities, and a small team can tolerate a narrow happy path. Every team now running agents at scale started exactly where you are, with one workflow, one tenant, light usage, and a forgiving risk posture.&lt;/p&gt;

&lt;p&gt;The first version works because it quietly cheats. The engineer who built it is the security boundary. They know which records the agent can touch, they wrote the prompt, and they hold the token. There is no question of "what should this agent be allowed to do," because the answer is "whatever I, the builder, can do." That assumption holds right up until the agent is doing things for people who are not you.&lt;/p&gt;

&lt;p&gt;Six signs separate a working prototype from something that needs real infrastructure. None is exotic. You will recognize your own repo in most of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sign 1: You're writing more auth and login plumbing than agent logic
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The identity layer: proving who is actually calling the tools.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You've crossed it when:&lt;/strong&gt; your &lt;code&gt;auth/&lt;/code&gt; directory is bigger than your &lt;code&gt;tools/&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;Look at your repository. The &lt;code&gt;tools/&lt;/code&gt; directory grew a sibling &lt;code&gt;auth/&lt;/code&gt; directory, and &lt;code&gt;auth/&lt;/code&gt; is now bigger. Standups have shifted from "how do we improve the agent" to "why did this user's refresh token fail" and "which account is the agent using." A new engineer's first ticket is "add Slack," and it takes two weeks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it happens
&lt;/h3&gt;

&lt;p&gt;Acting agents sit in the hard middle between a user and a downstream API, which forces you to own &lt;a href="https://dev.to/blog/ai-agent-authentication-authorization"&gt;multi-user AI agent authentication and authorization&lt;/a&gt; mechanics you used to get for free. Enterprise APIs also do not share an identity standard. &lt;a href="https://docs.slack.dev/authentication/using-token-rotation/" rel="noopener noreferrer"&gt;Slack rotates access tokens every 12 hours&lt;/a&gt;; &lt;a href="https://docs.github.com/en/apps/creating-github-apps/setting-up-a-github-app/best-practices-for-creating-a-github-app" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; expires installation tokens in an hour and refresh tokens in six months; &lt;a href="https://learn.microsoft.com/en-us/graph/permissions-overview" rel="noopener noreferrer"&gt;Microsoft Graph&lt;/a&gt; splits delegated from app-only access with its own consent model. Implementing one is a week. Implementing five, with refresh, rotation, revocation, and per-user storage, is a sustained quarter. Get the concurrency wrong and two threads refresh the same single-use token at once, the provider reads it as a replay attack, and the user is locked out.&lt;/p&gt;
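&lt;p&gt;The concurrency hazard is easier to see in code. The sketch below serializes refreshes with one lock per user so that a single-use refresh token is only ever spent once per expiry. It is a single-process illustration under stated assumptions: &lt;code&gt;get_access_token&lt;/code&gt;, the in-memory &lt;code&gt;_tokens&lt;/code&gt; cache, and the caller-supplied &lt;code&gt;refresh&lt;/code&gt; function are hypothetical, and a multi-worker deployment would need a distributed lock and encrypted storage instead.&lt;/p&gt;

```python
import threading
import time
from collections import defaultdict

_locks = defaultdict(threading.Lock)   # one lock per user
_locks_guard = threading.Lock()        # protects the dict of locks
_tokens = {}                           # user_id -> (access_token, expires_at)


def _lock_for(user_id):
    with _locks_guard:
        return _locks[user_id]


def get_access_token(user_id, refresh, now=time.time, skew=60):
    """Return a valid token, letting only one thread refresh at a time."""
    with _lock_for(user_id):
        cached = _tokens.get(user_id)
        if cached is not None and cached[1] - skew > now():
            return cached[0]  # still fresh, no refresh needed
        # Only one thread per user reaches this line at a time, so the
        # provider never sees the same refresh token presented twice.
        token, expires_at = refresh(user_id)
        _tokens[user_id] = (token, expires_at)
        return token
```

&lt;p&gt;Threads that were waiting on the lock find a fresh token in the cache once they acquire it and return without calling the provider.&lt;/p&gt;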

&lt;h3&gt;
  
  
  How it plays out
&lt;/h3&gt;

&lt;p&gt;Trace it through the Salesforce agent. The first version runs on one static admin token. Then the sales director wants updates recorded under the rep who was on the call, so you build per-user OAuth with a background worker to store, encrypt, and refresh tokens (Salesforce access tokens expire in two hours). Then security asks what happens when an AE leaves, so revocation has to tie into your IdP's deprovisioning. Each request is reasonable. Together, they are an IAM client you never set out to build.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; the prototype needed a login; the production agent needs an identity model, especially once enterprise teams expect &lt;a href="https://dev.to/blog/sso-for-ai-agents-authentication-and-authorization-guide"&gt;SSO for AI agents&lt;/a&gt; to work like the rest of their software stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sign 2: Your permissions are a growing pile of hand-maintained if-then rules
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The policy layer: deciding what they're allowed to do.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You've crossed it when:&lt;/strong&gt; nobody can say what the agent is allowed to do without reading the code.&lt;/p&gt;

&lt;p&gt;Sign 1 was authentication: proving who is asking. This is authorization, the harder half: deciding what they are allowed to do. In production, that usually means a &lt;a href="https://dev.to/blog/ai-agent-authentication-authorization"&gt;delegated authorization stack&lt;/a&gt; that evaluates the user, the agent, and the action together. It starts with one clean check (update the record only if the signed-in rep can edit it), and then the rules multiply. Update &lt;code&gt;Stage&lt;/code&gt;, but only with the "Pipeline Manager" permission set. Closed-won updates need manager approval. EMEA is exempt. SDRs can edit notes but not the advanced stages. Each is a defensible business need. Together, they are configuration hell, accreting in one file: &lt;code&gt;permissions.py&lt;/code&gt; fills with branches and comments like &lt;code&gt;# do NOT remove, breaks renewals team&lt;/code&gt;, and new permission requests take a sprint.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it happens
&lt;/h3&gt;

&lt;p&gt;The problem is structural: authorization depends on subject, object, and context, and inline conditionals collapse those dimensions into procedural mush. &lt;a href="https://csrc.nist.gov/pubs/sp/800/162/upd2/final" rel="noopener noreferrer"&gt;NIST's ABAC guidance&lt;/a&gt; exists for exactly this reason, and tools like &lt;a href="https://www.openpolicyagent.org/docs/latest/" rel="noopener noreferrer"&gt;Open Policy Agent&lt;/a&gt; externalize policy to keep it out of application code.&lt;/p&gt;
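&lt;p&gt;To make "externalize policy" concrete, here is a small sketch in plain Python that keeps the rules as data and evaluates them in one place. It is a stand-in for a real policy engine, not Open Policy Agent's API; the &lt;code&gt;Request&lt;/code&gt; type, the &lt;code&gt;RULES&lt;/code&gt; list, and the role names are assumptions chosen to mirror the CRM examples in this section.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    role: str                   # subject attribute, e.g. "ae" or "sdr"
    permission_sets: frozenset  # subject attribute
    field: str                  # object attribute being edited


# Hypothetical rules as data: reviewable without reading control flow.
RULES = [
    {"field": "Stage", "requires_permission_set": "Pipeline Manager"},
    {"field": "Notes", "allowed_roles": {"ae", "sdr"}},
]


def is_allowed(req):
    """Default deny: a request passes only if a rule for its field matches."""
    for rule in RULES:
        if rule["field"] != req.field:
            continue
        needed = rule.get("requires_permission_set")
        if needed and needed not in req.permission_sets:
            return False
        roles = rule.get("allowed_roles")
        if roles and req.role not in roles:
            return False
        return True
    return False  # no rule covers this field
```

&lt;p&gt;Adding a permission then means adding a row to &lt;code&gt;RULES&lt;/code&gt;, which can be versioned and reviewed on its own, rather than threading another branch through application code.&lt;/p&gt;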

&lt;h3&gt;
  
  
  How it plays out
&lt;/h3&gt;

&lt;p&gt;Salesforce is the cautionary tale. Its &lt;a href="https://developer.salesforce.com/docs/atlas.en-us.securityImplGuide.meta/securityImplGuide/security_data_sharing.htm" rel="noopener noreferrer"&gt;permission model&lt;/a&gt; (profiles, permission sets, sharing rules, field-level security, and more) is two decades of mature hand-maintained authorization. An agent re-implementing a slice of that in Python is starting the same journey with a fraction of the staff.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; at first, if-statements are the fastest way to encode context; later, they are an undocumented policy system, and a single wrong branch has blast radius across every tenant the agent touches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sign 3: You need agent audit logs for every tool call
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The evidence layer: reconstructing what actually happened.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You've crossed it when:&lt;/strong&gt; you can't reconstruct who did what after the fact.&lt;/p&gt;

&lt;p&gt;Suppose the permission rules are right. You still cannot prove they were followed. The clearest version of this sign is a Slack thread:&lt;/p&gt;

&lt;p&gt;"Hey, did the bot just close that opp?"&lt;br&gt;
"I think so?"&lt;br&gt;
"Can you check?"&lt;br&gt;
"The logs rolled over."&lt;/p&gt;

&lt;p&gt;That conversation is the finding. When something looks wrong, you need to answer fast: which run did it, who authorized it, what was the input, what changed downstream, and was there an approval? If you cannot, you do not have guardrails. You have an opinionated wrapper.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it happens
&lt;/h3&gt;

&lt;p&gt;An auditable action comprises at least five facets that must be recorded together: the requesting user, the agent identity, the authorization decision, the input, and the resulting change. Ad-hoc logging captures one or two, and they live in different systems. Salesforce &lt;a href="http://developer.salesforce.com/docs/atlas.en-us.field_history_retention.meta/field_history_retention/field_audit_trail.htm" rel="noopener noreferrer"&gt;Field History&lt;/a&gt; has the state change but not the reasoning; the LLM trace has the reasoning but not the change; nothing correlates them. Guardrails are point-in-time controls; audit trails are durable evidence, and acting systems need both. &lt;a href="https://docs.slack.dev/admins/audit-logs-api/" rel="noopener noreferrer"&gt;Slack's Audit Logs API&lt;/a&gt; gives you actor, action, entity, and context, but explicitly will not tell you whether the action was appropriate.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it plays out
&lt;/h3&gt;

&lt;p&gt;When finance flags a deal at quarter close because the amount was moved after the close date, you can see the new value but cannot determine who changed it, on whose behalf, or based on what input. And the moment the agent mutates regulated data, the question stops being internal: &lt;a href="https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-164/subpart-C/section-164.312" rel="noopener noreferrer"&gt;HIPAA&lt;/a&gt; at 45 CFR §164.312(b) requires systems handling ePHI to record and examine activity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; "we have the LLM transcript" is not an answer an auditor accepts. What they need instead is &lt;a href="https://dev.to/blog/connect-ai-agents-enterprise-tools"&gt;audit logs and telemetry for every tool call&lt;/a&gt;: who requested it, which tool ran, what changed, and how the action was authorized.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sign 4: Every new system multiplies the work instead of adding to it
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The integration layer: running one action across many systems.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You've crossed it when:&lt;/strong&gt; the fifth connector costs more than the first, not less.&lt;/p&gt;

&lt;p&gt;Everything so far has been one agent against essentially one system. Then the roadmap arrives: "add Gmail, Calendar, Zendesk, Jira, Slack, and Salesforce." You budget for six connectors and price them roughly equal. Instead you get six different auth models, scope vocabularies, rate-limit behaviors, schemas, pagination styles, and audit surfaces. Adding Slack should have been easier than adding Salesforce. It was not. The first integration took two weeks; the fifth took five.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it happens
&lt;/h3&gt;

&lt;p&gt;You did not add six tools, you added six governance surfaces, and each one drags the earlier signs in behind it: another identity model to wire (Sign 1), another permission surface to encode (Sign 2), another audit stream to correlate (Sign 3). Every tool you bolt on imports a full instance of each. It gets worse when an agent composes across systems, because a single logical action (read from Calendar, look up in Salesforce, post to Slack) has to reconcile three identity propagations, three permission checks, three rate limits, and three failure modes within a single operation. This is the connector-count fallacy, and it is exactly the problem the &lt;a href="https://dev.to/blog/mcp-gateway-pattern"&gt;MCP gateway pattern&lt;/a&gt; is meant to avoid.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it plays out
&lt;/h3&gt;

&lt;p&gt;The rate limits alone will stop a roadmap. &lt;a href="http://learn.microsoft.com/en-us/graph/throttling" rel="noopener noreferrer"&gt;Microsoft Graph&lt;/a&gt; caps you at four concurrent requests per mailbox, a ceiling that bites harder once &lt;a href="https://techcommunity.microsoft.com/blog/exchange/exchange-online-ews-your-time-is-almost-up/4492361" rel="noopener noreferrer"&gt;Exchange Web Services retires on October 1, 2026&lt;/a&gt; and its far roomier limit (27 connections) goes with it. Add Outlook so the agent can schedule follow-ups, and the first time it reads inbox threads while booking a meeting it trips that limit and starts collecting 429s. The roadmap stops while you build a centralized queue and rate limiter the prototype never needed.&lt;/p&gt;
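&lt;p&gt;A centralized limiter does not have to be elaborate to start. The sketch below caps in-flight requests per mailbox with an &lt;code&gt;asyncio.Semaphore&lt;/code&gt;, using the four-request ceiling cited above. The &lt;code&gt;call_mailbox&lt;/code&gt; helper and the caller-supplied &lt;code&gt;request&lt;/code&gt; coroutine are hypothetical, and the sketch covers only concurrency; handling 429 responses and retry delays would sit on top of it.&lt;/p&gt;

```python
import asyncio
from collections import defaultdict

MAX_CONCURRENT_PER_MAILBOX = 4  # the per-mailbox ceiling cited above

# One semaphore per mailbox, created the first time a mailbox is seen.
_semaphores = defaultdict(
    lambda: asyncio.Semaphore(MAX_CONCURRENT_PER_MAILBOX)
)


async def call_mailbox(mailbox, request):
    """Run request() while holding one of the mailbox's four slots."""
    async with _semaphores[mailbox]:
        return await request()
```

&lt;p&gt;Every tool that touches a mailbox goes through &lt;code&gt;call_mailbox&lt;/code&gt;, so reading inbox threads and booking a meeting share the same four slots rather than each assuming it has the mailbox to itself.&lt;/p&gt;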

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; one more tool is not additive; it multiplies against everything already connected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sign 5: Each new team rebuilds the same infrastructure
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The reuse layer: amortizing the work across agents and teams.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You've crossed it when:&lt;/strong&gt; the next team forks nothing and starts from zero.&lt;/p&gt;

&lt;p&gt;Sign 4 was the cost of adding tools to one agent. This is the cost of adding agents to the company, the same multiplication seen from the other axis. The sales ops agent ships, after months of security clearance, token storage code, and custom audit logging. A month later the support team wants an agent that pulls Salesforce records when a Zendesk ticket opens. They look at the sales team's repo and start over. The auth is entangled with one queueing model, one set of scopes, one audit sink. Their logging assumes Salesforce. Nothing lifts cleanly, so both teams now maintain parallel auth code, and the security team has reviewed two patterns for the same risk.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it happens
&lt;/h3&gt;

&lt;p&gt;By the third agent (customer success wants a renewal-risk updater, finance wants an invoice assistant), you have three implementations of the same core layers (identity, policy, integration, evidence) and three separately approved patterns for the same risk. Put formally, you are solving an &lt;em&gt;N&lt;/em&gt; × &lt;em&gt;M&lt;/em&gt; problem by hand: &lt;em&gt;N&lt;/em&gt; agents, each rebuilt against &lt;em&gt;M&lt;/em&gt; systems. None of those layers is agent-specific, but in every repo they were written application-specific, so there is no interface to extract. A shared layer collapses the problem to &lt;em&gt;N&lt;/em&gt; + &lt;em&gt;M&lt;/em&gt;, where each connector is built once and every agent inherits it.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it plays out
&lt;/h3&gt;

&lt;p&gt;This is the canonical platform-engineering trigger, and the industry has run the play before. &lt;a href="http://engineering.atspotify.com/opensource" rel="noopener noreferrer"&gt;Spotify built Backstage&lt;/a&gt; because its engineers were drowning in fragmented tooling, and Netflix calls this same idea the "paved road." An MCP runtime acts as this exact shared substrate for AI agents. By providing a centralized control plane and a shared registry for team-level access, the runtime ensures that identity, policy, evidence, and integrations are built once. Every new agent simply connects to the runtime and inherits this infrastructure, making the safe path the easy one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; copying the first agent feels faster, right up until every copy inherits a private auth stack, permission model, and audit story. A centralized MCP runtime collapses this into a single integration point that every new team and agent can safely reuse.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sign 6: Sensitive or legacy systems are entering scope, and nobody wants to personally own the risk
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The ownership layer: deciding who carries the risk.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You've crossed it when:&lt;/strong&gt; the pull request to a sensitive system sits open because no one will approve it.&lt;/p&gt;

&lt;p&gt;This sign is psychological before it is technical, and that is the point. You were fine letting the agent draft notes and update low-risk CRM fields. You are not fine pointing the same stack at payroll, refunds, or the ERP. A pull request to give it write access to NetSuite or Workday sits open. Reviewers comment but will not approve. The engineer asks security for sign-off; security asks the engineer. Nothing ships.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it happens
&lt;/h3&gt;

&lt;p&gt;That hesitation is correct, and notice what it is not about. The earlier signs were about building the mechanics, and by now you have most of them. This one is about who answers for the outcome when those mechanics touch something irreversible. A Salesforce note is recoverable in minutes; a journal entry in NetSuite hits the general ledger. These systems carry formal control expectations: &lt;a href="https://learn.microsoft.com/en-us/dynamics365/fin-ops-core/fin-ops/sysadmin/set-up-segregation-duties" rel="noopener noreferrer"&gt;Dynamics 365 ties segregation of duties to SOX, IFRS, and FDA controls&lt;/a&gt;. "The agent probably did the right thing" is not part of the operating model there. Legacy systems sharpen it further, since they often lack what makes a bad write survivable: no fine-grained permissions, no auditable API, no transactional undo.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it plays out
&lt;/h3&gt;

&lt;p&gt;When the lead architect is asked to point the agent at a legacy financial database, the question is not "can I build the connection?" It is "if a malicious email steers this agent into the wrong write, the damage cannot be undone, and the name on the change is mine." Blocking that deploy is a rational refusal to personally absorb an institutional risk. This is where an MCP runtime steps in. By providing features like mandatory out-of-band human approvals ("read, draft, and commit"), contextual access policy hooks, and immutable OpenTelemetry-compatible audit logs, the runtime shifts the burden of trust from the developer to secure, verifiable infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; when accountability exceeds what one person can absorb, the work requires institutional ownership through an MCP runtime. With versioned policy, retained audit logs, routed approvals, and credentials kept entirely out of the LLM execution environment, the engineer's name is on the code, not on the risk of the decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern behind the signs
&lt;/h2&gt;

&lt;p&gt;These are not six unrelated problems. They are one problem wearing six masks. You set out to build an agent and ended up hand-building a runtime, one feature at a time, without the architecture to hold it together. The auth daemon, the growing &lt;code&gt;permissions.py&lt;/code&gt;, the scattered logs, the per-connector rate limiter, the copy-pasted glue, the deploy nobody will approve: each is a piece of execution infrastructure that should exist once and apply to every agent, reinvented inside a single application instead. Identity, policy, evidence, integration, reuse, ownership: six names for the same missing layer.&lt;/p&gt;

&lt;p&gt;An &lt;a href="https://dev.to/blog/mcp-gateways-runtimes-registries-guide"&gt;MCP runtime&lt;/a&gt; is that missing layer. Not a framework for building agents, and not a platform that hosts them. It is the standard execution layer agents act through, where those six concerns live once, as infrastructure, the same way a language runtime or a container runtime is not something application code opts into so much as the substrate it cannot act without. The agent proposes; the runtime authenticates the call, enforces policy, executes the tool, and records what happened.&lt;/p&gt;

&lt;p&gt;Adopt one and your effort moves from building security boundaries to designing what the agent should actually do. The six concerns become properties of the layer rather than per-agent plumbing, and the next team inherits the safe path rather than rebuilding it. &lt;a href="https://dev.to/"&gt;Arcade.dev&lt;/a&gt;, the MCP runtime, is built for exactly this. It delivers per-action authorization that evaluates the intersection of agent and user permissions at the moment of the call (with credentials kept out of the model), a catalog of 8,000+ agent-optimized MCP tools that translate intent into safe API calls instead of letting the model hallucinate parameters, and centralized lifecycle governance with an OpenTelemetry-compatible audit record per user per service. How you get a runtime, build or buy, is its own conversation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick checklist: Have you outgrown the prototype?
&lt;/h2&gt;

&lt;p&gt;You do not need a runtime the first time an agent works. You need one when the agent becomes important enough that the surrounding questions matter as much as the prompt.&lt;/p&gt;

&lt;p&gt;Run the checklist against the agent you already have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It acts on behalf of more than one user.&lt;/li&gt;
&lt;li&gt;It uses per-user OAuth instead of one developer-owned token.&lt;/li&gt;
&lt;li&gt;It can write to systems of record, not just read from them.&lt;/li&gt;
&lt;li&gt;Permissions depend on role, team, region, record owner, approval state, or business context.&lt;/li&gt;
&lt;li&gt;You cannot reconstruct every tool call from request to downstream change.&lt;/li&gt;
&lt;li&gt;Adding a new connector means rebuilding auth, scopes, rate limits, retries, and audit behavior.&lt;/li&gt;
&lt;li&gt;A second team is copying the first agent rather than reusing the shared infrastructure.&lt;/li&gt;
&lt;li&gt;Sensitive systems such as payroll, refunds, ERP, finance, healthcare, and customer data are coming into scope.&lt;/li&gt;
&lt;li&gt;Security, legal, or compliance has started asking who approved the action, not just whether the code works.&lt;/li&gt;
&lt;li&gt;A pull request is stalled because everyone agrees the agent is useful, but nobody wants to personally own the risk.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you checked one or two, you may still be in prototype territory. If you checked three or more, the agent is probably no longer the hard part. The missing layer around it is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; once the questions are about identity, permission, evidence, reuse, and ownership, you are no longer debugging an agent. You are discovering the runtime it needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  You've crossed the threshold
&lt;/h2&gt;

&lt;p&gt;If these signs sound like your standups, your repo, and your stalled pull requests, the conclusion is simple: you have outgrown the DIY approach. Not because you built it wrong, but because you built it well enough to hit the same wall that web apps hit before centralized identity, that deployments hit before CI/CD, and that infrastructure hit before container orchestration. The fix that resolved each of those took the same shape every time: extract the execution layer. You are not late. You are exactly on time for the transition that every infrastructure category before this one has already made.&lt;/p&gt;

&lt;p&gt;How you get that runtime, whether you &lt;a href="https://dev.to/blog/mcp-runtime-build-vs-buy"&gt;build or buy an MCP runtime&lt;/a&gt;, and how to evaluate the options, is the next conversation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is an MCP runtime?
&lt;/h3&gt;

&lt;p&gt;An MCP runtime is the governed execution layer for agents that use Model Context Protocol tools. It sits between the agent and the MCP servers it calls, handling identity, authorization, tool execution, credential isolation, policy enforcement, and audit logging.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why do AI agents need a runtime?
&lt;/h3&gt;

&lt;p&gt;AI agents need a runtime when they move from prototypes to production. MCP helps agents connect to tools, but teams still need a governed layer to decide who the agent is acting for, what it is allowed to do, how credentials are protected, and how each action is recorded.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is MCP itself a runtime?
&lt;/h3&gt;

&lt;p&gt;No. MCP standardizes how agents connect to tools and context. An MCP runtime governs what happens when those tools are used, including authorization, credential handling, policy checks, approvals, retries, rate limits, and audit trails.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should a team use an MCP runtime?
&lt;/h3&gt;

&lt;p&gt;A team should use an MCP runtime when an agent acts on behalf of multiple users, connects to sensitive systems, writes to systems of record, requires per-user OAuth, needs audit logs, or is being reused across multiple teams and workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is an MCP runtime different from an MCP server?
&lt;/h3&gt;

&lt;p&gt;An MCP server exposes tools, resources, or prompts to an agent. An MCP runtime governs the execution of those tools in production. The server defines what is available. The runtime controls who can use it, under what policy, with which credentials, and with what audit record.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is an MCP runtime different from an MCP gateway?
&lt;/h3&gt;

&lt;p&gt;An MCP gateway primarily federates tools from multiple MCP servers into a single endpoint for simplified routing and single-URL configuration. While useful for connectivity, a gateway just routes requests. An MCP runtime is a complete execution layer that goes beyond routing to include delegated multi-user authorization, intent-level tool execution, contextual policy enforcement, and immutable audit logging. A gateway routes; a runtime executes, enforces, and audits.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does an MCP runtime improve security?
&lt;/h3&gt;

&lt;p&gt;An MCP runtime improves security by separating the agent from raw credentials, enforcing per-user and per-action authorization, limiting tool access, routing sensitive actions through policy checks, and recording what happened for every tool call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should companies build or buy an MCP runtime?
&lt;/h3&gt;

&lt;p&gt;Build an MCP runtime only if your agent is single-user, your APIs are fully internal, or your agent infrastructure is your core product. For multi-user production agents that need OAuth, credential vaulting, permissions, audit logs, or SaaS integrations, buying a runtime usually lets the team ship faster while avoiding the need to own permanent infrastructure.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>agents</category>
      <category>security</category>
    </item>
    <item>
      <title>How to manage multi-user AI agent authentication and authorization in 2026 (OAuth 2.1, OIDC, and delegated access)</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Thu, 14 May 2026 20:18:23 +0000</pubDate>
      <link>https://dev.to/arcade/how-to-manage-multi-user-ai-agent-authentication-and-authorization-in-2026-oauth-21-oidc-and-2943</link>
      <guid>https://dev.to/arcade/how-to-manage-multi-user-ai-agent-authentication-and-authorization-in-2026-oauth-21-oidc-and-2943</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR: multi-user AI agent authentication and authorization in 2026
&lt;/h2&gt;

&lt;p&gt;Moving AI agents from single-user desktop demos to enterprise production means solving a brutal engineering problem: multi-user, multi-system delegated authorization.&lt;/p&gt;

&lt;p&gt;Security architects and lead AI engineers are now dealing with agents that execute complex workflows across critical infrastructure on behalf of thousands of concurrent users.&lt;/p&gt;

&lt;p&gt;The core design principle is non-negotiable: treat every agent action as delegated user access, never as the agent's own blanket access. The whole authorization stack falls out of that distinction. Nine capabilities, two identities, one strict intersection rule.&lt;/p&gt;

&lt;p&gt;This guide breaks down how to combine OpenID Connect, OAuth 2.1, and a managed Model Context Protocol (MCP) runtime like &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade.dev&lt;/a&gt; to prevent tool misuse, data leakage, and excessive agency. It's built for identity and access management leads, security architects, and AI engineering leads who need the exact infrastructure requirements to safely deploy multi-user agents into production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Threat model for multi-user AI agents: prompt injection, tool misuse, and confused deputy
&lt;/h2&gt;

&lt;p&gt;You can't engineer secure authorization without defining the threat model first. For large language models, the most dangerous attack vector runs from prompt injection straight to tool misuse.&lt;/p&gt;

&lt;p&gt;If an enterprise agent inherits blanket admin access to a backend system, a single poisoned RAG document or malicious prompt can weaponize that agent. An attacker instructs the model to scan an inbox, summarize sensitive financial data, and exfiltrate the payload via an external tool call. The whole exfil chain completes without a human in the loop.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/" rel="noopener noreferrer"&gt;Open Web Application Security Project highlights these vulnerabilities&lt;/a&gt; in its updated guidelines, citing &lt;a href="https://genai.owasp.org/llmrisk/llm01-prompt-injection/" rel="noopener noreferrer"&gt;prompt injection&lt;/a&gt; and &lt;a href="https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/blob/main/2_0_vulns/LLM06_ExcessiveAgency.md" rel="noopener noreferrer"&gt;excessive agency&lt;/a&gt; as primary risks that lead directly to the confused deputy problem.&lt;/p&gt;

&lt;p&gt;In a &lt;a href="https://embracethered.com/blog/posts/2023/chatgpt-cross-plugin-request-forgery-and-prompt-injection./" rel="noopener noreferrer"&gt;confused deputy attack&lt;/a&gt;, an application gets tricked into misusing its inherited authority.&lt;/p&gt;

&lt;p&gt;There's a second class of attack that targets the authorization flow itself. An attacker who can intercept or guess the identifier for a pending OAuth authorization can redirect the consent step to their own browser, either capturing the user's grant or seeding the agent with credentials it shouldn't have. Treating every first-time tool authorization as a step that must be cryptographically bound to a verified app user is the only durable defense.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two-identity model for agent authorization
&lt;/h2&gt;

&lt;p&gt;Engineering teams typically make one of two mistakes when designing agent authorization. Give the agent its own identity, and an intern can bypass their permissions through the agent. Inherit the user's full access, and a single prompt injection cascades through every connected system.&lt;/p&gt;

&lt;p&gt;The right answer is the intersection: what this agent is allowed to do AND what this user is allowed to do, evaluated per action, at runtime.&lt;/p&gt;

&lt;p&gt;Effective authorization in agentic systems requires every request to carry two identity layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The project-level key (the agent application):&lt;/strong&gt; The workload identity making the call. Registered as an OAuth client, scoped to the application running the agent logic.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The user-level identity (on whose behalf the action is taken):&lt;/strong&gt; The actual person requesting the action, authenticated via a protocol like OpenID Connect, and represented in the request as a delegated subject.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The runtime evaluates these two identities against a &lt;em&gt;delegated execution context&lt;/em&gt;: a bounded, short-lived binding that ties a specific user to a specific agent for a specific task. The context isn't a third identity. It's the tuple of claims (user, agent, scopes, audience, tenant, task ID, expiry) the runtime evaluates at every tool call.&lt;/p&gt;

&lt;p&gt;This model enforces the identity intersection rule, which is the foundation of modern agent security.&lt;/p&gt;

&lt;p&gt;An agent's effective authority must always be calculated as the strict intersection of its own baseline permissions and the requesting human user's permissions. Never the union.&lt;/p&gt;

&lt;p&gt;If a user can't delete a database record, the agent acting on their behalf must fail when attempting the same action. It doesn't matter what the agent's maximum theoretical capabilities are.&lt;/p&gt;
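
&lt;p&gt;The intersection rule can be sketched in a few lines. This is a minimal illustration, not a production policy engine: the permission names, the &lt;code&gt;AGENT_BASELINE&lt;/code&gt; set, and the &lt;code&gt;USER_PERMISSIONS&lt;/code&gt; table are all hypothetical stand-ins for whatever your identity provider or entitlement store returns.&lt;/p&gt;

```python
# Hypothetical permission names; a real system would load these from its
# identity provider or entitlement store rather than hard-coding them.
AGENT_BASELINE = frozenset({"records.read", "records.update", "records.delete"})

USER_PERMISSIONS = {
    "alice": frozenset({"records.read", "records.update"}),
    "intern": frozenset({"records.read"}),
}


def effective_permissions(agent_perms, user_perms):
    """Return the strict intersection of agent and user permissions, never the union."""
    return frozenset(agent_perms).intersection(user_perms)


def is_allowed(user_id, action):
    """Check one action at call time; unknown users get an empty permission set."""
    user_perms = USER_PERMISSIONS.get(user_id, frozenset())
    return action in effective_permissions(AGENT_BASELINE, user_perms)
```

&lt;p&gt;Here the agent can delete records in principle, but &lt;code&gt;is_allowed("alice", "records.delete")&lt;/code&gt; is false because Alice cannot, and an unknown user is denied everything.&lt;/p&gt;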

&lt;p&gt;Implementing this intersection requires strict protocol separation. OpenID Connect authenticates the human user to establish who is interacting with the system. OAuth 2.1 authorizes what specific tool calls the agent can make on the human's behalf.&lt;/p&gt;

&lt;p&gt;Conflating these two protocols leads to over-permissioned tokens that get reused across systems they were never scoped for, giving a compromised agent durable access well beyond what the user actually authorized.&lt;/p&gt;

&lt;h2&gt;
  
  
  Nine capabilities for production multi-user AI agent auth
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol's own authorization spec, developed as a broad collaboration with Anthropic, &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade.dev&lt;/a&gt;, Microsoft, Okta/Auth0, and others, defines OAuth-style protected resources and authorization server discovery, with audience binding via Resource Indicators (RFC 8707) and delegation via Token Exchange (RFC 8693). MCP defines the auth handshake; the runtime layer above must still handle token vaulting, just-in-time consent, user verification, RBAC, and audit. The nine capabilities below close that gap.&lt;/p&gt;

&lt;p&gt;Building resilient multi-user agent infrastructure means evaluating your systems against this 2026 capability checklist. Unifying these capabilities prevents unauthorized access while ensuring reliable tool execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Capability 1: Model user, agent, and delegated context
&lt;/h3&gt;

&lt;p&gt;Every authorization decision in your runtime must evaluate the user, agent, and context tuple simultaneously.&lt;/p&gt;

&lt;p&gt;If your backend tool plane only verifies the agent's API key, you've failed to model the human user.&lt;/p&gt;

&lt;p&gt;True delegated modeling ensures that the upstream resource server knows exactly which human began the request, which workload orchestrated it, and the precise context under which the delegation was granted.&lt;/p&gt;

&lt;p&gt;In practice, this means the user_id flows from your app's authenticated session into every runtime call. A typical pattern: your IdP (Stytch, Auth0, Okta, or similar) authenticates the user and issues a session, your app extracts the user identifier from that session, and your code passes that identifier explicitly to every runtime SDK call. For example, &lt;code&gt;getTools({ tools: [...], userId: userEmail })&lt;/code&gt; and &lt;code&gt;tools.execute({ ..., user_id: userEmail })&lt;/code&gt;. The runtime then resolves that specific user's vaulted OAuth tokens for the requested provider and scope. Without this explicit user binding on every call, the runtime has no way to enforce the intersection rule.&lt;/p&gt;
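
&lt;p&gt;One way to make that binding hard to forget is to route every call through a single helper that reads the user from the session and fails closed when it is missing. The sketch below assumes a hypothetical &lt;code&gt;FakeRuntime&lt;/code&gt; stand-in and a hypothetical tool name; it is not the signature of any real SDK.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class FakeRuntime:
    """Stand-in for a runtime SDK client; it only records what it was asked to do."""
    calls: list = field(default_factory=list)

    def execute(self, tool_name, tool_input, user_id):
        self.calls.append((tool_name, user_id))
        return {"tool": tool_name, "user_id": user_id, "input": tool_input}


def user_id_from_session(session):
    """Extract the authenticated user identifier, failing closed if it is missing."""
    user_id = session.get("user_id")
    if not user_id:
        raise PermissionError("no authenticated user in session")
    return user_id


def execute_for_session(runtime, session, tool_name, tool_input):
    """Pass the session's user explicitly on every runtime call."""
    user_id = user_id_from_session(session)
    return runtime.execute(tool_name, tool_input, user_id=user_id)
```

&lt;p&gt;Application code then calls &lt;code&gt;execute_for_session&lt;/code&gt; instead of the client directly, so no tool call can be made on behalf of an anonymous or defaulted user.&lt;/p&gt;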

&lt;h3&gt;
  
  
  Capability 2: Separate OpenID Connect authentication from OAuth authorization
&lt;/h3&gt;

&lt;p&gt;You need to strictly separate human authentication from delegated agent authorization. OpenID Connect handles the initial login session. OAuth 2.1 handles the subsequent tool authorization.&lt;/p&gt;

&lt;p&gt;By separating these concerns, you prevent identity conflation. An agent compromised by a malicious prompt can't reuse human session cookies to access unrelated systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Capability 3: Issue short-lived, scoped, audience-bound access tokens
&lt;/h3&gt;

&lt;p&gt;Agent access tokens must adhere to the strictest cryptographic standards to prevent token replay and lateral movement.&lt;/p&gt;

&lt;p&gt;Each delegated access token should carry the full execution context as claims:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The subject (&lt;code&gt;sub&lt;/code&gt;) identifies the human user on whose behalf the action is taken (e.g., user:alice).&lt;/li&gt;
&lt;li&gt;The actor (&lt;code&gt;act&lt;/code&gt;) identifies the agent making the call (e.g., agent:support-copilot).&lt;/li&gt;
&lt;li&gt;The audience (&lt;code&gt;aud&lt;/code&gt;) binds the token to a specific resource server (e.g., gmail-api).&lt;/li&gt;
&lt;li&gt;The scope (&lt;code&gt;scope&lt;/code&gt;) grants a specific permission (e.g., email.draft, not email.send).&lt;/li&gt;
&lt;li&gt;The expiry (&lt;code&gt;exp&lt;/code&gt;) is set to a tight window, typically 5 to 30 minutes.&lt;/li&gt;
&lt;li&gt;A tenant claim (e.g., tenant:acme) carries the customer or workspace context.&lt;/li&gt;
&lt;li&gt;A task ID (e.g., task_123) ties the call back to the originating user task or session.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This claim structure enforces the intersection rule cryptographically: every token carries the user, the agent, and the bounded execution context, and the resource server validates all three before honoring the request.&lt;/p&gt;

&lt;p&gt;Your stack must enforce &lt;a href="https://www.rfc-editor.org/rfc/rfc8707.html" rel="noopener noreferrer"&gt;RFC 8707 resource indicators&lt;/a&gt; to bind tokens to a specific audience, ensuring a token minted for a calendar API can't be replayed against a CRM.&lt;/p&gt;
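
&lt;p&gt;As a rough sketch of what a resource server checks, the function below validates an already-decoded claim set. It assumes signature verification has happened elsewhere, that &lt;code&gt;scope&lt;/code&gt; is a space-delimited string, and that the actor is nested as &lt;code&gt;act.sub&lt;/code&gt; in the style of RFC 8693; the function name and the example values are illustrative only.&lt;/p&gt;

```python
import time


def validate_delegated_token(claims, expected_audience, required_scope, allowed_agents, now=None):
    """Return (ok, reason) after checking expiry, audience, subject, actor, and scope."""
    now = time.time() if now is None else now
    if now >= claims.get("exp", 0):
        return False, "expired"
    if claims.get("aud") != expected_audience:
        return False, "wrong audience"
    if not claims.get("sub"):
        return False, "missing subject"
    actor = claims.get("act", {}).get("sub")
    if actor not in allowed_agents:
        return False, "unknown actor"
    if required_scope not in claims.get("scope", "").split():
        return False, "missing scope"
    return True, "ok"


# Example claim set using the illustrative values from the prose above.
claims = {
    "sub": "user:alice",
    "act": {"sub": "agent:support-copilot"},
    "aud": "gmail-api",
    "scope": "email.draft",
    "exp": 1_000_600,
    "tenant": "tenant:acme",
    "task_id": "task_123",
}
```

&lt;p&gt;With these claims, a request to draft an email at the &lt;code&gt;gmail-api&lt;/code&gt; audience passes, while the same token presented to a CRM audience, or used for &lt;code&gt;email.send&lt;/code&gt;, is rejected.&lt;/p&gt;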

&lt;p&gt;Use &lt;a href="https://www.rfc-editor.org/rfc/rfc8693.html" rel="noopener noreferrer"&gt;RFC 8693 token exchange&lt;/a&gt; to safely trade broad user tokens for tightly downscoped agent tokens.&lt;/p&gt;
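
&lt;p&gt;For reference, a token exchange request is an ordinary form-encoded POST to the token endpoint. The sketch below builds only the request body, using the parameter names and token-type URNs defined in RFC 8693; the helper name and the choice to pass the agent's credential as &lt;code&gt;actor_token&lt;/code&gt; are assumptions about how a given authorization server is configured.&lt;/p&gt;

```python
from urllib.parse import urlencode

TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_token_exchange_body(user_token, agent_token, audience, scope):
    """Build the form-encoded body for an RFC 8693 token exchange request."""
    params = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": user_token,        # the user the agent acts for
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "actor_token": agent_token,         # the agent doing the acting
        "actor_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,               # narrows where the new token is valid
        "scope": scope,                     # narrows what the new token can do
    }
    return urlencode(params)
```

&lt;p&gt;The authorization server, not the agent, decides whether the requested audience and scope are a legitimate downscoping of the subject token.&lt;/p&gt;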

&lt;p&gt;Sender-constrain tokens using &lt;a href="https://www.rfc-editor.org/rfc/rfc9449.html" rel="noopener noreferrer"&gt;RFC 9449 Demonstrating Proof of Possession (DPoP)&lt;/a&gt;, so that an intercepted access token is useless without the client's private key. The stack should also support &lt;a href="https://www.rfc-editor.org/rfc/rfc9126.html" rel="noopener noreferrer"&gt;RFC 9126&lt;/a&gt; pushed authorization requests, which send authorization parameters directly to the authorization server instead of through the browser, and &lt;a href="https://www.rfc-editor.org/rfc/rfc9396.html" rel="noopener noreferrer"&gt;RFC 9396&lt;/a&gt; rich authorization requests, which express fine-grained permissions as structured authorization details.&lt;/p&gt;

&lt;h3&gt;
  
  
  Capability 4: Vault tokens and automate refresh across providers
&lt;/h3&gt;

&lt;p&gt;A &lt;a href="https://docs.arcade.dev/en/references/auth-providers/oauth2" rel="noopener noreferrer"&gt;runtime that handles token storage and refresh&lt;/a&gt; per-user, per-provider, is non-negotiable for production agents. Managing the OAuth token lifecycle across thousands of users and dozens of providers is a substantial engineering problem in its own right.&lt;/p&gt;

&lt;p&gt;Access and refresh tokens must be vaulted and encrypted on a strict per-user, per-provider basis. Your system needs to automatically handle provider-specific nuances outside the language model context.&lt;/p&gt;

&lt;p&gt;For example, &lt;a href="https://developers.google.com/identity/protocols/oauth2#expiration" rel="noopener noreferrer"&gt;Google enforces a limit of 100 refresh tokens per Google Account per OAuth 2.0 client ID&lt;/a&gt; and invalidates the oldest token once that limit is reached, and &lt;a href="https://learn.microsoft.com/en-us/azure/active-directory/develop/refresh-tokens" rel="noopener noreferrer"&gt;Microsoft Entra rotates refresh tokens on every redemption with a 90-day sliding inactivity window&lt;/a&gt;. A dedicated token vault must abstract this refresh logic away from the agent developer.&lt;/p&gt;
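
&lt;p&gt;A minimal sketch of the vault's core behavior follows. It is an in-memory illustration only: the &lt;code&gt;TokenVault&lt;/code&gt; class, the injected &lt;code&gt;refresh_fn&lt;/code&gt;, and the 60-second skew are assumptions, and a real vault would encrypt entries at rest and handle refresh failures and concurrency.&lt;/p&gt;

```python
import time


class TokenVault:
    """In-memory sketch keyed per user and per provider; not encrypted."""

    def __init__(self, refresh_fn, skew_seconds=60):
        self._entries = {}              # (user_id, provider) -> token record
        self._refresh_fn = refresh_fn   # provider-specific refresh call, injected
        self._skew = skew_seconds

    def store(self, user_id, provider, access_token, refresh_token, expires_at):
        self._entries[(user_id, provider)] = {
            "access_token": access_token,
            "refresh_token": refresh_token,
            "expires_at": expires_at,
        }

    def get_access_token(self, user_id, provider, now=None):
        """Return a usable access token, refreshing it shortly before expiry."""
        now = time.time() if now is None else now
        entry = self._entries[(user_id, provider)]
        if now + self._skew >= entry["expires_at"]:
            refreshed = self._refresh_fn(provider, entry["refresh_token"])
            # Providers that rotate refresh tokens return a new one;
            # otherwise keep using the existing refresh token.
            refreshed.setdefault("refresh_token", entry["refresh_token"])
            entry.update(refreshed)
        return entry["access_token"]
```

&lt;p&gt;Because the refresh function is injected, provider quirks such as rotation on every redemption stay inside the vault, and neither the agent nor the model ever sees a refresh token.&lt;/p&gt;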

&lt;h3&gt;
  
  
  Capability 5: Enforce read, draft, and commit approval steps
&lt;/h3&gt;

&lt;p&gt;Security architects must enforce &lt;a href="https://www.arcade.dev/agents/gateway-templates/human-approval-workflow/" rel="noopener noreferrer"&gt;out-of-band approval flows&lt;/a&gt; for any irreversible action.&lt;/p&gt;

&lt;p&gt;Reading data or drafting responses requires minimal friction and can be executed synchronously. But external side effects, such as sending emails, deleting records, or committing code, must trigger explicit human step-up approvals.&lt;/p&gt;

&lt;p&gt;These approvals should occur via a secure, out-of-band channel, such as an enterprise authentication app, a separate user interface, or a direct messaging platform.&lt;/p&gt;
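
&lt;p&gt;The read, draft, and commit split can be expressed as a small gate in front of tool execution. In this sketch the tool names and the &lt;code&gt;request_approval&lt;/code&gt; callback are hypothetical; in a real deployment the callback would block on an out-of-band channel rather than return immediately.&lt;/p&gt;

```python
from enum import Enum


class Tier(Enum):
    READ = "read"
    DRAFT = "draft"
    COMMIT = "commit"


# Hypothetical tool names mapped to the tier of side effect they cause.
TOOL_TIERS = {
    "email.read": Tier.READ,
    "email.draft": Tier.DRAFT,
    "email.send": Tier.COMMIT,
    "records.delete": Tier.COMMIT,
}


def run_tool(tool_name, execute, request_approval):
    """Run read and draft tools directly; gate commit tools on an approval callback."""
    # Unknown tools are treated as COMMIT so that newly added tools fail closed.
    tier = TOOL_TIERS.get(tool_name, Tier.COMMIT)
    if tier is Tier.COMMIT and not request_approval(tool_name):
        return {"status": "denied", "tool": tool_name}
    return {"status": "done", "tool": tool_name, "result": execute()}
```

&lt;p&gt;Defaulting unclassified tools to the commit tier is a deliberate choice here: a missing classification costs an extra approval, not an unreviewed side effect.&lt;/p&gt;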

&lt;h3&gt;
  
  
  Capability 6: Evaluate policy before every tool call by hooking into existing entitlement systems
&lt;/h3&gt;

&lt;p&gt;Never trust a language model's direct API request. Every tool call must route through a centralized policy layer that intersects the user, agent, tenant, action, resource, and task. And it must evaluate that intersection in milliseconds so that policy checks do not add noticeable latency to the conversation.&lt;/p&gt;

&lt;p&gt;Critically, this is not an invitation to stand up yet another policy system. Enterprises already have entitlement systems and identity providers like Okta, Entra, SailPoint, and homegrown role/permission stores. The runtime's job is to hook into those systems, acquire scoped tokens at runtime, and enforce the policies the enterprise has already defined, not duplicate them in a new tool.&lt;/p&gt;

&lt;p&gt;Open Policy Agent, Cedar, Oso, OpenFGA, WorkOS FGA, and Zanzibar-style relationship graphs are useful as the local enforcement engine. But the source of truth for who can do what should remain in your existing identity and governance systems. A runtime that asks you to redefine your authorization model in its own DSL is moving the problem, not solving it.&lt;/p&gt;
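
&lt;p&gt;A sketch of that division of labor: the runtime owns a thin local check, and the user-entitlement answer comes from a lookup into an existing system. Everything here is hypothetical, including the &lt;code&gt;ToolCall&lt;/code&gt; fields and the &lt;code&gt;entitlement_lookup&lt;/code&gt; callable, which stands in for a query to your identity or governance system.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    user_id: str
    agent_id: str
    tenant: str
    action: str
    resource_tenant: str


def authorize(call, agent_actions, entitlement_lookup):
    """Deny unless the agent allow-list, tenant boundary, and user entitlement all agree."""
    if call.action not in agent_actions.get(call.agent_id, ()):
        return False, "agent not permitted"
    if call.tenant != call.resource_tenant:
        return False, "cross-tenant access"
    # entitlement_lookup stands in for a query to an existing system of record.
    if not entitlement_lookup(call.user_id, call.action):
        return False, "user not entitled"
    return True, "allow"
```

&lt;p&gt;The local function contains no user-level rules of its own, so changing who may do what remains a change in the entitlement system rather than in agent code.&lt;/p&gt;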

&lt;h3&gt;
  
  
  Capability 7: Use just-in-time consent and authorization
&lt;/h3&gt;

&lt;p&gt;Blanket consent at user onboarding violates the principle of least privilege.&lt;/p&gt;

&lt;p&gt;Implement just-in-time authorization instead. When an agent requires access to a new system or an ungranted scope to fulfill a prompt, the runtime pauses execution. It returns a granular, context-specific consent interface to the user, captures the cryptographic consent, brokers the new token, and resumes the agent's task without losing conversational context.&lt;/p&gt;

&lt;p&gt;MCP's URL Elicitation Specification Enhancement Proposal (SEP), authored by &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade.dev&lt;/a&gt; in collaboration with Anthropic and &lt;a href="https://modelcontextprotocol.io/specification/2025-11-25/client/elicitation" rel="noopener noreferrer"&gt;accepted into the MCP spec&lt;/a&gt;, standardizes how an agent runtime delivers granular, context-specific consent URLs to the user mid-task.&lt;/p&gt;
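
&lt;p&gt;The pause-and-resume behavior can be sketched as an exception that carries the consent URL back to the application. This is an illustration of the control flow only: &lt;code&gt;JitRuntime&lt;/code&gt;, &lt;code&gt;ConsentRequired&lt;/code&gt;, and the example URL are hypothetical, and it does not implement the MCP elicitation messages themselves.&lt;/p&gt;

```python
class ConsentRequired(Exception):
    """Raised to pause a task until the user grants a missing scope."""

    def __init__(self, user_id, scope, url):
        super().__init__(f"{user_id} must grant {scope}")
        self.user_id, self.scope, self.url = user_id, scope, url


class JitRuntime:
    def __init__(self):
        self.grants = {}  # user_id -> set of granted scopes

    def call_tool(self, user_id, scope, action):
        """Run the action only if the scope is granted; otherwise ask for consent."""
        if scope not in self.grants.get(user_id, set()):
            # Hypothetical URL; a real runtime would mint a signed, single-use link.
            raise ConsentRequired(user_id, scope, f"https://app.example/consent?scope={scope}")
        return action()

    def record_grant(self, user_id, scope):
        self.grants.setdefault(user_id, set()).add(scope)


def run_with_consent(runtime, user_id, scope, action, ask_user):
    """Try the call, collect consent for the one missing scope, then retry once."""
    try:
        return runtime.call_tool(user_id, scope, action)
    except ConsentRequired as pending:
        if not ask_user(pending.url):
            raise
        runtime.record_grant(user_id, scope)
        return runtime.call_tool(user_id, scope, action)
```

&lt;p&gt;Consent is requested for exactly one scope at the moment it is needed, and a second call with the same scope proceeds without prompting again.&lt;/p&gt;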

&lt;h3&gt;
  
  
  Capability 8: Bind first-time auth flows to a verified app user
&lt;/h3&gt;

&lt;p&gt;Granular consent (Capability 7) only matters if the runtime can confirm which user is sitting at the keyboard during the first-time OAuth authorization. Without that confirmation, an attacker who intercepts a flow_id can redirect the consent step to their own browser and either hijack the authorization back into your user's session or capture the user's grant for themselves.&lt;/p&gt;

&lt;p&gt;The mitigation is a server-side user verifier. When a user authorizes a tool for the first time, the runtime redirects them to a verifier route in your app. Your verifier reads the flow_id from the query string, looks up the currently authenticated user from your app's session (backed by an IdP such as Stytch, Auth0, or Okta, or by an app-layer auth system like Supabase), and posts that user_id back to the runtime via a server-side confirm_user call signed with your API key.&lt;/p&gt;

&lt;p&gt;If the user_id from your session matches the user_id specified when the flow started, the runtime continues. If not, the runtime rejects the flow. Every first-time authorization is therefore bound to a verified, authenticated identity in your app, which closes the flow-phishing attack surface.&lt;/p&gt;
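
&lt;p&gt;The matching step on the runtime side reduces to a lookup and a comparison. The sketch below is a simplified model under stated assumptions: the &lt;code&gt;AuthFlows&lt;/code&gt; class is hypothetical, flows live in memory, and the API-key signature on the confirm call is omitted.&lt;/p&gt;

```python
import hmac
import secrets


class AuthFlows:
    """Tracks pending first-time authorizations and which user started each one."""

    def __init__(self):
        self._pending = {}  # flow_id -> user_id that started the flow

    def start(self, user_id):
        flow_id = secrets.token_urlsafe(16)  # unguessable identifier
        self._pending[flow_id] = user_id
        return flow_id

    def confirm_user(self, flow_id, session_user_id):
        """Accept the flow only if the logged-in user is the one who started it."""
        # pop() makes each flow single use, whether or not the check passes.
        expected = self._pending.pop(flow_id, None)
        if expected is None:
            return False
        return hmac.compare_digest(expected.encode(), session_user_id.encode())
```

&lt;p&gt;An attacker who opens a stolen &lt;code&gt;flow_id&lt;/code&gt; in their own browser is logged in as themselves, so the comparison fails and the flow is discarded.&lt;/p&gt;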

&lt;p&gt;In production multi-user deployments, this is non-negotiable. Arcade's reference implementations show the pattern in &lt;a href="https://github.com/ArcadeAI/agency-tutorial-stytch" rel="noopener noreferrer"&gt;Next.js with Stytch&lt;/a&gt; and &lt;a href="https://github.com/ArcadeAI/arcade-custom-verifier-next" rel="noopener noreferrer"&gt;Next.js with Supabase&lt;/a&gt;, and Arcade's &lt;a href="https://docs.arcade.dev/en/guides/user-facing-agents/secure-auth-production" rel="noopener noreferrer"&gt;Secure Auth in Production guide&lt;/a&gt; walks through the verifier route end-to-end.&lt;/p&gt;

&lt;h3&gt;
  
  
  Capability 9: Generate immutable audit logs for every agent action
&lt;/h3&gt;

&lt;p&gt;Every action taken by an agent must generate an immutable audit log with a complete chain of custody.&lt;/p&gt;

&lt;p&gt;This means capturing the requesting user, the agent identity, the tenant, the task ID, the specific tool invoked, the resource accessed, the policy decision and policy version, the prompt hash, input references, output hash, approval status, and the exact timestamp.&lt;/p&gt;
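
&lt;p&gt;One common way to make such a log tamper-evident is to chain records by hash. The sketch below uses that technique as an assumption of this example, not as a description of any particular runtime; it records a subset of the fields listed above and stores hashes of the prompt and output rather than their contents.&lt;/p&gt;

```python
import hashlib
import json


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log where each record commits to the hash of the previous one."""

    def __init__(self):
        self.records = []

    def append(self, user, agent, tenant, task_id, tool, resource, decision,
               prompt, output, timestamp):
        record = {
            "user": user, "agent": agent, "tenant": tenant, "task_id": task_id,
            "tool": tool, "resource": resource, "decision": decision,
            "prompt_hash": sha256_hex(prompt),   # store hashes, not raw content
            "output_hash": sha256_hex(output),
            "timestamp": timestamp,
            "prev_hash": self.records[-1]["record_hash"] if self.records else "",
        }
        record["record_hash"] = sha256_hex(json.dumps(record, sort_keys=True))
        self.records.append(record)
        return record

    def verify(self):
        """Recompute every hash; any edited or reordered record breaks the chain."""
        prev = ""
        for record in self.records:
            body = {k: v for k, v in record.items() if k != "record_hash"}
            expected = sha256_hex(json.dumps(body, sort_keys=True))
            if body["prev_hash"] != prev or expected != record["record_hash"]:
                return False
            prev = record["record_hash"]
        return True
```

&lt;p&gt;Hash chaining detects tampering but does not prevent it, so in practice the chain head would also be exported to write-once storage or an external log system.&lt;/p&gt;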

&lt;p&gt;These logs must be &lt;a href="https://opentelemetry.io/docs/concepts/signals/logs/" rel="noopener noreferrer"&gt;OpenTelemetry-compatible&lt;/a&gt;, providing structured traces that export cleanly into enterprise security information and event management systems for immediate incident response.&lt;/p&gt;

&lt;p&gt;And the audit story isn't only about the logs themselves. It's about the controls that produce them. SOC 2 Type 2 certification validates that the runtime's audit, access, and change-management controls operate as designed under independent audit. Treat the certification as a procurement floor and the per-action log structure as the actual product capability. You need both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a runtime, not a gateway: the architecture shift behind multi-user authorization
&lt;/h2&gt;

&lt;p&gt;In the traditional model, users interact with applications, applications call APIs, and a gateway sits between them, routing, authenticating, and rate-limiting at the perimeter. The proxy is the control point because it's the choke point: every request flows through it.&lt;/p&gt;

&lt;p&gt;In the agentic model, that topology inverts. The agent is already the proxy. A user talks to an agent. The agent reasons, plans, and calls tools on the user's behalf. It already handles mediation, routing, and orchestration. Adding a traditional API gateway in front of the tools doesn't add a control point; it adds a redundant hop that can't see into the execution context that actually matters: which user, which action, which permission, right now.&lt;/p&gt;

&lt;p&gt;That's why "MCP gateway" is the wrong frame for the auth problem. A stateless proxy evaluates each request in isolation. It can't track that a request is step 3 of a 6-step agent workflow, acting on behalf of a specific user who authorized a particular scope minutes ago. Bolting MCP support onto an API gateway is not a pivot. It's a patch.&lt;/p&gt;

&lt;p&gt;The control point in an agentic architecture is the execution layer where the tool runs. That's where credentials are resolved, permissions are checked, and actions are taken on behalf of a specific human. That's the runtime. The nine capabilities above can only be enforced there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where each layer fits in the agent auth stack (IdP, OAuth vault, policy engine, MCP runtime)
&lt;/h2&gt;

&lt;p&gt;Understanding the vendor landscape means categorizing platforms by their strict architectural function. Misunderstanding where a tool fits in the stack leads to dangerous auth gaps.&lt;/p&gt;

&lt;p&gt;The deeper issue is consistency at scale. Even with the right primitives in place (an IdP, a token vault, a policy engine), most stacks have no uniform way to apply them across every agent, every user, and every system. Each team stitches its own integration, and two teams in the same company end up enforcing the same policy differently. The runtime is what makes a single authorization model enforceable across every agent, without each team rebuilding the plumbing.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Architectural layer&lt;/th&gt;
&lt;th&gt;Example vendors&lt;/th&gt;
&lt;th&gt;Primary function&lt;/th&gt;
&lt;th&gt;Key gap for multi-user agents&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Identity providers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Okta, Auth0, Entra, WorkOS, and Clerk&lt;/td&gt;
&lt;td&gt;Authenticate the human user into the application via OpenID Connect.&lt;/td&gt;
&lt;td&gt;Lacks the full agent authorization stack. Support for explicit delegation flows, such as RFC 8693 and sender-constraining via DPoP, varies significantly and often requires heavy custom actions. Audit covers authentication events, not per-tool-call agent actions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OAuth libraries and vaults&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Authlib, HashiCorp Vault, Doppler&lt;/td&gt;
&lt;td&gt;Securely store, encrypt, and manage raw OAuth tokens.&lt;/td&gt;
&lt;td&gt;Lacks a contextual decision engine, robust policy evaluation, and the dynamic, multi-provider refresh logic necessary for asynchronous agentic workflows. Audit captures token operations, not the user, agent, and tool context behind each call.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Policy engines and FGA platforms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Open Policy Agent, Cedar, Oso (Polar DSL), OpenFGA, WorkOS FGA, Zanzibar-style systems, SailPoint&lt;/td&gt;
&lt;td&gt;Evaluate fine-grained authorization policies against complex relationship graphs.&lt;/td&gt;
&lt;td&gt;Leaves token brokering, consent user experiences, and physical tool connectivity for the engineering team to build from scratch. Audit records the policy decision, not the full execution context that the resource server actually saw.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent frameworks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LangChain, Mastra, CrewAI&lt;/td&gt;
&lt;td&gt;Provide tool abstraction for agent workflows.&lt;/td&gt;
&lt;td&gt;Push the auth burden back onto your application code; treat tools like keys in a dotenv file and quietly break the moment a second customer signs up. No native audit trail for agent actions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP gateways and integration wrappers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Composio&lt;/td&gt;
&lt;td&gt;Connect language models to external tools using standardized interfaces.&lt;/td&gt;
&lt;td&gt;Designed for rapid prototyping and single-user proof-of-concept agents. An SDK-layer integration wrapper, not a runtime. Per-user OAuth is supported, but SSO, OIDC, and audit are limited rather than native, and the agent/user permission intersection isn't enforced.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP runtimes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade.dev&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;The first MCP runtime built for agent authorization. Delivers post-prompt user-specific permissions, isolated token lifecycle management (refresh, rotation, mismatch), OAuth protocol brokering, contextual access policy enforcement, and immutable per-action audit logs exportable via OpenTelemetry.&lt;/td&gt;
&lt;td&gt;Not applicable. This layer explicitly unifies the previous layers and fills their operational gaps.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Reference architectures for multi-user agent auth
&lt;/h2&gt;

&lt;p&gt;These capabilities only matter if you can map them to real architectures. The three patterns below show how an MCP runtime enforces multi-user authorization in production.&lt;/p&gt;

&lt;p&gt;The patterns assume the canonical multi-user setup: an agent application that authenticates users via its own identity provider (Stytch, Auth0, Okta, or Entra) and calls the runtime through its client SDK, passing the authenticated user_id on every tool call. The runtime is the backend that brokers OAuth, vaults tokens per user, and enforces policy. For MCP-client integrations like Copilot, Cursor, or Claude Desktop, the runtime's MCP gateway path is used instead, but the runtime semantics are the same.&lt;/p&gt;

&lt;p&gt;Two distinct auth flows run inside each pattern. &lt;strong&gt;Server-level auth&lt;/strong&gt; determines whether the agent application (an MCP client) can connect to the MCP server. &lt;strong&gt;Tool-level auth&lt;/strong&gt; governs whether the currently authenticated user can invoke a specific tool against this resource with these parameters right now. Server-level auth happens once per client-to-server connection. Tool-level auth runs on every tool call, and it's where the user verifier (Capability 8), just-in-time consent via URL Elicitation (Capability 7), and the permission intersection rule actually operate. Arcade's &lt;a href="https://docs.arcade.dev/en/learn/server-level-vs-tool-level-auth" rel="noopener noreferrer"&gt;Server-Level vs Tool-Level Authorization guide&lt;/a&gt; walks through the distinction in detail.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 1: internal productivity agent (Google Workspace)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Architectural flow:&lt;/strong&gt; Human User -&amp;gt; [OIDC Identity Provider] -&amp;gt; Agent Application -&amp;gt; MCP Runtime -&amp;gt; &lt;a href="https://docs.arcade.dev/en/resources/integrations" rel="noopener noreferrer"&gt;Gmail and Calendar MCP tools&lt;/a&gt; -&amp;gt; Google Workspace&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario:&lt;/strong&gt; An internal, Claude-based assistant organizes meetings and summarizes emails across a multi-user Google Workspace environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation:&lt;/strong&gt; The agent must never possess domain-wide delegation. Instead, the MCP runtime brokers a user-specific OAuth flow. The runtime requests delegated gmail.readonly and gmail.compose scopes, binding the resulting token strictly to the individual employee.&lt;/p&gt;

&lt;p&gt;On the user's first authorization, the runtime redirects the user's browser to a verifier route in the app. The verifier reads the flow_id, looks up the authenticated user from the OIDC session, and confirms the user_id back to the runtime. Only after the runtime matches the verifier-confirmed user_id against the user_id that started the flow does the OAuth grant proceed. From that point forward, the user's token is vaulted per provider and reused on subsequent calls without re-authorization.&lt;/p&gt;
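&lt;p&gt;The verifier check described above can be sketched in a few lines. This is a minimal illustration, not Arcade's implementation: PendingFlow, confirm_user, and the in-memory pending dictionary are hypothetical names, and a real verifier would read the session from the app's OIDC middleware and report back to the runtime over its API.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class PendingFlow:
    """Hypothetical record kept for an authorization flow that is in progress."""
    flow_id: str
    started_by: str  # user_id the app passed when the tool call began


class VerificationError(Exception):
    """Raised when the signed-in user cannot be matched to the flow."""


def confirm_user(pending: dict[str, PendingFlow], flow_id: str, session_user_id: str) -> PendingFlow:
    """Let the OAuth grant proceed only if the signed-in user started the flow."""
    flow = pending.get(flow_id)
    if flow is None:
        raise VerificationError("unknown or expired flow_id")
    if flow.started_by != session_user_id:
        # Someone other than the initiator opened the link: do not bind the token.
        raise VerificationError("session user does not match flow initiator")
    # Treat the flow as single use so the same link cannot be replayed.
    return pending.pop(flow_id)
```

&lt;p&gt;The important property is that the user_id comes from the app's own authenticated session, never from the URL or from the model.&lt;/p&gt;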

&lt;p&gt;When the agent attempts to read an inbox, the app passes the authenticated user_id from its session into the runtime SDK call. The runtime evaluates the policy engine, retrieves that specific user's token from the vault, and executes the call.&lt;/p&gt;

&lt;p&gt;If the agent hallucinates or receives a malicious prompt to send an email, it requests the gmail.send scope. The runtime catches this unauthorized request, pauses execution, and forces an out-of-band step-up approval to the user's device. A human explicitly authorizes the transmission, or it doesn't happen.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 2: multi-tenant Slack agent (workspace isolation)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Architectural flow:&lt;/strong&gt; Human User -&amp;gt; [OIDC Identity Provider] -&amp;gt; Agent Application -&amp;gt; MCP Runtime -&amp;gt; &lt;a href="https://docs.arcade.dev/en/resources/integrations/social/slack" rel="noopener noreferrer"&gt;Slack MCP tools&lt;/a&gt; -&amp;gt; Slack workspace&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario:&lt;/strong&gt; A business-to-business application deploys an agent that aggregates alerts and takes administrative actions across multiple customer Slack workspaces.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation:&lt;/strong&gt; Managing access across distinct corporate boundaries requires strict multi-tenant isolation. The runtime manages workspace-level OAuth installations, generating bot tokens combined with granular user-level channel permissions like chat:write and channels:history.&lt;/p&gt;

&lt;p&gt;The runtime uses RFC 8707 resource indicators, ensuring that tokens minted for Tenant A's Slack instance are bound to that tenant's audience and are not valid for any other resource.&lt;/p&gt;

&lt;p&gt;If an injection attack attempts to force the agent to read Tenant B's data using Tenant A's context, the policy engine rejects the cross-tenant token replay instantly. That prevents catastrophic cross-customer data leakage.&lt;/p&gt;
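&lt;p&gt;A rough sketch of that rejection, assuming the runtime looks up tokens by both tenant and audience before every call. VaultedToken, select_token, and the example resource URLs are hypothetical; the sketch shows the lookup rule only, not how a token is minted with an RFC 8707 resource parameter.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VaultedToken:
    tenant_id: str
    audience: str  # the resource this token was minted for
    value: str


def select_token(tokens: list[VaultedToken], tenant_id: str, resource: str) -> VaultedToken:
    """Return a token only when tenant and audience both match the request."""
    for token in tokens:
        if token.tenant_id == tenant_id and token.audience == resource:
            return token
    # No fallback to "any token that works": a mismatch is a hard denial.
    raise PermissionError(f"no token for tenant {tenant_id!r} and resource {resource!r}")


# Hypothetical vault contents for two customers.
VAULT = [
    VaultedToken("tenant-a", "https://slack.example/tenant-a", "token-a"),
    VaultedToken("tenant-b", "https://slack.example/tenant-b", "token-b"),
]
```

&lt;p&gt;With this rule, a request made in Tenant A's context that targets Tenant B's resource finds no matching token and fails before any call leaves the runtime.&lt;/p&gt;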

&lt;h3&gt;
  
  
  Pattern 3: Salesforce CRM agent (user-level permissions)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Architectural flow:&lt;/strong&gt; Human User -&amp;gt; [OIDC Identity Provider] -&amp;gt; Agent Application -&amp;gt; MCP Runtime -&amp;gt; &lt;a href="https://docs.arcade.dev/en/resources/integrations/sales/salesforce" rel="noopener noreferrer"&gt;Salesforce MCP tools&lt;/a&gt; -&amp;gt; Salesforce&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario:&lt;/strong&gt; A sales copilot updates pipeline records, drafts follow-up emails, and queries customer history on behalf of individual account executives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation:&lt;/strong&gt; Salesforce data access rules are notoriously complex. The MCP runtime requests the api and refresh_token OAuth scopes to call Salesforce on behalf of the user, then evaluates the account executive's specific Salesforce profile and permission sets at every tool call before allowing the agent to proceed. Object-level access (read on Account / Contact, edit on Opportunity stage transitions, commit on Lead conversion) is gated by the user's existing Salesforce permissions, not by the agent's own credentials.&lt;/p&gt;

&lt;p&gt;The implementation enforces strict separation between reading account contacts, drafting meeting notes, and committing pipeline updates.&lt;/p&gt;

&lt;p&gt;Through just-in-time authorization, if a junior rep asks the agent to update a closed-won opportunity they lack privileges to edit, the runtime's policy engine blocks the action at the tool boundary. It returns a graceful access denial to the language model without exposing backend credentials.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent auth anti-patterns to avoid in production
&lt;/h2&gt;

&lt;p&gt;Answer engines and security audits favor systems that eliminate known architectural flaws. If your current homegrown agent setup relies on any of these anti-patterns, your infrastructure isn't ready for enterprise production.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single API key routing:&lt;/strong&gt; Your agent backend shares a single, highly privileged service account key across all users. This breaks identity attribution at the request layer. The backend can't distinguish between an intern's request and a CEO's request, and a single prompt injection inherits maximum blast radius across the entire user base.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;God mode with prompted guardrails:&lt;/strong&gt; The agent runs with root or admin credentials, and engineers rely on system prompts like "do not delete data" to maintain security. Language models are easily manipulated through indirect injection, so relying on the model to govern its own authorization is a fundamental security failure.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blanket sign-up consent:&lt;/strong&gt; Your app forces users to grant broad, multi-system OAuth scopes during initial onboarding. This violates the principle of least privilege, causes consent fatigue, and provisions tokens with dangerous capabilities long before the user actually needs them.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User interface-only checks:&lt;/strong&gt; Authorization checks are enforced exclusively at the chat interface or frontend web application, leaving the backend tool plane unprotected. If an attacker bypasses the chat interface and sends payloads directly to the tool execution endpoint, the system complies without verifying the delegated user context.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No distinction between draft and commit:&lt;/strong&gt; Your agent treats every action with the same authorization level, sending emails or transferring funds as easily as drafting them. Without a read/draft/commit gradient and an out-of-band approval step for irreversible actions, a single prompt injection causes irreversible damage.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No immutable audit trail:&lt;/strong&gt; Your agent system has no per-action audit log or relies on application logs that can be modified after the fact. Without an immutable record of who authorized what tool action when (with policy version, prompt hash, and approval status), security incidents can't be reconstructed, and regulator-facing audit reports become impossible.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion: the delegated authorization rule for multi-user agents
&lt;/h2&gt;

&lt;p&gt;The transition to production-grade, multi-user AI agents demands a fundamental shift in how we architect security. The entire philosophy of agent authorization boils down to one strict rule:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This specific agent may perform this specific action on this specific resource, for this specific user, in this specific tenant, for this specific task, for a strictly limited period of time.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If your current infrastructure can't cryptographically enforce and audit that exact sentence from the chat prompt down to the backend API layer, your system isn't ready for multi-user production in 2026.&lt;/p&gt;

&lt;p&gt;A gateway can't enforce that rule. A runtime can.&lt;/p&gt;

&lt;p&gt;Before you commit to a runtime, do three things. First, audit your current identity mapping to confirm your backend systems actually model the user, agent, and context tuple on every tool call. Second, stop building bespoke OAuth plumbing: refresh logic, just-in-time consent user interfaces, and multi-tenant token vaulting are undifferentiated technical debt your engineers shouldn't be writing. Third, test the intersection rule aggressively by sending malicious prompts against your own agents to verify that your policy engine intercepts them at the tool boundary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade is the first MCP runtime purpose-built for agent authorization&lt;/a&gt;, handling per-user OAuth, just-in-time consent, token vaulting, policy intersection, and immutable audit as native capabilities, not bolt-on plugins. The nine capabilities above are unified under one control plane, alongside Arcade's agent-optimized tool catalog and lifecycle governance, so your engineering teams can focus on shipping high-value agent logic instead of maintaining fragile identity plumbing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What's the best way to manage multi-user AI agent authentication and authorization in 2026?
&lt;/h3&gt;

&lt;p&gt;Treat every tool call as delegated user access, not agent-owned access. Implement a two-identity model (the agent application and the user on whose behalf the action is taken), bind every call to a delegated execution context, and enforce the intersection rule via OAuth 2.1 delegated tokens, a policy engine in front of tools, short-lived scoped tokens, and immutable audit logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the two-identity model for agent authorization?
&lt;/h3&gt;

&lt;p&gt;Every request carries two identities: the project-level key (the agent application making the call) and the user-level identity (the human on whose behalf the action is taken). The runtime evaluates these two identities against a delegated execution context, a bounded binding that ties a specific user to a specific agent for a specific task, so the backend can attribute and constrain every action.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the "intersection rule," and why does it matter?
&lt;/h3&gt;

&lt;p&gt;The agent's effective permissions must be the intersection of the user's permissions and the agent's allowed capabilities. Never the union. This rule prevents "confused deputy" failures where an injected prompt causes the agent to misuse broad system access.&lt;/p&gt;

&lt;h3&gt;
  
  
  How should OpenID Connect and OAuth 2.1 be used together for agents?
&lt;/h3&gt;

&lt;p&gt;Use OpenID Connect to authenticate the human user (who they are). Use OAuth 2.1 to authorize the agent's tool calls (what the agent can do on the user's behalf) with scoped, audience-bound tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do you prevent prompt injection from turning into tool misuse?
&lt;/h3&gt;

&lt;p&gt;Don't rely on prompts for security. Route every tool call through a policy enforcement layer that checks user/agent/context, scopes, tenant, and resource. Use short-lived, audience-bound tokens so even a successful injection can't pivot across systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which token properties are required for secure delegated-agent access?
&lt;/h3&gt;

&lt;p&gt;Tokens should be short-lived, scoped, and audience-bound (so they can't be replayed against other APIs). For stronger replay resistance, use sender-constrained tokens (e.g., DPoP) so stolen tokens are unusable without the client key.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do you handle OAuth refresh tokens safely for thousands of users?
&lt;/h3&gt;

&lt;p&gt;Store tokens in a per-user, per-provider encrypted vault and automate refresh/rotation outside the LLM. This prevents secrets from leaking into prompts and prevents provider-specific refresh edge cases from breaking agent workflows.&lt;/p&gt;
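&lt;p&gt;As a minimal sketch of that layout, the vault below keys tokens by (user_id, provider) and refreshes them shortly before expiry. Everything here is an assumption for illustration: TokenVault, StoredToken, and the injected refresh callable are hypothetical, and at-rest encryption and rotation of the refresh token itself are left out for brevity.&lt;/p&gt;

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StoredToken:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp


class TokenVault:
    """In-memory stand-in for a per-user, per-provider token store."""

    def __init__(self, refresh: Callable[[str, str], StoredToken], skew_seconds: float = 60.0):
        self._tokens: dict[tuple[str, str], StoredToken] = {}
        self._refresh = refresh  # provider-specific refresh call, injected by the caller
        self._skew = skew_seconds

    def put(self, user_id: str, provider: str, token: StoredToken) -> None:
        self._tokens[(user_id, provider)] = token

    def access_token(self, user_id: str, provider: str, now: Optional[float] = None) -> str:
        """Return a usable access token, refreshing it if it is about to expire."""
        key = (user_id, provider)
        token = self._tokens[key]  # KeyError means the user never authorized this provider
        current = time.time() if now is None else now
        if current >= token.expires_at - self._skew:
            token = self._refresh(provider, token.refresh_token)
            self._tokens[key] = token
        return token.access_token
```

&lt;p&gt;Only the tool executor calls access_token; the returned value is attached to the outbound request and never placed in the model's context.&lt;/p&gt;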

&lt;h3&gt;
  
  
  When should an agent require step-up approval or human confirmation?
&lt;/h3&gt;

&lt;p&gt;Require step-up approval for irreversible or high-impact actions (e.g., sending an external email, deleting records, committing code, or transferring funds). Let the agent read and draft with lower friction, but gate "commit" actions via an out-of-band confirmation flow.&lt;/p&gt;
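&lt;p&gt;One way to express the read/draft/commit gradient is a tier lookup in front of tool execution. The sketch below is illustrative only: the tool names, the ActionTier enum, and the gate function are hypothetical, and the out-of-band approval itself (push notification, consent URL, and so on) is represented by a boolean.&lt;/p&gt;

```python
from enum import Enum


class ActionTier(Enum):
    READ = "read"
    DRAFT = "draft"
    COMMIT = "commit"


# Hypothetical tool names mapped to the tier of the action they perform.
TOOL_TIERS = {
    "mail.list_messages": ActionTier.READ,
    "mail.create_draft": ActionTier.DRAFT,
    "mail.send": ActionTier.COMMIT,
    "records.delete": ActionTier.COMMIT,
}


def gate(tool: str, approved_out_of_band: bool) -> str:
    """Decide whether a tool call may run now or must wait for human approval."""
    # Tools missing from the table get the most restrictive tier by default.
    tier = TOOL_TIERS.get(tool, ActionTier.COMMIT)
    if tier is ActionTier.COMMIT and not approved_out_of_band:
        return "needs_approval"
    return "execute"
```

&lt;p&gt;Defaulting unknown tools to the commit tier means a newly added tool cannot skip approval just because nobody classified it.&lt;/p&gt;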

&lt;h3&gt;
  
  
  What is just-in-time authorization for AI agents?
&lt;/h3&gt;

&lt;p&gt;The agent requests new scopes or system access only when needed for a specific task. The runtime pauses, collects granular consent, mints a downscoped token, and resumes. This reduces over-permissioning and consent fatigue.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is MCP URL Elicitation?
&lt;/h3&gt;

&lt;p&gt;URL Elicitation is a Specification Enhancement Proposal authored by &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade.dev&lt;/a&gt; with Anthropic and &lt;a href="https://modelcontextprotocol.io/specification/2025-11-25/client/elicitation" rel="noopener noreferrer"&gt;accepted into the Model Context Protocol spec&lt;/a&gt;. It defines how an MCP runtime returns a granular, context-specific consent URL to the user mid-task when the agent needs a new scope or system, allowing the user to authorize the request out of band before the runtime resumes execution. URL Elicitation is the standardized mechanism behind just-in-time agent authorization.&lt;/p&gt;

&lt;h3&gt;
  
  
  What should be included in an audit log for agent tool calls?
&lt;/h3&gt;

&lt;p&gt;Log the user identity, agent identity, tenant, tool/action/resource, policy decision, timestamp, and a prompt or request hash. Make logs immutable and exportable via OpenTelemetry-compatible formats for incident response and compliance.&lt;/p&gt;
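&lt;p&gt;A minimal sketch of such a record, assuming a JSON-lines log. The field names and the audit_record helper are hypothetical, and the sketch covers only the record shape: immutability (for example, append-only storage) and OpenTelemetry export are outside its scope.&lt;/p&gt;

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(user_id: str, agent_id: str, tenant_id: str, tool: str,
                 resource: str, decision: str, prompt: str) -> str:
    """Build one JSON line describing a single tool-call decision."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "agent_id": agent_id,
        "tenant_id": tenant_id,
        "tool": tool,
        "resource": resource,
        "decision": decision,  # for example "allow", "deny", "needs_approval"
        # Store a hash rather than the prompt so the log carries no user content.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    }
    return json.dumps(record, sort_keys=True)
```

&lt;p&gt;Hashing the prompt lets an investigator confirm which request triggered an action without copying message content into the audit store.&lt;/p&gt;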

</description>
      <category>mcp</category>
      <category>agents</category>
      <category>security</category>
      <category>identity</category>
    </item>
    <item>
      <title>Should you build or buy an MCP runtime for enterprise AI agents in 2026?</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Wed, 13 May 2026 21:21:22 +0000</pubDate>
      <link>https://dev.to/arcade/should-you-build-or-buy-an-mcp-runtime-for-enterprise-ai-agents-in-2026-36jg</link>
      <guid>https://dev.to/arcade/should-you-build-or-buy-an-mcp-runtime-for-enterprise-ai-agents-in-2026-36jg</guid>
      <description>&lt;p&gt;The engineering bottleneck for enterprise AI has shifted. Your team has built agents. They work in single-user environments on LangChain or Mastra. The wall hits when you try to wire those agents into secure enterprise systems for thousands of employees without creating new security exposure or a permanent maintenance load.&lt;/p&gt;

&lt;p&gt;In 2026, engineering directors face a real architectural decision, and it isn't whether to write custom Model Context Protocol (MCP) servers. Custom MCP servers are how you connect agents to proprietary internal systems, regardless of which path you choose. The actual decision is whether you also build the runtime layer that wraps those servers: OAuth lifecycle, credential vaulting, multi-user auth, permission intersection logic, audit pipeline, policy enforcement, and observability. Build that layer yourself on top of LangChain or Mastra, or buy an MCP runtime that delivers it off the shelf.&lt;/p&gt;

&lt;p&gt;The right answer depends on your deployment profile. Once multi-user authorization, audit-grade governance, or asynchronous tool-call observability enter the picture, the build path incurs increasing costs and a growing risk surface. Maintaining your own auth, credential vaulting, and audit pipeline puts every agent action inside your security blast radius. At that point, the decision favors buying a runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;TL;DR: Build vs. buy MCP runtime&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;An MCP runtime handles the work most teams have no business writing themselves: agent authorization, OAuth token rotation, audit logging, and policy enforcement. The runtime is the execution, authorization, and governance layer where your agent's tools (MCP servers) run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you build your own runtime.&lt;/strong&gt; Three narrow profiles fit this path: single-user scope, agent infrastructure as your core product, or all-internal API pipelines. You retain full control and assume responsibility for the OAuth lifecycle, credential vaulting, audit logging, and policy enforcement. Each integration becomes a permanent line item on your engineering roadmap; auth and policy maintenance never go to zero.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you buy a runtime.&lt;/strong&gt; This is the default for multi-user production. You get centralized lifecycle governance that maps to your existing policies, multi-user authorization with full OAuth lifecycle management, tool execution, and a path to build proprietary tools without rebuilding the runtime layer.&lt;/p&gt;

&lt;p&gt;Four tipping points force a transition from a self-built runtime layer to a vendor-provided one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Crossing the three-integration threshold, where API maintenance starts consuming dedicated sprints.
&lt;/li&gt;
&lt;li&gt;Introducing user-delegated actions, requiring agents to execute tool calls on behalf of specific human users with distinct permissions.
&lt;/li&gt;
&lt;li&gt;Moving from synchronous read-only tasks to asynchronous, long-running read/write operations that break standard LLM timeouts.
&lt;/li&gt;
&lt;li&gt;Needing OpenTelemetry-compatible audit logs to satisfy compliance and security teams.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The state of MCP infrastructure: Config hell vs. the buy tradeoff&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol has standardized how AI applications consume context and execute tools, replacing the bespoke API wrappers teams used to write for every LLM feature.&lt;/p&gt;

&lt;p&gt;Adopting MCP introduces architectural challenges. Enterprise platform teams choose between two operational burdens: the DIY trap of "config hell," or the buy-side tradeoff of vendor cadence and ecosystem dependency.&lt;/p&gt;

&lt;p&gt;Config hell happens when you scale bespoke MCP servers. Platform engineers spend their time editing JSON configurations to re-map tool schemas every time an upstream SaaS provider deprecates an endpoint, chasing token rotation drift when an OAuth refresh expires and the custom retry logic doesn't handle the edge case, and handling the manual work that SOC 2 and GDPR compliance requires (immutable schema registries, signed tool manifests, middleware to redact PII from tool outputs). When you build your own infrastructure, you own every broken connection, every expired token, and every security patch.&lt;/p&gt;

&lt;p&gt;The runtime is not an additional proxy in front of your tools. In an agentic architecture, &lt;a href="https://www.arcade.dev/white-papers/why-mcp-needs-a-runtime" rel="noopener noreferrer"&gt;the agent is already the proxy&lt;/a&gt;. It mediates between the user and downstream systems, reasons about which tools to call, and orchestrates multi-step workflows. The runtime is the execution layer where the chosen action actually runs. It is where credentials are resolved, policy is enforced, and the call is made on behalf of a specific user.&lt;/p&gt;

&lt;p&gt;The runtime is the best gateway.&lt;/p&gt;

&lt;p&gt;The real buy-side tradeoffs are different. You accept the runtime's policy primitives and observability format as lock-in. You take on overhead from per-tool authorization checks and just-in-time token resolution, which is a fraction of LLM inference and downstream API latency.&lt;/p&gt;

&lt;p&gt;The real choice in 2026 is risk, not cost. Build your own runtime layer, and your security blast radius scales with every integration, user, and policy change. Buying a runtime moves that work to a vendor that has already been audited for it. For enterprise deployments, that is the safer side of the tradeoff.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;When to build your own runtime&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Building your own runtime layer is the right call in a narrow set of scenarios. The open-source ecosystem has matured enough that deep platform engineering teams can stand up their own orchestration layer on top of the &lt;a href="https://modelcontextprotocol.io/docs/sdk" rel="noopener noreferrer"&gt;official Model Context Protocol Python or TypeScript SDKs&lt;/a&gt;. The SDKs implement the MCP specification over JSON-RPC 2.0 and support both stdio for local process communication and Streamable HTTP for remote execution. Teams wrap MCP servers in &lt;a href="https://github.com/langchain-ai/langchain-mcp-adapters" rel="noopener noreferrer"&gt;adapters provided by frameworks like LangChain&lt;/a&gt; or Mastra so agents can invoke them directly, then deploy on &lt;a href="https://kubernetes.io/" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt; using custom Helm charts.&lt;/p&gt;

&lt;p&gt;The MCP servers themselves then become the easy part. The runtime layer that wraps them is the actual work, and the cases where building that layer in-house makes sense are narrow.&lt;/p&gt;

&lt;p&gt;Build your own runtime if you have a single-user scope. Per-user OAuth, token vaulting, and permission intersection are the hardest parts of the runtime layer, and they matter only once more than one human is involved. A solo developer connecting their own credentials to a single agent does not need them.&lt;/p&gt;

&lt;p&gt;Build your own runtime if the agent infrastructure is your core product. A startup whose entire product is a smart scheduling agent for end users must control every layer of the stack. The engineers should be deep into this work because it is the company.&lt;/p&gt;

&lt;p&gt;Build your own runtime if you own every API in the pipeline. If your agents act only on systems and data sources you control, with no third-party SaaS connections, you bypass the OAuth-lifecycle problem entirely, and the case for buying weakens.&lt;/p&gt;

&lt;p&gt;Air-gapped deployments are not a build trigger. They are a deployment-mode question. Self-hosted runtimes run the vendor's runtime layer entirely inside your infrastructure, satisfying the air-gap while inheriting auth, audit, and governance from the runtime. Build your own runtime layer only when the deployment also prohibits third-party vendor software, which typically applies to highly classified environments.&lt;/p&gt;

&lt;p&gt;Outside those three cases, building your own runtime is a misallocation of senior engineering time.&lt;/p&gt;

&lt;p&gt;Beyond the MCP servers themselves, you build secure token vaults to manage OAuth refresh lifecycles for each user and service. You handle provider-specific rate limits and pagination. You architect state machines for asynchronous debugging when a tool call takes ten minutes to execute. You patch custom servers every time an upstream API changes its schema. Skip that work, and you get agent hallucination and silent failures.&lt;/p&gt;

&lt;p&gt;Auth and policy carry their own ongoing burden, separate from API drift. People join and leave the company. Roles change. Permissions get revoked. Policies tighten after an incident. Each event has to flow through your custom auth layer in real time. This is a permanent FTE cost, not a build-once-leave-alone problem, and it never decreases as the deployment grows.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;When to buy a runtime&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;An MCP runtime shifts engineering effort from infrastructure to product. Your team operates on top of an execution layer that already handles auth, vaults, audit, and policy, instead of building each one.&lt;/p&gt;

&lt;p&gt;A runtime gives you four things off the shelf.&lt;/p&gt;

&lt;p&gt;Centralized lifecycle governance. The runtime is the enforcement point for the policies your organization has already defined elsewhere (in your IdPs, your sales tools, your security systems). It maps to those existing policies and enforces them at the agent layer. It does not ask you to recreate access policies inside a new tool. Administrators get a single control plane to manage agent behavior, audit tool execution, and roll out updates safely across the organization.&lt;/p&gt;

&lt;p&gt;Multi-user post-prompt authorization. Every tool call executes using the credentials and permissions of the human user requesting the action. The runtime handles the OAuth token lifecycle (secure vaulting, refresh, rotation) without exposing credentials to the LLM.&lt;/p&gt;

&lt;p&gt;A catalog of pre-built, version-controlled MCP tools, so your agents reach thousands of enterprise systems on day one.&lt;/p&gt;

&lt;p&gt;A path for proprietary tools that doesn't require rebuilding the runtime layer. When you need custom MCP servers for internal systems, you write them on the runtime's open-source MCP framework and inherit auth, audit, and governance for free. If you already have custom MCP servers built without the framework, you can connect them to the runtime and still get the same auth, audit, governance, and pre/post-call policy hooks without rewriting them.&lt;/p&gt;

&lt;p&gt;Platform engineers shift from writing brittle integration scripts and debugging broken OAuth flows to managing high-level access policies. Your team defines which agents can access which tools, sets up visibility filtering so specific teams only see permitted integrations, and monitors OpenTelemetry-compatible dashboards to track agent reasoning and tool execution latency. You spend time on the agent's logic, not the plumbing.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Enterprise MCP scorecard: Decision criteria for build vs. buy&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Eight dimensions separate a local prototype from a production deployment. The matrix scores each lane against them.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Evaluation dimension&lt;/th&gt;
&lt;th&gt;DIY runtime layer (open-source SDKs)&lt;/th&gt;
&lt;th&gt;Vendor MCP runtime&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Control &amp;amp; customization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Absolute. Full control over transport layers, custom memory state, and bespoke hardware isolation.&lt;/td&gt;
&lt;td&gt;High. Standardized tool execution with hooks for custom policies, but limited underlying infrastructure access.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Weeks to months. Requires building auth layers, token vaults, and infrastructure deployment pipelines.&lt;/td&gt;
&lt;td&gt;Hours to days. Drop-in integration with existing IdPs and immediate access to pre-built tool catalogs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance burden&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Severe. Team owns all API schema updates, deprecations, token rotation logic, and security patches. The work compounds with every integration and every policy change.&lt;/td&gt;
&lt;td&gt;Minimal. The vendor absorbs API drift, token lifecycle work, and security patching. Your team manages access policies and visibility, not infrastructure.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-user authorization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual implementation. High risk of prompt injection and credential leakage if built incorrectly.&lt;/td&gt;
&lt;td&gt;Built-in. Automated just-in-time token issuance, scoped per user and isolated from the LLM.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lifecycle governance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fragmented. Requires custom logging middleware, disparate SIEM integrations, and manual version control.&lt;/td&gt;
&lt;td&gt;Centralized. Unified control plane, OpenTelemetry-native audit logs, and shadow MCP prevention.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Async task handling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complex. Requires building external polling, dead-letter queues, and durable state machines for timeouts.&lt;/td&gt;
&lt;td&gt;Native. Parallelized execution, automatic failover, intelligent retries, and decoupled result fetching.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment options&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Infinite. Deploy anywhere, including fully air-gapped, offline environments.&lt;/td&gt;
&lt;td&gt;Cloud, self-hosted on-prem or in cloud (vendor enterprise tier), hybrid, or fully air-gapped. Cloud requires network egress to the vendor control plane; self-hosted runs the runtime entirely in your own infrastructure.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best-fit team profile&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single-user scope, agent infrastructure is your core product, you own every API in the pipeline.&lt;/td&gt;
&lt;td&gt;Multi-user production, mixed proprietary plus SaaS requirements, teams optimizing for time-to-value and audit-grade governance.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Multi-user authorization in production&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Multi-user authorization is where most enterprise agent projects stall before production.&lt;/p&gt;

&lt;p&gt;A developer testing locally passes their personal API keys to the system. In production, an agent serves thousands of users with different permission scopes.&lt;/p&gt;

&lt;p&gt;If your runtime layer relies on a shared service account or forwards a user's full-scope bearer token to the LLM context, you've created an attack vector. A &lt;a href="https://genai.owasp.org/llmrisk/llm01-prompt-injection/" rel="noopener noreferrer"&gt;prompt injection attack instructs the agent to use those inherited permissions&lt;/a&gt; to exfiltrate data or delete repositories.&lt;/p&gt;

&lt;p&gt;Shared service accounts also break audit-trail requirements: systems can't tell an autonomous-agent action apart from a human-directed one.&lt;/p&gt;

&lt;p&gt;A runtime solves this with &lt;a href="https://www.arcade.dev/blog/why-agents-dont-need-non-human-identity" rel="noopener noreferrer"&gt;multi-user, post-prompt authorization&lt;/a&gt;. The runtime enforces a permission intersection at execution time:&lt;/p&gt;

&lt;p&gt;Agent Permissions ∩ User Permissions = Effective Action Scope&lt;/p&gt;

&lt;p&gt;The agent can only execute an action if both the agent's role policy and the user's native SaaS permissions allow it. Every other combination is denied.&lt;/p&gt;

&lt;p&gt;For example, an HR agent scoped to recruiting tasks is invoked by an employee with admin privileges in Workday, including access to global payroll data. When the agent attempts to read payroll, the runtime evaluates the intersection at call time and denies the request. The user has the authority. The agent's restricted scope blocks the action.&lt;/p&gt;
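&lt;p&gt;The HR example can be written as a set intersection. This is a sketch of the rule only: the permission strings are hypothetical, and a real runtime would resolve the user's side from the downstream system rather than from a hard-coded set.&lt;/p&gt;

```python
def effective_scope(agent_permissions: set[str], user_permissions: set[str]) -> set[str]:
    """Agent Permissions intersected with User Permissions."""
    return agent_permissions.intersection(user_permissions)


def is_allowed(action: str, agent_permissions: set[str], user_permissions: set[str]) -> bool:
    """An action runs only if both the agent policy and the user allow it."""
    return action in effective_scope(agent_permissions, user_permissions)


# Hypothetical permission names for the HR scenario.
HR_AGENT = {"recruiting.read", "recruiting.write"}
WORKDAY_ADMIN = {"recruiting.read", "recruiting.write", "payroll.read"}
```

&lt;p&gt;Here is_allowed("payroll.read", HR_AGENT, WORKDAY_ADMIN) evaluates to False even though the admin holds that permission, because the agent's policy does not include it.&lt;/p&gt;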

&lt;p&gt;The runtime acquires a tightly scoped, just-in-time token to execute the allowed action on behalf of the user. The credentials never reach the LLM client, which removes prompt injection as a direct credential-theft vector.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Lifecycle governance and audit&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Without centralized governance, enterprise agent deployments turn into shadow IT. Developers spin up rogue MCP servers on local machines or unauthorized cloud instances, connecting LLMs to internal databases without oversight.&lt;/p&gt;

&lt;p&gt;A runtime acts as the central enforcement point for the policies your organization has already defined elsewhere. It maps to your IdPs, your sales tools, and your security systems, and enforces what's there. It does not ask you to recreate access policies inside a new tool. Think of the runtime as the bouncer: it enforces, it doesn't author. All tools and servers are registered in a single catalog. Visibility filtering ensures that an HR agent sees only HR-related tools, while a coding agent sees only repository tools.&lt;/p&gt;

&lt;p&gt;Beyond enforcing what's already defined, the runtime exposes pre- and post-tool-call hooks for custom logic. Compliance teams drop in their own variables (workflow state, time windows, request volume, contextual data on the user or session), and the runtime treats those as first-class enforcement primitives alongside standard policies. Organization-specific conditions get wired in without forking the runtime.&lt;/p&gt;

&lt;p&gt;The runtime generates fine-grained, &lt;a href="https://opentelemetry.io/" rel="noopener noreferrer"&gt;OpenTelemetry&lt;/a&gt;-compatible audit logs. Every action is tracked: which user prompted the agent, which model generated the tool call, what parameters were passed, and what the downstream API returned. That visibility is a prerequisite for passing security reviews in regulated industries.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Async and long-running tasks&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Standard LLM architectures are synchronous. Inference endpoints time out within minutes.&lt;/p&gt;

&lt;p&gt;Enterprise agent actions, such as triggering CI/CD pipeline builds, provisioning cloud infrastructure, or querying large data warehouses, can run for tens of minutes or hours.&lt;/p&gt;

&lt;p&gt;In a DIY runtime, platform engineers build the asynchronous scaffolding themselves: job queues, external-memory state synchronization, polling mechanisms, and dead-letter queues for failed operations.&lt;/p&gt;

&lt;p&gt;A runtime handles this work. It supports the &lt;a href="http://modelcontextprotocol.io/specification/2025-11-25/basic/utilities/tasks" rel="noopener noreferrer"&gt;latest MCP Tasks specifications&lt;/a&gt;, so agents trigger a long-running process, receive a task identifier immediately, and poll for the result asynchronously.&lt;/p&gt;

&lt;p&gt;The runtime handles parallelized execution, failover routing when an endpoint drops, and backoff retries. The agent workflow stays durable without the application layer managing state.&lt;/p&gt;
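
&lt;p&gt;The trigger, receive-an-identifier, and poll pattern described above can be sketched as follows. The &lt;code&gt;FakeTaskBackend&lt;/code&gt; class is an in-memory stand-in invented for this example, and the state names and delay values are illustrative assumptions; a real client would call the runtime instead.&lt;/p&gt;

```python
import itertools
import time


class FakeTaskBackend:
    """In-memory stand-in for a runtime that runs long jobs (hypothetical, for illustration)."""

    def __init__(self, polls_until_done: int):
        self._ids = itertools.count(1)
        self._remaining = {}
        self._polls_until_done = polls_until_done

    def start(self, tool: str, args: dict) -> str:
        task_id = f"task-{next(self._ids)}"
        self._remaining[task_id] = self._polls_until_done
        return task_id  # returned immediately; the job continues in the background

    def status(self, task_id: str) -> dict:
        self._remaining[task_id] -= 1
        if self._remaining[task_id] > 0:
            return {"state": "working"}
        return {"state": "completed", "result": "build succeeded"}


def wait_for_task(backend, task_id, base_delay=0.01, max_delay=0.1, max_polls=20, sleep=time.sleep):
    """Poll until the task completes, doubling the delay after each attempt up to max_delay."""
    delay = base_delay
    for _ in range(max_polls):
        status = backend.status(task_id)
        if status["state"] == "completed":
            return status["result"]
        sleep(delay)
        delay = min(delay * 2, max_delay)
    raise TimeoutError(f"{task_id} did not finish within {max_polls} polls")


backend = FakeTaskBackend(polls_until_done=3)
task_id = backend.start("ci.trigger_build", {"branch": "main"})  # hypothetical tool name
print(task_id, wait_for_task(backend, task_id))
```

&lt;p&gt;Capping the delay keeps polling responsive for jobs that finish late, while the poll limit stops a stuck task from blocking the agent indefinitely.&lt;/p&gt;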

&lt;h3&gt;
  
  
  &lt;strong&gt;Observability: end-to-end OpenTelemetry traces&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The hidden cost of DIY MCP stacks is debugging. When an agent fails a tool call at 3 a.m., platform engineers stitch together traces from the agent run, each MCP server's logs, each provider SDK's retry logs, and each target SaaS API's status page. There is no correlated view. Investigating one failed async action means grepping across several systems in parallel and reconstructing the sequence by hand.&lt;/p&gt;

&lt;p&gt;A runtime emits a single OpenTelemetry trace that carries the full chain. An example span tree for one agent action ("schedule a follow-up meeting and send the recap"):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;agent.run (root)                 user_id, session_id, agent_id
├─ llm.infer                     model, prompt_tokens, completion_tokens
├─ mcp.tool_call                 tool=google_calendar.create_event
│  ├─ mcp.authz                  policy_result=allow, user_scope=calendar.events.write
│  ├─ mcp.oauth.refresh          token_id, refresh_outcome=ok
│  └─ mcp.http.execute           target_host, status=200, latency_ms=412
├─ mcp.tool_call                 tool=gmail.send
│  ├─ mcp.authz                  policy_result=allow
│  ├─ mcp.retry                  attempt=2, reason=rate_limited
│  └─ mcp.http.execute           status=202, latency_ms=890
└─ llm.infer                     result synthesis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Export that trace to &lt;a href="https://www.honeycomb.io/" rel="noopener noreferrer"&gt;Honeycomb&lt;/a&gt;, Datadog, or your SIEM, and you can answer "which user, agent, tool, policy, token, or retry caused the failure?" in one view. DIY gets you there only if you build the trace-correlation layer yourself and maintain it as SDKs, provider log formats, and policy engines evolve. That maintenance is a direct cost on your DIY stack, and it goes away when you adopt a runtime that emits agent-to-tool traces natively.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Operational burden of building&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The operational burden of a DIY runtime layer compounds with every integration and every policy change. Initial development is the smallest part of the work. Most of the engineering effort lands after launch in API deprecations, schema changes, OAuth token rotation, security patching, and the auth and policy churn that grows with every user, every role change, and every revoked permission.&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://ksingh7.medium.com/your-mcp-server-will-break-production-heres-how-to-stop-it-35574631a665" rel="noopener noreferrer"&gt;production post-mortem of custom MCP servers&lt;/a&gt; documents the typical failure chain: auth drift, orphaned session state, brittle retries, silent tool hallucinations. Each failure costs senior engineering capacity to diagnose and remediate, on a timeline that doesn't compress.&lt;/p&gt;

&lt;p&gt;Senior engineers building a DIY runtime spend their time on OAuth refresh scripts and incident-response patches. Senior engineers using a runtime spend their time on proprietary agent logic and domain-specific workflows. The differences compound across every team and every quarter.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How to evaluate MCP runtime vendors in 2026&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Buying a runtime starts with picking the right vendor. The MCP infrastructure market has been segmented into three rough categories. Gateways route MCP traffic. Registries catalog MCP servers. Runtimes handle execution, authorization, and governance. Different vendors cover different layers. Most cover one. Some bundle two. The &lt;a href="https://www.arcade.dev/blog/mcp-gateways-runtimes-registries-guide/" rel="noopener noreferrer"&gt;breakdown of MCP gateways, runtimes, and registries&lt;/a&gt; shows where specific vendors stack up across the three categories.&lt;/p&gt;

&lt;p&gt;Within the runtime category, evaluate vendors against four capabilities:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Centralized lifecycle governance.&lt;/strong&gt; Does the runtime enforce the policies your organization has already defined elsewhere (IDPs, sales tools, security systems), or does it ask you to recreate them in a new tool? Look for one control plane with audit logs, version control, and visibility filtering across every agent and tool.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-user post-prompt authorization.&lt;/strong&gt; Does the runtime evaluate per-user, per-action permissions at execution time, or does it pass through a shared service account? Per-user OAuth, with credentials isolated from the LLM, is the bar.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-optimized tools, plus a path for proprietary ones.&lt;/strong&gt; Are the tools intent-translating, or are they raw API wrappers that make the agent fill in object IDs and enums? Does the vendor offer an open-source framework that lets you build custom MCP servers for internal systems and inherit auth, audit, and governance without rebuilding the runtime layer?
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom policy hooks for contextual access.&lt;/strong&gt; Can your compliance team add organization-specific logic (workflow state, time windows, request volume, contextual data on the user or session) as first-class enforcement primitives, without forking the runtime?&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How Arcade delivers on each&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Arcade is the MCP runtime. It delivers all four capabilities in a single layer for multi-user AI agents at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent lifecycle governance.&lt;/strong&gt; Arcade is the central enforcement point for the policies your organization has already defined. It maps to and enforces policies from your IDPs, sales tools, and security systems. It does not ask you to recreate access policies inside a new tool. One control plane for every tool, agent, and auth provider. Version control to safely roll out tool upgrades. A shared registry that prevents teams from rebuilding what already exists. Visibility filtering so agents only see tools their user is permitted to invoke. Fine-grained audit logs, OpenTelemetry-exportable to your SIEM, that track every agent action per user and per service. Arcade's SOC 2 Type 2 certification validates these controls through an independent audit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent authorization.&lt;/strong&gt; Every MCP request in Arcade carries two identity layers: a project-level key (which application is making the request) and a user-level identity (on whose behalf the action is taken). Arcade evaluates the intersection of agent and user permissions dynamically at runtime to prevent privilege escalation. It handles the full OAuth lifecycle (refresh, rotation, mismatch) with credentials isolated from the LLM, and hooks into existing enterprise identity governance systems like Okta, Entra, and SailPoint to enforce policies the enterprise has already defined rather than duplicating them. That is the layer that removes prompt injection as a direct credential-theft vector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent-optimized tools.&lt;/strong&gt; The tools in Arcade's &lt;a href="https://www.arcade.dev/tools" rel="noopener noreferrer"&gt;catalog of over 8,000 agent-optimized MCP tools&lt;/a&gt; are not API wrappers. They translate natural-language intent into structured API calls, so an agent asked to "send this to Finance" does not have to hallucinate the target &lt;code&gt;recipient_user_id&lt;/code&gt;. The token savings show up in &lt;a href="https://www.arcade.dev/blog/attio-mcp-toolkit-benchmark/" rel="noopener noreferrer"&gt;benchmarks&lt;/a&gt;: for identical CRM queries, intent-level tooling produced 100x fewer response tokens than a raw API-passthrough approach, with token output equivalent to 3.7% of a 200K context window (about 7,400 tokens) versus 373% (about 746,000 tokens). At scale, the passthrough overhead translates to context-window overflow in multi-step workflows and degraded agent accuracy. The runtime handles parallelized tool execution, failover, and retries. The &lt;a href="https://github.com/ArcadeAI/arcade-mcp" rel="noopener noreferrer"&gt;Arcade MCP Framework&lt;/a&gt; lets you build custom proprietary tools that federate into the same control plane with the same auth and governance wrapping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contextual access and custom policies.&lt;/strong&gt; Beyond enforcing policies your organization has already defined elsewhere, Arcade exposes pre- and post-tool-call hooks for custom logic. Compliance teams drop in their own variables (workflow state, time windows, request volume, contextual data on the user or session), and the runtime treats those as first-class enforcement primitives. Organization-specific conditions get wired in without forking the runtime.&lt;/p&gt;

&lt;p&gt;For enterprises with mixed requirements (proprietary-internal systems plus SaaS breadth, multi-user auth plus governance, fast shipping plus safety), Arcade covers the full set without forcing ecosystem lock-in.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Final recommendation&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;For most enterprise deployments in 2026, buy an MCP runtime. The deployment profile shapes how the runtime gets deployed, not whether to deploy it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Proprietary-internal-only.&lt;/strong&gt; Sensitive data is the strongest buy signal, not a build trigger. Legacy systems holding proprietary data are precisely where Arcade gets pulled in. That's where the operational pain peaks and where security and compliance officers carry the most direct accountability. A custom OAuth pipeline maintained by a small team is a position no security leader wants to defend in a regulated audit. An audited, SOC 2 Type 2 runtime that has already cleared third-party scrutiny is much easier to defend.&lt;/p&gt;

&lt;p&gt;Recommended pattern: build custom MCP servers using the Arcade MCP Framework, run them inside your VPC or on-prem, and create an MCP gateway in the runtime to connect them to the Arcade control plane. For environments where even the control plane must stay in customer infrastructure, run the runtime self-hosted. The data stays inside your boundary. The runtime handles auth, on-behalf-of (OBO) token flows, vaulted credentials, audit logs, and governance.&lt;/p&gt;

&lt;p&gt;For fully air-gapped deployments with no external network egress, run a self-hosted runtime entirely inside your infrastructure. The runtime layer is identical to the cloud version; only the deployment mode changes. Build your own runtime only when the deployment also prohibits third-party vendor software.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SaaS-heavy.&lt;/strong&gt; Once your agentic workflow needs to touch Google Workspace, Microsoft, Salesforce, GitHub, or Slack, you buy. The runtime handles the OAuth lifecycle, schema drift, and tool maintenance for hundreds of SaaS APIs your team would otherwise rebuild. The security gap is largest in this profile. So is the operational gap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mixed (most enterprises).&lt;/strong&gt; Agents query proprietary internal databases, synthesize that data, and act in public SaaS applications. Mixed-requirement teams do not have to choose between proprietary security and SaaS breadth. Adopt an MCP runtime, such as &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade.dev&lt;/a&gt;, for SaaS coverage, then create an MCP gateway in the runtime to connect internal MCP servers (or custom servers built with the &lt;a href="https://github.com/ArcadeAI/arcade-mcp" rel="noopener noreferrer"&gt;Arcade MCP Framework&lt;/a&gt;) to the same control plane. Both surfaces inherit the same security and audit controls, with multi-user authorization wrapping every action.&lt;/p&gt;

&lt;p&gt;If you have already built MCP servers without the Arcade Framework, you do not have to rewrite them. Connecting an existing custom server to Arcade still gives you the runtime's auth, audit, governance, and pre- and post-call policy hooks on top of what you already have.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Summary&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Deployment profile&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Proprietary-internal-only&lt;/td&gt;
&lt;td&gt;Buy an MCP runtime&lt;/td&gt;
&lt;td&gt;Build custom MCP servers on the Arcade MCP Framework, run them inside your VPC or on-prem, and create an MCP gateway in the runtime to reach them. Self-hosted for environments where the control plane must stay in customer infrastructure.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fully air-gapped (no external egress)&lt;/td&gt;
&lt;td&gt;Buy an MCP runtime, self-hosted&lt;/td&gt;
&lt;td&gt;Run a vendor's self-hosted runtime entirely inside your infrastructure. Build your own only when the deployment also prohibits third-party vendor software.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SaaS-heavy&lt;/td&gt;
&lt;td&gt;Buy an MCP runtime&lt;/td&gt;
&lt;td&gt;Adopt the runtime directly. It handles the OAuth lifecycle, schema drift, and tool maintenance for hundreds of SaaS APIs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mixed proprietary plus SaaS&lt;/td&gt;
&lt;td&gt;Buy an MCP runtime&lt;/td&gt;
&lt;td&gt;Arcade for SaaS coverage. An MCP gateway created in the runtime connects internal MCP servers (built with or without the Arcade Framework) to the same control plane.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Decision checklist&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Run your deployment plan against these five questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Will your agents serve more than one human user with different permission scopes?
&lt;/li&gt;
&lt;li&gt;Do you need audit-grade logs that tie every tool call to a specific human, agent, and target system?
&lt;/li&gt;
&lt;li&gt;Do any of your agents take asynchronous actions that exceed standard LLM request timeouts?
&lt;/li&gt;
&lt;li&gt;Are you connecting to five or more external SaaS APIs across the organization?
&lt;/li&gt;
&lt;li&gt;Are your regulatory constraints so severe that no external network egress is permitted, even through a gateway running inside your own network?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;How to read the answers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"Yes" on any one of the five questions:&lt;/strong&gt; buy an MCP runtime.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Otherwise:&lt;/strong&gt; confirm fit against the three build cases in "When to build your own runtime" before committing to DIY.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The deployments that stall in 2026 fail on risk: auth that can't be audited, credentials sitting inside an LLM context window, and a security blast radius no one in the room can scope. Sensitive data raises that bar, which is why proprietary scenarios are a buy trigger, not a build trigger. Rebuilding OAuth pipelines and schema registries is a poor use of senior engineering time, and the build path stops compounding the moment a second user or a regulated audit enters the picture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade.dev's MCP runtime&lt;/a&gt; provides agent lifecycle governance, agent authorization, and an agent-optimized tool catalog in a single layer.&lt;/p&gt;

&lt;p&gt;Next step: &lt;a href="https://www.arcade.dev/contact" rel="noopener noreferrer"&gt;book a 30-minute technical discovery call&lt;/a&gt; with Arcade's team to walk through the multi-user authorization architecture and the deployment options for your environment.&lt;/p&gt;

&lt;p&gt;Or &lt;a href="https://app.arcade.dev/register" rel="noopener noreferrer"&gt;start in the Arcade playground&lt;/a&gt;. Connect one tool, run one user-scoped action, and see how the runtime handles OAuth, policy, and audit in a single trace.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Frequently Asked Questions&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What is the difference between an MCP server and an MCP runtime?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;An MCP server is a single endpoint that exposes tools. An MCP runtime is the execution layer that hosts, secures, and governs those servers. The runtime handles production complexities like multi-user authorization, load balancing, and audit logging that individual servers lack.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do MCP runtimes handle rate limits and long-running tasks?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;They use the asynchronous MCP Tasks specification, returning a task ID immediately while managing the long-running job in the background. The runtime handles vendor-specific API rate limits, backoff retries, and connection failovers. Your agent polls for the final result without managing execution state.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why is multi-user authorization so difficult for custom AI agents?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Multi-user authorization requires dynamic, just-in-time credential management to prevent prompt-injection attacks that compromise a user's full account. Custom builds must securely orchestrate complex "On-Behalf-Of" token flows, vault credentials out of the LLM context window, manage strict refresh token rotation, and enforce granular access policies at execution time.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Can you mix custom MCP servers with an MCP runtime?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes. Custom MCP servers and runtimes are not alternatives. You build custom MCP servers for proprietary internal systems in either path. The question is whether you also build the runtime layer wrapping them. Runtimes support hybrid architectures: custom servers running proprietary tools inside your VPC connect to the runtime's control plane via a &lt;a href="https://docs.arcade.dev/en/guides/mcp-gateways" rel="noopener noreferrer"&gt;gateway or a secure tunnel&lt;/a&gt;. This governs public SaaS and custom internal tools through a single control plane. Servers built on the runtime's &lt;a href="https://github.com/ArcadeAI/arcade-mcp" rel="noopener noreferrer"&gt;open-source framework&lt;/a&gt; inherit auth and audit automatically. Existing servers built without the framework connect to the runtime and still get its auth, audit, and policy hooks without being rewritten.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;When should we build our own runtime layer instead of buying an MCP runtime?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Build your own runtime if you have a single-user scope with no multi-user requirement, if the agent infrastructure is itself your core product, or if you own every API in the pipeline (no third-party SaaS). Sensitive data on its own is not a build trigger. Air-gapped deployments are handled by self-hosted runtimes from vendors that offer them. Buy a runtime in every other case.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;When does it become cheaper to buy an MCP runtime?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Buying becomes cheaper once you support multiple integrations and multi-user OAuth. Beyond three integrations, maintenance and security work exceed the runtime's usage cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Do MCP runtimes expose OAuth tokens or credentials to the LLM?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;No. The runtime keeps credentials in a vault and issues tightly scoped, just-in-time tokens for tool execution without placing secrets in the model context.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What security and compliance features should an enterprise MCP runtime include?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Post-prompt authorization, least-privilege policy enforcement, immutable audit logs (OpenTelemetry-friendly), secret vaulting and rotation, and admin controls for tool access and visibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What is "post-prompt" (on-behalf-of) authorization for AI agents?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Post-prompt authorization means the runtime authorizes and executes each tool call using the requesting user's permissions at execution time, rather than using a shared service account or passing user tokens into prompts.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How much latency does an MCP runtime add?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A small overhead from per-tool authorization checks and just-in-time token resolution. The overhead is a fraction of LLM inference and downstream SaaS API latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Can an MCP runtime work in a private VPC or hybrid environment?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes. The runtime's MCP gateway lets internal MCP servers run inside your VPC while governance and routing stay centralized. Self-hosted deployment runs the runtime entirely in your own infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do MCP runtimes help with audit logging and incident response?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;They record who requested the action, which tool was called, the parameters, the results, and the timing. All of it is exportable to a SIEM via OpenTelemetry for compliance and investigations.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do MCP runtimes handle SaaS API changes and version drift?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The vendor maintains tool schemas and centrally updates integrations. This reduces breakage from deprecations and keeps tool definitions consistent across agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Can we start DIY and migrate to a runtime later?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes. Teams begin with DIY for prototypes and migrate to a runtime when multi-user auth, governance, and operational load become production requirements.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>agents</category>
      <category>security</category>
    </item>
    <item>
      <title>Claude Code Routines: 5 production workflows that ship real work</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Fri, 01 May 2026 16:50:58 +0000</pubDate>
      <link>https://dev.to/arcade/claude-code-routines-5-production-workflows-that-ship-real-work-25il</link>
      <guid>https://dev.to/arcade/claude-code-routines-5-production-workflows-that-ship-real-work-25il</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;TL;DR&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code Routines enable unattended, cloud-run workflows&lt;/strong&gt; via scheduled, API, and GitHub event triggers. Demo-grade setups break under enterprise use.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Daily run caps and shared subscription usage push teams to batch work&lt;/strong&gt; into a single daily "meta-orchestrator" routine plus a few real-time triggers.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 production workflows:&lt;/strong&gt; incident postmortem drafting, on-call triage → ticket drafts, PR-aging report, expansion-signal scanning, and changelog PR generation.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key enterprise risks:&lt;/strong&gt; over-permissioned connectors, prompt injection from untrusted inputs, API rate limits (notably Slack history), and weak auditability.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production pattern:&lt;/strong&gt; use an &lt;strong&gt;MCP runtime&lt;/strong&gt; that delivers &lt;strong&gt;agent authorization&lt;/strong&gt;, &lt;strong&gt;agent-optimized tools&lt;/strong&gt;, and &lt;strong&gt;agent lifecycle governance&lt;/strong&gt;, plus &lt;strong&gt;human approval gates&lt;/strong&gt; for write actions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloud-hosted agents are not new. OpenClaw, Perplexity Computer, n8n, Zapier, and a handful of SaaS agent runtimes have been executing unattended work for a while. The release of Claude Code Routines adds a different option: teams that already use Claude Code as their day-to-day development agent can now run that same agent, with the same prompts, tools, and conventions, on Anthropic's cloud instead of tethered to a laptop.&lt;/p&gt;

&lt;p&gt;A routine is a saved Claude Code configuration (a prompt, one or more repositories, and a set of connectors) packaged once and run automatically on Anthropic-managed cloud infrastructure. Each routine can attach any combination of three trigger types: scheduled (recurring cadence), API (POST to a per-routine endpoint with a bearer token), and GitHub events (pull request or release activity on a connected repository). Routines are currently in &lt;a href="https://code.claude.com/docs/en/routines" rel="noopener noreferrer"&gt;research preview&lt;/a&gt;, so limits and API shapes are still moving.&lt;/p&gt;

&lt;p&gt;Most of the early Routines content focuses on personal productivity: meeting prep, inbox summaries, and calendar wrangling. For senior developers and engineering leaders trying to run autonomous agents across an enterprise, those demos do not cut it.&lt;/p&gt;

&lt;p&gt;Moving from a script on one laptop to a production-grade engineering workflow means dealing with the realities of enterprise architecture. Production automation demands strict governance, robust security boundaries, and the ability to work within aggressive API rate limits.&lt;/p&gt;

&lt;p&gt;This article covers five production-leaning, unattended routines designed for engineering teams. We'll map exactly what happens at runtime, identify which workflows need human oversight, and outline the governance models you need to safely run scheduled, API-triggered, and GitHub-triggered Claude Code sessions without compromising your infrastructure. Before getting to the workflows, it's worth looking at why demo-grade setups buckle the moment they move from a single laptop to a shared team environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Where demo patterns hit production reality (security, reliability, governance)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Routines formalize what teams have been wiring together with cron jobs, GitHub Actions, and custom middleware for two years: Claude Code running on a schedule, against a GitHub event, or through an API call, with no developer laptop in the loop. But moving from a single developer's personal setup to a shared enterprise environment exposes severe limitations in security, reliability, and auditability. Fast.&lt;/p&gt;

&lt;p&gt;Start with the execution model. Per &lt;a href="https://code.claude.com/docs/en/routines" rel="noopener noreferrer"&gt;Anthropic's docs&lt;/a&gt;, routines &lt;em&gt;"run autonomously as full Claude Code cloud sessions: there is no permission-mode picker and no approval prompts during a run."&lt;/em&gt; Whatever the agent decides to do, it does. At the speed of inference, without a human in the loop. That shifts the burden of "what is this agent allowed to do" from interactive confirmation to pre-deployment configuration. If the configuration leans on bundled first-party connectors and creator-inherited OAuth scopes, the guardrails come off exactly when you need them most.&lt;/p&gt;

&lt;p&gt;The most critical vulnerability is the permission inheritance model of bundled first-party connectors.&lt;/p&gt;

&lt;p&gt;In a standard setup, an automated routine inherits the full global access of the developer who created it. &lt;a href="https://code.claude.com/docs/en/routines" rel="noopener noreferrer"&gt;Anthropic's docs&lt;/a&gt; make the consequence explicit: &lt;em&gt;"Anything a routine does through your connected GitHub identity or connectors appears as you: commits and pull requests carry your GitHub user, and Slack messages, Linear tickets, or other connector actions use your linked accounts for those services."&lt;/em&gt; A first-party OAuth token works for a single developer querying their personal pull requests. It becomes a massive liability the moment you deploy it as an unattended routine on behalf of a whole team.&lt;/p&gt;

&lt;p&gt;If an agent operates with an engineering lead's administrative permissions, a single compromised routine gains unrestricted read and write access across your entire enterprise system. This architecture fails security reviews every time the automation touches shared customer data, source code, or regulated infrastructure.&lt;/p&gt;

&lt;p&gt;This over-permissioning makes &lt;a href="https://genai.owasp.org/llmrisk/llm01-prompt-injection" rel="noopener noreferrer"&gt;prompt injection&lt;/a&gt; threats way worse. Unattended routines ingest untrusted third-party text by design. They process incoming PagerDuty incident descriptions, analyze raw Sentry stack traces, and scan customer support emails.&lt;/p&gt;

&lt;p&gt;Without typed, permission-scoped tool contracts to validate the output, a malicious payload hidden in a customer ticket can instruct the routine to exfiltrate data or delete production resources. Natural language instructions won't stop these exploits in an enterprise environment.&lt;/p&gt;
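
&lt;p&gt;One way to picture a typed, permission-scoped tool contract is a check that runs before any tool call executes. The sketch below is hypothetical: the tool name, the schema layout, and the &lt;code&gt;validate_tool_call&lt;/code&gt; helper are invented for illustration and do not reflect a specific product's API.&lt;/p&gt;

```python
# Tool name mapped to its permitted arguments and expected types (hypothetical contract).
ALLOWED_TOOLS = {
    "tickets.create_draft": {"title": str, "body": str, "severity": int},
}


def validate_tool_call(tool: str, args: dict) -> list:
    """Return a list of contract violations; an empty list means the call may proceed."""
    schema = ALLOWED_TOOLS.get(tool)
    if schema is None:
        return [f"tool not permitted for this routine: {tool}"]
    errors = []
    for name, expected in schema.items():
        if name not in args:
            errors.append(f"missing argument: {name}")
        elif not isinstance(args[name], expected):
            errors.append(f"wrong type for {name}")
    for name in args:
        if name not in schema:
            errors.append(f"unexpected argument: {name}")
    return errors


# A call steered by injected text toward an out-of-scope tool is rejected before execution.
print(validate_tool_call("resources.delete", {"target": "prod-db"}))
print(validate_tool_call("tickets.create_draft", {"title": "Timeout", "body": "Stack trace", "severity": 2}))
```

&lt;p&gt;The point of the contract is that enforcement happens in code outside the model, so instructions hidden in a ticket cannot widen what the routine is able to call.&lt;/p&gt;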

&lt;p&gt;Operational and reliability constraints compound the problem. Routines &lt;a href="https://code.claude.com/docs/en/routines" rel="noopener noreferrer"&gt;draw down the same subscription usage&lt;/a&gt; as interactive sessions and are subject to a separate daily cap on how many runs can start per account. Anthropic doesn't publish a specific number for that cap, and available usage tightens once team activity ramps up, so unattended workflows have to be designed for quota awareness from day one.&lt;/p&gt;

&lt;p&gt;This forces engineering teams to abandon simple event-driven architectures for complex batch processing. You can't trigger a routine for every individual pull request comment. Instead, you orchestrate batch jobs that process dozens of events at once to conserve quota, or enable extra usage and accept metered overage when caps hit.&lt;/p&gt;
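
&lt;p&gt;A batching step of this kind can be as simple as grouping queued events by workflow before a single run starts. The sketch below assumes a hypothetical event shape (a &lt;code&gt;workflow&lt;/code&gt; key and a &lt;code&gt;payload&lt;/code&gt; string) and made-up sample events; how events are queued in the first place is out of scope.&lt;/p&gt;

```python
from collections import defaultdict


def build_daily_batch(events: list) -> dict:
    """Group queued events by workflow so one routine run can process them together."""
    batch = defaultdict(list)
    for event in events:
        batch[event["workflow"]].append(event["payload"])
    return dict(batch)


# Made-up events collected since the previous run.
queued = [
    {"workflow": "pr_aging", "payload": "PR opened 9 days ago, no reviewer"},
    {"workflow": "triage", "payload": "Alert: checkout latency above threshold"},
    {"workflow": "pr_aging", "payload": "PR opened 14 days ago, changes requested"},
]

batch = build_daily_batch(queued)
for workflow, payloads in batch.items():
    print(workflow, len(payloads))
```

&lt;p&gt;Three queued events become one run with two workflow sections, which is the trade the quota model encourages: fewer run starts, more work per run.&lt;/p&gt;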

&lt;p&gt;Reliability and visibility close out the failure list. Early adopters report consistent issues with bundled connectors in unattended execution: &lt;a href="https://github.com/anthropics/claude-code/issues/45306" rel="noopener noreferrer"&gt;community issue trackers show silent failures&lt;/a&gt; during runtime, OAuth token expiration errors that crash scheduled tasks, and connectors that fail to load in the cloud environment.&lt;/p&gt;

&lt;p&gt;Bundled connectors also lack auditability. When an unattended routine updates a Jira ticket, queries a GitHub repository, and posts a Slack message, standard bundled connectors give you opaque execution logs. Security teams can't construct a definitive audit trail of what the agent did across multiple platforms.&lt;/p&gt;

&lt;p&gt;The rest of this article shows how a dedicated MCP runtime resolves each of these failure modes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;th&gt;Control&lt;/th&gt;
&lt;th&gt;Where it lives&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Over-permissioned token&lt;/td&gt;
&lt;td&gt;Per-user, per-tool authorization evaluated per action&lt;/td&gt;
&lt;td&gt;MCP runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt injection from untrusted text&lt;/td&gt;
&lt;td&gt;Agent-optimized tools with schema enforcement and isolated credentials&lt;/td&gt;
&lt;td&gt;MCP runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quota overrun&lt;/td&gt;
&lt;td&gt;Meta-orchestrator batching plus targeted GitHub event triggers&lt;/td&gt;
&lt;td&gt;Routine design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Silent write to production&lt;/td&gt;
&lt;td&gt;Human approval gate on drafts, PRs, or prefixed branches&lt;/td&gt;
&lt;td&gt;Workflow config and branch protection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No audit trail for compliance&lt;/td&gt;
&lt;td&gt;Full execution context logged per tool call, exportable via &lt;a href="https://opentelemetry.io/" rel="noopener noreferrer"&gt;OpenTelemetry&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;MCP runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;5 production Claude Code routine workflows you can batch into one daily run&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The risks and controls above become concrete through workflow design. Before the patterns, one operational constraint shapes every choice below: quota. Routines share subscription usage with interactive sessions and add a daily cap on runs per account, so running a separate routine for every minor event burns through the budget fast.&lt;/p&gt;

&lt;p&gt;The solution is to architect a single "meta-orchestrator" routine that wakes up once a day, runs a sequential batch of discrete data-gathering and reporting tasks, and shuts down. That consumes one run from your daily cap.&lt;/p&gt;

&lt;p&gt;This strategy saves your remaining runs for critical, real-time API and GitHub event triggers that demand immediate attention.&lt;/p&gt;
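
&lt;p&gt;The control flow of that meta-orchestrator can be sketched in a few lines. This is a minimal illustration, not Anthropic's or Arcade's implementation: the task names, the registry shape, and the placeholder task bodies are all hypothetical, and in practice the same logic would live in the routine's prompt or in a helper script the routine calls.&lt;/p&gt;

```python
from datetime import date
from typing import Callable

# Hypothetical registry entry: (task name, "is it due today?", task body).
Task = tuple[str, Callable[[date], bool], Callable[[], str]]


def every_day(_: date) -> bool:
    return True


def fridays_only(day: date) -> bool:
    return day.weekday() == 4  # Monday is 0, so Friday is 4


def run_daily_batch(today: date, tasks: list[Task]) -> dict[str, str]:
    """Run every due task in order, all inside a single routine run."""
    results: dict[str, str] = {}
    for name, is_due, task in tasks:
        if not is_due(today):
            results[name] = "skipped"
            continue
        try:
            results[name] = task()
        except Exception as exc:
            # One failing task should not abort the rest of the batch.
            results[name] = f"failed: {exc}"
    return results


# Placeholder task bodies; real ones would call tools through the runtime.
TASKS: list[Task] = [
    ("incident_postmortem", every_day, lambda: "drafted"),
    ("pr_aging_report", fridays_only, lambda: "emailed"),
    ("expansion_scanner", every_day, lambda: "posted"),
]
```

&lt;p&gt;Isolating failures per task matters here because the whole batch shares one run: if the first task raised and stopped the batch, the later tasks would lose their only slot for the day.&lt;/p&gt;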

&lt;p&gt;Here are five concrete engineering workflows designed for this quota-aware framework, with their technical triggers, human approval surfaces, and governance requirements. Three of them (nightly incident postmortem, weekly PR-aging, expansion-signal scanning) sit inside the meta-orchestrator and share the daily run. The other two (Sentry triage, release-notes draft) run real-time because their value is latency-bound. You want the Linear ticket while the incident is hot, and the changelog draft as soon as the release tag lands.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Routine&lt;/th&gt;
&lt;th&gt;Trigger&lt;/th&gt;
&lt;th&gt;Primary tools&lt;/th&gt;
&lt;th&gt;Approval surface&lt;/th&gt;
&lt;th&gt;Run slot&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Nightly incident postmortem&lt;/td&gt;
&lt;td&gt;Scheduled (2:00 AM daily)&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.arcade.dev/en/resources/integrations/development/pagerduty" rel="noopener noreferrer"&gt;PagerDuty&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/social/slack" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/productivity/notion" rel="noopener noreferrer"&gt;Notion&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Human engineers review and publish the drafted Notion page&lt;/td&gt;
&lt;td&gt;Meta-orchestrator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;On-call Sentry triage&lt;/td&gt;
&lt;td&gt;API (Sentry webhook → routine &lt;code&gt;/fire&lt;/code&gt; endpoint)&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.sentry.io/ai/mcp/" rel="noopener noreferrer"&gt;Sentry&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/productivity/linear" rel="noopener noreferrer"&gt;Linear&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;On-call engineer triages the drafted Linear ticket queue&lt;/td&gt;
&lt;td&gt;Real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weekly PR-aging report&lt;/td&gt;
&lt;td&gt;Scheduled (Friday morning)&lt;/td&gt;
&lt;td&gt;GitHub, email&lt;/td&gt;
&lt;td&gt;Read-only; no write approval needed&lt;/td&gt;
&lt;td&gt;Meta-orchestrator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Expansion signal scanner&lt;/td&gt;
&lt;td&gt;Scheduled (nightly, inside the meta-orchestrator)&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.arcade.dev/en/resources/integrations/sales/hubspot" rel="noopener noreferrer"&gt;HubSpot&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/social/slack" rel="noopener noreferrer"&gt;Slack Search&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Account managers review flagged accounts in a Slack channel&lt;/td&gt;
&lt;td&gt;Meta-orchestrator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Friday release notes draft&lt;/td&gt;
&lt;td&gt;GitHub event (release created)&lt;/td&gt;
&lt;td&gt;GitHub, &lt;a href="https://docs.arcade.dev/en/resources/integrations/productivity/jira" rel="noopener noreferrer"&gt;Jira&lt;/a&gt; / &lt;a href="https://docs.arcade.dev/en/resources/integrations/productivity/linear" rel="noopener noreferrer"&gt;Linear&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;PM reviews the pull request and merges the changelog&lt;/td&gt;
&lt;td&gt;Real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Nightly incident postmortem draft (PagerDuty, Slack, Notion)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Assembling a postmortem means stitching PagerDuty timestamps, Slack threads, and deploy markers into a readable narrative. This workflow does the assembly and drafts the first pass so the engineer lands on a structured Notion page instead of a blank one.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger:&lt;/strong&gt; Scheduled. Runs as the first sequence in the daily 2:00 AM meta-orchestrator.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow:&lt;/strong&gt; The routine queries the PagerDuty API for resolved events from the previous 24 hours. The hard part is Slack context: the &lt;a href="https://api.slack.com/methods/conversations.history" rel="noopener noreferrer"&gt;conversations.history endpoint&lt;/a&gt; now rate-limits non-Marketplace apps to one request per minute, so bulk-ingesting incident channels is off the table. Instead, the routine uses the Slack Search API to isolate key messages. As an alternative, it can fire via the API trigger: a Slack reaction-event webhook (configured in your Slack app) POSTs to the routine's &lt;code&gt;/fire&lt;/code&gt; endpoint after an engineer drops a designated emoji on a summary message. Either way, it then drafts a Notion page with a timeline, impact, and initial resolution steps.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approval surface:&lt;/strong&gt; The routine runs unattended. An engineer reviews, edits, and publishes the Notion draft the next morning.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance &amp;amp; security checklist:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Scope the PagerDuty token to read-only on specific services. Scope Slack tokens to the incident channels only, not org-wide.
&lt;/li&gt;
&lt;li&gt;Redact customer identifiers (email, user ID, account ID) at the tool layer before the draft is written to Notion. Do not rely on the model to scrub PII.
&lt;/li&gt;
&lt;li&gt;Log triggering PagerDuty incident ID → drafted Notion page ID for every run, not just on failure.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
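
&lt;p&gt;The redaction item in the checklist above is the one most worth making deterministic. A minimal sketch of tool-layer redaction follows. The identifier formats (the &lt;code&gt;user_&lt;/code&gt; and &lt;code&gt;acct_&lt;/code&gt; prefixes) are assumptions for illustration; substitute the patterns your own systems use.&lt;/p&gt;

```python
import re

# Illustrative patterns only; real identifier formats are an assumption.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PREFIXED_ID = re.compile(r"\b(?:user|acct)_[A-Za-z0-9]+\b")


def redact(text: str) -> str:
    """Replace customer identifiers before the text reaches the draft."""
    text = EMAIL.sub("[email]", text)
    return PREFIXED_ID.sub("[id]", text)
```

&lt;p&gt;Because this runs in plain code before the draft is written, it behaves the same way on every run, which is the property a model-based scrub cannot offer.&lt;/p&gt;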

&lt;h3&gt;
  
  
  &lt;strong&gt;On-call triage and ticket creation (Sentry to Linear)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When a service degrades, on-call engineers get paged with a dozen near-identical error reports. This workflow groups the noise by Sentry fingerprint and files one Linear ticket per cluster so the on-call triages root causes, not duplicates.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger:&lt;/strong&gt; API. Claude Code Routines don't accept arbitrary third-party webhooks (only GitHub events), so configure &lt;a href="https://docs.sentry.io/product/integrations/integration-platform/webhooks/" rel="noopener noreferrer"&gt;Sentry's webhook integration&lt;/a&gt; to POST to the routine's &lt;code&gt;/fire&lt;/code&gt; endpoint with its bearer token when an error spike crosses a configured threshold. Runs outside the daily orchestrator because triage value drops fast if it waits.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow:&lt;/strong&gt; The routine reads fresh events from Sentry, groups them by fingerprint to collapse duplicates, and ranks clusters by event count and affected-users count. Each cluster becomes a Linear ticket with the stack trace snippet, affected release, and a link back to the Sentry issue. Tickets land in an un-triaged queue with a default P3 label.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approval surface:&lt;/strong&gt; The routine never triages itself. The on-call engineer reviews the queue, adjusts severity, and assigns the ticket.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance &amp;amp; security checklist:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Scope the Sentry token to specific project slugs. Exclude projects flagged as handling authentication or payment data.
&lt;/li&gt;
&lt;li&gt;Strip user-supplied strings (URL params, form inputs, search terms) from error payloads before the agent sees them. Those fields are the prompt-injection surface.
&lt;/li&gt;
&lt;li&gt;Log the mapping from Sentry event ID → Linear ticket ID. This is what lets post-incident reviews reconstruct which alert caused which ticket.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
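
&lt;p&gt;The grouping and ranking step described above can be sketched as follows. The event shape (a &lt;code&gt;fingerprint&lt;/code&gt; string and an optional &lt;code&gt;user_id&lt;/code&gt;) is a simplified assumption, not Sentry's actual payload format.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    fingerprint: str
    event_count: int = 0
    users: set[str] = field(default_factory=set)


def cluster_events(events: list[dict]) -> list[Cluster]:
    """Collapse events by fingerprint, then rank the noisiest clusters first."""
    clusters: dict[str, Cluster] = {}
    for event in events:
        fp = event["fingerprint"]  # hypothetical field name
        cluster = clusters.setdefault(fp, Cluster(fp))
        cluster.event_count += 1
        if event.get("user_id"):
            cluster.users.add(event["user_id"])
    # Rank by event count, breaking ties on affected-user count.
    return sorted(
        clusters.values(),
        key=lambda c: (c.event_count, len(c.users)),
        reverse=True,
    )
```

&lt;p&gt;Each returned cluster would then map to one drafted Linear ticket, which keeps the ticket count proportional to root causes instead of raw event volume.&lt;/p&gt;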

&lt;h3&gt;
  
  
  &lt;strong&gt;Weekly pull request aging and code review report (GitHub)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Stale PRs create merge conflicts, block releases, and erode review velocity. This workflow replaces the Friday morning dashboard sweep with a single email that names the three PRs each lead needs to act on.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger:&lt;/strong&gt; Scheduled. The daily orchestrator runs the workflow every day; the body skips itself on non-Fridays.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow:&lt;/strong&gt; The routine queries the &lt;a href="https://docs.github.com/en/graphql/overview/resource-limitations" rel="noopener noreferrer"&gt;GitHub GraphQL API&lt;/a&gt; for PRs open longer than three days across the org, pulling each PR's review state, failing check runs, and unresolved review comments in a single query. It summarizes each PR's blocker (waiting on reviewer X, failing CI check Y, unresolved change requests) and emails a grouped digest to the relevant engineering leads.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approval surface:&lt;/strong&gt; Read-only. The email dispatches without human intervention, so the token scope is the real control.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance &amp;amp; security checklist:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Use a &lt;a href="https://docs.github.com/en/apps/creating-github-apps/authenticating-with-a-github-app/about-authentication-with-a-github-app" rel="noopener noreferrer"&gt;GitHub App token&lt;/a&gt; with read-only metadata, pull_requests, and issues permissions, plus read-only checks so the digest can report failing check runs. Do not grant the contents permission; the routine never needs the diff.
&lt;/li&gt;
&lt;li&gt;Strip code blocks from the email template before send, even if the agent tries to paste one.
&lt;/li&gt;
&lt;li&gt;Send from a dedicated service-account email, not a developer mailbox, so downstream audit trails stay clean.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
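
&lt;p&gt;Once the GraphQL response has been flattened, the aging filter and blocker summary are simple to express. The sketch below assumes a hypothetical flattened PR record with &lt;code&gt;created_at&lt;/code&gt;, &lt;code&gt;failing_checks&lt;/code&gt;, &lt;code&gt;changes_requested&lt;/code&gt;, and &lt;code&gt;pending_reviewers&lt;/code&gt; keys; these are not GitHub field names.&lt;/p&gt;

```python
from datetime import datetime, timedelta


def stale_prs(prs: list[dict], now: datetime,
              max_age: timedelta = timedelta(days=3)) -> list[dict]:
    """Keep only PRs that have been open longer than max_age."""
    return [pr for pr in prs if now - pr["created_at"] > max_age]


def blocker(pr: dict) -> str:
    """Name the first thing standing between this PR and a merge."""
    if pr["failing_checks"]:
        return "failing CI check " + ", ".join(pr["failing_checks"])
    if pr["changes_requested"]:
        return "unresolved change requests"
    if pr["pending_reviewers"]:
        return "waiting on reviewer " + ", ".join(pr["pending_reviewers"])
    return "no blocker detected"
```

&lt;p&gt;Ordering the checks this way is a design choice: a red CI run is reported ahead of a missing review, on the assumption that reviewers will not approve until checks pass.&lt;/p&gt;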

&lt;h3&gt;
  
  
  &lt;strong&gt;Expansion signal scanner for customer health (HubSpot, Slack)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Support tickets and shared Slack channels are where customers accidentally self-identify as enterprise-tier: questions about rate limits, SSO, SOC 2 reviews, and data residency. This workflow surfaces those signals into a single account-health feed so the revenue team sees them.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger:&lt;/strong&gt; Scheduled. Runs as part of the nightly meta-orchestrator.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow:&lt;/strong&gt; The routine queries HubSpot for tickets created or updated in the last 24 hours and scans the body and notes for enterprise-tier keywords ("rate limits," "SSO," "SOC 2," "HIPAA," "data residency"). For shared customer Slack channels, bulk history ingestion is off the table because of conversations.history rate limits, so the routine uses the Slack Search API against the same keyword set. Each matching account gets a row in an internal Slack post with links back to the source ticket or message.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approval surface:&lt;/strong&gt; Findings land in a dedicated internal Slack channel with source links. An account manager reviews each flagged account and decides whether to open an expansion conversation.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance &amp;amp; security checklist:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;The routine never writes to HubSpot. It reads from an allowlist of ticket properties (subject, body, pipeline stage) and nothing else.
&lt;/li&gt;
&lt;li&gt;Restrict the Slack token to public support channels plus explicitly listed shared customer channels. Never grant channels:history org-wide.
&lt;/li&gt;
&lt;li&gt;Log which account IDs, ticket IDs, and Slack message IDs were scanned on each run, along with which keywords matched. The matched keyword is what account managers need to see in order to trust the signal.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
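
&lt;p&gt;The keyword scan itself does not need a model. A minimal sketch using the keyword set from the workflow description follows; the function name is hypothetical, and the whole-word matching rule is an assumption chosen to avoid false positives.&lt;/p&gt;

```python
import re

KEYWORDS = ["rate limits", "SSO", "SOC 2", "HIPAA", "data residency"]

# Whole-word, case-insensitive patterns so "SSO" does not match "lasso".
PATTERNS = {
    kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
    for kw in KEYWORDS
}


def matched_keywords(text: str) -> list[str]:
    """Return every enterprise-tier keyword found in a ticket or message."""
    return [kw for kw, pattern in PATTERNS.items() if pattern.search(text)]
```

&lt;p&gt;Returning the matched keywords, not just a boolean, gives the run log and the Slack post the evidence the checklist above asks for.&lt;/p&gt;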

&lt;h3&gt;
  
  
  &lt;strong&gt;Friday release notes and changelog draft (GitHub, Jira/Linear)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Commit messages are written for engineers; release notes are written for customers. This workflow drafts the customer version so the product team edits prose instead of compiling a changelog from scratch.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger:&lt;/strong&gt; GitHub event trigger on &lt;code&gt;release.created&lt;/code&gt;, scoped to the specific repository. Requires the Claude GitHub App installed on the repo. Running &lt;code&gt;/web-setup&lt;/code&gt; alone grants clone access but doesn't enable webhook delivery.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow:&lt;/strong&gt; The routine finds the previous release tag, collects every PR merged into main between the two tags, and resolves each PR back to its Jira or Linear ticket using the ticket ID conventionally placed in the PR title or body. It then drafts customer-facing release notes in Markdown, grouped by feature area. One caveat: the bundled GitHub MCP connector has &lt;a href="https://github.com/anthropics/claude-code/issues/45306" rel="noopener noreferrer"&gt;gaps around basic writes like updating the release body directly&lt;/a&gt;, so the routine opens a pull request against a release-notes/ branch instead of editing the release in place.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approval surface:&lt;/strong&gt; The routine commits the Markdown to a release-notes/&amp;lt;tag&amp;gt; branch and opens a PR. A product manager edits the copy and merges.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance &amp;amp; security checklist:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Give the routine read-only access to Jira and Linear. It should never change a ticket's status or rewrite acceptance criteria.
&lt;/li&gt;
&lt;li&gt;Enforce a &lt;a href="https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/defining-the-mergeability-of-pull-requests/about-protected-branches" rel="noopener noreferrer"&gt;branch protection rule&lt;/a&gt;: the routine's write token can only push to branches matching release-notes/*. The main branch is structurally unreachable.
&lt;/li&gt;
&lt;li&gt;Log triggering release tag → list of PRs analyzed → resulting changelog PR number. When the next release breaks, provenance is what makes the diff debuggable.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
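
&lt;p&gt;Two deterministic pieces of this workflow are easy to sketch: pulling ticket IDs out of a PR's title and body, and checking a branch name against the allowed prefix. The ticket pattern assumes the common &lt;code&gt;KEY-123&lt;/code&gt; convention, and the branch check only illustrates the rule; actual enforcement belongs in branch protection, not in the routine.&lt;/p&gt;

```python
import re
from fnmatch import fnmatchcase

# Assumes the common KEY-123 convention for Jira and Linear ticket IDs.
TICKET_ID = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def ticket_ids(title: str, body: str) -> list[str]:
    """Collect unique ticket IDs in first-seen order."""
    seen: list[str] = []
    for match in TICKET_ID.findall(f"{title}\n{body}"):
        if match not in seen:
            seen.append(match)
    return seen


def branch_allowed(branch: str, pattern: str = "release-notes/*") -> bool:
    """True only for branches under the release-notes/ prefix."""
    return fnmatchcase(branch, pattern)
```

&lt;p&gt;PRs that yield no ticket ID are worth listing separately in the draft so the product manager can resolve them by hand.&lt;/p&gt;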

&lt;h2&gt;
  
  
  &lt;strong&gt;How to evaluate an enterprise MCP runtime for Claude Code routines&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Every workflow above has a shared dependency: the tool layer underneath. Native Claude Code Routines can't safely execute these tasks on bundled connectors alone. The release-notes workflow's caveat about the GitHub connector missing basic writes is representative of the stock first-party set, not an outlier.&lt;/p&gt;

&lt;p&gt;Relying on bundled connectors and first-party token inheritance also means rate-limit failures, prompt injection exploits, and security audits that halt deployment.&lt;/p&gt;

&lt;p&gt;What's missing is a purpose-built &lt;a href="https://www.arcade.dev/mcp" rel="noopener noreferrer"&gt;MCP runtime&lt;/a&gt;: the execution layer where tools run, credentials are resolved just-in-time, and every action is authorized against a specific user's permissions. This is not another proxy in front of your enterprise systems; &lt;a href="https://www.arcade.dev/documents/why-mcp-needs-a-runtime.pdf" rel="noopener noreferrer"&gt;the agent is already the proxy&lt;/a&gt;. The runtime is where the tool call lands, where identity and policy are evaluated, and where the audit record is written. Critically, the runtime is stateful. It maintains per-session, per-user context across an agent's entire reasoning loop, which is exactly what a stateless proxy cannot do. And this statefulness is what makes per-user, per-tool authorization enforceable.&lt;/p&gt;

&lt;p&gt;An enterprise MCP runtime delivers three capabilities working in concert: &lt;strong&gt;agent authorization&lt;/strong&gt; (per-user, per-tool, per-action), &lt;strong&gt;agent-optimized tools&lt;/strong&gt; (built for LLM consumption, not API passthrough), and &lt;strong&gt;agent lifecycle governance&lt;/strong&gt; (centralized control, versioning, and full-execution audit logs).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Bundled first-party connectors&lt;/th&gt;
&lt;th&gt;Enterprise MCP runtime&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Permission model&lt;/td&gt;
&lt;td&gt;Inherits the creator's global OAuth scope&lt;/td&gt;
&lt;td&gt;Scoped per routine, per user, per action&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth lifecycle&lt;/td&gt;
&lt;td&gt;Token embedded at setup; manual refresh&lt;/td&gt;
&lt;td&gt;Runtime manages refresh, rotation, and expiry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audit logs&lt;/td&gt;
&lt;td&gt;Opaque, per-connector, not unified&lt;/td&gt;
&lt;td&gt;Full chain of custody per tool call (user, tool, params, result), exportable to SIEM via OpenTelemetry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt injection defense&lt;/td&gt;
&lt;td&gt;None; LLM parses raw input into API calls&lt;/td&gt;
&lt;td&gt;Multi-layered: isolated credentials, per-action auth, schema enforcement, visibility filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rate-limit handling&lt;/td&gt;
&lt;td&gt;Direct hits against upstream APIs&lt;/td&gt;
&lt;td&gt;Throttling, batching, and targeted webhooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool catalog&lt;/td&gt;
&lt;td&gt;Stock first-party set only&lt;/td&gt;
&lt;td&gt;The &lt;a href="https://www.arcade.dev/tools" rel="noopener noreferrer"&gt;largest catalog of agent-optimized MCP tools (8000+)&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gateway composition&lt;/td&gt;
&lt;td&gt;One OAuth/connector per upstream service&lt;/td&gt;
&lt;td&gt;Runtime-level federation: tools composed into a single identity-scoped URL (Arcade calls this the &lt;a href="https://docs.arcade.dev/en/guides/mcp-gateways" rel="noopener noreferrer"&gt;MCP Gateway feature&lt;/a&gt;: a composition layer, not a proxy)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-harness portability&lt;/td&gt;
&lt;td&gt;Claude Code only&lt;/td&gt;
&lt;td&gt;Any MCP-compatible harness (Codex, OpenCode, local-model)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Agent authorization: per-user, per-tool, evaluated at runtime&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The most critical function of a dedicated MCP runtime is handling multi-user &lt;a href="https://docs.arcade.dev/home/auth/how-arcade-helps" rel="noopener noreferrer"&gt;agent authorization&lt;/a&gt;, sometimes called post-prompt authorization.&lt;/p&gt;

&lt;p&gt;Single-user demos hide the real problem. &lt;a href="https://code.claude.com/docs/en/routines" rel="noopener noreferrer"&gt;Anthropic's docs&lt;/a&gt; are explicit that &lt;em&gt;"routines belong to your individual claude.ai account. They are not shared with teammates."&lt;/em&gt; Every routine is structurally a single-user artifact, even when the work it does affects an entire team. The moment a routine has to act on behalf of multiple users (one-per-engineer on a platform team, or org-wide when a customer-health scanner runs for every account manager), shared service accounts and creator-inherited OAuth scopes collapse as a model. Teams either give the agent broad permissions (and an intern bypasses their access controls through the agent) or inherit the user's full permissions (and one prompt injection cascades through every system that user can touch). The right answer is the intersection: &lt;em&gt;what is this agent allowed to do AND what is this user allowed to do&lt;/em&gt;, evaluated per action at runtime. That is the problem the runtime has to solve before routines can move past single-user demos.&lt;/p&gt;

&lt;p&gt;Rather than letting a routine inherit the global, administrative permissions of its creator, an advanced runtime isolates the LLM entirely from underlying credentials and executes every tool call On-Behalf-Of (OBO) a specific user. The runtime evaluates the intersection of the agent's baseline permissions and that user's native permissions per action at runtime, so every action is attributable to a specific human in the audit log.&lt;/p&gt;
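
&lt;p&gt;The intersection rule is small enough to state in code. This is a conceptual sketch under assumed names, not Arcade's API: the action strings and grant sets are hypothetical, and a real runtime would resolve them from your identity provider.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def authorize(action: str, agent_grants: set[str],
              user_grants: set[str]) -> Decision:
    """Allow an action only if both the agent and the user may perform it."""
    if action not in agent_grants:
        return Decision(False, "agent is not granted this action")
    if action not in user_grants:
        return Decision(False, "user is not permitted to perform this action")
    return Decision(True, "allowed for both agent and user")
```

&lt;p&gt;Returning a reason alongside the verdict is deliberate: the denial reason is what ends up in the audit log and what lets the agent report a clean error.&lt;/p&gt;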

&lt;p&gt;Authorization is just-in-time. The runtime requests and validates credentials only when a specific user action requires them. If a user never invokes the Salesforce integration, no Salesforce tokens are ever obtained or stored. The entire OAuth flow (token exchange, refresh, storage) executes in deterministic backend logic that the LLM can never observe, alter, or leak. For additional governance, teams attach pre-tool-call and post-tool-call hooks to enforce custom policies: human-in-the-loop approvals for destructive actions, usage limits, or contextual access rules.&lt;/p&gt;

&lt;p&gt;The runtime &lt;a href="https://docs.arcade.dev/en/references/auth-providers/oauth2" rel="noopener noreferrer"&gt;manages the entire OAuth token lifecycle&lt;/a&gt;. It handles token refresh, rotation, and mismatch scenarios outside the view of the LLM. If a routine tries to access a repository the target user can't see, the runtime blocks the action at the protocol layer.&lt;/p&gt;

&lt;p&gt;Critically, the runtime hooks into the identity and entitlement systems you already run (Okta, Entra, SailPoint) instead of asking you to redefine authorization policies in yet another system. It acquires scoped tokens just-in-time, enforces the policy your IDP already owns, and keeps credentials isolated from the LLM and the MCP client. The runtime delegates authorization to what the enterprise has already defined; it doesn't duplicate it.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Agent-optimized tools: built for LLM consumption, not API passthrough&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Most MCP servers today are thin API wrappers. When a user says "update the Acme deal," the wrapper still asks the agent for &lt;code&gt;opportunity_id&lt;/code&gt;, &lt;code&gt;owner_id&lt;/code&gt;, &lt;code&gt;stage_enum&lt;/code&gt;, and &lt;code&gt;close_date&lt;/code&gt;. The agent fills those parameters probabilistically and either guesses the wrong values or retries blindly. This failure mode is called parameter hallucination, and it's where most agent failures happen in production. A proxy layer has no mechanism to close it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.arcade.dev/guides/create-tools/tool-basics/build-mcp-server" rel="noopener noreferrer"&gt;Agent-optimized tools&lt;/a&gt; invert this pattern. When a user asks to "make the intro paragraph friendlier," the tool translates that to &lt;code&gt;segmentId=gz49hg56, index=350, text='your friendlier message'&lt;/code&gt;. The agent never thinks beyond "intro paragraph." Every tool ships with rich semantic descriptions to help the LLM pick correctly, consistent schemas across services regardless of the underlying API, and agent-interpretable errors instead of raw HTTP status codes. In practice this ships as the &lt;a href="https://www.arcade.dev/tools" rel="noopener noreferrer"&gt;largest catalog of pre-built agent-optimized MCP tools (8000+)&lt;/a&gt;, covering productivity, CRM, communication, and developer systems, so teams skip the wrap-an-API-in-MCP step entirely.&lt;/p&gt;

&lt;p&gt;Reliability is a runtime concern, not an agent concern. Pagination, rate limiting, retries, and failover all get handled by the runtime, invisible to the agent. Tools execute in parallel where safe; failed calls retry with additional developer-defined context; MCP servers fail over automatically. The agent gets a clean result or a clean error, never a half-paginated list or a transient network blip bubbling up into the reasoning loop.&lt;/p&gt;

&lt;p&gt;Strict schemas also harden the tool layer against prompt injection. Schema enforcement is one layer of the defense, not the whole defense. A malicious payload buried in a customer email can't talk the agent into a destructive call that doesn't match an approved schema. More importantly, credentials never leave the runtime, so a jailbroken prompt has no tokens to exfiltrate. Per-user authorization is evaluated at every action, so an injected instruction can't do more than the acting user is already permitted to do. And visibility filtering scopes the tools a routine can even see, so there's no latent high-privilege tool hanging around for a payload to discover. Prompt injection defense has to be structural and in depth: at the tool layer, the auth layer, and the governance layer. Not a prompt-level patch.&lt;/p&gt;
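
&lt;p&gt;To make the schema layer concrete, here is a minimal sketch of validating a proposed tool call before it executes. The tool name and its parameters are hypothetical, and a production runtime would use a full schema language instead of this hand-rolled check.&lt;/p&gt;

```python
# Hypothetical approved schemas: tool name -> parameter name -> expected type.
SCHEMAS: dict[str, dict[str, type]] = {
    "linear.create_ticket": {"title": str, "description": str, "priority": int},
}


def validate_call(tool: str, params: dict) -> list[str]:
    """Return a list of violations; an empty list means the call may proceed."""
    schema = SCHEMAS.get(tool)
    if schema is None:
        return [f"unknown tool: {tool}"]
    errors = [f"unexpected parameter: {name}"
              for name in params if name not in schema]
    for name, expected in schema.items():
        if name not in params:
            errors.append(f"missing parameter: {name}")
        elif not isinstance(params[name], expected):
            errors.append(f"wrong type for {name}")
    return errors
```

&lt;p&gt;An injected instruction to call an unlisted tool, or to smuggle an extra parameter into a listed one, fails this check before any credential is touched.&lt;/p&gt;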

&lt;h3&gt;
  
  
  &lt;strong&gt;Agent lifecycle governance: centralized control and full visibility&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Agent lifecycle governance is the third pillar of an enterprise MCP runtime. Deploying autonomous agents at scale requires centralized control over which tools are available, to whom, and with what permissions, plus total visibility into what's happening at runtime.&lt;/p&gt;

&lt;p&gt;A dedicated runtime provides a full chain of custody for every agent action (user identity, tool name, parameters, and result), exportable to your SIEM via OpenTelemetry. Independent attestation (&lt;a href="https://www.arcade.dev/blog/soc-2-compliance-ai-agents-production-security/" rel="noopener noreferrer"&gt;Arcade.dev is SOC 2 Type 2 certified&lt;/a&gt;) validates that these controls hold in production, which matters when security reviews start before deployment, not after. The runtime also lets security teams enforce visibility filtering so a routine only sees the tools it explicitly has permission to use, and provides the infrastructure to mandate human-approval gates for any routine attempting to write data to a production system.&lt;/p&gt;
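
&lt;p&gt;As an illustration of what a per-call audit record contains, the sketch below serializes the four fields named above plus a timestamp. The record layout is an assumption for illustration; it does not show Arcade's log format or the OpenTelemetry export step.&lt;/p&gt;

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(user: str, tool: str, params: dict, result: str,
                 now: Optional[datetime] = None) -> str:
    """Serialize one tool call as a single JSON line."""
    now = now or datetime.now(timezone.utc)
    return json.dumps(
        {
            "timestamp": now.isoformat(),
            "user": user,      # the human the call was made on behalf of
            "tool": tool,
            "params": params,
            "result": result,
        },
        sort_keys=True,
    )
```

&lt;p&gt;One line per tool call, keyed by user, is what lets a reviewer reconstruct a cross-platform sequence such as a Jira update followed by a Slack post.&lt;/p&gt;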

&lt;h3&gt;
  
  
  &lt;strong&gt;Portability across agent runtimes using MCP&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Investing in an MCP runtime also guarantees architectural portability. Because tools are exposed over the &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;open MCP standard&lt;/a&gt;, the heavy lifting of building tool contracts, managing OAuth flows, and establishing governance policies happens once.&lt;/p&gt;

&lt;p&gt;That investment is usable from any MCP client (Claude Code Routines, Cursor, Claude Desktop, VS Code, ChatGPT, and custom applications) and stays portable across other agent harnesses like OpenAI Codex or on-prem deployments running open-weights models for regulated workloads. When your team swaps Claude for a different harness on a specific workflow, or moves sensitive routines onto on-prem compute for compliance reasons, the tool contracts, OAuth flows, and audit logs travel with you. The agent harness changes; the governance layer does not.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How to test and deploy your first remote Claude Code routine&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;With the runtime in place, the remaining question is how to ship a routine to production without breaking things. Writing a prompt, attaching a token, and flipping the schedule is not the move. The four-step framework below enforces clear boundaries on top of your MCP runtime:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 1: Wire up Arcade MCP Gateway as a custom connector&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before you can safely test anything, give the routine somewhere governed to call. With Arcade, the flow is (full integration walkthrough at &lt;a href="https://docs.arcade.dev/en/get-started/mcp-clients/claude-code" rel="noopener noreferrer"&gt;Arcade for Claude Code&lt;/a&gt;):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In your &lt;a href="http://app.arcade.dev/" rel="noopener noreferrer"&gt;Arcade dashboard&lt;/a&gt;, create a new &lt;a href="https://docs.arcade.dev/en/guides/mcp-gateways" rel="noopener noreferrer"&gt;&lt;strong&gt;MCP Gateway&lt;/strong&gt;&lt;/a&gt;. Configure it with &lt;a href="https://docs.arcade.dev/en/get-started/about-arcade" rel="noopener noreferrer"&gt;&lt;strong&gt;Arcade auth&lt;/strong&gt;&lt;/a&gt; so tools inherit per-user, per-action authorization rather than a shared service account.
&lt;/li&gt;
&lt;li&gt;Add the tools this routine needs to the gateway, scoped to the minimum the workflow requires and nothing more.
&lt;/li&gt;
&lt;li&gt;In the Claude web interface, create a &lt;strong&gt;custom connector&lt;/strong&gt; pointing at the gateway's URL.
&lt;/li&gt;
&lt;li&gt;Complete the one-time authorization to link the connector to the gateway.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With the connector live, any routine you create can include it alongside (or in place of) bundled first-party connectors.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 2: Sandbox execution&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Never test a new routine against production data. Sandbox the execution using the &lt;code&gt;/schedule&lt;/code&gt; command in the CLI or the "Run now" feature in the web interface.&lt;/p&gt;

&lt;p&gt;Point the routine at a scratch Notion workspace, a dedicated testing Slack channel, or a sandbox GitHub repository. Conduct multiple dry runs to observe how the routine handles edge cases, unexpected inputs, and empty datasets.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 3: Start with read-only permissions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When configuring the routine for its initial deployment, enforce a strict "Read-Only First" mandate. Use your Arcade gateway to scope the routine's MCP tools exclusively to read operations.&lt;/p&gt;

&lt;p&gt;For example, if you're building an incident triage routine, allow the routine to read from PagerDuty and output its analysis to a simple text file or a private Slack message. Validate the quality of the routine's logic and data extraction for at least one week before granting permission to write data or create tickets.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 4: Add human approval gates for write actions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;As you transition the routine to handle write operations, establish hard structural boundaries that mandate human oversight.&lt;/p&gt;

&lt;p&gt;Don't allow the agent to commit directly to your main branch or publish documentation live. Instead, configure the routine to draft documents, open pull requests, or push code exclusively to branches with a specific prefix. Every destructive or state-changing action requires a human engineer to review and merge the work.&lt;/p&gt;
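
&lt;p&gt;The branch-prefix rule can be checked mechanically, for example in a CI step or a server-side hook. The sketch below is a minimal illustration, not part of Claude Code or Arcade: the prefix &lt;code&gt;routine/&lt;/code&gt;, the protected branch names, and the function &lt;code&gt;is_push_allowed&lt;/code&gt; are all assumptions chosen for the example.&lt;/p&gt;

```python
# Hypothetical prefix; the article only says "a specific prefix".
AGENT_BRANCH_PREFIX = "routine/"

# Branches the routine must never push to directly (assumed names).
PROTECTED_BRANCHES = frozenset({"main", "master"})


def is_push_allowed(branch):
    """Return True only for non-protected branches under the agent prefix."""
    if branch in PROTECTED_BRANCHES:
        return False
    # Require a name after the prefix so the bare prefix is rejected.
    return branch.startswith(AGENT_BRANCH_PREFIX) and branch != AGENT_BRANCH_PREFIX


if __name__ == "__main__":
    for name in ("routine/fix-flaky-test", "main", "feature/login"):
        print(name, is_push_allowed(name))
```

&lt;p&gt;A check like this only constrains where the routine can push. The human review and merge step still has to be enforced by branch protection on the repository itself.&lt;/p&gt;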

&lt;h2&gt;
  
  
  &lt;strong&gt;Where to start&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Claude Code Routines deliver genuine unattended automation for engineering teams: Claude Code running on a schedule, GitHub event, or API call, entirely off the developer laptop. Realizing that value across an organization means acknowledging that moving from a localized laptop demo to a nightly production workflow introduces severe architectural and security challenges.&lt;/p&gt;

&lt;p&gt;You can't run autonomous workflows at scale using bundled connectors, first-party token inheritance, and opaque execution logs. Production deployments demand typed tool contracts, robust rate-limit handling, and explicit permission scoping to protect against prompt injection and data exposure.&lt;/p&gt;

&lt;p&gt;If your engineering team is evaluating how to run unattended AI agents safely, &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade is the industry’s first MCP runtime&lt;/a&gt; purpose-built for this. By unifying &lt;strong&gt;agent authorization&lt;/strong&gt;, &lt;strong&gt;agent-optimized tools&lt;/strong&gt;, and &lt;strong&gt;agent lifecycle governance&lt;/strong&gt; in a single runtime, we let you ship reliable production workflows without spending months rebuilding security and operational plumbing.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;FAQ&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What are Claude Code Routines, and what changed in the April 2026 release?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A routine is a saved Claude Code configuration (prompt, repositories, and connectors) packaged to run automatically on Anthropic-managed cloud infrastructure. The April 2026 release shipped three trigger types: scheduled, API (per-routine &lt;code&gt;/fire&lt;/code&gt; endpoint with a bearer token), and GitHub events (pull request or release activity on a connected repository). Routines are currently in research preview.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How many times per day can a Claude Code Routine run?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Routines share subscription usage with interactive sessions and have an additional daily cap on how many runs can start per account. Anthropic doesn't publish a specific number and it can change during the research preview, so per-event routines that fire on every PR comment or alert quickly become impractical.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do teams work around routine run quotas in production?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Two options. First, batch multiple tasks into a single daily "meta-orchestrator" routine and reserve real-time runs for only the highest-severity API and GitHub event triggers. Second, enable extra usage in Settings → Billing so runs that hit the cap continue on metered overage.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why are bundled connectors risky for enterprise unattended routines?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Bundled first-party connectors inherit the creating developer's global OAuth scope. That permission inheritance fails security reviews the moment the routine touches shared code, customer data, or regulated systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do unattended routines increase prompt injection risk?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Untrusted third-party text (PagerDuty descriptions, Sentry traces, customer emails) flows directly into the agent at runtime. A payload buried in that text can steer the agent toward unsafe actions. Defense has to be multi-layered at the runtime: isolated credentials the LLM never sees, per-user authorization evaluated on every action, schema enforcement on each tool call, and visibility filtering so the routine can't even discover tools it isn't permitted to use.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What is an MCP runtime, and why do I need it?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;An MCP runtime is the execution layer where agent tool calls run. It resolves credentials just-in-time, authorizes each action against a specific user's permissions, enforces tool schemas, and writes a unified audit log. It is not another proxy in front of your enterprise systems. The agent is already the proxy. The runtime is where identity, policy, and execution come together.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What is "post-prompt authorization"?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The runtime checks each individual tool action at execution time against the acting user's permissions and the routine's policy. The routine never inherits the creator's blanket credentials.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Which routine actions should require human approval?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Any write or state-changing action (creating tickets, committing code, publishing documentation) should land as a draft, a pull request, or an item in a triage queue, and pass a human review gate before it takes effect.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do Slack API rate limits affect these workflows?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Slack's &lt;code&gt;conversations.history&lt;/code&gt; endpoint now rate-limits non-Marketplace apps to a single request per minute. Production designs use Slack Search, targeted webhooks, or curated context instead of bulk history pulls.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What should I implement first to deploy a safe routine?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Wire up Arcade as a custom connector first so the routine calls tools through a governed runtime, then test in a sandbox, enforce read-only tools, and introduce human-in-the-loop gates before granting write permissions.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What should be logged for auditability in enterprise routines?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Log the triggering event, the tools called, the target resources, the acting user or service account, and the resulting object IDs (e.g., Sentry event ID → Linear ticket ID).&lt;/p&gt;
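
&lt;p&gt;As a rough sketch, those fields could be captured in a single structured record per run. The class name, field names, and example tool names below are illustrative assumptions, not an Arcade or Anthropic schema.&lt;/p&gt;

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One audit entry per routine run; field names are illustrative."""
    trigger_event: str        # what started the run, e.g. a Sentry event ID
    tools_called: tuple       # tool names in call order
    target_resources: tuple   # resources the tools touched
    actor: str                # acting user or service account
    result_ids: dict          # source object ID mapped to created object ID
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self):
        """Serialize to one JSON line, suitable for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    # All identifiers below are made up for the example.
    record = AuditRecord(
        trigger_event="sentry:event:abc123",
        tools_called=("Sentry.GetEvent", "Linear.CreateIssue"),
        target_resources=("sentry/project/checkout", "linear/team/SRE"),
        actor="svc-triage-routine",
        result_ids={"sentry:abc123": "linear:SRE-42"},
    )
    print(record.to_json())
```

&lt;p&gt;Keeping the source-to-result mapping in the same record is what lets a reviewer trace a created ticket back to the event that caused it.&lt;/p&gt;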

</description>
      <category>agents</category>
      <category>ai</category>
      <category>claude</category>
      <category>devops</category>
    </item>
    <item>
      <title>Claude Code for the Outer Loop: An AI SRE Playbook to Reduce On-Call Toil</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Wed, 22 Apr 2026 22:03:16 +0000</pubDate>
      <link>https://dev.to/arcade/claude-code-for-the-outer-loop-an-ai-sre-playbook-to-reduce-on-call-toil-1ghd</link>
      <guid>https://dev.to/arcade/claude-code-for-the-outer-loop-an-ai-sre-playbook-to-reduce-on-call-toil-1ghd</guid>
      <description>&lt;p&gt;It is 2:13am. PagerDuty fires for checkout-service, p95 past threshold for four minutes. You open Datadog, find the wrong dashboard, then the right one, then the CI tool for recent deploys, then Jira for open incidents, then #incidents in Slack to check whether a co-worker is already in the war room. Eight minutes in, you have a working hypothesis.&lt;/p&gt;

&lt;p&gt;That is not incident response. That is a context-loading tax the on-call pays before the work begins.&lt;/p&gt;

&lt;p&gt;Coding agents, such as Claude Code, are eating the inner loop. The outer loop is a different story. Operational work (incident response, runbook execution, SLO investigation, on-call handoffs) still looks almost identical to how it looked five years ago. The gap is not the model. It is the infrastructure to run agentic tools across a team, against production, with the auth, scope, and audit guarantees an SRE program needs.&lt;/p&gt;

&lt;p&gt;This article is about the execution layer. The data substrate underneath is the other half of the problem, and I've written about it on &lt;a href="https://clickhouse.com/blog/ai-sre-observability-architecture" rel="noopener noreferrer"&gt;the ClickHouse blog.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code already works in the outer loop.&lt;/strong&gt; The interface, the reasoning, and the tool-call contract all transfer. Only the data sources change.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Five workflows prove it.&lt;/strong&gt; Incident triage, runbook execution, postmortem drafting, SLO investigation, on-call handoffs. Every one of them is Claude-shaped.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The auth, scope, and audit gap is the bottleneck.&lt;/strong&gt; The MCP servers for most SaaS tools already exist. The problem is that when every engineer wires their own connection, you inherit inconsistent authorization, over-scoped credentials, and no audit trail. Useful to one person at best. A data exposure incident at worst.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The gap is an MCP runtime, not a model.&lt;/strong&gt; Managed auth, hosted compute, tool-level governance, persistent audit logs. Until something provides all four, outer-loop AI stays a party trick.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An MCP runtime is more than an MCP gateway.&lt;/strong&gt; A gateway routes MCP tools under one URL. An MCP runtime adds the compute that runs them, the auth that scopes them, and the audit trail that makes them safe in production. Arcade.dev is an MCP runtime with a gateway inside it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Five AI SRE workflows and the MCP servers that power them
&lt;/h2&gt;

&lt;p&gt;If you only read one thing in this article, read this table.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Workflow&lt;/th&gt;
&lt;th&gt;MCP servers&lt;/th&gt;
&lt;th&gt;What Claude Code does&lt;/th&gt;
&lt;th&gt;What on-call does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Incident triage&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.arcade.dev/en/resources/integrations/development/pagerduty" rel="noopener noreferrer"&gt;PagerDuty&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/datadog-api" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/slack" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/github" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Pulls the PagerDuty payload, correlates Datadog signals in the window, checks recent deploys, scans Jira and #incidents, drafts a war room post&lt;/td&gt;
&lt;td&gt;Decides the next move&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Runbook execution&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.arcade.dev/tools/confluence" rel="noopener noreferrer"&gt;Confluence&lt;/a&gt;, &lt;a href="https://github.com/containers/kubernetes-mcp-server" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/github" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Parses the Confluence doc into steps, lays out the diagnostic sequence with commands and expected output, proposes any write command&lt;/td&gt;
&lt;td&gt;Runs the steps, approves every write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Postmortem drafting&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.arcade.dev/tools/slack" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/pagerduty" rel="noopener noreferrer"&gt;PagerDuty&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/datadog-api" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/confluence" rel="noopener noreferrer"&gt;Confluence&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Reconstructs the timeline from Slack, PagerDuty, Datadog, and the deploy log, fills the team template with source-linked evidence&lt;/td&gt;
&lt;td&gt;Writes the root cause and action items&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;SLO investigation&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.arcade.dev/en/resources/integrations/development/datadog-api" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/pagerduty" rel="noopener noreferrer"&gt;PagerDuty&lt;/a&gt;, &lt;a href="https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents-mcp" rel="noopener noreferrer"&gt;Snowflake&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/confluence" rel="noopener noreferrer"&gt;Confluence&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Finds the burn inflection, correlates deploys, config changes, traffic shifts, and upstream incidents, ranks hypotheses with linked evidence&lt;/td&gt;
&lt;td&gt;Evaluates hypotheses, decides action items&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;On-call handoff&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.arcade.dev/en/resources/integrations/development/pagerduty" rel="noopener noreferrer"&gt;PagerDuty&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/datadog-api" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/slack" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/zendesk" rel="noopener noreferrer"&gt;Zendesk&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Assembles the shift briefing from pages, active incidents, baking deploys, SLO burn, and open action items, delivers it as a Slack DM&lt;/td&gt;
&lt;td&gt;Reviews, adds color, signs off&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Workflow 1: Incident triage is mostly archaeology
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;The manual triage above is a parallelism problem, not a skill problem. One engineer, five tools, sequential context loads. Every on-call engineer I know tells the same story: "I spent the first ten minutes figuring out what was happening."&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code does
&lt;/h3&gt;

&lt;p&gt;Hand the alert to Claude Code: "Triage this alert. Correlate it with the Datadog metrics, service logs, and deployment history, and scan Slack history for other correlated failures."&lt;/p&gt;

&lt;p&gt;Claude Code returns the alert context in two sentences, the top three correlated signals with direct Datadog links, and the deploys most likely to matter by service-graph proximity with commit SHAs and authors. Two to three minutes end to end, running while you are opening the laptop. Grafana's team &lt;a href="https://grafana.com/blog/a-tale-of-two-incident-responses-how-our-ai-assist-helped-us-find-the-cause-3-5x-faster/" rel="noopener noreferrer"&gt;reported a 3.5x reduction&lt;/a&gt; in time to root cause using a similar pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  What on-call does
&lt;/h3&gt;

&lt;p&gt;By the time the on-call moves from the alert on their phone to opening their laptop, Claude Code's initial analysis is waiting. They read the summary, validate it against the dashboards, cross-reference the ranked deploys against what they know shipped recently, and decide the next move. They also catch the failure modes: the correlation that is spurious, the deploy the service graph does not know about, the #incidents thread that was noise. Claude Code compresses the archaeology. The on-call judges it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The auth, scope, and audit gap
&lt;/h3&gt;

&lt;p&gt;PagerDuty, Datadog, Slack, Jira, and GitHub all ship MCP servers. The problem is running them across a team, not building them.&lt;/p&gt;

&lt;p&gt;If the setup is not configured consistently for every engineer on the rotation, the workflow breaks on the shift that needs it most. Misconfigured permissions lead to inconclusive analysis, and inconclusive analysis at 3am is worse than no analysis at all. Engineers who wire up their own connections often grant themselves broader scopes than the workflow needs, and the next access review turns into cleanup nobody planned for. The failure mode that matters most: if tool access is not scoped properly, a diagnostic step can inadvertently trigger a write action, mutate state in production, and turn the triage itself into the incident. Consistent setup, scoped credentials, and read-only enforcement are properties of the MCP runtime, not the individual engineer's configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow 2: Runbook execution at 3am
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;Mature teams maintain their runbooks. The ones in constant use stay fresh because people fix them after every incident. The rot lives in two quieter places. Runbooks that fire once a quarter drift between uses, and nobody notices until the next 3am page reveals that half the commands point at deprecated tools and renamed clusters. And new engineers on the rotation often do not know which runbook applies to the alert in front of them. Finding the right doc at 3am is its own skill, and it takes months on the rotation to build.&lt;/p&gt;

&lt;p&gt;"Runbooks are a lie we tell ourselves."&lt;/p&gt;

&lt;p&gt;During my time leading &lt;a href="https://www.confluent.io/blog/making-apache-kafka-10x-more-reliable/" rel="noopener noreferrer"&gt;reliability at Confluent&lt;/a&gt; and Dropbox, I saw this pattern play out across very different stacks. It is not an organization-specific problem. It is the law of prioritization playing out: the runbooks that fire often get the attention, and the ones that fire rarely do not.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code does
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Finding the right runbook.&lt;/strong&gt; Once triage narrows the problem, the on-call needs to know which runbook applies and what to run. Point Claude Code at the alert. It matches the metadata (service, symptom, tag) against the runbook index, surfaces the top candidate, and lays out the diagnostic sequence with exact commands, the systems they target, and expected output for each step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keeping runbooks fresh.&lt;/strong&gt; Most mature teams run quality weeks or reliability sprints to refresh runbooks. At Confluent, we did this quarterly. Claude Code makes the sprint cheaper, and because staging is a safe environment, it can run the whole pass itself: replay every runbook against staging in a batch, flag the commands pointing at deprecated tools and renamed clusters, and regenerate steps against current infra. The rot that accumulated since the last review gets caught in hours instead of weeks.&lt;/p&gt;

&lt;h3&gt;
  
  
  What on-call does
&lt;/h3&gt;

&lt;p&gt;The on-call runs the steps. Claude Code lays out the plan, the engineer executes it. Opening unbounded production access to a coding agent does not pass the sniff test for any reliability org I have worked with, and should not. The engineer confirms Claude Code picked the right runbook, runs each diagnostic in their own terminal with their own scoped credentials, and tracks pass/fail as they go. When Claude Code picks the wrong runbook, the on-call re-points it, and that correction feeds the index for the next page.&lt;/p&gt;

&lt;h3&gt;
  
  
  The auth, scope, and audit gap
&lt;/h3&gt;

&lt;p&gt;If Claude Code does not execute against production directly, enforcement becomes the whole game. The runbook has to be scoped to the user running it, the environment it targets, and the actions the current step actually needs. A step that is safe in staging is dangerous in prod. A step that is safe for a senior SRE is catastrophic for a new joiner still learning the cluster. Without tool-level governance that understands user, environment, and action together, you are back to trusting every engineer to read carefully at 3am, which is exactly the failure mode the runbook was supposed to prevent. Finding the right runbook and enforcing the right scopes are two different problems. Claude Code solves the first. The MCP runtime solves the second, with governance scoped per user, per environment, and per action. Both have to work, and neither replaces the other.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow 3: Postmortem drafting rots at the archaeology step
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;The incident resolved at 4pm. The retro is Thursday. Someone has to write the draft. The hard part is not the thinking. It is the archaeology: Slack scrollback, PagerDuty timeline, Datadog graphs, deploy history, team template. The &lt;a href="https://incident.io/blog/postmortem-software-roi-calculator" rel="noopener noreferrer"&gt;incident.io team puts manual reconstruction&lt;/a&gt; at 60 to 90 minutes per incident. That matches every team I have run.&lt;/p&gt;

&lt;p&gt;Most postmortems get drafted badly at the last minute. The retro starts from a weak foundation, and the same incident class comes back six months later.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code does
&lt;/h3&gt;

&lt;p&gt;Type into Claude Code: "Draft the postmortem for INC-4729 using the team template." Claude Code assembles the archaeology. It pulls the Slack transcript, the PagerDuty timeline, the Datadog panels from the incident dashboard, and the deploy log for every service touched. It drops each of those into the team template with source links, so every timeline entry traces back to the panel, commit, or message it came from.&lt;/p&gt;

&lt;p&gt;The draft stops at archaeology. Timeline, impact, affected services, evidence. The root cause, contributing factors, and action items fields are left structurally empty. Teams that let AI draft those turn every retro into a cleanup exercise. &lt;a href="https://engineering.zalando.com/posts/2025/09/dead-ends-or-data-goldmines-ai-powered-postmortem-analysis.html" rel="noopener noreferrer"&gt;Zalando's team reported hallucination rates as high as 40 percent&lt;/a&gt; in early AI-drafted postmortem analysis, and the lesson is not better prompting. It is to keep anything causal out of the draft.&lt;/p&gt;

&lt;h3&gt;
  
  
  What on-call does
&lt;/h3&gt;

&lt;p&gt;The on-call and the retro group review the draft. They are not rewriting it. They correct timeline entries that are wrong, add the signal the archaeology missed (a customer report that came through email, a related incident three days earlier, the deploy two sprints ago that introduced the latent bug), and spend their time on the part that matters: running the 5 whys, pressure-testing the root cause, deciding action items.&lt;/p&gt;

&lt;p&gt;The leverage is strongest on the long tail. In my experience, eighty to ninety percent of incidents a mature team handles are high-volume, low-priority events where the archaeology is mechanical and the writeup feels mundane. That is where teams cut corners, and where repeat incidents quietly accumulate. Claude Code absorbs the mundane work so the high-judgment work gets attention on every incident, not just the big ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  The auth, scope, and audit gap
&lt;/h3&gt;

&lt;p&gt;The tools the draft pulls from carry the most sensitive data in the company. #incidents has customer PII and vendor secrets. The deploy log has commit messages that sometimes leak security context. Datadog dashboards expose traffic patterns across the fleet. The engineer who set up the Slack connector usually has broader workspace read than the postmortem role needs, and the draft ends up citing messages it had no business reading.&lt;/p&gt;

&lt;p&gt;Scoping has to happen at the tool layer, not the prompt layer. Which channels the draft can read, which dashboards it can fetch, which tables it can query, all bounded by policy and tied to the user triggering the workflow. Then a provenance trail in a persistent log, showing what the AI accessed, when, and under whose identity. That is the half compliance will ask about, and the half that decides whether the workflow survives its first security review.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow 4: SLO investigation and error budget reviews
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;At Confluent, my team reviewed our availability SLO every Monday. We pulled the week's incidents, measured their impact on the SLO and the customer SLA, and mapped the root causes from each postmortem back to services and themes. The goal was to see whether the week's error budget had been spent on one repeat problem or scattered across five unrelated ones.&lt;/p&gt;

&lt;p&gt;Most of the prep was manual correlation: error budget delta, matched to PagerDuty incident, matched to Datadog regression, matched to deploy history, matched to the postmortem, matched to the theme bucket. One SRE typically spent four to six hours on that pipeline before the meeting started. The thinking happened in the review. The prep was legwork.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code does
&lt;/h3&gt;

&lt;p&gt;Ask Claude Code to prep the Monday review. It pulls the SLO and SLA deltas, fetches every PagerDuty incident in the window, joins each to the Datadog regression that matches in time and service, pulls the postmortem from Confluence, and extracts the root cause section. It groups root causes into themes using the team's existing taxonomy and hands back a structured brief: error budget delta, the incidents that account for it, the themes, and the open questions the postmortems did not resolve.&lt;/p&gt;

&lt;p&gt;What Claude Code does not do is quantify how much of the burn each incident "caused" in percentage terms. That is causal analysis current models do poorly, and a made-up percentage in a metrics review is worse than no number.&lt;/p&gt;

&lt;p&gt;The AI hunts. The human decides.&lt;/p&gt;

&lt;h3&gt;
  
  
  What on-call does
&lt;/h3&gt;

&lt;p&gt;The SRE running the review reads the brief, validates the incident-to-regression matches (Claude Code will get some wrong), writes the causal story the AI refused to guess at, decides which themes warrant action items, and raises the open questions in the meeting. Four to six hours of prep becomes thirty minutes of review and correction.&lt;/p&gt;

&lt;h3&gt;
  
  
  The auth, scope, and audit gap
&lt;/h3&gt;

&lt;p&gt;Warehouse-backed workflows are the ones SRE teams have held off on the longest, and the reason is scope. You cannot hand Claude Code unrestricted warehouse access and hope prompt engineering keeps it away from PII. You cannot give it unbounded query budgets and wait to see a five-thousand-dollar scan on next month's bill. Scope enforcement at the MCP runtime layer is what changes the math: this task queries these tables and not others, costs less than fifty dollars, never touches prod write paths. Without that, the workflow stays a prototype and never makes the rotation.&lt;/p&gt;
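
&lt;p&gt;A minimal sketch of that kind of check, run before any query reaches the warehouse, might look like the following. It is an illustration under stated assumptions, not Arcade's API: the class and function names, the table names, and the idea that the caller can supply a cost estimate up front are all hypothetical.&lt;/p&gt;

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRequest:
    tables: frozenset          # tables the query reads
    estimated_cost_usd: float  # estimate supplied by the caller (assumed available)
    is_write: bool             # True if the statement mutates data


@dataclass(frozen=True)
class TaskScope:
    allowed_tables: frozenset  # the only tables this task may touch
    max_cost_usd: float        # per-task budget


def check_query(scope, request):
    """Return a list of violations; an empty list means the query may run."""
    violations = []
    if request.is_write:
        violations.append("write statements are not permitted")
    blocked = sorted(request.tables - scope.allowed_tables)
    if blocked:
        violations.append("tables outside scope: " + ", ".join(blocked))
    # True when the estimate exceeds the task budget.
    if operator.gt(request.estimated_cost_usd, scope.max_cost_usd):
        violations.append("estimated cost exceeds the task budget")
    return violations


if __name__ == "__main__":
    scope = TaskScope(frozenset({"slo_daily", "incidents"}), 50.0)
    risky = QueryRequest(frozenset({"slo_daily", "customers_pii"}), 5000.0, False)
    print(check_query(scope, risky))
```

&lt;p&gt;The point of the design is that the decision is made from the request and the policy alone, so nothing in the prompt can talk the check into a different answer.&lt;/p&gt;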

&lt;h2&gt;
  
  
  Workflow 5: On-call handoffs lose the context nobody wrote down
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;Handoffs are the most undervalued ritual in SRE work because the incidents they prevent never get counted. Handoff quality tracks how tired the outgoing engineer is, which means handoffs are worst on the shifts that had the most incidents, which is when they matter most. The non-obvious cost: the morning incident where the new on-call did not know a deploy was still baking, and ended up paging the previous on-call at 8am to ask what happened overnight.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code does
&lt;/h3&gt;

&lt;p&gt;Claude Code generates the briefing at the rotation boundary, without anyone triggering it. It pulls the last 24 hours of pages with resolution notes, active incidents, baking deploys, SLOs that crossed a burn threshold, unresolved #incidents threads, Zendesk escalations, and customer reports that came in through the on-call email alias. It lists open action items assigned to the rotation. It delivers the briefing as a Slack DM with a copy in the team's handoff Confluence doc.&lt;/p&gt;

&lt;h3&gt;
  
  
  What on-call does
&lt;/h3&gt;

&lt;p&gt;The outgoing engineer adds the color only they can add: what they think is a false alarm, which customer report to watch, which deploy they are nervous about, which alert they silenced and why. That is the handoff knowledge that lives in the outgoing engineer's head and nowhere else. Claude Code assembles the facts. The on-call provides the judgment.&lt;/p&gt;

&lt;h3&gt;
  
  
  The auth, scope, and audit gap
&lt;/h3&gt;

&lt;p&gt;The briefing fires at 5pm whether anyone is logged in or not, which means it needs a credential that lives outside any single engineer's session. Dotfiles on a closed laptop do not qualify. A scheduled workflow without a persistent service identity is not a workflow. It is a cron job that silently stops running the next time someone rotates off the team. Persistent service identity is a property of the MCP runtime, not the engineer's laptop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code is a companion, not an autonomous AI SRE
&lt;/h2&gt;

&lt;p&gt;Five workflows, one pattern. Claude Code reads, correlates, drafts, and waits. The human decides.&lt;/p&gt;

&lt;p&gt;Most of the AI SRE market is betting the other way. &lt;a href="https://traversal.com/" rel="noopener noreferrer"&gt;Traversal&lt;/a&gt;, &lt;a href="https://resolve.ai/" rel="noopener noreferrer"&gt;Resolve&lt;/a&gt;, &lt;a href="https://www.anyshift.io/" rel="noopener noreferrer"&gt;Anyshift&lt;/a&gt;, and others are building toward autonomous agents that page, remediate, and close incidents on their own. I am skeptical. A model's output is a function of its capability and the context it is given. Current models can do the archaeology reliably. They cannot reliably be given enough scoped context and the right tools to remediate production unsupervised. That is a context and tooling gap, not a model gap, and I would rather ship the shape that already works.&lt;/p&gt;

&lt;p&gt;Claude Code runs when you ask. It stops when the next step needs judgment. It never pages, rolls back, or closes an incident on its own.&lt;/p&gt;

&lt;p&gt;A companion also dodges the procurement fight that stalls autonomous rollouts. You are not replacing a role or adding an on-call tier. You are pointing the tool your team already uses at data sources they already trust, with an MCP runtime that scopes what it can do. The security review goes from "new vendor, new risk" to "scoped tools inside an existing agent."&lt;/p&gt;

&lt;p&gt;Every workflow in this article starts as a prompt and grows into a skill. The triage prompt, the runbook dispatcher, the postmortem drafter, the SLO prep pipeline, the handoff briefing: each one begins as something one engineer types once, and becomes a packaged skill every engineer on the rotation invokes the same way. The skill keeps getting sharper because the team keeps editing it: a new data source here, a tighter prompt there, a correction after an incident surfaces a blind spot. One person's trick becomes team infrastructure, and the infrastructure compounds.&lt;/p&gt;

&lt;p&gt;Reliability comes from running a proper reliability program, and a proper program is mostly operational work around rituals: triage, runbooks, postmortems, SLO reviews, handoffs. Claude Code earns its keep by making the rituals cheap enough to happen on every shift, not just the ones where someone has the energy for them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an AI SRE needs from its MCP tool integration layer
&lt;/h2&gt;

&lt;p&gt;Every workflow above needs the same four things.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Managed authentication and authorization across tools.&lt;/strong&gt; OAuth flows for every connected tool, credentials refreshed automatically, scoped per user, reachable from any device including a phone at 3am.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managed compute, always on, team-wide.&lt;/strong&gt; Tools run on shared infrastructure, cloud-hosted or on-prem, with the same behavior whether the trigger came from a laptop, a phone, a webhook, or a cron job.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool- and agent-level governance.&lt;/strong&gt; Per-tool permission policies, per-task cost budgets, and per-query data access limits enforced where the call happens, not where the model proposes it. This is the difference between a workflow security will approve and one they kill on sight.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent audit logs.&lt;/strong&gt; Every tool call logged with triggering user, arguments, response, and timestamp, in a log the agent cannot modify. Without this you cannot retro the AI, and you cannot trust it.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Arcade: an MCP runtime for AI SRE workflows
&lt;/h2&gt;

&lt;p&gt;Arcade is an &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;MCP runtime&lt;/a&gt; built to close exactly this gap. &lt;a href="https://www.arcade.dev/blog/sso-for-ai-agents-authentication-and-authorization-guide/" rel="noopener noreferrer"&gt;Managed OAuth&lt;/a&gt; handles every connected tool, with credentials that refresh automatically and never touch the language model. Every tool call runs &lt;a href="https://docs.arcade.dev/en/guides/create-tools/tool-basics/runtime-data-access" rel="noopener noreferrer"&gt;on behalf of the user&lt;/a&gt; who triggered it, so native permissions in PagerDuty, Datadog, and Snowflake apply exactly as they would outside the agent. You connect PagerDuty once, and every Claude Code session on your team picks it up at the right scope.&lt;/p&gt;

&lt;p&gt;The runtime runs tools on hosted workers, deployable in your cloud or on-prem, and enforces per-tool policies where the call happens, not where the model proposes it. The same workflow triggered from a phone, a laptop, or a cron job executes on shared infrastructure. Policies fire at the MCP runtime layer: "this workflow queries these Snowflake tables and not others," "this workflow can propose PagerDuty actions but cannot execute without approval," "this workflow has a $25 query budget."&lt;/p&gt;

&lt;p&gt;Every tool call lands in an OpenTelemetry-compatible run log with triggering user, arguments, response, and timestamp. It drops straight into the observability pipeline your platform team already runs. When your postmortem asks what Claude Code did during the incident, you have the answer. When compliance asks for every query this AI ran against the warehouse last quarter, you have the answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.arcade.dev/toolkits" rel="noopener noreferrer"&gt;Prebuilt tools&lt;/a&gt; ship for PagerDuty, Datadog, Slack, Jira, Confluence, GitHub, Snowflake, and more. You can also &lt;a href="https://docs.arcade.dev/en/home/custom-mcp-server-quickstart" rel="noopener noreferrer"&gt;bring your own MCP servers&lt;/a&gt; into the runtime: the PagerDuty, Datadog, Snowflake, and Kubernetes servers linked in the table above drop in as-is and inherit the same managed auth, policy enforcement, and audit logs as the prebuilt ones. You extend your existing MCP investment instead of replacing it.&lt;/p&gt;

&lt;p&gt;You can build this without Arcade, and the reason not to is the same reason you did not write your own CI system: the work is real, the edge cases are ugly, and it is not where your reliability differentiation lives. A mature team can hand-roll managed OAuth, stand up hosted workers, wire per-tool policy enforcement, and ship a tamper-evident audit log. A few platform teams I know started down that path and concluded it was too costly to own, or simply not where they wanted to spend their reliability budget.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reducing on-call toil is where SRE leverage lives
&lt;/h2&gt;

&lt;p&gt;The outer loop has not caught up to the inner loop because the infrastructure to run agentic tools safely against production systems has been missing. A coding assistant only needs your repo and your editor. An operational assistant needs managed identity, hosted compute, enforced governance, and an audit trail, because it reaches into systems where mistakes page the CTO.&lt;/p&gt;

&lt;p&gt;The SRE teams that figure this out over the next year will pull away from the ones that do not, the same way the teams that adopted Claude Code for inner-loop work in 2025 pulled away from the teams that waited. The inner loop is solved. The outer loop is where the leverage lives now, sitting on a &lt;a href="https://clickhouse.com/blog/ai-sre-observability-architecture" rel="noopener noreferrer"&gt;data substrate that is its own design problem&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Claude Code does not replace the on-call. It just lets them start on page 5 instead of page 1.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is an AI SRE?
&lt;/h3&gt;

&lt;p&gt;An AI SRE is an AI assistant that helps site reliability engineers with operational work: incident triage, runbook execution, postmortem drafting, SLO investigation, and on-call handoffs. Most practical AI SRE deployments today run as companions that read, correlate, and draft while a human engineer decides the next move, rather than as autonomous agents that page, remediate, and close incidents on their own.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between an MCP gateway and an MCP runtime?
&lt;/h3&gt;

&lt;p&gt;An MCP gateway routes MCP tools under a single URL so any MCP client can call them. An MCP runtime goes further: it adds the compute that runs the tools, managed authentication, per-tool permission enforcement, and persistent audit logs. A gateway is routing infrastructure. A runtime is production infrastructure. Arcade is an MCP runtime with a gateway inside it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can Claude Code replace an on-call engineer?
&lt;/h3&gt;

&lt;p&gt;No. Claude Code works best as a companion to the on-call engineer, not a replacement. It compresses the archaeology (pulling alerts, correlating signals, drafting summaries) so the engineer starts with context already loaded. Every decision that requires judgment (rolling back a deploy, paging a co-worker, closing an incident) stays with the human.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I use Claude Code for incident triage?
&lt;/h3&gt;

&lt;p&gt;Point Claude Code at the alert with a prompt like "Triage this alert, correlated with Datadog metrics, service logs, and deployment history. Scan Slack for correlated failures." With MCP servers for PagerDuty, Datadog, Slack, and GitHub wired into an MCP runtime, Claude Code returns a summary, the top correlated signals, candidate deploys, and a draft war room post in two to three minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is it safe to let Claude Code execute runbooks in production?
&lt;/h3&gt;

&lt;p&gt;Claude Code should not execute against production directly. The safer pattern is for Claude Code to parse the runbook, lay out the diagnostic sequence, and propose commands, while the on-call engineer runs each step in their own terminal with their own scoped credentials. Unbounded production access for any coding agent should not pass a reliability review.&lt;/p&gt;

&lt;h3&gt;
  
  
  What MCP servers do I need for AI SRE workflows?
&lt;/h3&gt;

&lt;p&gt;The core set covers the tools already in an SRE rotation: PagerDuty, Datadog, Slack, and GitHub for incident triage; Confluence and Kubernetes for runbook execution; Snowflake for SLO investigation; Zendesk for on-call handoffs. Each has a production-ready MCP server that can run inside an MCP runtime like Arcade, which handles managed auth, policies, and audit logs across all of them.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Arcade work with Claude Code?
&lt;/h3&gt;

&lt;p&gt;Arcade is an MCP runtime that manages OAuth, per-tool permission policies, and audit logs for every tool Claude Code calls. You connect PagerDuty, Datadog, or Snowflake once, and every Claude Code session on your team picks up the tools at the right scope. Arcade also runs bring-your-own MCP servers, so existing integrations work as-is.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between AI SRE tools like Traversal and using Claude Code with an MCP runtime?
&lt;/h3&gt;

&lt;p&gt;Traversal, Resolve, and Anyshift are building autonomous agents that page, remediate, and close incidents on their own. Claude Code with an MCP runtime takes the companion approach: read, correlate, draft, and wait for the engineer to decide. The companion pattern ships today. The autonomous bet does not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does the observability store underneath matter as much as the MCP runtime above?
&lt;/h3&gt;

&lt;p&gt;Yes. An AI agent runs 10 to 30 queries per investigation, and most observability stores weren't built to serve that pattern at the retention and cardinality an SRE needs. The MCP runtime handles the execution layer; the observability store handles the cognitive substrate. Both matter. I've written about the substrate side &lt;a href="https://clickhouse.com/blog/ai-sre-observability-architecture" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>sre</category>
      <category>devops</category>
      <category>mcp</category>
    </item>
    <item>
      <title>How to Connect AI Agents to Enterprise Productivity Tools Securely (2026 Architecture Guide)</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Thu, 09 Apr 2026 20:58:36 +0000</pubDate>
      <link>https://dev.to/arcade/how-to-connect-ai-agents-to-enterprise-productivity-tools-securely-2026-architecture-guide-5d0n</link>
      <guid>https://dev.to/arcade/how-to-connect-ai-agents-to-enterprise-productivity-tools-securely-2026-architecture-guide-5d0n</guid>
      <description>&lt;p&gt;Most enterprise AI agents today can analyze but can't execute. They summarize documents, surface insights, and draft responses. They don't close support tickets, update Salesforce, or trigger deployments. The ROI stays incremental. The architecture that solves this is an MCP runtime, a secure execution layer that handles authorization, credentials, and tool calling on behalf of each user.&lt;/p&gt;

&lt;p&gt;The real transformation happens when agents take actions, when employees direct work instead of doing it. But getting agents to safely execute across enterprise systems is where everything falls apart.&lt;/p&gt;

&lt;p&gt;Recent industry studies from IDC and MIT show that &lt;a href="https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/" rel="noopener noreferrer"&gt;88 to 95 percent of enterprise AI pilots fail to reach production&lt;/a&gt;. The root cause isn't the language model. It's the complexity of secure integration, and every month spent rebuilding auth plumbing is a month your agents aren't delivering business value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use an MCP runtime as the secure action layer&lt;/strong&gt; between your agents and enterprise tools. It evaluates the intersection of agent permissions and user permissions per action at runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute every tool call on behalf of the user (OBO).&lt;/strong&gt; The agent acts with the user's credentials, scoped to the user's native permissions, and every action is attributable in audit logs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep OAuth tokens out of the LLM context.&lt;/strong&gt; Credentials must be vaulted at the runtime layer where the model cannot observe, alter, or leak them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do not use static service accounts.&lt;/strong&gt; They break permission models and turn a single prompt injection into an enterprise-wide incident.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build with agent-optimized tools, not raw API wrappers.&lt;/strong&gt; Use intent-level operations with validated schemas that prevent parameter hallucination and eliminate retry loops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Require human-in-the-loop approvals for all destructive actions.&lt;/strong&gt; Deletes, bulk updates, and external communications must pause for explicit sign-off before execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ship audit logs and telemetry from day one.&lt;/strong&gt; Export every tool call via OpenTelemetry to your SIEM for compliance, incident response, and root cause analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why connecting AI agents to enterprise tools is hard: identity, permissions, and safe execution
&lt;/h2&gt;

&lt;p&gt;The bottleneck in agentic systems, such as Claude Cowork or OpenClaw, isn't making API calls. It's identity propagation, permission inheritance, and safe execution within complex enterprise environments.&lt;/p&gt;

&lt;p&gt;When teams build direct integrations between LLMs and enterprise software, they immediately hit friction. Developers spend cycles managing fragile OAuth token lifecycles, handling async user consent flows, manually tuning least-privilege authorization scopes, and building custom approval controls. This is undifferentiated infrastructure work that burns engineering time without advancing the agent's core capabilities.&lt;/p&gt;

&lt;p&gt;Because this work is tedious and blocks core agent development, teams frequently take a dangerous shortcut: they use service accounts.&lt;/p&gt;

&lt;p&gt;Granting an agent global read and write access across an entire enterprise instance breaks native permission models. You're bypassing years of carefully configured role-based access controls.&lt;/p&gt;

&lt;p&gt;A single manipulated input can result in instant, untraceable data exfiltration or system modification. If an agent holds a static API key with global write access, a localized &lt;a href="https://genai.owasp.org/llm-top-10/" rel="noopener noreferrer"&gt;prompt injection vulnerability&lt;/a&gt; becomes an enterprise-wide blast radius.&lt;/p&gt;

&lt;p&gt;Teams tend to make one of two mistakes here. Give the agent its own identity, and an intern can bypass their own permission limits by acting through the agent. Let the agent inherit the user's full access, and one prompt injection cascades through every connected system.&lt;/p&gt;

&lt;p&gt;The right answer is the intersection: what is this agent allowed to do &lt;strong&gt;AND&lt;/strong&gt; what is this user allowed to do, evaluated per action, at runtime. This is the permission intersection model, and it's the only approach that prevents both privilege escalation and blast radius expansion simultaneously.&lt;/p&gt;

&lt;p&gt;This evaluation must happen at the runtime layer. Not at login time, not in the prompt, and not in the application code. Without it, scaling agents beyond single-user demos is unsafe.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architectural shift: The agent is already the proxy
&lt;/h2&gt;

&lt;p&gt;Before evaluating specific integration approaches, you need to understand why the traditional enterprise architecture no longer applies.&lt;/p&gt;

&lt;p&gt;In the pre-agentic model, a proxy (API gateway) sits between applications and APIs, routing, authenticating, and rate limiting. The proxy is the control point because all traffic flows through it.&lt;/p&gt;

&lt;p&gt;Agents invert this topology. The agent mediates between the user and the infrastructure. It already handles routing, orchestration, and decision-making. Adding a traditional proxy in front of the tools the agent calls doesn't add a control point. It adds a redundant hop that can't see into the execution context that matters: which user, which action, which permission, right now.&lt;/p&gt;

&lt;p&gt;The control point in an agentic architecture is the execution layer where the tool runs, where credentials are resolved, permissions are checked, and actions are taken on behalf of a specific human. That's the runtime.&lt;/p&gt;

&lt;p&gt;The gateway era was defined by the proxy as the control point. The agentic era is defined by the runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four architectures for connecting AI agents to enterprise tools
&lt;/h2&gt;

&lt;p&gt;As organizations move from isolated pilots to production deployments, engineering teams adopt one of four integration models. Understanding where each approach breaks down under enterprise load is critical for architectural planning.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Integration approach&lt;/th&gt;
&lt;th&gt;Security &amp;amp; identity&lt;/th&gt;
&lt;th&gt;Maintenance burden&lt;/th&gt;
&lt;th&gt;Reliability &amp;amp; execution&lt;/th&gt;
&lt;th&gt;Speed-to-market&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Custom connectors &amp;amp; DIY auth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Highly variable; often falls back to static keys.&lt;/td&gt;
&lt;td&gt;Extremely high; requires dedicated auth teams.&lt;/td&gt;
&lt;td&gt;Low; prone to parameter hallucination loops.&lt;/td&gt;
&lt;td&gt;Very slow.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Legacy iPaaS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate; struggles with On-Behalf-Of execution.&lt;/td&gt;
&lt;td&gt;Medium; relies on maintaining visual workflows.&lt;/td&gt;
&lt;td&gt;Medium; optimized for linear triggers, not loops.&lt;/td&gt;
&lt;td&gt;Moderate.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Unmanaged MCP servers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low; lacks centralized multi-user authorization.&lt;/td&gt;
&lt;td&gt;High; requires manual deployment and patching.&lt;/td&gt;
&lt;td&gt;Low; lacks native retries and failover state.&lt;/td&gt;
&lt;td&gt;Fast for prototypes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP runtime (e.g., Arcade)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High; native permission mapping and token vaults.&lt;/td&gt;
&lt;td&gt;Low; runtime handles lifecycle and upgrades.&lt;/td&gt;
&lt;td&gt;High; parallel execution and automatic retries.&lt;/td&gt;
&lt;td&gt;Very fast.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Approach 1: Build custom connectors and OAuth (DIY authentication)
&lt;/h3&gt;

&lt;p&gt;Build one-off API wrappers and custom OAuth layers for every enterprise tool your agent needs.&lt;/p&gt;

&lt;p&gt;The upside is total control. You dictate every aspect of the integration and avoid third-party vendor lock-in.&lt;/p&gt;

&lt;p&gt;But the limitations get crippling fast. Custom connectors become a massive engineering drain. Teams spend months building secure token vaults, handling refresh token rotation, and writing edge-case logic. Those are months that could have been spent shipping agent features that actually move the business forward.&lt;/p&gt;

&lt;p&gt;Raw enterprise APIs compound the problem. They expect highly structured, deterministic inputs, but agents generate dynamic natural language. Wiring them directly to raw endpoints leads to parameter hallucination and endless retry loops. Authentication alone becomes a standalone infrastructure project: token rotation, user matching, session validation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 2: Use legacy iPaaS for agent tool calls
&lt;/h3&gt;

&lt;p&gt;Enterprises retrofit existing integration platforms like Workato, MuleSoft, or Zapier to trigger actions based on LLM outputs.&lt;/p&gt;

&lt;p&gt;The strength is familiarity. Enterprise IT teams already know these tools, and they come with massive pre-built endpoint catalogs.&lt;/p&gt;

&lt;p&gt;But the limitations are architectural and fundamental. These platforms were built for linear, deterministic, trigger-based automation. Agentic systems operate on non-deterministic, stateful reasoning loops where the agent decides what to call, when, and how many times based on intermediate results. Forcing that into a linear webhook pattern breaks down fast.&lt;/p&gt;

&lt;p&gt;The deeper problem is identity. Legacy iPaaS platforms center on system-to-system service accounts. They lack true &lt;a href="https://learn.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-on-behalf-of-flow" rel="noopener noreferrer"&gt;user-scoped, On-Behalf-Of (OBO) execution&lt;/a&gt;, which forces teams to build complex, fragile workarounds to ensure the agent only acts with the specific permissions of the user typing the prompt. Per-user authorization evaluated at runtime across every tool call requires infrastructure these platforms were never designed to deliver.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 3: Run unmanaged MCP servers
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://modelcontextprotocol.io/specification/latest" rel="noopener noreferrer"&gt;Model Context Protocol standardized how AI models connect to data sources and tools&lt;/a&gt;. In this approach, teams deploy open-source MCP servers to expose local or SaaS capabilities directly to their agents.&lt;/p&gt;

&lt;p&gt;MCP's strength is standardization. It decouples the agent framework from the underlying tool implementation, creating a universal language for tool calling. The problem is that the quality of unmanaged, open-source MCP servers varies widely. According to &lt;a href="https://toolbench.arcade.dev/" rel="noopener noreferrer"&gt;benchmarks&lt;/a&gt;, many struggle with reliability and correctness, which compounds the challenges of production deployments.&lt;/p&gt;

&lt;p&gt;These servers break down the moment you take them to production. Raw, unmanaged MCP servers lack centralized governance. They don't ship with multi-user enterprise authentication handling, meaning every user often shares the same connection identity.&lt;/p&gt;

&lt;p&gt;They also lack production reliability features like automatic retries, parallel execution, and stateful failover out of the box. That burden falls back on the application developer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 4: Use an MCP runtime (the secure action layer)
&lt;/h3&gt;

&lt;p&gt;An &lt;a href="https://docs.arcade.dev/en/home" rel="noopener noreferrer"&gt;MCP runtime&lt;/a&gt; is the infrastructure layer purpose-built to solve this problem. &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade.dev&lt;/a&gt;, the industry's first MCP runtime, combines &lt;a href="https://www.arcade.dev/tools" rel="noopener noreferrer"&gt;agent-optimized tools&lt;/a&gt;, centralized authentication and authorization, and enterprise governance into a single control plane.&lt;/p&gt;

&lt;p&gt;This approach targets production AI specifically. The runtime speaks MCP natively (JSON-RPC, Streamable HTTP) with no protocol translation and no context loss. It preserves native permissions through On-Behalf-Of token flows, isolates credentials from the language model, and provides instant, OpenTelemetry-compatible audit logs for every action.&lt;/p&gt;

&lt;p&gt;Teams ship faster because the runtime handles authorization, token lifecycle, retries, and governance. Engineers focus entirely on agent logic and business outcomes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.arcade.dev/blog/mcp-runtime-gateway" rel="noopener noreferrer"&gt;Arcade's MCP Gateway&lt;/a&gt; lets any MCP client access the full tool catalog through a single endpoint. Teams can also bring their own MCP servers into the runtime to get authorization, retries, and audit logs without rewriting what already works. The runtime extends your existing MCP investment rather than replacing it.&lt;/p&gt;

&lt;p&gt;For single-user hobbyist projects or local scripts, a full runtime adds unnecessary overhead. But for platform engineering teams deploying autonomous systems to thousands of corporate users, an MCP runtime is the only viable path to production.&lt;/p&gt;

&lt;h3&gt;
  
  
  What production demands: authorization, tooling, and governance
&lt;/h3&gt;

&lt;p&gt;The comparison above shows where each approach breaks. But understanding why the MCP runtime wins requires going deeper into the three capabilities that separate production deployments from demos: just-in-time authorization that enforces user-scoped access, agent-optimized tools that eliminate hallucination loops, and governance infrastructure that gives platform teams full visibility over every action.&lt;/p&gt;

&lt;h4&gt;
  
  
  How just-in-time authorization enforces user-scoped access
&lt;/h4&gt;

&lt;p&gt;Custom connectors fall back to static keys. Legacy iPaaS platforms rely on shared service accounts. Unmanaged MCP servers lack multi-user auth entirely. All three fail at the same point: they can't evaluate who is allowed to do what at the moment the tool is called.&lt;/p&gt;

&lt;p&gt;That's the problem &lt;a href="https://www.arcade.dev/blog/sso-for-ai-agents-authentication-and-authorization-guide/" rel="noopener noreferrer"&gt;just-in-time authorization&lt;/a&gt; solves.&lt;/p&gt;

&lt;p&gt;The agent requests and validates credentials only at the moment an action requires them, not upfront. If a user never invokes the Salesforce integration, no Salesforce tokens are ever obtained or stored.&lt;/p&gt;

&lt;p&gt;The entire authentication flow (OAuth exchanges, token refresh, credential storage) executes in deterministic backend logic that the LLM can never alter, observe, or leak. For additional governance, teams can attach pre-tool-call and post-tool-call hooks to enforce custom policies like human-in-the-loop approvals for certain actions, usage limits, or contextual access rules.&lt;/p&gt;
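
&lt;p&gt;A pre-tool-call hook can be sketched as a plain function that runs before the tool and may block it. This is a minimal illustration, not Arcade's hook API: the tool names in &lt;code&gt;DESTRUCTIVE&lt;/code&gt;, the &lt;code&gt;require_approval&lt;/code&gt; hook, and the &lt;code&gt;call_tool&lt;/code&gt; dispatcher are all hypothetical.&lt;/p&gt;

```python
from typing import Callable, Optional

# Hypothetical tool names that should never run without human sign-off.
DESTRUCTIVE = {"delete_record", "bulk_update", "send_external_email"}


def require_approval(tool: str, args: dict, approved_by: Optional[str]) -> Optional[str]:
    """Pre-call hook: return a denial reason, or None to let the call proceed."""
    if tool in DESTRUCTIVE and approved_by is None:
        return "approval required for " + tool
    return None


def call_tool(tool: str, args: dict, run: Callable, hooks: list,
              approved_by: Optional[str] = None) -> dict:
    """Run every hook in deterministic code before the tool itself executes."""
    for hook in hooks:
        reason = hook(tool, args, approved_by)
        if reason is not None:
            # The tool function is never invoked when a hook objects.
            return {"status": "blocked", "reason": reason}
    return {"status": "ok", "result": run(**args)}
```

&lt;p&gt;The design point is that the check lives outside the model: the LLM can propose a destructive call, but the dispatcher decides whether it runs.&lt;/p&gt;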

&lt;p&gt;This works because the runtime is stateful. It maintains per-session, per-user context across an agent's entire reasoning loop. A stateless proxy evaluates each request in isolation and can't know that a request is step 3 of a 6-step workflow, acting on behalf of Alice, who authorized this specific scope 4 minutes ago. The runtime can, and that session context is what makes per-user, per-tool authorization enforceable.&lt;/p&gt;

&lt;p&gt;This is where the permission intersection model described earlier becomes operational. The architecture enforces: Agent Permissions ∩ User Permissions = Effective Action Scope. The agent can only execute an action if both the agent's role policy and the human user's native SaaS permissions explicitly allow it. Every other combination is denied.&lt;/p&gt;

&lt;p&gt;A concrete example: an enterprise AI agent is built to assist the Human Resources department. An employee using this agent has high-level administrative privileges in Workday, including access to global payroll data. But the HR agent itself is scoped strictly to recruiting tasks.&lt;/p&gt;

&lt;p&gt;Because the runtime evaluates the intersection of these permissions at call time, the agent is denied when prompted to access payroll data. The user has the authority, but the agent's restricted scope prevents the action. This stops data exfiltration and &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html" rel="noopener noreferrer"&gt;confused deputy&lt;/a&gt; attacks cold.&lt;/p&gt;
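
&lt;p&gt;The HR example above can be sketched in a few lines. This is an illustrative model of the intersection check, assuming permissions are simple strings; the permission names and the &lt;code&gt;authorize&lt;/code&gt; function are hypothetical and do not reflect any specific runtime's policy format.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def effective_scope(agent_perms: frozenset, user_perms: frozenset) -> frozenset:
    """Agent Permissions ∩ User Permissions = Effective Action Scope."""
    return agent_perms.intersection(user_perms)


def authorize(action: str, agent_perms: frozenset, user_perms: frozenset) -> Decision:
    """Evaluate one action at call time; both sides must allow it."""
    if action not in agent_perms:
        return Decision(False, "agent policy does not allow " + action)
    if action not in user_perms:
        return Decision(False, "user lacks native permission for " + action)
    return Decision(True, "allowed by agent policy and user permissions")


# Hypothetical permission names for the HR scenario.
HR_AGENT = frozenset({"workday.recruiting.read", "workday.recruiting.write"})
HR_ADMIN = frozenset({"workday.recruiting.read", "workday.recruiting.write",
                      "workday.payroll.read"})
INTERN = frozenset({"workday.recruiting.read"})
```

&lt;p&gt;With these sets, &lt;code&gt;authorize("workday.payroll.read", HR_AGENT, HR_ADMIN)&lt;/code&gt; is denied by the agent's scope even though the admin holds the permission, and &lt;code&gt;authorize("workday.recruiting.write", HR_AGENT, INTERN)&lt;/code&gt; is denied by the user's, which covers both failure modes described earlier.&lt;/p&gt;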

&lt;h4&gt;
  
  
  Agent-optimized tools vs API wrappers: what to use and why
&lt;/h4&gt;

&lt;p&gt;The comparison table flags a specific failure mode for custom connectors: parameter hallucination loops. This happens because raw REST endpoints require precise, deterministic parameters, and language models produce probabilistic natural language. Wiring one directly to the other without an intermediary is where agents break.&lt;/p&gt;

&lt;p&gt;Agents need intent-level tools rather than raw API wrappers. An intent-level tool absorbs the ambiguity of an agent's request and translates it into a safe, predictable transaction. The result is faster execution, fewer failed actions, and lower inference costs because the agent doesn't burn tokens on retry loops.&lt;/p&gt;

&lt;p&gt;Production execution also requires runtime reliability features that raw APIs don't provide. The runtime provides developer-defined context for intelligent retries, parallelized execution for multi-step tasks, and automatic failover to handle rate limits and transient network errors gracefully. Standardized schemas within these tools prevent parameter hallucination, the most common cause of agent failure when wiring models directly to APIs.&lt;/p&gt;

&lt;p&gt;Consider how this works in practice. Instead of an agent calling a raw Salesforce update endpoint and failing because it hallucinated a required stage ID string, the agent uses a high-level, agent-optimized tool for progressing a deal.&lt;/p&gt;

&lt;p&gt;The tool natively understands the user's intent to move a deal to negotiation. Its internal logic securely looks up the correct, exact ID for that specific Salesforce instance, validates the state transition, and safely executes the update. The language model doesn't need to guess the exact database schema. The action succeeds on the first call, not the fifth.&lt;/p&gt;
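
&lt;p&gt;As a rough sketch of what such an intent-level tool does internally, the function below accepts a human-readable stage name, resolves the instance-specific ID, and validates the transition before building the update. The stage catalog, the transition table, and &lt;code&gt;progress_deal&lt;/code&gt; are invented for illustration; a real tool would load them from the connected Salesforce instance.&lt;/p&gt;

```python
class ToolError(Exception):
    """Raised with a message the agent can act on without guessing."""


# Hypothetical per-instance catalog; the model never sees or invents these IDs.
STAGE_IDS = {"prospecting": "stg_001", "negotiation": "stg_004", "closed won": "stg_006"}
ALLOWED_TRANSITIONS = {
    "prospecting": {"negotiation"},
    "negotiation": {"closed won"},
}


def progress_deal(deal: dict, target_stage: str) -> dict:
    """Translate an intent ("move this deal to negotiation") into a validated update."""
    wanted = target_stage.strip().lower()
    if wanted not in STAGE_IDS:
        raise ToolError(f"unknown stage {target_stage!r}; valid stages: {sorted(STAGE_IDS)}")
    current = deal["stage"]
    if wanted not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ToolError(f"cannot move a deal from {current!r} to {wanted!r}")
    # Only the exact ID looked up above ever reaches the underlying API call.
    return {"deal_id": deal["id"], "stage_id": STAGE_IDS[wanted]}
```

&lt;p&gt;Because invalid input produces a specific error that lists the valid stages, the agent can correct itself in one step instead of looping on retries.&lt;/p&gt;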

&lt;h4&gt;
  
  
  Governance and observability for agent actions (audit logs, OTel, versioning)
&lt;/h4&gt;

&lt;p&gt;Unmanaged MCP servers scored "Low" on reliability and security in the comparison above because they lack centralized governance. Once agents execute real actions on behalf of users, platform teams need complete visibility and control over the integration ecosystem. The runtime delivers this through three mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visibility filtering&lt;/strong&gt; ensures agents only see the specific tools the current user is permitted to invoke. If a user doesn't have permission to merge code in GitHub, the GitHub merge tool doesn't appear in the agent's context window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deep audit trails&lt;/strong&gt; log every action per user, per service, and per agent session. These logs are &lt;a href="https://opentelemetry.io/docs/specs/semconv/gen-ai/mcp/" rel="noopener noreferrer"&gt;exportable to standard SIEM tools via OpenTelemetry (OTel)&lt;/a&gt; to satisfy compliance audits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version control&lt;/strong&gt; lets platform engineers safely upgrade tool schemas and rotate connection parameters without breaking production agents running mid-session on older versions.&lt;/p&gt;

&lt;p&gt;When an agent incorrectly closes several open opportunities in a CRM, the platform team can't spend days parsing raw application logs. With an OTel-compatible audit log generated by the action layer, the security team can instantly trace the destructive action back to the exact user prompt, the specific agent session, and the token used. This isolates the root cause in minutes, enabling teams to refine the agent's instructions or the tool's access policy immediately.&lt;/p&gt;
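
&lt;p&gt;The trace described above amounts to filtering structured audit records. The sketch below assumes a simplified record shape based on the fields this article names (user, session, tool, arguments, timestamp) plus the triggering prompt; the &lt;code&gt;AuditRecord&lt;/code&gt; class, the &lt;code&gt;trace&lt;/code&gt; helper, and the sample data are hypothetical and are not an OpenTelemetry schema.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str   # ISO 8601, UTC
    user: str        # human on whose behalf the call ran
    session_id: str  # agent session that issued the call
    tool: str
    arguments: dict
    prompt: str      # user prompt that led to the call


def trace(records: list, tool: str, **match) -> list:
    """Return records for `tool` whose arguments contain every key/value in `match`."""
    return [
        r for r in records
        if r.tool == tool and all(r.arguments.get(k) == v for k, v in match.items())
    ]


# Invented sample data for illustration only.
LOG = [
    AuditRecord("2026-04-01T10:00:00Z", "alice", "s-1", "crm.get_opportunity",
                {"opportunity_id": "opp-7"}, "Show me the Acme deal"),
    AuditRecord("2026-04-01T10:02:00Z", "bob", "s-2", "crm.close_opportunity",
                {"opportunity_id": "opp-7"}, "Clean up stale opportunities"),
    AuditRecord("2026-04-01T10:02:01Z", "bob", "s-2", "crm.close_opportunity",
                {"opportunity_id": "opp-9"}, "Clean up stale opportunities"),
]
```

&lt;p&gt;Calling &lt;code&gt;trace(LOG, "crm.close_opportunity", opportunity_id="opp-7")&lt;/code&gt; returns the single record that closed that opportunity, including the user, session, and prompt behind it.&lt;/p&gt;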

&lt;p&gt;Of the four approaches evaluated, only the MCP runtime delivers all three: user-scoped authorization at call time, intent-level tooling that prevents hallucination, and centralized governance with full audit trails. The remaining sections show how this architecture works in practice and how to evaluate it for your organization.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to choose an enterprise agent integration approach (security, OBO, and TCO)
&lt;/h2&gt;

&lt;p&gt;Choosing how to connect your AI agents to enterprise tools is a foundational architectural decision. It dictates the speed and security of your deployment. Platform engineers and technical leaders need to frame their buying and building criteria around security, scale, and where their engineering resources should focus.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security and compliance requirements (SOC 2, ISO 27001, auditability)
&lt;/h3&gt;

&lt;p&gt;Can the proposed solution natively map to SOC 2 and ISO 27001 requirements for strict user attribution? If an agent deletes a file in Google Workspace, the audit log must definitively prove which human authorized that action.&lt;/p&gt;

&lt;p&gt;The system must support pre-tool-call &lt;a href="https://hoop.dev/blog/how-to-keep-human-in-the-loop-ai-control-soc-2-for-ai-systems-secure-and-compliant-with-action-level-approvals" rel="noopener noreferrer"&gt;Human-in-the-Loop (HITL) approval hooks&lt;/a&gt;. Destructive actions like modifying production configurations or bulk-updating database records must pause execution and require cryptographic sign-off from a human administrator via Slack or email before proceeding.&lt;/p&gt;
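&lt;p&gt;A pre-tool-call approval hook can be sketched as a gate in front of the executor. This is an illustrative outline under stated assumptions: the tool names and the &lt;code&gt;guarded_execute&lt;/code&gt; helper are hypothetical, and the sketch records who approved but does not verify a cryptographic signature or deliver the Slack or email notification.&lt;/p&gt;

```python
# Hypothetical tool names; which actions count as destructive is an assumption.
DESTRUCTIVE_TOOLS = {"Config.UpdateProduction", "Database.BulkUpdate"}


def guarded_execute(tool_name, args, execute, approved_by=None):
    """Run a tool call, pausing destructive ones until a human approves."""
    if tool_name in DESTRUCTIVE_TOOLS and approved_by is None:
        # Nothing is executed; the caller should notify an approver and retry.
        return {"status": "pending_approval", "tool": tool_name, "args": args}
    output = execute(tool_name, args)
    return {"status": "executed", "approved_by": approved_by, "output": output}


calls = []  # records what actually ran, to show the gate holds


def fake_execute(tool_name, args):
    # Stand-in for the real executor.
    calls.append(tool_name)
    return "done"


first = guarded_execute("Database.BulkUpdate", {"table": "accounts"}, fake_execute)
second = guarded_execute("Database.BulkUpdate", {"table": "accounts"}, fake_execute,
                         approved_by="admin@example.com")
print(first["status"], second["status"], calls)
```

&lt;p&gt;The important property is that the first call returns without touching the executor, so the bulk update happens exactly once and only after an approver is named.&lt;/p&gt;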

&lt;h3&gt;
  
  
  Build vs buy economics (OAuth maintenance and total cost of ownership)
&lt;/h3&gt;

&lt;p&gt;Build versus buy demands a ruthless economic assessment.&lt;/p&gt;

&lt;p&gt;Calculate the actual engineering hours required to build, maintain, and securely upgrade OAuth flows for ten or more distinct enterprise APIs. Factor in the hidden costs: managing refresh token rotation, building webhook callback URLs for long-running async tasks, and patching custom connectors when SaaS vendors inevitably deprecate their API versions.&lt;/p&gt;

&lt;p&gt;Then ask what those engineers could have shipped instead.&lt;/p&gt;

&lt;p&gt;Adopting an MCP runtime transforms a multi-month infrastructure project into a configuration exercise. The total cost of ownership drops dramatically, and your team reclaims months of engineering capacity to invest in the agent capabilities that differentiate your product.&lt;/p&gt;
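&lt;p&gt;The build-side estimate described above reduces to simple arithmetic. The sketch below shows the shape of that calculation; every input is a placeholder to be replaced with your own estimates, not a benchmark or a measured figure.&lt;/p&gt;

```python
def build_cost_hours(api_count, build_hours_per_api,
                     upkeep_hours_per_api_per_year, years):
    """Total engineering hours to build and then maintain in-house connectors."""
    build = api_count * build_hours_per_api
    upkeep = api_count * upkeep_hours_per_api_per_year * years
    return build + upkeep


# Placeholder inputs chosen only to make the arithmetic easy to follow.
total = build_cost_hours(
    api_count=10,
    build_hours_per_api=120,
    upkeep_hours_per_api_per_year=40,
    years=3,
)
print(total)  # 10 * 120 + 10 * 40 * 3 = 2400
```

&lt;p&gt;Separating the one-time build term from the recurring upkeep term makes it visible how maintenance, which scales with both the number of APIs and the number of years, can come to match or exceed the initial build.&lt;/p&gt;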

&lt;h3&gt;
  
  
  Time-to-value and engineering focus
&lt;/h3&gt;

&lt;p&gt;Time-to-value is where most teams underestimate the cost of building in-house.&lt;/p&gt;

&lt;p&gt;Will your highly paid AI engineers spend the next three months building reliable Slack and Workspace connectors, or will they spend that time optimizing agent prompts, evaluating reasoning logic, and shipping the agent capabilities that drive revenue? Every week spent on integration plumbing is a week your competitors use to get their agents into production.&lt;/p&gt;

&lt;p&gt;When evaluating external vendors or internal architecture plans, force the issue with hard technical questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are API keys or OAuth tokens ever visible in the language model's prompt context window?&lt;/li&gt;
&lt;li&gt;How does the system resolve conflicting permissions between a highly privileged user and a narrowly scoped agent?&lt;/li&gt;
&lt;li&gt;Can the system emit W3C-standard trace context to our existing OpenTelemetry collectors?&lt;/li&gt;
&lt;li&gt;How does the tool handle rate limiting when an agent enters an unexpected retry loop?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the answer to credential visibility is anything other than absolute isolation, the architecture is unfit for enterprise production.&lt;/p&gt;
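&lt;p&gt;For the trace-context question in the list above, it helps to know what a W3C &lt;code&gt;traceparent&lt;/code&gt; value looks like: a version, a 16-byte trace ID, an 8-byte parent ID, and trace flags, all hex-encoded and separated by dashes. The helper names below are hypothetical; in practice an OpenTelemetry SDK would generate and propagate this header for you.&lt;/p&gt;

```python
import secrets


def new_traceparent(sampled=True):
    """Build a W3C Trace Context traceparent header value (version 00)."""
    trace_id = secrets.token_hex(16)   # 16 random bytes as 32 hex characters
    parent_id = secrets.token_hex(8)   # 8 random bytes as 16 hex characters
    flags = "01" if sampled else "00"  # 01 marks the trace as sampled
    return f"00-{trace_id}-{parent_id}-{flags}"


def parse_traceparent(value):
    """Split a traceparent value into its four dash-separated fields."""
    version, trace_id, parent_id, flags = value.split("-")
    return {"version": version, "trace_id": trace_id,
            "parent_id": parent_id, "flags": flags}


header = new_traceparent()
fields = parse_traceparent(header)
print(header)
```

&lt;p&gt;If a vendor's tool calls carry a value of this shape through to your collectors, a single trace ID can tie the user prompt, the tool call, and the downstream API request together.&lt;/p&gt;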

&lt;h2&gt;
  
  
  Reference architecture for an MCP runtime (step-by-step flow)
&lt;/h2&gt;

&lt;p&gt;With the architectural decision framed, here's how a request actually flows through the runtime end to end. The MCP runtime acts as the intermediary that brokers trust and execution between the non-deterministic reasoning engine and the deterministic enterprise environment.&lt;/p&gt;

&lt;p&gt;The flow of a secure request follows a strict sequence:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pa5dvzbt30a978qwvfb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pa5dvzbt30a978qwvfb.png" alt="Secure AI agent enterprise integration architecture diagram showing MCP runtime flow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User prompt&lt;/strong&gt;: The user submits a request, e.g., "close this support ticket."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM plan&lt;/strong&gt;: The agent's language model determines the sequence of tool calls needed to fulfill the request.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP runtime&lt;/strong&gt;: The runtime receives the tool call request. It evaluates user and agent permissions and retrieves the necessary On-Behalf-Of credential.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool execution&lt;/strong&gt;: The runtime, not the agent, executes the precise API call against the target system (e.g., Zendesk).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result &amp;amp; next action&lt;/strong&gt;: The runtime receives the API result, filters it, and passes it back to the agent. The LLM then either plans the next action in the sequence or determines the task is complete.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confirmation &amp;amp; audit&lt;/strong&gt;: The agent confirms the action's completion to the user, and the runtime logs the entire transaction via OpenTelemetry for audit purposes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This architecture enforces a hard separation of concerns. The language model handles reasoning, planning, action selection, and generation. The runtime layer handles credentials, policy enforcement, rate limiting, action execution, and logging.&lt;/p&gt;

&lt;p&gt;By vaulting tokens at the runtime layer, this architecture prevents prompt-injection-driven data exfiltration. The language model never possesses the keys required to export data.&lt;/p&gt;
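&lt;p&gt;The runtime's share of that flow (permission check, credential retrieval, execution, filtering, and logging) can be sketched in a few lines. This is a simplified model rather than Arcade's internals: &lt;code&gt;Vault&lt;/code&gt;, &lt;code&gt;handle_tool_call&lt;/code&gt;, the policy table, and the &lt;code&gt;Zendesk.CloseTicket&lt;/code&gt; connector are hypothetical stand-ins.&lt;/p&gt;

```python
class Vault:
    """Holds per-user tokens. Only the runtime reads from it."""

    def __init__(self, tokens):
        self._tokens = tokens

    def token_for(self, user_id, service):
        return self._tokens[(user_id, service)]


def handle_tool_call(call, user_id, policy, vault, connectors, audit_log):
    """Authorize, execute, filter, and log one tool call on the user's behalf."""
    tool = call["tool"]
    if tool not in policy.get(user_id, set()):
        audit_log.append({"user": user_id, "tool": tool, "outcome": "denied"})
        return {"error": "not permitted"}
    service = tool.split(".")[0]
    token = vault.token_for(user_id, service)    # fetched at call time
    raw = connectors[tool](token, call["args"])  # the runtime makes the API call
    audit_log.append({"user": user_id, "tool": tool, "outcome": "ok"})
    return {"result": raw["summary"]}            # filtered: summary only


def close_ticket(token, args):
    # Stand-in for a real API client; echoes the token to show filtering works.
    return {"summary": f"ticket {args['id']} closed", "raw_auth": token}


vault = Vault({("ana@example.com", "Zendesk"): "secret-token"})
policy = {"ana@example.com": {"Zendesk.CloseTicket"}}
connectors = {"Zendesk.CloseTicket": close_ticket}
audit_log = []

call = {"tool": "Zendesk.CloseTicket", "args": {"id": 42}}
allowed = handle_tool_call(call, "ana@example.com", policy, vault, connectors, audit_log)
denied = handle_tool_call(call, "raj@example.com", policy, vault, connectors, audit_log)
print(allowed, denied)
```

&lt;p&gt;Even though the stand-in connector leaks the token into its raw response, the value handed back to the model contains only the summary, which is the separation the architecture depends on.&lt;/p&gt;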

&lt;h3&gt;
  
  
  How an MCP runtime works with any LLM
&lt;/h3&gt;

&lt;p&gt;The MCP runtime works with any LLM and any orchestration framework, or with no framework at all. Arcade serves as the secure execution backend: your code handles reasoning, while Arcade handles credentials, authorization, and tool execution.&lt;/p&gt;

&lt;p&gt;This clean separation is what accelerates time-to-production. AI engineers focus entirely on agent logic while offloading the high-risk plumbing of enterprise integrations to the runtime.&lt;/p&gt;

&lt;p&gt;Here is a working example: an agent that reads Gmail and sends Slack messages through Arcade's runtime. Setup requires three dependencies and three environment variables:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;arcadepy openai python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env
&lt;/span&gt;&lt;span class="n"&gt;ARCADE_API_KEY&lt;/span&gt;=&lt;span class="n"&gt;your_arcade_api_key&lt;/span&gt;        &lt;span class="c"&gt;# Free at arcade.dev
&lt;/span&gt;&lt;span class="n"&gt;ARCADE_USER_ID&lt;/span&gt;=&lt;span class="n"&gt;your_email&lt;/span&gt;@&lt;span class="n"&gt;company&lt;/span&gt;.&lt;span class="n"&gt;com&lt;/span&gt;     &lt;span class="c"&gt;# The user the agent acts on behalf of
&lt;/span&gt;&lt;span class="n"&gt;OPENAI_KEY&lt;/span&gt;=&lt;span class="n"&gt;your_openai_key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;arcadepy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Arcade&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;arcade_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Arcade&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;arcade_user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ARCADE_USER_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;llm_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define enterprise productivity tools — Arcade handles auth for each
&lt;/span&gt;&lt;span class="n"&gt;tool_catalog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gmail.ListEmails&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gmail.SendEmail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Slack.SendMessage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Get tool definitions formatted for the LLM
&lt;/span&gt;&lt;span class="n"&gt;tool_definitions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
   &lt;span class="n"&gt;arcade_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_catalog&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# JIT authorization + execution — credentials never touch the LLM
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;authorize_and_run_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
   &lt;span class="n"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arcade_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;authorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arcade_user_id&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorize &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="n"&gt;arcade_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arcade_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
       &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arcade_user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Agentic loop — LLM reasons and selects tools, Arcade executes them
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
   &lt;span class="n"&gt;turns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
   &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;turns&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="n"&gt;turns&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
       &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
           &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_definitions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
       &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
           &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude_none&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
           &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
               &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;authorize_and_run_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
               &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
           &lt;span class="k"&gt;continue&lt;/span&gt;
       &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
           &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
       &lt;span class="k"&gt;break&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;

&lt;span class="c1"&gt;# Run the agent
&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize my latest 5 emails, then send me a DM on Slack with the summary.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;invoke_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM reasons through the task, selects &lt;code&gt;Gmail.ListEmails&lt;/code&gt; to fetch emails, summarizes them, then selects &lt;code&gt;Slack.SendMessage&lt;/code&gt; to deliver the summary. The runtime handles JIT authorization for each tool on behalf of that specific user. The agent never sees OAuth tokens, never manages refresh flows, and never touches credentials. &lt;a href="https://docs.arcade.dev/en/get-started/agent-frameworks/setup-arcade-with-your-llm-python" rel="noopener noreferrer"&gt;Full walkthrough in the Arcade docs.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps to productionize agent integrations (checklist)
&lt;/h2&gt;

&lt;p&gt;To transition from sandbox prototypes to production-grade deployments, platform engineering teams follow a structured, iterative implementation plan.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Inventory required tools and least-privilege scopes
&lt;/h3&gt;

&lt;p&gt;Start by auditing the tools you actually need. List the specific APIs your agents call, and document the exact OAuth scopes each one requires for each user. Don't request global access. Apply the principle of least privilege to every workflow.&lt;/p&gt;
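&lt;p&gt;One way to keep that inventory reviewable is to store it as data. The sketch below assumes a tool-to-scope mapping for the three tools used earlier in this article; the scope strings are real Google and Slack scopes, but which scopes each tool actually requests is an assumption that should be checked against the provider and toolkit documentation.&lt;/p&gt;

```python
# Assumed tool-to-scope mapping for illustration; confirm the scopes each
# tool really requests against the provider and toolkit documentation.
TOOL_SCOPES = {
    "Gmail.ListEmails": {"https://www.googleapis.com/auth/gmail.readonly"},
    "Gmail.SendEmail": {"https://www.googleapis.com/auth/gmail.send"},
    "Slack.SendMessage": {"chat:write"},
}

# Scopes treated as too broad for any single workflow. The example entry is
# the full-mailbox Gmail scope.
BROAD_SCOPES = {"https://mail.google.com/"}


def scopes_for_workflow(tools):
    """Union of scopes a workflow needs, sorted for stable review diffs."""
    needed = set()
    for tool in tools:
        needed |= TOOL_SCOPES[tool]
    return sorted(needed)


def overly_broad(scopes):
    """Return any requested scopes that appear on the broad-scope list."""
    return sorted(set(scopes).intersection(BROAD_SCOPES))


summary_scopes = scopes_for_workflow(["Gmail.ListEmails", "Slack.SendMessage"])
print(summary_scopes)
print(overly_broad(summary_scopes + ["https://mail.google.com/"]))
```

&lt;p&gt;Computing scopes per workflow, instead of per application, makes it obvious when a read-only summary job is about to be granted send or full-mailbox access it does not need.&lt;/p&gt;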

&lt;h3&gt;
  
  
  Step 2: Define autonomous vs human-approved actions (HITL)
&lt;/h3&gt;

&lt;p&gt;Next, define your operational boundaries. Build a matrix deciding which actions are safe for autonomous execution (like reading calendar events) and which high-risk actions require explicit user delegation or human-in-the-loop approval hooks (like deleting files or sending external emails).&lt;/p&gt;
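&lt;p&gt;Such a matrix can start as a plain lookup table. In this sketch the action names and the &lt;code&gt;Mode&lt;/code&gt; enum are hypothetical; the one deliberate design choice is that any action missing from the table falls back to human approval.&lt;/p&gt;

```python
from enum import Enum


class Mode(Enum):
    AUTONOMOUS = "autonomous"
    HUMAN_APPROVAL = "human_approval"


# Hypothetical action names; classify your own tools the same way.
ACTION_MATRIX = {
    "Calendar.ListEvents": Mode.AUTONOMOUS,
    "Drive.DeleteFile": Mode.HUMAN_APPROVAL,
    "Gmail.SendExternalEmail": Mode.HUMAN_APPROVAL,
}


def mode_for(action):
    """Unlisted actions fall back to human approval instead of autonomy."""
    return ACTION_MATRIX.get(action, Mode.HUMAN_APPROVAL)


print(mode_for("Calendar.ListEvents"))
print(mode_for("Drive.DeleteFile"))
print(mode_for("Crm.SomeNewAction"))
```

&lt;p&gt;Defaulting to approval means a newly added tool cannot run autonomously until someone has consciously classified it.&lt;/p&gt;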

&lt;h3&gt;
  
  
  Step 3: Standardize on a single control plane
&lt;/h3&gt;

&lt;p&gt;Centralize your integration strategy immediately. Prevent the creation of "shadow registries."&lt;/p&gt;

&lt;p&gt;When disparate engineering teams build redundant, unmanaged integrations using hardcoded tokens, they create severe security vulnerabilities and integration sprawl. Standardize on a single control plane for all agent tool use.&lt;/p&gt;
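&lt;p&gt;A quick way to find existing shadow integrations is to scan repositories for token-shaped strings. The sketch below is deliberately minimal and relies on two well-known prefixes only; the pattern table and the &lt;code&gt;find_hardcoded_tokens&lt;/code&gt; helper are illustrative, and a dedicated secret scanner will cover far more formats.&lt;/p&gt;

```python
import re

# Illustrative prefixes only: Slack tokens begin with "xox" plus a type letter,
# and GitHub personal access tokens begin with "ghp_". Real scanners cover
# many more formats.
TOKEN_PATTERNS = {
    "slack": re.compile(r"xox[a-z]-[A-Za-z0-9-]+"),
    "github": re.compile(r"ghp_[A-Za-z0-9]+"),
}


def find_hardcoded_tokens(source):
    """Return (line_number, kind) for each line that appears to embed a token."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for kind, pattern in TOKEN_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, kind))
    return findings


# The token below is fake and exists only to exercise the pattern.
sample = 'import os\nSLACK_TOKEN = "xoxb-FAKE-EXAMPLE"\ntoken = os.getenv("SLACK_TOKEN")\n'
findings = find_hardcoded_tokens(sample)
print(findings)
```

&lt;p&gt;The line that reads the token from the environment is not flagged, while the line that embeds it is, which is the distinction a central control plane is meant to make unnecessary.&lt;/p&gt;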

&lt;h3&gt;
  
  
  Step 4: Pilot one workflow and validate token isolation and telemetry
&lt;/h3&gt;

&lt;p&gt;Before rolling out broadly, test the architecture with a narrow, controlled use case. Pilot a single workflow, like developer issue automation linking GitHub and Jira, to validate token isolation and telemetry.&lt;/p&gt;

&lt;p&gt;Invest in infrastructure, not just isolated connectors. Evaluate platforms that treat authorization, agent-optimized tools, and lifecycle governance as a unified secure runtime, not separate problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Use an MCP runtime to connect AI agents to enterprise tools
&lt;/h2&gt;

&lt;p&gt;The true challenge of connecting AI to enterprise productivity tools has little to do with formatting JSON payloads or making API calls. The bottleneck is securing user-scoped access, enforcing least-privilege permissions at runtime, and maintaining rigorous operational governance over non-deterministic systems.&lt;/p&gt;

&lt;p&gt;The most successful platform engineering teams recognize that rebuilding identity propagation, token lifecycles, and reliable integration mechanics from scratch is an expensive distraction from their core business objectives. They need an MCP runtime, not more custom connectors.&lt;/p&gt;

&lt;p&gt;Arcade is the industry's first MCP runtime. It delivers secure agent authorization, the largest catalog of agent-optimized tools, and centralized lifecycle governance in a single control plane. Arcade eliminates the undifferentiated heavy lifting of enterprise integration so your team ships faster and scales with control.&lt;/p&gt;

&lt;p&gt;If you're building agents that need to execute across enterprise tools, start with the &lt;a href="https://docs.arcade.dev/en/get-started/about-arcade" rel="noopener noreferrer"&gt;getting started guide&lt;/a&gt; or explore the &lt;a href="https://www.arcade.dev/tools" rel="noopener noreferrer"&gt;full tool catalog&lt;/a&gt; to see what's available out of the box.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ: Enterprise AI agent integrations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the best way to connect AI agents to enterprise productivity tools?
&lt;/h3&gt;

&lt;p&gt;Use an MCP runtime, a secure action layer that performs user-scoped (OBO) execution, keeps tokens out of the LLM, and enforces runtime authorization per tool call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should AI agents use service accounts to access Slack, Google Workspace, or Microsoft 365?
&lt;/h3&gt;

&lt;p&gt;No. Service accounts bypass user permissions and expand the blast radius of prompt injection. Use on-behalf-of user execution with least-privilege scopes.&lt;/p&gt;

&lt;h3&gt;
  
  
  What does "On-Behalf-Of (OBO)" mean for agent integrations?
&lt;/h3&gt;

&lt;p&gt;OBO means the agent executes each action using credentials tied to the requesting user, so the action is limited to that user's native permissions and is attributable in audit logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is just-in-time authorization for AI agents?
&lt;/h3&gt;

&lt;p&gt;Just-in-time authorization is a runtime policy check that executes at the moment of each tool call, evaluating the user's identity, the agent's allowed scope, and the requested action. Credentials are requested and validated only when needed, not pre-authorized during setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is an MCP runtime, and how is it different from an MCP server?
&lt;/h3&gt;

&lt;p&gt;An MCP server exposes tools to an agent over the Model Context Protocol (MCP), but it's typically single-user, stateless, and ships without built-in auth, token management, or observability. An MCP runtime is the enterprise infrastructure layer that complements MCP servers by adding what they lack: multi-user OBO authentication, per-call policy enforcement, token vaulting, automatic retries, and audit/telemetry. The server defines what the agent can call; the runtime makes it safe to call at scale. Arcade is the industry's first MCP runtime, purpose-built for production agent deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the minimum security requirements for production agent tool access?
&lt;/h3&gt;

&lt;p&gt;Token isolation from the LLM, user-scoped/OBO execution, least-privilege scopes, per-action authorization, audit logs with user attribution, and HITL approvals for high-risk actions.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do you audit and attribute agent actions for compliance (SOC 2 / ISO 27001)?
&lt;/h3&gt;

&lt;p&gt;Log every tool call with user identity, tool, parameters/intent, outcome, and trace context, and export via OpenTelemetry to your SIEM for investigation and reporting.&lt;/p&gt;

&lt;h3&gt;
  
  
  When do legacy iPaaS tools (Zapier/Workato/MuleSoft) break down for agents?
&lt;/h3&gt;

&lt;p&gt;They struggle with non-deterministic agent loops and true user-scoped OBO execution, forcing teams to rely on shared credentials or brittle workarounds.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do agent-optimized tools reduce hallucinations compared to raw API wrappers?
&lt;/h3&gt;

&lt;p&gt;They use intent-level operations with validated schemas and internal lookups, so the model doesn't have to guess required IDs/parameters and can fail safely.&lt;/p&gt;
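&lt;p&gt;As a rough illustration of the difference, an intent-level tool accepts what the user would naturally say and resolves identifiers itself. The channel directory and the &lt;code&gt;send_message_to_channel&lt;/code&gt; function below are hypothetical; a real tool would look the channel up in the workspace instead of a hardcoded table.&lt;/p&gt;

```python
# Hypothetical channel directory standing in for a real workspace lookup.
CHANNEL_IDS = {"general": "C001", "support": "C002"}


def send_message_to_channel(channel_name, text):
    """Intent-level tool: takes a readable channel name and resolves the ID itself."""
    if not isinstance(text, str) or not text.strip():
        return {"ok": False, "error": "text must be a non-empty string"}
    key = channel_name.lstrip("#").lower()
    channel_id = CHANNEL_IDS.get(key)
    if channel_id is None:
        # Fail safely with the valid options instead of guessing an ID.
        return {"ok": False, "error": f"unknown channel: {key}",
                "known_channels": sorted(CHANNEL_IDS)}
    return {"ok": True, "channel_id": channel_id, "text": text}


sent = send_message_to_channel("#Support", "Ticket 42 resolved")
missing = send_message_to_channel("suport", "Ticket 42 resolved")
print(sent)
print(missing)
```

&lt;p&gt;The model never has to produce a channel ID, and a misspelled name comes back with the list of valid channels, which gives the model something concrete to correct against on its next turn.&lt;/p&gt;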

&lt;h3&gt;
  
  
  When should we add human-in-the-loop (HITL) approvals?
&lt;/h3&gt;

&lt;p&gt;For destructive or irreversible actions (deletes, external emails, bulk updates, permission changes) or any action that materially impacts security, finance, or customer data.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>mcp</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
