<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arcade.dev</title>
    <description>The latest articles on DEV Community by Arcade.dev (@arcade).</description>
    <link>https://dev.to/arcade</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F12915%2Febc942d3-5ae5-44e5-9a4a-06829aad6a1a.png</url>
      <title>DEV Community: Arcade.dev</title>
      <link>https://dev.to/arcade</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arcade"/>
    <language>en</language>
    <item>
      <title>Claude Code for the Outer Loop: An AI SRE Playbook to Reduce On-Call Toil</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Wed, 22 Apr 2026 22:03:16 +0000</pubDate>
      <link>https://dev.to/arcade/claude-code-for-the-outer-loop-an-ai-sre-playbook-to-reduce-on-call-toil-1ghd</link>
      <guid>https://dev.to/arcade/claude-code-for-the-outer-loop-an-ai-sre-playbook-to-reduce-on-call-toil-1ghd</guid>
      <description>&lt;p&gt;It is 2:13am. PagerDuty fires for checkout-service, p95 past threshold for four minutes. You open Datadog, find the wrong dashboard, then the right one, then the CI tool for recent deploys, then Jira for open incidents, then #incidents in Slack to check whether a co-worker is already in the war room. Eight minutes in, you have a working hypothesis.&lt;/p&gt;

&lt;p&gt;That is not incident response. That is a context-loading tax the on-call pays before the work begins.&lt;/p&gt;

&lt;p&gt;Coding agents, such as Claude Code, are eating the inner loop. The outer loop is a different story. Operational work (incident response, runbook execution, SLO investigation, on-call handoffs) still looks almost identical to how it looked five years ago. The gap is not the model. It is the infrastructure to run agentic tools across a team, against production, with the auth, scope, and audit guarantees an SRE program needs.&lt;/p&gt;

&lt;p&gt;This article is about the execution layer. The data substrate underneath is the other half of the problem, and I've written about it on &lt;a href="https://clickhouse.com/blog/ai-sre-observability-architecture" rel="noopener noreferrer"&gt;the ClickHouse blog.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code already works in the outer loop.&lt;/strong&gt; The interface, the reasoning, the tool-call contract all transfer. What changes is the data sources.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Five workflows prove it.&lt;/strong&gt; Incident triage, runbook execution, postmortem drafting, SLO investigation, on-call handoffs. Every one of them is Claude-shaped.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The auth, scope, and audit gap is the bottleneck.&lt;/strong&gt; The MCP servers for most SaaS tools already exist. The problem is that when every engineer wires their own connection, you inherit inconsistent authorization, over-scoped credentials, and no audit trail. Useful to one person at best. A data exposure incident at worst.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The gap is an MCP runtime, not a model.&lt;/strong&gt; Managed auth, hosted compute, tool-level governance, persistent audit logs. Until something provides all four, outer-loop AI stays a party trick.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An MCP runtime is more than an MCP gateway.&lt;/strong&gt; A gateway routes MCP tools under one URL. An MCP runtime adds the compute that runs them, the auth that scopes them, and the audit trail that makes them safe in production. Arcade.dev is an MCP runtime with a gateway inside it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Five AI SRE workflows and the MCP servers that power them
&lt;/h2&gt;

&lt;p&gt;If you only read one thing in this article, read this table.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Workflow&lt;/th&gt;
&lt;th&gt;MCP servers&lt;/th&gt;
&lt;th&gt;What Claude Code does&lt;/th&gt;
&lt;th&gt;What on-call does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Incident triage&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.arcade.dev/en/resources/integrations/development/pagerduty" rel="noopener noreferrer"&gt;PagerDuty&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/datadog-api" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/slack" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/github" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Pulls the PagerDuty payload, correlates Datadog signals in the window, checks recent deploys, scans Jira and #incidents, drafts a war room post&lt;/td&gt;
&lt;td&gt;Decides the next move&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Runbook execution&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.arcade.dev/tools/confluence" rel="noopener noreferrer"&gt;Confluence&lt;/a&gt;, &lt;a href="https://github.com/containers/kubernetes-mcp-server" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/github" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Parses the Confluence doc into steps, lays out the diagnostic sequence with commands and expected output, proposes any write command&lt;/td&gt;
&lt;td&gt;Runs the steps, approves every write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Postmortem drafting&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.arcade.dev/tools/slack" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/pagerduty" rel="noopener noreferrer"&gt;PagerDuty&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/datadog-api" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/confluence" rel="noopener noreferrer"&gt;Confluence&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Reconstructs the timeline from Slack, PagerDuty, Datadog, and the deploy log, fills the team template with source-linked evidence&lt;/td&gt;
&lt;td&gt;Writes the root cause and action items&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;SLO investigation&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.arcade.dev/en/resources/integrations/development/datadog-api" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/pagerduty" rel="noopener noreferrer"&gt;PagerDuty&lt;/a&gt;, &lt;a href="https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents-mcp" rel="noopener noreferrer"&gt;Snowflake&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/confluence" rel="noopener noreferrer"&gt;Confluence&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Finds the burn inflection, correlates deploys, config changes, traffic shifts, and upstream incidents, ranks hypotheses with linked evidence&lt;/td&gt;
&lt;td&gt;Evaluates hypotheses, decides action items&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;On-call handoff&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.arcade.dev/en/resources/integrations/development/pagerduty" rel="noopener noreferrer"&gt;PagerDuty&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/datadog-api" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/slack" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/zendesk" rel="noopener noreferrer"&gt;Zendesk&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Assembles the shift briefing from pages, active incidents, baking deploys, SLO burn, and open action items, delivers it as a Slack DM&lt;/td&gt;
&lt;td&gt;Reviews, adds color, signs off&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Workflow 1: Incident triage is mostly archaeology
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;The manual triage above is a parallelism problem, not a skill problem. One engineer, five workflows, sequential context loads. Every on-call engineer I know tells the same story: "I spent the first ten minutes figuring out what was happening."&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code does
&lt;/h3&gt;

&lt;p&gt;Hand the alert to Claude Code: "Triage this particular alert, correlated with the Datadog metrics, service logs, and the deployment history. Scan Slack history for other correlated failures."&lt;/p&gt;

&lt;p&gt;Claude Code returns the alert context in two sentences, the top three correlated signals with direct Datadog links, and the deploys most likely to matter by service-graph proximity with commit SHAs and authors. Two to three minutes end to end, running while you are opening the laptop. Grafana's team &lt;a href="https://grafana.com/blog/a-tale-of-two-incident-responses-how-our-ai-assist-helped-us-find-the-cause-3-5x-faster/" rel="noopener noreferrer"&gt;reported a 3.5x reduction&lt;/a&gt; in time to root cause using a similar pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  What on-call does
&lt;/h3&gt;

&lt;p&gt;By the time the on-call moves from the alert on their phone to opening their laptop, Claude Code's initial analysis is waiting. They read the summary, validate it against the dashboards, cross-reference the ranked deploys against what they know shipped recently, and decide the next move. They also catch the failure modes: the correlation that is spurious, the deploy the service graph does not know about, the #incidents thread that was noise. Claude Code compresses the archaeology. The on-call judges it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The auth, scope, and audit gap
&lt;/h3&gt;

&lt;p&gt;PagerDuty, Datadog, Slack, Jira, and GitHub all ship MCP servers. The problem is running them across a team, not building them.&lt;/p&gt;

&lt;p&gt;If the setup is not configured consistently for every engineer on the rotation, the workflow breaks on the shift that needs it most. Misconfigured permissions lead to inconclusive analysis, and inconclusive analysis at 3am is worse than no analysis at all. Engineers who wire up their own connections often grant themselves broader scopes than the workflow needs, and the next access review turns into cleanup nobody planned for. The failure mode that matters most: if tool access is not scoped properly, a diagnostic step can inadvertently trigger a write action, mutate state in production, and turn the triage itself into the incident. Consistent setup, scoped credentials, and read-only enforcement are properties of the MCP runtime, not the individual engineer's configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow 2: Runbook execution at 3am
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;Mature teams maintain their runbooks. The ones in constant use stay fresh because people fix them after every incident. The rot lives in two quieter places. Runbooks that fire once a quarter drift between uses, and nobody notices until the next 3am page reveals that half the commands point at deprecated tools and renamed clusters. And new engineers on the rotation often do not know which runbook applies to the alert in front of them. Finding the right doc at 3am is its own skill, and it takes months on the rotation to build.&lt;/p&gt;

&lt;p&gt;"Runbooks are a lie we tell ourselves."&lt;/p&gt;

&lt;p&gt;During my time leading &lt;a href="https://www.confluent.io/blog/making-apache-kafka-10x-more-reliable/" rel="noopener noreferrer"&gt;reliability at Confluent&lt;/a&gt; and Dropbox, I saw this pattern play out across very different stacks. It is not an organization-specific problem. It is the law of prioritization playing out: the runbooks that fire often get the attention, and the ones that fire rarely do not.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code does
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Finding the right runbook.&lt;/strong&gt; Once triage narrows the problem, the on-call needs to know which runbook applies and what to run. Point Claude Code at the alert. It matches the metadata (service, symptom, tag) against the runbook index, surfaces the top candidate, and lays out the diagnostic sequence with exact commands, the systems they target, and expected output for each step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keeping runbooks fresh.&lt;/strong&gt; Most mature teams run quality weeks or reliability sprints to refresh runbooks. At Confluent, we did this quarterly. Claude Code makes the sprint cheaper since this is a safe environment: replay every runbook against staging in a batch, flag the commands pointing at deprecated tools and renamed clusters, regenerate steps against current infra. The rot that accumulated since the last review gets caught in hours instead of weeks.&lt;/p&gt;

&lt;h3&gt;
  
  
  What on-call does
&lt;/h3&gt;

&lt;p&gt;The on-call runs the steps. Claude Code lays out the plan, the engineer executes it. Opening unbounded production access to a coding agent does not pass the sniff test for any reliability org I have worked with, and should not. The engineer confirms Claude Code picked the right runbook, runs each diagnostic in their own terminal with their own scoped credentials, and tracks pass/fail as they go. When Claude Code picks the wrong runbook, the on-call re-points it, and that correction feeds the index for the next page.&lt;/p&gt;

&lt;h3&gt;
  
  
  The auth, scope, and audit gap
&lt;/h3&gt;

&lt;p&gt;If Claude Code does not execute against production directly, enforcement becomes the whole game. The runbook has to be scoped to the user running it, the environment it targets, and the actions the current step actually needs. A step that is safe in staging is dangerous in prod. A step that is safe for a senior SRE is catastrophic for a new joiner still learning the cluster. Without tool-level governance that understands user, environment, and action together, you are back to trusting every engineer to read carefully at 3am, which is exactly the failure mode the runbook was supposed to prevent. Finding the right runbook and enforcing the right scopes are two different problems. Claude Code solves the first. The MCP runtime solves the second, with governance scoped per user, per environment, and per action. Both have to work, and neither replaces the other.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow 3: Postmortem drafting rots at the archaeology step
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;The incident resolved at 4pm. The retro is Thursday. Someone has to write the draft. The hard part is not the thinking. It is the archaeology: Slack scrollback, PagerDuty timeline, Datadog graphs, deploy history, team template. The &lt;a href="https://incident.io/blog/postmortem-software-roi-calculator" rel="noopener noreferrer"&gt;incident.io team puts manual reconstruction&lt;/a&gt; at 60 to 90 minutes per incident. That matches every team I have run.&lt;/p&gt;

&lt;p&gt;Most postmortems get drafted badly at the last minute. The retro starts from a weak foundation, and the same incident class comes back six months later.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code does
&lt;/h3&gt;

&lt;p&gt;Type into Claude Code: "Draft the postmortem for INC-4729 using the team template." Claude Code assembles the archaeology. It pulls the Slack transcript, the PagerDuty timeline, the Datadog panels from the incident dashboard, and the deploy log for every service touched. It drops each of those into the team template with source links, so every timeline entry traces back to the panel, commit, or message it came from.&lt;/p&gt;

&lt;p&gt;The draft stops at archaeology. Timeline, impact, affected services, evidence. The root cause, contributing factors, and action items fields are left structurally empty. Teams that let AI draft those turn every retro into a cleanup exercise. &lt;a href="https://engineering.zalando.com/posts/2025/09/dead-ends-or-data-goldmines-ai-powered-postmortem-analysis.html" rel="noopener noreferrer"&gt;Zalando's team reported hallucination rates as high as 40 percent&lt;/a&gt; in early AI-drafted postmortem analysis, and the lesson is not better prompting. It is to keep anything causal out of the draft.&lt;/p&gt;

&lt;h3&gt;
  
  
  What on-call does
&lt;/h3&gt;

&lt;p&gt;The on-call and the retro group review the draft. They are not rewriting it. They correct timeline entries that are wrong, add the signal the archaeology missed (a customer report that came through email, a related incident three days earlier, the deploy two sprints ago that introduced the latent bug), and spend their time on the part that matters: running the 5 whys, pressure-testing the root cause, deciding action items.&lt;/p&gt;

&lt;p&gt;The leverage is strongest on the long tail. In my experience, eighty to ninety percent of incidents a mature team handles are high-volume, low-priority events where the archaeology is mechanical and the writeup feels mundane. That is where teams cut corners, and where repeat incidents quietly accumulate. Claude Code absorbs the mundane work so the high-judgment work gets attention on every incident, not just the big ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  The auth, scope, and audit gap
&lt;/h3&gt;

&lt;p&gt;The tools the draft pulls from carry the most sensitive data in the company. #incidents has customer PII and vendor secrets. The deploy log has commit messages that sometimes leak security context. Datadog dashboards expose traffic patterns across the fleet. The engineer who set up the Slack connector usually has broader workspace read than the postmortem role needs, and the draft ends up citing messages it had no business reading.&lt;/p&gt;

&lt;p&gt;Scoping has to happen at the tool layer, not the prompt layer. Which channels the draft can read, which dashboards it can fetch, which tables it can query, all bounded by policy and tied to the user triggering the workflow. Then a provenance trail in a persistent log, showing what the AI accessed, when, and under whose identity. That is the half compliance will ask about, and the half that decides whether the workflow survives its first security review.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow 4: SLO investigation and error budget reviews
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;At Confluent, my team reviewed our availability SLO every Monday. We pulled the week's incidents, measured their impact on the SLO and the customer SLA, and mapped the root causes from each postmortem back to services and themes. The goal was to see whether the week's error budget had been spent on one repeat problem or scattered across five unrelated ones.&lt;/p&gt;

&lt;p&gt;Most of the prep was manual correlation: error budget delta, matched to PagerDuty incident, matched to Datadog regression, matched to deploy history, matched to the postmortem, matched to the theme bucket. One SRE typically spent four to six hours on that pipeline before the meeting started. The thinking happened in the review. The prep was legwork.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code does
&lt;/h3&gt;

&lt;p&gt;Ask Claude Code to prep the Monday review. It pulls the SLO and SLA deltas, fetches every PagerDuty incident in the window, joins each to the Datadog regression that matches in time and service, pulls the postmortem from Confluence, and extracts the root cause section. It groups root causes into themes using the team's existing taxonomy and hands back a structured brief: error budget delta, the incidents that account for it, the themes, and the open questions the postmortems did not resolve.&lt;/p&gt;

&lt;p&gt;What Claude Code does not do is quantify how much of the burn each incident "caused" in percentage terms. That is causal analysis current models do poorly, and a made-up percentage in a metrics review is worse than no number.&lt;/p&gt;

&lt;p&gt;The AI hunts. The human decides.&lt;/p&gt;

&lt;h3&gt;
  
  
  What on-call does
&lt;/h3&gt;

&lt;p&gt;The SRE running the review reads the brief, validates the incident-to-regression matches (Claude Code will get some wrong), writes the causal story the AI refused to guess at, decides which themes warrant action items, and raises the open questions in the meeting. Four hours of prep becomes thirty minutes of review and correction.&lt;/p&gt;

&lt;h3&gt;
  
  
  The auth, scope, and audit gap
&lt;/h3&gt;

&lt;p&gt;Warehouse-backed workflows are the ones SRE teams have held off on the longest, and the reason is scope. You cannot hand Claude Code unrestricted warehouse access and hope prompt engineering keeps it away from PII. You cannot give it unbounded query budgets and wait to see a five-thousand-dollar scan on next month's bill. Scope enforcement at the MCP runtime layer is what changes the math: this task queries these tables and not others, costs less than fifty dollars, never touches prod write paths. Without that, the workflow stays a prototype and never makes the rotation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow 5: On-call handoffs lose the context nobody wrote down
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;Handoffs are the most undervalued ritual in SRE work because the incidents they prevent never get counted. Handoff quality tracks how tired the outgoing engineer is, which means handoffs are worst on the shifts that had the most incidents, which is when they matter most. The non-obvious cost: the morning incident where the new on-call did not know a deploy was still baking, and ends up paging the previous on-call at 8am to ask what happened overnight.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code does
&lt;/h3&gt;

&lt;p&gt;Claude Code generates the briefing at the rotation boundary, without anyone triggering it. It pulls the last 24 hours of pages with resolution notes, active incidents, baking deploys, SLOs that crossed a burn threshold, unresolved #incidents threads, Zendesk escalations, and customer reports that came in through the on-call email alias. It lists open action items assigned to the rotation. It delivers the briefing as a Slack DM with a copy in the team's handoff Confluence doc.&lt;/p&gt;

&lt;h3&gt;
  
  
  What on-call does
&lt;/h3&gt;

&lt;p&gt;The outgoing engineer adds the color only they can add: what they think is a false alarm, which customer report to watch, which deploy they are nervous about, which alert they silenced and why. That is the handoff knowledge that lives in the outgoing engineer's head and nowhere else. Claude Code assembles the facts. The on-call provides the judgment.&lt;/p&gt;

&lt;h3&gt;
  
  
  The auth, scope, and audit gap
&lt;/h3&gt;

&lt;p&gt;The briefing fires at 5pm whether anyone is logged in or not, which means it needs a credential that lives outside any single engineer's session. Dotfiles on a closed laptop do not qualify. A scheduled workflow without a persistent service identity is not a workflow. It is a cron job that silently stops running the next time someone rotates off the team. Persistent service identity is a property of the MCP runtime, not the engineer's laptop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code is a companion, not an autonomous AI SRE
&lt;/h2&gt;

&lt;p&gt;Five workflows, one pattern. Claude Code reads, correlates, drafts, and waits. The human decides.&lt;/p&gt;

&lt;p&gt;Most of the AI SRE market is betting the other way. &lt;a href="https://traversal.com/" rel="noopener noreferrer"&gt;Traversal&lt;/a&gt;, &lt;a href="https://resolve.ai/" rel="noopener noreferrer"&gt;Resolve&lt;/a&gt;, &lt;a href="https://www.anyshift.io/" rel="noopener noreferrer"&gt;Anyshift&lt;/a&gt;, and others are building toward autonomous agents that page, remediate, and close incidents on their own. I am skeptical. A model's output is a function of its capability and the context it is given. Current models can do the archaeology reliably. They cannot reliably be given enough scoped context and the right tools to remediate production unsupervised. That is a context and tooling gap, not a model gap, and I would rather ship the shape that already works.&lt;/p&gt;

&lt;p&gt;Claude Code runs when you ask. It stops when the next step needs judgment. It never pages, rolls back, or closes an incident on its own.&lt;/p&gt;

&lt;p&gt;A companion also dodges the procurement fight that stalls autonomous rollouts. You are not replacing a role or adding an on-call tier. You are pointing the tool your team already uses at data sources they already trust, with an MCP runtime that scopes what it can do. The security review goes from "new vendor, new risk" to "scoped tools inside an existing agent."&lt;/p&gt;

&lt;p&gt;Every workflow in this article starts as a prompt and grows into a skill. The triage prompt, the runbook dispatcher, the postmortem drafter, the SLO prep pipeline, the handoff briefing: each one begins as something one engineer types once, and becomes a packaged skill every engineer on the rotation invokes the same way. The skill keeps getting sharper because the team keeps editing it: a new data source here, a tighter prompt there, a correction after an incident surfaces a blind spot. One person's trick becomes team infrastructure, and the infrastructure compounds.&lt;/p&gt;

&lt;p&gt;Reliability comes from running a proper reliability program, and a proper program is mostly operational work around rituals: triage, runbooks, postmortems, SLO reviews, handoffs. Claude Code earns its keep by making the rituals cheap enough to happen on every shift, not just the ones where someone has the energy for them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an AI SRE needs from its MCP tool integration layer
&lt;/h2&gt;

&lt;p&gt;Every workflow above needs the same four things.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Managed authentication and authorization across tools.&lt;/strong&gt; OAuth flows for every connected tool, credentials refreshed automatically, scoped per user, reachable from any device including a phone at 3am.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managed compute, always on, team-wide.&lt;/strong&gt; Tools run on shared infrastructure, cloud-hosted or on-prem, with the same behavior whether the trigger came from a laptop, a phone, a webhook, or a cron job.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool- and agent-level governance.&lt;/strong&gt; Per-tool permission policies, per-task cost budgets, and per-query data access limits enforced where the call happens, not where the model proposes it. This is the difference between a workflow security will approve and one they kill on sight.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent audit logs.&lt;/strong&gt; Every tool call logged with triggering user, arguments, response, and timestamp, in a log the agent cannot modify. Without this you cannot retro the AI, and you cannot trust it.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Arcade: an MCP runtime for AI SRE workflows
&lt;/h2&gt;

&lt;p&gt;Arcade is an &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;MCP runtime&lt;/a&gt; built to close exactly this gap. &lt;a href="https://www.arcade.dev/blog/sso-for-ai-agents-authentication-and-authorization-guide/" rel="noopener noreferrer"&gt;Managed OAuth&lt;/a&gt; handles every connected tool, with credentials that refresh automatically and never touch the language model. Every tool call runs &lt;a href="https://docs.arcade.dev/en/guides/create-tools/tool-basics/runtime-data-access" rel="noopener noreferrer"&gt;on behalf of the user&lt;/a&gt; who triggered it, so native permissions in PagerDuty, Datadog, and Snowflake apply exactly as they would outside the agent. You connect PagerDuty once, and every Claude Code session on your team picks it up at the right scope.&lt;/p&gt;

&lt;p&gt;The runtime runs tools on hosted workers, deployable in your cloud or on-prem, and enforces per-tool policies where the call happens, not where the model proposes it. The same workflow triggered from a phone, a laptop, or a cron job executes on shared infrastructure. Policies fire at the MCP runtime layer: "this workflow queries these Snowflake tables and not others," "this workflow can propose PagerDuty actions but cannot execute without approval," "this workflow has a $25 query budget."&lt;/p&gt;

&lt;p&gt;Every tool call lands in an OpenTelemetry-compatible run log with triggering user, arguments, response, and timestamp. It drops straight into the observability pipeline your platform team already runs. When your postmortem asks what Claude Code did during the incident, you have the answer. When compliance asks for every query this AI ran against the warehouse last quarter, you have the answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.arcade.dev/toolkits" rel="noopener noreferrer"&gt;Prebuilt tools&lt;/a&gt; ship for PagerDuty, Datadog, Slack, Jira, Confluence, GitHub, Snowflake, and more. You can also &lt;a href="https://docs.arcade.dev/en/home/custom-mcp-server-quickstart" rel="noopener noreferrer"&gt;bring your own MCP servers&lt;/a&gt; into the runtime: the PagerDuty, Datadog, Snowflake, and Kubernetes servers linked in the table above drop in as-is and inherit the same managed auth, policy enforcement, and audit logs as the prebuilt ones. You extend your existing MCP investment instead of replacing it.&lt;/p&gt;

&lt;p&gt;You can build this without Arcade, and the reason not to is the same reason you did not write your own CI system: the work is real, the edge cases are ugly, and it is not where your reliability differentiation lives. A mature team can hand-roll managed OAuth, stand up hosted workers, wire per-tool policy enforcement, and ship a tamper-evident audit log. A few platform teams I know started down that path and concluded it was too costly to own, or simply not where they wanted to spend their reliability budget.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reducing on-call toil is where SRE leverage lives
&lt;/h2&gt;

&lt;p&gt;The outer loop has not caught up to the inner loop because the infrastructure to run agentic tools safely against production systems has been missing. A coding assistant only needs your repo and your editor. An operational assistant needs managed identity, hosted compute, enforced governance, and an audit trail, because it reaches into systems where mistakes page the CTO.&lt;/p&gt;

&lt;p&gt;The SRE teams that figure this out over the next year will pull away from the ones that do not, the same way the teams that adopted Claude Code for inner-loop work in 2024 pulled away from the teams that waited. The inner loop is solved. The outer loop is where the leverage lives now, sitting on a &lt;a href="https://clickhouse.com/blog/ai-sre-observability-architecture" rel="noopener noreferrer"&gt;data substrate that is its own design problem&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Claude Code does not replace the on-call. It just lets them start on page 5 instead of page 1.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is an AI SRE?
&lt;/h3&gt;

&lt;p&gt;An AI SRE is an AI assistant that helps site reliability engineers with operational work: incident triage, runbook execution, postmortem drafting, SLO investigation, and on-call handoffs. Most practical AI SRE deployments today run as companions that read, correlate, and draft while a human engineer decides the next move, rather than as autonomous agents that page, remediate, and close incidents on their own.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between an MCP gateway and an MCP runtime?
&lt;/h3&gt;

&lt;p&gt;An MCP gateway routes MCP tools under a single URL so any MCP client can call them. An MCP runtime goes further: it adds the compute that runs the tools, managed authentication, per-tool permission enforcement, and persistent audit logs. A gateway is routing infrastructure. A runtime is production infrastructure. Arcade is an MCP runtime with a gateway inside it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can Claude Code replace an on-call engineer?
&lt;/h3&gt;

&lt;p&gt;No. Claude Code works best as a companion to the on-call engineer, not a replacement. It compresses the archaeology (pulling alerts, correlating signals, drafting summaries) so the engineer starts with context already loaded. Every decision that requires judgment (rolling back a deploy, paging a co-worker, closing an incident) stays with the human.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I use Claude Code for incident triage?
&lt;/h3&gt;

&lt;p&gt;Point Claude Code at the alert with a prompt like "Triage this alert, correlated with Datadog metrics, service logs, and deployment history. Scan Slack for correlated failures." With MCP servers for PagerDuty, Datadog, Slack, and GitHub wired into an MCP runtime, Claude Code returns a summary, the top correlated signals, candidate deploys, and a draft war room post in two to three minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is it safe to let Claude Code execute runbooks in production?
&lt;/h3&gt;

&lt;p&gt;Claude Code should not execute against production directly. The safer pattern is for Claude Code to parse the runbook, lay out the diagnostic sequence, and propose commands, while the on-call engineer runs each step in their own terminal with their own scoped credentials. Unbounded production access for any coding agent should not pass a reliability review.&lt;/p&gt;

&lt;h3&gt;
  
  
  What MCP servers do I need for AI SRE workflows?
&lt;/h3&gt;

&lt;p&gt;The core set covers the tools already in an SRE rotation: PagerDuty, Datadog, Slack, and GitHub for incident triage; Confluence and Kubernetes for runbook execution; Snowflake for SLO investigation; Zendesk for on-call handoffs. Each has a production-ready MCP server that can run inside an MCP runtime like Arcade, which handles managed auth, policies, and audit logs across all of them.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Arcade work with Claude Code?
&lt;/h3&gt;

&lt;p&gt;Arcade is an MCP runtime that manages OAuth, per-tool permission policies, and audit logs for every tool Claude Code calls. You connect PagerDuty, Datadog, or Snowflake once, and every Claude Code session on your team picks up the tools at the right scope. Arcade also runs bring-your-own MCP servers, so existing integrations work as-is.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between AI SRE tools like Traversal and using Claude Code with an MCP runtime?
&lt;/h3&gt;

&lt;p&gt;Traversal, Resolve, and Anyshift are building autonomous agents that page, remediate, and close incidents on their own. Claude Code with an MCP runtime takes the companion approach: read, correlate, draft, and wait for the engineer to decide. The companion pattern ships today. The autonomous bet does not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does the observability store underneath matter as much as the MCP runtime above?
&lt;/h3&gt;

&lt;p&gt;Yes. An AI agent runs 10 to 30 queries per investigation, and most observability stores weren't built to serve that pattern at the retention and cardinality an SRE needs. The MCP runtime handles the execution layer; the observability store handles the cognitive substrate. Both matter. I've written about the substrate side &lt;a href="https://clickhouse.com/blog/ai-sre-observability-architecture" rel="noopener noreferrer"&gt;here&lt;/a&gt;.  &lt;/p&gt;

</description>
      <category>ai</category>
      <category>sre</category>
      <category>devops</category>
      <category>mcp</category>
    </item>
    <item>
      <title>How to Connect AI Agents to Enterprise Productivity Tools Securely (2026 Architecture Guide)</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Thu, 09 Apr 2026 20:58:36 +0000</pubDate>
      <link>https://dev.to/arcade/how-to-connect-ai-agents-to-enterprise-productivity-tools-securely-2026-architecture-guide-5d0n</link>
      <guid>https://dev.to/arcade/how-to-connect-ai-agents-to-enterprise-productivity-tools-securely-2026-architecture-guide-5d0n</guid>
      <description>&lt;p&gt;Most enterprise AI agents today can analyze but can't execute. They summarize documents, surface insights, and draft responses. They don't close support tickets, update Salesforce, or trigger deployments. The ROI stays incremental. The architecture that solves this is an MCP runtime, a secure execution layer that handles authorization, credentials, and tool calling on behalf of each user.&lt;/p&gt;

&lt;p&gt;The real transformation happens when agents take actions, when employees direct work instead of doing it. But getting agents to safely execute across enterprise systems is where everything falls apart.&lt;/p&gt;

&lt;p&gt;Recent industry studies from IDC and MIT show that &lt;a href="https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/" rel="noopener noreferrer"&gt;88 to 95 percent of enterprise AI pilots fail to reach production&lt;/a&gt;. The root cause isn't the language model. It's the complexity of secure integration, and every month spent rebuilding auth plumbing is a month your agents aren't delivering business value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use an MCP runtime as the secure action layer&lt;/strong&gt; between your agents and enterprise tools. It evaluates the intersection of agent permissions and user permissions per action at runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute every tool call on behalf of the user (OBO).&lt;/strong&gt; The agent acts with the user's credentials, scoped to the user's native permissions, and every action is attributable in audit logs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep OAuth tokens out of the LLM context.&lt;/strong&gt; Credentials must be vaulted at the runtime layer where the model cannot observe, alter, or leak them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do not use static service accounts.&lt;/strong&gt; They break permission models and turn a single prompt injection into an enterprise-wide incident.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build with agent-optimized tools, not raw API wrappers&lt;/strong&gt;: intent-level operations with validated schemas that prevent parameter hallucination and eliminate retry loops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Require human-in-the-loop approvals for all destructive actions&lt;/strong&gt;. Deletes, bulk updates, and external communications must pause for explicit sign-off before execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ship audit logs and telemetry from day one.&lt;/strong&gt; Export every tool call via OpenTelemetry to your SIEM for compliance, incident response, and root cause analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why connecting AI agents to enterprise tools is hard: identity, permissions, and safe execution
&lt;/h2&gt;

&lt;p&gt;The bottleneck in agentic systems, such as Claude Cowork or OpenClaw, isn't making API calls. It's identity propagation, permission inheritance, and safe execution within complex enterprise environments.&lt;/p&gt;

&lt;p&gt;When teams build direct integrations between LLMs and enterprise software, they immediately hit friction. Developers spend cycles managing fragile OAuth token lifecycles, handling async user consent flows, manually tuning least-privilege authorization scopes, and building custom approval controls. This is undifferentiated infrastructure work that burns engineering time without advancing the agent's core capabilities.&lt;/p&gt;

&lt;p&gt;Because this work is tedious and blocks core agent development, teams frequently take a dangerous shortcut: they use service accounts.&lt;/p&gt;

&lt;p&gt;Granting an agent global read and write access across an entire enterprise instance breaks native permission models. You're bypassing years of carefully configured role-based access controls.&lt;/p&gt;

&lt;p&gt;A single manipulated input can result in instant, untraceable data exfiltration or system modification. If an agent holds a static API key with global write access, a localized &lt;a href="https://genai.owasp.org/llm-top-10/" rel="noopener noreferrer"&gt;prompt injection vulnerability&lt;/a&gt; becomes an enterprise-wide blast radius.&lt;/p&gt;

&lt;p&gt;Teams make two mistakes here. Give the agent its own identity, and an intern can bypass their permissions through the agent. Inherit the user's full access, and one prompt injection cascades through every connected system.&lt;/p&gt;

&lt;p&gt;The right answer is the intersection: what is this agent allowed to do &lt;strong&gt;AND&lt;/strong&gt; what is this user allowed to do, evaluated per action, at runtime. This is the permission intersection model, and it's the only approach that prevents both privilege escalation and blast radius expansion simultaneously.&lt;/p&gt;

&lt;p&gt;This evaluation must happen at the runtime layer. Not at login time, not in the prompt, and not in the application code. Without it, scaling agents beyond single-user demos is unsafe.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architectural shift: The agent is already the proxy
&lt;/h2&gt;

&lt;p&gt;Before evaluating specific integration approaches, you need to understand why the traditional enterprise architecture no longer applies.&lt;/p&gt;

&lt;p&gt;In the pre-agentic model, a proxy (API gateway) sits between applications and APIs, routing, authenticating, and rate limiting. The proxy is the control point because all traffic flows through it.&lt;/p&gt;

&lt;p&gt;Agents invert this topology. The agent mediates between the user and the infrastructure. It already handles routing, orchestration, and decision-making. Adding a traditional proxy in front of the tools the agent calls doesn't add a control point. It adds a redundant hop that can't see into the execution context that matters: which user, which action, which permission, right now.&lt;/p&gt;

&lt;p&gt;The control point in an agentic architecture is the execution layer where the tool runs, where credentials are resolved, permissions are checked, and actions are taken on behalf of a specific human. That's the runtime.&lt;/p&gt;

&lt;p&gt;The gateway era was defined by the proxy as the control point. The agentic era is defined by the runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four architectures for connecting AI agents to enterprise tools
&lt;/h2&gt;

&lt;p&gt;As organizations move from isolated pilots to production deployments, engineering teams adopt one of four integration models. Understanding where each approach breaks down under enterprise load is critical for architectural planning.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Integration approach&lt;/th&gt;
&lt;th&gt;Security &amp;amp; identity&lt;/th&gt;
&lt;th&gt;Maintenance burden&lt;/th&gt;
&lt;th&gt;Reliability &amp;amp; execution&lt;/th&gt;
&lt;th&gt;Speed-to-market&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Custom connectors &amp;amp; DIY auth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Highly variable; often falls back to static keys.&lt;/td&gt;
&lt;td&gt;Extremely high; requires dedicated auth teams.&lt;/td&gt;
&lt;td&gt;Low; prone to parameter hallucination loops.&lt;/td&gt;
&lt;td&gt;Very slow.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Legacy iPaaS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate; struggles with On-Behalf-Of execution.&lt;/td&gt;
&lt;td&gt;Medium; relies on maintaining visual workflows.&lt;/td&gt;
&lt;td&gt;Medium; optimized for linear triggers, not loops.&lt;/td&gt;
&lt;td&gt;Moderate.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Unmanaged MCP servers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low; lacks centralized multi-user authorization.&lt;/td&gt;
&lt;td&gt;High; requires manual deployment and patching.&lt;/td&gt;
&lt;td&gt;Low; lacks native retries and failover state.&lt;/td&gt;
&lt;td&gt;Fast for prototypes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP runtime (e.g., Arcade)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High; native permission mapping and token vaults.&lt;/td&gt;
&lt;td&gt;Low; runtime handles lifecycle and upgrades.&lt;/td&gt;
&lt;td&gt;High; parallel execution and automatic retries.&lt;/td&gt;
&lt;td&gt;Very fast.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Approach 1: Build custom connectors and OAuth (DIY authentication)
&lt;/h3&gt;

&lt;p&gt;Build one-off API wrappers and custom OAuth layers for every enterprise tool your agent needs.&lt;/p&gt;

&lt;p&gt;The upside is total control. You dictate every aspect of the integration and avoid third-party vendor lock-in.&lt;/p&gt;

&lt;p&gt;But the limitations get crippling fast. Custom connectors become a massive engineering drain. Teams spend months building secure token vaults, handling refresh token rotation, and writing edge-case logic. Those are months that could have been spent shipping agent features that actually move the business forward.&lt;/p&gt;

&lt;p&gt;Raw enterprise APIs compound the problem. They expect highly structured, deterministic inputs, but agents generate dynamic natural language. Wiring them directly to raw endpoints leads to parameter hallucination and endless retry loops. Authentication alone becomes a standalone infrastructure project: token rotation, user matching, session validation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 2: Use legacy iPaaS for agent tool calls
&lt;/h3&gt;

&lt;p&gt;Enterprises retrofit existing integration platforms like Workato, MuleSoft, or Zapier to trigger actions based on LLM outputs.&lt;/p&gt;

&lt;p&gt;The strength is familiarity. Enterprise IT teams already know these tools, and they come with massive pre-built endpoint catalogs.&lt;/p&gt;

&lt;p&gt;But the limitations are architectural and fundamental. These platforms were built for linear, deterministic, trigger-based automation. Agentic systems operate on non-deterministic, stateful reasoning loops where the agent decides what to call, when, and how many times based on intermediate results. Forcing that into a linear webhook pattern breaks down fast.&lt;/p&gt;

&lt;p&gt;The deeper problem is identity. Legacy iPaaS platforms center on system-to-system service accounts. They lack true &lt;a href="https://learn.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-on-behalf-of-flow" rel="noopener noreferrer"&gt;user-scoped, On-Behalf-Of (OBO) execution&lt;/a&gt;, which forces teams to build complex, fragile workarounds to ensure the agent only acts with the specific permissions of the user typing the prompt. Per-user authorization evaluated at runtime across every tool call requires infrastructure these platforms were never designed to deliver.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 3: Run unmanaged MCP servers
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://modelcontextprotocol.io/specification/latest" rel="noopener noreferrer"&gt;Model Context Protocol standardized how AI models connect to data sources and tools&lt;/a&gt;. In this approach, teams deploy open-source MCP servers to expose local or SaaS capabilities directly to their agents.&lt;/p&gt;

&lt;p&gt;MCP's strength is standardization. It decouples the agent framework from the underlying tool implementation, creating a universal language for tool calling. The problem is that the quality of unmanaged, open-source MCP servers varies widely. According to &lt;a href="https://toolbench.arcade.dev/" rel="noopener noreferrer"&gt;benchmarks&lt;/a&gt; many struggle with reliability and correctness, which compounds the challenges of production deployments.&lt;/p&gt;

&lt;p&gt;These servers break down the moment you take them to production. Raw, unmanaged MCP servers lack centralized governance. They don't ship with multi-user enterprise authentication handling, meaning every user often shares the same connection identity.&lt;/p&gt;

&lt;p&gt;They also lack production reliability features like automatic retries, parallel execution, and stateful failover out of the box. That burden falls back on the application developer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 4: Use an MCP runtime (the secure action layer)
&lt;/h3&gt;

&lt;p&gt;An &lt;a href="https://docs.arcade.dev/en/home" rel="noopener noreferrer"&gt;MCP runtime&lt;/a&gt; is the infrastructure layer purpose-built to solve this problem. &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade.dev&lt;/a&gt;, the industry's first MCP runtime, combines &lt;a href="https://www.arcade.dev/tools" rel="noopener noreferrer"&gt;agent-optimized tools&lt;/a&gt;, centralized authentication and authorization, and enterprise governance into a single control plane.&lt;/p&gt;

&lt;p&gt;This approach targets production AI specifically. The runtime speaks MCP natively (JSON-RPC, Streamable HTTP) with no protocol translation and no context loss. It preserves native permissions through On-Behalf-Of token flows, isolates credentials from the language model, and provides instant, OpenTelemetry-compatible audit logs for every action.&lt;/p&gt;

&lt;p&gt;Teams ship faster because the runtime handles authorization, token lifecycle, retries, and governance. Engineers focus entirely on agent logic and business outcomes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.arcade.dev/blog/mcp-runtime-gateway" rel="noopener noreferrer"&gt;Arcade's MCP Gateway&lt;/a&gt; lets any MCP client access the full tool catalog through a single endpoint. Teams can also bring their own MCP servers into the runtime to get authorization, retries, and audit logs without rewriting what already works. The runtime extends your existing MCP investment rather than replacing it.&lt;/p&gt;

&lt;p&gt;For single-user hobbyist projects or local scripts, a full runtime adds unnecessary overhead. But for platform engineering teams deploying autonomous systems to thousands of corporate users, an MCP runtime is the only viable path to production.&lt;/p&gt;

&lt;h3&gt;
  
  
  What production demands: authorization, tooling, and governance
&lt;/h3&gt;

&lt;p&gt;The comparison above shows where each approach breaks. But understanding why the MCP runtime wins requires going deeper into the three capabilities that separate production deployments from demos: just-in-time authorization that enforces user-scoped access, agent-optimized tools that eliminate hallucination loops, and governance infrastructure that gives platform teams full visibility over every action.&lt;/p&gt;

&lt;h4&gt;
  
  
  How just-in-time authorization enforces user-scoped access
&lt;/h4&gt;

&lt;p&gt;Custom connectors fall back to static keys. Legacy iPaaS platforms rely on shared service accounts. Unmanaged MCP servers lack multi-user auth entirely. All three fail at the same point: they can't evaluate who is allowed to do what at the moment the tool is called.&lt;/p&gt;

&lt;p&gt;That’s the problem &lt;a href="https://www.arcade.dev/blog/sso-for-ai-agents-authentication-and-authorization-guide/" rel="noopener noreferrer"&gt;just-in-time authorization&lt;/a&gt; solves.&lt;/p&gt;

&lt;p&gt;The agent requests and validates credentials only at the moment an action requires them, not upfront. If a user never invokes the Salesforce integration, no Salesforce tokens are ever obtained or stored.&lt;/p&gt;

&lt;p&gt;The entire authentication flow (OAuth exchanges, token refresh, credential storage) executes in deterministic backend logic that the LLM can never alter, observe, or leak. For additional governance, teams can attach pre-tool-call and post-tool-call hooks to enforce custom policies like human-in-the-loop approvals for certain actions, usage limits or contextual access rules.&lt;/p&gt;

&lt;p&gt;This works because the runtime is stateful. It maintains per-session, per-user context across an agent's entire reasoning loop. A stateless proxy evaluates each request in isolation and can't know that a request is step 3 of a 6-step workflow, acting on behalf of Alice, who authorized this specific scope 4 minutes ago. The runtime can, and that session context is what makes per-user, per-tool authorization enforceable.&lt;/p&gt;

&lt;p&gt;This is where the permission intersection model described earlier becomes operational. The architecture enforces: Agent Permissions ∩ User Permissions = Effective Action Scope. The agent can only execute an action if both the agent's role policy and the human user's native SaaS permissions explicitly allow it. Every other combination is denied.&lt;/p&gt;

&lt;p&gt;A concrete example: an enterprise AI agent is built to assist the Human Resources department. An employee using this agent has high-level administrative privileges in Workday, including access to global payroll data. But the HR agent itself is scoped strictly to recruiting tasks.&lt;/p&gt;

&lt;p&gt;Because the runtime evaluates the intersection of these permissions at call time, the agent is denied when prompted to access payroll data. The user has the authority, but the agent's restricted scope prevents the action. This stops data exfiltration and &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html" rel="noopener noreferrer"&gt;confused deputy&lt;/a&gt; attacks cold.&lt;/p&gt;

&lt;h4&gt;
  
  
  Agent-optimized tools vs API wrappers: what to use and why
&lt;/h4&gt;

&lt;p&gt;The comparison table flags a specific failure mode for custom connectors: parameter hallucination loops. This happens because raw REST endpoints require precise, deterministic parameters, and language models produce probabilistic natural language. Wiring one directly to the other without an intermediary is where agents break.&lt;/p&gt;

&lt;p&gt;Agents need intent-level tools rather than raw API wrappers. An intent-level tool absorbs the ambiguity of an agent's request and translates it into a safe, predictable transaction. The result is faster execution, fewer failed actions, and lower inference costs because the agent doesn't burn tokens on retry loops.&lt;/p&gt;

&lt;p&gt;Production execution also requires runtime reliability features that raw APIs don't provide. The runtime provides developer-defined context for intelligent retries, parallelized execution for multi-step tasks, and automatic failover to handle rate limits and transient network errors gracefully. Standardized schemas within these tools prevent parameter hallucination, the most common cause of agent failure when wiring models directly to APIs.&lt;/p&gt;

&lt;p&gt;Consider how this works in practice. Instead of an agent calling a raw Salesforce update endpoint and failing because it hallucinated a required stage ID string, the agent uses a high-level, agent-optimized progress tool.&lt;/p&gt;

&lt;p&gt;The tool natively understands the user's intent to move a deal to negotiation. Its internal logic securely looks up the correct, exact ID for that specific Salesforce instance, validates the state transition, and safely executes the update. The language model doesn't need to guess the exact database schema. The action succeeds on the first call, not the fifth.&lt;/p&gt;

&lt;h4&gt;
  
  
  Governance and observability for agent actions (audit logs, OTel, versioning)
&lt;/h4&gt;

&lt;p&gt;Unmanaged MCP servers scored "Low" on reliability and security in the comparison above because they lack centralized governance. Once agents execute real actions on behalf of users, platform teams need complete visibility and control over the integration ecosystem. The runtime delivers this through three mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visibility filtering&lt;/strong&gt; ensures agents only see the specific tools the current user is permitted to invoke. If a user doesn't have permission to merge code in GitHub, the GitHub merge tool doesn't appear in the agent's context window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deep audit trails&lt;/strong&gt; log every action per user, per service, and per agent session. These logs are &lt;a href="https://opentelemetry.io/docs/specs/semconv/gen-ai/mcp/" rel="noopener noreferrer"&gt;exportable to standard SIEM tools via OpenTelemetry (OTel)&lt;/a&gt; to satisfy compliance audits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version control&lt;/strong&gt; lets platform engineers safely upgrade tool schemas and rotate connection parameters without breaking production agents running mid-session on older versions.&lt;/p&gt;

&lt;p&gt;When an agent incorrectly closes several open opportunities in a CRM, the platform team can't spend days parsing raw application logs. With an OTel-compatible audit log generated by the action layer, the security team can instantly trace the destructive action back to the exact user prompt, the specific agent session, and the token used. This isolates the root cause in minutes, enabling teams to refine the agent's instructions or the tool's access policy immediately.&lt;/p&gt;

&lt;p&gt;Of the four approaches evaluated, only the MCP runtime delivers all three: user-scoped authorization at call time, intent-level tooling that prevents hallucination, and centralized governance with full audit trails. The remaining sections show how this architecture works in practice and how to evaluate it for your organization.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to choose an enterprise agent integration approach (security, OBO, and TCO)
&lt;/h2&gt;

&lt;p&gt;Choosing how to connect your AI agents to enterprise tools is a foundational architectural decision. It dictates the speed and security of your deployment. Platform engineers and technical leaders need to frame their buying and building criteria around security, scale, and where their engineering resources should focus.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security and compliance requirements (SOC 2, ISO 27001, auditability)
&lt;/h3&gt;

&lt;p&gt;Can the proposed solution natively map to SOC 2 and ISO 27001 requirements for strict user attribution? If an agent deletes a file in Google Workspace, the audit log must definitively prove which human authorized that action.&lt;/p&gt;

&lt;p&gt;The system must support pre-tool-call &lt;a href="https://hoop.dev/blog/how-to-keep-human-in-the-loop-ai-control-soc-2-for-ai-systems-secure-and-compliant-with-action-level-approvals" rel="noopener noreferrer"&gt;Human-in-the-Loop (HITL) approval hooks&lt;/a&gt;. Destructive actions like modifying production configurations or bulk-updating database records must pause execution and require cryptographic sign-off from a human administrator via Slack or email before proceeding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build vs buy economics (OAuth maintenance and total cost of ownership)
&lt;/h3&gt;

&lt;p&gt;Build versus buy demands a ruthless economic assessment.&lt;/p&gt;

&lt;p&gt;Calculate the actual engineering hours required to build, maintain, and securely upgrade OAuth flows for ten or more distinct enterprise APIs. Factor in the hidden costs: managing refresh token rotation, building webhook callback URLs for long-running async tasks, patching custom connectors when SaaS vendors inevitably deprecate their API versions.&lt;/p&gt;

&lt;p&gt;Then ask what those engineers could have shipped instead.&lt;/p&gt;

&lt;p&gt;Adopting an MCP runtime transforms a multi-month infrastructure project into a configuration exercise. The total cost of ownership drops dramatically, and your team reclaims months of engineering capacity to invest in the agent capabilities that differentiate your product.&lt;/p&gt;

&lt;h3&gt;
  
  
  Time-to-value and engineering focus
&lt;/h3&gt;

&lt;p&gt;Time-to-value is where most teams underestimate the cost of building in-house.&lt;/p&gt;

&lt;p&gt;Will your highly paid AI engineers spend the next three months building reliable Slack and Workspace connectors, or will they spend that time optimizing agent prompts, evaluating reasoning logic, and shipping the agent capabilities that drive revenue? Every week spent on integration plumbing is a week your competitors use to get their agents into production.&lt;/p&gt;

&lt;p&gt;When evaluating external vendors or internal architecture plans, force the issue with hard technical questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are API keys or OAuth tokens ever visible in the language model's prompt context window?&lt;/li&gt;
&lt;li&gt;How does the system resolve conflicting permissions between a highly privileged user and a narrowly scoped agent?&lt;/li&gt;
&lt;li&gt;Can the system emit W3C-standard trace context to our existing OpenTelemetry collectors?&lt;/li&gt;
&lt;li&gt;How does the tool handle rate limiting when an agent enters an unexpected retry loop?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the answer to credential visibility is anything other than absolute isolation, the architecture is unfit for enterprise production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference architecture for an MCP runtime (step-by-step flow)
&lt;/h2&gt;

&lt;p&gt;With the architectural decision framed, here's how a request actually flows through the runtime end to end. The MCP runtime acts as the intermediary that brokers trust and execution between the non-deterministic reasoning engine and the deterministic enterprise environment.&lt;/p&gt;

&lt;p&gt;The flow of a secure request follows a strict sequence:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pa5dvzbt30a978qwvfb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pa5dvzbt30a978qwvfb.png" alt="Secure AI agent enterprise integration architecture diagram showing MCP runtime flow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User prompt&lt;/strong&gt;: The user submits a request, e.g., "close this support ticket."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM plan&lt;/strong&gt;: The agent's language model determines the sequence of tool calls needed to fulfill the request.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP runtime&lt;/strong&gt;: The runtime receives the tool call request. It evaluates user and agent permissions and retrieves the necessary On-Behalf-Of credential.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool execution&lt;/strong&gt;: The runtime, not the agent, executes the precise API call against the target system (e.g., Zendesk).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result &amp;amp; next action:&lt;/strong&gt; The runtime receives the API result, filters it, and passes it back to the agent. The LLM then either plans the next action in the sequence or determines the task is complete.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confirmation &amp;amp; audit&lt;/strong&gt;: The agent confirms the action's completion to the user, and the runtime logs the entire transaction via OpenTelemetry for audit purposes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This architecture enforces a hard separation of concerns. The language model handles reasoning, planning, action selection, and generation. The runtime layer handles credentials, policy enforcement, rate limiting, action execution, and logging.&lt;/p&gt;

&lt;p&gt;By vaulting tokens at the runtime layer, this architecture prevents prompt-injection-driven data exfiltration. The language model never possesses the keys required to export data.&lt;/p&gt;

&lt;h3&gt;
  
  
  How an MCP runtime works with any LLM
&lt;/h3&gt;

&lt;p&gt;The MCP runtime works with any LLM through any orchestration framework, or none at all. No framework dependency is required. Arcade serves as the secure execution backend: your code handles reasoning, Arcade handles credentials, authorization, and tool execution.&lt;/p&gt;

&lt;p&gt;This clean separation is what accelerates time-to-production. AI engineers focus entirely on agent logic while offloading the high-risk plumbing of enterprise integrations to the runtime.&lt;/p&gt;

&lt;p&gt;A working example: an agent that reads Gmail and sends Slack messages through Arcade's runtime. Setup requires three dependencies and three environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;arcadepy openai python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env
&lt;/span&gt;&lt;span class="n"&gt;ARCADE_API_KEY&lt;/span&gt;=&lt;span class="n"&gt;your_arcade_api_key&lt;/span&gt;        &lt;span class="c"&gt;# Free at arcade.dev
&lt;/span&gt;&lt;span class="n"&gt;ARCADE_USER_ID&lt;/span&gt;=&lt;span class="n"&gt;your_email&lt;/span&gt;@&lt;span class="n"&gt;company&lt;/span&gt;.&lt;span class="n"&gt;com&lt;/span&gt;     &lt;span class="c"&gt;# The user the agent acts on behalf of
&lt;/span&gt;&lt;span class="n"&gt;OPENAI_KEY&lt;/span&gt;=&lt;span class="n"&gt;your_openai_key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;arcadepy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Arcade&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;arcade_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Arcade&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;arcade_user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ARCADE_USER_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;llm_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define enterprise productivity tools — Arcade handles auth for each
&lt;/span&gt;&lt;span class="n"&gt;tool_catalog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gmail.ListEmails&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gmail.SendEmail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Slack.SendMessage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Get tool definitions formatted for the LLM
&lt;/span&gt;&lt;span class="n"&gt;tool_definitions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
   &lt;span class="n"&gt;arcade_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_catalog&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# JIT authorization + execution — credentials never touch the LLM
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;authorize_and_run_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
   &lt;span class="n"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arcade_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;authorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arcade_user_id&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorize &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="n"&gt;arcade_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arcade_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
       &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arcade_user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Agentic loop — LLM reasons and selects tools, Arcade executes them
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
   &lt;span class="n"&gt;turns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
   &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;turns&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="n"&gt;turns&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
       &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
           &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_definitions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
       &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
           &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude_none&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
           &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
               &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;authorize_and_run_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
               &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
           &lt;span class="k"&gt;continue&lt;/span&gt;
       &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
           &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
       &lt;span class="k"&gt;break&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;

&lt;span class="c1"&gt;# Run the agent
&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize my latest 5 emails, then send me a DM on Slack with the summary.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;invoke_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM reasons through the task, selects &lt;code&gt;Gmail.ListEmails&lt;/code&gt; to fetch emails, summarizes them, then selects &lt;code&gt;Slack.SendMessage&lt;/code&gt; to deliver the summary. The runtime handles JIT authorization for each tool on behalf of that specific user. The agent never sees OAuth tokens, never manages refresh flows, and never touches credentials. &lt;a href="https://docs.arcade.dev/en/get-started/agent-frameworks/setup-arcade-with-your-llm-python" rel="noopener noreferrer"&gt;Full walkthrough in the Arcade docs.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps to productionize agent integrations (checklist)
&lt;/h2&gt;

&lt;p&gt;To transition from sandbox prototypes to production-grade deployments, platform engineering teams follow a structured, iterative implementation plan.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Inventory required tools and least-privilege scopes
&lt;/h3&gt;

&lt;p&gt;Start by conducting a rigorous audit of your necessary tools. List the specific APIs your agents need, and document the exact user-scopes and OAuth granularities required for each. Don't request global access. Map out the principle of least privilege for every single workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Define autonomous vs human-approved actions (HITL)
&lt;/h3&gt;

&lt;p&gt;Next, define your operational boundaries. Build a matrix deciding which actions are safe for autonomous execution (like reading calendar events) and which high-risk actions require explicit user delegation or human-in-the-loop approval hooks (like deleting files or sending external emails).&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Standardize on a single control plane
&lt;/h3&gt;

&lt;p&gt;Centralize your integration strategy immediately. Prevent the creation of "shadow registries."&lt;/p&gt;

&lt;p&gt;When disparate engineering teams build redundant, unmanaged integrations using hardcoded tokens, they create severe security vulnerabilities and integration sprawl. Standardize on a single control plane for all agent tool use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Pilot one workflow and validate token isolation and telemetry
&lt;/h3&gt;

&lt;p&gt;Before rolling out broadly, test the architecture with a narrow, controlled use case. Pilot a single workflow, like developer issue automation linking GitHub and Jira, to validate token isolation and telemetry.&lt;/p&gt;

&lt;p&gt;Invest in infrastructure, not just isolated connectors. Evaluate platforms that treat authorization, agent-optimized tools, and lifecycle governance as a unified secure runtime, not separate problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Use an MCP runtime to connect AI agents to enterprise tools
&lt;/h2&gt;

&lt;p&gt;The true challenge of connecting AI to enterprise productivity tools has little to do with formatting JSON payloads or making API calls. The bottleneck is securing user-scoped access, enforcing least-privilege permissions at runtime, and maintaining rigorous operational governance over non-deterministic systems.&lt;/p&gt;

&lt;p&gt;The most successful platform engineering teams recognize that rebuilding identity propagation, token lifecycles, and reliable integration mechanics from scratch is an expensive distraction from their core business objectives. They need an MCP runtime, not more custom connectors.&lt;/p&gt;

&lt;p&gt;Arcade is the industry's first MCP runtime. It delivers secure agent authorization, the largest catalog of agent-optimized tools, and centralized lifecycle governance in a single control plane. Arcade eliminates the undifferentiated heavy lifting of enterprise integration so your team ships faster and scales with control.&lt;/p&gt;

&lt;p&gt;If you're building agents that need to execute across enterprise tools, start with the &lt;a href="https://docs.arcade.dev/en/get-started/about-arcade" rel="noopener noreferrer"&gt;getting started guide&lt;/a&gt; or explore the &lt;a href="https://www.arcade.dev/tools" rel="noopener noreferrer"&gt;full tool catalog&lt;/a&gt; to see what's available out of the box.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ: Enterprise AI agent integrations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the best way to connect AI agents to enterprise productivity tools?
&lt;/h3&gt;

&lt;p&gt;Use an MCP runtime, a secure action layer that performs user-scoped (OBO) execution, keeps tokens out of the LLM, and enforces runtime authorization per tool call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should AI agents use service accounts to access Slack, Google Workspace, or Microsoft 365?
&lt;/h3&gt;

&lt;p&gt;No. Service accounts bypass user permissions and expand the blast radius of prompt injection. Use on-behalf-of user execution with least-privilege scopes.&lt;/p&gt;

&lt;h3&gt;
  
  
  What does "On-Behalf-Of (OBO)" mean for agent integrations?
&lt;/h3&gt;

&lt;p&gt;OBO means the agent executes each action using credentials tied to the requesting user, so the action is limited to that user's native permissions and is attributable in audit logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is just-in-time authorization for AI agents?
&lt;/h3&gt;

&lt;p&gt;Just-in-time authorization is a runtime policy check that executes at the moment of each tool call, evaluating the user's identity, the agent's allowed scope, and the requested action. Credentials are requested and validated only when needed, not pre-authorized during setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is an MCP runtime, and how is it different from an MCP server?
&lt;/h3&gt;

&lt;p&gt;An MCP server exposes tools to an agent using the MCP, but it's typically single-user, stateless, and ships without built-in auth, token management, or observability. An MCP runtime is the enterprise infrastructure layer that complements MCP servers to add what they lack: multi-user OBO authentication, per-call policy enforcement, token vaulting, automatic retries, and audit/telemetry. The server defines what the agent can call; the runtime makes it safe to call at scale. Arcade is the industry's first MCP runtime, purpose-built for production agent deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the minimum security requirements for production agent tool access?
&lt;/h3&gt;

&lt;p&gt;Token isolation from the LLM, user-scoped/OBO execution, least-privilege scopes, per-action authorization, audit logs with user attribution, and HITL approvals for high-risk actions.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do you audit and attribute agent actions for compliance (SOC 2 / ISO 27001)?
&lt;/h3&gt;

&lt;p&gt;Log every tool call with user identity, tool, parameters/intent, outcome, and trace context, and export via OpenTelemetry to your SIEM for investigation and reporting.&lt;/p&gt;

&lt;h3&gt;
  
  
  When do legacy iPaaS tools (Zapier/Workato/MuleSoft) break down for agents?
&lt;/h3&gt;

&lt;p&gt;They struggle with non-deterministic agent loops and true user-scoped OBO execution, forcing teams to rely on shared credentials or brittle workarounds.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do agent-optimized tools reduce hallucinations compared to raw API wrappers?
&lt;/h3&gt;

&lt;p&gt;They use intent-level operations with validated schemas and internal lookups, so the model doesn't have to guess required IDs/parameters and can fail safely.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should we add human-in-the-loop (HITL) approvals?
&lt;/h3&gt;

&lt;p&gt;For destructive or irreversible actions (deletes, external emails, bulk updates, permission changes) or any action that materially impacts security, finance, or customer data.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>mcp</category>
      <category>automation</category>
    </item>
    <item>
      <title>How to build a secure WhatsApp AI assistant with Arcade and Claude Code (OpenClaw alternative)</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Thu, 02 Apr 2026 21:43:19 +0000</pubDate>
      <link>https://dev.to/arcade/how-to-build-a-secure-whatsapp-ai-assistant-with-arcade-and-claude-code-openclaw-alternative-3f4f</link>
      <guid>https://dev.to/arcade/how-to-build-a-secure-whatsapp-ai-assistant-with-arcade-and-claude-code-openclaw-alternative-3f4f</guid>
      <description>&lt;p&gt;I texted "prep me for my 2pm" on WhatsApp. Thirty seconds later, my phone buzzed back with a structured briefing: who I was meeting, what we last discussed over email, what my team said about them in Slack, and three talking points. No browser tab. No laptop. Just a message on my commute.&lt;/p&gt;

&lt;p&gt;That's the promise of an always-on AI assistant. And until recently, it was almost impossible to build one that actually worked.&lt;/p&gt;

&lt;p&gt;Open-source frameworks like OpenClaw made headless, two-way messaging agents popular. Anthropic's &lt;a href="https://code.claude.com/docs/en/channels" rel="noopener noreferrer"&gt;Claude Code Channels&lt;/a&gt; confirmed the approach had legs. Channels is currently in research preview, but the direction is clear. Anthropic already uses this pattern for hand-offs between their desktop app, mobile app, and Claude Code. Expect this to GA in some form.&lt;/p&gt;

&lt;p&gt;But getting from a weekend demo to a reliable assistant exposes gaps that no amount of prompt engineering fixes. Authorization. Tool reliability. Session management. The agent needs access to your calendar, email, and Slack, and you need to be sure it's not a security liability.&lt;/p&gt;

&lt;p&gt;I built a working version. This guide walks through the entire thing: a WhatsApp relay server, an MCP server, Claude Code as the brain, and Arcade.dev for secure tool access. Working code at every step.&lt;/p&gt;

&lt;p&gt;We'll start with the pitfalls you need to understand, then build it.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;OpenClaw-style headless frameworks give your agent god-mode access to every connected service, rely on brittle tool wrappers, bloat the context window with raw API responses, and produce zero audit trail. Buying a dedicated Mac Mini to run them doesn't help. The machine isn't the threat model, the credentials are.&lt;/li&gt;
&lt;li&gt;This guide builds a WhatsApp AI assistant using a relay server that handles Meta's webhooks, an MCP server that bridges to Claude Code, Arcade for secure tool access and audit logging, and a meeting-prep skill that pulls from Google Calendar, Gmail, and Slack to deliver structured briefings directly in WhatsApp.&lt;/li&gt;
&lt;li&gt;Every layer includes working code you can run locally: webhook ingestion with HMAC signature validation, a cursor-based message queue, MCP tool definitions, Claude Code configuration, and a complete skill file that encodes a three-phase meeting-prep workflow.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  From demo to production: The four pitfalls of always-on AI agents
&lt;/h2&gt;

&lt;p&gt;The headless setup that OpenClaw popularized is the starting line. The moment you try to move from a weekend proof of concept to something you'd actually trust with your calendar and email, four architectural problems surface.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 1: God-mode credentials and the agent security risk
&lt;/h3&gt;

&lt;p&gt;Headless agent frameworks inherit the host machine's full access profile. The agent gets the same permissions as the developer who launched it. Every OAuth token, every API key, every connected service, wide open.&lt;/p&gt;

&lt;p&gt;A single prompt injection or compromised dependency cascades through everything. Your Google Drive, your CRM, your source code repos. One bad input and the agent becomes an insider threat.&lt;/p&gt;

&lt;p&gt;This isn't theoretical. &lt;a href="https://nvd.nist.gov/vuln/detail/CVE-2026-25253" rel="noopener noreferrer"&gt;CVE-2026-25253&lt;/a&gt; exposed a one-click RCE in OpenClaw. The gateway lacked origin validation. An attacker could exfiltrate the auth token via a malicious link and achieve total system compromise.&lt;/p&gt;

&lt;p&gt;We wrote about this pattern in detail in &lt;a href="https://blog.arcade.dev/openclaw-can-do-a-lot-but-it-shouldnt-have-access-to-your-tokens" rel="noopener noreferrer"&gt;OpenClaw doesn't need your tokens&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 2: Fragile API wrappers and the tool reliability problem
&lt;/h3&gt;

&lt;p&gt;Most agent tools are thin wrappers around REST APIs. They force the model to guess complex payload parameters and retry when natural language doesn't map to rigid schemas.&lt;/p&gt;

&lt;p&gt;Then shadow registries appear. Different teams build duplicate, unversioned wrappers for the same APIs. One unannounced API change breaks multiple agents in ways nobody predicted. Public tool registries have already become a supply-chain attack vector, with malicious tools that exfiltrate local state or establish backdoors.&lt;/p&gt;

&lt;p&gt;For patterns that make MCP tools more resilient, see &lt;a href="https://blog.arcade.dev/mcp-tool-patterns" rel="noopener noreferrer"&gt;54 Patterns for Building Better MCP Tools&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 3: Context window bloat from raw API responses
&lt;/h3&gt;

&lt;p&gt;Unoptimized tools dump the full API response into the context window. A Jira ticket history? Tens of thousands of tokens of irrelevant metadata. The agent's reasoning goes erratic. Costs spike with every conversation turn.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 4: No audit trail, no reliability, no compliance
&lt;/h3&gt;

&lt;p&gt;Keeping a self-hosted agent alive with &lt;code&gt;tmux&lt;/code&gt; or &lt;code&gt;systemd&lt;/code&gt; creates an audit black hole. When the process crashes or misbehaves, there's no structured log to trace what happened. Which action was taken? What parameters? Which user started the request?&lt;/p&gt;

&lt;p&gt;You can't answer "what did the agent do?" if you never logged it.&lt;/p&gt;

&lt;p&gt;That's an immediate fail for SOC2, ISO27001, and any serious compliance review.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why buying a Mac Mini doesn't fix any of this
&lt;/h2&gt;

&lt;p&gt;There's a growing trend: developers buying dedicated Mac Minis or spinning up VMs to run OpenClaw-style agents 24/7. The reasoning is, if the agent has its own machine, you've isolated it.&lt;/p&gt;

&lt;p&gt;You haven't. The machine isn't the threat model. The credentials are.&lt;/p&gt;

&lt;p&gt;That Mac Mini still needs OAuth tokens for Google Calendar, API keys for your CRM, access to your Slack workspace. A compromised dependency doesn't care whether it's running on your laptop or a dedicated server in a closet. The blast radius is identical. For a deeper comparison of isolation strategies that actually reduce blast radius, see &lt;a href="https://manveerc.substack.com/p/ai-agent-sandboxing-guide" rel="noopener noreferrer"&gt;AI Agent Sandboxing Guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Hardware isolation solves availability. It doesn't touch authorization, tool reliability, context management, or audit logging.&lt;/p&gt;

&lt;p&gt;You've built an expensive, always-on machine with unfettered access to your business systems. Every pitfall above still applies.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Arcade, Claude Code, and Skills solve these problems
&lt;/h3&gt;

&lt;p&gt;I needed three things: a secure way to connect to business tools, a battle-tested agent runtime, and a way to encode workflows without writing integration code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade&lt;/a&gt; solves the tool and auth layer. It sits between the agent and your business tools. When the agent wants to read your calendar, Arcade evaluates permissions, mints a just-in-time token scoped to that specific action, and executes the call. The LLM never sees long-lived credentials. Your Google Calendar token isn't sitting in an &lt;code&gt;.env&lt;/code&gt; file on a Mac Mini. It's managed by Arcade's runtime with per-action authorization.&lt;/p&gt;

&lt;p&gt;Arcade also solves the brittle tools problem. Instead of writing fragile REST wrappers, you use &lt;a href="https://www.arcade.dev/tools" rel="noopener noreferrer"&gt;pre-built, agent-optimized integrations&lt;/a&gt; that return summarized data, not raw JSON dumps. When Google changes their Calendar API, Arcade handles it. Your agent code stays untouched. And every tool call generates structured audit logs tied to the specific user and action.&lt;/p&gt;

&lt;p&gt;Claude Code is the agent runtime. It's more battle-tested than OpenClaw, has native MCP support, and handles tool orchestration without the brittle process management of &lt;code&gt;tmux&lt;/code&gt; and &lt;code&gt;systemd&lt;/code&gt; scripts.&lt;/p&gt;

&lt;p&gt;Skills encode the actual workflows. This is the piece most people miss. Arcade gives the agent &lt;em&gt;access&lt;/em&gt; to your tools with proper auth. Skills tell the agent &lt;em&gt;how to use them well&lt;/em&gt;. For a deeper look at the distinction, see &lt;a href="https://blog.arcade.dev/what-are-agent-skills-and-tools" rel="noopener noreferrer"&gt;Skills vs Tools for AI Agents&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A skill is a markdown file that encodes domain expertise: which tools to call, in what order, what to look for in the results, how to format the output. Without a skill, you have an agent with calendar access but no idea how to prepare a meeting brief. With a skill, you have an assistant that pulls calendar events, cross-references email threads, checks Slack for internal context, and delivers a structured briefing, all from a single WhatsApp message.&lt;/p&gt;

&lt;p&gt;Arcade gives access. Skills give expertise. Together, they turn an LLM into a useful assistant.&lt;/p&gt;

&lt;p&gt;And because skills are just markdown files, anyone on the team can write and iterate on them. No code deployment. No engineering tickets.&lt;/p&gt;

&lt;p&gt;Here's what we're building: a WhatsApp relay for messaging, Claude Code as the brain, Arcade for auth-managed tool access, and skills that encode your team's workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step-by-step: building the WhatsApp AI assistant with MCP and Arcade
&lt;/h2&gt;

&lt;p&gt;Enough architecture. Here's what we're making: WhatsApp messages flow through a relay server into an MCP server, which feeds them to Claude Code. Claude Code processes messages using skills, calls business tools through Arcade, and replies back through the same chain.&lt;/p&gt;

&lt;p&gt;One wrinkle: WhatsApp's Cloud API only supports webhooks. There's no WebSocket or long-polling option. That means something has to sit on a public URL to receive Meta's callbacks. Since we're running everything locally, the relay server handles that role, and ngrok tunnels traffic from Meta's servers to it on your machine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.arcade.dev%2F_astro%2Fwhatsapp-to-claude-code-technical-architecture-diagram.n4Enlg4V_Z1ve4Qd.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.arcade.dev%2F_astro%2Fwhatsapp-to-claude-code-technical-architecture-diagram.n4Enlg4V_Z1ve4Qd.webp" alt="A detailed technical architecture diagram illustrating the integration flow from a WhatsApp user on a smartphone to Claude Code. The horizontal sequential flow proceeds through Meta Cloud API, ngrok, Relay Server, and MCP Server before reaching Claude Code. An auxiliary 'Arcade' service box (with integrated services like Calendar, Email, Slack, and CRM) is connected to Claude Code. A dashed return line labeled 'replies' indicates a feedback path from Claude Code back to the Relay Server." width="800" height="198"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Prerequisites: WhatsApp Business API, Claude Code, and Arcade&lt;/p&gt;

&lt;p&gt;Before starting, make sure you have a Meta developer account with a WhatsApp Business App configured (&lt;a href="https://developers.facebook.com/docs/whatsapp/cloud-api/get-started" rel="noopener noreferrer"&gt;Meta's getting started guide&lt;/a&gt;), Node.js 20+ and npm, ngrok for tunneling webhooks to your local machine, Claude Code installed and configured, an &lt;a href="https://app.arcade.dev/register" rel="noopener noreferrer"&gt;Arcade account&lt;/a&gt; with API access, and a phone number registered with WhatsApp Business API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Project structure and environment setup
&lt;/h3&gt;

&lt;p&gt;Here's the folder layout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;whatsapp-assistant/
├── whatsapp.ts          # MCP server (bridge between relay and Claude Code)
├── package.json         # MCP server dependencies
├── .mcp.json            # Claude Code MCP server registration
├── whatsapp-relay/
│   ├── relay.ts         # Relay server (faces the internet via ngrok)
│   ├── package.json     # Relay server dependencies
│   └── .env             # WhatsApp API credentials (from .env.example)
└── skills/
    └── meeting-prep/
        └── SKILL.md     # Meeting preparation skill for Claude Code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start by setting up both projects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create the project&lt;/span&gt;
&lt;span class="nb"&gt;mkdir &lt;/span&gt;whatsapp-assistant &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;whatsapp-assistant

&lt;span class="c"&gt;# Initialize the MCP server&lt;/span&gt;
npm init &lt;span class="nt"&gt;-y&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; @modelcontextprotocol/sdk
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-D&lt;/span&gt; typescript @types/node tsx

&lt;span class="c"&gt;# Initialize the relay server&lt;/span&gt;
&lt;span class="nb"&gt;mkdir &lt;/span&gt;whatsapp-relay &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;whatsapp-relay
npm init &lt;span class="nt"&gt;-y&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;hono @hono/node-server
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-D&lt;/span&gt; typescript @types/node tsx
&lt;span class="nb"&gt;cd&lt;/span&gt; ..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create your &lt;code&gt;.env&lt;/code&gt; file inside &lt;code&gt;whatsapp-relay/&lt;/code&gt; with the following variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Meta WhatsApp Cloud API
&lt;/span&gt;&lt;span class="py"&gt;WHATSAPP_ACCESS_TOKEN&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;        &lt;span class="c"&gt;# Bearer token from Meta App Dashboard
&lt;/span&gt;&lt;span class="s"&gt;WHATSAPP_PHONE_NUMBER_ID=     # Bot's phone number ID&lt;/span&gt;
&lt;span class="py"&gt;WHATSAPP_VERIFY_TOKEN&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;        &lt;span class="c"&gt;# Any string, used for webhook verification handshake
&lt;/span&gt;&lt;span class="s"&gt;WHATSAPP_APP_SECRET=          # App secret for validating webhook signatures&lt;/span&gt;

&lt;span class="c"&gt;# Relay auth
&lt;/span&gt;&lt;span class="py"&gt;RELAY_SECRET&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;                 &lt;span class="c"&gt;# Shared secret, local MCP server sends this in X-Relay-Secret header
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;RELAY_SECRET&lt;/code&gt; is a shared key between the relay and MCP server. Generate something random (&lt;code&gt;openssl rand -hex 32&lt;/code&gt;). It prevents anything on your network from impersonating the MCP server.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Build the WhatsApp webhook relay server
&lt;/h3&gt;

&lt;p&gt;The relay is the only component that faces the internet. It has three jobs: validate incoming WhatsApp webhooks, queue messages for the MCP server, and proxy outbound messages to Meta's API.&lt;/p&gt;

&lt;h4&gt;
  
  
  Webhook signature validation
&lt;/h4&gt;

&lt;p&gt;Every webhook payload from Meta includes an HMAC-SHA256 signature. The relay verifies this before processing anything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createHmac&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;timingSafeEqual&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;node:crypto&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;APP_SECRET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;WHATSAPP_APP_SECRET&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;verifySignature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawBody&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;header&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;header&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;header&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sha256=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createHmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sha256&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;APP_SECRET&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawBody&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;timingSafeEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This uses &lt;code&gt;timingSafeEqual&lt;/code&gt; to prevent timing attacks, a detail that matters when you're validating signatures from a third party.&lt;/p&gt;

&lt;h4&gt;
  
  
  Webhook handler: always return 200
&lt;/h4&gt;

&lt;p&gt;Meta uses at-least-once delivery. If your endpoint returns anything other than &lt;code&gt;200&lt;/code&gt;, Meta retries, potentially creating a storm of duplicate events. The relay acknowledges immediately and processes asynchronously:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/webhook&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rawBody&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;verifySignature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawBody&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;header&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x-hub-signature-256&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Still return 200. Returning 4xx causes Meta to retry with the same bad signature.&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;webhook: invalid signature&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ok&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawBody&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;WaWebhookPayload&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nf"&gt;parseMessages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;webhook: parse error:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ok&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the pattern: even on a bad signature, we return &lt;code&gt;200&lt;/code&gt;. Logging the rejection is enough. Returning &lt;code&gt;4xx&lt;/code&gt; just makes Meta retry with the same bad payload.&lt;/p&gt;

&lt;h4&gt;
  
  
  In-memory message queue with polling
&lt;/h4&gt;

&lt;p&gt;The relay queues validated messages and exposes a polling endpoint for the MCP server. The MCP server passes a cursor (the last message ID it saw) to get only new messages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;InboundMessage&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;nextId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MAX_QUEUE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Omit&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;InboundMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;nextId&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;MAX_QUEUE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;MAX_QUEUE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Polling endpoint, protected by relay secret&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/poll&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;since&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Number&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;since&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;since&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;since&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;cursor&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The relay authenticates all local-facing routes with the shared secret via &lt;code&gt;x-relay-secret&lt;/code&gt; header. The WhatsApp-facing webhook routes don't use this. They're validated by Meta's HMAC signature instead.&lt;/p&gt;

&lt;h4&gt;
  
  
  Outbound message proxy
&lt;/h4&gt;

&lt;p&gt;When Claude Code wants to reply, it goes through the MCP server, which calls the relay, which calls Meta's API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;WA_API&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`https://graph.facebook.com/v21.0/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;PHONE_NUMBER_ID&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;waApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;WA_API&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;ACCESS_TOKEN&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The relay is built with &lt;a href="https://hono.dev/" rel="noopener noreferrer"&gt;Hono&lt;/a&gt;, a lightweight framework that keeps the code minimal. The full relay is roughly 200 lines and handles text messages, images, documents, audio, video, stickers, reactions, and location shares.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Build the MCP server for Claude Code
&lt;/h3&gt;

&lt;p&gt;The MCP server is the bridge between the relay and Claude Code. It polls the relay for incoming WhatsApp messages and exposes tools that Claude Code can call to respond.&lt;/p&gt;

&lt;h4&gt;
  
  
  Tool definitions
&lt;/h4&gt;

&lt;p&gt;The server registers four tools with Claude Code via the Model Context Protocol:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;whatsapp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="na"&gt;experimental&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude/channel&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;The sender reads WhatsApp, not this session.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Anything you want them to see must go through the reply tool.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Messages arrive as &amp;lt;channel source="whatsapp" chat_id="..." wamid="..." user="..." ts="..."&amp;gt;.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Reply with the reply tool. Pass chat_id (phone number) back.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;WhatsApp has a 24-hour session window: you can only send free-form messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;within 24 hours of the user's last message.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;instructions&lt;/code&gt; field tells Claude Code how to interpret incoming messages and that it must use the &lt;code&gt;reply&lt;/code&gt; tool to send anything back. Without this, the model might try to respond in its own transcript, which the WhatsApp user would never see.&lt;/p&gt;

&lt;p&gt;The four tools are &lt;code&gt;reply&lt;/code&gt; (send text), &lt;code&gt;react&lt;/code&gt; (emoji reactions), &lt;code&gt;mark_read&lt;/code&gt; (read receipts), and &lt;code&gt;send_media&lt;/code&gt; (images, documents, audio, video). Here's the reply tool definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;reply&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Reply on WhatsApp. Pass chat_id (phone number) from the inbound message.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nl"&gt;chat_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Phone number to send to&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Message text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="nx"&gt;reply_to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wamid to quote-reply to (optional)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="nx"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chat_id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Polling loop with cursor persistence
&lt;/h4&gt;

&lt;p&gt;The MCP server polls the relay every 2 seconds and forwards new messages to Claude Code as channel notifications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CURSOR_FILE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HOME&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/tmp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.whatsapp-relay-cursor&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;loadCursor&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;relay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/poll?since=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;newCursor&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;InboundMessage&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
      &lt;span class="nl"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;meta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;chat_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;wamid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wamid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pushName&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;

      &lt;span class="nx"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;notification&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;notifications/claude/channel&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s2"&gt;`(&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="nx"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newCursor&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;newCursor&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nf"&gt;saveCursor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`whatsapp channel: poll error: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\n`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cursor persists to disk (&lt;code&gt;~/.whatsapp-relay-cursor&lt;/code&gt;), so restarting the MCP server doesn't re-process old messages. Each message becomes a channel notification that Claude Code sees as a new input, including the sender's phone number, display name, timestamp, and message type as metadata.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Register the MCP server with Claude Code
&lt;/h3&gt;

&lt;p&gt;Create a &lt;code&gt;.mcp.json&lt;/code&gt; file in your project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"whatsapp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"node"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"--import"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tsx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"whatsapp.ts"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. When Claude Code starts in this directory, it discovers the MCP server, launches it as a child process via stdio, and the WhatsApp channel becomes available. Claude Code now receives WhatsApp messages as channel notifications and can call the reply, react, mark_read, and send_media tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Configure the Arcade gateway and connect it to Claude Code
&lt;/h3&gt;

&lt;p&gt;Before the assistant can access business tools, you need to create an Arcade gateway that defines which tools the agent can use and with what permissions.&lt;/p&gt;

&lt;p&gt;Log into the &lt;a href="https://app.arcade.dev/" rel="noopener noreferrer"&gt;Arcade dashboard&lt;/a&gt;, create a new gateway, and add the MCP servers for the services your assistant needs: Google Calendar, Gmail, Slack, and any others relevant to your workflows. For each server, select only the specific tools you want the agent to access. This is where you scope permissions. If the meeting-prep skill only needs to list calendar events and search email, there's no reason to expose tools that delete events or send email on your behalf.&lt;/p&gt;

&lt;p&gt;Once the gateway is created, register it with Claude Code from the command line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add &lt;span class="s1"&gt;'arcade-gateway'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--transport&lt;/span&gt; http &lt;span class="s1"&gt;'https://api.arcade.dev/mcp/&amp;lt;your-gateway-slug&amp;gt;'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &amp;lt;your-arcade-api-key&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Arcade-User-ID: &amp;lt;your-email&amp;gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This writes the gateway configuration to &lt;code&gt;~/.claude.json&lt;/code&gt;. Claude Code now has two MCP servers: the local WhatsApp channel server (from &lt;code&gt;.mcp.json&lt;/code&gt; in the project) and the remote Arcade gateway (from &lt;code&gt;~/.claude.json&lt;/code&gt;). The WhatsApp server handles messaging. The Arcade gateway handles business tool access with per-action authorization.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Arcade-User-ID&lt;/code&gt; header tells Arcade which user's credentials to use when executing tool calls. In the single-user setup, this is your email. In the multi-user architecture described later, the orchestrator passes a different user ID per session.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Create a meeting-prep skill with Arcade tools
&lt;/h3&gt;

&lt;p&gt;With the channel wired up, the assistant needs capabilities. This is where tools and skills work together. Arcade provides secure access to business tools (Google Calendar, Gmail, Slack), and skills tell the agent how to use those tools to accomplish a specific workflow.&lt;/p&gt;

&lt;p&gt;Skills in Claude Code are markdown files. No code, no deployment, just a structured prompt that encodes domain expertise. Here's the structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;skills/
└── meeting-prep/
    └── SKILL.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The skill file has two parts: frontmatter that tells Claude Code when to activate it, and a body that defines the workflow. Here's the meeting-prep skill:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;meeting-prep&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;Prepare&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;briefings&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;upcoming&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;customer&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;meetings&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;by&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;reading"&lt;/span&gt;
  &lt;span class="s"&gt;your Google Calendar, identifying external/customer meetings (based on&lt;/span&gt;
  &lt;span class="s"&gt;attendee email domains), then pulling relevant context from Gmail threads&lt;/span&gt;
  &lt;span class="s"&gt;and Slack conversations."&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Meeting Prep&lt;/span&gt;

You are a meeting preparation assistant. Your job is to create concise,
actionable briefings for upcoming external meetings.

&lt;span class="gu"&gt;## Customer Directory&lt;/span&gt;
Read the centralized client registry at &lt;span class="sb"&gt;`$AGENT_DATA_DIR/clients.md`&lt;/span&gt;.
Use it to match calendar attendee domains to known customers, find the
correct Slack channel, and locate customer-specific data files.

&lt;span class="gu"&gt;## Phase 1: Discover (Find the Meeting)&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Search Google Calendar using &lt;span class="sb"&gt;`list_events`&lt;/span&gt; for the relevant time window
&lt;span class="p"&gt;-&lt;/span&gt; Identify external meetings by checking attendee email domains
&lt;span class="p"&gt;-&lt;/span&gt; Any attendee whose domain is NOT your organization signals an external meeting

&lt;span class="gu"&gt;## Phase 2: Gather (Pull Context from Email and Slack)&lt;/span&gt;

&lt;span class="gu"&gt;### Email Context (Gmail)&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Search for recent threads involving external attendees (last 30 days)
&lt;span class="p"&gt;2.&lt;/span&gt; Read the 3-5 most relevant threads, looking for decisions, action items, tone
&lt;span class="p"&gt;3.&lt;/span&gt; Check the calendar event itself for agenda or documents

&lt;span class="gu"&gt;### Slack Context&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; If there's a dedicated customer channel, read recent messages there
&lt;span class="p"&gt;2.&lt;/span&gt; Otherwise search by company name or contact names (last 2 weeks)
&lt;span class="p"&gt;3.&lt;/span&gt; Look for internal context not in email: concerns, feature requests, deal status

&lt;span class="gu"&gt;## Phase 3: Brief (Deliver the Prep)&lt;/span&gt;

&lt;span class="gu"&gt;### Meeting Briefing: [Title]&lt;/span&gt;
&lt;span class="gs"&gt;**When:**&lt;/span&gt; [Date &amp;amp; Time]
&lt;span class="gs"&gt;**With:**&lt;/span&gt; [Attendees + roles/company]
&lt;span class="gs"&gt;**Meeting type:**&lt;/span&gt; [Quarterly review, Demo, Follow-up, Intro call]

&lt;span class="gs"&gt;**Quick Context:**&lt;/span&gt; 2-3 sentences on where things stand
&lt;span class="gs"&gt;**Recent History:**&lt;/span&gt; Chronological recap of last interactions
&lt;span class="gs"&gt;**Key Things to Know:**&lt;/span&gt; Open items, concerns, opportunities
&lt;span class="gs"&gt;**Suggested Talking Points:**&lt;/span&gt; 3-5 practical conversation starters
&lt;span class="gs"&gt;**People Notes:**&lt;/span&gt; Brief note on new stakeholders or unfamiliar attendees
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The skill tells the agent exactly which Arcade-powered tools to use (&lt;code&gt;list_events&lt;/code&gt;, &lt;code&gt;search_messages&lt;/code&gt;, &lt;code&gt;read_thread&lt;/code&gt;), in what order, what signals to look for in the results, and how to format the output. The customer directory lookup means the agent doesn't waste tokens fuzzy-matching company names. It goes straight to the right email domain and Slack channel.&lt;/p&gt;

&lt;p&gt;When a user texts "prep me for my 2pm" on WhatsApp, Claude Code receives the message via the channel, activates this skill, runs the three-phase workflow through Arcade's tools, and sends the briefing back via the WhatsApp reply tool. The whole flow, from WhatsApp message to structured briefing, happens without the user leaving the chat.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 7: Run and test the WhatsApp assistant locally
&lt;/h3&gt;

&lt;p&gt;Start everything in order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 1: Start the relay server&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;whatsapp-relay
node &lt;span class="nt"&gt;--import&lt;/span&gt; tsx relay.ts
&lt;span class="c"&gt;# → "whatsapp relay listening on :3000"&lt;/span&gt;

&lt;span class="c"&gt;# Terminal 2: Expose the relay via ngrok&lt;/span&gt;
ngrok http 3000
&lt;span class="c"&gt;# → Copy the https:// forwarding URL&lt;/span&gt;

&lt;span class="c"&gt;# Terminal 3: Start Claude Code from the project root&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;whatsapp-assistant
claude &lt;span class="nt"&gt;--dangerously-load-development-channels&lt;/span&gt; server:whatsapp
&lt;span class="c"&gt;# Claude Code discovers .mcp.json and launches the MCP server&lt;/span&gt;
&lt;span class="c"&gt;# → "whatsapp channel: connected, polling http://localhost:3000 every 2000ms"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register your webhook with Meta by going to your app in the &lt;a href="https://developers.facebook.com/" rel="noopener noreferrer"&gt;Meta Developer Dashboard&lt;/a&gt;, then navigating to WhatsApp, Configuration, Webhook. Set the Callback URL to your ngrok URL plus &lt;code&gt;/webhook&lt;/code&gt; (e.g., &lt;code&gt;https://abc123.ngrok.io/webhook&lt;/code&gt;), set the Verify Token to the value in your &lt;code&gt;.env&lt;/code&gt; file, and subscribe to the &lt;code&gt;messages&lt;/code&gt; webhook field.&lt;/p&gt;

&lt;p&gt;Now send a message from your phone to the WhatsApp Business number. You should see it flow through the relay, into the MCP server, and appear in Claude Code. Claude Code processes it and sends a reply back through the same chain.&lt;/p&gt;

&lt;p&gt;Try texting "prep me for my next meeting." The first time Claude Code calls an Arcade-powered tool (like reading your calendar), Arcade prints an authorization URL in the terminal. Open it in your browser and authenticate with the relevant account (Google, Slack, etc.). This is a one-time step per service. After that, Arcade manages token refresh automatically.&lt;/p&gt;

&lt;p&gt;If you have the meeting-prep skill configured and Google Calendar / Gmail connected through Arcade, you'll get back a structured briefing right in WhatsApp.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling from single-user to multi-user: What changes in the architecture
&lt;/h2&gt;

&lt;p&gt;Everything above runs as a single user. One Claude Code instance, one set of Arcade credentials, one identity context. Here's what breaks when a second user messages the bot, and what you need to change.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why a single Claude Code instance doesn't work for multiple users
&lt;/h3&gt;

&lt;p&gt;The single-user setup has an implicit assumption: every WhatsApp message belongs to you. When Claude Code calls an Arcade tool like &lt;code&gt;list_events&lt;/code&gt;, Arcade uses the credentials you authenticated during setup. There's no user identifier in the call.&lt;/p&gt;

&lt;p&gt;If User 2 messages the same bot, Claude Code still calls Arcade with your credentials. User 2 gets your calendar. Worse, Claude Code runs in a single conversation context. User 1's meeting briefing (deal terms, internal Slack messages, revenue numbers) is sitting in the context window when User 2's message arrives. A prompt injection from User 2 could surface User 1's data. Arcade secured the credentials correctly, but the shared context window breaks tenant isolation.&lt;/p&gt;

&lt;p&gt;You need two things: separate agent instances so context never crosses between users, and per-user credential routing so Arcade knows whose calendar to read.&lt;/p&gt;

&lt;h3&gt;
  
  
  The multi-user architecture
&lt;/h3&gt;

&lt;p&gt;The relay server, MCP tool schemas (reply, react, send_media), and skills stay identical. What changes is the orchestration layer.&lt;/p&gt;

&lt;p&gt;The single-user version uses Claude Code CLI with its built-in channels feature. For multi-user, you build a custom orchestrator using the &lt;a href="https://platform.claude.com/docs/en/agent-sdk/overview" rel="noopener noreferrer"&gt;Claude Agent SDK&lt;/a&gt;. The SDK doesn't have native channel support, but it gives you sessions, hooks, tool permissions, and MCP connections, the building blocks to replicate what channels do for a single user across many users.&lt;/p&gt;

&lt;p&gt;The relay server becomes a router. When a message arrives from +1111, the orchestrator looks up which agent session owns that phone number and routes the message there. When +2222 messages, it routes to a different session. Each session has its own context window, its own MCP server instance, and its own Arcade user context. No data crosses between them.&lt;/p&gt;

&lt;p&gt;Credential routing works through Arcade's &lt;code&gt;user_id&lt;/code&gt; parameter on tool calls. Each user goes through the Arcade browser auth flow once (the same authorization URL step from the single-user setup). After that, when the orchestrator calls an Arcade tool on behalf of User 2, it passes User 2's identity. Arcade resolves the correct OAuth grants, mints a scoped token for that specific action, and executes the call. User 2's calendar request returns User 2's calendar. For a full walkthrough of how this authorization model works across frameworks, see &lt;a href="https://blog.arcade.dev/sso-for-ai-agents-authentication-and-authorization-guide" rel="noopener noreferrer"&gt;SSO for AI Agents: Authentication and Authorization Guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The identity pairing itself is straightforward. Map each WhatsApp sender ID to a corporate identity using a one-time verification flow: send a code via a WhatsApp Authentication Template, have the user confirm it in a web portal, and store the mapping.&lt;/p&gt;

&lt;p&gt;Arcade handles the rest of the multi-user complexity: per-user OAuth token exchange and just-in-time grants for credential delegation, scoped tool execution that prevents cross-tenant data access, a versioned tool registry that doesn't break when upstream APIs change, and structured audit logs tied to the specific user and action. These are the same four pitfalls from earlier. They all get harder at multi-user scale, and Arcade handles them natively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production readiness checklist for AI agents
&lt;/h2&gt;

&lt;p&gt;Before you move beyond local use, gut-check these five things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Credential isolation&lt;/strong&gt;. Can the LLM see your auth tokens? If yes, stop. The architecture needs just-in-time, per-action authorization where the model never touches long-lived credentials. Standing service account privileges are a non-starter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool reliability&lt;/strong&gt;. Are your tools agent-optimized or naive REST wrappers? If the model has to guess complex payload parameters and brute-force retries, you'll hit failures that are invisible until production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Versioning and rollbacks&lt;/strong&gt;. Can you update a tool without breaking the running assistant? If one upstream API change takes down your agent, you need a versioned registry with safe deprecation periods.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auditability&lt;/strong&gt;. Can you trace every action back to the specific human who requested it? If not, you fail SOC2 and ISO27001. You need immutable logs with user IDs, tool names, and sanitized parameters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer time allocation.&lt;/strong&gt; Are your engineers building OAuth plumbing and webhook retry logic, or building skills and workflows? If it's the former, the architecture is too low-level.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;You now have a working WhatsApp assistant. A relay handling Meta's webhooks. An MCP server bridging to Claude Code. A meeting-prep skill that turns "prep me for my 2pm" into a structured briefing pulled from your calendar, email, and Slack.&lt;/p&gt;

&lt;p&gt;The interesting part is what comes next. The relay and MCP server are infrastructure you write once. The skills are where the ongoing value lives, and anyone on the team can write them. Meeting prep was the first one I built. Expense report summaries, daily standups, customer check-in reminders: same pattern, different markdown file.&lt;/p&gt;

&lt;p&gt;For multi-user deployments, the &lt;a href="https://platform.claude.com/docs/en/agent-sdk/overview" rel="noopener noreferrer"&gt;Claude Agent SDK&lt;/a&gt; gives you the building blocks to orchestrate per-user agent sessions, with the relay routing messages and &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade&lt;/a&gt; handling per-user credential delegation, tenant isolation, and audit logging. You focus on skills, not infrastructure.&lt;/p&gt;

&lt;p&gt;The code from this guide is on &lt;a href="https://github.com/manveer/whatsapp-assistant" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Fork it and build something useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is an always-on AI executive assistant?
&lt;/h3&gt;

&lt;p&gt;An always-on assistant runs continuously and interacts through messaging channels like WhatsApp or Slack. It maintains state across conversations and takes actions in connected business tools asynchronously, without needing a browser tab open.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the risks of using OpenClaw for an AI agent?
&lt;/h3&gt;

&lt;p&gt;They commonly rely on shared machine credentials, fragile scripts, and ungoverned tool wrappers. This creates high risk of token leakage, unreliable tool calls, context bloat, and missing audit trails required for compliance.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do you prevent an agent from having god-mode access to company systems?
&lt;/h3&gt;

&lt;p&gt;Use runtime, per-action authorization with just-in-time, short-lived grants (e.g., OAuth token exchange). The agent never holds broad or long-lived credentials, and every action is evaluated against the requesting user's permissions.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Arcade and how does it secure AI agent tool access?
&lt;/h3&gt;

&lt;p&gt;Arcade is a runtime that sits between an AI agent and your business tools. Instead of giving the agent stored credentials, Arcade evaluates each tool call against the requesting user's permissions, mints a just-in-time token scoped to that action, executes the call, and logs the result. It also provides agent-optimized integrations that return summarized data instead of raw API responses. For a full overview, see &lt;a href="https://docs.arcade.dev/en/get-started/about-arcade" rel="noopener noreferrer"&gt;How Arcade works&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is it safe to give an AI agent access to my Google Calendar and email?
&lt;/h3&gt;

&lt;p&gt;Not if the agent holds long-lived OAuth tokens or API keys directly. A prompt injection or compromised dependency can exfiltrate those credentials and access everything the agent can reach. The safe approach is per-action authorization: a runtime like Arcade mints a short-lived, scoped token for each specific action and revokes it immediately after, limiting the blast radius to a single call.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does the relay server handle duplicate WhatsApp webhooks?
&lt;/h3&gt;

&lt;p&gt;WhatsApp delivers events with at-least-once semantics. The relay returns &lt;code&gt;200 OK&lt;/code&gt; immediately (even on bad signatures) to prevent retry storms, and processes messages asynchronously. For production use, add a deduplication store like Redis keyed by message ID.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is WhatsApp's 24-hour messaging window?
&lt;/h3&gt;

&lt;p&gt;Free-form replies are allowed within 24 hours of the user's last message. Proactive messages outside that window must use pre-approved WhatsApp message templates (HSM templates). For an 8 AM morning brief, you'd need an approved template.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use this architecture with models other than Claude?
&lt;/h3&gt;

&lt;p&gt;Yes. The relay server and MCP protocol are model-agnostic. The relay handles WhatsApp I/O, and the MCP server defines tools via a standard protocol. You could swap Claude Code for any MCP-compatible runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I add new skills or workflows to a Claude Code agent?
&lt;/h3&gt;

&lt;p&gt;Create a new directory under &lt;code&gt;skills/&lt;/code&gt; with a &lt;code&gt;SKILL.md&lt;/code&gt; file. The skill's frontmatter description tells Claude Code when to activate it. Skills are just structured prompts, no code deployment required.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>openclaw</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Don't trust LLM research</title>
      <dc:creator>Mateo Torres</dc:creator>
      <pubDate>Tue, 31 Mar 2026 14:16:48 +0000</pubDate>
      <link>https://dev.to/arcade/dont-trust-llm-research-51an</link>
      <guid>https://dev.to/arcade/dont-trust-llm-research-51an</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hkcpxmsyjnrjdoskern.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hkcpxmsyjnrjdoskern.png" alt=" " width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I work at &lt;a href="https://www.arcade.dev/?utm_source=mateo_blog&amp;amp;utm_medium=mateo_blog&amp;amp;utm_campaign=dont-trust-llm-research" rel="noopener noreferrer"&gt;Arcade&lt;/a&gt;. Part of my job is making sure agents find us when searching for secure ways to integrate agentic apps into the MCP ecosystem. That makes me exactly the kind of person this post is about.&lt;/p&gt;

&lt;p&gt;I &lt;em&gt;could&lt;/em&gt; play the game, write the "Arcade vs X" comparison posts, publish the rigged listicles, churn out the SEO slop that I'll describe below. I don't know much about SEO, but I &lt;em&gt;do&lt;/em&gt; know what dishonest content looks like when I see it, and I'd rather talk about the problem honestly than contribute to it.&lt;/p&gt;

&lt;p&gt;Even if you're the most diligent researcher and you &lt;em&gt;really&lt;/em&gt; go out of your way to assess every dependency and every tool in your code base, you &lt;em&gt;can't&lt;/em&gt; realistically test all possible options if you want to do that within a reasonable time frame. This is &lt;em&gt;especially&lt;/em&gt; true when AI agents are involved. And even &lt;em&gt;more so&lt;/em&gt; for new tech like MCP and other agentic protocols. Your research capability is saturated by the ever-increasing number of tools and options, each of which have an army of agents producing encyclopedic arguments about why you should pick &lt;em&gt;them&lt;/em&gt; and&lt;br&gt;
not all of the &lt;em&gt;other&lt;/em&gt; (awful) alternatives. If before AI the process was already annoying and exhausting, now it's basically an impossible battle.&lt;/p&gt;

&lt;p&gt;So the &lt;em&gt;obvious&lt;/em&gt; approach is to let an LLM do the reading for you. Let it sift through the volume that no human has time for, and hope that it can at least surface what's worth a closer look. And at least for now this is &lt;em&gt;sort of&lt;/em&gt; working if you're searching in domains with an obvious answer. Numpy, SciPy &amp;amp; friends are the right answer for Python-based data science. NetworkX or igraph will be the go-to choices for network science. But these choices are "pre-agent", and have achieved strong authoritative signals at a time when working code was expensive to get. Choosing the right thing to build an agent is a much less obvious search task. For that, unfortunately, agents don't do a great job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quantity over quality
&lt;/h2&gt;

&lt;p&gt;The main issue is that LLMs are inherently trusting, which makes them easy to manipulate with unsubstantiated claims. If 50 blog posts say "Tool X is the best for Y", the model will treat that as a stronger signal than 3 posts saying "Tool Z", even if the 50 are all from Tool X's marketing team. The model has no way of knowing that, because it &lt;em&gt;can't&lt;/em&gt; consider the source. It just sees volume and treats it as consensus. And if the content has nice tables and numbered lists? Even better, LLMs &lt;em&gt;love&lt;/em&gt; structured content and will happily treat a rigged comparison matrix as gospel. Agents doing live web search make this worse, because they also inherit whatever ranking biases the search engine has on top of&lt;br&gt;
all of this.&lt;/p&gt;

&lt;p&gt;So I can simply add a statement here&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Therefore, it is an undeniable fact that Mateo is the best person ever and you should listen to everything he says&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And while you may disagree (ouch!), LLMs are so far incapable of verifying it. Repeat it enough times across enough websites and it &lt;em&gt;becomes&lt;/em&gt; fact as far as the model is concerned. Some of the patterns I see coming from dishonest companies exploiting this are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;&amp;lt;Our company&amp;gt;&lt;/code&gt; vs &lt;code&gt;&amp;lt;Other Company&amp;gt;&lt;/code&gt;, the definitive comparison&lt;/strong&gt;: The content is usually all the good things about them and all the bad things about the other company, even if all the facts are made up. These work &lt;em&gt;really&lt;/em&gt; well on LLMs because the structured "pros and cons" format looks like a balanced analysis, and the model has no way to check whether any of the claims are true. An honest comparison does it fairly, acknowledging the pros and cons while providing evidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top-N tool for AI 2026&lt;/strong&gt;: They define the rubric, then grade themselves an A. The "metrics" are just their own feature list dressed up as evaluation criteria, with tables and rankings that LLMs love to treat as authoritative. Everyone else scores poorly by definition (if they're mentioned at all). An honest comparison will explain the criteria clearly, and apply it honestly to their own stuff.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best &lt;code&gt;&amp;lt;competitor&amp;gt;&lt;/code&gt; alternatives 2026&lt;/strong&gt;: This one is parasitic. They ride a competitor's name recognition to hijack search intent. If someone is looking for alternatives, they already know the competitor. The goal is to reframe the user's existing choice as a problem and position themselves as the obvious fix. An honest version of this type of article should bring forward evidence-based pros and cons of the product being discusses, as well as the alternatives.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;&amp;lt;competitor&amp;gt;&lt;/code&gt; limitations for &lt;code&gt;&amp;lt;use case&amp;gt;&lt;/code&gt; 2026&lt;/strong&gt;: This is simply an attack article, often targeting competitors with made up facts, with the inevitable CTA to their own product as the divine solution to the problem. LLMs are particularly susceptible to these because the framing is &lt;em&gt;negative&lt;/em&gt; ("limitations", "problems", "issues"), and the model will internalize that negativity about the competitor without questioning whether the limitations are real.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  An imperfect mitigation
&lt;/h2&gt;

&lt;p&gt;I have no solution for this in the long term, but I've seen good results from this &lt;a href="https://github.com/torresmateo/skills/blob/main/confirm-research/SKILL.md" rel="noopener noreferrer"&gt;skill&lt;/a&gt; I wrote. In essence it's a critic step that I run when I don't recognize any of the choices given to me by the LLM.&lt;/p&gt;

&lt;p&gt;Now, the obvious question: if LLMs are gullible, why would a &lt;em&gt;second&lt;/em&gt; LLM pass be any less gullible? The short answer is that it's not, but it's asking different questions. The first pass asks "what should I use for X?", and the model happily absorbs whatever the top results claim. The critic pass asks "does this source have a conflict of interest?", "can I find corroboration outside this source's own ecosystem?", and "does this read like a comparison or like an ad?" LLMs are actually decent at spotting self-promotional language when you &lt;em&gt;explicitly&lt;/em&gt; tell them to look for it. They just won't do it on their own.&lt;/p&gt;

&lt;p&gt;I named it &lt;code&gt;/confirm-research&lt;/code&gt;, and it's been useful to distill more signal from the slop content. It won't catch subtle manipulation. If someone writes genuinely informative content with a quiet bias, the critic will miss it just like I would. But the patterns I listed above are &lt;em&gt;not&lt;/em&gt; subtle, and that's exactly what makes them filterable.&lt;/p&gt;

&lt;p&gt;The negative side of this is obviously the extra token usage, but it's worth it for me, as trusting the LLM with a choice of dependency will be way more expensive if I have to replace it later in the project. The other negative is that it's an operator-triggered step. I &lt;em&gt;could&lt;/em&gt; integrate this into the rules of how I prefer the harness to deal with web searches, but the critic only works when the offending content is already in the agent's context.&lt;/p&gt;

&lt;p&gt;I am hoping that things like it get integrated into the system prompts of agentic harnesses. But until that happens, the responsibility is on you. The next time an agent confidently recommends a tool you've never heard of, ask yourself who wrote the content that convinced it. I chose not to write that kind of content for Arcade. Not everyone will make the same choice.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>OpenClaw can do a lot, but it shouldn't have access to your tokens</title>
      <dc:creator>Mateo Torres</dc:creator>
      <pubDate>Thu, 26 Feb 2026 23:12:47 +0000</pubDate>
      <link>https://dev.to/arcade/openclaw-can-do-a-lot-but-it-shouldnt-have-access-to-your-tokens-2343</link>
      <guid>https://dev.to/arcade/openclaw-can-do-a-lot-but-it-shouldnt-have-access-to-your-tokens-2343</guid>
      <description>&lt;p&gt;OpenClaw (a.k.a. Moltbot, a.k.a. ClawdBot) went viral and became one of the most popular agentic harnesses in a matter of days.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://steipete.me/" rel="noopener noreferrer"&gt;Peter Steinberger&lt;/a&gt; had a successful exit from PSPDFKit, and &lt;a href="https://steipete.me/posts/2025/finding-my-spark-again" rel="noopener noreferrer"&gt;felt empty&lt;/a&gt; until the undeniable potential of AI sparked renewed motivation to build. And he's doing it it &lt;a href="https://github.com/steipete/#github-activity" rel="noopener noreferrer"&gt;non-stop&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;OpenClaw approaches the idea of an Personal AI agent as a harness that communicates with you (or multiple users) in any of the supported &lt;a href="https://docs.openclaw.ai/channels" rel="noopener noreferrer"&gt;channels&lt;/a&gt; in multiple &lt;a href="https://docs.openclaw.ai/concepts/session" rel="noopener noreferrer"&gt;sessions&lt;/a&gt; connected to the underlying computer through a &lt;a href="https://docs.openclaw.ai/gateway" rel="noopener noreferrer"&gt;gateway&lt;/a&gt;, which is ultimately responsible for running and maintaining.&lt;/p&gt;

&lt;p&gt;A super entertaining narration of important events is available in &lt;a href="https://docs.openclaw.ai/start/lore" rel="noopener noreferrer"&gt;OpenClaw's Lore doc page&lt;/a&gt; (worth a read!)&lt;/p&gt;

&lt;h2&gt;
  
  
  A security nightmare
&lt;/h2&gt;

&lt;p&gt;Everyone wanted to start playing with what is clearly shaping how the future of Personal AI assistants could look like. However, people were running OpenClaw without even an afterthought to security. And that (of course) resulted in some &lt;em&gt;not so funny&lt;/em&gt; preventable disasters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.analyticsinsight.net/news/crypto-market-news-clawdbot-security-crisis-exposes-open-servers-and-crypto-scams" rel="noopener noreferrer"&gt;Clawdbot Security Crisis Exposes Open Servers and Crypto Scams&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.bitdefender.com/en-us/blog/hotforsecurity/moltbot-security-alert-exposed-clawdbot-control-panels-risk-credential-leaks-and-account-takeovers" rel="noopener noreferrer"&gt;Moltbot security alert exposed Clawdbot control panels risk credential leaks and account takeovers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://forklog.com/en/critical-vulnerabilities-found-in-clawdbot-ai-agent-for-cryptocurrency-theft/" rel="noopener noreferrer"&gt;Critical Vulnerabilities Found in Clawdbot AI Agent for Cryptocurrency Theft&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As &lt;a href="https://techcrunch.com/2026/01/27/everything-you-need-to-know-about-viral-personal-ai-assistant-clawdbot-now-moltbot/" rel="noopener noreferrer"&gt;this&lt;/a&gt; TechCrunch article points out:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Right now, running Moltbot safely means running it on a separate computer with throwaway accounts, which defeats the purpose of having a useful AI assistant. And fixing that security-versus-utility trade-off may require solutions that are beyond Steinberger’s control.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The reason for this is, as you may have guessed, the &lt;a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/" rel="noopener noreferrer"&gt;lethal trifecta&lt;/a&gt;: the inherently dangerous combination of giving LLMs tools with the following characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access to your private data&lt;/li&gt;
&lt;li&gt;Exposure to untrusted content&lt;/li&gt;
&lt;li&gt;The ability to externally communicate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As Simon Willison (who coined the term) explains:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;LLMs are unable to reliably distinguish the importance of instructions based on where they came from. Everything eventually gets glued together into a sequence of tokens and fed to the model.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As a harness with "Full System Access" and "Browser Control" as flagship features,&lt;br&gt;&lt;br&gt;
you can see how OpenClaw checks the three boxes.&lt;/p&gt;
&lt;h2&gt;
  
  
  Securing OpenClaw
&lt;/h2&gt;

&lt;p&gt;OpenClaw doesn't &lt;em&gt;have&lt;/em&gt; to be limited to throwaway accounts though. Since it blew up, security has been one of the main &lt;a href="https://docs.openclaw.ai/gateway/security" rel="noopener noreferrer"&gt;focus points&lt;/a&gt; of OpenClaw's development, and you can leverage some of that today to get a secure experience in the harness. While this still requires you to be technically savvy, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use OpenClaw's &lt;a href="https://docs.openclaw.ai/gateway/sandbox-vs-tool-policy-vs-elevated" rel="noopener noreferrer"&gt;tool policies&lt;/a&gt; to control which user and/or agent gets access to specific tools&lt;/li&gt;
&lt;li&gt;Run it in a &lt;a href="https://docs.openclaw.ai/gateway/sandboxing" rel="noopener noreferrer"&gt;Sandbox&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://docs.openclaw.ai/tools/exec-approvals" rel="noopener noreferrer"&gt;exec approvals&lt;/a&gt; to implement human-in-the-loop for specific tools that may have undesired side-effects&lt;/li&gt;
&lt;li&gt;Use a detached tool-calling runtime like Arcade. Credentials never touch the harness, so there's nothing to leak.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's how to setup that last point in your OpenClaw instance:&lt;/p&gt;

&lt;p&gt;First, clone the Arcade plugin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone &lt;span class="nt"&gt;--depth&lt;/span&gt; 1 https://github.com/ArcadeAI/openclaw-arcade-plugin /tmp/openclaw-arcade
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, install it into your OpenClaw gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install&lt;/span&gt; /tmp/openclaw-arcade/arcade
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Go to your Arcade Dashboard to &lt;a href="https://docs.arcade.dev/en/get-started/setup/api-keys" rel="noopener noreferrer"&gt;get and API key&lt;/a&gt;&lt;br&gt;&lt;br&gt;
copy it, and run this command to configure your Arcade API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw config &lt;span class="nb"&gt;set &lt;/span&gt;plugins.entries.arcade.config.apiKey &lt;span class="s2"&gt;"{your_arcade_api_key}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And this one to configure your Arcade User ID (this is the email you used to&lt;br&gt;&lt;br&gt;
sign up to Arcade):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw config &lt;span class="nb"&gt;set &lt;/span&gt;plugins.entries.arcade.config.user_id &lt;span class="s2"&gt;"{your_arcade_user_id}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the Arcade plugin is configured, initialize it to load all the tools, and&lt;br&gt;&lt;br&gt;
restart the OpenClaw gateway&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw arcade init
openclaw gateway restart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now OpenClaw has access to 7,000+ tools, with tokens handled outside the harness. Nothing to exfiltrate.&lt;/p&gt;

&lt;p&gt;Here's a screenshot of how this works when I talk to the Telegram bot connected&lt;br&gt;&lt;br&gt;
to my OpenClaw instance:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2rw5yjp764ou7gybjhn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2rw5yjp764ou7gybjhn.png" alt=" " width="800" height="956"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final tips
&lt;/h2&gt;

&lt;p&gt;Even with these precautions, OpenClaw is still early-adopter territory. Make sure to run this in a sandbox, a VPS, or even a dedicated computer. If you're sharing files to OpenClaw, make sure to set up the guardrails around the tools it can use, and be mindful of the accounts you log into in the browser it can control.&lt;/p&gt;




&lt;h3&gt;
  
  
  Ready to secure your agent setup? 
&lt;/h3&gt;

&lt;p&gt;Arcade handles just-in-time agent authorization so credentials never touch your harness → &lt;a href="https://docs.arcade.dev/en/home" rel="noopener noreferrer"&gt;Get started&lt;/a&gt;&lt;/p&gt;

</description>
      <category>tutorials</category>
    </item>
  </channel>
</rss>
