<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jangwook Kim</title>
    <description>The latest articles on DEV Community by Jangwook Kim (@jangwook_kim_e31e7291ad98).</description>
    <link>https://dev.to/jangwook_kim_e31e7291ad98</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1909290%2F60a8c15f-b2b5-4189-8578-78b8ab78900b.jpg</url>
      <title>DEV Community: Jangwook Kim</title>
      <link>https://dev.to/jangwook_kim_e31e7291ad98</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jangwook_kim_e31e7291ad98"/>
    <language>en</language>
    <item>
      <title>Cloudflare Agents Week 2026 — When AI Agents Become Cloud Customers</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Fri, 15 May 2026 06:43:30 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/cloudflare-agents-week-2026-when-ai-agents-become-cloud-customers-47b2</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/cloudflare-agents-week-2026-when-ai-agents-become-cloud-customers-47b2</guid>
      <description>&lt;p&gt;This time last year, every AI agent infrastructure conversation started with Kubernetes + LangGraph. Cloudflare's April Agents Week presented a different picture. Agents don't just call APIs — they create Cloudflare accounts, register domains, and deploy code on their own. The phrase "agents as cloud customers" sounds like marketing fluff, but this time they actually built it.&lt;/p&gt;

&lt;p&gt;Here's my read on what matters, what doesn't, and where I'm skeptical.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Agents Week Was
&lt;/h2&gt;

&lt;p&gt;Cloudflare declared April 2026 "agents week" and shipped announcements every day — 20+ new features and GA transitions by the end of it. The overall impression is a company-wide bet that agents will be the primary actors on the internet, and they rebuilt infrastructure accordingly across compute, storage, networking, and security.&lt;/p&gt;

&lt;p&gt;I'm focusing on the items that actually affect how you write and deploy agent code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Most Provocative Announcement — Agents That Create Their Own Accounts
&lt;/h2&gt;

&lt;p&gt;My honest reaction when I first read this: "is this real?" The mechanics: once a user accepts Cloudflare's terms of service once, agents can autonomously create a Cloudflare account, start a paid subscription, register a domain, get an API token, and deploy code. Stripe partnership handles payment tokenization; OAuth + OIDC authenticate the agent as a trusted actor.&lt;/p&gt;

&lt;p&gt;The implication is significant. Until now, agents worked within infrastructure that humans provisioned. Now agents can be the entity that provisions the infrastructure itself. If you're building a SaaS product, "agent handles new customer onboarding end-to-end" becomes a real architectural option.&lt;/p&gt;

&lt;p&gt;That said, I have two concerns I can't shake. First, an agent connected to live billing requires airtight cost controls. Cloudflare's new &lt;code&gt;task_budget&lt;/code&gt; concept seems designed for exactly this, but real-world examples of the two working together are scarce. Second, the legal accountability picture is murky. If an agent registers the wrong domain or incurs unexpected charges, who owns that? User consent to ToS exists, but the specific liability hasn't been tested.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Announcements Worth Your Attention
&lt;/h2&gt;

&lt;p&gt;Past the headline, here are the things I'd actually build with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sandboxes GA&lt;/strong&gt;: Nine months from beta (June 2025) to general availability. Each sandbox is an isolated Linux environment — real shell, real filesystem, background processes — that spins up on demand and, critically, picks up exactly where it left off after interruption. Sub-millisecond start times mean a code-generation agent can write, execute, observe output, and iterate in tight loops.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/en/blog/en/ai-agent-framework-comparison-2026-langgraph-crewai-dapr-production"&gt;Compared to setting up a separate code execution environment alongside LangGraph or CrewAI&lt;/a&gt;, Sandboxes shifts the question from "how do I configure the execution environment" to "which infrastructure layer do I trust to manage it." Those are meaningfully different decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Artifacts&lt;/strong&gt;: Git-compatible versioned storage for agents. Create tens of millions of repos, fork from any remote, access with standard Git clients. Moved from private beta to public beta in early May. The practical use case: agents that produce code outputs now have a permanent home for those outputs, survives context resets, accessible from outside Cloudflare's stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic Workers&lt;/strong&gt;: Isolated runtime for AI-generated code. Millisecond spin-up, scales to millions of concurrent executions. Enables the generate-execute-observe loop agents need without managing container infrastructure. Still feels early but the concept is right.&lt;/p&gt;

&lt;h2&gt;
  
  
  I Actually Installed the SDK
&lt;/h2&gt;

&lt;p&gt;Theory aside, I ran through the setup myself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;cloudflare-agent-demo &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;cloudflare-agent-demo
npm init &lt;span class="nt"&gt;-y&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; @cloudflare/agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean install. &lt;code&gt;@cloudflare/agents@0.0.16&lt;/code&gt; exports &lt;code&gt;Agent&lt;/code&gt;, &lt;code&gt;AIChatAgent&lt;/code&gt;, and &lt;code&gt;routeAgentRequest&lt;/code&gt; as the main surfaces.&lt;/p&gt;

&lt;p&gt;Here's a minimal but representative agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/index.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;routeAgentRequest&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cloudflare/agents&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;TaskState&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;processedCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;lastHeartbeat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Env&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;TASK_AGENT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DurableObjectNamespace&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;TaskAgent&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TaskAgent&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;TaskState&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;onStart&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setState&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;processedCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;lastHeartbeat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="c1"&gt;// Built-in cron scheduling — no external scheduler needed&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;0 * * * *&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;heartbeat&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;heartbeat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;`SELECT COUNT(*) as n FROM tasks`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setState&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;processedCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;lastHeartbeat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;onRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Agents receive email directly&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;onEmail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ForwardableEmailMessage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="s2"&gt;`
      INSERT INTO tasks (id, content, created_at)
      VALUES (&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;)
    `&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;routed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;routeAgentRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;routed&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;OK&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;wrangler dev&lt;/code&gt; starts immediately, no Cloudflare account needed for local work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;⛅️ wrangler 4.91.0
Your Worker has access to the following bindings:
  env.TASK_AGENT (TaskAgent)   Durable Object   local

⎔ Starting local server...
[wrangler:info] Ready on http://localhost:9998
[wrangler:info] GET / 200 OK (7ms)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One important caveat: &lt;code&gt;@cloudflare/agents&lt;/code&gt; is Workers runtime-only. Trying to run it with standard Node.js throws &lt;code&gt;ERR_UNSUPPORTED_ESM_URL_SCHEME&lt;/code&gt; because of the &lt;code&gt;cloudflare:&lt;/code&gt; protocol imports. You need Wrangler. &lt;a href="https://dev.to/en/blog/en/claude-agent-sdk-tool-use-complete-guide-2026"&gt;If you're used to SDKs like the Claude Agent SDK that run anywhere in Python or Node&lt;/a&gt;, this is an adjustment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Choices Worth Understanding
&lt;/h2&gt;

&lt;p&gt;A few design decisions in the SDK that reflect Cloudflare's broader approach:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Embedded SQLite&lt;/strong&gt;: Declare &lt;code&gt;new_sqlite_classes&lt;/code&gt; in &lt;code&gt;wrangler.toml&lt;/code&gt; and every Agent instance gets its own SQLite. No external database configuration. Query with &lt;code&gt;this.sql&lt;/code&gt;. The Durable Object isolation model gives you natural multi-tenancy — each agent instance has independent data. Sounds wasteful but it's actually clean for state isolation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In-process scheduling&lt;/strong&gt;: Register cron jobs directly from agent code. No external cron service. Wraps the Durable Object alarm API, which keeps scheduling and state management co-located. High cohesion, lower operational surface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Email handler&lt;/strong&gt;: &lt;code&gt;onEmail&lt;/code&gt; lets agents receive email directly via Workers Email Routing. An agent that turns email into tasks is straightforward to write.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/en/blog/en/dapr-agents-v1-cncf-production-ai-framework"&gt;The way Dapr Agents handles state and messaging through Kubernetes sidecar patterns&lt;/a&gt; contrasts interestingly here. Cloudflare's model is more code-centric; Dapr is more infrastructure-centric. Both have legitimate use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where I'm Skeptical
&lt;/h2&gt;

&lt;p&gt;I'll be direct about the rough edges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vendor lock-in is significant.&lt;/strong&gt; The &lt;code&gt;cloudflare:workers&lt;/code&gt; runtime dependency means your agent code doesn't run outside Cloudflare's stack. Migrating to a different platform later means substantial rewrites. &lt;a href="https://dev.to/en/blog/en/mcp-server-production-deployment-kubernetes-guide"&gt;Containerized approaches like running MCP servers on Kubernetes&lt;/a&gt; don't have this problem — you trade operational simplicity now for portability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-agent orchestration is thin.&lt;/strong&gt; The single-agent story is compelling. But the SDK-level support for complex multi-agent coordination — handoffs, shared memory, hierarchical orchestration — is limited. Project Think is meant to address this but it's early. If your use case involves agents coordinating at scale, you'll need to build significant structure yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SDK maturity.&lt;/strong&gt; &lt;code&gt;@cloudflare/agents@0.0.16&lt;/code&gt; is pre-1.0. The API surface will change. For production use, you're accepting that risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Take on When to Use This
&lt;/h2&gt;

&lt;p&gt;Cloudflare is the right infrastructure choice when: response latency at the edge matters for your agents, your team already operates Cloudflare Workers, you want to minimize infrastructure management and focus on agent logic, or your architecture involves many independent agents each owning their own state.&lt;/p&gt;

&lt;p&gt;It's not the right choice when: you need complex multi-agent orchestration and you're already invested in LangGraph, you're locked to AWS or GCP infrastructure, or your agents need to run in Python or standard Node.js environments.&lt;/p&gt;

&lt;p&gt;The overall direction from Agents Week is coherent. Cloudflare is positioning itself as the infrastructure layer for the agent era — what Kubernetes became for containers. The SDK being at v0 means production adoption should be cautious, but the design thinking is consistent. Worth running through the setup and forming your own opinion.&lt;/p&gt;

&lt;h2&gt;
  
  
  Signed Agents: Cryptographic Identity for Agent Traffic
&lt;/h2&gt;

&lt;p&gt;One announcement that got less coverage but caught my attention: Signed Agents. The idea is that HTTP requests made by agents carry a cryptographic signature proving their origin — "this was sent by an agent, not a human."&lt;/p&gt;

&lt;p&gt;Right now there's no standard way to distinguish agent traffic from human traffic on the internet. User-Agent strings and IP patterns are guesses at best. Signed Agents gives servers a verifiable signal: they can check the signature and apply agent-specific rate limits, billing, or access controls. It's an early-stage primitive but it's the right one to build. Once agents are common enough to treat as distinct traffic types, having cryptographic identity for them becomes infrastructure rather than a feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  Email Service Public Beta
&lt;/h2&gt;

&lt;p&gt;Workers Email Service graduated to public beta during Agents Week. Any agent can now send email without integrating a third-party service like SendGrid or AWS SES.&lt;/p&gt;

&lt;p&gt;Combined with the &lt;code&gt;onEmail&lt;/code&gt; handler already in the SDK, agents can now handle both inbound and outbound email entirely within Cloudflare's stack. An agent that receives a customer email, processes it, creates a task, and sends a reply — with no external email service in the loop. For customer support agents, notification pipelines, or email-based task management, this is a meaningful simplification.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;Looking at Agents Week as a whole, it reads less like a feature release and more like a positioning statement. Twenty-plus announcements, all pointing the same direction: Cloudflare intends to be the infrastructure layer for the agent era the way AWS became the infrastructure layer for the web era.&lt;/p&gt;

&lt;p&gt;The single thing I'd actually go build with first from this week: Sandboxes. Not the headline "agents create accounts" story — the persistent isolated Linux environment for agent code execution. That's immediately useful for any code-generation or code-testing agent, and it works today without novel legal or financial risk.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;@cloudflare/agents@0.0.16&lt;/code&gt; tells you what you need to know about production readiness. But if you're serious about evaluating agent infrastructure options, run through the local setup and form your own opinion. Twenty minutes, no account required.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Test environment&lt;/strong&gt;: &lt;code&gt;@cloudflare/agents@0.0.16&lt;/code&gt;, &lt;code&gt;wrangler@4.91.0&lt;/code&gt;, Node.js v22.22.0, macOS 14&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Note&lt;/strong&gt;: The autonomous agent account creation feature requires a real Cloudflare account and Stripe integration — out of scope for local testing.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Source&lt;/strong&gt;: &lt;a href="https://blog.cloudflare.com/agents-week-in-review/" rel="noopener noreferrer"&gt;Cloudflare Agents Week 2026&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cloudflare</category>
      <category>aiagents</category>
      <category>agentinfrastructure</category>
      <category>webplatform</category>
    </item>
    <item>
      <title>AWS MCP Server GA Practical Guide — Connecting CloudWatch &amp; IAM to Your AI Coding Agent</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Thu, 14 May 2026 06:42:39 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/aws-mcp-server-ga-practical-guide-connecting-cloudwatch-iam-to-your-ai-coding-agent-3d79</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/aws-mcp-server-ga-practical-guide-connecting-cloudwatch-iam-to-your-ai-coding-agent-3d79</guid>
      <description>&lt;p&gt;A CloudWatch alarm fired. Lambda error rate crossed the threshold, and I needed to dig through logs — flipping between the AWS console and my terminal, copying log group names by hand. At some point I had a clear thought: what if Claude Code could just look at my CloudWatch directly?&lt;/p&gt;

&lt;p&gt;On May 6, 2026, AWS shipped an answer. &lt;strong&gt;AWS MCP Server hit general availability.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What the AWS MCP Server Actually Is
&lt;/h2&gt;

&lt;p&gt;AWS MCP Server gives AI coding agents — Claude Code, Cursor, Codex — a standardized way to query AWS services directly. It wraps AWS APIs as MCP tools, using the Model Context Protocol that Anthropic defined. One &lt;code&gt;uvx&lt;/code&gt; command wires 31 CloudWatch tools and 29 IAM tools into your coding agent.&lt;/p&gt;

&lt;p&gt;Instead of copying log group names from the console and pasting them into CLI commands, you can ask your agent: "Find the Lambda function with the highest error rate in the past hour and summarize the relevant logs." The agent runs the Logs Insights query itself and brings back results.&lt;/p&gt;

&lt;p&gt;If you've &lt;a href="https://dev.to/en/blog/en/mcp-server-build-practical-guide-2026"&gt;built an MCP server from scratch&lt;/a&gt;, you already understand the protocol. AWS MCP Server is the official, AWS-maintained collection of MCP servers for AWS services, published at &lt;code&gt;awslabs/mcp&lt;/code&gt; on GitHub and installable from PyPI.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Changed at GA
&lt;/h3&gt;

&lt;p&gt;Three things matter compared to pre-GA versions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IAM condition context keys.&lt;/strong&gt; Every API call routed through AWS MCP Server now carries &lt;code&gt;aws:ViaAWSMCPService&lt;/code&gt; and &lt;code&gt;aws:CalledViaAWSMCP&lt;/code&gt; condition keys automatically. Your IAM policies can differentiate agent-initiated calls from human-initiated calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full CloudTrail integration.&lt;/strong&gt; Every API call goes to CloudTrail. There's a complete audit trail of what the agent did.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Separate CloudWatch namespace.&lt;/strong&gt; Metrics published under &lt;code&gt;AWS-MCP&lt;/code&gt; let you monitor how much of your API traffic comes from agents versus direct calls.&lt;/p&gt;

&lt;p&gt;The practical upshot: &lt;strong&gt;you can now enforce different IAM permissions for agents and humans while using the same AWS credentials.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation: One Line with uvx
&lt;/h2&gt;

&lt;p&gt;I installed and ran both servers. Here is what it takes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install uv if you don't have it&lt;/span&gt;
curl &lt;span class="nt"&gt;-LsSf&lt;/span&gt; https://astral.sh/uv/install.sh | sh

&lt;span class="c"&gt;# Run CloudWatch MCP server (creates isolated env automatically)&lt;/span&gt;
uvx awslabs.cloudwatch-mcp-server@latest

&lt;span class="c"&gt;# Run IAM MCP server&lt;/span&gt;
uvx awslabs.iam-mcp-server@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;uvx&lt;/code&gt; handles the virtual environment. First run pulls 53 packages for the CloudWatch server — botocore, pandas, scipy, statsmodels, and more. The reason for scipy and statsmodels is that the CloudWatch server includes built-in anomaly detection and statistical analysis on metrics, not just passthrough queries.&lt;/p&gt;

&lt;p&gt;Installed versions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;awslabs.cloudwatch-mcp-server&lt;/code&gt; v0.1.1&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;awslabs.iam-mcp-server&lt;/code&gt; v1.0.20&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 0.x version on the CloudWatch server signals the API is still stabilizing. That is worth keeping in mind before putting it in production workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wiring It Into Claude Code (.mcp.json)
&lt;/h3&gt;

&lt;p&gt;Put this in your project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cloudwatch"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uvx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"awslabs.cloudwatch-mcp-server@latest"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"AWS_REGION"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ap-northeast-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"AWS_PROFILE"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"default"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"FASTMCP_LOG_LEVEL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"WARNING"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"iam"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uvx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"awslabs.iam-mcp-server@latest"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"AWS_REGION"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ap-northeast-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"FASTMCP_LOG_LEVEL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"WARNING"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set &lt;code&gt;FASTMCP_LOG_LEVEL&lt;/code&gt; to &lt;code&gt;WARNING&lt;/code&gt;. Without it, INFO logs bleed into the agent's responses. You can also install via the Claude Code CLI: &lt;code&gt;claude mcp add aws-mcp-server&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  CloudWatch MCP Server: 31 Tools
&lt;/h2&gt;

&lt;p&gt;When the server starts, it registers exactly 31 tools. Here is the breakdown.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Log group tools (8):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;describe_log_groups         List log groups
analyze_log_group           AI-powered log pattern analysis
execute_log_insights_query  Run a Logs Insights query
get_logs_insight_query_results  Poll query results
cancel_logs_insight_query   Cancel a running query
execute_cwl_insights_batch  Batch query execution
recommend_indexes_loggroup  Index recommendations for a log group
recommend_indexes_account   Account-wide index recommendations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Metrics tools (11):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;get_metric_data             Fetch metric data points
get_metric_metadata         Metadata lookup (1,179 entries indexed at startup)
analyze_metric              Anomaly detection on a metric
get_recommended_metric_alarms  Suggest alarm thresholds
execute_promql_query        Run a PromQL query
execute_promql_range_query  PromQL range query
get_promql_label_values     PromQL label values
get_promql_series           PromQL series
get_promql_labels           PromQL labels
get_active_alarms           List active alarms
get_alarm_history           Alarm state history
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;get_metric_metadata&lt;/code&gt; detail is worth noting. At startup, the server loads and indexes 1,179 metric metadata entries covering EC2, Lambda, RDS, DynamoDB, and most other AWS services. The server logs show it explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;INFO | Loaded 1179 metric metadata entries
INFO | Successfully indexed 1179 metric metadata entries
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is what allows the agent to answer "which metric measures Lambda cold start duration?" without hitting the AWS docs.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I Found on My Account
&lt;/h3&gt;

&lt;p&gt;I ran this against a real ap-northeast-1 account. The output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Available log groups (5):
  /aws/lambda/remotax-renewal-fe-CustomCDKBucketDeployment: 331,695 bytes
  /aws/lambda/remotax-renewal-fe-CustomS3AutoDeleteObjects:   2,038 bytes
  /aws/lambda/remotax-renewal-fe-LambdaServerFunctionHandler:     0 bytes
  /aws/lambda/remotax-renewal-fe-LogRetentionaae0aa3c5b4d4f:     0 bytes
  RDSOSMetrics: 55,192,669 bytes

Active CloudWatch Alarms:
  OK    EC2-HighCPU-Alarm
&lt;/span&gt;&lt;span class="gp"&gt;        CPUUtilization &amp;gt;&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 80% | Currently: OK
&lt;span class="go"&gt;  ?     EC2-HighDiskUsage-Alarm
&lt;/span&gt;&lt;span class="gp"&gt;        disk_used_percent &amp;gt;&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 80% | INSUFFICIENT_DATA
&lt;span class="go"&gt;  ?     EC2-HighMemoryUsage-Alarm
&lt;/span&gt;&lt;span class="gp"&gt;        mem_used_percent &amp;gt;&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 80% | INSUFFICIENT_DATA
&lt;span class="go"&gt;  ?     LaravelErrorAlarm
&lt;/span&gt;&lt;span class="gp"&gt;        LaravelErrorCount &amp;gt;&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 1 | INSUFFICIENT_DATA
&lt;span class="go"&gt;
EC2 metrics available: 85
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three alarms sitting in &lt;code&gt;INSUFFICIENT_DATA&lt;/code&gt;. Disk and memory alarms with no data means CloudWatch Agent is not running or misconfigured on those EC2 instances. That is the kind of silent infrastructure problem that usually only surfaces when an alert should fire and doesn't. The agent picked it up immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  IAM MCP Server: 29 Tools and the Security Architecture That Matters
&lt;/h2&gt;

&lt;p&gt;The IAM server ships 29 tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;list_users / get_user / create_user / delete_user
list_roles / create_role
list_policies / get_managed_policy_document
attach_user_policy / detach_user_policy
create_access_key / delete_access_key
simulate_principal_policy    ← the important one
list_groups / create_group / delete_group
add_user_to_group / remove_user_from_group
put_role_policy / get_role_policy / delete_role_policy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I find &lt;code&gt;simulate_principal_policy&lt;/code&gt; the most useful. It checks whether an IAM principal can perform specific actions without actually making those API calls. After reading about &lt;a href="https://dev.to/en/blog/en/mcp-security-crisis-30-cves-enterprise-hardening"&gt;MCP ecosystem security vulnerabilities and 30 CVEs&lt;/a&gt;, having agents pre-validate their permissions before executing is a meaningful safety step.&lt;/p&gt;

&lt;p&gt;Test run against my account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;simulate_principal_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;PolicySourceArn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;arn:aws:iam::370193714718:user/remotax-fe&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ActionNames&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cloudwatch:DescribeAlarms&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;logs:DescribeLogGroups&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;iam:ListUsers&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3:ListBuckets&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;ResourceArns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Results:
# ✓ cloudwatch:DescribeAlarms: allowed
# ✓ logs:DescribeLogGroups: allowed
# ✓ iam:ListUsers: allowed
# ✓ s3:ListBuckets: allowed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Condition Key Architecture
&lt;/h3&gt;

&lt;p&gt;This is the part I think matters most about the GA release. Every API call through AWS MCP Server automatically carries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;aws:ViaAWSMCPService&lt;/code&gt; — marks this as a request via an MCP service&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;aws:CalledViaAWSMCP&lt;/code&gt; — marks this as originating from an MCP client&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An IAM deny policy using these keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Deny"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"iam:CreateUser"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"iam:DeleteUser"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"iam:AttachUserPolicy"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Condition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Bool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"aws:ViaAWSMCPService"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"true"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this policy, a human using the AWS console can manage IAM users. Claude Code using the same credentials cannot. Same key pair, different effective permissions. When I was &lt;a href="https://dev.to/en/blog/en/claude-agent-sdk-tool-use-complete-guide-2026"&gt;implementing Tool Use in the Claude Agent SDK&lt;/a&gt;, I had to build agent permission scoping into application logic. AWS is solving that at the infrastructure layer here.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/..%2F..%2F..%2Fassets%2Fblog%2Faws-mcp-server-ga-practical-guide-2026-arch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/..%2F..%2F..%2Fassets%2Fblog%2Faws-mcp-server-ga-practical-guide-2026-arch.png" alt="AWS MCP Server Security Architecture — CloudWatch, IAM, CloudTrail"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three layers: coding agent → AWS MCP Server (stdio) → AWS API (SigV4 auth). Every AWS API call goes to CloudTrail. Metrics land in the AWS-MCP CloudWatch namespace separately from direct human calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  Available AWS MCP Servers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Server&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;awslabs.cloudwatch-mcp-server&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Logs, Metrics, Alarms&lt;/td&gt;
&lt;td&gt;v0.1.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;awslabs.iam-mcp-server&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;IAM management&lt;/td&gt;
&lt;td&gt;v1.0.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;awslabs.aws-api-mcp-server&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Any AWS API&lt;/td&gt;
&lt;td&gt;separate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch Application Signals&lt;/td&gt;
&lt;td&gt;APM/SLO monitoring&lt;/td&gt;
&lt;td&gt;separate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS Network MCP Server&lt;/td&gt;
&lt;td&gt;VPC/network diagnostics&lt;/td&gt;
&lt;td&gt;separate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS Pricing MCP Server&lt;/td&gt;
&lt;td&gt;Cost estimation&lt;/td&gt;
&lt;td&gt;separate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EKS MCP Server&lt;/td&gt;
&lt;td&gt;EKS cluster management&lt;/td&gt;
&lt;td&gt;separate&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;aws-api-mcp-server&lt;/code&gt; is interesting. It exposes every AWS API through a single tool. When &lt;a href="https://dev.to/en/blog/en/fastmcp-python-mcp-server-build-guide-2026"&gt;building a FastMCP-based MCP server&lt;/a&gt;, each API endpoint needed its own tool definition. The aws-api-mcp-server flips that — one tool, all APIs. The trade-off is that the agent needs more context to figure out which API to call.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Assessment — What Works, What Doesn't
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What I find genuinely useful:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The IAM condition key separation is real. If you've been hesitant to give agents AWS access because you can't restrict them beyond the IAM user's permissions, this changes the calculation. You can attach &lt;code&gt;aws:ViaAWSMCPService&lt;/code&gt; deny statements to enforce read-only agent access while keeping full human access with the same credentials.&lt;/p&gt;

&lt;p&gt;PromQL support surprised me. CloudWatch supports PromQL for Container Insights metrics, and the MCP server exposes it. If you run Kubernetes on EKS and already write PromQL, you can use that syntax directly through the agent.&lt;/p&gt;

&lt;p&gt;The 1,179-entry metric metadata index means the agent can reason about AWS services it has never seen before in your specific account. It knows what metrics EC2, Lambda, RDS, and most other services expose without needing to query AWS each time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What gives me pause:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CloudWatch server at v0.1.1. The AI analysis tools like &lt;code&gt;analyze_log_group&lt;/code&gt; and &lt;code&gt;analyze_metric&lt;/code&gt; look promising but I have not stress-tested them. A 0.x version in production tooling warrants caution.&lt;/p&gt;

&lt;p&gt;Logs Insights cost. CloudWatch charges for scanned log data in Insights queries. An agent with unconstrained query access could run up meaningful charges. There are no cost guardrails at the tool level — that has to be managed at the IAM level (restricting query scope) or through agent instructions.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;create_access_key&lt;/code&gt; in the IAM server. An agent tool that creates new AWS access keys is, by default, accessible. The condition key approach can block it, but you have to set that up deliberately. I would not wire up the IAM server in a production environment without first adding explicit deny policies for the write operations.&lt;/p&gt;

&lt;p&gt;My recommendation: start with &lt;code&gt;cloudwatch-mcp-server&lt;/code&gt; in read-heavy workflows. Treat the IAM server as a development tool until you have the deny policies in place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;If AWS credentials are configured:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install uv&lt;/span&gt;
curl &lt;span class="nt"&gt;-LsSf&lt;/span&gt; https://astral.sh/uv/install.sh | sh

&lt;span class="c"&gt;# Test immediately&lt;/span&gt;
uvx awslabs.cloudwatch-mcp-server@latest

&lt;span class="c"&gt;# Add to a project&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .mcp.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;'
{
  "mcpServers": {
    "cloudwatch": {
      "command": "uvx",
      "args": ["awslabs.cloudwatch-mcp-server@latest"],
      "env": {
        "AWS_REGION": "us-east-1",
        "FASTMCP_LOG_LEVEL": "WARNING"
      }
    }
  }
}
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Official docs: &lt;a href="https://awslabs.github.io/mcp" rel="noopener noreferrer"&gt;awslabs.github.io/mcp&lt;/a&gt;. Source: &lt;a href="https://github.com/awslabs/mcp" rel="noopener noreferrer"&gt;github.com/awslabs/mcp&lt;/a&gt;. Free to use — you pay only for the AWS resources the agent touches.&lt;/p&gt;

&lt;p&gt;AI agents having console-level visibility into AWS infrastructure is coming regardless. AWS MCP Server GA is the first production-ready step in that direction.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>MemRL: Self-Evolving Agents via Episodic Memory RL</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Thu, 14 May 2026 04:16:52 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/memrl-self-evolving-agents-via-episodic-memory-rl-464b</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/memrl-self-evolving-agents-via-episodic-memory-rl-464b</guid>
      <description>&lt;p&gt;There is a gap in how most AI agents handle experience. They reason well from the start, but they don't get smarter from what they do. Fine-tuning closes that gap, but it's expensive, slow, and prone to catastrophic forgetting. RAG-based memory is cheaper, but it retrieves by similarity — not by whether a past strategy actually worked.&lt;/p&gt;

&lt;p&gt;MemRL, published on arXiv in January 2026, proposes a different approach: apply reinforcement learning directly to episodic memory at runtime, without touching model weights. The result is an agent that improves through trial and error, storing structured experiences and learning which ones to prioritize based on real task outcomes.&lt;/p&gt;

&lt;p&gt;This guide breaks down how MemRL works, what the benchmarks show, and how the core mechanism looks in practice — including a minimal reproduction Effloow Lab ran to verify the concept.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem MemRL Solves
&lt;/h2&gt;

&lt;p&gt;Current agent memory systems face a fundamental tradeoff. On one end, fine-tuning embeds knowledge directly into model weights — but requires expensive compute, labeled data, and still risks overwriting previously learned behavior (catastrophic forgetting). On the other end, RAG-style retrieval keeps knowledge external, making it cheap to update. But standard RAG retrieves by semantic similarity alone. It surfaces documents that look similar to the current query, not documents associated with strategies that previously worked.&lt;/p&gt;

&lt;p&gt;This is the stability-plasticity dilemma: agents either freeze their knowledge (stable but rigid) or update it continuously (plastic but forgetful). MemRL's claim is that this tradeoff is a false choice — you can have a frozen LLM backbone (stable) with an external memory that evolves through RL feedback (plastic).&lt;/p&gt;

&lt;h2&gt;
  
  
  What MemRL Is
&lt;/h2&gt;

&lt;p&gt;MemRL (arXiv:2601.03192, from MemTensor, updated February 2026) is a non-parametric framework that enables agents to self-evolve through runtime reinforcement learning on episodic memory. The LLM's weights never change. Instead, MemRL maintains a structured external memory, refines it based on task outcomes, and uses a two-phase retrieval mechanism to surface the most useful experiences — not just the most similar ones.&lt;/p&gt;

&lt;p&gt;The open-source code is available at &lt;a href="https://github.com/MemTensor/MemRL" rel="noopener noreferrer"&gt;MemTensor/MemRL&lt;/a&gt;, with support for ALFWorld, BigCodeBench, HLE, and Lifelong Agent Bench benchmarks.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Intent-Experience-Utility Triplet
&lt;/h2&gt;

&lt;p&gt;The core data structure in MemRL is not a document. It's a triplet:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Intent&lt;/strong&gt;: the task or query the agent was addressing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experience&lt;/strong&gt;: the specific action trajectory or solution strategy used&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Utility (Q-value)&lt;/strong&gt;: a learned score representing how successful that experience was&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Where RAG stores raw text and retrieves by embedding similarity, MemRL stores structured (intent, experience, Q-value) records. The Q-value is not fixed at write time — it evolves as the agent receives environmental feedback across episodes.&lt;/p&gt;

&lt;p&gt;This distinction matters. Two experiences with similar intents might have very different Q-values if one led to a successful outcome and the other failed. RAG can't distinguish these. MemRL can.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Two-Phase Retrieval Works
&lt;/h2&gt;

&lt;p&gt;When an agent faces a new task, MemRL retrieves relevant past experiences in two stages:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase A — Semantic Filter:&lt;/strong&gt; The agent computes similarity between the current intent and all stored intents using dense embeddings. The top-k candidates (by semantic relevance) are kept. This narrows the search to experiences that are topically related to the current task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase B — Q-Value Ranking:&lt;/strong&gt; Among those filtered candidates, MemRL re-ranks by Q-value. Experiences with higher utility — those associated with successful outcomes — rise to the top. The agent retrieves the highest-Q candidates and uses them as in-context guidance for the current task.&lt;/p&gt;

&lt;p&gt;The paper describes Phase A as analogical transfer (retrieving similar past events) and Phase B as mental rehearsal (selecting strategies proven to work). Together, they avoid the main failure mode of pure RAG: retrieving semantically similar but strategically useless memories.&lt;/p&gt;

&lt;h2&gt;
  
  
  Q-Value Learning: The RL Mechanism
&lt;/h2&gt;

&lt;p&gt;After the agent completes a task using retrieved memories, it receives a reward signal from the environment — success, partial success, or failure. MemRL applies a Monte Carlo-style update to the Q-value of the used memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Q_new = Q_old + α × (reward - Q_old)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where α is the learning rate. Positive outcomes increase the Q-value; failures decrease it. Over many episodes, Q-values diverge: experiences associated with reliable strategies accumulate higher scores, while noise and failed attempts are downweighted.&lt;/p&gt;

&lt;p&gt;The entire optimization loop runs outside the LLM. No gradient computation, no retraining. The LLM reasons over whatever context it's given — MemRL just gets better at deciding what to put in that context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Effloow Lab PoC: Core Mechanism in Python
&lt;/h2&gt;

&lt;p&gt;Effloow Lab ran a minimal reproduction of the IEU triplet and two-phase retrieval to verify the concept. Full repo installation requires ALFWorld and LLM credentials, so this PoC uses word-overlap similarity instead of dense embeddings — a known limitation documented in the lab run.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SimpleMemRL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k_semantic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k_q&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;top_k_semantic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;top_k_semantic&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;top_k_q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;top_k_q&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_cosine_sim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# word-overlap proxy for embeddings (sandbox limitation)
&lt;/span&gt;        &lt;span class="n"&gt;set_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;set_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;set_a&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;set_b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;set_a&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;set_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;set_a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;set_b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;experience&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;initial_q&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;intent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;experience&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;experience&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;q&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;initial_q&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_q&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reward&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;intent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;q&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reward&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;q&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_intent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="c1"&gt;# Phase A: semantic filter
&lt;/span&gt;        &lt;span class="n"&gt;scored&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_cosine_sim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_intent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;intent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;scored&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;scored&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;top_k_semantic&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
        &lt;span class="c1"&gt;# Phase B: Q-value ranking
&lt;/span&gt;        &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;q&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;top_k_q&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running this with a small set of coding strategy memories, then applying positive feedback to sort-related experiences and negative feedback to a debugging strategy, produced the expected result: sort strategies rose to Q≈0.62, while the debugging entry dropped to Q≈0.24. Subsequent queries for sorting tasks surfaced the higher-Q memories first.&lt;/p&gt;

&lt;p&gt;The key limitation observed: word-overlap similarity doesn't capture semantic equivalence well, which caused some retrieval mismatches. Real MemRL uses dense embeddings (e.g., OpenAI text-embedding models or similar), resolving this. Full lab-run details and output are in &lt;code&gt;data/lab-runs/memrl-self-evolving-agents-episodic-memory-rl-guide-2026.md&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark Results
&lt;/h2&gt;

&lt;p&gt;The paper benchmarks MemRL across these tasks:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;MemRL (Last Acc.)&lt;/th&gt;
&lt;th&gt;MemP Baseline&lt;/th&gt;
&lt;th&gt;No-Memory Baseline&lt;/th&gt;
&lt;th&gt;Key Gain&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ALFWorld&lt;/td&gt;
&lt;td&gt;0.507&lt;/td&gt;
&lt;td&gt;0.324&lt;/td&gt;
&lt;td&gt;0.278&lt;/td&gt;
&lt;td&gt;+56% over MemP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HLE&lt;/td&gt;
&lt;td&gt;0.573&lt;/td&gt;
&lt;td&gt;0.528&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;+8.5% over MemP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BigCodeBench&lt;/td&gt;
&lt;td&gt;0.508&lt;/td&gt;
&lt;td&gt;0.494&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;+2.8% over MemP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lifelong Agent Bench&lt;/td&gt;
&lt;td&gt;0.697 CSR&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Best overall&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The gains are largest on ALFWorld and Lifelong Agent Bench — multi-step sequential tasks where memory utility accumulates across episodes. BigCodeBench shows smaller gains because it's primarily single-turn: there's less opportunity for multi-episode Q-value refinement when each task is independent.&lt;/p&gt;

&lt;p&gt;This pattern is important. MemRL's value is proportional to how much your agent loops over time. If your agent handles isolated, one-shot queries, you won't see ALFWorld-level improvements.&lt;/p&gt;

&lt;h2&gt;
  
  
  MemRL vs Traditional RAG
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MemRL Strengths
&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;Learns from success/failure — not just semantic match&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;No model fine-tuning required — frozen LLM backbone&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Q-values suppress noise and bad strategies over time&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Improves within a session and across sessions (transfer)&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Open-source with multi-benchmark validation&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;


Where It Lags
&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;Needs an environmental feedback signal — not always available&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Less useful for purely one-shot tasks without episode loops&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Q-value cold start: early episodes have unrefined utility scores&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;More complex to set up than a standard RAG pipeline&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;The underlying difference is what retrieval optimizes for. RAG finds memories that are similar. MemRL finds memories that are similar &lt;em&gt;and&lt;/em&gt; proved useful. For long-running agents where failure has a cost — home automation, coding assistants, planning agents — this distinction is meaningful.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tempera MCP Server
&lt;/h2&gt;

&lt;p&gt;A community implementation called &lt;a href="https://github.com/anvanster/tempera" rel="noopener noreferrer"&gt;Tempera&lt;/a&gt; applies MemRL concepts to AI coding workflows via Model Context Protocol (MCP). Tempera captures coding sessions as episodes, indexes them for semantic search, and uses RL to surface the most valuable memories at query time. All projects share a common memory database stored under &lt;code&gt;~/.tempera/&lt;/code&gt;, enabling cross-project learning — a direct practical application of the MemRL architecture.&lt;/p&gt;

&lt;p&gt;This matters for developers already using MCP-compatible tools: Tempera is one path to experimenting with MemRL ideas without implementing the full research framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Get Started with MemRL
&lt;/h2&gt;

&lt;p&gt;For developers interested in running the actual MemRL benchmarks, the setup flow is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Clone the repo&lt;/span&gt;
git clone https://github.com/MemTensor/MemRL
&lt;span class="nb"&gt;cd &lt;/span&gt;MemRL

&lt;span class="c"&gt;# 2. Create environment (Python 3.10 required)&lt;/span&gt;
conda create &lt;span class="nt"&gt;-n&lt;/span&gt; memrl &lt;span class="nv"&gt;python&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3.10
conda activate memrl

&lt;span class="c"&gt;# 3. Install dependencies&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# 4. Configure LLM + embedding settings in configs/&lt;/span&gt;
&lt;span class="c"&gt;# (YAML files per benchmark)&lt;/span&gt;

&lt;span class="c"&gt;# 5. Run a benchmark runner&lt;/span&gt;
python memrl/run/alfworld_rl_runner.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results write to &lt;code&gt;logs/&lt;/code&gt; and &lt;code&gt;results/&lt;/code&gt; directories. The &lt;code&gt;configs/&lt;/code&gt; directory controls which LLM and embedding model you use — the paper uses frontier models but the code supports swapping these.&lt;/p&gt;

&lt;p&gt;Full environment setup for ALFWorld requires additional installation steps documented in the repo's README.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Implications for Agent Developers
&lt;/h2&gt;

&lt;p&gt;MemRL's ideas translate to a few concrete questions worth asking about any agent system:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does your agent run repeatedly over similar tasks?&lt;/strong&gt; If yes, runtime Q-value learning could improve retrieval quality. If your agent handles purely isolated requests, the benefit is limited.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's your feedback signal?&lt;/strong&gt; MemRL needs a reward — task success, user rating, test pass/fail, something. Agents that get no structured outcome signal can't update Q-values. Designing a feedback loop is a prerequisite, not an afterthought.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are you fighting retrieval noise?&lt;/strong&gt; If your RAG-based memory system frequently surfaces semantically similar but strategically useless memories, MemRL's Phase B filtering is directly relevant. The Q-value layer exists precisely to downweight experiences that match the query but don't help.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do you need to avoid retraining?&lt;/strong&gt; MemRL's strongest argument is that agents can improve without compute-intensive fine-tuning cycles. For teams running agents at scale where fine-tuning is prohibitively expensive, this is a meaningful alternative.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How is MemRL different from Reflexion or Voyager?
&lt;/h3&gt;

&lt;p&gt;Reflexion stores verbal self-reflection notes in memory. Voyager builds a skill library. MemRL is distinct in applying Q-value learning to determine which stored experiences to retrieve. Reflexion and Voyager still rely on recency or semantic matching; MemRL's retrieval is utility-driven.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can MemRL work with any LLM?
&lt;/h3&gt;

&lt;p&gt;Yes — the LLM backbone is frozen. MemRL is agnostic to the underlying model. The paper runs experiments with frontier models, but the memory and retrieval mechanism is entirely external to the LLM's weights.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: What happens if the reward signal is noisy?
&lt;/h3&gt;

&lt;p&gt;Noisy rewards are a known challenge in RL. The paper applies Monte Carlo-style updates (averaging over episodes) which provides some robustness, but highly noisy reward signals will produce unreliable Q-values. The quality of MemRL's learning is bounded by the quality of the feedback signal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Does MemRL require embeddings?
&lt;/h3&gt;

&lt;p&gt;Yes, Phase A requires dense vector similarity. The sandbox PoC used word-overlap as a proxy, but real MemRL uses embedding models to compute semantic similarity between stored intents and current queries. Any embedding model compatible with your stack works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;MemRL addresses a genuine gap: the cost of fine-tuning versus the limitations of static retrieval. Its approach — structure memory as IEU triplets, filter by semantics, rank by learned Q-values, update Q-values from task outcomes — is conceptually clean and benchmarked across four tasks.&lt;/p&gt;

&lt;p&gt;The gains are largest for multi-step, episodic tasks (ALFWorld: +56% over MemP) and more modest for single-turn workloads (BigCodeBench: +2.8%). The framework needs a feedback signal, and Q-values start uninformed — so there's a cold-start cost on early episodes.&lt;/p&gt;

&lt;p&gt;For teams building agents that loop repeatedly over tasks, interact with real environments, and can capture task success as a signal, MemRL is a well-evidenced alternative to both fine-tuning and standard RAG. The code is open, the benchmarks are public, and the Tempera MCP server offers a path to experimenting without setting up the full research framework.&lt;/p&gt;

&lt;p&gt;Bottom Line&lt;br&gt;
  &lt;/p&gt;
&lt;p&gt;MemRL is one of the more rigorous proposals for non-parametric agent learning published in early 2026. If you're running agents that repeat tasks and can capture feedback, the two-phase retrieval mechanism is worth understanding — and the open-source code makes it possible to test on your own benchmarks without writing the RL layer from scratch.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2601.03192" rel="noopener noreferrer"&gt;MemRL: Self-Evolving Agents via Runtime RL on Episodic Memory (arXiv:2601.03192)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2601.03192v1" rel="noopener noreferrer"&gt;MemRL Full Paper HTML&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/MemTensor/MemRL" rel="noopener noreferrer"&gt;MemTensor/MemRL — GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://venturebeat.com/technology/memrl-outperforms-rag-on-complex-agent-benchmarks-without-fine-tuning" rel="noopener noreferrer"&gt;VentureBeat: MemRL Outperforms RAG Without Fine-Tuning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/anvanster/tempera" rel="noopener noreferrer"&gt;anvanster/tempera — Tempera MCP Server&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.agentic-patterns.com/patterns/memory-reinforcement-learning-memrl/" rel="noopener noreferrer"&gt;Agentic Patterns: Memory Reinforcement Learning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>memrl</category>
      <category>episodicmemory</category>
      <category>reinforcementlearning</category>
      <category>llmagents</category>
    </item>
    <item>
      <title>OpenAI Realtime Audio API: Voice Agents Guide 2026</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Thu, 14 May 2026 00:12:11 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/openai-realtime-audio-api-voice-agents-guide-2026-478i</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/openai-realtime-audio-api-voice-agents-guide-2026-478i</guid>
      <description>&lt;p&gt;On May 7, 2026, OpenAI quietly made voice agents production-viable. Three new realtime audio models landed in the API at the same time: &lt;strong&gt;GPT-Realtime-2&lt;/strong&gt; (voice with GPT-5-class reasoning), &lt;strong&gt;GPT-Realtime-Translate&lt;/strong&gt; (live speech-to-speech translation across 70+ languages), and &lt;strong&gt;GPT-Realtime-Whisper&lt;/strong&gt; (streaming speech-to-text billed by the minute). Each model has its own pricing, endpoint, and use-case fit.&lt;/p&gt;

&lt;p&gt;If you have been waiting for a stable, production-ready voice API before building, the wait is over. This guide walks through what each model does, how to connect to the API, what it costs, and the production patterns that separate a working demo from a robust voice agent.&lt;/p&gt;

&lt;p&gt;Effloow Lab inspected the Realtime API protocol and validated client-side event structures locally as part of this article's research. Full live testing requires an OpenAI API key; where relevant, we note what we verified and what we did not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Release Matters
&lt;/h2&gt;

&lt;p&gt;Previous versions of the Realtime API required working around a 32K-token context ceiling, managing your own speech-to-text pipeline, and accepting that the model would sometimes lose the thread of a long conversation. GPT-Realtime-2 removes these constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context window expanded to 128K tokens&lt;/strong&gt; — four times the previous limit, enough for multi-turn conversations spanning tens of minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5-class reasoning integrated directly&lt;/strong&gt; — the model can call tools, reason through steps, and respond, all without leaving the audio stream&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three specialized models&lt;/strong&gt; instead of one general voice model, each optimized for a specific cost-performance point&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The split into three models is also a pricing move. If you only need transcription, GPT-Realtime-Whisper at $0.017/minute is dramatically cheaper than running voice inference at $32/1M tokens. Choose the right model and you can cut costs by 80–90% relative to using GPT-Realtime-2 for everything.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Pricing&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gpt-realtime-2&lt;/td&gt;
&lt;td&gt;Voice reasoning agent&lt;/td&gt;
&lt;td&gt;$32/1M input · $64/1M output tokens&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-realtime-translate&lt;/td&gt;
&lt;td&gt;Live speech translation&lt;/td&gt;
&lt;td&gt;$0.034/min&lt;/td&gt;
&lt;td&gt;Translation-only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-realtime-whisper&lt;/td&gt;
&lt;td&gt;Streaming transcription&lt;/td&gt;
&lt;td&gt;$0.017/min&lt;/td&gt;
&lt;td&gt;STT-only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  GPT-Realtime-2: Voice Reasoning for Production Agents
&lt;/h2&gt;

&lt;p&gt;GPT-Realtime-2 is the flagship of the trio. It brings GPT-5-level intelligence into the audio stream: the model can reason through multi-step requests, call functions, handle tool results, and continue speaking — all without pausing the conversation for a round trip to a separate text model.&lt;/p&gt;

&lt;h3&gt;
  
  
  How audio tokens are billed
&lt;/h3&gt;

&lt;p&gt;OpenAI encodes audio duration into tokens rather than sampling audio at a fixed rate. The billing math is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;User speech (input):&lt;/strong&gt; 1 token per 100 ms of audio → 600 tokens per minute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model response (output):&lt;/strong&gt; 1 token per 50 ms of audio → 1,200 tokens per minute&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a typical bidirectional voice call where the user talks roughly as much as the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input cost:  600 tokens × ($32 / 1,000,000) = $0.0192 / min
Output cost: 1,200 tokens × ($64 / 1,000,000) = $0.0768 / min
Total uncached: ~$0.096 / min (~$5.76 / hour)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With prompt caching applied to system instructions and persistent session context, real-world costs can drop to roughly $0.05–$0.10/min according to third-party production estimates published by OpenAI partners.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connecting via WebSocket
&lt;/h3&gt;

&lt;p&gt;The Realtime API uses a persistent WebSocket connection. Every interaction is modeled as an exchange of typed JSON events — the client sends events, the server sends events back. Effloow Lab validated that the client-side event structures serialize and round-trip correctly in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;websockets&lt;/span&gt;

&lt;span class="n"&gt;OPENAI_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# your key
&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;voice_agent_session&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wss://api.openai.com/v1/realtime?model=gpt-realtime-2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenAI-Beta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;realtime=v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;websockets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;additional_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# 1. Configure the session
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session.update&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;modalities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;voice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;alloy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;turn_detection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;server_vad&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;threshold&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;silence_duration_ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lookup_order&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Look up a customer order by ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                            &lt;span class="p"&gt;},&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                        &lt;span class="p"&gt;}&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_choice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}))&lt;/span&gt;

        &lt;span class="c1"&gt;# 2. Stream audio (PCM16, 24kHz, base64-encoded chunks)
&lt;/span&gt;        &lt;span class="c1"&gt;# await ws.send(json.dumps({
&lt;/span&gt;        &lt;span class="c1"&gt;#     "type": "input_audio_buffer.append",
&lt;/span&gt;        &lt;span class="c1"&gt;#     "audio": base64_chunk
&lt;/span&gt;        &lt;span class="c1"&gt;# }))
&lt;/span&gt;
        &lt;span class="c1"&gt;# 3. Listen for server events
&lt;/span&gt;        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;raw_msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response.audio.delta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# stream audio bytes to speaker
&lt;/span&gt;                &lt;span class="k"&gt;pass&lt;/span&gt;
            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response.function_call_arguments.done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# handle tool call, then send result back
&lt;/span&gt;                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;conversation.item.create&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;item&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function_call_output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;}))&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response.create&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;voice_agent_session&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The OpenAI Agents Python SDK (&lt;code&gt;openai-agents&lt;/code&gt;) wraps this pattern into a higher-level &lt;code&gt;RealtimeAgent&lt;/code&gt; class if you prefer avoiding raw WebSocket management. The underlying transport is the same.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tool calls mid-conversation
&lt;/h3&gt;

&lt;p&gt;GPT-Realtime-2 can call functions while speaking. The agent does not stop talking and wait — it continues the audio stream with a phrase like "Let me look that up" while dispatching the tool call in parallel. When the result arrives, it folds it into the ongoing response. This pattern is what makes GPT-Realtime-2 meaningfully different from a text model with TTS bolted on.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interruption handling
&lt;/h3&gt;

&lt;p&gt;Voice activity detection (VAD) is built in when you set &lt;code&gt;turn_detection.type = "server_vad"&lt;/code&gt;. When the user starts speaking mid-response, the API sends a &lt;code&gt;response.cancelled&lt;/code&gt; event, truncates the current audio output, and starts a new inference cycle. The 128K context window means the model retains everything said before the interruption without a context reset.&lt;/p&gt;

&lt;p&gt;Three things to get right in production:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;VAD threshold&lt;/strong&gt; (&lt;code&gt;threshold: 0.5&lt;/code&gt; in the example above) — lower values detect softer speech but increase false triggers in noisy environments. Tune per your deployment channel (phone line vs browser microphone vs call center headset).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silence duration&lt;/strong&gt; (&lt;code&gt;silence_duration_ms&lt;/code&gt;) — how long a pause triggers end-of-turn. 500ms works for conversational speech; customer support scripts may need 700–1000ms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Barge-in state management on your server&lt;/strong&gt; — when &lt;code&gt;response.cancelled&lt;/code&gt; fires, flush any queued tool results from the cancelled turn or you'll deliver stale data to the next response cycle.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  GPT-Realtime-Translate: Live Speech-to-Speech Translation
&lt;/h2&gt;

&lt;p&gt;GPT-Realtime-Translate is a single-purpose model trained on thousands of hours of professional interpreter audio. It takes live speech in any of 70+ input languages, detects the source language automatically, and returns translated speech plus text transcripts in one of 13 output languages.&lt;/p&gt;

&lt;p&gt;Target output languages as of May 2026: Spanish, Portuguese, French, Japanese, Russian, Chinese, German, Korean, Hindi, Indonesian, Vietnamese, Italian, and English.&lt;/p&gt;

&lt;p&gt;The dedicated endpoint is &lt;code&gt;/v1/realtime/translations&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wss://api.openai.com/v1/realtime/translations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;session_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session.update&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_language&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ja&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# target language code
&lt;/span&gt;        &lt;span class="c1"&gt;# source language is auto-detected
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;voice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;alloy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You stream 24 kHz PCM16 audio into &lt;code&gt;input_audio_buffer.append&lt;/code&gt; exactly as you would with GPT-Realtime-2. The model processes input audio while simultaneously streaming translated audio back, which keeps perceived latency low over continuous speech.&lt;/p&gt;

&lt;p&gt;Unlike a general-purpose voice model, GPT-Realtime-Translate will not answer questions or carry on conversation. It is translation-only by design. If a user asks "what time is it?" in French and your output language is English, the model translates the question into English — it does not answer it. Build a routing layer in front if your product needs both translation and reasoning.&lt;/p&gt;

&lt;p&gt;At $0.034/minute, a one-hour multilingual support call costs $2.04 in translation credits. A 30-person conference session with real-time translation for 60 minutes costs around $60 — cheaper than a human interpreter for a short session, and it runs at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  GPT-Realtime-Whisper: Streaming Speech-to-Text
&lt;/h2&gt;

&lt;p&gt;GPT-Realtime-Whisper is the transcription-only model in the trio. It starts producing text output as the speaker talks rather than waiting for an utterance to finish. This keeps the UI feeling responsive — a transcription bar can update word-by-word instead of appearing in blocks.&lt;/p&gt;

&lt;p&gt;Pricing at $0.017/minute makes it among the cheapest options for streaming STT in the OpenAI ecosystem. An eight-hour workday of continuous transcription costs about $8.16.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Whisper Realtime session uses the standard /v1/realtime endpoint
# with model=gpt-realtime-whisper
&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wss://api.openai.com/v1/realtime?model=gpt-realtime-whisper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Server returns transcript deltas as speech is detected:
# { "type": "conversation.item.input_audio_transcription.delta", "delta": "Hello, " }
# { "type": "conversation.item.input_audio_transcription.delta", "delta": "can you hear me?" }
# { "type": "conversation.item.input_audio_transcription.completed", "transcript": "Hello, can you hear me?" }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GPT-Realtime-Whisper is the right choice when you need transcription but not inference — meeting recorders, live captioning systems, accessibility tools, voice-search preprocessing, and call analytics pipelines where a separate LLM processes the transcript downstream.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Application: Choosing the Right Model
&lt;/h2&gt;

&lt;p&gt;The three models are not interchangeable. Use this decision tree:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does your user need a spoken response from the AI?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yes, and it involves reasoning, tool calls, or multi-turn logic → &lt;strong&gt;gpt-realtime-2&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Yes, but it is a direct translation of what another person said → &lt;strong&gt;gpt-realtime-translate&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;No, you only need the text of what the user said → &lt;strong&gt;gpt-realtime-whisper&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A customer support agent that looks up orders and reads statuses aloud: gpt-realtime-2.&lt;br&gt;
A multilingual conference call platform where each attendee hears their own language: gpt-realtime-translate.&lt;br&gt;
A meeting transcription SaaS that feeds into a separate summarizer: gpt-realtime-whisper.&lt;/p&gt;

&lt;p&gt;For hybrid products, you can run models side-by-side. A global customer support pipeline might use gpt-realtime-translate for non-English callers to produce an English transcript, then pass that transcript to a text-only GPT-5 for classification and routing, and only invoke gpt-realtime-2 when the agent needs to speak back. This layering can reduce per-call cost significantly compared to routing all audio through gpt-realtime-2.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Mistakes in Production Voice Agents
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Ignoring prompt caching on system instructions.&lt;/strong&gt; The session configuration message is sent at the start of every WebSocket connection. For long system prompts, this is the largest per-session input cost. OpenAI caches inputs at $0.40/1M tokens vs $32/1M for uncached. Keep your system prompt stable and reuse session configurations where possible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Treating &lt;code&gt;response.cancelled&lt;/code&gt; as an error.&lt;/strong&gt; Interruptions are a normal part of conversation. Your application should handle the cancel event cleanly — flush pending state, log the cancelled turn, and let the model proceed with the new input. Applications that surface interruption events as errors create broken UX and noisy logs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Forgetting that context grows.&lt;/strong&gt; The 128K context window means gpt-realtime-2 can hold a very long conversation without a reset. But it also means costs accumulate. A one-hour conversation with balanced speaking time can push well past $10 in audio tokens alone. For high-volume deployments, consider session time limits or periodic context compaction using a text-model summarization step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using gpt-realtime-2 for transcription-only use cases.&lt;/strong&gt; If you only need the text of what the user said, run gpt-realtime-whisper at $0.017/min instead of gpt-realtime-2 at $0.096+/min. The cost difference is roughly 5–6x.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hard-coding the VAD threshold.&lt;/strong&gt; Different audio channels have different noise floors. A browser tab with a decent microphone is not the same as a phone call over PSTN. Ship a configuration option, even if only for internal deployment channels.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q: Does gpt-realtime-2 use GPT-5 under the hood?
&lt;/h3&gt;

&lt;p&gt;OpenAI describes gpt-realtime-2 as bringing "GPT-5-class reasoning" to live voice, and their Big Bench Audio benchmark shows +15.2% audio intelligence over GPT-Realtime-1.5. OpenAI has not confirmed whether the underlying weights are shared with GPT-5 or whether this is a separate model trained to the same capability level.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can I use the Realtime API from a browser (client-side)?
&lt;/h3&gt;

&lt;p&gt;Yes. OpenAI supports ephemeral session tokens for client-side WebSocket connections. Generate a short-lived token from your backend (&lt;code&gt;POST /v1/realtime/sessions&lt;/code&gt;), pass it to the browser, and open the WebSocket from JavaScript. Do not embed your main API key in client-side code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How does server VAD compare to manual turn detection?
&lt;/h3&gt;

&lt;p&gt;Server VAD (&lt;code&gt;turn_detection.type = "server_vad"&lt;/code&gt;) lets OpenAI's infrastructure handle speech segmentation — it detects when the user stops speaking and triggers inference automatically. Manual turn detection (&lt;code&gt;turn_detection: null&lt;/code&gt;) gives your application full control: you decide when to commit an audio buffer and request a response. Manual mode is more predictable in noisy environments but requires more engineering. Start with server VAD and switch to manual if you hit false-trigger issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Is gpt-realtime-translate available on Azure OpenAI?
&lt;/h3&gt;

&lt;p&gt;Microsoft's Azure AI Foundry announced support for the new realtime audio models including gpt-realtime-whisper and gpt-realtime-translate shortly after the OpenAI release. Check the &lt;a href="https://azure.microsoft.com/en-us/pricing/details/azure-openai/" rel="noopener noreferrer"&gt;Azure OpenAI pricing page&lt;/a&gt; for regional availability and pricing, which may differ from direct OpenAI API pricing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: What audio format does the Realtime API accept?
&lt;/h3&gt;

&lt;p&gt;The API accepts PCM16 audio at 24 kHz, base64-encoded and sent as &lt;code&gt;input_audio_buffer.append&lt;/code&gt; events. Most browser &lt;code&gt;MediaRecorder&lt;/code&gt; APIs require a format conversion step. The OpenAI cookbook includes a &lt;code&gt;realtime_translation_guide&lt;/code&gt; example with a JavaScript AudioWorklet for in-browser PCM16 capture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: What happens if the WebSocket connection drops mid-conversation?
&lt;/h3&gt;

&lt;p&gt;The session state is held server-side for the duration of the connection. If the connection drops, the session is lost — there is no resume or reconnect mechanism as of May 2026. Build reconnect logic in your client and design conversations to be resumable from the last committed turn. Store transcript deltas locally and replay context if a reconnect is needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;The May 2026 Realtime Audio API update is the first time all three voice agent primitives — reasoning, translation, and transcription — are available in a single unified API with clear per-minute or per-token pricing.&lt;/p&gt;

&lt;p&gt;For most developers building voice agents, the practical starting point is gpt-realtime-2 for prototyping and gpt-realtime-whisper for any transcription path that feeds a separate model. GPT-Realtime-Translate is genuinely useful and underpriced compared to traditional translation infrastructure — a multilingual product that previously required third-party translation services can now route entirely through one API.&lt;/p&gt;

&lt;p&gt;The 128K context window and built-in VAD make gpt-realtime-2 a legitimate foundation for production voice agents rather than a demo novelty. The remaining work is on your side: audio channel handling, graceful interruption management, prompt caching discipline, and cost modeling before you scale.&lt;/p&gt;

&lt;p&gt;Bottom Line&lt;br&gt;
  &lt;/p&gt;
&lt;p&gt;OpenAI's three-model voice API split is the right architecture: specialized models at specialized prices, all behind one WebSocket protocol. GPT-Realtime-2 is finally production-ready with 128K context and native tool calling. GPT-Realtime-Whisper at $0.017/min is the new default for any transcription-only pipeline. Build the routing layer between them and you can cover most voice AI use cases without leaving the OpenAI ecosystem.&lt;/p&gt;

</description>
      <category>openai</category>
      <category>voiceai</category>
      <category>realtimeapi</category>
      <category>voiceagents</category>
    </item>
    <item>
      <title>AWS Kiro: Spec-Driven IDE for Agentic Development</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Wed, 13 May 2026 08:20:17 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/aws-kiro-spec-driven-ide-for-agentic-development-5bol</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/aws-kiro-spec-driven-ide-for-agentic-development-5bol</guid>
      <description>&lt;p&gt;There is a quiet argument happening inside every engineering team that uses AI coding tools: should the AI write code directly from a chat prompt, or should it first commit to a plan you can actually verify?&lt;/p&gt;

&lt;p&gt;Cursor and Windsurf answer "write from the prompt." AWS Kiro answers "write the spec first."&lt;/p&gt;

&lt;p&gt;That is not a small difference. It changes what you version, what you review in a pull request, and who on the team can understand what the agent actually built. This guide covers what Kiro does, how the spec workflow is structured, how agent hooks automate the repetitive parts, and where it fits relative to the other agentic IDEs competing for your workflow in 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Kiro Is
&lt;/h2&gt;

&lt;p&gt;Kiro is a desktop IDE built on Code OSS — the open-source base that VS Code also runs on — developed by Amazon Web Services and released to the public in late 2025. It reached general availability in November 2025 after hitting capacity limits within days of its July preview launch.&lt;/p&gt;

&lt;p&gt;The product is positioned as AWS's successor to Amazon Q Developer. AWS ended new Q Developer signups effective May 15, 2026, explicitly directing new users to Kiro. That matters for team context: if your organization is already on AWS and was evaluating Q Developer, Kiro is now the answer.&lt;/p&gt;

&lt;p&gt;The core design principle: &lt;strong&gt;specs are the source of truth, and code is a build artifact derived from them.&lt;/strong&gt; Rather than asking an agent to "add a rate limiter," you write a spec that describes what the rate limiter should do, under what conditions, and what the acceptance criteria are. The agent then generates code to satisfy the spec, not just a prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Spec Workflow
&lt;/h2&gt;

&lt;p&gt;When you start a feature, Kiro creates three structured markdown files under &lt;code&gt;.kiro/specs/{feature-name}/&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;requirements.md&lt;/strong&gt; captures user stories and acceptance criteria using EARS notation (Easy Approach to Requirements Syntax). EARS structures each requirement as a conditional assertion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="err"&gt;WHEN&lt;/span&gt; &lt;span class="err"&gt;the&lt;/span&gt; &lt;span class="err"&gt;user&lt;/span&gt; &lt;span class="err"&gt;submits&lt;/span&gt; &lt;span class="err"&gt;the&lt;/span&gt; &lt;span class="err"&gt;registration&lt;/span&gt; &lt;span class="err"&gt;form&lt;/span&gt;
&lt;span class="err"&gt;THEN&lt;/span&gt; &lt;span class="err"&gt;the&lt;/span&gt; &lt;span class="err"&gt;system&lt;/span&gt; &lt;span class="err"&gt;SHALL&lt;/span&gt; &lt;span class="err"&gt;validate&lt;/span&gt; &lt;span class="err"&gt;the&lt;/span&gt; &lt;span class="err"&gt;email&lt;/span&gt; &lt;span class="err"&gt;format&lt;/span&gt; &lt;span class="err"&gt;before&lt;/span&gt; &lt;span class="err"&gt;saving&lt;/span&gt;
&lt;span class="err"&gt;AND&lt;/span&gt; &lt;span class="err"&gt;the&lt;/span&gt; &lt;span class="err"&gt;system&lt;/span&gt; &lt;span class="err"&gt;SHALL&lt;/span&gt; &lt;span class="err"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;a &lt;/span&gt;422 with field-level errors when validation fails
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That format is not just documentation. It maps directly to testable assertions, which is why Kiro can generate test stubs from requirements with reasonable accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;design.md&lt;/strong&gt; documents the technical architecture for the feature — data models, sequence diagrams in text form, interface contracts, and any relevant infrastructure considerations. This file lives in the repo alongside the feature code, so anyone reviewing a pull request can see the design intent without reconstructing it from the implementation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;tasks.md&lt;/strong&gt; contains a discrete task list that Kiro generates from the requirements and design. Tasks are tracked as in-progress or completed as the agent works through them. You can pause, redirect, or reassign tasks manually; Kiro treats them as a checkpoint-able queue, not a linear script.&lt;/p&gt;

&lt;p&gt;The three-document structure is also the surface where human review happens. Before the agent touches code, you can edit requirements to narrow scope, add edge cases to the design, or reprioritize tasks. That is the mechanism Kiro offers for keeping the human in the loop on complex features without turning every step into a manual approval.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Strengths
&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;Specs survive code refactors — the "why" stays versioned in the repo&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;EARS format produces testable acceptance criteria, not vague prose&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Spec review is a natural code review gate that any team member can participate in&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Free tier (50 requests/month) requires no AWS account or credit card&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Powers bundle MCP servers + hooks into reusable, context-aware packages&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;


Limitations
&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;Spec-first workflow adds planning time — not suited for fast prototyping&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Credits deplete quickly on multi-file specs (community-reported)&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Deep AWS integrations require an AWS account and Bedrock access&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Smaller extension/plugin ecosystem compared to VS Code or Cursor&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Agent Hooks: Automating the Repetitive Parts
&lt;/h2&gt;

&lt;p&gt;One feature that distinguishes Kiro from competitors is its hook system. Hooks are event-driven automations configured in &lt;code&gt;.kiro/hooks/&lt;/code&gt; as JSON files. When a trigger event fires, the hook either runs a natural-language agent prompt or executes a shell command.&lt;/p&gt;

&lt;p&gt;The available triggers as of Kiro's 0.10 changelog:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;file:save&lt;/code&gt; — fires whenever you save a file&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;file:create&lt;/code&gt; — fires when a new file is created&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;task:pre&lt;/code&gt; — fires before a spec task begins executing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;task:post&lt;/code&gt; — fires after a spec task completes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common hook patterns from the official Kiro blog:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trigger"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"file:save"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"match"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/components/**/*.tsx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Update the test file for the component that was just saved. Keep existing test cases; add new ones only for changed behavior."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That hook means you never manually sync your test file after touching a component. The agent does it on save, automatically, every time.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;task:post&lt;/code&gt; hook is useful for quality gates. You can configure it to run linting, type checking, or test execution after each agent task completes — so that a multi-step spec run doesn't silently accumulate broken intermediate states.&lt;/p&gt;

&lt;p&gt;Hooks are committed to the repository, not stored locally in user preferences. That means the automation behavior is consistent across the whole team and survives machine changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kiro Powers and MCP Integration
&lt;/h2&gt;

&lt;p&gt;Kiro supports both local and remote MCP servers. Its differentiated feature here is "Powers" — a packaging concept introduced in changelog 0.10.&lt;/p&gt;

&lt;p&gt;A Power bundles three things into a single installable unit:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An MCP server providing tools&lt;/li&gt;
&lt;li&gt;A steering file that defines when and how to activate those tools&lt;/li&gt;
&lt;li&gt;Optional hooks that automate related tasks&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Powers activate on-demand based on conversation context rather than loading all MCP tools upfront. This keeps the token budget clean: if you are working on a CloudFormation stack, the CloudFormation Power becomes active; the pricing tools stay dormant until they are relevant.&lt;/p&gt;

&lt;p&gt;AWS ships first-party Powers for several of its own platforms: CDK, CloudFormation, Pricing, and HealthOmics workflows. Third-party Powers follow the same packaging spec. If you are building your own MCP server and want it to integrate cleanly with Kiro, the Powers format gives you a structured way to bundle it.&lt;/p&gt;

&lt;p&gt;This is worth comparing to how Cursor handles MCP: Cursor supports MCP servers directly but without the packaging abstraction. All configured servers load simultaneously, and there is no built-in concept of context-aware activation. For teams with many MCP tools, the Powers approach reduces noise at the cost of an additional configuration layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing and Getting Started
&lt;/h2&gt;

&lt;p&gt;Kiro runs on a credit system. One agentic request equals one credit. Plans as of May 2026:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Monthly Credits&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;AWS Account Required&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;Unlimited*&lt;/td&gt;
&lt;td&gt;$20/mo&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro+&lt;/td&gt;
&lt;td&gt;Unlimited*&lt;/td&gt;
&lt;td&gt;$40/mo&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Power&lt;/td&gt;
&lt;td&gt;Unlimited*&lt;/td&gt;
&lt;td&gt;$200/mo&lt;/td&gt;
&lt;td&gt;Optional&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;*Overage credits beyond the plan's included usage cost $0.04 each, billed at month-end.&lt;/p&gt;

&lt;p&gt;To install: download from &lt;a href="https://kiro.dev/downloads/" rel="noopener noreferrer"&gt;kiro.dev/downloads&lt;/a&gt;. The installer is available for macOS, Windows, and Linux. Sign in with GitHub, Google, AWS Builder ID, or IAM Identity Center. No credit card for the free tier.&lt;/p&gt;

&lt;p&gt;Your first project follows this path:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open a folder in Kiro&lt;/li&gt;
&lt;li&gt;Open the Kiro panel and type a feature description in natural language&lt;/li&gt;
&lt;li&gt;Kiro generates &lt;code&gt;.kiro/specs/your-feature/requirements.md&lt;/code&gt; — review and edit it&lt;/li&gt;
&lt;li&gt;Approve the requirements → Kiro generates &lt;code&gt;design.md&lt;/code&gt; and &lt;code&gt;tasks.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Approve the design → Kiro begins working through &lt;code&gt;tasks.md&lt;/code&gt; sequentially&lt;/li&gt;
&lt;li&gt;Hooks run automatically on file saves during implementation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The full quickstart is at &lt;a href="https://kiro.dev/docs/getting-started/first-project/" rel="noopener noreferrer"&gt;kiro.dev/docs/getting-started/first-project/&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Kiro Compares to Cursor and Windsurf
&lt;/h2&gt;

&lt;p&gt;The agentic IDE space has three dominant positions heading into mid-2026:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cursor&lt;/strong&gt; (1M+ daily active users, $20/mo Pro) is the market leader. Its strength is codebase indexing: semantic embeddings of the entire repo, @-file references, and a polished multi-file editing experience. Agent mode handles large refactors well. The weakness is that "prompt → code" means the agent's intent is implicit in the output, not in a verifiable artifact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windsurf&lt;/strong&gt; ($15/mo) targets enterprise teams. Its Cascade feature auto-discovers context without manual file tagging, which works well on large codebases. First-pass success on complex tasks is reported as higher than Cursor's agent mode.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kiro&lt;/strong&gt; is the most opinionated of the three. It trades speed for verifiability. The spec workflow adds 15–30 minutes of upfront planning to any non-trivial feature. In return, you get requirements that you can reference in code review, design decisions that survive refactors, and hooks that keep tests and documentation in sync automatically.&lt;/p&gt;

&lt;p&gt;A useful heuristic: if your team already writes design documents before implementing, Kiro formalizes that workflow and connects it to the code generation loop. If your team goes from Jira ticket straight to code, Kiro will feel like it is adding ceremony without clear return.&lt;/p&gt;

&lt;p&gt;For further context on the broader agentic IDE landscape, see the &lt;a href="https://dev.to/articles/cursor-vs-windsurf-vs-zed-ai-ide-comparison-2026"&gt;cursor vs windsurf vs zed comparison&lt;/a&gt; and the &lt;a href="https://dev.to/articles/best-ai-coding-agents-2026"&gt;best AI coding agents roundup for 2026&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Should Actually Use Kiro
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Good fit:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Teams already on AWS who want AI coding integrated with their existing Bedrock and IAM setup&lt;/li&gt;
&lt;li&gt;Projects where requirements traceability matters: regulated industries, complex APIs, multi-team codebases&lt;/li&gt;
&lt;li&gt;Engineers who write design documents by habit and want to close the gap between the doc and the code&lt;/li&gt;
&lt;li&gt;Anyone evaluating Amazon Q Developer alternatives (Kiro is now the official successor)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Less useful:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Solo developers doing rapid prototyping where the cost of planning exceeds the cost of mistakes&lt;/li&gt;
&lt;li&gt;Projects where the team does not review design artifacts — specs without readers add overhead with no return&lt;/li&gt;
&lt;li&gt;Teams wanting the largest VS Code extension ecosystem (Kiro's is smaller, though growing)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Real Question: Is Spec-Driven Development a Better Default?
&lt;/h2&gt;

&lt;p&gt;The honest answer is that spec-driven development is better for some teams and worse for others — and Kiro does not resolve that ambiguity for you.&lt;/p&gt;

&lt;p&gt;What Kiro does resolve is the artifact gap that exists in every other agentic IDE: the mismatch between what you asked for and what the code actually does, documented nowhere. The spec files live in the repository. When something breaks three months later, you can read what the system was supposed to do instead of reverse-engineering it from the output.&lt;/p&gt;

&lt;p&gt;Whether that is worth the additional workflow overhead depends on how much of your team's time currently goes into maintaining context versus generating new code. For teams where "why does this work this way" is a common question in standups, the spec overhead pays back quickly. For solo builders iterating fast, the overhead stays overhead.&lt;/p&gt;

&lt;p&gt;Kiro's MCP Powers concept is worth watching independently of the spec workflow. Bundling MCP servers with activation context and hooks is a packaging idea that other IDEs will likely adopt — it solves a real problem with how multiple MCP tools currently have to be configured and managed.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q: Does Kiro work without an AWS account?
&lt;/h3&gt;

&lt;p&gt;Yes. The free tier (50 agentic requests/month) and the paid Pro plans ($20/mo) work with GitHub or Google sign-in. An AWS account only becomes relevant if you want to use Bedrock directly or connect to AWS-specific Powers like CloudFormation or CDK.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Are Kiro specs committed to the repository?
&lt;/h3&gt;

&lt;p&gt;Yes. The &lt;code&gt;.kiro/specs/&lt;/code&gt; and &lt;code&gt;.kiro/hooks/&lt;/code&gt; directories are intended to be committed. Specs and hooks are team artifacts, not personal IDE settings. This is deliberate: Kiro's design assumes the spec files are part of the code review surface.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How are Kiro credits consumed?
&lt;/h3&gt;

&lt;p&gt;Each agentic request consumes one credit. Generating a spec from a prompt, executing a task from &lt;code&gt;tasks.md&lt;/code&gt;, or running an agent hook each count as one request. Autocomplete and inline suggestions do not consume credits. On the free tier (50 credits/month), a medium-complexity feature with 8–10 spec tasks plus several hooks will use most of the monthly allowance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: What is the difference between Kiro Powers and regular MCP servers?
&lt;/h3&gt;

&lt;p&gt;A Power is an MCP server plus a steering file plus optional hooks, packaged together. The steering file tells Kiro when to activate the Power's tools based on conversation context. Regular MCP servers load all their tools upfront; Powers load on-demand. The practical difference is a shorter tool list in the agent's context window, which reduces token usage and improves relevance on complex tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Is Kiro open source?
&lt;/h3&gt;

&lt;p&gt;The Kiro codebase repository is at &lt;a href="https://github.com/kirodotdev/Kiro" rel="noopener noreferrer"&gt;github.com/kirodotdev/Kiro&lt;/a&gt;. The IDE is built on Code OSS (VS Code open-source base). The agent runtime and Bedrock integrations are proprietary AWS services.&lt;/p&gt;

&lt;p&gt;Bottom Line&lt;br&gt;
  &lt;/p&gt;
&lt;p&gt;Kiro is the first agentic IDE that makes the design document part of the build process rather than a separate artifact that decays. The spec workflow adds overhead that pays back on team codebases where requirements traceability matters. For solo prototyping or teams that run Cursor smoothly, there is no compelling reason to switch today — but the Powers and hooks concepts are worth watching as patterns the rest of the IDE market will absorb.&lt;/p&gt;

</description>
      <category>awskiro</category>
      <category>agenticide</category>
      <category>specdrivendevelopment</category>
      <category>developertools</category>
    </item>
    <item>
      <title>Claude Agent SDK Practical Guide — Building Tool-Using AI Agents from Scratch</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Wed, 13 May 2026 06:41:24 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/claude-agent-sdk-practical-guide-building-tool-using-ai-agents-from-scratch-448b</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/claude-agent-sdk-practical-guide-building-tool-using-ai-agents-from-scratch-448b</guid>
      <description>&lt;p&gt;I ran into the Tool Use moment while building a FastAPI streaming backend with the Claude API. The trigger was simple: a user asked "how many days are left in this year?" and Claude answered wrong. Not just wrong — confidently wrong. I remember thinking, "OK, a chatbot can't handle this."&lt;/p&gt;

&lt;p&gt;Tool Use fixes that structurally. Instead of the model calculating directly, it calls a calculation function and uses the result to answer. That difference is what separates a chatbot from an agent.&lt;/p&gt;

&lt;p&gt;This guide covers the Tool Use patterns I validated by directly installing and running anthropic SDK 0.101.0. From basic tool definitions to the agentic loop, error handling, and cost — practical code you can actually use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Tool Use Is Different from a Chatbot — The Structural Gap
&lt;/h2&gt;

&lt;p&gt;An LLM samples tokens from a probability distribution. Tasks like date arithmetic, precise numerical calculations, or live API lookups are structurally unreliable — the model recreates patterns from training data, not ground truth.&lt;/p&gt;

&lt;p&gt;Tool Use addresses this at a different layer. The model decides &lt;em&gt;what to do&lt;/em&gt;, and actual execution is delegated to external code. Instead of computing directly, the model emits something like &lt;code&gt;calculate("365 - today.day_of_year")&lt;/code&gt;, and Python runs it and returns the result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Chatbot: model answers directly
# "Doesn't know today's date, has to compute directly -&amp;gt; can be wrong"
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How many days left in this year?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Agent: delegates to a tool
# "Model picks the tool, Python computes accurately"
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# includes date calculation tool
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How many days left in this year?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The decisive difference is reliability. Python's &lt;code&gt;datetime&lt;/code&gt; module doesn't get dates wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup — Sandbox Verification Results
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate  &lt;span class="c"&gt;# Windows: venv\Scripts\activate&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;anthropic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results from running this directly in a temp directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;anthropic version: 0.101.0
Client instantiated: ✓
Client type: Anthropic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;0.101.0 is the latest as of 2026-05-13. This is the official Anthropic SDK — completely different from packages like &lt;code&gt;pyautogen&lt;/code&gt; that were common before 2025.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# or set ANTHROPIC_API_KEY env var
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SDK auto-loads the API key from &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt;. Don't hard-code it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Defining Your First Tool — JSON Schema Is All You Need
&lt;/h2&gt;

&lt;p&gt;Tool Use uses a structure similar to OpenAI Function Calling. Each tool has three parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;name&lt;/code&gt;: Tool identifier (like a function name)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;description&lt;/code&gt;: The basis for the model's decision on when to use this tool&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;input_schema&lt;/code&gt;: JSON Schema for input parameters
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_current_date_info&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Returns current date and time information. Use for questions about &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;today&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;now&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, or anything requiring current date knowledge.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timezone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IANA timezone (e.g. America/New_York, Asia/Seoul). Default: UTC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calculate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Performs mathematical operations. Handles addition, subtraction, multiplication, division, exponentiation, and modulo.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;operation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;add&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subtract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;multiply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;divide&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;power&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;modulo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The operation to perform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;First operand&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Second operand&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;operation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;description&lt;/code&gt; field matters more than it looks. The model reads only the description to decide whether to use this tool. When I tested with vague descriptions, the model picked the wrong tool or skipped it entirely.&lt;/p&gt;

&lt;p&gt;Validated tool definition structure from my sandbox:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;get_current_date_info&lt;/span&gt;
  &lt;span class="s"&gt;Description&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Returns current date info&lt;/span&gt;
  &lt;span class="s"&gt;Required params&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[]&lt;/span&gt;

&lt;span class="na"&gt;Tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;calculate&lt;/span&gt;
  &lt;span class="s"&gt;Description&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Performs math operations&lt;/span&gt;
  &lt;span class="s"&gt;Required params&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;operation'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;b'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Implementing the Agentic Loop — The Core of Tool Use
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/..%2F..%2F..%2Fassets%2Fblog%2Fclaude-agent-sdk-tool-use-complete-guide-2026%2Fagentic-loop.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/..%2F..%2F..%2Fassets%2Fblog%2Fclaude-agent-sdk-tool-use-complete-guide-2026%2Fagentic-loop.png" alt="Agentic loop diagram — flow from user message through tool execution to result return"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the core. Tool Use doesn't finish in a single API call. When the model calls a tool → we execute it → we feed the result back. This cycle repeats until the model returns &lt;code&gt;end_turn&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# No tool call — return the final answer
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_turn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

        &lt;span class="c1"&gt;# Handle tool calls
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Add the full assistant response to messages (including tool calls)
&lt;/span&gt;            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

            &lt;span class="c1"&gt;# Collect all tool results and add together
&lt;/span&gt;            &lt;span class="n"&gt;tool_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;process_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;})&lt;/span&gt;

            &lt;span class="c1"&gt;# Tool results go under the "user" role (API requirement)
&lt;/span&gt;            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max iterations exceeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things are easy to miss here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt;, add the entire &lt;code&gt;response.content&lt;/code&gt; to messages — not just the text block. The model needs to know which tool it called in order to generate its next response correctly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second&lt;/strong&gt;, tool results go under the &lt;code&gt;user&lt;/code&gt; role. Counterintuitive, but the API treats tool execution results as coming from the environment (the user side), not the assistant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Real Tools — Calculator, Date, File Reader
&lt;/h2&gt;

&lt;p&gt;The tool execution function is straightforward. It takes a name and input, returns a string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytz&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;

&lt;span class="c1"&gt;# Safe math — uses operator mapping instead of string expression execution
&lt;/span&gt;&lt;span class="n"&gt;SAFE_OPERATIONS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;add&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subtract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;multiply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mul&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;divide&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;truediv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;power&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;modulo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mod&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_current_date_info&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tz_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timezone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UTC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;tz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pytz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tz_str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tz&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;day_of_year&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timetuple&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;tm_yday&lt;/span&gt;
            &lt;span class="n"&gt;days_remaining&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;365&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;day_of_year&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%H:%M:%S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timezone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tz_str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;day_of_year&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;day_of_year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;days_remaining_in_year&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;days_remaining&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;

    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calculate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;op_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;operation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;op_func&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SAFE_OPERATIONS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;op_func&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Unknown operation: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;op_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;op_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;divide&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Cannot divide by zero&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;op_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
        &lt;span class="n"&gt;filepath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Path traversal prevention: only allow within designated base directory
&lt;/span&gt;        &lt;span class="n"&gt;allowed_base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/app/data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;abs_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;realpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;abs_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;allowed_base&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Path not allowed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;abs_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 2KB limit
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;FileNotFoundError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: File not found: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Unknown tool: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Actual sandbox results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;calculate(multiply, 15, 7) = 105
calculate(add, 105, 3) = 108
calculate(divide, 100, 4) = 25.0
Input validation (required field present): True
Input validation (missing required field): False — Missing required field: location
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The error classification strategy from the &lt;a href="https://dev.to/en/blog/en/fastapi-claude-api-streaming-production-guide-2026"&gt;FastAPI + Claude API streaming guide&lt;/a&gt; applies here too — categorize tool errors as retryable vs. non-retryable for better production stability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling Multiple Tool Calls — Can We Run in Parallel?
&lt;/h2&gt;

&lt;p&gt;Claude can call multiple tools simultaneously in a single turn. Ask "compare the weather in Seoul and Tokyo" and it returns two &lt;code&gt;get_weather&lt;/code&gt; calls at once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# When Claude calls multiple tools in one turn
&lt;/span&gt;&lt;span class="n"&gt;tool_use_blocks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Technically possible to run in parallel
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;concurrent.futures&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;as_completed&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;futures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;process_tool_call&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_use_blocks&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;tool_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;as_completed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sandbox-verified multi-tool results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tool_result"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"tool_use_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tool_1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"25.0"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tool_result"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"tool_use_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tool_2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;temp&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: 18, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;condition&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Sunny&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'd only apply parallel execution to idempotent read tools. External API calls with side effects need careful rate-limit and ordering consideration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Error Handling — Failing Gracefully
&lt;/h2&gt;

&lt;p&gt;When a tool fails, return &lt;code&gt;is_error: true&lt;/code&gt;. The model reads this, recognizes the error, and either tries something else or gives the user contextual guidance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;safe_process_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Tool execution with error handling. Returns (content, is_error).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;process_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;error_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool execution failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;is_error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;safe_process_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;tool_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;is_error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;tool_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;is_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;is_error: true&lt;/code&gt; is set, the model doesn't just skip past it. From my testing, it reads the error content and responds with something like "The file couldn't be found — please double-check the path." Returning empty strings or ignoring errors tends to produce confused or hallucinated responses.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Cost of Tool Use — How Many Tokens Does It Add?
&lt;/h2&gt;

&lt;p&gt;Honestly, Tool Use costs more. According to Anthropic's documentation, each tool definition adds roughly 200–300 tokens of overhead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;5 tool definitions → ~1,250 tokens fixed overhead (every request)
1 tool call → additional input + output tokens
3-turn agentic loop → accumulating context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agentic loop accumulates context. After 5 turns, everything from the first message to the fifth tool result is in context. Costs can compound quickly in long-running agents.&lt;/p&gt;

&lt;p&gt;Two ways to manage this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Combine with Prompt Caching&lt;/strong&gt;: Tool definitions are the same on every request. As covered in the &lt;a href="https://dev.to/en/blog/en/claude-api-prompt-caching-cost-optimization-guide"&gt;Claude API Prompt Caching guide&lt;/a&gt;, caching the system prompt with &lt;code&gt;cache_control: {"type": "ephemeral"}&lt;/code&gt; applies here too, and tool definitions benefit similarly from repeated identical structures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Pass only the tools you need&lt;/strong&gt;: Always including 10 tool definitions is worse than passing the 2–3 that matter for the current task. More tools consume more tokens and occasionally lead the model to pick the wrong one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Streaming Tool Use
&lt;/h2&gt;

&lt;p&gt;Tool Use works with streaming responses. In anthropic 0.101.0, use &lt;code&gt;client.messages.stream&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Stream text chunks in real time
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;text_chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Get the final message after streaming completes
&lt;/span&gt;    &lt;span class="n"&gt;final_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_final_message&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;final_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# ... same handling as above
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When streaming with tool use: if you're showing text chunks to the user in real time and also need to process tool calls, design the UX flow before you start. The &lt;a href="https://dev.to/en/blog/en/vercel-ai-sdk-claude-streaming-agent-2026"&gt;Vercel AI SDK approach&lt;/a&gt; is worth looking at to see how this gets abstracted on the frontend side.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production Pattern: GitHub Issue Monitor Agent
&lt;/h2&gt;

&lt;p&gt;A complete example tying everything together — a simple agent that fetches and summarizes GitHub issues:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# reads ANTHROPIC_API_KEY
&lt;/span&gt;
&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list_github_issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fetches the issue list for a GitHub repository.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;owner/repo format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;open&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;closed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max issues to return (default: 10)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_issue_detail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fetches the details of a specific GitHub issue.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;owner/repo format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issue_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Issue number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issue_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list_github_issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Real impl: requests.get(f"https://api.github.com/repos/{repo}/issues", ...)
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TypeError in data processor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;open&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;41&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add streaming support&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;open&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_issue_detail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issue_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reproduce: pass an empty list as input. Stack trace attached.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;comments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unknown tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_issue_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_turn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="n"&gt;tool_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;process_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loop limit exceeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's Still Unresolved — Honest Limitations
&lt;/h2&gt;

&lt;p&gt;Here's what I find genuinely frustrating about Tool Use in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context accumulation&lt;/strong&gt;: The agentic loop keeps growing the context. After 10 turns, everything from the first message to the tenth tool result is in there. Long-running agents need a context management strategy — summarize intermediate results, prune stale messages — and there's no standard pattern for this yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-deterministic tool selection&lt;/strong&gt;: Same question, different tool selection on different runs. Even with &lt;code&gt;temperature=0&lt;/code&gt;, you can't guarantee identical behavior across invocations. This makes testing harder than it should be.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Description quality is everything&lt;/strong&gt;: Vague &lt;code&gt;description&lt;/code&gt; → wrong tool selection or no tool use at all. Writing good tool descriptions is its own prompt engineering discipline. No framework solves this for you.&lt;/p&gt;

&lt;p&gt;I think Tool Use is underappreciated. Agent frameworks offer impressive abstractions, but this pattern is what's running underneath all of them. &lt;a href="https://dev.to/en/blog/en/pydantic-ai-type-safe-agent-tutorial-2026"&gt;PydanticAI's type-safe tool definitions&lt;/a&gt; are a convenient layer that auto-generates the JSON schema, but understanding the underlying mechanism is what gets you unstuck when things break.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Validated findings from anthropic 0.101.0:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool definitions&lt;/strong&gt;: &lt;code&gt;name&lt;/code&gt; + &lt;code&gt;description&lt;/code&gt; + &lt;code&gt;input_schema&lt;/code&gt;. Description quality determines whether the tool gets used correctly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic loop&lt;/strong&gt;: Detect &lt;code&gt;stop_reason == "tool_use"&lt;/code&gt; → execute tool → append &lt;code&gt;tool_result&lt;/code&gt; → repeat. Simple pattern, but the message structure has to be exactly right.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error handling&lt;/strong&gt;: Use &lt;code&gt;is_error: true&lt;/code&gt; so the model recognizes failures and responds appropriately. Never return empty strings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: ~250 tokens overhead per tool definition. Combine with Prompt Caching. Watch context accumulation in multi-turn agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel tool calls&lt;/strong&gt;: &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; works for idempotent read tools. Apply selectively.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tool Use is the most direct path from chatbot to agent. You don't need a complex framework — this pattern alone is enough to build practical agents.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>anthropicsdk</category>
      <category>tooluse</category>
      <category>agents</category>
    </item>
    <item>
      <title>A-Mem: Agentic Memory for LLM Agents Explained</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Wed, 13 May 2026 04:18:18 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/a-mem-agentic-memory-for-llm-agents-explained-454e</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/a-mem-agentic-memory-for-llm-agents-explained-454e</guid>
      <description>&lt;p&gt;Your agent forgets everything between sessions. You bolt on a vector database, retrieve the top-5 similar chunks at query time, and call it memory. It works — until the agent needs to reason across multiple related memories it cannot connect on the fly, or until a new fact should change how it interprets older ones.&lt;/p&gt;

&lt;p&gt;That is the problem A-Mem (Agentic Memory for LLM Agents, arXiv:2502.12110) was built to solve. Accepted at NeurIPS 2025, A-Mem introduces a memory system where the agent actively organizes, links, and evolves its memories on write — not just at retrieval time. The result is a system that handles multi-hop reasoning tasks at roughly six times the accuracy of standard vector retrieval baselines on the LoCoMo benchmark.&lt;/p&gt;

&lt;p&gt;Effloow Lab inspected the paper, codebase (MIT license, GitHub: agiresearch/A-mem), and documented the architecture. This guide explains what A-Mem does differently and when it is worth reaching for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Static Memory Systems Fall Short
&lt;/h2&gt;

&lt;p&gt;Most agent memory setups follow the same pattern: embed a document or conversation turn, store it in a vector database, retrieve by cosine similarity at query time. The pattern is fast and simple, but it has three structural weaknesses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Weak multi-hop reasoning.&lt;/strong&gt; If memory A is about "Redis sorted sets" and memory B is about "leaderboard query optimization," a query about "how to build a fast leaderboard" may retrieve either memory but not both in the right relationship. The agent has to reconstruct the connection itself — often unreliably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No retroactive updating.&lt;/strong&gt; When you add a new memory that changes the interpretation of an older one, the old memory stays unchanged. The agent may retrieve the old, stale context and draw the wrong conclusion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fixed retrieval patterns.&lt;/strong&gt; Standard RAG requires you to predefine how memories are accessed: top-k by similarity, keyword filter, or graph traversal. Each new task type may need a new access pattern that you have not engineered.&lt;/p&gt;

&lt;p&gt;Graph-enhanced RAG systems (like MemGPT) address the third problem partially by adding explicit entity-relationship graphs, but they still rely on a predefined schema. A-Mem addresses all three by making memory organization an active, agentic process rather than a fixed retrieval mechanism. (For a practical foundation on building RAG pipelines before layering on agentic memory, see &lt;a href="https://dev.to/articles/build-rag-app-python-llamaindex-tutorial-2026"&gt;Build a RAG App with LlamaIndex&lt;/a&gt;.)&lt;/p&gt;

&lt;h2&gt;
  
  
  What A-Mem Is
&lt;/h2&gt;

&lt;p&gt;A-Mem treats memory the way a thoughtful knowledge worker treats a Zettelkasten — a note-taking methodology where every note is a structured unit linked to related notes. Rather than storing raw text and embedding it once, A-Mem constructs a rich note for each memory, analyzes its relationship to existing memories, creates explicit links, and can update existing notes when new knowledge changes the picture.&lt;/p&gt;

&lt;p&gt;The project is open-source under the MIT license and was accepted at NeurIPS 2025. The primary repositories are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Official: &lt;a href="https://github.com/agiresearch/A-mem" rel="noopener noreferrer"&gt;github.com/agiresearch/A-mem&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Paper author mirror: &lt;a href="https://github.com/WujiangXu/A-mem" rel="noopener noreferrer"&gt;github.com/WujiangXu/A-mem&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Community MCP server extension: &lt;a href="https://github.com/tobs-code/a-mem-mcp-server" rel="noopener noreferrer"&gt;github.com/tobs-code/a-mem-mcp-server&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Core Architecture: Three Operations
&lt;/h2&gt;

&lt;p&gt;A-Mem's architecture centers on three operations that run every time a new memory is added.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Note Construction
&lt;/h3&gt;

&lt;p&gt;When a new piece of information enters the system — a conversation turn, a tool result, an observation — A-Mem does not just embed and store it. It generates a structured note containing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Contextual description&lt;/strong&gt;: a short LLM-generated summary that captures the meaning, not just the surface text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keywords and tags&lt;/strong&gt;: structured labels for categorical retrieval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding vector&lt;/strong&gt;: stored in ChromaDB for similarity search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This enrichment step is the first departure from vanilla RAG. The embedding is of a richer, LLM-synthesized representation rather than raw text.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Link Generation
&lt;/h3&gt;

&lt;p&gt;After note construction, A-Mem scans the existing memory store for related notes. When meaningful semantic overlap exists — shared keywords, similar contextual descriptions, or high embedding similarity — it creates an explicit directed link between the notes. These links are stored in a NetworkX graph alongside the ChromaDB vector store.&lt;/p&gt;

&lt;p&gt;The combination of ChromaDB (vector similarity) and NetworkX (graph traversal) means the system can answer both "what is similar to this?" (ChromaDB) and "what is connected to this?" (graph walk) without choosing one or the other.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Memory Evolution
&lt;/h3&gt;

&lt;p&gt;This is A-Mem's most distinctive operation. When a new memory is integrated, the system checks whether any existing linked memories should be updated. If the new information changes or deepens the context of an older note, the older note's contextual description is rewritten to reflect the new understanding.&lt;/p&gt;

&lt;p&gt;Consider an agent that first learns "the team uses Redis for session storage" and later learns "the team is migrating from Redis to Valkey for cost reasons." With vanilla RAG, both facts sit independently. With A-Mem, the second memory triggers an evolution of the first: its contextual description is updated to reflect that this is an in-progress migration, not a stable architecture decision.&lt;/p&gt;

&lt;p&gt;This makes A-Mem's memory graph a living structure — not an append-only log.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage Backend
&lt;/h2&gt;

&lt;p&gt;The implementation combines two storage layers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vector store&lt;/td&gt;
&lt;td&gt;ChromaDB&lt;/td&gt;
&lt;td&gt;Fast approximate similarity search on enriched embeddings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graph store&lt;/td&gt;
&lt;td&gt;NetworkX&lt;/td&gt;
&lt;td&gt;Explicit inter-memory links for multi-hop traversal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM backend&lt;/td&gt;
&lt;td&gt;OpenAI / other&lt;/td&gt;
&lt;td&gt;Note enrichment, link scoring, evolution reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;ChromaDB handles retrieval when you query by concept similarity. NetworkX handles traversal when the agent needs to follow a chain of related memories. The LLM backend drives the intelligent parts: note enrichment, deciding which links to create, and whether evolution should happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark Results on LoCoMo
&lt;/h2&gt;

&lt;p&gt;A-Mem's paper evaluates on the LoCoMo (Long Conversational Memory) benchmark, a dataset of long-form conversations designed to test multi-session memory recall. The multi-hop category is most revealing — these are questions that require reasoning across two or more distinct stored memories.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Multi-Hop ROUGE-L&lt;/th&gt;
&lt;th&gt;Temporal Reasoning F1&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LoCoMo baseline&lt;/td&gt;
&lt;td&gt;4.68&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ReadAgent&lt;/td&gt;
&lt;td&gt;2.81&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MemGPT (GPT-4o-mini)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;25.52&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A-Mem (Qwen2.5-15b)&lt;/td&gt;
&lt;td&gt;27.23&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A-Mem (GPT-4o-mini)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;45.85&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The multi-hop ROUGE-L improvement with Qwen2.5-15b is roughly 5.8x over the LoCoMo baseline (27.23 vs 4.68). On temporal reasoning tasks with GPT-4o-mini, A-Mem reaches 45.85 F1 against MemGPT's 25.52 — nearly double. These gains are structural, not prompt tricks: they come from having precomputed the links between related memories at write time, so the agent does not need to reconstruct connections at query time under token pressure.&lt;/p&gt;

&lt;p&gt;A-Mem's multi-hop advantage is more pronounced than its gains on simpler single-fact retrieval. Open Domain tasks — where the question maps to a single stored fact — show improvements too, but smaller. This tells you something important about when to use A-Mem: it earns its complexity for tasks that require chaining related facts, not for simple key-value lookups.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Use A-Mem
&lt;/h2&gt;

&lt;p&gt;The project is installed from source. The core API is straightforward once the dependencies are in place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Installation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/agiresearch/A-mem
&lt;span class="nb"&gt;cd &lt;/span&gt;A-mem
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Dependencies include &lt;code&gt;chromadb&lt;/code&gt;, &lt;code&gt;networkx&lt;/code&gt;, and an LLM backend (OpenAI by default, but the backend is configurable).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Initializing the memory system:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgenticMemorySystem&lt;/span&gt;

&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgenticMemorySystem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# Embedding model (SentenceTransformers)
&lt;/span&gt;    &lt;span class="n"&gt;llm_backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;llm_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;           &lt;span class="c1"&gt;# Used for note enrichment + evolution
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;model_name&lt;/code&gt; controls the embedding model. &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; is a compact, fast option. For higher quality embeddings, substitute a larger SentenceTransformers model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adding a memory:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Simple content
&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_note&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Learned that batch size of 16 reduces GPU OOM errors on A100s.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# With metadata
&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_note&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Redis sorted sets are efficient for leaderboard queries.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Engineering&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;202503021500&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every &lt;code&gt;add_note&lt;/code&gt; call triggers the full Note Construction → Link Generation → Memory Evolution pipeline. The call blocks while the LLM enriches the note and evaluates links, so latency is higher than a plain vector insert. This is the write cost you pay for smarter retrieval.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retrieving memories:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;database performance optimization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The search returns notes ordered by relevance, now including notes that are linked to the top matches — so a query about "database performance" can surface both the Redis sorted sets note and a linked note about index strategy, even if the latter does not match the query embedding closely on its own.&lt;/p&gt;

&lt;h2&gt;
  
  
  A-Mem vs. Other Memory Systems
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Vanilla RAG&lt;/th&gt;
&lt;th&gt;MemGPT&lt;/th&gt;
&lt;th&gt;Mem0&lt;/th&gt;
&lt;th&gt;A-Mem&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage type&lt;/td&gt;
&lt;td&gt;Vector only&lt;/td&gt;
&lt;td&gt;Vector + graph (schema)&lt;/td&gt;
&lt;td&gt;Fact extraction&lt;/td&gt;
&lt;td&gt;Vector + graph (dynamic)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write-time enrichment&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Yes (facts)&lt;/td&gt;
&lt;td&gt;Yes (full note + links)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory evolution&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-hop reasoning&lt;/td&gt;
&lt;td&gt;Weak&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Weak&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write latency&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High (LLM call per write)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema flexibility&lt;/td&gt;
&lt;td&gt;None needed&lt;/td&gt;
&lt;td&gt;Predefined&lt;/td&gt;
&lt;td&gt;Fact-based&lt;/td&gt;
&lt;td&gt;Fully flexible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Static corpora&lt;/td&gt;
&lt;td&gt;Structured entities&lt;/td&gt;
&lt;td&gt;Fact-heavy chat&lt;/td&gt;
&lt;td&gt;Multi-session reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Mem0 (which uses a fact extraction pattern and scores 66.9% on LOCOMO) is a reasonable middle ground for production: lower write latency than A-Mem, better multi-hop than vanilla RAG. A-Mem wins on the hardest multi-hop tasks but at a real cost: every write requires an LLM call for enrichment and link evaluation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Mistakes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Using A-Mem for simple key-value lookups.&lt;/strong&gt; If your agent stores "user prefers dark mode" and retrieves it verbatim, a plain vector store is faster and sufficient. A-Mem's overhead is only justified when you need cross-memory reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ignoring write latency in production.&lt;/strong&gt; The note enrichment LLM call is synchronous in the base implementation. For high-throughput applications, this needs to be moved to an async queue. The community MCP server (tobs-code/a-mem-mcp-server) is one starting point for integration patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choosing the wrong embedding model.&lt;/strong&gt; &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; is fast but loses nuance for specialized domains (code, legal text, medical). For domain-specific agents, use a domain-adapted embedding model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not monitoring memory graph growth.&lt;/strong&gt; As the note graph grows, link evaluation cost scales. For agents running thousands of sessions, you need a graph pruning strategy. The paper does not fully address this; it is an open implementation concern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expecting zero-shot plugin behavior.&lt;/strong&gt; A-Mem requires a different design philosophy than RAG. You need to think in terms of notes and links, not documents and embeddings. Teams that treat it as a drop-in RAG replacement will not see the multi-hop gains.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q: How does A-Mem compare to MemMachine?
&lt;/h3&gt;

&lt;p&gt;MemMachine (see &lt;a href="https://dev.to/articles/memmachine-ground-truth-agent-memory-paper-poc-2026"&gt;Effloow's MemMachine guide&lt;/a&gt;) focuses on ground-truth-preserving memory: it ensures memories are never silently corrupted or overwritten without provenance. A-Mem focuses on dynamic organization and cross-memory evolution. They address different failure modes — A-Mem solves the multi-hop reasoning gap, MemMachine solves the reliability gap. The two approaches are complementary rather than competing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Is A-Mem ready for production use?
&lt;/h3&gt;

&lt;p&gt;A-Mem is an MIT-licensed research implementation, not a managed service. The GitHub codebase is functional and documented, but it has not been stress-tested at enterprise scale. For production use, you would need to wrap it in an async worker queue, add monitoring, and handle ChromaDB persistence and backup. Teams who want the architecture without the ops overhead should watch for managed implementations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How does A-Mem compare to Mem0 for agent memory?
&lt;/h3&gt;

&lt;p&gt;Mem0 uses a fact-extraction approach: it identifies discrete facts from conversations and stores them as atomic units. This is efficient and production-friendly, scoring 66.9% on LOCOMO. A-Mem builds richer structured notes and evolves them — winning on multi-hop tasks but with higher write cost. If your agent needs to chain across multiple related memories, A-Mem has a structural advantage. For simpler recall, Mem0's lower latency is more practical.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Does A-Mem work with local LLMs?
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;llm_backend&lt;/code&gt; parameter is configurable. The codebase supports OpenAI out of the box and can be adapted to other backends. For local LLMs (Ollama, vLLM, LM Studio), you would configure an OpenAI-compatible endpoint. Note enrichment quality depends on the LLM: a stronger model produces better contextual descriptions and more accurate link decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: What is the LoCoMo benchmark?
&lt;/h3&gt;

&lt;p&gt;LoCoMo (Long Conversational Memory) is a dataset of long-form multi-session conversations designed to test whether memory systems can recall facts and relationships across extended interactions. The multi-hop subset specifically tests questions that require connecting two or more stored facts. It is the primary benchmark used in the A-Mem paper.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: What is memory evolution and when does it trigger?
&lt;/h3&gt;

&lt;p&gt;Memory evolution is the process by which A-Mem updates the contextual description of an existing note when a new, related note is added. It triggers when the system determines — via LLM evaluation — that the new memory meaningfully changes the interpretation of an existing linked memory. In practice, this is most useful in long-running agents where knowledge compounds over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A-Mem (NeurIPS 2025, arXiv:2502.12110) builds structured, evolving memory graphs for LLM agents using Zettelkasten-inspired note construction.&lt;/li&gt;
&lt;li&gt;The three core operations — Note Construction, Link Generation, Memory Evolution — happen at write time, not retrieval time.&lt;/li&gt;
&lt;li&gt;On the LoCoMo benchmark multi-hop tasks, A-Mem achieves roughly 5.8x better ROUGE-L than the standard vector baseline with GPT-4o-mini.&lt;/li&gt;
&lt;li&gt;Storage uses ChromaDB for vector similarity and NetworkX for graph traversal, giving both similarity search and relationship-aware retrieval.&lt;/li&gt;
&lt;li&gt;The write latency cost (LLM call per memory) is real: A-Mem is not a drop-in replacement for RAG. It is a deliberate upgrade for agents where multi-session, multi-hop reasoning quality matters.&lt;/li&gt;
&lt;li&gt;The codebase is MIT-licensed on GitHub and installable from source.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bottom Line&lt;br&gt;
  &lt;/p&gt;
&lt;p&gt;A-Mem solves the multi-hop memory problem that vanilla RAG cannot — by making memory organization agentic at write time rather than patchwork at query time. If your agent needs to reason across sessions and chain related facts reliably, the architecture is worth the added write latency. For simpler recall tasks, stick with Mem0 or a plain vector store.&lt;/p&gt;

</description>
      <category>agentmemory</category>
      <category>llmagents</category>
      <category>zettelkasten</category>
      <category>chromadb</category>
    </item>
    <item>
      <title>Cloudflare Project Think: Durable Agent Runtime Guide</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Wed, 13 May 2026 01:13:13 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/cloudflare-project-think-durable-agent-runtime-guide-7id</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/cloudflare-project-think-durable-agent-runtime-guide-7id</guid>
      <description>&lt;p&gt;Most AI agents on serverless platforms share the same fatal flaw: they can't survive a restart. If the underlying worker crashes or cold-starts mid-task, the agent's progress disappears. The typical workaround is to keep tasks short and stateless — which means you cannot run a 10-minute research loop, a multi-file refactor, or an autonomous investigation that makes 50 external calls.&lt;/p&gt;

&lt;p&gt;Cloudflare's Project Think, announced during Agents Week 2026 (April 2026), is a direct answer to that constraint. It ships a set of primitives — fiber checkpointing, sub-agents, a persistent Session API, and a 5-tier execution ladder — all wired into an opinionated base class (&lt;code&gt;@cloudflare/think&lt;/code&gt;) that runs on Durable Objects.&lt;/p&gt;

&lt;p&gt;Effloow Lab inspected the SDK packages, confirmed installability, and traced the API surface from official docs and the open-source &lt;code&gt;cloudflare/agents&lt;/code&gt; repository. The following is a source-based guide to how Project Think works and when to use it. See &lt;code&gt;data/lab-runs/cloudflare-project-think-durable-agent-runtime-2026.md&lt;/code&gt; for the full evidence note.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Serverless Agents Break — and Why Project Think Fixes It
&lt;/h2&gt;

&lt;p&gt;A standard Cloudflare Worker is a request handler: it starts, does work, returns a response, and dies. Cloudflare Workflows added durable multi-step execution, but the state machine is managed outside your code and requires a separate infrastructure primitive.&lt;/p&gt;

&lt;p&gt;Project Think takes a different approach. Each agent runs inside a &lt;strong&gt;Durable Object&lt;/strong&gt; — a stateful micro-server with its own SQLite database, WebSocket connections, and scheduling. That alone gives agents persistence. But Project Think goes further by introducing &lt;strong&gt;fibers&lt;/strong&gt;: durable invocations that can checkpoint their own instruction pointer directly into the co-located SQLite database.&lt;/p&gt;

&lt;p&gt;The practical result: an agent can run a 30-step task, checkpoint after each step, survive a server restart, and resume exactly where it left off — without any external workflow orchestrator.&lt;/p&gt;

&lt;p&gt;This is the critical architectural distinction from Cloudflare Dynamic Workers (covered in an &lt;a href="https://dev.to/articles/cloudflare-dynamic-workers-ai-sandbox-guide-2026"&gt;earlier Effloow article on Dynamic Workers&lt;/a&gt;), which handle sandboxed code execution but are stateless by design. Project Think layers durable execution on top of the full Cloudflare platform stack.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Five Primitives of Project Think
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Fibers — Checkpointed Execution
&lt;/h3&gt;

&lt;p&gt;The fiber is the foundational primitive. Unlike a regular async function, a fiber can call &lt;code&gt;ctx.stash()&lt;/code&gt; to serialize the current state of its local variables into SQLite. If the Durable Object restarts, &lt;code&gt;runFiber&lt;/code&gt; rehydrates from the last stash point.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;runFiber&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cloudflare/think&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ResearchAgent&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Think&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;onTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;runFiber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;searchWeb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stash&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;sources&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;         &lt;span class="c1"&gt;// checkpoint 1&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;summaries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stash&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;summaries&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt; &lt;span class="c1"&gt;// checkpoint 2&lt;/span&gt;

      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;synthesize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;summaries&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each &lt;code&gt;ctx.stash()&lt;/code&gt; call writes to the Durable Object's SQLite database. On resume, the fiber fast-forwards to the last stash point. For long-horizon tasks — multi-file code reviews, iterative search loops, automated report generation — this removes the "start over" failure mode entirely.&lt;/p&gt;

&lt;p&gt;Fibers also include automatic keepalive for long-running operations and handle non-deterministic workloads that would time out in a standard Worker context.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Sub-Agents (Facets) — Isolated Child Agents with Typed RPC
&lt;/h3&gt;

&lt;p&gt;Project Think supports spawning child agents as &lt;strong&gt;facets&lt;/strong&gt; — child Durable Objects colocated with the parent on the same machine. Each facet has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Its own isolated SQLite database (no shared state)&lt;/li&gt;
&lt;li&gt;Its own execution context and fiber support&lt;/li&gt;
&lt;li&gt;A typed RPC stub returned to the parent for method calls
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Parent agent spawning a specialist sub-agent&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;extractor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;spawnFacet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;data-extractor&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;DataExtractorAgent&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;structured&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;extractor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parseDocument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;validator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;spawnFacet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;validator&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ValidationAgent&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;validator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;structured&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern is more predictable than passing messages through a shared queue. Because the facet RPC is typed, TypeScript catches mismatches at compile time. And because facets are colocated, the latency for inter-agent calls is dramatically lower than network-based agent-to-agent communication.&lt;/p&gt;

&lt;p&gt;Facets are useful when you need to decompose a task into specialist roles — a researcher, a writer, a fact-checker — without those roles sharing any mutable state.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The Session API — Relational Conversation Trees
&lt;/h3&gt;

&lt;p&gt;Standard chat agent implementations append messages to a flat array. That works for simple Q&amp;amp;A but breaks when you need to explore alternatives without polluting the main reasoning path.&lt;/p&gt;

&lt;p&gt;Project Think's Session API stores messages as a &lt;strong&gt;relational tree&lt;/strong&gt;, with each message carrying a &lt;code&gt;parent_id&lt;/code&gt;. This enables three capabilities that flat-list approaches cannot support:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Forking&lt;/strong&gt;: The agent can branch off a conversation node to explore an alternative without modifying the main path. If the alternative fails, the original path is untouched.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-destructive compaction&lt;/strong&gt;: Rather than truncating context when the window fills, the Session API creates a compaction overlay — a summary that sits beside the original messages without replacing them. The full history is still queryable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full-text search&lt;/strong&gt;: FTS5 indexing over all stored messages lets the agent retrieve relevant earlier context without re-reading the entire history into the LLM context window.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LongHorizonAgent&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Think&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;configureSession&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a thorough technical researcher.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;contextBlocks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DOMAIN_KNOWLEDGE&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All session storage runs on the Durable Object's local SQLite — no external vector database required for the conversation layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The Execution Ladder — Graduated Code Trust
&lt;/h3&gt;

&lt;p&gt;One of Project Think's most distinctive ideas is the &lt;strong&gt;execution ladder&lt;/strong&gt;: a tiered system of code execution environments that agents escalate through based on the trust level required by a task.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Package / API&lt;/th&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Trust Level&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Workspace&lt;/td&gt;
&lt;td&gt;&lt;code&gt;@cloudflare/shell&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Durable filesystem (SQLite + R2)&lt;/td&gt;
&lt;td&gt;Fully trusted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Dynamic Worker&lt;/td&gt;
&lt;td&gt;&lt;code&gt;@cloudflare/codemode&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sandboxed V8 isolate, no network&lt;/td&gt;
&lt;td&gt;LLM-generated code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;npm&lt;/td&gt;
&lt;td&gt;&lt;code&gt;@cloudflare/worker-bundler&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fetch npm pkgs, esbuild, load into DW&lt;/td&gt;
&lt;td&gt;Third-party packages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Browser&lt;/td&gt;
&lt;td&gt;Cloudflare Browser Run&lt;/td&gt;
&lt;td&gt;Navigate, click, extract&lt;/td&gt;
&lt;td&gt;Web content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Sandbox&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cloudflare/sandbox-sdk&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Full Linux env, git, cargo, npm test&lt;/td&gt;
&lt;td&gt;Untrusted workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Agents do not jump directly to Tier 4 for every task. A simple data transformation can run in Tier 1 (a sandboxed V8 isolate that starts in milliseconds). A task requiring npm packages escalates to Tier 2. A task that needs to test a full Rust codebase goes to Tier 4.&lt;/p&gt;

&lt;p&gt;The ladder enforces the principle of least privilege: agents operate at the lowest tier that can handle the task, escalating only when needed. This keeps the security surface small and execution fast for common cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Self-Authored Extensions — Agents Writing Their Own Tools
&lt;/h3&gt;

&lt;p&gt;The final primitive is the most experimental: agents can write their own tools at runtime. An agent inspects a task, decides it needs a capability it doesn't have, generates a tool implementation, and loads it into a Dynamic Worker for execution — all within the same session.&lt;/p&gt;

&lt;p&gt;This is not the same as calling an external tool-use API. The agent generates actual TypeScript code, bundles it with &lt;code&gt;@cloudflare/worker-bundler&lt;/code&gt;, and executes it in a Tier 1 or Tier 2 environment. The generated tool becomes part of the agent's toolkit for the duration of the session.&lt;/p&gt;

&lt;p&gt;In practice, this is useful for tasks where the required transformation or extraction logic cannot be fully specified in advance — for example, parsing a novel API response format or implementing a domain-specific calculation that varies per client.&lt;/p&gt;




&lt;h2&gt;
  
  
  The &lt;code&gt;@cloudflare/think&lt;/code&gt; Base Class
&lt;/h2&gt;

&lt;p&gt;All five primitives are exposed through the &lt;code&gt;Think&lt;/code&gt; base class, which handles the full chat lifecycle: agentic loop, message persistence, streaming, tool execution, stream resumption, and extensions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Installation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @cloudflare/think agents ai @cloudflare/shell zod workers-ai-provider
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Minimal example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Think&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cloudflare/think&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createWorkersAI&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;workers-ai-provider&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MyAgent&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Think&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;getModel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createWorkersAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;binding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AI&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="c1"&gt;// Workers AI free tier includes @cf/meta/llama-3.3-70b-instruct&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cf/meta/llama-3.3-70b-instruct&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;configureSession&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;wrangler.toml&lt;/code&gt; binding wires the Durable Object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[[durable_objects.bindings]]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"MY_AGENT"&lt;/span&gt;
&lt;span class="py"&gt;class_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"MyAgent"&lt;/span&gt;

&lt;span class="nn"&gt;[[migrations]]&lt;/span&gt;
&lt;span class="py"&gt;tag&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"v1"&lt;/span&gt;
&lt;span class="py"&gt;new_sqlite_classes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"MyAgent"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;cloudflare/agents&lt;/code&gt; GitHub repository contains 30+ self-contained example agents demonstrating fibers, facets, sessions, and execution ladder integration. The &lt;code&gt;docs/think/index.md&lt;/code&gt; file in that repository is the most complete reference beyond the official documentation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Think vs. Dynamic Workers vs. Cloudflare Workflows
&lt;/h2&gt;

&lt;p&gt;Developers familiar with Cloudflare's existing primitives will have one question: how does this fit alongside Dynamic Workers and Workflows?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic Workers&lt;/strong&gt; (covered in &lt;a href="https://dev.to/articles/cloudflare-dynamic-workers-ai-sandbox-guide-2026"&gt;Effloow's Dynamic Workers guide&lt;/a&gt;) are stateless sandboxed V8 isolates for executing LLM-generated code. They correspond to Tier 1 of Project Think's execution ladder. They are not durable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloudflare Workflows&lt;/strong&gt; provide durable multi-step execution, but the state machine lives outside your Worker. Steps are defined declaratively, and Cloudflare's infrastructure manages replay. This is powerful for ETL pipelines and scheduled jobs, but the agent has no access to its own state between steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project Think&lt;/strong&gt; puts the state machine inside the agent itself via fibers and the co-located SQLite database. The agent is both the executor and the state store. This gives more flexibility for agentic patterns where the next step depends on reasoning about the previous step's output — not just a declared execution graph.&lt;/p&gt;

&lt;p&gt;The right choice depends on your workload:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stateless code execution only → Dynamic Workers&lt;/li&gt;
&lt;li&gt;Declarative multi-step pipeline with retry guarantees → Cloudflare Workflows&lt;/li&gt;
&lt;li&gt;Autonomous agents with reasoning-driven state transitions → Project Think&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Common Mistakes When Building Durable Agents
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Checkpoint too infrequently.&lt;/strong&gt; If you only call &lt;code&gt;ctx.stash()&lt;/code&gt; at the end of a multi-minute operation, a crash at minute 8 means re-running 8 minutes of work. Checkpoint after each meaningful unit — after a web request, after a parsing step, after a tool call returns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Share state through the parent's SQLite instead of facet isolation.&lt;/strong&gt; Facets exist precisely so specialist sub-agents do not see each other's state. Routing everything through the parent's database re-introduces the coupling you were trying to avoid.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Escalate to Tier 4 for every code execution task.&lt;/strong&gt; Cloudflare Sandbox (Tier 4) has more overhead than Dynamic Workers (Tier 1). Use Tier 4 only when the task genuinely needs a Linux environment — git operations, compiled languages, or full test runners.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ignore compaction until the context window overflows.&lt;/strong&gt; Plan compaction as a regular scheduled step, not an emergency measure. The Session API's non-destructive overlay lets you compact early and often without losing history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Treat &lt;code&gt;@cloudflare/think&lt;/code&gt; as production-stable.&lt;/strong&gt; As of May 2026, Project Think is in &lt;strong&gt;experimental preview&lt;/strong&gt;. The package version is &lt;code&gt;0.0.1-experimental.x&lt;/code&gt;. The API surface is intended to be stable, but Cloudflare explicitly says it will continue to evolve. Treat it as early-adopter infrastructure.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Application: When to Choose Project Think
&lt;/h2&gt;

&lt;p&gt;Project Think is well-suited to agent workloads that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exceed Cloudflare Worker's standard CPU time limits&lt;/li&gt;
&lt;li&gt;Require specialist sub-tasks that should not share state&lt;/li&gt;
&lt;li&gt;Need to explore multiple reasoning paths without forking the entire agent&lt;/li&gt;
&lt;li&gt;Generate and execute code as part of their reasoning loop&lt;/li&gt;
&lt;li&gt;Must maintain conversation history across days or weeks for personalization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is less well-suited to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple request/response pipelines (standard Worker is simpler)&lt;/li&gt;
&lt;li&gt;Batch jobs without agent reasoning (Cloudflare Workflows is more appropriate)&lt;/li&gt;
&lt;li&gt;Workloads requiring GPUs or dedicated compute (no GPU support on Workers)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  FAQ
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Q: Does Project Think work with any LLM or only Workers AI?
&lt;/h3&gt;

&lt;p&gt;Project Think's &lt;code&gt;Think&lt;/code&gt; base class is model-agnostic — &lt;code&gt;getModel()&lt;/code&gt; can return any model compatible with the Vercel AI SDK's provider interface. Workers AI (&lt;code&gt;workers-ai-provider&lt;/code&gt;) is the zero-egress option for Cloudflare-hosted models, but you can wire in OpenAI, Anthropic, or any other provider via the AI SDK.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: What's the cost of fiber checkpointing?
&lt;/h3&gt;

&lt;p&gt;Each &lt;code&gt;ctx.stash()&lt;/code&gt; writes to the Durable Object's SQLite database — a local write, not a network call. The overhead is the same as any SQLite write on the same machine. Cloudflare does not charge extra for SQLite writes beyond the standard Durable Object storage pricing. For most agents, checkpointing 10–50 times per session adds negligible cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can sub-agents (facets) span multiple geographic regions?
&lt;/h3&gt;

&lt;p&gt;Facets are colocated with the parent Durable Object on the same machine by design — this is what makes their typed RPC low-latency. They do not span regions. If you need geographically distributed agent coordination, that requires a different architecture (message queues or service bindings across Workers).&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Is Project Think production-ready in May 2026?
&lt;/h3&gt;

&lt;p&gt;No. It is in experimental preview. Cloudflare describes the API surface as stable but explicitly notes it will evolve. For production workloads, monitor the &lt;code&gt;cloudflare/agents&lt;/code&gt; GitHub repository and the Cloudflare changelog for GA announcements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How does the Session API relate to a vector database?
&lt;/h3&gt;

&lt;p&gt;The Session API is not a semantic search layer — it is a relational message store with FTS5 full-text search. It handles conversation history, forking, and compaction well. For semantic retrieval over large external knowledge bases, you still need a vector database (Cloudflare Vectorize, Pinecone, etc.). They are complementary, not alternatives.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Project Think solves the fundamental durability problem in serverless AI agents: agents can now checkpoint progress and survive restarts without re-running from the beginning.&lt;/li&gt;
&lt;li&gt;The five core primitives — fibers, sub-agents (facets), the Session API, the execution ladder, and self-authored extensions — address distinct failure modes in long-horizon agentic workloads.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@cloudflare/think&lt;/code&gt; is the opinionated base class that wires all primitives together; it is model-agnostic and works with any Vercel AI SDK provider.&lt;/li&gt;
&lt;li&gt;The 5-tier execution ladder enforces least-privilege code execution, keeping fast tasks in lightweight V8 isolates and escalating to full Linux environments only when necessary.&lt;/li&gt;
&lt;li&gt;As of May 2026, Project Think is in experimental preview. The API is intended to be stable but will continue to evolve — suitable for early adoption and evaluation, not yet for production-critical deployments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bottom Line&lt;br&gt;
  &lt;/p&gt;
&lt;p&gt;Project Think is the most complete answer Cloudflare has given to "how do I run an AI agent that lasts longer than a serverless function?" The fiber + facet + session combination solves real architectural problems, not theoretical ones. Get familiar with it now — when it reaches GA, it will become the default pattern for serious agent infrastructure on the Workers platform.&lt;/p&gt;

</description>
      <category>cloudflare</category>
      <category>aiagents</category>
      <category>durableexecution</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Building a Python MCP Server in 30 Minutes with FastMCP 3.x — One @tool Decorator Is All You Need</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Tue, 12 May 2026 09:42:26 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/building-a-python-mcp-server-in-30-minutes-with-fastmcp-3x-one-tool-decorator-is-all-you-need-58oh</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/building-a-python-mcp-server-in-30-minutes-with-fastmcp-3x-one-tool-decorator-is-all-you-need-58oh</guid>
      <description>&lt;p&gt;Building an MCP (Model Context Protocol) server from scratch is more work than it looks. stdio transport handling, JSON-RPC 2.0 serialization, handler registration — if you've gone through &lt;a href="https://dev.to/en/blog/en/mcp-server-build-practical-guide-2026"&gt;implementing an MCP server with Streamable HTTP&lt;/a&gt;, you know the moment where you think: "I just want to add one AI tool, why does this need so much boilerplate?"&lt;/p&gt;

&lt;p&gt;FastMCP exists to fix that. Today, I installed it in a sandbox via pip and had a working MCP server running in under 30 minutes. Here's what I found.&lt;/p&gt;

&lt;h2&gt;
  
  
  What FastMCP Actually Is
&lt;/h2&gt;

&lt;p&gt;FastMCP is a high-level layer on top of the MCP Python SDK — similar to how Express.js wraps Node's http module. The official tagline: "The fast, Pythonic way to build MCP servers and clients." After hands-on testing, I'd say that's accurate.&lt;/p&gt;

&lt;p&gt;Version check first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;fastmcp version
&lt;span class="go"&gt;
FastMCP version:   3.2.4
MCP version:       1.27.0
Python version:    3.12.8
Platform:          macOS-15.6-arm64
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My backlog had this noted as "v2.0," but it's already at 3.x. The MCP protocol itself is at 1.27.0. This version gap means one thing: the API has changed, and docs don't always reflect that. I had to verify things directly by running code rather than trusting older articles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install and First Server — This Really Is All of It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastmcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Installation takes about ten seconds. Here's the first server I built in the sandbox — two weather-related tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather-tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_current_time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UTC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Returns the current time.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Current time (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d %H&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_temp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;celsius&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Converts Celsius to Fahrenheit and Kelvin.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;celsius&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;celsius&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fahrenheit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;celsius&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kelvin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;celsius&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;273.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data://server-info&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;server_info&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Returns server info.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FastMCP 3.x weather server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.prompt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;weather_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Weather analysis prompt template.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze the weather in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; and recommend appropriate clothing.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# runs in stdio mode
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Add a decorator to a Python function and it becomes an MCP tool. Type hints are automatically converted to JSON Schema and passed to Claude.&lt;/p&gt;

&lt;p&gt;Inspect the server with the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;fastmcp inspect server.py
&lt;span class="go"&gt;
Server
  Name:         weather-tools
  Version:      1.0.0
  Generation:   2

Components
  Tools:        2
  Prompts:      1
  Resources:    1
  Templates:    0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Three Building Blocks: Tool, Resource, Prompt
&lt;/h2&gt;

&lt;p&gt;FastMCP has three core abstractions. Getting these right is what makes a well-designed server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a class="mentioned-user" href="https://dev.to/mcp"&gt;@mcp&lt;/a&gt;.tool()&lt;/strong&gt; — A function Claude can directly invoke. It takes parameters, does work, and returns results. Search, compute, file operations, API calls — anything with execution behavior goes here. If I want Claude to interact with my filesystem or an external API, &lt;code&gt;@mcp.tool()&lt;/code&gt; is the answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a class="mentioned-user" href="https://dev.to/mcp"&gt;@mcp&lt;/a&gt;.resource()&lt;/strong&gt; — A read-only data source. Register it with a URI like &lt;code&gt;data://&lt;/code&gt;, &lt;code&gt;file://&lt;/code&gt;, or &lt;code&gt;https://&lt;/code&gt;, and Claude reads it as context. Unlike tools, this is "read" not "execute." Database schemas, config files, documentation — put these here and they flow into Claude's context window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a class="mentioned-user" href="https://dev.to/mcp"&gt;@mcp&lt;/a&gt;.prompt()&lt;/strong&gt; — A reusable prompt template. Takes parameters, returns a structured prompt message. Works like a slash command in Claude Desktop or claude.ai.&lt;/p&gt;

&lt;p&gt;The Tool vs Resource distinction trips people up. My rule: &lt;strong&gt;if it has side effects, it's a Tool; if it's read-only, it's a Resource&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sending Progress Updates with Context
&lt;/h2&gt;

&lt;p&gt;When a tool runs a long operation, you can stream progress back to the client in real time. Add a &lt;code&gt;Context&lt;/code&gt; parameter and FastMCP injects it automatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dev-tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Returns a list of files in the specified directory.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reading directory: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# streams log to client
&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;report_progress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;FileNotFoundError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Directory not found: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I ran this in the sandbox and confirmed that &lt;code&gt;ctx.info()&lt;/code&gt; actually streams to the client side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INFO  Received INFO from server: {'msg': 'Reading directory: /tmp', 'extra': None}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When this works inside Claude Desktop, users see real-time feedback about what the tool is doing. It's a meaningful UX improvement for long-running operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing with FastMCP Client
&lt;/h2&gt;

&lt;p&gt;You don't need an actual Claude Desktop to test. FastMCP provides an in-process client. This is also handy when implementing &lt;a href="https://dev.to/en/blog/en/claude-code-agentic-workflow-patterns-5-types"&gt;agentic workflow patterns&lt;/a&gt; — tests stay self-contained.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp.client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dev-tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Searches for a pattern in text.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
    &lt;span class="n"&gt;matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pattern&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;matches&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;word_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Returns word count, character count, and line count.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;words&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;characters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lines&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;splitlines&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Registered tools (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;):&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FastMCP is fast. FastMCP is easy.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pattern&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FastMCP&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;search_text result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# → {'pattern': 'FastMCP', 'matches': ['FastMCP', 'FastMCP'], 'count': 2}
&lt;/span&gt;
        &lt;span class="n"&gt;result2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;word_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello World from FastMCP 3.x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;word_count result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# → {'words': 5, 'characters': 27, 'lines': 1}
&lt;/span&gt;
&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Access the structured return value directly through &lt;code&gt;result.data&lt;/code&gt;. Ran this in the sandbox — zero errors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/..%2F..%2F..%2Fassets%2Fblog%2Ffastmcp-python-mcp-server-build-guide-2026%2Fcli-output.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/..%2F..%2F..%2Fassets%2Fblog%2Ffastmcp-python-mcp-server-build-guide-2026%2Fcli-output.png" alt="FastMCP CLI output — fastmcp version, inspect, tool call test"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  HTTP Deployment for Remote Access
&lt;/h2&gt;

&lt;p&gt;Beyond local stdio mode, you can run the server over HTTP. Useful when sharing an MCP server across Cursor instances or deploying remotely.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# HTTP mode (default port 8000)
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Or run directly with uvicorn&lt;/span&gt;
uvicorn server:mcp.http_app&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="nt"&gt;--host&lt;/span&gt; 0.0.0.0 &lt;span class="nt"&gt;--port&lt;/span&gt; 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The FastMCP HTTP app is Starlette-based (&lt;code&gt;StarletteWithLifespan&lt;/code&gt; under the hood). That means you can mount it inside a FastAPI app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;my_tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;http_app&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Connecting Claude Desktop to the HTTP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"my-tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8000/mcp/"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The fastmcp CLI
&lt;/h2&gt;

&lt;p&gt;FastMCP ships with a CLI that I didn't notice at first. Running &lt;code&gt;fastmcp --help&lt;/code&gt; reveals quite a bit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Commands:
  inspect      — Print server component summary
  list         — List registered tools
  call         — Directly call a tool (useful for debugging)
  install      — Auto-register to Claude Desktop / Cursor
  dev          — Run dev server with hot reload
  discover     — Find MCP servers configured in editors
  run          — Start the server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;fastmcp install server.py --client claude&lt;/code&gt; is supposed to automatically patch your Claude Desktop config. No more hand-editing JSON. I couldn't verify this directly since I don't have Claude Desktop installed in my sandbox environment — check the official docs for exactly which config path it touches.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;fastmcp dev&lt;/code&gt; command seems more immediately useful: hot reload during development means no manual server restarts as you iterate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Type Hints Are Your API Schema
&lt;/h2&gt;

&lt;p&gt;The feature I found most impressive: type hints become JSON Schema automatically. With the raw SDK, you write an &lt;code&gt;inputSchema&lt;/code&gt; dict for every tool by hand. FastMCP delegates that to Python's type system.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FileFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;extension&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;min_size_kb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;exclude_hidden&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_files_advanced&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;FileFilter&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sort_by&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;modified&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Returns filtered and sorted file listing.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
    &lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scandir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;filter&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exclude_hidden&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;filter&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extension&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;info&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;size_kb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;st_size&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;filter&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;size_kb&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min_size_kb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size_kb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size_kb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;modified&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;st_mtime&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;key_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size_kb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;modified&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;modified&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key_map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sort_by&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register this with &lt;code&gt;@mcp.tool()&lt;/code&gt; and Claude automatically knows the structure of &lt;code&gt;FileFilter&lt;/code&gt;, the valid values for &lt;code&gt;sort_by&lt;/code&gt; (name/size/modified), and &lt;code&gt;limit&lt;/code&gt;'s default. Pydantic models work too, so complex nested inputs don't need any extra wiring.&lt;/p&gt;

&lt;p&gt;The docstring becomes the tool description Claude sees. A well-written docstring is the usage manual you send to the model.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Real-World Example: Code Analysis MCP Server
&lt;/h2&gt;

&lt;p&gt;Here's something I'd actually ship — a Python code analysis tool server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code-analyzer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_python_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Analyzes a Python file with AST and returns functions and classes.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyzing: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File not found: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;classes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;walk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FunctionDef&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;line&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lineno&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docstring&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_docstring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ClassDef&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;classes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;line&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lineno&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;report_progress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_lines&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;functions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;classes&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;count_todo_comments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Finds TODO/FIXME/HACK comments in a file.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;markers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TODO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FIXME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HACK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XXX&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;markers&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;marker&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;markers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;# &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;marker&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;marker&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;line&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data://project-structure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;project_structure&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Returns Python file list in the current directory.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;py_files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dirs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;walk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;dirs&lt;/span&gt;&lt;span class="p"&gt;[:]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dirs&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;py_files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;py_files&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Connect this to Claude Desktop and you can ask in plain English: "Show me all classes in this file" or "How many TODO comments are there?" No Python required from the user's side. That's the point of an MCP tool server.&lt;/p&gt;

&lt;h2&gt;
  
  
  FastMCP vs Raw MCP SDK
&lt;/h2&gt;

&lt;p&gt;Compare with &lt;a href="https://dev.to/en/blog/en/mcp-server-build-practical-guide-2026"&gt;building a Streamable HTTP MCP server directly&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;Raw SDK approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a &lt;code&gt;Server&lt;/code&gt; instance&lt;/li&gt;
&lt;li&gt;Register &lt;code&gt;@server.list_tools()&lt;/code&gt; and &lt;code&gt;@server.call_tool()&lt;/code&gt; separately&lt;/li&gt;
&lt;li&gt;Manually parse input parameters&lt;/li&gt;
&lt;li&gt;Combine &lt;code&gt;anyio.run()&lt;/code&gt; + &lt;code&gt;stdio_server()&lt;/code&gt; to run&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;FastMCP approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One &lt;code&gt;FastMCP&lt;/code&gt; instance&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@mcp.tool()&lt;/code&gt; registers functions directly as tools&lt;/li&gt;
&lt;li&gt;JSON Schema auto-generated from type hints&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mcp.run()&lt;/code&gt; — one line&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fewer lines is secondary. The real point: &lt;strong&gt;you focus on business logic, not transport mechanics&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That said, FastMCP trades off control for convenience. If you need to customize low-level MCP messages, use non-standard transports, or access MCP features FastMCP hasn't exposed, you'll end up digging under the abstraction. In those cases, reach for the MCP Python SDK directly — like in &lt;a href="https://dev.to/en/blog/en/mcp-code-execution-practical-implementation"&gt;MCP code execution scenarios&lt;/a&gt; that need finer control.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use FastMCP
&lt;/h2&gt;

&lt;p&gt;My practical take:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use FastMCP when&lt;/strong&gt;: You're building a server for standard MCP clients (Claude, Cursor, VS Code). Especially for rapid AI tool prototyping, or exposing existing Python functions as MCP tools for your team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use the raw SDK when&lt;/strong&gt;: You need custom transport, non-standard message formats, or MCP features FastMCP hasn't wrapped. Performance-critical paths where every layer matters.&lt;/p&gt;

&lt;p&gt;One honest complaint about FastMCP: 3.x moved faster than the docs. I found &lt;code&gt;get_tools()&lt;/code&gt; referenced in older content but it doesn't exist — &lt;code&gt;list_tools()&lt;/code&gt; is the actual method. Trust &lt;code&gt;dir(mcp)&lt;/code&gt; and the source code over older blog posts. Including mine.&lt;/p&gt;

&lt;p&gt;Before going to production, also look at &lt;a href="https://dev.to/en/blog/en/mcp-gateway-agent-traffic-control"&gt;MCP Gateway for controlling which tools agents can call&lt;/a&gt;. Once you've exposed a server, you'll want some control over what actually gets invoked and when.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;FastMCP 3.x is the fastest path for a Python developer to ship an MCP server. One &lt;code&gt;pip install fastmcp&lt;/code&gt;, one &lt;code&gt;@mcp.tool()&lt;/code&gt; decorator, one &lt;code&gt;mcp.run()&lt;/code&gt;. Under 30 minutes to a working AI tool server that Claude Desktop can call.&lt;/p&gt;

&lt;p&gt;MCP's ecosystem is maturing fast. My &lt;a href="https://dev.to/en/blog/en/mcp-servers-toolkit-introduction"&gt;MCP server toolkit&lt;/a&gt; covers what's already available before you build your own. Check there first — but if you need something custom, FastMCP makes building it genuinely quick.&lt;/p&gt;

&lt;p&gt;Verified versions today: FastMCP 3.2.4, MCP 1.27.0. This space moves fast — check the &lt;a href="https://gofastmcp.com" rel="noopener noreferrer"&gt;FastMCP official docs&lt;/a&gt; for the latest API before you ship anything.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>fastmcp</category>
      <category>python</category>
      <category>agents</category>
    </item>
    <item>
      <title>Vercel AI SDK 6: First-Class Agents for TypeScript</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Tue, 12 May 2026 09:33:54 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/vercel-ai-sdk-6-first-class-agents-for-typescript-242p</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/vercel-ai-sdk-6-first-class-agents-for-typescript-242p</guid>
      <description>&lt;p&gt;When Vercel released AI SDK 6 on December 22, 2025, the headline feature was not a new model integration or a faster streaming API. It was a different kind of addition: agents became a first-class primitive in the SDK, not an afterthought patched on top of &lt;code&gt;generateText&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Effloow Lab installed &lt;code&gt;ai@latest&lt;/code&gt; (currently &lt;strong&gt;6.0.177&lt;/strong&gt;) and inspected the package exports, constructor signatures, and available methods directly. This guide covers what actually changed, what the new &lt;code&gt;ToolLoopAgent&lt;/code&gt; API looks like in practice, and how to move existing code from SDK 5.x to 6.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters: The Shift from Loops to Primitives
&lt;/h2&gt;

&lt;p&gt;In AI SDK 5.x, building an agent meant writing your own loop. You called &lt;code&gt;generateText&lt;/code&gt;, checked for tool calls in the response, executed those tools, passed results back, and repeated until done — all in application code that you maintained yourself.&lt;/p&gt;

&lt;p&gt;The result was that every team ended up writing a slightly different, slightly buggy version of the same control loop. Edge cases around retries, step limits, streaming, and type safety accumulated per project.&lt;/p&gt;

&lt;p&gt;SDK 6 formalizes this pattern. &lt;code&gt;ToolLoopAgent&lt;/code&gt; handles the loop. Your code defines &lt;em&gt;what&lt;/em&gt; the agent can do and &lt;em&gt;when&lt;/em&gt; it should stop. The runtime handles &lt;em&gt;how&lt;/em&gt; it executes.&lt;/p&gt;

&lt;p&gt;This is not just a convenience wrapper. The &lt;code&gt;Agent&lt;/code&gt; interface and &lt;code&gt;ToolLoopAgent&lt;/code&gt; class are designed so that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The same agent definition works in API routes, background jobs, and UI streaming contexts without modification&lt;/li&gt;
&lt;li&gt;TypeScript type inference flows end-to-end from tool schemas to UI message types&lt;/li&gt;
&lt;li&gt;Human-in-the-loop approval, stop conditions, and structured output are all opt-in per-agent rather than bolt-ons&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Installation and Current Version
&lt;/h2&gt;

&lt;p&gt;The package name is still &lt;code&gt;ai&lt;/code&gt;. Installing the latest v6 release:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;ai@latest
&lt;span class="c"&gt;# or&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;ai@^6.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Effloow Lab confirmed that &lt;code&gt;npm install ai@latest&lt;/code&gt; pulls &lt;strong&gt;10 packages&lt;/strong&gt; with zero vulnerabilities. The install is lean.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Peer dependency:&lt;/strong&gt; Zod 3.x or Zod 4.x are both supported.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="s2"&gt;"zod"&lt;/span&gt;: &lt;span class="s2"&gt;"^3.25.76 || ^4.1.8"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are on Zod 3, nothing changes. If you want Zod 4, SDK 6 now accepts it natively.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Change: ToolLoopAgent
&lt;/h2&gt;

&lt;p&gt;The central addition is &lt;code&gt;ToolLoopAgent&lt;/code&gt;. In SDK 5.x, an &lt;code&gt;Experimental_Agent&lt;/code&gt; class existed (it is still exported in 6.0.177 as a compatibility shim, but it is deprecated). In SDK 6, the production API is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ToolLoopAgent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;anthropic&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@ai-sdk/anthropic&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;searchTool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Search documentation for a given query&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// your search implementation&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ToolLoopAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;You are a helpful developer assistant.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;search&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;searchTool&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;ToolLoopAgent&lt;/code&gt; constructor accepts:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;LanguageModel&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;required&lt;/td&gt;
&lt;td&gt;The model to use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;instructions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;optional&lt;/td&gt;
&lt;td&gt;System prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tools&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Record&amp;lt;string, Tool&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;optional&lt;/td&gt;
&lt;td&gt;Available tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stopWhen&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;`StopCondition&lt;/td&gt;
&lt;td&gt;StopCondition[]`&lt;/td&gt;
&lt;td&gt;&lt;code&gt;stepCountIs(20)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;output&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Output&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;optional&lt;/td&gt;
&lt;td&gt;Structured output format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;toolChoice&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ToolChoice&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;optional&lt;/td&gt;
&lt;td&gt;Force/auto/none&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;topP&lt;/code&gt;, etc.&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;td&gt;optional&lt;/td&gt;
&lt;td&gt;Model parameters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;onStepFinish&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;callback&lt;/td&gt;
&lt;td&gt;optional&lt;/td&gt;
&lt;td&gt;Hook for each step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;onFinish&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;callback&lt;/td&gt;
&lt;td&gt;optional&lt;/td&gt;
&lt;td&gt;Hook on completion&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The instance exposes two methods: &lt;code&gt;generate()&lt;/code&gt; (returns &lt;code&gt;Promise&amp;lt;GenerateTextResult&amp;gt;&lt;/code&gt;) and &lt;code&gt;stream()&lt;/code&gt; (returns a streaming result). Both accept the same parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;What does the useEffect hook do?&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Agent Is an Interface
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;ToolLoopAgent&lt;/code&gt; implements the &lt;code&gt;Agent&lt;/code&gt; interface. This matters because you can create custom agent implementations that slot into the same API surface. If you need a routing agent, a retrieval-augmented agent, or an agent that checks a policy before every step, you implement &lt;code&gt;Agent&lt;/code&gt; directly rather than subclassing &lt;code&gt;ToolLoopAgent&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This design rewards dependency injection: pass an &lt;code&gt;Agent&lt;/code&gt; type through your application, and the concrete implementation can swap out in tests or staging environments without changing call sites.&lt;/p&gt;

&lt;h2&gt;
  
  
  Human-in-the-Loop: needsApproval
&lt;/h2&gt;

&lt;p&gt;Any tool can be marked as requiring human approval before execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;deployTool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Deploy a new version to production&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enum&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;staging&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;production&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;needsApproval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// deployment logic&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;deployed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;needsApproval: true&lt;/code&gt;, the tool call pauses and surfaces a pending approval in the UI stream. Your application handles the approval UI. The agent resumes execution only after the user confirms.&lt;/p&gt;

&lt;p&gt;You can also pass an async function to &lt;code&gt;needsApproval&lt;/code&gt; for conditional approval logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;needsApproval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;toolInput&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="nx"&gt;toolInput&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;production&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern makes it straightforward to build agents that handle low-risk actions automatically (staging deploys, read-only lookups) while requiring human sign-off on high-risk ones (production deploys, account deletions).&lt;/p&gt;

&lt;h2&gt;
  
  
  Structured Output from Agents
&lt;/h2&gt;

&lt;p&gt;In SDK 5.x, getting structured data back from a multi-step agent required parsing the final text output yourself. SDK 6 introduces the &lt;code&gt;Output&lt;/code&gt; helper that works at the agent level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Output&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reportAgent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ToolLoopAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;fetchData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;dataFetchTool&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;keyFindings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
      &lt;span class="na"&gt;confidenceScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;reportAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Analyze Q1 sales data and provide a structured report.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// result.object is typed as { summary: string; keyFindings: string[]; confidenceScore: number }&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;keyFindings&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Supported output types are &lt;code&gt;Output.object()&lt;/code&gt;, &lt;code&gt;Output.array()&lt;/code&gt;, &lt;code&gt;Output.choice()&lt;/code&gt;, &lt;code&gt;Output.json()&lt;/code&gt;, and &lt;code&gt;Output.text()&lt;/code&gt;. The model calls tools as many times as needed, then produces the final structured output in one pass at the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Agent Patterns: Agents as Tools
&lt;/h2&gt;

&lt;p&gt;The most powerful pattern in SDK 6 is composing agents into multi-agent systems by wrapping them as tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;summarizeAgent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ToolLoopAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-haiku-4-5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Summarize the given text in 3 bullet points.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;factCheckAgent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ToolLoopAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Fact-check the claims in the given text against the web.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;search&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;webSearchTool&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Wrap subagents as tools&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;summarizeTool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Summarize a long block of text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;summarizeAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;factCheckTool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Fact-check claims in a passage&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;passage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;passage&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;factCheckAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;passage&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Orchestrator agent&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;orchestrator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ToolLoopAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Research topics thoroughly: summarize sources, then fact-check key claims.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;summarizeTool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;factCheck&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;factCheckTool&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Subagents call &lt;code&gt;.generate()&lt;/code&gt; (which returns the final result) rather than &lt;code&gt;.stream()&lt;/code&gt;. The orchestrator calls tools, gets back text results from subagents, and continues reasoning.&lt;/p&gt;

&lt;p&gt;This decomposition keeps each agent focused on a narrow task, makes individual agents testable in isolation, and avoids the context bloat that comes from cramming all capabilities into one massive system prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stop Conditions
&lt;/h2&gt;

&lt;p&gt;By default, &lt;code&gt;ToolLoopAgent&lt;/code&gt; stops after 20 steps. You can configure this explicitly or combine multiple conditions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;stepCountIs&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ToolLoopAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;search&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;searchTool&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;stopWhen&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;stepCountIs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;prepareStep&lt;/code&gt; callback gives you per-step control if you need dynamic stop conditions based on intermediate results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ToolLoopAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;search&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;searchTool&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;prepareStep&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;steps&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;DONE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;stopCondition&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  MCP Support
&lt;/h2&gt;

&lt;p&gt;Model Context Protocol support ships in a separate package to keep the core &lt;code&gt;ai&lt;/code&gt; bundle lean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @ai-sdk/mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SDK 6 adds OAuth authentication handling for HTTP-based MCP servers, plus resources and prompts discovery and elicitation support (server-initiated user input). The &lt;code&gt;createMCPClient&lt;/code&gt; function now handles PKCE, token refresh, and session management transparently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createMCPClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@ai-sdk/mcp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createMCPClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://your-mcp-server.com/mcp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;authProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// handles OAuth flow automatically&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes it practical to connect agents to external MCP servers (databases, APIs, enterprise systems) without writing OAuth boilerplate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Migrating from AI SDK 5.x
&lt;/h2&gt;

&lt;p&gt;Most of the migration is mechanical. Vercel provides a codemod that handles the majority of changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @ai-sdk/codemod upgrade v6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key manual changes:&lt;/p&gt;

&lt;h3&gt;
  
  
  Rename Experimental_Agent to ToolLoopAgent
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Experimental_Agent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Experimental_Agent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt;
  &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// After&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ToolLoopAgent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ToolLoopAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt;
  &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the parameter rename: &lt;code&gt;system&lt;/code&gt; becomes &lt;code&gt;instructions&lt;/code&gt;. The default &lt;code&gt;stopWhen&lt;/code&gt; also changed from &lt;code&gt;stepCountIs(1)&lt;/code&gt; to &lt;code&gt;stepCountIs(20)&lt;/code&gt; — if your existing agent relied on single-step behavior, set this explicitly.&lt;/p&gt;

&lt;h3&gt;
  
  
  CoreMessage to ModelMessage
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CoreMessage&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// After&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ModelMessage&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// Use convertToModelMessages() (now async) for conversion&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  generateObject / streamObject Deprecation
&lt;/h3&gt;

&lt;p&gt;These functions still work in 6.0.177 but are on the deprecation path. Migrate to &lt;code&gt;generateText&lt;/code&gt;/&lt;code&gt;streamText&lt;/code&gt; with an &lt;code&gt;output&lt;/code&gt; setting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateObject&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Generate a blog post title&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// After&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Generate a blog post title&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Token Usage Fields
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before&lt;/span&gt;
&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cachedInputTokens&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reasoningTokens&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// After&lt;/span&gt;
&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;inputTokenDetails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cacheReadTokens&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;outputTokenDetails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reasoningTokens&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Common Mistakes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Ignoring the default step count change.&lt;/strong&gt; If you migrated from &lt;code&gt;Experimental_Agent&lt;/code&gt; and assumed single-step behavior, your agent now runs up to 20 steps. This affects cost and latency. Set &lt;code&gt;stopWhen: stepCountIs(1)&lt;/code&gt; explicitly if you need the old behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Calling &lt;code&gt;.stream()&lt;/code&gt; on subagents.&lt;/strong&gt; In multi-agent patterns, subagents called via tool execution should use &lt;code&gt;.generate()&lt;/code&gt;. Streaming only makes sense at the top level where you pipe output to a UI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missing the async on convertToModelMessages.&lt;/strong&gt; The function became async in v6 to support async &lt;code&gt;Tool.toModelOutput()&lt;/code&gt;. Forgetting &lt;code&gt;await&lt;/code&gt; produces a Promise instead of a message array — a TypeScript type that might not catch this at the call site.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Assuming MCP is in the core package.&lt;/strong&gt; &lt;code&gt;createMCPClient&lt;/code&gt; requires &lt;code&gt;@ai-sdk/mcp&lt;/code&gt; separately. The &lt;code&gt;ai&lt;/code&gt; core package does not include MCP to keep bundle size down.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not using the codemod first.&lt;/strong&gt; The automated codemod handles renaming, import changes, and common method signature updates. Running it before manual review saves time and reduces manual errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q: Is AI SDK 6 compatible with Next.js 14 and 15?
&lt;/h3&gt;

&lt;p&gt;Yes. The SDK works with any React framework including Next.js App Router and Pages Router. The &lt;code&gt;ai/react&lt;/code&gt; sub-package provides hooks (&lt;code&gt;useChat&lt;/code&gt;, &lt;code&gt;useCompletion&lt;/code&gt;) that remain unchanged in v6.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can I use Anthropic Claude models with ToolLoopAgent?
&lt;/h3&gt;

&lt;p&gt;Yes. Any provider supported by the SDK works with &lt;code&gt;ToolLoopAgent&lt;/code&gt;. Install the provider package (&lt;code&gt;@ai-sdk/anthropic&lt;/code&gt;, &lt;code&gt;@ai-sdk/openai&lt;/code&gt;, &lt;code&gt;@ai-sdk/google&lt;/code&gt;) and pass the model to the constructor. Provider-specific tools (Anthropic memory tool, OpenAI file patching, Google Maps grounding) are also accessible through the standard &lt;code&gt;tools&lt;/code&gt; API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Does ToolLoopAgent support streaming to the UI?
&lt;/h3&gt;

&lt;p&gt;Yes. Call &lt;code&gt;.stream()&lt;/code&gt; instead of &lt;code&gt;.generate()&lt;/code&gt;. Use &lt;code&gt;createAgentUIStream&lt;/code&gt; and &lt;code&gt;pipeAgentUIStreamToResponse&lt;/code&gt; to pipe the stream to a Next.js API route or other HTTP handler. The &lt;code&gt;InferAgentUIMessage&amp;lt;typeof myAgent&amp;gt;&lt;/code&gt; TypeScript type infers the correct message shape for type-safe UI components.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: What is the difference between stopWhen and prepareStep?
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;stopWhen&lt;/code&gt; is a declarative stop condition — "stop after N steps" or "stop when no tool calls remain." &lt;code&gt;prepareStep&lt;/code&gt; is a per-step callback that lets you inspect intermediate results and dynamically modify the agent's behavior (change tools, update instructions, or signal a stop) based on what happened so far.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Are there breaking changes in the OpenAI provider?
&lt;/h3&gt;

&lt;p&gt;Yes. &lt;code&gt;strictJsonSchema&lt;/code&gt; now defaults to &lt;code&gt;true&lt;/code&gt; for OpenAI. This improves JSON reliability but requires stricter Zod schema compliance (no &lt;code&gt;.optional()&lt;/code&gt; on required fields, no &lt;code&gt;.default()&lt;/code&gt;). If you see validation errors after upgrading, check your OpenAI-specific schemas first.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Bottom Line&lt;br&gt;
  &lt;/p&gt;
&lt;p&gt;AI SDK 6 is the version where Vercel's SDK stops being a collection of LLM utility functions and starts being an agent framework. The lean install (10 packages), stable MCP support, and type-safe multi-agent composition make it a credible foundation for production TypeScript agent systems in 2026. Run the codemod first, check your stop conditions, and migrate generateObject calls when you have time — the migration is not urgent but the new APIs are cleaner.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ToolLoopAgent&lt;/code&gt;&lt;/strong&gt; is the production API. &lt;code&gt;Experimental_Agent&lt;/code&gt; still exports in 6.0.177 but is deprecated.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Agent interface&lt;/strong&gt; design lets you swap implementations without changing call sites — valuable for testing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;needsApproval&lt;/code&gt;&lt;/strong&gt; on individual tools gives you granular human-in-the-loop control without restructuring your agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents as tools&lt;/strong&gt; is the idiomatic multi-agent pattern: wrap &lt;code&gt;agent.generate()&lt;/code&gt; inside a &lt;code&gt;tool()&lt;/code&gt; definition and compose freely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP&lt;/strong&gt; lives in &lt;code&gt;@ai-sdk/mcp&lt;/code&gt;, separate from core. OAuth handling is now built in.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;codemod&lt;/strong&gt; (&lt;code&gt;npx @ai-sdk/codemod upgrade v6&lt;/code&gt;) handles most migration automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zod 4&lt;/strong&gt; is now supported alongside Zod 3, so you are not blocked from upgrading Zod independently.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>vercel</category>
      <category>aisdk</category>
      <category>typescript</category>
      <category>agents</category>
    </item>
    <item>
      <title>MemMachine: Ground-Truth Memory for AI Agents</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Tue, 12 May 2026 04:15:52 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/memmachine-ground-truth-memory-for-ai-agents-4aj0</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/memmachine-ground-truth-memory-for-ai-agents-4aj0</guid>
      <description>&lt;p&gt;Every time an agent summarizes a conversation to save memory, it loses information. That trade-off has been accepted as unavoidable — LLMs produce long outputs, context windows are finite, and token costs are real. MemMachine, presented in arXiv paper &lt;a href="https://arxiv.org/abs/2604.04853" rel="noopener noreferrer"&gt;2604.04853&lt;/a&gt; (April 2026), rejects that premise. Instead of extracting facts at write time, it stores entire conversational episodes verbatim and does the heavy lifting at retrieval time. The result: 93.0% on LongMemEvalS (ICLR 2025) and approximately 80% fewer input tokens compared to Mem0 under matched conditions.&lt;/p&gt;

&lt;p&gt;This article walks through the architecture, explains why ground-truth preservation changes the memory equation for agent developers, and shows how to integrate MemMachine into a Python-based agent using the open-source SDK.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Agent Memory Is Still Broken
&lt;/h2&gt;

&lt;p&gt;Most production agents handle long-term memory with one of two approaches: stuff everything into the context window (expensive and bounded) or summarize with an LLM before storing (lossy and irreversible).&lt;/p&gt;

&lt;p&gt;The summarization approach — used by Mem0 and many RAG-based systems — runs an extraction pass at write time. The LLM reads a conversation and outputs a set of facts or a condensed summary. Those facts go into a vector store. When the user comes back, retrieved facts are injected into the prompt.&lt;/p&gt;

&lt;p&gt;The problem is structural: LLM extraction at write time introduces irreversible information loss. What looks like a minor paraphrase today becomes a missed fact next month when the user returns with a follow-up. Multi-hop reasoning across multiple sessions is especially fragile because each hop must rely on the lossy summaries produced at previous write points.&lt;/p&gt;

&lt;p&gt;The LoCoMo benchmark makes this concrete — it tests whether an agent can recall facts from extended conversations, and Mem0's token-heavy pipeline still trails open-source alternatives on accuracy. MemMachine reaches 0.9169 on LoCoMo (with gpt-4.1-mini), above published scores for Mem0, Zep, Memobase, and LangMem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Idea: Ground-Truth Preservation
&lt;/h2&gt;

&lt;p&gt;MemMachine's defining design choice is deferred extraction. Raw conversational episodes are stored verbatim in a graph database. No LLM extraction pass runs at write time. When the agent needs to recall something, a Retrieval Agent queries the episode store and surfaces the original conversation context.&lt;/p&gt;

&lt;p&gt;This flips the cost curve. Write operations become cheap — store an episode, index it, done. Read operations become slightly richer — contextualized retrieval expands nucleus matches with neighboring episode context, so a query about "my dietary restrictions" pulls not just the turn where that phrase appeared, but the surrounding dialogue that gives it meaning.&lt;/p&gt;

&lt;p&gt;The architecture has four distinct memory tiers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Short-term workspace&lt;/strong&gt; — the current conversation buffer. Limited capacity, cleared between sessions (standard context window behavior).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long-term episodic memory&lt;/strong&gt; — the ground-truth store. Full conversational episodes in a graph database. This is the structural difference from Mem0-style systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic/profile memory&lt;/strong&gt; — high-level user facts (preferences, identity, stated goals) stored in a SQL database. This tier does run LLM extraction, but only for stable profile data, not for ephemeral conversation content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Procedural memory&lt;/strong&gt; — learned patterns, action sequences, and strategies the agent has acquired over interactions.&lt;/p&gt;

&lt;p&gt;The episodic tier does the heavy lifting. Because episodes are stored verbatim, the system can always return to the source of truth rather than a derivative. That is what "ground-truth-preserving" means in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Retrieval Agent: Three Routing Strategies
&lt;/h2&gt;

&lt;p&gt;Raw storage would be useless without smart retrieval. MemMachine introduces a companion Retrieval Agent with a ToolSelectAgent classifier that routes each incoming query to one of three strategies:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Direct retrieval&lt;/strong&gt; — semantic similarity search over the episode store. Used for simple, single-fact queries ("What is the user's preferred language?"). Fast and low-latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parallel decomposition&lt;/strong&gt; — splits multi-part queries into sub-queries and executes them concurrently, then merges results. Used when a question has several independent dimensions ("What does the user prefer about both their IDE setup and their deployment workflow?").&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chain-of-query&lt;/strong&gt; — iterative retrieval where each step informs the next. Used for multi-hop reasoning ("What framework was the user migrating to, and what deployment platform did they choose for it?"). Each query builds on what the previous step retrieved.&lt;/p&gt;

&lt;p&gt;This adaptive routing is what enables the multi-hop benchmark numbers. On HotpotQA-hard the Retrieval Agent reaches 93.2%. On WikiMultiHop (2WikiMultiHopQA with randomized noise) it reaches 92.6%, while reducing input tokens by 59% (from 103k to 42k) compared to context-window-stuffing approaches.&lt;/p&gt;

&lt;p&gt;The crucial insight: the routing strategies are layered on top of the storage model without modifying it. Improving retrieval does not require changing how episodes are stored. That separation makes the system extensible — you can swap in a new routing strategy without touching the episode graph.&lt;/p&gt;

&lt;h2&gt;
  
  
  LongMemEvalS: What the 93% Actually Means
&lt;/h2&gt;

&lt;p&gt;LongMemEval (ICLR 2025) benchmarks five long-term memory abilities:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Information extraction — can the agent find a stated fact?&lt;/li&gt;
&lt;li&gt;Multi-session reasoning — can it connect facts across sessions?&lt;/li&gt;
&lt;li&gt;Temporal reasoning — can it handle time-sensitive updates ("I changed jobs last month")?&lt;/li&gt;
&lt;li&gt;Knowledge updates — does it override stale information correctly?&lt;/li&gt;
&lt;li&gt;Abstention — does it correctly say "I don't know" rather than hallucinate?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;LongMemEvalS is the subset used in the MemMachine paper's ablation study. The 93.0% overall accuracy comes from stacking six optimization dimensions, with retrieval-stage improvements contributing more than ingestion-stage changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieval depth tuning: +4.2%&lt;/li&gt;
&lt;li&gt;Context formatting: +2.0%&lt;/li&gt;
&lt;li&gt;Search prompt design: +1.8%&lt;/li&gt;
&lt;li&gt;Query bias correction: +1.4%&lt;/li&gt;
&lt;li&gt;Sentence chunking (ingestion): smaller contribution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ablation is honest — it shows that the biggest gains come from &lt;em&gt;how you retrieve&lt;/em&gt;, not from a magic storage algorithm. The ground-truth-preserving model matters because it gives retrieval something accurate to work with. But retrieval engineering is where the performance headroom lives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmarks at a Glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
  &lt;th&gt;Benchmark&lt;/th&gt;
  &lt;th&gt;MemMachine&lt;/th&gt;
  &lt;th&gt;Mem0&lt;/th&gt;
  &lt;th&gt;Zep / Graphiti&lt;/th&gt;
  &lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
  &lt;td&gt;LongMemEvalS (ICLR 2025)&lt;/td&gt;
  &lt;td class="highlight"&gt;93.0%&lt;/td&gt;
  &lt;td&gt;[DATA NOT AVAILABLE]&lt;/td&gt;
  &lt;td&gt;[DATA NOT AVAILABLE]&lt;/td&gt;
  &lt;td&gt;6-dimension ablation, gpt-4.1-mini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;LoCoMo (F1/accuracy)&lt;/td&gt;
  &lt;td class="highlight"&gt;0.9169&lt;/td&gt;
  &lt;td&gt;Lower (paper claim)&lt;/td&gt;
  &lt;td&gt;Lower (paper claim)&lt;/td&gt;
  &lt;td&gt;gpt-4.1-mini; best published open-framework result&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;HotpotQA-hard&lt;/td&gt;
  &lt;td class="highlight"&gt;93.2%&lt;/td&gt;
  &lt;td&gt;[DATA NOT AVAILABLE]&lt;/td&gt;
  &lt;td&gt;[DATA NOT AVAILABLE]&lt;/td&gt;
  &lt;td&gt;Retrieval Agent multi-hop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;WikiMultiHop (noisy)&lt;/td&gt;
  &lt;td class="highlight"&gt;92.6%&lt;/td&gt;
  &lt;td&gt;[DATA NOT AVAILABLE]&lt;/td&gt;
  &lt;td&gt;[DATA NOT AVAILABLE]&lt;/td&gt;
  &lt;td&gt;103k → 42k tokens (59% reduction)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;Input tokens vs Mem0 (LoCoMo)&lt;/td&gt;
  &lt;td&gt;~80% fewer&lt;/td&gt;
  &lt;td&gt;Baseline&lt;/td&gt;
  &lt;td&gt;—&lt;/td&gt;
  &lt;td&gt;Write-time savings; no LLM extraction pass&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Independent benchmark comparisons beyond what the MemMachine paper reports are not available at the time of writing. The figures above come from arXiv:2604.04853 and the MemMachine official blog; treat the relative comparisons as claims to verify when production-scale testing is feasible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started: Installation and Basic Usage
&lt;/h2&gt;

&lt;p&gt;MemMachine is open-source at &lt;a href="https://github.com/MemMachine/MemMachine" rel="noopener noreferrer"&gt;github.com/MemMachine/MemMachine&lt;/a&gt; and published as &lt;code&gt;memmachine&lt;/code&gt; on PyPI. The quickstart requires Docker and Docker Compose because the system uses both a graph database and a SQL database for its memory tiers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Docker 24+ and Docker Compose&lt;/li&gt;
&lt;li&gt;Python 3.10+&lt;/li&gt;
&lt;li&gt;An OpenAI API key (or compatible LLM endpoint for the Retrieval Agent)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Download the latest release tarball from the GitHub releases page,&lt;/span&gt;
&lt;span class="c"&gt;# extract it, then run the setup script&lt;/span&gt;
./setup.sh  &lt;span class="c"&gt;# walks through Docker config and API key setup&lt;/span&gt;

&lt;span class="c"&gt;# Alternatively, install the Python SDK standalone:&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;memmachine
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Core Workflow
&lt;/h3&gt;

&lt;p&gt;The SDK follows a four-step pattern: retrieve → enrich → generate → store.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;memmachine&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MemMachineClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemMachineClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_42&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I prefer TypeScript over Python for backend work.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Retrieve relevant memories before generating a response
&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Enrich context with retrieved episodes
&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;enriched_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;User message:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Generate response using your LLM (not shown — use your preferred client)
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;your_llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enriched_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Store the full episode verbatim
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;producer&lt;/code&gt; field ties episodes to a specific user. When that user returns, every &lt;code&gt;search()&lt;/code&gt; call scoped to their &lt;code&gt;producer&lt;/code&gt; ID retrieves from their episode history — across all past sessions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Accessing the Retrieval Agent
&lt;/h3&gt;

&lt;p&gt;For multi-hop queries, use the Retrieval Agent directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The Retrieval Agent automatically selects the routing strategy
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retrieval_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What database was the user migrating to last month, and what hosting provider did they pick?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# episodes that contributed to the answer
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent classifies the query, selects direct, parallel, or chain-of-query routing, and returns both the answer and the source episodes. This traceability — knowing which stored episodes contributed to the answer — is a practical advantage over LLM-extraction-based systems where the provenance chain is broken at write time.&lt;/p&gt;

&lt;h2&gt;
  
  
  MemMachine vs Mem0: When to Choose Which
&lt;/h2&gt;

&lt;p&gt;The ground-truth-preserving approach is not free. Storing raw episodes grows the database faster than storing extracted summaries. For applications where storage cost is a hard constraint and conversation length is short, Mem0's extraction approach may be a reasonable trade-off. For applications where accuracy and multi-session reasoning matter — personalized coding assistants, customer support agents with long histories, companion AI — the accuracy gains from MemMachine's architecture are likely to outweigh the storage overhead.&lt;/p&gt;

&lt;p&gt;Key questions to guide the choice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Are your sessions long and multi-turn?&lt;/strong&gt; Ground-truth preservation gains more from longer episodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do you need multi-hop reasoning across sessions?&lt;/strong&gt; The Retrieval Agent's chain-of-query routing is designed for this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is write-time token cost a concern?&lt;/strong&gt; MemMachine's no-extraction-at-write-time model cuts ingestion costs substantially.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do you need audit trails?&lt;/strong&gt; Storing raw episodes lets you trace every memory back to its source conversation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Common Mistakes When Building Agent Memory
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Over-indexing on single-session performance.&lt;/strong&gt; Most agent memory benchmarks run in a single session. LongMemEval is one of the few that tests across sessions. Evaluate your memory system on multi-session workloads before deploying.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Assuming RAG extraction is lossless.&lt;/strong&gt; Every LLM extraction pass introduces paraphrase drift. Test by storing a conversation, extracting from it, then asking questions the original conversation answers but the summary might not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ignoring retrieval depth.&lt;/strong&gt; The MemMachine ablation shows retrieval depth tuning (+4.2%) has more impact than chunking strategy. Most teams optimize chunking obsessively while leaving retrieval depth at defaults.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skipping abstention testing.&lt;/strong&gt; A memory system that hallucinations when it does not know something is worse than one with lower recall. LongMemEval's abstention dimension is worth including in your eval suite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not scoping memories to users.&lt;/strong&gt; Mixing memories across users is a privacy and accuracy risk. Always use a user-scoped key (like MemMachine's &lt;code&gt;producer&lt;/code&gt; field) from day one.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q: Does MemMachine work with models other than OpenAI?
&lt;/h3&gt;

&lt;p&gt;The Retrieval Agent is LLM-agnostic — it uses the LLM for query classification and chain-of-query reasoning, so any model with tool-use capability should work. The documentation references OpenAI by default in the quickstart, but the SDK supports custom LLM endpoints.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How does MemMachine handle knowledge updates?
&lt;/h3&gt;

&lt;p&gt;Newer episodes naturally take precedence in retrieval ranking. For explicit corrections ("actually, I switched from TypeScript to Go last week"), the Retrieval Agent's temporal reasoning handles the update — it surfaces the more recent episode and the semantic/profile memory layer can be updated with the corrected fact.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Is there a hosted / cloud version?
&lt;/h3&gt;

&lt;p&gt;The GitHub repository and PyPI package are open-source. A managed cloud offering (memmachine.ai) appears to be available based on the official site, but pricing and tier details were not independently verified at time of writing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How does the graph database fit into the architecture?
&lt;/h3&gt;

&lt;p&gt;Episodic memory is stored in a graph database rather than a flat vector store. This allows the system to represent relationships between episodes (same user, same topic, same session) and traverse those relationships during contextualized retrieval — expanding a nucleus match with connected episodes without running a second embedding search.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can I use MemMachine with LangChain or LlamaIndex?
&lt;/h3&gt;

&lt;p&gt;Integration guides for major agent frameworks were not available in the documentation at the time of writing. The Python SDK and RESTful API are framework-agnostic, so wrapping them for LangChain or LlamaIndex is straightforward. An MCP server interface is also listed as available.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Ground-truth-preserving memory is a principled response to a real problem: LLM extraction at write time is lossy, and that loss compounds across sessions. MemMachine's approach — store raw episodes, retrieve intelligently — trades slightly more storage for substantially better accuracy and traceability.&lt;/p&gt;

&lt;p&gt;The benchmark results (93.0% LongMemEvalS, 0.9169 LoCoMo, 93.2% HotpotQA-hard) place it at the top of published open-framework results. The 80% token reduction on write operations is a meaningful cost argument for high-volume applications.&lt;/p&gt;

&lt;p&gt;For developers building agents that need to remember users across sessions — and get the details right — MemMachine is the most technically grounded option in the open-source memory layer space as of April 2026.&lt;/p&gt;

&lt;p&gt;Bottom Line&lt;br&gt;
  &lt;/p&gt;
&lt;p&gt;MemMachine's ground-truth-preserving architecture solves the write-time extraction problem that makes most agent memory systems degrade over long conversations. If you're building personalized agents that need accurate multi-session recall, it's worth evaluating against your current Mem0 or RAG-based setup — the token savings alone may offset the storage overhead.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Sources: &lt;a href="https://arxiv.org/abs/2604.04853" rel="noopener noreferrer"&gt;arXiv:2604.04853&lt;/a&gt; · &lt;a href="https://github.com/MemMachine/MemMachine" rel="noopener noreferrer"&gt;MemMachine GitHub&lt;/a&gt; · &lt;a href="https://pypi.org/project/memmachine/" rel="noopener noreferrer"&gt;PyPI: memmachine&lt;/a&gt; · &lt;a href="https://github.com/xiaowu0162/longmemeval" rel="noopener noreferrer"&gt;LongMemEval benchmark&lt;/a&gt; · &lt;a href="https://memmachine.ai/blog/2025/12/memmachine-v0.2-delivers-top-scores-and-efficiency-on-locomo-benchmark/" rel="noopener noreferrer"&gt;MemMachine blog: LoCoMo results&lt;/a&gt; · &lt;a href="https://memmachine.ai/blog/2026/02/how-memmachine-transforms-openclaws-memory-on-wikimultihop/" rel="noopener noreferrer"&gt;MemMachine blog: WikiMultiHop&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agentmemory</category>
      <category>llminfrastructure</category>
      <category>paperpoc</category>
      <category>rag</category>
    </item>
    <item>
      <title>Cursor SDK: Build AI Coding Agents in TypeScript</title>
      <dc:creator>Jangwook Kim</dc:creator>
      <pubDate>Tue, 12 May 2026 00:13:01 +0000</pubDate>
      <link>https://dev.to/jangwook_kim_e31e7291ad98/cursor-sdk-build-ai-coding-agents-in-typescript-3gf6</link>
      <guid>https://dev.to/jangwook_kim_e31e7291ad98/cursor-sdk-build-ai-coding-agents-in-typescript-3gf6</guid>
      <description>&lt;p&gt;When Cursor released its TypeScript SDK on April 29, 2026, it changed the framing of what a coding assistant is. Before the SDK, Cursor agents lived inside the IDE. After it, they're programmable infrastructure — agents you can invoke from a CI pipeline, a cron job, or a backend service with the same capabilities as the desktop app.&lt;/p&gt;

&lt;p&gt;This guide covers what the Cursor SDK actually is, how its deployment model works, and how to wire it into real workflows. Effloow Lab verified that &lt;code&gt;@cursor/sdk&lt;/code&gt; v1.0.12 is installable and inspected its public API surface as part of this write-up (see &lt;code&gt;data/lab-runs/cursor-sdk-typescript-agent-deployment.md&lt;/code&gt; for the raw commands and outputs).&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Cursor SDK Is (and Isn't)
&lt;/h2&gt;

&lt;p&gt;The Cursor SDK (&lt;code&gt;@cursor/sdk&lt;/code&gt;) is a TypeScript package that gives you programmatic access to the same agent runtime that powers the Cursor desktop app, CLI, and web interface. You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Codebase indexing&lt;/strong&gt; — the agent understands your repo structure out of the box&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full tool access&lt;/strong&gt; — read, write, glob, grep, shell, semantic search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model choice&lt;/strong&gt; — Composer 2 (Cursor's in-house model), Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro, and others&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP server support&lt;/strong&gt; — connect external tools over stdio or HTTP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hooks&lt;/strong&gt; — lifecycle callbacks to observe, modify, or block agent actions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subagents&lt;/strong&gt; — delegate subtasks to named sub-agents with different prompts and models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What it isn't: a general-purpose LLM client or a thin wrapper over the OpenAI API. The SDK is specifically built around the Cursor agent loop — the same harness that handles multi-step coding tasks with tool use, checkpointing, and repo context.&lt;/p&gt;

&lt;p&gt;As of v1.0.12 (published 2026-05-01), the package has been in public beta for all users since launch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation and Environment Setup
&lt;/h2&gt;

&lt;p&gt;Installing the SDK is a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @cursor/sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Effloow Lab sandbox confirmed this installs successfully on Node.js v25.9.0 with npm 11.12.1, pulling 131 packages total. The main dependencies are &lt;code&gt;@bufbuild/protobuf&lt;/code&gt; and &lt;code&gt;@connectrpc/connect&lt;/code&gt; (for the gRPC-based communication layer), &lt;code&gt;sqlite3&lt;/code&gt; (used to store local run events), and &lt;code&gt;zod&lt;/code&gt; for schema validation.&lt;/p&gt;

&lt;p&gt;One note from the sandbox run: &lt;code&gt;npm audit&lt;/code&gt; reports 10 vulnerabilities (2 low, 1 moderate, 7 high) in the current dependency tree, mostly stemming from &lt;code&gt;sqlite3&lt;/code&gt;'s native bindings. This is common for public beta SDKs. Run &lt;code&gt;npm audit&lt;/code&gt; before any production deployment and monitor for updates.&lt;/p&gt;

&lt;p&gt;You'll also need a Cursor API key. Set it as an environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CURSOR_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_key_here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;API keys are available from your Cursor account settings. The SDK reads &lt;code&gt;CURSOR_API_KEY&lt;/code&gt; from the environment automatically, or you can pass it explicitly in the &lt;code&gt;Agent.create()&lt;/code&gt; call.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent API: Three Deployment Modes
&lt;/h2&gt;

&lt;p&gt;The SDK exposes an &lt;code&gt;Agent&lt;/code&gt; class with a static &lt;code&gt;Agent.create()&lt;/code&gt; factory. The Effloow Lab inspection confirmed these static methods on the &lt;code&gt;Agent&lt;/code&gt; class: &lt;code&gt;create&lt;/code&gt;, &lt;code&gt;resume&lt;/code&gt;, &lt;code&gt;prompt&lt;/code&gt;, &lt;code&gt;list&lt;/code&gt;, &lt;code&gt;listRuns&lt;/code&gt;, &lt;code&gt;getRun&lt;/code&gt;, &lt;code&gt;get&lt;/code&gt;, &lt;code&gt;archive&lt;/code&gt;, &lt;code&gt;unarchive&lt;/code&gt;, &lt;code&gt;delete&lt;/code&gt;, and &lt;code&gt;messages&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;create&lt;/code&gt; call accepts three mutually exclusive deployment targets via the options object:&lt;/p&gt;

&lt;h3&gt;
  
  
  Mode 1: Local Agent
&lt;/h3&gt;

&lt;p&gt;The agent runs on your machine, using your local filesystem. Good for one-off scripts, local automation, and development.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cursor/sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CURSOR_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;composer-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;local&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Summarize what this repository does and list its main entry points&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text-delta&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;run.stream()&lt;/code&gt; method yields typed &lt;code&gt;InteractionUpdate&lt;/code&gt; events — text deltas, tool call starts and completions, thinking deltas, and shell output. This lets you build real-time UIs or pipe the output to logs without buffering the whole response.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mode 2: Cloud Agent
&lt;/h3&gt;

&lt;p&gt;Each cloud run gets its own sandboxed VM managed by Cursor. The VM clones the target repository, sets up the development environment, and continues running even if the invoking script exits. You can reconnect and stream the conversation later using &lt;code&gt;Agent.resume()&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cursor/sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CURSOR_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;cloud&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github.com/your-org/your-repo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;branch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;main&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Review the changes in the last PR and open a follow-up issue for any technical debt introduced&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Optionally stream, or fire-and-forget&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Agent status:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cloud agents are the right choice for long-running tasks, CI-triggered workflows, or anything where you don't want the agent tied to a process lifecycle. The agent can push branches and open pull requests directly from the VM.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mode 3: Self-Hosted Agent
&lt;/h3&gt;

&lt;p&gt;You can run the agent runtime on your own infrastructure — useful for air-gapped environments or when you need to control where code is executed. Self-hosted mode requires a separate runner binary and additional setup documented in the Cursor SDK docs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hooks: Observing and Controlling the Agent Loop
&lt;/h2&gt;

&lt;p&gt;Hooks let you intercept the agent at specific points in its execution loop. They're defined in &lt;code&gt;.cursor/hooks.json&lt;/code&gt; at the repo root and apply across all three deployment modes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"event"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"onFileEdit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx prettier --write {file}"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"event"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"beforeShellCommand"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"scripts/guard-destructive.sh {command}"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"event"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"onRunComplete"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"scripts/notify-slack.sh {runId} {status}"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Common hook events include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;onFileEdit&lt;/code&gt; — fires after each file write, useful for formatters and linters&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;beforeShellCommand&lt;/code&gt; — runs before shell execution, good for safety guardrails&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;onRunComplete&lt;/code&gt; — fires when the agent finishes, useful for notifications or post-processing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;onToolCall&lt;/code&gt; — fires on any tool use, giving you a full audit log&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hooks receive context variables like &lt;code&gt;{file}&lt;/code&gt;, &lt;code&gt;{command}&lt;/code&gt;, and &lt;code&gt;{runId}&lt;/code&gt; as string interpolations in the command. The hook's exit code determines whether the agent proceeds (exit 0) or aborts the action (any non-zero exit).&lt;/p&gt;

&lt;h2&gt;
  
  
  Subagents: Multi-Agent Orchestration Without Custom Glue
&lt;/h2&gt;

&lt;p&gt;Subagents let the main agent delegate subtasks to specialized agents. You define them either in &lt;code&gt;.cursor/agents/*.md&lt;/code&gt; files or inline in the &lt;code&gt;Agent.create()&lt;/code&gt; call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CURSOR_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;composer-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;local&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;code-reviewer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a strict code reviewer focused on security and performance. Review changes and output a list of issues.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;doc-writer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You write concise, accurate technical documentation in Markdown.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;composer-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the main agent calls the &lt;code&gt;Agent&lt;/code&gt; tool with a name matching one of these definitions, the subagent runs as a separate call with its own prompt context and model. The main agent receives the subagent's output and continues. This is how teams build multi-step pipelines — review agent → fix agent → doc agent — without custom orchestration code.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP Server Integration
&lt;/h2&gt;

&lt;p&gt;MCP (Model Context Protocol) servers extend the agent's tool set with external data sources and actions. You can configure them in &lt;code&gt;.cursor/mcp.json&lt;/code&gt; or pass them directly to &lt;code&gt;Agent.create()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CURSOR_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;composer-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;local&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;mcpServers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;linear&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;stdio&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;npx&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;-y&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@linear/mcp-server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;LINEAR_API_KEY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LINEAR_API_KEY&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;postgres&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://localhost:8080/mcp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With MCP servers configured, the agent can query Linear for ticket context, pull database schemas, or call any custom tool you expose over the MCP protocol — without any special prompting. The agent discovers available tools automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  CI/CD Patterns
&lt;/h2&gt;

&lt;p&gt;The most common production use case teams are building with the Cursor SDK is CI/CD integration. Here are three patterns that work today:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 1: PR Summary on Push
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ci/summarize-pr.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cursor/sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prNumber&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PR_NUMBER&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;branch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GITHUB_HEAD_REF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CURSOR_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;composer-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;cloud&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GITHUB_REPOSITORY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;branch&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
  Review the changes in PR #&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;prNumber&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; compared to main.
  Write a concise summary covering: what changed, why it matters, and any risks.
  Output as a GitHub comment body in Markdown.
`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// Post result.output to the PR via GitHub API&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pattern 2: Auto-Fix CI Failures
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ci/fix-failures.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cursor/sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;readFileSync&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;failureLog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/tmp/ci-failures.txt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;utf-8&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CURSOR_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;cloud&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GITHUB_REPOSITORY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;branch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`fix/auto-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;baseBranch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GITHUB_HEAD_REF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
  The following tests are failing:\n\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;failureLog&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\n\n
  Identify the root cause and apply the minimal fix.
  Push the changes and open a pull request with a description of what you changed and why.
`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pattern 3: Scheduled Documentation Sync
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// scripts/sync-docs.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cursor/sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CURSOR_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;composer-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;cloud&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github.com/your-org/docs&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;branch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;main&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Review all public API functions in src/api/ for missing or outdated JSDoc comments. Update them to match the current implementation. Commit directly to main.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These patterns work because cloud agents persist independently — the CI runner just fires the &lt;code&gt;Agent.create()&lt;/code&gt; call and optionally monitors the result. The actual work happens in Cursor's infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing Considerations
&lt;/h2&gt;

&lt;p&gt;The Cursor SDK uses the same token-based pricing as Cursor's Max Mode. A few reference points from Cursor's published pricing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auto mode&lt;/strong&gt; (Cursor selects model): ~$0.25/M tokens (cache read), $1.25/M tokens (input), $6.00/M tokens (output)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Opus 4.7&lt;/strong&gt; and &lt;strong&gt;GPT-5.5&lt;/strong&gt; are frontier-tier models priced higher&lt;/li&gt;
&lt;li&gt;A complex task with 150 tool calls, 200K input tokens, and 20K output tokens costs roughly $3–8 depending on the model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloud agents also incur VM compute costs during execution. For typical CI tasks (PR summary, test fix), an agent run of 5–15 minutes will consume a manageable amount of tokens — but high-frequency automation on large repos can accumulate cost quickly. Start with &lt;code&gt;composer-2&lt;/code&gt; for volume use cases and reserve Opus 4.7 for tasks where accuracy matters most.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Strengths
&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;Same runtime as the desktop app — no capability gap between IDE and API agents&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Three deployment modes (local, cloud, self-hosted) in a single API surface&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Hooks and subagents enable complex pipelines without custom orchestration code&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;MCP server support extends the agent's tool set to any external system&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Cloud agents persist independently of the invoking process&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;


Limitations
&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;Public beta — API surface may change between minor versions&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;10 npm audit vulnerabilities in v1.0.12 (monitor for fixes)&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Token-based pricing can accumulate at scale; no flat-rate option for SDK use&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;Self-hosted mode requires a separate runner binary setup&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;No offline or air-gapped cloud VM option; cloud mode phones home to Cursor infrastructure&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  What to Build First
&lt;/h2&gt;

&lt;p&gt;If you're evaluating the Cursor SDK, the fastest path to value is a PR summary script. It's a contained task, produces immediately useful output, and doesn't require write access to your repo. From there, the escalation path is clear: CI failure diagnosis, documentation sync, and eventually multi-agent pipelines with subagents.&lt;/p&gt;

&lt;p&gt;Rippling, Notion, Faire, and C3 AI are confirmed early adopters from the April 2026 launch announcement — all using cloud agents for CI/CD automation at various scales.&lt;/p&gt;

&lt;p&gt;The SDK is in active development. Check the &lt;a href="https://cursor.com/docs/sdk/typescript" rel="noopener noreferrer"&gt;Cursor SDK changelog&lt;/a&gt; and the &lt;a href="https://github.com/cursor/cookbook" rel="noopener noreferrer"&gt;cursor/cookbook&lt;/a&gt; GitHub repo for recipes that the Cursor team adds as real-world patterns emerge.&lt;/p&gt;

&lt;p&gt;Bottom Line&lt;br&gt;
  &lt;/p&gt;
&lt;p&gt;The Cursor SDK turns a coding assistant into programmable infrastructure. If your team already uses Cursor and runs any manual agent tasks on a schedule, this SDK makes those tasks repeatable, auditable, and composable. Start with cloud mode for CI/CD and move to subagents once the basic pattern works.&lt;/p&gt;

&lt;h3&gt;
  
  
  FAQ
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Q: Do I need a Cursor Pro subscription to use the SDK?
&lt;/h3&gt;

&lt;p&gt;You need a Cursor account with API access. As of the public beta, SDK usage is billed separately via token-based pricing, not against your plan's monthly request quota. Check your Cursor account settings for API key availability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can I use the SDK without the Cursor desktop app installed?
&lt;/h3&gt;

&lt;p&gt;Yes. The SDK is a standalone npm package. You don't need the desktop app on the machine running the agent. You do need a &lt;code&gt;CURSOR_API_KEY&lt;/code&gt; and network access to Cursor's API endpoints.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How does cloud agent persistence work if my script exits?
&lt;/h3&gt;

&lt;p&gt;When you launch a cloud agent, Cursor's infrastructure takes ownership of the VM and the agent run. Your script can exit immediately after calling &lt;code&gt;agent.send()&lt;/code&gt;. You can reconnect later with &lt;code&gt;Agent.resume(agentId)&lt;/code&gt; and stream the conversation from wherever it left off.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Are there rate limits on the SDK?
&lt;/h3&gt;

&lt;p&gt;The SDK inherits Cursor's standard rate limits. High-frequency automation (many concurrent cloud agents) may hit limits. The &lt;code&gt;Cursor.models()&lt;/code&gt; call returns the current model list, and responses include rate limit headers when limits are approached.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: What models are available?
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Cursor.models()&lt;/code&gt; returns the full list of available models at runtime. At launch, this includes Composer 2, Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro, and additional frontier models as they become available inside Cursor.&lt;/p&gt;

</description>
      <category>cursor</category>
      <category>typescript</category>
      <category>aiagents</category>
      <category>sdk</category>
    </item>
  </channel>
</rss>
