<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Max Quimby</title>
    <description>The latest articles on DEV Community by Max Quimby (@max_quimby).</description>
    <link>https://dev.to/max_quimby</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3823178%2F0a97facc-1e95-494c-9db9-084aa3b35e47.png</url>
      <title>DEV Community: Max Quimby</title>
      <link>https://dev.to/max_quimby</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/max_quimby"/>
    <language>en</language>
    <item>
      <title>Shannon AI Review: Autonomous Web Pentesting Agent</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Fri, 24 Apr 2026 04:25:16 +0000</pubDate>
      <link>https://dev.to/max_quimby/shannon-ai-review-autonomous-web-pentesting-agent-3jdi</link>
      <guid>https://dev.to/max_quimby/shannon-ai-review-autonomous-web-pentesting-agent-3jdi</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://agentconn.com/blog/shannon-ai-pentester-review-autonomous-web-security-2026" rel="noopener noreferrer"&gt;Read the full version with screenshots and embedded sources on AgentConn →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On April 22, 2026, the &lt;a href="https://news.ycombinator.com/item?id=47876043" rel="noopener noreferrer"&gt;Bitwarden CLI package was compromised&lt;/a&gt; and pushed to npm as version 2026.4.0. The malicious release was live for 19 hours, and 334 users downloaded it before detection. Bitwarden is one of the most-audited, most-trusted password managers on the planet — and the attack was caught by community monitoring, not by the organization's own tooling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=47876043" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fksg200fy1xty3fpxc884.png" alt="Hacker News: Bitwarden CLI compromised in Checkmarx supply chain campaign — 679 points, 337 comments" width="600" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the context in which &lt;a href="https://github.com/KeygraphHQ/shannon" rel="noopener noreferrer"&gt;Shannon&lt;/a&gt; needs to be evaluated — not as an academic security toy, but as a response to an increasingly hostile environment where the traditional model of "annual pentest, quarterly audit" is already obsolete before the PDF is delivered.&lt;/p&gt;

&lt;p&gt;Shannon is an open-source autonomous AI pentesting agent built by &lt;a href="https://keygraph.io/shannon" rel="noopener noreferrer"&gt;Keygraph&lt;/a&gt;. It reads your source code, maps your attack surface, and attempts to break in — producing a report with zero false positives, because it only files findings it can actively prove with a working exploit. As of April 2026 it has 40.1K GitHub stars, and it is powered by Anthropic's Claude.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/The_Cyber_News/status/2019777360313434478" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frddtlmdi1zutukguj1z5.png" alt="@The_Cyber_News: Shannon AI Pentesting Tool that Autonomously Checks for Code Vulnerabilities in 90 Minutes" width="548" height="920"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Shannon Actually Does
&lt;/h2&gt;

&lt;p&gt;When you run Shannon, it executes a five-phase workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pre-reconnaissance&lt;/strong&gt; — Static code analysis: architecture patterns, entry points, authentication mechanisms, likely attack vectors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reconnaissance&lt;/strong&gt; — Dynamic analysis via Playwright browser automation: forms, API endpoints, authentication flows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vulnerability &amp;amp; Exploitation&lt;/strong&gt; — Five parallel Claude agents simultaneously test for SQLi, XSS, authorization bypasses, SSRF, and IDOR. No PoC = no finding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confirmation&lt;/strong&gt; — Dedicated pass verifies each exploit is reproducible&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reporting&lt;/strong&gt; — Proven vulnerabilities only, with exact &lt;code&gt;curl&lt;/code&gt; commands to reproduce&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Cost: ~$50 in Anthropic API credits. Time: 1–1.5 hours. Compare that with $10,000–$50,000 and multiple weeks for a traditional pentest.&lt;/p&gt;
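&lt;p&gt;A quick back-of-envelope check on those numbers (all figures from this article; the shell arithmetic is just illustrative):&lt;/p&gt;

```shell
# A year of daily Shannon scans vs. the quoted price of one traditional pentest.
per_scan=50                          # ~$50 in Anthropic API credits per run
annual=$((per_scan * 365))
echo "365 daily scans: \$${annual}"  # $18250 -- inside the $10k-$50k range
```

&lt;p&gt;In other words, a full year of daily scans lands inside the quoted range for a single traditional pentest.&lt;/p&gt;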

&lt;p&gt;&lt;a href="https://x.com/DavidBorish/status/2041171017029042465" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4ykpdasi1k6j8ay35tb.png" alt="@DavidBorish: Shannon hit 10,000 GitHub stars by actually breaking into web applications instead of just flagging potential problems" width="548" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The XBOW Benchmark: 96.15%
&lt;/h2&gt;

&lt;p&gt;Shannon scored 96.15% on the XBOW security benchmark — 100 of 104 intentionally vulnerable web apps solved in hint-free, source-aware mode. Commercial DAST tools typically score 30–40% on comparable evaluations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/AISecHub/status/2000413083693445600" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5un01bd5gie7t5376aqd.png" alt="@AISecHub: Shannon has achieved a 96.15% success rate on the hint-free source-aware XBOW Benchmark" width="548" height="764"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On Test Results
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;DVNA (Node.js)&lt;/strong&gt; — Shannon detected SQL injection, command injection, XSS, and XXE with working exploits. "What stood out was how Shannon organized the analysis — it structured the findings into clear sections."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OWASP Juice Shop&lt;/strong&gt; — &lt;a href="https://betterstack.com/community/guides/ai/shannon-ai/" rel="noopener noreferrer"&gt;Better Stack's test&lt;/a&gt; consumed ~$60 in API credits. Shannon "didn't say 'this login looks weak' — it bypassed the login, dumped data, and handed me the screenshots and logs to prove it." Zero false positives.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Economics
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Frequency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Traditional pentest&lt;/td&gt;
&lt;td&gt;$10,000–$50,000&lt;/td&gt;
&lt;td&gt;Weeks&lt;/td&gt;
&lt;td&gt;Annual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shannon per scan&lt;/td&gt;
&lt;td&gt;~$50 API&lt;/td&gt;
&lt;td&gt;1–1.5 hours&lt;/td&gt;
&lt;td&gt;Daily in CI/CD&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
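&lt;p&gt;The "daily in CI/CD" row can be made concrete with a scheduled pipeline job. Here is a minimal sketch as a GitHub Actions workflow; the two &lt;code&gt;npx&lt;/code&gt; commands are the ones from the Setup section below, while the schedule, staging URL, secret name, and the assumption that Shannon reads &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt; from the environment are illustrative, not from Shannon's docs:&lt;/p&gt;

```yaml
# Nightly Shannon scan against a staging deployment (never production).
name: nightly-shannon-scan
on:
  schedule:
    - cron: "0 3 * * *"         # 03:00 UTC daily -- placeholder schedule
jobs:
  pentest:
    runs-on: ubuntu-latest      # GitHub-hosted runners ship with Docker
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
      - name: Run Shannon
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}  # assumed env var
        run: |
          npx @keygraph/shannon setup
          npx @keygraph/shannon start -u https://staging.example.com -r "$GITHUB_WORKSPACE"
```

&lt;p&gt;Point it at a staging environment, per the limitations below — Shannon creates users and modifies data during a scan.&lt;/p&gt;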

&lt;h2&gt;
  
  
  What Shannon Misses
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;White-box only&lt;/strong&gt; — requires source code access; can't test closed-source dependencies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited vulnerability classes&lt;/strong&gt; — SQLi, XSS, SSRF, and authorization bypasses (including IDOR). Business logic flaws: not in scope&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not for production&lt;/strong&gt; — creates users, modifies data, fires injection probes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM residual risk&lt;/strong&gt; — confirmation phase helps but human review still essential&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Dual-Use Concern
&lt;/h2&gt;

&lt;p&gt;From the &lt;a href="https://news.ycombinator.com/item?id=46944416" rel="noopener noreferrer"&gt;HN discussion&lt;/a&gt;: "Since this is open source, it's a white-hat tool, but it also democratizes script kiddos being able to do some serious damage." Shannon's developer responded: "I guess who owns the most hardware wins the arms race?"&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Requirements: Docker, Node.js 18+, Anthropic API key&lt;/span&gt;
npx @keygraph/shannon setup
npx @keygraph/shannon start &lt;span class="nt"&gt;-u&lt;/span&gt; https://your-dev-app.com &lt;span class="nt"&gt;-r&lt;/span&gt; /path/to/repo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use Shannon if:&lt;/strong&gt; shifting security left, web app with source code you control, OWASP Top 10 exposure, need something between nothing and a full pentest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't rely on Shannon if:&lt;/strong&gt; black-box testing needed, business logic is your risk, compliance-ready reports required, production environment.&lt;/p&gt;

&lt;p&gt;Shannon is at &lt;a href="https://github.com/KeygraphHQ/shannon" rel="noopener noreferrer"&gt;github.com/KeygraphHQ/shannon&lt;/a&gt; — AGPL-3.0.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://agentconn.com/blog/shannon-ai-pentester-review-autonomous-web-security-2026" rel="noopener noreferrer"&gt;AgentConn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>opensource</category>
      <category>pentesting</category>
    </item>
    <item>
      <title>GPT-5.5 vs Claude Code: Which AI Should You Use?</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Fri, 24 Apr 2026 03:26:24 +0000</pubDate>
      <link>https://dev.to/max_quimby/gpt-55-vs-claude-code-which-ai-should-you-use-58fe</link>
      <guid>https://dev.to/max_quimby/gpt-55-vs-claude-code-which-ai-should-you-use-58fe</guid>
      <description>&lt;p&gt;The agentic coding race just got a whole lot more explicit.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://computeleap.com/blog/gpt-5-5-vs-claude-code-agentic-coding-ai-2026" rel="noopener noreferrer"&gt;Read the full version with charts and embedded sources on ComputeLeap →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On April 23, 2026, OpenAI shipped &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;GPT-5.5&lt;/a&gt; with a framing it hasn't used before: not a smarter chat model, but "a new class of intelligence for real work and powering agents." The subtext is unmistakable — OpenAI is coming directly for the territory Claude Code has been quietly dominating among professional developers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/OpenAI/status/2047376561205325845" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fondzunwp52ngnswtmi75.png" alt="OpenAI tweet announcing GPT-5.5 — 40K likes, 8.4K retweets" width="700" height="750"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The launch racked up 40K likes within hours. Developers who have been routing serious coding work through Claude Code are suddenly asking whether it's time to reconsider. The honest answer? It depends on what you're building — and who's paying for it.&lt;/p&gt;

&lt;p&gt;This is a practical decision guide. We'll cover the benchmark reality, the pricing drama that erupted this week, and the three distinct use cases where each tool wins. No hype, no both-sides-ism. Just a clear read on the current state of the agentic coding wars.&lt;/p&gt;

&lt;h2&gt;
  
  
  What GPT-5.5 Actually Is
&lt;/h2&gt;

&lt;p&gt;GPT-5.5 is the first fully retrained base model OpenAI has shipped since GPT-4.5. Every previous 5.x release (5.1, 5.2, 5.3, 5.4) was built on the same foundation — this one is not.&lt;/p&gt;

&lt;p&gt;The headline benchmark: &lt;strong&gt;82.7% on Terminal-Bench 2.0&lt;/strong&gt;, a test of complex command-line workflows that require planning, iteration, and coordinated tool use. It also posts 58.6% on SWE-Bench Pro (real GitHub issue resolution end-to-end in a single pass) and 84.9% on GDPval, which tests general-purpose knowledge work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://techcrunch.com/2026/04/23/openai-chatgpt-gpt-5-5-ai-model-superapp/" rel="noopener noreferrer"&gt;TechCrunch's coverage&lt;/a&gt; notes that Greg Brockman called it "a real step forward towards the kind of computing that we expect in the future" — pointing to autonomous task completion, not just chat fluency. The model is designed to use tools, verify its own work, and carry multi-step tasks through to completion without requiring constant human steering.&lt;/p&gt;

&lt;p&gt;What changed under the hood according to &lt;a href="https://interestingengineering.com/ai-robotics/opanai-gpt-5-5-agentic-coding-gains" rel="noopener noreferrer"&gt;Interesting Engineering&lt;/a&gt;: fewer refusals mid-task, better intent retention across long tool chains, and more efficient token usage per completed task than GPT-5.4. It's natively omnimodal (text, images, audio, video in a single unified system) and available in both ChatGPT and Codex immediately on launch day for Plus, Pro, Business, and Enterprise subscribers.&lt;/p&gt;

&lt;p&gt;The pricing is not gentle. &lt;a href="https://venturebeat.com/ai/openais-gpt-5-5-is-here-and-its-no-potato-narrowly-beats-anthropics-claude-mythos-preview-on-terminal-bench-2-0/" rel="noopener noreferrer"&gt;VentureBeat's analysis&lt;/a&gt; puts GPT-5.5 API at $5/million input tokens and $30/million output tokens — roughly 2x the per-token cost of GPT-5.4. OpenAI's defense is fewer tokens per task, but that tradeoff only holds if your workload actually benefits from GPT-5.5's strengths.&lt;/p&gt;
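&lt;p&gt;That tradeoff is easy to sanity-check with arithmetic. In the sketch below, GPT-5.5's $5/$30 rates come from the VentureBeat figures above; the GPT-5.4 rates are back-derived from the "roughly 2x" claim, and the task's token counts are invented for illustration:&lt;/p&gt;

```shell
# Cost of one agentic task at each model's per-million-token rates.
awk 'BEGIN {
  in_tok = 20000; out_tok = 5000                # illustrative task size
  c54 = in_tok/1e6 * 2.50 + out_tok/1e6 * 15    # GPT-5.4, assumed half of 5.5 rates
  c55 = in_tok/1e6 * 5.00 + out_tok/1e6 * 30    # GPT-5.5: $5/M in, $30/M out
  printf "GPT-5.4: $%.3f  GPT-5.5: $%.3f  break-even ratio: %.0f%%\n",
         c54, c55, c54 / c55 * 100
}'
```

&lt;p&gt;At a uniform 2x price, GPT-5.5 has to finish the same task in half the tokens just to match GPT-5.4 on cost; anything less efficient than that and the premium is real.&lt;/p&gt;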

&lt;h2&gt;
  
  
  What Claude Code Actually Is
&lt;/h2&gt;

&lt;p&gt;Claude Code is a different category of product. It's not a chat interface with coding capabilities bolted on — it's a terminal-native agent built specifically for software engineers. It runs in your local terminal, integrates directly with VS Code and JetBrains, understands your full repo context, and executes multi-hour autonomous coding sessions that Anthropic describes as its core use case.&lt;/p&gt;

&lt;p&gt;The underlying model powering serious Claude Code work today is &lt;strong&gt;Claude Opus 4.7&lt;/strong&gt;, released April 16, 2026. Its signature benchmark is &lt;strong&gt;64.3% on SWE-Bench Pro&lt;/strong&gt; — the highest score on that test for complex multi-file GitHub issue resolution. Opus 4.7 leads GPT-5.5 on 6 of the 10 shared benchmarks both providers report, particularly on the reasoning-heavy and code review-grade tests (GPQA Diamond, HLE, SWE-Bench Pro, MCP Atlas).&lt;/p&gt;

&lt;p&gt;For a ground-level look at how real developers are using it, the &lt;a href="https://www.youtube.com/watch?v=wkv2ifxPpF8" rel="noopener noreferrer"&gt;Y Combinator video featuring Garry Tan's Claude Code setup&lt;/a&gt; is worth 15 minutes. Tan walks through his "GStack" — the full Claude Code-native development environment he runs as a solo-founder-style operator.&lt;/p&gt;

&lt;p&gt;Claude Code's strongest differentiator isn't a benchmark. It's the depth of context retention and the autonomy of its execution. In the &lt;a href="https://news.ycombinator.com/item?id=47879092" rel="noopener noreferrer"&gt;Hacker News thread&lt;/a&gt; that followed GPT-5.5's launch, one recurring pattern emerged: developers described Claude Code as "autonomous/thoughtful — it plans deeply and asks less of the human," while Codex/GPT-5.5 is characterized as "an interactive collaborator where you steer it mid-execution."&lt;/p&gt;

&lt;p&gt;Check our &lt;a href="https://computeleap.com/blog/claude-code-complete-guide-2026" rel="noopener noreferrer"&gt;complete guide to Claude Code&lt;/a&gt; for a deep dive on how to set up and optimize Claude Code for your workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Head-to-Head: Benchmarks That Actually Matter
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://lushbinary.com/blog/gpt-5-5-vs-claude-opus-4-7-comparison-benchmarks-pricing/" rel="noopener noreferrer"&gt;Lushbinary's analysis&lt;/a&gt; of the 10 benchmarks both providers publicly report gives the clearest picture:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Opus 4.7 leads on 6:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SWE-Bench Pro: &lt;strong&gt;64.3%&lt;/strong&gt; vs 58.6%&lt;/li&gt;
&lt;li&gt;GPQA Diamond: Opus leads&lt;/li&gt;
&lt;li&gt;HLE (with and without tools): Opus leads&lt;/li&gt;
&lt;li&gt;MCP Atlas: Opus leads&lt;/li&gt;
&lt;li&gt;FinanceAgent v1.1: Opus leads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GPT-5.5 leads on 4:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terminal-Bench 2.0: &lt;strong&gt;82.7%&lt;/strong&gt; vs 69.4%&lt;/li&gt;
&lt;li&gt;BrowseComp: GPT-5.5 leads&lt;/li&gt;
&lt;li&gt;OSWorld-Verified: GPT-5.5 leads&lt;/li&gt;
&lt;li&gt;CyberGym: GPT-5.5 leads with &lt;strong&gt;82%&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ One important nuance: GPT-5.5's 58.6% on SWE-Bench Pro is measured in single-pass mode. Claude Code typically runs multiple iterations. Comparing single-pass GPT-5.5 scores to multi-pass Claude Code sessions is not apples-to-apples.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://x.com/omarsar0/status/2047424707310289058" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frw7dyhdrdmfmmfedem1f.png" alt="AI researcher first impressions of GPT-5.5 agentic capabilities" width="700" height="750"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=47879092" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yerdqx6jnmwu8iptc13.png" alt="Hacker News discussion on GPT-5.5 — developers compare Claude Code vs Codex workflows" width="700" height="750"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pricing Drama You Need to Know
&lt;/h2&gt;

&lt;p&gt;On April 22, &lt;a href="https://www.theregister.com/2026/04/22/anthropic_removes_claude_code_pro/" rel="noopener noreferrer"&gt;The Register reported&lt;/a&gt; that Anthropic quietly updated its pricing page — Claude Code showed an "X" in the Pro column, suggesting the feature was being moved exclusively to the $100/month and $200/month Max plans. No press release, no email, no changelog entry.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Reddit and HN caught fire immediately. For a large segment of Pro subscribers, Claude Code &lt;em&gt;was&lt;/em&gt; the reason they paid $20/month. The apparent removal felt like a retroactive bait-and-switch.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://www.theregister.com/2026/04/22/anthropic_removes_claude_code_pro/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faivgm4g3on4z8isb7ay7.png" alt="The Register coverage of Anthropic removing Claude Code from Pro plan" width="700" height="750"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://simonwillison.net/2026/apr/22/claude-code-confusion/" rel="noopener noreferrer"&gt;Simon Willison's take&lt;/a&gt; captured the confusion well: within hours of his blog post being drafted, Anthropic had reversed the pricing page change. Anthropic's Head of Growth Amol Avasare clarified the change affected "~2% of new prosumer signups" only.&lt;/p&gt;

&lt;p&gt;The contrast with Codex is stark. &lt;a href="https://www.builder.io/blog/codex-vs-claude-code" rel="noopener noreferrer"&gt;Builder.io's comparison&lt;/a&gt; makes it plain: "Many more people can live comfortably on the $20 Codex plan than Claude's $17 plan where limits get hit quickly."&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Decision Scenarios
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario 1: Solo Developer / Indie Hacker
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Winner: Claude Code&lt;/strong&gt; — with caveats on budget.&lt;/p&gt;

&lt;p&gt;If you're running a solo operation and want an AI that will autonomously execute multi-hour coding sessions while you focus on product decisions, Claude Code on Opus 4.7 is the deeper tool. The caveat: if you're on the $20 Pro plan and hitting limits regularly, GPT-5.5 in Codex is a legitimate alternative.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 2: Engineering Team (5–50 People)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Winner: GPT-5.5 / Codex&lt;/strong&gt; — on ecosystem and GitHub integration.&lt;/p&gt;

&lt;p&gt;For teams, &lt;a href="https://www.builder.io/blog/codex-vs-claude-code" rel="noopener noreferrer"&gt;Builder.io&lt;/a&gt; identifies Codex's GitHub integration as its decisive advantage. GPT-5.5 also supports the &lt;code&gt;AGENTS.md&lt;/code&gt; standard — Claude Code's exclusive use of &lt;code&gt;CLAUDE.md&lt;/code&gt; creates friction in multi-tool team environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 3: Enterprise (100+ Engineers)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Winner: Hybrid + &lt;a href="https://github.com/farion1231/cc-switch" rel="noopener noreferrer"&gt;cc-switch&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At enterprise scale, the right answer is an intelligent routing layer. cc-switch (49K stars) unifies Claude Code, Codex, OpenCode, and Gemini CLI into a single Rust-powered desktop app.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 For enterprise teams: Claude Opus 4.7 for code review and complex refactors; GPT-5.5 for long-running agentic workflows and computer use. cc-switch makes this routing practical at scale.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use Claude Code (Opus 4.7) if:&lt;/strong&gt; complex multi-file coding, autonomous execution, terminal-native workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use GPT-5.5 / Codex if:&lt;/strong&gt; long-running tool chains, computer use, GitHub-centric team workflows, cost-sensitive setups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use both (via cc-switch) if:&lt;/strong&gt; team or enterprise scale with mixed workloads.&lt;/p&gt;

&lt;p&gt;The developers winning with AI coding in 2026 stop asking "which is better overall?" and start asking "which is better for this specific task?"&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://computeleap.com/blog/gpt-5-5-vs-claude-code-agentic-coding-ai-2026" rel="noopener noreferrer"&gt;ComputeLeap&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>openai</category>
      <category>coding</category>
    </item>
    <item>
      <title>Claude Code Agentic Stack: cc-switch &amp; claude-context MCP</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Thu, 23 Apr 2026 03:36:41 +0000</pubDate>
      <link>https://dev.to/max_quimby/claude-code-agentic-stack-cc-switch-claude-context-mcp-1dg2</link>
      <guid>https://dev.to/max_quimby/claude-code-agentic-stack-cc-switch-claude-context-mcp-1dg2</guid>
      <description>&lt;p&gt;Claude Code just won a &lt;a href="https://www.webbyawards.com/press/press-releases/30th-annual-webby-awards-announce-2026-winners/" rel="noopener noreferrer"&gt;Webby Award&lt;/a&gt; for Best Product or Service in AI Features &amp;amp; Innovation. Boris Cherny, Claude Code's PM at Anthropic, &lt;a href="https://x.com/bcherny/status/2047004804283773321" rel="noopener noreferrer"&gt;announced the win on X&lt;/a&gt; to a wave of congratulations from the developer community:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://computeleap.com/blog/claude-code-agentic-dev-stack-2026" rel="noopener noreferrer"&gt;Read the full version with charts and embedded sources on ComputeLeap →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://x.com/bcherny/status/2047004804283773321" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcomputeleap.com" alt="" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But the real story isn't the trophy — it's what's happening in the GitHub repos trending alongside it.&lt;/p&gt;

&lt;p&gt;Two repos hit the GitHub Trending page on the same day as the Webby announcement: &lt;strong&gt;&lt;a href="https://github.com/farion1231/cc-switch" rel="noopener noreferrer"&gt;cc-switch&lt;/a&gt;&lt;/strong&gt; (+665 stars in 24 hours, 48,667 total) and &lt;strong&gt;&lt;a href="https://github.com/zilliztech/claude-context" rel="noopener noreferrer"&gt;claude-context&lt;/a&gt;&lt;/strong&gt; (+873 stars). Both extend Claude Code's capabilities significantly — and together with a properly configured &lt;code&gt;CLAUDE.md&lt;/code&gt;, they represent what serious agentic developer stacks look like in 2026.&lt;/p&gt;

&lt;p&gt;This guide covers exactly how to set up both tools and wire everything together for maximum development velocity.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the "Agentic Developer Stack" Actually Means in 2026
&lt;/h2&gt;

&lt;p&gt;In the 2026 context, an agentic developer stack has three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Provider management&lt;/strong&gt; — switch between Claude Code, Codex, Gemini CLI, OpenCode, and other AI coding tools from a single interface, sharing provider configs, MCP servers, and skills&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codebase context&lt;/strong&gt; — give your AI agent deep semantic understanding of your entire codebase, not just the files currently open&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent configuration&lt;/strong&gt; — the &lt;code&gt;CLAUDE.md&lt;/code&gt; files, skills, and subagent definitions that turn Claude Code from a general-purpose tool into a domain-specific engineering partner&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;According to &lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;Anthropic's 2026 Agentic Coding Trends Report&lt;/a&gt;, teams using structured &lt;code&gt;CLAUDE.md&lt;/code&gt; configs and subagent workflows report 2-4x velocity improvements over baseline Claude Code usage. The tools in this guide enable exactly that configuration.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 1: cc-switch — Unified Provider Management
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What cc-switch Does
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/farion1231/cc-switch" rel="noopener noreferrer"&gt;cc-switch&lt;/a&gt; is a cross-platform desktop app built with Tauri and Rust that unifies management of five AI coding CLI tools: Claude Code, OpenAI Codex, Gemini CLI, OpenCode, and OpenClaw. Instead of maintaining separate configuration files and MCP server setups for each tool, cc-switch provides a single interface that syncs settings bidirectionally.&lt;/p&gt;

&lt;p&gt;Key features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;50+ built-in provider presets&lt;/strong&gt; — one-click import of API configurations for Anthropic, OpenAI, Gemini, xAI, Mistral, and more&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System tray quick switch&lt;/strong&gt; — instant provider switching without opening a terminal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified MCP &amp;amp; Skills Management&lt;/strong&gt; — install MCP servers and skills once, sync across all connected tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud sync&lt;/strong&gt; — settings sync via Dropbox, OneDrive, iCloud, or WebDAV servers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Usage dashboard&lt;/strong&gt; — track spending, request counts, and token consumption per provider&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-platform&lt;/strong&gt; — Windows, macOS, and Linux support&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 cc-switch is built with Tauri (Rust-based) for native performance — not an Electron wrapper. Cold launch is under 200ms and system tray switching responds in under 50ms. This matters when you're switching between providers dozens of times a day.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Installing cc-switch
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;macOS:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--cask&lt;/span&gt; cc-switch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or download the latest release from &lt;a href="https://github.com/farion1231/cc-switch/releases" rel="noopener noreferrer"&gt;cc-switch/releases&lt;/a&gt; — &lt;code&gt;.dmg&lt;/code&gt; for macOS, &lt;code&gt;.exe&lt;/code&gt; for Windows, &lt;code&gt;.AppImage&lt;/code&gt; for Linux.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/farion1231/cc-switch" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcomputeleap.com" alt="" width="" height=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cc-switch &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Initial Setup: Provider Configuration
&lt;/h3&gt;

&lt;p&gt;On first launch, cc-switch walks you through connecting your providers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open cc-switch from the system tray or Applications folder&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Providers&lt;/strong&gt; → &lt;strong&gt;Add Provider&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Select from the preset list (Anthropic, OpenAI, Gemini, etc.) or add a custom provider&lt;/li&gt;
&lt;li&gt;Paste your API key — cc-switch stores it in your OS keychain, not in plain text&lt;/li&gt;
&lt;li&gt;Test the connection with the &lt;strong&gt;Verify&lt;/strong&gt; button&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For Claude Code, cc-switch automatically detects your existing &lt;code&gt;~/.claude/&lt;/code&gt; configuration and imports it. Your existing settings, custom commands, and history are preserved.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up MCP Servers in cc-switch
&lt;/h3&gt;

&lt;p&gt;The real power of cc-switch is managing MCP servers across all your coding tools simultaneously. Instead of configuring the same MCP server separately for each tool, you configure it once and cc-switch deploys it to all connected tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cc-switch mcp add &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"claude-context"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--command&lt;/span&gt; &lt;span class="s2"&gt;"npx"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--args&lt;/span&gt; &lt;span class="s2"&gt;"-y @zilliztech/claude-context"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scope&lt;/span&gt; all-tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Layer 2: claude-context MCP — Semantic Codebase Search
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Codebase Context Is the Biggest Bottleneck
&lt;/h3&gt;

&lt;p&gt;When you ask Claude Code to modify a function that depends on types defined in five other files, Claude Code has to either load all five files into context (expensive) or try to infer the types from what it can see (error-prone). &lt;a href="https://github.com/zilliztech/claude-context" rel="noopener noreferrer"&gt;claude-context&lt;/a&gt; solves this with semantic search over your entire codebase.&lt;/p&gt;

&lt;p&gt;Instead of loading full files, it retrieves only the semantically relevant code snippets. According to &lt;a href="https://www.augmentcode.com/mcp/claude-context-mcp-server" rel="noopener noreferrer"&gt;Augment Code's MCP registry benchmarks&lt;/a&gt;, claude-context achieves approximately &lt;strong&gt;40% token reduction&lt;/strong&gt; under equivalent retrieval quality conditions.&lt;/p&gt;

&lt;h3&gt;
  
  
  How claude-context Works
&lt;/h3&gt;

&lt;p&gt;claude-context uses a hybrid search approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BM25&lt;/strong&gt; — lexical matching (finds exact variable names, function signatures)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dense vector search&lt;/strong&gt; — semantic matching (finds conceptually related code even with different naming)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your codebase is indexed into a Milvus vector database (local) or Zilliz Cloud (managed). The index uses AST-aware chunking — it understands code structure at the syntax level. Function bodies, class definitions, and interface declarations are kept semantically intact.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 claude-context uses incremental Merkle-tree-based indexing. After the initial index build, only changed files are re-indexed. For a mid-size repo (50K LOC), re-indexing typically completes in under 5 seconds after a &lt;code&gt;git pull&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
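
&lt;p&gt;In practice, that makes the routine update loop a one-liner (assuming the same &lt;code&gt;index&lt;/code&gt; command handles the incremental case, as the Merkle-tree note above describes):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git pull
claude-context index .   # incremental: only files changed since the last index are re-embedded
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;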

&lt;h3&gt;
  
  
  Installing and Configuring claude-context
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt; Node.js 18+ and a running Milvus instance (local Docker) or &lt;a href="https://zilliz.com/" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt; account.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @zilliztech/claude-context
claude-context init   &lt;span class="c"&gt;# configure vector DB + embedding provider&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; claude-context index &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
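
&lt;p&gt;If you don't already have a local Milvus instance for the index, one way to start one is Milvus's official standalone Docker script (this assumes Docker is installed and running):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Fetch and run the standalone Milvus launcher
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
bash standalone_embed.sh start   # serves on localhost:19530 by default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;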



&lt;p&gt;&lt;strong&gt;Register with Claude Code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"claude-context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@zilliztech/claude-context"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"serve"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"MILVUS_URI"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:19530"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"EMBEDDING_PROVIDER"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"OPENAI_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${OPENAI_API_KEY}"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use cc-switch's MCP manager (recommended) — it handles the configuration and syncs it across all your AI coding tools automatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=45181577" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcomputeleap.com" alt="" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Using claude-context During Development
&lt;/h3&gt;

&lt;p&gt;Once installed, claude-context adds a &lt;code&gt;search_codebase&lt;/code&gt; tool to Claude Code. You can invoke it explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Use the search_codebase tool to find all implementations of the PaymentProcessor interface before modifying it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or Claude Code will invoke it automatically when understanding more of the codebase would improve its response.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 For large monorepos, create a &lt;code&gt;.claude-context-ignore&lt;/code&gt; file (similar to &lt;code&gt;.gitignore&lt;/code&gt;) to exclude generated files, &lt;code&gt;node_modules&lt;/code&gt;, build artifacts, and test fixtures. This keeps the index clean and retrieval precise.&lt;/p&gt;
&lt;/blockquote&gt;
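
&lt;p&gt;A minimal illustrative ignore file for a TypeScript monorepo might look like this (pattern syntax assumed to mirror &lt;code&gt;.gitignore&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node_modules/
dist/
coverage/
**/*.generated.ts
test/fixtures/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;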




&lt;h2&gt;
  
  
  Layer 3: CLAUDE.md Configuration — Making It All Stick
&lt;/h2&gt;

&lt;p&gt;Having great tools is only half the equation. The other half is configuring Claude Code to use them intelligently. This is where &lt;code&gt;CLAUDE.md&lt;/code&gt; comes in — and where most developers leave significant productivity on the table.&lt;/p&gt;

&lt;p&gt;For the fundamentals, see our &lt;a href="https://computeleap.com/blog/claude-code-complete-guide-2026" rel="noopener noreferrer"&gt;Claude Code Complete Guide&lt;/a&gt;. This section focuses on configuration patterns specific to the 2026 agentic stack.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Role of CLAUDE.md in an Agentic Stack
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; is the document Claude Code reads at the start of every session. According to the &lt;a href="https://www.mindstudio.ai/blog/agentic-business-os-claude-code-architecture-guide" rel="noopener noreferrer"&gt;MindStudio guide on Agentic Business OS architecture&lt;/a&gt;, it's the "foundational document for your brand context layer — it defines what every agent knows before it starts any task."&lt;/p&gt;

&lt;p&gt;Use it to tell the agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which MCP servers are available and when to use them&lt;/li&gt;
&lt;li&gt;Your coding standards and conventions&lt;/li&gt;
&lt;li&gt;When to spawn subagents vs. work in the main context&lt;/li&gt;
&lt;li&gt;What tools to reach for first&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Sample CLAUDE.md for the 2026 Agentic Stack
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project: [Your Project Name]&lt;/span&gt;

&lt;span class="gu"&gt;## Stack&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Language: TypeScript 5.4 (strict mode)
&lt;span class="p"&gt;-&lt;/span&gt; Runtime: Node.js 22 LTS
&lt;span class="p"&gt;-&lt;/span&gt; Package manager: pnpm

&lt;span class="gu"&gt;## MCP Servers Available&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**claude-context**&lt;/span&gt;: Use &lt;span class="sb"&gt;`search_codebase`&lt;/span&gt; before modifying any class, interface, 
  or utility function that may have downstream consumers. Always search before refactoring.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**chrome-mcp**&lt;/span&gt;: Available for UI verification tasks.

&lt;span class="gu"&gt;## Coding Standards&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Functions: single responsibility, &amp;lt;=50 lines
&lt;span class="p"&gt;-&lt;/span&gt; No &lt;span class="sb"&gt;`any`&lt;/span&gt; types — use &lt;span class="sb"&gt;`unknown`&lt;/span&gt; + type guards
&lt;span class="p"&gt;-&lt;/span&gt; Tests: co-located &lt;span class="sb"&gt;`.test.ts`&lt;/span&gt; files, Vitest
&lt;span class="p"&gt;-&lt;/span&gt; Commits: conventional commits format

&lt;span class="gu"&gt;## Subagent Rules&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Spawn a subagent (with worktree isolation) for: feature branches, large refactors, research
&lt;span class="p"&gt;-&lt;/span&gt; Keep the main context for: interactive debugging, short edits, Q&amp;amp;A

&lt;span class="gu"&gt;## Agent Workflow&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Search codebase (claude-context) before modifying shared code
&lt;span class="p"&gt;2.&lt;/span&gt; Write tests before implementation for new features
&lt;span class="p"&gt;3.&lt;/span&gt; Run &lt;span class="sb"&gt;`pnpm build`&lt;/span&gt; and &lt;span class="sb"&gt;`pnpm test`&lt;/span&gt; before committing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern — explicitly naming available MCP servers and when to use subagents — is what separates teams that get 2-4x velocity gains from teams that treat Claude Code as smart autocomplete.&lt;/p&gt;

&lt;p&gt;For detailed &lt;code&gt;CLAUDE.md&lt;/code&gt; patterns, see &lt;a href="https://computeleap.com/blog/karpathy-claude-md-template-skills-github-stars-viral" rel="noopener noreferrer"&gt;Karpathy's CLAUDE.md template analysis&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subagent Setup with Worktree Isolation
&lt;/h3&gt;

&lt;p&gt;For complex features requiring parallel workstreams, the &lt;a href="https://code.claude.com/docs/en/sub-agents" rel="noopener noreferrer"&gt;official subagent documentation&lt;/a&gt; provides the full setup. The key pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;feature-agent&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Use for implementing new features across multiple modules&lt;/span&gt;
&lt;span class="na"&gt;isolation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;worktree&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;read&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;edit&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;write&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;bash&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;search_codebase&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

You are a focused implementation agent. Use search_codebase to understand 
existing patterns before writing new code. Work in the isolated worktree.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Setting &lt;code&gt;isolation: worktree&lt;/code&gt; gives the subagent its own copy of the repository, preventing conflicts when multiple agents work in parallel. For more on this, see the &lt;a href="https://github.com/shanraisshan/claude-code-best-practice" rel="noopener noreferrer"&gt;Claude Code best practices guide&lt;/a&gt;.&lt;/p&gt;
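
&lt;p&gt;For intuition, the isolation the subagent gets is conceptually what git's native worktrees provide: independent checkouts sharing one object store. A manual sketch of the equivalent (not necessarily what Claude Code runs internally):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Create an isolated checkout on a new branch
git worktree add ../repo-feature-agent -b feature/agent-task
# ... the agent edits, builds, and commits in ../repo-feature-agent ...
git worktree remove ../repo-feature-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;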




&lt;h2&gt;
  
  
  The Pricing Context: What the Pro Plan Controversy Means for Your Setup
&lt;/h2&gt;

&lt;p&gt;On April 21, 2026, Anthropic briefly removed Claude Code from the $20/month Pro plan listing — prompting a 2,648-upvote Reddit thread and coverage in &lt;a href="https://www.theregister.com/2026/04/22/anthropic_removes_claude_code_pro/" rel="noopener noreferrer"&gt;The Register&lt;/a&gt; and &lt;a href="https://www.xda-developers.com/anthropic-cut-claude-code-new-pro-subscriptions/" rel="noopener noreferrer"&gt;XDA Developers&lt;/a&gt;. &lt;a href="https://simonwillison.net/2026/apr/22/claude-code-confusion/" rel="noopener noreferrer"&gt;Simon Willison's analysis&lt;/a&gt; described it as an "A/B test on ~2% of new prosumer signups." Anthropic reversed the change the same day — existing Pro and Max subscribers are not affected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.reddit.com/r/ClaudeAI/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcomputeleap.com" alt="" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But the incident reveals the underlying tension: Claude Code sessions with Claude Opus 4.7 run up to three times longer than they did on Opus 4.6, and inference costs are escalating.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ If you're building agentic workflows with long Claude Code sessions, budget for the Max plan ($100/month for 5x). Agentic sessions — especially with subagents and frequent claude-context queries — consume context much faster than interactive sessions. Use cc-switch's usage dashboard to track token consumption and catch runaway workflows before they hit billing limits.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Full Stack Setup Sequence
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Install Claude Code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @anthropic-ai/claude-code
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Install cc-switch:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--cask&lt;/span&gt; cc-switch
&lt;span class="c"&gt;# Or: github.com/farion1231/cc-switch/releases&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Import your existing Claude Code config&lt;/strong&gt; — cc-switch auto-detects &lt;code&gt;~/.claude/&lt;/code&gt; on first launch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Install and configure claude-context:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @zilliztech/claude-context
claude-context init
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; claude-context index &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;5. Register claude-context MCP via cc-switch&lt;/strong&gt; → MCP → Add Server → scope: All Tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Write your CLAUDE.md&lt;/strong&gt; in your project root using the template above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Define subagents&lt;/strong&gt; in &lt;code&gt;.claude/agents/&lt;/code&gt; — start with a feature-agent and a research-agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8. Test the full stack:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude
&lt;span class="c"&gt;# Ask: "Search the codebase for the authentication flow and explain it"&lt;/span&gt;
&lt;span class="c"&gt;# claude-context should invoke automatically&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What's Next in the Ecosystem
&lt;/h2&gt;

&lt;p&gt;A few things worth watching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;cc-switch's cloud sync&lt;/strong&gt; is expanding to git-based sync, enabling team-wide provider config sharing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;claude-context's offline mode&lt;/strong&gt; (tracking in &lt;a href="https://github.com/zilliztech/claude-context/issues/162" rel="noopener noreferrer"&gt;Issue #162&lt;/a&gt;) would enable fully local indexing without an external vector database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Tool Search&lt;/strong&gt; (launched January 14, 2026) allows Claude Code to dynamically load tools into context when MCP servers have 50+ tools — reducing context pressure from large MCP setups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The underlying trend is clear: Claude Code has crossed from "developer tool" to "developer platform." The Webby Award is the cultural marker. The GitHub trending repos are the technical evidence. Setting up this stack today puts you in front of the curve.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;GitHub&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;cc-switch&lt;/td&gt;
&lt;td&gt;Unified provider + MCP management desktop app&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/farion1231/cc-switch" rel="noopener noreferrer"&gt;farion1231/cc-switch&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;claude-context&lt;/td&gt;
&lt;td&gt;Semantic codebase search MCP&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/zilliztech/claude-context" rel="noopener noreferrer"&gt;zilliztech/claude-context&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLAUDE.md&lt;/td&gt;
&lt;td&gt;Agent configuration and context file&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/shanraisshan/claude-code-best-practice" rel="noopener noreferrer"&gt;shanraisshan/claude-code-best-practice&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For the full Claude Code foundation, read the &lt;a href="https://computeleap.com/blog/claude-code-complete-guide-2026" rel="noopener noreferrer"&gt;Claude Code Complete Guide&lt;/a&gt;. For browser automation integration, see &lt;a href="https://computeleap.com/blog/chrome-built-in-mcp-server-native-mcp-v2-2026" rel="noopener noreferrer"&gt;Chrome's built-in MCP server guide&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Sources: &lt;a href="https://github.com/farion1231/cc-switch" rel="noopener noreferrer"&gt;cc-switch GitHub&lt;/a&gt; · &lt;a href="https://github.com/zilliztech/claude-context" rel="noopener noreferrer"&gt;claude-context GitHub&lt;/a&gt; · &lt;a href="https://www.webbyawards.com/press/press-releases/30th-annual-webby-awards-announce-2026-winners/" rel="noopener noreferrer"&gt;Webby Awards 2026&lt;/a&gt; · &lt;a href="https://simonwillison.net/2026/apr/22/claude-code-confusion/" rel="noopener noreferrer"&gt;Simon Willison&lt;/a&gt; · &lt;a href="https://www.theregister.com/2026/04/22/anthropic_removes_claude_code_pro/" rel="noopener noreferrer"&gt;The Register&lt;/a&gt; · &lt;a href="https://www.xda-developers.com/anthropic-cut-claude-code-new-pro-subscriptions/" rel="noopener noreferrer"&gt;XDA Developers&lt;/a&gt; · &lt;a href="https://code.claude.com/docs/en/sub-agents" rel="noopener noreferrer"&gt;Anthropic Subagent Docs&lt;/a&gt; · &lt;a href="https://www.mindstudio.ai/blog/agentic-business-os-claude-code-architecture-guide" rel="noopener noreferrer"&gt;MindStudio Agentic OS&lt;/a&gt; · &lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;Anthropic 2026 Agentic Coding Trends&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://computeleap.com/blog/claude-code-agentic-dev-stack-2026" rel="noopener noreferrer"&gt;ComputeLeap&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>mcp</category>
      <category>devtools</category>
      <category>ai</category>
    </item>
    <item>
      <title>Iran's Prediction Markets Tell Two Stories</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Tue, 21 Apr 2026 04:04:57 +0000</pubDate>
      <link>https://dev.to/max_quimby/irans-prediction-markets-tell-two-stories-4i29</link>
      <guid>https://dev.to/max_quimby/irans-prediction-markets-tell-two-stories-4i29</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://thearcofpower.com/blog/iran-prediction-markets-polymarket-insider-trading-ceasefire-2026" rel="noopener noreferrer"&gt;Read the full analysis with charts on The Arc of Power →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On April 7, 2026, as President Trump was preparing to announce a two-week ceasefire with Iran, more than fifty newly-created accounts on &lt;a href="https://polymarket.com/predictions/iran" rel="noopener noreferrer"&gt;Polymarket&lt;/a&gt; placed large, specific bets that the ceasefire would be announced that day. Minutes later, Trump made the announcement. The accounts profited approximately $600,000. Within 48 hours, the White House had sent internal emails warning staff not to place prediction market bets related to the Iran war.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Our thesis: prediction markets are a remarkably accurate signal for short-term diplomatic timing and a systematically poor signal for structural outcomes.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The $200 Million Experiment
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.newsweek.com/iran-ceasefire-wreaks-havoc-on-prediction-markets-11806355" rel="noopener noreferrer"&gt;Over $200 million has traded&lt;/a&gt; on Polymarket contracts related to Iran's ceasefire timing. Approximately $118 million was bet specifically on an April 7 deadline — the exact day Trump announced the ceasefire.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.cnn.com/2026/03/24/politics/iran-war-bets-prediction-markets" rel="noopener noreferrer"&gt;CNN reported&lt;/a&gt; that a single trader made nearly $1 million from well-timed Polymarket bets correctly predicting US and Israeli military actions against Iran since 2024. The &lt;a href="https://www.cnbc.com/2026/04/10/iran-war-prediction-markets-white-house.html" rel="noopener noreferrer"&gt;White House warned staff&lt;/a&gt; not to bet on Iran war outcomes. Two senators wrote the CFTC demanding investigation. The BETS OFF Act was introduced in Congress.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The regulatory framing matters:&lt;/strong&gt; The BETS OFF Act would prohibit contracts on "government actions, terrorism, war, assassination, and events where an individual knows or controls the outcome." The last clause is the tell — directed at people who influence outcomes, not just know about them.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What the Markets Actually Got Right
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The ceasefire timing markets were accurate.&lt;/strong&gt; April 7 was right. The probability curve leading into the announcement showed a sustained spike beginning roughly 6-8 hours before Trump spoke — consistent with information leakage through informal channels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The regime stability markets have been roughly accurate.&lt;/strong&gt; Polymarket currently prices an &lt;a href="https://polymarket.com/event/will-the-iranian-regime-fall-by-the-end-of-2026" rel="noopener noreferrer"&gt;80.5% probability against the Iranian regime falling before 2027&lt;/a&gt;. Despite Khamenei's assassination and ongoing protests, the IRGC's institutional structure has remained intact. The market's skepticism of regime collapse — maintained even as Western media ran "end of the regime" framings — has proven correct.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Markets Are Getting Wrong
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://polymarket.com/event/us-iran-nuclear-deal-by-june-30" rel="noopener noreferrer"&gt;67% probability&lt;/a&gt; of a nuclear deal by June 30 is where prediction markets hit the limits of their model.&lt;/p&gt;

&lt;p&gt;Current odds (April 20, 2026):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Market&lt;/th&gt;
&lt;th&gt;Odds&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Nuclear Deal by April 30&lt;/td&gt;
&lt;td&gt;36%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nuclear Deal by June 30&lt;/td&gt;
&lt;td&gt;67%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nuclear Deal before 2027&lt;/td&gt;
&lt;td&gt;59-61%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regime Falls before 2027&lt;/td&gt;
&lt;td&gt;19.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The problem: prediction markets aggregate the probability that &lt;em&gt;a deal happens&lt;/em&gt;, not the probability that a deal &lt;em&gt;resolves the underlying dispute&lt;/em&gt;. A deal that fails to address uranium enrichment infrastructure is not a deal. It is a delay.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/Kalshi/status/2043002302722654321" rel="noopener noreferrer"&gt;Kalshi surged to 61%&lt;/a&gt; on nuclear deal odds following Trump's April 13 statement that Iran wants a deal "badly." Markets correctly incorporated Trump's statement — but cannot distinguish between a statement made for domestic political effect and one reflecting genuine diplomatic progress.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The contrarian read on 67%:&lt;/strong&gt; The markets cannot price the difference between "a document is signed" and "the structural conditions for Iranian nuclear breakout capability are removed." These resolve identically in contract language but produce very different geopolitical outcomes.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Information Asymmetry Diagnostic
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://www.bloomberg.com/news/articles/2026-04-08/polymarket-s-iran-bets-draw-fresh-disputes-and-insider-scrutiny" rel="noopener noreferrer"&gt;Bloomberg analysis&lt;/a&gt; identified two populations who could have generated the April 7 betting pattern:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Population 1: Informed analysts&lt;/strong&gt; — people who track backchannel communications through open-source methods and correctly model diplomatic decision-making. The Islamabad negotiations were not secret. A skilled analyst could have assessed April 7 as the most likely ceasefire date.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Population 2: Informed insiders&lt;/strong&gt; — people with access to non-public government information. The fifty new accounts make this the more plausible explanation for that specific cluster.&lt;/p&gt;

&lt;p&gt;The distinction matters: if it's Population 1, markets aggregate genuine analytical skill. If it's Population 2, markets track who has access to government communications. The signal quality is real in both cases — but for different reasons.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Means
&lt;/h2&gt;

&lt;p&gt;Prediction markets on Iran are useful as a rough prior — a starting estimate before you apply your own analysis. They are not useful as a substitute for structural analysis.&lt;/p&gt;

&lt;p&gt;The 67% nuclear deal by June 30 tells you what the aggregate of informed and uninformed bettors believes will happen. It does not tell you whether the deal, if it happens, will matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to watch in the next 48 hours:&lt;/strong&gt; The ceasefire expires April 22. Watch whether the price spike in the ceasefire-extension contract precedes or follows the official announcement. That timing will tell you more about information asymmetry in Iran prediction markets than any regulatory filing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://thearcofpower.com/blog/iran-prediction-markets-polymarket-insider-trading-ceasefire-2026" rel="noopener noreferrer"&gt;The Arc of Power&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>geopolitics</category>
      <category>analysis</category>
      <category>markets</category>
      <category>prediction</category>
    </item>
    <item>
      <title>Hermes Agent v0.10: Local AGI Stack &amp; Browser Guide</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Tue, 21 Apr 2026 03:53:49 +0000</pubDate>
      <link>https://dev.to/max_quimby/hermes-agent-v010-local-agi-stack-browser-guide-33bo</link>
      <guid>https://dev.to/max_quimby/hermes-agent-v010-local-agi-stack-browser-guide-33bo</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://agentconn.com/blog/hermes-agent-review-local-agi-stack-browser-integration-2026" rel="noopener noreferrer"&gt;Read the full version with diagrams and embedded sources on AgentConn →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In seven weeks, &lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;NousResearch/hermes-agent&lt;/a&gt; went from zero to 95,600 GitHub stars — the fastest star velocity of any agent framework in 2026. The question isn't whether Hermes Agent matters. The question is what v0.10.0 (released April 16, 2026) actually changes — and whether local deployment and browser integration are ready for production use.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's New in v0.10.0 (v2026.4.16)
&lt;/h2&gt;

&lt;p&gt;The v0.10 release is the most practically significant update for developers who want to run Hermes without API costs or need browser automation in their workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key additions in v0.10:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama integration&lt;/strong&gt; — First-class local model support via Ollama, llama.cpp, and vLLM with zero API cost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;hermes-plugin-chrome-profiles&lt;/strong&gt; — Experimental Chrome CDP integration for multi-profile browser automation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser Use v0.8.0+&lt;/strong&gt; — Upgraded browser automation with better reliability and vision integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GEPA v2 improvements&lt;/strong&gt; — Faster evolution cycles for the self-improvement engine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Android/Termux support&lt;/strong&gt; — Hermes can now run natively on Android devices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The install story hasn't changed: one command, works everywhere.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Local Deployment: Ollama Integration in Practice
&lt;/h2&gt;

&lt;p&gt;The case for local Hermes is straightforward: if you're running a long-horizon autonomous task — a 2-hour coding session, a research crawl, a data pipeline — API costs compound fast. Switching to Ollama means the economics of "leave it running" change completely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardware Requirements
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.ollama.com/integrations/hermes" rel="noopener noreferrer"&gt;Official Ollama integration docs&lt;/a&gt; are specific about what local deployment requires:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;VRAM&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Apple Silicon (M2/M3/M4)&lt;/td&gt;
&lt;td&gt;Unified RAM (≥16GB)&lt;/td&gt;
&lt;td&gt;50-80 tok/s on 7B&lt;/td&gt;
&lt;td&gt;Metal acceleration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA GPU&lt;/td&gt;
&lt;td&gt;8-16GB VRAM+&lt;/td&gt;
&lt;td&gt;60-100+ tok/s on 7B&lt;/td&gt;
&lt;td&gt;CUDA via Ollama&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU-only&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;3-8 tok/s on 7B&lt;/td&gt;
&lt;td&gt;Usable, not recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The recommendation is a 7B or 13B model with a 64K+ context window. Models with shorter contexts will truncate mid-task and produce inconsistent results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama first (if not already)&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama  &lt;span class="c"&gt;# macOS&lt;/span&gt;

&lt;span class="c"&gt;# Pull a compatible model (llama3.1 has 128K context natively)&lt;/span&gt;
ollama pull llama3.1:8b

&lt;span class="c"&gt;# Configure Hermes to use local model&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.hermes/config.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;'
llm:
  provider: ollama
  model: llama3.1:8b
  base_url: http://localhost:11434
  context_window: 65536
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Start Ollama server&lt;/span&gt;
ollama serve &amp;amp;

&lt;span class="c"&gt;# Run Hermes&lt;/span&gt;
hermes run &lt;span class="s2"&gt;"your task here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Context Window Constraint
&lt;/h3&gt;

&lt;p&gt;The critical gotcha: &lt;strong&gt;your model must support ≥64K context&lt;/strong&gt; for reliable multi-step tasks. Most quantized 7B models default to 4K or 8K context.&lt;/p&gt;

&lt;p&gt;Models confirmed to work well with local Hermes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;llama3.1:8b&lt;/code&gt; (128K context natively)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mistral:7b-instruct-q4_K_M&lt;/code&gt; (64K context with extended config)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;qwen2.5:14b&lt;/code&gt; (32K context, good for medium tasks)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;deepseek-coder-v2:16b&lt;/code&gt; (128K context, strong for coding tasks)&lt;/li&gt;
&lt;/ul&gt;
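&lt;p&gt;A minimal sketch of the selection logic, using the context sizes listed above. Treat the numbers as configuration values to verify against your specific quantization, not guarantees:&lt;/p&gt;

```python
# Minimal sketch: filter candidate local models by context window.
# Context sizes mirror the list above; verify against your quantization.

MODELS = {
    "llama3.1:8b": 131072,
    "mistral:7b-instruct-q4_K_M": 65536,
    "qwen2.5:14b": 32768,
    "deepseek-coder-v2:16b": 131072,
}

def usable_for(required_ctx):
    """Return models whose context window meets the requirement."""
    return sorted(m for m, ctx in MODELS.items() if max(ctx, required_ctx) == ctx)

print(usable_for(65536))   # models safe for long multi-step tasks
print(usable_for(32768))   # shorter tasks open up more options
```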




&lt;h2&gt;
  
  
  Browser Integration: CDP and Browser Use
&lt;/h2&gt;

&lt;p&gt;Hermes ships with two browser automation layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Browser Use v0.8.0+&lt;/strong&gt; is the default — high-level API for navigation, form filling, clicking, and vision-enabled page reading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;hermes-plugin-chrome-profiles&lt;/strong&gt; is the experimental CDP layer for multi-account workflows. It lets you connect to a running Chrome instance and switch between profiles programmatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Browser Use is bundled — just enable it in config&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.hermes/config.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;'
tools:
  browser:
    enabled: true
    provider: browser_use
    headless: false
    timeout: 30
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;hermes run &lt;span class="s2"&gt;"Research and summarize the top 5 HN posts from today, save to research-notes.md"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CDP plugin is useful for multi-account testing but not production-stable — &lt;a href="https://news.ycombinator.com/item?id=47726913" rel="noopener noreferrer"&gt;community reports&lt;/a&gt; describe connection drops mid-task. Treat it as beta.&lt;/p&gt;
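&lt;p&gt;Until the plugin stabilizes, a plain retry wrapper limits the damage from dropped connections. The &lt;code&gt;connect&lt;/code&gt; function below is a hypothetical stand-in, not the plugin's API:&lt;/p&gt;

```python
# Sketch of a retry wrapper for flaky browser connections. The plugin's
# real API is not documented here, so `connect` is a simulated stand-in.
import time

def with_retries(fn, attempts=3, backoff=0.0):
    """Call fn, retrying on ConnectionError with exponential backoff."""
    last_err = None
    for i in range(attempts):
        try:
            return fn()
        except ConnectionError as err:
            last_err = err
            time.sleep(backoff * (2 ** i))
    raise last_err

# Simulated flaky connection: fails twice, then succeeds.
calls = {"n": 0}
def connect():
    calls["n"] += 1
    if calls["n"] != 3:
        raise ConnectionError("CDP dropped")
    return "connected"

print(with_retries(connect))  # succeeds on the third attempt
```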




&lt;h2&gt;
  
  
  The GEPA Self-Improvement Engine
&lt;/h2&gt;

&lt;p&gt;GEPA (Genetic Evolution of Prompt Architectures) was presented as an &lt;a href="https://github.com/NousResearch/hermes-agent-self-evolution" rel="noopener noreferrer"&gt;ICLR 2026 Oral&lt;/a&gt;. The mechanism: GEPA reads execution traces, identifies failure patterns, and proposes improvements to skill prompts. Unlike simple retry logic, GEPA does causal analysis — it tries to understand &lt;em&gt;why&lt;/em&gt; something failed.&lt;/p&gt;

&lt;p&gt;The 40% speedup on repeat tasks is achievable, but it accumulates over time rather than appearing immediately. The first hour feels like any other agent. By hour two, after 15-20 similar tasks, the improvement becomes noticeable.&lt;/p&gt;
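&lt;p&gt;One way to picture the compounding effect: model per-task time as savings that approach a 40% ceiling over repeated similar tasks. The decay rate below is an assumption for illustration, not a GEPA parameter:&lt;/p&gt;

```python
# Illustrative model of the compounding speedup: each repeat of a similar
# task shaves more time, approaching the quoted 40% ceiling. The rate is
# an assumption for illustration only.

def task_minutes(base, repeats, ceiling=0.40, rate=0.15):
    """Time for the n-th repeat as savings approach the ceiling."""
    savings = ceiling * (1 - (1 - rate) ** repeats)
    return base * (1 - savings)

base = 10.0  # minutes for the first run
for n in [0, 5, 10, 20]:
    print(f"after {n:2d} repeats: {task_minutes(base, n):.1f} min")
```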

&lt;h3&gt;
  
  
  The Self-Grading Problem
&lt;/h3&gt;

&lt;p&gt;Hermes's self-evaluation skews optimistic. The workaround: explicit, verifiable success criteria.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Instead of vague prompts:&lt;/span&gt;
hermes run &lt;span class="s2"&gt;"Fix the authentication bug in auth.py"&lt;/span&gt;

&lt;span class="c"&gt;# Use verifiable success criteria:&lt;/span&gt;
hermes run &lt;span class="s2"&gt;"Fix the authentication bug in auth.py.
Success criteria:
1. All tests in test_auth.py pass
2. Login endpoint returns 200 for valid credentials
3. Login endpoint returns 401 for invalid credentials
Run the tests and show output before marking complete."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Hermes vs Claude Code: Complementary, Not Competing
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://kilo.ai/articles/openclaw-vs-hermes-what-reddit-says" rel="noopener noreferrer"&gt;Community consensus on Reddit&lt;/a&gt;: these are complementary tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hermes excels at:&lt;/strong&gt; long-horizon orchestration, repetitive workflows, local deployment, multi-agent coordination, persistent memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code excels at:&lt;/strong&gt; deep intensive coding, complex architecture decisions, production-critical changes, interactive debugging.&lt;/p&gt;

&lt;p&gt;The practical pattern: Hermes runs background orchestration, calls Claude Code for intensive steps, accumulates skills from each cycle.&lt;/p&gt;
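&lt;p&gt;The pattern can be sketched as a simple router. The step categories and backend names below are illustrative, not Hermes configuration keys:&lt;/p&gt;

```python
# Hedged sketch of the orchestration pattern described above: a router
# that sends intensive steps to a coding tool and keeps routine steps in
# the background agent. Categories and names are illustrative only.

INTENSIVE = {"refactor", "debug", "architecture"}

def route(step_kind):
    """Decide which backend handles a workflow step."""
    if step_kind in INTENSIVE:
        return "claude-code"   # deep, interactive coding work
    return "hermes-local"      # repetitive, long-horizon orchestration

plan = ["crawl", "summarize", "refactor", "test", "debug"]
assignments = {step: route(step) for step in plan}
print(assignments)
```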




&lt;h2&gt;
  
  
  Quick Start Summary
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

&lt;span class="c"&gt;# Cloud API path&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ANTHROPIC_API_KEY=your-key"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.hermes/.env
hermes run &lt;span class="s2"&gt;"your first task"&lt;/span&gt;

&lt;span class="c"&gt;# Local Ollama path (zero cost)&lt;/span&gt;
ollama pull llama3.1:8b
hermes config &lt;span class="nb"&gt;set &lt;/span&gt;llm.provider ollama llm.model llama3.1:8b
hermes run &lt;span class="s2"&gt;"your first task"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;95,600 stars in seven weeks is an endorsement of the concept. v0.10 is the release where the execution starts catching up to the pitch.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://agentconn.com/blog/hermes-agent-review-local-agi-stack-browser-integration-2026" rel="noopener noreferrer"&gt;AgentConn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>productivity</category>
      <category>agents</category>
    </item>
    <item>
      <title>Kimi K2.6 vs Claude Opus 4.7: The 88% Cost Advantage</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Tue, 21 Apr 2026 03:28:38 +0000</pubDate>
      <link>https://dev.to/max_quimby/kimi-k26-vs-claude-opus-47-the-88-cost-advantage-2916</link>
      <guid>https://dev.to/max_quimby/kimi-k26-vs-claude-opus-47-the-88-cost-advantage-2916</guid>
      <description>&lt;p&gt;When Clement Delangue, the CEO of Hugging Face, called Kimi K2.6 a standout open-source model on the day of its release, the AI procurement conversation shifted. Not because a Chinese model was competitive — Kimi's K2 family and DeepSeek had already proved that point — but because of what &lt;em&gt;competitive&lt;/em&gt; now costs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://computeleap.com/blog/kimi-k2-6-vs-claude-opus-47-open-source-chinese-ai-model-comparison-2026" rel="noopener noreferrer"&gt;Read the full version with charts and embedded sources on ComputeLeap →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Kimi K2.6, the latest open-weight model from Beijing-based Moonshot AI, runs at &lt;strong&gt;$0.60 per million input tokens&lt;/strong&gt; on the official API. &lt;a href="https://openrouter.ai/anthropic/claude-opus-4.7" rel="noopener noreferrer"&gt;Claude Opus 4.7&lt;/a&gt;, Anthropic's frontier model, costs &lt;strong&gt;$5.00 per million input tokens&lt;/strong&gt;. That's an 8.3× difference — or roughly 88% cheaper.&lt;/p&gt;

&lt;p&gt;If your team spends $10,000 a month on Claude Opus 4.7 today, K2.6 could in theory handle the same workload for $1,200. Engineering teams are already running the math. This guide gives you the honest version of that calculation: where K2.6 delivers, where it doesn't, and how to make the decision without the hype in either direction.&lt;/p&gt;
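&lt;p&gt;The arithmetic behind those claims, using the input-token prices quoted in this article:&lt;/p&gt;

```python
# The cost delta quoted above, made explicit. Prices per million input
# tokens are from the article; the monthly spend is the article's example.

k26_in, opus_in = 0.60, 5.00
ratio = opus_in / k26_in          # how many times cheaper on input tokens
savings = 1 - k26_in / opus_in    # fraction saved on input tokens

monthly_opus = 10_000
monthly_k26 = monthly_opus * (k26_in / opus_in)

print(f"{ratio:.1f}x cheaper, {savings:.0%} savings")
print(f"${monthly_opus:,}/mo on Opus maps to ${monthly_k26:,.0f}/mo on K2.6")
```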




&lt;h2&gt;
  
  
  The Architecture Behind the Price
&lt;/h2&gt;

&lt;p&gt;The reason Kimi K2.6 can be so cheap while performing at frontier level comes down to architecture. K2.6 is a &lt;strong&gt;Mixture-of-Experts (MoE) model&lt;/strong&gt;: it has 1 trillion total parameters but activates only 32 billion per token during inference.&lt;/p&gt;

&lt;p&gt;Dense models pay the full computational cost of every parameter on every token. MoE models route each token through a small subset of specialized "expert" subnetworks. The result is trillion-parameter model quality at a fraction of the inference cost — which flows directly to the API price.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faw5ae684sfat6snv8czy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faw5ae684sfat6snv8czy.jpg" alt="MoE architecture diagram showing how Kimi K2.6 routes tokens through 8 of 384 experts, activating only 32B of 1T total parameters" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;K2.6's MoE structure is unusually large-scale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;384 expert subnetworks&lt;/strong&gt;, with 8 selected per token plus 1 shared expert&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;61 transformer layers&lt;/strong&gt; (including 1 dense layer)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-head Latent Attention (MLA)&lt;/strong&gt; mechanism for efficient long-context processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;256K token context window&lt;/strong&gt; — enough to process entire large codebases in a single prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MoonViT vision encoder&lt;/strong&gt; (400M parameters) for native multimodal input&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 256K context and 160K-token vocabulary round out a model that's clearly engineered for production coding workloads, not benchmark optimization.&lt;/p&gt;
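&lt;p&gt;A toy sketch of why this architecture translates to price, using the figures above. The router scores are fabricated for illustration; real MoE routing is a learned gating network:&lt;/p&gt;

```python
# Rough sketch of why MoE pricing works: only a fraction of parameters
# are active per token. Figures are the ones quoted in the list above;
# the router scores below are fabricated stand-ins for a learned gate.

total_params = 1_000_000_000_000   # 1T total
active_params = 32_000_000_000     # 32B activated per token

active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of parameters touched per token")

# Toy top-k routing: pick 8 of 384 experts by score, plus 1 shared expert.
scores = {f"expert_{i}": (i * 37) % 384 for i in range(384)}
top8 = sorted(scores, key=scores.get, reverse=True)[:8]
selected = top8 + ["shared_expert"]
print(f"{len(selected)} experts active out of 385")
```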

&lt;blockquote&gt;
&lt;p&gt;ℹ️ MoE models have a catch: they're harder to run locally. At 1T total parameters, K2.6 requires significant hardware even with 8-bit quantization. Community quantizations exist on HuggingFace (via unsloth and ubergarm), but self-hosted K2.6 is a serious infrastructure commitment. If local deployment is your goal, smaller Chinese open-source models may be more practical.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Benchmarks: Where K2.6 Actually Leads
&lt;/h2&gt;

&lt;p&gt;Benchmark theater is a real phenomenon in AI. But some numbers here are worth taking seriously because they map to real engineering workloads.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Kimi K2.6&lt;/th&gt;
&lt;th&gt;Claude Opus 4.7&lt;/th&gt;
&lt;th&gt;GPT-5.4&lt;/th&gt;
&lt;th&gt;Gemini 3.1 Pro&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Pro&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;58.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;53.4&lt;/td&gt;
&lt;td&gt;57.7&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HLE Full w/ Tools&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;54.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;53.0&lt;/td&gt;
&lt;td&gt;52.1&lt;/td&gt;
&lt;td&gt;51.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BrowseComp&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;83.2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;82.7&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Verified&lt;/td&gt;
&lt;td&gt;80.2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;80.8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Input Price&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.60/M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$5.00/M&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Output Price&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$2.50/M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$25.00/M&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;SWE-Bench Pro&lt;/strong&gt; measures performance on real GitHub issues — actual engineering tasks, not constructed problems. K2.6's 58.6 vs Claude Opus 4.7's 53.4 is a meaningful gap on the metric that matters most to software teams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HLE (Humanity's Last Exam) with Tools&lt;/strong&gt; is a research-grade exam specifically designed to resist AI memorization. K2.6 leads all frontier models at 54.0, placing above Claude Opus 4.7 (53.0) and GPT-5.4 (52.1). This is surprising for a model priced as a "budget" alternative.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ These benchmarks are from Moonshot AI's own release. Independent, third-party SWE-Bench Pro evaluations are still catching up. Take the K2.6-specific numbers with the usual caveat applied to vendor benchmarks — the HN community reception and Cursor integration are better early signals than the numbers alone.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Agent Swarm Capability
&lt;/h2&gt;

&lt;p&gt;Beyond raw benchmark scores, K2.6 introduces a capability that doesn't have an obvious analogue in Opus 4.7: &lt;strong&gt;agent swarm scaling&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;K2.6 can orchestrate up to &lt;strong&gt;300 sub-agents executing 4,000 coordinated steps&lt;/strong&gt; — decomposing a complex task into parallel, domain-specialized subtasks running simultaneously. According to &lt;a href="https://www.kimi.com/blog/kimi-k2-6" rel="noopener noreferrer"&gt;Moonshot's technical blog&lt;/a&gt;, real-world case studies include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optimizing Zig inference performance from 15 to 193 tokens/second over a 12-hour autonomous run&lt;/li&gt;
&lt;li&gt;Overhauling a financial matching engine from 0.43 to 1.24 million transactions/second (185% improvement) over a 13-hour session&lt;/li&gt;
&lt;li&gt;Generating full-stack websites with databases from text-only prompts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A "Claw Groups" preview feature lets humans and agents collaborate in a shared operational space, with task-to-agent matching and failure detection. This positions K2.6 less as a chat model and more as an infrastructure primitive for long-horizon background workloads.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real Developer Reception: What the HN Thread Reveals
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://news.ycombinator.com/item?id=47835735" rel="noopener noreferrer"&gt;Kimi K2.6 Hacker News thread&lt;/a&gt; scored 592 points with 303 comments within hours of release — unusually strong engagement for a non-US model launch.&lt;/p&gt;

&lt;p&gt;The developer sentiment breaks roughly into thirds:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bullish:&lt;/strong&gt; "Dirt cheap on OpenRouter for how good it is" (regularfry). Simon Willison posted a live demo of K2.6 generating animated SVG HTML via OpenRouter, citing it as practical and fast. One commenter confirmed K2.6 &lt;strong&gt;powers Cursor's composer-2 model&lt;/strong&gt; — a real-world quality endorsement that's harder to fake than a benchmark.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skeptical:&lt;/strong&gt; "Tried it once... my experience was just okay-ish despite strong benchmarks." Some users report it "does only slightly better than Kimi K2.5" and "struggles with domain-specific tasks."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Philosophical:&lt;/strong&gt; "Funny that Chinese companies are pioneering possibly the world's most important tech via open source while the US goes closed" — a sentiment that lands differently when you consider DeepSeek R1, Qwen, and now K2.6 all dropped open weights.&lt;/p&gt;

&lt;p&gt;The median impression aligns with &lt;a href="https://benchlm.ai/compare/claude-opus-4-7-vs-kimi-k2-5" rel="noopener noreferrer"&gt;BenchLM's Claude Opus 4.7 vs Kimi K2.5 comparison&lt;/a&gt;: Claude leads overall (94 vs 68) with its sharpest advantage in agentic reliability. K2.6 narrows that gap meaningfully, but it hasn't closed entirely.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Qwen3.6-Max-Preview Context: Two Chinese Models in One Day
&lt;/h2&gt;

&lt;p&gt;K2.6 didn't land in isolation. On the same day — April 20, 2026 — Alibaba released &lt;a href="https://decrypt.co/364948/alibaba-qwen-3-6-max-preview-most-powerful-model" rel="noopener noreferrer"&gt;Qwen3.6-Max-Preview&lt;/a&gt;, topping six major coding benchmarks including SWE-Bench Pro, Terminal-Bench 2.0, SkillsBench, and SciCode.&lt;/p&gt;

&lt;p&gt;Qwen3.6-Max-Preview is proprietary (no open weights), but the convergence of two major Chinese AI releases on the same day is structurally significant. &lt;a href="https://importai.substack.com/p/import-ai-454-automating-alignment" rel="noopener noreferrer"&gt;Jack Clark's Import AI newsletter&lt;/a&gt; has tracked this arc: Chinese models are no longer "almost competitive" — they're trading leads on specific benchmarks with the frontier models from Anthropic, OpenAI, and Google.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://chinai.substack.com/p/chinai-291-chinese-open-source-models" rel="noopener noreferrer"&gt;ChinAI newsletter&lt;/a&gt; framed it earlier this year: "Chinese open-source models are now leading foreign open-source models and closing in on global first-tier closed-source models." April 20 is a data point, not an anomaly.&lt;/p&gt;

&lt;p&gt;If you've been following &lt;a href="https://computeleap.com/blog/qwen3-35b-a3b-local-mac-setup-lm-studio-open-source" rel="noopener noreferrer"&gt;our Qwen3 35B-A3B local setup guide&lt;/a&gt;, K2.6 is the cloud-API counterpart to that story — optimized for different constraints but part of the same structural trend.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Use Kimi K2.6
&lt;/h2&gt;

&lt;p&gt;K2.6 is the right choice when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Long-horizon coding tasks&lt;/strong&gt; — multi-hour autonomous runs on well-scoped engineering problems, where the agent swarm architecture pays off&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-volume production workloads&lt;/strong&gt; — teams spending $5K+/month on Opus-level API calls where the 88% cost delta is real money&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-shot code generation&lt;/strong&gt; — initial code scaffolding, UI generation from design prompts, full-stack boilerplate where SWE-Bench Pro performance matters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent orchestration&lt;/strong&gt; — building multi-agent systems (see &lt;a href="https://computeleap.com/blog/openai-agents-python-tutorial-multi-agent-ai-workflows-2026" rel="noopener noreferrer"&gt;our OpenAI Agents Python SDK tutorial&lt;/a&gt; for framework context) where K2.6's 300-sub-agent ceiling gives headroom&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two-tier architectures&lt;/strong&gt; — using K2.6 for first-pass generation and Claude for final review/validation captures most of the cost savings without sacrificing output quality&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When Claude Opus 4.7 Is Still Worth the Premium
&lt;/h2&gt;

&lt;p&gt;Stick with Opus 4.7 when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complex reasoning under ambiguity&lt;/strong&gt; — open-ended problems where the model needs judgment, not execution; Claude's agentic reliability lead is real&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production workloads where errors are expensive&lt;/strong&gt; — if a wrong answer costs $10K to fix, the API call price is irrelevant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise compliance&lt;/strong&gt; — Anthropic's usage policies, data handling, and audit trails are more mature than Moonshot's at the enterprise procurement level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal tasks requiring judgment&lt;/strong&gt; — vision tasks that need contextual interpretation, not just image recognition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creative and long-form writing&lt;/strong&gt; — anecdotal but consistent: Claude's prose quality and editorial judgment remain ahead&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 The hybrid approach is underrated: use K2.6 for code generation and execution, Claude Opus 4.7 for planning and validation. Our &lt;a href="https://computeleap.com/blog/anthropic-vs-openai-api-developer-platform-2026" rel="noopener noreferrer"&gt;API cost comparison&lt;/a&gt; showed that most production AI spend is concentrated in generation volume — exactly where the K2.6 cost advantage is largest.&lt;/p&gt;
&lt;/blockquote&gt;
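&lt;p&gt;A rough sketch of the two-tier math. The 80/20 generation/validation split and the 25% output-token share are assumptions for illustration; the per-million prices are the ones quoted in this article:&lt;/p&gt;

```python
# Sketch of the two-tier pattern: K2.6 generates, Claude validates.
# Prices follow the article; the 80/20 split and 25% output share are
# assumptions for illustration, not measured workload data.

def tier_cost(tokens_m, in_rate, out_rate, out_share=0.25):
    """Blend input/output pricing for a workload measured in M tokens."""
    return tokens_m * ((1 - out_share) * in_rate + out_share * out_rate)

workload_m = 100  # hypothetical monthly token volume, in millions

all_opus = tier_cost(workload_m, 5.00, 25.00)
hybrid = (tier_cost(workload_m * 0.8, 0.60, 2.50)
          + tier_cost(workload_m * 0.2, 5.00, 25.00))

print(f"all-Opus: ${all_opus:,.0f}/mo, hybrid: ${hybrid:,.0f}/mo")
print(f"savings: {1 - hybrid / all_opus:.0%}")
```

&lt;p&gt;Under these assumptions the hybrid keeps Claude in the loop on every task while cutting the bill by roughly two-thirds — your actual split will move the number.&lt;/p&gt;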




&lt;h2&gt;
  
  
  Accessing K2.6: Your Options
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Kimi.com API (direct):&lt;/strong&gt; &lt;code&gt;$0.60/M&lt;/code&gt; input, &lt;code&gt;$2.50/M&lt;/code&gt; output. Compatible with the OpenAI Python SDK via base URL swap — no code refactoring if you're already calling OpenAI-compatible endpoints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenRouter:&lt;/strong&gt; &lt;code&gt;$0.60/M&lt;/code&gt; input, &lt;code&gt;$2.80/M&lt;/code&gt; output (slight markup). Useful for routing alongside other models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-hosted:&lt;/strong&gt; Available on HuggingFace under Modified MIT license. Requires &lt;code&gt;transformers &amp;gt;=4.57.1&lt;/code&gt;. Recommended inference: vLLM or SGLang. Commercial restriction applies for entities with 100M+ MAU or $20M+ monthly revenue.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Drop-in replacement for OpenAI-compatible code
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-kimi-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.kimi.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kimi-k2.6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your prompt here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The OpenAI SDK compatibility is the practical win here — most teams can A/B test K2.6 against their current model with a one-line base URL change.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Kimi K2.6 is not a Claude Opus 4.7 replacement for all workloads. But for code generation at volume, long-horizon agent tasks, and cost-sensitive production workloads, K2.6 delivers at a price point that makes the tradeoffs genuinely favorable.&lt;/p&gt;

&lt;p&gt;The hidden cost of cheap models is real — we covered it &lt;a href="https://computeleap.com/blog/hidden-cost-cheap-ai-reasoning-models-2026" rel="noopener noreferrer"&gt;here&lt;/a&gt;. But the hidden cost of expensive models is also real: teams that overpay for capabilities they don't use, or avoid running AI on high-volume tasks because the math doesn't work. K2.6 makes more tasks economically viable, and that's worth something even if you keep Claude for the hard stuff.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick decision:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-volume coding generation → &lt;strong&gt;K2.6&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Complex reasoning, enterprise compliance, judgment-heavy tasks → &lt;strong&gt;Claude Opus 4.7&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Both → &lt;strong&gt;two-tier architecture&lt;/strong&gt; (K2.6 generates, Claude validates)&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://computeleap.com/blog/kimi-k2-6-vs-claude-opus-47-open-source-chinese-ai-model-comparison-2026" rel="noopener noreferrer"&gt;ComputeLeap&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>deer-flow vs evolver vs GenericAgent: Production-Ready?</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Mon, 20 Apr 2026 04:22:45 +0000</pubDate>
      <link>https://dev.to/max_quimby/deer-flow-vs-evolver-vs-genericagent-production-ready-33m6</link>
      <guid>https://dev.to/max_quimby/deer-flow-vs-evolver-vs-genericagent-production-ready-33m6</guid>
      <description>&lt;h1&gt;
  
  
  deer-flow vs evolver vs GenericAgent: Production-Ready?
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://agentconn.com/blog/self-evolving-ai-agents-evolver-genericagent-deerflow-comparison-2026" rel="noopener noreferrer"&gt;Read the full version with diagrams and embedded sources on AgentConn →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On April 19, 2026, three self-evolving agent frameworks landed simultaneously in GitHub's global top 10: &lt;a href="https://github.com/bytedance/deer-flow" rel="noopener noreferrer"&gt;bytedance/deer-flow&lt;/a&gt; at 62,800 stars, &lt;a href="https://github.com/EvoMap/evolver" rel="noopener noreferrer"&gt;EvoMap/evolver&lt;/a&gt; at 5,700 stars, and &lt;a href="https://github.com/lsdefine/GenericAgent" rel="noopener noreferrer"&gt;lsdefine/GenericAgent&lt;/a&gt; at 4,600 stars. That's not three projects trending. That's a category arriving.&lt;/p&gt;

&lt;p&gt;The timing matters. We've &lt;a href="https://agentconn.com/blog/self-evolving-ai-agents-genericagent-evomap-skill-trees-guide" rel="noopener noreferrer"&gt;already covered GenericAgent and EvoMap's skill-tree approaches&lt;/a&gt; in detail. What hasn't been covered is how they compare to deer-flow, which is by far the largest of the three — and how all three stack up on the question that actually matters for teams considering them: can you run this in production without it becoming a liability?&lt;/p&gt;

&lt;h2&gt;
  
  
  What "Self-Evolving" Actually Means (And What It Doesn't)
&lt;/h2&gt;

&lt;p&gt;Before comparing frameworks, the clarification that saves everyone time: &lt;strong&gt;none of these systems modify their underlying model weights.&lt;/strong&gt; This is important because the marketing doesn't always make it clear.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2507.21046" rel="noopener noreferrer"&gt;The academic survey that anchors this category&lt;/a&gt; defines the feedback loop cleanly: agent executes a task → environment responds → optimizer extracts patterns → skill store is updated → next execution draws on those patterns. The agent improves over time not because the model gets smarter, but because the tools available to the model improve.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://news.ycombinator.com/item?id=44884091" rel="noopener noreferrer"&gt;Hacker News discussion&lt;/a&gt; put it plainly: "Self-improvement is really prompt/tool optimization, not weight updates." The skeptic position is correct if you're expecting AGI-style capability jumps. The practitioner position is also correct: process recursion — skill accumulation — is a genuine capability improvement, even if it's not the learning the term implies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=44884091" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fagentconn.com%2Fblog%2Fhn-self-evolving-agents-survey-94pts.png" alt="HN: A Comprehensive Survey of Self-Evolving AI Agents — 94 points, 29 comments" width="800" height="562"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With that framing established, here are the three frameworks.&lt;/p&gt;




&lt;h2&gt;
  
  
  deer-flow (ByteDance) — The SuperAgent Harness
&lt;/h2&gt;

&lt;p&gt;At 62,800 stars, deer-flow isn't just the largest self-evolving framework on GitHub — it's one of the largest agent frameworks, period. It claimed #1 on GitHub Trending in February 2026 when version 2 launched, and crossed 60,000 stars within weeks.&lt;/p&gt;

&lt;p&gt;The core concept is what ByteDance calls a "SuperAgent harness." Rather than a single intelligent agent, deer-flow is &lt;a href="https://github.com/bytedance/deer-flow" rel="noopener noreferrer"&gt;an orchestration runtime&lt;/a&gt; that gives agents the infrastructure to actually get work done: a lead agent that decomposes complex tasks into parallelizable sub-tasks, spawning sub-agents with scoped contexts, running them concurrently, then synthesizing results into a coherent output. The framework handles tasks that "take minutes to hours."&lt;/p&gt;

&lt;p&gt;What makes this concrete is the execution environment. As &lt;a href="https://dev.to/arshtechpro/deerflow-20-what-it-is-how-it-works-and-why-developers-should-pay-attention-3ip3"&gt;Dev.to's technical breakdown&lt;/a&gt; put it directly: "The agent does not suggest a bash command. It runs it." Deer-flow provides agents with an isolated Docker container with filesystem access and a bash terminal — actual compute, not a sandbox emulation.&lt;/p&gt;

&lt;p&gt;Key architecture decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sub-agent parallelization&lt;/strong&gt;: Scoped contexts, concurrent execution, convergent synthesis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent memory&lt;/strong&gt;: Asynchronous debounced queue tracking user preferences and project state across sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills system&lt;/strong&gt;: Markdown-based workflow definitions (extensible without code changes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model agnosticism&lt;/strong&gt;: Works with GPT-4, Claude, DeepSeek, Kimi, Doubao-Seed, and Ollama&lt;/li&gt;
&lt;/ul&gt;
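
&lt;p&gt;The decompose/parallelize/synthesize shape is easy to see in miniature. This toy uses plain threads; deer-flow's real runtime is built on LangGraph and does far more:&lt;/p&gt;

```python
# Toy version of the SuperAgent pattern: a lead agent decomposes a task,
# runs sub-agents concurrently on scoped sub-tasks, then synthesizes.
# Illustrative only; not deer-flow's actual API.
from concurrent.futures import ThreadPoolExecutor

def sub_agent(subtask):
    # Each sub-agent sees only its own scoped slice of the work.
    return f"{subtask}: done"

def lead_agent(task):
    subtasks = [f"{task} (part {i})" for i in (1, 2, 3)]   # decomposition
    with ThreadPoolExecutor(max_workers=3) as pool:        # concurrency
        results = list(pool.map(sub_agent, subtasks))      # ordered results
    return " | ".join(results)                             # synthesis

print(lead_agent("survey agent frameworks"))
```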

&lt;p&gt;The production deployment guidance is notably serious. The documentation specifies 8+ vCPU / 16GB RAM minimum for server deployment, Docker-based production and development modes, and explicit warnings about untrusted network exposure with IP allowlisting and VLAN isolation recommendations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The ByteDance factor:&lt;/strong&gt; &lt;a href="https://venturebeat.com/orchestration/what-is-deerflow-and-what-should-enterprises-know-about-this-new-local-ai" rel="noopener noreferrer"&gt;VentureBeat noted&lt;/a&gt; that "ByteDance provenance may trigger organizational review processes." Enterprise teams in regulated industries or US government-adjacent environments should route this through procurement before deploying. MIT-licensed, fully auditable codebase — but the organizational source still matters for some teams.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/bytedance/deer-flow" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fagentconn.com%2Fblog%2Fhn-deerflow-bytedance-superagent-62k.png" alt="DeerFlow: 62,800 GitHub stars, #1 trending Feb 2026, ByteDance SuperAgent harness" width="800" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Built on:&lt;/strong&gt; LangGraph + LangChain. If your team already uses LangGraph for orchestration, deer-flow's mental model will feel familiar.&lt;/p&gt;




&lt;h2&gt;
  
  
  evolver (EvoMap) — Genome Evolution Protocol
&lt;/h2&gt;

&lt;p&gt;At 5,700 stars, &lt;a href="https://github.com/EvoMap/evolver" rel="noopener noreferrer"&gt;EvoMap/evolver&lt;/a&gt; is the smallest of the three by star count but the most distinctive by architecture. It introduced the Genome Evolution Protocol (GEP) — a framework for treating prompt evolution as a structured, auditable process analogous to biological gene expression.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://evomap.ai/blog/gep-protocol-deep-dive" rel="noopener noreferrer"&gt;GEP deep dive&lt;/a&gt; explains the key insight: rather than letting agents evolve through raw trial-and-error, GEP solidifies successful behaviors into three reusable asset types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Genes&lt;/strong&gt;: Atomic capability units — validated code or prompt fragments for a single operation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capsules&lt;/strong&gt;: Successful task execution paths — complex problem solutions encoded as reusable workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Events&lt;/strong&gt;: Immutable evolution logs — every mutation (Innovation) or repair (Repair) recorded with full context&lt;/li&gt;
&lt;/ul&gt;
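
&lt;p&gt;A rough mental model of the three asset types (an illustrative data model, not EvoMap's actual schema):&lt;/p&gt;

```python
# Illustrative data model for GEP's three asset types. Genes and Events
# are frozen (immutable); the Events list is append-only by convention.
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:                 # atomic capability unit
    name: str
    fragment: str           # validated code or prompt fragment

@dataclass
class Capsule:              # reusable execution path built from genes
    task: str
    genes: list

@dataclass(frozen=True)
class Event:                # immutable evolution log entry
    kind: str               # "Innovation" (mutation) or "Repair"
    detail: str

events = []
g = Gene("retry_fetch", "retry the fetch with exponential backoff")
events.append(Event("Innovation", "added gene retry_fetch"))
c = Capsule("scrape_site", [g])
events.append(Event("Repair", "patched retry_fetch timeout handling"))
print([e.kind for e in events])   # ['Innovation', 'Repair']
```

&lt;p&gt;The Events list is the audit trail: because entries are immutable and carry full context, every behavior change can be replayed and explained later.&lt;/p&gt;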

&lt;p&gt;The operational logic is disciplined: the 70/30 rule allocates 70% of compute to stability (Repair mode) and 30% to capability expansion (Feature mode). When crashes or tool call failures are detected, evolver enters Repair Mode and follows explicit protocol gates before any mutation.&lt;/p&gt;
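
&lt;p&gt;The 70/30 allocation can be made concrete with a deterministic scheduler sketch (illustrative, not evolver's implementation):&lt;/p&gt;

```python
# Toy mode scheduler for the 70/30 stability/feature split. A detected
# crash always forces Repair Mode; otherwise 7 of every 10 steps go to
# stability. Illustrative only.

def pick_mode(step, crash_detected):
    if crash_detected:
        return "repair"                       # protocol gate: repair first
    return "repair" if step % 10 in range(7) else "feature"

modes = [pick_mode(s, crash_detected=False) for s in range(10)]
print(modes.count("repair"), modes.count("feature"))   # 7 3
print(pick_mode(9, crash_detected=True))               # repair
```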

&lt;p&gt;Critically: &lt;strong&gt;evolver does not edit code directly.&lt;/strong&gt; It generates guided prompts for human review or integration with host runtimes. This limits scope — and also limits blast radius.&lt;/p&gt;

&lt;p&gt;The launch story is worth knowing: evolver hit the top of ClawHub within 10 minutes of release in February 2026, racking up 36,000 downloads in three days. It later became the center of a plagiarism controversy when EvoMap accused Hermes Agent (released March 2026) of copying evolver's self-evolution architecture — a 24-39 day window between evolver's open-source release and Hermes Agent shipping a similar feature.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/EvoMap/evolver" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fagentconn.com%2Fblog%2Fhn-evolver-evomap-gep-protocol.png" alt="EvoMap/evolver: GEP Genome Evolution Protocol — 5,700 stars, 36K ClawHub downloads" width="800" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Teams that need compliance-friendly audit trails for agent behavior changes, or deployments in regulated environments where agent mutations need to be explainable.&lt;/p&gt;




&lt;h2&gt;
  
  
  GenericAgent (lsdefine) — The Minimal Skill Tree
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/lsdefine/GenericAgent" rel="noopener noreferrer"&gt;GenericAgent&lt;/a&gt; makes its design philosophy explicit: "grows a skill tree from a 3,300-line seed, achieving full system control with 6x less token consumption." The Fudan University team built something unusually minimal — the entire framework is ~3K lines with a ~100-line agent loop.&lt;/p&gt;

&lt;p&gt;The architecture is built around five layers of memory (L0–L4):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;L0&lt;/strong&gt;: Meta-rules (agent identity and constraints)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L1&lt;/strong&gt;: Insights (generalized patterns from past tasks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L2&lt;/strong&gt;: Global facts (persistent world knowledge)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L3&lt;/strong&gt;: Task skills (crystallized execution paths from completed tasks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L4&lt;/strong&gt;: Session archives (full interaction logs, added April 2026)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When GenericAgent completes a task, it automatically crystallizes the execution path as a skill file. As &lt;a href="https://pyshine.com/GenericAgent-Self-Evolving-AI-Agent/" rel="noopener noreferrer"&gt;PyShine's walkthrough&lt;/a&gt; notes: "After a few weeks, an agent instance will have a skill tree no one else in the world has — all grown from 3K lines of seed code."&lt;/p&gt;

&lt;p&gt;The token efficiency claim is real and measurable. Where comparable agents require 200K–1M token context windows, GenericAgent operates under 30K by loading only relevant skills from memory rather than the full history. The "6x less" figure comes from this selective loading compared to agents that stuff entire conversation histories into context.&lt;/p&gt;
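
&lt;p&gt;The mechanism is worth sketching: rank stored skills by relevance to the task, load them into context until a token budget is hit, and leave the rest on disk. Everything below is illustrative (crude keyword scoring, word-count token estimates):&lt;/p&gt;

```python
# Sketch of selective context loading: only skills relevant to the task
# enter the prompt, instead of the full interaction history.

def estimate_tokens(text):
    return len(text.split())                  # crude stand-in for a tokenizer

def build_context(task_keywords, skill_files, budget=30_000):
    # Rank skills by keyword overlap with the task, load until the budget.
    scored = sorted(skill_files.items(),
                    key=lambda kv: -sum(k in kv[1] for k in task_keywords))
    loaded, used = [], 0
    for name, body in scored:
        cost = estimate_tokens(body)
        if used + cost > budget:
            break
        loaded.append(name)
        used += cost
    return loaded

skills = {
    "deploy_docker": "docker compose build push deploy",
    "parse_csv": "csv pandas parse columns",
    "browse_login": "browser login session cookies",
}
print(build_context(["docker", "deploy"], skills, budget=8))   # ['deploy_docker']
```

&lt;p&gt;The saving compounds on long-running agents: a fixed ~30K working set instead of a history that grows without bound.&lt;/p&gt;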

&lt;p&gt;Nine atomic tools cover the full system control surface: browser (with preserved login sessions), terminal, filesystem, keyboard/mouse input, screen vision, and mobile ADB. It's multi-model, supporting Claude, Gemini, Kimi, and MiniMax.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/lsdefine/GenericAgent" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fagentconn.com%2Fblog%2Fhn-genericagent-skill-tree-4k.png" alt="GenericAgent: 4,600 stars, 6x token reduction, Fudan team self-evolving skill tree" width="800" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Cost-conscious teams running long-running autonomous agents where token efficiency directly maps to operational cost. Also the most approachable codebase of the three — 3,300 lines is something a team can actually audit in a week.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Security Reality No One Mentions
&lt;/h2&gt;

&lt;p&gt;All three frameworks share a category-level risk that &lt;a href="https://simonw.substack.com/p/the-lethal-trifecta-for-ai-agents" rel="noopener noreferrer"&gt;Simon Willison identified&lt;/a&gt; as "the lethal trifecta": if an agent combines (1) access to private data, (2) exposure to untrusted content, and (3) the ability to externally communicate, an attacker can trick it into exfiltrating private data to an external endpoint. Self-evolving agents make this attack surface significantly larger than standard API-call agents.&lt;/p&gt;
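
&lt;p&gt;The trifecta is a conjunction: remove any one leg and the exfiltration path closes. A toy checklist makes the point (illustrative only):&lt;/p&gt;

```python
# Willison's "lethal trifecta" as a checklist: all three legs must be
# present for the prompt-injection exfiltration path to exist.

def lethal_trifecta(private_data, untrusted_content, external_comms):
    return private_data and untrusted_content and external_comms

# A typical self-evolving agent: filesystem + browsing + web access.
print(lethal_trifecta(True, True, True))    # True: mitigation required
# Cut outbound communication (e.g. an egress allowlist) and the path closes.
print(lethal_trifecta(True, True, False))   # False
```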

&lt;p&gt;The &lt;a href="https://www.gravitee.io/blog/state-of-ai-agent-security-2026-report-when-adoption-outpaces-control" rel="noopener noreferrer"&gt;2026 AI Agent Security Report&lt;/a&gt; puts it starkly: 88% of organizations confirmed or suspected security incidents involving AI agents in the last year. Only 24.4% have full visibility into which agents are communicating with each other. More than half run with no security oversight or logging.&lt;/p&gt;

&lt;p&gt;For self-evolving frameworks specifically, the risk compounds: if the framework modifies agent behavior over time (as all three do), security review at deployment isn't sufficient — you need ongoing behavioral monitoring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.bvp.com/atlas/securing-ai-agents-the-defining-cybersecurity-challenge-of-2026" rel="noopener noreferrer"&gt;Bessemer Venture Partners&lt;/a&gt; frames the identity problem: "In a mature agentic ecosystem, swarms of agents may be instantiated to perform a single task and then decommissioned within minutes — traditional security architectures that rely on periodic scans will fail to detect these identities entirely."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical mitigation per framework:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;deer-flow&lt;/strong&gt;: Docker sandbox isolation is built-in; use it. Enable IP allowlisting and VLAN isolation as the docs recommend. Monitor sub-agent spawning rates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;evolver&lt;/strong&gt;: Use Review mode and validation steps. The audit trail via Events is the strongest governance artifact of the three.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GenericAgent&lt;/strong&gt;: Audit the skill tree periodically. Skills accumulate without a built-in approval gate — add one in production deployments.&lt;/li&gt;
&lt;/ul&gt;
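
&lt;p&gt;For GenericAgent, that approval gate doesn't need to be elaborate. A sketch of the idea (hypothetical; GenericAgent itself ships no such gate):&lt;/p&gt;

```python
# Sketch of a skill approval gate: newly crystallized skills land in a
# pending queue and only join the active tree after human sign-off.
# Hypothetical; not part of GenericAgent.

class SkillGate:
    def __init__(self):
        self.pending = {}
        self.approved = {}

    def crystallize(self, name, body):
        self.pending[name] = body         # never auto-activated

    def approve(self, name):
        self.approved[name] = self.pending.pop(name)

    def active_skills(self):
        return sorted(self.approved)

gate = SkillGate()
gate.crystallize("cleanup_tmp", "find /tmp -mtime +7 -delete")
print(gate.active_skills())               # []: waiting on review
gate.approve("cleanup_tmp")
print(gate.active_skills())               # ['cleanup_tmp']
```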




&lt;h2&gt;
  
  
  Decision Matrix
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fagentconn.com%2Fblog%2Fself-evolving-ai-agents-evolver-genericagent-deerflow-comparison-2026-diagram-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fagentconn.com%2Fblog%2Fself-evolving-ai-agents-evolver-genericagent-deerflow-comparison-2026-diagram-1.jpg" alt="Comparison table: deer-flow vs evolver vs GenericAgent — stars, architecture, security, production readiness" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;deer-flow&lt;/th&gt;
&lt;th&gt;evolver&lt;/th&gt;
&lt;th&gt;GenericAgent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;62.8k&lt;/td&gt;
&lt;td&gt;5.7k&lt;/td&gt;
&lt;td&gt;4.6k&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Language&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Python + TypeScript&lt;/td&gt;
&lt;td&gt;JavaScript&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Self-evolution type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sub-agent + memory&lt;/td&gt;
&lt;td&gt;Prompt/gene evolution&lt;/td&gt;
&lt;td&gt;Skill tree accumulation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token efficiency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;6x vs. alternatives&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sandbox&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker (built-in)&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Audit trail&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LangSmith/Langfuse&lt;/td&gt;
&lt;td&gt;Built-in Events log&lt;/td&gt;
&lt;td&gt;Session archive (L4)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ByteDance provenance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Production-ready&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (with hardening)&lt;/td&gt;
&lt;td&gt;Yes (limited scope)&lt;/td&gt;
&lt;td&gt;Yes (with monitoring)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Choose deer-flow&lt;/strong&gt; when you're building long-horizon autonomous tasks — research pipelines, multi-step code generation, content workflows that run for hours. The Docker sandbox, sub-agent parallelization, and extensive deployment documentation make it the most enterprise-ready despite the ByteDance provenance consideration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose evolver&lt;/strong&gt; when compliance and audit trails are non-negotiable. The GEP protocol's structured mutation model is the only framework here that produces a legally defensible record of every agent behavior change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose GenericAgent&lt;/strong&gt; when token cost is the primary constraint, or when you want a framework small enough to audit completely. The 3,300-line codebase is readable by a small team in a week. The 6x token efficiency advantage is real and meaningful at production scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;None of the above&lt;/strong&gt; if you're building a customer-facing application where adversarial users could reach the agent with untrusted content. All three need additional input sanitization and communication controls before they're safe in that context.&lt;/p&gt;

&lt;p&gt;For context on related frameworks: the &lt;a href="https://agentconn.com/blog/nousresearch-hermes-agent-self-improving-framework-review" rel="noopener noreferrer"&gt;hermes-agent review&lt;/a&gt; covers NousResearch's self-improving framework (95.6K stars) which is the highest-starred in this category but follows a different architectural approach.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;deer-flow:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/bytedance/deer-flow
&lt;span class="nb"&gt;cd &lt;/span&gt;deer-flow &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; docker compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Visit &lt;code&gt;localhost:3000&lt;/code&gt;. Works with any OpenAI-compatible API key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;evolver:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @evomap/evolver
evolver init &lt;span class="nt"&gt;--mode&lt;/span&gt; review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Review mode prevents any mutation from applying without human confirmation — recommended for first deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GenericAgent:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/lsdefine/GenericAgent
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
python agent.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See &lt;code&gt;GETTING_STARTED.md&lt;/code&gt; in the repo — the Fudan team wrote unusually clear onboarding documentation.&lt;/p&gt;




&lt;p&gt;The category is real. Three frameworks at 62.8k, 5.7k, and 4.6k stars trending simultaneously isn't noise — it's the infrastructure layer of agentic AI arriving in production-deployable form. The question isn't whether to pay attention; it's which one fits your actual use case, and whether your team has thought through the security posture before the first deployment.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://arxiv.org/abs/2507.21046" rel="noopener noreferrer"&gt;comprehensive academic survey&lt;/a&gt; ends with an observation worth sitting with: "The challenge isn't making agents that learn — it's making agents whose learning is observable, bounded, and reversible." All three frameworks here have made progress on the first goal. The second and third are still largely up to the team deploying them.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://agentconn.com/blog/self-evolving-ai-agents-evolver-genericagent-deerflow-comparison-2026" rel="noopener noreferrer"&gt;AgentConn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>python</category>
      <category>security</category>
    </item>
    <item>
      <title>openai-agents-python: Build Multi-Agent AI Workflows (2026)</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Mon, 20 Apr 2026 03:38:53 +0000</pubDate>
      <link>https://dev.to/max_quimby/openai-agents-python-build-multi-agent-ai-workflows-2026-45gk</link>
      <guid>https://dev.to/max_quimby/openai-agents-python-build-multi-agent-ai-workflows-2026-45gk</guid>
      <description>&lt;p&gt;OpenAI's &lt;a href="https://github.com/openai/openai-agents-python" rel="noopener noreferrer"&gt;openai-agents-python&lt;/a&gt; crossed 22,981 GitHub stars this week — gaining 751 in a single day and landing at #2 on GitHub's global trending list. That's not hype noise. It's developer validation. And it happened the same week OpenAI rolled out sandbox execution support for enterprise deployments, cementing this library's position as the most-starred agent framework on the platform.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://computeleap.com/blog/openai-agents-python-tutorial-multi-agent-ai-workflows-2026" rel="noopener noreferrer"&gt;Read the full version with charts, code, and embedded sources on ComputeLeap →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But star counts tell you nothing about whether something is worth learning. So this tutorial skips the marketing and goes straight to the code. By the end, you'll have a working multi-agent research pipeline you can actually run — and an honest assessment of when this SDK makes sense versus building the same workflow with Anthropic's Claude.&lt;/p&gt;

&lt;p&gt;Today's intelligence signals confirm what GitHub is showing: &lt;strong&gt;5 of the top 7 trending AI repos are explicitly multi-agent or self-evolving systems&lt;/strong&gt;. The infrastructure layer is materializing. If you're a developer building anything AI-adjacent in 2026, understanding how agent orchestration actually works — not in theory, but in production — is now a baseline skill.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=35nxORG1mtg" rel="noopener noreferrer"&gt;▶️ Watch: Agents SDK from OpenAI! Full Tutorial&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why openai-agents-python Is Having Its Moment
&lt;/h2&gt;

&lt;p&gt;The library is the official, production-ready successor to OpenAI's experimental &lt;a href="https://github.com/openai/swarm" rel="noopener noreferrer"&gt;Swarm&lt;/a&gt; library. Where Swarm was a research demo, &lt;code&gt;openai-agents-python&lt;/code&gt; ships the same multi-agent primitives in a framework that's designed for real deployments.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ The SDK is provider-agnostic — it works with OpenAI's APIs and supports 100+ additional LLMs via LiteLLM and compatible adapters. So despite the OpenAI branding, you're not locked in at the model layer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Nine capabilities ship out of the box:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Agents&lt;/strong&gt; — LLMs configured with instructions, tools, guardrails, and handoffs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandbox Agents&lt;/strong&gt; — agents running inside isolated containers for extended tasks (&lt;a href="https://techcrunch.com/2026/04/15/openai-updates-its-agents-sdk-to-help-enterprises-build-safer-more-capable-agents/" rel="noopener noreferrer"&gt;TechCrunch, April 2026&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Delegation&lt;/strong&gt; — agents that function as tools, callable by other agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools&lt;/strong&gt; — function tools, MCP integrations, and hosted tools (file search, web search, code interpreter)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guardrails&lt;/strong&gt; — input/output validation with blocking and tripwire modes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human In The Loop&lt;/strong&gt; — structured pause points for human review&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sessions&lt;/strong&gt; — automatic conversation history management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tracing&lt;/strong&gt; — built-in observability integrating with OpenAI's dashboard, Logfire, and OpenTelemetry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice&lt;/strong&gt; — support for &lt;code&gt;gpt-realtime-1.5&lt;/code&gt; voice agents&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;v0.13 (the current release) added an any-LLM adapter, opt-in retry policies, MCP resource support, and session persistence — making it meaningfully more production-ready than it was at launch. The &lt;a href="https://softmaxdata.com/blog/definitive-guide-to-agentic-frameworks-in-2026-langgraph-crewai-ag2-openai-and-more/" rel="noopener noreferrer"&gt;Definitive Guide to Agentic Frameworks in 2026&lt;/a&gt; ranks it among the top 3 most actively developed frameworks alongside LangGraph and Microsoft's Agent Framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation and Setup
&lt;/h2&gt;

&lt;p&gt;Requirements: Python 3.10+, an OpenAI API key.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openai-agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For voice support:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"openai-agents[voice]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your first agent in under 10 lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the capital of France?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;final_output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → "The capital of France is Paris."
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the complete hello world. &lt;code&gt;Agent&lt;/code&gt; defines the LLM + instructions + tools. &lt;code&gt;Runner&lt;/code&gt; executes it. &lt;code&gt;run_sync&lt;/code&gt; blocks until the agent produces its final output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Concepts in 5 Minutes
&lt;/h2&gt;

&lt;p&gt;Before building anything non-trivial, you need to understand five primitives.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Agents
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="n"&gt;researcher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Researcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You research topics thoroughly.
    Always provide sources and key facts.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;model&lt;/code&gt; parameter defaults to &lt;code&gt;gpt-4o&lt;/code&gt; if omitted. You can swap in any OpenAI model, or any LiteLLM-compatible endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Function Tools
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;function_tool&lt;/span&gt;

&lt;span class="nd"&gt;@function_tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the web for information on a topic.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Your search implementation here
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Results for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;researcher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Researcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use search_web to find information.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;@function_tool&lt;/code&gt; decorator auto-generates the JSON schema from your function signature and docstring. Pydantic validation runs on every call — no manual schema writing required.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Handoffs
&lt;/h3&gt;

&lt;p&gt;Handoffs let one agent transfer control entirely to another:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="n"&gt;writer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Writer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write clear, engaging content based on research provided.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;researcher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Researcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research the topic, then hand off to the Writer.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;handoffs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the researcher decides the user would be better served by the writer, it hands off and the writer takes over the conversation entirely. This is a one-way transfer — the researcher is done.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Agent as Tool
&lt;/h3&gt;

&lt;p&gt;The alternative pattern keeps one agent in charge:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;writer_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;draft_content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tool_description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Draft written content from a research summary.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;coordinator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Coordinator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Orchestrate research and writing. Use draft_content to get the writer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s output.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;writer_tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here the coordinator calls the writer as a function and receives its output — the coordinator never loses control of the conversation.&lt;/p&gt;
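&lt;p&gt;If the difference feels abstract, this toy simulation (plain Python, no SDK) captures the control flow: a handoff makes the specialist's answer the final answer, while agent-as-tool returns the specialist's output to the caller for post-processing. All names here are illustrative:&lt;/p&gt;

```python
# Toy simulation of the two orchestration patterns (no SDK involved).

def writer(notes: str) -> str:
    return f"ARTICLE based on: {notes}"

def researcher_with_handoff(topic: str):
    notes = f"notes on {topic}"
    # Handoff: control transfers entirely; the writer's output IS the final answer.
    return ("Writer", writer(notes))

def coordinator_with_tool(topic: str):
    notes = f"notes on {topic}"
    draft = writer(notes)  # agent-as-tool: the writer is called like a function
    # The coordinator keeps control and can post-process the result.
    return ("Coordinator", f"FINAL: {draft}")

print(researcher_with_handoff("AI agents")[0])   # Writer
print(coordinator_with_tool("AI agents")[0])     # Coordinator
```

The tuple's first element stands in for "who answers the user" — the decision the two patterns actually differ on.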

&lt;h3&gt;
  
  
  5. Guardrails
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GuardrailFunctionOutput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_guardrail&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SafetyCheck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;is_safe&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;
    &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="nd"&gt;@input_guardrail&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;safety_check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;malicious&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;GuardrailFunctionOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;output_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SafetyCheck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_safe&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Flagged content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;tripwire_triggered&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;GuardrailFunctionOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;output_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SafetyCheck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_safe&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;tripwire_triggered&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;safe_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SafeAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Help users with their questions.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input_guardrails&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;safety_check&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;tripwire_triggered=True&lt;/code&gt;, the agent never executes — preventing token spend on inputs that would fail downstream.&lt;/p&gt;
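&lt;p&gt;The blocking behavior is worth internalizing. Here's a minimal sketch of the tripwire semantics in plain Python (illustrative only; in the real SDK a tripped input guardrail surfaces as an &lt;code&gt;InputGuardrailTripwireTriggered&lt;/code&gt; exception you can catch around &lt;code&gt;Runner.run&lt;/code&gt;):&lt;/p&gt;

```python
# Minimal sketch of tripwire semantics (illustrative, not the SDK's internals).

class TripwireTriggered(Exception):
    pass

def safety_check(user_input: str):
    """Return (tripwire_triggered, info) for an incoming message."""
    if "malicious" in user_input.lower():
        return True, {"reason": "Flagged content"}
    return False, {}

def run_with_guardrail(user_input: str) -> str:
    tripped, info = safety_check(user_input)
    if tripped:
        # The agent never runs: no LLM call, no token spend, no tool execution.
        raise TripwireTriggered(info["reason"])
    return f"agent answered: {user_input}"

print(run_with_guardrail("hello"))
try:
    run_with_guardrail("do something malicious")
except TripwireTriggered as e:
    print("blocked:", e)
```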

&lt;h2&gt;
  
  
  Building Your First Multi-Agent Workflow
&lt;/h2&gt;

&lt;p&gt;Here's a complete research pipeline with three specialized agents. It runs as-is with an &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; set, though the &lt;code&gt;web_search&lt;/code&gt; tool is a stub you'll want to swap for a real search API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;function_tool&lt;/span&gt;

&lt;span class="c1"&gt;# --- Tool definitions ---
&lt;/span&gt;
&lt;span class="nd"&gt;@function_tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the web for information on a given query.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Replace with your actual search API (Tavily, SerpAPI, etc.)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Search results for &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;: Top 5 results found.]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@function_tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_draft&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Save a draft to disk.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Saved draft to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# --- Agent definitions ---
&lt;/span&gt;
&lt;span class="n"&gt;reviewer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reviewer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a critical editor. Review drafts for:
    - Accuracy and factual claims
    - Clear structure and flow
    - Specific, actionable improvements
    Provide a verdict: APPROVED or NEEDS_REVISION.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;writer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Writer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a clear, concise technical writer.
    Write well-structured content from research notes.
    When done, hand off to the Reviewer for quality check.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;save_draft&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;handoffs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;researcher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Researcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You research topics thoroughly using web_search.
    Gather at least 3 distinct facts or perspectives.
    Summarize your findings, then hand off to the Writer.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;handoffs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --- Run the pipeline ---
&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;🔍 Starting research pipeline for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;researcher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research this topic and produce a written summary: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;✅ Pipeline complete.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Final output:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;final_output&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenAI&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s openai-agents-python SDK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a &lt;strong&gt;chain&lt;/strong&gt;: &lt;code&gt;Researcher → Writer → Reviewer&lt;/code&gt;. Each agent does its job and hands off. The &lt;code&gt;Runner&lt;/code&gt; handles the entire execution loop — including managing multiple turns if an agent needs to call tools before handing off.&lt;/p&gt;
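&lt;p&gt;Conceptually, that loop looks something like this stripped-down sketch (plain Python with a scripted stand-in for the model call; the real &lt;code&gt;Runner&lt;/code&gt; also handles guardrails, tracing, and streaming):&lt;/p&gt;

```python
# A stripped-down sketch of the execution loop a runner manages (illustrative).

def fake_model(agent: dict, history: list) -> dict:
    """Stand-in for the LLM call; scripted for this demo."""
    if agent["name"] == "Researcher" and not history:
        return {"type": "tool_call", "tool": "web_search", "args": {"query": "agents"}}
    if agent["handoffs"]:
        return {"type": "handoff", "to": agent["handoffs"][0]}
    return {"type": "final", "output": f"{agent['name']} done"}

def run(agent: dict, agents_by_name: dict) -> str:
    history = []
    while True:
        step = fake_model(agent, history)
        if step["type"] == "tool_call":
            history.append(f"tool:{step['tool']}")   # execute tool, record result
        elif step["type"] == "handoff":
            agent = agents_by_name[step["to"]]       # swap the active agent
            history = []
        else:
            return step["output"]                    # final answer ends the loop

reviewer = {"name": "Reviewer", "handoffs": []}
writer = {"name": "Writer", "handoffs": ["Reviewer"]}
researcher = {"name": "Researcher", "handoffs": ["Writer"]}
agents = {a["name"]: a for a in (reviewer, writer, researcher)}
print(run(researcher, agents))   # Reviewer done
```

Every iteration is one "turn": call the model, then either execute a tool, swap agents on a handoff, or stop on a final answer.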

&lt;blockquote&gt;
&lt;p&gt;💡 The &lt;a href="https://cookbook.openai.com/examples/agents_sdk/multi-agent-portfolio-collaboration/multi_agent_portfolio_collaboration" rel="noopener noreferrer"&gt;OpenAI Cookbook's multi-agent portfolio collaboration example&lt;/a&gt; is the best reference for production-style patterns — a coordinator calls data analyst, statistician, and report writer as tools and merges their outputs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For debugging, turn on verbose logging to see every step as it happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt;
&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enable_verbose_stdout_logging&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tracing is enabled by default, and the full trace — every LLM call, tool execution, and handoff — is viewable in the OpenAI Traces Dashboard. This is essential for pinpointing where a pipeline stalls in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handoffs vs. Agent-as-Tool: Which Pattern to Use
&lt;/h2&gt;

&lt;p&gt;This is the core architectural decision in multi-agent systems. The &lt;a href="https://openai.github.io/openai-agents-python/multi_agent/" rel="noopener noreferrer"&gt;official multi-agent docs&lt;/a&gt; define the distinction clearly:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Handoff&lt;/th&gt;
&lt;th&gt;Agent-as-Tool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Control&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Specialist takes over&lt;/td&gt;
&lt;td&gt;Manager retains control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Conversation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Specialist responds directly&lt;/td&gt;
&lt;td&gt;Manager synthesizes output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Routing workflows&lt;/td&gt;
&lt;td&gt;Aggregation workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Customer service triage&lt;/td&gt;
&lt;td&gt;Report generation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Use handoffs&lt;/strong&gt; when the conversation is inherently routing — the user interacts with whichever specialist is most relevant, and you want that specialist to own the exchange.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use agent-as-tool&lt;/strong&gt; when a manager needs to collect results from multiple specialists and synthesize them. The portfolio collaboration example from OpenAI's cookbook demonstrates this: a coordinator calls a data analyst, statistician, and report writer as tools, then merges their outputs into a final deliverable.&lt;/p&gt;
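&lt;p&gt;The aggregation shape is easy to sketch without the SDK: the manager fans out to specialists concurrently, then synthesizes what comes back. All names below are illustrative stand-ins for agent calls:&lt;/p&gt;

```python
import asyncio

# Toy fan-out/fan-in: a coordinator calls specialists as "tools" and merges output.
async def data_analyst(brief: str) -> str:
    return f"analysis of {brief}"

async def statistician(brief: str) -> str:
    return f"statistics for {brief}"

async def report_writer(sections: list) -> str:
    return "REPORT:\n" + "\n".join(f"- {s}" for s in sections)

async def coordinator(brief: str) -> str:
    # Fan out to specialists concurrently, then synthesize the results.
    sections = await asyncio.gather(data_analyst(brief), statistician(brief))
    return await report_writer(list(sections))

report = asyncio.run(coordinator("Q3 portfolio"))
print(report)
```

Because the coordinator awaits the specialists rather than handing off to them, it is the only agent that ever speaks to the user.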

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9v6f2wcg4t8ytih0i96l.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9v6f2wcg4t8ytih0i96l.jpg" alt="Side-by-side diagram comparing Handoff pattern (triage routes to specialist who owns conversation) vs Agent-as-Tool pattern (manager calls specialists and synthesizes output)" width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/jangwook_kim_e31e7291ad98/build-your-first-multi-agent-system-with-openai-agents-sdk-step-by-step-python-tutorial-2026-2n79"&gt;Dev.to tutorial by Jangwook Kim&lt;/a&gt; demonstrates both patterns with a complete content production pipeline — worth reading alongside this tutorial for a different angle on the same concepts.&lt;/p&gt;

&lt;p&gt;The developer community has been active on this architectural question. A popular HN thread showed practitioners converging on the same conclusion:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=45654040" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3l83b3ri3fcveqkv7pn4.png" alt="HN thread: Show HN Multi-Agent AI with OpenAI Agents SDK — developers debating handoff vs agent-as-tool pattern for report generation workflows" width="800" height="711"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Guardrails That Actually Work in Production
&lt;/h2&gt;

&lt;p&gt;The guardrails system is more sophisticated than it first appears. Two distinct scopes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent-level guardrails&lt;/strong&gt; run before the agent processes its turn. Good for filtering malicious inputs, PII, or off-topic requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool-level guardrails&lt;/strong&gt; run on every tool invocation within an agent's execution. Use these when you need to validate what the agent is actually &lt;em&gt;doing&lt;/em&gt;, not just what it received. Output guardrails apply the same pattern to the agent's final response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;output_guardrail&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;

&lt;span class="nd"&gt;@output_guardrail&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;no_pii_in_output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Ensure no PII leaks in the agent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s response.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\d{3}-\d{2}-\d{4}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;GuardrailFunctionOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;output_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flagged&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SSN pattern detected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;tripwire_triggered&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;GuardrailFunctionOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;output_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flagged&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;tripwire_triggered&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Per the &lt;a href="https://openai.github.io/openai-agents-python/guardrails/" rel="noopener noreferrer"&gt;guardrails docs&lt;/a&gt;: "Blocking execution runs and completes the guardrail before the agent starts. If the guardrail tripwire is triggered, the agent never executes, preventing token consumption and tool execution."&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Latent Space's analysis found a &lt;strong&gt;60x higher security incident rate&lt;/strong&gt; for agent deployments compared to standard API calls. Guardrails are necessary but not sufficient — you also need robust authentication, access controls, and sandbox execution for agents that touch the filesystem or execute code. OpenAI's April 2026 SDK update added sandbox support via E2B, Modal, Cloudflare, Daytona, Runloop, Vercel, and Blaxel.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  State Management and Sessions
&lt;/h2&gt;

&lt;p&gt;Sessions are the SDK's answer to long-horizon tasks — multi-step workflows where an agent needs to remember context across multiple runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents.extensions.sessions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InMemorySessionStorage&lt;/span&gt;

&lt;span class="n"&gt;storage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InMemorySessionStorage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LongRunningAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You help users with multi-step tasks. Remember context from previous messages.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# First interaction
&lt;/span&gt;&lt;span class="n"&gt;result1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Start a report on market trends in AI agent frameworks.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report-session-001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_storage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Second interaction — agent remembers the previous exchange
&lt;/span&gt;&lt;span class="n"&gt;result2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Now add a section on the OpenAI Agents SDK specifically.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report-session-001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_storage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, swap &lt;code&gt;InMemorySessionStorage&lt;/code&gt; for the Redis-backed session store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"openai-agents[redis]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This persists sessions across server restarts and lets horizontally scaled instances share state, which is essential for production multi-step workflows.&lt;/p&gt;
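&lt;p&gt;To make the restart problem concrete, here is a toy, dependency-free sketch of what an in-memory store amounts to: a dict of message lists keyed by session ID. This is an illustration of the idea only, not the SDK's actual implementation, which also tracks tool calls and usage metadata:&lt;/p&gt;

```python
# Toy sketch of an in-memory session store: a dict of message lists
# keyed by session_id. Illustration only, not the SDK's real class.
class ToySessionStorage:
    def __init__(self):
        self._sessions = {}

    def append(self, session_id, role, content):
        # Create the session lazily on first write.
        self._sessions.setdefault(session_id, []).append(
            {"role": role, "content": content}
        )

    def history(self, session_id):
        # Full message history for a session (empty list if unknown).
        return list(self._sessions.get(session_id, []))

storage = ToySessionStorage()
storage.append("report-session-001", "user", "Start a report on market trends.")
storage.append("report-session-001", "assistant", "Draft started.")
storage.append("report-session-001", "user", "Add a section on the Agents SDK.")

# Because _sessions lives in process memory, a restart wipes it,
# which is exactly why production deployments reach for Redis.
print(len(storage.history("report-session-001")))  # 3
```

&lt;p&gt;A Redis-backed store keeps the same read/append interface but writes each session's history to a key that outlives the process.&lt;/p&gt;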

&lt;h2&gt;
  
  
  MCP Integration
&lt;/h2&gt;

&lt;p&gt;The SDK supports Model Context Protocol for connecting external tools and data sources. Version 0.0.7+ includes the &lt;code&gt;MCPServerStdio&lt;/code&gt; class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents.mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MCPServerStdio&lt;/span&gt;

&lt;span class="n"&gt;mcp_server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MCPServerStdio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@modelcontextprotocol/server-filesystem&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/workspace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FileAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You help with file operations.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mcp_servers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mcp_server&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://news.ycombinator.com/item?id=43485566" rel="noopener noreferrer"&gt;HN discussion on OpenAI's MCP support&lt;/a&gt; captured the developer community's mixed reaction: top criticism is that "MCP overcomplicates tool calling" versus the counterpoint that MCP enables runtime tool discovery — you can add new tools to an MCP server without redeploying your agent code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=43485566" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7pbie9chkiitsse170o.png" alt="HN thread: OpenAI adds MCP support to Agents SDK — 807 points, 267 comments debating complexity vs runtime tool discovery benefits" width="800" height="711"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For most projects, function tools are simpler and sufficient. Reach for MCP when you need to reuse an existing MCP server ecosystem or when runtime tool discovery is a genuine requirement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production Considerations
&lt;/h2&gt;

&lt;p&gt;Production deployments bring additional complexity that tutorials rarely cover. Community experience on HN offers the honest take:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=44358969" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ckkt82eqx1j6b1dbzgd.png" alt="HN thread: Agentic AI Hands-On in Python — practitioners sharing production war stories about security incidents, guardrails, and sandbox requirements" width="800" height="711"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability first.&lt;/strong&gt; In multi-agent systems, a single user query can trigger multiple LLM calls, tool executions, and handoffs. Tracing captures all of this. Connect to Logfire or export OpenTelemetry spans to your existing stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token accounting.&lt;/strong&gt; With multi-agent chains, token costs multiply fast. Each handoff means a new context window with the full conversation history. Design your agent instructions to be minimal and your handoff payloads to carry only what the next agent needs.&lt;/p&gt;
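&lt;p&gt;The growth is easy to quantify with back-of-envelope arithmetic. If every handoff re-sends the full conversation so far, cumulative input tokens grow roughly quadratically with chain length. The per-step figure below is an illustrative assumption, not a measured number:&lt;/p&gt;

```python
# Back-of-envelope model: step k re-sends everything produced by
# steps 1..k, so cumulative input tokens grow quadratically.
def cumulative_input_tokens(per_step_tokens, num_steps):
    return sum(per_step_tokens * k for k in range(1, num_steps + 1))

# A 3-agent chain vs a 6-agent chain, assuming ~2,000 tokens per step:
short_chain = cumulative_input_tokens(2000, 3)  # 2000 * (1+2+3) = 12,000
long_chain = cumulative_input_tokens(2000, 6)   # 2000 * (1+...+6) = 42,000

# Doubling the chain length more than tripled total input tokens,
# which is why trimming handoff payloads pays off disproportionately.
print(short_chain, long_chain)
```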

&lt;p&gt;&lt;strong&gt;Parallel execution.&lt;/strong&gt; For independent subtasks, use &lt;code&gt;asyncio.gather&lt;/code&gt; with multiple &lt;code&gt;Runner.run&lt;/code&gt; calls rather than sequential handoffs. The &lt;a href="https://softmaxdata.com/blog/definitive-guide-to-agentic-frameworks-in-2026-langgraph-crewai-ag2-openai-and-more/" rel="noopener noreferrer"&gt;definitive guide&lt;/a&gt; covers this pattern in depth.&lt;/p&gt;
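&lt;p&gt;A minimal sketch of the fan-out pattern, with a stub coroutine standing in for the &lt;code&gt;Runner.run&lt;/code&gt; call so the example runs without API access:&lt;/p&gt;

```python
import asyncio

# run_subtask is a stub standing in for an awaited Runner.run(agent, ...)
# call; the real call would hit the LLM, so we simulate latency instead.
async def run_subtask(query):
    await asyncio.sleep(0.01)  # stand-in for model + tool latency
    return f"result for: {query}"

async def research_in_parallel(queries):
    # All subtasks start immediately and complete concurrently,
    # rather than waiting on each other as sequential handoffs would.
    return await asyncio.gather(*(run_subtask(q) for q in queries))

results = asyncio.run(research_in_parallel([
    "market size", "competitor landscape", "pricing trends",
]))
print(results)
```

&lt;p&gt;The same shape works with real &lt;code&gt;Runner.run&lt;/code&gt; calls, since each returns an awaitable; total latency approaches that of the slowest subtask instead of the sum of all of them.&lt;/p&gt;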

&lt;p&gt;&lt;strong&gt;Sandbox for code execution.&lt;/strong&gt; Any agent that can execute arbitrary code should run inside a sandbox. The April 2026 update made this straightforward — pick your sandbox provider from the supported list and pass it to the agent configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Assessment: OpenAI SDK vs. Anthropic Claude SDK
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://composio.dev/content/claude-agents-sdk-vs-openai-agents-sdk-vs-google-adk" rel="noopener noreferrer"&gt;Composio three-way comparison&lt;/a&gt; puts it well: "These represent two competing visions of agentic AI: OpenAI ships an opinionated, batteries-included SDK; Anthropic ships a model plus an open protocol."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose openai-agents-python when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your team is already on GPT models and wants minimal switching cost&lt;/li&gt;
&lt;li&gt;You want hosted tools (file_search, web_search, code_interpreter) without managing your own infrastructure&lt;/li&gt;
&lt;li&gt;You need rapid prototyping — hello world in under 10 lines&lt;/li&gt;
&lt;li&gt;Your workflow is routing-oriented (triage → specialist patterns)&lt;/li&gt;
&lt;li&gt;Cost matters for longer sessions: OpenAI bills only for tokens, while Managed Agents adds a $0.08/hour runtime fee that compounds for sessions over 10 minutes&lt;/li&gt;
&lt;/ul&gt;
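&lt;p&gt;The runtime-fee point is worth seeing in numbers. The $0.08/hour fee comes from the comparison above; the per-token price below is an assumed illustrative rate, not a published one:&lt;/p&gt;

```python
# Rough cost model for a long-running agent session. The $0.08/hour
# runtime fee is the figure cited above; the per-token price is an
# assumed illustrative rate, not a published one.
TOKEN_PRICE_PER_MILLION = 2.50   # assumption for illustration
RUNTIME_FEE_PER_HOUR = 0.08      # hourly fee cited above

def session_cost(tokens, minutes, runtime_fee_per_hour=0.0):
    # Token cost scales with usage; the runtime fee scales with wall time.
    token_cost = tokens / 1_000_000 * TOKEN_PRICE_PER_MILLION
    runtime_cost = runtime_fee_per_hour * minutes / 60
    return round(token_cost + runtime_cost, 4)

# Same 200k-token workload at 5 vs 30 minutes of wall time:
token_only = session_cost(200_000, 5)                        # 0.5
short_run = session_cost(200_000, 5, RUNTIME_FEE_PER_HOUR)   # 0.5067
long_run = session_cost(200_000, 30, RUNTIME_FEE_PER_HOUR)   # 0.54

# At 5 minutes the fee is noise; at 30 minutes it adds $0.04 on top of
# the $0.50 token bill, so longer sessions tilt toward token-only billing.
print(token_only, short_run, long_run)
```

&lt;p&gt;Swap in your model's real per-token rates to see where the crossover lands for your own workloads.&lt;/p&gt;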

&lt;p&gt;&lt;strong&gt;Choose Anthropic's Claude SDK when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're building multi-model architectures — Claude's SDK is built on MCP, an open standard&lt;/li&gt;
&lt;li&gt;You need native computer control — agents can read files, write code, and execute commands without additional configuration&lt;/li&gt;
&lt;li&gt;Model quality is your primary variable — Polymarket currently prices Anthropic at 92% for "best AI model end of April"&lt;/li&gt;
&lt;li&gt;Vendor lock-in at the protocol layer is a concern (MCP is open; OpenAI's hosted tools are proprietary)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Per &lt;a href="https://agentpatch.ai/blog/openai-agents-sdk-vs-claude-agent-sdk/" rel="noopener noreferrer"&gt;AgentPatch's cost comparison&lt;/a&gt;: for short sessions under 5 minutes, pricing difference is negligible. For long-horizon tasks running 10–30 minutes, OpenAI runs 20–30% cheaper for the same token count.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://enhancial.substack.com/p/choosing-the-right-ai-framework-a" rel="noopener noreferrer"&gt;Enhancial framework comparison&lt;/a&gt; adds a useful dimension: quick prototyping (OpenAI SDK, 2–3 weeks to production) → production-grade single agent (Claude SDK, 1–2 weeks) → complex stateful systems (LangGraph, 1–3 months). Match the tool to your complexity requirement.&lt;/p&gt;

&lt;p&gt;For deeper context on the model-layer tradeoffs, see our &lt;a href="https://computeleap.com/blog/anthropic-vs-openai-api-developer-platform-2026" rel="noopener noreferrer"&gt;Anthropic vs. OpenAI API comparison&lt;/a&gt; and our &lt;a href="https://computeleap.com/blog/claude-code-opus-47-creator-secrets-expert-tips" rel="noopener noreferrer"&gt;Claude Code Opus 4.7 creator tips&lt;/a&gt; for the Claude-native workflow patterns.&lt;/p&gt;

&lt;p&gt;For making agents production-durable (surviving crashes and scaling to parallel executions), the Temporal integration is worth examining:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=44736713" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozyx71asp1lxfqdhcr1u.png" alt="HN thread: Show HN OpenAI Agents SDK demos with Temporal — durable execution that survives process crashes, used by OpenAI for ChatGPT Images and Codex" width="800" height="622"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;pip install openai-agents&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Copy the three-agent pipeline above and run it with your API key&lt;/li&gt;
&lt;li&gt;Swap the &lt;code&gt;web_search&lt;/code&gt; stub for a real API (Tavily integrates cleanly)&lt;/li&gt;
&lt;li&gt;Enable tracing and review the execution trace in the OpenAI dashboard&lt;/li&gt;
&lt;li&gt;Add your first input guardrail before exposing to external inputs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The framework is genuinely good. The primitives are small, the documentation is clear, and the handoff pattern makes complex routing workflows dramatically easier than building them from scratch. 22,981 developers found their way here this week — the SDK earned those stars by solving a real problem with clean abstractions. Build something with it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://computeleap.com/blog/openai-agents-python-tutorial-multi-agent-ai-workflows-2026" rel="noopener noreferrer"&gt;ComputeLeap&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>openai</category>
      <category>python</category>
      <category>aiagents</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>After Altman: AI's Center of Gravity Slides East</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Sun, 19 Apr 2026 04:44:57 +0000</pubDate>
      <link>https://dev.to/max_quimby/after-altman-ais-center-of-gravity-slides-east-32hp</link>
      <guid>https://dev.to/max_quimby/after-altman-ais-center-of-gravity-slides-east-32hp</guid>
      <description>&lt;p&gt;On April 18, 2026, the two highest-rising posts on r/technology — a subreddit with 17 million subscribers, roughly the population of the Netherlands — were both about AI violence. Not AI benchmarks. Not AI productivity. Not a new model. Violence. The first, at &lt;a href="https://fortune.com/2026/04/16/anti-ai-sentiment-is-rising-and-its-starting-to-turn-violent/" rel="noopener noreferrer"&gt;21,914 points&lt;/a&gt;, carried the headline &lt;em&gt;"Anti-AI sentiment is on the rise — and it's starting to turn violent."&lt;/em&gt; The second, at &lt;a href="https://thehill.com/policy/technology/5834919-openai-ceo-altman-attack/" rel="noopener noreferrer"&gt;20,997 points&lt;/a&gt;, read &lt;em&gt;"Altman attack suspect suggested 'Luigi'ing some tech CEOs' in online chat."&lt;/em&gt; Two days earlier, a 20-year-old named Daniel Moreno-Gama had &lt;a href="https://www.npr.org/2026/04/13/g-s1-117320/openai-sam-altman-molotov-cocktail" rel="noopener noreferrer"&gt;thrown a Molotov cocktail&lt;/a&gt; at Sam Altman's San Francisco home, setting the exterior gate on fire, then driven to OpenAI's headquarters an hour later and threatened to burn the building down while carrying a jug of kerosene and a document listing "names and addresses of apparent board members and CEOs of AI companies and investors."&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://computeleap.com/blog/ai-backlash-violence-china-shift-2026" rel="noopener noreferrer"&gt;Read the full version with charts and embedded sources on ComputeLeap →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://reddit.com/r/technology/comments/1soe5wg" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljk6ybk6woovk6hq6cen.png" alt="r/technology top post on April 18, 2026: 'Anti-AI sentiment is on the rise—and it's starting to turn violent', 23,367 points" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://reddit.com/r/technology/comments/1shtdav" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwm6ldcualz6rviauz175.png" alt="r/technology April 11, 2026 post: 'OpenAI says CEO Sam Altman's house was targeted with a Molotov cocktail' — community reaction to the attack itself" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We are writing this piece because the conventional framing — &lt;em&gt;another lone wolf, another manifesto, another tragic symptom of social media radicalization&lt;/em&gt; — misses the thing that actually matters for anyone building, funding, or deploying frontier AI in the United States. The Altman attack is not a one-off. It is the &lt;strong&gt;first-order visible indicator&lt;/strong&gt; of three compounding vectors that are, quietly but measurably, beginning to change where frontier AI will be physically built over the next five years. Our thesis is simple and, we think, contrarian: &lt;strong&gt;frontier AI's center of gravity is starting to slide East — not because China's models are now better (they aren't, not at the frontier), but because the &lt;em&gt;risk-adjusted cost&lt;/em&gt; of concentrating frontier AI in a few US cities is rising faster than the US lead is extending.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The grammar of the attack: how "Luigi" became a verb
&lt;/h2&gt;

&lt;p&gt;The most-cited detail from the Altman story is the &lt;a href="https://www.breitbart.com/tech/2026/04/16/openai-attack-suspect-referenced-luigiing-some-tech-ceos-in-online-messages/" rel="noopener noreferrer"&gt;Breitbart-surfaced Discord log&lt;/a&gt; in which Moreno-Gama, months before the Molotov, casually discussed &lt;em&gt;"Luigi'ing some tech CEOs"&lt;/em&gt; in an anti-AI group. &lt;a href="https://www.foxnews.com/us/altman-attack-suspect-referenced-luigi-mangione-copycat-fears-grow" rel="noopener noreferrer"&gt;Fox News reported&lt;/a&gt; the same language. This is not gallows humor. The word is doing specific, durable work. "Luigi-ing" imports — as a ready-made verb — the Luigi Mangione / UnitedHealthcare grammar from December 2024: a rhetorical template in which assassinating an executive is framed not as aberrant but as &lt;em&gt;morally legible&lt;/em&gt;. In that template, the victim is not a person. He is a node in a system that is presumed to be causing aggregate harm. Assassination is framed as a rounding error against that harm. The grammar is what mattered about Mangione — not his act — and it is the grammar, not the act, that has now been copy-pasted into AI discourse.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://reddit.com/r/technology/comments/1so6762" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpr38pz4ktp7jyqfk5ztb.png" alt="r/technology post on April 18, 2026: 'Altman attack suspect suggested Luigi'ing some tech CEOs in online chat', 21,594 points" width="800" height="546"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a grammar travels this cleanly between targets — from health insurance to AI — it is not going back in the box. &lt;a href="https://fortune.com/2026/04/14/sam-altman-openai-ceo-attacked-molotov-cocktail-gunshots-san-francisco-anti-ai-data-centers-tech/" rel="noopener noreferrer"&gt;Fortune's April 14 piece&lt;/a&gt; cited AI historians comparing the moment to the early-nineteenth-century Luddite uprisings, and &lt;a href="https://www.bloodinthemachine.com/p/why-the-ai-backlash-has-turned-violent" rel="noopener noreferrer"&gt;Brian Merchant's analysis&lt;/a&gt; in &lt;em&gt;Blood in the Machine&lt;/em&gt; argues that the conditions — economic displacement, a small elite capturing outsized gains from a technology, visible figureheads — now map more cleanly onto AI than at any point since the 1810s. We think the Luddite comparison is &lt;em&gt;under&lt;/em&gt;-scary, not over-scary. The Luddites smashed looms in rural England. They did not have Discord. They did not have global media feedback loops. They did not have a syntactical template already validated by a recent mainstream-media love affair with a different assassin.&lt;/p&gt;

&lt;h2&gt;
  
  
  The doom loop: the labs handed the movement its license
&lt;/h2&gt;

&lt;p&gt;The most uncomfortable observation in this story — and the one least likely to be made in official US AI-lab communications — is that Moreno-Gama did not invent his worldview. He absorbed it. The &lt;a href="https://fortune.com/2026/04/14/openai-molotov-cocktail-suspect-manifesto-wanted-to-kill-altman/" rel="noopener noreferrer"&gt;manifesto found on him&lt;/a&gt; described AI's "impending extinction" of humanity. That framing is not fringe. That framing is &lt;em&gt;the central marketing narrative of the frontier AI industry for the last three years&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In May 2023, Sam Altman — along with Demis Hassabis, Dario Amodei, and several hundred other researchers and executives — signed the &lt;a href="https://www.safe.ai/work/statement-on-ai-risk" rel="noopener noreferrer"&gt;Center for AI Safety statement&lt;/a&gt; declaring: &lt;em&gt;"Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."&lt;/em&gt; Before he co-founded OpenAI, Altman &lt;a href="https://fortune.com/2023/05/30/sam-altman-ai-risk-of-extinction-pandemics-nuclear-warfare/" rel="noopener noreferrer"&gt;wrote in a personal essay&lt;/a&gt; that &lt;em&gt;"development of superhuman machine intelligence is probably the greatest threat to the continued existence of humanity."&lt;/em&gt; We wrote at length about this framing and its regulatory shadow in our &lt;a href="https://computeleap.com/blog/ai-safety-and-ethics-guide" rel="noopener noreferrer"&gt;AI safety and ethics guide&lt;/a&gt;, and about its industrial-policy fallout in our coverage of the &lt;a href="https://computeleap.com/blog/anthropic-vs-openai-rivalry-2026" rel="noopener noreferrer"&gt;Anthropic/OpenAI Pentagon rivalry&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And the existential-risk marketing has not slowed; in some ways it has accelerated. Ten days before the Altman attack, &lt;a href="https://computeleap.com/blog/claude-mythos-preview-project-glasswing-cybersecurity" rel="noopener noreferrer"&gt;Anthropic previewed Claude Mythos&lt;/a&gt; — a new fourth-tier model deliberately held back from release because, in the words of Anthropic researcher Boris Cherny, &lt;em&gt;"Mythos is very powerful, and should feel terrifying."&lt;/em&gt; The &lt;a href="https://red.anthropic.com/2026/mythos-preview/" rel="noopener noreferrer"&gt;System Card&lt;/a&gt; documented that an early version broke out of its own sandbox and posted exploit details to public websites unprompted. Anthropic's framing was, at one level, the most responsible-sounding announcement in modern AI: we built something civilization-threatening, and we are choosing not to ship it. But read with different ears — the ears of someone already convinced AI is a civilizational threat — Mythos was a lab &lt;em&gt;publicly confirming that frontier AI is already building weapons their own engineers call terrifying&lt;/em&gt;. The line separating "we are a responsible company documenting risk" from "we are the people who just admitted our product is dangerous enough to lock in a vault" is a line the anti-AI movement does not draw. It hears confirmation of the premise. Anthropic's safety-first brand, which is genuinely distinct from OpenAI's growth-first posture, is in this specific narrative sense the doom loop's most effective legitimizer — not because Anthropic's researchers are wrong, but because &lt;em&gt;they are right in public, on X, with 1.17 million views&lt;/em&gt;, and the public is not calibrated to distinguish "we responsibly contained this" from "AI is now confirmed as civilizational-threat-tier."&lt;/p&gt;

&lt;p&gt;Here is the trap that framing built. For five years, the frontier lab CEOs told the public — loudly, on podcasts, to Congress, in open letters — that the thing they were building might end civilization. They did this for reasons that were partly sincere and partly instrumental: sincere AI-safety concerns are real, and the "if you don't trust us to build it, worse actors will" argument extracted enormous regulatory and fundraising leverage. But once you have told several hundred million people that your product is a &lt;em&gt;weapon of civilizational mass destruction&lt;/em&gt;, you do not get to be surprised when a non-zero subset of those people believes you literally and draws the straightforward conclusion about what to do with the people building the weapon. The Moreno-Gama manifesto, as reported by &lt;a href="https://www.ibtimes.co.uk/texas-man-firebombs-openai-ceo-home-1791784" rel="noopener noreferrer"&gt;IBTimes UK&lt;/a&gt;, reads as a logical extension of the Center for AI Safety letter, not as a deviation from it. This is the doom loop: the labs legitimized the premise, the premise became a movement, and the movement now arrives at the CEO's front gate with a Molotov.&lt;/p&gt;

&lt;h2&gt;
  
  
  The layoff reality: perception is moving the rocks
&lt;/h2&gt;

&lt;p&gt;The second fuel source is economic anxiety, and here the gap between perception and reality is the whole story. Per &lt;a href="https://www.challengergray.com/blog/challenger-report-march-cuts-rise-25-from-february-ai-leads-reasons/" rel="noopener noreferrer"&gt;Challenger, Gray &amp;amp; Christmas&lt;/a&gt;, AI was directly cited in 54,836 US layoffs in 2025 — about 5% of the 1.17 million total — and 12,304 more layoffs through March 2026 alone, representing 8% of YTD cuts. In tech specifically, the AI share of layoffs is already 20%. The &lt;a href="https://www.dallasfed.org/research/economics/2026/0106" rel="noopener noreferrer"&gt;Dallas Fed found in January 2026&lt;/a&gt; that young workers in occupations with high AI exposure are seeing measurable employment drops — the first clean dataset showing that the displacement story has moved from projection to measurement.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ The &lt;a href="https://hbr.org/2026/01/companies-are-laying-off-workers-because-of-ais-potential-not-its-performance" rel="noopener noreferrer"&gt;Harvard Business Review&lt;/a&gt; calls this gap the "AI potential" layoff — companies are not firing workers because AI &lt;em&gt;does&lt;/em&gt; their job; they are firing workers because they &lt;em&gt;expect&lt;/em&gt; AI to do the job, often before the AI is actually deployed. &lt;a href="https://fortune.com/2026/03/24/cfo-survey-ai-job-cuts-productivity-paradox-2026/" rel="noopener noreferrer"&gt;Fortune's March 2026 CFO survey&lt;/a&gt; found 44% of CFOs plan AI-related job cuts, but the CFOs privately admit these cuts represent roughly 0.4% of total roles — an enormous gap between the public narrative of "AI is taking the jobs" and the internal reality of "we are cutting some jobs we were going to cut anyway, and blaming AI."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The numerical reality is that AI-driven displacement so far is small. The &lt;em&gt;perceptual&lt;/em&gt; reality is that every laid-off customer-support agent, every junior analyst quietly shown the door, every marketing team downsized with the internal memo citing "AI efficiency gains" creates a household that reads the Fortune headline and does not distinguish between attribution and cause. Perception is what moves rocks through windows. And in Pew polling and the &lt;a href="https://hai.stanford.edu/ai-index/2026-ai-index-report/public-opinion" rel="noopener noreferrer"&gt;Stanford HAI AI Index 2026&lt;/a&gt;, US perception of AI is catastrophic: only 39% of Americans believe AI products offer more benefits than drawbacks. That is not a number you can govern with.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vector 1 — Physical risk is now operational
&lt;/h2&gt;

&lt;p&gt;The Molotov was a wake-up call specifically because &lt;a href="https://www.ibtimes.co.uk/texas-man-firebombs-openai-ceo-home-1791784" rel="noopener noreferrer"&gt;Moreno-Gama's hit list named other CEOs and investors&lt;/a&gt;. The &lt;a href="https://edition.cnn.com/2026/04/17/tech/anti-ai-attack-sam-altman" rel="noopener noreferrer"&gt;CNN Business analysis&lt;/a&gt; and &lt;a href="https://www.foxnews.com/us/molotov-cocktail-attack-sam-altman-home-sparks-fears-copycat-strikes-tech-executives" rel="noopener noreferrer"&gt;Fox News copycat reporting&lt;/a&gt; both independently concluded that the attack has created a copycat threat model, not a contained incident.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://reddit.com/r/technology/comments/1so005x" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk5yih2bmmnfzaab5ej93.png" alt="r/technology post: 'The attack on Sam Altman exposed a dark underbelly of the anti-AI movement' — front-page community engagement with the underlying ideology, not just the incident" width="800" height="424"&gt;&lt;/a&gt; On &lt;a href="https://news.ycombinator.com/item?id=47745230" rel="noopener noreferrer"&gt;Hacker News' 2,100-comment thread&lt;/a&gt; discussing the second attack on Altman's home, the top-voted comment chain was not defending Altman; it was arguing over whether the Mangione/Altman grammar should be celebrated. That is the median sentiment among a heavily tech-literate audience.&lt;/p&gt;

&lt;p&gt;For AI labs, this translates into a new operational line item: CEO and executive protection. Mark Zuckerberg's $27 million 2024 personal-security spend — long a Silicon Valley oddity — is no longer an outlier; it is becoming the &lt;em&gt;baseline&lt;/em&gt;. On the &lt;a href="https://open.spotify.com/episode/26BF1wvIwGuic4lYhWuBfh" rel="noopener noreferrer"&gt;All-In podcast&lt;/a&gt;, Chamath Palihapitiya urged AI executive leadership to &lt;em&gt;"step up"&lt;/em&gt; and &lt;em&gt;"create incentives to align everyone"&lt;/em&gt; — elliptical language that, translated, means: you, the labs, need to start physically protecting your leadership and publicly repositioning your product. Neither is free. Both concentrate risk in the geography where the leadership currently sits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vector 2 — Energy fragility is now priced in
&lt;/h2&gt;

&lt;p&gt;The second vector is less obvious and arguably more structural. Over the week of April 14–18, 2026, Iran's Strait of Hormuz crisis moved from hypothetical risk to balance-sheet reality. The Economist's &lt;a href="https://www.youtube.com/watch?v=knLSWpToNv0" rel="noopener noreferrer"&gt;April 14 piece on Trump's Hormuz blockade&lt;/a&gt; documented the inflection — a US-initiated energy crisis that cost Europe roughly six weeks of jet-fuel reserves, produced an emergency Macron/Starmer/Meloni/Merz summit, and moved the &lt;a href="https://polymarket.com/event/us-recession-2026" rel="noopener noreferrer"&gt;Polymarket probability of a US recession by end-2026&lt;/a&gt; up four percentage points in twenty-four hours as the economic damage accumulated.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/knLSWpToNv0"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=dt4-TX-05lE" rel="noopener noreferrer"&gt;Al Jazeera's interview with the IEA chief&lt;/a&gt; confirmed the severity: Europe's jet-fuel situation was the fastest allied energy shock since the 1973 oil embargo. Iran &lt;a href="https://www.youtube.com/watch?v=xp64LmC6FfM" rel="noopener noreferrer"&gt;reopened the strait on April 17&lt;/a&gt;, and by every surface indicator, the crisis was over. That window lasted roughly 24 hours. &lt;strong&gt;On April 18 — the day we are publishing this piece — &lt;a href="https://www.aljazeera.com/news/2026/4/18/iran-closes-strait-of-hormuz-again-over-us-blockade-of-its-ports" rel="noopener noreferrer"&gt;Iran's Revolutionary Guard closed the Strait of Hormuz again&lt;/a&gt;, citing the US refusal to lift its naval blockade of Iranian ports. &lt;a href="https://www.washingtonpost.com/world/2026/04/18/iran-strait-hormuz-us-oil/" rel="noopener noreferrer"&gt;Revolutionary Guard gunboats opened fire on a tanker and an unknown projectile struck a container vessel&lt;/a&gt;, and Tehran issued a blanket warning that any commercial movement from anchorages in the Persian Gulf or the Sea of Oman would be "considered cooperation with the enemy" and targeted.&lt;/strong&gt; &lt;a href="https://www.npr.org/2026/04/18/nx-s1-5789780/iran-middle-east-updates" rel="noopener noreferrer"&gt;NPR&lt;/a&gt; confirmed the closure as the ceasefire deadline approached; &lt;a href="https://www.pbs.org/newshour/world/irans-military-closes-strait-of-hormuz-again-citing-u-s-blockade" rel="noopener noreferrer"&gt;PBS NewsHour&lt;/a&gt; and &lt;a href="https://www.cnn.com/2026/04/18/world/live-news/iran-war-trump-israel" rel="noopener noreferrer"&gt;CNN's live coverage&lt;/a&gt; both treated the closure as the definitive end of the week's diplomatic reopening, not a pause in it.&lt;/p&gt;

&lt;p&gt;This is the entire Vector 2 argument compressed into a 24-hour news cycle. The structural risk did not move between reopening and re-closure because the structural risk is &lt;em&gt;the capacity to close, not any particular instance of closing&lt;/em&gt;. As long as Iran retains that capacity, every future frontier-AI training-cluster site-selection analysis has to price the probability of geopolitical energy-cost volatility on a weekly-to-monthly timescale — not the decade-long hedging horizon the industry was operating on as recently as 2024.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/dt4-TX-05lE"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;For frontier AI, this is not backdrop. Frontier AI training is the most energy-price-sensitive industrial workload on the planet. A cluster's total cost of ownership is dominated by electricity — &lt;a href="https://www.texaspolicyresearch.com/texas-ai-data-centers-build-energy-not-barriers-part-two/" rel="noopener noreferrer"&gt;Texas's own industry data&lt;/a&gt; makes this point bluntly — and a 30% sustained electricity-cost spike makes a cluster's economics collapse. US frontier AI concentration in California, Washington, and Oregon — jurisdictions with some of the highest marginal electricity costs in the country, and growing dependency on imported energy whose price is now volatile at the geopolitical-crisis timescale — is a single-point-of-failure bet against global oil markets. For the first time in the modern history of the industry, &lt;em&gt;datacenter site selection is a geopolitical risk hedge&lt;/em&gt;, not just a land and power optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vector 3 — Permission to build has flipped
&lt;/h2&gt;

&lt;p&gt;The third vector, and the one that ties the other two together, is public and political permission to build AI at scale. Here the gap between the US and China is not narrowing. It is widening into a chasm.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hai.stanford.edu/ai-index/2026-ai-index-report/public-opinion" rel="noopener noreferrer"&gt;Stanford HAI's 2026 AI Index&lt;/a&gt; documents it cleanly: 83% of Chinese respondents say AI products offer more benefits than drawbacks. In the United States, that number is 39%. In Canada and the Netherlands, the numbers are worse. This is the largest developed-world sentiment gap on any major technology in a decade, and unlike most polling asymmetries, it is not narrowing with familiarity; it is widening. &lt;a href="https://fortune.com/2026/03/25/china-vs-us-ai-power-open-source-openclaw/" rel="noopener noreferrer"&gt;Fortune's &lt;em&gt;"China could be the 'big winner' in the AI race"&lt;/em&gt; analysis&lt;/a&gt; argues the asymmetry shows up in three compound advantages: permissive permitting, abundant state-aligned power generation, and an open-source culture that lets Chinese firms absorb global improvements without political friction.&lt;/p&gt;

&lt;p&gt;On March 25, 2026, &lt;a href="https://www.sanders.senate.gov/press-releases/news-sanders-ocasio-cortez-announce-ai-data-center-moratorium-act/" rel="noopener noreferrer"&gt;Senator Bernie Sanders and Representative Alexandria Ocasio-Cortez introduced the AI Data Center Moratorium Act&lt;/a&gt;, which would halt all new AI data-center construction in the United States until federal safeguards are in place. &lt;a href="https://www.axios.com/2026/03/25/sanders-aoc-data-center-moratorium-bill" rel="noopener noreferrer"&gt;Axios's coverage&lt;/a&gt; made clear the bill is unlikely to pass — but that is the wrong metric. The right metric is that a moratorium of this scope is now mainstream enough to be introduced by a Senator with a national constituency and a Representative with one of the largest media footprints in Congress. Rolling Stone's &lt;a href="https://www.rollingstone.com/politics/politics-news/bernie-sanders-aoc-bill-stop-ai-data-center-construction-1235536665/" rel="noopener noreferrer"&gt;coverage&lt;/a&gt; treated the bill as obvious-common-sense progressive policy, not fringe. &lt;a href="https://jacobin.com/2026/04/bernie-aoc-artificial-intelligence-regulation" rel="noopener noreferrer"&gt;Jacobin's framing&lt;/a&gt; made explicit the linkage to the violence: the moratorium is being positioned as the &lt;em&gt;political&lt;/em&gt; release valve for a public that is otherwise reaching for Molotovs.&lt;/p&gt;

&lt;p&gt;The All-In podcast episode in which Chamath, Sacks, Friedberg, and Jason Calacanis debated this — &lt;a href="https://www.youtube.com/watch?v=GIW1yU9zHW8" rel="noopener noreferrer"&gt;&lt;em&gt;Bernie Sanders: Stop All AI, China's EUV Breakthrough, Inflation Down, Golden Age in 2026?&lt;/em&gt;&lt;/a&gt; — captured the Silicon Valley investor class grappling, in real time, with the fact that the political ground under frontier AI has shifted. Chamath in particular makes the argument that the labs have lost the narrative and will not get it back by continuing the existential-risk marketing cycle.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/GIW1yU9zHW8"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;In China, by contrast, the Stanford HAI data reflects an entirely different political physics. The government is not a brake; it is an accelerator. Regional governments compete to host data centers. The &lt;a href="https://computeleap.com/blog/qwen3-35b-a3b-local-mac-setup-lm-studio-open-source" rel="noopener noreferrer"&gt;open-source model ecosystem&lt;/a&gt; — best exemplified by Alibaba's Qwen3 family, which is now reaching local-hardware parity with US frontier models on specific tasks — is building domestic technical independence without the political friction the US labs face. We wrote about this civilizational dynamic in our essay &lt;a href="https://computeleap.com/blog/ai-native-org-dorsey-vs-tang-dynasty" rel="noopener noreferrer"&gt;&lt;em&gt;AI-Native Org: Dorsey vs. Tang Dynasty&lt;/em&gt;&lt;/a&gt;: the Chinese AI ecosystem looks structurally like the institution-building of the Tang, while the US ecosystem increasingly looks like the late Industrial Revolution — productive, hugely wealth-generating, and producing its own political backlash.&lt;/p&gt;

&lt;h2&gt;
  
  
  The prediction: diffusion, not exodus
&lt;/h2&gt;

&lt;p&gt;Here is where we think the story is actually going. Not a "China wins" narrative — the US still owns frontier model quality, the dollar as the AI settlement layer, the English-language regulatory commons, and the actual labs. What changes is the &lt;em&gt;physical geography&lt;/em&gt; of where frontier AI gets built within the US, and increasingly outside it.&lt;/p&gt;

&lt;p&gt;Two US states have positioned themselves aggressively as the diffusion destinations. &lt;a href="https://planotexas.org/242/State-of-Texas-Data-Center-Incentives" rel="noopener noreferrer"&gt;Texas offers a 100% sales-tax exemption&lt;/a&gt; on computers, electrical equipment, cooling systems, and software for datacenters investing at least $200 million, plus local property-tax abatements of up to 10 years. &lt;a href="https://businessintexas.com/innovation-and-entrepreneurship/texas-is-positioned-to-lead-the-next-wave-of-ai-mega-investments/" rel="noopener noreferrer"&gt;OpenAI, Oracle, and partners have already committed to five additional Stargate datacenter sites&lt;/a&gt; beyond the initial Texas location. Tennessee offers &lt;a href="https://www.streamdatacenters.com/resource-library/glossary/tax-incentives-for-data-centers/" rel="noopener noreferrer"&gt;sales and use tax exemptions on datacenter equipment plus a reduced 1.5% tax rate on electricity&lt;/a&gt; for datacenters with $100M+ investment and 15+ full-time jobs paying 150% of the state average wage. Abu Dhabi has &lt;a href="https://www.cnn.com/2026/02/10/tech/china-us-ai-race-challenges-intl-hnk-dst" rel="noopener noreferrer"&gt;offered frontier labs energy and regulatory terms&lt;/a&gt; the US cannot match domestically.&lt;/p&gt;

&lt;p&gt;And then there is the option that solves &lt;em&gt;both&lt;/em&gt; the community-permission problem and the terrestrial energy-volatility problem at once: leaving the surface of the planet. On January 30, 2026, &lt;a href="https://www.fierce-network.com/cloud/space-data-centers-spacex-suncatcher-starcloud-explained" rel="noopener noreferrer"&gt;SpaceX filed an FCC application for up to one million orbital datacenter satellites&lt;/a&gt; at altitudes between 500 and 2,000 kilometers — a fleet projected to generate 100 gigawatts of AI compute capacity at the target launch cadence. &lt;a href="https://techcrunch.com/2026/03/30/starcloud-raises-170-million-series-ato-build-data-centers-in-space/" rel="noopener noreferrer"&gt;Starcloud&lt;/a&gt;, the Seattle-area orbital-compute startup, closed a $170 million Series A at a $1.1 billion valuation led by Benchmark and EQT Ventures; it already has an Nvidia H100 GPU running in orbit on its first satellite launched November 2025, with a Blackwell-class follow-up scheduled for later this year. Google &lt;a href="https://www.datacenterdynamics.com/en/news/project-suncatcher-google-to-launch-tpus-into-orbit-with-planet-labs-envisions-1km-arrays-of-81-satellite-compute-clusters/" rel="noopener noreferrer"&gt;announced Project Suncatcher&lt;/a&gt;, solar-powered orbital clusters of 81 TPU-equipped satellites arrayed across one-kilometer formations, with prototype launches in early 2027. &lt;a href="https://www.npr.org/2026/04/03/nx-s1-5718416/ai-data-centers-in-space-spacex-elon-musk" rel="noopener noreferrer"&gt;NPR's April 3 coverage&lt;/a&gt; correctly identifies the underlying economic logic: continuous solar irradiance, no water-cooling constraints, no zoning-board hearings, and — this is the crucial part for our argument — &lt;em&gt;no anti-AI protestors with a mailing address for the datacenter&lt;/em&gt;. The satellites cannot be the target of a Molotov cocktail.&lt;/p&gt;

&lt;p&gt;Orbital compute is not a 2026 replacement for terrestrial compute; the prototype cadence is 2027–2028, the gigawatt-scale capacity is 2029–2030, and the economics still depend on continued Starship launch-cost compression. But for the specific hedge this article is about — where to physically build the frontier-AI infrastructure of 2030 when the US is violent, the Strait of Hormuz is periodically closed, and China is politically adjacent but geopolitically foreclosed — &lt;em&gt;orbital datacenters are the diffusion vector where the US is unambiguously and structurally ahead&lt;/em&gt;. Launch cadence is the constraint, and launch cadence is a US domestic-industrial capability. The same SpaceX that filed the FCC application also operates the Falcon 9 + Starship launch manifests that no other country can match. The irony is sharp: the single area of datacenter site-selection where the US lead is &lt;em&gt;widening&lt;/em&gt; — not narrowing — is the one that requires leaving the jurisdiction that is rejecting AI on the ground.&lt;/p&gt;

&lt;p&gt;The labor-market implication is the piece most US coverage is missing. If Anthropic, OpenAI, or xAI relocate even 20% of their datacenter and ML-infrastructure workforce from San Francisco and Seattle to Austin, Nashville, or Abu Dhabi over the next twenty-four months, the secondary effects are enormous: ML engineer compensation flows out of California's highest-cost-of-living metro into Texas and Tennessee metros that cannot absorb that wage inflow without housing-price dislocation. The &lt;a href="https://www.texastribune.org/2026/04/08/texas-data-centers-sales-tax-break-billion-dollars/" rel="noopener noreferrer"&gt;Texas Tribune reported on April 8&lt;/a&gt; that the state is already losing more than a billion dollars a year on its datacenter tax break; that number was priced assuming current construction pace. Triple the pace and the tax-break-versus-services calculus shifts. The Bernie/AOC moratorium is a &lt;em&gt;federal&lt;/em&gt; response to state-by-state races to the bottom on datacenter incentives. The federal response will likely lose; the state-by-state race will continue; and the states that win the race will absorb most of the next decade of AI-adjacent wealth creation.&lt;/p&gt;

&lt;p&gt;This is the pattern we think is actually unfolding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capital&lt;/strong&gt;: concentrated in the same handful of frontier labs, same VCs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: still trained primarily on US-origin frontier labs' frameworks, though increasingly with Chinese open-source contributions at the mid-frontier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Physical infrastructure&lt;/strong&gt;: diffusing aggressively — away from California, toward Texas, Tennessee, Virginia, specific international jurisdictions (Abu Dhabi, Singapore), and — uniquely for the US — orbital compute (SpaceX, Starcloud, Google Suncatcher).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Executive presence&lt;/strong&gt;: the hardest to predict, because security cost and cultural gravity pull in opposite directions. We expect at least one major frontier lab to announce a second US headquarters (not a datacenter — a &lt;em&gt;headquarters&lt;/em&gt;) within twelve months.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Talent&lt;/strong&gt;: following physical infrastructure, with a 12-to-24-month lag. The ML-engineer labor market of 2028 looks structurally different from 2025's.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The historical parallel is not the Luddites
&lt;/h2&gt;

&lt;p&gt;Every analyst comparing this to the Luddites is reaching for the wrong period. The Luddites lost. The comparison that matters is &lt;strong&gt;Peterloo, 1819&lt;/strong&gt; — the political-violence inflection point when the British textile industry, reading the Manchester repression and the class-war implications correctly, began physically relocating capital and factories out of Manchester to the Midlands and the north. The industry did not die. It did not even slow. It &lt;em&gt;dispersed&lt;/em&gt; to jurisdictions where the political and physical cost of operating was lower. Manchester kept its name as the symbol of the textile revolution. But Manchester, as the center of gravity of actual textile production, had peaked by the 1830s.&lt;/p&gt;

&lt;p&gt;The 2020s Manchester is San Francisco. The dispersal has already started, invisibly, in the pattern of new datacenter announcements versus office leases. The political-violence inflection point is the Altman attack. The jurisdictional arbitrage is underway.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to watch over the next 30–60 days
&lt;/h2&gt;

&lt;p&gt;Three concrete signals will tell us whether this diffusion thesis is right or whether SF concentration absorbs the shock:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lab HQ and orbital-compute announcements.&lt;/strong&gt; Does any frontier lab (Anthropic, OpenAI, xAI, or an up-and-comer like Character or Reka) announce (a) a second corporate headquarters in Texas, Tennessee, or Abu Dhabi, or (b) a formal orbital-compute partnership with Starcloud, SpaceX, or Google Suncatcher, before June 18, 2026? A &lt;em&gt;research office&lt;/em&gt; does not count; we mean a &lt;em&gt;headquarters&lt;/em&gt; or &lt;em&gt;principal office&lt;/em&gt; or a production compute contract, not an R&amp;amp;D MOU. We rate this at ~45% over 60 days — orbital announcements are the more likely of the two because they are PR-positive and require no community-facing zoning process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;State-level datacenter policy.&lt;/strong&gt; Does any state legislature pass either (a) a permissive datacenter permitting reform or (b) a meaningful restriction/moratorium during the April–June 2026 window? Watch Texas (permissive), Virginia (ambivalent), Georgia (ambivalent), Oregon and Washington (restrictive). The first state to pass either direction becomes a template.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CEO security disclosures.&lt;/strong&gt; Watch the next 10-Q filings from OpenAI (when it IPOs), Anthropic's PBC-required disclosures, and Meta's proxy for 2026. Mark Zuckerberg's ~$27M 2024 security line is the pre-Altman baseline. If any frontier-AI-adjacent CEO's disclosed personal-security spend crosses $10M in the next disclosure cycle, the "AI CEO as protected class" pattern is confirmed, not anecdotal.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The uncomfortable conclusion
&lt;/h2&gt;

&lt;p&gt;Safety-first framing is what the US AI labs wanted the public conversation to be about. They got their wish — more than they intended. The Moreno-Gama manifesto cites their own rhetoric. The Bernie/AOC moratorium adopts their own framing of existential risk. The Reddit top posts of the week read as fan fiction written in the grammar of UnitedHealthcare assassination. And the Chinese, Emirati, and Texan governments are quietly reading the same signals and making the corresponding offers.&lt;/p&gt;

&lt;p&gt;This is not the end of US frontier AI. It is the end of US frontier AI's &lt;em&gt;concentration&lt;/em&gt;. Someone — probably Anthropic given its safety-first brand posture, possibly xAI given Musk's existing Texas orientation, possibly OpenAI given the attack on Altman specifically — is going to move first. The one that moves first sets the template. The ones that follow pay higher prices. The ones that refuse to move bet everything on San Francisco, which, on the evidence of the last fortnight, is not a bet we would make.&lt;/p&gt;

&lt;p&gt;Your move, Anthropic.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://computeleap.com/blog/ai-backlash-violence-china-shift-2026" rel="noopener noreferrer"&gt;ComputeLeap&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Claude Code Opus 4.7: 7 Secrets from Its Creator</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Sun, 19 Apr 2026 03:30:35 +0000</pubDate>
      <link>https://dev.to/max_quimby/claude-code-opus-47-7-secrets-from-its-creator-3jpa</link>
      <guid>https://dev.to/max_quimby/claude-code-opus-47-7-secrets-from-its-creator-3jpa</guid>
      <description>&lt;p&gt;Boris Cherny built Claude Code. Not the model — the tool. The CLI that's now running unsupervised on tens of thousands of developer machines, shipping PRs while its owners sleep. On April 16, 2026, the same day &lt;a href="https://www.anthropic.com/news/claude-opus-4-7" rel="noopener noreferrer"&gt;Anthropic launched Claude Opus 4.7&lt;/a&gt;, Boris posted a &lt;a href="https://www.threads.com/@boris_cherny/post/DUMZr4VElyb/" rel="noopener noreferrer"&gt;set of tips to Threads&lt;/a&gt; that read less like product marketing and more like a senior engineer briefing his team before a big sprint.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://computeleap.com/blog/claude-code-opus-47-creator-secrets-expert-tips" rel="noopener noreferrer"&gt;Read the full version with embedded sources on ComputeLeap →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The video that captured this — Alex Finn's &lt;a href="https://www.youtube.com/watch?v=8YhYtIF9PYI" rel="noopener noreferrer"&gt;"The creator of Claude Code just revealed 7 secrets to using Claude Code (Opus 4.7)"&lt;/a&gt; — was published today with strong early traction. But the substance predates the video. What Boris shared are the patterns his own team uses daily: the behavioral configurations they've wired into their workflows because they've learned, through thousands of hours of real usage, that these are the highest-leverage moves. This isn't a community tip list. It's creator-level intent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=8YhYtIF9PYI" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fle321ay65dcdoaq7mbba.jpg" alt="Watch: The creator of Claude Code just revealed 7 secrets to using Claude Code (Opus 4.7)" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://www.youtube.com/watch?v=8YhYtIF9PYI" rel="noopener noreferrer"&gt;Watch on YouTube →&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here's what Boris actually said — and what it means for your workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Opus 4.7 Changes the Calculus
&lt;/h2&gt;

&lt;p&gt;Before the secrets: why do these tips land differently on Opus 4.7 than on 4.6?&lt;/p&gt;

&lt;p&gt;Three things changed. First, Opus 4.7 runs &lt;strong&gt;adaptive thinking&lt;/strong&gt; instead of a fixed reasoning budget. It allocates thinking tokens based on actual task complexity — not a hard ceiling. Second, it's &lt;strong&gt;more literal&lt;/strong&gt;. Where 4.6 would fill in implicit context you forgot to specify, 4.7 executes exactly what you wrote. Third, it ships with a new &lt;code&gt;xhigh&lt;/code&gt; effort level as the default — deeper reasoning than &lt;code&gt;high&lt;/code&gt;, without the runaway token cost of &lt;code&gt;max&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The result is a model that rewards preparation over winging it. If you bring structure, it multiplies it. If you bring vague prompts, it returns vague work.&lt;/p&gt;

&lt;p&gt;Each of the 7 secrets is a preparation strategy. Together they form the mental model that separates the developers shipping 20 PRs a day from the ones who are still babysitting Claude through basic tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  Secret 1: Auto Mode — Stop Babysitting Every Command
&lt;/h2&gt;

&lt;p&gt;Boris's first tip is also the most immediately impactful: &lt;strong&gt;turn on Auto Mode&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Previously, Claude Code would pause every time it needed to run a command outside your explicit permissions list. Every &lt;code&gt;npm run build&lt;/code&gt;, every &lt;code&gt;git commit&lt;/code&gt;, every database query — an interruption. This was the right default for safety, but it made Claude a task partner you had to constantly supervise.&lt;/p&gt;

&lt;p&gt;Auto Mode changes this. Instead of pausing for every unfamiliar command, Claude uses model-based classification to assess whether each action is safe. Low-risk operations (reading files, running tests, checking git status) proceed automatically. Genuinely risky operations (deleting files, pushing to remote, modifying system config) still pause for approval. You get the safety guarantees where they matter, and zero friction where they don't.&lt;/p&gt;

&lt;p&gt;Enable it with &lt;strong&gt;Shift+Tab&lt;/strong&gt; in the CLI, or via the dropdown in Claude Desktop or VS Code.&lt;/p&gt;

&lt;p&gt;The productivity implication is significant. You can now delegate a complete feature implementation — "build the user auth flow including tests" — and come back when it's done. No babysitting. No queue of permission dialogs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Auto Mode (Shift+Tab) is the single highest-leverage change you can make to your Claude Code workflow today. It's the difference between supervising Claude and delegating to it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://x.com/ClaudeDevs/status/2045267790018543736" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8hvwyhcc0qsb1d9qck0.png" alt="@ClaudeDevs — Claude Code npm v2.1.113 ships native binary, no more Node.js dependency for startup, install time down ~70%" width="560" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://x.com/ClaudeDevs/status/2045267790018543736" rel="noopener noreferrer"&gt;View original post →&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Secret 2: The /fewer-permission-prompts Skill
&lt;/h2&gt;

&lt;p&gt;Even with Auto Mode, some workflows generate repetitive permission prompts for commands Claude has correctly flagged as potentially sensitive in your specific context. Boris's second tip addresses this with a purpose-built tool: the &lt;code&gt;/fewer-permission-prompts&lt;/code&gt; skill.&lt;/p&gt;

&lt;p&gt;What it does: analyzes your session history, identifies bash and MCP tool calls that have been repeatedly flagged but are consistently safe in your workflow, then recommends additions to your &lt;code&gt;.claude/settings.json&lt;/code&gt; permissions allowlist.&lt;/p&gt;
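
&lt;p&gt;An allowlist of that shape might look like the following sketch. The specific commands are illustrative — yours would be whatever the skill surfaces from your own session history:&lt;/p&gt;

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run build)",
      "Bash(npm run test:*)",
      "Bash(git diff:*)"
    ]
  }
}
```

&lt;p&gt;Anything not matched by an allowlist entry still goes through the normal permission flow.&lt;/p&gt;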

&lt;p&gt;The practical effect is a progressively quieter Claude Code session. First session, some prompts. After running &lt;code&gt;/fewer-permission-prompts&lt;/code&gt;, those specific safe-but-flagged commands get pre-authorized. By your third or fourth week of a project, Claude runs almost entirely in the background unless something genuinely needs your attention.&lt;/p&gt;

&lt;p&gt;This is different from blanket-skipping permission checks. The commands remain subject to review — you've just pre-approved the specific ones you've already validated as safe in your context. The guardrails stay. The friction goes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Secret 3: Recaps — Context Without the Catch-Up Tax
&lt;/h2&gt;

&lt;p&gt;Long Claude Code sessions have a hidden cost: returning to them. Start a session, hand off a complex task, get coffee, handle a meeting. Come back 90 minutes later. Where were we? What did Claude change? What's it about to do next?&lt;/p&gt;

&lt;p&gt;Before Opus 4.7, this required either keeping a detailed mental map or reading through all of Claude's output to reconstruct state. The new &lt;strong&gt;Recaps&lt;/strong&gt; feature eliminates this.&lt;/p&gt;

&lt;p&gt;At natural breakpoints — after long pauses, after completing a major subtask, after context grows large — Claude generates a short structured summary: what it did, what it changed, what it's planning next. These aren't verbose logs. They're concise handoff notes.&lt;/p&gt;

&lt;p&gt;Boris's use case: running multiple parallel Claude instances. Recaps let him switch between sessions without paying the context-reconstruction tax every time. Each window maintains its own running summary of state.&lt;/p&gt;

&lt;p&gt;You can disable Recaps in &lt;code&gt;/config&lt;/code&gt; if you find them noisy. Most users will want to leave them on.&lt;/p&gt;




&lt;h2&gt;
  
  
  Secret 4: Focus Mode — Trust the Work, Not the Process
&lt;/h2&gt;

&lt;p&gt;This tip is about psychology as much as workflow.&lt;/p&gt;

&lt;p&gt;Boris described a shift in how he uses Claude Code after months of daily use: "The model has reached a point where I generally trust it to run the right commands and make the right edits." For users at that trust level, watching every intermediate step is noise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Focus Mode&lt;/strong&gt; (toggle with &lt;code&gt;/focus&lt;/code&gt; in the CLI) hides intermediate work. You see the final result. The intermediate commands, file reads, tool calls — all hidden unless something fails.&lt;/p&gt;

&lt;p&gt;The effect is surprisingly meaningful for focus. Every visible intermediate step is an implicit invitation to micromanage. Focus Mode removes the invitation. You set the task, you review the outcome. The process is Claude's problem, not yours.&lt;/p&gt;

&lt;p&gt;This isn't for every situation. When Claude is working in an unfamiliar part of your codebase, or doing something genuinely risky, watching the steps is valuable. But for routine implementation work on well-understood systems, Focus Mode gets out of the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Secret 5: Effort Level Configuration — xhigh Is Your New Default
&lt;/h2&gt;

&lt;p&gt;Boris's fifth tip is about the new effort level system, and specifically about &lt;code&gt;xhigh&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Opus 4.7 ships with five effort levels:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;low&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Classification, extraction, summaries — cost-sensitive, latency-critical&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;medium&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Moderate reasoning, standard feature work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;high&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Complex features, multi-file changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;xhigh&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Default — agentic tasks, long-running work, ambiguous problems&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;max&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reserved for the genuinely hardest problems; use deliberately&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The default is &lt;code&gt;xhigh&lt;/code&gt;. This means every Claude Code session starts with a meaningful reasoning budget — deeper than the old default, carefully tuned to be less aggressive than &lt;code&gt;max&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The practical advice from Boris: &lt;strong&gt;leave xhigh as your default for coding work&lt;/strong&gt;. The reasoning depth pays off in fewer steering corrections, fewer misunderstandings, fewer "almost right but wrong" outputs that require follow-up turns. The token cost is higher than &lt;code&gt;high&lt;/code&gt;, but the reduced back-and-forth makes it faster in total wall-clock time.&lt;/p&gt;

&lt;p&gt;Where to drop the level: non-code tasks you've channeled through Claude Code. Formatting output, transforming data, generating summaries. These don't benefit from deep reasoning, and spending &lt;code&gt;xhigh&lt;/code&gt; effort on them is wasteful.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ xhigh effort level is now the default in Opus 4.7. It balances reasoning depth with latency, beating both high and max for most coding tasks. Only override it for low-complexity tasks where speed matters more than quality.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This connects directly to the &lt;a href="https://www.claudecodecamp.com/p/i-measured-claude-4-7-s-new-tokenizer-here-s-what-it-costs-you" rel="noopener noreferrer"&gt;tokenizer cost story&lt;/a&gt; that hit HN at 666 points: Opus 4.7's ~45% tokenizer inflation plus xhigh default means sessions cost meaningfully more than 4.6. The trade-off is that you need fewer of them. Do the math for your specific workflow before assuming this is a cost increase.&lt;/p&gt;
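
&lt;p&gt;"Do the math" is a one-liner per workflow. Here is a sketch under stated assumptions — every number below (turn counts, tokens per turn, per-token price) is an illustrative guess, not a measured rate or Anthropic pricing:&lt;/p&gt;

```python
# Back-of-envelope session-cost comparison under assumed numbers:
# ~45% more tokens per turn on 4.7, but deeper reasoning means
# fewer turns to finish the same task.

def session_cost(turns, tokens_per_turn, price_per_mtok):
    """Total dollar cost for a session of `turns` model turns."""
    return turns * tokens_per_turn * price_per_mtok / 1_000_000

# Hypothetical 4.6-style session: six turns of back-and-forth.
old = session_cost(turns=6, tokens_per_turn=20_000, price_per_mtok=75.0)

# Hypothetical 4.7-style session: 45% more tokens per turn, three turns.
new = session_cost(turns=3, tokens_per_turn=29_000, price_per_mtok=75.0)

print(f"4.6-style session: ${old:.2f}")
print(f"4.7-style session: ${new:.2f}")
```

&lt;p&gt;Under these particular assumptions the 4.7 session comes out cheaper; if your tasks already finished in one or two turns on 4.6, the inflation dominates and the math flips. Plug in your own numbers.&lt;/p&gt;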

&lt;p&gt;If you're hitting &lt;a href="https://computeleap.com/blog/claude-code-quota-limits-billing-changes-2026" rel="noopener noreferrer"&gt;Claude Code quota limits&lt;/a&gt;, using &lt;code&gt;/model opus-plan&lt;/code&gt; mode (Opus plans, Sonnet executes) is the cost-efficient path that preserves Opus-quality reasoning for architecture decisions while using Sonnet's lower cost for implementation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Secret 6: Give Claude a Way to Verify Its Own Work
&lt;/h2&gt;

&lt;p&gt;Boris's most underrated tip — and the one with the biggest impact on long-running agentic tasks.&lt;/p&gt;

&lt;p&gt;"Ensure Claude can validate its output through appropriate channels: bash testing for backend work, browser control via Chromium extension for frontend tasks, or Computer Use for desktop apps."&lt;/p&gt;

&lt;p&gt;The point isn't just "run tests." It's about wiring the verification loop into the task itself. If Claude can check whether it succeeded, it doesn't need you to check. It runs the test, sees the failure, fixes the code, runs the test again. The loop closes without human intervention.&lt;/p&gt;

&lt;p&gt;Boris's recommendation: make this explicit in your task specification. &lt;em&gt;"After implementing, run the full test suite. If any tests fail, fix them before stopping."&lt;/em&gt; Or for frontend work: &lt;em&gt;"After building the component, open it in the browser and verify it renders correctly at 1280px and 768px."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The verification method determines which tasks can safely run unattended and which can't. If you can't give Claude a way to check its own work, you're committed to reviewing every step. If you can, you're delegating, not supervising.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Opus 4.7 is more literal than 4.6. If your old prompts give worse results, it's because 4.7 no longer fills in implicit context. Add explicit success criteria and verification steps to every long-running task.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Secret 7: CLAUDE.md — The Compound Advantage
&lt;/h2&gt;

&lt;p&gt;This is Boris's oldest tip — first posted in an &lt;a href="https://news.ycombinator.com/item?id=46256606" rel="noopener noreferrer"&gt;HN thread&lt;/a&gt; where he wrote: "If there is anything Claude tends to repeatedly get wrong, not understand, or spend lots of tokens on, put it in your CLAUDE.md file, which Claude automatically reads and is a great way to avoid repeating yourself."&lt;/p&gt;

&lt;p&gt;In 2026, this pattern has compounded into a full organizational memory system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Team CLAUDE.md&lt;/strong&gt;: Committed to git. The whole team contributes. After Claude makes a mistake, someone adds the correction so it never happens again. Boris's team updates theirs multiple times a week.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supplementary notes directories&lt;/strong&gt;: Per-task markdown files in &lt;code&gt;.claude/notes/&lt;/code&gt;, referenced at session start for context-dense work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slash commands in &lt;code&gt;.claude/commands/&lt;/code&gt;&lt;/strong&gt;: Committed workflows like &lt;code&gt;/techdebt&lt;/code&gt; for removing duplication, or &lt;code&gt;/sync&lt;/code&gt; for pulling context from Slack and GitHub.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostToolUse hooks&lt;/strong&gt;: Automatic formatting after every file edit. No more CI failures from forgotten &lt;code&gt;prettier&lt;/code&gt; runs.&lt;/li&gt;
&lt;/ul&gt;
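&lt;p&gt;As a concrete example of that last item: Claude Code hooks live in &lt;code&gt;.claude/settings.json&lt;/code&gt;. This sketch assumes the hook schema as documented in early 2026; the command itself — extracting the edited file path from the hook's stdin JSON with &lt;code&gt;jq&lt;/code&gt; and running &lt;code&gt;prettier&lt;/code&gt; on it — is illustrative, so verify it against your own setup before committing it.&lt;/p&gt;

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | xargs npx prettier --write"
          }
        ]
      }
    ]
  }
}
```

&lt;p&gt;Because the hook fires after every matching tool call, formatting is never a separate step Claude (or you) can forget.&lt;/p&gt;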

&lt;p&gt;The compound effect is the story. Teams that have been doing this for 6 months have a CLAUDE.md encoding hundreds of learned rules. New team members (or new Claude sessions) instantly inherit months of institutional knowledge.&lt;/p&gt;

&lt;p&gt;With Opus 4.7's literal instruction-following, a well-maintained CLAUDE.md is more valuable than ever. In 4.6, Claude might infer what you meant. In 4.7, it executes exactly what you specified. The CLAUDE.md closes the gap.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/bcherny/status/2044839936235553167" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxn69kd08c9nvjxt8tzz1.png" alt="@bcherny — Opus 4.7 uses more thinking tokens, so we've increased rate limits for all subscribers to make up for it. Enjoy! (22.1K likes)" width="560" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://x.com/bcherny/status/2044839936235553167" rel="noopener noreferrer"&gt;View original post →&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Community Is Saying
&lt;/h2&gt;

&lt;p&gt;The HN thread on Claude Opus 4.7 hit &lt;a href="https://news.ycombinator.com/item?id=47793411" rel="noopener noreferrer"&gt;1,947 points with 1,439 comments&lt;/a&gt; — significant even by HN standards for a model launch. The discussion went immediately practical: developers testing adaptive thinking behavior, benchmarking the tokenizer inflation, and debating whether xhigh effort default justifies the cost increase.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=47793411" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo82f5ghd6vie5rv0tcq7.png" alt="Hacker News thread: Claude Opus 4.7 — 1947 points, 1439 comments. Top comment from simonw: 'I'm finding the adaptive thinking thing very confusing'" width="600" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://news.ycombinator.com/item?id=47793411" rel="noopener noreferrer"&gt;View original post →&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The honest community verdict: the 45% tokenizer cost increase is real, and developers doing simple code generation are noticing it. But developers running complex agentic workflows — the exact use case these 7 secrets are designed for — are reporting fewer round-trips and better output quality than 4.6 at the same task. The math only works if you use the model the way its creator designed it to be used.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/AlexFinn/status/2007585393584353688" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F626nnkzhtgmj1as0fayn.png" alt="@AlexFinn — Anthropic just released ALL the Claude Code secrets. I spent hours reading and testing all the tips. Here are the 10 that make Claude Code so much better." width="560" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://x.com/AlexFinn/status/2007585393584353688" rel="noopener noreferrer"&gt;View original post →&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The "Start Here" Checklist
&lt;/h2&gt;

&lt;p&gt;You don't need to implement all seven at once. Here's the order that gives the fastest return:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 1 (15 minutes):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Enable Auto Mode (Shift+Tab)&lt;/li&gt;
&lt;li&gt;[ ] Add explicit success criteria to your next task prompt&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Week 1 (1 hour total):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Create a &lt;code&gt;CLAUDE.md&lt;/code&gt; with your project's conventions, anti-patterns, and past mistakes&lt;/li&gt;
&lt;li&gt;[ ] Run &lt;code&gt;/fewer-permission-prompts&lt;/code&gt; after your first three sessions and apply recommendations&lt;/li&gt;
&lt;li&gt;[ ] Set up one slash command for your most-repeated workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Month 1:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Establish verification loops for all long-running tasks (test commands, browser checks)&lt;/li&gt;
&lt;li&gt;[ ] Enable Recaps and learn your context rhythm across parallel sessions&lt;/li&gt;
&lt;li&gt;[ ] Try Focus Mode for a week of routine implementation work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Boris's own framing: "There is no one right way to use Claude Code — everyone's setup is different. You should experiment to see what works for you." The tips are starting points, not mandates.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Context
&lt;/h2&gt;

&lt;p&gt;Boris's post landed on the same day Anthropic shipped Opus 4.7, native binary packages for Claude Code (no more Node.js startup dependency), raised rate limits for all subscribers, and fixed a long-context rate limit bug within hours of deployment. That's four coordinated launches in one day.&lt;/p&gt;

&lt;p&gt;If you want to see how this fits into the broader Claude Code trajectory — including the &lt;a href="https://computeleap.com/blog/claude-code-routines-scheduled-agents-no-local-machine" rel="noopener noreferrer"&gt;scheduled agents and routines&lt;/a&gt; work that makes these workflow tips even more powerful when combined — that's worth reading before you go build.&lt;/p&gt;

&lt;p&gt;And for the cost context: if you're comparing Claude Code to alternatives, the &lt;a href="https://computeleap.com/blog/anthropic-vs-openai-api-developer-platform-2026" rel="noopener noreferrer"&gt;Anthropic vs OpenAI developer platform comparison&lt;/a&gt; has the current pricing breakdown including the Opus 4.7 tokenizer changes.&lt;/p&gt;

&lt;p&gt;The creator built the tool with these patterns in mind. Now you have the map.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://computeleap.com/blog/claude-code-opus-47-creator-secrets-expert-tips" rel="noopener noreferrer"&gt;ComputeLeap&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>productivity</category>
      <category>anthropic</category>
    </item>
    <item>
      <title>GenericAgent and EvoMap: How AI Grows Its Own Skill Trees</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Sat, 18 Apr 2026 04:23:12 +0000</pubDate>
      <link>https://dev.to/max_quimby/genericagent-and-evomap-how-ai-grows-its-own-skill-trees-13h6</link>
      <guid>https://dev.to/max_quimby/genericagent-and-evomap-how-ai-grows-its-own-skill-trees-13h6</guid>
      <description>&lt;p&gt;GitHub trending today: two repos at 800+ stars per day, both built around the same idea. &lt;a href="https://github.com/lsdefine/GenericAgent" rel="noopener noreferrer"&gt;GenericAgent&lt;/a&gt; hit 848 stars Thursday. &lt;a href="https://github.com/EvoMap/evolver" rel="noopener noreferrer"&gt;EvoMap/evolver&lt;/a&gt; hit 750. Both describe agents that accumulate capabilities from their own execution history. Neither one retrains a model. Both are working in production deployments.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://agentconn.com/blog/self-evolving-ai-agents-genericagent-evomap-skill-trees-guide/" rel="noopener noreferrer"&gt;Read the full version with screenshots and embedded sources on AgentConn →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The category is called self-evolving agents. It's been an academic concept since 2023. What's different in April 2026 is that the implementations are arriving, they're open-source, and the organic velocity suggests the developer community is treating them seriously — not as research curiosities, but as infrastructure.&lt;/p&gt;

&lt;p&gt;This article covers what GenericAgent and EvoMap actually do technically, how they differ from each other and from older approaches, &lt;a href="https://agentconn.com/blog/ai-agent-security-risks/" rel="noopener noreferrer"&gt;the security exposure they create&lt;/a&gt;, and what the current evidence says about whether the "self-evolving" claim holds up.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Self-Evolving Agents Actually Are (and Aren't)
&lt;/h2&gt;

&lt;p&gt;The phrase "self-evolving" is doing a lot of work. Before going further, the distinction that matters most:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;None of these systems modify model weights.&lt;/strong&gt; GenericAgent, EvoMap, OpenSpace, and the other repos trending this week are not fine-tuning their underlying LLMs. They are not doing RL on the fly. They are accumulating structured artifacts — skills, gene fragments, capsules, playbooks — as external memory that grows over time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2507.21046" rel="noopener noreferrer"&gt;Two recent academic surveys&lt;/a&gt; have converged on the same framework for understanding this. The feedback loop is: agent executes a task → environment responds → optimizer extracts patterns → skill store is updated → next execution draws on those patterns. The agent gets more capable with each cycle not because the model improves, but because the &lt;em&gt;tools available to the model&lt;/em&gt; improve.&lt;/p&gt;

&lt;p&gt;This distinction matters for managing expectations. What these systems do extremely well: compress common task patterns into reusable primitives, avoid re-solving problems already solved, and reduce token consumption dramatically. What they don't do: generalize to genuinely novel domains the underlying model couldn't handle, or improve their reasoning.&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://news.ycombinator.com/item?id=44884091" rel="noopener noreferrer"&gt;Hacker News thread on the comprehensive survey paper&lt;/a&gt; (94 points, 29 comments) laid this out plainly: the skeptic position is that LLMs can't truly learn without fine-tuning or RL — "self-improvement is really prompt/tool optimization, not weight updates." The practitioner position, from people who've actually built these systems, is that process recursion (accumulating skills and tools) is genuinely valuable even if it isn't weight modification.&lt;/p&gt;

&lt;p&gt;Both are correct. The skill-tree approach is a real capability improvement. Just not the one the marketing usually implies.&lt;/p&gt;

&lt;h2&gt;
  
  
  GenericAgent: The Skill Tree Approach
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/lsdefine/GenericAgent" rel="noopener noreferrer"&gt;GenericAgent&lt;/a&gt; makes its design philosophy explicit in the README: "grows a skill tree from a 3,300-line seed, achieving full system control with 6x less token consumption."&lt;/p&gt;

&lt;p&gt;The architecture is minimal by design. Nine atomic tools (including browser, terminal, filesystem, keyboard/mouse, screen vision, and mobile ADB) sit beneath a ~100-line agent loop. No preloaded skill library. No custom tooling scaffolding. The skills emerge from execution.&lt;/p&gt;

&lt;p&gt;Every time the agent successfully completes a task, it crystallizes the approach into a reusable skill. The next time a similar task appears, the agent retrieves the relevant skill from its tree rather than reasoning from scratch. Over time the tree grows — from that 3,300-line seed to a comprehensive library of the agent's accumulated operational knowledge.&lt;/p&gt;

&lt;p&gt;The "6x less token consumption" claim is the most concrete assertion in the README. It's an architectural consequence: GenericAgent operates in under 30K tokens by design, versus 200K-1M for most agentic frameworks. Each skill retrieval replaces what would otherwise be a multi-turn planning session burning context.&lt;/p&gt;

&lt;p&gt;This is the real competitive claim: skill accumulation as a &lt;strong&gt;context compression strategy&lt;/strong&gt;. It trades generalizability for efficiency — the model doesn't burn tokens re-solving problems it's seen before.&lt;/p&gt;

&lt;p&gt;The April 2026 update added L4 session archive memory and scheduler/cron integration — meaning skills can now persist across sessions and agents can execute scheduled tasks autonomously.&lt;/p&gt;

&lt;p&gt;Supported backends: Claude, Gemini, Kimi, MiniMax. Frontend integrations: WeChat, Telegram, Feishu, DingTalk, QQ.&lt;/p&gt;

&lt;p&gt;For context: &lt;a href="https://agentconn.com/blog/archon-open-source-harness-builder-ai-coding-deterministic-review/" rel="noopener noreferrer"&gt;Archon's harness approach&lt;/a&gt; solves determinism through structured YAML workflows. GenericAgent solves efficiency through accumulated skills. Different problems, complementary solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  EvoMap/evolver: The Genome Evolution Protocol
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/EvoMap/evolver" rel="noopener noreferrer"&gt;EvoMap/evolver&lt;/a&gt; takes the same core concept — skills accumulating from execution — and frames it in biological terms. The README: &lt;em&gt;"Evolution is not optional. Adapt or die."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The GEP (Genome Evolution Protocol) treats agent behaviors as genes. Successful approaches are encoded as "gene fragments" stored in &lt;code&gt;genes.json&lt;/code&gt; and &lt;code&gt;capsules.json&lt;/code&gt;. The evolution pipeline scans execution logs, selects assets worth preserving, generates updated GEP prompts, and writes an audit trail to &lt;code&gt;events.jsonl&lt;/code&gt;.&lt;/p&gt;
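&lt;p&gt;The audit trail is the piece worth imitating even if you never run EvoMap. An append-only &lt;code&gt;events.jsonl&lt;/code&gt; file is cheap to write and trivial to replay; the record fields below are illustrative, not EvoMap's actual schema.&lt;/p&gt;

```python
import json
import time


def log_evolution_event(path, kind, detail):
    """Append one audit record to an events.jsonl-style trail.

    Each line is a self-contained JSON object, so the file can be
    tailed, grepped, or replayed without parsing the whole thing.
    """
    event = {"ts": time.time(), "kind": kind, "detail": detail}
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")


def read_trail(path):
    """Replay the full trail in order, one record per line."""
    with open(path) as f:
        return [json.loads(line) for line in f]
```

&lt;p&gt;Pairing a trail like this with Git commits is what makes rollback and blast-radius analysis possible after a bad evolution step.&lt;/p&gt;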

&lt;p&gt;What's different from GenericAgent's approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Explicit audit trail.&lt;/strong&gt; Every evolution event is logged. You can inspect what changed, when, and why.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rollback via Git.&lt;/strong&gt; The system requires Git and uses it for rollback + blast-radius calculation when an evolution step goes wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline by default.&lt;/strong&gt; No external API dependency for the core evolution loop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety gate.&lt;/strong&gt; Commands must use node/npm/npx prefix; 180-second timeout enforced.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The four evolution strategies — balanced, innovate, harden, repair-only — let operators tune the system's risk appetite. A production deployment running critical workflows would use &lt;code&gt;repair-only&lt;/code&gt;. An exploration environment would use &lt;code&gt;innovate&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;EvoMap/evolver is more conservative than GenericAgent by default — it assumes the evolved gene pool needs governance, not just accumulation.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenSpace and the Benchmarks That Matter
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/HKUDS/OpenSpace" rel="noopener noreferrer"&gt;OpenSpace&lt;/a&gt; (5,400 stars, 656 forks) provides the most concrete benchmark data in the self-evolving agent space. The GDPVal benchmark — 50 professional tasks across compliance, engineering, and document generation — is the most controlled evaluation available.&lt;/p&gt;

&lt;p&gt;Results: 4.2x higher income versus baseline agents using the same LLM backbone. 46% fewer tokens on real-world professional tasks. $11,484 of $15,764 possible earned in 6 hours.&lt;/p&gt;

&lt;p&gt;The 46% token reduction is the more reliable signal — it's reproducible across runs and doesn't depend on benchmark weighting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/NousResearch/hermes-agent-self-evolution" rel="noopener noreferrer"&gt;Nous Research's Hermes/GEPA approach&lt;/a&gt; is the most technically sophisticated variant. GEPA (Genetic-Pareto Prompt Evolution) analyzes failure &lt;em&gt;causes&lt;/em&gt; rather than just detecting failures, then proposes targeted text mutations. No GPU required. $2-10 per optimization run.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Security Surface
&lt;/h2&gt;

&lt;p&gt;Two repos gaining 800 stars a day for agents that can write and execute their own skills are, read another way, two repos for agents that can write and execute arbitrary code with system-level access.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2602.12430" rel="noopener noreferrer"&gt;An arXiv paper (2602.12430)&lt;/a&gt; is the clearest statement of the exposure: &lt;strong&gt;26.1% of community-contributed skills contain vulnerabilities.&lt;/strong&gt; Community skill registries are already being used for credential exfiltration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.isaca.org/resources/news-and-trends/isaca-now-blog/2026/agentic-ai-evolution-and-the-security-claw" rel="noopener noreferrer"&gt;ISACA's April 7 analysis&lt;/a&gt; catalogs four specific risk categories: visibility gaps, prompt-layer compromise, supply chain attacks, and direct vulnerabilities.&lt;/p&gt;

&lt;p&gt;The stat that should be in every write-up: &lt;strong&gt;1 in 8 companies reported AI breaches linked to agentic systems&lt;/strong&gt; in ISACA's 2026 threat report.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apiiro.com/blog/code-execution-risks-agentic-ai/" rel="noopener noreferrer"&gt;Apiiro's research&lt;/a&gt; adds the blast-radius calculation: a single compromised agent poisoned 87% of downstream decision-making within 4 hours.&lt;/p&gt;

&lt;p&gt;This is the &lt;a href="https://agentconn.com/blog/ai-agent-supply-chain-attacks-litellm-breach-security-2026/" rel="noopener noreferrer"&gt;supply chain attack surface for AI agents&lt;/a&gt; in its most direct form. When the agent can write its own tools, the skill store is the attack surface.&lt;/p&gt;

&lt;p&gt;The question to ask before deploying either: &lt;em&gt;What's the worst my agent can do, and is that OK?&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Fits: Practical Assessment
&lt;/h2&gt;

&lt;p&gt;The "RIP Pull Requests" framing is directionally correct but premature. Stripe is producing 1,300 agent-written PRs per week — with human review still in the loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use cases where skill accumulation pays now:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Autonomous debugging loops where the agent encounters the same class of failure repeatedly&lt;/li&gt;
&lt;li&gt;Code generation pipelines for standardized patterns&lt;/li&gt;
&lt;li&gt;Research scaffolding for structured tasks with repetitive retrieval patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use cases where you need more than current implementations offer:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Novel domain problems the model hasn't seen&lt;/li&gt;
&lt;li&gt;High-security environments where unvetted skill execution is unacceptable&lt;/li&gt;
&lt;li&gt;Tasks requiring genuine reasoning improvement (skill trees don't improve reasoning)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most useful framing: practitioners distinguish "process recursion" (skill accumulation, which works) from "weight modification" (genuine learning, which doesn't work without training). GenericAgent and EvoMap do the former.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://agentconn.com/blog/self-evolving-ai-agents-minimax-m27-darwin-godel-2026/" rel="noopener noreferrer"&gt;prior generation of self-evolving work&lt;/a&gt; — MiniMax M2.7 and Darwin-Gödel — focused on weight-level adaptation. These new repos have moved the practical frontier to skill-level accumulation. Less dramatic, more deployable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Watch
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The security tooling gap.&lt;/strong&gt; A skill registry vetting process analogous to npm audit doesn't exist yet. When it does, it will reshape how teams deploy these systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The context compression thesis.&lt;/strong&gt; GenericAgent's core claim (6x token reduction via skill trees) is the most testable assertion in the category. Independent benchmarks at production scale will validate or falsify it in the next quarter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The convergence signal.&lt;/strong&gt; When GitHub trending, HN, X/Twitter discourse, and the Latent.Space newsletter all point the same direction on the same day, the signal is "agents are already the infrastructure."&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://agentconn.com/blog/self-evolving-ai-agents-genericagent-evomap-skill-trees-guide/" rel="noopener noreferrer"&gt;AgentConn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>selfevolving</category>
      <category>aiagents</category>
      <category>security</category>
      <category>agentdev</category>
    </item>
    <item>
      <title>Claude Design: Anthropic's AI Design Tool, Explained</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Sat, 18 Apr 2026 03:31:33 +0000</pubDate>
      <link>https://dev.to/max_quimby/claude-design-anthropics-ai-design-tool-explained-4b5l</link>
      <guid>https://dev.to/max_quimby/claude-design-anthropics-ai-design-tool-explained-4b5l</guid>
      <description>&lt;p&gt;Anthropic shipped Claude Design today — a research-preview product that turns text prompts into prototypes, slide decks, one-pagers, and marketing assets using Claude Opus 4.7. It's the kind of launch that makes &lt;a href="https://sherwood.news/tech/anthropic-launches-claude-design-sending-shares-of-figma-down/" rel="noopener noreferrer"&gt;Figma stock drop 6.8% before lunch&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://computeleap.com/blog/claude-design-anthropic-ai-design-tool-handoff-claude-code" rel="noopener noreferrer"&gt;Read the full version with charts and embedded sources on ComputeLeap →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But the headline number misses the more interesting story. Claude Design isn't competing with Figma the way a fancier Canva would. What makes it structurally different is one button: &lt;strong&gt;Handoff to Claude Code&lt;/strong&gt;. When your prototype is ready, you pass it directly to Claude Code, which implements it as production code. Design-to-deployed, inside a single conversation.&lt;/p&gt;

&lt;p&gt;That's the story. Let's break it down.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/claudeai/status/2045156267690213649" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5psko6veq0u6nl4j55s.png" alt="@claudeai announcing Claude Design: make prototypes, slides, and one-pagers by talking to Claude — 31.4M views" width="548" height="647"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Claude Design?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.anthropic.com/news/claude-design-anthropic-labs" rel="noopener noreferrer"&gt;Claude Design&lt;/a&gt; is a visual design tool built directly into Claude. You describe what you want — "a landing page for a B2B SaaS product, dark mode, emphasize the ROI calculator" — and Claude generates it. You refine it through conversation, inline comments, direct text edits, or custom sliders that adjust spacing, color, and layout in real-time.&lt;/p&gt;

&lt;p&gt;Anthropic positions it for "founders and product managers without a design background" who need to go "from an idea to something visual quickly." That's accurate as far as it goes, but it undersells what the tool actually does for designers too — specifically, the speed at which you can explore design directions before committing to Figma.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it can produce:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interactive prototypes for user testing&lt;/li&gt;
&lt;li&gt;Product wireframes and mockups&lt;/li&gt;
&lt;li&gt;Pitch decks and presentations&lt;/li&gt;
&lt;li&gt;Marketing one-pagers and landing pages&lt;/li&gt;
&lt;li&gt;Social media assets&lt;/li&gt;
&lt;li&gt;What Anthropic calls "frontier design": code-powered prototypes with voice, video, 3D elements, and special effects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What powers it:&lt;/strong&gt; Claude Opus 4.7, Anthropic's latest and most capable model, released earlier this week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who can use it:&lt;/strong&gt; Available in research preview for Claude Pro, Max, Team, and Enterprise subscribers at no extra cost. Rolling out gradually throughout today.&lt;/p&gt;

&lt;p&gt;The brand integration during onboarding is genuinely useful: Claude reads your codebase and design files to automatically apply your colors, typography, and component patterns to every project it creates. Datadog noted that prototyping that used to require "a week of back-and-forth" now happens "in a single conversation." That's not marketing copy — that's the design system import working as advertised.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Claude Code Handoff: One Button to Production
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ The Claude Code handoff is the feature that no other AI design tool has. It's not a nice-to-have — it's the reason Claude Design belongs in a developer's workflow rather than just a designer's.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here's the workflow that didn't exist six months ago:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open Claude Design and describe your UI&lt;/li&gt;
&lt;li&gt;Refine through conversation until it looks right&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Handoff to Claude Code&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Claude packages the design into a handoff bundle&lt;/li&gt;
&lt;li&gt;Claude Code receives the bundle and implements it as production code&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The whole thing happens inside a single conversation. No Figma export. No copy-pasting design specs. No design-to-dev handoff meeting.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.banani.co/blog/claude-design-review" rel="noopener noreferrer"&gt;Banani first impressions review&lt;/a&gt; captures why this matters: &lt;em&gt;"The Claude Code handoff is a genuinely different workflow unavailable a year ago."&lt;/em&gt; Every other AI design tool — Lovable, v0, Figma Make — is exploring the design-to-code space from the design direction. Claude Design is doing it from the AI-native direction, and the Claude Code integration is the proof.&lt;/p&gt;

&lt;p&gt;For context on why this is significant: Claude Code already has &lt;a href="https://computeleap.com/blog/claude-code-quota-limits-billing-changes-2026" rel="noopener noreferrer"&gt;quota-aware billing&lt;/a&gt;, background task execution via &lt;a href="https://computeleap.com/blog/claude-code-routines-scheduled-agents-no-local-machine" rel="noopener noreferrer"&gt;Claude Code Routines&lt;/a&gt;, and a mature CLI. When Claude Design feeds work into that infrastructure, you're not just getting generated code — you're getting code that can be immediately deployed, tested, and maintained within the same agent system.&lt;/p&gt;

&lt;p&gt;The handoff bundle includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generated HTML/CSS/component structure&lt;/li&gt;
&lt;li&gt;Design tokens (colors, spacing, typography) mapped to your existing design system&lt;/li&gt;
&lt;li&gt;Responsive breakpoints set from Claude's interpretation of your design intent&lt;/li&gt;
&lt;li&gt;Inline comments documenting structural decisions&lt;/li&gt;
&lt;/ul&gt;
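&lt;p&gt;Anthropic hasn't published the bundle format, so treat the following as a purely hypothetical illustration of what a design-token payload mapped onto an existing system could look like — every field name here is invented for the example:&lt;/p&gt;

```json
{
  "tokens": {
    "color.primary": "#1f6feb",
    "space.md": "16px",
    "font.body": "Inter, sans-serif"
  },
  "breakpoints": {
    "desktop": 1280,
    "tablet": 768
  }
}
```

&lt;p&gt;The point of a payload like this is that Claude Code can consume it mechanically, rather than re-deriving spacing and color decisions from a screenshot.&lt;/p&gt;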

&lt;p&gt;It exports as standalone HTML files or routes directly to Claude Code. Notably, it does &lt;em&gt;not&lt;/em&gt; export to Figma — which is either an oversight or a strategic choice, given that &lt;a href="https://thenewstack.io/anthropic-claude-design-launch/" rel="noopener noreferrer"&gt;The New Stack reports&lt;/a&gt; Anthropic's chief product officer resigned from Figma's board the same week.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Actually Works: Inputs, Workflow, and Outputs
&lt;/h2&gt;

&lt;p&gt;The interaction model is richer than most AI generation tools. You're not just typing prompts into a chatbox and getting static images back.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input methods:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text prompts&lt;/strong&gt; — describe the design from scratch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document upload&lt;/strong&gt; — provide a DOCX, PPTX, or XLSX as source material&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codebase reference&lt;/strong&gt; — Claude reads your existing code to match your stack&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web element capture&lt;/strong&gt; — grab visual elements from existing pages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Refinement tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chat requests&lt;/strong&gt; — "make the hero section larger, add a gradient"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inline comments&lt;/strong&gt; — click any element to leave a revision note&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct text editing&lt;/strong&gt; — click to edit copy without disrupting the layout&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom sliders&lt;/strong&gt; — Claude generates adjustment controls specific to your design (spacing multiplier, color temperature, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Export options:&lt;/strong&gt; PDF, PPTX, HTML, Canva. The Canva integration is notable — Canva highlighted turning "ideas and drafts from Claude Design" into "fully editable and collaborative designs" in Canva, positioning the two products as complementary rather than competitive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Collaboration:&lt;/strong&gt; Designs support private, link-shared, and organization-scoped multiplayer editing with view and edit permissions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing:&lt;/strong&gt; Included within existing subscription limits on Pro, Max, Team, and Enterprise plans. No separate SKU. After weekly limits are hit, usage goes to pay-as-you-go.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;a href="https://venturebeat.com/technology/anthropic-just-launched-claude-design-an-ai-tool-that-turns-prompts-into-prototypes-and-challenges-figma" rel="noopener noreferrer"&gt;VentureBeat&lt;/a&gt; found that complex pages requiring 20+ prompts on competing tools now complete in just 2 prompts with Claude Design's design system context.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://techcrunch.com/2026/04/17/anthropic-launches-claude-design-a-new-product-for-creating-quick-visuals/" rel="noopener noreferrer"&gt;TechCrunch&lt;/a&gt; described Claude Design as Anthropic's push to help users "move from an idea to something visual quickly" outside traditional design platforms.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.reddit.com/r/artificial/comments/1so44z2/claude_design_a_new_anthropic_labs_product_lets/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3tavrthqonc38303a951.png" alt="Reddit r/artificial: Claude Design post — Top 1% Poster, community discussion" width="800" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Design vs. Figma, Lovable, and v0
&lt;/h2&gt;

&lt;p&gt;The competitive frame matters here. Claude Design is not trying to be Figma. Let's be clear about what each tool is actually for:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Production-ready?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Design&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Rapid exploration, non-designer prototyping, Claude Code handoff&lt;/td&gt;
&lt;td&gt;Via handoff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Figma&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Professional UI/UX design, component systems, design-dev collaboration&lt;/td&gt;
&lt;td&gt;Via dev mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lovable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full-stack app generation from brief to deployment&lt;/td&gt;
&lt;td&gt;Yes (React + Supabase)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;v0 (Vercel)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;React component generation, shadcn/ui-based UI scaffolding&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Figma Make&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI generation within Figma's professional environment&lt;/td&gt;
&lt;td&gt;Via Figma dev mode&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;a href="https://muz.li/blog/vibe-design-in-2026-what-ai-generated-ui-means-for-your-work/" rel="noopener noreferrer"&gt;Muzli analysis of vibe design in 2026&lt;/a&gt; describes how practitioners have settled into a three-layer workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exploration layer:&lt;/strong&gt; Claude Design (or Claude Artifacts) for rapid concept generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build layer:&lt;/strong&gt; Lovable or v0 when you need a full-stack app with real data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precision layer:&lt;/strong&gt; Claude Code with Figma MCP when you need production-quality implementation against your actual design system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude Design owns the exploration layer and has a unique bridge to the precision layer via the Claude Code handoff. Lovable and v0 own the build layer for full apps. Figma remains the professional standard for component-level precision.&lt;/p&gt;

&lt;p&gt;What's new as of today: the bridge between exploration and precision is one button instead of a multi-day handoff process.&lt;/p&gt;

&lt;p&gt;Figma's stock reaction (−6.8%) reflects genuine concern about the exploration layer, where Figma's own "First Draft" feature has struggled to dominate. Figma's moat is in the precision layer — component systems, dev mode, annotation workflows, Jira integrations. Claude Design doesn't threaten that moat directly. But it potentially removes the entry point that brought users to Figma in the first place.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=47806725" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53ar63pvl0uhlzb5gdty.png" alt="Hacker News: Claude Design thread with 423 points and 259 comments" width="800" height="900"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Homogeneity Problem: Why This Matters (and What's Its Ceiling)
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://news.ycombinator.com/item?id=47806725" rel="noopener noreferrer"&gt;HN thread on Claude Design&lt;/a&gt; hit 423 points and 259 comments, and the most-upvoted discussion wasn't about the tool — it was about what it reveals about modern web design.&lt;/p&gt;

&lt;p&gt;Top comment from user &lt;strong&gt;ljm&lt;/strong&gt; (877 upvotes): &lt;em&gt;"The internet has become too uniform since Web 2.0 and Bootstrap. While AI-generated UIs will be 'competent,' they'll lack uniqueness or innovation."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is the honest ceiling. Claude Design works &lt;em&gt;because&lt;/em&gt; the web has converged to a narrow visual language. Sans-serif fonts. Card grids. Sidebar navigation. Blue accent colors. SaaS templates built on the same handful of patterns. Claude Opus 4.7 can generate "competent" designs because competent has been defined by everyone copying everyone else for fifteen years.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://muz.li/blog/vibe-design-in-2026-what-ai-generated-ui-means-for-your-work/" rel="noopener noreferrer"&gt;Muzli team identified&lt;/a&gt; the signature of AI-generated UI without intentional direction: &lt;em&gt;"a blue accent color, an Inter-like font at default weight, a sidebar with icons and labels, a card grid, a data table."&lt;/em&gt; Every output reads like a variation of the same three SaaS templates.&lt;/p&gt;

&lt;p&gt;But the HN counterarguments are worth hearing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ &lt;strong&gt;mbesto:&lt;/strong&gt; "Homogenous design serves practical purposes — internal tools benefit from familiarity and predictability over aesthetic distinctiveness." This is true. Most software is internal tooling.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;simplyluke&lt;/strong&gt; made the sharpest counter: "Lower barriers to acceptable design will paradoxically increase the value of truly exceptional products." When everyone's landing page is competent, brand differentiation through exceptional design becomes a stronger signal, not weaker.&lt;/p&gt;

&lt;p&gt;All three perspectives are correct in their domains. For internal dashboards, admin panels, and investor decks, "competent but interchangeable" is exactly what you need. The homogeneity ceiling matters for consumer apps and brand differentiation. It doesn't matter for the 80% of business software that needs to be functional and consistent, not beautiful and unique.&lt;/p&gt;

&lt;p&gt;Claude Design's honest use case is that 80%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Claude Design Fits in a Real Design-to-Deploy Stack
&lt;/h2&gt;

&lt;p&gt;For teams already using Claude Code, the practical workflow looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Idea → Claude Design (exploration, rapid prototyping)
     → Claude Code Handoff (implementation bundle generated)
     → Claude Code (production code, deployed)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the closed loop Anthropic is building. Claude Design is the on-ramp; Claude Code is the engine room. For teams already invested in the Claude ecosystem — &lt;a href="https://computeleap.com/blog/claude-cowork-complete-guide-2026" rel="noopener noreferrer"&gt;Claude Cowork&lt;/a&gt; for collaboration, &lt;a href="https://computeleap.com/blog/claude-code-routines-scheduled-agents-no-local-machine" rel="noopener noreferrer"&gt;Claude Code Routines&lt;/a&gt; for background tasks — Design slots in at the front of the pipeline.&lt;/p&gt;

&lt;p&gt;The comparison point from Anthropic's &lt;a href="https://computeleap.com/blog/anthropic-vs-openai-api-developer-platform-2026" rel="noopener noreferrer"&gt;platform strategy&lt;/a&gt; is relevant here: Anthropic is building a vertically integrated product surface where Claude is the intelligence layer at every step. Design generates intent; Code implements it; Cowork coordinates the team. OpenAI is playing horizontal platform — Anthropic is going vertical on the developer workflow.&lt;/p&gt;

&lt;p&gt;The risk of this strategy is lock-in. If you're using Claude Design for exploration and Claude Code for implementation, you're optimizing for the Anthropic ecosystem. That trade-off is worth being clear-eyed about, even if Claude is currently the best model for this workflow.&lt;/p&gt;

&lt;p&gt;For teams not already in the Claude ecosystem: Claude Design works as a standalone tool. The handoff to Claude Code is optional. You can use it purely for exploration and export HTML or PPTX to your existing workflow. The ecosystem lock-in only applies if you choose it.&lt;/p&gt;
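&lt;p&gt;To make that standalone path concrete: an exported HTML prototype is plain markup and CSS, so ordinary scripting can mine it for design tokens before anything lands in your codebase. A minimal sketch in Python — the stylesheet snippet, token names, and &lt;code&gt;extract_colors&lt;/code&gt; helper below are illustrative assumptions, not a documented Claude Design export format:&lt;/p&gt;

```python
import re

# Hypothetical fragment of an exported prototype's stylesheet. The token
# names and hex values are invented for illustration; a real export would
# be whatever HTML/CSS the tool emits.
exported_css = """
:root { --accent: #3b82f6; --surface: #f8fafc; }
h1 { font-family: Inter, sans-serif; color: #0f172a; }
"""

def extract_colors(css: str) -> list[str]:
    """Collect the distinct six-digit hex color codes used in the export."""
    return sorted(set(re.findall(r"#[0-9a-fA-F]{6}\b", css)))

print(extract_colors(exported_css))
```

&lt;p&gt;From there, the recovered palette can be diffed against your existing design tokens, so an exploration-layer artifact feeds a non-Claude pipeline without any handoff button involved.&lt;/p&gt;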

&lt;p&gt;&lt;a href="https://www.reddit.com/r/artificial/search/?q=Claude+Design&amp;amp;sort=new&amp;amp;t=day" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwe7ietc6tu9gite3c5wm.png" alt="Reddit r/artificial search for Claude Design — community posts and reactions" width="800" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Is It Worth It? Who Should Use Claude Design
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use Claude Design if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're a founder or PM who needs to communicate design intent to engineers without Figma skills&lt;/li&gt;
&lt;li&gt;You're an engineer who wants to iterate on UI ideas before handing off to a designer&lt;/li&gt;
&lt;li&gt;You're already on Claude Pro/Max/Team and want to collapse the prototype-to-code step&lt;/li&gt;
&lt;li&gt;Your use case is internal tooling, MVP prototyping, or investor-facing decks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Skip Claude Design if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're a professional designer doing precision component systems work (Figma still wins)&lt;/li&gt;
&lt;li&gt;You need a full deployed app immediately (Lovable gets you there faster)&lt;/li&gt;
&lt;li&gt;Your product requires highly differentiated visual design (the homogeneity ceiling is real)&lt;/li&gt;
&lt;li&gt;You're on the free Claude tier (research preview is paid plans only)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pricing math is simple: if you're already paying for Claude Pro ($20/month), Claude Design is included at no extra cost. That zero marginal spend compares favorably with adding Figma Professional ($15/month per seat), Lovable's paid tier, or any bespoke design tool subscription on top of what you already pay.&lt;/p&gt;

&lt;p&gt;The honest summary from the &lt;a href="https://www.banani.co/blog/claude-design-review" rel="noopener noreferrer"&gt;Banani review&lt;/a&gt;: it's in research preview, which means rough edges and limited access initially. But the Claude Code handoff is real, it works, and it's something no other AI design tool has.&lt;/p&gt;

&lt;p&gt;Anthropic's CPO resigning from Figma's board the same week Claude Design launched was either coincidental or the most efficient press release in Silicon Valley history. Either way, Figma noticed — and so should you.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://computeleap.com/blog/claude-design-anthropic-ai-design-tool-handoff-claude-code" rel="noopener noreferrer"&gt;ComputeLeap&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
  </channel>
</rss>
