<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: 정상록</title>
    <description>The latest articles on DEV Community by 정상록 (@_46ea277e677b888e0cd13).</description>
    <link>https://dev.to/_46ea277e677b888e0cd13</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3798483%2F5958ee10-90dd-45e5-815a-91d3f8196156.png</url>
      <title>DEV Community: 정상록</title>
      <link>https://dev.to/_46ea277e677b888e0cd13</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/_46ea277e677b888e0cd13"/>
    <language>en</language>
    <item>
      <title>Microsoft-OpenAI Partnership Amendment (April 2026): What Indie Devs Need to Know About Non-Exclusive Licenses and Revenue Caps</title>
      <dc:creator>정상록</dc:creator>
      <pubDate>Tue, 28 Apr 2026 01:46:32 +0000</pubDate>
      <link>https://dev.to/_46ea277e677b888e0cd13/microsoft-openai-partnership-amendment-april-2026-what-indie-devs-need-to-know-about-58p6</link>
      <guid>https://dev.to/_46ea277e677b888e0cd13/microsoft-openai-partnership-amendment-april-2026-what-indie-devs-need-to-know-about-58p6</guid>
      <description>&lt;p&gt;On April 27, 2026, Microsoft published an amendment to its OpenAI partnership. The headline reads "long-term clarity," but the actual changes restructure how AI infrastructure is distributed across cloud providers.&lt;/p&gt;

&lt;p&gt;This post breaks down what changed and why it matters for indie developers and small teams building on OpenAI APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 5 Changes That Matter
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Azure preference&lt;/td&gt;
&lt;td&gt;Exclusive&lt;/td&gt;
&lt;td&gt;Primary (multi-cloud allowed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IP license&lt;/td&gt;
&lt;td&gt;Through 2032, &lt;strong&gt;exclusive&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Through 2032, &lt;strong&gt;non-exclusive&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Microsoft → OpenAI revenue share&lt;/td&gt;
&lt;td&gt;Active&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Ended&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI → Microsoft revenue share&lt;/td&gt;
&lt;td&gt;Until AGI&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Until 2030 + total cap&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Microsoft's OpenAI stake&lt;/td&gt;
&lt;td&gt;~27% ($135B value)&lt;/td&gt;
&lt;td&gt;Maintained&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two words changed everything: &lt;strong&gt;"exclusive" was removed&lt;/strong&gt;, and &lt;strong&gt;"cap" was added&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Change 1: Azure Primary, Not Exclusive
&lt;/h2&gt;

&lt;p&gt;OpenAI still launches products on Azure first. But here's the structural shift: &lt;strong&gt;OpenAI can now serve any product on any cloud provider.&lt;/strong&gt; AWS Bedrock, Google Cloud Vertex AI, self-hosted infrastructure—all legally possible.&lt;/p&gt;

&lt;p&gt;TechCrunch headlined this as "OpenAI ends Microsoft legal peril over its $50B Amazon deal." The previously blocked OpenAI-Amazon $50B agreement is no longer constrained by exclusivity terms.&lt;/p&gt;

&lt;h3&gt;
  
  
  What This Means for Your Stack
&lt;/h3&gt;

&lt;p&gt;Today, if you use OpenAI APIs, you typically pick:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Direct OpenAI platform access&lt;/li&gt;
&lt;li&gt;Azure OpenAI Service (for enterprise/regulated industries)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Soon, you'll likely add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS Bedrock (with native OpenAI model availability)&lt;/li&gt;
&lt;li&gt;GCP Vertex AI (same)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More options mean real price-negotiation power and region-level latency optimization.&lt;/p&gt;
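
&lt;p&gt;A minimal sketch of what endpoint switching could look like with the official &lt;code&gt;openai&lt;/code&gt; Python SDK. The Azure branch works today; the Bedrock/Vertex branch is a placeholder until native availability actually ships:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Provider switch for OpenAI-compatible endpoints (sketch).
# Azure is real today; "bedrock"/"vertex" are placeholders until the
# amendment's multi-cloud availability materializes.
import os

from openai import AzureOpenAI, OpenAI

def make_client(provider: str):
    if provider == "openai":
        return OpenAI()  # reads OPENAI_API_KEY
    if provider == "azure":
        return AzureOpenAI(
            azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
            api_key=os.environ["AZURE_OPENAI_API_KEY"],
            api_version="2024-10-21",
        )
    raise NotImplementedError(f"not yet shipped: {provider}")

client = make_client(os.environ.get("LLM_PROVIDER", "openai"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;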

&lt;h2&gt;
  
  
  Change 2: Non-Exclusive IP License (The Biggest Structural Shift)
&lt;/h2&gt;

&lt;p&gt;The IP license through 2032 stays, but it's now non-exclusive. The gatekeeper role Microsoft played for OpenAI tech access is over.&lt;/p&gt;

&lt;h3&gt;
  
  
  Microsoft Copilot vs. OpenAI ChatGPT: Stop Treating Them As the Same Product
&lt;/h3&gt;

&lt;p&gt;Many users assumed "Copilot and ChatGPT are the same GPT model with different interfaces." Under non-exclusivity, this assumption breaks down.&lt;/p&gt;

&lt;p&gt;Both companies can now develop their products in different directions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Microsoft can invest more in proprietary models (MAI, Phi series)&lt;/li&gt;
&lt;li&gt;OpenAI can distribute GPT successors freely across channels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For your stack: &lt;strong&gt;evaluate them as separate product lines.&lt;/strong&gt; "I heard Copilot is good, so ChatGPT must be similar" is increasingly wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Change 3: Revenue Share Restructuring + AGI Decoupling
&lt;/h2&gt;

&lt;p&gt;This is the subtlest but most consequential change.&lt;/p&gt;

&lt;h3&gt;
  
  
  Microsoft → OpenAI: Ended
&lt;/h3&gt;

&lt;p&gt;Microsoft no longer pays a portion of its AI revenue to OpenAI.&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenAI → Microsoft: Until 2030 + Cap, AGI-Decoupled
&lt;/h3&gt;

&lt;p&gt;The old contract had this revenue share ending "at AGI achievement." But AGI definitions were a constant legal friction point between the two companies.&lt;/p&gt;

&lt;p&gt;The amendment explicitly decouples it: &lt;strong&gt;"regardless of OpenAI's technology progress."&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sharing continues until 2030, whether or not AGI is achieved&lt;/li&gt;
&lt;li&gt;Same percentage, but with a &lt;strong&gt;total cap&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why Both Companies Win Here
&lt;/h3&gt;

&lt;p&gt;The AGI argument was slowing down both companies. Every release required legal teams to evaluate "does this trigger the AGI clause?" That overhead is gone.&lt;/p&gt;

&lt;p&gt;Expect both companies to ship faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Change 4: Clear End Dates for Enterprises
&lt;/h2&gt;

&lt;p&gt;For enterprise architecture decisions, this is the best news.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;2030&lt;/strong&gt;: OpenAI → Microsoft revenue share ends + cap may be reached&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2032&lt;/strong&gt;: IP license expires&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Previously, "AGI achievement" was an undefined trigger. Now there's a clear clock for migration planning, license renewal negotiations, and alternative model evaluation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Change 5: Microsoft's 27% Stake Stays — This Is Not a Separation
&lt;/h2&gt;

&lt;p&gt;Microsoft maintains its ~27% stake (~$135B value) as a major shareholder. Joint work continues on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gigawatt-scale data center buildout&lt;/li&gt;
&lt;li&gt;Next-gen AI silicon&lt;/li&gt;
&lt;li&gt;AI cybersecurity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is relationship redefinition, not separation. Like a couple acknowledging each other's careers and friendships rather than divorcing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Action Items for Indie Devs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Re-evaluate Multi-Cloud Options
&lt;/h3&gt;

&lt;p&gt;If you're running on Azure OpenAI Service today, prepare for AWS Bedrock and GCP Vertex AI to add OpenAI models in the next 6-12 months. Pre-document:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pricing comparison&lt;/li&gt;
&lt;li&gt;Region availability&lt;/li&gt;
&lt;li&gt;Data governance policy differences&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Treat Copilot and ChatGPT as Separate Products
&lt;/h3&gt;

&lt;p&gt;Use these criteria:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Copilot&lt;/strong&gt;: Office 365/Windows integration, enterprise data governance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI ChatGPT&lt;/strong&gt;: Latest model availability, API flexibility, direct integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Mark Calendar Triggers
&lt;/h3&gt;

&lt;p&gt;For any service with deep OpenAI dependency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;2029-2030&lt;/strong&gt;: Pricing policy may shift before revenue share ends&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2031-2032&lt;/strong&gt;: License terms in Microsoft channels may shift before IP license ends&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These dates are negotiating leverage if you're signing 5-year contracts now.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Avoid AGI Marketing Hype
&lt;/h3&gt;

&lt;p&gt;Both companies are now legally free from AGI definition battles. Expect a shift from "AGI proximity" marketing to &lt;strong&gt;concrete capability benchmarks&lt;/strong&gt; (coding, reasoning, domain expertise). That's a more honest comparison framework anyway.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Old model: Exclusive license + AGI-triggered revenue share = locked-in vendor + fuzzy timeline
New model: Non-exclusive license + capped revenue share + clear 2030/2032 = open vendor + predictable horizon
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For indie devs and small teams: vendor lock-in is the killer. More OpenAI distribution channels means more pricing leverage for everyone.&lt;/p&gt;

&lt;p&gt;This is the official end of the single-vendor era for OpenAI APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blogs.microsoft.com/blog/2026/04/27/the-next-phase-of-the-microsoft-openai-partnership/" rel="noopener noreferrer"&gt;Microsoft official blog (2026-04-27)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/next-phase-of-microsoft-partnership/" rel="noopener noreferrer"&gt;OpenAI announcement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cnbc.com/2026/04/27/openai-microsoft-partnership-revenue-cap.html" rel="noopener noreferrer"&gt;CNBC coverage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/04/27/openai-ends-microsoft-legal-peril-over-its-50b-amazon-deal/" rel="noopener noreferrer"&gt;TechCrunch on Amazon deal implications&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;What's your take? Are you planning to evaluate AWS Bedrock or GCP Vertex AI for OpenAI workloads once they're available?&lt;/p&gt;

</description>
    </item>
    <item>
      <title>TradingAgents v0.2.4: A Multi-Agent LLM Framework That Simulates an Entire Trading Firm</title>
      <dc:creator>정상록</dc:creator>
      <pubDate>Tue, 28 Apr 2026 01:35:34 +0000</pubDate>
      <link>https://dev.to/_46ea277e677b888e0cd13/tradingagents-v024-a-multi-agent-llm-framework-that-simulates-an-entire-trading-firm-g2e</link>
      <guid>https://dev.to/_46ea277e677b888e0cd13/tradingagents-v024-a-multi-agent-llm-framework-that-simulates-an-entire-trading-firm-g2e</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;UCLA Tauric Research released &lt;strong&gt;TradingAgents v0.2.4&lt;/strong&gt; (2026-04-25) — a LangGraph-based multi-agent LLM framework that mimics a real trading firm with 5 layers and ~12 agents. The new release adds Pydantic-typed structured outputs, LangGraph checkpoint resumption, a persistent decision-memory file, 5-tier rating, and 10 LLM provider integrations. Backtest on AAPL/GOOGL/AMZN shows 23-27% cumulative return.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Disclaimer: backtest only. Not financial advice. Paper trade before any real capital.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why this is interesting beyond "another trading bot"
&lt;/h2&gt;

&lt;p&gt;Most LLM trading bots are a single model with a giant prompt. They suffer badly from &lt;strong&gt;confirmation bias&lt;/strong&gt; — once they form an initial thesis, they cherry-pick evidence to support it.&lt;/p&gt;

&lt;p&gt;TradingAgents counters this &lt;strong&gt;structurally&lt;/strong&gt; with 5 layers of explicit role-based agents that argue with each other:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Analyst Team x4] -&amp;gt; [Bull vs Bear debate]
        |
        v
   [Trader (3-tier)]
        |
        v
[Risk Mgmt: Aggressive vs Conservative vs Neutral]
        |
        v
   [Portfolio Manager (5-tier)] -&amp;gt; Buy / Overweight / Hold / Underweight / Sell
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The whole pipeline runs on a &lt;strong&gt;LangGraph state graph&lt;/strong&gt; with explicit handoffs. You can replace any node, log any state, and resume from any checkpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  v0.2.4 highlights for developers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Structured output decision agents
&lt;/h3&gt;

&lt;p&gt;The Research Manager, Trader, and Portfolio Manager now use &lt;code&gt;llm.with_structured_output(Schema)&lt;/code&gt; with Pydantic schemas. This works across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI: &lt;code&gt;json_schema&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Google Gemini: &lt;code&gt;response_schema&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Anthropic Claude: &lt;code&gt;tool-use&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Other models: function-calling fallback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No more brittle text parsing for decision values.&lt;/p&gt;
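
&lt;p&gt;A minimal sketch of the pattern, assuming an illustrative schema (not TradingAgents' actual one):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hedged sketch of the structured-output pattern described above.
# The schema and prompt are illustrative, not the repo's actual code.
from typing import Literal

from langchain_anthropic import ChatAnthropic
from pydantic import BaseModel, Field

class PMDecision(BaseModel):
    rating: Literal["Buy", "Overweight", "Hold", "Underweight", "Sell"]
    rationale: str = Field(description="One-paragraph justification")

llm = ChatAnthropic(model="claude-opus-4-7")
pm = llm.with_structured_output(PMDecision)

decision = pm.invoke("Given the debate transcript, issue a final rating for NVDA.")
print(decision.rating)  # typed field access -- no regex over free text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;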

&lt;h3&gt;
  
  
  2. LangGraph checkpoint resumption
&lt;/h3&gt;

&lt;p&gt;Pass &lt;code&gt;--checkpoint&lt;/code&gt; to enable per-node state persistence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;~/.tradingagents/cache/checkpoints/&amp;lt;TICKER&amp;gt;.db
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your run crashes after the Bull/Bear debate but before Risk Management, you resume from there instead of paying for the full pipeline again. &lt;strong&gt;Big API cost savings.&lt;/strong&gt;&lt;/p&gt;
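
&lt;p&gt;Under the hood this is LangGraph's checkpointer mechanism. A toy sketch of the pattern (TradingAgents wires this internally when &lt;code&gt;--checkpoint&lt;/code&gt; is passed; the graph below is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy two-node graph showing the checkpoint-resume pattern.
import os
import sqlite3
from typing import TypedDict

from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.graph import END, START, StateGraph

class State(TypedDict):
    ticker: str
    notes: str

def debate(state: State):
    return {"notes": state["notes"] + " | bull/bear debate done"}

def risk(state: State):
    return {"notes": state["notes"] + " | risk review done"}

builder = StateGraph(State)
builder.add_node("debate", debate)
builder.add_node("risk", risk)
builder.add_edge(START, "debate")
builder.add_edge("debate", "risk")
builder.add_edge("risk", END)

db_path = os.path.expanduser("~/.tradingagents/cache/checkpoints/NVDA.db")
os.makedirs(os.path.dirname(db_path), exist_ok=True)
checkpointer = SqliteSaver(sqlite3.connect(db_path, check_same_thread=False))
graph = builder.compile(checkpointer=checkpointer)

# Re-invoking with the same thread_id resumes from the last saved state
# instead of re-running (and re-paying for) earlier nodes.
config = {"configurable": {"thread_id": "NVDA-2026-01-15"}}
print(graph.invoke({"ticker": "NVDA", "notes": ""}, config=config))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;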

&lt;h3&gt;
  
  
  3. Persistent decision memory
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.tradingagents/memory/trading_memory.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every run appends a decision record. On the next run for the &lt;strong&gt;same ticker&lt;/strong&gt;, the framework auto-injects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Previous decision&lt;/li&gt;
&lt;li&gt;Actual realized return (raw + alpha vs SPY)&lt;/li&gt;
&lt;li&gt;A one-paragraph retrospective&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;into the Portfolio Manager prompt. This is automated trading-journal-as-context — the framework literally learns from its own past mistakes.&lt;/p&gt;
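
&lt;p&gt;The record itself is plain markdown, so the mechanism is easy to picture. An illustrative append (field wording assumed, not copied from the repo):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative decision-memory append; wording/fields are assumptions.
import os
from datetime import date

record = (
    f"\n## NVDA - {date.today().isoformat()}\n"
    "- Decision: Overweight\n"
    "- Realized return: +4.2% (alpha vs SPY: +1.1%)\n"
    "- Retrospective: data-center demand thesis played out, but the\n"
    "  risk team overweighted rate sensitivity.\n"
)
memory_path = os.path.expanduser("~/.tradingagents/memory/trading_memory.md")
os.makedirs(os.path.dirname(memory_path), exist_ok=True)
with open(memory_path, "a") as f:
    f.write(record)  # next run on NVDA injects this into the PM prompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;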

&lt;h3&gt;
  
  
  4. 5-tier rating system
&lt;/h3&gt;

&lt;p&gt;The Portfolio Manager now outputs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rating&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Buy&lt;/td&gt;
&lt;td&gt;strong buy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Overweight&lt;/td&gt;
&lt;td&gt;increase position&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hold&lt;/td&gt;
&lt;td&gt;maintain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Underweight&lt;/td&gt;
&lt;td&gt;reduce position&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sell&lt;/td&gt;
&lt;td&gt;exit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Trader still uses 3-tier (Buy/Hold/Sell). Only the final Portfolio Manager gets the finer granularity.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. 10 LLM providers
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_provider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# or:
# openai, google, anthropic, xai, deepseek,
# qwen, glm, openrouter, ollama, azure
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Local Ollama is the cost-killer if you don't need top-tier reasoning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tradingagents.graph.trading_graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TradingAgentsGraph&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tradingagents.default_config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_CONFIG&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_CONFIG&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_provider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deep_think_llm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quick_think_llm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-haiku-4-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_debate_rounds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checkpoint_enabled&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="n"&gt;ta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TradingAgentsGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;propagate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NVDA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-01-15&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The split between &lt;code&gt;deep_think_llm&lt;/code&gt; and &lt;code&gt;quick_think_llm&lt;/code&gt; is the cost-optimization sweet spot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;deep_think_llm&lt;/strong&gt; — used for Bull/Bear debate, Risk debate, Portfolio Manager (heavy reasoning)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;quick_think_llm&lt;/strong&gt; — used for Analyst data summarization (lightweight)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For Claude users, Opus 4.7 + Haiku 4.5 is a clean combo.&lt;/p&gt;

&lt;h2&gt;
  
  
  CLI mode
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tradingagents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Interactive prompt asks for ticker, date, LLM provider, and research depth. Great for quick experiments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Backtest performance (paper figures)
&lt;/h2&gt;

&lt;p&gt;On AAPL / GOOGL / AMZN:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;TradingAgents&lt;/th&gt;
&lt;th&gt;Baseline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cumulative Return&lt;/td&gt;
&lt;td&gt;23.21-26.62%&lt;/td&gt;
&lt;td&gt;lower&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Annualized Return&lt;/td&gt;
&lt;td&gt;up to 30.5%&lt;/td&gt;
&lt;td&gt;lower&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sharpe Ratio&lt;/td&gt;
&lt;td&gt;improved&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max Drawdown&lt;/td&gt;
&lt;td&gt;improved&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Source: arXiv 2412.20138 v7 (Yijia Xiao et al.).&lt;/p&gt;

&lt;h2&gt;
  
  
  What's NOT in the paper
&lt;/h2&gt;

&lt;p&gt;The honest limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Slippage&lt;/strong&gt; is not modeled&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Taxes&lt;/strong&gt; are not deducted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Market impact&lt;/strong&gt; for non-trivial positions is ignored&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live data latency&lt;/strong&gt; is assumed to be zero in the backtest&lt;/li&gt;
&lt;li&gt;All numbers are &lt;strong&gt;AAPL/GOOGL/AMZN only&lt;/strong&gt; — large-cap US tech, the easiest regime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Translation: treat it as a research framework. Don't commit real capital based on the paper figures alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generalizing the pattern
&lt;/h2&gt;

&lt;p&gt;The interesting part isn't trading — it's the &lt;strong&gt;multi-agent debate pattern&lt;/strong&gt; itself. The same 5-layer structure could be applied to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Content production (writer + editor + SEO + reviewer + final approver)&lt;/li&gt;
&lt;li&gt;Marketing campaigns (strategy + copy + design + measurement)&lt;/li&gt;
&lt;li&gt;Hiring decisions (technical + culture-fit + reference + final)&lt;/li&gt;
&lt;li&gt;Pricing decisions (cost + market + competitive + customer)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anywhere a single human or a single LLM is prone to confirming its initial take, structured debate between two or more adversarial agents helps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;TradingAgents is &lt;strong&gt;not&lt;/strong&gt; a "make money with AI" toolkit. It's a research framework that demonstrates the multi-agent paradigm works in a high-stakes domain. The v0.2.4 additions (structured outputs, checkpoints, persistent memory) make it actually usable for serious experimentation — not just paper-friendly demos.&lt;/p&gt;

&lt;p&gt;Worth cloning, reading the LangGraph code, and stealing patterns for your own multi-agent systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/TauricResearch/TradingAgents" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2412.20138" rel="noopener noreferrer"&gt;arXiv paper v7&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/TauricResearch/TradingAgents/blob/main/CHANGELOG.md" rel="noopener noreferrer"&gt;CHANGELOG&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tradingagents-ai.github.io/" rel="noopener noreferrer"&gt;Project site with benchmarks&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;blockquote&gt;
&lt;p&gt;⚠️ Disclaimer: This article analyzes the TradingAgents framework from a technical perspective. It is not investment advice. Backtest numbers are historical and do not guarantee future returns. Always paper trade extensively before deploying real capital.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>free-claude-code: Route Claude Code Through NVIDIA NIM, OpenRouter, and Ollama for $0</title>
      <dc:creator>정상록</dc:creator>
      <pubDate>Tue, 28 Apr 2026 01:35:01 +0000</pubDate>
      <link>https://dev.to/_46ea277e677b888e0cd13/free-claude-code-route-claude-code-through-nvidia-nim-openrouter-and-ollama-for-0-1h58</link>
      <guid>https://dev.to/_46ea277e677b888e0cd13/free-claude-code-route-claude-code-through-nvidia-nim-openrouter-and-ollama-for-0-1h58</guid>
      <description>&lt;p&gt;If you've wanted to try Claude Code but balked at the $20 Pro subscription or API credit commitment, there's an open-source proxy worth knowing about: &lt;a href="https://github.com/Alishahryar1/free-claude-code" rel="noopener noreferrer"&gt;free-claude-code&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It's a small FastAPI server that sits on &lt;code&gt;localhost:8082&lt;/code&gt;. You set &lt;code&gt;ANTHROPIC_BASE_URL=http://localhost:8082&lt;/code&gt; and Claude Code's API calls get rerouted to whatever backend you've mapped. The proxy handles bidirectional translation between Anthropic's SSE format and OpenAI chat / Anthropic Messages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it's interesting
&lt;/h2&gt;

&lt;p&gt;Two reasons.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, the backends.&lt;/strong&gt; NVIDIA NIM gives you 40 req/min for free with just an &lt;code&gt;nvapi-&lt;/code&gt; key. OpenRouter has free models like DeepSeek R1. LM Studio, llama.cpp, and Ollama run locally with no rate limits at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, the model tier mapping.&lt;/strong&gt; Claude Code internally picks between Opus/Sonnet/Haiku based on task complexity. The proxy intercepts that decision and routes each tier to a different backend you choose.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MODEL_OPUS="nvidia_nim/moonshotai/kimi-k2.5"
MODEL_SONNET="open_router/deepseek/deepseek-r1-0528:free"
MODEL_HAIKU="lmstudio/unsloth/GLM-4.7-Flash-GGUF"
MODEL="nvidia_nim/z-ai/glm4.7"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Heavy reasoning → Kimi K2.5 on NIM. Regular work → free DeepSeek R1 on OpenRouter. Fast responses → local LM Studio. Total cost: $0. And critically, it preserves Claude Code's own tier-routing logic, which is the part you can't easily replicate by manually configuring a single model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Built-in features that aren't obvious
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Thinking token conversion&lt;/strong&gt;: &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags and &lt;code&gt;reasoning_content&lt;/code&gt; from reasoning models get converted to Claude's native thinking blocks. You don't lose the reasoning trace.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heuristic tool parser&lt;/strong&gt;: Some free models emit tool calls as text. The proxy parses them back into structured &lt;code&gt;tool_use&lt;/code&gt; blocks. Imperfect but helpful.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subagent control&lt;/strong&gt;: It intercepts Claude Code's Task tool and forces &lt;code&gt;run_in_background=False&lt;/code&gt;. Without this, free-tier quotas can disappear in seconds when subagents spawn in parallel.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart rate limiting&lt;/strong&gt;: Rolling-window throttling and 429 exponential backoff are baked in (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discord/Telegram bots&lt;/strong&gt;: Optional. Tree-based threading, session persistence, voice note transcription. You can trigger coding sessions remotely.&lt;/li&gt;
&lt;/ul&gt;
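
&lt;p&gt;For the rate limiting, the core idea is simple enough to sketch (illustrative, not the proxy's actual code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal 429-backoff sketch. The proxy's real implementation also
# applies rolling-window throttling before a request is ever sent.
import random
import time

import requests

def post_with_backoff(url: str, payload: dict, max_retries: int = 5):
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload)
        if resp.status_code != 429:
            return resp
        time.sleep((2 ** attempt) + random.random())  # 1s, 2s, 4s... + jitter
    raise RuntimeError("rate limited: retries exhausted")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;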

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Install uv&lt;/span&gt;
curl &lt;span class="nt"&gt;-LsSf&lt;/span&gt; https://astral.sh/uv/install.sh | sh
uv python &lt;span class="nb"&gt;install &lt;/span&gt;3.14

&lt;span class="c"&gt;# 2. Clone and configure&lt;/span&gt;
git clone https://github.com/Alishahryar1/free-claude-code.git
&lt;span class="nb"&gt;cd &lt;/span&gt;free-claude-code
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Edit .env: add NVIDIA_NIM_API_KEY=nvapi-xxx&lt;/span&gt;

&lt;span class="c"&gt;# 3. Run the proxy&lt;/span&gt;
uv run uvicorn server:app &lt;span class="nt"&gt;--host&lt;/span&gt; 0.0.0.0 &lt;span class="nt"&gt;--port&lt;/span&gt; 8082
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in another terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;ANTHROPIC_AUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"freecc"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8082"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ANTHROPIC_AUTH_TOKEN&lt;/code&gt; can be any string. The proxy doesn't validate it.&lt;/p&gt;

&lt;p&gt;For convenience, alias it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fcc&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="nv"&gt;ANTHROPIC_AUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"freecc"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8082"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  claude &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$@&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's also &lt;code&gt;claude-pick&lt;/code&gt;, an &lt;code&gt;fzf&lt;/code&gt;-based interactive selector if you don't want to keep editing &lt;code&gt;.env&lt;/code&gt; to swap models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest tradeoffs
&lt;/h2&gt;

&lt;p&gt;This isn't free Claude Code. It's a way to route Claude Code's interface to different backends, with the quality and reliability tradeoffs that implies.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool use accuracy&lt;/strong&gt;: The hardest part. Anthropic's models are tuned heavily for accurate tool calling. Free models are usually weaker, and the heuristic parser only goes so far.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Throughput&lt;/strong&gt;: NIM's 40 req/min cap means heavy parallel work will throttle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Translation loss&lt;/strong&gt;: The OpenAI chat → Anthropic conversion can drop features in edge cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosted reliability&lt;/strong&gt;: Uptime is on you.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you ship production code, Claude Pro at $20/month is still the right answer. Where this shines:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Trying Claude Code's workflow before committing to a subscription.&lt;/li&gt;
&lt;li&gt;Sensitive code (client data, internal repos) that you want processed entirely locally via Ollama.&lt;/li&gt;
&lt;li&gt;Hybrid setups: real Anthropic for the Opus tier, free backends for Sonnet/Haiku. Cuts cost 70-90% with surprisingly small quality loss for boilerplate work.&lt;/li&gt;
&lt;li&gt;Remote autonomous coding via the Discord/Telegram bots.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Where I'd start
&lt;/h2&gt;

&lt;p&gt;NIM-only setup. Get an &lt;code&gt;nvapi-&lt;/code&gt; key, set all three &lt;code&gt;MODEL_*&lt;/code&gt; to NIM models, run the proxy, point Claude Code at it. Twenty minutes of setup, zero cost, and you can immediately tell whether Claude Code's workflow fits how you work.&lt;/p&gt;

&lt;p&gt;Once that's working, layer in Ollama for sensitive work and OpenRouter free models for variety. Keep a separate &lt;code&gt;.env&lt;/code&gt; per project so you can swap the routing strategy per repo.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/Alishahryar1/free-claude-code" rel="noopener noreferrer"&gt;https://github.com/Alishahryar1/free-claude-code&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Matt Pocock's Agent Skills — 30K Stars and the Start of the Skill Economy</title>
      <dc:creator>정상록</dc:creator>
      <pubDate>Tue, 28 Apr 2026 01:32:36 +0000</pubDate>
      <link>https://dev.to/_46ea277e677b888e0cd13/matt-pococks-agent-skills-30k-stars-and-the-start-of-the-skill-economy-lg2</link>
      <guid>https://dev.to/_46ea277e677b888e0cd13/matt-pococks-agent-skills-30k-stars-and-the-start-of-the-skill-economy-lg2</guid>
      <description>&lt;p&gt;If you've ever caught yourself thinking "AI coding tools are great, but I'm doing the same prompt setup again and again," you'll want to look at what Matt Pocock just shipped.&lt;/p&gt;

&lt;p&gt;The Total TypeScript founder open-sourced his personal &lt;code&gt;.claude/skills/&lt;/code&gt; folder on 2026-02-03. Three months later: &lt;strong&gt;30,344 stars, 2,380 forks, still actively pushed&lt;/strong&gt;. No marketing campaign. No launch post. Just a directory dump that resonated.&lt;/p&gt;

&lt;p&gt;The repo is called, simply, &lt;a href="https://github.com/mattpocock/skills" rel="noopener noreferrer"&gt;&lt;code&gt;mattpocock/skills&lt;/code&gt;&lt;/a&gt;. The slogan in the README is the whole thesis:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"My agent skills that I use every day to do real engineering — not vibe coding."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That distinction — &lt;strong&gt;real engineering vs vibe coding&lt;/strong&gt; — is doing a lot of work. Let's unpack it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Skills aren't prompts
&lt;/h2&gt;

&lt;p&gt;Most people use AI coding tools by writing prompts. The better the prompt, the better the output. Matt's framing is different.&lt;/p&gt;

&lt;p&gt;A skill is &lt;strong&gt;a process&lt;/strong&gt;. It tells the AI &lt;em&gt;how&lt;/em&gt; to work, not &lt;em&gt;what&lt;/em&gt; to say. Same input, different process, fundamentally different output. Think of it less like writing a message to the AI and more like designing the workshop the AI lives in.&lt;/p&gt;

&lt;p&gt;Let's see what that looks like in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 16 skills, by category
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Planning &amp;amp; Design (5)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;to-prd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Synthesizes the current conversation into a PRD, files it as a GitHub issue. No interview — just synthesis of what's already been discussed.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;to-issues&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Breaks PRDs/specs into independent &lt;strong&gt;vertical-slice&lt;/strong&gt; GitHub issues.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;grill-me&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Interrogates you until every branch of the decision tree is resolved.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;design-an-interface&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Uses parallel sub-agents to generate radically different interface designs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;request-refactor-plan&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Interview → small-commit refactor plan → GitHub issue.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The standout here is &lt;code&gt;grill-me&lt;/code&gt;. It's like having a skeptical tech lead permanently stationed in your terminal. Why are you doing this now? What's the cost of this decision? What edge case haven't you considered? It won't shut up until you've answered everything. It's exhausting in the best possible way.&lt;/p&gt;

&lt;h3&gt;
  
  
  Development (5)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tdd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Red-Green-Refactor loop, vertical slice at a time.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;triage-issue&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Explore codebase → root cause → TDD fix plan as issue.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;improve-codebase-architecture&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Finds deepening opportunities from &lt;code&gt;CONTEXT.md&lt;/code&gt; + &lt;code&gt;docs/adr/&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;migrate-to-shoehorn&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Migrates &lt;code&gt;as&lt;/code&gt; assertions in tests to &lt;code&gt;@total-typescript/shoehorn&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;scaffold-exercises&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Generates exercise directory structure.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;tdd&lt;/code&gt; is brilliant. Before any code is written, it forces 3 questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Is this code designed to be testable?&lt;/li&gt;
&lt;li&gt;What is the core behavior?&lt;/li&gt;
&lt;li&gt;Does this require an interface change?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Notice where the friction is. It's not "before every action." It's "before code-writing — the place where rushing produces bad outcomes." You buckle the seatbelt where the road is dangerous, not where it's flat.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tooling &amp;amp; Setup (2)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;setup-pre-commit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Husky + lint-staged + Prettier + typecheck + tests.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git-guardrails-claude-code&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Blocks dangerous git commands at the hook level.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;git-guardrails-claude-code&lt;/code&gt; is the safety harness for autonomous loops. It blocks &lt;code&gt;git push --force&lt;/code&gt;, &lt;code&gt;git reset --hard&lt;/code&gt;, &lt;code&gt;git clean -fd&lt;/code&gt;, etc. at the Claude Code hook layer — so your AI literally cannot nuke &lt;code&gt;main&lt;/code&gt; at 3am.&lt;/p&gt;

&lt;h3&gt;
  
  
  Writing &amp;amp; Knowledge (4)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;write-a-skill&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Generates new skills with progressive disclosure structure.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;edit-article&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Restructures sections, improves clarity, compresses prose.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ubiquitous-language&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Extracts DDD ubiquitous language glossary from current conversation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;obsidian-vault&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Manages Obsidian vault with wikilinks and index notes.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;write-a-skill&lt;/code&gt; is the meta layer. &lt;strong&gt;Skills that make skills.&lt;/strong&gt; This is why the repo doesn't get stale: any user can fork it, run &lt;code&gt;write-a-skill&lt;/code&gt;, and add their own workflow. The system extends itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;The repo uses &lt;a href="https://www.npmjs.com/package/skills" rel="noopener noreferrer"&gt;&lt;code&gt;vercel-labs/skills&lt;/code&gt;&lt;/a&gt; (582K weekly downloads on npm):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install one skill in current project&lt;/span&gt;
npx skills@latest add mattpocock/skills/tdd

&lt;span class="c"&gt;# Install globally (~/.claude/skills/)&lt;/span&gt;
npx skills add mattpocock/skills/tdd &lt;span class="nt"&gt;-g&lt;/span&gt;

&lt;span class="c"&gt;# Install all 16 at once&lt;/span&gt;
npx skills add mattpocock/skills &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compatible with 49+ agents: Claude Code, Codex, Cursor, OpenCode, Gemini CLI, GitHub Copilot, OpenClaw, Antigravity, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four design principles
&lt;/h2&gt;

&lt;p&gt;If you read the SKILL.md files across the 16 skills, four principles repeat:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Process &amp;gt; Prompt
&lt;/h3&gt;

&lt;p&gt;The skill defines the workflow. The prompt is just a trigger.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Progressive Disclosure
&lt;/h3&gt;

&lt;p&gt;Each &lt;code&gt;SKILL.md&lt;/code&gt; root is &lt;strong&gt;under 50 lines&lt;/strong&gt;. Detail goes into &lt;code&gt;references/&lt;/code&gt;. The AI loads detail only when needed. This saves context tokens and keeps the surface area readable.&lt;/p&gt;
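
&lt;p&gt;An illustrative layout (structure per the README's description, file names assumed):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tdd/
├── SKILL.md            # under 50 lines: trigger + workflow outline
└── references/
    ├── edge-cases.md   # loaded only when the agent needs it
    └── examples.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;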

&lt;h3&gt;
  
  
  3. Vertical Slice
&lt;/h3&gt;

&lt;p&gt;Every workflow that could be horizontal (TDD, refactoring, issue breakdown) is sliced vertically instead. One full feature end-to-end, then the next.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Self-extending
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;write-a-skill&lt;/code&gt; is itself a skill. The system extends itself. This is the secret sauce of the 30K stars — adoption snowballs because users add to the library.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's the "Skill Economy"?
&lt;/h2&gt;

&lt;p&gt;Vibe Sparking coined the phrase. The trajectory looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2025: Prompt engineering (individual know-how)
  ↓
2026 Q1: AGENTS.md / CLAUDE.md (project-level context)
  ↓
2026 Q2: Agent Skills (reusable workflows)
  ↓
2026+: Skill registry (npm-like ecosystem)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two months ago, your AI workflow was implicit knowledge in your prompt. Today, it's a versioned artifact you can fork, share, and review. That's a fundamental change.&lt;/p&gt;

&lt;p&gt;Matt also released a course in April 2026 — &lt;a href="https://udcourse.com/product/claude-code-for-real-engineers-matt-pocock/" rel="noopener noreferrer"&gt;"Claude Code for Real Engineers"&lt;/a&gt; — covering Plan/Execute/Clear workflows, AGENTS.md patterns, and Ralph Wiggum autonomous loops. The skills repo is the practical foundation of that course.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to start (5 skills, 30 minutes)
&lt;/h2&gt;

&lt;p&gt;If you're new to this, install these five and you'll see the difference within a week:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add mattpocock/skills/tdd &lt;span class="nt"&gt;-g&lt;/span&gt;
npx skills add mattpocock/skills/grill-me &lt;span class="nt"&gt;-g&lt;/span&gt;
npx skills add mattpocock/skills/to-prd &lt;span class="nt"&gt;-g&lt;/span&gt;
npx skills add mattpocock/skills/write-a-skill &lt;span class="nt"&gt;-g&lt;/span&gt;
npx skills add mattpocock/skills/git-guardrails-claude-code &lt;span class="nt"&gt;-g&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Workflow examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;New feature: &lt;code&gt;grill-me&lt;/code&gt; → &lt;code&gt;to-prd&lt;/code&gt; → &lt;code&gt;tdd&lt;/code&gt; (with &lt;code&gt;git-guardrails-claude-code&lt;/code&gt; running quietly in the background)&lt;/li&gt;
&lt;li&gt;Custom workflow: &lt;code&gt;write-a-skill&lt;/code&gt; → share the skill with your team or open-source it back&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;The shift here isn't about better tooling. It's about &lt;strong&gt;AI as a collaborator with defined processes&lt;/strong&gt;, not AI as an oracle you prompt. That distinction is what 30K stars are voting on.&lt;/p&gt;

&lt;p&gt;Try it for a week. The change will compound.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Source repo&lt;/strong&gt;: &lt;a href="https://github.com/mattpocock/skills" rel="noopener noreferrer"&gt;github.com/mattpocock/skills&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Canva Magic Layers — How AI Decomposes Flat Images Back into Editable Layers</title>
      <dc:creator>정상록</dc:creator>
      <pubDate>Mon, 27 Apr 2026 00:43:58 +0000</pubDate>
      <link>https://dev.to/_46ea277e677b888e0cd13/canva-magic-layers-how-ai-decomposes-flat-images-back-into-editable-layers-44o7</link>
      <guid>https://dev.to/_46ea277e677b888e0cd13/canva-magic-layers-how-ai-decomposes-flat-images-back-into-editable-layers-44o7</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Canva launched &lt;strong&gt;Magic Layers&lt;/strong&gt; in March 2026. It takes any flat JPG/PNG and decomposes it into editable text, object, and background layers using Canva's proprietary foundation model. 9M+ uses in the first month. Now part of Canva AI 2.0 (announced April 16, 2026).&lt;/p&gt;

&lt;p&gt;The interesting part for developers and automation builders: &lt;strong&gt;the direction is opposite to Photoshop Generative Fill&lt;/strong&gt;. Generative Fill creates new content. Magic Layers reverses an existing image back into editable form.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Magic Layers actually does
&lt;/h2&gt;

&lt;p&gt;Upload a JPG or PNG. Click "Edit image" → "Magic Layers". In ~5 seconds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text becomes &lt;strong&gt;live text boxes&lt;/strong&gt; (not OCR — original font, size, color preserved)&lt;/li&gt;
&lt;li&gt;Objects become individual movable layers&lt;/li&gt;
&lt;li&gt;Background sits behind cleanly separated foreground&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can then change a font, swap a background, or replace product photography without re-generating the source image.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: Canva Design Model
&lt;/h2&gt;

&lt;p&gt;This isn't built on top of Stable Diffusion or DALL-E. It runs on Canva's own &lt;strong&gt;Canva Design Model&lt;/strong&gt;, a foundation model trained to &lt;em&gt;understand&lt;/em&gt; design structure rather than generate pixels.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Canva AI Stack
├── External (OpenAI, Anthropic) ── text generation, general AI
├── Canva Magic Media ───────────── image/video generation
└── Canva Design Model (in-house) ── Magic Layers (decomposition)
                                   ── Agentic editing
                                   ── Structural design
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Canva publicly framed this as their proprietary moat. OpenAI and Anthropic still power Canva's text features, but the layering capability is built and trained internally.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters for automation pipelines
&lt;/h2&gt;

&lt;p&gt;If you're building any kind of content automation — and especially if you generate carousels, social posts, or programmatic banners — you've hit this wall:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;AI generates a flawless image&lt;/li&gt;
&lt;li&gt;You need to change one word of text&lt;/li&gt;
&lt;li&gt;...you're back to prompt-rewriting and re-rolling&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The cost-per-variant of AI imagery has been bottlenecked at editing, not generation. Magic Layers attacks the editing bottleneck head-on.&lt;/p&gt;

&lt;p&gt;For a Python automation pipeline that today does:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Current AI image flow
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;midjourney_generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# ~30s
# To create a variant: re-run from line 1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The post-Magic-Layers conceptual flow looks more like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Hypothetical post-Magic-Layers flow
&lt;/span&gt;&lt;span class="n"&gt;master&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;midjourney_generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# generate once
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;variant&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;variants&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;layered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;canva_magic_layers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;master&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# decompose once
&lt;/span&gt;    &lt;span class="nf"&gt;swap_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;layered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;# cheap edit
&lt;/span&gt;    &lt;span class="nf"&gt;swap_background&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;layered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# cheap edit
&lt;/span&gt;    &lt;span class="nf"&gt;export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;layered&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: Canva does &lt;strong&gt;not&lt;/strong&gt; publicly expose Magic Layers via API yet (as of April 2026). The above is conceptual until they ship it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Canva Magic Layers&lt;/th&gt;
&lt;th&gt;Photoshop Generative Fill&lt;/th&gt;
&lt;th&gt;Adobe Firefly&lt;/th&gt;
&lt;th&gt;Figma AI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Direction&lt;/td&gt;
&lt;td&gt;Decompose&lt;/td&gt;
&lt;td&gt;Generate&lt;/td&gt;
&lt;td&gt;Generate&lt;/td&gt;
&lt;td&gt;Generate layouts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skill barrier&lt;/td&gt;
&lt;td&gt;Click once&lt;/td&gt;
&lt;td&gt;High (masking, selection)&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output&lt;/td&gt;
&lt;td&gt;Editable Canva file&lt;/td&gt;
&lt;td&gt;PSD layers&lt;/td&gt;
&lt;td&gt;New asset&lt;/td&gt;
&lt;td&gt;Figma frames&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Marketers, solo builders&lt;/td&gt;
&lt;td&gt;Pro photo retouching&lt;/td&gt;
&lt;td&gt;Brand teams&lt;/td&gt;
&lt;td&gt;UI/UX designers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API access&lt;/td&gt;
&lt;td&gt;Not yet&lt;/td&gt;
&lt;td&gt;Photoshop SDK&lt;/td&gt;
&lt;td&gt;Firefly API&lt;/td&gt;
&lt;td&gt;Figma plugin API&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Limitations engineers should know
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Format support&lt;/strong&gt;: JPEG and PNG only. No WEBP, HEIC, PDF, or SVG yet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge cases&lt;/strong&gt;: Hair, transparent objects, and heavy shadow overlap reduce mask quality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No SDK/API&lt;/strong&gt;: Magic Layers is UI-only inside Canva editor at the moment. No programmatic access for automation pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Region locked&lt;/strong&gt;: US, UK, Canada, Australia public beta. Asian markets not yet supported.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language support&lt;/strong&gt;: English plus 6 European languages. Korean/Japanese/Chinese text layer extraction unverified.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credits&lt;/strong&gt;: Free plan = 50 shared AI credits/month. Magic Layers consumes more credits per call than text generation. Pro plan ($120/year) recommended for any sustained use.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What this signals about the AI image market
&lt;/h2&gt;

&lt;p&gt;The last 18 months of AI image tooling competed on generation quality — Midjourney v6, DALL-E 3, SDXL, FLUX, and so on. The actual workflow blocker for everyone shipping content was post-edit. Canva moved first into that gap with a &lt;em&gt;proprietary&lt;/em&gt; foundation model rather than fine-tuning an open one.&lt;/p&gt;

&lt;p&gt;Adobe's response will likely come from the Photoshop side (extending Generative Fill into structural decomposition). Figma might extend AI to broader image-to-frame conversion. Watch for someone shipping an open-source equivalent within 6-12 months — the architecture (segmentation + OCR + layout reconstruction) is reproducible if not trivial.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm doing about it as a solo builder
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Today&lt;/strong&gt;: monitoring Canva newsroom for global rollout including Korea&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;At Korean launch&lt;/strong&gt;: testing Korean text layer extraction quality on Pro plan&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Until then&lt;/strong&gt;: keeping my existing card-news pipeline (Gemini + HTML rendering) as the production path. Magic Layers becomes a workflow integration when API ships and Korean support lands.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Source
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.canva.com/newsroom/news/magic-layers/" rel="noopener noreferrer"&gt;Canva Newsroom: Introducing Magic Layers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.canva.com/magic-layers/" rel="noopener noreferrer"&gt;Canva: Make any design editable&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.businesswire.com/news/home/20260311951174/en/Canva-Introduces-Magic-Layers-Turning-Static-AI-Outputs-Into-Editable-Designs" rel="noopener noreferrer"&gt;BusinessWire announcement (March 11, 2026)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://qjc.app/blog/canva-magic-layer" rel="noopener noreferrer"&gt;qjc.app/blog&lt;/a&gt;. Discussion welcome below.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>GPT-5.5 Released: What the Marketing Headlines Don't Tell You</title>
      <dc:creator>정상록</dc:creator>
      <pubDate>Mon, 27 Apr 2026 00:42:55 +0000</pubDate>
      <link>https://dev.to/_46ea277e677b888e0cd13/gpt-55-released-what-the-marketing-headlines-dont-tell-you-3keg</link>
      <guid>https://dev.to/_46ea277e677b888e0cd13/gpt-55-released-what-the-marketing-headlines-dont-tell-you-3keg</guid>
      <description>&lt;p&gt;OpenAI announced GPT-5.5 on April 23, 2026. The API rolled out one day later on April 24. Four days in, the marketing claims and benchmark hype are everywhere — but the picture is more nuanced than headlines suggest.&lt;/p&gt;

&lt;p&gt;This post is a primary-sources-only digest. Everything here is cross-validated against openai.com, developers.openai.com, CNBC, TechCrunch, Fortune, Help Net Security, and the OpenAI/Codex GitHub issue tracker.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Was Actually Announced
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Announcement&lt;/strong&gt;: April 23, 2026 (Brockman, Glaese, Chen, Pachocki — not Sam Altman)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API release&lt;/strong&gt;: April 24, 2026 (one day later, separate safeguard process)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model IDs&lt;/strong&gt;: &lt;code&gt;gpt-5.5&lt;/code&gt;, snapshot &lt;code&gt;gpt-5.5-2026-04-23&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge cutoff&lt;/strong&gt;: December 1, 2025&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ChatGPT availability&lt;/strong&gt;: Plus, Pro, Business, Enterprise (immediate)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex availability&lt;/strong&gt;: Plus, Pro, Business, Enterprise, Edu, Go&lt;/li&gt;
&lt;/ul&gt;
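
&lt;p&gt;For reference, here's what pinning the dated snapshot looks like in a minimal API call. This is a sketch assuming &lt;code&gt;gpt-5.5&lt;/code&gt; is addressed through the Responses API the same way earlier GPT-5.x models were, and that &lt;code&gt;xhigh&lt;/code&gt; is accepted as a reasoning-effort value, per OpenAI's benchmark tables.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: pin the dated snapshot instead of the moving alias.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.responses.create(
    model="gpt-5.5-2026-04-23",        # dated snapshot, not the alias
    reasoning={"effort": "xhigh"},     # the setting the benchmarks assume
    input="Summarize the GPT-5.5 context-window caveats in 3 bullets.",
)
print(resp.output_text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;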

&lt;h2&gt;
  
  
  Benchmarks: Where GPT-5.5 Is SOTA
&lt;/h2&gt;

&lt;p&gt;All scores below are at reasoning effort &lt;code&gt;xhigh&lt;/code&gt; per OpenAI's official tables.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;GPT-5.4&lt;/th&gt;
&lt;th&gt;GPT-5.5&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Terminal-Bench 2.0&lt;/td&gt;
&lt;td&gt;75.1%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;82.7%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+7.6pp (SOTA)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Expert-SWE (Internal)&lt;/td&gt;
&lt;td&gt;68.5%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;73.1%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+4.6pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GDPval (knowledge work)&lt;/td&gt;
&lt;td&gt;83.0%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;84.9%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;beats Claude Opus 4.7 (80.3%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OSWorld-Verified&lt;/td&gt;
&lt;td&gt;75.0%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;78.7%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;beats Claude (78.0%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tau2-bench Telecom&lt;/td&gt;
&lt;td&gt;92.8%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;98.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;no prompt tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FrontierMath Tier 4&lt;/td&gt;
&lt;td&gt;27.1%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;35.4%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;beats Claude (22.9%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ARC-AGI-2&lt;/td&gt;
&lt;td&gt;73.3%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;85.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+11.7pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MRCR v2 8-needle 512K-1M&lt;/td&gt;
&lt;td&gt;36.6%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;74.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+37.4pp (2x recovery)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The MRCR v2 long-context recovery is particularly impressive. GPT-5.4 was losing more than half the needles in the 512K-1M range; GPT-5.5 retains roughly three-quarters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where GPT-5.5 Is NOT #1
&lt;/h2&gt;

&lt;p&gt;This is the part most marketing posts skip. Per OpenAI's own published comparison tables:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;GPT-5.5&lt;/th&gt;
&lt;th&gt;Leader&lt;/th&gt;
&lt;th&gt;Lead Margin&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Pro&lt;/td&gt;
&lt;td&gt;58.6%&lt;/td&gt;
&lt;td&gt;Claude Opus 4.7 (64.3%)&lt;/td&gt;
&lt;td&gt;-5.7pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPQA Diamond&lt;/td&gt;
&lt;td&gt;93.6%&lt;/td&gt;
&lt;td&gt;Claude Opus 4.7 (94.2%)&lt;/td&gt;
&lt;td&gt;-0.6pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Humanity's Last Exam (with tools)&lt;/td&gt;
&lt;td&gt;52.2%&lt;/td&gt;
&lt;td&gt;Claude Opus 4.7 (54.7%)&lt;/td&gt;
&lt;td&gt;-2.5pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ARC-AGI-1 (Verified)&lt;/td&gt;
&lt;td&gt;95.0%&lt;/td&gt;
&lt;td&gt;Gemini 3.1 Pro (98.0%)&lt;/td&gt;
&lt;td&gt;-3.0pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BrowseComp&lt;/td&gt;
&lt;td&gt;84.4%&lt;/td&gt;
&lt;td&gt;Gemini 3.1 Pro (85.9%)&lt;/td&gt;
&lt;td&gt;-1.5pp&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;OpenAI itself notes in the announcement that SWE-Bench Pro has potential memorization concerns documented in the literature. Take any single benchmark with appropriate skepticism.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 1M Context Catch
&lt;/h2&gt;

&lt;p&gt;OpenAI markets GPT-5.5 with a "1M context window." The exact number per developer docs is &lt;strong&gt;1,050,000 tokens&lt;/strong&gt;. But this number depends heavily on where you use it.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Environment&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;API (&lt;code&gt;gpt-5.5&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;1,050,000 tokens&lt;/td&gt;
&lt;td&gt;developers.openai.com&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex (official)&lt;/td&gt;
&lt;td&gt;400,000 tokens&lt;/td&gt;
&lt;td&gt;OpenAI announcement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex (measured)&lt;/td&gt;
&lt;td&gt;258,400 tokens (bug report)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/openai/codex/issues/19319" rel="noopener noreferrer"&gt;openai/codex#19319&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max output&lt;/td&gt;
&lt;td&gt;128,000 tokens&lt;/td&gt;
&lt;td&gt;developers.openai.com&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Users in the GitHub issue are reporting "exceeds the context window" errors at unexpectedly low input sizes. If you're building tooling that depends on the full 1M window, validate the actual environment, not the marketing claim.&lt;/p&gt;
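
&lt;p&gt;One way to validate it: probe the effective limit empirically. A minimal sketch, assuming the Responses API and using tiktoken's &lt;code&gt;o200k_base&lt;/code&gt; as a stand-in tokenizer; note that each probe bills its input tokens, so this is not free.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Probe the *effective* context limit of the environment you're in,
# instead of trusting the headline number.
import tiktoken
from openai import BadRequestError, OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("o200k_base")   # approximation of the real tokenizer

def fits(n_words: int) -&gt; bool:
    """Send a filler prompt; False if the API rejects it as too long."""
    filler = "lorem " * n_words
    print(f"sending ~{len(enc.encode(filler)):,} tokens")
    try:
        client.responses.create(model="gpt-5.5", input=filler,
                                max_output_tokens=16)
        return True
    except BadRequestError:   # the "exceeds the context window" error
        return False

for n in (250_000, 400_000, 1_000_000):
    print(n, "ok" if fits(n) else "rejected")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;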

&lt;h2&gt;
  
  
  Pricing: 2x Increase + Long Context Premium
&lt;/h2&gt;

&lt;p&gt;The published API pricing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gpt-5.5
  Input:        $5.00 / 1M tokens
  Output:      $30.00 / 1M tokens
  Cached input: $0.50 / 1M tokens

gpt-5.5-pro (parallel test-time compute variant)
  Input:       $30.00 / 1M tokens
  Output:     $180.00 / 1M tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is &lt;strong&gt;2x GPT-5.4's input pricing and 3x its output pricing&lt;/strong&gt; ($2.50 input / $10 output).&lt;/p&gt;

&lt;p&gt;The hidden premium: &lt;strong&gt;inputs over 272K tokens get 2x input cost and 1.5x output cost&lt;/strong&gt;. So if you actually use the full 1M window, you're paying double on input. This makes "1M context is essentially priced twice" a fair characterization.&lt;/p&gt;
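
&lt;p&gt;The premium is easy to mis-estimate, so here it is as arithmetic. A minimal sketch using the published rates above; it assumes the higher rate applies to the whole request once you cross 272K, which is worth verifying against an actual invoice.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Long-context premium as code. Rates and the 272K threshold come from
# the published pricing above; whole-request billing is my assumption.
IN_RATE, OUT_RATE, THRESHOLD = 5.00, 30.00, 272_000   # $ per 1M tokens

def request_cost(input_tokens: int, output_tokens: int) -&gt; float:
    in_rate, out_rate = IN_RATE, OUT_RATE
    if input_tokens &gt; THRESHOLD:     # long-context premium kicks in
        in_rate *= 2.0               # 2x input
        out_rate *= 1.5              # 1.5x output
    return (input_tokens * in_rate + output_tokens * out_rate) / 1e6

print(request_cost(270_000, 8_000))   # $1.59 -- just under the line
print(request_cost(280_000, 8_000))   # $3.16 -- ~2x for +10K input tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;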

&lt;p&gt;Other pricing modifiers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Batch / Flex: 50% of standard&lt;/li&gt;
&lt;li&gt;Priority processing: 250% of standard&lt;/li&gt;
&lt;li&gt;Regional processing (data residency): +10%&lt;/li&gt;
&lt;li&gt;Codex Fast mode: 1.5x speed at 2.5x cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenAI argues that token efficiency improvements offset the price hike for many workloads. Your mileage will depend heavily on the work type.&lt;/p&gt;

&lt;h2&gt;
  
  
  Safety: AISI Found a Universal Jailbreak in 6 Hours
&lt;/h2&gt;

&lt;p&gt;The UK AI Security Institute (AISI) ran a 6-hour expert red team and found a universal jailbreak before launch. OpenAI says they fixed it before release. However, per Transformer News, &lt;strong&gt;AISI did not directly verify the fix in the final deployment configuration&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;GPT-5.5 is rated &lt;strong&gt;"High" on OpenAI's Preparedness Framework&lt;/strong&gt; for both cybersecurity and biology (below "Critical" but above prior models). OpenAI launched a &lt;strong&gt;Bio Bug Bounty&lt;/strong&gt; program for finding biology safeguard bypasses.&lt;/p&gt;

&lt;p&gt;For cybersecurity, OpenAI is positioning defensively via the &lt;strong&gt;Trusted Access for Cyber&lt;/strong&gt; program — vetted defenders get expanded access to GPT-5.5's cyber capabilities. SecureBio evaluation reportedly found "wet-lab virology troubleshooting assistance above expert level," which is the basis for the High rating.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Guidance (Day 4)
&lt;/h2&gt;

&lt;p&gt;Real-world feedback is still thin. Based on what OpenAI has published:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use GPT-5.5 when&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agentic coding workflows (Terminal-Bench-style tasks)&lt;/li&gt;
&lt;li&gt;Computer use / OS automation (OSWorld-Verified)&lt;/li&gt;
&lt;li&gt;Long-context recall in the 512K-1M range&lt;/li&gt;
&lt;li&gt;Tier-4 frontier mathematics&lt;/li&gt;
&lt;li&gt;Knowledge work where GDPval is representative&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consider Claude Opus 4.7 when&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pure SWE-Bench Pro-style coding tasks&lt;/li&gt;
&lt;li&gt;Academic reasoning (GPQA Diamond)&lt;/li&gt;
&lt;li&gt;Humanity's Last Exam-style questions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cost optimization&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stay below 272K input tokens to avoid the long context premium&lt;/li&gt;
&lt;li&gt;Use Batch/Flex modes for 50% off when latency is flexible&lt;/li&gt;
&lt;li&gt;Cached input drops cost to $0.50/1M (90% savings)&lt;/li&gt;
&lt;li&gt;In Codex, plan for ~258-400K context, not 1M&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Caveats Worth Repeating
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;All benchmarks above are at reasoning effort &lt;code&gt;xhigh&lt;/code&gt;. Default API settings will likely produce lower scores.&lt;/li&gt;
&lt;li&gt;We're 4 days post-launch. External reproduction and independent evaluation are pending.&lt;/li&gt;
&lt;li&gt;OpenAI's comparison tables have empty cells (&lt;code&gt;-&lt;/code&gt;) for some Claude/Gemini entries, so "SOTA across the board" is an overstatement of what was actually published.&lt;/li&gt;
&lt;li&gt;Korean and other non-English language performance is not specifically benchmarked in the announcement.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;Introducing GPT-5.5 - OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/gpt-5-5-system-card/" rel="noopener noreferrer"&gt;GPT-5.5 System Card&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/api/docs/models/gpt-5.5" rel="noopener noreferrer"&gt;GPT-5.5 API Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cnbc.com/2026/04/23/openai-announces-latest-artificial-intelligence-model.html" rel="noopener noreferrer"&gt;CNBC coverage (Ashley Capoot)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/04/23/openai-chatgpt-gpt-5-5-ai-model-superapp/" rel="noopener noreferrer"&gt;TechCrunch coverage (Lucas Ropek)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://fortune.com/2026/04/23/openai-releases-gpt-5-5/" rel="noopener noreferrer"&gt;Fortune coverage (Sharon Goldman)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.helpnetsecurity.com/2026/04/24/openai-gpt-5-5-cybersecurity-safeguards/" rel="noopener noreferrer"&gt;Help Net Security - cybersecurity safeguards&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.transformernews.ai/p/openai-shouldnt-be-deciding-if-its-gpt-55" rel="noopener noreferrer"&gt;Transformer News - AISI red team&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/openai/codex/issues/19319" rel="noopener noreferrer"&gt;GitHub openai/codex#19319 - context window bug&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Disclaimer: AI-assisted research digest. Verify primary sources before making decisions. We're 4 days into the release; expect updates as third-party evaluations come in.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Hermes Agent v0.11.0: How a Self-Improving Open-Source AI Agent Hit 105K GitHub Stars in 7 Weeks</title>
      <dc:creator>정상록</dc:creator>
      <pubDate>Mon, 27 Apr 2026 00:37:51 +0000</pubDate>
      <link>https://dev.to/_46ea277e677b888e0cd13/hermes-agent-v0110-how-a-self-improving-open-source-ai-agent-hit-105k-github-stars-in-7-weeks-112a</link>
      <guid>https://dev.to/_46ea277e677b888e0cd13/hermes-agent-v0110-how-a-self-improving-open-source-ai-agent-hit-105k-github-stars-in-7-weeks-112a</guid>
      <description>&lt;h1&gt;
  
  
  Hermes Agent v0.11.0: How a Self-Improving Open-Source AI Agent Hit 105K GitHub Stars in 7 Weeks
&lt;/h1&gt;

&lt;p&gt;If you missed it: &lt;strong&gt;Nous Research dropped Hermes Agent v0.11.0 on April 23, 2026&lt;/strong&gt;, and the project crossed 105,000 GitHub stars in just 7 weeks since its February 25 launch. That's faster than AutoGPT, faster than CrewAI, and arguably the most significant release in the open-source agent space this year.&lt;/p&gt;

&lt;p&gt;I spent the weekend digging into the actual code and benchmarks. Here's what I found, and why I think the GEPA self-improvement loop is the part most articles are underselling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters more than typical "another agent framework" launches
&lt;/h2&gt;

&lt;p&gt;Most "agent frameworks" published in 2024-2025 were orchestration layers — they call LLM APIs in sequence and manage state. Hermes Agent does that, but adds something new: &lt;strong&gt;the agent literally rewrites its own prompts and skills as it works&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This isn't a marketing claim. It's GEPA (Generative Embedding Prompt Adaptation), accepted as an Oral paper at ICLR 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  The GEPA Loop in Detail
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task complete (5+ tool calls)
      ↓
Trace analysis (which tools, in what order, with what context)
      ↓
Skill file auto-generated (.md format, human-readable)
      ↓
System prompt auto-tuned (small deltas)
      ↓
SQLite FTS5 index updated for retrieval
      ↓
Next similar task → 40% faster (after ~20 skills accumulated)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The skill files end up in &lt;code&gt;~/.hermes/skills/&lt;/code&gt; and look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;market-research-pipeline&lt;/span&gt;
&lt;span class="na"&gt;trigger_keywords&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;market&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;trends"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;competitive&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;analysis"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;tool_sequence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;web-search&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;extract-content&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;summarize&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;translate&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;image-gen&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Market Research Pipeline&lt;/span&gt;

When user asks for market research:
&lt;span class="p"&gt;1.&lt;/span&gt; Run web-search with date filter (last 30 days)
&lt;span class="p"&gt;2.&lt;/span&gt; Extract from top 5 results
&lt;span class="p"&gt;3.&lt;/span&gt; Summarize in target language
&lt;span class="p"&gt;4.&lt;/span&gt; Generate infographic if data is quantitative
&lt;span class="p"&gt;5.&lt;/span&gt; Format as markdown report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Crucially, &lt;strong&gt;you can edit these files manually&lt;/strong&gt;. The agent's "memory" is just a directory of markdown files. No proprietary vector store, no opaque embeddings.&lt;/p&gt;
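
&lt;p&gt;To make that concrete, here's how a markdown-skill directory plus a SQLite FTS5 index can work end to end. A minimal working miniature, not Hermes' actual schema; the table layout and query are illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# "Markdown files + SQLite FTS5" memory in miniature. Schema is
# illustrative; FTS5 ships with most modern Python sqlite3 builds.
import pathlib
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE skills USING fts5(name, body)")

# Index each skill file exactly as it sits on disk -- no embeddings.
for path in pathlib.Path.home().glob(".hermes/skills/*.md"):
    db.execute("INSERT INTO skills VALUES (?, ?)",
               (path.stem, path.read_text()))

# Before the next task: rank candidate skills by FTS5's built-in BM25.
rows = db.execute(
    "SELECT name FROM skills WHERE skills MATCH ? ORDER BY rank LIMIT 3",
    ("market research",),
).fetchall()
print(rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;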

&lt;h2&gt;
  
  
  Benchmarks: Where GEPA Actually Wins
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Hermes Agent (GEPA)&lt;/th&gt;
&lt;th&gt;Comparison Point&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MATH&lt;/td&gt;
&lt;td&gt;93%&lt;/td&gt;
&lt;td&gt;Base CoT on same model: 67% (+26pt)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AIME-2025&lt;/td&gt;
&lt;td&gt;+12%&lt;/td&gt;
&lt;td&gt;vs MIPROv2, the leading prompt optimizer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GEPA vs GRPO&lt;/td&gt;
&lt;td&gt;avg +6%, max +20%&lt;/td&gt;
&lt;td&gt;with 35x fewer rollouts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RefusalBench (Hermes 4.3 36B)&lt;/td&gt;
&lt;td&gt;57%+&lt;/td&gt;
&lt;td&gt;GPT-4o / Claude: ~17%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The RefusalBench result is the one I'd verify independently if I were betting production budget on this. A score 3.4x higher than the major closed models (i.e., far fewer false refusals) is a big claim. But if it holds, the enterprise implications are significant.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's New in v0.11 Specifically
&lt;/h2&gt;

&lt;h3&gt;
  
  
  React/Ink TUI v2 (Complete Rewrite)
&lt;/h3&gt;

&lt;p&gt;The terminal interface was completely rebuilt on React + Ink. New capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sticky composer&lt;/strong&gt;: Input area stays at the bottom even with long output streams&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OSC-52 clipboard&lt;/strong&gt;: Click any code block to copy to system clipboard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live streaming&lt;/strong&gt;: Tool call results render in real-time with progress indicators&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've used the TUI in v0.10, this feels like a different product.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/steer&lt;/code&gt; — Mid-Execution Intervention
&lt;/h3&gt;

&lt;p&gt;This one's underrated. Traditional agent frameworks make you wait for the entire run to complete before you can correct course. With &lt;code&gt;/steer&lt;/code&gt;, you intervene right before the next tool call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent: "I'll translate all 10 articles next."
You: /steer only translate 5
Agent: [adjusts plan, translates 5]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The implementation inspects a steering queue at the tool-dispatch boundary, right before each call. Clean.&lt;/p&gt;
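
&lt;p&gt;If you want the shape of that pattern, here's a minimal sketch: a generic dispatch loop that drains a steering queue before every tool call. This is not Hermes' actual code, and &lt;code&gt;revise_plan&lt;/code&gt; is a toy stand-in.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# The /steer pattern in miniature: check the queue at the dispatch
# boundary, before every tool call. Generic sketch, not Hermes' code.
import queue

steer_queue: "queue.Queue[str]" = queue.Queue()

def dispatch(plan: list[str]) -&gt; None:
    """Run planned tool calls, letting steer messages rewrite the plan."""
    while plan:
        # Boundary check: drain any /steer input that arrived mid-run.
        while not steer_queue.empty():
            plan = revise_plan(plan, steer_queue.get_nowait())
        run_tool(plan.pop(0))

def revise_plan(plan: list[str], instruction: str) -&gt; list[str]:
    print(f"steering: {instruction}")
    return plan[:5]                   # toy revision for the demo

def run_tool(step: str) -&gt; None:
    print(f"tool call: {step}")

steer_queue.put("only translate 5")
dispatch([f"translate article {i}" for i in range(1, 11)])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;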

&lt;h3&gt;
  
  
  Unlimited Sub-Agent Recursion
&lt;/h3&gt;

&lt;p&gt;Sub-agents can spawn sub-agents to arbitrary depth and breadth. Example pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;main-agent
  ├── researcher
  │   ├── web-scraper-agent
  │   └── pdf-extractor-agent
  ├── analyst
  │   └── data-validator-agent
  └── writer
      └── editor-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In v0.10 this was capped. v0.11 removes the cap.&lt;/p&gt;
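
&lt;p&gt;Structurally, removing the cap just means the spawn call can recurse. A generic asyncio sketch of the idea; the &lt;code&gt;decompose&lt;/code&gt; table is hypothetical and none of this is Hermes' real spawning API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Unbounded sub-agent recursion, generically: any agent can await a
# fresh team of children. Task decomposition here is a toy lookup.
import asyncio

async def run_agent(role: str, task: str, depth: int = 0) -&gt; str:
    subtasks = decompose(task)        # empty when the task is atomic
    if subtasks:
        # Spawn one child per subtask; depth is unbounded in v0.11.
        results = await asyncio.gather(
            *(run_agent(f"{role}/{s}", s, depth + 1) for s in subtasks))
        return f"{role}: merged {len(results)} child results"
    return f"{role}: did {task!r} at depth {depth}"

def decompose(task: str) -&gt; list[str]:
    return {"report": ["research", "write"],
            "research": ["scrape", "extract"]}.get(task, [])

print(asyncio.run(run_agent("main", "report")))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;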

&lt;h3&gt;
  
  
  Five New Model Providers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5 (Codex OAuth)&lt;/td&gt;
&lt;td&gt;OpenAI's latest coding-specialized model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS Bedrock (Converse API)&lt;/td&gt;
&lt;td&gt;Enterprise AWS infrastructure integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA NIM&lt;/td&gt;
&lt;td&gt;NVIDIA inference containers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arcee AI&lt;/td&gt;
&lt;td&gt;Small specialized models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vercel ai-gateway&lt;/td&gt;
&lt;td&gt;Multi-provider routing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The AWS Bedrock integration is the one I'd watch closely. It enables truly private deployments inside enterprise VPCs — which is what most regulated industries need.&lt;/p&gt;

&lt;h3&gt;
  
  
  QQBot (17th Messaging Platform)
&lt;/h3&gt;

&lt;p&gt;Tencent QQ integration is the new platform. Combined with Discord, Slack, Telegram, LINE, WhatsApp, WeChat, and the rest, Hermes Agent now covers basically every major chat surface globally.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost Story That Should Make You Pay Attention
&lt;/h2&gt;

&lt;p&gt;Per Nous Research's published numbers, equivalent enterprise tasks cost:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Volume&lt;/th&gt;
&lt;th&gt;Hermes Agent (Local 4.3 36B)&lt;/th&gt;
&lt;th&gt;GPT-5.5 / Claude (API)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single 5-tool task&lt;/td&gt;
&lt;td&gt;$0.001&lt;/td&gt;
&lt;td&gt;$0.02 - $0.09&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,000 tasks/day&lt;/td&gt;
&lt;td&gt;$1&lt;/td&gt;
&lt;td&gt;$20 - $90&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly&lt;/td&gt;
&lt;td&gt;$30&lt;/td&gt;
&lt;td&gt;$600 - $2,700&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's a &lt;strong&gt;20-90x cost differential&lt;/strong&gt;. For a 10-person engineering team running agents 24/7, the math becomes obvious quickly. The catch: you need a 24GB+ GPU for local Hermes 4.3 36B. If you're running on API providers, you lose most of the cost advantage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Highlights
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────┐
│  Messaging Platforms (17)               │
│  Telegram / Discord / Slack / LINE / ... │
└─────────────────────────────────────────┘
                  ↓
┌─────────────────────────────────────────┐
│  Hermes Agent Core                       │
│                                          │
│  ┌────────────────────────────────────┐ │
│  │  Memory (3-tier)                    │ │
│  │  • Short: in-memory context         │ │
│  │  • Medium: SQLite FTS5 (sessions)   │ │
│  │  • Long: skills/personas/profiles   │ │
│  └────────────────────────────────────┘ │
│                                          │
│  ┌────────────────────────────────────┐ │
│  │  GEPA Loop                          │ │
│  │  • Trace capture                    │ │
│  │  • Skill generation                 │ │
│  │  • Prompt tuning                    │ │
│  └────────────────────────────────────┘ │
│                                          │
│  ┌────────────────────────────────────┐ │
│  │  Tool Gateway (v0.10+)              │ │
│  │  • Web search (Firecrawl)           │ │
│  │  • Image gen (FAL FLUX 2 Pro)       │ │
│  │  • TTS (OpenAI)                     │ │
│  │  • Browser (Browser Use)            │ │
│  └────────────────────────────────────┘ │
└─────────────────────────────────────────┘
                  ↓
┌─────────────────────────────────────────┐
│  Model Providers (multiplexed)          │
│  Local 4.3 / GPT-5.5 / Claude / Gemini  │
└─────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Getting Started in 30 Minutes
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @nousresearch/hermes-agent

&lt;span class="c"&gt;# Initialize&lt;/span&gt;
hermes init my-agent
&lt;span class="nb"&gt;cd &lt;/span&gt;my-agent

&lt;span class="c"&gt;# Choose provider (pick one)&lt;/span&gt;
hermes config &lt;span class="nb"&gt;set &lt;/span&gt;provider &lt;span class="nb"&gt;local&lt;/span&gt;        &lt;span class="c"&gt;# Free, needs GPU&lt;/span&gt;
hermes config &lt;span class="nb"&gt;set &lt;/span&gt;provider openai        &lt;span class="c"&gt;# Easy, costs money&lt;/span&gt;
hermes config &lt;span class="nb"&gt;set &lt;/span&gt;provider anthropic
hermes config &lt;span class="nb"&gt;set &lt;/span&gt;provider google

&lt;span class="c"&gt;# Connect Telegram (easiest messaging platform)&lt;/span&gt;
hermes connect telegram &lt;span class="nt"&gt;--token&lt;/span&gt; &lt;span class="nv"&gt;$TELEGRAM_BOT_TOKEN&lt;/span&gt;

&lt;span class="c"&gt;# Start&lt;/span&gt;
hermes start

&lt;span class="c"&gt;# Now message your bot on Telegram&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Tool Gateway (web search, image gen, TTS, browser):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes login nous-portal
hermes config &lt;span class="nb"&gt;set &lt;/span&gt;tool-gateway nous-portal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or BYO API keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes config &lt;span class="nb"&gt;set &lt;/span&gt;tools.web-search.firecrawl-key &lt;span class="nv"&gt;$FIRECRAWL_KEY&lt;/span&gt;
hermes config &lt;span class="nb"&gt;set &lt;/span&gt;tools.image.fal-key &lt;span class="nv"&gt;$FAL_KEY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What I'm Still Watching
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Independent benchmark verification&lt;/strong&gt;: GEPA numbers come from Nous Research themselves. Would be valuable to see third-party reproduction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GEPA in production&lt;/strong&gt;: Does the 40% speedup on repeat tasks materialize after 1-3 months of real usage, or is it a benchmark artifact?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Gateway availability&lt;/strong&gt;: Nous Portal is currently the easiest path. Is it stable enough for production SLAs?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Verdict
&lt;/h2&gt;

&lt;p&gt;Hermes Agent v0.11.0 is the most significant open-source agent release of 2026 so far. The GEPA self-improvement loop is genuinely novel, the 20-90x cost advantage opens up agent use cases that didn't pencil out before, and the 17-platform messaging integration makes consumer-facing deployments trivial.&lt;/p&gt;

&lt;p&gt;If you're building agents in 2026, you owe it to yourself to spend a weekend with this.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/" rel="noopener noreferrer"&gt;Official site&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/nousresearch/hermes-agent" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/NousResearch/hermes-agent/releases/tag/v2026.4.23" rel="noopener noreferrer"&gt;v0.11.0 release notes (April 23, 2026)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/NousResearch/hermes-agent/releases/tag/v2026.4.16" rel="noopener noreferrer"&gt;v0.10.0 release notes (April 16, 2026)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/NousResearch/Hermes-4.3-36B" rel="noopener noreferrer"&gt;Hermes 4.3 36B model card&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nousresearch.com/introducing-hermes-4-3" rel="noopener noreferrer"&gt;Nous Research blog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Everything You Need to Know About Codex CLI. The Main Tool Is Still Claude Code</title>
      <dc:creator>정상록</dc:creator>
      <pubDate>Mon, 27 Apr 2026 00:35:44 +0000</pubDate>
      <link>https://dev.to/_46ea277e677b888e0cd13/codex-cli-da-alryeodeurilgeyo-geuraedo-meineun-claude-codeibnida-5cao</link>
      <guid>https://dev.to/_46ea277e677b888e0cd13/codex-cli-da-alryeodeurilgeyo-geuraedo-meineun-claude-codeibnida-5cao</guid>
      <description>&lt;h1&gt;
  
  
  Codex CLI 다 알려드릴게요. 그래도 메인은 Claude Code입니다
&lt;/h1&gt;

&lt;p&gt;요즘 OpenAI Codex CLI에 대한 관심이 많습니다. "Claude Code에서 갈아타야 하는가"라는 질문도 자주 보이고요. 결론부터 말씀드리면, &lt;strong&gt;갈아탈 필요 없습니다.&lt;/strong&gt; 코덱스 CLI는 잘 만든 도구이지만, 메인 도구를 바꿀 정도는 아닙니다. 메인은 여전히 Claude Code입니다.&lt;/p&gt;

&lt;p&gt;이 글에서는 두 가지를 다룹니다. 먼저 Codex CLI의 핵심 활용법을 정리하고, 그 다음 왜 여전히 Claude Code가 메인이어야 하는지 다섯 가지 근거로 분석합니다.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Claude Code 품질 저하" 논란부터 정리합니다
&lt;/h2&gt;

&lt;p&gt;코덱스로 옮겨가려는 분들 대부분이 한 번쯤 겪으셨을 이슈입니다. 한동안 Claude Code 품질이 떨어진다는 체감이 분명히 있었습니다. Anthropic이 공식적으로 원인을 밝혔고, 세 가지가 함께 작용했다고 발표했습니다.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;추론 난이도 설정이 중간에서 높음으로 변경됐던 시기&lt;/li&gt;
&lt;li&gt;캐싱 동작 변경&lt;/li&gt;
&lt;li&gt;시스템 프롬프트 변경&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;이 세 가지 모두 원상 복구가 완료된 상태입니다. 즉, 많은 사용자가 "코덱스로 갈아타야 하나" 고민할 무렵 Claude Code는 이미 정상화되어 있었습니다. 일시적 이슈에 끌려 메인 도구를 바꾸는 의사결정은 신중해야 합니다.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Codex CLI Usage
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installation and First Run
&lt;/h3&gt;

&lt;p&gt;Installation is a single npm command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @openai/codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Type &lt;code&gt;codex&lt;/code&gt; in a terminal to launch it. The UI closely resembles Claude Code, so the adaptation cost is low: the keyboard shortcut scheme is similar, and so is the interactive flow.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Three Permission Modes
&lt;/h3&gt;

&lt;p&gt;You set the permission mode with the &lt;code&gt;/permissions&lt;/code&gt; command.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;모드&lt;/th&gt;
&lt;th&gt;동작&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Default&lt;/td&gt;
&lt;td&gt;파일 읽기/수정은 자유. 외부 인터넷 접근이나 작업 디렉토리 외부 접근은 사용자 승인 필요&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto Review&lt;/td&gt;
&lt;td&gt;자율 동작 + 서브에이전트가 명령어의 안전성을 자체 판단&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full Access&lt;/td&gt;
&lt;td&gt;YOLO 모드. 모든 명령을 자율 실행&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Default or Auto Review is enough for everyday work. Use Full Access only inside an isolated container or VM.&lt;/p&gt;

&lt;h3&gt;
  
  
  Slash Commands at a Glance
&lt;/h3&gt;

&lt;p&gt;These are the commands you will reach for most often in Codex CLI.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/statusline&lt;/code&gt;: shows model, reasoning level, context, fast-mode status, and project name on one line&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/copy&lt;/code&gt; (shortcut Ctrl+O): copies the last response to the clipboard&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/clear&lt;/code&gt;: resets context to 0%&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/resume&lt;/code&gt;: restores a previous session. &lt;code&gt;/resume --all&lt;/code&gt; includes sessions from other projects&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/fork&lt;/code&gt;: clones the current session. Useful for branching work&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/status&lt;/code&gt;: shows context window usage and the weekly limit&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/model&lt;/code&gt;: switches models and sets the reasoning level (low/medium/high)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/fast&lt;/code&gt;: toggles fast mode&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/plan&lt;/code&gt; (shortcut Shift+Tab): writes a plan document first, code later&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/review&lt;/code&gt;: 4 review modes (branch diff, pre-commit self-review, custom instructions, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/compact&lt;/code&gt;: compresses the conversation history into a fresh context holding only the needed background&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Keyboard Shortcuts
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tab&lt;/strong&gt;: queues a follow-up task. You can line up the next task while the current one is still running&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ctrl+R&lt;/strong&gt;: searches prompt history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ctrl+C (while composing)&lt;/strong&gt;: temporarily deletes the prompt. Restore it with the up arrow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ctrl+C (while idle)&lt;/strong&gt;: quits Codex&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Up/Down arrows&lt;/strong&gt;: browse prompt history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;!&lt;/strong&gt;: bash mode. Runs shell commands directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$&lt;/strong&gt;: invokes a Codex skill (e.g., &lt;code&gt;$image&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stacking a queue with Tab is surprisingly convenient: you jot down the next step while the AI is still working.&lt;/p&gt;

&lt;h3&gt;
  
  
  Model Lineup
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;모델&lt;/th&gt;
&lt;th&gt;특징&lt;/th&gt;
&lt;th&gt;크레딧&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5&lt;/td&gt;
&lt;td&gt;최신, 패스트 모드 지원&lt;/td&gt;
&lt;td&gt;2.5배&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.4&lt;/td&gt;
&lt;td&gt;패스트 모드 지원&lt;/td&gt;
&lt;td&gt;2배&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.3 Spark&lt;/td&gt;
&lt;td&gt;속도 빠름&lt;/td&gt;
&lt;td&gt;별도 리밋&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;대부분의 작업은 5.4로 충분합니다. 5.5는 정말 어려운 작업에만 쓰는 게 효율적입니다.&lt;/p&gt;

&lt;h3&gt;
  
  
  Image Generation with the $image Skill
&lt;/h3&gt;

&lt;p&gt;Calling the &lt;code&gt;$image&lt;/code&gt; skill inside Codex CLI generates images. It uses the Image 2.0 model, and Korean text rendering is close to flawless. It's included in the Codex plan, so there's nothing extra to pay for.&lt;/p&gt;

&lt;p&gt;It's handy for quickly producing logos or banners without leaving your development workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Plan Mode Workflow
&lt;/h3&gt;

&lt;p&gt;Enter with &lt;code&gt;Shift+Tab&lt;/code&gt; or &lt;code&gt;/plan&lt;/code&gt;. The flow looks like this.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write the plan document first&lt;/li&gt;
&lt;li&gt;If anything is ambiguous, the agent asks you clarifying questions&lt;/li&gt;
&lt;li&gt;Review the finished plan, then pick one of three execution options

&lt;ul&gt;
&lt;li&gt;Execute right away in the current session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reset the context and execute from the plan alone&lt;/strong&gt; (recommended)&lt;/li&gt;
&lt;li&gt;Keep revising the plan&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The recommended pattern is the second. The consensus is that "write a plan per feature, reset the context, then execute" is the most stable approach for both consistency and quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Review Mode
&lt;/h3&gt;

&lt;p&gt;Enter with the &lt;code&gt;/review&lt;/code&gt; command.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compare the current branch against the base branch&lt;/li&gt;
&lt;li&gt;Self-review uncommitted changes&lt;/li&gt;
&lt;li&gt;Apply custom review instructions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Used as a self-review right before opening a PR, it buys you one extra round of checks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automated PR Review with Codex Cloud
&lt;/h3&gt;

&lt;p&gt;This is one of Codex's genuinely strong features.&lt;/p&gt;

&lt;p&gt;Go to &lt;code&gt;chat.openai.com/codex&lt;/code&gt; → Connectors → connect GitHub, then turn on the code review option. You can pick from three triggers.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When a PR opens&lt;/li&gt;
&lt;li&gt;On push&lt;/li&gt;
&lt;li&gt;Smart trigger (based on the size/type of changes)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Codex bot comments on the PR, and if you ask it to fix an issue, Codex Cloud commits the fix automatically. Honestly, this part is convenient.&lt;/p&gt;

&lt;h2&gt;
  
  
  So Why Is Claude Code Still the Main Tool?
&lt;/h2&gt;

&lt;p&gt;Up to this point, Codex may look attractive. But for the following five reasons, the main tool is still Claude Code.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Responses Are Faster
&lt;/h3&gt;

&lt;p&gt;In day-to-day development, perceived response speed is faster with Claude Code. Codex only reaches a similar level with fast mode on, and fast mode burns 2-2.5x the credits. In other words, matching the speed costs more than twice as much.&lt;/p&gt;

&lt;p&gt;Coding productivity is ultimately the accumulated "time spent waiting for one response." Across hundreds of interactions a day, a 0.5-second gap compounds into a real difference; the back-of-the-envelope sketch below makes it concrete.&lt;/p&gt;
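
&lt;p&gt;A minimal sketch with assumed numbers: 300 interactions a day is a guess, not a measurement, and the 2.5x fast-mode multiplier is the GPT-5.5 figure from the table above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Assumed (not measured) inputs: 300 interactions/day, the 0.5 s gap
# claimed above, and fast mode's 2.5x credit multiplier from the table.
interactions = 300
gap_s = 0.5
fast_credit_multiplier = 2.5

daily_wait_s = interactions * gap_s
yearly_hours = daily_wait_s * 250 / 3600      # 250 workdays
print(f"extra waiting: {daily_wait_s:.0f} s/day, ~{yearly_hours:.0f} h/year")
print(f"closing the gap with fast mode costs {fast_credit_multiplier}x credits")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;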

&lt;h3&gt;
  
  
  2. More Stable on Large Refactors
&lt;/h3&gt;

&lt;p&gt;Claude Code is a step ahead at understanding an entire codebase and applying consistent changes across it. Interestingly, even people who actively recommend Codex often admit they "use Claude Code for daily development and large refactors."&lt;/p&gt;

&lt;p&gt;This isn't just a difference in model capability; it's a difference in tool design philosophy. Claude Code was built from the start with codebase-scale work in mind.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The Ecosystem Is Overwhelming
&lt;/h3&gt;

&lt;p&gt;This is the decisive one.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP standardization&lt;/strong&gt;: Anthropic's Model Context Protocol has become the de facto industry standard, serving as the common language for external tool integrations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sub-agents / Agent Teams&lt;/strong&gt;: compose role-specific agents into complex workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skill system&lt;/strong&gt;: build and share reusable units of automation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hooks&lt;/strong&gt;: slot automation into lifecycle events such as SessionStart, PreToolUse, and PostToolUse&lt;/li&gt;
&lt;li&gt;The
&lt;strong&gt;Claude Code Plugin ecosystem&lt;/strong&gt; is growing fast&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It will take Codex considerable time to reach this level.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Codex's Strengths Sit in the "Supporting Role"
&lt;/h3&gt;

&lt;p&gt;The two advantages most often cited when recommending Codex are these.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Token cost savings&lt;/li&gt;
&lt;li&gt;Long-running autonomous execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But Claude Code covers both well enough.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Token savings&lt;/strong&gt;: the flat-rate Claude Pro/Max plans offer near-unlimited usage in practice, so per-token price comparisons matter less&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-running autonomous execution&lt;/strong&gt;: the same scenarios can be built with Claude Code's SDK, headless mode, and background tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the end, Codex's differentiation narrows to the "second opinion": a supporting tool for when you want another perspective on a code analysis, or a different model reviewing a PR.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Add Tools, but Keep Your Anchor
&lt;/h3&gt;

&lt;p&gt;Using several AI tools side by side is fine. But swapping your main tool brings learning costs, workflow rebuilding, and team-sync overhead. "I heard Codex was better, switched, and spent the whole month just adapting" is not a rare story.&lt;/p&gt;

&lt;p&gt;The recommended setup right now is clear.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Main&lt;/strong&gt;: Claude Code (daily development + refactoring + automation pipelines)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support&lt;/strong&gt;: Codex (specific PR reviews, the occasional second-perspective code analysis)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Recommended Usage Scenarios
&lt;/h2&gt;

&lt;p&gt;Here are the moments when Codex is most effective as a supporting tool.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;상황&lt;/th&gt;
&lt;th&gt;도구&lt;/th&gt;
&lt;th&gt;이유&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;일상적인 코드 작성&lt;/td&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;응답 속도, 친숙한 UX&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;대규모 리팩토링&lt;/td&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;코드베이스 이해도&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;자동화 파이프라인 구축&lt;/td&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;MCP, 서브에이전트, 스킬&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;중요 PR의 second opinion&lt;/td&gt;
&lt;td&gt;Codex Cloud&lt;/td&gt;
&lt;td&gt;다른 모델 시각의 리뷰&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;가끔의 plan 검토&lt;/td&gt;
&lt;td&gt;Codex &lt;code&gt;/plan&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;다른 설계 시각&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;격리된 백엔드 잡 자동 실행&lt;/td&gt;
&lt;td&gt;둘 다 가능&lt;/td&gt;
&lt;td&gt;환경에 맞춰 선택&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;이 구도가 현재 가장 합리적입니다.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Codex CLI is a well-made tool. Plan mode, Review mode, and Codex Cloud are all genuinely attractive. But none of it justifies changing your main tool.&lt;/p&gt;

&lt;p&gt;Claude Code's transient quality regression has already been fixed, and its response speed and ecosystem remain a step ahead. Investing the time you would spend migrating to Codex into using Claude Code better will buy you a far larger productivity gain.&lt;/p&gt;

&lt;p&gt;Add tools, but don't shake your anchor. That is the fastest way to protect your productivity.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Schedule a One-Off Cloud Job in Claude Code with /schedule</title>
      <dc:creator>정상록</dc:creator>
      <pubDate>Sat, 25 Apr 2026 04:54:05 +0000</pubDate>
      <link>https://dev.to/_46ea277e677b888e0cd13/schedule-a-one-off-cloud-job-in-claude-code-with-schedule-gk0</link>
      <guid>https://dev.to/_46ea277e677b888e0cd13/schedule-a-one-off-cloud-job-in-claude-code-with-schedule-gk0</guid>
      <description>&lt;h1&gt;
  
  
  Schedule a One-Off Cloud Job in Claude Code with &lt;code&gt;/schedule&lt;/code&gt;
&lt;/h1&gt;

&lt;p&gt;There are two flavors of automation tools: alarm clocks that ring every day at the same time (cron), and parcel pickup reservations that fire exactly once. Both are useful. The trick is having the right tool for each.&lt;/p&gt;

&lt;p&gt;Anthropic's &lt;a href="https://code.claude.com/docs/en/routines" rel="noopener noreferrer"&gt;Claude Code Routines&lt;/a&gt;, launched on April 14, 2026, recently added the second flavor under the name &lt;strong&gt;"Schedule a One-Off Run."&lt;/strong&gt; It fires a routine &lt;strong&gt;once&lt;/strong&gt; at a specific future timestamp, then auto-disables.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's a Routine?
&lt;/h2&gt;

&lt;p&gt;A routine is a saved Claude Code configuration — a self-contained prompt plus one or more repositories plus connectors — that runs autonomously on Anthropic's cloud. Your laptop can be closed and it still fires.&lt;/p&gt;

&lt;p&gt;A routine packages five things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt&lt;/strong&gt; — the self-contained instructions Claude receives at every fire&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repositories&lt;/strong&gt; — cloned at the start of each run from default branch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment&lt;/strong&gt; — network access, env vars, setup script&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connectors&lt;/strong&gt; — Slack, Linear, Google Drive (MCP)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Triggers&lt;/strong&gt; — schedule, API (&lt;code&gt;/fire&lt;/code&gt; endpoint), or GitHub events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are three trigger types: &lt;code&gt;Schedule&lt;/code&gt;, &lt;code&gt;API&lt;/code&gt;, and &lt;code&gt;GitHub events&lt;/code&gt;. The schedule trigger has two flavors — recurring (cron-like) and &lt;strong&gt;one-off&lt;/strong&gt; (today's topic).&lt;/p&gt;

&lt;h2&gt;
  
  
  How One-Off Runs Work
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;A one-off schedule fires the routine a single time at a specific timestamp.&lt;br&gt;
— Anthropic docs&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After firing, the routine is marked &lt;strong&gt;"Ran"&lt;/strong&gt; in the web UI. To run it again, you have to edit the routine and set a new one-off time, or create a brand new routine.&lt;/p&gt;

&lt;p&gt;Timezones are handled for you. You enter a local time, it gets stored as UTC, and Anthropic guarantees wall-clock execution regardless of where the underlying infra lives.&lt;/p&gt;
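
&lt;p&gt;You can sanity-check the conversion the docs describe with nothing but the standard library. The timestamp below mirrors the confirmation output shown later in this post:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Reproducing the local-to-UTC conversion: enter local time, store UTC.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

local_fire = datetime(2026, 5, 9, 15, 0, tzinfo=ZoneInfo("Asia/Seoul"))
utc_fire = local_fire.astimezone(timezone.utc)

print(local_fire.isoformat())   # 2026-05-09T15:00:00+09:00
print(utc_fire.isoformat())     # 2026-05-09T06:00:00+00:00
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;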

&lt;h2&gt;
  
  
  Using &lt;code&gt;/schedule&lt;/code&gt; (CLI)
&lt;/h2&gt;

&lt;p&gt;Inside a Claude Code session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/schedule in 2 weeks, open a cleanup PR that removes the SIGNUP_V2_ENABLED flag

/schedule tomorrow at 9am, summarize yesterday's merged PRs

/schedule next Monday at 3pm KST, run smoke tests on production
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude converts the natural-language timestamp into an absolute one and asks for confirmation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scheduled to fire at: 2026-05-09T15:00:00+09:00 (2026-05-09 06:00 UTC)
Repositories: my-app
Prompt summary: Remove SIGNUP_V2_ENABLED feature flag and open cleanup PR
Confirm? [y/N]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If GitHub isn't connected yet, you'll be guided through &lt;code&gt;/web-setup&lt;/code&gt;. Adding API or GitHub-event triggers requires the web UI (&lt;code&gt;claude.ai/code/routines&lt;/code&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Quota Detail That Matters
&lt;/h2&gt;

&lt;p&gt;Recurring routines hit a &lt;strong&gt;daily firing cap&lt;/strong&gt;. One-off runs &lt;strong&gt;do not count against that cap.&lt;/strong&gt; They consume normal plan usage (Pro/Max/Team/Enterprise) just like an interactive session, but they don't deplete the routine quota.&lt;/p&gt;

&lt;p&gt;That means you can be at 100% routine usage and still schedule a cleanup task for 2 weeks from now without it being rejected.&lt;/p&gt;

&lt;h2&gt;
  
  
  One-Off vs Recurring
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;One-Off&lt;/th&gt;
&lt;th&gt;Recurring&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fires&lt;/td&gt;
&lt;td&gt;Once&lt;/td&gt;
&lt;td&gt;Every cadence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto-disables after firing&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Counts against daily cap&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minimum interval&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;1 hour&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom cron expression&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Yes (5-field via &lt;code&gt;/schedule update&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Where One-Off Runs Beat Other Tools
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Where it runs&lt;/th&gt;
&lt;th&gt;When session ends&lt;/th&gt;
&lt;th&gt;Maintenance burden&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Routines One-Off Run&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Anthropic cloud&lt;/td&gt;
&lt;td&gt;Keeps running&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude &lt;code&gt;/loop&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Local terminal&lt;/td&gt;
&lt;td&gt;Stops immediately&lt;/td&gt;
&lt;td&gt;Active session required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Desktop scheduled task&lt;/td&gt;
&lt;td&gt;Your machine&lt;/td&gt;
&lt;td&gt;Machine must stay on&lt;/td&gt;
&lt;td&gt;OS-dependent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cron daemon&lt;/td&gt;
&lt;td&gt;Your server&lt;/td&gt;
&lt;td&gt;Server must stay on&lt;/td&gt;
&lt;td&gt;You operate it&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you've ever kept a cron server alive just to run a once-a-month cleanup, this is the feature that retires it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Realistic Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Cleanup after rollout
&lt;/h3&gt;

&lt;p&gt;The canonical use case. Ship a feature behind a flag, then schedule its removal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/schedule in 2 weeks, open a cleanup PR that removes the SIGNUP_V2_ENABLED flag from frontend and backend repos. Update tests. Branch name: claude/remove-signup-v2-flag.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Post-deploy follow-up
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/schedule in 1 hour, run smoke tests on production and post the result to #deploy-alerts.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Forget-me-not reminder
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/schedule next Friday at 4pm, summarize this week's merged PRs and draft a release note.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Long-horizon tracking
&lt;/h3&gt;

&lt;p&gt;Decision in a meeting that needs revisiting in 30 days? Drop a routine and forget about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Research Preview Caveats
&lt;/h2&gt;

&lt;p&gt;This feature is still in research preview. Before relying on it in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Beta header&lt;/strong&gt;: &lt;code&gt;experimental-cc-routine-2026-04-01&lt;/code&gt;. Request shapes, response shapes, rate limits, and token semantics may change. Only the two most recent beta header versions are kept.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API trigger tokens&lt;/strong&gt;: shown &lt;strong&gt;once&lt;/strong&gt; at creation. Store them in a password manager or your secrets manager &lt;strong&gt;immediately&lt;/strong&gt; — there's no recovery.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub webhook caps&lt;/strong&gt;: per-routine and per-account hourly limits. Excess events are dropped.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Branch protection&lt;/strong&gt;: by default, routines can only push to branches starting with &lt;code&gt;claude/&lt;/code&gt;. There's an opt-out (&lt;code&gt;Allow unrestricted branch pushes&lt;/code&gt;), but I'd leave it on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Author attribution&lt;/strong&gt;: PRs, commits, and Slack messages from a routine are recorded under &lt;strong&gt;your&lt;/strong&gt; user, not a service account. Plan team conventions accordingly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A Self-Contained Prompt Template
&lt;/h2&gt;

&lt;p&gt;For the cleanup-PR scenario, this is what an actually-runnable prompt looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Search the codebase for usages of the feature flag SIGNUP_V2_ENABLED in `frontend/` and `backend/`.
2. Remove all references, treating the flag as permanently enabled.
3. Update unit tests that depended on the previous behavior.
4. Run `npm test` (frontend) and `pytest` (backend). If tests fail, fix and rerun.
5. Open a PR titled "chore: remove SIGNUP_V2_ENABLED feature flag" on branch `claude/remove-signup-v2-flag`.
6. PR body should summarize removed code paths and test results.
7. Add the label "automated-cleanup".
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rule of thumb: write the prompt as if you won't be there to answer follow-up questions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap-up
&lt;/h2&gt;

&lt;p&gt;One-off runs aren't flashy. They fill the small gap between "do it now" and "set up a cron." But that gap matters in practice — feature-flag cleanups, post-deploy checks, and meeting follow-ups all live there.&lt;/p&gt;

&lt;p&gt;If you're already using Claude Code, try one this week. Schedule a cleanup PR for 2 weeks from now, then forget about it. When the day arrives, the PR will be waiting.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://code.claude.com/docs/en/routines#schedule-a-one-off-run" rel="noopener noreferrer"&gt;code.claude.com/docs/en/routines&lt;/a&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>automation</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>DeepSeek V4 is out: 1.6T parameters, 1M context, MIT license — and it costs 1/6 of Opus 4.7</title>
      <dc:creator>정상록</dc:creator>
      <pubDate>Sat, 25 Apr 2026 04:52:48 +0000</pubDate>
      <link>https://dev.to/_46ea277e677b888e0cd13/deepseek-v4-is-out-16t-parameters-1m-context-mit-license-and-it-costs-16-of-opus-47-1glb</link>
      <guid>https://dev.to/_46ea277e677b888e0cd13/deepseek-v4-is-out-16t-parameters-1m-context-mit-license-and-it-costs-16-of-opus-47-1glb</guid>
      <description>&lt;p&gt;DeepSeek released V4 on April 24, 2026 as an open-weight model under MIT license. The headline numbers reset expectations for what frontier-grade AI costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;V4-Pro&lt;/strong&gt;: 1.6T total params (49B active), 1M context, $1.74/$3.48 per 1M tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;V4-Flash&lt;/strong&gt;: 284B total (13B active), 1M context, $0.14/$0.28 per 1M tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: MIT (full commercial + fine-tuning)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training hardware&lt;/strong&gt;: Huawei Ascend 950PR — zero NVIDIA chips&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weights&lt;/strong&gt;: HuggingFace&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For context, Claude Opus 4.7 sits at $15/$75 per 1M tokens. V4-Pro is roughly 9x cheaper on input and 22x cheaper on output at frontier-adjacent quality. That's not an incremental change — it's a category shift.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pricing reality check
&lt;/h2&gt;

&lt;p&gt;Let me put this in concrete terms. Imagine a SaaS service processing 100M tokens per month (a moderate usage tier for AI features in a B2B product):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Monthly cost (input, 100M)&lt;/th&gt;
&lt;th&gt;Monthly cost (output, 50M)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7&lt;/td&gt;
&lt;td&gt;$1,500&lt;/td&gt;
&lt;td&gt;$3,750&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5&lt;/td&gt;
&lt;td&gt;$500&lt;/td&gt;
&lt;td&gt;$1,500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4-Pro&lt;/td&gt;
&lt;td&gt;$174&lt;/td&gt;
&lt;td&gt;$174&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4-Flash&lt;/td&gt;
&lt;td&gt;$14&lt;/td&gt;
&lt;td&gt;$14&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's the API price. With open weights, you can self-host V4-Flash (284B, 13B active) on a single H100 cluster and amortize the cost further. For a startup running a coding assistant or a document analysis SaaS, this means moving from "AI is the dominant infrastructure cost" to "AI is a rounding error."&lt;/p&gt;

&lt;h2&gt;
  
  
  What's actually new in the architecture
&lt;/h2&gt;

&lt;p&gt;V4 isn't just V3 scaled up. There are two fundamental architectural changes worth understanding.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. CSA + HCA hybrid attention
&lt;/h3&gt;

&lt;p&gt;V3.2 used DSA (DeepSeek Sparse Attention). V4 introduces a two-level compression:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CSA (Compressed Sparse Attention)&lt;/strong&gt;: groups tokens along the sequence dimension to reduce attention FLOPs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HCA (Heavily Compressed Attention)&lt;/strong&gt;: additionally compresses the token dimension to shrink the KV cache&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result at 1M context length:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inference FLOPs: 27% of V3.2 (V4-Flash: 10%)&lt;/li&gt;
&lt;li&gt;KV cache: 10% of V3.2 (V4-Flash: 7%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So context length grew 8x (128K → 1M) while memory pressure stayed roughly the same. The implication is that future scaling to 10M context shouldn't break the bank — the architecture is built for it.&lt;/p&gt;
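
&lt;p&gt;To see why that matters, run the standard dense-attention KV-cache formula at both context lengths. The layer and head counts below are hypothetical placeholders, since V4's config isn't published here; the point is the linear blowup that CSA/HCA are designed to avoid.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Standard dense-attention KV-cache size:
#   bytes = 2 (K and V) * layers * kv_heads * head_dim * seq_len * elem_bytes
# Layer/head counts are hypothetical placeholders, not V4's real config.
def kv_cache_gib(seq_len, layers=61, kv_heads=8, head_dim=128, elem_bytes=2):
    return 2 * layers * kv_heads * head_dim * seq_len * elem_bytes / 2**30

for ctx in (128_000, 1_000_000):
    print(f"{ctx:&gt;9,} tokens -&gt; {kv_cache_gib(ctx):6.1f} GiB per sequence")

# Dense attention scales linearly, so 8x the context means 8x the cache.
# V4 claims ~10% of V3.2's KV cache at 1M, i.e. roughly flat vs 128K.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;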

&lt;h3&gt;
  
  
  2. Manifold-constrained hyper-connections (mHC)
&lt;/h3&gt;

&lt;p&gt;DeepSeek replaced standard residual connections with mHC. The exact mechanism deserves a closer read of the technical report, but the high-level effect is more stable information flow in deep networks alongside greater expressive capacity. This matters for training stability when pushing trillion-parameter models on smaller compute budgets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmarks
&lt;/h2&gt;

&lt;p&gt;Here's where V4-Pro lands:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;V4-Pro&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MMLU&lt;/td&gt;
&lt;td&gt;90.1%&lt;/td&gt;
&lt;td&gt;~2pp behind Gemini 3.1 Pro&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MMLU-Pro&lt;/td&gt;
&lt;td&gt;87.5%&lt;/td&gt;
&lt;td&gt;Knowledge-intensive gap remains&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HumanEval&lt;/td&gt;
&lt;td&gt;90.0%&lt;/td&gt;
&lt;td&gt;Top-tier coding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE-bench Verified&lt;/td&gt;
&lt;td&gt;80.6%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Open-source #1&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LiveCodeBench&lt;/td&gt;
&lt;td&gt;93.5%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Best open model&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPQA Diamond&lt;/td&gt;
&lt;td&gt;90.1%&lt;/td&gt;
&lt;td&gt;Behind Gemini on hard science&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GDPval (professional)&lt;/td&gt;
&lt;td&gt;1554 pts&lt;/td&gt;
&lt;td&gt;Open-source #1, overall #6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;DeepSeek's own technical report says V4 is "narrowly behind" GPT-5.4 and Gemini 3.1 Pro, and estimates the gap to the closed-source frontier at 3-6 months. For coding workloads specifically, V4 is at parity or slightly ahead.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Huawei angle (this is the bigger story)
&lt;/h2&gt;

&lt;p&gt;V4 was trained entirely on Huawei Ascend 950PR — the first frontier-grade open model trained without any NVIDIA hardware. Last year, Jensen Huang publicly described this scenario as a "disaster." It's now reality.&lt;/p&gt;

&lt;p&gt;US export controls on AI chips were designed to slow down Chinese AI development. V4 demonstrates that the constraints have been routed around, at least at the model-training level. For anyone making infrastructure decisions, this means:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;NVIDIA's effective monopoly on AI training is over&lt;/strong&gt; at the prosumer/enterprise tier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chinese AI ecosystem is now fully self-sufficient&lt;/strong&gt; for frontier model development&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Geopolitical assumptions in your AI roadmap need updating&lt;/strong&gt; if you assumed China would lag 2-3 years&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Practical takeaways
&lt;/h2&gt;

&lt;p&gt;If you're building with LLMs in 2026, here's what V4 changes:&lt;/p&gt;

&lt;h3&gt;
  
  
  For API consumers
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Re-benchmark your stack&lt;/strong&gt;: Run V4-Flash against your current Claude/GPT workload. For coding, document analysis, and summarization tasks, you'll likely find quality is acceptable at 1/50 the cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consider hybrid routing&lt;/strong&gt;: V4-Flash for high-volume routine tasks, Claude/GPT for the hardest reasoning (a minimal sketch follows this list). This routing pattern can cut total AI costs by 80%+.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-evaluate margin economics&lt;/strong&gt;: If your SaaS product was barely viable due to AI costs, V4 might unlock previously infeasible business models.&lt;/li&gt;
&lt;/ol&gt;
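
&lt;p&gt;A minimal sketch of that router, assuming both endpoints speak the OpenAI-compatible chat API (DeepSeek's does); the V4-Flash model id, the frontier client, and the task classifier are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from openai import OpenAI

# DeepSeek's endpoint is OpenAI-compatible; the frontier client and
# both model ids are placeholders for whatever your stack uses.
cheap    = OpenAI(base_url="https://api.deepseek.com", api_key="...")
frontier = OpenAI(api_key="...")

ROUTINE_TASKS = {"summarize", "extract", "classify", "codegen"}

def chat(task_type, messages):
    """Send high-volume routine work to V4-Flash, hard reasoning upstream."""
    if task_type in ROUTINE_TASKS:
        client, model = cheap, "deepseek-v4-flash"   # model id assumed
    else:
        client, model = frontier, "your-frontier-model"
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content

print(chat("summarize", [{"role": "user", "content": "TL;DR this: ..."}]))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The classifier can start as a dumb task-type enum; measure quality per task before routing more traffic to the cheap path.&lt;/p&gt;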

&lt;h3&gt;
  
  
  For sovereign AI / regulated industries
&lt;/h3&gt;

&lt;p&gt;Open weights + MIT license means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Download V4-Flash, deploy on-prem&lt;/li&gt;
&lt;li&gt;Never send sensitive data to external APIs&lt;/li&gt;
&lt;li&gt;Comply with GDPR / HIPAA / industry-specific data residency rules&lt;/li&gt;
&lt;li&gt;Fine-tune on proprietary data without licensing constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a meaningful unlock for finance, healthcare, legal, and government use cases that have been blocked from frontier AI by data residency requirements.&lt;/p&gt;
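
&lt;p&gt;A minimal self-hosting sketch with vLLM, assuming the V4-Flash weights load the way other DeepSeek releases do; the repo name and GPU count are assumptions, so check the model card:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from vllm import LLM, SamplingParams

# Repo name and tensor_parallel_size are assumptions; adjust to the
# actual Hugging Face model card and your node's GPU count.
llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",
    tensor_parallel_size=8,
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.2, max_tokens=512)
out = llm.generate(["Review this clause for GDPR issues: ..."], params)
print(out[0].outputs[0].text)  # the data never leaves your network
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;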

&lt;h3&gt;
  
  
  For the 1M context window
&lt;/h3&gt;

&lt;p&gt;You can now load entire codebases, multi-year contract sets, or full project documentation into a single context. This doesn't kill RAG entirely (it's still more efficient at scale), but it does eliminate the need for RAG infrastructure for many lightweight document analysis tasks.&lt;/p&gt;

&lt;p&gt;A practical example: instead of building a RAG system over your company's design docs, you can now load all docs (within 1M tokens) into context and ask questions directly. For internal tools, this is often simpler and produces better results.&lt;/p&gt;
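
&lt;p&gt;A sketch of the no-RAG version, assuming your corpus fits under 1M tokens; the paths, question, and API key handling are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from pathlib import Path
from openai import OpenAI  # DeepSeek's endpoint is OpenAI-compatible

client = OpenAI(base_url="https://api.deepseek.com", api_key="...")

# Concatenate every design doc; with a 1M-token window this often
# replaces a retrieval pipeline for internal tooling.
corpus = "\n\n".join(p.read_text() for p in Path("design_docs").glob("*.md"))

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "Answer only from the provided documents."},
        {"role": "user", "content": corpus + "\n\nQuestion: Why did we choose gRPC?"},
    ],
)
print(resp.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;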

&lt;h3&gt;
  
  
  For fine-tuning
&lt;/h3&gt;

&lt;p&gt;An MIT license on a 1.6T-parameter base model opens the door to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Domain-specific models (legal, medical, financial)&lt;/li&gt;
&lt;li&gt;Language-specific models (e.g., Korean, Japanese, Spanish optimization)&lt;/li&gt;
&lt;li&gt;Task-specific specialists (e.g., contract review, code review agents)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fine-tunes of V3.2 have already reached IMO gold-medal-level math performance. V4 raises the ceiling further.&lt;/p&gt;
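
&lt;p&gt;For a sense of what a domain fine-tune looks like, here's a LoRA sketch with peft. The target module names are assumptions about the architecture, and a 284B model still needs serious multi-GPU sharding in practice:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Repo name and target_modules are assumptions; device_map="auto"
# is only a starting point for a model this size.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V4-Flash", device_map="auto", trust_remote_code=True
)

lora = LoraConfig(
    r=16, lora_alpha=32, task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # adjust to the real layer names
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights train
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;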

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;The AI market shifted yesterday. The closed-source moat narrative — "frontier models will always be 12-18 months ahead of open source" — is now harder to defend. DeepSeek's own admission of a 3-6 month gap, combined with MIT licensing and 1/6 pricing, means the practical advantage of using closed-source frontier models has shrunk to specific use cases.&lt;/p&gt;

&lt;p&gt;For most builders, V4 is now the default open option. For sovereign AI deployments, it might be the default option, period.&lt;/p&gt;




&lt;p&gt;References:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://api-docs.deepseek.com/news/news260424" rel="noopener noreferrer"&gt;DeepSeek V4 Preview Release | DeepSeek API Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro" rel="noopener noreferrer"&gt;DeepSeek-V4-Pro · Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/deepseekv4" rel="noopener noreferrer"&gt;DeepSeek-V4: a million-token context that agents can actually use&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2026/Apr/24/deepseek-v4/" rel="noopener noreferrer"&gt;DeepSeek V4 — almost on the frontier, a fraction of the price (Simon Willison)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://venturebeat.com/technology/deepseek-v4-arrives-with-near-state-of-the-art-intelligence-at-1-6th-the-cost-of-opus-4-7-gpt-5-5" rel="noopener noreferrer"&gt;VentureBeat coverage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>DeepSeek V4-Pro: 1.6T MIT-Licensed Open Weights at 88% Less Than Opus 4.7</title>
      <dc:creator>정상록</dc:creator>
      <pubDate>Sat, 25 Apr 2026 04:52:20 +0000</pubDate>
      <link>https://dev.to/_46ea277e677b888e0cd13/deepseek-v4-pro-16t-mit-licensed-open-weights-at-88-less-than-opus-47-30a2</link>
      <guid>https://dev.to/_46ea277e677b888e0cd13/deepseek-v4-pro-16t-mit-licensed-open-weights-at-88-less-than-opus-47-30a2</guid>
<description>&lt;p&gt;While OpenAI builds a desktop super-app, Anthropic doubles down on enterprise lock-in with Opus 4.7, and Google pushes the Gemini 3.1 Pro paid tier, &lt;strong&gt;DeepSeek went the exact opposite direction&lt;/strong&gt;. On April 24, 2026, they released V4-Pro and V4-Flash on Hugging Face under an MIT license: 1.6 trillion parameters, currently the largest open-weight model ever published.&lt;/p&gt;

&lt;p&gt;If you've been watching the open-source vs frontier gap close in slow motion since Llama 405B, this is the moment that gap collapsed for coding workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers That Matter
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Total parameters:   1.6T
Active per token:   49B (MoE)
Context window:     1M tokens (native, not retrofitted)
Pretrain corpus:    33T tokens
License:            MIT (commercial use OK)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pricing vs Big 3
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input ($/M)&lt;/th&gt;
&lt;th&gt;Output ($/M)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;V4-Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1.74&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$3.48&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5&lt;/td&gt;
&lt;td&gt;~$10&lt;/td&gt;
&lt;td&gt;$30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7&lt;/td&gt;
&lt;td&gt;~$15&lt;/td&gt;
&lt;td&gt;$25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;V4-Flash&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.14&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.28&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;V4-Pro output is &lt;strong&gt;roughly 86% cheaper than Opus 4.7&lt;/strong&gt; ($3.48 vs $25 per 1M tokens) and 88% cheaper than GPT-5.5 ($3.48 vs $30). V4-Flash input is roughly 1.4% the cost of GPT-5.5 input. Fortune's startup desk wrote that this "embarrasses Western AI labs' pricing pages."&lt;/p&gt;

&lt;p&gt;This isn't a loss-leader. It's structural — and that's the engineering story worth your time.&lt;/p&gt;

&lt;h2&gt;
  
  
  How They Got There — Hybrid Attention
&lt;/h2&gt;

&lt;p&gt;The headline architectural change is &lt;strong&gt;CSA + HCA interleaved attention&lt;/strong&gt; (Compressed Sparse Attention + Heavily Compressed Attention). Two attention variants stacked in alternating layers.&lt;/p&gt;

&lt;p&gt;The result at 1M context, vs DeepSeek V3.2:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single-token inference FLOPs: &lt;strong&gt;27% of V3.2&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;KV cache memory: &lt;strong&gt;10% of V3.2&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;V4-Flash is even more aggressive: &lt;strong&gt;10% FLOPs, 7% KV cache&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For anyone who has tried to run long-context production workloads, this is the difference between "interesting demo" and "we can actually serve this." The bottleneck at 1M context has always been the KV cache, whose memory grows linearly with sequence length. Compress it to 1/10 and a single GPU can serve sessions that previously needed an entire multi-GPU box.&lt;/p&gt;
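
&lt;p&gt;To see why, run the standard KV-cache arithmetic. The layer and head counts below are hypothetical (DeepSeek hasn't published V4's); they're only there to show the shape of the problem:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Per-token KV cache = 2 (K and V) * layers * kv_heads * head_dim * bytes.
# All dimensions here are hypothetical, for illustration only.
layers, kv_heads, head_dim, dtype_bytes = 60, 8, 128, 2  # fp16/bf16

per_token = 2 * layers * kv_heads * head_dim * dtype_bytes   # bytes per token
full_ctx  = per_token * 1_000_000 / 1e9                      # GB at 1M tokens

print(f"uncompressed:      {full_ctx:.0f} GB per 1M-token session")
print(f"at 10% (V4-Pro):   {full_ctx * 0.10:.1f} GB")
print(f"at 7% (V4-Flash):  {full_ctx * 0.07:.1f} GB")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these made-up numbers, ~246 GB of raw cache shrinks to ~25 GB, which is where single-GPU 1M-token sessions start to look plausible.&lt;/p&gt;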

&lt;p&gt;Add to that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FP4 quantization-aware training&lt;/strong&gt; (precision baked in from pretraining, not bolted on after)&lt;/li&gt;
&lt;li&gt;A new optimizer (details in the partial tech report)&lt;/li&gt;
&lt;li&gt;Reworked residual connections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture doesn't just scale — it scales &lt;em&gt;efficiently&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where V4-Pro Wins
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Coding and agentic benchmarks (#1 on most, within reach on the rest):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LiveCodeBench:   93.5%   (vs Gemini 91.7%, Claude 88.8%)
Codeforces:      3206    (#1)
BrowseComp:      83.4%   (vs Opus 4.7 at 79.3%)
Terminal-Bench:  67.9%   (close to Opus 4.7's 69.4%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your stack is heavy on code generation, code review, or autonomous coding agents, V4-Pro is now the cost-performance leader by a wide margin.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where V4-Pro Loses
&lt;/h2&gt;

&lt;p&gt;Knowledge reasoning and complex agent workflows still belong to the Big 3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GPQA Diamond:     90.1%   (Opus 4.7: 94.2%)
SWE-bench Pro:    55.4%   (Opus 4.7: 64.3%)
HLE:              behind frontier
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The honest summary: &lt;strong&gt;coding is the blade, knowledge reasoning is still catching up.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Self-Hosting
&lt;/h2&gt;

&lt;p&gt;Three concrete changes:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Data-sovereign workloads finally have a frontier-class option
&lt;/h3&gt;

&lt;p&gt;If your org can't send data to external APIs (healthcare, finance, public sector, legal), Llama 405B was the previous best self-host coding option — and it lagged frontier by a meaningful margin. V4-Pro closes that gap on the workloads where it matters most.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Token-cost-sensitive products can rebuild on V4-Flash
&lt;/h3&gt;

&lt;p&gt;If you're a SaaS startup paying $X0K/month for Haiku or 4o-mini at scale, V4-Flash at $0.14/M input is roughly 1/10th the cost. Self-hosted, the marginal cost approaches zero.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The hardware story is about to get interesting
&lt;/h3&gt;

&lt;p&gt;DeepSeek hinted at Huawei Ascend 950 integration. If that lands, the implication is "frontier-class model on non-NVIDIA silicon at lower TCO" — which would be the first credible break in NVIDIA dependency for inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pull from Hugging Face&lt;/span&gt;
git lfs &lt;span class="nb"&gt;install
&lt;/span&gt;git clone https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro

&lt;span class="c"&gt;# Or use the API&lt;/span&gt;
curl https://api.deepseek.com/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$DEEPSEEK_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Hello"}]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Open Questions
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Will the GA release close the GPQA / SWE-bench gap with Opus 4.7?&lt;/li&gt;
&lt;li&gt;How quickly will domain-specific fine-tunes outperform Big 3 in verticals?&lt;/li&gt;
&lt;li&gt;Does this pressure OpenAI / Anthropic to release older model weights?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you're running self-host benchmarks on actual workloads, drop your numbers in the comments. Curious what the production-side picture looks like.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DeepSeek API Docs: &lt;a href="https://api-docs.deepseek.com/news/news260424" rel="noopener noreferrer"&gt;https://api-docs.deepseek.com/news/news260424&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;VentureBeat: &lt;a href="https://venturebeat.com/technology/deepseek-v4-arrives-with-near-state-of-the-art-intelligence-at-1-6th-the-cost-of-opus-4-7-gpt-5-5" rel="noopener noreferrer"&gt;https://venturebeat.com/technology/deepseek-v4-arrives-with-near-state-of-the-art-intelligence-at-1-6th-the-cost-of-opus-4-7-gpt-5-5&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;TechCrunch: &lt;a href="https://techcrunch.com/2026/04/24/deepseek-previews-new-ai-model-that-closes-the-gap-with-frontier-models/" rel="noopener noreferrer"&gt;https://techcrunch.com/2026/04/24/deepseek-previews-new-ai-model-that-closes-the-gap-with-frontier-models/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Simon Willison: &lt;a href="https://simonwillison.net/2026/Apr/24/deepseek-v4/" rel="noopener noreferrer"&gt;https://simonwillison.net/2026/Apr/24/deepseek-v4/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MarkTechPost (architecture deep-dive): &lt;a href="https://www.marktechpost.com/2026/04/24/deepseek-ai-releases-deepseek-v4-compressed-sparse-attention-and-heavily-compressed-attention-enable-one-million-token-contexts/" rel="noopener noreferrer"&gt;https://www.marktechpost.com/2026/04/24/deepseek-ai-releases-deepseek-v4-compressed-sparse-attention-and-heavily-compressed-attention-enable-one-million-token-contexts/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Claude Managed Agents now have Built-in Memory — Rakuten cut first-try errors by 97%</title>
      <dc:creator>정상록</dc:creator>
      <pubDate>Sat, 25 Apr 2026 04:46:27 +0000</pubDate>
      <link>https://dev.to/_46ea277e677b888e0cd13/claude-managed-agents-now-have-built-in-memory-rakuten-cut-first-try-errors-by-97-bd7</link>
      <guid>https://dev.to/_46ea277e677b888e0cd13/claude-managed-agents-now-have-built-in-memory-rakuten-cut-first-try-errors-by-97-bd7</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Anthropic shipped Memory Stores for Claude Managed Agents on &lt;strong&gt;2026-04-23&lt;/strong&gt; (Public Beta). Instead of standing up your own RAG stack, memory is mounted as a filesystem directory at &lt;code&gt;/mnt/memory/{store-name}&lt;/code&gt; and the agent uses standard Read/Write/Bash tools.&lt;/p&gt;

&lt;p&gt;Production metrics so far:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rakuten&lt;/strong&gt;: 97% fewer first-try errors, 27% lower cost, 34% lower latency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wisedocs&lt;/strong&gt;: 30% faster document validation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Netflix&lt;/strong&gt;: eliminated manual prompt/skill updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ando&lt;/strong&gt;: stopped building memory infra entirely&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The problem before
&lt;/h2&gt;

&lt;p&gt;Managed Agents reset every session. Anything an agent learned about a user, a project, or a workflow vanished when the conversation ended. To bridge the gap, teams had to build:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A vector database&lt;/li&gt;
&lt;li&gt;An embedding pipeline&lt;/li&gt;
&lt;li&gt;A retrieval API&lt;/li&gt;
&lt;li&gt;Custom audit logs&lt;/li&gt;
&lt;li&gt;Custom permission scoping&lt;/li&gt;
&lt;li&gt;Custom concurrency controls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a lot of yak shaving for what is essentially "remember what you learned yesterday".&lt;/p&gt;

&lt;h2&gt;
  
  
  What Memory Stores actually is
&lt;/h2&gt;

&lt;p&gt;A Memory Store is a workspace-scoped collection of text documents, mounted into the agent's session container as a directory. The agent doesn't learn a new memory API — it just uses the filesystem.&lt;/p&gt;

&lt;p&gt;Three concepts:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Memory Store&lt;/td&gt;
&lt;td&gt;Container holding text documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;Path-addressable text file (max 100 KB / ~25K tokens)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory Version&lt;/td&gt;
&lt;td&gt;Immutable snapshot created on every change&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The mount path, access mode, description, and instructions are auto-injected into the system prompt. No prompting tricks needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quickstart (Python SDK)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Set the beta header
&lt;/h3&gt;

&lt;p&gt;The SDK sets this automatically; if you're calling the HTTP API directly, send it in the &lt;code&gt;anthropic-beta&lt;/code&gt; header:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;managed-agents-2026-04-01
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Create a store
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_stores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User Preferences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Per-user preferences and project context.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. (Optional) Seed with reference content
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_stores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/formatting_standards.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;All reports use GAAP formatting...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Attach to a session
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;environment_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resources&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_store&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_store_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;access&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_write&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# or "read_only"
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;instructions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User preferences and project context. Check before starting any task.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Critical constraint&lt;/strong&gt;: stores can only be attached at session creation time. No runtime add/remove.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  5. Agent-side usage
&lt;/h3&gt;

&lt;p&gt;The agent just runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; /mnt/memory/user-preferences/
&lt;span class="nb"&gt;cat&lt;/span&gt; /mnt/memory/user-preferences/formatting_standards.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The prompt-injection footgun
&lt;/h2&gt;

&lt;p&gt;Default access is &lt;code&gt;read_write&lt;/code&gt;. If your agent processes untrusted input — user prompts, web content, third-party tool output — malicious content can land in memory and the &lt;strong&gt;next session reads it as trusted reference material&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is the kind of bug that doesn't show up in your dev environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mitigation rule of thumb&lt;/strong&gt;: any store the agent doesn't strictly need to write to should be &lt;code&gt;read_only&lt;/code&gt; (see the snippet after this list). Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standards / conventions&lt;/li&gt;
&lt;li&gt;Domain glossaries&lt;/li&gt;
&lt;li&gt;Shared lookup data&lt;/li&gt;
&lt;li&gt;External reference docs&lt;/li&gt;
&lt;/ul&gt;
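
&lt;p&gt;Concretely, that's one field in the session-creation call from step 4 (&lt;code&gt;standards_store&lt;/code&gt; is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Same sessions.create() call as step 4, with write access dropped
# for stores that hold reference material the agent must never edit.
session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=environment.id,
    resources=[
        {
            "type": "memory_store",
            "memory_store_id": standards_store.id,
            "access": "read_only",
            "instructions": "House standards. Consult before any task; never edit.",
        }
    ],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;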

&lt;h2&gt;
  
  
  Concurrency: optimistic locking built in
&lt;/h2&gt;

&lt;p&gt;Multiple agents can hit the same store simultaneously. To avoid lost updates, use &lt;code&gt;content_sha256&lt;/code&gt; as a precondition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_stores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory_store_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CORRECTED: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;new_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;precondition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content_sha256&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content_sha256&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content_sha256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If another session updated first, the call fails with a hash mismatch — re-read and retry.&lt;/p&gt;
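
&lt;p&gt;A retry loop then looks something like this; &lt;code&gt;memories.retrieve()&lt;/code&gt;, &lt;code&gt;apply_edit()&lt;/code&gt;, and the exact exception type are assumptions, so check the SDK:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import anthropic

client = anthropic.Anthropic()

for attempt in range(3):
    # Re-read to get the current hash (memories.retrieve() is an
    # assumed sibling of the update() call shown above).
    fresh = client.beta.memory_stores.memories.retrieve(
        memory_id=mem.id, memory_store_id=store.id
    )
    try:
        client.beta.memory_stores.memories.update(
            memory_id=mem.id,
            memory_store_id=store.id,
            content=apply_edit(fresh.content),  # reapply your change to the fresh copy
            precondition={
                "type": "content_sha256",
                "content_sha256": fresh.content_sha256,
            },
        )
        break  # write landed on exactly the version we read
    except anthropic.APIStatusError:
        continue  # hash mismatch: another session wrote first; retry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;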

&lt;h2&gt;
  
  
  Audit and rollback
&lt;/h2&gt;

&lt;p&gt;Every change creates an immutable version, retained for &lt;strong&gt;30 days&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;versions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_stores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_versions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;old&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_stores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_versions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;version_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory_store_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Restore by writing the old content back via memories.update()
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's no dedicated restore endpoint — you &lt;code&gt;update()&lt;/code&gt; with the old content. PII can be scrubbed via &lt;code&gt;memory_versions.redact()&lt;/code&gt; while keeping the audit trail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory Stores vs traditional RAG
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Traditional RAG&lt;/th&gt;
&lt;th&gt;Memory Stores&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Access&lt;/td&gt;
&lt;td&gt;Separate search API&lt;/td&gt;
&lt;td&gt;Filesystem mount&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning curve&lt;/td&gt;
&lt;td&gt;Embeddings, indexing, retrieval queries&lt;/td&gt;
&lt;td&gt;bash/Read/Write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infra&lt;/td&gt;
&lt;td&gt;Vector DB + embedding model&lt;/td&gt;
&lt;td&gt;Managed by Claude Platform&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audit&lt;/td&gt;
&lt;td&gt;DIY&lt;/td&gt;
&lt;td&gt;Built-in immutable versions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Permissions&lt;/td&gt;
&lt;td&gt;DIY&lt;/td&gt;
&lt;td&gt;scoped (read_only/read_write)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency&lt;/td&gt;
&lt;td&gt;DIY locking&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;content_sha256&lt;/code&gt; precondition&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Beta limits
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Limit&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Memory stores per organization&lt;/td&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memories per store&lt;/td&gt;
&lt;td&gt;2,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage per store&lt;/td&gt;
&lt;td&gt;100 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Versions per store&lt;/td&gt;
&lt;td&gt;250,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Size per memory&lt;/td&gt;
&lt;td&gt;100 KB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Version retention&lt;/td&gt;
&lt;td&gt;30 days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stores per session&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;instructions&lt;/code&gt; field&lt;/td&gt;
&lt;td&gt;4,096 chars&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 30-day version retention is the one to plan around. If you need longer audit trails, export via the API.&lt;/p&gt;
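
&lt;p&gt;A minimal export sketch built from the version calls above; the &lt;code&gt;.content&lt;/code&gt; field on a retrieved version is an assumption:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import anthropic
from pathlib import Path

client = anthropic.Anthropic()
archive = Path("memory_audit")
archive.mkdir(exist_ok=True)

# Archive every retained version of one memory before the 30-day
# window expires; check the SDK for the actual response shape.
versions = client.beta.memory_stores.memory_versions.list(
    store.id, memory_id=mem.id
)
for v in versions.data:
    full = client.beta.memory_stores.memory_versions.retrieve(
        version_id=v.id, memory_store_id=store.id
    )
    (archive / f"{v.id}.txt").write_text(full.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;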

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"Memory lets us stop building memory infra and focus on the product itself" — Sara Du, Founder, Ando&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Persistent learning is now a platform primitive, not a side project. The reported wins (Rakuten's 97% fewer first-try errors, Wisedocs' 30% faster validation, Netflix's eliminated manual updates) suggest this isn't marginal.&lt;/p&gt;

&lt;p&gt;Worth integrating, especially for any agent that already has the same conversation with the same user every day.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source&lt;/strong&gt;: &lt;a href="https://platform.claude.com/docs/en/managed-agents/memory" rel="noopener noreferrer"&gt;Claude Managed Agents — Memory (official docs)&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Launch blog&lt;/strong&gt;: &lt;a href="https://claude.com/blog/claude-managed-agents-memory" rel="noopener noreferrer"&gt;Anthropic — Claude Managed Agents Memory (2026-04-23)&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
