<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ArshTechPro</title>
    <description>The latest articles on DEV Community by ArshTechPro (@arshtechpro).</description>
    <link>https://dev.to/arshtechpro</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3258664%2F7a2cc61a-0b4d-4cf8-884e-52f33905cac3.png</url>
      <title>DEV Community: ArshTechPro</title>
      <link>https://dev.to/arshtechpro</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arshtechpro"/>
    <language>en</language>
    <item>
      <title>MAI-Thinking-1: Microsoft's New Reasoning Model and What It Means for Developers</title>
      <dc:creator>ArshTechPro</dc:creator>
      <pubDate>Fri, 05 Jun 2026 16:24:31 +0000</pubDate>
      <link>https://dev.to/arshtechpro/mai-thinking-1-microsofts-new-reasoning-model-and-what-it-means-for-developers-2fma</link>
      <guid>https://dev.to/arshtechpro/mai-thinking-1-microsofts-new-reasoning-model-and-what-it-means-for-developers-2fma</guid>
      <description>&lt;p&gt;Microsoft just shipped MAI-Thinking-1, their first in-house reasoning model. If you've been watching the AI space, you know reasoning models — the kind that "think before they answer" — have become a battleground. OpenAI has o3, Anthropic has Claude with extended thinking, Google has Gemini's thinking mode. Now Microsoft is in with their own, and they built it from the ground up rather than licensing or distilling from someone else's model.&lt;/p&gt;

&lt;p&gt;Here is what you actually need to know as a developer.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is MAI-Thinking-1?
&lt;/h2&gt;

&lt;p&gt;MAI-Thinking-1 is Microsoft's reasoning-focused language model, developed by their internal AI lab (Microsoft AI, or MAI). It is a medium-sized model designed specifically for complex, multi-step tasks — the kind of problems where a model needs to reason through multiple steps before producing an answer, rather than just pattern-matching to a response.&lt;/p&gt;

&lt;p&gt;The headline positioning is this: it is a smaller model that punches well above its weight class on software engineering and math benchmarks.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: Sparse Mixture of Experts
&lt;/h2&gt;

&lt;p&gt;The model is a &lt;strong&gt;sparse Mixture of Experts (MoE)&lt;/strong&gt; architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;35 billion active parameters&lt;/strong&gt; at inference time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~1 trillion total parameters&lt;/strong&gt; across all expert layers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This distinction matters for developers. In a dense model, every parameter fires for every token. In a MoE model, only a subset of "experts" activate per token, so the active compute footprint is much smaller than the total parameter count suggests. The practical result: you get near-frontier quality reasoning at a significantly lower inference cost than a comparable dense model.&lt;/p&gt;

&lt;p&gt;Compare that to something like GPT-4 class models which are estimated at 1.8T+ parameters (dense), and you start to see why Microsoft is calling this "mid-weight pricing."&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmark Performance
&lt;/h2&gt;

&lt;p&gt;Microsoft reports the following numbers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;MAI-Thinking-1&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AIME 2025&lt;/td&gt;
&lt;td&gt;97.0%&lt;/td&gt;
&lt;td&gt;Advanced math competition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AIME 2026&lt;/td&gt;
&lt;td&gt;94.5%&lt;/td&gt;
&lt;td&gt;Most recent math competition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Pro&lt;/td&gt;
&lt;td&gt;Competitive with Claude Opus 4.6&lt;/td&gt;
&lt;td&gt;Real-world software engineering tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human side-by-side&lt;/td&gt;
&lt;td&gt;Preferred over Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;Blind evaluation by Surge raters&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The SWE-Bench Pro result is worth unpacking. SWE-Bench tests models on real GitHub issues — the model has to read a codebase, understand a bug report, and produce a patch that passes the existing test suite. It is arguably the most developer-relevant benchmark that exists right now. Matching Claude Opus 4.6 on this benchmark while running on far fewer active parameters is a meaningful result.&lt;/p&gt;

&lt;p&gt;The human preference eval covered 1,276 tasks across single-turn and multi-turn conversations, judged by professional raters from Surge, and prioritized whether responses actually advanced the user's goals rather than just sounding good.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Makes It Different From Other Models: Training Philosophy
&lt;/h2&gt;

&lt;p&gt;Microsoft made a deliberate choice that is worth understanding because it affects how the model behaves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No distillation from third-party models.&lt;/strong&gt; Most smaller models are trained by learning to imitate a larger, more capable model (this is called distillation or knowledge distillation). MAI-Thinking-1 was trained without doing this. Microsoft argues that distilled models are fundamentally bound to the design choices of their teacher model and struggle to generalize to new situations. Training from scratch on their own data means the model has to genuinely learn reasoning rather than mimicking it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clean, licensed training data only.&lt;/strong&gt; All pre-training data was commercially licensed, and AI-generated content was excluded from pre-training. For enterprises, this matters a lot: it affects copyright exposure and gives Microsoft better ability to explain (and improve) model behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In-house training infrastructure end-to-end.&lt;/strong&gt; From hardware co-design on Microsoft's own accelerators to the reinforcement learning framework, the entire training stack is built internally. This is what they call the "Hill-Climbing Machine" — a system where every component can be improved independently, so capabilities improve continuously rather than requiring architectural overhauls.&lt;/p&gt;




&lt;h2&gt;
  
  
  Developer-Relevant Features
&lt;/h2&gt;

&lt;p&gt;Before you think about API calls, here is the feature set:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context window: 256,000 tokens.&lt;/strong&gt; That is roughly 600 pages of text. You can fit entire codebases, large contracts, or lengthy research documents in a single context. For agentic coding workflows this is essential.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Function calling / tool use.&lt;/strong&gt; Supported. If you are building agents that need to call APIs, query databases, or interact with external services, the model can handle structured tool calls in the standard format.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System prompt / developer instructions.&lt;/strong&gt; The model was trained to follow multi-layer instructions — meaning system prompts, user instructions, and constraints stack and interact predictably rather than the model silently ignoring one in favor of another.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chat Completions API compatibility.&lt;/strong&gt; This is significant. The API uses the same interface as the widely adopted OpenAI Chat Completions format. If you already have code that calls Azure OpenAI or any OpenAI-compatible endpoint, migration should require minimal changes — primarily just swapping the model name and endpoint URL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise security via Microsoft Foundry.&lt;/strong&gt; All MAI models come with Microsoft Foundry's compliance stack: data residency controls, audit logging, private networking options. If you are building in a regulated industry, this is the access path that gets you the compliance paperwork you need.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Setup Will Look Like (When It's Available)
&lt;/h2&gt;

&lt;p&gt;Since the model is Chat Completions API-compatible, here is what calling it will look like once you have Foundry access. The pattern is essentially identical to calling Azure OpenAI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AzureOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;azure_endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://&amp;lt;your-foundry-endpoint&amp;gt;.azure.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2024-12-01-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;your-foundry-api-key&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mai-thinking-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior software engineer. Think step by step.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review this function and identify any edge cases: ...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are already on the Azure OpenAI SDK or any OpenAI-compatible client, this is the shape of the migration. The main difference is the endpoint URL and model name — the rest of your code stays the same.&lt;/p&gt;

&lt;p&gt;For agentic workflows with tool calling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_tests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Run the test suite and return results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Path to the test file or directory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mai-thinking-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Where MAI-Thinking-1 Fits in Your Stack
&lt;/h2&gt;

&lt;p&gt;If you are trying to decide whether this model is worth tracking, here is a practical breakdown by use case:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agentic coding pipelines.&lt;/strong&gt; This is the primary target use case. The model was trained on deterministic, executable environments with real test suites. It is built for the multi-step loop of reading code, making edits, running tests, and recovering from failures. If you are building AI-powered code review, bug fixing, or code generation pipelines, this is worth evaluating.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complex reasoning tasks.&lt;/strong&gt; The AIME scores put it near the top of the field for mathematical and scientific reasoning. If your application involves multi-step problem solving — financial modeling, technical analysis, research summarization with synthesis — a reasoning model like this will outperform instruction-tuned models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise document processing.&lt;/strong&gt; The 256k context window plus the licensing provenance story makes this a credible option for enterprises processing contracts, technical documentation, or large codebases where IP exposure and compliance are real concerns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High-volume daily workflows.&lt;/strong&gt; The MoE architecture and mid-weight pricing position this below frontier-cost models. If you have a use case that could benefit from strong reasoning but cannot justify the cost of running a full dense frontier model on every request, this is the price-performance sweet spot Microsoft is targeting.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Safety Approach (And Why It Matters for Developers)
&lt;/h2&gt;

&lt;p&gt;Microsoft made an interesting engineering decision on safety that is worth understanding.&lt;/p&gt;

&lt;p&gt;Rather than treating safety as a post-hoc filter or a separate fine-tuning stage, they trained safety with the same reinforcement learning loop as capability. Unsafe compliance and unnecessary over-refusals are both treated as defects in the same reward model, weighted by potential harm severity.&lt;/p&gt;

&lt;p&gt;The practical effect: you should see fewer situations where the model refuses legitimate developer requests (writing code that involves networking, security concepts, system administration) while still declining actually harmful requests. Microsoft explicitly calls unnecessary refusals a failure mode, not a safe default.&lt;/p&gt;

&lt;p&gt;For developers, this means less time spent writing system prompts that work around overly cautious models.&lt;/p&gt;




&lt;h2&gt;
  
  
  What to Watch For
&lt;/h2&gt;

&lt;p&gt;A few things to keep an eye on as this moves to public preview:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing.&lt;/strong&gt; Not yet announced publicly. The "mid-weight" positioning suggests something meaningfully below frontier model pricing, but the actual numbers will determine whether the SWE-Bench Pro performance justifies switching from existing workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Regional availability.&lt;/strong&gt; Microsoft Foundry supports multi-region deployment, but which specific Azure regions will have MAI-Thinking-1 available at launch will affect latency and data residency requirements for some use cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate limits and quota.&lt;/strong&gt; Private previews typically have constrained throughput. Production planning should wait for public preview numbers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model type&lt;/td&gt;
&lt;td&gt;Sparse Mixture of Experts (reasoning)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Active parameters&lt;/td&gt;
&lt;td&gt;35B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total parameters&lt;/td&gt;
&lt;td&gt;~1T&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;td&gt;256,000 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API format&lt;/td&gt;
&lt;td&gt;Chat Completions (OpenAI-compatible)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Function calling&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Current status&lt;/td&gt;
&lt;td&gt;Private preview on Microsoft Foundry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Public access&lt;/td&gt;
&lt;td&gt;Coming soon (MAI Playground)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Early access&lt;/td&gt;
&lt;td&gt;Apply via Microsoft Foundry signup form&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Model page: &lt;a href="https://microsoft.ai/models/mai-thinking-1/" rel="noopener noreferrer"&gt;microsoft.ai/models/mai-thinking-1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Technical paper: &lt;a href="https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf" rel="noopener noreferrer"&gt;PDF&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>ai</category>
      <category>microsoft</category>
      <category>llm</category>
      <category>agents</category>
    </item>
    <item>
      <title>Headroom: Cut Your LLM Token Usage by Up to 95% Without Changing Your Answers</title>
      <dc:creator>ArshTechPro</dc:creator>
      <pubDate>Thu, 04 Jun 2026 09:15:08 +0000</pubDate>
      <link>https://dev.to/arshtechpro/headroom-cut-your-llm-token-usage-by-up-to-95-without-changing-your-answers-5g06</link>
      <guid>https://dev.to/arshtechpro/headroom-cut-your-llm-token-usage-by-up-to-95-without-changing-your-answers-5g06</guid>
      <description>&lt;p&gt;If you're building AI agents or running LLM pipelines in production, you already know the pain: tool outputs, logs, RAG chunks, and conversation history pile up fast. Before you know it, you're burning through tokens at a rate that makes your billing dashboard uncomfortable to look at.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/chopratejas/headroom" rel="noopener noreferrer"&gt;Headroom&lt;/a&gt; is an open-source project that tackles this problem directly. It compresses everything your AI agent reads — before it ever reaches the LLM — and claims 60–95% token reduction on real workloads, with accuracy preserved.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Idea
&lt;/h2&gt;

&lt;p&gt;Headroom sits as a layer between your application and the LLM provider. It takes whatever your agent was about to send — a stack of tool call results, a long log file, a RAG retrieval dump — and compresses it using one of several strategies depending on the content type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SmartCrusher&lt;/strong&gt; handles JSON (arrays, nested objects, mixed types)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CodeCompressor&lt;/strong&gt; uses AST-aware compression for Python, JS, Go, Rust, Java, and C++&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kompress-base&lt;/strong&gt; is a HuggingFace model trained on agentic traces, for prose and text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CacheAligner&lt;/strong&gt; stabilizes prompt prefixes so provider KV caches actually hit consistently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CCR (Content-Compressed Retrieval)&lt;/strong&gt; stores originals locally and lets the LLM fetch them on demand — so compression is fully reversible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A &lt;strong&gt;ContentRouter&lt;/strong&gt; figures out what kind of content it's looking at and picks the right compressor automatically. You don't have to think about it.&lt;/p&gt;

&lt;p&gt;The key thing: originals are never deleted. If the LLM needs the full version of something, it can retrieve it. Compression is lossless in that sense.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real Numbers
&lt;/h2&gt;

&lt;p&gt;These are the token counts from the project's benchmarks on real agent workloads:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Code search (100 results)&lt;/td&gt;
&lt;td&gt;17,765&lt;/td&gt;
&lt;td&gt;1,408&lt;/td&gt;
&lt;td&gt;92%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SRE incident debugging&lt;/td&gt;
&lt;td&gt;65,694&lt;/td&gt;
&lt;td&gt;5,118&lt;/td&gt;
&lt;td&gt;92%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub issue triage&lt;/td&gt;
&lt;td&gt;54,174&lt;/td&gt;
&lt;td&gt;14,761&lt;/td&gt;
&lt;td&gt;73%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codebase exploration&lt;/td&gt;
&lt;td&gt;78,502&lt;/td&gt;
&lt;td&gt;41,254&lt;/td&gt;
&lt;td&gt;47%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On accuracy benchmarks (GSM8K math, TruthfulQA, SQuAD v2, BFCL tool-use), scores hold steady or slightly improve after compression. The intuition is that stripping noise helps the model focus on the signal.&lt;/p&gt;

&lt;p&gt;You can reproduce these yourself with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; headroom.evals suite &lt;span class="nt"&gt;--tier&lt;/span&gt; 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Setup: Three Ways to Use It
&lt;/h2&gt;

&lt;p&gt;Headroom gives you three integration modes. Pick whichever fits how you work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: Wrap an existing agent (zero code changes)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"headroom-ai[all]"&lt;/span&gt;
headroom wrap claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Headroom intercepts traffic from Claude Code, Codex, Cursor, Aider, or Copilot CLI automatically. You don't touch your existing code at all.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: Drop-in proxy
&lt;/h3&gt;

&lt;p&gt;Run Headroom as a local proxy on any port:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;headroom proxy &lt;span class="nt"&gt;--port&lt;/span&gt; 8787
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then point your existing OpenAI/Anthropic SDK calls at &lt;code&gt;localhost:8787&lt;/code&gt; instead of the provider URL. Any language, any framework — no code changes needed beyond updating the base URL.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 3: Inline library
&lt;/h3&gt;

&lt;p&gt;For finer control, use it directly in Python or TypeScript:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;headroom&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;compress&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;your_giant_tool_output&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="n"&gt;compressed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# compressed has the same structure, far fewer tokens
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TypeScript:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;compress&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;headroom-ai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;compressed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;compress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With the Anthropic SDK directly:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Anthropic&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;headroom&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;withHeadroom&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;withHeadroom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="c1"&gt;# Use client exactly like normal — compression happens automatically
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With LangChain:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;headroom.integrations.langchain&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HeadroomChatModel&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HeadroomChatModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;your_existing_llm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With Vercel AI SDK:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;wrapLanguageModel&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;headroomMiddleware&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;headroom-ai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;wrapLanguageModel&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;yourModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;headroomMiddleware&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Requires Python 3.10+. For Node/TypeScript: &lt;code&gt;npm install headroom-ai&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  MCP Server Mode
&lt;/h2&gt;

&lt;p&gt;If you're using an MCP client (Claude Desktop, etc.), you can install Headroom as an MCP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;headroom mcp &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This exposes three MCP tools: &lt;code&gt;headroom_compress&lt;/code&gt;, &lt;code&gt;headroom_retrieve&lt;/code&gt;, and &lt;code&gt;headroom_stats&lt;/code&gt;. Your AI agent can call them directly as part of its tool loop.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cross-Agent Memory
&lt;/h2&gt;

&lt;p&gt;One underrated feature: shared memory across agents. If you're running Claude and Codex side by side, Headroom can give them a common compressed context store with automatic deduplication.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;headroom.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SharedContext&lt;/span&gt;

&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SharedContext&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;current_task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# In a different agent's session
&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;current_task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is useful in multi-agent pipelines where you'd otherwise be passing the same context repeatedly.&lt;/p&gt;




&lt;h2&gt;
  
  
  headroom learn
&lt;/h2&gt;

&lt;p&gt;There's also a &lt;code&gt;headroom learn&lt;/code&gt; command that mines failed agent sessions and writes corrections back to &lt;code&gt;CLAUDE.md&lt;/code&gt;, &lt;code&gt;AGENTS.md&lt;/code&gt;, or &lt;code&gt;GEMINI.md&lt;/code&gt;. The idea is that your agent accumulates a record of what went wrong and avoids repeating the same mistakes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;headroom learn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It parses session logs, extracts failure patterns, and appends structured learnings to your project's agent config files.&lt;/p&gt;




&lt;h2&gt;
  
  
  Check Your Savings
&lt;/h2&gt;

&lt;p&gt;After using Headroom for a while:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;headroom stats
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shows you cumulative compression ratios, tokens saved, and per-content-type breakdowns.&lt;/p&gt;




&lt;h2&gt;
  
  
  Is It Worth Trying?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Yes, if you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run AI coding agents (Claude Code, Cursor, Codex, Aider) regularly and pay for tokens&lt;/li&gt;
&lt;li&gt;Build pipelines where tool outputs and RAG chunks are large and repetitive&lt;/li&gt;
&lt;li&gt;Want cross-agent shared memory without building it yourself&lt;/li&gt;
&lt;li&gt;Need reversible compression — Headroom never discards originals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Skip it, or approach carefully, if you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only use a single provider's built-in context management and don't need more&lt;/li&gt;
&lt;li&gt;Work in sandboxed or restricted environments where running a local process is an issue&lt;/li&gt;
&lt;li&gt;Are on a very simple single-turn setup where context bloat isn't a real problem yet&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"headroom-ai[all]"&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;headroom-ai

&lt;span class="c"&gt;# Wrap an agent&lt;/span&gt;
headroom wrap claude

&lt;span class="c"&gt;# Run as proxy&lt;/span&gt;
headroom proxy &lt;span class="nt"&gt;--port&lt;/span&gt; 8787

&lt;span class="c"&gt;# Install as MCP server&lt;/span&gt;
headroom mcp &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# Check savings&lt;/span&gt;
headroom stats

&lt;span class="c"&gt;# Learn from failures&lt;/span&gt;
headroom learn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/chopratejas/headroom" rel="noopener noreferrer"&gt;chopratejas/headroom&lt;/a&gt;  &lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>agentskills</category>
      <category>programming</category>
    </item>
    <item>
      <title>Harness: Turn a One-Line Prompt Into a Full Agent Team for Claude Code</title>
      <dc:creator>ArshTechPro</dc:creator>
      <pubDate>Tue, 02 Jun 2026 09:34:41 +0000</pubDate>
      <link>https://dev.to/arshtechpro/harness-turn-a-one-line-prompt-into-a-full-agent-team-for-claude-code-5eog</link>
      <guid>https://dev.to/arshtechpro/harness-turn-a-one-line-prompt-into-a-full-agent-team-for-claude-code-5eog</guid>
      <description>&lt;p&gt;You have Claude Code. You want to build something ambitious — a deep research pipeline, a full-stack app scaffold, a code review system. You could wire up agents manually, writing each definition by hand. Or you could type "build a harness for this project" and let Harness do it.&lt;/p&gt;

&lt;p&gt;Harness is a Claude Code plugin that takes a plain-English description of what you want to build and produces a ready-to-run agent team: the agent definitions, the skill files, the orchestration logic — all of it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Problem Does It Solve?
&lt;/h2&gt;

&lt;p&gt;Multi-agent work in Claude Code requires a lot of upfront scaffolding. You need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;define each agent's role and responsibilities in a &lt;code&gt;.claude/agents/&lt;/code&gt; markdown file&lt;/li&gt;
&lt;li&gt;write skill files in &lt;code&gt;.claude/skills/&lt;/code&gt; that describe how tasks get done&lt;/li&gt;
&lt;li&gt;decide how agents communicate and hand off work&lt;/li&gt;
&lt;li&gt;handle error cases and validation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a non-trivial project, this is several hours of work before you have written a line of actual code. Harness compresses that into a single conversational prompt.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Six Architecture Patterns
&lt;/h2&gt;

&lt;p&gt;Harness does not just dump agents into a folder. It picks one of six battle-tested team structures based on your domain:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pipeline&lt;/strong&gt; — agents run in sequence, each one feeding into the next. Good for anything with clear stages: plan, write, test, deploy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fan-out/Fan-in&lt;/strong&gt; — a coordinator spawns parallel agents, collects their results, and merges them. Good for research or code review where independent threads can run simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expert Pool&lt;/strong&gt; — agents are specialists invoked selectively based on what the current task needs. Good for domains with diverse sub-problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Producer-Reviewer&lt;/strong&gt; — one agent generates, another critiques. Good for content creation, documentation, or anything where quality gates matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supervisor&lt;/strong&gt; — a central agent dynamically routes tasks to workers based on what needs to happen next. Good for open-ended workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hierarchical Delegation&lt;/strong&gt; — top-down recursive delegation where complex tasks get broken down through multiple layers. Good for large-scale engineering or project management.&lt;/p&gt;

&lt;p&gt;Harness reads your description and picks the pattern that fits best. You can also guide it explicitly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;You need Claude Code installed and agent teams enabled. Agent teams are still behind a feature flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add that to your shell profile (&lt;code&gt;.zshrc&lt;/code&gt;, &lt;code&gt;.bashrc&lt;/code&gt;, etc.) so it persists.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install via Plugin Marketplace
&lt;/h3&gt;

&lt;p&gt;Inside Claude Code, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add revfactory/harness
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin &lt;span class="nb"&gt;install &lt;/span&gt;harness@harness
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The plugin is now globally available in your Claude Code sessions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install Manually (Global Skill)
&lt;/h3&gt;

&lt;p&gt;If you prefer to manage things yourself, clone the repo and copy the skill directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/revfactory/harness.git
&lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; harness/skills/harness ~/.claude/skills/harness
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This drops the skill files into Claude Code's global skill directory and makes them available in any project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Using It
&lt;/h2&gt;

&lt;p&gt;Once installed, trigger it with a natural language prompt inside Claude Code. There is no special syntax — just describe what you want.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: deep research agent team&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build a harness for deep research. I need an agent team that can investigate
any topic from multiple angles — web search, academic sources, community
sentiment — then cross-validate findings and produce a comprehensive report.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example: code review pipeline&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build a harness for comprehensive code review. I want parallel agents
checking architecture, security vulnerabilities, performance bottlenecks,
and code style — then merging all findings into a single report.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example: full-stack development&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build a harness for full-stack website development. The team should handle
design, frontend (React/Next.js), backend (API), and QA testing in a
coordinated pipeline from wireframe to deployment.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After you run one of these, Harness generates files in your project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your-project/
├── .claude/
│   ├── agents/
│   │   ├── analyst.md
│   │   ├── builder.md
│   │   └── qa.md
│   └── skills/
│       ├── analyze/
│       │   └── SKILL.md
│       └── build/
│           ├── SKILL.md
│           └── references/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent files define each agent's persona, capabilities, and constraints. The skill files define the step-by-step procedures each agent follows. You can read and edit every file — nothing is a black box.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Six-Phase Workflow Looks Like
&lt;/h2&gt;

&lt;p&gt;Harness does not just dump files. It runs a structured process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Domain Analysis&lt;/strong&gt; — it reads your prompt and identifies the key actors, inputs, and outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team Architecture Design&lt;/strong&gt; — it picks the right pattern from the six and sketches the team structure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Definition Generation&lt;/strong&gt; — it writes the &lt;code&gt;.claude/agents/&lt;/code&gt; markdown files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skill Generation&lt;/strong&gt; — it writes the &lt;code&gt;.claude/skills/&lt;/code&gt; files with Progressive Disclosure (loading only what context is needed, when it is needed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration and Orchestration&lt;/strong&gt; — it wires inter-agent data passing and error handling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation and Testing&lt;/strong&gt; — it sets up trigger verification and dry-run tests&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Is It Worth It?
&lt;/h2&gt;

&lt;p&gt;The repo includes A/B test results from a companion repository (&lt;code&gt;revfactory/claude-code-harness&lt;/code&gt;) covering 15 software engineering tasks:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Without Harness&lt;/th&gt;
&lt;th&gt;With Harness&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Average Quality Score&lt;/td&gt;
&lt;td&gt;49.5 / 100&lt;/td&gt;
&lt;td&gt;79.3 / 100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Win Rate&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;15 out of 15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output Variance&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;-32%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The improvement scaled with task complexity: +23.8 points on basic tasks, +29.6 on advanced, +36.2 on expert-level tasks. The more difficult the problem, the more structure helps.&lt;/p&gt;

&lt;p&gt;One important caveat: this is an author-measured study with n=15, and third-party replications have not yet been published. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it clearly helps:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are starting a new project and want agent scaffolding without spending hours on definitions&lt;/li&gt;
&lt;li&gt;Your task has multiple distinct sub-problems that map cleanly onto a team pattern&lt;/li&gt;
&lt;li&gt;You want to experiment with different team architectures quickly&lt;/li&gt;
&lt;li&gt;You are building something complex enough that ad-hoc prompting produces inconsistent results&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What Gets Generated vs What You Maintain
&lt;/h2&gt;

&lt;p&gt;Harness generates a starting point. The files it creates are plain markdown — readable, editable, version-controllable. You own them after generation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Ecosystem Fit
&lt;/h2&gt;

&lt;p&gt;Harness is Claude Code-native. It does not work with Gemini CLI or Codex out of the box — a Codex port called &lt;code&gt;meta-harness&lt;/code&gt; exists for that.&lt;/p&gt;

&lt;p&gt;If you are using LangGraph for state-recoverable, long-running orchestration, Harness is not a replacement. LangGraph handles persistent state and recovery across sessions; Harness handles team architecture design within Claude Code. They occupy different layers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable agent teams&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1

&lt;span class="c"&gt;# Install via plugin marketplace&lt;/span&gt;
/plugin marketplace add revfactory/harness
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;harness@harness

&lt;span class="c"&gt;# Or manually&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; skills/harness ~/.claude/skills/harness

&lt;span class="c"&gt;# Use it&lt;/span&gt;
&lt;span class="s2"&gt;"Build a harness for [your domain]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Verdict
&lt;/h2&gt;

&lt;p&gt;Harness solves a real problem. Multi-agent scaffolding is tedious to write from scratch, easy to get wrong, and hard to keep consistent. Harness handles the structural work so you can focus on the domain logic.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Repository: &lt;a href="https://github.com/revfactory/harness" rel="noopener noreferrer"&gt;github.com/revfactory/harness&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;License: Apache 2.0&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agentskills</category>
      <category>claude</category>
      <category>programming</category>
    </item>
    <item>
      <title>Compound Engineering: A Plugin That Makes Your AI Coding Agent Smarter Over Time</title>
      <dc:creator>ArshTechPro</dc:creator>
      <pubDate>Sat, 30 May 2026 23:34:00 +0000</pubDate>
      <link>https://dev.to/arshtechpro/compound-engineering-a-plugin-that-makes-your-ai-coding-agent-smarter-over-time-2pp0</link>
      <guid>https://dev.to/arshtechpro/compound-engineering-a-plugin-that-makes-your-ai-coding-agent-smarter-over-time-2pp0</guid>
      <description>&lt;p&gt;Most developers using AI coding tools hit the same ceiling eventually. The agent writes code, you accept or reject it, and next time it starts from scratch again. There's no memory of what worked, no accumulated judgment about your codebase, no improvement from one session to the next. You're getting faster, but the tool isn't getting better at helping you specifically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compound Engineering&lt;/strong&gt; is a plugin that tries to fix that. Built by Every.to and available for Claude Code, Cursor, Codex, GitHub Copilot, and a growing list of other tools, it introduces a structured workflow designed around a simple principle: each unit of engineering work should make the next one easier.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Idea
&lt;/h2&gt;

&lt;p&gt;Traditional development accumulates technical debt. Features add complexity, bug fixes leave behind knowledge no one wrote down, and the codebase slowly becomes harder to change.&lt;/p&gt;

&lt;p&gt;The Compound Engineering philosophy inverts the ratio: 80% of the effort goes into planning and review, 20% into execution. The thinking is that a sharp plan produces a smaller, cleaner implementation. A good code review catches a pattern, not just a specific bug. A documented learning means the agent doesn't have to rediscover the same constraint next week.&lt;/p&gt;

&lt;p&gt;The plugin ships 37 skills and 51 agents that implement this workflow as slash commands you run inside your AI coding tool.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Workflow Loop
&lt;/h2&gt;

&lt;p&gt;The core loop looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/ce-brainstorm &lt;span class="s2"&gt;"add retry logic to background jobs"&lt;/span&gt;
/ce-plan docs/brainstorms/background-job-retry-requirements.md
/ce-work
/ce-code-review
/ce-compound
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what each step actually does:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/ce-brainstorm&lt;/code&gt;&lt;/strong&gt; runs an interactive Q&amp;amp;A session. It asks clarifying questions about your feature or problem, then produces a right-sized requirements document. The output is a file you can hand directly to the next step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/ce-plan&lt;/code&gt;&lt;/strong&gt; takes that requirements document and turns it into a detailed implementation plan: what to change, what to test, what the edge cases are.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/ce-work&lt;/code&gt;&lt;/strong&gt; executes the plan. It uses worktrees for isolation and tracks tasks as it goes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/ce-code-review&lt;/code&gt;&lt;/strong&gt; is a multi-agent review pass before you merge. It looks for issues but, more importantly, tries to catch patterns — recurring problems that are worth documenting rather than just fixing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/ce-compound&lt;/code&gt;&lt;/strong&gt; is where the compounding happens. It documents the learnings from this cycle so the agent has better context the next time you work on something similar.&lt;/p&gt;

&lt;p&gt;There are also two commands that sit outside the core loop:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/ce-strategy&lt;/code&gt;&lt;/strong&gt; creates and maintains a &lt;code&gt;STRATEGY.md&lt;/code&gt; file — the product's target problem, approach, personas, and key metrics. When this file exists, brainstorm and plan commands read it as grounding, so your strategy choices flow naturally into feature decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/ce-ideate&lt;/code&gt;&lt;/strong&gt; sits upstream of brainstorm for bigger questions. Instead of jumping into requirements, it generates and critically evaluates several ideas, then routes the strongest one into the brainstorm step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/ce-debug&lt;/code&gt;&lt;/strong&gt; is for bug investigations. It systematically reproduces the failure, traces the root cause, and implements a fix rather than just patching the symptom.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/ce-product-pulse&lt;/code&gt;&lt;/strong&gt; generates a time-windowed report on usage, performance, and errors. Reports are saved to &lt;code&gt;docs/pulse-reports/&lt;/code&gt; so they accumulate into a browseable history of how the product is actually performing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Claude Code (simplest path)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add EveryInc/compound-engineering-plugin
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;compound-engineering
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No Bun required. After installing, run &lt;code&gt;/ce-setup&lt;/code&gt; to check your environment and bootstrap project config.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cursor
&lt;/h3&gt;

&lt;p&gt;In Cursor Agent chat:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/add-plugin compound-engineering
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or search for "compound engineering" in the plugin marketplace.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub Copilot (VS Code)
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Open the VS Code command palette&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;Chat: Install Plugin from Source&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Enter &lt;code&gt;EveryInc/compound-engineering-plugin&lt;/code&gt; as the repo&lt;/li&gt;
&lt;li&gt;Select &lt;code&gt;compound-engineering&lt;/code&gt; when VS Code shows the available plugins&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Codex (three steps required)
&lt;/h3&gt;

&lt;p&gt;Codex currently needs an extra step because its native plugin spec handles skills but not custom agents. The agents are what power commands like &lt;code&gt;/ce-code-review&lt;/code&gt; and &lt;code&gt;/ce-plan&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: Register the marketplace&lt;/span&gt;
codex plugin marketplace add EveryInc/compound-engineering-plugin

&lt;span class="c"&gt;# Step 2: Install the agents via Bun&lt;/span&gt;
bunx @every-env/compound-plugin &lt;span class="nb"&gt;install &lt;/span&gt;compound-engineering &lt;span class="nt"&gt;--to&lt;/span&gt; codex

&lt;span class="c"&gt;# Step 3: Launch Codex, run /plugins, find Compound Engineering, and install&lt;/span&gt;
codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All three steps are required. Skipping the Bun step means delegation-based skills will report missing agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gemini CLI, OpenCode, Kiro, Pi
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bunx @every-env/compound-plugin &lt;span class="nb"&gt;install &lt;/span&gt;compound-engineering &lt;span class="nt"&gt;--to&lt;/span&gt; gemini
bunx @every-env/compound-plugin &lt;span class="nb"&gt;install &lt;/span&gt;compound-engineering &lt;span class="nt"&gt;--to&lt;/span&gt; opencode
bunx @every-env/compound-plugin &lt;span class="nb"&gt;install &lt;/span&gt;compound-engineering &lt;span class="nt"&gt;--to&lt;/span&gt; kiro
bunx @every-env/compound-plugin &lt;span class="nb"&gt;install &lt;/span&gt;compound-engineering &lt;span class="nt"&gt;--to&lt;/span&gt; pi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  A Typical Bug Investigation
&lt;/h2&gt;

&lt;p&gt;For debugging, the flow is shorter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/ce-debug &lt;span class="s2"&gt;"checkout webhook sometimes creates duplicate invoices"&lt;/span&gt;
/ce-code-review
/ce-compound
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;/ce-debug&lt;/code&gt; doesn't just jump to a fix. It reproduces the failure first, traces where it originates, then implements a targeted fix. After a review and a compound step, that knowledge about the invoicing edge case is now part of the project's accumulated context.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Actually Gets Written to Disk
&lt;/h2&gt;

&lt;p&gt;This is worth understanding. Compound Engineering is not just about prompts — it produces files in your project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;STRATEGY.md&lt;/code&gt; — the product anchor document, if you use &lt;code&gt;/ce-strategy&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;docs/brainstorms/&lt;/code&gt; — requirements documents from &lt;code&gt;/ce-brainstorm&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;docs/pulse-reports/&lt;/code&gt; — product performance reports from &lt;code&gt;/ce-product-pulse&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Compound notes written by &lt;code&gt;/ce-compound&lt;/code&gt;, stored wherever the plugin is configured to put them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These files are meant to persist across sessions and become grounding context for future agent interactions. The point is that each cycle is building toward a more informed next cycle, not starting fresh.&lt;/p&gt;




&lt;h2&gt;
  
  
  Is It Worth Using?
&lt;/h2&gt;

&lt;p&gt;The plugin is a genuine attempt to solve a real problem: AI coding agents are stateless by default, and their usefulness degrades over the life of a complex project unless you actively manage context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's worth trying if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're working on a non-trivial codebase where decisions have history and context matters&lt;/li&gt;
&lt;li&gt;You find yourself re-explaining the same architectural constraints to your agent in every session&lt;/li&gt;
&lt;li&gt;You want more structured reviews than just "does this code work"&lt;/li&gt;
&lt;li&gt;You're using Claude Code, Cursor, or Copilot and want a workflow rather than just a chat interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;It may be overkill if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're working on small, self-contained scripts or prototypes&lt;/li&gt;
&lt;li&gt;Your sessions are isolated enough that accumulated context doesn't matter&lt;/li&gt;
&lt;li&gt;You prefer a lighter workflow and the brainstorm/plan/compound ceremony feels like friction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The contribution policy is also worth knowing: the author explicitly does not accept outside contributions and reviews issues and PRs through their own agents rather than directly. That's an unusual choice for an open-source tool, but it's stated clearly and the release cadence (153 releases, latest in May 2026) suggests active maintenance regardless.&lt;/p&gt;

&lt;p&gt;One honest note: the value of this plugin scales with how consistently you run the full loop. If you only use &lt;code&gt;/ce-work&lt;/code&gt; and skip &lt;code&gt;/ce-compound&lt;/code&gt;, you're leaving the most important part on the table. The compounding only happens if you complete the cycle.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/ce-setup&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;First-time setup and environment check&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/ce-strategy&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create or update &lt;code&gt;STRATEGY.md&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/ce-ideate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Big-picture ideation before brainstorming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/ce-brainstorm&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Interactive requirements doc generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/ce-plan&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Turn requirements into an implementation plan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/ce-work&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Execute the plan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/ce-debug&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reproduce, trace, and fix a bug&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/ce-code-review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Multi-agent pre-merge review&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/ce-doc-review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Documentation review&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/ce-compound&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Document learnings for future sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/ce-product-pulse&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Time-windowed usage and error report&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/EveryInc/compound-engineering-plugin" rel="noopener noreferrer"&gt;https://github.com/EveryInc/compound-engineering-plugin&lt;/a&gt;  &lt;/p&gt;

</description>
      <category>ai</category>
      <category>agentskills</category>
      <category>claude</category>
      <category>programming</category>
    </item>
    <item>
      <title>MarkItDown: Microsoft's Tool for Converting Almost Anything to Markdown</title>
      <dc:creator>ArshTechPro</dc:creator>
      <pubDate>Fri, 29 May 2026 14:50:33 +0000</pubDate>
      <link>https://dev.to/arshtechpro/markitdown-microsofts-tool-for-converting-almost-anything-to-markdown-5hf5</link>
      <guid>https://dev.to/arshtechpro/markitdown-microsofts-tool-for-converting-almost-anything-to-markdown-5hf5</guid>
      <description>&lt;p&gt;If you've been building LLM-powered applications, you've likely run into the same problem: your data lives in PDFs, Word documents, Excel sheets, and PowerPoint decks — but your AI pipeline expects clean text. Copy-pasting doesn't scale, and most conversion tools either strip too much structure or produce noisy output.&lt;/p&gt;

&lt;p&gt;Microsoft's &lt;strong&gt;MarkItDown&lt;/strong&gt; is built specifically for this gap. It's a lightweight Python utility that converts a wide range of file formats into Markdown, preserving the structure that matters: headings, tables, lists, and links.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is MarkItDown?
&lt;/h2&gt;

&lt;p&gt;MarkItDown is a Python library (and CLI tool) that converts files and documents into Markdown. It is not designed for pixel-perfect human-readable output. The explicit goal is to feed text into LLMs and text analysis pipelines — and Markdown is the right format for that because most large language models understand it natively and it is highly token-efficient.&lt;/p&gt;

&lt;p&gt;Supported formats include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF&lt;/li&gt;
&lt;li&gt;Word (.docx)&lt;/li&gt;
&lt;li&gt;PowerPoint (.pptx)&lt;/li&gt;
&lt;li&gt;Excel (.xlsx and older .xls)&lt;/li&gt;
&lt;li&gt;Images (EXIF metadata + optional OCR)&lt;/li&gt;
&lt;li&gt;Audio files (EXIF metadata + optional speech transcription)&lt;/li&gt;
&lt;li&gt;HTML&lt;/li&gt;
&lt;li&gt;CSV, JSON, XML&lt;/li&gt;
&lt;li&gt;ZIP files (iterates and converts contents)&lt;/li&gt;
&lt;li&gt;YouTube URLs (fetches transcription)&lt;/li&gt;
&lt;li&gt;EPubs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a broad surface area for one library.&lt;/p&gt;




&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;You need Python 3.10 or higher. The simplest way to get everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'markitdown[all]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;[all]&lt;/code&gt; flag installs all optional dependencies for every supported format. If you want a leaner install, you can pick specific formats:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'markitdown[pdf,docx,pptx]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Available optional extras: &lt;code&gt;pdf&lt;/code&gt;, &lt;code&gt;docx&lt;/code&gt;, &lt;code&gt;pptx&lt;/code&gt;, &lt;code&gt;xlsx&lt;/code&gt;, &lt;code&gt;xls&lt;/code&gt;, &lt;code&gt;outlook&lt;/code&gt;, &lt;code&gt;audio-transcription&lt;/code&gt;, &lt;code&gt;youtube-transcription&lt;/code&gt;, &lt;code&gt;az-doc-intel&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It is recommended to work inside a virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'markitdown[all]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Using the CLI
&lt;/h2&gt;

&lt;p&gt;The command-line interface is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Convert a file and print to stdout&lt;/span&gt;
markitdown report.pdf

&lt;span class="c"&gt;# Save output to a file&lt;/span&gt;
markitdown report.pdf &lt;span class="nt"&gt;-o&lt;/span&gt; report.md

&lt;span class="c"&gt;# Pipe input&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;report.pdf | markitdown
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No configuration required for basic use.&lt;/p&gt;




&lt;h2&gt;
  
  
  Using the Python API
&lt;/h2&gt;

&lt;p&gt;For programmatic use in your pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;markitdown&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MarkItDown&lt;/span&gt;

&lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MarkItDown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enable_plugins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;financials.xlsx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;result.text_content&lt;/code&gt; attribute holds the converted Markdown string.&lt;/p&gt;

&lt;h3&gt;
  
  
  Converting Different File Types
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;markitdown&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MarkItDown&lt;/span&gt;

&lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MarkItDown&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Word document
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;proposal.docx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# PowerPoint deck
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slides.pptx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# CSV file
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# HTML file
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page.html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API is consistent regardless of file type. You call &lt;code&gt;.convert()&lt;/code&gt; and get back a result object.&lt;/p&gt;




&lt;h2&gt;
  
  
  LLM-Powered Image Descriptions
&lt;/h2&gt;

&lt;p&gt;If you pass an image file (or a PowerPoint with images), MarkItDown can call an LLM to generate descriptions for those images, which then become part of the Markdown output. You supply your own client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;markitdown&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MarkItDown&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MarkItDown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;diagram.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is useful when the actual visual content of an image matters for downstream processing, not just the file metadata.&lt;/p&gt;




&lt;h2&gt;
  
  
  OCR Support via Plugin
&lt;/h2&gt;

&lt;p&gt;For PDFs and Office documents that contain images with embedded text (scanned documents, screenshots inside slides), MarkItDown supports a separate OCR plugin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;markitdown-ocr
pip &lt;span class="nb"&gt;install &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;markitdown&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MarkItDown&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MarkItDown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;enable_plugins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;llm_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;llm_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scanned_report.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The OCR plugin uses the same LLM vision pattern as image descriptions — no separate ML libraries or binaries are required.&lt;/p&gt;




&lt;h2&gt;
  
  
  Azure Document Intelligence
&lt;/h2&gt;

&lt;p&gt;For enterprise-grade document parsing (better table extraction, form recognition), MarkItDown integrates with Azure Document Intelligence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# CLI&lt;/span&gt;
markitdown report.pdf &lt;span class="nt"&gt;-o&lt;/span&gt; report.md &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;your_endpoint&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;markitdown&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MarkItDown&lt;/span&gt;

&lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MarkItDown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docintel_endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;your_endpoint&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complex_form.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the right path if you are processing complex financial documents, legal contracts, or forms where structure accuracy is critical.&lt;/p&gt;




&lt;h2&gt;
  
  
  Running with Docker
&lt;/h2&gt;

&lt;p&gt;If you prefer containerized workflows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; markitdown:latest &lt;span class="nb"&gt;.&lt;/span&gt;
docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; markitdown:latest &amp;lt; your-file.pdf &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; output.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Plugin Ecosystem
&lt;/h2&gt;

&lt;p&gt;MarkItDown supports third-party plugins. They are disabled by default.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List installed plugins&lt;/span&gt;
markitdown &lt;span class="nt"&gt;--list-plugins&lt;/span&gt;

&lt;span class="c"&gt;# Enable plugins for a conversion&lt;/span&gt;
markitdown &lt;span class="nt"&gt;--use-plugins&lt;/span&gt; path-to-file.pdf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To find community plugins, search GitHub for &lt;code&gt;#markitdown-plugin&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Security Considerations
&lt;/h2&gt;

&lt;p&gt;One thing worth knowing before you integrate this into a server-side application: MarkItDown runs with the privileges of the current process. It can access local files and remote URIs the same way &lt;code&gt;open()&lt;/code&gt; or &lt;code&gt;requests.get()&lt;/code&gt; can.&lt;/p&gt;

&lt;p&gt;The recommendation from the project is to avoid passing untrusted input directly to &lt;code&gt;.convert()&lt;/code&gt;. If you only need to convert local files, use &lt;code&gt;convert_local()&lt;/code&gt;. If you need to handle streams, use &lt;code&gt;convert_stream()&lt;/code&gt;. Prefer the narrowest API for your use case.&lt;/p&gt;

&lt;p&gt;This is standard advice for any file processing library, but it is worth calling out explicitly if you are building a web-facing feature.&lt;/p&gt;




&lt;h2&gt;
  
  
  Is It Worth Using?
&lt;/h2&gt;

&lt;p&gt;The honest answer: it depends on what you need it for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MarkItDown is a good fit if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are building an LLM pipeline that needs to ingest documents in various formats.&lt;/li&gt;
&lt;li&gt;You want a consistent Python API across PDF, Word, Excel, HTML, and other types without gluing together multiple libraries.&lt;/li&gt;
&lt;li&gt;You need a quick CLI tool to batch-convert files for indexing or embedding.&lt;/li&gt;
&lt;li&gt;You want the flexibility to extend conversion behavior via plugins.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;MarkItDown is not the right tool if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need pixel-perfect conversion for human consumption. The project documentation explicitly says the output is meant for text analysis tools, not high-fidelity document rendering.&lt;/li&gt;
&lt;li&gt;You need production OCR without LLM dependencies. The OCR plugin requires an OpenAI-compatible client, which adds latency and cost.&lt;/li&gt;
&lt;li&gt;You are working with heavily formatted documents where layout matters beyond headings and tables (e.g., multi-column academic papers, complex invoice layouts).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Install all formats&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pip install 'markitdown[all]'&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Convert via CLI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;markitdown file.pdf -o output.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Convert via Python&lt;/td&gt;
&lt;td&gt;&lt;code&gt;MarkItDown().convert("file.pdf").text_content&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Convert with LLM images&lt;/td&gt;
&lt;td&gt;Pass &lt;code&gt;llm_client&lt;/code&gt; and &lt;code&gt;llm_model&lt;/code&gt; to &lt;code&gt;MarkItDown()&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enable OCR plugin&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pip install markitdown-ocr&lt;/code&gt;, then &lt;code&gt;enable_plugins=True&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use Azure Doc Intelligence&lt;/td&gt;
&lt;td&gt;Pass &lt;code&gt;docintel_endpoint&lt;/code&gt; to &lt;code&gt;MarkItDown()&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Run via Docker&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docker run --rm -i markitdown:latest &amp;lt; file.pdf &amp;gt; output.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/microsoft/markitdown" rel="noopener noreferrer"&gt;https://github.com/microsoft/markitdown&lt;/a&gt;  &lt;/p&gt;

</description>
      <category>ai</category>
      <category>microsoft</category>
      <category>llm</category>
      <category>programming</category>
    </item>
    <item>
      <title>Pi: The Open-Source AI Coding Agent You Probably Haven't Tried Yet</title>
      <dc:creator>ArshTechPro</dc:creator>
      <pubDate>Tue, 26 May 2026 10:54:21 +0000</pubDate>
      <link>https://dev.to/arshtechpro/pi-the-open-source-ai-coding-agent-you-probably-havent-tried-yet-2h0h</link>
      <guid>https://dev.to/arshtechpro/pi-the-open-source-ai-coding-agent-you-probably-havent-tried-yet-2h0h</guid>
      <description>&lt;p&gt;If you've been following the AI coding agent space, you've likely heard of Claude Code, GitHub Copilot, or Codex. But there's a fast-moving open-source alternative sitting at over 46,000 GitHub stars that deserves a serious look: &lt;strong&gt;pi&lt;/strong&gt;, from &lt;a href="https://github.com/earendil-works/pi" rel="noopener noreferrer"&gt;earendil-works/pi&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This article walks you through what pi actually is, how to get it running in under five minutes, and whether it's worth adding to your workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Pi?
&lt;/h2&gt;

&lt;p&gt;Pi is a monorepo of tools built for constructing and running AI agents. The centerpiece is a &lt;strong&gt;coding agent CLI&lt;/strong&gt; — a terminal-based assistant that can read your files, write code, run shell commands, and iterate on tasks, all within your actual project directory.&lt;/p&gt;

&lt;p&gt;The repo is built entirely in TypeScript and ships as a set of npm packages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;@earendil-works/pi-coding-agent&lt;/code&gt; — the interactive CLI you'll use day to day&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@earendil-works/pi-agent-core&lt;/code&gt; — the agent runtime (tool calling, state management) for building your own agents&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@earendil-works/pi-ai&lt;/code&gt; — a unified LLM API layer that normalizes OpenAI, Anthropic, Google, and others behind one interface&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@earendil-works/pi-tui&lt;/code&gt; — a terminal UI library with differential rendering&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@earendil-works/pi-web-ui&lt;/code&gt; — web components for AI chat interfaces&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Setup in Five Minutes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt; Node.js installed, and an API key or existing subscription (Claude Pro, ChatGPT Plus, or GitHub Copilot).&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Install
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @earendil-works/pi-coding-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole install. No Docker, no Python environment, no build step.&lt;/p&gt;

&lt;p&gt;If you prefer another package manager:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm add &lt;span class="nt"&gt;-g&lt;/span&gt; @earendil-works/pi-coding-agent
&lt;span class="c"&gt;# or&lt;/span&gt;
yarn global add @earendil-works/pi-coding-agent
&lt;span class="c"&gt;# or&lt;/span&gt;
bun add &lt;span class="nt"&gt;-g&lt;/span&gt; @earendil-works/pi-coding-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Authenticate
&lt;/h3&gt;

&lt;p&gt;Pi supports two authentication paths.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A — Subscription login (Claude Pro/Max, ChatGPT Plus/Pro, GitHub Copilot):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Start pi from any directory and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pi
/login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A prompt will appear to select your provider. This stores credentials in &lt;code&gt;~/.pi/agent/auth.json&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option B — API key:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-...
pi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can use &lt;code&gt;OPENAI_API_KEY&lt;/code&gt;, &lt;code&gt;GOOGLE_API_KEY&lt;/code&gt;, or others the same way. The &lt;code&gt;/login&lt;/code&gt; command can also store API keys interactively so you don't need to export them every session.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Start a session
&lt;/h3&gt;

&lt;p&gt;Navigate to your project and launch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; /path/to/your/project
pi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pi starts in interactive mode and loads your project directory as its working context. Type a request and press Enter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Summarize this repository and tell me how to run its checks.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Out of the box, the agent has access to four tools: &lt;code&gt;read&lt;/code&gt; (read files), &lt;code&gt;write&lt;/code&gt; (create or overwrite files), &lt;code&gt;edit&lt;/code&gt; (patch files), and &lt;code&gt;bash&lt;/code&gt; (run shell commands). Additional read-only tools like &lt;code&gt;grep&lt;/code&gt;, &lt;code&gt;find&lt;/code&gt;, and &lt;code&gt;ls&lt;/code&gt; are available through tool options.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Features Worth Knowing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Context files
&lt;/h3&gt;

&lt;p&gt;Pi loads &lt;code&gt;AGENTS.md&lt;/code&gt; (or &lt;code&gt;CLAUDE.md&lt;/code&gt;) files at startup to give the model project-specific instructions. You can have a global one in &lt;code&gt;~/.pi/agent/AGENTS.md&lt;/code&gt; and a per-project one in your repo root. Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project Instructions&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Run &lt;span class="sb"&gt;`npm run check`&lt;/span&gt; after code changes.
&lt;span class="p"&gt;-&lt;/span&gt; Do not run production migrations locally.
&lt;span class="p"&gt;-&lt;/span&gt; Keep responses concise.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;/reload&lt;/code&gt; inside a session to pick up changes without restarting.&lt;/p&gt;

&lt;h3&gt;
  
  
  File references
&lt;/h3&gt;

&lt;p&gt;Type &lt;code&gt;@&lt;/code&gt; in the editor to fuzzy-search and reference files, or pass them on the command line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pi @src/app.ts @src/app.test.ts &lt;span class="s2"&gt;"Review these together"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can paste images with Ctrl+V (Alt+V on Windows) or drag them into supported terminals.&lt;/p&gt;

&lt;h3&gt;
  
  
  Session management
&lt;/h3&gt;

&lt;p&gt;Sessions are saved automatically. Resuming is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pi &lt;span class="nt"&gt;-c&lt;/span&gt;         &lt;span class="c"&gt;# Continue most recent session&lt;/span&gt;
pi &lt;span class="nt"&gt;-r&lt;/span&gt;         &lt;span class="c"&gt;# Browse previous sessions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside a session, &lt;code&gt;/fork&lt;/code&gt; and &lt;code&gt;/clone&lt;/code&gt; let you branch the conversation tree — useful when you want to try two different approaches to a problem without losing your current state.&lt;/p&gt;

&lt;h3&gt;
  
  
  Non-interactive (one-shot) mode
&lt;/h3&gt;

&lt;p&gt;Pi works well in scripts and pipelines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pi &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"Summarize this codebase"&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;README.md | pi &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"Summarize this text"&lt;/span&gt;
pi &lt;span class="nt"&gt;-p&lt;/span&gt; @screenshot.png &lt;span class="s2"&gt;"What's in this image?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For automation, &lt;code&gt;--mode json&lt;/code&gt; gives structured event output and &lt;code&gt;--mode rpc&lt;/code&gt; allows stdin/stdout process integration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shell commands mid-session
&lt;/h3&gt;

&lt;p&gt;Prefix a command with &lt;code&gt;!&lt;/code&gt; to run it and send the output to the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;!&lt;/span&gt;npm run lint
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;!!command&lt;/code&gt; to run it without adding the output to the model's context window.&lt;/p&gt;

&lt;h3&gt;
  
  
  Model switching
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;/model&lt;/code&gt; or Ctrl+L to change models mid-session. Shift+Tab cycles thinking levels. This is useful if you want a fast cheap model for exploration and a smarter one for final implementation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Using Pi as a Library
&lt;/h2&gt;

&lt;p&gt;If you're building something on top of pi rather than using it as a CLI, the SDK path is clean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;AuthStorage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;createAgentSession&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;ModelRegistry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;SessionManager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@earendil-works/pi-coding-agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;authStorage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;AuthStorage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;modelRegistry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ModelRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;authStorage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createAgentSession&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;sessionManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SessionManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inMemory&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="nx"&gt;authStorage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;modelRegistry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;What files are in the current directory?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For non-Node.js integrations, pi supports RPC mode over stdin/stdout with JSONL framing — so you can integrate from any language.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building from Source
&lt;/h2&gt;

&lt;p&gt;If you want to contribute or run from source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/earendil-works/pi.git
&lt;span class="nb"&gt;cd &lt;/span&gt;pi
npm &lt;span class="nb"&gt;install&lt;/span&gt;       &lt;span class="c"&gt;# Install all dependencies&lt;/span&gt;
npm run build     &lt;span class="c"&gt;# Build all packages&lt;/span&gt;
npm run check     &lt;span class="c"&gt;# Lint, format, and type check&lt;/span&gt;
./pi-test.sh      &lt;span class="c"&gt;# Run pi from sources (any directory)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: &lt;code&gt;npm run check&lt;/code&gt; requires a prior &lt;code&gt;npm run build&lt;/code&gt; because the web-ui package needs compiled &lt;code&gt;.d.ts&lt;/code&gt; files from dependencies.&lt;/p&gt;




&lt;h2&gt;
  
  
  Is It Worth a Try?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Yes, with some caveats.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pi earns attention for a few concrete reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's genuinely multi-provider.&lt;/strong&gt; Most coding agents are tied to one model provider. Pi normalizes across OpenAI, Anthropic, Google, and others at the API layer, so you can switch without re-learning a tool. If you already pay for Claude Pro or GitHub Copilot, pi can use those subscriptions directly — no extra API costs by default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The session model is well-designed.&lt;/strong&gt; Branching, forking, and resuming sessions is something most similar tools handle poorly. Pi treats this as a first-class feature, which matters when you're doing long iterative work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The extensibility story is solid.&lt;/strong&gt; Extensions are TypeScript modules that can add tools, slash commands, event handlers, and custom UI. If the built-in tools don't cover your workflow, you can add to them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it's less compelling:&lt;/strong&gt; The terminal UI won't appeal to everyone, and if you're deeply embedded in VS Code with Copilot already working, the switching cost is real. The documentation is good but spread across many individual files in the repo — there's no single polished docs site yet.&lt;/p&gt;

&lt;p&gt;For developers who want control over their AI tooling, prefer the terminal, or need to build agents programmatically rather than just use them interactively, pi is a serious option. It's the kind of tool that rewards spending an hour with it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Install&lt;/td&gt;
&lt;td&gt;&lt;code&gt;npm install -g @earendil-works/pi-coding-agent&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Start in project&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cd /project &amp;amp;&amp;amp; pi&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Login (subscription)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;/login&lt;/code&gt; inside pi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set API key&lt;/td&gt;
&lt;td&gt;&lt;code&gt;export ANTHROPIC_API_KEY=...&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continue last session&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pi -c&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browse sessions&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pi -r&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One-shot prompt&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pi -p "your prompt"&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Switch model&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;/model&lt;/code&gt; or Ctrl+L&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Run shell command&lt;/td&gt;
&lt;td&gt;&lt;code&gt;!your-command&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reload context files&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/reload&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Uninstall&lt;/td&gt;
&lt;td&gt;&lt;code&gt;npm uninstall -g @earendil-works/pi-coding-agent&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/earendil-works/pi" rel="noopener noreferrer"&gt;github.com/earendil-works/pi&lt;/a&gt;  &lt;/p&gt;

</description>
      <category>ai</category>
      <category>coding</category>
      <category>agents</category>
      <category>python</category>
    </item>
    <item>
      <title>cmux: The Native macOS Terminal Built for Running AI Coding Agents in Parallel</title>
      <dc:creator>ArshTechPro</dc:creator>
      <pubDate>Mon, 25 May 2026 04:11:23 +0000</pubDate>
      <link>https://dev.to/arshtechpro/cmux-the-native-macos-terminal-built-for-running-ai-coding-agents-in-parallel-52il</link>
      <guid>https://dev.to/arshtechpro/cmux-the-native-macos-terminal-built-for-running-ai-coding-agents-in-parallel-52il</guid>
      <description>&lt;p&gt;If you have ever run three Claude Code sessions at the same time in a stock terminal, you know the pain. Notifications are generic ("Claude is waiting for your input" — every single time), tab titles blur together, and there is no good way to tell which agent needs you without clicking into each pane one by one. cmux was built to fix exactly this.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is cmux?
&lt;/h2&gt;

&lt;p&gt;cmux is an open-source, native macOS terminal application built on top of &lt;a href="https://ghostty.org/" rel="noopener noreferrer"&gt;Ghostty&lt;/a&gt;, the GPU-accelerated terminal emulator. It wraps Ghostty's rendering engine (libghostty) in a Swift/AppKit shell and layers on top the features that matter when you are managing multiple AI coding agents simultaneously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vertical tab sidebar&lt;/strong&gt; showing git branch, linked PR status, working directory, listening ports, and the latest notification text for each workspace&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-aware notification rings&lt;/strong&gt; — when an agent needs input, its pane gets a blue visual ring and the sidebar tab lights up&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notification panel&lt;/strong&gt; with a single keyboard shortcut to jump to the most recent unread agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In-app split browser&lt;/strong&gt; with a scriptable API so agents can interact with your dev server directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Socket and CLI API&lt;/strong&gt; to script workspace creation, pane splits, keystrokes, and browser control from anywhere&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It reads your existing &lt;code&gt;~/.config/ghostty/config&lt;/code&gt;, so your fonts, themes, and colors carry over instantly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Not Just Use tmux or iTerm2?
&lt;/h2&gt;

&lt;p&gt;Fair question. Here is the honest comparison.&lt;/p&gt;

&lt;h3&gt;
  
  
  cmux vs tmux
&lt;/h3&gt;

&lt;p&gt;tmux is a terminal multiplexer that runs inside any terminal. It is text-based, highly composable, and works over SSH. It has no native notification system for AI agents — you would need to wire up OSC sequences yourself and build your own status line logic. The tab sidebar in cmux gives you live git branch, PR number, CWD, and agent notification text with zero configuration. tmux also runs inside existing terminals, so you are still at the mercy of whatever notification plumbing that terminal has (or does not have).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;cmux&lt;/th&gt;
&lt;th&gt;tmux&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI agent notification rings&lt;/td&gt;
&lt;td&gt;Built-in&lt;/td&gt;
&lt;td&gt;Manual setup required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vertical sidebar with git/PR status&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No (status bar only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU-accelerated rendering&lt;/td&gt;
&lt;td&gt;Yes (libghostty)&lt;/td&gt;
&lt;td&gt;Depends on host terminal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;In-app browser with scripting API&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Native macOS app&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Works over SSH&lt;/td&gt;
&lt;td&gt;Not yet&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-platform&lt;/td&gt;
&lt;td&gt;macOS only&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  cmux vs iTerm2
&lt;/h3&gt;

&lt;p&gt;iTerm2 is the veteran macOS terminal. It has excellent shell integration, triggers, and a mature notification system. But it is not built with AI agent workflows in mind — notifications do not carry workspace-level context, there is no sidebar showing agent state, and it is not GPU-accelerated. If you live in Claude Code, Codex, or OpenCode all day, iTerm2 will give you a generic macOS notification with no way to quickly surface which of your eight agents actually needs attention.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;cmux&lt;/th&gt;
&lt;th&gt;iTerm2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Agent notification with visual ring&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sidebar with per-workspace agent status&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU-accelerated (libghostty)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;In-app scriptable browser&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shell integration / triggers&lt;/td&gt;
&lt;td&gt;Via CLI&lt;/td&gt;
&lt;td&gt;Yes, mature&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-platform&lt;/td&gt;
&lt;td&gt;macOS only&lt;/td&gt;
&lt;td&gt;macOS only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open source&lt;/td&gt;
&lt;td&gt;Yes (AGPL-3.0)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  cmux vs Warp
&lt;/h3&gt;

&lt;p&gt;Warp is a modern Electron-based terminal with AI features built in. It has good UX but uses Electron/Tauri under the hood, which means higher memory usage and slower startup compared to a native Swift app. cmux is intentionally not an AI orchestrator — it is a primitive that gives you the tools to run any agent (Claude Code, Codex, OpenCode, Gemini CLI, Aider, Kiro) side by side without locking you into one workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: DMG (Recommended for First Install)
&lt;/h3&gt;

&lt;p&gt;Download the latest &lt;code&gt;.dmg&lt;/code&gt; from the releases page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/manaflow-ai/cmux/releases/latest/download/cmux-macos.dmg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open it, drag cmux to your Applications folder, and launch it. cmux auto-updates via Sparkle from that point — you only need to download once.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: Homebrew
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew tap manaflow-ai/cmux
brew &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--cask&lt;/span&gt; cmux
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To update later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew upgrade &lt;span class="nt"&gt;--cask&lt;/span&gt; cmux
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;System requirements:&lt;/strong&gt; macOS 14.0 or later, Apple Silicon or Intel.&lt;/p&gt;

&lt;p&gt;On first launch macOS will ask you to confirm opening an app from an identified developer. Click &lt;strong&gt;Open&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setting Up the CLI
&lt;/h2&gt;

&lt;p&gt;The CLI is what lets you script cmux from inside or outside the app. Inside cmux terminals it works automatically. To use it from an external script or CI hook, create a symlink:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo ln&lt;/span&gt; &lt;span class="nt"&gt;-sf&lt;/span&gt; &lt;span class="s2"&gt;"/Applications/cmux.app/Contents/Resources/bin/cmux"&lt;/span&gt; /usr/local/bin/cmux
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cmux list-workspaces
cmux notify &lt;span class="nt"&gt;--title&lt;/span&gt; &lt;span class="s2"&gt;"Hello"&lt;/span&gt; &lt;span class="nt"&gt;--body&lt;/span&gt; &lt;span class="s2"&gt;"cmux CLI is working"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Notification System
&lt;/h2&gt;

&lt;p&gt;This is the core reason to use cmux if you run agents in parallel. There are three ways to send a notification into cmux.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. CLI (Easiest)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cmux notify &lt;span class="nt"&gt;--title&lt;/span&gt; &lt;span class="s2"&gt;"Build Complete"&lt;/span&gt; &lt;span class="nt"&gt;--body&lt;/span&gt; &lt;span class="s2"&gt;"webpack finished in 4.2s"&lt;/span&gt;
cmux notify &lt;span class="nt"&gt;--title&lt;/span&gt; &lt;span class="s2"&gt;"Claude Code"&lt;/span&gt; &lt;span class="nt"&gt;--subtitle&lt;/span&gt; &lt;span class="s2"&gt;"Waiting"&lt;/span&gt; &lt;span class="nt"&gt;--body&lt;/span&gt; &lt;span class="s2"&gt;"Agent needs input"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. OSC 777 (Shell / Any Language)
&lt;/h3&gt;

&lt;p&gt;This is the RXVT escape sequence protocol. Works from any shell script or language that can write to stdout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'\e]777;notify;My Title;Message body\a'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Shell function you can drop in &lt;code&gt;.zshrc&lt;/code&gt; or &lt;code&gt;.bashrc&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cmux_notify&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'\e]777;notify;%s;%s\a'&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

cmux_notify &lt;span class="s2"&gt;"Tests passed"&lt;/span&gt; &lt;span class="s2"&gt;"All 142 tests green"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\x1b&lt;/span&gt;&lt;span class="s"&gt;]777;notify;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\x07&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Script done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processed 5000 rows&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From Node.js:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`\x1b]777;notify;&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;;&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\x07`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Build done&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;webpack finished&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. OSC 99 (Kitty Protocol — Richer)
&lt;/h3&gt;

&lt;p&gt;If you need subtitles or notification IDs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'\e]99;i=1;e=1;d=0;p=title:Build Complete\e\\'&lt;/span&gt;
&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'\e]99;i=1;e=1;d=0;p=subtitle:Project X\e\\'&lt;/span&gt;
&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'\e]99;i=1;e=1;d=1;p=body:All tests passed\e\\'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use OSC 777 for most cases. Use OSC 99 only when you need subtitle fields.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wiring Up Claude Code Hooks
&lt;/h2&gt;

&lt;p&gt;This is probably the most useful setup step. Claude Code supports lifecycle hooks, so you can fire a cmux notification the moment an agent stops or completes a sub-task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — Create the hook script:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# ~/.claude/hooks/cmux-notify.sh&lt;/span&gt;
&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="c"&gt;# Skip silently if we're not running inside cmux&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-S&lt;/span&gt; /tmp/cmux.sock &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;exit &lt;/span&gt;0

&lt;span class="nv"&gt;EVENT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;EVENT_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$EVENT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.hook_event_name // "unknown"'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;TOOL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$EVENT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.tool_name // ""'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$EVENT_TYPE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;
    &lt;span class="s2"&gt;"Stop"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        cmux notify &lt;span class="nt"&gt;--title&lt;/span&gt; &lt;span class="s2"&gt;"Claude Code"&lt;/span&gt; &lt;span class="nt"&gt;--body&lt;/span&gt; &lt;span class="s2"&gt;"Session complete"&lt;/span&gt;
        &lt;span class="p"&gt;;;&lt;/span&gt;
    &lt;span class="s2"&gt;"PostToolUse"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TOOL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Task"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; cmux notify &lt;span class="nt"&gt;--title&lt;/span&gt; &lt;span class="s2"&gt;"Claude Code"&lt;/span&gt; &lt;span class="nt"&gt;--body&lt;/span&gt; &lt;span class="s2"&gt;"Agent finished sub-task"&lt;/span&gt;
        &lt;span class="p"&gt;;;&lt;/span&gt;
&lt;span class="k"&gt;esac&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x ~/.claude/hooks/cmux-notify.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2 — Register the hook in Claude Code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;~/.claude/settings.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Stop"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~/.claude/hooks/cmux-notify.sh"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PostToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Task"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~/.claude/hooks/cmux-notify.sh"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart Claude Code. Now every time a session completes or a sub-agent finishes, the cmux pane gets a blue notification ring and the sidebar tab lights up. Press &lt;code&gt;Cmd+Shift+U&lt;/code&gt; to jump straight to the most recent unread.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Keyboard Shortcuts
&lt;/h2&gt;

&lt;p&gt;You will use these constantly once you have a few agents running.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Shortcut&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Cmd+N&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;New workspace&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Cmd+1–8&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Jump to workspace by number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Cmd+D&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Split pane right&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Cmd+Shift+D&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Split pane down&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Cmd+Shift+L&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Open browser in split&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Cmd+I&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Open notification panel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Cmd+Shift+U&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Jump to latest unread agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Cmd+B&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Toggle sidebar&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Cmd+Shift+R&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rename workspace&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The In-App Browser
&lt;/h2&gt;

&lt;p&gt;One underrated feature: cmux ships with a split browser pane with a scriptable API ported from Vercel Labs' &lt;a href="https://github.com/vercel-labs/agent-browser" rel="noopener noreferrer"&gt;agent-browser&lt;/a&gt;. Open it with &lt;code&gt;Cmd+Shift+L&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Agents running Claude Code can snapshot the accessibility tree of the browser, get element references, click, fill forms, and evaluate JavaScript — all without leaving the terminal. This is useful when an agent is working against a local dev server and you want it to verify UI changes or run through a form flow directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Is It Worth Trying?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Yes, if you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run multiple AI coding agents in parallel (Claude Code, Codex, OpenCode, Gemini CLI, Aider)&lt;/li&gt;
&lt;li&gt;Are on macOS and want native performance over an Electron-based terminal&lt;/li&gt;
&lt;li&gt;Already use Ghostty and want agent-aware notifications without switching apps&lt;/li&gt;
&lt;li&gt;Want to script your workspace layout through a CLI or socket API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Maybe not yet, if you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are on Linux or Windows (cmux is macOS-only, macOS 14+)&lt;/li&gt;
&lt;li&gt;Need SSH session support (not available yet)&lt;/li&gt;
&lt;li&gt;Rely on live process restore after a restart (layout restores but running shells/agents do not resume yet)&lt;/li&gt;
&lt;li&gt;Are happy with a single agent workflow where notifications are not a problem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project is young (v0.61 at the time of writing, 4.5k GitHub stars) but actively maintained with 24 releases already shipped. It is free, open source under AGPL-3.0, and auto-updates silently. The nightly build runs alongside the stable app with its own bundle ID if you want to live on the edge.&lt;/p&gt;

&lt;p&gt;If you regularly find yourself clicking through terminal panes to figure out which Claude Code session is blocked, cmux solves that problem specifically and solves it well.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/manaflow-ai/cmux" rel="noopener noreferrer"&gt;https://github.com/manaflow-ai/cmux&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>productivity</category>
      <category>programmers</category>
    </item>
    <item>
      <title>Gemini Spark: Google's 24/7 AI Agent Just Changed the Rules (And What It Means for Developers)</title>
      <dc:creator>ArshTechPro</dc:creator>
      <pubDate>Sun, 24 May 2026 12:17:24 +0000</pubDate>
      <link>https://dev.to/arshtechpro/gemini-spark-googles-247-ai-agent-just-changed-the-rules-and-what-it-means-for-developers-31em</link>
      <guid>https://dev.to/arshtechpro/gemini-spark-googles-247-ai-agent-just-changed-the-rules-and-what-it-means-for-developers-31em</guid>
      <description>&lt;p&gt;Google I/O 2026 had a lot of announcements. New models, redesigned apps, smart glasses. But if you build software for a living, one announcement deserves your full attention: &lt;strong&gt;Gemini Spark&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not because it has a catchy name. Because it represents a real architectural shift in how AI agents work — and because Google just validated a protocol that was originally Anthropic's idea.&lt;/p&gt;

&lt;p&gt;Let me break it down.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Gemini Spark?
&lt;/h2&gt;

&lt;p&gt;Gemini Spark is Google's 24/7 personal AI agent.&lt;/p&gt;

&lt;p&gt;From the Google I/O keynote:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"It runs on dedicated virtual machines on Google Cloud. And it's 24/7 so you don't need to keep your laptop open. It's powered by Gemini 3.5 and the Google Antigravity harness, which allows it to perform long-horizon tasks easily in the background."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That phrase "long-horizon tasks" is the one developers should fixate on. A standard API call has a lifecycle measured in seconds. Spark's lifecycle is measured in hours and days.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Technical Stack
&lt;/h2&gt;

&lt;p&gt;Spark is built on two things that matter here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini 3.5 Flash&lt;/strong&gt; — The newly released model announced at the same I/O. It is optimized for agentic workflows and runs faster than previous generations. Spark uses Flash by default, with Gemini 3.5 Pro support coming later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google Antigravity&lt;/strong&gt; — This is the internal orchestration framework Google uses to manage long-running agent tasks. Version 2.0 is now available to external developers. Think of it as Google's answer to the kind of agent harness that tools like LangGraph or CrewAI provide — but designed specifically for tasks that span hours or days rather than seconds.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Can It Actually Do?
&lt;/h2&gt;

&lt;p&gt;Spark is not a chatbot. It is an agent. The distinction matters.&lt;/p&gt;

&lt;p&gt;A chatbot answers a question. An agent receives a goal, breaks it into subtasks, executes those subtasks over time, checks in when needed, and delivers results.&lt;/p&gt;

&lt;p&gt;Concretely, Spark can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Draft and send emails using Gmail context&lt;/li&gt;
&lt;li&gt;Read and write Google Docs, Sheets, Slides, and Drive files&lt;/li&gt;
&lt;li&gt;Plan multi-step workflows and execute them in sequence&lt;/li&gt;
&lt;li&gt;Run as an agentic browser inside Chrome (coming later this summer)&lt;/li&gt;
&lt;li&gt;Connect to third-party tools via MCP (more on this below)&lt;/li&gt;
&lt;li&gt;Be reached through email or chat, not just the Gemini app&lt;/li&gt;
&lt;li&gt;Show live task progress through Android Halo (coming later this year)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key design constraint: Spark is built to check with you before taking major actions. You opt in to turning it on, you set the parameters, and it asks for confirmation before high-stakes moves. This is intentional — Google is being cautious about autonomous action at launch.&lt;/p&gt;




&lt;h2&gt;
  
  
  The MCP Angle: This Is the Part Developers Should Care About Most
&lt;/h2&gt;

&lt;p&gt;Here is the headline buried in the keynote that deserves its own section.&lt;/p&gt;

&lt;p&gt;Spark integrates with third-party tools through &lt;strong&gt;MCP — the Model Context Protocol.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MCP was originally an open standard developed and published by Anthropic. It defines how AI models communicate with external tools in a standardized way — essentially a universal adapter so that any AI agent can talk to any tool without custom integration code for every combination.&lt;/p&gt;

&lt;p&gt;Google confirmed that Spark will expand to third-party apps including Canva, OpenTable, and Instacart through MCP, with that support rolling out within weeks of launch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does this matter for developers?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you maintain a SaaS product, a developer tool, or any kind of API, you no longer need separate integrations for each AI platform. Build one MCP server, and your tool becomes accessible to every major AI agent runtime on the market.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gemini Spark vs. OpenClaw: Two Different Philosophies
&lt;/h2&gt;

&lt;p&gt;OpenClaw and Gemini Spark are solving the same underlying problem — persistent, autonomous AI agents — but they approach it from opposite directions. Here is a direct comparison:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Gemini Spark&lt;/th&gt;
&lt;th&gt;OpenClaw&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hosting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google Cloud VMs (managed)&lt;/td&gt;
&lt;td&gt;Self-hosted on your own hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Source&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Proprietary&lt;/td&gt;
&lt;td&gt;MIT-licensed, open source&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini 3.5 Flash/Pro&lt;/td&gt;
&lt;td&gt;Any LLM (Claude, GPT, Gemini, Llama, 200+ backends)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Interface&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini app, email, chat&lt;/td&gt;
&lt;td&gt;WhatsApp, Telegram, Slack, Signal, iMessage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google Workspace context&lt;/td&gt;
&lt;td&gt;Local Markdown files on your disk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (coming in weeks)&lt;/td&gt;
&lt;td&gt;Community-driven via skills/plugins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Availability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google AI Ultra subscribers (US first)&lt;/td&gt;
&lt;td&gt;Free, self-hosted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Oversight&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google infrastructure&lt;/td&gt;
&lt;td&gt;You own and control everything&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The practical difference:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OpenClaw is a local-first agent. It runs on your machine, stores memory as plain Markdown files on your disk, and lets you bring any model you want. If you want full control over what the agent can access, how it stores data, and which model powers it, OpenClaw gives you that at zero subscription cost (you pay only for API usage). The tradeoff is that you manage the infrastructure.&lt;/p&gt;

&lt;p&gt;Gemini Spark is a cloud-first managed agent. You do not run anything yourself. Google handles the VMs, the uptime, the orchestration. It runs even when your devices are off. The tradeoff is that you are inside Google's ecosystem, limited to their model, and it requires a Google AI Ultra subscription.&lt;/p&gt;

&lt;p&gt;Neither is strictly better. They serve different developer profiles.&lt;/p&gt;

&lt;p&gt;If you are building personal automation that you want tight control over, runs locally, and integrates with whatever LLM you prefer — OpenClaw is still the more flexible choice.&lt;/p&gt;

&lt;p&gt;If you are deep in Google Workspace, want zero infrastructure management, and need something that can work reliably in the background without a server to maintain — Spark is the more turnkey solution.&lt;/p&gt;

&lt;p&gt;The interesting thing is that MCP may reduce this distinction over time. If Spark can connect to the same MCP servers as Claude Desktop and OpenClaw, then tool access converges even when runtime and hosting remain different.&lt;/p&gt;




&lt;h2&gt;
  
  
  Availability and Access
&lt;/h2&gt;

&lt;p&gt;Spark is still early. Google is rolling it out to trusted testers first, with a beta coming to &lt;strong&gt;Google AI Ultra subscribers in the US&lt;/strong&gt; starting the week of May 26, 2026.&lt;/p&gt;

&lt;p&gt;Timeline for what is coming:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Now:&lt;/strong&gt; Trusted tester rollout&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Next week (US):&lt;/strong&gt; Beta for Google AI Ultra subscribers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coming weeks:&lt;/strong&gt; MCP support for third-party apps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Later this summer:&lt;/strong&gt; Chrome agentic browser support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Later this year:&lt;/strong&gt; Android Halo live task progress, Agent Payments Protocol&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Agent Payments Protocol is worth noting separately — this will allow Spark to make purchases autonomously within parameters you define. That capability has significant implications for e-commerce and workflow automation, though Google is understandably cautious about rolling it out.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>google</category>
      <category>programming</category>
    </item>
    <item>
      <title>Multica: An Open-Source Platform for Managing AI Coding Agents Like Teammates</title>
      <dc:creator>ArshTechPro</dc:creator>
      <pubDate>Thu, 21 May 2026 22:07:10 +0000</pubDate>
      <link>https://dev.to/arshtechpro/multica-an-open-source-platform-for-managing-ai-coding-agents-like-teammates-2469</link>
      <guid>https://dev.to/arshtechpro/multica-an-open-source-platform-for-managing-ai-coding-agents-like-teammates-2469</guid>
      <description>&lt;p&gt;If you've been using Claude Code, Codex, or similar AI coding agents, you've probably felt the friction: you paste a prompt, watch the run, babysit the output, copy something into the next prompt, and repeat. It works, but it doesn't scale — and it definitely doesn't feel like working with a team.&lt;/p&gt;

&lt;p&gt;Multica is an open-source project that tries to fix that. The pitch is simple: treat your AI agents the way you treat human teammates. Assign them issues. Watch them post updates. Let them report blockers. Have them compound skills over time.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Multica Actually Does
&lt;/h2&gt;

&lt;p&gt;At its core, Multica gives your coding agents a place to live inside your team's workflow. Instead of operating a chatbot in isolation, you assign tasks to an agent the same way you'd assign a GitHub issue to a colleague. The agent picks it up, executes it on a runtime (your local machine or a cloud instance), streams progress back in real time, and posts comments when it needs clarification or hits a wall.&lt;/p&gt;

&lt;p&gt;A few things stand out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task lifecycle management.&lt;/strong&gt; Tasks move through states: enqueue, claim, start, complete, or fail. You're not just running a command and hoping — you have visibility into where each task is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reusable skills.&lt;/strong&gt; When an agent solves something well — a deployment script, a migration pattern, a code review checklist — that solution becomes a reusable skill the whole team can pull from. Skills accumulate over time, which is where the "compound" part of the tagline comes from.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-agent, multi-workspace.&lt;/strong&gt; You can have multiple agents running on different runtimes, organized into workspaces. Each workspace is isolated with its own issues, agents, and settings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vendor-neutral.&lt;/strong&gt; Multica works with Claude Code, Codex, OpenCode, OpenClaw, Hermes, Gemini, Pi, and Cursor Agent. You're not locked into one provider.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture in Plain Terms
&lt;/h2&gt;

&lt;p&gt;The stack is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
│   Next.js    │────&amp;gt;│  Go Backend  │────&amp;gt;│   PostgreSQL     │
│   Frontend   │&amp;lt;────│  (Chi + WS)  │&amp;lt;────│   (pgvector)     │
└──────────────┘     └──────┬───────┘     └──────────────────┘
                            │
                     ┌──────┴───────┐
                     │ Agent Daemon │  runs on your machine
                     └──────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: Next.js 16 with the App Router&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt;: Go with the Chi router, sqlc for type-safe queries, and gorilla/websocket for real-time streaming&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database&lt;/strong&gt;: PostgreSQL 17 with pgvector (for skill embeddings and similarity search)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent runtime&lt;/strong&gt;: A local daemon that auto-detects whatever agent CLIs you have on your PATH&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The daemon is the key piece. It bridges your machine (where the actual agent CLI lives) with the Multica server (cloud or self-hosted). When an agent is assigned a task, the server routes it to the appropriate daemon, which spawns the CLI process and streams output back via WebSocket.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Up and Running
&lt;/h2&gt;

&lt;p&gt;Installation is one line:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macOS / Linux:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;multica-ai/tap/multica
&lt;span class="c"&gt;# or without Homebrew:&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/multica-ai/multica/main/scripts/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Windows:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;irm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://raw.githubusercontent.com/multica-ai/multica/main/scripts/install.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;iex&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then connect everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;multica setup   &lt;span class="c"&gt;# authenticate + start the daemon in one command&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, open the web app, go to &lt;strong&gt;Settings → Runtimes&lt;/strong&gt;, and you should see your machine listed. From there you create an agent (pick a provider and runtime), and you're ready to assign tasks.&lt;/p&gt;

&lt;p&gt;The CLI surface is minimal:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;multica setup&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;One-shot: configure, authenticate, start daemon&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;multica daemon start&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start the local runtime manually&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;multica daemon status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Check what's running&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;multica issue list&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List your workspace issues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;multica issue create&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create a new issue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;multica update&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pull the latest version&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you want to self-host the whole thing (server included), add &lt;code&gt;--with-server&lt;/code&gt; to the install script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/multica-ai/multica/main/scripts/install.sh | bash &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;--with-server&lt;/span&gt;
multica setup self-host
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pulls the official Docker images from GHCR. You'll need Docker. The full self-hosting guide lives in &lt;code&gt;SELF_HOSTING.md&lt;/code&gt; in the repo.&lt;/p&gt;




&lt;h2&gt;
  
  
  For Contributors
&lt;/h2&gt;

&lt;p&gt;The dev setup is a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It auto-detects your environment, sets up the &lt;code&gt;.env&lt;/code&gt; file, installs dependencies, runs DB migrations, and starts all services. Prerequisites: Node.js v20+, pnpm v10.28+, Go v1.26+, and Docker.&lt;/p&gt;

&lt;p&gt;The codebase is about 53% TypeScript and 43% Go — frontend and backend are clearly separated, which makes it easy to work on one without touching the other.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multica vs Going Solo With an Agent CLI
&lt;/h2&gt;

&lt;p&gt;Here's an honest comparison of what Multica adds versus just running &lt;code&gt;claude&lt;/code&gt; or &lt;code&gt;codex&lt;/code&gt; directly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you gain:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A shared board where your whole team sees what agents are working on&lt;/li&gt;
&lt;li&gt;Real-time streaming progress instead of waiting for a long CLI run to finish&lt;/li&gt;
&lt;li&gt;A skills library that accumulates team knowledge rather than living in individual prompts&lt;/li&gt;
&lt;li&gt;Multi-agent routing — different tasks can go to different agents on different machines&lt;/li&gt;
&lt;li&gt;An audit trail: who assigned what, when, and what happened&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What you take on:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Running a daemon process (or a full server if self-hosting)&lt;/li&gt;
&lt;li&gt;A PostgreSQL database&lt;/li&gt;
&lt;li&gt;The overhead of a web app and task board&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're a solo developer running occasional agent tasks, the CLI alone might be enough. If you're on a team — even a small one — trying to coordinate multiple agents across projects, Multica addresses a real gap.&lt;/p&gt;




&lt;h2&gt;
  
  
  Is It Worth Trying?
&lt;/h2&gt;

&lt;p&gt;It depends on where you are with AI agents.&lt;/p&gt;

&lt;p&gt;If you're still experimenting and running agents manually on single tasks, Multica is probably more infrastructure than you need right now. Start with the agent CLI directly and see what breaks.&lt;/p&gt;

&lt;p&gt;If you've gotten past that point and are starting to feel the coordination pain — agents running on different machines, teammates not knowing what's been automated, the same solutions being re-invented in different prompts — that's exactly the gap Multica is designed to fill.&lt;/p&gt;

&lt;p&gt;It's early software (v0.2.x as of this writing), so expect rough edges. But the core loop — assign, execute, report, reuse — is working, and the momentum behind the project is real.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/multica-ai/multica" rel="noopener noreferrer"&gt;https://github.com/multica-ai/multica&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>Why Reading Food Labels Shouldn't Feel Like Decoding a Chemistry Exam</title>
      <dc:creator>ArshTechPro</dc:creator>
      <pubDate>Thu, 21 May 2026 11:39:42 +0000</pubDate>
      <link>https://dev.to/arshtechpro/why-reading-food-labels-shouldnt-feel-like-decoding-a-chemistry-exam-1noa</link>
      <guid>https://dev.to/arshtechpro/why-reading-food-labels-shouldnt-feel-like-decoding-a-chemistry-exam-1noa</guid>
      <description>&lt;p&gt;Millions of people with dietary restrictions struggle with food labels every day. Here's the real problem — and how we built &lt;a href="https://apps.apple.com/us/app/safescan-food-allergy-scanner/id6767189799" rel="noopener noreferrer"&gt;SafeScan&lt;/a&gt; to fix it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Struggle at Every Grocery Aisle
&lt;/h2&gt;

&lt;p&gt;If you've ever stood in a grocery store, squinting at a tiny ingredient list, trying to figure out if something is safe to eat — you're not alone.&lt;/p&gt;

&lt;p&gt;For the &lt;strong&gt;79 million Americans&lt;/strong&gt; with food allergies, Millions of people looking for halal options, the growing community of vegans and vegetarians, and families managing multiple dietary needs at once — grocery shopping isn't just shopping. It's a high-stakes guessing game.&lt;/p&gt;

&lt;p&gt;And the labels don't make it easy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Problem: Labels Are Designed for Regulators, Not People
&lt;/h2&gt;

&lt;p&gt;Here's what most people don't realize: food labels are technically accurate, but practically useless for the average consumer trying to avoid specific ingredients.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For vegans and vegetarians&lt;/strong&gt;, the challenge goes beyond spotting "meat" or "chicken." Animal-derived ingredients hide behind names most people wouldn't recognize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Casein&lt;/strong&gt; and &lt;strong&gt;whey&lt;/strong&gt; — both from milk, found in "non-dairy" creamers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Carmine&lt;/strong&gt; (or E120) — a red dye made from crushed insects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gelatin&lt;/strong&gt; — derived from animal bones, lurking in gummy candies, marshmallows, and even some yogurts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L-Cysteine&lt;/strong&gt; — an amino acid often sourced from duck feathers, used in commercial bread&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Isinglass&lt;/strong&gt; — fish bladder extract used to clarify some wines and beers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A product can say "plant-based" on the front and still contain animal-derived emulsifiers in the fine print.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For halal consumers&lt;/strong&gt;, it gets even more complex. Beyond pork and alcohol (which are relatively easy to spot), there's an entire gray area — &lt;em&gt;mushbooh&lt;/em&gt; (doubtful) — that requires ingredient-level analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Glycerin&lt;/strong&gt; — could be plant-derived or animal-derived. The label won't tell you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mono and diglycerides&lt;/strong&gt; — same problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Natural flavors&lt;/strong&gt; — one of the most common ingredients in packaged food, and one of the most opaque. Could contain anything.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enzymes&lt;/strong&gt; — widely used in cheese and baked goods, often from animal sources with no disclosure required.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's no "halal" or "haram" column on a nutrition label. You're on your own.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For people with allergies&lt;/strong&gt;, the stakes are literally life-threatening. The FDA's "Big 9" allergens must be declared, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Peanuts&lt;/strong&gt; can appear as "arachis hypogaea" or "groundnuts"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Milk&lt;/strong&gt; hides behind "lactalbumin," "ghee," or "recaldent"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eggs&lt;/strong&gt; show up as "albumin," "lysozyme," or "meringue powder"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"May contain" warnings&lt;/strong&gt; are voluntary — a manufacturer can choose not to disclose cross-contamination risks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And if you're managing allergies for a child, or for multiple family members with &lt;em&gt;different&lt;/em&gt; restrictions? Multiply that cognitive load by every person, every product, every shopping trip.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Built: SafeScan
&lt;/h2&gt;

&lt;p&gt;We got tired of the mental gymnastics. So we built &lt;a href="https://apps.apple.com/us/app/safescan-food-allergy-scanner/id6767189799" rel="noopener noreferrer"&gt;SafeScan&lt;/a&gt; — a free iOS app that turns your phone's camera into a personal food safety analyst.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Scan the barcode.&lt;/strong&gt; SafeScan looks up the product from a database of over 3 million food items.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Photograph the ingredient label.&lt;/strong&gt; The app uses on-device OCR to read the actual text — because sometimes the database is incomplete, and the physical label is the ground truth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get a clear verdict.&lt;/strong&gt; Safe. Unsafe. Caution. The app cross-references every ingredient against your personal profile using a curated database of hundreds of allergen synonyms, hidden sources, and dietary restriction rules.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No account required. No data leaves your phone. It works offline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Family Profiles
&lt;/h3&gt;

&lt;p&gt;This was the feature that started the whole project. Real families don't have one set of dietary needs — they have many.&lt;/p&gt;

&lt;p&gt;SafeScan lets you create separate profiles for each family member. Your daughter is allergic to tree nuts and eggs. Your partner keeps halal. You're vegan. One app handles all of it. You can even scan a single product and see the verdict for &lt;em&gt;every&lt;/em&gt; family member at once.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Ontology Under the Hood
&lt;/h3&gt;

&lt;p&gt;The part we're most proud of (and the part you'll never see) is the allergen ontology — a hand-curated knowledge graph that maps thousands of ingredient names to their actual sources.&lt;/p&gt;

&lt;p&gt;It knows that "surimi" may contain egg. That "stearic acid" can be animal-derived. That "E471" is a mono/diglyceride that could come from pork fat. That "arachis oil" is just another name for peanut oil.&lt;/p&gt;

&lt;p&gt;When you scan a product, you're not just doing a string match against a list of allergens. You're running every ingredient through a multi-strategy lookup that catches what human eyes miss.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who This Is For
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Parents managing food allergies for kids who can't read labels yet&lt;/li&gt;
&lt;li&gt;Navigating religious dietary laws in countries where those laws aren't reflected on packaging&lt;/li&gt;
&lt;li&gt;Vegans and vegetarians who are tired of discovering animal ingredients &lt;em&gt;after&lt;/em&gt; buying something&lt;/li&gt;
&lt;li&gt;Anyone with a "custom avoid" list — whether it's MSG, carrageenan, Red 40, or high-fructose corn syrup&lt;/li&gt;
&lt;li&gt;Families where everyone at the dinner table has different restrictions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Honest Disclaimer
&lt;/h2&gt;

&lt;p&gt;SafeScan is an aid, not a medical device. For severe allergies, always verify with the manufacturer. We built this to reduce the daily cognitive burden of reading labels — not to replace medical advice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;SafeScan is &lt;strong&gt;free, ad-free, and private&lt;/strong&gt;. Available on the &lt;a href="https://apps.apple.com/us/app/safescan-food-allergy-scanner/id6767189799" rel="noopener noreferrer"&gt;App Store&lt;/a&gt; for iPhone and iPad.&lt;/p&gt;

&lt;p&gt;If this resonates with you, we'd genuinely appreciate you sharing it with someone who spends too long reading ingredient lists. That's who we built it for.&lt;/p&gt;




</description>
      <category>ios</category>
      <category>swift</category>
      <category>mobile</category>
      <category>ai</category>
    </item>
    <item>
      <title>Understand Anything: Turn Any Codebase Into an Interactive Knowledge Graph</title>
      <dc:creator>ArshTechPro</dc:creator>
      <pubDate>Tue, 19 May 2026 11:38:23 +0000</pubDate>
      <link>https://dev.to/arshtechpro/understand-anything-turn-any-codebase-into-an-interactive-knowledge-graph-37ed</link>
      <guid>https://dev.to/arshtechpro/understand-anything-turn-any-codebase-into-an-interactive-knowledge-graph-37ed</guid>
      <description>&lt;p&gt;You join a new team. The codebase has 200,000 lines of code, no docs worth reading, and the one engineer who knew everything just left. Where do you start?&lt;/p&gt;

&lt;p&gt;That exact problem is what &lt;strong&gt;Understand Anything&lt;/strong&gt; was built to solve. It is an open-source plugin (15k+ GitHub stars as of May 2026) that scans your project using a multi-agent AI pipeline, builds a structured knowledge graph of every file, function, class, and dependency, and then gives you an interactive visual dashboard to explore it all. The stated goal is refreshingly honest: "graphs that teach, not graphs that impress."&lt;/p&gt;




&lt;h2&gt;
  
  
  What It Actually Does
&lt;/h2&gt;

&lt;p&gt;At its core, Understand Anything does three things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structural analysis.&lt;/strong&gt; It maps your codebase as a graph where every file, function, and class is a node. You can click any node to see a plain-English summary of what it does, what depends on it, and where it fits in the architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Business domain extraction.&lt;/strong&gt; Beyond code structure, it has a separate domain view that maps how your code relates to real business processes — domains, flows, and steps. This is genuinely useful when you need to explain a system to a non-technical stakeholder or write onboarding docs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Knowledge base analysis.&lt;/strong&gt; If your team uses a Karpathy-pattern LLM wiki (markdown files with wikilinks), you can point the tool at it and get a force-directed knowledge graph with community clustering. The tool discovers both explicit links and implicit relationships between concepts.&lt;/p&gt;

&lt;p&gt;Supporting features include guided tours (auto-generated walkthroughs ordered by dependency), fuzzy and semantic search, diff impact analysis to see what your current changes affect, and a persona-adaptive UI that adjusts detail level based on whether you describe yourself as a junior dev, PM, or senior engineer.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Set It Up
&lt;/h2&gt;

&lt;p&gt;Setup is straightforward. The tool works across a wide range of AI coding environments: Claude Code, Cursor, VS Code with GitHub Copilot, Codex, Gemini CLI, and about a dozen others.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code (Native)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add Lum1104/Understand-Anything
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;understand-anything
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  macOS / Linux (for Codex, Gemini CLI, Cursor, Copilot, and others)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/Lum1104/Understand-Anything/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to skip the interactive prompt and target a specific platform directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/Lum1104/Understand-Anything/main/install.sh | bash &lt;span class="nt"&gt;-s&lt;/span&gt; codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Supported platform values: &lt;code&gt;gemini&lt;/code&gt;, &lt;code&gt;codex&lt;/code&gt;, &lt;code&gt;opencode&lt;/code&gt;, &lt;code&gt;pi&lt;/code&gt;, &lt;code&gt;openclaw&lt;/code&gt;, &lt;code&gt;antigravity&lt;/code&gt;, &lt;code&gt;vibe&lt;/code&gt;, &lt;code&gt;vscode&lt;/code&gt;, &lt;code&gt;hermes&lt;/code&gt;, &lt;code&gt;cline&lt;/code&gt;, &lt;code&gt;kimi&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Windows (PowerShell)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;iwr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-useb&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://raw.githubusercontent.com/Lum1104/Understand-Anything/main/install.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;iex&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The installer clones the repo to &lt;code&gt;~/.understand-anything/repo&lt;/code&gt; and creates the right symlinks for your chosen platform. Restart your CLI or IDE afterward.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cursor and VS Code + Copilot
&lt;/h3&gt;

&lt;p&gt;These two auto-discover the plugin via the &lt;code&gt;.cursor-plugin/plugin.json&lt;/code&gt; and &lt;code&gt;.copilot-plugin/plugin.json&lt;/code&gt; files respectively when you clone the repo. No manual installation step needed — clone and open.&lt;/p&gt;




&lt;h2&gt;
  
  
  Using It Day-to-Day
&lt;/h2&gt;

&lt;p&gt;Once installed, the main commands are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Analyze the entire codebase and build the graph&lt;/span&gt;
/understand

&lt;span class="c"&gt;# Open the interactive dashboard&lt;/span&gt;
/understand-dashboard

&lt;span class="c"&gt;# Ask questions in natural language&lt;/span&gt;
/understand-chat How does the payment flow work?

&lt;span class="c"&gt;# See what your current diff affects&lt;/span&gt;
/understand-diff

&lt;span class="c"&gt;# Deep-dive into a specific file or function&lt;/span&gt;
/understand-explain src/auth/login.ts

&lt;span class="c"&gt;# Generate onboarding docs for new team members&lt;/span&gt;
/understand-onboard

&lt;span class="c"&gt;# Extract business domain flows&lt;/span&gt;
/understand-domain

&lt;span class="c"&gt;# Analyze a markdown wiki knowledge base&lt;/span&gt;
/understand-knowledge ~/path/to/wiki
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For multilingual teams, you can generate content in your preferred language:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/understand &lt;span class="nt"&gt;--language&lt;/span&gt; zh   &lt;span class="c"&gt;# Simplified Chinese&lt;/span&gt;
/understand &lt;span class="nt"&gt;--language&lt;/span&gt; ja   &lt;span class="c"&gt;# Japanese&lt;/span&gt;
/understand &lt;span class="nt"&gt;--language&lt;/span&gt; ko   &lt;span class="c"&gt;# Korean&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Sharing the Graph With Your Team
&lt;/h3&gt;

&lt;p&gt;The generated graph is stored as a JSON file at &lt;code&gt;.understand-anything/knowledge-graph.json&lt;/code&gt;. You can commit it to the repo so teammates skip the pipeline entirely on first use. Exclude the scratch files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.understand-anything/intermediate/
.understand-anything/diff-overlay.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For large graphs (10 MB+), use git-lfs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git lfs &lt;span class="nb"&gt;install
&lt;/span&gt;git lfs track &lt;span class="s2"&gt;".understand-anything/*.json"&lt;/span&gt;
git add .gitattributes .understand-anything/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How the Pipeline Works
&lt;/h2&gt;

&lt;p&gt;When you run &lt;code&gt;/understand&lt;/code&gt;, it orchestrates five specialized agents in sequence (a sixth is added for domain extraction):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;project-scanner&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Discovers files, detects languages and frameworks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;file-analyzer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Extracts functions, classes, imports; produces nodes and edges&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;architecture-analyzer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Identifies architectural layers (API, Service, Data, UI, Utility)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tour-builder&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Generates guided learning tours ordered by dependency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;graph-reviewer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Validates completeness and referential integrity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;domain-analyzer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Extracts business domains, flows, and process steps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;article-analyzer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Extracts entities and implicit relationships from wiki articles&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;File analyzers run in parallel — up to 5 concurrent with 20-30 files per batch. It also supports incremental updates, so only files changed since the last run get re-analyzed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Merits
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Broad platform support.&lt;/strong&gt; It works natively with Claude Code and has one-line installs for 14 other platforms. If your team uses different editors, everyone can still use the same tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI-native but not AI-locked.&lt;/strong&gt; The knowledge graph output is plain JSON. Once generated, the dashboard runs independently. You are not making LLM calls every time you explore the graph.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incrementally useful.&lt;/strong&gt; You do not have to commit to using every feature. Running &lt;code&gt;/understand&lt;/code&gt; + &lt;code&gt;/understand-dashboard&lt;/code&gt; alone is already valuable for orientation on an unfamiliar codebase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team-shareable output.&lt;/strong&gt; Committing the graph to the repo means the analysis work is done once and shared. A new hire can open the dashboard on day one without running the pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Actively maintained.&lt;/strong&gt; 14.7k stars, 1.4k forks, 496 commits, a v2.5.0 release in May 2026. The project is not abandoned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Language support.&lt;/strong&gt; English, Simplified Chinese, Traditional Chinese, Japanese, and Korean are supported for output, which matters for distributed teams.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Drawbacks
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LLM costs are on you.&lt;/strong&gt; The multi-agent pipeline makes real LLM calls during the analysis phase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graph quality depends on code quality.&lt;/strong&gt; If the codebase has unclear naming, no logical separation of concerns, or is largely procedural scripts, the resulting graph will reflect that chaos rather than clarify it. The tool surfaces structure that exists; it does not invent structure that does not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Initial scan time on large repos.&lt;/strong&gt; Even with parallel processing (5 concurrent agents), scanning a 200,000-line monorepo takes time. The incremental update feature helps on subsequent runs, but the first pass can be slow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Knowledge graph can become stale.&lt;/strong&gt; Unless you enable &lt;code&gt;--auto-update&lt;/code&gt; (a post-commit hook), the graph drifts from the codebase. Teams that forget to re-run &lt;code&gt;/understand&lt;/code&gt; before major releases will hand out outdated onboarding graphs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Is It Worth a Try?
&lt;/h2&gt;

&lt;p&gt;For most developers, yes — with some points.&lt;/p&gt;

&lt;p&gt;The most compelling use case is onboarding. A committed knowledge graph means a new team member can open an interactive visual map of the architecture on day one, take a guided tour ordered by dependency, ask natural language questions about how flows work, and get to meaningful contribution faster. That alone is worth the LLM cost of the initial scan.&lt;/p&gt;

&lt;p&gt;The tool is still evolving, the community is active and the source is MIT-licensed, so there is low risk in trying it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/Lum1104/Understand-Anything" rel="noopener noreferrer"&gt;https://github.com/Lum1104/Understand-Anything&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;License: MIT&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agentskills</category>
      <category>agents</category>
      <category>programming</category>
    </item>
    <item>
      <title>OpenSRE: Build Your Own AI Incident-Investigation Agent</title>
      <dc:creator>ArshTechPro</dc:creator>
      <pubDate>Mon, 18 May 2026 09:00:18 +0000</pubDate>
      <link>https://dev.to/arshtechpro/opensre-build-your-own-ai-incident-investigation-agent-4ijd</link>
      <guid>https://dev.to/arshtechpro/opensre-build-your-own-ai-incident-investigation-agent-4ijd</guid>
      <description>&lt;p&gt;Most AI coding tools stop at the editor. They help you write code. But the hardest, most stressful part of running software is not writing it. It is the moment it breaks in production at 2 a.m.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenSRE&lt;/strong&gt; is built for that moment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem it solves
&lt;/h2&gt;

&lt;p&gt;When an incident hits, the evidence is scattered. Logs are in Datadog. Metrics are in Grafana. The config change that caused it is in Git. Service dependencies live in your infra layer. Each system saw part of what happened. None of them saw all of it.&lt;/p&gt;

&lt;p&gt;So you do it manually. You pull logs, line up timestamps, ping the colleague who knows that stack, and slowly piece the story together. It takes hours. Under on-call pressure, you often just ship a patch to get the system back up and figure out the real cause later.&lt;/p&gt;

&lt;p&gt;OpenSRE automates that investigation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it is
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Tracer-Cloud/opensre" rel="noopener noreferrer"&gt;OpenSRE&lt;/a&gt; is an open-source framework, built on LangGraph, for building AI-powered SRE agents that automate incident investigation and root cause analysis. It is Apache 2.0 licensed and maintained by Tracer.&lt;/p&gt;

&lt;p&gt;The point is not a single fixed product. It is a toolkit. You plug in the alerting sources you already use and compose custom investigation workflows tailored to your own infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the investigation runs
&lt;/h2&gt;

&lt;p&gt;When an alert fires, the agent works through a defined sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ingest&lt;/strong&gt; the alert from your monitoring or incident system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assemble context&lt;/strong&gt; from logs, metrics, configs, and dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frame failure modes&lt;/strong&gt; the incident could plausibly be.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute investigation queries&lt;/strong&gt; across connected systems, in parallel.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluate hypotheses&lt;/strong&gt; against the evidence collected.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deliver a root cause report&lt;/strong&gt; and recommended next actions, to Slack out of the box.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The agent tests several hypotheses at once and stops when it has enough confidence to give a clear answer, rather than running forever or guessing early.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it connects to
&lt;/h2&gt;

&lt;p&gt;OpenSRE integrates with the systems that already power modern platforms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data platform:&lt;/strong&gt; Apache Airflow, Kafka, Spark&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; Grafana, Datadog, CloudWatch, Sentry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure:&lt;/strong&gt; Kubernetes, AWS, GCP, Azure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dev tools:&lt;/strong&gt; GitHub&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communication:&lt;/strong&gt; Slack, PagerDuty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Adding a new output destination, such as routing reports to PagerDuty or OpsGenie, is described as one of the easiest contributions you can make.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design principles worth noting
&lt;/h2&gt;

&lt;p&gt;OpenSRE leans on a few principles that matter for production use: deterministic investigations, evidence-backed conclusions, parallel hypothesis testing, and fully auditable workflows.&lt;/p&gt;

&lt;p&gt;That last point is important. This is not a black-box LLM that hands you a guess. The investigation is traceable, so you can see why it reached a conclusion.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;

&lt;p&gt;You can try it without touching production. The repo ships a local Grafana plus Loki demo that produces a real root cause report with one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Tracer-Cloud/open-sre-agent
&lt;span class="nb"&gt;cd &lt;/span&gt;open-sre-agent
make &lt;span class="nb"&gt;install
&lt;/span&gt;make install-hooks
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
opensre onboard
make local-grafana-live
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;opensre onboard&lt;/code&gt; step walks you through configuring a local LLM provider and optionally validating integrations like Grafana, Datadog, Slack, AWS, GitHub, and Sentry. There is also a bundled demo that skips Docker entirely if you just want to see the flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is it useful?
&lt;/h2&gt;

&lt;p&gt;Promising, with caveats worth being honest about.&lt;/p&gt;

&lt;p&gt;It is the youngest of the new wave of AI-agent tooling, with a smaller community and no tagged releases yet. It is also clearly aimed at data-platform teams, the Airflow, Kafka, and Spark crowd. If that describes your stack and on-call is genuinely painful, the local demo is worth an afternoon.&lt;/p&gt;

&lt;p&gt;Heed the project's own security guidance: use read-only credentials, restrict network exposure, log every investigation, and always review a report before any automated remediation. An agent touching production systems deserves that caution.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;AI agents are moving past the editor and into operations. OpenSRE is an early, open look at what an AI SRE actually involves: not a magic fix-it button, but a structured, auditable investigator that correlates the signals you already have. If incident response on your team still means hours of manual log-correlation, it is a project worth watching and, if your stack fits, worth trying.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>sre</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
