<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: nwyin</title>
    <description>The latest articles on DEV Community by nwyin (@nwyin).</description>
    <link>https://dev.to/nwyin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F370770%2F0a1198f9-afdc-4cce-b120-bc67d3ff3cf1.jpg</url>
      <title>DEV Community: nwyin</title>
      <link>https://dev.to/nwyin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nwyin"/>
    <language>en</language>
    <item>
      <title>Hive: A Lightweight Multi-Agent Orchestrator</title>
      <dc:creator>nwyin</dc:creator>
      <pubDate>Mon, 23 Mar 2026 09:25:11 +0000</pubDate>
      <link>https://dev.to/nwyin/hive-a-lightweight-multi-agent-orchestrator-4nb8</link>
      <guid>https://dev.to/nwyin/hive-a-lightweight-multi-agent-orchestrator-4nb8</guid>
      <description>&lt;p&gt;&lt;a href="/static/imgs/multi-agent-map.png" class="article-body-image-wrapper"&gt;&lt;img src="/static/imgs/multi-agent-map.png" alt="A chart with 'N agents (parallelism)' on the x-axis and 'autonomous duration' on the y-axis, showing the evolution of AI coding tools: 2021 (vscode copilot, cursor, windsurf) in the bottom-left, 2025 (claude code launch) in the middle-left, dec 2025 (opus 4.5 + CC) in the center, and a green dot labeled 'we're going here' in the top-right corner"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2025 was the "year of agents".&lt;br&gt;
We focused on making LLMs more autonomous and able to coherently work on tasks for longer and longer time horizons.&lt;/p&gt;

&lt;p&gt;Claude Code made working with LLMs akin to pair programming with a very skilled but inexperienced junior developer.&lt;br&gt;
Some time in December 2025, with the release of Opus 4.5, a step-wise increase in capability became noticeable.&lt;br&gt;
Claude was able to work and verify work by itself for hours at a time.&lt;/p&gt;




&lt;p&gt;One obvious way to parallelize work here is to &lt;code&gt;tmux&lt;/code&gt; many Claude Code instances and have them work on separate issues in different parts of the codebase.&lt;br&gt;
This became so common that Anthropic and TPOT refer to this as "multi-Clauding".&lt;br&gt;
In Steve Yegge's parlance, this is level 6/7 of agentic coding.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tmux&lt;/code&gt; is great, but you run into natural limitations in cognitive overhead.&lt;br&gt;
Even the most brilliant (i.e. ADHD) developers become overwhelmed when trying to steer 8+ different Claudes at once.&lt;br&gt;
It is usually just too hard to figure out how to compose several tracks of work in a way that you can meaningfully parallelize them.&lt;br&gt;
You rapidly have to move back and forth between idea generation, steering, and work review.&lt;br&gt;
The context switching gets to be too much.&lt;/p&gt;

&lt;p&gt;It's possible to manage a handful of Claudes.&lt;br&gt;
And there's this suspicion that, well, surely I should be able to scale this 10x, right?&lt;br&gt;
Surely I can figure out a way to work on 10 things at a time, and if I can figure out how to work on 10 things at a time maybe I can scale it further to 100?&lt;/p&gt;




&lt;p&gt;The solutions in this space are, putting it kindly, immature.&lt;br&gt;
They're hacked together and sometimes very broken, but also show sparks of promise and excitement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;openclaw/openclaw&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Incredibly slop.&lt;br&gt;
Great viral marketing tactics and the skills/plugin harness is genuinely useful.&lt;br&gt;
There are two things that really make me hesitant to draw any real lessons from its codebase.&lt;br&gt;
First, the security breaches of users' wallets and data are not inspiring.&lt;br&gt;
Second, the state of the repo speaks to the level of care and thought put into it (e.g. the fact that PRs are very liberally accepted and there are too many PRs to review, s.t. the maintainers have decided to YOLO merge a lot of them).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;steveyegge/gastown&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's the project that inspired &lt;code&gt;hive&lt;/code&gt;.&lt;br&gt;
It has abundant design docs and shows Steve's excited iteration and tinkering with the ideas around multi-agent coordination.&lt;br&gt;
My only complaint is that it's too complex!&lt;br&gt;
It's trying to solve this issue of coordinating hundreds or thousands of agents at scale.&lt;br&gt;
Me, I'd like to just have 20 working together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;randomlabs/slate&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I haven't poked at it too much, and it's not open source, so to really understand it deeply I'd need to reverse engineer its binary.&lt;br&gt;
That being said, the &lt;a href="https://randomlabs.ai/blog/slate" rel="noopener noreferrer"&gt;technical blog&lt;/a&gt; is quite good and demonstrates that the team behind it is really thoughtful and making reasonable tradeoffs in this space.&lt;br&gt;
They're one of the few players who are actively innovating AND sharing about their innovation, which I appreciate.&lt;/p&gt;




&lt;p&gt;There are really only 3 core ideas you need to know to understand &lt;code&gt;hive&lt;/code&gt; (and its sister solutions):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Rather than interact with an agent as if it's a pair programmer, you treat it like a project manager&lt;/li&gt;
&lt;li&gt;You maintain some kind of task board/external TODO list with the ability to note what tasks depend on each other&lt;/li&gt;
&lt;li&gt;Agents that implement the work have the ability to gather context themselves, and guardrails to keep them on track (e.g. tests, other models that review work, etc)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;1 helps alleviate the amount of context switching you do between idea generation and steering models.&lt;br&gt;
Rather than having to keep your attention focused on a single instance, following along and steering its implementation, you work ahead of time to clarify intent and plan the implementation out.&lt;br&gt;
Models are strong enough that even with a rough sketch (and clear intent) they can fill in the rest.&lt;/p&gt;

&lt;p&gt;2 is necessary as a way to "externalize" project memory and context.&lt;br&gt;
Every issue can be generated by a fresh agent, which can do the heavy lifting of figuring out all the files that need to be touched, tests that need to be created, etc.&lt;br&gt;
This information can then be handed off to a model with fresh context, which increases the chance of the model one-shotting the feature.&lt;/p&gt;

&lt;p&gt;3 helps with the context switch to reviewing output (because you build thorough testing systems and have a model to competently use that information, you end up doing far less review yourself).&lt;/p&gt;
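
&lt;p&gt;To make idea 2 concrete, here is a rough sketch of a task board with dependencies. It is not how &lt;code&gt;hive&lt;/code&gt; stores issues; the names &lt;code&gt;Task&lt;/code&gt;, &lt;code&gt;TaskBoard&lt;/code&gt; and &lt;code&gt;ready&lt;/code&gt; are hypothetical and only illustrate the idea of handing out work whose dependencies are already done.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task record; field names are illustrative only.
    id: str
    description: str
    depends_on: set = field(default_factory=set)
    done: bool = False


class TaskBoard:
    def __init__(self):
        self.tasks = {}

    def add(self, task):
        self.tasks[task.id] = task

    def ready(self):
        # A task is ready when it is not done and every dependency is done.
        return [
            t.id
            for t in self.tasks.values()
            if not t.done and all(self.tasks[d].done for d in t.depends_on)
        ]

    def complete(self, task_id):
        self.tasks[task_id].done = True


board = TaskBoard()
board.add(Task("schema", "design the database schema"))
board.add(Task("api", "implement the API", depends_on={"schema"}))
board.add(Task("docs", "write the README"))

first_wave = board.ready()   # tasks with no unfinished dependencies
board.complete("schema")
second_wave = board.ready()  # "api" is unblocked once "schema" is done
```

&lt;p&gt;Everything in a given wave can be handed to separate agents in parallel, which is the point of recording dependencies on the board.&lt;/p&gt;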

&lt;p&gt;A lot of libraries and implementations have converged on ideas in this proximity, like &lt;a href="https://code.claude.com/docs/en/agent-teams" rel="noopener noreferrer"&gt;Anthropic's agent teams&lt;/a&gt; or &lt;a href="https://github.com/nwyin/hive/" rel="noopener noreferrer"&gt;hive&lt;/a&gt; or &lt;a href="https://github.com/steveyegge/gastown" rel="noopener noreferrer"&gt;gastown&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;code&gt;hive&lt;/code&gt; has a few features that I think make it special:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Simplicity and hackability as a core principle. The code base is ~10k LOC and designed to run locally, attempting to use minimal resources, and as simple a state machine as possible so you can reason about it and rip it out and change it to your needs.&lt;/li&gt;
&lt;li&gt;Model/harness agnostic. Right now it supports using either &lt;code&gt;codex&lt;/code&gt; or &lt;code&gt;claude&lt;/code&gt;, but there's no reason why you can't bring your own harness (and thus any other model) as well.&lt;/li&gt;
&lt;li&gt;Multi-project/headless delegation of issues. Meaning, you can script and create a meta workflow to work on many projects in parallel. (This is the end goal of &lt;code&gt;gastown&lt;/code&gt;, but at a much larger scale than &lt;code&gt;hive&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Auditability/logging. &lt;code&gt;hive&lt;/code&gt; tracks events, issues, success rates, etc. This means that, if you wanted to, you could easily experiment with multiple models and start to figure out which models in which harnesses perform best for your various problems.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;1 is useful because I'm not quite sure how multi-agent orchestrators will evolve as the base models get stronger and stronger, and it's convenient to be able to rapidly and cheaply test out ideas.&lt;br&gt;
For example, I've thought about having multiple models and runners attempt to implement the same feature.&lt;br&gt;
You can then let an LLM grade and choose one to merge, and collect stats so you can gather data on per-model, per-runner success rates.&lt;/p&gt;

&lt;p&gt;2 is nice to have because the SOTA, frontier models are still playing the game of rotating their first place podium spot every few months.&lt;br&gt;
It's also unclear how different frontier models interact with various harnesses (e.g. I hear GPT 5.4 in Claude Code is quite good).&lt;br&gt;
Being agnostic to both model and harness means you don't have to rewrite the core orchestration code as everyone is rapidly improving and iterating on these other core pieces.&lt;/p&gt;

&lt;p&gt;3+4 together are helpful for debugging and verifying the system is working smoothly.&lt;br&gt;
It's what's enabled me to start using &lt;code&gt;hive&lt;/code&gt; in multiple projects at once &lt;em&gt;with confidence&lt;/em&gt;.&lt;/p&gt;




&lt;p&gt;Future multi-agent systems are going to be far more ergonomic, and enable power users to manage 100s of agents in parallel.&lt;br&gt;
What exactly that looks like, and the kinds of problems that need 100s of agents working together, I don't know yet.&lt;/p&gt;

&lt;p&gt;Agents aren't quite substitutable for humans.&lt;br&gt;
If you get 100 humans together, you can build a 100M dollar company or create some new hit movie or start a revolution.&lt;br&gt;
If you get 100 agents together, you get slop.&lt;br&gt;
Models are not great at generating out-of-distribution, interesting ideas.&lt;br&gt;
If you repeatedly chain models together with no external signal or steering, you end up with a very "collapsed" output.&lt;/p&gt;

&lt;p&gt;This is definitely solvable though.&lt;br&gt;
It's so exciting to be given the privilege to help figure it out.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>llm</category>
    </item>
    <item>
      <title>On Static Analysis + LLM</title>
      <dc:creator>nwyin</dc:creator>
      <pubDate>Mon, 23 Mar 2026 09:23:23 +0000</pubDate>
      <link>https://dev.to/nwyin/on-static-analysis-llm-1jek</link>
      <guid>https://dev.to/nwyin/on-static-analysis-llm-1jek</guid>
      <description>&lt;p&gt;Static analysis is understanding your code before running it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Above is a trivial program.&lt;br&gt;
At a glance, you can tell that calling &lt;code&gt;main()&lt;/code&gt; will return &lt;code&gt;4&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here are some questions to ponder:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What happens if you call &lt;code&gt;add&lt;/code&gt; with non-numeric types? i.e. what does &lt;code&gt;add("foo", "bar")&lt;/code&gt; return?&lt;/li&gt;
&lt;li&gt;How do you know about the above?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Static analysis is about answering these questions without having to run the code.&lt;br&gt;
As human programmers, we develop familiarity with the language and runtime.&lt;br&gt;
This lets us answer such questions easily.&lt;/p&gt;

&lt;p&gt;But how does a machine figure out the answer?&lt;/p&gt;



&lt;p&gt;Programs have many representations (the human-friendly syntax being one of many).&lt;br&gt;
Here are some other ways to represent the above Python program:&lt;/p&gt;



&lt;h4&gt;AST&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;Module(
  body=[
    FunctionDef(
      name='add',
      args=arguments(
        args=[
          arg(arg='a'),
          arg(arg='b')]),
      body=[
        Return(
          value=BinOp(
            left=Name(id='a'),
            op=Add(),
            right=Name(id='b')))]),
    FunctionDef(
      name='main',
      args=arguments(),
      body=[
        Return(
          value=Call(
            func=Name(id='add'),
            args=[
              Constant(value=1),
              Constant(value=3)]))])])&lt;/code&gt;&lt;/pre&gt;
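
&lt;p&gt;A tree like the one above can be produced with the standard library &lt;code&gt;ast&lt;/code&gt; module. This is a minimal sketch; the exact fields printed (for example &lt;code&gt;ctx=Load()&lt;/code&gt;) vary between Python versions, so the output may not match the listing above character for character.&lt;/p&gt;

```python
import ast

source = """
def add(a, b):
  return a + b

def main():
  return add(1, 3)
"""

tree = ast.parse(source)
# indent=2 pretty-prints the tree; requires Python 3.9 or newer.
dump = ast.dump(tree, indent=2)
print(dump)
```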



&lt;h4&gt;Bytecode&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;Disassembly of &amp;lt;add&amp;gt;:
  1  RESUME              0
  2  LOAD_FAST           0 (a)
     LOAD_FAST           1 (b)
     BINARY_OP           0 (+)
     RETURN_VALUE

Disassembly of &amp;lt;main&amp;gt;:
  4  RESUME              0
  5  LOAD_GLOBAL         1 (add)
     LOAD_SMALL_INT      1
     LOAD_SMALL_INT      3
     CALL                2
     RETURN_VALUE&lt;/code&gt;&lt;/pre&gt;
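
&lt;p&gt;A disassembly like this comes from the standard library &lt;code&gt;dis&lt;/code&gt; module. The sketch below captures the listing into a string; opcode names and line layout depend on the CPython version, so expect differences from the listing above on older or newer interpreters.&lt;/p&gt;

```python
import dis
import io


def add(a, b):
  return a + b


def main():
  return add(1, 3)


# Collect the disassembly of both functions into one string.
buffer = io.StringIO()
dis.dis(add, file=buffer)
dis.dis(main, file=buffer)
listing = buffer.getvalue()
print(listing)
```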





&lt;p&gt;These representations are "closer to the machine".&lt;br&gt;
They're the same program, but with details that are relevant to a compiler or CPU trying to execute the program.&lt;/p&gt;



&lt;p&gt;Most human programmers never bother with these details.&lt;br&gt;
They're frankly too low-level.&lt;br&gt;
Not much work warrants looking at the AST or bytecode or compiled assembly of a program.&lt;/p&gt;

&lt;p&gt;But!&lt;br&gt;
There are some other representations of programs that perhaps all programmers should be aware of.&lt;br&gt;
For example, here's a call-graph of our sample program:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;example.py
  [D] add
  [D] main
main
  [U] add
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;[D]&lt;/code&gt; means defined in, and &lt;code&gt;[U]&lt;/code&gt; means uses.&lt;br&gt;
You read the above as &lt;code&gt;add&lt;/code&gt; and &lt;code&gt;main&lt;/code&gt; being defined in &lt;code&gt;example.py&lt;/code&gt;, and &lt;code&gt;add&lt;/code&gt; being used in &lt;code&gt;main&lt;/code&gt;.&lt;/p&gt;
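
&lt;p&gt;For a program this small, a listing in that style can be derived with nothing but the &lt;code&gt;ast&lt;/code&gt; module. The sketch below is an assumption-laden toy, not the tool used to produce the output above: it only looks at top-level functions and direct calls by name, and ignores methods, imports, and aliases.&lt;/p&gt;

```python
import ast
from collections import defaultdict

source = """
def add(a, b):
  return a + b

def main():
  return add(1, 3)
"""

tree = ast.parse(source)

# Top-level function definitions: the [D] entries.
defined = [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]

# For each function, the defined names it calls directly: the [U] entries.
uses = defaultdict(list)
for func in tree.body:
    if not isinstance(func, ast.FunctionDef):
        continue
    for node in ast.walk(func):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in defined and node.func.id not in uses[func.name]:
                uses[func.name].append(node.func.id)

print("example.py")
for name in defined:
    print(f"  [D] {name}")
for caller, callees in uses.items():
    if callees:
        print(caller)
        for callee in callees:
            print(f"  [U] {callee}")
```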

&lt;p&gt;Call graphs are very useful for understanding the chain of dependencies and dataflow of your program.&lt;br&gt;
In systems that grow to hundreds of thousands of lines, with data passed across several thousand functions and several hundred files, call graphs end up being quite useful!&lt;br&gt;
Without actually running or profiling the code, you can answer: what are all the functions and files this data passes through?&lt;br&gt;
If I had to refactor or rethink the approach of how this data flows, what are all the files and functions I'd need to touch?&lt;/p&gt;

&lt;p&gt;Static analysis is a powerful tool for people who work on very complicated systems and don't have the privilege of "just" rewriting them from scratch.&lt;br&gt;
Many systems grow by standing on past foundations -- even if the foundations are quite shaky.&lt;/p&gt;




&lt;p&gt;In the year of 2026, though, we have LLMs.&lt;br&gt;
Can't we just use LLMs for our refactors and rewrites now?&lt;/p&gt;

&lt;p&gt;There is a sense that you can't really do "serious" engineering with LLMs.&lt;br&gt;
Their context windows are too small, their tendency to produce slop too high.&lt;br&gt;
I disagree with these sentiments entirely.&lt;/p&gt;

&lt;p&gt;I truly believe in this idea of a "capability overhang" in models.&lt;br&gt;
That is, naive prompting and context management nerfs a model's ability to perform quite dramatically.&lt;br&gt;
It's like taking a competent engineer and saying, you only get to look at the code for 1 second and then you must come up with solutions to the problem on demand, with no thinking or external tools.&lt;br&gt;
Clearly insane and naive!&lt;/p&gt;

&lt;p&gt;Harnesses like Claude Code/Codex/Opencode make the experience better, giving models hands and environments to test their code and iterate, but are still rather restricted due to tool permissions or patterns we encode in our prompts.&lt;br&gt;
In a sense, a model+harness's results are limited by your own competency as an engineer.&lt;br&gt;
How will you know to drive the model to better and better engineering practices if you don't know about them yourself?&lt;/p&gt;

&lt;p&gt;Great engineers are not wizards.&lt;br&gt;
Rather, they're good engineers who use great tools.&lt;/p&gt;

&lt;p&gt;There's a variety of analyses you can do on a program without running or profiling it at all.&lt;br&gt;
Call-graph analysis and control-flow analysis are two of them.&lt;br&gt;
There's "low hanging fruit" in giving models access to these tools that senior+ engineers use to make changes in larger and more complex systems.&lt;br&gt;
In my experience, giving models the ability to analyze codebases via static analysis enables them to more competently plan and execute larger scale refactors.&lt;br&gt;
They're able to reason about the codebase the way a senior+ engineer would.&lt;/p&gt;




&lt;p&gt;What I'm experimenting with right now is building better static analysis tools for Python.&lt;br&gt;
Due to the dynamic nature of the language (and the fact that most users of Python are not programmers themselves), the state of these tools is a bit immature compared to those for C/LLVM-based languages.&lt;br&gt;
But it's great that LLMs are turbocharging the development here.&lt;/p&gt;

&lt;p&gt;Already, I've found that LLMs have been able to refactor and come up with reasonable code improvements to a 10k LOC Python project I maintain, just by nature of being able to look at call graphs and control-flow graphs.&lt;/p&gt;

&lt;p&gt;It seems to me that frontier models are probably as competent as a senior or staff level engineer, if given the right prompt and tools and ability to reason for long enough.&lt;br&gt;
Getting this right will "just" be a matter of building tools and formats and representations that are very amenable to LLM reasoning.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>programming</category>
      <category>python</category>
    </item>
    <item>
      <title>Hashline vs Replace: Does the Edit Format Matter?</title>
      <dc:creator>nwyin</dc:creator>
      <pubDate>Mon, 23 Mar 2026 09:22:58 +0000</pubDate>
      <link>https://dev.to/nwyin/hashline-vs-replace-does-the-edit-format-matter-15n2</link>
      <guid>https://dev.to/nwyin/hashline-vs-replace-does-the-edit-format-matter-15n2</guid>
      <description>&lt;p&gt;Can Bölük's &lt;a href="https://blog.can.ac/2026/02/12/the-harness-problem/" rel="noopener noreferrer"&gt;The Harness Problem&lt;/a&gt; showed hashline-style edits (line-number anchored, like &lt;code&gt;4#WB&lt;/code&gt;) outperforming traditional replace-mode edits (old_string/new_string matching) for coding agents.&lt;br&gt;
I've been experimenting with building my own harness (&lt;a href="https://github.com/nwyin/tau" rel="noopener noreferrer"&gt;tau&lt;/a&gt;), and wanted to verify this result and see if I should consider using hashline as the default edit strategy there.&lt;br&gt;
So I built &lt;a href="https://github.com/nwyin/edit-bench" rel="noopener noreferrer"&gt;edit-bench&lt;/a&gt; to test this myself across multiple languages and models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;edit-bench generates mutation-based tests from existing codebases.&lt;br&gt;
You point a script at a directory, and it generates mutations like deleting a statement, flipping a boolean, swapping args, etc.&lt;/p&gt;
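
&lt;p&gt;To give a feel for what a single mutation looks like, here is a sketch of a boolean flip on Python source using the standard library &lt;code&gt;ast&lt;/code&gt; module. It is only an illustration of the technique: the class name &lt;code&gt;FlipBooleans&lt;/code&gt; is hypothetical, and edit-bench's own mutation code may work quite differently (it also targets TypeScript and Rust).&lt;/p&gt;

```python
import ast


class FlipBooleans(ast.NodeTransformer):
    # Hypothetical mutation operator: swap True and False constants.
    def visit_Constant(self, node):
        if isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(value=not node.value), node)
        return node


original = "def is_ready(task):\n    return True\n"
tree = FlipBooleans().visit(ast.parse(original))
# ast.unparse requires Python 3.9 or newer.
mutated = ast.unparse(ast.fix_missing_locations(tree))
print(mutated)
```

&lt;p&gt;The benchmark task is then to get a model to edit the mutated file back to the original.&lt;/p&gt;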

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Languages&lt;/strong&gt;: Python (from &lt;a href="https://github.com/nwyin/hive" rel="noopener noreferrer"&gt;hive&lt;/a&gt;), TypeScript (from &lt;a href="https://github.com/nicepkg/oh-my-pi" rel="noopener noreferrer"&gt;oh-my-pi&lt;/a&gt;), Rust (from &lt;a href="https://github.com/nwyin/irradiate" rel="noopener noreferrer"&gt;irradiate&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: &lt;code&gt;gpt-4.1-mini&lt;/code&gt;, &lt;code&gt;google/gemini-3-flash-preview&lt;/code&gt;, &lt;code&gt;qwen/qwen3.5-397b-a17b&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edit modes&lt;/strong&gt;: replace (old_string/new_string) vs hashline (line-number anchored)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;20 tasks per language&lt;/strong&gt;, single-attempt oneshot runs&lt;/li&gt;
&lt;li&gt;I also recently added fuzzy matching to &lt;code&gt;tau&lt;/code&gt; (trim cascade: &lt;code&gt;trim_end → trim_both → unicode normalization&lt;/code&gt;) and wanted to see if this helps&lt;/li&gt;
&lt;/ul&gt;
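
&lt;p&gt;For readers unfamiliar with the idea, a trim cascade for replace-mode edits can be sketched roughly as follows. This is my own minimal reconstruction of the order listed above, not tau's code: the function names, the line-by-line window matching, and the choice of NFKC for the unicode step are all assumptions.&lt;/p&gt;

```python
import unicodedata


def find_block(source_lines, old_lines, normalize):
    # Return the index of the first window that matches old_lines after normalizing.
    target = [normalize(line) for line in old_lines]
    for start in range(len(source_lines) - len(old_lines) + 1):
        window = source_lines[start:start + len(old_lines)]
        if [normalize(line) for line in window] == target:
            return start
    return None


# Cascade from strictest to loosest matching.
CASCADE = [
    ("exact", lambda s: s),
    ("trim_end", lambda s: s.rstrip()),
    ("trim_both", lambda s: s.strip()),
    ("unicode", lambda s: unicodedata.normalize("NFKC", s.strip())),
]


def fuzzy_replace(source, old_string, new_string):
    source_lines = source.split("\n")
    old_lines = old_string.split("\n")
    for name, normalize in CASCADE:
        start = find_block(source_lines, old_lines, normalize)
        if start is not None:
            end = start + len(old_lines)
            source_lines[start:end] = new_string.split("\n")
            return "\n".join(source_lines), name
    # No strategy matched: leave the source untouched.
    return source, None


# The source line has trailing spaces that the model's old_string omits.
code = "def add(a, b):\n    return a + b   \n"
fixed, strategy = fuzzy_replace(code, "    return a + b", "    return a - b")
```

&lt;p&gt;Here the exact match fails and the &lt;code&gt;trim_end&lt;/code&gt; step rescues the edit, which is the near-miss case the cascade is meant to catch.&lt;/p&gt;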

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;Replace mode:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Python&lt;/th&gt;
&lt;th&gt;TypeScript&lt;/th&gt;
&lt;th&gt;Rust&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gemini-3-flash&lt;/td&gt;
&lt;td&gt;95%&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;95%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;qwen3.5-397b&lt;/td&gt;
&lt;td&gt;90%&lt;/td&gt;
&lt;td&gt;85%&lt;/td&gt;
&lt;td&gt;85%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-4.1-mini&lt;/td&gt;
&lt;td&gt;65%&lt;/td&gt;
&lt;td&gt;75%&lt;/td&gt;
&lt;td&gt;45%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Hashline mode (from &lt;a href="https://github.com/nwyin/edit-bench/issues/13" rel="noopener noreferrer"&gt;earlier runs&lt;/a&gt;):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Python&lt;/th&gt;
&lt;th&gt;TypeScript&lt;/th&gt;
&lt;th&gt;Rust&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gemini-3-flash&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;td&gt;85%&lt;/td&gt;
&lt;td&gt;90%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;qwen3.5-397b&lt;/td&gt;
&lt;td&gt;85%&lt;/td&gt;
&lt;td&gt;85%&lt;/td&gt;
&lt;td&gt;90%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-4.1-mini&lt;/td&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;td&gt;55%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Hashline hurts Python noticeably, and seems roughly neutral on TypeScript and Rust.&lt;br&gt;
The &lt;a href="https://github.com/nwyin/edit-bench/issues/14" rel="noopener noreferrer"&gt;language-dependence&lt;/a&gt; is interesting — Python's significant whitespace might make line-anchored edits more error-prone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Does Fuzzy Matching Help?
&lt;/h2&gt;

&lt;p&gt;Apparently not.&lt;/p&gt;

&lt;p&gt;I added trace collection to see if tau's fuzzy trim cascade ever fires during replace-mode runs. Across &lt;strong&gt;114 successful edits&lt;/strong&gt; and &lt;strong&gt;20 failed edits&lt;/strong&gt; (3 models × 3 languages), fuzzy matching triggered &lt;strong&gt;zero times&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Of the 20 failed edits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 had trailing whitespace (theoretically fixable)&lt;/li&gt;
&lt;li&gt;~8 included line numbers in &lt;code&gt;old_string&lt;/code&gt; (model bug)&lt;/li&gt;
&lt;li&gt;~11 had completely hallucinated content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When models get &lt;code&gt;old_string&lt;/code&gt; right, they get whitespace right too.&lt;br&gt;
When they get it wrong, they get it very wrong — trim cascading doesn't help.&lt;/p&gt;

&lt;p&gt;(&lt;a href="https://github.com/nwyin/edit-bench/issues/13#issuecomment-4108661427" rel="noopener noreferrer"&gt;Trace analysis details&lt;/a&gt;)&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hashline vs replace is not a clear winner either way.&lt;/strong&gt; The effect is language-dependent and model-dependent. Python penalizes hashline; TypeScript is neutral; Rust is a toss-up.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Can's results are hard to generalize.&lt;/strong&gt; The &lt;a href="https://github.com/can1357/oh-my-pi/tree/main/packages/react-edit-benchmark" rel="noopener noreferrer"&gt;react-edit-benchmark&lt;/a&gt; is JavaScript-only and uses an LSP for validation feedback. Our setup (no LSP, multiple languages) shows a different picture. The LSP feedback loop in particular likely confounds. Giving the model type errors to retry against is a meaningful boost that interacts with edit format.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fuzzy matching is a non-problem for current models.&lt;/strong&gt; LLMs either reproduce source text exactly or hallucinate something completely different. The whitespace near-miss case that fuzzy matching targets basically doesn't happen in practice.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;For current-gen models in contemporary harnesses, edit format is not the bottleneck.&lt;/strong&gt; The gap between models (gemini-3-flash averaging roughly 82-90% across languages vs gpt-4.1-mini at roughly 58-62%) dwarfs the gap between edit formats. Invest in model selection and prompt engineering before worrying about edit format.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Obligatory disclaimer: small n, not statistically rigorous, treat accordingly.&lt;/p&gt;

&lt;p&gt;All data: &lt;a href="https://github.com/nwyin/edit-bench" rel="noopener noreferrer"&gt;nwyin/edit-bench&lt;/a&gt;, issues &lt;a href="https://github.com/nwyin/edit-bench/issues/13" rel="noopener noreferrer"&gt;#13&lt;/a&gt; and &lt;a href="https://github.com/nwyin/edit-bench/issues/14" rel="noopener noreferrer"&gt;#14&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>llm</category>
      <category>testing</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Notes on Implementing Raft for the First Time</title>
      <dc:creator>nwyin</dc:creator>
      <pubDate>Mon, 23 Mar 2026 09:22:45 +0000</pubDate>
      <link>https://dev.to/nwyin/notes-on-implementing-raft-for-the-first-time-2ch3</link>
      <guid>https://dev.to/nwyin/notes-on-implementing-raft-for-the-first-time-2ch3</guid>
      <description>&lt;p&gt;I &lt;a href="https://github.com/nwyin/driftwood" rel="noopener noreferrer"&gt;implemented&lt;/a&gt; the Raft consensus algorithm (the poster child of distributed algorithms) in Python.&lt;br&gt;
It's a pretty bad implementation!&lt;br&gt;
But also (somewhat) correct.&lt;/p&gt;

&lt;p&gt;Here are some notes I'd share with anyone else who's interested in taking on a similar challenge.&lt;/p&gt;




&lt;p&gt;In hindsight, these were the most useful resources for learning about Raft and implementing it correctly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://raft.github.io/raft.pdf" rel="noopener noreferrer"&gt;The Raft paper (read up to section 5 and reference figure 2 heavily)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thesquareplanet.com/blog/students-guide-to-raft/" rel="noopener noreferrer"&gt;Students' Guide to Raft&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/etcd-io/raft" rel="noopener noreferrer"&gt;one of the most widely used Raft implementations&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;clone the repo, skim &lt;code&gt;raft.go&lt;/code&gt; and go back and forth with an LLM to understand the code base and design decisions&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;a href="https://eli.thegreenplace.net/2020/implementing-raft-part-0-introduction/" rel="noopener noreferrer"&gt;Eli Bendersky's blog series&lt;/a&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;I'd suggest spending an hour or so reading the paper first, then stubbing out some code for a UDP or TCP server that reads incoming bytes and adds them to an array.&lt;br&gt;
I then followed along with Eli's implementation, adding features to my Raft implementation in the same order.&lt;/p&gt;

&lt;p&gt;After getting something that looks like elections working, I started looking for bugs and errors in my understanding of the algorithm.&lt;br&gt;
I'd go back and forth between the students' guide, Figure 2 in the Raft paper, and my implementation, thinking carefully about where my implementation was the same (or differed).&lt;br&gt;
I also heavily used an LLM to review this code, adding material from the above resources into the context.&lt;/p&gt;

&lt;p&gt;Repeat the above process for log replication, persistence, etc.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;re: implementation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I made some simplifying design choices in my implementation.&lt;br&gt;
In no particular order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;each node runs and processes messages on a single thread&lt;/li&gt;
&lt;li&gt;use a "logical clock" to keep track of local "time" on the system (e.g. &lt;code&gt;tick()&lt;/code&gt; and increment a counter local to each node, vs using system time)&lt;/li&gt;
&lt;li&gt;"muddy" the implementation by having everything in one file. e.g network parsing, storage/persistence, the core raft algorithm, and utilities/commands for controlling the node itself&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;2 seems like a sound and correct design choice (logical clocks are what's used in etcd's implementation).&lt;br&gt;
3 is arguably better for learning/pedagogy.&lt;br&gt;
It's nice to have everything in one file so you can see it all at once, and it gives you an implementation you can rip up to see which abstractions fit the algorithm best.&lt;/p&gt;

&lt;p&gt;1 is a bit of an egregious choice to me.&lt;br&gt;
It does make the implementation far simpler (you worry less about getting into deadlocks and atomic updates to the node's internal state), but you also end up with something that isn't quite Raft.&lt;br&gt;
For a first implementation, this seems fine.&lt;br&gt;
The algorithm is complex enough and I think you'd rather spend your time debugging logical errors in the core Raft algorithm vs fussing with mutexes.&lt;/p&gt;
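
&lt;p&gt;To illustrate the logical clock from choice 2, here is a small sketch of a tick-driven election timer on a single-threaded node. The class and attribute names are hypothetical and not taken from my implementation; the sketch only shows that timeouts are counted in ticks rather than read from the system clock.&lt;/p&gt;

```python
import random


class Node:
    # Hypothetical single-threaded node; names are illustrative only.
    def __init__(self, election_timeout_ticks):
        self.state = "follower"
        self.term = 0
        self.elapsed = 0  # logical time since the last heartbeat, in ticks
        self.election_timeout_ticks = election_timeout_ticks

    def tick(self):
        # Advance logical time by one step; no wall-clock reads involved.
        self.elapsed += 1
        if self.state != "leader" and self.elapsed >= self.election_timeout_ticks:
            self.start_election()

    def on_heartbeat(self):
        # A valid heartbeat from the leader resets the election timer.
        self.elapsed = 0

    def start_election(self):
        self.state = "candidate"
        self.term += 1
        self.elapsed = 0


# Raft randomizes election timeouts; a seeded RNG keeps this run repeatable.
rng = random.Random(0)
node = Node(election_timeout_ticks=rng.randint(10, 20))
for _ in range(5):
    node.tick()
node.on_heartbeat()
state_after_heartbeat = node.state  # still a follower: timer was reset in time
for _ in range(20):
    node.tick()  # no heartbeats arrive, so the node eventually starts an election
```

&lt;p&gt;Because time only advances when &lt;code&gt;tick()&lt;/code&gt; is called, tests can step a cluster forward deterministically instead of sleeping.&lt;/p&gt;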




&lt;p&gt;I'd consider implementing Raft this way as a ~30-hour project.&lt;br&gt;
The initial reading of the Raft paper and reviewing related materials should take a few hours.&lt;br&gt;
I did the bulk of the coding in ~3 days during the holidays, hacking for about 6-8 hours/day.&lt;br&gt;
I still have some things to polish and improve (e.g. fix some subtle bugs) in the existing implementation, which might be another half a day of work.&lt;/p&gt;

&lt;p&gt;All in all, not too bad for understanding one of the core algorithms that &lt;a href="https://github.com/kubernetes/kubernetes" rel="noopener noreferrer"&gt;powers&lt;/a&gt; &lt;a href="https://github.com/cockroachdb/cockroach" rel="noopener noreferrer"&gt;so much&lt;/a&gt; &lt;a href="https://github.com/rabbitmq/rabbitmq-server" rel="noopener noreferrer"&gt;infrastructure&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>computerscience</category>
      <category>python</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Reverse-Engineering Claude Code Agent Teams: Architecture and Protocol</title>
      <dc:creator>nwyin</dc:creator>
      <pubDate>Mon, 23 Mar 2026 09:22:01 +0000</pubDate>
      <link>https://dev.to/nwyin/reverse-engineering-claude-code-agent-teams-architecture-and-protocol-o49</link>
      <guid>https://dev.to/nwyin/reverse-engineering-claude-code-agent-teams-architecture-and-protocol-o49</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Claude Code (v2.1.47) ships with an experimental feature called &lt;strong&gt;Agent Teams&lt;/strong&gt;: multiple Claude Code sessions coordinate on shared work through a lead-and-teammates topology. I've been building &lt;a href="https://github.com/nwyin/hive" rel="noopener noreferrer"&gt;Hive&lt;/a&gt;, a multi-agent coding orchestrator with similar goals but a very different architecture, so I wanted to understand how Anthropic's approach works under the hood.&lt;/p&gt;

&lt;p&gt;This post documents what I found through:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reading the &lt;a href="https://code.claude.com/docs/en/agent-teams" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Examining actual artifacts left on disk by previous team sessions&lt;/li&gt;
&lt;li&gt;Letting Claude analyze the Claude Code binary (v2.1.47) for implementation details (hah!)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;1. Architecture Overview&lt;/li&gt;
&lt;li&gt;2. The Shared Task List&lt;/li&gt;
&lt;li&gt;3. Inter-Agent Communication&lt;/li&gt;
&lt;li&gt;4. Agent Spawning and Lifecycle&lt;/li&gt;
&lt;li&gt;5. Quality Gates and Hooks&lt;/li&gt;
&lt;li&gt;6. Token Economics&lt;/li&gt;
&lt;li&gt;7. Architecture Summary&lt;/li&gt;
&lt;li&gt;Sources&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  1. Architecture Overview
&lt;/h2&gt;

&lt;p&gt;An agent team consists of four components:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Team lead&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The main Claude Code session that creates the team and spawns teammates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Teammates&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Separate Claude Code instances, each with its own context window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Task list&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Shared work items stored as individual JSON files on disk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mailbox&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-agent inbox files for message delivery&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The entire coordination layer is &lt;strong&gt;file-based&lt;/strong&gt;. The filesystem at &lt;code&gt;~/.claude/&lt;/code&gt; is the sole coordination substrate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.claude/
├── teams/{team-name}/
│   ├── config.json                  # team membership registry
│   └── inboxes/{agent-name}.json    # per-agent mailbox
└── tasks/{team-name}/
    ├── .lock                        # flock() for concurrent task claiming
    ├── .highwatermark               # auto-increment counter
    ├── 1.json                       # individual task files
    ├── 2.json
    └── ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a fundamentally &lt;strong&gt;decentralized&lt;/strong&gt; design. The lead is just another Claude session with extra tools (&lt;code&gt;TeamCreate&lt;/code&gt;, &lt;code&gt;TeamDelete&lt;/code&gt;, &lt;code&gt;SendMessage&lt;/code&gt;). There is no background process. Coordination emerges from shared file access.&lt;/p&gt;
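&lt;p&gt;To make the layout concrete, here is a minimal sketch that maps a team name and agent name onto the paths from the tree above. The helper &lt;code&gt;team_paths&lt;/code&gt; and its &lt;code&gt;root&lt;/code&gt; parameter are my own hypothetical names, not anything Claude Code exposes; only the directory structure comes from what I observed.&lt;/p&gt;

```python
from pathlib import Path


def team_paths(team_name, agent_name, root=None):
    """Build the per-team paths from the directory layout (hypothetical helper)."""
    root = Path(root) if root else Path.home() / ".claude"
    return {
        "config": root / "teams" / team_name / "config.json",
        "inbox": root / "teams" / team_name / "inboxes" / (agent_name + ".json"),
        "task_dir": root / "tasks" / team_name,
        "lock": root / "tasks" / team_name / ".lock",
    }


# A throwaway root is used here so the demo never touches a real ~/.claude/.
paths = team_paths("code-review", "bug-hunter", root="/tmp/claude-demo")
print(paths["inbox"].as_posix())
```

&lt;p&gt;Everything an agent needs to find its team, its inbox, and the task list is derivable from two strings, which is what lets the design get away without a coordinating process.&lt;/p&gt;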

&lt;p&gt;To watch this happen, ask Claude to spin up a team in an active session, then run the following in another terminal window. The filesystem updates in real time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;watch &lt;span class="nt"&gt;-n&lt;/span&gt; 0.5 &lt;span class="s1"&gt;'tree ~/.claude/teams/ 2&amp;gt;/dev/null; echo "---"; tree ~/.claude/tasks/ 2&amp;gt;/dev/null'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, with the following prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;can you spawn an agent team to examine this code base?
  - have one look for bugs
  - have one look for complexity
  - have one look for good things to call out and play devil's advocate against the other two agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I observed this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;teams
└── code-review
    ├── config.json
    └── inboxes
        ├── bug-hunter.json
        ├── complexity-analyst.json
        ├── devils-advocate.json
        └── team-lead.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Team Config
&lt;/h3&gt;

&lt;p&gt;The team config at &lt;code&gt;~/.claude/teams/{team-name}/config.json&lt;/code&gt; contains a &lt;code&gt;members&lt;/code&gt; array that teammates read to discover each other:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"members"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"team-lead"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"agentId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"abc-123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"agentType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"leader"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"researcher"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"agentId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"def-456"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"agentType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"general-purpose"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Names are the primary addressing mechanism (UUIDs exist but aren't used for routing). All messaging and task assignment use the &lt;code&gt;name&lt;/code&gt; field.&lt;/p&gt;
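&lt;p&gt;As a rough illustration of name-based discovery, the sketch below reads a config file shaped like the example and returns every member except the caller. The function &lt;code&gt;list_teammates&lt;/code&gt; is a hypothetical helper of mine; I don't know how Claude Code performs this lookup internally.&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path


def list_teammates(config_path, self_name):
    """Return the names of every member except the caller (hypothetical helper)."""
    config = json.loads(Path(config_path).read_text())
    return [m["name"] for m in config.get("members", []) if m["name"] != self_name]


# Demo against a throwaway copy of the example config.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text(json.dumps({
        "members": [
            {"name": "team-lead", "agentId": "abc-123", "agentType": "leader"},
            {"name": "researcher", "agentId": "def-456", "agentType": "general-purpose"},
        ]
    }))
    peers = list_teammates(path, "researcher")
    print(peers)
```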

&lt;h2&gt;
  
  
  2. The Shared Task List
&lt;/h2&gt;

&lt;h3&gt;
  
  
  File Format
&lt;/h3&gt;

&lt;p&gt;Each task is stored as an individual JSON file in &lt;code&gt;~/.claude/tasks/{team-name}/&lt;/code&gt;. Here's a real example from a previous session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hunt for bugs across the codebase"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"activeForm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hunting for bugs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bug-hunter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"completed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"blocks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"blockedBy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Task schema:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;Numeric ID, auto-incremented via &lt;code&gt;.highwatermark&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;subject&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;Imperative-form title (e.g., "Run tests")&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;description&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;Detailed requirements and acceptance criteria&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;activeForm&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;Present-continuous form for spinner display ("Running tests")&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pending&lt;/code&gt; → &lt;code&gt;in_progress&lt;/code&gt; → &lt;code&gt;completed&lt;/code&gt; (or &lt;code&gt;deleted&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;blocks&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string[]&lt;/td&gt;
&lt;td&gt;Task IDs that this task blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;blockedBy&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string[]&lt;/td&gt;
&lt;td&gt;Task IDs that must complete before this task can start&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Concurrency Control
&lt;/h3&gt;

&lt;p&gt;Two special files provide coordination:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;.lock&lt;/code&gt;&lt;/strong&gt;: A 0-byte file used for filesystem-level mutual exclusion (&lt;code&gt;flock()&lt;/code&gt;). Present in all 42 task directories observed on my machine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;.highwatermark&lt;/code&gt;&lt;/strong&gt;: Contains a single integer (e.g., &lt;code&gt;"3"&lt;/code&gt;, &lt;code&gt;"13"&lt;/code&gt;): the next available task ID, used for auto-incrementing.&lt;/li&gt;
&lt;/ul&gt;
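&lt;p&gt;Here is a minimal sketch of how these two files could work together to hand out unique IDs. It assumes &lt;code&gt;.highwatermark&lt;/code&gt; holds the next free ID as plain text and that the counter is read and bumped while holding an exclusive &lt;code&gt;flock()&lt;/code&gt; on &lt;code&gt;.lock&lt;/code&gt;. The function name &lt;code&gt;allocate_task_id&lt;/code&gt; is hypothetical, and the real implementation may differ. Python's &lt;code&gt;fcntl&lt;/code&gt; module is Unix-only.&lt;/p&gt;

```python
import fcntl
import tempfile
from pathlib import Path


def allocate_task_id(task_dir):
    """Reserve the next task ID while holding an exclusive lock on .lock.

    Assumption: .highwatermark stores the next free ID as plain text.
    """
    task_dir = Path(task_dir)
    with open(task_dir / ".lock", "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks while another process holds the lock
        try:
            mark = task_dir / ".highwatermark"
            next_id = int(mark.read_text().strip()) if mark.exists() else 1
            mark.write_text(str(next_id + 1))
            return str(next_id)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


with tempfile.TemporaryDirectory() as tmp:
    first = allocate_task_id(tmp)
    second = allocate_task_id(tmp)
    print(first, second)
```

&lt;p&gt;Because the lock is advisory, this only works if every writer cooperates by taking it first, which is a reasonable bet when all participants are the same binary.&lt;/p&gt;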

&lt;h3&gt;
  
  
  Task Claiming
&lt;/h3&gt;

&lt;p&gt;Task claiming uses file locking to prevent race conditions. Teammates prefer lowest-ID-first ordering. A task with a non-empty &lt;code&gt;blockedBy&lt;/code&gt; array cannot be claimed until all blocking tasks are in a terminal state.&lt;/p&gt;
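&lt;p&gt;The selection rule can be sketched as a pure function over task records. I'm assuming that both &lt;code&gt;completed&lt;/code&gt; and &lt;code&gt;deleted&lt;/code&gt; count as terminal and that an unowned &lt;code&gt;pending&lt;/code&gt; task is claimable; &lt;code&gt;next_claimable&lt;/code&gt; is a hypothetical name. In practice this check would presumably run while holding the task directory lock.&lt;/p&gt;

```python
TERMINAL = {"completed", "deleted"}  # assumption: both statuses count as terminal


def next_claimable(tasks):
    """Pick the lowest-ID pending, unowned task whose blockers are all terminal."""
    by_id = {t["id"]: t for t in tasks}
    candidates = [
        t for t in tasks
        if t["status"] == "pending"
        and not t.get("owner")
        and all(by_id[b]["status"] in TERMINAL for b in t.get("blockedBy", []))
    ]
    # IDs are numeric strings, so compare them as integers ("10" sorts after "2").
    return min(candidates, key=lambda t: int(t["id"]), default=None)


tasks = [
    {"id": "1", "status": "completed", "owner": "bug-hunter", "blockedBy": []},
    {"id": "2", "status": "pending", "blockedBy": ["3"]},
    {"id": "3", "status": "in_progress", "owner": "researcher", "blockedBy": []},
    {"id": "10", "status": "pending", "blockedBy": ["1"]},
]
choice = next_claimable(tasks)
print(choice["id"])
```

&lt;p&gt;Task 2 has the lower ID but is still blocked by the in-progress task 3, so the sketch falls through to task 10.&lt;/p&gt;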

&lt;h3&gt;
  
  
  Observation: Most Task Directories Are Empty
&lt;/h3&gt;

&lt;p&gt;Of 42 task directories on my machine, only 5 contained actual task JSON files. The remaining 37 had only &lt;code&gt;.lock&lt;/code&gt; and &lt;code&gt;.highwatermark&lt;/code&gt;. This likely means tasks are cleaned up after completion, or these were sessions where Claude used the internal task list (available since the task list feature launch) without decomposing into subtask files.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Inter-Agent Communication
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mailbox Pattern
&lt;/h3&gt;

&lt;p&gt;Each agent has a JSON array file at &lt;code&gt;~/.claude/teams/{team-name}/inboxes/{agent-name}.json&lt;/code&gt;. Here's a real inbox from a previous session where a team-lead dispatched work to a &lt;code&gt;controlplane-agent&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"team-lead"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;type&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;task_assignment&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;taskId&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;1&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;subject&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Phase 2: Control-plane - remove participants/presence&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;description&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Remove multiplayer code from the control-plane package...&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;assignedBy&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;team-lead&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span 
class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;2026-02-18T02:37:16.890Z&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-18T02:37:16.890Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"read"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the &lt;strong&gt;JSON-in-JSON&lt;/strong&gt; encoding: the &lt;code&gt;text&lt;/code&gt; field is a JSON string containing a serialized message object. The outer envelope has &lt;code&gt;from&lt;/code&gt;, &lt;code&gt;text&lt;/code&gt;, &lt;code&gt;timestamp&lt;/code&gt;, and &lt;code&gt;read&lt;/code&gt; fields.&lt;/p&gt;
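&lt;p&gt;Reading an inbox therefore takes two parsing passes. The sketch below uses a trimmed-down copy of the message above and assumes every &lt;code&gt;text&lt;/code&gt; field holds serialized JSON, as in that example; a free-form message would need a fallback. &lt;code&gt;decode_inbox&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import json

# Trimmed-down version of the inbox shown above; the inner message is itself a JSON string.
raw_inbox = json.dumps([
    {
        "from": "team-lead",
        "text": json.dumps({"type": "task_assignment", "taskId": "1"}),
        "timestamp": "2026-02-18T02:37:16.890Z",
        "read": False,
    }
])


def decode_inbox(raw):
    """Parse the outer array, then parse each entry's text field a second time."""
    decoded = []
    for envelope in json.loads(raw):
        payload = json.loads(envelope["text"])  # second pass: the nested message
        decoded.append((envelope["from"], payload["type"], payload.get("taskId")))
    return decoded


messages = decode_inbox(raw_inbox)
print(messages)
```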

&lt;h3&gt;
  
  
  Message Types
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;type&lt;/code&gt; field inside the &lt;code&gt;text&lt;/code&gt; payload supports:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Direction&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;task_assignment&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;lead → teammate&lt;/td&gt;
&lt;td&gt;Assign a task with full details&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;message&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;any → any&lt;/td&gt;
&lt;td&gt;Direct message to one recipient&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;broadcast&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;lead → all&lt;/td&gt;
&lt;td&gt;Same message to every teammate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;shutdown_request&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;lead → teammate&lt;/td&gt;
&lt;td&gt;Request graceful shutdown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;shutdown_response&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;teammate → lead&lt;/td&gt;
&lt;td&gt;Approve or reject shutdown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;plan_approval_request&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;teammate → lead&lt;/td&gt;
&lt;td&gt;Submit plan for review&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;plan_approval_response&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;lead → teammate&lt;/td&gt;
&lt;td&gt;Approve or reject with feedback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;idle_notification&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;teammate → lead&lt;/td&gt;
&lt;td&gt;Auto-sent when teammate's turn ends&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Delivery Mechanism
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Write path&lt;/strong&gt;: The sender appends a new entry to the recipient's inbox JSON array file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Read path&lt;/strong&gt;: The recipient polls their own inbox file. New messages are injected as synthetic conversation turns (they appear as if a user sent them).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Broadcast&lt;/strong&gt;: Literally writes the same message to every teammate's inbox file. Token cost scales linearly with team size.&lt;/p&gt;

&lt;p&gt;Communication is just file append + file read. Latency between send and receive depends on the recipient's poll interval.&lt;/p&gt;
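&lt;p&gt;A minimal sketch of the append-and-poll cycle, under stated assumptions: the envelope fields match the inbox example, and the reader flips &lt;code&gt;read&lt;/code&gt; to &lt;code&gt;true&lt;/code&gt; once it has consumed a message (I did not confirm how or when Claude Code updates that flag). &lt;code&gt;send_message&lt;/code&gt; and &lt;code&gt;poll_unread&lt;/code&gt; are hypothetical names, and the sketch does no locking.&lt;/p&gt;

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def send_message(inbox_path, sender, text):
    """Write path: load the array, append one envelope, write it back."""
    inbox_path = Path(inbox_path)
    entries = json.loads(inbox_path.read_text()) if inbox_path.exists() else []
    entries.append({
        "from": sender,
        "text": text,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "read": False,
    })
    inbox_path.write_text(json.dumps(entries, indent=2))


def poll_unread(inbox_path):
    """Read path: return unread envelopes and flag them as read (assumed behavior)."""
    inbox_path = Path(inbox_path)
    if not inbox_path.exists():
        return []
    entries = json.loads(inbox_path.read_text())
    unread = [e for e in entries if not e["read"]]
    for entry in unread:
        entry["read"] = True
    inbox_path.write_text(json.dumps(entries, indent=2))
    return unread


with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp) / "researcher.json"
    send_message(inbox, "team-lead", "please review module A")
    first_poll = poll_unread(inbox)
    second_poll = poll_unread(inbox)
    print(len(first_poll), len(second_poll))
```

&lt;p&gt;The second poll comes back empty because the first one already marked the message as read.&lt;/p&gt;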

&lt;h3&gt;
  
  
  Peer DM Visibility
&lt;/h3&gt;

&lt;p&gt;When a teammate sends a DM to another teammate, a brief summary is included in the lead's idle notification. This gives the lead visibility into peer collaboration without the full message content.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Agent Spawning and Lifecycle
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How Teammates Are Created
&lt;/h3&gt;

&lt;p&gt;Each teammate is a &lt;strong&gt;separate &lt;code&gt;claude&lt;/code&gt; CLI process&lt;/strong&gt;. The lead spawns them via the &lt;code&gt;Task&lt;/code&gt; tool with &lt;code&gt;team_name&lt;/code&gt; and &lt;code&gt;name&lt;/code&gt; parameters. Environment variables are set on the spawned process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CLAUDE_CODE_TEAM_NAME&lt;/code&gt;: auto-set on spawned teammates&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CLAUDE_CODE_PLAN_MODE_REQUIRED&lt;/code&gt;: set to &lt;code&gt;true&lt;/code&gt; if plan approval is required&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Context Initialization
&lt;/h3&gt;

&lt;p&gt;Teammates load the same project context as any fresh session:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CLAUDE.md&lt;/code&gt; files from the working directory&lt;/li&gt;
&lt;li&gt;MCP servers&lt;/li&gt;
&lt;li&gt;Skills&lt;/li&gt;
&lt;li&gt;The spawn prompt from the lead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The lead's conversation history does NOT carry over.&lt;/strong&gt; Each teammate starts fresh with only the spawn prompt as context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Internal Implementation
&lt;/h3&gt;

&lt;p&gt;From binary analysis of Claude Code v2.1.47, the teammate context is managed via &lt;code&gt;AsyncLocalStorage&lt;/code&gt; with these fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;agentId&lt;/code&gt;, &lt;code&gt;agentName&lt;/code&gt;, &lt;code&gt;teamName&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;parentSessionId&lt;/code&gt;, &lt;code&gt;color&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;planModeRequired&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key internal functions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;isTeammate()&lt;/code&gt; / &lt;code&gt;isTeamLead()&lt;/code&gt;: role detection&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;waitForTeammatesToBecomeIdle()&lt;/code&gt;: synchronization primitive for the lead&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;getTeammateContext()&lt;/code&gt; / &lt;code&gt;setDynamicTeamContext()&lt;/code&gt;: runtime context management&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Idle Detection
&lt;/h3&gt;

&lt;p&gt;After every LLM turn, a teammate automatically goes idle and sends an &lt;code&gt;idle_notification&lt;/code&gt; to the lead. This is the normal resting state, rather than an error or staleness condition. Sending a message to an idle teammate wakes it (the next poll cycle picks up the inbox message).&lt;/p&gt;

&lt;h3&gt;
  
  
  Shutdown Protocol
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Lead sends &lt;code&gt;shutdown_request&lt;/code&gt; to a teammate&lt;/li&gt;
&lt;li&gt;Teammate can approve (exits gracefully) or reject (continues working with an explanation)&lt;/li&gt;
&lt;li&gt;Team cleanup via &lt;code&gt;TeamDelete&lt;/code&gt; removes &lt;code&gt;~/.claude/teams/{team-name}/&lt;/code&gt; and &lt;code&gt;~/.claude/tasks/{team-name}/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Cleanup fails if any teammates are still active; they must be shut down first&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Permission Inheritance
&lt;/h3&gt;

&lt;p&gt;Teammates inherit the lead's permission mode at spawn time. If the lead runs &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt;, all teammates do too. Individual modes can be changed post-spawn but not configured per-teammate at spawn time.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Quality Gates and Hooks
&lt;/h2&gt;

&lt;p&gt;Agent Teams integrates with Claude Code's hook system for quality enforcement through two hook events.&lt;/p&gt;

&lt;h3&gt;
  
  
  TeammateIdle Hook
&lt;/h3&gt;

&lt;p&gt;Fires when a teammate is about to go idle. Exit code 2 feeds stderr back as feedback and keeps the teammate working instead of letting it go idle.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hook_event_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"TeammateIdle"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"teammate_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"researcher"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"team_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"my-project"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  TaskCompleted Hook
&lt;/h3&gt;

&lt;p&gt;Fires when a task is being marked complete. Exit code 2 prevents completion and feeds stderr back as feedback.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hook_event_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"TaskCompleted"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"task-001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task_subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Implement user authentication"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task_description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Add login and signup endpoints"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"teammate_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"implementer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"team_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"my-project"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This fires in two situations: (1) when any agent explicitly marks a task completed via &lt;code&gt;TaskUpdate&lt;/code&gt;, or (2) when a teammate finishes its turn while it still has in-progress tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hook Handler Types
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;command&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Shell script. JSON on stdin, exit codes for decisions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prompt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Single-turn LLM evaluation. Returns &lt;code&gt;{ok, reason}&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;agent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Multi-turn subagent with read tools. Up to 50 turns.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
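&lt;p&gt;As an illustration of the &lt;code&gt;command&lt;/code&gt; handler type, here is a sketch of the decision logic a hook script might use for &lt;code&gt;TaskCompleted&lt;/code&gt;. The JSON-on-stdin input and the meaning of exit code 2 come from the descriptions above; the &lt;code&gt;evaluate&lt;/code&gt; function and the &lt;code&gt;checks_passed&lt;/code&gt; flag are hypothetical stand-ins for whatever tests or linters you would actually run.&lt;/p&gt;

```python
import json
import sys


def evaluate(event, checks_passed):
    """Map a hook event to (exit_code, stderr_text); 2 blocks, 0 allows."""
    if event.get("hook_event_name") == "TaskCompleted" and not checks_passed:
        subject = event.get("task_subject", "unknown task")
        return 2, "Checks failed for '" + subject + "'; fix them before completing."
    return 0, ""


def main(stdin=sys.stdin):
    """Entry point for use as a hook: read the event from stdin, return the exit code."""
    event = json.load(stdin)
    checks_passed = False  # placeholder: a real hook would run tests or linters here
    code, message = evaluate(event, checks_passed)
    if message:
        print(message, file=sys.stderr)
    return code


# Demo of the decision logic without touching stdin.
sample = {"hook_event_name": "TaskCompleted", "task_subject": "Implement user authentication"}
blocked = evaluate(sample, checks_passed=False)
allowed = evaluate(sample, checks_passed=True)
print(blocked[0], allowed[0])
```

&lt;p&gt;Installed as a real hook, the script would end with &lt;code&gt;sys.exit(main())&lt;/code&gt; so the exit code reaches Claude Code.&lt;/p&gt;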

&lt;h2&gt;
  
  
  6. Token Economics
&lt;/h2&gt;

&lt;p&gt;Agent teams use &lt;strong&gt;approximately 7× more tokens&lt;/strong&gt; than standard sessions when teammates run in plan mode. Each teammate maintains its own full context window as a separate Claude instance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Baseline Reference
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Average Claude Code usage: ~$6/developer/day&lt;/li&gt;
&lt;li&gt;Agent teams: roughly proportional to team size on top of baseline&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  7. Architecture Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Claude Code Agent Teams&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Coordination substrate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Flat files (&lt;code&gt;~/.claude/tasks/&lt;/code&gt;, &lt;code&gt;~/.claude/teams/&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Task format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One JSON file per task + &lt;code&gt;.lock&lt;/code&gt; for claiming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Messaging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;JSON inbox files (append + poll)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent lifecycle&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Self-managing CLI processes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Work isolation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Shared working directory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Merge strategy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None (agents edit files directly)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retry/escalation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual (lead decides, or user intervenes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Topology&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lead + flat peers, peer-to-peer messaging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scheduling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Self-claim (teammates grab next task)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;State durability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Files only; no in-process teammate resumption&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Quality gates&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Shell hooks (&lt;code&gt;TeammateIdle&lt;/code&gt;, &lt;code&gt;TaskCompleted&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token tracking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-session only, no cross-agent aggregation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stall detection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual (user notices teammate stopped)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Concurrency control&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Implicit (team size = teammate count)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dependency model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;blocks&lt;/code&gt;/&lt;code&gt;blockedBy&lt;/code&gt; on task files&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Official Documentation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/agent-teams#architecture" rel="noopener noreferrer"&gt;Teams of Claude Code sessions: Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/interactive-mode#task-list" rel="noopener noreferrer"&gt;Interactive mode — Task list&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/agent-teams#assign-and-claim-tasks" rel="noopener noreferrer"&gt;Agent teams — Assign and claim tasks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/agent-teams#context-and-communication" rel="noopener noreferrer"&gt;Agent teams — Context and communication&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/agent-teams#shut-down-teammates" rel="noopener noreferrer"&gt;Agent teams — Shut down teammates&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/agent-teams#clean-up-the-team" rel="noopener noreferrer"&gt;Agent teams — Clean up the team&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/agent-teams#permissions" rel="noopener noreferrer"&gt;Agent teams — Permissions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/agent-teams#limitations" rel="noopener noreferrer"&gt;Agent teams — Limitations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/agent-teams#avoid-file-conflicts" rel="noopener noreferrer"&gt;Agent teams — Best practices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/hooks" rel="noopener noreferrer"&gt;Hooks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/costs#agent-team-token-costs" rel="noopener noreferrer"&gt;Costs — Agent team token costs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/settings" rel="noopener noreferrer"&gt;Settings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/sub-agents" rel="noopener noreferrer"&gt;Sub-agents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  On-Disk Artifacts (Claude Code v2.1.47)
&lt;/h3&gt;

&lt;p&gt;Observed at &lt;code&gt;/Users/tau/.claude/&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Team directories with &lt;code&gt;config.json&lt;/code&gt; and &lt;code&gt;inboxes/{agent-name}.json&lt;/code&gt; files&lt;/li&gt;
&lt;li&gt;Task directories with &lt;code&gt;.lock&lt;/code&gt;, &lt;code&gt;.highwatermark&lt;/code&gt;, and individual task JSON files&lt;/li&gt;
&lt;li&gt;Sample task assignment message from &lt;code&gt;team-lead&lt;/code&gt; to &lt;code&gt;cp-agent&lt;/code&gt;, timestamped &lt;code&gt;2026-02-18T02:37:16.890Z&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Binary Analysis
&lt;/h3&gt;

&lt;p&gt;Claude Code binary v2.1.47. Internal functions identified via string analysis: &lt;code&gt;getTeamName&lt;/code&gt;, &lt;code&gt;getAgentName&lt;/code&gt;, &lt;code&gt;getAgentId&lt;/code&gt;, &lt;code&gt;isTeammate&lt;/code&gt;, &lt;code&gt;isTeamLead&lt;/code&gt;, &lt;code&gt;waitForTeammatesToBecomeIdle&lt;/code&gt;, &lt;code&gt;getTeammateContext&lt;/code&gt;, &lt;code&gt;setDynamicTeamContext&lt;/code&gt;, &lt;code&gt;createTeammateContext&lt;/code&gt;. AsyncLocalStorage context fields: &lt;code&gt;agentId&lt;/code&gt;, &lt;code&gt;agentName&lt;/code&gt;, &lt;code&gt;teamName&lt;/code&gt;, &lt;code&gt;parentSessionId&lt;/code&gt;, &lt;code&gt;color&lt;/code&gt;, &lt;code&gt;planModeRequired&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hive Codebase
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/nwyin/hive/blob/main/docs/TECHNICAL_DESIGN_DOC.md" rel="noopener noreferrer"&gt;Hive Technical Design Doc&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>Agent Use Patterns</title>
      <dc:creator>nwyin</dc:creator>
      <pubDate>Mon, 23 Mar 2026 09:22:00 +0000</pubDate>
      <link>https://dev.to/nwyin/agent-use-patterns-1p75</link>
      <guid>https://dev.to/nwyin/agent-use-patterns-1p75</guid>
      <description>&lt;p&gt;It's a tricky thing, managing so many agents.&lt;br&gt;
Too many things can go wrong!&lt;/p&gt;

&lt;p&gt;But it's also clear that there are a lot of ways things can go right.&lt;/p&gt;




&lt;ul&gt;
&lt;li&gt;work queue + orchestrator&lt;/li&gt;
&lt;li&gt;polling agent/"cron job"&lt;/li&gt;
&lt;li&gt;message queue + responder&lt;/li&gt;
&lt;/ul&gt;
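
&lt;p&gt;To make the first shape concrete, here is a minimal sketch of a work queue plus orchestrator. It assumes plain threads as stand-ins for agents; &lt;code&gt;run_orchestrator&lt;/code&gt; and &lt;code&gt;fake_agent&lt;/code&gt; are hypothetical names made up for illustration, not part of any real tool.&lt;/p&gt;

```python
import queue
import threading


def run_orchestrator(tasks, handle, num_workers=3):
    """Fan tasks out to a fixed pool of workers and collect results by task id."""
    work = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this worker
                work.task_done()
                return
            task_id, payload = item
            try:
                outcome = ("done", handle(payload))
            except Exception as exc:  # one failed task must not kill the worker
                outcome = ("failed", str(exc))
            with lock:
                results[task_id] = outcome
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task_id, payload in tasks:
        work.put((task_id, payload))
    for _ in threads:
        work.put(None)  # one sentinel per worker
    work.join()
    for t in threads:
        t.join()
    return results


def fake_agent(payload):
    # Hypothetical stand-in for a real agent call, for illustration only.
    if payload == "bad":
        raise ValueError("agent gave up")
    return payload.upper()
```

&lt;p&gt;Calling &lt;code&gt;run_orchestrator([(1, "lint"), (2, "bad")], fake_agent)&lt;/code&gt; records a result for every task, including the one that failed, so the orchestrator can decide what to retry.&lt;/p&gt;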

&lt;p&gt;What are the patterns that make these agents useful?&lt;br&gt;
One is memory and context management.&lt;/p&gt;

&lt;p&gt;CLI TOOLS ARE A MUST&lt;/p&gt;

&lt;p&gt;A CLI is the ultimate in ergonomics for text-based beings; if you can wrap a CLI around your workflow, the agents become so much better at working with it.&lt;/p&gt;
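
&lt;p&gt;As a rough sketch of what wrapping a workflow in a CLI can look like, here is a tiny task tracker built on &lt;code&gt;argparse&lt;/code&gt;. The &lt;code&gt;tasks&lt;/code&gt; command, its subcommands, and the in-memory store are all assumptions made up for illustration; the point is only that each command takes plain arguments and prints JSON an agent can parse.&lt;/p&gt;

```python
import argparse
import json


def build_parser():
    # Hypothetical CLI: "tasks add TITLE" and "tasks list".
    parser = argparse.ArgumentParser(prog="tasks", description="Hypothetical task-tracker CLI")
    sub = parser.add_subparsers(dest="command", required=True)
    add = sub.add_parser("add", help="create a task")
    add.add_argument("title")
    sub.add_parser("list", help="print all tasks as JSON")
    return parser


def main(argv, store):
    """Run one command against an in-memory store and return the text to print."""
    args = build_parser().parse_args(argv)
    if args.command == "add":
        store.append({"id": len(store) + 1, "title": args.title})
        return json.dumps(store[-1])
    return json.dumps(store)
```

&lt;p&gt;With an empty list as the store, &lt;code&gt;main(["add", "write docs"], store)&lt;/code&gt; returns the new task as JSON and &lt;code&gt;main(["list"], store)&lt;/code&gt; returns every task so far. A real version would persist the store and print the result.&lt;/p&gt;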

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>cli</category>
    </item>
  </channel>
</rss>
