<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: WonderLab</title>
    <description>The latest articles on DEV Community by WonderLab (@wonderlab).</description>
    <link>https://dev.to/wonderlab</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3797373%2F25beba30-d8d4-4d2e-9ec6-170356089350.jpg</url>
      <title>DEV Community: WonderLab</title>
      <link>https://dev.to/wonderlab</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/wonderlab"/>
    <language>en</language>
    <item>
      <title>Workflow Series (04): Multi-Agent Coordination — Orchestrator Boundaries, Concurrency Control, and Context Isolation</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Thu, 02 Jul 2026 02:17:55 +0000</pubDate>
      <link>https://dev.to/wonderlab/workflow-series-04-multi-agent-coordination-orchestrator-boundaries-concurrency-control-and-5bjk</link>
      <guid>https://dev.to/wonderlab/workflow-series-04-multi-agent-coordination-orchestrator-boundaries-concurrency-control-and-5bjk</guid>
      <description>&lt;h2&gt;
  
  
  Orchestrator Responsibility Boundaries
&lt;/h2&gt;

&lt;p&gt;Unclear division between the Orchestrator (main Agent) and Subagents is the most common design problem in multi-agent workflows.&lt;/p&gt;

&lt;p&gt;The Orchestrator does three things:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Decide: read state, determine the next step
2. Dispatch: spawn subagents, pass task prompts
3. Collect: read subagent output files, update state
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It doesn't execute business logic (analyze bugs, write code, query logs), read raw files, or modify business data. Those belong to subagents.&lt;/p&gt;

&lt;p&gt;The main Agent receives only &lt;strong&gt;structured conclusions&lt;/strong&gt; (JSON). Subagents report through &lt;code&gt;output.json&lt;/code&gt;, not message streams.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ Correct: main Agent reads structured conclusion
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phase3/analysis_final.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;proceed_to_phase4&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# ❌ Wrong: main Agent reads raw logs
&lt;/span&gt;&lt;span class="n"&gt;log_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;crash.log&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# 100k lines into the main Agent's context
&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;analyze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# this is subagent work
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This boundary produces two benefits: the main Agent's context stays manageable (state and conclusions, no raw data); subagents' business logic can be tested independently, without the main Agent's session history.&lt;/p&gt;




&lt;h2&gt;
  
  
  Subagent Design Principles
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Principle 1: Input Completeness
&lt;/h3&gt;

&lt;p&gt;The task prompt must contain everything the subagent needs to complete its task.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# ❌ Incomplete task prompt&lt;/span&gt;
Analyze the root cause of this bug. Refer to previous analysis results.

&lt;span class="gh"&gt;# ✅ Complete task prompt&lt;/span&gt;
&lt;span class="gu"&gt;## Task&lt;/span&gt;
Analyze the root cause of the following bug.

&lt;span class="gu"&gt;## Input&lt;/span&gt;
Bug info:
{{ bug_info.summary }}
{{ bug_info.stack_trace }}

Log directory: {{ log_dir }}

&lt;span class="gu"&gt;## Output requirements&lt;/span&gt;
Write to analysis_final.json, format:
{"confidence": float, "root_cause": str, "evidence": [str]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"Refer to previous analysis results" requires the subagent to access the main Agent's context history — which doesn't exist in an isolated session. Each subagent knows only what's in its task prompt.&lt;/p&gt;
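&lt;p&gt;One way to enforce input completeness is to refuse to dispatch when a field is missing. The sketch below is an illustration only: &lt;code&gt;TASK_TEMPLATE&lt;/code&gt;, &lt;code&gt;REQUIRED_FIELDS&lt;/code&gt;, and &lt;code&gt;build_task_prompt&lt;/code&gt; are hypothetical names, and the field list simply mirrors the placeholders in the prompt above.&lt;/p&gt;

```python
from string import Template

# Hypothetical template; the placeholders mirror the complete task prompt above.
TASK_TEMPLATE = Template(
    "## Task\n"
    "Analyze the root cause of the following bug.\n\n"
    "## Input\n"
    "Bug info:\n$summary\n$stack_trace\n\n"
    "Log directory: $log_dir\n\n"
    "## Output requirements\n"
    "Write to $output_file, format:\n"
    '{"confidence": float, "root_cause": str, "evidence": [str]}\n'
)

# Every field the subagent needs; nothing is left to shared history.
REQUIRED_FIELDS = ("summary", "stack_trace", "log_dir", "output_file")


def build_task_prompt(fields: dict) -> str:
    """Render the prompt, refusing to dispatch if a field is missing or empty."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"incomplete task prompt, missing: {missing}")
    return TASK_TEMPLATE.substitute(fields)
```

&lt;p&gt;Failing at build time keeps an incomplete prompt from ever reaching an isolated session, where the gap could only surface as a vague or wrong answer.&lt;/p&gt;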

&lt;h3&gt;
  
  
  Principle 2: Output Contract Strictness
&lt;/h3&gt;

&lt;p&gt;Subagents must write their output files in the declared JSON Schema. The main Agent's routing logic depends on this schema; missing fields or wrong types break the decision logic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Subagent output schema (defined in templates/)
&lt;/span&gt;&lt;span class="n"&gt;OUTPUT_SCHEMA&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;passed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# required — main Agent routing depends on this
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# required — range 0-1
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;root_cause&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# required; null on failure
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;evidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;    &lt;span class="c1"&gt;# required
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;       &lt;span class="c1"&gt;# required on failure
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
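&lt;p&gt;A contract like this only helps if the main Agent checks it before routing. Below is a minimal validation sketch, assuming the five fields shown above. &lt;code&gt;FIELD_TYPES&lt;/code&gt; and &lt;code&gt;validate_output&lt;/code&gt; are hypothetical names, and a real project might prefer a JSON Schema library instead.&lt;/p&gt;

```python
# Hypothetical field table mirroring the schema above: field name -> accepted types.
FIELD_TYPES = {
    "passed": (bool,),
    "confidence": (int, float),
    "root_cause": (str, type(None)),
    "evidence": (list,),
}


def validate_output(data: dict) -> list[str]:
    """Return contract violations; an empty list means the output is usable."""
    problems = []
    for name, types in FIELD_TYPES.items():
        if name not in data:
            problems.append(f"missing field: {name}")
            continue
        value = data[name]
        # bool is a subclass of int, so reject it wherever bool is not expected.
        if not isinstance(value, types) or (isinstance(value, bool) and bool not in types):
            problems.append(f"wrong type for {name}: {type(value).__name__}")
    confidence = data.get("confidence")
    if isinstance(confidence, (int, float)) and not isinstance(confidence, bool):
        if confidence > 1 or 0 > confidence:
            problems.append(f"confidence out of range 0-1: {confidence}")
    evidence = data.get("evidence")
    if isinstance(evidence, list) and not all(isinstance(item, str) for item in evidence):
        problems.append("evidence must contain only strings")
    if data.get("passed") is False and not isinstance(data.get("error"), str):
        problems.append("error is required when passed is false")
    return problems
```

&lt;p&gt;Returning a list of problems rather than raising on the first one lets the main Agent log every violation in a single pass and treat the output as a structured failure.&lt;/p&gt;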



&lt;h3&gt;
  
  
  Principle 3: Structured Error Output on Failure
&lt;/h3&gt;

&lt;p&gt;On failure, subagents must still write an output file with &lt;code&gt;passed=false&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"passed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Log file not found: /workspace/logs/crash_20260601.log"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"root_cause"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"evidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A missing output file looks like a timeout to the main Agent. Structured error output lets the main Agent distinguish "subagent failed" from "subagent timed out" and respond differently.&lt;/p&gt;
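&lt;p&gt;A sketch of both sides of this rule, under stated assumptions: &lt;code&gt;run_subagent_task&lt;/code&gt; and &lt;code&gt;classify_outcome&lt;/code&gt; are hypothetical helpers, and &lt;code&gt;task&lt;/code&gt; stands for whatever callable performs the subagent's real work and returns a dict of result fields.&lt;/p&gt;

```python
import json
from pathlib import Path


def run_subagent_task(task, output_file: Path) -> None:
    """Run task() and always leave a contract-shaped JSON file behind."""
    try:
        result = task()
        payload = {"passed": True, "error": None, **result}
    except Exception as exc:  # every failure is reported through the file
        payload = {
            "passed": False,
            "error": str(exc),
            "confidence": 0,
            "root_cause": None,
            "evidence": [],
        }
    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.write_text(json.dumps(payload, indent=2))


def classify_outcome(output_file: Path) -> str:
    """Main Agent side: tell a reported failure apart from a missing file."""
    if not output_file.exists():
        return "timed_out"
    data = json.loads(output_file.read_text())
    return "succeeded" if data.get("passed") else "failed"
```

&lt;p&gt;With this split, the main Agent can retry a &lt;code&gt;timed_out&lt;/code&gt; subagent while routing a &lt;code&gt;failed&lt;/code&gt; one by its &lt;code&gt;error&lt;/code&gt; message.&lt;/p&gt;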




&lt;h2&gt;
  
  
  Fan-out / Fan-in Concurrency Control
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Fan-out Design
&lt;/h3&gt;

&lt;p&gt;Fan-out means one trigger point spawns N concurrent subagents. Two hard constraints:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constraint 1: Each subagent writes a different output file&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ Correct: each candidate writes to its own file
&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;candidate_a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;candidate_b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;candidate_c&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;spawn_subagent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;build_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bug_info&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phase4/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# unique filename
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ❌ Wrong: all candidates write to the same file (concurrent write conflict)
&lt;/span&gt;&lt;span class="nf"&gt;spawn_subagent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phase4/result.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;spawn_subagent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phase4/result.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# conflict!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Constraint 2: The main Agent waits for all subagents to complete&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After fan-out, the main Agent enters a waiting state and doesn't proceed. When no async runtime is available, polling works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wait_all_candidates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;deadline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;deadline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;output_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phase4/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
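                    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
                    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="k"&gt;pass&lt;/span&gt;  &lt;span class="c1"&gt;# file is still being written; retry on the next poll&lt;/span&gt;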
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
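&lt;p&gt;Polling on file existence can observe a file that is only half written. One common mitigation, sketched here as an assumption rather than part of the workflow described above, is for the subagent to write to a temporary file and rename it into place. &lt;code&gt;write_output_atomically&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
import json
import os
import tempfile
from pathlib import Path


def write_output_atomically(output_file: Path, payload: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename it into place."""
    output_file.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(output_file.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
        # os.replace swaps the file in one step when both paths share a filesystem.
        os.replace(tmp_name, output_file)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

&lt;p&gt;Because the final path appears only after the content is complete, a poller that checks for &lt;code&gt;phase4/candidate_a.json&lt;/code&gt; never sees the intermediate &lt;code&gt;.tmp&lt;/code&gt; file.&lt;/p&gt;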



&lt;h3&gt;
  
  
  Fan-in: Failure Strategy
&lt;/h3&gt;

&lt;p&gt;When some subagents fail, the fan-in step follows one of two strategies:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;fail-fast (abort on any failure)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# For: all branch results are required; one failure makes the whole batch meaningless&lt;/span&gt;
&lt;span class="na"&gt;phase_parallel_analysis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;fan_in_strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fail-fast&lt;/span&gt;
  &lt;span class="na"&gt;on_any_failure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;trigger_gate_A&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; Three subagents each retrieve data from different sources. Missing any one source makes further analysis impossible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;collect-all (aggregate everything, including failures)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# For: partial success is enough; select the best from passing results&lt;/span&gt;
&lt;span class="na"&gt;phase_4_fix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;fan_in_strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;collect-all&lt;/span&gt;
  &lt;span class="na"&gt;selection_criteria&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;require_any_passed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;select_by&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;max_test_coverage&lt;/span&gt;
  &lt;span class="na"&gt;on_all_failed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;trigger_gate_B&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; Three code-fix candidates run concurrently. One passing tests is sufficient. Failed candidates are discarded. Only if all three fail does the human gate trigger.&lt;/p&gt;

&lt;h3&gt;
  
  
  Selection Principle
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;All branch results are required          → fail-fast
Partial success is sufficient            → collect-all (code fix, candidate generation)
Comparing multiple results for quality   → collect-all (select best)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Phase 4 fix in a bug-fix workflow uses collect-all: three candidates run concurrently, the passing candidate with the highest test coverage is selected, and the human gate triggers only when all three fail.&lt;/p&gt;
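&lt;p&gt;The two strategies can be sketched as one function. This is an illustration under assumptions: &lt;code&gt;fan_in&lt;/code&gt; is a hypothetical helper, the returned action names reuse the gate labels from the YAML above, and &lt;code&gt;test_coverage&lt;/code&gt; is an assumed field in each candidate's output.&lt;/p&gt;

```python
def fan_in(results: dict, expected: list[str], strategy: str) -> dict:
    """Apply a fan-in strategy to collected subagent outputs.

    results maps candidate name -> parsed output dict. A candidate missing
    from results is treated as failed (for example, it timed out).
    """
    passed = {
        name: results[name]
        for name in expected
        if results.get(name, {}).get("passed")
    }
    if strategy == "fail-fast":
        failed = [name for name in expected if name not in passed]
        if failed:
            return {"action": "trigger_gate_A", "failed": failed}
        return {"action": "proceed", "results": passed}
    if strategy == "collect-all":
        if not passed:
            return {"action": "trigger_gate_B"}
        # select_by: max_test_coverage, considering passing candidates only
        best = max(passed, key=lambda name: passed[name].get("test_coverage", 0))
        return {"action": "proceed", "selected": best}
    raise ValueError(f"unknown fan-in strategy: {strategy}")
```

&lt;p&gt;Given the same three outputs with one failure, &lt;code&gt;fail-fast&lt;/code&gt; escalates to the gate while &lt;code&gt;collect-all&lt;/code&gt; proceeds with the best passing candidate.&lt;/p&gt;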




&lt;h2&gt;
  
  
  Context Isolation
&lt;/h2&gt;

&lt;p&gt;Subagents must run in &lt;strong&gt;isolated sessions&lt;/strong&gt; with no access to the main Agent's conversation history.&lt;/p&gt;

&lt;p&gt;The main Agent's context holds the full workflow history: all file contents, all subagent raw outputs, all intermediate decisions. Passing this to a subagent that only writes one patch balloons its context from a few thousand tokens to tens of thousands, multiplies token cost accordingly, and lets irrelevant history dilute the subagent's focus.&lt;/p&gt;

&lt;p&gt;Information flows in two directions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Main Agent
  │
  │ task prompt (only the fields the subagent needs)
  ▼
Subagent (isolated session, no history)
  │
  │ output_file (JSON at agreed path)
  ▼
Main Agent (reads file, not conversation history)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The subagent knows what's in its task prompt and the agreed output path. It doesn't know what the main Agent did, how far the workflow has progressed, or what other subagents produced.&lt;/p&gt;

&lt;p&gt;If a subagent needs to "understand the background" to complete its task, the task prompt is incomplete. Put that background in explicitly — don't count on the subagent accessing history it can't see.&lt;/p&gt;
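&lt;p&gt;In code, isolation can be as simple as copying a whitelist of fields out of the workflow state before building the prompt. A minimal sketch, assuming the state is a plain dict; &lt;code&gt;build_subagent_context&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
def build_subagent_context(state: dict, needed_fields: list[str]) -> dict:
    """Copy only the whitelisted fields out of the workflow state."""
    missing = [name for name in needed_fields if name not in state]
    if missing:
        raise KeyError(f"state lacks fields the subagent needs: {missing}")
    # Everything not listed (history, raw outputs, other results) stays behind.
    return {name: state[name] for name in needed_fields}
```

&lt;p&gt;The whitelist doubles as documentation: it states exactly which background a given subagent depends on.&lt;/p&gt;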




&lt;h2&gt;
  
  
  Design Checklist
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Orchestrator responsibilities&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Main Agent reads only structured JSON output — no raw logs or long text&lt;/li&gt;
&lt;li&gt;[ ] Main Agent doesn't execute business logic (analysis, writing code, queries)&lt;/li&gt;
&lt;li&gt;[ ] Routing decisions depend on the state file and subagent output, not conversation history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Subagent design&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Task prompt contains all fields the subagent needs (no implicit context dependencies)&lt;/li&gt;
&lt;li&gt;[ ] Output schema is declared in &lt;code&gt;templates/&lt;/code&gt; and includes a &lt;code&gt;passed&lt;/code&gt; field&lt;/li&gt;
&lt;li&gt;[ ] On failure, the subagent still writes &lt;code&gt;{"passed": false, "error": "..."}&lt;/code&gt; to the output file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Concurrency control&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Each subagent in a fan-out writes to a unique output file path&lt;/li&gt;
&lt;li&gt;[ ] Fan-in strategy is explicitly labeled fail-fast or collect-all&lt;/li&gt;
&lt;li&gt;[ ] collect-all has defined &lt;code&gt;selection_criteria&lt;/code&gt; and &lt;code&gt;on_all_failed&lt;/code&gt; behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Context isolation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Subagents run in isolated sessions with no access to the main Agent's history&lt;/li&gt;
&lt;li&gt;[ ] All background information the subagent needs is explicitly in the task prompt&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Orchestrator only decides and dispatches&lt;/strong&gt;: reads JSON conclusions, spawns subagents, collects JSON conclusions — keeping context manageable is the core objective, not "being smart"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fan-in strategy determines workflow resilience&lt;/strong&gt;: solution-space problems (code fix) use collect-all; all-or-nothing problems (data collection) use fail-fast — getting this wrong either blocks on a single failure or wastes time on impossible tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context isolation is quality assurance&lt;/strong&gt;: extra context is noise, not help; if a subagent needs background to do its job, that background belongs explicitly in the task prompt&lt;/li&gt;
&lt;/ol&gt;





</description>
      <category>ai</category>
      <category>workflow</category>
      <category>multiagent</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Open Source Project of the Day (#112): obsidian-skills — Official Agent Skills by Obsidian's CEO, So Agents Stop Breaking Your Vault</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Thu, 02 Jul 2026 02:14:13 +0000</pubDate>
      <link>https://dev.to/wonderlab/open-source-project-of-the-day-112-obsidian-skills-official-agent-skills-by-obsidians-ceo-451i</link>
      <guid>https://dev.to/wonderlab/open-source-project-of-the-day-112-obsidian-skills-official-agent-skills-by-obsidians-ceo-451i</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"Most agents treat Obsidian vaults as plain folders and don't understand Obsidian's syntax. They produce notes that either corrupt existing formatting or fail to render correctly in Obsidian."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is article &lt;strong&gt;#112&lt;/strong&gt; in the &lt;em&gt;Open Source Project of the Day&lt;/em&gt; series. Today's project is &lt;strong&gt;obsidian-skills&lt;/strong&gt;, the official AI Agent skill collection written by Obsidian CEO Steph Ango (kepano).&lt;/p&gt;

&lt;p&gt;You ask Claude Code to create some notes in your Obsidian vault. You open Obsidian and find: wikilinks written as &lt;code&gt;[Note Name](note-name.md)&lt;/code&gt; instead of &lt;code&gt;[[Note Name]]&lt;/code&gt;, broken embed syntax, incorrectly formatted callouts, YAML frontmatter placed inside the body of the file instead of at the top.&lt;/p&gt;

&lt;p&gt;This is a real friction point. AI agents don't know Obsidian's syntax extensions, default to standard Markdown handling, and produce format-corrupted files as a result. Obsidian's internal link-tracking system depends on &lt;code&gt;[[wikilink]]&lt;/code&gt; format — use standard Markdown links and the note falls out of the link graph entirely.&lt;/p&gt;

&lt;p&gt;obsidian-skills is the official answer. Written by Obsidian's CEO, released January 2026, 39.3k Stars — that number signals how strongly Obsidian power users want AI agents that actually understand their vault.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You'll Learn
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;What typical errors AI agents make when operating on Obsidian vaults&lt;/li&gt;
&lt;li&gt;What each of the 5 skills teaches agents: obsidian-markdown / obsidian-bases / json-canvas / obsidian-cli / defuddle&lt;/li&gt;
&lt;li&gt;The detailed content of the obsidian-markdown SKILL.md: complete spec for wikilinks, embeds, callouts, and Properties&lt;/li&gt;
&lt;li&gt;Installation for Claude Code, OpenCode, and Codex&lt;/li&gt;
&lt;li&gt;defuddle: extracting clean Markdown from web pages to save tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Regular Obsidian user (familiar with wikilinks, callouts, and basic concepts)&lt;/li&gt;
&lt;li&gt;Experience with Claude Code or a similar AI coding tool&lt;/li&gt;
&lt;li&gt;Interested in having AI help manage an Obsidian knowledge base&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Project Background
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Is obsidian-skills?
&lt;/h3&gt;

&lt;p&gt;obsidian-skills is a set of Agent Skills for Obsidian, following the &lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;agentskills.io&lt;/a&gt; open standard. It allows Claude Code, Codex, OpenCode, and other skills-compatible agents to correctly handle Obsidian-specific file formats.&lt;/p&gt;

&lt;p&gt;"Correctly handle" is the operative phrase. Obsidian adds substantial syntax extensions on top of standard CommonMark Markdown — wikilinks, embeds, callouts, Properties, inline tags. These appear rarely in training data. Agents default to standard Markdown handling and produce format-corrupted files as a result.&lt;/p&gt;

&lt;h3&gt;
  
  
  Author
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Author&lt;/strong&gt;: Steph Ango (GitHub: kepano) — Obsidian CEO, creator of the Minimal theme&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why the author matters&lt;/strong&gt;: This is the spec written by the person who built the format, not a third-party approximation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: MIT&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Released&lt;/strong&gt;: January 2026&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Project Stats
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;⭐ GitHub Stars: &lt;strong&gt;39,300+&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🍴 Forks: 2,800+&lt;/li&gt;
&lt;li&gt;📄 License: MIT&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What AI Agents Get Wrong in Obsidian Vaults
&lt;/h2&gt;

&lt;p&gt;Before installing obsidian-skills, here's what agents typically do with an Obsidian vault:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Wikilinks become standard links&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# What agents write (wrong):&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Related Note&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;related-note.md&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="gh"&gt;# Correct Obsidian format:&lt;/span&gt;
[[Related Note]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Consequence: Obsidian's bidirectional link graph breaks. When you rename the note file, the link doesn't automatically update.&lt;/p&gt;
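&lt;p&gt;For notes an agent has already written this way, a small repair pass is possible. The sketch below is not part of obsidian-skills; &lt;code&gt;to_wikilinks&lt;/code&gt; is a hypothetical helper that assumes note names are unique in the vault, so the folder part of a link target can be dropped.&lt;/p&gt;

```python
import re
from urllib.parse import unquote

# [label](target.md); URLs are excluded because ":" is not allowed in the target.
MD_NOTE_LINK = re.compile(r"(!?)\[([^\]]+)\]\(([^)\s:]+)\.md\)")


def to_wikilinks(text: str) -> str:
    """Rewrite relative Markdown links to .md files as Obsidian wikilinks."""

    def replace(match: re.Match) -> str:
        bang, label, target = match.groups()
        if bang:
            return match.group(0)  # leave image-style syntax untouched
        name = unquote(target).rsplit("/", 1)[-1]  # file name without folder
        if name == label:
            return f"[[{name}]]"
        return f"[[{name}|{label}]]"

    return MD_NOTE_LINK.sub(replace, text)
```

&lt;p&gt;The link text is kept as display text, so &lt;code&gt;[Related Note](related-note.md)&lt;/code&gt; becomes &lt;code&gt;[[related-note|Related Note]]&lt;/code&gt; and still reads the same in the rendered note.&lt;/p&gt;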

&lt;p&gt;&lt;strong&gt;2. Embed syntax breaks&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# What agents write (wrong):&lt;/span&gt;
&lt;span class="p"&gt;![&lt;/span&gt;&lt;span class="nv"&gt;image&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;attachment.png&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="gh"&gt;# Correct Obsidian embed syntax:&lt;/span&gt;
![[attachment.png]]     ← embed a file
![[Note Name]]          ← embed another note inline
![[Note#Section]]       ← embed a specific section
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Callout format is wrong&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# What agents write (wrong):&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; **Note**: This is an important tip&lt;/span&gt;

&lt;span class="gh"&gt;# Correct Obsidian callout:&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; [!note] Optional Title&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; Callout content goes here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Properties/frontmatter placement errors&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Properties must be the first block in the file, wrapped in triple dashes. Agents frequently insert YAML metadata mid-file, or format it incorrectly, which causes Obsidian to fail to parse the Properties.&lt;/p&gt;
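&lt;p&gt;The placement rule can be checked mechanically before an agent-written note lands in the vault. The sketch below is a minimal illustration, not part of obsidian-skills: the function name &lt;code&gt;has_leading_frontmatter&lt;/code&gt; is hypothetical, and it only verifies that a triple-dash block opens on the first line and is closed later. It does not validate the YAML itself.&lt;/p&gt;

```python
def has_leading_frontmatter(text: str) -> bool:
    """Return True if the note opens with a '---' delimited block.

    Hypothetical helper: checks placement only, not YAML validity.
    """
    lines = text.splitlines()
    # The opening delimiter must be the very first line of the file.
    if not lines or lines[0].rstrip() != "---":
        return False
    # A closing delimiter must appear on some later line.
    return any(line.rstrip() == "---" for line in lines[1:])
```

&lt;p&gt;A note that starts with body text and carries its YAML block further down fails this check, which is the mid-file mistake described above.&lt;/p&gt;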




&lt;h2&gt;
  
  
  All 5 Official Skills
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Skill 1: obsidian-markdown
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Teaches agents the complete Obsidian Flavored Markdown specification.&lt;/p&gt;

&lt;p&gt;The SKILL.md defines a 6-step workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Add frontmatter (Properties)
2. Write body content
3. Link related notes with wikilinks
4. Embed content with ![[...]]
5. Add callouts for important information
6. Verify formatting (ensure correct rendering)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Complete syntax coverage:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Internal Links (Wikilinks):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;[[Note Name]]                   ← basic wikilink
[[Note Name|Display Text]]      ← custom display text
[[Note Name#Heading]]           ← link to specific heading
[[Note Name#^block-id]]        ← link to specific block
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Embeds:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;![[Note Name]]                  ← embed full note
![[image.png]]                  ← embed image
![[document.pdf#page=3]]        ← embed specific PDF page
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Callouts:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gt"&gt;&amp;gt; [!note] Optional Title&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; Callout content&lt;/span&gt;
&lt;span class="gt"&gt;
&amp;gt; [!warning]- Collapsed callout&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; This callout is collapsed by default&lt;/span&gt;
&lt;span class="gt"&gt;
&amp;gt; [!tip]+ Expanded callout&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; This callout is expanded by default&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Supported types include: &lt;code&gt;note&lt;/code&gt;, &lt;code&gt;warning&lt;/code&gt;, &lt;code&gt;tip&lt;/code&gt;, &lt;code&gt;info&lt;/code&gt;, &lt;code&gt;success&lt;/code&gt;, &lt;code&gt;question&lt;/code&gt;, &lt;code&gt;failure&lt;/code&gt;, &lt;code&gt;bug&lt;/code&gt;, &lt;code&gt;quote&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Properties (YAML frontmatter):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;tag1&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;tag2&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;aliases&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;alias1&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;alias2&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;cssclasses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;custom-class&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;created&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2026-07-02&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Other Obsidian extensions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inline tags: &lt;code&gt;#tag&lt;/code&gt;, &lt;code&gt;#nested/tag&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Hidden comments: &lt;code&gt;%%This is hidden in Preview mode%%&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Highlight: &lt;code&gt;==highlighted text==&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;LaTeX math: inline &lt;code&gt;$formula$&lt;/code&gt;, block &lt;code&gt;$$formula$$&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Mermaid diagrams (with Obsidian note linking support)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key rule the skill enforces&lt;/strong&gt;: Use &lt;code&gt;[[wikilinks]]&lt;/code&gt; for vault-internal notes (Obsidian tracks renames automatically); use standard Markdown links only for external URLs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Skill 2: obsidian-bases
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Teaches agents to create and edit &lt;code&gt;.base&lt;/code&gt; files — Obsidian's native database view format.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;.base&lt;/code&gt; files let you query and display vault notes as a database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Coverage:
- Creating and editing views (table view, gallery view, etc.)
- Setting filter conditions (by tags, dates, Properties fields)
- Defining formulas (spreadsheet-like calculations)
- Configuring summaries (aggregate statistics)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a relatively new Obsidian feature (native database views) with almost no coverage in agent training data. The skill is currently one of the most authoritative format specifications available.&lt;/p&gt;

&lt;h3&gt;
  
  
  Skill 3: json-canvas
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Teaches agents to create and edit &lt;code&gt;.canvas&lt;/code&gt; files — Obsidian's whiteboard format, also the &lt;a href="https://jsoncanvas.org" rel="noopener noreferrer"&gt;JSON Canvas&lt;/a&gt; open standard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"nodes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"node1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Idea A"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"width"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"height"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"node2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Related Note.md"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"width"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"height"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"edges"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"edge1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"fromNode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"node1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"toNode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"node2"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agents can create complex mind maps and knowledge visualizations programmatically rather than through GUI drag-and-drop.&lt;/p&gt;
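&lt;p&gt;To make "programmatically" concrete, here is a small sketch that builds the same two-node canvas shown above using only Python's standard library. The helper names (&lt;code&gt;text_node&lt;/code&gt;, &lt;code&gt;file_node&lt;/code&gt;, &lt;code&gt;edge&lt;/code&gt;, &lt;code&gt;build_canvas&lt;/code&gt;) are hypothetical and are not taken from the skill; the field names mirror the JSON example above.&lt;/p&gt;

```python
import json


def text_node(node_id, text, x, y, width=200, height=100):
    """Build a text node using the fields from the example above."""
    return {"id": node_id, "type": "text", "text": text,
            "x": x, "y": y, "width": width, "height": height}


def file_node(node_id, file, x, y, width=200, height=100):
    """Build a node that points at a file in the vault."""
    return {"id": node_id, "type": "file", "file": file,
            "x": x, "y": y, "width": width, "height": height}


def edge(edge_id, from_node, to_node):
    """Build an edge connecting two node ids."""
    return {"id": edge_id, "fromNode": from_node, "toNode": to_node}


def build_canvas(nodes, edges):
    """Serialize a canvas, rejecting edges that reference unknown nodes."""
    node_ids = {node["id"] for node in nodes}
    for item in edges:
        if item["fromNode"] not in node_ids or item["toNode"] not in node_ids:
            raise ValueError(f"edge {item['id']} references a missing node")
    return json.dumps({"nodes": nodes, "edges": edges}, indent=2)
```

&lt;p&gt;Writing the returned string to a file ending in &lt;code&gt;.canvas&lt;/code&gt; inside the vault should give Obsidian something it can open. The dangling-edge check is a design choice of this sketch, added because a generated graph is easy to get subtly wrong.&lt;/p&gt;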

&lt;h3&gt;
  
  
  Skill 4: obsidian-cli
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Teaches agents to interact with Obsidian vaults via command line and develop plugins and themes.&lt;/p&gt;

&lt;p&gt;Covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Batch file operations in the vault&lt;/li&gt;
&lt;li&gt;Obsidian URI protocol for triggering operations&lt;/li&gt;
&lt;li&gt;Plugin development directory structure and API conventions&lt;/li&gt;
&lt;li&gt;Theme development CSS variables and specifications&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Skill 5: defuddle
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Extract clean Markdown content from web pages, removing ads, navigation, and sidebars — specifically designed to save tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input: A URL or raw HTML (contains ads, nav menus, sidebars)
Output: Clean Markdown body content

Use cases:
  "Save this article to my vault"
  → Agent uses defuddle to extract clean content
  → Generates a correctly formatted Obsidian note
  → Doesn't stuff the full noisy HTML into context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Defuddle is also an independent open-source project (&lt;a href="https://github.com/kepano/defuddle" rel="noopener noreferrer"&gt;kepano/defuddle&lt;/a&gt;). The skill in obsidian-skills teaches agents how to invoke it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why 39.3k Stars for Five Markdown Files
&lt;/h3&gt;

&lt;p&gt;That number for what is essentially five SKILL.md files signals several things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Obsidian's user demographics&lt;/strong&gt;: Obsidian's core users are enthusiastic about knowledge management and tool integration, and among the earliest adopters of AI agent workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A real pain point&lt;/strong&gt;: AI agents corrupting Obsidian vault formatting had been a community frustration for a long time. The official answer spread quickly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Author credibility&lt;/strong&gt;: The skills are written by Obsidian's CEO, so the format specifications are authoritative rather than community approximations. Every nuance is intentional.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Claude Code (recommended):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Place the repository contents in a &lt;code&gt;/.claude&lt;/code&gt; folder at the vault root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Navigate to your Obsidian vault root&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; /path/to/your/vault

&lt;span class="c"&gt;# Clone into .claude directory&lt;/span&gt;
git clone https://github.com/kepano/obsidian-skills .claude/skills/obsidian-skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or with npx:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add https://github.com/kepano/obsidian-skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;OpenCode:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/kepano/obsidian-skills.git ~/.opencode/skills/obsidian-skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Important: clone the &lt;strong&gt;full repository&lt;/strong&gt;, not just the inner &lt;code&gt;skills/&lt;/code&gt; directory. The path needs to resolve as &lt;code&gt;~/.opencode/skills/obsidian-skills/skills/&amp;lt;skill-name&amp;gt;/SKILL.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Marketplace (in supported agents):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add kepano/obsidian-skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Skill Auto-Activation
&lt;/h3&gt;

&lt;p&gt;After installation, agents don't need to be told which skill to use. When the task involves Obsidian file operations, the relevant skill loads automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You say: "Create today's daily note in vault/daily-notes/,
         with today's todos and meeting notes, linking to relevant project notes"

Agent detects Obsidian file operation
    ↓
Automatically loads obsidian-markdown SKILL.md
    ↓
Generates according to skill spec:
  - YYYY-MM-DD.md filename
  - Properties frontmatter with tags and date fields
  - [[Project Note Name]] wikilink format
  - Correctly formatted callouts for important items
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Relationship to the Agent Skills Ecosystem
&lt;/h3&gt;

&lt;p&gt;obsidian-skills follows the &lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;agentskills.io&lt;/a&gt; open standard — the same format used by android/skills (Google's official Android dev skills), agent-skills (Addy Osmani's engineering discipline skills), and others.&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The same skill files work across Claude Code, Codex, and OpenCode without modification&lt;/li&gt;
&lt;li&gt;Any new tool supporting the Agent Skills standard works immediately&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Real Use Cases
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use case 1: Research archiving&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Save the content of this URL to vault/research/,
 extract key points, mark them with callouts,
 and link to related existing notes"

→ Agent uses defuddle to extract clean Markdown
→ Uses obsidian-markdown spec to generate correctly formatted note
→ Automatically creates wikilinks to related notes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use case 2: Knowledge visualization&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Scan my vault notes on machine learning,
 create a .canvas mind map visualizing their relationships"

→ Agent uses obsidian-cli to read vault content
→ Uses json-canvas spec to generate .canvas file
→ Nodes and edges in correct format
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use case 3: Format migration&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Convert all [Note Name](note.md) style links in my vault
 to Obsidian [[Note Name]] wikilink format"

→ Agent knows the correct wikilink syntax
→ Batch modification won't damage other link types
→ External URLs remain untouched
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
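&lt;p&gt;For a sense of what such a migration involves, here is a rough sketch of the conversion as a single regex pass. It is an illustration under stated assumptions, not the skill's implementation: the function name &lt;code&gt;to_wikilinks&lt;/code&gt; is hypothetical, the wikilink target is taken from the file name (with the original link text kept as display text when it differs), and links inside code blocks are not treated specially.&lt;/p&gt;

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote

# Optional "!" (image/embed marker), then [text](target).
LINK = re.compile(r"(!?)\[([^\]]+)\]\(([^)\s]+)\)")


def to_wikilinks(markdown: str) -> str:
    """Convert [Text](note.md) links to wikilinks; leave everything else alone."""
    def convert(match):
        bang, text, target = match.groups()
        # Skip image embeds, external URLs, and anything that is not a note.
        if bang or "://" in target or not target.endswith(".md"):
            return match.group(0)
        # Assumption: the wikilink target is the note's file name without ".md".
        name = PurePosixPath(unquote(target)).stem
        return f"[[{name}]]" if name == text else f"[[{name}|{text}]]"

    return LINK.sub(convert, markdown)
```

&lt;p&gt;With this sketch, &lt;code&gt;[Related Note](Related%20Note.md)&lt;/code&gt; becomes &lt;code&gt;[[Related Note]]&lt;/code&gt;, while external URLs and image links pass through unchanged.&lt;/p&gt;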






&lt;h2&gt;
  
  
  Links and Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🌟 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/kepano/obsidian-skills" rel="noopener noreferrer"&gt;kepano/obsidian-skills&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🌐 &lt;strong&gt;Obsidian&lt;/strong&gt;: &lt;a href="https://obsidian.md" rel="noopener noreferrer"&gt;obsidian.md&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🔧 &lt;strong&gt;Defuddle&lt;/strong&gt;: &lt;a href="https://github.com/kepano/defuddle" rel="noopener noreferrer"&gt;kepano/defuddle&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 &lt;strong&gt;JSON Canvas standard&lt;/strong&gt;: &lt;a href="https://jsoncanvas.org" rel="noopener noreferrer"&gt;jsoncanvas.org&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🌐 &lt;strong&gt;Agent Skills standard&lt;/strong&gt;: &lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;agentskills.io&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;obsidian-skills has value at two levels.&lt;/p&gt;

&lt;p&gt;At the surface: it fixes the specific problem of AI agents corrupting Obsidian formatting. Five skill files teach agents to use wikilinks, embeds, callouts, and Properties correctly. Your vault stops getting polluted by format-broken files written by AI.&lt;/p&gt;

&lt;p&gt;At a deeper level: this is a signal from Obsidian's CEO that tools like Obsidian are beginning to treat AI agents as first-class citizens. Personal knowledge bases are evolving into infrastructure that agents can read and write, not just places you type into.&lt;/p&gt;

&lt;p&gt;For heavy Obsidian users, this skill set is close to essential. It transforms Claude Code from "a tool that might break your vault formatting" into "an assistant that can genuinely help manage your second brain."&lt;/p&gt;





</description>
      <category>opensource</category>
      <category>obsidian</category>
      <category>agentskills</category>
      <category>claude</category>
    </item>
    <item>
      <title>Workflow Series (03): State Management — Persistence, Idempotency, and Version Binding</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Wed, 01 Jul 2026 02:27:59 +0000</pubDate>
      <link>https://dev.to/wonderlab/workflow-series-10-enterprise-architecture-registry-composition-and-governance-5cfp</link>
      <guid>https://dev.to/wonderlab/workflow-series-10-enterprise-architecture-registry-composition-and-governance-5cfp</guid>
      <description>&lt;h2&gt;
  
  
  Why State Management Is the Core Problem
&lt;/h2&gt;

&lt;p&gt;An Agent Workflow crashes at Phase 5. After restart, does it begin from the top or pick up again at Phase 5?&lt;/p&gt;

&lt;p&gt;Without state persistence, it starts over: every LLM call, tool execution, and human approval is discarded.&lt;/p&gt;

&lt;p&gt;State management means the workflow resumes from the last checkpoint regardless of when interruption occurs. It's also the mechanism behind human approval gates — a triggered gate pauses the workflow at a specific state, and the human response resumes from exactly that state.&lt;/p&gt;




&lt;h2&gt;
  
  
  Durable Execution Pattern
&lt;/h2&gt;

&lt;p&gt;Serialize execution as recoverable checkpoints. Any interruption resumes from the most recent one with results identical to uninterrupted execution. Temporal.io implements this at the code layer, but the same semantics work with a JSON file.&lt;/p&gt;

&lt;h3&gt;
  
  
  State File Structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"workflow_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wf-bug-e2e-AE-33995-20260601"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"workflow_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.3.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jira_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AE-33995"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"started_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-06-01T10:00:00+08:00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"phase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"phase_4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"phases"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"phase_1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"done"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"completed_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-06-01T10:02:30+08:00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"output_file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bug_info.json"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"phase_4"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"in_progress"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"step"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"step_4_1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"steps"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"step_4_1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"done"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"output_file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"candidate_a.json"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"step_4_2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"in_progress"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"step_4_3"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pending"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Resume Protocol
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;resume_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;phase_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;phase_data&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phases&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;phase_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;  &lt;span class="c1"&gt;# skip completed phases
&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;phase_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;in_progress&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="c1"&gt;# in_progress treated same as pending — re-execute
&lt;/span&gt;            &lt;span class="c1"&gt;# (idempotency guarantees this is safe)
&lt;/span&gt;            &lt;span class="nf"&gt;execute_phase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phase_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key principle: &lt;strong&gt;trust only the state file, not memory&lt;/strong&gt;. The main Agent doesn't remember what it did — it reads the &lt;code&gt;status&lt;/code&gt; field. Phases marked &lt;code&gt;in_progress&lt;/code&gt; get re-executed, which requires every phase operation to be idempotent.&lt;/p&gt;
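&lt;p&gt;What idempotent means for a phase is easiest to see in code. The sketch below shows one possible approach, assuming each phase writes its result to a known output file: if the file already exists, the phase reuses it instead of redoing the work, and new output is written through a temporary file so a crash never leaves a half-written result behind. The names &lt;code&gt;write_json_atomically&lt;/code&gt; and &lt;code&gt;run_phase_idempotently&lt;/code&gt; are hypothetical and not part of the workflow described here.&lt;/p&gt;

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(path: Path, payload: dict) -> None:
    """Write payload so readers never observe a partially written file."""
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(payload, tmp, indent=2)
    os.replace(tmp_name, path)  # replaces the target in a single step


def run_phase_idempotently(output_file: Path, compute) -> dict:
    """Hypothetical wrapper: re-running a finished phase returns the same result."""
    if output_file.exists():
        # Output from an earlier run survived the crash, so reuse it.
        return json.loads(output_file.read_text(encoding="utf-8"))
    result = compute()
    write_json_atomically(output_file, result)
    return result
```

&lt;p&gt;Calling &lt;code&gt;run_phase_idempotently&lt;/code&gt; twice with the same output path runs &lt;code&gt;compute&lt;/code&gt; only once, which is what makes re-executing an &lt;code&gt;in_progress&lt;/code&gt; phase safe in this sketch. Phases with external side effects (posting a comment, sending a message) need their own deduplication on top of this.&lt;/p&gt;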

&lt;h3&gt;
  
  
  Double-Ended Writes
&lt;/h3&gt;

&lt;p&gt;Write to the state file both &lt;strong&gt;before&lt;/strong&gt; a phase starts and &lt;strong&gt;after&lt;/strong&gt; it completes — not only on completion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_phase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phase_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Before start: mark in_progress
&lt;/span&gt;    &lt;span class="c1"&gt;# (if crash occurs, resume finds this phase and re-executes it)
&lt;/span&gt;    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phases&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;phase_id&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;in_progress&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nf"&gt;write_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_phase_logic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phase_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# After completion: mark done, record output file path
&lt;/span&gt;        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phases&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;phase_id&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phases&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;phase_id&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_file&lt;/span&gt;
        &lt;span class="nf"&gt;write_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phases&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;phase_id&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phases&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;phase_id&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;write_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
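&lt;p&gt;Double-ended writes only help if the state file itself survives a crash in the middle of a write. The sketch below shows one way &lt;code&gt;write_state&lt;/code&gt; could be made atomic: write to a temporary file in the same directory, flush it to disk, then swap it into place with &lt;code&gt;os.replace&lt;/code&gt;. The helper's real implementation is not shown in this article, so the body, the &lt;code&gt;read_state&lt;/code&gt; companion, and the default path &lt;code&gt;workflow_state.json&lt;/code&gt; are all assumptions for illustration.&lt;/p&gt;

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical default location; the article does not name the state file.
DEFAULT_STATE_FILE = Path("workflow_state.json")


def write_state(state: dict, state_file: Path = DEFAULT_STATE_FILE) -> None:
    """Persist state so a reader sees the old file or the new one, never a partial write."""
    state_file = Path(state_file)
    state_file.parent.mkdir(parents=True, exist_ok=True)

    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=state_file.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the bytes to disk before the rename
        # Atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_name, state_file)
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def read_state(state_file: Path = DEFAULT_STATE_FILE) -> dict:
    """Load the last fully written state."""
    return json.loads(Path(state_file).read_text(encoding="utf-8"))
```

&lt;p&gt;With a plain overwrite, a crash halfway through the write can leave truncated JSON behind, and the resume protocol then has nothing trustworthy to read. The temp-file-and-rename pattern avoids that at the cost of one extra file operation per state update.&lt;/p&gt;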






&lt;h2&gt;
  
  
  Idempotency Design
&lt;/h2&gt;

&lt;p&gt;The resume protocol re-executes &lt;code&gt;in_progress&lt;/code&gt; phases, meaning a phase can run twice. Operations that aren't idempotent produce duplicate side effects: two Jira comments, two git commits, two notification emails.&lt;/p&gt;

&lt;h3&gt;
  
  
  Idempotency Analysis by Operation Type
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;File writes (naturally idempotent)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Overwrite is idempotent — running twice produces the same result
&lt;/span&gt;&lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# ✅
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Jira comments (not idempotent — requires detection)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Wrong: direct write produces a duplicate comment on re-run
&lt;/span&gt;&lt;span class="n"&gt;jira&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_comment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;issue_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;comment_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Correct: check for existing comment with this run's ID first
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_comment_idempotent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;issue_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;comment_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jira&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_comments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;issue_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;marker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[run_id:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# unique marker per workflow run
&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;marker&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;  &lt;span class="c1"&gt;# already written — skip
&lt;/span&gt;
    &lt;span class="n"&gt;jira&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_comment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;issue_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;marker&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;comment_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Git commits (not idempotent — requires detection)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Wrong: direct commit creates a second commit on re-run
&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Correct: check if commit result file exists and passed=true
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;commit_idempotent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;passed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;  &lt;span class="c1"&gt;# already committed successfully
&lt;/span&gt;
    &lt;span class="n"&gt;commit_sha&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;passed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sha&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;commit_sha&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;External API triggers (conditionally idempotent)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Adding a Gerrit reviewer: duplicate adds don't error — naturally idempotent ✅
&lt;/span&gt;&lt;span class="n"&gt;gerrit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_reviewer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;change_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Creating a cron job: duplicate creates produce two jobs ❌
# Fix: list first, create only if not already present
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_cron_idempotent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;existing_jobs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_jobs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;job_config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;existing_jobs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;  &lt;span class="c1"&gt;# already exists — skip
&lt;/span&gt;    &lt;span class="n"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Idempotency Self-Check
&lt;/h3&gt;

&lt;p&gt;For every new Step, answer these three questions before implementing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;□ If this step runs twice, does it produce side effects?
□ If yes, how do you detect "already executed" and skip?
□ Is the detection logic itself idempotent?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The third question is easy to miss. If detection depends on in-memory state or has side effects of its own, it fails in the resume scenario just like the original operation.&lt;/p&gt;
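&lt;p&gt;As a rough illustration of detection that passes the third question, the sketch below wraps any side-effecting action in a check against a marker file on disk. The names &lt;code&gt;run_once&lt;/code&gt;, &lt;code&gt;marker_file&lt;/code&gt;, and &lt;code&gt;send_notification&lt;/code&gt; are hypothetical; the pattern is the same one &lt;code&gt;commit_idempotent&lt;/code&gt; uses, generalized.&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_once(marker_file: Path, action: Callable[[], dict]) -> dict:
    """Run action unless marker_file already records a successful result."""
    # Detection only reads durable state and writes nothing,
    # so repeating the check after a crash is harmless.
    if marker_file.exists():
        recorded = json.loads(marker_file.read_text(encoding="utf-8"))
        if recorded.get("passed"):
            return recorded  # already executed: skip the side effect

    result = {"passed": True, **action()}
    marker_file.write_text(json.dumps(result), encoding="utf-8")
    return result


# Usage: the second call is skipped because the marker survives on disk.
calls = []


def send_notification() -> dict:
    """Stand-in for a real side effect such as sending an email."""
    calls.append("sent")
    return {"message_id": len(calls)}


marker = Path(tempfile.mkdtemp()) / "notify.json"
first = run_once(marker, send_notification)
second = run_once(marker, send_notification)
```

&lt;p&gt;A marker file narrows the duplicate window but does not close it: a crash after the side effect and before the marker write still repeats the action on resume. Closing that gap requires the marker to live in the external system itself, as the &lt;code&gt;run_id&lt;/code&gt; tag does in the Jira example.&lt;/p&gt;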




&lt;h2&gt;
  
  
  State File Version Binding
&lt;/h2&gt;

&lt;p&gt;Modify a workflow definition mid-run — add a new Step, for example — and the old state file has no record of it. When the workflow resumes, the main Agent has no basis for handling the missing step.&lt;/p&gt;

&lt;p&gt;The fix: bind the workflow version in the state file and verify it on resume.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_or_resume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;saved_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workflow_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;saved_version&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;current_version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;WorkflowVersionMismatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;State file version: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;saved_version&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Current workflow version: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;current_version&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Options:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  1. Resume with saved state using old workflow (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;saved_version&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  2. Start fresh with new workflow (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;current_version&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  3. Manually migrate the state file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;  &lt;span class="c1"&gt;# versions match — resume normally
&lt;/span&gt;
    &lt;span class="c1"&gt;# New run: create state file
&lt;/span&gt;    &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workflow_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;current_version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;started_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phases&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nf"&gt;write_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Version Number Rules (MAJOR.MINOR.PATCH)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MAJOR: Phase structure changes (add/remove Phase, major routing changes)
        → Breaks in-progress runs; requires explicit handling
        → Cannot resume directly; user must decide

MINOR: Add Step, template improvements, new gate options
        → Backward compatible; in-progress runs complete with old version
        → New runs use new version

PATCH: Wording tweaks, config adjustments, behavior unchanged
        → Safe to upgrade; old state files resume without issue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
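&lt;p&gt;The strict &lt;code&gt;!=&lt;/code&gt; comparison in &lt;code&gt;start_or_resume&lt;/code&gt; treats every difference as a mismatch, while the rules above distinguish three cases. One possible way to encode them is sketched below. The names &lt;code&gt;ResumeDecision&lt;/code&gt;, &lt;code&gt;parse_version&lt;/code&gt;, and &lt;code&gt;decide_resume&lt;/code&gt; are hypothetical, and the sketch assumes versions are plain &lt;code&gt;MAJOR.MINOR.PATCH&lt;/code&gt; strings with no pre-release suffixes.&lt;/p&gt;

```python
from enum import Enum


class ResumeDecision(Enum):
    RESUME = "resume"                    # only PATCH differs (or nothing): continue
    FINISH_ON_SAVED = "finish_on_saved"  # MINOR differs: complete the run on the saved version
    ASK_USER = "ask_user"                # MAJOR differs or the version is unreadable


def parse_version(version: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three ints."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    return tuple(int(p) for p in parts)


def decide_resume(saved_version, current_version: str) -> ResumeDecision:
    """Map a saved/current version pair onto the MAJOR/MINOR/PATCH rules."""
    try:
        saved = parse_version(saved_version)
        current = parse_version(current_version)
    except (ValueError, AttributeError):
        # Missing or malformed version: never guess, hand the choice to the user.
        return ResumeDecision.ASK_USER

    if saved[0] != current[0]:
        return ResumeDecision.ASK_USER
    if saved[1] != current[1]:
        return ResumeDecision.FINISH_ON_SAVED
    return ResumeDecision.RESUME
```

&lt;p&gt;A caller would raise &lt;code&gt;WorkflowVersionMismatch&lt;/code&gt; only for &lt;code&gt;ASK_USER&lt;/code&gt;, load the saved workflow definition for &lt;code&gt;FINISH_ON_SAVED&lt;/code&gt;, and continue directly for &lt;code&gt;RESUME&lt;/code&gt;.&lt;/p&gt;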






&lt;h2&gt;
  
  
  Design Checklist
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;State persistence&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Every Phase/Step writes &lt;code&gt;in_progress&lt;/code&gt; before starting and &lt;code&gt;done&lt;/code&gt; after completing&lt;/li&gt;
&lt;li&gt;[ ] Resume protocol reads only the state file, not conversation history&lt;/li&gt;
&lt;li&gt;[ ] State file includes &lt;code&gt;workflow_version&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Idempotency&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] All external writes (Jira comments, git commits, API calls) have idempotency checks&lt;/li&gt;
&lt;li&gt;[ ] Detection relies on a durable marker (a &lt;code&gt;run_id&lt;/code&gt; tag in the external system, or an output file on disk)&lt;/li&gt;
&lt;li&gt;[ ] The detection logic itself produces no side effects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Version binding&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Version is verified on resume against the current workflow version&lt;/li&gt;
&lt;li&gt;[ ] MAJOR version changes have an explicit handling strategy&lt;/li&gt;
&lt;li&gt;[ ] Version mismatches surface user-actionable options, not just an error exit&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Durable Execution requires double-ended writes&lt;/strong&gt;: write &lt;code&gt;in_progress&lt;/code&gt; before the phase starts and &lt;code&gt;done&lt;/code&gt; after it completes, so that after a crash at any point the workflow resumes from the exact phase that was interrupted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resume requires idempotency&lt;/strong&gt;: &lt;code&gt;in_progress&lt;/code&gt; phases get re-executed, so every external write must be safe to run twice; file writes are naturally idempotent, Jira comments and git commits need explicit detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version binding prevents silent errors&lt;/strong&gt;: when a workflow is modified, a mismatch between the old state file and the new workflow version should surface actionable options — not silently apply new logic to old state&lt;/li&gt;
&lt;/ol&gt;





</description>
      <category>ai</category>
      <category>workflow</category>
      <category>engineering</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Open Source Project of the Day (#111): HyperGraphRAG — N-ary Relations via Hyperedges, the Third-Generation RAG Paradigm</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Wed, 01 Jul 2026 02:26:32 +0000</pubDate>
      <link>https://dev.to/wonderlab/open-source-project-of-the-day-111-hypergraphrag-n-ary-relations-via-hyperedges-the-3ndk</link>
      <guid>https://dev.to/wonderlab/open-source-project-of-the-day-111-hypergraphrag-n-ary-relations-via-hyperedges-the-3ndk</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"Every edge in a knowledge graph connects exactly two nodes — but real-world facts routinely involve three, four, or more entities simultaneously."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is article &lt;strong&gt;#111&lt;/strong&gt; in the &lt;em&gt;Open Source Project of the Day&lt;/em&gt; series. Today's project is &lt;strong&gt;HyperGraphRAG&lt;/strong&gt; — the official implementation of the NeurIPS 2025 paper "Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation."&lt;/p&gt;

&lt;p&gt;RAG technology has a clear evolutionary trajectory:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1st generation (Naive RAG)&lt;/strong&gt;: Chunk documents, retrieve by vector similarity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2nd generation (GraphRAG / LightRAG)&lt;/strong&gt;: Extract knowledge graphs, use graph structure for retrieval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3rd generation (HyperGraphRAG)&lt;/strong&gt;: Replace knowledge graphs with hypergraphs, represent N-ary relations via hyperedges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article addresses three questions: what the binary edges of a knowledge graph cannot express, how hyperedges remove that limitation, and what the change means for RAG performance in practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You'll Learn
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The fundamental difference between hypergraphs and knowledge graphs: why binary edges lose information when representing N-ary facts&lt;/li&gt;
&lt;li&gt;HyperGraphRAG's three-phase pipeline: knowledge hypergraph construction → retrieval → generation&lt;/li&gt;
&lt;li&gt;Benchmark results across medicine, agriculture, CS, and law&lt;/li&gt;
&lt;li&gt;Comparison with GraphRAG, LightRAG, and Naive RAG&lt;/li&gt;
&lt;li&gt;Implementation and quick start&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Familiarity with RAG (Retrieval-Augmented Generation) concepts&lt;/li&gt;
&lt;li&gt;Basic understanding of knowledge graphs (nodes, edges, triples)&lt;/li&gt;
&lt;li&gt;Python basics&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Project Background
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Is HyperGraphRAG?
&lt;/h3&gt;

&lt;p&gt;HyperGraphRAG is the first hypergraph-structured RAG system, published at NeurIPS 2025. It replaces knowledge graph binary edges (connecting exactly two nodes) with hyperedges (connecting any number of nodes simultaneously), natively representing multi-entity relationships in real-world facts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Author / Team
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;First author&lt;/strong&gt;: Haoran Luo (&lt;a href="mailto:haoran.luo@ieee.org"&gt;haoran.luo@ieee.org&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Published&lt;/strong&gt;: NeurIPS 2025 (&lt;em&gt;Advances in Neural Information Processing Systems&lt;/em&gt;, vol. 38, pp. 152206–152234)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;arXiv&lt;/strong&gt;: 2503.21322&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: MIT&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Project Stats
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;⭐ GitHub Stars: 415+&lt;/li&gt;
&lt;li&gt;🔬 Published at: NeurIPS 2025&lt;/li&gt;
&lt;li&gt;📄 License: MIT&lt;/li&gt;
&lt;li&gt;🐍 Language: Python 100%&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Core Concept: Hypergraph vs. Knowledge Graph
&lt;/h2&gt;

&lt;p&gt;Before the pipeline details, the core concept needs to be clear.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Limitation of Knowledge Graphs: Binary Edges
&lt;/h3&gt;

&lt;p&gt;Traditional knowledge graphs represent facts as triples: &lt;code&gt;(subject, relation, object)&lt;/code&gt;. Every edge connects exactly two nodes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Knowledge graph representation:
Alice  ─[co-author]─→  paper_X
Bob    ─[co-author]─→  paper_X
Carol  ─[co-author]─→  paper_X
paper_X ─[published_at]─→ NeurIPS
paper_X ─[year]─→ 2025

5 separate binary edges, 5 extraction steps, the relationship is fragmented
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This representation loses information at a fundamental level: &lt;strong&gt;the single fact "Alice, Bob, and Carol jointly co-authored paper X" is decomposed into five isolated edges&lt;/strong&gt;. If retrieval returns only two or three of them, the complete relationship is hard to reconstruct.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Hypergraphs Solve This: Hyperedges
&lt;/h3&gt;

&lt;p&gt;A hypergraph allows one edge (hyperedge) to connect any number of nodes, directly representing N-ary facts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hypergraph representation:
{Alice, Bob, Carol, paper_X, NeurIPS, 2025}
     ────────────[co-authored]────────────→
         One hyperedge, complete N-ary relationship preserved
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A hyperedge packages all entities involved in a fact together, with no decomposition needed. Retrieving one hyperedge delivers the complete relational context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;More concrete comparison:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Event: A meeting
  Attendees: Alice, Bob, Carol
  Date: 2025-06-15
  Location: Beijing
  Topic: Product roadmap discussion

Knowledge graph:
  (Alice, attended, meeting_001)
  (Bob, attended, meeting_001)
  (Carol, attended, meeting_001)
  (meeting_001, date, 2025-06-15)
  (meeting_001, location, Beijing)
  (meeting_001, topic, product_roadmap_discussion)
  ← 6 edges, relationship broken apart

Hypergraph:
  Hyperedge: {Alice, Bob, Carol, 2025-06-15, Beijing, product_roadmap_discussion}
  Relation: co-attended-meeting
  ← 1 hyperedge, N-ary relationship intact
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
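&lt;p&gt;The meeting comparison above can be made concrete in a few lines of Python. This is only a toy model for building intuition, not HyperGraphRAG's internal data structure: the &lt;code&gt;Hyperedge&lt;/code&gt; class and both helper functions are hypothetical names introduced for the sketch.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    relation: str
    entities: frozenset


# The same meeting, once as six binary triples and once as a single hyperedge.
TRIPLES = [
    ("Alice", "attended", "meeting_001"),
    ("Bob", "attended", "meeting_001"),
    ("Carol", "attended", "meeting_001"),
    ("meeting_001", "date", "2025-06-15"),
    ("meeting_001", "location", "Beijing"),
    ("meeting_001", "topic", "product_roadmap_discussion"),
]

MEETING = Hyperedge(
    relation="co-attended-meeting",
    entities=frozenset(
        {"Alice", "Bob", "Carol", "2025-06-15", "Beijing", "product_roadmap_discussion"}
    ),
)


def entities_from_triples(triples, event_id: str) -> frozenset:
    """Rebuild the fact by joining every triple that touches event_id."""
    found = set()
    for subject, _relation, obj in triples:
        if obj == event_id:
            found.add(subject)
        elif subject == event_id:
            found.add(obj)
    return frozenset(found)


def hyperedges_with(hyperedges, *names: str) -> list:
    """Return the hyperedges that contain all of the given entities."""
    wanted = set(names)
    return [h for h in hyperedges if wanted.issubset(h.entities)]


# All six triples are needed to recover what one hyperedge already holds.
full = entities_from_triples(TRIPLES, "meeting_001")
partial = entities_from_triples(TRIPLES[:3], "meeting_001")  # only attendees retrieved
```

&lt;p&gt;Here &lt;code&gt;full&lt;/code&gt; equals &lt;code&gt;MEETING.entities&lt;/code&gt;, while &lt;code&gt;partial&lt;/code&gt; contains only the three attendees: the date, location, and topic are lost whenever retrieval misses their triples. A single call such as &lt;code&gt;hyperedges_with([MEETING], "Alice", "Beijing")&lt;/code&gt; returns the whole fact in one lookup.&lt;/p&gt;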






&lt;h2&gt;
  
  
  System Architecture: Three-Phase Pipeline
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Phase 1: Knowledge Hypergraph Construction (Indexing)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;hypergraphrag&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HyperGraphRAG&lt;/span&gt;

&lt;span class="n"&gt;rag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HyperGraphRAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;working_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expr/my_project&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Insert documents, triggers knowledge hypergraph construction
&lt;/span&gt;&lt;span class="n"&gt;rag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Construction process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Document chunking&lt;/strong&gt;: Split input documents into chunks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;N-ary fact extraction&lt;/strong&gt;: Use an LLM to extract N-ary relational facts from each chunk

&lt;ul&gt;
&lt;li&gt;Not just &lt;code&gt;(subject, relation, object)&lt;/code&gt; triples&lt;/li&gt;
&lt;li&gt;Extract complete facts involving N entities simultaneously&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hyperedge construction&lt;/strong&gt;: Convert each N-ary fact into a hyperedge

&lt;ul&gt;
&lt;li&gt;Each hyperedge contains: all related entity nodes + relation type + provenance&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hypergraph storage&lt;/strong&gt;: Persist the node set and hyperedge set to the working directory&lt;/li&gt;
&lt;/ol&gt;
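&lt;p&gt;The four construction steps can be sketched end to end in plain Python. This is a minimal sketch under stated assumptions: every function name, the JSON layout, and the file name &lt;code&gt;hypergraph.json&lt;/code&gt; are invented for illustration, and &lt;code&gt;extract_nary_facts&lt;/code&gt; is a hard-coded stub standing in for the LLM call. It does not reproduce the library's actual pipeline.&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path


def chunk_text(text, max_chars=200):
    """Step 1: split a document into chunks on paragraph boundaries."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 1 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def extract_nary_facts(chunk):
    """Step 2 placeholder: a real system would prompt an LLM here.

    This stub returns one hard-coded fact when the chunk mentions Alice.
    """
    if "Alice" not in chunk:
        return []
    return [{
        "relation": "co-authored",
        "entities": ["Alice", "Bob", "Carol", "paper_X"],
    }]


def build_hypergraph(documents):
    """Step 3: turn each fact into a hyperedge and collect the node set."""
    nodes, hyperedges = set(), []
    for doc_id, text in enumerate(documents):
        for chunk_id, chunk in enumerate(chunk_text(text)):
            for fact in extract_nary_facts(chunk):
                hyperedges.append({
                    "relation": fact["relation"],
                    "entities": sorted(fact["entities"]),
                    "provenance": {"doc": doc_id, "chunk": chunk_id},
                })
                nodes.update(fact["entities"])
    return {"nodes": sorted(nodes), "hyperedges": hyperedges}


def save_hypergraph(graph, working_dir):
    """Step 4: persist the node set and hyperedge set as JSON."""
    path = Path(working_dir) / "hypergraph.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(graph, indent=2), encoding="utf-8")
    return path


docs = ["Alice, Bob, and Carol wrote paper_X together.\n\nUnrelated paragraph."]
graph = build_hypergraph(docs)
saved = save_hypergraph(graph, tempfile.mkdtemp())
```

&lt;p&gt;Each hyperedge keeps its entity list, relation type, and provenance together, mirroring the three parts listed in step 3.&lt;/p&gt;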

&lt;h3&gt;
  
  
  Phase 2: Hypergraph Retrieval
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What papers did Alice and Bob co-author in 2025?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key difference between hypergraph retrieval and knowledge graph retrieval:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Knowledge graph retrieval:
  Find Alice node
  → Find all binary edges connecting Alice
  → Find edges containing Bob
  → Take intersection
  → Multi-hop path reasoning, easy to miss connections

Hypergraph retrieval:
  Find Alice node
  → Find all hyperedges containing Alice
  → Hyperedges already contain Bob, papers, dates as complete context
  → Directly locate relevant hyperedges, no multi-hop reasoning needed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
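&lt;p&gt;The hypergraph side of that comparison can be approximated with an inverted index from entity to hyperedge. The sketch below uses exact entity matching over a toy in-memory store; the &lt;code&gt;retrieve&lt;/code&gt; function and the data layout are assumptions for illustration, and a real system would more likely match entities by embedding similarity.&lt;/p&gt;

```python
from collections import defaultdict

# Toy hyperedge store; entity names follow the article's running example.
hyperedges = [
    {"relation": "co-authored",
     "entities": {"Alice", "Bob", "Carol", "paper_X", "NeurIPS", "2025"}},
    {"relation": "co-authored",
     "entities": {"Alice", "Dave", "paper_Y", "ICML", "2024"}},
    {"relation": "co-attended-meeting",
     "entities": {"Bob", "Carol", "2025-06-15", "Beijing"}},
]

# Inverted index: entity -> ids of the hyperedges that contain it.
index = defaultdict(set)
for edge_id, edge in enumerate(hyperedges):
    for entity in edge["entities"]:
        index[entity].add(edge_id)


def retrieve(entities, relation=None):
    """Return hyperedges containing all given entities.

    Expects at least one entity; optionally filters by relation type.
    """
    candidate_ids = set.intersection(*(index.get(e, set()) for e in entities))
    return [
        hyperedges[i] for i in sorted(candidate_ids)
        if relation is None or hyperedges[i]["relation"] == relation
    ]


# "What papers did Alice and Bob co-author in 2025?"
hits = retrieve(["Alice", "Bob", "2025"], relation="co-authored")
```

&lt;p&gt;One set intersection locates the hyperedge, and the paper and venue come back with it, so no second hop is needed.&lt;/p&gt;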



&lt;h3&gt;
  
  
  Phase 3: Generation
&lt;/h3&gt;

&lt;p&gt;Retrieved hyperedge content serves as context for the LLM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Retrieved context (hyperedge):
  Entities: {Alice, Bob, paper_X, NeurIPS, 2025}
  Relation: co-authored
  Summary: Alice and Bob co-authored paper_X, published at NeurIPS 2025,
           on the topic of hypergraph-structured knowledge representation

The LLM receives complete, structured N-ary relationship context —
not fragments assembled from multiple disconnected binary edges
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
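&lt;p&gt;One way to hand such a hyperedge to the LLM is to render it as a text block and place it ahead of the question. The helper names and the prompt wording below are hypothetical; the library's real prompt template may look quite different.&lt;/p&gt;

```python
def format_hyperedge(edge):
    """Render one hyperedge as a text block for the prompt."""
    entities = ", ".join(sorted(edge["entities"]))
    return (
        f"Entities: {{{entities}}}\n"
        f"Relation: {edge['relation']}\n"
        f"Summary: {edge['summary']}"
    )


def build_prompt(question, edges):
    """Join retrieved hyperedges into one context section ahead of the question."""
    context = "\n\n".join(format_hyperedge(e) for e in edges)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


edge = {
    "entities": {"Alice", "Bob", "paper_X", "NeurIPS", "2025"},
    "relation": "co-authored",
    "summary": "Alice and Bob co-authored paper_X, published at NeurIPS 2025.",
}
prompt = build_prompt("What papers did Alice and Bob co-author in 2025?", [edge])
```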






&lt;h2&gt;
  
  
  Benchmark Results
&lt;/h2&gt;

&lt;p&gt;The paper evaluates across four domain datasets, comparing against Naive RAG, GraphRAG, and LightRAG:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domains&lt;/strong&gt;: Medicine, Agriculture, Computer Science, Law&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics&lt;/strong&gt;: Answer accuracy, retrieval efficiency, generation quality&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Finding&lt;/strong&gt;: HyperGraphRAG outperforms across all four domains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vs. Naive RAG (vector retrieval): better multi-entity relationship understanding&lt;/li&gt;
&lt;li&gt;vs. GraphRAG: less information loss from binary decomposition&lt;/li&gt;
&lt;li&gt;vs. LightRAG: significant improvement on complex N-ary relationship scenarios&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The domain selection is deliberate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Medicine&lt;/strong&gt;: Drug interactions involving multiple simultaneous medications are N-ary by nature — "A interacts with B" doesn't capture polypharmacy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Law&lt;/strong&gt;: Contract clauses involving multiple parties, facts constrained by multiple statutes simultaneously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Computer Science&lt;/strong&gt;: Technical facts linking algorithms, data structures, applications, and performance constraints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agriculture&lt;/strong&gt;: Crop growth conditions where soil, climate, fertilizer, and pests interact simultaneously&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The RAG Paradigm Evolution
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1st generation: Naive RAG
  Documents → Embeddings → Vector database
  Query → Similarity search → Return chunks
  Problem: Semantic retrieval, no structural knowledge

2nd generation: GraphRAG (Microsoft) / LightRAG (HKUDS)
  Documents → Extract knowledge graph (triples) → Graph database
  Query → Graph traversal → Structured context
  Problem: Binary edges can't natively represent N-ary relations; complex facts get fragmented

3rd generation: HyperGraphRAG (NeurIPS 2025)
  Documents → Extract N-ary facts → Hypergraph (hyperedges)
  Query → Hyperedge retrieval → Complete N-ary relationship context
  Advantage: Relationship integrity preserved; less noise accumulation in multi-hop reasoning
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This evolution has an underlying logic: real-world knowledge isn't binary. A paper's authorship involves multiple authors, institutions, and dates. A legal judgment involves plaintiff, defendant, judge, statutes, and facts. A business contract involves multiple parties, multiple clauses, and multiple milestone dates.&lt;/p&gt;

&lt;p&gt;Forcing all of this into binary edges is an architectural mismatch between the representation and the knowledge it encodes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Setup:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/LHRLAB/HyperGraphRAG
&lt;span class="nb"&gt;cd &lt;/span&gt;HyperGraphRAG

conda create &lt;span class="nt"&gt;-n&lt;/span&gt; hypergraphrag &lt;span class="nv"&gt;python&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3.11
conda activate hypergraphrag

pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Configure OpenAI API:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Basic usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;hypergraphrag&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HyperGraphRAG&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;rag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HyperGraphRAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;working_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expr/test&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Build hypergraph index
&lt;/span&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_document.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;rag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Query
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;rag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your question here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Limitations and When to Use It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Well-suited for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Documents with dense multi-entity relationships (medical records, legal documents, academic papers)&lt;/li&gt;
&lt;li&gt;Queries requiring complex reasoning across multiple entities&lt;/li&gt;
&lt;li&gt;Scenarios where GraphRAG has hit a ceiling on relation retrieval accuracy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Worth considering:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hypergraph construction is more complex than standard KG extraction: the LLM has to identify N-ary facts, which costs more in time and API calls&lt;/li&gt;
&lt;li&gt;Currently requires the OpenAI API (extensible to other LLMs)&lt;/li&gt;
&lt;li&gt;Research code, not a production framework; the README describes it as a research implementation&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Links and Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🌟 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/LHRLAB/HyperGraphRAG" rel="noopener noreferrer"&gt;LHRLAB/HyperGraphRAG&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📄 &lt;strong&gt;arXiv paper&lt;/strong&gt;: &lt;a href="https://arxiv.org/abs/2503.21322" rel="noopener noreferrer"&gt;2503.21322&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🔬 &lt;strong&gt;NeurIPS 2025&lt;/strong&gt;: &lt;a href="https://neurips.cc/virtual/2025/poster/115764" rel="noopener noreferrer"&gt;neurips.cc/virtual/2025/poster/115764&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📧 &lt;strong&gt;Contact&lt;/strong&gt;: &lt;a href="mailto:haoran.luo@ieee.org"&gt;haoran.luo@ieee.org&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;HyperGraphRAG's contribution in one sentence: &lt;strong&gt;replacing binary edges with hyperedges lets RAG systems natively represent N-ary relationships&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That sounds like a graph structure implementation detail — but for document corpora full of multi-entity relationships, it addresses a fundamental information compression problem. When GraphRAG decomposes N-ary facts into multiple binary edges, the holistic relationship is already lost. All subsequent retrieval and reasoning operate on incomplete information.&lt;/p&gt;

&lt;p&gt;Publication at NeurIPS 2025 signals academic validation of this direction. For developers using GraphRAG or LightRAG who are hitting accuracy ceilings on complex relational queries, this is a research direction worth understanding and experimenting with.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Explore &lt;a href="https://primeskills.store" rel="noopener noreferrer"&gt;PrimeSkills&lt;/a&gt; — A marketplace for handpicked AI Agents and skills. Each is validated in real enterprise workflows, stripping away hype and keeping only what truly works.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Welcome to my &lt;a href="https://home.wonlab.top/en" rel="noopener noreferrer"&gt;Homepage&lt;/a&gt; for more useful insights and interesting products.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>rag</category>
      <category>hypergraph</category>
      <category>llm</category>
    </item>
    <item>
      <title>Open Source Project of the Day (#110): openpilot — The Open-Source Driver Assistance System That Beat Tesla Autopilot in Consumer Reports</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Tue, 30 Jun 2026 02:52:19 +0000</pubDate>
      <link>https://dev.to/wonderlab/open-source-project-of-the-day-110-openpilot-the-open-source-driver-assistance-system-that-5d9c</link>
      <guid>https://dev.to/wonderlab/open-source-project-of-the-day-110-openpilot-the-open-source-driver-assistance-system-that-5d9c</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 325+ supported cars."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is article &lt;strong&gt;#110&lt;/strong&gt; in the &lt;em&gt;Open Source Project of the Day&lt;/em&gt; series. Today's project is &lt;strong&gt;openpilot&lt;/strong&gt; — comma.ai's open-source semi-automated driving software, one of the most-starred autonomous driving repositories on GitHub.&lt;/p&gt;

&lt;p&gt;In 2020, Consumer Reports published a driver assistance system evaluation. The top-ranked system wasn't Tesla Autopilot, wasn't Cadillac Super Cruise — it was open-source software running on a third-party device you plug in yourself.&lt;/p&gt;

&lt;p&gt;In April 2024, an ordinary 2017 Toyota Prius, with a comma 3X device, completed a coast-to-coast US drive at 98.4% autonomy in 43 hours 18 minutes — beating Tesla Model S's previous record by nearly 12 hours.&lt;/p&gt;

&lt;p&gt;This is not a demo project. 62.8k Stars, running in 10,000+ users' real vehicles, over 100 million miles accumulated.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You'll Learn
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;openpilot's technical architecture: why end-to-end neural networks instead of separate perception and planning modules&lt;/li&gt;
&lt;li&gt;The panda safety architecture: how ISO 26262 safety levels are achieved in open-source software&lt;/li&gt;
&lt;li&gt;The World Model in 0.11: training real-world driving policy inside a learned simulated world&lt;/li&gt;
&lt;li&gt;George Hotz and comma.ai's origin story&lt;/li&gt;
&lt;li&gt;openpilot vs. Tesla Autopilot in context&lt;/li&gt;
&lt;li&gt;How to contribute (including comma.ai's bounty program)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Familiarity with driver assistance systems (ADAS) basics&lt;/li&gt;
&lt;li&gt;Basic understanding of neural networks&lt;/li&gt;
&lt;li&gt;Background in automotive electronics (CAN bus) is helpful but not required&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Project Background
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Is openpilot?
&lt;/h3&gt;

&lt;p&gt;openpilot is an open-source semi-automated driving system that uses an end-to-end neural network to predict driving trajectory directly from camera images — no separate perception module, no separate planning module, just a unified network from pixels to control commands.&lt;/p&gt;

&lt;p&gt;It connects to the car's CAN bus through comma four hardware, replaces the stock driver assistance system, and provides better lane centering, adaptive cruise control, and driver monitoring than the factory system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Author / Company
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Founder&lt;/strong&gt;: George Hotz (geohot) — the first person to unlock an iPhone, the first to crack PS3 encryption&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Company&lt;/strong&gt;: comma.ai (founded September 2015)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-sourced&lt;/strong&gt;: November 30, 2016&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: MIT&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Project Stats
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;⭐ GitHub Stars: &lt;strong&gt;62,800+&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🍴 Forks: 11,100+&lt;/li&gt;
&lt;li&gt;🚗 Supported vehicles: 325+&lt;/li&gt;
&lt;li&gt;📍 Accumulated miles: 100M+&lt;/li&gt;
&lt;li&gt;👥 Active users: 10,000+&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Core Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What openpilot Does
&lt;/h3&gt;

&lt;p&gt;After installing the comma four device and connecting it to a supported car:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automated Lane Centering (ALC)&lt;/strong&gt;: Replaces the factory lane keeping assist. Uses the neural network to keep the car centered in the lane. Handles faded lane markings, construction zones, varied road conditions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adaptive Cruise Control (ACC)&lt;/strong&gt;: Detects vehicles ahead, automatically controls acceleration and braking to maintain safe following distance. Integrates OpenStreetMap for automatic speed adjustment on curves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lane Change Assist&lt;/strong&gt;: Assists lane changes after turn signal activation, with blind spot awareness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Driver Monitoring&lt;/strong&gt;: Camera-based alertness tracking. Alerts after approximately 6 seconds of detected distraction, without the constant steering wheel nagging many factory systems use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Factory safety features preserved&lt;/strong&gt;: Automatic Emergency Braking (AEB), blind spot warning, auto high beams — all kept as-is from the factory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Supported Brands and Models
&lt;/h3&gt;

&lt;p&gt;Primary support: Toyota, Hyundai, Honda, Ford, plus GM, Subaru, Nissan, and others — 325+ specific models total. Full list in &lt;a href="https://github.com/commaai/openpilot/blob/master/docs/CARS.md" rel="noopener noreferrer"&gt;CARS.md&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Required Hardware
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;comma four device&lt;/strong&gt;: $999. Three cameras for 360° vision, Qualcomm Snapdragon 845 processor, 4G LTE + WiFi, OLED display, CAN FD enabled, high-precision GPS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Car harness&lt;/strong&gt;: The cable connecting comma four to the car's CAN network.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Technical Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  End-to-End Neural Network
&lt;/h3&gt;

&lt;p&gt;openpilot's most important architectural decision is the &lt;strong&gt;end-to-end&lt;/strong&gt; approach: camera images in, control commands out. No separate perception layer, no separate planning layer — one unified network trained directly on driving data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traditional ADAS architecture:
Camera → [Perception module] → [Object detection] → [Path planning] → [Control]
Each module designed separately; errors compound across layers

openpilot end-to-end:
Camera → [Unified neural network] → Control commands
One model, trained end-to-end, from pixels to steering angle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The end-to-end advantage: no hand-engineered features needed. The model learns to handle faded lane lines, international road markings, weather variations — because it's trained on millions of real-world driving miles that contain all of those situations.&lt;/p&gt;

&lt;h3&gt;
  
  
  openpilot 0.11: World Model Training
&lt;/h3&gt;

&lt;p&gt;openpilot 0.11 shipped a new driving model codenamed WMI (🍉). The core innovation: &lt;strong&gt;training a driving policy inside a learned simulated world&lt;/strong&gt; — claimed as the first deployment of this approach for real-world robotics shipped to users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two-stage training:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Stage 1: Train the World Model
  Input: Unlabeled fleet driving data (video)
  Architecture: 2B-parameter diffusion Transformer
  Outcome: A world simulator that predicts "what happens next"
  (The neural network builds an internal model of the driving world)

Stage 2: Train the driving policy
  Environment: Interactions with the World Model (not the real world)
  Training: Small driving policy network interacts with the World Model
  Outcome: A policy that learned to drive inside the simulated world
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;World Model technical specs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ViT encoder (50M parameters) + decoder (100M parameters)&lt;/li&gt;
&lt;li&gt;Diffusion Transformer: 2B parameters, trained on 2.5M minutes of video&lt;/li&gt;
&lt;li&gt;Throughput: 12.2 frames/sec/GPU&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Power efficiency improvement&lt;/strong&gt;: Idle power consumption dropped from 225 mW to 52 mW when parked — a 77% reduction via peripheral disabling, voltage scaling, and deep stop mode.&lt;/p&gt;
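&lt;p&gt;The 77% figure follows directly from the two quoted power readings; a quick check of the arithmetic (variable names are ours):&lt;/p&gt;

```python
idle_before_mw = 225  # idle power when parked, before the change
idle_after_mw = 52    # idle power when parked, after the change

# Relative reduction = (before - after) / before
reduction = (idle_before_mw - idle_after_mw) / idle_before_mw
print(f"{reduction:.1%}")  # prints 76.9%, which rounds to the quoted 77%
```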

&lt;h3&gt;
  
  
  panda: The Safety Architecture
&lt;/h3&gt;

&lt;p&gt;openpilot sends control commands to a car. That makes safety the core engineering problem. All safety-critical code lives in the separate &lt;strong&gt;panda&lt;/strong&gt; module:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Written in C (Python is faster to develop but carries GC pauses and runtime risk)&lt;/li&gt;
&lt;li&gt;Follows ISO 26262 functional safety standards&lt;/li&gt;
&lt;li&gt;Safety rules enforced in hardware: steering torque limits, speed limits, emergency takeover logic&lt;/li&gt;
&lt;li&gt;panda runs on a separate processor inside the comma four, isolated from the main openpilot software&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This design means: even if openpilot's main software has a bug or is compromised, the hardware safety layer in panda remains effective.&lt;/p&gt;
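&lt;p&gt;To make the idea of an independent safety layer concrete, here is a minimal Python sketch of the kind of check such a layer might apply to each steering command. It is not panda's code (panda is written in C), and the limit values, the &lt;code&gt;TorqueLimits&lt;/code&gt; class, and the &lt;code&gt;command_allowed&lt;/code&gt; function are all invented for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TorqueLimits:
    """Made-up numbers for illustration; real limits differ per vehicle."""
    max_torque: int = 1500  # absolute cap on any single command
    max_delta: int = 50     # largest allowed change between consecutive commands


def command_allowed(requested, previous, controls_allowed, limits):
    """Return True only if the steering command passes every check."""
    if not controls_allowed:
        # While the system is not engaged, only a zero-torque command passes.
        return requested == 0
    if abs(requested) > limits.max_torque:
        return False
    if abs(requested - previous) > limits.max_delta:
        return False
    return True


limits = TorqueLimits()
ok = command_allowed(100, 95, True, limits)            # small step, within cap
too_large = command_allowed(2000, 1995, True, limits)  # exceeds max_torque
too_fast = command_allowed(100, 0, True, limits)       # step of 100 exceeds max_delta
not_engaged = command_allowed(100, 95, False, limits)  # controls not allowed
```

&lt;p&gt;The check depends only on the command, the previous command, and the engagement state, so it can reject a bad request regardless of what the sending software intended.&lt;/p&gt;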

&lt;h3&gt;
  
  
  Testing Infrastructure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Software-in-the-loop tests (every commit)
  └── Simulated driving scenarios, regression suite

Hardware-in-the-loop tests (internal Jenkins)
  └── Software running on actual comma hardware

Physical replay testing (runs continuously)
  └── "A testing closet containing 10 comma devices
       continuously replaying routes, detecting behavior changes"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This testing stack is unusually thorough for open-source automotive software.&lt;/p&gt;




&lt;h2&gt;
  
  
  Historical Milestones
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Event&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sept 2015&lt;/td&gt;
&lt;td&gt;George Hotz founds comma.ai&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2016&lt;/td&gt;
&lt;td&gt;Demo on Acura ILX; California DMV cease-and-desist&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nov 2016&lt;/td&gt;
&lt;td&gt;openpilot open-sourced&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nov 2020&lt;/td&gt;
&lt;td&gt;Consumer Reports ranks openpilot #1 above Tesla Autopilot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2021&lt;/td&gt;
&lt;td&gt;comma three released at $2,199&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2023&lt;/td&gt;
&lt;td&gt;comma 3X released at $1,250&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;April 2024&lt;/td&gt;
&lt;td&gt;US coast-to-coast in 43h18m at 98.4% autonomy, breaks record&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025&lt;/td&gt;
&lt;td&gt;comma four released at $999, three-camera design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026&lt;/td&gt;
&lt;td&gt;openpilot 0.11, World Model-trained driving policy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  What the Consumer Reports 2020 Ranking Means
&lt;/h3&gt;

&lt;p&gt;Consumer Reports evaluated lane-centering accuracy, safety, usability, and abuse prevention. openpilot scored first not because of hardware advantages — it's a third-party plug-in device running open-source software — but because of the software's algorithm quality. A system built on consumer hardware and open-source code outperformed Tesla, Cadillac, and Ford factory systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 2024 Coast-to-Coast Record
&lt;/h3&gt;

&lt;p&gt;A stock 2017 Toyota Prius. A $999 comma 3X device. No special route planning. 98.4% autonomy across the continental US. Beating the Tesla Model S's prior record by nearly 12 hours says nothing about technical specifics, but it shows how well the system's capability matches real-world conditions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Ecosystem
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Community Forks
&lt;/h3&gt;

&lt;p&gt;11,000+ forks on GitHub. The community maintains extended versions with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traffic light and stop sign detection&lt;/li&gt;
&lt;li&gt;Parking lot navigation&lt;/li&gt;
&lt;li&gt;Pre-Autopilot Tesla support&lt;/li&gt;
&lt;li&gt;Custom driving behavior parameters&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Contributor Bounties
&lt;/h3&gt;

&lt;p&gt;comma.ai operates a bounty system paying external contributors for adding new car support and fixing bugs. Cash incentives for open-source automotive software contributions are rare.&lt;/p&gt;

&lt;h3&gt;
  
  
  tinygrad
&lt;/h3&gt;

&lt;p&gt;comma.ai simultaneously maintains &lt;a href="https://github.com/tinygrad/tinygrad" rel="noopener noreferrer"&gt;tinygrad&lt;/a&gt; — a minimalist deep learning framework serving as openpilot's neural network inference backend. openpilot's training and inference code is migrating to tinygrad.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links and Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🌟 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/commaai/openpilot" rel="noopener noreferrer"&gt;commaai/openpilot&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🌐 &lt;strong&gt;Website&lt;/strong&gt;: &lt;a href="https://comma.ai" rel="noopener noreferrer"&gt;comma.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 &lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href="https://docs.comma.ai" rel="noopener noreferrer"&gt;docs.comma.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🚗 &lt;strong&gt;Supported vehicles&lt;/strong&gt;: &lt;a href="https://docs.comma.ai/vehicles" rel="noopener noreferrer"&gt;docs.comma.ai/vehicles&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💬 &lt;strong&gt;Community&lt;/strong&gt;: &lt;a href="https://discord.comma.ai" rel="noopener noreferrer"&gt;discord.comma.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🔧 &lt;strong&gt;tinygrad&lt;/strong&gt;: &lt;a href="https://github.com/tinygrad/tinygrad" rel="noopener noreferrer"&gt;github.com/tinygrad/tinygrad&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Safety Disclaimer
&lt;/h2&gt;

&lt;p&gt;Alongside its MIT license, the project's README carries this notice:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"THIS IS ALPHA QUALITY SOFTWARE FOR RESEARCH PURPOSES ONLY. THIS IS NOT A PRODUCT."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;openpilot is a driver assistance system, not fully autonomous driving. Drivers must remain alert at all times and be ready to take over immediately.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;openpilot is one of the few large open-source projects running in genuinely safety-critical environments. Behind the 62.8k Stars are 10,000+ users driving real miles, not notebook experiments.&lt;/p&gt;

&lt;p&gt;The end-to-end neural network architecture was an aggressive choice in 2016. It's now becoming the industry standard — Tesla FSD moved to end-to-end too. openpilot was one of the earliest large-scale validations of that approach.&lt;/p&gt;

&lt;p&gt;The World Model-based training in 0.11 — training driving policy in a learned simulated world rather than a hand-crafted simulator — represents a direction the research community has discussed for years. openpilot deployed it to real users' cars.&lt;/p&gt;

&lt;p&gt;In a field that traditionally requires billion-dollar investments and closed ecosystems, George Hotz built a parallel path using open-source software and consumer hardware. That path beat factory systems from the world's largest automakers in a Consumer Reports evaluation. The approach itself is worth studying.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>worldmodel</category>
      <category>opensource</category>
      <category>adas</category>
    </item>
    <item>
      <title>Workflow Series (02): Design Patterns — Four-Layer Architecture, Three Context Modes, and Approval Gate Design</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Mon, 29 Jun 2026 02:56:14 +0000</pubDate>
      <link>https://dev.to/wonderlab/workflow-series-02-design-patterns-four-layer-architecture-three-context-modes-and-approval-f18</link>
      <guid>https://dev.to/wonderlab/workflow-series-02-design-patterns-four-layer-architecture-three-context-modes-and-approval-f18</guid>
      <description>&lt;h2&gt;
  
  
  From Script to Engineering
&lt;/h2&gt;

&lt;p&gt;An early-stage workflow is often a single file: one Markdown that describes everything, with all configuration hardcoded. This works at small scale. As the workflow grows, three problems appear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Changing a timeout requires finding and updating multiple locations&lt;/li&gt;
&lt;li&gt;Subagent task prompts are scattered through the workflow definition, impossible to test independently&lt;/li&gt;
&lt;li&gt;Security policy and business logic are mixed together, making compliance review painful&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The four-layer architecture addresses all three.&lt;/p&gt;




&lt;h2&gt;
  
  
  Four-Layer Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Policy Layer     policy.md         Execution rules, global constraints
                                    → Who can do what; authorization for high-risk ops

Workflow Layer   workflow.md        Phase / Step structure, routing logic
                                    → The skeleton; no specific task content

TaskSpec Layer   templates/         Subagent task prompt templates
                                    → Detailed instructions and output contracts per task

Tool/Skill Layer skills/            Atomic capabilities
                                    → Skill definitions reusable across workflows
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Core principle:&lt;/strong&gt; each layer changes only its own concerns — nothing crosses layers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# ✅ Correct: change analysis timeout → edit workflow.md&lt;/span&gt;
phase_3_analyze:
  timeout: 30m  ← Workflow Layer change

&lt;span class="gh"&gt;# ✅ Correct: change analysis output format → edit templates/analyze.md&lt;/span&gt;
&lt;span class="gu"&gt;## Output Contract&lt;/span&gt;
{"confidence": float, "root_cause": str, ...}  ← TaskSpec Layer change

&lt;span class="gh"&gt;# ❌ Wrong: write permission rules inside a task prompt (permissions belong in Policy Layer)&lt;/span&gt;
&lt;span class="gh"&gt;# ❌ Wrong: write specific analysis steps inside workflow.md (steps belong in TaskSpec Layer)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Layer separation makes changes safe: editing &lt;code&gt;templates/&lt;/code&gt; only affects the corresponding subagent's output. Editing &lt;code&gt;policy.md&lt;/code&gt; cannot accidentally break routing logic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Context Passing Modes
&lt;/h2&gt;

&lt;p&gt;Deciding what a subagent should know is where workflow design goes wrong most often.&lt;/p&gt;

&lt;p&gt;Passing the main Agent's full history to every subagent is the most common mistake. Context balloons: subagents slow down, output quality drops, and token cost grows with every phase.&lt;/p&gt;

&lt;p&gt;Choose a passing mode based on what the subagent actually needs. Three modes:&lt;/p&gt;

&lt;h3&gt;
  
  
  accumulate
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Definition:&lt;/strong&gt; pass all relevant outputs the workflow has produced so far.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; the subagent synthesizes conclusions from multiple earlier phases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Phase 7: write closing notification, needs conclusions from the whole workflow&lt;/span&gt;
&lt;span class="na"&gt;phase_7_notify&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;context_mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;accumulate&lt;/span&gt;
  &lt;span class="na"&gt;context_inputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;phases.phase3.root_cause_summary&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;phases.phase4.fix_summary&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;phases.phase5.commit_result&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;phases.phase6.review_status&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Phase 7 subagent needs root cause, fix summary, commit outcome, and review status. Missing any one of them produces an incomplete notification.&lt;/p&gt;

&lt;h3&gt;
  
  
  last_only
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Definition:&lt;/strong&gt; pass only the output of the immediately preceding phase or step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; the subagent's task depends entirely on its direct predecessor; history is irrelevant.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Phase 2: extract log files — only needs the attachment path from Phase 1&lt;/span&gt;
&lt;span class="na"&gt;phase_2_extract_logs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;context_mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;last_only&lt;/span&gt;
  &lt;span class="na"&gt;context_inputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;phases.phase1.attachment_path&lt;/span&gt;   &lt;span class="c1"&gt;# one field is all it needs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Extracting logs doesn't need the full Jira ticket details, just the location of the file. Passing all of Phase 1's output wastes context. &lt;code&gt;last_only&lt;/code&gt; limits the source to the preceding phase, and &lt;code&gt;context_inputs&lt;/code&gt; narrows it further to the one field the task needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  explicit
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Definition:&lt;/strong&gt; name every specific field the subagent needs, sourcing from any prior phase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; the subagent needs specific fields from multiple phases, but not the complete output of any single phase.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Phase 3: root cause analysis — needs bug_info (Phase 1) + log_dir (Phase 2)&lt;/span&gt;
&lt;span class="na"&gt;phase_3_analyze&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;context_mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;explicit&lt;/span&gt;
  &lt;span class="na"&gt;context_inputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;phases.phase1&lt;/span&gt;
      &lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;bug_info.summary&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;bug_info.stack_trace&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;bug_info.jira_key&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;phases.phase2&lt;/span&gt;
      &lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;log_dir&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;extracted_files&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Phase 3 needs the bug description (from Phase 1) and the log directory (from Phase 2), but not Phase 1's attachment path or Phase 2's raw extraction log. &lt;code&gt;explicit&lt;/code&gt; mode controls precisely what flows into the subagent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choosing a Mode
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Subagent synthesizes conclusions from multiple phases  → accumulate
Subagent depends only on its direct predecessor        → last_only
Subagent needs specific fields from multiple sources   → explicit (recommended default)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;explicit&lt;/code&gt; is the safest default.&lt;/strong&gt; Even when you're unsure what a subagent needs, start by naming specific fields. It's easier to debug than over-passing, and it documents the data dependencies explicitly.&lt;/p&gt;
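
&lt;p&gt;The three modes can be sketched as a small resolver. This is a minimal illustration and not the author's implementation: &lt;code&gt;STATE&lt;/code&gt;, &lt;code&gt;lookup&lt;/code&gt;, and &lt;code&gt;resolve_context&lt;/code&gt; are hypothetical names, and the state is assumed to be a plain dict of per-phase outputs. The only enforcement added for &lt;code&gt;last_only&lt;/code&gt; is that every input comes from a single phase; it does not check that the phase is the immediate predecessor.&lt;/p&gt;

```python
from typing import Any

# Hypothetical workflow state: one dict of outputs per completed phase.
STATE = {
    "phase1": {
        "attachment_path": "/tmp/bug.zip",
        "bug_info": {"summary": "NPE in parseInput", "jira_key": "BUG-1"},
    },
    "phase2": {"log_dir": "/tmp/logs", "raw_extraction_log": "extract.log"},
}


def lookup(data: dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as 'bug_info.summary' through nested dicts."""
    for key in dotted.split("."):
        data = data[key]
    return data


def resolve_context(spec: dict[str, Any], state: dict[str, Any]) -> dict[str, Any]:
    """Return only the fields a phase declares in context_inputs."""
    mode = spec["context_mode"]
    inputs = spec["context_inputs"]
    resolved = {}
    if mode in ("accumulate", "last_only"):
        # Entries look like 'phases.phase1.attachment_path'.
        paths = [p.removeprefix("phases.") for p in inputs]
        if mode == "last_only" and len({p.split(".")[0] for p in paths}) != 1:
            raise ValueError("last_only may read from a single phase")
        for path in paths:
            resolved[path] = lookup(state, path)
    elif mode == "explicit":
        # Entries look like {'source': 'phases.phase1', 'fields': [...]}.
        for entry in inputs:
            phase = entry["source"].removeprefix("phases.")
            for field in entry["fields"]:
                resolved[f"{phase}.{field}"] = lookup(state[phase], field)
    else:
        raise ValueError(f"unknown context_mode: {mode}")
    return resolved
```

&lt;p&gt;Whatever the mode, the subagent receives only the keys in the returned dict, so a missing or misspelled field fails with a &lt;code&gt;KeyError&lt;/code&gt; at dispatch time instead of silently degrading the subagent's output.&lt;/p&gt;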




&lt;h2&gt;
  
  
  Approval Gate Design
&lt;/h2&gt;

&lt;p&gt;Approval gates are the nodes where humans intervene. Incomplete gate definitions are a common source of production incidents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Three Gate Types
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;interrupt (blocking)
  Workflow pauses completely until human responds
  For: high-risk operations (code merge, production deploy)

notification (non-blocking)
  Workflow continues; human is notified in parallel
  For: low-risk operations where awareness is enough

approval (async)
  Asynchronous wait for approval within a specified time window
  For: formal approval processes with SLA requirements
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Five Required Fields
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Complete approval gate definition&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;gate_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gate_B&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;interrupt&lt;/span&gt;
  &lt;span class="na"&gt;trigger_condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fix_result.all_passed&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;==&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;after&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;retries"&lt;/span&gt;
  &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;Fix attempts: 3 failures.&lt;/span&gt;
    &lt;span class="s"&gt;Root cause: {{ phases.phase3.root_cause_summary }}&lt;/span&gt;
    &lt;span class="s"&gt;Last error: {{ phases.phase4.last_error }}&lt;/span&gt;

    &lt;span class="s"&gt;Choose next action:&lt;/span&gt;
  &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;label&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Manual intervention&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;manual_fix&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;label&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Re-analyze root cause&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;re_analyze&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;label&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Mark as requires manual fix&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mark_manual&lt;/span&gt;
  &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;24h&lt;/span&gt;
  &lt;span class="na"&gt;timeout_action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pause&lt;/span&gt;    &lt;span class="c1"&gt;# ← most commonly omitted field&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;timeout_action&lt;/code&gt; is the most frequently missing field. Options:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pause    → suspend workflow after timeout, wait for human to resume (most common)
continue → proceed with default option after timeout (low-risk notification gates)
abort    → terminate the entire workflow after timeout (strict time-window operations)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A gate without &lt;code&gt;timeout_action&lt;/code&gt; leaves the workflow hanging indefinitely: no alert fires, no record is written, no recovery path exists.&lt;/p&gt;
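
&lt;p&gt;A missing &lt;code&gt;timeout_action&lt;/code&gt; is easy to catch before a workflow ever runs. The sketch below is one possible lint step, not part of any existing tool: &lt;code&gt;validate_gate&lt;/code&gt; is a hypothetical helper, and it assumes every key shown in the example gate above is required and that gates are already parsed into dicts.&lt;/p&gt;

```python
# Keys taken from the example gate definition; treating all of them as
# required is an assumption made for this sketch.
REQUIRED_KEYS = (
    "gate_id", "type", "trigger_condition", "message",
    "options", "timeout", "timeout_action",
)
GATE_TYPES = {"interrupt", "notification", "approval"}
TIMEOUT_ACTIONS = {"pause", "continue", "abort"}


def validate_gate(gate: dict) -> list[str]:
    """Return a list of problems; an empty list means the gate looks complete."""
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in gate]
    if "type" in gate and gate["type"] not in GATE_TYPES:
        problems.append(f"unknown type: {gate['type']}")
    if "timeout_action" in gate and gate["timeout_action"] not in TIMEOUT_ACTIONS:
        problems.append(f"unknown timeout_action: {gate['timeout_action']}")
    return problems


# A gate that sets a timeout but forgets what to do when it expires.
incomplete_gate = {
    "gate_id": "gate_B",
    "type": "interrupt",
    "trigger_condition": "fix_result.all_passed == false after 3 retries",
    "message": "Fix attempts: 3 failures.",
    "options": [{"label": "Manual intervention", "value": "manual_fix"}],
    "timeout": "24h",
}
print(validate_gate(incomplete_gate))
```

&lt;p&gt;Running a check like this over every gate when the workflow definition is loaded turns the "hangs indefinitely" failure into a load-time error.&lt;/p&gt;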

&lt;h3&gt;
  
  
  Approval Gate Message Design
&lt;/h3&gt;

&lt;p&gt;The gate message is read by humans. It directly determines how fast decisions get made.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;✅ Effective gate message:
  "Test pass rate: 67% (8/12 passing)
   Failing tests: test_null_input, test_overflow
   Current fix: modified boundary check in parseInput()
   Recommended action: re-analyze root cause — failure pattern
   doesn't match the identified root cause"

❌ Ineffective gate message:
  "Fix failed. Choose an action."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A good message lets the reviewer decide in 30 seconds. A bad one sends them to the logs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Serial Retry vs Parallel Candidates
&lt;/h2&gt;

&lt;p&gt;When a workflow encounters failure, two response strategies exist. Choosing the wrong one degrades efficiency or quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Serial Retry
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Attempt 1 → fail
           ↓ (with failure reason + feedback)
Attempt 2 → fail
           ↓ (with failure reason + feedback)
Attempt 3 → pass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; the error reason is concrete, later attempts can learn from earlier failures, and there's meaningful variation in angle or approach.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Example: root cause analysis (Phase 3)
  Attempt 1: analyze from code perspective
  Attempt 2: feedback "code analysis confidence low — try log anomaly patterns"
  Attempt 3: feedback "try tracing the call chain chronologically"

Each retry applies the previous failure as a learning signal.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Parallel Candidates
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;            → Candidate A → test → pass  ← select this
analysis →  → Candidate B → test → fail
            → Candidate C → test → fail
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; the solution space is diverse, predicting which approach will work is impossible, and exploring multiple options concurrently then selecting the best is more efficient than serial exploration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Example: code fix (Phase 4)
  Candidate A: fix the boundary check logic
  Candidate B: fix the caller's input validation
  Candidate C: fix parseInput()'s default value handling

All three approaches could be correct. Run them concurrently,
select the one that passes tests with the highest coverage.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Selection Principle
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Later attempts can learn from earlier failures → serial retry
Multiple approaches are equally plausible      → parallel candidates
Time-sensitive, can't afford serial latency    → parallel candidates
Comparing solution quality is the goal         → parallel candidates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Document the strategy in the workflow definition — it makes debugging straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;phase_3_analyze&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;retry_strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;serial&lt;/span&gt;          &lt;span class="c1"&gt;# different angles, learning from failure&lt;/span&gt;
  &lt;span class="na"&gt;max_retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;feedback_mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;include_prev_error&lt;/span&gt;

&lt;span class="na"&gt;phase_4_fix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;retry_strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;parallel&lt;/span&gt;        &lt;span class="c1"&gt;# solution space is diverse, select best&lt;/span&gt;
  &lt;span class="na"&gt;parallel_candidates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;selection_criteria&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;passed_tests AND max_coverage&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
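
&lt;p&gt;The difference between the two strategies is easiest to see in code. The following is a minimal sketch under stated assumptions: &lt;code&gt;serial_retry&lt;/code&gt; and &lt;code&gt;parallel_candidates&lt;/code&gt; are hypothetical helpers, an attempt is modeled as a callable returning &lt;code&gt;(ok, detail)&lt;/code&gt;, a candidate as a callable returning a dict with &lt;code&gt;passed&lt;/code&gt; and &lt;code&gt;coverage&lt;/code&gt; keys, and &lt;code&gt;max_retries&lt;/code&gt; is read as the total number of attempts.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def serial_retry(attempt: Callable[[Optional[str]], tuple[bool, str]],
                 max_retries: int = 3) -> Optional[str]:
    """Run attempts one at a time, feeding each failure reason into the next."""
    feedback = None
    for _ in range(max_retries):
        ok, detail = attempt(feedback)
        if ok:
            return detail
        feedback = detail  # the failure reason becomes the next attempt's hint
    return None  # every attempt failed; the caller decides what happens next


def parallel_candidates(candidates: list[Callable[[], dict]]) -> Optional[dict]:
    """Run all candidates concurrently; keep the passing one with top coverage."""
    if not candidates:
        return None
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        results = list(pool.map(lambda run: run(), candidates))
    passing = [r for r in results if r["passed"]]
    return max(passing, key=lambda r: r["coverage"]) if passing else None
```

&lt;p&gt;Serial retry threads state (&lt;code&gt;feedback&lt;/code&gt;) from one attempt to the next, which is why it suits tasks where failures are informative. Parallel candidates share no state at all and differ only in the final selection step.&lt;/p&gt;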






&lt;h2&gt;
  
  
  Design Checklist
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Four-layer separation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Policy (permissions/security) and Workflow (routing/structure) are in separate files&lt;/li&gt;
&lt;li&gt;[ ] Subagent task prompts live in independent &lt;code&gt;templates/&lt;/code&gt; files&lt;/li&gt;
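&lt;li&gt;[ ] Mutable parameters (timeouts, retry counts) are defined in one place, such as a &lt;code&gt;config.yaml&lt;/code&gt; or &lt;code&gt;workflow.md&lt;/code&gt;, not hardcoded in multiple locations&lt;/li&gt;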
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Context passing&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Every subagent invocation declares a &lt;code&gt;context_mode&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;explicit&lt;/code&gt; mode lists specific fields — no whole Phase outputs passed in&lt;/li&gt;
&lt;li&gt;[ ] Main Agent's full history is not sent to every subagent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Approval gates&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Every gate has both &lt;code&gt;timeout&lt;/code&gt; and &lt;code&gt;timeout_action&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Message contains enough information for a 30-second decision&lt;/li&gt;
&lt;li&gt;[ ] Option values are enumerated types, not free text&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Retry strategy&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Every node with retry logic is labeled &lt;code&gt;serial&lt;/code&gt; or &lt;code&gt;parallel&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Serial retries have &lt;code&gt;feedback_mode&lt;/code&gt; (failure reason flows back to the generator)&lt;/li&gt;
&lt;li&gt;[ ] Parallel candidates have a defined &lt;code&gt;selection_criteria&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The four-layer architecture's value is isolation&lt;/strong&gt;: Policy changes don't affect routing; &lt;code&gt;templates/&lt;/code&gt; changes are independently testable — this is the foundation of maintainability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;explicit&lt;/code&gt; is the safest default for Context passing&lt;/strong&gt;: it names every field, costs fewer tokens than &lt;code&gt;accumulate&lt;/code&gt;, is more flexible than &lt;code&gt;last_only&lt;/code&gt;, and is easier to debug when something goes wrong&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serial retry vs parallel candidates depends on whether failures are informative&lt;/strong&gt;: root cause analysis benefits from serial retry (each failure guides the next attempt); code fix benefits from parallel candidates (the solution space is diverse). Reversing them costs efficiency or quality&lt;/li&gt;
&lt;/ol&gt;





</description>
      <category>ai</category>
      <category>workflow</category>
      <category>designpatterns</category>
      <category>production</category>
    </item>
    <item>
      <title>Open Source Project of the Day (#109): Trellis — Persist Project Specs, Tasks, and Memory Into Your Repository</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Mon, 29 Jun 2026 02:52:27 +0000</pubDate>
      <link>https://dev.to/wonderlab/open-source-project-of-the-day-109-trellis-persist-project-specs-tasks-and-memory-into-your-2h9i</link>
      <guid>https://dev.to/wonderlab/open-source-project-of-the-day-109-trellis-persist-project-specs-tasks-and-memory-into-your-2h9i</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"AI coding is 10% coding and 90% re-explaining your stack. Trellis fixed the worst part of AI agents: amnesia."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is article &lt;strong&gt;#109&lt;/strong&gt; in the &lt;em&gt;Open Source Project of the Day&lt;/em&gt; series. Today's project is &lt;strong&gt;Trellis&lt;/strong&gt; — an AI coding agent harness that persists project specs, task context, and session memory into your repository.&lt;/p&gt;

&lt;p&gt;You spend a session with Claude Code getting things right: conventions explained, context established, good results produced. You close the terminal. Next day: start over. Re-explain the tech stack choices, re-describe the code style, re-frame the current task's background.&lt;/p&gt;

&lt;p&gt;This has a name: &lt;strong&gt;agent amnesia&lt;/strong&gt;. Every session starts fresh, and all accumulated context disappears.&lt;/p&gt;

&lt;p&gt;Trellis stores what should be remembered inside the repository: &lt;code&gt;.trellis/spec/&lt;/code&gt; for conventions, &lt;code&gt;.trellis/tasks/&lt;/code&gt; for tasks and PRDs, &lt;code&gt;.trellis/workspace/&lt;/code&gt; for session journals. The agent reads these on startup — no re-explaining required.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You'll Learn
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The three-directory system: why spec / tasks / workspace each have distinct roles&lt;/li&gt;
&lt;li&gt;The 4-phase workflow: Plan → Implement → Verify → Finish as a complete loop&lt;/li&gt;
&lt;li&gt;Trellis Skills: what each of the four built-in workflow modules does&lt;/li&gt;
&lt;li&gt;Comparison with CLAUDE.md / AGENTS.md: why single-file approaches become monoliths&lt;/li&gt;
&lt;li&gt;Team scenario: how committing specs to Git benefits the whole team&lt;/li&gt;
&lt;li&gt;Supported platforms and initialization&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Experience with Claude Code, Cursor, or a similar AI coding tool&lt;/li&gt;
&lt;li&gt;Familiarity with basic Git workflow&lt;/li&gt;
&lt;li&gt;Basic familiarity with CLAUDE.md or .cursorrules concepts&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Project Background
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Is Trellis?
&lt;/h3&gt;

&lt;p&gt;Trellis is an AI coding agent harness — "scaffolding for AI, guiding it along the path of your conventions."&lt;/p&gt;

&lt;p&gt;"Harness" as a term is gaining traction in 2026 AI coding discussions: not the agent itself, but the constraints, memory, and workflow structure built around it. Trellis is among the most systematic open-source implementations in this space, cited in academic work (the "Agent Harness Engineering: A Survey" paper on OpenReview).&lt;/p&gt;

&lt;p&gt;The core design decision: &lt;strong&gt;persist everything inside the Git repository&lt;/strong&gt;. Specs can be code-reviewed. Session memory can be team-shared. Task context works across tools. This isn't just a file path choice — it's a decision to bring AI working patterns into the engineering management system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Author / Team
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Organization&lt;/strong&gt;: Mindfold AI (&lt;a href="https://github.com/mindfold-ai" rel="noopener noreferrer"&gt;mindfold-ai&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href="https://docs.trytrellis.app" rel="noopener noreferrer"&gt;docs.trytrellis.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: AGPL-3.0&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stack&lt;/strong&gt;: TypeScript 67.9% / Python 25.8%&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Project Stats
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;⭐ GitHub Stars: &lt;strong&gt;11,300+&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🍴 Forks: 641+&lt;/li&gt;
&lt;li&gt;💻 Supported platforms: 16 AI coding platforms&lt;/li&gt;
&lt;li&gt;📄 License: AGPL-3.0&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Core Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Three-Directory System
&lt;/h3&gt;

&lt;p&gt;Trellis maintains a &lt;code&gt;.trellis/&lt;/code&gt; directory at the repository root with three subdirectories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.trellis/
├── spec/           ← Project conventions (commit to Git)
│   ├── coding-standards.md
│   ├── architecture.md
│   ├── tech-stack.md
│   └── team-conventions.md
│
├── tasks/          ← Tasks and PRDs (commit to Git)
│   ├── active/
│   │   └── feature-auth/
│   │       ├── prd.md
│   │       ├── implementation-context.md
│   │       └── review-context.md
│   └── completed/
│
└── workspace/      ← Session journals (optionally .gitignored)
    └── your-name/
        └── journal.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;spec/&lt;/strong&gt;: The core. Everything you don't want to re-explain to an agent every session: the reasoning behind technology choices, code style rules, architecture constraints, naming conventions, known pitfalls. Committed to Git, automatically injected into the agent's context at session start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;tasks/&lt;/strong&gt;: Tasks are complete work units, not single-line descriptions. Each task has a PRD (product requirements document), implementation context (everything the agent needs to know before writing code), and review context (what to check the diff against).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;workspace/&lt;/strong&gt;: Per-developer session journals. Records what happened this session, what problems were encountered, and what the next session needs to know to continue. The agent reads this file when a new session starts.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 4-Phase Workflow
&lt;/h3&gt;

&lt;p&gt;Trellis defines a complete work loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Plan → Implement → Verify → Finish
  ↑                                ↓
  └──────── spec updated ←─────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Phase 1: Plan (trellis-brainstorm)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Clarify requirements before writing any code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Invoke trellis-brainstorm
    ↓
Agent walks you through requirement clarification:
  "What does success look like for this feature?"
  "Are there technical constraints to respect?"
  "What edge cases need to be handled?"
    ↓
Produces prd.md (product requirements document)
Research-heavy subtasks → dispatched to sub-agent
    ↓
PRD saved to .trellis/tasks/active/[task-name]/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Phase 2: Implement (trellis-implement)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Context auto-injected at execution time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Invoke trellis-implement
    ↓
Automatically reads:
  - Relevant specs from .trellis/spec/
  - .trellis/tasks/active/[task-name]/prd.md
  - .trellis/workspace/[your-name]/journal.md
    ↓
Writes code according to the PRD, respecting spec constraints
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
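
&lt;p&gt;To make the auto-injection step concrete, here is a rough sketch of how the three sources could be gathered into one context string. It is not Trellis's actual implementation: &lt;code&gt;gather_context&lt;/code&gt; is a hypothetical function, it reads every Markdown file under &lt;code&gt;.trellis/spec/&lt;/code&gt; instead of selecting only the relevant ones, and the section header format is invented for illustration.&lt;/p&gt;

```python
from pathlib import Path


def gather_context(repo: Path, task: str, user: str) -> str:
    """Concatenate spec files, the task PRD, and the session journal."""
    root = repo / ".trellis"
    sources = sorted((root / "spec").glob("*.md"))
    sources.append(root / "tasks" / "active" / task / "prd.md")
    sources.append(root / "workspace" / user / "journal.md")

    parts = []
    for path in sources:
        if path.is_file():  # skip anything that has not been created yet
            header = f"## {path.relative_to(repo)}"
            parts.append(header + "\n" + path.read_text(encoding="utf-8"))
    return "\n\n".join(parts)
```

&lt;p&gt;Because everything lives in plain files inside the repository, the same assembly works for any agent that can read the working tree.&lt;/p&gt;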



&lt;p&gt;&lt;strong&gt;Phase 3: Verify (trellis-check)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Automated review after code is written:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Invoke trellis-check
    ↓
Compare against spec: does the diff conform to standards?
Run lint, type checking, tests
Issues found → attempt self-correction
Self-correction fails → report to developer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
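
&lt;p&gt;The check-then-self-correct flow above amounts to a bounded loop. The sketch below is an illustration only and does not reflect how &lt;code&gt;trellis-check&lt;/code&gt; is built: &lt;code&gt;check_with_self_correction&lt;/code&gt;, &lt;code&gt;run_checks&lt;/code&gt;, &lt;code&gt;try_fix&lt;/code&gt;, and the &lt;code&gt;max_fix_rounds&lt;/code&gt; limit are all hypothetical.&lt;/p&gt;

```python
from typing import Callable


def check_with_self_correction(run_checks: Callable[[], list[str]],
                               try_fix: Callable[[list[str]], None],
                               max_fix_rounds: int = 2) -> list[str]:
    """Run checks, let the agent attempt fixes, and return any remaining issues."""
    issues = run_checks()  # spec conformance, lint, type checking, tests
    rounds = 0
    while issues and max_fix_rounds > rounds:
        try_fix(issues)        # agent attempts a self-correction
        issues = run_checks()  # re-verify after every fix attempt
        rounds += 1
    return issues  # a non-empty list is what gets reported to the developer
```

&lt;p&gt;Capping the number of fix rounds keeps a stubborn failure from looping forever and guarantees the developer eventually sees the remaining issues.&lt;/p&gt;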



&lt;p&gt;&lt;strong&gt;Phase 4: Finish (trellis-update-spec)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most valuable part of the loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Invoke trellis-update-spec
    ↓
Analyze new patterns, conventions, and rules discovered in this task
Promote those learnings into .trellis/spec/
    ↓
The next session starts smarter than this one
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Comparison with CLAUDE.md / AGENTS.md
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The problem with CLAUDE.md / .cursorrules:
  Single file → grows into a catch-all dumping ground over time
  Bulk injection → everything in context regardless of task relevance
  No task structure → no PRD, no implementation/review context separation
  No persistent memory → session ends, context is gone

What Trellis does instead:
  Modular specs → injected on demand, relevant to the task at hand
  Task context → PRD + implementation context + review context, layered
  Workspace memory → continuity across sessions
  Finish phase → spec auto-updates every time a task completes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Trellis doesn't replace CLAUDE.md — it builds structure around it. You can still have a CLAUDE.md; Trellis spec files supplement it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installation and Initialization
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Trellis CLI&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @mindfoldhq/trellis@latest

&lt;span class="c"&gt;# Navigate to your project&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project

&lt;span class="c"&gt;# Basic init (auto-detects installed AI tools)&lt;/span&gt;
trellis init &lt;span class="nt"&gt;-u&lt;/span&gt; your-name

&lt;span class="c"&gt;# Platform-specific init&lt;/span&gt;
trellis init &lt;span class="nt"&gt;--claude-code&lt;/span&gt; &lt;span class="nt"&gt;--cursor&lt;/span&gt; &lt;span class="nt"&gt;--codex&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; your-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;trellis init&lt;/code&gt; does the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Creates the &lt;code&gt;.trellis/&lt;/code&gt; directory structure&lt;/li&gt;
&lt;li&gt;Generates platform-specific configuration files for detected tools&lt;/li&gt;
&lt;li&gt;Creates initial spec templates (can be AI-generated from existing code)&lt;/li&gt;
&lt;li&gt;Handles workspace directory &lt;code&gt;.gitignore&lt;/code&gt; according to preference&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Generating initial specs for an existing project:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In Claude Code:
"Analyze this codebase and generate initial .trellis/spec/ content
 covering the tech stack, code style, and architectural constraints"

Then review and refine the generated spec files manually
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Platform Support
&lt;/h3&gt;

&lt;p&gt;Trellis supports 16 AI coding platforms and generates platform-specific configuration:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Confirmed support&lt;/strong&gt;: Claude Code, Cursor, OpenCode, Codex, GitHub Copilot, Devin, and more.&lt;/p&gt;

&lt;p&gt;For Claude Code specifically, Trellis generates a structured CLAUDE.md and corresponding Skill files so Claude Code knows when to invoke &lt;code&gt;trellis-brainstorm&lt;/code&gt;, &lt;code&gt;trellis-implement&lt;/code&gt;, and the other workflow modules.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trellis Skill System
&lt;/h3&gt;

&lt;p&gt;Four built-in Skills define the workflow entry points:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;When to Use&lt;/th&gt;
&lt;th&gt;Core Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;trellis-brainstorm&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Starting a new task&lt;/td&gt;
&lt;td&gt;Guides requirement clarification, outputs PRD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;trellis-implement&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Starting to write code&lt;/td&gt;
&lt;td&gt;Injects context, executes implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;trellis-check&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;After code is written&lt;/td&gt;
&lt;td&gt;Reviews against specs, self-corrects where possible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;trellis-update-spec&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;After task completion&lt;/td&gt;
&lt;td&gt;Promotes learnings back into spec&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These Skills follow the &lt;a href="https://agentskills.io/home" rel="noopener noreferrer"&gt;agentskills.io&lt;/a&gt; open standard, the same format used by android/skills and other projects in the ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Team Workflow
&lt;/h3&gt;

&lt;p&gt;When spec files are committed to Git, the entire team benefits from the same standards:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scenario: New team member joins
Traditional: Docs in Confluence (possibly stale); ask senior devs; stumble through
Trellis:
  After git clone → .trellis/spec/ has complete tech stack and conventions
  AI agent automatically follows those conventions
  New member is productive with team-standard AI workflows within a day
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scenario: Team discovers a new best practice
Traditional: Verbal knowledge transfer, or updating a doc nobody reads
Trellis:
  Someone runs trellis-update-spec → spec file updated
  Opens a PR, team reviews
  After merge, all subsequent AI sessions across the team use the new standard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Workspace Journal Format
&lt;/h3&gt;

&lt;p&gt;Session journals give agents continuity between sessions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Journal - your-name&lt;/span&gt;

&lt;span class="gu"&gt;## 2026-06-28&lt;/span&gt;
&lt;span class="gu"&gt;### Session summary&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Completed JWT verification logic for the auth module
&lt;span class="p"&gt;-&lt;/span&gt; Discovered a race condition in refresh token handling under concurrent requests
&lt;span class="p"&gt;-&lt;/span&gt; Temporary fix: Redis distributed lock, but a better approach probably exists

&lt;span class="gu"&gt;### What the next session needs to know&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Refresh token race condition fix needs further evaluation
&lt;span class="p"&gt;-&lt;/span&gt; Need to write integration tests for the auth module
&lt;span class="p"&gt;-&lt;/span&gt; Edge case: same user refreshing from multiple devices simultaneously

&lt;span class="gu"&gt;### Open questions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Should we implement rotating refresh tokens? Need to confirm product requirements
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the next session starts, the agent reads this journal, understands the context, and doesn't need re-briefing.&lt;/p&gt;
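
&lt;p&gt;To make the hand-off concrete, here is a minimal sketch of how a tool could pull one section out of a journal like the one above. The &lt;code&gt;readJournalSection&lt;/code&gt; helper is hypothetical and is not part of Trellis; it only assumes the heading and bullet layout shown in the sample journal.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical helper, not part of Trellis: collect the bullet items
// listed under one "###" heading of a journal file.
function readJournalSection(journalText, heading) {
  const items = [];
  let inSection = false;
  for (const rawLine of journalText.split('\n')) {
    const line = rawLine.trim();
    if (line.startsWith('### ')) {
      // A third-level heading opens or closes the section we want.
      inSection = line.slice(4) === heading;
    } else if (line.startsWith('#')) {
      // Any other heading (journal title, date) ends the section.
      inSection = false;
    } else if (inSection) {
      if (line.startsWith('- ')) {
        items.push(line.slice(2));
      }
    }
  }
  return items;
}

const journal = [
  '# Journal - your-name',
  '',
  '## 2026-06-28',
  '### Session summary',
  '- Completed JWT verification logic for the auth module',
  '',
  '### What the next session needs to know',
  '- Refresh token race condition fix needs further evaluation',
  '- Need to write integration tests for the auth module',
  '',
  '### Open questions',
  '- Should we implement rotating refresh tokens?',
].join('\n');

const handoff = readJournalSection(journal, 'What the next session needs to know');
console.log(handoff);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If a journal holds several dated entries, this version returns the matching items from all of them; restricting it to the latest date is left out for brevity.&lt;/p&gt;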




&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @mindfoldhq/trellis@latest

&lt;span class="c"&gt;# Navigate to your project&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project

&lt;span class="c"&gt;# Initialize (auto-detects Claude Code, Cursor, etc.)&lt;/span&gt;
trellis init &lt;span class="nt"&gt;-u&lt;/span&gt; your-name

&lt;span class="c"&gt;# Start your first task&lt;/span&gt;
&lt;span class="c"&gt;# In Claude Code, say:&lt;/span&gt;
&lt;span class="c"&gt;# "Use trellis-brainstorm to start a new task: add MFA support to the auth module"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Links and Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🌟 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/mindfold-ai/trellis" rel="noopener noreferrer"&gt;mindfold-ai/Trellis&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 &lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href="https://docs.trytrellis.app" rel="noopener noreferrer"&gt;docs.trytrellis.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🏢 &lt;strong&gt;Mindfold AI&lt;/strong&gt;: &lt;a href="https://github.com/mindfold-ai" rel="noopener noreferrer"&gt;github.com/mindfold-ai&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Trellis addresses a systemic problem in the AI coding toolchain: every session starts from zero, without awareness of the project's context, the team's conventions, or the previous session's progress.&lt;/p&gt;

&lt;p&gt;Storing specs, task context, and session memory inside the Git repository is a design decision, not an implementation detail. It means AI working patterns enter the engineering management system: specs can be reviewed, memory can be shared, changes can be tracked.&lt;/p&gt;

&lt;p&gt;The closed-loop workflow design — especially the Finish phase promoting learnings back into specs — means the system improves with use rather than starting from the same baseline each time.&lt;/p&gt;

&lt;p&gt;With 11.3k GitHub stars, the project is clearly meeting real demand. For developers switching between multiple projects or tools, or tech leads trying to standardize AI-assisted development practices across a team, Trellis is worth serious evaluation.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Explore &lt;a href="https://primeskills.store" rel="noopener noreferrer"&gt;PrimeSkills&lt;/a&gt; — A marketplace for handpicked AI Agents and skills. Each is validated in real enterprise workflows, stripping away hype and keeping only what truly works.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Welcome to my &lt;a href="https://home.wonlab.top/en" rel="noopener noreferrer"&gt;Homepage&lt;/a&gt; for more useful insights and interesting products.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>agents</category>
      <category>claude</category>
    </item>
    <item>
      <title>Open Source Project #108: ai-website-cloner-template — One Command, Parallel Agents, Any Website Reversed into Next.js Code</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Sun, 28 Jun 2026 01:59:48 +0000</pubDate>
      <link>https://dev.to/wonderlab/open-source-project-108-ai-website-cloner-template-one-command-parallel-agents-any-website-633</link>
      <guid>https://dev.to/wonderlab/open-source-project-108-ai-website-cloner-template-one-command-parallel-agents-any-website-633</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"Point it at a URL, run &lt;code&gt;/clone-website&lt;/code&gt;, and your AI agent will inspect the site, extract design tokens and assets, write component specs, and dispatch parallel builders to reconstruct every section."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is &lt;strong&gt;article #108&lt;/strong&gt; in the "One Open Source Project a Day" series. Today's project is &lt;strong&gt;ai-website-cloner-template&lt;/strong&gt; — a GitHub template repository for reverse-engineering any website into a Next.js codebase using AI coding agents.&lt;/p&gt;

&lt;p&gt;Turning a "website screenshot" into "runnable code" is a classic Vibe Coding scenario, but most implementations stop at the surface: ask the LLM to look at a screenshot, approximate the layout, fill in placeholder content. This template takes a fundamentally different approach: it defines a rigorous multi-phase agent workflow whose core principle is &lt;strong&gt;"completeness beats speed"&lt;/strong&gt; — every Builder Agent must have exact &lt;code&gt;getComputedStyle()&lt;/code&gt; values, genuinely downloaded assets, and complete interaction state specifications before touching any code.&lt;/p&gt;

&lt;p&gt;The 22k stars signal real demand for this use case. But the more interesting story is the engineering design, especially the use of git worktrees to achieve true parallel multi-agent construction.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You'll Learn
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The five-phase clone pipeline: reconnaissance → foundation → component specs → parallel build → assembly QA&lt;/li&gt;
&lt;li&gt;git worktree parallel agent pattern: how each Builder Agent works in an isolated branch&lt;/li&gt;
&lt;li&gt;Component spec file design principles: why the spec is a "contract," not a "reference"&lt;/li&gt;
&lt;li&gt;Interaction model identification: distinguishing click-driven, scroll-driven, and time-driven behaviors before building&lt;/li&gt;
&lt;li&gt;Cross-platform agent support: how 13 AI coding tools are all served from a single source file&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Experience with Claude Code, Cursor, or similar AI coding tools&lt;/li&gt;
&lt;li&gt;Basic familiarity with Next.js&lt;/li&gt;
&lt;li&gt;Basic understanding of git branches&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Project Background
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;ai-website-cloner-template is a GitHub template repository (&lt;code&gt;is_template: true&lt;/code&gt;) that pre-scaffolds a Next.js 16 + shadcn/ui + Tailwind v4 project alongside a &lt;code&gt;/clone-website&lt;/code&gt; AI Agent Skill.&lt;/p&gt;

&lt;p&gt;Usage pattern: click "Use this template" on GitHub to create your own repository, start your AI agent, and run &lt;code&gt;/clone-website &amp;lt;target-url&amp;gt;&lt;/code&gt;. The agent then completes the full pipeline from reconnaissance to working code.&lt;/p&gt;

&lt;p&gt;The project's core engineering contribution isn't the tech stack choice — it's the &lt;strong&gt;multi-agent collaboration protocol&lt;/strong&gt; behind the &lt;code&gt;/clone-website&lt;/code&gt; Skill, particularly using git worktrees for genuinely parallel component builds and enforcing the "spec first, builder second" constraint.&lt;/p&gt;

&lt;h3&gt;
  
  
  Author / Team
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Author&lt;/strong&gt;: JCodesMore&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Primary Language&lt;/strong&gt;: TypeScript&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: MIT&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community&lt;/strong&gt;: Discord&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Project Stats
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;⭐ GitHub Stars: &lt;strong&gt;22,100+&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🍴 Forks: 3,173+ (most used to create independent projects)&lt;/li&gt;
&lt;li&gt;📄 License: MIT&lt;/li&gt;
&lt;li&gt;📅 Created: March 2026&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What It Does
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traditional "screenshot clone" approach:
Screenshot → LLM approximates layout → placeholder content → manual color/spacing corrections → iterate

ai-website-cloner-template approach:
/clone-website https://example.com
    ↓
Phase 1 Recon: screenshots + scroll/click/hover interaction sweep + design token extraction
    ↓
Phase 2 Foundation: update fonts/colors/globals.css + download all assets + extract SVG icons
    ↓
Phase 3 Specs: write spec files per section (exact CSS values + real content + interaction states)
    ↓
Phase 4 Parallel Build: git worktree per section + dispatch Builder Agents in parallel
    ↓
Phase 5 Assembly QA: merge all worktrees + wire up page + npm run build verification
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
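
&lt;p&gt;Phase 4 is the part that relies on git worktrees. As a rough sketch of the idea, the snippet below builds one &lt;code&gt;git worktree add&lt;/code&gt; command per section so that each Builder Agent gets its own branch and directory. The &lt;code&gt;worktreeCommands&lt;/code&gt; function, the &lt;code&gt;build/&lt;/code&gt; branch prefix, and the &lt;code&gt;../worktrees/&lt;/code&gt; directory are assumptions made for illustration, not the template's actual naming scheme.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical sketch: branch and directory names are illustrative only.
function worktreeCommands(sections) {
  return sections.map(function (section) {
    const branch = 'build/' + section;
    const dir = '../worktrees/' + section;
    // "git worktree add -b BRANCH PATH" creates a new branch and checks it
    // out in its own directory, so parallel builders never share a checkout.
    return ['git', 'worktree', 'add', '-b', branch, dir];
  });
}

const commands = worktreeCommands(['hero', 'pricing', 'footer']);
for (const command of commands) {
  console.log(command.join(' '));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The commands are returned as argument arrays rather than one shell string, so section names never need shell quoting if they are later passed to a process runner.&lt;/p&gt;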



&lt;h3&gt;
  
  
  Use Cases
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Platform migration&lt;/strong&gt;: Rebuild a site you own from WordPress/Webflow/Squarespace into a Next.js codebase, gaining full code ownership&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lost source code&lt;/strong&gt;: The site is live but the repo is gone, the developer left, or the stack is too legacy to maintain — recover the code from the live version in a modern format&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning&lt;/strong&gt;: Deconstruct how production sites achieve specific layouts, animations, and responsive behavior by working with the actual code (not just screenshots)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Not Intended For
&lt;/h3&gt;

&lt;p&gt;The README explicitly states:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Phishing or impersonation&lt;/strong&gt;: the template must not be used for deception, impersonation, or any illegal activity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Passing off others' designs as your own&lt;/strong&gt;: logos, brand assets, and original copy belong to their owners&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Violating terms of service&lt;/strong&gt;: some sites explicitly prohibit scraping or reproduction — check first&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Create your own repository&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On the GitHub project page, click &lt;strong&gt;Use this template&lt;/strong&gt; → &lt;strong&gt;Create a new repository&lt;/strong&gt; (don't clone the template repository directly).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Clone and install&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/YOUR-USERNAME/YOUR-NEW-REPO.git
&lt;span class="nb"&gt;cd &lt;/span&gt;YOUR-NEW-REPO
npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Start Claude Code (recommended)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude &lt;span class="nt"&gt;--chrome&lt;/span&gt;   &lt;span class="c"&gt;# --chrome starts Chrome MCP for browser automation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Run the clone skill&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/clone-website https://target-website.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Multiple URLs can be processed in parallel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/clone-website https://site1.com https://site2.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;5. Dev preview&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run dev        &lt;span class="c"&gt;# Start dev server&lt;/span&gt;
npm run check      &lt;span class="c"&gt;# Run lint + typecheck + build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Supported AI Agent Platforms
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Recommended&lt;/strong&gt; (Opus 4.7)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex CLI&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenCode&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Copilot&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Windsurf&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini CLI&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cline&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Roo Code&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continue&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amazon Q&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Augment Code&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aider&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cross-platform support works like this: &lt;code&gt;AGENTS.md&lt;/code&gt; is the single source of truth for all project instructions. Running &lt;code&gt;bash scripts/sync-agent-rules.sh&lt;/code&gt; auto-generates platform-specific copies (&lt;code&gt;CLAUDE.md&lt;/code&gt;, &lt;code&gt;GEMINI.md&lt;/code&gt;, &lt;code&gt;.cursor/&lt;/code&gt;, &lt;code&gt;.windsurf/&lt;/code&gt;, etc.) from that single file.&lt;/p&gt;
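
&lt;p&gt;The "single source, many copies" idea can be sketched in a few lines. This is not a port of &lt;code&gt;scripts/sync-agent-rules.sh&lt;/code&gt;, which may work differently; the &lt;code&gt;buildAgentCopies&lt;/code&gt; function and the banner text are hypothetical, and only the two target file names mentioned above are used.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical sketch of generating platform copies from one source text.
function buildAgentCopies(sourceText, targetPaths) {
  const banner = 'Generated from AGENTS.md; edit that file instead.\n\n';
  const copies = {};
  for (const target of targetPaths) {
    // Every target receives identical instructions plus the banner,
    // so the platform files cannot drift apart.
    copies[target] = banner + sourceText;
  }
  return copies;
}

const copies = buildAgentCopies('Use Tailwind v4 utilities.', ['CLAUDE.md', 'GEMINI.md']);
console.log(Object.keys(copies));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Writing the returned map to disk is omitted here; the point is that the generated files are outputs and &lt;code&gt;AGENTS.md&lt;/code&gt; stays the only file edited by hand.&lt;/p&gt;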




&lt;h2&gt;
  
  
  Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Five-Phase Pipeline
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Phase 1: Reconnaissance
&lt;/h4&gt;

&lt;p&gt;Reconnaissance is more than taking screenshots. It consists of three mandatory tasks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Screenshots&lt;/strong&gt;: Full-page screenshots at 1440px (desktop) and 390px (mobile), saved to &lt;code&gt;docs/design-references/&lt;/code&gt;. These are the visual master reference for the entire process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mandatory interaction sweep&lt;/strong&gt; (the most commonly skipped step):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scroll sweep — scroll slowly from top to bottom, observe:
  - At what scroll position does the navbar change appearance?
  - Which elements animate in when entering the viewport?
  - Which sections have scroll-snap points?
  - Is a smooth scroll library (Lenis, Locomotive Scroll) active?

Click sweep — click every element that looks interactive:
  - What content does each tab/pill switch to?
  - What modals open, what dropdowns appear?

Hover sweep — hover every element that might have hover states:
  - Color changes, scale, shadow, opacity, underlines...

Responsive sweep — test at 1440px / 768px / 390px:
  - Note layout changes and the approximate breakpoint where they occur
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All findings are saved to &lt;code&gt;docs/research/BEHAVIORS.md&lt;/code&gt; — the "behavior bible" for the entire cloning process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Page topology&lt;/strong&gt;: Map every distinct section from top to bottom, document each section's interaction model (static / click-driven / scroll-driven / time-driven), save as &lt;code&gt;docs/research/PAGE_TOPOLOGY.md&lt;/code&gt; — the assembly blueprint.&lt;/p&gt;
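
&lt;p&gt;One way to picture the topology is as an ordered list of sections, each tagged with one of the four interaction models. The sketch below checks that every section uses a known model. The object shape, the &lt;code&gt;validateTopology&lt;/code&gt; function, and the section names are assumptions for illustration; the template itself stores the topology as Markdown.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// The four interaction models named in the topology step.
const INTERACTION_MODELS = ['static', 'click-driven', 'scroll-driven', 'time-driven'];

// Hypothetical check: report sections whose model is not one of the four.
function validateTopology(sections) {
  const problems = [];
  sections.forEach(function (section, index) {
    if (!INTERACTION_MODELS.includes(section.model)) {
      problems.push(
        'Section ' + (index + 1) + ' (' + section.name + ') has unknown model: ' + section.model
      );
    }
  });
  return problems;
}

// Sections listed top to bottom, as they appear on the page.
const topology = [
  { name: 'navbar', model: 'scroll-driven' },
  { name: 'hero', model: 'static' },
  { name: 'testimonials', model: 'carousel' },
];

const problems = validateTopology(topology);
console.log(problems);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here the third entry is flagged because "carousel" describes a widget, not a trigger; the section still has to be classified as click-driven or time-driven before a builder can reproduce it.&lt;/p&gt;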

&lt;h4&gt;
  
  
  Phase 2: Foundation Build
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;The Orchestrator Agent does this phase itself instead of delegating it to sub-agents&lt;/strong&gt;, because it touches many files:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Update &lt;code&gt;src/app/layout.tsx&lt;/code&gt;: configure the target site's actual fonts via &lt;code&gt;next/font/google&lt;/code&gt; or &lt;code&gt;next/font/local&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Update &lt;code&gt;src/app/globals.css&lt;/code&gt;: map the target site's color tokens (background, foreground, primary, muted...) to the shadcn variable system using oklch color space&lt;/li&gt;
&lt;li&gt;Extract all SVG icons → save as named React components in &lt;code&gt;src/components/icons.tsx&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run the asset download script (&lt;code&gt;scripts/download-assets.mjs&lt;/code&gt;): batch-download all images and videos to &lt;code&gt;public/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Verify: &lt;code&gt;npm run build&lt;/code&gt; passes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Asset discovery runs JavaScript via browser MCP to precisely enumerate all &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;video&amp;gt;&lt;/code&gt; elements and every element with a CSS &lt;code&gt;background-image&lt;/code&gt;, including &lt;strong&gt;absolutely-positioned overlay layers&lt;/strong&gt;. A section that looks like a single image is often a background watercolor, a foreground UI mockup PNG, and an overlay icon stacked together. Missing any layer makes the clone look empty.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Run via browser MCP to discover all assets&lt;/span&gt;
&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;images&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelectorAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;img&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;src&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;currentSrc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;alt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;alt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;parentClasses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;parentElement&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;className&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;siblings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;parentElement&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;parentElement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelectorAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;img&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)].&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;position&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;getComputedStyle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;img&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;position&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;zIndex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;getComputedStyle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;img&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;zIndex&lt;/span&gt;
  &lt;span class="p"&gt;})),&lt;/span&gt;
  &lt;span class="na"&gt;backgroundImages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelectorAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)].&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getComputedStyle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;backgroundImage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;bg&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;bg&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;none&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;getComputedStyle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;backgroundImage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;element&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tagName&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;className&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;})),&lt;/span&gt;
  &lt;span class="c1"&gt;// ... videos, fonts, favicons&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
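
&lt;p&gt;One detail worth noting about the script above: a computed &lt;code&gt;backgroundImage&lt;/code&gt; is a CSS string such as &lt;code&gt;url("...")&lt;/code&gt;, and it can hold several comma-separated layers, including gradients. Before anything can be downloaded, the plain URLs have to be pulled out of that string. The helper below is a minimal sketch of that step; &lt;code&gt;extractBackgroundUrls&lt;/code&gt; and the example URLs are hypothetical, and how the template's own download script handles this is not shown here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical helper: extract every url(...) target from a computed
// background-image value, ignoring gradient layers.
function extractBackgroundUrls(backgroundImage) {
  const urls = [];
  // Matches url(x), url('x') and url("x"); group 2 is the bare URL.
  const pattern = /url\((['"]?)(.*?)\1\)/g;
  let match = pattern.exec(backgroundImage);
  while (match !== null) {
    urls.push(match[2]);
    match = pattern.exec(backgroundImage);
  }
  return urls;
}

const layers =
  'linear-gradient(rgba(0, 0, 0, 0.4), rgba(0, 0, 0, 0.4)), ' +
  'url("https://example.com/hero.png"), url("https://example.com/overlay.svg")';

const urls = extractBackgroundUrls(layers);
console.log(urls);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This simple pattern does not handle URLs that themselves contain a closing parenthesis, which can occur in some data URIs.&lt;/p&gt;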



&lt;h4&gt;
  
  
  Phase 3: Component Specs
&lt;/h4&gt;

&lt;p&gt;Before any Builder Agent is dispatched, a spec file must be written for that section in &lt;code&gt;docs/research/components/&amp;lt;name&amp;gt;.md&lt;/code&gt;. The spec is the &lt;strong&gt;contract&lt;/strong&gt; between extraction and construction — not optional.&lt;/p&gt;

&lt;p&gt;What the spec file contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A screenshot crop of the section (local path)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exact CSS values&lt;/strong&gt; extracted via &lt;code&gt;getComputedStyle()&lt;/code&gt; — not eyeballed estimates&lt;/li&gt;
&lt;li&gt;Downloaded asset local paths (the &lt;code&gt;public/&lt;/code&gt; path, not the original URL)&lt;/li&gt;
&lt;li&gt;Real text content (from &lt;code&gt;element.textContent&lt;/code&gt;, not placeholders)&lt;/li&gt;
&lt;li&gt;All interaction states (content per tab state, CSS diff before/after scroll trigger, transition animation parameters)&lt;/li&gt;
&lt;li&gt;Responsive breakpoint behaviors&lt;/li&gt;
&lt;/ul&gt;
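
&lt;p&gt;Because the spec is a contract, it helps to imagine a gate that refuses to dispatch a builder while anything on the list above is missing. The sketch below is one possible shape for such a gate. The field names, the &lt;code&gt;missingSpecFields&lt;/code&gt; function, and the sample file path are all assumptions; the template keeps specs as Markdown files, not JavaScript objects.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical field names, one per item in the spec checklist.
const REQUIRED_SPEC_FIELDS = [
  'screenshot',
  'computedStyles',
  'assets',
  'textContent',
  'interactionStates',
  'breakpoints',
];

// Return the required fields that are absent or empty in a spec object.
function missingSpecFields(spec) {
  return REQUIRED_SPEC_FIELDS.filter(function (field) {
    const value = spec[field];
    if (value === undefined || value === null) return true;
    if (typeof value === 'string') return value.trim() === '';
    if (Array.isArray(value)) return value.length === 0;
    return Object.keys(value).length === 0;
  });
}

// Example spec with an empty asset list and no interaction states.
const heroSpec = {
  screenshot: 'docs/design-references/hero.png',
  computedStyles: { fontSize: '48px', fontWeight: '700' },
  assets: [],
  textContent: 'Build faster',
  breakpoints: ['768px'],
};

const missing = missingSpecFields(heroSpec);
console.log(missing);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A builder would only be dispatched once the returned list is empty, which is the "spec first, builder second" rule expressed as a check.&lt;/p&gt;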

&lt;p&gt;CSS extraction script (executed via browser MCP):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;computed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getComputedStyle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fontSize&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fontWeight&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fontFamily&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;lineHeight&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;letterSpacing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;color&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;backgroundColor&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;background&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;padding&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;paddingTop&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;paddingRight&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;paddingBottom&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;paddingLeft&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;margin&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;borderRadius&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;boxShadow&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;display&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;flexDirection&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gap&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;alignItems&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;justifyContent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;position&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zIndex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// ... full property list&lt;/span&gt;
  &lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromEntries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;computed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;]]));&lt;/span&gt;
&lt;span class="p"&gt;})(&lt;/span&gt;&lt;span class="nx"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Complexity budget rule&lt;/strong&gt;: If a section's spec file exceeds ~150 lines, the section is too complex for one agent — split it into smaller pieces. This is a mechanical check that cannot be overridden with "but they're all related."&lt;/p&gt;
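
&lt;p&gt;Because the rule is mechanical, it can be expressed as a tiny check. The following is a minimal sketch, assuming the spec is available as a string; &lt;code&gt;MAX_SPEC_LINES&lt;/code&gt;, &lt;code&gt;countLines&lt;/code&gt;, and &lt;code&gt;needsSplit&lt;/code&gt; are hypothetical names, not part of the template, and 150 is the approximate threshold quoted above.&lt;/p&gt;

```typescript
// Hypothetical helper, not part of the template: mechanizes the ~150-line rule.
const MAX_SPEC_LINES = 150; // approximate threshold quoted in the rule

// Counts the lines in a spec; an empty spec counts as zero lines.
function countLines(spec: string): number {
  return spec.length === 0 ? 0 : spec.split("\n").length;
}

// True when the spec is over budget and the section should be split.
function needsSplit(spec: string, maxLines: number = MAX_SPEC_LINES): boolean {
  return countLines(spec) > maxLines;
}
```

&lt;p&gt;A check like this takes only the line count as input, which is what keeps the rule from being argued away case by case.&lt;/p&gt;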

&lt;h4&gt;
  
  
  Phase 4: Parallel Build via git Worktrees
&lt;/h4&gt;

&lt;p&gt;The key architectural decision is to use git worktrees, which give each Builder Agent an isolated working directory on its own branch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Orchestrator creates a worktree per section&lt;/span&gt;
git worktree add -b feature/hero-section .worktrees/hero-section
git worktree add -b feature/features-grid .worktrees/features-grid
git worktree add -b feature/pricing-section .worktrees/pricing-section

&lt;span class="c"&gt;# Each Builder Agent works in its own worktree&lt;/span&gt;
&lt;span class="c"&gt;# Builder receives the full spec file content inline in its prompt&lt;/span&gt;
&lt;span class="c"&gt;# Builder verifies: npx tsc --noEmit passes before finishing&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What each Builder Agent receives:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full spec file content (inlined into the prompt, not a path reference)&lt;/li&gt;
&lt;li&gt;Screenshot crop path&lt;/li&gt;
&lt;li&gt;Downloaded asset local paths&lt;/li&gt;
&lt;li&gt;Global style system (font variables, color tokens)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Builder doesn't need to read other sections' code or understand the overall page structure. Its only job: implement this one component to spec, with TypeScript compiling clean.&lt;/p&gt;
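
&lt;p&gt;One way to picture this handoff is as a single prompt-assembly step. The sketch below is an assumption about shape only: the &lt;code&gt;BuilderInput&lt;/code&gt; fields, the &lt;code&gt;buildPrompt&lt;/code&gt; function, and the prompt layout are hypothetical and not taken from the template. It inlines the spec text rather than passing a path, as described above.&lt;/p&gt;

```typescript
// Hypothetical shape of what the Orchestrator hands to one Builder Agent.
interface BuilderInput {
  sectionName: string;
  specContent: string; // full spec text, inlined (not a file path)
  screenshotPath: string; // path to the cropped screenshot
  assetPaths: string[]; // local paths of downloaded assets
  globalStyles: string; // font variables and color tokens
}

// Assembles one self-contained prompt so the Builder never has to
// read other sections or the overall page structure.
function buildPrompt(input: BuilderInput): string {
  const assets = input.assetPaths.map((p) => "- " + p).join("\n");
  return [
    "SECTION: " + input.sectionName,
    "SPEC:",
    input.specContent,
    "SCREENSHOT: " + input.screenshotPath,
    "ASSETS:",
    assets,
    "GLOBAL STYLES:",
    input.globalStyles,
    "DONE WHEN: npx tsc --noEmit passes",
  ].join("\n");
}
```

&lt;p&gt;Keeping everything in one string makes the Builder's context explicit: if something is not in the prompt, the Builder is not expected to know it.&lt;/p&gt;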

&lt;h4&gt;
  
  
  Phase 5: Assembly &amp;amp; QA
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Orchestrator merges all worktree branches&lt;/span&gt;
git merge feature/hero-section feature/features-grid feature/pricing-section ...
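&lt;span class="c"&gt;# Resolve merge conflicts (Orchestrator has full context for smart resolution);&lt;/span&gt;
&lt;span class="c"&gt;# a multi-branch merge aborts on conflicts, so merge branches one at a time when they occur&lt;/span&gt;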

&lt;span class="c"&gt;# Wire up all section components in correct order in src/app/page.tsx&lt;/span&gt;

&lt;span class="c"&gt;# Final verification&lt;/span&gt;
npm run build    &lt;span class="c"&gt;# Must pass — no exceptions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Single Most Expensive Mistake: Wrong Interaction Model
&lt;/h3&gt;

&lt;p&gt;The SKILL file dedicates significant space to this because &lt;strong&gt;building a click-based UI when the original is scroll-driven means a complete rewrite, not a CSS tweak&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Identification protocol:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Don't click first.&lt;/strong&gt; Scroll slowly and observe if anything changes on its own.&lt;/li&gt;
&lt;li&gt;If yes → scroll-driven. Extract the mechanism: &lt;code&gt;IntersectionObserver&lt;/code&gt;, &lt;code&gt;scroll-snap&lt;/code&gt;, &lt;code&gt;position: sticky&lt;/code&gt;, &lt;code&gt;animation-timeline&lt;/code&gt;, or JS scroll listeners.&lt;/li&gt;
&lt;li&gt;If no → then test click/hover-driven interactivity.&lt;/li&gt;
&lt;li&gt;Document explicitly in the spec: &lt;code&gt;INTERACTION MODEL: scroll-driven with IntersectionObserver&lt;/code&gt; or &lt;code&gt;INTERACTION MODEL: click-to-switch with opacity transition&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
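
&lt;p&gt;The ordering of the protocol above can be sketched as a small decision function. This is only an illustration: &lt;code&gt;classifyInteraction&lt;/code&gt;, &lt;code&gt;specLine&lt;/code&gt;, and the model labels (including the &lt;code&gt;"static"&lt;/code&gt; fallback) are hypothetical names, and the two boolean inputs stand in for what a person or agent observes on the page.&lt;/p&gt;

```typescript
// Hypothetical names throughout; only the ordering of checks comes from the protocol.
type InteractionModel = "scroll-driven" | "click-driven" | "static";

// Steps 1-3: scrolling is tested first; click/hover is considered only
// when nothing changed while scrolling.
function classifyInteraction(
  changesOnScroll: boolean,
  changesOnClickOrHover: boolean
): InteractionModel {
  if (changesOnScroll) {
    return "scroll-driven";
  }
  return changesOnClickOrHover ? "click-driven" : "static";
}

// Step 4: produce the explicit line that goes into the spec file.
function specLine(model: InteractionModel, mechanism: string): string {
  return "INTERACTION MODEL: " + model + " with " + mechanism;
}
```

&lt;p&gt;For example, &lt;code&gt;specLine(classifyInteraction(true, true), "IntersectionObserver")&lt;/code&gt; yields the first spec line quoted in step 4, even though the element also responds to clicks.&lt;/p&gt;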

&lt;p&gt;Classic scroll-driven patterns to watch for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A sticky sidebar where the active item auto-changes as content scrolls past (IntersectionObserver, NOT click handlers)&lt;/li&gt;
&lt;li&gt;Tabbed/pill content that cycles automatically as the page scrolls (easy to misread and build as click-based)&lt;/li&gt;
&lt;li&gt;Smooth scroll libraries (Lenis, Locomotive Scroll) — check for &lt;code&gt;.lenis&lt;/code&gt; class or scroll container wrappers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Project Structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;src/&lt;/span&gt;
  &lt;span class="s"&gt;app/&lt;/span&gt;              &lt;span class="c1"&gt;# Next.js routes&lt;/span&gt;
  &lt;span class="s"&gt;components/&lt;/span&gt;
    &lt;span class="s"&gt;ui/&lt;/span&gt;             &lt;span class="c1"&gt;# shadcn/ui primitives&lt;/span&gt;
    &lt;span class="s"&gt;icons.tsx&lt;/span&gt;       &lt;span class="c1"&gt;# SVG icons extracted from target (React components)&lt;/span&gt;
  &lt;span class="s"&gt;lib/utils.ts&lt;/span&gt;      &lt;span class="c1"&gt;# cn() utility&lt;/span&gt;
  &lt;span class="s"&gt;types/&lt;/span&gt;            &lt;span class="c1"&gt;# TypeScript interfaces&lt;/span&gt;
  &lt;span class="s"&gt;hooks/&lt;/span&gt;            &lt;span class="c1"&gt;# Custom React hooks&lt;/span&gt;
&lt;span class="s"&gt;public/&lt;/span&gt;
  &lt;span class="s"&gt;images/&lt;/span&gt;           &lt;span class="c1"&gt;# Downloaded images from target&lt;/span&gt;
  &lt;span class="s"&gt;videos/&lt;/span&gt;           &lt;span class="c1"&gt;# Downloaded videos from target&lt;/span&gt;
  &lt;span class="s"&gt;seo/&lt;/span&gt;              &lt;span class="c1"&gt;# Favicons, OG images, webmanifest&lt;/span&gt;
&lt;span class="s"&gt;docs/&lt;/span&gt;
  &lt;span class="s"&gt;research/&lt;/span&gt;         &lt;span class="c1"&gt;# Extraction output: component specs, behaviors, topology&lt;/span&gt;
  &lt;span class="s"&gt;design-references/&lt;/span&gt; &lt;span class="c1"&gt;# Screenshots (desktop + mobile)&lt;/span&gt;
&lt;span class="s"&gt;scripts/&lt;/span&gt;
  &lt;span class="s"&gt;sync-agent-rules.sh&lt;/span&gt;  &lt;span class="c1"&gt;# Sync AGENTS.md to all platform formats&lt;/span&gt;
  &lt;span class="s"&gt;sync-skills.mjs&lt;/span&gt;      &lt;span class="c1"&gt;# Sync /clone-website to all platforms&lt;/span&gt;
&lt;span class="s"&gt;AGENTS.md&lt;/span&gt;           &lt;span class="c1"&gt;# Single source of truth for all agent instructions&lt;/span&gt;
&lt;span class="s"&gt;CLAUDE.md&lt;/span&gt;           &lt;span class="c1"&gt;# Claude Code config (imports AGENTS.md)&lt;/span&gt;
&lt;span class="s"&gt;GEMINI.md&lt;/span&gt;           &lt;span class="c1"&gt;# Gemini CLI config (imports AGENTS.md)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cross-Platform Design: One Source, Many Targets
&lt;/h3&gt;

&lt;p&gt;The template supports 13 AI coding platforms without maintaining 13 separate instruction sets:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What&lt;/th&gt;
&lt;th&gt;Source of Truth&lt;/th&gt;
&lt;th&gt;Sync Command&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Project instructions&lt;/td&gt;
&lt;td&gt;&lt;code&gt;AGENTS.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bash scripts/sync-agent-rules.sh&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;/clone-website&lt;/code&gt; skill&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.claude/skills/clone-website/SKILL.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;node scripts/sync-skills.mjs&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each script generates platform-specific copies automatically. Agents that can read the source files natively need no regeneration.&lt;/p&gt;

&lt;p&gt;This pattern — single source file plus generation scripts — is worth borrowing for any tool that needs to support multiple AI coding environments.&lt;/p&gt;
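
&lt;p&gt;The idea behind the pattern can be sketched in a few lines. This is not a reproduction of &lt;code&gt;sync-agent-rules.sh&lt;/code&gt; or &lt;code&gt;sync-skills.mjs&lt;/code&gt;; the &lt;code&gt;SyncTarget&lt;/code&gt; type, the &lt;code&gt;generateCopies&lt;/code&gt; function, and any target paths passed to it are hypothetical. The sketch assumes each platform copy is the shared source plus an optional platform-specific header, and it returns the generated contents instead of writing files.&lt;/p&gt;

```typescript
// Hypothetical sketch of "one source, many targets"; not the template's scripts.
interface SyncTarget {
  path: string; // where the generated copy would be written
  header: string; // platform-specific preamble, may be empty
}

// Derives every platform copy from the single source text.
// Returns a path-to-content map so the caller decides how to write files.
function generateCopies(
  source: string,
  targets: SyncTarget[]
): { [path: string]: string } {
  const copies: { [path: string]: string } = {};
  for (const target of targets) {
    copies[target.path] = target.header + source;
  }
  return copies;
}
```

&lt;p&gt;Because every copy is derived from the same source, the generated files can be overwritten on each run instead of being edited by hand.&lt;/p&gt;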




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Official Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;🌟 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/JCodesMore/ai-website-cloner-template" rel="noopener noreferrer"&gt;JCodesMore/ai-website-cloner-template&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🎬 &lt;strong&gt;Demo video&lt;/strong&gt;: YouTube link in the project README&lt;/li&gt;
&lt;li&gt;💬 &lt;strong&gt;Discord&lt;/strong&gt;: &lt;a href="https://discord.gg/hrTSX5yTpB" rel="noopener noreferrer"&gt;discord.gg/hrTSX5yTpB&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;ai-website-cloner-template's value isn't the tech stack choice (Next.js + shadcn + Tailwind is standard). It's the &lt;strong&gt;multi-agent collaboration protocol&lt;/strong&gt; that's been thought through carefully.&lt;/p&gt;

&lt;p&gt;A few design decisions worth internalizing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Completeness beats speed"&lt;/strong&gt;: Builder Agents receive everything before starting. No guessing mid-build. The constraint makes results more reliable at the cost of a slower reconnaissance phase — the author considers that tradeoff correct.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity budget (150-line spec = split signal)&lt;/strong&gt;: A mechanical rule controlling task scope, not a judgment call. Engineering discipline, not intuition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;git worktree parallelism&lt;/strong&gt;: Each Builder works in an isolated branch; the Orchestrator merges at the end. Parallelism isn't "run multiple tasks simultaneously" — it's isolated work with clear merge semantics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single source + sync scripts&lt;/strong&gt;: &lt;code&gt;AGENTS.md&lt;/code&gt; is the single source of truth for 13 platforms. One edit, one script run, all platforms updated. A pattern worth copying for any multi-environment AI toolchain.&lt;/p&gt;

&lt;p&gt;The underlying principle — define the process rigorously enough that any capable agent can follow it, rather than relying on one specific agent to improvise correctly — is applicable well beyond website cloning.&lt;/p&gt;





&lt;p&gt;&lt;em&gt;Visit my &lt;a href="https://home.wonlab.top" rel="noopener noreferrer"&gt;personal site&lt;/a&gt; for more insights and interesting products.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>frontend</category>
      <category>website</category>
      <category>development</category>
    </item>
    <item>
      <title>AI-Native Organizations Aren't About Buying Tools — They're About Making Waiting Disappear</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Sun, 28 Jun 2026 01:54:02 +0000</pubDate>
      <link>https://dev.to/wonderlab/ai-native-organizations-arent-about-buying-tools-theyre-about-making-waiting-disappear-47oj</link>
      <guid>https://dev.to/wonderlab/ai-native-organizations-arent-about-buying-tools-theyre-about-making-waiting-disappear-47oj</guid>
      <description>&lt;p&gt;I came across a piece on AI-native organizations recently, and it stuck with me. Let me work through it.&lt;/p&gt;

&lt;p&gt;Lots of companies are doing "AI transformation." Some build internal gateways and hand out API keys to everyone. Some run company-wide training programs. Some bolt a few "smart assistants" onto their existing systems. And some — this one's my favorite — post daily token usage rankings in the company chat, as if burning more compute is the same as being more advanced.&lt;/p&gt;

&lt;p&gt;Are any of these useful? Sure, a little. But none of them amounts to being AI-native.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;True AI-native isn't about how many models you use. It's about whether the waiting in your organization has decreased since AI showed up.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If collaboration still works the same way — people still queue up for approvals, tasks still get handed off layer by layer — then AI is just a new tool bolted onto an old organization. New hardware, running the same old operating system.&lt;/p&gt;




&lt;h2&gt;
  
  
  From "Tool" to "Operating System"
&lt;/h2&gt;

&lt;p&gt;YC partner Diana Hu once said: AI shouldn't just be a tool that companies use — it should become the operating system that companies run on.&lt;/p&gt;

&lt;p&gt;That's right. But it's easy to misread.&lt;/p&gt;

&lt;p&gt;When people hear "operating system," the first instinct is: does that mean we need to build an AI platform? Connect all our systems into it?&lt;/p&gt;

&lt;p&gt;That's exactly the problem. AI-native absolutely needs a technical foundation, but it's not fundamentally an IT project. The hard part isn't the systems — it's the unspoken habits, responsibility boundaries, and collaboration patterns baked into how an organization works.&lt;/p&gt;

&lt;p&gt;For a CEO, "AI as operating system" really means: AI needs to enter the rules of how the organization runs. It needs to influence how tasks get assigned, how information gets used, how accountability gets defined, and how people work together.&lt;/p&gt;

&lt;p&gt;If all you end up with is a new platform — but the rules haven't changed — that's not AI-native. It's a new system running inside an old organization.&lt;/p&gt;




&lt;h2&gt;
  
  
  Four Stages: Where Are You Now?
&lt;/h2&gt;

&lt;p&gt;Companies don't become AI-native overnight. They tend to go through four stages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 1: Individual Efficiency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Employees start using AI to write emails, meeting notes, and summaries. This is the most common starting point, and it's easy to get there.&lt;/p&gt;

&lt;p&gt;But it's not organizational change. Each person has just gotten faster at their own piece of the work. Collaboration hasn't changed. The people who needed to wait before still need to wait. The processes that needed to be walked through still do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 2: Node Automation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI starts entering fixed workflows to replace repetitive work. Customer feedback comes in and AI auto-tags it, assigns priority, routes it. A support ticket arrives and AI fills in missing info, classifies the issue, and suggests a resolution. A sales call ends and AI extracts customer needs, risk signals, and next steps.&lt;/p&gt;

&lt;p&gt;This goes further than Stage 1, but the process itself hasn't changed — you've just made one node inside the old process faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 3: Nodes Disappear&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the real turning point.&lt;/p&gt;

&lt;p&gt;AI isn't just speeding up a node anymore — it's making some nodes that once felt inevitable simply unnecessary.&lt;/p&gt;

&lt;p&gt;Before, many tasks had to wait in sequence. Now, employees can use AI to push the early-stage work to a reasonably complete state before any handoff — not necessarily the final version, but enough to keep things moving. You no longer need to wait on someone else, or another department, before the next step can start.&lt;/p&gt;

&lt;p&gt;This is when an organization starts to genuinely get lighter. AI isn't just changing the efficiency of a node — it's changing how work flows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 4: Organizational Restructuring&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once fewer tasks require passing through multiple layers, the old processes, roles, and accountability boundaries start getting redefined.&lt;/p&gt;

&lt;p&gt;Some roles stop being execution nodes and become standard-setters, quality gatekeepers, handlers of genuinely complex problems. Some workflows stop requiring multi-layer hand-offs and instead have one person or a small team close the loop, with key decision-makers stepping in only at judgment points.&lt;/p&gt;

&lt;p&gt;AI-native organizations don't happen because the CEO orders processes cut. They happen because new capabilities grow in, old nodes slowly lose their reason to exist, and the organization naturally gets lighter.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Roles: AI Builder and Human Decider
&lt;/h2&gt;

&lt;p&gt;Once an organization starts going AI-native, people's roles have to be redefined too.&lt;/p&gt;

&lt;p&gt;The relevant distinction in future organizations won't be job titles — it'll be two kinds of roles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Builder&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not necessarily an engineer. Could be product, operations, sales, finance — the job title doesn't matter. What matters is whether this person can take an ambiguous business problem and break it into a workflow AI can actually execute. Whether they can set standards, validate results, iterate, and turn something that worked once into something that can be reused.&lt;/p&gt;

&lt;p&gt;The scarcest thing in the AI era isn't people who know how to open a tool. It's &lt;strong&gt;people who understand the business and can close loops in collaboration with AI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human Decider&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This person's value isn't monitoring processes — it's making judgments. AI can generate 10 proposals, but which one aligns with company strategy? Which one carries brand risk? Which one looks effective short-term but will quietly damage the organization over time? These questions can't be delegated to a model. Models don't carry business accountability.&lt;/p&gt;

&lt;p&gt;Which means: the more capable AI gets, the more important Human Deciders become.&lt;/p&gt;

&lt;p&gt;One-sentence summary: &lt;strong&gt;AI Builders push work to the point where a judgment call is possible. Human Deciders decide whether to do it, which direction to go, and how far.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Five Steps — Not "Start by Cutting Processes"
&lt;/h2&gt;

&lt;p&gt;Here's a cleaner path forward.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Build an AI transformation team — with CEO authorization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This team cannot be a side project under IT. It can't be an HR-run training initiative. AI transformation will run into old processes, old divisions of responsibility, old power structures. Without CEO-level backing, any pilot will get slowly absorbed back into the old org.&lt;/p&gt;

&lt;p&gt;The team needs three capabilities together: know AI (understand what fits what scenario), know the business (identify real pain points, not just visible ones), know the organization (navigate the structural changes that follow).&lt;/p&gt;

&lt;p&gt;The goal isn't to build some AI assistants. It's to find the serial bottlenecks in the business most worth breaking through, run a proof of concept, and show the company: this wasn't impossible to do faster — we've just always done it the old way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Build the habit of "ask AI first"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But you can't mandate this. If you require everyone to use AI a certain number of times per day, they'll manufacture conversations. If you require departments to submit AI case studies, they'll dress something up to submit. The result is just another box-ticking exercise.&lt;/p&gt;

&lt;p&gt;A better approach: use incentives to surface genuine cases. A good case should answer three questions: How many people used to be involved in this? How many wait-steps did AI remove? Can this approach be reused by someone else in the same role doing the same thing?&lt;/p&gt;

&lt;p&gt;And share these cases company-wide — not to honor the early adopters, but so others can see: this thing can be done differently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Turn good cases into reusable Skills&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Running a case study program isn't enough on its own. If someone figures out a method and it stays on their laptop or in their head, the organization hasn't actually gotten stronger.&lt;/p&gt;

&lt;p&gt;The transformation team's job is to pull the pattern out of each good case and turn it into a reusable Skill — something a colleague in the same role, handling the same situation, can invoke without starting from scratch.&lt;/p&gt;

&lt;p&gt;The core of an AI-native organization isn't having a few AI experts. It's &lt;strong&gt;turning what the experts figured out into something the organization can replicate&lt;/strong&gt;. Otherwise, the person who's brilliant today leaves, and everyone starts over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Let old processes leave the main path&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once there's a library of Skills that lets employees push work to a "ready to discuss, ready to decide" state, the nodes that once required waiting in line become optional.&lt;/p&gt;

&lt;p&gt;Note: this isn't about slashing processes up front. Cutting things directly triggers defensiveness — from the employee's perspective, you're removing their reason to exist.&lt;/p&gt;

&lt;p&gt;The smarter framing is: change "must always go through this node" to "involved when it's actually needed."&lt;/p&gt;

&lt;p&gt;Concrete example: a business person writing a client proposal used to have to wait for marketing to provide materials, the data team to pull the analysis, and design to do the layout. With the right Skills in place, they can generate the initial client insight, proposal structure, copy, and visual draft themselves. Not a final version, but enough to start a real conversation.&lt;/p&gt;

&lt;p&gt;At that point, marketing's role shifts from "materials production node you always wait for" to "brand standards and content quality gatekeeper." The data team shifts from "report output you always wait for" to "accountable for the metric framework and genuinely complex analysis." Design shifts from "mechanical layout person" to "responsible for visual standards and high-stakes output quality."&lt;/p&gt;

&lt;p&gt;Nodes go from "mandatory stops" to "high-value judgment points." People don't disappear — they exit low-value execution and move toward work that requires judgment, definition, and oversight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: IT transformation comes last&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Many companies get this sequence completely backwards. Start talking AI transformation, immediately kick off a systems project, spend a year building a platform, employees don't use it, business habits haven't changed — and you've just created a fancier version of the previous IT initiative.&lt;/p&gt;

&lt;p&gt;The right order: change people's habits first, then crystallize the business method, then evaluate which methods are worth systematizing, then build the IT layer.&lt;/p&gt;

&lt;p&gt;Early on, you genuinely don't know which use cases are real. If you build the AI platform first, you risk locking in the wrong processes at a higher level of sophistication.&lt;/p&gt;

&lt;p&gt;CEOs: resist the urge to immediately ask "should we build an AI platform?" or "should we connect everything into a unified agent system?" Ask instead: which role has already demonstrated real efficiency gains with AI? Which Skill is getting repeatedly invoked? Which old node has already stopped being a required stop? Which scenario is genuinely worth systematizing?&lt;/p&gt;

&lt;p&gt;When those questions have answers, IT investment won't be wasted. &lt;strong&gt;The IT system is not the starting point of AI-native. It's what gets built after the organization's new habits have stabilized.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Things CEOs Actually Need to Do
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use AI yourself.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You don't need to understand model training or infrastructure. But you need to actually use AI. Only then will you avoid being misled by subordinates, avoid getting swept up in a flashy demo and mandating it company-wide, and avoid dismissing everything because you heard AI hallucinates sometimes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shield and authorize the transformation team.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI-native can't be achieved by a side project. Without CEO-level protection, every pilot gets slowly ground down by the existing org structure and turns into another "exciting initiative that faded out."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Change where resources flow.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Teams that produce real results with AI get more resources. People who can independently close the loop on problems with AI are rewarded more.&lt;/p&gt;

&lt;p&gt;The flip side: teams that consistently reject AI, depend on inefficient handoffs, and pass problems along to processes and other departments should not keep receiving the same resources.&lt;/p&gt;

&lt;p&gt;Organizational change doesn't happen through slogans. &lt;strong&gt;It happens through where resources flow.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  In the End
&lt;/h2&gt;

&lt;p&gt;LLM capability will spread like electricity or running water — it'll become basic infrastructure. Buying compute, buying accounts, building systems: none of that creates a moat.&lt;/p&gt;

&lt;p&gt;What will beat you isn't a competitor of similar size with similar bloat. It'll be a team one-tenth your headcount where every person can close loops with AI, move fast, and make calls. Every business cycle they run is 10x faster than yours.&lt;/p&gt;

&lt;p&gt;AI is, at its core, an evolutionary event. It eliminates the mechanical relay nodes, the redundant approval chains, and the managers who hide behind process to avoid accountability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Making large companies lighter, making slow companies faster — that's what AI-native actually means.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>efficiency</category>
      <category>organization</category>
    </item>
    <item>
      <title>Open Source Project #107: page-agent — A Pure-JS GUI Agent for the Browser, No Screenshots, No Plugins, No Backend</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Sat, 27 Jun 2026 00:16:03 +0000</pubDate>
      <link>https://dev.to/wonderlab/open-source-project-107-page-agent-a-pure-js-gui-agent-for-the-browser-no-screenshots-no-51hm</link>
      <guid>https://dev.to/wonderlab/open-source-project-107-page-agent-a-pure-js-gui-agent-for-the-browser-no-screenshots-no-51hm</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"One script tag. Natural language controls your web app. No screenshots, no browser extensions, no backend, no Python."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is &lt;strong&gt;article #107&lt;/strong&gt; in the "One Open Source Project a Day" series. Today's project is &lt;strong&gt;page-agent&lt;/strong&gt; — Alibaba's open-source client-side GUI Agent library.&lt;/p&gt;

&lt;p&gt;The dominant approach to making AI operate a browser follows one path: screenshot → multimodal LLM recognizes elements → execute action. That path has two costs: multimodal models (expensive) and a server or headless browser environment (complex infrastructure).&lt;/p&gt;

&lt;p&gt;page-agent's answer: serializing the DOM to text is enough. Assign numeric indices to interactive elements, send the text to any LLM, get back "click element 3" as a tool call, execute it directly in the browser. The entire loop stays in-page. No screenshots. No multimodal capability needed.&lt;/p&gt;
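
&lt;p&gt;To make the indexing idea concrete, here is a minimal sketch. It does not use page-agent's API or its actual output format: &lt;code&gt;PageElement&lt;/code&gt; is a simplified, hypothetical stand-in for a DOM node, and the &lt;code&gt;[n] tag: text&lt;/code&gt; line format is invented for illustration.&lt;/p&gt;

```typescript
// Simplified, hypothetical stand-in for a DOM element; a real
// implementation would walk the live DOM instead of a flat array.
interface PageElement {
  tag: string;
  text: string;
  interactive: boolean;
}

// Renders the page as text, giving each interactive element a numeric
// index that the LLM can refer to in a tool call.
function serialize(elements: PageElement[]): string {
  const lines: string[] = [];
  let nextIndex = 0;
  for (const el of elements) {
    if (el.interactive) {
      lines.push("[" + nextIndex + "] " + el.tag + ": " + el.text);
      nextIndex += 1;
    } else {
      lines.push(el.tag + ": " + el.text);
    }
  }
  return lines.join("\n");
}
```

&lt;p&gt;Only interactive elements consume an index, so an instruction such as "click element 1" maps to exactly one element in the serialized text.&lt;/p&gt;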

&lt;h3&gt;
  
  
  What You'll Learn
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Text-based DOM: how to turn a page into an LLM-readable indexed structure&lt;/li&gt;
&lt;li&gt;ReAct loop architecture: Observe → Think (reflection + action) → Act, fully implemented in TypeScript&lt;/li&gt;
&lt;li&gt;Reflection-Before-Action model: evaluating the previous step before planning the next&lt;/li&gt;
&lt;li&gt;Built-in tool system: how click, input, scroll, and JS execution tools are defined&lt;/li&gt;
&lt;li&gt;Single package vs. core package: &lt;code&gt;page-agent&lt;/code&gt; (with UI) and &lt;code&gt;@page-agent/core&lt;/code&gt; (pure logic)&lt;/li&gt;
&lt;li&gt;Chrome extension + MCP Server for cross-page capability&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Understanding of LLM function calling (tool use) mechanics&lt;/li&gt;
&lt;li&gt;Basic knowledge of DOM and browser events&lt;/li&gt;
&lt;li&gt;Experience with OpenAI SDK or similar APIs&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Project Background
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;page-agent is a pure client-side GUI Agent library that embeds LLM reasoning directly into web pages. It understands page structure through serialized DOM text and executes actions without leaving the browser.&lt;/p&gt;

&lt;p&gt;The core architectural decision: &lt;strong&gt;DOM text instead of screenshots&lt;/strong&gt;. The DOM already fully describes what elements exist on the page, what type they are, and whether they're currently interactive. Serializing this to text is more precise than a screenshot (button labels are never blurry), cheaper (no multimodal models), and faster (DOM reads are synchronous).&lt;/p&gt;

&lt;p&gt;The project acknowledges &lt;a href="https://github.com/browser-use/browser-use" rel="noopener noreferrer"&gt;&lt;code&gt;browser-use&lt;/code&gt;&lt;/a&gt; (server-side Python browser automation) as its inspiration. page-agent's positioning is the client-side counterpart: runs inside the page, not controlling a headless browser from a server.&lt;/p&gt;

&lt;h3&gt;
  
  
  Author / Team
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Organization&lt;/strong&gt;: Alibaba&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Primary Language&lt;/strong&gt;: TypeScript&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: MIT&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm packages&lt;/strong&gt;: &lt;code&gt;page-agent&lt;/code&gt; (with UI Panel), &lt;code&gt;@page-agent/core&lt;/code&gt; (pure agent logic)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latest version&lt;/strong&gt;: v1.10.0&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Project Stats
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;📄 License: MIT&lt;/li&gt;
&lt;li&gt;📦 npm: &lt;code&gt;page-agent&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;💻 Stack: TypeScript + npm workspaces + Vite&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What It Does
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traditional browser GUI Agent (screenshot path):
Page → Screenshot → Multimodal LLM visual understanding → Return coordinates/elements → Execute
Cost: multimodal API fees + server/headless browser infrastructure

page-agent (text DOM path):
Page → Serialize DOM with indexed interactive elements → Pure text LLM reasoning
    → Return tool call: { click_element_by_index: { index: 2 } }
    → Execute directly in the page
Cost: text LLM (cheaper) + zero backend (pure frontend JS)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Use Cases
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;SaaS AI Copilot&lt;/strong&gt;: Embed an AI assistant directly in your product — user says "create a new project named X" and it happens. No backend rewrite needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart form filling&lt;/strong&gt;: Compress 20-click workflows in ERP/CRM/admin systems into a single sentence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accessibility&lt;/strong&gt;: Add natural language control to any web application — voice commands, screen reader enhancement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-tab Agent&lt;/strong&gt;: With the Chrome extension, agents can work across tabs (e.g., read data from a spreadsheet tab, fill it into a form in another tab)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP control&lt;/strong&gt;: Through the MCP Server (Beta), external agent clients can control the browser&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;One-line integration (free Demo LLM, technical evaluation only):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://cdn.jsdelivr.net/npm/page-agent@1.10.0/dist/iife/page-agent.demo.js"&lt;/span&gt; &lt;span class="na"&gt;crossorigin=&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After loading, a floating Agent panel appears in the bottom-right corner. Type natural language commands directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NPM installation (production):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;page-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;PageAgent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;page-agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PageAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-4o&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.openai.com/v1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en-US&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Click the login button, then fill in the username as test@example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;success&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Using &lt;code&gt;@page-agent/core&lt;/code&gt; (no UI panel, embedded scenarios):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;PageAgentCore&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@page-agent/core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;PageController&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@page-agent/page-controller&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pageController&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PageController&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;enableMask&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PageAgentCore&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;pageController&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-4o&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sk-...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;maxSteps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;onAfterStep&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;step completed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;at&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Find the lowest-priced item in the product list and add it to the cart&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Supported models&lt;/strong&gt; (any OpenAI-compatible endpoint works):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Models&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;gpt-4o, gpt-4-turbo, gpt-5.2, gpt-5.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;claude-opus-4.8, claude-sonnet-4, claude-haiku-3.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alibaba&lt;/td&gt;
&lt;td&gt;qwen3.5-plus, qwen3.6-max, qwen3.6-flash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;td&gt;deepseek-chat, deepseek-reasoner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;gemini-2.0-flash (via OpenAI-compatible endpoint)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;Ollama (qwen3:14b, tested on RTX 3090 24GB)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Text-Based DOM: The Core Technique
&lt;/h3&gt;

&lt;p&gt;page-agent's &lt;code&gt;PageController&lt;/code&gt; converts the DOM into an indexed simplified text structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;Raw DOM (simplified):
&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"form-container"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;input&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"text"&lt;/span&gt; &lt;span class="na"&gt;placeholder=&lt;/span&gt;&lt;span class="s"&gt;"Username"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;input&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"password"&lt;/span&gt; &lt;span class="na"&gt;placeholder=&lt;/span&gt;&lt;span class="s"&gt;"Password"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;button&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"btn-primary"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Sign In&lt;span class="nt"&gt;&amp;lt;/button&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;a&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"/register"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Create account&lt;span class="nt"&gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;

Serialized for LLM (interactive elements with indices):
URL: https://example.com/login
Title: Login - My App

[0]&lt;span class="nt"&gt;&amp;lt;input&lt;/span&gt; &lt;span class="na"&gt;placeholder=&lt;/span&gt;&lt;span class="s"&gt;"Username"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
[1]&lt;span class="nt"&gt;&amp;lt;input&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"password"&lt;/span&gt; &lt;span class="na"&gt;placeholder=&lt;/span&gt;&lt;span class="s"&gt;"Password"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
[2]&lt;span class="nt"&gt;&amp;lt;button&amp;gt;&lt;/span&gt;Sign In&lt;span class="nt"&gt;&amp;lt;/button&amp;gt;&lt;/span&gt;
[3]&lt;span class="nt"&gt;&amp;lt;a&amp;gt;&lt;/span&gt;Create account&lt;span class="nt"&gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each interactive element gets a numeric index &lt;code&gt;[N]&lt;/code&gt;. The LLM only needs to return "click [2]" or "input 'admin@example.com' into [0]."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DOM processing pipeline:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Live DOM
    ↓ dom_tree/ module
FlatDomTree (flattened tree with DomNode map)
    ↓ Dehydration (simplification)
Indexed text representation
    ↓
LLM context input
    ↓
LLM returns tool call: { click_element_by_index: { index: 2 } }
    ↓
PageController.clickElement(2) → locate HTMLElement by index → fire click
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only elements meeting all three criteria are indexed (to reduce noise):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;isVisible: true&lt;/code&gt;: the element is in the viewport or can be scrolled into it&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;isInteractive: true&lt;/code&gt;: the element can be clicked, typed into, or selected&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;isTopElement: true&lt;/code&gt;: the element is not covered by another element&lt;/li&gt;
&lt;/ul&gt;
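
&lt;p&gt;The three-flag filter and the index assignment can be sketched as follows. This is a minimal illustration, not page-agent's source: the node records, &lt;code&gt;isIndexable&lt;/code&gt;, and &lt;code&gt;serializeForLLM&lt;/code&gt; are hypothetical names, and the output omits the angle brackets of the real serialization for brevity.&lt;/p&gt;

```javascript
// Hypothetical, simplified node records for illustration; the real DomNode shape may differ.
const sampleNodes = [
  { tag: 'input', text: '', attrs: { placeholder: 'Username' }, isVisible: true, isInteractive: true, isTopElement: true },
  { tag: 'div', text: 'Welcome', attrs: {}, isVisible: true, isInteractive: false, isTopElement: true },
  { tag: 'button', text: 'Sign In', attrs: {}, isVisible: true, isInteractive: true, isTopElement: true },
  { tag: 'button', text: 'Covered', attrs: {}, isVisible: true, isInteractive: true, isTopElement: false },
]

// A node is indexed only when all three flags are true.
function isIndexable(node) {
  return [node.isVisible, node.isInteractive, node.isTopElement].every(Boolean)
}

// Number the surviving nodes in document order and render one text line per node.
function serializeForLLM(url, title, nodes) {
  const lines = nodes.filter(isIndexable).map((node, index) => {
    const attrs = Object.keys(node.attrs)
      .map((name) => ` ${name}="${node.attrs[name]}"`)
      .join('')
    const text = node.text ? ` ${node.text}` : ''
    return `[${index}] ${node.tag}${attrs}${text}`
  })
  return [`URL: ${url}`, `Title: ${title}`, '', ...lines].join('\n')
}

console.log(serializeForLLM('https://example.com/login', 'Login - My App', sampleNodes))
// URL: https://example.com/login
// Title: Login - My App
//
// [0] input placeholder="Username"
// [1] button Sign In
```

&lt;p&gt;The non-interactive &lt;code&gt;div&lt;/code&gt; and the covered button never receive an index, so the model cannot target them by mistake.&lt;/p&gt;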

&lt;h3&gt;
  
  
  ReAct Loop Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;agent.execute("complete some task")
    ↓
┌─────────────────────────────────────────────┐
│         Main loop (up to maxSteps)           │
│                                             │
│  ① Observe                                  │
│     pageController.updateTree()             │
│     → refresh DOM, get current page text    │
│                                             │
│  ② Think (LLM call)                         │
│     Input: system prompt + history + DOM    │
│     LLM output (Reflection model):          │
│     {                                       │
│       evaluation_previous_goal: "...",  ←── how did the last step go?
│       memory: "...",                    ←── what to remember?
│       next_goal: "...",                 ←── what to do next?
│       action: { click_element_by_index: {index: 2} }
│     }                                       │
│                                             │
│  ③ Act                                      │
│     Execute the tool call returned by LLM   │
│     Record to history (persistent, visible  │
│     to LLM in next steps)                   │
│                                             │
│  ④ Check termination                        │
│     LLM calls done tool → return result     │
│     OR maxSteps reached                     │
└─────────────────────────────────────────────┘
    ↓
ExecutionResult { success, message, history }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
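
&lt;p&gt;The loop in the diagram can be approximated in a few lines. The sketch below is an assumption-laden illustration rather than the library's code: &lt;code&gt;runLoop&lt;/code&gt;, &lt;code&gt;createStubDeps&lt;/code&gt;, and the &lt;code&gt;observe&lt;/code&gt;/&lt;code&gt;think&lt;/code&gt;/&lt;code&gt;act&lt;/code&gt; callbacks are hypothetical, and the "LLM" is a stub that clicks once and then calls &lt;code&gt;done&lt;/code&gt;.&lt;/p&gt;

```javascript
// Hypothetical sketch of the Observe / Think / Act loop; not page-agent's source.
async function runLoop(task, deps, maxSteps) {
  const history = []
  for (let step = 1; maxSteps >= step; step++) {
    const pageText = deps.observe()                           // 1. Observe
    const output = await deps.think(task, pageText, history)  // 2. Think
    const toolName = Object.keys(output.action)[0]
    const args = output.action[toolName]
    if (toolName === 'done') {                                // 4. Check termination
      return { success: Boolean(args.success), message: String(args.message), history }
    }
    const result = await deps.act(toolName, args)             // 3. Act
    history.push({ step, ...output, result })                 // visible to later steps
  }
  return { success: false, message: 'maxSteps reached', history }
}

// Stub dependencies standing in for the page controller and the LLM client.
function createStubDeps() {
  const clicked = []
  return {
    observe: () => (clicked.length > 0 ? 'Title: Dashboard' : 'Title: Login\n[2] button Sign In'),
    think: async (task, pageText, history) =>
      history.length === 0
        ? { evaluation_previous_goal: 'No previous step', memory: 'Nothing done yet', next_goal: 'Click Sign In', action: { click_element_by_index: { index: 2 } } }
        : { evaluation_previous_goal: 'Click succeeded', memory: 'Signed in', next_goal: 'Finish', action: { done: { success: true, message: 'Signed in' } } },
    act: async (toolName, args) => {
      clicked.push(args.index)
      return `${toolName} ok`
    },
  }
}

runLoop('Sign in', createStubDeps(), 20).then((result) => {
  console.log(result.success, result.message, result.history.length) // logs: true Signed in 1
})
```

&lt;p&gt;Because every recorded step is pushed into &lt;code&gt;history&lt;/code&gt; and handed back to &lt;code&gt;think&lt;/code&gt;, the model sees its own earlier reflections on each iteration.&lt;/p&gt;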



&lt;h3&gt;
  
  
  Reflection-Before-Action Model
&lt;/h3&gt;

&lt;p&gt;Before each LLM call, the previous step's result and full history are passed to the model, which is required to reflect before acting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"evaluation_previous_goal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Successfully clicked the login button, page navigated to the dashboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"memory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Login complete, currently on the dashboard. Still need to find the settings page."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"next_goal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Locate and click the Settings or Account option in the navigation bar"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"click_element_by_index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;evaluation_previous_goal&lt;/code&gt; forces the model to evaluate the previous step's outcome before proceeding — preventing blind continuation (e.g., if a click triggered nothing, the model should pivot rather than click again).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;memory&lt;/code&gt; is the short-term memory mechanism: compressing key progress into 1-3 sentences that persist in history, so the LLM doesn't "forget" what it has already accomplished in a long multi-step task.&lt;/p&gt;

&lt;h3&gt;
  
  
  Built-in Tool System
&lt;/h3&gt;

&lt;p&gt;Tools the LLM can call via function calling:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;click_element_by_index&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Click element at specified index&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;input_text&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Type text into input field at index&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;select_dropdown_option&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Select dropdown option by text content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;scroll&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Vertical scroll (by pages or pixels)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;scroll_horizontally&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Horizontal scroll&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;execute_javascript&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Execute arbitrary JS (AbortSignal supported)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;wait&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Wait 1-10 seconds (for page loads or animations)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ask_user&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ask the user a question (human-in-the-loop node)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;done&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Task complete, return result&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Custom tools (extend or override defaults):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
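&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;PageAgent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;page-agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="c1"&gt;// Assumes zod is installed for the input schema; the import path may differ by zod version&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PageAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;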
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-4o&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;customTools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Add a custom tool&lt;/span&gt;
        &lt;span class="na"&gt;get_current_user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Get the currently logged-in user information&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({}),&lt;/span&gt;
            &lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="c1"&gt;// fetchCurrentUser is a placeholder for your app's own helper&lt;/span&gt;
                &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchCurrentUser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="c1"&gt;// Set to null to disable a built-in tool&lt;/span&gt;
        &lt;span class="na"&gt;execute_javascript&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
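
&lt;p&gt;One way to picture how &lt;code&gt;customTools&lt;/code&gt; could combine with the built-in set is a simple merge in which &lt;code&gt;null&lt;/code&gt; removes an entry. The &lt;code&gt;mergeTools&lt;/code&gt; helper below is hypothetical and only mirrors the behavior described in the comments above; the library's actual merge logic is not shown in this article.&lt;/p&gt;

```javascript
// Hypothetical merge helper: custom entries add or override, null disables a built-in.
function mergeTools(builtIn, custom) {
  const merged = { ...builtIn }
  for (const name of Object.keys(custom)) {
    if (custom[name] === null) {
      delete merged[name] // null disables the tool
    } else {
      merged[name] = custom[name] // add a new tool or override a default
    }
  }
  return merged
}

const builtInTools = {
  click_element_by_index: { description: 'Click element at specified index' },
  execute_javascript: { description: 'Execute arbitrary JS' },
}

const activeTools = mergeTools(builtInTools, {
  get_current_user: { description: 'Get the currently logged-in user information' },
  execute_javascript: null,
})

console.log(Object.keys(activeTools)) // logs: [ 'click_element_by_index', 'get_current_user' ]
```

&lt;p&gt;Disabling &lt;code&gt;execute_javascript&lt;/code&gt; this way keeps it out of the tool list sent to the model, which is a reasonable default for pages that handle sensitive data.&lt;/p&gt;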



&lt;h3&gt;
  
  
  Monorepo Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;packages/
├── page-agent/        → Main package (npm: page-agent), includes UI panel
├── core/              → Pure agent logic (npm: @page-agent/core), no UI
├── llms/              → LLM client, multi-provider support
├── page-controller/   → DOM operations and visual feedback
├── ui/                → Control panel + i18n, decoupled from core
├── extension/         → Chrome extension (WXT + React)
└── website/           → Documentation site (React)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key module boundary design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;page-controller&lt;/code&gt; has no knowledge of LLMs — only handles DOM operations&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;llms&lt;/code&gt; has no knowledge of page structure — only handles LLM communication&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;core&lt;/code&gt; combines them into the ReAct loop&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;page-agent&lt;/code&gt; (main package) adds a UI panel on top of &lt;code&gt;core&lt;/code&gt; — both are independently usable&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How the LLM Client Works
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;llms&lt;/code&gt; package uses a &lt;strong&gt;MacroTool&lt;/strong&gt; pattern that wraps the reflection model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Each LLM call expects this structure back&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;MacroToolInput&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nb"&gt;Partial&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;AgentReflection&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;AgentReflection&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;evaluation_previous_goal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;  &lt;span class="c1"&gt;// "Previous action succeeded/failed because..."&lt;/span&gt;
    &lt;span class="nx"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;                    &lt;span class="c1"&gt;// "Key facts to remember: ..."&lt;/span&gt;
    &lt;span class="nx"&gt;next_goal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;                 &lt;span class="c1"&gt;// "Next I will..."&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
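
&lt;p&gt;A consumer of this structure has to tolerate missing reflection fields (they are &lt;code&gt;Partial&lt;/code&gt;) while insisting on an &lt;code&gt;action&lt;/code&gt;. The validator below is a hypothetical sketch of that check; &lt;code&gt;parseMacroToolInput&lt;/code&gt; is not a function from the &lt;code&gt;llms&lt;/code&gt; package, and the "exactly one tool per action" rule is an assumption drawn from the one-action-per-step design.&lt;/p&gt;

```javascript
// Hypothetical validator for the MacroToolInput shape; not taken from the llms package.
function parseMacroToolInput(rawJson) {
  const parsed = JSON.parse(rawJson)
  const action = parsed === null ? undefined : parsed.action
  const isPlainObject = typeof action === 'object' ? !(action === null || Array.isArray(action)) : false
  if (!isPlainObject) throw new Error('action must be an object')
  const toolNames = Object.keys(action)
  if (toolNames.length !== 1) throw new Error('expected exactly one tool call in action')
  const toolName = toolNames[0]
  return {
    // Reflection fields are optional, so fall back to empty strings.
    evaluation_previous_goal: parsed.evaluation_previous_goal ?? '',
    memory: parsed.memory ?? '',
    next_goal: parsed.next_goal ?? '',
    toolName,
    args: action[toolName],
  }
}

const parsedExample = parseMacroToolInput('{"next_goal":"Open settings","action":{"click_element_by_index":{"index":5}}}')
console.log(parsedExample.toolName, parsedExample.args.index) // logs: click_element_by_index 5
```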



&lt;p&gt;The OpenAI-compatible client handles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Converting tools to OpenAI function-calling format&lt;/li&gt;
&lt;li&gt;Building requests with &lt;code&gt;parallel_tool_calls: false&lt;/code&gt; (one action per step)&lt;/li&gt;
&lt;li&gt;Provider-specific patches (DeepSeek: disable explicit &lt;code&gt;tool_choice&lt;/code&gt;; MiniMax: temperature/tool-call compatibility)&lt;/li&gt;
&lt;li&gt;Automatic retry with exponential backoff for 429/500 errors&lt;/li&gt;
&lt;li&gt;Token usage tracking including prompt cache hits and reasoning tokens&lt;/li&gt;
&lt;/ol&gt;
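
&lt;p&gt;Item 4 is a standard pattern and can be sketched independently of the library. In the example below, &lt;code&gt;withRetry&lt;/code&gt;, its option names, and the &lt;code&gt;status&lt;/code&gt; property on the error are assumptions for illustration; only the retryable status codes and the 100ms base come from the article. The &lt;code&gt;sleep&lt;/code&gt; function is injected so the delays can be observed without real waiting.&lt;/p&gt;

```javascript
// Hypothetical helper; page-agent's real retry logic may differ in detail.
const RETRYABLE_STATUS = [429, 500]

async function withRetry(call, options) {
  const { maxRetries, baseDelayMs, sleep } = options
  for (let attempt = 0; ; attempt++) {
    try {
      return await call()
    } catch (error) {
      const retryable = RETRYABLE_STATUS.includes(error.status)
      // Give up on non-retryable errors or once the retry budget is spent.
      if (!retryable || attempt >= maxRetries) throw error
      await sleep(baseDelayMs * 2 ** attempt) // 100, 200, 400, ...
    }
  }
}

// Usage: a call that is rate limited twice, then succeeds.
const recordedDelays = []
let callCount = 0
const flakyCall = async () => {
  callCount += 1
  if (3 > callCount) throw Object.assign(new Error('rate limited'), { status: 429 })
  return 'ok'
}

withRetry(flakyCall, { maxRetries: 3, baseDelayMs: 100, sleep: async (ms) => { recordedDelays.push(ms) } })
  .then((value) => console.log(value, recordedDelays)) // logs: ok [ 100, 200 ]
```

&lt;p&gt;In production the injected &lt;code&gt;sleep&lt;/code&gt; would wrap &lt;code&gt;setTimeout&lt;/code&gt;; adding random jitter to each delay is a common refinement when many clients retry at once.&lt;/p&gt;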

&lt;h3&gt;
  
  
  Cross-Page Capability: Chrome Extension + MCP
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chrome Extension&lt;/strong&gt; (for cross-tab workflows):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent runs in an extension-controlled context that can switch tabs and read each page's DOM&lt;/li&gt;
&lt;li&gt;Use case: "Read data from a spreadsheet tab, paste it into a form in another tab"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;MCP Server (Beta)&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exposes page-agent as MCP tools, letting external agent clients (Claude Desktop, Claude Code) control the browser remotely&lt;/li&gt;
&lt;li&gt;Use case: connecting browser control capability into a larger agent workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reliability Design
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Step delay&lt;/strong&gt;: 400ms between steps — breathing room for page renders and network requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Click wait&lt;/strong&gt;: 200ms after click operations, giving DOM updates time to finish&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency guard&lt;/strong&gt;: Prevents concurrent &lt;code&gt;execute()&lt;/code&gt; calls to avoid race conditions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AbortSignal support&lt;/strong&gt;: All tool execution honors cancellation signals; &lt;code&gt;execute_javascript&lt;/code&gt; is interruptible&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic retry&lt;/strong&gt;: LLM failures (429 rate limits, 500 server errors) auto-retry with exponential backoff (100ms base)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token tracking&lt;/strong&gt;: Every step records &lt;code&gt;promptTokens&lt;/code&gt;, &lt;code&gt;completionTokens&lt;/code&gt;, prompt cache hits, and reasoning tokens&lt;/li&gt;
&lt;/ul&gt;
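
&lt;p&gt;The concurrency guard is the simplest of these to illustrate. The sketch below assumes the guard rejects a second call rather than queueing it, which the article does not specify; &lt;code&gt;createGuardedExecutor&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```javascript
// Hypothetical wrapper: allows one run at a time and rejects overlapping calls.
function createGuardedExecutor(runTask) {
  let running = false
  return async function execute(task) {
    if (running) throw new Error('execute() is already running')
    running = true
    try {
      return await runTask(task)
    } finally {
      running = false // always release the guard, even when the task fails
    }
  }
}

const guardedExecute = createGuardedExecutor(async (task) => `finished: ${task}`)

const firstRun = guardedExecute('task A')
// The second call starts while task A is still pending, so it is rejected.
guardedExecute('task B').catch((error) => console.log(error.message))
firstRun.then((message) => console.log(message))
```

&lt;p&gt;Releasing the flag in &lt;code&gt;finally&lt;/code&gt; matters: without it, one failed task would block every later call.&lt;/p&gt;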




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Official Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;🌟 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/alibaba/page-agent" rel="noopener noreferrer"&gt;alibaba/page-agent&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🚀 &lt;strong&gt;Live Demo&lt;/strong&gt;: &lt;a href="https://alibaba.github.io/page-agent/" rel="noopener noreferrer"&gt;alibaba.github.io/page-agent&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 &lt;strong&gt;Documentation&lt;/strong&gt;: &lt;a href="https://alibaba.github.io/page-agent/docs/introduction/overview" rel="noopener noreferrer"&gt;alibaba.github.io/page-agent/docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📦 &lt;strong&gt;npm&lt;/strong&gt;: &lt;a href="https://www.npmjs.com/package/page-agent" rel="noopener noreferrer"&gt;page-agent&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;page-agent's core insight is that browser automation doesn't need to "look at" the page. The DOM is already a structured description of what exists on the page — serializing it to indexed text lets any text-only LLM understand and operate it directly. No multimodal model, no visual understanding, no coordinate targeting.&lt;/p&gt;

&lt;p&gt;This insight dramatically lowers the barrier for GUI agents: no server required, no headless browser, no screenshots, no extra infrastructure — a single script tag or npm package running in the user's browser.&lt;/p&gt;

&lt;p&gt;The Reflection-Before-Action model (evaluate last step → plan next step) combined with persistent &lt;code&gt;memory&lt;/code&gt; in history is what lets multi-step tasks stay on track. The agent doesn't just chain commands blindly; it continuously re-evaluates whether its actions are having the intended effect.&lt;/p&gt;

&lt;p&gt;For developers who want to add AI Copilot capability to a web product, or automate internal tool workflows without touching backend infrastructure, page-agent is one of the lowest-friction options currently available.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Visit my &lt;a href="https://home.wonlab.top/en" rel="noopener noreferrer"&gt;personal site&lt;/a&gt; for more insights and interesting products.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>agents</category>
      <category>llm</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Workflow Series (01): Foundations — Three Execution Models and Anthropic's Five Patterns</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Fri, 26 Jun 2026 23:31:07 +0000</pubDate>
      <link>https://dev.to/wonderlab/workflow-series-01-foundations-three-execution-models-and-anthropics-five-patterns-2943</link>
      <guid>https://dev.to/wonderlab/workflow-series-01-foundations-three-execution-models-and-anthropics-five-patterns-2943</guid>
      <description>&lt;h2&gt;
  
  
  Workflows Are Not Flowcharts
&lt;/h2&gt;

&lt;p&gt;Traditional workflows are deterministic: each node is code, branch conditions are boolean expressions, failures are predefined exception types. Same input, same output, every time.&lt;/p&gt;

&lt;p&gt;Agent Workflows break those three assumptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traditional Workflow (Airflow / n8n):
  Node             = Python function / API call (deterministic)
  Branch condition = x &amp;gt; 0 (boolean expression)
  Failure handling = try/except (predefined exception types)

Agent Workflow:
  Node             = LLM + tools (non-deterministic output)
  Branch condition = confidence ≥ 95% (semantic judgment)
  Failure handling = retry + human escalation + semantic fallback
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agent Workflows are not an upgraded flowchart. They run on one of three fundamentally different execution models.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Execution Models
&lt;/h2&gt;

&lt;h3&gt;
  
  
  DAG (Directed Acyclic Graph)
&lt;/h3&gt;

&lt;p&gt;Control flow is fully determined before execution begins. Nodes only move forward — no loops.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Examples: Airflow, n8n, GitHub Actions
Use case: data pipelines, ETL, scheduled jobs

Properties:
  → Structure is visualizable, execution order is transparent
  → No cycles (retry logic requires workarounds)
  → Right for deterministic data processing, wrong for AI tasks requiring dynamic branching
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  State Machine
&lt;/h3&gt;

&lt;p&gt;The system has a finite set of states. Events trigger state transitions: current state + event → next state.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Examples: LangGraph, custom workflows with JSON state files
Use case: business processes with branching, retries, and human approval gates

Properties:
  → System state is serializable at any moment (supports interrupt and resume)
  → Cycles are native (retry = state rollback)
  → workflow_state.json is the state file; approval gates are waiting-for-event states
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A Bug fix workflow as a state machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;States: analyzing / fixing / reviewing / waiting_human / done / failed
Events: analyze_done / fix_passed / human_approved / timeout
Transitions:
  analyzing + analyze_done      → fixing
  fixing    + fix_passed        → reviewing
  reviewing + human_approved    → done
  reviewing + timeout           → waiting_human (escalate)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
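
&lt;p&gt;The transition list above can be sketched as a lookup table keyed by (state, event). This is a minimal illustration, not a LangGraph API: the names &lt;code&gt;State&lt;/code&gt;, &lt;code&gt;TRANSITIONS&lt;/code&gt;, &lt;code&gt;next_state&lt;/code&gt;, and &lt;code&gt;replay&lt;/code&gt; are hypothetical, and any pair missing from the table is treated as an error.&lt;/p&gt;

```python
from enum import Enum


class State(str, Enum):
    ANALYZING = "analyzing"
    FIXING = "fixing"
    REVIEWING = "reviewing"
    WAITING_HUMAN = "waiting_human"
    DONE = "done"
    FAILED = "failed"


# (current state, event) -> next state, copied from the transition list above.
TRANSITIONS = {
    (State.ANALYZING, "analyze_done"): State.FIXING,
    (State.FIXING, "fix_passed"): State.REVIEWING,
    (State.REVIEWING, "human_approved"): State.DONE,
    (State.REVIEWING, "timeout"): State.WAITING_HUMAN,
}


def next_state(current: State, event: str) -> State:
    """Return the next state, or raise if the pair is not in the table."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"no transition from {current.value!r} on {event!r}") from None


def replay(events: list[str], start: State = State.ANALYZING) -> State:
    """Apply a recorded event sequence, which is how a resumed run rebuilds its state."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state
```

&lt;p&gt;Because the state is a plain string value, it can be written to a file such as &lt;code&gt;workflow_state.json&lt;/code&gt; after every transition and read back on resume.&lt;/p&gt;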



&lt;h3&gt;
  
  
  Event-Driven
&lt;/h3&gt;

&lt;p&gt;Nodes communicate through messages. Each node subscribes to messages, processes them, and publishes new ones. Naturally asynchronous and long-running.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Examples: Temporal, AWS Step Functions, Apache Kafka Streams
Use case: long-running workflows (&amp;gt; 1 hour), enterprise-grade transactions

Properties:
  → Nodes are loosely coupled — independently deployable and scalable
  → Built-in persistence and Durable Execution guarantees
  → High operational complexity; not right for fast iteration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Which Model to Use
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Data pipeline, deterministic nodes      → DAG
Branching / retries / approval gates    → State Machine
Runtime &amp;gt; 1 hour, strong consistency    → Event-Driven
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most AI Agent Workflows use the &lt;strong&gt;state machine&lt;/strong&gt; model: it supports retry loops, serializable state, and interrupt-and-resume — all essential properties for AI workflows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Anthropic's Five Core Patterns
&lt;/h2&gt;

&lt;p&gt;Any Agent Workflow can be described using these five patterns. This vocabulary lets you communicate system architecture clearly with any engineer or stakeholder.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 1: Prompt Chaining
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A → B → C
each step's output is the next step's input
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; Tasks decompose into ordered steps where each step's quality directly affects what follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Example: Bug fix workflow main chain
  Phase 1 (gather info) → Phase 2 (log analysis) → Phase 3 (root cause)
  → Phase 4 (code fix) → Phase 5 (submit) → Phase 7 (notify)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key constraint:&lt;/strong&gt; If one step fails, the whole chain stops, so every step needs a defined &lt;code&gt;on_failure&lt;/code&gt; behavior.&lt;/p&gt;
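
&lt;p&gt;A minimal sketch of a chain runner that stops at the first failing step and calls that step's own handler. &lt;code&gt;ChainStep&lt;/code&gt; and &lt;code&gt;run_chain&lt;/code&gt; are hypothetical names chosen for illustration, and the payload is assumed to be a plain dict passed from step to step.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChainStep:
    name: str
    run: Callable[[dict], dict]              # previous output in, new output out
    on_failure: Callable[[Exception], None]  # hypothetical per-step failure handler


def run_chain(steps: list[ChainStep], payload: dict) -> tuple[bool, dict]:
    """Run steps in order; stop at the first failure and return the last good payload."""
    for step in steps:
        try:
            payload = step.run(payload)
        except Exception as exc:  # broad on purpose: any failure stops the chain
            step.on_failure(exc)
            return False, payload
    return True, payload
```

&lt;p&gt;Returning the last good payload alongside the failure flag lets the caller decide whether to retry the failed step or escalate.&lt;/p&gt;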

&lt;h3&gt;
  
  
  Pattern 2: Routing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input → Classifier → [type A] → Agent A
                  → [type B] → Agent B
                  → [default] → Agent C
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; Different input types need different processing paths.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Routing design example (Workflow Markdown format)&lt;/span&gt;

After Phase 3, route based on root cause confidence:
&lt;span class="p"&gt;
-&lt;/span&gt; confidence ≥ 95%         → proceed to Phase 4 (code fix)
&lt;span class="p"&gt;-&lt;/span&gt; 60% ≤ confidence &amp;lt; 95%  → trigger Gate A (human confirms root cause)
&lt;span class="p"&gt;-&lt;/span&gt; confidence &amp;lt; 60%         → retry Phase 3 (different angle, max 3 times)
&lt;span class="p"&gt;-&lt;/span&gt; still &amp;lt; 60% after 3 tries → escalate to human
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key design requirement:&lt;/strong&gt; The router must output a machine-checkable value, such as an &lt;strong&gt;enumerated type&lt;/strong&gt; or a numeric score, not free text. &lt;code&gt;confidence &amp;lt; 60%&lt;/code&gt; is computable. "Analysis wasn't deep enough" is not.&lt;/p&gt;
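
&lt;p&gt;The routing rules above can be sketched as a function that always returns one member of an enum. The names &lt;code&gt;Route&lt;/code&gt; and &lt;code&gt;route_after_phase_3&lt;/code&gt; are hypothetical, confidence is assumed to be a float between 0 and 1, and "max 3 times" is read here as three Phase 3 attempts in total.&lt;/p&gt;

```python
from enum import Enum


class Route(Enum):
    CODE_FIX = "phase_4_code_fix"
    GATE_A = "gate_a_human_confirms"
    RETRY_PHASE_3 = "retry_phase_3"
    ESCALATE = "escalate_to_human"


MAX_TRIES = 3  # assumption: three Phase 3 attempts in total


def route_after_phase_3(confidence: float, tries_so_far: int) -> Route:
    """Map a confidence score and the attempt count to exactly one Route member."""
    if confidence >= 0.95:
        return Route.CODE_FIX
    if confidence >= 0.60:
        return Route.GATE_A
    # Below 60%: retry from a different angle until the attempt budget is spent.
    if tries_so_far >= MAX_TRIES:
        return Route.ESCALATE
    return Route.RETRY_PHASE_3
```

&lt;p&gt;Downstream code can then branch on the &lt;code&gt;Route&lt;/code&gt; member without parsing any model-written text.&lt;/p&gt;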

&lt;h3&gt;
  
  
  Pattern 3: Parallelization
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;         → B1 →
A → split → B2 → merge → C
         → B3 →
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; Subtasks are independent and can run concurrently, or you need multiple perspectives to compare and select the best result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Fan-out: run 3 fix candidates concurrently
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;concurrent.futures&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;as_completed&lt;/span&gt;

&lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;candidate_a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;candidate_b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;candidate_c&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;futures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_fix_candidate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bug_info&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;as_completed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;candidate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Fan-in: select the best passing candidate
&lt;/span&gt;&lt;span class="n"&gt;best&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;passed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test_coverage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important caveat:&lt;/strong&gt; Parallelization speedup is capped by Amdahl's Law: the sequential part of the work bounds the overall gain. Three concurrent branches plus one sequential merge step typically yield ~1.5x speedup, not 3x. (See Skill Series Article 05 for measured data.)&lt;/p&gt;
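
&lt;p&gt;To see where a figure like ~1.5x can come from, the formula can be evaluated directly. The parallel fractions below are illustrative assumptions, not the measured data referenced above; with three workers, a 1.5x speedup corresponds to about half of the total time being parallelizable.&lt;/p&gt;

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's Law: speedup = 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative fractions only: 50%, 80%, and 100% of the work parallelizable.
for p in (0.5, 0.8, 1.0):
    print(f"parallel fraction {p:.0%}: {amdahl_speedup(p, workers=3):.2f}x")
```

&lt;p&gt;Only the fully parallel case reaches 3x; any sequential merge or setup step pulls the result below that ceiling.&lt;/p&gt;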

&lt;h3&gt;
  
  
  Pattern 4: Orchestrator-Subagents
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Orchestrator (main Agent)
  ├── spawn Subagent A (isolated session)
  ├── spawn Subagent B (isolated session)
  └── collect results → make decision
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; Tasks decompose into independent subtasks, each needing focused context without access to the main Agent's full history.&lt;/p&gt;

&lt;p&gt;Three design principles for subagents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Principle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;completeness&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;The&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;task&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;prompt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;contains&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;everything&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;subagent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;needs.&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;It&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;cannot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;depend&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"implicit context from the main Agent."&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Principle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Output&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;contract&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;strictness&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;Output&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;must&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;conform&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;declared&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;JSON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;schema.&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"passed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;str&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Principle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Structured&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;failure&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;output&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;Even&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;failure,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;write&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"passed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"specific reason"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;missing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;output&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;file&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;looks&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;like&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;timeout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;main&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Agent.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
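
&lt;p&gt;On the Orchestrator side, the output contract might be enforced with a small reader like the one below. It is a sketch under assumptions: &lt;code&gt;read_subagent_output&lt;/code&gt; is a hypothetical helper, the subagent is assumed to write one JSON file, and only the &lt;code&gt;passed&lt;/code&gt;, &lt;code&gt;result&lt;/code&gt;, and &lt;code&gt;error&lt;/code&gt; fields from the schema above are checked.&lt;/p&gt;

```python
import json
from pathlib import Path


def read_subagent_output(path: Path) -> dict:
    """Return a dict that always has 'passed', 'result', and 'error' keys."""
    if not path.exists():
        # A missing file cannot be told apart from a timeout, so report it as a failure.
        return {"passed": False, "result": None, "error": "output file missing (timeout?)"}
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return {"passed": False, "result": None, "error": f"invalid JSON: {exc.msg}"}
    if not isinstance(data, dict) or not isinstance(data.get("passed"), bool):
        return {"passed": False, "result": None, "error": "missing boolean 'passed' field"}
    return {"passed": data["passed"], "result": data.get("result"), "error": data.get("error")}
```

&lt;p&gt;Normalizing every outcome into the same three keys means the Orchestrator's decision step never has to handle a raw exception or a half-written file.&lt;/p&gt;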



&lt;h3&gt;
  
  
  Pattern 5: Evaluator-Optimizer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Generator → Evaluator → [score ≥ threshold] → output
                ↓
         [score &amp;lt; threshold] → feedback → Generator (retry)
                                          max_retries = 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; Output quality matters and the first attempt may not meet the bar. Iterative improvement is viable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Evaluation loop definition in workflow YAML&lt;/span&gt;
&lt;span class="na"&gt;phase_4_fix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;generator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write-android-code&lt;/span&gt;
  &lt;span class="na"&gt;evaluator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;run-unit-tests&lt;/span&gt;
  &lt;span class="na"&gt;quality_threshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;min_test_coverage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;80%&lt;/span&gt;
    &lt;span class="na"&gt;all_tests_passed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;on_fail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;max_retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;feedback_to_generator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c1"&gt;# pass failure reason back to generator&lt;/span&gt;
  &lt;span class="na"&gt;on_max_retries_exceeded&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;human_escalation&lt;/span&gt;      &lt;span class="c1"&gt;# escalate after 3 tries, never infinite loop&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Anti-pattern:&lt;/strong&gt; An evaluation loop without &lt;code&gt;max_retries&lt;/code&gt; can run forever. Evaluator feedback must also name a specific problem: "Make it better" is not feedback.&lt;/p&gt;
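
&lt;p&gt;The bounded loop can be sketched in a few lines. &lt;code&gt;evaluate_and_retry&lt;/code&gt; is a hypothetical helper, the generator and evaluator are passed in as plain callables, and &lt;code&gt;max_retries&lt;/code&gt; is treated here as the total number of attempts before escalation.&lt;/p&gt;

```python
from typing import Callable, Optional


def evaluate_and_retry(
    generate: Callable[[Optional[str]], str],
    evaluate: Callable[[str], tuple[bool, str]],
    max_retries: int = 3,
) -> dict:
    """Generate, evaluate, and feed the failure reason back until the budget runs out."""
    feedback: Optional[str] = None
    for attempt in range(1, max_retries + 1):
        candidate = generate(feedback)          # feedback is None on the first attempt
        passed, feedback = evaluate(candidate)  # feedback should name the specific problem
        if passed:
            return {"status": "passed", "attempts": attempt, "output": candidate}
    # Budget exhausted: hand off to a human instead of looping again.
    return {"status": "human_escalation", "attempts": max_retries, "last_feedback": feedback}
```

&lt;p&gt;The loop has exactly two exits, a passing candidate or an escalation record, so it cannot spin indefinitely.&lt;/p&gt;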




&lt;h2&gt;
  
  
  Patterns Compose Into Real Workflows
&lt;/h2&gt;

&lt;p&gt;Real production workflows combine multiple patterns. A Bug fix workflow contains all five:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Overall structure:    Prompt Chaining (main chain, Phase 1→7)

Phase 3 internally:   Evaluator-Optimizer (root cause analysis, max 3 retries)

Phase 3→4 transition: Routing (route by confidence score)

Phase 4 internally:   Parallelization (3 candidates concurrently)
                      + Evaluator-Optimizer (test each candidate)

Phase 5/7 writes:     Routing (human approval gate = special routing branch)

Overall control:      Orchestrator-Subagents (main Agent runs the state machine,
                      spawns a subagent for each Phase)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mapping each part of your system to a pattern is the starting point for design review.&lt;/p&gt;




&lt;h2&gt;
  
  
  Workflow Expression Formats
&lt;/h2&gt;

&lt;p&gt;A workflow's "code" can take many forms. There's no single right answer:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Markdown + YAML&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;workflow.md&lt;/code&gt; + &lt;code&gt;config.yaml&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Frequent iteration; readable by non-engineers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python code (LangGraph)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;StateGraph&lt;/code&gt; + node functions&lt;/td&gt;
&lt;td&gt;Complex state machines; code-level testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visual configuration (n8n)&lt;/td&gt;
&lt;td&gt;Canvas + JSON&lt;/td&gt;
&lt;td&gt;API integration-heavy; AI is just one step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Natural language&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;AGENTS.md&lt;/code&gt; + task prompts&lt;/td&gt;
&lt;td&gt;Early exploration; rapid prototyping&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The choice affects maintainability, testability, and execution engine selection. The next article (W2: Design Patterns) covers how to structure these files.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Most Agent Workflows are state machines&lt;/strong&gt;: nodes are non-deterministic, branching is semantic, and failure handling needs semantic fallback, so the three assumptions behind traditional deterministic workflows all fail&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Five patterns are a shared vocabulary&lt;/strong&gt;: Prompt Chaining, Routing, Parallelization, Orchestrator-Subagents, Evaluator-Optimizer — any workflow can be described with these five&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real workflows combine patterns&lt;/strong&gt;: identifying which part of your system maps to which pattern is what system design ability looks like in practice&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/research/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic — Building Effective Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf" rel="noopener noreferrer"&gt;OpenAI — A Practical Guide to Building Agents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;





</description>
      <category>ai</category>
      <category>workflow</category>
      <category>claude</category>
      <category>agents</category>
    </item>
    <item>
      <title>Open Source Project of the Day (#106): android/skills — Google's Official AI Skill Library for Android Development</title>
      <dc:creator>WonderLab</dc:creator>
      <pubDate>Fri, 26 Jun 2026 03:26:06 +0000</pubDate>
      <link>https://dev.to/wonderlab/open-source-project-of-the-day-106-androidskills-googles-official-ai-skill-library-for-3g5o</link>
      <guid>https://dev.to/wonderlab/open-source-project-of-the-day-106-androidskills-googles-official-ai-skill-library-for-3g5o</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"Android Skills cover the workflows where LLMs underperform — not the areas they've already mastered."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is article &lt;strong&gt;#106&lt;/strong&gt; in the &lt;em&gt;Open Source Project of the Day&lt;/em&gt; series. Today's project is &lt;strong&gt;android/skills&lt;/strong&gt;, the Google Android team's official AI skill library.&lt;/p&gt;

&lt;p&gt;LLMs are already competent at a lot of Android code. Standard RecyclerViews, basic Jetpack Compose layouts, simple network calls — these appear extensively in training data and models handle them reasonably well.&lt;/p&gt;

&lt;p&gt;The hard parts are different: the areas of the Android framework that keep evolving — migrating from Camera1 to CameraX, configuring builds with AGP 9's new approach, understanding the architectural differences between Navigation 3 and the old Navigation Component, correctly configuring ProGuard/R8 rules. Official best practices shift constantly in these areas, and LLM training data doesn't keep up.&lt;/p&gt;

&lt;p&gt;android/skills packages developer.android.com's current best practices as Skill files specifically to fill those knowledge gaps. According to Google's internal data cited in the launch blog, agents using skills completed tasks 3× faster than agents working without them and consumed more than 70% fewer tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You'll Learn
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;What all 13 official skills cover — which Android scenarios Google identifies as LLM weak spots&lt;/li&gt;
&lt;li&gt;Installation and usage: Android CLI commands + Android Studio integration&lt;/li&gt;
&lt;li&gt;On-demand loading architecture: how skills activate automatically without polluting the context window&lt;/li&gt;
&lt;li&gt;Writing custom skills: encoding your team's workflows into private SKILL.md files&lt;/li&gt;
&lt;li&gt;Compatibility with Claude Code, Gemini CLI, Codex, and others&lt;/li&gt;
&lt;li&gt;The companion tools: Android CLI and Android Knowledge Base&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Android development experience (familiarity with Gradle build system, Jetpack libraries)&lt;/li&gt;
&lt;li&gt;Experience with Claude Code, Gemini CLI, or a similar AI coding tool&lt;/li&gt;
&lt;li&gt;Basic familiarity with the agent skills concept&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Project Background
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Is android/skills?
&lt;/h3&gt;

&lt;p&gt;android/skills is Google's official Android development AI skill repository, released alongside Android CLI in April 2026.&lt;/p&gt;

&lt;p&gt;Skill files are written in Markdown (&lt;code&gt;SKILL.md&lt;/code&gt;) and follow the &lt;a href="https://agentskills.io/home" rel="noopener noreferrer"&gt;agentskills.io&lt;/a&gt; open standard. Each skill contains metadata (name, description, trigger conditions), step-by-step execution instructions, and optional scripts and reference documentation.&lt;/p&gt;

&lt;p&gt;The selection criterion is explicit: &lt;strong&gt;focus on scenarios where LLM evaluations show underperformance&lt;/strong&gt;, not areas where models already do well (like basic Compose components). That explains why the skill list doesn't include "write a Button" but does include "migrate from Camera1 to CameraX."&lt;/p&gt;

&lt;h3&gt;
  
  
  Author / Team
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Team&lt;/strong&gt;: Google Android Developer Relations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: Apache-2.0&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version&lt;/strong&gt;: v1.0.2 (June 2026)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href="https://developer.android.com/tools/agents/android-skills" rel="noopener noreferrer"&gt;developer.android.com/tools/agents/android-skills&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Project Stats
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;⭐ GitHub Stars: &lt;strong&gt;5,900+&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🍴 Forks: 328+&lt;/li&gt;
&lt;li&gt;📦 Releases: 14&lt;/li&gt;
&lt;li&gt;📄 License: Apache-2.0&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  All 13 Official Skills
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Problem It Addresses&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Build&lt;/td&gt;
&lt;td&gt;AGP 9 Upgrade&lt;/td&gt;
&lt;td&gt;Breaking changes and migration path for AGP 9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Camera&lt;/td&gt;
&lt;td&gt;Camera1 → CameraX&lt;/td&gt;
&lt;td&gt;Camera API migration best practices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Device AI&lt;/td&gt;
&lt;td&gt;App Functions&lt;/td&gt;
&lt;td&gt;On-device AI feature integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dev Tools&lt;/td&gt;
&lt;td&gt;Android CLI&lt;/td&gt;
&lt;td&gt;Using Android CLI itself&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Identity&lt;/td&gt;
&lt;td&gt;Verified Email&lt;/td&gt;
&lt;td&gt;Email verification implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI&lt;/td&gt;
&lt;td&gt;Jetpack Compose&lt;/td&gt;
&lt;td&gt;Best practices for Compose weak spots&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Navigation&lt;/td&gt;
&lt;td&gt;Navigation 3&lt;/td&gt;
&lt;td&gt;Nav3 setup and migration from old Nav&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;R8 Analyzer&lt;/td&gt;
&lt;td&gt;R8 configuration audit and optimization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;Android Intent Security&lt;/td&gt;
&lt;td&gt;Intent security best practices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System&lt;/td&gt;
&lt;td&gt;Edge-to-Edge&lt;/td&gt;
&lt;td&gt;Modern edge-to-edge UI implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Testing&lt;/td&gt;
&lt;td&gt;Testing Setup&lt;/td&gt;
&lt;td&gt;Test infrastructure configuration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wear&lt;/td&gt;
&lt;td&gt;Wear Compose M3&lt;/td&gt;
&lt;td&gt;M3 components for Wear OS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XR&lt;/td&gt;
&lt;td&gt;Display Glasses (Compose Glimmer)&lt;/td&gt;
&lt;td&gt;Compose for XR devices&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few worth highlighting:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;R8 Analyzer&lt;/strong&gt;: ProGuard/R8 rules are among the most difficult parts of an Android build to tune. LLMs typically suggest generic R8 rules that lack the precision specific scenarios require (reflection, third-party library obfuscation). This skill encodes the complete R8 configuration analysis workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AGP 9 Upgrade&lt;/strong&gt;: Every major AGP version introduces breaking changes. AGP 9 includes deprecated APIs and new build behavior that appear in almost no LLM training data. The skill provides the current upgrade path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Camera1 → CameraX&lt;/strong&gt;: The Camera1 API has been deprecated for years, but many older projects still use it. The migration path has details (like the Camera2 to CameraX adapter layer) that LLMs handle poorly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Navigation 3&lt;/strong&gt;: Navigation 3 and the old Navigation Component differ architecturally, not just at the API level. The skill includes design decisions and migration guidance.&lt;/p&gt;




&lt;h2&gt;
  
  
  Core Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Method 1: Android CLI (recommended)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List all available skills&lt;/span&gt;
android skills list

&lt;span class="c"&gt;# Install a single skill&lt;/span&gt;
android skills add &lt;span class="nt"&gt;--skill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;r8-analyzer &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Install all skills (across all detected agents)&lt;/span&gt;
android skills add &lt;span class="nt"&gt;--all&lt;/span&gt;

&lt;span class="c"&gt;# Target specific agents&lt;/span&gt;
android skills add &lt;span class="nt"&gt;--skill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;navigation-3 &lt;span class="nt"&gt;--agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;claude-code,gemini-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If no existing agent configuration is detected, skills install to &lt;code&gt;~/.gemini/antigravity/skills&lt;/code&gt; by default.&lt;/p&gt;

&lt;p&gt;Install locations by agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code: &lt;code&gt;~/.claude/skills/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Gemini CLI: &lt;code&gt;~/.gemini/skills/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Android Studio: &lt;code&gt;.skills/&lt;/code&gt; or &lt;code&gt;.agent/skills/&lt;/code&gt; at project root&lt;/li&gt;
&lt;/ul&gt;
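
&lt;p&gt;To check where skills ended up after an install, a small script can look up the locations listed above and report which skill directories are present. This is a minimal sketch, not part of the Android CLI: the function names and the dictionary keys are hypothetical, and only the paths come from the list above. Android Studio's project-level &lt;code&gt;.skills/&lt;/code&gt; directory can be passed to &lt;code&gt;installed_skills&lt;/code&gt; directly.&lt;/p&gt;

```python
from pathlib import Path

# Install locations as listed in the article. The dictionary keys are
# illustrative labels, not identifiers confirmed by the Android CLI.
USER_LEVEL_DIRS = {
    "claude-code": Path("~/.claude/skills"),
    "gemini-cli": Path("~/.gemini/skills"),
}
# Fallback the article describes when no agent configuration is detected.
DEFAULT_DIR = Path("~/.gemini/antigravity/skills")


def expected_skill_dir(agent=None):
    """Return the directory where the article says skills land for an agent.

    Unknown or missing agents fall back to DEFAULT_DIR (an assumption of
    this sketch, mirroring the "no agent detected" case).
    """
    return USER_LEVEL_DIRS.get(agent, DEFAULT_DIR).expanduser()


def installed_skills(skill_dir):
    """List subdirectories of skill_dir that contain a SKILL.md file."""
    root = Path(skill_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if (p / "SKILL.md").is_file())
```

&lt;p&gt;For example, &lt;code&gt;installed_skills(expected_skill_dir("claude-code"))&lt;/code&gt; would list the skill names found under &lt;code&gt;~/.claude/skills/&lt;/code&gt;, or return an empty list if the directory does not exist.&lt;/p&gt;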

&lt;p&gt;&lt;strong&gt;Method 2: Android Studio&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Download the skill directory from GitHub&lt;/li&gt;
&lt;li&gt;Import via Android Studio → Gemini → Skills&lt;/li&gt;
&lt;li&gt;Or place the skill directory directly under &lt;code&gt;.skills/&lt;/code&gt; in project root&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Usage
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Automatic activation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once installed, when an AI agent detects that a prompt matches a skill's description, the skill loads automatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You say to Claude Code: "Update this Activity to support edge-to-edge UI"
        ↓
Agent detects "edge-to-edge" context, matches skill description
        ↓
Edge-to-Edge skill loaded (SKILL.md + associated resources)
        ↓
Task executed using the skill's step-by-step guidance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Manual invocation (Android Studio):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Type &lt;code&gt;@skill-name&lt;/code&gt; in the Gemini chat to trigger a skill directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@edge-to-edge Help me update the WindowInsets handling in this Activity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  On-Demand Loading Architecture
&lt;/h3&gt;

&lt;p&gt;Skills aren't all loaded into the context window upfront — they're pulled on demand:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request arrives
    ↓
Agent reads all skill metadata (description fields — lightweight)
    ↓
Relevant skill identified via description matching
    ↓
Full SKILL.md + attached resources loaded into context window
    ↓
Task executes with complete expert context

No matching skill: nothing loaded, context window unaffected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This avoids the inefficient pattern of "stuff all skills into the system prompt" — only relevant knowledge loads, only when it's needed.&lt;/p&gt;
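
&lt;p&gt;The flow above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular agent implements it: real agents let the model judge relevance from the description, whereas this sketch substitutes a simple word-overlap heuristic, and &lt;code&gt;SkillMeta&lt;/code&gt;, &lt;code&gt;match_skill&lt;/code&gt;, and &lt;code&gt;build_context&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillMeta:
    """Lightweight metadata kept in memory for every installed skill."""
    name: str
    description: str
    path: Path  # location of SKILL.md, read only after a match


def match_skill(prompt, skills, min_overlap=2):
    """Pick the skill whose description shares the most words with the prompt.

    Stand-in heuristic: a real agent would let the model decide relevance.
    """
    prompt_words = set(prompt.lower().split())
    best, best_score = None, 0
    for skill in skills:
        score = len(prompt_words.intersection(skill.description.lower().split()))
        if score > best_score:
            best, best_score = skill, score
    return best if best_score >= min_overlap else None


def build_context(prompt, skills):
    """Load the full SKILL.md body only when a skill matches the prompt."""
    skill = match_skill(prompt, skills)
    if skill is None:
        return prompt  # no match: nothing is loaded, context is untouched
    return skill.path.read_text(encoding="utf-8") + "\n\n" + prompt
```

&lt;p&gt;The point of the design is visible in &lt;code&gt;build_context&lt;/code&gt;: only the short descriptions are inspected for every request, and the much larger &lt;code&gt;SKILL.md&lt;/code&gt; body is read from disk only for the one skill that matched.&lt;/p&gt;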




&lt;h2&gt;
  
  
  Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  SKILL.md Format
&lt;/h3&gt;

&lt;p&gt;Both official and custom skills use the same format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;r8-analyzer&lt;/span&gt;              &lt;span class="c1"&gt;# max 64 chars, lowercase and hyphens only&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;                 &lt;span class="c1"&gt;# max 1024 chars — this is what agents use to match&lt;/span&gt;
  &lt;span class="s"&gt;Use this skill when the user needs to analyze or optimize an Android&lt;/span&gt;
  &lt;span class="s"&gt;app's R8/ProGuard configuration. Applicable when debugging obfuscation&lt;/span&gt;
  &lt;span class="s"&gt;issues, reducing APK size, or fixing runtime crashes after enabling R8.&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;android&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## Skill Instructions&lt;/span&gt;

(Markdown step-by-step instructions here, target 10,000–20,000 characters)

&lt;span class="gu"&gt;### Step 1: Analyze existing configuration&lt;/span&gt;
...

To run a helper script: &lt;span class="sb"&gt;`scripts/analyze_rules.py`&lt;/span&gt;
Reference docs: See &lt;span class="sb"&gt;`references/r8-guide.md`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The description field is critical&lt;/strong&gt;: its text, capped at 1,024 characters, determines when an agent activates the skill. A description that is too broad ("for Android development") causes false triggers; one that is too narrow misses legitimate activation points. Writing good descriptions is where skill quality lives.&lt;/p&gt;
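
&lt;p&gt;The limits noted in the frontmatter comments are easy to check before committing a skill. The sketch below is a hypothetical helper, not an official validator. It assumes digits are allowed in names (the article's own example is &lt;code&gt;r8-analyzer&lt;/code&gt;) and, as an extra assumption, rejects leading, trailing, or doubled hyphens.&lt;/p&gt;

```python
import re

NAME_MAX = 64           # limit stated in the article
DESCRIPTION_MAX = 1024  # limit stated in the article
# Digits are allowed because the article's own example is "r8-analyzer".
# Rejecting leading, trailing, or doubled hyphens is an assumption.
NAME_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def validate_skill_fields(name, description):
    """Return a list of problems; an empty list means the fields pass."""
    problems = []
    if len(name) > NAME_MAX:
        problems.append(f"name is {len(name)} chars, limit is {NAME_MAX}")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must use lowercase letters, digits, and hyphens")
    if not description.strip():
        problems.append("description is empty")
    if len(description) > DESCRIPTION_MAX:
        problems.append(
            f"description is {len(description)} chars, limit is {DESCRIPTION_MAX}"
        )
    return problems
```

&lt;p&gt;A check like this only catches format errors. Whether a description is too broad or too narrow still has to be judged by trying real prompts against it.&lt;/p&gt;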

&lt;h3&gt;
  
  
  Custom Skills: Encoding Your Team's Patterns
&lt;/h3&gt;

&lt;p&gt;This is the most underappreciated capability in android/skills: any team can write their own skills in the same format.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typical use cases:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your team has a standard module structure convention
→ Write that convention as a SKILL.md
→ Place it in the project's .skills/ directory
→ Every time AI creates a new module, it follows your convention automatically
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Directory structure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;project-root/&lt;/span&gt;
&lt;span class="s"&gt;└── .skills/&lt;/span&gt;
    &lt;span class="s"&gt;├── team-module-template/&lt;/span&gt;
    &lt;span class="s"&gt;│   ├── SKILL.md&lt;/span&gt;            &lt;span class="c1"&gt;# Required, case-sensitive&lt;/span&gt;
    &lt;span class="s"&gt;│   ├── scripts/&lt;/span&gt;
    &lt;span class="s"&gt;│   │   └── create_module.sh&lt;/span&gt;
    &lt;span class="s"&gt;│   └── references/&lt;/span&gt;
    &lt;span class="s"&gt;│       └── architecture-guide.md&lt;/span&gt;
    &lt;span class="s"&gt;├── internal-api-patterns/&lt;/span&gt;
    &lt;span class="s"&gt;│   └── SKILL.md&lt;/span&gt;
    &lt;span class="s"&gt;└── ci-setup/&lt;/span&gt;
        &lt;span class="s"&gt;└── SKILL.md&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These private skills are scoped to the project directory — they don't affect other projects.&lt;/p&gt;
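
&lt;p&gt;Creating that layout by hand is simple, but a small scaffold keeps new skills consistent. The following is a minimal sketch: &lt;code&gt;scaffold_skill&lt;/code&gt; and the template text are hypothetical, the description is assumed to fit on a single line, and the directory names follow the structure shown above.&lt;/p&gt;

```python
from pathlib import Path

# Skeleton mirroring the SKILL.md format described in the article.
TEMPLATE = """---
name: {name}
description: >
  {description}
---

## Skill Instructions

(Step-by-step instructions go here.)
"""


def scaffold_skill(project_root, name, description):
    """Create .skills/NAME/SKILL.md plus the optional subdirectories."""
    skill_dir = Path(project_root) / ".skills" / name
    for sub in ("scripts", "references"):
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"  # the file name is case-sensitive
    if skill_file.exists():
        # Refuse to clobber a skill someone has already written.
        raise FileExistsError(f"{skill_file} already exists")
    skill_file.write_text(
        TEMPLATE.format(name=name, description=description), encoding="utf-8"
    )
    return skill_file
```

&lt;p&gt;Calling &lt;code&gt;scaffold_skill(".", "team-module-template", "Use when creating a new feature module.")&lt;/code&gt; from the project root would produce the first entry of the tree above, ready for the team's instructions to be filled in.&lt;/p&gt;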

&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: If you want to modify an official skill, &lt;strong&gt;rename it first&lt;/strong&gt;. On update, &lt;code&gt;android skills add&lt;/code&gt; overwrites any skill whose name matches an official one.&lt;/p&gt;
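
&lt;p&gt;One way to do the rename is to copy the official skill under a new directory name and update the &lt;code&gt;name&lt;/code&gt; field to match. The sketch below is hypothetical: &lt;code&gt;fork_skill&lt;/code&gt; is not an Android CLI feature, and it assumes the frontmatter &lt;code&gt;name&lt;/code&gt; should mirror the directory name.&lt;/p&gt;

```python
import re
import shutil
from pathlib import Path


def fork_skill(skills_dir, original, new_name):
    """Copy skills_dir/original to skills_dir/new_name and update its name."""
    src = Path(skills_dir) / original
    dst = Path(skills_dir) / new_name
    shutil.copytree(src, dst)  # raises FileExistsError if dst already exists
    skill_file = dst / "SKILL.md"
    text = skill_file.read_text(encoding="utf-8")
    # Rewrite only the first top-level "name:" line in the frontmatter.
    updated = re.sub(r"(?m)^name:.*$", f"name: {new_name}", text, count=1)
    skill_file.write_text(updated, encoding="utf-8")
    return dst
```

&lt;p&gt;After &lt;code&gt;fork_skill(".skills", "r8-analyzer", "team-r8-analyzer")&lt;/code&gt;, edits go into the copy, and the original stays untouched.&lt;/p&gt;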

&lt;h3&gt;
  
  
  The Companion Toolchain
&lt;/h3&gt;

&lt;p&gt;android/skills is one part of a three-component release. The other two are the Android CLI and the Android Knowledge Base:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Android CLI:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;android sdk &lt;span class="nb"&gt;install&lt;/span&gt;         &lt;span class="c"&gt;# Download only the SDK components you need&lt;/span&gt;
android create              &lt;span class="c"&gt;# Create projects from official templates&lt;/span&gt;
android emulator            &lt;span class="c"&gt;# Manage virtual devices&lt;/span&gt;
android run                 &lt;span class="c"&gt;# Build and deploy apps&lt;/span&gt;
android skills              &lt;span class="c"&gt;# Manage skills&lt;/span&gt;
android docs                &lt;span class="c"&gt;# Access the Knowledge Base&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLI's purpose: let AI agents operate the Android toolchain through standardized commands, reducing failures caused by environment differences (for example, &lt;code&gt;./gradlew assembleDebug&lt;/code&gt; path variations across environments).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Android Knowledge Base:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Continuously syncs content from developer.android.com, the Firebase docs, and the Kotlin docs&lt;/li&gt;
&lt;li&gt;Addresses the LLM training data staleness problem&lt;/li&gt;
&lt;li&gt;Accessed through the &lt;code&gt;android docs&lt;/code&gt; command&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Claude Code Integration
&lt;/h3&gt;

&lt;p&gt;android/skills is fully compatible with Claude Code. Install the skills for it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;android skills add &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;claude-code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With skill files in &lt;code&gt;~/.claude/skills/&lt;/code&gt;, Claude Code automatically detects and loads relevant skills when handling Android tasks — no additional configuration needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links and Resources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Official Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;🌟 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/android/skills" rel="noopener noreferrer"&gt;android/skills&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 &lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href="https://developer.android.com/tools/agents/android-skills" rel="noopener noreferrer"&gt;developer.android.com/tools/agents/android-skills&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🛠 &lt;strong&gt;Android CLI&lt;/strong&gt;: &lt;a href="https://developer.android.com/tools/agents/android-cli" rel="noopener noreferrer"&gt;developer.android.com/tools/agents/android-cli&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🏗 &lt;strong&gt;Android Studio integration&lt;/strong&gt;: &lt;a href="https://developer.android.com/studio/gemini/skills" rel="noopener noreferrer"&gt;developer.android.com/studio/gemini/skills&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🌐 &lt;strong&gt;Agent Skills standard&lt;/strong&gt;: &lt;a href="https://agentskills.io/home" rel="noopener noreferrer"&gt;agentskills.io&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;android/skills' value isn't just "13 skill files." It's Google's public answer to two questions: which Android development scenarios are genuine LLM weak spots, and how to fill those gaps with structured skill files that agents can execute.&lt;/p&gt;

&lt;p&gt;"Focus on areas where LLM evaluations show underperformance" is a principle worth studying. It explains why these skills have practical value — they're not redundant with what LLMs already know; they're targeted at specific knowledge gaps.&lt;/p&gt;

&lt;p&gt;The open custom skill format means any team can apply the same mechanism to internal architecture conventions, code style guidelines, and common operational workflows — encoding them as AI-executable skills instead of repeatedly pasting them into system prompts.&lt;/p&gt;

&lt;p&gt;For Android developers, this is one of the most direct AI toolchain upgrades currently available — Google-maintained, continuously updated, compatible with multiple agents.&lt;/p&gt;






</description>
      <category>opensource</category>
      <category>android</category>
      <category>agentskills</category>
      <category>developer</category>
    </item>
  </channel>
</rss>
