<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: gezilinll</title>
    <description>The latest articles on DEV Community by gezilinll (@gezilinll).</description>
    <link>https://dev.to/gezilinll</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1776492%2F173c11c3-94ac-49e5-899a-348fbb0d818a.jpeg</url>
      <title>DEV Community: gezilinll</title>
      <link>https://dev.to/gezilinll</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gezilinll"/>
    <language>en</language>
    <item>
      <title>Agent-Ready Engineering Infrastructure</title>
      <dc:creator>gezilinll</dc:creator>
      <pubDate>Wed, 06 May 2026 03:43:27 +0000</pubDate>
      <link>https://dev.to/gezilinll/agent-ready-engineering-infrastructure-3lco</link>
      <guid>https://dev.to/gezilinll/agent-ready-engineering-infrastructure-3lco</guid>
      <description>&lt;ul&gt;
&lt;li&gt;GitHub repository: &lt;a href="https://github.com/gezilinll/agent-grove" rel="noopener noreferrer"&gt;https://github.com/gezilinll/agent-grove&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Online documentation: &lt;a href="https://gezilinll.github.io/agent-grove/agent-ready-engineering-infrastructure" rel="noopener noreferrer"&gt;https://gezilinll.github.io/agent-grove/agent-ready-engineering-infrastructure&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Project infrastructure for Coding Agents is not about turning every repository into an agent product. It is also not just adding "please run tests first" to the README.&lt;/p&gt;

&lt;p&gt;It answers a more concrete question: when Coding IDEs or Coding Agents such as Cursor, Codex, and Claude Code enter a real project, what engineering interfaces should the codebase provide so the agent can understand the project, make changes, verify outcomes, stay within governance boundaries, and improve from failures?&lt;/p&gt;

&lt;p&gt;The core conclusion is short:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent-ready repo
  = context
  + intent
  + execution
  + verification
  + governance
  + feedback
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftew7b5xsc4veumj85t22.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftew7b5xsc4veumj85t22.png" alt="Agent-Ready Codebase Overview" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;spec&lt;/code&gt; and &lt;code&gt;harness&lt;/code&gt; are core modules, but they are not the whole story.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;spec&lt;/code&gt; mainly serves intent: what this task is trying to do, what is out of scope, and what counts as done.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;harness&lt;/code&gt; mainly connects execution, verification, and feedback: how code runs in a controlled environment, how correctness is proven, and how failures return to the agent and the team.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All code snippets below come from real open-source projects. To keep the reading flow light, each snippet keeps only the lines that directly support the infrastructure point being made.&lt;/p&gt;

&lt;p&gt;The examples are not a complete matrix. A project is included in a layer only when its implementation is representative, useful as a reference, or shows a clear engineering trade-off.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scope
&lt;/h2&gt;

&lt;p&gt;This article only studies &lt;strong&gt;infrastructure that helps Coding Agents work on concrete projects&lt;/strong&gt;. If an open-source project is itself an agent product, this article does not analyze its agent loop, memory, tool calling, or model routing design unless those ideas are explicitly turned into rules, verification entry points, configuration, or governance mechanisms for developing the repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Six Layers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Common artifacts&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context&lt;/td&gt;
&lt;td&gt;How does the agent understand the project and boundaries?&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;AGENTS.md&lt;/code&gt;, scoped guides, architecture maps, coding rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intent&lt;/td&gt;
&lt;td&gt;How does the agent understand the current task?&lt;/td&gt;
&lt;td&gt;spec, proposal, design, tasks, acceptance criteria&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution&lt;/td&gt;
&lt;td&gt;Where does the agent run, and with what permissions?&lt;/td&gt;
&lt;td&gt;setup, Makefile, Docker, devcontainer, MCP/tool config&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verification&lt;/td&gt;
&lt;td&gt;How does the agent prove the change is correct?&lt;/td&gt;
&lt;td&gt;test, lint, typecheck, contract test, CI, eval harness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Governance&lt;/td&gt;
&lt;td&gt;How are agent changes constrained and audited?&lt;/td&gt;
&lt;td&gt;approval gates, CODEOWNERS, PR templates, diff guards, rollback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Feedback&lt;/td&gt;
&lt;td&gt;How do failures and reviews flow back?&lt;/td&gt;
&lt;td&gt;failure artifacts, coverage, trajectory, benchmark, flaky tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Traditional codebases depend heavily on organizational memory: which modules are risky, which tests are flaky, which commands only run in CI, and which interfaces are public contracts. Coding Agents cannot reliably depend on that implicit knowledge. The essence of agent-ready infrastructure is to turn that knowledge into engineering interfaces inside the repository.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Make implicit knowledge explicit
Structure explicit rules
Automate structured rules
Feed automation results back to the agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  1. Context Layer
&lt;/h2&gt;

&lt;p&gt;The Context Layer tells the agent what the project is, where to read, and which boundaries must not be broken. The minimal form is a root &lt;code&gt;AGENTS.md&lt;/code&gt;; the mature form is usually a root guide plus scoped guides.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Evidence: OpenClaw's &lt;code&gt;AGENTS.md&lt;/code&gt; Is an Agent Operating Manual
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;AGENTS.md&lt;/code&gt; is not a duplicate README. OpenClaw's root file tells the agent which local rules to read first, where project boundaries are, which commands should not be called directly, and which changes trigger which gates.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Root rules only. Read scoped &lt;span class="sb"&gt;`AGENTS.md`&lt;/span&gt; before subtree work.

&lt;span class="gu"&gt;## Map&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Core TS: &lt;span class="sb"&gt;`src/`&lt;/span&gt;, &lt;span class="sb"&gt;`ui/`&lt;/span&gt;, &lt;span class="sb"&gt;`packages/`&lt;/span&gt;; plugins: &lt;span class="sb"&gt;`extensions/`&lt;/span&gt;;
  SDK: &lt;span class="sb"&gt;`src/plugin-sdk/*`&lt;/span&gt;; channels: &lt;span class="sb"&gt;`src/channels/*`&lt;/span&gt;.
&lt;span class="p"&gt;-&lt;/span&gt; Scoped guides exist in: &lt;span class="sb"&gt;`extensions/`&lt;/span&gt;, &lt;span class="sb"&gt;`src/{plugin-sdk,channels,plugins,gateway}/`&lt;/span&gt;,
  &lt;span class="sb"&gt;`test/helpers*/`&lt;/span&gt;, &lt;span class="sb"&gt;`docs/`&lt;/span&gt;, &lt;span class="sb"&gt;`ui/`&lt;/span&gt;, &lt;span class="sb"&gt;`scripts/`&lt;/span&gt;.

&lt;span class="gu"&gt;## Commands&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Smart gate: &lt;span class="sb"&gt;`pnpm check:changed`&lt;/span&gt;; explain &lt;span class="sb"&gt;`pnpm changed:lanes --json`&lt;/span&gt;.
&lt;span class="p"&gt;-&lt;/span&gt; Targeted tests: &lt;span class="sb"&gt;`pnpm test &amp;lt;path-or-filter&amp;gt; [vitest args...]`&lt;/span&gt;; never raw &lt;span class="sb"&gt;`vitest`&lt;/span&gt;.

&lt;span class="gu"&gt;## Gates&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Changed lanes:
&lt;span class="p"&gt;  -&lt;/span&gt; core prod: core prod typecheck + core tests
&lt;span class="p"&gt;  -&lt;/span&gt; public SDK/plugin contract: extension prod/test too
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the value of the Context Layer: it does more than tell the agent how to start the project. It compresses the repo map, scoped guide entry points, command constraints, ownership boundaries, and verification routes into working context the agent can act on.&lt;/p&gt;
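
&lt;p&gt;To make the "changed lanes" idea concrete, here is a minimal sketch of routing changed files to verification lanes. It is an illustration only: the rule table, the lane names, and &lt;code&gt;lanesForChangedFiles&lt;/code&gt; are hypothetical, loosely modeled on the directory map quoted above, and are not taken from OpenClaw's scripts.&lt;/p&gt;

```javascript
// Hypothetical lane rules, loosely modeled on the quoted repo map.
// More specific prefixes come first so they win over "src/".
const LANE_RULES = [
  { prefix: "src/plugin-sdk/", lane: "sdk-contract" },
  { prefix: "extensions/", lane: "extensions" },
  { prefix: "src/", lane: "core" },
  { prefix: "docs/", lane: "docs" },
];

// Returns the sorted, de-duplicated lanes touched by a list of changed paths.
// Paths that match no rule fall into an "unknown" lane so they are never silently skipped.
function lanesForChangedFiles(files) {
  const lanes = new Set();
  for (const file of files) {
    const rule = LANE_RULES.find((r) => file.startsWith(r.prefix));
    lanes.add(rule ? rule.lane : "unknown");
  }
  return [...lanes].sort();
}

console.log(lanesForChangedFiles(["src/plugin-sdk/index.ts", "src/cli.ts"]));
```

&lt;p&gt;Sending unmatched paths to an explicit "unknown" lane is a deliberate choice in this sketch: a gate that cannot classify a change should probably fall back to the full check rather than to no check.&lt;/p&gt;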

&lt;h3&gt;
  
  
  Code Evidence: Langfuse Converges Multi-Tool Rules Into &lt;code&gt;.agents/&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Langfuse does not hand-write separate configuration for every Coding IDE. It generates Claude, Codex, Cursor, and MCP configuration from &lt;code&gt;.agents/config.json&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sourcePath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;repoRoot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.agents/config.json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sourcePath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;utf8&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileOutputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;repoRoot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.claude/settings.json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;formatClaudeSettings&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;repoRoot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.mcp.json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;formatSharedJsonConfig&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;repoRoot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.codex/environments/environment.toml&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;formatCodexEnvironmentToml&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;repoRoot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.cursor/mcp.json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;formatSharedJsonConfig&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;repoRoot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.cursor/environment.json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;formatCursorEnvironment&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The trend is clear: when multiple agent IDEs coexist, teams need one canonical source that is projected into each tool's preferred format. LangGraph's much shorter guide is a reminder that the Context Layer is about navigation, not length.&lt;/p&gt;
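
&lt;p&gt;The "one source, many projections" pattern also makes drift detectable. The sketch below assumes a canonical config with an &lt;code&gt;mcpServers&lt;/code&gt; object; that shape, and the helpers &lt;code&gt;projectConfig&lt;/code&gt; and &lt;code&gt;findDrift&lt;/code&gt;, are assumptions for illustration and do not reproduce Langfuse's actual schema or sync script.&lt;/p&gt;

```javascript
// Assumed canonical shape: { mcpServers: { name: { command, args } } }.
// The real .agents/config.json schema is not shown in the article.
function projectConfig(config) {
  const shared = JSON.stringify({ mcpServers: config.mcpServers }, null, 2) + "\n";
  return [
    { path: ".mcp.json", content: shared },
    { path: ".cursor/mcp.json", content: shared },
  ];
}

// Returns the paths whose current content differs from the generated content.
// readFile is injected so the check can run against disk or an in-memory map.
function findDrift(outputs, readFile) {
  return outputs
    .filter((output) => readFile(output.path) !== output.content)
    .map((output) => output.path);
}

const outputs = projectConfig({ mcpServers: { docs: { command: "npx", args: [] } } });
const onDisk = { ".mcp.json": outputs[0].content, ".cursor/mcp.json": "{}" };
console.log(findDrift(outputs, (path) => onDisk[path]));
```

&lt;p&gt;Run in CI, a check like this could fail the build whenever someone edits a generated file by hand instead of the canonical source.&lt;/p&gt;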

&lt;p&gt;A good Context Layer lets the agent answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which directory or module should this task start from?&lt;/li&gt;
&lt;li&gt;Is there a more specific scoped guide?&lt;/li&gt;
&lt;li&gt;Which public interfaces, dependency boundaries, or architecture boundaries must not be broken?&lt;/li&gt;
&lt;li&gt;Which commands should be run after the change?&lt;/li&gt;
&lt;li&gt;When must a human confirm the decision?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Representative cases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Case&lt;/th&gt;
&lt;th&gt;Reference value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;Root &lt;code&gt;AGENTS.md&lt;/code&gt; plus scoped &lt;code&gt;AGENTS.md&lt;/code&gt; files&lt;/td&gt;
&lt;td&gt;Turns a large repo map, commands, gates, and ownership boundaries into an agent operating manual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dify&lt;/td&gt;
&lt;td&gt;Root &lt;code&gt;AGENTS.md&lt;/code&gt; routes to &lt;code&gt;api/&lt;/code&gt;, &lt;code&gt;web/&lt;/code&gt;, and &lt;code&gt;e2e/&lt;/code&gt; local rules&lt;/td&gt;
&lt;td&gt;Multi-stack applications should let the root guide route and push details into subdomains&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Langfuse&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;.agents/AGENTS.md&lt;/code&gt; as canonical source, synchronized to multiple tool configs&lt;/td&gt;
&lt;td&gt;Avoids drift between Claude, Codex, Cursor, and MCP configuration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;td&gt;Minimal &lt;code&gt;AGENTS.md&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Shows that context does not need to be long when project boundaries are simple&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  2. Intent Layer
&lt;/h2&gt;

&lt;p&gt;The Intent Layer helps the agent understand the goal, boundaries, and acceptance criteria of the current task. Context is long-lived project policy; Intent is the task contract for this change.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What the spec should express&lt;/th&gt;
&lt;th&gt;Typical content&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;What to build&lt;/td&gt;
&lt;td&gt;goal, user scenario, feature scope&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What not to build&lt;/td&gt;
&lt;td&gt;non-goals, exclusions, compatibility boundaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What counts as done&lt;/td&gt;
&lt;td&gt;acceptance criteria, scenario, example&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What must not break&lt;/td&gt;
&lt;td&gt;public contract, permissions, security, performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;How to verify&lt;/td&gt;
&lt;td&gt;test, lint, typecheck, E2E, schema check&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What to do when uncertain&lt;/td&gt;
&lt;td&gt;open questions, conservative decision rules&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Code Evidence: Spec Kit Splits Intent Into Staged Artifacts
&lt;/h3&gt;

&lt;p&gt;Spec Kit is not just "one more spec file." It turns the path from principles to implementation into agent-executable commands.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/speckit.constitution  -&amp;gt; project principles
/speckit.specify       -&amp;gt; what and why
/speckit.plan          -&amp;gt; technical plan
/speckit.tasks         -&amp;gt; implementation tasks
/speckit.implement     -&amp;gt; execute tasks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Its workflow also places spec and plan behind human review gates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;inputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;string&lt;/span&gt;
    &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Describe&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;what&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;you&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;want&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;build"&lt;/span&gt;

&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;specify&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;speckit.specify&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;review-spec&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gate&lt;/span&gt;
    &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;approve&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;reject&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;plan&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;speckit.plan&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;review-plan&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gate&lt;/span&gt;
    &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;approve&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;reject&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tasks&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;speckit.tasks&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;implement&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;speckit.implement&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdyuhel9hird5qbv3i3jh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdyuhel9hird5qbv3i3jh.png" alt="Spec Kit Workflow" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What this means for infrastructure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Project principles are persisted in &lt;code&gt;.specify/memory/constitution.md&lt;/code&gt;, so later specs, plans, and tasks follow the same rules.&lt;/li&gt;
&lt;li&gt;"Think before coding" becomes an agent command rather than a one-off human reminder.&lt;/li&gt;
&lt;li&gt;Review happens at the spec and plan stages, before the agent produces a large diff in the wrong direction.&lt;/li&gt;
&lt;/ul&gt;
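
&lt;p&gt;The effect of a review gate can be shown with a small runner. This is a sketch under stated assumptions: the step objects mirror the YAML excerpt above, but &lt;code&gt;runWorkflow&lt;/code&gt;, &lt;code&gt;runCommand&lt;/code&gt;, and &lt;code&gt;askGate&lt;/code&gt; are hypothetical names and this is not Spec Kit's workflow engine.&lt;/p&gt;

```javascript
// Runs steps in order. A step with type "gate" asks a human (via askGate)
// and stops the workflow unless the answer is "approve".
function runWorkflow(steps, runCommand, askGate) {
  const executed = [];
  for (const step of steps) {
    if (step.type === "gate") {
      if (askGate(step.id) !== "approve") {
        return { status: "rejected", at: step.id, executed };
      }
      continue;
    }
    runCommand(step.command);
    executed.push(step.command);
  }
  return { status: "done", executed };
}

const steps = [
  { id: "specify", command: "speckit.specify" },
  { id: "review-spec", type: "gate" },
  { id: "plan", command: "speckit.plan" },
];
// Rejecting the spec review stops the run before any plan is produced.
console.log(runWorkflow(steps, () => {}, () => "reject"));
```

&lt;p&gt;The point of the sketch is the early return: a rejected spec costs one command, not a plan, a task list, and a large diff.&lt;/p&gt;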

&lt;h3&gt;
  
  
  Code Evidence: OpenSpec Turns Changes Into a Delta Lifecycle
&lt;/h3&gt;

&lt;p&gt;OpenSpec follows a different route. It behaves more like a long-lived behavior specification system: current behavior lives in &lt;code&gt;specs/&lt;/code&gt;, active changes live in &lt;code&gt;changes/&lt;/code&gt;, and completed changes are archived.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;openspec/
  specs/
    &amp;lt;current-system-behavior&amp;gt;/
  changes/
    &amp;lt;active-change&amp;gt;/
      proposal.md
      design.md
      tasks.md
      specs/
    archive/
      &amp;lt;completed-change&amp;gt;/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2jn2z36axyvuz3huzt80.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2jn2z36axyvuz3huzt80.png" alt="OpenSpec Workflow" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a real change, &lt;code&gt;proposal.md&lt;/code&gt; first defines why, what, and non-goals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## What Changes&lt;/span&gt;

Add the first user-facing workspace setup flow:

openspec workspace setup
openspec workspace list
openspec workspace link /path/to/api
openspec workspace relink api /new/path/to/api
openspec workspace doctor

&lt;span class="gu"&gt;## Non-Goals&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; No public &lt;span class="sb"&gt;`openspec workspace create`&lt;/span&gt; command in this first release.
&lt;span class="p"&gt;-&lt;/span&gt; No agent launch or workspace open behavior.
&lt;span class="p"&gt;-&lt;/span&gt; No apply, verify, archive, branch, or worktree behavior.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The delta spec then turns behavior into requirements and scenarios:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## MODIFIED Requirements&lt;/span&gt;

&lt;span class="gu"&gt;### Requirement: Stable Workspace Name&lt;/span&gt;
OpenSpec SHALL use one kebab-case workspace name across workspace identity,
managed storage, and the local registry.

&lt;span class="gu"&gt;#### Scenario: Rejecting invalid workspace names&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; WHEN OpenSpec accepts a workspace name
&lt;span class="p"&gt;-&lt;/span&gt; THEN it SHALL require kebab-case names using lowercase letters, numbers,
  and single hyphen separators
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
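
&lt;p&gt;A requirement written this precisely translates almost directly into a check. The sketch below is one reading of "lowercase letters, numbers, and single hyphen separators"; the regular expression and &lt;code&gt;isValidWorkspaceName&lt;/code&gt; are hypothetical and are not OpenSpec's implementation.&lt;/p&gt;

```javascript
// One or more lowercase alphanumeric segments joined by single hyphens.
// Rejects leading, trailing, and doubled hyphens, as well as uppercase letters.
const KEBAB_CASE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isValidWorkspaceName(name) {
  if (typeof name !== "string") return false;
  return KEBAB_CASE.test(name);
}

for (const candidate of ["billing-api", "Billing-API", "billing--api", "-billing"]) {
  console.log(candidate, isValidWorkspaceName(candidate));
}
```

&lt;p&gt;Because the scenario names concrete accept and reject conditions, both the agent and a reviewer can turn it into test cases without guessing.&lt;/p&gt;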



&lt;p&gt;&lt;code&gt;tasks.md&lt;/code&gt; turns intent into checkable implementation and verification work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="p"&gt;-&lt;/span&gt; [x] Implement &lt;span class="sb"&gt;`openspec workspace setup`&lt;/span&gt; as the only public creation path
&lt;span class="p"&gt;-&lt;/span&gt; [x] Fail cleanly when non-interactive setup is missing a name or link
&lt;span class="p"&gt;-&lt;/span&gt; [x] Run &lt;span class="sb"&gt;`openspec validate workspace-create-and-register-repos --strict`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [x] Run targeted command tests for workspace setup/list/link/relink/doctor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
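
&lt;p&gt;Checkbox tasks are also easy for tooling to read. The following sketch counts open items in a Markdown task list; &lt;code&gt;summarizeTasks&lt;/code&gt; is a hypothetical helper written for this article's example and is not part of the OpenSpec CLI.&lt;/p&gt;

```javascript
// Parses "- [ ] text" and "- [x] text" lines and reports which tasks are still open.
function summarizeTasks(markdown) {
  const tasks = [];
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^\s*- \[( |x|X)\] (.+)$/.exec(line);
    if (match) {
      tasks.push({ done: match[1] !== " ", text: match[2] });
    }
  }
  const open = tasks.filter((task) => !task.done).map((task) => task.text);
  return { total: tasks.length, open };
}

const sample = [
  "- [x] Implement workspace setup",
  "- [ ] Run targeted command tests",
].join("\n");
console.log(summarizeTasks(sample));
```

&lt;p&gt;A pre-merge hook could use a summary like this to refuse completion while any task, including a verification task, is still unchecked.&lt;/p&gt;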



&lt;p&gt;Spec Kit and OpenSpec share the same underlying goal: make human intent consumable, traceable, and reviewable by agents. The difference is that Spec Kit leans toward a staged pipeline, while OpenSpec leans toward a change and delta-spec lifecycle.&lt;/p&gt;

&lt;p&gt;Representative cases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Case&lt;/th&gt;
&lt;th&gt;Reference value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Spec Kit&lt;/td&gt;
&lt;td&gt;constitution / specify / plan / tasks / implement&lt;/td&gt;
&lt;td&gt;Good for greenfield work or large features that need a staged artifact pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenSpec&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;specs/&lt;/code&gt; source of truth + &lt;code&gt;changes/&lt;/code&gt; delta spec + archive&lt;/td&gt;
&lt;td&gt;Good for existing projects where each behavior change needs reviewable lifecycle management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dify&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;api/AGENTS.md&lt;/code&gt; treats docstrings and comments as spec&lt;/td&gt;
&lt;td&gt;Keeps invariants, edge cases, and trade-offs close to the code they constrain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Langfuse&lt;/td&gt;
&lt;td&gt;Public API contract changes must update Fern sources and generated outputs&lt;/td&gt;
&lt;td&gt;Intent is not only requirements text; it can also be API contracts and schema sources of truth&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  3. Execution Layer
&lt;/h2&gt;

&lt;p&gt;The Execution Layer lets the agent work in a reproducible, controlled, bounded environment. It is not merely about "getting the project to run"; it also reduces environment guessing and dangerous side effects.&lt;/p&gt;

&lt;p&gt;The Execution Layer includes test commands and workflows, but it is not the same as the Verification Layer.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;How to install dependencies, start services, reset data, or open a browser&lt;/td&gt;
&lt;td&gt;Execution&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;e2e:install&lt;/code&gt;, &lt;code&gt;e2e:middleware:up&lt;/code&gt;, &lt;code&gt;e2e:reset&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Where code runs, whether network is allowed, how secrets are handled&lt;/td&gt;
&lt;td&gt;Execution&lt;/td&gt;
&lt;td&gt;Docker, sandbox, devcontainer, MCP/tool config&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Which checks this change must run&lt;/td&gt;
&lt;td&gt;Verification&lt;/td&gt;
&lt;td&gt;API changes run API tests; migration changes run migration checks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What counts as passing, and where failure artifacts live&lt;/td&gt;
&lt;td&gt;Verification&lt;/td&gt;
&lt;td&gt;required checks, coverage, E2E report, benchmark output&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The same test command can cross both layers: Execution defines how to run it; Verification defines when it must run, how to judge the result, and how failures flow back.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Evidence: Dify Scripts E2E Execution Entry Points
&lt;/h3&gt;

&lt;p&gt;Dify's E2E package does not merely say "run E2E tests." It scripts installation, middleware startup, reset, full runs, and headed runs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scripts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"e2e"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tsx ./scripts/run-cucumber.ts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"e2e:full"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tsx ./scripts/run-cucumber.ts --full"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"e2e:install"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"playwright install --with-deps chromium"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"e2e:middleware:up"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tsx ./scripts/setup.ts middleware-up"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"e2e:middleware:down"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tsx ./scripts/setup.ts middleware-down"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"e2e:reset"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tsx ./scripts/setup.ts reset"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;End-to-end tests often depend on browsers, backend services, middleware, seed data, and reset order. Turning those into commands is far more reliable than asking the agent to infer the process from documentation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Evidence: OpenHands Writes Real Runtime Pitfalls Into the Execution Entry Point
&lt;/h3&gt;

&lt;p&gt;OpenHands' &lt;code&gt;AGENTS.md&lt;/code&gt; does more than list commands. It documents the environment problems agents will actually hit in a local sandbox.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;make build &amp;amp;&amp;amp; make run FRONTEND_PORT=12000 FRONTEND_HOST=0.0.0.0 &lt;span class="err"&gt;\&lt;/span&gt;
  BACKEND_HOST=0.0.0.0 &amp;amp;&amp;gt; /tmp/openhands-log.txt &amp;amp;

Local run troubleshooting notes:
&lt;span class="p"&gt;-&lt;/span&gt; If the backend fails with &lt;span class="sb"&gt;`nc: command not found`&lt;/span&gt;, install &lt;span class="sb"&gt;`netcat-openbsd`&lt;/span&gt;.
&lt;span class="p"&gt;-&lt;/span&gt; If local runtime startup fails with &lt;span class="sb"&gt;`duplicate session: test-session`&lt;/span&gt;,
  clear the stale tmux session.
&lt;span class="p"&gt;-&lt;/span&gt; In this sandbox environment, an inherited &lt;span class="sb"&gt;`SESSION_API_KEY`&lt;/span&gt; can make
  &lt;span class="sb"&gt;`/api/v1/settings`&lt;/span&gt; return 401 in the browser. Unset it before &lt;span class="sb"&gt;`make run`&lt;/span&gt;.

IMPORTANT: Before making any changes to the codebase, ALWAYS run
&lt;span class="sb"&gt;`make install-pre-commit-hooks`&lt;/span&gt;.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Humans usually carry this knowledge as unwritten experience. Agents need it inside the repository. A mature Execution Layer turns real environment pitfalls into documented and, where possible, executable preconditions.&lt;/p&gt;
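
&lt;p&gt;One way to make two of the pitfalls quoted above executable is a small preflight script. The following is a minimal sketch, not part of OpenHands: the function name &lt;code&gt;preflight&lt;/code&gt; and its injectable parameters are assumptions made for illustration.&lt;/p&gt;

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def preflight(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return human-readable problems; an empty list means ready to run.

    Hypothetical helper: it only encodes two notes from the quoted guide.
    """
    problems = []
    # Pitfall 1: the backend fails when `nc` is missing from PATH.
    if which("nc") is None:
        problems.append("nc not found: install netcat-openbsd")
    # Pitfall 2: an inherited SESSION_API_KEY can cause 401 responses.
    if env.get("SESSION_API_KEY"):
        problems.append("SESSION_API_KEY is set: unset it before `make run`")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

&lt;p&gt;Passing &lt;code&gt;env&lt;/code&gt; and &lt;code&gt;which&lt;/code&gt; as parameters keeps the check testable without touching the real machine state.&lt;/p&gt;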

&lt;h3&gt;
  
  
  Code Evidence: Aider Benchmarks Run Isolated by Default
&lt;/h3&gt;

&lt;p&gt;Aider's benchmark executes code generated by an LLM, so its instructions state that it is intended to run inside a Docker container.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;The benchmark is intended to be run inside a docker container.
This is because the benchmarking harness will be taking code written by an LLM
and executing it without any human review or supervision.

./benchmark/docker_build.sh
./benchmark/docker.sh
./benchmark/benchmark.py a-helpful-name-for-this-run --model gpt-3.5-turbo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Execution Layer therefore includes safety boundaries. When agent or model output is executed, sandbox or Docker isolation is not a nice-to-have. It is infrastructure.&lt;/p&gt;
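
&lt;p&gt;For projects without a ready-made container script, the same boundary can be approximated by wrapping the untrusted command in &lt;code&gt;docker run&lt;/code&gt;. This is a hedged sketch, not Aider's implementation (Aider ships its own &lt;code&gt;docker_build.sh&lt;/code&gt; and &lt;code&gt;docker.sh&lt;/code&gt;). The function name, the mount point &lt;code&gt;/work&lt;/code&gt;, and the choice to disable networking are assumptions.&lt;/p&gt;

```python
def isolated_command(image: str, workdir: str, inner: list[str]) -> list[str]:
    """Build a `docker run` argv that executes `inner` in a throwaway container.

    Hypothetical helper. `workdir` should be an absolute host path; it is
    mounted at /work, which is an arbitrary mount point chosen for this sketch.
    """
    return [
        "docker", "run",
        "--rm",                # delete the container when it exits
        "--network", "none",   # no network access for untrusted code
        "--volume", f"{workdir}:/work",
        "--workdir", "/work",
        image,
        *inner,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(isolated_command("python:3.12-slim", "/tmp/run1",
                                    ["python", "generated.py"])))
```

&lt;p&gt;Returning an argument list rather than a shell string avoids quoting problems when the command is later passed to &lt;code&gt;subprocess.run&lt;/code&gt;.&lt;/p&gt;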

&lt;p&gt;Implementation guidance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small projects need at least one reliable setup/test/build entry point.&lt;/li&gt;
&lt;li&gt;Medium and large projects should distinguish local quick checks, PR checks, and CI-only checks.&lt;/li&gt;
&lt;li&gt;If model-generated code will be executed, default to Docker or sandbox isolation.&lt;/li&gt;
&lt;li&gt;MCP/tool config should be generated from one source rather than hand-maintained across multiple IDE-specific files.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Representative cases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Case&lt;/th&gt;
&lt;th&gt;Reference value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dify&lt;/td&gt;
&lt;td&gt;E2E package scripts manage install, middleware up/down, reset, and full runs&lt;/td&gt;
&lt;td&gt;Scripts complex E2E execution so the agent does not guess&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenHands&lt;/td&gt;
&lt;td&gt;run/build/pre-commit plus sandbox troubleshooting&lt;/td&gt;
&lt;td&gt;Makes real local development problems explicit, especially for complex apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aider&lt;/td&gt;
&lt;td&gt;benchmark must run in Docker&lt;/td&gt;
&lt;td&gt;Treats execution of LLM-generated code as a safety boundary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Langfuse&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;scripts/codex/setup.sh&lt;/code&gt;, Playwright install, MCP/Codex/Cursor environment generation&lt;/td&gt;
&lt;td&gt;Execution includes agent tools and environment bootstrap, not only shell commands&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  4. Verification Layer
&lt;/h2&gt;

&lt;p&gt;Verification Layer lets the project automatically judge whether the agent actually did the right thing.&lt;/p&gt;

&lt;p&gt;The relationship between spec and harness can be summarized as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Spec defines correctness. Harness makes correctness executable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdljobvq4xl9f7d22vtf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdljobvq4xl9f7d22vtf.png" alt="Spec x Harness" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Evidence: Dify Uses Path Filters to Select CI
&lt;/h3&gt;

&lt;p&gt;Dify's main CI first determines which areas changed, then triggers only the affected API, web, E2E, vector database, and migration workflows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;check-changes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;api-changed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.changes.outputs.api }}&lt;/span&gt;
    &lt;span class="na"&gt;e2e-changed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.changes.outputs.e2e }}&lt;/span&gt;
    &lt;span class="na"&gt;web-changed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.changes.outputs.web }}&lt;/span&gt;
    &lt;span class="na"&gt;vdb-changed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.changes.outputs.vdb }}&lt;/span&gt;
    &lt;span class="na"&gt;migration-changed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.changes.outputs.migration }}&lt;/span&gt;
  &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dorny/paths-filter@...&lt;/span&gt;
      &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;filters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;api:&lt;/span&gt;
            &lt;span class="s"&gt;- 'api/**'&lt;/span&gt;
          &lt;span class="s"&gt;web:&lt;/span&gt;
            &lt;span class="s"&gt;- 'web/**'&lt;/span&gt;
            &lt;span class="s"&gt;- 'packages/**'&lt;/span&gt;
          &lt;span class="s"&gt;e2e:&lt;/span&gt;
            &lt;span class="s"&gt;- 'api/**'&lt;/span&gt;
            &lt;span class="s"&gt;- 'e2e/**'&lt;/span&gt;
            &lt;span class="s"&gt;- 'web/**'&lt;/span&gt;
          &lt;span class="s"&gt;migration:&lt;/span&gt;
            &lt;span class="s"&gt;- 'api/migrations/**'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not product logic. It is a verification planner for agents: the changed surface determines the checks that should run. Large projects cannot ask agents to run everything blindly on every change, but they also cannot let affected areas go unchecked.&lt;/p&gt;
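
&lt;p&gt;The same planning step can be reproduced locally so an agent can ask "which areas did I touch?" before pushing. The sketch below is an approximation: it reuses the filter names and globs from the excerpt, but it matches with Python's &lt;code&gt;fnmatch&lt;/code&gt; module, whose glob semantics are not guaranteed to be identical to those of the CI action. The function name &lt;code&gt;changed_areas&lt;/code&gt; is an assumption.&lt;/p&gt;

```python
from fnmatch import fnmatchcase

# Filter names and globs copied from the excerpt above. The vdb filter is
# omitted because the excerpt does not show its globs.
FILTERS = {
    "api": ["api/**"],
    "web": ["web/**", "packages/**"],
    "e2e": ["api/**", "e2e/**", "web/**"],
    "migration": ["api/migrations/**"],
}


def changed_areas(paths: list[str]) -> set[str]:
    """Return every filter name with at least one matching changed path."""
    return {
        name
        for name, globs in FILTERS.items()
        if any(fnmatchcase(path, glob) for path in paths for glob in globs)
    }


if __name__ == "__main__":
    print(sorted(changed_areas(["web/app/page.tsx"])))
```

&lt;p&gt;A change under &lt;code&gt;web/&lt;/code&gt; selects both the web and E2E areas, which mirrors how the E2E filter in the excerpt lists API, E2E, and web paths.&lt;/p&gt;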

&lt;h3&gt;
  
  
  Code Evidence: OpenClaw Implements Changed Gates as Project Code
&lt;/h3&gt;

&lt;p&gt;OpenClaw does not rely only on CI YAML. It encodes path classification, impact, and reasons inside repository scripts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;DOCS_PATH_RE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;(?:&lt;/span&gt;&lt;span class="sr"&gt;docs&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;|README&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="sr"&gt;md$|AGENTS&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="sr"&gt;md$|.*&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="sr"&gt;mdx&lt;/span&gt;&lt;span class="se"&gt;?&lt;/span&gt;&lt;span class="sr"&gt;$&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;/u&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;EXTENSION_PATH_RE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/^extensions&lt;/span&gt;&lt;span class="se"&gt;\/[^/]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;(?:\/&lt;/span&gt;&lt;span class="sr"&gt;|$&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;/u&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CORE_PATH_RE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;(?:&lt;/span&gt;&lt;span class="sr"&gt;src&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;|ui&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;|packages&lt;/span&gt;&lt;span class="se"&gt;\/)&lt;/span&gt;&lt;span class="sr"&gt;/u&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;PUBLIC_EXTENSION_CONTRACT_RE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;(?:&lt;/span&gt;&lt;span class="sr"&gt;src&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;plugin-sdk&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;|src&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;plugins&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;contracts&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;|src&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;channels&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;plugins&lt;/span&gt;&lt;span class="se"&gt;\/)&lt;/span&gt;&lt;span class="sr"&gt;/u&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;PUBLIC_EXTENSION_CONTRACT_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;changedPath&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;lanes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;core&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;lanes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;coreTests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;lanes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;extensions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;lanes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;extensionTests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;reasons&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;changedPath&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: public core/plugin contract affects extensions`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is stronger than a natural-language rule. It tells the agent that changing a public plugin contract cannot be verified with core tests alone; extension checks are part of the impact surface.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Evidence: Hermes Agent Uses a Test Runner to Remove Local/CI Drift
&lt;/h3&gt;

&lt;p&gt;Hermes Agent does not recommend running &lt;code&gt;pytest&lt;/code&gt; directly. It provides a canonical test runner that fixes environment settings, unsets credential-shaped variables, and pins the worker count to a default that can be overridden.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# inside an env-var loop&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;
  &lt;span class="k"&gt;*&lt;/span&gt;_API_KEY|&lt;span class="k"&gt;*&lt;/span&gt;_TOKEN|&lt;span class="k"&gt;*&lt;/span&gt;_SECRET|&lt;span class="k"&gt;*&lt;/span&gt;_PASSWORD|&lt;span class="k"&gt;*&lt;/span&gt;_CREDENTIALS|GH_TOKEN|GITHUB_TOKEN&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;unset&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;;;&lt;/span&gt;
&lt;span class="k"&gt;esac&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TZ&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;UTC
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;LANG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;C.UTF-8
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PYTHONHASHSEED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
&lt;span class="nv"&gt;WORKERS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;HERMES_TEST_WORKERS&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;4&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PYTHON&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; pytest &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="s2"&gt;"addopts="&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$WORKERS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ignore&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;tests/integration &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ignore&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;tests/e2e &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ARGS&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matters for Coding Agents. An agent cannot know which API keys, locale, CPU count, or shell state it inherits on a developer machine. A hermetic runner makes "it passed locally" closer to a CI-quality signal.&lt;/p&gt;
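
&lt;p&gt;The environment-scrubbing part of such a runner is small enough to sketch in a few lines. This is an illustrative Python version of the shell logic quoted above, not Hermes Agent's code; the names &lt;code&gt;hermetic_env&lt;/code&gt; and &lt;code&gt;CREDENTIAL_PATTERNS&lt;/code&gt; are assumptions.&lt;/p&gt;

```python
from fnmatch import fnmatchcase
from typing import Mapping

# Patterns mirror the shell `case` arms in the excerpt above.
CREDENTIAL_PATTERNS = (
    "*_API_KEY", "*_TOKEN", "*_SECRET", "*_PASSWORD", "*_CREDENTIALS",
    "GH_TOKEN", "GITHUB_TOKEN",
)


def hermetic_env(env: Mapping[str, str]) -> dict[str, str]:
    """Copy `env`, drop credential-shaped variables, and pin locale settings."""
    clean = {
        name: value
        for name, value in env.items()
        if not any(fnmatchcase(name, pat) for pat in CREDENTIAL_PATTERNS)
    }
    # Same fixed values as the exports in the excerpt.
    clean.update({"TZ": "UTC", "LANG": "C.UTF-8", "PYTHONHASHSEED": "0"})
    return clean


if __name__ == "__main__":
    import os
    print(sorted(hermetic_env(os.environ)))
```

&lt;p&gt;A caller could pass the result as the &lt;code&gt;env&lt;/code&gt; argument of &lt;code&gt;subprocess.run&lt;/code&gt; when launching the test process, so the tests never see the developer's real credentials.&lt;/p&gt;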

&lt;p&gt;The core rule for Verification Layer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Important rules in the spec should have corresponding checks in the harness.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;If the spec says "public API schema must not change," the harness should include a contract test or schema diff.&lt;/li&gt;
&lt;li&gt;If the spec says "migration must be reversible," the harness should include migration dry-run or rollback checks.&lt;/li&gt;
&lt;li&gt;If the spec says "no cross-owner dependency," the harness should include import boundary or dependency ownership checks.&lt;/li&gt;
&lt;/ul&gt;
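
&lt;p&gt;The rule-to-check pairing itself can be audited mechanically. The sketch below assumes a project keeps a table of spec rules and a table of harness checks keyed by the same IDs. Every rule ID and &lt;code&gt;make&lt;/code&gt; target in it is hypothetical and stands in for whatever the project actually uses.&lt;/p&gt;

```python
# Hypothetical rule IDs and check commands, for illustration only.
SPEC_RULES = {
    "api-schema-stable": "public API schema must not change",
    "migration-reversible": "migration must be reversible",
    "no-cross-owner-deps": "no cross-owner dependency",
}

HARNESS_CHECKS = {
    "api-schema-stable": ["make schema-diff"],
    "migration-reversible": ["make migration-rollback-check"],
}


def uncovered_rules(rules: dict[str, str], checks: dict[str, list[str]]) -> list[str]:
    """Return rule IDs that have no harness check, in sorted order."""
    return sorted(rule for rule in rules if not checks.get(rule))


if __name__ == "__main__":
    for rule in uncovered_rules(SPEC_RULES, HARNESS_CHECKS):
        print("spec rule without a check:", rule)
```

&lt;p&gt;Running an audit like this in CI turns "a rule exists only in prose" into a visible failure instead of a silent gap.&lt;/p&gt;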

&lt;p&gt;Representative cases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Case&lt;/th&gt;
&lt;th&gt;Reference value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dify&lt;/td&gt;
&lt;td&gt;path-filter CI plus stable required checks&lt;/td&gt;
&lt;td&gt;Large apps select API/web/E2E/migration checks from changed surfaces&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;changed-lanes.mjs&lt;/code&gt; turns path impact into code&lt;/td&gt;
&lt;td&gt;Verification planning no longer depends on human memory, especially when a public contract change spreads to extensions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hermes Agent&lt;/td&gt;
&lt;td&gt;canonical test runner unsets credential env vars, pins workers, excludes integration/e2e&lt;/td&gt;
&lt;td&gt;Reduces local/CI drift and makes agent test conclusions more trustworthy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ragas&lt;/td&gt;
&lt;td&gt;Makefile aggregates &lt;code&gt;format&lt;/code&gt;, &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;check&lt;/code&gt;, &lt;code&gt;run-ci&lt;/code&gt;, and &lt;code&gt;benchmarks&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;General-purpose libraries can expose a stable harness through a small command surface&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aider / SWE-agent&lt;/td&gt;
&lt;td&gt;benchmark/eval harness records pass rate, cost, trajectory&lt;/td&gt;
&lt;td&gt;Validating agent capability itself requires reproducible evals, not just one-off tests&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  5. Governance Layer
&lt;/h2&gt;

&lt;p&gt;Governance Layer constrains, audits, approves, and rolls back agent changes. This is the difference between "an agent can write code" and "we can let an agent into a real project."&lt;/p&gt;
&lt;h3&gt;
  
  
  Code Evidence: OpenHands Gives Complex PRs a Temporary Evidence Directory
&lt;/h3&gt;

&lt;p&gt;OpenHands allows complex PRs to store design rationale, debug logs, E2E results, and other temporary material in &lt;code&gt;.pr/&lt;/code&gt;, but does not want that content merged into the main branch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;if [ -d ".pr" ]; then&lt;/span&gt;
  &lt;span class="s"&gt;git config user.name "allhands-bot"&lt;/span&gt;
  &lt;span class="s"&gt;git rm -rf .pr/&lt;/span&gt;
  &lt;span class="s"&gt;git commit -m "chore&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Remove PR-only artifacts [automated]"&lt;/span&gt;
  &lt;span class="s"&gt;git push&lt;/span&gt;
&lt;span class="s"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The design is useful because governance is not only about restricting what agents can do. It can also give agents a temporary workspace where process evidence is visible without polluting the long-lived codebase. OpenClaw's &lt;code&gt;AGENTS.md&lt;/code&gt; reflects the same principle by putting broad gates, Testbox, owner review, release approval, and PR verification into the operating rules agents must read.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Evidence: Langfuse Turns PR Rules Into Checks
&lt;/h3&gt;

&lt;p&gt;Langfuse's PR template requires Conventional Commit titles, self-review, tests, and documentation checks; a workflow then validates the title automatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Validate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;PR&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Title"&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;opened&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;edited&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;synchronize&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;reopened&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;validate-pr-title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Validate PR title follows conventional commits&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;amannn/action-semantic-pull-request@...&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;feat&lt;/span&gt;
            &lt;span class="s"&gt;fix&lt;/span&gt;
            &lt;span class="s"&gt;docs&lt;/span&gt;
            &lt;span class="s"&gt;refactor&lt;/span&gt;
            &lt;span class="s"&gt;test&lt;/span&gt;
            &lt;span class="s"&gt;security&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matters for agents because "how a PR enters collaboration" becomes a machine-checkable governance boundary, not just reviewer feedback.&lt;/p&gt;
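
&lt;p&gt;An agent can run a similar check locally before opening the PR. The sketch below approximates a Conventional Commit header check with a regular expression, using the type list from the workflow excerpt. It is not the parser used by the GitHub Action, and the function name &lt;code&gt;is_valid_title&lt;/code&gt; is an assumption.&lt;/p&gt;

```python
import re

# Types copied from the workflow excerpt above.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test", "security")

# type, optional "(scope)", optional "!", then ": " and a non-empty subject.
TITLE_RE = re.compile(
    r"^(%s)(\([^()\s]+\))?!?: \S.*$" % "|".join(ALLOWED_TYPES)
)


def is_valid_title(title: str) -> bool:
    """Return True when `title` looks like a Conventional Commit header."""
    return TITLE_RE.match(title) is not None


if __name__ == "__main__":
    for candidate in ("feat(web): add trace filter", "chore: bump deps"):
        print(candidate, "->", is_valid_title(candidate))
```

&lt;p&gt;The local check is only a fast approximation. The workflow remains the authoritative gate.&lt;/p&gt;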

&lt;p&gt;Governance Layer should make clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which files or directories can be modified.&lt;/li&gt;
&lt;li&gt;Which public contracts need owner review.&lt;/li&gt;
&lt;li&gt;Which commands can run locally and which belong to CI or remote systems.&lt;/li&gt;
&lt;li&gt;Which check failures must be fixed and which can be explained.&lt;/li&gt;
&lt;li&gt;Which temporary evidence may enter a PR and which must never merge to main.&lt;/li&gt;
&lt;/ul&gt;
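
&lt;p&gt;The first two items in this list can be expressed as data plus a small classifier. The following is a sketch under stated assumptions: the glob lists are a hypothetical policy, not taken from any of the projects discussed, and matching uses Python's &lt;code&gt;fnmatch&lt;/code&gt; module.&lt;/p&gt;

```python
from fnmatch import fnmatchcase

# Hypothetical policy: globs an agent may edit, and globs that need an owner.
AGENT_WRITABLE = ["src/**", "tests/**", "docs/**"]
OWNER_REVIEW = ["src/plugin-sdk/**"]


def classify(path: str) -> str:
    """Return 'owner-review', 'allowed', or 'forbidden' for one changed path."""
    # Owner-review globs win over the general allowlist.
    if any(fnmatchcase(path, glob) for glob in OWNER_REVIEW):
        return "owner-review"
    if any(fnmatchcase(path, glob) for glob in AGENT_WRITABLE):
        return "allowed"
    return "forbidden"


if __name__ == "__main__":
    for changed in ("src/plugin-sdk/index.ts", ".github/workflows/ci.yml"):
        print(changed, "->", classify(changed))
```

&lt;p&gt;Checking the stricter list first means a path that is both writable and contract-bearing is never silently treated as an ordinary edit.&lt;/p&gt;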

&lt;p&gt;Representative cases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Case&lt;/th&gt;
&lt;th&gt;Reference value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenHands&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;.pr/&lt;/code&gt; temporary artifacts plus cleanup after approval&lt;/td&gt;
&lt;td&gt;Gives complex agent PRs an evidence space while keeping main clean&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;owner review, Testbox, release approval, PR verification in &lt;code&gt;AGENTS.md&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;High-risk repos must state which actions require human approval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Langfuse&lt;/td&gt;
&lt;td&gt;PR template + semantic PR title workflow + CodeQL/Snyk&lt;/td&gt;
&lt;td&gt;Turns review hygiene and security checks into automatic gates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dify&lt;/td&gt;
&lt;td&gt;semantic PR title + layered CI required checks&lt;/td&gt;
&lt;td&gt;Large apps use stable check names and PR rules to maintain merge gates&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  6. Feedback Layer
&lt;/h2&gt;

&lt;p&gt;Feedback Layer lets failures, review comments, and quality signals flow back so the next agent run improves. This layer is still early, but several patterns are already visible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Evidence: Failure Is an Artifact, Not Terminal Output
&lt;/h3&gt;

&lt;p&gt;Dify's E2E workflow uploads logs, and SWE-agent's CI uploads trajectories. The shared idea is that failure should not remain in a single terminal session. It should become downloadable, reviewable material that an agent can inspect in the next step.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Dify web-e2e&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload E2E logs&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/upload-artifact@...&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;e2e-logs&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;e2e/.logs&lt;/span&gt;

&lt;span class="c1"&gt;# SWE-agent pytest&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload logs &amp;amp; trajectories&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/upload-artifact@v7&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always()&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;trajectories-py${{ matrix.python-version }}&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;trajectories/runner/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Aider's benchmark report records the key context of an eval run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claude-3.5-sonnet&lt;/span&gt;
&lt;span class="na"&gt;edit_format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;diff&lt;/span&gt;
&lt;span class="na"&gt;commit_hash&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;35f21b5&lt;/span&gt;
&lt;span class="na"&gt;pass_rate_1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;57.1&lt;/span&gt;
&lt;span class="na"&gt;percent_cases_well_formed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;99.2&lt;/span&gt;
&lt;span class="na"&gt;syntax_errors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;test_timeouts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;total_cost&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3.6346&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Feedback Layer has three kinds of value:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When the current task fails, the agent can read concrete artifacts instead of guessing again.&lt;/li&gt;
&lt;li&gt;During human review, reviewers can see what verification the agent actually ran, not just its natural-language promise.&lt;/li&gt;
&lt;li&gt;Over time, repeated failures can be written back into Context, Intent, or Verification Layer.&lt;/li&gt;
&lt;/ol&gt;
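
&lt;p&gt;For local runs, where no CI artifact upload exists, the first point can be approximated by writing each failed check to a file. This is a minimal sketch: the function name, the file name &lt;code&gt;failure.json&lt;/code&gt;, and the record fields are assumptions chosen for illustration.&lt;/p&gt;

```python
import json
from pathlib import Path


def write_failure_artifact(out_dir: Path, command: list[str],
                           exit_code: int, log_tail: str) -> Path:
    """Persist one failed check as JSON so a later step can read it.

    Hypothetical helper; the field names are not a standard format.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {"command": command, "exit_code": exit_code, "log_tail": log_tail}
    path = out_dir / "failure.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    import tempfile
    target = Path(tempfile.mkdtemp()) / "artifacts"
    print(write_failure_artifact(target, ["pytest", "-q"], 1, "AssertionError"))
```

&lt;p&gt;A structured record gives the next agent step the exact command and exit code, rather than a scrollback it has to reinterpret.&lt;/p&gt;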

&lt;p&gt;Representative cases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Case&lt;/th&gt;
&lt;th&gt;Reference value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dify&lt;/td&gt;
&lt;td&gt;API/web/E2E coverage and E2E log artifacts&lt;/td&gt;
&lt;td&gt;Failures flow back by subsystem, making the next agent step easier to localize&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE-agent&lt;/td&gt;
&lt;td&gt;trajectory artifacts&lt;/td&gt;
&lt;td&gt;Reviewers can replay the agent behavior path, not only the final result&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aider&lt;/td&gt;
&lt;td&gt;benchmark YAML records model, commit, pass rate, cost, and error types&lt;/td&gt;
&lt;td&gt;Eval results become comparable and reproducible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;changed lane reasons, timing, performance notes&lt;/td&gt;
&lt;td&gt;The agent can understand why checks ran and which ones are expensive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Langfuse&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;agents:check&lt;/code&gt; detects multi-tool config drift&lt;/td&gt;
&lt;td&gt;Agent configuration itself enters the feedback loop&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Cross-Project Observations
&lt;/h2&gt;

&lt;p&gt;The table below looks only at how these projects let Coding Agents participate in development. It does not evaluate their product shape.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;What the code shows&lt;/th&gt;
&lt;th&gt;Practical lesson&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;AGENTS.md&lt;/code&gt;, scoped guides, changed lane scripts, gate rules&lt;/td&gt;
&lt;td&gt;Large repos need agent operating manuals and coded verification planners&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Langfuse&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;.agents/&lt;/code&gt; canonical source, MCP/Cursor/Codex/Claude config generation&lt;/td&gt;
&lt;td&gt;Multiple agent IDEs need a shared source of truth to avoid drift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dify&lt;/td&gt;
&lt;td&gt;root/scoped &lt;code&gt;AGENTS.md&lt;/code&gt;, E2E scripts, path-filter CI&lt;/td&gt;
&lt;td&gt;Large apps should split context and checks by subdomain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenHands&lt;/td&gt;
&lt;td&gt;setup troubleshooting, pre-commit, &lt;code&gt;.pr/&lt;/code&gt; cleanup workflow&lt;/td&gt;
&lt;td&gt;Complex PRs need process evidence while the main branch stays clean&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hermes Agent&lt;/td&gt;
&lt;td&gt;hermetic test runner, credential env cleanup, fixed workers&lt;/td&gt;
&lt;td&gt;Harnesses should actively remove local/CI drift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;td&gt;minimal agent guide, unified make commands&lt;/td&gt;
&lt;td&gt;When a project is simple, a short guide can be enough&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spec Kit&lt;/td&gt;
&lt;td&gt;specify / plan / tasks / implement + review gate&lt;/td&gt;
&lt;td&gt;Intent Layer can be a staged pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenSpec&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;specs/&lt;/code&gt; + &lt;code&gt;changes/&lt;/code&gt; + archive&lt;/td&gt;
&lt;td&gt;Existing projects benefit from delta specs that track behavior changes over time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aider&lt;/td&gt;
&lt;td&gt;Docker benchmark harness, pass rate/cost/error report&lt;/td&gt;
&lt;td&gt;Eval harnesses should isolate execution and record reproducible metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE-agent&lt;/td&gt;
&lt;td&gt;Docker sandbox, batch mode, trajectory artifacts&lt;/td&gt;
&lt;td&gt;Agent evaluation needs instances, environments, patches, and evaluation loops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ragas&lt;/td&gt;
&lt;td&gt;Makefile checks, CI matrix, benchmarks&lt;/td&gt;
&lt;td&gt;General-purpose libraries can provide a stable harness through clear commands&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Common patterns are emerging:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context files are becoming standard, but formats are still fragmented.&lt;/li&gt;
&lt;li&gt;Scoped guides are more maintainable than one giant root file.&lt;/li&gt;
&lt;li&gt;Verification is the layer with the most mature consensus; projects differ mainly in whether they provide a unified entry point and a changed gate.&lt;/li&gt;
&lt;li&gt;Spec workflows are still diverging, but "create reviewable artifacts before coding" is clearly becoming the direction.&lt;/li&gt;
&lt;li&gt;Governance becomes very concrete in higher-risk projects.&lt;/li&gt;
&lt;li&gt;Feedback Layer is early, but benchmarks, trajectories, artifacts, and config drift checks are already visible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The divergences are also clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some projects use minimal guides; others use detailed operating manuals.&lt;/li&gt;
&lt;li&gt;Some treat specs as long-lived sources of truth; others keep them only for a PR or feature.&lt;/li&gt;
&lt;li&gt;Some try to reproduce CI locally; others explicitly push heavy checks to remote or CI-only environments.&lt;/li&gt;
&lt;li&gt;Some aggregate checks with Makefiles; others write dedicated changed-gate scripts.&lt;/li&gt;
&lt;li&gt;Some scatter agent config across tool directories; others use &lt;code&gt;.agents/&lt;/code&gt; as a canonical source.&lt;/li&gt;
&lt;/ul&gt;
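
&lt;p&gt;The canonical-source pattern can be sketched as a small drift check. This is a minimal illustration, not Langfuse's actual script: the file layout, the &lt;code&gt;TARGETS&lt;/code&gt; list, and the function names &lt;code&gt;render&lt;/code&gt;, &lt;code&gt;find_drift&lt;/code&gt;, and &lt;code&gt;sync&lt;/code&gt; are all assumptions made for the example.&lt;/p&gt;

```python
from pathlib import Path

# Hypothetical layout: one canonical guide, copied into tool-specific files.
CANONICAL = ".agents/AGENTS.md"
TARGETS = ["AGENTS.md", "CLAUDE.md"]


def render(canonical_text):
    """Build the expected content of a generated file."""
    return "# Generated from " + CANONICAL + ". Do not edit.\n" + canonical_text


def find_drift(root):
    """Return the targets that are missing or differ from the canonical source."""
    root = Path(root)
    expected = render((root / CANONICAL).read_text(encoding="utf-8"))
    drifted = []
    for name in TARGETS:
        target = root / name
        if not target.exists() or target.read_text(encoding="utf-8") != expected:
            drifted.append(name)
    return drifted


def sync(root):
    """Rewrite every target from the canonical source."""
    root = Path(root)
    expected = render((root / CANONICAL).read_text(encoding="utf-8"))
    for name in TARGETS:
        (root / name).write_text(expected, encoding="utf-8")
```

&lt;p&gt;Running &lt;code&gt;find_drift&lt;/code&gt; in CI and failing on a non-empty result is one way to keep hand edits to generated files from going unnoticed; &lt;code&gt;sync&lt;/code&gt; is the matching repair step.&lt;/p&gt;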

&lt;h2&gt;
  
  
  From Ordinary Repo to Agent-Ready Repo
&lt;/h2&gt;

&lt;p&gt;An ordinary project does not need to adopt Spec Kit, OpenSpec, eval harnesses, and complex CI gates all at once. A more practical path is staged maturity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rr1sn69jzl8r9lifb82.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rr1sn69jzl8r9lifb82.png" alt="Agent-ready Repo Roadmap" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Goal&lt;/th&gt;
&lt;th&gt;Minimal action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0. Ordinary repo&lt;/td&gt;
&lt;td&gt;README, source code, and some tests, while key rules live in people's heads&lt;/td&gt;
&lt;td&gt;None; the agent can only search and guess&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1. Agent Context Ready&lt;/td&gt;
&lt;td&gt;The agent understands project structure, commands, and boundaries&lt;/td&gt;
&lt;td&gt;Write a root &lt;code&gt;AGENTS.md&lt;/code&gt;; add scoped guides for large areas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Spec Ready&lt;/td&gt;
&lt;td&gt;The agent understands the task contract for this change&lt;/td&gt;
&lt;td&gt;Use lightweight specs for small changes; requirements/design/tasks for medium features; OpenSpec for existing systems; Spec Kit for large or greenfield features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Harness Ready&lt;/td&gt;
&lt;td&gt;The agent can verify changes through a unified entry point&lt;/td&gt;
&lt;td&gt;Provide &lt;code&gt;./scripts/verify.sh&lt;/code&gt;, &lt;code&gt;make check&lt;/code&gt;, or &lt;code&gt;pnpm check&lt;/code&gt;, and explain quick/PR/CI-only checks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Agent Workflow Ready&lt;/td&gt;
&lt;td&gt;The six layers connect into the daily workflow&lt;/td&gt;
&lt;td&gt;Define the sequence: read guide, produce spec, change code, verify, submit evidence, enter review&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5. Continuous Agent Improvement&lt;/td&gt;
&lt;td&gt;Failures and review feed back into infrastructure&lt;/td&gt;
&lt;td&gt;Track failure patterns, write repeated review comments into guides, and add missing cases to spec or harness&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
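
&lt;p&gt;For Level 3, a unified entry point with check tiers might look like the sketch below. The tier names, the &lt;code&gt;CHECKS&lt;/code&gt; table, and the placeholder commands are assumptions for illustration; a real project would list its own lint, test, and build commands, and CI-only checks would stay out of the local script.&lt;/p&gt;

```python
import subprocess
import sys

# Hypothetical tiers and placeholder commands; swap in the project's real checks.
CHECKS = {
    "quick": [[sys.executable, "-c", "print('lint ok')"]],
    "pr": [[sys.executable, "-c", "print('unit tests ok')"]],
}
TIER_ORDER = ["quick", "pr"]


def plan(tier):
    """Return every command up to and including the requested tier."""
    if tier not in TIER_ORDER:
        raise ValueError("unknown tier: " + tier)
    last = TIER_ORDER.index(tier)
    commands = []
    for name in TIER_ORDER[: last + 1]:
        commands.extend(CHECKS[name])
    return commands


def verify(tier):
    """Run the planned commands in order and stop at the first failure."""
    for command in plan(tier):
        result = subprocess.run(command)
        if result.returncode != 0:
            return result.returncode
    return 0
```

&lt;p&gt;Calling &lt;code&gt;verify("quick")&lt;/code&gt; runs only the cheap checks, while &lt;code&gt;verify("pr")&lt;/code&gt; runs the quick tier first and then the PR tier, so the agent gets one command per situation instead of a list to remember.&lt;/p&gt;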

&lt;p&gt;Level 4 can be very simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read AGENTS.md
  -&amp;gt; read scoped guide
  -&amp;gt; read or generate spec / tasks
  -&amp;gt; make the change
  -&amp;gt; run harness
  -&amp;gt; submit verification evidence
  -&amp;gt; enter PR review / approval gate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What This Means for Agent Grove
&lt;/h2&gt;

&lt;p&gt;This article should be more than the first published piece. It should also serve as a roadmap for building Agent Grove itself.&lt;/p&gt;

&lt;p&gt;We already have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AGENTS.md&lt;/code&gt;: the project collaboration entry point.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;README.md&lt;/code&gt; / &lt;code&gt;README.zh-CN.md&lt;/code&gt;: project positioning and knowledge framework draft.&lt;/li&gt;
&lt;li&gt;VitePress documentation site and GitHub Pages workflow.&lt;/li&gt;
&lt;li&gt;Local &lt;code&gt;external/&lt;/code&gt; research material, which does not enter the formal content tree.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The next steps can move along three lines.&lt;/p&gt;

&lt;p&gt;First, build the Agent Grove repository itself. Following OpenClaw, Dify, and Langfuse, we should not rush to create empty &lt;code&gt;specs/&lt;/code&gt;, &lt;code&gt;evals/&lt;/code&gt;, &lt;code&gt;harnesses/&lt;/code&gt;, or &lt;code&gt;case-studies/&lt;/code&gt; directories. Let real work pull the infrastructure into existence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context: keep maintaining &lt;code&gt;AGENTS.md&lt;/code&gt;; add scoped guides when docs, research, code, and Arbor begin to diverge.&lt;/li&gt;
&lt;li&gt;Intent: each formal research topic should start with a lightweight task contract that states the question, scope, evidence bar, and deliverable shape.&lt;/li&gt;
&lt;li&gt;Execution: document real commands for building the docs site, handling image assets, and checking references.&lt;/li&gt;
&lt;li&gt;Verification: treat &lt;code&gt;npm run docs:build&lt;/code&gt; as the current minimal harness; later add link checks, image existence checks, and reference format checks.&lt;/li&gt;
&lt;li&gt;Governance: keep &lt;code&gt;external&lt;/code&gt; research material out of the formal content tree, and do not package immature content as a finished article.&lt;/li&gt;
&lt;li&gt;Feedback: write repeated review issues back into writing rules or article templates instead of leaving them only in conversation.&lt;/li&gt;
&lt;/ul&gt;
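
&lt;p&gt;The image existence check mentioned under Verification could start as small as the sketch below. It is an assumption-laden illustration rather than existing Agent Grove code: the function name &lt;code&gt;missing_images&lt;/code&gt; is hypothetical, only Markdown image syntax is handled, remote URLs are skipped, and root-absolute paths are resolved against the docs root, which may not match how a given site generator serves static assets.&lt;/p&gt;

```python
import re
from pathlib import Path

# Matches Markdown images such as ![alt](path); the capture group is the path.
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def missing_images(docs_root):
    """Return (markdown file, image path) pairs whose local image file is absent."""
    docs_root = Path(docs_root)
    problems = []
    for page in sorted(docs_root.rglob("*.md")):
        text = page.read_text(encoding="utf-8")
        for ref in IMAGE_PATTERN.findall(text):
            if ref.startswith(("http://", "https://")):
                continue  # remote images need a network check, skipped here
            if ref.startswith("/"):
                # Assumption: root-absolute paths live under the docs root.
                candidate = docs_root / ref.lstrip("/")
            else:
                candidate = page.parent / ref
            if not candidate.exists():
                problems.append((str(page.relative_to(docs_root)), ref))
    return problems
```

&lt;p&gt;A check like this can run after &lt;code&gt;npm run docs:build&lt;/code&gt; and fail the harness when the returned list is not empty.&lt;/p&gt;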

&lt;p&gt;Second, keep examples as minimal validation slices. Examples are useful, but they should not exist just to make the repo look complete. Create them when an idea is hard to explain in prose or needs runnable evidence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A minimal &lt;code&gt;AGENTS.md&lt;/code&gt; plus scoped guide example for Context Layer.&lt;/li&gt;
&lt;li&gt;A lightweight spec plus verify script example for the Intent-to-Verification connection.&lt;/li&gt;
&lt;li&gt;A failure artifact feedback example for Feedback Layer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each example should demonstrate one concept and support article conclusions. It should not become a separate tutorial repo.&lt;/p&gt;

&lt;p&gt;Third, build Arbor. Arbor is not a helper for writing articles or doing research. It is the lightweight learning version of OpenClaw that Agent Grove should actually iterate on: a code-oriented Agent project for practicing agent engineering infrastructure.&lt;/p&gt;

&lt;p&gt;Arbor can start small, but it should enter a real development shape early:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It has its own code directory, module boundaries, and agent-facing guide.&lt;/li&gt;
&lt;li&gt;It has lightweight specs describing each capability increment, non-goals, and acceptance criteria.&lt;/li&gt;
&lt;li&gt;It has an execution harness that can run the minimal agent loop, tool calls, and task flow in a controlled environment.&lt;/li&gt;
&lt;li&gt;It has a verification harness that can decide whether a task was completed according to spec, not just by reading natural-language output.&lt;/li&gt;
&lt;li&gt;It has governance boundaries for files, commands, tools, and external side effects that need restriction or human approval.&lt;/li&gt;
&lt;li&gt;It has feedback artifacts that record failure traces, test results, cost, latency, and replayable execution paths.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples remain valuable, but they are single-point experiments outside Arbor. When a concept does not yet belong in Arbor, or when it needs a very small runnable slice, create an example. Long term, Arbor should be the system that carries the multi-layer infrastructure, not a pile of isolated demos.&lt;/p&gt;

&lt;p&gt;Therefore, Agent Grove should not become a tutorial collection for Spec Kit, OpenSpec, or Harness engineering. The external docs and official tutorials are already detailed. We should absorb them as cases and evidence, turn them into our own engineering judgment, and write future articles around how a project solves a concrete agent engineering problem. The final output should feed back into Arbor as a real system.&lt;/p&gt;

&lt;p&gt;This matches the core claim of the article: agent-ready infrastructure is not about creating every directory at once. It is about gradually engineering the context, intent, execution, verification, governance, and feedback that real collaboration repeatedly needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://agents.md/" rel="noopener noreferrer"&gt;AGENTS.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/guides/agents-md" rel="noopener noreferrer"&gt;OpenAI Codex AGENTS.md guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/harness-engineering/" rel="noopener noreferrer"&gt;OpenAI: Harness engineering for reliable agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/github/spec-kit/blob/1994bd766ea2a3b1d9d87dcec18abc9410f39834/README.md" rel="noopener noreferrer"&gt;GitHub Spec Kit README&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/github/spec-kit/blob/1994bd766ea2a3b1d9d87dcec18abc9410f39834/workflows/speckit/workflow.yml" rel="noopener noreferrer"&gt;Spec Kit workflow definition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Fission-AI/OpenSpec/blob/7c3acccaf7d01006e3aac2194a2a1967e4d66984/docs/getting-started.md" rel="noopener noreferrer"&gt;OpenSpec Getting Started&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Fission-AI/OpenSpec/blob/7c3acccaf7d01006e3aac2194a2a1967e4d66984/docs/workflows.md" rel="noopener noreferrer"&gt;OpenSpec Workflows&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Fission-AI/OpenSpec/tree/7c3acccaf7d01006e3aac2194a2a1967e4d66984/openspec/changes/workspace-create-and-register-repos" rel="noopener noreferrer"&gt;OpenSpec workspace change example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/openclaw/openclaw/blob/e8d0cf75ea0e6c0db5a1468cb0715746fa3ad75e/AGENTS.md" rel="noopener noreferrer"&gt;OpenClaw AGENTS.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/openclaw/openclaw/blob/e8d0cf75ea0e6c0db5a1468cb0715746fa3ad75e/scripts/changed-lanes.mjs" rel="noopener noreferrer"&gt;OpenClaw changed lanes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langfuse/langfuse/blob/0256db00672babdeac527221186429ef258848ca/.agents/AGENTS.md" rel="noopener noreferrer"&gt;Langfuse agent guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langfuse/langfuse/blob/0256db00672babdeac527221186429ef258848ca/scripts/agents/sync-agent-shims.mjs" rel="noopener noreferrer"&gt;Langfuse agent config sync&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langchain-ai/langgraph/blob/a0c4bdc3cb88e371a0fee00b6479509e9c9a8a72/AGENTS.md" rel="noopener noreferrer"&gt;LangGraph AGENTS.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langgenius/dify/blob/cd9daef564369b3926ce7fed242a1feb5c4a451f/AGENTS.md" rel="noopener noreferrer"&gt;Dify AGENTS.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langgenius/dify/blob/cd9daef564369b3926ce7fed242a1feb5c4a451f/api/AGENTS.md" rel="noopener noreferrer"&gt;Dify API Agent Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langgenius/dify/blob/cd9daef564369b3926ce7fed242a1feb5c4a451f/e2e/AGENTS.md" rel="noopener noreferrer"&gt;Dify E2E Agent Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langgenius/dify/blob/cd9daef564369b3926ce7fed242a1feb5c4a451f/e2e/package.json" rel="noopener noreferrer"&gt;Dify E2E package scripts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langgenius/dify/blob/cd9daef564369b3926ce7fed242a1feb5c4a451f/.github/workflows/main-ci.yml" rel="noopener noreferrer"&gt;Dify main CI workflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langgenius/dify/blob/cd9daef564369b3926ce7fed242a1feb5c4a451f/.github/workflows/web-e2e.yml" rel="noopener noreferrer"&gt;Dify Web E2E workflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langgenius/dify/blob/cd9daef564369b3926ce7fed242a1feb5c4a451f/.github/workflows/semantic-pull-request.yml" rel="noopener noreferrer"&gt;Dify semantic PR workflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/OpenHands/OpenHands/blob/d3864d9992c4a7503b32e9fbc1fba8c4bf2bdf92/AGENTS.md" rel="noopener noreferrer"&gt;OpenHands AGENTS.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/OpenHands/OpenHands/blob/d3864d9992c4a7503b32e9fbc1fba8c4bf2bdf92/.github/workflows/pr-artifacts.yml" rel="noopener noreferrer"&gt;OpenHands PR artifacts workflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/NousResearch/hermes-agent/blob/8163d371922768c32f43eb6036d7d36e56775605/scripts/run_tests.sh" rel="noopener noreferrer"&gt;Hermes Agent test runner&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Aider-AI/aider/blob/3ec8ec5a7d695b08a6c24fe6c0c235c8f87df9af/benchmark/README.md" rel="noopener noreferrer"&gt;Aider benchmark harness&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/SWE-agent/SWE-agent/blob/0f4f3bba990e01ca8460b9963abdcd89e38042f2/docs/usage/batch_mode.md" rel="noopener noreferrer"&gt;SWE-agent batch mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/SWE-agent/SWE-agent/blob/0f4f3bba990e01ca8460b9963abdcd89e38042f2/.github/workflows/pytest.yaml" rel="noopener noreferrer"&gt;SWE-agent pytest workflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langfuse/langfuse/blob/0256db00672babdeac527221186429ef258848ca/package.json" rel="noopener noreferrer"&gt;Langfuse package scripts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langfuse/langfuse/blob/0256db00672babdeac527221186429ef258848ca/.github/workflows/validate-pr-title.yml" rel="noopener noreferrer"&gt;Langfuse PR title workflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langfuse/langfuse/blob/0256db00672babdeac527221186429ef258848ca/.github/PULL_REQUEST_TEMPLATE.md" rel="noopener noreferrer"&gt;Langfuse PR template&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langfuse/langfuse/blob/0256db00672babdeac527221186429ef258848ca/.github/workflows/codeql.yml" rel="noopener noreferrer"&gt;Langfuse CodeQL workflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langfuse/langfuse/blob/0256db00672babdeac527221186429ef258848ca/.github/workflows/snyk-web.yml" rel="noopener noreferrer"&gt;Langfuse Snyk Web workflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/explodinggradients/ragas/blob/298b68274234c060deacab3cf5fb52aa3a20e885/Makefile" rel="noopener noreferrer"&gt;Ragas Makefile&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>coding</category>
      <category>opensource</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Proteus: The AI-native editor for multimodal creation</title>
      <dc:creator>gezilinll</dc:creator>
      <pubDate>Fri, 02 Jan 2026 04:19:42 +0000</pubDate>
      <link>https://dev.to/gezilinll/proteus-the-ai-native-editor-for-multimodal-creation-29h3</link>
      <guid>https://dev.to/gezilinll/proteus-the-ai-native-editor-for-multimodal-creation-29h3</guid>
      <description>&lt;p&gt;I'm building Proteus, an open-source multimodal editor (think Figma meets Notion, but AI-native) where &lt;strong&gt;AI writes most of the code&lt;/strong&gt; while I focus on architecture, technical decisions, and quality control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In 2025, tools like Cursor and Claude can write good enough code in 80% of scenarios. The question isn't "Can AI code?" but "What becomes valuable when AI can code?" I believe it's &lt;strong&gt;system design, technical decision-making, and end-to-end ownership&lt;/strong&gt;—not just knowing APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What makes this different:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI-native from day one&lt;/strong&gt;: Every architectural decision prioritizes AI-friendliness. This isn't AI bolted on later—it's designed for AI collaboration from the first line.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fully transparent&lt;/strong&gt;: All code, architecture decisions, and lessons learned are public. I'm documenting the entire journey in weekly technical articles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real editor, not a toy&lt;/strong&gt;: Phase 1 is complete with a working demo. You can create shapes, text, images, transform them, copy/paste, undo/redo—all the core editor capabilities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning resource&lt;/strong&gt;: If you want to understand how editors work (scene graphs, rendering, interaction systems) or how to structure code for AI collaboration, this is a live case study.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Current status:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ Phase 1: Core editing (scene graph, rendering, interaction, tools)&lt;br&gt;&lt;br&gt;
🚧 Phase 2: Multimodal elements (video, audio, web embeds)&lt;br&gt;&lt;br&gt;
📋 Phase 3: AI Agent integration (natural language → editor actions)&lt;br&gt;&lt;br&gt;
📋 Phase 4: Real-time collaboration&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it:&lt;/strong&gt; &lt;a href="https://proteus.gezilinll.com/" rel="noopener noreferrer"&gt;Live Demo&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Code:&lt;/strong&gt; &lt;a href="https://github.com/gezilinll/Proteus" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Articles:&lt;/strong&gt; &lt;a href="https://github.com/gezilinll/Proteus/tree/main/articles" rel="noopener noreferrer"&gt;Tech Blog&lt;/a&gt; (4 articles so far, covering architecture, rendering, interaction design)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The experiment:&lt;/strong&gt; What happens when you stop reviewing AI's code and instead focus entirely on architecture, problem diagnosis, and guiding AI through testing and context-building? That's what I'm exploring here.&lt;/p&gt;

&lt;p&gt;Would love feedback from the HN community—especially from those building complex frontend apps or thinking about AI-native development workflows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Alternative Shorter Version (if character limit is an issue)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Title:&lt;/strong&gt; Proteus: An AI-native multimodal editor where AI writes 80% of the code&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Building an open-source editor (Figma + Notion, AI-native) where AI writes most code while I focus on architecture and decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why:&lt;/strong&gt; In 2025, AI can code—so what becomes valuable? System design, technical decisions, and ownership.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's different:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI-native from day one (not bolted on)&lt;/li&gt;
&lt;li&gt;Fully transparent (all code + articles public)&lt;/li&gt;
&lt;li&gt;Real editor (Phase 1 complete, working demo)&lt;/li&gt;
&lt;li&gt;Learning resource (how editors work, AI-native architecture)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Status:&lt;/strong&gt; Phase 1 ✅ | Phase 2-4 🚧&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo:&lt;/strong&gt; &lt;a href="https://proteus.gezilinll.com/" rel="noopener noreferrer"&gt;https://proteus.gezilinll.com/&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Code:&lt;/strong&gt; &lt;a href="https://github.com/gezilinll/Proteus" rel="noopener noreferrer"&gt;https://github.com/gezilinll/Proteus&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Articles:&lt;/strong&gt; &lt;a href="https://github.com/gezilinll/Proteus/tree/main/articles" rel="noopener noreferrer"&gt;https://github.com/gezilinll/Proteus/tree/main/articles&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Experimenting with: What happens when you stop reviewing AI code and focus on architecture + problem diagnosis?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I have implemented a GPU version of Pica, a high-quality image resizer</title>
      <dc:creator>gezilinll</dc:creator>
      <pubDate>Wed, 05 Mar 2025 01:06:32 +0000</pubDate>
      <link>https://dev.to/gezilinll/i-have-implemented-a-gpu-version-of-pica-which-is-high-quailty-image-resizer-42la</link>
      <guid>https://dev.to/gezilinll/i-have-implemented-a-gpu-version-of-pica-which-is-high-quailty-image-resizer-42la</guid>
      <description>&lt;p&gt;🔗 GitHub Repo: &lt;a href="https://github.com/gezilinll/pica-gpu" rel="noopener noreferrer"&gt;https://github.com/gezilinll/pica-gpu&lt;/a&gt;&lt;br&gt;
🔗 Demo: &lt;a href="https://pica-gpu.gezilinll.com/" rel="noopener noreferrer"&gt;https://pica-gpu.gezilinll.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recently, while using Pica in a project, I noticed that both CPU and memory usage were quite high. When processing multiple images, the application would sometimes freeze or even crash. After reviewing the source code, I have to say that Pica's current CPU-based implementation is already optimized to an extreme level.&lt;/p&gt;

&lt;p&gt;From what I observed, Pica's main optimization lies in its use of advanced filters for image scaling instead of conventional methods like nearest-neighbor interpolation (which prioritizes performance over quality). These filters have higher computational complexity but produce better results. Since filtering mainly controls how target pixels sample from the original image, and each pixel can be processed independently, this logic is inherently well-suited for GPU execution.&lt;/p&gt;
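
&lt;p&gt;To make the "each pixel can be processed independently" point concrete, here is a CPU-side sketch of filter-based resampling for a single row of samples. It uses a simple tent (triangle) kernel as a stand-in, not Pica's actual filters such as mks2013, and the function names &lt;code&gt;tent&lt;/code&gt; and &lt;code&gt;resize_row&lt;/code&gt; are hypothetical. The body of the outer loop is roughly what a fragment shader would compute for one output pixel.&lt;/p&gt;

```python
import math


def tent(x):
    """Triangle kernel: the weight falls linearly to zero at distance 1."""
    return max(0.0, 1.0 - abs(x))


def resize_row(src, dst_len):
    """Resample one row of samples to dst_len values with a scaled tent filter."""
    src_len = len(src)
    scale = src_len / dst_len
    # When shrinking, widen the kernel so every source sample contributes.
    support = max(1.0, scale)
    out = []
    for i in range(dst_len):  # each output sample depends only on the source row
        center = (i + 0.5) * scale - 0.5
        first = math.floor(center - support)
        last = math.ceil(center + support)
        total = 0.0
        weight_sum = 0.0
        for j in range(first, last + 1):
            weight = tent((j - center) / support)
            clamped = min(max(j, 0), src_len - 1)  # repeat edge samples
            total += weight * src[clamped]
            weight_sum += weight
        out.append(total / weight_sum)
    return out
```

&lt;p&gt;Because no iteration of the outer loop reads another iteration's result, the work maps directly onto one GPU invocation per output pixel; a 2D resize applies the same idea along rows and then columns.&lt;/p&gt;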

&lt;p&gt;I have ported all of Pica's filtering algorithms to WebGL, creating a GPU-based version of Pica. In theory, Compute Shaders could improve performance further, but they require WebGPU, which still has browser compatibility concerns, so I have not implemented that yet.&lt;/p&gt;

&lt;p&gt;On the algorithm side, apart from mks2013, other filters theoretically support additional sharpening parameters. However, since our project primarily uses mks2013, which already provides excellent results, I haven't introduced sharpening logic to other filters yet—this is something I may improve in the future.&lt;/p&gt;

&lt;p&gt;Currently, the GPU version of Pica achieves the same anti-moiré effect and sharpness as the original Pica while improving performance by 2-10×, with greater speedup for larger images. Additionally, because the GPU implementation avoids creating extra buffers, memory usage is lower. CPU load is also significantly reduced, which should help prevent performance bottlenecks.&lt;/p&gt;

&lt;p&gt;Since Pica is designed as a CPU-based, JavaScript implementation, modifying it directly via a PR might be challenging for me. Instead, I have created a separate project.&lt;/p&gt;

&lt;p&gt;I’d love for the community to collaborate on this! Any feedback or contributions would be greatly appreciated. Lastly, huge thanks to Pica for making our project possible! 😊&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>javascript</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I am developing a Text Input Component based on Skia and Canvas</title>
      <dc:creator>gezilinll</dc:creator>
      <pubDate>Sun, 14 Jul 2024 02:25:12 +0000</pubDate>
      <link>https://dev.to/gezilinll/i-am-developing-a-text-input-component-based-on-skia-and-canvas-1407</link>
      <guid>https://dev.to/gezilinll/i-am-developing-a-text-input-component-based-on-skia-and-canvas-1407</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/gezilinll/TextMagic" rel="noopener noreferrer"&gt;Visit GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;TextMagic is a next-generation text component. Unlike native input and textarea components, it supports richer text effects and typesetting capabilities. By controlling text layout itself, it ensures consistent text display across different platforms and browsers. TextMagic follows a modular design approach, offering both an integrated component (@text-magic) for seamless integration and standalone components for specific needs: @text-magic/input for text input and @text-magic/renderer for text typesetting and rendering.&lt;/p&gt;

&lt;p&gt;If anyone shares an interest in text or related fields, I welcome discussion and collaboration. I'm also in the process of learning in this area and would appreciate more feedback and assistance.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I am developing a Text Input Component based on Skia and Canvas</title>
      <dc:creator>gezilinll</dc:creator>
      <pubDate>Sat, 13 Jul 2024 13:49:33 +0000</pubDate>
      <link>https://dev.to/gezilinll/i-am-developing-a-text-input-component-based-on-skia-and-canvas-1l04</link>
      <guid>https://dev.to/gezilinll/i-am-developing-a-text-input-component-based-on-skia-and-canvas-1l04</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/gezilinll/TextMagic" rel="noopener noreferrer"&gt;Visit GitHub&lt;/a&gt;&lt;br&gt;
TextMagic is a next-generation text component. Unlike native input and textarea components, it supports richer text effects and typesetting capabilities. By controlling text layout itself, it ensures consistent text display across different platforms and browsers. TextMagic follows a modular design approach, offering both an integrated component (@text-magic) for seamless integration and standalone components for specific needs: @text-magic/input for text input and @text-magic/renderer for text typesetting and rendering.&lt;/p&gt;

&lt;p&gt;If anyone shares an interest in text or related fields, I welcome discussion and collaboration. I'm also in the process of learning in this area and would appreciate more feedback and assistance.&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
