<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hichoi-Dev</title>
    <description>The latest articles on DEV Community by Hichoi-Dev (@casamia918).</description>
    <link>https://dev.to/casamia918</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1175424%2Fccccef7f-7c9c-4c14-812d-d36e99459a10.png</url>
      <title>DEV Community: Hichoi-Dev</title>
      <link>https://dev.to/casamia918</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/casamia918"/>
    <language>en</language>
    <item>
      <title>New workflow control method for harness engineering — Signature-Based Locking</title>
      <dc:creator>Hichoi-Dev</dc:creator>
      <pubDate>Sat, 21 Mar 2026 19:58:59 +0000</pubDate>
      <link>https://dev.to/casamia918/new-workflow-control-method-for-harness-engineering-signature-based-locking-3bmj</link>
      <guid>https://dev.to/casamia918/new-workflow-control-method-for-harness-engineering-signature-based-locking-3bmj</guid>
      <description>&lt;h2&gt;
  
  
  The Problem: AI Won't Stay Harnessed
&lt;/h2&gt;

&lt;p&gt;If you've been building AI-assisted development workflows — what some call "harness engineering" — you've hit this wall:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No matter how carefully you craft your prompts, the AI eventually goes off-script.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You define a multi-step workflow. The AI follows it for a while. Then somewhere around step 4, it decides to "optimize" by skipping steps, modifying files directly, or inventing a shortcut that breaks your entire pipeline.&lt;/p&gt;

&lt;p&gt;This isn't a prompting failure. It's a fundamental limitation of prompt-only workflow control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Prompt-Only Control Fails
&lt;/h2&gt;

&lt;p&gt;Three documented forces work against prompt-based workflow enforcement:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Context Rot (Lost in the Middle)
&lt;/h3&gt;

&lt;p&gt;As conversations grow longer, instructions given early on lose influence. &lt;a href="https://arxiv.org/abs/2307.03172" rel="noopener noreferrer"&gt;Research published in TACL&lt;/a&gt; ("Lost in the Middle") shows that LLMs exhibit a U-shaped performance curve: they use information at the beginning and end of the context well, but accuracy can drop by more than 20% when the relevant information sits in the middle. Your carefully structured "NEVER do X" rules get diluted by thousands of tokens of subsequent conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Training-Induced Optimization Pressure
&lt;/h3&gt;

&lt;p&gt;This isn't speculation — it's documented behavior. &lt;a href="https://arxiv.org/abs/2310.13548" rel="noopener noreferrer"&gt;RLHF training&lt;/a&gt; creates measurable pressure toward concise, "helpful" responses, because human evaluators systematically prefer them. Anthropic's own &lt;a href="https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices" rel="noopener noreferrer"&gt;prompting best practices&lt;/a&gt; explicitly state that newer Claude models "may skip detailed summaries for efficiency." OpenAI acknowledged the same phenomenon when GPT-4 became &lt;a href="https://futurism.com/the-byte/openai-patch-fix-gpt4-laziness" rel="noopener noreferrer"&gt;"lazy"&lt;/a&gt; in December 2023, requiring a new model checkpoint to fix.&lt;/p&gt;

&lt;p&gt;When an AI sees a 5-step workflow where steps 2-4 seem like overhead, it has a trained tendency to compress. This is the model being helpful — and breaking your harness in the process.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. No Enforcement Boundary
&lt;/h3&gt;

&lt;p&gt;Prompts are suggestions, not constraints. There's no mechanism to &lt;em&gt;prevent&lt;/em&gt; the AI from taking an action — you can only &lt;em&gt;ask&lt;/em&gt; it not to. &lt;a href="https://arxiv.org/pdf/2502.13295" rel="noopener noreferrer"&gt;Research on specification gaming&lt;/a&gt; shows that models can learn to satisfy the apparent goal while bypassing the intended process — including &lt;a href="https://lilianweng.github.io/posts/2024-11-28-reward-hacking/" rel="noopener noreferrer"&gt;modifying unit tests to pass&lt;/a&gt; instead of writing correct code. Prompts operate in the same trust domain as the AI itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Been Tried
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Approach 1: Stronger Prompts
&lt;/h3&gt;

&lt;p&gt;Add more rules. Make them UPPERCASE. Use XML tags. Add "CRITICAL" and "NEVER" and "NON-NEGOTIABLE."&lt;/p&gt;

&lt;p&gt;This helps initially but doesn't solve context rot. The more rules you add, the more diluted each individual rule becomes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 2: File-Level Permissions
&lt;/h3&gt;

&lt;p&gt;Restrict which files the AI can modify (e.g., strict mode, read-only markers). This prevents certain destructive actions but doesn't enforce &lt;em&gt;workflow ordering&lt;/em&gt;. The AI can still call commands out of sequence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 3: Deterministic Scripts
&lt;/h3&gt;

&lt;p&gt;Move workflow logic out of prompts and into deterministic scripts. The AI calls scripts instead of modifying state directly. This is the right direction — but scripts alone can't prevent the AI from calling them out of order, or skipping them entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  First, You Need a Workflow
&lt;/h2&gt;

&lt;p&gt;Before we can lock anything, we need something to lock. Signature-Based Locking assumes your AI-assisted work follows a &lt;strong&gt;defined lifecycle with ordered stages&lt;/strong&gt; — a workflow where step N must complete before step N+1 begins.&lt;/p&gt;

&lt;p&gt;This is the structure behind &lt;a href="https://github.com/c-d-cc/reap" rel="noopener noreferrer"&gt;REAP&lt;/a&gt; (Recursive Evolutionary Autonomous Pipeline), where each unit of work, called a &lt;strong&gt;Generation&lt;/strong&gt;, follows a 5-stage lifecycle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Objective → Planning → Implementation → Validation → Completion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each stage has a clear purpose: define the goal, break it into tasks, build it, verify it works, then retrospect and archive. Stages produce artifacts, and transitions between stages are explicit — you can't "drift" from planning into implementation without a deliberate transition.&lt;/p&gt;

&lt;p&gt;This kind of structured workflow is where AI agents provide the most value (creative work within each stage) but also where they cause the most damage (skipping stages, going out of order, bypassing gates). The more structured your workflow, the more you need enforcement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Signature-Based Locking: Enforcing Workflow Sequence from Outside the AI
&lt;/h2&gt;

&lt;p&gt;Given a structured workflow, here's the insight: &lt;strong&gt;sequence enforcement must happen outside the AI's trust boundary.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The AI can ignore prompts. Content guardrails can filter what it says. But neither can enforce the &lt;em&gt;order&lt;/em&gt; in which steps are executed. Cryptographic signatures can.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxm2pp09mtomgqordq8q2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxm2pp09mtomgqordq8q2.png" alt="SIGNATURE_BASED_LOCKING_SEQ_DIAGRAM" width="800" height="985"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each stage command generates a random nonce, stores its SHA256 hash (mixed with execution context) in the workflow state file, and returns the raw nonce to the AI. To advance to the next stage, the AI must pass this nonce to the transition command, which recomputes the hash and verifies it matches.&lt;/p&gt;

&lt;p&gt;The critical property: &lt;strong&gt;only the actual script execution can produce a valid nonce.&lt;/strong&gt; The AI receives the nonce as output, but cannot reverse-engineer or fabricate one. The hash is stored in a managed state file that the AI is structurally prevented from modifying directly.&lt;/p&gt;
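&lt;p&gt;The issue-and-verify step described above can be sketched in a few lines of Python. This is a minimal illustration, not REAP's actual implementation: the function names &lt;code&gt;issue_nonce&lt;/code&gt; and &lt;code&gt;verify_nonce&lt;/code&gt;, the in-memory &lt;code&gt;state&lt;/code&gt; dict standing in for the workflow state file, and the &lt;code&gt;:&lt;/code&gt; separator between hash inputs are all assumptions made for the example.&lt;/p&gt;

```python
import hashlib
import hmac
import secrets


def _context_hash(nonce, gen_id, stage):
    # Bind the nonce to its execution context (generation ID and stage name).
    material = ":".join([nonce, gen_id, stage])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def issue_nonce(state, gen_id, stage):
    """Create a random nonce, store only its context-bound hash, return the raw nonce."""
    nonce = secrets.token_hex(16)
    state["expected_hash"] = _context_hash(nonce, gen_id, stage)
    return nonce


def verify_nonce(state, gen_id, stage, nonce):
    """Recompute the hash from the presented nonce and compare it with the stored hash."""
    expected = state.get("expected_hash")
    if expected is None:
        # No stage command has run, so there is nothing to verify against.
        return False
    candidate = _context_hash(nonce, gen_id, stage)
    return hmac.compare_digest(candidate, expected)


state = {}  # hypothetical stand-in for the managed workflow state file
nonce = issue_nonce(state, "gen-001", "objective")
print(verify_nonce(state, "gen-001", "objective", nonce))  # True
print(verify_nonce(state, "gen-001", "planning", nonce))   # False
```

&lt;p&gt;Because the state holds only the hash, reading it does not reveal a usable nonce, and a nonce issued for one stage does not verify against another.&lt;/p&gt;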

&lt;h3&gt;
  
  
  What Signature-Based Locking Gives You
&lt;/h3&gt;

&lt;p&gt;Existing approaches each solve a piece of the puzzle: prompts communicate intent, file permissions restrict access, content guardrails filter unsafe output, and deterministic scripts encode logic. But none of them enforce &lt;strong&gt;execution sequence&lt;/strong&gt; — the guarantee that step N actually happened before step N+1.&lt;/p&gt;

&lt;p&gt;Signature-Based Locking fills this gap. It doesn't replace the other approaches; it adds the missing dimension. Here's how it stacks up:&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Signature-Based Locking Works
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Threat&lt;/th&gt;
&lt;th&gt;Prompt-only&lt;/th&gt;
&lt;th&gt;Signature-Based Locking&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI skips a stage&lt;/td&gt;
&lt;td&gt;Possible&lt;/td&gt;
&lt;td&gt;Blocked (no nonce)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI modifies state directly&lt;/td&gt;
&lt;td&gt;Possible&lt;/td&gt;
&lt;td&gt;Blocked (hash mismatch)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI replays a previous step&lt;/td&gt;
&lt;td&gt;Possible&lt;/td&gt;
&lt;td&gt;Blocked (context-bound nonce)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Hybrid Architecture
&lt;/h2&gt;

&lt;p&gt;Signature-Based Locking is most effective as part of a &lt;strong&gt;hybrid architecture&lt;/strong&gt; that separates deterministic and creative work:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxyvul79sii5ufuc74ut9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxyvul79sii5ufuc74ut9.png" alt="HYBRID_ARCH_FLOW_DIAGRAM" width="707" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key principle&lt;/strong&gt;: The deterministic script handles everything that has a "right answer" — state transitions, gate checks, file validation, hook execution. The AI handles everything that requires creativity — writing code, making design decisions, solving problems.&lt;/p&gt;

&lt;p&gt;The scripts communicate with the AI through &lt;strong&gt;structured JSON output&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"objective"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"phase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"complete"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Objective stage complete. Advance with: /reap.next a3f8c2d9..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI receives clear instructions and a nonce. It cannot advance without passing the nonce to the next command. The deterministic script verifies and controls the flow.&lt;/p&gt;
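&lt;p&gt;As a rough sketch of the script side of this exchange, the snippet below builds a payload with the same four fields as the example above. The helper name &lt;code&gt;build_stage_output&lt;/code&gt; is hypothetical, and the real tool may assemble its output differently.&lt;/p&gt;

```python
import json


def build_stage_output(command, nonce):
    """Return the JSON text a stage script could print for the AI to read."""
    payload = {
        "status": "ok",
        "command": command,
        "phase": "complete",
        # The nonce travels inside the human-readable instruction.
        "message": f"{command.capitalize()} stage complete. Advance with: /reap.next {nonce}",
    }
    return json.dumps(payload, indent=2)


print(build_stage_output("objective", "a3f8c2d9"))
```

&lt;p&gt;Keeping the output machine-readable means the same payload can be logged or checked by other tooling, while the AI only needs to follow the instruction in &lt;code&gt;message&lt;/code&gt;.&lt;/p&gt;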

&lt;h2&gt;
  
  
  REAP: This Architecture in Practice
&lt;/h2&gt;

&lt;p&gt;This is exactly how &lt;a href="https://github.com/c-d-cc/reap" rel="noopener noreferrer"&gt;REAP&lt;/a&gt; (Recursive Evolutionary Autonomous Pipeline) works. REAP is an open-source CLI tool that structures AI-assisted development as an evolutionary process — software evolves across &lt;strong&gt;Generations&lt;/strong&gt;, each carrying one goal through a 5-stage lifecycle.&lt;/p&gt;

&lt;h3&gt;
  
  
  What REAP Does
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Genome&lt;/strong&gt; — Your project's design knowledge (architecture decisions, conventions, constraints, business rules) is managed as a living document that evolves across generations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lifecycle&lt;/strong&gt; — Each generation follows: Objective → Planning → Implementation → Validation → Completion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signature-Based Locking&lt;/strong&gt; — Stage transitions require cryptographic nonce verification, preventing the AI from skipping stages or going off-script&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session Persistence&lt;/strong&gt; — The Genome and current generation state are automatically injected into the AI's context at session start, solving the "context loss across sessions" problem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Agent Support&lt;/strong&gt; — Works with Claude Code and OpenCode&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Signature Chain in REAP
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/reap.start &lt;span class="s2"&gt;"Build user auth"&lt;/span&gt;
  → Script creates generation, stores &lt;span class="nb"&gt;hash&lt;/span&gt;
  → AI receives instructions

/reap.objective
  → AI defines goals, writes artifact
  → Script verifies artifact, generates nonce
  → Message: &lt;span class="s2"&gt;"Advance with: /reap.next a3f8c2..."&lt;/span&gt;

/reap.next a3f8c2...
  → Script verifies SHA256&lt;span class="o"&gt;(&lt;/span&gt;a3f8c2 + genId + stage&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; stored &lt;span class="nb"&gt;hash&lt;/span&gt;
  → ✅ Match → advance to planning
  → ❌ Mismatch → &lt;span class="s2"&gt;"Token verification failed. Re-run the stage command."&lt;/span&gt;

/reap.planning
  → AI creates implementation plan
  → Script generates new nonce
  → Message: &lt;span class="s2"&gt;"Advance with: /reap.next b7d91e..."&lt;/span&gt;

  ... chain continues through all stages ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each nonce is single-use, context-bound (includes generation ID and stage name), and cryptographically verified. The AI cannot skip ahead, replay, or forge tokens.&lt;/p&gt;
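&lt;p&gt;The single-use and context-bound properties can be illustrated with a small in-memory model of the chain. This is a sketch under assumptions, not REAP's code: the class name &lt;code&gt;SignatureChain&lt;/code&gt;, its methods, and the &lt;code&gt;:&lt;/code&gt; separator are invented for the example, and the stage names follow the lifecycle listed earlier.&lt;/p&gt;

```python
import hashlib
import hmac
import secrets

STAGES = ["objective", "planning", "implementation", "validation", "completion"]


class SignatureChain:
    """Hypothetical model of a nonce chain for one generation."""

    def __init__(self, gen_id):
        self.gen_id = gen_id
        self.index = 0            # position of the current stage in STAGES
        self.pending_hash = None  # hash of the nonce issued for the current stage

    def _digest(self, nonce):
        # Context binding: the hash covers the nonce, generation ID, and current stage.
        material = ":".join([nonce, self.gen_id, STAGES[self.index]])
        return hashlib.sha256(material.encode("utf-8")).hexdigest()

    def complete_stage(self):
        """Called by the stage script once its artifact checks pass."""
        nonce = secrets.token_hex(16)
        self.pending_hash = self._digest(nonce)
        return nonce

    def next(self, nonce):
        """Advance only if the nonce matches the hash stored for the current stage."""
        if self.pending_hash is None:
            return False  # no stage command has issued a nonce yet
        if not hmac.compare_digest(self._digest(nonce), self.pending_hash):
            return False
        self.pending_hash = None  # single use: the same nonce cannot be replayed
        if self.index != len(STAGES) - 1:
            self.index += 1
        return True


chain = SignatureChain("gen-001")
print(chain.next("a3f8c2"))   # False: skipping ahead without running the stage
first = chain.complete_stage()
print(chain.next(first))      # True: advances from objective to planning
print(chain.next(first))      # False: replaying the consumed nonce
```

&lt;p&gt;Clearing the stored hash after a successful transition is what makes each nonce single-use in this model; a nonce from an earlier stage also fails later because the stage name is part of the hashed material.&lt;/p&gt;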

&lt;h3&gt;
  
  
  Why It Matters
&lt;/h3&gt;

&lt;p&gt;After building 109 generations with REAP (yes, REAP is &lt;a href="https://github.com/c-d-cc/reap/tree/main/.reap/lineage" rel="noopener noreferrer"&gt;built with REAP&lt;/a&gt;), we've seen firsthand that prompt-only workflow control breaks down at scale. The AI "optimizes" by skipping validation, modifying state files directly, or calling commands out of order.&lt;/p&gt;

&lt;p&gt;Signature-Based Locking eliminated these failure modes — not by adding more rules to the prompt, but by making rule violation mechanically impossible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Work: NeMo Guardrails
&lt;/h2&gt;

&lt;p&gt;It's worth mentioning NVIDIA's &lt;a href="https://developer.nvidia.com/nemo-guardrails" rel="noopener noreferrer"&gt;NeMo Guardrails&lt;/a&gt;, which shares an important philosophy with Signature-Based Locking: &lt;strong&gt;don't rely on prompts alone — enforce rules in code.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;NeMo Guardrails places a programmable middleware between the user and the LLM. User input is normalized into intents, &lt;a href="https://github.com/NVIDIA/NeMo-Guardrails" rel="noopener noreferrer"&gt;Colang&lt;/a&gt; rules determine whether to call the LLM or return a pre-defined response, and the output is screened against safety policies. This gives developers precise control over what the AI can say: blocking toxic content, preventing jailbreaks, enforcing topic boundaries, and &lt;a href="https://discuss.pytorch.kr/t/nemo-guardrails-llm-feat-nvidia/8403" rel="noopener noreferrer"&gt;detecting hallucinations through factual grounding checks&lt;/a&gt;. It integrates with LangChain and LlamaIndex, and it supports GPU-accelerated evaluation for production workloads.&lt;/p&gt;

&lt;p&gt;This is genuinely valuable for chatbots, customer-facing AI, and any application where content safety matters. The core insight — layering deterministic safety on top of probabilistic LLMs — is sound.&lt;/p&gt;

&lt;p&gt;Where the two approaches diverge is the &lt;strong&gt;dimension of control&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;NeMo Guardrails&lt;/th&gt;
&lt;th&gt;Signature-Based Locking&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Controls&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;What the AI says (content)&lt;/td&gt;
&lt;td&gt;The order in which the AI executes steps (sequence)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mechanism&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Input/output filtering via policy rules&lt;/td&gt;
&lt;td&gt;Cryptographic nonce chain across steps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prevents&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Toxic content, jailbreaks, hallucinations&lt;/td&gt;
&lt;td&gt;Stage skipping, out-of-order execution, state tampering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Chatbots, customer-facing AI&lt;/td&gt;
&lt;td&gt;Multi-step workflows, autonomous agents&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;They're complementary, not competing. You could use NeMo Guardrails to ensure the AI doesn't produce unsafe content, &lt;em&gt;and&lt;/em&gt; Signature-Based Locking to ensure it follows the correct execution sequence. Different dimensions, same philosophy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @c-d-cc/reap
reap init my-project
&lt;span class="c"&gt;# Open Claude Code or OpenCode&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /reap.evolve &lt;span class="s2"&gt;"Implement user authentication"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/c-d-cc/reap" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://reap.cc" rel="noopener noreferrer"&gt;Documentation&lt;/a&gt; | &lt;a href="https://www.npmjs.com/package/@c-d-cc/reap" rel="noopener noreferrer"&gt;npm&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you struggled with keeping AI agents on-script in multi-step workflows? What approaches have you tried? I'd love to hear about your harness engineering experiences in the comments.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2307.03172" rel="noopener noreferrer"&gt;Lost in the Middle: How Language Models Use Long Contexts — Liu et al. (TACL)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2310.13548" rel="noopener noreferrer"&gt;Towards Understanding Sycophancy in Language Models — Anthropic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices" rel="noopener noreferrer"&gt;Claude Prompting Best Practices — Anthropic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://futurism.com/the-byte/openai-patch-fix-gpt4-laziness" rel="noopener noreferrer"&gt;OpenAI Patches "Lazy" GPT-4 — Futurism&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2502.13295" rel="noopener noreferrer"&gt;Specification Gaming in Reasoning Models — arXiv&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lilianweng.github.io/posts/2024-11-28-reward-hacking/" rel="noopener noreferrer"&gt;Reward Hacking in LLMs — Lilian Weng&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/research/reward-tampering" rel="noopener noreferrer"&gt;Reward Tampering Research — Anthropic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.pnas.org/doi/10.1073/pnas.2322420121" rel="noopener noreferrer"&gt;Embers of Autoregression — PNAS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developer.nvidia.com/nemo-guardrails" rel="noopener noreferrer"&gt;NeMo Guardrails — NVIDIA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discuss.pytorch.kr/t/nemo-guardrails-llm-feat-nvidia/8403" rel="noopener noreferrer"&gt;Introduction to NeMo Guardrails — PyTorch Korea&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/c-d-cc/reap" rel="noopener noreferrer"&gt;REAP — GitHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>llm</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Specs Cannot be Source of Source Code — Why Intent Management Matters in AI-Driven Development</title>
      <dc:creator>Hichoi-Dev</dc:creator>
      <pubDate>Fri, 20 Mar 2026 17:37:55 +0000</pubDate>
      <link>https://dev.to/casamia918/specs-cannot-be-source-of-source-code-why-intent-management-matters-in-ai-driven-development-159c</link>
      <guid>https://dev.to/casamia918/specs-cannot-be-source-of-source-code-why-intent-management-matters-in-ai-driven-development-159c</guid>
      <description>&lt;h3&gt;
  
  
  The Seductive Idea
&lt;/h3&gt;

&lt;p&gt;There's a compelling narrative in AI-driven development right now: &lt;strong&gt;write a detailed spec, feed it to an AI agent, and get working software out.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GitHub's &lt;a href="https://github.com/github/spec-kit" rel="noopener noreferrer"&gt;Spec Kit&lt;/a&gt;, AWS's &lt;a href="https://kiro.dev/" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt;, and a growing ecosystem of tools all converge on the same premise — that specifications can become the "source of source code." The product requirements document isn't just a guide for implementation; it &lt;em&gt;is&lt;/em&gt; the source that generates implementation.&lt;/p&gt;

&lt;p&gt;It's an attractive idea. If specs are the source, then developers become spec writers, AI becomes the compiler, and code becomes a generated artifact. Clean. Elegant. Almost too good.&lt;/p&gt;

&lt;p&gt;And that's the problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Source Code Is Deterministic. Specs Are Not.
&lt;/h3&gt;

&lt;p&gt;Let's start with what "source" actually means in software engineering.&lt;/p&gt;

&lt;p&gt;When you compile &lt;code&gt;main.c&lt;/code&gt; with the same toolchain and flags, you get the same binary. Every time. On every machine. This property, &lt;strong&gt;determinism&lt;/strong&gt;, is what makes source code &lt;em&gt;source&lt;/em&gt;. It's the &lt;a href="https://reproducible-builds.org/docs/deterministic-build-systems/" rel="noopener noreferrer"&gt;reproducible foundation&lt;/a&gt; on which everything else stands: builds, tests, deployments, debugging.&lt;/p&gt;

&lt;p&gt;Now consider a specification:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The system should handle user authentication with proper security measures."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Feed this to an AI agent three times. You'll get three different implementations — different OAuth flows, different session strategies, different error handling patterns. The &lt;a href="https://arxiv.org/abs/2602.00180" rel="noopener noreferrer"&gt;same spec produces different code&lt;/a&gt; across different runs, different models, and different context windows.&lt;/p&gt;

&lt;p&gt;This isn't a bug in the AI. It's a fundamental characteristic. Specifications are written in natural language, which is inherently ambiguous. LLMs are non-deterministic by design. The combination means that &lt;strong&gt;specs cannot serve as "source" in any meaningful engineering sense&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Source code has a contract: same input, same output. Specifications don't — and can't — honor that contract.&lt;/p&gt;

&lt;h3&gt;
  
  
  Then What Are Specs? Intent, Not Source.
&lt;/h3&gt;

&lt;p&gt;If specs aren't source, what are they?&lt;/p&gt;

&lt;p&gt;They're &lt;strong&gt;intent&lt;/strong&gt;. Initiative. Direction. A spec says &lt;em&gt;what&lt;/em&gt; you want and &lt;em&gt;why&lt;/em&gt; you want it — but it doesn't deterministically produce &lt;em&gt;how&lt;/em&gt;. The "how" emerges through the act of implementation, whether done by a human or an AI agent.&lt;/p&gt;

&lt;p&gt;This distinction matters more than it seems:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Intent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Determinism&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Same input → same output&lt;/td&gt;
&lt;td&gt;Same input → many valid outputs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Verification&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Compile, run, test&lt;/td&gt;
&lt;td&gt;Interpret, judge, review&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Authority&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The code is the truth&lt;/td&gt;
&lt;td&gt;The intent guides the truth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Drift&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Doesn't drift from itself&lt;/td&gt;
&lt;td&gt;Drifts from implementation over time&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Treating intent as source is a category error. It's like treating a compass bearing as a GPS coordinate — useful for direction, useless for pinpointing where you actually are.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why We Still Need to Manage Intent
&lt;/h3&gt;

&lt;p&gt;But here's the thing: &lt;strong&gt;just because specs aren't source doesn't mean they don't matter.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In AI-driven development, intent management is arguably &lt;em&gt;more&lt;/em&gt; critical than ever:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context loss&lt;/strong&gt; — AI agents forget everything between sessions. Without persisted intent, every session starts from zero.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge decay&lt;/strong&gt; — Decisions made in session 12 are invisible in session 13. Architecture rationale evaporates. Business rules get re-debated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drift without anchor&lt;/strong&gt; — Without a persistent record of intent, AI agents make locally reasonable but globally inconsistent decisions. The codebase slowly becomes incoherent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The question isn't whether to manage intent. It's &lt;em&gt;how&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Teams Have Managed Specs (A Brief History)
&lt;/h3&gt;

&lt;p&gt;Software teams have tried many approaches to capture and maintain design knowledge. Here's how the major ones compare — especially through the lens of AI-assisted development:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Strengths&lt;/th&gt;
&lt;th&gt;Weaknesses&lt;/th&gt;
&lt;th&gt;AI-Era Fit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RFC&lt;/strong&gt; — proposal for collecting feedback (&lt;a href="https://newsletter.pragmaticengineer.com/p/rfcs-and-design-docs" rel="noopener noreferrer"&gt;Pragmatic Engineer&lt;/a&gt;)&lt;/td&gt;
&lt;td&gt;Structured deliberation&lt;/td&gt;
&lt;td&gt;Point-in-time, never updated&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;ADR&lt;/strong&gt; — records one decision + rationale (&lt;a href="https://candost.blog/adrs-rfcs-differences-when-which/" rel="noopener noreferrer"&gt;Candost&lt;/a&gt;)&lt;/td&gt;
&lt;td&gt;Lightweight, captures &lt;em&gt;why&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;Accumulates without sync&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Design Docs&lt;/strong&gt; — comprehensive pre-impl design (&lt;a href="https://blog.pragmaticengineer.com/rfcs-and-design-docs/" rel="noopener noreferrer"&gt;Google, Uber-style&lt;/a&gt;)&lt;/td&gt;
&lt;td&gt;Thorough analysis&lt;/td&gt;
&lt;td&gt;Goes stale fast&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;CLAUDE.md / AGENTS.md&lt;/strong&gt; — repo-level AI instructions (&lt;a href="https://agents.md/" rel="noopener noreferrer"&gt;agents.md&lt;/a&gt;)&lt;/td&gt;
&lt;td&gt;Zero-friction, always loaded&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.infoq.com/news/2026/03/agents-context-file-value-review/" rel="noopener noreferrer"&gt;No sync&lt;/a&gt;, grows stale silently&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Spec Kit&lt;/strong&gt; — Spec → Plan → Task → Implement (&lt;a href="https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/" rel="noopener noreferrer"&gt;GitHub Blog&lt;/a&gt;)&lt;/td&gt;
&lt;td&gt;Structured workflow&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://blog.scottlogic.com/2025/11/26/putting-spec-kit-through-its-paces-radical-idea-or-reinvented-waterfall.html" rel="noopener noreferrer"&gt;One-shot&lt;/a&gt;, no cross-session continuity&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Kiro&lt;/strong&gt; — IDE with built-in spec workflow (&lt;a href="https://kiro.dev/blog/kiro-and-the-future-of-software-development/" rel="noopener noreferrer"&gt;kiro.dev&lt;/a&gt;)&lt;/td&gt;
&lt;td&gt;Integrated experience&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html" rel="noopener noreferrer"&gt;Static specs, manual updates&lt;/a&gt;, IDE-locked&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every approach above shares a common failure mode: &lt;strong&gt;they treat specification as a one-time event, not a continuous process.&lt;/strong&gt; You write the RFC, make the decision, and move on. You create the design doc, build the feature, and the doc rots. You set up CLAUDE.md on day one, and by week three it describes a project that no longer exists.&lt;/p&gt;

&lt;p&gt;Drew Breunig captured this perfectly with the &lt;a href="https://www.dbreunig.com/2026/03/04/the-spec-driven-development-triangle.html" rel="noopener noreferrer"&gt;Spec-Driven Development Triangle&lt;/a&gt; — specs, code, and tests form a triangle that must stay in sync, but keeping them in sync is where everyone fails.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Sync Problem Is a Workflow Problem
&lt;/h3&gt;

&lt;p&gt;Here's the insight that most tools miss: &lt;strong&gt;spec drift isn't a documentation problem. It's a workflow problem.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can't solve it by writing better specs. You can't solve it by adding a linter that checks specs against code. You can't solve it with a pre-commit hook that nags you to update the docs.&lt;/p&gt;

&lt;p&gt;You solve it by making knowledge maintenance &lt;strong&gt;an inseparable part of the development workflow itself&lt;/strong&gt; — not something you do after the "real work," but part of the work.&lt;/p&gt;

&lt;p&gt;This requires three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A knowledge base&lt;/strong&gt; that's structured enough for AI to reference, but lightweight enough for humans to maintain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A sync mechanism&lt;/strong&gt; that's embedded in the development lifecycle, not bolted on as an afterthought&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An iterative workflow&lt;/strong&gt; that revisits and evolves knowledge across sessions, not just within a single feature&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most tools get one or two of these. Almost none get all three.&lt;/p&gt;

&lt;h3&gt;
  
  
  REAP's Answer: Genome + Sync + Recursive Workflow
&lt;/h3&gt;

&lt;p&gt;This is the problem &lt;a href="https://github.com/c-d-cc/reap" rel="noopener noreferrer"&gt;REAP&lt;/a&gt; was built to solve. Not by treating specs as source code, but by building a &lt;strong&gt;recursive workflow where knowledge evolves alongside the code it describes&lt;/strong&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Genome: A Living Knowledge Base
&lt;/h4&gt;

&lt;p&gt;REAP maintains a "Genome" — a structured collection of project knowledge stored in &lt;code&gt;.reap/genome/&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.reap/genome/
  principles.md       # Architecture decisions (ADR-style, with rationale)
  conventions.md      # Development rules and enforced standards
  constraints.md      # Technical choices and validation commands
  domain/             # Business rules that can't be derived from code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Genome isn't a spec. It doesn't try to describe &lt;em&gt;what to build&lt;/em&gt;. It captures &lt;strong&gt;what you've learned&lt;/strong&gt; — architecture principles, business rules, constraints, conventions. It's the accumulated knowledge that makes your project &lt;em&gt;your project&lt;/em&gt;, not a generic codebase.&lt;/p&gt;

&lt;p&gt;Every time an AI agent starts a session in a REAP project, the Genome is automatically injected into its context. The agent doesn't start from zero — it starts with your project's institutional knowledge.&lt;/p&gt;

&lt;h4&gt;
  
  
  Sync Through the Lifecycle, Not After It
&lt;/h4&gt;

&lt;p&gt;Here's where REAP diverges from every tool listed above. Knowledge sync isn't a separate activity — it's &lt;strong&gt;built into the development lifecycle&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Each "Generation" (a unit of work) follows a five-stage cycle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Objective → Planning → Implementation → Validation → Completion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During &lt;strong&gt;Implementation&lt;/strong&gt;, when you discover something that contradicts the Genome — a business rule that changed, an architectural assumption that proved wrong — you don't stop to update docs. You log it as a backlog item and keep building.&lt;/p&gt;

&lt;p&gt;During &lt;strong&gt;Completion&lt;/strong&gt;, those discoveries are reviewed and the Genome is updated. Knowledge evolution happens as a natural part of finishing work, not as a separate maintenance chore that everyone skips.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is the critical difference.&lt;/strong&gt; The Genome stays in sync with reality because updating it is part of the workflow, not something you do "when you have time" (which means never).&lt;/p&gt;
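
&lt;p&gt;As a rough illustration of this flow, the sketch below models a generation that logs discoveries to a backlog during Implementation and folds them into the Genome only at Completion. It is a minimal Python sketch under stated assumptions, not REAP's actual implementation: the &lt;code&gt;Generation&lt;/code&gt; class, its method names, and the dictionary-based Genome are all hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass, field


@dataclass
class Generation:
    """Hypothetical model of one unit of work; not REAP's real data structure."""
    goal: str
    backlog: list = field(default_factory=list)

    def log_discovery(self, topic, note):
        # Implementation stage: record the contradiction and keep building.
        self.backlog.append((topic, note))

    def complete(self, genome):
        # Completion stage: review the backlog and return an updated Genome.
        updated = dict(genome)
        for topic, note in self.backlog:
            updated[topic] = note
        self.backlog.clear()
        return updated


genome = {"auth": "session cookies"}
gen = Generation(goal="Build auth")
gen.log_discovery("auth", "we use JWT")
assert genome["auth"] == "session cookies"  # unchanged while work is in progress
genome = gen.complete(genome)
print(genome["auth"])  # prints: we use JWT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The point of the design is visible in the assertion: the Genome is left alone while work is in progress, and the update is a required step of finishing rather than an optional follow-up.&lt;/p&gt;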

&lt;h4&gt;
  
  
  Recursive, Not One-Shot
&lt;/h4&gt;

&lt;p&gt;But the most important differentiator isn't the Genome or the sync — it's that the &lt;strong&gt;workflow is recursive&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Spec Kit gives you: Specify → Plan → Task → Implement. Done. Start over from scratch for the next feature.&lt;/p&gt;

&lt;p&gt;REAP gives you an &lt;strong&gt;endless chain of generations&lt;/strong&gt;, where each generation inherits the knowledge from all previous ones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Gen 1: Build auth → learns "we use JWT" → Genome updated
Gen 2: Build API → starts knowing "we use JWT" → learns "rate limiting needed" → Genome updated
Gen 3: Build dashboard → starts knowing both → builds on accumulated knowledge
...
Gen N: Genome reflects N generations of accumulated learning
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each generation archives its artifacts in a &lt;strong&gt;Lineage&lt;/strong&gt; — a complete history of what was decided, what was built, and what was learned. The Genome is a living summary; the Lineage is the full record.&lt;/p&gt;

&lt;p&gt;This recursive structure means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No cold starts&lt;/strong&gt; — Every generation begins with the full context of everything that came before&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No knowledge loss&lt;/strong&gt; — Decisions made in generation 5 are still accessible in generation 50&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Natural evolution&lt;/strong&gt; — The Genome grows more accurate over time, not less — the opposite of traditional specs&lt;/li&gt;
&lt;/ul&gt;
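
&lt;p&gt;The chain of generations can be sketched in a few lines of Python. This is an illustration only, assuming a dictionary stands in for the Genome and a list for the Lineage; the function name &lt;code&gt;run_generations&lt;/code&gt; and the record fields are hypothetical and are not taken from REAP.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def run_generations(goals_and_lessons):
    """Fold each generation's lessons into the Genome and archive it in the lineage.

    Hypothetical sketch: the dict-based Genome and list-based lineage are assumptions.
    """
    genome, lineage = {}, []
    for number, (goal, lessons) in enumerate(goals_and_lessons, start=1):
        inherited = sorted(genome)  # knowledge available at the start
        genome.update(lessons)      # lessons folded back at completion
        lineage.append({"gen": number, "goal": goal, "inherited": inherited})
    return genome, lineage


genome, lineage = run_generations([
    ("Build auth", {"auth": "we use JWT"}),
    ("Build API", {"api": "rate limiting needed"}),
    ("Build dashboard", {}),
])
print(lineage[2]["inherited"])  # prints: ['api', 'auth']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The third generation starts with both earlier lessons already in hand, which is the "no cold starts" property in miniature.&lt;/p&gt;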

&lt;h3&gt;
  
  
  The Right Mental Model
&lt;/h3&gt;

&lt;p&gt;Here's how to think about it:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Traditional&lt;/th&gt;
&lt;th&gt;SDD&lt;/th&gt;
&lt;th&gt;REAP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code is...&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The only truth&lt;/td&gt;
&lt;td&gt;A generated artifact&lt;/td&gt;
&lt;td&gt;The truth, always&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Spec is...&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pre-work that rots&lt;/td&gt;
&lt;td&gt;Source of truth&lt;/td&gt;
&lt;td&gt;Per-generation Objective (scoped, disposable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Knowledge is...&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;In people's heads&lt;/td&gt;
&lt;td&gt;In spec documents&lt;/td&gt;
&lt;td&gt;In an evolving Genome&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Workflow is...&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ad hoc&lt;/td&gt;
&lt;td&gt;One-shot pipeline&lt;/td&gt;
&lt;td&gt;Recursive generations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sync happens...&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Never&lt;/td&gt;
&lt;td&gt;Manually&lt;/td&gt;
&lt;td&gt;Built into each generation's completion&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Code remains the source of truth. The Genome doesn't replace it — it &lt;strong&gt;complements&lt;/strong&gt; it by capturing the &lt;em&gt;intent, rationale, and constraints&lt;/em&gt; that code alone can't express. And the recursive workflow ensures the two stay in sync, generation after generation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Try It
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @c-d-cc/reap
reap init my-project

&lt;span class="c"&gt;# In Claude Code or OpenCode:&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /reap.start
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /reap.evolve &lt;span class="s2"&gt;"Implement user authentication"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;REAP is open source, MIT licensed, and supports Claude Code and OpenCode today.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/c-d-cc/reap" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://reap.cc" rel="noopener noreferrer"&gt;Documentation&lt;/a&gt; | &lt;a href="https://www.npmjs.com/package/@c-d-cc/reap" rel="noopener noreferrer"&gt;npm&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Specs can't be source code. But the intent behind them — the decisions, the constraints, the hard-won lessons — that's worth managing. The question is whether your workflow makes that management automatic or optional. Because optional means it won't happen.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://reproducible-builds.org/docs/deterministic-build-systems/" rel="noopener noreferrer"&gt;Reproducible Builds — reproducible-builds.org&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2602.00180" rel="noopener noreferrer"&gt;Spec-Driven Development: From Code to Contract — arXiv&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://newsletter.pragmaticengineer.com/p/rfcs-and-design-docs" rel="noopener noreferrer"&gt;Engineering Planning with RFCs, Design Documents and ADRs — Pragmatic Engineer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://agents.md/" rel="noopener noreferrer"&gt;AGENTS.md — Open Standard for AI Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.dbreunig.com/2026/03/04/the-spec-driven-development-triangle.html" rel="noopener noreferrer"&gt;The Spec-Driven Development Triangle — Drew Breunig&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/" rel="noopener noreferrer"&gt;Spec-Driven Development with AI — GitHub Blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html" rel="noopener noreferrer"&gt;Exploring SDD Tools — Martin Fowler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.scottlogic.com/2025/11/26/putting-spec-kit-through-its-paces-radical-idea-or-reinvented-waterfall.html" rel="noopener noreferrer"&gt;Putting Spec Kit Through Its Paces — Scott Logic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kiro.dev/blog/kiro-and-the-future-of-software-development/" rel="noopener noreferrer"&gt;Kiro and the Future of Software Development — kiro.dev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infoq.com/news/2026/03/agents-context-file-value-review/" rel="noopener noreferrer"&gt;AGENTS.md Value Reassessment — InfoQ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Why Spec-Driven Development Fails — And a Better Way to Structure AI Development</title>
      <dc:creator>Hichoi-Dev</dc:creator>
      <pubDate>Wed, 18 Mar 2026 21:14:10 +0000</pubDate>
      <link>https://dev.to/casamia918/why-spec-driven-development-fails-and-what-we-can-learn-from-it-2pec</link>
      <guid>https://dev.to/casamia918/why-spec-driven-development-fails-and-what-we-can-learn-from-it-2pec</guid>
      <description>&lt;h2&gt;
  
  
  SDD: The Right Problem, Wrong Solution
&lt;/h2&gt;

&lt;p&gt;Spec-Driven Development (SDD) is the idea that detailed specifications — written upfront — can guide AI agents to produce working software. GitHub's &lt;a href="https://github.com/github/spec-kit" rel="noopener noreferrer"&gt;Spec Kit&lt;/a&gt; is a representative example, formalizing this into a workflow: &lt;strong&gt;Specify → Plan → Task → Implement&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;SDD recognized a real problem: "prompt and pray" doesn't scale. Beyond toy projects, you need a way to communicate intent to AI that goes beyond "build me an auth system." The core insight — that &lt;strong&gt;structure matters&lt;/strong&gt; — is valid.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Problem: Specs Are Non-Deterministic
&lt;/h2&gt;

&lt;p&gt;The fundamental flaw: SDD treats specifications as authoritative sources of truth, but the LLMs that turn them into code are non-deterministic. The same specification can produce different implementations on different runs, with varying architectural choices, data structures, and error handling. As one of the analyses listed in the references notes, "Because of the non-deterministic nature of this technology, there will always remain a very non-negligible probability that it does things that we don't want." A specification therefore cannot serve as a reliable source of truth the way source code does.&lt;/p&gt;

&lt;h2&gt;
  
  
  SDD Is Waterfall in Disguise
&lt;/h2&gt;

&lt;p&gt;SDD essentially recreates Waterfall methodology:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Big Design Up Front with exhaustive specifications&lt;/li&gt;
&lt;li&gt;Sequential phases, each completed before the next begins&lt;/li&gt;
&lt;li&gt;Assumption that thorough planning eliminates execution uncertainty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Real-world testing revealed inefficiency: one hands-on evaluation required &lt;strong&gt;33 minutes and 2,577 lines of markdown&lt;/strong&gt; to produce 689 lines of code, compared to 8 minutes using iterative prompting. On those timings alone that is roughly 4x slower, with no quality improvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Specifications Drift
&lt;/h2&gt;

&lt;p&gt;Specifications and code inevitably diverge because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI makes unanticipated architectural choices&lt;/li&gt;
&lt;li&gt;Each iteration accumulates undocumented decisions&lt;/li&gt;
&lt;li&gt;Specs become post-hoc documentation rather than guides&lt;/li&gt;
&lt;li&gt;Developers spend time reading lengthy markdown instead of solving problems&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Real Question
&lt;/h2&gt;

&lt;p&gt;If structure matters but exhaustive upfront specifications don't deliver it, what does? The answer aligns with decades of software engineering wisdom: &lt;strong&gt;iterative development with accumulated learning&lt;/strong&gt;, essentially Agile methodology adapted for AI collaboration.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Different Approach
&lt;/h2&gt;

&lt;p&gt;This is what motivated me to build &lt;a href="https://github.com/c-d-cc/reap" rel="noopener noreferrer"&gt;REAP&lt;/a&gt; (Recursive Evolutionary Autonomous Pipeline). Rather than treating development as a spec-to-code translation, REAP structures AI-assisted development as an &lt;strong&gt;evolutionary process&lt;/strong&gt; — closer to how experienced developers actually work.&lt;/p&gt;

&lt;h3&gt;
  
  
  How REAP Works
&lt;/h3&gt;

&lt;p&gt;Development happens in &lt;strong&gt;Generations&lt;/strong&gt;. Each generation carries one focused goal through a five-stage lifecycle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Objective → Planning → Implementation → Validation → Completion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't just a linear pipeline. Each stage has gates, and stages can &lt;strong&gt;regress&lt;/strong&gt; — if validation fails, you loop back to implementation with the failure context preserved. This mirrors the real-world "build → test → fix → test again" cycle that SDD's sequential model ignores.&lt;/p&gt;
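
&lt;p&gt;To make the regression behavior concrete, here is a minimal Python sketch of a gated stage machine. It is an assumption-laden illustration, not REAP's code: the &lt;code&gt;advance&lt;/code&gt; function, the &lt;code&gt;context&lt;/code&gt; list, and the rule that other failed gates simply hold the current stage are all hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;STAGES = ["Objective", "Planning", "Implementation", "Validation", "Completion"]


def advance(stage, gate_passed, context, failure=None):
    """Return the next stage; a failed Validation gate regresses to Implementation.

    Hypothetical helper: the function name and the context list are illustrative only.
    """
    if gate_passed:
        index = STAGES.index(stage)
        return STAGES[min(index + 1, len(STAGES) - 1)]
    if stage == "Validation":
        # Preserve the failure so the next implementation pass can see it.
        context.append(failure)
        return "Implementation"
    # Any other failed gate: stay in the current stage until it passes.
    return stage


context = []
stage = advance("Implementation", True, context)             # Validation
stage = advance(stage, False, context, "test_login failed")  # back to Implementation
print(stage, context)  # prints: Implementation ['test_login failed']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Carrying the failure forward in &lt;code&gt;context&lt;/code&gt; is what separates a regression from a restart: the next implementation pass knows why it is running again.&lt;/p&gt;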

&lt;h3&gt;
  
  
  The Genome: Knowledge That Evolves
&lt;/h3&gt;

&lt;p&gt;Where SDD puts specifications at the center, REAP puts a &lt;strong&gt;Genome&lt;/strong&gt; at the center — a living record stored in &lt;code&gt;.reap/genome/&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;principles.md&lt;/code&gt;&lt;/strong&gt; — Architecture decisions with rationale (ADR-style)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;conventions.md&lt;/code&gt;&lt;/strong&gt; — Development rules and enforced standards&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;constraints.md&lt;/code&gt;&lt;/strong&gt; — Technical choices and validation commands&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;domain/&lt;/code&gt;&lt;/strong&gt; — Business rules that can't be derived from code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Genome isn't written once and forgotten. It &lt;strong&gt;evolves across generations&lt;/strong&gt;. When you discover something during implementation that contradicts the Genome, you log it as a backlog item. At the end of each generation, discoveries are reviewed and the Genome is updated. Over time, the Genome becomes an increasingly accurate map of your project — not a spec that drifts from reality.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Makes It Different from SDD
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;SDD&lt;/th&gt;
&lt;th&gt;REAP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Source of truth&lt;/td&gt;
&lt;td&gt;Specification document&lt;/td&gt;
&lt;td&gt;Evolved Genome + source code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Planning scope&lt;/td&gt;
&lt;td&gt;Entire project upfront&lt;/td&gt;
&lt;td&gt;One generation at a time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;When plans break&lt;/td&gt;
&lt;td&gt;Spec drift → update spec → regenerate&lt;/td&gt;
&lt;td&gt;Discovery → backlog → evolve Genome&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Validation&lt;/td&gt;
&lt;td&gt;Spec compliance&lt;/td&gt;
&lt;td&gt;Actual tests, type checks, builds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knowledge persistence&lt;/td&gt;
&lt;td&gt;Specs (static)&lt;/td&gt;
&lt;td&gt;Genome (evolving) + Lineage (history)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context for AI&lt;/td&gt;
&lt;td&gt;Spec document&lt;/td&gt;
&lt;td&gt;Genome + generation state (auto-injected)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Context That Persists
&lt;/h3&gt;

&lt;p&gt;Every time you start an AI session in a REAP project, the &lt;strong&gt;SessionStart hook&lt;/strong&gt; automatically injects the Genome, current generation state, and workflow rules into the AI's context. The AI doesn't start from zero — it starts with your project's accumulated knowledge.&lt;/p&gt;
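
&lt;p&gt;The Genome part of that injection could look roughly like the Python sketch below, which gathers every Markdown file under &lt;code&gt;.reap/genome/&lt;/code&gt; into a single context string. This is a guess at the general shape, not REAP's hook: &lt;code&gt;build_session_context&lt;/code&gt; is a hypothetical name, the file &lt;code&gt;domain/billing.md&lt;/code&gt; is invented for the demo, and generation state and workflow rules are left out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import tempfile
from pathlib import Path


def build_session_context(project_root):
    """Concatenate every Markdown file under .reap/genome/ into one context string.

    Hypothetical helper; REAP's real hook may assemble context differently.
    """
    genome_dir = Path(project_root) / ".reap" / "genome"
    sections = []
    for path in sorted(genome_dir.rglob("*.md")):
        name = path.relative_to(genome_dir).as_posix()
        sections.append(f"## {name}\n{path.read_text(encoding='utf-8').strip()}")
    return "\n\n".join(sections)


# Demo against a throwaway project directory.
with tempfile.TemporaryDirectory() as root:
    genome = Path(root) / ".reap" / "genome"
    (genome / "domain").mkdir(parents=True)
    (genome / "principles.md").write_text("Use JWT for auth.\n", encoding="utf-8")
    (genome / "domain" / "billing.md").write_text("Invoices are immutable.\n", encoding="utf-8")
    print(build_session_context(root))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;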

&lt;p&gt;This solves SDD's "spec drift" problem at the root. The Genome stays in sync with reality because it's &lt;strong&gt;updated as part of the development process&lt;/strong&gt;, not maintained as a separate artifact.&lt;/p&gt;

&lt;h3&gt;
  
  
  Try It
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; &lt;span class="s2"&gt;"@c-d-cc/reap"&lt;/span&gt;
reap init my-project
&lt;span class="c"&gt;# Open Claude Code or OpenCode&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /reap.start
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /reap.evolve &lt;span class="s2"&gt;"Implement user authentication"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;REAP supports multiple AI agents — Claude Code and OpenCode today, with an extensible adapter system for adding more.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/c-d-cc/reap" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://reap.cc" rel="noopener noreferrer"&gt;Documentation&lt;/a&gt; | &lt;a href="https://www.npmjs.com/package/@c-d-cc/reap" rel="noopener noreferrer"&gt;npm&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's your experience with spec-driven development? Have you found structure that works for AI-assisted development? I'd love to hear in the comments.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/github/spec-kit" rel="noopener noreferrer"&gt;GitHub Spec Kit&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/" rel="noopener noreferrer"&gt;Spec-Driven Development with AI — GitHub Blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.scottlogic.com/2025/11/26/putting-spec-kit-through-its-paces-radical-idea-or-reinvented-waterfall.html" rel="noopener noreferrer"&gt;Putting Spec Kit Through Its Paces — Scott Logic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://marmelab.com/blog/2025/11/12/spec-driven-development-waterfall-strikes-back.html" rel="noopener noreferrer"&gt;SDD: The Waterfall Strikes Back — Marmelab&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html" rel="noopener noreferrer"&gt;Exploring SDD Tools — Martin Fowler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developer.microsoft.com/blog/spec-driven-development-spec-kit" rel="noopener noreferrer"&gt;Diving Into SDD With Spec Kit — Microsoft Developer Blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://vibecoding.app/blog/spec-kit-review" rel="noopener noreferrer"&gt;Spec Kit Review — Vibecoding.app&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thoughtworks.com/en-us/insights/blog/agile-engineering-practices/spec-driven-development-unpacking-2025-new-engineering-practices" rel="noopener noreferrer"&gt;SDD: Unpacking 2025's Key Practice — Thoughtworks&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>softwaredevelopment</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>I built a dev tool that "evolves" code with AI — REAP</title>
      <dc:creator>Hichoi-Dev</dc:creator>
      <pubDate>Wed, 18 Mar 2026 02:57:43 +0000</pubDate>
      <link>https://dev.to/casamia918/i-built-a-dev-tool-that-evolves-code-with-ai-reap-k17</link>
      <guid>https://dev.to/casamia918/i-built-a-dev-tool-that-evolves-code-with-ai-reap-k17</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;If you've been building with AI agents (like Claude Code), you've probably encountered these problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context loss&lt;/strong&gt; — Start a new session and your context is gone. You end up clinging to long sessions just to avoid losing everything the AI has learned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stale documentation&lt;/strong&gt; — You try to persist knowledge in READMEs and CLAUDE.md files, but they quietly go stale as the project moves forward.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI going rogue&lt;/strong&gt; — Sometimes the AI just ignores your carefully crafted docs and does its own thing anyway.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're all stuck at the same bottleneck — the context window just isn't enough for long-running projects.&lt;/p&gt;

&lt;p&gt;I tried existing tools like Spec Kit and Superpowers — they're decent for one-off feature work, but didn't quite fit for sustained, long-term development.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;So I built &lt;strong&gt;REAP&lt;/strong&gt; (Recursive Evolutionary Autonomous Pipeline) — an open-source CLI tool inspired by generational evolution in biology.&lt;/p&gt;

&lt;p&gt;The idea: &lt;strong&gt;AI and humans evolve software across generations.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Genome (Design &amp;amp; Knowledge)
  → Evolution (Generational Progress)
    → Civilization (Source Code)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Genome
&lt;/h3&gt;

&lt;p&gt;Your project's design knowledge is managed as a "Genome" — architecture decisions, business rules, conventions, and constraints.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.reap/genome/
├── principles.md      # Architecture principles
├── domain/            # Business rules
├── conventions.md     # Development conventions
└── constraints.md     # Technical constraints
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Lifecycle
&lt;/h3&gt;

&lt;p&gt;Each generation follows a five-stage lifecycle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Objective → Planning → Implementation → Validation → Completion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Objective&lt;/strong&gt; — Define goal, requirements, and acceptance criteria&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Planning&lt;/strong&gt; — Break down tasks, choose approach&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementation&lt;/strong&gt; — Build with AI + human collaboration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation&lt;/strong&gt; — Run tests, verify completion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Completion&lt;/strong&gt; — Retrospective + apply Genome changes + archive&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Evolution
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;When a generation completes, it gets archived in the lineage, and the next generation picks up new goals.&lt;/li&gt;
&lt;li&gt;Lessons learned within a generation get folded back into the Genome.&lt;/li&gt;
&lt;li&gt;Through this iterative pipeline, your source code (the "Civilization") keeps evolving.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @c-d-cc/reap

&lt;span class="c"&gt;# Initialize&lt;/span&gt;
reap init my-project

&lt;span class="c"&gt;# Run a full generation in Claude Code&lt;/span&gt;
claude
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /reap.evolve &lt;span class="s2"&gt;"Implement user authentication"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;/reap.evolve&lt;/code&gt; runs the entire generation lifecycle — from Objective through Completion — interactively with you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/c-d-cc/reap" rel="noopener noreferrer"&gt;github.com/c-d-cc/reap&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href="https://reap.cc" rel="noopener noreferrer"&gt;reap.cc&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MIT licensed. Contributions and feedback are welcome!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
