DEV Community: Hichoi-Dev

Everyone Just Discovered Loop Engineering. REAP Got There First — and It's Ready When You Are

Hichoi-Dev — Sun, 05 Jul 2026 09:11:03 +0000

In June 2026, "loop engineering" went viral. Stop prompting your agent — design the loop that prompts it. Ralph Wiggum loops running Claude for hours. Overnight runs. Millions of views.

My honest reaction: finally, everyone's here. I've been running my entire development process as AI loops since this February, months before the trend had a name. One project is now 70+ loop iterations deep, and the tool that runs those loops is itself built by those loops.

So this is a field report. Not "loops are amazing" (they are) and not "loops are hype" (they're not). Just the seven things that turned out to actually matter once you live inside a loop long enough — including the ones the infinite-loop crowd is about to learn the hard way.

Context: the tool is REAP (https://reap.cc), an open-source pipeline I built on top of Claude Code / OpenCode. It exists because I needed these seven lessons encoded in software, not in my discipline.

First, what the loop people get right

Credit where due — the core insight of loop engineering is correct:

One-shot prompting doesn't scale. Iteration beats a perfect mega-prompt every time.
Files beat context windows. State that matters must live on disk, not in the conversation.
Fresh context each iteration prevents the slow rot of a 400-message session.
The leverage moved. Your job really is designing the system around the agent now.

I agree with all of it. Now here's what months of actually living inside the loop adds.

Lesson 1: A goal is not a loop spec

The naive loop is: final goal + while true. It works for tasks where the environment can say "done" — make tests pass, finish a mechanical migration. Even Ralph loop advocates admit it: vague criteria = infinite loop, judgment-heavy work doesn't converge.

But almost everything interesting in software is judgment-heavy. So instead of one goal driving infinite iterations, I got much better results from one bounded goal per iteration, chosen fresh each time by comparing long-term vision against current state (gap analysis). The loop's unit of work in REAP is a generation: one goal, one lifecycle, one review. Then pick the next goal with a human sanity-check in between.

Small bounded loops with re-aiming between them beat one big loop, every single time.
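A minimal sketch of that re-aiming step, assuming the vision and the current state can each be reduced to a list or set of capability names. `pick_next_goal`, `run_generations`, and the capability strings are hypothetical illustrations, not REAP's actual gap analysis.

```python
from typing import Optional


def pick_next_goal(vision: list[str], current_state: set[str]) -> Optional[str]:
    """Return the first capability in the vision that is still missing, or None."""
    for capability in vision:  # the vision is ordered by priority
        if capability not in current_state:
            return capability
    return None


def run_generations(vision: list[str], current_state: set[str], max_generations: int) -> list[str]:
    """Run bounded generations: one goal each, re-aimed before every iteration."""
    completed = []
    for _ in range(max_generations):
        goal = pick_next_goal(vision, current_state)
        if goal is None:
            break  # no gap left: the loop ends instead of spinning
        # Placeholder for the real work (agent lifecycle plus human review).
        current_state = current_state | {goal}
        completed.append(goal)
    return completed
```

The point of the shape is that the goal is recomputed from the current state on every pass, so a wrong turn in one generation does not get baked into the next.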

Lesson 2: The loop needs stages, not just repetitions

An unstructured iteration ("here's the goal, go") makes the agent jump straight to code. The fix that stuck: every generation walks a fixed lifecycle —

learning → planning → implementation ⇄ validation → completion

Each stage produces an artifact file (what was learned, what's planned, what was done, what was verified). Sounds bureaucratic. Isn't. Those artifacts are what make iteration N+1 smarter than iteration N — and what make the human checkpoint (next lesson) reviewable in minutes instead of hours.
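One way to picture the artifact requirement is a transition function that refuses to advance unless the current stage left a file behind. This is a sketch under assumptions: the `<stage>.md` naming and the `next_stage` helper are hypothetical, and the implementation ⇄ validation back-edge is left out for brevity.

```python
from pathlib import Path

# Stage order from the lifecycle above; the artifact file names are assumptions.
STAGES = ["learning", "planning", "implementation", "validation", "completion"]


def artifact_path(generation_dir: Path, stage: str) -> Path:
    return generation_dir / f"{stage}.md"


def next_stage(generation_dir: Path, current: str) -> str:
    """Advance only if the current stage left a non-empty artifact on disk."""
    artifact = artifact_path(generation_dir, current)
    if not artifact.is_file() or not artifact.read_text().strip():
        raise RuntimeError(f"stage '{current}' has no artifact: {artifact.name}")
    index = STAGES.index(current)
    if index == len(STAGES) - 1:
        raise RuntimeError("completion is the final stage")
    return STAGES[index + 1]
```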

Lesson 3: The agent must never grade its own homework

This is the hill I'll die on. Every generation ends with a fitness phase where a human gives feedback — and it's deliberately natural language only. No scores, no rubric, no "rate this 1–10", no LLM-as-judge.

Why so strict? Goodhart's law. Any quantitative fitness signal an agent can see is a signal it will optimize instead of the actual goal. I've watched it happen. Self-assessment ("here's what I'm uncertain about") is allowed and useful; self-scoring is banned at the protocol level.

An unattended infinite loop has exactly one grader: the model itself. That's not autonomy — that's compounding hallucination with a progress bar.

Lesson 4: Lock the rules while the loop is running

Give an agent long enough inside a loop and it will, very reasonably, decide the rules should change. The convention was inconvenient, so it "improved" it — mid-task, silently.

REAP's answer is a genome: a small set of files holding architecture decisions, conventions, and hard constraints. During a generation the genome is immutable. The agent can propose changes, but they queue up in a backlog and get applied only at the generation boundary — where I review them. The loop can suggest amendments to its constitution; it cannot ratify them.
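The propose-but-not-ratify rule can be sketched as a small class. The `Genome` class below and its method names are hypothetical; the real genome is a set of files, and this in-memory version only illustrates the locking behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class Genome:
    rules: dict[str, str]
    locked: bool = False
    proposals: list[tuple[str, str]] = field(default_factory=list)

    def propose(self, key: str, value: str) -> None:
        """The agent may always suggest a change; it is queued, never applied here."""
        self.proposals.append((key, value))

    def set_rule(self, key: str, value: str) -> None:
        if self.locked:
            raise PermissionError("genome is immutable during a generation")
        self.rules[key] = value

    def start_generation(self) -> None:
        self.locked = True

    def end_generation(self, approved: set[str]) -> None:
        """At the boundary, apply only the proposals the human reviewer approved."""
        self.locked = False
        for key, value in self.proposals:
            if key in approved:
                self.rules[key] = value
        self.proposals.clear()
```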

Lesson 5: If a stage can be skipped, it will be skipped

Ask any agent to "always run validation before completing" and count the sessions until it… doesn't. Instructions decay. So REAP enforces stage order cryptographically: every stage transition requires a signature token (nonce) that only the previous stage's completion can issue. Skipping validation isn't a disobeyed instruction — it's a failed signature check. The CLI just says no.

Rule of thumb after 70 generations: anything you'd write in ALL CAPS in your prompt should be enforced by the harness instead.

Lesson 6: Autonomy should be a budget, not a binary

The loop-engineering debate keeps framing it as attended vs. unattended. The useful knob is in between: how many iterations am I willing to pre-approve?

REAP calls it cruise mode: reap cruise 3 means "run 3 generations autonomously, then come back to me." Clear, mechanical goals? Crank it up. Ambiguous design territory? Set it to zero and review every generation. Autonomy becomes a dial you turn per-situation, not an ideology.
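As a rough illustration of autonomy as a budget, the loop below runs at most a pre-approved number of generations and then returns control. The `cruise` function and its `run_generation` callback are hypothetical names for this sketch, not the REAP CLI's internals.

```python
from typing import Callable


def cruise(budget: int, run_generation: Callable[[int], bool]) -> int:
    """Run up to `budget` pre-approved generations, then hand control back.

    `run_generation` is a hypothetical callback that returns False when the
    generation needs human attention before the loop may continue.
    """
    completed = 0
    while completed < budget:
        if not run_generation(completed + 1):
            break  # stop early: something needs a human decision
        completed += 1
    return completed  # the caller reviews before approving another budget
```

A budget of zero runs nothing on its own, which matches the "review every generation" end of the dial.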

Lesson 7: Loops need exit ramps, not just exit conditions

Real loops don't always end in success. Sometimes the goal was wrong, sometimes 60% done is worth keeping. An infinite loop has one exit: Ctrl-C, and whatever mess is on disk is your problem.

A loop iteration in REAP has three distinct endings — complete (full lifecycle + review), early-close (keep the partial value, auto-carry unfinished tasks to the next generation's backlog), and abort (discard cleanly, restore consumed state). The ability to lose gracefully is what makes running many loops cheap.
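The three endings can be sketched as one function with three branches. Everything here is an assumption made for illustration: the `Generation` fields, the `close` helper, and the idea that "consumed state" means backlog items the generation picked up at its start.

```python
from dataclasses import dataclass, field


@dataclass
class Generation:
    goal: str
    consumed: list[str] = field(default_factory=list)    # backlog items taken at start
    tasks_done: list[str] = field(default_factory=list)
    tasks_open: list[str] = field(default_factory=list)


def close(gen: Generation, ending: str, backlog: list[str], archive: list[Generation]) -> None:
    """Apply one of the three endings named in the text above."""
    if ending == "complete":
        if gen.tasks_open:
            raise ValueError("complete requires every task to be finished")
        archive.append(gen)
    elif ending == "early-close":
        backlog.extend(gen.tasks_open)  # carry unfinished work forward
        gen.tasks_open = []
        archive.append(gen)             # keep the partial value
    elif ending == "abort":
        backlog.extend(gen.consumed)    # give back what this generation took
    else:
        raise ValueError(f"unknown ending: {ending}")
```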

Ralph loop vs. REAP, honestly

	Ralph-style infinite loop	REAP
Unit of work	One prompt, repeated forever	One goal per generation, re-aimed each cycle
Memory	Files + git, unstructured	3-tier memory + genome + lineage archive
Correction signal	Environment (tests/build) only	Environment + human fitness each generation
Rules mid-run	Agent can drift	Genome locked, changes reviewed at boundary
Stage discipline	Prompt-based (decays)	Signature-enforced (can't skip)
Autonomy	All or nothing	Budgeted (cruise N)
Failure exit	Ctrl-C + cleanup	abort / early-close / complete
Best at	Mechanical, machine-checkable tasks	Sustained product evolution with judgment calls

Not a takedown — for a 200-file mechanical migration with a green-tests exit condition, a Ralph loop is genuinely great. But for evolving a real product over months, you need the right column. That's the gap REAP was built for.

Proof of loop: this tool builds itself

The part I'm proudest of: REAP is developed with REAP. All 70+ generations — the signature locking, the memory system, cruise mode, the evaluator agent — were built inside the exact loop they enforce, each closed with human fitness feedback. Every flaw in the loop design lands on me first, and the fix gets encoded into the genome for every generation after.

Dog-fooding a loop tool inside its own loop is the fastest feedback cycle I've ever worked in.

Try a structured loop (5 minutes)

npm install -g @c-d-cc/reap
cd your-project
reap init        # detects greenfield vs existing codebase

Open Claude Code (or OpenCode) and run:

/reap.evolve

That's one full generation: the agent learns your codebase, plans, implements, validates — and then asks you how it did. Feedback becomes selection pressure. The next generation starts smarter.

🌱 Site: reap.cc
⭐ GitHub: github.com/c-d-cc/reap — stars help more people find it
📦 npm: @c-d-cc/reap

If you're running loops today — Ralph-style, cron-based, hand-rolled — I'd love to hear where yours drifted and what you did about it. That's the conversation loop engineering actually needs next.

New workflow control method for harness engineering — Signature-Based Locking

Hichoi-Dev — Sat, 21 Mar 2026 19:58:59 +0000

The Problem: AI Won't Stay Harnessed

If you've been building AI-assisted development workflows — what some call "harness engineering" — you've hit this wall:

No matter how carefully you craft your prompts, the AI eventually goes off-script.

You define a multi-step workflow. The AI follows it for a while. Then somewhere around step 4, it decides to "optimize" by skipping steps, modifying files directly, or inventing a shortcut that breaks your entire pipeline.

This isn't a prompting failure. It's a fundamental limitation of prompt-only workflow control.

Why Prompt-Only Control Fails

Three documented forces work against prompt-based workflow enforcement:

1. Context Rot (Lost in the Middle)

As conversations grow longer, instructions from the beginning of the context window lose influence. Research published in TACL ("Lost in the Middle") shows that LLMs exhibit a U-shaped performance curve: they use information at the beginning and end of the context well, but accuracy can drop by more than 20% when the relevant information sits in the middle. Your carefully structured "NEVER do X" rules get diluted by thousands of tokens of subsequent conversation.

2. Training-Induced Optimization Pressure

This isn't speculation — it's documented behavior. RLHF training creates measurable pressure toward concise, "helpful" responses, because human evaluators systematically prefer them. Anthropic's own prompting best practices explicitly state that newer Claude models "may skip detailed summaries for efficiency." OpenAI acknowledged the same phenomenon when GPT-4 became "lazy" in December 2023, requiring a new model checkpoint to fix.

When an AI sees a 5-step workflow where steps 2-4 seem like overhead, it has a trained tendency to compress. This is the model being helpful — and breaking your harness in the process.

3. No Enforcement Boundary

Prompts are suggestions, not constraints. There's no mechanism to prevent the AI from taking an action — you can only ask it not to. Research on specification gaming shows that models can learn to satisfy the apparent goal while bypassing the intended process — including modifying unit tests to pass instead of writing correct code. Prompts operate in the same trust domain as the AI itself.

What's Been Tried

Approach 1: Stronger Prompts

Add more rules. Make them UPPERCASE. Use XML tags. Add "CRITICAL" and "NEVER" and "NON-NEGOTIABLE."

This helps initially but doesn't solve context rot. The more rules you add, the more diluted each individual rule becomes.

Approach 2: File-Level Permissions

Restrict which files the AI can modify (e.g., strict mode, read-only markers). This prevents certain destructive actions but doesn't enforce workflow ordering. The AI can still call commands out of sequence.

Approach 3: Deterministic Scripts

Move workflow logic out of prompts and into deterministic scripts. The AI calls scripts instead of modifying state directly. This is the right direction — but scripts alone can't prevent the AI from calling them out of order, or skipping them entirely.

First, You Need a Workflow

Before we can lock anything, we need something to lock. Signature-Based Locking assumes your AI-assisted work follows a defined lifecycle with ordered stages — a workflow where step N must complete before step N+1 begins.

This is the lifecycle behind REAP (Recursive Evolutionary Autonomous Pipeline), where each unit of work, called a Generation, follows a 5-stage lifecycle:

Objective → Planning → Implementation → Validation → Completion

Each stage has a clear purpose: define the goal, break it into tasks, build it, verify it works, then retrospect and archive. Stages produce artifacts, and transitions between stages are explicit — you can't "drift" from planning into implementation without a deliberate transition.

This kind of structured workflow is where AI agents provide the most value (creative work within each stage) but also where they cause the most damage (skipping stages, going out of order, bypassing gates). The more structured your workflow, the more you need enforcement.

Signature-Based Locking: Enforcing Workflow Sequence from Outside the AI

Given a structured workflow, here's the insight: sequence enforcement must happen outside the AI's trust boundary.

The AI can ignore prompts. Content guardrails can filter what it says. But neither can enforce the order in which steps are executed. Cryptographic signatures can.

How It Works

Each stage command generates a random nonce, stores its SHA256 hash (mixed with execution context) in the workflow state file, and returns the raw nonce to the AI. To advance to the next stage, the AI must pass this nonce to the transition command, which recomputes the hash and verifies it matches.

The critical property: only the actual script execution can produce a valid nonce. The AI receives the nonce as output, but cannot reverse-engineer or fabricate one. The hash is stored in a managed state file that the AI is structurally prevented from modifying directly.
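The mechanism described above can be sketched in a few lines. This is a minimal illustration, not REAP's source: the function names, the `state` dict standing in for the managed state file, and the choice of generation ID plus stage name as the "execution context" are all assumptions.

```python
import hashlib
import secrets


def _digest(nonce: str, generation_id: str, stage: str) -> str:
    # A separator keeps ("ab", "c") and ("a", "bc") from hashing to the same value.
    material = "\x1f".join([nonce, generation_id, stage])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def issue_nonce(state: dict, generation_id: str, stage: str) -> str:
    """Create a nonce for `stage`, store only its context-bound hash, return the raw nonce."""
    nonce = secrets.token_hex(16)
    state["expected_hash"] = _digest(nonce, generation_id, stage)
    return nonce


def verify_and_advance(state: dict, nonce: str, generation_id: str, stage: str) -> bool:
    """Recompute the hash; on a match, consume it so the nonce cannot be replayed."""
    expected = state.get("expected_hash")
    if expected is None:
        return False
    candidate = _digest(nonce, generation_id, stage)
    if not secrets.compare_digest(candidate, expected):
        return False
    state["expected_hash"] = None  # single use
    return True
```

The check is only as strong as the protection on the stored hash: in this sketch `state` is a plain dict, so the guarantee depends on the agent being unable to write the real state file directly.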

What Signature-Based Locking Gives You

Existing approaches each solve a piece of the puzzle: prompts communicate intent, file permissions restrict access, content guardrails filter unsafe output, and deterministic scripts encode logic. But none of them enforce execution sequence — the guarantee that step N actually happened before step N+1.

Signature-Based Locking fills this gap. It doesn't replace the other approaches; it adds the missing dimension. Here's how it stacks up:

Why Signature-Based Locking Works

Threat	Prompt-only	Signature-Based Locking
AI skips a stage	Possible	Blocked (no nonce)
AI modifies state directly	Possible	Blocked (hash mismatch)
AI replays a previous step	Possible	Blocked (context-bound nonce)

The Hybrid Architecture

Signature-Based Locking is most effective as part of a hybrid architecture that separates deterministic and creative work.

Key principle: The deterministic script handles everything that has a "right answer" — state transitions, gate checks, file validation, hook execution. The AI handles everything that requires creativity — writing code, making design decisions, solving problems.

The scripts communicate with the AI through structured JSON output:

{
  "status": "ok",
  "command": "objective",
  "phase": "complete",
  "message": "Objective stage complete. Advance with: /reap.next a3f8c2d9..."
}

The AI receives clear instructions and a nonce. It cannot advance without passing the nonce to the next command. The deterministic script verifies and controls the flow.
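For a harness or wrapper that consumes this output, pulling the nonce out of the message might look like the sketch below. The field names come from the JSON example above; the regular expression, the `extract_nonce` helper, and the `"error"` status value are assumptions for illustration.

```python
import json
import re
from typing import Optional

# Assumption: the nonce is the hex token that follows "/reap.next " in the message.
NONCE_PATTERN = re.compile(r"/reap\.next\s+([0-9a-f]+)")


def extract_nonce(raw_output: str) -> Optional[str]:
    """Parse one JSON status object and pull the nonce out of its message, if any."""
    payload = json.loads(raw_output)
    if payload.get("status") != "ok":
        return None
    match = NONCE_PATTERN.search(payload.get("message", ""))
    return match.group(1) if match else None
```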

REAP: This Architecture in Practice

This is exactly how REAP (Recursive Evolutionary Autonomous Pipeline) works. REAP is an open-source CLI tool that structures AI-assisted development as an evolutionary process — software evolves across Generations, each carrying one goal through a 5-stage lifecycle.

What REAP Does

Genome — Your project's design knowledge (architecture decisions, conventions, constraints, business rules) is managed as a living document that evolves across generations
Lifecycle — Each generation follows: Objective → Planning → Implementation → Validation → Completion
Signature-Based Locking — Stage transitions require cryptographic nonce verification, preventing the AI from skipping stages or going off-script
Session Persistence — The Genome and current generation state are automatically injected into the AI's context at session start, solving the "context loss across sessions" problem
Multi-Agent Support — Works with Claude Code and OpenCode.

The Signature Chain in REAP

/reap.start "Build user auth"
  → Script creates generation, stores hash
  → AI receives instructions

/reap.objective
  → AI defines goals, writes artifact
  → Script verifies artifact, generates nonce
  → Message: "Advance with: /reap.next a3f8c2..."

/reap.next a3f8c2...
  → Script verifies SHA256(a3f8c2 + genId + stage) == stored hash
  → ✅ Match → advance to planning
  → ❌ Mismatch → "Token verification failed. Re-run the stage command."

/reap.planning
  → AI creates implementation plan
  → Script generates new nonce
  → Message: "Advance with: /reap.next b7d91e..."

  ... chain continues through all stages ...

Each nonce is single-use, context-bound (includes generation ID and stage name), and cryptographically verified. The AI cannot skip ahead, replay, or forge tokens.

Why It Matters

After building 109 generations with REAP (yes, REAP is built with REAP), we've seen firsthand that prompt-only workflow control breaks down at scale. The AI "optimizes" by skipping validation, modifying state files directly, or calling commands out of order.

Signature-Based Locking eliminated these failure modes — not by adding more rules to the prompt, but by making rule violation mechanically impossible.

Related Work: NeMo Guardrails

It's worth mentioning NVIDIA's NeMo Guardrails, which shares an important philosophy with Signature-Based Locking: don't rely on prompts alone — enforce rules in code.

NeMo Guardrails places a programmable middleware between the user and the LLM. User input is normalized into intents, Colang rules determine whether to call the LLM or return a pre-defined response, and the output is screened against safety policies. This gives developers precise control over what the AI can say — blocking toxic content, preventing jailbreaks, enforcing topic boundaries, and detecting hallucinations through factual grounding checks. It integrates with LangChain, LlamaIndex, and supports GPU-accelerated evaluation for production workloads.

This is genuinely valuable for chatbots, customer-facing AI, and any application where content safety matters. The core insight — layering deterministic safety on top of probabilistic LLMs — is sound.

Where the two approaches diverge is the dimension of control:

	NeMo Guardrails	Signature-Based Locking
Controls	What the AI says (content)	What order the AI executes steps (sequence)
Mechanism	Input/output filtering via policy rules	Cryptographic nonce chain across steps
Prevents	Toxic content, jailbreaks, hallucinations	Stage skipping, out-of-order execution, state tampering
Best for	Chatbots, customer-facing AI	Multi-step workflows, autonomous agents

They're complementary, not competing. You could use NeMo Guardrails to ensure the AI doesn't produce unsafe content, and Signature-Based Locking to ensure it follows the correct execution sequence. Different dimensions, same philosophy.

Try It

npm install -g @c-d-cc/reap
reap init my-project
# Open Claude Code or OpenCode
> /reap.evolve "Implement user authentication"

GitHub | Documentation | npm

Have you struggled with keeping AI agents on-script in multi-step workflows? What approaches have you tried? I'd love to hear about your harness engineering experiences in the comments.


Specs Cannot be Source of Source Code — Why Intent Management Matters in AI-Driven Development

Hichoi-Dev — Fri, 20 Mar 2026 17:37:55 +0000

The Seductive Idea

There's a compelling narrative in AI-driven development right now: write a detailed spec, feed it to an AI agent, and get working software out.

GitHub's Spec Kit, AWS's Kiro, and a growing ecosystem of tools all converge on the same premise — that specifications can become the "source of source code." The product requirements document isn't just a guide for implementation; it is the source that generates implementation.

It's an attractive idea. If specs are the source, then developers become spec writers, AI becomes the compiler, and code becomes a generated artifact. Clean. Elegant. Almost too good.

And that's the problem.

Source Code Is Deterministic. Specs Are Not.

Let's start with what "source" actually means in software engineering.

When you compile main.c with the same compiler and flags, you get the same program. Every time. This property — determinism — is what makes source code source. It's the reproducible foundation on which everything else stands: builds, tests, deployments, debugging.

Now consider a specification:

"The system should handle user authentication with proper security measures."

Feed this to an AI agent three times. You'll get three different implementations — different OAuth flows, different session strategies, different error handling patterns. The same spec produces different code across different runs, different models, and different context windows.

This isn't a bug in the AI. It's a fundamental characteristic. Specifications are written in natural language, which is inherently ambiguous. LLMs are non-deterministic by design. The combination means that specs cannot serve as "source" in any meaningful engineering sense.

Source code has a contract: same input, same output. Specifications don't — and can't — honor that contract.

Then What Are Specs? Intent, Not Source.

If specs aren't source, what are they?

They're intent. Initiative. Direction. A spec says what you want and why you want it — but it doesn't deterministically produce how. The "how" emerges through the act of implementation, whether done by a human or an AI agent.

This distinction matters more than it seems:

	Source	Intent
Determinism	Same input → same output	Same input → many valid outputs
Verification	Compile, run, test	Interpret, judge, review
Authority	The code is the truth	The intent guides the truth
Drift	Doesn't drift from itself	Drifts from implementation over time

Treating intent as source is a category error. It's like treating a compass bearing as a GPS coordinate — useful for direction, useless for pinpointing where you actually are.

Why We Still Need to Manage Intent

But here's the thing: just because specs aren't source doesn't mean they don't matter.

In AI-driven development, intent management is arguably more critical than ever:

Context loss — AI agents forget everything between sessions. Without persisted intent, every session starts from zero.
Knowledge decay — Decisions made in session 12 are invisible in session 13. Architecture rationale evaporates. Business rules get re-debated.
Drift without anchor — Without a persistent record of intent, AI agents make locally reasonable but globally inconsistent decisions. The codebase slowly becomes incoherent.

The question isn't whether to manage intent. It's how.

How Teams Have Managed Specs (A Brief History)

Software teams have tried many approaches to capture and maintain design knowledge. Here's how the major ones compare — especially through the lens of AI-assisted development:

Approach	Strengths	Weaknesses	AI-Era Fit
RFC — proposal for collecting feedback (Pragmatic Engineer)	Structured deliberation	Point-in-time, never updated	Low
ADR — records one decision + rationale (Candost)	Lightweight, captures why	Accumulates without sync	Medium
Design Docs — comprehensive pre-impl design (Google, Uber-style)	Thorough analysis	Goes stale fast	Low
CLAUDE.md / AGENTS.md — repo-level AI instructions (agents.md)	Zero-friction, always loaded	No sync, grows stale silently	Medium
Spec Kit — Spec → Plan → Task → Implement (GitHub Blog)	Structured workflow	One-shot, no cross-session continuity	Medium
Kiro — IDE with built-in spec workflow (kiro.dev)	Integrated experience	Static specs, manual updates, IDE-locked	Medium

Every approach above shares a common failure mode: they treat specification as a one-time event, not a continuous process. You write the RFC, make the decision, and move on. You create the design doc, build the feature, and the doc rots. You set up CLAUDE.md on day one, and by week three it describes a project that no longer exists.

Drew Breunig captured this perfectly with the Spec-Driven Development Triangle — specs, code, and tests form a triangle that must stay in sync, but keeping them in sync is where everyone fails.

The Sync Problem Is a Workflow Problem

Here's the insight that most tools miss: spec drift isn't a documentation problem. It's a workflow problem.

You can't solve it by writing better specs. You can't solve it by adding a linter that checks specs against code. You can't solve it with a pre-commit hook that nags you to update the docs.

You solve it by making knowledge maintenance an inseparable part of the development workflow itself — not something you do after the "real work," but part of the work.

This requires three things:

A knowledge base that's structured enough for AI to reference, but lightweight enough for humans to maintain
A sync mechanism that's embedded in the development lifecycle, not bolted on as an afterthought
An iterative workflow that revisits and evolves knowledge across sessions, not just within a single feature

Most tools get one or two of these. Almost none get all three.

REAP's Answer: Genome + Sync + Recursive Workflow

This is the problem REAP was built to solve. Not by treating specs as source code, but by building a recursive workflow where knowledge evolves alongside the code it describes.

The Genome: A Living Knowledge Base

REAP maintains a "Genome" — a structured collection of project knowledge stored in .reap/genome/:

.reap/genome/
  principles.md       # Architecture decisions (ADR-style, with rationale)
  conventions.md      # Development rules and enforced standards
  constraints.md      # Technical choices and validation commands
  domain/             # Business rules that can't be derived from code

The Genome isn't a spec. It doesn't try to describe what to build. It captures what you've learned — architecture principles, business rules, constraints, conventions. It's the accumulated knowledge that makes your project your project, not a generic codebase.

Every time an AI agent starts a session in a REAP project, the Genome is automatically injected into its context. The agent doesn't start from zero — it starts with your project's institutional knowledge.

Sync Through the Lifecycle, Not After It

Here's where REAP diverges from every tool listed above. Knowledge sync isn't a separate activity — it's built into the development lifecycle.

Each "Generation" (a unit of work) follows a five-stage cycle:

Objective → Planning → Implementation → Validation → Completion

During Implementation, when you discover something that contradicts the Genome — a business rule that changed, an architectural assumption that proved wrong — you don't stop to update docs. You log it as a backlog item and keep building.

During Completion, those discoveries are reviewed and the Genome is updated. Knowledge evolution happens as a natural part of finishing work, not as a separate maintenance chore that everyone skips.

This is the critical difference. The Genome stays in sync with reality because updating it is part of the workflow, not something you do "when you have time" (which means never).

Recursive, Not One-Shot

But the most important differentiator isn't the Genome or the sync — it's that the workflow is recursive.

Spec Kit gives you: Specify → Plan → Task → Implement. Done. Start over from scratch for the next feature.

REAP gives you an endless chain of generations, where each generation inherits the knowledge from all previous ones:

Gen 1: Build auth → learns "we use JWT" → Genome updated
Gen 2: Build API → starts knowing "we use JWT" → learns "rate limiting needed" → Genome updated
Gen 3: Build dashboard → starts knowing both → builds on accumulated knowledge
...
Gen N: Genome reflects N generations of accumulated learning

Each generation archives its artifacts in a Lineage — a complete history of what was decided, what was built, and what was learned. The Genome is a living summary; the Lineage is the full record.
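The Genome and Lineage split can be illustrated with a toy model. The `Project` class is hypothetical, and a real Genome is a set of curated files rather than a list of strings, but the inheritance pattern is the same as in the Gen 1 to Gen N example above.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    genome: list[str] = field(default_factory=list)    # living summary
    lineage: list[dict] = field(default_factory=list)  # full per-generation record

    def run_generation(self, goal: str, learned: str) -> list[str]:
        """Run one generation; return the knowledge it started with."""
        inherited = list(self.genome)  # snapshot of what earlier generations learned
        self.lineage.append({"goal": goal, "inherited": inherited, "learned": learned})
        self.genome.append(learned)    # Genome updated at completion
        return inherited
```

Calling `run_generation("Build API", ...)` after `run_generation("Build auth", "we use JWT")` returns `["we use JWT"]`: the second generation starts with what the first one learned.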

This recursive structure means:

No cold starts — Every generation begins with the full context of everything that came before
No knowledge loss — Decisions made in generation 5 are still accessible in generation 50
Natural evolution — The Genome grows more accurate over time, not less — the opposite of traditional specs

The Right Mental Model

Here's how to think about it:

	Traditional	SDD	REAP
Code is...	The only truth	A generated artifact	The truth, always
Spec is...	Pre-work that rots	Source of truth	Per-generation Objective (scoped, disposable)
Knowledge is...	In people's heads	In spec documents	In an evolving Genome
Workflow is...	Ad hoc	One-shot pipeline	Recursive generations
Sync happens...	Never	Manually	Built into each generation's completion

Code remains the source of truth. The Genome doesn't replace it — it complements it by capturing the intent, rationale, and constraints that code alone can't express. And the recursive workflow ensures the two stay in sync, generation after generation.

Try It

npm install -g @c-d-cc/reap
reap init my-project

# In Claude Code or OpenCode:
> /reap.start
> /reap.evolve "Implement user authentication"

REAP is open source, MIT licensed, and supports Claude Code and OpenCode today.

GitHub | Documentation | npm

Specs can't be source code. But the intent behind them — the decisions, the constraints, the hard-won lessons — that's worth managing. The question is whether your workflow makes that management automatic or optional. Because optional means it won't happen.


Why Spec-Driven Development Fails — And a Better Way to Structure AI Development

Hichoi-Dev — Wed, 18 Mar 2026 21:14:10 +0000

SDD: The Right Problem, Wrong Solution

Spec-Driven Development (SDD) is the idea that detailed specifications — written upfront — can guide AI agents to produce working software. GitHub's Spec Kit is a representative example, formalizing this into a workflow: Specify → Plan → Task → Implement.

SDD recognized a real problem: "prompt and pray" doesn't scale. Beyond toy projects, you need a way to communicate intent to AI that goes beyond "build me an auth system." The core insight — that structure matters — is valid.

The Core Problem: Specs Are Non-Deterministic

The fundamental flaw: SDD treats specifications as authoritative sources of truth, but LLMs exhibit non-deterministic behavior. The same specification produces different implementations across different runs, with varying architectural choices, data structures, and error handling. As one analysis of SDD notes, "Because of the non-deterministic nature of this technology, there will always remain a very non-negligible probability that it does things that we don't want." This means specifications cannot serve as reliable sources of truth the way source code does.

SDD Is Waterfall in Disguise

SDD essentially recreates Waterfall methodology:

Big Design Up Front with exhaustive specifications
Sequential phases completing before the next begins
Assumption that thorough planning eliminates execution uncertainty

Real-world testing revealed the inefficiency: one hands-on evaluation required 33 minutes and 2,577 lines of markdown to produce 689 lines of code, compared to 8 minutes using iterative prompting. On those timings that is roughly 4x slower, with no quality improvement.

Why Specifications Drift

Specifications and code inevitably diverge because:

AI makes unanticipated architectural choices
Each iteration accumulates undocumented decisions
Specs become post-hoc documentation rather than guides
Developers spend time reading lengthy markdown instead of solving problems

The Real Question

The real question is how to give AI agents structure without betting everything on an upfront document. The answer is not "exhaustive upfront specifications." It aligns with decades of software engineering wisdom: iterative development with accumulated learning, essentially Agile methodology adapted for AI collaboration.

A Different Approach

This is what motivated me to build REAP (Recursive Evolutionary Autonomous Pipeline). Rather than treating development as a spec-to-code translation, REAP structures AI-assisted development as an evolutionary process — closer to how experienced developers actually work.

How REAP Works

Development happens in Generations. Each generation carries one focused goal through a 5-stage lifecycle:

Objective → Planning → Implementation → Validation → Completion

This isn't just a linear pipeline. Each stage has gates, and stages can regress — if validation fails, you loop back to implementation with the failure context preserved. This mirrors the real-world "build → test → fix → test again" cycle that SDD's sequential model ignores.
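To make the gate-and-regress idea concrete, here is a minimal sketch in Python. It is not REAP's actual implementation: the `Generation` class, its `advance` method, and the `failures` list are hypothetical names invented for illustration, and the only behavior taken from the text above is the five-stage order and the fallback from Validation to Implementation with failure context kept.

```python
from dataclasses import dataclass, field

# Stage order as described in the article.
STAGES = ["objective", "planning", "implementation", "validation", "completion"]


@dataclass
class Generation:
    """Hypothetical model of one generation; not REAP's real data structure."""
    goal: str
    stage: str = "objective"
    failures: list = field(default_factory=list)  # context kept across regressions

    def advance(self, gate_passed: bool, failure_note: str = "") -> str:
        """Move forward when the gate passes; regress from validation on failure."""
        if self.stage == "completion":
            raise ValueError("generation is already complete")
        if gate_passed:
            self.stage = STAGES[STAGES.index(self.stage) + 1]
        elif self.stage == "validation":
            # Failed validation: keep the failure context and go back to building.
            self.failures.append(failure_note)
            self.stage = "implementation"
        # A failed gate on any other stage leaves the generation where it is.
        return self.stage


gen = Generation(goal="Implement user authentication")
for _ in range(3):
    gen.advance(gate_passed=True)          # objective -> ... -> validation
gen.advance(gate_passed=False, failure_note="login test fails")
print(gen.stage, gen.failures)
```

The point of the sketch is that a failed gate is a normal transition with its own recorded context, not an error that ends the run.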

The Genome: Knowledge That Evolves

Where SDD puts specifications at the center, REAP puts a Genome at the center — a living record stored in .reap/genome/:

principles.md — Architecture decisions with rationale (ADR-style)
conventions.md — Development rules and enforced standards
constraints.md — Technical choices and validation commands
domain/ — Business rules that can't be derived from code

The Genome isn't written once and forgotten. It evolves across generations. When you discover something during implementation that contradicts the Genome, you log it as a backlog item. At the end of each generation, discoveries are reviewed and the Genome is updated. Over time, the Genome becomes an increasingly accurate map of your project — not a spec that drifts from reality.
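The discovery-to-backlog-to-Genome flow can be sketched as follows. This is an illustration under assumptions, not REAP's code: `Genome`, `Discovery`, and `complete_generation` are hypothetical names, the Genome is modeled as an in-memory dictionary rather than files under .reap/genome/, and the `approve` callback stands in for the end-of-generation review.

```python
from dataclasses import dataclass, field


@dataclass
class Genome:
    """Hypothetical in-memory stand-in for the files under .reap/genome/."""
    entries: dict = field(default_factory=dict)


@dataclass
class Discovery:
    """Something learned during implementation that contradicts the Genome."""
    key: str
    finding: str


def complete_generation(genome, backlog, approve):
    """Apply reviewed discoveries to the genome; return the ones deferred."""
    deferred = []
    for discovery in backlog:
        if approve(discovery):
            genome.entries[discovery.key] = discovery.finding
        else:
            deferred.append(discovery)
    return deferred


genome = Genome({"conventions/errors": "Throw exceptions"})
backlog = [
    Discovery("conventions/errors", "Return Result types"),
    Discovery("constraints/db", "Consider SQLite"),
]
# The lambda is a placeholder for a human or automated review decision.
deferred = complete_generation(
    genome, backlog, approve=lambda d: d.key.startswith("conventions")
)
print(genome.entries, [d.key for d in deferred])
```

Nothing changes the Genome in the middle of a generation here; updates happen only in the review step, which is what keeps the record from drifting silently.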

What Makes It Different from SDD

Aspect	SDD	REAP
Source of truth	Specification document	Evolved Genome + source code
Planning scope	Entire project upfront	One generation at a time
When plans break	Spec drift → update spec → regenerate	Discovery → backlog → evolve Genome
Validation	Spec compliance	Actual tests, type checks, builds
Knowledge persistence	Specs (static)	Genome (evolving) + Lineage (history)
Context for AI	Spec document	Genome + generation state (auto-injected)

Context That Persists

Every time you start an AI session in a REAP project, the SessionStart hook automatically injects the Genome, current generation state, and workflow rules into the AI's context. The AI doesn't start from zero — it starts with your project's accumulated knowledge.
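As a rough picture of what such an injection step assembles, here is a sketch in Python. The function name `build_session_context`, the section-header format, and the `generation_state` string are assumptions made for illustration; only the .reap/genome/ file names come from this article, and the real hook may build its context differently.

```python
from pathlib import Path

# File names listed in the Genome section of this article.
GENOME_FILES = ["principles.md", "conventions.md", "constraints.md"]


def build_session_context(project_root: Path, generation_state: str) -> str:
    """Concatenate genome files and generation state into one context string.

    Hypothetical helper: the output format is an assumption, not REAP's.
    """
    genome_dir = project_root / ".reap" / "genome"
    sections = []
    for name in GENOME_FILES:
        path = genome_dir / name
        if path.is_file():
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n{body}")
    domain_dir = genome_dir / "domain"
    if domain_dir.is_dir():
        for path in sorted(domain_dir.glob("*.md")):
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## domain/{path.name}\n{body}")
    sections.append(f"## current generation\n{generation_state.strip()}")
    return "\n\n".join(sections)
```

Calling `build_session_context(Path("my-project"), "stage: planning")` would return a single string that a session-start hook could hand to the agent. Missing files are skipped, so a young project with a sparse Genome still gets a valid context.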

This solves SDD's "spec drift" problem at the root. The Genome stays in sync with reality because it's updated as part of the development process, not maintained as a separate artifact.

Try It

npm install -g "@c-d-cc/reap"
reap init my-project
# Open Claude Code or OpenCode
> /reap.start
> /reap.evolve "Implement user authentication"

REAP supports multiple AI agents — Claude Code and OpenCode today, with an extensible adapter system for adding more.

GitHub | Documentation | npm

What's your experience with spec-driven development? Have you found structure that works for AI-assisted development? I'd love to hear in the comments.

I built a dev tool that "evolves" code with AI — REAP

Hichoi-Dev — Wed, 18 Mar 2026 02:57:43 +0000

The Problem

If you've been building with AI agents (like Claude Code), you've probably encountered these problems:

Context loss — Start a new session and your context is gone. You end up clinging to long sessions just to avoid losing everything the AI has learned.
Stale documentation — You try to persist knowledge in READMEs and CLAUDE.md files, but they quietly go stale as the project moves forward.
AI going rogue — Sometimes the AI just ignores your carefully crafted docs and does its own thing anyway.

We're all stuck at the same bottleneck — the context window just isn't enough for long-running projects.

I tried existing tools like spec-kit and superpower — they're decent for one-off feature work, but didn't quite fit for sustained, long-term development.

What I Built

So I built REAP (Recursive Evolutionary Autonomous Pipeline) — an open-source CLI tool inspired by generational evolution in biology.

The idea: AI and humans evolve software across generations.

Genome (Design & Knowledge)
  → Evolution (Generational Progress)
    → Civilization (Source Code)

How It Works

Genome

Your project's design knowledge is managed as a "Genome" — architecture decisions, business rules, conventions, and constraints.

.reap/genome/
├── principles.md      # Architecture principles
├── domain/            # Business rules
├── conventions.md     # Development conventions
└── constraints.md     # Technical constraints

Life Cycle

Each generation follows a five-stage lifecycle:

Objective → Planning → Implementation → Validation → Completion

Objective — Define goal, requirements, and acceptance criteria
Planning — Break down tasks, choose approach
Implementation — Build with AI + human collaboration
Validation — Run tests, verify completion
Completion — Retrospective + apply Genome changes + archive

Evolution

When a generation completes, it gets archived in the lineage, and the next generation picks up new goals.
Lessons learned within a generation get folded back into the Genome.
Through this iterative pipeline, your source code (the "Civilization") keeps evolving.
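The archive step might look something like the sketch below. The on-disk format is an assumption: the `gen-0001.json` naming, the record fields, and the `archive_generation` helper are all hypothetical, chosen only to show a completed generation becoming an immutable entry in a lineage directory.

```python
import json
from pathlib import Path


def archive_generation(lineage_dir: Path, goal: str, lessons: list) -> Path:
    """Write one completed generation as a numbered JSON record.

    Hypothetical format; REAP's real lineage layout may differ.
    """
    lineage_dir.mkdir(parents=True, exist_ok=True)
    # The next generation number is one more than the records already archived.
    number = len(list(lineage_dir.glob("gen-*.json"))) + 1
    record = {"generation": number, "goal": goal, "lessons": lessons}
    path = lineage_dir / f"gen-{number:04d}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```

Each call adds a new file and never rewrites an old one, so the lineage reads as an append-only history of goals and the lessons that came out of them.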

Quick Start

# Install
npm install -g @c-d-cc/reap

# Initialize
reap init my-project

# Run a full generation in Claude Code
claude
> /reap.evolve "Implement user authentication"

/reap.evolve runs the entire generation lifecycle — from Objective through Completion — interactively with you.
