The Problem: AI Won't Stay Harnessed
If you've been building AI-assisted development workflows — what some call "harness engineering" — you've hit this wall:
No matter how carefully you craft your prompts, the AI eventually goes off-script.
You define a multi-step workflow. The AI follows it for a while. Then somewhere around step 4, it decides to "optimize" by skipping steps, modifying files directly, or inventing a shortcut that breaks your entire pipeline.
This isn't a prompting failure. It's a fundamental limitation of prompt-only workflow control.
Why Prompt-Only Control Fails
Three documented forces work against prompt-based workflow enforcement:
1. Context Rot (Lost in the Middle)
As conversations grow longer, instructions from the beginning of the context window lose influence. Research published in TACL ("Lost in the Middle") demonstrates that LLMs exhibit a U-shaped performance curve — they use information at the beginning and end of the context well, but accuracy can degrade by over 20% for information in the middle. Your carefully structured "NEVER do X" rules get diluted by thousands of tokens of subsequent conversation.
2. Training-Induced Optimization Pressure
This isn't speculation — it's documented behavior. RLHF training creates measurable pressure toward concise, "helpful" responses, because human evaluators systematically prefer them. Anthropic's own prompting best practices explicitly state that newer Claude models "may skip detailed summaries for efficiency." OpenAI acknowledged the same phenomenon when GPT-4 became "lazy" in December 2023 and shipped a new model checkpoint to fix it.
When an AI sees a 5-step workflow where steps 2-4 seem like overhead, it has a trained tendency to compress. This is the model being helpful — and breaking your harness in the process.
3. No Enforcement Boundary
Prompts are suggestions, not constraints. There's no mechanism to prevent the AI from taking an action — you can only ask it not to. Research on specification gaming shows that models can learn to satisfy the apparent goal while bypassing the intended process — including modifying unit tests to pass instead of writing correct code. Prompts operate in the same trust domain as the AI itself.
What's Been Tried
Approach 1: Stronger Prompts
Add more rules. Make them UPPERCASE. Use XML tags. Add "CRITICAL" and "NEVER" and "NON-NEGOTIABLE."
This helps initially but doesn't solve context rot. The more rules you add, the more diluted each individual rule becomes.
Approach 2: File-Level Permissions
Restrict which files the AI can modify (e.g., strict mode, read-only markers). This prevents certain destructive actions but doesn't enforce workflow ordering. The AI can still call commands out of sequence.
Approach 3: Deterministic Scripts
Move workflow logic out of prompts and into deterministic scripts. The AI calls scripts instead of modifying state directly. This is the right direction — but scripts alone can't prevent the AI from calling them out of order, or skipping them entirely.
First, You Need a Workflow
Before we can lock anything, we need something to lock. Signature-Based Locking assumes your AI-assisted work follows a defined lifecycle with ordered stages — a workflow where step N must complete before step N+1 begins.
This is the lifecycle behind REAP (Recursive Evolutionary Autonomous Pipeline), where each unit of work — called a Generation — follows a 5-stage lifecycle:
Objective → Planning → Implementation → Validation → Completion
Each stage has a clear purpose: define the goal, break it into tasks, build it, verify it works, then retrospect and archive. Stages produce artifacts, and transitions between stages are explicit — you can't "drift" from planning into implementation without a deliberate transition.
This kind of structured workflow is where AI agents provide the most value (creative work within each stage) but also where they cause the most damage (skipping stages, going out of order, bypassing gates). The more structured your workflow, the more you need enforcement.
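As a sketch, an ordered lifecycle like this can be modeled as a simple state machine where the only legal transition is to the immediate successor (the stage names follow the lifecycle above; the code is illustrative, not REAP's API):

```typescript
// The five lifecycle stages, in order.
const STAGES = ["objective", "planning", "implementation", "validation", "completion"] as const;
type Stage = (typeof STAGES)[number];

// The only successor of stage N is stage N+1; the last stage has none.
function nextStage(current: Stage): Stage | null {
  const i = STAGES.indexOf(current);
  return i < STAGES.length - 1 ? STAGES[i + 1] : null;
}

// A transition is legal only if it targets the immediate successor.
function canTransition(from: Stage, to: Stage): boolean {
  return nextStage(from) === to;
}

canTransition("objective", "planning");   // legal: immediate successor
canTransition("planning", "validation");  // illegal: skips implementation
```

On its own, this is still just logic the AI is asked to respect — enforcing it is the subject of the next section.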
Signature-Based Locking: Enforcing Workflow Sequence from Outside the AI
Given a structured workflow, here's the insight: sequence enforcement must happen outside the AI's trust boundary.
The AI can ignore prompts. Content guardrails can filter what it says. But neither can enforce the order in which steps are executed. Cryptographic signatures can.
How It Works
Each stage command generates a random nonce, stores its SHA256 hash (mixed with execution context) in the workflow state file, and returns the raw nonce to the AI. To advance to the next stage, the AI must pass this nonce to the transition command, which recomputes the hash and verifies it matches.
The critical property: only the actual script execution can produce a valid nonce. The AI receives the nonce as output, but cannot reverse-engineer or fabricate one. The hash is stored in a managed state file that the AI is structurally prevented from modifying directly.
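A minimal sketch of this scheme using Node's crypto module (the function names, state shape, and hashing details are assumptions for illustration; REAP's actual implementation may differ):

```typescript
import { createHash, randomBytes } from "node:crypto";

// Managed state file: holds only the hash, never the raw nonce.
interface StateFile {
  storedHash?: string;
}

// Stage command: generate a random nonce, persist its context-bound
// hash, and return the raw nonce to the AI as output.
function issueNonce(state: StateFile, genId: string, stage: string): string {
  const nonce = randomBytes(16).toString("hex");
  state.storedHash = createHash("sha256")
    .update(nonce + genId + stage) // bind the nonce to execution context
    .digest("hex");
  return nonce;
}

// Transition command: recompute the hash from the presented nonce and
// compare; consume the stored hash so the nonce is single-use.
function verifyNonce(state: StateFile, genId: string, stage: string, nonce: string): boolean {
  const expected = createHash("sha256").update(nonce + genId + stage).digest("hex");
  const ok = state.storedHash === expected;
  if (ok) delete state.storedHash; // single-use: consume on success
  return ok;
}
```

Because the hash mixes in the generation ID and stage name, a nonce from a different generation or stage fails verification — and because only the hash is stored, a valid nonce can't be reconstructed from the state file.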
What Signature-Based Locking Gives You
Existing approaches each solve a piece of the puzzle: prompts communicate intent, file permissions restrict access, content guardrails filter unsafe output, and deterministic scripts encode logic. But none of them enforce execution sequence — the guarantee that step N actually happened before step N+1.
Signature-Based Locking fills this gap. It doesn't replace the other approaches; it adds the missing dimension. Here's how it stacks up:
Why Signature-Based Locking Works
| Threat | Prompt-only | Signature-Based Locking |
|---|---|---|
| AI skips a stage | Possible | Blocked (no nonce) |
| AI modifies state directly | Possible | Blocked (hash mismatch) |
| AI replays a previous step | Possible | Blocked (context-bound nonce) |
The Hybrid Architecture
Signature-Based Locking is most effective as part of a hybrid architecture that separates deterministic and creative work.
Key principle: The deterministic script handles everything that has a "right answer" — state transitions, gate checks, file validation, hook execution. The AI handles everything that requires creativity — writing code, making design decisions, solving problems.
The scripts communicate with the AI through structured JSON output:
```json
{
  "status": "ok",
  "command": "objective",
  "phase": "complete",
  "message": "Objective stage complete. Advance with: /reap.next a3f8c2d9..."
}
```
The AI receives clear instructions and a nonce. It cannot advance without passing the nonce to the next command. The deterministic script verifies and controls the flow.
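Producing this output is the easy half. A sketch of the script side (field names mirror the example above, but the exact payload format is an assumption):

```typescript
// Emit the structured result a stage script hands back to the AI.
// The nonce is embedded in the follow-up instruction, so the only way
// to obtain a valid one is to actually run the stage.
function stageResult(command: string, nonce: string): string {
  return JSON.stringify(
    {
      status: "ok",
      command,
      phase: "complete",
      message: `${command} stage complete. Advance with: /reap.next ${nonce}`,
    },
    null,
    2,
  );
}
```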
REAP: This Architecture in Practice
This is exactly how REAP (Recursive Evolutionary Autonomous Pipeline) works. REAP is an open-source CLI tool that structures AI-assisted development as an evolutionary process — software evolves across Generations, each carrying one goal through a 5-stage lifecycle.
What REAP Does
- Genome — Your project's design knowledge (architecture decisions, conventions, constraints, business rules) is managed as a living document that evolves across generations
- Lifecycle — Each generation follows: Objective → Planning → Implementation → Validation → Completion
- Signature-Based Locking — Stage transitions require cryptographic nonce verification, preventing the AI from skipping stages or going off-script
- Session Persistence — The Genome and current generation state are automatically injected into the AI's context at session start, solving the "context loss across sessions" problem
- Multi-Agent Support — Works with Claude Code and OpenCode.
The Signature Chain in REAP
```
/reap.start "Build user auth"
  → Script creates generation, stores hash
  → AI receives instructions

/reap.objective
  → AI defines goals, writes artifact
  → Script verifies artifact, generates nonce
  → Message: "Advance with: /reap.next a3f8c2..."

/reap.next a3f8c2...
  → Script verifies SHA256(a3f8c2 + genId + stage) == stored hash
  → ✅ Match → advance to planning
  → ❌ Mismatch → "Token verification failed. Re-run the stage command."

/reap.planning
  → AI creates implementation plan
  → Script generates new nonce
  → Message: "Advance with: /reap.next b7d91e..."

... chain continues through all stages ...
```
Each nonce is single-use, context-bound (includes generation ID and stage name), and cryptographically verified. The AI cannot skip ahead, replay, or forge tokens.
Why It Matters
After building 109 generations with REAP (yes, REAP is built with REAP), we've seen firsthand that prompt-only workflow control breaks down at scale. The AI "optimizes" by skipping validation, modifying state files directly, or calling commands out of order.
Signature-Based Locking eliminated these failure modes — not by adding more rules to the prompt, but by making rule violation mechanically impossible.
Related Work: NeMo Guardrails
It's worth mentioning NVIDIA's NeMo Guardrails, which shares an important philosophy with Signature-Based Locking: don't rely on prompts alone — enforce rules in code.
NeMo Guardrails places a programmable middleware between the user and the LLM. User input is normalized into intents, Colang rules determine whether to call the LLM or return a pre-defined response, and the output is screened against safety policies. This gives developers precise control over what the AI can say — blocking toxic content, preventing jailbreaks, enforcing topic boundaries, and detecting hallucinations through factual grounding checks. It integrates with LangChain and LlamaIndex and supports GPU-accelerated evaluation for production workloads.
This is genuinely valuable for chatbots, customer-facing AI, and any application where content safety matters. The core insight — layering deterministic safety on top of probabilistic LLMs — is sound.
Where the two approaches diverge is the dimension of control:
| | NeMo Guardrails | Signature-Based Locking |
|---|---|---|
| Controls | What the AI says (content) | What order the AI executes steps (sequence) |
| Mechanism | Input/output filtering via policy rules | Cryptographic nonce chain across steps |
| Prevents | Toxic content, jailbreaks, hallucinations | Stage skipping, out-of-order execution, state tampering |
| Best for | Chatbots, customer-facing AI | Multi-step workflows, autonomous agents |
They're complementary, not competing. You could use NeMo Guardrails to ensure the AI doesn't produce unsafe content, and Signature-Based Locking to ensure it follows the correct execution sequence. Different dimensions, same philosophy.
Try It
```shell
npm install -g @c-d-cc/reap
reap init my-project

# Open Claude Code or OpenCode
> /reap.evolve "Implement user authentication"
```
GitHub | Documentation | npm
Have you struggled with keeping AI agents on-script in multi-step workflows? What approaches have you tried? I'd love to hear about your harness engineering experiences in the comments.
References:
- Lost in the Middle: How Language Models Use Long Contexts — Liu et al. (TACL)
- Towards Understanding Sycophancy in Language Models — Anthropic
- Claude Prompting Best Practices — Anthropic
- OpenAI Patches "Lazy" GPT-4 — Futurism
- Specification Gaming in Reasoning Models — arXiv
- Reward Hacking in LLMs — Lilian Weng
- Reward Tampering Research — Anthropic
- Embers of Autoregression — PNAS
- NeMo Guardrails — NVIDIA
- Introduction to NeMo Guardrails — PyTorch Korea
- REAP — GitHub