Summary
Anthropic ran a controlled experiment: Opus 4.5 solo ($9) = 20% success. Add a Harness (5 subsystems) = 100% success at $200. OpenAI confirmed with a million-line repo: one AGENTS.md file changed everything. Stop swapping models. Build your harness first.
The Experiment
| Config | Cost | Success Rate |
|---|---|---|
| Opus 4.5 solo | $9 | 20% |
| Opus 4.5 + Harness | $200 | 100% |
The $191 premium was all verification loops: compile, test, lint, type check.
The 5 Harness Subsystems
| Subsystem | What It Prevents |
|---|---|
| Instructions | Agent doesn't know project conventions |
| Tools | Unauthorized operations, accidental deletes |
| Environment | "Works on my machine" syndrome |
| State | Cross-session amnesia |
| Feedback | Premature victory declarations |
The 3 Fatal Failure Modes
Premature Victory — Agent writes 500 lines, declares "done", CI goes red.
Fix: Pre-commit hook: npx tsc --noEmit
Context Amnesia — Agent adds feature but breaks existing one.
Fix: MEMORY.md — read previous state before acting.
Tool Abuse — Agent runs destructive commands without asking.
Fix: Tool whitelist.
OpenAI's Million-Line Confirmation
OpenAI added one AGENTS.md file (<100 lines) to a million-line repo. Success rate increase was comparable to Anthropic's findings.
Quick Wins: Build Your Harness Today
| Priority | What | Time |
|---|---|---|
| 🥇 |
AGENTS.md in repo root |
30 min |
| 🥇 | Pre-commit CI (tsc/lint/test) | 1 hour |
| 🥈 |
MEMORY.md for session state |
20 min |
| 🥈 | Tool whitelist config | 30 min |
| 🥉 |
setup.sh for environment |
30 min |
FAQ
Q: Is $200 a lot for $9 worth of work? A: The $200 run delivers working code. The $9 run delivers nothing.
Q: Does this apply to small projects? A: Yes. Even one file benefits from AGENTS.md + verification.
Q: Does it work with any AI? A: Yes. The pattern is model-agnostic.
The model is the engine. The harness is the steering wheel, brakes, and seatbelt.
Top comments (0)