DEV Community

TengLongAI2026

Anthropic's $200 Experiment: How AI Success Rate Jumped From 20% to 100% With a Harness

Summary

Anthropic ran a controlled experiment. Opus 4.5 on its own cost $9 and succeeded 20% of the time; the same model inside a harness of five subsystems succeeded 100% of the time at a cost of $200. OpenAI reported a comparable gain on a million-line repo after adding a single AGENTS.md file. Stop swapping models. Build your harness first.


The Experiment

| Config | Cost | Success Rate |
| --- | --- | --- |
| Opus 4.5 solo | $9 | 20% |
| Opus 4.5 + Harness | $200 | 100% |

The $191 premium was all verification loops: compile, test, lint, type check.
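A verification loop of this kind can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic's implementation: the `CHECKS` commands, the `generate` callback (standing in for one agent turn), and the `max_rounds` budget are all assumptions made for the example.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable

# Hypothetical check commands; replace with whatever your project uses.
CHECKS = {
    "typecheck": ["npx", "tsc", "--noEmit"],
    "lint": ["npx", "eslint", "."],
    "test": ["npm", "test"],
}


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_check(name: str, command: list[str]) -> CheckResult:
    """Run one verification command and capture its combined output."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True)
    except FileNotFoundError as exc:
        # A missing tool counts as a failed check, not a crash.
        return CheckResult(name, False, str(exc))
    return CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr)


def verify(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run every check and return only the ones that failed."""
    results = [run_check(name, cmd) for name, cmd in checks.items()]
    return [r for r in results if not r.passed]


def harness_loop(
    generate: Callable[[str], None],
    checks: dict[str, list[str]],
    max_rounds: int = 5,
) -> bool:
    """Call generate(feedback) until all checks pass or the budget runs out."""
    feedback = ""
    for _ in range(max_rounds):
        generate(feedback)  # hypothetical agent turn: edits files on disk
        failures = verify(checks)
        if not failures:
            return True
        # Feed the raw tool output back so the next turn can react to it.
        feedback = "\n\n".join(f"[{f.name}]\n{f.output}" for f in failures)
    return False
```

Each extra round costs another model call plus another run of the checks, which is one plausible reading of where the additional spend goes.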


The 5 Harness Subsystems

| Subsystem | What It Prevents |
| --- | --- |
| Instructions | Agent doesn't know project conventions |
| Tools | Unauthorized operations, accidental deletes |
| Environment | "Works on my machine" syndrome |
| State | Cross-session amnesia |
| Feedback | Premature victory declarations |

The 3 Fatal Failure Modes

Premature Victory — Agent writes 500 lines, declares "done", CI goes red.
Fix: A pre-commit hook that runs npx tsc --noEmit.
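As a rough sketch, the hook body could look like the following. The function name `run_hook` and the messages are made up for illustration; only the `npx tsc --noEmit` command comes from the fix above.

```python
import subprocess
from typing import Sequence

# Type-check command from the fix above; add lint or tests as needed.
TYPE_CHECK = ("npx", "tsc", "--noEmit")


def run_hook(command: Sequence[str] = TYPE_CHECK) -> int:
    """Return 0 to allow the commit, non-zero to block it."""
    try:
        proc = subprocess.run(command)
    except FileNotFoundError:
        print(f"pre-commit: {command[0]} not found, blocking commit")
        return 1
    if proc.returncode != 0:
        print("pre-commit: check failed, commit blocked")
    return proc.returncode
```

To use it, save the script as `.git/hooks/pre-commit` with a Python shebang line, make it executable, and end it with `raise SystemExit(run_hook())`. Git aborts the commit whenever the hook exits with a non-zero status.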

Context Amnesia — Agent adds feature but breaks existing one.
Fix: MEMORY.md — read previous state before acting.
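One simple way to wire this up, assuming the agent wrapper can run a little Python at the start and end of each session, is a pair of helpers like these. The entry format (a timestamped heading per note) is an assumption for the example, not a standard.

```python
from datetime import datetime, timezone
from pathlib import Path

MEMORY_FILE = Path("MEMORY.md")  # file name from the fix above


def load_memory(path: Path = MEMORY_FILE) -> str:
    """Return notes from earlier sessions, or an empty string on first run."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


def append_memory(note: str, path: Path = MEMORY_FILE) -> None:
    """Append a timestamped entry so the next session can read it."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"\n## {stamp}\n{note.strip()}\n")
```

The session would call `load_memory()` before the first prompt and `append_memory(...)` with a short summary of what changed and what must not be broken.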

Tool Abuse — Agent runs destructive commands without asking.
Fix: Tool whitelist.
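A whitelist check might look like the sketch below. The `ALLOWED` table is a hypothetical example, and the policy is deliberately conservative: any command containing shell operators is refused outright, even inside quotes.

```python
import shlex
from typing import Optional

# Hypothetical allowlist: program -> permitted first arguments (None = any).
ALLOWED: dict[str, Optional[set[str]]] = {
    "git": {"status", "diff", "add", "commit"},
    "npm": {"test", "run"},
    "npx": {"tsc", "eslint"},
    "ls": None,
}

# Characters that let one command line chain, redirect, or substitute.
SHELL_METACHARS = set(";&|<>$`\n")


def is_allowed(command_line: str) -> bool:
    """Return True only if the command matches the allowlist."""
    if any(ch in SHELL_METACHARS for ch in command_line):
        return False
    try:
        parts = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar parse errors are refused.
        return False
    if not parts:
        return False
    program, args = parts[0], parts[1:]
    if program not in ALLOWED:
        return False
    subcommands = ALLOWED[program]
    if subcommands is None:
        return True
    return bool(args) and args[0] in subcommands
```

With this table, `is_allowed("git status")` passes while `is_allowed("rm -rf /")` and `is_allowed("git status && rm -rf /")` are both refused. Anything refused can be escalated to a human instead of executed.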


OpenAI's Million-Line Confirmation

OpenAI added one AGENTS.md file (under 100 lines) to a million-line repo. The increase in success rate was comparable to Anthropic's findings.


Quick Wins: Build Your Harness Today

| Priority | What | Time |
| --- | --- | --- |
| 🥇 | AGENTS.md in repo root | 30 min |
| 🥇 | Pre-commit CI (tsc/lint/test) | 1 hour |
| 🥈 | MEMORY.md for session state | 20 min |
| 🥈 | Tool whitelist config | 30 min |
| 🥉 | setup.sh for environment | 30 min |
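To track progress on the checklist, a small audit can report which of the named files are still missing. This sketch only covers the three files the table names explicitly; the function name is made up, and the pre-commit and whitelist entries are left out because the table does not say where they live.

```python
from pathlib import Path

# File names taken from the table above.
HARNESS_FILES = ("AGENTS.md", "MEMORY.md", "setup.sh")


def missing_harness_files(repo_root: Path) -> list[str]:
    """List the harness files not yet present in the repo root."""
    return [name for name in HARNESS_FILES if not (repo_root / name).is_file()]
```

Calling `missing_harness_files(Path("."))` from the repo root returns an empty list once all three files exist.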

FAQ

Q: Is $200 a lot for $9 worth of work? A: The $200 run delivers working code every time. The $9 run fails four times out of five.

Q: Does this apply to small projects? A: Yes. Even one file benefits from AGENTS.md + verification.

Q: Does it work with any AI? A: Yes. The pattern is model-agnostic.


The model is the engine. The harness is the steering wheel, brakes, and seatbelt.
