$200 vs $9 — The Anthropic Experiment That Proves Infrastructure > Model Choice

#productivity #devops #ai

TL;DR — Anthropic ran a controlled experiment comparing AI coding outcomes:

Setup	Cost	Success Rate
Bare Opus 4.5	$9	20%
+ Harness Engineering	$200	100%

The extra $191 was spent on verification loops — compile, test, lint, type-check, repeat. OpenAI ran the same experiment on a million-line codebase and got identical results.
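To make "verification loop" concrete, here is a minimal Python sketch of the compile, test, lint, type-check, repeat cycle. It is not the harness used in the experiment. The names `first_failure`, `verify_loop`, and the `attempt_fix` callback are hypothetical, and the commands to run are whatever your project already uses.

```python
import subprocess
import sys
from typing import Callable, Optional, Sequence

Command = Sequence[str]


def first_failure(checks: Sequence[Command]) -> Optional[Command]:
    """Run each check in order; return the first one that exits non-zero."""
    for cmd in checks:
        if subprocess.run(cmd, capture_output=True).returncode != 0:
            return cmd
    return None


def verify_loop(
    checks: Sequence[Command],
    attempt_fix: Callable[[Command], None],
    max_rounds: int = 5,
) -> bool:
    """Repeat check -> fix until every check passes or the rounds run out."""
    for _ in range(max_rounds):
        failed = first_failure(checks)
        if failed is None:
            return True
        # Hypothetical hook: hand the failing command back to the agent.
        attempt_fix(failed)
    # One last verification after the final fix attempt.
    return first_failure(checks) is None
```

Each extra round costs more tokens and compute, which is where a larger bill would come from in a setup like this.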

The 5 Subsystems

Instructions (AGENTS.md) — Tell the AI project conventions
Tools (settings.json allowlist) — Prevent file modification mistakes
Environment (setup.sh/Dockerfile) — "It works on my machine" solved
State (MEMORY.md/PROGRESS.md) — Cross-session context
Feedback (CI/lint/type-check) — Catch AI mistakes early
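A quick way to see which of these a repository already has is to look for the files named above. This is a rough sketch: checking the repository root is an assumption, `missing_subsystems` is a hypothetical helper, and Feedback is skipped because the list does not tie it to a single file name.

```python
from pathlib import Path

# File names come from the list above. Looking for them at the repository
# root is an assumption; adjust the paths to match your project layout.
SUBSYSTEM_FILES = {
    "Instructions": ["AGENTS.md"],
    "Tools": ["settings.json"],
    "Environment": ["setup.sh", "Dockerfile"],
    "State": ["MEMORY.md", "PROGRESS.md"],
}


def missing_subsystems(repo_root: str) -> list[str]:
    """Return the subsystems for which none of the expected files exist."""
    root = Path(repo_root)
    return [
        name
        for name, candidates in SUBSYSTEM_FILES.items()
        if not any((root / filename).is_file() for filename in candidates)
    ]
```

Calling `missing_subsystems(".")` from the project root returns the names still to be set up, in the order listed above.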

3 Things Today

Add AGENTS.md (30 min) — OpenAI saw improvement from just this file
Pre-commit hooks (1 hour) — npx tsc --noEmit + npm test
Write MEMORY.md (20 min) — Track cross-session context
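For the pre-commit step, Git runs any executable saved as `.git/hooks/pre-commit` and aborts the commit if it exits non-zero. The sketch below wraps the two commands from the list in Python; `run_checks` and its `runner` parameter are hypothetical names, and it assumes `npx` and `npm` are on your PATH.

```python
#!/usr/bin/env python3
"""Git pre-commit hook sketch: block the commit if type-check or tests fail."""
import subprocess
import sys

CHECKS = [
    ["npx", "tsc", "--noEmit"],  # type-check only, write no output files
    ["npm", "test"],
]


def run_checks(checks, runner=subprocess.call):
    """Run checks in order; return the first non-zero exit code, else 0."""
    for cmd in checks:
        code = runner(cmd)
        if code != 0:
            print(f"pre-commit: {' '.join(cmd)} failed", file=sys.stderr)
            return code
    return 0

# In the installed hook file, finish with:
#     sys.exit(run_checks(CHECKS))
```

Save it as `.git/hooks/pre-commit`, add the final `sys.exit(...)` line, and mark the file executable. The `runner` parameter exists only so the logic can be exercised without actually invoking npm.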

The Bottom Line

The model is the engine. Harness is the steering wheel, brakes, and seatbelt.

Factor	Impact
Model version	~20%
Harness engineering	~80%
Cost multiplier	22x
Success rate delta	20% → 100% (+80 points)
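The cost multiplier and the success-rate gain follow directly from the TL;DR table. A quick check of the arithmetic, using only the figures quoted there:

```python
bare_cost, harness_cost = 9, 200      # USD, from the TL;DR table
bare_rate, harness_rate = 0.20, 1.00  # success rates, from the TL;DR table

extra_cost = harness_cost - bare_cost                  # 191
cost_multiplier = harness_cost / bare_cost             # about 22.2
gain_points = round((harness_rate - bare_rate) * 100)  # 80 percentage points
```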

Found this useful? Follow me on Dev.to for daily AI engineering discoveries.
