TL;DR — Anthropic ran a controlled experiment comparing AI coding outcomes:
| Setup | Cost | Success Rate |
|---|---|---|
| Bare Opus 4.5 | $9 | 20% |
| + Harness Engineering | $200 | 100% |
The extra $191 was spent on verification loops — compile, test, lint, type-check, repeat. OpenAI ran the same experiment on a million-line codebase and got identical results.
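The compile, test, lint, type-check, repeat cycle can be sketched roughly as follows. This is a minimal illustration, not the harness Anthropic or OpenAI actually used: `Check`, `LoopResult`, `verifyLoop`, and the `fix` callback (a stand-in for asking the model to repair its work) are all hypothetical names, and in a real project each `run` would shell out to the compiler, test runner, linter, or type checker.

```typescript
// Hypothetical sketch of a verification loop: run every check, hand any
// failures back to a fixer (standing in for the model), and repeat.

type Check = {
  name: string;        // e.g. "compile", "test", "lint", "type-check"
  run: () => boolean;  // returns true when the check passes
};

type LoopResult = {
  passed: boolean;     // true if every check passed within the round budget
  rounds: number;      // number of verification rounds that were run
  failures: string[];  // names of the checks still failing at the end
};

function verifyLoop(
  checks: Check[],
  fix: (failing: string[]) => void,
  maxRounds: number,
): LoopResult {
  let failures: string[] = [];
  for (let round = 1; round <= maxRounds; round++) {
    // Run all checks so the fixer sees every failure at once.
    failures = checks.filter((c) => !c.run()).map((c) => c.name);
    if (failures.length === 0) {
      return { passed: true, rounds: round, failures };
    }
    // Only ask for a fix if another verification round remains.
    if (round < maxRounds) {
      fix(failures);
    }
  }
  return { passed: false, rounds: maxRounds, failures };
}
```

Each extra round costs more model calls and more tool runs, which is where the added spend in the table above goes. The `maxRounds` cap is an assumption of this sketch that keeps a task which never converges from running indefinitely.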
The 5 Subsystems
- Instructions (AGENTS.md): tell the AI the project's conventions
- Tools (settings.json allowlist): prevent unintended file modifications
- Environment (setup.sh/Dockerfile): eliminate "it works on my machine" failures
- State (MEMORY.md/PROGRESS.md): preserve context across sessions
- Feedback (CI/lint/type-check): catch AI mistakes early
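To make the Tools subsystem concrete, here is one way an allowlist check for file writes might look. It is a sketch under assumptions: it does not reproduce the actual settings.json schema of any agent, and `normalizeSegments`, `isWriteAllowed`, and the example directory names are hypothetical.

```typescript
// Hypothetical allowlist check for file writes. This is NOT the real
// settings.json format of any tool, only an illustration of the idea.

// Splits a relative POSIX-style path into segments, resolving "." and "..".
// Returns null if the path is absolute or climbs above the project root.
function normalizeSegments(path: string): string[] | null {
  if (path.startsWith("/")) return null;
  const out: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (out.length === 0) return null;
      out.pop();
    } else {
      out.push(seg);
    }
  }
  return out;
}

// True when `path` resolves to a location at or inside an allowed directory.
function isWriteAllowed(path: string, allowedDirs: string[]): boolean {
  const target = normalizeSegments(path);
  if (target === null) return false;
  return allowedDirs.some((dir) => {
    const prefix = normalizeSegments(dir);
    if (prefix === null || prefix.length === 0) return false;
    return (
      prefix.length <= target.length &&
      prefix.every((seg, i) => seg === target[i])
    );
  });
}
```

Comparing whole path segments rather than raw string prefixes means an allowlist entry of `src` does not accidentally permit `srcs/`, and resolving `..` first stops a path such as `src/../package.json` from slipping through.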
3 Things Today
- Add AGENTS.md (30 min): OpenAI saw improvement from just this file
- Add pre-commit hooks (1 hour): npx tsc --noEmit && npm test
- Write MEMORY.md (20 min): track cross-session context
The Bottom Line
The model is the engine. The harness is the steering wheel, brakes, and seatbelt.
| Factor | Impact |
|---|---|
| Model version | ~20% |
| Harness engineering | ~80% |
| Cost multiplier | ~22x ($9 → $200) |
| Success rate delta | 20% → 100% |