DEV Community

TengLongAI2026
TengLongAI2026

Posted on

$200 vs $9 — The Anthropic Experiment That Proves Infrastructure > Model Choice

TL;DR — Anthropic ran a controlled experiment comparing AI coding outcomes:

Setup Cost Success Rate
Bare Opus 4.5 $9 20%
+ Harness Engineering $200 100%

The extra $191 was spent on verification loops — compile, test, lint, type-check, repeat. OpenAI ran the same experiment on a million-line codebase and got identical results.

The 5 Subsystems

  1. Instructions (AGENTS.md) — Tell the AI project conventions
  2. Tools (settings.json allowlist) — Prevent file modification mistakes
  3. Environment (setup.sh/Dockerfile) — "It works on my machine" solved
  4. State (MEMORY.md/PROGRESS.md) — Cross-session context
  5. Feedback (CI/lint/type-check) — Catch AI mistakes early

3 Things Today

  1. Add AGENTS.md (30 min) — OpenAI saw improvement from just this file
  2. Pre-commit hooks (1 hour) — npx tsc --noEmit + npm test
  3. Write MEMORY.md (20 min) — Track cross-session context

The Bottom Line

The model is the engine. Harness is the steering wheel, brakes, and seatbelt.

Factor Impact
Model version ~20%
Harness engineering ~80%
Cost multiplier 22x
Success rate delta 80% → 100%

Found this useful? Follow me on Dev.to for daily AI engineering discoveries.

Top comments (0)