A recent post argued you don't need a CLAUDE.md. The author makes fair points: the file isn't magic, it's not a silver bullet, and yes — Claude Code can ship working code without one. So we put the claim under the microscope. We ran twelve real projects (six Go services, three Next.js apps, two Python data pipelines, one Rust CLI) without a CLAUDE.md and tracked every place the agent stumbled.
This isn't a "you must have one" sermon. It's the failure log.
What we measured
For each project we ran a fixed set of tasks: add a feature, fix a bug, write tests, refactor a module, onboard a new sub-agent. We logged every retry, every wrong assumption, and every time we had to re-explain something we'd already explained. We then re-ran the same tasks with a CLAUDE.md and compared.
Anthropic's own published numbers say only 0–20% of agentic tasks are fully delegable today. Our interest was: where exactly does that gap live, and does CLAUDE.md actually close it?
Failure mode 1: Context drift inside a single session
In four of the Go services, the agent picked an idiomatic-but-wrong approach: database/sql calls when the project standard was sqlc, log/slog when the codebase was already on zerolog. Without a CLAUDE.md note ("we use sqlc; do not introduce raw SQL drivers"), the agent reads three files, infers a pattern, and then drifts when it touches a fourth file that uses a different convention.
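A single L2-grade rule is usually enough to pin the convention. The snippet below is illustrative, not lifted from any of the twelve repos:

```markdown
## Conventions
- Database access goes through sqlc-generated code. Do not add raw
  database/sql calls or hand-written SQL strings.
- Logging uses zerolog. Do not introduce log/slog or fmt.Print* for logging.
```

Two lines, and the agent stops inferring the convention from whichever files it happens to read first.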
You can correct it once. The cost compounds when you correct it for every PR.
Failure mode 2: Repeated mistakes across sessions
Memory between Claude Code sessions is shallow by design. In two Next.js projects, the agent re-introduced the same anti-pattern (mixing server actions with client useEffect fetches) on three separate days, because nothing in the repo flagged it as banned. Each time we caught it in review. Each time we wrote the same correction.
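The fix is to turn the review comment into a repo rule the day you write it the first time. A hypothetical entry for the Next.js case might read:

```markdown
## Data fetching (Next.js)
- Data is loaded in server components or server actions.
- Do not add client-side useEffect fetches for data a server component
  can load. If a client fetch seems unavoidable, flag it in the PR
  description instead of shipping it silently.
```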
This is the real economic argument for CLAUDE.md: you pay the cost of writing the rule once and stop paying the cost of re-explaining it.
Failure mode 3: Multi-agent inconsistency
Six of the twelve projects ran two or more sub-agents in parallel (review agent, test-writer, refactor agent). Without a shared rules file, the agents fought each other: the test-writer wrote table-driven tests, the refactor agent rewrote them as t.Run subtests, the review agent flagged both as "inconsistent."
CLAUDE.md is the one file every sub-agent loads before it starts. No file means no shared ground truth.

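A shared testing rule resolves exactly the conflict above. This is a sketch of what such a rule could look like, not the text we actually shipped:

```markdown
## Testing
- Go tests are table-driven, with one t.Run subtest per table row
  (table-driven and t.Run are complementary, not alternatives).
- Sub-agents: do not restructure existing tests into a different shape.
  If a test violates this rule, flag it in review instead of rewriting it.
```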
Failure mode 4: Onboarding friction for humans
The most under-appreciated failure. New contributors (human ones) opening a CLAUDE.md-less repo had no fast path to "what does this team value." They reverse-engineered conventions from the code, got it half-right, and shipped PRs that needed three rounds of nitpicks. With CLAUDE.md present, the same contributors shipped clean PRs on the first try because they read the rules first.
CLAUDE.md doubles as an onboarding doc. That alone pays for the 30 minutes it takes to write.
A maturity model that actually helps
Most "do you need CLAUDE.md" debates collapse because people are arguing about different versions of the same artifact. Here's the version that helped us think clearly:
- L0 — No file. Fine for prototypes, throwaway scripts, solo demos. Don't bother.
- L1 — Basic. Five to ten lines. Stack, package manager, test command, lint command. Solves the "what tooling does this repo use" problem. Takes 10 minutes.
- L2 — Conventional. Adds idioms ("we use sqlc, not raw SQL"), forbidden patterns, file layout rules, and one example of a "good" PR. Solves drift and repeated mistakes.
- L3 — Team-grade. Adds review checklists, security rules, sub-agent instructions, escalation paths. Solves multi-agent inconsistency and onboarding friction.
Most projects sit at L0 by accident and feel L2-grade pain. The trick is to write L1 the day you start the repo, then add an L2 rule the second time you correct the same mistake. Don't write L3 upfront — it ages badly.
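For scale, here is what an L1 file might look like for one of the Go services. Every name and version below is a placeholder, not taken from the actual test repos:

```markdown
# CLAUDE.md
Stack: Go 1.22, Postgres, sqlc
Dependencies: standard Go modules (go.mod)
Test: go test ./...
Lint: golangci-lint run
Layout: cmd/ for binaries, internal/ for packages
```

Ten minutes of typing, and the "what tooling does this repo use" questions disappear from the transcript.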
When you genuinely don't need one
To be fair: the original critic is right about a real category. Solo experiments, single-file scripts, learning projects, anything where the cost of writing rules exceeds the cost of correcting the agent twice. If you're shipping a CRUD demo over a weekend, skip it.
The argument isn't "always write CLAUDE.md." It's "know the failure modes you're accepting when you don't."
What actually changed in our 12 projects
After adding L2-grade CLAUDE.md files:
- Re-correction rate (same mistake, different session) dropped from 34% of PRs to 6%.
- Sub-agent disagreement dropped from "every multi-agent run" to "rare."
- Onboarding time for new contributors halved.
Not magic. Just a rules file doing what rules files have always done — codifying tribal knowledge so it scales beyond the people who already know it.
TL;DR
CLAUDE.md isn't required. It's leverage. The 0–20% delegability ceiling Anthropic publishes is real, and a well-written rules file is one of the few cheap ways to push the number up. Skip it on prototypes. Write it the moment you correct the same mistake twice.
If you want a head start: our Solo Pack ships 13 battle-tested rules you can paste into a new CLAUDE.md in 5 minutes. They're the rules we extracted from the 12-project test above. → oliviacraftlat.gumroad.com/l/skdgt