Writing a Good CLAUDE.md
If you have ever opened a CLAUDE.md written six months ago and immediately wished you hadn't, you already understand the problem. The file is supposed to be the durable memory of your project — the thing your AI agent reads at the start of every session before it touches a single line of code. In practice it tends to drift. Rules get added in moments of frustration, never reviewed. Sections grow into 300-line walls that nobody actually reads. Half the lines are aspirational, half are obsolete, and the agent quietly ignores the difference.
This piece is about what makes a CLAUDE.md actually work: how to structure it, what to put in, what to leave out, and why the shortest CLAUDE.md you can write that still encodes your real constraints will always outperform the long one you keep meaning to clean up.
What a CLAUDE.md is for
The framing that finally clicked for me came from a 2026 conversation where Mitchell Hashimoto, OpenAI, and the LangChain team converged on the same definition: an agent is a model plus a harness. The model is the LLM. The harness is everything that wraps it — tools, state, feedback loops, and the persistent rules it reads at session start. CLAUDE.md is one of the load-bearing pieces of that harness. So are AGENTS.md, .cursor/rules, your CI configuration, your pre-commit hooks, and your .gitignore. They all do the same thing: they tell the agent what world it is operating in, before the agent has to figure that out from scratch.
The reason this framing matters is that it changes what you optimize for. You are not writing a beautifully phrased manifesto. You are writing the boot sequence for an autonomous program. Every line is either load-bearing or it is dead weight. If a rule is not specific enough that the agent can act on it without asking a follow-up question, it will be ignored — and worse, the agent will quietly invent its own interpretation. Vague rules are not "safe defaults." They are silent licenses to drift.
What "good" looks like
A good CLAUDE.md has three properties, in this order: it is correct, it is readable in one session, and it is enforced by the harness around it.
Correctness comes first. If the file says "use bun" but your CI uses npm, the file is wrong. If the file says "all changes go through PR" but you push to main yourself when you are in a hurry, the file is wrong. Wrong rules are not just neutral — they actively train your agent to ignore the file, because the agent will see the contradiction with the actual project state and learn that CLAUDE.md is unreliable. The fastest way to make your agent stop following CLAUDE.md is to leave one obviously broken rule in it for two weeks.
Readability is the second constraint, and it is brutal. The agent reads CLAUDE.md at the start of every session, and the cost of reading is paid in context window. Every line you add to CLAUDE.md is a line that competes with the actual code, the actual diff, and the actual question you are trying to answer. A 600-line CLAUDE.md is a luxury you can rarely afford. A 150-line CLAUDE.md that encodes the same constraints is almost always achievable if you are willing to delete.
Enforcement is the third leg, and the one most people skip. A rule that lives only in CLAUDE.md is a wish. A rule that lives in CLAUDE.md and a Husky pre-commit hook, a GitHub Actions check, or a linter step in your CI pipeline is a fact. The agent will follow rules that are also enforced by the harness, because those rules have teeth — the agent learns very quickly that pushing code with console.log left in fails the pre-push check, and adjusts. Rules that exist only in prose are advisory. Use both layers.
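To make the enforcement layer concrete, here is a minimal sketch of the kind of check a pre-push hook could run. The function name is hypothetical, and a real Husky hook would feed it the staged diff rather than a literal string:

```typescript
// Sketch of the "teeth" behind a CLAUDE.md rule: find every line in a
// source string that contains a console.log call, so a hook can reject
// the push when the list is non-empty. Illustrative, not a real hook API.
function findConsoleLogLines(source: string): number[] {
  return source
    .split("\n")
    .map((line, i) => (/\bconsole\.log\(/.test(line) ? i + 1 : -1))
    .filter((n) => n !== -1);
}

const offending = findConsoleLogLines("const x = 1;\nconsole.log(x);\n");
// offending === [2]
```

A pre-push script would run this over the output of `git diff` and exit non-zero on a non-empty result — which is exactly the moment the prose rule in CLAUDE.md becomes a fact the agent can observe.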
The sections every CLAUDE.md needs
Across about a hundred CLAUDE.md files I've audited, the same five sections appear in every file that actually works. Names vary; the contents do not.
The first is what this project is — one paragraph, no marketing. What the codebase does, who it is for, what stack it runs on. The agent needs this to disambiguate similar-sounding files later.
The second is how we work — the small set of process rules that are non-negotiable. Branching strategy, commit conventions, PR policy, testing expectations. Six to ten lines. This is where the agent learns whether it can push directly, whether tests are mandatory, whether commits need to be atomic.
The third is language and style — what languages this project uses, what frameworks are off-limits, what naming conventions to follow. This is also where you encode the language of communication: do you want the agent to talk to you in English or in Chinese, tersely or verbosely? Pick one and write it down.
The fourth is operational notes — the surprising facts about your environment. The path that looks normal but is actually a symlink. The CI check that fails on Windows. The token that expires every Tuesday. This is the section that saves you most often, because it captures the implicit knowledge a new contributor would otherwise have to discover by breaking something.
The fifth is principles — the small set of design rules you want the agent to internalize. Not "write good code." Specific principles like "don't add error handling for cases that can't happen" or "never modify lockfiles by hand." Three to seven principles, each two sentences max. If you have twenty, you have none.
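Put together, the five sections sketch out like this. Every line below is a placeholder to replace with your project's real facts, and the section names are suggestions, not a standard:

```markdown
## What this project is
One paragraph: what the codebase does, who it is for, what stack it runs on.

## How we work
- Six to ten non-negotiable process rules: branching, commits, PRs, tests.

## Language and style
- Languages used, frameworks that are off-limits, naming conventions.
- The language the agent should use to communicate with you.

## Operational notes
- The surprising facts: symlinked paths, flaky CI checks, expiring tokens.

## Principles
- Three to seven specific, falsifiable design rules. Two sentences max each.
```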
What goes wrong
The most common failure I see is vagueness. A rule like "follow best practices" or "be careful with errors" is a wishful sticky note. The agent cannot act on it. Replace it with a specific, falsifiable rule: "wrap every external API call in a timeout of 5 seconds and retry once on transient errors." Now the agent knows exactly what to do, and you know how to verify it.
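The falsifiable version of that rule can be sketched directly. Two assumptions to flag: the helper name is invented, and "transient" is simplified here to "any thrown error", which a real project would tighten:

```typescript
// Sketch of the rule "wrap every external API call in a 5-second timeout
// and retry once on transient errors". Helper name and the treatment of
// every thrown error as transient are illustrative assumptions.
async function callWithTimeout<T>(
  call: (signal: AbortSignal) => Promise<T>,
  timeoutMs = 5000,
): Promise<T> {
  const attempt = async (): Promise<T> => {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await call(controller.signal);
    } finally {
      clearTimeout(timer); // always clean up, success or failure
    }
  };
  try {
    return await attempt();
  } catch {
    return await attempt(); // retry exactly once
  }
}

// Usage: every external call goes through the wrapper.
// const res = await callWithTimeout((signal) =>
//   fetch("https://api.example.com/items", { signal }));
```

Once the rule has this shape, a reviewer — or a lint check — can grep for bare `fetch(` calls and flag any that bypass the wrapper.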
The second is contradiction. Two rules that disagree, often added months apart, sit in the same file. "All commits go through PR" and "you can push directly to main for typos" cannot both be true; the agent will pick one and ignore the other. AgentLint flags this kind of contradiction as a hard rule, because contradictions train the agent to treat the file as advisory.
The third is stale references. The file mentions a script that was deleted, a service that was renamed, a directory that was moved. Each stale reference is a small lie the agent has to discover and work around. Run a grep before every CLAUDE.md edit: every path, every command, every URL still resolves.
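That grep can live in the harness too. A minimal sketch, assuming you already have the list of paths the file mentions (extracting them — regex over backticks, a markdown parser — is left out here):

```typescript
import { existsSync } from "node:fs";

// Hypothetical stale-reference check: given the paths a CLAUDE.md
// mentions, return the ones that no longer resolve on disk.
function findStalePaths(referencedPaths: string[]): string[] {
  return referencedPaths.filter((p) => !existsSync(p));
}

// Run in CI against the paths your CLAUDE.md names, so that deleting
// scripts/deploy.sh without updating the file fails the build.
const stale = findStalePaths(["package.json", "scripts/deploy.sh"]);
```

The same idea extends to commands (`which`) and URLs (a HEAD request), but paths catch the most common lies.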
The fourth is the unbounded section. Someone starts a "Principles" list with five items. Six months later it has thirty. Half are duplicates of each other. None are enforced. Hard cap your principle count and treat it as a permanent budget — to add a new one, retire an old one.
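The budget is easy to enforce mechanically. A sketch, assuming the section is headed `## Principles` and uses dash bullets — both are assumptions about your file's conventions, not a standard:

```typescript
// Hypothetical budget check: count the bullets under "## Principles" so a
// CI step can fail when the hard cap (seven, per the text) is exceeded.
function countPrinciples(md: string): number {
  const lines = md.split("\n");
  const start = lines.findIndex((l) => l.trim() === "## Principles");
  if (start === -1) return 0;
  let count = 0;
  for (const line of lines.slice(start + 1)) {
    if (line.startsWith("## ")) break; // next section ends the count
    if (line.trim().startsWith("- ")) count++;
  }
  return count;
}

const withinBudget =
  countPrinciples("## Principles\n- a\n- b\n## Other\n- c") <= 7;
```

A check like this turns "treat the list as a permanent budget" from a resolution into a gate.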
The fifth is language drift. The file mixes English process language with Chinese commit messages with copy-pasted Slack threads. Pick one language for code and rules. Pick another (or the same) for communication. Be consistent. The agent's parser does not care, but a human reviewer will, and the inconsistency is a tell that nobody is curating the file.
A worked example
Here is a before/after I use in workshops. The "before" is a real CLAUDE.md from a real project, lightly anonymized:
```markdown
## How we work
- We try to write good code
- Tests are important
- Use TypeScript when possible
- Don't break things
- All changes should be reviewed
- Commit often
```
Six bullets, zero falsifiable rules. "Try" is not a commitment. "Important" is not a behavior. "When possible" is the agent's permission slip to skip TypeScript. "Don't break things" is unactionable. "Should" is advisory.
The "after," same project, same intent:
```markdown
## How we work
- All TypeScript. No new JavaScript files. Existing JS files migrate when touched.
- Vitest for unit, Playwright for integration. Both must pass before merge.
- Branch from main, open PR, squash-merge. No direct pushes to main.
- One logical change per commit. If you can't summarize the diff in one sentence, split it.
- Pre-push hook runs lint + typecheck + unit tests. Don't disable it.
```
Same intent, one line shorter. Now the agent can act on every rule, and you can write a check that detects violations.
Why the harness needs the linter
The reason we built AgentLint is that even careful authors drift. You write a good CLAUDE.md on Monday, ship a feature on Tuesday, add three rules in a frustrated session on Wednesday, and by Friday the file has a contradiction you didn't notice. The 33 checks AgentLint ships are the failure modes I kept seeing across audits — vague rules, contradictions, stale references, unbounded sections, language drift, missing enforcement, unverifiable claims. Each check cites the primary source it is checking against, so when the linter flags something, you can trace exactly why. The file does not need to be perfect. It needs to be not silently broken, and that is a checkable property.
If you are starting a CLAUDE.md from scratch, write the five sections above, keep it under 200 lines, and run AgentLint against it. If you have an existing one that has drifted, run AgentLint first and fix what it flags. The point is not to get a clean report — the point is to make sure the most expensive rule in the file, the one your agent reads every session, is actually saying what you want.
Originally posted on agentlint.app/blog/writing-a-good-claude-md.