Yanli Liu's "The 4 Lines Every CLAUDE.md Needs" makes a real point. The 4 lines, derived from Andrej Karpathy's January 2026 thread on agent failure modes, all express the same insight: behavioral rules outperform feature rules. Don't assume, and surface tradeoffs. Write the minimum code that solves the problem. Touch only what you must. Define success criteria, and loop until verified. Each line is portable across stacks and tasks, whereas prescriptive rules go stale the moment your codebase shifts.
The 4 lines are the floor of a working CLAUDE.md. They are not the ceiling. Most of the CLAUDE.md files I see in the wild — including the ones the article holds up as cautionary tales of "47 rules about code style" — fail because they treat a file as the unit of organization. A production CLAUDE.md is an architecture, not a file.
What the article gets right and what to flag
The behavioral-vs-prescriptive distinction is correct, and the Configuration Paradox is real: past a threshold, more rules produce confused agents, not disciplined ones. Liu's litmus test — would removing this cause a mistake the agent couldn't recover from? — is the right filter for any individual rule.
A few things in the piece do not hold up under inspection. The asserted 6,000 / 12,000 character caps for CLAUDE.md have no source I can verify. The "/plugin marketplace add" command described in the article is not part of base Claude Code. The 94% accuracy stat the piece borrows from another blog has no disclosed methodology. And the "60,000 GitHub stars" figure cited as evidence of Claude Code adoption is unverified. Cite the article for the framing. Do not cite it for the numbers.
The 4 lines do not stand alone for long
Behavioral rules are the right starting point. They are also incomplete the moment you have a real project. You quickly need three other things the 4 lines do not give you:
- Domain context the agent cannot infer from files — what each service does, why a directory is named the way it is, which APIs are read-only vs. write-side, where secrets live.
- Architecture decisions — patterns the agent shouldn't have to re-derive on every task.
- Incident-driven rules — the corrections that came out of specific failures, with enough context that the rule is unambiguous.
If you put all three of these into one CLAUDE.md, you get the 47-rule sprawl Liu warns against. If you leave them out, the agent guesses and the 4 lines do not help: "don't assume" is a behavior, not a fact.
The fix is structural. Stop accumulating rules in one file. Start delegating them to files with single jobs.
What the architecture looks like in practice
NEXUS — my Claude Code operating layer — runs about 237 lines of CLAUDE.md. That file holds behavioral guardrails and protocols, and almost nothing else. The first two protocols there are Verify Before Reporting and Plan First, Code Second. Both are extensions of the same behavioral category Liu names. Adding fourteen more behavioral protocols at the same level still does not approach 47 rules of code style — they are the same shape as the 4 lines, just covering more failure modes.
What CLAUDE.md does not contain is the project-specific stuff. That lives in delegated files:
- MEMORY.md holds 21 numbered, dated, append-only Hard-Won Lessons. Each one came from a specific incident, and each entry records the cost of getting it wrong. "LaunchAgent log paths must be on local disk, not SMB" (lesson #15) is in there because six of my LaunchAgents silently broke on 2026-04-19 when the path was on a NAS mount. The agent reads MEMORY.md at session start. I wrote about the Mistakes Become Rules pattern last week.
- .claude/rules/ holds language-specific and capability-specific rule files: python.md for Python work, completeness.md for "what counts as done." Each file gets loaded when the agent enters that context, not on every session.
- agents/<domain>-context.md holds per-system context: finance, content, and the DeFi system before it was retired. CLAUDE.md's session-startup protocol tells the agent that if a specific domain is in play, it should read the relevant agents/<domain>-context.md. The agent doesn't load all of them up front. It loads the one that matters.
- SESSION-STATE.md holds ephemeral active context: what's in flight, what was decided yesterday, where to pick up. It is the first thing rewritten when a major task closes.
That is the architecture. Behavioral guardrails at the top, in one shared file. Project-, domain-, and incident-specific rules delegated to files with one trigger condition each. The agent reads what's relevant.
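To make the dispatcher idea concrete, here is a minimal Python sketch of the trigger logic, assuming the file layout above. It illustrates which files a session would read; it is not how Claude Code itself loads context, which happens through CLAUDE.md's natural-language startup protocol. The function name files_to_load, the always-loaded set (including SESSION-STATE.md), and the load order are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

# Read on every session. Treating SESSION-STATE.md as always-loaded is an assumption.
ALWAYS = ["CLAUDE.md", "MEMORY.md", "SESSION-STATE.md"]


def files_to_load(root: Path, domain: Optional[str] = None,
                  language: Optional[str] = None,
                  capabilities: Tuple[str, ...] = ()) -> List[Path]:
    """Return this session's context files, skipping any that don't exist.

    Hypothetical dispatcher: the floor is always read; a domain context,
    language rule file, or capability rule file is added only when its
    trigger condition applies.
    """
    candidates = [root / name for name in ALWAYS]
    if domain:
        candidates.append(root / "agents" / f"{domain}-context.md")
    if language:
        candidates.append(root / ".claude" / "rules" / f"{language}.md")
    for cap in capabilities:  # e.g. "completeness"
        candidates.append(root / ".claude" / "rules" / f"{cap}.md")
    return [p for p in candidates if p.is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical project: two domain contexts exist, only one is in play.
        for rel in ["CLAUDE.md", "MEMORY.md", "agents/finance-context.md",
                    "agents/content-context.md", ".claude/rules/python.md"]:
            path = root / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("", encoding="utf-8")
        loaded = files_to_load(root, domain="finance", language="python")
        print([p.relative_to(root).as_posix() for p in loaded])
        # → ['CLAUDE.md', 'MEMORY.md', 'agents/finance-context.md', '.claude/rules/python.md']
```

The content context exists on disk but never enters the working set, which is the point: presence in the repo is not the same as presence in the session.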
The structural version of Liu's litmus test
Liu's test ("would removing this cause a mistake the agent couldn't recover from?") is the right filter for an individual rule. The structural question is different: does this rule belong here, or in a delegated file?
Three quick filters answer that:
- If it changes per-project, it does not belong in CLAUDE.md. Put it in agents/<domain>-context.md or a project-specific file.
- If it changes per-language or per-tool, it does not belong in CLAUDE.md. Put it in .claude/rules/<language>.md.
- If it came from a real incident with a date and a cost, it does not belong in CLAUDE.md either. Put it in MEMORY.md's Hard-Won Lessons.
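The three filters amount to a small routing function. Here is a minimal sketch, assuming a hypothetical Rule record with fields for domain, language, and incident metadata. When a rule matches more than one filter, checking them in the order listed is an assumption of this sketch; the filters themselves don't rank each other.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Rule:
    """A candidate rule. All field names here are hypothetical."""
    text: str
    domain: Optional[str] = None          # set if the rule changes per project
    language: Optional[str] = None        # set if it changes per language or tool
    incident_date: Optional[date] = None  # set if it came from a real incident...
    incident_cost: Optional[str] = None   # ...together with what it cost


def route(rule: Rule) -> str:
    """Return the file a rule belongs in, checking the three filters in order."""
    if rule.domain:
        return f"agents/{rule.domain}-context.md"
    if rule.language:
        return f".claude/rules/{rule.language}.md"
    if rule.incident_date and rule.incident_cost:
        return "MEMORY.md"  # appended to the Hard-Won Lessons
    # Behavioral, portable, load-bearing: the only kind that stays in the floor.
    return "CLAUDE.md"


lesson = Rule("LaunchAgent log paths must be on local disk, not SMB.",
              incident_date=date(2026, 4, 19),
              incident_cost="six LaunchAgents silently broke")
print(route(lesson))                            # → MEMORY.md
print(route(Rule("Verify before reporting.")))  # → CLAUDE.md
```

The fallthrough is the useful part: CLAUDE.md is whatever survives all three filters, not the default dumping ground.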
What's left in CLAUDE.md is the part that's behavioral, portable, and load-bearing. That tends to be a few dozen entries — bigger than 4, smaller than 47. Each entry is one short paragraph.
Why this scales
Two reasons. First, every file has one update protocol. Hard-Won Lessons are append-only and triggered by corrections. Domain contexts get rewritten when systems change. Behavioral protocols change rarely, and when they do, the change applies everywhere. Mixing them in one file forces every edit to sit next to every other edit, which is how you end up with the 47-rule mess.
Second, the agent's working set at any decision point is smaller. A CLAUDE.md sized for the worst case is a CLAUDE.md the agent has to re-read every time. A CLAUDE.md sized for the always-true case, with delegated files for the contextual case, is one the agent can keep in context, loading the rest only when the work demands it. This is the same logic I applied to supervision artifacts in yesterday's Faye reframe: institutional memory belongs in files with single owners and lifecycles, not in one file with many.
Anthropic's Building Effective Agents framing draws the same line at the workflow level — predefined paths for the deterministic part, agent autonomy at the seams. The same shape applies to CLAUDE.md. The behavioral floor is the predefined part. The delegated files are the seams.
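The one-update-protocol-per-file point is easiest to see with the Hard-Won Lessons file. Below is a minimal sketch of an append-only writer that numbers and dates each entry, assuming a hypothetical "### Lesson #N (date)" heading format; MEMORY.md's actual layout isn't shown here, and append_lesson is an illustrative name, not an existing tool.

```python
import re
from datetime import date
from pathlib import Path

# Hypothetical entry heading, e.g. "### Lesson #15 (2026-04-19)".
HEADER = re.compile(r"^### Lesson #(\d+)", re.MULTILINE)


def append_lesson(path: Path, rule: str, cost: str, when: date) -> int:
    """Append the next numbered, dated lesson and return its number.

    Existing entries are never touched: the file is only ever opened in
    append mode, so a correction can add a lesson but not rewrite one.
    """
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    numbers = [int(n) for n in HEADER.findall(existing)]
    next_no = max(numbers, default=0) + 1
    entry = (
        f"\n### Lesson #{next_no} ({when.isoformat()})\n"
        f"Rule: {rule}\n"
        f"Cost of getting it wrong: {cost}\n"
    )
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return next_no
```

The design choice worth copying is the append mode: the writer has no code path that edits an existing lesson, which is what keeps this file's update protocol separate from the rarely changing behavioral floor.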
Where the architecture still needs help
This pattern does not solve everything. Multi-file refactors still need real architecture context the agent cannot derive from reading source. Regulated industries, where Fulcrum (the presales workflow stack I run for enterprise customers) lives, need domain-specific guardrails alongside the behavioral ones, and those guardrails are themselves a maintained artifact, not a one-time rule list. Team-scale consistency is a coordination problem, not a configuration one: the architecture gets you a reproducible shape, but multiple humans still have to agree on which lessons are real lessons.
Tool portability is the last gap. The 4 lines transfer between Claude Code, Cursor, Codex, and others. The delegated file pattern transfers in shape but not in syntax — every agent has its own loading model. That is a real limitation. It is also a smaller limitation than starting from scratch on every tool.
What to take from Liu
The 4 lines are the right floor. Behavioral rules over feature rules. Universal categories over project specifics. The Configuration Paradox is a thing to design against, not just a thing to know.
The ceiling is the architecture above the floor. Behavioral guardrails in one shared file. Project, domain, and language rules delegated. Incident-driven rules in an append-only file the agent reads at session start. CLAUDE.md as the dispatcher, not the rulebook.
Most CLAUDE.md files I see are stuck on the floor or buried under a 47-rule pile. The architecture is the move that gets you out of both.