DEV Community

Cover image for How to Write a CLAUDE.md Rule That Actually Gets Enforced
Brad Kinnard
Brad Kinnard Subscriber

Posted on

How to Write a CLAUDE.md Rule That Actually Gets Enforced

Open a CLAUDE.md file at random and you'll find build commands, architecture notes, and rules. The rules tend to be the unenforceable kind. "Write clean code." "Be careful with types." "Follow our conventions." The author meant every word. The agent reads them. And nothing checks whether the agent followed them, because nothing can.

In a corpus of 580 CLAUDE.md, AGENTS.md, and .cursorrules files from public GitHub repos with 10+ stars, 74% contained zero machine-extractable rules. Not because the authors didn't care about rules. Because most rules were written in a form no parser could pull out as a deterministic check.

This post is about the difference. Specifically: how to phrase a rule so a parser can extract it and a verifier can check it, without sacrificing what you actually meant.

The principle

Enforceability comes from a verifiable surface. A rule is verifiable when there's a concrete pattern in code that either matches it or doesn't. "Use camelCase for function names" is verifiable: read the AST, list the function names, check the casing. "Name things consistently" is not: there's no concrete pattern to check, only a judgment to make.

The gap between the two is the gap between intent and enforcement. You meant the same thing in both cases. But only one of them survives translation into a check.

Here's the heuristic I use: could a junior engineer with no context mechanically check whether code follows this rule, just by reading the rule and looking at the code? If yes, the rule is enforceable. If they'd have to ask "what does 'consistent' mean here?", it isn't.

Three pairs, walked through

Take a few common intents and look at how they fail or succeed.

Type safety. You want strong typing.

Bad:  Be careful with types.
Good: No `any` types in `src/`. Async functions require explicit return types.
Enter fullscreen mode Exit fullscreen mode

The bad version has no surface. "Careful" isn't a check. The good version has two: a forbidden token (any) and a structural property (return type annotation on async function declarations). Both check directly against the AST.

Module structure. You want predictable imports and exports.

Bad:  Prefer clean module structure.
Good: Named exports only. No default exports. Filenames in kebab-case.
Enter fullscreen mode Exit fullscreen mode

"Clean" is meaningless to a parser. Named-only and kebab-case are both binary properties of code that exist or don't. The first version sounds like more guidance because it's broader, but breadth is the problem: it covers everything and enforces nothing.

Preferences over alternatives. You want React functional components, not class components.

Bad:  Write modern React.
Good: Prefer functional components over class components.
Enter fullscreen mode Exit fullscreen mode

"Modern" is a moving target with no fixed surface. The "prefer X over Y" pattern, on the other hand, has a clean check: count instances of each, compute a ratio, score against a threshold. This is one of the most useful patterns in instruction files because it captures real-world preference (not absolute prohibition) in a measurable way.

The reference table

Twelve common intents, paired:

Intent Unenforceable Enforceable
Naming functions Name things consistently Use camelCase for function names
Filenames Pick reasonable filenames All filenames in kebab-case
Type safety Be careful with types No any types
Async return types Make types clear Async functions require explicit return types
Module exports Prefer clean module structure Named exports only, no default exports
File size Keep files manageable Maximum 300 lines per file
Logging Be mindful of logging Never use console.log; use src/logger.ts
Component style Write modern React Prefer functional components over class components
Package manager Use the right package manager Use pnpm, not npm
Test files Keep tests organized All test files end with .test.ts
Error handling Handle errors properly Async functions must use try/catch or return a Result type
Commit format Write clear commit messages Use conventional commits (feat:, fix:, chore:)

Every right-hand cell points at a concrete check: a token, a casing rule, a count, a file pattern, a configured tool. Every left-hand cell points at a judgment.

Real-world rules usually carry scope: "no any in src/," "named exports outside index.ts files," "no console.log in production code paths." Scope makes a rule narrower and more accurate without making it less enforceable. The interop layer that genuinely needs any keeps it; the rest of the codebase doesn't.

What kinds of checks exist

Worth knowing what's available, because it shapes what's writable. Static analysis tools targeting AI instruction files generally support a few classes of check:

  • AST-level: function names, type annotations, import patterns, forbidden tokens, structural properties
  • Filesystem: file existence, naming conventions, directory layout, file size limits
  • Regex: literal strings, content patterns, conventional formats
  • Tooling: presence and configuration of linters, formatters, package managers, test runners
  • Config-file: contents of .eslintrc, tsconfig.json, .prettierrc, etc.
  • Git-history: commit message formats, branch naming conventions
  • Preference ratios: "prefer X over Y" with a compliance percentage instead of a binary verdict

If your rule maps to one of these classes, it's enforceable. If it doesn't, it isn't. The trick when writing instruction files is to keep that map in mind: when you're about to write "be careful with X", ask which of these classes "carefulness with X" lives in. Usually the answer points at a concrete reformulation.

The unenforceable rules aren't worthless

Here's a real tension: most of what makes a CLAUDE.md useful isn't enforceable at all. Project context (what the repo does, where the architecture lives), agent behavior directives (be succinct, ask before deleting, don't touch /legacy), and onboarding instructions are all valuable. None of them extract as rules.

Don't try to make them enforceable. They're a different kind of content with a different purpose. Project context grounds the agent. Behavior directives shape its style. Neither is supposed to be checked against output; they're checked against the agent's process, which is a different problem.

The mistake worth avoiding is letting unenforceable prose crowd out enforceable rules. Anthropic's Claude Code best practices doc recommends deleting any instruction the model already follows correctly without it. Most "write clean code" style rules fail that test: the model already does its version of clean code, so the line is taking up attention budget your agent could be spending on the specific, verifiable rules that actually distinguish your codebase from a generic project. Cut what the model already does. Keep the checks.

A test for your own files

Pull up your CLAUDE.md or AGENTS.md right now. For each line that looks like a rule, ask:

  1. Could a junior engineer check this without asking clarifying questions?
  2. Does it name a specific pattern, file, token, casing, or value?
  3. Would 5 different reviewers all agree on whether a piece of code passes this rule?

If a rule fails 1 or 2, it's not a rule, it's a wish. If it fails 3, it's ambiguous. Rewrite or delete.

If you want a mechanical version of this test, RuleProbe parses CLAUDE.md, AGENTS.md, .cursorrules, .windsurfrules, GEMINI.md, and copilot-instructions.md against 102 matchers and tells you which lines extracted as rules and which didn't:

npm install -g ruleprobe
ruleprobe parse ./CLAUDE.md --show-unparseable
Enter fullscreen mode Exit fullscreen mode

The --show-unparseable flag is the interesting one. It surfaces every line that looked rule-shaped but didn't map to a check. That list is your rewrite queue.

RuleProbe on GitHub

What this leaves out

The hardest case is rules like "follow the existing error handling pattern in this codebase." That's enforceable in principle (compare new code's structural shape against the codebase's dominant pattern), but not by simple AST or regex matching. It needs codebase-aware analysis. Some tools handle that; most don't. If you find yourself writing those kinds of rules, know that they'll either need a tool that does pattern profiling or they'll stay aspirational.

The other thing enforceability doesn't catch: an agent that follows every rule and still writes broken code. Static rules reduce variance, they don't eliminate it. A function with any removed and an explicit return type can still have wrong logic. Treat passing rule checks as a floor, not a ceiling.

Top comments (0)