Gábor Mészáros

Posted on Mar 3

CLAUDE.md Best Practices: 7 formatting rules for the Machine

#agents #ai #documentation #productivity

I watched an agent ignore a rule I wrote 2 hours earlier.

Not a vague rule. A specific one. "run pytest before committing." It was right there in the CLAUDE.md, paragraph two, between the project description and the linting setup. The agent read the file. I saw it in the context. It just... didn't follow it.

I moved the same instruction under a ## Testing header, wrapped pytest in backticks, and added a one-line rationale. Next run, the agent followed it to the letter.

The instruction didn't change. The signal strength did.

In the last post, we got the agent oriented — /bootstrap loads the map, the workflows, the boundaries. But orientation and compliance are different things. You can hand someone a perfect briefing and still lose them if the briefing is a wall of text. Same with agents.

The question isn't whether your instructions are loaded. It's whether the agent follows them.

The comparison

Here's the same instruction, two ways.

Version A:

When working on this project, always make sure to run the test suite
before committing any changes. The command to run tests is pytest and
you should run it from the project root. If tests fail, fix them before
committing. Also make sure to use ruff for formatting.

Version B:

## Testing

- `pytest` — run from project root before every commit
- Fix failures before committing

## Formatting

- `ruff check --fix && ruff format` — run before committing

Same content. Version B gets followed. Version A gets buried.

This isn't about aesthetics. Structural elements — headers, code fences, lists — create anchor points that agents latch onto. Prose paragraphs don't. The more structure you provide, the more reliably each instruction lands.

It's not just about length

You already learned to keep your CLAUDE.md short. It's a good start but it's not sufficient. A 20-line prose paragraph gets lost just as easily as a 200-line one. The variable isn't word count. It's structure.

A short file with no headers, no code blocks, and no rationale will underperform a longer file that's well-structured.

Length is the ceiling. Formatting is the signal.

Seven structural rules

These aren't content guidelines. They're formatting choices that determine whether instructions survive the trip from file to agent behavior. I'll start with the three you won't find in other guides, then cover the four that everyone mentions but nobody explains why.

1. Include rationale

"Never force push" is an instruction. "Never force push — rewrites shared history, unrecoverable for collaborators" is an instruction the agent weighs.

# Without rationale
- Never use `rm -rf` on the project root
- Always run tests before committing
- Don't modify package-lock.json manually

# With rationale
- Never use `rm -rf` on the project root — irrecoverable
- Always run tests before committing — CI will reject untested code
- Don't modify package-lock.json manually — causes merge conflicts
  and dependency resolution issues

The rationale doesn't just explain — it gives the agent a way to generalize. An agent that understands why force push is forbidden will also avoid git reset --hard origin/main without being told. The "why" turns a single rule into a class of behaviors.

This is the most undervalued formatting choice. Every prohibition should carry its reason.

2. Keep heading hierarchy shallow

Three levels is enough. h1 for the file title, h2 for sections, h3 for subsections. That's it.

# Before (5 levels deep)
# Project
## Development
### Testing
#### Unit Tests
##### Mocking Strategy

# After (3 levels max)
# Project
## Testing
### Unit tests

Deep nesting dilutes attention. An h5 competes with every heading above it for the agent's focus. It doesn't lose the h2, but the hierarchy creates ambiguity about which level governs. Flat structures keep every instruction at the surface. If you need an h4, you probably need a separate file.

3. Name files descriptively

When an agent searches your project - browsing a directory listing, running a glob, deciding which file to read - the file name is the first filter. Before content, before headers, before anything.

# Before
docs/guide.md
docs/notes.md
scripts/setup.sh

# After
docs/api-authentication.md
docs/deployment-checklist.md
scripts/setup-local-dev.sh

The agent sees a directory listing and picks what to open. api-authentication.md tells it whether the file might be relevant to the current task. guide.md forces it to open and read before it can decide. Descriptive names save the agent a round trip. In a project with dozens of files, that adds up.

This applies to any file the agent might discover: docs, scripts, configs.

Now the four you've heard before - but with a why.

4. Use headers

Agents scan headers the way developers scan a README: as a table of contents. A header says "new topic, reset attention."

# Before
The project uses TypeScript with strict mode enabled. For testing we
use vitest. The CI pipeline runs on GitHub Actions.

# After
## Language

TypeScript with strict mode enabled.

## Testing

- `npx vitest` — run from project root

## CI

- `.github/workflows/` — GitHub Actions

One topic per header. The agent navigates to the right section instead of parsing the whole paragraph. Without headers, every instruction competes with every other instruction for attention.

5. Put commands in code blocks

Commands in prose get read as descriptions. Commands in code blocks get treated as executable.

# Before
You can run the linter by running npm run lint and the tests
by running npm test.

# After
- `npm run lint` — check for issues
- `npm test` — run test suite

If you do nothing else from this post, wrap your commands in backticks. It's the single highest-impact change - a command in a code fence is a command. A command in a sentence is a suggestion.

6. Use standard section names

## Testing gets recognized instantly. ## Quality Assurance Verification Process doesn't.

Agents have been trained on millions of README files. They know what ## Testing, ## Commands, ## Structure, and ## Conventions mean. Those names carry built-in context.

Instead of	Use
Quality Assurance	Testing
Development Guidelines	Conventions
Operational Instructions	Commands
Safety and Compliance	Boundaries
Project Organization	Structure

The familiar name is the signal. The creative name is noise.

7. Make instructions actionable

"Follow best practices" is not an instruction. "Use ruff for formatting, run before committing" is.

The test: could an agent execute this instruction right now, without asking a clarifying question? If not, it's too vague.

# Before
Make sure code quality is maintained and follows our standards.

# After
## Conventions

- Format with `ruff format` before committing
- Type annotations on all public functions
- No `print()` in production code — use `logging`

Every instruction should pass the "act on it immediately" test. If it can't be acted on, it's a wish, not an instruction.

The compound effect

Each rule alone is a small improvement. Together, they're multiplicative - not because the rules add up, but because they reinforce each other. Headers create sections. Sections hold code blocks. Code blocks contain actionable commands. Rationale explains why. Descriptive file names route attention to the right file. Shallow hierarchy keeps everything findable.

Here's a realistic before/after applying all seven:

Before:

This project is a Python CLI tool. We use pytest for testing and ruff
for linting. Make sure to run tests before you commit anything. The
source code is in src/myapp and tests are in tests/. Don't modify
anything in the dist/ folder because that's generated. Also we have
some rules about how to write tests — they should test behavior not
implementation details, and use parametrize instead of writing lots
of individual test functions that do the same thing.

After:

## Testing

- `pytest` — run from project root before every commit
- Test behavior, not implementation — assert on outcomes, not internal calls
- Use `@pytest.mark.parametrize` when cases share the same assertion shape

## Formatting

- `ruff check --fix && ruff format`

## Structure

- Source: `src/myapp/`
- Tests: `tests/`

## Boundaries

- `dist/` — generated, do not modify

Same information. Half the words. Every instruction lands.

When to reformat

If you notice:

The agent apologizes for missing an instruction that's in your file
The same rule gets violated in consecutive sessions
You keep adding more words to an instruction hoping the agent will "get it"
Your CLAUDE.md is one long section with no headers
Commands appear in sentences instead of code blocks

Your instructions don't need more content. They need more structure.

The connection to /bootstrap

In the previous posts we built the delivery system: backbone.yml maps the project, Mermaid draws the workflows, /bootstrap loads both in seconds. That's the orientation layer - the agent knows where it is and how things work.

This is about attention budget allocation. The agent has a limited context window. What matters isn't just what's in it — it's how the agent decides what's relevant at each step. Structure is what makes your instructions win that competition.

Orientation without compliance means the agent knows your project but ignores your rules. Compliance without orientation means the agent follows instructions but works in the wrong place. You need both.

Try it

Open your CLAUDE.md (or whatever instruction file your agent reads)
Find the longest prose paragraph
Break it: one header per topic, one code block per command, one sentence of rationale per prohibition
Run your agent on the same task you ran yesterday

The instructions didn't change. The signal did.

Don't just write more instructions. Format the ones you have.

Top comments (3)

Jamie Cole • Mar 3

The formatting trick is doing what a guardrails layer should. The core issue is agents treat CLAUDE.md as soft context, not hard constraints -- they follow it until they don't. I kept running into this with rules ignored under full context windows. Built claude-guard to enforce at runtime instead of hoping formatting does it: dev.to/clawgenesis/how-to-add-guar...

Gábor Mészáros • Mar 3

Fair point.

CLAUDE.md is soft context. Formatting raises the signal, it doesn't guarantee compliance. Under a full context window, instructions can still get dropped regardless of structure.

The way I think about it: formatting is the pre-execution layer (maximize the probability the agent follows), runtime enforcement is the post-execution layer (catch it when it doesn't). Different problems, both real.

ps: your link points to a 404 page.

Some comments may only be visible to logged-in visitors. Sign in to view all comments.