Stop Wasting Tokens: How to structure Claude.md for complex codebases.

#agents #claude #llm #productivity

When I first set up Claude Code across our monorepo, I did the obvious thing: one AGENTS.md at the root, everything I could think of in it, and the assumption that more context meant better output. That seemed right. It's how you'd brief a new engineer, give them everything upfront and let them sort out what's relevant.

It didn't work the way I expected. The agent would follow instructions correctly on one task and ignore them entirely on the next. Some modules got good output. Others were a mess. I couldn't find a pattern to it. I couldn't debug it. I kept adding more to the file, assuming I'd missed something, which made things worse in ways I didn't understand at the time.

This article is about what I eventually figured out. Looking back, the logic seems fairly clear, but at the time I was mostly just noticing that things weren't working and poking at them until they did.

I'll share what we changed, how we structured things, and the reasoning behind each decision, including data from Augment Code's systematic study of AGENTS.md files and Hacker News discussions, which validated (and sometimes corrected) what I'd been doing intuitively.

The Problem With One Big File

Our repo looks roughly like this:

repo/
├── AGENTS.md
├── notifications/
├── api-gateway/
├── auth/
├── common/          ← git submodule
└── orchestrator/

Each module has different responsibilities, different dependencies, different testing setup, different gotchas. common is a git submodule shared across multiple repositories: it contains shared utilities, custom error types, middleware logic, and all the infrastructure and build Makefiles. The agent needs to know it's a submodule and treat it differently from the rest.

Writing one root file that covers all of it meant one of two things:

either the file was so generic it said nothing useful, or
it was so long it started actively hurting performance.

That second part took me a while to accept, because it ran against my instincts. I had assumed that an overloaded AGENTS.md would just be inefficient - the agent would skim past the irrelevant bits. What I was actually seeing was that output quality degraded across the board, not just on tasks where the extra context was irrelevant. I didn't have a good explanation for it until I came across Augment Code's systematic study of AGENTS.md files. They'd measured this directly: the best-structured files produced output quality equivalent to a model upgrade. The worst files - long, overloaded ones - pushed output below the baseline of having no file at all. That matched exactly what I'd been seeing. I just hadn't known that was the mechanism.

The fix was a hierarchy: one lean root file covering only universals, and a focused AGENTS.md inside each module covering only what's specific to that module. common gets special treatment.

repo/
├── AGENTS.md                 ← root file
├── notifications/
│   └── AGENTS.md
├── api-gateway/
│   └── AGENTS.md
├── auth/
│   └── AGENTS.md
├── common/                   ← git submodule
│   └── AGENTS.md
└── orchestrator/
    └── AGENTS.md
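
The lookup this hierarchy implies (root file always, plus the file for whichever module the task touches) can be sketched as a small function. This is a minimal TypeScript sketch, assuming the module names from the tree above; `agentDocsFor` and `REPO_MODULES` are hypothetical names for illustration, not part of Claude Code or any agent tooling.

```typescript
// Hypothetical list of top-level modules, taken from the tree above.
const REPO_MODULES: string[] = ["notifications", "api-gateway", "auth", "common", "orchestrator"];

// Return the AGENTS.md files that apply to a path: the root file first,
// then the owning module's file if the path sits inside a known module.
function agentDocsFor(filePath: string): string[] {
  // Normalise separators and strip a leading "./" so "auth/x" and "./auth/x" match.
  const parts = filePath.replace(/\\/g, "/").replace(/^\.\//, "").split("/").filter(Boolean);
  const docs = ["AGENTS.md"]; // the root file always applies
  const topLevel = parts[0] ?? "";
  if (parts.length > 1 && REPO_MODULES.some((m) => m === topLevel)) {
    docs.push(topLevel + "/AGENTS.md");
  }
  return docs;
}

const chain = agentDocsFor("auth/src/strategies/email_password.ts");
console.log(chain.join(", "));
```

The point of the sketch is the ordering: universals come from the root, and module-specific rules are only loaded when the path calls for them.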

What I Put in the Root File (and What I Removed)

The root AGENTS.md is now under 120 lines. That's a constraint I hold to deliberately. Here's what it covers:

Repo layout - one or two sentences per top-level folder. Enough for the agent to navigate, not a tour. Critically, I state that each module has its own AGENTS.md and tell the agent to read the relevant one before starting work in that folder. I also call out common explicitly as a git submodule: the agent needs to know not to modify it directly when working in another module, and to check it before writing anything that might already exist there.
The common submodule rules. This gets its own bullet in the root file because the mistake is too expensive otherwise. The agent's natural instinct when it needs a utility or middleware is to write one. If it doesn't know common exists and what lives there, it'll reinvent things that already exist - and worse, write them inconsistently across modules. The root file states this clearly: before writing any new utility, error type, or middleware logic, check common/ first.
Universal runtime conventions - language version, package manager, build system. Things that affect every module equally.
How to run things - install, test, build. Short and concrete.
Git conventions - branch format, what branches to never touch directly.
A pointer to reference docs - more on this below.

What I removed: code style guidelines, module-specific patterns, architecture explanations, anything with "we decided to do X because Y." All of that went somewhere else, or was cut entirely.

The rule I use: if a piece of information only matters when the agent is working in a specific module, it doesn't belong in the root file. It will consume instruction budget on every session even when it's irrelevant.

Why Instruction Budget Matters More Than I Expected

The Augment Code study was also where I found the number that made this concrete: frontier models can reliably follow around 150–200 instructions before quality starts degrading. The degradation isn't limited to instructions near the end of the file; it is uniform across all of them. Claude Code's own system prompt already consumes roughly 50 of those before it touches my AGENTS.md.

I hadn't thought about it this way before. I'd been treating the file as additive - each line either helps or is neutral. The reality is it's a budget, and I'd been spending it carelessly. Every instruction competes against every other instruction. The agent doesn't quietly drop the bad ones; it gets worse at following all of them.

Once I understood this, I stopped asking "is this useful?" about each line and started asking "is this useful enough to spend part of a finite budget on?" Those are different questions, and the second one cuts a lot more.
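
To make the budget idea concrete, here is a rough TypeScript sketch that estimates how much of the budget a set of files spends. It is a heuristic under stated assumptions: it treats each bullet, numbered step, or line opening with an imperative word as one instruction, uses 150 (the low end of the range above) as the total, and subtracts the roughly 50 instructions attributed to the system prompt. `countInstructions` and `budgetReport` are hypothetical names, and the sample file contents are invented for illustration.

```typescript
// Heuristic, not a real parser: bullets, numbered steps, and lines that open
// with an imperative marker each count as one instruction.
const INSTRUCTION_LINE = /^\s*(?:[-*]\s+|\d+\.\s+|(?:do not|don't|never|always|use|check)\b)/i;

function countInstructions(markdown: string): number {
  return markdown.split(/\r?\n/).filter((line) => INSTRUCTION_LINE.test(line)).length;
}

interface BudgetReport {
  used: number;
  remaining: number;
  overBudget: boolean;
}

// totalBudget and systemPromptCost default to the figures quoted in the text.
function budgetReport(files: string[], totalBudget = 150, systemPromptCost = 50): BudgetReport {
  const used = files.reduce((sum, text) => sum + countInstructions(text), 0);
  const remaining = totalBudget - systemPromptCost - used;
  return { used, remaining, overBudget: remaining < 0 };
}

// Invented sample content standing in for a root file and a module file.
const rootDoc = "# Root\n- Use pnpm\n- Never push to main\nSome prose.\n";
const authDoc = "## Rules\n1. Check common/ first\nDon't hardcode signing config\n";
const budget = budgetReport([rootDoc, authDoc]);
console.log(budget.used, budget.remaining, budget.overBudget);
```

The count will be off for any real file, but a trend over time (the number creeping up PR after PR) is the useful signal.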

The Overexploration Trap I Kept Hitting

Before I understood what was going wrong, I noticed a pattern: the agent would do a simple task, but take forever getting there. It would read file after file, open docs it didn't need, and sometimes come back with an incomplete or wrong answer despite clearly having explored the codebase extensively.

I was causing this myself. The culprit was usually one of two things:

Too much architecture description. I had sections explaining why services were structured a certain way, how the event bus worked, what the API gateway routing decisions were. Detailed, well-intentioned context. What it actually did was signal to the agent: "there's a lot to understand here, go read more." On a task that should have taken minutes, the agent would load 80K+ tokens of context it didn't need and come back confused.

The fix: I cut all explanatory "why" content. If an architectural decision affects how the agent should write code, I state the rule. The reasoning lives in the wiki.

Long lists of don'ts. I had accumulated a lot of warnings over time. Don't do X, don't do Y, be careful with Z. The agent, on every task, would check each warning for relevance - reading migration scripts, checking API compatibility, exploring auth boundaries - even when the task had nothing to do with any of it.

The fix: every don't now has a paired do. "Don't instantiate HTTP clients directly" became "Don't instantiate HTTP clients directly - use src/http/client.ts which includes retry middleware." The first version made the agent cautious and exploratory. The paired version tells it what to do and lets it move on.
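
The "every don't has a paired do" rule is easy to check mechanically. Below is a minimal TypeScript sketch that flags prohibitions with no stated alternative. The keyword lists are assumptions chosen for illustration and will miss or over-match some phrasings; `unpairedDonts` is a hypothetical helper.

```typescript
// Assumed markers for a prohibition and for a stated alternative.
const NEGATIVE_RULE = /\b(?:don't|do not|never|avoid)\b/i;
const ALTERNATIVE_HINT = /\b(?:use|instead|prefer|check|live in|lives in|comes from)\b/i;

// Return the lines that forbid something without saying what to do instead.
function unpairedDonts(markdown: string): string[] {
  return markdown
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => NEGATIVE_RULE.test(line) && !ALTERNATIVE_HINT.test(line));
}

const rules = [
  "Don't instantiate HTTP clients directly",
  "Don't instantiate HTTP clients directly - use src/http/client.ts which includes retry middleware.",
  "Token signing config comes from env, never hardcoded",
].join("\n");

const flagged = unpairedDonts(rules);
console.log(flagged);
```

Only the first rule is flagged here: the other two name where the right thing lives, which is what lets the agent move on.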

Progressive Disclosure: The Structural Change That Helped Most

Once I understood the overexploration problem, the solution became clear: the AGENTS.md shouldn't contain everything the agent might need. It should tell the agent how to find what it needs.

I restructured each module to have a short AGENTS.md that points to reference files:

auth/
├── AGENTS.md                 ← short, under 100 lines
└── agent_docs/
    ├── token_lifecycle.md
    ├── session_patterns.md
    └── error_handling.md

The AGENTS.md lists these files with a one-line description each and instructs the agent to pull whichever are relevant before starting. In practice, I've found this works well - the agent reads what's relevant to the task and ignores the rest.

The rule I follow: pointers, not copies. I don't paste code snippets into reference files if they'll go stale. I use file:line references to point the agent at the actual source. When the code changes, the reference stays valid.

This matters even more for common. Instead of documenting what utilities exist inline, the common/AGENTS.md is essentially an index - here's what's in each directory, here's the file to look at. The agent reads it, finds what it needs, and uses it rather than reimplementing it.
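
Pointers only help while they resolve, so it can be worth checking them. The TypeScript sketch below extracts file:line references from a reference doc and reports any that point at an unknown file or past the end of a file. It assumes references are written as `path.ext:line`; `extractRefs`, `staleRefs`, the sample doc text, and the line counts are all hypothetical, and a real check would read the line counts from disk.

```typescript
interface SourceRef {
  path: string;
  line: number;
}

// Matches pointers such as "src/http/client.ts:42" (path with extension, colon, line number).
const REF_PATTERN = /([\w@.\/-]+\.[A-Za-z]+):(\d+)/g;

function extractRefs(doc: string): SourceRef[] {
  const refs: SourceRef[] = [];
  REF_PATTERN.lastIndex = 0; // reset because the pattern is global and reused
  let match: RegExpExecArray | null;
  while ((match = REF_PATTERN.exec(doc)) !== null) {
    refs.push({ path: match[1] ?? "", line: Number(match[2]) });
  }
  return refs;
}

// lineCounts maps each known file to its current number of lines.
function staleRefs(doc: string, lineCounts: Record<string, number>): SourceRef[] {
  return extractRefs(doc).filter((ref) => {
    const total: number | undefined = lineCounts[ref.path];
    return total === undefined || ref.line < 1 || ref.line > total;
  });
}

// Invented example: the second pointer is past the end of its file.
const refDoc = "Retry pattern: see src/http/client.ts:42. Old guard: src/auth/guard.ts:310";
const stale = staleRefs(refDoc, { "src/http/client.ts": 120, "src/auth/guard.ts": 200 });
console.log(stale);
```

This only catches references that no longer resolve at all; a line that still exists but now holds different code needs a human or an agent to look at it.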

What Goes in Each Module's AGENTS.md

After a few iterations, the module-level files follow a consistent pattern. Here's roughly what the auth module's looks like:

    # auth - AGENTS.md

    ## Purpose

    Handles authentication and session management.
    Issues tokens consumed by api-gateway and orchestrator.
    Do not import auth internals directly from other modules - go through the api-gateway.

    ## Stack differences from root

    - Redis for session storage
    - JWT for access tokens, opaque tokens for refresh

    ## When to use JWT vs opaque token

    | Scenario                      | Token type |
    |-------------------------------|------------|
    | Short-lived API access        | JWT        |
    | Refresh or long-lived session | Opaque     |
    | Service-to-service calls      | JWT        |

    ## Workflow: adding a new auth strategy

    1. Create strategy class in src/strategies/, following src/strategies/email_password.ts
    2. Register it in src/strategy_registry.ts
    3. Add the relevant middleware in common/middleware/ if the logic is reusable - check there first
    4. Write unit tests in src/__tests__/strategies/

    ## Non-obvious rules

    - Don't write custom error classes here - they live in common/errors/
    - Don't duplicate middleware - check common/middleware/ before writing anything new
    - Token signing config comes from env, never hardcoded

    ## Reference docs

    - agent_docs/token_lifecycle.md - how access and refresh tokens are issued and invalidated
    - agent_docs/session_patterns.md - Redis session structure and TTL conventions
    - agent_docs/error_handling.md - which error classes to use from common

And the common submodule gets a different kind of file - it's an index, not a set of instructions:

    # common - AGENTS.md

    ## What this is

    A git submodule shared across multiple repositories. Do not modify it unless the task explicitly requires a change to shared infrastructure. Changes here affect all consumers.

    ## Contents

    - utils/ - shared utility functions (date formatting, string helpers, pagination)
    - errors/ - custom error base classes and typed HTTP errors
    - middleware/ - Express middleware (auth guards, rate limiting, request logging)
    - infra/ - Terraform modules and shared cloud config
    - build/ - shared Makefiles, Docker base images, CI templates

    ## Before writing anything new

    Check here first. If a utility, error type, or middleware you need doesn't exist, add it here rather than in the consuming module - but only if it's genuinely reusable.

    ## How to use from a module

    Import paths follow @repo/common/utils, @repo/common/errors, etc. See build/README.md for how to update the submodule reference after changes.

Three things in the module files worth calling out:

The decision table. When a module has two reasonable approaches for a recurring choice, a table resolves the ambiguity before the agent writes a single line. I've found it cuts back-and-forth on "which approach should I use" significantly.

The numbered workflow. For complex tasks - adding a new event handler, wiring a new service, deploying a new integration - a step-by-step workflow is the single strongest pattern in my toolkit. It moves the agent from "exploring how to do this" to "executing a known process."

Real code examples, sparingly. When I include code snippets, they're 3–10 lines from actual production code showing a pattern the agent should replicate. More than a few examples and the agent starts pattern-matching on the wrong one. I keep it minimal and prefer file:line pointers when the pattern is already in the codebase.

Should Code Conventions Be in AGENTS.md?

Short answer: mostly no.

The longer answer is that I spent time putting code style guidelines in these files and then removing them. LLMs are in-context learners. If the codebase consistently follows certain patterns, the agent picks them up from the code it reads during the task. Restating const over let, or import ordering, or how we structure component props - none of it meaningfully changed agent behavior. It just consumed instruction budget.

The question I ask now: would a skilled engineer who knows the tech stack still get this wrong on day one in our specific codebase? If the answer is yes, it belongs in AGENTS.md. If it's just good practice that any competent developer would follow, it doesn't.

What does pass that test: which module owns what, where specific types of code are supposed to live, what shared utilities to use instead of rolling your own, cross-module boundaries that aren't obvious from the file structure, database migration naming conventions. Institutional knowledge, not craft knowledge.

For style enforcement, the pattern is the same regardless of stack: wire your formatter and linter as Stop hooks in Claude Code. They run automatically each time the agent finishes responding, the agent sees the output, fixes violations, and nothing needs to be restated in AGENTS.md.

For Go specifically: go fmt is the language standard, not a team preference - there's nothing to document. golangci-lint with a committed .golangci.yml means the config file is the convention. Wire both as Stop hooks (go fmt ./... and golangci-lint run) and the agent corrects itself each time it finishes. Nothing in AGENTS.md should restate what either tool already enforces.
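
As a sketch of what that wiring might look like, the TypeScript below builds a settings object with the two Go commands registered under a Stop hook and prints it as JSON. The shape (a `hooks` key, an event name, and a list of command entries) is my understanding of Claude Code's settings format and should be treated as an assumption; the interfaces and `stopHooks` are hypothetical names, so verify the schema against the current hooks documentation before committing it.

```typescript
// Assumed shape of the "hooks" section of a Claude Code settings file.
interface HookCommand {
  type: "command";
  command: string;
}

interface HookGroup {
  matcher?: string; // not needed for Stop in this sketch
  hooks: HookCommand[];
}

interface HookSettings {
  hooks: Record<string, HookGroup[]>;
}

// Build a settings object that runs each shell command as a Stop hook.
function stopHooks(commands: string[]): HookSettings {
  return {
    hooks: {
      Stop: [{ hooks: commands.map((command): HookCommand => ({ type: "command", command })) }],
    },
  };
}

// The two commands named in the text above.
const goSettings = stopHooks(["go fmt ./...", "golangci-lint run"]);
console.log(JSON.stringify(goSettings, null, 2));
```

How a failing command's output is surfaced back to the agent depends on how the hook exits, so that part in particular is worth checking against the documentation rather than assuming.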

Maintenance

A few things I've built into our process:

AGENTS.md is part of the PR checklist for any module. If a PR changes conventions, module boundaries, or shared utilities, the relevant AGENTS.md gets updated in the same PR. Stale agent docs are worse than no docs.

After significant model upgrades, I ask the agent to review its own file. I'll open a session in a module and ask: "Read this AGENTS.md and tell me if anything looks like unnecessary noise given how you currently work." Models improve at following certain patterns over time. Something that needed to be spelled out explicitly a few versions ago may now be wasted instruction budget.

I never auto-generate the file. Claude Code and other AI coding agents offer /init or similar commands to generate the file from the codebase. I don't use them. The file goes into every session - it's the highest-leverage artifact in the agent configuration. Every line in it should be there because I decided it should be, not because an auto-generator thought it looked relevant.
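
The PR checklist item above can be backed by a small reminder script. This minimal TypeScript sketch takes the list of files changed in a PR and returns the modules whose code changed without a matching change to their AGENTS.md. `modulesNeedingDocReview` is a hypothetical helper and the changed-file list is invented; it is meant as a prompt for the reviewer, not a gate, since most PRs legitimately don't change conventions.

```typescript
// Return modules that have changed files but no change to their own AGENTS.md.
function modulesNeedingDocReview(changedFiles: string[], modules: string[]): string[] {
  return modules.filter((mod) => {
    const touched = changedFiles.some((f) => f.indexOf(mod + "/") === 0);
    const docTouched = changedFiles.some((f) => f === mod + "/AGENTS.md");
    return touched && !docTouched;
  });
}

// Invented example: auth code changed alone, common changed along with its doc.
const changedInPr = ["auth/src/strategy_registry.ts", "common/errors/http.ts", "common/AGENTS.md"];
const needsReview = modulesNeedingDocReview(changedInPr, [
  "notifications",
  "api-gateway",
  "auth",
  "common",
  "orchestrator",
]);
console.log(needsReview);
```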

Summary

Looking back, most of what I changed was fairly logical once I understood the constraint. The budget is finite. Extra context doesn't help if the agent can't act on it. Pointers work better than copies. Module-specific rules belong in module-specific files. None of this is complicated in hindsight.

What took time was understanding why things were behaving the way they were - and a lot of that came from the Augment Code data, which gave language to patterns I'd been half-noticing but couldn't explain. The structure above has held up across our modules since then. Whether it holds for yours depends on how different your setup is, but the underlying constraints are the same.