Shinsuke KAGAWA

Posted on • Originally published at norsica.jp

Stop Putting Everything in AGENTS.md

If you're using Agentic Coding and find yourself explaining the same thing to the LLM over and over, you have a learning externalization problem.

The fix seems obvious: write it down in AGENTS.md (or CLAUDE.md, depending on your tool) and never explain it again.

Note: This article uses "AGENTS.md" as the generic term for root instruction files. Claude Code uses CLAUDE.md, Codex uses AGENTS.md, and other tools have their own conventions. The principles apply regardless of the specific filename.

But here's what actually happens—you keep adding rules, AGENTS.md grows to 200+ lines, and somehow the LLM still ignores half of what you wrote.

This article is about how to actually make your rules stick: where to write them, what to write, and how to verify they work.


The Real Problem

LLMs don't learn across sessions. Every conversation starts fresh. This means:

  1. You explain something once
  2. It works
  3. Next session, you explain it again
  4. And again
  5. Eventually you get frustrated

The solution is to externalize your learnings into rules. But most people do this wrong.


The Common Mistakes

| Mistake | What Happens |
| --- | --- |
| Put everything in AGENTS.md | It bloats, becomes noise, important rules get buried |
| Put everything in code comments | The LLM doesn't load them into context unless you explicitly reference the file |
| Don't write it down at all | You repeat yourself forever |

The thing is, where you write a rule determines whether the LLM actually follows it.


Where to Write Rules

Not all rules belong in the same place. A simple decision tree:

When is this rule needed?
│
├─ Always, on every task → AGENTS.md
│
├─ When working on a specific feature → Design Doc
│
├─ When using a specific technology → Rule file (skill)
│
└─ When performing a specific task type → Task guidelines

Note: "Skills" are modular rule files used in tools like Codex and Claude Code. They allow you to inject context-specific rules only when relevant. If your tool doesn't have this concept, think of them as separate rule files you reference when needed.

"Task guidelines" refers to rules that apply only during specific operations—like code review, migration, or content generation. Some call these "task rules" or "task-specific constraints."

The Full Picture

| Destination | Scope | When Applied | Examples |
| --- | --- | --- | --- |
| AGENTS.md | All tasks | Always | Approval flows, stop conditions, project principles |
| Rule files (skills) | Specific technology area | When using that tech | Type conventions, error handling patterns, function size limits |
| Task guidelines | Specific task type | When doing that task | Subagent usage rules, review procedures |
| Design docs | Specific feature | When developing that feature | Feature requirements, API specs, security constraints |
| Code comments | Specific code location | When modifying that code | Implementation rationale, gotchas |
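
To make the destinations concrete, here's one possible layout. The paths are purely illustrative; each tool has its own conventions for where instruction files and skills live:

```
project/
├── AGENTS.md             # always loaded: approval flows, stop conditions, principles
├── docs/design/
│   └── checkout.md       # design doc: read when working on the checkout feature
├── rules/
│   ├── typescript.md     # skill: type conventions, error handling patterns
│   └── review.md         # task guideline: review procedure, subagent usage
└── src/
    └── payment.ts        # code comments: rationale and gotchas next to the code
```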

The Key Question

Ask yourself: "Is this needed on every task in this project?"

  • Yes → AGENTS.md
  • No → Put it closer to where it's needed

This keeps AGENTS.md lean (around 100 lines) and ensures task-specific rules don't create noise for unrelated work.

You don't need to get this perfect from day one. Start with one thing: keep AGENTS.md small. That alone changes a lot.


What to Write

This is the hard part. Most people write the wrong thing.

The Principle: Write Root Causes, Not Incidents

When something goes wrong, the instinct is to document the specific incident. But this creates bias—the LLM over-fits to that one case.

❌ Bad (specific incident)
"The getUser() function in UserService was missing null check"

✅ Good (root cause / system fix)
"Always null-check return values from external APIs"

The first one only helps if the LLM encounters that exact function again. The second one prevents the entire class of errors.
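
As a sketch of what following the generalized rule looks like in code (the endpoint and types here are hypothetical, not taken from the original incident):

```typescript
type User = { id: string; name: string };

// Rule applied: always null-check return values from external APIs.
async function getUser(id: string): Promise<User | null> {
  const res = await fetch(`https://api.example.com/users/${id}`);
  if (!res.ok) return null; // the external call itself failed

  const data: unknown = await res.json();
  if (data == null) return null; // never trust the external return value

  return data as User;
}
```

Written this way, the rule covers every external call, not just the one function that originally failed.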

Specific Incident vs. Root Cause

| Aspect | Specific Incident | Root Cause |
| --- | --- | --- |
| Applies to | That one location | All similar cases |
| Prevents recurrence | Weakly (the same bug can recur elsewhere) | Strongly (operates as a principle) |
| Bias risk | High (overfitting) | Low (generalizable) |

Finding the Root Cause

When you encounter an issue, ask:

  1. Why did this mistake happen? (direct cause)
  2. Why wasn't it prevented? (system gap)
  3. Where else could this same mistake occur? (scope)

Example:

  • Direct cause: getUser() was missing a null check
  • System gap: We trusted external API return values without validation
  • Scope: All external API calls

Rule to write: "Always null-check return values from external APIs"


How to Verify Rules Work

This is the step most people skip—and it's critical.

The Principle: Fix the System, Then Discard and Retry

When you add or modify a rule in AGENTS.md or a skill file, you need to verify it actually works. The only way to do this:

  1. Add/modify the rule
  2. Discard the current artifact (or stash it in a branch)
  3. Start a new session with the updated rules
  4. Re-run the same task
  5. Verify the issue doesn't recur
Continue with existing artifact after rule change → ❌
Discard and restart with new rules → ✅

Why This Matters

If you keep the existing artifact and just continue, you're still operating in a context polluted by the old system. The new rule might not get properly applied because:

  • The existing artifact carries biases from before the rule existed
  • The LLM might try to "reconcile" the new rule with existing work rather than applying it cleanly
  • You can't tell if the rule actually works or if you just manually fixed the symptom

Verification Checklist

  • [ ] Modified the rule (AGENTS.md / skill file / task guideline)
  • [ ] Discarded current artifact (or moved to a branch)
  • [ ] Started new session with updated rules
  • [ ] Re-ran the same task
  • [ ] Confirmed the issue doesn't recur

For small changes, you can stash instead of discarding. The key is: test the system in isolation.


When to Write Rules

Not every issue deserves a rule. Some guidance:

| Situation | Write a Rule? | Rationale |
| --- | --- | --- |
| You explained the same thing twice | Yes | Prevent the third time |
| Encountered unexpected behavior | Maybe | Find root cause first |
| Task completed successfully | Maybe | Retrospective—any generalizable insights? |
| Found a serious bug | Yes | Prevent recurrence |

Warning Signs You're Over-Documenting

  • AGENTS.md exceeds 100 lines
  • A single rule file exceeds 300 lines (~1,500 tokens)
  • Rules take more than 1 minute to read through
  • You find yourself thinking "is this really needed every time?"
  • Rules contradict each other

If you see these signs, it's time to prune. Rule maintenance includes deletion.


How to Write Rules (Cheat Sheet)

This section is a reference. You don't need to read it all now—come back when you're actually writing a rule. The rest of the article stands on its own.

1. Minimum Viable Length

Context is precious. Same meaning, shorter expression. But don't sacrifice clarity for brevity.

❌ Verbose (42 chars)
If an error occurs, you must always log it

✅ Concise (25 chars)
All errors must be logged

❌ Too short (unclear)
Log errors

2. No Duplication

Same content in multiple places wastes context and creates update drift.

❌ Duplicated
# base.md
Standard error format: { success: false, error: string }

# api.md
Errors use { success: false, error: string } format

✅ Single source
# base.md
Standard error format: { success: false, error: string }
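
The same principle applies to code: keep the shape in one exported type and let documents and implementations reference it. A minimal sketch (file and type names are illustrative):

```typescript
// errors.ts: the single source of truth for the standard error format
export type ErrorResponse = {
  success: false;
  error: string;
};

// Callers reference the type instead of re-declaring the shape.
const notFound: ErrorResponse = { success: false, error: "User not found" };
```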

3. Measurable Criteria

Vague instructions create interpretation variance. Use numbers and specific conditions.

✅ Measurable
- Functions: max 30 lines
- Cyclomatic complexity: max 10
- Test coverage: min 80%

❌ Vague
- Readable code
- Sufficient testing
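
A side benefit of measurable criteria is that tooling can enforce them, so the rule file and CI never drift apart. A sketch assuming ESLint's built-in `max-lines-per-function` and `complexity` rules, with the thresholds from the example above:

```typescript
// eslint.config.js (flat config; the syntax is valid TypeScript as well)
export default [
  {
    rules: {
      "max-lines-per-function": ["error", { max: 30 }],
      complexity: ["error", { max: 10 }],
    },
  },
];
```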

4. Recommendations Over Prohibitions

Banning things without alternatives leaves the LLM guessing. Show the right way.

✅ Recommendation + rationale
## State Management
Recommended: Zustand or Context API
Reason: Global variables make testing difficult, state tracking complex
Avoid: window.globalState = { ... }

❌ Prohibition list
- Don't use global variables
- Don't store values on window
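
As a sketch of the recommended direction, here's what the Zustand option might look like (the auth slice is a hypothetical example):

```typescript
import { create } from "zustand";

type AuthState = {
  token: string | null;
  setToken: (token: string | null) => void;
};

// State lives in a testable store instead of window.globalState.
export const useAuthStore = create<AuthState>((set) => ({
  token: null,
  setToken: (token) => set({ token }),
}));
```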

5. Priority Order

LLMs pay more attention to what comes first. Lead with the most important rules.

## Critical (Must Follow)
1. All APIs require JWT authentication
2. Rate limit: 100 requests/minute

## Standard Specs
- Methods: Follow REST principles
- Body: JSON format

## Edge Cases (Only When Applicable)
- File uploads may use multipart

6. Clear Scope Boundaries

State what the rule covers—and what it doesn't.

## Scope

### Applies To
- REST API endpoints
- GraphQL endpoints

### Does Not Apply To
- Static file serving
- Health checks (/health)

The Feedback Loop

This is how it all fits together in practice:

[Working with LLM]
       │
       ├─ Issue occurs
       │      │
       │      ▼
       │  Find root cause (not just symptom)
       │      │
       │      ▼
       │  Decide where to write (AGENTS.md? Skill? Task guideline?)
       │      │
       │      ▼
       │  Write the rule
       │      │
       │      ▼
       │  Discard current work
       │      │
       │      ▼
       │  New session with updated rules
       │      │
       │      ▼
       │  Verify issue doesn't recur
       │
       ▼
[Continue working]

The goal is to reach a state where you never explain the same thing twice. Every explanation either:

  • Gets externalized into a rule, or
  • Was truly a one-off that doesn't need capturing

Passing Feedback Correctly

One more thing: when you give feedback to the LLM, don't just paste error logs. Include your intent.

❌ Just the error
[Stack trace]

✅ Intent + error
Goal: Redirect to dashboard after user authentication
Issue: Following error occurred
[Stack trace]

Without the intent, the LLM optimizes for "make the error go away." With the intent, it optimizes for "achieve the goal while resolving this error."

These are very different things.


Anti-Pattern Summary

Quick reference if you want to check your current practices:

| Anti-Pattern | Reference |
| --- | --- |
| Put everything in AGENTS.md | "Where to Write Rules" |
| Write specific incidents instead of root causes | "What to Write" |
| Continue with old artifacts after changing rules | "How to Verify Rules Work" |
| List only prohibitions without recommendations | "How to Write Rules" #4 |
| Keep explaining instead of writing it down | "When to Write Rules" |

Key Takeaways

  1. AGENTS.md is not a dumping ground. Only rules needed on every task belong there. Everything else goes closer to where it's used.

  2. Write root causes, not incidents. "Null-check external API returns" beats "UserService.getUser() was missing a null check."

  3. Test your rules. After adding a rule, discard current work and re-run. If the issue recurs, the rule isn't working.

  4. Maintenance includes deletion. If AGENTS.md is over 100 lines, you've probably over-documented. Prune ruthlessly.

  5. Explain twice, document once. If you're explaining the same thing for a second time, stop and externalize it.


What's Next

This article covered where to put your rules so they actually stick. In the next article, I'll cover how planning turns execution into verification—and why that's the key to consistent LLM output.

If your AGENTS.md is already bloated—what finally made you realize it was time to stop adding?


The Research

The practices in this article are grounded in LLM research:

SALAM (Wang et al., 2023): LLM self-feedback is often inaccurate. Structured feedback from external agents (or externalized rules) is more effective.

LEMA (An et al., 2023): Learning from mistakes (error → explanation → correction) improves LLM reasoning ability—but this requires explicit externalization of what was learned.

Feedback Loop for IaC (Palavalli et al., 2024): Feedback loop effectiveness decreases exponentially with each iteration and plateaus. This supports the "discard and restart" approach over endless iteration in the same context.

Reflexion (Shinn et al., 2023): Combining short-term memory (recent trajectory) with long-term memory (past experience) enables effective self-improvement. Externalized rules function as that long-term memory.


References

  • Wang, D., et al. (2023). "Learning from Mistakes via Cooperative Study Assistant for Large Language Models." arXiv:2305.13829
  • An, S., et al. (2023). "Learning From Mistakes Makes LLM Better Reasoner." arXiv:2310.20689
  • Palavalli, M. A., et al. (2024). "Using a Feedback Loop for LLM-based Infrastructure as Code Generation." arXiv:2411.19043
  • Shinn, N., et al. (2023). "Reflexion: Language Agents with Verbal Reinforcement Learning." NeurIPS 2023.
