DEV Community

demandt

A two-layer documentation pattern for AI coding agents

TL;DR

  • Keep CLAUDE.md short (under ~300 lines). It's auto-loaded into every session and competes with the system prompt for a limited instruction budget.
  • Put deep detail in agent_docs/ and route to it with a well-written table in CLAUDE.md.
  • Write routing-table triggers as task descriptions (what someone would say when assigning work), not as content summaries.
  • Make multi-doc relationships explicit ("See also" rows, "Common multi-doc tasks" section).
  • Audit and maintain the routing table; stale triggers degrade every future session.
  • Evidence from a single-run case study below: the documentation prevented two functional defects that would otherwise have shipped as broken code.

Introduction

A new team member usually doesn't ship working code on their first day. They read the documentation, ask questions, learn the conventions, and gradually internalise how this particular codebase does things. An AI coding agent has no such ramp-up. Every session, it starts fresh, knowing nothing about the project beyond what's in front of it.

That gap is what this article is about. In our experience, one of the most impactful things you can do for an AI agent is to give it the same kind of onboarding documentation you'd give a human, structured so the agent reads the right parts at the right time.

Practically, that means two layers: a short CLAUDE.md file at the repository root (read on every session) and a folder of reference docs the agent pulls in only when a task calls for them. The rest of this article walks through the pattern, the lessons we learned writing it, and a case study comparing what two agents produced when given the same task, one with the documentation and one without.

This page covers:

  • The documentation approach: a two-layer model (CLAUDE.md + agent_docs/) that gives the agent the right context at the right time, plus best practices for writing effective routing tables.
  • Evidence: a case study where the same task was implemented with and without agent documentation, showing concrete differences in code quality, architectural correctness, and functional completeness.

This assumes basic familiarity with VS Code and AI coding assistants, but not with Claude Code specifically. New to Claude Code? See Appendix A for a quick orientation; everything below assumes you have it installed. The patterns generalise to any codebase; the case study at the end uses an abstracted example so it's easy to translate to your own project.

Why context management matters

When Claude encounters a project for the first time, it has zero context: it doesn't know the architecture, naming conventions, or pitfalls. Without guidance, it spends a significant portion of its context window exploring, and can quickly run into context rot: the point where its context is so saturated that output quality degrades and the agent starts confabulating (what's colloquially called "hallucinating").

This is closely related to the "Lost in the Middle" phenomenon. Research by Liu et al. (2023) found that:

"Performance is often highest when relevant information occurs at the beginning or end of the input context, and significantly degrades when models must access relevant information in the middle of long contexts."

This has a direct practical implication: dumping all documentation into the agent's context at once doesn't just waste the context budget; it can actively degrade performance, because the model struggles to retrieve the right information from the middle of a large context.

A guiding principle: progressive disclosure

The solution is progressive disclosure: give the agent the right information at the right time, not everything at once. The pattern described below (a lightweight CLAUDE.md plus an agent_docs/ folder) is inspired by HumanLayer's blog post on writing a good CLAUDE.md. In practice, progressive disclosure means:

  • Always loaded: a lightweight CLAUDE.md file that orients the agent and tells it where to find deeper information.
  • Loaded on-demand: detailed reference docs that the agent reads only when the task requires them.
  • Cross-referenced: docs link to related docs so that the agent pulls in adjacent knowledge when needed.

This is the foundation of the two-layer documentation model described in the next section.

Instruction budget

There's a quantitative dimension to this too. Research by Jaroslawicz et al. (2025) found that frontier thinking models can reliably follow roughly 150–200 instructions, and compliance decays as the count rises. Two findings are particularly relevant:

  • Uniform degradation: as instruction count increases, compliance drops uniformly across all instructions, not just the newer ones. Adding low-value instructions makes the high-value ones less reliable too.
  • Peripheral bias: LLMs attend more strongly to instructions at the very beginning and very end of a prompt, consistent with the "lost in the middle" finding above.

Claude Code's own system prompt already consumes a meaningful fraction of this budget before any user content is added. The exact count isn't directly comparable to the benchmark, since the paper counts discrete benchmark instructions rather than measuring Claude Code specifically, but the directional point holds: your CLAUDE.md competes for the remainder, and every instruction you add erodes compliance with all the others.

Practical length guidance: the common community recommendation is to keep CLAUDE.md under 300 lines, and HumanLayer's own root CLAUDE.md is under 60 lines. Only include instructions that apply to every task; task-specific details belong in agent_docs/.

All of this matters because CLAUDE.md sits at the top of a potential error cascade:

CLAUDE.md → research → plan → code

A flawed line of code is one bad line. A flawed line in a plan produces many bad lines of code. A flawed line in CLAUDE.md, however, affects every phase of every session and every artifact produced. That makes CLAUDE.md one of the highest-leverage files in any project worked on by coding agents, and it deserves correspondingly careful editing.

CLAUDE.md: what it is and how to use it

How Claude uses the CLAUDE.md file

CLAUDE.md is auto-loaded. Every time a Claude Code session starts in the project directory, CLAUDE.md is injected into the conversation context automatically, before any user prompt is processed.

A few things to keep in mind:

  • The agent_docs are NOT auto-loaded. Only CLAUDE.md is. The files in agent_docs/ are read on-demand, which is why the routing-table descriptions are critical.

  • The routing table is the key mechanism. The table in CLAUDE.md maps task intent to documentation:

| Document | When to read |
|---|---|
| architecture.md | Understanding project structure... |
| writing_tests.md | Naming conventions, AAA structure... |

When Claude receives a task, it matches the task's intent against the "When to read" column and reads the relevant doc(s) before acting.

The quality chain:

Clear routing description → Claude picks the right doc → doc provides patterns/conventions → Claude produces consistent code.

This means every new doc needs both its content and an updated row in the routing table with a descriptive "When to read" trigger.

The two-layer documentation model

Layer 1: CLAUDE.md (the entry point)

This file is read by the agent at the start of every session, so it must be deliberately short and fast to scan. It contains:

  • What the project is: one paragraph of orientation.
  • Quick commands: build/test commands ready for copy-paste.
  • A doc-loading directive: a one-line instruction telling the agent to read every doc whose trigger matches the task (see Use hard directives, not soft nudges below).
  • A routing table: a table mapping task types to the relevant agent_docs files.
  • A "Common multi-doc tasks" section: an optional follow-up table making frequent doc combinations explicit (see Make doc relationships explicit below).

Appendix B shows a complete worked example with all of these in place.

Layer 2: agent_docs/ (deep-dive guides)

Detailed reference docs that the agent reads only when the routing table tells it to. A typical structure organised by category:

| Category | Example docs | Purpose |
|---|---|---|
| Architecture | architecture.md | Project structure, execution flows, overall data model |
| Build & run | building_the_project.md | Frameworks, package feeds, platform-specific details |
| Testing | running_tests.md, writing_tests.md | How to run, write, and debug tests |
| New features | adding_a_new_feature.md, code_conventions.md, di_registration_guide.md | Step-by-step recipes and style rules |
| Domain logic | domain_reference.md, integration_patterns.md | How the core business/domain layer and external integrations work |
| UI / Frontend | ui_patterns.md, views_reference.md, components_reference.md | UI framework patterns, data binding, views, and controls |

The key insight is that the agent only reads what it needs. An agent working on a UI change reads the UI docs; one adding a new feature reads the recipe + conventions + DI guide. Loading every doc for every task would waste the context budget and dilute focus.

Prefer pointers to copies. When writing agent_docs, avoid embedding code snippets directly; the risk that they go stale as the codebase evolves is high. Instead, use file:line references pointing to the actual source code (e.g. See CompositionRoot.cs:45 for the registration pattern). The agent reads the referenced file at task time, ensuring it always sees the current code.
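Pointers can go stale too, but they fail loudly: a renamed file or a shortened source file breaks them outright. Below is a minimal sketch of a check you could run in CI, assuming pointers are written as repo-relative `path:line` references (e.g. `src/CompositionRoot.cs:45`). The regex and function name are illustrative, and the check only catches missing files and out-of-range line numbers, not code that has drifted within a file.

```python
import re
from pathlib import Path

# Matches repo-relative pointers such as "src/CompositionRoot.cs:45".
# This format is an assumption about how your agent_docs write pointers.
POINTER = re.compile(r"([\w./-]+\.\w+):(\d+)")


def find_stale_pointers(doc_text: str, repo_root: Path) -> list[str]:
    """Return pointers whose file is missing or shorter than the cited line."""
    stale = []
    for match in POINTER.finditer(doc_text):
        rel_path, line_no = match.group(1), int(match.group(2))
        target = repo_root / rel_path
        if not target.is_file():
            stale.append(f"{match.group(0)} (file not found)")
            continue
        line_count = len(target.read_text(encoding="utf-8").splitlines())
        if line_no < 1 or line_no > line_count:
            stale.append(f"{match.group(0)} (file has {line_count} lines)")
    return stale


if __name__ == "__main__":
    root = Path(".")
    docs_dir = root / "agent_docs"
    if docs_dir.is_dir():
        for doc in sorted(docs_dir.glob("*.md")):
            for problem in find_stale_pointers(doc.read_text(encoding="utf-8"), root):
                print(f"{doc}: {problem}")
```

A failing pointer is a prompt to update the doc in the same change that moved the code.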

What this looks like in practice: with the documentation in place, the agent follows actual project conventions instead of inventing its own, knows the step-by-step recipe for common tasks, avoids known pitfalls (misspelled folders, DI boundary rules, async patterns), and runs verification tests after risky changes. In short, it produces code that looks like the rest of the codebase.

Writing a good routing table

Iterating on our own routing table surfaced three recurring problems and the fixes that significantly improved how reliably Claude picks the right documentation.

Use hard directives, not soft nudges

The original directive said: "Read the relevant doc before modifying that area."

This sounds reasonable to a human, but the agent interpreted it too conservatively; it read zero or one doc, even when the task required two or three. The fix was to make the instruction explicit:

"Read all docs whose 'When to read' column matches the task. Most tasks require more than one."

Takeaway: vague guidance like "read the relevant ones" doesn't perform well. Explicit instructions like "read all matches" give the agent a concrete decision rule.

Write triggers as task descriptions, not content summaries

The original triggers described what each doc contains: implementation-specific phrases like "event bus subscription lifecycle" or "dual storage system." But when someone asks the agent to "publish an event" or "save a record," those phrases don't match.

In practice, the agent picks docs based on how well the trigger text overlaps with the words and concepts in the task description. So triggers work best when they use the same action-oriented vocabulary that people use when assigning tasks.¹

For example, suppose a doc covers the project's messaging layer. A trigger like "message bus subscription lifecycle, channel registration" describes the contents accurately but won't match a user saying "publish an event" or "subscribe to updates." A better trigger is "publishing events, subscribing to events, sending messages between services". This uses the same vocabulary the user actually uses.
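There is no keyword matcher inside Claude, but a crude lexical-overlap score is a handy proxy for spotting triggers that share no vocabulary with typical task phrasing. A minimal sketch using the two triggers above; the stopword list and suffix-stripping "stemmer" are illustrative assumptions, not how the model actually reads the table.

```python
import re

# Words too common to say anything about task intent (illustrative list).
STOPWORDS = {"a", "an", "the", "to", "of", "and", "for", "in", "on", "between"}


def stem(word: str) -> str:
    """Crude suffix stripping so 'publishing' and 'publish' compare equal."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text: str) -> set[str]:
    words = re.findall(r"[a-z]+", text.lower())
    return {stem(w) for w in words if w not in STOPWORDS}


def overlap(task: str, trigger: str) -> int:
    """Count the stemmed terms shared by a task description and a trigger."""
    return len(terms(task) & terms(trigger))


weak = "message bus subscription lifecycle, channel registration"
strong = "publishing events, subscribing to events, sending messages between services"
print(overlap("publish an event", weak))    # 0
print(overlap("publish an event", strong))  # 2
```

A score of zero against a handful of realistic task phrasings is a good hint that a trigger is written as a content summary rather than a task description.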

More examples of weak vs. strong triggers, paired by doc:

| Doc | Weak trigger | Strong trigger | Why the strong version works |
|---|---|---|---|
| architecture.md | "Understanding project structure" | "orienting in the codebase, where major components live, how data flows through the system" | Action-oriented; multiple distinct phrases someone would actually use |
| code_conventions.md | "Layered command naming, async rules, error-handling contract" | "naming a new class, async/await rules, error handling, validating input" | Uses the verbs people reach for when assigning a task |
| di_registration_guide.md | "Registration ownership and lifetime configuration" | "DI container, service registration, lifetime rules, common mistakes" | Multiple distinct match points in vocabulary people search for |

Common vocabulary-mismatch patterns:

  • Implementation names instead of intent. A doc triggered on "SignalR connection lifecycle" won't match a user saying "call the backend service." The user usually doesn't know (or care) which transport library is involved.
  • Internal terminology instead of user-facing actions. A doc triggered on "dual storage layer" won't match "save the record." Use the verb someone would type into an issue tracker.
  • Overlapping triggers with no disambiguation. If writing_tests.md and running_tests.md both just trigger on "test," the agent might read the wrong one first. Add discriminators ("how to author a test" vs. "how to execute the suite").
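The third pattern can be flagged mechanically before it causes a wrong pick. The sketch below scores every pair of triggers by word-level Jaccard similarity (shared distinct words divided by all distinct words). The 0.5 threshold and the function names are assumptions to tune, and a low score doesn't guarantee the agent will discriminate correctly; it only catches the obvious collisions.

```python
import re
from itertools import combinations


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words two triggers have in common (0.0 to 1.0)."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def ambiguous_pairs(triggers: dict[str, str], threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return doc pairs whose triggers overlap at or above the threshold."""
    flagged = []
    for (doc_a, trig_a), (doc_b, trig_b) in combinations(sorted(triggers.items()), 2):
        score = jaccard(trig_a, trig_b)
        if score >= threshold:
            flagged.append((doc_a, doc_b, round(score, 2)))
    return flagged


before = {"writing_tests.md": "test", "running_tests.md": "test"}
after = {
    "writing_tests.md": "how to author a test, naming, fixtures",
    "running_tests.md": "how to execute the suite, filtering, failures",
}
print(ambiguous_pairs(before))  # [('running_tests.md', 'writing_tests.md', 1.0)]
print(ambiguous_pairs(after))   # []
```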

Takeaway: write triggers from the perspective of "what task would someone assign?", not "what does the document cover?"

Make doc relationships explicit

Some docs almost always belong together (e.g. writing_tests.md + running_tests.md), but a flat routing table gives no signal about this. Similarly, adding_a_new_feature.md and code_conventions.md may both mention naming conventions; if someone says "add a feature," which one triggers? Both should, but the table doesn't indicate they're needed together. The same overlap exists for ui_patterns.md and views_reference.md: someone saying "modify a view" or "change the UI" could need either or both.

Two fixes:

  • "See also" cross-references in the relevant routing-table rows.
  • A "Common multi-doc tasks" section below the table with explicit combinations.

For example, the routing table could include a section like:

| Task | Read together |
|---|---|
| Adding a new feature | adding_a_new_feature.md + code_conventions.md + di_registration_guide.md |
| Writing a test for a new feature | writing_tests.md + running_tests.md + di_registration_guide.md |
| Modifying a view | ui_patterns.md + views_reference.md |
| Calling an external service | integration_patterns.md + domain_reference.md |

Takeaway: don't assume the agent will infer relationships between docs. Make them explicit.
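The "See also" rows and the multi-doc table amount to a small graph: each doc points at the docs that should be read alongside it. A minimal sketch of that structure, with hypothetical data mirroring the table above. It expands a set of directly matched docs transitively, which is a design choice; following only one hop is a reasonable alternative if chains get long.

```python
# Hypothetical "See also" data: each doc maps to docs that belong with it.
SEE_ALSO: dict[str, list[str]] = {
    "adding_a_new_feature.md": ["code_conventions.md", "di_registration_guide.md"],
    "writing_tests.md": ["running_tests.md", "di_registration_guide.md"],
    "ui_patterns.md": ["views_reference.md"],
    "integration_patterns.md": ["domain_reference.md"],
}


def docs_to_read(matched: set[str]) -> set[str]:
    """Expand directly matched docs with their 'See also' docs, transitively."""
    result: set[str] = set()
    pending = list(matched)
    while pending:
        doc = pending.pop()
        if doc in result:  # already included; also guards against cycles
            continue
        result.add(doc)
        pending.extend(SEE_ALSO.get(doc, []))
    return result


print(sorted(docs_to_read({"writing_tests.md"})))
```

Writing the relationships down as data, even if only in your head, makes it obvious which rows of the routing table still need a "See also" entry.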

Additional tips

Don't use CLAUDE.md for style enforcement. It's tempting to add formatting rules, naming conventions, and code-style instructions to CLAUDE.md, but this bloats the context with rules that are irrelevant to most tasks. LLMs are in-context learners, so if the codebase follows consistent patterns, the agent will generally follow them without explicit instructions. Instead, configure a Claude Code Stop hook that runs your formatter/linter automatically after code changes and feeds errors back to Claude. This is faster, cheaper, and more reliable than instruction-based style enforcement.
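A minimal sketch of such a hook in Python. It assumes the behaviour described in the Claude Code hooks documentation at the time of writing: the hook receives a JSON payload on stdin that includes a `stop_hook_active` flag, and exiting with code 2 blocks the stop and shows stderr to Claude. The lint command (`dotnet format --verify-no-changes`) is only an example, and the function names are hypothetical; check the current hooks reference for the settings schema used to register the script as a `Stop` hook.

```python
import json
import subprocess
import sys

# Example lint command; substitute your project's formatter or linter.
LINT_CMD = ["dotnet", "format", "--verify-no-changes"]


def run_stop_hook(payload: dict, lint_cmd: list[str]) -> tuple[int, str]:
    """Return (exit_code, message); exit code 2 asks Claude to keep working."""
    # If this hook already blocked a stop, let the session end to avoid a loop.
    if payload.get("stop_hook_active"):
        return 0, ""
    result = subprocess.run(lint_cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return 0, ""
    return 2, "Lint failed; fix these issues:\n" + result.stdout + result.stderr


def main() -> None:
    """Entry point when Claude Code runs this file as a Stop hook."""
    code, message = run_stop_hook(json.load(sys.stdin), LINT_CMD)
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

To use it as the hook script, end the file with `if __name__ == "__main__": main()`. The `stop_hook_active` check matters: without it, a lint error Claude can't fix would keep the session from ever stopping.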

Treat /init output as a skeleton, not a deliverable. Claude Code's /init command auto-generates a CLAUDE.md, but the result tends to be generic and unfocused. It's fine as a starting point, but expect to rewrite most of it: given that CLAUDE.md sits at the top of the error cascade, every line should be deliberate.

How to phrase task assignments

Even with a well-tuned routing table, how you phrase a task to Claude affects which docs it reads. There are three escalation levels:

Level 1: mention the area. Works when the routing table triggers are well-written and the task maps cleanly to a single area:

"Add a new CLI command for archiving inactive users."

Claude will usually match this to the right doc, but as the examples above show, vague triggers or multi-area tasks can cause it to read 0–1 docs when 2–3 are needed. Use Level 2 or 3 when the task spans multiple areas.

Level 2: name the doc explicitly. Removes all guesswork:

"Read agent_docs/integration_patterns.md first, then implement the API call."

Level 3: frame the doc as a constraint. This both points Claude to the doc and tells it to treat the doc's patterns as binding:

"Following the patterns in adding_a_new_feature.md, add an archive-user command."

The main failure mode is when a task spans multiple areas but the phrasing only triggers one doc. For example, "add a UI command that calls a backend service" touches adding_a_new_feature.md, integration_patterns.md, di_registration_guide.md, and ui_patterns.md. Claude might only pick up 1–2. In those cases, Level 2 or 3 is safest.

Practical tip: for complex tasks, open the session with:

"Before making any code changes, read all relevant docs from agent_docs/ and confirm which ones you've read."

This forces the agent to be deliberate about its doc selection and lets you verify before it starts coding.

How this connects to the routing table: the better the triggers are written, the less you need to rely on explicit doc references in your task phrasing. A well-tuned routing table means Level 1 works most of the time; a poorly tuned one requires Level 2 or 3 for every task.

Case study: with documentation vs. without documentation

To test whether the two-layer documentation model actually makes a difference, the same task was run through two agents in a real codebase:

Task: add a new CLI command that takes two arguments (a list of entity IDs and a boolean flag) and calls into the service layer to perform the operation.

One agent had a full CLAUDE.md routing table and four agent_docs/ files; the other had no guidance documentation at all. The codebase used a layered architecture: CLI verbs registered in a central registry, commands resolved via a DI container, and a coordinator service that lazily obtains a client connection.

Note on the example. The task and codebase have been described in generic terms (CLI command, entity IDs, coordinator service, etc.) rather than using the original class, file, and feature names. This is deliberate: it keeps internal implementation details out of a public write-up, and the patterns translate more cleanly to other codebases when they're not tangled up with a single project's vocabulary.

Caveat on scope. This is a single task run once per condition, so the findings are illustrative rather than statistically significant. The specific defects shown below are what happened this time; a different task might expose different gaps. Treat this as a case study, not a controlled experiment.

Result summary

| Aspect | With docs | Without docs |
|---|---|---|
| Works end-to-end | Yes | No (two independent failures) |
| CLI verb registered in the framework registry | Yes | No (command invisible to the parser) |
| Injected dependency | Coordinator service (registered) | Low-level client (not registered, crashes) |
| Tests would catch the misregistration | Yes (DI-based setup would fail on wrong registration) | No (direct construction bypasses DI) |

Both agents produced code that compiles. But only the agent guided by a well-written CLAUDE.md file produced code that works.

Functional failures (without docs)

Failure 1: missing CLI registration. Command is invisible. The without-docs agent never added the new command type to the framework's verb-registry array. The CLI parser only recognises verbs listed in that array, so the command is fully wired in the DI container but unreachable from the command line. The with-docs agent got this right on the first pass because the recipe doc explicitly listed this step.
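The failure mode is easy to reproduce in miniature. The sketch below is a hypothetical Python model (the original codebase, framework, and class names aren't shown in this article): the parser only knows verbs listed in a registry tuple, and a one-line guard comparing all command classes against that registry would have caught the omission in a unit test.

```python
class ListUsersCommand:
    verb = "list-users"


class ArchiveUsersCommand:
    verb = "archive-users"


# The parser only recognises verbs listed here; ArchiveUsersCommand was never added.
VERB_REGISTRY = (ListUsersCommand,)


def parse(argv: list[str]) -> type:
    """Map the first CLI argument to a registered command class."""
    known = {cls.verb: cls for cls in VERB_REGISTRY}
    if not argv or argv[0] not in known:
        raise ValueError(f"unknown verb: {argv[0] if argv else '(none)'}")
    return known[argv[0]]


def unregistered(commands: list[type], registry: tuple) -> list[str]:
    """Guard for a unit test: every command class must appear in the registry."""
    return [cls.__name__ for cls in commands if cls not in registry]
```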

Failure 2: DI resolution crash. The without-docs agent injected the low-level service client directly, but that type is never registered in the DI container. Every other command in the codebase resolves a higher-level coordinator interface and lazily obtains the client through it. The unregistered injection throws an exception at startup, crashing before the command ever executes.

Why the tests didn't catch this: the without-docs test bypasses DI entirely; it directly constructs the command with mocked dependencies. This sidesteps both failures: it never touches the verb registry and never resolves through the DI container. The with-docs agent's tests use the real DI container with mocked services, which would have surfaced the misregistration.
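Both halves of this failure fit in a few lines. The sketch uses a deliberately tiny, hypothetical container rather than any real DI library, and all class names are made up: the low-level client is never registered, the coordinator creates it lazily, and a check that resolves commands through the real wiring (with only the coordinator swapped for a mock) fails where direct construction passes.

```python
from typing import Callable, Dict, Optional
from unittest.mock import Mock


class Container:
    """A deliberately tiny DI container (hypothetical; not a real library)."""
    def __init__(self) -> None:
        self._factories: Dict[type, Callable[["Container"], object]] = {}

    def register(self, service: type, factory: Callable[["Container"], object]) -> None:
        self._factories[service] = factory

    def resolve(self, service: type) -> object:
        if service not in self._factories:
            raise LookupError(f"No registration for {service.__name__}")
        return self._factories[service](self)


class ServiceClient:
    """Low-level client: never registered in the container."""


class Coordinator:
    """Registered service that creates the client lazily, on first use."""
    def __init__(self) -> None:
        self._client: Optional[ServiceClient] = None

    def get_client(self) -> ServiceClient:
        if self._client is None:
            self._client = ServiceClient()
        return self._client


class GoodCommand:
    def __init__(self, coordinator: Coordinator) -> None:
        self.coordinator = coordinator


class BadCommand:
    def __init__(self, client: ServiceClient) -> None:
        self.client = client


def build_container() -> Container:
    """Production-style wiring; note that ServiceClient has no registration."""
    container = Container()
    container.register(Coordinator, lambda r: Coordinator())
    container.register(GoodCommand, lambda r: GoodCommand(r.resolve(Coordinator)))
    container.register(BadCommand, lambda r: BadCommand(r.resolve(ServiceClient)))
    return container


def check_direct_construction() -> None:
    # Passes: a hand-built mock hides the fact that the container can't supply it.
    BadCommand(Mock(spec=ServiceClient))


def check_resolves_through_container(command_type: type) -> object:
    # Real wiring, with only the external-facing coordinator replaced by a mock.
    container = build_container()
    container.register(Coordinator, lambda r: Mock(spec=Coordinator))
    return container.resolve(command_type)
```

Resolving `BadCommand` raises `LookupError`, which is the in-miniature version of the startup crash; the direct-construction check never notices.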

Architectural and convention differences

Beyond the two functional failures, the without-docs agent diverged from project conventions in many smaller ways:

| Aspect | With docs | Without docs |
|---|---|---|
| Class visibility | internal sealed (matches existing commands) | public (breaks convention; only works because tests construct it directly) |
| Injected dependency | Coordinator → lazy client lookup | Raw client directly (not registered, crashes) |
| Exception for unused members | Type specified by the docs | Different type; matches the de facto majority of existing code but inconsistent with the docs |
| Constructor parameter order | Logger factory first (matches all commands) | Logger last (reversed) |
| Help-text casing | Lowercase first letter (matches convention) | Capitalised (breaks convention) |
| Input validation | Explicit null + empty-collection checks | Cast-and-null-check only; missed empty-collection case |
| Validation placement | Before try/catch (fail-fast) | Inside try/catch (error gets caught and re-logged) |
| Error message style | Neutral, informational | Prefixed with "ERROR:" (redundant: the exception already signals an error) |
| Test assertion library | Matched the existing test project | Different library/style |
| Test namespace | Correct project namespace | Plausible-looking but wrong namespace |
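The two validation rows are the easiest to illustrate. A minimal Python sketch, with a hypothetical function, flag name (`dry_run`), and service object, since the article doesn't show the original code: input is checked before the `try` block so bad arguments fail fast with a neutral message, and only operational errors from the service layer are caught and logged.

```python
import logging
from typing import Optional, Sequence

logger = logging.getLogger("archive")


def archive_entities(entity_ids: Optional[Sequence[str]], dry_run: bool, service) -> int:
    """Hypothetical command body: validate first, then call the service layer."""
    # Fail fast: these checks sit outside the try block, so a bad argument
    # surfaces immediately instead of being caught and re-logged below.
    if entity_ids is None:
        raise ValueError("entity_ids is required")
    if len(entity_ids) == 0:
        raise ValueError("entity_ids must contain at least one ID")
    try:
        return service.archive(list(entity_ids), dry_run)
    except ConnectionError:
        # Only operational failures from the service layer are logged here.
        logger.exception("archive call to the service layer failed")
        raise


class FakeService:
    """Stand-in for the service layer, used only in the usage example."""
    def archive(self, ids: list, dry_run: bool) -> int:
        return 0 if dry_run else len(ids)


print(archive_entities(["a", "b"], False, FakeService()))  # 2
```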

Files changed

| File | With docs | Without docs |
|---|---|---|
| Param interface | Created (identical) | Created (identical) |
| Core command class | Created | Created |
| CLI wrapper class | Created | Created |
| DI registration file | Modified | Modified |
| Verb-registry file | Modified (added the new verb) | Not touched (bug 1) |
| Unsolicited README | Not created | Created |
| Unit test file | Created | Created |

The without-docs agent also created an unsolicited README.md for the new feature, a common pattern when nothing in the prompt constrains scope.

Bottom line

Documentation prevented two functional defects. The step-by-step recipe explicitly included the verb-registration step. The DI guide made clear which interface was the correct injection point. Without that guidance, the agent made two reasonable-looking but fundamentally broken choices.

Maintaining the docs

Agent documentation is only as useful as it is accurate. Stale docs are worse than no docs; they actively mislead the agent into reproducing outdated patterns. A few lightweight practices keep the system healthy:

  • Ownership. Assign an owner (or rotating owner) for CLAUDE.md and the routing table. Individual agent_docs/ files can be owned by whoever owns the area they describe.
  • Update when you change the pattern. If you change a convention the docs describe (e.g. switch DI containers, rename a folder, change the test-assembly naming), update the relevant doc in the same PR. Treat docs as part of the code change.
  • Prefer pointers over copies. As noted earlier, reference source files with file:line pointers rather than embedding code. This reduces the maintenance burden.
  • Review the routing table periodically. Quarterly is a reasonable default. Check that triggers still match how people phrase tasks, that no docs have been added without a routing row, and that removed docs no longer appear.
  • Watch for symptoms of rot. If an agent repeatedly picks the wrong doc, produces code that ignores a convention, or asks questions the docs should answer, the trigger text or the doc itself needs revision.

Appendix A: Claude Code and the VS Code extension

Claude Code is Anthropic's agentic coding tool. It's available across several surfaces (a terminal CLI, a VS Code extension, a JetBrains plugin, a desktop app, and a web interface), all backed by the same engine, so CLAUDE.md, settings, and MCP servers carry across them (Anthropic, n.d.-a). This guide focuses on the VS Code extension, which provides a graphical panel inside the IDE and ships with the CLI bundled for use in the integrated terminal (Anthropic, n.d.-b). For current detail on installation, supported models, and configuration, refer to those docs directly. The rest of this guide is about how to feed the agent good context, not which surface or model you choose.

Appendix B: a sample CLAUDE.md

The example below is a complete CLAUDE.md you can adapt. The doc names (architecture.md, adding_a_new_feature.md, etc.) match the ones used in the article and can be reused as-is in most projects. The placeholders in angle brackets (the orientation paragraph and the build/test commands) are the only parts you genuinely need to customise per project.

````markdown
# CLAUDE.md

## What this project is

<One paragraph: what the system does, who uses it, the major components and how
they fit together. 3–5 sentences max. The goal is to orient the agent, not to
teach it the codebase; the agent_docs/ folder does that.>

## Quick commands

```bash
# Build
<build command>

# Run all tests
<test command>

# Run a single test by name
<filtered-test command>
```

## How to use the docs in agent_docs/

Read **all** docs whose "When to read" column matches the task. Most tasks
require more than one. Don't guess at conventions: the docs exist precisely
because guessing produces code that looks plausible but breaks the existing
patterns.

| Document | When to read |
|---|---|
| [architecture.md](agent_docs/architecture.md) | Orienting in the codebase; understanding execution flow, layering, where major components live |
| [building_the_project.md](agent_docs/building_the_project.md) | Build failures, dependency issues, target frameworks, package feeds, platform-specific setup |
| [running_tests.md](agent_docs/running_tests.md) | Running the test suite, filtering tests, debugging test failures |
| [writing_tests.md](agent_docs/writing_tests.md) | Authoring a new test: naming conventions, fixture setup, mocking style, assertion library |
| [adding_a_new_feature.md](agent_docs/adding_a_new_feature.md) | Adding a new command, endpoint, or feature: step-by-step recipe |
| [code_conventions.md](agent_docs/code_conventions.md) | Naming, async patterns, error handling, validation style, exception choice |
| [di_registration_guide.md](agent_docs/di_registration_guide.md) | Wiring up a new service, choosing a lifetime, fixing DI resolution errors |
| [integration_patterns.md](agent_docs/integration_patterns.md) | Calling an external service, publishing/subscribing to events, retry behaviour |
| [domain_reference.md](agent_docs/domain_reference.md) | Working with core domain types, IDs, and shared value objects |

### Common multi-doc tasks

| Task | Read together |
|---|---|
| Adding a new feature | `adding_a_new_feature.md` + `code_conventions.md` + `di_registration_guide.md` |
| Writing a test for a new feature | `writing_tests.md` + `running_tests.md` + `di_registration_guide.md` |
| Calling an external service | `integration_patterns.md` + `domain_reference.md` + `code_conventions.md` |
| Debugging a DI resolution failure | `di_registration_guide.md` + `architecture.md` |
````

A few notes on why it's structured this way:

  • The doc-loading directive comes immediately before the table. It's a hard rule the agent needs every time, and putting it adjacent to the artefact it governs makes it harder to miss.
  • Every row lists multiple match points. Each "When to read" cell uses several action-oriented phrases (e.g. "authoring a new test: naming conventions, fixture setup, mocking style") so a user saying any of them reliably triggers the doc.
  • "Common multi-doc tasks" is a separate table. The main table answers "which doc handles X?" The multi-doc section answers "which combinations belong together?" Mixing them dilutes both.
  • What was deliberately left out. No code-style rules, no formatter configuration, no per-language guidance. Those belong in code_conventions.md or a Stop hook, not in CLAUDE.md.

Appendix C: example project layout

A typical repository using this pattern looks like:

```text
my-project/
├── CLAUDE.md                ← auto-loaded into every Claude Code session
├── agent_docs/              ← read on-demand via the routing table
│   ├── architecture.md
│   ├── building_the_project.md
│   ├── running_tests.md
│   ├── writing_tests.md
│   ├── adding_a_new_feature.md
│   ├── code_conventions.md
│   ├── di_registration_guide.md
│   ├── integration_patterns.md
│   └── domain_reference.md
├── src/
│   └── ...                  source code
├── tests/
│   └── ...                  test code
└── README.md                ← human-facing entry point
```

A few conventions worth noting:

  • CLAUDE.md sits at the repository root. Claude Code loads it from there when a session starts; nesting it inside agent_docs/ means it won't be auto-loaded at session start.
  • agent_docs/ is a flat folder. Subdirectories work, but a flat layout keeps the routing-table paths short and easy to scan.
  • README.md and CLAUDE.md serve different audiences. README is for humans landing on the repo for the first time. CLAUDE.md is for the agent. There will be overlap (project summary, build commands), but resist the urge to merge them. The audiences and length constraints are very different.
  • The doc set scales with the codebase. Nine files is a reasonable starting point; large monorepos often grow to 15–20. The limit is not the file count but the routing table's ability to discriminate between them. If two docs frequently get confused, that's a signal to either merge them or sharpen their triggers, not to add a third.

References

Anthropic. (n.d.-a). Overview. Claude Code documentation. https://code.claude.com/docs/en/overview

Anthropic. (n.d.-b). Use Claude Code in VS Code. Claude Code documentation. https://code.claude.com/docs/en/vs-code

HumanLayer. (2025, November 25). Writing a good CLAUDE.md. https://www.humanlayer.dev/blog/writing-a-good-claude-md

Jaroslawicz, D., Whiting, B., Shah, P., & Maamari, K. (2025). How many instructions can LLMs follow at once? (arXiv:2507.11538). arXiv. https://arxiv.org/abs/2507.11538

Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2023). Lost in the middle: How language models use long contexts (arXiv:2307.03172). arXiv. https://arxiv.org/abs/2307.03172

Olsson, C., Elhage, N., Nanda, N., Joseph, N., DasSarma, N., Henighan, T., Mann, B., Askell, A., Bai, Y., et al. (2022). In-context learning and induction heads (arXiv:2209.11895). arXiv. https://arxiv.org/abs/2209.11895

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need (arXiv:1706.03762). arXiv. https://arxiv.org/abs/1706.03762


  1. How this works under the hood: there's no separate routing engine. Claude Code injects the entire CLAUDE.md into the model's context at session start, so when you send a task the model sees both your prompt and the routing table simultaneously. Choosing which doc to read is just another next-token prediction, computed through the standard transformer attention mechanism (Vaswani et al., 2017). A plausible contributor to why routing tables work is what Olsson et al. (2022) call induction heads: attention circuits that learn to complete patterns of the form [A][B] ... [A] → [B]. A routing-table entry resembles this pattern: the trigger text is A, the doc filename is B. When the user's task contains tokens resembling A, the model is more likely to attend to that row and emit a tool call referencing B. This is likely why triggers that share vocabulary with how users phrase tasks get picked up more reliably.
