In the second week of my project I noticed my agents were getting slower and dumber. When I checked the logs, I realized why: every single agent was loading the entire project history just to write one line of copy.
That file was 254 lines, around 3,400 tokens. Then there was a separate AGENTS.md — 267 lines, about 3,050 tokens. Two more root files added another 2,200. Every agent loaded all four. Roughly 8,700 tokens just to orient, most of it irrelevant to whatever that agent actually needed to do.
This is the refactor that fixed it.
254 lines and everyone reads everything
Token waste adds up (8,700 per invocation, across 8 agents, across multiple sessions), but that's not the real problem. The real problem is noise diluting signal. An agent working from clean, relevant context does better work than one working from an everything-file where most of what it reads doesn't apply to its job.
I should have seen it earlier. I didn't.
The load-only-if-needed pattern
The fix was just a tedious afternoon of moving files around. Four root files collapsed into one: CLAUDE.md, now at ~1,070 tokens. Pipeline overview, agent roster, done criteria, chain lineup, dev commands — the minimum every agent needs to orient.
Then each agent gets its own spec: agents/[role]/AGENT.md. Small files on purpose. The Copywriter's runs ~420 tokens. QA is ~235. Each covers only what that role needs — inputs, outputs, what it doesn't do, and references to load only when relevant.
Every one of them ends the same way: Await your brief. It will contain all week-specific state. Agents spawn stateless. Stable context (who they are, what they do) lives in the AGENT.md. Week-specific facts arrive in the brief for that session. The separation sounds obvious in retrospect. It wasn't obvious when everything was baked into one file.
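A skeleton of what one of these specs might look like. Everything here is illustrative, not the project's actual file; the point is the shape: stable identity up top, week-specific state deliberately absent.

```markdown
# Copywriter — AGENT.md (illustrative skeleton)

## Inputs
- A brief from the pipeline coordinator

## Outputs
- Draft copy, one file per deliverable

## Not your job
- Visual design, scheduling, QA sign-off

## Load only if relevant
- style-guide.md (tone rules; skip for internal drafts)

Await your brief. It will contain all week-specific state.
```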
Per-agent context load: ~8,700 tokens → ~1,385. Agent output got more focused: fewer irrelevant references, fewer moments where the wrong agent wandered into territory that wasn't its job. Since this is a Claude Code pipeline, the refactor also improved what the agents actually produced: less noise in means less noise out.
Cleaning up the broken links
Every structural rename silently breaks briefs that reference the old path. You move a file, run the pipeline a week later, something fails in a confusing way, you trace it back to a path nobody updated.
The fix is one habit:
grep -r "old-filename" .
After collapsing the four root files, I ran it and found 18 files still pointing to deleted names. 11 active files got updated. 7 were historical records — intentionally left, because those paths were accurate when those files were written. That's not sloppiness. It's a triage decision, and it's worth making deliberately rather than stumbling into it.
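The single grep generalises to a sweep over every deleted name. A minimal sketch in a scratch directory; the file names and contents are hypothetical, but the loop is the habit:

```shell
#!/bin/sh
# Demo in a scratch tree: for each deleted root file (hypothetical names),
# list the files that still reference it, so each hit can be triaged
# deliberately: update it, or keep it as an accurate historical record.
set -e
work=$(mktemp -d)
cd "$work"
echo "see AGENTS.md for the roster" > brief.md   # stale reference
echo "unrelated notes"              > notes.md   # clean file

for old in AGENTS.md OVERVIEW.md; do
  echo "== $old =="
  grep -rl "$old" . || echo "(no stale references)"
done
```

`grep -rl` prints only the file names, which is what you want for triage; drop the `-l` when you need to see the referencing lines themselves.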
The checksum refactor workflow
The grep habit catches stale references. The checksum workflow is what makes large-scale moves tractable in the first place.
Before touching anything structural, snapshot checksums of every file:
find . -type f -not -path '*/.*' | sort | xargs md5sum > checksums_before.txt
Do the moves. Then run it again:
find . -type f -not -path '*/.*' | sort | xargs md5sum > checksums_after.txt
Same hash, different path means the same file moved. Diff the two outputs and you have a rename map automatically — no manual tracking of what went where. Then feed each old path into grep.
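The diff step can be scripted directly off the two snapshots. A sketch under stated assumptions: file and directory names are hypothetical, the snapshot format is `md5sum`'s `<hash>  <path>` lines, and paths contain no spaces (`join` and `awk` split on whitespace):

```shell
#!/bin/sh
# Sketch: derive a rename map from two md5sum snapshots.
# Same hash + different path = the same file moved.
set -e
work=$(mktemp -d)
mkdir -p "$work/tree/docs"
echo "stable content" > "$work/tree/PROJECT_LOG.md"
cd "$work/tree"

# Snapshot before the move (written outside the tree so it doesn't hash itself).
find . -type f -not -path '*/.*' | sort | xargs md5sum > ../before.txt

mv PROJECT_LOG.md docs/log.md   # the structural move

find . -type f -not -path '*/.*' | sort | xargs md5sum > ../after.txt

# Join the snapshots on the hash column; a changed path with an
# unchanged hash is a rename. Output: "old/path -> new/path".
sort -k1,1 ../before.txt > ../b.sorted
sort -k1,1 ../after.txt  > ../a.sorted
join -j1 ../b.sorted ../a.sorted | awk '$2 != $3 { print $2, "->", $3 }'
```

Each line of the resulting rename map is then a ready-made argument for the grep sweep.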
This project moved 76 files in one session using this approach. Without the checksum diff, correlating old references to new paths across multiple terminal sessions would have been a manual tracking problem. With it, the rename map came out of a script. The combination is: checksum diff builds the map, grep burns through the tree, triage decides what to update and what to leave as historical record.
YAML vs Markdown — when each wins
One design rule that came out of this: YAML for status documents, Markdown for instruction documents.
State passed between pipeline stages — session status, open items, key facts — is short, structured, needs to be machine-readable. YAML fits. Agent briefs with code blocks, nested tables, and ordered research questions need Markdown. Forcing those into YAML adds syntax noise and makes them harder to parse.
The rule: if an agent reads it to understand what to do, use Markdown. If a pipeline stage produces it for the next stage to consume, use YAML. That distinction maps to a lot of systems beyond this one.
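As a sketch of the shape this implies for the YAML side (the field names here are mine, not the project's actual schema), a stage-to-stage status document stays short, flat, and trivially machine-readable:

```yaml
# Hypothetical status document produced by one pipeline stage
# for the next stage to consume.
session: week-03
stage: copywriter
status: complete
open_items:
  - headline variant B unreviewed
key_facts:
  - launch date moved to Friday
```

Anything with code blocks, nested tables, or ordered prose goes the other way, into Markdown.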
The honest tradeoff
This architecture has overhead. For a solo one-week project, one context file is fine. You don't need any of this.
The overhead only pays for itself when agents specialise and the pipeline runs across multiple sessions, which is exactly this project's structure. If you're wiring up a couple of agents that each run once, skip it.
It's better now, but I'm still keeping an eye on that 1,000-token root file. It’s already starting to grow again.