Ian Johnson

Posted on Jun 4

Multi-agent, One Harness

#ai #webdev #programming #agents

Half the team uses Claude Code. A third uses Cursor. Two engineers swear by Aider. One was running codex out of curiosity for a sprint. Five different agents reading what should have been five copies of the same conventions; predictably, they were not. The Claude Code users had the most complete CLAUDE.md. The Cursor users had .cursor/rules files that overlapped with it. The Aider users had an .aider.conf.yml plus a CONVENTIONS.md they had drifted from the others.

The team was getting drift across tools, not because the engineers disagreed about how to write code, but because each agent had a slightly different copy of the rules and nobody was syncing them.

The fix was the one nobody wanted to do first: a single source of truth, and a strategy for how each agent's tool-specific config pointed at it. That is the post. What goes in the shared file, what stays per-tool, and where the harness has to fork.

What translates cleanly

Most of the harness is plain English. "When you touch the migrations directory, run the test suite against a real Postgres, not the mocked client." That sentence is meaningful to Claude Code, to Cursor, to Aider, to any agent that reads natural language and acts on it. The rule is portable because the rule is about the codebase, not about the tool.

The convention rules port. The architectural constraints port. The escalation patterns port (mostly; the syntax for stopping varies). The "do not edit legacy/ without coordination" rule ports. The file-path scoping ports, since the directory structure is shared.

The bulk of a healthy CLAUDE.md is portable. The shape that worked for us was to put the portable rules in a file at the project root that every agent could read, and have each tool's config point at that file.

For Claude Code, that means a top-level CLAUDE.md. For Cursor, a top-level .cursor/rules/ file that imports the same content, or a symlink, or a build step that generates the tool-specific files from the shared source. For Aider, the CONVENTIONS.md flag points at the shared file. For codex, the equivalent config.

The principle is shared content, per-tool wiring. Not five copies.

What does not translate

Some rules are tool-specific in a way that does not survive the translation.

Slash commands and skills. Claude Code has slash commands and skills; Cursor has its own custom modes; Aider has macros. The rule "run the security audit via the /security-audit skill before merging" is a Claude Code-only sentence. The equivalent for the other tools is a different sentence pointing at a different command. These do not share; each tool gets its own block.

Tool-specific failure modes. Each agent has its own bad habits. Claude Code tends toward over-eager refactors. Cursor has a different signature failure on multi-file changes. Aider's failure modes around context loading are different from either. The rules that catch tool-specific failure modes belong in tool-specific configs, because they are not relevant to the other agents and they pay a token cost wherever they live.

Hook syntax and integration. Claude Code's settings.json hooks, Cursor's commands, Aider's pre-commit integration; the rules that reference these point at tool-specific machinery. The rule "the pre-commit hook will reject this; do not bypass it" is portable. The rule "the verify skill runs the checks defined in .claude/skills/verify/SKILL.md" is not.

Context window limits. Each agent has different context window math. A rule that says "if the file is more than 2,000 lines, read the section you need rather than the whole file" might be load-bearing for one tool and irrelevant for another. The rules about how the agent should manage its own context belong to the tool whose context it is.

The rule of thumb: anything about the codebase shares; anything about the tool stays.

The forking question

The hardest part is not the easy splits. The hardest part is the rules that should mostly share, with a small per-tool variation.

The escalation ladder is the classic example. "Stop and ask before editing more than three files at once" is a portable rule. The way the agent communicates the stop ("ask via the assistant message" vs. "open a side panel" vs. "post to the chat") is tool-specific. Do you write one rule or two?

The pattern that has worked: write the portable rule in the shared file, with the imperative in plain English. Each tool's config has a one-line addition that says "when this rule applies, do X in this tool" with the tool-specific mechanism.

So the shared file has the rule. Each tool's file has the implementation hook. The shared file does not balloon with five tool-specific clauses; each tool's file does not duplicate the shared rule. The fork is at the smallest possible scope.

This works for maybe 80% of the rules with a tool-specific edge. The remaining 20% are genuinely different enough that they get separate rules in separate files. Pretending they are the same does not save effort.

The sync problem

The naive solution is a single shared file plus tool-specific configs that all reference it. The reference can be a symlink, an include directive, a build step, or a convention. The agents will all read the same conventions content on every session. The drift is structurally impossible, because there is one file.

The simpler structure wins for the same reason simpler systems usually win: the failure modes are obvious. If the shared file is missing, every agent fails the same way and you fix it once. If it is wrong, you fix it once and every agent improves on the next session.

The team practice

The team practice that holds the structure together is one rule: harness changes that are portable land in the shared file. Harness changes that are tool-specific land in the tool-specific file.

The PR review enforces this. Someone adding a tool-specific clause to the shared file gets pushed back to move it. Someone adding a portable clause to a tool-specific file gets pushed back to move it the other way. The discipline is small but constant.

The second practice: when a new tool joins the team, the engineer onboarding it writes the tool-specific config and the team reviews it for what should be moved into the shared file. The review surfaces conventions that had been implicit and tool-specific without anyone noticing.

The third practice: drift audits. Once a quarter, someone reads all the tool-specific configs and checks for portable content that has snuck in. The audit takes thirty minutes and usually surfaces two or three rules that should be moved.

Where the harness has to fork

There are cases where the harness genuinely has to fork. The agent's capabilities are different. The right rule for an agent that can read a lot of context at once is a different rule than for an agent that can read very little.

Claude Code with the 1M-context model has a different harness than Claude Code with a smaller model would, if I were running both. The rule "read the full file before editing" is fine in the first case and unworkable in the second.

If your team is running agents with very different capabilities, the harness fork is real and you cannot abstract it away. The shared file is what they have in common; the tool-specific files do real work, not just integration glue.

The lesson: aim for shared, accept the fork when it is real, do not pretend a fork is shared.

What the shared file should not be

The shared file is not a dumping ground for everything anyone wanted in the harness. It is the rules that genuinely apply to any agent that touches this codebase. The bar is higher, not lower, because the file is being read by every agent on the team.

I have seen teams put their entire harness in a "shared" file and then complain that the agents behave inconsistently. The agents are reading the same file; the file is not designed for what it is being asked to do. Half the rules are tool-specific and the agents that do not need them are paying the token cost and the attention cost on every session.

The shared file is small and dense. Each tool's file is larger and has the tool-specific work. That inversion of the typical "shared = bigger" assumption is what makes the structure stable.

Keystone is agent-agnostic by design

Keystone is the scaffolder I built around this split. The harness is one directory of markdown; the agent-specific wiring is one small file per tool.

The harness lives at the project root. Four layers: corpus, guides, sensors, and flywheels. None of it knows which agent will read it. That is the design. The harness is agent-agnostic at the ground floor, not patched in after the fact.

The adapter pattern handles the rest. Each agent gets a small directory under harness/adapters/<agent>/ with three files: lifecycle.md, sensors.md, and activation.md. The activation file is what the agent reads first: CLAUDE.md for Claude Code, AGENTS.md for Codex CLI, .cursor/rules/000-harness.mdc for Cursor. Its job is to point the agent at the shared harness by convention.

Adapters for Claude Code, Codex CLI, and pi.dev are real. The rest ship with features gaps: a minimal lifecycle file and a working menu, enough to start. Gaps are handled with warning messages during init, with suggested actions to resolve the problem. There is also a generic fallthrough. It is a contract for what an adapter must do plus a default instruction to read the shared corpus. A new agent joining the team is not a rewrite; it is filling in the stub.

The win is sync. There is one directory every agent points at, by standard. If a rule changes, it changes in one place. The drift problem cannot come back because there is nowhere for it to come back from.

What I would tell someone running multiple agents

Audit your tool-specific configs for content that is actually about the codebase, not the tool. Move the codebase content into one shared file. Have each tool's config reference it.

Establish the discipline that portable content lands in the shared file and tool-specific content lands in the tool-specific file. Enforce it in PR review. Audit quarterly.

Accept that some rules genuinely have to fork. The forks are not failures of the architecture; they are the architecture telling you the truth about your tools' differences. The teams that try to flatten the forks end up with rules that fit no agent well.

The unified harness is one file the team agrees on, plus the tool-specific glue. Not one file pretending to do everything. The discipline is to keep the shared file shared and the specific files specific.

The drift goes away when the source of truth is singular. The rest is wiring.

And if you're interested, try out Keystone! 👋

DEV Community