I run agent swarms now. Not "an agent" — agents, plural, in flight at once, each working on a different feature against the same repo. Ten agents per session is normal. Twenty isn't unusual when the spec is well-decomposed. The token math works, the wall-clock math works, the model latency hides inside the swarm because something is always landing while something else is still compiling. The economics make a strong case for parallel execution as the default.
Until you hit the wall everyone hits: two agents touched the same file.
I've spent the better part of the year fighting this. I've shipped four layers of runtime defense. They all work and none of them are the answer. The answer turned out to be one attribute on the spec. This is the post about that one attribute.
1. The four layers nobody told you you'd need
Before I describe the fix, let me describe the disease — because if you're running parallel agents and you don't recognize this stack, you're probably going to recognize it next week.
When two agents both want to edit `src/router/routes.py` at the same time, here's what claw-forge (the harness I work in) does:
- **File-claim locks.** Each task declares `touches_files=[...]` upfront. The dispatcher refuses to start a second task that wants a file currently held by a running task. The second task defers to the next dispatch cycle. (A sketch of this check follows the list.)
- **Pre-dispatch worktree sync.** Before the agent runs, the harness merges `target_branch` into the feature branch inside the worktree. If `target` moved while the task was queued, the merge happens before any token is spent. Conflicts surface as `resume_conflict:` failures with the offending file list.
- **Catch-up rebase inside `squash_merge`.** When the agent's branch finally squash-merges to main and conflicts with concurrent work, the harness merges target into the branch and retries the squash automatically.
- **Resume-on-retry preamble.** If a task fails mid-run, the next attempt picks up the worktree as-is, with a prompt prefix listing what's already committed and what failed last time. The agent doesn't redo the first 60% of the work.
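To make layer 1 concrete, here's a minimal sketch of a claim check, assuming glob-style claims like `src/plugins/auth/**`. This is my illustration of the idea, not claw-forge's actual dispatcher code; treating two patterns as overlapping when either matches the other is a rough heuristic that errs toward blocking.

```python
from fnmatch import fnmatch

def claims_overlap(a: list[str], b: list[str]) -> bool:
    # Rough heuristic: two claim sets conflict if any pattern in one
    # matches (or is matched by) a pattern in the other.
    return any(fnmatch(x, y) or fnmatch(y, x) for x in a for y in b)

def can_dispatch(candidate_claims: list[str],
                 running_claims: list[list[str]]) -> bool:
    # Refuse dispatch while any running task holds a conflicting claim;
    # the deferred task simply waits for the next dispatch cycle.
    return not any(claims_overlap(candidate_claims, held)
                   for held in running_claims)
```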
This stack is correct. Each layer earns its keep. If I deleted any one of them, real users would file real bug reports within 48 hours. But notice what they all have in common: they are reactive. Every layer is a response to "two agents touched the same file." The conflict has already happened by the time the layer fires.
What if it never happened?
2. Conflicts are usually predictable from architecture
Sit down with a senior engineer who has worked on a codebase for six months. Hand them a list of feature requests. Ask: "If we built these in parallel with one engineer per feature, where would the merge conflicts happen?" They'll name the conflict points within five minutes, and they'll be right. They don't run the merges. They look at the codebase's structure and know.
The reason they know is that conflicts cluster around architectural surfaces. A few specific files — the dispatcher, the routes table, the global event bus, the error envelope, the auth middleware — get touched by every feature. Most other files are owned by one feature each. The conflict surface isn't uniformly distributed across the repo. It's concentrated on the structural choke points.
This is the same insight that drives plugin architectures in big software systems. WordPress plugins don't conflict because each lives in `wp-content/plugins/<name>/`. VS Code extensions don't conflict because each lives in its own directory and registers through a stable API. The host is small and stable. The plugins are everything else.
If you build your codebase as a small core plus many plugins, and your spec tells the harness which features are plugins versus core, and the harness honors that distinction at scheduling time — then ten agents working on ten plugins literally cannot conflict. They are editing files in ten different directories. The locks are decorative. The catch-up rebase is dead code. The pre-dispatch sync is a no-op.
This was the unlock. Encode the architectural intent in the spec. Let the scheduler use it.
3. Two shapes, one attribute
Every feature in our specs now carries an architectural-shape attribute. There are exactly two shapes that matter:
- **`shape="plugin"`** — vertical features. Live in their own directory, own their own data model, own their own tests. Adding or removing the plugin doesn't touch sibling plugins. Examples: "user can register," "user can edit profile," "task CRUD with tag filtering." Each lives in `src/plugins/<name>/`.
- **`shape="core"`** — cross-cutting concerns. Edit files used by every plugin. Examples: "all endpoints validate JWT," "uniform RFC 7807 error envelope," "global rate limit," "database connection pool." Each lives in `src/core/<concern>/`.
That's it. No tier, no taxonomy, no UML. Two values. The simplicity is load-bearing — if the classifier had three values it would have ten by next quarter, and the scheduling rule would have to handle a Cartesian product of cases.
A spec entry now looks like this:
<feature index="14" shape="plugin" plugin="auth">
<description>User can register with email and password</description>
</feature>
<feature index="20" shape="core"
touches_files="src/core/middleware/auth.py">
<description>All endpoints validate JWT on incoming requests</description>
</feature>
The plugin="auth" attribute auto-fills touches_files to ["src/plugins/auth/**"]. The harness now knows that feature 14 will only touch files inside src/plugins/auth/. Two shape="plugin" features with different plugin names are guaranteed to be file-disjoint. Not "probably." Not "usually." Guaranteed by directory boundaries.
For shape="core" features the auto-derivation can't help — cross-cutting work touches a specific file by name. The author writes touches_files="src/core/middleware/auth.py" explicitly. The parser refuses any spec where shape="core" lacks a touches_files value. Cross-cutting work without a declared file set is a bug in the spec, not a runtime decision the dispatcher gets to make.
4. The scheduling rule that follows
Once `shape` is in the spec, the dispatcher gets two new rules:

- **`shape="plugin"` tasks dispatch freely** up to `--concurrency N`. Their file sets are disjoint by construction. The file-claim lock layer becomes a sanity check rather than a primary defense. Plugin tasks scale linearly with concurrency.
- **`shape="core"` tasks single-flight.** At most one cross-cutting task runs at a time, regardless of `--concurrency`. Two core tasks both want to edit the auth middleware? They serialize. Always. No clever overlap analysis, no "well actually they touch different lines." Cross-cutting work is cheap to serialize — it's a small minority of features — and the cost of getting it wrong is high.
- **Tasks without `shape` (legacy specs)** fall through to the existing concurrency cap + file-claim lock behavior. Backward compatibility is free because the new rules are gated on `task.shape IS NOT NULL`.
The scheduler's filter is twelve lines of Python:
```python
def get_ready_tasks(self) -> list[TaskNode]:
    ready = [t for t in self._tasks.values() if self._is_ready(t)]
    # Cross-cutting (shape="core") tasks single-flight: drop any
    # candidate ``core`` task from the ready set if another core task
    # is already running.
    any_core_running = any(
        t.status == "running" and t.shape == "core"
        for t in self._tasks.values()
    )
    if any_core_running:
        ready = [t for t in ready if t.shape != "core"]
    return sorted(ready, key=lambda t: -t.priority)
```
That's the entire enforcement mechanism. The scheduler has no opinion about parallelism beyond this. The `touches_files` lock layer remains as a second line of defense for cases where a plugin author lied about their shape (something code review should catch separately).
5. Why this works structurally, not just behaviorally
The thing that makes this approach durable is that the safety property is structural: it's a consequence of file-system layout, not of clever runtime detection.
If `src/plugins/auth/` and `src/plugins/profile/` are the only file sets two agents touch, there is no possible interleaving where they conflict. Not because the harness is smart. Because the files don't overlap. The same way two `git worktree` instances on different branches can edit different files without any locking — git just doesn't see them as a conflict.
Compare this to the old approach: "predict conflicts at runtime by checking which files each agent claims to touch." That works if every agent honestly declares its file set. In practice, agents trying to wire a plugin into a registry often need to edit the registry too. They forget to declare the registry file. The lock layer doesn't fire. The merge conflicts at squash time. The whole reactive stack kicks in.
The plugin-shape approach refuses to be in that situation. If your codebase has a registry that every plugin has to edit, that registry is a hotspot and you should restructure it — or declare it as `shape="core"` and serialize work on it. The architecture catches up to the parallelism, not the other way around.
This is also why the harness composes naturally with my project's `boundaries audit` pass. That tooling already identifies hotspot files (registries, route tables, dispatch chains) and refactors them into plugin-extensible patterns. After a `boundaries apply --auto` pass, the codebase is more amenable to plugin-shape features — fewer surfaces remain that force a `shape="core"` declaration. The two pieces — spec-time architectural intent and codebase structural refactoring — pull in the same direction. Each makes the other more effective.
6. The brownfield path: refactor first, then extend
Greenfield projects can be built plugin-shaped from day one. Brownfield projects — i.e. every project worth working on — usually have an existing dispatcher / route table / event bus that gets touched by every feature. You can't bolt plugin-shape semantics onto a codebase whose architecture isn't ready for them.
So the brownfield workflow has an extra step:
1. `analyze` — generate a manifest with stack, conventions, test baseline.
2. `boundaries audit` — emit `boundaries_report.md` listing extension hotspots and the refactor pattern best suited to each (registry / split / route-table / extract-collaborators).
3. `boundaries apply --auto` — refactor each hotspot one at a time on its own feature branch with test gating. Squash-merges to main on green; reverts on red.
4. `/create-spec` — the slash command reads `boundaries_report.md` first. If hotspots remain unrefactored, it warns the user before generating any spec. Then it asks `shape` per feature.
5. `claw-forge add` — runs the planner against the now-shape-aware spec.
Skipping step 3 is the costly mistake. New features land as `shape="plugin"`, but the file-claim lock catches them when they try to edit the un-refactored hotspot, the dispatcher fails the task with `resume_conflict`, and the agent has wasted one full attempt on stale state. Refactoring up front is cheaper than discovering mid-flight that you needed to. The boundaries harness exists exactly to make that up-front step automatic.
The cultural ask is: when adding non-trivial features to an existing codebase, do the structural work first. That's not a new principle — it's "make the change easy, then make the easy change," Kent Beck, twenty years ago. Plugin-shape specs make this principle observable: if you can't write a clean spec without declaring half your features as `shape="core"`, that's a structural signal, not a spec-writing failure.
7. What this doesn't solve (be honest)
I want to be careful not to oversell this. Here's what plugin-shape specs explicitly do not do:
- **Semantic conflicts inside a single plugin.** Two tasks for the same plugin (`plugin="auth"`) still serialize via `touches_files` locks. Adding "user can reset password" while "user can change email" is in flight will defer the second one until the first finishes. This is fine — it's the correct behavior — but it limits intra-plugin parallelism to one task at a time.
- **Cross-plugin coupling that wasn't designed in.** If your `tasks` plugin imports from your `auth` plugin's internals (and your codebase doesn't enforce plugin isolation via lint or import boundaries), edits to `auth/` can break `tasks/` after merge. The spec doesn't catch this; tests do. Treat the spec as a parallelism hint, not an isolation guarantee.
- **Shared infrastructure changes.** A migration that adds a column to the `users` table is `shape="core"` because the migrations directory is shared. Two such migrations serialize. They have to — concurrent migration writers race on the migration sequence number. Don't try to plugin-ify your migrations.
- **Shape-agnostic specs.** A feature whose acceptance criteria say "the system shall …" without naming a directory or file is hard to classify. Either rewrite the criterion to reference a concrete piece of the system, or accept that the feature won't get a `shape` attribute and will fall through to legacy scheduling.
The honest framing: plugin-shape specs make the common parallelism case (many vertical features against a clean plugin host) trivially safe. The hard cases — cross-cutting concerns, coupled plugins, shared infrastructure — still require engineering judgment. The win is that the common case becomes the default rather than the exception.
8. The cultural shift this enables
There's a meta-point here that's bigger than the technical mechanism.
Most discussions of "AI agents at scale" focus on the agent's capabilities — context window, reasoning depth, tool-use accuracy. Those matter, but they're not where the leverage is. The leverage is in encoding the human's architectural intent in a place the harness can read. Specs are not just task descriptions for the agent. They're scheduling hints for the orchestrator. They're isolation declarations for the locks. They're refactoring targets for the boundaries pass. They're documentation for the next human reviewer.
When you start writing specs that carry this much load, the spec format itself stops being a casual prose blob and becomes a structured contract. XML attributes that look fussy at first — `index`, `depends_on`, `shape`, `plugin`, `touches_files` — earn their keep because every one of them maps to a runtime decision the harness will otherwise have to guess. Guessing is what produces the four-layer reactive stack. Declaring is what makes that stack a quiet backstop instead of a daily firefight.
This is the same shift that happened in deployment automation a decade ago: declarative manifests beat imperative shell scripts because the intent — "I want three replicas behind a load balancer" — was machine-readable rather than buried in a sequence of side-effecting commands. Plugin-shape specs are doing the same thing for AI-agent orchestration: making intent readable so the orchestrator can stop guessing.
If you're building AI-coding-agent infrastructure right now and your dispatcher is making scheduling decisions based purely on what's in the queue, you're building the imperative-shell-script version of this. The declarative version — where the agents read what the human meant rather than what they typed — is meaningfully better, and it doesn't require a smarter model. It requires a more structured spec.
9. The minimum implementation
If you want to try this in your own harness, the minimum viable version is:
- **One attribute on your task/feature object.** Call it `shape`, `kind`, `category`, whatever — but pick exactly two values. "vertical" and "horizontal" works. "feature" and "infra" works. Two values. The temptation to add a third is a trap.
- **One auto-derivation rule.** When `shape="plugin"` and a `plugin="X"` is set, the file-claim list defaults to `["plugins/X/**"]`. One line of helper code.
- **One scheduling rule.** When any `shape="core"` task is running, drop other core tasks from the ready set. Twelve lines of Python.
- **One spec-time validation.** `shape="core"` without an explicit file list raises an error before the planner runs. Five lines.
That's the whole ship. Total surface area: maybe 50 lines of harness code, plus the spec schema extension and the docs to teach the spec author what to declare.
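For the spec-schema side, here's a sketch of loading those attributes with the stdlib XML parser. I'm assuming the spec wraps its `<feature>` elements in a single root element and stores `touches_files` as a comma-separated attribute; both are illustrative choices, not a documented format:

```python
import xml.etree.ElementTree as ET

def load_features(path: str) -> list[dict]:
    features = []
    for el in ET.parse(path).getroot().iter("feature"):
        raw_files = el.get("touches_files")
        features.append({
            "index": int(el.get("index")),
            "shape": el.get("shape"),      # None on legacy specs
            "plugin": el.get("plugin"),
            "touches_files": raw_files.split(",") if raw_files else [],
        })
    return features
```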
The minimum tests:
- A round-trip test that parses the documented XML example and asserts the auto-derived file lists match (guards against doc/code drift).
- A scheduler test that adds two `shape="core"` tasks and confirms only one is in the ready set when the other is running.
- A scheduler test that confirms `shape="plugin"` tasks dispatch freely when a core task is running. (Both scheduler tests are sketched below.)
Three tests. Done. The pattern compounds: now your codebase has a place to put new shape-aware behavior, and your spec authors have a place to encode new architectural intent. Future work — auto-derived shape inference via static analysis, telemetry on adoption rates, conflict-prediction at scheduler time — all builds on this primitive.
10. Closing thought
The thing that took me too long to internalize is that parallelism is a property of the architecture, not the runtime. You can't bolt safe parallelism onto a codebase whose architecture forces every feature through the same chokepoint. You can build elaborate runtime defenses against the resulting conflicts — and you should, because real codebases always have some chokepoints — but the runtime defenses are the patch, not the cure.
The cure is to design codebases where parallelism is structurally safe, and to encode that structural intent in the spec so the orchestrator can lean on it. Two values, one attribute, twelve lines of scheduler logic. That's the surface area of the win. The cost was a year of fighting the four-layer reactive stack to recognize that the layers were treating symptoms, not the disease.
If your AI-agent harness is dropping conflicts on you, look at your spec format before you look at your dispatcher. The dispatcher is downstream. The spec is where the architecture lives.
Alex Chen builds AI-coding-agent infrastructure shipped to production. He runs ten-agent swarms daily and would like to thank the team's boundaries harness for finally making it stop hurting.