I was reviewing a PR last week — code generated by agents running in parallel — when something just didn't add up. Two functions stepping on each other. One was writing a config file while the other was reading it. The result was some intermediate state that was neither one thing nor the other. My first instinct was "the AI hallucinated." My second instinct, after reading the multi-agent software development paper, was much worse: this isn't a hallucination. This is a race condition.
And that's when my blood ran cold.
Multi-Agent Development: The Problem Nobody Is Naming Correctly
The paper — "Multi-Agentic Software Development Is a Distributed Systems Problem" — says something that feels obvious the moment you read it, but that nobody in the AI community is saying out loud: when you have multiple AI agents working on the same codebase, you have a distributed system. With everything that implies.
This isn't a metaphor. It's literal.
When you have two agents modifying files in parallel, you have exactly the same problem as two processes competing for a shared resource. The same problems that make distributed systems hard — and distributed systems are famously hard — show up here:
- Race conditions: two agents modify the same file simultaneously
- Deadlocks: Agent A waits for Agent B to finish a module, and Agent B waits for Agent A to finish an interface
- Eventual inconsistency: the codebase state between agents is momentarily divergent
- Split-brain: two agents hold incompatible mental models of the system they're building
And here's the crucial difference from traditional distributed systems: you have no compiler warning you. You don't have Rust telling you "hey, two mutable references at the same time — that's not happening." You don't have Go's runtime throwing a panic in the goroutine. You have code that looks correct, passes the surface-level tests, and either blows up in production or — worse — never blows up and just behaves badly in ways that are too subtle to catch.
Why This Hit Different: The Surelock Context
A few weeks ago I wrote about Surelock and deadlocks in Rust — or more honestly, about how I burned myself trying to understand it. The lesson Rust taught me was that the compiler forces you to think about ownership and borrowing before the program ever runs. That's intentional friction. The language saying: "before you move on, prove to me you know what you're doing with this shared resource."
In multi-agent development, that friction doesn't exist. The language model has no concept of "this file is already being modified by another agent." It has no notion of a mutex. No transaction semantics. Each agent has a local context that can be completely out of sync with the actual state of the system.
It's like someone took the hardest problem in distributed systems — state consistency — and stripped away every tool we've built to handle it.
The Concrete Patterns the Paper Identifies (That I'd Already Seen Without Knowing What to Call Them)
Here's where it gets interesting. The paper categorizes the failures into recognizable patterns:
1. Write-Write Conflict
Two agents modify the same file. The one that commits last wins. The first one's work is lost, partially or entirely.
```typescript
// Agent A is writing this:
export interface UserService {
  getUser(id: string): Promise<User>;
  // Agent A added this new method
  getUserByEmail(email: string): Promise<User>;
}

// Agent B, in parallel, is writing this:
export interface UserService {
  getUser(id: string): Promise<User>;
  // Agent B added THIS new method
  getUsersByRole(role: Role): Promise<User[]>;
}

// Final result (whoever wins the merge):
export interface UserService {
  getUser(id: string): Promise<User>;
  // Only one method makes it. The other is gone.
  // And somewhere in the codebase there's code calling the one that isn't here.
  getUsersByRole(role: Role): Promise<User[]>;
}
```
2. Read-Write Inconsistency (What I Saw in That PR)
Agent A reads the interface and makes decisions based on it. Agent B modifies that interface while A is still working. A finishes with code that's consistent with a version of the system that no longer exists.
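One way to at least detect this pattern is classic optimistic concurrency: record the version an agent read, and refuse the commit if that version has moved. This is a minimal sketch with invented names (`Repo`, `commitIfFresh`), not any real tool's API:

```typescript
// Hypothetical coordination layer that versions files. An agent records
// the version it read; the merge step rejects work based on a version
// that no longer exists.
type VersionedFile = { path: string; version: number; content: string };

class Repo {
  private files = new Map<string, VersionedFile>();

  read(path: string): VersionedFile | undefined {
    return this.files.get(path);
  }

  write(path: string, content: string): void {
    const current = this.files.get(path);
    const version = (current?.version ?? 0) + 1;
    this.files.set(path, { path, version, content });
  }

  // Reject the agent's work if the file changed since it was read.
  commitIfFresh(path: string, readVersion: number, content: string): boolean {
    const current = this.files.get(path);
    if (current !== undefined && current.version !== readVersion) {
      return false; // stale read: the interface moved while the agent worked
    }
    this.write(path, content);
    return true;
  }
}
```

The rejected agent then re-reads and redoes its work against the current state — wasteful, but strictly better than silently merging code that's consistent with a system that no longer exists.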
3. Context Drift
Over time, each agent's mental model of the system diverges. Agent A "knows" that authentication uses JWT. Agent B, which had a different conversation thread, "knows" it uses sessions. Neither is wrong — both had correct context at some point. But the resulting system has both implementations mixed together.
4. Assumption Collision
Two agents make incompatible assumptions about unspecified behavior. Agent A assumes IDs are UUIDs. Agent B assumes they're auto-increment integers. The database schema ends up with both, and no test catches it because each test suite was written by the agent that made the assumption.
```sql
-- Table created by Agent A
CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email VARCHAR(255)
);

-- Table created by Agent B (in another part of the schema)
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  -- Agent B assumed user_id is an integer
  user_id INTEGER REFERENCES users(id) -- THIS IS GOING TO EXPLODE
);
```
What Real Distributed Systems Already Solved (And What We Can Apply Here)
This is the part that gets me genuinely excited, because we're not starting from scratch. We have decades of distributed systems engineering to adapt:
File/module-level locks and semaphores: before an agent starts working on a module, it acquires a logical lock. No other agent can modify that module until the first one finishes and releases it. It's exactly what we do with mutexes — just applied at the level of agent work organization.
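A minimal sketch of that advisory lock, assuming a single coordinating process — the names (`ModuleLocks`, `acquire`, `release`) are illustrative, not from any real framework:

```typescript
// Advisory module-level lock: an agent must acquire the lock before
// touching a module, and no other agent gets it until release.
class ModuleLocks {
  private owners = new Map<string, string>(); // module -> agent id

  acquire(module: string, agent: string): boolean {
    const owner = this.owners.get(module);
    if (owner !== undefined && owner !== agent) return false; // held by someone else
    this.owners.set(module, agent);
    return true;
  }

  release(module: string, agent: string): void {
    // Only the owner can release its own lock.
    if (this.owners.get(module) === agent) this.owners.delete(module);
  }
}
```

The lock is "advisory" in the same sense as file locks in POSIX: it only works if every agent is wired to check it before writing. An agent that bypasses the lock reintroduces the race.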
Transactions and rollback: if a set of changes from one agent can't be integrated consistently, you throw them all away and start over. Same as a database transaction failing on conflict.
Event sourcing for the codebase: instead of agents reading the current state of the code, they read the change log. Every agent knows exactly what happened before their intervention. It's more computationally expensive, but it eliminates inconsistency.
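The event-sourcing idea can be sketched as an append-only change log — again a toy model with made-up names, not a real implementation:

```typescript
// Append-only log of changes. Current state is derived by replay,
// so every agent can see exactly what happened before its turn.
type Change = { agent: string; path: string; content: string };

class ChangeLog {
  private log: Change[] = [];

  append(change: Change): void {
    this.log.push(change);
  }

  // Current state of a file = full replay, last change wins.
  replay(path: string): string | undefined {
    let state: string | undefined;
    for (const change of this.log) {
      if (change.path === path) state = change.content;
    }
    return state;
  }

  // Everything that happened to a path, in order — an agent's context.
  historyOf(path: string): Change[] {
    return this.log.filter((c) => c.path === path);
  }
}
```

The cost is in `replay`: state is recomputed instead of read. The payoff is that an agent consuming `historyOf` can't be surprised by a change it never saw — the log is the single source of truth.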
Central coordinator (the most pragmatic pattern): an orchestrator agent that serializes decisions. Implementation agents work in parallel, but there's exactly one agent deciding what goes in and in what order. It's less sexy than "fully autonomous AI" but it's what actually works.
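The coordinator pattern boils down to one serialization point. Here's a toy sketch (invented names, not any framework's API) where parallel agents submit work concurrently but changes are applied strictly one at a time:

```typescript
// Toy coordinator: every submitted change is chained behind the
// previous one, so no two changes are ever applied concurrently.
class SerializingCoordinator {
  private queue: Promise<void> = Promise.resolve();
  readonly appliedOrder: string[] = [];

  submit(agent: string, apply: () => Promise<void>): Promise<void> {
    // Each submission waits for everything submitted before it.
    this.queue = this.queue.then(async () => {
      await apply();
      this.appliedOrder.push(agent);
    });
    return this.queue;
  }
}
```

Note what this buys you: even if Agent B's change finishes first, it lands after Agent A's, because arrival order — not completion speed — decides integration order. That's the "referee" in the coordinated chaos.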
That last point reminds me a lot of how I ended up solving the real-time data problems in the Buenos Aires bus project: you don't try to process everything in parallel without coordination. You have a serialization point. Coordinated chaos has a referee.
The Mistakes You're Going to Make (That I Already Made)
Mistake 1: Thinking shared context solves the problem
"But if all agents have access to the same repo, don't they see the same state?" No. The context each agent loads into its context window is a snapshot. If the repo changes while the agent is working, the agent doesn't automatically know. It's eventual consistency without the eventual consistency mechanisms.
Mistake 2: Trusting tests as the only validator
The tests agents generate are consistent with the agents' assumptions. If Agent A has an incorrect assumption, it'll write tests that validate that incorrect assumption. Tests pass. System is broken. Tests are necessary but not sufficient to detect this kind of conflict.
Mistake 3: Scaling agents without scaling coordination
More agents in parallel is not linearly better. In distributed systems this is called the coordination problem — the overhead of maintaining consistency grows faster than the benefit of parallelization. With agents it's exactly the same. Four agents running in parallel without coordination can produce more technical debt than one agent working alone.
Mistake 4: Not versioning interfaces before distributing work
Before any agent starts implementing, interfaces need to be defined and frozen. Same as in language design: the contract has to exist before the consumers of that contract start working. If the contract changes while everyone is implementing, all prior work potentially becomes debt.
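One cheap way to make the freeze enforceable, rather than a promise: fingerprint the contract before distributing work, and have each agent verify the fingerprint before committing. A sketch, with a made-up interface and function names:

```typescript
import { createHash } from "node:crypto";

// Hypothetical frozen contract, recorded before work is distributed.
const frozenInterface = `
interface UserService {
  getUser(id: string): Promise<User>;
}`;

function fingerprint(source: string): string {
  return createHash("sha256").update(source).digest("hex");
}

const frozen = fingerprint(frozenInterface);

// Cheap pre-commit check for any agent: is the contract I implemented
// against still the contract that was frozen?
function contractStillHolds(currentSource: string): boolean {
  return fingerprint(currentSource) === frozen;
}
```

If the check fails, the right response isn't to merge anyway — it's to treat the agent's work as potentially debt, exactly as the paragraph above describes, and re-plan against the new contract.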
FAQ: Multi-Agent Development and Distributed Systems
Do agent-integrated IDEs (Cursor, Copilot Workspace) already solve this?
Partially. Cursor, for example, has full-repo context, but parallel agents still don't have real coordination mechanisms. It's like having every process see the same memory without any locks. The write-write conflict and context drift problems are still present when you have multiple sessions or multiple agents running in parallel.
When is it actually worth using multiple agents in parallel?
When the tasks are genuinely independent and have well-defined interfaces between them. If you can draw a dependency graph and there are nodes that don't touch each other, those are candidates for safe parallelization. If the graph is a mess of cross-dependencies, you're better off serializing.
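That "draw the dependency graph" test can be made mechanical: two tasks are safe to parallelize only if neither is reachable from the other. A sketch with made-up task names:

```typescript
// task -> tasks it depends on
type Graph = Map<string, string[]>;

// Depth-first reachability over the dependency edges.
function reaches(graph: Graph, from: string, to: string): boolean {
  const stack = [from];
  const seen = new Set<string>();
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node === to) return true;
    if (seen.has(node)) continue;
    seen.add(node);
    for (const dep of graph.get(node) ?? []) stack.push(dep);
  }
  return false;
}

// Safe only when the tasks are mutually unreachable in the graph.
function safeToParallelize(graph: Graph, a: string, b: string): boolean {
  return !reaches(graph, a, b) && !reaches(graph, b, a);
}
```

So `frontend` and `backend` that both depend on a frozen `api` contract can run in parallel, but `frontend` and `api` itself cannot — which is exactly the "interfaces defined and frozen first" rule in graph form.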
Does this apply only to large projects, or to personal projects too?
It applies as soon as you're using more than one agent working on the same code. Even in small projects — if you've got a Claude session on the frontend and another on the backend touching shared interfaces, you're in distributed systems territory. Scale matters for severity, not for whether the problem exists.
Does Git solve the coordination problem between agents?
Git solves the merge conflict after it happened. It doesn't prevent wasted work — two agents that spent hours on incompatible solutions. And it doesn't detect semantic conflicts where code merges without syntactic conflicts but the resulting behavior is wrong. Git is version control, not agent coordination.
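A toy illustration (invented function names) of a semantic conflict that merges with no textual conflict at all — the two edits touch different lines, so Git is perfectly happy:

```typescript
// Agent A changed this function to return cents (it used to return dollars).
function getPrice(): number {
  return 1999; // cents
}

// Agent B, working from the old contract, added this caller assuming dollars.
function formatPrice(): string {
  return `$${getPrice().toFixed(2)}`; // "$1999.00" instead of "$19.99"
}
```

No merge tool flags this. Only a shared, frozen contract (or a test written against the contract rather than against one agent's assumption) catches it.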
Are there specific tools for coordinating agents today?
Frameworks like LangGraph or CrewAI have coordination primitives, but none yet implement robust distributed-system semantics (locks, transactions, vector clocks). The current state of the art is mostly "human orchestrator" — someone reviewing what each agent does and coordinating manually. Which is valid, but it's important to recognize that's what you're doing.
Does this change how I need to design my system architecture from the start?
Yes, and it's one of the paper's most important conclusions. If you know you're going to use agents in development, the architecture needs to favor modules with explicit and stable interfaces, low coupling, high cohesion — exactly the principles that make distributed systems manageable. That's not a coincidence. It's the same problem.
The Compiler We Don't Have
Rust taught me that early friction is a gift. The compiler that says "no" before the program runs saves you hours of debugging race conditions at runtime. It's uncomfortable in the moment. It's invaluable afterward.
With multi-agent development, we're in the pre-Rust moment of distributed systems. We have the power of parallelization. We don't have the safety tooling. And the consequence is exactly what theory predicts: subtle bugs, inconsistent state, wasted work.
What excites me is that the paper names the problem correctly. Once you know it's a distributed systems problem, you know which library of solutions to reach for. You don't have to invent anything new — you have to apply forty years of distributed systems engineering to a new context.
What worries me is that the industry is going to be slow to realize this. We're going to see a mountain of vibe-coded projects that "work" until they don't, and the diagnosis is going to be "the AI failed" when the correct diagnosis is "nobody coordinated the agents the way you coordinate a distributed system."
I've already changed how I work with agents. I define interfaces before distributing work. I serialize changes to shared code. I treat each agent's context as potentially stale. It's slower than just unleashing agents without a plan, but the code that comes out the other end is code I can actually maintain.
Privacy and local context control — something Apple is betting heavily on — is also going to play into this: if the system state lives in each agent's local context without synchronization, the problem multiplies. Centralized coordination models and shared, synchronized context data are part of the solution.
The industry's next step is building the equivalent of the borrow checker for agents. Until then, we're where systems programming was before Rust: we know concurrency bugs exist, we don't have the compiler that prevents them, and we depend on the programmer — or the architect — being careful.
I'm the architect. I'm going to be careful.