Bala Paranj

Posted on Jun 6 • Edited on Jun 25

Fallacies of GenAI Development #8: More AI Agents Means More Productivity

#ai #softwaredevelopment #architecture #engineering

✓ Human-authored analysis; AI used for formatting and proofreading.

This is the eighth and final post in a series on the false assumptions teams make when building with generative AI. The series began with the observation that the trough of disillusionment for AI-assisted development has arrived — not because AI is useless, but because eight false assumptions made the trough inevitable. This post covers the last assumption and closes the series.

The Fallacy

"If one AI agent gives us a 10x boost, ten agents will give us 100x."

Why it's tempting

The arithmetic feels irresistible. One agent generates code for the backend. Another generates the frontend. A third writes tests. A fourth handles database migrations. A fifth generates documentation. Each agent works in parallel. No meetings, no waiting, no coordination overhead. Pure throughput.

Leadership sees the potential: a five-person team with fifty agents has the output of a fifty-person team at the cost of a five-person team plus API credits. The scaling is linear. The economics are transformational.

And the early results confirm it. Each agent, working on its own, produces impressive output. The backend agent generates Go code. The frontend agent generates React components. The test agent generates test suites. Each agent, in isolation, looks like a 10x developer.

Why it's wrong

You've seen this problem before. It has a name. It's called distributed systems.

A distributed system is a collection of independent actors that must coordinate to produce a coherent result. Each actor makes decisions locally. The system's correctness depends on those local decisions being compatible globally. When they aren't, you get inconsistency, conflicts, data corruption, and cascading failures.

AI agents working on the same codebase are a distributed system. Each agent makes decisions — variable names, error handling strategies, retry policies, data formats, abstraction levels, dependency choices. Each decision is made locally, in the context of one prompt, one file, one task. No agent sees the full picture. No agent coordinates with the other agents. Each agent's decisions are invisible to the others.

Distributed systems engineers spent 40 years learning that you can't scale a distributed system just by adding nodes. You scale it by adding protocols: consensus mechanisms, ordering guarantees, conflict resolution rules, interface contracts. Without protocols, more nodes means more conflicts, not more throughput.

The same applies to AI agents. More agents without specifications means more invisible decisions, more inconsistency, more cognitive fragmentation — not more productivity.

The boom

Month 1-2: The parallel sprint. Five agents work simultaneously on different parts of the system. Each produces well-structured code. PRs flow in from every direction. The team merges them rapidly. The system takes shape faster than anyone expected.

Month 3-4: The integration cracks. The backend agent chose camelCase for JSON field names. The frontend agent expected snake_case. The database migration agent used PascalCase for column names. None knew about the others' choices. Each was reasonable in isolation. The integration fails silently — data flows through but field mappings are wrong. The bug appears as "the UI shows the wrong value" and takes two days to trace to a naming mismatch across three layers.

Month 5-6: The contradictory patterns. The backend agent implemented retries with exponential backoff and jitter. The API gateway agent implemented retries with a fixed 3-second delay. Both are valid retry strategies. Both were generated from different training data patterns. When a downstream service is slow, the backend retries with increasing delays while the gateway retries every 3 seconds. Every gateway retry triggers a fresh round of backend retries, so the two layers multiply into a retry storm that overwhelms the already-slow service. The agents' patterns contradicted each other. Neither knew the other existed.
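The cost of stacking two retry layers is easy to put a number on. A minimal sketch, assuming the gateway calls the backend and the backend calls the slow service, with 3 gateway attempts and 5 backend attempts per gateway call. Both counts and the function name `downstream_calls` are illustrative assumptions, not figures from the scenario above.

```python
def downstream_calls(gateway_attempts: int, backend_attempts: int) -> int:
    """Worst-case calls reaching the slow service for one user request.

    Each gateway attempt triggers a full round of backend attempts,
    so the layers multiply rather than add.
    """
    return gateway_attempts * backend_attempts


# Hypothetical attempt counts chosen for illustration.
print(downstream_calls(gateway_attempts=3, backend_attempts=5))  # 15
```

One user request becomes up to fifteen calls against a service that was already struggling with one.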

Month 7-8: The architectural drift. Each agent, given different tasks over months, evolved different internal patterns. The backend agent started using result types for error handling. The frontend agent uses exceptions. The test agent mixes both depending on which prompt it received. The codebase has three error handling philosophies, each locally consistent, globally incoherent. A new developer opens the codebase and can't determine which pattern is correct because all three exist in production.

Month 9-10: The edit war. Agent A refactors a shared utility function for performance. Agent B, in a separate task, refactors the same function for readability. Agent A's change merges first. Agent B's change overwrites A's optimization. Neither agent knows the other touched the file. The team pays the token cost of both refactors and keeps the result of only one. Adam Bender of Google has a name for this. In his talk Software Engineering at the Tipping Point, he calls it the agentic edit war: two agents refactoring the same file toward different goals, livelocking the system while the company pays for the tokens on both sides.

Worse, without a pattern specification, the cycle repeats. Agent A sees the unoptimized code and refactors it again for performance. Agent B sees the unreadable code and refactors it again for readability. The agents consume tokens indefinitely, bouncing the code back and forth between two valid-but-contradictory goals. In distributed systems, this is called livelock: the system is active but making no progress. In AI development, it's called your API bill.
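The livelock can be reproduced in a few lines. This is a toy simulation, not a model of any real agent framework: the agent names, the two goals, and the `priority` rule (standing in for a one-line pattern specification) are all hypothetical.

```python
from typing import List, Optional


def run(rounds: int, priority: Optional[str]) -> List[str]:
    """Simulate two agents rewriting one function toward different goals.

    `state` is the style the function currently has. Each round, every agent
    whose goal differs from the state rewrites it, unless a priority rule
    says its goal must yield to the other one.
    """
    goals = {"agent_a": "fast", "agent_b": "readable"}
    state = "original"
    history = []
    for _ in range(rounds):
        for agent, goal in goals.items():
            if state == goal:
                continue  # already satisfies this agent
            if priority is not None and goal != priority:
                continue  # the specification says this goal yields
            state = goal
            history.append(f"{agent}->{goal}")
    return history


print(len(run(3, priority=None)))   # 6 rewrites in 3 rounds, and it never stops
print(run(3, priority="readable"))  # ['agent_b->readable']
```

Without the rule, the number of rewrites grows with every round. With it, the system settles after one change.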

Month 11-12: The coordination collapse. The team needs to make an architectural change — migrate from REST to gRPC. Each agent needs to be told individually. Each interprets the migration prompt differently. The backend agent generates gRPC server code. The frontend agent keeps generating REST client calls because its prompt wasn't updated. The test agent generates tests for both protocols because it sees both in the codebase. The migration that should take a week takes two months because every agent is working against the others.

The team discovers they've been managing a distributed system without the protocols that distributed systems require. Each agent is a node. Each node is making decisions. Nobody built the consensus layer.

What distributed systems already solved

Hold up the two systems side by side:

Distributed Computing (1990s):           Distributed AI Development (2020s):
───────────────────────────              ──────────────────────────────────
Multiple processes on multiple nodes     Multiple agents on one codebase
Each makes local decisions               Each makes local decisions
No shared state by default               No shared context by default
Inconsistency is the default outcome     Inconsistency is the default outcome

Distributed computing solved this with protocols:

Protocol                     What it solves                 AI equivalent
─────────────────────────    ────────────────────────────   ──────────────────────────────
Consensus (Paxos, Raft)      Agreement on shared state      Shared specification repo
Ordering (vector clocks)     Event sequencing               Architectural priority rules
Conflict resolution          Concurrent modifications       Specification gate on merge
Interface contracts (IDL)    Cross-service compatibility    API contracts + contract tests
Schema evolution             Backward compatibility         Migration specifications
Circuit breakers             Cascading failure prevention   Dependency scope limits

Every protocol in the left column has an AI development equivalent in the right column. The solutions exist. They're called specifications, contracts, and enforcement gates. They're the same coordination mechanisms — applied to agents instead of processes.

The five coordination failures and their fixes

1. Naming inconsistency → Convention specification

Problem: Each agent chooses its own naming conventions.
Fix:     A convention specification fed to every agent as context.
         "JSON fields: camelCase. Database columns: snake_case.
          Go structs: PascalCase. Environment variables: UPPER_SNAKE."
         Four lines. Every agent reads them. Every change is checked
         against them by a linter, so a nonconforming name fails the
         build instead of reaching the main branch.

         Critical: the specification must be immutable for the duration
         of concurrent agent tasks. In distributed systems, "split brain"
         happens when nodes have different versions of the truth. If
         Agent A has the old naming convention and Agent B has the new
         one, the codebase gets both. Version the specification. Update
         it between task batches, not during them.
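The convention check itself can be very small. A minimal sketch, assuming the four conventions quoted above are expressed as regular expressions. The `CONVENTIONS` table and the `violations` helper are hypothetical names, and a real linter would also need to extract the names from source files.

```python
import re

# One pattern per convention in the specification quoted above.
CONVENTIONS = {
    "json_field": re.compile(r"[a-z][a-zA-Z0-9]*"),          # camelCase
    "db_column": re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*"),  # snake_case
    "go_struct": re.compile(r"[A-Z][a-zA-Z0-9]*"),            # PascalCase
    "env_var": re.compile(r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)*"),    # UPPER_SNAKE
}


def violations(kind: str, names: list) -> list:
    """Return the names that do not match the convention for `kind`."""
    pattern = CONVENTIONS[kind]
    return [name for name in names if not pattern.fullmatch(name)]


print(violations("json_field", ["userId", "user_id", "UserId"]))
# ['user_id', 'UserId']
```

A CI job that exits non-zero whenever `violations` returns anything is enough to turn the four lines of text into a gate.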

2. Pattern contradictions → Architectural decision records

Problem: Each agent implements different patterns for the same concern.
Fix:     One ADR per cross-cutting concern.
         "ADR-007: Retry strategy is exponential backoff with jitter,
          base 100ms, max 5 attempts, across ALL services."
         The specification is the protocol. The CI check is the
         enforcement. Any agent that generates fixed-delay retries
         fails the build.
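For reference, the retry policy in ADR-007 could look like this in code. It is a sketch under one assumption worth flagging: the ADR text does not say which jitter scheme is intended, so this uses "full jitter" (a uniform delay between zero and the exponential cap), which is one common variant. The constant and function names are hypothetical.

```python
import random

BASE_SECONDS = 0.1  # ADR-007: base 100ms
MAX_ATTEMPTS = 5    # ADR-007: max 5 attempts


def backoff_delays(rng: random.Random) -> list:
    """Delays to wait between attempts, using full jitter (an assumption).

    Five attempts means at most four waits. The cap doubles each time:
    0.1s, 0.2s, 0.4s, 0.8s. Each actual delay is drawn uniformly below its cap.
    """
    return [rng.uniform(0, BASE_SECONDS * 2 ** i) for i in range(MAX_ATTEMPTS - 1)]


delays = backoff_delays(random.Random())
print(len(delays))  # 4
```

Keeping the base and attempt count as named constants gives a CI check something concrete to compare against the ADR.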

3. Edit conflicts → Ownership boundaries

Problem: Two agents modify the same file with contradictory goals.
Fix:     CODEOWNERS + module boundaries.
         Each module has one owner (human or agent). Cross-module
         changes require the interface contract to be satisfied.
         Agents can't modify modules outside their scope.
         Same principle as microservice boundaries — but for agents.
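A scope check along these lines can run before any agent change is merged. This is a sketch, not the CODEOWNERS matching algorithm: the `OWNERS` map, the agent names, and the paths are hypothetical, and the standard library's `fnmatchcase` lets `*` match across directory separators, which differs from CODEOWNERS pattern rules.

```python
from fnmatch import fnmatchcase

# Hypothetical ownership map: agent name -> path patterns it may modify.
OWNERS = {
    "backend-agent": ["services/api/*"],
    "frontend-agent": ["web/*"],
}


def out_of_scope(agent: str, changed_files: list) -> list:
    """Return the changed files that fall outside the agent's patterns.

    An agent with no entry in OWNERS owns nothing, so every file is rejected.
    """
    patterns = OWNERS.get(agent, [])
    return [
        path for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


print(out_of_scope("backend-agent", ["services/api/user.go", "web/App.tsx"]))
# ['web/App.tsx']
```

Rejecting unknown agents by default matters: an unowned agent should be able to change nothing, not everything.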

4. Error handling divergence → Interface contracts

Problem: Three error handling philosophies coexist in the codebase.
Fix:     One interface contract: "All public functions return
         (result, error). No exceptions. No panics. No sentinel values."
         Enforced by the type checker where the signature encodes it,
         and by a linter rule for the rest (for example, panics in Go
         or throw statements in TypeScript).
         The contract is the specification. The tool is the enforcement.
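A linter rule for a contract like this does not need to be large. The sketch below is written in Python and checks Python source, not Go or TypeScript, purely to keep the example self-contained. It uses the standard `ast` module to flag public functions that contain a `raise` statement. The function name `raising_public_functions` is hypothetical, and a function that encloses a raising nested function is flagged too.

```python
import ast


def raising_public_functions(source: str) -> list:
    """Return sorted names of public functions that contain a `raise`.

    "Public" here means the name has no leading underscore.
    """
    offenders = set()
    for node in ast.walk(ast.parse(source)):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and not node.name.startswith("_"):
            if any(isinstance(child, ast.Raise) for child in ast.walk(node)):
                offenders.add(node.name)
    return sorted(offenders)


SAMPLE = '''
def load(path):
    raise ValueError(path)

def parse(text):
    return text, None

def _helper():
    raise RuntimeError()
'''

print(raising_public_functions(SAMPLE))  # ['load']
```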

5. Migration incoherence → Change specifications

Problem: An architectural migration is interpreted differently by each agent.
Fix:     A migration specification: "Phase 1: Add gRPC endpoints alongside
         REST. Phase 2: Migrate clients to gRPC. Phase 3: Remove REST.
         Current phase: 1. Agents must not remove REST endpoints."
         Each agent reads the current phase. The specification prevents
         agents from jumping ahead or falling behind.
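A phase gate for the migration above can be sketched as a lookup table. The dictionary layout and the action names are hypothetical. The source only specifies the three phases and the rule that REST endpoints must not be removed during phase 1.

```python
# Hypothetical machine-readable form of the migration specification above.
MIGRATION_SPEC = {
    "current_phase": 1,
    # Earliest phase in which each action is allowed.
    "allowed_from_phase": {
        "add_grpc_endpoint": 1,
        "migrate_client_to_grpc": 2,
        "remove_rest_endpoint": 3,
    },
}


def rejected_actions(actions: list, spec: dict = MIGRATION_SPEC) -> list:
    """Return the actions that are not yet allowed in the current phase.

    Actions missing from the specification are rejected (fail closed).
    """
    phase = spec["current_phase"]
    allowed = spec["allowed_from_phase"]
    return [a for a in actions if allowed.get(a, float("inf")) > phase]


print(rejected_actions(["add_grpc_endpoint", "remove_rest_endpoint"]))
# ['remove_rest_endpoint']
```

Moving the project forward then means changing one number in one place, between task batches, instead of re-prompting every agent.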

Each fix is small. Each is a few lines of text. Each is fed to agents as context AND enforced mechanically by CI. The context helps agents make correct decisions. The enforcement catches them when they don't.

The principle: specifications are protocols for agents

In distributed computing, protocols enable coordination without requiring every node to understand every other node. Node A doesn't need to know Node B's implementation. It needs to know Node B's interface contract. If both nodes respect the protocol, the system is consistent — regardless of how many nodes you add.

The same principle applies to AI agents. Agent A doesn't need to know Agent B's prompt. It needs to know the specification that governs the module it's working on. If all agents respect the specifications, the codebase is consistent — regardless of how many agents you add.

Distributed computing:     Protocol enables coordination between nodes
Distributed AI dev:        Specification enables coordination between agents

Distributed computing:     More nodes + same protocol = more throughput
Distributed AI dev:        More agents + same specifications = more productivity

Distributed computing:     More nodes + no protocol = more conflicts
Distributed AI dev:        More agents + no specifications = more inconsistency

Scaling agents is scaling a distributed system. The mechanisms are the same. The lesson is the same. The solution is the same.

The series, complete

Eight fallacies. One meta-pattern.

Fallacy #1: Faster generation = faster engineering
            → The leading sub-system outran the lagging ones

Fallacy #2: Looks correct = is correct
            → Plausibility is not correctness

Fallacy #3: AI can verify AI
            → Correlated failure modes don't converge

Fallacy #4: Drop review = remove bottleneck
            → Removing a gate without replacing it removes the safety net

Fallacy #5: Better context = correct output
            → Input quality doesn't guarantee output correctness

Fallacy #6: Generated code is an asset
            → Code is a liability; capability is the asset

Fallacy #7: Specs are new work
            → The specifications already exist; the enforcement doesn't

Fallacy #8: More agents = more productivity
            → More actors without protocols = more conflicts

Every fallacy stems from one root assumption: generating the output is the hard part.

This assumption is wrong. Understanding the output, verifying it, maintaining it, coordinating the actors that produce it, and preserving the rationale for why it's shaped this way — those are the hard parts. They always were. AI made the easy part faster. The hard parts didn't change.

Peter Deutsch's Fallacies of Distributed Computing worked because they named the assumptions that every developer made, discovered were wrong, and paid for in production. The network is not reliable. Latency is not zero. Bandwidth is not infinite.

The Fallacies of GenAI Development work the same way. Generation is not engineering. Plausible is not correct. More agents is not more productivity. Each assumption sounds true. Each leads to failure. Each has already been resolved by a domain that learned the lesson first.

The resolution to all eight

The resolution is the same across every fallacy, every domain, every era:

Recognize the specifications that already exist in your system — types, contracts, schemas, boundaries (Fallacy #7). Enforce them mechanically on every change, at machine speed (Fallacies #1, #3, #4). Verify the output against declared properties, not just the input quality (Fallacies #2, #5). Measure properties verified, not code generated (Fallacy #6). Use specifications as coordination protocols for agents (Fallacy #8).

One architectural principle. Eight applications. The teams that adopt it first will emerge from the trough of disillusionment ahead of everyone else. The teams that don't will learn the same lessons the hard way — the same way distributed systems developers learned Deutsch's fallacies, one production incident at a time.

The engineer's role has changed. Not from "writing code" to "writing prompts" — that's the Fallacy #4 trap, the prompt operator with no ownership. The real shift is from writing code to designing protocols. The specifications, contracts, boundaries, and enforcement gates that enable agents to coordinate safely. The engineer becomes the protocol designer for a distributed system of AI actors. That's not a demotion from "programmer." It's the same move the industry made when it went from writing assembly to designing systems. The abstraction level changed. The engineering judgment became more valuable, not less.

The trough is real. The exit is specifications — recognized, enforced, and verified.

This concludes The Fallacies of GenAI Development. The complete series: #1 Faster Generation ≠ Faster Engineering · #2 Plausible ≠ Correct · #3 AI Can't Verify AI · #4 Dropping Review ≠ Removing Bottleneck · #5 Better Context ≠ Correct Output · #6 Generated Code Is a Liability · #7 Specifications Already Exist · #8 More Agents ≠ More Productivity

For cloud infrastructure specifically, the specification-first model is implemented in Stave — an open-source tool that evaluates configuration snapshots against 2,662 safety invariants with deterministic mechanical verification. Apache 2.0.

For a single IAM policy file, try iam-explain — point it at your policy JSON, see what the math says. The specifications are already in your policy. The tool shows you what they mean.

The Fallacies of GenAI Development were inspired by Peter Deutsch's "Fallacies of Distributed Computing" (1994). The resolution draws from Parnas (1972), Altshuller's TRIZ (1946), Byron Cook's automated reasoning at AWS, and evidence from aviation, nuclear operations, financial trading, and Google's monorepo. Each fallacy was discovered independently across domains. The convergence is the evidence.

DEV Community