
Bala Paranj


A 1972 Paper Predicted the AI Coding Crisis. Nobody Listened.

There's a paper from 1972 that describes — with precision — the crisis AI coding agents are creating right now.

David Parnas published "On the Criteria To Be Used in Decomposing Systems into Modules" when software ran on mainframes and a large program was 10,000 lines. The paper argued that the conventional way of breaking systems apart (by execution steps: input → process → output) was wrong, and that systems should instead be decomposed by design decisions that are likely to change.

The reason was human cognition. Not compiler efficiency. Not runtime performance. The hard bottleneck of software engineering, Parnas argued, is the developer's ability to understand the system well enough to change it safely. Modular boundaries exist to protect human comprehension. Everything else is secondary.

Fifty-three years later, AI coding agents generate complex logic 5-7x faster than humans can comprehend it. Developers accept code through what can only be called cognitive surrender — they review the output, the tests pass, and they merge it without building a mental model of how or why it works.

Parnas diagnosed this exact failure mode half a century before it existed. And his cure still works.

The firehose effect

Here's what happens in practice. You prompt an AI agent to implement a retry mechanism with exponential backoff and circuit breaking. The agent returns 300 lines of well-structured code: a RetryPolicy struct, a CircuitBreaker with state management, a BackoffCalculator with jitter, integration tests, and error classification logic that distinguishes transient from terminal failures.

The code works. The tests pass. You read it, nod at the structure, and merge it.

Three weeks later, you need to change the retry behavior for one specific endpoint. You open the file. You can't remember how the circuit breaker interacts with the backoff calculator. You're not sure whether changing the max-retries constant affects the circuit breaker's failure threshold. You don't know if the error classifier's transient category includes timeouts or only HTTP 5xx responses.

You wrote none of this code. You reviewed it once. You understood it well enough to approve it. You didn't understand it well enough to change it safely.

That gap — between understanding enough to approve and understanding enough to change — is cognitive debt. It lives in your head, not in the code. The code is fine. Your mental model of the code is incomplete. And it became incomplete the moment the AI bypassed the cognitive struggle that would have built it.

Parnas's term for the cause: the system violated information hiding. Not because the code was poorly written, but because the module boundaries weren't designed to protect your comprehension. The circuit breaker and the backoff calculator and the error classifier are all exposed to you simultaneously. When you need to change one, you have to reason about all three. The code didn't hide its design decisions. It dumped them on you. That's the firehose.

Parnas's argument (it's not about code organization)

The 1972 paper is widely cited and widely misunderstood. Most developers think Parnas argued for "clear module boundaries" — a code organization principle. He argued for something deeper: that the criterion for where to place module boundaries should be based on which design decisions are likely to change.

The conventional approach decomposed systems by processing steps: an input module, a processing module, an output module. Each step becomes a module. Parnas showed this was wrong — not because it's messy, but because it forces developers to understand design decisions that cross module boundaries. When the input format changes, the processing module has to change too, because it knows about the input format. The design decision (input format) leaked across the boundary. The developer changing the input format now has to understand two modules instead of one.

Parnas's alternative: hide each volatile design decision inside a single module. The input format lives in one module. The processing algorithm lives in another. The output format lives in a third. When the input format changes, only the input module changes. The developer only needs to understand one module. The interface between modules is stable. The implementation behind the interface is hidden.
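A minimal Go sketch of that idea, under stated assumptions: the names Record, RecordSource, csvSource, and TotalScore are hypothetical and do not come from the 1972 paper. The input format is the volatile decision, and it lives only inside csvSource. The processing code depends on the interface alone.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Record is the stable shape the rest of the program depends on.
type Record struct {
	Name  string
	Score int
}

// RecordSource is the module interface. It says nothing about the input format.
type RecordSource interface {
	Records() ([]Record, error)
}

// csvSource hides one volatile decision: the input is comma-separated text.
type csvSource struct{ raw string }

func (s csvSource) Records() ([]Record, error) {
	var out []Record
	for _, line := range strings.Split(strings.TrimSpace(s.raw), "\n") {
		fields := strings.Split(line, ",")
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		score, err := strconv.Atoi(strings.TrimSpace(fields[1]))
		if err != nil {
			return nil, fmt.Errorf("bad score in %q: %w", line, err)
		}
		out = append(out, Record{Name: strings.TrimSpace(fields[0]), Score: score})
	}
	return out, nil
}

// TotalScore is the processing module. It depends only on RecordSource,
// so a switch from CSV to another format would not touch it.
func TotalScore(src RecordSource) (int, error) {
	recs, err := src.Records()
	if err != nil {
		return 0, err
	}
	total := 0
	for _, r := range recs {
		total += r.Score
	}
	return total, nil
}

func main() {
	total, err := TotalScore(csvSource{raw: "ada,3\ngrace,4\n"})
	fmt.Println(total, err)
}
```

If the input moved to JSON, only a new RecordSource implementation would be written. TotalScore and its callers would stay as they are.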

The key word is hidden. Not "encapsulated." Not "abstracted." Hidden. The developer using the module cannot see the implementation. Cannot depend on it. Cannot be confused by it. The design decision is invisible outside its module. The developer's cognitive load is bounded by the interface, not by the implementation.

Why AI violates information hiding by default

AI coding agents don't think in Parnas modules. They think in tasks.

"Implement a retry mechanism with circuit breaking" is a task. The AI produces a solution to the task — often a good one. But the solution is organized around the task's execution logic, not around which design decisions should be hidden from the developer.

The retry policy, the circuit breaker state machine, and the error classification logic all land in the same module (or in tightly coupled modules) because they all serve the same task. The AI doesn't ask "which of these design decisions might change independently?" It asks "what code solves the prompt?"

The result is exactly the decomposition Parnas warned against: organized by execution steps (receive request → classify error → calculate backoff → check circuit state → retry or abort) rather than by design decisions (error classification policy, backoff strategy, circuit threshold). The three design decisions are tangled together. Changing one requires understanding all three.
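To make the tangle concrete, here is a deliberately flawed Go sketch. It is hypothetical, not output from any particular AI agent, and it leaves out backoff delays to stay short. The single constant maxRetries serves as both the retry budget and the circuit threshold, so the two design decisions cannot change independently.

```go
package main

import (
	"errors"
	"fmt"
)

// maxRetries is meant to be "how many times to retry one call".
const maxRetries = 3

var errUnavailable = errors.New("service unavailable")

// tangledClient mixes retry counting and circuit state in one place.
type tangledClient struct {
	consecutiveFailures int
}

// Do retries op up to maxRetries times. Note the hidden coupling: the same
// constant also decides when the circuit opens, so tuning the retry budget
// silently retunes the breaker.
func (c *tangledClient) Do(op func() error) error {
	if c.consecutiveFailures >= maxRetries { // circuit threshold, reusing maxRetries
		return errors.New("circuit open")
	}
	var err error
	for attempt := 0; attempt < maxRetries; attempt++ { // retry budget
		if err = op(); err == nil {
			c.consecutiveFailures = 0
			return nil
		}
		c.consecutiveFailures++
	}
	return err
}

func main() {
	c := &tangledClient{}
	failing := func() error { return errUnavailable }
	fmt.Println(c.Do(failing)) // first call exhausts the retry budget
	fmt.Println(c.Do(failing)) // second call is rejected by the circuit
}
```

Nothing here is wrong in the sense a test would catch. The problem is that a reader who wants to raise the retry budget has to discover the breaker logic first.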

This isn't an AI quality problem. A skilled human developer writing under time pressure produces the same tangle. The difference is that the human developer at least struggled through the implementation and built a partial mental model of the tangled decisions. The AI generates the tangle instantly, and the developer never builds the model at all.

When the AI generates at this speed across dozens of modules, the effect compounds. Each module has leaking design decisions. Each leaked decision is a piece of implementation the developer must understand to make safe changes. Multiply this across an AI-accelerated codebase and you get what this series calls the firehose effect: the volume of exposed design decisions exceeds human cognitive capacity.

Cognitive firewalls

Parnas's solution, applied to AI-generated code, creates what amounts to a cognitive firewall:

[ AI Agent generates complex implementation ]
                │
                ▼
┌──────────────────────────────────────┐
│      PARNAS MODULE INTERFACE         │  ← Developer reviews THIS
│  (what it provides, requires,        │
│   guarantees — the contract)         │
├──────────────────────────────────────┤
│  Hidden AI-Generated Internals       │  ← Hidden from working memory
│  (complex, fast, possibly opaque)    │
└──────────────────────────────────────┘

The developer doesn't need to understand the AI's internal algorithmic choices. They need to understand the interface: what the module provides, what it requires, what it guarantees, and what happens when those guarantees are violated.

For the retry example, the Parnas decomposition hides each design decision:

Module 1: ErrorClassifier
  Interface: classify(error) → transient | terminal
  Hidden: the classification rules, the regex patterns,
          the HTTP status code mappings

Module 2: BackoffStrategy  
  Interface: nextDelay(attempt) → duration
  Hidden: the backoff curve, the jitter algorithm,
          the maximum delay cap

Module 3: CircuitBreaker
  Interface: allow() → bool; record(outcome)
  Hidden: the state machine, the failure threshold,
          the recovery window, the half-open probe logic

Each interface is 1-2 lines. Each hidden implementation might be 50-100 lines of AI-generated code. The developer understands three interfaces (6 lines). The AI generated 150-300 lines behind them. The developer's cognitive load scales with the interfaces, not with the implementation.

When the developer needs to change the backoff strategy, they modify Module 2. They don't touch Module 1 or Module 3. They don't need to understand the circuit breaker's state machine to change the backoff curve. The design decisions are hidden. The cognitive firewall holds.
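The three modules could be sketched in Go roughly as follows. The interfaces mirror the ones listed above. Everything behind them is an assumption made for illustration: ErrTimeout is a hypothetical sentinel, Record takes a success flag in place of a richer outcome type, jitter is omitted, and the breaker has no recovery window or half-open probe.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrorKind is the only vocabulary callers see from the classifier.
type ErrorKind int

const (
	Transient ErrorKind = iota
	Terminal
)

type ErrorClassifier interface {
	Classify(err error) ErrorKind
}

type BackoffStrategy interface {
	NextDelay(attempt int) time.Duration
}

type CircuitBreaker interface {
	Allow() bool
	Record(success bool)
}

// ErrTimeout is a hypothetical sentinel error used only in this sketch.
var ErrTimeout = errors.New("timeout")

// timeoutClassifier hides the rule "timeouts are transient, all else terminal".
type timeoutClassifier struct{}

func (timeoutClassifier) Classify(err error) ErrorKind {
	if errors.Is(err, ErrTimeout) {
		return Transient
	}
	return Terminal
}

// cappedExponential hides the curve (doubling per attempt) and the delay cap.
type cappedExponential struct {
	base, max time.Duration
}

func (b cappedExponential) NextDelay(attempt int) time.Duration {
	d := b.base
	for i := 0; i < attempt && d < b.max; i++ {
		d *= 2
	}
	if d > b.max {
		d = b.max
	}
	return d
}

// countingBreaker hides the failure threshold and the failure counter.
type countingBreaker struct {
	threshold, failures int
}

func (c *countingBreaker) Allow() bool { return c.failures < c.threshold }

func (c *countingBreaker) Record(success bool) {
	if success {
		c.failures = 0
		return
	}
	c.failures++
}

func main() {
	var backoff BackoffStrategy = cappedExponential{base: 100 * time.Millisecond, max: time.Second}
	var classifier ErrorClassifier = timeoutClassifier{}
	var breaker CircuitBreaker = &countingBreaker{threshold: 2}

	fmt.Println(backoff.NextDelay(0), backoff.NextDelay(3), backoff.NextDelay(10))
	fmt.Println(classifier.Classify(ErrTimeout) == Transient)
	breaker.Record(false)
	breaker.Record(false)
	fmt.Println(breaker.Allow())
}
```

Swapping cappedExponential for a different curve touches one type. The classifier and the breaker never see the change.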

The criterion AI agents should follow

Parnas's criterion for decomposition — "what design decisions are likely to change?" — is the instruction that's missing from most AI coding workflows.

When a developer prompts "implement retry with circuit breaking," the AI should ask (or the developer should specify): "which design decisions in this implementation might change independently?"

The answer for the retry example:

  • The error classification rules will change when new error types are discovered
  • The backoff strategy will change when performance requirements shift
  • The circuit breaker thresholds will change when reliability targets change
  • The integration between them (retry loop orchestration) will change rarely

Each independently-changeable decision becomes a module. The integration becomes a thin orchestrator that calls the modules through their interfaces. The developer understands the orchestrator (10 lines) and the interfaces (6 lines). The implementations are hidden.
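A thin orchestrator along these lines might look like the Go sketch below. The Retry function, its parameter order, the injected sleep function, and the stand-in implementations (allTransient, fixedDelay, alwaysClosed) are all assumptions made so the example runs on its own. They are not a prescribed API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type ErrorKind int

const (
	Transient ErrorKind = iota
	Terminal
)

type ErrorClassifier interface{ Classify(err error) ErrorKind }
type BackoffStrategy interface{ NextDelay(attempt int) time.Duration }
type CircuitBreaker interface {
	Allow() bool
	Record(success bool)
}

var (
	ErrCircuitOpen = errors.New("circuit open")
	errTemporary   = errors.New("temporary failure") // used by the demo only
)

// Retry is the orchestrator: it knows the order of the calls and nothing else.
// sleep is injected so callers and tests decide how waiting happens.
func Retry(op func() error, maxAttempts int, c ErrorClassifier, b BackoffStrategy,
	cb CircuitBreaker, sleep func(time.Duration)) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if !cb.Allow() {
			return ErrCircuitOpen
		}
		err = op()
		cb.Record(err == nil)
		if err == nil || c.Classify(err) == Terminal {
			return err
		}
		if attempt < maxAttempts-1 {
			sleep(b.NextDelay(attempt))
		}
	}
	return err
}

// Trivial stand-ins so the sketch runs; real modules would hide far more.
type allTransient struct{}

func (allTransient) Classify(error) ErrorKind { return Transient }

type fixedDelay time.Duration

func (d fixedDelay) NextDelay(int) time.Duration { return time.Duration(d) }

type alwaysClosed struct{}

func (alwaysClosed) Allow() bool { return true }
func (alwaysClosed) Record(bool) {}

// noSleep skips waiting, which keeps the demo fast.
func noSleep(time.Duration) {}

func main() {
	calls := 0
	op := func() error {
		calls++
		if calls < 3 {
			return errTemporary
		}
		return nil
	}
	err := Retry(op, 5, allTransient{}, fixedDelay(time.Millisecond), alwaysClosed{}, noSleep)
	fmt.Println(calls, err)
}
```

The orchestrator never inspects an error itself, never computes a delay, and never counts failures. Each of those questions is delegated through an interface.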

This is not new. Parnas published it in 1972. What's new is that AI makes the violation of this principle automatic and invisible. The AI generates monolithic implementations because monolithic implementations solve the prompt efficiently. The Parnas decomposition is a constraint the developer must impose on the AI's output — the AI won't impose it on itself.

Unix already proved this

Unix's design is Parnas applied at the operating system level. Every Unix tool hides its implementation behind a universal interface: stdin, stdout, exit codes.

grep doesn't expose its regex engine. sort doesn't expose its sorting algorithm. uniq doesn't expose its deduplication logic. Each tool is a Parnas module with a hidden implementation and a stable interface (text streams in, text streams out).

The result: a developer who has never read a line of grep's source code can compose grep | sort | uniq and predict the output with certainty. The cognitive load is bounded by the interface (text stream processing), not by the implementation (finite automata for regex matching, merge sort, adjacent-line comparison).

AI-generated code should compose the same way. Each module should have a stable interface that a developer can reason about without reading the implementation. The implementation can be AI-generated, opaque, and complex. The interface must be human-readable, stable and simple.
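The same composition style can be imitated inside a Go program. This is a sketch, not a reimplementation of the real tools: Stage, Grep, Sort, Uniq, and Pipe are hypothetical names, and Grep matches plain substrings in place of regular expressions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Stage is the universal interface: lines in, lines out.
type Stage func(lines []string) []string

// Grep keeps lines containing substr (a stand-in for regex matching).
func Grep(substr string) Stage {
	return func(lines []string) []string {
		var out []string
		for _, l := range lines {
			if strings.Contains(l, substr) {
				out = append(out, l)
			}
		}
		return out
	}
}

// Sort returns a sorted copy; callers never learn which algorithm runs.
func Sort(lines []string) []string {
	out := append([]string(nil), lines...)
	sort.Strings(out)
	return out
}

// Uniq drops adjacent duplicates, which is why it usually follows Sort.
func Uniq(lines []string) []string {
	var out []string
	for i, l := range lines {
		if i == 0 || l != lines[i-1] {
			out = append(out, l)
		}
	}
	return out
}

// Pipe composes stages left to right, like the shell's | operator.
func Pipe(stages ...Stage) Stage {
	return func(lines []string) []string {
		for _, s := range stages {
			lines = s(lines)
		}
		return lines
	}
}

func main() {
	input := []string{"error: disk", "info: ok", "error: net", "error: disk"}
	fmt.Println(Pipe(Grep("error"), Sort, Uniq)(input))
}
```

Any stage could be regenerated or rewritten without the pipeline noticing, as long as it still maps lines to lines.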

When the AI generates a module whose interface is as complex as its implementation, it has failed the Parnas test — regardless of whether the code works.

Interfaces evolve into contracts

Parnas described interfaces within a single program, in a single language. But the principle doesn't stop at the module boundary. The industry has been walking an interface evolution for decades — and each step applies the same information-hiding insight at a larger scale.

Level 1: Within a program (language-level interfaces). Go's interface types, Java's interface, TypeScript's type signatures. The developer defines the contract in the same language as the implementation. type ErrorClassifier interface { Classify(err error) ErrorKind }. The compiler enforces it. The implementation is hidden behind the type system. Every developer already knows this level.

Level 2: Between modules — package boundaries. Go's internal/ convention. Java's module system. Package-level exports vs. private internals. The developer sees the exported API. The unexported implementation is hidden by the language's visibility rules. Same Parnas principle, enforced by the language's module system rather than by discipline alone.

Level 3: Between systems — API contracts. OpenAPI, gRPC/Protocol Buffers, GraphQL schemas. The interface is no longer in the same language as the implementation. The API contract is expressed in a language-independent schema. A Python client and a Go server share the OpenAPI spec — neither needs to understand the other's implementation, or even what language it's written in. This was the first major jump: information hiding across language boundaries. The contract became a standalone artifact, separate from any implementation.

Level 4: Between organizations — standard formats. JSONL, SARIF, SMT-LIB, OCSF. The interface is no longer between two specific systems — it's a standard that any system can implement. SARIF doesn't belong to any security scanner; it's a format any scanner can emit and any dashboard can consume. The contract became universal. The implementation became fully interchangeable.

Level 5: Between human and machine — specifications. This is the level the cognitive debt crisis demands. The interface is no longer between two pieces of software. It's between a human's intent and a machine's execution. A CEL predicate, a reasoning spec YAML, a typed invariant — these are contracts between the developer who knows what must be true and the engine (or AI agent) that makes it true.

The evolution at each level:

| Level | Scope | Developer reasons about | Interface construct | Examples |
|---|---|---|---|---|
| 1 | Within a program | A function: inputs, outputs, behavior | Language-level signatures | Go interface types, Java interface, C function prototypes |
| 2 | Between modules | A module: responsibilities, boundaries | Package/module exports | Go internal/, Java 9 modules, C header files (.h) |
| 3 | Between systems | A system: capabilities, failure modes | API contracts, protocols | Unix pipes (stdin/stdout), RPC, REST/OpenAPI, gRPC/Protobuf |
| 4 | Between organizations | An ecosystem: interoperability, data exchange | Standard formats | SARIF, JSONL, EDI, SWIFT messages, HL7/FHIR, OCSF |
| 5 | Between intent and execution | Intent: what must always be true | Specifications, invariants | CEL predicates, reasoning specs, TLA+ properties |

Each level raises the abstraction. At Level 1, the developer reasons about control flow and data types within a single program. At Level 3, Unix proved that processes don't need to share memory or language — grep | sort | uniq composes through stdin/stdout, the universal system-level interface. At Level 4, organizations that have never communicated can exchange data because SARIF and HL7 define the shape independently of any implementation. At Level 5, the developer reasons about what must be true, regardless of how any program, module, or system implements it.

At each level, the same thing happens: the interface enables reasoning without understanding the implementation. At Level 1, you reason about the function without reading its body. At Level 3, you reason about the service without knowing its language. At Level 5, you reason about the system's safety without reading any code at all.

But something deeper shifts alongside the interface: what the developer understands changes.

At Level 1, the developer understands a function — its signature, its preconditions, its return type. The unit of understanding is a single operation.

At Level 2, the developer understands a module — its exported API, its responsibilities, its boundaries. The unit of understanding is a coherent set of operations.

At Level 3, the developer understands a system's capability — what the service does, not how it does it. "This service classifies errors" is a capability. The developer integrates with the capability through a contract (OpenAPI spec), not by reading source code. Integration becomes easier precisely because the contract specifies the capability at a level above the implementation.

At Level 4, the developer understands a data shape — what the standard format carries, what fields mean, what consuming tools expect. SARIF findings have a ruleId, a level, a message. Any tool that emits this shape integrates with any dashboard that reads it. Understanding the shape is understanding the integration.

At Level 5, the developer understands an invariant — what must always be true about the system. Not how it's enforced. Not what code checks it. What property holds. "No unauthorized principal can read sensitive data through any path in the access graph" is a statement about the system's meaning, not its mechanism.
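One way to picture such an invariant as something checkable is the Go sketch below. It assumes a hypothetical AccessGraph type in which principals, roles, and resources are nodes and an edge means "can reach directly". That shape is an illustration only, not a standard format or the author's specification language.

```go
package main

import "fmt"

// AccessGraph maps a node (principal, role, or resource) to the nodes it can
// reach directly. The shape is hypothetical, chosen only for this sketch.
type AccessGraph map[string][]string

// CanReach reports whether target is reachable from start along any path,
// using a breadth-first search.
func (g AccessGraph) CanReach(start, target string) bool {
	seen := map[string]bool{start: true}
	queue := []string{start}
	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		if node == target {
			return true
		}
		for _, next := range g[node] {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, next)
			}
		}
	}
	return false
}

// Violations lists every unauthorized principal that can reach a sensitive
// resource. An empty result means the invariant holds for this graph.
func Violations(g AccessGraph, principals []string, authorized map[string]bool, sensitive []string) []string {
	var out []string
	for _, p := range principals {
		if authorized[p] {
			continue
		}
		for _, s := range sensitive {
			if g.CanReach(p, s) {
				out = append(out, p+" -> "+s)
			}
		}
	}
	return out
}

func main() {
	g := AccessGraph{
		"alice":       {"admin-role"},
		"bob":         {"intern-role"},
		"admin-role":  {"payroll-db"},
		"intern-role": {"wiki", "admin-role"}, // a misconfiguration
	}
	authorized := map[string]bool{"alice": true}
	fmt.Println(Violations(g, []string{"alice", "bob"}, authorized, []string{"payroll-db"}))
}
```

The check never asks how the roles were implemented or which code enforces them. It only asks whether the property holds.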

Notice what happens to durability as you climb. Understanding a function's implementation is fragile — it changes with every refactor, every AI regeneration, every optimization. Understanding a module's API is more durable — APIs change less often than implementations. Understanding a system's capability is more durable still — capabilities persist across rewrites. Understanding an invariant is the most durable of all — "no unauthorized access to sensitive data" doesn't change when you switch languages, rewrite the backend, or replace the team.

Cognitive debt compounds fastest at the lowest levels of understanding, because that's where change is fastest and understanding is most fragile. Level 1 understanding (function internals) erodes the moment AI regenerates the function. Level 5 understanding (safety invariants) persists across every regeneration, because the invariant describes intent, not implementation.

This is why the specification-first model resolves cognitive debt: it moves the developer's understanding to the level where it's most durable. You stop understanding the code (Level 1-2, fragile, erodes with every AI-generated change) and start understanding the invariants (Level 5, durable, persists across implementations). The code becomes disposable. The understanding doesn't.

The industry already walked Levels 1 through 4. Most developers have written a language-level interface (Level 1), consumed an OpenAPI spec (Level 3), or emitted a standard format such as SARIF or JSONL (Level 4). The patterns are familiar. The cognitive skill of reasoning about the contract rather than the implementation is already trained.

Level 5 is the same skill applied one level higher. Instead of "I don't need to understand the HTTP handler's implementation, I need to understand the OpenAPI spec," it's "I don't need to understand the AI-generated codebase, I need to understand the safety invariants." The reasoning is the same. The level of abstraction shifts. The cognitive load drops because a well-drawn specification is smaller than the implementation behind it, at every level. That's Parnas's insight, scaled to its logical conclusion.

What makes Level 5 urgent now is AI. At Levels 1-4, humans wrote the implementations behind the interfaces. The implementations were slow to produce, so cognitive debt accumulated gradually. At Level 5, AI generates the implementations instantly. The implementations are vast, fast-changing, and opaque. The only thing that keeps the developer's understanding intact is the specification, the Level 5 interface. Without it, the developer has no contract to reason against. They're back to reading AI-generated code line by line, which is exactly the implementation-level reasoning that Parnas's 1972 criterion was meant to make unnecessary.

Technical debt vs. cognitive debt

The Parnas lens sharpens the distinction between technical debt and cognitive debt:

| Aspect | Technical debt | Cognitive debt |
|---|---|---|
| Where it lives | In the repository: messy code, poor formatting, tight coupling | In human heads: eroded understanding of how and why the system works |
| How it's detected | Linters, static analysis, code smells, compiler warnings | Crises, outages, fear of changing code, "don't touch that module" folklore |
| Parnas diagnosis | Bad code structure that fails the modularity test; design decisions leaked across boundaries | Breakdown of the interface contract; the internals have overwhelmed the developer's ability to reason about the module |
| How it compounds | Linearly: each shortcut adds friction | Exponentially: each un-understood module makes adjacent modules harder to understand |
| How AI affects it | AI can reduce it: AI-generated code is often well-formatted and structurally clean | AI accelerates it: well-formatted code that nobody understands is still cognitive debt |

The last row is the critical one. AI-generated code often has less technical debt than human-written code — better naming, consistent formatting, comprehensive error handling. And it simultaneously creates more cognitive debt — because the developer didn't build the mental model during writing.

Tackling technical debt (reformatting, renaming, extracting functions) doesn't reduce cognitive debt. The code looks better. The developer's understanding is unchanged. The cognitive debt measures something the linter can't see: does the developer understand this code well enough to change it safely?

Parnas's answer: they don't need to understand the code. They need to understand the interface. If the module boundary is drawn correctly (hiding volatile design decisions), the developer's understanding of the interface is sufficient for safe changes. The implementation is someone else's problem — whether someone else is another developer, another team, or an AI agent.

What this means for AI-assisted development

Three concrete practices follow from applying Parnas to AI-generated code:

Specify the module boundaries before prompting the AI. Don't prompt "implement retry with circuit breaking." Prompt: "implement three modules: an ErrorClassifier with interface classify(error) → transient | terminal, a BackoffStrategy with interface nextDelay(attempt) → duration, and a CircuitBreaker with interface allow() → bool, record(outcome). Each module hides its implementation. Then implement a RetryOrchestrator that calls them through their interfaces." The Parnas decomposition is the developer's job. The implementation is the AI's job.

Review interfaces, not implementations. When the AI returns the code, review the interfaces first. Are they stable? Do they hide the volatile design decisions? Can a developer who reads only the interfaces predict the module's behavior? If the interface exposes implementation details (a circuit breaker interface that reveals its internal state enum), send it back. The interface is the cognitive firewall. If the firewall leaks, the module fails regardless of how correct the implementation is.

Measure cognitive debt by interface complexity, not code complexity. A module with 500 lines of AI-generated implementation behind a 2-line interface has low cognitive debt. A module with 50 lines of AI-generated code and a 20-line interface has high cognitive debt, because the developer must understand 20 interface elements to use it safely. The ratio of interface complexity to implementation complexity works as a rough, Parnas-inspired measure of cognitive exposure. When AI generates code, track the interface complexity, not the line count.
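As a rough illustration of that ratio, the Go sketch below uses line counts as a stand-in for complexity. That stand-in is an assumption, and a crude one. ModuleStats and Exposure are hypothetical names, and the two sample modules reuse the numbers from the paragraph above.

```go
package main

import "fmt"

// ModuleStats holds line counts for one module, gathered by hand or by a tool.
type ModuleStats struct {
	Name                string
	InterfaceLines      int
	ImplementationLines int
}

// Exposure returns interface lines divided by implementation lines.
// Lower values mean more implementation is hidden behind each interface line.
func (m ModuleStats) Exposure() float64 {
	if m.ImplementationLines == 0 {
		return 0 // nothing hidden, nothing to measure
	}
	return float64(m.InterfaceLines) / float64(m.ImplementationLines)
}

func main() {
	modules := []ModuleStats{
		{Name: "retry", InterfaceLines: 2, ImplementationLines: 500},
		{Name: "leaky", InterfaceLines: 20, ImplementationLines: 50},
	}
	for _, m := range modules {
		fmt.Printf("%s: %.3f\n", m.Name, m.Exposure())
	}
}
```

The absolute numbers mean little on their own. The comparison between modules, and the trend over time for one module, is where a measure like this could be useful.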

The 53-year head start

Parnas argued in 1972 that human cognitive capacity is the hard constraint of software engineering. Not compilation speed. Not runtime performance. Not deployment frequency. The developer's ability to understand the system well enough to change it safely is the bottleneck that everything else depends on.

For 53 years, this insight was treated as a code organization principle: define modules, use good interfaces, practice encapsulation. It was right but not urgent. Humans wrote code slowly enough that they built mental models as they went. The cognitive bottleneck was real but manageable.

AI made it urgent. The cost of writing code dropped to near zero. The cost of understanding code stayed constant. The gap between production speed and comprehension speed, the gap Parnas identified as the fundamental constraint, keeps widening with every generated module.

The cure is the same as Parnas prescribed: decompose by design decisions, hide implementations behind stable interfaces, bound the developer's cognitive load to the interface rather than the implementation. The AI generates the implementation. The developer owns the interface. The interface is the understanding.

Parnas gave us a 53-year head start on solving this crisis. We just didn't know we'd need it until the AI made the constraint visible.


This article is part of a series on cognitive debt in AI-assisted development. Related: "Six Contradictions Behind Cognitive Debt" (TRIZ analysis), "Six Industries That Already Solved Cognitive Debt" (cross-domain precedents), and "The Next Platform Won't Track Code. It'll Track Intent." (the specification-first platform).
