Bala Paranj

Posted on Jun 5 • Edited on Jun 25

Fallacies of GenAI Development #7: Specifications Are a New Artifact You Have to Create

#ai #softwaredevelopment #architecture #engineering

✓ Human-authored analysis; AI used for formatting and proofreading.

This is the seventh in a series of eight posts on the false assumptions teams make when building with generative AI. Fallacies #1 through #5 covered why AI-speed development breaks without verification. Fallacy #6 covered why generated code is a liability, not an asset. This post covers the assumption that stops most teams from adding verification: the belief that it requires writing new specifications from scratch.

The Fallacy

"To adopt specification-first development, we need to write specifications for everything. That's too much work."

Why it's tempting

The word specification carries baggage. Engineers who've been in the industry long enough remember:

The 1980s: Formal methods. Z notation. VDM. B method. Full behavioral specifications that were as complex as the implementation. Writing the spec took as long as writing the code. Maintaining both was double the work. When deadline pressure hit, the spec was the first thing dropped.

The 1990s: IEEE 830-style requirements documents. Hundreds of pages. Maintained alongside the code by a dedicated requirements engineer. The document and the code drifted apart within months. The requirements document became fiction that nobody read.

The 2000s: Model-driven development. UML diagrams generated code. The diagrams were the specification. Maintaining the diagrams was a full-time job. When the generated code needed manual modifications, the diagrams and the code diverged permanently.

Each attempt failed for the same reason: the specification tried to describe EVERYTHING the system does. A complete behavioral specification is as complex as the implementation. Maintaining both is unsustainable. The spec rots. The team abandons it. The effort was wasted.

Engineers hear specifications and think of these failures. The reaction is immediate: "We tried that. It didn't work. Too much overhead. We're not doing it again."

Why it's wrong

The fallacy is in the assumption that specification means what it meant in the 1980s — a complete behavioral description of the system.

It doesn't. Look at your codebase right now. You already have specifications. You just don't call them that.

The specifications you already maintain

Type signatures. Every function in a typed language has a specification: what it accepts and what it returns. func ParsePolicy(path string) (*PolicyDocument, error) — that's a specification. It says: this function takes a string, returns a document or an error. Any code that calls it with the wrong type fails at compile time. You already write these. You already maintain them. You already enforce them mechanically — the compiler is the enforcement gate.

You're already using formal methods. Type checking is arguably the most widely deployed formal verification technique in history. Every time you compile, it establishes a mathematical property of your code: type safety, to the extent the language's type system is sound. It's fast, deterministic, and catches errors before any human sees them. The job now is to extend the compiler's success to your other boundaries.
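A minimal sketch of the signature above acting as a specification. The fields of PolicyDocument and the JSON-file body of ParsePolicy are assumptions made for illustration; the article only gives the signature.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// PolicyDocument is hypothetical: the article names the type but not its fields.
type PolicyDocument struct {
	Name  string   `json:"name"`
	Rules []string `json:"rules"`
}

// ParsePolicy uses the signature quoted in the article.
// The body (read a JSON file from path) is an assumption.
func ParsePolicy(path string) (*PolicyDocument, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("read policy: %w", err)
	}
	var doc PolicyDocument
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, fmt.Errorf("decode policy: %w", err)
	}
	return &doc, nil
}

func main() {
	// Calls that break the signature never reach review; they fail to compile:
	//   ParsePolicy(42)                    // an int is not a string
	//   var n int = ParsePolicy("p.json")  // two results, neither is an int
	doc, err := ParsePolicy("policy.json")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("loaded policy:", doc.Name)
}
```

The signature says nothing about what a valid policy contains. It only fixes what goes in and what comes out, which is why it stays small and cheap to maintain.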

API contracts. Your REST API has an OpenAPI spec. Your gRPC services have Protocol Buffer definitions. Your GraphQL API has a schema. Each one specifies: these are the endpoints, these are the fields, these are the types, these are the required parameters. Any client that violates the contract gets an error. You already write and version them.

Database schemas. Your migration files specify: these are the tables, these are the columns, these are the constraints, these are the foreign keys. NOT NULL, UNIQUE, FOREIGN KEY — each is a specification enforced by the database engine on every write. You already write these.

Configuration schemas. Your Kubernetes manifests specify resource limits, health checks, replica counts. Your Terraform files specify infrastructure state. Your CI pipeline files specify build steps and their dependencies. Each is a specification that a machine reads and enforces.

Interface boundaries. Your Go packages have exported and unexported functions. Your Java classes have public and private methods. Your Python modules have __all__ lists. Each boundary specifies: this is the interface, everything else is hidden. Parnas told you to create these in 1972. You did.

What Parnas said

David Parnas's 1972 paper "On the Criteria To Be Used in Decomposing Systems into Modules" argued that a system should be split into modules that each hide a design decision likely to change. The practical payoff: nobody has to hold the entire system in their head. Each module exposes an interface (what it does) and hides its implementation (how it does it). Parnas called this information hiding.

This paper underlies most of what we now call modular design: language module systems, API boundaries, microservice architectures, package systems. When you write a Go package with exported functions, you're implementing Parnas. When you define a Protocol Buffer service, you're implementing Parnas. When you create a database schema with foreign key constraints, you're implementing Parnas.

You've been writing specifications your entire career. You just called them interfaces and schemas and contracts and types.

What's missing

The specifications exist. What's missing is a single thing: mechanical enforcement across all of them, on every change, at AI speed.

Your type system enforces type specifications, but only within one language and one build. It doesn't enforce that the TypeScript client matches the Go server's API contract.

Your database engine enforces schema constraints — but only at write time. It doesn't enforce that the application code handles all the constraint violations correctly.

Your API contract exists — but most teams don't have a CI check that fails the build when generated code deviates from the OpenAPI spec.

Your module boundaries exist — but nothing prevents an AI agent from generating code that reaches across boundaries through reflection, type casting, or dynamic imports.
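To make the module-boundary gap concrete, here is a rough sketch of a check that flags forbidden imports in a generated file, using only the Go standard library. The function name bannedImports, the example.com/shop module path, and the banned list are all hypothetical. A real project would more likely configure an existing linter than hand-roll this.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// bannedImports returns the import paths in src that equal a banned
// prefix or sit underneath one.
func bannedImports(src string, bannedPrefixes []string) ([]string, error) {
	fset := token.NewFileSet()
	// ImportsOnly stops parsing after the import declarations.
	file, err := parser.ParseFile(fset, "generated.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var hits []string
	for _, spec := range file.Imports {
		path, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		for _, prefix := range bannedPrefixes {
			if path == prefix || strings.HasPrefix(path, prefix+"/") {
				hits = append(hits, path)
				break
			}
		}
	}
	return hits, nil
}

func main() {
	// Hypothetical generated file that reaches across a boundary.
	src := `package api

import (
	"fmt"
	"reflect"

	"example.com/shop/internal/billing"
)
`
	banned := []string{"reflect", "unsafe", "example.com/shop/internal"}
	hits, err := bannedImports(src, banned)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, h := range hits {
		fmt.Println("boundary violation: import of", h)
	}
}
```

This only sees static import declarations. It would not catch a violation that goes through a plugin or a runtime lookup, so treat it as one gate among several.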

The specifications are there. The enforcement is human — code review, manual testing, "I'll check that during review." And as Fallacy #4 showed, human enforcement doesn't scale to AI-speed generation.

There's an irony here that connects to the GenAI context: AI models are remarkably good at following well-defined contracts and remarkably bad at guessing implicit ones. When your OpenAPI spec is enforced mechanically, you give the AI a fixed target. It doesn't have to guess your intent. It just has to satisfy the contract. The mechanical enforcement that catches human errors ALSO catches AI errors — and the AI benefits from the explicit boundary even more than the human does, because the AI has zero institutional knowledge to fall back on when the contract is absent.
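A contract gate of this kind can be sketched in a few dozen lines. Everything named here is hypothetical: orderContract is a hand-written stand-in for the response schema you would normally read from your OpenAPI document, and getOrder plays the role of generated handler code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
)

// orderContract maps each required field to its expected JSON type.
// Hypothetical stand-in for a schema taken from an OpenAPI document.
var orderContract = map[string]string{
	"id":    "string",
	"total": "number",
	"paid":  "boolean",
}

// jsonType names the JSON type of a value decoded by encoding/json.
func jsonType(v any) string {
	switch v.(type) {
	case string:
		return "string"
	case float64:
		return "number"
	case bool:
		return "boolean"
	case nil:
		return "null"
	case []any:
		return "array"
	case map[string]any:
		return "object"
	}
	return "unknown"
}

// checkContract lists every way body deviates from contract, sorted.
func checkContract(body []byte, contract map[string]string) []string {
	var got map[string]any
	if err := json.Unmarshal(body, &got); err != nil {
		return []string{"body is not a JSON object: " + err.Error()}
	}
	var violations []string
	for field, want := range contract {
		value, ok := got[field]
		if !ok {
			violations = append(violations, "missing field "+field)
			continue
		}
		if have := jsonType(value); have != want {
			violations = append(violations,
				fmt.Sprintf("field %s: want %s, got %s", field, want, have))
		}
	}
	sort.Strings(violations)
	return violations
}

// getOrder stands in for generated code: total is a string, paid is absent.
func getOrder(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	fmt.Fprint(w, `{"id": "ord_1", "total": "19.99"}`)
}

func main() {
	rec := httptest.NewRecorder()
	getOrder(rec, httptest.NewRequest(http.MethodGet, "/orders/ord_1", nil))
	for _, v := range checkContract(rec.Body.Bytes(), orderContract) {
		fmt.Println("contract violation:", v)
	}
}
```

Run in CI, a check like this turns "the response should match the spec" from a review comment into a failing build, and the failure message is specific enough for a human or an agent to act on.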

The three-layer model

Every domain that faced the "system too complex for humans to understand" problem arrived at the same three-layer structure:

Layer 1 — Boundaries (Parnas, 1972):
    The specifications. Already in your codebase.
    Module interfaces, API contracts, type signatures, schemas.
    Defines WHAT each part promises to the rest.

Layer 2 — Mechanical enforcement (the missing layer):
    A gate that checks EVERY change against EVERY boundary.
    At machine speed. Before deployment.
    The type checker is Layer 2 for type specifications.
    CI contract tests are Layer 2 for API specifications.
    What's missing: Layer 2 for architectural and security specifications.

Layer 3 — Rationale preservation:
    WHY each boundary exists.
    The failure mode it prevents. The incident that motivated it.
    Connected to the boundary so that changing it triggers
    review of the rationale.

Layer 1 you already have. Your interfaces, contracts, types, and schemas.

Layer 2 you partially have. The compiler is Layer 2 for types. The database engine is Layer 2 for schemas. What you're missing: Layer 2 for the architectural constraints, security properties, and cross-service contracts that currently live in people's heads or in documents.

Layer 3 you barely have. Maybe some Architecture Decision Records. Maybe some comments explaining "why" in the code. Rarely connected to the boundaries they explain. Rarely enforced — nobody checks whether a proposed change to an interface violates the rationale that shaped it.
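One way to connect a rationale to its boundary, sketched under assumptions: the Boundary type, the file patterns, and the rationale text below are all invented for illustration. The idea is only that a change touching a boundary's defining files should surface the reason the boundary exists.

```go
package main

import (
	"fmt"
	"path"
)

// Boundary ties a Layer 1 specification to its Layer 3 rationale.
// All fields are hypothetical; adapt them to however you store decisions.
type Boundary struct {
	Name      string
	Pattern   string // path.Match glob for the files that define the boundary
	Rationale string
}

// rationalesToReview returns the boundaries whose defining files
// appear in changedFiles, so a CI step can request a rationale review.
func rationalesToReview(changedFiles []string, boundaries []Boundary) ([]Boundary, error) {
	var out []Boundary
	for _, b := range boundaries {
		for _, f := range changedFiles {
			matched, err := path.Match(b.Pattern, f)
			if err != nil {
				return nil, fmt.Errorf("boundary %q: %w", b.Name, err)
			}
			if matched {
				out = append(out, b)
				break
			}
		}
	}
	return out, nil
}

func main() {
	boundaries := []Boundary{
		{"orders API", "api/openapi.yaml", "Mobile clients pin this response shape."},
		{"invoice schema", "db/migrations/*.sql", "Constraints added after duplicate invoices."},
	}
	changed := []string{"db/migrations/0042_add_invoice.sql", "cmd/server/main.go"}

	hits, err := rationalesToReview(changed, boundaries)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	for _, b := range hits {
		fmt.Printf("review rationale for %s: %s\n", b.Name, b.Rationale)
	}
}
```

The list of changed files would come from your version control system in practice. It is hard-coded here to keep the sketch self-contained.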

The extreme domains added Layer 2

No safety-critical domain stopped at Layer 1. Every one discovered that boundaries without enforcement don't survive contact with reality at scale.

Microprocessor design: Interface specifications exist (bus protocols, timing diagrams). Formal verification tools CHECK every circuit against the specifications mechanically. You can't tape out a chip without passing the verification suite.

Nuclear operations: Operating procedures exist (the specifications). Physical interlocks ENFORCE parameter limits mechanically. The reactor scrams automatically when parameters exceed the envelope — regardless of what the operator does.

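Aviation: Flight envelopes exist (the specifications). Fly-by-wire systems with envelope protection ENFORCE them mechanically. On an Airbus in normal law, for example, the pilot can't stall the aircraft with stick input alone: the flight computer limits the angle of attack before the command reaches the control surfaces.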

Google monorepo: API contracts exist (the specifications). CI gates ENFORCE them on every commit. A large-scale change touching millions of lines of code merges only when every affected contract test passes mechanically.

Each domain: specifications existed (Layer 1). Human enforcement couldn't keep up (Layer 2 missing). They added mechanical enforcement (Layer 2). The system survived scaling.

Your codebase is in the same position. Layer 1 exists. Layer 2 is partial. AI-speed generation made the gap visible.

Why this isn't the 1980s

The formal methods failures of the 1980s had a specific cause: the specification was a COMPLETE BEHAVIORAL DESCRIPTION — as complex as the code. That's not what this article is proposing.

1980s approach (failed):
    Write a full specification of everything the system does.
    Spec complexity ≈ code complexity.
    Maintaining both is double the work.
    Spec rots. Team abandons it.

Current approach (what already exists + enforcement):
    Recognize the specifications you already maintain.
    Add mechanical enforcement for the ones that aren't enforced.
    Spec complexity << code complexity because these are
    INTERFACE specifications, not BEHAVIORAL specifications.
    Types, contracts, schemas — small, stable, already maintained.

The type signature func ParsePolicy(path string) (*PolicyDocument, error) is not a complete behavioral specification of the function. It's an interface specification: what goes in, what comes out. It's smaller than the code. It changes less often. The compiler already enforces it.

The OpenAPI spec is not a complete description of your API's behavior. It's an interface specification: endpoints, fields, types, required parameters. It's smaller than the code. It changes less often. Contract tests enforce it wherever you run them in CI.

The database schema is not a complete model of your data's semantics. It's a structural specification — tables, columns, constraints. It's smaller than the application code. It changes less often. The database engine already enforces it.

Each of these succeeded where the 1980s failed because they specify INTERFACES, not BEHAVIOR. Interfaces are small. Behavior is large. Small specifications are maintainable. Large specifications rot.

What you can do this week

1. Inventory your existing specifications. Open your codebase. Count: How many typed function signatures? How many API contracts (OpenAPI, Protobuf, GraphQL schemas)? How many database schemas with constraints? How many CI pipeline definitions? You'll find more specifications than you expected. You've been writing them your entire career.

2. Identify the unenforced specifications. Of the specifications you found, which ones have mechanical enforcement (compiler, contract test, schema validator, CI check)? Which ones exist as documents but aren't checked on every change? The unenforced ones are your Layer 2 gaps. Each gap is a place where AI-generated code can violate an interface without anyone catching it.

3. Enforce one specification mechanically this week. Pick the most critical unenforced specification. Your OpenAPI contract that exists but has no contract test? Add one. Your module boundary that exists but nothing prevents cross-boundary imports? Add a linter rule: tools like arch-go, depguard, or eslint-plugin-import can restrict which packages may import which, so that code outside a module can't reach into its internal packages. That's a Layer 2 gate for a Layer 1 boundary, implemented in one config file. Your architecture decision that exists as a document but has no CI check? Write one check for one rule.

One specification. Already written. One enforcement gate. New this week. That's the entire adoption path. Not "write specifications for everything." Not "adopt formal methods." Not "add a six-month process change." One existing specification. One new CI check. This week.

4. Attach one rationale. Take the specification you just enforced. Add one comment or one metadata field: WHY does this boundary exist? What incident or failure mode motivated it? When someone proposes changing it next quarter, the rationale will save the team from reintroducing the problem the boundary was designed to prevent.

A Layer 3 comment isn't // This parses JSON. It's:

// We use this specific parser because it handles the non-standard
// date format from the legacy billing system. The standard library
// parser silently truncates these dates, causing invoice mismatches.
// See Incident #402. Do not replace without verifying billing
// integration tests pass with the legacy date format.

Without this, an AI agent will optimize the code by replacing the parser with the standard library version. The tests pass — because the test fixtures use standard dates. The billing integration breaks in production — because real billing data uses the legacy format. The rationale prevents the optimization that looks correct but isn't.
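The rationale gets stronger still if it is backed by a fixture in the legacy format, so the "optimization" fails a check instead of passing one. The sketch below is illustrative only: the article does not say what the legacy billing format is, so the day-first layout, the function names, and the fixture date are all assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// legacyLayout is hypothetical: day-first dates, so "03/04/2024"
// means 3 April 2024.
const legacyLayout = "02/01/2006"

// parseBillingDate is the parser the rationale comment protects.
func parseBillingDate(s string) (time.Time, error) {
	return time.Parse(legacyLayout, s)
}

// monthFirstParse is a plausible "simplification": it accepts the same
// string without error but reads it as 4 March 2024.
func monthFirstParse(s string) (time.Time, error) {
	return time.Parse("01/02/2006", s)
}

// checkLegacyFixture fails unless parse reads the legacy fixture
// as 3 April 2024.
func checkLegacyFixture(parse func(string) (time.Time, error)) error {
	got, err := parse("03/04/2024")
	if err != nil {
		return fmt.Errorf("legacy fixture rejected: %w", err)
	}
	want := time.Date(2024, time.April, 3, 0, 0, 0, 0, time.UTC)
	if !got.Equal(want) {
		return fmt.Errorf("legacy fixture parsed as %s, want %s",
			got.Format("2006-01-02"), want.Format("2006-01-02"))
	}
	return nil
}

func main() {
	fmt.Println("current parser:", checkLegacyFixture(parseBillingDate))
	fmt.Println("swapped parser:", checkLegacyFixture(monthFirstParse))
}
```

The swapped parser returns no error on the legacy string, which is exactly the silent kind of failure the comment warns about. Only a fixture in the legacy format exposes it.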

The specifications already exist. Parnas gave you the principle. Your language gave you the type system. Your API framework gave you the contract. Your database gave you the schema. The only new thing is the enforcement gate — and that's a CI configuration, not a cultural transformation.

Next and final in the series: Fallacy #8, "More AI Agents Means More Productivity." Why adding agents without adding specifications is like scaling a distributed system without adding protocols, and why the same coordination mechanisms that solved distributed computing solve distributed AI development.

The Fallacies of GenAI Development: eight assumptions every team is making. Each one leads to an architectural failure. Each one has already been solved.
