Every week, a new article appears on Dev.to or Zenn arguing about code quality in the AI era.
"AI-generated code is hard to read." "Premature optimization creates comprehension debt." "Clean code matters more now, not less."
These are thoughtful arguments. I've read them carefully.
But I think they're all built on an assumption nobody is questioning:
That code will remain the default layer where human judgment operates.
We've Seen This Movie Before
When factories replaced craftsmen, people debated how to preserve the craftsman's eye. How do you maintain quality when a single worker can no longer inspect every piece?
The answer wasn't to slow down the factory. It was to stop inspecting individual outputs entirely.
Six Sigma (pioneered at Motorola, scaled famously at GE) didn't work by making each product more readable to human inspectors. It worked by shifting the object of control from the product to the process. Statistical process control replaced individual inspection. The question moved from "is this bolt good?" to "is this process producing acceptable defect rates?"
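To make that shift concrete, here is a minimal sketch of statistical process control. The batch sizes, defect counts, and the `process_in_control` function are all hypothetical; the point is only that the check runs on the process's defect rates, never on any individual unit.

```python
# A minimal sketch of statistical process control (SPC).
# All numbers are hypothetical; we judge the *process*, not single units.

def process_in_control(defect_counts: list[int], batch_size: int, sigma: float = 3.0) -> bool:
    """Check whether per-batch defect rates stay inside +/- sigma control limits."""
    rates = [count / batch_size for count in defect_counts]
    p_bar = sum(rates) / len(rates)                      # average defect rate
    spread = (p_bar * (1 - p_bar) / batch_size) ** 0.5   # binomial std. error per batch
    upper = p_bar + sigma * spread
    lower = max(0.0, p_bar - sigma * spread)
    # Not "is this bolt good?" but "is any batch outside the control limits?"
    return all(lower <= rate <= upper for rate in rates)

# Ten batches of 500 units each, with per-batch defect counts:
print(process_in_control([3, 5, 2, 4, 6, 3, 5, 4, 2, 5], batch_size=500))  # True
```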
The same transition is coming to software.
Why We Read Code (And Why That's Changing)
When you ask why engineers read code, the answers cluster around a few purposes:
- Verify it does what was intended
- Find security vulnerabilities
- Understand performance characteristics
- Know what to change and where
Now ask which of those actually requires reading code:
- Verify behavior → tests
- Security → static analysis tools
- Performance → measurement
- What to change → here's where it gets interesting
Some reading will always remain. Incident post-mortems, security breaches, performance regressions, responsibility boundary disputes — these are cases where humans will trace back through code. That's not going away.
But notice what those cases have in common: they're exceptions, not the default flow. They're forensic, not operational.
The last routine reason — understanding what to change — is the one that's evaporating. And even that is a proxy. The actual goal is: take the next action correctly.
If AI can take the next action correctly without a human reading the code first, the reading step disappears.
We don't read compiled binaries as part of our daily development workflow. Not because binaries are unreadable in principle, but because we decided our default intervention layer was above that. We trusted the compiler.
We are at the beginning of making the same decision about AI-generated code.
Humans will not disappear from software quality control. But code will no longer be the default layer where human judgment operates.
The Rate Argument
There's a simpler version of this point.
AI generates code faster than humans can read it. That gap is not a temporary condition; it will only widen.
Every industry that hit this inflection point made the same choice: stop inspecting the output, start controlling the process.
The current debate about "how to write AI-assisted code properly" is the craftsman debating technique on the factory floor. The conversation is happening in the wrong place.
What Replaces Code as the Default Control Layer?
If code is no longer where human judgment routinely operates, three things move up to take its place:
1. Contracts, not code
The question "does this implementation look right?" gets replaced by "does this system do what was specified?" The specification becomes the artifact humans author and defend. Not the implementation.
2. Tests as the verification boundary
Tests don't require reading code. They require defining behavior. The human contribution is specifying what correct behavior looks like — which is a design decision, not an implementation review.
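A minimal sketch of that boundary, reusing the hypothetical `apply_discount` from above and assuming the `hypothesis` library for property-based testing. Nothing in the test references the implementation; it only states what correct behavior means.

```python
# Behavior is defined; implementations are never reviewed.
# Assumes the `hypothesis` library and the hypothetical `apply_discount` above.

from hypothesis import given, strategies as st

@given(
    price=st.floats(min_value=0.01, max_value=10_000, allow_nan=False),
    rate=st.floats(min_value=0.0, max_value=1.0, allow_nan=False),
)
def test_discount_stays_in_the_allowed_range(price, rate):
    result = apply_discount(price, rate)
    # The human contribution: saying what "correct" means.
    assert 0 <= result <= price
```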
3. Measurement as ground truth
Latency, error rates, behavioral drift — these are observable without reading a single line. The monitoring layer becomes the quality gate.
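As a sketch of what the monitoring layer looks like when it acts as the gate (thresholds and metric names are hypothetical):

```python
# A minimal sketch of measurement as the quality gate.
# SLO thresholds and metric names are hypothetical.

SLO = {"p99_latency_ms": 250, "error_rate": 0.001}

def gate(observed: dict[str, float]) -> bool:
    """Pass or fail a release on observed behavior alone."""
    return all(observed[metric] <= limit for metric, limit in SLO.items())

# Ship or roll back based on what the system does, not what the code says:
print(gate({"p99_latency_ms": 180, "error_rate": 0.0004}))  # True  -> keep
print(gate({"p99_latency_ms": 310, "error_rate": 0.0004}))  # False -> roll back
```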
The Problem Six Sigma Didn't Have
Manufacturing's version of this transition worked cleanly because specifications were stable and verifiable before use.
Software has a harder problem: you only discover specification defects through use.
Stakeholders don't fully know what they need until they've seen something that isn't it. The spec is always incomplete. No amount of contract formalization eliminates this.
This means the right analogy isn't Six Sigma. It's closer to iterative product development — where the goal isn't defect-free output, but fast feedback loops.
The human's job isn't to read the code. It's to shorten the cycle between "wrong assumption in the spec" and "that assumption gets corrected."
Two Separate Conversations We're Conflating
There are two distinct problems in play:
Problem A: Code quality — readable, maintainable, not prematurely optimized. This is the layer almost every "AI and code" article addresses.
Problem B: What humans should control — what layer of abstraction should human judgment operate on?
Problem A assumes Problem B is solved. It assumes code will remain the human control layer indefinitely.
Problem B, once you take it seriously, makes Problem A mostly irrelevant.
The debate about code style is a debate about how to decorate a layer that is moving out of human hands.
What Stays Human
The one thing that doesn't get automated is the judgment call about what to build and what "done" means.
Not because it's technically hard to automate. Because it's inherently a negotiation between humans — stakeholders, users, teams — about value and priority. That negotiation can't be delegated to a process. It ends in a handshake, not a test suite.
This is where what I call a Skill Operating Contract comes in. Not a prompt. Not a style guide.
A Skill Operating Contract is an operational boundary: it defines what evidence, tests, assumptions, risks, and human approvals are required before the work can be considered complete.
The human doesn't watch the AI write code. The human defines what "done" means — and the contract holds that definition stable across every execution.
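To make the idea concrete, here is one possible shape for such a contract as plain data. Every field name and value below is my own illustration, not a fixed format.

```python
# A minimal sketch of a Skill Operating Contract as data.
# All field names and values are illustrative, not a standard.

CONTRACT = {
    "task": "add bulk-export endpoint",
    "evidence_required": ["passing behavioral test run", "load-test report"],
    "tests_required": ["behavioral suite green", "no new static-analysis findings"],
    "assumptions": ["export size is capped at 100k rows"],
    "risks_accepted": ["cold-cache latency spike on first export"],
    "human_approvals": ["product owner", "security reviewer"],
}

def is_done(contract: dict, satisfied: set[str]) -> bool:
    """'Done' means every required item in the contract is satisfied."""
    required = (
        contract["evidence_required"]
        + contract["tests_required"]
        + contract["human_approvals"]
    )
    return all(item in satisfied for item in required)
```

The contract, not the code, is what the human reviews; the same definition of "done" then applies to every execution.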
The question is no longer "how should this code be written?"
The question is "what does it mean for this to be done?"
That's the conversation worth having.
This is part of an ongoing series on moving from prompt engineering to judgment externalization. If this framing resonates — or if you think I'm wrong — I'd genuinely like to hear it.