We let AI write the code. We just don't let it check its own work.

#ai #llm #discuss #programming

Everyone's chasing a new word for "AI engineering." The bar it's supposed to clear never actually moved.

AI's had three names for "engineering" in about as many years.

First it was prompt engineering — learn to write the magic words. Then, around the middle of 2025, everyone pivoted to context engineering — stop fussing over wording, start filling the context window with the right stuff. Now, early 2026, the word is harness engineering — the layer wrapped around the model: tools, memory, state, error recovery, verification, permissions.

Every time the word turns over, a wave of people panic that they've fallen behind. I've stopped reacting to it. Because the more I build, the more obvious it gets: the word keeps changing and the thing underneath doesn't.

Here's the thing underneath. Whatever you call the tooling, you're always crossing the same gap — from "the AI produced something that runs" to "I'll put my name on this, and if it breaks in production, that's on me." Getting across that gap is the entire job. The new words are just the industry renaming the same crossing, over and over.

So instead of arguing about vocabulary, let me just show you ours — the actual rules four of us use to get across that gap. Almost none of it is clever. That's kind of the point.

Rule 1: nothing starts without a finish line

If we can't say what "done" looks like before we begin, we don't begin. Then we keep cutting the task down until each piece is small enough to be verified on its own. At commit time, a hook fires automatically and runs an informedness check against a threshold we call τ, which makes sure a human actually understands every necessary change in that commit. Every necessary change, not every line. We don't make anyone re-read the AI's output line by line; that both distrusts the tool and doesn't scale. The point is to pull the change up to the one layer of abstraction a person should own, and be genuinely clear at that layer.
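To make the commit-time check concrete, here is a minimal sketch of the gate logic, not our actual hook. The `Change` record, the 0-to-1 informedness score, and the function names are all assumptions made up for illustration; the only idea taken from the rule is that necessary changes are compared against τ and everything else is left alone.

```python
from dataclasses import dataclass


@dataclass
class Change:
    summary: str         # description at the layer of abstraction a person should own
    necessary: bool      # True if the commit does not work without this change
    informedness: float  # hypothetical 0.0-1.0 score of how well a human understands it


def blocked_changes(changes, tau):
    """Return the necessary changes whose informedness falls below tau."""
    return [c for c in changes if c.necessary and c.informedness < tau]


def commit_allowed(changes, tau):
    """A commit passes only when no necessary change is below the threshold."""
    return not blocked_changes(changes, tau)
```

Incidental changes, such as reformatted imports, never block the commit in this sketch, which is the "necessary, not every line" distinction. How the informedness score gets produced is left open here.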

Rule 2: the threshold slides with the blast radius

A low-risk, easily-reversible change gets a light touch — we lean on fast rollback and let it move. The moment something is irreversible or high-impact — data, billing, permissions, migrations — the threshold goes to the ceiling: higher informedness required, and we cross-check it with a second model on a different base. Speed and safety aren't two different settings here. They're two ends of the same knob, and we turn it by consequence instead of turning it all the way to "slow" or all the way to "gamble."
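As a rough sketch of turning that knob by consequence, the function below maps a change's blast radius to a review policy. The area labels come from the list above; the numeric thresholds, the `review_policy` name, and the two-level output are assumptions for illustration only, and a real policy would likely have more than two positions.

```python
# Areas named in the rule as irreversible or high-impact.
HIGH_IMPACT_AREAS = {"data", "billing", "permissions", "migrations"}


def review_policy(areas, reversible, base_tau=0.5, ceiling_tau=0.95):
    """Pick an informedness threshold and decide whether a second model must cross-check.

    base_tau and ceiling_tau are placeholder values, not measured ones.
    """
    high_impact = bool(set(areas) & HIGH_IMPACT_AREAS)
    if high_impact or not reversible:
        # Irreversible or high-impact: threshold to the ceiling, plus a cross-check.
        return {"tau": ceiling_tau, "second_model_check": True}
    # Low-risk and easily reversible: light touch, lean on fast rollback.
    return {"tau": base_tau, "second_model_check": False}
```

For example, `review_policy(["billing"], reversible=True)` returns the ceiling threshold with the cross-check turned on, while `review_policy(["docs"], reversible=True)` returns the light-touch policy.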

Rule 3, the hard one: verification has to be independent

Independent means the thing that checks is not the thing that wrote. We stack two layers. The first is you — but only if you're genuinely informed enough about this piece, which is exactly the threshold from the last rule. A human who actually understands the change is the first independent verifier. The second layer is a model on a different base, checking the first one's work. The pattern the industry's settling into right now is roughly this: Claude Code writes, Codex checks. Different training, different blind spots — so the blind spots don't line up, and something actually gets caught.

The one thing we never do is let the AI check its own work. Change the prompt all you like; it's still reasoning the same way it did when it wrote the bug, so the blind spot sits there untouched.
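The independence rule can be written down as a filter. This is a sketch under assumptions: the `Verifier` record, the `base` labels, and the `human_informed` flag are hypothetical names, and "same base" is reduced to a simple string comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verifier:
    name: str
    kind: str       # "human" or "model"
    base: str = ""  # model family label; left empty for humans


def independent_verifiers(writer_base, verifiers, human_informed):
    """Keep only the verifiers that count as independent of the writing model."""
    accepted = []
    for v in verifiers:
        if v.kind == "human" and human_informed:
            # Layer one: a human, but only one who is informed enough.
            accepted.append(v)
        elif v.kind == "model" and v.base != writer_base:
            # Layer two: a model on a different base than the writer.
            accepted.append(v)
    return accepted
```

A model that shares the writer's base is dropped no matter how its prompt is worded, and an uninformed human is dropped too, so neither can be counted as a check.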

The part we never hand off

There's one part of all this we never hand off. You have to know when to let the AI write and when to specify the technical detail by hand — that judgment is learnable. But the actual logic of the problem, the clear line of reasoning through it — that can't be delegated to a model. That's yours, always.

And there's more than one honest way across the gap. Just among the four of us the styles diverge hard. One of us works architect-first: before the agent touches anything, he's already cut the thing into layers in his head and knows where every piece goes. That's heavy on the first verifier, the informed human. Another is bolder: let the agent build it, see it run, then throw several different models at it to check. That's heavy on the second verifier, the independent model. I used to think those two were opposites. They're not. They're just different weightings of the same two independent checks. We keep both around. Whatever lands the requirement safely is good AI engineering, full stop.

So what's the actual moat?

Not the vocabulary. You may have already noticed that the whole thing I just described is what people are now calling harness engineering, a name that only caught fire in 2026, and honestly, we're just starting to figure it out ourselves. I'm not going to pretend we've been doing this all along. Prompt, context, harness: there'll be more words after these. The moat is who can take each new tool and each new name and internalize it fastest into something they can actually ship. Keep up with that, the speed of turning novelty into deliverable work, and you keep up with the whole wave.

If you're wrestling with the same thing — trying to ship AI you can actually stand behind, or just working out your own delivery process and want to compare notes — those are my favorite people to talk to. Say hi: lingchong@iterant-ai.com.

Top comments (4)

Julian Neagu • Jul 7

This lands on one thing people keep rediscovering: the model can write, but it shouldn’t be the final judge of correctness.
Even a second model helps, but it’s still correlated failure modes unless you anchor it with real tests and constraints.
The “finish line before you start” rule is doing more work here than most of the tooling.

Lingchong Hu • Jul 7

Well put — defining the finish line before you start is exactly the part that does the heavy lifting. Thanks for reading and taking the time to write this.

Dipankar Sarkar • Jul 7

Rule 2 is the one doing the real work, and I'd push it a step further: the knob isn't only 'how much informedness for this blast radius,' it's 'can I shrink the blast radius before I measure it.' A migration behind expand/contract (add nullable column, backfill, flip read, drop later) or a billing change behind a shadow-write that reconciles against the old path is irreversible-on-paper but reversible-in-practice, and that drops the threshold back down honestly instead of forcing the second-model cross-check. The second model helps but it's correlated failure like the first comment says; reversibility isn't correlated with the model at all. So I'd order it: make it reversible first, then raise informedness only for the residue you couldn't make reversible. Does your tau account for 'behind a flag' as a first-class input, or is it scored on the raw change?

Lingchong Hu • Jul 7

Really appreciate this — "shrink the blast radius before you measure it" is a sharp way to put it, and reversibility-first is an ordering I'd defend too. Great insight, thanks for taking the time to write it up.