I realized something that changed how I work with AI:
Code review of AI-generated code costs as much as writing it yourself.
That cancels out the speed advantage entirely. You're back to square one.
The problem isn't the model. The problem is where you're putting the verification effort.
## The Insight
AI hallucinations show up in behavior, not in syntax. The code looks fine. It passes review. It passes tests. Then it breaks in production.
So stop verifying code. Verify behavior.
## The Methodology
1. Assume AI will always make mistakes
Not sometimes — always. Same engineering logic as "networks drop packets." Design around the failure.
2. Define steps before implementation
Steps exist in every implementation anyway — either you define them before, or AI invents them during. Defining them upfront costs almost nothing but
gives you a behavioral contract you can verify.
3. How to choose step granularity
This is the hardest part. The conventional instinct is to split along code boundaries — functions, modules, services.
The right instinct: split along data change moments.
A 1500-line function can be a single step if it produces one data change you care about. Three lines of code can be a step if those three lines produce
data you need to verify. Step size has nothing to do with code size.
This also means you're not constraining AI's implementation. You're only saying "I need to see the data at this point." How AI gets there is entirely its
creative space.
4. Wrap each step in a tracer
Record input, output, duration, success/failure. No instrumentation needed after the fact.
5. Verify by observing results
Did the step run? Is the data correct? Is the timing reasonable? A human can answer these in seconds. No code review required.
Human defines steps → AI implements → Trace verifies → Human observes
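Steps 4 and 5 can be made concrete with a small wrapper. What follows is a minimal synchronous sketch, not the code from the demo repository: `traceStep`, `StepRecord`, the in-memory `records` array, and the `prod_001` stock level are all hypothetical, and a real version would likely be async and write records somewhere durable.

```typescript
// Hypothetical record shape: what the tracer captures for each step.
interface StepRecord {
  traceId: string;
  step: string;
  input: unknown;
  output?: unknown;
  error?: string;
  ok: boolean;
  durationMs: number;
}

// In-memory sink for this sketch; a real system would likely persist records.
const records: StepRecord[] = [];

// Run one step and record input, output (or error), duration, and success/failure.
function traceStep<I, O>(traceId: string, step: string, input: I, run: (input: I) => O): O {
  const startedAt = Date.now();
  try {
    const output = run(input);
    records.push({ traceId, step, input, output, ok: true, durationMs: Date.now() - startedAt });
    return output;
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    records.push({ traceId, step, input, error, ok: false, durationMs: Date.now() - startedAt });
    throw err; // rethrow so the caller still sees the failure
  }
}

// Made-up stock levels for the usage example.
const stockLevels: Record<string, number> = { prod_001: 5, prod_002: 0 };

// The step boundary is the data change (stock availability), not the code size.
function checkStock(req: { productId: string; quantity: number }): { available: number } {
  const available = stockLevels[req.productId] ?? 0;
  if (available < req.quantity) {
    throw new Error(`Insufficient stock: available=${available}, requested=${req.quantity}`);
  }
  return { available };
}

traceStep("trace-1", "Check stock", { productId: "prod_001", quantity: 1 }, checkStock);
try {
  traceStep("trace-2", "Check stock", { productId: "prod_002", quantity: 1 }, checkStock);
} catch {
  // The failure is already in `records`; nothing more to do in this sketch.
}
console.log(records);
```

The wrapper never looks inside `run`. It only sees the data going in and coming out, which is the part a human wants to observe.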
## Without Tracing vs. With Tracing
Without tracing:
❌ Error: Insufficient stock
→ Which step failed? How far did it get? Unknown.
With tracing:
❌ Step 1: ① Check stock (44ms)
Input : {"productId":"prod_002","quantity":1}
Error : Insufficient stock: available=0, requested=1
Exact step. Exact input. Exact error. Located instantly.
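Output like the failed step above is cheap to produce once each step has a record. A rough sketch of a formatter, assuming a hypothetical `FormattableRecord` shape (the demo's actual field names may differ):

```typescript
// Hypothetical record shape; only the fields the formatter needs.
interface FormattableRecord {
  step: string;
  ok: boolean;
  durationMs: number;
  input?: unknown;
  error?: string;
}

// Render one record in the style of the trace output shown above.
function formatRecord(position: number, r: FormattableRecord): string {
  const head = `${r.ok ? "✅" : "❌"} Step ${position}: ${r.step} (${r.durationMs}ms)`;
  if (r.ok) return head;
  // On failure, include the exact input and error so the step can be diagnosed directly.
  return [head, `Input : ${JSON.stringify(r.input)}`, `Error : ${r.error}`].join("\n");
}

const failedLine = formatRecord(1, {
  step: "① Check stock",
  ok: false,
  durationMs: 44,
  input: { productId: "prod_002", quantity: 1 },
  error: "Insufficient stock: available=0, requested=1",
});
console.log(failedLine);
```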
Successful order with tracing:
✅ Step 1: ① Check stock (31ms)
✅ Step 2: ② Lock stock & create order (32ms)
✅ Step 3: ③ Push to payment queue (15ms)
✅ Step 4: ④ Payment processed (64ms)
✅ Step 5: ⑤ Notify shipping (28ms)
The trace is the delivery proof. No code review needed.
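Two of the observer's questions (did every step run, and was the timing reasonable) can also be checked mechanically. A sketch under assumptions: `reviewTrace` and `ObservedStep` are hypothetical names, and the 100ms budget is an arbitrary value chosen for illustration. Whether the data is correct is still left to the human.

```typescript
// Hypothetical summary of one traced step.
interface ObservedStep {
  step: string;
  ok: boolean;
  durationMs: number;
}

// Report steps that are missing, failed, or slower than the budget.
function reviewTrace(expected: string[], observed: ObservedStep[], budgetMs: number): string[] {
  const problems: string[] = [];
  for (const stepName of expected) {
    const hit = observed.find((o) => o.step === stepName);
    if (!hit) problems.push(`missing: ${stepName}`);
    else if (!hit.ok) problems.push(`failed: ${stepName}`);
    else if (hit.durationMs > budgetMs) problems.push(`slow: ${stepName} (${hit.durationMs}ms)`);
  }
  return problems;
}

const expectedSteps = [
  "Check stock",
  "Lock stock & create order",
  "Push to payment queue",
  "Payment processed",
  "Notify shipping",
];

// Durations copied from the successful trace above.
const successfulTrace: ObservedStep[] = [
  { step: "Check stock", ok: true, durationMs: 31 },
  { step: "Lock stock & create order", ok: true, durationMs: 32 },
  { step: "Push to payment queue", ok: true, durationMs: 15 },
  { step: "Payment processed", ok: true, durationMs: 64 },
  { step: "Notify shipping", ok: true, durationMs: 28 },
];

// An empty list means every expected step ran, succeeded, and stayed within budget.
console.log(reviewTrace(expectedSteps, successfulTrace, 100));
```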
## Why Heavy Prompt Constraints Backfire
Heavy prompt constraints tell AI what not to do. AI spends attention avoiding rules instead of solving the problem.
Step definitions tell AI what to achieve. The implementation is AI's creative space. You get better results and more flexibility — not less.
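One way to hand step definitions to a model is to keep them as data and render them into the prompt. This is only a sketch: the `StepDefinition` shape, the `produces` descriptions, and the prompt wording are my assumptions for illustration, not something the demo prescribes.

```typescript
// Hypothetical shape for a step definition.
interface StepDefinition {
  label: string;    // short name a human will recognize in the trace
  produces: string; // the data the step must produce (illustrative wording)
}

// The five order steps used in this article's example trace.
const orderSteps: StepDefinition[] = [
  { label: "Check stock", produces: "available quantity for the product" },
  { label: "Lock stock & create order", produces: "order record with reserved stock" },
  { label: "Push to payment queue", produces: "queued payment message" },
  { label: "Payment processed", produces: "payment result" },
  { label: "Notify shipping", produces: "shipping notification" },
];

// Render the contract as plain text: it states what to achieve, not how.
function toPromptSection(steps: StepDefinition[]): string {
  const lines = steps.map((s, i) => `${i + 1}. ${s.label}: must produce ${s.produces}`);
  return ["Implement these steps. How each step is implemented is up to you.", ...lines].join("\n");
}

console.log(toPromptSection(orderSteps));
```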
## The Bugs You Can't See
The demo (src/order.ts) contains both implementations: a naive version and a version built from defined steps. The naive version has three real bugs that are invisible in code review:
- Race condition — stock checked and deducted separately. Concurrent requests oversell.
- Missing transaction — order creation and stock deduction not atomic. Crash between them = inconsistent data.
- No idempotency — duplicate payment messages process twice.
These pass code review. They pass unit tests. They show up in production under load.
Defining the steps precisely forces these boundaries to become visible.
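To make the first and third bugs tangible, here is a simplified in-memory sketch. It is not the code in src/order.ts: the `Map`-based stock table, the function names, and the message IDs are hypothetical. The interleaving of two requests is simulated by hand so the result is deterministic. In a real system the safe variant would typically be a database transaction or a conditional update rather than a single in-process function, and that is also where the missing-transaction bug would be addressed.

```typescript
// In-memory stand-in for a stock table (hypothetical).
const naiveStock = new Map<string, number>([["prod_001", 1]]);

// Naive: the check and the deduction are separate operations.
function hasStock(productId: string, quantity: number): boolean {
  return (naiveStock.get(productId) ?? 0) >= quantity;
}
function deductStock(productId: string, quantity: number): void {
  naiveStock.set(productId, (naiveStock.get(productId) ?? 0) - quantity);
}

// Simulate two concurrent requests: both check before either deducts.
const requestAPassed = hasStock("prod_001", 1);
const requestBPassed = hasStock("prod_001", 1);
if (requestAPassed) deductStock("prod_001", 1);
if (requestBPassed) deductStock("prod_001", 1);
const stockAfterNaive = naiveStock.get("prod_001"); // negative: the item was oversold

// Safer variant: check and deduct as one indivisible operation.
const safeStock = new Map<string, number>([["prod_001", 1]]);
function reserveStock(productId: string, quantity: number): boolean {
  const available = safeStock.get(productId) ?? 0;
  if (available < quantity) return false;
  safeStock.set(productId, available - quantity);
  return true;
}
const firstReserve = reserveStock("prod_001", 1);
const secondReserve = reserveStock("prod_001", 1);

// Idempotency: remember processed message IDs so a duplicate is ignored.
const processedMessageIds = new Set<string>();
let chargesMade = 0;
function handlePaymentMessage(messageId: string): boolean {
  if (processedMessageIds.has(messageId)) return false;
  processedMessageIds.add(messageId);
  chargesMade += 1;
  return true;
}
handlePaymentMessage("msg-1");
handlePaymentMessage("msg-1"); // duplicate delivery of the same message

console.log({ stockAfterNaive, firstReserve, secondReserve, chargesMade });
```

A trace that records the stock value after the deduction step would show the negative number immediately, which is the kind of data change a step boundary is meant to expose.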
## Design Principles
- No model lock-in — works with any model. Optimized for the weakest model you use; stronger models only improve results.
- No language lock-in — the pattern works in any language. This demo is TypeScript.
- Minimum viable contract: two fields, `traceId` and step name. Everything else is implementation.
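As a sketch of why two fields can be enough: with only `traceId` and a step name, the path of each request can already be reconstructed. The `TraceEvent` type, `stepsByTrace`, and the sample IDs below are hypothetical.

```typescript
// The two required fields; anything else is optional detail.
interface TraceEvent {
  traceId: string;
  step: string;
  [extra: string]: unknown;
}

// Group step names by traceId, preserving arrival order.
function stepsByTrace(events: TraceEvent[]): Map<string, string[]> {
  const grouped = new Map<string, string[]>();
  for (const e of events) {
    const steps = grouped.get(e.traceId) ?? [];
    steps.push(e.step);
    grouped.set(e.traceId, steps);
  }
  return grouped;
}

// Events from two requests arriving interleaved.
const sampleEvents: TraceEvent[] = [
  { traceId: "order-1", step: "Check stock" },
  { traceId: "order-2", step: "Check stock", durationMs: 44 },
  { traceId: "order-1", step: "Lock stock & create order" },
];

console.log(stepsByTrace(sampleEvents));
```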
## Try It
```bash
git clone https://github.com/adun1982/step-trace
cd step-trace
npm install
npx tsx src/index.ts
```
---
This is shared freely. No product. No upsell. Just a pattern that emerged from real production use — solo developer, 20 restaurant chains, 400 locations,
team-scale output.
If it changes how you think about AI-assisted development, that's enough.
---