Here's a pattern I see constantly: a developer asks AI to refactor a function, gets a decent result, then spends 45 minutes on follow-up prompts trying to make it "perfect." The final version is barely better than the third iteration — and sometimes worse.
The problem isn't the AI. It's that you never defined what "done" looks like.
The Pattern
Before you start any AI-assisted task, write exit criteria. Literally. In your prompt.
Task: Refactor the processOrder function to handle partial failures.
Exit criteria (stop when ALL are true):
- [ ] Each payment/inventory/notification step can fail independently
- [ ] Failed steps are logged with enough context to retry manually
- [ ] The function returns a result object showing which steps succeeded/failed
- [ ] No step takes longer than 5 seconds (timeout handling)
- [ ] Existing tests still pass
When all boxes are checked, you're done. Ship it. Move on.
Why Developers Over-Iterate
Three reasons:
Perfectionism masquerading as diligence. "What if we also handle..." is not improving the code — it's scope creep in real time.
The AI is too agreeable. Ask it to improve something and it will always find something to improve. That's not a feature — it's a trap.
No definition of done. Without exit criteria, every iteration feels like progress because there's no finish line to cross.
A Real Session Gone Wrong
Without exit criteria, my session looked like this:
Turn 1: "Refactor processOrder for partial failures" → Good start
Turn 2: "Add better error messages" → Slight improvement
Turn 3: "Make the retry logic configurable" → Scope creep
Turn 4: "Actually, use a state machine" → Complete rewrite
Turn 5: "The state machine is too complex, simplify" → Back to Turn 2
Turn 6: "Add logging" → Should have been in the original spec
Turn 7: ... still going
Seven turns. The Turn 2 output was shippable. Everything after was waste.
With Exit Criteria
Same task, same AI, different approach:
Turn 1: Generate with exit criteria in prompt → 80% there
Turn 2: "Steps 1-3 are met. Step 4 (timeouts) is missing." → Fixed
Turn 3: Run tests → All pass. All criteria met. Done.
Three turns. Better result. The exit criteria acted as a filter: I only iterated on gaps, not preferences.
Writing Good Exit Criteria
Good exit criteria are:
- Binary. You can answer yes or no. Not "is it clean enough?"
- Testable. You can verify them with a test, a grep, or a code review.
- Minimal. 3-5 criteria max. More than that means you need to split the task.
- Ordered by priority. If you run out of patience, the top criteria are the ones that matter.
Bad exit criteria:
- ❌ "Code should be clean"
- ❌ "Handle all edge cases"
- ❌ "Production-ready"
Good exit criteria:
- ✅ "Function returns an error instead of throwing for invalid input"
- ✅ "No function longer than 30 lines"
- ✅ "All three test cases in test_order.py pass"
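A criterion like "no function longer than 30 lines" is good precisely because a script can check it. As a sketch (the function name and limit are assumptions), a few lines of Python using the standard `ast` module can verify it mechanically:

```python
import ast

MAX_FUNC_LINES = 30  # the criterion: "No function longer than 30 lines"

def functions_over_limit(source: str) -> list[tuple[str, int]]:
    """Return (name, line_count) for each function exceeding the limit."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > MAX_FUNC_LINES:
                offenders.append((node.name, length))
    return offenders
```

Run it over the AI's output: the criterion is met exactly when the list is empty. No judgment call, no extra turn spent debating "clean enough."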
The "Good Enough" Checkpoint
I add a meta-criterion to every list:
Meta: After 3 iterations, evaluate whether remaining gaps are
worth another turn. If the cost of iteration > cost of a manual fix,
stop and patch by hand.
This prevents the most common failure mode: spending 20 minutes of AI time on something you could fix in 2 minutes manually.
Template
Copy this into your project and customize per task:
## Exit Criteria for [TASK NAME]
Must-have (stop when all checked):
- [ ] [Specific, binary condition 1]
- [ ] [Specific, binary condition 2]
- [ ] [Specific, binary condition 3]
Nice-to-have (only if done in ≤2 extra turns):
- [ ] [Optional improvement]
Meta: Max 4 AI turns. After that, ship or patch manually.
The best AI-assisted developers I know don't write better prompts — they write better exit criteria. They know what "done" looks like before they start, and they stop when they get there.
Define your finish line. Then cross it and move on.