Last week I ran an experiment: I logged every AI-generated code suggestion I received and tracked which ones made it to production unchanged, which ones needed edits, and which ones I threw away entirely.
The results surprised me.
## The Setup
- Duration: 5 working days
- Tools: Claude and GPT for code generation, Copilot for autocomplete
- Project: A medium-sized TypeScript backend (REST API, ~40 endpoints)
- Tracking: Simple markdown file, one entry per suggestion
## The Numbers
| Category | Count | Percentage |
|---|---|---|
| Shipped unchanged | 12 | 18% |
| Shipped with edits | 31 | 47% |
| Thrown away | 23 | 35% |
| Total suggestions | 66 | 100% |
Only 18% of AI suggestions shipped without changes. Almost half needed editing. And over a third were useless.
## What Got Shipped Unchanged
The 12 suggestions that shipped as-is had something in common: they were small and well-specified.
- Unit tests for pure functions (given a clear function signature)
- Type definitions from a schema description
- Utility functions with obvious behavior (slugify, debounce, date formatting)
- Regex patterns with clear requirements
**Pattern:** The more constrained the task, the better the output.
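To make "small and well-specified" concrete: a prompt like "slugify a title: lowercase, hyphens for spaces, drop other punctuation" is constrained enough to ship as-is. Here's a sketch of the kind of output that qualified (the function and its rules are illustrative, not copied from my log):

```typescript
// Convert a title into a URL-safe slug.
// Rules: lowercase; spaces/underscores become hyphens; drop other
// punctuation; collapse repeated hyphens; trim hyphens at the ends.
function slugify(input: string): string {
  return input
    .toLowerCase()
    .replace(/[_\s]+/g, "-")     // spaces and underscores -> hyphens
    .replace(/[^a-z0-9-]/g, "")  // drop everything else
    .replace(/-+/g, "-")         // collapse runs of hyphens
    .replace(/^-|-$/g, "");      // trim leading/trailing hyphens
}
```

The whole spec fits in one sentence, and there's an obvious way to verify the result by eye. That's the sweet spot.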
## What Needed Edits
The 31 "shipped with edits" suggestions fell into predictable categories:
- **Wrong error handling (14 cases):** AI almost always generates optimistic code. Try/catch blocks that log and continue instead of throwing. Missing null checks on database results.
- **Wrong abstraction level (9 cases):** AI tends to over-abstract. Creating a class where a function would do. Adding config options nobody asked for.
- **Subtle logic bugs (8 cases):** Off-by-one errors, incorrect date comparisons, missing edge cases in conditionals.
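The error-handling pattern was consistent enough to sketch. Assuming a hypothetical user-lookup helper (the names and repository shape are mine, not from the project), the generated version and the shipped version differed like this:

```typescript
// Hypothetical repository shape, for illustration only.
interface UserDb {
  findUser(id: string): Promise<{ name: string } | null>;
}

// The "optimistic" pattern AI kept producing: log and continue,
// which silently hands the caller undefined when anything goes wrong.
async function getUserNameOptimistic(db: UserDb, id: string): Promise<string | undefined> {
  try {
    const user = await db.findUser(id);
    return user!.name; // no null check: throws at runtime on a missing user
  } catch (err) {
    console.error(err); // swallows the error; caller gets undefined
  }
}

// What I actually shipped: check the null case and fail loudly.
async function getUserName(db: UserDb, id: string): Promise<string> {
  const user = await db.findUser(id);
  if (user === null) {
    throw new Error(`user not found: ${id}`);
  }
  return user.name; // unexpected errors propagate to the caller
}
```

The fix is never hard. It's just one more thing to review on every single suggestion.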
## What Got Thrown Away
The 23 rejected suggestions shared patterns too:
- **Hallucinated APIs (7 cases):** Functions that don't exist in the library version I'm using.
- **Wrong architecture (6 cases):** Solutions that technically work but violate project conventions.
- **Overcomplicated (5 cases):** A 40-line solution for a 5-line problem.
- **Just wrong (5 cases):** Logic that doesn't match the requirement at all.
## The Real Insight
I spent roughly 45 minutes per day on AI-assisted coding. My estimate of time saved (vs. writing everything manually): about 90 minutes per day.
Net gain: ~45 minutes/day, or about 3.5 hours/week.
That's real, but it's not the 10x productivity boost people claim. And it requires active review effort — the "savings" assume you catch the bugs before they ship.
## What I Changed After This Experiment
**Stopped using AI for complex logic.** If I need to think hard about the algorithm, I write it myself. AI is best for boilerplate and well-defined transformations.
**Started writing specs before prompting.** Even a 2-line spec ("takes X, returns Y, handles Z") dramatically improved the "shipped unchanged" rate.
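Here's what that looks like in practice. This spec and function are invented for illustration, but the shape matches what worked: name the input, the output, and the failure mode before prompting.

```typescript
// Spec, pasted into the prompt verbatim:
//   takes an ISO date string, returns whole days until that date
//   (negative if past); handles invalid input by throwing.
function daysUntil(isoDate: string, now: Date = new Date()): number {
  const target = new Date(isoDate);
  if (Number.isNaN(target.getTime())) {
    throw new Error(`invalid date: ${isoDate}`);
  }
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.round((target.getTime() - now.getTime()) / msPerDay);
}
```

The "handles Z" clause is the important one. Without it, the error-handling problem from earlier shows up almost every time.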
**Set a 3-minute rule.** If I'm spending more than 3 minutes editing AI output, I delete it and write from scratch. It's faster.
## Try It Yourself
Track your AI suggestions for one week. Just a simple log: accepted / edited / rejected. You might be surprised how much time you're spending on the "editing" step.
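If you want to tally the log at the end of the week, a few lines of code will do it. This assumes a log format of my own invention (one line per suggestion, prefixed with its outcome), so adjust to however you actually keep notes:

```typescript
// Tally a log where each suggestion is one line tagged with its outcome:
//   accepted: unit tests for parseQuery
//   edited: auth middleware - added missing null check
//   rejected: hallucinated redis client method
function tallyLog(log: string): Record<string, number> {
  const counts: Record<string, number> = { accepted: 0, edited: 0, rejected: 0 };
  for (const line of log.split("\n")) {
    const tag = line.split(":")[0].trim().toLowerCase();
    if (tag in counts) counts[tag] += 1; // ignore blanks and other notes
  }
  return counts;
}
```

Run it over your week's file and you have your own version of the table above.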
What's your accept rate? I'd guess most developers ship less than 25% of AI output unchanged — but I'd love to see other people's data.