Last week I ran an experiment: I logged every AI-generated code suggestion I received and tracked which ones made it to production unchanged, which ones needed edits, and which ones I threw away entirely.
The results surprised me.
## The Setup
- Duration: 5 working days
- Tools: Claude and GPT for code generation, Copilot for autocomplete
- Project: A medium-sized TypeScript backend (REST API, ~40 endpoints)
- Tracking: Simple markdown file, one entry per suggestion
## The Numbers
| Category | Count | Percentage |
|---|---|---|
| Shipped unchanged | 12 | 18% |
| Shipped with edits | 31 | 47% |
| Thrown away | 23 | 35% |
| Total suggestions | 66 | 100% |
Only 18% of AI suggestions shipped without changes. Almost half needed editing. And over a third were useless.
## What Got Shipped Unchanged
The 12 suggestions that shipped as-is had something in common: they were small and well-specified.
- Unit tests for pure functions (given a clear function signature)
- Type definitions from a schema description
- Utility functions with obvious behavior (slugify, debounce, date formatting)
- Regex patterns with clear requirements
**Pattern:** The more constrained the task, the better the output.
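To make "small and well-specified" concrete: a prompt like "slugify a title: lowercase, hyphens for spaces, drop other punctuation" is constrained enough to ship as-is. Here's a sketch of the kind of output that qualified (the function and its rules are illustrative, not copied from my log):

```typescript
// Convert a title into a URL-safe slug.
// Rules: lowercase; spaces/underscores become hyphens; drop other
// punctuation; collapse repeated hyphens; trim hyphens at the ends.
function slugify(input: string): string {
  return input
    .toLowerCase()
    .replace(/[_\s]+/g, "-")     // spaces and underscores -> hyphens
    .replace(/[^a-z0-9-]/g, "")  // drop everything else
    .replace(/-+/g, "-")         // collapse runs of hyphens
    .replace(/^-|-$/g, "");      // trim leading/trailing hyphens
}
```

The whole spec fits in one sentence, and there's an obvious way to verify the result by eye. That's the sweet spot.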
## What Needed Edits
The 31 "shipped with edits" suggestions fell into predictable categories:
- **Wrong error handling (14 cases):** AI almost always generates optimistic code. Try/catch blocks that log and continue instead of throwing. Missing null checks on database results.
- **Wrong abstraction level (9 cases):** AI tends to over-abstract. Creating a class where a function would do. Adding config options nobody asked for.
- **Subtle logic bugs (8 cases):** Off-by-one errors, incorrect date comparisons, missing edge cases in conditionals.
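The error-handling pattern was consistent enough to sketch. Assuming a hypothetical user-lookup helper (the names and repository shape are mine, not from the project), the generated version and the shipped version differed like this:

```typescript
// Hypothetical repository shape, for illustration only.
interface UserDb {
  findUser(id: string): Promise<{ name: string } | null>;
}

// The "optimistic" pattern AI kept producing: log and continue,
// which silently hands the caller undefined when anything goes wrong.
async function getUserNameOptimistic(db: UserDb, id: string): Promise<string | undefined> {
  try {
    const user = await db.findUser(id);
    return user!.name; // no null check: throws at runtime on a missing user
  } catch (err) {
    console.error(err); // swallows the error; caller gets undefined
  }
}

// What I actually shipped: check the null case and fail loudly.
async function getUserName(db: UserDb, id: string): Promise<string> {
  const user = await db.findUser(id);
  if (user === null) {
    throw new Error(`user not found: ${id}`);
  }
  return user.name; // unexpected errors propagate to the caller
}
```

The fix is never hard. It's just one more thing to review on every single suggestion.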
## What Got Thrown Away
The 23 rejected suggestions shared patterns too:
- **Hallucinated APIs (7 cases):** Functions that don't exist in the library version I'm using.
- **Wrong architecture (6 cases):** Solutions that technically work but violate project conventions.
- **Overcomplicated (5 cases):** A 40-line solution for a 5-line problem.
- **Just wrong (5 cases):** Logic that doesn't match the requirement at all.
## The Real Insight
I spent roughly 45 minutes per day on AI-assisted coding. My estimate of time saved (vs. writing everything manually): about 90 minutes per day.
Net gain: ~45 minutes/day, or about 3.5 hours/week.
That's real, but it's not the 10x productivity boost people claim. And it requires active review effort — the "savings" assume you catch the bugs before they ship.
## What I Changed After This Experiment
**Stopped using AI for complex logic.** If I need to think hard about the algorithm, I write it myself. AI is best for boilerplate and well-defined transformations.
**Started writing specs before prompting.** Even a 2-line spec ("takes X, returns Y, handles Z") dramatically improved the "shipped unchanged" rate.
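Here's what that looks like in practice. This spec and function are invented for illustration, but the shape matches what worked: name the input, the output, and the failure mode before prompting.

```typescript
// Spec, pasted into the prompt verbatim:
//   takes an ISO date string, returns whole days until that date
//   (negative if past); handles invalid input by throwing.
function daysUntil(isoDate: string, now: Date = new Date()): number {
  const target = new Date(isoDate);
  if (Number.isNaN(target.getTime())) {
    throw new Error(`invalid date: ${isoDate}`);
  }
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.round((target.getTime() - now.getTime()) / msPerDay);
}
```

The "handles Z" clause is the important one. Without it, the error-handling problem from earlier shows up almost every time.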
**Set a 3-minute rule.** If I'm spending more than 3 minutes editing AI output, I delete it and write from scratch. It's faster.
## Try It Yourself
Track your AI suggestions for one week. Just a simple log: accepted / edited / rejected. You might be surprised how much time you're spending on the "editing" step.
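If you want to tally the log at the end of the week, a few lines of code will do it. This assumes a log format of my own invention (one line per suggestion, prefixed with its outcome), so adjust to however you actually keep notes:

```typescript
// Tally a log where each suggestion is one line tagged with its outcome:
//   accepted: unit tests for parseQuery
//   edited: auth middleware - added missing null check
//   rejected: hallucinated redis client method
function tallyLog(log: string): Record<string, number> {
  const counts: Record<string, number> = { accepted: 0, edited: 0, rejected: 0 };
  for (const line of log.split("\n")) {
    const tag = line.split(":")[0].trim().toLowerCase();
    if (tag in counts) counts[tag] += 1; // ignore blanks and other notes
  }
  return counts;
}
```

Run it over your week's file and you have your own version of the table above.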
What's your accept rate? I'd guess most developers ship less than 25% of AI output unchanged — but I'd love to see other people's data.