Harsh

Posted on Jun 23

The 80/20 Rule of AI Code — Why the Last 20% Takes 80% of Your Time

#ai #programming #productivity #softwareengineering

Relocating effort instead of cutting it

AI wrote the first 80% of my feature in 10 minutes.

The code was clean. The logic made sense. The happy path worked on the first try. I ran it, saw it work, and felt that specific kind of developer pride that makes you lean back in your chair slightly.

I was impressed. I felt genuinely productive. I thought I'd be done in another 10, maybe 15 minutes.

That was Tuesday. By Thursday evening I was still working on the same feature. Not because the AI had failed. Because it had succeeded at exactly the wrong thing, the easy part, and left the actual hard part entirely to me.

The edge cases. The error handling. The null checks. The situations that only surface when a real user does something the happy path didn't anticipate.

The AI didn't write those. It didn't even know they existed. It optimized confidently and completely for the world where everything goes right - and that world is not the one your users live in.

That's the 80/20 rule of AI code. The first 80% is fast, impressive, and kind of magical. The last 20% is where the real work actually lives. And it takes 80% of your total time.

Here's what I've learned about that gap, and why I think it matters more than the 10 minutes you saved on Tuesday.

The 80% - Fast, Clean, and Genuinely Impressive

I want to be honest about this part before I get into the frustration.

The AI is remarkable at the first 80%. You give it a clear prompt, it understands what the happy path looks like, and it generates code that works. Not kind-of works. Actually works, with reasonable variable names and logic that flows the way you'd expect.

The first time I saw it in action I genuinely felt like I'd cheated at something. Tickets were closing. The velocity graph was going up. I was shipping things faster than I had in years.

And that feeling is real - I'm not being sarcastic about it. The AI is fast because it's operating in familiar territory. The happy path is the well-trodden path. It's the version of your problem that exists in some form in the training data, that has been solved thousands of times before, that the model can pattern-match its way through with confidence.

The 80% is real. The speed is real.

The problem is that we've started treating the 80% like it's the whole thing. And it isn't.

The 20% - Where Tuesday Becomes Thursday

The AI wrote the happy path. Here's an honest list of what it didn't write:

The empty list. What happens when the user has no data yet? New account, nothing in the database, the list the AI assumed would always have items turns out to be empty. The AI didn't check. You find out from a user report three days after launch, spend an hour tracing back to the unhandled case, and add the check you should have written on Tuesday.

The error handling. The AI assumes the network responds. It assumes the API returns what you asked for. It assumes the third-party service is up. Every try-catch block, every fallback, every "what do we show the user when this fails" decision - that's yours. The AI left it blank because things going wrong wasn't part of the prompt.

The domain-specific edge cases. This is the one that surprises me every time. The AI doesn't know your business logic. It doesn't know that "empty" means something different in three different parts of your application. It doesn't know about the legacy data that's formatted differently. It doesn't know about the enterprise customer who uses the product in a way nobody expected. You know those things. The AI has never heard of them.

The performance cliff. The AI writes code that works for the examples it was given. It doesn't stress-test for scale. You find the bottleneck when the feature goes live and the page suddenly takes four seconds to load for users with large datasets. The code isn't wrong. It just wasn't written with real load in mind.

The maintainability tax. This one is the slowest to show up. The AI writes code that solves today's problem. Three months from now when the requirements shift slightly and you're trying to extend it, you realize the abstraction doesn't quite fit the new shape. Refactoring it costs more time than writing it from scratch would have.

Each of those items takes time. Together, they consistently add up to about 80% of the total effort on any feature I've shipped using AI-generated code.
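To make the first two items concrete, here is a minimal Python sketch. Everything in it is hypothetical: `fetch_orders`, `Summary`, and the order-average feature are stand-ins made up for illustration, not code from the feature described above. The first function is the kind of happy-path draft that works in the demo; the second adds the empty-list check and a fallback for a failed fetch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Summary:
    count: int
    average: Optional[float]     # None when there is nothing to average
    error: Optional[str] = None  # user-facing message when loading failed


def summarize_happy_path(fetch_orders: Callable[[], List[float]]) -> float:
    # Typical first draft: assumes the fetch succeeds and the list has items.
    orders = fetch_orders()
    return sum(orders) / len(orders)  # ZeroDivisionError for a brand-new account


def summarize(fetch_orders: Callable[[], List[float]]) -> Summary:
    # Same feature with the two unhappy paths handled explicitly.
    try:
        orders = fetch_orders()
    except (ConnectionError, TimeoutError):
        # Assumption: the caller wants a message to show, not a stack trace.
        return Summary(count=0, average=None,
                       error="Could not load orders. Please try again.")
    if not orders:
        # New account, nothing in the database yet.
        return Summary(count=0, average=None)
    return Summary(count=len(orders), average=sum(orders) / len(orders))
```

The hardened version is only a few lines longer, but every one of those lines encodes a decision (what to show, what "empty" means) that nobody put in the prompt. The domain-specific, performance, and maintainability items don't reduce to a snippet this small, which is part of why they cost more.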

The 30 Seconds That Cost Me 3 Hours

I was looking at a pull request recently - maybe 200 lines of AI-generated code that I'd prompted in about 30 seconds.

I spent the next 3 hours with it.

Not because the code was broken. The code was fine. I spent 3 hours adding everything the AI had quietly decided wasn't its problem: the error paths, the null checks, the comments explaining the decisions that weren't obvious, the edge case I found by actually thinking about what our users do.

During the 30 seconds I felt fast. During the 3 hours I felt slow.

But here's the thing I keep coming back to: the 3 hours was the actual work. The 30 seconds was the scaffolding. The AI didn't reduce the work - it relocated it. The time moved from writing the structure to making it real, and making it real is slower because it requires something the AI genuinely doesn't have: context about your specific situation, your specific users, your specific history with this codebase.

That was the moment I stopped caring about how long generation took and started tracking something more honest: how long until it's actually ready to ship.

Why This Isn't a Complaint About AI

I want to be clear - the 80/20 split isn't a failure of AI. It's basically the design.

The AI is optimized for the common case. The common case is the happy path. Generating the common case quickly is genuinely useful; I'm not being dismissive of that.

The issue isn't with the AI. The issue is with how we've started measuring productivity around it.

We measure velocity. Tickets closed. Lines generated. Contribution graph. And all of those metrics capture the 80% beautifully - because the 80% is fast and visible and shows up as green squares.

The 20% is invisible to those metrics. Nobody's dashboard shows time spent adding error handling. Nobody's standup starts with "I spent yesterday on edge cases the AI didn't anticipate." It doesn't show up anywhere. But it's where most of the actual time goes.

The 80% is what gets you to a demo. The 20% is what gets you to production. And if you're not tracking how long the 20% takes, you're not tracking your real productivity - you're tracking how quickly you can type a prompt and feel good about it.

What I'm Actually Doing Differently

Not quitting AI. Not even thinking about it. But I've changed a few things:

I budget for the 20% upfront. When I estimate any task involving AI-generated code, I multiply whatever the generation time suggests by roughly 4x. The AI says "this is a 10-minute feature." I tell my brain it's a 40-minute feature and plan accordingly. It's not pessimism - it's just the pattern holding.

I prompt for the unhappy path explicitly. Before I even generate the main code, I add to the prompt: what should happen with empty input? What should happen when the API fails? What edge cases exist here? The AI won't think of them on its own. If I name them, it at least takes a pass at them.
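One way to make this a habit is to keep the questions in a small helper instead of retyping them. This is only a sketch: `with_unhappy_paths` and the wording of the appended instruction are my own invention for illustration, not part of any AI tool's API. It simply builds the prompt text.

```python
from typing import Iterable

# The three questions from the paragraph above, kept in one place.
UNHAPPY_PATH_QUESTIONS = [
    "What should happen with empty input?",
    "What should happen when the API fails?",
    "What edge cases exist here?",
]


def with_unhappy_paths(task: str, extra_cases: Iterable[str] = ()) -> str:
    """Return the task prompt with a numbered unhappy-path checklist appended."""
    questions = list(UNHAPPY_PATH_QUESTIONS) + list(extra_cases)
    lines = [task.strip(), "", "Before writing code, answer and handle each of these:"]
    lines += [f"{i}. {q}" for i, q in enumerate(questions, start=1)]
    return "\n".join(lines)


prompt = with_unhappy_paths(
    "Write a function that averages a user's order totals.",
    extra_cases=["How should legacy records with missing totals be treated?"],
)
```

The `extra_cases` argument is where the domain knowledge goes: the legacy formats and odd customers that only you know about.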

I write the failing tests before the code exists. What would break this? What would a mischievous user do? I write those tests first so the AI has a target. It doesn't catch everything, but it catches more than the AI would find by itself.

I remember the 3 hours. When I'm tempted to push something quickly because it works in the demo, I think about the 3 hours. The 30 seconds felt good. The 3 hours was the job.

None of this makes the 20% disappear. But it makes it predictable instead of surprising, which is the difference between managing it and being ambushed by it.

One Question

What's the longest you've spent on the last 20% of something the AI generated quickly?

I want actual numbers if you have them. The gap between how long generation took and how long it actually took to ship - that's the number I'm curious about.

My answer: 30 seconds to generate, 3 hours to finish.

What's yours? 👇

Heads up: I used AI to help structure this post and refine my thoughts. The experiences, stories, and opinions are my own.

Top comments (23)

Jasmine Dueñas • Jun 24

This matches my experience almost perfectly.

AI is incredibly good at getting you from a blank file to a working prototype. The part that still takes time is everything that happens between "it works on my machine" and "I'm comfortable deploying this to production."

The last 20% is usually where the business rules, edge cases, legacy data, security concerns, and unexpected user behavior start showing up. That's also the part where domain knowledge matters more than coding speed.

I've found that AI makes me faster, but it doesn't remove the need to think. If anything, it shifts more of my time toward reviewing decisions, validating assumptions, and making sure the solution actually fits the real-world problem.

The 80% is impressive. The last 20% is where engineering still happens.

Harsh • Jun 24

You've summarized it better than I did. "The last 20% is where engineering still happens." That's the line. AI can get you to "it works." It can't get you to "it's right for your users, your data, your legacy constraints." That part still needs someone who understands the why.

And you're right: AI doesn't remove the need to think. It shifts the thinking from "how to write it" to "is this actually the right thing for this context?"

Thanks for this perfect summary. 🙌

Mykola Kondratiuk • Jun 24

the first 80% builds false confidence. you ship to staging feeling great, then spend 2 days chasing a race condition the happy path hid. I track this as 'AI debt' now - fast draft, slow integration.

Harsh • Jun 24

"AI debt": that's the term I was missing. Fast draft, slow integration. The first 80% gives you confidence, the last 20% gives you humility. The race condition example is perfect: the happy path hid it, the staging environment revealed it, and the 2 days were spent paying for the 10 minutes you saved.

The debt doesn't show up on your velocity graph. But it shows up on your calendar.

I'm using AI debt from now on. Thanks for this. 🙌

Mykola Kondratiuk • Jun 24

yeah AI debt is exactly the frame. borrows fast, compounds silently. and the staging invoice always lands harder than you expect

Mike Czerwinski • Jun 23

The 80/20 split names the symptom. The structural cause underneath is that the lineage that wrote the happy path is also the lineage that would have to spot the edge cases. Confident-imagination of edge cases by the writer is a different distribution from the ones that actually appear — business logic the model didn't have, legacy data it never saw, null branches three function calls deep. Asking "what could go wrong?" of the path that produced the optimization returns the most plausible-sounding gaps. Not the real ones.

Failing tests before code is the right primitive, but the variant worth holding onto is the bite-check version: the test has to fail against the pre-change code, not just exist. A test that passes against both states isn't a gate. It's decor with passing CI.

Tracking time-to-ship over generation speed shifts the metric to the right axis. The leverage point underneath is whose attention catches the 20%. If the same person who accepted the 30-second code runs the 3-hour debug, the gap stays. If the gate refuses the merge until the bite-check fires, the gap moves where it can't be skipped.

Harsh • Jun 23

Mike, this is the most technically precise comment in the thread. Thank you. "Confident-imagination of edge cases by the writer is a different distribution from the ones that actually appear." That's it. The AI doesn't imagine edge cases, it simulates them. And the simulation is based on what it already knows, not what it hasn't seen. The real gaps come from business logic the model never had, legacy data it never saw, null branches it didn't know existed.

"Asking 'what could go wrong?' returns the most plausible-sounding gaps. Not the real ones." This is the key. The AI will confidently invent gaps that sound plausible, but they're the ones it can imagine. The real gaps are the ones it can't.

"A test that passes against both states isn't a gate. It's decor with passing CI." Line of the year.

"If the same person who accepted the 30-second code runs the 3-hour debug, the gap stays." This is the structural problem. The person who says "it works" and the person who says "it's ready" need to be different people, or at least different mindsets.

Thank you for this. It's the most important comment in the thread. 🙌

Mike Czerwinski • Jun 23

Simulation vs imagination — that's the cut. The simulator is bound to the lineage that trained it; the real edge cases are out-of-distribution by construction, which is why "ask harder" doesn't reach them. The says-it-works/says-it's-ready split is the structural answer at the team layer — same shape as the test that must fail against pre-change code at the gate layer and the operator-is-the-second-view position at the workflow layer. Three places to put the catch, same reason it has to live somewhere the writer can't reach.

Harsh • Jun 23

Mike, this is a framework. Not just a comment. A framework. "Simulation vs imagination": that's the cut. The simulator can only produce what it's seen. The real edge cases are, by definition, out-of-distribution. No amount of "ask harder" gets you there.

Three places to put the catch

Gate layer - test that must fail against pre-change code
Team layer - says-it-works vs says-it's-ready split
Workflow layer - operator-as-second-view

Same shape at every layer: the writer can't be the only reviewer. This is the structural answer. Not better prompts. Not more careful prompting. A system where the gap has nowhere to hide.

Thank you for this. It's the most useful comment I've received on this post. 🙌

Mike Czerwinski • Jun 23

Thanks for the read. The framing about layers needing structural independence from each other matters too — gate + team + workflow only do the work if they aren't reading from the same source. Otherwise three layers becomes one in three coats. There's a long-form version of this coming this week — the version one floor up, where the same disease shows up at the multi-path verification layer. I'll point you at it when it lands.

Nazar Boyko • Jun 24

The reframe that does the work here is that AI relocated the effort instead of cutting it. That's the whole trap. The 80% feels finished, so your guard drops right before the part that needs it most. Writing the failing tests first is the move I'd keep, since it hands the model the unhappy path as a target before it writes the happy one, and some edge cases get caught on the way in instead of three days after launch. The 4x budget is smart too, though I'd bet the real multiplier swings a lot with how strange your domain data is.

Harsh • Jun 24

This is the sharpest reframe in the thread: "AI relocated the effort instead of cutting it." That's it. The work isn't gone. It just moved from writing to finishing. And finishing is harder because it requires judgment, context, and the ability to recognize what the AI couldn't see.

"The 80% feels finished, so your guard drops right before the part that needs it most." This is the psychological trap. The speed of the first 80% convinces you the hard part is over. It's not. It's just been deferred.

"Failing tests first hands the model the unhappy path as a target before it writes the happy one." Exactly. The test defines what shouldn't happen before the AI writes what should happen. That flips the whole dynamic.

"The real multiplier swings with how strange your domain data is." Yes, the 4x is a baseline. The stranger your data, the higher the multiplier. Legacy systems, weird business logic, inconsistent data formats: the AI's blind spots are bigger there.

Thanks for this. It's the most precise comment in the thread. 🙌

Theo Valmis • Jun 24

The honest part here is 'it optimized for the world where everything goes right.' That's the whole failure: the model writes the happy path because the happy path is the well-trodden one in its training, and edge cases are by definition the under-documented territory. The last 20% is slow because it's the part no prompt specified and no example covered. Which means the lever is specifying the failure modes up front, not discovering them on Thursday. The 80% you saved on Tuesday was never the expensive part.

Harsh • Jun 24

Most precise comment here. "The model writes the happy path because the happy path is well-trodden in training": that's it. Edge cases are under-documented. The AI hasn't seen them. "The last 20% is slow because no prompt specified it and no example covered it": that's the structural answer.

"The lever is specifying failure modes up front, not discovering them on Thursday": that's the line. "The 80% saved on Tuesday was never the expensive part": exactly.

Best comment here. 💎

mote • Jun 29

The 80/20 framing is exactly right, and the deeper issue is that AI has no concept of "done." A human developer knows when they're done because they understand the domain -- they know what the code is for, who uses it, and what happens when edge cases surface. AI has none of that.

The first 80% is pattern-matching against what "correct" looks like. The last 20% is judgment -- which edge cases matter, which null checks are critical, which error messages users will actually read. That judgment comes from experience with the problem space, not from training data.

One pattern I've found useful: write the error messages first. Before the happy path, write what the code should say when something goes wrong. That forces you to think about the failure modes, and gives the AI something to optimize toward instead of just "it should work."

Harsh • Jun 29

"The deeper issue is that AI has no concept of 'done'": that's it. You, the developer, know when it's done because you understand the domain. You know who uses it. You know what happens when something breaks. The AI has none of that. It just keeps generating until you stop it.

The first 80% is pattern-matching. The last 20% is judgment. This is the cleanest way to frame it. Pattern-matching is fast. Judgment is slow because it requires experience, context, and the ability to prioritize which edge cases actually matter.

"Write the error messages first." This is brilliant. Before the happy path exists, write what the code should say when it fails. That forces you to name the failure modes before the AI optimizes for success. The AI then has something to optimize toward, not just "make it work."

Thank you for this. The error-messages-first pattern is going into my toolkit. 🙌

Mateo Ruiz • Jun 24

The “AI wrote it in 10 minutes, I shipped it in 3 hours” pattern feels very real.

What’s interesting is that the last 20% is usually where product knowledge, business rules, and operational realities show up. AI can generate a solid implementation, but it rarely knows about the legacy edge case, the enterprise customer workflow, or the failure mode that only appears under production load.

One thing we've noticed across AI-assisted development engagements at IT Path Solutions is that teams often overestimate generation speed and underestimate verification time. The bottleneck shifts from writing code to validating assumptions, handling edge cases, and making sure the feature survives real-world usage.

The faster AI gets at the first 80%, the more important that final 20% becomes. That's often where production readiness is decided.

Harsh • Jun 24

"The bottleneck shifts from writing code to validating assumptions": that's the new reality. Writing is no longer the constraint. Validation is. "Teams overestimate generation speed and underestimate verification time": that's a pattern, not an anecdote.

"The faster AI gets at the first 80%, the more important the final 20% becomes": exactly. Production readiness is decided there.

Thanks for the real-world data. 🙌

urmila sharma • Jun 23

This is so true! I've experienced this firsthand. Getting the initial prototype with AI takes just a few prompts, but then comes the real grind: edge cases, error handling, and making it production-ready. The last 20% really tests your actual coding skills. Great article!

Harsh • Jun 23

Urmila, "The last 20% really tests your actual coding skills." That's the line. 🙏

The AI can handle the familiar. The 20% is where the unfamiliar lives, and that's where you actually have to think. Edge cases, error handling, production reality: those can't be prompted away.

Thanks for reading and for naming the real test. 😊

Danil • Jun 23

There are already techniques that can increase the percentage of quality work done by AI from 80% to, say, 90–95%. Spec Driven Development, for example. In that case, using AI turns from plain vibe-coding into high-level engineering, where you and the LLM align on approaches, test cases, and possible states. I personally know of a large, old project with a backend team of more than 20 people alone. They adopted SDD, and development and maintenance became much faster. It also became easier for newcomers to get up to speed. Of course, you still can't do without a human touch, but in my opinion, it's important to know how to use modern tools competently.

Harsh • Jun 23

This is a great point, and honestly, the article should have mentioned SDD. "Spec Driven Development turns AI from vibe-coding into high-level engineering." That's the frame shift. The AI doesn't get smarter. The input gets better: more structured, more explicit, more aligned with what the AI actually needs to produce something useful.

The 20% gap isn't fixed. It's a function of how well you set up the problem before the AI writes a single line.

"A large, old project with a backend team of 20+, where SDD made development and maintenance much faster." This is the real-world proof. Not a greenfield project. A legacy codebase. The kind where edge cases are hiding everywhere. SDD gave structure to the process, and the AI could work within that structure.

I'd add: SDD doesn't eliminate the 20%. It makes the 20% predictable. Instead of discovering edge cases in production, you're listing them in the spec before any code is written.

Thanks for this. Adding it to my toolkit. 🙌
