The 66% Problem

Evan Lausier

I spent three hours last Tuesday chasing a bug that didn't exist.

The code looked perfect. It was syntactically correct, followed best practices, and even had thoughtful comments explaining what each function did. The problem was that one of those functions was solving a problem I never asked it to solve. Claude had decided, in its infinite pattern-matching wisdom, that my API endpoint needed pagination. I hadn't asked for pagination. I didn't want pagination. But there it was, breaking my response structure in ways that took me longer to diagnose than it would have taken to write the whole thing myself.
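To make the shape of that failure concrete, here's a hypothetical sketch (the names and structure are mine, not the actual endpoint): the handler was supposed to return a bare list, and the generated version wraps it in a pagination envelope that every existing caller chokes on.

```python
# Hypothetical illustration of an "almost right" change -- not the real code.

def get_orders(db):
    # What the endpoint was supposed to return: a plain JSON-serializable list.
    return [order.to_dict() for order in db.fetch_orders()]

def get_orders_generated(db, page=1, per_page=25):
    # What the assistant produced: correct-looking, nicely commented,
    # and paginated -- which nobody asked for. Every existing client that
    # expects a bare list now gets a dict and breaks somewhere downstream.
    orders = [order.to_dict() for order in db.fetch_orders()]
    start = (page - 1) * per_page
    return {
        "data": orders[start:start + per_page],
        "page": page,
        "per_page": per_page,
        "total": len(orders),
    }
```

Both versions are "correct" in isolation. Only one of them matches the contract the callers already depend on.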

This is the 66% problem.

According to Stack Overflow's latest developer survey of over 90,000 developers, 66% said their biggest frustration with AI coding assistants is that the code is "almost right, but not quite." Another 45% said debugging AI-generated code takes more work than it's worth.

I find those numbers oddly comforting. Not because I enjoy suffering, but because it means I'm not losing my mind. The tools that were supposed to make me faster have introduced a new category of bug that didn't exist three years ago: the bug that looks like working code.

Here's the thing about completely wrong code. It fails loudly. It throws errors. It refuses to compile. Your test suite catches it. You fix it and move on with your life. But almost-right code? That's the code that passes your tests, ships to staging, and then does something subtly insane at 2am when your biggest client runs a batch job you forgot they were running.

The old bugs were honest. They announced themselves. These new bugs are polite. They wait.

I've been writing code professionally for decades, and I've made every mistake you can make. I've shipped SQL injection vulnerabilities. I've accidentally deleted production data. I once re-imaged a machine over a production database and locked myself out. Those were my mistakes, and I understood them immediately when they blew up. The feedback loop was tight: I did something dumb, the system complained, I learned not to do that again.

The AI-generated bugs don't work that way. When something breaks now, my first question isn't "what did I do wrong?" It's "what did the AI do that I didn't notice?" That's a fundamentally different kind of debugging. Instead of understanding my own logic, I'm reverse-engineering someone else's assumptions about what I probably wanted.

Microsoft Research published a study earlier this year that quantified this. They tested nine different AI models on SWE-bench Lite, a benchmark of 300 real-world debugging tasks. The best performer, Claude 3.7 Sonnet, solved 48.4% of them. Less than half. These weren't exotic edge cases. They were the kinds of bugs that wouldn't trip up an experienced developer.

The models are phenomenal at writing code. They struggle to fix it.

This makes a perverse kind of sense when you think about how they work. Code generation is pattern completion. You give the model a prompt, it predicts what code probably comes next based on billions of examples. That's genuinely useful for boilerplate, for syntax you've forgotten, for exploring unfamiliar libraries. But debugging isn't pattern completion. Debugging is hypothesis testing. It requires understanding what the code is supposed to do, what it's actually doing, and why those two things are different.

That "why" is where everything falls apart. The AI doesn't know why your system is architected the way it is. It doesn't know about the business rule your CEO insisted on in 2019 that makes no logical sense but accounts for 40% of your revenue. It doesn't know that your database schema has a quirk because you migrated from Oracle fifteen years ago and nobody wants to touch it. It just sees patterns and matches them.

The METR randomized trial from July 2025 found something that should concern all of us. They had experienced open-source developers complete real tasks with and without AI assistance. With AI, the developers were 19% slower on average. But here's the part that keeps me up at night: before starting, they predicted the AI would make them roughly 24% faster. After finishing, even with the slower results in hand, they still believed it had helped.

We're not just getting almost-right code. We're getting almost-right code while feeling productive. The dopamine hit of instant completion masks the debugging debt accumulating behind us.

I'm not going to tell you to stop using AI tools. I use them constantly. But I've started treating them differently than I did a year ago. I used to accept suggestions and move on. Now I read every line like it was written by a junior developer who's very confident and moderately competent. Because that's essentially what it is.

The 66% aren't complaining because the tools are bad. They're complaining because the tools are good enough to be dangerous. A hammer that misses the nail is annoying. A hammer that hits almost the right spot is how you end up with a crooked house.

I don't have a solution. I'm not sure anyone does yet. The tools will get better. The context windows will get longer. The models will learn to ask clarifying questions instead of assuming. Maybe.

Until then, I'm keeping my print statements close and my test coverage closer. Some skills don't need to be automated. They need to be sharpened.

Top comments (6)

Web Developer Hyper

I always check AI outputs carefully and ask follow-up questions about the code. Sometimes I also go back to the official documentation to verify whether the AI’s output is really correct. Even so, I still miss bugs that I didn’t anticipate.😭
However, AI is improving very quickly and getting better day by day. So I believe that as my skills improve and AI coding improves as well, the number of bugs will decrease in the future.

Evan Lausier

It really is. I find myself using it more and more for quick analysis when I'm strapped for time. It more often than not points me in the right direction, but it isn't quite there on the detailed root causes.

Sylwia Laskowska

Oh yes, I remember when my UX designer wrote his very first piece of Python code in ChatGPT 😄
It worked at first, but once he asked for “optimization”, something broke. In the end he asked me to take a look. I removed about 80% of the code because it was adding things he didn’t actually need, tweaked the main function a bit - and it worked.
He decided I was a genius after that 😄

Evan Lausier

HA! That's so funny! 😂

Derek Cheng

LLMs are great at writing code, but definitely need to be supervised. There's a ton that needs to be built in tooling and workflow to make that process super efficient.

Evan Lausier

oh, 100%