
The 66% Problem

Evan Lausier on January 23, 2026

I spent three hours last Tuesday chasing a bug that didn't exist. The code looked perfect. It was syntactically correct, followed best practices, ...
 
Sylwia Laskowska

Oh yes, I remember when my UX designer wrote his very first piece of Python code in ChatGPT 😄
It worked at first, but once he asked for “optimization”, something broke. In the end he asked me to take a look. I removed about 80% of the code because it was adding things he didn’t actually need, tweaked the main function a bit - and it worked.
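
It was something like this (a made-up miniature, not his actual code): the model wrapped a one-line need in layers nobody asked for.

```python
# Invented miniature of the pattern. The task: read a number from a
# config dict with a default. What the model produced (condensed):
import logging

class ConfigValueResolver:
    def __init__(self, config: dict) -> None:
        self._config = config
        self._cache: dict = {}
        self._logger = logging.getLogger(__name__)

    def resolve(self, key: str, default: int = 0) -> int:
        # Caching and logging for a dict lookup nobody asked to optimize.
        if key in self._cache:
            return self._cache[key]
        value = int(self._config.get(key, default))
        self._logger.debug("resolved %s=%s", key, value)
        self._cache[key] = value
        return value

# What was actually needed:
def get_number(config: dict, key: str, default: int = 0) -> int:
    return int(config.get(key, default))
```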
He decided I was a genius after that 😄

Pascal CESCATO

You're a genius anyway 😉 – that said, I've written code with ChatGPT, Claude, and many others… for simple cases, it's fine. But for production-ready code… hmmm, you understand. And debugging is the same. Asking it to find the error in simple code, sure – but if there's functional logic behind it that isn't spelled out… of course, it won't find it. Worse, it might rewrite the code for you with terrible self-assurance, explaining that it found the error… and your code will become an ocean of nonsense.

Shitij Bhatnagar

Agree. And after AI code generation, you can keep telling the AI about its mistakes; it will apologize politely and show some crappy fix. For me, the amount of time lost getting AI to produce better code could have been better used to train a human or to fix the issue myself.

Sylwia Laskowska

Haha 😄 definitely not a genius — unless we’re talking about a genius of chaos 😂

But yes, totally agree. For simple cases, quick prototypes, or a first pass at debugging, it can be really helpful. For larger projects with real business logic and context… well, that’s a whole different story — and a pretty risky one 😅

Pascal CESCATO

Hey! A genius of chaos is still a genius 😄

Evan Lausier • Edited

LOL I dunno @sylwia-lask .... I've read a lot of your stuff... you might be.. 😊 The strict equality one was really good.

I don't think I realized how widespread this was... I thought it was just me for a while. 😂

Sounds like we're all sharing in the fun haha

Sylwia Laskowska

Haha 😄 careful, you’re raising expectations now!

Don’t worry - I’ll balance it out soon. Tomorrow I’ll probably publish a post about how bad I am at CSS 😂

Evan Lausier

HA! That's so funny! 😂

Fred Brooker

I feel your pain

I spent 5 hours debugging nonexistent bugs until it became clear that the unit tests themselves were flawed 😂

Simply put: Gemini created bad test cases and could not solve the problem afterwards - something like the sketch below.
😂 😂 🍸 💩
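
A made-up miniature of that failure mode (not the actual code): the implementation is correct, and the generated test is the real bug.

```python
# Hypothetical sketch: the function is correct, but the generated test
# encodes a wrong expectation, so "debugging" chases a bug that
# does not exist.

def dedupe(items: list[int]) -> list[int]:
    """Remove duplicates while preserving first-seen order."""
    seen: set[int] = set()
    return [x for x in items if not (x in seen or seen.add(x))]

def test_dedupe() -> None:
    # Flawed expectation: the test assumes the output is also sorted.
    # The function is fine; this assertion is the actual bug.
    assert dedupe([3, 1, 3, 2]) == [1, 2, 3]  # correct output: [3, 1, 2]
```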

Evan Lausier

Oh my!! 😂😂 That is like the AI version of "The Good Idea Fairy" 😂

Web Developer Hyper

I always check AI outputs carefully and ask follow-up questions about the code. Sometimes I also go back to the official documentation to verify whether the AI’s output is really correct. Even so, I still miss bugs that I didn’t anticipate. 😭
However, AI is improving very quickly and getting better day by day. So I believe that as my skills improve and AI coding improves as well, the number of bugs will decrease in the future.

Evan Lausier

It really is. I find myself using it more and more for quick analysis when I'm strapped for time. More often than not it points me in the right direction, but it's not quite there on detailed root causes.

Richard Pascoe

Thank you, Evan. A truly thought-provoking post. I've often wondered about the productivity claims surrounding AI, as expressed in a recent discussion post - AI Productivity Gains? - but your words put the reality of the situation into sharp focus, and I really appreciate that.

KC

"Microsoft Research published a study earlier this year that quantified this. They tested nine different AI models on SWE-bench Lite, a benchmark of 300 real-world debugging tasks."

@evanlausier could you share a link or reading reference for that research study? I'm interested in which factors they based the study on.

"Some skills don't need to be automated. They need to be sharpened."

Agree with this. Moreover, we need to set up metrics for how those skills can be improved. It would be better if we could set up a benchmark for the specific skills to match the market's expectations.

Evan Lausier • Edited

That is going to be the really challenging part: how do we set the benchmarks? But yes, the blog is below. A colleague came across it and shared it with me after a discussion on leveraging the technology at work.

microsoft.com/en-us/research/blog/...

Marina Eremina

It’s great to see these Stack Overflow surveys, right? I also find it reassuring when my own opinion about a tool aligns with what a large part of the community thinks. It looks like many of us try it out and reach similar conclusions :)

Ben Santora

Your comment made me realize that I haven't been to the Stack Overflow site in years. I need to visit it again.

Evan Lausier

LOL right?

Evan Lausier

I know right!? I thought I was the only one too!!

Shitij Bhatnagar • Edited

I fully agree with the intent of the article and its findings, especially 'debugging isn't pattern completion. Debugging is hypothesis testing'. Debugging isn't discussed much, but every engineer knows it can be very frustrating at times, even with code you wrote or reviewed/approved yourself. Debugging skills are 'non-negotiable', and they can be a differentiator as well.
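
To make that concrete, here's a toy sketch of my own (not from the article): a hypothesis about the failure becomes an executable check that can confirm or refute it, instead of a pattern-matched "fix".

```python
# Toy sketch (my illustration): debugging as hypothesis testing.

def average(values: list[float]) -> float:
    return sum(values) / len(values)

# Hypothesis: the crash we keep seeing happens because an empty list
# reaches average(). Reproduce the suspected condition and let the
# outcome decide, instead of blindly wrapping things in try/except.
def test_hypothesis_empty_input() -> None:
    try:
        average([])
    except ZeroDivisionError:
        print("hypothesis confirmed: empty input triggers the crash")
    else:
        print("hypothesis refuted: look elsewhere")

if __name__ == "__main__":
    test_hypothesis_empty_input()
```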

The whole narrative around AI code generation is broken because, at the end of the day, if I were a production manager I would never trust AI-generated code. When something breaks, a bot will not fix it; I need a person, and that person needs to be confident in the code itself. That link is missing.

AI code can be unverified assistance at most, not the main actor. I have also noticed how context-free the remarks from AI code review bots can be on MRs (take the online git management tools), and that's because they are just following rules fed to them, not because they found a real bug in your code. I feel FindBugs and PMD were more predictable than these AI code review bots.

Thanks for the article.

Evan Lausier

Right? Thank you so much!! I'm really glad it resonated. It probably helped the article that I spent half the morning debugging something one of my junior resources got from an AI code tool 😊

Alois Sečkár

"almost right, but not quite" - every single time I try to generate an AI image...

Hariprasad

I have been struggling with this problem for months now. One of our junior developers wrote 6,000+ lines of testing code in a single PR. While reviewing it, I wondered whether he had actually read it all before sending it to me for review. In a highly productive month I might write 1,000 to 2,000 lines max, so seeing such a number in a single PR is insane.

Evan Lausier

Omg, yeah that's crazy!

Vasu Ghanta

Nailed the "66% problem"—AI's polite, lurking bugs are stealthier than honest failures; treat it like overconfident juniors and double-test to sharpen those irreplaceable debugging instincts!

Evan Lausier

Love it! "overconfident juniors" 😂

Vadim

The dopamine hit of instant completion masks the debugging debt accumulating behind us.

Well said, I feel the same

Derek Cheng

LLMs are great at writing code, but definitely need to be supervised. There's a ton that needs to be built in tooling and workflow to make that process super efficient.

Evan Lausier

oh, 100%

leob

So if you need to meticulously code-review every line, are you still more efficient, I wonder? :-)

Evan Lausier

That might be a "depends on who's asking" kind of response 😂😂