DEV Community

Why AI-Generated Code Is Always Good Enough — And Never Great

Harsh on May 25, 2026

AI wrote a function for me last week. It worked. Tests passed. Edge cases handled. I shipped it. But something bothered me - not enough to rewrite it ...
Rondo

No matter how well structured the harness and agent instructions are, I think a lack of context for AI is inevitable. Naming rules, widespread code patterns, and in-person context are things AI can't quite grasp.
It may not be a big deal for new projects, but for old enough projects that lack of context can be a disaster. So personally, even though I sometimes let AI change our codebase, I always check and revise the changed code to account for the missing context.

Harsh

Rondo, "lack of context for AI is inevitable": that's the honest truth most people skip. You can write perfect prompts. You can define every rule. But context isn't just code. It's conversation. It's history. It's the argument you had six months ago about why that weird pattern exists. AI wasn't there for that.

"Not a big deal for new projects, but for old enough projects, disaster." Yes. Greenfield code has no history to miss. Brownfield code is full of ghosts: decisions made for reasons that aren't written anywhere. AI can't see ghosts.

"I always check and revise changed code to follow the missing context." That's the practice. Not "AI writes, I approve." AI writes, I check, I revise, I add the context the AI couldn't see. That's not overhead; that's craftsmanship.

Thank you for this. Practical and honest. 🙌

Mudassir Khan

the 'written by someone who had never used the product' line is the real diagnosis. AI code comes from a system with no skin in the game — it doesn't have to maintain it, debug it at 2am, or explain it to the next person.

the naming vagueness is actually the tell. we ran evals where we fed AI generated modules back to the AI and asked it to describe what they do. accuracy was noticeably worse than on human written code with similar complexity. the model had no intent to recover — it can't, because intent was never there to begin with.

does the problem get worse with longer functions, or is it consistent regardless of scope?

Harsh

Mudassir, this is the most data-backed comment in the thread. Thank you for running the eval. "No skin in the game": that's it. AI doesn't live with its mistakes. It doesn't wake up to a page at 2 AM. It doesn't have to explain its choices to a colleague who's taking over. The code has no owner, and that absence leaks into the code itself.

"The model had no intent to recover, because intent was never there to begin with." This is the key insight. Human code has intent baked in: choices made for reasons. Even if the implementation is wrong, the why is recoverable. AI code has no intent to recover. It's just pattern output. If the pattern is wrong, there's nothing underneath to find.

To your question about whether it gets worse with longer functions or stays consistent across scope: my sense is that it gets worse with length. In short functions, the intent is still guessable. In long functions, the AI starts optimizing for coherence, not correctness. The logic holds together, but the reason it holds together that way becomes increasingly opaque. Would love to see your eval data on this.

Thank you for adding real evidence to the conversation. 🙌

urmila sharma • Edited

Great read! It really explains why AI code feels like the work of a solid junior developer: it gets the job done but lacks that spark of creativity and deep architectural thinking.

Harsh

Urmila, "solid junior developer" is the perfect analogy. Gets the job done. Tests pass. No one complains. But the senior sees what's missing: the structure, the foresight. AI is stuck at junior level, not because it can't learn but because it has no scars.

Thanks for this framing. 🙌

Valentin Monteiro

The 'compound cost' part is what most teams miss. AI code is fine at write time. The pain shows up at refactor time, when nothing has the small idiosyncrasies that signal intent. You can't read between the lines because there are no lines between the lines. Keeping a human on the 'why' saves you, but it costs the speed everyone hired AI for in the first place.

Harsh

Valentin, "the pain shows up at refactor time": that's it. Write time is cheap. Refactor time is expensive. AI optimizes for the cheap part. "You can't read between the lines because there are no lines between the lines" is a perfect description. Human code has fingerprints. AI code is smooth. No clues.

"Keeping a human on the 'why' costs the speed you hired AI for": the trade-off no one talks about.

Thank you for this precision. 🙌

Valentin Monteiro

Thanks, Harsh. The 'no fingerprints' metaphor is the sharpest version of this I've seen. One thing it extends to: attribution. When something breaks six months later, human code usually leaves a trail back to a Slack thread or a ticket. AI output is a clean wall: you can't even ask 'why did we do it this way'.

EmberNoGlow

I think it's not so good because it's "smooth": there are no variables with strange names, no functions whose purpose is unclear, and of course the style is always the same.

Harsh

Ember, "smooth" as a criticism is a fascinating angle. Smooth usually means good. Here, smooth means predictable. Safe. The AI never makes weird, interesting choices. Strange names and unclear functions sometimes come from human leaps. AI doesn't leap. It walks the well-worn path.

Smooth is safe. Safe is good enough. Good enough is never great.

Thanks for this lens. 🙌

Growth Copods

Being in a product engineering team means we see AI-generated code across client projects constantly. This article names something we've been trying to articulate for months.
The taste gap is real, and in our work, it shows up most visibly at the design-to-code boundary. AI can generate a functional component quickly, but it almost never preserves the intent behind a design decision. Why this interaction pattern? Why this spacing behavior? That reasoning disappears entirely, and the next developer inherits working code with no sense of what shaped it.
From what we've observed, investing in detailed convention documentation and giving AI structured context about how your team thinks and builds does seem to help with the context gap somewhat. Not a complete answer, but the output shifts noticeably when the model has more to work with than a bare prompt.
The consequence gap is the one we keep coming back to. The judgment that comes from living with your own past decisions, debugging them, and inheriting their costs is something that builds quietly over time and is hard to replicate through any other means.
The habit of treating AI output as a starting point rather than a finished answer feels like the most grounded way to work with it. The challenge is keeping that discipline steady when timelines tighten, which is usually exactly when it matters most.

Stoyan Minchev

I know that feeling, but I also wonder how much the result depends on the model used.
And if you set proper rules for style, depth, naming, and level of comments, will that lead to a result that doesn't bother your intuition? ;)

Harsh

Stoyan, can better rules close the gap? Partially, yes. Better prompts, better output. Clear naming, style rules, and comment depth all help.

But judgment remains. AI follows rules. It can't know when to break them. Great code sometimes breaks its own patterns. AI won't do that unless you tell it to, and you can't predict when.

The model matters, but the ceiling is similar across models. Better rules raise the floor. They don't raise the ceiling.

Thanks for the question. 🙌

mote

The 50/50 split is the real tension here — and it's getting harder to call.

On one side: vibe coding accelerates the boring parts, and that's genuinely valuable when you're prototyping. On the other: you lose the muscle memory that makes you dangerous when things break.

I watched a junior dev spend three days debugging a React hydration issue last month. Their first instinct was to prompt their way out. When that failed, they had no mental model to fall back on. That's not a failure of AI — it's a gap in fundamentals.

The question worth asking: does vibe coding help people build the models they need to eventually go solo? Or does it create a dependency loop where you always need the tool to think through the problem?

Harsh

Mote, "watched a junior dev spend three days debugging a React hydration issue. First instinct: prompt. When that failed, no mental model to fall back on": that's the nightmare scenario. Not "AI made a mistake" but "the human had no backup plan." "That's not a failure of AI, it's a gap in fundamentals" is the honest take. The AI isn't the villain. The dependency is. If the only way you can solve problems is by prompting, you're not a developer with a tool; you're a passenger with a steering wheel.

"Does vibe coding help people build the mental models to eventually go solo? Or does it create a dependency loop?" This is the question. And I don't think we know the answer yet.

Some people will use AI as a scaffold: learning from it, internalizing patterns, eventually needing it less.
Others will use it as a crutch: never building the muscle, always needing the tool.

The difference isn't the tool. It's the person holding it.

Thanks for asking the hard question, not the easy answer. 🙌

codecraft

This perfectly captures what many teams are experiencing but rarely articulate.
AI is definitely exceptional at reducing the cost of producing code, but it hasn't reduced the cost of understanding, maintaining, and evolving code, and those are different problems.

I've found that AI-generated code often gets me 80-90% of the way there. The remaining 10% is where engineering judgment lives: naming things, simplifying complexity, aligning with team conventions, anticipating future changes, and making intent obvious to the next developer. The most valuable skill isn't writing code faster anymore, but recognizing when "correct" isn't enough.

Interestingly, the best AI-assisted developers I know aren't the ones accepting the most generated code. They're the ones who can quickly identify where the code needs refinement and where it already meets the standard. AI accelerates implementation, but taste, context, and ownership still come from experience. The shift, in my opinion, may not be from developer to AI operator, but from code producer to code curator. And for now, that's still, fundamentally, a very "human job."

Harsh

codecraft, "AI reduces the cost of producing code, not the cost of understanding, maintaining, and evolving it": that's the line. The remaining 10% is where engineering judgment lives. It isn't 90% of the value; it's the 10% that makes the other 90% usable over time. The most valuable skill is recognizing when "correct" isn't enough: a shift from execution to judgment. The best AI-assisted developers aren't accepting the most generated code; they know where refinement is needed. Discernment > volume.

"From code producer to code curator": the title update we've all been looking for.

Most complete comment here. 🙌

Pavan Bhatia

Really enjoyed this perspective. I think the most dangerous misconception around AI-generated code is that “working code” automatically equals “production-ready architecture.”

In my experience, AI is extremely effective at accelerating:

  • boilerplate generation
  • repetitive refactors
  • test scaffolding
  • infrastructure templates
  • documentation synthesis

But it still struggles with the higher-order engineering tradeoffs that only emerge under real production conditions — latency amplification, failure domains, concurrency bottlenecks, sequencing issues, operational observability, rollback safety, etc.

The gap becomes especially visible in distributed systems. A generated ORM loop may look perfectly valid until a 2–3ms RTT increase suddenly multiplies thousands of synchronous round-trips into a major production incident.
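The arithmetic behind that kind of incident can be sketched roughly as below. The row count and latencies are assumed purely for illustration (they are not from a real incident), and `total_time_ms` is a hypothetical helper, not part of any ORM.

```python
def total_time_ms(round_trips: int, rtt_ms: float) -> float:
    """Wall-clock time spent waiting on the network when every
    round-trip is issued sequentially (one query per loop iteration)."""
    return round_trips * rtt_ms


# Assumed numbers: a loop that issues one query per row.
ROWS = 5_000

baseline_ms = total_time_ms(ROWS, 0.5)  # same-zone database, 0.5 ms RTT
degraded_ms = total_time_ms(ROWS, 3.0)  # RTT grows by 2.5 ms per call
batched_ms = total_time_ms(1, 3.0)      # one batched query at the slower RTT

print(f"baseline: {baseline_ms / 1000:.1f} s")
print(f"degraded: {degraded_ms / 1000:.1f} s")
print(f"batched:  {batched_ms / 1000:.3f} s")
```

The loop is identical in both cases; only the environment changed. Under these assumptions the same code goes from 2.5 seconds to 15 seconds of pure network wait, while a single batched query barely notices the slower link.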

AI can absolutely increase developer velocity, but I increasingly see senior engineering value shifting toward:

  • systems thinking
  • architecture validation
  • operational risk analysis
  • performance modeling
  • failure-mode reasoning

The code itself is becoming the easier part.

Harsh

Pavan, "the code itself is becoming the easier part": that's the line. AI accelerates boilerplate, refactors, scaffolding, templates, and docs. The predictable work. AI struggles with latency amplification, failure domains, concurrency, rollback safety, and operational observability. The work that only reveals itself under real production load. "A generated ORM loop looks valid until a 2-3ms RTT increase multiplies thousands of synchronous round-trips into a major incident."

This is the perfect example. The code isn't wrong. The architecture is wrong for the environment, and AI can't see the environment. Senior engineering value is shifting to systems thinking, architecture validation, operational risk analysis, and failure-mode reasoning. Not writing code faster, but thinking about how code behaves under real conditions.

The easier part is getting easier. The hard part isn't going anywhere.

Thank you for this, the most senior-level comment in the thread. 🙌

Daniel Stolf • Edited

The consequence gap is the one I keep coming back to, because of all three it's the one with at least a partial structural answer.

AI doesn't have scars, agreed. But a team does. The scars are in post-mortems, in "we tried this and it bit us in 2024" stories, in conventions that exist for non-obvious reasons. Most of that knowledge lives in people's heads or in Slack archives nobody searches.

What helps: writing that scar somewhere the AI sees every session. Take the scars out of people's heads and into the repo. A small always-injected file (call it CLAUDE.md, AGENTS.md, a project bible, whatever) that gets prepended to every session, containing the conventions, the key architectural decisions, and a running log of "here's a thing we got burned by, here's why we now do X." When the AI generates the next function, it's working against your team's scar tissue, not against generic best practices.
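As a rough sketch of the "always-injected file" idea: the helper names, the separator, and the sample file contents below are all hypothetical, and real agent tools handle this injection themselves in their own way. This only illustrates the shape of the move.

```python
from pathlib import Path


def load_context(path: Path) -> str:
    """Return the context file's text, or an empty string if it is missing."""
    return path.read_text(encoding="utf-8") if path.is_file() else ""


def build_session_prompt(context: str, task: str) -> str:
    """Prepend the team's written-down context to the task for this session."""
    context = context.strip()
    if not context:
        return task
    return f"{context}\n\n---\n\n{task}"


# Hypothetical contents of an AGENTS.md-style file (invented for illustration).
SAMPLE_CONTEXT = """\
# Conventions
- This team hates clever abstractions; prefer plain functions.

# Burn log
- (example entry) We got burned by X; here is why we now do Y.
"""

prompt = build_session_prompt(SAMPLE_CONTEXT, "Add a retry helper")
print(prompt)
```

In practice you would call `load_context(Path("AGENTS.md"))` at the start of each session. The point is that the burn log travels with the repo rather than living in someone's head.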

This doesn't close the Taste Gap, though. Taste is judgment, not pattern-matching, and no markdown file gives an LLM judgment. But it does narrow the Consequence Gap: the AI can't have felt your specific pain, but it can read the lesson you wrote down after feeling it. The Context Gap responds to the same move: "this team hates clever abstractions" is a one-liner in conventions if you bother to write it.

What this can't do is make the AI surprising in a good way. That part is still you.

Harsh

Daniel, "take the scars out of people's heads and into the repo": that's the most practical sentence in this entire thread. AI doesn't have scars, but a team does. The AI can't learn from your pain unless you write the pain down. That's the move. Not better training data. Better documentation of your specific failures.

CLAUDE.md / AGENTS.md, a small always-injected file containing conventions, key decisions, and a running log of "here's a thing we got burned by, here's why we now do X": this is the structural fix. Not "prompt better." Design the context the AI sees. If the AI reads your team's scar tissue every session, it is more likely to generate code that respects those scars. "The AI can't have felt your specific pain, but it can read the lesson you wrote down after feeling it." This is the hopeful part. You don't need the AI to be human. You just need it to remember what you told it.

"What this can't do is make the AI surprising in a good way. That part is still you." The honest closing. You can narrow the gaps. You can't make the AI creative. Taste, judgment, the spark of clever: that's still human territory.

Thank you for this. Practical, honest, and beautifully written. 🙌

Mushfiq Rahman

This is so true: "Hard to read on first glance - you have to trace it before you understand it". I have been feeling this recently.

Harsh

Mushfiq, that feeling is the quiet signal. When you have to trace before you understand, the code isn't yours yet, even if it works. Not a bug. Just a sign that the AI optimized for execution, not for reading.

The good news: that feeling fades when you rewrite the parts that made you pause. It takes a few extra minutes and saves hours later.

Thanks for reading and for naming the feeling. 🙌

Evan Lausier

I'm noticing the same. It's always just enough to pass but lacks the creativity you see with a human developer.

Harsh

"Just enough to pass" is the perfect phrase. Not wrong. Not broken. Just adequate. Adequacy is fine, but the hidden cost shows up later, when "just enough" makes the next change harder than it should be.

Creativity isn't flashy. It's seeing a path the AI missed.

Thanks, Evan. 🙌