Harsh

Posted on May 25

Why AI-Generated Code Is Always Good Enough — And Never Great

#ai #programming #codequality #softwareengineering

The hidden friction of missing intent

AI wrote a function for me last week. It worked. Tests passed. Edge cases handled. I shipped it.

But something bothered me - not enough to rewrite it, not enough to flag it in review. Just enough to leave a small discomfort I couldn't name.

The code was correct. It wasn't good.

Variable names were vague in a way that was technically fine but practically annoying. The logic was nested one level deeper than it needed to be. There were three places where a comment would have explained why, not just what - and none of them had one. The function did exactly its job, but reading it felt like reading an instruction manual written by someone who had never used the product.

AI writes code that works. It rarely writes code that sings.

This isn't a complaint about bugs or hallucinations or incorrect outputs. It's about a different gap - the gap between "correct" and "elegant." Between "no one can complain about this" and "this is genuinely well-made."

Here's why that gap exists, why AI can't cross it, and why it matters more than most people are willing to say out loud.

What Good Enough Actually Means

Let's be specific. "Good enough" code:

✅ Passes all tests

✅ Handles the happy path correctly

✅ Covers the common edge cases

✅ Runs without crashing

✅ Does exactly what was asked

It does the job. Nobody will complain. The ticket gets closed. The feature ships.

But "good enough" also means:

❌ Hard to read on first glance - you have to trace it before you understand it

❌ Variable names that make you pause for half a second every time

❌ Nested logic that could be one level flatter without losing any clarity

❌ No comments explaining why a decision was made, only what is happening

❌ Structured in a way that makes the next change slightly harder than it needed to be

The AI optimized for correctness. It didn't optimize for understanding. It generated code that satisfies the requirements. It didn't generate code that respects the reader.
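To make the list above concrete, here is a small hypothetical sketch of "good enough." It is not the function from last week; the domain (order lines with `price` and `qty`, a discount flag) and every name in it are invented for illustration. It runs and returns the right numbers, and it still shows the symptoms: vague names, nesting that could be flatter, and no word about why anything is done.

```python
# Hypothetical example, invented for illustration only.
# It is correct, and it is also what "good enough" tends to look like.
def process(data, flag):
    result = []
    for item in data:
        if item is not None:
            if item["qty"] > 0:
                if flag:
                    val = item["price"] * item["qty"] * 0.9
                else:
                    val = item["price"] * item["qty"]
                result.append(round(val, 2))
    return result


if __name__ == "__main__":
    rows = [{"price": 10.0, "qty": 2}, None, {"price": 5.0, "qty": 0}]
    print(process(rows, True))
```

Nothing here would fail a review on correctness. But a reader has to guess what `flag` means, where `0.9` comes from, and why empty and zero-quantity rows are skipped.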

And here's the quiet danger: most of the time, good enough is genuinely fine. Not every function needs to be poetry. Not every script needs to be a masterclass. But when every function is just barely passable, when the entire codebase is optimized for "no one can object to this" rather than "this is actually good" - something shifts.

The baseline lowers. Slowly. And you stop knowing what great even looks like.

What Great Code Looks Like

Great code isn't just correct. It has specific qualities that go beyond passing tests:

Readable. You understand it on the first read, not the third. You don't need to trace execution to follow the logic.

Self-documenting. Variable names tell you what's happening. Function names tell you why. You could read it without knowing the surrounding context and still understand the intent.

Simple - not simplistic. The simplest thing that could work, chosen deliberately. Not the first thing that came to mind.

Surprising in a good way. There's a solution so clean it makes you smile. Not clever for the sake of being clever, just genuinely the right approach, arrived at through judgment.

A joy to change. Adding a feature doesn't feel like surgery. The structure anticipates the next developer.
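As a rough sketch of what those qualities can look like in a small function, here is a hypothetical example. The domain (order line totals, a 10% member discount) and the reasons given in the comments are assumptions invented for illustration, not taken from any real codebase.

```python
from typing import Optional

# Hypothetical business rule, invented for this example.
MEMBER_DISCOUNT_RATE = 0.10


def line_totals(order_lines: list[Optional[dict]], is_member: bool) -> list[float]:
    """Return the total for each billable order line, rounded to cents."""
    # The discount does not vary per line, so work out the multiplier once.
    price_multiplier = 1 - MEMBER_DISCOUNT_RATE if is_member else 1.0

    totals = []
    for line in order_lines:
        # Why (assumed): upstream exports can contain empty rows and
        # zero-quantity placeholders, and neither belongs on an invoice.
        if line is None or line["qty"] <= 0:
            continue
        totals.append(round(line["price"] * line["qty"] * price_multiplier, 2))
    return totals


if __name__ == "__main__":
    lines = [{"price": 10.0, "qty": 2}, None, {"price": 5.0, "qty": 0}]
    print(line_totals(lines, is_member=True))
```

The names say what the values are, the guard clause keeps the loop one level deep, and the comments record why a row is skipped rather than restating the condition. Changing the discount means touching one constant.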

Great code feels crafted. Not generated. There's a difference - and you can feel it when you read it, even if you can't always articulate why.

AI can't write great code. Not because it's not technically capable. Because great code requires taste. It requires judgment. A sense of what good looks like beyond correctness - what's appropriate, what's overkill, what's elegant for this specific situation in this specific codebase.

Taste comes from experience. From having read thousands of functions. From having been burned by bad code. From having fixed bugs at 2 AM in code that worked but was structured wrong in a way that made everything harder.

AI has processed millions of functions. But it hasn't felt any of them.

The Three Gaps AI Can't Cross

1. The Taste Gap

AI knows what works. It doesn't know what's good. Taste isn't pattern recognition; it's judgment. It's knowing when a familiar pattern is actually a bad fit for this specific situation, even if it technically solves the problem. It's knowing when the "right" solution would make the next developer's life harder.

AI can approximate taste by matching patterns from high-quality training data. But matching patterns isn't judgment. It's mimicry.

2. The Context Gap

Great code fits its context. The same solution might be excellent in one codebase and genuinely terrible in another - depending on the team's conventions, the performance constraints, the expected lifetime of the code, the experience level of whoever will maintain it.

AI generates based on the prompt, not based on the specific constraints of your project. It doesn't know that your team hates clever abstractions. It doesn't know this service gets called 10 million times a day. It doesn't know this code will be owned by someone who joined last week.

3. The Consequence Gap

AI has never been paged at 2 AM. It's never had to debug its own code six months after writing it. It's never felt the cost of a bad abstraction, the hours spent untangling something that seemed reasonable at the time.

Great code comes partly from knowing what not to do. And that knowledge comes from pain, from specific, memorable experiences of code that bit back. AI has no pain. No scars. No "I'll never do that again" moments.

These three gaps aren't bugs in the technology. They're features of what it is. AI optimizes for correctness. Greatness requires something that correctness alone can't produce.

Why This Matters

Good enough is completely fine for throwaway scripts. For one-off automation. For prototypes that will be deleted. For functions no one will maintain.

But when good enough becomes the default, when every function in a production codebase is just passable, the codebase quietly becomes something else. Harder to change. Harder to understand. Harder to debug. Harder to reason about.

You stop knowing if the code is actually correct or just looks correct.

The real cost isn't performance. It's comprehension. Bad code hides bugs. Good code reveals them. Great code structures things so bugs are harder to introduce in the first place.

Every good enough function is a small tax. The AI saved you ten minutes now. That structure will cost you an hour in three months when you need to change it. Multiply that across a codebase where everything is just barely passable.

The compound interest on good enough is expensive.
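A back-of-the-envelope sketch of that multiplication, using the two figures above (ten minutes saved, an hour lost later). The function count of 200 is a made-up number, and the sketch assumes each function is changed exactly once, so treat the result as an illustration rather than a measurement.

```python
MINUTES_SAVED_NOW = 10    # from the text: time saved when the function is generated
MINUTES_LOST_LATER = 60   # from the text: extra time when it has to be changed


def net_minutes_lost(function_count: int) -> int:
    """Net time lost if every function is changed once (a simplifying assumption)."""
    return function_count * (MINUTES_LOST_LATER - MINUTES_SAVED_NOW)


if __name__ == "__main__":
    hypothetical_function_count = 200  # invented for illustration
    hours = net_minutes_lost(hypothetical_function_count) / 60
    print(f"{hours:.0f} hours")
```

Under those assumptions, each function nets out to 50 minutes lost, which is how a ten-minute saving turns into weeks of work at codebase scale.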

What I'm Doing Differently

I'm not quitting AI. That's not the answer, and it's not what I want.

But I'm changing how I use it:

I treat AI output as a first draft. Not the final answer. A starting point that I'm responsible for finishing. The AI writes the code. I make it mine.

I ask: "Would I approve this if a junior wrote it?" Same standards. Same code review. The source of the code doesn't change the bar it has to meet.

I refactor one function per AI-generated PR. Just one. Make it simpler. Add the comment that explains why. Rename the variable to something that doesn't make me pause. Small acts of craftsmanship, consistently.

I remember that good enough compounds. Today's "it's fine, ship it" is next month's "why is this so hard to change?" The feeling of lowering standards is barely noticeable in the moment. The cost shows up later.

Will these habits make AI-generated code great? No. But they stop me from forgetting what great looks like. And that matters.

One Question

When was the last time you saw AI-generated code that made you say - not "that works," not "that's fine" - but "that's actually clever"?

A solution that surprised you. That you wouldn't have written that way yourself, but immediately recognized as better.

If you have an example, share it in the comments. I genuinely want to see what's possible at the top end.

I'll go first - with the one piece of AI-generated code that actually impressed me.

Your turn. 👇

Top comments (28)

Rondo • May 27

No matter how well structured the harness and agent instructions are, I think lack of context for AI is inevitable. Naming rules, widespread code patterns, and some in-person context are things AI can't quite grasp.
It may not be a big deal for new projects, but for old enough projects, that lack of context can be a disaster. So personally, even though I sometimes let AI change our codebase, I always check and revise the changed code to follow the missing context.

Harsh • May 27

Rondo, "lack of context for AI is inevitable." That's the honest truth most people skip. You can write perfect prompts. You can define every rule. But context isn't just code. It's conversation. It's history. It's the argument you had six months ago about why that weird pattern exists. AI wasn't there for that.

"Not a big deal for new projects, but for old enough projects, disaster." Yes. Greenfield code has no history to miss. Brownfield code is full of ghosts: decisions made for reasons that aren't written anywhere. AI can't see ghosts.

"I always check and revise changed code to follow the missing context." That's the practice. Not "AI writes, I approve." AI writes, I check, I revise. I add the context the AI couldn't see. That's not overhead; that's craftsmanship.

Thank you for this. Practical and honest. 🙌

Mudassir Khan • May 26

the 'written by someone who had never used the product' line is the real diagnosis. AI code comes from a system with no skin in the game — it doesn't have to maintain it, debug it at 2am, or explain it to the next person.

the naming vagueness is actually the tell. we ran evals where we fed AI generated modules back to the AI and asked it to describe what they do. accuracy was noticeably worse than on human written code with similar complexity. the model had no intent to recover — it can't, because intent was never there to begin with.

does the problem get worse with longer functions, or is it consistent regardless of scope?

Harsh • May 27

Mudassir, this is the most data-backed comment in the thread. Thank you for running the eval. "No skin in the game": that's it. AI doesn't live with its mistakes. It doesn't wake up to a page at 2 AM. It doesn't have to explain its choices to a colleague who's taking over. The code has no owner. And that absence leaks into the code itself.

"The model had no intent to recover, because intent was never there to begin with." This is the key insight. Human code has intent baked in: choices made for reasons. Even if the implementation is wrong, the why is recoverable. AI code has no intent to recover. It's just pattern output. If the pattern is wrong, there's nothing underneath to find.

To your question about whether it gets worse with longer functions or stays consistent across scope: my sense is it gets worse with length. In short functions, the intent is still guessable. In long functions, the AI starts optimizing for coherence, not correctness. The logic holds together, but the reason it holds together that way becomes increasingly opaque. Would love to see your eval data on this.

Thank you for adding real evidence to the conversation. 🙌

urmila sharma • May 25 • Edited

Great read! Really explains why AI code feels like the work of a solid junior developer: it gets the job done but lacks that spark of creativity and deep architectural thinking.

Harsh • May 25

Urmila, "solid junior developer" is the perfect analogy. Gets the job done. Tests pass. No one complains. But the senior sees what's missing. The structure. The foresight. AI is stuck at junior level. Not because it can't learn, but because it has no scars.

Thanks for this framing. 🙌

Valentin Monteiro • May 26

The 'compound cost' part is what most teams miss. AI code is fine at write time. The pain shows up at refactor time, when nothing has the small idiosyncrasies that signal intent. You can't read between the lines because there are no lines between the lines. Keeping a human on the 'why' saves you, but it costs the speed everyone hired AI for in the first place.

Harsh • May 26

Valentin, "the pain shows up at refactor time." That's it. Write time: cheap. Refactor time: expensive. AI optimizes for the cheap part. "You can't read between the lines because there are no lines between the lines" is a perfect description. Human code has fingerprints. AI code is smooth. No clues.

Keeping a human on the "why" costs the speed you hired AI for: that's the trade-off no one talks about.

Thank you for this precision. 🙌

Valentin Monteiro • May 27

Thanks Harsh. The 'no fingerprints' metaphor is the sharpest version of this I've seen. One thing it extends to: attribution. When something breaks six months later, human code usually leaves a trail back to a Slack thread or a ticket. AI output is a clean wall, you can't even ask 'why did we do it this way'.

EmberNoGlow • May 25

I think it's not so good because it's "smooth" - there are no variables with strange names, no functions whose purpose is unclear, and of course the style is always the same.

Harsh • May 25

Ember, "smooth" as criticism is a fascinating angle. Smooth usually means good. Here, smooth means predictable. Safe. The AI never makes weird, interesting choices. Strange names and unclear functions sometimes come from human leaps. AI doesn't leap. It walks the well-worn path.

Smooth is safe. Safe is good enough. Good enough is never great.

Thanks for this lens. 🙌

Copods • Jun 5

Being in a product engineering team means we see AI-generated code across client projects constantly. This article names something we've been trying to articulate for months.
The taste gap is real, and in our work, it shows up most visibly at the design-to-code boundary. AI can generate a functional component quickly, but it almost never preserves the intent behind a design decision. Why this interaction pattern? Why this spacing behavior? That reasoning disappears entirely, and the next developer inherits working code with no sense of what shaped it.
From what we've observed, investing in detailed convention documentation and giving AI structured context about how your team thinks and builds does seem to help with the context gap somewhat. Not a complete answer, but the output shifts noticeably when the model has more to work with than a bare prompt.
The consequence gap is the one we keep coming back to. The judgment that comes from living with your own past decisions, debugging them, and inheriting their costs is something that builds quietly over time and is hard to replicate through any other means.
The habit of treating AI output as a starting point rather than a finished answer feels like the most grounded way to work with it. The challenge is keeping that discipline steady when timelines tighten, which is usually exactly when it matters most.

Stoyan Minchev • May 25

I know that feeling, but I also wonder how much the result is affected by the model used?
And if you set proper rules regarding style, depth, naming, and level of comments, will this lead to a result that does not bother the intuition? ;)

Harsh • May 25

Stoyan, can better rules close the gap? Partially, yes. Better prompts, better output. Clear naming, style rules, comment depth: all help.

But judgment remains. AI follows rules. It can't know when to break them. Great code sometimes breaks its own patterns. AI won't do that unless you tell it to, and you can't predict when.

The model matters, but the ceiling is similar across models. Better rules raise the floor. They don't raise the ceiling.

Thanks for the question. 🙌

mote • May 29

The 50/50 split is the real tension here — and it's getting harder to call.

On one side: vibe coding accelerates the boring parts, and that's genuinely valuable when you're prototyping. On the other: you lose the muscle memory that makes you dangerous when things break.

I watched a junior dev spend three days debugging a React hydration issue last month. Their first instinct was to prompt their way out. When that failed, they had no mental model to fall back on. That's not a failure of AI — it's a gap in fundamentals.

The question worth asking: does vibe coding help people build the models they need to eventually go solo? Or does it create a dependency loop where you always need the tool to think through the problem?

Harsh • May 29

Mote, "watched a junior dev spend three days debugging a React hydration issue. First instinct: prompt. When that failed, no mental model to fall back on." That's the nightmare scenario. Not "AI made a mistake." "Human had no backup plan." And "that's not a failure of AI, it's a gap in fundamentals" is the honest take. The AI isn't the villain. The dependency is. If the only way you can solve problems is by prompting, you're not a developer with a tool; you're a passenger with a steering wheel.

"Does vibe coding help people build the mental models to eventually go solo? Or does it create a dependency loop?" This is the question. And I don't think we know the answer yet.

Some people will use AI as a scaffold: learning from it, internalizing patterns, eventually needing it less.
Others will use it as a crutch: never building the muscle, always needing the tool.

The difference isn't the tool. It's the person holding it.

Thanks for asking the hard question instead of offering the easy answer. 🙌

codecraft • May 29

This perfectly captures what many teams are experiencing but are rarely articulating.
AI is definitely exceptional at reducing the cost of producing code, but what it hasn't reduced is the cost of understanding, maintaining, and evolving code, and those are different problems.

I've found that AI-generated code often gets me 80-90% of the way there. The remaining 10% is where engineering judgment lives: naming things, simplifying complexity, aligning with team conventions, anticipating future changes, and making intent obvious to the next developer. The most valuable skill isn't writing code faster anymore, but recognizing when "correct" isn't enough.

Interestingly, the best AI-assisted developers I know aren't the ones accepting the most generated code. They're the ones who can quickly identify where the code needs refinement and where it already meets the standard. AI accelerates implementation, but taste, context, and ownership still come from experience. The shift, in my opinion, may not be from developer to AI operator, but from code producer to code curator. And for now, that's still, fundamentally, a very "human job."

Harsh • May 29

codecraft, "AI reduces the cost of producing code, not the cost of understanding, maintaining, and evolving it": that's the line. The remaining 10% is where engineering judgment lives, not the 90%. It's the 10% that makes the 90% usable over time. Most valuable skill: recognizing when correct isn't enough. A shift from execution to judgment. The best AI-assisted developers aren't accepting the most generated code; they know where refinement is needed. Discernment > volume.

"From code producer to code curator": the title update we've all been looking for.

Most complete comment here. 🙌

Pavan Bhatia • May 29

Really enjoyed this perspective. I think the most dangerous misconception around AI-generated code is that “working code” automatically equals “production-ready architecture.”

In my experience, AI is extremely effective at accelerating:

boilerplate generation
repetitive refactors
test scaffolding
infrastructure templates
documentation synthesis

But it still struggles with the higher-order engineering tradeoffs that only emerge under real production conditions — latency amplification, failure domains, concurrency bottlenecks, sequencing issues, operational observability, rollback safety, etc.

The gap becomes especially visible in distributed systems. A generated ORM loop may look perfectly valid until a 2–3ms RTT increase suddenly multiplies thousands of synchronous round-trips into a major production incident.

AI can absolutely increase developer velocity, but I increasingly see senior engineering value shifting toward:

systems thinking
architecture validation
operational risk analysis
performance modeling
failure-mode reasoning

The code itself is becoming the easier part.

Harsh • May 29

Pavan, "the code itself is becoming the easier part." That's the line. AI accelerates boilerplate, refactors, scaffolding, templates, docs. The predictable work. AI struggles with latency amplification, failure domains, concurrency, rollback safety, operational observability. The work that only reveals itself under real production load. A generated ORM loop looks valid until a 2-3ms RTT increase multiplies thousands of synchronous round-trips into a major incident.

This is the perfect example. The code isn't wrong. The architecture is wrong for the environment. AI can't see the environment. Senior engineering value is shifting to systems thinking, architecture validation, operational risk analysis, failure-mode reasoning. Not writing code faster. Thinking about how code behaves under real conditions.

The easier part is getting easier. The hard part isn't going anywhere.

Thank you for this, the most senior-level comment in the thread. 🙌
