DEV Community

Avery

Working Code Is Not the Same as Clean Code. GitHub Copilot Does Not Know the Difference.

The tests pass. The app runs. The feature ships on time.

And somewhere in the codebase a component is doing four things at once, state is living in the wrong place, and a naming convention that made sense in session one has quietly been replaced by something different in session twelve.

Nothing is broken. Everything is wrong.

What working code hides

Working code is the minimum bar. It means the app does what it is supposed to do. It does not mean the code is maintainable, consistent, or built on a standard that will hold up as the project grows.

GitHub Copilot usually clears the working code bar. That is not the problem.

The problem is that working code and clean code look identical in the short term. Both pass tests. Both ship features. Both make the deadline.

The difference shows up later. In the refactor that takes three times longer than it should. In the new developer who cannot understand the codebase. In the bug that appears in three places because the same logic was written three times.

By the time the difference is visible, the working code has been in production for months.
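
To make the distinction concrete, here is a minimal sketch in TypeScript. Everything in it is hypothetical (the `CartItem` shape, the function names, the 10% discount above $100); it only illustrates how two versions can return identical output and pass the same test while differing in structure.

```typescript
// Hypothetical example: all names and business rules here are assumptions for illustration.
interface CartItem {
  name: string;
  unitPriceCents: number;
  quantity: number;
}

// "Working" version: subtotal, discount, and formatting are tangled in one function.
function cartSummaryWorking(items: CartItem[]): string {
  let t = 0;
  for (const i of items) {
    t += i.unitPriceCents * i.quantity;
  }
  if (t > 10000) {
    t = t - Math.round(t * 0.1);
  }
  return "Total: $" + (t / 100).toFixed(2);
}

// "Clean" version: each function has one responsibility and can be reused or tested alone.
function subtotalCents(items: CartItem[]): number {
  return items.reduce((sum, item) => sum + item.unitPriceCents * item.quantity, 0);
}

function applyBulkDiscount(totalCents: number): number {
  const thresholdCents = 10000;
  const discountRate = 0.1;
  return totalCents > thresholdCents
    ? totalCents - Math.round(totalCents * discountRate)
    : totalCents;
}

function formatUsd(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

function cartSummaryClean(items: CartItem[]): string {
  return `Total: ${formatUsd(applyBulkDiscount(subtotalCents(items)))}`;
}

const cart: CartItem[] = [
  { name: "keyboard", unitPriceCents: 8000, quantity: 1 },
  { name: "cable", unitPriceCents: 1500, quantity: 2 },
];

// Both versions produce the same string, so a test on the output cannot tell them apart.
console.log(cartSummaryWorking(cart) === cartSummaryClean(cart)); // prints true
```

A test that only checks the summary string passes for both. The difference appears when the discount rule changes or the price format is needed somewhere else: the clean version has one place to edit, the working version invites a copy.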

Why GitHub Copilot cannot tell the difference

Working code has a clear definition. It runs without errors. It produces the expected output. Copilot can optimize for this because it is measurable.

Clean code does not have a clear definition without rules. Single responsibility means different things in different projects. Clear naming depends on domain context. Separation of concerns depends on architecture decisions that were made before the session started.

Without rules that define what clean looks like in this specific project, Copilot defaults to working. Every time. Because working is the only bar it can measure against.

The gap nobody talks about

Most conversations about AI coding tools focus on whether the output works.

Does it compile? Does it pass the tests? Does it do what the prompt asked?

The gap nobody talks about is the space between working and clean. That gap is where technical debt lives. Where inconsistency accumulates. Where the codebase slowly becomes harder to work with even though everything technically functions.

GitHub Copilot fills that gap with its own decisions when no rules exist to fill it first.


Working code is what GitHub Copilot produces by default. Clean code is what it produces when the rules define what clean means before the session starts.

What defining clean actually looks like

It starts with translating what you know into what the AI can follow.

You know that components should have one responsibility. Write it as a rule:

React output rules:

  1. Every component has exactly one responsibility. If it does more, extract the rest.
  2. No component exceeds 300 lines; aim for under 200. Extract logic into hooks before adding more.
  3. State lives in hooks. UI components receive data through props only.

Now Copilot has a definition of clean that it can apply. Not a vague principle. A specific constraint. The output moves from working to clean because clean is now defined.
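
One way to see that these are specific constraints rather than principles is that parts of them can be checked mechanically. The sketch below is a hypothetical heuristic, not a real linter: `checkComponentSource`, the `Violation` type, and the choice of 300 as the hard line limit are all assumptions made for illustration, and the state check simply looks for any mention of `useState` or `useReducer` in a UI component's source.

```typescript
// Hypothetical sketch: rough checks for rules 2 and 3 above.
// The 300-line limit and the text-matching heuristic are assumptions, not a real linter.
interface Violation {
  rule: number;
  message: string;
}

const MAX_COMPONENT_LINES = 300;

// Heuristic: any mention of a built-in state hook in a UI component file counts.
const STATE_HOOK_MENTION = /\b(useState|useReducer)\b/;

function checkComponentSource(
  source: string,
  maxLines: number = MAX_COMPONENT_LINES
): Violation[] {
  const violations: Violation[] = [];
  const lineCount = source.split("\n").length;

  if (lineCount > maxLines) {
    violations.push({
      rule: 2,
      message: `Component has ${lineCount} lines (limit ${maxLines}).`,
    });
  }
  if (STATE_HOOK_MENTION.test(source)) {
    violations.push({
      rule: 3,
      message: "Component uses a state hook directly; move state into a custom hook.",
    });
  }
  return violations;
}

// Component sources are kept as plain strings so the sketch runs without React installed.
const propsOnlyComponent = [
  "export function PriceTag({ label, price }: { label: string; price: string }) {",
  "  return <span>{label}: {price}</span>;",
  "}",
].join("\n");

const statefulComponent = [
  "export function Counter() {",
  "  const [count, setCount] = useState(0);",
  "  return <button onClick={() => setCount(count + 1)}>{count}</button>;",
  "}",
].join("\n");

console.log(checkComponentSource(propsOnlyComponent).length); // prints 0
console.log(checkComponentSource(statefulComponent).map((v) => v.rule)); // prints [ 3 ]
```

Rule 1 is deliberately left out: "one responsibility" still needs human or AI judgment. In a real project the line limit could likely be delegated to an existing lint rule such as ESLint's `max-lines` instead of custom code.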

The prompt does not matter. The rules do.

Working code will always be GitHub Copilot's default. That is not a criticism. It is how the tool works.

Clean code requires a definition. And the definition has to come from you, in the form of rules, before the first prompt is written.

Stop measuring success by whether the code runs. Start measuring it by whether the code is clean. And give your AI the rules it needs to produce the difference.


Want to find where your React project stopped at working and never reached clean?

I built a free 24-point checklist that helps you find exactly that: the structural gaps between working output and clean output.

Get the React AI Clean Code Checklist — free

Avery Code React AI Engineering System

Top comments (4)

Adam Lewis

Agreed on the core of this. One thing from experience though - the rules only do anything if they're actually checkable. I've written plenty of 'one responsibility per component' guidance in prose and then watched everyone, me included, ignore it the minute a deadline hit. The one that sticks is wired into the linter, because then the agent can't call the job done until it passes. In prose it's just a comment in a style guide nobody opens.

Avery

That is exactly the problem the Output Gate in Avery Code solves. Rules in prose get ignored under pressure. Rules wired into the AI working process do not. The system has a mandatory validation step before any output is considered done. The agent cannot call the job finished until the rules pass. Same principle as your linter example, just applied at the AI behavior level rather than the code level.

Christie Cosky

I've spent a lot of time thinking and writing about readable, maintainable code this year. I think that class names should be fences that keep related logic inside and unrelated logic outside, and that class names should specify their one reason to change. So, where it makes sense, I started naming my classes after their single responsibility. *Resolver, *Formatter, *Validator, *Orchestrator. I'm hoping that will encourage others - including Claude - to NOT add unrelated code to those files. So far, I've been encouraged: when Claude works in parts of the system where I've done a good job structuring and naming files, it repeats the same patterns in its new code.

I agree that "the code works" is the absolute minimum bar. I think the good news is, with AI we can now create code that not only works but is readable and maintainable in less time than it would have taken to do by hand. (The bad news is, not everyone takes the time to do that!)

Avery

Christie, thank you for this. The naming approach you describe is exactly right and the pattern repetition you are seeing from Claude is real. Good structure pulls the AI in the right direction.

The challenge comes with a new project where nothing exists yet. No Resolver, no Formatter, no established conventions for Claude to pick up on. That is where the rules have to come first, before the codebase exists to demonstrate them. The system I built starts there, defining the standard before the first file is created so Claude has something to follow from session one, not just in the parts of the project that are already well structured.