One of the more interesting experiments I recently ran was asking GitHub Copilot to perform a full code review on a .NET application that it had also helped rewrite.
The original project was written in 2013. The application still worked, but the codebase had accumulated a lot of technical debt over the years.
Rather than trying to gradually refactor it, I decided to do a complete rewrite with Copilot’s help using modern conventions in the .NET ecosystem.
Once the rewrite was finished, I wanted to answer a simple question:
How useful is an AI-generated code review on a real-world project?
So I asked Copilot to review:
- The original legacy codebase
- The new rewritten solution
The results were interesting.
Starting With a Simple AI Code Review
The easiest way to run an AI code review is to give it a single open-ended prompt:

> Perform a code review on this project.
Copilot will usually return a fairly useful response that includes things like:
- Naming issues
- Readability improvements
- Basic design observations
- Possible refactoring opportunities
But there is a problem with this approach.
The review tends to focus mostly on surface-level code quality.
In my legacy project, the biggest issues weren’t small code smells. The real problems were architectural.
For example:
- Many classes were static
- Several classes simply passed calls directly to the data access layer
- Domain logic and database logic were mixed together
- The solution had no automated tests
The initial open-ended review mentioned some issues, but it completely missed the severity of the architectural problems.
The Other Problem: AI Doesn’t Know Your Standards
Another issue with AI reviews is that the model does not know your team's conventions.
Every system has different priorities.
Some systems are mostly CRUD applications where fast queries are the most important factor.
Other systems might be financial or analytical platforms where things like:
- deterministic calculations
- validation rules
- extensive unit testing
are critical.
Without guidance, Copilot evaluates code using generic best practices, not the standards that actually matter for your project.
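One way to close this gap is to put your conventions somewhere the AI will always see them. GitHub Copilot supports repository-level custom instructions via a `.github/copilot-instructions.md` file; the conventions below are purely illustrative examples, not the ones from this project:

```markdown
# Code review conventions for this repository

- Treat missing automated tests as a blocking issue, not a suggestion.
- Flag static classes that hold state or merely wrap the data access layer.
- Domain logic must not call the data access layer directly.
- Rate severity honestly; do not soften architectural problems.
```

With a file like this in place, every review request starts from your standards instead of generic best practices.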
AI Reviews Also Tend to Be Polite
Another thing I noticed is that AI-generated reviews tend to be very diplomatic.
They rarely say something like:

> This architecture is fundamentally broken.

Instead, they produce much softer language.
In fact, the first open-ended review Copilot generated described the legacy codebase as:

> "A well-structured lottery simulation system with clear domain modeling and separation of concerns."
That assessment was far more positive than reality.
Giving AI a Code Review Scorecard
To get a more realistic review, I gave Copilot a structured review scorecard.
Instead of asking for a general opinion, the AI had to evaluate the project across several weighted categories.
For example:
| Category | Weight |
|---|---|
| Architecture | High |
| Testing | 20% |
| Programming practices | 15% |
| Maintainability | Medium |
Interestingly, in this scoring model the act of writing code accounts for only about 15% of the total score.
That reflects something many experienced developers already know:
Software quality is determined far more by architecture and testing than by individual lines of code.
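The mechanics of a weighted scorecard are simple: each category gets a score from 0 to 100, and the weights decide how much each category contributes to the total. The sketch below is a toy illustration; the exact weights for Architecture and Maintainability are assumptions, since the article only pins down Testing (20%) and Programming practices (15%):

```python
# Toy weighted review scorecard. Architecture and Maintainability
# weights are assumptions made to fill out the example.
WEIGHTS = {
    "architecture": 0.40,
    "testing": 0.20,
    "programming_practices": 0.15,
    "maintainability": 0.25,
}

def weighted_score(category_scores: dict[str, float]) -> float:
    """Combine per-category scores (0-100) into one weighted total."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must sum to 100%
    return sum(WEIGHTS[c] * category_scores[c] for c in WEIGHTS)

# A legacy-like profile: weak architecture, no tests at all.
legacy = {"architecture": 10, "testing": 0,
          "programming_practices": 40, "maintainability": 15}
print(round(weighted_score(legacy)))  # → 14
```

Note how a codebase can have passable line-level code (40 in programming practices) and still land in the teens overall, because architecture and testing dominate the total.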
The Result: Reviewing the Legacy Code
Once Copilot evaluated the legacy project using the scorecard, the results were dramatically different.
The system scored:

> 15 / 100
The review highlighted several major issues:
- No clear architectural structure
- Tight coupling between components
- No automated tests
- Poor separation of responsibilities
The final recommendation from the AI was very direct:

> Do not attempt to fix this system incrementally.
Instead:
- Treat the codebase as reference material
- Perform a full rewrite
- Avoid investing time into patching severe technical debt
This was the type of honest assessment that the simple open-ended review failed to produce.
Reviewing the Rewritten Version
After several days of work — with Copilot helping with much of the code — I ran the same structured review on the new version.
The rewritten solution followed a structure that most developers familiar with the .NET ecosystem would recognize:
- clear project separation
- cleaner domain organization
- proper data access layers
- improved maintainability
The application itself is a simple lottery simulation tool that can generate numbers based on historical results and simulate outcomes.
And like most lottery simulations, it demonstrates one consistent result:
Even when simulated millions of times, the expected outcome is still a loss.
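That "consistent loss" is easy to reproduce with a tiny Monte Carlo sketch. This is not the application's code, just a minimal illustration under assumed parameters (a 6-of-49 draw, a $2 ticket, a jackpot-only payout):

```python
import random

def simulate_lottery(draws: int, ticket_price: float = 2.0,
                     jackpot: float = 1_000_000.0,
                     pool: int = 49, picks: int = 6, seed: int = 42) -> float:
    """Play the same ticket for `draws` draws; return average profit per draw."""
    rng = random.Random(seed)
    ticket = set(rng.sample(range(1, pool + 1), picks))
    profit = 0.0
    for _ in range(draws):
        draw = set(rng.sample(range(1, pool + 1), picks))
        profit -= ticket_price
        if draw == ticket:  # jackpot odds per draw: 1 in ~13.98 million
            profit += jackpot
    return profit / draws

# Almost always prints -2.0: you pay for every draw and essentially never win.
print(simulate_lottery(100_000))
```

Even simplified like this, the expected value per ticket is roughly the ticket price times minus one, which is exactly what the real application demonstrates at scale.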
An Interesting AI Behavior
When I ran the review on the new codebase, Copilot generated a detailed report but did not include the final score.
I had to prompt it again to output the result using the same scorecard format.
This highlights something important about working with AI systems.
AI responses are not fully deterministic.
Even when using the same prompts and templates, the output may vary slightly because the model is generating responses rather than executing fixed instructions.
If the responses were identical every time, it would behave more like a script than an AI model.
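The underlying reason is that language models sample their output from a probability distribution rather than always taking the single most likely continuation. The toy below is not how Copilot is implemented, just a minimal sketch of greedy versus temperature sampling using made-up scores for three review phrasings:

```python
import math
import random

def sample_token(logits: dict[str, float], temperature: float,
                 rng: random.Random) -> str:
    """Pick one option from softmax(logits / temperature)."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring option.
        return max(logits, key=logits.get)
    weights = [math.exp(v / temperature) for v in logits.values()]
    return rng.choices(list(logits), weights=weights, k=1)[0]

# Hypothetical scores for three ways a review might describe a codebase.
logits = {"well-structured": 2.0, "adequate": 1.5, "fundamentally broken": 1.0}
rng = random.Random(0)

print({sample_token(logits, 0.0, rng) for _ in range(5)})  # {'well-structured'}
print({sample_token(logits, 1.0, rng) for _ in range(5)})  # varies run to run
```

At temperature 0 the answer never changes; at higher temperatures the same prompt can come back worded differently, or in this case with a section missing.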
Key Takeaways
Using AI for code reviews can be extremely useful, but the quality of the feedback depends heavily on how you ask for it.
1. AI Needs Structure
Without a structured review framework, AI reviews will stay fairly shallow.
Providing guidelines dramatically improves the analysis.
2. AI Defaults to Being Polite
If you want honest feedback, you need to force the AI to follow strict evaluation criteria.
Otherwise the review may sound overly positive.
3. AI Is Not Deterministic
Even with identical prompts, AI responses may vary.
This is normal behavior for generative AI systems.
4. AI Works Best as an Engineering Assistant
AI tools can greatly accelerate tasks like:
- code reviews
- refactoring analysis
- technical debt assessment
But they still work best when combined with human engineering judgment.
Final Thoughts
Using Copilot to both rewrite and review a codebase turned out to be a fascinating experiment.
The biggest lesson was simple:
AI becomes much more powerful when you give it structure.
A simple prompt produces interesting feedback.
A structured framework produces actionable engineering insight.
📚 Watch the Full Series
Episode 1: GitHub Copilot AI Code Review - Can AI Understand Legacy .NET Code?
https://youtu.be/P26t5EVz70U
Episode 2: Creating .NET Projects and Solution Structure
https://youtu.be/Vf0yULOHY3I
Episode 3: Legacy Code Rewrite - Random Number Generator
https://youtu.be/6DuaW9VjQa8
Episode 4: Working Without Agent Skills in Visual Studio 2026
https://youtu.be/dznUGMNhqSU
Episode 5: Vibe Coding Razor Pages
https://youtu.be/sQdByQML_w8
➡️ Episode 6: Code Review (this article)
https://youtu.be/omDvFGu8Vtc