One of the more interesting experiments I recently ran was asking GitHub Copilot to perform a full code review on a .NET application that it had also helped rewrite.
The original project was written in 2013. The application still worked, but the codebase had accumulated a lot of technical debt over the years.
Rather than trying to gradually refactor it, I decided to do a complete rewrite with Copilot’s help using modern conventions in the .NET ecosystem.
Once the rewrite was finished, I wanted to answer a simple question:
How useful is an AI-generated code review on a real-world project?
So I asked Copilot to review:
- The original legacy codebase
- The new rewritten solution
The results were interesting.
Starting With a Simple AI Code Review
The easiest way to run an AI code review is to give it a single open-ended prompt:

> Perform a code review on this project.
Copilot will usually return a fairly useful response that includes things like:
- Naming issues
- Readability improvements
- Basic design observations
- Possible refactoring opportunities
But there is a problem with this approach.
The review tends to focus mostly on surface-level code quality.
In my legacy project, the biggest issues weren’t small code smells. The real problems were architectural.
For example:
- Many classes were static
- Several classes simply passed calls directly to the data access layer
- Domain logic and database logic were mixed together
- The solution had no automated tests
The initial open-ended review mentioned some issues, but it completely missed the severity of the architectural problems.
The Other Problem: AI Doesn’t Know Your Standards
Another issue with AI reviews is that the model does not know your team's conventions.
Every system has different priorities.
Some systems are mostly CRUD applications where fast queries are the most important factor.
Other systems might be financial or analytical platforms where things like:
- deterministic calculations
- validation rules
- extensive unit testing
are critical.
Without guidance, Copilot evaluates code using generic best practices, not the standards that actually matter for your project.
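One way to close this gap is to put your conventions somewhere the AI will always see them. GitHub Copilot supports repository-level custom instructions via a `.github/copilot-instructions.md` file; the conventions below are purely illustrative examples, not the ones from this project:

```markdown
# Code review conventions for this repository

- Treat missing automated tests as a blocking issue, not a suggestion.
- Flag static classes that hold state or merely wrap the data access layer.
- Domain logic must not call the data access layer directly.
- Rate severity honestly; do not soften architectural problems.
```

With a file like this in place, every review request starts from your standards instead of generic best practices.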
AI Reviews Also Tend to Be Polite
Another thing I noticed is that AI-generated reviews tend to be very diplomatic.
They rarely say something like:

> This architecture is fundamentally broken.

Instead, they produce much softer language.
In fact, the first open-ended review Copilot generated described the legacy codebase as:

> "A well-structured lottery simulation system with clear domain modeling and separation of concerns."
That assessment was far more positive than reality.
Giving AI a Code Review Scorecard
To get a more realistic review, I gave Copilot a structured review scorecard.
Instead of asking for a general opinion, the AI had to evaluate the project across several weighted categories.
For example:
| Category | Weight |
|---|---|
| Architecture | High |
| Testing | 20% |
| Programming practices | 15% |
| Maintainability | Medium |
Interestingly, in this scoring model the act of writing code accounts for only about 15% of the total score.
That reflects something many experienced developers already know:
Software quality is determined far more by architecture and testing than by individual lines of code.
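The mechanics of a weighted scorecard are simple: each category gets a score from 0 to 100, and the weights decide how much each category contributes to the total. The sketch below is a toy illustration; the exact weights for Architecture and Maintainability are assumptions, since the article only pins down Testing (20%) and Programming practices (15%):

```python
# Toy weighted review scorecard. Architecture and Maintainability
# weights are assumptions made to fill out the example.
WEIGHTS = {
    "architecture": 0.40,
    "testing": 0.20,
    "programming_practices": 0.15,
    "maintainability": 0.25,
}

def weighted_score(category_scores: dict[str, float]) -> float:
    """Combine per-category scores (0-100) into one weighted total."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must sum to 100%
    return sum(WEIGHTS[c] * category_scores[c] for c in WEIGHTS)

# A legacy-like profile: weak architecture, no tests at all.
legacy = {"architecture": 10, "testing": 0,
          "programming_practices": 40, "maintainability": 15}
print(round(weighted_score(legacy)))  # → 14
```

Note how a codebase can have passable line-level code (40 in programming practices) and still land in the teens overall, because architecture and testing dominate the total.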
The Result: Reviewing the Legacy Code
Once Copilot evaluated the legacy project using the scorecard, the results were dramatically different.
The system scored:

> 15 / 100
The review highlighted several major issues:
- No clear architectural structure
- Tight coupling between components
- No automated tests
- Poor separation of responsibilities
The final recommendation from the AI was very direct:

> Do not attempt to fix this system incrementally.
Instead:
- Treat the codebase as reference material
- Perform a full rewrite
- Avoid investing time into patching severe technical debt
This was the type of honest assessment that the simple open-ended review failed to produce.
Reviewing the Rewritten Version
After several days of work — with Copilot helping with much of the code — I ran the same structured review on the new version.
The rewritten solution followed a structure that most developers familiar with the .NET ecosystem would recognize:
- clear project separation
- cleaner domain organization
- proper data access layers
- improved maintainability
The application itself is a simple lottery simulation tool that can generate numbers based on historical results and simulate outcomes.
And like most lottery simulations, it demonstrates one consistent result:
Even when simulated millions of times, the expected outcome is still a loss.
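That "consistent loss" is easy to reproduce with a tiny Monte Carlo sketch. This is not the application's code, just a minimal illustration under assumed parameters (a 6-of-49 draw, a $2 ticket, a jackpot-only payout):

```python
import random

def simulate_lottery(draws: int, ticket_price: float = 2.0,
                     jackpot: float = 1_000_000.0,
                     pool: int = 49, picks: int = 6, seed: int = 42) -> float:
    """Play the same ticket for `draws` draws; return average profit per draw."""
    rng = random.Random(seed)
    ticket = set(rng.sample(range(1, pool + 1), picks))
    profit = 0.0
    for _ in range(draws):
        draw = set(rng.sample(range(1, pool + 1), picks))
        profit -= ticket_price
        if draw == ticket:  # jackpot odds per draw: 1 in ~13.98 million
            profit += jackpot
    return profit / draws

# Almost always prints -2.0: you pay for every draw and essentially never win.
print(simulate_lottery(100_000))
```

Even simplified like this, the expected value per ticket is roughly the ticket price times minus one, which is exactly what the real application demonstrates at scale.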
An Interesting AI Behavior
When I ran the review on the new codebase, Copilot generated a detailed report but did not include the final score.
I had to prompt it again to output the result using the same scorecard format.
This highlights something important about working with AI systems.
AI responses are not fully deterministic.
Even when using the same prompts and templates, the output may vary slightly because the model is generating responses rather than executing fixed instructions.
If the responses were identical every time, it would behave more like a script than an AI model.
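The underlying reason is that language models sample their output from a probability distribution rather than always taking the single most likely continuation. The toy below is not how Copilot is implemented, just a minimal sketch of greedy versus temperature sampling using made-up scores for three review phrasings:

```python
import math
import random

def sample_token(logits: dict[str, float], temperature: float,
                 rng: random.Random) -> str:
    """Pick one option from softmax(logits / temperature)."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring option.
        return max(logits, key=logits.get)
    weights = [math.exp(v / temperature) for v in logits.values()]
    return rng.choices(list(logits), weights=weights, k=1)[0]

# Hypothetical scores for three ways a review might describe a codebase.
logits = {"well-structured": 2.0, "adequate": 1.5, "fundamentally broken": 1.0}
rng = random.Random(0)

print({sample_token(logits, 0.0, rng) for _ in range(5)})  # {'well-structured'}
print({sample_token(logits, 1.0, rng) for _ in range(5)})  # varies run to run
```

At temperature 0 the answer never changes; at higher temperatures the same prompt can come back worded differently, or in this case with a section missing.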
Key Takeaways
Using AI for code reviews can be extremely useful, but the quality of the feedback depends heavily on how you ask for it.
1. AI Needs Structure
Without a structured review framework, AI reviews will stay fairly shallow.
Providing guidelines dramatically improves the analysis.
2. AI Defaults to Being Polite
If you want honest feedback, you need to force the AI to follow strict evaluation criteria.
Otherwise the review may sound overly positive.
3. AI Is Not Deterministic
Even with identical prompts, AI responses may vary.
This is normal behavior for generative AI systems.
4. AI Works Best as an Engineering Assistant
AI tools can greatly accelerate tasks like:
- code reviews
- refactoring analysis
- technical debt assessment
But they still work best when combined with human engineering judgment.
Final Thoughts
Using Copilot to both rewrite and review a codebase turned out to be a fascinating experiment.
The biggest lesson was simple:
AI becomes much more powerful when you give it structure.
A simple prompt produces interesting feedback.
A structured framework produces actionable engineering insight.
📚 Watch the Full Series
Episode 1: GitHub Copilot AI Code Review - Can AI Understand Legacy .NET Code?
https://youtu.be/P26t5EVz70U
Episode 2: Creating .NET Projects and Solution Structure
https://youtu.be/Vf0yULOHY3I
Episode 3: Legacy Code Rewrite - Random Number Generator
https://youtu.be/6DuaW9VjQa8
Episode 4: Working Without Agent Skills in Visual Studio 2026
https://youtu.be/dznUGMNhqSU
Episode 5: Vibe Coding Razor Pages
https://youtu.be/sQdByQML_w8
➡️ Episode 6: Code Review (this article)
https://youtu.be/omDvFGu8Vtc