Pseudo Developer

OpenAI Codex - Using it for Code Review

What OpenAI Codex Missed in a Legacy .NET Codebase

AI code review tools are often marketed as near–senior-engineer replacements: point them at a repository and expect deep architectural insight. I wanted to see how true that is, so I tested OpenAI Codex’s web-based code review on a real-world legacy .NET C# application.

The result? Useful—but shallow in the ways that matter most.


The Setup

I pointed Codex directly at a GitHub-hosted legacy .NET solution using the web interface—no IDE plugins, no hand-holding. This is a non-trivial codebase that’s been evaluated by other AI tools before, making it a good benchmark.

The goal wasn’t to test syntax knowledge. It was to see whether Codex could reason about a system.


What Codex Gets Right

Out of the box, Codex quickly identified:

  • Unused variables and redundant methods
  • Overly static implementations
  • Minor data access and code-structure issues

For code-level feedback, it’s fast and competent. This kind of review can absolutely save time during refactoring or cleanup.
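To make that concrete, here is a hypothetical snippet of the kind of code-level issue it flagged. None of this is the project's actual code; the class and method names (ReportHelper, BuildReport, LoadReport) are invented purely for illustration.

```csharp
// Hypothetical example of the kind of code-level issue Codex flagged
// (illustrative only, not the project's real code).
public static class ReportHelper
{
    // "Overly static": a static-only helper that callers can't substitute in tests.
    public static string BuildReport(int customerId)
    {
        var label = "monthly";             // unused variable: assigned, never read
        return LoadReport(customerId);     // call into a redundant pass-through method
    }

    // Redundant method: a thin wrapper that adds no behaviour of its own.
    private static string LoadReport(int customerId) =>
        $"Report for customer {customerId}";
}
```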


Where It Falls Apart: Architecture

The problems started when I asked for a holistic review.

Even after I provided a detailed scorecard covering architecture, testing, and maintainability, Codex produced:

  • A vague, overly positive executive summary
  • Inflated scores for architecture and testing
  • No serious discussion of systemic design flaws

The application looks layered, but in reality the domain is tightly coupled to Entity Framework and the database sits at the center of the system. This is a well-known architectural anti-pattern—and Codex largely missed it.
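The project's code isn't reproduced here, but the coupling follows a familiar shape. A minimal sketch, using invented names (Order, Approve, AppDbContext), of what a database-centred "domain" layer typically looks like:

```csharp
// Hypothetical sketch of a database-centred "domain" layer (not the actual project code).
// The entity doubles as an EF Core persistence model, so domain behaviour can't exist
// without the ORM and the database schema behind it.
using System.ComponentModel.DataAnnotations;
using Microsoft.EntityFrameworkCore;

public class Order  // lives in the "Domain" project
{
    [Key]
    public int OrderId { get; set; }

    public string Status { get; set; } = "New";

    // Business logic reaches straight into the DbContext instead of expressing
    // a rule that the persistence layer merely stores.
    public void Approve(AppDbContext db)
    {
        Status = "Approved";
        db.SaveChanges();   // domain behaviour now depends on EF and the database
    }
}

public class AppDbContext : DbContext
{
    public DbSet<Order> Orders => Set<Order>();
}
```

Once domain behaviour calls SaveChanges() itself, nothing above the database can be reasoned about, or tested, without it.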

Similarly, the solution contains only end-to-end tests, with no meaningful unit or domain testing. Yet the AI implied reasonable test discipline.
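For contrast, the kind of test that was absent looks roughly like this. Again, the names are hypothetical and carried over from the sketch above, with the persistence dependency removed so the rule can run in memory.

```csharp
// Hypothetical example of the missing kind of test: a pure domain unit test (xUnit)
// that exercises a business rule with no database, web host, or UI involved.
using Xunit;

// A decoupled version of the Order entity from the earlier sketch.
public class Order
{
    public string Status { get; private set; } = "New";

    public void Approve() => Status = "Approved";
}

public class OrderTests
{
    [Fact]
    public void Approving_an_order_sets_its_status()
    {
        var order = new Order();

        order.Approve();

        Assert.Equal("Approved", order.Status);
    }
}
```

The point isn't the test itself; it's that the current design makes this kind of test nearly impossible to write, which is exactly the signal the scoring glossed over.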


The Real Limitation

Codex still reasons locally, not systemically.

It evaluates classes and methods well, but struggles to:

  • Trace dependency flow across projects
  • Identify architectural coupling
  • Penalize designs that appear structured but are fundamentally flawed

This is especially dangerous in legacy systems, where bad design is often repeated consistently across the codebase—making it harder for AI to recognize as a problem.


Final Thoughts

AI code review isn’t useless—but it’s not a replacement for architectural judgment.

Use it for:

  • Code-level feedback
  • Cleanup and refactoring suggestions

Do not use it for:

  • Architectural evaluation
  • Assessing system health
  • Numeric scores you intend to take at face value

AI is improving quickly, but for now, architecture still belongs to humans.
