DEV Community

Hulk in Public
Should code implemented by AI receive a sloppy review?

The other day, I received a review request from a colleague. There were a staggering 100 files in the diff—my jaw dropped.

It is practically impossible for a human to implement such a large-scale change by hand, so they had almost certainly used an AI agent. The task involved migrating CSV exports within the project to an Excel format. If these 100 files had been changed mechanically (for instance, simply replacing /path/hogehoge.csv with /path/hogehoge.xlsx across the board), I wouldn’t have had any complaints, and a 100-file diff would have been fine.

However, the changes included logic modifications, file deletions, creations, and migrations of test locations. I understand that the issue involved significant changes, but why bundle it all into a single Pull Request? Why not check the diff yourself to see if there are any issues before requesting a review? I spent four hours reviewing this. A typical PR review for me takes less than 10 minutes. The PR contained changes clearly outside the scope of the task, such as DB schema modifications, which have nothing to do with replacing CSV files with Excel files. There were also model changes and new controller/action additions (which Rails engineers will understand).

I pointed out the clearly unnatural code, leaving nearly 30 comments. Two weeks after I sent back the review, I received another request. I was at a loss. Sure, the issues I pointed out were fixed, but there was nothing more I wanted to—or could—point out in the codebase. If I were to approve it now, it would be released. But with a diff this massive, it’s easy to imagine business teams requesting numerous small fixes later. Even if we decided to revert, the diff is so large that it would likely cause conflicts with other developers, making it impossible to undo. So, should I approve it?

I remembered a piece of advice from a veteran engineer at a startup who mentored me during my internship:

"The responsibility for the implementation lies with the reviewer."

Perhaps he was trying to encourage me as a junior engineer to feel comfortable submitting PRs. At the same time, I learned that a review is a task that requires that level of ownership.

I consulted my boss via Slack DM about whether I should approve it, and this is what came back:

"I've only verified the code's validity!
There are a lot of changes, so I haven't tested the actual functionality yet!
Please list the updated APIs and test them on the develop branch!
If you let me know, I'll help with testing on the develop branch too.

Just pass this along to the PM as is.
On the contrary, you’ve checked the PR thoroughly. If it were me, I’d just approve without a second thought."

Anyway, let’s stop badmouthing my colleague here (though it's unlikely anyone around me will read this article since it’s in English).

What I want to say is that the recent trend of taking code reviews lightly because of AI coding is dangerous. It feels like magic that AI can churn out massive amounts of code, but the ones reviewing that code aren't AI. They're humans.

Sometimes when I browse tech blogs, I see weirdos saying, "I let AI handle not just the implementation but the review too!" No. A review isn't just looking at the code. It’s an act that requires various political judgments:

  • Is it correctly interpreting business requirements?
  • Are there any missing or redundant implementations?
  • Will this negatively impact customers if released now?
  • Is the coding consistent with the team's philosophy?
  • Will performance, like DB load, be an issue?

You can't figure these things out just by looking at the code; a review is only possible on top of the daily communication you've had with others. (Conversely, maybe it could work if you recorded all team conversations in Slack and had an AI capable of taking all of that in as input. Though I don't see the point of going that far.)

Did you think, "Well, wouldn't AI reviews be useful for a solopreneur like you?" Perhaps. But AI's reviewing capability is still low.

Most of all, I have zero motivation to have an AI review a PR that was created by an AI. If you want to do that, feel free. And, this might just be my stubbornness, but the final decision on whether to release something should be left to humans.

Also, let’s stop blindly trusting code implemented by AI. Recently, Bun moved from Zig to Rust using "VibeCoding" and merged it into main without any reviews. That’s a move you can only make if you have absolute trust in AI, but I can't place absolute trust in AI even for my own small-scale projects. In fact, I don't trust it at all.

Yesterday, I refactored the social login feature for my project, SuperRails. Previously, each user could only have one login method (email, GitHub, Google, etc.). I changed it so that a single user can have multiple login methods. I let GPT-5.5, which I've been into lately, handle the implementation to create a PR. When I reviewed it myself, I discovered the following issues:

  • uid and provider remained in both the User model and the newly created SocialIdentify model (in other words, they should have been removed from the User model, but that step was forgotten).
  • User#from_omniauth had deeply nested code that clearly violated the "Single Responsibility Principle."
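To give a rough idea of what the second point means, here is a minimal sketch in plain Ruby. It is not the actual SuperRails code: the AccountLinker class, the in-memory arrays standing in for database tables, the hash-shaped auth argument, and the rule of linking accounts by matching email are all assumptions made for illustration. The idea is only that the entry point reads as a short list of steps, with each step living in its own small method instead of inside nested conditionals.

```ruby
# Plain-Ruby stand-ins for the models (hypothetical; no Rails required).
# Note that User no longer carries uid/provider; those live on SocialIdentify.
User = Struct.new(:email, :identities)
SocialIdentify = Struct.new(:provider, :uid, :user)

# Hypothetical service object that owns the "sign in via provider" flow.
class AccountLinker
  def initialize
    @users = []       # stands in for the users table
    @identities = []  # stands in for the social identities table
  end

  # Entry point: a flat sequence of steps, no nested branching.
  def from_omniauth(auth)
    identity = find_identity(auth)
    return identity.user if identity

    # Assumption: an unknown identity is attached to the user with the same
    # email. A real app should only do this for provider-verified emails.
    user = find_user_by_email(auth[:email]) || create_user(auth[:email])
    link_identity(user, auth)
    user
  end

  private

  def find_identity(auth)
    @identities.find { |i| i.provider == auth[:provider] && i.uid == auth[:uid] }
  end

  def find_user_by_email(email)
    @users.find { |u| u.email == email }
  end

  def create_user(email)
    user = User.new(email, [])
    @users << user
    user
  end

  def link_identity(user, auth)
    identity = SocialIdentify.new(auth[:provider], auth[:uid], user)
    @identities << identity
    user.identities << identity
  end
end

linker = AccountLinker.new
a = linker.from_omniauth({ provider: "github", uid: "1", email: "me@example.com" })
b = linker.from_omniauth({ provider: "google", uid: "9", email: "me@example.com" })
puts a.equal?(b)        # true: both logins resolve to the same user
puts a.identities.size  # 2
```

Whether the email-matching rule is appropriate depends on the product; the point of the sketch is the shape of the code, where each private method can be read, tested, and reviewed on its own.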

For the second point in particular, I refactored the code until I was satisfied and then committed it. Was my effort a waste? At the very least, I don't think there's any reason to justify taking reviews lightly.
