Adam Poulemanos

I let four AI code reviewers fight over my PRs

AI code reviewers are annoying

Many developers complain about AI code review. It's noisy. It's repetitive. It misunderstands context. It suggests bad patterns. It never shuts up.

Four AI systems review every push I make, and they will not shut up.

GitHub Copilot. Sourcery. GitHub Code Quality. Claude. All via CI. Every push.

Every pull request becomes a battlefield of automated opinions. My most recent architectural PR had 72 comments. I'm the only human participating.

This is annoying. I'm going to tell you why I do it anyway.

The solo developer's dilemma

I'm building CodeWeaver, an open-source MCP server for semantic code search. It's just me. No co-founder, no team, no code review buddy.

This is a problem. Not because I need validation (though that's nice), but because code review serves a specific function: it creates friction. Someone else looks at your work and asks "why?" before it ships. That friction catches bugs. It surfaces assumptions. It forces you to defend decisions you might have made on autopilot.

When you're solo, that friction disappears. You write code, you merge code. Nobody asks questions. Your bad habits calcify into architectural decisions.

So I outsourced the friction to robots.

What AI code review actually looks like

Let me be clear about what I'm dealing with here.

On a recent PR introducing daemon architecture (17 commits, 2000 lines changed), I got 37 comments from Copilot alone. Add in the other reviewers and we're at 60 line-level comments plus 12 general PR reviews. One PR.

Most of them are about:

  • Unused imports and variables

  • Empty except blocks missing explanatory comments

  • Health check loops that sleep before checking instead of after (see the sketch after this list)

  • Parameter names that could be "clearer"

  • The same suggestions, repeated by three different reviewers
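
One of those complaints at least points at a concrete pattern. This is roughly the loop shape the reviewers keep asking for; a generic sketch, not CodeWeaver's actual health check:

```python
import asyncio

async def health_loop(check, interval: float = 5.0) -> None:
    # Check first, then sleep, so the very first check
    # isn't delayed by a full interval.
    while True:
        await check()
        await asyncio.sleep(interval)
```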

It's a lot. Most of it is noise. The signal-to-noise ratio is genuinely terrible.

But here's the thing about noise: it takes two seconds to dismiss. I scan a comment, think "no, that's intentional," and move on.

The cognitive overhead is low once you accept that most suggestions won't matter.

The .variable vs .value war

I have a custom enum pattern in CodeWeaver. Instead of accessing enum values with .value (Python's default), I use a custom .variable property on my BaseEnum class. There are good reasons for this: it gives me more control over serialization and string representation, and not all BaseEnums are strings.
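
For illustration, here's a minimal sketch of the idea. The names below are made up for this post; they're not CodeWeaver's actual BaseEnum:

```python
from enum import Enum

class BaseEnum(Enum):
    # Illustrative base class: expose a controlled .variable property
    # instead of handing out the raw .value everywhere.
    @property
    def variable(self) -> str:
        # Centralize serialization: strings pass through untouched,
        # anything else gets a string representation we control.
        return self.value if isinstance(self.value, str) else str(self.value)

class ChunkLimit(BaseEnum):  # hypothetical subclass for the example
    DEFAULT = "default"
    MAX_TOKENS = 512         # not every member is a string

print(ChunkLimit.DEFAULT.variable)     # -> "default"
print(ChunkLimit.MAX_TOKENS.variable)  # -> "512"
```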

Copilot flags this. Every time. In one PR, it flagged the same .value → .variable pattern ten times across different files.

"Consider using .value for consistency with Python enum conventions."

This is maddening. It's an intentional design decision. I've made it. I'm committed to it. BaseEnum actually has a .variable property — Copilot just doesn't know about my custom base class.

But.

The first time I saw this comment, it made me stop and think. Why am I using .variable? Is there actually a good reason? I spent a few minutes writing up the justification in my head. Turned out yes, there was a good reason. But I hadn't consciously articulated it until the robot asked.

That's the value buried in the annoyance. Being forced to defend a decision — even to a machine that won't understand your defense — clarifies your own thinking.

The catches that matter

Buried in the noise, real bugs surface.

In the daemon PR I mentioned, I had a stop command that was supposed to kill the daemon process. The code was:


```python
os.kill(os.getpid(), signal.SIGTERM)
```

That kills the CLI process itself, not the daemon. Multiple AI reviewers caught this. I'd have shipped a command that literally did nothing useful.
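
The fix is conceptually simple: the stop command needs the daemon's PID, not its own. Here's a minimal sketch assuming a pidfile, which may not match how the real daemon tracks its process:

```python
import os
import signal
from pathlib import Path

# Hypothetical pidfile location, invented for this example.
PIDFILE = Path("~/.codeweaver/daemon.pid").expanduser()

def stop_daemon() -> None:
    # Signal the daemon's PID read from its pidfile. os.getpid() here
    # would be the CLI process running this very command.
    daemon_pid = int(PIDFILE.read_text().strip())
    os.kill(daemon_pid, signal.SIGTERM)
```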

GitHub Code Quality caught me using asyncio.suppress(asyncio.CancelledError) — which doesn't exist. The correct form is contextlib.suppress(asyncio.CancelledError). I use the latter all the time, but an AI assistant suggested the former and I missed it in my review. It would have caused a runtime error. My AI reviewers caught my AI developers' mistakes.
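
For the record, the working form looks like this; a generic illustration rather than the code from the PR:

```python
import asyncio
import contextlib

async def cancel_and_wait(task: asyncio.Task) -> None:
    # Cancel a task and swallow only the expected CancelledError.
    # contextlib.suppress exists; asyncio.suppress does not.
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
```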

Copilot flagged an inverted conditional:


```python
if not (project or not isinstance(project, Path) or not project.exists()):
```

The logic was backwards. I'd have been showing warnings when I shouldn't and staying silent when I should have warned.
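
For what it's worth, this is my reading of the intended check, reconstructed rather than copied from the fix:

```python
from pathlib import Path

def should_warn(project: Path | None) -> bool:
    # Warn when the project path is missing, the wrong type,
    # or points at nothing on disk.
    return not project or not isinstance(project, Path) or not project.exists()
```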

Sourcery caught that my systemd service file generator wasn't quoting paths. Any user with a space in their home directory would have had a broken service file.
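
A sketch of the kind of quoting that avoids it, using shlex.quote as one reasonable option; the helper name and unit-file layout here are invented for the example:

```python
import shlex
from pathlib import Path

def exec_start_line(python_bin: Path, daemon_script: Path) -> str:
    # Quoting keeps a home directory like "/home/My User" from breaking
    # the generated ExecStart= entry. (systemd's quoting rules aren't
    # exactly POSIX shell, but single-quoted paths cover the
    # space-in-path case.)
    return f"ExecStart={shlex.quote(str(python_bin))} {shlex.quote(str(daemon_script))}"
```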

In another PR, Copilot found a format string error — %r% s%r instead of %r, %s, %r — and a type error where I was dividing a boolean by an integer instead of dividing lengths.
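
Both are the kind of slip that's hard to spot in your own diff. A contrived illustration of the two fixes, not the actual CodeWeaver code:

```python
items, matches = ["a", "b", "c"], ["a"]

# Format-string fix: the separators go between the placeholders.
msg = "%r, %s, %r" % (items[0], "matched", matches[0])

# Type fix: divide the lengths, not a boolean by an integer.
ratio = len(matches) / len(items)   # not: bool(matches) / len(items)
```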

None of these were catastrophic. All of them would have wasted my time later. Some would have reached users before I noticed.

Why multiple reviewers?

If AI code review is annoying, why run four of them?

Coverage. They catch different things.

Looking at the actual data, each reviewer has a distinct personality:

Copilot generates the highest volume (37 comments on one PR) but has the lowest signal-to-noise ratio. About 55% of suggestions get implemented. It's good at catching unused code, potential AttributeErrors, and logic issues. It's also the most repetitive — it'll flag the same pattern ten times if it appears ten times.

Sourcery is the opposite: fewer comments (12 on that same PR), but nearly all of them matter. About 90% implementation rate. It catches security issues like path quoting, architecture problems, and it tracks which suggestions you've addressed with a "✅ Addressed" marker, which is nice.

GitHub Code Quality goes deep on static analysis. It caught the asyncio.suppress misuse that no other reviewer flagged. It also provides autofix capabilities — click a button and the fix is applied.

Claude (via CI) writes comprehensive architectural reviews. Good for cross-cutting concerns. The downside: I had 11 nearly identical reviews on one PR because my CI workflow triggers it multiple times. That's a configuration problem, not a Claude problem.

The enum .variable thing gets flagged by Copilot every time. The asyncio.suppress bug? Only Code Quality caught that one.

Running multiple reviewers also creates a consensus signal. When all of them flag the same thing, it's probably worth a closer look.

This isn't for everyone

I want to be honest: this approach requires a specific temperament.

You have to be okay with noise. Lots of it. If seeing "wrong" suggestions irritates you, this will drive you crazy. You need to be able to scan, dismiss, and move on without getting emotionally hooked by bad advice.

You also have to be solo (or nearly solo). On a team, AI code review creates a different dynamic. Human reviewers might feel their feedback is redundant. The comment threads become unreadable. The value proposition changes entirely.

And you have to accept that you're trading time for coverage. Reading through 60 comments takes time, even if most are quick dismissals. That's time you could spend actually coding.

For me, the trade is worth it. I don't have teammates to catch my mistakes. The AI reviewers are bad teammates, but they're the teammates I have.

The real point

Here's what I've learned from this experiment: AI code review isn't good. The tools are noisy, repetitive, and often wrong about intent. Developers who find it annoying are correct.

But "not good" isn't the same as "not useful."

When you're a solo developer, your options are:

  1. No code review at all

  2. Annoying, imperfect, robotic code review

I pick option two. Not because it's good, but because it's better than nothing. It creates friction where there would otherwise be none. It catches some bugs that would otherwise ship. It forces me to articulate decisions I might otherwise make unconsciously.

Is this the future of code review? God, I hope not. Human reviewers who understand context and intent will always be better than pattern-matching robots.

But until I have those human reviewers, I'll keep letting the robots fight over my PRs.

I'm building CodeWeaver, semantic code search for AI agents. If you want to see what 60+ AI comments on a PR look like, check out PR #184. It's not pretty, but it works.


Post originally appeared on the Knitli blog at https://blog.knitli.com/i-let-four-ai-code-reviewers-fight-over-my-prs/
