Hello folks.
Have you ever wanted to quickly run code reviews using multiple AIs? I have. If you really want to do something like this, you can have an AI generate a script and run it locally right away. Problem solved! …But if we stop there, the blog post ends immediately, so please stick with me for a little longer.
The problem I want to solve
In most cases, that really does solve it—but scripts created this way often end up calling pay-as-you-go APIs such as the ChatGPT API. Calling APIs isn’t inherently a problem, but I personally wanted to keep these kinds of tasks within a subscription fee if possible. (Subscriptions also have usage limits, so they’re effectively usage-based too—but with how I use them, I rarely hit the limit.)
AI vendors also offer their own coding agents like Codex, Claude Code, Gemini CLI, and so on. By authenticating inside those coding agents, you can use them within your subscription plan. GitHub Copilot doesn’t develop its own models, but it’s appealing because it’s inexpensive and fixed-price, and lets you try a variety of models.
So it seems promising to delegate code review to these fixed-price coding agents and compare their results. That way, without issuing API keys, you can internally call multiple coding agents you already use and instantly get second opinions on your code review.
You might also want to use a team-standard prompt for code reviews. Even if you don’t fully standardize, it’s nice to avoid reinventing prompts each time and use a reasonably well-prepared team-specific one.
Then why not run the CLIs in CI?
You’re absolutely right. But once you run it in CI, the questions get bigger: how do we handle team-wide subscriptions, and would a dedicated SaaS be cheaper and perform better?
So this time, I wanted something that runs locally at a script-like level. If you want to do it properly, as mentioned, moving it into CI is likely better. But even then, it might still be handy to quickly check multiple reviews locally.
The solution
I’d like to introduce ai-utils, a tool I built that makes my development flow just a bit more convenient.
How it actually behaves
Let’s look at a real review I ran when I added Copilot CLI support to ai-utils. Including full results for every AI would be too long, so I’ll show only excerpts.
I reviewed this commit. In practice, this is an AI check called `staged-check` that runs against staged (pre-commit) files.
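For reference, kicking it off looks something like this (a sketch: I’m assuming the subcommand is invoked by its name; check `aiu --help` for the exact form):

```sh
# Review the currently staged changes before committing
aiu staged-check
```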
The check is performed using a prompt like the one linked here:
staged-check.md
In this case, Codex’s feedback captured the situation most accurately. It’s understandable given how lightweight the check is, but because the prompt passes only the `git diff`, the LLM seems to struggle to grasp the repository as a whole. I’d like to improve that at some point.
Below are the actual review outputs.
- Claude Code (Sonnet 4.5)
The code quality is very good. The new Copilot provider implementation follows a structure consistent with the existing Codex provider. The `stripFlagWithValue` helper function is implemented in a properly reusable way and adheres to the DRY principle. Naming is clear and follows Go conventions.
- Gemini CLI (Gemini 2.5)
* **Code duplication**:
* **File**: `internal/provider/copilot.go`
* **Issue**: The `stripFlagWithValue` function is newly defined in `copilot.go`, but this logic is also used for model flag handling in `codex.go`. Currently, `codex.go` cannot access this function defined in `copilot.go`, so similar logic would be (not visible here, but presumably) needed in `codex.go` as well. To avoid code duplication, this helper function should be moved to a shared utility file within the `provider` package (e.g., `provider/utils.go`).
- Codex (GPT-5.2)
- **Fatal timeout unit bug**: In `initConfig()` in `internal/cli/root.go`, `config.ProviderConfig.Timeout` is assigned `120`, but since the type is `time.Duration`, that becomes **120ns** (you likely expected `120*time.Second`). This same value is set when adding `copilot`, so with the default configuration it will time out immediately / become unstable. At minimum, the newly added `copilot` part must be fixed; ideally, fix existing `claude/gemini/codex` similarly.
- **Where to place the utility function**: `stripFlagWithValue()` is defined in `internal/provider/copilot.go` and is also used from `internal/provider/codex.go` (it works because it’s the same package, but it hurts readability). Moving it to a shared file like `internal/provider/args.go` would make the intent clearer.
- **Hard-coded strings**: Provider name `"copilot"` and flags like `"--model"` are scattered (e.g., `internal/cli/root.go`, `internal/provider/*`). Even small constants (at least provider names) would improve consistency.
- GitHub Copilot CLI (Claude Sonnet 4.5)
**`internal/provider/copilot.go`**: The `stripFlagWithValue` function is newly implemented, but the same logic is also used in `codex.go`. Following the DRY principle, this function should be moved to a common file such as `provider.go` so it can be shared by both providers.
**`internal/provider/copilot_test.go`**: A trailing newline is missing at the end of the file (line 115). Go formatting conventions expect files to end with a newline.
**`internal/provider/codex.go`**: An implementation of `stripFlagWithValue` has not been added. With this change, the codex provider may end up with an undefined reference when processing the model flag. The staging may be partial.
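As an aside, the `time.Duration` pitfall Codex flagged is a classic Go gotcha and easy to demonstrate in isolation (a standalone sketch, not the actual ai-utils code):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// A bare integer assigned to a time.Duration is interpreted
	// as nanoseconds, so this timeout is 120ns, not 120 seconds.
	var timeout time.Duration = 120
	fmt.Println(timeout) // 120ns

	// The intended value multiplies by a unit constant.
	timeout = 120 * time.Second
	fmt.Println(timeout) // 2m0s
}
```

Because `time.Duration` is just an `int64` counting nanoseconds, an untyped integer constant compiles without complaint, which is exactly how this kind of bug slips in.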
Even with some prompt issues, asking multiple AIs to review helped me avoid blindly trusting a single model’s feedback. I didn’t do it this time, but you can also compare SOTA models like Opus 4.5 and Gemini 3.0. In many cases you don’t know which model is best, so being able to benchmark and compare with a single command is very convenient.
My OSS project
As mentioned above, ai-utils is my own OSS project. It’s small and functionally simple, but it seemed useful enough that I decided to build it.
Details are here: ai-utils
Concept
Easily run multiple AIs locally, within your existing subscription plans.
Problems it solves
There are plenty of OSS tools like this. But the three things I specifically wanted to solve were:
- I don’t want to issue API keys
- I want to rewrite prompts in my own style
- I want to compare responses from multiple AIs
I couldn’t find an OSS project that satisfied all three, so I chose to build one. In the AI era, it’s easy to build what you want, so I was able to overcome the cost of “reinventing the wheel.”
How to use
On macOS, you can install easily with Homebrew:
```sh
brew tap trknhr/homebrew-tap
brew install aiu
```
On Linux, run the install shell script:
```sh
curl -sSfL https://raw.githubusercontent.com/trknhr/ai-utils/main/install.sh | sh
```
Note that it only works if supported coding agents such as Claude Code or Codex are already installed and ready to use.
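If you want to double-check that the agent CLIs are on your `PATH`, something like this works (binary names follow each vendor’s installer; adjust to the agents you actually use):

```sh
which claude codex gemini copilot
```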
Trying it out
Using `commit-msg`, you can generate a commit message based on staged files:

```sh
aiu commit-msg
```
With the `-m` flag, you can run multiple AIs in parallel.
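For example (a sketch: I’m treating `-m` as a bare flag here; check `aiu --help` for the exact syntax):

```sh
# Fan the same task out to several coding agents at once
aiu commit-msg -m
```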
You can also run your own prompts. Inside a prompt file, `{{$ }}` executes a command, so you can dynamically pass the command output to the AI.

Example:

```
Just say {{$ date }}.
```
This passes the current time to the AI, which then returns only the current time. The review task uses the same mechanism to pass things like the `git diff` output.
So if your team wants custom prompts, you can place team-specific prompts under `.aiu/prompts/` and run standardized reviews.
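As a sketch of what such a prompt might look like, here’s a hypothetical `.aiu/prompts/review.md` (the file name and wording are mine, not from the project):

```
You are reviewing a teammate's staged changes.
Flag missing error handling, hard-coded strings, and duplicated helpers.

{{$ git diff --staged }}
```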
About development
The implementation required for this app wasn’t challenging. AI is so good at implementing typical CLI applications that there wasn’t much I had to do myself. My work was mostly defining the spec and writing tests, and I found myself thinking “So this is the AI era...” over and over.
Summary
This tool just calls the coding agents each vendor provides, but wrapping them up as a CLI makes the workflow surprisingly comfortable.
Because the tool’s functionality is simple, it’s also an application where it’s easy to let AI handle most of the implementation. Probably about 95% of the code was written by AI.
It won’t dramatically improve anything by itself, but it helps you move through small daily tasks a little more smoothly.
If you’re interested, check out the GitHub page and give it a try. If you have complaints or requests, please open an Issue.