I built a code reviewer for myself. My API bill showed me where the real cost was hiding

#ai #codereview #devtools #claudecode

I've been building Revue since March.

It started as a personal tool. I wanted multi-agent code review running inside my development workflow, not a one-shot "ask Claude to review this PR" prompt. By April I had the core working and was testing it heavily.

Then I checked my Anthropic API usage. $78.83 in 13 days on top of my Max subscription. I'd burned 1.5 to 2 million input tokens a day on Sonnet 4.5 without watching the meter.

The irony was obvious. I was building a tool to make AI-assisted development cheaper, and I was doing it in the most expensive way possible.

Where the cost was really coming from

I already knew the API billed separately from my Max subscription. That was not the surprise. The surprise was how fast a good review burns tokens.

As I improved the six agents, added tool calling, and built comment tracking, each review got heavier. More agents reading more context, making more calls, following up on more threads. The better the workflow got, the more it cost to run, and I was putting all of it through the API while I tested.

That's when the distinction started to matter. The API bills per token; the tokens in my Max plan were already paid for upfront. A review that ran inside my session cost nothing extra. The same review through the API billed me for every token. Multi-agent reviews eat tokens fast, and the bill climbs with them.
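A quick sanity check on the April numbers, as a minimal sketch. The token volumes and the 13 days are the figures from this post; the price of $3.00 per million input tokens for Sonnet 4.5 is an assumption, so check the current price list before relying on it.

```python
# Back-of-the-envelope check on the April API bill.
# ASSUMPTION: $3.00 per million input tokens for Sonnet 4.5 (verify current pricing).
INPUT_PRICE_PER_MILLION_USD = 3.00


def input_cost_usd(tokens_per_day: float, days: int,
                   price_per_million: float = INPUT_PRICE_PER_MILLION_USD) -> float:
    """Cost of input tokens only; output tokens are billed on top of this."""
    return tokens_per_day * days / 1_000_000 * price_per_million


# 1.5 to 2 million input tokens a day, for 13 days (figures from the post).
low = input_cost_usd(1_500_000, 13)
high = input_cost_usd(2_000_000, 13)
print(f"Input tokens alone: ${low:.2f} to ${high:.2f} over 13 days")
# prints: Input tokens alone: $58.50 to $78.00 over 13 days
```

Under that assumed price, input tokens alone land in the same ballpark as the $78.83 bill, which also includes output tokens.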

The fix had two parts:

I moved the review inside my session, before committing, onto those subscription tokens I'd already paid for.
For CI, I switched to a cheaper model: from Sonnet 4.5 to DeepSeek-V4-Pro.

What Revue actually is

Revue is a code review workflow, not the model that does the reviewing. It runs in the two places review actually matters: on your machine before you commit, and in CI on every pull request. The behaviour is the same across GitHub, GitLab, and Bitbucket, on almost any model you choose.

On your machine, Revue is the /revue Claude Code skill. You run it against your staged diff inside your existing Claude Code session, so it draws on subscription tokens instead of your API wallet. No separate charge.

Six agents do the work. Five specialists run in parallel, and a synthesis agent reconciles what they find:

Security catches injection vectors, auth bypasses, and supply-chain risks
Performance flags O(n²) loops, memory leaks, and inefficient queries
Architecture checks for coupling violations and missing error handling
Code Quality reviews naming, duplication, and testability
Licensing checks GPL and AGPL compatibility in dependencies
Synthesis reconciles findings across agents and prints to the terminal

Security reasoning differs from performance reasoning. Running them together in one prompt makes them compete for the same context and dilutes both. Running them as separate agents in parallel keeps each one focused, and the full review takes roughly as long as the slowest single agent.
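The timing claim can be sketched with asyncio. This is not Revue's implementation: the agent names mirror the specialists in the list above, the durations are made up, and `run_agent` sleeps where a real agent would call a model. Synthesis is left out because it has to run after the others finish.

```python
import asyncio
import time

# Hypothetical specialist agents with made-up durations in seconds.
AGENT_SECONDS = {
    "security": 0.30,
    "performance": 0.20,
    "architecture": 0.25,
    "code_quality": 0.15,
    "licensing": 0.10,
}


async def run_agent(name: str, seconds: float) -> str:
    """Stand-in for one agent: sleep instead of calling a model."""
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def run_review() -> tuple[list[str], float]:
    """Start every agent at once and wait for all of them to finish."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(run_agent(name, secs) for name, secs in AGENT_SECONDS.items())
    )
    return list(results), time.perf_counter() - start


results, elapsed = asyncio.run(run_review())
total = sum(AGENT_SECONDS.values())
print(f"{len(results)} agents finished in {elapsed:.2f}s "
      f"(running them one after another would take about {total:.2f}s)")
```

The elapsed time tracks the slowest agent (0.30s here) rather than the sum of all five, which is the whole point of fanning out.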

The model is the easy part to swap

The agent that writes the comments is the part everyone can copy, and the part that gets better for free every time a vendor ships a new model. What I actually spent my time on is the layer around it.

The synthesis agent does not just stack the other agents' outputs together. It catches contradictions between agents and verifies each finding against the code before anything reaches you, so you are not left reading reviewers arguing with each other. On pull requests, Revue reads your replies to its comments and answers them, instead of dropping a wall of feedback and going silent. Every run is logged, so you get a review history rather than a comment that scrolls away.
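To make "verify, then reconcile" concrete, here is a minimal sketch. Everything in it is hypothetical: the `Finding` fields, the `action` labels, and the rule that two different actions on the same line count as a contradiction are illustrations, not Revue's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    agent: str
    file: str
    line: int      # 1-indexed line the finding points at
    quoted: str    # code the agent claims is on that line
    action: str    # hypothetical label for what the agent recommends


def verify(finding: Finding, files: dict[str, list[str]]) -> bool:
    """Keep a finding only if the quoted code really is on the cited line."""
    lines = files.get(finding.file, [])
    if not 1 <= finding.line <= len(lines):
        return False
    return finding.quoted in lines[finding.line - 1]


def synthesize(findings: list[Finding], files: dict[str, list[str]]):
    """Drop unverifiable findings, then split the rest into agreed and contested."""
    by_location = defaultdict(list)
    for f in findings:
        if verify(f, files):
            by_location[(f.file, f.line)].append(f)
    agreed, contested = [], []
    for group in by_location.values():
        actions = {f.action for f in group}
        (contested if len(actions) > 1 else agreed).append(group)
    return agreed, contested


# Toy input: file contents as lists of lines, plus four findings.
files = {"client.py": ["import requests",
                       "resp = requests.get(url)",
                       "cache[key] = resp.json()"]}
findings = [
    Finding("security", "client.py", 2, "requests.get(url)", "add timeout"),
    Finding("performance", "client.py", 3, "cache[key]", "keep cache"),
    Finding("security", "client.py", 3, "cache[key]", "remove cache"),
    Finding("architecture", "client.py", 9, "retry(", "add backoff"),  # no such line
]
agreed, contested = synthesize(findings, files)
print(f"{len(agreed)} agreed, {len(contested)} contested")
```

In this toy run the finding on a nonexistent line is dropped, the timeout finding passes through, and the two conflicting cache recommendations are grouped so they can be resolved before anyone reads them.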

For example, the Zara security agent flags a missing HTTP timeout and suggests a fix. The developer replies that it is done, and Revue acknowledges the reply and resolves the thread. The review becomes a conversation Revue tracks across the whole lifecycle, not a one-shot comment.

This layer is what stays constant as models improve. You pick the model. The review workflow is what you are actually running.

The CI cost question

In CI, Revue runs as a step in GitHub Actions, GitLab CI, or Bitbucket Pipelines. Here the cost comes from your API wallet, and because you choose the model, you also choose the cost.

I tracked the same review workload across two models:

April, Sonnet 4.5, 13 days: $78.83 in API costs
May to June, DeepSeek, 22 days of equal or heavier work: $27 total
Heaviest day, a full implementation sprint: $4.30 on DeepSeek versus about $15 on Sonnet (about 70 percent cheaper)
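The arithmetic behind those figures, as a small sketch using only the numbers above. The two periods are different stretches of work, and the Sonnet heaviest-day figure is itself approximate, so treat the percentages as rough.

```python
def per_day(total_usd: float, days: int) -> float:
    """Average daily spend over a billing period."""
    return total_usd / days


def percent_cheaper(cheap: float, expensive: float) -> float:
    """How much cheaper `cheap` is than `expensive`, as a percentage."""
    return (1 - cheap / expensive) * 100


sonnet_daily = per_day(78.83, 13)     # April, Sonnet 4.5
deepseek_daily = per_day(27.00, 22)   # May to June, DeepSeek

print(f"Sonnet 4.5: ${sonnet_daily:.2f}/day")      # Sonnet 4.5: $6.06/day
print(f"DeepSeek:   ${deepseek_daily:.2f}/day")    # DeepSeek:   $1.23/day
print(f"Average day: {percent_cheaper(deepseek_daily, sonnet_daily):.0f}% cheaper")   # 80%
print(f"Heaviest day: {percent_cheaper(4.30, 15.00):.0f}% cheaper")                   # 71%
```

On a per-day average the gap is closer to 80 percent; the heaviest-day comparison gives the more conservative figure of about 70 percent.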

In CI, Revue runs on DeepSeek-V4-Pro through OpenRouter by default.

That saving comes from the model choice, not from anything clever in Revue's architecture. I tested Sonnet 4.5, Haiku 4.5, Qwen3 Coder, and DeepSeek-V4-Pro. DeepSeek won: it doesn't match Sonnet 4.5's precision on code review, but for automated CI review it was good enough, and the cost difference was decisive. I'm flagging this explicitly because technical readers deserve honesty.

Running /revue locally first compounds the saving. Issues fixed before the push never reach the pull request, so they never trigger a CI review. You skip the push, get-flagged, fix, push-again loop that bills you on every lap.

You bring your own key: OpenAI, Anthropic, Azure, or any OpenRouter model. The model is yours to pick, and your diff is the only thing that leaves your machine.

Who this is for

If you are doing AI-assisted development and your API bill is growing, the /revue local workflow is the most direct fix. You are already paying for the subscription.

If you run AI review in CI on GPT-4 or Sonnet-class models, Revue cuts the bill by letting you swap in a cheaper model. It also adds the workflow that a single model call doesn't provide: multi-agent synthesis, contradiction checks, a reply loop, and history. The model switch saves money. The workflow is why you keep it.

If you are on a team where AI made writing code the fast part and review the slow part, Revue is a companion reviewer, not a replacement for one. It takes the first pass, so your reviewers spend their time on the findings that need human judgment. A human stays in the loop: when someone marks a finding as a false positive, Revue records that pattern into your repo's review config through a small pull request. Merge it once and the whole team stops seeing it. That shared config becomes a knowledge base that grows with every correction, so the bottleneck shrinks instead of repeating.

Try it

Revue is live. Free tier: 25 reviews per month, no credit card required.

curl -fsSL https://raw.githubusercontent.com/Revue-sh/revue/main/scripts/install.sh | bash -s -- --key <your-key>

https://revue.sh

I've been running Revue since March and I'm genuinely curious how it fits into your workflow. If you give it a shot, email me at support@revue.sh with what you think.