Paulo Victor Leite Lima Gomes

Posted on Jun 2

AI code review is now a cloud workload

#github #githubcopilot #ai #productivity

Code review used to have a very human scaling problem.

The pull request was ready. The tests were green. Then it sat there waiting for someone with enough context, enough patience, and maybe enough coffee to read it properly.

AI code review changes that queue. GitHub Copilot can review pull requests automatically, and since June 1, those reviews consume GitHub AI Credits. For private repositories, they can also consume GitHub Actions minutes while Copilot prepares the environment and analyzes the code.

This is not surprising. Models cost money. Runners cost money.

But it does make the product category much clearer.

AI code review is not merely a helpful comment bot anymore. It is a cloud workload triggered by your development process.

And cloud workloads need budgets.

the review button is now an infrastructure decision

There is a familiar progression in developer tooling.

At first, a feature feels small and local. Someone enables it. A few people try it. The cost is invisible because the usage is limited and the invoice is mixed into something larger.

Then adoption grows.

Soon, the same feature runs across hundreds of repositories, thousands of pull requests, and multiple review passes per change. A product that felt like an editor convenience becomes an operating expense.

We already learned this with CI.

The first pipeline is cheap. Then every commit runs unit tests, integration tests, security scans, preview deployments, browser tests, and half a dozen checks nobody wants to remove because nobody remembers why they were added.

The pipeline is useful. It is also a bill.

AI review is entering the same phase.

GitHub's new billing model is explicit: code review consumes AI Credits, and private repositories use Actions minutes during the analysis. Organizations can configure a default runner and apply budgets at the user level.

Those controls sound boring.

They are also the part engineering leaders should care about.

review everything is not a strategy

It is tempting to enable AI review everywhere.

Why not? More review sounds better than less review. If the bot catches one security bug, one missing test, or one questionable API change, that single catch might pay for the entire month of reviews.

That is probably true in some teams.

It does not follow that every review is equally valuable.

A ten-line dependency bump and a large authentication refactor should not necessarily receive the same treatment. A generated documentation change may not need an AI reviewer. A risky database migration probably deserves more than one automated pass and a quick human approval.

Once reviews are metered, the policy questions become unavoidable:

Which repositories should request AI review by default?
Which pull requests should trigger it automatically?
Should generated changes receive the same review budget as human-written changes?
How many review rounds are useful before the comments become noise?
What runner size is actually necessary for each repository?
Which teams are spending more because their codebase is harder to analyze?

These are not finance-only questions.

They are architecture and engineering-management questions expressed through a bill.
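One way to make those questions concrete is to write the policy down as code. The sketch below is only an illustration: the `PullRequest` shape, the repository names, the path globs, the labels, and the 20-line threshold are all assumptions of mine, not anything GitHub provides. The point is that the answer to "should this pull request get an AI review?" can be an explicit, reviewable rule instead of a default nobody chose.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class PullRequest:
    """Hypothetical PR summary; field names are illustrative, not a GitHub API."""
    repo: str
    changed_lines: int
    paths: list[str]
    is_generated: bool = False
    labels: set[str] = field(default_factory=set)


# Assumed policy knobs; every value here is a placeholder to tune per codebase.
CRITICAL_REPOS = {"payments-api", "auth-service"}
SENSITIVE_GLOBS = ["*/auth/*", "*/migrations/*", "*.sql"]
SKIP_LABELS = {"dependencies", "docs-generated"}
SMALL_CHANGE_LINES = 20


def review_policy(pr: PullRequest) -> str:
    """Return 'auto', 'on_request', or 'skip' for AI review of this PR."""
    touches_sensitive = any(
        fnmatch(path, glob) for path in pr.paths for glob in SENSITIVE_GLOBS
    )
    if touches_sensitive:
        return "auto"  # risky paths are reviewed regardless of size or labels
    if pr.is_generated or pr.labels & SKIP_LABELS:
        return "skip"  # generated changes and routine bumps do not spend budget
    if pr.repo in CRITICAL_REPOS:
        return "auto"
    if pr.changed_lines <= SMALL_CHANGE_LINES:
        return "on_request"  # small changes are reviewed only if the author asks
    return "auto"
```

The ordering is the design choice here: the sensitive-path check runs first, so a dependency bump that happens to touch a migration still gets reviewed.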

cost attribution is a useful forcing function

I do not think metering is automatically bad.

Free-looking infrastructure has a way of hiding bad habits. A budget can force a team to decide what it actually values.

If one repository uses far more AI review credits and runner minutes than another, there may be a good reason. Maybe it is a critical service. Maybe the diffs are complex. Maybe the team is using the reviewer as an extra security layer.

Or maybe every tiny pull request is triggering an expensive workflow because nobody looked at the defaults.

The useful outcome is not minimizing the bill at all costs. The useful outcome is understanding what the bill represents.

CI minutes can tell you that your tests are too slow, too broad, or too flaky. Cloud spend can tell you that an architecture is chatty, over-provisioned, or poorly cached. AI review spend can reveal that your workflow is asking a model to examine a lot of low-value changes.
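To see whether low-value changes are eating the budget, a rough per-repository summary is enough. This is a minimal sketch under my own assumptions: the `ReviewRun` record is a made-up shape (not a GitHub export format), and "small change" is an arbitrary 20-line cutoff. It reports usage per repository and the share of credits spent on small changes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ReviewRun:
    """One AI review run; a hypothetical record, not a GitHub export format."""
    repo: str
    changed_lines: int
    credits: float         # AI credits consumed by this run
    runner_minutes: float  # Actions minutes consumed by this run


def summarize_by_repo(runs, small_change_lines=20):
    """Aggregate usage per repository and the credit share of small changes."""
    totals = defaultdict(lambda: {
        "runs": 0, "credits": 0.0, "runner_minutes": 0.0, "small_credits": 0.0,
    })
    for run in runs:
        t = totals[run.repo]
        t["runs"] += 1
        t["credits"] += run.credits
        t["runner_minutes"] += run.runner_minutes
        if run.changed_lines <= small_change_lines:
            t["small_credits"] += run.credits

    summary = {}
    for repo, t in totals.items():
        summary[repo] = {
            "runs": t["runs"],
            "credits": t["credits"],
            "runner_minutes": t["runner_minutes"],
            "credits_per_run": t["credits"] / t["runs"],
            # Guard against repositories whose runs consumed no credits.
            "small_change_share": (
                t["small_credits"] / t["credits"] if t["credits"] else 0.0
            ),
        }
    return summary
```

A high `small_change_share` is not automatically a problem, but it is the number I would look at first when a repository's spend seems out of proportion.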

Cost is architecture feedback.

the other bill is reviewer attention

There is a second cost that will not appear on the GitHub invoice.

Every AI review comment asks a human to spend attention.

Some comments will be useful. Some will be technically correct but irrelevant. Some will identify a real concern without understanding why the code looks strange. Some will confidently suggest a cleaner implementation that quietly breaks an ugly but important edge case.

That means the cheapest AI review is not always the best one.

A review that costs a few credits but creates ten minutes of distraction for two engineers is expensive in a different way. Multiply that across a large organization and the bigger problem may be signal quality, not model usage.

This is why "number of AI review comments" is a terrible success metric.

The metrics I would want are more practical:

How often does an AI comment lead to a code change?
Which categories of comments are consistently useful?
How often are comments dismissed as noise?
Does AI review reduce human review time or add another review queue?
Does it catch defects that tests and static analysis missed?
Are teams starting to rubber-stamp reviews because a bot already looked at the diff?

The goal is not more comments.

The goal is better changes reaching production with less wasted attention.
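The first three questions can be answered with very little tooling once comments are labeled. Here is a minimal sketch, assuming each comment has been tagged by hand with a category and an outcome; the `ReviewComment` shape and the outcome names are my own invention for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ReviewComment:
    """A hand-labeled AI review comment (hypothetical shape)."""
    category: str  # e.g. "security", "style", "tests"
    outcome: str   # assumed values: "code_changed", "dismissed", "ignored"


def outcome_rates(comments):
    """Per category: how often comments led to a change or were dismissed."""
    totals = Counter(c.category for c in comments)
    changed = Counter(c.category for c in comments if c.outcome == "code_changed")
    dismissed = Counter(c.category for c in comments if c.outcome == "dismissed")
    return {
        category: {
            "comments": n,
            "change_rate": changed[category] / n,
            "dismiss_rate": dismissed[category] / n,
        }
        for category, n in totals.items()
    }
```

A category with a low change rate and a high dismiss rate is a candidate for tuning or switching off. The last three questions in the list need human observation; no counter will tell you whether people have started rubber-stamping.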

keep humans in the uncomfortable parts

AI review is good at broad, patient inspection.

It does not get bored reading the fifth similar file. It can notice missing error handling, suspicious patterns, inconsistent tests, and changes that deserve a second look. It can give a pull request author feedback before a human reviewer arrives.

That is useful.

But the highest-value review questions are usually uncomfortable and contextual:

Is this the right change for the system?
Does the abstraction make future incidents easier or harder to debug?
Are we preserving an API contract that is not documented anywhere?
Is the migration plan realistic under production load?
Is this complexity necessary, or did we create it because the generated code looked plausible?

Those questions need judgment.

The AI reviewer can help surface evidence. It should not become a reason to skip the conversation.

what i would do this week

If your organization uses Copilot code review, I would treat the June 1 billing change as a good excuse to inspect the workflow before the invoice does it for you.

Start with visibility. Check which repositories use automatic reviews, which runners they use, and how frequently reviews run. Put a budget in place, even if it is generous. A budget is an alerting mechanism before it is a restriction.
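"A budget is an alerting mechanism" can be taken literally. The sketch below assumes you can get a month-to-date spend figure from somewhere (it does not call any GitHub API), and the 50% and 80% thresholds are arbitrary placeholders. It projects the month-end total from the current run rate and classifies it against the budget.

```python
import calendar
from datetime import date


def projected_month_end(spent, today):
    """Linear projection of month-end spend from month-to-date spend."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spent / today.day * days_in_month


def budget_status(amount, budget, warn_at=0.5, alert_at=0.8):
    """Classify an amount against a budget; thresholds are assumed defaults."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    used = amount / budget
    if used >= 1.0:
        return "over"
    if used >= alert_at:
        return "alert"
    if used >= warn_at:
        return "warn"
    return "ok"


# Example: 100 units spent by June 10 projects to 300 for a 30-day month.
projection = projected_month_end(100, date(2025, 6, 10))
status = budget_status(projection, budget=400)  # 300 / 400 = 0.75, so "warn"
```

Checking the projection rather than the raw spend is the useful part: it tells you early in the month where you are heading, while there is still time to change a default.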

Then sample the output.

Take a few weeks of review comments and classify them. Useful defect catch. Helpful suggestion. Style preference. Duplicate of an existing check. Wrong. Ignored.

You do not need a complicated dashboard on day one. A spreadsheet and an honest conversation will tell you more than a vanity metric.

Finally, decide where AI review belongs in the pipeline.

Maybe it runs automatically on critical services. Maybe authors request it before asking for human review. Maybe small dependency bumps skip it. Maybe security-sensitive changes get a more deliberate policy.

The correct answer will depend on the codebase.

That is the point.

the punchline

AI code review is becoming infrastructure.

It consumes model credits. It can consume runner minutes. It generates operational data. It needs defaults, budgets, and a reason to exist in each workflow.

That does not make it less useful.

It makes the engineering conversation more honest.

"Let the AI review everything" sounds like a productivity strategy until every pull request spends compute and every comment spends attention.

The teams that get value from AI review will not be the teams with the most automated comments.

They will be the teams that know which reviews are worth paying for.


DEV Community