Umesh Malik

Posted on Mar 10 • Originally published at umesh-malik.com

Anthropic Code Review for Claude Code: Multi-Agent PR Reviews, Pricing, Setup, and Limits

#anthropic #claudecode #codereview #aiagents

Anthropic launched Code Review for Claude Code on March 9, 2026, and the short answer is simple: this is a managed pull-request reviewer that runs multiple Claude agents in parallel, verifies their findings, and posts ranked review comments back into GitHub.

That sounds incremental until you look at the actual problem it is trying to solve. Modern teams are no longer bottlenecked only by code generation. They are bottlenecked by review quality. AI can now produce diffs faster than most teams can evaluate them, and classic review tooling still mostly catches syntax, style, and narrow static patterns. Anthropic is betting that the next productivity jump comes from moving code review up from rule enforcement to repository-aware reasoning.

If you searched for Anthropic Code Review, Claude Code review pricing, or how Claude Code code review works, this is the practical breakdown: what is confirmed, what it costs, how to configure it, and where it fits in a real engineering workflow.

TL;DR

Anthropic launched Code Review on March 9, 2026 as a new Claude Code capability for automated pull-request review.
Anthropic says the system runs multiple specialized agents in parallel, then verifies and ranks their findings before posting comments.
The core pitch is logic-aware review, not style policing. Anthropic says the system can reason over changed files, adjacent code, and similar past bugs in the repository.
In Anthropic's internal data, 54% of pull requests now receive substantive comments, up from 16% with older approaches.
Anthropic says engineers marked less than 1% of findings as incorrect, which is unusually low for automated review tooling.
As of March 10, 2026, Code Review is in research preview for Claude Team and Claude Enterprise customers.
Anthropic documents a typical cost of $15 to $25 per review and typical completion time of about 20 minutes.
Teams can customize the reviewer with REVIEW.md for review criteria and CLAUDE.md for project context.
Anthropic says Code Review is not available for organizations with Zero Data Retention enabled.
If you need a self-hosted path or are outside this managed GitHub flow, Anthropic points teams to GitHub Actions or GitLab CI/CD integrations instead.

What Anthropic Code Review Actually Is

The cleanest description is this:

Anthropic Code Review is a managed GitHub pull-request reviewer inside Claude Code that uses several Claude agents to inspect a PR from different angles, validate the findings, and surface the highest-value comments.

That last part matters. Plenty of review bots can already leave comments. What Anthropic is trying to do differently is move beyond isolated line comments and reason about:

whether a change breaks assumptions in another file
whether a new parameter or state path is handled everywhere it needs to be
whether a fix silently introduces a downstream regression
whether the diff violates team-specific review rules that are too nuanced for ESLint or a static policy engine

Anthropic's launch post gives a concrete example: a change added a new parameter in one file, but the corresponding state and logic were not updated elsewhere. The system flagged the bug in the untouched adjacent code path. That is the category that makes this interesting.

Why this matters
Anthropic is explicitly positioning Code Review as something that can catch bugs static analyzers often miss. That does not make static analysis obsolete. It means the product is aimed at a different layer of failure: cross-file reasoning, intent drift, and repository-specific logic bugs.

How Anthropic Code Review Works

The review lifecycle is more important than the headline. Once you understand the flow, you can see exactly where this helps and where it does not.

Why This Is More Than Another Linter

Most existing automation helps in one of two ways:

it enforces deterministic rules very cheaply
it blocks clearly bad patterns before humans ever look at the code

That is useful, but it is not the same as reasoning through intent. Anthropic's bet is that AI-generated diffs create too many review situations where the failure is not "bad syntax" but "locally plausible code that breaks a larger system assumption."

The obvious tradeoff is that Anthropic's approach is slower and more expensive than static tooling. But that is the wrong comparison if the real alternative is a human reviewer missing a subtle cross-file bug in a large AI-generated diff.

Pricing, Availability, and Setup

As of March 10, 2026, Anthropic documents the following:

Availability: research preview for Claude Team and Claude Enterprise
Cost: usually $15 to $25 per review
Speed: usually around 20 minutes
Setup path: admin installs the Anthropic GitHub app, connects repositories, and enables review on the branches you want covered

How To Configure Custom Checks Without Turning It Into Noise

The most important operational detail in the docs is not the launch metric. It is the customization model.

Anthropic exposes two simple files:

REVIEW.md for pull-request review instructions
CLAUDE.md for broader repository context, architecture, and project conventions

That is the right separation. CLAUDE.md tells the agents how your system is shaped. REVIEW.md tells them what to care about during review.

Example REVIEW.md:

# REVIEW.md

Prioritize comments about:
- authorization regressions across admin and customer paths
- idempotency in webhook handlers
- missing transaction boundaries on billing writes
- async jobs that can double-send emails, refunds, or notifications

Deprioritize:
- formatting and import order
- naming-only comments without runtime risk
- style nits already covered by linting

Example CLAUDE.md:

# CLAUDE.md

Architecture notes:
- packages/auth owns all role and permission checks
- apps/api is the only service allowed to mutate billing state
- apps/worker replays webhook events and must remain idempotent
- do not write directly to Subscription rows outside BillingService

This is where teams can get real leverage. If you do not encode your business invariants, the model falls back to generic review behavior. If you encode too much low-value policy, you recreate the comment spam problem you were trying to avoid.

Where Anthropic Code Review Fits Best

The ideal use case is not every repository on day one.

It is strongest when:

pull requests are large, AI-assisted, or cross-cutting
human reviewers routinely miss multi-file regressions
your team has real architectural invariants that are hard to encode in static rules
you are willing to pay for review quality, not just for code generation speed

It is weaker when:

you need ultra-fast deterministic gating in seconds
your organization requires Zero Data Retention today
your diffs are small and most review comments are already stylistic
you expect the tool to replace code owners, tests, or threat modeling

There is a broader product thesis here too: Anthropic is clearly trying to own more of the full coding loop, not just code generation. That makes sense. If models keep writing more code, the value shifts toward tools that can verify, criticize, and constrain that code before it reaches production.

Anthropic is also expanding the security side of that workflow with Claude Code Security, which makes this launch look less like a one-off bot feature and more like the start of a layered AI review stack.

FAQ

Final Take

Anthropic Code Review is not interesting because it leaves AI comments on a PR. Plenty of tools can do that. It is interesting because Anthropic is aiming at a harder problem: can an AI reviewer reason across a real codebase well enough to catch bugs that deterministic tooling and rushed humans both miss?

The early signals are strong enough to take seriously. The internal comment-rate jump from 16% to 54%, the claimed sub-1% incorrect rate, and the docs around REVIEW.md and CLAUDE.md all suggest this is a real attempt to make review agentic rather than cosmetic.

But the tradeoffs are equally real: this is a managed service, it is not compatible with Zero Data Retention, it costs real money per review, and it takes real time to run.

So the right framing is not "Will Anthropic replace code review?" The right framing is: for high-risk PRs, does paying for a slower, reasoning-heavy AI reviewer catch enough bugs to justify the latency and cost?

For teams already generating code with AI, that is exactly the next question that matters.

Sources

Originally published at umesh-malik.com

Top comments (2)

MergeShield • Mar 10

Thanks for the detailed breakdown. A few things stood out to me:

The $15-25 per review price point means this is Enterprise-only in practice. If you're a 10-person team doing 30 PRs/day, that's $450-750/day or roughly $10K-16K/month just for AI review. For context, CodeRabbit is $24-30/user/month regardless of PR volume. The pricing tells you a lot about who Anthropic is building for.

The more interesting question is what happens after the review. Code Review tells you about bugs and logic errors. But it doesn't make the merge decision. It doesn't track which AI agent authored the PR. It doesn't know if your team policy says "PRs touching the payments directory always need senior review" or "Copilot-generated PRs under risk score 25 can auto-merge." That governance layer is still entirely manual for most teams.

I'd love to see Anthropic publish data on false positive rates broken down by language and PR type. The <1% "marked incorrect" stat is impressive but I'd want to see that replicated outside Anthropic's own codebase.

Some comments may only be visible to logged-in visitors. Sign in to view all comments.