DEV Community

Aamer Mihaysi

The PR Review Bottleneck Is the Real Crisis in AI-Assisted Coding

The PR review bottleneck is the real crisis in AI-assisted coding. Everyone's measuring tokens generated, lines written, files touched. Shopify's CTO Mikhail Parakhin dropped the actual insight: his team spends more on critique than generation, and the ratio matters more than the total.

This flips the narrative entirely. We've been optimizing for speed of writing when we should have been optimizing for speed of merging. The constraint moved upstream.

Here's what happens when AI writes code at machine speed. Test failures spike. Deployment rollbacks multiply. The probability that something breaks approaches one fast. Parakhin's team saw this clearly: "good model writes code on average with fewer bugs than the average human. But since they write so much more of it, more of it will make it into production."

Volume becomes the enemy. Not because the code is worse, but because there's more surface area for failure. Your CI pipeline wasn't built for this throughput. Your staging environments choke. Your integration tests time out.

Shopify's response was to build their own PR review system. Not because existing tools are broken, but because they're measuring the wrong thing. Most review tools optimize for quick turnaround. They use fast models, surface-level checks, parallel agents throwing comments at the wall. Parakhin went the opposite direction: expensive models, serial critique loops, deliberate latency. One agent writes. Another critiques with a different model. They debate. The quality jumps.
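The serial critique loop can be sketched in a few lines. This is an illustrative stub, not Shopify's actual system: `draft_patch` and `critique_patch` are hypothetical stand-ins for calls to two different models, and the loop just shows the shape of the debate.

```python
# Minimal sketch of a serial write-critique loop. Both functions are
# hypothetical stand-ins for expensive model calls, stubbed so the
# control flow is runnable.

def draft_patch(task: str, feedback: list[str]) -> str:
    """Writer agent: produce (or revise) a patch given accumulated critique."""
    return f"patch for {task!r} (revision {len(feedback)})"

def critique_patch(patch: str) -> list[str]:
    """Critic agent (a different model): return objections; empty = approved.
    Stubbed to approve anything past the first revision."""
    return [] if "revision 1" in patch else ["tighten error handling"]

def review_loop(task: str, max_rounds: int = 4) -> tuple[str, int]:
    """Writer and critic alternate serially until the critic has no
    objections or the round budget runs out. Each round is a full,
    deliberate pass -- the latency is the point, not a bug."""
    feedback: list[str] = []
    patch = ""
    for round_no in range(1, max_rounds + 1):
        patch = draft_patch(task, feedback)
        objections = critique_patch(patch)
        if not objections:
            return patch, round_no
        feedback.extend(objections)
    return patch, max_rounds

patch, rounds = review_loop("fix retry logic")
```

The key design choice is that the critic's objections feed back into the next draft, so quality compounds across rounds instead of comments landing in parallel and getting cherry-picked.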

The latency is the feature. An hour of model debate still beats a week of human back-and-forth. More importantly, it beats the alternative: code that passes review but breaks production.

This maps to a pattern I'm seeing across teams that actually ship AI-assisted code at scale. They're rebuilding their review infrastructure from the ground up. Not adding AI to existing human workflows. Designing for a world where the author is fast, fallible, and prolific.

The implications run deep. Git, pull requests, CI/CD—these metaphors were built for human speed. A human writes a few hundred lines, another human reviews it, tests run, it merges. At machine speed, the global mutex of the merge queue becomes the bottleneck. The queue backs up. Conflicts explode. Integration hell returns.
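Why the queue backs up is just arithmetic. A back-of-the-envelope sketch, with purely illustrative numbers (not data from any team): a serial merge queue that absorbs a human-speed PR rate drowns at a machine-speed one.

```python
# Illustrative arithmetic for a serial merge queue. All numbers are
# assumptions for the sketch, not measurements.

def queue_backlog(prs_per_hour: float, minutes_per_merge: float, hours: float) -> float:
    """PRs that arrive minus PRs a serial queue can merge; zero means it keeps up."""
    merged = hours * 60 / minutes_per_merge
    arrived = hours * prs_per_hour
    return max(arrived - merged, 0.0)

# Human speed: 2 PRs/hour against a 10-minute merge slot (CI + rebase).
human_backlog = queue_backlog(prs_per_hour=2, minutes_per_merge=10, hours=8)

# Machine speed: 20 PRs/hour against the same 10-minute slot.
ai_backlog = queue_backlog(prs_per_hour=20, minutes_per_merge=10, hours=8)
```

At 2 PRs/hour the queue stays empty; at 20 PRs/hour the same pipeline ends an eight-hour day over a hundred PRs behind, and every queued PR is another rebase and another conflict.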

Some teams are experimenting with stacked diffs and merge queues. Others are splitting services further—microservices making a comeback not because they're elegant, but because they let teams ship independently. When your AI can generate a service in an afternoon, the coordination cost of a monolith becomes unbearable.

The deeper insight is about feedback loops. Fast generation without fast validation is just fast accumulation of technical debt. The teams winning with AI coding aren't the ones with the highest token budgets. They're the ones with the tightest loops between write, review, test, and deploy.

This is why tokenmaxxing metrics miss the point. Jensen Huang's directionally correct—engineers should use more AI—but the raw count is noise. What matters is the ratio of generation tokens to verification tokens. If you're not spending heavily on critique, testing, and validation, you're not using AI effectively. You're just accelerating your bug production.
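The ratio is simple to track. A hedged sketch of what such a metric could look like — the bucket names and the example numbers are illustrative assumptions, not a published Shopify metric:

```python
# Sketch of a generation-vs-verification token ratio. The bucket names
# ("generate", "critique", "test", "validate") are hypothetical labels
# for where token spend goes.

def verification_ratio(tokens: dict[str, int]) -> float:
    """Verification tokens spent per generation token. Higher means more
    of the budget goes to catching bugs than to producing code."""
    verify = (tokens.get("critique", 0)
              + tokens.get("test", 0)
              + tokens.get("validate", 0))
    generate = tokens.get("generate", 0)
    return verify / generate if generate else float("inf")

# Illustrative spend where critique alone exceeds generation,
# the shape Parakhin describes.
spend = {"generate": 400_000, "critique": 500_000,
         "test": 200_000, "validate": 100_000}
ratio = verification_ratio(spend)  # 2.0
```

A raw token count would score this team the same as one spending the whole 1.2M tokens on generation; the ratio is what separates verification-heavy teams from bug accelerators.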

Shopify tracks this explicitly. They fund unlimited tokens but discourage anything below Opus 4.6 for review. They want the expensive reasoning, the extended thinking, the models that cost more per token because they catch more per token.

The future of AI coding isn't faster autocomplete. It's infrastructure that absorbs machine-speed output without breaking. Review systems that scale with generation. Deployment pipelines that don't choke on volume. Metrics that measure quality of merge, not quantity of code.

The bottleneck was never typing speed. It was always the organizational friction between idea and production. AI removed the typing constraint. Now we have to remove everything else.
