My hot take of the day is that PR reviews are dead.
AI generates far more code than we're able to read or understand. Oftentimes, when tasked with a PR review, I'll make a comment about a better pattern or a semantic ambiguity, only to realize that my comment helps a human working with the code, not a machine. In other instances, I'll find myself saying "add more logging" or "send more breadcrumbs to Sentry." But even this is easily fixed with better prompting and agents that do second passes over a codebase.
X abounds with developers, including the creator of Claude Code, claiming that they ship upwards of 200 PRs a week. I guarantee that these were not backed by 200 high-quality human code reviews. And it's not like these are toy projects - Claude Code has tens of millions of users and USD 1B+ in ARR.
In the face of this onslaught, I've seen two competing tendencies:
- Claim that code review is more important than ever, chastise people for not doing it, and position the reviewer as the last rampart against slop, bugs, and drift.
- Give up entirely and replace it with something else.
I vote for #2.
Code review makes about as much sense as reviewing the assembly generated by a C compiler. In the early days of C compilers, some people probably needed to do this, and even to rewrite the output or hand-tune inline assembly. But as the tooling got better, only compiler developers spent time on this class of problems. Now, compiler optimizations exist mostly at the margins: the biggest battles were fought and won long ago.
An LLM is a compiler for the mind, and code is its assembly. So our tooling should lint and add a modicum of safety to our thinking, not to the code. We want brain review, not code review.
IMO, we can get there in three ways.
One, we need to make it easier than ever to test apps. A developer who's piloting a coding agent wants to achieve an outcome, and the thing we want to review is that outcome.
Going back to the compiler analogy, compilers really became powerful once they were able to produce viable executables quickly. In the bad old days, people had to wait hours for an executable to be produced, and even then, its stability couldn't be taken for granted. Once the production of an executable became fast and cheap, messing around with it became trivially easy, and whole new categories of tooling emerged, from valgrind to AFL.
We want to be able to create test environments quickly. We want to seed these environments with test data and make them observable and controllable by MCP agents. For example, if I want to test a new feature that gates publication of marketing campaigns for users who have hit their quota, I need to be able to:
- Spin up the env.
- Seed the database.
- Fast-forward the app to the point right before a user with a given profile hits the quota.
None of these are trivial - some require fudging in-app logic to skip waits that would otherwise take 10 or 15 minutes. The important thing here is to make it trivially easy to coax applications into these states - not by adding heavy testing infra and test-only switches throughout the code, but by making the app instantly deployable and accessible to agents that can bring it where you want it to be in under 5 minutes.
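To make that concrete, here's a minimal sketch of the flow, assuming a containerized app with an in-app seed task and a public campaigns API. Every command, endpoint, and fixture name below is hypothetical - it shows the shape of the harness, not a real Autodock (or any other tool's) interface:

```python
"""Hypothetical preview-env harness: spin up, seed, fast-forward."""
import subprocess

import requests

BASE_URL = "http://localhost:8080"  # assumed address of the preview env


def spin_up() -> None:
    # 1. Spin up the env: throwaway containers, healthy before we proceed.
    subprocess.run(["docker", "compose", "up", "-d", "--wait"], check=True)


def seed() -> None:
    # 2. Seed the database via a hypothetical in-app CLI task
    #    (a migration or fixture runner works just as well).
    subprocess.run(
        ["docker", "compose", "exec", "app", "manage", "seed", "quota_users"],
        check=True,
    )


def fast_forward(user_id: str, quota: int) -> None:
    # 3. Fast-forward: drive the app's real API until the user is one
    #    campaign away from the quota, instead of waiting for organic use.
    for _ in range(quota - 1):
        resp = requests.post(f"{BASE_URL}/api/campaigns", json={"user_id": user_id})
        resp.raise_for_status()


if __name__ == "__main__":
    spin_up()
    seed()
    fast_forward(user_id="u_123", quota=5)
    print("Ready: the next campaign published by u_123 should hit the quota gate.")
```

The point is that an agent (or a human) can run this end to end in minutes and then review the outcome, not the diff.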
Shameless self-plug: I'm building and using Autodock to do exactly this.
Two, we need to fully embrace spec-driven development. We're just at the beginning of that journey. Version 0 of specs, popularized by spec-kit, meant giant markdown documents that people half-read. Now, we're asking agents to ask us more and more questions, to interview us, and to challenge us. In other words, we're building systems that force us out of complacency and make us opinionated where we need to be. Eventually, I think all specs will be the results of discussions - we will no longer read them because they'll be artifacts of a conversation.
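A toy sketch of what that interview loop could look like - `ask_llm` is a stand-in for whatever model client you use, and none of this is a real spec-kit API:

```python
"""Hypothetical spec interview loop: the agent asks until nothing is open."""

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug your model client in here")

def interview(feature_request: str, max_rounds: int = 5) -> str:
    spec = f"# Spec\n\nGoal: {feature_request}\n"
    for _ in range(max_rounds):
        question = ask_llm(
            "You are drafting a spec. Given the draft below, ask the single "
            "most important unresolved question, or reply DONE.\n\n" + spec
        )
        if question.strip() == "DONE":
            break
        # The human answers: this is where we're forced to be opinionated.
        answer = input(f"{question}\n> ")
        spec += f"\n- Q: {question}\n  A: {answer}"
    return spec  # an artifact of the conversation, not a doc to half-read
```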
Three, we need to embrace background agents that clean up after us. It's true that Claude Code often misses the mark on context because the application is not organized in a way that's easy to navigate. But if Claude is tasked with refactoring into smaller files or enforcing cleaner patterns, it will often be able to one-shot these tasks, especially if there is a robust test suite. For example, I regularly use Claude Code as a background task to make sure API contracts are rigorously enforced through validators.
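For a flavor of what "enforced through validators" means, here's a sketch of the kind of contract check a background agent can propagate to every API boundary - the model and field names are made up for illustration:

```python
"""Hypothetical API contract validator, of the sort a background agent enforces."""
from pydantic import BaseModel, Field, field_validator

class PublishCampaignResponse(BaseModel):
    campaign_id: str
    quota_remaining: int = Field(ge=0)  # the contract: never negative

    @field_validator("campaign_id")
    @classmethod
    def non_empty_id(cls, value: str) -> str:
        if not value:
            raise ValueError("campaign_id must be non-empty")
        return value

def handle_publish(raw: dict) -> PublishCampaignResponse:
    # Validate at the boundary so contract drift fails loudly in the
    # test suite instead of silently in production.
    return PublishCampaignResponse.model_validate(raw)
```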
By doing these three things:
- effortless previews
- spec-driven development
- background agents
we will have obviated the need for code review. If I were an engineering manager, I would double down here and banish code review entirely, save for automated passes from CodeRabbit or Copilot. Code review is squeegeeing the ocean, whereas the three practices above are the code equivalent of terraforming Mars. Done well, they allow for bold and ambitious projects that are more correct, robust, and resilient than their code-reviewed ancestors of yore.