Sagiv ben giat

Posted on Mar 12 • Originally published at debuggr.io

The New Bottleneck - When AI Writes Code Faster Than Humans Can Review It

#ai #codereview #productivity #softwareengineering

The 10x Productivity Paradox

We're living through an incredible shift in software development. AI agents have made it absurdly easy to write code: close tech-debt gaps, create multiple POC implementations, kill bugs, dramatically increase test coverage, and ship new features at lightning speed. The promise of a 10x velocity increase is real, and it's here.

But here's the uncomfortable truth: the bottleneck has shifted.

In the traditional software delivery cycle, writing code was often the slowest phase. Now? It's the fastest. The new bottleneck is code review. Humans simply can't review the enormous volume of code that AI agents can generate.

The "Keep PRs Small" Band-Aid

The obvious solution seems simple: stick to the good old standard of "keep pull requests small." Set limits on lines of code or number of files. Problem solved, right?

Not quite.

Sure, we can enforce small PRs and technically 'solve' the review bottleneck. But let's be honest - we're not solving the problem, we're ignoring innovation and pushing back on evolution.

If we can achieve a 10x increase in code generation, why not strive for a 10x increase in code review capacity?

If we keep delivering at the same pace while the world evolves around us, our competitors won't wait - and neither will our customers.

The Obvious Suspect: AI Reviewers

So what's the solution? Well, the obvious suspect is the AI agent itself, right? If an AI agent helped us produce 10x more code, it should help us review it at the same rate and volume.

But then comes that feeling. That awkward, uncomfortable feeling.

This code isn't really mine. I didn't write it. I barely reviewed it. How can I have confidence shipping it to production?

A Strange Thought: What If AI Code Is Just Another Dependency?

Lately, a strange thought has been popping into my mind: What if AI agents are dependencies?

Think about it. What if AI-generated code is just more third-party code? I'm positive that most of us don't review the third-party dependencies we install and use. And most of the code we ship to production isn't ours. It comes from third-party libraries, which have their own dependencies, which have dependencies of their own, and so on. You know that tree.

But here's the part we rarely stop to think about: the trust is recursive. The author of that library you trust? They also trust third-party code they didn't review. And the authors of those nested dependencies trust their own dependencies. Every node in that tree is trusting code that someone else wrote, someone else reviewed - or maybe no one reviewed at all. And we ship all of it to production.

So what's the difference between that and AI-generated code? In both cases, someone - or something - we don't know wrote the code, someone we don't know reviewed it, and in the case of open-source, someone we don't know even starred it on GitHub and we base our trust on that.

How We Make Peace With Third-Party Code

How do we reconcile our minds with third-party code integrated into our codebase? We actually do a lot more than we realize.

We vet before we adopt. We check how active the project is - when was the last commit? How fast do maintainers respond to issues? We look at community signals - stars, downloads, who else is using it. We check the license. We read the docs. We might even skim the source code of the critical parts.

We test. Unit tests, integration tests, end-to-end tests, manual QA. We write contract tests that verify the dependency behaves the way we expect it to.

We secure our supply chain. We run vulnerability scanners and dependency audits. We pin versions and use lock files so nothing changes underneath us without our knowledge. We monitor for known vulnerabilities in our dependency tree.

We set boundaries. We wrap third-party code behind interfaces and abstractions. We define contracts - what goes in, what comes out. We make sure we can swap a dependency out without rewriting half the codebase.

We monitor in production. We have observability - logs, metrics, alerts. We use staged rollouts and feature flags. We can roll back fast if something breaks.
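To make the "boundaries" and "contract tests" ideas concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: `SlugGenerator`, `GeneratedSlugGenerator`, and `check_slug_contract` are hypothetical names, and the slug rules are an assumed contract, not something from a real library. The point is only that the check asserts behavior, so it doesn't care who (or what) wrote the implementation behind the interface.

```python
from typing import Protocol


class SlugGenerator(Protocol):
    """The boundary: what goes in, what comes out."""

    def slugify(self, title: str) -> str: ...


class GeneratedSlugGenerator:
    """Stand-in for code we did not write (a library, or an AI agent's output)."""

    def slugify(self, title: str) -> str:
        # Replace every non-alphanumeric character with a space, then join the words.
        cleaned = "".join(c if c.isalnum() else " " for c in title.lower())
        return "-".join(cleaned.split())


def check_slug_contract(impl: SlugGenerator) -> None:
    """Contract test: asserts behavior only, so any implementation can be swapped in."""
    assert impl.slugify("Hello, World!") == "hello-world"
    assert impl.slugify("  spaces   everywhere ") == "spaces-everywhere"
    assert impl.slugify("") == ""
    slug = impl.slugify("Already-a-slug")
    assert impl.slugify(slug) == slug  # slugifying a slug changes nothing


# The same check runs against whatever sits behind the interface.
check_slug_contract(GeneratedSlugGenerator())
```

If the implementation is ever replaced or regenerated, the rest of the codebase only depends on `SlugGenerator`, and `check_slug_contract` tells us whether the replacement still honors the contract.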

What if AI-generated code is just another third-party dependency? What if we need the same trust mechanisms - vetting, testing, boundaries, security scanning, and monitoring?

The Rabbit Hole Deepens

The longer I think about this idea, the harder it is for me to argue against it. My mind starts drifting to more questions - and the uncomfortable part is that none of them feel unanswerable.

Should there be a way to distinguish AI-generated code from human-written code in the codebase? How do I mark it, track it, set boundaries around it?
Do I need to see and read the code myself, or is it enough to define clear review rules and let AI agents review it for me?
How many AI agent reviewers do I need before I feel confident? Is one enough? Three?
If multiple AI agents review the same code and agree, is that more trustworthy than a single human reviewer who skimmed it?
Should we have different AI agents with different review focuses - one for security, one for performance, one for correctness?
What's the equivalent of "2 approvals required" in an AI-review world?
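One possible answer to the "2 approvals required" question, sketched in Python. This is a thought experiment, not a real tool: the `Verdict` type, the reviewer names, and the rule itself (every required focus approved, nothing required rejected, a minimum number of distinct approvers) are all assumptions I picked for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    reviewer: str   # hypothetical reviewer name, e.g. "security-agent"
    focus: str      # what this reviewer looked at: "security", "performance", ...
    approved: bool


def is_mergeable(verdicts: list[Verdict], required_focuses: set[str], min_approvals: int = 2) -> bool:
    """Mergeable only if no required focus was rejected, every required focus
    has at least one approval, and enough distinct reviewers approved."""
    rejected = {v.focus for v in verdicts if not v.approved}
    if rejected & required_focuses:
        return False
    approvals = [v for v in verdicts if v.approved]
    covered = {v.focus for v in approvals}
    distinct_reviewers = {v.reviewer for v in approvals}
    return required_focuses <= covered and len(distinct_reviewers) >= min_approvals


# Hypothetical review round: three agents, each with a different focus.
verdicts = [
    Verdict("security-agent", "security", True),
    Verdict("perf-agent", "performance", True),
    Verdict("correctness-agent", "correctness", True),
]
REQUIRED = {"security", "correctness"}
mergeable = is_mergeable(verdicts, REQUIRED)
```

Counting distinct reviewers rather than raw approvals is a deliberate choice here: one agent approving twice under different hats shouldn't count as two opinions.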

I don't have all the answers yet. But the thing that keeps pulling me deeper into this rabbit hole is that these questions feel like they have answers. They feel like problems we can solve - with tooling, with process, with experimentation. And if that's the case, if these concerns can be addressed with real solutions, then are we actually looking at a legitimate shift in how we think about code ownership and review?

The Uncomfortable Questions

Here's what keeps me up at night:

Trust without understanding: We trust libraries with millions of downloads that we've never read. Why is AI-generated code different? Is it because it's generated specifically for our codebase, making us feel more responsible for it?

The illusion of control: When we install a dependency, we accept that we don't control it. When AI generates code in our repository, we feel like we should control it. But should we?

And yet, the more I sit with these questions, the more I keep arriving at the same place:

Ownership doesn't mean authorship. Here's the reality - we own the code we ship to production, including third-party libraries. When there's a bug, the customer doesn't care if the faulty code is a third-party library or code written by me. Nor should I care about the distinction when fixing it. I should fix the bug - either by open-source contribution or in "user-land" - and make sure I catch it next time before the customer does. The same goes for AI-generated code. I didn't write it, I might not have reviewed every line, but if I use it, it's mine. Ownership is about responsibility, not authorship.

What This Means for the Future

If we accept this mental model - that AI-generated code is a dependency - it could change how we think about a lot of things. None of this is a recipe. It's a direction worth exploring.

Code review doesn't disappear - it evolves. Just as AI became our multiplier for writing code, it could become our multiplier for reviewing it. We'd still review. But the nature of human review might shift from line-by-line inspection to higher-level concerns, while AI agents handle the detailed implementation review.

Human review could shift toward contract review: Instead of reviewing every line ourselves, we'd focus on the contract: What should this code do? What are the edge cases? What are the performance requirements? AI reviewers could handle the implementation details - style, correctness, edge cases in the code itself.

Trust mechanisms could add layers of confidence: Tests, security scanning, pinned versions, and observability wouldn't replace review - they'd reinforce it. Together with AI-assisted review, they could form a safety net that's broader than any single human reviewer could provide.

Documentation might matter more than implementation: Understanding what the code does could become more important than understanding how it does it. Just like we read API docs for libraries instead of their source code.

Monitoring could close the loop: Instead of assuming correctness at merge time, we'd verify it continuously in production. Observability, alerts, and fast rollbacks would become first-class citizens in the delivery process - not afterthoughts.

If you take this idea far enough, it hints at something even more fundamental - we might be moving to a higher layer of abstraction altogether. English becomes the programming language. We write declarative specifications (the "what") instead of imperative code (the "how"). "Create a user authentication system with JWT tokens and rate limiting" becomes the new code. The AI figures out the implementation. This would require new forms of verification - tests that validate behavior rather than implementation, security scanning that runs automatically, monitoring that confirms correctness in production, and pipelines that validate the "what" was correctly translated into the "how."

The Next Bottleneck

Here's another brain teaser: what happens when code generation and code review both run at 10x volume?

Do we have enough product requests to keep us busy? Do we have enough problems to solve? Does the bottleneck shift to product discovery and requirement gathering?

But that's a topic for another post.

Where Do We Go From Here?

I don't have all the answers. This is a thought experiment, not a playbook. But if this idea resonates, here are some directions worth exploring:

What does trust infrastructure look like? Tests, security scanning, monitoring - the full chain, not just one link. If these are going to support our quality gate, how comprehensive, fast, and reliable do they need to be?
How should code review processes change? Maybe we need different review levels - contract review vs. implementation review. Maybe AI-generated code gets a different review process than human-written code. What would that look like for your team?
Where do boundaries make sense? Wrapping AI-generated code behind clear interfaces and abstractions could keep the blast radius small if you need to replace or rewrite it later.
What can we learn from AI reviewing AI? What happens when multiple AI agents review each other's code? What do they catch that humans miss - and what do they miss that humans catch?
What should we measure? How often does AI-generated code cause production issues compared to human-written code? We won't know unless we track it. Let data inform the conversation.
Is the discomfort useful? This feeling of unease might be healthy. It keeps us questioning and improving our processes rather than blindly adopting a new paradigm.
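On the "what should we measure" question, the arithmetic itself is simple once changes are tagged by origin. Here is a rough Python sketch, assuming each shipped change carries an origin label ("ai" or "human") and that production incidents can be linked back to a change ID. Both assumptions are the hard part in practice; the function name and the sample data below are invented for illustration.

```python
from collections import Counter


def incident_rate_by_origin(
    changes: list[tuple[str, str]],
    incident_change_ids: set[str],
) -> dict[str, float]:
    """For each origin, the fraction of shipped changes later linked to an incident.

    `changes` holds (change_id, origin) pairs; `incident_change_ids` holds the
    IDs of changes that were traced to a production incident.
    """
    shipped = Counter(origin for _, origin in changes)
    broke = Counter(
        origin for change_id, origin in changes if change_id in incident_change_ids
    )
    return {origin: broke[origin] / shipped[origin] for origin in shipped}


# Invented sample data, only to show the shape of the inputs.
sample_changes = [
    ("c1", "ai"),
    ("c2", "ai"),
    ("c3", "human"),
    ("c4", "ai"),
    ("c5", "human"),
]
sample_incidents = {"c2", "c5"}
rates = incident_rate_by_origin(sample_changes, sample_incidents)
```

A raw rate like this ignores change size and how risky the touched area is, so treat it as a conversation starter rather than a verdict.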

Final Thoughts

The AI era is forcing us to question fundamental assumptions about software development. The idea that AI-generated code might be just another dependency is uncomfortable, controversial, and possibly wrong.

But it's worth exploring. And if we do explore it, we don't have to go all-in immediately. In fact, we probably shouldn't.

Some areas of our codebase are critical and dangerous. Payment processing, authentication, security features, data privacy controls - maybe it's fine not to be 10x there. Maybe slower is safer, and safer is better. We can experiment with AI-generated and AI-reviewed code in less critical areas first. Internal tools, admin dashboards, test utilities, documentation. Build confidence, gather data, learn what works and what doesn't. Then, gradually expand as we develop better trust mechanisms, review processes, and confidence in the approach.
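A gradual rollout like that could be written down as a policy instead of living in people's heads. Below is a small Python sketch, assuming a repository layout and tier names that I invented for the example (`src/payments/`, `tools/`, "ai-review", and so on). It maps each changed file to a review tier and gives the whole change the strictest tier it touches.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns and tier names; the first matching pattern wins.
REVIEW_POLICY = [
    ("src/payments/*", "human-line-by-line"),
    ("src/auth/*", "human-line-by-line"),
    ("tools/*", "ai-review"),
    ("docs/*", "ai-review"),
]
DEFAULT_TIER = "human-contract-review"

# Higher rank means stricter review.
TIER_RANK = {"ai-review": 0, "human-contract-review": 1, "human-line-by-line": 2}


def tier_for_path(path: str) -> str:
    """Return the review tier for a single file path."""
    for pattern, tier in REVIEW_POLICY:
        # fnmatchcase's "*" also matches "/", so nested files are covered.
        if fnmatchcase(path, pattern):
            return tier
    return DEFAULT_TIER


def tier_for_change(paths: list[str]) -> str:
    """A change gets the strictest tier among the files it touches."""
    return max(
        (tier_for_path(p) for p in paths),
        key=TIER_RANK.__getitem__,
        default=DEFAULT_TIER,
    )
```

With this policy, a change touching only `docs/` and `tools/` would go through AI review, while anything that also touches `src/auth/` falls back to full human review. Expanding the experiment later means editing one list.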

Not every part of your system needs to move at 10x speed. Sometimes, the right speed for critical infrastructure is the speed at which humans can thoroughly understand and validate every change.

But if we're going to unlock the true potential of AI in software development, we need to rethink not just how we write code, but how we review it, trust it, and build confidence around it.

The bottleneck has shifted. The question is: are we ready for a world where the code we ship isn't the code we wrote?

What do you think? Is AI-generated code a dependency? How are you handling code review in the age of AI? I'd love to hear your thoughts - find me at @sag1v.


I write about software engineering, AI, and the things that keep bugging me about our industry. If this resonated with you, come visit debuggr.io for more.
