Jono Herrington

Originally published at jonoherrington.com

The Math Doesn't Work

I built a dashboard to track how long PRs sat in review. I wanted to see where velocity was getting stuck. What I found was a pattern that made me question every approval I'd ever given.

The Numbers That Didn't Add Up

The dashboard showed two bars side by side. PRs under two hundred lines averaged fifteen minutes in review. PRs over a thousand lines averaged twenty minutes.

Five minutes difference. For five times the code.

I sat with that for a minute. A ten-line change that renames a variable gets nearly the same scrutiny as a five-hundred-line refactor that touches authentication, caching, and the checkout flow. The math doesn't work. More code, less review. Less code, more scrutiny.

I know why it happens. The thousand-line PR is opaque. You can't evaluate what you can't trace. The dependencies spider across files you haven't opened in months. The test coverage looks comprehensive but you can't tell if the tests are testing the right thing. So you scan for syntax, check that the build passes, and move on. The fifteen-line PR is legible. Every choice visible. Every tradeoff exposed. Fair game for critique.

We're not being lazy. We're being human. We push back on what we understand and surrender to what we don't.

But here's the cost: the code we can debug gets rejected. The code we can't gets merged.

Where I Got It Wrong

I've been the person rubber-stamping those thousand-line PRs.

There was a Tuesday last quarter. I was running between two meetings, eating lunch at my desk when a notification popped up. A senior engineer had opened a twelve-hundred-line PR that touched our inventory system, our caching layer, and our add-to-cart flow. I had twenty minutes before my next call. I scrolled through the first hundred lines, saw familiar patterns, checked that tests passed, and clicked approve.

Later that week, a junior engineer opened a fifteen-line PR. Just fifteen lines. It changed how we handled null checks in a single component. I had time. I left seven comments. Three of them were about naming conventions. One was about indentation.

I didn't realize what I'd done until I saw the dashboard data. I had spent more time debating formatting on fifteen lines than I had spent evaluating a twelve-hundred-line change to core infrastructure. The large PR sailed through because I was busy and it was big and I trusted the engineer who wrote it. The small PR got interrogated because I had the bandwidth and the code was small enough that I could see every detail.

That's not rigorous review. That's convenient review.

The Preference Problem

This is where I think the conversation about PRs usually goes wrong. We talk about review culture like the problem is that people aren't reviewing thoroughly enough. But the deeper issue is that most PR debates aren't about correctness at all. They're about preference.

I've watched teams spend thirty minutes arguing about whether to use early returns or nested conditionals. I've seen PRs blocked over variable naming that follows every convention but the reviewer's personal taste. I've been in threads that went fifty comments deep about formatting choices that could have been handled by a linter in half a second.

If you care about consistency that much, automate it. Add it to a git hook. Configure your formatter. Build it into CI. Stop making humans repeat the same debates every PR when a machine can enforce the standard without ego, without fatigue, without mood.
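As a minimal sketch of what "automate it" can mean: encode the rule once, in code, so no human re-litigates it per PR. The specific rule below (snake_case function names) and its wiring are illustrative assumptions, not any particular team's config.

```python
import re

# A name is acceptable if it is entirely lowercase letters, digits, and underscores.
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")

def check_function_names(source: str) -> list[str]:
    """Return one violation message per function whose name isn't snake_case."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = re.match(r"\s*def\s+(\w+)\s*\(", line)
        if match and not SNAKE_CASE.match(match.group(1)):
            violations.append(f"line {lineno}: rename '{match.group(1)}' to snake_case")
    return violations
```

Run a check like this from a pre-commit hook or a CI step and exit nonzero on violations, and the debate is over before the PR opens. In practice you would reach for an off-the-shelf linter and formatter rather than hand-rolling rules; the point is that the standard lives in a machine, not in review comments.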

It's Not AI Versus People

We often frame these conversations as people versus AI. It's not.

If an AI is opening a twelve-hundred-line PR, it's because you're allowing it to open a twelve-hundred-line PR. The AI will amplify whatever behavior you model. It's like having a kid ... you give them guidelines or they do whatever they want. Guardrails are what turn raw output into something good.

The same pattern applies whether the PR comes from a senior engineer you've worked with for five years or an AI assistant that generated the code in thirty seconds. The bias isn't against AI. The bias is toward opacity. We rubber-stamp what we can't hold in working memory. We interrogate what we can see clearly.

The engineers who complain that AI code gets more scrutiny than human code are missing the point. The AI code gets scrutiny because it's small enough to evaluate. The human code gets a pass because it's too big to comprehend. The problem isn't who's writing it. The problem is the size.

What I Changed

We made a hard rule about PR size. Anything over two hundred lines got closed immediately. Not rejected ... redirected. Break it into vertical slices. Ship the inventory update end to end ... database, API, and UI together. Then the caching fix. Then the checkout flow. Each slice reviewable in under thirty minutes.
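A size rule like this only holds if a machine enforces it. One way, sketched under assumptions (the 200-line budget, the function names, and feeding it the output of `git diff --numstat` are all illustrative, not our exact tooling):

```python
MAX_LINES = 200  # hypothetical budget; tune per team

def lines_changed(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output."""
    total = 0
    for row in numstat.strip().splitlines():
        added, deleted, _path = row.split("\t", 2)
        # Binary files show "-" in numstat; count them as zero text lines.
        total += int(added) if added != "-" else 0
        total += int(deleted) if deleted != "-" else 0
    return total

def check_pr_size(numstat: str, limit: int = MAX_LINES) -> bool:
    """Return False (fail the CI step) when the diff exceeds the line budget."""
    changed = lines_changed(numstat)
    if changed > limit:
        print(f"PR changes {changed} lines (limit {limit}). Split it into vertical slices.")
        return False
    print(f"PR size OK: {changed} lines changed.")
    return True
```

Wired into CI against the diff from the target branch, the gate closes oversized PRs automatically ... nobody has to be the bad guy in the review thread.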

And the leaders were accountable for holding the team accountable. Not optional. Not "when we have time." If a twelve-hundred-line PR showed up, we closed it and explained why.

We also automated the preferences. Naming conventions, formatting, import order ... all of it went into git hooks and CI. If a machine can enforce it, humans don't debate it.

The engineers pushed back at first. They were used to working in branches for weeks, accumulating changes, shipping everything at once. It felt like more work to break things apart. It felt like more overhead.

But something happened after the first month. The reviews got better. Not longer ... better. People were actually reading the code instead of skimming it. They were catching things: race conditions they'd missed before, edge cases that only showed up when the changes were isolated, assumptions baked into the architecture that didn't hold up under scrutiny.

And review time shrank. Once the style debates were automated, engineers stopped spending thirty minutes on indentation and naming conventions. They spent that time on the actual logic.

When the scope is clear and the machines handle the preferences, reviewers can focus on what actually matters: does this change do what it claims to do, and does it do it safely?

We're not being lazy. We're being human. We push back on what we understand and surrender to what we don't.

The Real Metric

Most people think more review is better. I think consistent review is better.

Same standard. Same depth. Same courage to say "this doesn't make sense" or "this needs to be smaller" whether it came from Claude or a colleague.

The dashboard taught me something I didn't want to learn. I wasn't reviewing code based on risk or importance. I was reviewing based on cognitive load. The code that was easy to hold in my head got my attention. The code that exceeded my working memory got my trust.

That's a dangerous way to run an engineering organization. Trust should be earned by the code, not assumed because of the author or the size. Attention should be paid to complexity, not convenience.

I still catch myself doing it. I see a massive PR roll in and I feel the urge to scan and approve, to trust the process, to assume someone else is paying attention. Then I remember the numbers. Fifteen minutes for under two hundred lines. Twenty minutes for over a thousand.

The math doesn't work. But it can. If we're willing to close the PRs that are too big to review properly. If we're willing to automate the preferences so humans can focus on the decisions that matter. If we're willing to admit that our attention is a scarce resource and allocate it deliberately instead of conveniently.

The code we can debug gets rejected. The code we can't gets merged.

What This Means for Leaders

Your job isn't to review every line. Your job is to build a culture where the lines that get reviewed are the ones that matter.

That means you close the PRs that are too big to evaluate. Not because you're being difficult. Because you're being honest about human limits. The twelve-hundred-line PR isn't getting reviewed. It's getting trusted. And trust without verification isn't quality control.

It also means you automate the noise. If your engineers are debating formatting in 2026, that's a leadership failure. Not theirs. Yours. You had the tools to remove that distraction and you didn't.

Most importantly, it means you hold the line. When the senior engineer opens a massive PR and expects it to sail through because of their title, you close it. When the deadline pressure mounts and someone says "just approve it this once," you say no. The pattern is only a pattern because someone allowed it.

The dashboard will show you the problem. But the fix requires you to be the kind of leader who acts on what the data reveals, even when it's inconvenient.


One email a week from The Builder's Leader. The frameworks, the blind spots, and the conversations most leaders avoid. Subscribe for free.
