Hannah Redmond

Posted on Jun 30

Why I Stopped Trusting PR Line Counts (And Built a Complexity Scorer)

#developer #github #complexity #codereview

A 200-file refactor and a 10-line bugfix look identical in GitHub's pull request list. That's a problem.

A while back, my team shipped a nasty regression. The root cause? A massive refactor PR had been sitting in the queue for three days, buried between tiny bugfixes and dependency updates. Nobody spotted it. Everyone assumed "it's just another PR."

But it wasn't. It was 2,847 lines across 56 files, touching core business logic, authentication flows, and the database layer. In GitHub's UI, it looked like this:

chore: upgrade to Node 22 LTS                    +2,847 −198
fix: token expiry not refreshing on 401             +45 −12
docs: update README                                  +8 −2

Three lines. Equal visual weight. One of them is a landmine.

The Problem: Flat PR Lists Don't Scale

GitHub shows you:

Title
Author
Labels
Line count
Last updated

That's useful for small teams. But when you're tracking 50+ open PRs across multiple repos, you need to know which ones are dangerous and which ones need your eyes right now. Line count is a terrible proxy for risk. Here's why:

A 500-line change to package-lock.json is trivial. It's generated, mechanical, and safe to approve.

A 50-line change to your auth middleware is terrifying. It's small, surgical, and high-stakes.

Yet in the PR list, the lockfile change looks "bigger." Your brain learns to ignore large numbers, so the genuinely risky PRs get lost in the noise.

What I Wanted

I wanted a dashboard that could:

Score every PR for complexity — not by raw lines, but by what actually changed
Show me only PRs that need my review — filter out my own PRs, drafts, and things I've already reviewed
Surface blockers instantly — build failures, merge conflicts, changes requested, all in one view
Work across repos — one view for everything my team owns
Surface failing CI/CD — see which workflows are failing across branches without leaving the dashboard

So I built ReviewRadar.

The Complexity Score: How It Works

The core idea is simple: weight every changed file by its type, then combine churn, file count, and rewrite intensity into a single 0-100 score.

Step 1: File Relevance

Not all files are equal. A generated lockfile shouldn't count the same as your auth service.

File type	Weight	Examples
Source code & tests	1.0	`.ts`, `.tsx`, `.py`, `.go`, `.rs`
Config & infra	0.5	`.yml`, `.json`, `Dockerfile`, `.tf`
Documentation	0.1	`.md`, `.rst`, `.txt`
Generated / binary	0.0	`node_modules/`, `.lock`, images, `.min.js`

This means a PR that changes package-lock.json plus 3 source files is scored entirely on those 3 source files. Treated as a generated file with a weight of 0.0, the 500-line lockfile contributes nothing.
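As a rough illustration of the weighting table, here is a minimal sketch of a path classifier. The weights come from the table; the matching rules, the pattern lists, and the name `fileWeight` are assumptions made for this example, not ReviewRadar's actual code. In particular, it assumes generated-file patterns are checked first, so that `package-lock.json` is not picked up by the `.json` config rule.

```typescript
// Hypothetical matching rules; only the four weights come from the table.
const GENERATED: RegExp[] = [
  /(^|\/)node_modules\//,
  /\.lock$/,
  /(^|\/)package-lock\.json$/,
  /\.min\.js$/,
  /\.(png|jpe?g|gif|ico)$/i,
];
const DOCS = /\.(md|rst|txt)$/i;
const CONFIG: RegExp[] = [/\.(yml|json|tf)$/i, /(^|\/)Dockerfile$/];

function fileWeight(path: string): number {
  // Generated and binary files are checked first so they never count.
  if (GENERATED.some((pattern) => pattern.test(path))) return 0.0;
  if (DOCS.test(path)) return 0.1;
  if (CONFIG.some((pattern) => pattern.test(path))) return 0.5;
  // Source code and tests, plus (as an assumption) anything unrecognised.
  return 1.0;
}
```

Checking the most specific category first is a design choice of this sketch: it keeps a lockfile from being scored as ordinary config.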

Step 2: Weighted Churn

For each relevant file, compute:

churn = max(additions, deletions) × weight

Per-file churn is capped at 500 lines to prevent a single enormous file from dominating the score.
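A minimal sketch of this step, under two assumptions the text does not spell out: the 500-line cap is applied after weighting, and the per-file values are summed to get the PR's weighted churn. The `FileChange` shape and the function names are hypothetical.

```typescript
// Hypothetical input shape: one entry per changed file.
interface FileChange {
  additions: number;
  deletions: number;
  weight: number; // relevance weight from Step 1
}

const PER_FILE_CAP = 500;

// churn = max(additions, deletions) × weight, capped per file.
function fileChurn(file: FileChange): number {
  const churn = Math.max(file.additions, file.deletions) * file.weight;
  return Math.min(churn, PER_FILE_CAP);
}

// Assumption: the PR-level figure is the sum of the per-file values.
function weightedChurn(files: FileChange[]): number {
  return files.reduce((sum, file) => sum + fileChurn(file), 0);
}
```

For example, a 500-line lockfile (weight 0.0), a +40 / −10 source file, a +900 / −100 source file, and a +20 config file (weight 0.5) would give 0 + 40 + 500 + 10 = 550.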

Step 3: File Spread

More files = more context to hold in your head = more complexity. We add a logarithmic file-count factor:

spread = ln(1 + relevantFiles) × 10

This means going from 5 → 15 files matters more than going from 50 → 60.
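To make the diminishing returns concrete, here is a small sketch that evaluates the spread formula at the file counts mentioned above. The function name `spread` is just a label for this example.

```typescript
// spread = ln(1 + relevantFiles) × 10
function spread(relevantFiles: number): number {
  return Math.log(1 + relevantFiles) * 10;
}

// Adding ten files to a small PR moves the term far more than
// adding ten files to an already large one.
const smallJump = spread(15) - spread(5);  // about 9.8 points
const largeJump = spread(60) - spread(50); // about 1.8 points
```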

Step 4: Rewrite Intensity

A PR that deletes 1,000 lines and adds 1,000 lines is a rewrite, not a refactor. We measure this with:

intensity = ln(1 + totalChurn) × (churnRatio / (1 + churnRatio))

Where churnRatio = totalChurn / (1 + |netChange|). High ratio = high rewrite intensity = higher complexity.
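The formula can be sketched as follows. The text does not define totalChurn or netChange, so this example assumes totalChurn = additions + deletions and netChange = additions − deletions; the function name is hypothetical.

```typescript
function rewriteIntensity(additions: number, deletions: number): number {
  const totalChurn = additions + deletions; // assumption: all lines touched
  const netChange = additions - deletions;  // assumption: net growth of the codebase
  const churnRatio = totalChurn / (1 + Math.abs(netChange));
  return Math.log(1 + totalChurn) * (churnRatio / (1 + churnRatio));
}

// Same total churn, very different shape:
const rewrite = rewriteIntensity(1000, 1000); // about 7.60 (lines replaced)
const pureAdd = rewriteIntensity(2000, 0);    // about 3.80 (lines only added)
```

Under these assumptions, the 1,000-out, 1,000-in rewrite scores roughly twice the intensity of a PR that only adds 2,000 lines, which matches the intent described above.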

The Final Formula

score = ln(1 + weightedChurn) × 5.5
      + ln(1 + relevantFiles) × 10
      + 5 × intensity
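A literal transcription of the formula might look like this. The three terms are unbounded on their own, so the rounding and the clamp to 0-100 are assumptions of this sketch, as is the function name. Because the post does not list per-file inputs or any further normalisation, this transcription should not be expected to reproduce the author's calibration scores exactly.

```typescript
function complexityScore(
  weightedChurn: number,
  relevantFiles: number,
  intensity: number,
): number {
  const raw =
    Math.log(1 + weightedChurn) * 5.5 +
    Math.log(1 + relevantFiles) * 10 +
    5 * intensity;
  // Assumption: round to an integer and clamp into the 0-100 range.
  return Math.min(100, Math.max(0, Math.round(raw)));
}
```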

The result is a 0-100 score with clear colour bands:

Score	Colour	Meaning
< 15	🟢 Green	Trivial — safe to approve quickly
15-29	🟢 Green	Small — quick review
30-49	🔵 Cyan	Medium — standard review
50-69	🟠 Amber	Large — careful review needed
70-89	🔴 Red	Complex — significant risk
90+	🔴 Red	Very complex — break it up
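The bands translate directly into a lookup. A minimal sketch, with the `Band` type and `scoreBand` name invented for this example:

```typescript
interface Band {
  colour: "green" | "cyan" | "amber" | "red";
  label: string;
}

// Thresholds taken from the table; upper bounds are exclusive.
function scoreBand(score: number): Band {
  if (score < 15) return { colour: "green", label: "Trivial" };
  if (score < 30) return { colour: "green", label: "Small" };
  if (score < 50) return { colour: "cyan", label: "Medium" };
  if (score < 70) return { colour: "amber", label: "Large" };
  if (score < 90) return { colour: "red", label: "Complex" };
  return { colour: "red", label: "Very complex" };
}
```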

Real World Calibration

I tested this against real PRs from my team's repos:

PR	Files	Lines	Score	Label
Bugfix: null check	3	+45 / −12	18	Small
Feature: webhook listener	8	+124 / −18	42	Medium
Dark mode toggle	14	+312 / −84	55	Large
Node upgrade refactor	56	+2,847 / −198	73	Complex
Monorepo migration	300	+6,200 / −1,800	94	Very complex

The scores spread nicely across the 0-100 range without clustering at the top. A 300-file PR scores ~94. A typical 10-file, 3,000-line PR scores ~55-60. The scoring feels right.

Zero Backend Philosophy (With a Server-Side Twist)

One of my non-negotiables: I didn't want to run infrastructure — no databases, no long-running servers, no ops burden.

ReviewRadar is a Next.js app that:

Is statically exported to Cloudflare Pages, so there are zero server costs
Uses a Cloudflare Worker to proxy all GitHub API calls
Encrypts your token with AES-GCM and stores it in an HttpOnly, Secure, SameSite=Strict cookie (never in localStorage, never visible to client-side JavaScript)
Validates every request in the Worker against a strict path allowlist, so unapproved endpoints are rejected with a 403 before they reach GitHub

This means:

✅ No signup required (OAuth or PAT)
✅ Your token is encrypted at rest, and because the cookie is HttpOnly, scripts running in the page can't read it
✅ No infrastructure to manage
✅ Deployed to Cloudflare Pages — global edge network, zero cold starts
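The path allowlist mentioned above could look something like the sketch below. The `/api/github` prefix comes from the post; the specific patterns and the name `resolveGithubPath` are assumptions for illustration, since the endpoints ReviewRadar actually permits are not listed here. The function expects an already-normalised pathname, such as the one produced by the standard `URL` parser.

```typescript
const PROXY_PREFIX = "/api/github";

// Hypothetical allowlist of read-only GitHub REST paths.
const ALLOWED_PATHS: RegExp[] = [
  /^\/repos\/[^/]+\/[^/]+\/pulls(\/\d+)?$/,
  /^\/repos\/[^/]+\/[^/]+\/pulls\/\d+\/(files|reviews)$/,
  /^\/repos\/[^/]+\/[^/]+\/actions\/runs$/,
];

// Returns the GitHub API path to proxy to, or null when the request
// should be rejected (the caller would then respond with a 403).
function resolveGithubPath(pathname: string): string | null {
  if (!pathname.startsWith(PROXY_PREFIX + "/")) return null;
  const githubPath = pathname.slice(PROXY_PREFIX.length);
  const allowed = ALLOWED_PATHS.some((pattern) => pattern.test(githubPath));
  return allowed ? githubPath : null;
}
```

Anchoring every pattern with `^` and `$` means a request has to match a known endpoint exactly; anything else falls through to the rejection path by default.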

How it works: When you sign in, your token is sent once to the Worker (POST /api/session). The Worker encrypts it and returns the ciphertext as an HttpOnly cookie. Every subsequent API call goes through the Worker (/api/github/*), which decrypts the cookie, attaches the token, and proxies the request. After that initial sign-in request, the raw token never appears in the browser's network tab; you can verify this in DevTools yourself.
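The encrypt-then-cookie step can be sketched with the Web Crypto API, which is available in Cloudflare Workers and in recent Node.js versions. This is an illustration under assumptions, not ReviewRadar's code: the function names, the `iv.ciphertext` layout, and the cookie name `session` are all hypothetical, and key management is left out.

```typescript
// Encode bytes as base64 so they fit in a cookie value.
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  bytes.forEach((b) => { binary += String.fromCharCode(b); });
  return btoa(binary);
}

function fromBase64(text: string) {
  return Uint8Array.from(atob(text), (ch) => ch.charCodeAt(0));
}

// Encrypt the token with AES-GCM using a fresh 96-bit IV per call.
async function sealToken(token: string, key: CryptoKey): Promise<string> {
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(token),
  );
  // Hypothetical layout: base64(iv) + "." + base64(ciphertext and auth tag).
  return `${toBase64(iv)}.${toBase64(new Uint8Array(ciphertext))}`;
}

// Reverse of sealToken; decrypt() rejects if the value was tampered with.
async function openToken(sealed: string, key: CryptoKey): Promise<string> {
  const [ivPart, dataPart] = sealed.split(".");
  if (!ivPart || !dataPart) throw new Error("malformed cookie value");
  const plaintext = await crypto.subtle.decrypt(
    { name: "AES-GCM", iv: fromBase64(ivPart) },
    key,
    fromBase64(dataPart),
  );
  return new TextDecoder().decode(plaintext);
}

// Build the Set-Cookie header value with the attributes named in the post.
function sessionCookie(sealed: string): string {
  return `session=${sealed}; HttpOnly; Secure; SameSite=Strict; Path=/`;
}
```

Because AES-GCM is authenticated, a cookie that has been modified in transit fails to decrypt rather than yielding a corrupted token.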

What's In the Dashboard

Beyond complexity scoring, ReviewRadar gives you:

Customisable table — drag, drop, and show/hide any column (size, complexity, files, approvals, build status, etc.)
"Needs Attention" filter — instantly shows only PRs that are not yours, not drafts, not yet reviewed by you, and have no approvals
Blocked filter — aggregates build failures, merge conflicts, and changes requested across all repos
Workflow dashboard — see your main branch's latest run plus all failing feature branches. Expand any card to inspect individual jobs and quality gates (check runs). Inline summaries show what broke without clicking.
Status reports — visual breakdowns by author, label, status, and complexity spread
Deep PR drawer — slide out any PR for full details, reviews, comments, and a complete complexity breakdown
Auto-refresh — optional background refresh with browser notifications
Multilingual — English, French, Polish, and Vietnamese
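As an example of how one of these filters can be expressed, here is a minimal sketch of the "Needs Attention" rule as a predicate. The `PullRequestSummary` shape and its field names are assumptions for this example; the four conditions are the ones listed above.

```typescript
// Hypothetical, simplified view of a pull request.
interface PullRequestSummary {
  author: string;
  isDraft: boolean;
  reviewedBy: string[]; // logins that have already submitted a review
  approvals: number;
}

// Not yours, not a draft, not yet reviewed by you, and no approvals.
function needsAttention(pr: PullRequestSummary, viewer: string): boolean {
  return (
    pr.author !== viewer &&
    !pr.isDraft &&
    !pr.reviewedBy.includes(viewer) &&
    pr.approvals === 0
  );
}
```

It can be applied with a plain array filter, for example `openPrs.filter((pr) => needsAttention(pr, "hannah"))`, where `openPrs` and the login are placeholders.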