Andrew for Koalr

We Scored 28 Famous Open Source PRs for Deploy Risk — Here's What We Found

The React Hooks PR that changed every React application on earth? Two words in the commit message. One feature flag removed. It scored 91 out of 100 for deploy risk. The Svelte 5 release scored 99. A 65-line TypeScript change scored 79 and silently broke type inference in codebases worldwide.

We ran 28 landmark open source pull requests through Koalr's deploy risk model. Here is what we found — and why it matters for the PRs your team ships every week.


The problem with code review

Modern code review answers one question well: is this code correct?

It answers a different question poorly: how likely is this to cause a production incident?

Those are not the same question. A PR can be clean, well-written, and thoroughly reviewed — and still wreck production because it touches a critical path nobody flagged, because the reviewer had twelve other PRs open, or because it is the fourth consecutive revert of a feature that never landed cleanly.

Most teams have no objective signal for the second question. They have green checkmarks.


What deploy risk scoring is

Koalr scores every pull request from 0 to 100 before it merges. The score is built from 36 signals across four categories:

Blast radius — files changed, services affected, CODEOWNERS compliance, shared library modifications

Change quality — file churn, change entropy, lines added vs deleted, test coverage of changed files

Context — reviewer load, author's recent incident rate, time since last deploy, revert history on the changed file set

History — consecutive reverts of the same feature, recent incident correlation, PR age

Scores map to four risk bands:

  • 0–39 → Low
  • 40–69 → Medium
  • 70–89 → High
  • 90–100 → Critical
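
The bands above amount to a threshold lookup. Here is a minimal sketch in Python, assuming integer scores. The function name `risk_band` is illustrative and is not part of Koalr's API.

```python
def risk_band(score: int) -> str:
    """Map a 0-100 deploy risk score to the band names listed above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "Critical"
    if score >= 70:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"


# Scores quoted in this article
for score in (79, 91, 98, 99):
    print(score, risk_band(score))
```

Checking from the highest band downward keeps each boundary in exactly one place, so 39 stays Low and 40 becomes Medium.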

The score does not replace review. It gives reviewers a number to orient around before they start reading.


The obvious ones scored as expected

Svelte 5 release — score 99

The full runes rewrite merged to main. Thousands of files changed, the entire reactivity model replaced, years of migration work consolidated into one merge. High blast radius, enormous file count, fundamental architecture change. The model does what you would expect.

TypeScript modules conversion — score 98

Microsoft's conversion of the entire TypeScript compiler from namespaces to ES modules. Touched every source file in the compiler, changed the build system, dropped dependencies. If any PR in history deserved a mandatory all-hands review before merge, it was this one.


The surprising ones — small diffs, enormous blast radius

This is where it gets interesting.

React PR #14679 "Enable hooks!" — score 91

The commit message is two words. The diff is the removal of a single feature flag. You could read the entire change in thirty seconds.

It scored 91.

The model does not count lines — it looks at what the changed code controls. A feature flag in a framework used by tens of millions of applications is not a small change. It is a detonation switch. The blast radius is every React application on earth.


```json
{
  "blast_radius_score": 0.97,
  "feature_flag_detected": true,
  "downstream_consumers": "critical",
  "reviewer_load": 0.2
}
```

Score: 91 / Critical
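
One way to make a payload like this useful to a reviewer is to surface only the signals that look elevated. The sketch below is an illustration under stated assumptions. The field names come from the example above, but the 0-to-1 scale, the 0.9 cutoff, and the `elevated_signals` helper are hypothetical and do not describe Koalr's actual schema or scoring logic.

```python
import json

# The signal payload shown above, copied verbatim.
PAYLOAD = """
{
  "blast_radius_score": 0.97,
  "feature_flag_detected": true,
  "downstream_consumers": "critical",
  "reviewer_load": 0.2
}
"""

# Hypothetical cutoff for numeric signals, assumed to be on a 0-1 scale.
ELEVATED_THRESHOLD = 0.9


def elevated_signals(signals: dict) -> list[str]:
    """Return the names of signals a reviewer should look at first."""
    elevated = []
    for name, value in signals.items():
        # bool is checked before numbers because bool is a subclass of int.
        if isinstance(value, bool):
            if value:
                elevated.append(name)
        elif isinstance(value, (int, float)):
            if value >= ELEVATED_THRESHOLD:
                elevated.append(name)
        elif isinstance(value, str):
            if value.lower() == "critical":
                elevated.append(name)
    return elevated


print(elevated_signals(json.loads(PAYLOAD)))
```

For the hooks PR, this picks out the blast radius, the feature flag, and the downstream consumers, and leaves out reviewer load, which matches the point of the example: the risk comes from what the change controls, not from how busy the reviewer was.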