Gabriel Holmes

Posted on Jul 3 • Originally published at testward.app

How to know which tests a PR will break — before CI runs

#testing #devops #qa #automation

Every team has lived this: a pull request looks clean, gets approved, merges — and twenty minutes later the test suite is red. The change was in src/. The failure is in tests/. And nothing in the review surfaced the connection.

Why code review misses it

Code review shows you the diff. It does not show you what depends on the diff. A reviewer reading a one-line selector rename has no way to know that three end-to-end specs select on that exact attribute — especially if those specs live in a folder they never open, or a repo they don't have checked out.

So the breakage is discovered by the most expensive possible detector: a full CI run, after merge, by whoever is on call for the red build. The person with the most context — the author, mid-review — never saw it coming.

The three ways teams try to close the gap

1. Run the whole suite on every PR. Correct, but slow and expensive, and it tells you after the fact — you still wait for the red. On large suites it's minutes-to-hours of feedback latency per push.

2. Coverage-based test selection. Map code to the tests that execute it and run only those. Powerful for unit tests, but it needs instrumentation, breaks down for E2E (where the app runs in a browser or a separate server process, so per-test coverage is hard to collect), and says nothing across repo boundaries.

3. Static impact analysis at review time. Read the diff, extract what it changes that tests depend on — selectors, routes, ids, visible text — and match those against the test files. This is cheap, needs no instrumentation, and runs the moment the PR opens.
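The first step of option 3, "read the diff", can be sketched in a few lines. This is a minimal illustration, not how any particular tool does it: the function name `changed_lines` is made up for this example, and it assumes a git-format unified diff (with `diff --git` and `@@` lines).

```python
def changed_lines(diff_text: str) -> dict[str, list[str]]:
    """Collect removed and added source lines from a git-format unified diff."""
    removed: list[str] = []
    added: list[str] = []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            # A new file section starts; its '---' / '+++' headers are not code.
            in_hunk = False
        elif line.startswith("@@"):
            # Hunk header: everything until the next file section is content.
            in_hunk = True
        elif in_hunk and line.startswith("-"):
            removed.append(line[1:])
        elif in_hunk and line.startswith("+"):
            added.append(line[1:])
    return {"removed": removed, "added": added}
```

Keeping removed and added lines separate matters later: an anchor that appears only on the removed side is the one a test may still be pointing at.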

What "what tests depend on" actually means

End-to-end and integration tests are coupled to the app through a small, identifiable set of anchors:

Selectors / test ids — data-testid, data-cy, getByRole, CSS, XPath
Routes — URL literals the test navigates to or asserts on
Visible text & labels — button copy, headings, aria-labels the test queries
Element ids and names — the locators page objects hang off

If a PR changes one of these, any test referencing it is a break candidate. That's a tractable matching problem you can solve in seconds, with no test execution required.
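To make the matching concrete, here is a rough sketch. Everything in it is an assumption for illustration: the three regexes cover only a sliver of the anchor types listed above, the names `extract_anchors` and `break_candidates` are hypothetical, and the match is a plain substring check that will over-report on short anchors.

```python
import re

# Illustrative patterns only; a real extractor would need many more.
ANCHOR_PATTERNS = {
    "test_id": re.compile(r"""data-(?:testid|cy)=["']([^"']+)["']"""),
    "element_id": re.compile(r"""\bid=["']([^"']+)["']"""),
    "route": re.compile(r"""\bpath=["'](/[^"']*)["']"""),
}

def extract_anchors(lines: list[str]) -> set[str]:
    """Pull anchor values (test ids, element ids, route paths) out of source lines."""
    anchors: set[str] = set()
    for line in lines:
        for pattern in ANCHOR_PATTERNS.values():
            anchors.update(pattern.findall(line))
    return anchors

def break_candidates(
    removed: list[str], added: list[str], test_sources: dict[str, str]
) -> dict[str, list[str]]:
    """Map each test file to the anchors it references that the diff removed."""
    # Anchors present before the change but absent after it.
    gone = extract_anchors(removed) - extract_anchors(added)
    hits: dict[str, list[str]] = {}
    for path, source in test_sources.items():
        found = sorted(anchor for anchor in gone if anchor in source)
        if found:
            hits[path] = found
    return hits
```

Subtracting the added-side anchors keeps a line that was merely reformatted from being flagged: only anchors that actually disappear count.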

Want to see it on a real diff? I built a free scanner that runs this extraction entirely in your browser (nothing gets uploaded): testward.app/diff-scanner

The cross-repo blind spot

Here's where it gets interesting for QA teams specifically. Many of us keep automation in a dedicated repo — separate from the app it tests. Good hygiene, but it severs the only signal: the frontend dev's PR is green (no E2E tests live there), and the automation repo goes red hours later with no link back to the cause.

Coverage tools, affected-test selectors, and CI test-splitting all operate within a single repo and a single test run. None of them can say "a change here breaks a test there."
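A static scan has no such restriction, because it only needs to read files. As a sketch, assuming the automation repo is checked out next to the app repo, the same lookup can walk several roots at once. The function name `find_references` and the default file patterns are placeholders, not a convention from any tool.

```python
from pathlib import Path

def find_references(
    anchors: set[str],
    roots: list[Path],
    patterns: tuple[str, ...] = ("*.spec.ts", "*.cy.ts"),
) -> dict[str, list[str]]:
    """Search test files under every root for references to the given anchors."""
    hits: dict[str, list[str]] = {}
    for root in roots:
        for pattern in patterns:
            # rglob walks the whole tree under the root, including subfolders.
            for path in Path(root).rglob(pattern):
                text = path.read_text(encoding="utf-8", errors="ignore")
                found = sorted(anchor for anchor in anchors if anchor in text)
                if found:
                    hits[str(path)] = found
    return hits
```

Passing `[app_repo, automation_repo]` as `roots` is the whole trick: the change is in one checkout, the match is found in the other.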

How I automated this

I'm a QA automation engineer, and this loop ate enough of my afternoons that I built a GitHub App for it — Testward. On every PR it:

Reads the diff and extracts the anchors the change touches.
Scans your test files — same repo, or a separately-linked automation repo (one line of config).
Runs an LLM confirmation pass that names the specific specs likely to break and why.
Posts one sticky comment: risk level, affected specs, reasons.
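For anyone scripting the last step themselves, one common way to keep a comment "sticky" is to embed a hidden marker and overwrite the comment that carries it. The sketch below is not Testward's implementation: the marker string, the risk thresholds, and the function names are all invented for illustration, and the comment thread is modelled as a plain list of strings rather than real API calls.

```python
MARKER = "<!-- impact-report -->"  # hypothetical marker used to find the report again

def risk_level(affected_count: int) -> str:
    """Bucket the number of affected specs. Thresholds are arbitrary placeholders."""
    if affected_count == 0:
        return "none"
    return "low" if affected_count <= 2 else "high"

def render_comment(findings: dict[str, list[str]]) -> str:
    """Build the comment body: marker, risk level, then one line per spec."""
    lines = [MARKER, f"Risk: {risk_level(len(findings))}", ""]
    for spec, reasons in sorted(findings.items()):
        lines.append(f"- {spec}: " + "; ".join(reasons))
    return "\n".join(lines)

def upsert(existing_comments: list[str], body: str) -> list[str]:
    """Replace the previous report if one exists, otherwise append a new one."""
    for index, comment in enumerate(existing_comments):
        if comment.startswith(MARKER):
            return existing_comments[:index] + [body] + existing_comments[index + 1:]
    return existing_comments + [body]
```

Updating in place means a PR with ten pushes still shows one report, reflecting the latest diff.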

The reviewer sees the consequence next to the cause, while the author still has the context to fix it — or update the test in the same PR.

Known limitation worth being upfront about: dynamic selectors (`row-${id}`) can't be matched statically — that's the gap I'm working on. If your suite leans on template-literal test ids, the anchor model will miss those.
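One possible direction, sketched here as an experiment rather than a solution: treat each `${...}` placeholder as a wildcard and look for quoted strings in the tests that fit the resulting shape. The helper names are hypothetical, and the approach only catches tests that hard-code a concrete value such as 'row-42'; tests that build the id dynamically on their side stay invisible.

```python
import re

PLACEHOLDER = re.compile(r"\$\{[^}]*\}")

def template_to_regex(template: str) -> re.Pattern[str]:
    """Turn a template literal like row-${id} into a regex with wildcards."""
    static_parts = PLACEHOLDER.split(template)
    # Escape the literal pieces and let each placeholder match any non-empty text.
    return re.compile(".+".join(re.escape(part) for part in static_parts))

def matching_literals(template: str, test_source: str) -> list[str]:
    """Find quoted strings in the test source that fit the template's shape."""
    pattern = template_to_regex(template)
    literals = re.findall(r"""["']([^"']+)["']""", test_source)
    return sorted({text for text in literals if pattern.fullmatch(text)})
```

A template that is nothing but a placeholder matches every string, so in practice you would want to skip templates with too little static text.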

Takeaway

You don't need to run tests to know a PR endangers them. The dependency between app code and test code is mostly textual — selectors, routes, labels — and you can check it at review time, when the fix is cheapest. Whether you script it yourself or use a tool, moving that signal left is one of the highest-leverage things a QA team can do.

Questions or war stories about PRs silently breaking your suite? I'd genuinely like to hear them — especially from teams whose automation lives in a separate repo.

Top comments (2)

Aldo • Jul 15

We wrestled with this exact problem for years in a rapidly growing SaaS product, especially as our test suite ballooned to hundreds of thousands of individual tests across multiple services. A full local run became a non-starter, taking upwards of 45 minutes on even powerful machines, which completely killed local development flow. We found that relying solely on a Git-based changed-files filter (such as Jest's --changedSince) for test selection, while helpful for direct changes, often wasn't enough. The transitive dependencies and broader system impact were frequently missed, leading to those frustrating CI failures that the post describes.

Our most effective strategy involved building a custom layer on top of our build system, primarily around module dependency graphing. For our services built with Go, for instance, we'd analyze the import graph starting from changed files and then map those back to corresponding test files or packages. For frontend components, we used a similar approach with static analysis of the component tree, essentially building a dynamic test suite for a given PR's scope. This required some initial investment in tooling, but the ROI in developer time saved was immense.
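The import-graph idea can be sketched generically, independent of Go or any build system. This is a minimal illustration under assumptions: the graph is given as a plain dict of "module imports these modules", and the names `affected_tests` and `is_test` are placeholders rather than anything from our tooling.

```python
from collections import deque
from typing import Callable

def affected_tests(
    imports: dict[str, list[str]],
    changed: list[str],
    is_test: Callable[[str], bool],
) -> list[str]:
    """Return test modules that depend, directly or transitively, on changed ones."""
    # Invert "A imports B" into "B is imported by A".
    dependents: dict[str, set[str]] = {}
    for module, deps in imports.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Breadth-first walk upward from the changed modules.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(module for module in seen if is_test(module))
```

The transitive walk is what a changed-files filter misses: a test two imports away from the change is still selected.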

The key trade-off, as you've pointed out, is confidence versus speed. While these targeted local runs significantly reduced CI failures, we still found it prudent to run the full, comprehensive integration and end-to-end suite in CI. The local optimization was about catching most issues early and providing rapid feedback, reserving the full suite for the definitive check before merge. It's a continuous balancing act to keep that local feedback loop tight without sacrificing overall quality.

Viktor • Jul 3

The framing of "review shows the diff, not what depends on the diff" is the sharp bit - that's exactly the gap, and static impact analysis at review time is a genuinely underused angle.

Where I'd add a caution though: static impact analysis is great at the connection it can see (this attribute is selected in these three specs) and blind to the class of breakage that actually hurts most - the implicit dependency it can't parse. A selector built at runtime, a test that keys off computed text, an env/config value, a shared fixture two repos away. Static analysis will confidently show you a clean impact set and miss the exact spec that goes red, and a "clean" impact report is more dangerous than no report because people trust it.

So I'd treat it as a high-precision, low-recall signal: when it flags tests, believe it and run those first for fast feedback, but don't let a clean result gate anything - the full suite still has to run, static analysis just reorders it so the likely-red tests fail in the first 30 seconds instead of minute 20. Framed as "fail faster" it's great; framed as "run only these" it'll bite you on exactly the runtime-coupled tests static parsing can't follow.
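As a tiny sketch of the "fail faster" framing (the function name `prioritize` is made up, and it assumes your runner accepts an ordered list of spec files): move flagged specs to the front and keep everything else, so nothing is dropped from the run.

```python
def prioritize(all_specs: list[str], flagged: set[str]) -> list[str]:
    """Put flagged specs first; never drop a spec from the run."""
    # sorted() is stable, so the original order is kept within each group.
    # False sorts before True, so flagged specs (not in set -> False) lead.
    return sorted(all_specs, key=lambda spec: spec not in flagged)
```

The output always has the same length as the input, which is the point: the static result changes the order, not the scope.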