Every engineering team knows the feeling: a test is failing, but only sometimes. You re-run it. It passes. You push. It fails again in CI. The PR sits blocked. You ping a teammate. They re-run it. It passes again.
That hour of context-switching never shows up in a sprint retrospective.
The numbers most teams don't track
A Launchable study found that flaky tests account for roughly 10–15% of all CI failures in medium-to-large codebases. Stripe's engineering blog put the average cost of a single flaky test investigation at 1–3 engineer-hours. Google classifies a test as flaky if it fails more than 1% of the time — and they run millions of tests per day.
For a team of 15 engineers running CI 50 times a day, even a 5% flakiness rate means 2–3 spurious failures every day. At roughly an hour of reruns and context-switching apiece, that's 2–3 hours of engineer time evaporating daily, before counting the morale cost of a CI pipeline nobody trusts.
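The arithmetic behind that estimate fits in a few lines. This is a minimal sketch. It assumes that "flakiness rate" means the fraction of CI runs that hit a spurious failure and that every failure costs a fixed number of engineer-hours, which is a simplification. `daily_flaky_cost_hours` is a hypothetical helper name.

```python
# Back-of-envelope estimate of daily engineer time lost to flaky CI failures.
# All inputs are illustrative assumptions mirroring the scenario in the text.


def daily_flaky_cost_hours(runs_per_day: int, flaky_rate: float, hours_per_failure: float) -> float:
    """Expected engineer-hours lost per day, assuming a fixed cost per flaky failure."""
    expected_failures = runs_per_day * flaky_rate  # e.g. 50 runs * 5% = 2.5 failures/day
    return expected_failures * hours_per_failure


if __name__ == "__main__":
    runs, rate = 50, 0.05
    # Assumed cost per failure: 1 hour (rerun plus context switch), and
    # 3 hours (the upper end of the 1-3 hour investigation cost cited earlier).
    for hours_each in (1.0, 3.0):
        lost = daily_flaky_cost_hours(runs, rate, hours_each)
        print(f"{hours_each:.0f} h per failure -> {lost:.1f} h lost per day")
```

At one hour per failure, the script prints 2.5 hours a day, which falls inside the 2–3 hour estimate. If every failure instead triggers a full 3-hour investigation, the figure rises to 7.5 hours a day.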
The part nobody talks about: attribution
Detection is table stakes now. Tools like BuildPulse, Trunk, and Buildkite Test Engine flag flaky tests reliably. The harder problem — the one that actually gets tests fixed — is attribution: which commit first made this test flaky?
Without that answer, a flaky test becomes a permanent label on a test nobody wants to touch. With it, the PR that introduced the problem gets a comment and an engineer who knows exactly what they changed.
Most teams are still running manual git bisect sessions — or just ignoring the test entirely.
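Attribution can be automated with the same idea as git bisect. The difference is that each candidate commit gets rerun several times, because a single pass proves nothing about a flaky test. The sketch below is an illustration, not the method of any tool named above. `find_introducing_commit`, `is_flaky_at`, and the `run_test` callback are hypothetical names. `run_test` stands in for checking out a commit and running one test. The sketch assumes the oldest commit is known-good, the newest is known-flaky, and flakiness persists once introduced.

```python
import random
from typing import Callable, Sequence

# Hypothetical callback type: check out `commit`, run the one test,
# and return True on pass / False on fail.
RunTest = Callable[[str], bool]


def is_flaky_at(commit: str, run_test: RunTest, runs: int) -> bool:
    """Rerun the test `runs` times at `commit`; any failure marks the commit bad."""
    return any(not run_test(commit) for _ in range(runs))


def find_introducing_commit(commits: Sequence[str], run_test: RunTest, runs: int = 20) -> str:
    """Binary-search `commits` (oldest first) for the first commit where the test fails.

    Assumes commits[0] is known-good, commits[-1] is known-flaky, and that the
    flakiness persists in every later commit once introduced (the same
    monotonicity assumption git bisect makes).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_flaky_at(commits[mid], run_test, runs):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    rng = random.Random(7)
    history = [f"commit-{i}" for i in range(12)]
    first_flaky = 8  # hypothetical: commit-8 introduced a 25% intermittent failure

    def simulated_test(commit: str) -> bool:
        """Stand-in for a real checkout-and-run; fails 25% of the time from commit-8 on."""
        if int(commit.split("-")[1]) < first_flaky:
            return True
        return rng.random() >= 0.25

    print(find_introducing_commit(history, simulated_test, runs=30))
```

Each bisect step costs up to `runs` test executions, so a 12-commit range needs at most four steps. The rerun count is the key tradeoff. At 30 reruns, a test that fails 25% of the time is misread as good at a given commit with probability 0.75^30, roughly 0.02%. Rarer flakes need more reruns. You can drive the same loop with `git bisect run` and a script that reruns the test and exits with code 1 on any failure.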
We're studying this — and we'd love your data
We're running a quick 6-question survey on how engineering teams experience and handle flaky CI today:
- How much time is your team actually losing per week?
- What tools are you using (or not)?
- What would you pay for a tool that automatically identifies the introducing commit — no manual git bisect needed?
The last question uses the Van Westendorp price-sensitivity method. We'll publish the aggregated results once we reach 30 responses.
Takes 3 minutes. No signup required.
If you've got a war story about a particularly expensive flaky test, drop it in the comments. I'd genuinely like to read it.