Flaky CI tests are a productivity tax that most engineering teams quietly absorb. A test reruns, passes, and everyone moves on — but the cost compounds.
According to research from Google and various CI vendors, a single flaky test suite can add 15–30 minutes of developer wait time per day. Multiply that by team size and you get hundreds of engineering-hours lost per quarter to tests that fail for reasons unrelated to the change being reviewed.
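As a rough back-of-the-envelope version of that multiplication, here is a small sketch. It assumes the 15–30 minutes applies per developer per day and that a quarter has about 65 working days; both are assumptions for illustration, as is the team size of 10, and none of them come from the research mentioned above.

```python
def quarterly_hours_lost(team_size, minutes_per_dev_per_day, working_days=65):
    """Estimate engineering-hours lost per quarter to flaky-test wait time.

    All inputs are illustrative assumptions, not measured data.
    """
    return team_size * minutes_per_dev_per_day * working_days / 60


# Hypothetical team of 10 engineers at the low and high end of the range.
low = quarterly_hours_lost(team_size=10, minutes_per_dev_per_day=15)
high = quarterly_hours_lost(team_size=10, minutes_per_dev_per_day=30)
print(f"{low:.1f} to {high:.1f} hours per quarter")
```

Under those assumptions a 10-person team lands in the low hundreds of hours per quarter; your own numbers will differ.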
The harder problem: even when you know a test is flaky, finding the commit that introduced the flakiness usually means running git bisect by hand — a process that can eat an entire afternoon.
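For reference, the manual process can be partly scripted with `git bisect run`, which treats exit code 0 as "good" and 1 as "bad". The sketch below is one possible helper, not how Culprit works: it reruns a test command several times at each commit and reports the commit as bad if any run fails. The file name `flaky_check.py`, the test path, and the default of 20 runs are all hypothetical. Note the limitation: if a test fails with probability p, it still passes N runs in a row with probability (1 - p)^N, so a rarely failing test can be misclassified as good.

```python
# Hypothetical usage (file name and test path are placeholders):
#   git bisect start <bad-commit> <good-commit>
#   git bisect run python flaky_check.py pytest tests/test_example.py
import subprocess
import sys

GOOD, BAD = 0, 1  # exit codes that `git bisect run` reads as good / bad


def classify_commit(test_cmd, runs=20):
    """Run test_cmd up to `runs` times; return BAD on the first failure.

    A commit is only called GOOD if every run passes, which is a
    probabilistic judgment for a flaky test, not a proof.
    """
    for _ in range(runs):
        result = subprocess.run(
            test_cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            return BAD
    return GOOD


# Only act when a test command was passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(classify_commit(sys.argv[1:]))
```

Raising the number of runs lowers the chance of a false "good" but makes each bisect step slower, which is much of why doing this by hand takes so long.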
I'm running a quick community survey
I'm researching how engineering teams actually experience flaky CI and what they'd pay for a tool that automates the bisect process: it identifies the commit that first made a test flaky and posts the result as a PR comment.
If you have 3 minutes, please answer these 6 questions in the comments:
Q1. What's your role?
- Software Engineer / Senior Engineer
- Staff Engineer / Principal Engineer
- Engineering Manager / Director
- VP Engineering / CTO / Head of Engineering
Q2. How many engineers commit to your main CI pipeline each week?
- 1–3 / 4–10 / 11–25 / 26–50 / 51+
Q3. Roughly how many hours per week does your team collectively lose to flaky CI (reruns, investigations, context-switching)?
- <1 hr / 1–3 hrs / 4–8 hrs / 9–15 hrs / 16+ hrs
Q4. Which tools do you currently use to detect or manage flaky tests?
- Trunk / Buildkite Test Engine / Datadog CI Visibility / BuildPulse / Homegrown solution / None / Other
Q5. For a tool that automatically identifies the exact commit that introduced a flaky test and posts it as a PR comment, what price per committer per month would feel (a) too cheap to trust, (b) like a fair bargain, (c) expensive but still worth considering, and (d) too expensive to consider?
- Reference anchors: $10 / $18 / $30 / $50 per committer/month
Q6. If this tool offered a 14-day free trial (no credit card), how likely would you be to sign up?
- Very likely / Somewhat likely / Neutral / Unlikely / Very unlikely
I'll compile the results and share a summary post with the willingness-to-pay distribution and the tooling landscape. Comments are the raw data — the more specific the better.
Background: I'm building Culprit — a commit-level root cause tool for flaky CI. This survey informs the pricing model.