DEV Community

TechJect Studio

I'm building an AI that fixes broken CI pipelines — here's what I've learned so far

I've been thinking a lot about CI failures lately and wanted to hear how other teams actually handle them day to day.

When a pipeline fails on a PR, what does your process look like? Does the developer who opened the PR own the investigation? Does it escalate to a platform/DevOps engineer? Or does everyone just kind of wing it?

The parts I find most painful:

  • Scrolling through hundreds of lines of logs to find the one line that actually matters
  • Not knowing if the failure is my code or a flaky test
  • The cycle of "push fix → wait 8 minutes → fail again → repeat"
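
To make the first pain point concrete, here is a minimal sketch of the kind of crude filter I mean. It assumes plain-text logs; the pattern list, the `first_failure` helper, and the sample log are all hypothetical and made up for illustration, not taken from any real CI system.

```python
import re

# Hypothetical patterns; tune these for your own toolchain's output.
ERROR_PATTERNS = re.compile(r"\b(error|failed|exception|traceback)\b", re.IGNORECASE)


def first_failure(log_text, context=2):
    """Return the first matching log line plus `context` lines on each side."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if ERROR_PATTERNS.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            return lines[start:end]
    return []  # nothing matched any pattern


# Made-up log, only for demonstration.
sample_log = "\n".join([
    "step 1: install dependencies",
    "step 2: run tests",
    "test_login ... ok",
    "test_checkout ... FAILED",
    "AssertionError: expected 200, got 500",
    "step 3: upload artifacts",
])

for line in first_failure(sample_log, context=1):
    print(line)
```

A keyword filter like this will miss plenty of real failures and flag harmless lines that happen to contain "error", and it says nothing about whether a failure is flaky. That gap is part of what I'm asking about.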

I've seen teams handle this really differently. Some have runbooks, some just @-mention their DevOps person, some have built internal tooling.

A few things I'm curious about:

  1. How long does it typically take your team to go from "pipeline failed" to "root cause identified"?
  2. Do you have any automation that helps here, or is it mostly manual?
  3. What would actually make this better for you — better log UX? AI diagnosis? Something else?

I'm asking partly because I'm exploring building something in this space and want to make sure I'm solving a real problem, not just the one I personally have. So I'm genuinely curious about others' experiences.
