Albert Alov
Your CI Is Always Broken. Your AI Agent Has No Idea What to Do About It.

In any real codebase, CI always has something failing. The hard part isn't finding failures — it's knowing which ones block a release. Here's an MCP server that answers that question automatically.

Here's the situation every engineer knows:

You open CI. Something's failing. You need to ship.

Is it a real regression? A known flaky test? An infra blip that'll pass on retry? You open the logs, grep for errors, cross-reference with last week's run history, check what files changed in the PR, and 20 minutes later you have an answer.

Your AI agent can't do any of that. It sees the same raw logs you do. It doesn't know your flakiness history. It doesn't know which tests are affected by the code change. It guesses.

release-readiness-triage-mcp fixes this. 🚦


🧠 The three signals that actually matter

Triaging a CI failure requires correlating three things simultaneously:

1. Error signature deduplication

If 40 tests failed with ECONNREFUSED 127.0.0.1:5432, that's one problem (database didn't start), not 40. Grouping by normalized error signature tells you the real shape of the failure.
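As a minimal sketch of what signature normalization might look like (the function names and regexes here are illustrative, not the server's actual implementation):

```typescript
// Hypothetical sketch: collapse run-specific values so the same underlying
// error produces the same signature, then group failures by that signature.
function normalizeSignature(message: string): string {
  return message
    .replace(/\b\d+(\.\d+)*\b/g, "N")   // collapse numbers (ports, IPs, durations)
    .replace(/0x[0-9a-fA-F]+/g, "HEX")  // collapse hex addresses
    .replace(/["'][^"']*["']/g, '"…"')  // collapse quoted values
    .trim();
}

function groupFailures(failures: { test: string; error: string }[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const f of failures) {
    const sig = normalizeSignature(f.error);
    const tests = groups.get(sig) ?? [];
    tests.push(f.test);
    groups.set(sig, tests);
  }
  return groups;
}
```

With this, `ECONNREFUSED 127.0.0.1:5432` and `ECONNREFUSED 127.0.0.1:6379` both normalize to `ECONNREFUSED N:N` and land in one group.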

2. Flakiness history

Some tests fail 70% of the time on a good day. If a test has a 0.73 flaky probability in your history, its failure today tells you nothing about the code.
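A rough sketch of this lookup, assuming the flakiness database maps test names to historical failure probabilities (the thresholds and verdict labels are illustrative, not the server's actual cutoffs):

```typescript
// Hypothetical classification: high probability = known flaky,
// low-but-nonzero = mildly flaky, absent = no history.
type FlakeVerdict = "known-flaky" | "mildly-flaky" | "no-history";

function classifyFlakiness(
  history: Record<string, number>, // test name → historical failure probability
  testName: string
): FlakeVerdict {
  const p = history[testName];
  if (p === undefined) return "no-history";
  return p >= 0.5 ? "known-flaky" : "mildly-flaky";
}
```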

3. Code-change correlation

If Button.test.tsx is failing and Button.tsx is in the diff, that's suspicious. If AuthFlow.test.tsx is failing and nothing in auth changed, that's noise.
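One possible heuristic for this matching (an assumption for illustration — the real tool also accepts pre-computed affected-test lists, which are more precise than name matching):

```typescript
// Hypothetical correlation check: a failing test file is suspicious if it
// appears in a pre-computed affected-test list, or if it shares a base name
// with a changed source file (Button.test.tsx ↔ Button.tsx).
function isCorrelated(
  testFile: string,
  changedFiles: string[],
  affectedTests: string[] = []
): boolean {
  if (affectedTests.includes(testFile)) return true;
  const base = testFile.replace(/\.(test|spec)\.[jt]sx?$/, "");
  return changedFiles.some((f) => f.replace(/\.[jt]sx?$/, "") === base);
}
```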

Without all three signals in one place, you can't answer "is this safe to release?" You just accumulate tabs.


🛠️ The 4 tools

aggregate_suite_failures

First pass: normalize, deduplicate, categorize.

```
CI Run Summary
  Total tests:   847
  Passed:        842
  Failed:        5
  Failure rate:  0.59%
  Error groups:  3

Failure Groups (by frequency):
  [NETWORK] 2x — connect ECONNREFUSED 127.0.0.1:Xms
    • API Suite > health check
    • API Suite > readiness probe
  [ASSERTION] 2x — expect(received).toBe(expected)
    • Search Suite > debounce timing
    • Search Suite > sort order
  [ASSERTION] 1x — Expected null, got <button>Submit</button>
    • Button Suite > renders button correctly
```

Supports `customInfraPatterns` — pass cloud-specific strings like "GCP quota exceeded" or "No space left on device" to classify them as infrastructure noise instead of unknown failures.
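The `customInfraPatterns` option name comes from the tool's docs; the matching logic below is a guess at how such classification might work, with illustrative default patterns:

```typescript
// Hypothetical sketch: an error is infrastructure noise if it contains any
// built-in or user-supplied infra pattern as a substring.
const DEFAULT_INFRA_PATTERNS = ["ECONNREFUSED", "ETIMEDOUT", "socket hang up"];

function classifyError(
  error: string,
  customInfraPatterns: string[] = []
): "infrastructure" | "unknown" {
  const patterns = [...DEFAULT_INFRA_PATTERNS, ...customInfraPatterns];
  return patterns.some((p) => error.includes(p)) ? "infrastructure" : "unknown";
}
```

Without the custom pattern, a cloud-specific error would fall through to `"unknown"` and show up as an unexplained failure in the summary.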

cross_reference_flakiness

Takes your flakiness database and scores each failure:

```
Flakiness Cross-Reference

  [KNOWN FLAKY] Auth Suite > login with expired token
    Flaky probability: 73%
  [MILDLY FLAKY] Search Suite > debounce timing
    Flaky probability: 22%
  [NO HISTORY] Button Suite > renders button correctly
    Not found in flakiness database
```

correlate_code_changes

Matches changed files against failing tests. Works standalone or with pre-computed affected test lists from ast-impact-mapper-mcp:

```
Code Change Correlation
  Changed files: 2
  Pre-identified affected tests: 1

  [CORRELATED] Button Suite > renders button correctly
    → Matched via affected test list
  [NOT CORRELATED] Search Suite > debounce timing
  [NOT CORRELATED] Auth Suite > login with expired token
```

generate_release_recommendation

The final step. Everything combined into one verdict:

```markdown
## 🔴 Release Recommendation: NO_GO (75% confidence)

> 1 confirmed regression(s) directly correlated with code changes. Do not release.

| Category            | Count |
|---------------------|-------|
| Total failures      | 5     |
| 🔴 Real regressions | 1     |
| 🟡 Known flaky      | 2     |
| ⚪ Infra blips      | 2     |
| ❓ Unknown          | 0     |

### 🔴 Blockers (must fix before release)

**Button Suite > renders button correctly**
- Test is directly affected by code changes in this commit
- `Expected null, got <button>Submit</button>`

### ✅ Safe to ignore

- ~~Auth Suite > login with expired token~~ — Historically flaky: 73% failure rate in history
- ~~API Suite > health check~~ — Error pattern matches infrastructure issues (network)
- ~~Search Suite > debounce timing~~ — Mildly flaky: 22% historical failure rate
- ~~Storage Suite > upload avatar~~ — Error pattern matches infrastructure issues (network)
```

Pass `format: "markdown"` and the output is ready to paste directly into a GitHub PR comment or Slack message.
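A rough sketch of how the three signals might combine into a verdict — the interface, names, and thresholds here are assumptions for illustration, not the package's actual decision logic:

```typescript
// Hypothetical decision rule: a failure blocks the release only if it
// correlates with the diff, isn't infrastructure noise, and isn't
// historically flaky. Everything else is safe to ignore.
interface TriagedFailure {
  test: string;
  flakyProbability: number | null; // null = no history in the database
  correlatedWithDiff: boolean;
  infraNoise: boolean;
}

function recommend(failures: TriagedFailure[]): "GO" | "NO_GO" {
  const blockers = failures.filter(
    (f) => f.correlatedWithDiff && !f.infraNoise && (f.flakyProbability ?? 0) < 0.5
  );
  return blockers.length > 0 ? "NO_GO" : "GO";
}
```

Note the asymmetry: a flaky test that happens to correlate with the diff still gets waved through, while a never-before-seen failure on a touched file blocks the release.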


🔗 It's a meta-orchestrator

This MCP is designed to sit on top of the other tools in the ecosystem:

  • flakiness-knowledge-graph-mcp builds the flakiness database from run history — feed its output into cross_reference_flakiness
  • ast-impact-mapper-mcp computes which tests are affected by a code change via TypeScript AST — feed its output into correlate_code_changes
  • playwright-trace-decoder-mcp decodes trace files for individual failure root-cause — use it after getting a NO_GO to understand the blocker

The agent orchestrates the chain; each MCP supplies one capability the agent couldn't have without tool access.


⚡ Setup

```json
{
  "mcpServers": {
    "release-readiness-triage": {
      "command": "npx",
      "args": ["-y", "release-readiness-triage-mcp"]
    }
  }
}
```

Then just ask:

"Here are the failures from CI, our flakiness history, and the files changed in this PR. Is it safe to release?"

One answer. No log reading.


📦 Links

```shell
npx release-readiness-triage-mcp
```
