Tomer Lihovetsky

Stop Drowning in CI Noise: QAI Agent Clusters Your Test Failures and Tells You What Actually Broke

You open a PR. CI is red. There are 47 failed tests.

Now what?

You scroll through a wall of test names. Some look related. Some look flaky. Some are probably the same root cause repeated across 20 test cases. You don't know which to fix first, or whether it's even safe to merge.

This is CI noise — and it's eating engineering time every single day.


What QAI Agent does

QAI Agent is a GitHub Action that runs after your tests and posts an intelligent summary directly on the pull request.

It does three things:

1. Clusters failures by root cause

Instead of showing you 47 test names, it groups tests that failed for the same underlying reason. If 30 tests all hit the same null pointer, that's one cluster — one thing to fix.

It works by normalizing error messages: it strips timestamps, line numbers, UUIDs, memory addresses, file paths, and variable values, then hashes the result. Tests whose normalized signatures match are treated as the same failure.
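To make the idea concrete, here is a minimal Python sketch of normalize-then-hash clustering. It is not QAI Agent's actual code: the regex rules, the placeholder tokens, and the function names (`normalize`, `signature`, `cluster`) are all assumptions chosen for illustration.

```python
import hashlib
import re
from collections import defaultdict

# Illustrative rules only (assumptions, not the action's real patterns).
# Order matters: specific patterns run before the catch-all number rule.
RULES = [
    (re.compile(r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"), "<uuid>"),
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"), "<ts>"),
    (re.compile(r"0x[0-9a-fA-F]+"), "<addr>"),
    (re.compile(r"(?:[A-Za-z]:)?(?:[\\/][\w.-]+)+"), "<path>"),
    (re.compile(r"'[^']*'|\"[^\"]*\""), "<val>"),
    (re.compile(r"\d+"), "<n>"),
]


def normalize(message: str) -> str:
    """Replace run-specific details with stable placeholders."""
    for pattern, placeholder in RULES:
        message = pattern.sub(placeholder, message)
    return message


def signature(message: str) -> str:
    """Hash the normalized message into a short cluster key."""
    return hashlib.sha256(normalize(message).encode("utf-8")).hexdigest()[:12]


def cluster(failures: dict[str, str]) -> dict[str, list[str]]:
    """Group test names by the signature of their error message."""
    groups = defaultdict(list)
    for test_name, message in failures.items():
        groups[signature(message)].append(test_name)
    return dict(groups)
```

With rules like these, two null-pointer errors raised from different files and line numbers collapse into one cluster, while a timeout stays in its own.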

2. Scores PR risk

Based on the fail rate and number of unique failure patterns, it outputs a risk level: low, medium, or high. You can use this to automatically block merges on high-risk PRs.
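As a rough illustration of how a fail rate and a cluster count can be turned into a level, here is a small Python sketch. The thresholds and the `risk_level` function are hypothetical; the article does not state the cutoffs QAI Agent actually uses.

```python
def risk_level(
    failed: int,
    total: int,
    cluster_count: int,
    *,
    low_fail_rate: float = 0.05,   # assumed cutoff, not from the action
    high_fail_rate: float = 0.20,  # assumed cutoff, not from the action
    high_clusters: int = 5,        # assumed cutoff, not from the action
) -> str:
    """Map fail rate and unique failure patterns to low / medium / high."""
    if total <= 0:
        raise ValueError("total must be positive")
    fail_rate = failed / total
    # Many failures, or many distinct causes, both point to a risky change.
    if fail_rate >= high_fail_rate or cluster_count >= high_clusters:
        return "high"
    # Few failures that share at most one cause are treated as low risk.
    if fail_rate < low_fail_rate and cluster_count <= 1:
        return "low"
    return "medium"
```

The point of combining the two signals is that 30 failures with one shared cause and 30 failures with 30 different causes are very different situations, even though the fail rate is identical.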

3. Analyzes Playwright traces (optional)

If you're using Playwright and save traces on failure, QAI Agent will unzip and analyze them locally — no cloud required. It detects five failure categories:

| Cause | How it's detected |
|---|---|
| UI Changed | Locator not found, strict mode violation |
| Backend Error | HTTP 5xx response during test |
| Test Bug | Assertion errors in console logs |
| Timing / Flaky | Timeout on step |
| Environment Failure | Network failures, ECONNREFUSED |
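The table above maps naturally onto a small rule-based classifier. The Python sketch below is only an approximation under stated assumptions: the `TraceSignals` structure, the exact substring checks, the order in which rules are tried, and the `"Unknown"` fallback are all hypothetical and are not taken from QAI Agent's source.

```python
from dataclasses import dataclass, field


@dataclass
class TraceSignals:
    """Hypothetical, simplified view of what was extracted from one trace."""
    error_message: str = ""
    http_statuses: list[int] = field(default_factory=list)
    console_logs: list[str] = field(default_factory=list)
    network_errors: list[str] = field(default_factory=list)


def classify(signals: TraceSignals) -> str:
    """Return the first matching category; the rule order is an assumption."""
    message = signals.error_message.lower()
    if signals.network_errors:
        return "Environment Failure"
    if any(500 <= status <= 599 for status in signals.http_statuses):
        return "Backend Error"
    if "strict mode violation" in message or "not found" in message:
        return "UI Changed"
    if any("assert" in line.lower() for line in signals.console_logs):
        return "Test Bug"
    if "timeout" in message:
        return "Timing / Flaky"
    return "Unknown"
```

Rule order matters in a scheme like this: a step that times out while the backend is returning 503s is more usefully reported as a backend error than as a flaky test, which is why the timeout check comes last here.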

Setup in 60 seconds

Add one step to your existing workflow, after your tests run:

- name: QAI Agent
  uses: useqai/qai-agent@v1
  if: always()
  with:
    junit-path: 'test-results/results.xml'

Your workflow needs the pull-requests: write permission so the action can post its comment:

jobs:
  test:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        # Assumes the JUnit reporter is configured to write test-results/results.xml
        run: npx playwright test --reporter=junit
      - name: QAI Agent
        uses: useqai/qai-agent@v1
        if: always()
        with:
          junit-path: 'test-results/results.xml'
          trace-path: 'test-results/**/*.zip'   # optional, for RCA

That's it. No account. No API key. No configuration.

The PR comment it generates

Every PR gets a summary comment. It shows:

  • Risk level and merge recommendation
  • Failed tests with their error messages
  • Failure clusters (grouped by root cause)
  • RCA analysis from Playwright traces (if provided)

The comment is upserted — it updates in place when you push new commits, so it doesn't spam your PR timeline.
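One common way to implement this kind of upsert is to tag the comment with a hidden marker and look for it on later runs. The Python sketch below shows only the decision logic; the `MARKER` string and the `plan_upsert` function are hypothetical, and the article does not say how QAI Agent identifies its own comment. The input is assumed to be a list of comment objects with `id` and `body` fields, as returned by the GitHub REST API for issue comments.

```python
MARKER = "<!-- qai-agent-summary -->"  # hypothetical hidden marker


def plan_upsert(existing_comments: list[dict], new_body: str) -> dict:
    """Decide whether to update a previous summary comment or create a new one."""
    body = f"{MARKER}\n{new_body}"
    for comment in existing_comments:
        # A comment carrying the marker was posted by an earlier run.
        if MARKER in (comment.get("body") or ""):
            return {"action": "update", "comment_id": comment["id"], "body": body}
    return {"action": "create", "comment_id": None, "body": body}
```

Because HTML comments are not rendered in GitHub's Markdown, a marker like this stays invisible to readers while still letting the next run find the right comment to edit.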

Block merges on high risk

QAI Agent exposes outputs you can use in subsequent steps:

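- name: QAI Agent
  id: qai
  uses: useqai/qai-agent@v1
  if: always()   # run even when the test step fails
  with:
    junit-path: 'test-results/results.xml'

- name: Block merge on high risk
  # always() is required here: without a status check function, this step is skipped once an earlier step has failed
  if: always() && steps.qai.outputs.risk-level == 'high'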
  run: |
    echo "High risk — investigate failures before merging"
    exit 1

Available outputs: risk-level, risk-score, failed-tests, total-tests, cluster-count.

Works with any JUnit-compatible framework

| Framework | How to get JUnit output |
|---|---|
| Playwright | --reporter=junit |
| Jest | --reporters=jest-junit |
| Vitest | --reporter=junit |
| pytest | --junitxml=results.xml |
| Maven/JUnit | built-in |
| Go (gotestsum) | --junitfile results.xml |
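All of these tools emit roughly the same XML shape: `testcase` elements that contain a `failure` or `error` child when something went wrong. As a sketch of what reading such a report involves, here is a short Python example using only the standard library. The `read_failures` function and the sample report are made up for illustration and are not QAI Agent's parser; real reports vary between frameworks.

```python
import xml.etree.ElementTree as ET

# Made-up sample report for illustration.
SAMPLE = """<testsuites>
  <testsuite name="cart" tests="3" failures="1" errors="1">
    <testcase classname="cart.spec" name="loads"/>
    <testcase classname="cart.spec" name="adds item">
      <failure message="expected 1 to be 2">stack trace here</failure>
    </testcase>
    <testcase classname="cart.spec" name="checkout">
      <error>connection reset</error>
    </testcase>
  </testsuite>
</testsuites>"""


def read_failures(xml_text: str) -> tuple[int, list[tuple[str, str]]]:
    """Return (total test count, [(test name, error message), ...])."""
    root = ET.fromstring(xml_text)
    total = 0
    failures = []
    # iter() works whether the root is <testsuites> or a single <testsuite>.
    for case in root.iter("testcase"):
        total += 1
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is not None:
            name = f'{case.get("classname", "")}::{case.get("name", "")}'
            # Prefer the message attribute, fall back to the element text.
            message = problem.get("message") or (problem.text or "").strip()
            failures.append((name, message))
    return total, failures
```

Calling `read_failures(SAMPLE)` yields the total test count plus one `(name, message)` pair per failing test, which is the input a clustering step needs.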

What it doesn't do (yet)

To be honest about limitations:

  • No historical context — without connecting a cloud backend, QAI Agent only sees the current run. It can't tell you "this failure has been flaky for 3 weeks."
  • No LLM explanations — the RCA is rule-based, not AI-generated. It detects categories of failure, not the specific cause in your code.
  • Playwright traces only — the RCA analysis only works with Playwright trace zip files, not other test frameworks.

A cloud platform that adds historical trends, flakiness tracking, and LLM-powered root cause analysis is in development. But the standalone action is genuinely useful today without any of that.

Try it

  • GitHub Action: useqai/qai-agent on the Marketplace
  • Source: github.com/useqai/qai-agent

If you try it, open an issue or leave a comment here — especially if you run into a framework or JUnit variant that doesn't parse correctly. Happy to fix it.
