DEV Community

manoj mallick

From Red CI to Green PR — Automatically, Safely, and with Evidence

GitHub Copilot CLI Challenge Submission


This is a submission for the GitHub Copilot CLI Challenge.


What I Built

I built copilot-ci-doctor, a CLI tool that diagnoses and fixes GitHub Actions CI failures using GitHub Copilot CLI as its core reasoning engine.

Instead of manually digging through noisy logs and guessing fixes, the tool turns a failed CI run into a structured, evidence-based workflow:

failure → evidence → reasoning → safe fix → green CI → Pull Request

Given a failed workflow, copilot-ci-doctor:

  • Collects a tagged Evidence Bundle (repo metadata, failed jobs, logs, workflow YAML)
  • Uses GitHub Copilot CLI to reason about the failure
  • Explains why the CI failed in plain English
  • Generates minimal, safe patch diffs with confidence scores
  • Iteratively applies fixes until CI passes
  • Automatically opens a Pull Request against main
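The tagged Evidence Bundle can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not copilot-ci-doctor's actual implementation. The `RawEvidence`/`EvidenceItem` types, the `kind` values, the 4,000-character cap, and the `buildEvidenceBundle`/`renderBundle` helpers are all hypothetical. The sketch shows the core mechanism: every collected item gets a stable tag (E1, E2, and so on) that later reasoning can cite.

```typescript
// Hypothetical evidence categories, mirroring the inputs listed above.
type EvidenceKind = "repo-metadata" | "failed-job" | "log" | "workflow-yaml";

interface RawEvidence {
  kind: EvidenceKind;
  source: string; // e.g. a file path or job name
  content: string;
}

interface EvidenceItem extends RawEvidence {
  id: string; // stable tag such as "E1" that later answers must cite
}

// Assign sequential IDs (E1, E2, ...) and keep only the tail of oversized
// content, since the decisive error lines of a CI log are often near the end.
function buildEvidenceBundle(raw: RawEvidence[], maxChars = 4000): EvidenceItem[] {
  return raw.map((item, index) => ({
    ...item,
    id: `E${index + 1}`,
    content: item.content.length > maxChars ? item.content.slice(-maxChars) : item.content,
  }));
}

// Render the bundle as tagged text blocks that can be embedded in a prompt.
function renderBundle(bundle: EvidenceItem[]): string {
  return bundle.map((e) => `[${e.id}] (${e.kind}) ${e.source}\n${e.content}`).join("\n\n");
}

// Usage with made-up sample inputs.
const bundle = buildEvidenceBundle([
  { kind: "workflow-yaml", source: ".github/workflows/ci.yml", content: "runs-on: ubuntu-latest" },
  { kind: "log", source: "job: test", content: "example error line from the failing step" },
]);
console.log(renderBundle(bundle));
```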

This is not log summarization or autocomplete.
Copilot is used as a reasoning engine that must justify its conclusions using evidence.


Demo

40-second end-to-end demo (recommended viewing):

https://www.youtube.com/watch?v=6w3kjiRh8as

👉 https://github.com/manojmallick/copilot-ci-doctor#-40-second-demo-end-to-end

One command → failing CI → Copilot reasoning → safe fixes → green CI → PR

npx copilot-ci-doctor demo

What the demo shows:

  1. A demo repository is created with a deliberately broken GitHub Actions workflow
  2. CI fails ❌
  3. copilot-ci-doctor enters an automated loop:
      • analyzes the failure
      • explains the root cause
      • proposes a minimal patch
      • applies and pushes the fix
      • waits for CI to re-run
  4. The process repeats (multiple iterations if needed)
  5. CI turns green ✅
  6. A Pull Request is automatically opened with the fix
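The automated loop above (analyze, explain, patch, push, wait, repeat) can be sketched as a bounded retry loop. This is a hypothetical outline, not the tool's code: `RepairDeps`, `repairLoop`, and the default budget of five iterations are assumptions, and the real Copilot CLI, git, and GitHub calls are replaced by injected functions so the control flow can be run with fakes.

```typescript
type CiResult = "success" | "failure";

interface Proposal {
  rootCause: string; // plain-English explanation of the failure
  diff: string; // minimal unified diff
  confidence: number; // 0..1
}

// Hypothetical dependencies. A real implementation would back these with
// Copilot CLI, git, and the GitHub API; injecting them keeps the loop testable.
interface RepairDeps {
  analyze(iteration: number): Promise<Proposal>;
  applyAndPush(diff: string): Promise<void>;
  waitForCi(): Promise<CiResult>;
  openPullRequest(): Promise<string>; // resolves to the PR URL
}

interface IterationRecord {
  iteration: number;
  rootCause: string;
  confidence: number;
  result: CiResult;
}

interface LoopOutcome {
  ok: boolean;
  prUrl?: string;
  scoreboard: IterationRecord[]; // one row per attempt
}

async function repairLoop(deps: RepairDeps, maxIterations = 5): Promise<LoopOutcome> {
  const scoreboard: IterationRecord[] = [];
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const proposal = await deps.analyze(iteration);
    await deps.applyAndPush(proposal.diff);
    const result = await deps.waitForCi();
    scoreboard.push({ iteration, rootCause: proposal.rootCause, confidence: proposal.confidence, result });
    if (result === "success") {
      return { ok: true, prUrl: await deps.openPullRequest(), scoreboard };
    }
  }
  // Stop after a fixed budget instead of looping forever; a human takes over.
  return { ok: false, scoreboard };
}

// Usage with fakes: CI fails once, then passes.
const fakeCiResults: CiResult[] = ["failure", "success"];
const demoRun = repairLoop({
  analyze: async (i) => ({ rootCause: `cause ${i}`, diff: `diff ${i}`, confidence: 0.8 }),
  applyAndPush: async () => {},
  waitForCi: async (): Promise<CiResult> => fakeCiResults.shift() ?? "failure",
  openPullRequest: async () => "https://example.com/pr/1",
});
demoRun.then((outcome) => console.log(outcome.ok, outcome.scoreboard.length)); // prints "true 2"
```

In this sketch, the iteration cap means that if the proposed fixes never turn CI green, the loop stops and leaves a scoreboard of attempts for a human instead of pushing patches indefinitely.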

The demo handles real GitHub latency and shows the full lifecycle, including:

  • multiple CI failures
  • diff previews
  • iteration scoreboard
  • final PR link
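Handling real GitHub latency mostly comes down to waiting for a workflow run to finish without hammering the API. Below is a minimal sketch with capped exponential backoff, assuming a hypothetical `getRunStatus` callback; a real version might read the `status` and `conclusion` fields of a workflow run from the GitHub REST API. The delay and timeout values are illustrative, not the tool's settings.

```typescript
// Shape loosely modeled on a GitHub Actions workflow run, which reports a
// `status` and, once completed, a `conclusion`.
interface RunStatus {
  status: string; // e.g. "queued", "in_progress", "completed"
  conclusion: string | null; // e.g. "success" or "failure" once completed
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

// Poll until the run completes, doubling the delay between checks (capped at
// maxDelayMs) so a slow queue does not turn into a burst of API calls.
async function waitForRun(
  getRunStatus: () => Promise<RunStatus>,
  { initialDelayMs = 2000, maxDelayMs = 30000, timeoutMs = 10 * 60 * 1000 } = {},
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  let delayMs = initialDelayMs;
  while (Date.now() < deadline) {
    const run = await getRunStatus();
    if (run.status === "completed") return run.conclusion ?? "unknown";
    await sleep(delayMs);
    delayMs = Math.min(delayMs * 2, maxDelayMs);
  }
  throw new Error(`CI run did not complete within ${timeoutMs} ms`);
}

// Usage with a fake status source that completes on the third check.
const fakeStatuses: RunStatus[] = [
  { status: "queued", conclusion: null },
  { status: "in_progress", conclusion: null },
  { status: "completed", conclusion: "success" },
];
const waited = waitForRun(
  async () => fakeStatuses.shift() ?? { status: "completed", conclusion: "failure" },
  { initialDelayMs: 1, maxDelayMs: 5 },
);
waited.then((conclusion) => console.log(conclusion)); // prints "success"
```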

Source code and demo assets:
https://github.com/manojmallick/copilot-ci-doctor

npm package:
https://www.npmjs.com/package/copilot-ci-doctor


My Experience with GitHub Copilot CLI

This project fundamentally changed how I think about GitHub Copilot.

Instead of using Copilot to write code, I used GitHub Copilot CLI to reason about systems.

Copilot CLI is used to:

  • analyze CI evidence and form ranked hypotheses
  • explain failures in plain English (including why CI fails even when the build passes locally)
  • generate minimal unified diffs, not full rewrites
  • attach confidence scores and risk levels to each fix
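Ranked hypotheses, minimal diffs, confidence scores, and risk levels can be combined into a gate that every proposed fix must pass before it is applied. In this sketch the `ProposedFix` shape, the 20-line limit, the 0.6 confidence threshold, and the rule of rejecting high-risk fixes outright are all assumptions chosen for illustration; the tool's actual criteria may differ.

```typescript
type Risk = "low" | "medium" | "high";

interface ProposedFix {
  hypothesis: string;
  confidence: number; // 0..1, as reported by the model
  risk: Risk;
  diff: string; // unified diff text
}

// Matches hunk headers like "@@ -12,3 +12,3 @@"; an omitted count means 1.
const HUNK_HEADER = /^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@/;

// Count touched files and added/removed lines, using the hunk header counts so
// changed lines that happen to start with "---" or "+++" are not mistaken for file headers.
function diffStats(diff: string): { files: number; added: number; removed: number } {
  let files = 0;
  let added = 0;
  let removed = 0;
  let oldLeft = 0; // lines still expected from the old side of the current hunk
  let newLeft = 0; // lines still expected from the new side
  for (const line of diff.split("\n")) {
    if (oldLeft > 0 || newLeft > 0) {
      if (line.startsWith("+")) { added++; newLeft--; }
      else if (line.startsWith("-")) { removed++; oldLeft--; }
      else if (!line.startsWith("\\")) { oldLeft--; newLeft--; } // context line
      continue; // "\ No newline at end of file" markers are skipped
    }
    const hunk = HUNK_HEADER.exec(line);
    if (hunk) {
      oldLeft = hunk[1] === undefined ? 1 : Number(hunk[1]);
      newLeft = hunk[2] === undefined ? 1 : Number(hunk[2]);
    } else if (line.startsWith("+++ ")) {
      files++;
    }
  }
  return { files, added, removed };
}

// Reject patches that are empty, sprawling, high-risk, or low-confidence.
function acceptFix(fix: ProposedFix, maxChangedLines = 20, minConfidence = 0.6): boolean {
  const { files, added, removed } = diffStats(fix.diff);
  if (files === 0) return false; // not a usable diff
  if (added + removed > maxChangedLines) return false; // not minimal
  if (fix.risk === "high") return false; // leave risky changes to a human
  return fix.confidence >= minConfidence;
}

// Try hypotheses from most to least confident; return the first acceptable fix.
function pickFix(fixes: ProposedFix[]): ProposedFix | undefined {
  return [...fixes].sort((a, b) => b.confidence - a.confidence).find((fix) => acceptFix(fix));
}

// Usage with a made-up diff that bumps a Node version in a workflow file.
const sampleDiff = [
  "--- a/.github/workflows/ci.yml",
  "+++ b/.github/workflows/ci.yml",
  "@@ -12,3 +12,3 @@",
  "       with:",
  "-          node-version: 14",
  "+          node-version: 20",
  "           cache: npm",
].join("\n");

const fixes: ProposedFix[] = [
  { hypothesis: "Lockfile generated with a newer Node version", confidence: 0.4, risk: "low", diff: sampleDiff },
  { hypothesis: "Node version too old for a dependency", confidence: 0.85, risk: "low", diff: sampleDiff },
];
console.log(pickFix(fixes)?.hypothesis); // prints "Node version too old for a dependency"
```

Sorting by confidence before gating mirrors the ranked-hypotheses idea: in this sketch, a lower-ranked fix is only considered if every more confident one fails the gate.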

To make this reliable:

  • Every Copilot response must follow a strict JSON contract
  • Every conclusion must reference evidence IDs (E1, E2, …)
  • Patch diffs are validated and normalized before being applied
  • A single-call mode combines analysis, explanation, and patch generation in one Copilot request, reducing token usage by about 60%
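A strict JSON contract with evidence citations can be enforced by a validator that runs before anything is applied. The sketch below assumes a hypothetical single-call response shape (`hypotheses`, `explanation`, `patch`) and uses hand-written checks rather than a schema library; the field names and the normalization rules (stripping a markdown fence, ensuring a trailing newline) are illustrative, not copilot-ci-doctor's actual contract.

```typescript
type Risk = "low" | "medium" | "high";

// Hypothetical single-call response: analysis, explanation, and patch together.
interface Hypothesis {
  claim: string;
  evidence: string[]; // evidence IDs such as "E2"
  confidence: number; // 0..1
}

interface CombinedResponse {
  hypotheses: Hypothesis[];
  explanation: string;
  patch: { diff: string; confidence: number; risk: Risk };
}

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);
const isUnitNumber = (v: unknown): v is number => typeof v === "number" && v >= 0 && v <= 1;
const isRisk = (v: unknown): v is Risk => v === "low" || v === "medium" || v === "high";

// Strip a markdown fence the model may wrap around the diff and make sure the
// patch ends with a newline; a patch whose last line lacks one can be rejected
// as corrupt by tools such as `git apply`.
function normalizeDiff(diff: string): string {
  const fenced = /^\s*`{3}[\w-]*\n([\s\S]*?)\n`{3}\s*$/.exec(diff);
  const body = fenced ? fenced[1] : diff;
  return body.endsWith("\n") ? body : body + "\n";
}

// Parse raw model output and throw on any contract violation, including
// citations of evidence IDs that were never part of the bundle.
function parseResponse(raw: string, knownIds: Set<string>): CombinedResponse {
  const data: unknown = JSON.parse(raw);
  if (!isObject(data)) throw new Error("response is not a JSON object");
  const { hypotheses, explanation, patch } = data;

  if (!Array.isArray(hypotheses) || hypotheses.length === 0) throw new Error("no hypotheses");
  for (const h of hypotheses) {
    if (!isObject(h) || typeof h.claim !== "string" || !isUnitNumber(h.confidence)) {
      throw new Error("malformed hypothesis");
    }
    if (!Array.isArray(h.evidence) || h.evidence.length === 0) {
      throw new Error(`hypothesis "${h.claim}" cites no evidence`);
    }
    for (const id of h.evidence) {
      if (typeof id !== "string" || !knownIds.has(id)) throw new Error(`unknown evidence ID: ${String(id)}`);
    }
  }
  if (typeof explanation !== "string" || explanation.trim() === "") throw new Error("no explanation");
  if (!isObject(patch)) throw new Error("malformed patch");
  const { diff, confidence, risk } = patch;
  if (typeof diff !== "string" || !isUnitNumber(confidence) || !isRisk(risk)) {
    throw new Error("malformed patch");
  }
  return {
    hypotheses: hypotheses as Hypothesis[],
    explanation,
    patch: { diff: normalizeDiff(diff), confidence, risk },
  };
}

// Usage with a made-up response that cites bundle item E2.
const known = new Set(["E1", "E2"]);
const sample = JSON.stringify({
  hypotheses: [{ claim: "Node version too old for a dependency", evidence: ["E2"], confidence: 0.8 }],
  explanation: "The test job runs an older Node version than a dependency supports.",
  patch: { diff: "--- a/ci.yml\n+++ b/ci.yml", confidence: 0.8, risk: "low" },
});
console.log(parseResponse(sample, known).patch.diff.endsWith("\n")); // prints "true"
```

In this sketch a response that breaks the contract is rejected outright rather than repaired, which keeps the failure visible: the caller can re-prompt with the error message or stop.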

The result is a workflow where Copilot behaves less like an assistant and more like a careful, explainable CI engineer.

This challenge pushed me to think beyond autocomplete and explore how Copilot CLI can safely automate complex, real-world developer workflows.

Top comments (1)

manoj mallick

One thing I learned while building this:
“Automatic” without evidence is dangerous.
That’s why I designed the agent to output audit-ready artifacts instead of just suggestions.
Curious how others here think about AI agents + CI safety.