Sreenu Sasubilli

CI Guardian: Safe Human-in-the-Loop AI CI Remediation

This is a submission for the GitHub Copilot CLI Challenge.

What I Built

CI Guardian is a GitHub CLI extension (gh ci-guardian) that runs entirely from the terminal, feeding GitHub Actions logs into GitHub Copilot CLI for safe, human-in-the-loop remediation.

Instead of blindly applying AI-generated patches, CI Guardian analyzes real CI logs, summarizes the failure, and attempts a minimal fix only when it is low-risk. If the fix is unclear or unsafe, it stops and leaves the decision to a human; a short sketch of this loop follows the list below.

The tool can:

  • Diagnose CI failures with structured root-cause analysis
  • Attempt minimal, semantic fixes
  • Automatically open PRs only when patches apply cleanly
  • Refuse unsafe or low-confidence fixes and escalate to a human when necessary
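
At its core, this is a small decision loop. A minimal Python sketch of the assumed shape (the helper names are illustrative stand-ins, not the real ci_guardian internals):

def remediate(logs, diagnose, propose_fix, low_risk, applies_cleanly, open_pr, escalate):
    # Each step is passed in so the sketch stays self-contained; in the real
    # tool these would wrap Copilot CLI, git, and the GitHub API.
    diagnosis = diagnose(logs)            # structured root-cause analysis
    patch = propose_fix(diagnosis, logs)  # minimal unified diff, or None
    if patch is None or not low_risk(patch):
        escalate(diagnosis, patch)        # unclear or unsafe: a human decides
    elif not applies_cleanly(patch):
        escalate(diagnosis, patch)        # refuse, but preserve the diff for review
    else:
        open_pr(patch, diagnosis)         # low-risk and applies cleanly: open a PR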

I tested CI Guardian on both a small demo repo and a real fork of Flask, including scenarios with fork permissions, pull-request-only CI, and multiple workflows.

Demo

Repository:

https://github.com/sasubillis/gh-ci-guardian

The extension entrypoint maps directly to ci_guardian/cli.py, which handles run discovery, log extraction, Copilot prompting, patch validation, and PR creation.
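
For a sense of how little plumbing this needs, run discovery and log extraction can be done with the gh CLI alone. A rough sketch of those two steps (my own wrapper code, not the actual cli.py):

import json
import subprocess

def latest_failed_run(repo: str) -> str | None:
    # Find the most recent failed Actions run and return its ID.
    out = subprocess.run(
        ["gh", "run", "list", "--repo", repo, "--status", "failure",
         "--limit", "1", "--json", "databaseId"],
        capture_output=True, text=True, check=True,
    ).stdout
    runs = json.loads(out)
    return str(runs[0]["databaseId"]) if runs else None

def failed_step_logs(repo: str, run_id: str) -> str:
    # --log-failed limits the output to the failing steps only.
    return subprocess.run(
        ["gh", "run", "view", run_id, "--repo", repo, "--log-failed"],
        capture_output=True, text=True, check=True,
    ).stdout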

All screenshots below were captured against real repositories with real failing CI runs, including a fork of Flask to demonstrate behavior on a production-scale codebase.

Example usage:

# Diagnose the latest failing CI run
gh ci-guardian diagnose --latest --branch all

# Attempt a safe fix and open a PR if possible
gh ci-guardian fix --latest --branch all

What the demo shows:

  • CI failures diagnosed into structured JSON (an example shape is sketched after this list)
  • Copilot-generated unified diffs
  • Automatic PR creation when patches are safe
  • Graceful refusal with preserved diffs when fixes are unsafe (human-in-the-loop)
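
For illustration, the structured diagnosis could look something like this (an assumed shape, not CI Guardian's exact schema):

diagnosis = {
    "workflow": "Tests",
    "failing_job": "tests (ubuntu-latest, 3.12)",
    "root_cause": "ImportError in app.py: module name misspelled",
    "category": "code-error",
    "confidence": "high",  # low confidence routes straight to human review
    "suggested_fix": "correct the import statement in app.py",
}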

This behavior was demonstrated on a real Flask fork where CI failures only surface on pull requests, not direct pushes.

Diagnosis on Failing CI with demo repo

Fix made by ci-guardian on demo repo

PR opened on GitHub by ci-guardian
When a fix is safe and minimal, CI Guardian automatically opens a remediation pull request.

Diagnosis on Failing CI on real repo (Flask)
CI Guardian converts a real failing GitHub Actions run into a structured, machine-readable diagnosis using GitHub Copilot CLI.

Human-in-the-loop Intervention
CI Guardian safely refuses to auto-fix an ambiguous CI failure on a real Flask fork and escalates to human review.

My Experience with GitHub Copilot CLI

GitHub Copilot CLI was used as a reasoning engine, not a blind code generator. I used copilot -p to:

  • Summarize CI logs into structured root-cause explanations
  • Generate minimal unified diffs grounded in real failure logs
  • Draft concise pull request titles and descriptions

The key insight was that Copilot is most effective when paired with strict guardrails. CI Guardian treats Copilot output as a proposal, not a command, and enforces safety checks before applying any change. This results in automation that accelerates debugging without sacrificing trust or correctness.
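
Concretely, treating output as a proposal means every Copilot-generated diff is captured and dry-run validated before anything touches the working tree. A minimal sketch, assuming copilot -p for the non-interactive call (the wrapper names are mine, not CI Guardian's actual code):

import subprocess

def propose_patch(prompt: str) -> str:
    # Ask Copilot CLI for a unified diff; the output is only a proposal.
    return subprocess.run(
        ["copilot", "-p", prompt],
        capture_output=True, text=True, check=True,
    ).stdout

def applies_cleanly(patch: str) -> bool:
    # `git apply --check` dry-runs the diff without modifying any files.
    result = subprocess.run(
        ["git", "apply", "--check"],
        input=patch, capture_output=True, text=True,
    )
    return result.returncode == 0

Anything that fails the dry run is never applied; the diff is kept for a human to review instead.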
