GitHub Copilot CLI Challenge Submission
This is my submission for the GitHub Copilot CLI Challenge.
What I Built
Every developer knows the pain: your build breaks, your tests fail, and suddenly you're down a rabbit hole of stack traces, file-hopping, and manual patching. I've been there one too many times, so I built FixForward — a CLI autopilot that takes you from broken tests to a verified, ready-to-merge PR in a single command.
pip install fixforward
fixforward run
That's it. One command.
FixForward detects your test framework, runs the suite, parses the failures, classifies each one, asks GitHub Copilot CLI to generate a minimal fix, applies it on a safe branch, re-runs your tests to verify the fix actually works, and generates a PR description with a confidence score. The whole pipeline is automated, and Copilot CLI is the engine that generates the fixes.
Demo
Here is FixForward running against a real broken Python project. It detects the failing test, classifies the bug as an assertion failure, generates a one-line fix via Copilot CLI, patches the file, re-runs the tests, and reports 95% confidence — all without leaving the terminal:
The Pipeline: How It Works
FixForward is not a wrapper around a single Copilot prompt. It's a full incident response pipeline with 8 distinct stages:
tests fail → parse output → classify failure → Copilot generates fix
→ apply on safe branch → re-run tests → confidence score → PR report
1. Detect — Scans for pytest.ini, package.json, or Cargo.toml to identify your ecosystem automatically. No config files, no setup.
2. Run Tests — Executes your test suite (pytest, npm test, or cargo test) and captures full raw output.
3. Parse — Custom parsers extract individual failures with file paths, line numbers, error messages, and tracebacks. This isn't regex on the whole blob — there are dedicated parsers for pytest, Jest/Mocha, and cargo test that understand each format's quirks.
4. Classify — Categorizes every failure using heuristic pattern matching:
| Category | Examples | Confidence |
|---|---|---|
| syntax_error | SyntaxError, IndentationError, Unexpected token | 95% |
| dependency | ModuleNotFoundError, Cannot find module | 90% |
| api_change | AttributeError, TypeError (wrong args) | 85% |
| assertion | AssertionError, assert_eq!, expect().toEqual() | 85% |
| env_mismatch | Version conflicts, missing commands | 80% |
| lint | Flake8, ESLint, Clippy warnings | 75% |
| flaky_test | Timeouts, connection refused, intermittent | 60% |
5. Generate Patch — Sends structured failure context + relevant source files to GitHub Copilot CLI (gh copilot -p) for a minimal fix. Copilot reads the actual code, understands the bug, and generates the smallest possible change.
6. Apply — Creates a fixforward/auto-* branch, writes the patched files, and commits. Your working branch is never touched.
7. Verify — Re-runs the entire test suite on the fix branch and computes a before/after confidence score.
8. Report — Generates a Markdown PR title and body with what changed, why, and the verification results.
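Stage 4 can be pictured as an ordered list of regex rules. The sketch below is a minimal illustration, not FixForward's classifier.py: the category names and confidence values come from the table in stage 4, but the regex patterns, the first-match-wins ordering, and the Classification type are assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Categories and confidences from the stage 4 table; the patterns themselves
# are illustrative guesses, not FixForward's actual rules.
RULES = [
    ("syntax_error", 0.95, r"SyntaxError|IndentationError|Unexpected token"),
    ("dependency", 0.90, r"ModuleNotFoundError|Cannot find module"),
    ("api_change", 0.85, r"AttributeError|TypeError"),
    ("assertion", 0.85, r"AssertionError|assert_eq!|expect\(.*\)\.toEqual"),
    ("env_mismatch", 0.80, r"version conflict|command not found"),
    ("lint", 0.75, r"flake8|eslint|clippy"),
    ("flaky_test", 0.60, r"timed? ?out|Connection refused"),
]


@dataclass
class Classification:
    category: str
    confidence: float


def classify(error_text: str) -> Classification:
    """Return the first category whose pattern matches, checked in table order."""
    for category, confidence, pattern in RULES:
        if re.search(pattern, error_text, re.IGNORECASE):
            return Classification(category, confidence)
    return Classification("unknown", 0.0)
```

Rule order matters here: a message lands in the first category whose pattern matches, so the more specific, higher-confidence categories are checked first.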
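Stages 7 and 8 boil down to comparing two test runs and turning the result into Markdown. The post does not spell out how the before/after confidence score is computed, so the scoring rule below is a hypothetical stand-in, as are the RunResult type, the helper names, and the PR title format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RunResult:
    """Pass/fail counts from one run of the test suite."""
    passed: int
    failed: int


def confidence_score(before: RunResult, after: RunResult, base: float) -> float:
    """Hypothetical rule, not FixForward's formula: scale the classifier's base
    confidence by the share of failures fixed; more failures than before scores 0."""
    if before.failed == 0 or after.failed > before.failed:
        return 0.0
    fixed_ratio = (before.failed - after.failed) / before.failed
    return round(base * fixed_ratio, 2)


def pr_report(test_name: str, category: str, before: RunResult,
              after: RunResult, score: float) -> tuple[str, str]:
    """Build a Markdown PR title and body from the verification results."""
    title = f"fix({category}): repair failing test {test_name}"
    body = "\n".join([
        "## What changed",
        f"Minimal fix generated by GitHub Copilot CLI for `{test_name}` ({category}).",
        "",
        "## Verification",
        f"- Before: {before.passed} passed, {before.failed} failed",
        f"- After: {after.passed} passed, {after.failed} failed",
        f"- Confidence: {score:.0%}",
    ])
    return title, body
```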
Screenshots
Full Autopilot Run
The complete pipeline from broken tests to verified fix:
Diagnose Mode
Use fixforward diagnose to inspect failures without applying any changes:
Dependency Detection
FixForward recognizes when the fix isn't a code change but a missing package:
Multi-Ecosystem: Node.js Support
Same pipeline, different ecosystem. Jest failures parsed and classified automatically:
Three Ecosystems, One Interface
| Ecosystem | Test Command | What Gets Parsed |
|---|---|---|
| Python | pytest --tb=long -v | Failures, tracebacks, assertion details, collection errors |
| Node.js | npm test | Jest and Mocha output, suite-level failures, missing modules |
| Rust | cargo test | Panics, assert_eq! failures, test summaries |
I didn't just support one language and call it a day. Each ecosystem has its own parser that understands the specific output format — pytest tracebacks look nothing like Jest failures, and cargo test panics are their own thing entirely. FixForward handles all of them.
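Detection and test execution (stages 1 and 2) can be sketched with the standard library alone. This is an illustration under assumptions, not FixForward's detector.py: the marker files and test commands come from the post, while the lookup order and the detect and run_tests helpers are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Optional

# Marker file -> (ecosystem, test command). Markers come from the Detect stage,
# commands from the table above; checking them in this order is an assumption.
MARKERS = [
    ("pytest.ini", "python", ["pytest", "--tb=long", "-v"]),
    ("package.json", "node", ["npm", "test"]),
    ("Cargo.toml", "rust", ["cargo", "test"]),
]


def detect(project: Path) -> Optional[tuple[str, list[str]]]:
    """Return (ecosystem, test command) for the first marker file found."""
    for marker, ecosystem, command in MARKERS:
        if (project / marker).is_file():
            return ecosystem, list(command)
    return None


def run_tests(project: Path, command: list[str]) -> tuple[int, str]:
    """Run the suite in the project directory; return (exit code, raw output)."""
    proc = subprocess.run(command, cwd=project, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr
```

The raw output returned by run_tests is what an ecosystem-specific parser would consume next.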
Safety First
I spent a lot of time thinking about what could go wrong. When you're auto-applying code patches, you'd better not destroy someone's working tree. FixForward has several layers of protection:
- Never touches your working branch: all fixes go on fixforward/auto-* branches
- Stashes dirty state: uncommitted changes are saved and restored on rollback
- Patch preview: see the exact diff before confirming
- Dry-run mode: fixforward run --dry-run diagnoses without touching anything
- One-command rollback: fixforward rollback undoes everything cleanly
- State persistence: rollback info is stored at ~/.fixforward/state.json
I wanted this to be a tool you could trust running in your repo without hesitation.
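The rollback bookkeeping can be sketched as a small JSON file. The post only says that rollback info lives at ~/.fixforward/state.json, so the field names and the save_state, load_state, and rollback_commands helpers below are assumptions; rollback_commands returns the git commands instead of running them, which keeps the sketch free of side effects.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Optional

# Default location named in the post; the JSON fields are assumptions.
STATE_FILE = Path.home() / ".fixforward" / "state.json"


def save_state(original_branch: str, fix_branch: str, stashed: bool,
               path: Path = STATE_FILE) -> None:
    """Record what a run changed so it can be undone later."""
    path.parent.mkdir(parents=True, exist_ok=True)
    state = {"original_branch": original_branch, "fix_branch": fix_branch, "stashed": stashed}
    path.write_text(json.dumps(state, indent=2))


def load_state(path: Path = STATE_FILE) -> Optional[dict]:
    """Return the saved state, or None if there is nothing to roll back."""
    if not path.is_file():
        return None
    return json.loads(path.read_text())


def rollback_commands(state: dict) -> list[list[str]]:
    """Git commands that would undo a run, in order (not executed here)."""
    commands = [
        ["git", "checkout", state["original_branch"]],  # leave the fix branch
        ["git", "branch", "-D", state["fix_branch"]],   # then delete it
    ]
    if state["stashed"]:
        commands.append(["git", "stash", "pop"])  # restore uncommitted work
    return commands
```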
My Experience with GitHub Copilot CLI
Copilot CLI is not just a helper in this project — it is the engine. FixForward uses gh copilot -p at its core to generate the actual code fixes. Here's what that looks like internally:
gh copilot -- -p "I have a python project with failing tests.
The test test_divide in test_app.py fails with AssertionError:
assert 3.333 == 3. Generate a minimal fix..." \
--allow-all-tools --add-dir ./project --silent
Copilot reads the source files through --add-dir, understands the test context, and generates the smallest code change. FixForward then parses Copilot's response (which can come in several formats — code blocks, file headers, inline diffs), extracts the patched files, and applies them.
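Calling Copilot CLI from Python is mostly a matter of assembling the same arguments as the command above. A minimal sketch, assuming the flags exactly as shown in the post; build_prompt, copilot_argv, and ask_copilot are hypothetical helpers, the 300-second timeout is an arbitrary choice, and the prompt text stops where the post's example is elided.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_prompt(ecosystem: str, test: str, test_file: str, error: str, detail: str) -> str:
    """Hypothetical prompt builder modeled on the example above."""
    return (
        f"I have a {ecosystem} project with failing tests.\n"
        f"The test {test} in {test_file} fails with {error}:\n"
        f"{detail}. Generate a minimal fix."  # the post elides the rest of the prompt
    )


def copilot_argv(prompt: str, project: Path) -> list[str]:
    """Argument list mirroring the gh copilot invocation shown above."""
    return [
        "gh", "copilot", "--", "-p", prompt,
        "--allow-all-tools", "--add-dir", str(project), "--silent",
    ]


def ask_copilot(prompt: str, project: Path, timeout: int = 300) -> str:
    """Run Copilot CLI and return its raw stdout (raises on a non-zero exit)."""
    result = subprocess.run(
        copilot_argv(prompt, project),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout
```

Passing an argument list instead of a shell string means the multi-line prompt needs no quoting or escaping.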
Building the Copilot response parser was one of the trickier parts. Copilot doesn't always respond in the same format, so I built multiple parsing strategies:
- Fenced code blocks with FILE: markers
- Filename headers (### file.js, file.js, or file.js in backticks) followed by code blocks
- Language-tagged blocks matched to project files
- Fuzzy file matching against the project tree (skipping node_modules)
If one strategy fails, the next one kicks in. This makes FixForward resilient to Copilot's varying output formats.
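That fallback chain can be sketched as an ordered list of header patterns, each tried against the whole response until one yields files. The sketch covers only the first two strategies (FILE: markers and filename headers), and the exact header formats it accepts, along with the extract_blocks and parse_response names, are assumptions rather than FixForward's own rules.

```python
from __future__ import annotations

import re

FENCE = "`" * 3  # three backticks, built at runtime so the snippet can sit inside Markdown

# Hypothetical header formats, tried in order; each names a file on the line
# directly before a fenced code block.
HEADER_PATTERNS = [
    re.compile(r"^FILE:\s*(?P<path>\S+)\s*$"),                            # FILE: src/app.py
    re.compile(r"^#{1,6}\s+`?(?P<path>[\w./-]+\.\w+)`?\s*$"),             # ### app.py
    re.compile(r"^(?:\*\*|`)(?P<path>[\w./-]+\.\w+)(?:\*\*|`):?\s*$"),    # **app.py** or `app.py`
]


def extract_blocks(response: str, header: re.Pattern) -> dict[str, str]:
    """Collect {path: code} for every header line followed by a fenced block."""
    files: dict[str, str] = {}
    lines = response.splitlines()
    i = 0
    while i < len(lines) - 1:
        m = header.match(lines[i].strip())
        if m and lines[i + 1].strip().startswith(FENCE):
            body = []
            j = i + 2
            while j < len(lines) and not lines[j].strip().startswith(FENCE):
                body.append(lines[j])
                j += 1
            files[m["path"]] = "\n".join(body) + "\n"
            i = j + 1  # continue after the closing fence
        else:
            i += 1
    return files


def parse_response(response: str) -> dict[str, str]:
    """Try each strategy in turn; the first one that finds files wins."""
    for header in HEADER_PATTERNS:
        files = extract_blocks(response, header)
        if files:
            return files
    return {}
```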
I also used Copilot CLI extensively during development — for scaffolding parsers, debugging edge cases in Jest output handling, and figuring out the right subprocess patterns for capturing test output across platforms.
Try It Yourself
The repo includes ready-made broken demo projects:
git clone https://github.com/stackmasteraliza/fixforward.git
cd fixforward
# Python: division bug (a / b should be a // b)
fixforward run --path demo/broken_python
# Node.js: truncation bug ("..." becomes "..")
fixforward diagnose --path demo/broken_node
# Rust: off-by-one in clamp()
fixforward diagnose --path demo/broken_rust
# Safe mode: see the diagnosis without changing anything
fixforward run --path demo/broken_python --dry-run
Install from PyPI:
pip install fixforward
Architecture
fixforward/
├── cli.py # argparse CLI: run, diagnose, rollback
├── detector.py # Ecosystem detection + test runner
├── classifier.py # Regex heuristic failure classification
├── copilot.py # GitHub Copilot CLI integration
├── patcher.py # Safe branch creation + file patching
├── verifier.py # Test re-run + confidence scoring
├── reporter.py # PR title/body generation
├── display.py # Rich-based terminal UI
├── state.py # Rollback state persistence
└── parsers/
├── pytest_parser.py # pytest output parser
├── npm_parser.py # Jest/Mocha output parser
└── cargo_parser.py # cargo test output parser
Only one dependency: rich. Everything else comes from the Python standard library.
Thanks for reading! I'd love to hear your feedback — especially if you try it on your own broken tests.