DEV Community

Javier Leandro Arancibia
Javier Leandro Arancibia

Posted on

An AI agent fixed my red Python test suite — without gutting the tests

This is the third in a short series. I've been pointing mago — an autonomous agent team that runs over a GitHub repo on your own LLM key — at my own repos and giving it real tasks. So far: a feature in Go and a crash fix in Zig. This time it's Python, and the question every skeptic asks: won't it just hack the tests to make them pass?

The setup

boilerplate-cli-ui-python is my agent-first Python CLI boilerplate. Its test suite was red — 4 failing tests:

FAILED tests/test_config.py::test_config_defaults - assert False == True
FAILED tests/test_output.py::test_output_formatter_json_mode - SystemExit: 0
FAILED tests/test_output.py::test_output_formatter_human_mode - SystemExit: 0
FAILED tests/test_output.py::test_output_error_json - SystemExit: 85
Enter fullscreen mode Exit fullscreen mode

I filed an issue with one explicit instruction: fix the root cause, don't gut the tests.

What it actually did

It diagnosed two real bugs:

  1. A config default mismatch (no_color defaulted to the wrong value).
  2. The interesting one: OutputFormatter.output() and output_error() called sys.exit() inside the formatter, right after writing. That makes them impossible to unit-test — the test can't capture-then-assert because the process just exits — and it's a design smell: a formatter shouldn't own process control.

The fix wasn't "delete the assertions." It was a separation of concerns:

# before — formatter exits, untestable
def output_error(self, error, code):
    ...print json...
    sys.exit(code)

# after — the formatter just formats; the CALL SITE owns the exit
# src/output.py: no sys.exit at all
# src/main.py / src/cli.py, at every error path:
formatter.output_error(error.to_dict(), error.code)
sys.exit(error.code)
Enter fullscreen mode Exit fullscreen mode

It pulled sys.exit() out of the formatter and added it at every error call-site in main.py/cli.py — so production exit codes are preserved while the formatter becomes testable. Then it rewrote the tests to assert the error payload instead of requiring the process to die.

I verified it — and checked specifically for cheating

$ python -m pytest
20 passed
Enter fullscreen mode Exit fullscreen mode

Green is cheap if you gut tests, so I checked the diff:

  • grep sys.exit src/output.py0 (removed, as intended)
  • sys.exit(error.code) added at every error path in main.py/cli.py
  • the CLI still exits non-zero on errors

It fixed the design, kept the behavior, and made the tests meaningful. Merged.

Three repos, three stacks, one loop

  • Go — a feature (a command suggester).
  • Zig — a crash fix (red → green).
  • Python — a real refactor that made a broken suite green without weakening it.

Same loop every time: file an issue, the agent implements it on your own key, runs the repo's own tests, and opens a PR you review — verified, not blindly merged. It isn't pinned to a language or a framework, and — the part I actually cared about — when told not to cheat, it didn't.

Try it

CLI-only, BYOK (your Claude Code or tau key — it never resells completions), €20/mo. First 10 founding operators free during the beta, with a direct line to me:

👉 https://mago.intrane.fr

curl -fsSL https://mago.intrane.fr/install.sh | sh
mago register
Enter fullscreen mode Exit fullscreen mode

If you point it at one of your own repos — any stack — I'd love to hear how it goes.

Top comments (0)