This is the third in a short series. I've been pointing mago — an autonomous agent team that runs over a GitHub repo on your own LLM key — at my own repos and giving it real tasks. So far: a feature in Go and a crash fix in Zig. This time it's Python, and the question every skeptic asks: won't it just hack the tests to make them pass?
The setup
boilerplate-cli-ui-python is my agent-first Python CLI boilerplate. Its test suite was red — 4 failing tests:
FAILED tests/test_config.py::test_config_defaults - assert False == True
FAILED tests/test_output.py::test_output_formatter_json_mode - SystemExit: 0
FAILED tests/test_output.py::test_output_formatter_human_mode - SystemExit: 0
FAILED tests/test_output.py::test_output_error_json - SystemExit: 85
I filed an issue with one explicit instruction: fix the root cause, don't gut the tests.
What it actually did
It diagnosed two real bugs:
- A config default mismatch (no_color defaulted to the wrong value).
- The interesting one: OutputFormatter.output() and output_error() called sys.exit() inside the formatter, right after writing. That makes them impossible to unit-test (the test can't capture-then-assert because the process just exits), and it's a design smell: a formatter shouldn't own process control.
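To make the second bug concrete, here is a minimal sketch of why an exiting formatter fights the test. ExitingFormatter and the payload shape are hypothetical stand-ins, not the boilerplate's real code; the only things taken from the post are the method name output_error(error, code) and the exit code 85.

```python
import contextlib
import io
import json
import sys


class ExitingFormatter:
    """Hypothetical stand-in for the pre-fix formatter: it writes, then exits."""

    def output_error(self, error, code):
        # Assumed payload shape, for illustration only.
        print(json.dumps({"error": error, "code": code}))
        sys.exit(code)  # process control inside the formatter


def run_and_report():
    """Call the formatter the way a test would and report what happened."""
    buffer = io.StringIO()
    reached_assertions = False
    exit_code = None
    try:
        with contextlib.redirect_stdout(buffer):
            ExitingFormatter().output_error({"message": "boom"}, 85)
        reached_assertions = True  # never runs: SystemExit is raised first
    except SystemExit as exc:
        exit_code = exc.code
    return reached_assertions, exit_code, buffer.getvalue()


reached, code, written = run_and_report()
print(reached, code)
```

The JSON does get written, but SystemExit propagates before any line after the call can run. In a test that doesn't expect it, that would likely surface as the kind of "SystemExit: 85" failure shown in the red suite above.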
The fix wasn't "delete the assertions." It was a separation of concerns:
# before: the formatter exits, so it is untestable
def output_error(self, error, code):
    ...  # print json
    sys.exit(code)
# after: the formatter just formats; the CALL SITE owns the exit
# src/output.py: no sys.exit at all
# src/main.py / src/cli.py, at every error path:
formatter.output_error(error.to_dict(), error.code)
sys.exit(error.code)
It pulled sys.exit() out of the formatter and added it at every error call-site in main.py/cli.py — so production exit codes are preserved while the formatter becomes testable. Then it rewrote the tests to assert the error payload instead of requiring the process to die.
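A rough sketch of what that split can look like, with one test for the payload and one for the exit code. Everything here is an assumption for illustration: the OutputFormatter body, the handle_error helper, and the payload shape are hypothetical, and the agent's actual tests are not reproduced.

```python
import contextlib
import io
import json
import sys


class OutputFormatter:
    """Hypothetical post-fix formatter: it only writes, it never exits."""

    def output_error(self, error, code):
        # Assumed payload shape, for illustration only.
        print(json.dumps({"error": error, "code": code}))


def handle_error(formatter, error, code):
    """Hypothetical call site: format first, then own the exit."""
    formatter.output_error(error, code)
    sys.exit(code)


def test_output_error_json():
    # The formatter returns normally, so the test can capture and assert.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        OutputFormatter().output_error({"message": "boom"}, 85)
    payload = json.loads(buffer.getvalue())
    assert payload == {"error": {"message": "boom"}, "code": 85}


def test_call_site_still_exits():
    # The production exit code is still pinned, but at the call site.
    try:
        with contextlib.redirect_stdout(io.StringIO()):
            handle_error(OutputFormatter(), {"message": "boom"}, 85)
    except SystemExit as exc:
        assert exc.code == 85
    else:
        raise AssertionError("call site did not exit")


test_output_error_json()
test_call_site_still_exits()
```

In a pytest suite the same two checks would more typically use the capsys fixture and pytest.raises(SystemExit); the stdlib version above just keeps the sketch runnable on its own.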
I verified it — and checked specifically for cheating
$ python -m pytest
20 passed
Green is cheap if you gut tests, so I checked the diff:
- grep sys.exit src/output.py → 0 (removed, as intended)
- sys.exit(error.code) added at every error path in main.py/cli.py
- the CLI still exits non-zero on errors
It fixed the design, kept the behavior, and made the tests meaningful. Merged.
Three repos, three stacks, one loop
- Go — a feature (a command suggester).
- Zig — a crash fix (red → green).
- Python — a real refactor that made a broken suite green without weakening it.
Same loop every time: file an issue, the agent implements it on your own key, runs the repo's own tests, and opens a PR you review — verified, not blindly merged. It isn't pinned to a language or a framework, and — the part I actually cared about — when told not to cheat, it didn't.
Try it
CLI-only, BYOK (your Claude Code or tau key — it never resells completions), €20/mo. First 10 founding operators free during the beta, with a direct line to me:
curl -fsSL https://mago.intrane.fr/install.sh | sh
mago register
If you point it at one of your own repos — any stack — I'd love to hear how it goes.