While building mago — a CLI that runs a small autonomous "company" of AI agents over a GitHub repo — I did the obvious test: point it at one of my own tools and give it a real task. No demo repo, no cherry-picked example. Here's exactly what happened, PR included.
The tool and the task
hotify-cli is my Go CLI for deploying web apps behind Traefik + Cloudflare (it's literally what deploys mago's own backend). It has 32 subcommands, and like a lot of CLIs, a mistyped command just dumped the entire ~130-line usage block and exited. Annoying for humans, noisy for agents.
So I filed a normal GitHub issue:
feat: suggest the nearest command on unknown input ("did you mean"), not a full-usage dump
…labeled it mago, and let the agent take it.
What the agent did, unattended
mago routed the issue to its implementer, which:
- Cloned the repo and read the 32-command dispatch in main.go.
- Wrote suggest.go: a rune-aware Levenshtein distance + a prefix-match shortcut + a canonical command list, with a length-aware threshold so unrelated typos don't surface a misleading suggestion.
- Wrote suggest_test.go: 11 cases (typos, prefix matches, unrelated input).
- Wired it into the default case of the dispatch.
- Ran gofmt / go vet / go build / go test, then opened a pull request.
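To make the suggest.go description concrete, here is a minimal sketch of the same idea. It is not the code from the PR: the command list is a made-up subset, the function names are mine, and the threshold (1 edit for inputs up to 4 runes, 2 beyond that) is an assumption chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// commands is a hypothetical subset; the real CLI has 32 subcommands.
var commands = []string{"deploy", "status", "logs", "rollback", "help"}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance between a and b, compared rune by
// rune so a multi-byte character counts as a single edit.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// suggest returns the closest known command, or "" when nothing is close
// enough to be worth suggesting.
func suggest(input string, known []string) string {
	input = strings.ToLower(input)
	if input == "" {
		return ""
	}
	// Prefix shortcut: if exactly one command starts with the input, use it.
	var hits []string
	for _, c := range known {
		if strings.HasPrefix(c, input) {
			hits = append(hits, c)
		}
	}
	if len(hits) == 1 {
		return hits[0]
	}
	// Length-aware threshold (assumed values): short inputs tolerate fewer edits.
	limit := 1
	if len([]rune(input)) > 4 {
		limit = 2
	}
	best, bestDist := "", limit+1
	for _, c := range known {
		if d := levenshtein(input, c); d < bestDist {
			best, bestDist = c, d
		}
	}
	return best
}

func main() {
	for _, in := range []string{"statuss", "deploi", "xyzzy"} {
		fmt.Printf("hotify-cli: unknown command %q\n", in)
		if s := suggest(in, commands); s != "" {
			fmt.Printf("Did you mean %q?\n", s)
		}
	}
}
```

The threshold is what keeps unrelated input like "xyzzy" from producing a suggestion at all. In the dispatch, a function like suggest would be called from the default case before printing the short usage hint.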
About 3 minutes, on Claude Sonnet, with no human in the per-step loop. Crucially it opened a PR — it didn't push to main. The merge stayed my call.
I verified it (don't trust the summary)
I checked out the branch myself:
gofmt: clean
go vet: clean
go build: OK
go test: ok
And the actual behavior:
$ hotify-cli statuss
hotify-cli: unknown command "statuss"
Did you mean "status"?
Run 'hotify-cli help' for usage.
$ hotify-cli deploi
Did you mean "deploy"?
Correct, scoped, tested, dependency-free. Merged.
Why it works — and where it doesn't
The thing that makes it trustworthy isn't the model, it's the loop:
- Verified autonomy — a PR only auto-merges if review passes and the repo's own build/tests are green; failing or unverifiable work waits for you. (I kept this one review-only.)
- GitHub-native — issues = tasks, labels = status, PRs = deliverables. The worker dials out, so there's no inbound webhook to expose.
- BYOK — it drives your own Claude Code or tau key. It never resells completions.
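The "verified autonomy" rule boils down to a small decision function. The sketch below is my own illustration of that rule, not mago's internals: the type names, the ReviewOnly flag, and the "merge"/"wait" results are all hypothetical.

```go
package main

import "fmt"

// CheckState is a hypothetical three-valued result: a check can pass,
// fail, or be unverifiable (for example, the repo has no tests to run).
type CheckState int

const (
	Unknown CheckState = iota
	Failed
	Passed
)

// PR is a hypothetical summary of what the gate needs to know.
type PR struct {
	Review     CheckState // outcome of the review step
	CI         CheckState // outcome of the repo's own build/tests
	ReviewOnly bool       // operator opted out of auto-merge
}

// decide returns "merge" only when both review and CI have positively
// passed; anything failing or unverifiable waits for a human.
func decide(p PR) string {
	if p.ReviewOnly {
		return "wait"
	}
	if p.Review == Passed && p.CI == Passed {
		return "merge"
	}
	return "wait"
}

func main() {
	fmt.Println(decide(PR{Review: Passed, CI: Passed}))
	fmt.Println(decide(PR{Review: Passed, CI: Unknown}))
	fmt.Println(decide(PR{Review: Passed, CI: Passed, ReviewOnly: true}))
}
```

The point of the three-valued state is that "unknown" is treated like a failure: the gate needs positive evidence to merge, so a repo without tests never auto-merges by default.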
It's genuinely good at well-scoped, verifiable work: tests, refactors, small features, CLI/UX fixes. It is not "build me a startup." Honest framing — it's a reliable autonomous dev shop you steer with a roadmap, not magic.
Try it
mago is CLI-only, BYOK, €20/mo. I'm onboarding the first 10 founding operators free during the beta, with a direct line to me:
curl -fsSL https://mago.intrane.fr/install.sh | sh
mago register
If you try it on one of your own repos, I'd love to hear where it helped — and where it annoyed you.