I let an autonomous agent fix my own deploy CLI — here is the PR it shipped

#ai #go #automation #devtools

While building mago — a CLI that runs a small autonomous "company" of AI agents over a GitHub repo — I did the obvious test: point it at one of my own tools and give it a real task. No demo repo, no cherry-picked example. Here's exactly what happened, PR included.

The tool and the task

hotify-cli is my Go CLI for deploying web apps behind Traefik + Cloudflare (it's literally what deploys mago's own backend). It has 32 subcommands, and like a lot of CLIs, a mistyped command just dumped the entire ~130-line usage block and exited. Annoying for humans, noisy for agents.

So I filed a normal GitHub issue:

feat: suggest the nearest command on unknown input ("did you mean"), not a full-usage dump

…labeled it mago, and let the agent take it.

What the agent did, unattended

mago routed the issue to its implementer, which:

Cloned the repo and read the 32-command dispatch in main.go.
Wrote suggest.go — a rune-aware Levenshtein distance + a prefix-match shortcut + a canonical command list, with a length-aware threshold so unrelated typos don't surface a misleading suggestion.
Wrote suggest_test.go — 11 cases (typos, prefix matches, unrelated input).
Wired it into the default case of the dispatch.
Ran gofmt / go vet / go build / go test, then opened a pull request.
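
For readers who want to see the shape of that change, here is a minimal sketch of the technique described above. It is not the code from the PR: the command list is a made-up subset, the `n/3` threshold is my assumption about what "length-aware" could look like, and the names `suggest`, `levenshtein`, and `unknownCommand` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// commands is a hypothetical subset; the real CLI has 32 subcommands.
var commands = []string{"deploy", "status", "logs", "rollback", "help"}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance between a and b, counted in runes
// rather than bytes so multi-byte characters cost one edit each.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// suggest returns the nearest candidate, or false if nothing is close enough.
func suggest(input string, candidates []string) (string, bool) {
	input = strings.ToLower(input)
	if input == "" {
		return "", false
	}
	// Prefix shortcut: accept the input if it is a prefix of exactly one command.
	var prefixHits []string
	for _, c := range candidates {
		if strings.HasPrefix(c, input) {
			prefixHits = append(prefixHits, c)
		}
	}
	if len(prefixHits) == 1 {
		return prefixHits[0], true
	}
	// Length-aware threshold (assumed): allow roughly one edit per three runes.
	threshold := len([]rune(input)) / 3
	if threshold < 1 {
		threshold = 1
	}
	best, bestDist := "", threshold+1
	for _, c := range candidates {
		if d := levenshtein(input, c); d < bestDist {
			best, bestDist = c, d
		}
	}
	return best, bestDist <= threshold
}

// unknownCommand builds the message a dispatch default case could print.
func unknownCommand(prog, arg string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s: unknown command %q\n", prog, arg)
	if s, ok := suggest(arg, commands); ok {
		fmt.Fprintf(&b, "Did you mean %q?\n", s)
	}
	fmt.Fprintf(&b, "Run '%s help' for usage.\n", prog)
	return b.String()
}

func main() {
	for _, arg := range []string{"statuss", "deploi", "xyzzy"} {
		fmt.Print(unknownCommand("hotify-cli", arg), "\n")
	}
}
```

The threshold is what keeps unrelated input such as `xyzzy` from producing a suggestion: nothing in the list is within one edit of it, so only the "unknown command" and "help" lines are printed.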

The whole run took about 3 minutes on Claude Sonnet, with no human approving individual steps. Crucially, it opened a PR rather than pushing to main, so the merge stayed my call.

I verified it (don't trust the summary)

I checked out the branch myself:

gofmt: clean   go vet: clean   go build: OK   go test: ok

And the actual behavior:

$ hotify-cli statuss
hotify-cli: unknown command "statuss"
Did you mean "status"?
Run 'hotify-cli help' for usage.

$ hotify-cli deploi
Did you mean "deploy"?

Correct, scoped, tested, dependency-free. Merged.

Why it works — and where it doesn't

The thing that makes it trustworthy isn't the model, it's the loop:

Verified autonomy — a PR only auto-merges if review passes and the repo's own build/tests are green; failing or unverifiable work waits for you. (I kept this one review-only.)
GitHub-native — issues = tasks, labels = status, PRs = deliverables. The worker dials out, so there's no inbound webhook to expose.
BYOK — it drives your own Claude Code or tau key. It never resells completions.
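
To make the "verified autonomy" rule concrete, here is a rough sketch of that gate as a pure function. This is an illustration only, not mago's actual code: the types `CheckState`, `PullRequest`, and `Decision`, and the `ReviewOnly` flag, are names I made up for the example.

```go
package main

import "fmt"

// CheckState is a hypothetical three-valued result for one verification step.
type CheckState int

const (
	Unverified CheckState = iota // the check could not run or gave no result
	Passed
	Failed
)

// PullRequest holds the hypothetical inputs to the merge decision.
type PullRequest struct {
	Review     CheckState
	Build      CheckState
	Tests      CheckState
	ReviewOnly bool // the operator has opted out of auto-merge
}

// Decision is what happens to the PR after verification.
type Decision string

const (
	AutoMerge    Decision = "auto-merge"
	WaitForHuman Decision = "wait-for-human"
)

// decide auto-merges only when every check positively passed.
// Failed and Unverified are treated the same way: the PR waits for a human.
func decide(pr PullRequest) Decision {
	if pr.ReviewOnly {
		return WaitForHuman
	}
	for _, c := range []CheckState{pr.Review, pr.Build, pr.Tests} {
		if c != Passed {
			return WaitForHuman
		}
	}
	return AutoMerge
}

func main() {
	green := PullRequest{Review: Passed, Build: Passed, Tests: Passed}
	noTests := PullRequest{Review: Passed, Build: Passed, Tests: Unverified}
	reviewOnly := PullRequest{Review: Passed, Build: Passed, Tests: Passed, ReviewOnly: true}

	fmt.Println(decide(green))
	fmt.Println(decide(noTests))
	fmt.Println(decide(reviewOnly))
}
```

The design point is that the zero value is `Unverified`, so a check that never reported counts against auto-merge by default.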

It's genuinely good at well-scoped, verifiable work: tests, refactors, small features, CLI/UX fixes. It is not "build me a startup." Honest framing — it's a reliable autonomous dev shop you steer with a roadmap, not magic.

Try it

mago is CLI-only, BYOK, €20/mo. I'm onboarding the first 10 founding operators free during the beta, with a direct line to me:

👉 https://mago.intrane.fr

curl -fsSL https://mago.intrane.fr/install.sh | sh
mago register

If you try it on one of your own repos, I'd love to hear where it helped — and where it annoyed you.
