After a month of building agent-first CLIs at work, I have settled on a few patterns that help coding agents across the several tools my Claude uses every day. I was inspired by an interview with OpenClaw's creator, Pete Steinberger, in which he talks about using CLIs over MCP for agent tools, so I started building my own Gmail CLI as an exercise.
I told Claude I wanted to make an agent-first CLI like Pete's gogcli, and I was surprised when Claude told me that gogcli was too human-focused. Unpacking that with Claude really challenged my thinking about what "agent-first" actually means.
Agents struggle with CLIs in several ways:
- Understanding errors: "Message not found" doesn't tell an agent what to do next
- Missing context: After a command, what actions are available? Humans infer, agents don't
- Round-trip cost: Each CLI call = latency + tokens to process output
- Destructive mistakes: Agents can hallucinate IDs or misunderstand intent
- Discoverability: --help is prose for humans, not structured for LLMs
To address these issues, we came up with these patterns:
1. Structured Errors with Suggestions
Making error output JSON by default, using error codes, and so on are all good foundations, but two things here surprised me: the suggestions and the retryable flag.
{
"success": false,
"error": {
"code": "MESSAGE_NOT_FOUND",
"message": "Message with ID abc123 not found",
"suggestions": [
"Run `desk mail search` to find valid message IDs",
"The message may have been deleted or is in trash"
],
"retryable": false
}
}
The suggestions are quick prompts, dynamically generated from the error. They help the agent move to the next action without needing another turn to discover its options for recovery. The retryable flag is also something I had not thought about, but it is a clear signal of whether the command is worth trying again. That is another feature that saves unnecessary turns.
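Here is a minimal Python sketch of how such an envelope could be built. It is not how desk is implemented: the ERROR_CATALOG table, the build_error helper, and the RATE_LIMITED code are hypothetical names for illustration. Only MESSAGE_NOT_FOUND and its two suggestions come from the example above.

```python
import json

# Hypothetical catalog: error code -> (retryable, suggestion templates).
# RATE_LIMITED is an invented example of a retryable error.
ERROR_CATALOG = {
    "MESSAGE_NOT_FOUND": (
        False,
        [
            "Run `desk mail search` to find valid message IDs",
            "The message may have been deleted or is in trash",
        ],
    ),
    "RATE_LIMITED": (True, ["Wait {retry_after}s and run the same command again"]),
}


def build_error(code: str, message: str, **context) -> dict:
    """Build the error envelope, filling suggestion templates from context."""
    retryable, templates = ERROR_CATALOG.get(code, (False, []))
    return {
        "success": False,
        "error": {
            "code": code,
            "message": message,
            "suggestions": [t.format(**context) for t in templates],
            "retryable": retryable,
        },
    }


print(json.dumps(build_error("MESSAGE_NOT_FOUND", "Message with ID abc123 not found"), indent=2))
```

Keeping the suggestions next to the error code means every new error has to answer "what should the agent do next?" at the moment it is defined.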
2. Operation Receipts
All mutating commands return with a receipt confirming what happened, like so:
{
"success": true,
"operation": "archive",
"target": {"id": "abc123", "subject": "Q4 Report"},
"timestamp": "2025-02-07T10:30:00Z",
"undo_command": "desk mail unarchive abc123",
"undo_expires": "30 days"
}
At first glance, this saves turns by confirming what happened, but the real power comes from the undo_command field. Like suggestions, it hands the LLM the context for undoing the mutation before the LLM knows it needs it. That makes recovery much easier and saves turns.
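A receipt builder could look roughly like the sketch below. The UNDO_COMMANDS table and build_receipt function are assumptions, and "untrash" is a made-up inverse command. Only "desk mail unarchive" appears in the example above. The undo_expires field is left out to keep the sketch short.

```python
from datetime import datetime, timezone

# Hypothetical mapping from a mutating operation to the command that reverses it.
# "untrash" is an invented name; operations with no inverse are simply absent.
UNDO_COMMANDS = {"archive": "unarchive", "trash": "untrash"}


def build_receipt(operation: str, message_id: str, subject: str) -> dict:
    """Return a receipt for a completed mutation, with an undo command if one exists."""
    receipt = {
        "success": True,
        "operation": operation,
        "target": {"id": message_id, "subject": subject},
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    inverse = UNDO_COMMANDS.get(operation)
    if inverse is not None:
        # Spell out the full command so the agent can run it verbatim.
        receipt["undo_command"] = f"desk mail {inverse} {message_id}"
    return receipt


print(build_receipt("archive", "abc123", "Q4 Report"))
```

Leaving undo_command out entirely for operations like send is itself a signal: no field means no way back.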
3. Dry-Run Mode
Claude uses --dry-run all the time on existing CLI tools. Like receipts, a dry run is a good opportunity to give the agent context, this time about what is about to happen, so that it can confirm the desired outcome before committing.
$ desk mail trash abc123 --dry-run
{
"would_execute": "trash",
"target": {
"id": "abc123",
"subject": "Quarterly Report",
"from": "boss@company.com"
},
"reversible": true
}
Dry runs on their face are not new, but you can add even more agent-specific context. As in the example above, the reversible field is really useful, and not common in existing dry-runs.
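One way to wire this up is to route every mutation through a single function that either previews or applies. The sketch below assumes a hypothetical REVERSIBLE table and a run helper. Neither is taken from a real CLI.

```python
from typing import Callable

# Hypothetical table: which operations can be undone afterwards.
REVERSIBLE = {"archive": True, "trash": True, "send": False}


def run(operation: str, message: dict, apply: Callable[[dict], None],
        dry_run: bool = False) -> dict:
    """Describe the operation when dry_run is set; otherwise perform it."""
    target = {key: message[key] for key in ("id", "subject", "from")}
    if dry_run:
        # Nothing is mutated on this path; the agent only gets a preview.
        return {
            "would_execute": operation,
            "target": target,
            "reversible": REVERSIBLE.get(operation, False),
        }
    apply(message)
    return {"success": True, "operation": operation, "target": target}


msg = {"id": "abc123", "subject": "Quarterly Report", "from": "boss@company.com"}
trashed = []
print(run("trash", msg, trashed.append, dry_run=True))
```

Because the preview and the real call share the same target lookup, the dry run cannot describe a different message than the one that would actually be touched.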
4. Batch Operations
Supporting bulk operations is just nice in general, but this is a good reminder that often agents work on operations in bulk:
desk mail archive id1 id2 id3 id4 id5
And support for stdin with large batches helps agents take advantage of the composability of shell commands:
desk mail search "is:unread older_than:30d" --ids-only | desk mail archive --stdin
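On the receiving end, supporting both forms mostly comes down to merging IDs from the arguments and from stdin. Here is a small sketch, where collect_ids is a hypothetical helper and the de-duplication is my own assumption about sensible behavior:

```python
import io
import sys
from typing import TextIO


def collect_ids(args: list[str], use_stdin: bool, stream: TextIO = sys.stdin) -> list[str]:
    """Merge IDs from argv and (optionally) stdin, dropping blanks and duplicates."""
    candidates = list(args)
    if use_stdin:
        # One ID per line, as an --ids-only style output would produce.
        candidates += [line.strip() for line in stream]
    seen: set[str] = set()
    unique = []
    for message_id in candidates:
        if message_id and message_id not in seen:
            seen.add(message_id)
            unique.append(message_id)
    return unique


# Simulates piping search output into `desk mail archive --stdin`.
piped = io.StringIO("id2\n\nid1\nid3\n")
print(collect_ids(["id1"], use_stdin=True, stream=piped))
```

Passing the stream as a parameter keeps the function testable without a real pipe.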
5. Capabilities Endpoint
How many times have you seen an agent go through several turns navigating a nested help structure?
Bash(desk --help)
Thinking
Bash(desk mail --help)
Percolating
Bash(desk mail read --help)
So many unnecessary turns! Giving agents the structure up front lets them zero in on the exact command they need right away, and that knowledge stays with them for the rest of the context window.
$ desk --capabilities
{
"version": "0.1.0",
"services": {
"mail": {
"commands": ["search", "read", "send", "archive", "trash"],
"batch_supported": ["archive", "trash", "label", "mark-read"],
"dry_run_supported": ["send", "trash"],
"destructive": ["send", "trash"]
}
}
}
Capabilities tells the agent all the possible commands and gives important context up front, such as which commands support batching and which are destructive. That strikes a balance between no information and too much. The output serves as an index that lets the agent find what it needs before diving in deeper.
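To keep that index from drifting away from the real commands, it can be derived from whatever registry the CLI already uses to dispatch them. The sketch below assumes a hypothetical COMMANDS registry with per-command flags. It is trimmed to the five commands in the list above, so it does not reproduce every entry of the sample output.

```python
import json

# Hypothetical registry: one entry per command, with flags describing its behavior.
COMMANDS = {
    "mail": {
        "search": {},
        "read": {},
        "send": {"dry_run": True, "destructive": True},
        "archive": {"batch": True},
        "trash": {"batch": True, "dry_run": True, "destructive": True},
    }
}


def with_flag(commands: dict, flag: str) -> list[str]:
    """Names of the commands whose metadata sets the given flag."""
    return [name for name, meta in commands.items() if meta.get(flag)]


def capabilities(version: str = "0.1.0") -> dict:
    """Derive the --capabilities document from the command registry."""
    services = {}
    for service, commands in COMMANDS.items():
        services[service] = {
            "commands": list(commands),
            "batch_supported": with_flag(commands, "batch"),
            "dry_run_supported": with_flag(commands, "dry_run"),
            "destructive": with_flag(commands, "destructive"),
        }
    return {"version": version, "services": services}


print(json.dumps(capabilities(), indent=2))
```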
6. Idempotency Keys
De-duping operations with idempotency keys means that the CLI can help the agent out if it retries a command and doesn't realize it went through.
desk mail send --to bob@x.com --subject "Update" --body "..." \
--idempotency-key "agent-task-abc123"
If retried, the second call is a safe no-op returning the original result. This is pretty common in APIs, but works just as well on the command line. The unique part here is that a human would probably spend cycles confirming whether it worked, while an agent often retries commands.
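Since each CLI invocation is a fresh process, the keys have to be persisted somewhere. Below is a minimal sketch that stores results in a JSON file. The IdempotencyStore class, the file location, and the message_id field are all assumptions. A real tool would also need file locking and key expiry, which are omitted here.

```python
import json
import os
import tempfile


class IdempotencyStore:
    """Persist results by key in a JSON file so retries survive process restarts."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _load(self) -> dict:
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as fh:
            return json.load(fh)

    def run_once(self, key: str, action) -> dict:
        """Run action() the first time a key is seen; replay its result afterwards."""
        results = self._load()
        if key in results:
            # Retry: return the recorded result without repeating the side effect.
            return results[key]
        result = action()
        results[key] = result
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(results, fh)
        return result


# Demo: the second call with the same key does not send again.
store = IdempotencyStore(os.path.join(tempfile.mkdtemp(), "idempotency.json"))
sent = []


def send_mail() -> dict:
    sent.append("Update")  # stands in for the real send
    return {"success": True, "message_id": "m1"}


first = store.run_once("agent-task-abc123", send_mail)
second = store.run_once("agent-task-abc123", send_mail)
print(first == second, len(sent))
```

Note that the result is recorded only after the action succeeds, so a crash between the send and the write would still allow a duplicate. Closing that gap needs a write-ahead marker or support from the backing API.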
Conclusion
Even though coding agents are really good at mimicking the way humans use existing tools, I think there are a lot of LLM-forward patterns we have yet to explore. It is tempting to settle on the existing norms, but the truth is we have only been working this way for a short time. There is value in exploring on your own, even when gogcli and gws already exist. If nothing else, you will learn a lot.