Peter Tamas
AI Field Notes #004 | Typing is no longer the bottleneck. Thinking is.

Software engineer behind this case study: Mark Kővári

Goal: Test whether agentic coding workflows can produce production-grade, architecturally complex software from a phone. The project: Clutter, a polyglot multi-agent orchestration system (Rust, TypeScript, K8s, NATS, SurrealDB).

Highlights

The act of writing code is increasingly something an agent can do for you. What it can't do is decide what to build, how to structure it, when to test, and where to draw boundaries. As more of the typing gets delegated, everything else gets proportionally harder to be good at: specification, architecture, review, testing strategy, and the judgment calls that hold a system together. This experiment is about that shift as much as it is about walking.

Every AI coding tool that shipped in the past few months converges on the same interaction surface: prompt, forms, tool-use approval, text output. That's the entire human-in-the-loop contract. None of it requires a desktop.

So I tested a hypothesis: can someone run a full agentic development workflow from a phone while walking the dog?

Map of the dog walk

The bulk of the work happened across a few sessions in late March (March 26-29, 110 commits), with a follow-up on April 9. The walk shown below was a single 18 km, 4-hour session through Budapest, responsible for the biggest commit spike.

Setup

| Requirement | Status |
| --- | --- |
| Remote execution environment | Claude Code on a Mac Mini |
| Thin client | Claude iOS app |
| Input method | iOS voice dictation + text |
| Nice weather | Recommended |
| Power bank | Recommended |
| Dog and a nice view | Optional but highly recommended |

Claude Code already supports the full agentic loop: file reads, edits, shell commands, tool-use approvals. The process is identical to terminal usage, just accessed via remote session on iOS.

What I built (and why)

The walking experiment quickly created its own problem. I was running multiple concurrent Claude Code sessions, each on a different task, and switching between remote sessions on a phone was painful. I kept losing track of which session was working on what. The friction wasn't the coding; it was the context switching.

So I started building Clutter: a multi-agent orchestration system that manages one-shot agents. Describe a task, fire off an isolated agent, get results back through NATS events. About 80% of it was built while walking, and all of it was written AI-native.

It also serves an MCP server, so I can create projects and tasks in Clutter directly from a Claude conversation. AI-assisted development building the tool that manages AI-assisted development. During development I was spoon-feeding Clutter its own tasks, and it was creating PRs on its own repo automatically (e.g. PR #51, branch agent/picur-agent-1).
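The one-shot pattern can be sketched in a few lines of Rust. This is a minimal illustration, not Clutter's actual code: the names (`TaskState`, `TaskEvent`, `dispatch`) are hypothetical, and an in-process channel stands in for the NATS event stream.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical lifecycle states for a one-shot agent's task.
#[derive(Debug, Clone, PartialEq)]
enum TaskState {
    Running,
    Done(String), // result payload
}

// An event published on the bus (NATS in Clutter; an mpsc channel here).
#[derive(Debug)]
struct TaskEvent {
    task_id: u64,
    state: TaskState,
}

// Fire an isolated one-shot agent for a task; results come back as events.
fn dispatch(task_id: u64, description: String, bus: mpsc::Sender<TaskEvent>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        bus.send(TaskEvent { task_id, state: TaskState::Running }).unwrap();
        // The real agent would run in its own isolated environment here.
        let result = format!("completed: {description}");
        bus.send(TaskEvent { task_id, state: TaskState::Done(result) }).unwrap();
    })
}

fn main() {
    let (tx, rx) = mpsc::channel();
    dispatch(1, "add health endpoint".to_string(), tx).join().unwrap();
    for event in rx.iter() {
        println!("task {} -> {:?}", event.task_id, event.state);
    }
}
```

The point of the shape: the caller only ever describes a task and consumes events, which is exactly the surface a phone (or an MCP client) can drive.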

Funnily enough, for a while that's all I was doing: developing Clutter with itself but never using it on other projects. Someone called that out in a meeting. So I created a project targeting an external repo, added a task, and it completed on the first try. Sometimes you need someone to point out you've been sharpening the knife without cutting anything.

The best way to judge whether this workflow produces real output is to look at what it produced:

| Metric | Value |
| --- | --- |
| Commits | 112 (commit activity) |
| Walking sessions | Main session: ~18 km, ~4 hours (biggest commit day). Overall: March 26-29 + April 9 follow-up |
| Languages | Rust (76%), TypeScript (20%), Gherkin, Dockerfile, Helm |
| Rust crates | 6 (control-plane, agent-runner, core, embedder, agent-mcp, mcp-server) |
| TS/Node packages | 3 (dashboard, shared types, shared UI) |
| Infrastructure | Docker, Kubernetes, Helm, NATS JetStream, SurrealDB, GitHub Actions CI |
| Documentation | Architecture docs, ADRs, glossary, orchestration spec, conventions |
| Tests | BDD feature specs (Gherkin), unit tests, integration tests |

The system is a Rust/Axum control plane with K8s agent isolation, SurrealDB task queue with atomic claiming, NATS event streaming, a React/Vite real-time dashboard, and a vector embedder for semantic search across agent history. Agents run in air-gapped namespaces because multiple instances on the same machine fight for ports. The point isn't the architecture itself. It's that this level of complexity came out of a phone screen and a pair of walking shoes.
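The "atomic claiming" piece is the part worth a sketch: when several agents poll the same queue, exactly one may win each pending task. In Clutter that's a conditional update in SurrealDB; below, a `Mutex`-guarded in-memory queue stands in for it, and all names (`Status`, `Task`, `claim_next`) are illustrative rather than the project's actual API.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, PartialEq)]
enum Status {
    Pending,
    Claimed { by: String },
}

#[derive(Debug, Clone)]
struct Task {
    id: u64,
    status: Status,
}

// Atomically claim one pending task: the check and the status flip happen
// under a single lock, standing in for SurrealDB's conditional UPDATE.
fn claim_next(queue: &Arc<Mutex<Vec<Task>>>, agent: &str) -> Option<Task> {
    let mut q = queue.lock().unwrap();
    let task = q.iter_mut().find(|t| t.status == Status::Pending)?;
    task.status = Status::Claimed { by: agent.to_string() };
    Some(task.clone())
}

fn main() {
    let queue = Arc::new(Mutex::new(vec![Task { id: 1, status: Status::Pending }]));
    let first = claim_next(&queue, "agent-a");
    let second = claim_next(&queue, "agent-b"); // nothing left to claim
    println!("{first:?} / {second:?}");
}
```

Because check-and-set is a single atomic step, two agents can never both claim task 1; the loser simply gets `None` and polls again.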

🐈 A group of bobcats is called a "clutter." That's where the name comes from.

The process

Typical cycle: dictate a feature description while walking, review prompt on screen, send. Claude scaffolds the module, pauses for tool-use approval. Review proposed changes, approve or redirect, ask for tests. Another approval round. One feature, three to four approvals, ten minutes of walking.
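That cycle reduces to a tiny control flow: the agent proposes actions, and each one blocks on a human decision before it runs. A minimal sketch, with hypothetical names (`Decision`, `run_cycle`), nothing Claude-specific:

```rust
// Hypothetical human-in-the-loop decisions for a proposed tool use.
#[derive(Debug, PartialEq)]
enum Decision {
    Approve,
    Redirect(String),
}

// One feature cycle: every proposed action waits on a decision before running.
fn run_cycle(proposals: Vec<&str>, decide: impl Fn(&str) -> Decision) -> Vec<String> {
    let mut log = Vec::new();
    for proposal in proposals {
        match decide(proposal) {
            Decision::Approve => log.push(format!("ran: {proposal}")),
            Decision::Redirect(note) => log.push(format!("redirected: {note}")),
        }
    }
    log
}

fn main() {
    let log = run_cycle(
        vec!["scaffold module", "write tests", "cargo test"],
        |p| if p == "write tests" { Decision::Redirect("use BDD specs".into()) } else { Decision::Approve },
    );
    for line in &log {
        println!("{line}");
    }
}
```

The human's entire contribution is the `decide` closure: three or four approve-or-redirect calls per feature, which is why a phone screen is enough.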

What changes without a desk is everything around the core loop. No cmd+click to jump to a definition. No split-screen diff. No grep. All of that gets delegated to the agent: "show me the swarm worker interface," "what imports the NATS subscriber," "run cargo test and show me failures." The agent becomes the IDE.

This forces you to work at a higher abstraction level. Instead of navigating files, you describe what you want to see. Instead of reading trait implementations line by line, you ask the agent to summarize. You stay in the intent layer, the agent handles navigation. For a project with this many moving parts, that's arguably the right level anyway.

What worked

  • Ambient development is real. The cognitive overhead of the approval loop is low enough that walking actively helps architectural thinking.
  • Voice-first input works for prompt composition. Not perfect, but sufficient.
  • Phone as thin client. Functionally equivalent to a laptop for the human-in-the-loop surface.
  • Cognitive offloading. Moving forces you to reason about structure rather than grep through files. Helps with modularity.
  • Genuinely fun.

What didn't work

  • Voice-to-text accuracy. Mishears technical terms and identifiers. Not continuous like Gemini's voice mode either: dictate, review, send.
  • No push notifications for agent state. Had to keep checking whether Claude was waiting for an approval. A notification on agent yield would change this significantly.
  • No code navigation without the agent. Every file lookup costs context window tokens.
  • Session stability. Occasional remote session hiccups requiring reopen.
  • UI collision. Tool-use approval buttons appearing during typing cause misclicks.
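The missing notification is the easiest of these gaps to picture. A sketch of what "notify on agent yield" could look like, assuming a stream of agent states (the names `AgentState` and `watch` are mine; a channel stands in for whatever the client would actually subscribe to):

```rust
use std::sync::mpsc;

// Hypothetical agent states as seen by a thin client.
#[derive(Debug, PartialEq)]
enum AgentState {
    Working,
    WaitingForApproval,
    Done,
}

// Instead of the human polling the phone, push a notification the
// moment the agent yields for a decision or finishes.
fn watch(states: mpsc::Receiver<AgentState>, notify: impl Fn(&str)) {
    for state in states.iter() {
        match state {
            AgentState::WaitingForApproval => notify("agent needs a decision"),
            AgentState::Done => {
                notify("agent finished");
                break;
            }
            AgentState::Working => {} // stay quiet; no checking required
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(AgentState::Working).unwrap();
    tx.send(AgentState::WaitingForApproval).unwrap();
    tx.send(AgentState::Done).unwrap();
    drop(tx);
    watch(rx, |msg| println!("push: {msg}"));
}
```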

Side effects: what building this way does to you

There's something that doesn't show up in the commit count.

Sharper instincts for code boundaries. When you can't scroll through files, module boundaries need to be explicit and self-explanatory. You notice when an interface is too wide or a module's responsibility is unclear, because those are the moments you need three follow-up questions instead of one. This trains you to read and reason about code structure faster, whether you're on a phone or not.

Code organization follows your mental model. When navigation is by description ("show me the task lifecycle," "what handles NATS events"), the codebase starts reflecting how you reason about it. Modules get named for what they do, not where they sit in a directory tree. Interfaces get narrower because you want to ask for one thing and get one thing back.

Traditional guardrails harden continuously. When an agent writes the code, you stop trusting that things are correct just because they compile. More tests, stricter type boundaries, better CI, more explicit conventions. The BDD specs, the CONVENTIONS.md, the orchestration spec all exist because this workflow surfaces the cost of ambiguity immediately. And this is where the human in the loop actually matters most: every intermediate artifact becomes a quality gate. A PR review isn't just a formality, it's the moment you catch what the agent missed. A rendered UI isn't just a preview, it's verification that intent survived translation. Every checkpoint (a green CI run, a visual diff, a passing smoke test) carries more weight now because the thing that produced the code between checkpoints isn't reasoning the way you would. The quality assurance surface doesn't shrink when agents write code. It grows.

Conclusion

I built a polyglot multi-agent orchestration system with 112 commits, BDD specs, architecture docs, and a real-time dashboard, mostly from a phone while walking. The project itself was born from the friction of doing exactly that.

The limiting factor isn't the device. It's how well you know your architecture and how clearly you can describe intent to the agent.

As AI coding tools converge on the same agentic loop, the interface becomes thinner. The logical endpoint: the "IDE" is just a notification that your agent needs a decision. Everything else happens in the background.

If you want to check out the repository and see what I made: https://github.com/markkovari/clutter

Highly recommend experimenting with it. Worst case, you go for a nice walk.
