Yoana Popova for Datopian

Posted on Jun 3 • Originally published at datopian.com

Multi-Agent AI Is Ready. Your Workflow Infrastructure Isn't.

A conversation between Rufus Pollock, founder of Datopian, and Anuar Ustayev, CTO — from the AI Learned Today minicast.

Disclosure: The authors work at Datopian. This post is based on our own experience running multi-agent AI workflows in production — no sponsors, no affiliate links.

Two months ago, Anuar Ustayev — Datopian's CTO — adopted Gas Town as his primary tool for orchestrating multiple AI coding agents. It changed how he worked. Today, if he were starting from scratch, he might not choose it.

That's not a criticism of Gas Town. It's a signal of how fast the tooling landscape is moving — and how immature the infrastructure around multi-agent AI workflows still is.

This post is a distillation of two candid conversations about what multi-agent AI development actually looks like in practice: the workflow shifts, the unsolved problems, and the honest answers to questions most AI content avoids.

The Bottleneck Has Moved

A year ago, the constraint in software development was execution capacity. You had more work than time to implement it. AI coding assistants helped at the margins: autocomplete, boilerplate, quick fixes.

For developers running these tools, that constraint is largely gone.

"Previously it was about having a lot of work to do and then finding time to implement it. Now the bottleneck is around just creating the work — defining it — because you might have ideas, but you need to spec them out."
— Anuar Ustayev

With tools like Claude Code, Codex, and orchestration layers like Gas Town, a single developer can now run multiple agents in parallel, each working on a separate task. The implementation layer has been largely commoditised.

The new constraint is upstream: work definition. Can you write a specification clear and complete enough that an AI agent will implement the right thing?

This is not a minor adjustment. It is a fundamental restructuring of where skilled human effort needs to go.

What Gas Town Actually Does

Gas Town is an open-source tool for orchestrating multiple AI agents from a single terminal interface. Key concepts:

Rigs — project workspaces, each with its own directory, context, and agents. You can have dozens running simultaneously.

The Mayor agent — a central orchestrator that manages all rigs. You interact with it via natural language: "What's the status of project X? Dispatch an agent to pick up the next item in the backlog."

Beads — an AI-native issue tracker that lives as a local database inside each project. Unlike GitHub Issues or Linear, Beads is designed to be read and written by AI agents directly.
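
To make the idea of a local, agent-readable issue tracker concrete, here is a minimal Python sketch. It is not Beads' actual schema or interface: the table layout and the helper names `open_tracker`, `add_issue`, and `claim_next` are assumptions made purely for illustration, using only Python's built-in `sqlite3` module.

```python
import sqlite3


def open_tracker(path=":memory:"):
    """Open (or create) a tiny local issue store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS issues (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'open',
               assignee TEXT
           )"""
    )
    return conn


def add_issue(conn, title):
    """Add an open issue and return its id."""
    cur = conn.execute("INSERT INTO issues (title) VALUES (?)", (title,))
    conn.commit()
    return cur.lastrowid


def claim_next(conn, agent):
    """Assign the oldest open issue to an agent; return (id, title) or None."""
    row = conn.execute(
        "SELECT id, title FROM issues WHERE status = 'open' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute(
        "UPDATE issues SET status = 'in_progress', assignee = ? WHERE id = ?",
        (agent, row[0]),
    )
    conn.commit()
    return row
```

Passing a file path instead of `":memory:"` keeps the store on disk inside the project directory, which is the property the local-database model relies on.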

Anuar's reported time split after several months: 80% planning and supervising, 20% debugging. He hasn't written code himself in a long time.

🎧 We walked through this live in episode 6 of AI Learned Today — including a real terminal demo of the Mayor agent dispatching work.

The Setup: Claude Code with Subagents

Gas Town was genuinely novel when it launched in early 2025. Since then, the major AI platforms have caught up. Claude Code now supports subagent workflows natively. Codex Desktop has similar capabilities.

Anuar's current recommendation for someone starting today:

"Whatever LLM you're already subscribed to — start there. You don't need dozens of agents right away. Start with a few."

The practical workflow:

Start a planning session, using Superwhisper or a similar tool to help draft the spec
Use plan mode to generate a structured spec from a rough idea
Break the spec into discrete tasks (Beads, GitHub Issues, or local markdown)
Dispatch agents to execute tasks in parallel
Review, merge, iterate
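
The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Claude Code, Codex, or Gas Town dispatch work internally: `split_spec`, `run_agent`, and `dispatch` are hypothetical names, and `run_agent` is a stand-in that returns a string where a real setup would call an agent.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Task:
    id: int
    description: str


def split_spec(spec: str) -> list[Task]:
    """Treat each non-empty line of the spec as one discrete task."""
    lines = [line.strip() for line in spec.splitlines() if line.strip()]
    return [Task(id=i, description=text) for i, text in enumerate(lines, start=1)]


def run_agent(task: Task) -> str:
    """Placeholder for a real agent call (hypothetical); returns a fake result."""
    return f"task {task.id} done: {task.description}"


def dispatch(tasks: list[Task], max_agents: int = 3) -> list[str]:
    """Run tasks concurrently; results come back in task order for review."""
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        return list(pool.map(run_agent, tasks))
```

Keeping `max_agents` small mirrors the advice to start with a few agents rather than dozens.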

The Spec Quality Problem

The promise of multi-agent AI is compelling: define work once, let agents execute in parallel, review output. In practice, the quality of what agents produce is highly sensitive to the quality of the input spec.

"I find this with UI issues or others that are quite painful to get to a clear spec for the AI to solve. Right now I might spend 20 minutes trying to spec something to the AI that a human developer would understand in 30 seconds from a screenshot."
— Rufus Pollock

The current best approaches:

Interactive spec refinement — tools like Superwhisper and Claude's plan mode ask follow-up questions and force specification decisions before implementation begins.

Screenshot-to-spec — for UI issues, take a screenshot, describe the problem in one sentence, and let the AI generate the requirements document.

Iterative shaping — borrow from Basecamp's Shape Up methodology: rough idea → shaped spec → ready to build. Don't hand work to an agent until it's shaped.

⚠️ The uncomfortable truth: most developers skip the shaping step because it feels like overhead. With AI agents, skipping it is expensive. A poorly specified task produces something that looks done but isn't — and unpicking it costs more than starting over.
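
One way to make the shaping step hard to skip is to encode it as a gate in whatever tracks your ideas. The Python sketch below is a minimal example of that; the `Stage` values follow the rough idea, shaped spec, ready to build progression, while the class and function names and the "at least one acceptance criterion" rule are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    RAW = 1
    SHAPED = 2
    READY = 3


@dataclass
class Idea:
    title: str
    stage: Stage = Stage.RAW
    acceptance_criteria: tuple = ()


def shape(idea, criteria):
    """Promote an idea to SHAPED once it has explicit acceptance criteria."""
    if not criteria:
        raise ValueError("a shaped spec needs at least one acceptance criterion")
    idea.acceptance_criteria = tuple(criteria)
    idea.stage = Stage.SHAPED


def approve(idea):
    """Mark a shaped idea READY; raw ideas cannot skip shaping."""
    if idea.stage is not Stage.SHAPED:
        raise ValueError("only shaped ideas can be marked ready")
    idea.stage = Stage.READY


def dispatchable(ideas):
    """Return only the ideas that may be handed to an agent."""
    return [idea for idea in ideas if idea.stage is Stage.READY]
```

The point of the design is that nothing reaches an agent unless it has passed through `shape` and `approve` first.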

The Fragmentation Problem Nobody Is Solving

A realistic day of AI-assisted work looks like this: you start a research thread in ChatGPT on your phone, continue in Claude on your laptop, spin up Codex Desktop to implement, then ask a quick question in Gemini because Claude is rate-limited.

Each session exists in isolation. No unified history. No search across sessions.

"I want all of my chat sessions I ever did to be archived somewhere I can search — but not getting in my way. The way a good issue tracker has the past, but keeps me focused on what's right now."
— Rufus Pollock

Current workarounds:

Stick to one tool — single-tool consistency is worth more than the marginal features gained by switching tools
Checkpoint documents — markdown files that capture decisions, dead ends, and current state; portable across tools and sessions
Local-first storage — Beads' local database model is the right architecture: source of truth on your disk, not inside a vendor's SaaS
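
A checkpoint document can be as simple as a generated markdown file. The Python sketch below shows one possible layout built from the three elements named above (decisions, dead ends, current state); the function name `render_checkpoint` and the heading structure are assumptions, not a standard format.

```python
from datetime import date


def render_checkpoint(title, decisions, dead_ends, current_state, on=None):
    """Render a checkpoint as plain markdown text."""
    on = on or date.today()
    lines = [f"# Checkpoint: {title} ({on.isoformat()})", ""]
    for heading, items in (("Decisions", decisions), ("Dead ends", dead_ends)):
        lines.append(f"## {heading}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    lines += ["## Current state", current_state, ""]
    return "\n".join(lines)
```

Because the result is plain text, it can be saved next to the code and pasted into any tool at the start of the next session.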

The tool that builds a unified, searchable, AI-session-aware knowledge layer across Claude, ChatGPT, Codex, and Gemini will be genuinely valuable. It does not yet exist.

The Token Cost Reality

Multi-agent workflows are token-hungry. Anuar uses Claude, OpenAI, and Gemini subscriptions concurrently — and regularly hits rate limits on all three simultaneously.

Mitigation strategies:

Task-appropriate model routing — use cheaper models (Claude Haiku, GPT-4o mini, Gemini Flash) for mechanical tasks. Reserve expensive models for reasoning-intensive work.

Locally hosted and open-weight models — Llama 4 or Mistral, run on your own hardware via Ollama at near-zero marginal cost, or on a hosted service such as Cloudflare Workers AI.

Batching — background agents on non-urgent tasks can be rate-limited deliberately, spreading token consumption across time.
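
Two of these strategies, routing by task type and deliberately slowing background agents, are easy to sketch in Python. Everything named here is an assumption for illustration: the tier labels in `ROUTES` are placeholders rather than real model identifiers, and `Throttle` is a simple minimum-interval limiter, not any provider's rate-limit API.

```python
import time

# Hypothetical tier labels; map them to whichever models you subscribe to.
ROUTES = {"mechanical": "cheap-model", "reasoning": "expensive-model"}


def route(task_kind):
    """Pick a model tier for a task; unknown kinds fall back to the cheap tier."""
    return ROUTES.get(task_kind, ROUTES["mechanical"])


class Throttle:
    """Enforce a minimum gap between calls made by a background agent."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time at which the previous call was allowed to start

    def wait(self):
        """Block until min_interval has passed since the previous call; return the delay."""
        now = self.clock()
        delay = 0.0
        if self.last is not None:
            delay = max(0.0, self.min_interval - (now - self.last))
        if delay:
            self.sleep(delay)
        self.last = now + delay
        return delay
```

A background agent would call `wait()` before each request. Falling back to the cheap tier for unknown task kinds is a cost-first choice; defaulting to the expensive tier is the quality-first alternative.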

What This Means in Practice

The shift from implementation-focused work to spec-and-supervise work is not coming. For developers actively using these tools, it is already here.

Invest in spec quality. Ten minutes of careful shaping can prevent two hours of agent work in the wrong direction.

Build a checkpoint habit. Write checkpoint documents at meaningful points in every AI session, for your own benefit rather than the AI's.

Choose one orchestration tool and go deep. Switching tools by choice only adds to the fragmentation problem.

Track the shaping pipeline. Know which ideas are raw, which are shaped, and which are ready to ship to an agent.

Plan for token costs. Model routing and local hosting are budget management, not optional optimisations.

What's Next

The tools are evolving faster than the workflows. The problems described here — fragmented session history, manual model routing, immature spec tooling — will be partially addressed by the major platforms.

What won't be solved by the platforms: the cognitive discipline required to work well with AI agents at scale. The developers building these practices now will have a durable advantage.

The question worth sitting with: where is the real bottleneck in your workflow? If it's still implementation, the tools in this post will help immediately. If it's already moved to definition, the investment is in a different place entirely.

Based on two episodes of AI Learned Today, a minicast from Datopian. Hosted by Rufus Pollock and Anuar Ustayev.

Datopian builds open data infrastructure for governments, international organisations, and enterprises. We've been building with and for data since we created CKAN in 2006. We've built PortalJS, Datahub.io, Flowershow and many others.
