DEV Community

Raj Navakoti

Originally published at Medium

Work on the Circuit Board, Don't Box It Yet

Your multi-agent system isn't ready for a UI. And that's fine.


The Temptation

Every enterprise I've seen in the last 18 months does the same thing: they build an agent, it works in a prototype, and immediately someone says "let's make this an app." A nice UI, a button, maybe some charts. Box it up, hand it to users, move on.

I get it. There's a product manager somewhere who just watched the demo and their eyes lit up. "Can we put this in front of customers by Q3?" And the engineer who built it feels the pull too — a polished app feels like real work, a terminal feels like hacking.

But multi-agent orchestration is still in the circuit board stage. And boxing a circuit board is how you kill trust in AI across your entire organisation.


What a Circuit Board Looks Like

I run 17 projects with AI agents. Not through apps. Through terminal sessions and tmux panes.

Here's what a typical multi-agent session looks like on my screen:

┌─────────────────────────┬──────────────────────────┐
│ Agent: Architect        │ Agent: Code Reviewer     │
│                         │                          │
│ > Reading CLAUDE.md...  │ > Waiting for PR...      │
│ > Found 3 context files │ >                        │
│ > Reasoning: "The API   │ >                        │
│   contract suggests     │ >                        │
│   this is a bounded     │ >                        │
│   context for orders,   │ >                        │
│   not fulfillment"      │ >                        │
│ > Tool call: Read       │ >                        │
│   /models/order.yaml    │ >                        │
│ > ...                   │ >                        │
├─────────────────────────┴──────────────────────────┤
│ Orchestrator Log                                    │
│ 14:23:01 architect → code-reviewer: handoff         │
│ 14:23:01 context: 3 files, 2847 tokens              │
│ 14:23:02 code-reviewer: starting review...          │
└─────────────────────────────────────────────────────┘

It's ugly. It's not something you'd demo to a VP. But I can see everything:

  • What context the agent loaded
  • How it reasoned about the problem
  • When it handed off to another agent
  • What got passed in the handoff
  • Where it's stuck

That visibility is the entire point. Because agents fail — and right now, they fail in ways you need to see to fix.


The Failure Modes You Can't See Through a UI

Here's what actually goes wrong in multi-agent systems:

The wrong tool call. Agent picks search_confluence when it should have picked read_api_contract. Through a UI, you see a bad answer. Through the circuit board, you see exactly which tool was selected and why — and you fix the tool selection logic.
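To make that concrete, here is a minimal sketch of what logging a tool selection could look like. The record shape, the `log_tool_call` helper, and the sample rationale are my own assumptions for illustration, not part of any particular agent framework; only the two tool names come from the example above.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class ToolCallRecord:
    """One tool selection, with enough detail to debug a wrong pick."""
    agent: str
    chosen: str
    candidates: list[str]
    reason: str  # the agent's stated rationale, copied verbatim
    timestamp: float = field(default_factory=time.time)


def log_tool_call(record: ToolCallRecord) -> str:
    """Serialise the record as one JSON line and print it to the pane."""
    line = json.dumps(asdict(record), sort_keys=True)
    print(line)
    return line


# Hypothetical example: the agent picked the wrong tool, and the log says why.
line = log_tool_call(ToolCallRecord(
    agent="architect",
    chosen="search_confluence",
    candidates=["search_confluence", "read_api_contract"],
    reason="query mentions 'documentation'",
))
```

With the candidates and the rationale on one line, a bad pick stops being a mystery answer and becomes something you can grep for.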

The handoff fumble. Agent A passes context to Agent B, but drops a critical piece. The user sees a weird response. You see... nothing, because the UI doesn't show inter-agent communication. On the circuit board, the handoff is logged line by line.
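One way to catch a fumble early is to audit each handoff against the context the receiving agent expects. This is a rough sketch under my own assumptions: the `Handoff` class, the `missing_context` helper, and the key names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Handoff:
    """What one agent passes to the next."""
    sender: str
    receiver: str
    context: dict


def missing_context(handoff: Handoff, required: set[str]) -> set[str]:
    """Return the required context keys the sender failed to pass on."""
    return required - set(handoff.context)


# Hypothetical handoff where the API contract was dropped along the way.
handoff = Handoff(
    sender="architect",
    receiver="code-reviewer",
    context={"files": ["/models/order.yaml"], "summary": "orders bounded context"},
)
dropped = missing_context(handoff, {"files", "summary", "api_contract"})
if dropped:
    print(f"{handoff.sender} -> {handoff.receiver}: missing {sorted(dropped)}")
```

The check is deliberately dumb. It only tells you a key is absent, not whether the value that did arrive is any good, so it complements reading the handoff log rather than replacing it.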

The infinite loop. Agent asks for clarification, gets a response, asks for clarification again, gets the same response, asks again. Through a UI, the spinner just keeps spinning. Through tmux, you see the loop happening in real time and kill it at iteration 3, not iteration 47.
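Watching for the loop by eye works, but a small guard can do the same job when you look away. A minimal sketch, assuming the orchestrator can see every message an agent emits; `LoopGuard` and its threshold are illustrative names of mine, not an existing API.

```python
from collections import Counter


class LoopGuard:
    """Flag an agent that keeps emitting the same message."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self._seen = Counter()

    def check(self, agent: str, message: str) -> bool:
        """Record a message; return True once it has repeated too often."""
        # Normalise lightly so trivial whitespace or case changes still match.
        key = f"{agent}:{message.strip().lower()}"
        self._seen[key] += 1
        return self._seen[key] >= self.max_repeats


guard = LoopGuard(max_repeats=3)
tripped_at = None
for iteration in range(1, 48):
    if guard.check("architect", "Can you clarify the requirement?"):
        tripped_at = iteration
        break

print(f"loop stopped at iteration {tripped_at}")
```

Exact-match comparison only catches the crude version of the loop. An agent that rephrases the same question each time would slip past it, which is one more reason to keep the pane visible.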

The confidence problem. Agent is 30% confident in its answer but presents it with 100% certainty. The UI shows a clean response. The circuit board shows the reasoning chain that led there — and you see the hedging, the contradictions, the "I'm not sure about this but..."
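A crude way to surface that gap is to scan the reasoning trace for hedging language and compare it with the final answer. This is only a heuristic sketch: the phrase list and the `hedges_in` function are assumptions of mine, and a keyword match is no substitute for reading the chain yourself.

```python
import re

# Hypothetical, deliberately short list of hedging phrases.
HEDGE_PATTERNS = [
    r"\bnot sure\b",
    r"\bmight\b",
    r"\bprobably\b",
    r"\bI think\b",
    r"\bunclear\b",
]


def hedges_in(text: str) -> list[str]:
    """Return every hedging phrase found in the text, grouped by pattern."""
    found = []
    for pattern in HEDGE_PATTERNS:
        found.extend(
            m.group(0) for m in re.finditer(pattern, text, flags=re.IGNORECASE)
        )
    return found


reasoning = "I'm not sure about this but the contract probably covers orders."
answer = "The contract covers orders."

reasoning_hedges = hedges_in(reasoning)
answer_hedges = hedges_in(answer)
if reasoning_hedges and not answer_hedges:
    print(f"confident answer, hedged reasoning: {reasoning_hedges}")
```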

Every one of these is a real failure I've hit in the last six months. Every one of them was caught because I could see the circuit board. None of them would have been visible through a dashboard.


"But Our Users Need an App"

I hear this constantly. And the answer is: which users?

Power users and engineers should be on the circuit board. They're the ones who can spot failures, provide feedback, and help the system improve. Give them terminal access, tmux sessions, or at minimum a verbose logging view. They'll love it — it's like having X-ray vision into the AI's brain.

Business users are a different story. They need something simpler. But "simpler" doesn't mean "a polished app." It means a carefully constrained interface for a narrow, proven workflow.

The mistake is jumping from "prototype works in terminal" straight to "let's build a full app for everyone." There's a middle ground:

STAGE 1: Circuit Board
  Who: Engineers, power users
  Interface: Terminal / tmux
  Goal: Find failure patterns, refine agents
  Duration: Weeks to months

STAGE 2: Guided Circuit Board
  Who: Technical users, early adopters
  Interface: Simple web UI with visible reasoning
  Goal: Validate with real workflows, broader feedback
  Duration: Weeks

STAGE 3: Protective Case
  Who: Business users, general audience
  Interface: Polished app with "inspect reasoning" option
  Goal: Production use
  Prerequisite: Reliability metrics from Stages 1-2

Most enterprises try to skip to Stage 3. They end up back at Stage 1 anyway — just with more sunk cost and more disappointed users.


The Maturity Test

How do you know when an agent workflow is ready to graduate from circuit board to boxed app? Here's the checklist I use:

[ ] Agent succeeds on 90%+ of cases in its target domain
[ ] Failure modes are known and documented (not "it sometimes breaks")
[ ] Recovery from failure is automated or gracefully handled
[ ] Handoffs between agents are consistent and auditable
[ ] A non-engineer has used it successfully for 2+ weeks
[ ] You can explain every tool call the agent makes, not just the output
[ ] You've watched it fail and know exactly why each time

If you can't check all of these, you're not ready to box it. And that's fine. The circuit board isn't a limitation — it's where the learning happens.
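If you want the checklist to be harder to fudge, it can be written down as a gate. A minimal sketch, assuming you already collect these numbers somewhere; the `MaturityReport` fields and the `unmet_criteria` function are hypothetical names that mirror the seven items above.

```python
from dataclasses import dataclass


@dataclass
class MaturityReport:
    success_rate: float             # fraction of target-domain cases that succeed
    failure_modes_documented: bool
    recovery_handled: bool
    handoffs_auditable: bool
    non_engineer_weeks: float       # weeks of successful use by a non-engineer
    tool_calls_explainable: bool
    failures_understood: bool


def unmet_criteria(report: MaturityReport) -> list[str]:
    """Return the checklist items that are not yet satisfied."""
    checks = {
        "succeeds on 90%+ of target-domain cases": report.success_rate >= 0.90,
        "failure modes known and documented": report.failure_modes_documented,
        "recovery automated or gracefully handled": report.recovery_handled,
        "handoffs consistent and auditable": report.handoffs_auditable,
        "non-engineer used it for 2+ weeks": report.non_engineer_weeks >= 2,
        "every tool call explainable": report.tool_calls_explainable,
        "every observed failure understood": report.failures_understood,
    }
    return [name for name, ok in checks.items() if not ok]


# Hypothetical workflow: strong numbers, but only one week of non-engineer use.
report = MaturityReport(0.93, True, True, True, 1, True, True)
blockers = unmet_criteria(report)
print("ready to box" if not blockers else f"not ready: {blockers}")
```

Most of these fields are still human judgments, so the gate is only as honest as the person filling it in.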


What This Means for Your Enterprise

If you're building multi-agent systems right now, here's the practical takeaway:

Don't invest in agent UIs yet. Invest in observability. Build logging, tracing, and inspection tools. Make the circuit board visible and navigable. That's your competitive advantage — not a pretty dashboard.
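As a starting point, tracing can be as small as a decorator around each agent step. The sketch below uses only Python's standard `logging` module; the `traced` decorator and the `read_model` step are illustrative assumptions, not a real framework's API.

```python
import functools
import logging
import time

logging.basicConfig(
    format="%(asctime)s %(message)s", datefmt="%H:%M:%S", level=logging.INFO
)
log = logging.getLogger("orchestrator")


def traced(agent: str):
    """Log the start, end, duration, and any failure of an agent step."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            log.info("%s: %s start args=%r kwargs=%r", agent, fn.__name__, args, kwargs)
            started = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
            except Exception:
                # Log the traceback, then re-raise so the failure stays visible.
                log.exception("%s: %s failed", agent, fn.__name__)
                raise
            elapsed_ms = (time.perf_counter() - started) * 1000
            log.info("%s: %s done in %.1f ms", agent, fn.__name__, elapsed_ms)
            return result
        return wrapper
    return decorator


@traced("architect")
def read_model(path: str) -> str:
    # Hypothetical step; a real one would read the file from disk.
    return f"loaded {path}"


result = read_model("/models/order.yaml")
```

Each decorated step writes timestamped lines in roughly the same shape as the orchestrator log shown earlier, which is enough to start with before reaching for a dedicated tracing stack.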

Let engineers live in the terminal. The feedback loop from "I saw the agent fail" to "I fixed the prompt" is minutes in a terminal. It's days through a bug report from a UI.

Resist the demo pressure. When leadership asks for a demo, show them the tmux session. Explain what they're seeing. The honest demo — "here's the agent thinking, here's where it struggled, here's how we fixed it" — builds more trust than a polished UI that hides the mess.

Plan for the box, but don't build it yet. Know what the eventual app looks like. Design the API. Sketch the UI. But don't build it until the circuit board tells you it's ready.


One Takeaway

We're in the circuit board era of multi-agent AI. The agents work — sometimes brilliantly — but they fail in ways that require human eyes on the wiring. The enterprises that win won't be the ones that boxed it fastest. They'll be the ones that stayed on the circuit board longest, learned the failure patterns, and only boxed it when it was genuinely reliable.

Ship the circuit board. Let people see it work. The box can wait.


Are you boxing agents too early? Or have you found the right moment to graduate from terminal to app? I'm running 17 agent projects from tmux panes and I'm curious what's working for others at scale.
