MCP Didn't Lose to CLI. My Unity Agent Needed Both.

#ai #unity3d #devtools #agents

Forty minutes into an automated run, the connection to my Unity Editor went dark. My agent had been pushing a batch of PlayMode tests and an Android player build through a Model Context Protocol server, and mid-operation the socket to the Editor dropped. From the agent's point of view, the Editor was now sitting in some half-compiled, unknowable state, and every structured tool call I made came back as an error.

That failure is the boundary most CLI-versus-MCP takes quietly flatten. There's a drift in agent tooling right now — away from MCP servers, toward command-line interfaces. You've seen the takes: MCP is overengineered. The CLI won. Just give the model a shell. And I get the appeal. But that afternoon, staring at a frozen Editor, I didn't reach for one or the other. I reached for both, in sequence, because they were never solving the same problem.

I build mobile games in Unity, and for the last stretch I've been driving the Editor from inside an agent loop — creating GameObjects, editing components, running EditMode and PlayMode tests, kicking off player builds, reading the console, all without touching the mouse. To do it I use a tool I wrote called Heimdall, which happens to expose the Editor through both an MCP server and a CLI. I didn't plan that as a philosophy. It started as an accident of implementation and turned into the most important design decision in the whole system. So when people frame this as CLI-versus-MCP, my honest reaction is: you're asking the wrong question. You don't pick a winner. You figure out which surface each job belongs on.

Let me show you what I mean, starting with the frozen Editor.

The recovery layer: what the CLI is actually for

When the socket dropped, the MCP tools had no reliable process left to talk to — and that's not a knock on MCP, it's the nature of this kind of bridge. My MCP server was a live connection into a running Unity Editor; when that process's state goes uncertain, so does every call you make into it. The agent loop had nothing to grab.

So I dropped to the shell. First command: heimdall doctor. Heimdall runs as a stack — client to host to a HMAC-signed websocket into the Editor — and doctor walks that stack layer by layer and tells you exactly which tier is down. In about two seconds I knew it wasn't my code and it wasn't the Editor crashing; the websocket had dropped and the host was waiting to reconnect. Diagnosis, not guesswork.

Second command dumped the build-and-test artifact to disk: heimdall retrieve --out report.json. This is the other thing the CLI is uniquely good at. That artifact was thousands of lines. You do not want thousands of lines flooding back through the agent's context window — that's how you burn tokens and blow past the model's attention. The CLI streams it straight to a file, where I can grep it, diff it, and feed the model only the lines that matter.

And the lines that mattered were tiny. Under four thousand lines of asset-import spam, the build had failed on exactly one thing: a single null AssetReference field on a ScriptableObject that a PlayMode test depended on. Grep found it in a second. There was no world in which I wanted the agent to read four thousand lines to discover that; there was every reason to let a shell command surface the one that counted.

Notice what the CLI did here. It didn't mutate anything. It didn't reach into the scene and fix a component. It diagnosed and it moved bulk data. That's the pattern. The CLI is the recovery-and-throughput layer: it's what you want when the structured connection is broken, when you're triaging, when the payload is too big to live in context, when you're scripting the same operation across two hundred files. It thrives precisely where a stateful protocol is the wrong tool.

The control layer: what MCP is actually for

Here's the part the "just give the model a shell" crowd skips over. Once doctor told me the websocket had simply dropped, the fix was to reconnect — and the moment the connection was back, I went straight back to MCP. Because now the job changed. I wasn't triaging anymore; I needed to reach into the Editor and safely change things. Restore that missing reference. Re-run the one failing test. Reparent an object and re-serialize the scene.

Try doing that by hand-patching files and you'll feel the difference immediately. Most raw shell edits collapse rich state down into arguments, files, and stdout. Unity, meanwhile, serializes asset references through GUIDs plus file and type metadata — and if the agent generates a shell edit that writes the wrong GUID into a .prefab or .asset file, the reference may deserialize as missing or invalid, and you might not catch it until a later import or test run. A typed, validated operation avoids that class of mistake. In Heimdall, when the agent calls "set property X on component Y to value Z," that call is schema-checked and validated against the live Unity object before it ever touches the Editor. The model gets structured errors back it can actually reason about, not a wall of stderr. For anything that changes state inside a live, complex process, that typed contract isn't bureaucracy — it's the guardrail that lets an agent act without wrecking the thing it's acting on.

So I restored the reference through a single typed property call, re-ran that one PlayMode test through MCP, and watched it go green — inside the loop, no mouse, no corrupted asset. The CLI could tell me what was broken. In this stack, MCP was the safer place to change it — I didn't want the agent hand-patching serialized YAML from a shell.

The hybrid is the actual answer

Put the two next to each other and the false binary dissolves:

MCP is the structured control surface. Typed, stateful, validated. It's how the agent acts on a live process from inside the loop without turning a small mistake into a corrupted asset.
The CLI is the recovery-and-throughput surface. Diagnostic, batch, high-bandwidth. It's how you survive when the structured connection breaks, and how you move data too big or too repetitive to belong in the loop at all.

They fail in opposite directions, which is exactly why they cover for each other. The CLI stayed useful even while the live Editor connection was uncertain, because those commands inspect the stack and read persisted artifacts instead of needing a healthy in-Editor call — that's the moment MCP is weakest. A typed operation, meanwhile, can safely mutate deep structured state that a hand-written shell edit can't — that's the moment the CLI is weakest. Run only one and you have a permanent blind spot. Run both and each one's weakness sits squarely inside the other's strength.

The industry drift toward the CLI isn't wrong, exactly — it's a correction. A lot of MCP servers are overengineered wrappers around things a shell command would do fine, and reaching for the CLI there is just good taste. But "the CLI is often the right call" quietly became "MCP lost," and that second step doesn't follow. The strongest agent stacks I've used or built aren't the ones that picked a side. They're the ones that stopped treating it as a side to pick.

So before you rip the MCP server out of your stack because a thread told you the CLI won: look at what your agent actually does. Is it moving data and diagnosing failures, or is it reaching into a live system and changing its state? Because that's not one question with one answer. It's two jobs — and they want two different surfaces.

Which one is your agent stuck using for both?