Vilius

Posted on May 20 • Originally published at workswithagents.dev

The Protocol Stack Nobody Talks About

#ai #agents #opinion #developer

Six agent protocols launched in the last year. Everyone's obsessing over model selection. The operating surface around the model is what actually breaks.

Google I/O opened today with a flood of agent demos. Prompts becoming apps. Vibe coding going production. The spectacle is real. But the thing that determines whether any of this works isn't on stage. It's the quiet protocol stack underneath — MCP, A2A, AGUI, and their contested cousins.

Most teams can tell you which LLM they're using. Almost none can answer: which tools should the agent see? Who else can it delegate to? Where does the human approve or cancel?

Those three questions are the stack. Here's what sits at each layer.

MCP: tools are a security boundary, not a feature toggle

MCP is the most successful agent protocol by far. 14,000 GitHub repos tagged with it. Every major agent platform supports it. An agent connects to an MCP server, gets a list of callable tools, and can actually do work instead of just chatting.

But here's what nobody says out loud: there's no registry. No mcp search. No way for an agent to discover servers programmatically. The 14,000 number is GitHub tag-counting, not a registered directory. Smithery.ai lists about 6,700 — and you browse that with your eyes, not an API. An agent can't ask "find me an MCP server for Salesforce" and get an answer. Discovery is a person reading lists.

That's not a protocol. That's a treasure hunt.

Tool access enables arbitrary code execution and arbitrary data access. MCP was designed for high-trust environments. Now it's everywhere. Invariant Labs has published research on tool poisoning — malicious instructions hidden in tool descriptions that influence agents through the very metadata meant to make tools discoverable.

MCP gets the agent close to the work. It doesn't decide whether the agent should do the work. That's on you.

A2A: coordination isn't free

No single agent does everything. A procurement agent needs a supplier agent. A travel agent needs a hotel agent. A software agent needs a security reviewer. Work is distributed across owners, domains, and expertise.

A2A turns that distribution into something agents can reason about. The agent card is the primitive — a published contract describing what a remote agent is, what it does, which skills it exposes, and how to interact with it.

The cost: coordination adds another surface where latency, failure, permissions, and observability can break. If your agent delegates to another agent, the workflow gets more flexible and less predictable at the same time.

A2A isn't right for every product. A single agent with a small tool set may not need coordination at all. The right question: does this workflow require delegated expertise or authority outside the primary agent?

AGUI: the human control layer nobody builds until it's too late

An agent that's long-running, non-deterministic, and touching external systems needs more than a final answer. Humans need to observe it working, approve sensitive steps, inspect state, understand why it's waiting.

Chatbots don't handle this. Neither do traditional web apps built for request-response.

AGUI is the open candidate for this layer: streaming, shared state, front-end tool calls, custom events, steering, cancellation. It's the protocol most teams will ignore until their agents start doing real work and generating real bugs. They'll wire a model to tools, build a nice chat component, then discover what the agent is really doing — and retroactively bolt on approval buttons, logs, and progress spinners.

None of those are fixes for the root issue: finding the right control points, understanding what the agent is trying to do, and figuring out where the human needs to approve, deny, edit, or cancel.

The three that aren't standards (yet)

A2UI, AP2, and X402 all have real use cases but sit in contested territory.

A2UI is Google's answer to agent-generated interfaces — declarative UI instead of arbitrary HTML. Right direction, narrower scope than the full human control problem.

AP2 and X402 both tackle agent payments. AP2 handles commercial trust and user authorization (60+ collaborators including Mastercard, PayPal, American Express). X402 is Coinbase's HTTP-native machine-to-machine settlement. Payments is the most crowded layer because it's the most valuable. Everyone wants in.

The boring stuff wins

Teams over-focus on model selection and under-specify everything around it. They know which LLM they want. They don't know which tools the agent can or should see. They have a prototype that calls APIs but no interaction model for user approval. They can imagine multiple agents coordinating but have no way to enforce or validate that.

The actual work lives in those questions. The protocol stack isn't glamorous. Neither is infrastructure. But six months from now, the teams that figured out their operating surface will be the ones whose agents still run.

The ones that just picked a model won't know what hit them.

Top comments (1)

NOVAInetwork • May 21

Good breakdown. The layering is right: tools,
coordination, human control, payments. But there
is a gap between the layers that none of these
protocols fill.

MCP gives tool access. A2A gives coordination.
AGUI gives human oversight. AP2 and X402 give
payments. But none of them give the agent a
verifiable identity or a reputation that follows
it across interactions. Agent A delegates to
Agent B via A2A, pays via X402, and the task
fails. What happens next? There is no protocol-
level record of what B promised, what B delivered,
or whether B has failed before.

Discovery without reputation is the treasure hunt
you describe but worse. Even if you solve
discovery with a registry, you still cannot answer
"should I trust this agent" without a track record
that lives outside any single operator's database.

The missing layer in the stack is not another
communication protocol. It is a shared system of
record for agent identity, capability claims, and
economic history. The teams that figure out their
operating surface, as you put it, will eventually
hit this wall: the surface extends beyond what any
one team controls.