There's a debate happening right now in every team building with AI agents: MCP or A2A?
It's a good debate. Both protocols are real, well-specified, and increasingly well-supported. But the framing is incomplete. The conversation covers two layers — how agents access tools, and how agents talk to each other — and skips the third entirely: how agents talk to humans.
MCP: the tool layer
Model Context Protocol, developed by Anthropic, is a standardized way for agents to access external resources — APIs, databases, files, functions. Instead of every agent hard-coding its own Stripe integration or bespoke database connector, you expose the tool as an MCP server and any MCP-compatible agent can use it.
The flow is simple:
# An MCP server exposes tools in a standard format
# The agent discovers available tools, then calls one:
POST /mcp/call
{
  "tool": "query_database",
  "input": {
    "query": "SELECT * FROM orders WHERE status = 'pending' LIMIT 10"
  }
}
The MCP server handles auth, validates the input, runs the query, returns structured output. The agent doesn't need to know anything about the database — just the tool contract.
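To make the "tool contract" idea concrete, here's a minimal sketch in plain Python — illustrative only, with made-up names (`register_tool`, `call_tool`); the real MCP spec defines discovery, JSON schemas, and transport in far more detail:

```python
from typing import Any, Callable

# Registry mapping tool names to their contract: an input schema + a handler.
# The agent only ever sees the name and schema, never the backend.
TOOLS: dict[str, dict[str, Any]] = {}

def register_tool(name: str, input_schema: dict, handler: Callable[[dict], Any]) -> None:
    """Expose a tool under a standard contract (hypothetical helper)."""
    TOOLS[name] = {"input_schema": input_schema, "handler": handler}

def call_tool(name: str, payload: dict) -> Any:
    """Roughly what an MCP-style server does on a call: validate, run, return."""
    tool = TOOLS[name]
    for field in tool["input_schema"]["required"]:
        if field not in payload:
            raise ValueError(f"missing required field: {field}")
    return tool["handler"](payload)

# A fake "database" tool — the agent never sees this implementation.
register_tool(
    "query_database",
    {"required": ["query"]},
    lambda p: [{"order_id": 1, "status": "pending"}],
)

result = call_tool("query_database", {"query": "SELECT * FROM orders"})
print(result)
```

The point of the sketch is the separation: swapping the database for a different backend changes the handler, not the contract the agent codes against.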
This is genuinely useful. The ecosystem is growing. If you're building agents today and you haven't looked at MCP, you're probably reinventing it manually.
What MCP doesn't cover: at its core, it's request/response. The agent calls, the tool answers. There's no first-class concept of the tool — or a human acting as a tool — pushing something back to the agent unprompted. MCP is the agent's outbound interface to the world.
A2A: the coordination layer
Agent-to-Agent protocol, led by Google, handles how agents collaborate with each other. One agent can delegate a subtask to a specialist agent, receive results, and incorporate them into a larger workflow.
The core mechanism is the Agent Card — a self-description that each agent publishes: what it can do, what protocols it speaks, what kinds of requests it accepts. Before delegating, a coordinator agent looks up the Agent Cards of its candidates and picks the right one.
# A coordinator agent delegates a research subtask to a specialist
import requests
# 1. Discover the research agent
agent_card = requests.get("https://agents.internal/research-bot/.well-known/agent.json").json()
# 2. Delegate the task
task = requests.post(agent_card["endpoint"], json={
    "task": "summarize recent papers on vector database indexing",
    "format": "bullet points, max 5",
    "deadline": "2026-03-07T20:00:00Z"
})
# A2A is async — you get a task ID, then poll for completion
task_id = task.json()["task_id"]
# {"task_id": "abc-123", "status": "submitted"}
(A2A task execution is asynchronous; simplified above for readability. See the A2A spec for streaming/callback patterns.)
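Here's what the "poll for completion" step might look like, factored so the status fetcher is injectable — the endpoint shape and status values are assumptions for illustration, not the A2A spec's actual wire format:

```python
import time
from typing import Callable

def wait_for_task(task_id: str,
                  fetch_status: Callable[[str], dict],
                  poll_interval: float = 2.0,
                  timeout: float = 60.0) -> dict:
    """Poll an A2A-style task until it reaches a terminal state.

    fetch_status is injected so this works against any transport; in
    production it might be something like:
        lambda tid: requests.get(f"{endpoint}/tasks/{tid}").json()
    (hypothetical URL shape — check the A2A spec for the real one).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(task_id)
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```

Streaming or callback delivery (which the spec also supports) avoids the polling loop entirely; this is just the simplest working shape.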
A2A is shipping in real stacks now. Microsoft Foundry added an A2A Tool preview in January 2026. Google's ADK has multi-agent coordination patterns built on it. Huawei's Agentic Communication Network at MWC 2026 covers A2A task sessions for enterprise. It's early, but it's converging.
What A2A doesn't cover: it's agent-to-agent. The moment one of those agents needs a human decision — judgment about something ambiguous, access to private context it doesn't have, approval before a destructive action — A2A has no mechanism for that handoff.
The gap: nobody specified the human layer
Here's the three-layer picture that's emerging:
tools ←—[MCP]—→ agents ←—[A2A]—→ agents ←—[???]—→ humans
MCP handles the left edge. A2A handles the middle. The right edge — how agents reach humans and how humans respond in context — is still improvised.
In practice, teams end up with one of three patterns:
Pattern 1: Tail the logs
python my_agent.py >> agent.log 2>&1 &
tail -f agent.log
Works for one-shot jobs. Falls apart when you're running 5 agents overnight and one hits an edge case at 2am.
Pattern 2: Bolt on a Slack bot
# When agent needs human input:
slack_client.chat_postMessage(
channel="#agent-alerts",
text=f"Agent needs a decision: {decision_point}"
)
# Then... what? You check Slack. Maybe. Eventually.
# The agent has no way to receive your reply.
One-way. The agent fires and forgets. You can't respond in context. The agent doesn't know you replied. Three Slack bot integrations later, the API changes and it breaks.
Pattern 3: Polling a REST endpoint
import time

import requests

while True:
    response = requests.get("http://localhost:8000/human-input")
    if response.json()["status"] == "ready":
        handle_input(response.json()["content"])
    time.sleep(5)
At least it's bidirectional. But you've now built a bespoke approval interface for every agent, and five-second polling is a bad foundation for anything latency-sensitive.
(This is what Agent United replaces — a persistent WebSocket connection per agent, no polling.)
None of these are protocols. They're workarounds.
What a proper agent-to-human layer needs
The reason this is hard is that the requirements are different from both MCP and A2A:
Agent-initiated, not command-response. The agent needs to open a conversation — not just respond to a human command. When it hits an ambiguous state at 3am, it should be able to say "I found conflicting data, which source do you want me to trust?" and block until it gets an answer.
Bidirectional with context. The human's reply needs to become part of the agent's context for the next step. Not a log entry. Not a webhook payload that gets dropped into a queue. Actual context.
Persistent history. If the agent restarts, the conversation shouldn't disappear. If you're reviewing what an agent did last Tuesday, you should be able to read the thread.
Auth separation. Agents and humans are different principals in the same communication channel. They need different auth paths — API keys for agents, session tokens for humans — but they should be able to participate in the same conversation.
Real-time transport. Polling a REST endpoint every five seconds is brittle. A persistent WebSocket connection lets the agent listen for human replies without burning resources or introducing latency.
This isn't a novel set of requirements. It's basically what a chat platform does. The gap is that nobody has built one where the agents are first-class participants, not bots bolted on with a webhook.
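The requirements above can be sketched with nothing more than two queues and a thread — a toy model, with made-up names, where in-memory queues stand in for the WebSocket and a list stands in for the persistent store:

```python
import queue
import threading

class Conversation:
    """Toy sketch of an agent-initiated, bidirectional channel.

    A real system needs a network transport, durable storage, and
    per-principal auth; this only illustrates the interaction shape.
    """

    def __init__(self) -> None:
        self.to_human: "queue.Queue[str]" = queue.Queue()
        self.to_agent: "queue.Queue[str]" = queue.Queue()
        self.history: list[tuple[str, str]] = []  # the persistent thread

    def ask_human(self, question: str, timeout: float = 30.0) -> str:
        """Agent side: open the conversation and block until a reply arrives."""
        self.history.append(("agent", question))
        self.to_human.put(question)
        reply = self.to_agent.get(timeout=timeout)
        self.history.append(("human", reply))
        return reply  # the reply becomes part of the agent's context

    def human_reply(self, text: str) -> None:
        """Human side: answer the pending question."""
        self.to_agent.put(text)

convo = Conversation()

def agent() -> None:
    answer = convo.ask_human("Conflicting data: trust source A or B?")
    print(f"agent proceeding with: {answer}")

t = threading.Thread(target=agent)
t.start()
question = convo.to_human.get(timeout=5)  # a human UI would render this
convo.human_reply("source A")
t.join()
```

Note what falls out for free: the agent blocks mid-task on `ask_human`, the reply flows back into its control flow rather than a log, and the whole exchange is readable afterwards in `history`.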
Where things stand
MCP is stable and growing. The tooling is there, the ecosystem is building, and Anthropic has done the hard work of specifying it clearly.
A2A is early but moving fast. Google, Microsoft, and Huawei shipping production implementations in Q1 2026 is a strong signal.
The human layer: still improvised. No standard protocol. No dominant implementation. Everyone is solving it differently — Slack bots, Telegram webhooks, email, custom UIs — and most of those solutions break at 3+ agents or when APIs change.
This isn't a critique of MCP or A2A. They're solving the right problems for their layers. The human layer is just genuinely harder to standardize because it involves a person, which means latency requirements vary, auth gets complicated, and the interaction model needs to be flexible enough for natural language.
A note on what comes next
If you're architecting agent systems right now, the agent-to-human layer is worth thinking about explicitly before you ship. The decisions you make about it — how agents escalate, how humans respond, what gets persisted — tend to become load-bearing parts of your system fast.
The good news is that the primitives are clear even if the standard isn't: you need bidirectional transport (WebSocket or SSE), a persistent message store (Postgres/Redis), and an auth model that treats agents and humans as separate principals.
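The "separate principals" primitive is worth making concrete. A minimal sketch — field names, credential prefixes, and the `authorize` helper are all invented for illustration, not a proposed schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class PrincipalType(Enum):
    AGENT = "agent"   # authenticates with an API key
    HUMAN = "human"   # authenticates with a session token

@dataclass
class Message:
    """One entry in a conversation; both principal types share the channel."""
    sender_id: str
    sender_type: PrincipalType
    body: str
    sent_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def authorize(sender_type: PrincipalType, credential: str) -> bool:
    """Route each principal type down its own auth path (toy check only)."""
    if sender_type is PrincipalType.AGENT:
        return credential.startswith("ak_")    # API-key path
    return credential.startswith("sess_")      # session-token path
```

The design point is that `sender_type` lives on the message itself, so the same store and transport serve both sides while auth stays split.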
This is the problem we're working on with Agent United — an open-source, self-hosted platform where agents are first-class participants in the communication layer. Worth a look if you're hitting this wall.
The more interesting question for the community: do we need a formal protocol spec for the human layer, the way MCP formalized tool access? Or is this inherently too use-case-specific to standardize? Genuinely curious what people building production agent systems think.
Building something in this space? The thread is open. —Naomi