A2A Is Not Enough: Production Multi-Agent Systems Need a Control Plane
Most discussions about multi-agent systems still stop at communication.
That is necessary, but it is not sufficient.
To run multi-agent systems in production, you need to address three distinct layers:
- Agent-to-agent coordination
- Agent-to-tool integration
- Observability and governance
In 2025 and 2026, the open ecosystem matured considerably on the first two, and the third started to get explicit attention:
- The Linux Foundation launched the Agent2Agent (A2A) Protocol Project on June 23, 2025, to host the open protocol Google created for secure agent-to-agent communication.
- Anthropic announced on December 9, 2025 that it was donating MCP (Model Context Protocol) to the Agentic AI Foundation, and reported 10,000+ active public MCP servers and 97M+ monthly SDK downloads across Python and TypeScript.
- In 2026, observability vendors started saying the quiet part out loud: observability is the control plane for agentic systems.
Taken together, these three developments define the real architecture shift.
The stack is getting clearer
A useful way to think about modern agent systems is:
- A2A handles agent-to-agent interoperability
- MCP handles agent-to-tool connectivity
- Observability/control-plane infrastructure handles traceability, policy, reliability, and cost control
That separation matters because teams often try to force one layer to do the job of another.
What A2A is good at
According to the Linux Foundation announcement, A2A exists to let agents:
- discover one another
- exchange information securely
- collaborate across systems
- reduce vendor lock-in
- interoperate across platforms and frameworks
That is the collaboration layer.
If you have specialized agents for planning, coding, research, operations, or QA, A2A gives you a standard way to route work between them.
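To make "route work between them" concrete, here is a minimal sketch of capability-based discovery and routing. The `AgentCard` record is a hypothetical, heavily simplified stand-in loosely inspired by the discovery metadata A2A agents publish; its field names are illustrative, not the A2A schema, and there is no network transport.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentCard:
    """Hypothetical, simplified discovery record for one agent (not the A2A schema)."""
    name: str
    endpoint: str                                   # illustrative address only
    skills: frozenset[str] = field(default_factory=frozenset)


class Registry:
    """In-memory registry: agents advertise skills, callers discover by skill."""

    def __init__(self) -> None:
        self._cards: dict[str, AgentCard] = {}

    def register(self, card: AgentCard) -> None:
        self._cards[card.name] = card

    def discover(self, skill: str) -> list[AgentCard]:
        # Every agent advertising the skill, sorted by name for deterministic routing.
        return sorted(
            (c for c in self._cards.values() if skill in c.skills),
            key=lambda c: c.name,
        )

    def route(self, skill: str) -> AgentCard:
        # Pick the first match; a real router might weigh load, cost, or trust.
        matches = self.discover(skill)
        if not matches:
            raise LookupError(f"no agent advertises skill {skill!r}")
        return matches[0]


registry = Registry()
registry.register(AgentCard("planner", "https://agents.example/planner", frozenset({"plan"})))
registry.register(AgentCard("coder", "https://agents.example/coder", frozenset({"code", "refactor"})))
registry.register(AgentCard("reviewer", "https://agents.example/reviewer", frozenset({"review", "code"})))

print(registry.route("refactor").name)               # coder
print([c.name for c in registry.discover("code")])   # ['coder', 'reviewer']
```

The point of the sketch is the separation: callers ask for a capability, not a specific agent, which is what keeps the collaboration layer swappable.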
But A2A does not tell you:
- which tool call caused a bad decision
- why costs spiked
- which memory retrieval corrupted the context
- whether a guardrail fired too late
- how to replay a failure path
That is not a protocol failure. It is simply not the protocol’s job.
What MCP is good at
MCP solved a different problem: connecting models and agents to external systems through a common interface.
The December 2025 Anthropic announcement matters because it showed MCP was no longer a niche developer experiment. The protocol had:
- 10,000+ active public servers
- support across major products and platforms
- 97M+ monthly SDK downloads
That is what production gravity looks like.
MCP gives teams a standard way to expose tools, data sources, connectors, and workflows to agent runtimes.
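For a rough sense of what that standard interface looks like: MCP messages are JSON-RPC 2.0, and tools are listed and invoked with `tools/list` and `tools/call` requests. The in-process dispatcher below is a simplified illustration under those assumptions, not a conformant MCP server (it skips the initialization handshake, transports, and most of the spec), and the `lookup_order` tool is hypothetical.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry: name -> (description, JSON Schema for arguments, handler).
TOOLS: dict[str, tuple[str, dict[str, Any], Callable[..., str]]] = {
    "lookup_order": (
        "Look up an order's status by id",
        {"type": "object",
         "properties": {"order_id": {"type": "string"}},
         "required": ["order_id"]},
        lambda order_id: f"order {order_id}: shipped",
    ),
}


def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request for tools/list or tools/call."""
    req = json.loads(raw)
    method, params = req.get("method"), req.get("params", {})
    if method == "tools/list":
        result: dict[str, Any] = {"tools": [
            {"name": n, "description": d, "inputSchema": s}
            for n, (d, s, _) in TOOLS.items()
        ]}
    elif method == "tools/call":
        entry = TOOLS.get(params.get("name"))
        if entry is None:
            return json.dumps({"jsonrpc": "2.0", "id": req.get("id"),
                               "error": {"code": -32602, "message": "unknown tool"}})
        text = entry[2](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"),
                           "error": {"code": -32601, "message": "method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "result": result})


listing = json.loads(handle('{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}'))
print([t["name"] for t in listing["result"]["tools"]])   # ['lookup_order']

call = json.loads(handle(json.dumps({
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "lookup_order", "arguments": {"order_id": "A17"}},
})))
print(call["result"]["content"][0]["text"])               # order A17: shipped
```

Because the agent only ever speaks `tools/list` and `tools/call`, swapping the handler behind `lookup_order` does not change any agent code.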
But again, MCP does not solve the entire production problem.
Even with perfect tool access, you still need to answer:
- Who approved this action?
- What prompt and context produced it?
- What chain of tools did the agent invoke?
- How much latency, token usage, and spend did the run create?
- What happened when the system degraded?
Why observability is the missing layer
Arthur AI’s 2026 observability playbook describes observability as the layer that turns autonomous behavior into measurable, auditable outcomes.
That framing is correct.
Traditional monitoring tells you whether a service is up.
Agent observability tells you:
- what the agent attempted
- why it chose that path
- which tools it called
- what context it consumed
- where the workflow stalled or drifted
- how much it cost
- whether it stayed inside policy
That is the difference between a demo and a system you can actually operate.
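A minimal sketch of the trace structure that makes those questions answerable: each step records a span with a parent, so a run becomes a tree you can walk to see what was attempted, why, and where it stalled. The `Span` fields and the example run are hypothetical; production systems would typically emit comparable data through a tracing SDK such as OpenTelemetry rather than hand-rolled classes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of agent work: an LLM step, a tool call, a retrieval (hypothetical schema)."""
    span_id: str
    parent_id: Optional[str]
    name: str
    duration_ms: float
    attrs: dict = field(default_factory=dict)
    error: Optional[str] = None


def children(spans: list[Span], parent_id: Optional[str]) -> list[Span]:
    return [s for s in spans if s.parent_id == parent_id]


def render(spans: list[Span], parent_id: Optional[str] = None, depth: int = 0) -> list[str]:
    """Render the run as an indented tree, depth-first."""
    lines = []
    for s in children(spans, parent_id):
        status = f" ERROR={s.error}" if s.error else ""
        lines.append(f"{'  ' * depth}{s.name} {s.duration_ms:.0f}ms{status}")
        lines.extend(render(spans, s.span_id, depth + 1))
    return lines


def slowest_leaf(spans: list[Span]) -> Span:
    """Where did the run stall? The slowest span with no children is a first suspect."""
    leaves = [s for s in spans if not children(spans, s.span_id)]
    return max(leaves, key=lambda s: s.duration_ms)


run = [
    Span("1", None, "agent.run planner", 9200),
    Span("2", "1", "llm.decide", 800, {"reason": "need order history"}),
    Span("3", "1", "tool.call crm.search", 7900, {"tool": "crm.search"}),
    Span("4", "1", "tool.call email.send", 300, {"tool": "email.send"}, error="PolicyDenied"),
]

print("\n".join(render(run)))
print(slowest_leaf(run).name)   # tool.call crm.search
```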
The production architecture pattern
A practical production stack looks like this:
1. Specialized agents
Each agent should own a bounded role:
- planner
- researcher
- coder
- reviewer
- operator
- customer-facing executor
Do not create a committee of identical generalists.
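One way to make "bounded role" enforceable rather than aspirational is to give each role an explicit allowlist and check it before any tool call runs. The role names come from the list above; the tool names and the `ROLE_TOOLS` mapping are hypothetical.

```python
from enum import Enum


class Role(Enum):
    PLANNER = "planner"
    RESEARCHER = "researcher"
    CODER = "coder"
    REVIEWER = "reviewer"
    OPERATOR = "operator"
    EXECUTOR = "customer-facing executor"


# Hypothetical allowlists: each role sees only the tools its job requires.
ROLE_TOOLS: dict[Role, frozenset[str]] = {
    Role.PLANNER: frozenset({"tasks.create"}),
    Role.RESEARCHER: frozenset({"web.search", "docs.read"}),
    Role.CODER: frozenset({"repo.read", "repo.write", "sandbox.exec"}),
    Role.REVIEWER: frozenset({"repo.read", "ci.status"}),
    Role.OPERATOR: frozenset({"deploy.rollout", "deploy.rollback"}),
    Role.EXECUTOR: frozenset({"crm.read", "email.send"}),
}


class RoleViolation(PermissionError):
    pass


def is_allowed(role: Role, tool: str) -> bool:
    return tool in ROLE_TOOLS[role]


def authorize(role: Role, tool: str) -> None:
    """Raise if a role reaches for a tool outside its bounded responsibility."""
    if not is_allowed(role, tool):
        raise RoleViolation(f"{role.value} may not call {tool}")


authorize(Role.CODER, "repo.write")            # allowed, returns None
try:
    authorize(Role.REVIEWER, "repo.write")     # reviewers read; they do not write
except RoleViolation as exc:
    print(exc)                                 # reviewer may not call repo.write
```

Narrow allowlists also make traces easier to read: an unexpected tool call is a policy signal, not just noise.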
2. Standardized coordination via A2A
Use A2A for:
- capability discovery
- task routing
- handoffs
- delegation
- status exchange
This keeps the collaboration layer interoperable.
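Status exchange only works if both sides agree on which states a delegated task can be in and which transitions are legal. The sketch below is a small state machine whose state names loosely echo A2A's task lifecycle (submitted, working, input-required, completed, failed, canceled); treat it as a simplified, hypothetical model, not the specification's exact semantics.

```python
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"


TERMINAL = {TaskState.COMPLETED, TaskState.FAILED, TaskState.CANCELED}

# Legal transitions for a delegated task (a simplified, assumed model).
# Terminal states have no entry, so nothing may follow them.
ALLOWED = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.CANCELED},
    TaskState.WORKING: {TaskState.INPUT_REQUIRED, TaskState.COMPLETED,
                        TaskState.FAILED, TaskState.CANCELED},
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.CANCELED},
}


class DelegatedTask:
    def __init__(self, task_id: str, owner: str) -> None:
        self.task_id = task_id
        self.owner = owner                        # agent currently responsible
        self.state = TaskState.SUBMITTED
        self.history: list[tuple[TaskState, str]] = [(self.state, owner)]

    @property
    def done(self) -> bool:
        return self.state in TERMINAL

    def transition(self, new: TaskState, by: str) -> None:
        """Apply a status update, rejecting illegal or post-terminal transitions."""
        if new not in ALLOWED.get(self.state, set()):
            raise ValueError(f"{self.state.value} -> {new.value} is not allowed")
        self.state = new
        self.history.append((new, by))


task = DelegatedTask("t-42", owner="coder")
task.transition(TaskState.WORKING, by="coder")
task.transition(TaskState.INPUT_REQUIRED, by="coder")   # waits on a planner decision
task.transition(TaskState.WORKING, by="coder")
task.transition(TaskState.COMPLETED, by="coder")
print([s.value for s, _ in task.history])
# ['submitted', 'working', 'input-required', 'working', 'completed']
```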
3. Standardized tools via MCP
Use MCP for:
- databases
- code execution
- retrieval
- external APIs
- internal services
- enterprise systems
This keeps tool access portable.
4. A real control plane
Instrument the system so every important step emits:
- traces
- tool invocations
- memory retrieval events
- prompts and outputs where policy allows
- latency
- token usage
- error classes
- approval events
- policy/guardrail triggers
This is where OpenTelemetry-first design becomes attractive. You want the telemetry path to be portable too.
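As a sketch of what "every important step emits" can mean in practice, the emitter below writes one structured JSON event per step, covering the fields listed above, and drops prompt and output text unless a capture policy allows it. The event kinds, field names, and `capture_content` flag are hypothetical; in an OpenTelemetry-first design these would typically become span attributes and events exported through an OTel SDK rather than printed JSON lines.

```python
import json
import time
from typing import Any, Optional

# Hypothetical event kinds, roughly one per item in the instrumentation list.
EVENT_KINDS = {"tool_invocation", "memory_retrieval", "llm_call",
               "approval", "guardrail_trigger", "error"}
CONTENT_FIELDS = {"prompt", "output"}


class Emitter:
    def __init__(self, trace_id: str, capture_content: bool = False) -> None:
        self.trace_id = trace_id
        self.capture_content = capture_content     # policy switch for prompts/outputs
        self.events: list[dict[str, Any]] = []

    def emit(self, kind: str, *, latency_ms: float, tokens: int = 0,
             error_class: Optional[str] = None, **attrs: Any) -> dict[str, Any]:
        if kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind {kind!r}")
        if not self.capture_content:
            # Drop raw prompt/output text when policy does not allow capturing it.
            attrs = {k: v for k, v in attrs.items() if k not in CONTENT_FIELDS}
        event = {"trace_id": self.trace_id, "ts": time.time(), "kind": kind,
                 "latency_ms": latency_ms, "tokens": tokens,
                 "error_class": error_class, **attrs}
        self.events.append(event)
        print(json.dumps(event, sort_keys=True))   # stand-in for a real exporter
        return event


em = Emitter("run-7", capture_content=False)
em.emit("llm_call", latency_ms=640, tokens=1830, prompt="customer PII here", model="m1")
em.emit("guardrail_trigger", latency_ms=2, rule="pii_in_output")
em.emit("tool_invocation", latency_ms=5100, tool="crm.search", error_class="Timeout")
```

Keeping the trace id on every event is what lets tool calls, guardrail hits, and costs be stitched back into a single run later.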
Common failure mode: protocol maximalism
A common mistake is thinking:
“If we adopt the right protocol, the system becomes production-ready.”
It does not.
Protocols reduce integration friction. They do not magically produce:
- good decomposition
- cost discipline
- sane governance
- testability
- rollback paths
- useful failure handling
Production multi-agent systems rarely fail for lack of the right slogan; they fail for lack of operational structure.
What teams should build now
If you are building an agent platform in 2026, prioritize in this order:
First: make actions visible
Before adding more agent roles, ensure you can answer:
- What did the system do?
- Why did it do it?
- Which tool calls happened?
- What did the run cost?
- Where did it fail?
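Those five questions map onto a small report computed from a run's event log. The sketch below assumes a hypothetical event shape and a made-up flat per-token price; real pricing varies by model and usually differs between input and output tokens.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    step: int
    kind: str          # "decision", "tool_call", or "error" (hypothetical kinds)
    detail: str        # what happened, or the stated reason for a decision
    tokens: int = 0


PRICE_PER_1K_TOKENS = 0.01   # illustrative placeholder, not a real price


def run_report(events: list[Event]) -> dict:
    ordered = sorted(events, key=lambda e: e.step)
    first_error: Optional[Event] = next((e for e in ordered if e.kind == "error"), None)
    total_tokens = sum(e.tokens for e in ordered)
    return {
        "what": [e.detail for e in ordered],                                  # What did it do?
        "why": [e.detail for e in ordered if e.kind == "decision"],           # Why?
        "tool_calls": [e.detail for e in ordered if e.kind == "tool_call"],   # Which tools?
        "cost_usd": round(total_tokens / 1000 * PRICE_PER_1K_TOKENS, 4),      # What did it cost?
        "failed_at": first_error.step if first_error else None,               # Where did it fail?
    }


events = [
    Event(1, "decision", "refund requested; check order first", tokens=900),
    Event(2, "tool_call", "orders.get(A17)", tokens=150),
    Event(3, "tool_call", "payments.refund(A17)", tokens=200),
    Event(4, "error", "payments.refund: 403 Forbidden"),
]
report = run_report(events)
print(report["tool_calls"], report["cost_usd"], report["failed_at"])
```

If a report like this cannot be produced for an arbitrary past run, adding more agent roles will only make the blind spots larger.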
Second: make collaboration explicit
Use A2A or an equivalent contract so delegation is structured, not ad hoc prompt passing.
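The practical difference between structured delegation and ad hoc prompt passing is that the handoff has a schema the receiver can validate before doing any work. The `Delegation` fields below (goal, inputs, acceptance criteria, budget, deadline) are a hypothetical contract for illustration, not A2A's message format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Delegation:
    task_id: str
    from_agent: str
    to_agent: str
    goal: str
    inputs: dict = field(default_factory=dict)
    acceptance: tuple[str, ...] = ()          # checks the result must satisfy
    max_cost_usd: float = 1.0
    deadline: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc) + timedelta(minutes=10))

    def problems(self) -> list[str]:
        """Return validation errors; an empty list means the receiver can accept."""
        issues = []
        if not self.goal.strip():
            issues.append("goal is empty")
        if not self.acceptance:
            issues.append("no acceptance criteria")
        if self.max_cost_usd <= 0:
            issues.append("budget must be positive")
        if self.deadline <= datetime.now(timezone.utc):
            issues.append("deadline already passed")
        if self.from_agent == self.to_agent:
            issues.append("agent cannot delegate to itself")
        return issues


ok = Delegation("t-9", "planner", "researcher", "Summarize Q3 churn drivers",
                inputs={"quarter": "Q3"}, acceptance=("cites sources", "under 300 words"))
vague = Delegation("t-10", "planner", "planner", "   ")
print(ok.problems())      # []
print(vague.problems())   # ['goal is empty', 'no acceptance criteria', 'agent cannot delegate to itself']
```

A receiver that rejects `vague` up front costs one round trip; a receiver that guesses at it costs a whole failed run.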
Third: make tools standardized
Use MCP or an equivalent abstraction so agents are not tightly coupled to one-off integrations.
Fourth: add governance where it matters
Add policy checks on:
- sensitive tool access
- irreversible actions
- customer-impacting outputs
- cost thresholds
- escalation boundaries
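Those checks can live in a single gate that every proposed action passes through before execution, returning allow, require approval (the escalation path), or deny. The action fields, thresholds, and tool names below are hypothetical placeholders for an organization's own policy.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    agent: str
    tool: str
    irreversible: bool = False
    customer_facing: bool = False
    est_cost_usd: float = 0.0


SENSITIVE_TOOLS = {"payments.refund", "db.drop_table", "iam.grant"}   # hypothetical
COST_APPROVAL_USD = 5.0      # above this, a human signs off (assumed threshold)
COST_HARD_LIMIT_USD = 50.0   # above this, never run automatically (assumed)


def evaluate(action: Action, spent_so_far: float = 0.0) -> tuple[Decision, str]:
    """Check the hard limit first, then each escalation trigger in turn."""
    total = spent_so_far + action.est_cost_usd
    if total > COST_HARD_LIMIT_USD:
        return Decision.DENY, f"run cost {total:.2f} exceeds hard limit"
    if action.tool in SENSITIVE_TOOLS:
        return Decision.REQUIRE_APPROVAL, f"{action.tool} is a sensitive tool"
    if action.irreversible:
        return Decision.REQUIRE_APPROVAL, "irreversible action"
    if action.customer_facing:
        return Decision.REQUIRE_APPROVAL, "customer-impacting output"
    if total > COST_APPROVAL_USD:
        return Decision.REQUIRE_APPROVAL, f"run cost {total:.2f} over approval threshold"
    return Decision.ALLOW, "within policy"


print(evaluate(Action("executor", "email.send", customer_facing=True)))
print(evaluate(Action("researcher", "web.search", est_cost_usd=0.02)))
print(evaluate(Action("coder", "sandbox.exec", est_cost_usd=3.0), spent_so_far=48.0))
```

Returning a reason string alongside the decision matters: it is exactly what the control plane should record as an approval or guardrail event.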
Fifth: optimize for replayability
If you cannot replay an incident, you cannot improve the system reliably.
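Replay mostly depends on capturing every non-deterministic input, above all tool (and model) responses, so a failed run can be re-executed against exactly what it saw. A minimal record-and-replay wrapper, assuming tools are plain callables with JSON-serializable keyword arguments and results; `ToolTape` and the `weather` tool are hypothetical.

```python
import json
from typing import Any, Callable, Optional


class ToolTape:
    """Records tool calls during a live run; replays them verbatim later."""

    def __init__(self, mode: str = "record", tape: Optional[list] = None) -> None:
        if mode not in {"record", "replay"}:
            raise ValueError("mode must be 'record' or 'replay'")
        self.mode = mode
        self.tape = list(tape or [])
        self._cursor = 0

    def call(self, name: str, fn: Callable[..., Any], **kwargs: Any) -> Any:
        if self.mode == "record":
            result = fn(**kwargs)
            self.tape.append({"tool": name, "args": kwargs, "result": result})
            return result
        # Replay: the run must request the same tool with the same args, in order.
        if self._cursor >= len(self.tape):
            raise RuntimeError("tape exhausted: run made more calls than recorded")
        entry = self.tape[self._cursor]
        if entry["tool"] != name or entry["args"] != kwargs:
            raise RuntimeError(f"divergence at step {self._cursor}: {name}({kwargs})")
        self._cursor += 1
        return entry["result"]          # the live tool is never called during replay

    def dump(self) -> str:
        return json.dumps(self.tape)


calls = {"n": 0}


def weather(city: str) -> str:          # hypothetical live tool whose output changes
    calls["n"] += 1
    return f"{city}: {20 + calls['n']}C"


live = ToolTape("record")
first = live.call("weather", weather, city="Oslo")

replay = ToolTape("replay", json.loads(live.dump()))
again = replay.call("weather", weather, city="Oslo")
print(first == again, calls["n"])       # True 1
```

The divergence check doubles as a debugging aid: after a prompt or code change, the first step where the replayed run asks for something different is usually where the behavior changed.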
Final point
The future is not one giant agent.
It is also not a swarm with no discipline.
The winning pattern is:
- A2A for coordination
- MCP for tool use
- observability as the control plane
Communication matters.
Tool access matters.
But if you cannot inspect, govern, and debug autonomous behavior, you do not have a production system.
You have a live experiment.
Sources
- Linux Foundation, “Linux Foundation Launches the Agent2Agent Protocol Project to Enable Secure, Intelligent Communication Between AI Agents” (June 23, 2025)
- Anthropic, “Donating the Model Context Protocol and establishing the Agentic AI Foundation” (December 9, 2025)
- Arthur AI, “Agentic AI Observability Playbook 2026: Standards Every Executive Must Adopt” (April 2, 2026)