Interoperability and the “Connective Tissue” of AI
Every mature technology ecosystem eventually hits the same wall. Early on, everyone builds their own integrations — custom API wrappers, bespoke data formats, proprietary communication layers. It works, until the ecosystem grows large enough that the integration cost becomes the dominant cost. Then someone proposes a standard, half the industry argues about it for two years, and eventually something wins.
The agentic AI ecosystem is hitting that wall right now.
A year ago, if you wanted your agent to read files from your local filesystem, query your database, and post a summary to Slack, you wrote three custom integrations. If you wanted two agents from different vendors to hand off a task, you wrote a custom serialization format and hoped both sides agreed on what “done” meant. Every team was solving the same plumbing problems independently, and none of the pipes connected.
Two protocols are emerging to fix this. The Model Context Protocol (MCP) standardizes how agents connect to tools and data sources. Agent-to-Agent (A2A) standardizes how agents talk to each other. Together, they are becoming the connective tissue of the agentic ecosystem — the infrastructure layer that lets you stop thinking about plumbing and start thinking about what your agents actually do.
This article is about both: what they are, how they work, and what production deployment with them actually looks like.
Model Context Protocol: One Interface to Rule Them All
Before MCP, every agent-to-tool integration was a bespoke engineering project. Want your agent to read from a PostgreSQL database? Write a tool wrapper. Want it to search Confluence? Write another wrapper. Want it to list files in an S3 bucket? Another wrapper. Each wrapper has its own error handling, its own authentication scheme, its own data format. You end up with a collection of brittle, hard-to-maintain glue code that grows proportionally with every new tool you add.
Anthropic introduced MCP in late 2024, and the core insight is simple: if every tool exposes the same interface, agents only need to learn one way to talk to tools.
MCP defines a standardized JSON-RPC interface between a “host” (the agent or the application running it) and a “server” (any tool or data source). The protocol specifies three primitive types that a server can expose:
- Resources — data that the agent can read (files, database rows, API responses, calendar entries)
- Tools — functions the agent can invoke with parameters (send an email, create a Jira ticket, run a SQL query)
- Prompts — reusable prompt templates that the server exposes for the agent to use in context
The communication looks like this:
Agent (MCP Host) MCP Server (e.g., Filesystem)
| |
|── initialize() ────────────────────────>|
|<─ capabilities (resources, tools) ──────|
| |
|── tools/list() ────────────────────────>|
|<─ [read_file, write_file, list_dir] ────|
| |
|── tools/call(read_file, {path}) ────────>|
|<─ {content: "..."} ─────────────────────|
What makes this powerful is that the agent doesn’t need to know anything about the underlying data source. It just knows: here is a list of tools available on this server, here are their schemas, here is how to call them. The MCP server handles the actual implementation — the filesystem calls, the database queries, the API authentication.
The practical consequence is that an agent built on MCP can connect to any MCP-compatible server without custom integration code. Your Slack workspace, your local filesystem, your PostgreSQL database, your Google Calendar — if there’s an MCP server for it (and increasingly, there is), your agent can use it out of the box.
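The exchange in the diagram above is plain JSON-RPC 2.0 under the hood. A minimal sketch of the three message shapes — the method names (`initialize`, `tools/list`, `tools/call`) come from the MCP specification, while the agent name, tool name, and file path are illustrative:

```python
import json

# JSON-RPC 2.0 request envelopes as used by MCP.
# Method names follow the MCP spec; the filesystem tool is illustrative.

def rpc(method, params=None, id=1):
    """Build a JSON-RPC 2.0 request envelope."""
    msg = {"jsonrpc": "2.0", "id": id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg

# 1. Handshake: the host announces itself, the server replies with capabilities.
init = rpc("initialize", {
    "protocolVersion": "2024-11-05",   # spec version string at time of writing
    "clientInfo": {"name": "my-agent", "version": "0.1.0"},
    "capabilities": {},
})

# 2. Discover what the server offers (returns tool names + JSON schemas).
list_tools = rpc("tools/list", id=2)

# 3. Invoke a tool by name with schema-validated arguments.
call = rpc("tools/call", {
    "name": "read_file",
    "arguments": {"path": "/data/report.txt"},
}, id=3)

for msg in (init, list_tools, call):
    print(json.dumps(msg))
```

The agent never sees filesystem APIs or database drivers — only these three envelope shapes, regardless of which server sits on the other end.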
How MCP Gives Agents Context They Couldn’t Have Before
The “context” in Model Context Protocol is doing real work. One of the fundamental limitations of LLM-based agents has always been that their knowledge is frozen at training time — they know what they were trained on, and nothing that happened after the cutoff date. RAG helps with some of this, but it’s fundamentally a retrieval problem: you have to know what to retrieve.
MCP takes a different approach. Instead of retrieving information and injecting it into the prompt, it gives the agent live access to the systems where your information actually lives.
Consider the difference in practice. A customer support agent without MCP retrieves customer history from a vector store populated by a nightly batch job. The information is at least a day old, possibly more, and it’s a lossy representation — embeddings capture semantic meaning but lose precise details. An MCP-enabled agent with access to your CRM’s MCP server reads the customer record directly, in real time, with full fidelity.
The agent can now:
- See the customer’s last three support tickets — not summaries, the actual tickets
- Check their current subscription status — not a cached version, the live record
- Read the internal notes the account manager left yesterday
- Look at the open invoices in your billing system
None of this required a custom integration. It required an MCP server for your CRM, an MCP server for your billing system, and an agent configured to connect to both.
The architectural implication is significant: MCP shifts the integration burden from the agent developer to the tool developer. Once a tool has an MCP server, any MCP-compatible agent can use it. This is the same network effect that made REST APIs dominant — not because REST was technically superior in every dimension, but because a common standard made the ecosystem composable.
Agent-to-Agent Communication: Defining a Common Language
MCP solves the agent-to-tool problem. A2A solves the agent-to-agent problem, and it’s a harder one.
When two agents need to collaborate on a task, they face a set of questions that are easy to answer between humans but surprisingly tricky to standardize for software:
- How does Agent A tell Agent B what it needs?
- How does Agent B signal that it’s accepted the task, is working on it, or has completed it?
- What format does the result come back in?
- What happens if Agent B can only partially complete the request?
- How does Agent A know Agent B is trustworthy?
The A2A protocol (developed collaboratively by Google and a consortium of enterprise technology vendors, with broad industry participation) defines a standard vocabulary for all of these interactions. Like MCP, it’s built on JSON-RPC, which means it’s transport-agnostic and integrates cleanly with existing HTTP infrastructure.
The core concept in A2A is the Task — a unit of work that one agent requests from another. A Task has a lifecycle:
submitted → working → [input-required] → working → completed
                    ↘ failed
                    ↘ cancelled
Agent A submits a Task to Agent B’s endpoint. Agent B acknowledges with a task ID and status. Agent A can poll for updates or receive streaming events as Agent B makes progress. When Agent B completes the task, it returns a structured result. If something goes wrong, it returns a structured error with enough context for Agent A to decide what to do next.
What makes this more than just a REST API convention is the Agent Card — a machine-readable document that each agent publishes at a well-known endpoint, describing:
- What tasks it can accept (its capabilities)
- What authentication it requires
- What input formats it accepts and what output formats it produces
- Its current availability and load
An orchestrator agent discovering a new peer doesn’t need documentation or a human to explain the integration. It reads the Agent Card, understands the capabilities, and knows how to submit tasks. The protocol handles the rest.
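A sketch of what such a card might look like — the well-known path `/.well-known/agent.json` follows the A2A convention, and the field names track the published examples, but treat the exact shape here as illustrative rather than a normative schema:

```python
import json

# Illustrative A2A Agent Card; the real spec defines additional fields.
agent_card = {
    "name": "invoice-reviewer",
    "description": "Reviews invoices for policy compliance",
    "url": "https://agents.example.com/invoice-reviewer",
    "version": "1.2.0",
    "capabilities": {"streaming": True, "pushNotifications": False},
    "authentication": {"schemes": ["bearer"]},
    "skills": [
        {
            "id": "review-invoice",
            "name": "Review invoice",
            "description": "Validates an invoice against spend policy",
            "inputModes": ["application/json"],
            "outputModes": ["application/json"],
        }
    ],
}

# Published at a well-known path so peers can discover it without docs:
print("/.well-known/agent.json")
print(json.dumps(agent_card, indent=2))
```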
The Contract-Net Protocol: Agents That Bid on Work
One of the more elegant ideas in the A2A ecosystem is borrowed from classical distributed AI: the Contract-Net Protocol, originally proposed by Reid G. Smith in 1980 and now finding new relevance in the agentic era.
The idea is that task assignment shouldn’t be static — orchestrators shouldn’t hardcode which agent handles which task type. Instead, agents should be able to bid on tasks based on their current state, capabilities, and load.
The flow works like this:
Orchestrator broadcasts task announcement
↓
Available agents evaluate: Can I do this? At what cost? How fast?
↓
Interested agents submit bids (capability match, estimated latency, current load)
↓
Orchestrator evaluates bids and awards task to winning agent
↓
Winning agent executes, reports completion
↓
Orchestrator releases other agents
In practice, a bid might contain:
- Capability score: How well does this agent’s specialization match the task requirements? (0.0 to 1.0)
- Estimated completion time: Based on current queue depth and task complexity
- Resource cost estimate: How many tokens, compute cycles, or API calls will this take?
- Confidence level: How certain is the agent that it can complete this task successfully?
The orchestrator applies a selection policy — lowest cost, fastest completion, highest confidence, or a weighted combination — and awards the contract.
This pattern is particularly valuable in systems where agent load is uneven. A Coder Agent might be heavily loaded while a Reviewer Agent is idle. Without bidding, the orchestrator has no visibility into this. With bidding, the idle Reviewer Agent can bid aggressively on tasks that are near its competency boundary, while the overloaded Coder Agent bids conservatively or not at all.
The Contract-Net Protocol also provides natural load balancing for horizontally scaled agent pools. If you’re running three instances of the same agent type, whichever instance is least loaded will submit the most competitive bid. The orchestrator doesn’t need to know anything about instance count or load distribution — the bidding mechanism handles it automatically.
Security & Identity: How an Agent Proves Who It Is
This is the section that gets skipped in tutorials and becomes an urgent problem in production. When Agent A calls Agent B’s endpoint, Agent B needs to answer a question that is non-trivial: is this request actually coming from a trusted agent in my system, or is someone impersonating it?
In human-facing systems, we solve this with OAuth 2.0 and OIDC — the user authenticates with an identity provider, gets a token, and presents that token to services. The same pattern applies to agents, with some important adaptations.
OIDC for Agents (increasingly referred to as Workload Identity in the cloud provider ecosystem) works like this:
Agent Runtime                 Identity Provider            Downstream Service
      |                              |                             |
      |── request token ───────────> |                             |
      |<─ signed JWT (agent ID) ─────|                             |
      |                              |                             |
      |── call with JWT ──────────────────────────────────────────>|
      |                              |      (verifies signature    |
      |                              |       locally, no IDP call) |
      |<─ response ────────────────────────────────────────────────|
The key components:
Agent Identity Token — A short-lived JWT issued by your identity provider that asserts the agent’s identity, role, and the specific permissions it has been granted. “I am the CRM-Reader agent, issued by your organization’s IDP, and I am authorized to read customer records but not write them.” The token is signed by the IDP; the downstream service verifies the signature without needing to call the IDP on every request.
Scoped Permissions — Each agent should have a token scoped to the minimum permissions it needs for its function. The Coder Agent doesn’t need write access to the CRM. The Customer Service Agent doesn’t need access to the code repository. Principle of least privilege applies to agents exactly as it does to human users.
Token Rotation — Agent tokens should be short-lived (15–60 minutes) and automatically rotated. This limits the blast radius if a token is compromised. The agent runtime handles rotation transparently — the agent doesn’t need to manage its own credential lifecycle.
Audit Logging — Every action an agent takes should be logged with its identity token. When you need to answer “which agent accessed this customer record at 14:32 yesterday and why,” the audit log should give you a precise answer. This is not optional in regulated industries; it’s increasingly expected everywhere.
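The mechanics can be sketched end to end. In production you would use an identity provider with asymmetric signing (RS256) and a library such as PyJWT; this stdlib-only sketch uses HMAC purely to show the shape — a short-lived, scoped token that a downstream service verifies offline:

```python
import base64, hashlib, hmac, json, time

# Sketch of issuing and verifying a scoped, short-lived agent token.
# HMAC with a shared secret stands in for the IDP's real signing key.
SECRET = b"shared-demo-secret"

def b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(agent_id: str, scopes: list, ttl_s: int = 900) -> str:
    header = b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64(json.dumps({
        "sub": agent_id,                   # which agent this is
        "scope": scopes,                   # least-privilege permissions
        "exp": int(time.time()) + ttl_s,   # short-lived: 15 minutes
    }).encode())
    sig = b64(hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify(token: str, required_scope: str) -> dict:
    header, payload, sig = token.split(".")
    expected = b64(hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] < time.time():
        raise PermissionError("token expired")
    if required_scope not in claims["scope"]:
        raise PermissionError(f"missing scope {required_scope}")
    return claims

token = issue_token("crm-reader", ["crm:read"])
claims = verify(token, "crm:read")   # succeeds
print(claims["sub"])
```

The same `verify` call with `"crm:write"` raises `PermissionError` — scoping failures surface at the boundary, where they belong, rather than deep inside tool code.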
On AWS, this pattern maps naturally to IAM Roles for Tasks (ECS) or Pod Identity (EKS). On the Bedrock AgentCore Runtime, each agent execution context gets an IAM role with the permissions defined at deployment time. The agent never handles long-lived credentials — the runtime injects temporary credentials into the execution environment automatically.
Discovery Services: Building the Agent Registry
As your agent ecosystem grows, a new operational problem emerges: how does an orchestrator find the right agent for a given task? Hardcoding agent endpoints into orchestrator logic works for two or three agents. It becomes a maintenance liability at ten, and an operational nightmare at fifty.
The solution is borrowed directly from service mesh architecture: a Discovery Service — a registry where agents advertise their presence, capabilities, and health, and where orchestrators query to find appropriate peers.
The concept maps to familiar infrastructure patterns:
- Eureka (Netflix’s service registry) and Consul (HashiCorp’s service mesh) solve this problem for microservices. The same principles apply to agent registries.
- In the Kubernetes ecosystem, this maps naturally to Service resources and endpoint discovery.
- In the cloud-native agentic ecosystem, the A2A Agent Card serves as the registration payload.
A well-designed Agent Registry exposes two primary interfaces:
Registration — Agents announce themselves on startup and deregister on shutdown. The registration payload includes the Agent Card (capabilities, input/output schemas, authentication requirements) plus runtime metadata (current load, health status, version).
Discovery — Orchestrators query the registry with a capability description: “I need an agent that can process PDF documents, write to a SQL database, and respond within 5 seconds.” The registry returns a ranked list of matching agents, filtered by health status and sorted by relevance score.
Agent Startup Registry Orchestrator
|── register(AgentCard) ──>| |
|<─ registered (id) ───────| |
| |<─ discover(capability query) ───|
| |── [Agent A, Agent B] ──────────>|
|<─ task submission ────────────────────────────────────────|
Health checking is essential. An agent that has registered but stopped responding is worse than an absent agent — it will be selected for tasks it can’t complete, causing failures and retries. The registry should actively probe registered agents on a regular heartbeat interval and automatically deregister agents that miss consecutive health checks.
The discovery query language deserves careful design. Simple string matching on capability names breaks down quickly — “summarization” and “document summarization” and “text condensation” might all refer to the same capability. A well-designed registry uses structured capability taxonomies (standardized tags from a shared vocabulary) rather than free-text descriptions, ensuring that capability matching is reliable rather than approximate.
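Both interfaces, plus heartbeat-based health filtering, fit in a small sketch. The class and method names are illustrative; the namespaced capability tags stand in for a shared taxonomy:

```python
import time
from dataclasses import dataclass, field

# In-memory registry sketch: structured capability tags instead of free text,
# with heartbeat-based health filtering. API names are illustrative.
HEARTBEAT_TIMEOUT_S = 30.0

@dataclass
class Registration:
    agent_id: str
    endpoint: str
    capabilities: frozenset                      # tags from a shared taxonomy
    last_heartbeat: float = field(default_factory=time.monotonic)

class AgentRegistry:
    def __init__(self):
        self._agents = {}

    def register(self, agent_id, endpoint, capabilities):
        self._agents[agent_id] = Registration(agent_id, endpoint, frozenset(capabilities))

    def heartbeat(self, agent_id):
        self._agents[agent_id].last_heartbeat = time.monotonic()

    def discover(self, required: set) -> list:
        """Healthy agents whose tag set covers every required capability."""
        now = time.monotonic()
        return [r for r in self._agents.values()
                if required <= r.capabilities
                and now - r.last_heartbeat < HEARTBEAT_TIMEOUT_S]

registry = AgentRegistry()
registry.register("pdf-1", "https://agents.internal/pdf-1",
                  {"doc:pdf-extract", "db:sql-write"})
registry.register("writer-1", "https://agents.internal/writer-1",
                  {"text:summarize"})

matches = registry.discover({"doc:pdf-extract", "db:sql-write"})
print([r.agent_id for r in matches])  # → ['pdf-1']
```

Because matching is exact set containment over taxonomy tags, "summarization" vs "text condensation" ambiguity never arises — the vocabulary is fixed at registration time.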
Putting It Together: The Full Protocol Stack
Across this series, we’ve built up a complete picture of what a production agentic system looks like. The protocol layer is where all of it connects:
┌─────────────────────────────────────────────────────────────────┐
│ USER / APPLICATION │
└───────────────────────────────┬─────────────────────────────────┘
│
┌───────────────────────────────▼─────────────────────────────────┐
│ ORCHESTRATOR AGENT │
│ (Hierarchical Planning, ReAct Loop) │
│ [Article 2 patterns] │
└──────────┬──────────────────────────────────┬───────────────────┘
│ │
A2A Protocol A2A Protocol
│ │
┌──────────▼──────────┐ ┌──────────▼──────────┐
│ SPECIALIST AGENT │ │ SPECIALIST AGENT │
│ (Coder / Writer) │ │ (Reviewer / Critic)│
│ [Article 2 & 3] │ │ [Article 2 & 3] │
└──────────┬──────────┘ └──────────┬──────────┘
│ │
MCP Protocol MCP Protocol
│ │
┌──────────▼──────────┐ ┌──────────▼──────────┐
│ MCP SERVER │ │ MCP SERVER │
│ (Filesystem / DB) │ │ (Slack / Calendar) │
└─────────────────────┘ └─────────────────────┘
│ │
[AgentOps Layer: OTel, Guardrails, HITL — Article 3]
MCP handles the vertical connections — agents to tools and data. A2A handles the horizontal connections — agents to agents. The AgentOps layer (observability, guardrails, eval pipelines, HITL) sits across all of it, providing the operational visibility and control that makes the whole system trustworthy in production.
The Maturity Model from Article 1 maps onto this stack naturally: L1 and L2 systems use neither MCP nor A2A. L3 systems benefit significantly from MCP — standardizing tool access reduces integration overhead and makes the agent more capable without custom code. L4 and L5 systems need both — A2A for agent-to-agent coordination and MCP for tool access, with the AgentOps layer making the whole thing operable.
Production Reality Check
MCP and A2A are genuine improvements over the integration chaos they replace. They’re also early-stage standards in an ecosystem that is moving fast, and production adoption comes with real caveats.
MCP server quality is uneven. The protocol is well-designed, but the ecosystem of available servers ranges from production-ready to experimental. Before adopting a community-maintained MCP server for a critical tool, audit its error handling, its authentication implementation, and how actively it’s maintained. A poorly implemented MCP server that swallows errors silently is harder to debug than a custom integration that fails loudly.
A2A task lifecycle management requires discipline. The protocol defines task states clearly, but implementing correct lifecycle management — handling timeouts, zombie tasks that never complete, cascade failures when a Worker agent goes down mid-task — requires careful engineering. Don’t assume the protocol handles operational edge cases for you; it defines the interface, not the reliability.
Discovery services add operational surface area. A registry is another system to operate, monitor, and keep highly available. If your registry goes down, your orchestrators can’t find agents. Design for registry failure explicitly: agents should cache recent discovery results, orchestrators should have fallback direct-connection configurations for critical agents, and your monitoring should alert on registry health before it affects agent routing.
Identity and security are non-negotiable at scale. It’s tempting to skip the OIDC integration during early development and use shared API keys for agent-to-agent authentication. This is fine for a proof of concept and a liability in production. Build the identity layer before you scale, not after — retrofitting workload identity into a running multi-agent system is significantly more painful than designing it in from the start.
The practical adoption path that has worked well: start with MCP for tool integrations (the ROI is immediate and the risk is low), add A2A when you have multiple agents that need to coordinate (and not before), build the identity layer in parallel with A2A adoption, and add a discovery service when you have more than five distinct agent types in production.
Closing the Series
Over four articles, we’ve covered the full arc of agentic system design:
- Article 1 gave us the vocabulary — five levels of maturity, mapped to the infrastructure and cost reality of each.
- Article 2 gave us the reasoning patterns — how agents plan, reflect, coordinate, and share knowledge without drowning in state.
- Article 3 gave us the operational discipline — observability, safety, evaluation, and the human checkpoints that keep the system trustworthy.
- Article 4 gave us the protocols — the standardized interfaces that make agents composable, discoverable, and secure at scale.
The through-line across all four is a consistent argument: the intelligence of your agent system is not primarily determined by the model you choose. It’s determined by the architecture around the model — the planning patterns, the memory design, the error handling, the observability, the coordination protocols. Models are commoditizing. Architecture is the durable differentiator.
The teams building production agentic systems that actually work — not just in demos, but at scale, with real users, over time — are the ones treating AI like the distributed systems discipline it has become. The tools, protocols, and patterns in this series are what that looks like in practice.
Build carefully. Measure everything. Ship incrementally. And set a maximum iteration count on your reflection loops.
|                 | L1: Stateless     | L2: Tool-Augmented   | L3: Autonomous             | L4: Multi-Agent            | L5: Self-Correcting          |
|-----------------|-------------------|----------------------|----------------------------|----------------------------|------------------------------|
| Execution       | Serverless / Edge | Serverless + integr. | Long-running container     | Distributed orchestrator   | Distributed + feedback loops |
| State           | None              | None                 | Short + long-term memory   | Shared state across agents | State + mutation history     |
| Latency profile | Predictable       | Slightly variable    | Variable (loop-dependent)  | High, parallelizable       | Highest, bounded by budget   |
| Cost model      | Linear (tokens)   | Linear + tool costs  | Nonlinear (calls per task) | Nonlinear × agent count    | Nonlinear × iteration count  |
| Primary failure | Bad retrieval     | Tool hallucination   | Context overflow           | Cascade failures           | Runaway loops                |
| Observability   | Basic logging     | Tool call tracing    | Full trace per loop        | Cross-agent tracing        | Cost + quality dashboards    |
Included for reference from Article 1 — MCP maps primarily to L2 and L3. A2A maps primarily to L4 and L5.