<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mahmoud Ayoub</title>
    <description>The latest articles on DEV Community by Mahmoud Ayoub (@mahmoudayoub).</description>
    <link>https://dev.to/mahmoudayoub</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F864827%2F6b220584-e6aa-4abb-985b-4ee59bd17268.jpg</url>
      <title>DEV Community: Mahmoud Ayoub</title>
      <link>https://dev.to/mahmoudayoub</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mahmoudayoub"/>
    <language>en</language>
    <item>
      <title>The Next Wave of AI: Intelligent Agents Working Together</title>
      <dc:creator>Mahmoud Ayoub</dc:creator>
      <pubDate>Mon, 05 May 2025 09:39:14 +0000</pubDate>
      <link>https://dev.to/mahmoudayoub/the-next-wave-of-ai-intelligent-agents-working-together-21mj</link>
      <guid>https://dev.to/mahmoudayoub/the-next-wave-of-ai-intelligent-agents-working-together-21mj</guid>
      <description>&lt;p&gt;The next era of AI isn’t powered by solo models it’s built by teams of agents that think, act, and collaborate. With A2A and MCP, the future of AI is not just intelligent, it’s interoperable.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Challenge of Building Multi-Agent Systems Today
&lt;/h2&gt;

&lt;p&gt;If you're building AI agents today, you’ve probably noticed the challenge: individual agents can be smart, but when they need to collaborate, communication often feels clumsy and inefficient.&lt;/p&gt;

&lt;p&gt;Without a common language or system, agents end up siloed, unable to share information or coordinate tasks effectively.&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;Agent2Agent (A2A)&lt;/strong&gt; and &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; come in.&lt;/p&gt;

&lt;p&gt;These protocols offer a standardized foundation for real-world, production-grade multi-agent ecosystems.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhlu0fzkjldhz06n7634p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhlu0fzkjldhz06n7634p.png" alt="Agents Workflow" width="800" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Agent2Agent (A2A)?
&lt;/h2&gt;

&lt;p&gt;At its core, &lt;strong&gt;Agent2Agent (A2A)&lt;/strong&gt; is an open protocol that allows AI agents to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Discover each other&lt;/li&gt;
&lt;li&gt;Share their capabilities&lt;/li&gt;
&lt;li&gt;Request and delegate tasks&lt;/li&gt;
&lt;li&gt;Exchange structured data&lt;/li&gt;
&lt;li&gt;Stream updates in real time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The heart of A2A is the &lt;strong&gt;Agent Card&lt;/strong&gt;: a standardized description of what an agent can do, which interfaces it supports (text, video, forms, etc.), and how to interact with it.&lt;/p&gt;

&lt;p&gt;Instead of brittle, custom integrations, agents can simply browse available Agent Cards, select the right collaborator, and initiate cooperation.&lt;/p&gt;
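&lt;p&gt;To make that concrete, here is a minimal sketch of capability-based selection over Agent Cards, written with plain Python dicts rather than the official A2A SDK. The agent names and URLs are invented for illustration; the card fields mirror the AgentCard structure used later in this post:&lt;/p&gt;

```python
# Capability-based agent selection over plain-dict Agent Cards.
# Not the official A2A SDK -- names and URLs are illustrative.

def find_agent_for_skill(agent_cards, skill_tag):
    """Return the first card advertising a skill with the given tag."""
    for card in agent_cards:
        for skill in card.get("skills", []):
            if skill_tag in skill.get("tags", []):
                return card
    return None

cards = [
    {"name": "Calendar Agent", "url": "http://localhost:8001/",
     "skills": [{"id": "schedule_meeting", "tags": ["calendar", "scheduling"]}]},
    {"name": "Reimbursement Agent", "url": "http://localhost:8000/",
     "skills": [{"id": "process_reimbursement", "tags": ["reimbursement"]}]},
]

match = find_agent_for_skill(cards, "reimbursement")
print(match["name"])  # Reimbursement Agent
```

&lt;p&gt;A real client would fetch these cards from whatever discovery endpoint each agent publishes; the selection logic stays the same.&lt;/p&gt;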

&lt;h3&gt;
  
  
  Key Features of A2A
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;HTTP and JSON based (easy for developers)&lt;/li&gt;
&lt;li&gt;Push notifications for real-time updates&lt;/li&gt;
&lt;li&gt;Streaming support for long-running tasks&lt;/li&gt;
&lt;li&gt;Built-in authentication and security&lt;/li&gt;
&lt;li&gt;Designed for multiple interaction modes (not just text chat)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A2A helps agents function like true teammates, not isolated bots operating in silos.&lt;/p&gt;


&lt;h3&gt;
  
  
  A2A in Action: Reimbursement Agent Example
&lt;/h3&gt;

&lt;p&gt;To ground this in reality, let’s look at a simplified example from the open-source A2A agent repo.&lt;/p&gt;

&lt;p&gt;This Reimbursement Agent helps users submit reimbursement requests and shows how an A2A-compliant agent defines its skills, handles missing information, and interacts with tools like APIs or forms.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Defining the Agent’s Skill and Capabilities
&lt;/h3&gt;

&lt;p&gt;The agent advertises its functionality using an AgentCard, which includes a skill (in this case, reimbursement) and its capabilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;capabilities = AgentCapabilities(streaming=True)

skill = AgentSkill(
    id="process_reimbursement",
    name="Process Reimbursement Tool",
    description="Helps with the reimbursement process for users.",
    tags=["reimbursement"],
    examples=["Can you reimburse me $20 for my lunch with the clients?"],
)

agent_card = AgentCard(
    name="Reimbursement Agent",
    description="Handles reimbursement processes for employees.",
    url=f"http://{host}:{port}/",
    version="1.0.0",
    defaultInputModes=ReimbursementAgent.SUPPORTED_CONTENT_TYPES,
    defaultOutputModes=ReimbursementAgent.SUPPORTED_CONTENT_TYPES,
    capabilities=capabilities,
    skills=[skill],
)

server = A2AServer(
    agent_card=agent_card,
    task_manager=AgentTaskManager(agent=ReimbursementAgent()),
    host=host,
    port=port,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This skill definition helps other agents know when and how to call this agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Creating a Reimbursement Request Form
&lt;/h3&gt;

&lt;p&gt;The agent uses a structured tool called create_request_form() to collect missing information from users before proceeding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_request_form(date=None, amount=None, purpose=None):
    return {
        "request_id": "request_id_123456",
        "date": date or "&amp;lt;transaction date&amp;gt;",
        "amount": amount or "&amp;lt;transaction dollar amount&amp;gt;",
        "purpose": purpose or "&amp;lt;business justification&amp;gt;",
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helps standardize the input, ensuring the agent can reason about incomplete or partial information.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Returning a Structured Form to the User
&lt;/h3&gt;

&lt;p&gt;Once a form is generated, the agent can return it as a JSON object that will be rendered in a UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def return_form(form_data, tool_context, instructions=None):
    return {
        "type": "form",
        "form": {
            "type": "object",
            "properties": {
                "date": {"type": "string", "title": "Date"},
                "amount": {"type": "string", "title": "Amount"},
                "purpose": {"type": "string", "title": "Purpose"},
                "request_id": {"type": "string", "title": "Request ID"},
            },
            "required": list(form_data.keys()),
        },
        "form_data": form_data,
        "instructions": instructions,
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Validating and Processing the Request
&lt;/h3&gt;

&lt;p&gt;Once the form is filled, the agent uses the reimburse() function to process it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def reimburse(request_id):
    if request_id not in request_ids:
        return {"status": "Error: Invalid request_id."}
    return {"status": "approved", "request_id": request_id}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Putting It All Together with an LLM Agent
&lt;/h3&gt;

&lt;p&gt;The core logic of how the agent uses its tools is defined in a prompt and wrapped in an LlmAgent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;return LlmAgent(
    model="gemini-2.0-flash-001",
    name="reimbursement_agent",
    instruction="""
        You are an agent who processes reimbursements. Start by calling create_request_form().
        Then call return_form(). Once completed by the user, call reimburse().
    """,
    tools=[create_request_form, return_form, reimburse],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What is Model Context Protocol (MCP)?
&lt;/h2&gt;

&lt;p&gt;While A2A focuses on agent-to-agent communication, &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; focuses on &lt;strong&gt;context delivery&lt;/strong&gt;, ensuring that models have all the information they need to perform intelligently.&lt;/p&gt;

&lt;p&gt;Large Language Models (LLMs) are powerful, but they need access to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User profiles&lt;/li&gt;
&lt;li&gt;Real-time external data&lt;/li&gt;
&lt;li&gt;APIs for tools and services&lt;/li&gt;
&lt;li&gt;Internal documents and knowledge bases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;MCP&lt;/strong&gt; standardizes how this information is delivered to the model in a structured, secure, and model-agnostic way.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features of MCP
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Model-agnostic (compatible with Claude, Gemini, GPT, and others)&lt;/li&gt;
&lt;li&gt;Security-first architecture for sensitive data&lt;/li&gt;
&lt;li&gt;Built-in support for tool calling&lt;/li&gt;
&lt;li&gt;Enables richer, more accurate outputs by providing complete context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of MCP as a universal adapter that plugs LLMs into your organization's real-world data, systems, and workflows.&lt;/p&gt;
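&lt;p&gt;Under the hood, MCP is built on JSON-RPC 2.0: a client discovers a server’s tools with a &lt;code&gt;tools/list&lt;/code&gt; request and invokes one with &lt;code&gt;tools/call&lt;/code&gt;. Here is a minimal sketch of the message shapes only, with transport and response handling omitted:&lt;/p&gt;

```python
import json

# MCP messages are JSON-RPC 2.0 envelopes. Sketch of the two core tool
# methods: "tools/list" to discover tools, "tools/call" to invoke one.

def make_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request envelope."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg

list_tools = make_request(1, "tools/list")
call_tool = make_request(2, "tools/call", {
    "name": "reimburse",                               # tool to invoke
    "arguments": {"request_id": "request_id_123456"},  # tool input
})

print(json.dumps(call_tool, indent=2))
```

&lt;p&gt;The full protocol adds an initialization handshake, capability negotiation, and a transport layer (stdio or HTTP), but every exchange reduces to envelopes like these.&lt;/p&gt;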




&lt;h3&gt;
  
  
  Extending the Reimbursement Agent with MCP
&lt;/h3&gt;

&lt;p&gt;While A2A enables agent discovery and collaboration, Model Context Protocol (MCP) ensures each agent or model receives relevant, structured context for better decisions.&lt;/p&gt;

&lt;p&gt;Let’s integrate an MCP-compliant context server into the Reimbursement Agent. This allows it to expose useful tools, documents, and real-time context to LLMs or other agents. (The snippets below sketch a simplified server interface rather than a specific SDK.)&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Define the MCP Context Server
&lt;/h4&gt;

&lt;p&gt;The MCP server provides access to the agent’s tools and context through a standard interface.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from mcp.server import MCPServer
from mcp.schema import ToolDefinition, ToolCallRequest, ToolCallResponse

# Define the tool metadata
tool_definitions = [
    ToolDefinition(
        name="create_request_form",
        description="Creates a reimbursement request form with fields for date, amount, and purpose.",
        input_schema={"type": "object", "properties": {}},  # Parameters can be defined as needed
        output_schema={"type": "object"},
    ),
    ToolDefinition(
        name="return_form",
        description="Returns the structured reimbursement form for user input.",
        input_schema={"type": "object"},
        output_schema={"type": "object"},
    ),
    ToolDefinition(
        name="reimburse",
        description="Processes the reimbursement request and returns the status.",
        input_schema={"type": "object", "properties": {"request_id": {"type": "string"}}},
        output_schema={"type": "object"},
    ),
]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2. Handle Tool Calls via the MCP API
&lt;/h4&gt;

&lt;p&gt;This endpoint allows models to invoke tools securely and consistently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def handle_tool_call(request: ToolCallRequest) -&amp;gt; ToolCallResponse:
    if request.tool_name == "create_request_form":
        result = create_request_form(**request.input)
    elif request.tool_name == "return_form":
        result = return_form(**request.input)
    elif request.tool_name == "reimburse":
        result = reimburse(**request.input)
    else:
        return ToolCallResponse(error="Unknown tool")

    return ToolCallResponse(output=result)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
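&lt;p&gt;To see the dispatch flow end to end without standing up a server, here is a self-contained variant with simple stand-ins for the request and response types (the real ones come from the mcp.schema module used above). A lookup table replaces the if/elif chain:&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Optional

# Stand-ins for the request/response types used in this post, so the
# dispatch flow runs without a server. Unknown tools become an error
# response instead of raising.

@dataclass
class ToolCallRequest:
    tool_name: str
    input: dict = field(default_factory=dict)

@dataclass
class ToolCallResponse:
    output: Optional[dict] = None
    error: Optional[str] = None

def reimburse(request_id):
    # Simplified version of the tool defined earlier in this post.
    return {"status": "approved", "request_id": request_id}

TOOLS = {"reimburse": reimburse}

def handle_tool_call(request: ToolCallRequest) -> ToolCallResponse:
    tool = TOOLS.get(request.tool_name)
    if tool is None:
        return ToolCallResponse(error="Unknown tool")
    return ToolCallResponse(output=tool(**request.input))

resp = handle_tool_call(ToolCallRequest("reimburse", {"request_id": "request_id_123456"}))
print(resp.output)  # {'status': 'approved', 'request_id': 'request_id_123456'}
```

&lt;p&gt;Registering tools in a dict also means adding a new tool is one line, with no change to the dispatcher.&lt;/p&gt;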



&lt;h4&gt;
  
  
  3. Launch the MCP Server
&lt;/h4&gt;

&lt;p&gt;Finally, spin up the MCP server alongside the A2A server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mcp_server = MCPServer(
    tools=tool_definitions,
    handle_tool_call=handle_tool_call,
    host="0.0.0.0",
    port=8081,
)

mcp_server.run()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35oe3imi2c9rnyqvbyra.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35oe3imi2c9rnyqvbyra.png" alt="Agentic Application" width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why A2A and MCP Are Powerful Together
&lt;/h2&gt;

&lt;p&gt;On their own, both protocols add value.&lt;br&gt;&lt;br&gt;
Together, they unlock the next generation of intelligent agent ecosystems.&lt;/p&gt;

&lt;p&gt;Imagine this scenario:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An agent manages job interviews using A2A and MCP.&lt;/li&gt;
&lt;li&gt;It discovers other agents like a resume parser, calendar scheduler, or interviewer assistant through &lt;strong&gt;A2A&lt;/strong&gt;, using their Agent Cards to understand their capabilities.&lt;/li&gt;
&lt;li&gt;It accesses your internal company data (HR policies, org charts, or even your calendar) via &lt;strong&gt;MCP&lt;/strong&gt;, using standardized tools, prompts, and data sources.&lt;/li&gt;
&lt;li&gt;It invokes tools exposed by remote systems (e.g., ATS platforms or calendar APIs) through the &lt;strong&gt;MCP client-server structure&lt;/strong&gt;, enabling secure, structured execution of real-world actions.&lt;/li&gt;
&lt;li&gt;It streams updates in real time to stakeholders (hiring managers, candidates, or other agents) as the workflow progresses.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result?&lt;/p&gt;

&lt;p&gt;Not a collection of disconnected bots, but a coordinated system of intelligent agents operating with context, awareness, and autonomy.&lt;/p&gt;

&lt;p&gt;This is the shift from clever AI demos to &lt;strong&gt;real, production-grade multi-agent systems&lt;/strong&gt;: dynamic, modular, and ready for the complexity of real-world work.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Developers Should Pay Attention
&lt;/h2&gt;

&lt;p&gt;Before A2A and MCP, multi-agent systems were often:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Painful and time-consuming to build&lt;/li&gt;
&lt;li&gt;Dependent on fragile custom integrations&lt;/li&gt;
&lt;li&gt;Brittle across model updates and system changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With A2A and MCP, developers gain a shared, standardized foundation that offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easier interoperability between agents from different vendors&lt;/li&gt;
&lt;li&gt;The emergence of agent marketplaces&lt;/li&gt;
&lt;li&gt;Dynamic, adaptive multi-agent workflows&lt;/li&gt;
&lt;li&gt;A truly modular approach to building AI systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This marks a major step forward in &lt;strong&gt;composable, scalable AI architecture&lt;/strong&gt;, no longer tied to a single vendor or platform.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;A2A and MCP are still early-stage protocols. Standards will continue to evolve, and adoption may take time.&lt;/p&gt;

&lt;p&gt;However, the future direction is clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-agent AI needs common languages and protocols.&lt;/li&gt;
&lt;li&gt;Real-world context is critical for model success.&lt;/li&gt;
&lt;li&gt;Open, interoperable ecosystems will outperform closed, proprietary ones.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building agentic AI today, &lt;strong&gt;bookmark A2A and MCP&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
If you're observing the space, &lt;strong&gt;prepare for rapid innovation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The next era of AI isn't about isolated genius models; it's about intelligent agents working together like dynamic, adaptable teams.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And the future is already taking shape.&lt;/p&gt;




&lt;h3&gt;
  
  
  References:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/" rel="noopener noreferrer"&gt;https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;https://google.github.io/adk-docs/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.anthropic.com/en/docs/agents-and-tools/mcp" rel="noopener noreferrer"&gt;https://docs.anthropic.com/en/docs/agents-and-tools/mcp&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/introduction" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/introduction&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>a2a</category>
    </item>
    <item>
      <title>How DeepSeek Narrowed the Gap to OpenAI’s o1 Model: A Revolutionary Step in Reasoning AI</title>
      <dc:creator>Mahmoud Ayoub</dc:creator>
      <pubDate>Tue, 28 Jan 2025 10:05:00 +0000</pubDate>
      <link>https://dev.to/mahmoudayoub/how-deepseek-narrowed-the-gap-to-openais-o1-model-a-revolutionary-step-in-reasoning-ai-43ph</link>
      <guid>https://dev.to/mahmoudayoub/how-deepseek-narrowed-the-gap-to-openais-o1-model-a-revolutionary-step-in-reasoning-ai-43ph</guid>
      <description>&lt;p&gt;In January 2025, DeepSeek-AI introduced its reasoning model, &lt;strong&gt;DeepSeek-R1&lt;/strong&gt;, claiming performance on par with OpenAI's o1-1217 model. By combining reinforcement learning (RL) with innovative training approaches, DeepSeek achieved remarkable reasoning performance without the vast computational resources typically associated with pretraining. This article explores how DeepSeek brought its model within striking distance of OpenAI’s and highlights key insights for the AI community.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51l5sik4nl2pdqm4d9xr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51l5sik4nl2pdqm4d9xr.png" alt="Benchmark performance of DeepSeek-R1" width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Superiority of DeepSeek's Approach&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reinforcement Learning as the Core Training Strategy&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
DeepSeek leveraged &lt;strong&gt;Group Relative Policy Optimization (GRPO)&lt;/strong&gt;, a cost-effective RL algorithm, to optimize reasoning capabilities. Unlike traditional supervised fine-tuning, GRPO enabled significant improvements in math, coding, and logical reasoning by sampling and comparing group outputs during training.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Two-Tiered Model Development&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek-R1-Zero&lt;/strong&gt;: Trained purely with RL, this model displayed self-evolution, developing advanced problem-solving behaviors such as reflection and iterative re-evaluation.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek-R1&lt;/strong&gt;: Built upon R1-Zero, this version added a &lt;strong&gt;cold-start phase&lt;/strong&gt;, utilizing curated Chain-of-Thought (CoT) datasets to produce coherent, user-friendly outputs and align with human preferences.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cold Start Data for Readability and Accuracy&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The cold-start phase addressed RL’s training instability by incorporating a small set of high-quality CoT examples. This improved both &lt;strong&gt;readability&lt;/strong&gt; and alignment with user expectations, ensuring the model produced clearer and more accurate outputs.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Revolutionizing Distillation&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
DeepSeek demonstrated the &lt;strong&gt;power of distillation&lt;/strong&gt;, transferring reasoning capabilities from the much larger DeepSeek-R1 into smaller dense models like Qwen-14B and Qwen-32B. These smaller models outperformed many larger counterparts, achieving &lt;strong&gt;state-of-the-art results&lt;/strong&gt; on benchmarks such as AIME 2024 and MATH-500 without requiring expensive RL training.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Benchmark Excellence&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Achieved &lt;strong&gt;97.3% on MATH-500&lt;/strong&gt; and &lt;strong&gt;79.8% on AIME 2024&lt;/strong&gt;, matching OpenAI-o1-1217.
&lt;/li&gt;
&lt;li&gt;Excelled on &lt;strong&gt;Codeforces&lt;/strong&gt;, with an Elo rating of &lt;strong&gt;2029&lt;/strong&gt;, outperforming 96% of human participants.
&lt;/li&gt;
&lt;li&gt;Delivered strong results on non-reasoning tasks like creative writing, summarization, and editing, with a &lt;strong&gt;92.3% win-rate on ArenaHard&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Emergent Behaviors&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
During RL training, DeepSeek-R1-Zero developed advanced reasoning strategies like reflection, verification, and prolonged thinking time. These &lt;strong&gt;unprogrammed emergent behaviors&lt;/strong&gt; underscored RL’s potential to drive high-level intelligence.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open-Source Contributions&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
DeepSeek went beyond the norm by open-sourcing not only its primary models but also six smaller dense models distilled from DeepSeek-R1. This decision enables researchers to build on its achievements without facing prohibitive computational costs.  &lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
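&lt;p&gt;The group-relative trick at the heart of GRPO can be sketched in a few lines: sample a group of outputs for one prompt, score each with a reward, and normalize every reward against the group’s own mean and standard deviation rather than a learned value baseline. This is only the advantage computation, not the full policy update:&lt;/p&gt;

```python
from statistics import mean, pstdev

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled output's reward
    against the mean/std of its own group (no learned critic needed)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All outputs scored the same: no signal to prefer any of them.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Four sampled answers to one prompt, scored by a rule-based reward:
rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))  # [1.0, -1.0, -1.0, 1.0]
```

&lt;p&gt;Because the baseline is the group itself, no separate critic model has to be trained, which is a large part of GRPO’s cost advantage over standard PPO.&lt;/p&gt;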




&lt;h3&gt;
  
  
  &lt;strong&gt;Challenges Faced and Overcome&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Instability in Early RL Training&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Challenge&lt;/strong&gt;: Pure RL training led to unstable outputs, including poor readability and language mixing.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: The &lt;strong&gt;cold-start phase&lt;/strong&gt; stabilized training by giving the model a structured foundation, significantly improving output quality.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Language Mixing in Chain-of-Thought (CoT)&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Challenge&lt;/strong&gt;: RL training often resulted in mixed-language responses, reducing accessibility.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: A &lt;strong&gt;language consistency reward&lt;/strong&gt; was introduced to enforce single-language outputs, aligning with user preferences.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Scaling RL for Smaller Models&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Challenge&lt;/strong&gt;: Direct RL on smaller models was computationally expensive and yielded limited results.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Reasoning patterns were distilled from DeepSeek-R1 to smaller models like Qwen and Llama, achieving strong performance with far lower costs.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cold-Start Data Challenges&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Challenge&lt;/strong&gt;: Curating high-quality cold-start datasets was time-intensive but necessary.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Strategies like refining outputs, using long CoT examples, and employing human annotators ensured effective datasets.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sensitivity to Prompts&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Challenge&lt;/strong&gt;: DeepSeek-R1’s performance was highly sensitive to how prompts were phrased.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Users were advised to adopt &lt;strong&gt;zero-shot prompting&lt;/strong&gt;, directly describing problems and output formats for optimal results.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Impact of Safety RL&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Challenge&lt;/strong&gt;: Safety-focused RL caused overly cautious behavior, such as refusing to answer certain queries on the &lt;strong&gt;Chinese SimpleQA benchmark&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Plans are in place to fine-tune safety mechanisms to better balance task performance and risk management.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Complexity of Software Engineering Tasks&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Challenge&lt;/strong&gt;: Long evaluation times limited RL’s effectiveness for coding and engineering tasks.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Future iterations will implement &lt;strong&gt;asynchronous evaluations&lt;/strong&gt; and &lt;strong&gt;rejection sampling&lt;/strong&gt; to boost efficiency in these areas.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Challenges with Fine-Grained Rewards&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Challenge&lt;/strong&gt;: Process-based reward models struggled to define intermediate steps and were prone to reward hacking.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: DeepSeek adopted simpler rule-based accuracy rewards, ensuring a robust RL pipeline.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Monte Carlo Tree Search (MCTS) Limitations&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Challenge&lt;/strong&gt;: MCTS failed to scale due to the large search space in token generation.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: RL with CoT was more practical and effective for handling complex reasoning tasks.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
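&lt;p&gt;The rule-based rewards mentioned above (accuracy plus language consistency) can be approximated in a few lines. This is purely illustrative: DeepSeek’s actual reward functions are not public, and the 0.1 weighting below is invented for the sketch:&lt;/p&gt;

```python
import re

# Illustrative rule-based rewards. DeepSeek's real reward code is not
# public; the weighting and heuristics here are made up for the sketch.

def accuracy_reward(completion, reference_answer):
    """Return 1.0 if the final boxed answer matches the reference, else 0.0."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match:
        answer = match.group(1)
    else:
        parts = completion.strip().split()
        answer = parts[-1] if parts else ""
    return 1.0 if answer == reference_answer else 0.0

def language_consistency_reward(completion, target="english"):
    """Fraction of word tokens that are ASCII -- a crude single-language proxy."""
    words = re.findall(r"\w+", completion)
    if not words:
        return 0.0
    ratio = sum(1 for w in words if w.isascii()) / len(words)
    return ratio if target == "english" else 1.0 - ratio

def total_reward(completion, reference_answer):
    return accuracy_reward(completion, reference_answer) \
        + 0.1 * language_consistency_reward(completion)

print(total_reward(r"The sum is \boxed{42}", "42"))  # 1.1
```

&lt;p&gt;Simple, verifiable rewards like these are exactly what made the pipeline robust against reward hacking compared with process-based reward models.&lt;/p&gt;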




&lt;h3&gt;
  
  
  &lt;strong&gt;Key Takeaways&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reinforcement Learning Alone Can Drive Reasoning&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
DeepSeek proved that RL alone can develop strong reasoning capabilities, challenging the reliance on supervised fine-tuning.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cold Start Data Makes a Big Difference&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Introducing a small, high-quality dataset as a cold start greatly improved training stability and output clarity, solving major RL-only issues.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Distillation Expands Access&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
By distilling reasoning capabilities into smaller models, DeepSeek made high-performance AI accessible without massive computational requirements.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Emergent Behaviors Show RL’s Power&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Spontaneous behaviors like reflection and iterative problem-solving highlight the potential of RL to unlock sophisticated reasoning in AI.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open Source Accelerates Progress&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
DeepSeek’s open-source models invite collaboration and innovation, speeding up advancements in reasoning AI.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Competitive Results Validate the Approach&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
With performance rivaling OpenAI’s o1-1217 on reasoning and coding benchmarks, DeepSeek-R1 proved itself as a serious contender in the AI space.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Benchmark Analysis&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7h27bmjdm0b7syaqdaot.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7h27bmjdm0b7syaqdaot.png" alt="Comparison between DeepSeek-R1 and other representative models" width="800" height="632"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  General Knowledge Performance
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Achieved 90.8% on MMLU (Pass@1), surpassing GPT-4 (88.5%) and Claude-3.5 (88.3%)&lt;/li&gt;
&lt;li&gt;Exceptional performance on MMLU-Pro with 84.0%, significantly ahead of competitors&lt;/li&gt;
&lt;li&gt;Strong showing on DROP with 92.2% F1 score, outperforming all tested models&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Mathematical Reasoning
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Demonstrated remarkable mathematical abilities:

&lt;ul&gt;
&lt;li&gt;MATH-500: 97.3% (versus OpenAI o1-1217's 96.4%)&lt;/li&gt;
&lt;li&gt;AIME 2024: 79.8% (nearly matching o1-1217's 79.2%)&lt;/li&gt;
&lt;li&gt;CNMO 2024: 78.8% (significantly higher than GPT-4's 43.2%)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  Coding Capabilities
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Achieved elite-level performance on Codeforces:

&lt;ul&gt;
&lt;li&gt;2029 rating (96.3rd percentile)&lt;/li&gt;
&lt;li&gt;Nearly matched OpenAI o1-1217's 2061 rating&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Strong performance on LiveCodeBench with 65.9% Pass@1-COT&lt;/li&gt;

&lt;li&gt;Solid results on SWE-bench Verified tasks at a 49.2% resolution rate&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  Multilingual Understanding
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Demonstrated strong Chinese language capabilities:

&lt;ul&gt;
&lt;li&gt;C-Eval: 91.8%&lt;/li&gt;
&lt;li&gt;CLUEWSC: 92.8%&lt;/li&gt;
&lt;li&gt;C-SimpleQA: 63.7%&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Outperformed most competitors in Chinese-language tasks&lt;/li&gt;

&lt;/ul&gt;




&lt;p&gt;DeepSeek-R1 represents a landmark achievement in AI development, demonstrating that sophisticated reasoning capabilities can be achieved through innovative RL approaches without requiring massive computational resources. By combining GRPO with cold-start training and successful distillation strategies, DeepSeek has not only matched industry leaders but also made these capabilities more accessible to the broader AI community.&lt;/p&gt;

&lt;p&gt;The success of DeepSeek-R1 suggests a promising future where advanced AI reasoning becomes more democratized. As the field continues to evolve, the lessons learned from DeepSeek's approach—particularly around RL training stability, model distillation, and open-source collaboration—will likely shape the next generation of AI development.&lt;/p&gt;

</description>
      <category>deepseek</category>
      <category>openai</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
