DEV Community

Programming Central
Programming Central

Posted on

Beyond Function Calling: How the Model Context Protocol (MCP) Turns AI Agents into Self-Evolving Systems

Imagine building a highly skilled master craftsman. This craftsman possesses immense cognitive power—the ability to reason, plan, and decompose incredibly complex problems. But there’s a catch: they are locked in an empty, windowless room. They have no raw materials, no specialized tools, and no way to interact with the outside world. Their brilliant cognitive power remains entirely theoretical.

This is the state of most modern Large Language Models (LLMs). They are intellectual giants trapped in digital sensory deprivation chambers.

To break them out, we historically relied on hardcoded "tool calling" or custom API integrations. But anyone who has built production-grade AI agents knows the painful truth: hardcoded tool execution is brittle, monolithic, and incredibly difficult to scale. Every time you add a new tool, you risk confusing the model, breaking your prompts, or introducing critical security vulnerabilities.

A quiet revolution is underway to solve this once and for all. It is called the Model Context Protocol (MCP).

In this deep dive, we will explore how the Hermes Agent architecture implements MCP not just as a way to call tools, but as a universal, bidirectional, and standardized integration bus. We will look at the production-grade Python patterns that turn an isolated LLM into a modular, self-improving "system of systems."

(The concepts and code demonstrated here are drawn from my ebook Hermes Agent, The Self-Evolving AI Workforce)


The Core Shift: From Tool User to Tool Weaver

To understand the Model Context Protocol, we must first discard the mental model of a simple function call. MCP is not an API endpoint; it is a standardized workshop interface.

It defines the exact specifications for every tool, every drawer, every power outlet, and every raw material bin in our craftsman's workshop. It doesn't matter if a tool is a simple local file writer or a complex browser automation suite hosted on a remote server. As long as it adheres to the MCP standard, the agent can pick it up and use it without any retraining.

This architectural shift achieves a clean separation of cognitive capability (the agent) from operational capability (the tools).

In the Hermes codebase, this separation is stark:

  • The AIAgent class is the craftsman. It doesn't know how to search the web, execute code, or read databases. It only knows how to reason and issue intent.
  • The orchestration layer (model_tools.py) acts as the "nervous system," translating the agent's intent into standardized protocol calls and routing them to the appropriate tool hosts.

This architecture stands on three core pillars: Standardized Schema Definition, Secure Client-Server Communication, and Closed-Loop Observability. Let's break down how each of these is implemented in production code.


Pillar 1: Standardized Schema Definition (The Contract for Action)

In traditional software engineering, we rely on rigid API contracts. In an agentic architecture, the contract must be understood by both machines and probabilistic neural networks.

Under MCP, this contract is a JSON Schema that serves three distinct purposes simultaneously:

  1. For the LLM (The Brain): The schema is injected into the system prompt. The LLM reads it to understand what tools are available, what parameters they require, and what format to output.
  2. For the Orchestrator (The Nervous System): The orchestrator uses the schema to validate the LLM's output before execution, intercepting hallucinations before they hit production systems.
  3. For the Tool (The Muscle): The target MCP server uses the schema to validate the incoming payload, creating a defense-in-depth security posture.

But static schemas are a recipe for failure. If you present a model with 100 tools at once, its reasoning capability degrades due to context distraction. The solution? Dynamic, context-aware schema generation.

Below is how Hermes dynamically computes tool definitions at runtime:

# model_tools.py - Dynamic, context-aware schema computation
def get_tool_definitions(
    enabled_toolsets: List[str] = None,
    disabled_toolsets: List[str] = None,
    quiet_mode: bool = False,
) -> List[Dict[str, Any]]:
    """
    Get tool definitions for model API calls with toolset-based filtering.
    All tools must be part of a toolset to be accessible.
    """
    # ... toolset resolution logic ...

    # Ask the registry for schemas (only returns tools whose check_fn passes)
    filtered_tools = registry.get_definitions(tools_to_include, quiet=quiet_mode)

    # Rebuild execute_code schema to only list sandbox tools that are actually available
    if "execute_code" in available_tool_names:
        sandbox_enabled = SANDBOX_ALLOWED_TOOLS & available_tool_names
        dynamic_schema = build_execute_code_schema(sandbox_enabled, mode=_get_execution_mode())
        # Replace static schema with the dynamically generated one
        for tool in filtered_tools:
            if tool["name"] == "execute_code":
                tool["parameter_schema"] = dynamic_schema
                break

    # Rebuild discord schemas based on bot's privileged gateway intents
    if discord_tool_name in available_tool_names:
        dynamic_schema = build_discord_schema_based_on_intents()
        # Replace static schema with dynamic one
        for tool in filtered_tools:
            if tool["name"] == discord_tool_name:
                tool["parameter_schema"] = dynamic_schema
                break

    return filtered_tools
Enter fullscreen mode Exit fullscreen mode

Why This Matters for Production

The schema is not a static document; it is a living contract. If the agent's code execution sandbox loses access to a specific library, the execute_code schema is instantly rebuilt to omit that capability. If a Discord bot lacks certain admin permissions, those tools vanish from the schema.

By dynamically tailoring the schema to the environment, you prevent the LLM from attempting impossible actions, dramatically cutting down on execution errors and wasted API tokens.

Defensive Programming at the Orchestrator Level

Even with perfect schemas, LLMs occasionally output malformed JSON (e.g., trailing commas, unclosed brackets, or Python-style None instead of JSON null). To maintain system reliability, the orchestrator must perform self-healing on the incoming data before validation:

# run_agent.py - Defensive schema enforcement
import re

def _repair_tool_call_arguments(raw_args: str, tool_name: str = "?") -> str:
    """Attempt to repair common LLM-generated malformed JSON arguments."""
    raw_stripped = raw_args.strip()

    # Fast-path: empty / whitespace-only -> empty object
    if not raw_stripped:
        return "{}"
    # Python-literal None -> normalize to {}
    if raw_stripped == "None":
        return "{}"

    fixed = raw_stripped
    # 1. Strip trailing commas before closing braces or brackets
    fixed = re.sub(r',\s*([}\]])', r'\1', fixed)
    # 2. Fix unescaped newlines inside string values
    # 3. Ensure balanced structural characters
    # ... additional robust repair logic ...

    return fixed
Enter fullscreen mode Exit fullscreen mode

By placing this validation and repair layer directly in the orchestrator, we prevent raw, malformed syntax from crashing the underlying tool servers.


Pillar 2: Secure Client-Server Communication (The Async Bridge)

MCP decouples the agent from its tools by running them in separate processes, containers, or even different machines. This separation provides:

  • Fault Isolation: A memory leak or crash in a heavy web-scraping tool cannot take down the core agent reasoning loop.
  • Language Agnosticism: Your agent can be written in Python, while a high-performance database tool runs in Go, and a browser automation tool runs in Node.js.
  • Resource Scaling: Heavy tools can be hosted on auto-scaling serverless infrastructure, while the agent runs on a lightweight control plane.

However, this introduces a major technical hurdle: the async impedance mismatch.

Modern LLM orchestrators often run in synchronous, multi-threaded environments (like CLI loops or synchronous web workers), while MCP servers are inherently asynchronous (relying on non-blocking network I/O, WebSockets, or subprocess pipes).

If you try to block an active async event loop from a sync context, you will quickly run into the dreaded RuntimeError: This event loop is already running or Event loop is closed errors.

To solve this, Hermes implements a robust asynchronous bridge that manages three distinct event loop strategies depending on the calling thread's state:

# model_tools.py - The Async Bridge
import asyncio
import threading
import concurrent.futures

def _run_async(coro):
    """Run an async coroutine safely from any synchronous context."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None

    if loop and loop.is_running():
        # Scenario A: We are inside an active async context (e.g., FastAPI gateway).
        # We must offload the coroutine to a fresh background thread to avoid blocking.
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(_run_in_worker, coro)
        try:
            return future.result(timeout=300)
        except concurrent.futures.TimeoutError:
            # Gracefully cancel the coroutine inside its own worker loop
            _cancel_all_worker_tasks()
            raise
        finally:
            pool.shutdown(wait=False)

    # Scenario B: We are on a worker thread. Use a per-thread persistent event loop.
    if threading.current_thread() is not threading.main_thread():
        worker_loop = _get_worker_loop()
        return worker_loop.run_until_complete(coro)

    # Scenario C: We are on the main thread. Use a shared, persistent tool loop.
    tool_loop = _get_tool_loop()
    return tool_loop.run_until_complete(coro)
Enter fullscreen mode Exit fullscreen mode

Deconstructing the Loop Strategies

  1. The Background Thread Isolation (Scenario A): If the orchestrator is called from within an active async framework (like FastAPI or Sanic), we cannot block the main thread. We spawn a dedicated thread pool, run the coroutine to completion, and enforce a strict 300-second timeout to prevent runaway execution.
  2. Per-Thread Persistent Loops (Scenario B): In multi-threaded environments, creating and destroying event loops for every single tool call is incredibly expensive and leaks resources (especially with cached HTTP connections). By binding a persistent event loop to each worker thread, we reuse connections safely.
  3. Main Thread Shared Loop (Scenario C): For CLI-driven runs, a single shared persistent loop avoids thread-switching overhead entirely.

Pillar 3: Closed-Loop Observability (The Self-Improvement Loop)

The true magic of the Model Context Protocol is not just that it allows an agent to act, but that it enables the agent to learn from its actions. Every tool call is a telemetry event that feeds back into the agent's memory.

When the agent calls a tool, the orchestrator doesn't just return the raw string output. It measures execution latency, captures system logs, tracks resource consumption, and triggers hooks that modify the agent's internal state.

Here is how the central dispatch function handles this feedback loop:

# model_tools.py - Observability-Driven Tool Dispatch
import time

def handle_function_call(
    function_name: str,
    function_args: Dict[str, Any],
    task_id: Optional[str] = None,
    tool_call_id: Optional[str] = None,
    session_id: Optional[str] = None,
    # ... context variables ...
) -> str:
    # 1. Enforce argument coercion and validation against schema
    coerced_args = validate_and_coerce(function_name, function_args)

    # 2. Measure precise tool dispatch latency
    dispatch_start = time.monotonic()

    try:
        # Execute the tool via the registered MCP client
        result = registry.dispatch(function_name, coerced_args)
        is_error = False
    except Exception as e:
        result = str(e)
        is_error = True

    duration_ms = int((time.monotonic() - dispatch_start) * 1000)

    # 3. Fire post-execution hooks with performance and telemetry data
    invoke_hook(
        "post_tool_call",
        tool_name=function_name,
        args=coerced_args,
        result=result,
        duration_ms=duration_ms,
        failed=is_error
    )

    # 4. Allow registered plugins to sanitize or canonicalize the raw output
    hook_results = invoke_hook("transform_tool_result", tool_name=function_name, result=result)
    for hook_result in hook_results:
        if isinstance(hook_result, str):
            result = hook_result
            break

    return result
Enter fullscreen mode Exit fullscreen mode

This telemetry data doesn't just sit in a log file; it is consumed live by the agent to make strategic decisions:

  • Dynamic Budgeting: If a tool call to a code sandbox is fast and computationally cheap, the orchestrator refunds "iteration tokens" back to the agent, encouraging it to write and test code iteratively.
  • Context Compression: If a tool execution returns a massive payload (e.g., a 50KB scraped web page), the orchestrator intercepts the result, summarizes it, and compresses the context window before passing it back to the LLM.
  • Self-Healing Strategies: If a tool call fails with a timeout, the agent detects this via the failed flag and automatically attempts a fallback strategy (e.g., querying an alternate search index).

The Ouroboros Pattern: Agents Reviewing Agents

The pinnacle of this closed-loop observability is what we call the Ouroboros Pattern—an agent recursively using its own tools to review and optimize its own behavior.

In Hermes, when a main task is completed, the orchestrator spawns a background "Review Agent." This review agent is given access to a highly specialized subset of tools: memory and skills. It reads the transaction log of the conversation that just occurred, analyzes what went right and what went wrong, and writes new procedural knowledge directly back to the main agent's persistent memory.

# run_agent.py - The Ouroboros Self-Improvement Loop
def _spawn_background_review(self, messages_snapshot, review_memory, review_skills):
    """Spawn a background thread to review the conversation and save new skills/memories."""
    def _run_review():
        # Instantiate a clean, lightweight agent inheriting the parent's API runtime
        review_agent = AIAgent(
            model=self.model,
            max_iterations=16,
            quiet_mode=True,
            provider=self.provider,
            api_key=self.api_key,
            enabled_toolsets=["memory", "skills"],  # Restrict tools to memory writing
        )

        review_prompt = (
            "Analyze the conversation history. Extract key user preferences, "
            "successful code patterns, or tool execution failures. Use the "
            "provided tools to save these as persistent memories or skills."
        )

        # Run the review conversation in the background
        review_agent.run_conversation(
            user_message=review_prompt,
            conversation_history=messages_snapshot,
        )

        # Summarize actions taken during self-improvement
        actions = self._summarize_background_review_actions(review_agent.history)
        if actions:
            summary = " · ".join(dict.fromkeys(actions))
            self._safe_print(f"  💾 Self-improvement complete: {summary}")

    # Spawn off the main thread so the user never experiences latency
    threading.Thread(target=_run_review, daemon=True).start()
Enter fullscreen mode Exit fullscreen mode

This background review loop is completely non-blocking. While the user is reading the agent's response, a background thread is spinning up a separate context, evaluating the tool execution latency, and updating the agent's "Soul," "Memory," and "Skills" databases. On the very next prompt, the agent is already smarter, faster, and more aligned with the user's workflow.


The Architectural Blueprint

To visualize how these components interact, let's look at the flow of a single user interaction through this multi-layered architecture:

  1. The User Request enters the system.
  2. The Agent Core (LLM) analyzes the request. It references its current Memory Store and Skill Library (which were updated by previous background runs).
  3. The Agent decides to act. It looks at the Dynamic JSON Schemas provided by the orchestrator to construct a valid tool call.
  4. The Orchestrator catches the raw call, sanitizes it using the Defensive Parser, and routes it through the Async Bridge to ensure thread safety.
  5. The External MCP Server executes the physical action (e.g., running code, searching the web) and returns the result.
  6. The result passes through Observability Hooks, capturing execution times and success flags.
  7. The final result is returned to the Agent Core to complete the turn.
  8. Simultaneously, a background Review Agent analyzes the telemetry and updates the Memory Store and Skill Library, closing the loop.

This is the power of the MCP Revolution: action and learning are two sides of the same coin.


Conclusion: From Static to Dynamic AI

For years, developers treated AI agents like traditional software programs—writing rigid, hardcoded wrappers around API calls. The Model Context Protocol changes the paradigm.

By standardizing the communication layer, dynamically generating schemas, building robust async bridges, and hooking telemetry directly into self-improvement loops, we transition from building static tool users to deploying dynamic, self-evolving tool weavers.

If you are still writing custom wrapper functions for every API you want your LLM to use, it is time to step into the workshop. The tools are ready. The craftsman is waiting. It's time to build.


Let's Discuss

  1. Handling Malformed Outputs: In your own experience with LLMs, what are the most common ways models break tool-calling schemas (e.g., JSON syntax errors, parameter hallucinations), and how have you handled them?
  2. The Async/Sync Mismatch: Have you run into event loop collisions when integrating async tool frameworks (like Playwright or HTTP clients) into synchronous agent loops? How did you resolve the threading issues?

Leave a comment below with your thoughts and architectural approaches!

The concepts and code demonstrated here are drawn directly from the comprehensive roadmap laid out in the ebook Hermes Agent, The Self-Evolving AI Workforce: details link, you can find also my programming ebooks with AI here: Programming & AI eBooks.

Top comments (1)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.