The widespread deployment of Large Language Models (LLMs) has led to a fundamental shift in software architecture, moving from deterministic, rule-based systems to probabilistic, reasoning-based agents. However, this transition has been hampered by a critical infrastructure gap: the inability of these intelligent models to seamlessly access the proprietary, dynamic, and stateful data that resides within enterprise ecosystems. As organizations attempt to integrate models—whether proprietary systems like Anthropic’s Claude and OpenAI’s GPT-4, or open-weights models hosted on Hugging Face—into their operational workflows, they encounter the "Context Gap." This phenomenon, where intelligence is isolated from data, results in hallucinations, operational inefficiencies, and a fragmented landscape of brittle, bespoke integrations.
The Model Context Protocol (MCP) has emerged as the definitive industry response to this crisis. By standardizing the interface between AI hosts and data sources, MCP replaces the exponential complexity of the "M × N" integration problem with a linear "M + N" solution. Functioning analogously to a "USB-C port for AI," MCP provides a universal, open protocol that governs how AI assistants discover, connect to, and securely utilize external context.
This blog provides an introduction to the Model Context Protocol. It traces the protocol’s origins from the fragmentation of early AI plugins to its current status as an open standard. It also discusses the MCP architecture, detailing the JSON-RPC message flows, transport layers, and the three core primitives: Tools, Resources, and Prompts. Finally, it walks through a practical implementation, focusing on the integration of MCP with the Hugging Face ecosystem and the smolagents framework.
The Contextual Connectivity Crisis
To appreciate the necessity of the Model Context Protocol, one must first analyze the historical trajectory of Large Language Model integration. The initial generation of LLMs, exemplified by GPT-3, comprised immensely capable reasoning engines, trained on vast snapshots of the public internet, yet they were fundamentally severed from the immediate reality of the user. They possessed encyclopedic knowledge of history up to their training cutoff but had zero awareness of a user’s current calendar, the state of a production database, or the content of a local file system. This isolation created two distinct classes of failure modes in early AI adoption:
Hallucination via Deprivation: When asked questions requiring specific, non-public knowledge (e.g., "What is the status of ticket #402?"), models would often fabricate plausible but incorrect answers based on statistical likelihood rather than factual retrieval.
Context Assembly Bottleneck: To mitigate hallucinations, users were forced into a manual workflow known as "Context Stuffing." This involved manually retrieving data from disparate sources (copying rows from a spreadsheet, exporting logs), sanitizing it, and pasting it into the model’s context window. This process effectively relegated the human user to the role of a data fetcher, negating the efficiency gains promised by AI.
As the industry moved toward Retrieval-Augmented Generation (RAG) and agentic workflows, developers began building "connectors"—software bridges linking models to APIs. However, in the absence of a standard, every connection was a bespoke engineering effort. A developer building a coding assistant had to write specific code to talk to GitHub. If they wanted to switch the underlying model from GPT-4 to Claude, or if they wanted to add support for GitLab, they had to rewrite the integration layer. This lack of standardization led to a fragmented ecosystem where data was siloed, and agents were limited to the specific tools hard-coded by their creators.
The "M × N" Integration Problem
The systemic failure of the pre-MCP landscape is best described mathematically as the M × N Integration Problem. In a thriving digital economy, there exists a growing set of AI applications (Clients or Hosts), denoted as set N. Simultaneously, there is a vast and expanding set of data sources and services (Servers), denoted as set M.
In a non-standardized environment, connecting these two sets requires a unique bridge for every pair. Let N = the number of AI clients (e.g., Claude Desktop, Cursor, VS Code, Zed, custom enterprise dashboards). Let M = the number of data sources (e.g., Google Drive, Slack, PostgreSQL, Linear, Salesforce, local file systems). The total number of unique integrations required to achieve full connectivity is the product M × N.
Here, complexity grows multiplicatively (quadratically, if M and N grow in tandem), and the maintenance burden is high. If a single API in set M changes its schema, every one of the N clients interacting with it must be updated. This dynamic creates a "Winner-Take-All" effect where only the largest data sources (like Google Workspace) get integrated into the largest AI clients, leaving long-tail business tools and specialized enterprise data effectively invisible to AI agents. It also stifles innovation by raising the barrier to entry: a new AI client startup cannot merely build a better reasoning engine; it must also replicate the thousands of integrations that incumbent players possess.
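To make the contrast concrete, here is a quick sketch using hypothetical ecosystem sizes (the counts of 10 clients and 100 data sources are illustrative values, not figures from any survey):

```python
# Hypothetical ecosystem sizes (illustrative values only).
n_clients = 10    # AI hosts: IDEs, chat apps, enterprise dashboards
m_sources = 100   # data sources: databases, SaaS APIs, file systems

# Without a standard: one bespoke bridge per (client, source) pair.
bespoke_integrations = m_sources * n_clients

# With MCP: each side implements the protocol exactly once.
mcp_integrations = m_sources + n_clients

print(bespoke_integrations)  # 1000 bespoke bridges to build and maintain
print(mcp_integrations)      # 110 one-time protocol implementations
```

Adding an eleventh client under the bespoke model costs 100 new integrations; under MCP it costs exactly one.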
MCP Solution: Linearizing Complexity
The Model Context Protocol fundamentally alters this topology by introducing a universal interface. Instead of connecting directly to each other, both clients and servers connect to the protocol standard. AI Clients (N) implement the MCP Host standard once. Data Sources (M) implement the MCP Server standard once. The total number of integrations required becomes the sum M + N.
Here, complexity grows linearly and interoperability is universal. Under this paradigm, a developer creates an MCP Server for an internal SQL database. Immediately, that database becomes accessible to Claude Desktop, the Cursor IDE, and any other MCP-compliant tool, without writing a single line of client-specific glue code. This shift is analogous to the standardization of physical ports. Before USB, peripherals required parallel ports, serial ports, and PS/2 ports. USB provided a single protocol that allowed any device to connect to any computer. MCP is the "USB-C for AI," standardizing the flow of context and capabilities.
Architectural Fundamentals of MCP
The architecture of MCP is designed to be robust, secure, and transport-agnostic. It eschews the typical monolithic API gateway approach in favor of a decentralized, client-host-server model that respects the boundaries of local and remote computing.
The Three-Tiered Topology
The protocol defines three distinct roles, each with specific responsibilities in the context lifecycle. Understanding these roles is crucial for developers implementing MCP, as the separation of concerns is strict.
The MCP Host
The Host is the "brain" of the operation. It is the user-facing application where the Large Language Model (LLM) resides (or is accessed). Examples include the Claude Desktop App, an AI-integrated IDE like VS Code, or a custom chat interface built with smolagents.
- Role: The Host acts as the orchestrator. It manages the user interface, maintains the conversation history, and decides when to utilize external context.
- Security Responsibility: The Host is the gatekeeper. It is responsible for authorization—asking the user, "Do you want to allow this agent to read your file system?"—and for managing the lifecycle of connections to various servers.
- Context Aggregation: A single Host can connect to multiple Servers simultaneously. It aggregates the capabilities (tools, resources) from all connected servers and presents a unified list to the LLM.
The MCP Client
The Client is the protocol implementation layer that resides inside the Host application. It is not a standalone app but a library or module responsible for speaking the language of MCP.
- Role: The Client maintains a 1:1 connection with a specific MCP Server. If a Host connects to five servers, it instantiates five distinct Client objects.
- Functionality: It handles the handshake negotiation, serializes requests into JSON-RPC format, deserializes responses, and manages bi-directional communication channels (such as handling incoming notifications from a server).
The MCP Server
The Server is the gateway to the data. It is a standalone process or service that wraps a specific data source or toolset.
- Role: The Server exposes "Primitives" (Tools, Resources, Prompts) to the Client. Crucially, the Server is model-agnostic. It does not know which LLM is on the other end; it effectively says, "Here are the files I can read, and here is the database I can query."
- Flexibility: Servers can be lightweight scripts (e.g., a 50-line Python script wrapping a calculator) or complex applications (e.g., a persistent service indexing a terabyte-scale documentation repository).
The Protocol Layer: JSON-RPC
At the wire level, MCP relies on JSON-RPC 2.0, a stateless, lightweight remote procedure call protocol. This choice distinguishes MCP from REST or GraphQL architectures. Unlike REST, which is resource-oriented (GET/POST on URLs), JSON-RPC is action-oriented. It maps perfectly to the concept of "calling a tool" or "executing a function." It supports bidirectional communication, allowing the Server to push notifications (e.g., "The file you are watching has changed") to the Client without polling. Every message is a JSON object containing a jsonrpc version, an id (for request/response correlation), a method (e.g., tools/list), and params. This simplicity makes debugging easy, as messages are human-readable.
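As an illustration, a tool invocation and its reply might look like the following pair of messages (the tool name and arguments here are invented for the example; the envelope fields follow JSON-RPC 2.0, and the result shape follows the MCP tools/call convention):

```python
import json

# Client -> Server: a request invoking a (hypothetical) tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,                       # correlates the response to this request
    "method": "tools/call",
    "params": {
        "name": "get_weather",
        "arguments": {"city": "Berlin"},
    },
}

# Server -> Client: the matching response, carrying the same 'id'.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "14°C, overcast"}],
    },
}

# Both sides simply exchange these human-readable JSON objects.
print(json.dumps(request, indent=2))
print(json.dumps(response, indent=2))
```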
Transport Mechanisms
MCP is designed to work in diverse environments, from a single laptop to a distributed cloud cluster. To achieve this, it supports pluggable transport layers.
Stdio Transport
This is the default transport for desktop applications. The Host spawns the Server as a subprocess. Communication occurs over the standard input (stdin) and standard output (stdout) streams.
Advantages:
- Security: Data never leaves the machine. It flows directly between processes via the operating system's pipes. There is no network port opened, reducing the attack surface.
- Simplicity: No authentication handshake is needed for the transport itself, as the OS handles process ownership.
- Performance: Extremely low latency.
- Use Case: A user running Claude Desktop wants to let the AI edit files on their local hard drive. The "Filesystem MCP Server" runs as a local subprocess.
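The mechanics of Stdio transport can be sketched with nothing but the Python standard library: the host spawns a child process and exchanges newline-delimited JSON over its pipes. The tiny echo child below is a stand-in for a real MCP server, not actual MCP framing:

```python
import json
import subprocess
import sys

# Stand-in "server": reads one JSON line from stdin, echoes a reply.
child_code = (
    "import sys, json\n"
    "msg = json.loads(sys.stdin.readline())\n"
    "reply = {'jsonrpc': '2.0', 'id': msg['id'], 'result': {}}\n"
    "print(json.dumps(reply), flush=True)\n"
)

# The host spawns the server as a subprocess and talks over OS pipes;
# no network port is ever opened.
proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

request = {"jsonrpc": "2.0", "id": 1, "method": "ping"}
proc.stdin.write(json.dumps(request) + "\n")
proc.stdin.flush()

reply = json.loads(proc.stdout.readline())
proc.wait()
print(reply)  # the child's response, correlated by 'id'
```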
SSE Transport
This transport is used for connecting to remote services or containerized environments. It uses Server-Sent Events (SSE) for server-to-client messages and standard HTTP POST requests for client-to-server messages.
Advantages:
- Scalability: Allows a centralized MCP server to serve multiple clients over a network.
- Compatibility: Works over standard web infrastructure (firewalls, load balancers).
- Use Case: An enterprise deploys a "Salesforce MCP Server" in their secure cloud VPC. Employees' local AI agents connect to this remote URL to query customer data safely.
Lifecycle of a Connection
MCP is a stateful protocol, meaning a connection has a defined beginning, middle, and end. The lifecycle ensures that both parties agree on capabilities before exchanging data.
Initialization (The Handshake):
- The Client sends an initialize request containing its protocol version and capabilities (e.g., "I support sampling").
- The Server responds with its own capabilities (e.g., "I have tools and resources") and protocol version.
- If versions are compatible, the connection proceeds. This negotiation allows the protocol to evolve without breaking backward compatibility.
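The handshake messages above have roughly this shape (the field names follow the MCP specification; the version string, capability details, and client/server names shown are illustrative):

```python
# Client -> Server: the initialize request opening the session.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 0,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-06-18",   # illustrative version date
        "capabilities": {"sampling": {}},  # "I support sampling"
        "clientInfo": {"name": "example-host", "version": "1.0.0"},
    },
}

# Server -> Client: the response advertising its own capabilities.
initialize_response = {
    "jsonrpc": "2.0",
    "id": 0,
    "result": {
        "protocolVersion": "2025-06-18",
        "capabilities": {"tools": {}, "resources": {}},
        "serverInfo": {"name": "example-server", "version": "1.0.0"},
    },
}
```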
Operation:
- The Client can discover capabilities (tools/list, resources/list).
- The Client can invoke capabilities (tools/call).
- The Server can send notifications (notifications/message).
Termination:
- The connection is closed, and resources are released. In Stdio transport, terminating the Host process automatically kills the Server subprocesses.
Primitives of MCP
The core value of MCP lies in its three primitives: Tools, Resources, and Prompts. These abstractions provide a structured language for the LLM to interact with the world.
- Tools represent the executable capabilities of the server. They are the mechanism by which an AI agent takes action or performs complex queries. A Tool is defined by a unique name (e.g., git_commit), a description (e.g., "Commits staged changes to the repository"), and an inputSchema.
- Resources represent passive, read-only data. They are distinct from Tools in that reading a Resource is expected to be side-effect free.
- Prompts are reusable templates or workflows baked into the server. They solve the "blank page problem" for users and agents. A Prompt can accept arguments and return a list of messages (User/Assistant roles) to be inserted into the conversation context.
Developing an MCP Server with Python
The MCP ecosystem provides SDKs for TypeScript/Node.js and Python. We use the Python SDK, specifically leveraging the fastmcp library, which provides a high-level, decorator-based developer experience similar to FastAPI.
Developing an MCP server requires Python 3.10+ and the fastmcp package.
```shell
# Install the FastMCP library
pip install fastmcp
```
We will construct a comprehensive "Financial Analytics" server. This server will provide tools to calculate compound interest and resources to access market data.
```python
from fastmcp import FastMCP

# Initialize the server with a descriptive name.
# This name is used for logging and identification.
mcp = FastMCP(name="Financial Analytics Server")
```
We define a tool to calculate compound interest.
```python
@mcp.tool
def calculate_compound_interest(principal: float, rate: float, years: int) -> str:
    """
    Calculates the future value of an investment using compound interest.

    Args:
        principal: The initial amount of money.
        rate: The annual interest rate (as a decimal, e.g., 0.05 for 5%).
        years: The number of years the money is invested.
    """
    amount = principal * (1 + rate) ** years
    return f"The future value after {years} years is ${amount:.2f}"
```
The @mcp.tool decorator introspects the function signature. It identifies that principal is a float. If an LLM tries to call this tool with a string "five hundred," the MCP validation layer will reject the request with a precise error message, preventing the Python function from ever crashing due to type errors.
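Conceptually, that introspection can be sketched with the standard inspect module. This is an illustration of the idea only, not fastmcp's actual implementation, and the minimal type map covers just a few Python builtins:

```python
import inspect

def build_input_schema(func):
    """Derive a minimal JSON-Schema-like dict from a function signature."""
    type_map = {float: "number", int: "integer", str: "string", bool: "boolean"}
    sig = inspect.signature(func)
    properties = {
        name: {"type": type_map.get(param.annotation, "string")}
        for name, param in sig.parameters.items()
    }
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
    }

def calculate_compound_interest(principal: float, rate: float, years: int) -> str:
    ...

schema = build_input_schema(calculate_compound_interest)
print(schema["properties"]["principal"])  # {'type': 'number'}
```

A schema like this is what lets the validation layer reject a string such as "five hundred" before the Python function ever runs.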
Next, we need to expose a resource representing a stock price. In a real application, this would fetch from an API. Finally, we make the server runnable.
```python
if __name__ == "__main__":
    # Runs the server using the default Stdio transport
    mcp.run()
```
This block allows the script to be executed directly. When run, it will listen on stdin for JSON-RPC messages. To debug, one can use the MCP Inspector, a web-based debugging tool maintained by the Model Context Protocol project.
Integration with Hugging Face Ecosystem
The Hugging Face ecosystem has embraced MCP as a key enabler for its agentic frameworks. The smolagents library is a minimalist, code-first agent framework that integrates seamlessly with MCP, allowing open-source models (like Llama 3 or Qwen) to utilize the same tools as proprietary models.
smolagents differentiates itself by prioritizing Code Agents. While traditional agents output JSON blobs to call tools, smolagents Code Agents generate executable Python code.
Advantage: Code allows for loops, logic, and variable assignment within the reasoning step, offering greater expressivity than simple JSON function calling.
MCP Integration: To support this, smolagents wraps MCP tools in Python functions, allowing the LLM to "import" and call them as if they were local libraries.
Now, we will build an agent that connects to our Financial Analytics server using smolagents.
First, we install the required libraries.
```shell
pip install smolagents "huggingface_hub[mcp]" mcp
```
The bridge between MCP and smolagents is the ToolCollection class (and the underlying MCPClient). It connects to the server, fetches the tool list, and dynamically constructs Python tool objects.
First, we define how to connect to our server. Since our server is a local Python script, we use StdioServerParameters.
```python
import os

from mcp import StdioServerParameters

# Define the connection to the local Python server script
financial_server_params = StdioServerParameters(
    command="python",  # The executable
    args=["path/to/financial_server.py"],  # Arguments
    env={"PYTHONUNBUFFERED": "1", **os.environ},  # Environment variables
)
```
```python
from smolagents import CodeAgent, HfApiModel, ToolCollection

# The 'with' block manages the lifecycle of the connection
# (Initialize -> Use -> Shutdown).
# trust_remote_code=True acknowledges that tool code from the
# server will be executed locally (required by recent smolagents versions).
with ToolCollection.from_mcp(
    financial_server_params, trust_remote_code=True
) as tool_collection:
    # 1. Initialize the model.
    # We use a robust open-weights model capable of code generation.
    model = HfApiModel(
        model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
        provider="hf-inference",
    )

    # 2. Instantiate the agent.
    # We pass the tools loaded from the MCP server.
    # 'add_base_tools=True' gives the agent basic capabilities
    # like web search, if configured.
    agent = CodeAgent(
        tools=[*tool_collection.tools],
        model=model,
        add_base_tools=True,
    )

    # 3. Execution
    print("Agent initialized. Running query...")
    query = """
    Check the current price of Apple stock.
    Then, assume a 7% annual return rate.
    Calculate how much an investment of 100 euros will be worth in 10 years.
    """
    response = agent.run(query)
    print(f"Final Answer: {response}")
```
Stepping through what happens at runtime:
1. The ToolCollection context manager spawns python financial_server.py and performs the MCP handshake.
2. The client calls tools/list; the server returns metadata for calculate_compound_interest and get_stock_price.
3. smolagents constructs a system prompt for the Qwen model describing the available tools: calculate_compound_interest(principal: float, ...) and get_stock_price(symbol: str).
4. The Qwen model analyzes the query and generates a Python plan, which the smolagents runtime executes.
5. When get_stock_price is called, the library intercepts the call, converts it to a JSON-RPC message (tools/call), and sends it to the MCP server process.
6. The MCP server computes the result and sends it back. The Python script continues, and the agent returns the final calculation to the user.
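The plan the model writes might look roughly like the following. The stub implementations here stand in for the MCP-backed tools, and the Apple price is a hypothetical value, so the snippet runs on its own:

```python
# Stubs standing in for the MCP-backed tools the agent would call.
def get_stock_price(symbol: str) -> float:
    return 198.50  # hypothetical AAPL price; the real tool queries the server

def calculate_compound_interest(principal: float, rate: float, years: int) -> float:
    return principal * (1 + rate) ** years

# An illustrative plan of the kind a Code Agent might generate:
apple_price = get_stock_price("AAPL")
future_value = calculate_compound_interest(principal=100, rate=0.07, years=10)
print(f"AAPL trades at ${apple_price:.2f}; "
      f"100 euros at 7% for 10 years grows to {future_value:.2f} euros")
```

Because the plan is ordinary Python, the agent can chain the two tool results with variables and arithmetic in a single reasoning step, which is exactly the expressivity advantage of Code Agents over JSON function calling.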