Daniel Shashko

MCP Server Architecture: A Developer's Guide

The Model Context Protocol lets AI applications like Claude Desktop pull in context from different data sources. Think of it as a way to connect your AI to databases, APIs, file systems, and other tools without writing custom integrations for each one.

This guide focuses on how MCP actually works. If you want to start building, check out the language-specific SDK docs instead.

What's Actually in MCP

MCP is really four things:

First, there's the specification that defines how clients and servers should talk to each other. Then you've got SDKs in different languages that handle the protocol details for you. The MCP Inspector helps you test and debug your servers during development. And finally, there are reference server implementations you can learn from or use directly.

One thing to understand upfront: MCP only handles context exchange. It doesn't care how your AI application uses that context or which language model you're running. That's your problem to solve.

How MCP Components Fit Together

MCP uses a straightforward client-server setup. An AI application (the MCP host) creates separate client connections to each server it wants to talk to. Every connection is one-to-one, so if you connect to three servers, you're managing three separate client instances.

The main players are:

MCP Host - This is your AI application. It could be Claude Desktop, VS Code, or anything else that needs context from external sources. The host coordinates multiple client connections.

MCP Client - Each client maintains one connection to one server. It's basically a messenger that fetches context and passes it back to the host.

MCP Server - A program that serves up context. Could be running locally on your machine or hosted remotely somewhere.

Here's a real example: VS Code is the host. When you connect it to the Sentry MCP server, VS Code spins up a client object just for that connection. Connect to the filesystem server too, and you get a second client object. Two servers, two clients, clean separation.

graph TB
    subgraph "MCP Host (AI Application)"
        Client1["MCP Client 1"]
        Client2["MCP Client 2"]
        Client3["MCP Client 3"]
    end

    Server1["MCP Server 1<br/>(e.g., Sentry)"]
    Server2["MCP Server 2<br/>(e.g., Filesystem)"]
    Server3["MCP Server 3<br/>(e.g., Database)"]

    Client1 ---|"One-to-one<br/>connection"| Server1
    Client2 ---|"One-to-one<br/>connection"| Server2
    Client3 ---|"One-to-one<br/>connection"| Server3

Quick terminology note: we call them MCP servers regardless of where they run. A server running on your laptop using standard input/output? That's a local server. A server running on Sentry's infrastructure using HTTP? That's a remote server. Both are just MCP servers.
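
To make the local case concrete, here's a sketch of a host configuration that launches a local stdio server. The format follows Claude Desktop's claude_desktop_config.json style; the exact field names and server package vary by host, so treat this as illustrative:

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"]
    }
  }
}

The host reads this at startup, spawns the process, and talks to it over stdin/stdout.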

The Two Layers

MCP splits responsibilities between two layers:

Data Layer - This is the protocol itself, built on JSON-RPC 2.0. It defines the message formats for everything: initializing connections, discovering tools, reading resources, handling prompts, and pushing notifications.

Transport Layer - This is how messages actually move between client and server. Could be stdio (standard input/output) for local processes or HTTP for remote servers. The transport handles connection setup, message framing, and authentication.

Think of the data layer as what you're saying, and the transport layer as how you're saying it. The same JSON-RPC messages work across both transport types.

Transport Options

You've got two ways to move data around:

Stdio Transport - Uses standard input and output streams. Perfect for local servers because there's no network overhead. When Claude Desktop launches the filesystem server, it uses stdio. Fast, simple, stays on your machine.

Streamable HTTP Transport - Uses regular HTTP POST requests for client-to-server messages. Optionally adds Server-Sent Events for streaming updates from server to client. This is how remote servers work. Supports standard HTTP auth: bearer tokens, API keys, custom headers. MCP recommends OAuth for getting those tokens.

The nice thing about having two transport options is you can use the same server code locally during development (stdio) and deploy it remotely in production (HTTP) without changing the core logic.
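
To see how little machinery the stdio transport needs, here's a minimal Python sketch that launches a local server and sends it one JSON-RPC message. It assumes a newline-delimited-JSON stdio server such as the reference filesystem server; the command line is an example, not a requirement:

import json
import subprocess

# Launch a local MCP server as a child process (command is illustrative).
proc = subprocess.Popen(
    ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

# stdio transport: each JSON-RPC message is one line on stdin/stdout.
initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-06-18",
        "capabilities": {},
        "clientInfo": {"name": "demo-client", "version": "0.1.0"},
    },
}
proc.stdin.write(json.dumps(initialize) + "\n")
proc.stdin.flush()

# Read the server's initialize result back from stdout.
print(proc.stdout.readline())

Swap the transport for HTTP POST requests and the messages themselves stay identical - that's the layering at work.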

What You Actually Share: The Data Layer

This is the interesting part. The data layer defines how servers share context with AI applications. It uses JSON-RPC 2.0 for all communication. One side sends a request and the other responds; when no response is needed, you send a notification instead. (Most requests flow from client to server, but as you'll see shortly, servers can make requests too.)

MCP is stateful for the most part (the Streamable HTTP transport lets you run parts of it statelessly), which means you need to manage the connection lifecycle. Both sides negotiate what features they support during initialization. We'll walk through a real example in a minute.

MCP Primitives: The Core Concepts

Primitives are the heart of MCP. They define what clients and servers can offer each other. Servers expose three main types:

Tools - Functions the AI can call to do things. Query a database, hit an API, write a file. Tools are actions.

Resources - Data the AI can read for context. File contents, database schemas, API responses. Resources are information.

Prompts - Templates that structure how the AI interacts with everything. System prompts, few-shot examples, conversation starters. Prompts are guidance.

Each primitive type has methods for listing (tools/list, resources/list, prompts/list), retrieving content (resources/read, prompts/get), and, for tools, executing (tools/call). Clients use the list methods to discover what's available, and those lists can change dynamically. That's important - a database server might expose different tools depending on which database is connected.

Here's a concrete example: you build an MCP server for Postgres. It exposes tools for running queries, resources containing your table schemas, and prompts with examples of good queries for your specific schema. The AI application can discover all of this through the list methods, then use it as needed.
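
The resources side of that Postgres example would look something like this on the wire. The URI scheme and payload are hypothetical, but the request and response shapes follow the spec's resources/read method:

{
  "jsonrpc": "2.0",
  "id": 5,
  "method": "resources/read",
  "params": {
    "uri": "postgres://schema/users"
  }
}

{
  "jsonrpc": "2.0",
  "id": 5,
  "result": {
    "contents": [
      {
        "uri": "postgres://schema/users",
        "mimeType": "text/plain",
        "text": "CREATE TABLE users (id serial PRIMARY KEY, email text NOT NULL);"
      }
    ]
  }
}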

Servers can also use primitives that clients expose:

Sampling - Ask the client's AI to generate completions. Useful when you need language model access but don't want to bundle an LLM SDK in your server. Call sampling/createMessage and let the client handle it (see the example after this list).

Elicitation - Request more information from users. Need confirmation before deleting something? Want clarification on a query? Use elicitation/create to ask.

Logging - Send debug and monitoring messages back to the client. Helps you understand what's happening without cluttering the main data flow.
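
Here's roughly what a sampling request looks like going from server to client. The prompt text is made up, but the method name and message shape follow the spec:

{
  "jsonrpc": "2.0",
  "id": 6,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "Summarize this error log in one sentence."
        }
      }
    ],
    "maxTokens": 100
  }
}

The client runs that through whatever model it controls and returns the completion, so the server never touches an LLM API key.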

Real-Time Updates

MCP supports push notifications so servers can tell clients when things change. Your available tools change? Send a notification. A monitored resource updates? Push it to the client. Notifications use JSON-RPC 2.0's notification format (no response expected) and keep everything synchronized without constant polling.
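
For example, a server that supports resource subscriptions can push a message like this when a subscribed resource changes (the URI here is hypothetical):

{
  "jsonrpc": "2.0",
  "method": "notifications/resources/updated",
  "params": {
    "uri": "postgres://schema/users"
  }
}

We'll look at the tools version of this pattern in the walkthrough below.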

Walking Through a Real Interaction

Let's see how this actually works with a step-by-step example. We'll show the JSON-RPC messages going back and forth.

Step 1: Initializing the Connection

First, the client and server need to agree on what they can do. The client sends an initialize request:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2025-06-18",
    "capabilities": {
      "elicitation": {}
    },
    "clientInfo": {
      "name": "example-client",
      "version": "1.0.0"
    }
  }
}

The server responds with its own capabilities:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2025-06-18",
    "capabilities": {
      "tools": {
        "listChanged": true
      },
      "resources": {}
    },
    "serverInfo": {
      "name": "example-server",
      "version": "1.0.0"
    }
  }
}

What's happening here? The protocol version ensures compatibility. If they can't agree on a version, they should disconnect. The capabilities object is crucial - it's how each side declares what features it supports.

In this exchange, the client says "I can handle elicitation requests" (meaning it can prompt users for input). The server says "I've got tools and they might change, plus I can serve resources." That listChanged: true flag means the server will send notifications when its tool list updates.

The info objects are mainly for debugging and logging. After this succeeds, the client sends a quick notification saying it's ready:

{
  "jsonrpc": "2.0",
  "method": "notifications/initialized"
}

Behind the scenes in your AI application, this initialization happens for each server you connect to. The app stores those capabilities and uses them to route requests correctly.

Step 2: Discovering Tools

Now the client wants to know what tools are available. It sends a simple list request:

{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/list"
}

The server responds with detailed information about each tool:

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "tools": [
      {
        "name": "calculator_arithmetic",
        "title": "Calculator",
        "description": "Perform mathematical calculations including basic arithmetic, trigonometric functions, and algebraic operations",
        "inputSchema": {
          "type": "object",
          "properties": {
            "expression": {
              "type": "string",
              "description": "Mathematical expression to evaluate (e.g., '2 + 3 * 4', 'sin(30)', 'sqrt(16)')"
            }
          },
          "required": ["expression"]
        }
      },
      {
        "name": "weather_current",
        "title": "Weather Information",
        "description": "Get current weather information for any location worldwide",
        "inputSchema": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name, address, or coordinates (latitude,longitude)"
            },
            "units": {
              "type": "string",
              "enum": ["metric", "imperial", "kelvin"],
              "description": "Temperature units to use in response",
              "default": "metric"
            }
          },
          "required": ["location"]
        }
      }
    ]
  }
}

Each tool includes everything the client needs to use it. The name is a unique identifier. The description explains what it does. The inputSchema uses JSON Schema to define exactly what parameters it expects, which ones are required, and what types they should be. This lets clients validate inputs before sending them and helps language models understand how to call the tools correctly.

Your AI application fetches these tool lists from all connected servers and builds a unified registry. When the language model needs to do something, it can see all available tools and pick the right one.
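
Here's a hedged sketch of what that registry step can look like inside a host. Nothing below is MCP SDK API - it's plain Python over the tools arrays returned by tools/list, namespaced by server so two servers can both expose a tool with the same name:

def build_registry(tool_lists):
    """Merge tools/list results from multiple servers into one registry.

    tool_lists maps a server name to the "tools" array from that
    server's tools/list response.
    """
    registry = {}
    for server_name, tools in tool_lists.items():
        for tool in tools:
            # Namespace by server so names can't collide across servers.
            registry[f"{server_name}/{tool['name']}"] = {
                "server": server_name,
                "description": tool.get("description", ""),
                "input_schema": tool.get("inputSchema", {}),
            }
    return registry

# Using the two tools from the discovery response above:
registry = build_registry({
    "demo": [
        {"name": "calculator_arithmetic", "description": "Perform calculations"},
        {"name": "weather_current", "description": "Get current weather"},
    ]
})
print(sorted(registry))  # ['demo/calculator_arithmetic', 'demo/weather_current']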

Step 3: Calling a Tool

Let's actually use one of those tools. The client sends a call request with the tool name and arguments:

{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "weather_current",
    "arguments": {
      "location": "San Francisco",
      "units": "imperial"
    }
  }
}

The name has to match exactly what was in the discovery response. The arguments need to conform to that inputSchema we saw earlier. Here we're providing both the required location and the optional units parameter.
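
Since inputSchema is standard JSON Schema, a client can check arguments before sending anything. A quick sketch using the third-party jsonschema package (an assumption - any JSON Schema validator works):

# pip install jsonschema
from jsonschema import validate

# The weather_current inputSchema from the discovery response above.
input_schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "units": {"type": "string", "enum": ["metric", "imperial", "kelvin"]},
    },
    "required": ["location"],
}

# Raises jsonschema.exceptions.ValidationError if the arguments don't conform.
validate({"location": "San Francisco", "units": "imperial"}, input_schema)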

The server executes the tool and returns results:

{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "Current weather in San Francisco: 68°F, partly cloudy with light winds from the west at 8 mph. Humidity: 65%"
      }
    ]
  }
}

The response uses a content array. This gives you flexibility - you could return text, images, or references to other resources. Each content object has a type field to identify what kind of data it contains. Here it's just text, but MCP supports richer response formats.
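
For instance, a tool that generates a chart might return mixed content like this. The image payload is elided here, but the image content type with base64 data and a mimeType is part of the spec:

{
  "jsonrpc": "2.0",
  "id": 7,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "Here's the requested chart."
      },
      {
        "type": "image",
        "data": "<base64-encoded PNG>",
        "mimeType": "image/png"
      }
    ]
  }
}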

When your language model decides to use a tool during a conversation, your AI app intercepts that call, routes it to the right MCP server, executes it, and feeds the results back to the LLM. This is how the AI can access real-time data and take actions in the real world.

Step 4: Handling Real-Time Updates

Remember how the server said listChanged: true during initialization? That means it can send notifications when its tools change. Here's what that looks like:

{
  "jsonrpc": "2.0",
  "method": "notifications/tools/list_changed"
}

Notice there's no id field. That's because this is a notification - no response expected. The server is just saying "hey, my tools changed, you should check again."

When the client gets this, it typically responds by requesting an updated tool list:

{
  "jsonrpc": "2.0",
  "id": 4,
  "method": "tools/list"
}

This notification pattern is really useful. Tools might appear or disappear based on server state, user permissions, or external dependencies. Instead of constantly polling for changes, clients get notified exactly when updates happen. Keeps everything in sync efficiently.

This works for other primitives too, not just tools. You can get notifications about changed resources, updated prompts, anything that makes sense for your use case.

Your AI application handles these notifications by refreshing its internal registry and updating what the language model can access. If a conversation is active, the LLM can immediately start using new tools as they become available.
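
A hedged sketch of that client-side plumbing, assuming the host keeps a map from notification methods to refresh callbacks (the callback wiring is hypothetical host logic, not SDK API):

import json

def handle_message(raw, refreshers):
    # Messages without an "id" are notifications: act on them,
    # but send nothing back to the server.
    message = json.loads(raw)
    if "id" not in message:
        callback = refreshers.get(message.get("method"))
        if callback:
            callback()

handle_message(
    '{"jsonrpc": "2.0", "method": "notifications/tools/list_changed"}',
    {"notifications/tools/list_changed": lambda: print("re-running tools/list")},
)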

Wrapping Up

That's the core flow: initialize, discover, execute, stay synchronized. The actual protocol has more features (resources, prompts, sampling, elicitation), but this example covers the fundamental patterns you'll use most often.
