DEV Community

Varun Gujarathi

MCP Streaming HTTP Deep Dive

How the Model Context Protocol Works Under the Hood

The Model Context Protocol (MCP) defines how AI clients and servers talk to each other. It’s designed to be simple: everything happens over HTTP, but with a few modern twists that make it efficient for both request/response and streaming use cases.

At its core, MCP runs on Streamable HTTP — a unified way for clients to send requests and receive either regular HTTP responses or streamed data using Server-Sent Events (SSE).

This post explains how that works, step by step.


1. The Big Picture

Think of MCP as HTTP plus streaming.

A single endpoint (usually /mcp) supports two verbs:

  • POST — to send a request
  • GET — to open a streaming connection (optional)

The server can respond in one of two ways:

  • A normal JSON HTTP response
  • A streamed SSE response, where messages arrive as a sequence of data: events

So the client doesn’t need to care whether the response is immediate or streamed — the same endpoint handles both.
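That dispatch can be sketched as a small helper (the name `read_mcp_response` is hypothetical; a real client would parse SSE events incrementally rather than from a complete string):

```python
import json

def read_mcp_response(content_type: str, body: str) -> list:
    """Normalize both response styles into a list of parsed JSON messages."""
    if content_type.startswith("text/event-stream"):
        # SSE: each event's payload sits on a "data:" line.
        return [
            json.loads(line[len("data:"):].strip())
            for line in body.splitlines()
            if line.startswith("data:")
        ]
    # Plain HTTP: one complete JSON body.
    return [json.loads(body)]
```

Either way, the caller ends up with the same thing: a list of parsed messages.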


2. Initializing the Session

Before any work starts, the client initializes a session with the server.

This is done by sending an initialize request (a JSON-RPC 2.0 message) to /mcp via HTTP POST. The client should also send an Accept header listing both application/json and text/event-stream, since the server may answer with either.

import requests

url = "http://localhost:3000/mcp"
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {"roots": {}, "sampling": {}},
        "clientInfo": {"name": "example-client", "version": "1.0.0"}
    }
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())

The server replies with an initialize result describing its own capabilities and configuration. If it assigns a session, the session ID comes back in the Mcp-Session-Id response header, and the client must include that header on every subsequent request.

This is the point where capability negotiation happens — both sides agree on what features they support (e.g., streaming, notifications, or specific tool APIs).
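An initialize result looks roughly like this (the server name, version, and capability set are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2025-03-26",
    "capabilities": {
      "tools": {},
      "resources": {}
    },
    "serverInfo": {"name": "example-server", "version": "0.1.0"}
  }
}
```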

(Diagram: MCP initialization flow)


3. Notifications and Client Capabilities

After the session is established, the client sends a notifications/initialized message.

This signals that the client is ready to start exchanging regular MCP messages.

notification = {
    "jsonrpc": "2.0",
    "method": "notifications/initialized"
}
requests.post(url, json=notification)

If accepted, the server returns 202 Accepted, and the connection is ready for normal use.

Notifications are also how servers can asynchronously inform the client about changes — for example, progress updates or state changes. These can arrive via streaming responses or through explicit poll requests, depending on the server’s configuration.
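For instance, a server reporting progress on a long-running request sends a notifications/progress message (field values illustrative):

```json
{
  "jsonrpc": "2.0",
  "method": "notifications/progress",
  "params": {"progressToken": "abc123", "progress": 50, "total": 100}
}
```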


4. Sending Requests and Getting Responses

Once the session is live, the client can send any MCP request (for example, asking for tool execution, resource data, or context retrieval).

All these are sent to the same /mcp endpoint as POST requests.

task_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "summarize", "arguments": {"input": "Example text"}}
}

r = requests.post(url, json=task_request,
                  headers={"Content-Type": "application/json",
                           "Accept": "application/json, text/event-stream"})
print(r.json())

The server now has two choices:

  1. Return a standard HTTP 200 JSON response (simple case)
  2. Return a streaming SSE response (for progressive results or long-running tasks)

5. Streaming with SSE

In MCP, streaming uses Server-Sent Events (SSE), but it’s integrated into the HTTP layer — not a separate system.

The server decides when to stream. If it wants to push incremental updates, it replies with the header:

Content-Type: text/event-stream

Each event carries part of the response as a data: block.
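On the wire, a streamed reply looks roughly like this (payloads illustrative):

```
HTTP/1.1 200 OK
Content-Type: text/event-stream

event: message
data: {"jsonrpc":"2.0","method":"notifications/progress","params":{"progress":50,"total":100}}

event: message
data: {"jsonrpc":"2.0","id":2,"result":{"content":[{"type":"text","text":"done"}]}}
```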

Here’s how a client can open and read an SSE stream using Python:

import requests
import sseclient  # pip install sseclient-py

url = "http://localhost:3000/mcp"
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}
payload = {
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {"name": "analyze", "arguments": {}}
}

# POST the request; stream=True keeps the connection open for SSE
response = requests.post(url, json=payload, headers=headers, stream=True)

client = sseclient.SSEClient(response)
for event in client.events():
    print(event.data)

6. Client-Initiated Streaming

Sometimes the client wants to keep a live connection open to receive notifications or progress updates.

To do this, it can explicitly open an SSE connection with a GET /mcp request.

This is optional — servers may return 405 Method Not Allowed if they don’t support it.

import requests
import sseclient  # pip install sseclient-py

url = "http://localhost:3000/mcp"
# Tell the server we expect an event stream
response = requests.get(url, headers={"Accept": "text/event-stream"}, stream=True)

client = sseclient.SSEClient(response)
for event in client.events():
    print(event.data)

This mechanism lets clients subscribe to server events — such as updates or new resource availability — without polling.


7. Streamable HTTP in Context

Older versions of MCP used separate endpoints (/messages for POST and /sse for streaming).

The new Streamable HTTP model unifies everything under one endpoint, keeping things simpler and more reliable.

Key properties of Streamable HTTP:

  • Stateless at the transport layer (each request is independent)
  • Session-aware at the protocol level (via initialize)
  • Compatible with regular HTTP servers
  • SSE support for streaming, but optional
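To make the server side concrete, the routing rules above can be sketched as a pure function (a hypothetical helper; a real server also validates the Mcp-Session-Id header, Accept headers, and message bodies):

```python
def route_mcp(method: str, path: str, stream: bool = False, get_supported: bool = False):
    """Sketch of Streamable HTTP routing on the single /mcp endpoint.

    `stream` stands in for the server's own choice to stream a POST
    response; `get_supported` for optional client-initiated streaming.
    Returns an (HTTP status, Content-Type) pair.
    """
    if path != "/mcp":
        return 404, None
    if method == "POST":
        # The server decides per request: plain JSON or an SSE stream.
        return 200, "text/event-stream" if stream else "application/json"
    if method == "GET":
        # Optional: open a server-to-client event stream.
        return (200, "text/event-stream") if get_supported else (405, None)
    return 405, None
```

The GET branch captures the spec's allowance for servers to reject client-initiated streams with 405 Method Not Allowed.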

8. When to Use Streaming vs. Regular HTTP

| Situation | Recommended transport |
| --- | --- |
| Short, simple operations | Regular HTTP (single JSON response) |
| Long-running or incremental responses | SSE streaming on the POST response |
| Continuous updates or notifications | Client-initiated SSE (GET /mcp) |

In short:

  • Use HTTP when you expect a quick, single response.
  • Use SSE when you need to see progress or partial results.

MCP lets both coexist cleanly.


9. Wrapping Up

Under the hood, MCP is elegantly simple:

  • It’s all HTTP.
  • It adds optional streaming through SSE.
  • Sessions and capabilities are negotiated up front.
  • Everything lives on one /mcp endpoint.

That combination makes it easy to implement, debug, and extend — whether you’re building an agent client or hosting your own MCP-compatible server.
