Understanding Streamable HTTP, MCP's Successor to SSE: Server/Client Architecture and a Head-to-Head Comparison

If you've ever tried to build a truly interactive application, especially one that talks to a modern AI, you know the struggle. Simple request-response cycles feel clunky for long-running tasks. WebSockets are powerful but can be overkill and a headache to manage. So, how do you build something that feels as fluid and responsive as a native app, but over plain old HTTP?

Today, we're going on a deep, deep dive into a protocol designed to solve this exact problem: the MCP Streamable HTTP protocol. We're not just scratching the surface. We're going to tear down the client and server implementations from the official TypeScript SDK, look at the code, and understand the "why" behind every design choice.

By the end of this (admittedly very long) post, you'll have a rock-solid understanding of how to build robust, stateful, and even fault-tolerant applications on top of HTTP. Let's get started!


Part 1: The Big Picture - Why This Protocol Even Exists

The Philosophy: More Than Just Requests and Responses

At its heart, the Streamable HTTP protocol is built on a clever, dual-channel model that uses standard HTTP methods in unconventional ways. Think of it like a restaurant's communication system:

  1. The Command Channel (HTTP POST 🗣️): This is your direct line to the kitchen. You use it to place an order (send a request or notification). The magic here is that the waiter (the server) can talk back to you on that same line while your order is being prepared. They might give you progress updates ("The chef is searing the steak now!") or even partial results ("Here are your appetizers while you wait."). This is all handled within the response to your single POST request, which can itself be a stream of events.

  2. The Announcement Channel (HTTP GET 📢): This is the restaurant's PA system. You tune in once (by making a long-lived GET request) and then you can hear any general announcements from the restaurant ("The special for tonight is..."). These are unsolicited, server-initiated events that aren't tied to any specific order you placed.

This design gives us the best of both worlds: the familiar, direct nature of POST for commands, and the persistent, low-latency nature of a GET-based Server-Sent Events (SSE) stream for asynchronous updates. The entire system is brought to life by the implementations in client/streamableHttp.ts and server/streamableHttp.ts.

The Brains of the Operation: The Abstract Protocol Class 🧠

Before we even get to the HTTP part, we need to understand the core logic layer: the abstract Protocol class found in shared/protocol.ts. Think of the HTTP transport layers as the plumbing (the pipes and wires), but this Protocol class is the brain that decides what flows through them. It handles the nitty-gritty of JSON-RPC 2.0 framing, request lifecycles, and reliability.

How Requests and Responses are Matched

When your application code calls client.request(...), how does it know which response belongs to it, especially when multiple requests are happening at once?

It all starts with a unique ID. The Protocol class maintains a counter, _requestMessageId, and assigns a new ID to every outgoing request. It then creates a Promise and cleverly stores its resolve and reject functions in a Map called _responseHandlers, using the message ID as the key.

Here's that critical piece of code. It's the moment the client makes a promise it intends to keep.

// From: @modelcontextprotocol/typescript-sdk/src/shared/protocol.ts

// The client is setting a trap. It's saying, "When a response with `messageId`
// arrives, execute this function to either resolve or reject my promise."
this._responseHandlers.set(messageId, (response) => {
  // First, check if the request was already cancelled by our side.
  if (options?.signal?.aborted) {
    return;
  }
  // If the response is an error, reject the promise.
  if (response instanceof Error) {
    return reject(response);
  }
  // If it's a success, parse the result against the expected schema and resolve!
  try {
    const result = resultSchema.parse(response.result);
    resolve(result);
  } catch (error) {
    // If the server's response doesn't match our expected shape, that's an error too.
    reject(error);
  }
});

When a message arrives from the server, the transport's onmessage handler passes it to the Protocol class, which acts as a triage nurse. If the message has a result or error field, it knows it's a response and calls _onresponse. This function is the other half of the trap: it grabs the ID from the response, finds the corresponding handler in _responseHandlers, and springs it, fulfilling the promise.

Handling In-Flight Cancellations Gracefully

What if a user gets impatient and wants to cancel a long-running operation? The protocol has a clean way to handle this using the standard AbortSignal.

  1. The client application triggers an AbortSignal.
  2. The request() method catches this, rejects its promise locally, and, crucially, sends a notifications/cancelled message to the server.
  3. The server's Protocol instance has a pre-registered handler specifically for this notification. This handler looks up the task's AbortController (which it stored when the request first arrived) and calls .abort(), signaling the server-side code to stop its work.
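Here's a minimal sketch of what that looks like from the application's side. It assumes an already-connected `client` (a Client instance from the SDK); the tool name and arguments are placeholders.

// A hedged usage sketch, not from the SDK source.
import { CallToolResultSchema } from "@modelcontextprotocol/sdk/types.js";

const controller = new AbortController();

// The Protocol class watches this signal for the lifetime of the request.
const pending = client.request(
  { method: "tools/call", params: { name: "research-agent", arguments: { topic: "MCP" } } },
  CallToolResultSchema,
  { signal: controller.signal }
);

// Simulate an impatient user. Aborting rejects the promise locally AND
// sends a notifications/cancelled message so the server can stop working.
setTimeout(() => controller.abort("User got impatient"), 5_000);

try {
  await pending;
} catch (err) {
  console.log("Request cancelled or failed:", err);
}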

Keeping Long-Running Tasks Alive with Timeouts

To prevent requests from hanging forever, the Protocol class has a smart timeout system. When a request is made, it starts a timer. The real magic, however, is in the resetTimeoutOnProgress option. For a long AI task, you don't want it to time out just because it's taking a while. If this option is true, every time the server sends a progress notification, the client resets the timeout timer. This ensures that as long as the server is showing signs of life, the client will wait patiently.

// From: @modelcontextprotocol/typescript-sdk/src/shared/protocol.ts

// This method is called when a progress notification arrives.
private _resetTimeout(messageId: number): boolean {
    const info = this._timeoutInfo.get(messageId);
    if (!info) return false;

    // It even checks against a `maxTotalTimeout` so it can't be extended forever.
    const totalElapsed = Date.now() - info.startTime;
    if (info.maxTotalTimeout && totalElapsed >= info.maxTotalTimeout) {
      this._timeoutInfo.delete(messageId);
      throw new McpError(/* ... */);
    }

    // Clear the old timer and start a new one!
    clearTimeout(info.timeoutId);
    info.timeoutId = setTimeout(info.onTimeout, info.timeout);
    return true;
}
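On the calling side, these knobs surface as per-request options. Here's a hedged usage sketch, again assuming an already-connected `client`; the tool name is a placeholder.

import { CallToolResultSchema } from "@modelcontextprotocol/sdk/types.js";

const result = await client.request(
  { method: "tools/call", params: { name: "slow-tool", arguments: {} } },
  CallToolResultSchema,
  {
    timeout: 30_000,              // give up after 30s of silence...
    resetTimeoutOnProgress: true, // ...but restart the clock on every progress notification
    maxTotalTimeout: 600_000,     // hard ceiling: never wait more than 10 minutes overall
    onprogress: (p) => console.log(`progress: ${p.progress}/${p.total ?? "?"}`),
  }
);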

Part 2: The Client's Perspective - Making the Connection

Now let's dive into the concrete client implementation in client/streamableHttp.ts.

The Client Handshake: Connection, Init, and Auth

A client's journey to a full connection is a precise dance:

  1. The initialize POST: The first thing a client does is POST an initialize request. This is the formal handshake where the client tells the server who it is and what it can do; the server replies with its own capabilities and, typically, an mcp-session-id header.
  2. The 202 Accepted Trigger: The client then POSTs a notifications/initialized message, and the server acknowledges it with HTTP 202 Accepted. This is the signal! The client's send() method sees it and immediately knows it's time to open the second channel.
  3. The Asynchronous GET: The client calls _startOrAuthSse(), which fires off a long-lived GET request with an Accept: text/event-stream header. This is the client opening its ear for the server's PA system. If the server doesn't support this (and returns a 405 Method Not Allowed), the client gracefully carries on without it.
  4. The Auth Flow: If at any point the server responds with 401 Unauthorized, the client's authProvider kicks in. It might try to refresh a token, or if it has no credentials, it will trigger redirectToAuthorization, sending the user off to log in. Once they return, the application calls finishAuth() to complete the OAuth2 flow and get the tokens needed to retry the connection.
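Happily, the SDK hides this whole dance behind a single call. A minimal sketch, with a placeholder endpoint URL:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const client = new Client({ name: "demo-client", version: "1.0.0" });
const transport = new StreamableHTTPClientTransport(new URL("http://localhost:3000/mcp"));

// connect() performs the steps above: the initialize POST, the initialized
// notification (answered with 202 Accepted), and the follow-up GET stream.
await client.connect(transport);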

The Client's Gateway: A Forensic Look at the send() Method

Every single message the client sends goes through the send() method via POST. The true genius of the client is how it interprets the response to this POST.

  • If status is 202 Accepted: This is the "message received, thanks" signal. If the message was the initialized notification, this is the cue to start the SSE GET stream, as we saw above.
  • If status is 200 OK and Content-Type is application/json: This is a simple, synchronous-style response. The client parses the JSON and is done with this transaction.
  • If status is 200 OK and Content-Type is text/event-stream: This is where it gets really cool. The POST request's response itself is a stream. The client pipes this stream into _handleSseStream to process the progress updates and final result for that specific request.

This logic is the heart of the client's flexibility.

// From: @modelcontextprotocol/typescript-sdk/src/client/streamableHttp.ts

// This block in send() decides what to do based on the server's response.
if (response.status === 202) {
    // If the server accepted our initialization...
    if (isInitializedNotification(message)) {
      // ...it's time to open the general announcement (GET) channel!
      this._startOrAuthSse({ resumptionToken: undefined }).catch(err => this.onerror?.(err));
    }
    return;
}

const contentType = response.headers.get("content-type");

if (hasRequests) {
    if (contentType?.includes("text/event-stream")) {
        // The POST response is a stream! Handle it accordingly.
        this._handleSseStream(response.body, { onresumptiontoken }, false);
    } else if (contentType?.includes("application/json")) {
        // The POST response is a simple JSON object. Parse it.
        const data = await response.json();
        // ... process data ...
    }
}

From Bytes to Messages: Parsing Streams with _handleSseStream

This method is the designated parser for all SSE streams, whether from the main GET or a streaming POST. It sets up a beautiful, modern stream processing pipeline:

Raw Bytes (ReadableStream<Uint8Array>) → Decoded Text (TextDecoderStream) → Parsed Events (EventSourceParserStream)

It then reads from the end of this pipeline, taking the event.data (which is the JSON payload), parsing it, and passing it back to the Protocol layer's main onmessage callback for routing. Simple, efficient, and non-blocking.

// From: @modelcontextprotocol/typescript-sdk/src/client/streamableHttp.ts

// This is a masterclass in modern stream processing in JavaScript.
const reader = stream
  .pipeThrough(new TextDecoderStream())
  .pipeThrough(new EventSourceParserStream())
  .getReader();
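To make the flow concrete, here's a hedged sketch of the read loop that drains such a pipeline. The callback names are hypothetical stand-ins for the SDK's internal routing:

import { EventSourceParserStream } from "eventsource-parser/stream";

async function drainSseStream(
  stream: ReadableStream<Uint8Array>,
  onMessage: (msg: unknown) => void,
  onResumptionToken?: (token: string) => void,
): Promise<void> {
  const reader = stream
    .pipeThrough(new TextDecoderStream())
    .pipeThrough(new EventSourceParserStream())
    .getReader();

  while (true) {
    const { done, value: event } = await reader.read();
    if (done) break;
    // An id field on the SSE event doubles as the resumption token.
    if (event.id) onResumptionToken?.(event.id);
    // event.data is the JSON-RPC payload; parse it and hand it to the router.
    onMessage(JSON.parse(event.data));
  }
}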

Surviving the Chaos: Session Management and Resumability 🛡️

This is where the protocol truly shines, providing statefulness and recovery from network failures.

Session Management: The client grabs the mcp-session-id header from the very first response and stores it. From then on, every subsequent request includes this header, telling the server, "Hey, it's me again."

Connection Resumability: This is the critical flow for fault tolerance.

  1. Capture the Token: When handling a stream, if an SSE event has an id field, that's our resumption token! The client captures it as lastEventId and calls the onresumptiontoken callback so the application can save it somewhere safe (like localStorage).
  2. Detect the Disconnect: If the network drops, the stream will error out. The catch block in _handleSseStream is triggered.
  3. Schedule a Reconnect: Instead of giving up, the catch block calls _scheduleReconnection, which uses an exponential backoff delay to plan its next attempt.
  4. Attempt Resumption: After the delay, it calls _startOrAuthSse again, but this time it passes in the lastEventId it saved.
  5. Send the Magic Header: _startOrAuthSse then creates a new GET request, but with a special last-event-id header containing the token. This tells the server exactly where the client left off, allowing it to replay any missed messages.

It's a complete, closed-loop system for recovering from connection failures.
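The backoff itself is simple geometric growth with a ceiling. Here's a hedged sketch of the idea; the constants are assumptions for illustration, though the SDK exposes similar knobs (with names like initialReconnectionDelay) via the transport's reconnection options.

// Illustrative only; constants are assumptions, not the SDK's defaults.
function reconnectionDelay(attempt: number): number {
  const initialMs = 1_000; // first retry after ~1s
  const growFactor = 1.5;  // each failure multiplies the wait by 1.5x
  const maxMs = 30_000;    // never wait longer than 30s between attempts
  return Math.min(initialMs * Math.pow(growFactor, attempt), maxMs);
}

// attempt 0 → 1s, attempt 1 → 1.5s, attempt 2 → 2.25s, ... capped at 30s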


Part 3: The Server's Side of the Story

Now let's turn the tables and look at the server implementation in server/streamableHttp.ts.

The Server's Front Door: handleRequest and the Transport Lifecycle

The simpleStreamableHttp.ts example server shows a beautiful pattern for managing stateful connections. It maintains a global transports map.

  • When a request comes in, it checks for an mcp-session-id header.
  • If the ID exists in the map, it reuses the existing StreamableHTTPServerTransport instance for that session. State is maintained!
  • If there's no ID but the message is initialize, it knows a new client is connecting. It creates a new transport instance. The key is the onsessioninitialized callback: once the new transport generates its session ID, this callback fires and saves the new transport into the global map.

This logic is the core of how the server manages multiple, distinct client sessions concurrently.

// From: @modelcontextprotocol/typescript-sdk/src/examples/server/simpleStreamableHttp.ts

// This logic from the example server is the key to stateful session management.
if (sessionId && transports[sessionId]) {
  // Found an existing session, let's reuse its transport.
  transport = transports[sessionId];
} else if (!sessionId && isInitializeRequest(req.body)) {
  // A new client is initializing!
  transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: () => randomUUID(),
    // This callback is the magic glue. It links the new session ID to its transport instance.
    onsessioninitialized: (newlyCreatedSessionId) => {
      console.log(`Session initialized with ID: ${newlyCreatedSessionId}`);
      transports[newlyCreatedSessionId] = transport;
    }
  });
  // ...
}

Intelligent Routing: The Server's send() Method

The server's send() method is the mirror image of the client's and is responsible for routing outbound messages to the correct channel. The deciding factor is the relatedRequestId.

  • Case 1: General Announcement. If send() is called for a notification without a relatedRequestId, the server knows it's a general, server-initiated event. It looks up the ServerResponse object for the long-lived GET stream and writes the message there.
  • Case 2: Specific Response. If send() is called for a message that is a response (it has an id) or has a relatedRequestId, the server knows it belongs to a specific POST transaction. It uses its internal mappings (_requestToStreamMapping) to find the exact ServerResponse object associated with that original POST and writes the message to that dedicated stream.

This ensures that progress updates for Tool A don't accidentally get sent to the response stream for Tool B.
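A simplified, hypothetical sketch of that routing decision follows. The real transport does this with its internal _requestToStreamMapping; all names here are illustrative.

import { ServerResponse } from "node:http";

type JsonRpcId = string | number;

const requestToStreamMapping = new Map<JsonRpcId, ServerResponse>();
let standaloneSseStream: ServerResponse | undefined;

function pickStream(
  responseId?: JsonRpcId,       // set when the outbound message is a response
  relatedRequestId?: JsonRpcId  // set for notifications tied to a specific POST
): ServerResponse | undefined {
  const requestId = responseId ?? relatedRequestId;
  if (requestId !== undefined) {
    // Scoped message: write to the SSE stream opened by the original POST.
    return requestToStreamMapping.get(requestId);
  }
  // Unscoped notification: write to the long-lived standalone GET stream.
  return standaloneSseStream;
}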

Choose Your Weapon: Streaming SSE vs. Synchronous JSON

The server can respond in two ways, controlled by the enableJsonResponse option.

  • Streaming SSE (default): When a POST arrives, the server immediately sends back 200 OK with a text/event-stream content type. The connection is now an open stream, and the server can send events over it as they become available.
  • Synchronous JSON: If enableJsonResponse is true, the server holds its horses. It doesn't send any response right away. It buffers all the results for the request batch in memory. Only when the entire batch is complete does it send a single 200 OK with an application/json content type and the full JSON payload. This is perfect for simple tools where streaming is unnecessary.
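Opting into the synchronous mode is a one-line change when constructing the transport. A minimal sketch:

import { randomUUID } from "node:crypto";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

const transport = new StreamableHTTPServerTransport({
  sessionIdGenerator: () => randomUUID(),
  // Buffer all results and answer the POST with one application/json body
  // instead of a text/event-stream.
  enableJsonResponse: true,
});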

Picking Up Where You Left Off: The EventStore

The server's half of the resumability feature is powered by the EventStore interface.

  • Storing Events: In the send() method, if an event store is configured, the server first calls eventStore.storeEvent(). This saves the message and returns a unique eventId. This ID is then sent as the id: field in the SSE message to the client.
  • Replaying Events: When a client reconnects with a last-event-id header, the server's handleGetRequest method catches it. It then calls eventStore.replayEventsAfter(), which fetches all the messages the client missed and sends them down the new connection, seamlessly restoring the client's state.
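To ground this, here's a toy in-memory EventStore sketch, modeled on the SDK's example implementation. It's fine for demos, but it loses history on restart, so a production system would back it with something durable like Redis or a database.

import { randomUUID } from "node:crypto";
import type { JSONRPCMessage } from "@modelcontextprotocol/sdk/types.js";

class InMemoryEventStore {
  // Maps preserve insertion order, so iteration replays chronologically.
  private events = new Map<string, { streamId: string; message: JSONRPCMessage }>();

  async storeEvent(streamId: string, message: JSONRPCMessage): Promise<string> {
    // The returned ID becomes the `id:` field of the outgoing SSE event.
    const eventId = `${streamId}_${randomUUID()}`;
    this.events.set(eventId, { streamId, message });
    return eventId;
  }

  async replayEventsAfter(
    lastEventId: string,
    { send }: { send: (eventId: string, message: JSONRPCMessage) => Promise<void> }
  ): Promise<string> {
    // Recover the stream ID baked into the event ID (UUIDs contain no "_").
    const [streamId] = lastEventId.split("_");
    let found = false;
    for (const [eventId, entry] of this.events) {
      if (entry.streamId !== streamId) continue;
      if (found) await send(eventId, entry.message);
      if (eventId === lastEventId) found = true;
    }
    return streamId;
  }
}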

Part 4: Putting It All to Work: Practical Scenarios

So, what can you actually build with this?

  • Long-Running AI Tools: Imagine you're building a "Research Agent" tool. The user gives it a topic. The POST request is sent. The server can now stream back updates on the dedicated response stream: {"status": "Searching web..."}, {"status": "Found 10 sources, summarizing..."}, {"status": "Generating report..."}, followed by the final text. It's a long task made interactive. (See the sketch after this list.)

  • Interactive User Input (Elicitation): Your AI needs the user's permission to access a file. It can send an elicitInput request over the general announcement (GET) channel. Your client app sees this, pops up a native "Allow Access?" dialog, and sends the yes/no answer back to the server. This is a fluid, two-way conversation.

  • Real-Time Dashboards: Imagine a server monitoring system resources. The server can have multiple client dashboards connected via the GET stream. Whenever CPU usage changes, the server just send()s a cpu_usage_changed notification, and all connected dashboards update in real-time.
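Here's a hedged sketch of that first scenario using the high-level McpServer API. The tool name and status messages are illustrative, and `server` is assumed to be an McpServer instance with logging enabled:

import { z } from "zod";

server.tool(
  "research-agent",
  { topic: z.string() },
  async ({ topic }, { sendNotification }) => {
    const steps = ["Searching web...", "Found 10 sources, summarizing...", "Generating report..."];
    for (const status of steps) {
      // Each notification rides the same SSE stream as this tool call's
      // eventual result, so the client sees live progress.
      await sendNotification({
        method: "notifications/message",
        params: { level: "info", data: status },
      });
    }
    return { content: [{ type: "text", text: `Final report on "${topic}": ...` }] };
  }
);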

SSE vs. Streamable HTTP: An Evolution in Design

You've almost certainly encountered Server-Sent Events (SSE). It's a fantastic, simple technology for pushing data from a server to a client. But the Streamable HTTP protocol looks like SSE and smells like SSE... yet it's not quite the same. So, are they the same thing? Is one better? Why is this new protocol necessary?

This section clears up that confusion. We'll explore how Streamable HTTP evolves the concepts of SSE to create a more powerful, robust, and truly bidirectional communication channel over standard HTTP.

The TL;DR: Two Phones vs. One Smartphone

Before we dive into the technical details, let's start with a simple analogy that captures the core difference.

  • Classic SSE (+ separate POSTs) is like using two separate, old-school phones:

    • You have a landline phone (GET) that can only receive calls. The server holds this line open to talk to you whenever it wants.
    • You have a payphone (POST) that can only make calls. Every time you want to say something to the server, you have to go to the payphone, make a call, say your piece, and hang up.
    • This system is often asymmetric and requires extra work to correlate the incoming calls on the landline with the outgoing calls from the payphone.
  • Streamable HTTP is like a modern smartphone call:

    • You make a single call (POST).
    • On this one call, you can both talk to the server (by sending your request) and the server can talk back to you continuously (by streaming a response). It can even send you "text messages" (progress updates) during the call without interrupting the main conversation.
    • You also have the option of opening a separate, "listen-only" channel (GET), like putting the server on speakerphone for background announcements, but it's not required for a two-way conversation.

This analogy captures the essence: classic SSE setups require two separate, asymmetric channels to achieve two-way communication, while the Streamable HTTP protocol can unify this into a single, more powerful HTTP transaction.

A Feature-by-Feature Protocol Showdown

Here, we'll break down the core concepts of real-time communication and compare how each protocol handles them.

1. The Connection & Communication Model

This is the most fundamental difference and the source of most of the architectural changes.

Attribute "Classic" SSE-based Approach Streamable HTTP
Primary Channel(s) Two separate channels:
1. A persistent GET for server-to-client messages.
2. Separate, transient POSTs for client-to-server messages.
Unified hybrid channel:
A single POST can handle both the client's request and a streaming server-to-client response. A separate GET channel is optional for unsolicited server events.
The Handshake Often ad-hoc & asymmetric. For example, the client connects via GET, then must wait for a custom event from the server to learn where to send its POSTs. Implicit & Flexible. The client sends an initialize POST. The server's response (202 Accepted or 200 OK) dictates the next step. No custom handshake event is needed.
Flexibility Rigid. The two-channel model is the only way it operates. Highly Flexible. A server can choose to respond to a POST with a single JSON object (classic RPC) or a full event stream, depending on the nature of the request.

The key innovation here is that Streamable HTTP allows the response to a POST request to be, itself, a stream. This turns a traditionally one-shot request into a long-lived conversation scoped to a single transaction.

2. Session & State Management

How do the client and server keep track of who they're talking to?

Attribute "Classic" SSE-based Approach Streamable HTTP
Session Initiation Often handled via query parameters. The session ID might be created by the server and sent back in a URL within a custom event. Session ID is created by the server and sent back in a dedicated HTTP header (mcp-session-id).
Session Tracking The client must parse the session ID and manually add it to subsequent POSTs. The server needs an application-level map to link the POST back to the original GET stream. The client simply reads the mcp-session-id header and adds it to all subsequent requests. The transport layer can handle the session mapping more cleanly.

The key takeaway here is that Streamable HTTP uses standard HTTP mechanisms (headers) for state management, which is cleaner and less burdensome on the application developer compared to ad-hoc solutions using query parameters and custom events.

3. Resumability & Reliability

What happens when your mobile network drops mid-request? This is where Streamable HTTP truly shines.

Attribute "Classic" SSE-based Approach Streamable HTTP
Connection Resumption Not natively supported. The SSE standard itself has a last-event-id header, but a full protocol for replaying missed events across both GET and POST channels is not defined. If the GET stream is dropped, the client must typically start over. First-class feature. This is one of the primary reasons for the protocol's existence.
Mechanism N/A Token-based.
1. Server sends an id: field with each SSE event. This is the resumption token.
2. Client persists the last seen token.
3. On reconnect, the client sends a last-event-id HTTP header.
4. Server uses a persistent EventStore to replay any missed messages.
Server-Side Requirement N/A Requires a pluggable EventStore component on the server to persist message history for replay, making the system fault-tolerant.

This makes applications built on Streamable HTTP incredibly resilient to the transient network issues common on mobile and unreliable networks.

4. Key Features & Use Cases

What kind of applications are each of these protocols best suited for?

Attribute "Classic" SSE-based Approach Streamable HTTP
Progress Updates Clunky. The server can send notifications on the GET stream, but they aren't directly tied to the POST request that initiated the task. Correlating them requires extra logic. Seamless. A long-running tool call is made via POST. The server can stream progress updates back in the response body of that same POST, keeping everything neatly scoped to a single transaction.
Interactive Elicitation Possible, but awkward. The server would send a request on the GET stream. The client would respond with a new POST. The server then has to correlate that POST with its original request. Natural. This is a core use case. The server can send a request on the optional standalone GET stream at any time, enabling true, back-and-forth conversational AI.
Ideal Use Case Simple, one-way server-to-client notification systems. (e.g., "A new article was posted!", stock tickers). Complex, stateful, interactive applications. (e.g., AI agents, long-running data processing tools, real-time collaborative dashboards).

The journey from a classic SSE-based architecture to the Streamable HTTP protocol is a perfect case study in software evolution. The classic approach is a clever solution that works, but it has architectural seams—the need for application-level session mapping, the lack of built-in resumability, and the clunky correlation of requests and responses.

The Streamable HTTP protocol is the direct result of learning from those seams. It re-imagines the flow to be more aligned with the nature of HTTP, creating a unified, more powerful, and vastly more resilient system.


I know that was a lot to take in, but hopefully, this deep dive has demystified the magic behind the MCP Streamable HTTP protocol. By understanding the code and the design choices, you're now equipped to leverage its full power.

Let me know your thoughts or questions in the comments below. Happy coding!
