What: The MCP 2026-07-28 release candidate reworks transport so the tools/call request itself carries every field a server needs to handle it — protocol version, capabilities, auth context, routing keys. The headline framing is stateless transport: any server in a fleet can serve any request, with no per-session pin to a specific instance.
Why: The previous design forced sticky routing: a session was bound to a single server for its lifetime, so load balancers had to either pin connections by session ID or replicate session state out-of-band. Horizontal scaling, blue/green deploys, and crash recovery all suffered. The transport rework is the headline change of the 2026-07-28 RC, the candidate for the next stable MCP spec, and it touches every harness that talks to MCP.
vs prior: Earlier MCP transports treated the first request as a handshake that established server-local state; subsequent requests had to land on the same instance. The new design drops the in-process session: each request is self-contained, and when long-lived cross-request state is genuinely needed (subscriptions, sampling sessions, auth tokens) it lives in a shared store any server can read — not in one server's memory.
Think of it as
A self-addressed envelope at a post office with many windows.

                tools/call (one letter)
                           │
             ┌─────────────┴─────────────┐
             │                           │
     ┌───────▼───────┐           ┌───────▼───────┐
     │ sticky clerk  │           │ self-addressed│
     │ (one window)  │           │   envelope    │
     └───────┬───────┘           └───────┬───────┘
             │                           │
    state in HER drawer         address + tracking
    notebook, hers alone        ID on the envelope
             │                           │
             ▼                           ▼
    ✗ wait at her window        ✓ any open window
    again; if she's             serves it; open
    out, trail is lost          ten more, all equal

- tools/call = handing a letter to a clerk
- sticky routing = a clerk who only remembers your shipment from a notebook on her desk — come back to HER for status
- self-addressed request = a letter with the destination, sender, and tracking ID printed on the envelope — any window reads it
- shared session store (when needed) = the post office's central tracking database — any clerk queries it
- horizontal scaling = open ten more windows in the same office; any one serves you
Quick glossary
MCP — The Model Context Protocol — an open protocol for connecting LLM hosts to external tool servers. The host runs the model and the agent's tool loop; servers expose tools, resources, and prompts over JSON-RPC.
SEP — Specification Enhancement Proposal — MCP's RFC-style change document. The 2026-07-28 RC bundles twenty-two scoped SEPs covering the transport rework, the new Extensions framework, MCP Apps, Tasks, and authorization fixes.
Sticky routing — A load-balancing pattern where a session ID is pinned to a single backend instance for its lifetime. The load balancer hashes the session ID and always routes to the same server. Works fine until that one server is overloaded, restarted, or replaced.
Self-contained request — A request shape where every field the server needs to handle it — protocol version, declared client capabilities, routing keys, auth context — travels with the request itself. The server does not assume any prior state from earlier messages on the same socket.
Shared session store — An out-of-process store (Redis-equivalent, a database, an object store) that any server in the fleet can read and write. Used for the small subset of MCP interactions that genuinely need cross-request state — long-lived subscriptions, sampling sessions, OAuth tokens. The transport itself is still stateless; the store is an implementation pattern for state that has to survive across requests.
Tasks extension (SEP-2663) — The async-handle model for long-running tools: a server returns a Task handle the client drives with tasks/get, tasks/update, tasks/cancel. It composes naturally with stateless transport because the task handle is the only cross-request key the client needs.
The news. On May 22, 2026, the MCP project landed PR #2750, the blog announcement for the 2026-07-28 specification release candidate. The post leads with the stateless transport rework as the headline change, with a before/after HTTP example showing a self-contained tools/call request. Extensions, MCP Apps, and Tasks follow as the new capability story; the authorization changes are summarized by the failure modes they fix rather than enumerated SEP-by-SEP. All twenty-two scoped SEPs are linked from the announcement.
Picture the post office with many windows. The slow path is the sticky clerk: you hand your letter to clerk #3, and clerk #3 jots the details in a notebook only she keeps in her drawer. If you come back to check on your shipment, you have to wait at her window — none of the other clerks can tell you anything. If clerk #3 is busy, or goes on break, or quits, the trail of your shipment goes with her. The line at her window grows; the other windows are quiet. That is exactly what sticky-routed MCP looks like today. The agent's tool-use loop opens a session, the load balancer pins that session to one server, and every follow-up call has to land on that same server. One server gets the traffic; the others sit idle.
The fast path is the self-addressed envelope. You write the destination, the sender, and a tracking ID on the front of every letter, and the post office stops needing any one clerk to remember anything about your shipment. Any open window will do. That is the 2026-07-28 framing: each tools/call carries the protocol version it expects, the client capabilities it declared, any routing keys the server fleet needs, and the auth context — all in the request itself. The server reads the envelope and acts. No drawer notebook. No "come back to me." A second request half a second later can land on a different server entirely and produce identical behavior.
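The envelope idea can be sketched in a few lines of Python. This is a minimal illustration, not the RC's wire format: the field names (`protocol_version`, `capabilities`, `auth`) and the `Server` class are hypothetical, chosen only to show a handler that reads everything from the request and nothing from connection memory.

```python
from dataclasses import dataclass

# Hypothetical request layout: these field names are illustrative assumptions,
# not the RC's actual wire format.
REQUEST = {
    "method": "tools/call",
    "protocol_version": "2026-07-28",
    "capabilities": ["sampling"],
    "auth": {"tenant": "acme"},
    "params": {"name": "add", "arguments": {"a": 2, "b": 3}},
}

SUPPORTED_VERSIONS = {"2026-07-28"}


@dataclass
class Server:
    """One fleet instance. It keeps no per-session or per-connection state."""
    name: str

    def handle(self, request: dict) -> dict:
        # Everything needed to serve the call is read from the request itself.
        if request["protocol_version"] not in SUPPORTED_VERSIONS:
            return {"error": "unsupported protocol version"}
        args = request["params"]["arguments"]
        return {
            "tenant": request["auth"]["tenant"],
            "may_sample": "sampling" in request["capabilities"],
            "result": args["a"] + args["b"],
        }


first = Server("s1").handle(REQUEST)
second = Server("s2").handle(REQUEST)  # a different instance, same answer
print(first == second, first)
```

Because `handle` never touches instance state beyond its name, which instance receives the request has no effect on the response.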
There is a real subtlety worth saying out loud. A few MCP interactions genuinely do need cross-request memory — long-lived subscriptions, sampling sessions, OAuth tokens that have to outlive a single call. The new design does not pretend those don't exist. It externalizes them: the central tracking database the metaphor mentions is a shared store (a Redis-equivalent, a database, an object store) that any server queries when it needs to hydrate that bit of cross-request state. The transport is still stateless — the request itself is self-contained — and the implementation pattern of a shared store is what makes the small slice of stateful behavior work across a fleet. Mixing those two ideas up is easy and worth keeping straight: the protocol's change is at the transport layer; the shared store is one way servers can choose to persist what little state has to outlive a request.
The capacity argument writes itself. Consider 300 concurrent agent sessions, each holding open MCP traffic at ~2 calls per second, hitting a fleet of 3 servers. Sticky routing assigns each session to one server at session open. Distribution is rarely uniform: three or four "power user" sessions can pin one server's load near saturation while the others carry far less. Numerically: a typical sticky-imbalance run might leave S1 at ~92% utilization while S2 and S3 sit at ~8% and ~41% (illustrative). Under stateless transport with the same workload, the load balancer can spray every call independently. The same 600 calls/sec land on three servers at ~47% each, the mean of those three figures (illustrative), so the busiest server's utilization drops by roughly 2× (92% to 47%) before any vertical scaling.
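The imbalance mechanism can be reproduced with a small deterministic model. The workload below is a separate set of assumptions rather than the utilization figures quoted above: four hypothetical heavy sessions at 39 calls/sec whose IDs all hash to the same server, plus 296 light sessions at 1.5 calls/sec, for the same 600 calls/sec total.

```python
SERVERS = 3

# Hypothetical workload totalling 600 calls/sec across 300 sessions:
# four heavy sessions whose IDs all hash to server 0, and 296 light ones.
HEAVY_IDS = {0, 3, 6, 9}
rates = {sid: (39.0 if sid in HEAVY_IDS else 1.5) for sid in range(300)}


def sticky_load(rates: dict[int, float]) -> list[float]:
    """Every call from a session lands on the server its ID hashes to."""
    load = [0.0] * SERVERS
    for sid, rate in rates.items():
        load[sid % SERVERS] += rate
    return load


def sprayed_load(rates: dict[int, float]) -> list[float]:
    """Calls are balanced individually, so total traffic splits evenly."""
    total = sum(rates.values())
    return [total / SERVERS] * SERVERS


sticky = sticky_load(rates)
sprayed = sprayed_load(rates)
print("sticky :", sticky)
print("sprayed:", sprayed)
print("peak reduction: %.2fx" % (max(sticky) / max(sprayed)))
```

Total work is identical in both cases; only the peak changes, which is why the gain shows up as headroom on the busiest server rather than as less work overall.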
Where the rework earns its keep
Sticky routing's failure modes are well-known in the agent harness world: one hot server, blue/green deploys that have to drain sessions for minutes, crash recovery that can't transparently re-route. The 2026-07-28 RC closes all three at the transport level. Self-contained requests do not pin to anything, so a deploy that rolls a server out of rotation finishes in seconds — pending requests just hit the next server. A server that crashes drops its in-flight requests, and the client retries against the fleet — the next call lands somewhere else and proceeds. The only state that needs to survive the crash is whatever the workload chose to put in the shared store, which is the small minority of interactions.
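The crash-recovery claim amounts to a client that retries against whichever instance is next, with no session to rebuild first. A minimal sketch, assuming a hypothetical in-process `fleet` of handler functions and a made-up `ServerDown` exception standing in for a real network failure:

```python
import itertools


class ServerDown(Exception):
    """Raised by a hypothetical instance that has crashed or been drained."""


def make_server(name: str, healthy: bool):
    def handle(request: dict) -> dict:
        if not healthy:
            raise ServerDown(name)
        return {"served_by": name, "echo": request["params"]}
    return handle


def call_with_retry(fleet, request: dict, attempts: int = 3) -> dict:
    """Try successive instances; any healthy one can serve the request."""
    last_error = None
    for handler in itertools.islice(itertools.cycle(fleet), attempts):
        try:
            return handler(request)
        except ServerDown as exc:
            last_error = exc  # no session to rebuild; just try the next server
    raise RuntimeError(f"all {attempts} attempts failed") from last_error


fleet = [make_server("s1", healthy=False), make_server("s2", healthy=True)]
response = call_with_retry(fleet, {"method": "tools/call", "params": {"name": "ping"}})
print(response)
```

A real client would also have to decide whether a given tool call is safe to repeat; the sketch ignores that and shows only the routing side.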
The shape of what the RC actually changes is concrete. The table below contrasts the legacy and new transport.
| Aspect | Sticky-routed transport (legacy) | Stateless transport (2026-07-28 RC) |
|---|---|---|
| Session lifetime | Bound to one server for the session's life | No per-session server binding |
| Routing key | Session ID hashed to a specific instance | None — any instance, any request |
| First request | Handshake that creates server-local state | Self-contained, no implicit setup |
| Cross-request state | In server memory | In a shared store, only when needed (subscriptions, sampling, auth) |
| Horizontal scale-out | Awkward — uneven load by session hash | Native — load balancer sprays calls |
| Server restart | Drops the session; client must rebuild | Drops in-flight; retry hits any other server |
A related design point is worth knowing. The Tasks extension (SEP-2663) ships a complementary idea one layer up: it gives the client a long-lived taskId it can poll across reconnects. SEP-2663 needed the transport rework to be fully useful — a taskId polled across reconnects only works if the next tasks/get doesn't have to land on the same server that issued the handle. Stateless transport is what makes that work: the taskId is the only cross-request key the client carries, the server fleet hydrates the task's state from the shared store, and the polling call goes to whichever server is least busy.
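The interplay between a task handle and the shared store might look like the following. It is a sketch under stated assumptions: a plain dict stands in for the shared store, and the `Server` methods, the status strings (`working`, `completed`), and the stored fields are all hypothetical rather than taken from SEP-2663.

```python
import uuid

# A plain dict stands in for the shared store (Redis-equivalent in the text).
SHARED_STORE: dict[str, dict] = {}


class Server:
    def __init__(self, name: str, store: dict[str, dict]):
        self.name = name
        self.store = store  # shared across the fleet, not per-instance memory

    def start_task(self, tool: str) -> str:
        task_id = str(uuid.uuid4())
        self.store[task_id] = {"tool": tool, "status": "working", "issued_by": self.name}
        return task_id

    def complete_task(self, task_id: str, result: str) -> None:
        self.store[task_id].update(status="completed", result=result)

    def tasks_get(self, task_id: str) -> dict:
        # Hydrate from the shared store; works on any instance.
        return {"answered_by": self.name, **self.store[task_id]}


s1 = Server("s1", SHARED_STORE)
s2 = Server("s2", SHARED_STORE)

task_id = s1.start_task("render_report")  # handle issued by s1
s1.complete_task(task_id, "report.pdf")   # work finishes, state goes to the store
status = s2.tasks_get(task_id)            # the poll lands on s2
print(status)
```

The only thing the client holds between calls is `task_id`; the instance that answers the poll never needs to have seen the original request.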
The boundary of what the RC changes is the transport itself, not the protocol semantics. Tools still return tool results; resources still return resource contents; the wire format of a method call is the same JSON-RPC envelope. What changes is what a server is allowed to assume: nothing about prior calls on the same connection. That single discipline is enough to make every harness operator's life easier and to make the parallel-tool-call patterns the Cost & Latency module recommends actually achievable in a fleet.
FAQ
What does stateless transport mean in the MCP 2026-07-28 RC?
It means the tools/call request itself carries every field a server needs to handle it — protocol version, declared client capabilities, routing keys, auth context. The server is not allowed to assume any state from prior calls on the same connection. A consequence is that any server in a fleet can serve any request, so no sticky session binding is needed at the load balancer.
What replaces sticky routing for state that genuinely has to live across requests?
A shared store. The small subset of MCP interactions that need cross-request memory — long-lived subscriptions, sampling sessions, OAuth tokens — moves out of any one server's process and into a Redis-equivalent (or database, or object store) the entire fleet reads. The transport itself is still stateless; the shared store is an implementation pattern for the slice of state that must survive across requests.
How does the transport rework relate to the Tasks extension (SEP-2663)?
They compose. SEP-2663 lets a server return a long-lived taskId the client polls later. Stateless transport is what makes that poll robust across a fleet: the next tasks/get does not need to land on the same server that issued the handle. Together they let an agent harness survive server restarts, blue/green deploys, and load-balancer reshuffles without any session affinity.
What needs to change in existing MCP server code to support stateless transport?
Concretely: stop reading state from the connection. Any field the server used to learn once at session-establish and remember for the lifetime of the connection — declared client capabilities, protocol version, auth identity, routing tenant — must now be read from each tools/call request instead. Servers that already drove every decision off the incoming request payload need minimal changes. Servers that built up per-connection caches (negotiated capabilities, OAuth introspection results, tenant routing decisions) need to externalize those caches into a shared store the whole fleet reads, or push them to the client to re-send. Most production MCP servers will land in the middle: a few small migrations rather than a rewrite.
How does stateless transport affect MCP authentication and authorization?
Auth context becomes a per-request field rather than a per-session attribute. The 2026-07-28 RC expects every tools/call to carry whatever proof the server needs — a bearer token, a signed capability, a tenant identifier — so any server in the fleet can verify the call without consulting prior connection state. The net effect on a production stack is that a load-balancer reshuffle, a server restart, or a blue/green deploy mid-flight no longer drops the agent's authorization, because no server held it in process memory in the first place. Token introspection caches still live somewhere, but in a shared store the entire fleet shares (Redis-equivalent), not in any single server's per-connection state.
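Per-request verification can be illustrated with a toy proof. The HMAC scheme below is a stand-in, not the RC's authorization mechanism: `FLEET_KEY`, `sign`, `verify`, and the `auth` field layout are hypothetical, and a real deployment would carry a bearer token or signed capability as described above.

```python
import hashlib
import hmac

# Hypothetical fleet-wide signing key; in practice this would come from a
# secret manager, not source code.
FLEET_KEY = b"demo-key-not-for-production"


def sign(tenant: str) -> str:
    """Issue a toy proof binding the tenant name to the fleet key."""
    return hmac.new(FLEET_KEY, tenant.encode(), hashlib.sha256).hexdigest()


def verify(request: dict) -> bool:
    """Check the proof carried by this request; no connection state consulted."""
    auth = request.get("auth", {})
    expected = sign(auth.get("tenant", ""))
    return hmac.compare_digest(expected, auth.get("proof", ""))


good = {"method": "tools/call", "auth": {"tenant": "acme", "proof": sign("acme")}}
forged = {"method": "tools/call", "auth": {"tenant": "acme", "proof": "0" * 64}}

# Any instance holding FLEET_KEY reaches the same verdict for the same request.
print(verify(good), verify(forged))
```

Since `verify` depends only on the request and a key every instance shares, a reshuffle or restart between two calls does not change the outcome.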
Originally posted on Learn AI Visually.