From Personal Tools to Organizational Systems
In a real-world enterprise environment, resources are rarely isolated. Systems like Databases, Emails, Calendars, CRMs, and ERPs are shared assets used by multiple stakeholders — Sales, HR, Finance, and Admin.
To evolve from a “Personal AI Assistant” (running locally) to a true “Organizational AI System,” the architecture must support multiple concurrent users, each with their own specific permissions and authentication levels. It’s no longer just about connecting a tool; it’s about managing an ecosystem.
Moving beyond localhost: A multi-user MCP architecture using Streamable HTTP
The Plot Twist in My AI Infrastructure Journey
I documented migrating from Google Cloud Run (Serverless) to a VM-based architecture. The culprit? SSE (Server-Sent Events) couldn’t maintain stable connections in a stateless serverless environment.
That was the right decision at the time. But technology moves fast.
I’m going back to serverless — and this time, with proper multi-user support that SSE could never provide. The game-changer? MCP’s new Streamable HTTP transport protocol.
Why SSE Failed in Production
Let me be clear: SSE worked perfectly on localhost. The problems emerged only at scale.
The Architectural Mismatch
SSE requires a persistent, long-lived connection between client and server. Every user maintains an open TCP connection that the server must track continuously.
Why Serverless Couldn’t Handle This
I tried Session Affinity to route users to the same instance. It helped — until auto-scaling kicked in and connections dropped anyway.
The solution was clear: migrate to a VM with a static IP. It worked. The agent ran 24/7 with stable connections.
But I lost the elegance of serverless: auto-scaling, zero maintenance, and pay-per-use pricing.
Enter Streamable HTTP: The Protocol Upgrade
In March 2025, the MCP specification introduced Streamable HTTP as the recommended transport, deprecating the SSE-based approach.
The key insight: Why maintain persistent connections when you don’t have to?
The Fundamental Shift
What Changed Under the Hood
The Real Win: Native Multi-User Support
Here’s what made this migration transformative. With Streamable HTTP, I finally implemented proper multi-user routing using URL parameters.
See the live demo below:
link
Note: The narration is in Korean. Please enable CC for English subtitles
The Architecture
Implementation with FastMCP
The beauty of FastMCP is that middleware handles user context injection seamlessly:
python
from fastmcp import FastMCP, Context
from fastmcp.server.dependencies import get_http_request
USER_CONFIGS = {
"admin": {
"gmail": "admin@company.com",
"permissions": ["read", "write", "delete"],
},
"sales": {
"gmail": "sales@company.com",
"permissions": ["read", "write"],
},
"finance": {
"gmail": "finance@company.com",
"permissions": ["read"],
}
}
mcp = FastMCP("Multi-User Server")
@mcp.middleware
async def inject_user_context(ctx, call_next):
request = get_http_request()
user_id = request.query_params.get("user_id", "admin")
config = USER_CONFIGS.get(user_id, USER_CONFIGS["admin"])
ctx.set_state("user_id", user_id)
ctx.set_state("gmail", config["gmail"])
ctx.set_state("permissions", config["permissions"])
return await call_next(ctx)
@mcp.tool()
def send_email(ctx: Context, to: str, subject: str, body: str):
"""Send email from the current user's account"""
sender = ctx.get_state("gmail")
# Gmail API call with user-specific credentials
return {"sent_from": sender, "to": to}USER_CONFIGS = {
"admin": {
"gmail": "admin@company.com",
"permissions": ["read", "write", "delete"],
},
"sales": {
"gmail": "sales@company.com",
"permissions": ["read", "write"],
},
"finance": {
"gmail": "finance@company.com",
"permissions": ["read"],
}
}
Claude Desktop Configuration
Each user gets their own MCP server entry pointing to the same endpoint with different parameters:
json
{
"mcpServers": {
"company-admin": {
"command": "npx",
"args": ["-y", "mcp-remote", "https://your-server/mcp?user_id=admin"]
},
"company-sales": {
"command": "npx",
"args": ["-y", "mcp-remote", "https://your-server/mcp?user_id=sales"]
},
"company-finance": {
"command": "npx",
"args": ["-y", "mcp-remote", "https://your-server/mcp?user_id=finance"]
}
}
}
The Migration: Surprisingly Simple
Server Code Changes (2 lines)
python
Before (SSE)
mcp.run(transport="sse", host="0.0.0.0", port=8000)
After (Streamable HTTP)
mcp.run(transport="http", host="0.0.0.0", port=8000, path="/mcp"
The get_http_request() function works identically because both transports are HTTP-based underneath. Your business logic stays intact.
Key Takeaways
- SSE was the bottleneck, not serverless itself. The persistent connection model simply doesn’t fit stateless infrastructure.
- Streamable HTTP enables true multi-user MCP servers. URL parameter routing with middleware-based context injection is clean and scalable.
- The migration is minimal. If you’re using FastMCP, it’s literally changing one parameter.
- You now have options. VM for predictable workloads, serverless for variable traffic — both work with Streamable HTTP.






Top comments (0)