First Conclusion: Python-based MCP Servers Now Run in the Cloud
Traditionally, MCP servers, especially those implemented in Python, were often perceived as being for local use. Configurations combining Server-Sent Events (SSE) and auxiliary WebSockets frequently posed challenges for deployment in cloud environments.
However, just a few days ago on May 8, 2025, v1.8.0 of the MCP Python SDK was released, officially supporting the long-awaited Streamable HTTP transport.
The release notes proudly declare, "This is the first release supporting the new Streamable HTTP transport from protocol version 2025-03-26, which supersedes the SSE transport from protocol version 2024-11-05. 🎉", signaling a significant milestone in MCP's communication protocol.
With the introduction of Streamable HTTP, bidirectional stream communication can now be handled efficiently over a single standard HTTP connection. This has significantly opened up the path for deploying and operating Python-based MCP Servers directly in common cloud environments such as VPS, Google Cloud Run, and AWS Lambda.
Deploying to the cloud lets you leverage existing HTTP infrastructure (load balancers, CDNs, WAFs, etc.) while calling directly into Python's rich ecosystem (machine learning libraries, LLMs, and more) inside the server, dramatically improving the flexibility and scalability of MCP application development in Python.
The Technical Appeal of Streamable HTTP: A Deeper Dive
The Streamable HTTP transport in the MCP specification (see official specification) offers a sophisticated mechanism that goes beyond simple HTTP streaming, aiming to resolve issues of the previous HTTP+SSE method. Key features include:
- Endpoint Consolidation and Simplified Communication: Previously, separate endpoints were needed for establishing SSE connections and for sending messages (e.g., `/sse` and `/sse/messages`). Streamable HTTP consolidates these into a single endpoint (e.g., `/mcp`). All client messages (requests, notifications, responses) are sent as HTTP POST requests to this one endpoint, and the server responds either with a single JSON response or by streaming the response as an SSE stream, as needed. This greatly simplifies connection management for both clients and servers (a minimal sketch of this exchange follows below).
- Flexible Connection Persistence and Efficient Bidirectional Communication:
  - Connections are initiated as regular HTTP requests and can be "upgraded" to an SSE stream on the same connection at the server's discretion. This eliminates the need to maintain a persistent connection at all times, improving resource efficiency, especially in serverless environments. Being able to "start with a normal HTTP connection and switch to sequential data transmission as needed, eliminating the need for a constantly open dedicated connection" is a significant advantage.
  - The server can use the established SSE stream not only to send responses to client requests but also to send notifications or additional requests to the client at any time, enabling true bidirectional communication over a single logical connection.
  - Clients can also establish an SSE stream via an HTTP GET request to listen for spontaneous messages from the server (e.g., resource change notifications).
- Session Management and Future Robustness Enhancements (Resumability & Cancellability):
  - The MCP specification defines a framework for stateful session management using the `Mcp-Session-Id` HTTP header, allowing servers to maintain client-specific state and context across multiple requests or connections.
  - Advanced features for more robust communication, such as stream resumption via the standard SSE `Last-Event-ID` header (Resumability) and explicit operation cancellation by the client (Cancellability), are also part of the MCP specification. These are being implemented progressively in the SDKs and should enable recovery from network interruptions and termination of unnecessary long-running processes, especially on unstable networks.
  - As of MCP Python SDK v1.8.0, these advanced session management features are not yet fully implemented and serve primarily as a foundation for future extensions. Basic Streamable HTTP send/receive operations, however, work as specified, achieving feature parity with the previous SSE transport.
- Consideration for Backward Compatibility: Guidelines are provided for maintaining compatibility with the older HTTP+SSE transport (protocol version 2024-11-05), allowing a gradual transition from existing systems. For instance, Cloudflare's implementation shows how a server can support both old and new clients by offering the legacy SSE path and the new Streamable HTTP path concurrently.
These characteristics allow Streamable HTTP to transcend simple unidirectional streaming, offering an efficient, flexible, and prospectively more robust bidirectional communication infrastructure over standard HTTP protocols.
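To make the single-endpoint pattern concrete, here is a minimal sketch of what a raw Streamable HTTP exchange looks like, written with the `httpx` library rather than the MCP SDK. The URL, client name, and version strings are illustrative; the JSON-RPC `initialize` shape, the `Accept` header, and the `Mcp-Session-Id` header follow the MCP specification as described above.

```python
# Minimal sketch (not the SDK's internals): every client message is an HTTP
# POST to one endpoint, and the Accept header advertises that the client can
# consume either a plain JSON body or an SSE stream.
import httpx

MCP_URL = "http://localhost:8000/mcp"  # illustrative local server URL

initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "sketch-client", "version": "0.0.1"},
    },
}

with httpx.Client() as client:
    resp = client.post(
        MCP_URL,
        json=initialize_request,
        headers={"Accept": "application/json, text/event-stream"},
    )
    # The server may answer with application/json or upgrade to text/event-stream;
    # a real client must be prepared to parse either. If the server issues a
    # session, its ID arrives in the Mcp-Session-Id response header.
    print(resp.headers.get("content-type"))
    print(resp.headers.get("mcp-session-id"))
```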
A Cloudflare blog post, "Bringing streamable HTTP transport and Python language support to MCP servers" (April 30, 2025), also emphasizes the simplicity of the new transport. While the previous SSE-based transport required managing separate endpoints for sending and receiving messages, Streamable HTTP consolidates this into a single endpoint. The article aptly likens the old arrangement to "having a conversation with two phones, one for listening and one for speaking," highlighting how much developer burden Streamable HTTP removes.
1. Evolution of the Python SDK: Expanding Possibilities with Streamable HTTP Support
With the MCP Python SDK now supporting Streamable HTTP, how do the options and possibilities for Python-based MCP server development expand? Let's compare it with the TypeScript SDK, which already supported Streamable HTTP, to see Python's unique advantages.
| Aspect | TypeScript SDK (Node.js-based) | Python SDK (v1.8.0 onwards) |
|---|---|---|
| Streamable HTTP Support | Already supported | Officially supported in v1.8.0 |
| LLM & ML Libraries | Relies on the JavaScript ecosystem | Full utilization of Python's rich ecosystem, including PyTorch, TensorFlow, Hugging Face Transformers, etc. |
| Serverless Deployment | Proven in Cloud Functions, AWS Lambda (Node.js runtime), etc. | Enables similarly easy deployment in Cloud Run, AWS Lambda (Python runtime), etc. |
| Ecosystem Maturity | MCP support progressed relatively early, with ample related tools and samples. | Rapidly catching up by leveraging the strengths of the Python community; this template is part of that effort. |
What becomes clear from this comparison is that by supporting Streamable HTTP, the Python SDK is catching up to the TypeScript SDK in terms of features, while also allowing developers to fully leverage Python's strengths in LLM inference and data science processing. This makes tasks like the following easier:
- Directly executing advanced natural language processing or machine learning models within the MCP server and providing the results to clients (like AI agents) in real-time.
- Seamlessly integrating existing Python-based machine learning workflows or data pipelines with MCP servers.
In other words, Python's Streamable HTTP support isn't just a new communication method; it opens the door to bringing the full power of Python's ecosystem into the MCP world.
2. A Minimalist Template to Bridge the "Official Samples Gap"
https://github.com/akitana-airtanker/mcp-python-streamable-e2e-test-template
| Goal | Implementation |
|---|---|
| Get it running quickly | Includes `mcp-server-demo` / `mcp-client-demo` |
| Eliminate environment discrepancies | `Dockerfile` + VS Code Dev Container |
| Start with a test-first approach | E2E tests with `pytest-asyncio` included from the start |
| Maintain code style & quality | Automated linting/formatting with `pre-commit` + `Ruff` |
| Speed up installation | Faster venv and pip with `uv` |
Clone the template, run `uv venv && uv pip install -e ".[dev,test]"`, then execute `pytest`, all within about three minutes, and you can confirm that Streamable HTTP round-trips as expected.
3. Quick Start and Setup Details
Let's dive into the setup procedure to make the most of this template and the underlying mechanisms.
3.1 Basic Startup Procedure
Here's the basic flow from cloning the repository to starting the server and connecting from a client.
```bash
# 1. Clone the repository
git clone https://github.com/akitana-airtanker/mcp-python-streamable-e2e-test-template.git my-mcp
cd my-mcp

# 2. Create a virtual environment with uv and install dependencies
uv venv                          # A virtual environment is created in .venv
uv pip install -e ".[dev,test]"  # Install development and test dependencies as well

# 3. Activate the virtual environment
source .venv/bin/activate        # Linux / macOS
# .venv\Scripts\activate         # Windows (Command Prompt)
# .\.venv\Scripts\Activate.ps1   # Windows (PowerShell)

# 4. Start the MCP server
mcp-server-demo
# By default, it listens for requests at http://0.0.0.0:8000/mcp
```
Open another terminal, activate the virtual environment similarly, and then run the client.
```bash
# (After activating the virtual environment in another terminal)

# 5. Run the MCP client
mcp-client-demo
# If "Result of add(10, 5): 15" is displayed, it's successful.

# 6. Run the E2E tests
pytest
# Confirm that all tests pass (green OK).
```
3.2 Key Setup Points
`uv` for Fast Environment Setup: Python Packaging's Next-Gen Ace

This template fully adopts `uv`, a new package manager for Python. It's no exaggeration to say `uv` is one of the most closely watched tools in the Python world right now.
- What is `uv`?:
  - Developed by Astral, also known for the high-performance linter `Ruff`, `uv` is a Rust-based, high-speed Python packaging tool.
  - It aims not only to replace `pip` and `venv` but also to cover dependency locking in the style of `pip-tools` and, eventually, project management functionality similar to `Poetry` or `PDM` (see Bite code!'s article and the Hacker News discussion).
  - Its overwhelming speed and ambitious scope suggest it could significantly influence Python's standard toolchain (though it doesn't yet replace every feature of `Poetry` or `PDM`).
- Installation: If `uv` is not installed on your system, refer to the official uv documentation. Methods such as `pipx install uv` or `cargo install uv` are available.
- Benefits in this template:
  - `uv venv`: Virtual environment creation is practically instantaneous.
  - `uv pip install`: Dependency resolution and package download/installation are dramatically faster, even for projects with complex dependencies; work that took minutes with traditional `pip` often finishes in seconds.
  - This template installs the dependencies defined in `pyproject.toml` (regular, development `[dev]`, and test `[test]` extras) with a single command: `uv pip install -e ".[dev,test]"`. This speed also significantly shortens CI/CD pipeline execution times.
Adopting `uv` reduces time costs at every stage of the development cycle, letting you focus more on core development tasks (a locking-workflow sketch follows below).
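Beyond `venv` and `install`, `uv` also covers the `pip-tools` style locking workflow mentioned above. As a sketch (commands per the uv documentation at the time of writing; exact flags may evolve with the tool), pinning and reproducing an environment looks like this:

```bash
# Resolve the dependencies declared in pyproject.toml into a pinned lock file
uv pip compile pyproject.toml -o requirements.lock

# Install exactly the pinned set (removing anything that isn't listed)
uv pip sync requirements.lock
```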
`pre-commit` and `Ruff` for Quality Maintenance: Modern Python Development Best Practices

Automatically maintaining high code quality is essential in modern software development. This template adopts the current best practice of combining `pre-commit` hooks with `Ruff`.
- `pre-commit`: A framework for automatically running predefined checks (hooks) before Git commits (official website).
  - `pre-commit` itself is installed as a development dependency by `uv pip install -e ".[dev,test]"`.
  - To start using it, run `pre-commit install` once in the repository root. This sets up the Git hooks, and the checks run automatically on every subsequent commit.
- `Ruff`: A Rust-based, ultra-fast Python linter and formatter developed by Astral (official website).
  - What's astounding is not just its speed but that it covers most of the checks and formatting previously handled by several separate tools (`Flake8`, `isort`, `pydocstyle`, `pyupgrade`, etc.) on its own. This simplifies configuration files and significantly reduces tool management costs.
  - The `Ruff` hooks are defined in `.pre-commit-config.yaml`. When you try to commit, static analysis (detecting potential bugs and deprecated practices) and formatting (unifying coding style) run automatically.
  - If issues are found, the commit is aborted and you are prompted to fix them; in many cases, `Ruff` can fix the problems automatically.
This `pre-commit` + `Ruff` combination is rapidly becoming a de facto standard in the Python community, contributing greatly to code consistency and reduced review burden in team development. An illustrative configuration is shown below.
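For reference, a typical `.pre-commit-config.yaml` wiring `Ruff` into `pre-commit` looks like the sketch below. This is illustrative rather than a copy of the template's actual file; the hook IDs come from Astral's `ruff-pre-commit` repository, and `rev` should be pinned to whatever release you adopt.

```yaml
# Illustrative pre-commit configuration using the official Ruff hooks
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4  # pin to a released tag
    hooks:
      - id: ruff          # linter (auto-fixes where possible)
        args: [--fix]
      - id: ruff-format   # formatter
```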
VS Code Dev Container for Environment Reproducibility: A Smoother Development Experience
If you use VS Code, the Dev Container feature can further reduce the effort of setting up your development environment and provide a smoother experience.
- What is a Dev Container?: A mechanism to build a fully isolated development environment within a Docker container and use it directly from VS Code.
- How to use:
- Open this template project in VS Code.
- If the Dev Containers extension is installed, a notification "Reopen in Container" will appear in the bottom right. Click it.
- Benefits:
  - The `.devcontainer/devcontainer.json` file defines everything: the Docker image to use, the VS Code extensions to install, and the commands to run after container creation (such as `uv pip install` and `pre-commit install` via `postCreateCommand`); an illustrative file is shown below.
  - There's no need to install Python, `uv`, or any other tools locally.
  - All team members develop with exactly the same tools and versions, eliminating "it works on my machine" problems.
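To give a feel for what such a configuration contains, here is an illustrative `devcontainer.json`. The field names follow the Dev Container specification, but the template's actual file may differ in detail:

```jsonc
// Illustrative .devcontainer/devcontainer.json
{
  "name": "mcp-python-template",
  // Build the development environment from the repository's Dockerfile
  "build": { "dockerfile": "../Dockerfile", "context": ".." },
  "customizations": {
    "vscode": {
      // Editor tooling every team member gets automatically
      "extensions": ["ms-python.python", "charliermarsh.ruff"]
    }
  },
  // Runs once after the container is created
  "postCreateCommand": "uv venv && uv pip install -e '.[dev,test]' && pre-commit install"
}
```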
4. Peeking Inside the Template (Key File Explanations)
This template provides the backbone for rapidly developing Streamable HTTP-enabled MCP servers. Let's look at the key files and their roles.
```text
.
├── Dockerfile                # Container image definition (python:3.13-slim base, non-root execution)
├── .devcontainer/
│   └── devcontainer.json     # VS Code Dev Container settings
├── src/
│   └── mcp_python_streamable_e2e_test_template/
│       ├── __init__.py
│       ├── client.py         # Sample MCP client implementation
│       ├── config.py         # Configuration loading from environment variables
│       └── server.py         # FastMCP server core and tool/resource definitions
├── tests/
│   ├── conftest.py           # pytest fixtures (e.g., for starting the test server)
│   └── test_client.py        # E2E test cases (tool calls via Streamable HTTP)
├── .pre-commit-config.yaml   # pre-commit hook definitions (Ruff, etc.)
├── pyproject.toml            # Project definition, dependencies (for uv)
└── README.md                 # Detailed project description
```
`src/server.py` - The Heart of the MCP Server

Defines the MCP server using `FastMCP` and registers sample tools and resources.
```python
import logging
import os

from mcp.server.fastmcp import FastMCP

from .config import Config

# Load configuration (LOG_LEVEL, MCP_SERVER_PORT, etc. from environment variables)
cfg = Config()
# ... (Port setting logic) ...

# Create FastMCP instance (server name "Demo" is used in logs, etc.)
server: FastMCP = FastMCP("Demo")
logging.basicConfig(level=cfg.log_level, format="...")
logger = logging.getLogger(server.name)

# Define 'add' tool
@server.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    logger.debug(f"Tool 'add' called with a={a}, b={b}")
    result = a + b
    logger.debug(f"Tool 'add' result: {result}")
    return result

# Define 'greeting' resource
@server.resource("greeting://{name}")
def get_greeting(name: str) -> str:
    """Get a personalized greeting."""
    logger.debug(f"Resource 'greeting://{name}' accessed")
    greeting = f"Hello, {name}!"
    # ...
    return greeting

def main() -> None:
    """Entry point for starting the server"""
    transport = os.getenv("MCP_TRANSPORT", "streamable-http")  # Defaults to Streamable HTTP
    logger.info(f"Starting server '{server.name}' with transport '{transport}'...")
    server.run(transport=transport)  # Run the server!

if __name__ == "__main__":
    main()
```
- `FastMCP("Demo")`: Creates a lightweight MCP server instance.
- `@server.tool()`: Functions decorated with this are exposed as MCP tools. Type hints define the argument and return-value schemas.
- `@server.resource("greeting://{name}")`: Defines a resource matching a URI pattern; the `{name}` part is passed to the function as an argument.
- `server.run(transport="streamable-http")`: Starts the server with Streamable HTTP (the transport can also be switched via an environment variable, as shown below).
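Because `main()` reads the `MCP_TRANSPORT` environment variable, the same entry point can be started with a different transport without touching the code. The values below are the transports accepted by `FastMCP.run()` in the SDK:

```bash
MCP_TRANSPORT=streamable-http mcp-server-demo  # default: single /mcp endpoint
MCP_TRANSPORT=sse mcp-server-demo              # legacy HTTP+SSE transport
MCP_TRANSPORT=stdio mcp-server-demo            # stdio, for local host processes
```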
`src/client.py` - Client to Interact with the Server
A sample client that connects to the server using Streamable HTTP and calls a tool.
```python
import asyncio
import logging
from typing import Any  # needed for the 'arguments' annotation below

from mcp import ClientSession, types
from mcp.client.streamable_http import streamablehttp_client  # Streamable HTTP client

logger = logging.getLogger(__name__)

async def run_client(quiet: bool, verbose: bool) -> None:
    server_url: str = "http://localhost:8000/mcp"  # Target URL
    try:
        async with streamablehttp_client(server_url) as (
            client_read_stream, client_write_stream, client_get_session_id_callback
        ):
            async with ClientSession(client_read_stream, client_write_stream) as current_session:
                session: ClientSession = current_session
                await session.initialize()  # Initialize session
                logger.info(f"Connected. Session ID: {client_get_session_id_callback()}")

                tool_name: str = "add"
                arguments: dict[str, Any] = {"a": 10, "b": 5}
                response_object: types.CallToolResult = await session.call_tool(
                    tool_name, arguments  # Call 'add' tool
                )
                # ... (Process response) ...
    except Exception as exc:
        logger.error(f"Client error: {exc}")  # ... (Error handling) ...

# ... (main function and argparse) ...
```
- `streamablehttp_client(server_url)`: An async context manager that connects to the given URL using Streamable HTTP. On success, it yields read/write streams and a session-ID callback.
- `ClientSession(...)`: Manages the MCP session over those streams.
- `session.call_tool(...)`: Executes the server-side tool with the given name and arguments.
`Dockerfile` - Reproducible Execution Environment
This file defines how to build a Docker image for running the MCP server application.
```dockerfile
# Base image (Python 3.13 slim version)
FROM python:3.13-slim

# ... (System utilities, uv installation) ...

# Create non-root user (for security)
ARG APP_USER=appuser
RUN groupadd ${APP_USER} && useradd -ms /bin/bash -g ${APP_USER} ${APP_USER}

# Working directory
# (Note: mid-line '#' is not a comment in a Dockerfile, so comments go on their own lines)
WORKDIR /app

# Copy dependency files & install with uv
COPY pyproject.toml uv.lock* ./
COPY src/ ./src
COPY README.md ./README.md
RUN uv venv .venv && \
    . .venv/bin/activate && \
    uv pip install --no-cache-dir -e ".[test,dev]"

# Copy remaining code and change ownership
COPY . .
RUN chown -R ${APP_USER}:${APP_USER} /app

# Switch to non-root user
USER ${APP_USER}

# Add venv's bin directory to PATH
ENV VIRTUAL_ENV="/app/.venv"
ENV PATH="/app/.venv/bin:$PATH"

# Port the server listens on
EXPOSE 8000

# Default command when container starts
CMD ["mcp-server-demo"]
```
- `FROM python:3.13-slim`: Uses a lightweight official Python image as the base.
- `uv venv` & `uv pip install`: Sets up the environment quickly with `uv`, even inside the container.
- Non-root user execution: Runs the process as a non-root user (`USER ${APP_USER}`) for better security.
- `CMD ["mcp-server-demo"]`: The `mcp-server-demo` script (the `main` function in `src/server.py`) is executed when the container starts. Build and run commands follow below.
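Building and running the container locally is the standard Docker workflow; nothing MCP-specific is required:

```bash
# Build the image from the repository root
docker build -t mcp-server:latest .

# Run it, publishing the EXPOSEd port; the server then answers at
# http://localhost:8000/mcp
docker run --rm -p 8000:8000 mcp-server:latest
```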
`tests/` Directory - Quality Assurance with E2E Tests

E2E tests using `pytest` ensure the correctness of Streamable HTTP communication.

`tests/conftest.py` (Test Configuration and Fixtures):
```python
import os
import subprocess
import time

import pytest

@pytest.fixture(scope="session")
def mcp_server_url() -> str:
    # Start server on a different port (8001) for tests
    port = "8001"
    env = os.environ.copy()
    env["MCP_SERVER_PORT"] = port  # Specify port via environment variable
    env["LOG_LEVEL"] = "WARNING"   # Suppress logs during tests

    # Start server as a background process
    process = subprocess.Popen(
        ["mcp-server-demo"],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    time.sleep(1)  # Wait for server to start (more robust checks are preferable)

    yield f"http://localhost:{port}/mcp"  # Provide server URL to test cases

    process.terminate()  # Stop server after tests
    process.wait()
```
- The `mcp_server_url` fixture starts `mcp-server-demo` on port 8001 at the beginning of the test session. This works because the logic in `src/server.py` prioritizes the `MCP_SERVER_PORT` environment variable when setting `FASTMCP_PORT`, preventing conflicts with the development server (default port 8000). A more robust startup check than the fixture's `time.sleep(1)` is sketched below.
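As the comment in the fixture notes, `time.sleep(1)` is a crude way to wait for startup. A more robust alternative is to poll the port until the server accepts connections; the helper below is a sketch and is not part of the template:

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 10.0) -> None:
    """Block until (host, port) accepts TCP connections or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return  # server is up
        except OSError:
            time.sleep(0.1)  # not listening yet; retry shortly
    raise RuntimeError(f"Server on {host}:{port} did not start within {timeout}s")

# In the fixture, replace time.sleep(1) with:
# wait_for_port("localhost", 8001)
```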
`tests/test_client.py` (Test Cases):
```python
from typing import Any  # needed for the 'arguments' annotation below

import pytest

from mcp import ClientSession, types
from mcp.client.streamable_http import streamablehttp_client

@pytest.mark.asyncio
async def test_add_tool_success(mcp_server_url: str) -> None:  # Get URL from fixture
    tool_name: str = "add"
    arguments: dict[str, Any] = {"a": 10, "b": 5}
    expected_result: int = 15

    async with streamablehttp_client(mcp_server_url) as (  # Connect to test server
        read_stream, write_stream, get_session_id_callback
    ):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            response: types.CallToolResult = await session.call_tool(tool_name, arguments)
            assert not response.isError
            # ... (Detailed result validation) ...
            content_list: list[types.Content] = response.content
            first_content: types.TextContent = content_list[0]
            assert int(first_content.text) == expected_result
```
- Test functions take `mcp_server_url` as an argument and actually connect to that URL using `streamablehttp_client`.
- Each test calls the `add` tool and verifies with `assert` that the returned result matches the expected value.
This allows automatic testing of the entire flow: "When I call a tool via Streamable HTTP, I get the expected result back correctly."
5. Advanced Use Cases: Deployment to the Cloud
MCP servers created with this template are containerized, making them easy to deploy to various cloud platforms. Here are some representative examples.
5.1 Easy Serverless Deployment with Google Cloud Run
Google Cloud Run is a service that allows you to deploy applications to a scalable serverless environment simply by uploading a container image.
- Deployment Overview:
  1. Build the Docker image locally: `docker build -t gcr.io/YOUR_PROJECT_ID/mcp-server:latest .`
  2. Push the image to Google Container Registry (GCR) or Artifact Registry: `docker push gcr.io/YOUR_PROJECT_ID/mcp-server:latest`
  3. Deploy with the `gcloud run deploy` command:

```bash
# --allow-unauthenticated: configure authentication as needed
# --port: the port EXPOSEd in the Dockerfile
gcloud run deploy mcp-server \
  --image gcr.io/YOUR_PROJECT_ID/mcp-server:latest \
  --platform managed \
  --region YOUR_REGION \
  --allow-unauthenticated \
  --port 8000
```

(See also the official documentation: Build and deploy a Python service)
- Compatibility with Streamable HTTP:
  - Cloud Run supports HTTP/2, which is technically well-suited to long-lived connections and bidirectional streaming like Streamable HTTP.
  - Auto-scaling based on request volume (including scaling to zero) makes cost-effective operation possible.
- Considerations:
  - Cold start time: can be mitigated by setting minimum instances to 1 or more (see the command sketch below).
  - Timeout settings: Cloud Run's request timeout (configurable up to 60 minutes) must be set appropriately so Streamable HTTP sessions aren't disconnected unintentionally.
  - The Glama project's example of deploying an authenticated SSE server to Cloud Run demonstrates secure MCP server exposure by combining Cloud Run's IAM authentication with a local proxy.
  - Cloud Run has officially supported HTTP streaming (including SSE) since October 2020, so an MCP server's Streamable HTTP mode should work there. However, because the request timeout can still cut off very long SSE connections, MCP's recommendation (closing the stream once all responses for a long-running operation have been sent) and the upcoming Resumability feature become important.
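For instance, raising the request timeout and keeping a warm instance can both be done after deployment. The values below are illustrative (flags per the `gcloud` documentation; a minimum instance adds cost):

```bash
# Raise the request timeout (seconds, up to 60 minutes) and keep one warm
# instance to reduce cold starts
gcloud run services update mcp-server \
  --region YOUR_REGION \
  --timeout 3600 \
  --min-instances 1
```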
5.2 Flexible Serverless Experience with AWS Lambda and Function URLs
AWS Lambda also supports container images, and Function URLs allow direct HTTP endpoint exposure without an API Gateway.
- Deployment Overview:
  1. Build the Docker image locally.
  2. Push the image to Amazon Elastic Container Registry (ECR) (see official documentation: Creating Lambda functions from container images):

```bash
# Create ECR repository (first time only)
aws ecr create-repository --repository-name mcp-server --image-scanning-configuration scanOnPush=true

# Docker login
aws ecr get-login-password --region YOUR_REGION | docker login --username AWS --password-stdin YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com

# Tag and push image
docker tag mcp-server:latest YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/mcp-server:latest
docker push YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/mcp-server:latest
```

3. Create a Lambda function specifying the ECR image and enable a Function URL:

```bash
# Adjust --timeout (up to 900 seconds) and --memory-size as needed
aws lambda create-function \
  --function-name mcp-server-lambda \
  --package-type Image \
  --code ImageUri=YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/mcp-server:latest \
  --role YOUR_LAMBDA_EXECUTION_ROLE_ARN \
  --timeout 300 \
  --memory-size 512

aws lambda create-function-url-config \
  --function-name mcp-server-lambda \
  --auth-type NONE  # Or AWS_IAM
```
- Compatibility with Streamable HTTP:
  - Function URLs support HTTP/1.1 and HTTP/2.
  - Lambda's execution time limit (max 15 minutes) and payload size limit (6 MB each way for request/response) must be considered. Even with Streamable HTTP, long-running streams or large transfers may require design workarounds to stay within Lambda's constraints.
- Considerations:
  - Image size: Lambda's container image size limit is 10 GB. This template's Dockerfile uses a slim image, so it usually fits, but be mindful when adding large ML libraries.
  - Cold starts: can be mitigated with Provisioned Concurrency, at additional cost.
  - Running heavy models: increasing `--memory-size` allows larger models like PyTorch or TensorFlow, but balance this against cost (see the sketch below).
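Adjusting these knobs after creation is a one-liner; the values here are illustrative (more memory also means proportionally more CPU, and more cost):

```bash
aws lambda update-function-configuration \
  --function-name mcp-server-lambda \
  --memory-size 2048 \
  --timeout 900
```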
5.3 Full-Fledged Operation and Scaling with Kubernetes (K8s)
For more advanced control and scalability, deployment to Kubernetes (K8s) is an option.
- Deployment Overview:
  1. Build the Docker image and push it to a container registry (Docker Hub, GCR, ECR, etc.).
  2. Create Deployment and Service manifest files:

```yaml
# deployment.yaml (excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server-deployment
spec:
  replicas: 2  # Initial replica count
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
        - name: mcp-server-container
          image: YOUR_REGISTRY/mcp-server:latest  # Your pushed image
          ports:
            - containerPort: 8000
---
# service.yaml (excerpt)
apiVersion: v1
kind: Service
metadata:
  name: mcp-server-service
spec:
  selector:
    app: mcp-server
  ports:
    - protocol: TCP
      port: 80          # Port exposed by the Service
      targetPort: 8000  # Container's port
  type: LoadBalancer    # Or ClusterIP/NodePort + Ingress
```

3. Deploy with `kubectl apply -f deployment.yaml` and `kubectl apply -f service.yaml`.
- Scaling and Availability:
  - Horizontal Pod Autoscaler (HPA): automatically scales the number of Pods based on CPU utilization or custom metrics (see the official walkthrough):

```yaml
# hpa.yaml (example based on CPU utilization)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

  - Ingress: exposes the Service externally, providing routing, SSL termination, HTTP/2 support, and more. Many Ingress Controllers, such as NGINX Ingress Controller and Traefik, support HTTP/2, which can improve Streamable HTTP performance. Setting the backend protocol in the Ingress configuration to `HTTP2` (or an appropriate value such as `GRPC`, depending on what your controller supports) can help maintain HTTP/2 along the path to the Pod (check the specific annotations in your Ingress Controller's documentation). An illustrative Ingress manifest follows this list.
- Considerations:
  - K8s has a steep learning curve but offers great flexibility and robustness once set up.
  - Managed Kubernetes services (GKE, EKS, AKS, etc.) can reduce the operational burden of the control plane.
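As a concrete illustration of the Ingress settings mentioned above, here is a sketch of a manifest for the Service defined earlier. The host name is a placeholder and the annotations follow NGINX Ingress Controller conventions; other controllers use different annotation names, so treat this as a starting point rather than a drop-in configuration.

```yaml
# Illustrative only: routes /mcp traffic to mcp-server-service and keeps
# long-lived SSE streams alive (NGINX Ingress Controller annotations)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mcp-server-ingress
  annotations:
    # Disable response buffering so streamed SSE events reach clients promptly
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
    # Allow long-lived streams (seconds) instead of the default read timeout
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
spec:
  rules:
    - host: mcp.example.com  # placeholder host
      http:
        paths:
          - path: /mcp
            pathType: Prefix
            backend:
              service:
                name: mcp-server-service
                port:
                  number: 80
```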
Utilizing these cloud platforms allows you to smoothly progress through the steps of "try it out → deploy to the cloud → integrate LLM/ML" starting from this template.
6. Community Trends and the Future of MCP: Active Discussion and Evolution
Since Anthropic's initial announcement (see Hacker News discussion: Model Context Protocol, around December 2024), MCP and related technologies have been actively discussed and continue to evolve within the developer community.
- The Road to Streamable HTTP: Initially, MCP was a stateful protocol assuming long-lived connections. However, the difficulty of deploying in serverless environments led to a demand for more flexible communication methods. In GitHub Discussions, particularly "State, and long-lived vs. short-lived connections," developers from companies like Shopify and Automattic (WordPress.com) who were trying to use MCP discussed specific challenges (e.g., difficulties implementing SSE in PHP, serverless scaling issues) and proposed various solutions like session tokens, stateless/stateful protocol variants, and WebSocket usage. The current Streamable HTTP transport (HTTP POST + optional SSE) specification was adopted as a result of this active feedback loop, demonstrating MCP's evolution with the community.
- Python MCP Servers on Cloudflare Workers: The aforementioned Cloudflare article also shows how to build and deploy MCP servers in Python on Cloudflare Workers, suggesting new possibilities for MCP in edge computing environments. Notably, the ability to easily expose existing FastAPI applications as MCP tools via the `FastAPI-MCP` library is welcome news for many Python developers.
- Continuous Evolution of MCP Specifications and Roadmap:
- Cloudflare has indicated its policy to actively incorporate advanced features defined in the MCP specification—such as Resumability, Cancellability, and Session management—into its Agents SDK.
- MCP's official roadmap prioritizes "enhancement of authentication/authorization," "service registry and discovery features," "further improvements for streaming and serverless support (including Resumability and stateless operation support)," and "enrichment of multi-language SDKs and testing." This suggests that the protocol itself is expected to continue evolving towards greater robustness and scalability. The roadmap also lists "service discovery and stateless operation support for serverless" as important items, indicating improvements conscious of environments like Cloud Run and AWS Lambda.
- Expansion of the MCP Ecosystem and Industry Adoption Trends:
- Entry of Major Players: A noteworthy development is OpenAI's official announcement in March 2025 of its adoption of MCP for its products. Starting with the Agents SDK, plans are in place to support MCP in ChatGPT and its APIs. CEO Sam Altman commented, "MCP has been well-received, and we are excited to add support to all our products." This fact is an extremely significant driving force for MCP to develop into an industry-standard protocol.
- Support from Cloud Vendors:
- Microsoft: Involved with MCP from an early stage, developing the official C# SDK and implementing MCP integration in Copilot Studio and the Autogen framework. They also released an MCP server extension for Playwright (web test automation tool) and are advancing support in Azure OpenAI services and GitHub areas.
- Google Cloud (including DeepMind): Announced the "MCP Toolbox for Databases" (formerly Gen AI Toolbox for Databases), supporting MCP as a standard for linking databases and AI agents. DeepMind CEO Demis Hassabis also confirmed the incorporation of MCP support into the next-generation Gemini SDK. Furthermore, Google is promoting scenarios where MCP agents run on Cloud Run, supporting it as a managed deployment target for Google's Agent2Agent (A2A) Protocol.
- AWS: In April 2025, AWS open-sourced a collection of MCP servers for its AI code assistance services (presumed to be related to CodeWhisperer and CodeCatalyst). At KubeCon EU 2025, AWS also mentioned MCP support in Bedrock agents, showing its stance as a major cloud provider accepting MCP as a "bridging standard for AI and tools."
- Cloudflare: Promoting MCP server hosting on its Workers edge platform, announcing Python and Streamable HTTP support updates in April 2025. They provide implementations supporting both new and old transports through their Agents SDK.
- Developer Tools and Startup Movements: Developer tool companies like Zed, Replit, Codeium, and Sourcegraph are moving to integrate MCP into their platforms. Adoption by cloud-native startups is also active, such as Kubiya, which provides a Kubernetes-based platform, and Solo.io, which released "MCP Gateway" by extending its API gateway OSS "Kgateway."
- Abundant Server Implementations and Community Enthusiasm: The MCP official website's example servers page (modelcontextprotocol.io/examples) lists a wide variety of official reference servers, including file system operations, DB integration, development tool integration, browser automation, communication tool integration, and AI-specific tools, demonstrating the breadth of MCP's applicability. Furthermore, leading companies like Axiom (log analysis), Browserbase (cloud browser automation), Cloudflare (developer platform), E2B (code execution sandbox), Neon (serverless Postgres), Prisma (DB management), Qdrant (vector search), Stripe (payments), Tinybird (real-time data platform), and Weaviate (Agentic RAG) are providing official MCP integrations for their platforms, accelerating ecosystem growth. The community is also actively developing and releasing MCP servers for popular tools and services like Docker container management, Kubernetes cluster operations, Linear issue tracking, Snowflake data warehouse integration, Spotify music control, and Todoist task management, showing that MCP's utility is expanding daily. The Python SDK's GitHub repository has garnered over 12,000 stars and 1,300 forks (as of May 2025), and active information exchange occurs in MCP-related Reddit communities. Reports of "hundreds of tool vendors advancing MCP integration" also indicate its high level of attention.
- Alternative Implementations and Tech Demos: Movements like Blaxel developing and open-sourcing a WebSocket-based MCP implementation (a fork of Supergateway) to solve SSE challenges are also seen. Additionally, tech demo videos like "MCP - Can Lambda do it? - Streamable HTTP Model Context Protocol" on the YouTube channel "the_context()" show growing interest from the tech community in combining Streamable HTTP with serverless architectures.
- Cloud Run Deployment Case Study: Mark W Kiehl's Medium article "Deploy Your Custom MCP AI Tool to Cloud Run" (May 1, 2025) walks through deploying a custom MCP tool (a LangChain tool wrapped with the `python-a2a` library) to Cloud Run, a useful reference for the practicalities of running Python MCP servers there (though, since it predates SDK v1.8.0, it's unclear whether it uses Streamable HTTP directly).
- MCP's Roadmap and Challenges:
  - As noted above, the official roadmap prioritizes authentication/authorization, a service registry and discovery features, further streaming and serverless improvements (including Resumability and stateless operation), and richer multi-language SDKs and testing.
  - On the other hand, analyses by Gartner and others point out that "currently, there are immature aspects such as the security model, and it is mainly used for desktop application integration." The community and companies are working together to resolve these practical operational challenges.
Dubbed the "USB-C port of the AI industry," MCP is seeing unusually rapid adoption not only by AI pioneers like Anthropic and OpenAI but also by major cloud vendors. This powerful momentum strongly suggests MCP's great potential to grow into the de facto standard connecting AI agents and external services, and its future developments are worth watching closely.
7. Summary and Next Steps
In this article, we've explained how the official support for the Streamable HTTP transport in MCP Python SDK v1.8.0 has made it easier and more flexible to operate Python-based MCP servers in cloud environments. We also introduced a minimalist, E2E-tested template to accelerate its development, touched upon key technological elements, cloud deployment examples, and active community trends.
We hope this template serves as a helpful aid in your MCP server development.
We encourage you to clone https://github.com/akitana-airtanker/mcp-python-streamable-e2e-test-template, first try out Streamable HTTP locally, and then expand its possibilities to the cloud.
8. References
- MCP (Model Context Protocol) General
- MCP Python SDK v1.8.0 Release Notes: https://github.com/modelcontextprotocol/python-sdk/releases/tag/v1.8.0
- FastMCP (server implementation within the SDK): https://github.com/modelcontextprotocol/python-sdk/tree/main/mcp/server/fastmcp
- Model Context Protocol Official Site: https://modelcontextprotocol.io/introduction
- MCP Server Examples: https://modelcontextprotocol.io/examples
- Awesome MCP Servers (Community-curated list): https://github.com/punkpeye/awesome-mcp-servers
- Key Technologies Used in This Template
- uv (Python package installer and resolver): https://github.com/astral-sh/uv
- Ruff (Python linter / formatter): https://github.com/astral-sh/ruff
- pre-commit (Git hook management): https://pre-commit.com/
- VS Code Dev Containers: https://code.visualstudio.com/docs/devcontainers/containers
- pytest (Python testing framework): https://docs.pytest.org/en/stable/
- pytest-asyncio (asyncio support for pytest): https://pytest-asyncio.readthedocs.io/
- Related Technologies
- Docker: https://docs.docker.com/
- Python `asyncio`: https://docs.python.org/3/library/asyncio.html