First Conclusion: Python-based MCP Servers Now Run in the Cloud
Traditionally, MCP servers, especially those implemented in Python, were often perceived as being for local use. Configurations combining Server-Sent Events (SSE) and auxiliary WebSockets frequently posed challenges for deployment in cloud environments.
However, just a few days ago on May 8, 2025, v1.8.0 of the MCP Python SDK was released, officially supporting the long-awaited Streamable HTTP transport.
The release notes proudly declare, "This is the first release supporting the new Streamable HTTP transport from protocol version 2025-03-26, which supersedes the SSE transport from protocol version 2024-11-05. 🎉", signaling a significant milestone in MCP's communication protocol.
With the introduction of Streamable HTTP, bidirectional stream communication can now be handled efficiently over a single standard HTTP connection. This has significantly opened up the path for deploying and operating Python-based MCP Servers directly in common cloud environments such as VPS, Google Cloud Run, and AWS Lambda.
Deploying to the cloud lets you leverage existing HTTP infrastructure (load balancers, CDNs, WAFs, etc.) while calling directly into Python's rich ecosystem (machine learning libraries, LLMs, and more) inside the server, dramatically improving the flexibility and scalability of MCP application development in Python.
The Technical Appeal of Streamable HTTP: A Deeper Dive
The Streamable HTTP transport in the MCP specification (see official specification) offers a sophisticated mechanism that goes beyond simple HTTP streaming, aiming to resolve issues of the previous HTTP+SSE method. Key features include:
- Endpoint Consolidation and Simplified Communication: Previously, separate endpoints were needed for establishing SSE connections and for sending messages (e.g., `/sse` and `/sse/messages`). Streamable HTTP consolidates these into a single endpoint (e.g., `/mcp`). All client messages (requests, notifications, responses) are sent as HTTP POST requests to this one endpoint, and the server responds either with a single JSON response or by streaming the response as an SSE stream, as needed. This greatly simplifies connection management for both clients and servers (a minimal sketch of this exchange follows below).
- Flexible Connection Persistence and Efficient Bidirectional Communication:
  - Connections are initiated as regular HTTP requests and can be "upgraded" to an SSE stream on the same connection at the server's discretion. This eliminates the need to maintain a persistent connection at all times, improving resource efficiency, especially in serverless environments. Being able to "start with a normal HTTP connection and switch to sequential data transmission as needed, eliminating the need for a constantly open dedicated connection" is a significant advantage.
  - The server can use the established SSE stream not only to send responses to client requests but also to send notifications or additional requests to the client at any time, enabling true bidirectional communication over a single logical connection.
  - Clients can also establish an SSE stream via an HTTP GET request to listen for spontaneous messages from the server (e.g., resource change notifications).
- Session Management and Future Robustness Enhancements (Resumability & Cancellability):
  - The MCP specification defines a framework for stateful session management using the `Mcp-Session-Id` HTTP header, allowing servers to maintain client-specific state and context across multiple requests or connections.
  - Advanced features for more robust communication, such as stream resumption via the standard SSE `Last-Event-ID` header (Resumability) and explicit operation cancellation by the client (Cancellability), are also part of the MCP specification. These are being implemented progressively in the SDKs and should enable recovery from network interruptions and termination of unnecessary long-running processes, especially on unstable networks.
  - As of MCP Python SDK v1.8.0, these advanced session management features are not yet fully implemented and serve primarily as a foundation for future extensions. Basic Streamable HTTP send/receive operations, however, work as specified, achieving feature parity with the previous SSE transport.
- Consideration for Backward Compatibility: Guidelines are provided for maintaining compatibility with the older HTTP+SSE transport (protocol version 2024-11-05), allowing a gradual transition from existing systems. For instance, Cloudflare's implementation shows how a server can support both old and new clients by offering the legacy SSE path and the new Streamable HTTP path concurrently.
These characteristics allow Streamable HTTP to transcend simple unidirectional streaming, offering an efficient, flexible, and prospectively more robust bidirectional communication infrastructure over standard HTTP protocols.
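To make the single-endpoint pattern concrete, here is a minimal sketch of what a raw Streamable HTTP exchange looks like, written with the `httpx` library rather than the MCP SDK. The URL, client name, and version strings are illustrative; the JSON-RPC `initialize` shape, the `Accept` header, and the `Mcp-Session-Id` header follow the MCP specification as described above.

```python
# Minimal sketch (not the SDK's internals): every client message is an HTTP
# POST to one endpoint, and the Accept header advertises that the client can
# consume either a plain JSON body or an SSE stream.
import httpx

MCP_URL = "http://localhost:8000/mcp"  # illustrative local server URL

initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "sketch-client", "version": "0.0.1"},
    },
}

with httpx.Client() as client:
    resp = client.post(
        MCP_URL,
        json=initialize_request,
        headers={"Accept": "application/json, text/event-stream"},
    )
    # The server may answer with application/json or upgrade to text/event-stream;
    # a real client must be prepared to parse either. If the server issues a
    # session, its ID arrives in the Mcp-Session-Id response header.
    print(resp.headers.get("content-type"))
    print(resp.headers.get("mcp-session-id"))
```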
A Cloudflare blog post, "Bringing streamable HTTP transport and Python language support to MCP servers" (April 30, 2025), also emphasizes the simplicity of the new transport. While the previous SSE-based transport required managing separate endpoints for sending and receiving messages, Streamable HTTP consolidates this into a single endpoint. The article aptly likens the old arrangement to "having a conversation with two phones, one for listening and one for speaking," highlighting how much developer burden Streamable HTTP removes.
1. Evolution of the Python SDK: Expanding Possibilities with Streamable HTTP Support
With the MCP Python SDK now supporting Streamable HTTP, how do the options and possibilities for Python-based MCP server development expand? Let's compare it with the TypeScript SDK, which already supported Streamable HTTP, to see Python's unique advantages.
| Aspect | TypeScript SDK (Node.js-based) | Python SDK (v1.8.0 onwards) |
|---|---|---|
| Streamable HTTP Support | Already supported | Officially supported in v1.8.0 |
| LLM & ML Libraries | Relies on the JavaScript ecosystem | Full utilization of Python's rich ecosystem, including PyTorch, TensorFlow, Hugging Face Transformers, etc. |
| Serverless Deployment | Proven in Cloud Functions, AWS Lambda (Node.js runtime), etc. | Enables similarly easy deployment in Cloud Run, AWS Lambda (Python runtime), etc. |
| Ecosystem Maturity | MCP support progressed relatively early, with ample related tools and samples. | Rapidly catching up by leveraging the strengths of the Python community; this template is part of that effort. |
What becomes clear from this comparison is that by supporting Streamable HTTP, the Python SDK is catching up to the TypeScript SDK in terms of features, while also allowing developers to fully leverage Python's strengths in LLM inference and data science processing. This makes tasks like the following easier:
- Directly executing advanced natural language processing or machine learning models within the MCP server and providing the results to clients (like AI agents) in real-time.
- Seamlessly integrating existing Python-based machine learning workflows or data pipelines with MCP servers.
In other words, Python's Streamable HTTP support isn't just a new communication method; it opens the door to bringing the full power of Python's ecosystem into the MCP world.
2. A Minimalist Template to Bridge the "Official Samples Gap"
https://github.com/akitana-airtanker/mcp-python-streamable-e2e-test-template
| Goal | Implementation |
|---|---|
| Get it running quickly | Includes `mcp-server-demo` / `mcp-client-demo` |
| Eliminate environment discrepancies | `Dockerfile` + VS Code Dev Container |
| Start with a test-first approach | E2E tests with `pytest-asyncio` included from the start |
| Maintain code style & quality | Automated linting/formatting with `pre-commit` + `Ruff` |
| Speed up installation | Faster venv and pip with `uv` |
Clone the template, run `uv venv && uv pip install -e ".[dev,test]"`, then execute `pytest`, all within about three minutes, and you can confirm that Streamable HTTP round-trips as expected.
3. Quick Start and Setup Details
Let's dive into the setup procedure to make the most of this template and the underlying mechanisms.
3.1 Basic Startup Procedure
Here's the basic flow from cloning the repository to starting the server and connecting from a client.
```bash
# 1. Clone the repository
git clone https://github.com/akitana-airtanker/mcp-python-streamable-e2e-test-template.git my-mcp
cd my-mcp

# 2. Create a virtual environment with uv and install dependencies
uv venv                          # A virtual environment is created in .venv
uv pip install -e ".[dev,test]"  # Install development and test dependencies as well

# 3. Activate the virtual environment
source .venv/bin/activate        # Linux / macOS
# .venv\Scripts\activate         # Windows (Command Prompt)
# .\.venv\Scripts\Activate.ps1   # Windows (PowerShell)

# 4. Start the MCP server
mcp-server-demo
# By default, it listens for requests at http://0.0.0.0:8000/mcp
```
Open another terminal, activate the virtual environment similarly, and then run the client.
```bash
# (After activating the virtual environment in another terminal)

# 5. Run the MCP client
mcp-client-demo
# If "Result of add(10, 5): 15" is displayed, it's successful.

# 6. Run the E2E tests
pytest
# Confirm that all tests pass (green OK).
```
3.2 Key Setup Points
`uv` for Fast Environment Setup: Python Packaging's Next-Gen Ace

This template fully adopts `uv`, a new package manager for Python. It's no exaggeration to say `uv` is one of the most closely watched tools in the Python world right now.
- What is `uv`?:
  - Developed by Astral, also known for the high-performance linter `Ruff`, `uv` is a Rust-based, high-speed Python packaging tool.
  - It aims not only to replace `pip` and `venv` but also to cover dependency locking in the style of `pip-tools` and, eventually, project management functionality similar to `Poetry` or `PDM` (see Bite code!'s article and the Hacker News discussion).
  - Its overwhelming speed and ambitious scope suggest it could significantly influence Python's standard toolchain (though it doesn't yet replace every feature of `Poetry` or `PDM`).
- Installation: If `uv` is not installed on your system, refer to the official uv documentation. Methods such as `pipx install uv` or `cargo install uv` are available.
- Benefits in this template:
  - `uv venv`: Virtual environment creation is practically instantaneous.
  - `uv pip install`: Dependency resolution and package download/installation are dramatically faster, even for projects with complex dependencies; work that took minutes with traditional `pip` often finishes in seconds.
  - This template installs the dependencies defined in `pyproject.toml` (regular, development `[dev]`, and test `[test]` extras) with a single command: `uv pip install -e ".[dev,test]"`. This speed also significantly shortens CI/CD pipeline execution times.
Adopting `uv` reduces time costs at every stage of the development cycle, letting you focus more on core development tasks (a locking-workflow sketch follows below).
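Beyond `venv` and `install`, `uv` also covers the `pip-tools` style locking workflow mentioned above. As a sketch (commands per the uv documentation at the time of writing; exact flags may evolve with the tool), pinning and reproducing an environment looks like this:

```bash
# Resolve the dependencies declared in pyproject.toml into a pinned lock file
uv pip compile pyproject.toml -o requirements.lock

# Install exactly the pinned set (removing anything that isn't listed)
uv pip sync requirements.lock
```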
`pre-commit` and `Ruff` for Quality Maintenance: Modern Python Development Best Practices

Automatically maintaining high code quality is essential in modern software development. This template adopts the current best practice of combining `pre-commit` hooks with `Ruff`.
- `pre-commit`: A framework for automatically running predefined checks (hooks) before Git commits (official website).
  - `pre-commit` itself is installed as a development dependency by `uv pip install -e ".[dev,test]"`.
  - To start using it, run `pre-commit install` once in the repository root. This sets up the Git hooks, and the checks run automatically on every subsequent commit.
- `Ruff`: A Rust-based, ultra-fast Python linter and formatter developed by Astral (official website).
  - What's astounding is not just its speed but that it covers most of the checks and formatting previously handled by several separate tools (`Flake8`, `isort`, `pydocstyle`, `pyupgrade`, etc.) on its own. This simplifies configuration files and significantly reduces tool management costs.
  - The `Ruff` hooks are defined in `.pre-commit-config.yaml`. When you try to commit, static analysis (detecting potential bugs and deprecated practices) and formatting (unifying coding style) run automatically.
  - If issues are found, the commit is aborted and you are prompted to fix them; in many cases, `Ruff` can fix the problems automatically.
This `pre-commit` + `Ruff` combination is rapidly becoming a de facto standard in the Python community, contributing greatly to code consistency and reduced review burden in team development. An illustrative configuration is shown below.
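For reference, a typical `.pre-commit-config.yaml` wiring `Ruff` into `pre-commit` looks like the sketch below. This is illustrative rather than a copy of the template's actual file; the hook IDs come from Astral's `ruff-pre-commit` repository, and `rev` should be pinned to whatever release you adopt.

```yaml
# Illustrative pre-commit configuration using the official Ruff hooks
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4  # pin to a released tag
    hooks:
      - id: ruff          # linter (auto-fixes where possible)
        args: [--fix]
      - id: ruff-format   # formatter
```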
VS Code Dev Container for Environment Reproducibility: A Smoother Development Experience
If you use VS Code, the Dev Container feature can further reduce the effort of setting up your development environment and provide a smoother experience.
- What is a Dev Container?: A mechanism to build a fully isolated development environment within a Docker container and use it directly from VS Code.
- How to use:
- Open this template project in VS Code.
- If the Dev Containers extension is installed, a notification "Reopen in Container" will appear in the bottom right. Click it.
- Benefits:
  - The `.devcontainer/devcontainer.json` file defines everything: the Docker image to use, the VS Code extensions to install, and the commands to run after container creation (such as `uv pip install` and `pre-commit install` via `postCreateCommand`); an illustrative file is shown below.
  - There's no need to install Python, `uv`, or any other tools locally.
  - All team members develop with exactly the same tools and versions, eliminating "it works on my machine" problems.
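To give a feel for what such a configuration contains, here is an illustrative `devcontainer.json`. The field names follow the Dev Container specification, but the template's actual file may differ in detail:

```jsonc
// Illustrative .devcontainer/devcontainer.json
{
  "name": "mcp-python-template",
  // Build the development environment from the repository's Dockerfile
  "build": { "dockerfile": "../Dockerfile", "context": ".." },
  "customizations": {
    "vscode": {
      // Editor tooling every team member gets automatically
      "extensions": ["ms-python.python", "charliermarsh.ruff"]
    }
  },
  // Runs once after the container is created
  "postCreateCommand": "uv venv && uv pip install -e '.[dev,test]' && pre-commit install"
}
```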
4. Peeking Inside the Template (Key File Explanations)
This template provides the backbone for rapidly developing Streamable HTTP-enabled MCP servers. Let's look at the key files and their roles.
```text
.
├── Dockerfile                # Container image definition (python:3.13-slim base, non-root execution)
├── .devcontainer/
│   └── devcontainer.json     # VS Code Dev Container settings
├── src/
│   └── mcp_python_streamable_e2e_test_template/
│       ├── __init__.py
│       ├── client.py         # Sample MCP client implementation
│       ├── config.py         # Configuration loading from environment variables
│       └── server.py         # FastMCP server core and tool/resource definitions
├── tests/
│   ├── conftest.py           # pytest fixtures (e.g., for starting the test server)
│   └── test_client.py        # E2E test cases (tool calls via Streamable HTTP)
├── .pre-commit-config.yaml   # pre-commit hook definitions (Ruff, etc.)
├── pyproject.toml            # Project definition, dependencies (for uv)
└── README.md                 # Detailed project description
```
`src/server.py` - The Heart of the MCP Server

Defines the MCP server using `FastMCP` and registers sample tools and resources.
```python
import logging
import os

from mcp.server.fastmcp import FastMCP

from .config import Config

# Load configuration (LOG_LEVEL, MCP_SERVER_PORT, etc. from environment variables)
cfg = Config()
# ... (Port setting logic) ...

# Create FastMCP instance (server name "Demo" is used in logs, etc.)
server: FastMCP = FastMCP("Demo")
logging.basicConfig(level=cfg.log_level, format="...")
logger = logging.getLogger(server.name)

# Define 'add' tool
@server.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    logger.debug(f"Tool 'add' called with a={a}, b={b}")
    result = a + b
    logger.debug(f"Tool 'add' result: {result}")
    return result

# Define 'greeting' resource
@server.resource("greeting://{name}")
def get_greeting(name: str) -> str:
    """Get a personalized greeting."""
    logger.debug(f"Resource 'greeting://{name}' accessed")
    greeting = f"Hello, {name}!"
    # ...
    return greeting

def main() -> None:
    """Entry point for starting the server"""
    transport = os.getenv("MCP_TRANSPORT", "streamable-http")  # Defaults to Streamable HTTP
    logger.info(f"Starting server '{server.name}' with transport '{transport}'...")
    server.run(transport=transport)  # Run the server!

if __name__ == "__main__":
    main()
```
- `FastMCP("Demo")`: Creates a lightweight MCP server instance.
- `@server.tool()`: Functions decorated with this are exposed as MCP tools. Type hints define the argument and return-value schemas.
- `@server.resource("greeting://{name}")`: Defines a resource matching a URI pattern; the `{name}` part is passed to the function as an argument.
- `server.run(transport="streamable-http")`: Starts the server with Streamable HTTP (the transport can also be switched via an environment variable, as shown below).
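Because `main()` reads the `MCP_TRANSPORT` environment variable, the same entry point can be started with a different transport without touching the code. The values below are the transports accepted by `FastMCP.run()` in the SDK:

```bash
MCP_TRANSPORT=streamable-http mcp-server-demo  # default: single /mcp endpoint
MCP_TRANSPORT=sse mcp-server-demo              # legacy HTTP+SSE transport
MCP_TRANSPORT=stdio mcp-server-demo            # stdio, for local host processes
```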
`src/client.py` - Client to Interact with the Server
A sample client that connects to the server using Streamable HTTP and calls a tool.
```python
import asyncio
import logging
from typing import Any  # needed for the 'arguments' annotation below

from mcp import ClientSession, types
from mcp.client.streamable_http import streamablehttp_client  # Streamable HTTP client

logger = logging.getLogger(__name__)

async def run_client(quiet: bool, verbose: bool) -> None:
    server_url: str = "http://localhost:8000/mcp"  # Target URL
    try:
        async with streamablehttp_client(server_url) as (
            client_read_stream, client_write_stream, client_get_session_id_callback
        ):
            async with ClientSession(client_read_stream, client_write_stream) as current_session:
                session: ClientSession = current_session
                await session.initialize()  # Initialize session
                logger.info(f"Connected. Session ID: {client_get_session_id_callback()}")

                tool_name: str = "add"
                arguments: dict[str, Any] = {"a": 10, "b": 5}
                response_object: types.CallToolResult = await session.call_tool(
                    tool_name, arguments  # Call 'add' tool
                )
                # ... (Process response) ...
    except Exception as exc:
        logger.error(f"Client error: {exc}")  # ... (Error handling) ...

# ... (main function and argparse) ...
```
- `streamablehttp_client(server_url)`: An async context manager that connects to the given URL using Streamable HTTP. On success, it yields read/write streams and a session-ID callback.
- `ClientSession(...)`: Manages the MCP session over those streams.
- `session.call_tool(...)`: Executes the server-side tool with the given name and arguments.
`Dockerfile` - Reproducible Execution Environment
This file defines how to build a Docker image for running the MCP server application.
```dockerfile
# Base image (Python 3.13 slim version)
FROM python:3.13-slim

# ... (System utilities, uv installation) ...

# Create non-root user (for security)
ARG APP_USER=appuser
RUN groupadd ${APP_USER} && useradd -ms /bin/bash -g ${APP_USER} ${APP_USER}

# Working directory
# (Note: mid-line '#' is not a comment in a Dockerfile, so comments go on their own lines)
WORKDIR /app

# Copy dependency files & install with uv
COPY pyproject.toml uv.lock* ./
COPY src/ ./src
COPY README.md ./README.md
RUN uv venv .venv && \
    . .venv/bin/activate && \
    uv pip install --no-cache-dir -e ".[test,dev]"

# Copy remaining code and change ownership
COPY . .
RUN chown -R ${APP_USER}:${APP_USER} /app

# Switch to non-root user
USER ${APP_USER}

# Add venv's bin directory to PATH
ENV VIRTUAL_ENV="/app/.venv"
ENV PATH="/app/.venv/bin:$PATH"

# Port the server listens on
EXPOSE 8000

# Default command when container starts
CMD ["mcp-server-demo"]
```
- `FROM python:3.13-slim`: Uses a lightweight official Python image as the base.
- `uv venv` & `uv pip install`: Sets up the environment quickly with `uv`, even inside the container.
- Non-root user execution: Runs the process as a non-root user (`USER ${APP_USER}`) for better security.
- `CMD ["mcp-server-demo"]`: The `mcp-server-demo` script (the `main` function in `src/server.py`) is executed when the container starts. Build and run commands follow below.
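Building and running the container locally is the standard Docker workflow; nothing MCP-specific is required:

```bash
# Build the image from the repository root
docker build -t mcp-server:latest .

# Run it, publishing the EXPOSEd port; the server then answers at
# http://localhost:8000/mcp
docker run --rm -p 8000:8000 mcp-server:latest
```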
`tests/` Directory - Quality Assurance with E2E Tests

E2E tests using `pytest` ensure the correctness of Streamable HTTP communication.

`tests/conftest.py` (Test Configuration and Fixtures):
```python
import os
import subprocess
import time

import pytest

@pytest.fixture(scope="session")
def mcp_server_url() -> str:
    # Start server on a different port (8001) for tests
    port = "8001"
    env = os.environ.copy()
    env["MCP_SERVER_PORT"] = port  # Specify port via environment variable
    env["LOG_LEVEL"] = "WARNING"   # Suppress logs during tests

    # Start server as a background process
    process = subprocess.Popen(
        ["mcp-server-demo"],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    time.sleep(1)  # Wait for server to start (more robust checks are preferable)

    yield f"http://localhost:{port}/mcp"  # Provide server URL to test cases

    process.terminate()  # Stop server after tests
    process.wait()
```
- The `mcp_server_url` fixture starts `mcp-server-demo` on port 8001 at the beginning of the test session. This works because the logic in `src/server.py` prioritizes the `MCP_SERVER_PORT` environment variable when setting `FASTMCP_PORT`, preventing conflicts with the development server (default port 8000). A more robust startup check than the fixture's `time.sleep(1)` is sketched below.
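As the comment in the fixture notes, `time.sleep(1)` is a crude way to wait for startup. A more robust alternative is to poll the port until the server accepts connections; the helper below is a sketch and is not part of the template:

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 10.0) -> None:
    """Block until (host, port) accepts TCP connections or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return  # server is up
        except OSError:
            time.sleep(0.1)  # not listening yet; retry shortly
    raise RuntimeError(f"Server on {host}:{port} did not start within {timeout}s")

# In the fixture, replace time.sleep(1) with:
# wait_for_port("localhost", 8001)
```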
`tests/test_client.py` (Test Cases):
```python
from typing import Any  # needed for the 'arguments' annotation below

import pytest

from mcp import ClientSession, types
from mcp.client.streamable_http import streamablehttp_client

@pytest.mark.asyncio
async def test_add_tool_success(mcp_server_url: str) -> None:  # Get URL from fixture
    tool_name: str = "add"
    arguments: dict[str, Any] = {"a": 10, "b": 5}
    expected_result: int = 15

    async with streamablehttp_client(mcp_server_url) as (  # Connect to test server
        read_stream, write_stream, get_session_id_callback
    ):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            response: types.CallToolResult = await session.call_tool(tool_name, arguments)
            assert not response.isError
            # ... (Detailed result validation) ...
            content_list: list[types.Content] = response.content
            first_content: types.TextContent = content_list[0]
            assert int(first_content.text) == expected_result
```
- Test functions take `mcp_server_url` as an argument and actually connect to that URL using `streamablehttp_client`.
- Each test calls the `add` tool and verifies with `assert` that the returned result matches the expected value.
This allows automatic testing of the entire flow: "When I call a tool via Streamable HTTP, I get the expected result back correctly."
5. Advanced Use Cases: Deployment to the Cloud
MCP servers created with this template are containerized, making them easy to deploy to various cloud platforms. Here are some representative examples.
5.1 Easy Serverless Deployment with Google Cloud Run
Google Cloud Run is a service that allows you to deploy applications to a scalable serverless environment simply by uploading a container image.
- Deployment Overview:
  1. Build the Docker image locally: `docker build -t gcr.io/YOUR_PROJECT_ID/mcp-server:latest .`
  2. Push the image to Google Container Registry (GCR) or Artifact Registry: `docker push gcr.io/YOUR_PROJECT_ID/mcp-server:latest`
  3. Deploy with the `gcloud run deploy` command:

```bash
# --allow-unauthenticated: configure authentication as needed
# --port: the port EXPOSEd in the Dockerfile
gcloud run deploy mcp-server \
  --image gcr.io/YOUR_PROJECT_ID/mcp-server:latest \
  --platform managed \
  --region YOUR_REGION \
  --allow-unauthenticated \
  --port 8000
```

(See also the official documentation: Build and deploy a Python service)
- Compatibility with Streamable HTTP:
  - Cloud Run supports HTTP/2, which is technically well-suited to long-lived connections and bidirectional streaming like Streamable HTTP.
  - Auto-scaling based on request volume (including scaling to zero) makes cost-effective operation possible.
- Considerations:
  - Cold start time: can be mitigated by setting minimum instances to 1 or more (see the command sketch below).
  - Timeout settings: Cloud Run's request timeout (configurable up to 60 minutes) must be set appropriately so Streamable HTTP sessions aren't disconnected unintentionally.
  - The Glama project's example of deploying an authenticated SSE server to Cloud Run demonstrates secure MCP server exposure by combining Cloud Run's IAM authentication with a local proxy.
  - Cloud Run has officially supported HTTP streaming (including SSE) since October 2020, so an MCP server's Streamable HTTP mode should work there. However, because the request timeout can still cut off very long SSE connections, MCP's recommendation (closing the stream once all responses for a long-running operation have been sent) and the upcoming Resumability feature become important.
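For instance, raising the request timeout and keeping a warm instance can both be done after deployment. The values below are illustrative (flags per the `gcloud` documentation; a minimum instance adds cost):

```bash
# Raise the request timeout (seconds, up to 60 minutes) and keep one warm
# instance to reduce cold starts
gcloud run services update mcp-server \
  --region YOUR_REGION \
  --timeout 3600 \
  --min-instances 1
```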
5.2 Flexible Serverless Experience with AWS Lambda and Function URLs
AWS Lambda also supports container images, and Function URLs allow direct HTTP endpoint exposure without an API Gateway.
- Deployment Overview:
  1. Build the Docker image locally.
  2. Push the image to Amazon Elastic Container Registry (ECR) (see official documentation: Creating Lambda functions from container images):

```bash
# Create ECR repository (first time only)
aws ecr create-repository --repository-name mcp-server --image-scanning-configuration scanOnPush=true

# Docker login
aws ecr get-login-password --region YOUR_REGION | docker login --username AWS --password-stdin YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com

# Tag and push image
docker tag mcp-server:latest YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/mcp-server:latest
docker push YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/mcp-server:latest
```

3. Create a Lambda function specifying the ECR image and enable a Function URL:

```bash
# Adjust --timeout (up to 900 seconds) and --memory-size as needed
aws lambda create-function \
  --function-name mcp-server-lambda \
  --package-type Image \
  --code ImageUri=YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/mcp-server:latest \
  --role YOUR_LAMBDA_EXECUTION_ROLE_ARN \
  --timeout 300 \
  --memory-size 512

aws lambda create-function-url-config \
  --function-name mcp-server-lambda \
  --auth-type NONE  # Or AWS_IAM
```
- Compatibility with Streamable HTTP:
  - Function URLs support HTTP/1.1 and HTTP/2.
  - Lambda's execution time limit (max 15 minutes) and payload size limit (6 MB each way for request/response) must be considered. Even with Streamable HTTP, long-running streams or large transfers may require design workarounds to stay within Lambda's constraints.
- Considerations:
  - Image size: Lambda's container image size limit is 10 GB. This template's Dockerfile uses a slim image, so it usually fits, but be mindful when adding large ML libraries.
  - Cold starts: can be mitigated with Provisioned Concurrency, at additional cost.
  - Running heavy models: increasing `--memory-size` allows larger models like PyTorch or TensorFlow, but balance this against cost (see the sketch below).
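Adjusting these knobs after creation is a one-liner; the values here are illustrative (more memory also means proportionally more CPU, and more cost):

```bash
aws lambda update-function-configuration \
  --function-name mcp-server-lambda \
  --memory-size 2048 \
  --timeout 900
```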
5.3 Full-Fledged Operation and Scaling with Kubernetes (K8s)
For more advanced control and scalability, deployment to Kubernetes (K8s) is an option.
- Deployment Overview:
  1. Build the Docker image and push it to a container registry (Docker Hub, GCR, ECR, etc.).
  2. Create Deployment and Service manifest files:

```yaml
# deployment.yaml (excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server-deployment
spec:
  replicas: 2  # Initial replica count
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
        - name: mcp-server-container
          image: YOUR_REGISTRY/mcp-server:latest  # Your pushed image
          ports:
            - containerPort: 8000
---
# service.yaml (excerpt)
apiVersion: v1
kind: Service
metadata:
  name: mcp-server-service
spec:
  selector:
    app: mcp-server
  ports:
    - protocol: TCP
      port: 80          # Port exposed by the Service
      targetPort: 8000  # Container's port
  type: LoadBalancer    # Or ClusterIP/NodePort + Ingress
```

3. Deploy with `kubectl apply -f deployment.yaml` and `kubectl apply -f service.yaml`.
- Scaling and Availability:
  - Horizontal Pod Autoscaler (HPA): automatically scales the number of Pods based on CPU utilization or custom metrics (see the official walkthrough):

```yaml
# hpa.yaml (example based on CPU utilization)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

  - Ingress: exposes the Service externally, providing routing, SSL termination, HTTP/2 support, and more. Many Ingress Controllers, such as NGINX Ingress Controller and Traefik, support HTTP/2, which can improve Streamable HTTP performance. Setting the backend protocol in the Ingress configuration to `HTTP2` (or an appropriate value such as `GRPC`, depending on what your controller supports) can help maintain HTTP/2 along the path to the Pod (check the specific annotations in your Ingress Controller's documentation). An illustrative Ingress manifest follows this list.
- Considerations:
  - K8s has a steep learning curve but offers great flexibility and robustness once set up.
  - Managed Kubernetes services (GKE, EKS, AKS, etc.) can reduce the operational burden of the control plane.
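As a concrete illustration of the Ingress settings mentioned above, here is a sketch of a manifest for the Service defined earlier. The host name is a placeholder and the annotations follow NGINX Ingress Controller conventions; other controllers use different annotation names, so treat this as a starting point rather than a drop-in configuration.

```yaml
# Illustrative only: routes /mcp traffic to mcp-server-service and keeps
# long-lived SSE streams alive (NGINX Ingress Controller annotations)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mcp-server-ingress
  annotations:
    # Disable response buffering so streamed SSE events reach clients promptly
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
    # Allow long-lived streams (seconds) instead of the default read timeout
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
spec:
  rules:
    - host: mcp.example.com  # placeholder host
      http:
        paths:
          - path: /mcp
            pathType: Prefix
            backend:
              service:
                name: mcp-server-service
                port:
                  number: 80
```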
Utilizing these cloud platforms allows you to smoothly progress through the steps of "try it out → deploy to the cloud → integrate LLM/ML" starting from this template.
6. Community Trends and the Future of MCP: Active Discussion and Evolution
Since Anthropic's initial announcement (see Hacker News discussion: Model Context Protocol, around December 2024), MCP and related technologies have been actively discussed and continue to evolve within the developer community.
- The Road to Streamable HTTP: Initially, MCP was a stateful protocol assuming long-lived connections. However, the difficulty of deploying in serverless environments led to a demand for more flexible communication methods. In GitHub Discussions, particularly "State, and long-lived vs. short-lived connections," developers from companies like Shopify and Automattic (WordPress.com) who were trying to use MCP discussed specific challenges (e.g., difficulties implementing SSE in PHP, serverless scaling issues) and proposed various solutions like session tokens, stateless/stateful protocol variants, and WebSocket usage. The current Streamable HTTP transport (HTTP POST + optional SSE) specification was adopted as a result of this active feedback loop, demonstrating MCP's evolution with the community.
- Python MCP Servers on Cloudflare Workers: The aforementioned Cloudflare article also shows how to build and deploy MCP servers in Python on Cloudflare Workers, suggesting new possibilities for MCP in edge computing environments. Notably, the ability to easily expose existing FastAPI applications as MCP tools via the `FastAPI-MCP` library is welcome news for many Python developers.
- Continuous Evolution of MCP Specifications and Roadmap:
- Cloudflare has indicated its policy to actively incorporate advanced features defined in the MCP specification—such as Resumability, Cancellability, and Session management—into its Agents SDK.
- MCP's official roadmap prioritizes "enhancement of authentication/authorization," "service registry and discovery features," "further improvements for streaming and serverless support (including Resumability and stateless operation support)," and "enrichment of multi-language SDKs and testing." This suggests that the protocol itself is expected to continue evolving towards greater robustness and scalability. The roadmap also lists "service discovery and stateless operation support for serverless" as important items, indicating improvements conscious of environments like Cloud Run and AWS Lambda.
- Expansion of the MCP Ecosystem and Industry Adoption Trends:
- Entry of Major Players: A noteworthy development is OpenAI's official announcement in March 2025 of its adoption of MCP for its products. Starting with the Agents SDK, plans are in place to support MCP in ChatGPT and its APIs. CEO Sam Altman commented, "MCP has been well-received, and we are excited to add support to all our products." This fact is an extremely significant driving force for MCP to develop into an industry-standard protocol.
- Support from Cloud Vendors:
- Microsoft: Involved with MCP from an early stage, developing the official C# SDK and implementing MCP integration in Copilot Studio and the Autogen framework. They also released an MCP server extension for Playwright (web test automation tool) and are advancing support in Azure OpenAI services and GitHub areas.
- Google Cloud (including DeepMind): Announced the "MCP Toolbox for Databases" (formerly Gen AI Toolbox for Databases), supporting MCP as a standard for linking databases and AI agents. DeepMind CEO Demis Hassabis also confirmed the incorporation of MCP support into the next-generation Gemini SDK. Furthermore, Google is promoting scenarios where MCP agents run on Cloud Run, supporting it as a managed deployment target for Google's Agent2Agent (A2A) Protocol.
- AWS: In April 2025, AWS open-sourced a collection of MCP servers for its AI code assistance services (presumed to be related to CodeWhisperer and CodeCatalyst). At KubeCon EU 2025, AWS also mentioned MCP support in Bedrock agents, showing its stance as a major cloud provider accepting MCP as a "bridging standard for AI and tools."
- Cloudflare: Promoting MCP server hosting on its Workers edge platform, announcing Python and Streamable HTTP support updates in April 2025. They provide implementations supporting both new and old transports through their Agents SDK.
- Developer Tools and Startup Movements: Developer tool companies like Zed, Replit, Codeium, and Sourcegraph are moving to integrate MCP into their platforms. Adoption by cloud-native startups is also active, such as Kubiya, which provides a Kubernetes-based platform, and Solo.io, which released "MCP Gateway" by extending its API gateway OSS "Kgateway."
- Abundant Server Implementations and Community Enthusiasm: The MCP official website's example servers page (modelcontextprotocol.io/examples) lists a wide variety of official reference servers, including file system operations, DB integration, development tool integration, browser automation, communication tool integration, and AI-specific tools, demonstrating the breadth of MCP's applicability. Furthermore, leading companies like Axiom (log analysis), Browserbase (cloud browser automation), Cloudflare (developer platform), E2B (code execution sandbox), Neon (serverless Postgres), Prisma (DB management), Qdrant (vector search), Stripe (payments), Tinybird (real-time data platform), and Weaviate (Agentic RAG) are providing official MCP integrations for their platforms, accelerating ecosystem growth. The community is also actively developing and releasing MCP servers for popular tools and services like Docker container management, Kubernetes cluster operations, Linear issue tracking, Snowflake data warehouse integration, Spotify music control, and Todoist task management, showing that MCP's utility is expanding daily. The Python SDK's GitHub repository has garnered over 12,000 stars and 1,300 forks (as of May 2025), and active information exchange occurs in MCP-related Reddit communities. Reports of "hundreds of tool vendors advancing MCP integration" also indicate its high level of attention.
- Alternative Implementations and Tech Demos: Movements like Blaxel developing and open-sourcing a WebSocket-based MCP implementation (a fork of Supergateway) to solve SSE challenges are also seen. Additionally, tech demo videos like "MCP - Can Lambda do it? - Streamable HTTP Model Context Protocol" on the YouTube channel "the_context()" show growing interest from the tech community in combining Streamable HTTP with serverless architectures.
- Cloud Run Deployment Case Study: Mark W Kiehl's Medium article "Deploy Your Custom MCP AI Tool to Cloud Run" (May 1, 2025) walks through deploying a custom MCP tool (a LangChain tool wrapped with the `python-a2a` library) to Cloud Run, a useful reference for the practicalities of running Python MCP servers there (though, since it predates SDK v1.8.0, it's unclear whether it uses Streamable HTTP directly).
- MCP's Roadmap and Challenges:
  - As noted above, the official roadmap prioritizes authentication/authorization, a service registry and discovery features, further streaming and serverless improvements (including Resumability and stateless operation), and richer multi-language SDKs and testing.
  - On the other hand, analyses by Gartner and others point out that "currently, there are immature aspects such as the security model, and it is mainly used for desktop application integration." The community and companies are working together to resolve these practical operational challenges.
Dubbed the "USB-C port of the AI industry," MCP is seeing unusually rapid adoption not only by AI pioneers like Anthropic and OpenAI but also by major cloud vendors. This powerful momentum strongly suggests MCP's great potential to grow into the de facto standard connecting AI agents and external services, and its future developments are worth watching closely.
7. Summary and Next Steps
In this article, we've explained how the official support for the Streamable HTTP transport in MCP Python SDK v1.8.0 has made it easier and more flexible to operate Python-based MCP servers in cloud environments. We also introduced a minimalist, E2E-tested template to accelerate its development, touched upon key technological elements, cloud deployment examples, and active community trends.
We hope this template serves as a helpful aid in your MCP server development.
We encourage you to clone https://github.com/akitana-airtanker/mcp-python-streamable-e2e-test-template, first try out Streamable HTTP locally, and then expand its possibilities to the cloud.
8. References
- MCP (Model Context Protocol) General
- MCP Python SDK v1.8.0 Release Notes: https://github.com/modelcontextprotocol/python-sdk/releases/tag/v1.8.0
- FastMCP (server implementation within the SDK): https://github.com/modelcontextprotocol/python-sdk/tree/main/mcp/server/fastmcp
- Model Context Protocol Official Site: https://modelcontextprotocol.io/introduction
- MCP Server Examples: https://modelcontextprotocol.io/examples
- Awesome MCP Servers (Community-curated list): https://github.com/punkpeye/awesome-mcp-servers
- Key Technologies Used in This Template
- uv (Python package installer and resolver): https://github.com/astral-sh/uv
- Ruff (Python linter / formatter): https://github.com/astral-sh/ruff
- pre-commit (Git hook management): https://pre-commit.com/
- VS Code Dev Containers: https://code.visualstudio.com/docs/devcontainers/containers
- pytest (Python testing framework): https://docs.pytest.org/en/stable/
- pytest-asyncio (asyncio support for pytest): https://pytest-asyncio.readthedocs.io/
- Related Technologies
- Docker: https://docs.docker.com/
- Python `asyncio`: https://docs.python.org/3/library/asyncio.html