π This is the third chapter of a series where I document what I'm learning about Model Context Protocol architecture and tool implementations
In Chapter 2, I built a GitHub Stats MCP server with CHAOSS security metrics. It worked, it returned real data, and Goose could orchestrate its tools into a security report. But it had one big limitation: it only ran on my laptop.
This time, I put it on the cloud for an external MCP client to reach it.
That means introducing a new MCP transport mechanism, containerizing the server with Docker, deploying it on Hugging Face Spaces, and setting up Goose extensions to point at a public URL instead of running a local command. Let's dive in!s
Laying out Foundational Concepts
Let's start with what the MCP spec documentation says about transports and then look at where Goose, the mcp host used in this example, fits in.
Streamable HTTP MCP Transport
In MCP, a transport is the layer that defines how a client and a server exchange their JSON-RPC messages. The standard currently defines two transport mechanisms, and the journey in this chapter is about moving from one model to the other:
stdio: The client launches the server as a subprocess and they communicate through standard input/output pipes.
Streamable HTTP: The MCP server runs as an independent HTTP service, typically exposed through a single endpoint. Clients send JSON-RPC messages using HTTP POST requests. The server can reply with a regular JSON response, or, when it needs to send multiple messages over time, it can return an SSE stream.
So while in Chapters 1 and 2, stdio was perfect for local development, for cloud deployment, I need a different model: one server running independently, reachable over the network, and updated in one place. That is what Streamable HTTP gives me.
Goose's Role as MPC host
In Chapter 2, Goose was the host. It holds the conversation, runs the model, and decides when to call my MCP tools.
What changes now is the transport. The tools, the resources, and the prompt my server exposes are identical to Chapter 2. Switching to Streamable HTTP doesn't touch any of them, it just changes how the client reaches the server: instead of Goose launching my server as a subprocess, Goose connects to a server that's already running at a URL. And because the door is now a public URL, any other MCP-compatible host that supports Streamable HTTP could connect to it too
An Always-on MCP Server
The server now has to stay up on its own, so it needs to live somewhere that isn't my laptop, which is why I containerize it and deploy it.
One bit of cloud plumbing is worth naming now because it shows up in the practical section: many cloud platforms put a reverse proxy in front of your container. Long-lived open streams like SSE can work, but they are often fragile unless the platform is configured for them (proxies may enforce idle timeouts or drop connections). For simpler deployments like this one, I run the server in a stateless mode where each interaction is handled as a plain HTTP request/response
With the standard in hand, the rest is implementation!
Step 1: Teaching the Server to Speak Streamable HTTP
The tempting move is to just switch the transport. Don't. Current Goose setup spawns the server over stdio, and flipping the transport would break it.
A cleaner pattern is one codebase that can run either transport, chosen at runtime. The tools, resources, and the prompt do not change. What changes is the way the host reaches the server: locally through stdio, or remotely through Streamable HTTP.
In practice, that means reading the transport from an environment variable and passing the right host and port when the server runs in HTTP mode.
import os
from mcp.server.fastmcp import FastMCP
mcp = FastMCP(
"github-stats",
host=os.getenv("MCP_HOST", "127.0.0.1"),
port=int(os.getenv("MCP_PORT", "8000")),
stateless_http=True,
json_response=True,
)
# In this section, you add the tools, the resources, and the prompt, which stays exactly the same as chapter 2 shows
if __name__ == "__main__":
transport = os.getenv("MCP_TRANSPORT", "stdio")
mcp.run(transport=transport)
stateless_http=Trueandjson_response=Trueis the reverse-proxy problem from the theory section
Then, to test it locally:
MCP_TRANSPORT=streamable-http uv run server.py
The server is now live at http://127.0.0.1:8000/mcp. I opened the MCP Inspector, set the transport to Streamable HTTP, pointed it at that URL, and saw the exact same tools, resource, and prompt I had over stdio.
Step 2: Containerizing it
Hugging Face Spaces can host an arbitrary server as a Docker Space: you hand it a Dockerfile, it builds a container and runs it on a public HTTPS URL. So the next step was packaging the server.
Here's the shape of what the Space needs and what each piece is for:
github-stats-mcp/
βββ server.py # the MCP server (same as Chapter 2 + dual-transport add)
βββ requirements.txt # include mcp[cli], httpx, python-dotenv
βββ Dockerfile # set how HF builds and runs the container
βββ README.md # YAML header to tell HF this is a Docker Space
βββ .dockerignore # keeps junk and secrets out of the image
βββ .env # keep it local
The Dockerfile creates a non-root user (HF Spaces runs containers as a non-root user with uid 1000), and it bakes in the environment variables that flip the server into cloud mode:
MCP_TRANSPORT=streamable-http, MCP_HOST=0.0.0.0 (so it's reachable from outside the container), and MCP_PORT=7860 (the port HF routes public traffic to)
FROM python:3.11-slim
RUN useradd -m -u 1000 user
USER user
ENV PATH="/home/user/.local/bin:$PATH"
WORKDIR /app
COPY --chown=user requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY --chown=user . .
ENV MCP_TRANSPORT=streamable-http
ENV MCP_HOST=0.0.0.0
ENV MCP_PORT=7860
EXPOSE 7860
CMD ["python", "server.py"]
And .dockerignore keeps the noise (and, importantly, secrets, under .env) out of the image
Step 3: Deploying to Hugging Face
You can easily set the space SDK via Hugging Face
The README needs a YAML header so HF treats it as a Docker Space. The two lines that actually matter are sdk: docker and app_port: 7860 (which has to match MCP_PORT in the Dockerfile).
---
title: "GitHub Stats MCP"
emoji: π
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
pinned: false
---
Then you create a new Docker Space, push server.py, Dockerfile, requirements.txt, README.md, and .dockerignore in your HF repo, and set the token as a secret in Settings. The server went live at a public .hf.space URL
Step 4: Reconnecting Goose to the Cloud
In Goose, I added the server as a remote extension pointing at the .hf.space/mcp URL instead of a local command. Same model switching, same chat, now talking to a server in the cloud. And not just Goose: any MCP client anywhere can now connect to that URL.
As last time, I used Qwen3-32b (via Groq) with my MCP tool activated and sent the prompt:
- Goose passed it to the LLM along with my tool descriptions
- The model decided it needed the tool
- Goose executed that call against my server
- The model formatted the result the way we wanted, grounded in the CHAOSS expert and context
I used the same project as in Chapter 2:
Then it dropped the report:
Authentication Note
At the end of Chapter 2, I said I would add a security layer with OpenID Connect in this cloud chapter. Once I actually deployed the server, I changed my mind. There are a few reasons.
The deployment itself deserved a full chapter: Adding auth on top would have buried the main story (moving the MCP server from stdio to a public cloud environment). That's why authentication will get its own treatment in a later chapter
Deploying it publicly made the auth problem more concrete: A public server changes the threat model immediately: anyone with the URL can call its tools, and every call consumes the GitHub tokenβs rate limit.
I am treating the deployment as a temporary sandbox, not a production setup: The server exposes no sensitive data, and the GitHub token is deliberately constrained: fine-grained, read-only, public-repositories-only, and with no account-level permissions.
But please note that this does not make authentication unnecessary. It only makes the remaining risk acceptable for this specific experiment (e.g., someone could burn through the tokenβs read rate limit for a while, but they could not access private data or perform write actions as me). For any persistent, shared, write-capable, or private-data-connected server, I would add authentication before exposing it publicly.
Final Remarks and Questions for the Next Chapter
This chapter shows the moment my server stopped being a local process someone runs and became a hosted service available at a URL. The server is deployed, and anyone can call it.
Deploying made the next question worth testing: What does MCP actually improve in the quality of the response, compared with asking the same model without MCP tools or with plain web search?
In the next chapter, I build a second server based on the other CHAOSS practitioner guide and metrics set: Responsivenessβ , and its open knowledge. Then I run an experiment inspired by a community peer at CHAOSS, asking the same question to the same model in three different ways: with my MCP tools, without my MCP tools, and with plain web search.





Top comments (0)