Ana Jimenez Santamaria

Posted on Jun 28

Taking my GitHub Stats MCP Server to the Cloud 🚀☁️

#mcp #docker #agents #opensource

👋 This is the third chapter of a series where I document what I'm learning about Model Context Protocol (MCP) architecture and tool implementations.

In Chapter 2, I built a GitHub Stats MCP server with CHAOSS security metrics. It worked, it returned real data, and Goose could orchestrate its tools into a security report. But it had one big limitation: it only ran on my laptop.

This time, I put it in the cloud so that an external MCP client can reach it.

That means introducing a new MCP transport mechanism, containerizing the server with Docker, deploying it on Hugging Face Spaces, and setting up Goose extensions to point at a public URL instead of running a local command. Let's dive in!

Laying out Foundational Concepts

Let's start with what the MCP spec documentation says about transports and then look at where Goose, the MCP host used in this example, fits in.

Streamable HTTP MCP Transport

In MCP, a transport is the layer that defines how a client and a server exchange their JSON-RPC messages. The standard currently defines two transport mechanisms, and the journey in this chapter is about moving from one model to the other:

stdio: The client launches the server as a subprocess and they communicate through standard input/output pipes.
Streamable HTTP: The MCP server runs as an independent HTTP service, typically exposed through a single endpoint. Clients send JSON-RPC messages using HTTP POST requests. The server can reply with a regular JSON response, or, when it needs to send multiple messages over time, it can return an SSE stream.
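To make the Streamable HTTP side concrete, here is a minimal sketch of what a client could put on the wire for its first message. It uses only the Python standard library and builds the request without sending it. The endpoint URL, the client name, and the protocol version string are assumptions for illustration; a real client negotiates the version with the server.

```python
import json
import urllib.request

# Assumed endpoint: a local server exposing MCP at /mcp (illustrative only).
MCP_URL = "http://127.0.0.1:8000/mcp"


def build_initialize_request(url: str, request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) the HTTP POST carrying a JSON-RPC initialize call."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed spec revision
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The client accepts both reply styles: plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )


request = build_initialize_request(MCP_URL)
print(request.get_method(), request.full_url)
print(json.loads(request.data)["method"])
```

The point of the sketch is the shape: one endpoint, one POST per JSON-RPC message, and an Accept header that lets the server choose between a single JSON reply and a stream.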

In Chapters 1 and 2, stdio was perfect for local development. For cloud deployment, I need a different model: one server running independently, reachable over the network, and updated in one place. That is what Streamable HTTP gives me.

Goose's Role as MCP Host

In Chapter 2, Goose was the host, and it still is. It holds the conversation, runs the model, and decides when to call my MCP tools.

What changes now is the transport. The tools, the resources, and the prompt my server exposes are identical to Chapter 2. Switching to Streamable HTTP doesn't touch any of them. It just changes how the client reaches the server: instead of Goose launching my server as a subprocess, Goose connects to a server that's already running at a URL. And because the door is now a public URL, any other MCP-compatible host that supports Streamable HTTP could connect to it too.

An Always-on MCP Server

The server now has to stay up on its own, so it needs to live somewhere that isn't my laptop, which is why I containerize it and deploy it.

One bit of cloud plumbing is worth naming now because it shows up in the practical section: many cloud platforms put a reverse proxy in front of your container. Long-lived open streams like SSE can work, but they are often fragile unless the platform is configured for them (proxies may enforce idle timeouts or drop connections). For a simple deployment like this one, I run the server in stateless mode, where each interaction is handled as a plain HTTP request and response.

With the standard in hand, the rest is implementation!

Step 1: Teaching the Server to Speak Streamable HTTP

The tempting move is to just switch the transport. Don't. My current Goose setup spawns the server over stdio, and hard-coding the new transport would break it.

A cleaner pattern is one codebase that can run either transport, chosen at runtime. The tools, resources, and the prompt do not change. What changes is the way the host reaches the server: locally through stdio, or remotely through Streamable HTTP.

In practice, that means reading the transport from an environment variable and passing the right host and port when the server runs in HTTP mode.

import os
from mcp.server.fastmcp import FastMCP
mcp = FastMCP(
    "github-stats",
    host=os.getenv("MCP_HOST", "127.0.0.1"),
    port=int(os.getenv("MCP_PORT", "8000")),
    stateless_http=True,
    json_response=True,
)

# Add the tools, the resources, and the prompt here; they stay exactly as shown in Chapter 2

if __name__ == "__main__":
    transport = os.getenv("MCP_TRANSPORT", "stdio")
    mcp.run(transport=transport)

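stateless_http=True and json_response=True are the answer to the reverse-proxy problem from the theory section: each call is handled as a self-contained HTTP request that gets a plain JSON response, so nothing depends on a long-lived SSE stream surviving the proxy.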
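One optional hardening of the runtime switch, sketched under assumptions: validate MCP_TRANSPORT before handing it to mcp.run, so a typo fails with a clear message instead of a confusing startup error. The helper name resolve_transport is hypothetical and not part of the MCP SDK, and the allowed set covers only the two transports this chapter uses.

```python
import os
from typing import Mapping

# Only the two transports discussed in this chapter (an assumption of this sketch).
ALLOWED_TRANSPORTS = ("stdio", "streamable-http")


def resolve_transport(env: Mapping[str, str] = os.environ) -> str:
    """Return the transport named by MCP_TRANSPORT, defaulting to stdio."""
    transport = env.get("MCP_TRANSPORT", "stdio").strip().lower()
    if transport not in ALLOWED_TRANSPORTS:
        raise ValueError(
            f"Unsupported MCP_TRANSPORT {transport!r}; expected one of {ALLOWED_TRANSPORTS}"
        )
    return transport


print(resolve_transport({}))  # no variable set, so the local default applies
print(resolve_transport({"MCP_TRANSPORT": "streamable-http"}))
```

With a helper like this, the last line of server.py could become mcp.run(transport=resolve_transport()).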

Then, to test it locally:

MCP_TRANSPORT=streamable-http uv run server.py

The server is now live at http://127.0.0.1:8000/mcp. I opened the MCP Inspector, set the transport to Streamable HTTP, pointed it at that URL, and saw the exact same tools, resource, and prompt I had over stdio.
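The same check can be scripted instead of clicked through. The sketch below assumes an MCP Python SDK version that exposes streamablehttp_client under mcp.client.streamable_http (the name may differ in other releases) and that the server from the local test run is still listening on the URL above. The function names list_tool_names and main are my own.

```python
import asyncio

# Assumed local endpoint from the test run above.
LOCAL_MCP_URL = "http://127.0.0.1:8000/mcp"


async def list_tool_names(url: str) -> list[str]:
    """Connect over Streamable HTTP, run the MCP handshake, and return tool names."""
    # Imported inside the function so this file still loads where the SDK is absent.
    from mcp import ClientSession
    from mcp.client.streamable_http import streamablehttp_client

    # The context manager yields read/write streams plus a session-id getter,
    # which this sketch does not need.
    async with streamablehttp_client(url) as (read_stream, write_stream, _):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            result = await session.list_tools()
            return [tool.name for tool in result.tools]


def main() -> None:
    """Print the tool names exposed by the local server (it must be running)."""
    print(asyncio.run(list_tool_names(LOCAL_MCP_URL)))
```

Calling main() while the server is up should print the same tool names the Inspector shows. If the list matches what stdio returned, the transport switch changed nothing about what the server exposes.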

Step 2: Containerizing it

Hugging Face Spaces can host an arbitrary server as a Docker Space: you hand it a Dockerfile, it builds a container and runs it on a public HTTPS URL. So the next step was packaging the server.

Here's the shape of what the Space needs and what each piece is for:

github-stats-mcp/
├── server.py          # the MCP server (same as Chapter 2 + dual-transport add)
├── requirements.txt   # include mcp[cli], httpx, python-dotenv
├── Dockerfile         # set how HF builds and runs the container
├── README.md          # YAML header to tell HF this is a Docker Space
├── .dockerignore      # keeps junk and secrets out of the image
└── .env               # keep it local

The Dockerfile creates a non-root user (HF Spaces runs containers as a non-root user with uid 1000), and it bakes in the environment variables that flip the server into cloud mode:

MCP_TRANSPORT=streamable-http, MCP_HOST=0.0.0.0 (so the server is reachable from outside the container), and MCP_PORT=7860 (the port HF routes public traffic to).

FROM python:3.11-slim

RUN useradd -m -u 1000 user
USER user
ENV PATH="/home/user/.local/bin:$PATH"

WORKDIR /app

COPY --chown=user requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY --chown=user . .

ENV MCP_TRANSPORT=streamable-http
ENV MCP_HOST=0.0.0.0
ENV MCP_PORT=7860

EXPOSE 7860

CMD ["python", "server.py"]

And .dockerignore keeps the noise (and, importantly, the secrets in .env) out of the image.
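Because a leaked .env would ship the GitHub token inside the image, it can be worth checking the ignore file before pushing. This is a rough sketch, not Docker's real matcher: it approximates .dockerignore rules with Python's fnmatch and only handles top-level files, comments, and "!" exceptions. The sample file content is illustrative, since the actual .dockerignore is not shown in this post.

```python
from fnmatch import fnmatch

# Illustrative .dockerignore content (an assumption, not the author's file).
DOCKERIGNORE_TEXT = """# secrets and local noise
.env
__pycache__/
*.pyc
.git
"""


def is_ignored(filename: str, dockerignore: str) -> bool:
    """Rough check for a top-level file; the last matching pattern wins."""
    ignored = False
    for raw_line in dockerignore.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        negated = line.startswith("!")
        pattern = line.lstrip("!").rstrip("/")
        if fnmatch(filename, pattern):
            ignored = not negated
    return ignored


print(is_ignored(".env", DOCKERIGNORE_TEXT))
print(is_ignored("server.py", DOCKERIGNORE_TEXT))
```

For anything beyond a quick sanity check, building the image and listing its files is the more reliable test.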

Step 3: Deploying to Hugging Face

You choose the Space SDK (Docker, in this case) when you create the Space on Hugging Face.

The README needs a YAML header so HF treats it as a Docker Space. The two lines that actually matter are sdk: docker and app_port: 7860 (which has to match MCP_PORT in the Dockerfile).

---
title: "GitHub Stats MCP"
emoji: 📊
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
pinned: false
---
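Since app_port and MCP_PORT live in two different files, a mismatch is easy to introduce. Here is a minimal sketch of a consistency check that could run before pushing. The function names are hypothetical, the embedded texts are shortened copies of the files above, and the regular expressions only cover the simple single-value forms used in this chapter.

```python
import re
from typing import Optional

# Shortened copies of the two files, embedded so the sketch runs on its own.
README_TEXT = """---
title: "GitHub Stats MCP"
sdk: docker
app_port: 7860
---
"""

DOCKERFILE_TEXT = """FROM python:3.11-slim
ENV MCP_PORT=7860
EXPOSE 7860
"""


def readme_app_port(readme: str) -> Optional[int]:
    """Read app_port from the README text, if present."""
    match = re.search(r"^app_port:\s*(\d+)\s*$", readme, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def dockerfile_mcp_port(dockerfile: str) -> Optional[int]:
    """Read the value of an `ENV MCP_PORT=...` line from a Dockerfile, if present."""
    match = re.search(r"^ENV\s+MCP_PORT[=\s]\s*(\d+)\s*$", dockerfile, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def ports_match(readme: str, dockerfile: str) -> bool:
    """True only when both ports are found and they are equal."""
    app_port = readme_app_port(readme)
    return app_port is not None and app_port == dockerfile_mcp_port(dockerfile)


print(ports_match(README_TEXT, DOCKERFILE_TEXT))
```

In a real repository, the two strings would come from reading README.md and Dockerfile from disk.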

Then you create a new Docker Space, push server.py, Dockerfile, requirements.txt, README.md, and .dockerignore to your HF repo, and set the GitHub token as a secret in the Space's Settings. The server went live at a public .hf.space URL.

Step 4: Reconnecting Goose to the Cloud

In Goose, I added the server as a remote extension pointing at the .hf.space/mcp URL instead of a local command. Same model switching, same chat, now talking to a server in the cloud. And not just Goose: any MCP client anywhere can now connect to that URL.

As last time, I used Qwen3-32b (via Groq) with my MCP tool activated and sent the prompt:

Goose passed it to the LLM along with my tool descriptions
The model decided it needed the tool
Goose executed that call against my server
The model formatted the result the way we wanted, grounded in the CHAOSS expert and context

I used the same project as in Chapter 2:

Then it dropped the report:

Authentication Note

At the end of Chapter 2, I said I would add a security layer with OpenID Connect in this cloud chapter. Once I actually deployed the server, I changed my mind. There are a few reasons.

The deployment itself deserved a full chapter: adding auth on top would have buried the main story (moving the MCP server from stdio to a public cloud environment). That's why authentication will get its own treatment in a later chapter.
Deploying it publicly made the auth problem more concrete: a public server changes the threat model immediately, because anyone with the URL can call its tools, and every call consumes the GitHub token’s rate limit.
I am treating the deployment as a temporary sandbox, not a production setup: the server exposes no sensitive data, and the GitHub token is deliberately constrained to be fine-grained, read-only, limited to public repositories, and free of account-level permissions.

But please note that this does not make authentication unnecessary. It only makes the remaining risk acceptable for this specific experiment (e.g., someone could burn through the token’s read rate limit for a while, but they could not access private data or perform write actions as me). For any persistent, shared, write-capable, or private-data-connected server, I would add authentication before exposing it publicly.
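Since the remaining risk is mostly about rate-limit consumption, one way to keep an eye on it is GitHub's REST rate limit endpoint. The sketch below is an assumption about how that monitoring could look, not something the deployed server does. The sample payload uses made-up numbers shaped like the endpoint's response, and the network call is defined but not executed.

```python
import json
import urllib.request
from datetime import datetime, timezone

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def summarize_core_limit(payload: dict) -> str:
    """Turn a rate limit response body into a one-line summary of the core quota."""
    core = payload["resources"]["core"]
    reset_at = datetime.fromtimestamp(core["reset"], tz=timezone.utc)
    return f"{core['remaining']}/{core['limit']} requests left, resets at {reset_at:%H:%M} UTC"


def fetch_rate_limit(token: str) -> dict:
    """Call GitHub's rate limit endpoint with the given token (needs network access)."""
    request = urllib.request.Request(
        RATE_LIMIT_URL,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Sample payload with made-up numbers, so the sketch runs offline.
sample = {"resources": {"core": {"limit": 5000, "remaining": 4200, "reset": 1700000000}}}
print(summarize_core_limit(sample))
```

A sudden drop in the remaining count without matching usage on my side would be a signal that someone else is calling the public URL.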

Final Remarks and Questions for the Next Chapter

This chapter shows the moment my server stopped being a local process someone runs and became a hosted service available at a URL. The server is deployed, and anyone can call it.

Deploying made the next question worth testing: What does MCP actually improve in the quality of the response, compared with asking the same model without MCP tools or with plain web search?

In the next chapter, I build a second server based on another CHAOSS practitioner guide and metrics set, Responsiveness, and its open knowledge. Then I run an experiment inspired by a community peer at CHAOSS, asking the same question to the same model in three different ways: with my MCP tools, without my MCP tools, and with plain web search.
