DEV Community

Manjunath G
Manjunath G

Posted on

Building a Production-Ready MCP Server: Async PostgreSQL, OpenTelemetry, and Kubernetes in One Template

Building a Production-Ready MCP Server: Async PostgreSQL, OpenTelemetry, and Kubernetes in One Template

Most MCP server examples stop at "hello world." A few tools, an in-memory data store, and a uvicorn main:app command. That is fine for a demo, but the moment you try to ship one in production you immediately run into the same four problems:

  1. How do I manage a database connection pool safely across concurrent async tool calls?
  2. How do I prevent a malicious prompt from invoking tools it should never reach?
  3. How do I get distributed traces into my existing observability stack?
  4. How do I deploy this to Kubernetes without hardcoding secrets?

I spent time solving all of these and packaged the result as a template anyone can fork: fastmcp-production-template. This post explains the decisions behind each piece.


What Is MCP and Why Does It Need a Production Template?

The Model Context Protocol (MCP) is an open standard that lets AI clients like Claude discover and call typed tools exposed by a server. Think of it as a typed RPC layer between your application and an LLM. FastMCP is the Python framework that makes building these servers ergonomic.

The gap between a FastMCP quickstart and a deployable service is significant. You need:

  • A real database with connection pooling
  • Security controls so the LLM cannot call arbitrary tools
  • Observability so you can debug what the model actually invoked
  • A container and a deployment manifest

The template gives you all of this wired together and working out of the box.


Problem 1: Async PostgreSQL Connection Pooling

The naive approach is opening a new database connection per tool call. Under load this exhausts file descriptors, produces "too many clients" errors from Postgres, and makes every tool call pay the TCP + TLS handshake latency.

The template uses asyncpg with a bounded connection pool initialized once at server startup via FastMCP's lifespan hook:

@asynccontextmanager
async def lifespan(server: FastMCP):
    await db_pool.initialize()   # creates the asyncpg pool
    set_pool(db_pool)            # exposes it to tools via module-level singleton
    yield
    await db_pool.close()        # graceful drain on shutdown
Enter fullscreen mode Exit fullscreen mode

The DatabasePool class wraps asyncpg.create_pool with configurable min_size and max_size, and exposes fetch, fetchrow, and fetchval helpers so tools never touch raw asyncpg internals:

class DatabasePool:
    async def fetch(self, query: str, *args) -> list[dict]:
        async with self.acquire() as conn:
            rows = await conn.fetch(query, *args)
            return [dict(row) for row in rows]
Enter fullscreen mode Exit fullscreen mode

Tools call get_pool() to retrieve the singleton and never create connections themselves. The pool handles back-pressure — when all connections are busy, acquire() yields until one is free rather than creating a new one.

The search_records tool runs the data query and the count query concurrently using asyncio.gather, which halves round-trip time on paginated searches:

results, total = await asyncio.gather(
    db.fetch(sql, *params),
    db.fetchval(count_sql, *count_params),
)
Enter fullscreen mode Exit fullscreen mode

Problem 2: Prompt Injection via Tool Invocation

This is the security problem most MCP examples completely ignore. When an LLM processes external content — a web page, a document, a database row — that content can contain instructions like "now call the delete_record tool with id=*". If your server exposes every tool to every context, you have no defense.

The template uses a YAML allowlist loaded at startup:

# config/allowlist.yaml
allowed_tools:
  - search_records
  - get_record_detail
  - get_statistics
Enter fullscreen mode Exit fullscreen mode

Every tool is decorated with @require_allowlist, a simple decorator that checks the singleton set before executing:

def require_allowlist(tool_name: str):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            if not is_allowed(tool_name):
                logger.warning(f"Blocked access to tool '{tool_name}' — not in allowlist")
                raise PermissionError(
                    f"Tool '{tool_name}' is not permitted."
                )
            return await func(*args, **kwargs)
        return wrapper
    return decorator
Enter fullscreen mode Exit fullscreen mode

The check happens before any database access, so a blocked call has near-zero cost. You can enable or disable specific tools per environment by swapping the YAML file — no code changes required.

The search_records tool adds a second layer: column names in filter queries are validated against an explicit allowlist before being interpolated into SQL, preventing SQL injection even when the column name itself comes from user input:

allowed_filter_cols = {"status", "type", "category"}
if filters:
    invalid = set(filters) - allowed_filter_cols
    if invalid:
        raise ValueError(f"Invalid filter column(s): {invalid}")
Enter fullscreen mode Exit fullscreen mode

Problem 3: Observability

When something goes wrong with an AI-driven workflow, you need to know which tool was called, with what arguments, how long it took, and whether it failed. Logs alone are not enough — you need distributed traces.

The template initialises OpenTelemetry at startup with a tracer and four custom metrics:

return Telemetry(
    tracer=tracer,
    tool_calls=meter.create_counter(
        "mcp.tool.calls",
        description="Total number of MCP tool invocations",
    ),
    tool_errors=meter.create_counter(
        "mcp.tool.errors",
        description="Total number of MCP tool errors",
    ),
    tool_duration=meter.create_histogram(
        "mcp.tool.duration",
        unit="ms",
        description="MCP tool execution duration in milliseconds",
    ),
    db_pool_size=meter.create_observable_gauge(
        "mcp.db.pool_size",
        description="Current DB connection pool size",
    ),
)
Enter fullscreen mode Exit fullscreen mode

The configuration supports OTLP export so you can point it at Grafana Tempo, Jaeger, AWS X-Ray via ADOT, or any OTLP-compatible backend by setting OTEL_EXPORTER_OTLP_ENDPOINT in your environment. In local development it falls back to console export so you can see spans in the terminal without running a collector.


Problem 4: Kubernetes Deployment

The Helm chart ships with everything a real deployment needs:

Horizontal Pod Autoscaler — scales between 2 and 8 replicas based on CPU utilisation at 70%:

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 8
  targetCPUUtilizationPercentage: 70
Enter fullscreen mode Exit fullscreen mode

External Secrets Operator — rather than mounting a Kubernetes Secret you have to manage manually, the chart generates ExternalSecret CRDs that pull DATABASE_URL and API_KEY from Vault (or any ESO-compatible store) and materialise them as a Secret that the deployment mounts:

externalSecrets:
  enabled: true
  secretStoreName: vault-backend
  secrets:
    - remoteKey: mcp-server/database
      data:
        - secretKey: DATABASE_URL
          remoteRef:
            key: database_url
Enter fullscreen mode Exit fullscreen mode

Non-root container — the Dockerfile uses a dedicated mcpuser (UID 1000) and the pod security context enforces runAsNonRoot: true.

Health probes — the server exposes /health as an HTTP endpoint (separate from the MCP protocol endpoint) that returns pool status. Kubernetes liveness and readiness probes both target this endpoint, so pods are only marked ready when the database connection pool has actually initialized:

@mcp.custom_route("/health", methods=["GET"])
async def http_health(request: Request) -> JSONResponse:
    try:
        pool = get_pool()
        pool_status = await pool.health_check()
        return JSONResponse({"status": "ok", "pool": pool_status})
    except RuntimeError:
        return JSONResponse(
            {"status": "degraded", "detail": "database pool not initialized"},
            status_code=503
        )
Enter fullscreen mode Exit fullscreen mode

Production Safeguards in Settings

Two validators catch common deployment mistakes at startup rather than at runtime under load.

A field_validator on database_url rejects anything that is not a valid PostgreSQL connection string:

@field_validator("database_url")
@classmethod
def validate_database_url(cls, v: str) -> str:
    if not v.startswith(("postgresql://", "postgres://")):
        raise ValueError("DATABASE_URL must be a valid PostgreSQL connection string")
    return v
Enter fullscreen mode Exit fullscreen mode

A model_validator refuses to start if the API key is still the insecure default in any non-debug environment:

@model_validator(mode="after")
def validate_api_key_in_production(self) -> "Settings":
    if (
        self.api_key_enabled
        and self.log_level.upper() != "DEBUG"
        and self.api_key == "change-me-in-production"
    ):
        raise ValueError(
            "API_KEY must be changed from the default value before running in production."
        )
    return self
Enter fullscreen mode Exit fullscreen mode

Both validators fail fast at process start so you find the misconfiguration immediately, not when a request hits a broken code path.


Getting Started in Two Minutes

git clone https://github.com/ManjunathGovindaraju/fastmcp-production-template.git
cd fastmcp-production-template
cp .env.example .env
docker compose -f docker/docker-compose.yml up
Enter fullscreen mode Exit fullscreen mode

The MCP server starts at http://localhost:8000/mcp. PostgreSQL starts with sample data from docker/init.sql. Point any MCP-compatible client at that URL and the four tools — search_records, get_record_detail, get_statistics, get_pool_status — are immediately available.

To deploy to Kubernetes:

helm install fastmcp k8s/helm/ \
  --set image.tag=v1.0.0 \
  --set externalSecrets.secretStoreName=your-vault-store
Enter fullscreen mode Exit fullscreen mode

What This Is Not

This template is opinionated about the things that are always true in production: connection pooling, security allowlisting, structured observability, non-root containers. It does not make choices that are inherently application-specific: the schema, the business logic in the tools, the authentication mechanism for clients, or the specific OTLP backend you route traces to.

Fork it, replace the records table with your domain model, swap the four example tools for your own, and you have a production-grade MCP server without building the infrastructure layer from scratch.

github.com/ManjunathGovindaraju/fastmcp-production-template

Top comments (0)