<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chris Burns</title>
    <description>The latest articles on DEV Community by Chris Burns (@chrisjburns).</description>
    <link>https://dev.to/chrisjburns</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3060357%2Fe1d31d63-a3fc-4a93-9967-420a4f27f48e.png</url>
      <title>DEV Community: Chris Burns</title>
      <link>https://dev.to/chrisjburns</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chrisjburns"/>
    <language>en</language>
    <item>
      <title>From Black Box to Observable: Deploying ToolHive with OTel + Prometheus in Kubernetes</title>
      <dc:creator>Chris Burns</dc:creator>
      <pubDate>Tue, 30 Sep 2025 17:49:19 +0000</pubDate>
      <link>https://dev.to/stacklok/from-black-box-to-observable-deploying-toolhive-with-otel-prometheus-in-kubernetes-lhg</link>
      <guid>https://dev.to/stacklok/from-black-box-to-observable-deploying-toolhive-with-otel-prometheus-in-kubernetes-lhg</guid>
      <description>&lt;p&gt;In our previous two posts, we laid the groundwork for modern Kubernetes observability. We explored why OpenTelemetry (OTel) and Prometheus work best in tandem, and how ToolHive helps bridge the observability gap for Model Context Protocol (MCP) servers that rarely expose their own usage metrics.&lt;/p&gt;

&lt;p&gt;Now it's time to get hands-on. ToolHive sits in front of your MCP servers, collecting vital usage statistics and feeding them directly into your existing observability stack. In this tutorial, we'll walk through deploying ToolHive in a Kubernetes cluster alongside OTel, Prometheus, and Grafana. By the end, you'll have transformed your black-box MCP workloads into observable services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites: Kubernetes, Helm, kubectl
&lt;/h2&gt;

&lt;p&gt;Before we begin, you'll need the following tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A Kubernetes cluster&lt;/strong&gt;: Any cluster will do. For this tutorial, we're using a local cluster created with &lt;a href="https://kind.sigs.k8s.io/" rel="noopener noreferrer"&gt;kind&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Helm 3&lt;/strong&gt;: The package manager for Kubernetes, making it easy to deploy complex applications like monitoring stacks
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;kubectl&lt;/strong&gt;: The command-line tool for interacting with your Kubernetes cluster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're using kind, create a cluster with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kind create cluster &lt;span class="nt"&gt;--name&lt;/span&gt; toolhive-demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Export the cluster's kubeconfig to a file called &lt;code&gt;kconfig.yaml&lt;/code&gt; so that subsequent &lt;code&gt;kubectl&lt;/code&gt; and &lt;code&gt;helm&lt;/code&gt; commands don't conflict with any existing clusters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kind get kubeconfig &lt;span class="nt"&gt;--name&lt;/span&gt; toolhive-demo &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; kconfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify your cluster is ready:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl cluster-info &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
Kubernetes control plane is running at https://127.0.0.1:55371
CoreDNS is running at https://127.0.0.1:55371/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use &lt;span class="s1"&gt;'kubectl cluster-info dump'&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;


kubectl get nodes
NAME                          STATUS   ROLES           AGE   VERSION
toolhive-demo-control-plane   Ready    control-plane   14m   v1.33.1

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see your cluster responding and nodes in a Ready state.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing Prometheus + Grafana
&lt;/h2&gt;

&lt;p&gt;We'll start by setting up our monitoring backbone using the &lt;a href="https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/README.md" rel="noopener noreferrer"&gt;kube-prometheus-stack&lt;/a&gt; Helm chart. This comprehensive solution deploys Prometheus for metric collection and Grafana for visualization, with everything pre-configured to work together out of the box.&lt;/p&gt;

&lt;p&gt;First, add the Prometheus community Helm repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a dedicated namespace for our monitoring components and install the stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; kube-prometheus-stack prometheus-community/kube-prometheus-stack &lt;span class="nt"&gt;--version&lt;/span&gt; 77.12.0 &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/otel/prometheus-stack-values.yaml &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The values file from the ToolHive repository is specifically configured for this tutorial's architecture and sets up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus with scrape jobs ready to pull metrics from our OTel collector
&lt;/li&gt;
&lt;li&gt;Grafana with admin credentials (admin:admin for testing)&lt;/li&gt;
&lt;/ul&gt;
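&lt;p&gt;For orientation, the relevant pieces of a values file like this typically take the following shape. This is a sketch rather than the actual file contents; the collector service name and port are assumptions (8889 is the conventional port for the collector's Prometheus exporter):&lt;/p&gt;

```yaml
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      # Scrape job pointing at the OTel collector's Prometheus exporter
      - job_name: otel-collector
        static_configs:
          - targets: ["otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:8889"]

grafana:
  # Test-only credentials, as noted above
  adminPassword: admin
```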

&lt;p&gt;Wait for all components to be ready:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
NAME                                                        READY   STATUS    RESTARTS   AGE
kube-prometheus-stack-grafana-6c5cb68857-4hmw4              3/3     Running   0          30s
kube-prometheus-stack-kube-state-metrics-557fd457c6-c489z   1/1     Running   0          30s
kube-prometheus-stack-operator-7c6d8c4dc7-j2g24             1/1     Running   0          30s
kube-prometheus-stack-prometheus-node-exporter-t5q9w        1/1     Running   0          30s
prometheus-kube-prometheus-stack-prometheus-0               2/2     Running   0          30s

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All pods should show Running or Completed status.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing the OTel Collector
&lt;/h2&gt;

&lt;p&gt;With our monitoring backend in place, we'll deploy the OpenTelemetry collector. The collector is a crucial component that receives metrics and traces from ToolHive and makes them available to Prometheus and tracing backends.&lt;/p&gt;

&lt;p&gt;Add the OpenTelemetry Helm repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install the collector using ToolHive's specialized values file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; otel-collector open-telemetry/opentelemetry-collector  &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/otel/otel-values.yaml &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This values file configures the collector to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the &lt;strong&gt;OTLP receiver&lt;/strong&gt; to accept metrics and traces pushed from ToolHive
&lt;/li&gt;
&lt;li&gt;Enable the &lt;strong&gt;kubeletstats receiver&lt;/strong&gt;, providing valuable runtime metrics about containers and nodes that native OTel libraries sometimes miss
&lt;/li&gt;
&lt;li&gt;Enable the &lt;strong&gt;Kubernetes attributes processor&lt;/strong&gt; to add pod/namespace context to telemetry
&lt;/li&gt;
&lt;li&gt;Enable the &lt;strong&gt;Prometheus exporter&lt;/strong&gt;, making collected metrics available for Prometheus scraping
&lt;/li&gt;
&lt;li&gt;Configure &lt;strong&gt;service pipelines&lt;/strong&gt; that route metrics and traces appropriately&lt;/li&gt;
&lt;/ul&gt;
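&lt;p&gt;Put together, the collector configuration in that values file follows the standard receivers/processors/exporters/pipelines shape. The sketch below is illustrative rather than a copy of the file; endpoints and exporter names are assumptions:&lt;/p&gt;

```yaml
receivers:
  otlp:
    protocols:
      http: {}            # ToolHive pushes metrics and traces here (port 4318)
  kubeletstats: {}        # container/node runtime metrics from the kubelet
processors:
  k8sattributes: {}       # enrich telemetry with pod/namespace metadata
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889   # scraped by Prometheus
  otlp/jaeger:
    endpoint: jaeger-collector.monitoring.svc.cluster.local:4317
service:
  pipelines:
    metrics:
      receivers: [otlp, kubeletstats]
      processors: [k8sattributes]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      processors: [k8sattributes]
      exporters: [otlp/jaeger]
```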

&lt;p&gt;Verify the collector is running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring &lt;span class="nt"&gt;-l&lt;/span&gt; app.kubernetes.io/name&lt;span class="o"&gt;=&lt;/span&gt;opentelemetry-collector &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
NAME                                                 READY   STATUS    RESTARTS   AGE
otel-collector-opentelemetry-collector-agent-g5crz   1/1     Running   0          33s

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Installing Jaeger Backend for Trace Querying
&lt;/h2&gt;

&lt;p&gt;Add the Jaeger Helm repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install Jaeger using ToolHive's specialized values file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; jaeger-all-in-one jaegertracing/jaeger &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/otel/jaeger-values.yaml &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Deploying ToolHive Operator and MCP Server
&lt;/h2&gt;

&lt;p&gt;Now that our observability stack is ready, we can deploy the ToolHive operator. The operator is a Kubernetes-native tool that simplifies the management and deployment of MCP servers with built-in observability.&lt;/p&gt;

&lt;p&gt;Install the CRDs (Custom Resource Definitions) and the operator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm upgrade &lt;span class="nt"&gt;--install&lt;/span&gt; toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds &lt;span class="nt"&gt;--version&lt;/span&gt; 0.0.27 &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml

helm upgrade &lt;span class="nt"&gt;--install&lt;/span&gt; toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="nt"&gt;--version&lt;/span&gt; 0.2.18 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait for the operator to be ready:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml

NAME                                READY   STATUS    RESTARTS   AGE
toolhive-operator-95b55b47d-pbqlh   1/1     Running   0          31s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's deploy a sample MCP server. We'll use &lt;a href="https://github.com/StacklokLabs/gofetch" rel="noopener noreferrer"&gt;gofetch&lt;/a&gt;, a simple MCP server that provides web scraping capabilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/operator/mcp-servers/mcpserver_fetch_otel.yaml &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This MCPServer custom resource automatically configures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The MCP server container (gofetch)
&lt;/li&gt;
&lt;li&gt;ToolHive proxy for client access and observability
&lt;/li&gt;
&lt;li&gt;Telemetry settings for OTel integration
&lt;/li&gt;
&lt;li&gt;Service configuration for client access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's examine the key telemetry configuration in the MCPServer resource:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spec:
  telemetry:
    openTelemetry:
      enabled: true
      endpoint: otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4318
      serviceName: mcp-fetch-server
      metrics:
        enabled: true
      tracing:
        enabled: true
        samplingRate: "1.0"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells ToolHive to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable OpenTelemetry export for both metrics and traces
&lt;/li&gt;
&lt;li&gt;Send telemetry to our OTel collector over OTLP/HTTP (the endpoint's port 4318 is the OTLP/HTTP port; gRPC would use 4317)
&lt;/li&gt;
&lt;li&gt;Include metrics about MCP operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Verify everything is running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
NAME                                READY   STATUS    RESTARTS   AGE
fetch-0                             1/1     Running   0          2m59s
fetch-7d988cbd46-cqdzq              1/1     Running   0          3m4s
toolhive-operator-95b55b47d-pbqlh   1/1     Running   0          3m31s


kubectl get mcpserver &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
NAME    STATUS    URL                                                             AGE
fetch   Running   http://mcp-fetch-proxy.toolhive-system.svc.cluster.local:8080   2m27s

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see both the MCP server pod and the MCPServer custom resource showing as ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generating Traffic to Produce Metrics and Traces
&lt;/h2&gt;

&lt;p&gt;To see metrics in action, we need to generate some traffic. Since the MCP server is running inside the cluster, we'll use kubectl port-forward to expose it locally.&lt;/p&gt;

&lt;p&gt;Port-forward the ToolHive proxy service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl port-forward &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system service/mcp-fetch-proxy &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml 8080:8080
Forwarding from 127.0.0.1:8080 -&amp;gt; 8080
Forwarding from &lt;span class="o"&gt;[&lt;/span&gt;::1]:8080 -&amp;gt; 8080

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a new terminal, initialize an MCP session to get a session ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;SESSION_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-D&lt;/span&gt; /dev/stderr &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"http://localhost:8080/mcp"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Accept: application/json, text/event-stream"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Mcp-Protocol-Version: 2025-06-18"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2025-06-18",
      "capabilities": {},
      "clientInfo": {
        "name": "curl-client",
        "version": "1.0.0"
      }
    }
  }'&lt;/span&gt; 2&amp;gt;&amp;amp;1 &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/dev/null | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"Mcp-Session-Id:"&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="nt"&gt;-f2&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'\r'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Session ID: &lt;/span&gt;&lt;span class="nv"&gt;$SESSION_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
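&lt;p&gt;If you want to see what that extraction pipeline does before pointing it at a live server, you can run it against a canned header dump. The session ID here is made up purely for illustration:&lt;/p&gt;

```shell
# A canned copy of the response headers from the initialize call
# (real responses use CRLF line endings, which is why the pipeline
# strips the trailing carriage return with tr)
headers='HTTP/1.1 200 OK
Content-Type: text/event-stream
Mcp-Session-Id: abc123def456'

# Same pipeline as above: isolate the header line, take the value, strip any CR
SESSION_ID=$(printf '%s' "$headers" | grep "Mcp-Session-Id:" | cut -d' ' -f2 | tr -d '\r')
echo "Session ID: $SESSION_ID"
```

&lt;p&gt;This prints &lt;code&gt;Session ID: abc123def456&lt;/code&gt;, confirming the pipeline isolates just the header value.&lt;/p&gt;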



&lt;p&gt;Now use the session ID to make tool call requests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"http://localhost:8080/mcp"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Accept: application/json, text/event-stream"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Mcp-Protocol-Version: 2025-06-18"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Mcp-Session-Id: &lt;/span&gt;&lt;span class="nv"&gt;$SESSION_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "fetch",
      "arguments": {
        "url": "https://github.com/stacklok/toolhive",
        "max_length": 100,
        "raw": false
      }
    }
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response should look something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;event:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;message&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;id:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;D&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="err"&gt;AVERIINDK&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;ZTBYUKUHM&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="err"&gt;PCV&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="err"&gt;_&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;data:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"jsonrpc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"2.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:[{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"![ToolHive Studio logo](/stacklok/toolhive/raw/main/docs/images/toolhive-icon-1024.png)![ToolHive wo&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;[Content truncated. Use start_index to get more content.]"&lt;/span&gt;&lt;span class="p"&gt;}]}}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repeat this request 10-15 times with different URLs to generate meaningful traffic for our metrics dashboards.&lt;/p&gt;
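&lt;p&gt;A small loop makes this easy. This sketch assumes the port-forward is still running and reuses the &lt;code&gt;SESSION_ID&lt;/code&gt; from the previous step; the URL list is arbitrary:&lt;/p&gt;

```shell
# A few arbitrary pages to fetch through the MCP server
URLS="https://github.com/stacklok/toolhive https://docs.stacklok.com https://dev.to"

n=0
for url in $URLS; do
  # Fire one tools/call request per URL; errors are ignored so the loop
  # keeps going even if a single fetch fails
  curl -s -o /dev/null --max-time 5 \
    -X POST "http://localhost:8080/mcp" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -H "Mcp-Protocol-Version: 2025-06-18" \
    -H "Mcp-Session-Id: $SESSION_ID" \
    -d '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"fetch","arguments":{"url":"'"$url"'","max_length":100,"raw":false}}}' || true
  n=$((n+1))
done
echo "Sent $n requests"
```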

&lt;p&gt;Each request generates telemetry data that ToolHive captures and forwards to the OTel collector.&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualizing Metrics in Grafana
&lt;/h2&gt;

&lt;p&gt;Now for the exciting part: visualizing our metrics! Since the kube-prometheus-stack automatically deploys Grafana, we just need to expose it locally.&lt;/p&gt;

&lt;p&gt;Port-forward the Grafana service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl port-forward &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring service/kube-prometheus-stack-grafana 3000:80 &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to &lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt; and log in with the default credentials:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Username: &lt;code&gt;admin&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Password: &lt;code&gt;admin&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Importing the ToolHive Dashboard
&lt;/h3&gt;

&lt;p&gt;We've created a starter dashboard to help you visualize some simple MCP server metrics. To import it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click the "&lt;strong&gt;+&lt;/strong&gt;" icon in the top-right of the Grafana UI&lt;/li&gt;
&lt;li&gt;Select "&lt;strong&gt;Import dashboard&lt;/strong&gt;"
&lt;/li&gt;
&lt;li&gt;In the "&lt;strong&gt;Import via panel JSON&lt;/strong&gt;" text box, paste the contents from an &lt;a href="https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/otel/grafana-dashboards/toolhive-mcp-grafana-dashboard-otel-scrape.json" rel="noopener noreferrer"&gt;example dashboard&lt;/a&gt; that we've created &lt;/li&gt;
&lt;li&gt;Click "&lt;strong&gt;Load&lt;/strong&gt;" and then "&lt;strong&gt;Import&lt;/strong&gt;"

&lt;ul&gt;
&lt;li&gt;You may need to shorten the dashboard UID in the JSON; Grafana sometimes rejects imports with long UIDs&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After importing, you'll see panels populated with real-time data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdi94bxbu2hpyausrj33.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdi94bxbu2hpyausrj33.png" alt=" " width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Exploring Metrics with PromQL
&lt;/h3&gt;

&lt;p&gt;You can also explore metrics directly with Prometheus queries. Go to "Explore" in Grafana and try queries against the custom ToolHive &lt;a href="https://docs.stacklok.com/toolhive/concepts/observability#metrics-collection" rel="noopener noreferrer"&gt;metrics&lt;/a&gt;.&lt;/p&gt;
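&lt;p&gt;As a starting point, the queries below show common PromQL patterns. The metric names are assumptions for illustration; check the linked metrics documentation for the exact names ToolHive emits:&lt;/p&gt;

```text
# Request rate per MCP server over the last 5 minutes
# (assumes a counter named toolhive_mcp_requests)
sum(rate(toolhive_mcp_requests[5m])) by (server)

# 95th-percentile tool call latency
# (assumes a duration histogram named toolhive_mcp_request_duration_bucket)
histogram_quantile(0.95, sum(rate(toolhive_mcp_request_duration_bucket[5m])) by (le))
```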

&lt;p&gt;This is powerful because it uses the same tools and dashboards you already use for your other workloads, bringing your MCP servers into the fold of your existing observability practice.&lt;/p&gt;

&lt;p&gt;Note that CPU and memory metrics come from the OTel collector's kubeletstats receiver rather than directly from the Go application, providing more comprehensive resource monitoring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualizing Tracing in Grafana
&lt;/h2&gt;

&lt;p&gt;We can also explore traces in Grafana that the ToolHive ProxyRunner reported to the OTel collector, which forwarded them on to the Jaeger backend.&lt;/p&gt;

&lt;p&gt;To do this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to "&lt;strong&gt;Explore&lt;/strong&gt;" on the side menu
&lt;/li&gt;
&lt;li&gt;Ensure the Jaeger Data source is selected&lt;/li&gt;
&lt;li&gt;Click “&lt;strong&gt;Search&lt;/strong&gt;” instead of “&lt;strong&gt;TraceID&lt;/strong&gt;”
&lt;/li&gt;
&lt;li&gt;Select the “&lt;strong&gt;Service Name&lt;/strong&gt;” dropdown and you should see the MCP server name. Select it and click “&lt;strong&gt;Run Query&lt;/strong&gt;”
&lt;/li&gt;
&lt;li&gt;Several traces should appear; click into one and you should see the single span reported by the ProxyRunner&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftybymer3p8mbhrtwi2w8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftybymer3p8mbhrtwi2w8.png" alt=" " width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Congratulations! You've successfully deployed a complete observability pipeline for MCP workloads and queried their metrics and traces. You've transformed a black-box service into a transparent, observable part of your system. This setup demonstrates Architecture 1 from our previous post: ToolHive pushes telemetry to an OTel collector, which Prometheus scrapes for metrics while traces flow to your tracing backend.&lt;/p&gt;

&lt;h2&gt;
  
  
  Call to Action: Contribute, Share Feedback, Explore Docs
&lt;/h2&gt;

&lt;p&gt;By now, you've seen how ToolHive integrates seamlessly with OTel and Prometheus to make MCP workloads observable inside Kubernetes. With Prometheus scraping metrics, OTel collecting richer signals, and Grafana visualizing results, you've got a practical foundation for monitoring MCP servers.&lt;/p&gt;

&lt;p&gt;The setup you've deployed represents just the beginning of what's possible with MCP observability. We encourage you to:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try Different MCP Servers&lt;/strong&gt;: Deploy other MCP servers and see how they behave differently in your dashboards. Each server type may expose different usage patterns and performance characteristics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Share Your Experience&lt;/strong&gt;: Join our community discussions on &lt;a href="https://github.com/stacklok/toolhive" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; or &lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; to share what you've learned and help improve ToolHive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contribute Back&lt;/strong&gt;: Found issues or have ideas for improvements, such as new custom metrics? The project team welcomes contributions, whether they're bug reports, feature requests, or code.&lt;/p&gt;

&lt;p&gt;If you missed the earlier posts in this series, be sure to check out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Post 1&lt;/strong&gt;: &lt;a href="https://dev.to/stacklok/the-next-big-observability-gap-for-kubernetes-is-mcp-servers-421d"&gt;The Next Observability Challenge: OTel, Prometheus, and MCP Servers in Kubernetes&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Post 2&lt;/strong&gt;: &lt;a href="https://dev.to/stacklok/bridging-the-observability-gap-in-mcp-servers-with-toolhive-3827"&gt;Bridging the Observability Gap in MCP Servers with ToolHive&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The story doesn't stop here. We'll continue exploring advanced observability features and new integrations as the MCP ecosystem evolves. The foundation you've built today will serve you well as both ToolHive and the broader observability landscape continue to mature.  &lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>monitoring</category>
      <category>tooling</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Bridging the Observability Gap in MCP Servers with ToolHive</title>
      <dc:creator>Chris Burns</dc:creator>
      <pubDate>Thu, 25 Sep 2025 16:16:38 +0000</pubDate>
      <link>https://dev.to/stacklok/bridging-the-observability-gap-in-mcp-servers-with-toolhive-3827</link>
      <guid>https://dev.to/stacklok/bridging-the-observability-gap-in-mcp-servers-with-toolhive-3827</guid>
      <description>&lt;p&gt;In our previous post, we explored why Kubernetes observability requires both OpenTelemetry (OTel) and Prometheus. Together, they form a powerful foundation for monitoring modern workloads, but only when those workloads expose telemetry. What happens when they don't?&lt;/p&gt;

&lt;p&gt;That's exactly the case with many Model Context Protocol (MCP) servers. These servers run critical workloads but rarely expose metrics or integrate with observability frameworks. For operations teams, they behave like black boxes; you see requests going in and responses coming out, but nothing about what's happening inside.&lt;/p&gt;

&lt;p&gt;ToolHive was built to reduce this gap. Running natively inside Kubernetes, ToolHive acts as an intelligent proxy that collects usage statistics from MCP servers without requiring any modifications to the servers themselves. It then seamlessly feeds that data into your existing OTel + Prometheus stack, giving you the same dashboards, alerts, and reliability insights you rely on for other workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recap: The MCP Observability Problem
&lt;/h2&gt;

&lt;p&gt;The core issue is a mismatch between modern observability standards and the operational reality of many MCP servers. While Prometheus expects to scrape a &lt;code&gt;/metrics&lt;/code&gt; endpoint and OTel expects data to be pushed from instrumented applications, many MCP servers do neither. They are designed for a single purpose: providing specialized capabilities to AI systems by bridging models to the real world. Operational telemetry is often an afterthought, if it's considered at all.&lt;/p&gt;

&lt;p&gt;This lack of metrics makes it impossible to answer basic but critical questions: How many requests is my MCP server handling per second? What is the average latency of tool calls? Is the server experiencing errors or timeouts? How much CPU and memory is the server consuming?&lt;/p&gt;

&lt;p&gt;Without this data, you're flying blind, unable to optimize performance, troubleshoot issues, or ensure the reliability of your AI-powered applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  How ToolHive Collects Metrics
&lt;/h2&gt;

&lt;p&gt;ToolHive's approach is straightforward: instead of relying on MCP servers to expose their own metrics, it wraps them and acts as an intermediary for all client requests. ToolHive runs alongside MCP servers in Kubernetes and observes their activity directly at the orchestration layer.&lt;/p&gt;

&lt;p&gt;As requests and responses flow through ToolHive, it observes and records key operational data points: request counts and rates, response latency and duration, error codes and status, and tool usage statistics. It also generates distributed traces for each MCP interaction, providing end-to-end visibility into request flows. By intercepting request and usage data, ToolHive can measure request volumes, latencies, and error rates while attributing metrics to specific MCP servers or workloads.&lt;/p&gt;

&lt;p&gt;This approach decouples observability from the MCP server itself. Zero server modification means existing MCP servers work immediately without code changes or additional dependencies. Protocol awareness allows ToolHive to understand MCP-specific operations like tool calls and resource requests, providing metrics that generic proxies couldn't capture. Kubernetes native deployment means it integrates naturally with service discovery and scaling patterns.&lt;/p&gt;

&lt;p&gt;Since ToolHive is built with OTel and Prometheus in mind, it generates and exposes both metrics and traces in formats your existing monitoring stack can consume immediately, normalising data into standard OTel and Prometheus formats.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four Supported Architectures
&lt;/h2&gt;

&lt;p&gt;ToolHive is designed for flexibility and can integrate into a variety of observability setups. It supports four primary architectures for feeding data to your pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture 1 (Recommended): ToolHive → OTel Collector ← Prometheus&lt;/strong&gt; ToolHive pushes both metrics and traces to an OpenTelemetry collector using OTLP (OpenTelemetry Protocol). The collector exposes a &lt;code&gt;/metrics&lt;/code&gt; endpoint that Prometheus scrapes for metrics data, while traces are exported to your tracing backend (like Jaeger or Tempo). This is a robust and scalable architecture that centralizes data collection and processing while leveraging the pull-based reliability of Prometheus.&lt;/p&gt;
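&lt;p&gt;As a rough sketch, an OTel Collector configuration for this architecture might look like the following (the endpoints and the Tempo backend address are illustrative assumptions, not ToolHive defaults):&lt;/p&gt;

```yaml
receivers:
  otlp:                       # ToolHive pushes metrics and traces here via OTLP
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  prometheus:                 # exposes a /metrics endpoint for Prometheus to scrape
    endpoint: 0.0.0.0:8889
  otlp/traces:                # forwards traces to a tracing backend (e.g. Tempo)
    endpoint: tempo.monitoring.svc.cluster.local:4317

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      exporters: [otlp/traces]
```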

&lt;p&gt;&lt;strong&gt;Architecture 2: ToolHive → OTel Collector → Prometheus (RemoteWrite)&lt;/strong&gt; Similar to Architecture 1, ToolHive pushes metrics and traces to the OTel collector. The collector exports traces to your tracing backend and uses the Prometheus RemoteWrite exporter to push metrics directly to the Prometheus server. This reduces scraping overhead but can lose data if Prometheus is unavailable when the push occurs.&lt;/p&gt;
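&lt;p&gt;A hedged sketch of the collector's metrics pipeline for this variant (the Prometheus address is illustrative; this exporter ships in the collector-contrib distribution, and Prometheus must be started with its remote-write receiver enabled):&lt;/p&gt;

```yaml
exporters:
  prometheusremotewrite:
    endpoint: http://prometheus.monitoring.svc.cluster.local:9090/api/v1/write

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
```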

&lt;p&gt;&lt;strong&gt;Architecture 3: ToolHive ← Prometheus (Direct Scrape)&lt;/strong&gt; ToolHive exposes its own &lt;code&gt;/metrics&lt;/code&gt; endpoint for Prometheus scraping, while traces are still pushed to an OTel collector for export to tracing backends. This is the simplest setup for metrics collection but requires separate configuration for trace export.&lt;/p&gt;
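&lt;p&gt;For this setup, Prometheus scrapes ToolHive directly. A minimal annotation-based scrape configuration might look like this (the annotation convention is a common Kubernetes pattern, not something ToolHive mandates):&lt;/p&gt;

```yaml
scrape_configs:
  - job_name: toolhive
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # keep only pods that opt in via the prometheus.io/scrape annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```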

&lt;p&gt;&lt;strong&gt;Architecture 4: Hybrid&lt;/strong&gt; This approach maximizes flexibility: ToolHive pushes traces to an OTel collector (which exports to tracing backends) while exposing a &lt;code&gt;/metrics&lt;/code&gt; endpoint that Prometheus scrapes directly. This provides full observability coverage but adds operational complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why We Recommend Architecture 1
&lt;/h2&gt;

&lt;p&gt;While all four architectures are valid, Architecture 1 represents the best practice for most modern Kubernetes environments, offering several key advantages:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Centralization and Standardization&lt;/strong&gt;: It centralizes both metrics and traces in a single pipeline, making it easier to manage, enrich, and route to different backends. For organizations already using OTel collectors for other services, this architecture maintains consistency across the monitoring stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reliability&lt;/strong&gt;: The pull-based model of Prometheus is inherently reliable for metrics. The OTel collector acts as a reliable buffer between ToolHive and both Prometheus and tracing backends, handling temporary network issues or unavailability gracefully. If Prometheus is down for maintenance, it can catch up by scraping when it comes back online, rather than losing data that would have been pushed during the outage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;: The OTel collector can process and export both metrics and traces to any number of destinations. For metrics, this includes Prometheus, long-term storage, or analytics platforms. For traces, it can route to Jaeger, Tempo, or other tracing backends. It can add labels, perform transformations, and route data to multiple backends if needed, avoiding vendor lock-in.&lt;/p&gt;

&lt;h2&gt;
  
  
  ToolHive in Action: Real Metrics and Traces From MCP Servers
&lt;/h2&gt;

&lt;p&gt;Once you've deployed ToolHive and configured it to work with your OTel + Prometheus stack, your dashboards will be populated with both metrics and traces that provide immediate visibility into MCP server operations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Request Metrics&lt;/strong&gt; include counters (&lt;code&gt;toolhive_mcp_requests_total&lt;/code&gt;) for total requests and the &lt;code&gt;toolhive_mcp_request_duration_seconds_*&lt;/code&gt; histogram, which captures p95 and p99 latency, broken down by MCP server and operation type. These help identify usage patterns and performance trends across your MCP infrastructure.&lt;/p&gt;
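&lt;p&gt;Assuming standard Prometheus histogram conventions (and an illustrative &lt;code&gt;mcp_server&lt;/code&gt; label), queries over these metrics might look like:&lt;/p&gt;

```promql
# Requests per second, per MCP server
sum by (mcp_server) (rate(toolhive_mcp_requests_total[5m]))

# p95 latency from the duration histogram's _bucket series
histogram_quantile(0.95,
  sum by (le) (rate(toolhive_mcp_request_duration_seconds_bucket[5m])))
```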

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsk8vwqjcszwx3d1n3nk2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsk8vwqjcszwx3d1n3nk2.png" alt=" " width="800" height="213"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool Usage Statistics&lt;/strong&gt; track which MCP tools are being called most frequently with &lt;code&gt;toolhive_mcp_tool_calls_total&lt;/code&gt; (counter for specific tool invocations), success rates for different tool types, and usage patterns over time. This data is invaluable for understanding how AI systems are interacting with your MCP servers and which capabilities are most critical.&lt;/p&gt;
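&lt;p&gt;A quick way to surface the most-used tools, assuming a hypothetical &lt;code&gt;tool&lt;/code&gt; label on the counter:&lt;/p&gt;

```promql
# five most frequently invoked tools over the last hour
topk(5, sum by (tool) (increase(toolhive_mcp_tool_calls_total[1h])))
```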

&lt;p&gt;&lt;strong&gt;Distributed Traces&lt;/strong&gt; show the complete journey of MCP requests, from client initiation through ToolHive processing to server response. Each trace includes timing information for different phases of the MCP interaction, making it possible to identify bottlenecks and understand request flow patterns. Traces are correlated with metrics through trace and span IDs, enabling powerful troubleshooting workflows.&lt;/p&gt;

&lt;p&gt;All metrics include standard Kubernetes labels for namespace, pod, and service, making it easy to aggregate and filter data in existing dashboards. These are the metrics and traces you need to build meaningful dashboards, set up critical alerts, and truly understand the health and performance of your MCP workloads. The observability data integrates seamlessly with alerting rules, allowing teams to set up notifications for MCP-specific issues like tool failure rates or unusual usage patterns.&lt;/p&gt;
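&lt;p&gt;As an illustration of such an alerting rule (the &lt;code&gt;status&lt;/code&gt; label and the 5% threshold are assumptions; adjust them to the labels ToolHive actually emits and your own error budget):&lt;/p&gt;

```yaml
groups:
  - name: toolhive
    rules:
      - alert: MCPHighToolFailureRate
        expr: |
          sum(rate(toolhive_mcp_tool_calls_total{status="error"}[5m]))
            / sum(rate(toolhive_mcp_tool_calls_total[5m])) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: More than 5% of MCP tool calls are failing
```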

&lt;h2&gt;
  
  
  Call to Action: Try ToolHive / Join the Community
&lt;/h2&gt;

&lt;p&gt;MCP observability gaps don't have to be a given. With &lt;a href="https://toolhive.dev" rel="noopener noreferrer"&gt;ToolHive&lt;/a&gt;, operations teams can gain critical visibility into their AI workloads without waiting for server-side changes or upstream telemetry support. By supporting multiple architectures - and recommending a best practice approach - ToolHive makes it possible to monitor MCP servers as part of a standard OTel + Prometheus pipeline.&lt;/p&gt;

&lt;p&gt;The project is actively developed and welcomes community input. Whether you're running a single MCP server or managing dozens across multiple clusters, ToolHive can provide the visibility you need to operate confidently. Please check out ToolHive and connect with us on &lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Ready to see it in action? In the next post, we'll walk through a hands-on guide to deploying ToolHive in Kubernetes, complete with Helm charts, kubectl steps, and a starter Grafana dashboard.&lt;/p&gt;

&lt;p&gt;If you missed the earlier posts in this series, be sure to check out: &lt;a href="https://dev.to/stacklok/the-next-big-observability-gap-for-kubernetes-is-mcp-servers-421d"&gt;Post 1: The Next Observability Challenge: OTel, Prometheus, and MCP Servers in Kubernetes&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>monitoring</category>
      <category>tooling</category>
    </item>
    <item>
      <title>The Next Big Observability Gap for Kubernetes is MCP Servers</title>
      <dc:creator>Chris Burns</dc:creator>
      <pubDate>Mon, 22 Sep 2025 14:56:18 +0000</pubDate>
      <link>https://dev.to/stacklok/the-next-big-observability-gap-for-kubernetes-is-mcp-servers-421d</link>
      <guid>https://dev.to/stacklok/the-next-big-observability-gap-for-kubernetes-is-mcp-servers-421d</guid>
      <description>&lt;p&gt;Kubernetes has become the de facto operating system for the cloud, empowering organizations to scale and orchestrate their workloads with unprecedented ease. But with this power comes a new set of challenges. As you break down monoliths into microservices and deploy hundreds or thousands of pods, each workload can become a potential black box. The very agility that makes Kubernetes so valuable also makes observability a monumental task.&lt;/p&gt;

&lt;p&gt;Prometheus and OpenTelemetry (OTel) emerged as the go-to tools for making these workloads observable, yet not every system plays by the same rules. A growing example is &lt;strong&gt;Model Context Protocol (MCP) servers&lt;/strong&gt;, which often don't expose metrics at all, creating blind spots in even the most sophisticated monitoring stacks. This gap highlights the next great challenge in Kubernetes observability.&lt;/p&gt;

&lt;p&gt;In this post, we'll explore the broader observability landscape, why OTel and Prometheus work best in tandem, and why MCP highlights the gaps that still remain in our quest for comprehensive system visibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Black Box Problem in Kubernetes
&lt;/h2&gt;

&lt;p&gt;In a monolithic application, you often have a single point of failure and a single application log to comb through. In a Kubernetes environment, that single application is now a distributed system of dozens or hundreds of services, each with its own logs, resource consumption patterns, and unique failure modes. A single user request might traverse multiple services, making it nearly impossible to trace without a robust observability strategy.&lt;/p&gt;

&lt;p&gt;This black box problem is amplified by several characteristics of Kubernetes environments. Ephemeral workloads mean that debugging information disappears when pods are terminated. Service mesh complexity introduces additional network hops and failure modes that aren't immediately visible. Multi-tenant clusters create resource contention that can be difficult to attribute to specific workloads. Dynamic scaling means that performance baselines are constantly shifting as replicas come and go.&lt;/p&gt;

&lt;p&gt;Traditional monitoring approaches that rely on host-level metrics and application logs quickly become inadequate. Pods come and go, workloads scale dynamically, and ephemeral containers rarely leave behind a trail. You need telemetry that can follow requests across service boundaries, survive pod restarts, and provide insights into the distributed system as a whole rather than just individual components.&lt;/p&gt;

&lt;h2&gt;
  
  
  Metrics, Logs, and Traces: A Quick Refresher
&lt;/h2&gt;

&lt;p&gt;Before we dive into the tools, let's quickly recap the three pillars of observability:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics&lt;/strong&gt; are numerical measurements collected over time, such as CPU utilization, request rate, error count, and response latency. They're perfect for dashboards and alerting but don't provide detail about individual requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logs&lt;/strong&gt; are timestamped event records that provide detailed context about what happened at specific points in time. Invaluable for debugging and auditing, but correlating across services can be challenging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traces&lt;/strong&gt; record the journey of individual requests through distributed systems, connecting operations across service boundaries to identify bottlenecks and dependencies.&lt;/p&gt;

&lt;p&gt;Each pillar serves different purposes, and effective observability strategies combine all three.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prometheus: The Metrics Powerhouse
&lt;/h2&gt;

&lt;p&gt;Prometheus has earned its place as the standard for collecting and storing metrics in Kubernetes environments, and for good reason. Its pull-based model is a perfect fit for a dynamic, ephemeral world where services appear and disappear constantly. The Prometheus server periodically scrapes metric endpoints (usually &lt;code&gt;/metrics&lt;/code&gt;) exposed by applications, making it naturally aligned with Kubernetes' service discovery mechanisms.&lt;/p&gt;

&lt;p&gt;What makes Prometheus particularly powerful in Kubernetes is its integration with platform concepts like ServiceMonitor and PodMonitor custom resources, which automatically discover services as they scale. The query language, PromQL, excels at time series analysis, making it straightforward to calculate rates, percentiles, and aggregations across multiple dimensions.&lt;/p&gt;
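&lt;p&gt;For instance, a ServiceMonitor that tells the Prometheus Operator to scrape every Service carrying a given label (names here are illustrative):&lt;/p&gt;

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: my-app        # scrape any Service with this label
  endpoints:
    - port: http-metrics # named port on the Service
      path: /metrics
      interval: 30s
```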

&lt;p&gt;However, Prometheus has limitations. It's primarily designed for metrics, so correlating with logs and traces requires additional tooling. The pull model can also miss short-lived processes or workloads that can't expose HTTP endpoints.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenTelemetry: The Unified Framework
&lt;/h2&gt;

&lt;p&gt;OpenTelemetry (OTel) provides a single, vendor-neutral framework for collecting metrics, logs, and traces. The &lt;strong&gt;OpenTelemetry SDK&lt;/strong&gt; lets developers instrument code once and export telemetry to multiple backends - traces to Jaeger, metrics to Prometheus, logs to Elasticsearch.&lt;/p&gt;

&lt;p&gt;Auto-instrumentation capabilities mean many applications gain observability without code changes. The OpenTelemetry Collector serves as a central hub for processing telemetry data, typically deployed in Kubernetes as both a DaemonSet and Deployment.&lt;/p&gt;

&lt;p&gt;What makes OTel particularly valuable is its ability to correlate telemetry across all three pillars. Trace spans can include logs as events, and metrics can be tagged with trace IDs, enabling workflows like jumping from dashboard alerts to specific failing traces.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why They're Better Together
&lt;/h2&gt;

&lt;p&gt;Prometheus and OTel work best together, each excelling where the other has limitations. Standardized instrumentation through OTel means developers can use one toolset regardless of telemetry type, while exposing metrics in formats Prometheus can scrape.&lt;/p&gt;

&lt;p&gt;Complementary strengths provide both high-level operational views (Prometheus) and detailed diagnostic capabilities (OTel). Shared infrastructure reduces overhead - the same Kubernetes service discovery works for both tools. Correlated troubleshooting becomes possible when metrics alerts include trace context, letting teams drill down from aggregate problems to specific failing requests.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP as the Case Study for Observability Gaps
&lt;/h2&gt;

&lt;p&gt;Model Context Protocol (MCP) servers exemplify this challenge. These lightweight applications provide context and tools to AI systems, handling requests and managing state - all behaviors that should be observable - yet they typically operate as complete black boxes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The core problem&lt;/strong&gt;: Many MCP servers prioritize minimal dependencies and fast startup over telemetry. They often lack &lt;code&gt;/metrics&lt;/code&gt; endpoints, don't log structured data, and can't be traced by standard tools. The protocol itself doesn't mandate observability standards, and established patterns don't exist yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world impact&lt;/strong&gt;: When AI systems behave unexpectedly, teams can't determine which MCP servers were involved. When response times increase, there's no visibility into whether bottlenecks are in the AI model, MCP server, or external systems. Even with comprehensive Prometheus and OTel stacks, MCP servers remain invisible, creating significant blind spots in otherwise well-monitored systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Teaser: In Our Next Post, We'll Explore ToolHive
&lt;/h2&gt;

&lt;p&gt;These observability challenges with MCP servers aren't insurmountable, but they require solutions that bridge the gap between emerging technologies and established monitoring practices.&lt;/p&gt;

&lt;p&gt;In our next post, we'll explore ToolHive, which aims to fill part of this specific gap by providing MCP tool usage data that most servers don't expose natively. We'll look at how it integrates with existing OTel and Prometheus infrastructure to make MCP servers observable within your current monitoring stack, and examine practical approaches for implementing observability patterns with other emerging technologies in Kubernetes environments.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>mcp</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Performance Testing MCP Servers in Kubernetes: Transport Choice is THE Make-or-Break Decision for Scaling MCP</title>
      <dc:creator>Chris Burns</dc:creator>
      <pubDate>Tue, 19 Aug 2025 13:16:31 +0000</pubDate>
      <link>https://dev.to/stacklok/performance-testing-mcp-servers-in-kubernetes-transport-choice-is-the-make-or-break-decision-for-1ffb</link>
      <guid>https://dev.to/stacklok/performance-testing-mcp-servers-in-kubernetes-transport-choice-is-the-make-or-break-decision-for-1ffb</guid>
      <description>&lt;p&gt;The Model Context Protocol (MCP) has emerged as a critical standard for enabling AI models to interact with external tools and data sources securely. As organisations increasingly deploy MCP servers at scale in Kubernetes environments, understanding their performance characteristics under load becomes essential for production readiness.&lt;/p&gt;

&lt;p&gt;This article analyses the findings from initial load testing performed on MCP servers running in Kubernetes with ToolHive, examining three different transport protocols and their suitability for high-concurrency production workloads.&lt;/p&gt;

&lt;h1&gt;
  
  
  Test Methodology and Setup
&lt;/h1&gt;

&lt;p&gt;The load testing was conducted using a systematic approach to evaluate three MCP transport implementations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;stdio&lt;/strong&gt;: Standard input/output communication requiring direct container attachment
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSE&lt;/strong&gt; (Server-Sent Events): HTTP-based streaming protocol
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;StreamableHTTP&lt;/strong&gt;: Custom streamable HTTP protocol designed for MCP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each transport type was subjected to various load scenarios to measure throughput, error rates, latency, and scalability characteristics. The tests focused on identifying bottlenecks and determining which transport mechanisms could reliably handle production-scale traffic.&lt;/p&gt;

&lt;p&gt;The MCP server used for testing was &lt;a href="https://github.com/StacklokLabs/yardstick" rel="noopener noreferrer"&gt;&lt;code&gt;yardstick&lt;/code&gt;&lt;/a&gt;, which exposes an &lt;code&gt;echo&lt;/code&gt; tool that simply returns the text provided in the request. This design helps eliminate caching effects, giving a clearer view of raw MCP server and ToolHive performance. Functionally similar to the &lt;code&gt;mcp/everything&lt;/code&gt; server, yardstick is containerised and supports all three transport types.&lt;/p&gt;

&lt;p&gt;This MCP server was deployed onto a local Kubernetes cluster using kind, with &lt;a href="https://toolhive.dev" rel="noopener noreferrer"&gt;ToolHive&lt;/a&gt; running the MCP server and simple port forwarding for access. Real environments will differ considerably from this setup and will typically add latency to response times.&lt;/p&gt;

&lt;h1&gt;
  
  
  Performance Findings by Transport Type
&lt;/h1&gt;

&lt;h2&gt;
  
  
  stdio Transport
&lt;/h2&gt;

&lt;p&gt;The stdio implementation demonstrated severe performance limitations that make it unsuitable for production use.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test Name&lt;/th&gt;
&lt;th&gt;Concurrent Connections&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Total Expected&lt;/th&gt;
&lt;th&gt;Actual Requests&lt;/th&gt;
&lt;th&gt;Successful&lt;/th&gt;
&lt;th&gt;Failed&lt;/th&gt;
&lt;th&gt;Req/sec&lt;/th&gt;
&lt;th&gt;Min RT&lt;/th&gt;
&lt;th&gt;Max RT&lt;/th&gt;
&lt;th&gt;Avg RT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Basic Test&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;5s&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;0.64&lt;/td&gt;
&lt;td&gt;19.78ms&lt;/td&gt;
&lt;td&gt;30.02s&lt;/td&gt;
&lt;td&gt;20.01s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Error Breakdown:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Timeouts: 8
&lt;/li&gt;
&lt;li&gt;Connection resets: 3
&lt;/li&gt;
&lt;li&gt;Connection closed: 9&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The underlying architecture’s reliance on direct container attachment introduces built-in scalability limits. Every connection consumes dedicated container resources, making horizontal scaling costly and unreliable. As a result, performance was poor even at low concurrency: out of 50 requests, only 2 succeeded, and over half never left the client due to the cascading effects of earlier timeout errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  SSE Transport
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test Name&lt;/th&gt;
&lt;th&gt;Concurrent Connections&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Total Expected&lt;/th&gt;
&lt;th&gt;Actual Requests&lt;/th&gt;
&lt;th&gt;Success Rate&lt;/th&gt;
&lt;th&gt;Req/sec&lt;/th&gt;
&lt;th&gt;Min RT&lt;/th&gt;
&lt;th&gt;Max RT&lt;/th&gt;
&lt;th&gt;Avg RT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Basic Test&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;5s&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;7.23&lt;/td&gt;
&lt;td&gt;11.13ms&lt;/td&gt;
&lt;td&gt;21.89ms&lt;/td&gt;
&lt;td&gt;18.56ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sustained Load&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;3000&lt;/td&gt;
&lt;td&gt;1861&lt;/td&gt;
&lt;td&gt;100.00%*&lt;/td&gt;
&lt;td&gt;29.87&lt;/td&gt;
&lt;td&gt;4.76ms&lt;/td&gt;
&lt;td&gt;2.00s&lt;/td&gt;
&lt;td&gt;564.57ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Success Rate: 100% of sent requests (Note: Load harness limitations prevented sending all intended requests at peak load)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Compared to stdio, SSE demonstrated far better throughput and reliability, completing all 50 requests in the basic test (where stdio succeeded on only 2) and maintaining solid performance at moderate volumes. However, under sustained heavy load, response times deteriorated, and at peak rates the test harness timed out before all requests could be issued.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;SSE is now officially deprecated (in favour of Streamable HTTP), so expect fewer and fewer MCP servers to offer this as a transport type in future.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Streamable HTTP Transport
&lt;/h2&gt;

&lt;p&gt;Streamable HTTP dominated across all metrics, with one crucial caveat: performance depends heavily on session management, differing dramatically between &lt;strong&gt;shared session pools&lt;/strong&gt; and &lt;strong&gt;unique session pools&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shared Session Pool (10 sessions)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test Scenario&lt;/th&gt;
&lt;th&gt;Concurrent Connections&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Total Expected&lt;/th&gt;
&lt;th&gt;Requests&lt;/th&gt;
&lt;th&gt;Success Rate&lt;/th&gt;
&lt;th&gt;Req/sec&lt;/th&gt;
&lt;th&gt;Min RT&lt;/th&gt;
&lt;th&gt;Max RT&lt;/th&gt;
&lt;th&gt;Avg RT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Basic Load Test&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;5s&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;7.24&lt;/td&gt;
&lt;td&gt;1.88ms&lt;/td&gt;
&lt;td&gt;15.66ms&lt;/td&gt;
&lt;td&gt;5.31ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sustained&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;3000&lt;/td&gt;
&lt;td&gt;3000&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;48.40&lt;/td&gt;
&lt;td&gt;1.02ms&lt;/td&gt;
&lt;td&gt;97.55ms&lt;/td&gt;
&lt;td&gt;5.03ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High Load&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;6000&lt;/td&gt;
&lt;td&gt;6000&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;96.78&lt;/td&gt;
&lt;td&gt;831µs&lt;/td&gt;
&lt;td&gt;135.05ms&lt;/td&gt;
&lt;td&gt;6.68ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very High Load&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;500&lt;/td&gt;
&lt;td&gt;30000&lt;/td&gt;
&lt;td&gt;18757&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;299.85&lt;/td&gt;
&lt;td&gt;1.33ms&lt;/td&gt;
&lt;td&gt;783.43ms&lt;/td&gt;
&lt;td&gt;622.20ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very High Load&lt;/td&gt;
&lt;td&gt;400&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;500&lt;/td&gt;
&lt;td&gt;30000&lt;/td&gt;
&lt;td&gt;18546&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;293.16&lt;/td&gt;
&lt;td&gt;36.87ms&lt;/td&gt;
&lt;td&gt;1.69s&lt;/td&gt;
&lt;td&gt;1.28s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very High Load&lt;/td&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;500&lt;/td&gt;
&lt;td&gt;30000&lt;/td&gt;
&lt;td&gt;19112&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;292.62&lt;/td&gt;
&lt;td&gt;5.09ms&lt;/td&gt;
&lt;td&gt;3.58s&lt;/td&gt;
&lt;td&gt;3.09s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Success Rate: 100% of sent requests (Note: Load harness limitations prevented sending all intended requests at peak load)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Unique Session Per Request
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test Scenario&lt;/th&gt;
&lt;th&gt;Concurrent Connections&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Total Expected&lt;/th&gt;
&lt;th&gt;Requests&lt;/th&gt;
&lt;th&gt;Success Rate&lt;/th&gt;
&lt;th&gt;Req/sec&lt;/th&gt;
&lt;th&gt;Min RT&lt;/th&gt;
&lt;th&gt;Max RT&lt;/th&gt;
&lt;th&gt;Avg RT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sustained&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;3000&lt;/td&gt;
&lt;td&gt;2244&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;36.07&lt;/td&gt;
&lt;td&gt;4.05ms&lt;/td&gt;
&lt;td&gt;1.31s&lt;/td&gt;
&lt;td&gt;272.93ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High Load&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;6000&lt;/td&gt;
&lt;td&gt;2086&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;33.03&lt;/td&gt;
&lt;td&gt;5.37ms&lt;/td&gt;
&lt;td&gt;4.23s&lt;/td&gt;
&lt;td&gt;1.12s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Success Rate: 100% of sent requests (Note: Load harness limitations prevented sending all intended requests at peak load)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Streamable HTTP maintained &lt;strong&gt;100% success rates&lt;/strong&gt; across all requests sent during the scenarios while delivering &lt;strong&gt;290-300 requests per second with shared sessions&lt;/strong&gt; versus only &lt;strong&gt;30-36 requests per second with unique sessions&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Key Insight: Session Management is Everything
&lt;/h2&gt;

&lt;p&gt;The most striking finding was the &lt;strong&gt;10x performance difference&lt;/strong&gt; between shared and unique session handling in Streamable HTTP. This reveals that session reuse isn't just an optimisation - it's fundamental to achieving production-scale performance.&lt;/p&gt;

&lt;p&gt;Recommendations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Build around sessions&lt;/strong&gt;: Pool and reuse aggressively (where appropriate)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid stdio in production&lt;/strong&gt;, prefer Streamable HTTP by default (unless you have good reasons not to use it)&lt;/li&gt;
&lt;/ul&gt;
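&lt;p&gt;The session-reuse recommendation is generic rather than ToolHive-specific. A minimal sketch of the idea in Python, where &lt;code&gt;make_session&lt;/code&gt; stands in for whatever your MCP client library uses to establish a session:&lt;/p&gt;

```python
import itertools
import threading

class SessionPool:
    """Round-robin pool that reuses a fixed set of sessions
    instead of paying session-setup cost on every request."""

    def __init__(self, make_session, size=10):
        # create the sessions up front, then hand them out in rotation
        self._sessions = [make_session() for _ in range(size)]
        self._cycle = itertools.cycle(self._sessions)
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            return next(self._cycle)

# usage: each request borrows an existing session
pool = SessionPool(make_session=dict, size=2)  # dict is a stand-in session factory
a = pool.acquire()
b = pool.acquire()
assert pool.acquire() is a  # the pool wrapped around: sessions are reused
```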

&lt;h1&gt;
  
  
  The Caveats
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;The yardstick MCP server is a simple &lt;code&gt;echo&lt;/code&gt; tool with no long-running work, so it responds extremely quickly. Real MCP servers in the wild will almost certainly benchmark slower than the figures shown here.
&lt;/li&gt;
&lt;li&gt;Tests were run on a local Kubernetes cluster with port-forwarding, minimising latency. Expect slower results on remote clusters.
&lt;/li&gt;
&lt;li&gt;The load testing tool used was built specifically to run performance tests against MCP servers and is not yet battle-hardened.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  The Takeaway
&lt;/h1&gt;

&lt;p&gt;These results fundamentally change how we should think about MCP server deployments. Transport choice isn't just a technical detail - it's a make-or-break architectural decision that can determine whether your AI capabilities scale or fail under load.&lt;/p&gt;

&lt;p&gt;For teams building production AI systems with MCP, Streamable HTTP with optimised session management represents a key path forward in the current MCP landscape for achieving the reliability and performance modern applications demand.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>toolhive</category>
      <category>kubernetes</category>
      <category>ai</category>
    </item>
    <item>
      <title>ToolHive Operator: Multi-Namespace Support for Enhanced Security and Flexibility</title>
      <dc:creator>Chris Burns</dc:creator>
      <pubDate>Tue, 10 Jun 2025 17:02:13 +0000</pubDate>
      <link>https://dev.to/stacklok/toolhive-operator-multi-namespace-support-for-enhanced-security-and-flexibility-2dcn</link>
      <guid>https://dev.to/stacklok/toolhive-operator-multi-namespace-support-for-enhanced-security-and-flexibility-2dcn</guid>
      <description>&lt;p&gt;We're excited to announce a significant enhancement to the ToolHive Operator: &lt;strong&gt;multi-namespace deployment support&lt;/strong&gt;. This update provides organizations with greater flexibility and security when deploying MCP (Model Context Protocol) servers across their Kubernetes environments.&lt;/p&gt;

&lt;h1&gt;
  
  
  What's New
&lt;/h1&gt;

&lt;p&gt;The ToolHive Operator now supports two distinct deployment modes:&lt;/p&gt;

&lt;h2&gt;
  
  
  🌍 Cluster Mode (Default)
&lt;/h2&gt;

&lt;p&gt;Suitable for platform teams managing MCPServers across the entire cluster&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full cluster-wide access to manage &lt;code&gt;MCPServer&lt;/code&gt;s in any namespace&lt;/li&gt;
&lt;li&gt;Uses &lt;code&gt;ClusterRole&lt;/code&gt; and &lt;code&gt;ClusterRoleBinding&lt;/code&gt; for broad permissions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🔒 Namespace Mode (New!)
&lt;/h2&gt;

&lt;p&gt;Perfect for multi-tenant environments and organizations following the principle of least privilege&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Restricted access to only specified namespaces&lt;/li&gt;
&lt;li&gt;Uses &lt;code&gt;ClusterRole&lt;/code&gt; with namespace-specific &lt;code&gt;RoleBinding&lt;/code&gt;s for precise access control&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Why Multi-Namespace Support Matters
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Enhanced Security
&lt;/h2&gt;

&lt;p&gt;In namespace mode, the ToolHive Operator only has permissions in the namespaces you explicitly specify, which significantly reduces the blast radius and follows Kubernetes security best practices. It also prevents a compromised operator from accessing sensitive workloads in other namespaces: if an attacker exploits the operator, they can't pivot to your production databases, payment systems, or other critical applications running in separate namespaces. Finally, it eliminates the risk of accidental misconfiguration affecting unrelated services across your entire cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Tenancy Support
&lt;/h2&gt;

&lt;p&gt;Different teams can now have their own namespaces with MCPServers while maintaining strict isolation. The operator in the toolhive-system namespace can manage resources across designated team namespaces without requiring cluster-wide permissions. This eliminates resource conflicts where one team's MCP configuration could interfere with another team's. It also prevents competing resource quotas or conflicting network policies that could degrade performance. Teams can iterate independently without waiting for central infrastructure changes, accelerating development cycles while maintaining security boundaries between departments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compliance and Governance
&lt;/h2&gt;

&lt;p&gt;Organizations with strict security requirements can now deploy ToolHive with minimal necessary permissions, making it easier to pass security audits and meet compliance requirements. Security auditors can quickly verify that ToolHive follows the principle of least privilege by examining a limited set of namespace-scoped permissions rather than auditing complex cluster-wide access patterns. This reduces audit preparation time from weeks to days and helps developers satisfy InfoSec requirements upfront, avoiding the common scenario where security teams block deployments due to overly broad permissions that violate corporate security policies.&lt;/p&gt;

&lt;h1&gt;
  
  
  How It Works
&lt;/h1&gt;

&lt;p&gt;The magic happens through an RBAC pattern where the operator uses a &lt;code&gt;ClusterRole&lt;/code&gt; (for permission consistency) but applies it through namespace-specific &lt;code&gt;RoleBinding&lt;/code&gt;s. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single source of truth&lt;/strong&gt;: One &lt;code&gt;ClusterRole&lt;/code&gt; defines all the permissions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Namespace isolation&lt;/strong&gt;: &lt;code&gt;RoleBinding&lt;/code&gt;s restrict where those permissions apply&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic scaling&lt;/strong&gt;: Easy to add or remove namespace access as needed&lt;/li&gt;
&lt;/ul&gt;
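&lt;p&gt;As a rough sketch, the pattern combines one &lt;code&gt;ClusterRole&lt;/code&gt; with a &lt;code&gt;RoleBinding&lt;/code&gt; per allowed namespace. The resource names and rule list below are illustrative, not the exact manifests the Helm chart renders:&lt;/p&gt;

```yaml
# Single source of truth: one ClusterRole holds the permission set
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: toolhive-operator-manager-role   # illustrative name
rules:
  - apiGroups: ["toolhive.stacklok.dev"]
    resources: ["mcpservers"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Namespace isolation: a RoleBinding in each allowed namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: toolhive-operator-manager-rolebinding   # illustrative name
  namespace: team-frontend                      # one per allowed namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: toolhive-operator-manager-role
subjects:
  - kind: ServiceAccount
    name: toolhive-operator
    namespace: toolhive-system
```

&lt;p&gt;Because the &lt;code&gt;roleRef&lt;/code&gt; points at a &lt;code&gt;ClusterRole&lt;/code&gt; but the binding itself is namespaced, the permissions take effect only inside that namespace; granting access to another namespace later is just one more &lt;code&gt;RoleBinding&lt;/code&gt;.&lt;/p&gt;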

&lt;h1&gt;
  
  
  Helm Configuration Examples
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Cluster Mode (Default)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# values.yaml&lt;/span&gt;
&lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rbac&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cluster"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a &lt;code&gt;ClusterRoleBinding&lt;/code&gt; granting the operator access to all namespaces.&lt;/p&gt;
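&lt;p&gt;A hedged sketch of what that &lt;code&gt;ClusterRoleBinding&lt;/code&gt; looks like (names are illustrative, not the chart's exact output):&lt;/p&gt;

```yaml
# Cluster mode: one cluster-scoped binding grants access everywhere
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: toolhive-operator-manager-rolebinding   # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: toolhive-operator-manager-role
subjects:
  - kind: ServiceAccount
    name: toolhive-operator
    namespace: toolhive-system
```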

&lt;h2&gt;
  
  
  Namespace Mode
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# values.yaml&lt;/span&gt;
&lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rbac&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;namespace"&lt;/span&gt;
    &lt;span class="na"&gt;allowedNamespaces&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;team-frontend"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;team-backend"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;staging"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;production"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates individual &lt;code&gt;RoleBindings&lt;/code&gt; in each specified namespace, granting the operator access only where needed.&lt;/p&gt;

&lt;h1&gt;
  
  
  What Permissions Does the Operator Get?
&lt;/h1&gt;

&lt;p&gt;The ToolHive Operator requires specific permissions to manage MCPServer resources and their associated Kubernetes objects:&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Permissions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCPServers&lt;/strong&gt;: Full lifecycle management of your custom resources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ServiceAccounts&lt;/strong&gt;: Creates dedicated service accounts for each MCPServer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Roles &amp;amp; RoleBindings&lt;/strong&gt;: Manages RBAC for ProxyRunner and MCPServer workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ConfigMaps &amp;amp; Secrets&lt;/strong&gt;: Handles configuration and credentials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployments &amp;amp; Services&lt;/strong&gt;: Manages the underlying workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Additional Permissions
&lt;/h2&gt;

&lt;p&gt;These permissions are needed by the &lt;code&gt;toolhive-operator&lt;/code&gt; so that it can grant them to the ProxyRunners in the dedicated namespaces; the ProxyRunners are the components that actually use these permissions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pod logs&lt;/strong&gt;: Ability to get pod logs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pod attach&lt;/strong&gt;: Ability to attach to the pod (for stdio MCP Servers)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: These are the permissions currently used; they are likely to evolve in future.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Scope-Specific Behavior
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cluster mode&lt;/strong&gt;: These permissions apply cluster-wide&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Namespace mode&lt;/strong&gt;: These permissions apply only to specified namespaces&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Real-World Use Cases
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Multi-Team Organization
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Platform team controls toolhive-system&lt;/span&gt;
&lt;span class="c1"&gt;# Individual teams get their own namespaces&lt;/span&gt;
&lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rbac&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;namespace"&lt;/span&gt;
    &lt;span class="na"&gt;allowedNamespaces&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;team-data"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;team-ai"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;team-platform"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Environment Separation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rbac&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;namespace"&lt;/span&gt;
    &lt;span class="na"&gt;allowedNamespaces&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;development"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;staging"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  What's Next?
&lt;/h1&gt;

&lt;p&gt;This multi-namespace support is just the beginning. We're looking into additional features including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic namespace discovery&lt;/strong&gt;: Automatically detect and manage namespaces based on labels&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Separate ProxyRunner and MCPServer permissions&lt;/strong&gt;: The ProxyRunner and the MCPServer pod do not need to share permissions; we want to make this even more secure by following the principle of least privilege&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Community Feedback
&lt;/h1&gt;

&lt;p&gt;We'd love to hear how you're using multi-namespace support! Share your use cases, feedback, and questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/stacklok/toolhive" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discussions&lt;/strong&gt;: Join our community discussions on &lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Issues&lt;/strong&gt;: Report bugs or request features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ToolHive Operator's multi-namespace support represents our commitment to providing secure, flexible, and enterprise-ready solutions for MCP server management. Whether you're a platform team managing cluster-wide resources or a security-conscious organization requiring strict namespace isolation, we've got you covered.&lt;/p&gt;

&lt;p&gt;Happy deploying! 🚀&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>mcp</category>
      <category>security</category>
      <category>ai</category>
    </item>
    <item>
      <title>ToolHive: A Kubernetes Operator for Deploying MCP Servers</title>
      <dc:creator>Chris Burns</dc:creator>
      <pubDate>Thu, 01 May 2025 11:14:15 +0000</pubDate>
      <link>https://dev.to/stacklok/toolhive-an-mcp-kubernetes-operator-321</link>
      <guid>https://dev.to/stacklok/toolhive-an-mcp-kubernetes-operator-321</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Building on our &lt;a href="https://dev.to/stacklok/toolhive-secure-mcp-in-a-kubernetes-native-world-3o65"&gt;earlier discussion&lt;/a&gt; about enterprises needing dedicated hosting for MCP servers and ToolHive's Kubernetes-based solution, we're excited to announce our new &lt;a href="https://github.com/StacklokLabs/toolhive/tree/main/cmd/thv-operator" rel="noopener noreferrer"&gt;Kubernetes Operator&lt;/a&gt; for ToolHive. This specialised tool streamlines the secure deployment of MCP servers to Kubernetes environments for enterprises and engineers.&lt;/p&gt;

&lt;p&gt;In this article, we'll explore practical ways to leverage this new operator's capabilities. &lt;/p&gt;

&lt;p&gt;Let's jump right in! 🚀&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying the Operator
&lt;/h2&gt;

&lt;p&gt;For the installation of the ToolHive Operator, we’ve assumed there is already a Kubernetes cluster available with an Ingress controller. We have used &lt;a href="https://kind.sigs.k8s.io/" rel="noopener noreferrer"&gt;Kind&lt;/a&gt; for this post as it is simple to set up, free and easy to use. &lt;/p&gt;

&lt;p&gt;For a simplified local ingress setup with Kind, we utilise a basic IP with the Kind Load Balancer - feel free to follow &lt;a href="https://github.com/stacklok/toolhive/blob/main/docs/kind/ingress.md" rel="noopener noreferrer"&gt;our guide&lt;/a&gt; for easy steps on how to do this. To keep things straightforward, we won't use a local hostname in this setup. &lt;/p&gt;

&lt;p&gt;Now, with a running cluster, execute the following Helm commands (remember to adjust the &lt;code&gt;--kubeconfig&lt;/code&gt; and &lt;code&gt;--kube-context&lt;/code&gt; flags as needed).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Install the ToolHive Operator Custom Resource Definitions (CRDs):&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;helm upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Deploy the Operator:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;helm upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system &lt;span class="nt"&gt;--create-namespace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At this point, the ToolHive Kubernetes Operator should now be installed and running. &lt;/p&gt;

&lt;p&gt;To verify this, run the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system

NAME                                READY   STATUS    RESTARTS   AGE
toolhive-operator-7f946d9c5-9s8dk   1/1     Running   0          59s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Deploy an MCP Server
&lt;/h2&gt;

&lt;p&gt;Now to install a sample &lt;code&gt;fetch&lt;/code&gt; MCP server, run the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/stacklok/toolhive/main/examples/operator/mcp-servers/mcpserver_fetch.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To verify this has been installed, run the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;toolhive&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true

&lt;/span&gt;NAME                     READY   STATUS    RESTARTS   AGE
fetch-0                  1/1     Running   0          115s
fetch-649c5b958c-nhjbq   1/1     Running   0          2m1s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As shown above, two pods are running. The fetch MCP server (&lt;code&gt;fetch-0&lt;/code&gt;) is a pod that belongs to the MCP server's &lt;code&gt;StatefulSet&lt;/code&gt;. The other - &lt;code&gt;fetch-xxxxxxxxxx-xxxxx&lt;/code&gt; - is the proxy server that handles all communication between the &lt;code&gt;fetch&lt;/code&gt; MCP server and external callers.&lt;/p&gt;

&lt;p&gt;Looking back, let’s review how the MCP server was created. Here is the fetch MCP server resource that we’ve applied to the cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;toolhive.stacklok.dev/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MCPServer&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fetch&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;toolhive-system&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker.io/mcp/fetch&lt;/span&gt;
  &lt;span class="na"&gt;transport&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;stdio&lt;/span&gt;
  &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
  &lt;span class="na"&gt;permissionProfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;builtin&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;network&lt;/span&gt;
  &lt;span class="na"&gt;podTemplateSpec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcp&lt;/span&gt;
          &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;allowPrivilegeEscalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
            &lt;span class="na"&gt;runAsNonRoot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
            &lt;span class="na"&gt;runAsUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
            &lt;span class="na"&gt;runAsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
            &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;drop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ALL&lt;/span&gt;
          &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;500m"&lt;/span&gt;
              &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;512Mi"&lt;/span&gt;
            &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;100m"&lt;/span&gt;
              &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;128Mi"&lt;/span&gt;
      &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;runAsNonRoot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="na"&gt;runAsUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
        &lt;span class="na"&gt;runAsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
        &lt;span class="na"&gt;seccompProfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RuntimeDefault&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;100m"&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;128Mi"&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50m"&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;64Mi"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ToolHive Operator introduces a new Custom Resource called &lt;strong&gt;MCPServer&lt;/strong&gt;. Here’s a breakdown of the MCPServer configuration:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;transport: stdio&lt;/code&gt; - This creates the MCP server allowing only stdin and stdout traffic. In Kubernetes this results in the proxy server attaching to the container via the Kubernetes API. No other access is given to the caller.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;permissionProfile.type: builtin&lt;/code&gt; - This references the built-in permission profiles that ship with ToolHive&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;permissionProfile.name: network&lt;/code&gt; - Permits outbound network connections to any host on any port (not recommended for production use).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now, to connect an example client such as Cursor to our MCP server, we can simply use an Ingress record exposed via the Load Balancer mentioned earlier.&lt;/p&gt;

&lt;p&gt;We can apply the following Ingress entry, ensuring that the &lt;code&gt;ingressClassName&lt;/code&gt; matches what we have in our cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcp-fetch-ingress&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;toolhive-system&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/rewrite-target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ingressClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
        &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
        &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcp-fetch-proxy&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point we should be able to connect to the running fetch MCP server using the external IP address of our Load Balancer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: If you have not chosen Kind for the cluster and you have a different Load Balancer setup than what is followed in this post, you will have to make the respective changes in your configuration to send ingress traffic to the fetch server proxy service.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because we did not use the CLI to create the MCP server, its configuration was not automatically applied to our local client configurations, so we have to add it manually.&lt;/p&gt;

&lt;p&gt;For &lt;a href="https://docs.cursor.com/context/model-context-protocol#configuration-locations" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, we edit &lt;code&gt;Users/$USERNAME/.cursor/mcp.json&lt;/code&gt; (replacing &lt;code&gt;$USERNAME&lt;/code&gt; with our home directory username) and add the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"fetch"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8080/sse#fetch"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, if we go into the Cursor chat, and we ask it to fetch the contents of a web page, it should ask us for approval for the use of the &lt;code&gt;fetch&lt;/code&gt; MCP server and then return the content.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz690t6jbpawzylz8ece.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz690t6jbpawzylz8ece.png" alt="Cursor Fetch MCP" width="800" height="1184"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let’s look at the logs for the &lt;code&gt;fetch&lt;/code&gt; MCP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ {"jsonrpc":"2.0","id":2,"result":{"content":[{"type":"text","text":"Contents of https://chrisjburns.com/:\n\n\nchrisjburns\n\n# Chris Burns\n\n## Software engineer\n\n"}],"isError":false}}
$ {"jsonrpc":"2.0","id":2,"result":{"content":[{"type":"text","text":"Content type text/html; charset=utf-8 cannot be simplified to markdown, but here is the raw content:\nContents of https://chrisjburns.com/:\n&amp;lt;!doctype html&amp;gt;&amp;lt;html lang=en&amp;gt;&amp;lt;head&amp;gt;&amp;lt;meta charset=utf-8&amp;gt;&amp;lt;meta name=viewport content=\"width=device-width,initial-scale=1\"&amp;gt;&amp;lt;meta name=author content=\"Chris Burns\"&amp;gt;&amp;lt;meta name=keywords content=\"blog,developer,personal\"&amp;gt;&amp;lt;meta name=twitter:card content=\"summary\"&amp;gt;&amp;lt;meta name=twitter:title content=\"chrisjburns\"&amp;gt;&amp;lt;meta name=twitter:description content&amp;gt;&amp;lt;meta property=\"og:title\" content=\"chrisjburns\"&amp;gt;&amp;lt;meta property=\"og:description\" content&amp;gt;&amp;lt;meta property=\"og:type\" content=\"website\"&amp;gt;&amp;lt;meta property=\"og:url\" content=\"https://chrisjburns.com/\"&amp;gt;&amp;lt;meta property=\"og:updated_time\" content=\"2020-05-20T00:18:23+01:00\"&amp;gt;&amp;lt;base href=https://chrisjburns.com/&amp;gt;&amp;lt;title&amp;gt;chrisjburns&amp;lt;/title&amp;gt;&amp;lt;link rel=canonical href=https://chrisjburns.com/&amp;gt;&amp;lt;link href=\"https://fonts.googleapis.com/css?family=Lato:400,700%7CMerriweather:300,700%7CSource+Code+Pro:400,700\" rel=stylesheet&amp;gt;&amp;lt;link href=\"https://fonts.googleapis.com/css?family=Montserrat:400,700|Open+Sans:400,600,300,800,700\" rel=stylesheet type=text/css&amp;gt;&amp;lt;link rel=stylesheet href=https://use.fontawesome.com/releases/v5.11.2/css/all.css integrity=sha384-KA6wR/X5RY4zFAHpv/CnoG2UW1uogYfdnP67Uv7eULvTveboZJg0qUpmJZb5VqzN crossorigin=anonymous&amp;gt;&amp;lt;link rel=stylesheet href=https://cdnjs.cloudflare.com/ajax/libs/normalize/8.0.1/normalize.min.css integrity=\"sha256-l85OmPOjvil/SOvVt3HnSSjzF1TUMyT9eV0c2BzEGzU=\" crossorigin=anonymous&amp;gt;&amp;lt;link rel=stylesheet 
href=https://chrisjburns.com/css/coder.min.9f38ad26345e306650770a3b91475e09efa3026c59673a09eff165cfa8f1a30e.css integrity=\"sha256-nzitJjReMGZQdwo7kUdeCe+jAmxZZzoJ7/Flz6jxow4=\" crossorigin=anonymous media=screen&amp;gt;&amp;lt;link rel=icon type=image/png href=https://chrisjburns.com/images/favicon-32x32.png sizes=32x32&amp;gt;&amp;lt;link rel=icon type=image/png href=https://chrisjburns.com/images/favicon-16x16.png sizes=16x16&amp;gt;&amp;lt;link rel=alternate type=application/rss+xml href=https://chrisjburns.com/index.xml title=chrisjburns&amp;gt;&amp;lt;meta name=generator content=\"Hugo 0.63.2\"&amp;gt;&amp;lt;/head&amp;gt;&amp;lt;body class=colorscheme-light&amp;gt;&amp;lt;main class=wrapper&amp;gt;&amp;lt;nav class=navigation&amp;gt;&amp;lt;section class=container&amp;gt;&amp;lt;a class=navigation-title href=https://chrisjburns.com/&amp;gt;chrisjburns&amp;lt;/a&amp;gt;\n&amp;lt;input type=checkbox id=menu-toggle&amp;gt;\n&amp;lt;label class=\"menu-button float-right\" for=menu-toggle&amp;gt;&amp;lt;i class=\"fas fa-bars\"&amp;gt;&amp;lt;/i&amp;gt;&amp;lt;/label&amp;gt;&amp;lt;ul class=navigation-list&amp;gt;&amp;lt;li class=navigation-item&amp;gt;&amp;lt;a class=navigation-link href=https://chrisjburns.com/posts/&amp;gt;BLOG&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&amp;lt;/section&amp;gt;&amp;lt;/nav&amp;gt;&amp;lt;div class=content&amp;gt;&amp;lt;section class=\"container centered\"&amp;gt;&amp;lt;div class=about&amp;gt;&amp;lt;div class=avatar&amp;gt;&amp;lt;img src=https://chrisjburns.com/images/avatar.jpg alt=avatar&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;h1&amp;gt;Chris Burns&amp;lt;/h1&amp;gt;&amp;lt;h2&amp;gt;Software engineer&amp;lt;/h2&amp;gt;&amp;lt;ul&amp;gt;&amp;lt;li&amp;gt;&amp;lt;a href=https://github.com/ChrisJBurns/ aria-label=Github&amp;gt;&amp;lt;i class=\"fab fa-github\" aria-hidden=true&amp;gt;&amp;lt;/i&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;li&amp;gt;&amp;lt;a href=https://www.linkedin.com/in/chris-j-burns/ 
aria-label=LinkedIn&amp;gt;&amp;lt;i class=\"fab fa-linkedin\" aria-hidden=true&amp;gt;&amp;lt;/i&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&amp;lt;img src=https://ghchart.rshah.org/ChrisJBurns alt=\"Chris Burns's Github chart\"&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;/section&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;footer class=footer&amp;gt;&amp;lt;section class=container&amp;gt;&amp;lt;/section&amp;gt;&amp;lt;/footer&amp;gt;&amp;lt;/main&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;"}],"isError":false}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There we have it: an MCP server created in Kubernetes using the new ToolHive Operator.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;At this point, we hope you can see the power this gives engineers and enterprises that want to run MCP servers in Kubernetes. Anyone who has already worked with Operators knows that their capabilities for creating and managing workloads inside Kubernetes are hard to beat. At Stacklok, we know that behind the Operator we can hide away much of the complexity that is normally pushed onto the engineer. We're really excited to release this, and even more excited to see where it goes.&lt;/p&gt;

&lt;p&gt;Give it a try, and let us know what you think!&lt;/p&gt;

&lt;p&gt;Essential Links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/StacklokLabs/toolhive" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/@Stacklok" rel="noopener noreferrer"&gt;Youtube&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>ToolHive: Secure MCP in a Kubernetes-native World</title>
      <dc:creator>Chris Burns</dc:creator>
      <pubDate>Tue, 22 Apr 2025 14:32:16 +0000</pubDate>
      <link>https://dev.to/stacklok/toolhive-secure-mcp-in-a-kubernetes-native-world-3o65</link>
      <guid>https://dev.to/stacklok/toolhive-secure-mcp-in-a-kubernetes-native-world-3o65</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;⚠️ Deprecation Notice: The recommended way of installing ToolHive on Kubernetes is now via the ToolHive Operator, and the manifests in this post have been removed. Follow &lt;a href="https://dev.to/stacklok/toolhive-an-mcp-kubernetes-operator-321"&gt;https://dev.to/stacklok/toolhive-an-mcp-kubernetes-operator-321&lt;/a&gt; to find out how to install the Operator. ⚠️&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Model Context Protocol (MCP) enables seamless integration with applications and services to extend an LLM's context and capabilities. However, deploying MCP servers in production environments raises concerns surrounding data privacy, unauthorised access, and potential vulnerabilities. Current MCP server setups often lack the robust security measures required to safeguard sensitive model data and prevent malicious activities, thus hindering widespread adoption.&lt;/p&gt;

&lt;p&gt;Kubernetes offers a compelling solution for running MCP servers securely and efficiently. Its containerisation and orchestration capabilities provide a strong foundation for isolating and managing MCP instances. Kubernetes' built-in features, such as role-based access control (RBAC), network policies, and secrets management, address the security concerns that deter enterprises. Furthermore, the Kubernetes ecosystem, including tools for monitoring, logging, and automated deployment, enables a comprehensive and secure operational environment for MCP servers.&lt;/p&gt;

&lt;p&gt;The team at Stacklok, empowered by our CEO, Craig McLuckie (co-creator of Kubernetes), recently released &lt;a href="https://github.com/StacklokLabs/toolhive" rel="noopener noreferrer"&gt;ToolHive&lt;/a&gt;, an open source project that offers a convenient way to run MCP servers using familiar technologies, complete with authentication, authorization, and network isolation. Let’s take a closer look at how ToolHive and Kubernetes come together to support MCP in an enterprise environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running ToolHive on Kubernetes
&lt;/h2&gt;

&lt;p&gt;ToolHive lets you run MCP servers in Kubernetes using one of its native workload types: StatefulSets. StatefulSets are designed for managing stateful applications, making them ideal for MCP servers. When deploying ToolHive in Kubernetes, you’ll create a StatefulSet for ToolHive itself, which is configured to launch an MCP server in the foreground. Running the server in the foreground ensures the ToolHive pod remains active for the full duration of the MCP server’s lifecycle. Once the ToolHive StatefulSet is up and the pod is running, it will then provision your target MCP server, also as a StatefulSet. This results in two workloads running: ToolHive and the desired MCP server.&lt;/p&gt;
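&lt;p&gt;To make that concrete, here is a minimal, hypothetical sketch of what the ToolHive StatefulSet could look like. The real manifest lives in the repository's &lt;code&gt;deploy/k8s&lt;/code&gt; directory; the image reference, args, and service account name below are illustrative assumptions, not the actual values.&lt;/p&gt;

```yaml
# Hypothetical, trimmed sketch of a ToolHive StatefulSet.
# The real manifest is in the repo's deploy/k8s directory; the image,
# args, and serviceAccountName here are illustrative assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: toolhive
spec:
  serviceName: toolhive
  replicas: 1
  selector:
    matchLabels:
      app: toolhive
  template:
    metadata:
      labels:
        app: toolhive
    spec:
      serviceAccountName: toolhive   # bound to the RBAC provisioned from rbac.yaml
      containers:
        - name: toolhive
          image: ghcr.io/stackloklabs/toolhive:latest  # illustrative image reference
          args: ["run", "--foreground", "fetch"]       # foreground keeps the pod alive for the MCP server's lifetime
```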

&lt;p&gt;Let’s try it out. We’ll use the example &lt;a href="https://github.com/StacklokLabs/toolhive/tree/main/deploy/k8s" rel="noopener noreferrer"&gt;YAML manifests&lt;/a&gt; available in the ToolHive GitHub repository. Before getting started, make sure you have access to a running Kubernetes cluster. If you want to avoid cloud costs, you can use a local setup like &lt;a href="https://kind.sigs.k8s.io/" rel="noopener noreferrer"&gt;Kind&lt;/a&gt;, which lets you run Kubernetes clusters locally using Docker.&lt;/p&gt;
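&lt;p&gt;If you go the Kind route, a cluster is one command away (this assumes &lt;code&gt;kind&lt;/code&gt; and &lt;code&gt;kubectl&lt;/code&gt; are installed; the cluster name is arbitrary):&lt;/p&gt;

```shell
# Create a throwaway local cluster with kind (the name is arbitrary)
kind create cluster --name toolhive-demo

# Verify kubectl is pointing at it; kind prefixes contexts with "kind-"
kubectl cluster-info --context kind-toolhive-demo
```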

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Create the ToolHive namespace:&lt;br&gt;
&lt;code&gt;$ kubectl apply -f namespace.yaml&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provision the correct RBAC roles and service account for ToolHive:&lt;br&gt;
&lt;code&gt;$ kubectl apply -f rbac.yaml -n toolhive-deployment&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provision ToolHive and an example &lt;code&gt;fetch&lt;/code&gt; MCP server:&lt;br&gt;
&lt;code&gt;$ kubectl apply -f thv.yaml -n toolhive-deployment&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At this point, you should have an MCP server running, with its associated ToolHive workload. To check this, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-deployment get all

NAME              READY   STATUS    RESTARTS   AGE
pod/mcp-fetch-0   1/1     Running   0          6m40s
pod/toolhive-0    1/1     Running   0          6m46s

NAME               TYPE        CLUSTER-IP     EXTERNAL-IP   PORT&lt;span class="o"&gt;(&lt;/span&gt;S&lt;span class="o"&gt;)&lt;/span&gt;    AGE
service/toolhive   ClusterIP   10.96.10.131   &amp;lt;none&amp;gt;        8080/TCP   6m46s

NAME                         READY   AGE
statefulset.apps/mcp-fetch   1/1     6m40s
statefulset.apps/toolhive    1/1     6m46s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looking good: the ToolHive and MCP pods are both healthy. Let’s look at the logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl logs pod/toolhive-0 &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-deployment
checking &lt;span class="k"&gt;for &lt;/span&gt;updates...
A new version of ToolHive is available: v0.0.15
Currently running: dev
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.912633512Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Processed cmdArgs: []"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.914158929Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Image docker.io/mcp/fetch has 'latest' tag, pulling to ensure we have the most recent version..."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.914169221Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Skipping explicit image pull for docker.io/mcp/fetch in Kubernetes - images are pulled automatically when pods are created"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.914171179Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Successfully pulled image: docker.io/mcp/fetch"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.915905346Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Using host port: 8080"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.915920096Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Setting up stdio transport..."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.915923512Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Creating container mcp-fetch from image docker.io/mcp/fetch..."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.922990637Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Applied statefulset mcp-fetch"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.823657379Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Container created with ID: mcp-fetch"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.823676796Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Starting stdio transport..."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.825097838Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Attaching to pod mcp-fetch-0 container mcp-fetch..."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.825138463Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"HTTP SSE proxy started, processing messages..."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.825430046Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"HTTP proxy started for container mcp-fetch on port 8080"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.825438004Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"SSE endpoint: http://localhost:8080/sse"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.825440046Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"JSON-RPC endpoint: http://localhost:8080/messages"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.827135754Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"MCP server mcp-fetch started successfully"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.827350796Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Saved run configuration for mcp-fetch"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.827414796Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Would you like to enable auto discovery and configuraion of MCP clients? (y/n) [n]: "&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.827423629Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Unable to read input, defaulting to No."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.827425713Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"initializing configuration file at /home/nonroot/.config/toolhive/config.yaml"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.827466963Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"No client configuration files found"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.827474546Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Press Ctrl+C to stop or wait for container to exit"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nice!&lt;/p&gt;

&lt;p&gt;You won’t see any logs in the &lt;code&gt;fetch&lt;/code&gt; MCP server just yet—that’s because no requests have been made. Let’s change that by connecting it to a local Cursor client.&lt;/p&gt;

&lt;p&gt;To expose the MCP server locally, we’ll use a simple &lt;strong&gt;port-forward&lt;/strong&gt;. While the ToolHive repository includes a sample Ingress Controller setup, we’ll stick with port-forwarding here for the sake of simplicity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl port-forward svc/toolhive 8080:8080  &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-deployment

Forwarding from 127.0.0.1:8080 -&amp;gt; 8080
Forwarding from &lt;span class="o"&gt;[&lt;/span&gt;::1]:8080 -&amp;gt; 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let’s connect a local Cursor client to our MCP server. Head over to the &lt;a href="https://docs.cursor.com/context/model-context-protocol#configuration-locations" rel="noopener noreferrer"&gt;MCP settings&lt;/a&gt; in Cursor to configure the connection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"fetch"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8080/sse#fetch"&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, the Cursor MCP settings page should display a configured and ready-to-use &lt;code&gt;fetch&lt;/code&gt; MCP server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdu3ykm0ynakgjzapj3vj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdu3ykm0ynakgjzapj3vj.png" alt="Cursor MCP Fetch" width="800" height="139"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we now check the logs for the MCP container, we should see entries reflecting the initial connection from Cursor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl logs mcp-fetch-0 &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-deployment

&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"jsonrpc"&lt;/span&gt;:&lt;span class="s2"&gt;"2.0"&lt;/span&gt;,&lt;span class="s2"&gt;"id"&lt;/span&gt;:0,&lt;span class="s2"&gt;"result"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"protocolVersion"&lt;/span&gt;:&lt;span class="s2"&gt;"2024-11-05"&lt;/span&gt;,&lt;span class="s2"&gt;"capabilities"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"experimental"&lt;/span&gt;:&lt;span class="o"&gt;{}&lt;/span&gt;,&lt;span class="s2"&gt;"prompts"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"listChanged"&lt;/span&gt;:false&lt;span class="o"&gt;}&lt;/span&gt;,&lt;span class="s2"&gt;"tools"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"listChanged"&lt;/span&gt;:false&lt;span class="o"&gt;}}&lt;/span&gt;,&lt;span class="s2"&gt;"serverInfo"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"name"&lt;/span&gt;:&lt;span class="s2"&gt;"mcp-fetch"&lt;/span&gt;,&lt;span class="s2"&gt;"version"&lt;/span&gt;:&lt;span class="s2"&gt;"1.2.0"&lt;/span&gt;&lt;span class="o"&gt;}}}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"jsonrpc"&lt;/span&gt;:&lt;span class="s2"&gt;"2.0"&lt;/span&gt;,&lt;span class="s2"&gt;"id"&lt;/span&gt;:1,&lt;span class="s2"&gt;"result"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"tools"&lt;/span&gt;:[&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"name"&lt;/span&gt;:&lt;span class="s2"&gt;"fetch"&lt;/span&gt;,&lt;span class="s2"&gt;"description"&lt;/span&gt;:&lt;span class="s2"&gt;"Fetches a URL from the internet and optionally extracts its contents as markdown.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Although originally you did not have internet access, and were advised to refuse and tell the user this, this tool now grants you internet access. Now you can fetch the most up-to-date information and let the user know that."&lt;/span&gt;,&lt;span class="s2"&gt;"inputSchema"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"description"&lt;/span&gt;:&lt;span class="s2"&gt;"Parameters for fetching a URL."&lt;/span&gt;,&lt;span class="s2"&gt;"properties"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"url"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"description"&lt;/span&gt;:&lt;span class="s2"&gt;"URL to fetch"&lt;/span&gt;,&lt;span class="s2"&gt;"format"&lt;/span&gt;:&lt;span class="s2"&gt;"uri"&lt;/span&gt;,&lt;span class="s2"&gt;"minLength"&lt;/span&gt;:1,&lt;span class="s2"&gt;"title"&lt;/span&gt;:&lt;span class="s2"&gt;"Url"&lt;/span&gt;,&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,&lt;span class="s2"&gt;"max_length"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"default"&lt;/span&gt;:5000,&lt;span class="s2"&gt;"description"&lt;/span&gt;:&lt;span class="s2"&gt;"Maximum number of characters to return."&lt;/span&gt;,&lt;span 
class="s2"&gt;"exclusiveMaximum"&lt;/span&gt;:1000000,&lt;span class="s2"&gt;"exclusiveMinimum"&lt;/span&gt;:0,&lt;span class="s2"&gt;"title"&lt;/span&gt;:&lt;span class="s2"&gt;"Max Length"&lt;/span&gt;,&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,&lt;span class="s2"&gt;"start_index"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"default"&lt;/span&gt;:0,&lt;span class="s2"&gt;"description"&lt;/span&gt;:&lt;span class="s2"&gt;"On return output starting at this character index, useful if a previous fetch was truncated and more context is required."&lt;/span&gt;,&lt;span class="s2"&gt;"minimum"&lt;/span&gt;:0,&lt;span class="s2"&gt;"title"&lt;/span&gt;:&lt;span class="s2"&gt;"Start Index"&lt;/span&gt;,&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,&lt;span class="s2"&gt;"raw"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"default"&lt;/span&gt;:false,&lt;span class="s2"&gt;"description"&lt;/span&gt;:&lt;span class="s2"&gt;"Get the actual HTML content of the requested page, without simplification."&lt;/span&gt;,&lt;span class="s2"&gt;"title"&lt;/span&gt;:&lt;span class="s2"&gt;"Raw"&lt;/span&gt;,&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"boolean"&lt;/span&gt;&lt;span class="o"&gt;}}&lt;/span&gt;,&lt;span class="s2"&gt;"required"&lt;/span&gt;:[&lt;span class="s2"&gt;"url"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;,&lt;span class="s2"&gt;"title"&lt;/span&gt;:&lt;span class="s2"&gt;"Fetch"&lt;/span&gt;,&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="o"&gt;}}]}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Awesome, now let’s give it a test!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fye1fxqny6wa9cwyfpq4m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fye1fxqny6wa9cwyfpq4m.png" alt="Cursor MCP Fetch Approval" width="379" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;Run tool&lt;/strong&gt; and it will populate the results with the HTML of the Wikipedia page we’ve requested.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7797myzcu1xgje53leis.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7797myzcu1xgje53leis.png" alt="Cursor MCP Fetch Result" width="367" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this case, Cursor detected that the results were truncated, so it issued additional requests starting at index 5000 to retrieve the remaining content.&lt;/p&gt;
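&lt;p&gt;Under the hood, those follow-up calls simply reuse the &lt;code&gt;start_index&lt;/code&gt; parameter from the tool's input schema shown in the logs above. A hypothetical arguments payload for the second request might look like this (the URL is a placeholder, not the page from the screenshot):&lt;/p&gt;

```json
{
  "name": "fetch",
  "arguments": {
    "url": "https://example.org/some-long-page",
    "start_index": 5000
  }
}
```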

&lt;p&gt;And just like that, you've successfully connected a local Cursor client to a &lt;code&gt;fetch&lt;/code&gt; MCP server running inside Kubernetes, using nothing more than a simple port-forward. 🎉&lt;/p&gt;

&lt;p&gt;Now, you might be wondering: &lt;em&gt;"What exactly just happened under the hood?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let’s break it down.&lt;/p&gt;

&lt;p&gt;Remember how we mentioned that &lt;strong&gt;ToolHive acts as a proxy&lt;/strong&gt; to MCP server containers, communicating via &lt;code&gt;stdio&lt;/code&gt; (stdin and stdout)? That same pattern applies when running in Kubernetes.&lt;/p&gt;

&lt;p&gt;When we deployed the &lt;strong&gt;ToolHive workload&lt;/strong&gt;, it was instructed to spin up a &lt;code&gt;fetch&lt;/code&gt; MCP server. ToolHive’s Kubernetes runtime took care of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating a &lt;code&gt;StatefulSet&lt;/code&gt; for the MCP server&lt;/li&gt;
&lt;li&gt;Connecting to it via &lt;strong&gt;stdin&lt;/strong&gt;/&lt;strong&gt;stdout&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Acting as a &lt;strong&gt;proxy&lt;/strong&gt;, shuttling data between the MCP server and clients like Cursor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notably, the MCP server itself does &lt;strong&gt;not expose any network port&lt;/strong&gt;, by design. All communication must go through ToolHive. This design ensures a more secure setup because if a malicious workload is running in your cluster, it &lt;strong&gt;cannot&lt;/strong&gt; query the MCP server directly unless it has the specific privileges required to attach to the MCP process via stdin/stdout.&lt;br&gt;
In short: &lt;strong&gt;ToolHive is the only interface&lt;/strong&gt; to the MCP server. It controls all traffic and limits direct access, adding a layer of isolation and protection by default.&lt;/p&gt;
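&lt;p&gt;For intuition, you can poke the proxy directly through the earlier port-forward. This is an illustrative sketch only: the &lt;code&gt;/sse&lt;/code&gt; and &lt;code&gt;/messages&lt;/code&gt; paths come from the proxy logs above, the payload is hypothetical, and a real MCP client performs the SSE handshake first, so a bare POST like this may be rejected; it just shows the shape of the JSON-RPC traffic that ToolHive shuttles.&lt;/p&gt;

```shell
# Illustrative only: assumes the port-forward from earlier is still running.
# Real clients establish an SSE session on /sse before posting messages.
curl -s -X POST http://localhost:8080/messages \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"fetch","arguments":{"url":"https://example.com"}}}'
```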

&lt;h2&gt;
  
  
  What’s Next?
&lt;/h2&gt;

&lt;p&gt;Kubernetes, Kubernetes, and more Kubernetes.&lt;/p&gt;

&lt;p&gt;At Stacklok, we’re Kubernetes people at heart. While ToolHive is designed to make it easy for engineers to run local MCP servers without fear, we know that for enterprises to confidently run MCP servers in production environments, the solution needs to be standardised, secure, and built on battle-tested infrastructure. For us, that foundation is Kubernetes.&lt;/p&gt;

&lt;p&gt;Right now, ToolHive is a lightweight, developer-friendly tool that runs both locally and in Kubernetes—but this is just the beginning. There's a huge opportunity to push it further.&lt;/p&gt;

&lt;p&gt;For example: while applying YAML manifests directly to a cluster works, it's only part of the story. We believe the future of ToolHive lies in evolving it into a &lt;strong&gt;Kubernetes Operator&lt;/strong&gt;. As an operator, ToolHive could handle orchestration, security hardening, and lifecycle management automatically—removing the manual effort and unlocking more powerful, streamlined workflows for teams. Think: more automation, more control, and less cognitive load.&lt;/p&gt;

&lt;p&gt;But to get there, we need to make sure we’re solving the right problems.&lt;/p&gt;

&lt;p&gt;ToolHive is still in its early stages. It works well today, but it can be even better with your help. Whether you’ve tried it out or are just curious, we’d love your feedback: What do you like? What don’t you like? What would you like to see from ToolHive? We’re not just building a tool for ourselves, we’re building based on our company’s core principles: to create software people love, that ultimately &lt;strong&gt;makes the world a safer place&lt;/strong&gt;. It’s not just what we do, it’s who we are.&lt;/p&gt;

&lt;p&gt;📌 &lt;a href="https://os.stacklok.dev/#/?id=first-principles" rel="noopener noreferrer"&gt;First Principles&lt;/a&gt; – Stacklok&lt;/p&gt;

&lt;p&gt;Give it a try, and let us know what you think!&lt;/p&gt;

&lt;p&gt;Essential Links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/StacklokLabs/toolhive" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/StacklokLabs/toolhive/blob/main/docs/running-toolhive-in-kind-cluster.md" rel="noopener noreferrer"&gt;Deploying into a kind Kubernetes cluster&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>mcp</category>
      <category>ai</category>
      <category>cloudnative</category>
    </item>
  </channel>
</rss>
