
Chris Burns for Stacklok


From Black Box to Observable: Deploying ToolHive with OTel + Prometheus in Kubernetes

In our previous two posts, we laid the groundwork for modern Kubernetes observability. We explored why OpenTelemetry (OTel) and Prometheus work best in tandem, and how ToolHive helps bridge the observability gap for Model Context Protocol (MCP) servers that rarely expose their own usage metrics.

Now it's time to get hands-on. ToolHive sits in front of your MCP servers, collecting vital usage statistics and feeding them directly into your existing observability stack. In this tutorial, we'll walk through deploying ToolHive in a Kubernetes cluster alongside OTel, Prometheus, and Grafana. By the end, you'll have transformed your black-box MCP workloads into observable services.

Prerequisites: Kubernetes, Helm, kubectl

Before we begin, you'll need the following tools (you can verify they're installed using the commands after the list):

  • A Kubernetes cluster: Any cluster will do. For this tutorial, we're using a local cluster created with kind
  • Helm 3: The package manager for Kubernetes, making it easy to deploy complex applications like monitoring stacks
  • kubectl: The command-line tool for interacting with your Kubernetes cluster
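
If you want to double-check that these tools are installed and on your PATH, the standard version commands are enough:

helm version
kubectl version --client
kind version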

If you're using kind, create a cluster with:

kind create cluster --name toolhive-demo

Write the kubeconfig for the new cluster to a file called kconfig.yaml, so that the kubectl and helm commands that follow don't conflict with any clusters you already have configured.

kind get kubeconfig --name toolhive-demo > kconfig.yaml
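
Throughout this tutorial we pass --kubeconfig kconfig.yaml explicitly on every command. If you'd rather keep the commands shorter, you can export the variable for your current shell session instead (the rest of the post keeps the explicit flag):

export KUBECONFIG=$PWD/kconfig.yaml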

Verify your cluster is ready:

kubectl cluster-info --kubeconfig kconfig.yaml
Kubernetes control plane is running at https://127.0.0.1:55371
CoreDNS is running at https://127.0.0.1:55371/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.


kubectl get nodes --kubeconfig kconfig.yaml
NAME                          STATUS   ROLES           AGE   VERSION
toolhive-demo-control-plane   Ready    control-plane   14m   v1.33.1


You should see your cluster responding and nodes in a Ready state.

Installing Prometheus + Grafana

We'll start by setting up our monitoring backbone using the kube-prometheus-stack Helm chart. This comprehensive solution deploys Prometheus for metric collection and Grafana for visualization, with everything pre-configured to work together out of the box.

First, add the Prometheus community Helm repository:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Create a dedicated namespace for our monitoring components and install the stack:

helm upgrade -i kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 77.12.0 -f https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/otel/prometheus-stack-values.yaml -n monitoring --create-namespace --kubeconfig kconfig.yaml

The values file from the ToolHive repository is configured specifically for this tutorial's architecture and sets up the following (a simplified sketch of these settings appears after the list):

  • Prometheus with scrape jobs ready to pull metrics from our OTel collector
  • Grafana with admin credentials (admin:admin for testing)
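
For orientation, here's a minimal sketch of what those values roughly look like. It is not the exact contents of the referenced file; in particular the scrape job name and the collector's metrics port are assumptions used only for illustration:

# Simplified sketch of the kube-prometheus-stack values (illustrative only)
grafana:
  adminPassword: admin              # test-only credentials
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: otel-collector    # hypothetical job name
        static_configs:
          - targets:
              # assumed Prometheus-exporter port on the collector service
              - otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:8889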

Wait for all components to be ready:

kubectl get pods -n monitoring --kubeconfig kconfig.yaml
NAME                                                        READY   STATUS    RESTARTS   AGE
kube-prometheus-stack-grafana-6c5cb68857-4hmw4              3/3     Running   0          30s
kube-prometheus-stack-kube-state-metrics-557fd457c6-c489z   1/1     Running   0          30s
kube-prometheus-stack-operator-7c6d8c4dc7-j2g24             1/1     Running   0          30s
kube-prometheus-stack-prometheus-node-exporter-t5q9w        1/1     Running   0          30s
prometheus-kube-prometheus-stack-prometheus-0               2/2     Running   0          30s


All pods should show Running or Completed status.

Installing the OTel Collector

With our monitoring backend in place, we'll deploy the OpenTelemetry collector. The collector is a crucial component that receives metrics and traces from ToolHive and makes them available to Prometheus and tracing backends.

Add the OpenTelemetry Helm repository:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

Install the collector using ToolHive's specialized values file:

helm upgrade -i otel-collector open-telemetry/opentelemetry-collector  -f https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/otel/otel-values.yaml -n monitoring --kubeconfig kconfig.yaml

This values file configures the collector to do the following (a simplified version of the resulting configuration is sketched after the list):

  • Use the OTLP receiver to accept metrics and traces pushed from ToolHive
  • Enable the kubeletstats receiver, which provides container- and node-level runtime metrics (CPU, memory, network) that application-level OTel instrumentation doesn't capture on its own
  • Enable the Kubernetes attributes processor to add pod/namespace context to telemetry
  • Enable the Prometheus exporter, making collected metrics available for Prometheus scraping
  • Configure service pipelines that route metrics and traces appropriately
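
To make the pipeline shape concrete, here's a simplified sketch of the collector configuration those values produce. It is not the exact contents of the values file (the exporter port and the Jaeger service name in particular are assumptions), but the overall structure of receivers, processors, exporters, and pipelines is what you should expect:

# Simplified sketch of the OTel collector configuration (illustrative only)
receivers:
  otlp:
    protocols:
      grpc:                         # ToolHive pushes OTLP over gRPC (port 4317)
      http:
  kubeletstats:
    auth_type: serviceAccount       # container/node runtime metrics from the kubelet
processors:
  k8sattributes: {}                 # adds pod/namespace metadata to telemetry
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889          # assumed port; Prometheus scrapes this endpoint
  otlp/jaeger:
    endpoint: jaeger-collector.monitoring.svc.cluster.local:4317   # assumed Jaeger service name
service:
  pipelines:
    metrics:
      receivers: [otlp, kubeletstats]
      processors: [k8sattributes]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      processors: [k8sattributes]
      exporters: [otlp/jaeger]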

Verify the collector is running:

kubectl get pods -n monitoring -l app.kubernetes.io/name=opentelemetry-collector --kubeconfig kconfig.yaml
NAME                                                 READY   STATUS    RESTARTS   AGE
otel-collector-opentelemetry-collector-agent-g5crz   1/1     Running   0          33s


Installing the Jaeger Backend for Trace Querying

Metrics now have a home in Prometheus, but the traces ToolHive emits still need a backend where they can be stored and queried. We'll use Jaeger for this: the OTel collector forwards traces to it, and later we'll query them from Grafana via its Jaeger data source.

Add the Jaeger Helm repository:

helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update

Install Jaeger using ToolHive's specialized values file:

helm upgrade -i jaeger-all-in-one jaegertracing/jaeger -f https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/otel/jaeger-values.yaml -n monitoring --kubeconfig kconfig.yaml
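
As with the other components, confirm the Jaeger pod comes up before moving on. The exact pod name and labels depend on the chart, so a plain listing filtered with grep is the simplest check:

kubectl get pods -n monitoring --kubeconfig kconfig.yaml | grep jaeger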

Deploying ToolHive Operator and MCP Server

Now that our observability stack is ready, we can deploy the ToolHive operator. The operator is a Kubernetes-native tool that simplifies the management and deployment of MCP servers with built-in observability.

Install the CRDs (Custom Resource Definitions) and the operator:

helm upgrade --install toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds --version 0.0.27 --kubeconfig kconfig.yaml

helm upgrade --install toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator \
  -n toolhive-system \
  --create-namespace --version 0.2.18 \
  --kubeconfig kconfig.yaml

Wait for the operator to be ready:

kubectl get pods -n toolhive-system --kubeconfig kconfig.yaml

NAME                                READY   STATUS    RESTARTS   AGE
toolhive-operator-95b55b47d-pbqlh   1/1     Running   0          31s

Now let's deploy a sample MCP server. We'll use gofetch, a simple MCP server that fetches web pages and returns their content:

kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/operator/mcp-servers/mcpserver_fetch_otel.yaml --kubeconfig kconfig.yaml

This MCPServer custom resource automatically configures:

  • The MCP server container (gofetch)
  • ToolHive proxy for client access and observability
  • Telemetry settings for OTel integration
  • Service configuration for client access

Let's examine the key telemetry configuration in the MCPServer resource:

spec:
  telemetry:
    otel:
      enabled: true
      endpoint: "http://otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4317"
      protocol: "grpc"
    metrics:
      enabled: true
    tracing:
      enabled: true
      sampleRate: 1.0

This tells ToolHive to:

  • Enable OpenTelemetry export for both metrics and traces
  • Send telemetry to our OTel collector using gRPC
  • Include metrics about MCP operations

Verify everything is running:

kubectl get pods -n toolhive-system --kubeconfig kconfig.yaml
NAME                                READY   STATUS    RESTARTS   AGE
fetch-0                             1/1     Running   0          2m59s
fetch-7d988cbd46-cqdzq              1/1     Running   0          3m4s
toolhive-operator-95b55b47d-pbqlh   1/1     Running   0          3m31s


kubectl get mcpserver -n toolhive-system --kubeconfig kconfig.yaml
NAME    STATUS    URL                                                             AGE
fetch   Running   http://mcp-fetch-proxy.toolhive-system.svc.cluster.local:8080   2m27s


You should see the MCP server and proxy pods in a Running state, and the MCPServer custom resource reporting a Running status.

Generating Traffic to Produce Metrics and Traces

To see metrics in action, we need to generate some traffic. Since the MCP server is running inside the cluster, we'll use kubectl port-forward to expose it locally.

Port-forward the ToolHive proxy service:

kubectl port-forward -n toolhive-system service/mcp-fetch-proxy --kubeconfig kconfig.yaml 8080:8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080


In a new terminal, initialize an MCP session to get a session ID:

SESSION_ID=$(curl -s -D /dev/stderr \
  -X POST "http://localhost:8080/mcp" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Mcp-Protocol-Version: 2025-06-18" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2025-06-18",
      "capabilities": {},
      "clientInfo": {
        "name": "curl-client",
        "version": "1.0.0"
      }
    }
  }' 2>&1 >/dev/null | grep "Mcp-Session-Id:" | cut -d' ' -f2 | tr -d '\r')

echo "Session ID: $SESSION_ID"

Now use the session ID to make tool call requests:

curl -X POST "http://localhost:8080/mcp" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Mcp-Protocol-Version: 2025-06-18" \
  -H "Mcp-Session-Id: $SESSION_ID" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "fetch",
      "arguments": {
        "url": "https://github.com/stacklok/toolhive",
        "max_length": 100,
        "raw": false
      }
    }
  }'

The response should look something like:

event: message
id: D4AVERIINDK3ZTBYUKUHM5PCV5_0
data: {"jsonrpc":"2.0","id":3,"result":{"content":[{"type":"text","text":"![ToolHive Studio logo](/stacklok/toolhive/raw/main/docs/images/toolhive-icon-1024.png)![ToolHive wo\n\n[Content truncated. Use start_index to get more content.]"}]}}


Repeat this request 10-15 times with different URLs to generate meaningful traffic for our metrics dashboards. A small shell loop, as sketched below, makes this easy.
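
For example, with the port-forward still running and $SESSION_ID set in your shell, a loop like this (the URLs are arbitrary choices) fires off a handful of tool calls:

i=10
for url in https://github.com/stacklok/toolhive \
           https://opentelemetry.io \
           https://prometheus.io \
           https://grafana.com \
           https://www.cncf.io; do
  curl -s -X POST "http://localhost:8080/mcp" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -H "Mcp-Protocol-Version: 2025-06-18" \
    -H "Mcp-Session-Id: $SESSION_ID" \
    -d "{
      \"jsonrpc\": \"2.0\",
      \"id\": $i,
      \"method\": \"tools/call\",
      \"params\": {
        \"name\": \"fetch\",
        \"arguments\": {\"url\": \"$url\", \"max_length\": 100, \"raw\": false}
      }
    }"
  echo
  i=$((i+1))
done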

Each request generates telemetry data that ToolHive captures and forwards to the OTel collector.

Visualizing Metrics in Grafana

Now for the exciting part: visualizing our metrics! Since the kube-prometheus-stack automatically deploys Grafana, we just need to expose it locally.

Port-forward the Grafana service:

kubectl port-forward -n monitoring service/kube-prometheus-stack-grafana 3000:80 --kubeconfig kconfig.yaml

Navigate to http://localhost:3000 and log in with the default credentials:

  • Username: admin
  • Password: admin

Importing the ToolHive Dashboard

We've created a starter dashboard to help you visualize some simple MCP server metrics. To import it:

  1. Click the "+" icon in Grafana's left sidebar
  2. Select "Import dashboard"
  3. In the "Import via panel JSON" text box, paste the contents from an example dashboard that we've created
  4. Click "Load" and then "Import"
    • You may need to shorten the dashboard's UID; Grafana sometimes rejects the import when the UID is too long

After importing, you'll see panels populated with real-time data from your MCP server.

Exploring Metrics with PromQL

You can also explore metrics directly using Prometheus queries. Go to "Explore" in Grafana, select the Prometheus data source, and try queries against the custom ToolHive metrics; a couple of starting points are sketched below.
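
ToolHive's exact metric names can vary between versions, so rather than guessing, start with a discovery query that lists every metric carrying a toolhive prefix:

group by (__name__) ({__name__=~"toolhive.*"})

Once you know the real name of the request counter, a per-series request rate looks like this (toolhive_mcp_requests_total is a placeholder, not a guaranteed name):

rate(toolhive_mcp_requests_total[5m])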

This is powerful because it uses the same tools and dashboards you already use for your other workloads, bringing your MCP servers into the fold of your existing observability practice.

Note that CPU and memory metrics come from the OTel collector's kubeletstats receiver rather than directly from the Go application, providing more comprehensive resource monitoring.
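
The same discovery pattern works for the kubeletstats metrics; the k8s_pod prefix below is an assumption about how the OTel metric names are translated for Prometheus, so adjust it if your metric browser shows something different:

group by (__name__) ({__name__=~"k8s_pod_.*"})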

Visualizing Tracing in Grafana

We can also explore traces in Grafana. These were reported by the ToolHive ProxyRunner to the OTel collector, which forwarded them on to the Jaeger backend.

To do this:

  1. Go to "Explore" in the side menu
  2. Ensure the Jaeger data source is selected
  3. Click "Search" instead of "TraceID"
  4. Open the "Service Name" dropdown; you should see the MCP server name. Select it and click "Run Query"
  5. Several traces should appear. Click into one, and you should see the single span reported by the ProxyRunner

Congratulations! You've successfully deployed a complete observability pipeline for MCP workloads and queried their metrics and traces. You've transformed a black-box service into a transparent, observable part of your system. This setup demonstrates Architecture 1 from our previous post: ToolHive pushes telemetry to an OTel collector, which Prometheus scrapes for metrics while traces flow to your tracing backend.

Call to Action: Contribute, Share Feedback, Explore Docs

By now, you've seen how ToolHive integrates seamlessly with OTel and Prometheus to make MCP workloads observable inside Kubernetes. With Prometheus scraping metrics, OTel collecting richer signals, and Grafana visualizing results, you've got a practical foundation for monitoring MCP servers.

The setup you've deployed represents just the beginning of what's possible with MCP observability. We encourage you to:

Try Different MCP Servers: Deploy other MCP servers and see how they behave differently in your dashboards. Each server type may expose different usage patterns and performance characteristics.

Share Your Experience: Join our community discussions on GitHub or Discord to share what you've learned and help improve ToolHive.

Contribute Back: Found issues or have ideas for improvements of custom metrics? The project team welcomes contributions, whether they're bug reports, feature requests, or code contributions.

If you missed the earlier posts in this series, be sure to check them out.

The story doesn't stop here. We'll continue exploring advanced observability features and new integrations as the MCP ecosystem evolves. The foundation you've built today will serve you well as both ToolHive and the broader observability landscape continue to mature.
