
Chris Burns for Stacklok


From Black Box to Observable: Deploying ToolHive with OTel + Prometheus in Kubernetes

In our previous two posts, we laid the groundwork for modern Kubernetes observability. We explored why OpenTelemetry (OTel) and Prometheus work best in tandem, and how ToolHive helps bridge the observability gap for Model Context Protocol (MCP) servers that rarely expose their own usage metrics.

Now it's time to get hands-on. ToolHive sits in front of your MCP servers, collecting vital usage statistics and feeding them directly into your existing observability stack. In this tutorial, we'll walk through deploying ToolHive in a Kubernetes cluster alongside OTel, Prometheus, and Grafana. By the end, you'll have transformed your black-box MCP workloads into observable services.

Prerequisites: Kubernetes, Helm, kubectl

Before we begin, you'll need the following tools (you can verify they're installed using the commands after the list):

  • A Kubernetes cluster: Any cluster will do. For this tutorial, we're using a local cluster created with kind
  • Helm 3: The package manager for Kubernetes, making it easy to deploy complex applications like monitoring stacks
  • kubectl: The command-line tool for interacting with your Kubernetes cluster
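
If you want to double-check that these tools are installed and on your PATH, the standard version commands are enough:

helm version
kubectl version --client
kind version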

If you're using kind, create a cluster with:

kind create cluster --name toolhive-demo

Write the kubeconfig for the new cluster to a file called kconfig.yaml, so that the kubectl and helm commands that follow don't conflict with any clusters you already have configured.

kind get kubeconfig --name toolhive-demo > kconfig.yaml
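
Throughout this tutorial we pass --kubeconfig kconfig.yaml explicitly on every command. If you'd rather keep the commands shorter, you can export the variable for your current shell session instead (the rest of the post keeps the explicit flag):

export KUBECONFIG=$PWD/kconfig.yaml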

Verify your cluster is ready:

kubectl cluster-info --kubeconfig kconfig.yaml
Kubernetes control plane is running at https://127.0.0.1:55371
CoreDNS is running at https://127.0.0.1:55371/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.


kubectl get nodes --kubeconfig kconfig.yaml
NAME                          STATUS   ROLES           AGE   VERSION
toolhive-demo-control-plane   Ready    control-plane   14m   v1.33.1


You should see your cluster responding and nodes in a Ready state.

Installing Prometheus + Grafana

We'll start by setting up our monitoring backbone using the kube-prometheus-stack Helm chart. This comprehensive solution deploys Prometheus for metric collection and Grafana for visualization, with everything pre-configured to work together out of the box.

First, add the Prometheus community Helm repository:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Create a dedicated namespace for our monitoring components and install the stack:

helm upgrade -i kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 77.12.0 -f https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/otel/prometheus-stack-values.yaml -n monitoring --create-namespace --kubeconfig kconfig.yaml

The values file from the ToolHive repository is configured specifically for this tutorial's architecture and sets up the following (a simplified sketch of these settings appears after the list):

  • Prometheus with scrape jobs ready to pull metrics from our OTel collector
  • Grafana with admin credentials (admin:admin for testing)
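
For orientation, here's a minimal sketch of what those values roughly look like. It is not the exact contents of the referenced file; in particular the scrape job name and the collector's metrics port are assumptions used only for illustration:

# Simplified sketch of the kube-prometheus-stack values (illustrative only)
grafana:
  adminPassword: admin              # test-only credentials
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: otel-collector    # hypothetical job name
        static_configs:
          - targets:
              # assumed Prometheus-exporter port on the collector service
              - otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:8889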

Wait for all components to be ready:

kubectl get pods -n monitoring --kubeconfig kconfig.yaml
NAME                                                        READY   STATUS    RESTARTS   AGE
kube-prometheus-stack-grafana-6c5cb68857-4hmw4              3/3     Running   0          30s
kube-prometheus-stack-kube-state-metrics-557fd457c6-c489z   1/1     Running   0          30s
kube-prometheus-stack-operator-7c6d8c4dc7-j2g24             1/1     Running   0          30s
kube-prometheus-stack-prometheus-node-exporter-t5q9w        1/1     Running   0          30s
prometheus-kube-prometheus-stack-prometheus-0               2/2     Running   0          30s


All pods should show Running or Completed status.

Installing the OTel Collector

With our monitoring backend in place, we'll deploy the OpenTelemetry collector. The collector is a crucial component that receives metrics and traces from ToolHive and makes them available to Prometheus and tracing backends.

Add the OpenTelemetry Helm repository:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

Install the collector using ToolHive's specialized values file:

helm upgrade -i otel-collector open-telemetry/opentelemetry-collector  -f https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/otel/otel-values.yaml -n monitoring --kubeconfig kconfig.yaml

This values file configures the collector to do the following (a simplified version of the resulting configuration is sketched after the list):

  • Use the OTLP receiver to accept metrics and traces pushed from ToolHive
  • Enable the kubeletstats receiver, which provides container- and node-level runtime metrics (CPU, memory, network) that application-level OTel instrumentation doesn't capture on its own
  • Enable the Kubernetes attributes processor to add pod/namespace context to telemetry
  • Enable the Prometheus exporter, making collected metrics available for Prometheus scraping
  • Configure service pipelines that route metrics and traces appropriately
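
To make the pipeline shape concrete, here's a simplified sketch of the collector configuration those values produce. It is not the exact contents of the values file (the exporter port and the Jaeger service name in particular are assumptions), but the overall structure of receivers, processors, exporters, and pipelines is what you should expect:

# Simplified sketch of the OTel collector configuration (illustrative only)
receivers:
  otlp:
    protocols:
      grpc:                         # ToolHive pushes OTLP over gRPC (port 4317)
      http:
  kubeletstats:
    auth_type: serviceAccount       # container/node runtime metrics from the kubelet
processors:
  k8sattributes: {}                 # adds pod/namespace metadata to telemetry
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889          # assumed port; Prometheus scrapes this endpoint
  otlp/jaeger:
    endpoint: jaeger-collector.monitoring.svc.cluster.local:4317   # assumed Jaeger service name
service:
  pipelines:
    metrics:
      receivers: [otlp, kubeletstats]
      processors: [k8sattributes]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      processors: [k8sattributes]
      exporters: [otlp/jaeger]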

Verify the collector is running:

kubectl get pods -n monitoring -l app.kubernetes.io/name=opentelemetry-collector --kubeconfig kconfig.yaml
NAME                                                 READY   STATUS    RESTARTS   AGE
otel-collector-opentelemetry-collector-agent-g5crz   1/1     Running   0          33s


Installing the Jaeger Backend for Trace Querying

Metrics now have a home in Prometheus, but the traces ToolHive emits still need a backend where they can be stored and queried. We'll use Jaeger for this: the OTel collector forwards traces to it, and later we'll query them from Grafana via its Jaeger data source.

Add the Jaeger Helm repository:

helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update

Install Jaeger using ToolHive's specialized values file:

helm upgrade -i jaeger-all-in-one jaegertracing/jaeger -f https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/otel/jaeger-values.yaml -n monitoring --kubeconfig kconfig.yaml
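
As with the other components, confirm the Jaeger pod comes up before moving on. The exact pod name and labels depend on the chart, so a plain listing filtered with grep is the simplest check:

kubectl get pods -n monitoring --kubeconfig kconfig.yaml | grep jaeger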

Deploying ToolHive Operator and MCP Server

Now that our observability stack is ready, we can deploy the ToolHive operator. The operator is a Kubernetes-native tool that simplifies the management and deployment of MCP servers with built-in observability.

Install the CRDs (Custom Resource Definitions) and the operator:

helm upgrade --install toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds --version 0.0.27 --kubeconfig kconfig.yaml

helm upgrade --install toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator \
  -n toolhive-system \
  --create-namespace --version 0.2.18 \
  --kubeconfig kconfig.yaml

Wait for the operator to be ready:

kubectl get pods -n toolhive-system --kubeconfig kconfig.yaml

NAME                                READY   STATUS    RESTARTS   AGE
toolhive-operator-95b55b47d-pbqlh   1/1     Running   0          31s

Now let's deploy a sample MCP server. We'll use gofetch, a simple MCP server that fetches web pages and returns their content:

kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/operator/mcp-servers/mcpserver_fetch_otel.yaml --kubeconfig kconfig.yaml

This MCPServer custom resource automatically configures:

  • The MCP server container (gofetch)
  • ToolHive proxy for client access and observability
  • Telemetry settings for OTel integration
  • Service configuration for client access

Let's examine the key telemetry configuration in the MCPServer resource:

spec:
  telemetry:
    otel:
      enabled: true
      endpoint: "http://otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4317"
      protocol: "grpc"
    metrics:
      enabled: true
    tracing:
      enabled: true
      sampleRate: 1.0

This tells ToolHive to:

  • Enable OpenTelemetry export for both metrics and traces
  • Send telemetry to our OTel collector using gRPC
  • Include metrics about MCP operations

Verify everything is running:

kubectl get pods -n toolhive-system --kubeconfig kconfig.yaml
NAME                                READY   STATUS    RESTARTS   AGE
fetch-0                             1/1     Running   0          2m59s
fetch-7d988cbd46-cqdzq              1/1     Running   0          3m4s
toolhive-operator-95b55b47d-pbqlh   1/1     Running   0          3m31s


kubectl get mcpserver -n toolhive-system --kubeconfig kconfig.yaml
NAME    STATUS    URL                                                             AGE
fetch   Running   http://mcp-fetch-proxy.toolhive-system.svc.cluster.local:8080   2m27s


You should see the MCP server and proxy pods in a Running state, and the MCPServer custom resource reporting a Running status.

Generating Traffic to Produce Metrics and Traces

To see metrics in action, we need to generate some traffic. Since the MCP server is running inside the cluster, we'll use kubectl port-forward to expose it locally.

Port-forward the ToolHive proxy service:

kubectl port-forward -n toolhive-system service/mcp-fetch-proxy --kubeconfig kconfig.yaml 8080:8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080


In a new terminal, initialize an MCP session to get a session ID:

SESSION_ID=$(curl -s -D /dev/stderr \
  -X POST "http://localhost:8080/mcp" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Mcp-Protocol-Version: 2025-06-18" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2025-06-18",
      "capabilities": {},
      "clientInfo": {
        "name": "curl-client",
        "version": "1.0.0"
      }
    }
  }' 2>&1 >/dev/null | grep "Mcp-Session-Id:" | cut -d' ' -f2 | tr -d '\r')

echo "Session ID: $SESSION_ID"

Now use the session ID to make tool call requests:

curl -X POST "http://localhost:8080/mcp" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Mcp-Protocol-Version: 2025-06-18" \
  -H "Mcp-Session-Id: $SESSION_ID" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "fetch",
      "arguments": {
        "url": "https://github.com/stacklok/toolhive",
        "max_length": 100,
        "raw": false
      }
    }
  }'

The response should look something like:

event: message
id: D4AVERIINDK3ZTBYUKUHM5PCV5_0
data: {"jsonrpc":"2.0","id":3,"result":{"content":[{"type":"text","text":"![ToolHive Studio logo](/stacklok/toolhive/raw/main/docs/images/toolhive-icon-1024.png)![ToolHive wo\n\n[Content truncated. Use start_index to get more content.]"}]}}


Repeat this request 10-15 times with different URLs to generate meaningful traffic for our metrics dashboards. A small shell loop, as sketched below, makes this easy.
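
For example, with the port-forward still running and $SESSION_ID set in your shell, a loop like this (the URLs are arbitrary choices) fires off a handful of tool calls:

i=10
for url in https://github.com/stacklok/toolhive \
           https://opentelemetry.io \
           https://prometheus.io \
           https://grafana.com \
           https://www.cncf.io; do
  curl -s -X POST "http://localhost:8080/mcp" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -H "Mcp-Protocol-Version: 2025-06-18" \
    -H "Mcp-Session-Id: $SESSION_ID" \
    -d "{
      \"jsonrpc\": \"2.0\",
      \"id\": $i,
      \"method\": \"tools/call\",
      \"params\": {
        \"name\": \"fetch\",
        \"arguments\": {\"url\": \"$url\", \"max_length\": 100, \"raw\": false}
      }
    }"
  echo
  i=$((i+1))
done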

Each request generates telemetry data that ToolHive captures and forwards to the OTel collector.

Visualizing Metrics in Grafana

Now for the exciting part: visualizing our metrics! Since the kube-prometheus-stack automatically deploys Grafana, we just need to expose it locally.

Port-forward the Grafana service:

kubectl port-forward -n monitoring service/kube-prometheus-stack-grafana 3000:80 --kubeconfig kconfig.yaml

Navigate to http://localhost:3000 and log in with the default credentials:

  • Username: admin
  • Password: admin

Importing the ToolHive Dashboard

We've created a starter dashboard to help you visualize some simple MCP server metrics. To import it:

  1. Click the "+" icon in Grafana's left sidebar
  2. Select "Import dashboard"
  3. In the "Import via panel JSON" text box, paste the contents from an example dashboard that we've created
  4. Click "Load" and then "Import"
    • You may need to shorten the dashboard's UID; Grafana sometimes rejects the import when the UID is too long

After importing, you'll see panels populated with real-time data from your MCP server.

Exploring Metrics with PromQL

You can also explore metrics directly using Prometheus queries. Go to "Explore" in Grafana, select the Prometheus data source, and try queries against the custom ToolHive metrics; a couple of starting points are sketched below.
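
ToolHive's exact metric names can vary between versions, so rather than guessing, start with a discovery query that lists every metric carrying a toolhive prefix:

group by (__name__) ({__name__=~"toolhive.*"})

Once you know the real name of the request counter, a per-series request rate looks like this (toolhive_mcp_requests_total is a placeholder, not a guaranteed name):

rate(toolhive_mcp_requests_total[5m])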

This is powerful because it uses the same tools and dashboards you already use for your other workloads, bringing your MCP servers into the fold of your existing observability practice.

Note that CPU and memory metrics come from the OTel collector's kubeletstats receiver rather than directly from the Go application, providing more comprehensive resource monitoring.
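
The same discovery pattern works for the kubeletstats metrics; the k8s_pod prefix below is an assumption about how the OTel metric names are translated for Prometheus, so adjust it if your metric browser shows something different:

group by (__name__) ({__name__=~"k8s_pod_.*"})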

Visualizing Tracing in Grafana

We can also explore traces in Grafana. These were reported by the ToolHive ProxyRunner to the OTel collector, which forwarded them on to the Jaeger backend.

To do this:

  1. Go to "Explore" in the side menu
  2. Ensure the Jaeger data source is selected
  3. Click "Search" instead of "TraceID"
  4. Open the "Service Name" dropdown; you should see the MCP server name. Select it and click "Run Query"
  5. Several traces should appear. Click into one, and you should see the single span reported by the ProxyRunner

Congratulations! You've successfully deployed a complete observability pipeline for MCP workloads and queried their metrics and traces. You've transformed a black-box service into a transparent, observable part of your system. This setup demonstrates Architecture 1 from our previous post: ToolHive pushes telemetry to an OTel collector, which Prometheus scrapes for metrics while traces flow to your tracing backend.

Call to Action: Contribute, Share Feedback, Explore Docs

By now, you've seen how ToolHive integrates seamlessly with OTel and Prometheus to make MCP workloads observable inside Kubernetes. With Prometheus scraping metrics, OTel collecting richer signals, and Grafana visualizing results, you've got a practical foundation for monitoring MCP servers.

The setup you've deployed represents just the beginning of what's possible with MCP observability. We encourage you to:

Try Different MCP Servers: Deploy other MCP servers and see how they behave differently in your dashboards. Each server type may expose different usage patterns and performance characteristics.

Share Your Experience: Join our community discussions on GitHub or Discord to share what you've learned and help improve ToolHive.

Contribute Back: Found issues or have ideas for improvements of custom metrics? The project team welcomes contributions, whether they're bug reports, feature requests, or code contributions.

If you missed the earlier posts in this series, be sure to check them out.

The story doesn't stop here. We'll continue exploring advanced observability features and new integrations as the MCP ecosystem evolves. The foundation you've built today will serve you well as both ToolHive and the broader observability landscape continue to mature.
