<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chris Burns</title>
    <description>The latest articles on DEV Community by Chris Burns (@chrisjburns).</description>
    <link>https://dev.to/chrisjburns</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3060357%2Fe1d31d63-a3fc-4a93-9967-420a4f27f48e.png</url>
      <title>DEV Community: Chris Burns</title>
      <link>https://dev.to/chrisjburns</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chrisjburns"/>
    <language>en</language>
    <item>
      <title>From Black Box to Observable: Deploying ToolHive with OTel + Prometheus in Kubernetes</title>
      <dc:creator>Chris Burns</dc:creator>
      <pubDate>Tue, 30 Sep 2025 17:49:19 +0000</pubDate>
      <link>https://dev.to/stacklok/from-black-box-to-observable-deploying-toolhive-with-otel-prometheus-in-kubernetes-lhg</link>
      <guid>https://dev.to/stacklok/from-black-box-to-observable-deploying-toolhive-with-otel-prometheus-in-kubernetes-lhg</guid>
      <description>&lt;p&gt;In our previous two posts, we laid the groundwork for modern Kubernetes observability. We explored why OpenTelemetry (OTel) and Prometheus work best in tandem, and how ToolHive helps bridge the observability gap for Model Context Protocol (MCP) servers that rarely expose their own usage metrics.&lt;/p&gt;

&lt;p&gt;Now it's time to get hands-on. ToolHive sits in front of your MCP servers, collecting vital usage statistics and feeding them directly into your existing observability stack. In this tutorial, we'll walk through deploying ToolHive in a Kubernetes cluster alongside OTel, Prometheus, and Grafana. By the end, you'll have transformed your black-box MCP workloads into observable services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites: Kubernetes, Helm, kubectl
&lt;/h2&gt;

&lt;p&gt;Before we begin, you'll need the following tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A Kubernetes cluster&lt;/strong&gt;: Any cluster will do. For this tutorial, we're using a local cluster created with &lt;a href="https://kind.sigs.k8s.io/" rel="noopener noreferrer"&gt;kind&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Helm 3&lt;/strong&gt;: The package manager for Kubernetes, making it easy to deploy complex applications like monitoring stacks
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;kubectl&lt;/strong&gt;: The command-line tool for interacting with your Kubernetes cluster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're using kind, create a cluster with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kind create cluster &lt;span class="nt"&gt;--name&lt;/span&gt; toolhive-demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Export the cluster's kubeconfig to a file called &lt;code&gt;kconfig.yaml&lt;/code&gt; so that subsequent &lt;code&gt;kubectl&lt;/code&gt; and &lt;code&gt;helm&lt;/code&gt; commands don't conflict with any existing clusters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kind get kubeconfig &lt;span class="nt"&gt;--name&lt;/span&gt; toolhive-demo &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; kconfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify your cluster is ready:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl cluster-info &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
Kubernetes control plane is running at https://127.0.0.1:55371
CoreDNS is running at https://127.0.0.1:55371/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use &lt;span class="s1"&gt;'kubectl cluster-info dump'&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;


kubectl get nodes
NAME                          STATUS   ROLES           AGE   VERSION
toolhive-demo-control-plane   Ready    control-plane   14m   v1.33.1

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see your cluster responding and nodes in a Ready state.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing Prometheus + Grafana
&lt;/h2&gt;

&lt;p&gt;We'll start by setting up our monitoring backbone using the &lt;a href="https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/README.md" rel="noopener noreferrer"&gt;kube-prometheus-stack&lt;/a&gt; Helm chart. This comprehensive solution deploys Prometheus for metric collection and Grafana for visualization, with everything pre-configured to work together out of the box.&lt;/p&gt;

&lt;p&gt;First, add the Prometheus community Helm repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a dedicated namespace for our monitoring components and install the stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; kube-prometheus-stack prometheus-community/kube-prometheus-stack &lt;span class="nt"&gt;--version&lt;/span&gt; 77.12.0 &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/otel/prometheus-stack-values.yaml &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The values file from the ToolHive repository is specifically configured for this tutorial's architecture and sets up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus with scrape jobs ready to pull metrics from our OTel collector
&lt;/li&gt;
&lt;li&gt;Grafana with admin credentials (admin:admin for testing)&lt;/li&gt;
&lt;/ul&gt;
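&lt;p&gt;For orientation, the relevant pieces of a values file like this typically take the following shape. This is a sketch rather than the actual file contents; the collector service name and port are assumptions (8889 is the conventional port for the collector's Prometheus exporter):&lt;/p&gt;

```yaml
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      # Scrape job pointing at the OTel collector's Prometheus exporter
      - job_name: otel-collector
        static_configs:
          - targets: ["otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:8889"]

grafana:
  # Test-only credentials, as noted above
  adminPassword: admin
```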

&lt;p&gt;Wait for all components to be ready:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
NAME                                                        READY   STATUS    RESTARTS   AGE
kube-prometheus-stack-grafana-6c5cb68857-4hmw4              3/3     Running   0          30s
kube-prometheus-stack-kube-state-metrics-557fd457c6-c489z   1/1     Running   0          30s
kube-prometheus-stack-operator-7c6d8c4dc7-j2g24             1/1     Running   0          30s
kube-prometheus-stack-prometheus-node-exporter-t5q9w        1/1     Running   0          30s
prometheus-kube-prometheus-stack-prometheus-0               2/2     Running   0          30s

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All pods should show Running or Completed status.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing the OTel Collector
&lt;/h2&gt;

&lt;p&gt;With our monitoring backend in place, we'll deploy the OpenTelemetry collector. The collector is a crucial component that receives metrics and traces from ToolHive and makes them available to Prometheus and tracing backends.&lt;/p&gt;

&lt;p&gt;Add the OpenTelemetry Helm repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install the collector using ToolHive's specialized values file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; otel-collector open-telemetry/opentelemetry-collector  &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/otel/otel-values.yaml &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This values file configures the collector to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the &lt;strong&gt;OTLP receiver&lt;/strong&gt; to accept metrics and traces pushed from ToolHive
&lt;/li&gt;
&lt;li&gt;Enable the &lt;strong&gt;kubeletstats receiver&lt;/strong&gt;, providing valuable runtime metrics about containers and nodes that native OTel libraries sometimes miss
&lt;/li&gt;
&lt;li&gt;Enable the &lt;strong&gt;Kubernetes attributes processor&lt;/strong&gt; to add pod/namespace context to telemetry
&lt;/li&gt;
&lt;li&gt;Enable the &lt;strong&gt;Prometheus exporter&lt;/strong&gt;, making collected metrics available for Prometheus scraping
&lt;/li&gt;
&lt;li&gt;Configure &lt;strong&gt;service pipelines&lt;/strong&gt; that route metrics and traces appropriately&lt;/li&gt;
&lt;/ul&gt;
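&lt;p&gt;Put together, the collector configuration in that values file follows the standard receivers/processors/exporters/pipelines shape. The sketch below is illustrative rather than a copy of the file; endpoints and exporter names are assumptions:&lt;/p&gt;

```yaml
receivers:
  otlp:
    protocols:
      http: {}            # ToolHive pushes metrics and traces here (port 4318)
  kubeletstats: {}        # container/node runtime metrics from the kubelet
processors:
  k8sattributes: {}       # enrich telemetry with pod/namespace metadata
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889   # scraped by Prometheus
  otlp/jaeger:
    endpoint: jaeger-collector.monitoring.svc.cluster.local:4317
service:
  pipelines:
    metrics:
      receivers: [otlp, kubeletstats]
      processors: [k8sattributes]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      processors: [k8sattributes]
      exporters: [otlp/jaeger]
```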

&lt;p&gt;Verify the collector is running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring &lt;span class="nt"&gt;-l&lt;/span&gt; app.kubernetes.io/name&lt;span class="o"&gt;=&lt;/span&gt;opentelemetry-collector &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
NAME                                                 READY   STATUS    RESTARTS   AGE
otel-collector-opentelemetry-collector-agent-g5crz   1/1     Running   0          33s

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Installing Jaeger Backend for Trace Querying
&lt;/h2&gt;

&lt;p&gt;Add the Jaeger Helm repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install Jaeger using ToolHive's specialized values file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; jaeger-all-in-one jaegertracing/jaeger &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/otel/jaeger-values.yaml &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Deploying ToolHive Operator and MCP Server
&lt;/h2&gt;

&lt;p&gt;Now that our observability stack is ready, we can deploy the ToolHive operator. The operator is a Kubernetes-native tool that simplifies the management and deployment of MCP servers with built-in observability.&lt;/p&gt;

&lt;p&gt;Install the CRDs (Custom Resource Definitions) and the operator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm upgrade &lt;span class="nt"&gt;--install&lt;/span&gt; toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds &lt;span class="nt"&gt;--version&lt;/span&gt; 0.0.27 &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml

helm upgrade &lt;span class="nt"&gt;--install&lt;/span&gt; toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="nt"&gt;--version&lt;/span&gt; 0.2.18 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait for the operator to be ready:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml

NAME                                READY   STATUS    RESTARTS   AGE
toolhive-operator-95b55b47d-pbqlh   1/1     Running   0          31s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's deploy a sample MCP server. We'll use &lt;a href="https://github.com/StacklokLabs/gofetch" rel="noopener noreferrer"&gt;gofetch&lt;/a&gt;, a simple MCP server that provides web scraping capabilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/operator/mcp-servers/mcpserver_fetch_otel.yaml &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This MCPServer custom resource automatically configures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The MCP server container (gofetch)
&lt;/li&gt;
&lt;li&gt;ToolHive proxy for client access and observability
&lt;/li&gt;
&lt;li&gt;Telemetry settings for OTel integration
&lt;/li&gt;
&lt;li&gt;Service configuration for client access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's examine the key telemetry configuration in the MCPServer resource:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spec:
  telemetry:
    openTelemetry:
      enabled: true
      endpoint: otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4318
      serviceName: mcp-fetch-server
      metrics:
        enabled: true
      tracing:
        enabled: true
        samplingRate: "1.0"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells ToolHive to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable OpenTelemetry export for both metrics and traces
&lt;/li&gt;
&lt;li&gt;Send telemetry to our OTel collector over OTLP/HTTP (the endpoint's port 4318 is the OTLP/HTTP port; gRPC would use 4317)
&lt;/li&gt;
&lt;li&gt;Include metrics about MCP operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Verify everything is running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
NAME                                READY   STATUS    RESTARTS   AGE
fetch-0                             1/1     Running   0          2m59s
fetch-7d988cbd46-cqdzq              1/1     Running   0          3m4s
toolhive-operator-95b55b47d-pbqlh   1/1     Running   0          3m31s


kubectl get mcpserver &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
NAME    STATUS    URL                                                             AGE
fetch   Running   http://mcp-fetch-proxy.toolhive-system.svc.cluster.local:8080   2m27s

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see both the MCP server pod and the MCPServer custom resource showing as ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generating Traffic to Produce Metrics and Traces
&lt;/h2&gt;

&lt;p&gt;To see metrics in action, we need to generate some traffic. Since the MCP server is running inside the cluster, we'll use kubectl port-forward to expose it locally.&lt;/p&gt;

&lt;p&gt;Port-forward the ToolHive proxy service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl port-forward &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system service/mcp-fetch-proxy &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml 8080:8080
Forwarding from 127.0.0.1:8080 -&amp;gt; 8080
Forwarding from &lt;span class="o"&gt;[&lt;/span&gt;::1]:8080 -&amp;gt; 8080

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a new terminal, initialize an MCP session to get a session ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;SESSION_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-D&lt;/span&gt; /dev/stderr &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"http://localhost:8080/mcp"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Accept: application/json, text/event-stream"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Mcp-Protocol-Version: 2025-06-18"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2025-06-18",
      "capabilities": {},
      "clientInfo": {
        "name": "curl-client",
        "version": "1.0.0"
      }
    }
  }'&lt;/span&gt; 2&amp;gt;&amp;amp;1 &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/dev/null | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"Mcp-Session-Id:"&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="nt"&gt;-f2&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'\r'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Session ID: &lt;/span&gt;&lt;span class="nv"&gt;$SESSION_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
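&lt;p&gt;If you want to see what that extraction pipeline does before pointing it at a live server, you can run it against a canned header dump. The session ID here is made up purely for illustration:&lt;/p&gt;

```shell
# A canned copy of the response headers from the initialize call
# (real responses use CRLF line endings, which is why the pipeline
# strips the trailing carriage return with tr)
headers='HTTP/1.1 200 OK
Content-Type: text/event-stream
Mcp-Session-Id: abc123def456'

# Same pipeline as above: isolate the header line, take the value, strip any CR
SESSION_ID=$(printf '%s' "$headers" | grep "Mcp-Session-Id:" | cut -d' ' -f2 | tr -d '\r')
echo "Session ID: $SESSION_ID"
```

&lt;p&gt;This prints &lt;code&gt;Session ID: abc123def456&lt;/code&gt;, confirming the pipeline isolates just the header value.&lt;/p&gt;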



&lt;p&gt;Now use the session ID to make tool call requests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"http://localhost:8080/mcp"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Accept: application/json, text/event-stream"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Mcp-Protocol-Version: 2025-06-18"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Mcp-Session-Id: &lt;/span&gt;&lt;span class="nv"&gt;$SESSION_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "fetch",
      "arguments": {
        "url": "https://github.com/stacklok/toolhive",
        "max_length": 100,
        "raw": false
      }
    }
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response should look something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;event:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;message&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;id:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;D&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="err"&gt;AVERIINDK&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;ZTBYUKUHM&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="err"&gt;PCV&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="err"&gt;_&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;data:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"jsonrpc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"2.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:[{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"![ToolHive Studio logo](/stacklok/toolhive/raw/main/docs/images/toolhive-icon-1024.png)![ToolHive wo&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;[Content truncated. Use start_index to get more content.]"&lt;/span&gt;&lt;span class="p"&gt;}]}}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repeat this request 10-15 times with different URLs to generate meaningful traffic for our metrics dashboards.&lt;/p&gt;
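&lt;p&gt;A small loop makes this easy. This sketch assumes the port-forward is still running and reuses the &lt;code&gt;SESSION_ID&lt;/code&gt; from the previous step; the URL list is arbitrary:&lt;/p&gt;

```shell
# A few arbitrary pages to fetch through the MCP server
URLS="https://github.com/stacklok/toolhive https://docs.stacklok.com https://dev.to"

n=0
for url in $URLS; do
  # Fire one tools/call request per URL; errors are ignored so the loop
  # keeps going even if a single fetch fails
  curl -s -o /dev/null --max-time 5 \
    -X POST "http://localhost:8080/mcp" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -H "Mcp-Protocol-Version: 2025-06-18" \
    -H "Mcp-Session-Id: $SESSION_ID" \
    -d '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"fetch","arguments":{"url":"'"$url"'","max_length":100,"raw":false}}}' || true
  n=$((n+1))
done
echo "Sent $n requests"
```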

&lt;p&gt;Each request generates telemetry data that ToolHive captures and forwards to the OTel collector.&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualizing Metrics in Grafana
&lt;/h2&gt;

&lt;p&gt;Now for the exciting part: visualizing our metrics! Since the kube-prometheus-stack automatically deploys Grafana, we just need to expose it locally.&lt;/p&gt;

&lt;p&gt;Port-forward the Grafana service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl port-forward &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring service/kube-prometheus-stack-grafana 3000:80 &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; kconfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to &lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt; and log in with the default credentials:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Username: &lt;code&gt;admin&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Password: &lt;code&gt;admin&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Importing the ToolHive Dashboard
&lt;/h3&gt;

&lt;p&gt;We've created a starter dashboard to help you visualize some simple MCP server metrics. To import it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click the "&lt;strong&gt;+&lt;/strong&gt;" icon in the top-right of the Grafana UI&lt;/li&gt;
&lt;li&gt;Select "&lt;strong&gt;Import dashboard&lt;/strong&gt;"
&lt;/li&gt;
&lt;li&gt;In the "&lt;strong&gt;Import via panel JSON&lt;/strong&gt;" text box, paste the contents from an &lt;a href="https://raw.githubusercontent.com/stacklok/toolhive/6929a52b4460cd0951c30e8ca65490f7b38e91ca/examples/otel/grafana-dashboards/toolhive-mcp-grafana-dashboard-otel-scrape.json" rel="noopener noreferrer"&gt;example dashboard&lt;/a&gt; that we've created &lt;/li&gt;
&lt;li&gt;Click "&lt;strong&gt;Load&lt;/strong&gt;" and then "&lt;strong&gt;Import&lt;/strong&gt;"

&lt;ul&gt;
&lt;li&gt;You may need to shorten the dashboard UID in the JSON; Grafana sometimes rejects imports with long UIDs&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After importing, you'll see panels populated with real-time data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdi94bxbu2hpyausrj33.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdi94bxbu2hpyausrj33.png" alt=" " width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Exploring Metrics with PromQL
&lt;/h3&gt;

&lt;p&gt;You can also explore metrics directly with Prometheus queries. Go to "Explore" in Grafana and try queries against the custom ToolHive &lt;a href="https://docs.stacklok.com/toolhive/concepts/observability#metrics-collection" rel="noopener noreferrer"&gt;metrics&lt;/a&gt;.&lt;/p&gt;
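&lt;p&gt;As a starting point, the queries below show common PromQL patterns. The metric names are assumptions for illustration; check the linked metrics documentation for the exact names ToolHive emits:&lt;/p&gt;

```text
# Request rate per MCP server over the last 5 minutes
# (assumes a counter named toolhive_mcp_requests)
sum(rate(toolhive_mcp_requests[5m])) by (server)

# 95th-percentile tool call latency
# (assumes a duration histogram named toolhive_mcp_request_duration_bucket)
histogram_quantile(0.95, sum(rate(toolhive_mcp_request_duration_bucket[5m])) by (le))
```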

&lt;p&gt;This is powerful because it uses the same tools and dashboards you already use for your other workloads, bringing your MCP servers into the fold of your existing observability practice.&lt;/p&gt;

&lt;p&gt;Note that CPU and memory metrics come from the OTel collector's kubeletstats receiver rather than directly from the Go application, providing more comprehensive resource monitoring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualizing Tracing in Grafana
&lt;/h2&gt;

&lt;p&gt;We can also explore traces in Grafana that the ToolHive ProxyRunner reported to the OTel collector, which forwarded them on to the Jaeger backend.&lt;/p&gt;

&lt;p&gt;To do this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to "&lt;strong&gt;Explore&lt;/strong&gt;" on the side menu
&lt;/li&gt;
&lt;li&gt;Ensure the Jaeger Data source is selected&lt;/li&gt;
&lt;li&gt;Click “&lt;strong&gt;Search&lt;/strong&gt;” instead of “&lt;strong&gt;TraceID&lt;/strong&gt;”
&lt;/li&gt;
&lt;li&gt;Select the “&lt;strong&gt;Service Name&lt;/strong&gt;” dropdown and you should see the MCP server name. Select it and click “&lt;strong&gt;Run Query&lt;/strong&gt;”
&lt;/li&gt;
&lt;li&gt;Several traces should appear; click into one and you should see the single span reported by the ProxyRunner&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftybymer3p8mbhrtwi2w8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftybymer3p8mbhrtwi2w8.png" alt=" " width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Congratulations! You've successfully deployed a complete observability pipeline for MCP workloads and queried their metrics and traces. You've transformed a black-box service into a transparent, observable part of your system. This setup demonstrates Architecture 1 from our previous post: ToolHive pushes telemetry to an OTel collector, which Prometheus scrapes for metrics while traces flow to your tracing backend.&lt;/p&gt;

&lt;h2&gt;
  
  
  Call to Action: Contribute, Share Feedback, Explore Docs
&lt;/h2&gt;

&lt;p&gt;By now, you've seen how ToolHive integrates seamlessly with OTel and Prometheus to make MCP workloads observable inside Kubernetes. With Prometheus scraping metrics, OTel collecting richer signals, and Grafana visualizing results, you've got a practical foundation for monitoring MCP servers.&lt;/p&gt;

&lt;p&gt;The setup you've deployed represents just the beginning of what's possible with MCP observability. We encourage you to:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try Different MCP Servers&lt;/strong&gt;: Deploy other MCP servers and see how they behave differently in your dashboards. Each server type may expose different usage patterns and performance characteristics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Share Your Experience&lt;/strong&gt;: Join our community discussions on &lt;a href="https://github.com/stacklok/toolhive" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; or &lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; to share what you've learned and help improve ToolHive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contribute Back&lt;/strong&gt;: Found issues or have ideas for improvements, such as new custom metrics? The project team welcomes contributions, whether they're bug reports, feature requests, or code.&lt;/p&gt;

&lt;p&gt;If you missed the earlier posts in this series, be sure to check out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Post 1&lt;/strong&gt;: &lt;a href="https://dev.to/stacklok/the-next-big-observability-gap-for-kubernetes-is-mcp-servers-421d"&gt;The Next Observability Challenge: OTel, Prometheus, and MCP Servers in Kubernetes&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Post 2&lt;/strong&gt;: &lt;a href="https://dev.to/stacklok/bridging-the-observability-gap-in-mcp-servers-with-toolhive-3827"&gt;Bridging the Observability Gap in MCP Servers with ToolHive&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The story doesn't stop here. We'll continue exploring advanced observability features and new integrations as the MCP ecosystem evolves. The foundation you've built today will serve you well as both ToolHive and the broader observability landscape continue to mature.  &lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>monitoring</category>
      <category>tooling</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Bridging the Observability Gap in MCP Servers with ToolHive</title>
      <dc:creator>Chris Burns</dc:creator>
      <pubDate>Thu, 25 Sep 2025 16:16:38 +0000</pubDate>
      <link>https://dev.to/stacklok/bridging-the-observability-gap-in-mcp-servers-with-toolhive-3827</link>
      <guid>https://dev.to/stacklok/bridging-the-observability-gap-in-mcp-servers-with-toolhive-3827</guid>
      <description>&lt;p&gt;In our previous post, we explored why Kubernetes observability requires both OpenTelemetry (OTel) and Prometheus. Together, they form a powerful foundation for monitoring modern workloads, but only when those workloads expose telemetry. What happens when they don't?&lt;/p&gt;

&lt;p&gt;That's exactly the case with many Model Context Protocol (MCP) servers. These servers run critical workloads but rarely expose metrics or integrate with observability frameworks. For operations teams, they behave like black boxes; you see requests going in and responses coming out, but nothing about what's happening inside.&lt;/p&gt;

&lt;p&gt;ToolHive was built to reduce this gap. Running natively inside Kubernetes, ToolHive acts as an intelligent proxy that collects usage statistics from MCP servers without requiring any modifications to the servers themselves. It then seamlessly feeds that data into your existing OTel + Prometheus stack, giving you the same dashboards, alerts, and reliability insights you rely on for other workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recap: The MCP Observability Problem
&lt;/h2&gt;

&lt;p&gt;The core issue is a mismatch between modern observability standards and the operational reality of many MCP servers. While Prometheus expects to scrape a &lt;code&gt;/metrics&lt;/code&gt; endpoint and OTel expects data to be pushed from instrumented applications, many MCP servers do neither. They are designed for a single purpose: providing specialized capabilities to AI systems by bridging models to the real world. Operational telemetry is often an afterthought, if it's considered at all.&lt;/p&gt;

&lt;p&gt;This lack of metrics makes it impossible to answer basic but critical questions: How many requests is my MCP server handling per second? What is the average latency of tool calls? Is the server experiencing errors or timeouts? How much CPU and memory is the server consuming?&lt;/p&gt;

&lt;p&gt;Without this data, you're flying blind, unable to optimize performance, troubleshoot issues, or ensure the reliability of your AI-powered applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  How ToolHive Collects Metrics
&lt;/h2&gt;

&lt;p&gt;ToolHive's approach is straightforward: instead of relying on MCP servers to expose their own metrics, it wraps them and acts as an intermediary for all client requests. ToolHive runs alongside MCP servers in Kubernetes and observes their activity directly at the orchestration layer.&lt;/p&gt;

&lt;p&gt;As requests and responses flow through ToolHive, it observes and records key operational data points: request counts and rates, response latency and duration, error codes and status, and tool usage statistics. It also generates distributed traces for each MCP interaction, providing end-to-end visibility into request flows. By intercepting request and usage data, ToolHive can measure request volumes, latencies, and error rates while attributing metrics to specific MCP servers or workloads.&lt;/p&gt;

&lt;p&gt;This approach decouples observability from the MCP server itself. Zero server modification means existing MCP servers work immediately without code changes or additional dependencies. Protocol awareness allows ToolHive to understand MCP-specific operations like tool calls and resource requests, providing metrics that generic proxies couldn't capture. Kubernetes native deployment means it integrates naturally with service discovery and scaling patterns.&lt;/p&gt;

&lt;p&gt;Since ToolHive is built with OTel and Prometheus in mind, it generates and exposes both metrics and traces in formats your existing monitoring stack can consume immediately, normalising data into standard OTel and Prometheus formats.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four Supported Architectures
&lt;/h2&gt;

&lt;p&gt;ToolHive is designed for flexibility and can integrate into a variety of observability setups. It supports four primary architectures for feeding data to your pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture 1 (Recommended): ToolHive → OTel Collector ← Prometheus&lt;/strong&gt; ToolHive pushes both metrics and traces to an OpenTelemetry collector using OTLP (OpenTelemetry Protocol). The collector exposes a &lt;code&gt;/metrics&lt;/code&gt; endpoint that Prometheus scrapes for metrics data, while traces are exported to your tracing backend (like Jaeger or Tempo). This is a robust and scalable architecture that centralizes data collection and processing while leveraging the pull-based reliability of Prometheus.&lt;/p&gt;
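&lt;p&gt;As a rough sketch, an OTel Collector configuration for this architecture might look like the following (the endpoints and the Tempo backend address are illustrative assumptions, not ToolHive defaults):&lt;/p&gt;

```yaml
receivers:
  otlp:                       # ToolHive pushes metrics and traces here via OTLP
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  prometheus:                 # exposes a /metrics endpoint for Prometheus to scrape
    endpoint: 0.0.0.0:8889
  otlp/traces:                # forwards traces to a tracing backend (e.g. Tempo)
    endpoint: tempo.monitoring.svc.cluster.local:4317

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      exporters: [otlp/traces]
```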

&lt;p&gt;&lt;strong&gt;Architecture 2: ToolHive → OTel Collector → Prometheus (RemoteWrite)&lt;/strong&gt; Similar to Architecture 1, ToolHive pushes metrics and traces to the OTel collector. The collector exports traces to your tracing backend and uses the Prometheus RemoteWrite exporter to push metrics directly to the Prometheus server. This reduces scraping overhead but can lose data if Prometheus is unavailable when the push occurs.&lt;/p&gt;
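&lt;p&gt;A hedged sketch of the collector's metrics pipeline for this variant (the Prometheus address is illustrative; this exporter ships in the collector-contrib distribution, and Prometheus must be started with its remote-write receiver enabled):&lt;/p&gt;

```yaml
exporters:
  prometheusremotewrite:
    endpoint: http://prometheus.monitoring.svc.cluster.local:9090/api/v1/write

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
```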

&lt;p&gt;&lt;strong&gt;Architecture 3: ToolHive ← Prometheus (Direct Scrape)&lt;/strong&gt; ToolHive exposes its own &lt;code&gt;/metrics&lt;/code&gt; endpoint for Prometheus scraping, while traces are still pushed to an OTel collector for export to tracing backends. This is the simplest setup for metrics collection but requires separate configuration for trace export.&lt;/p&gt;
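&lt;p&gt;For this setup, Prometheus scrapes ToolHive directly. A minimal annotation-based scrape configuration might look like this (the annotation convention is a common Kubernetes pattern, not something ToolHive mandates):&lt;/p&gt;

```yaml
scrape_configs:
  - job_name: toolhive
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # keep only pods that opt in via the prometheus.io/scrape annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```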

&lt;p&gt;&lt;strong&gt;Architecture 4: Hybrid&lt;/strong&gt; This approach maximizes flexibility: ToolHive pushes traces to an OTel collector (which exports to tracing backends) while exposing a &lt;code&gt;/metrics&lt;/code&gt; endpoint that Prometheus scrapes directly. This provides full observability coverage but adds operational complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why We Recommend Architecture 1
&lt;/h2&gt;

&lt;p&gt;While all four architectures are valid, Architecture 1 represents the best practice for most modern Kubernetes environments, offering several key advantages:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Centralization and Standardization&lt;/strong&gt;: It centralizes both metrics and traces in a single pipeline, making it easier to manage, enrich, and route to different backends. For organizations already using OTel collectors for other services, this architecture maintains consistency across the monitoring stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reliability&lt;/strong&gt;: The pull-based model of Prometheus is inherently reliable for metrics. The OTel collector acts as a reliable buffer between ToolHive and both Prometheus and tracing backends, handling temporary network issues or unavailability gracefully. If Prometheus is down for maintenance, it can catch up by scraping when it comes back online, rather than losing data that would have been pushed during the outage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;: The OTel collector can process and export both metrics and traces to any number of destinations. For metrics, this includes Prometheus, long-term storage, or analytics platforms. For traces, it can route to Jaeger, Tempo, or other tracing backends. It can add labels, perform transformations, and route data to multiple backends if needed, avoiding vendor lock-in.&lt;/p&gt;

&lt;h2&gt;
  
  
  ToolHive in Action: Real Metrics and Traces From MCP Servers
&lt;/h2&gt;

&lt;p&gt;Once you've deployed ToolHive and configured it to work with your OTel + Prometheus stack, your dashboards will be populated with both metrics and traces that provide immediate visibility into MCP server operations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Request Metrics&lt;/strong&gt; include counters (&lt;code&gt;toolhive_mcp_requests_total&lt;/code&gt;) for total requests and the &lt;code&gt;toolhive_mcp_request_duration_seconds_*&lt;/code&gt; histogram, which captures p95 and p99 latency, broken down by MCP server and operation type. These help identify usage patterns and performance trends across your MCP infrastructure.&lt;/p&gt;
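&lt;p&gt;Assuming standard Prometheus histogram conventions (and an illustrative &lt;code&gt;mcp_server&lt;/code&gt; label), queries over these metrics might look like:&lt;/p&gt;

```promql
# Requests per second, per MCP server
sum by (mcp_server) (rate(toolhive_mcp_requests_total[5m]))

# p95 latency from the duration histogram's _bucket series
histogram_quantile(0.95,
  sum by (le) (rate(toolhive_mcp_request_duration_seconds_bucket[5m])))
```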

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsk8vwqjcszwx3d1n3nk2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsk8vwqjcszwx3d1n3nk2.png" alt=" " width="800" height="213"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool Usage Statistics&lt;/strong&gt; track which MCP tools are being called most frequently with &lt;code&gt;toolhive_mcp_tool_calls_total&lt;/code&gt; (counter for specific tool invocations), success rates for different tool types, and usage patterns over time. This data is invaluable for understanding how AI systems are interacting with your MCP servers and which capabilities are most critical.&lt;/p&gt;
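&lt;p&gt;A quick way to surface the most-used tools, assuming a hypothetical &lt;code&gt;tool&lt;/code&gt; label on the counter:&lt;/p&gt;

```promql
# five most frequently invoked tools over the last hour
topk(5, sum by (tool) (increase(toolhive_mcp_tool_calls_total[1h])))
```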

&lt;p&gt;&lt;strong&gt;Distributed Traces&lt;/strong&gt; show the complete journey of MCP requests, from client initiation through ToolHive processing to server response. Each trace includes timing information for different phases of the MCP interaction, making it possible to identify bottlenecks and understand request flow patterns. Traces are correlated with metrics through trace and span IDs, enabling powerful troubleshooting workflows.&lt;/p&gt;

&lt;p&gt;All metrics include standard Kubernetes labels for namespace, pod, and service, making it easy to aggregate and filter data in existing dashboards. These are the metrics and traces you need to build meaningful dashboards, set up critical alerts, and truly understand the health and performance of your MCP workloads. The observability data integrates seamlessly with alerting rules, allowing teams to set up notifications for MCP-specific issues like tool failure rates or unusual usage patterns.&lt;/p&gt;
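&lt;p&gt;As an illustration of such an alerting rule (the &lt;code&gt;status&lt;/code&gt; label and the 5% threshold are assumptions; adjust them to the labels ToolHive actually emits and your own error budget):&lt;/p&gt;

```yaml
groups:
  - name: toolhive
    rules:
      - alert: MCPHighToolFailureRate
        expr: |
          sum(rate(toolhive_mcp_tool_calls_total{status="error"}[5m]))
            / sum(rate(toolhive_mcp_tool_calls_total[5m])) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: More than 5% of MCP tool calls are failing
```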

&lt;h2&gt;
  
  
  Call to Action: Try ToolHive / Join the Community
&lt;/h2&gt;

&lt;p&gt;MCP observability gaps don't have to be a given. With &lt;a href="https://toolhive.dev" rel="noopener noreferrer"&gt;ToolHive&lt;/a&gt;, operations teams can gain critical visibility into their AI workloads without waiting for server-side changes or upstream telemetry support. By supporting multiple architectures - and recommending a best practice approach - ToolHive makes it possible to monitor MCP servers as part of a standard OTel + Prometheus pipeline.&lt;/p&gt;

&lt;p&gt;The project is actively developed and welcomes community input. Whether you're running a single MCP server or managing dozens across multiple clusters, ToolHive can provide the visibility you need to operate confidently. Please check out ToolHive and connect with us on &lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Ready to see it in action? In the next post, we'll walk through a hands-on guide to deploying ToolHive in Kubernetes, complete with Helm charts, kubectl steps, and a starter Grafana dashboard.&lt;/p&gt;

&lt;p&gt;If you missed the earlier posts in this series, be sure to check out: &lt;a href="https://dev.to/stacklok/the-next-big-observability-gap-for-kubernetes-is-mcp-servers-421d"&gt;Post 1: The Next Observability Challenge: OTel, Prometheus, and MCP Servers in Kubernetes&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>monitoring</category>
      <category>tooling</category>
    </item>
    <item>
      <title>The Next Big Observability Gap for Kubernetes is MCP Servers</title>
      <dc:creator>Chris Burns</dc:creator>
      <pubDate>Mon, 22 Sep 2025 14:56:18 +0000</pubDate>
      <link>https://dev.to/stacklok/the-next-big-observability-gap-for-kubernetes-is-mcp-servers-421d</link>
      <guid>https://dev.to/stacklok/the-next-big-observability-gap-for-kubernetes-is-mcp-servers-421d</guid>
      <description>&lt;p&gt;Kubernetes has become the de facto operating system for the cloud, empowering organizations to scale and orchestrate their workloads with unprecedented ease. But with this power comes a new set of challenges. As you break down monoliths into microservices and deploy hundreds or thousands of pods, each workload can become a potential black box. The very agility that makes Kubernetes so valuable also makes observability a monumental task.&lt;/p&gt;

&lt;p&gt;Prometheus and OpenTelemetry (OTel) emerged as the go-to tools for making these workloads observable, yet not every system plays by the same rules. A growing example is &lt;strong&gt;Model Context Protocol (MCP) servers&lt;/strong&gt;, which often don't expose metrics at all, creating blind spots in even the most sophisticated monitoring stacks. This gap highlights the next great challenge in Kubernetes observability.&lt;/p&gt;

&lt;p&gt;In this post, we'll explore the broader observability landscape, why OTel and Prometheus work best in tandem, and why MCP highlights the gaps that still remain in our quest for comprehensive system visibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Black Box Problem in Kubernetes
&lt;/h2&gt;

&lt;p&gt;In a monolithic application, you often have a single point of failure and a single application log to comb through. In a Kubernetes environment, that single application is now a distributed system of dozens or hundreds of services, each with its own logs, resource consumption patterns, and unique failure modes. A single user request might traverse multiple services, making it nearly impossible to trace without a robust observability strategy.&lt;/p&gt;

&lt;p&gt;This black box problem is amplified by several characteristics of Kubernetes environments. Ephemeral workloads mean that debugging information disappears when pods are terminated. Service mesh complexity introduces additional network hops and failure modes that aren't immediately visible. Multi-tenant clusters create resource contention that can be difficult to attribute to specific workloads. Dynamic scaling means that performance baselines are constantly shifting as replicas come and go.&lt;/p&gt;

&lt;p&gt;Traditional monitoring approaches that rely on host-level metrics and application logs quickly become inadequate. Pods come and go, workloads scale dynamically, and ephemeral containers rarely leave behind a trail. You need telemetry that can follow requests across service boundaries, survive pod restarts, and provide insights into the distributed system as a whole rather than just individual components.&lt;/p&gt;

&lt;h2&gt;
  
  
  Metrics, Logs, and Traces: A Quick Refresher
&lt;/h2&gt;

&lt;p&gt;Before we dive into the tools, let's quickly recap the three pillars of observability:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics&lt;/strong&gt; are numerical measurements collected over time, such as CPU utilization, request rate, error count, and response latency. They're perfect for dashboards and alerting but don't provide detail about individual requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logs&lt;/strong&gt; are timestamped event records that provide detailed context about what happened at specific points in time. Invaluable for debugging and auditing, but correlating across services can be challenging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traces&lt;/strong&gt; record the journey of individual requests through distributed systems, connecting operations across service boundaries to identify bottlenecks and dependencies.&lt;/p&gt;

&lt;p&gt;Each pillar serves different purposes, and effective observability strategies combine all three.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prometheus: The Metrics Powerhouse
&lt;/h2&gt;

&lt;p&gt;Prometheus has earned its place as the standard for collecting and storing metrics in Kubernetes environments, and for good reason. Its pull-based model is a perfect fit for a dynamic, ephemeral world where services appear and disappear constantly. The Prometheus server periodically scrapes metric endpoints (usually &lt;code&gt;/metrics&lt;/code&gt;) exposed by applications, making it naturally aligned with Kubernetes' service discovery mechanisms.&lt;/p&gt;

&lt;p&gt;What makes Prometheus particularly powerful in Kubernetes is its integration with platform concepts like ServiceMonitor and PodMonitor custom resources, which automatically discover services as they scale. The query language, PromQL, excels at time series analysis, making it straightforward to calculate rates, percentiles, and aggregations across multiple dimensions.&lt;/p&gt;
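&lt;p&gt;For instance, a ServiceMonitor that tells the Prometheus Operator to scrape every Service carrying a given label (names here are illustrative):&lt;/p&gt;

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: my-app        # scrape any Service with this label
  endpoints:
    - port: http-metrics # named port on the Service
      path: /metrics
      interval: 30s
```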

&lt;p&gt;However, Prometheus has limitations. It's primarily designed for metrics, so correlating with logs and traces requires additional tooling. The pull model can also miss short-lived processes or workloads that can't expose HTTP endpoints.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenTelemetry: The Unified Framework
&lt;/h2&gt;

&lt;p&gt;OpenTelemetry (OTel) provides a single, vendor-neutral framework for collecting metrics, logs, and traces. The &lt;strong&gt;OpenTelemetry SDK&lt;/strong&gt; lets developers instrument code once and export telemetry to multiple backends - traces to Jaeger, metrics to Prometheus, logs to Elasticsearch.&lt;/p&gt;

&lt;p&gt;Auto-instrumentation capabilities mean many applications gain observability without code changes. The OpenTelemetry Collector serves as a central hub for processing telemetry data, typically deployed in Kubernetes as both a DaemonSet and Deployment.&lt;/p&gt;

&lt;p&gt;What makes OTel particularly valuable is its ability to correlate telemetry across all three pillars. Trace spans can include logs as events, and metrics can be tagged with trace IDs, enabling workflows like jumping from dashboard alerts to specific failing traces.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why They're Better Together
&lt;/h2&gt;

&lt;p&gt;Prometheus and OTel work best together, each excelling where the other has limitations. Standardized instrumentation through OTel means developers can use one toolset regardless of telemetry type, while exposing metrics in formats Prometheus can scrape.&lt;/p&gt;

&lt;p&gt;Complementary strengths provide both high-level operational views (Prometheus) and detailed diagnostic capabilities (OTel). Shared infrastructure reduces overhead - the same Kubernetes service discovery works for both tools. Correlated troubleshooting becomes possible when metrics alerts include trace context, letting teams drill down from aggregate problems to specific failing requests.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP as the Case Study for Observability Gaps
&lt;/h2&gt;

&lt;p&gt;Model Context Protocol (MCP) servers exemplify this challenge. These lightweight applications provide context and tools to AI systems, handling requests and managing state - all behaviors that should be observable - yet they typically operate as complete black boxes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The core problem&lt;/strong&gt;: Many MCP servers prioritize minimal dependencies and fast startup over telemetry. They often lack &lt;code&gt;/metrics&lt;/code&gt; endpoints, don't log structured data, and can't be traced by standard tools. The protocol itself doesn't mandate observability standards, and established patterns don't exist yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world impact&lt;/strong&gt;: When AI systems behave unexpectedly, teams can't determine which MCP servers were involved. When response times increase, there's no visibility into whether bottlenecks are in the AI model, MCP server, or external systems. Even with comprehensive Prometheus and OTel stacks, MCP servers remain invisible, creating significant blind spots in otherwise well-monitored systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Teaser: In Our Next Post, We'll Explore ToolHive
&lt;/h2&gt;

&lt;p&gt;These observability challenges with MCP servers aren't insurmountable, but they require solutions that bridge the gap between emerging technologies and established monitoring practices.&lt;/p&gt;

&lt;p&gt;In our next post, we'll explore ToolHive, which aims to fill part of this specific gap by providing MCP tool usage data that most servers don't expose natively. We'll look at how it integrates with existing OTel and Prometheus infrastructure to make MCP servers observable within your current monitoring stack, and examine practical approaches for implementing observability patterns with other emerging technologies in Kubernetes environments.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>mcp</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Performance Testing MCP Servers in Kubernetes: Transport Choice is THE Make-or-Break Decision for Scaling MCP</title>
      <dc:creator>Chris Burns</dc:creator>
      <pubDate>Tue, 19 Aug 2025 13:16:31 +0000</pubDate>
      <link>https://dev.to/stacklok/performance-testing-mcp-servers-in-kubernetes-transport-choice-is-the-make-or-break-decision-for-1ffb</link>
      <guid>https://dev.to/stacklok/performance-testing-mcp-servers-in-kubernetes-transport-choice-is-the-make-or-break-decision-for-1ffb</guid>
      <description>&lt;p&gt;The Model Context Protocol (MCP) has emerged as a critical standard for enabling AI models to interact with external tools and data sources securely. As organisations increasingly deploy MCP servers at scale in Kubernetes environments, understanding their performance characteristics under load becomes essential for production readiness.&lt;/p&gt;

&lt;p&gt;This article analyses the findings from initial load testing performed on MCP servers running in Kubernetes with ToolHive, examining three different transport protocols and their suitability for high-concurrency production workloads.&lt;/p&gt;

&lt;h1&gt;
  
  
  Test Methodology and Setup
&lt;/h1&gt;

&lt;p&gt;The load testing was conducted using a systematic approach to evaluate three MCP transport implementations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;stdio&lt;/strong&gt;: Standard input/output communication requiring direct container attachment
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSE&lt;/strong&gt; (Server-Sent Events): HTTP-based streaming protocol
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;StreamableHTTP&lt;/strong&gt;: Custom streamable HTTP protocol designed for MCP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each transport type was subjected to various load scenarios to measure throughput, error rates, latency, and scalability characteristics. The tests focused on identifying bottlenecks and determining which transport mechanisms could reliably handle production-scale traffic.&lt;/p&gt;

&lt;p&gt;The MCP server used for testing was &lt;a href="https://github.com/StacklokLabs/yardstick" rel="noopener noreferrer"&gt;&lt;code&gt;yardstick&lt;/code&gt;&lt;/a&gt;, which exposes an &lt;code&gt;echo&lt;/code&gt; tool that simply returns the text provided in the request. This design helps eliminate caching effects, giving a clearer view of raw MCP server and ToolHive performance. Functionally similar to the &lt;code&gt;mcp/everything&lt;/code&gt; server, yardstick is containerised and supports all three transport types.&lt;/p&gt;

&lt;p&gt;This MCP server was deployed onto a local Kubernetes cluster using kind, with &lt;a href="https://toolhive.dev" rel="noopener noreferrer"&gt;ToolHive&lt;/a&gt; running the MCP server and simple port forwarding for access. Real environments will differ considerably from this setup and will typically add latency to response times.&lt;/p&gt;

&lt;h1&gt;
  
  
  Performance Findings by Transport Type
&lt;/h1&gt;

&lt;h2&gt;
  
  
  stdio Transport
&lt;/h2&gt;

&lt;p&gt;The stdio implementation demonstrated severe performance limitations that make it unsuitable for production use.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test Name&lt;/th&gt;
&lt;th&gt;Concurrent Connections&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Total Expected&lt;/th&gt;
&lt;th&gt;Actual Requests&lt;/th&gt;
&lt;th&gt;Successful&lt;/th&gt;
&lt;th&gt;Failed&lt;/th&gt;
&lt;th&gt;Req/sec&lt;/th&gt;
&lt;th&gt;Min RT&lt;/th&gt;
&lt;th&gt;Max RT&lt;/th&gt;
&lt;th&gt;Avg RT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Basic Test&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;5s&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;0.64&lt;/td&gt;
&lt;td&gt;19.78ms&lt;/td&gt;
&lt;td&gt;30.02s&lt;/td&gt;
&lt;td&gt;20.01s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Error Breakdown:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Timeouts: 8
&lt;/li&gt;
&lt;li&gt;Connection resets: 3
&lt;/li&gt;
&lt;li&gt;Connection closed: 9&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The underlying architecture’s reliance on direct container attachment introduces built-in scalability limits. Every connection consumes dedicated container resources, making horizontal scaling costly and unreliable. As a result, performance was poor even at low concurrency: out of 50 requests, only 2 succeeded, and over half never left the client due to the cascading effects of earlier timeout errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  SSE Transport
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test Name&lt;/th&gt;
&lt;th&gt;Concurrent Connections&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Total Expected&lt;/th&gt;
&lt;th&gt;Actual Requests&lt;/th&gt;
&lt;th&gt;Success Rate&lt;/th&gt;
&lt;th&gt;Req/sec&lt;/th&gt;
&lt;th&gt;Min RT&lt;/th&gt;
&lt;th&gt;Max RT&lt;/th&gt;
&lt;th&gt;Avg RT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Basic Test&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;5s&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;7.23&lt;/td&gt;
&lt;td&gt;11.13ms&lt;/td&gt;
&lt;td&gt;21.89ms&lt;/td&gt;
&lt;td&gt;18.56ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sustained Load&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;3000&lt;/td&gt;
&lt;td&gt;1861&lt;/td&gt;
&lt;td&gt;100.00%*&lt;/td&gt;
&lt;td&gt;29.87&lt;/td&gt;
&lt;td&gt;4.76ms&lt;/td&gt;
&lt;td&gt;2.00s&lt;/td&gt;
&lt;td&gt;564.57ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Success Rate: 100% of sent requests (Note: Load harness limitations prevented sending all intended requests at peak load)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Compared to stdio, SSE demonstrated far better throughput and reliability, completing all 50 requests in the basic test (where stdio succeeded on only 2) and maintaining solid performance at moderate volumes. However, under sustained heavy load, response times deteriorated, and at peak rates the test harness timed out before all requests could be issued.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;SSE is now officially deprecated (in favour of Streamable HTTP), so expect fewer and fewer MCP servers to offer this as a transport type in future.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Streamable HTTP Transport
&lt;/h2&gt;

&lt;p&gt;Streamable HTTP dominated across all metrics, with one crucial caveat: performance depends heavily on session management, differing dramatically between &lt;strong&gt;shared session pools&lt;/strong&gt; and &lt;strong&gt;unique session pools&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shared Session Pool (10 sessions)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test Scenario&lt;/th&gt;
&lt;th&gt;Concurrent Connections&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Total Expected&lt;/th&gt;
&lt;th&gt;Requests&lt;/th&gt;
&lt;th&gt;Success Rate&lt;/th&gt;
&lt;th&gt;Req/sec&lt;/th&gt;
&lt;th&gt;Min RT&lt;/th&gt;
&lt;th&gt;Max RT&lt;/th&gt;
&lt;th&gt;Avg RT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Basic Load Test&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;5s&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;7.24&lt;/td&gt;
&lt;td&gt;1.88ms&lt;/td&gt;
&lt;td&gt;15.66ms&lt;/td&gt;
&lt;td&gt;5.31ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sustained&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;3000&lt;/td&gt;
&lt;td&gt;3000&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;48.40&lt;/td&gt;
&lt;td&gt;1.02ms&lt;/td&gt;
&lt;td&gt;97.55ms&lt;/td&gt;
&lt;td&gt;5.03ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High Load&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;6000&lt;/td&gt;
&lt;td&gt;6000&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;96.78&lt;/td&gt;
&lt;td&gt;831µs&lt;/td&gt;
&lt;td&gt;135.05ms&lt;/td&gt;
&lt;td&gt;6.68ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very High Load&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;500&lt;/td&gt;
&lt;td&gt;30000&lt;/td&gt;
&lt;td&gt;18757&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;299.85&lt;/td&gt;
&lt;td&gt;1.33ms&lt;/td&gt;
&lt;td&gt;783.43ms&lt;/td&gt;
&lt;td&gt;622.20ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very High Load&lt;/td&gt;
&lt;td&gt;400&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;500&lt;/td&gt;
&lt;td&gt;30000&lt;/td&gt;
&lt;td&gt;18546&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;293.16&lt;/td&gt;
&lt;td&gt;36.87ms&lt;/td&gt;
&lt;td&gt;1.69s&lt;/td&gt;
&lt;td&gt;1.28s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very High Load&lt;/td&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;500&lt;/td&gt;
&lt;td&gt;30000&lt;/td&gt;
&lt;td&gt;19112&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;292.62&lt;/td&gt;
&lt;td&gt;5.09ms&lt;/td&gt;
&lt;td&gt;3.58s&lt;/td&gt;
&lt;td&gt;3.09s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Success Rate: 100% of sent requests (Note: Load harness limitations prevented sending all intended requests at peak load)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Unique Session Per Request
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test Scenario&lt;/th&gt;
&lt;th&gt;Concurrent Connections&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Total Expected&lt;/th&gt;
&lt;th&gt;Requests&lt;/th&gt;
&lt;th&gt;Success Rate&lt;/th&gt;
&lt;th&gt;Req/sec&lt;/th&gt;
&lt;th&gt;Min RT&lt;/th&gt;
&lt;th&gt;Max RT&lt;/th&gt;
&lt;th&gt;Avg RT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sustained&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;3000&lt;/td&gt;
&lt;td&gt;2244&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;36.07&lt;/td&gt;
&lt;td&gt;4.05ms&lt;/td&gt;
&lt;td&gt;1.31s&lt;/td&gt;
&lt;td&gt;272.93ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High Load&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;6000&lt;/td&gt;
&lt;td&gt;2086&lt;/td&gt;
&lt;td&gt;100.00%&lt;/td&gt;
&lt;td&gt;33.03&lt;/td&gt;
&lt;td&gt;5.37ms&lt;/td&gt;
&lt;td&gt;4.23s&lt;/td&gt;
&lt;td&gt;1.12s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Success Rate: 100% of sent requests (Note: Load harness limitations prevented sending all intended requests at peak load)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Streamable HTTP maintained &lt;strong&gt;100% success rates&lt;/strong&gt; across all requests sent during the scenarios while delivering &lt;strong&gt;290-300 requests per second with shared sessions&lt;/strong&gt; versus only &lt;strong&gt;30-36 requests per second with unique sessions&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Key Insight: Session Management is Everything
&lt;/h2&gt;

&lt;p&gt;The most striking finding was the &lt;strong&gt;10x performance difference&lt;/strong&gt; between shared and unique session handling in Streamable HTTP. This reveals that session reuse isn't just an optimisation - it's fundamental to achieving production-scale performance.&lt;/p&gt;

&lt;p&gt;Recommendations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Build around sessions&lt;/strong&gt;: Pool and reuse aggressively (where appropriate)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid stdio in production&lt;/strong&gt;, prefer Streamable HTTP by default (unless you have good reasons not to use it)&lt;/li&gt;
&lt;/ul&gt;
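&lt;p&gt;The session-reuse recommendation is generic rather than ToolHive-specific. A minimal sketch of the idea in Python, where &lt;code&gt;make_session&lt;/code&gt; stands in for whatever your MCP client library uses to establish a session:&lt;/p&gt;

```python
import itertools
import threading

class SessionPool:
    """Round-robin pool that reuses a fixed set of sessions
    instead of paying session-setup cost on every request."""

    def __init__(self, make_session, size=10):
        # create the sessions up front, then hand them out in rotation
        self._sessions = [make_session() for _ in range(size)]
        self._cycle = itertools.cycle(self._sessions)
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            return next(self._cycle)

# usage: each request borrows an existing session
pool = SessionPool(make_session=dict, size=2)  # dict is a stand-in session factory
a = pool.acquire()
b = pool.acquire()
assert pool.acquire() is a  # the pool wrapped around: sessions are reused
```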

&lt;h1&gt;
  
  
  The Caveats
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;The yardstick MCP server is a simple &lt;code&gt;echo&lt;/code&gt; tool with no long-running work, so it responds extremely quickly. Real MCP servers in the wild will almost certainly benchmark slower than the figures shown here.
&lt;/li&gt;
&lt;li&gt;Tests were run on a local Kubernetes cluster with port-forwarding, minimising latency. Expect slower results on remote clusters.
&lt;/li&gt;
&lt;li&gt;The load testing tool used was built specifically to run performance tests against MCP servers and is not yet battle-hardened.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  The Takeaway
&lt;/h1&gt;

&lt;p&gt;These results fundamentally change how we should think about MCP server deployments. Transport choice isn't just a technical detail - it's a make-or-break architectural decision that can determine whether your AI capabilities scale or fail under load.&lt;/p&gt;

&lt;p&gt;For teams building production AI systems with MCP, Streamable HTTP with optimised session management represents a key path forward in the current MCP landscape for achieving the reliability and performance modern applications demand.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>toolhive</category>
      <category>kubernetes</category>
      <category>ai</category>
    </item>
    <item>
      <title>ToolHive Operator: Multi-Namespace Support for Enhanced Security and Flexibility</title>
      <dc:creator>Chris Burns</dc:creator>
      <pubDate>Tue, 10 Jun 2025 17:02:13 +0000</pubDate>
      <link>https://dev.to/stacklok/toolhive-operator-multi-namespace-support-for-enhanced-security-and-flexibility-2dcn</link>
      <guid>https://dev.to/stacklok/toolhive-operator-multi-namespace-support-for-enhanced-security-and-flexibility-2dcn</guid>
      <description>&lt;p&gt;We're excited to announce a significant enhancement to the ToolHive Operator: &lt;strong&gt;multi-namespace deployment support&lt;/strong&gt;. This update provides organizations with greater flexibility and security when deploying MCP (Model Context Protocol) servers across their Kubernetes environments.&lt;/p&gt;

&lt;h1&gt;
  
  
  What's New
&lt;/h1&gt;

&lt;p&gt;The ToolHive Operator now supports two distinct deployment modes:&lt;/p&gt;

&lt;h2&gt;
  
  
  🌍 Cluster Mode (Default)
&lt;/h2&gt;

&lt;p&gt;Suitable for platform teams managing MCPServers across the entire cluster&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full cluster-wide access to manage &lt;code&gt;MCPServer&lt;/code&gt;s in any namespace&lt;/li&gt;
&lt;li&gt;Uses &lt;code&gt;ClusterRole&lt;/code&gt; and &lt;code&gt;ClusterRoleBinding&lt;/code&gt; for broad permissions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🔒 Namespace Mode (New!)
&lt;/h2&gt;

&lt;p&gt;Perfect for multi-tenant environments and organizations following the principle of least privilege&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Restricted access to only specified namespaces&lt;/li&gt;
&lt;li&gt;Uses &lt;code&gt;ClusterRole&lt;/code&gt; with namespace-specific &lt;code&gt;RoleBinding&lt;/code&gt;s for precise access control&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Why Multi-Namespace Support Matters
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Enhanced Security
&lt;/h2&gt;

&lt;p&gt;In namespace mode, the ToolHive Operator only has permissions in the namespaces you explicitly specify, which significantly reduces the blast radius and follows Kubernetes security best practices. It also prevents a compromised operator from accessing sensitive workloads in other namespaces: if an attacker exploits the operator, they can't pivot to your production databases, payment systems, or other critical applications running in separate namespaces. Finally, it eliminates the risk of accidental misconfiguration affecting unrelated services across your entire cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Tenancy Support
&lt;/h2&gt;

&lt;p&gt;Different teams can now have their own namespaces with MCPServers while maintaining strict isolation. The operator in the toolhive-system namespace can manage resources across designated team namespaces without requiring cluster-wide permissions. This eliminates resource conflicts where one team's MCP configuration could interfere with another team's. It also prevents competing resource quotas or conflicting network policies that could degrade performance. Teams can iterate independently without waiting for central infrastructure changes, accelerating development cycles while maintaining security boundaries between departments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compliance and Governance
&lt;/h2&gt;

&lt;p&gt;Organizations with strict security requirements can now deploy ToolHive with minimal necessary permissions, making it easier to pass security audits and meet compliance requirements. Security auditors can quickly verify that ToolHive follows the principle of least privilege by examining a limited set of namespace-scoped permissions rather than auditing complex cluster-wide access patterns. This reduces audit preparation time from weeks to days and helps developers satisfy InfoSec requirements upfront, avoiding the common scenario where security teams block deployments due to overly broad permissions that violate corporate security policies.&lt;/p&gt;

&lt;h1&gt;
  
  
  How It Works
&lt;/h1&gt;

&lt;p&gt;The magic happens through an RBAC pattern where the operator uses a &lt;code&gt;ClusterRole&lt;/code&gt; (for permission consistency) but applies it through namespace-specific &lt;code&gt;RoleBinding&lt;/code&gt;s. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single source of truth&lt;/strong&gt;: One &lt;code&gt;ClusterRole&lt;/code&gt; defines all the permissions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Namespace isolation&lt;/strong&gt;: &lt;code&gt;RoleBinding&lt;/code&gt;s restrict where those permissions apply&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic scaling&lt;/strong&gt;: Easy to add or remove namespace access as needed&lt;/li&gt;
&lt;/ul&gt;
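&lt;p&gt;As a rough sketch, the pattern combines one &lt;code&gt;ClusterRole&lt;/code&gt; with a &lt;code&gt;RoleBinding&lt;/code&gt; per allowed namespace. The resource names and rule list below are illustrative, not the exact manifests the Helm chart renders:&lt;/p&gt;

```yaml
# Single source of truth: one ClusterRole holds the permission set
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: toolhive-operator-manager-role   # illustrative name
rules:
  - apiGroups: ["toolhive.stacklok.dev"]
    resources: ["mcpservers"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Namespace isolation: a RoleBinding in each allowed namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: toolhive-operator-manager-rolebinding   # illustrative name
  namespace: team-frontend                      # one per allowed namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: toolhive-operator-manager-role
subjects:
  - kind: ServiceAccount
    name: toolhive-operator
    namespace: toolhive-system
```

&lt;p&gt;Because the &lt;code&gt;roleRef&lt;/code&gt; points at a &lt;code&gt;ClusterRole&lt;/code&gt; but the binding itself is namespaced, the permissions take effect only inside that namespace; granting access to another namespace later is just one more &lt;code&gt;RoleBinding&lt;/code&gt;.&lt;/p&gt;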

&lt;h1&gt;
  
  
  Helm Configuration Examples
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Cluster Mode (Default)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# values.yaml&lt;/span&gt;
&lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rbac&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cluster"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a &lt;code&gt;ClusterRoleBinding&lt;/code&gt; granting the operator access to all namespaces.&lt;/p&gt;
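&lt;p&gt;A hedged sketch of what that &lt;code&gt;ClusterRoleBinding&lt;/code&gt; looks like (names are illustrative, not the chart's exact output):&lt;/p&gt;

```yaml
# Cluster mode: one cluster-scoped binding grants access everywhere
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: toolhive-operator-manager-rolebinding   # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: toolhive-operator-manager-role
subjects:
  - kind: ServiceAccount
    name: toolhive-operator
    namespace: toolhive-system
```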

&lt;h2&gt;
  
  
  Namespace Mode
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# values.yaml&lt;/span&gt;
&lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rbac&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;namespace"&lt;/span&gt;
    &lt;span class="na"&gt;allowedNamespaces&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;team-frontend"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;team-backend"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;staging"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;production"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates individual &lt;code&gt;RoleBindings&lt;/code&gt; in each specified namespace, granting the operator access only where needed.&lt;/p&gt;

&lt;h1&gt;
  
  
  What Permissions Does the Operator Get?
&lt;/h1&gt;

&lt;p&gt;The ToolHive Operator requires specific permissions to manage MCPServer resources and their associated Kubernetes objects:&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Permissions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCPServers&lt;/strong&gt;: Full lifecycle management of your custom resources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ServiceAccounts&lt;/strong&gt;: Creates dedicated service accounts for each MCPServer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Roles &amp;amp; RoleBindings&lt;/strong&gt;: Manages RBAC for ProxyRunner and MCPServer workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ConfigMaps &amp;amp; Secrets&lt;/strong&gt;: Handles configuration and credentials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployments &amp;amp; Services&lt;/strong&gt;: Manages the underlying workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Additional Permissions
&lt;/h2&gt;

&lt;p&gt;These permissions are needed by the &lt;code&gt;toolhive-operator&lt;/code&gt; so that it can grant them to the ProxyRunners in the dedicated namespaces; the ProxyRunners are the components that actually use these permissions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pod logs&lt;/strong&gt;: Ability to get pod logs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pod attach&lt;/strong&gt;: Ability to attach to the pod (for stdio MCP Servers)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: These are the permissions currently used; they are likely to evolve in future.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Scope-Specific Behavior
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cluster mode&lt;/strong&gt;: These permissions apply cluster-wide&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Namespace mode&lt;/strong&gt;: These permissions apply only to specified namespaces&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Real-World Use Cases
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Multi-Team Organization
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Platform team controls toolhive-system&lt;/span&gt;
&lt;span class="c1"&gt;# Individual teams get their own namespaces&lt;/span&gt;
&lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rbac&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;namespace"&lt;/span&gt;
    &lt;span class="na"&gt;allowedNamespaces&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;team-data"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;team-ai"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;team-platform"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Environment Separation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rbac&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;namespace"&lt;/span&gt;
    &lt;span class="na"&gt;allowedNamespaces&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;development"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;staging"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  What's Next?
&lt;/h1&gt;

&lt;p&gt;This multi-namespace support is just the beginning. We're looking into additional features including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic namespace discovery&lt;/strong&gt;: Automatically detect and manage namespaces based on labels&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Separate ProxyRunner and MCPServer permissions&lt;/strong&gt;: The ProxyRunner and the MCPServer pod do not need to share permissions; we want to make this even more secure by following the principle of least privilege&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Community Feedback
&lt;/h1&gt;

&lt;p&gt;We'd love to hear how you're using multi-namespace support! Share your use cases, feedback, and questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/stacklok/toolhive" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discussions&lt;/strong&gt;: Join our community discussions on &lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Issues&lt;/strong&gt;: Report bugs or request features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ToolHive Operator's multi-namespace support represents our commitment to providing secure, flexible, and enterprise-ready solutions for MCP server management. Whether you're a platform team managing cluster-wide resources or a security-conscious organization requiring strict namespace isolation, we've got you covered.&lt;/p&gt;

&lt;p&gt;Happy deploying! 🚀&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>mcp</category>
      <category>security</category>
      <category>ai</category>
    </item>
    <item>
      <title>ToolHive: A Kubernetes Operator for Deploying MCP Servers</title>
      <dc:creator>Chris Burns</dc:creator>
      <pubDate>Thu, 01 May 2025 11:14:15 +0000</pubDate>
      <link>https://dev.to/stacklok/toolhive-an-mcp-kubernetes-operator-321</link>
      <guid>https://dev.to/stacklok/toolhive-an-mcp-kubernetes-operator-321</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Building on our &lt;a href="https://dev.to/stacklok/toolhive-secure-mcp-in-a-kubernetes-native-world-3o65"&gt;earlier discussion&lt;/a&gt; about enterprises needing dedicated hosting for MCP servers and ToolHive's Kubernetes-based solution, we're excited to announce our new &lt;a href="https://github.com/StacklokLabs/toolhive/tree/main/cmd/thv-operator" rel="noopener noreferrer"&gt;Kubernetes Operator&lt;/a&gt; for ToolHive. This specialised tool streamlines the secure deployment of MCP servers to Kubernetes environments for enterprises and engineers.&lt;/p&gt;

&lt;p&gt;In this article, we'll explore practical ways to leverage this new operator's capabilities. &lt;/p&gt;

&lt;p&gt;Let's jump right in! 🚀&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying the Operator
&lt;/h2&gt;

&lt;p&gt;For the installation of the ToolHive Operator, we’ve assumed there is already a Kubernetes cluster available with an Ingress controller. We have used &lt;a href="https://kind.sigs.k8s.io/" rel="noopener noreferrer"&gt;Kind&lt;/a&gt; for this post as it is simple to set up, free and easy to use. &lt;/p&gt;

&lt;p&gt;For a simplified local ingress setup with Kind, we utilise a basic IP with the Kind Load Balancer - feel free to follow &lt;a href="https://github.com/stacklok/toolhive/blob/main/docs/kind/ingress.md" rel="noopener noreferrer"&gt;our guide&lt;/a&gt; for easy steps on how to do this. To keep things straightforward, we won't use a local hostname in this setup. &lt;/p&gt;

&lt;p&gt;Now, with a running cluster, execute the following Helm commands (remember to adjust the &lt;code&gt;--kubeconfig&lt;/code&gt; and &lt;code&gt;--kube-context&lt;/code&gt; flags as needed).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Install the ToolHive Operator Custom Resource Definitions (CRDs):&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;helm upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Deploy the Operator:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;helm upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system &lt;span class="nt"&gt;--create-namespace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At this point, the ToolHive Kubernetes Operator should now be installed and running. &lt;/p&gt;

&lt;p&gt;To verify this, run the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system

NAME                                READY   STATUS    RESTARTS   AGE
toolhive-operator-7f946d9c5-9s8dk   1/1     Running   0          59s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Deploy an MCP Server
&lt;/h2&gt;

&lt;p&gt;Now to install a sample &lt;code&gt;fetch&lt;/code&gt; MCP server, run the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/stacklok/toolhive/main/examples/operator/mcp-servers/mcpserver_fetch.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To verify this has been installed, run the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-system &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;toolhive&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true

&lt;/span&gt;NAME                     READY   STATUS    RESTARTS   AGE
fetch-0                  1/1     Running   0          115s
fetch-649c5b958c-nhjbq   1/1     Running   0          2m1s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As shown above, two pods are running. The fetch MCP server (&lt;code&gt;fetch-0&lt;/code&gt;) is a pod that belongs to the MCP server's &lt;code&gt;StatefulSet&lt;/code&gt;. The other - &lt;code&gt;fetch-xxxxxxxxxx-xxxxx&lt;/code&gt; - is the proxy server that handles all communication between the &lt;code&gt;fetch&lt;/code&gt; MCP server and external callers.&lt;/p&gt;

&lt;p&gt;Looking back, let’s review how the MCP server was created. Here is the fetch MCP server resource that we’ve applied to the cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;toolhive.stacklok.dev/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MCPServer&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fetch&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;toolhive-system&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker.io/mcp/fetch&lt;/span&gt;
  &lt;span class="na"&gt;transport&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;stdio&lt;/span&gt;
  &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
  &lt;span class="na"&gt;permissionProfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;builtin&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;network&lt;/span&gt;
  &lt;span class="na"&gt;podTemplateSpec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcp&lt;/span&gt;
          &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;allowPrivilegeEscalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
            &lt;span class="na"&gt;runAsNonRoot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
            &lt;span class="na"&gt;runAsUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
            &lt;span class="na"&gt;runAsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
            &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;drop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ALL&lt;/span&gt;
          &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;500m"&lt;/span&gt;
              &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;512Mi"&lt;/span&gt;
            &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;100m"&lt;/span&gt;
              &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;128Mi"&lt;/span&gt;
      &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;runAsNonRoot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="na"&gt;runAsUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
        &lt;span class="na"&gt;runAsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
        &lt;span class="na"&gt;seccompProfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RuntimeDefault&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;100m"&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;128Mi"&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50m"&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;64Mi"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ToolHive Operator introduces a new Custom Resource called &lt;strong&gt;MCPServer&lt;/strong&gt;. Here’s a breakdown of the MCPServer configuration:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;transport: stdio&lt;/code&gt; - This creates the MCP server allowing only stdin and stdout traffic. In Kubernetes this results in the proxy server attaching to the container via the Kubernetes API. No other access is given to the caller.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;permissionProfile.type: builtin&lt;/code&gt; - This references the built-in permission profiles that ship with ToolHive&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;permissionProfile.name: network&lt;/code&gt; - Permits outbound network connections to any host on any port (not recommended for production use).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now, to connect an example client such as Cursor to our MCP server, we can simply use an Ingress record exposed via the Load Balancer mentioned earlier.&lt;/p&gt;

&lt;p&gt;We can apply the following Ingress entry, ensuring that the &lt;code&gt;ingressClassName&lt;/code&gt; matches what we have in our cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcp-fetch-ingress&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;toolhive-system&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/rewrite-target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ingressClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
        &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
        &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mcp-fetch-proxy&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point we should be able to connect to the running fetch MCP server using the external IP address of our Load Balancer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: If you have not chosen Kind for the cluster and you have a different Load Balancer setup than what is followed in this post, you will have to make the respective changes in your configuration to send ingress traffic to the fetch server proxy service.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because we did not use the CLI to create the MCP server, its configuration was not automatically applied to our local client configurations, so we have to add it manually.&lt;/p&gt;

&lt;p&gt;For &lt;a href="https://docs.cursor.com/context/model-context-protocol#configuration-locations" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, we edit &lt;code&gt;Users/$USERNAME/.cursor/mcp.json&lt;/code&gt; (replacing &lt;code&gt;$USERNAME&lt;/code&gt; with our home directory username) and add the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"fetch"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8080/sse#fetch"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, if we go into the Cursor chat, and we ask it to fetch the contents of a web page, it should ask us for approval for the use of the &lt;code&gt;fetch&lt;/code&gt; MCP server and then return the content.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz690t6jbpawzylz8ece.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz690t6jbpawzylz8ece.png" alt="Cursor Fetch MCP" width="800" height="1184"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let’s look at the logs for the &lt;code&gt;fetch&lt;/code&gt; MCP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ {"jsonrpc":"2.0","id":2,"result":{"content":[{"type":"text","text":"Contents of https://chrisjburns.com/:\n\n\nchrisjburns\n\n# Chris Burns\n\n## Software engineer\n\n"}],"isError":false}}
$ {"jsonrpc":"2.0","id":2,"result":{"content":[{"type":"text","text":"Content type text/html; charset=utf-8 cannot be simplified to markdown, but here is the raw content:\nContents of https://chrisjburns.com/:\n&amp;lt;!doctype html&amp;gt;&amp;lt;html lang=en&amp;gt;&amp;lt;head&amp;gt;&amp;lt;meta charset=utf-8&amp;gt;&amp;lt;meta name=viewport content=\"width=device-width,initial-scale=1\"&amp;gt;&amp;lt;meta name=author content=\"Chris Burns\"&amp;gt;&amp;lt;meta name=keywords content=\"blog,developer,personal\"&amp;gt;&amp;lt;meta name=twitter:card content=\"summary\"&amp;gt;&amp;lt;meta name=twitter:title content=\"chrisjburns\"&amp;gt;&amp;lt;meta name=twitter:description content&amp;gt;&amp;lt;meta property=\"og:title\" content=\"chrisjburns\"&amp;gt;&amp;lt;meta property=\"og:description\" content&amp;gt;&amp;lt;meta property=\"og:type\" content=\"website\"&amp;gt;&amp;lt;meta property=\"og:url\" content=\"https://chrisjburns.com/\"&amp;gt;&amp;lt;meta property=\"og:updated_time\" content=\"2020-05-20T00:18:23+01:00\"&amp;gt;&amp;lt;base href=https://chrisjburns.com/&amp;gt;&amp;lt;title&amp;gt;chrisjburns&amp;lt;/title&amp;gt;&amp;lt;link rel=canonical href=https://chrisjburns.com/&amp;gt;&amp;lt;link href=\"https://fonts.googleapis.com/css?family=Lato:400,700%7CMerriweather:300,700%7CSource+Code+Pro:400,700\" rel=stylesheet&amp;gt;&amp;lt;link href=\"https://fonts.googleapis.com/css?family=Montserrat:400,700|Open+Sans:400,600,300,800,700\" rel=stylesheet type=text/css&amp;gt;&amp;lt;link rel=stylesheet href=https://use.fontawesome.com/releases/v5.11.2/css/all.css integrity=sha384-KA6wR/X5RY4zFAHpv/CnoG2UW1uogYfdnP67Uv7eULvTveboZJg0qUpmJZb5VqzN crossorigin=anonymous&amp;gt;&amp;lt;link rel=stylesheet href=https://cdnjs.cloudflare.com/ajax/libs/normalize/8.0.1/normalize.min.css integrity=\"sha256-l85OmPOjvil/SOvVt3HnSSjzF1TUMyT9eV0c2BzEGzU=\" crossorigin=anonymous&amp;gt;&amp;lt;link rel=stylesheet 
href=https://chrisjburns.com/css/coder.min.9f38ad26345e306650770a3b91475e09efa3026c59673a09eff165cfa8f1a30e.css integrity=\"sha256-nzitJjReMGZQdwo7kUdeCe+jAmxZZzoJ7/Flz6jxow4=\" crossorigin=anonymous media=screen&amp;gt;&amp;lt;link rel=icon type=image/png href=https://chrisjburns.com/images/favicon-32x32.png sizes=32x32&amp;gt;&amp;lt;link rel=icon type=image/png href=https://chrisjburns.com/images/favicon-16x16.png sizes=16x16&amp;gt;&amp;lt;link rel=alternate type=application/rss+xml href=https://chrisjburns.com/index.xml title=chrisjburns&amp;gt;&amp;lt;meta name=generator content=\"Hugo 0.63.2\"&amp;gt;&amp;lt;/head&amp;gt;&amp;lt;body class=colorscheme-light&amp;gt;&amp;lt;main class=wrapper&amp;gt;&amp;lt;nav class=navigation&amp;gt;&amp;lt;section class=container&amp;gt;&amp;lt;a class=navigation-title href=https://chrisjburns.com/&amp;gt;chrisjburns&amp;lt;/a&amp;gt;\n&amp;lt;input type=checkbox id=menu-toggle&amp;gt;\n&amp;lt;label class=\"menu-button float-right\" for=menu-toggle&amp;gt;&amp;lt;i class=\"fas fa-bars\"&amp;gt;&amp;lt;/i&amp;gt;&amp;lt;/label&amp;gt;&amp;lt;ul class=navigation-list&amp;gt;&amp;lt;li class=navigation-item&amp;gt;&amp;lt;a class=navigation-link href=https://chrisjburns.com/posts/&amp;gt;BLOG&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&amp;lt;/section&amp;gt;&amp;lt;/nav&amp;gt;&amp;lt;div class=content&amp;gt;&amp;lt;section class=\"container centered\"&amp;gt;&amp;lt;div class=about&amp;gt;&amp;lt;div class=avatar&amp;gt;&amp;lt;img src=https://chrisjburns.com/images/avatar.jpg alt=avatar&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;h1&amp;gt;Chris Burns&amp;lt;/h1&amp;gt;&amp;lt;h2&amp;gt;Software engineer&amp;lt;/h2&amp;gt;&amp;lt;ul&amp;gt;&amp;lt;li&amp;gt;&amp;lt;a href=https://github.com/ChrisJBurns/ aria-label=Github&amp;gt;&amp;lt;i class=\"fab fa-github\" aria-hidden=true&amp;gt;&amp;lt;/i&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;li&amp;gt;&amp;lt;a href=https://www.linkedin.com/in/chris-j-burns/ 
aria-label=LinkedIn&amp;gt;&amp;lt;i class=\"fab fa-linkedin\" aria-hidden=true&amp;gt;&amp;lt;/i&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&amp;lt;img src=https://ghchart.rshah.org/ChrisJBurns alt=\"Chris Burns's Github chart\"&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;/section&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;footer class=footer&amp;gt;&amp;lt;section class=container&amp;gt;&amp;lt;/section&amp;gt;&amp;lt;/footer&amp;gt;&amp;lt;/main&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;"}],"isError":false}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There we have it: an MCP server created in Kubernetes using the new ToolHive Operator.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;At this point, we hope you can see the power this gives engineers and enterprises that want to run MCP servers in Kubernetes. Anyone who has already worked with Operators knows that their capabilities for creating and managing workloads inside Kubernetes are hard to beat. At Stacklok, we know that behind the Operator we can hide away much of the complexity that is normally pushed onto the engineer. We're really excited to release this, and even more excited to see where it goes.&lt;/p&gt;

&lt;p&gt;Give it a try, and let us know what you think!&lt;/p&gt;

&lt;p&gt;Essential Links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/StacklokLabs/toolhive" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/@Stacklok" rel="noopener noreferrer"&gt;Youtube&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>ToolHive: Secure MCP in a Kubernetes-native World</title>
      <dc:creator>Chris Burns</dc:creator>
      <pubDate>Tue, 22 Apr 2025 14:32:16 +0000</pubDate>
      <link>https://dev.to/stacklok/toolhive-secure-mcp-in-a-kubernetes-native-world-3o65</link>
      <guid>https://dev.to/stacklok/toolhive-secure-mcp-in-a-kubernetes-native-world-3o65</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;⚠️ Deprecation Notice: The recommended way of installing ToolHive on Kubernetes is now via the ToolHive Operator, and the manifests in this post have been removed. Follow &lt;a href="https://dev.to/stacklok/toolhive-an-mcp-kubernetes-operator-321"&gt;https://dev.to/stacklok/toolhive-an-mcp-kubernetes-operator-321&lt;/a&gt; to find out how to install the Operator. ⚠️&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Model Context Protocol (MCP) enables seamless integration with applications and services to extend an LLM's context and capabilities. However, deploying MCP servers in production environments raises concerns surrounding data privacy, unauthorised access, and potential vulnerabilities. Current MCP server setups often lack the robust security measures required to safeguard sensitive model data and prevent malicious activities, thus hindering widespread adoption.&lt;/p&gt;

&lt;p&gt;Kubernetes offers a compelling solution for running MCP servers securely and efficiently. Its containerisation and orchestration capabilities provide a strong foundation for isolating and managing MCP instances. Kubernetes' built-in features, such as role-based access control (RBAC), network policies, and secrets management, address the security concerns that deter enterprises. Furthermore, the Kubernetes ecosystem, including tools for monitoring, logging, and automated deployment, enables a comprehensive and secure operational environment for MCP servers.&lt;/p&gt;

&lt;p&gt;The team at Stacklok, empowered by our CEO, Craig McLuckie (co-creator of Kubernetes), recently released &lt;a href="https://github.com/StacklokLabs/toolhive" rel="noopener noreferrer"&gt;ToolHive&lt;/a&gt;, an open source project that offers a convenient way to run MCP servers using familiar technologies, complete with authentication, authorization, and network isolation. Let’s take a closer look at how ToolHive and Kubernetes come together to support MCP in an enterprise environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running ToolHive on Kubernetes
&lt;/h2&gt;

&lt;p&gt;ToolHive lets you run MCP servers in Kubernetes using one of its native workload types: StatefulSets. StatefulSets are designed for managing stateful applications, making them ideal for MCP servers. When deploying ToolHive in Kubernetes, you’ll create a StatefulSet for ToolHive itself, which is configured to launch an MCP server in the foreground. Running the server in the foreground ensures the ToolHive pod remains active for the full duration of the MCP server’s lifecycle. Once the ToolHive StatefulSet is up and the pod is running, it will then provision your target MCP server, also as a StatefulSet. This results in two workloads running: ToolHive and the desired MCP server.&lt;/p&gt;
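&lt;p&gt;To make that concrete, here is a minimal, hypothetical sketch of what the ToolHive StatefulSet could look like. The real manifest lives in the repository's &lt;code&gt;deploy/k8s&lt;/code&gt; directory; the image reference, args, and service account name below are illustrative assumptions, not the actual values.&lt;/p&gt;

```yaml
# Hypothetical, trimmed sketch of a ToolHive StatefulSet.
# The real manifest is in the repo's deploy/k8s directory; the image,
# args, and serviceAccountName here are illustrative assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: toolhive
spec:
  serviceName: toolhive
  replicas: 1
  selector:
    matchLabels:
      app: toolhive
  template:
    metadata:
      labels:
        app: toolhive
    spec:
      serviceAccountName: toolhive   # bound to the RBAC provisioned from rbac.yaml
      containers:
        - name: toolhive
          image: ghcr.io/stackloklabs/toolhive:latest  # illustrative image reference
          args: ["run", "--foreground", "fetch"]       # foreground keeps the pod alive for the MCP server's lifetime
```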

&lt;p&gt;Let’s try it out. We’ll use the example &lt;a href="https://github.com/StacklokLabs/toolhive/tree/main/deploy/k8s" rel="noopener noreferrer"&gt;YAML manifests&lt;/a&gt; available in the ToolHive GitHub repository. Before getting started, make sure you have access to a running Kubernetes cluster. If you want to avoid cloud costs, you can use a local setup like &lt;a href="https://kind.sigs.k8s.io/" rel="noopener noreferrer"&gt;Kind&lt;/a&gt;, which lets you run Kubernetes clusters locally using Docker.&lt;/p&gt;
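&lt;p&gt;If you go the Kind route, a cluster is one command away (this assumes &lt;code&gt;kind&lt;/code&gt; and &lt;code&gt;kubectl&lt;/code&gt; are installed; the cluster name is arbitrary):&lt;/p&gt;

```shell
# Create a throwaway local cluster with kind (the name is arbitrary)
kind create cluster --name toolhive-demo

# Verify kubectl is pointing at it; kind prefixes contexts with "kind-"
kubectl cluster-info --context kind-toolhive-demo
```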

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Create the ToolHive namespace:&lt;br&gt;
&lt;code&gt;$ kubectl apply -f namespace.yaml&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provision the correct RBAC roles and service account for ToolHive:&lt;br&gt;
&lt;code&gt;$ kubectl apply -f rbac.yaml -n toolhive-deployment&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provision ToolHive and an example &lt;code&gt;fetch&lt;/code&gt; MCP server:&lt;br&gt;
&lt;code&gt;$ kubectl apply -f thv.yaml -n toolhive-deployment&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At this point, you should have an MCP server running, with its associated ToolHive workload. To check this, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-deployment get all

NAME              READY   STATUS    RESTARTS   AGE
pod/mcp-fetch-0   1/1     Running   0          6m40s
pod/toolhive-0    1/1     Running   0          6m46s

NAME               TYPE        CLUSTER-IP     EXTERNAL-IP   PORT&lt;span class="o"&gt;(&lt;/span&gt;S&lt;span class="o"&gt;)&lt;/span&gt;    AGE
service/toolhive   ClusterIP   10.96.10.131   &amp;lt;none&amp;gt;        8080/TCP   6m46s

NAME                         READY   AGE
statefulset.apps/mcp-fetch   1/1     6m40s
statefulset.apps/toolhive    1/1     6m46s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looking good: the ToolHive and MCP pods are both healthy. Let’s look at the logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl logs pod/toolhive-0 &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-deployment
checking &lt;span class="k"&gt;for &lt;/span&gt;updates...
A new version of ToolHive is available: v0.0.15
Currently running: dev
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.912633512Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Processed cmdArgs: []"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.914158929Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Image docker.io/mcp/fetch has 'latest' tag, pulling to ensure we have the most recent version..."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.914169221Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Skipping explicit image pull for docker.io/mcp/fetch in Kubernetes - images are pulled automatically when pods are created"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.914171179Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Successfully pulled image: docker.io/mcp/fetch"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.915905346Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Using host port: 8080"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.915920096Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Setting up stdio transport..."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.915923512Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Creating container mcp-fetch from image docker.io/mcp/fetch..."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:24.922990637Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Applied statefulset mcp-fetch"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.823657379Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Container created with ID: mcp-fetch"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.823676796Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Starting stdio transport..."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.825097838Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Attaching to pod mcp-fetch-0 container mcp-fetch..."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.825138463Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"HTTP SSE proxy started, processing messages..."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.825430046Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"HTTP proxy started for container mcp-fetch on port 8080"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.825438004Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"SSE endpoint: http://localhost:8080/sse"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.825440046Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"JSON-RPC endpoint: http://localhost:8080/messages"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.827135754Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"MCP server mcp-fetch started successfully"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.827350796Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Saved run configuration for mcp-fetch"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.827414796Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Would you like to enable auto discovery and configuraion of MCP clients? (y/n) [n]: "&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.827423629Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Unable to read input, defaulting to No."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.827425713Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"initializing configuration file at /home/nonroot/.config/toolhive/config.yaml"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.827466963Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"No client configuration files found"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-04-17T12:01:37.827474546Z"&lt;/span&gt;,&lt;span class="s2"&gt;"level"&lt;/span&gt;:&lt;span class="s2"&gt;"INFO"&lt;/span&gt;,&lt;span class="s2"&gt;"msg"&lt;/span&gt;:&lt;span class="s2"&gt;"Press Ctrl+C to stop or wait for container to exit"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nice!&lt;/p&gt;

&lt;p&gt;You won’t see any logs in the &lt;code&gt;fetch&lt;/code&gt; MCP server just yet—that’s because no requests have been made. Let’s change that by connecting it to a local Cursor client.&lt;/p&gt;

&lt;p&gt;To expose the MCP server locally, we’ll use a simple &lt;strong&gt;port-forward&lt;/strong&gt;. While the ToolHive repository includes a sample Ingress Controller setup, we’ll stick with port-forwarding here for the sake of simplicity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl port-forward svc/toolhive 8080:8080  &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-deployment

Forwarding from 127.0.0.1:8080 -&amp;gt; 8080
Forwarding from &lt;span class="o"&gt;[&lt;/span&gt;::1]:8080 -&amp;gt; 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let’s connect a local Cursor client to our MCP server. Head over to the &lt;a href="https://docs.cursor.com/context/model-context-protocol#configuration-locations" rel="noopener noreferrer"&gt;MCP settings&lt;/a&gt; in Cursor to configure the connection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"fetch"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8080/sse#fetch"&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, the Cursor MCP settings page should display a configured and ready-to-use &lt;code&gt;fetch&lt;/code&gt; MCP server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdu3ykm0ynakgjzapj3vj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdu3ykm0ynakgjzapj3vj.png" alt="Cursor MCP Fetch" width="800" height="139"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we now check the logs for the MCP container, we should see entries reflecting the initial connection from Cursor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl logs mcp-fetch-0 &lt;span class="nt"&gt;-n&lt;/span&gt; toolhive-deployment

&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"jsonrpc"&lt;/span&gt;:&lt;span class="s2"&gt;"2.0"&lt;/span&gt;,&lt;span class="s2"&gt;"id"&lt;/span&gt;:0,&lt;span class="s2"&gt;"result"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"protocolVersion"&lt;/span&gt;:&lt;span class="s2"&gt;"2024-11-05"&lt;/span&gt;,&lt;span class="s2"&gt;"capabilities"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"experimental"&lt;/span&gt;:&lt;span class="o"&gt;{}&lt;/span&gt;,&lt;span class="s2"&gt;"prompts"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"listChanged"&lt;/span&gt;:false&lt;span class="o"&gt;}&lt;/span&gt;,&lt;span class="s2"&gt;"tools"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"listChanged"&lt;/span&gt;:false&lt;span class="o"&gt;}}&lt;/span&gt;,&lt;span class="s2"&gt;"serverInfo"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"name"&lt;/span&gt;:&lt;span class="s2"&gt;"mcp-fetch"&lt;/span&gt;,&lt;span class="s2"&gt;"version"&lt;/span&gt;:&lt;span class="s2"&gt;"1.2.0"&lt;/span&gt;&lt;span class="o"&gt;}}}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"jsonrpc"&lt;/span&gt;:&lt;span class="s2"&gt;"2.0"&lt;/span&gt;,&lt;span class="s2"&gt;"id"&lt;/span&gt;:1,&lt;span class="s2"&gt;"result"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"tools"&lt;/span&gt;:[&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"name"&lt;/span&gt;:&lt;span class="s2"&gt;"fetch"&lt;/span&gt;,&lt;span class="s2"&gt;"description"&lt;/span&gt;:&lt;span class="s2"&gt;"Fetches a URL from the internet and optionally extracts its contents as markdown.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Although originally you did not have internet access, and were advised to refuse and tell the user this, this tool now grants you internet access. Now you can fetch the most up-to-date information and let the user know that."&lt;/span&gt;,&lt;span class="s2"&gt;"inputSchema"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"description"&lt;/span&gt;:&lt;span class="s2"&gt;"Parameters for fetching a URL."&lt;/span&gt;,&lt;span class="s2"&gt;"properties"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"url"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"description"&lt;/span&gt;:&lt;span class="s2"&gt;"URL to fetch"&lt;/span&gt;,&lt;span class="s2"&gt;"format"&lt;/span&gt;:&lt;span class="s2"&gt;"uri"&lt;/span&gt;,&lt;span class="s2"&gt;"minLength"&lt;/span&gt;:1,&lt;span class="s2"&gt;"title"&lt;/span&gt;:&lt;span class="s2"&gt;"Url"&lt;/span&gt;,&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,&lt;span class="s2"&gt;"max_length"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"default"&lt;/span&gt;:5000,&lt;span class="s2"&gt;"description"&lt;/span&gt;:&lt;span class="s2"&gt;"Maximum number of characters to return."&lt;/span&gt;,&lt;span 
class="s2"&gt;"exclusiveMaximum"&lt;/span&gt;:1000000,&lt;span class="s2"&gt;"exclusiveMinimum"&lt;/span&gt;:0,&lt;span class="s2"&gt;"title"&lt;/span&gt;:&lt;span class="s2"&gt;"Max Length"&lt;/span&gt;,&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,&lt;span class="s2"&gt;"start_index"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"default"&lt;/span&gt;:0,&lt;span class="s2"&gt;"description"&lt;/span&gt;:&lt;span class="s2"&gt;"On return output starting at this character index, useful if a previous fetch was truncated and more context is required."&lt;/span&gt;,&lt;span class="s2"&gt;"minimum"&lt;/span&gt;:0,&lt;span class="s2"&gt;"title"&lt;/span&gt;:&lt;span class="s2"&gt;"Start Index"&lt;/span&gt;,&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,&lt;span class="s2"&gt;"raw"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"default"&lt;/span&gt;:false,&lt;span class="s2"&gt;"description"&lt;/span&gt;:&lt;span class="s2"&gt;"Get the actual HTML content of the requested page, without simplification."&lt;/span&gt;,&lt;span class="s2"&gt;"title"&lt;/span&gt;:&lt;span class="s2"&gt;"Raw"&lt;/span&gt;,&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"boolean"&lt;/span&gt;&lt;span class="o"&gt;}}&lt;/span&gt;,&lt;span class="s2"&gt;"required"&lt;/span&gt;:[&lt;span class="s2"&gt;"url"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;,&lt;span class="s2"&gt;"title"&lt;/span&gt;:&lt;span class="s2"&gt;"Fetch"&lt;/span&gt;,&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="o"&gt;}}]}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Awesome, now let’s give it a test!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fye1fxqny6wa9cwyfpq4m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fye1fxqny6wa9cwyfpq4m.png" alt="Cursor MCP Fetch Approval" width="379" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;Run tool&lt;/strong&gt; and it will populate the results with the HTML of the Wikipedia page we’ve requested.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7797myzcu1xgje53leis.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7797myzcu1xgje53leis.png" alt="Cursor MCP Fetch Result" width="367" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this case, Cursor detected that the results were truncated, so it issued additional requests starting at index 5000 to retrieve the remaining content.&lt;/p&gt;
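&lt;p&gt;Under the hood, those follow-up calls simply reuse the &lt;code&gt;start_index&lt;/code&gt; parameter from the tool's input schema shown in the logs above. A hypothetical arguments payload for the second request might look like this (the URL is a placeholder, not the page from the screenshot):&lt;/p&gt;

```json
{
  "name": "fetch",
  "arguments": {
    "url": "https://example.org/some-long-page",
    "start_index": 5000
  }
}
```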

&lt;p&gt;And just like that, you've successfully connected a local Cursor client to a &lt;code&gt;fetch&lt;/code&gt; MCP server running inside Kubernetes, using nothing more than a simple port-forward. 🎉&lt;/p&gt;

&lt;p&gt;Now, you might be wondering: &lt;em&gt;"What exactly just happened under the hood?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let’s break it down.&lt;/p&gt;

&lt;p&gt;Remember how we mentioned that &lt;strong&gt;ToolHive acts as a proxy&lt;/strong&gt; to MCP server containers, communicating via &lt;code&gt;stdio&lt;/code&gt; (stdin and stdout)? That same pattern applies when running in Kubernetes.&lt;/p&gt;

&lt;p&gt;When we deployed the &lt;strong&gt;ToolHive workload&lt;/strong&gt;, it was instructed to spin up a &lt;code&gt;fetch&lt;/code&gt; MCP server. ToolHive’s Kubernetes runtime took care of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating a &lt;code&gt;StatefulSet&lt;/code&gt; for the MCP server&lt;/li&gt;
&lt;li&gt;Connecting to it via &lt;strong&gt;stdin&lt;/strong&gt;/&lt;strong&gt;stdout&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Acting as a &lt;strong&gt;proxy&lt;/strong&gt;, shuttling data between the MCP server and clients like Cursor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notably, the MCP server itself does &lt;strong&gt;not expose any network port&lt;/strong&gt;, by design. All communication must go through ToolHive. This design ensures a more secure setup because if a malicious workload is running in your cluster, it &lt;strong&gt;cannot&lt;/strong&gt; query the MCP server directly unless it has the specific privileges required to attach to the MCP process via stdin/stdout.&lt;br&gt;
In short: &lt;strong&gt;ToolHive is the only interface&lt;/strong&gt; to the MCP server. It controls all traffic and limits direct access, adding a layer of isolation and protection by default.&lt;/p&gt;
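&lt;p&gt;For intuition, you can poke the proxy directly through the earlier port-forward. This is an illustrative sketch only: the &lt;code&gt;/sse&lt;/code&gt; and &lt;code&gt;/messages&lt;/code&gt; paths come from the proxy logs above, the payload is hypothetical, and a real MCP client performs the SSE handshake first, so a bare POST like this may be rejected; it just shows the shape of the JSON-RPC traffic that ToolHive shuttles.&lt;/p&gt;

```shell
# Illustrative only: assumes the port-forward from earlier is still running.
# Real clients establish an SSE session on /sse before posting messages.
curl -s -X POST http://localhost:8080/messages \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"fetch","arguments":{"url":"https://example.com"}}}'
```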

&lt;h2&gt;
  
  
  What’s Next?
&lt;/h2&gt;

&lt;p&gt;Kubernetes, Kubernetes, and more Kubernetes.&lt;/p&gt;

&lt;p&gt;At Stacklok, we’re Kubernetes people at heart. While ToolHive is designed to make it easy for engineers to run local MCP servers without fear, we know that for enterprises to confidently run MCP servers in production environments, the solution needs to be standardised, secure, and built on battle-tested infrastructure. For us, that foundation is Kubernetes.&lt;/p&gt;

&lt;p&gt;Right now, ToolHive is a lightweight, developer-friendly tool that runs both locally and in Kubernetes—but this is just the beginning. There's a huge opportunity to push it further.&lt;/p&gt;

&lt;p&gt;For example: while applying YAML manifests directly to a cluster works, it's only part of the story. We believe the future of ToolHive lies in evolving it into a &lt;strong&gt;Kubernetes Operator&lt;/strong&gt;. As an operator, ToolHive could handle orchestration, security hardening, and lifecycle management automatically—removing the manual effort and unlocking more powerful, streamlined workflows for teams. Think: more automation, more control, and less cognitive load.&lt;/p&gt;

&lt;p&gt;But to get there, we need to make sure we’re solving the right problems.&lt;/p&gt;

&lt;p&gt;ToolHive is still in its early stages. It works well today, but it can be even better with your help. Whether you’ve tried it out or are just curious, we’d love your feedback: What do you like? What don’t you like? What would you like to see from ToolHive? We’re not just building a tool for ourselves, we’re building based on our company’s core principles: to create software people love, that ultimately &lt;strong&gt;makes the world a safer place&lt;/strong&gt;. It’s not just what we do, it’s who we are.&lt;/p&gt;

&lt;p&gt;📌 &lt;a href="https://os.stacklok.dev/#/?id=first-principles" rel="noopener noreferrer"&gt;First Principles&lt;/a&gt; – Stacklok&lt;/p&gt;

&lt;p&gt;Give it a try, and let us know what you think!&lt;/p&gt;

&lt;p&gt;Essential Links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/StacklokLabs/toolhive" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/StacklokLabs/toolhive/blob/main/docs/running-toolhive-in-kind-cluster.md" rel="noopener noreferrer"&gt;Deploying into a kind Kubernetes cluster&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>mcp</category>
      <category>ai</category>
      <category>cloudnative</category>
    </item>
  </channel>
</rss>
