Chris Burns for Stacklok

Bridging the Observability Gap in MCP Servers with ToolHive

In our previous post, we explored why Kubernetes observability requires both OpenTelemetry (OTel) and Prometheus. Together, they form a powerful foundation for monitoring modern workloads, but only when those workloads expose telemetry. What happens when they don't?

That's exactly the case with many Model Context Protocol (MCP) servers. These servers run critical workloads but rarely expose metrics or integrate with observability frameworks. For operations teams, they behave like black boxes; you see requests going in and responses coming out, but nothing about what's happening inside.

ToolHive was built to bridge this gap. Running natively inside Kubernetes, ToolHive acts as an intelligent proxy that collects usage statistics from MCP servers without requiring any modifications to the servers themselves. It then seamlessly feeds that data into your existing OTel + Prometheus stack, giving you the same dashboards, alerts, and reliability insights you rely on for other workloads.

Recap: The MCP Observability Problem

The core issue is a mismatch between modern observability standards and the operational reality of many MCP servers. While Prometheus expects to scrape a /metrics endpoint and OTel expects data to be pushed from instrumented applications, many MCP servers do neither. They are designed for a single purpose: providing specialized capabilities to AI systems by bridging models to the real world. Operational telemetry is often an afterthought, if it's considered at all.

This lack of metrics makes it impossible to answer basic but critical questions: How many requests is my MCP server handling per second? What is the average latency of tool calls? Is the server experiencing errors or timeouts? How much CPU and memory is the server consuming?

Without this data, you're flying blind, unable to optimize performance, troubleshoot issues, or ensure the reliability of your AI-powered applications.

How ToolHive Collects Metrics

ToolHive's approach is straightforward: instead of relying on MCP servers to expose their own metrics, it wraps them and acts as an intermediary for all client requests. ToolHive runs alongside MCP servers in Kubernetes and observes their activity directly at the orchestration layer.

As requests and responses flow through ToolHive, it observes and records key operational data points: request counts and rates, response latency and duration, error codes and status, and tool usage statistics. It also generates distributed traces for each MCP interaction, providing end-to-end visibility into request flows. By intercepting request and usage data, ToolHive can measure request volumes, latencies, and error rates while attributing metrics to specific MCP servers or workloads.

This approach decouples observability from the MCP server itself. Zero server modification means existing MCP servers work immediately without code changes or additional dependencies. Protocol awareness allows ToolHive to understand MCP-specific operations like tool calls and resource requests, providing metrics that generic proxies couldn't capture. Kubernetes native deployment means it integrates naturally with service discovery and scaling patterns.

Since ToolHive is built with OTel and Prometheus in mind, it normalizes the data it collects into standard OTel and Prometheus formats, so your existing monitoring stack can consume both metrics and traces immediately.

Four Supported Architectures

ToolHive is designed for flexibility and can integrate into a variety of observability setups. It supports four primary architectures for feeding data to your pipeline:

Architecture 1 (Recommended): ToolHive → OTel Collector ← Prometheus

ToolHive pushes both metrics and traces to an OpenTelemetry collector using OTLP (OpenTelemetry Protocol). The collector exposes a /metrics endpoint that Prometheus scrapes for metrics data, while traces are exported to your tracing backend (like Jaeger or Tempo). This is a robust and scalable architecture that centralizes data collection and processing while leveraging the pull-based reliability of Prometheus.
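
To make this concrete, here's a minimal OpenTelemetry Collector configuration sketch for Architecture 1. The ports, exporter names, and the Tempo address are illustrative placeholders, not values ToolHive requires:

```yaml
# Minimal collector config sketch for Architecture 1 (illustrative values).
receivers:
  otlp:                      # ToolHive pushes metrics and traces here via OTLP
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}                  # batch telemetry before export

exporters:
  prometheus:                # exposes a /metrics endpoint for Prometheus to scrape
    endpoint: 0.0.0.0:8889
  otlp/traces:               # forwards traces to a tracing backend (address is a placeholder)
    endpoint: tempo.monitoring.svc.cluster.local:4317
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/traces]
```

Prometheus then scrapes the collector's metrics endpoint (port 8889 above) like any other target; depending on your collector distribution, some of these components may require the contrib build.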

Architecture 2: ToolHive → OTel Collector → Prometheus (RemoteWrite)

Similar to Architecture 1, ToolHive pushes metrics and traces to the OTel collector. The collector exports traces to your tracing backend and uses the Prometheus RemoteWrite exporter to push metrics directly to the Prometheus server. This reduces scraping overhead but can lose data if Prometheus is unavailable when the push occurs.
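
If you prefer this push model, the change from the Architecture 1 sketch is confined to the metrics exporter. The remote-write URL below is a placeholder, and Prometheus must be started with remote-write receiving enabled (the --web.enable-remote-write-receiver flag in recent versions):

```yaml
# Sketch: swap the prometheus exporter for remote write (Architecture 2).
exporters:
  prometheusremotewrite:
    endpoint: http://prometheus.monitoring.svc.cluster.local:9090/api/v1/write

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
```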

Architecture 3: ToolHive ← Prometheus (Direct Scrape)

ToolHive exposes its own /metrics endpoint for Prometheus scraping, while traces are still pushed to an OTel collector for export to tracing backends. This is the simplest setup for metrics collection but requires separate configuration for trace export.
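
A scrape job for this setup could look roughly like the following; the namespace, job name, and pod label are assumptions about how you've deployed ToolHive, not fixed values:

```yaml
# Sketch of a Prometheus scrape job that scrapes ToolHive directly (Architecture 3).
scrape_configs:
  - job_name: toolhive
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [toolhive-system]   # assumed namespace
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: toolhive              # assumed pod label value
        action: keep
```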

Architecture 4: Hybrid

This approach maximizes flexibility: ToolHive pushes traces to an OTel collector (which exports to tracing backends) while exposing a /metrics endpoint that Prometheus scrapes directly. This provides full observability coverage but adds operational complexity.

Why We Recommend Architecture 1

While all four architectures are valid, Architecture 1 represents the best practice for most modern Kubernetes environments. It strikes the right balance by providing several key advantages:

Centralization and Standardization: It centralizes both metrics and traces in a single pipeline, making it easier to manage, enrich, and route to different backends. For organizations already using OTel collectors for other services, this architecture maintains consistency across the monitoring stack.

Reliability: The pull-based model of Prometheus is inherently reliable for metrics. The OTel collector acts as a reliable buffer between ToolHive and both Prometheus and tracing backends, handling temporary network issues or unavailability gracefully. If Prometheus is down for maintenance, it can catch up by scraping when it comes back online, rather than losing data that would have been pushed during the outage.

Flexibility: The OTel collector can process and export both metrics and traces to any number of destinations. For metrics, this includes Prometheus, long-term storage, or analytics platforms. For traces, it can route to Jaeger, Tempo, or other tracing backends. It can add labels, perform transformations, and route data to multiple backends if needed, avoiding vendor lock-in.
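
As a sketch of that flexibility (extending the Architecture 1 config above; the team label and the long-term storage endpoint are invented for illustration), the collector can enrich metrics and fan them out to more than one destination:

```yaml
# Sketch: enrich MCP telemetry and export it to multiple backends.
processors:
  resource/team:
    attributes:
      - key: team                    # illustrative label, not required by ToolHive
        value: ai-platform
        action: upsert

exporters:
  otlphttp/longterm:                 # hypothetical long-term metrics store
    endpoint: https://metrics.example.com:4318

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [resource/team, batch]
      exporters: [prometheus, otlphttp/longterm]
```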

ToolHive in Action: Real Metrics and Traces From MCP Servers

Once you've deployed ToolHive and configured it to work with your OTel + Prometheus stack, your dashboards will be populated with both metrics and traces that provide immediate visibility into MCP server operations:

Request Metrics include a counter (toolhive_mcp_requests_total) for total requests and a histogram (toolhive_mcp_request_duration_seconds_*) showing p95 and p99 latency, broken down by MCP server and operation type. These help identify usage patterns and performance trends across your MCP infrastructure.
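
As a quick example of what you can do with these series (a sketch; the mcp_server label name is an assumption about the labels your deployment emits), a pair of Prometheus recording rules can precompute request rate and p95 latency:

```yaml
# Sketch of Prometheus recording rules over ToolHive request metrics.
groups:
  - name: toolhive-requests
    rules:
      - record: toolhive:mcp_requests:rate5m
        expr: sum(rate(toolhive_mcp_requests_total[5m])) by (mcp_server)
      - record: toolhive:mcp_request_duration_seconds:p95
        expr: |
          histogram_quantile(0.95,
            sum(rate(toolhive_mcp_request_duration_seconds_bucket[5m])) by (le, mcp_server))
```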

Tool Usage Statistics track which MCP tools are being called most frequently via toolhive_mcp_tool_calls_total (a counter for specific tool invocations), along with success rates for different tool types and usage patterns over time. This data is invaluable for understanding how AI systems are interacting with your MCP servers and which capabilities are most critical.
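
A similar pair of rules (again a sketch; the tool and status label names are assumptions) can surface per-tool call rates and error ratios for dashboards:

```yaml
# Sketch: per-tool call rate and error ratio.
groups:
  - name: toolhive-tools
    rules:
      - record: toolhive:mcp_tool_calls:rate5m
        expr: sum(rate(toolhive_mcp_tool_calls_total[5m])) by (tool)
      - record: toolhive:mcp_tool_calls:error_ratio5m
        expr: |
          sum(rate(toolhive_mcp_tool_calls_total{status="error"}[5m])) by (tool)
            /
          sum(rate(toolhive_mcp_tool_calls_total[5m])) by (tool)
```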

Distributed Traces show the complete journey of MCP requests, from client initiation through ToolHive processing to server response. Each trace includes timing information for different phases of the MCP interaction, making it possible to identify bottlenecks and understand request flow patterns. Traces are correlated with metrics through trace and span IDs, enabling powerful troubleshooting workflows.

All metrics include standard Kubernetes labels for namespace, pod, and service, making it easy to aggregate and filter data in existing dashboards. These are the metrics and traces you need to build meaningful dashboards, set up critical alerts, and truly understand the health and performance of your MCP workloads. The observability data integrates seamlessly with alerting rules, allowing teams to set up notifications for MCP-specific issues like tool failure rates or unusual usage patterns.
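
As one example (the label names and the 5% threshold are illustrative choices, not ToolHive defaults), a Prometheus alerting rule on tool failure rate might look like this:

```yaml
# Sketch: alert when an MCP server's tool failure rate stays above 5%.
groups:
  - name: toolhive-alerts
    rules:
      - alert: MCPToolFailureRateHigh
        expr: |
          sum(rate(toolhive_mcp_tool_calls_total{status="error"}[10m])) by (mcp_server)
            /
          sum(rate(toolhive_mcp_tool_calls_total[10m])) by (mcp_server) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "MCP server {{ $labels.mcp_server }} has an elevated tool failure rate"
```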

Call to Action: Try ToolHive / Join the Community

MCP observability gaps don't have to be a given. With ToolHive, operations teams can gain critical visibility into their AI workloads without waiting for server-side changes or upstream telemetry support. By supporting multiple architectures (and recommending a best-practice approach), ToolHive makes it possible to monitor MCP servers as part of a standard OTel + Prometheus pipeline.

The project is actively developed and welcomes community input. Whether you're running a single MCP server or managing dozens across multiple clusters, ToolHive can provide the visibility you need to operate confidently. Please check out ToolHive and connect with us on Discord.

Ready to see it in action? In the next post, we'll walk through a hands-on guide to deploying ToolHive in Kubernetes, complete with Helm charts, kubectl steps, and a starter Grafana dashboard.

If you missed the first post in this series, be sure to check it out: Post 1: The Next Observability Challenge: OTel, Prometheus, and MCP Servers in Kubernetes
