"What caused the outage last night, and how do we fix it?"
Imagine typing this question and getting an immediate, comprehensive answer – not from a human on call, but from an AI that has already analyzed your telemetry data, identified the root cause, and prepared a detailed explanation and recommendations with supporting evidence.
OTEL MCP Server makes this possible today by connecting AI systems directly to your OpenTelemetry data. While we've solved the collection problem with OpenTelemetry, we're still facing a critical gap – mountains of telemetry data that remain largely untapped because our tools can't keep pace with the volume and complexity. OTEL MCP Server bridges this gap, transforming raw telemetry into actionable intelligence through the power of AI.
What is OpenTelemetry?
OpenTelemetry (OTEL) is an open-source observability framework that provides a standardized way to collect telemetry data from your applications and infrastructure. It's a CNCF (Cloud Native Computing Foundation) project that has quickly become the industry standard for instrumentation.
The key components of OpenTelemetry include:
- Traces: Distributed tracing that tracks requests as they flow through your services
- Metrics: Numerical measurements collected at regular intervals
- Logs: Time-stamped records of discrete events
- Baggage: Contextual metadata that enriches your telemetry data
OpenTelemetry solves the instrumentation problem by providing a vendor-neutral API and SDK that works across languages and frameworks. This means you can instrument your code once and send the data to any compatible backend – whether that's Elasticsearch, Prometheus, Jaeger, or commercial observability platforms.
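To make the trace concept concrete, here is a minimal sketch of how spans that share a trace ID link together through parent span IDs. The span records are hypothetical and heavily simplified (real OpenTelemetry spans carry timestamps, attributes, status, and more), and the service and operation names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical, simplified span records; real OpenTelemetry spans carry many more fields.
spans = [
    {"trace_id": "t1", "span_id": "a", "parent_span_id": None, "service": "frontend", "name": "GET /checkout"},
    {"trace_id": "t1", "span_id": "b", "parent_span_id": "a", "service": "checkout", "name": "PlaceOrder"},
    {"trace_id": "t1", "span_id": "c", "parent_span_id": "b", "service": "payment", "name": "Charge"},
]


def build_tree(spans):
    """Index spans by parent span ID so a trace can be walked from its root."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_span_id"]].append(span)
    return children


def render(children, parent_id=None, depth=0):
    """Return one indented line per span, depth-first from the root span."""
    lines = []
    for span in children.get(parent_id, []):
        lines.append("  " * depth + f'{span["service"]}: {span["name"]}')
        lines.extend(render(children, span["span_id"], depth + 1))
    return lines


trace_lines = render(build_tree(spans))
print("\n".join(trace_lines))
```

The indentation in the printed result mirrors the call chain, which is the structure a tracing backend reconstructs when it shows a request flowing across services.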
What is OTEL MCP Server?
OTEL MCP Server is an open-source tool that implements the Model Context Protocol (MCP) to give AI systems direct access to your OpenTelemetry data in Elasticsearch. It acts as a specialized connector that lets AI assistants query and analyze your traces, metrics, and logs in real time, whatever your application architecture: monolith or microservices, cloud-native or on-premises.
Key Capabilities
- Direct Elasticsearch Queries: Execute powerful, custom queries against your telemetry data
- Service-Aware Tools: Filter all operations by service or query across multiple services
- Field Discovery: Automatically identify available fields to construct effective queries
- Dual Mapping Support: Works with both native OTEL and Elastic Common Schema (ECS) mapping modes
- Cross-Platform: Runs on Windows, macOS, and Linux
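Field discovery is easier to picture with an example. The sketch below is not the server's actual implementation; it only shows one plausible way to flatten a mapping, shaped like an Elasticsearch `GET <index>/_mapping` response, into dotted field paths that a query can reference. The index name and fields in `sample_mapping` are assumptions made up for illustration.

```python
def flatten_mapping(properties, prefix=""):
    """Turn nested Elasticsearch mapping properties into dotted field paths."""
    fields = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object fields nest their children under "properties".
            fields.update(flatten_mapping(spec["properties"], prefix=path + "."))
        else:
            fields[path] = spec.get("type", "object")
    return fields


# Hypothetical excerpt shaped like a GET <index>/_mapping response body.
sample_mapping = {
    "traces-sample": {
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "service": {"properties": {"name": {"type": "keyword"}}},
                "duration": {"type": "long"},
            }
        }
    }
}

discovered = {}
for index_body in sample_mapping.values():
    discovered.update(flatten_mapping(index_body["mappings"]["properties"]))

print(sorted(discovered))
```

With a flat list of field names and types in hand, an AI assistant can pick real fields for its filters instead of guessing at them.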
Why It Matters
OpenTelemetry has solved the data collection problem by providing a vendor-neutral way to instrument applications and gather telemetry data. However, traditional observability tools still require humans to manually navigate dashboards and query interfaces.
While paid observability platforms offer features like anomaly detection and alerting, they often come with significant costs and lock you into proprietary ecosystems. These services are valuable but can be expensive as your data volume grows, and they still require manual investigation when issues arise.
OTEL MCP Server offers a complementary approach by enabling AI systems to:
- Investigate Incidents Autonomously: AI can query traces during outages, find error patterns, and correlate issues across your entire application landscape
- Provide Context-Aware Assistance: When reviewing code, AI can simultaneously analyze related production telemetry data
- Answer Complex Questions: "Show me all errors from the authentication flow in the last 24 hours" becomes a simple natural language request
- Detect Anomalies On-Demand: Instead of configuring static thresholds, ask the AI to identify unusual patterns in your data
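As a rough illustration of what sits behind a request like "Show me all errors from the authentication flow in the last 24 hours", the sketch below builds an Elasticsearch query body from a service name and a time window. The field names for each mapping mode are assumptions, not taken from OTEL MCP Server, so check them against your own indices before relying on them.

```python
import json

# Assumed field names per mapping mode; verify against your own indices.
FIELDS = {
    "ecs": {
        "service": "service.name",
        "time": "@timestamp",
        "outcome": ("event.outcome", "failure"),
    },
    "otel": {
        "service": "resource.attributes.service.name",
        "time": "@timestamp",
        "outcome": ("status.code", "Error"),
    },
}


def build_error_query(service, mode="ecs", window="now-24h"):
    """Build a query body matching failed operations for one service."""
    f = FIELDS[mode]
    outcome_field, outcome_value = f["outcome"]
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {f["service"]: service}},
                    {"term": {outcome_field: outcome_value}},
                    # Elasticsearch date math: everything newer than the window start.
                    {"range": {f["time"]: {"gte": window}}},
                ]
            }
        },
        "sort": [{f["time"]: "desc"}],
        "size": 50,
    }


query = build_error_query("authentication")
print(json.dumps(query, indent=2))
```

The point of the natural language interface is that the AI assembles a body like this for you, using the fields it has discovered, and then interprets the hits.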
As developers, we know that a well-designed system reveals its own inner workings. A truly complete software system teaches developers how it functions, how to troubleshoot it, and how to extend it. By connecting OpenTelemetry data to AI, OTEL MCP Server amplifies this natural learning process, making the system's behavior more transparent and accessible.
Getting Started
Prerequisites
Before you begin, you'll need:
- Elasticsearch (version 8.x or higher) with OpenTelemetry data
  - OTEL MCP Server works with both OTEL and ECS mapping modes
  - For development, you can use Elasticsearch with security disabled
- OpenTelemetry Data
  - Either from your own instrumented applications
  - Or from the OpenTelemetry Demo for testing
Windsurf Integration
To use OTEL MCP Server with Windsurf, add this to your Windsurf MCP configuration, replacing the Elasticsearch URL and credentials with your own:
{
  "servers": {
    "otel-mcp-server": {
      "command": "npx",
      "args": ["-y", "otel-mcp-server"],
      "env": {
        "ELASTICSEARCH_URL": "http://localhost:9200",
        "ELASTICSEARCH_USERNAME": "elastic",
        "ELASTICSEARCH_PASSWORD": "changeme",
        "SERVER_NAME": "otel-mcp-server",
        "LOGLEVEL": "OFF"
      }
    }
  }
}
Once running, the server automatically detects available telemetry types in your Elasticsearch instance and registers the appropriate tools for your environment.
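The detection step can be pictured roughly as follows. This is only a sketch of the idea, not how OTEL MCP Server actually does it: the index name patterns and the tool names (`query_traces`, `query_metrics`, `query_logs`) are hypothetical and chosen purely for illustration.

```python
from fnmatch import fnmatch

# Assumed index naming and hypothetical tool names, for illustration only.
TELEMETRY_PATTERNS = {"traces": "traces-*", "metrics": "metrics-*", "logs": "logs-*"}
TOOLS_BY_TYPE = {
    "traces": ["query_traces"],
    "metrics": ["query_metrics"],
    "logs": ["query_logs"],
}


def detect_types(index_names):
    """Return the telemetry types that have at least one matching index."""
    return [
        telemetry_type
        for telemetry_type, pattern in TELEMETRY_PATTERNS.items()
        if any(fnmatch(name, pattern) for name in index_names)
    ]


def tools_to_register(index_names):
    """Collect the tools for every telemetry type that was detected."""
    tools = []
    for telemetry_type in detect_types(index_names):
        tools.extend(TOOLS_BY_TYPE[telemetry_type])
    return tools


available = tools_to_register(["traces-generic-default", "logs-generic-default"])
print(available)
```

In this sketch a cluster without metrics indices simply never exposes a metrics tool, so the AI is not offered operations that would always come back empty.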
Real-World Applications
The true power of OTEL MCP Server emerges when integrated with AI-powered tools like Windsurf, creating a seamless bridge between code and telemetry data.
End-to-End Incident Management
In a recent demonstration, OTEL MCP Server was used with Windsurf and the OpenTelemetry Demo application to generate an entire incident report based on the demo application's test features, along with issues injected by Chaos Mesh. With just a few natural language prompts, the AI was able to:
- Discover and analyze error patterns across multiple services
- Correlate logs, traces, and metrics to identify the root cause
- Generate a comprehensive incident report with supporting evidence
- Provide actionable recommendations for fixing the issues
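The correlation step relies on the fact that OpenTelemetry log records can carry the trace ID of the request that produced them. A minimal sketch of that join, using hypothetical and heavily simplified records (the services, messages, and `status` values are invented for illustration), might look like this:

```python
from collections import defaultdict

# Hypothetical, simplified records; only the fields needed for correlation are shown.
spans = [
    {"trace_id": "t1", "service": "checkout", "status": "error"},
    {"trace_id": "t1", "service": "payment", "status": "error"},
    {"trace_id": "t2", "service": "frontend", "status": "ok"},
]
logs = [
    {"trace_id": "t1", "service": "payment", "message": "card processor timeout"},
    {"trace_id": "t2", "service": "frontend", "message": "request completed"},
]


def correlate(spans, logs):
    """Group log messages under each trace that contains a failed span."""
    failed = {span["trace_id"] for span in spans if span["status"] == "error"}
    evidence = defaultdict(list)
    for record in logs:
        if record["trace_id"] in failed:
            evidence[record["trace_id"]].append(f'{record["service"]}: {record["message"]}')
    return dict(evidence)


evidence = correlate(spans, logs)
print(evidence)
```

Logs attached to failing traces are the kind of supporting evidence an incident report can cite alongside the trace itself.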
You can see the process in action and view the end result.
Unified Code and Telemetry Experience
With OTEL MCP Server, developers can interact with both their code and production telemetry through a single interface. This creates a powerful feedback loop where you can:
- Analyze Performance Bottlenecks: "Find slow database queries in our payment service and suggest code improvements"
- Debug with Production Context: "Show me recent error traces related to this authentication function I'm debugging"
- Validate Fixes: "After my recent fix to the checkout service, did the error rate decrease?"
- Understand System Behavior: "How does this microservice interact with others in production?"
- Identify Patterns: "Show me the memory usage metrics for our application over the past day"
- Compare Deployments: "Compare response times before and after our recent deployment"
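Questions such as "did the error rate decrease?" or "compare response times before and after our recent deployment" come down to splitting telemetry at a point in time and comparing the two sides. The sketch below shows that arithmetic on a handful of hypothetical request records; the record shape, timestamps, and deployment time are all assumptions for illustration.

```python
from datetime import datetime


def error_rate(records):
    """Fraction of records flagged as errors; 0.0 when there are no records."""
    if not records:
        return 0.0
    return sum(1 for r in records if r["error"]) / len(records)


def compare_around(records, deployed_at):
    """Return (error rate before, error rate at or after) the deployment time."""
    before = [r for r in records if r["timestamp"] < deployed_at]
    after = [r for r in records if r["timestamp"] >= deployed_at]
    return error_rate(before), error_rate(after)


# Hypothetical deployment time and request records.
deployed_at = datetime(2025, 1, 1, 12, 0)
records = [
    {"timestamp": datetime(2025, 1, 1, 11, 0), "error": True},
    {"timestamp": datetime(2025, 1, 1, 11, 30), "error": False},
    {"timestamp": datetime(2025, 1, 1, 12, 30), "error": False},
    {"timestamp": datetime(2025, 1, 1, 13, 0), "error": False},
]

before_rate, after_rate = compare_around(records, deployed_at)
print(f"before={before_rate:.0%} after={after_rate:.0%}")
```

In practice the records would come from a query against your telemetry, and a real comparison would need enough traffic on both sides of the deployment to be meaningful.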
The AI can directly access this data, analyze it, combine it with code context, and provide insights - all without you having to switch contexts or manually query your observability platform.
This unified view creates a development environment where the system itself becomes self-documenting and self-healing through its telemetry data, teaching developers how it works, how to fix issues, and how to extend it effectively.
The Future of AI-Powered Observability
OpenTelemetry has democratized telemetry data collection across the industry. Now, OTEL MCP Server takes the next step by democratizing access to this data for AI systems. This represents a fundamental shift in how we interact with observability data, moving toward a future where:
- Incident response becomes more automated and efficient
- Developers get real-time feedback based on production telemetry
- The gap between code and runtime behavior narrows significantly
- Observability becomes a natural part of every developer's workflow
Ready to try it yourself? Check out the GitHub repository and start connecting your AI tools to your OpenTelemetry data!