I’ve just released oteldoctor v0.1.0, an open-source Go CLI that analyzes OpenTelemetry Collector configurations before they reach production.
Why I built it
OpenTelemetry Collector configs often start small.
Then they grow.
A few receivers.
A few processors.
A couple of exporters.
Some Kubernetes manifests.
A little bit of batching.
A debug exporter during testing.
A quick HTTP endpoint.
A few attributes added for convenience.
Eventually, that YAML file becomes production-critical infrastructure.
The problem is that a Collector config can be valid YAML, and even acceptable to the Collector, while still being risky for production.
For example:
- telemetry may be dropped when exporters fail
- the Collector may restart during memory spikes
- debug endpoints may be exposed
- hardcoded secrets may leak into version control
- high-cardinality attributes may increase observability cost
- service identity may be inconsistent across environments
That’s the gap oteldoctor tries to fill.
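To make one of these risks concrete, here is a minimal Go sketch of a reliability-style check: it flags a pipeline whose processor list lacks `memory_limiter` or `batch`. This is not oteldoctor's actual implementation; the `Pipeline` type and the `componentType` and `missingProcessors` helpers are hypothetical names used only for illustration. The only Collector behavior it assumes is that component IDs take the form `type` or `type/name`.

```go
package main

import (
	"fmt"
	"strings"
)

// Pipeline is a hypothetical, simplified view of one Collector pipeline.
// It is an illustration only, not a type taken from oteldoctor.
type Pipeline struct {
	Receivers  []string
	Processors []string
	Exporters  []string
}

// componentType strips the optional "/name" suffix from a component ID,
// so "batch/traces" and "batch" both map to "batch".
func componentType(id string) string {
	typ, _, _ := strings.Cut(id, "/")
	return typ
}

// missingProcessors returns the processor types in required that do not
// appear anywhere in the pipeline's processor list.
func missingProcessors(p Pipeline, required []string) []string {
	present := make(map[string]bool, len(p.Processors))
	for _, id := range p.Processors {
		present[componentType(id)] = true
	}
	var missing []string
	for _, typ := range required {
		if !present[typ] {
			missing = append(missing, typ)
		}
	}
	return missing
}

func main() {
	traces := Pipeline{
		Receivers:  []string{"otlp"},
		Processors: []string{"batch/traces"},
		Exporters:  []string{"otlphttp"},
	}
	for _, typ := range missingProcessors(traces, []string{"memory_limiter", "batch"}) {
		fmt.Printf("traces pipeline: missing %s processor\n", typ)
	}
}
```

A config like this one loads fine, which is exactly why a separate readiness check is useful: nothing is wrong until the Collector runs out of memory.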
What oteldoctor checks
oteldoctor analyzes Collector configs across six categories:
| Category | Examples |
|---|---|
| Structural | Undefined references, unused components, empty pipelines |
| Reliability | Missing memory_limiter, missing batch, retry/queue gaps |
| Security | Plain HTTP, hardcoded secrets, exposed debug endpoints |
| Cost / Cardinality | High-cardinality dimensions, missing sampling, debug in production |
| Semantic Quality | Deprecated attributes, missing service identity |
| Kubernetes Readiness | GOMEMLIMIT, resource limits, probes, exposure risks |
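As a rough illustration of the Structural category, the sketch below finds exporters that a pipeline references but the config never defines, and exporters that are defined but never used. It is an assumption-laden sketch, not how oteldoctor is implemented: the `Config` type and `checkExporters` function are hypothetical, and the config is built in memory rather than parsed from YAML.

```go
package main

import (
	"fmt"
	"sort"
)

// Config is a hypothetical, in-memory slice of a Collector config:
// the exporter IDs it defines and the exporter IDs each pipeline references.
type Config struct {
	Exporters []string
	Pipelines map[string][]string
}

// checkExporters reports exporter IDs that are referenced but not defined
// (undefined) and IDs that are defined but not referenced (unused).
// Both slices are sorted so the output is deterministic.
func checkExporters(c Config) (undefined, unused []string) {
	defined := make(map[string]bool, len(c.Exporters))
	for _, id := range c.Exporters {
		defined[id] = true
	}
	referenced := make(map[string]bool)
	for _, refs := range c.Pipelines {
		for _, id := range refs {
			referenced[id] = true
		}
	}
	for id := range referenced {
		if !defined[id] {
			undefined = append(undefined, id)
		}
	}
	for id := range defined {
		if !referenced[id] {
			unused = append(unused, id)
		}
	}
	sort.Strings(undefined)
	sort.Strings(unused)
	return undefined, unused
}

func main() {
	cfg := Config{
		Exporters: []string{"otlphttp", "debug"},
		Pipelines: map[string][]string{
			"traces":  {"otlphttp", "otlp/backup"},
			"metrics": {"otlphttp"},
		},
	}
	undefined, unused := checkExporters(cfg)
	fmt.Println("undefined:", undefined) // prints "undefined: [otlp/backup]"
	fmt.Println("unused:", unused)       // prints "unused: [debug]"
}
```

The same set-difference idea extends to receivers and processors; a real analyzer would also need to handle connectors and extensions, which this sketch ignores.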
Example usage
oteldoctor analyze ./deploy --profile production
Generate SARIF for GitHub Code Scanning:
oteldoctor analyze ./deploy --profile production --format sarif > oteldoctor.sarif
Render the Collector pipeline as a graph:
oteldoctor graph collector.yaml --format mermaid
Explain a rule:
oteldoctor explain OTEL-SEC-202
Install:
go install github.com/firfircelik/oteldoctor/cmd/oteldoctor@v0.1.0
What it is not
oteldoctor does not replace the OpenTelemetry Collector’s own configuration validation.
The Collector can tell you whether a config is syntactically valid and operationally acceptable.
oteldoctor focuses on production readiness: reliability, security, cost/cardinality, semantic convention quality, and Kubernetes deployment risks.
GitHub: https://github.com/firfircelik/oteldoctor
This is the first public release. I’d love feedback from anyone using the OpenTelemetry Collector, especially on real-world configs and rule calibration.