By Senthilkumar Sugumar
As a full-stack developer working on distributed systems and cloud-native applications, I recently embarked on a journey to master OpenTelemetry, an open-source observability framework that unifies logs, metrics, and traces in a vendor-neutral format.
Moving from local Node.js development to Kubernetes deployments on Amazon EKS, and exporting telemetry to Grafana Cloud and Elastic Cloud along the way, I explored how OpenTelemetry can provide end-to-end observability across an entire system.
What is OpenTelemetry?
OpenTelemetry (OTel) is a standardized framework for instrumenting, generating, collecting, and exporting telemetry data:
- Traces: Follow the flow of requests across microservices
- Metrics: Monitor application and system health
- Logs: Capture contextual outputs and events
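A trace can follow a request across microservices because each hop forwards the trace context. By default, OpenTelemetry's JavaScript SDK propagates that context in the W3C Trace Context `traceparent` header, which has the form `version-traceid-parentid-flags`. The minimal sketch below builds and parses that header using only Node's built-in `crypto` module, so you can see what actually travels between services. In a real app the SDK's propagator handles this for you. The function names are hypothetical.

```javascript
const { randomBytes } = require('crypto');

// Build a W3C traceparent header: version-traceId-parentId-flags.
// parentId is the sender's own span id (8 random bytes = 16 hex chars).
function makeTraceparent(traceId = randomBytes(16).toString('hex'), sampled = true) {
  const spanId = randomBytes(8).toString('hex');
  const flags = sampled ? '01' : '00'; // bit 0 of trace-flags = sampled
  return `00-${traceId}-${spanId}-${flags}`;
}

// Parse a traceparent header; returns null if it is malformed.
function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // All-zero trace ids and span ids are invalid per the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// The next service keeps the trace id, creates its own span id,
// and sends that new span id onward as the parent id.
const incoming = makeTraceparent();
const ctx = parseTraceparent(incoming);
const outgoing = makeTraceparent(ctx.traceId, ctx.sampled);
console.log(incoming);
console.log(outgoing);
```

Because the trace id is unchanged, a backend such as Tempo or Elastic APM can stitch the spans from both services into one trace.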
Manual vs. Automatic Instrumentation
Manual Instrumentation
Using SDKs like @opentelemetry/sdk-node, developers explicitly define spans, metrics, and logs:
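```javascript
const { trace } = require('@opentelemetry/api');

const tracer = trace.getTracer('user-service'); // tracer name is illustrative
const span = tracer.startSpan('get-user');
// Your business logic here
span.end();
```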
Pros: More control over what gets tracked
Cons: More boilerplate and maintenance
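Manual spans are easy to leak: if the business logic throws, `span.end()` never runs. A common way to reduce that boilerplate is a small wrapper that always ends the span and records failures. The sketch below assumes a tracer obtained from `@opentelemetry/api` (for example `trace.getTracer('user-service')`) and uses its `startActiveSpan`, `recordException`, `setStatus`, and `end` methods. The helper name `withSpan` is hypothetical. The tracer is passed in as an argument, so the helper itself has no dependencies.

```javascript
// Numeric value of SpanStatusCode.ERROR in @opentelemetry/api (UNSET = 0, OK = 1, ERROR = 2).
const SPAN_STATUS_ERROR = 2;

// Run `fn` inside an active span and always end the span, even when `fn` throws.
// `tracer` is assumed to be an OpenTelemetry Tracer.
function withSpan(tracer, name, fn) {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn(span);
    } catch (err) {
      // Attach the error to the span and mark the span as failed.
      span.recordException(err);
      span.setStatus({ code: SPAN_STATUS_ERROR, message: err.message });
      throw err;
    } finally {
      span.end();
    }
  });
}

// Usage (getUserFromDb and userId are hypothetical):
// const user = await withSpan(tracer, 'get-user', (span) => {
//   span.setAttribute('user.id', userId);
//   return getUserFromDb(userId);
// });
```

Using `startActiveSpan` rather than `startSpan` also makes the span the active context, so spans created inside `fn` (including auto-instrumented HTTP or database calls) become its children.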
Automatic Instrumentation
Use auto-instrumentation tools for popular frameworks like Express:
```bash
node --require @opentelemetry/auto-instrumentations-node/register app.js
```
Pros: Quick setup, zero code changes
Cons: Limited customization options
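Zero code changes does not mean zero configuration: the `register` hook reads the standard OpenTelemetry environment variables. The sketch below builds that environment and starts an app with the hook preloaded through `NODE_OPTIONS`. `OTEL_SERVICE_NAME`, `OTEL_TRACES_EXPORTER`, and `OTEL_EXPORTER_OTLP_ENDPOINT` are standard SDK variables. The helper names, the service name, and a local Collector at `http://localhost:4318` are assumptions for illustration. The sketch also assumes `@opentelemetry/auto-instrumentations-node` is installed in the app's project.

```javascript
const { spawn } = require('child_process');

// Build the environment for running an app under the auto-instrumentation register hook.
function autoInstrumentationEnv(baseEnv, { serviceName, otlpEndpoint }) {
  const hook = '--require @opentelemetry/auto-instrumentations-node/register';
  return {
    ...baseEnv,
    // Keep any NODE_OPTIONS the caller already had and append the hook.
    NODE_OPTIONS: [baseEnv.NODE_OPTIONS, hook].filter(Boolean).join(' '),
    OTEL_SERVICE_NAME: serviceName,
    OTEL_TRACES_EXPORTER: 'otlp',
    OTEL_EXPORTER_OTLP_ENDPOINT: otlpEndpoint,
  };
}

// Run `script` as a child Node.js process with auto-instrumentation enabled.
function launchInstrumented(script, options) {
  const env = autoInstrumentationEnv(process.env, options);
  return spawn(process.execPath, [script], { env, stdio: 'inherit' });
}

// Usage (assumed service name and local Collector endpoint):
// launchInstrumented('app.js', { serviceName: 'user-service', otlpEndpoint: 'http://localhost:4318' });
```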
Telemetry Lifecycle
| Layer | Method | Telemetry Types |
|---|---|---|
| Application | SDKs, Auto-instrumentation | Traces, Metrics, Logs |
| Host/Node | OTEL Agent (hostmetrics, filelog) | CPU, Memory, Disk, Logs |
| Kubernetes | OTEL Operator, Sidecars | Pod, node, and container telemetry |
| Cloud Providers | AWS, GCP, and Azure receivers | Cloud-specific metrics |
All of this data is centralized and processed using the OpenTelemetry Collector.
OpenTelemetry Collector: The Heart of Observability
The OTEL Collector receives, processes, and exports telemetry data through pipelines built from these components:
| Component | Role |
|---|---|
| Receivers | Ingest telemetry (OTLP, Prometheus, Jaeger, etc.) |
| Processors | Enrich and filter data (batching, adding resource attributes) |
| Exporters | Send to backends (Grafana, Elastic, etc.) |
| Extensions | Add health checks, authentication, etc. |
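To make the receiver, processor, and exporter flow concrete, here is a toy, in-memory model of one traces pipeline. It is not the Collector's implementation; real pipelines are declared in the Collector's YAML configuration. The processor and exporter functions are hypothetical stand-ins that mimic enrichment with resource attributes, filtering, batching, and fan-out to several backends.

```javascript
// Toy model of one Collector traces pipeline: receiver -> processors -> exporters.
// Illustration only; the real Collector declares pipelines in YAML.

// Processor: merge resource attributes into every span (similar in spirit to a resource processor).
const addResourceAttrs = (attrs) => (spans) =>
  spans.map((span) => ({ ...span, resource: { ...span.resource, ...attrs } }));

// Processor: drop spans that match a predicate, e.g. noisy health checks.
const dropWhere = (predicate) => (spans) => spans.filter((span) => !predicate(span));

// Build a pipeline; the returned function plays the role of the receiver.
function makePipeline({ processors, batchSize, exporters }) {
  return function receive(spans) {
    // Processors run in order, each transforming the previous one's output.
    const processed = processors.reduce((acc, processor) => processor(acc), spans);
    // Batch step: group spans into chunks of at most `batchSize`
    // (the real batch processor also flushes on a timeout).
    for (let i = 0; i < processed.length; i += batchSize) {
      const batch = processed.slice(i, i + batchSize);
      // Fan-out: every exporter receives every batch.
      for (const exporter of exporters) exporter(batch);
    }
  };
}

// Example wiring with a hypothetical console "exporter".
const demo = makePipeline({
  processors: [
    addResourceAttrs({ 'service.name': 'user-service' }),
    dropWhere((span) => span.name === 'GET /health'),
  ],
  batchSize: 2,
  exporters: [(batch) => console.log('export', batch.map((s) => s.name))],
});
demo([{ name: 'GET /users' }, { name: 'GET /health' }, { name: 'POST /orders' }]);
```

The fan-out step is why a single Collector can send the same traces to Grafana and Elastic at once: each backend is just another exporter in the pipeline.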
Collector Distributions
| Type | Description |
|---|---|
| Core | A lean set of essential components, such as OTLP and Prometheus |
| Contrib | Core plus many community components: hostmetrics, filelog, Kafka, AWS, Elastic, etc. |
My Learning Path: Real-World Projects
1. Basic OTEL SDK with Node.js
GitHub: opentelemetry-nodejs
- Used @opentelemetry/sdk-node for manual instrumentation
- Exported traces to Jaeger, metrics to Prometheus
- Local OTEL Collector setup
Takeaway: Learned core SDK usage, span creation, and OTLP export setup
2. Grafana Cloud Integration with Node.js
GitHub: OpenTelemetry-Node.js-App-with-Grafana-Cloud-Integration
- Exported traces to Grafana Tempo, metrics to Grafana Cloud
- Used OTLP HTTP exporter with basic auth
- Configured OTEL Collector for secure pipeline
Takeaway: Gained experience with remote observability and secure telemetry flows
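Grafana Cloud's OTLP endpoint is typically authenticated with HTTP basic auth: the stack's instance ID is the username and an access token is the password. The sketch below builds the `Authorization` header with Node's `Buffer` and reads the credentials from environment variables. The variable names `GRAFANA_INSTANCE_ID` and `GRAFANA_API_TOKEN` are hypothetical. The resulting object matches the `headers` option of `OTLPTraceExporter` from `@opentelemetry/exporter-trace-otlp-http`, and the same header can be set on the Collector's `otlphttp` exporter.

```javascript
// Build an HTTP Basic auth header value: "Basic base64(username:password)".
function basicAuthHeader(username, password) {
  return 'Basic ' + Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
}

// Headers for an OTLP/HTTP exporter pointed at Grafana Cloud.
// GRAFANA_INSTANCE_ID and GRAFANA_API_TOKEN are hypothetical names; keep secrets out of source code.
function grafanaOtlpHeaders(env = process.env) {
  const { GRAFANA_INSTANCE_ID, GRAFANA_API_TOKEN } = env;
  if (!GRAFANA_INSTANCE_ID || !GRAFANA_API_TOKEN) {
    throw new Error('GRAFANA_INSTANCE_ID and GRAFANA_API_TOKEN must be set');
  }
  return { Authorization: basicAuthHeader(GRAFANA_INSTANCE_ID, GRAFANA_API_TOKEN) };
}

// Usage (use the OTLP URL shown for your own Grafana Cloud stack):
// const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
// const exporter = new OTLPTraceExporter({ url: '<your OTLP URL>/v1/traces', headers: grafanaOtlpHeaders() });
```

Failing fast when a credential is missing turns a silent "no data in Grafana" problem into an explicit startup error.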
3. Full Observability with Elastic APM + ELK
GitHub: openTelemetry-elastic-APM-nodejs
- Sent traces to Elastic APM
- Used Logstash for log ingestion into Elasticsearch
- Visualized telemetry in Kibana
Takeaway: Built a full-stack telemetry system using Elastic APM + ELK
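Kibana can only line up log lines with APM traces if each log line carries the trace and span IDs. The minimal sketch below illustrates that idea by emitting JSON log lines with Elastic Common Schema (ECS) style `trace.id` and `span.id` fields, which Logstash can forward to Elasticsearch unchanged. The function name and the choice of plain JSON on stdout are assumptions. In a real app the IDs would come from the active span, for example `trace.getActiveSpan()?.spanContext()` in `@opentelemetry/api`.

```javascript
// Format one structured log line with ECS-style correlation fields.
function logLine(level, message, { traceId, spanId } = {}, now = new Date()) {
  const entry = {
    '@timestamp': now.toISOString(),
    'log.level': level,
    message,
  };
  // Only add correlation fields when there is an active trace.
  if (traceId) entry['trace.id'] = traceId;
  if (spanId) entry['span.id'] = spanId;
  return JSON.stringify(entry);
}

// Example with fixed IDs; normally they come from the active span's context.
console.log(
  logLine('info', 'user fetched', {
    traceId: '4bf92f3577b34da6a3ce929d0e0e4736',
    spanId: '00f067aa0ba902b7',
  }),
);
```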
4. OpenTelemetry on Kubernetes with FastAPI
GitHub: fastapi-otel-eks
- Deployed FastAPI on Amazon EKS
- Used the OTEL SDK + opentelemetry-instrument
- Enabled auto-instrumentation with the OTEL Kubernetes Operator
- Exported to Elastic Cloud via OTLP and API key
Takeaway: Production-grade observability in Kubernetes
Common Challenges I Faced
- YAML syntax errors and pipeline misconfigurations in the Collector
- Missing telemetry due to incorrect API keys or OTLP endpoints
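Many "missing telemetry" cases come down to the OTLP endpoint rules. In the OTLP exporter specification, a signal-specific variable such as `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` is used as-is. The generic `OTEL_EXPORTER_OTLP_ENDPOINT` is a base URL to which OTLP/HTTP exporters append `/v1/traces`, `/v1/metrics`, or `/v1/logs`. Port 4317 is the OTLP/gRPC default and 4318 the OTLP/HTTP default. The sketch below is a hypothetical preflight check that applies those rules and flags the usual mistakes before you blame the backend. The function names and warning texts are my own.

```javascript
const SIGNALS = ['traces', 'metrics', 'logs'];

// Resolve the OTLP/HTTP URL for one signal: a signal-specific variable wins as-is,
// otherwise /v1/<signal> is appended to the generic base endpoint.
function resolveOtlpHttpUrl(env, signal) {
  const specific = env[`OTEL_EXPORTER_OTLP_${signal.toUpperCase()}_ENDPOINT`];
  if (specific) return specific;
  const base = env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318';
  return `${base.replace(/\/+$/, '')}/v1/${signal}`;
}

// Flag common misconfigurations; returns a list of warning strings.
function otlpPreflight(env) {
  const warnings = [];
  for (const signal of SIGNALS) {
    const url = resolveOtlpHttpUrl(env, signal);
    let parsed;
    try {
      parsed = new URL(url);
    } catch {
      warnings.push(`${signal}: invalid URL "${url}"`);
      continue;
    }
    if (parsed.port === '4317') {
      warnings.push(`${signal}: port 4317 is the OTLP/gRPC default; OTLP/HTTP usually uses 4318`);
    }
    if ((parsed.pathname.match(/\/v1\//g) || []).length > 1) {
      warnings.push(`${signal}: "${parsed.pathname}" repeats /v1/; set OTEL_EXPORTER_OTLP_ENDPOINT to the base URL only`);
    } else if (!parsed.pathname.endsWith(`/v1/${signal}`)) {
      warnings.push(`${signal}: path "${parsed.pathname}" does not end with /v1/${signal}`);
    }
  }
  if (!env.OTEL_EXPORTER_OTLP_HEADERS) {
    warnings.push('OTEL_EXPORTER_OTLP_HEADERS is not set; hosted backends usually require an auth header');
  }
  return warnings;
}

// Usage: for (const w of otlpPreflight(process.env)) console.warn(w);
```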
While setting up Grafana, Prometheus, Jaeger, and the ELK stack locally with Docker for a Node.js app, I ran into several dependency issues and ELK authentication problems, especially with the APM setup.
However, after I switched to the managed cloud services, both automatic and manual instrumentation worked smoothly with Elastic Cloud and Grafana Cloud for my Docker-based Node.js applications.
Later, I explored OpenTelemetry on Kubernetes by deploying a Python (FastAPI) application integrated with Elastic Cloud. Initially, I could collect server-level metrics and logs using Elastic Agents, but I didn't receive application-level traces. To fix this, I added OpenTelemetry API-based instrumentation to the Python app, which let me capture application-level metrics, traces, and logs successfully.
Repository Recap
| Use Case | GitHub Repo |
|---|---|
| Basic Node.js OTEL SDK | opentelemetry-nodejs |
| Grafana Cloud Integration | OpenTelemetry-Node.js-App-with-Grafana-Cloud-Integration |
| Elastic APM + Logs | openTelemetry-elastic-APM-nodejs |
| FastAPI + EKS + Elastic | fastapi-otel-eks |
Final Thoughts
OpenTelemetry has transformed how I monitor, debug, and operate distributed apps. Whether it's a monolith or microservices on Kubernetes, OTEL gives me the visibility I need, across any stack.
Observability isn't just about tools; it's about empowering developers to deeply understand their systems.
Let's Connect
Thanks for reading!