Observability Journey with OpenTelemetry

By Senthilkumar Sugumar

As a full-stack developer working on distributed systems and cloud-native applications, I recently embarked on a journey to master OpenTelemetry, an open-source observability framework that unifies logs, metrics, and traces in a vendor-neutral format.

From local Node.js development to Kubernetes deployment on Amazon EKS, and from exporting telemetry to Grafana Cloud and Elastic Cloud, I explored how OpenTelemetry can provide end-to-end observability across an entire system.


🔍 What is OpenTelemetry?

OpenTelemetry (OTel) is a standardized framework for instrumenting, generating, collecting, and exporting telemetry data:

  • Traces: Follow the flow of requests across microservices
  • Metrics: Monitor application and system health
  • Logs: Capture contextual outputs and events
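
To make "follow the flow of requests across microservices" concrete: by default OpenTelemetry carries trace context between services in the W3C Trace Context `traceparent` HTTP header. The sketch below builds and parses that header by hand with Node's standard library only. In a real application the SDK's propagators do this for you; the helper names `makeTraceparent` and `parseTraceparent` are made up for illustration.

```javascript
const { randomBytes } = require('node:crypto');

// traceparent layout (W3C Trace Context, version 00):
//   version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" trace-flags (2 hex)
function makeTraceparent(traceId = randomBytes(16).toString('hex'), sampled = true) {
  const spanId = randomBytes(8).toString('hex'); // id of the span making the call
  const flags = sampled ? '01' : '00';           // lowest bit = sampled
  return `00-${traceId}-${spanId}-${flags}`;
}

function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null; // malformed header: the receiver would start a new trace
  const [, version, traceId, parentSpanId, flags] = match;
  return { version, traceId, parentSpanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Service A starts a trace and sends the header; service B continues the same trace.
const outgoing = makeTraceparent();
const incoming = parseTraceparent(outgoing);
const downstream = makeTraceparent(incoming.traceId, incoming.sampled);
console.log(outgoing);
console.log(downstream); // same trace-id, different span-id
```

Because every hop reuses the trace-id and only swaps in its own span-id, a backend can stitch the spans from different services into one trace.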

🛠️ Manual vs Automatic Instrumentation

🔧 Manual Instrumentation

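With the OpenTelemetry API and an SDK such as @opentelemetry/sdk-node, developers explicitly define spans, metrics, and logs:

```javascript
const { trace } = require('@opentelemetry/api');
const tracer = trace.getTracer('user-service'); // tracer name is illustrative

const span = tracer.startSpan('get-user');
// Your business logic here
span.end();
```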

✅ Pros: More control over what gets tracked
❌ Cons: More boilerplate and maintenance
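
Part of that boilerplate is making sure every span is ended and that failures are recorded on it. The sketch below shows the usual try/catch/finally shape. To stay runnable without installing any packages, it uses a tiny in-memory stand-in (`createToyTracer`, a hypothetical helper) that only mimics the span method names `setAttribute`, `recordException`, `setStatus`, and `end`. With the real SDK you would obtain the tracer from `@opentelemetry/api` and pass its `SpanStatusCode.ERROR` constant instead of the string used here.

```javascript
// Toy stand-in for a tracer: NOT the OpenTelemetry implementation. It only
// mirrors a few method names so the pattern below can run on its own.
function createToyTracer(finished = []) {
  return {
    finished,
    startSpan(name) {
      const span = { name, attributes: {}, status: 'UNSET', events: [], ended: false };
      return {
        setAttribute(key, value) { span.attributes[key] = value; },
        recordException(err) { span.events.push({ name: 'exception', message: err.message }); },
        setStatus(status) { span.status = status.code; },
        end() { span.ended = true; finished.push(span); },
      };
    },
  };
}

const tracer = createToyTracer();

// Wrap business logic so the span is always ended, even when the logic throws.
function getUser(id, loadUser) {
  const span = tracer.startSpan('get-user');
  span.setAttribute('user.id', id);
  try {
    return loadUser(id);
  } catch (err) {
    span.recordException(err);
    span.setStatus({ code: 'ERROR', message: err.message }); // string code is a toy simplification
    throw err;
  } finally {
    span.end();
  }
}

getUser(42, (id) => ({ id, name: 'demo' }));
try {
  getUser(7, () => { throw new Error('user not found'); });
} catch (_) { /* the caller still sees the error */ }
console.log(tracer.finished.map((s) => `${s.name}: ${s.status}`));
```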


⚙️ Automatic Instrumentation

Use auto-instrumentation tools for popular frameworks like Express:

```bash
node --require @opentelemetry/auto-instrumentations-node/register app.js
```

✅ Pros: Quick setup, zero code changes
❌ Cons: Limited customization options


🔁 Telemetry Lifecycle

| Layer | Method | Telemetry Types |
| --- | --- | --- |
| Application | SDKs, Auto-instrumentation | Traces, Metrics, Logs |
| Host/Node | OTEL Agent (hostmetrics, filelog) | CPU, Memory, Disk, Logs |
| Kubernetes | OTEL Operator, Sidecars | Pod, Node, Container |
| Cloud Providers | AWS, GCP, Azure Receivers | Cloud-specific metrics |

All of this data is centralized and processed using the OpenTelemetry Collector.


🧰 OpenTelemetry Collector: The Heart of Observability

The OTEL Collector receives, processes, and exports telemetry data.

| Component | Role |
| --- | --- |
| Receivers | Ingest telemetry (OTLP, Prometheus, Jaeger, etc.) |
| Processors | Enrich/filter data (batching, resource attribution) |
| Exporters | Send to backends (Grafana, Elastic, etc.) |
| Extensions | Add health checks, authentication, etc. |
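
The receive, process, export flow can be pictured as a small function pipeline. The sketch below is a toy model in plain JavaScript, not Collector code: `addResourceAttributes`, `batch`, and `runPipeline` are hypothetical names, and a real Collector is configured in YAML rather than programmed like this.

```javascript
// Processor: enrich every record with resource attributes (who produced it).
const addResourceAttributes = (attrs) => (records) =>
  records.map((r) => ({ ...r, resource: { ...attrs, ...r.resource } }));

// Processor: group records into batches so the exporter makes fewer calls.
const batch = (size) => (records) => {
  const batches = [];
  for (let i = 0; i < records.length; i += size) batches.push(records.slice(i, i + size));
  return batches;
};

// Pipeline: receiver output -> processors in order -> every exporter.
function runPipeline({ receive, processors, exporters }) {
  const processed = processors.reduce((data, proc) => proc(data), receive());
  for (const exportBatch of exporters) processed.forEach((b) => exportBatch(b));
  return processed;
}

const sent = []; // sizes of the batches the toy exporter received
const batches = runPipeline({
  receive: () => [{ name: 'GET /users' }, { name: 'GET /orders' }, { name: 'POST /login' }],
  processors: [addResourceAttributes({ 'service.name': 'demo-api' }), batch(2)],
  exporters: [(b) => sent.push(b.length)],
});
console.log(sent); // [ 2, 1 ]
```

Processor order matters in this model just as it does in a Collector pipeline: enrichment runs on individual records before batching groups them.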

📦 Collector Distributions

| Type | Description |
| --- | --- |
| Core | Basic exporters like OTLP, Prometheus, Jaeger |
| Contrib | Extras: hostmetrics, filelog, Kafka, AWS, Elastic, etc. |

📈 My Learning Path: Real-World Projects

🧪 1. Basic OTEL SDK with Node.js

🔗 GitHub: opentelemetry-nodejs

  • Used @opentelemetry/sdk-node for manual instrumentation
  • Exported traces to Jaeger, metrics to Prometheus
  • Local OTEL Collector setup

✅ Takeaway: Learned core SDK usage, span creation, and OTLP export setup


🌐 2. Grafana Cloud Integration with Node.js

πŸ”— GitHub: OpenTelemetry-Node.js-App-with-Grafana-Cloud-Integration

  • Exported traces to Grafana Tempo, metrics to Grafana Cloud
  • Used OTLP HTTP exporter with basic auth
  • Configured OTEL Collector for secure pipeline

✅ Takeaway: Experience with remote observability and secure telemetry flows
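
Basic auth on an OTLP/HTTP exporter comes down to a single `Authorization` header. The sketch below builds it with Node's `Buffer`. The credential values and the environment variable names `GRAFANA_INSTANCE_ID` and `GRAFANA_API_TOKEN` are placeholders, and `basicAuthHeader` is a hypothetical helper. It also assumes the username is the Grafana Cloud instance ID and the password an API token, which is worth checking against your stack's connection details.

```javascript
// Build an HTTP Basic Authorization header value: "Basic " + base64("user:password").
function basicAuthHeader(username, password) {
  const credentials = Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
  return `Basic ${credentials}`;
}

// Placeholder credentials: read real ones from the environment, never hard-code them.
const instanceId = process.env.GRAFANA_INSTANCE_ID || '123456';
const apiToken = process.env.GRAFANA_API_TOKEN || 'example-token';

// A headers object of the shape an OTLP/HTTP exporter or plain HTTP client would send.
const otlpHeaders = { Authorization: basicAuthHeader(instanceId, apiToken) };
console.log(Object.keys(otlpHeaders)); // the header value itself is a secret, so it is not printed
```

Base64 is an encoding, not encryption, so this header is only safe over HTTPS.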


📦 3. Full Observability with Elastic APM + ELK

🔗 GitHub: openTelemetry-elastic-APM-nodejs

  • Sent traces to Elastic APM
  • Used Logstash for log ingestion into Elasticsearch
  • Visualized telemetry in Kibana

✅ Takeaway: Full-stack telemetry system using Elastic APM + ELK


☁️ 4. OpenTelemetry on Kubernetes with FastAPI

πŸ”— GitHub: fastapi-otel-eks

  • Deployed FastAPI on Amazon EKS
  • Used OTEL SDK + opentelemetry-instrument
  • Enabled auto-instrumentation with OTEL Kubernetes Operator
  • Exported to Elastic Cloud via OTLP and API key

✅ Takeaway: Production-grade observability in Kubernetes


🧩 Common Challenges I Faced

  • YAML syntax errors and pipeline misconfigurations in the Collector
  • Missing telemetry due to incorrect API keys or OTLP endpoints
  • Setting up Grafana, Prometheus, Jaeger, and the ELK stack locally with Docker for a Node.js app brought several dependency issues and ELK authentication problems, especially in the APM setup.
  • After I switched to cloud services, both auto and manual instrumentation worked smoothly with Elastic and Grafana Cloud from Docker-based Node.js applications.
  • On Kubernetes, I deployed a Python (FastAPI) application integrated with Elastic Cloud. At first I could collect server-level metrics and logs with Elastic Agents, but no application-level traces arrived. Adding OpenTelemetry API-based instrumentation to the Python app fixed this and let me capture application-level metrics, traces, and logs.
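
Many Collector pipeline misconfigurations come down to a pipeline naming a component that was never defined. The sketch below checks for that on an already-parsed config object whose shape follows the Collector's `receivers` / `processors` / `exporters` / `service.pipelines` layout. `findPipelineErrors` is a hypothetical helper, and the Collector runs its own, more thorough validation at startup.

```javascript
// Return a list of human-readable problems; an empty list means every
// pipeline only references components that are defined at the top level.
function findPipelineErrors(config) {
  const errors = [];
  const pipelines = (config.service && config.service.pipelines) || {};
  for (const [pipelineName, pipeline] of Object.entries(pipelines)) {
    for (const kind of ['receivers', 'processors', 'exporters']) {
      const defined = Object.keys(config[kind] || {});
      for (const name of pipeline[kind] || []) {
        if (!defined.includes(name)) {
          errors.push(`pipeline "${pipelineName}": ${kind} "${name}" is not defined`);
        }
      }
    }
    // A pipeline needs at least one receiver and one exporter; processors are optional.
    for (const kind of ['receivers', 'exporters']) {
      if (!(pipeline[kind] || []).length) {
        errors.push(`pipeline "${pipelineName}": no ${kind} configured`);
      }
    }
  }
  return errors;
}

const config = {
  receivers: { otlp: {} },
  processors: { batch: {} },
  exporters: { otlphttp: {} },
  service: {
    pipelines: {
      traces: { receivers: ['otlp'], processors: ['batch'], exporters: ['otlphttp'] },
      // Mistake on purpose: "prometheus" is referenced but never defined above.
      metrics: { receivers: ['otlp'], processors: ['batch'], exporters: ['prometheus'] },
    },
  },
};
console.log(findPipelineErrors(config));
```

Here the check reports one problem, the undefined `prometheus` exporter in the `metrics` pipeline. Running a check like this in CI can surface the mistake before a deployment silently drops telemetry.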


🧡 Repository Recap

Use Case GitHub Repo
Basic Node.js OTEL SDK opentelemetry-nodejs
Grafana Cloud Integration OpenTelemetry-Node.js-App-with-Grafana-Cloud-Integration
Elastic APM + Logs openTelemetry-elastic-APM-nodejs
FastAPI + EKS + Elastic fastapi-otel-eks

💡 Final Thoughts

OpenTelemetry has transformed how I monitor, debug, and operate distributed apps. Whether it's a monolith or microservices on Kubernetes, OTEL gives me the visibility I need across any stack.

Observability isn't just about tools; it's about empowering developers to deeply understand their systems.


📫 Let's Connect

Thanks for reading! 🌟