Karina Babcock for Causely

Posted on Jul 24, 2024

Using OpenTelemetry and the OTel Collector for Logs, Metrics and Traces

#opentelemetry #devops #cloudnative

OpenTelemetry (fondly known as OTel) is an open-source project that provides a unified set of APIs, libraries, agents, and instrumentation to capture and export logs, metrics, and traces from applications. The project’s goal is to standardize observability across various services and applications, enabling better monitoring and troubleshooting.

Our team at Causely has adopted OpenTelemetry within our own platform, which prompted us to share a production-focused guide. Our goal is to help developers, DevOps engineers, software engineers, and SREs understand what OpenTelemetry is, what its core components are, and how the OpenTelemetry Collector (OTel Collector) works in detail. This background will help you use OTel and the OTel Collector as part of a comprehensive strategy to monitor and observe applications.

What Data Does OpenTelemetry Collect?

OpenTelemetry gathers three types of telemetry data, all of which the OTel Collector can receive, process, and export: logs, metrics, and traces.

Logs
Logs are records of events that occur within an application. They provide a detailed account of what happened, when it happened, and any relevant data associated with the event. Logs are helpful for debugging and understanding the behavior of applications.

OpenTelemetry collects and exports logs, providing insights into events and errors that occur within the system. For example, if a user reports a slow response time in a specific feature of the application, engineers can use OpenTelemetry logs to trace back the events leading up to the reported issue.

Metrics
Metrics are quantitative data that measure the performance and health of an application. They help in tracking system behavior and identifying trends over time. OpenTelemetry collects metrics data, which helps in tracking resource usage, monitoring system performance, and identifying anomalies.

For instance, if a spike in CPU usage is detected using OpenTelemetry metrics, engineers can investigate the potential issue using the OTel data collected and make necessary adjustments to optimize performance.

Developers use OpenTelemetry metrics to see granular resource utilization data, which helps understand how the application is functioning under different conditions.

Traces
Traces provide a detailed view of request flows within a distributed system. Traces help understand the execution path, diagnose application behaviors, and see the interactions between different services.

For example, if a user reports slow response times on a website, developers can use trace data to identify which service along the request path is responsible. Traces can also help in debugging issues such as failed requests or errors by providing a step-by-step view of how requests are processed through the system.

Introduction to OTel Collector

You can deploy the OTel Collector as a standalone agent or as a sidecar alongside your application. The OTel Collector also includes some helpful features for sampling, filtering, and transforming data before sending it to a monitoring backend.

How it Works
The OTel Collector works by receiving telemetry data from many different sources, processing it based on configured pipelines, and exporting it to chosen backends. This modular architecture allows for customization and scalability.

The OTel Collector acts as a central data pipeline for collecting, processing, and exporting telemetry data (metrics, logs, traces) within an observability stack.
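As a rough illustration of this pipeline model, a minimal Collector configuration might look like the sketch below. It assumes applications send OTLP data to the Collector and uses the debug exporter, which only prints telemetry to the Collector’s own output. The endpoint value and the choice of components are illustrative assumptions, not a recommended production setup.

```yaml
# Minimal sketch: one receiver, one processor, one exporter.
receivers:
  otlp:                          # accepts OTLP from instrumented applications
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # conventional OTLP/gRPC port

processors:
  batch: {}                      # group telemetry into batches (default settings)

exporters:
  debug:                         # prints received telemetry to the Collector's output
    verbosity: basic

# Components are inactive until a pipeline under `service` references them.
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```

Each signal type gets its own pipeline, so logs, metrics, and traces can be routed through different processors and exporters if needed.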

Image source: opentelemetry.io

Here’s a technical breakdown:

Data Ingestion:

Leverages pluggable receivers for specific data sources (e.g., Redis receiver, MySQL receiver).
Receivers can be configured for specific endpoints, authentication, and data collection parameters.
Supports various data formats (e.g., native application instrumentation libraries, vendor-specific formats) through receiver implementations.
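To make the receiver model concrete, here is a hedged sketch of a receivers section that accepts two formats side by side: OTLP from native instrumentation libraries and Zipkin-formatted spans from older services. It assumes a Collector distribution that bundles the zipkin receiver; the listen addresses are illustrative.

```yaml
receivers:
  # OTLP from OpenTelemetry SDKs, over both gRPC and HTTP.
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  # Spans from services that still emit the Zipkin format.
  zipkin:
    endpoint: 0.0.0.0:9411
```

Both receivers can feed the same traces pipeline, because each one translates its input into the Collector’s internal representation.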

Data Processing:

Processors can be chained to manipulate the collected data before export.
Common processing functions include:
- Batching: Improves efficiency by sending data in aggregates.
- Filtering: Selects specific data based on criteria.
- Sampling: Reduces data volume by statistically sampling telemetry.
- Enrichment: Adds contextual information to the data.
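The four processing functions above could be sketched as the following processors section. This assumes a distribution that includes the filter, probabilistic_sampler, and resource processors (they are not part of every build). The names after the slash, the /healthz route, and the attribute values are hypothetical examples.

```yaml
processors:
  # Batching: send data in groups rather than one item at a time.
  batch:
    timeout: 5s
    send_batch_size: 1024

  # Filtering: drop spans that match any listed condition
  # (here, health-check requests on a hypothetical /healthz route).
  filter/healthchecks:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'

  # Sampling: keep roughly 10% of traces.
  probabilistic_sampler:
    sampling_percentage: 10

  # Enrichment: attach an environment label to everything passing through.
  resource/env:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert
```

Processors run in the order they are listed in a pipeline under service, so the chain is defined there rather than by their order in this section.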

Data Export:

Utilizes exporters to send the processed data to backend systems.
Exporters are available for various observability backends (e.g., Jaeger, Zipkin, Prometheus).
Exporter configurations specify the destination endpoint and data format for the backend system.
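A hedged sketch of an exporters section follows. The hostnames are placeholders rather than real services, and it assumes a distribution that includes the zipkin and prometheus exporters.

```yaml
exporters:
  # Generic OTLP over gRPC; many backends accept this directly.
  otlp:
    endpoint: otel-backend.example.com:4317   # placeholder host
    tls:
      insecure: false                         # keep TLS enabled

  # Zipkin expects spans posted to its v2 API path.
  zipkin:
    endpoint: http://zipkin.example.com:9411/api/v2/spans   # placeholder host

  # Prometheus pulls metrics, so this exporter exposes a scrape endpoint
  # on the Collector instead of pushing to a remote address.
  prometheus:
    endpoint: 0.0.0.0:8889
```

Note the difference in direction: the otlp and zipkin exporters push to a destination, while the prometheus exporter waits to be scraped.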

Internal Representation:

Leverages OpenTelemetry’s internal Protobuf data format (pdata) for efficient data handling.
Receivers translate source-specific data formats into pdata format for processing.
Exporters translate pdata format into the backend system’s expected data format.

Scalability and Configurability:

Designed for horizontal scaling by deploying multiple collector instances.
Configuration files written in YAML allow for dynamic configuration of receivers, processors, and exporters.
Supports running as an agent on individual hosts or as a standalone service.
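One common way to combine these deployment modes is to run a lightweight agent Collector on each host that forwards to a shared, horizontally scaled gateway. The agent side might look like the sketch below; the gateway hostname is hypothetical, and disabling TLS is an assumption that only makes sense on a trusted internal network.

```yaml
# Agent configuration (one instance per host).
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317   # only local applications can connect

processors:
  batch: {}

exporters:
  otlp:
    endpoint: otel-gateway.example.internal:4317   # hypothetical gateway address
    tls:
      insecure: true   # assumption: trusted internal network; enable TLS otherwise

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

The gateway instances would run their own configuration with an otlp receiver and the exporters for the final backends.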

The OTel Collector is format-agnostic and flexible, built to work with various backend observability systems.

Setting up the OpenTelemetry (OTel) Collector
Starting with OpenTelemetry for your new system is a straightforward process that takes only a few steps:

Download the OTel Collector: Obtain the latest version from the official OpenTelemetry website or your preferred package manager.
Configure the OTel Collector: Edit the configuration file to define data sources and export destinations.
Run the OTel Collector: Start the Collector to begin collecting and processing telemetry data.

Keep in mind that the example we will show here is relatively simple. A large-scale production implementation will require fine-tuning to ensure optimal results. Make sure to follow your OS-specific instructions to deploy and run the OTel Collector.

Next, we need to configure receivers and exporters for your application stack.

Integration with Popular Tools and Platforms

Let’s use an example system running a multi-tier web application using NGINX, MySQL, and Redis. Each source platform will have some application-specific configuration parameters.

Configuring Receivers
redisreceiver:

Use redis as the receiver name in the configuration file (redisreceiver is the component’s name in the Collector contrib repository).
Set endpoint to the host and port where your Redis server is listening (e.g., localhost:6379; 6379 is the Redis default port).
You can configure additional options like authentication and collection intervals in the receiver configuration. Refer to the official documentation for details.
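A minimal sketch of a Redis receiver entry, assuming a Redis server on the same host and a password supplied through an environment variable (the variable name REDIS_PASSWORD is a hypothetical choice):

```yaml
receivers:
  redis:
    endpoint: "localhost:6379"            # host:port of the Redis server
    collection_interval: 10s              # how often to poll Redis for metrics
    password: ${env:REDIS_PASSWORD}       # hypothetical env var; omit if auth is disabled
```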

mysqlreceiver:

Use mysql as the receiver name in the configuration file.
Set endpoint to the host and port of your MySQL server (e.g., localhost:3306). The username, password, and database are separate settings rather than parts of a connection string.
Similar to the Redis receiver, you can configure authentication and collection intervals. Refer to the documentation for details.
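A minimal sketch of a MySQL receiver entry. The user name, database name, and environment variable are hypothetical; the account only needs enough privileges to read server status.

```yaml
receivers:
  mysql:
    endpoint: localhost:3306              # host:port of the MySQL server
    username: otel                        # hypothetical monitoring user
    password: ${env:MYSQL_PASSWORD}       # hypothetical env var holding the password
    database: appdb                       # hypothetical database name
    collection_interval: 10s
```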

nginxreceiver:

Use nginx as the receiver name in the configuration file.
Set endpoint to the URL of the NGINX status page (e.g., http://localhost:80/status). The receiver scrapes this page, so NGINX must expose it through the stub_status module.
You can configure what metrics to collect and scraping intervals in the receiver configuration. Refer to the documentation for details.
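A minimal sketch of an NGINX receiver entry. It assumes NGINX serves its stub_status output at the /status path; adjust the URL to wherever your server exposes it.

```yaml
receivers:
  nginx:
    endpoint: "http://localhost:80/status"   # assumed location of the stub_status page
    collection_interval: 10s
```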

The OpenTelemetry Collector can export data to multiple providers including Prometheus, Jaeger, Zipkin, and, of course, Causely. This flexibility allows users to leverage their existing tools while adopting OpenTelemetry.

Configuring Exporters
Replace exporter_name with the actual exporter type for your external system. Here are some common options:

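otlp for Jaeger backend (recent Jaeger versions accept OTLP directly, and the dedicated jaeger exporter has been removed from current Collector releases)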
zipkin for Zipkin backend
otlp/causely for Causely backend
There are exporters for many other systems as well. Refer to the documentation for a complete list.

Set endpoint to the URL of your external system where you want to send the collected telemetry data. You might need to configure additional options specific to the chosen exporter (e.g., TLS settings or authentication headers).
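Putting the exporter side together for the example stack might look like the fragment below. It is a sketch under several assumptions: the backend endpoint comes from a hypothetical environment variable (consult your backend’s documentation for the real address and credentials), and receivers named redis, mysql, and nginx are defined elsewhere in the same file.

```yaml
processors:
  batch: {}

exporters:
  otlp/causely:                                # "otlp" is the type, "causely" a label
    endpoint: ${env:BACKEND_OTLP_ENDPOINT}     # hypothetical env var with host:port

service:
  pipelines:
    metrics:
      # Assumes these three receivers are configured in the receivers section.
      receivers: [redis, mysql, nginx]
      processors: [batch]
      exporters: [otlp/causely]
```

The type/name form lets you define several exporters of the same type, for example one otlp exporter per backend, and reference each by its full name in a pipeline.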

There is also a growing list of supporting vendors who consume OpenTelemetry data.

Conclusion

OpenTelemetry provides a standardized approach to collecting and exporting logs, metrics, and traces. Implementing OpenTelemetry and the OTel Collector offers a scalable and flexible solution for managing telemetry data, making it a popular and effective tool for modern applications.

You can use OpenTelemetry as part of your monitoring and observability practice in order to gather data that can help drive better understanding of the state of your applications. The most valuable part of OpenTelemetry is the ability to ingest the data for deeper analysis.

How Causely Works with OpenTelemetry

At Causely, we leverage OpenTelemetry as one of many data sources to assure application reliability for our clients. OpenTelemetry data is ingested by our Causal Reasoning Platform, which detects and remediates application failures in complex cloud-native environments. Causely is designed to be an intuitive, automated way to view and maintain the health of your applications and to eliminate the need for manual troubleshooting.