
Tom Weiss for Aspecto

Originally published at aspecto.io

OpenTelemetry Collector: A Friendly Guide for Devs

Jaeger Nirvana Album Cover

In this guide, you will learn everything you need to know about the OpenTelemetry Collector.

Before I read up on it myself, the collector felt like a complex beast. But as I learned more, I realized it was not so complicated after all.

In a nutshell, the collector receives telemetry data and sends it to wherever you configure it. It can also manipulate the data as you see fit. These are the key concepts you need to understand.

We will later go through a practical example where we configure the collector and visualize data.

Now that you know this, we can further elaborate.


How does the OpenTelemetry Collector work?

Images speak louder than words, so here's one that should help you figure it out:

OpenTelemetry-Collector-Architecture

So what is happening here?

Starting with data ingestion: you can send logs, metrics, and traces to the OTel collector using the OTLP protocol, which supports all three data types.

The OpenTelemetry collector has three main components:

Receivers

As their name suggests, receivers are in charge of receiving spans from our span-emitting applications. We can use a native OTel SDK to create spans and export them to the receiver, which listens for calls on a specified port on the collector.

We can configure our receivers to accept both gRPC and HTTP protocols.
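
For example, a minimal receivers block that accepts OTLP over both protocols could look roughly like this (4317 and 4318 are the conventional OTLP gRPC and HTTP ports):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318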

For a list of available receivers, see the OpenTelemetry Collector Contrib repository.

Processors

The processor’s goal is to enable us to manipulate the data the collector receives right before we export it to our DB or backend. Common use cases:

Sampling

Assume we have a lot of data we do not actually want to store in our backend. We can define a processor responsible for implementing the sampling logic and deciding which spans will be sent to the backend/DB.

Note – the collector still receives all traces using the receiver, but then the processor comes and picks the relevant ones.
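
As an illustrative sketch, the probabilistic_sampler processor that ships with the collector keeps a configurable percentage of traces and drops the rest:

processors:
  probabilistic_sampler:
    # keep roughly 20% of traces; the rest never reach the exporters
    sampling_percentage: 20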

Data alteration

Say you want to remove sensitive data before it reaches the distributed tracing backend (to avoid tokens or other info from leaking).

You could use a processor to do this, but note that such a processor has to be marked as one that mutates the data. Every processor marked with mutatesData: true works on its own copy of the data it receives, so the collector clones the incoming data for it.

That is not a problem in most cases, but you should be aware of it. This behavior is called data ownership, and you can read more about it in the collector documentation.

You could also use this to add attributes (for example, calculate new ones based on others).
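
Here is a minimal sketch of both ideas using the attributes processor; the attribute keys are hypothetical, so adapt them to your own data:

processors:
  attributes:
    actions:
      # hypothetical key: drop a sensitive attribute before it leaves the collector
      - key: http.request.header.authorization
        action: delete
      # hypothetical key: add a new attribute to every span
      - key: environment
        value: staging
        action: insert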

Exporting metrics from spans

Using the Span Metrics Processor, we can aggregate data from spans into metrics and then export them to a relevant backend. For example, this processor could gather data about the HTTP path and response status code, enabling you to see which routes have the most 500 errors. You can read more about it in the OpenTelemetry Collector Contrib documentation.
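
As a rough sketch (the exact keys vary between collector-contrib versions, so treat this as an illustration rather than a drop-in config), the span metrics configuration could look something like this:

processors:
  spanmetrics:
    # 'prometheus' would have to be a metrics exporter defined elsewhere in the config
    metrics_exporter: prometheus
    dimensions:
      - name: http.method
      - name: http.status_code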

You could take a look at the OTEL Collector Contrib to see a list of additional processors you can use.

Exporters

The goal of the exporters is to take the telemetry data as it is represented inside the collector (OTel data), convert it to a different format when needed (like Jaeger), and send it to the endpoint you define. The OTLP exporters, for example, send the data in OTLP format over either HTTP or gRPC, while other exporters use the protocol of their target backend.

You can export directly to Elasticsearch, Jaeger, Zipkin, and other vendors to enable distributed services visualization. Visit the OTEL Collector Contrib to see a list of additional exporters you can use.

Why should you use the OpenTelemetry collector?

From the capabilities described above, you can understand that the collector is helpful in the following scenarios:

  1. You want to control the data leaving your company
  2. You want to convert data before it leaves for another system
  3. You want to add abilities you would not otherwise have, e.g., extracting metrics from spans and other data manipulations.
  4. You want to send your data to multiple destinations as easily as adding a few lines of configuration (like different vendors)
  5. You want to control the sampling levels yourself (even though some vendors like Aspecto do let you do this without the need to set up your own collector).

But all this goodness comes with a price: maintaining a collector has its own complexities to take care of, like security, infrastructure management, and cost (which can go up or down depending on your needs).

If you want to go about it by yourself, the next part talks about the architecture of running an OTEL collector.

The Architecture of Running an OpenTelemetry Collector

The collector binary has two modes of operation:

  • As an Agent
  • As a Gateway

Gateway

When we use the collector as a Gateway, it runs as a standalone service, independent of any other service, and serves as a central point where the telemetry data of an entire distributed architecture comes in.

It provides more advanced capabilities than the agent, such as tail-based sampling (in which the collector only exports spans that have errors, for example).

It can also help with API token management and reduce the number of egress points required to send data.

You should know that each collector instance is independent. This means that if you want to scale, you could set up a “gateway cluster” with various collector gateway instances behind a load balancer.
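
To make the tail-based sampling example above concrete, here is a sketch using the tail_sampling processor from the contrib distribution, keeping only traces that contain an error (the policy keys may differ between versions):

processors:
  tail_sampling:
    # wait a bit before deciding, so all spans of a trace have arrived
    decision_wait: 10s
    policies:
      - name: errors-only
        type: status_code
        status_code:
          status_codes: [ERROR]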

Agent

The agent runs on the same host as your app server (whether it runs as a container or not).

Advantages:

  • It can add insightful information about the host (IP, hostname, etc.).
  • It receives the data faster since there's no DNS resolution to do (it's just localhost).
  • It offloads responsibilities that would otherwise belong to the app, like batching, compression, retries, and more.

After doing all the above, the agent sends the data to the Gateway Collector. For more information about all the above, visit Getting Started | OpenTelemetry Collector.
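
As a sketch, an agent's configuration can be as small as receiving OTLP locally, batching, and forwarding everything to the central gateway over OTLP (the gateway hostname here is hypothetical):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch:
exporters:
  otlp:
    # hypothetical address of the central gateway collector; insecure only for the sake of the example
    endpoint: collector-gateway:4317
    insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]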

Now that you’re familiar with the two modes of operation of the collector, we can talk about deployment methods.

OpenTelemetry Collector Deployment Methods

I would suggest two options:

Option 1

All the microservices of your app interact directly with a single collector instance as a collector gateway:

Collector Deployment Methods

Option 2

Each microservice writes to an agent running at the same host as the microservice. Then, the agent writes to a central collector gateway:

OpenTelemetry Collector agent writes to a central collector gateway

Despite the benefits of the agent mode listed above, I would only recommend this approach once you've reached a level of maturity with your app and OpenTelemetry, since maintaining it introduces infrastructure overhead. In the beginning, you can probably do just as well with the simple single-collector mode.

And that’s the reason why I will be showing a practical example of the first option (gateway).

Configuring OpenTelemetry Collector Gateway with Our App and Jaeger for Visualization

Here’s a diagram of what we will be building:

Configuring OTEL Collector Gateway with Our App and Jaeger for Visualization

What we want to achieve: we run a collector and Jaeger using docker-compose, set up OTLP receivers, add some data using a processor, and export to Jaeger for visualization.

Setting up the collector gateway

First, let’s set up a configuration file for our collector.

collector-gateway.yaml:

receivers:
 otlp:
   protocols:
     http:
       endpoint: 0.0.0.0:4318
     grpc:
       endpoint: 0.0.0.0:4317
processors:
 batch:
   timeout: 1s
 resource:
   attributes:
     - key: test.key
       value: "test-value"
       action: insert
exporters:
 logging:
   loglevel: info
 jaeger:
   endpoint: jaeger-all-in-one:14250
   insecure: true
extensions:
 health_check:
 pprof:
   endpoint: :1888
 zpages:
   endpoint: :55679
service:
 extensions: [pprof, zpages, health_check]
 pipelines:
   traces:
     receivers: [otlp]
     processors: [batch, resource]
     exporters: [logging, jaeger]

So what do we see in the file above?

First, we set up OTLP receivers and told our collector to add HTTP & gRPC endpoints at ports 4318 & 4317.

Then, we set up a batch processor that batches spans together and sends each batch forward every second. In production you would want a longer interval, but I set it to 1 second here for instant feedback in Jaeger.

Then we add another processor that inserts a new attribute on each span that passes through it. The key: “test.key”, the value: “test-value”.

After that, we define two exporters: one to log everything to the console and the other to export the spans to Jaeger.

We do it insecurely now, but you would not want to do that for production purposes.

jaeger-all-in-one is the container’s name in docker-compose. You will see its setup below.

Ignore the extensions part, for now (we will discuss it later).

The service section is the orchestrator of all the above. We set up a traces pipeline (you could also add metrics and logs pipelines). In our traces pipeline, we tell the collector to use all the definitions from above.
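
For instance, assuming you keep the receivers, processors, and exporters defined above, a metrics pipeline could be added alongside the traces one like this (just a sketch; our example app only emits traces):

service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [logging, jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]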

Now that we have the collector config ready – let’s use it in the collector.

docker-compose.yaml:

version: "2"
services:
 # Jaeger
 jaeger-all-in-one:
   image: jaegertracing/all-in-one:latest
   ports:
     - "16686:16686"
     - "14268"
     - "14250"
 # Collector
 collector-gateway:
   image: otel/opentelemetry-collector:0.29.0
   volumes:
     - ./collector-gateway.yaml:/etc/collector-gateway.yaml
   command: [ "--config=/etc/collector-gateway.yaml" ]
   ports:
     - "1888:1888"   # pprof extension
     - "13133:13133" # health_check extension
     - "4317:4317"        # OTLP gRPC receiver
     - "4318:4318"        # OTLP HTTP receiver
     - "55670:55679" # zpages extension
   depends_on:
     - jaeger-all-in-one

So we configure Jaeger to run with the relevant UI and data collection ports exposed and name the container jaeger-all-in-one, as mentioned before.

Then, we define a collector-gateway container that uses the OTel collector image and mounts the collector config file we created above into the container.

The command field tells the collector where to load that file from inside the container.

Then you can see the ports for each receiver/extension and the dependency on the existence of the jaeger container.

Now let’s run it with the following command:

docker compose up

Now we need to start sending data to the collector so we can see it in Jaeger.

Let’s set up a simple Node.js Express app with OpenTelemetry enabled for this.

Let’s use express-generator to create our initial code:

npx express-generator otel-collector-sender

This structure should have been automatically created for you:

express-generator file

Let’s install the relevant dependencies:

npm install @opentelemetry/sdk-node @opentelemetry/api @opentelemetry/exporter-trace-otlp-http @opentelemetry/instrumentation-http opentelemetry-instrumentation-express

Now your package.json should look like this:

{
 "name": "otel-collector-sender",
 "version": "0.0.0",
 "private": true,
 "scripts": {
   "start": "node ./bin/www"
 },
 "dependencies": {
   "@opentelemetry/api": "^1.0.4",
   "@opentelemetry/exporter-trace-otlp-http": "^0.28.0",
   "@opentelemetry/instrumentation-http": "^0.28.0",
   "@opentelemetry/sdk-node": "^0.28.0",
   "cookie-parser": "~1.4.4",
   "debug": "~2.6.9",
   "express": "~4.16.1",
   "http-errors": "~1.6.3",
   "jade": "~1.11.0",
   "morgan": "~1.9.1",
   "opentelemetry-instrumentation-express": "^0.27.1"
 }
}

Launch the app to see that it works as expected. If you run “npm start” and go to localhost:3000 you should see this, which means our Express app is up:

Welcome to express

Create a tracing.js file to enable tracing

/* tracing.js */
// Require dependencies
const opentelemetry = require("@opentelemetry/sdk-node");
const { diag, DiagConsoleLogger, DiagLogLevel } = require('@opentelemetry/api');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { ExpressInstrumentation } = require('opentelemetry-instrumentation-express');
// For troubleshooting, set the log level to DiagLogLevel.DEBUG
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.INFO);
const exporter = new OTLPTraceExporter({
 // optional - url default value is http://localhost:55681/v1/traces
 url: 'http://localhost:4318/v1/traces',
 // optional - collection of custom headers to be sent with each request, empty by default
 headers: {},
});
const sdk = new opentelemetry.NodeSDK({
 traceExporter: exporter,
 instrumentations: [new ExpressInstrumentation()]
});
sdk.start()

Now, this code has to run before our app does, so that traces can be created.

The way to do so is to modify the start script in our package.json to load it with node’s -r flag:

{
 "name": "otel-collector-sender",
 "version": "0.0.0",
 "private": true,
 "scripts": {
   "start": "node -r ./tracing.js ./bin/www"
 },
 "dependencies": {
   "@opentelemetry/api": "^1.0.4",
   "@opentelemetry/exporter-trace-otlp-http": "^0.28.0",
   "@opentelemetry/instrumentation-http": "^0.28.0",
   "@opentelemetry/sdk-node": "^0.28.0",
   "cookie-parser": "~1.4.4",
   "debug": "~2.6.9",
   "express": "~4.16.1",
   "http-errors": "~1.6.3",
   "jade": "~1.11.0",
   "morgan": "~1.9.1",
   "opentelemetry-instrumentation-express": "^0.27.1"
 }
}

Restart the app and refresh your browser at localhost:3000.

Head over to Jaeger at localhost:16686, refresh the page, and select unknown_service:node from the panel on your left. Then hit Find Traces, and you should see a list of traces that have gone from the OTel JS SDK through the collector to Jaeger.

It should look like this:

list of traces that have gone from the OpenTelemetry JS SDK to the collector to Jaeger

Pretty nice, isn’t it?

Now you may ask – “but you promised to add some attributes, where are they?” – I am glad you asked that.

Let’s see the HTTP GET trace, for example, pick any span, and check out the process area:

OTEL Collector spans to Jaeger trace drill down

At the bottom, we have test.key with the value test-value, just like we defined in the collector.

You could use this to add any data you want, not just test-value. And of course, you could use any other type of processor as mentioned above.

But now, you might ask, what if I want to use a vendor instead of Jaeger? Before I show you how to add a vendor into the loop, let’s talk about usage with vendors.

Usage with observability vendors

Since the OpenTelemetry collector works with the standard OpenTelemetry specification, it is vendor agnostic. You could configure the collector to send data to any vendor that supports OpenTelemetry data by switching or adding another exporter to your collector.

Configuring Your Collector to Send Data to an Observability Vendor

Modify your collector-gateway.yaml file to look like this:

receivers:
 otlp:
   protocols:
     http:
       endpoint: 0.0.0.0:4318
     grpc:
       endpoint: 0.0.0.0:4317
processors:
 batch:
   timeout: 1s
 resource:
   attributes:
     - key: test.key
       value: "test-value"
       action: insert
exporters:
 logging:
   loglevel: info
 jaeger:
   endpoint: jaeger-all-in-one:14250
   insecure: true
 otlphttp:
   endpoint: https://otelcol-fast.aspecto.io
   headers:
     Authorization: YOUR-ASPECTO-TOKEN

extensions:
 health_check:
 pprof:
   endpoint: :1888
 zpages:
   endpoint: :55679
service:
 extensions: [pprof, zpages, health_check]
 pipelines:
   traces:
     receivers: [otlp]
     processors: [batch, resource]
     exporters: [logging, jaeger, otlphttp]

At Aspecto, you can sign up for free and use our generous free-forever plan (no limited features).

Just grab your token and paste it where it says YOUR-ASPECTO-TOKEN (https://app.aspecto.io/app/integration/token, or Settings > Integrations > Tokens).

Now re-run docker compose up and refresh the localhost:3000 page in your browser.

Then, in the Aspecto app, the traces would look like this:

Aspecto Traces Search view traces from OpenTelemetry Collector

Drill down into the trace, and it will look like this:

Aspecto Traces from Collector Trace View

Below you can see the test.key: “test-value” we added.

The data was sent to both Aspecto and Jaeger, so we get a high level of visibility this way 🙂

If you want to see payloads and not just standard OTel data, you can use the Aspecto Node SDK, which is written on top of OTel and adds this and other capabilities.

OpenTelemetry Collector Extensions

This article wouldn’t be complete without talking about the extensions you’ve seen earlier.

OpenTelemetry collector extensions provide additional capabilities on top of the collector’s main job; they are not part of the logs, metrics, and traces pipelines.

Let’s talk about the extensions you’ve seen above:

  • health_check – responds to health check calls on behalf of the collector.
  • pprof – exposes the collector’s performance profiling data.
  • zpages – serves an HTTP endpoint that provides live debugging data about instrumented components.

For more information on the above, see the collector’s extensions documentation.

That sums up everything you need to know to get started with the OpenTelemetry collector.

If you want to learn more about the collector and OpenTelemetry in general, watch our OpenTelemetry Bootcamp. It’s a free, vendor-neutral video series (6 episodes) that you can use as your OpenTelemetry playbook. It covers everything from the basics to production deployment.

OpenTelemetry Bootcamp

  1. Episode 1: OpenTelemetry Fundamentals
  2. Episode 2: Integrate Your Code (logs, metrics, and traces)
  3. Episode 3: Deploy to Production + Collector
  4. Episode 4: Sampling and Dealing with High Volumes
  5. Episode 5: Custom Instrumentation
  6. Episode 6: Testing with OpenTelemetry
