Your services likely emit a constant stream of telemetry data, but how does it actually travel from your applications and infrastructure to the observability backend where it's stored and analyzed?
For many, the answer is a chaotic web of vendor-specific agents, direct-to-backend SDK configurations, and disparate data shippers. This setup is brittle, expensive, hard to manage, and locks you into a single vendor's ecosystem.
There is a better way.
Instead of managing a maze of point-to-point integrations, we're going to build a telemetry pipeline: a centralized, vendor-neutral system that gives you complete control to collect, enrich, and route your observability data.
At the heart of this system is the OpenTelemetry Collector. It is a standalone service that acts as a universal receiver, a powerful processing engine, and a flexible dispatcher for telemetry data.
In this article, we'll build a telemetry pipeline from the ground up. You'll move from configuring basic data ingestion and exporting to discovering several processing techniques and designing complex data flows that help turn raw telemetry into actionable insights.
Let's get started!
The simplest possible pipeline
Every data pipeline needs an entry point and an exit. We'll start by building the most basic version of an OpenTelemetry pipeline imaginable. The goal is to receive telemetry data and print it directly to the console, confirming that data is flowing correctly before we add complexity.
The Collector's behavior is defined by a YAML configuration file. For this initial setup, you need to understand three top-level sections: receivers, exporters, and service.
Receivers
Receivers are the entry points for all telemetry data coming into the Collector from your applications and infrastructure.
They're configured to ingest data in various ways such as listening for network traffic, actively polling endpoints, reading from local sources (like files), or querying infrastructure APIs.
For example, the OTLP receiver sets up an endpoint that accepts data sent using the OpenTelemetry Protocol, while the Prometheus receiver periodically scrapes metrics from specified targets.
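As a rough sketch of what that looks like in practice (the scrape job and target below are hypothetical), the two receivers might be configured side by side like this:

# sketch: an OTLP receiver alongside a Prometheus receiver
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  prometheus:
    config:
      scrape_configs:
        - job_name: my-app            # hypothetical scrape job
          scrape_interval: 30s
          static_configs:
            - targets: ["my-app:9464"]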
Exporters
Exporters are the final destinations for all telemetry data leaving the Collector after it has been processed.
They're responsible for translating data into the required format and transmitting it to various backend systems, such as observability platforms, databases, or message queues.
For example, the otlphttp exporter can send data to any OTLP-compatible backend over HTTP, while the debug exporter simply writes telemetry data to the console for debugging.
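As a hedged illustration (the endpoint below is a placeholder, not a real backend), those two exporters could be declared like this:

# sketch: an OTLP/HTTP exporter next to the debug exporter
exporters:
  otlphttp:
    endpoint: https://otlp.example.com:4318   # placeholder backend endpoint
  debug:
    verbosity: detailed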
Service
The service section is the central orchestrator that activates and defines the flow of data through the Collector. No component is active unless it is enabled here.
It works by defining pipelines for each signal type (traces, metrics, or logs). Each pipeline specifies the exact path data will take by linking receivers, processors, and exporters.
For example, a traces pipeline could be configured to receive span data over OTLP, and fan it out to Jaeger through the OTLP exporter.
To see the three components in action, let's create our first configuration file:
# otelcol.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [debug]
This configuration creates a simple pipeline for logs alone. It sets up an otlp receiver to accept log data sent over gRPC on port 4317. Any logs received are passed immediately, without any processing, to the debug exporter, which then prints their full, detailed content to the Collector's stderr.
To test your pipeline, you need an application that can generate and send telemetry data. A convenient tool for this is otelgen, which produces synthetic logs, traces, and metrics.
You can define and run the Collector and the otelgen tool using the following Docker Compose configuration:
# docker-compose.yml
services:
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.129.1
    container_name: otelcol
    volumes:
      - ./otelcol.yaml:/etc/otelcol-contrib/config.yaml
    restart: unless-stopped

  otelgen:
    image: ghcr.io/krzko/otelgen:v0.5.2
    container_name: otelgen
    command:
      [
        "--otel-exporter-otlp-endpoint",
        "otelcol:4317",
        "--insecure",
        "logs",
        "multi",
      ]
    depends_on:
      - otelcol

networks:
  otelnet:
    driver: bridge
The otelgen service is configured via its command arguments to send telemetry that matches our Collector's setup:
- --otel-exporter-otlp-endpoint otelcol:4317: Tells otelgen to send data to the otelcol service on port 4317.
- --insecure: Disables TLS.
- logs: Instructs otelgen to generate log data specifically.
- multi: A subcommand that generates a continuous, varied stream of logs.
To see it in action, start both services in detached mode:
docker compose up -d
Once running, the otelcol service listens for OTLP data over gRPC on port 4317, and the otelgen service generates and sends a continuous stream of logs to it.
You can monitor the Collector's output to verify that it's receiving the logs:
docker compose logs otelcol -f --no-log-prefix
You will see a continuous stream of log data being printed to the console. A single log entry will be formatted like this, showing rich contextual information like the severity, body, and various attributes:
ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.26.0
Resource attributes:
-> host.name: Str(node-1)
-> k8s.container.name: Str(otelgen)
-> k8s.namespace.name: Str(default)
-> k8s.pod.name: Str(otelgen-pod-ab06ca8b)
-> service.name: Str(otelgen)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope otelgen
LogRecord #0
ObservedTimestamp: 2025-07-06 11:21:57.085421018 +0000 UTC
Timestamp: 2025-07-06 11:21:57.085420886 +0000 UTC
SeverityText: Error
SeverityNumber: Error(17)
Body: Str(Log 3: Error phase: finish)
Attributes:
-> worker_id: Str(3)
-> service.name: Str(otelgen)
-> trace_id: Str(46287c1c7b7eebea22af2b48b97f4a49)
-> span_id: Str(f5777521efe11f94)
-> trace_flags: Str(01)
-> phase: Str(finish)
-> http.method: Str(PUT)
-> http.status_code: Int(403)
-> http.target: Str(/api/v1/resource/3)
-> k8s.pod.name: Str(otelgen-pod-8f215fc5)
-> k8s.namespace.name: Str(default)
-> k8s.container.name: Str(otelgen)
Trace ID:
Span ID:
Flags: 0
Understanding the debug exporter output
The output from the debug exporter shows the structured format of OpenTelemetry data (OTLP). It's hierarchical, starting from the resource that generated the telemetry all the way down to the individual telemetry record. Let's break down what you're seeing.
ResourceLogs and Resource attributes
- ResourceLog #0: This is the top-level container. The #0 indicates it's the first resource in this batch, which means all telemetry within this block comes from the same resource.
- Resource attributes: These are key-value pairs that describe the entity that produced the log. This could be a service, a container, or a host machine. In the example, attributes like service.name and k8s.pod.name apply to every log generated by this resource.
ScopeLogs
- ScopeLogs #0: Within a resource, telemetry is grouped by its origin, known as the instrumentation scope. This block contains a batch of logs from the same scope.
- InstrumentationScope: This identifies the specific library or module that generated the log (in this case, otelgen). This is useful for knowing which part of your application emitted the log.
LogRecord
Within a single ResourceLog block, you may see multiple LogRecord entries (#0, #1, #2, and so on), all belonging to the same resource.
- LogRecord #0: This is the first log entry belonging to the resource. The key fields are:
  - Timestamp: When the event occurred.
  - SeverityNumber/SeverityText: The log level, such as ERROR or INFO.
  - Body: The actual log message content.
  - Attributes: Key-value pairs that provide context specific to this single log event.
  - Trace ID/Span ID: When populated, they directly link a log to a specific trace and span, allowing you to easily correlate logs and traces in your observability backend.
Congratulations, you've built and verified your first telemetry pipeline! It's simple, but it establishes the fundamental flow of data from a source, through the Collector, and to an exit point. Now, let's make it more powerful.
Processing and transforming telemetry
Right now, your pipeline is just an empty conduit. Data goes in one end and comes out the other untouched. The real power of the Collector lies in its ability to process data in-flight. This is where processors come in.
Processors are intermediary components in a pipeline that can inspect, modify, filter, or enrich your telemetry. Let's add a few essential processors to solve common problems and make the pipeline more intelligent.
Our new pipeline flow will look like this: Receiver -> [Processors] -> Exporter.
Batching telemetry for efficiency
Sending every single span or metric individually over the network is incredibly inefficient. It creates high network traffic and puts unnecessary load on the backend. The batch processor solves this by grouping telemetry into batches before exporting.
Go ahead and add it to your processors section. By default, it buffers data for a short period to create batches automatically:
# otelcol.yaml
# Add this top-level 'processors' section
processors:
  batch:
    # You can customize the default values for more control
    # send_batch_size: 8192
    # timeout: 200ms

service:
  pipelines:
    logs:
      receivers: [otlp]
      # Add the processor to your pipeline's execution path.
      # Order matters here if you have multiple processors.
      processors: [batch]
      exporters: [debug]
With this simple addition, your pipeline now buffers data for up to 200 milliseconds or until it has 8192 items (whichever comes first) before it forwards data to the configured exporters.
Reducing noise by filtering telemetry
Telemetry data can be noisy. For example, frequent DEBUG-level logs are useful in development but are often superfluous in production unless you're debugging an active issue. Let's add a bouncer to our pipeline to drop this noise at the source.
We'll use the filter processor, which lets you drop telemetry data using the powerful OpenTelemetry Transformation Language (OTTL). Say you want to drop all logs below the INFO severity level; you can do so with the following modifications:
# otelcol.yaml
processors:
  batch:
  # The filter processor lets you exclude telemetry data based on its attributes
  filter:
    logs:
      log_record:
        - severity_number < SEVERITY_NUMBER_INFO

service:
  pipelines:
    logs:
      receivers: [otlp]
      # The order is important. You want to drop data before batching it.
      processors: [filter, batch]
      exporters: [debug]
Now, any log with a severity number less than 9 (INFO) will be dropped by the Collector and will never reach the debug exporter.
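The same processor can drop telemetry based on any OTTL condition, not just severity. As a hedged sketch, you could also discard logs for a hypothetical health-check endpoint by matching on the http.target attribute seen in the otelgen output (conditions in the list are OR'd, so a record matching either one is dropped):

# sketch: drop low-severity logs and health-check noise
filter:
  logs:
    log_record:
      - severity_number < SEVERITY_NUMBER_INFO
      - IsMatch(attributes["http.target"], ".*/health")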
Modifying and enriching telemetry data
When you'd like to add, remove, or modify attributes in your telemetry data, there are a few general-purpose processors you can use:
- resource processor: For actions targeting resource-level attributes (e.g., host.name, service.name).
- attributes processor: For manipulating attributes of individual logs, spans, or metric data points.
- transform processor: The most powerful of the three, for performing complex transformations on any part of your telemetry data.
Some common use cases for these processors include:
- Redacting or removing sensitive information from telemetry before it leaves your systems.
- Enriching data by adding static attributes.
- Renaming or standardizing attributes to conform to semantic conventions across different services.
- Correcting malformed or misplaced data sent by older or misconfigured instrumentation.
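As a brief, hedged sketch of the first two processors covering the first two use cases (the attribute names here are hypothetical examples, not from the otelgen data), static enrichment and redaction could look like this:

# sketch: enrich resources and redact a sensitive log attribute
processors:
  resource:
    attributes:
      - key: deployment.environment.name
        value: production
        action: upsert
  attributes:
    actions:
      - key: user.email        # hypothetical sensitive attribute
        action: delete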
Let's examine the structure of the OTLP log records being sent by the otelgen tool once again:
ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.26.0
Resource attributes:
-> host.name: Str(node-1)
-> k8s.container.name: Str(otelgen)
-> k8s.namespace.name: Str(default)
-> k8s.pod.name: Str(otelgen-pod-b9919c90)
-> service.name: Str(otelgen)
LogRecord #0
ObservedTimestamp: 2025-07-03 16:01:40.264711241 +0000 UTC
Timestamp: 2025-07-03 16:01:40.264711041 +0000 UTC
SeverityText: Fatal
SeverityNumber: Fatal(21)
Body: Str(Log 1763: Fatal phase: finish)
Attributes:
-> worker_id: Str(1763)
-> service.name: Str(otelgen)
-> trace_id: Str(a85d432127e63d667508563efd73af52)
-> span_id: Str(34c07d59e6cfa2d9)
-> trace_flags: Str(01)
-> phase: Str(finish)
-> http.method: Str(POST)
-> http.status_code: Int(200)
-> http.target: Str(/api/v1/resource/1763)
-> k8s.pod.name: Str(otelgen-pod-b9919c90)
-> k8s.namespace.name: Str(default)
-> k8s.container.name: Str(otelgen)
Trace ID:
Span ID:
Flags: 0
There are (at least) three issues here that deviate from the correct OpenTelemetry data model and semantic conventions:
1. Misplaced trace context: The trace_id, span_id, and trace_flags values are incorrectly placed inside the Attributes map, while the dedicated top-level Trace ID, Span ID, and Flags fields are empty.
2. Redundant attributes: Resource attributes like k8s.pod.name and service.name are duplicated in the log record's Attributes.
3. Deprecated attributes: HTTP attributes like http.method, http.target, and http.status_code have all been deprecated in favor of newer attributes.
The transform processor is the perfect tool for fixing these issues. Add the following modifications to your otelcol.yaml file:
# otelcol.yaml
processors:
  transform:
    log_statements:
      # Move trace context from attributes to the correct top-level fields
      - context: log
        statements:
          - set(trace_id.string, attributes["trace_id"])
          - set(span_id.string, attributes["span_id"])
          - set(flags, Int(attributes["trace_flags"]))
      # Delete the original, now redundant, trace context attributes
      - context: log
        statements:
          - delete_key(attributes, "trace_id")
          - delete_key(attributes, "span_id")
          - delete_key(attributes, "trace_flags")
      # Delete the duplicated resource attributes from the log record's attributes
      - context: log
        statements:
          - delete_key(attributes, "k8s.pod.name")
          - delete_key(attributes, "k8s.namespace.name")
          - delete_key(attributes, "k8s.container.name")
          - delete_key(attributes, "service.name")

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [filter, transform, batch] # Add the transform processor to the pipeline
      exporters: [debug]
This configuration uses OTTL statements to clean up the log records:
- set(trace_id.string, ...): This function takes the value from the trace_id key within the attributes map and sets it as the top-level Trace ID for the log record. The same logic applies to the span_id and flags statements.
- delete_key(attributes, ...): After moving the values, this function removes the original keys from the attributes map to eliminate redundancy.
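If you also want to migrate the deprecated HTTP attributes to their current semantic-convention equivalents (as reflected in the output below), a minimal sketch of additional statements appended to the same log_statements list could look like this:

# sketch: rename deprecated HTTP attributes to current semantic conventions
- context: log
  statements:
    - set(attributes["http.request.method"], attributes["http.method"])
    - set(attributes["http.response.status_code"], attributes["http.status_code"])
    - set(attributes["url.path"], attributes["http.target"])
    - delete_key(attributes, "http.method")
    - delete_key(attributes, "http.status_code")
    - delete_key(attributes, "http.target")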
You can recreate the containers to see it in action:
docker compose up --force-recreate -d
When you check the logs, you'll notice that the outgoing log data is now correctly formatted, smaller in size, and compliant with semantic conventions, with the Trace ID and Span ID fields properly populated:
2025-07-03T16:36:49.418Z info ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.26.0
Resource attributes:
-> host.name: Str(node-1)
-> k8s.container.name: Str(otelgen)
-> k8s.namespace.name: Str(default)
-> k8s.pod.name: Str(otelgen-pod-3efafa6f)
-> service.name: Str(otelgen)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope otelgen
LogRecord #0
ObservedTimestamp: 2025-07-03 16:36:48.41161663 +0000 UTC
Timestamp: 2025-07-03 16:36:48.411616563 +0000 UTC
SeverityText: Error
SeverityNumber: Error(17)
Body: Str(Log 38: Error phase: finish)
Attributes:
-> worker_id: Str(38)
-> phase: Str(finish)
-> url.path: Str(/api/v1/resource/340)
-> http.response.status_code: Int(200)
-> http.request.method: Str(GET)
Trace ID: 86713e2736d6f6a398047b9317b11398
Span ID: d06e86785766aa64
Flags: 1
Ensuring resilience with the Memory Limiter
An overloaded service could suddenly send a massive flood of data, overwhelming the Collector and causing it to run out of memory and crash. This would create a total visibility outage.
The memory_limiter processor acts as a safety valve to prevent this. It monitors memory usage and starts rejecting data if it exceeds a configured limit, enforcing backpressure on the data source.
# otelcol.yaml
processors:
  batch:
  filter: # ...
  transform: # ...
  memory_limiter:
    # How often to check the collector's memory usage.
    check_interval: 1s
    # The hard memory limit in Mebibytes (MiB). If usage exceeds this,
    # the collector forces garbage collection and rejects new data.
    limit_mib: 400
    # The maximum spike expected between checks. The soft limit is
    # limit_mib - spike_limit_mib (300 MiB here); once usage crosses it,
    # the collector starts refusing data until usage drops back below it.
    spike_limit_mib: 100

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, filter, transform, batch]
      exporters: [debug]
Note that the memory_limiter should come first in your pipeline's processor list. If it's over the limit, you want to reject data immediately, before wasting CPU cycles on other processing.
Handling multiple signals with parallel pipelines
So far, you've built a simple pipeline for processing logs. However, a key strength of the OpenTelemetry Collector is its ability to manage all signals simultaneously within a single instance. You can achieve this by defining parallel pipelines, one for each signal type, in the service section.
Let's expand the configuration to also process traces. The goal is to receive traces from an application, batch them for efficiency, and then send them to a Jaeger instance for visualization, while the existing logs pipeline continues to operate independently (writing to the console as before).
To send traces to Jaeger, you can use the OTLP exporter. You can provide an identifying name for any component using the type/name syntax as follows:
# otelcol.yaml
exporters:
  otlp/jaeger:
    endpoint: jaeger:4317 # The address of the Jaeger gRPC endpoint
    tls:
      insecure: true # Use TLS in production
You can now add a new pipeline to the service section specifically for traces. This pipeline will:
- Reuse the same otlp receiver we already defined.
- Reuse the batch and memory_limiter processors.
- Send its data to the new otlp/jaeger exporter.
Here is the complete service section showing both the logs and traces pipelines running in parallel:
# otelcol.yaml
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, filter, transform, batch]
      exporters: [debug]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger]
To test this, you need to update your docker-compose.yml to run a Jaeger instance and a second otelgen service configured to generate traces:
# docker-compose.yml
services:
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.129.1
    container_name: otelcol
    volumes:
      - ./otelcol.yaml:/etc/otelcol-contrib/config.yaml
    restart: unless-stopped

  jaeger:
    image: jaegertracing/all-in-one:1.71.0
    container_name: jaeger
    ports:
      - 16686:16686

  otelgen-logs:
    image: ghcr.io/krzko/otelgen:v0.5.2
    container_name: otelgen-logs
    command:
      [
        "--otel-exporter-otlp-endpoint",
        "otelcol:4317",
        "--insecure",
        "logs",
        "multi",
      ]
    depends_on:
      - otelcol

  otelgen-traces:
    image: ghcr.io/krzko/otelgen:v0.5.2
    container_name: otelgen-traces
    command:
      [
        "--otel-exporter-otlp-endpoint",
        "otelcol:4317",
        "--insecure",
        "--duration",
        "86400",
        "traces",
        "multi",
      ]
    depends_on:
      - otelcol

networks:
  otelnet:
    driver: bridge
Notice we now have two otelgen services: otelgen-logs sends logs as before, and otelgen-traces sends traces to the same OTLP endpoint on our Collector.
Recreate the containers with the updated configuration:
docker compose up --force-recreate --remove-orphans -d
While the otelcol logs will continue showing the processed logs, the easiest way to verify the traces pipeline is to check the Jaeger UI.
Open your web browser and navigate to http://localhost:16686. In the Jaeger UI, select otelgen from the Service dropdown menu and click Find Traces.
You will see a list of traces generated by the otelgen-traces service, confirming that your new traces pipeline is successfully receiving, processing, and exporting trace data to Jaeger.
With this setup, you have a single Collector instance efficiently managing two completely separate data flows, demonstrating the power and flexibility of defining multiple pipelines.
Fanning out to multiple destinations
A key advantage of the OpenTelemetry Collector is its ability to easily route telemetry to multiple destinations at once, a concept often called "fanning out". This is done by simply adding more exporters to a pipeline's exporters list.
Let's demonstrate this by forwarding both our logs and traces to Dash0, an OpenTelemetry-native platform, in addition to the existing destinations.
You'll need to sign up for a free trial first, find the OpenTelemetry Collector integration, and copy your authentication token and Dash0 endpoint (OTLP via gRPC) into your configuration file:
# otelcol.yaml
exporters:
  # [...]
  otlp/dash0:
    endpoint: <your_dash0_endpoint>
    headers:
      Authorization: Bearer <your_dash0_token>

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, filter, transform, batch]
      exporters: [debug, otlp/dash0]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger, otlp/dash0]
With this change, both pipelines now fan out the processed data to all the specified exporters, and the logs and traces will appear in your Dash0 dashboard.
This capability allows you to experiment with multiple backends, migrate between vendors without downtime, or satisfy other uses for your telemetry without touching your application code.
Chaining pipelines with connectors
You can create powerful data processing flows by generating new telemetry signals from your existing data.
This is possible with connectors. A connector is a special component that acts as both an exporter for one pipeline and a receiver for another, allowing you to chain pipelines together.
Let's demonstrate this by building a system that generates an error count metric from the otelgen log data. The count connector is perfect for this.
First, you'll need to define the count connector and configure it to create a metric named log_error.count that increments every time it sees a log with a severity of ERROR or higher:
# otelcol.yaml
connectors:
  count/log_errors:
    logs:
      log_error.count:
        description: count of errors logged
        conditions:
          - severity_number >= SEVERITY_NUMBER_ERROR
To use this, go ahead and update your service configuration to create a new metrics pipeline. The count/log_errors connector will serve as the bridge: it will be an exporter for the logs pipeline and a receiver for the new metrics pipeline:
# otelcol.yaml
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, filter, transform, batch]
      # The connector is added as a destination for logs.
      exporters: [debug, otlp/dash0, count/log_errors]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger, otlp/dash0]
    metrics:
      # This new pipeline receives data exclusively from the connector.
      receivers: [count/log_errors]
      processors: [memory_limiter, batch]
      exporters: [otlp/dash0]
This configuration is a game-changer because it allows you to derive new insights from existing data streams directly within the Collector. The data flow is now:
- The logs pipeline processes logs and sends a copy to the count/log_errors connector.
- The count/log_errors connector inspects these logs, generates a new log_error.count metric based on our condition, and passes this metric along.
- The metrics pipeline receives the newly generated metric, batches it, and sends it to your backend.
After relaunching the services, you'll see the new log_error.count metric appear in your dashboard, all without adding a single line of metrics instrumentation code to your application.
This is a basic example, but it demonstrates the power of a true pipeline architecture. The same principle can be used for more advanced scenarios, like using the spanmetrics connector to automatically generate full RED metrics (request rates, error counts, and duration histograms) directly from your trace data.
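As a hedged sketch of that idea (reusing the exporters from this article; the histogram bucket boundaries are arbitrary assumptions), the spanmetrics connector could bridge the traces and metrics pipelines like this:

# sketch: generate RED metrics from spans with the spanmetrics connector
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100ms, 250ms, 500ms, 1s, 2s]

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger, spanmetrics]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      processors: [memory_limiter, batch]
      exporters: [otlp/dash0]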
Understanding Collector distributions
When you use the OpenTelemetry Collector, you're not running a single, monolithic application. Instead, you use a distribution: a specific binary packaged with a curated set of components (receivers, processors, exporters, and extensions).
This model exists to allow you to use a version of the Collector that is tailored to your specific needs or even create your own. There are three primary types of distributions you will encounter:
1. Official OpenTelemetry distributions
The OpenTelemetry project maintains several official distributions. The two most common are:
- Core (otelcol): This is a minimal, lightweight distribution that includes only the most essential and stable components. It provides a stable foundation but has limited functionality.
- Contrib (otelcol-contrib): This is the most comprehensive version, which includes almost every component from both the core and contrib repositories. It is the recommended distribution for getting started, as it provides the widest range of capabilities for connecting to various sources and destinations without needing to build a custom version.
2. Vendor distributions
Some observability vendors provide their own Collector distributions. These are typically based on the otelcol-contrib distribution but are pre-configured with the vendor's specific exporter and other recommended settings. Using a vendor distribution can simplify the process of sending data to that vendor's platform.
3. Custom distributions
For production environments, the recommended practice is to build your own custom distribution. This involves creating a lightweight, fit-for-purpose Collector binary that contains only the components you need.
You can create a custom distribution using the OpenTelemetry Collector Builder (ocb) tool. It involves creating a simple manifest file that lists the components you want to include, and then running the ocb tool to compile your custom binary.
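To make that concrete, here's a minimal sketch of an ocb manifest covering the components used in this article (the module versions are assumptions; match them to your Collector release):

# builder-config.yaml (sketch)
dist:
  name: otelcol-custom
  description: Custom Collector with only the components this article uses
  output_path: ./otelcol-custom

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.129.0
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.129.0
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.129.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor v0.129.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor v0.129.0
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/debugexporter v0.129.0
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.129.0
connectors:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/connector/countconnector v0.129.0

Running ocb --config builder-config.yaml then compiles the custom binary.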
You can learn more about building a custom Collector distribution by reading this guide.
Debugging and observing your pipeline
A critical piece of infrastructure like your telemetry pipeline must itself be observable and easy to debug. If the Collector is dropping data, experiencing high latency, or is unhealthy, you definitely need to know about it.
Fortunately, the Collector is instrumented out-of-the-box and provides several tools for validation and observation.
Validating your configuration
Before deploying the Collector, you should always validate that your config.yaml file is syntactically correct. The primary way to do this is with the validate subcommand which checks the configuration file for errors without starting the full Collector service:
otelcol-contrib validate --config=otelcol.yaml
If the configuration is valid, the command exits silently. If there are errors, it prints them to the console so you can fix them before deploying.
You can also use OtelBin to visualize your pipeline, and validate it against various Collector distributions before you deploy.
If you're writing complex OTTL statements for the transform or filter processors, you'll also find the OTTL Playground to be a useful resource for understanding how different configurations affect the OTLP data transformation.
Live debugging
When building a pipeline, you'll often need to inspect the data flowing through it in real time. As you've already seen, the debug exporter is the primary way to do this.
By adding it to any pipeline's exporters list, you can print the full content of traces, metrics, or logs to the console, and verify that your receivers and processors are working as expected.
For debugging the Collector components themselves, you can enable the zPages extension:
# otelcol.yaml
extensions:
  zpages: # default endpoint is localhost:55679

service:
  extensions: [zpages]
  pipelines:
    # ... your pipelines
Once the Collector is running, you can access several useful debugging pages in your browser, such as /debug/pipelinez to view your pipeline components or /debug/tracez to see recently sampled traces.
Observing the Collector's internal telemetry
In production environments, you'll need to monitor the Collector's health and performance over time. This is configured under the service.telemetry section.
By default, the Collector sends its own internal logs to stderr, and it's often the first place you'll check when there's a problem with your pipeline. For metrics, the Collector can expose its own data in a Prometheus-compatible format:
# otelcol.yaml
service:
  telemetry:
    metrics:
      readers:
        - pull:
            exporter:
              prometheus:
                host: "0.0.0.0"
                port: 8888
You can now scrape this endpoint with a Prometheus instance to monitor key health indicators like otelcol_exporter_send_failed_spans_total, otelcol_processor_batch_send_size, and otelcol_receiver_accepted_spans.
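A minimal Prometheus scrape job for this endpoint (assuming Prometheus runs on the same Docker network and can resolve the otelcol hostname) might look like:

# prometheus.yml (sketch)
scrape_configs:
  - job_name: otel-collector
    scrape_interval: 15s
    static_configs:
      - targets: ["otelcol:8888"]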
You can also push the metrics to an OTLP-compatible backend using the following configuration:
# otelcol.yaml
service:
  telemetry:
    metrics:
      readers:
        - periodic:
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: https://backend:4318
For more details, see the official documentation on Collector telemetry.
Going to production: Collector deployment patterns
How you run the Collector in production is a critical architectural decision. Your deployment strategy affects the scalability, security, and resilience of your entire observability setup. The two fundamental roles a Collector can play are that of an agent or a gateway, which can be combined into several common patterns.
1. Agent-only deployment
The simplest pattern is to deploy a Collector agent on every host or as a sidecar to every application pod. In this model, each agent is responsible for collecting, processing, and exporting telemetry directly to one or more backends.
Application → OpenTelemetry Collector (Agent) → Observability Backend
This approach is easy to start with but it offers limited durability, as agents typically buffer in memory, meaning a single node failure can lead to data loss.
2. Agent and gateway deployment
A more robust production pattern enhances the agent deployment with a new, centralized gateway layer. In this model, the agent's role is simplified: it handles local collection and metadata enrichment before forwarding all telemetry to the gateway.
This gateway is a standalone, centralized service consisting of one or more Collector instances that receive telemetry from all agents. It's the ideal place for heavy processing like PII scrubbing, filtering, and tail-based sampling, which ensures rules are applied consistently before data leaves your environment.
Application → Collector (Agent) → Collector (Gateway) → Observability Backend
This layered approach provides the best of both worlds: agents handle local collection and metadata enrichment efficiently, while the gateway provides centralized control, security, and processing.
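As a hedged sketch of the agent side (the gateway hostname is hypothetical, and the resourcedetection detectors are just one common enrichment choice), an agent that enriches and forwards everything over OTLP could look like this:

# agent otelcol.yaml (sketch)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 400
    spike_limit_mib: 100
  resourcedetection:
    detectors: [env, system]   # add host and environment metadata
  batch:

exporters:
  otlp/gateway:
    endpoint: otel-gateway:4317   # hypothetical gateway address
    tls:
      insecure: true              # use TLS in production

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resourcedetection, batch]
      exporters: [otlp/gateway]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, resourcedetection, batch]
      exporters: [otlp/gateway]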
High-scale deployment with a message queue
When you're dealing with massive data volumes or require extreme durability, the standard pattern is to introduce an event queue (like Apache Kafka) between your agents and a fleet of Collectors that act as consumers.
Application → Collector (Agent) → Message Queue → Collector (Aggregator) → Backend(s)
This pattern provides two advantages. The message queue acts as a massive buffer for durability; even if the aggregator fleet is down, agents can continue sending data to the queue, preventing data loss.
It also provides load-leveling by decoupling the agents from the aggregators, which smooths out traffic spikes and allows the aggregators to consume data at a steady rate.
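A rough sketch of the Kafka hop (the broker address and topic name are assumptions) uses the kafka exporter on the agent side and the kafka receiver on the aggregator side:

# agent side (sketch): publish spans to Kafka
exporters:
  kafka:
    brokers: ["kafka:9092"]
    topic: otlp_spans

# aggregator side (sketch): consume spans from Kafka
receivers:
  kafka:
    brokers: ["kafka:9092"]
    topic: otlp_spans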
You're now a pipeline architect
We've journeyed from a simple data pass-through to a powerful, multi-stage pipeline that enriches, filters, and routes telemetry data, even generating new, valuable signals along the way.
By adopting the pipeline mindset, you gain:
- Centralized control, allowing you to manage your entire telemetry flow from one place.
- Vendor neutrality so you can swap backends with a simple config change or use multiple vendors at once.
- Efficiency and cost savings by applying consistent filtering policies across your entire environment, which ultimately reduces your observability bill.
- Enhanced security by scrubbing sensitive data before it ever leaves your infrastructure.
- Powerful capabilities using advanced patterns like metric generation that would be complex or impossible otherwise.
To complete the picture, you need to send that data to an OpenTelemetry-native observability platform like Dash0 that makes it easy to quickly move from raw data to insight.
For more on the Collector itself, refer to the official documentation. Thanks for reading!