I set out to configure OTel in 9109679196/piper-tts-rest-api, and then decided to share my findings about OpenTelemetry Collector configuration. JFYI, the official Collector configuration documentation is at https://opentelemetry.io/docs/collector/configuration/.
extensions:
  health_check:
    endpoint: 0.0.0.0:13133

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 5s
    send_batch_size: 512
    send_batch_max_size: 2048
  # Tail sampling: always keep error traces, probabilistically sample the rest.
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    expected_new_traces_per_sec: 50
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: probabilistic-sample
        type: probabilistic
        probabilistic:
          sampling_percentage: 100

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  debug:
    verbosity: basic

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlp/jaeger]
Big picture
An OpenTelemetry Collector configuration is usually built from these main sections:
- receivers: how telemetry gets into the Collector.
- processors: how telemetry is modified, grouped, filtered, sampled, or enriched.
- exporters: where telemetry is sent after processing.
- extensions: extra Collector functionality that is not directly part of the telemetry pipeline.
- service: where you enable extensions and wire receivers, processors, and exporters together into pipelines.
Official docs:
- Collector configuration: https://opentelemetry.io/docs/collector/configuration/
- Collector overview: https://opentelemetry.io/docs/collector/
extensions
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
What it does
extensions define extra behavior for the Collector itself. Extensions are not directly involved in receiving, processing, or exporting traces, metrics, or logs.
In this config, you enable the health_check extension.
health_check
health_check:
  endpoint: 0.0.0.0:13133
This exposes an HTTP health endpoint for the Collector.
In Docker Compose, another container can call:
http://otel-collector:13133/
This is useful for checking whether the Collector process is alive and ready enough to respond.
endpoint: 0.0.0.0:13133
This means the health check server listens on all network interfaces inside the container on port 13133.
If you bound it to localhost:13133 instead, the server would only accept connections from inside the Collector's own container, so another container could not reach it. For Docker Compose, 0.0.0.0:13133 is usually the practical choice.
Official docs:
- Health check extension: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/healthcheckextension
receivers
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
      grpc:
        endpoint: 0.0.0.0:4317
What it does
A receiver is how telemetry enters the Collector.
Your config enables the otlp receiver. OTLP means OpenTelemetry Protocol. It is the standard protocol used by OpenTelemetry SDKs and instrumentation libraries to send traces, metrics, and logs.
Official docs:
- Receivers in Collector configuration: https://opentelemetry.io/docs/collector/configuration/#receivers
- OTLP receiver: https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver
otlp
otlp:
This configures the Collector to accept telemetry in OTLP format.
For your NestJS app, this is the receiver your app will send traces to.
protocols
protocols:
  http:
    endpoint: 0.0.0.0:4318
  grpc:
    endpoint: 0.0.0.0:4317
The OTLP receiver can listen using HTTP and/or gRPC.
You enabled both:
- OTLP/HTTP on port 4318
- OTLP/gRPC on port 4317
http.endpoint: 0.0.0.0:4318
This makes the Collector listen for OTLP over HTTP.
Your app would usually send HTTP OTLP traces to something like:
http://otel-collector:4318/v1/traces
grpc.endpoint: 0.0.0.0:4317
This makes the Collector listen for OTLP over gRPC.
Your app would usually send gRPC OTLP traces to:
otel-collector:4317
Use this if your SDK/exporter is configured for OTLP/gRPC.
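To make the two endpoints concrete, here is a minimal sketch of how the app container could be pointed at the Collector using the standard OpenTelemetry SDK environment variables. The Compose service name app and the OTEL_SERVICE_NAME value are assumptions for illustration, and the variables only take effect if the app's OpenTelemetry SDK setup actually reads them.

```yaml
# docker-compose.yml fragment (the service name "app" is hypothetical)
services:
  app:
    environment:
      OTEL_SERVICE_NAME: piper-tts-rest-api   # assumed value, pick your own
      # OTLP/HTTP: the SDK appends /v1/traces to this base URL for traces.
      OTEL_EXPORTER_OTLP_PROTOCOL: http/protobuf
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318
      # For OTLP/gRPC you would use these two values instead:
      # OTEL_EXPORTER_OTLP_PROTOCOL: grpc
      # OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
```

Only one protocol is needed per app. Keeping both receiver ports open on the Collector simply lets different services choose whichever their exporter supports.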
processors
processors:
  batch:
    timeout: 5s
    send_batch_size: 512
    send_batch_max_size: 2048
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    expected_new_traces_per_sec: 50
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: probabilistic-sample
        type: probabilistic
        probabilistic:
          sampling_percentage: 100
What processors do
Processors sit between receivers and exporters.
They can batch, filter, sample, enrich, transform, or otherwise modify telemetry before it is exported.
In your trace pipeline, processors are applied in this order:
processors: [tail_sampling, batch]
That means:
- First, the Collector decides which traces to keep using tail_sampling.
- Then, the kept traces are batched using batch before sending them to Jaeger.
This order makes sense. Sampling drops data first, and batching groups the remaining data before export.
Official docs:
- Processors in Collector configuration: https://opentelemetry.io/docs/collector/configuration/#processors
batch processor
batch:
  timeout: 5s
  send_batch_size: 512
  send_batch_max_size: 2048
What it does
The batch processor groups spans together before exporting them.
This is useful because sending many small requests to your backend is inefficient. Batching can reduce network overhead and improve export performance.
Official docs:
- Batch processor: https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor
timeout: 5s
The Collector sends a batch after 5s, even if the batch has not reached send_batch_size.
So this puts a maximum wait time on batching.
send_batch_size: 512
This is the batch size trigger.
When the Collector has collected around 512 spans, metric data points, or log records, it will send a batch.
In your trace pipeline, this applies to spans.
send_batch_max_size: 2048
This is the maximum batch size allowed.
If a batch becomes larger than 2048, the processor will split it into smaller batches.
A key detail: send_batch_max_size must be greater than or equal to send_batch_size. Its default value, 0, means no upper limit.
Your config is valid because:
2048 >= 512
tail_sampling processor
tail_sampling:
  decision_wait: 10s
  num_traces: 50000
  expected_new_traces_per_sec: 50
  policies:
    - name: keep-errors
      type: status_code
      status_code:
        status_codes: [ERROR]
    - name: probabilistic-sample
      type: probabilistic
      probabilistic:
        sampling_percentage: 100
What it does
Tail sampling means the Collector waits until it has seen enough spans from a trace before deciding whether to keep or drop that trace.
This is different from head sampling, where the decision is made at the beginning of the trace.
Tail sampling is useful because you can make decisions based on the final trace result. For example:
- keep traces with errors;
- keep slow traces;
- keep traces matching specific services or attributes;
- sample only a percentage of normal successful traces.
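As a rough sketch, the decisions listed above could be expressed as tail sampling policies like the ones below. The policy names, the 2000 ms threshold, and the http.route attribute with its value are hypothetical examples, not part of the config in this post. With these policy types, a trace is kept as soon as any one policy decides to sample it.

```yaml
processors:
  tail_sampling:
    policies:
      # Keep traces that contain an ERROR span.
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Keep slow traces; the threshold is an arbitrary example value.
      - name: keep-slow
        type: latency
        latency:
          threshold_ms: 2000
      # Keep traces matching an attribute; key and value are hypothetical.
      - name: keep-synthesize-route
        type: string_attribute
        string_attribute:
          key: http.route
          values: ["/synthesize"]
      # Keep roughly 10% of traces regardless of outcome.
      - name: sample-the-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```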
Official docs:
- Tail sampling processor: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor
- Tail sampling examples: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/tailsamplingprocessor/testdata/tail_sampling_config.yaml
Important production note
Tail sampling needs all spans for the same trace to arrive at the same Collector instance. If you run multiple Collector replicas later, you need to be careful with load balancing so spans from the same trace are routed to the same Collector.
For local Docker Compose with one Collector instance, this is fine.
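If you do scale out later, one common approach is a two-tier setup: a first tier of Collectors that only routes spans by trace ID, and a second tier that runs tail_sampling. The sketch below shows the first tier using the loadbalancing exporter from the contrib distribution. The DNS name otel-sampling-collectors is an assumption, and you should check the exporter's README for the options supported by your Collector version.

```yaml
# First-tier Collector: no sampling here, only trace-aware routing.
exporters:
  loadbalancing:
    routing_key: traceID          # all spans of one trace go to the same backend
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: otel-sampling-collectors   # hypothetical DNS name of the second tier

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```

The second tier would then run the tail_sampling and batch processors exactly as in this post.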
decision_wait: 10s
The Collector waits 10s from the first span of a trace before making a sampling decision.
This gives other spans in the same distributed trace time to arrive.
If your traces are long or your services are slow, you may need a higher value. If you want faster export, you may reduce it.
Trade-off:
- Higher value: better chance of complete traces, but more memory usage and more delay.
- Lower value: less memory and faster export, but a higher chance that the decision is made before all spans of a trace have arrived.
num_traces: 50000
This is the number of traces the processor keeps in memory while waiting to make sampling decisions.
Higher traffic usually needs a higher value.
If this is too low, the Collector may be forced to drop or evict traces before making the right decision.
expected_new_traces_per_sec: 50
This tells the processor roughly how many new traces per second to expect.
It helps the processor allocate internal data structures more efficiently.
For local development, 50 is usually fine unless you generate a lot of load.
policies
Policies define which traces should be sampled/kept.
You configured two policies:
policies:
  - name: keep-errors
    type: status_code
    status_code:
      status_codes: [ERROR]
  - name: probabilistic-sample
    type: probabilistic
    probabilistic:
      sampling_percentage: 100
Tail sampling policy: keep-errors
- name: keep-errors
  type: status_code
  status_code:
    status_codes: [ERROR]
What it does
This policy keeps traces that contain at least one span whose status code is ERROR.
This is useful because error traces are usually the traces you most want to inspect in Jaeger.
name: keep-errors
A human-readable name for this policy.
type: status_code
This means the policy makes its decision based on span status code.
status_codes: [ERROR]
This means traces with error spans should be sampled/kept.
Tail sampling policy: probabilistic-sample
- name: probabilistic-sample
  type: probabilistic
  probabilistic:
    sampling_percentage: 100
What it does
This policy samples a percentage of traces.
In your current config:
sampling_percentage: 100
That means keep 100% of traces.
So right now, your Collector keeps:
- all error traces because of keep-errors;
- all other traces because probabilistic-sample is 100%.
For local development, this is fine because you probably want to see everything.
For production or heavier load testing, you might reduce this, for example:
sampling_percentage: 10
That would keep all error traces, plus approximately 10% of other traces.
exporters
exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  debug:
    verbosity: basic
What exporters do
Exporters send telemetry out of the Collector to another system.
In your config, there are two exporters defined:
- otlp/jaeger
- debug
But only otlp/jaeger is currently used in the trace pipeline.
Official docs:
- Exporters in Collector configuration: https://opentelemetry.io/docs/collector/configuration/#exporters
- OTLP exporter: https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlpexporter
- Debug exporter: https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/debugexporter
otlp/jaeger exporter
otlp/jaeger:
  endpoint: jaeger:4317
  tls:
    insecure: true
What it does
This exporter sends telemetry from the Collector to Jaeger using OTLP/gRPC.
The name has two parts:
otlp/jaeger
- otlp is the exporter type.
- jaeger is a custom name you gave to this exporter instance.
This is useful when you want multiple exporters of the same type with different destinations.
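For illustration, here is a sketch with two exporters of the same otlp type. The otlp/second-backend name and its endpoint are made up for this example; only otlp/jaeger comes from the config in this post.

```yaml
exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  otlp/second-backend:             # hypothetical second destination
    endpoint: second-backend:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      # Every exporter listed here receives the same processed spans.
      exporters: [otlp/jaeger, otlp/second-backend]
```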
endpoint: jaeger:4317
This tells the Collector to send data to the service named jaeger on port 4317.
In Docker Compose, jaeger is resolved by Docker's internal DNS to your Jaeger container.
Port 4317 is the usual OTLP/gRPC port.
tls.insecure: true
This disables TLS for this exporter connection.
For local Docker Compose, this is normal because the Collector and Jaeger communicate inside a local Docker network.
For production, you should think carefully before using insecure transport.
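As a hedged sketch of what a TLS-enabled exporter might look like, the fragment below drops insecure: true and points the exporter at a CA certificate instead. The hostname and file paths are hypothetical, and the backend must be configured to serve TLS on that port.

```yaml
exporters:
  otlp/jaeger:
    endpoint: jaeger.example.internal:4317    # hypothetical host
    tls:
      ca_file: /etc/otel/certs/ca.pem         # CA used to verify the server certificate
      # For mutual TLS, the client certificate and key would go here as well:
      # cert_file: /etc/otel/certs/client.pem
      # key_file: /etc/otel/certs/client-key.pem
```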
debug exporter
debug:
  verbosity: basic
What it does
The debug exporter prints telemetry to the Collector logs.
This is useful when troubleshooting because you can verify whether the Collector is receiving and processing telemetry.
However, your current pipeline does not use it:
exporters: [otlp/jaeger]
So defining this exporter does nothing unless you add it to a pipeline.
For example:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlp/jaeger, debug]
Then traces would be sent to Jaeger and also printed in the Collector logs.
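If the basic output is too terse while troubleshooting, the debug exporter also accepts more verbose levels. A small sketch, assuming you want full span details in the logs for a short time:

```yaml
exporters:
  debug:
    # Accepted values: basic, normal, detailed.
    # detailed prints every span with its attributes, so it is very noisy.
    verbosity: detailed
```

This is best kept for short local debugging sessions, since the log volume grows with every span exported.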
service
service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlp/jaeger]
What it does
The service section is where you enable configured components.
Defining a receiver, processor, exporter, or extension is not enough on its own. A component is only enabled when it is referenced in the service section.
Official docs:
- Service section and pipelines: https://opentelemetry.io/docs/collector/configuration/#service
service.extensions
extensions: [health_check]
This enables the health_check extension that you configured earlier.
Without this line, the health_check config would exist, but it would not actually be started.
service.pipelines
pipelines:
  traces:
    receivers: [otlp]
    processors: [tail_sampling, batch]
    exporters: [otlp/jaeger]
A pipeline defines the path telemetry follows through the Collector.
Your config defines only one pipeline: traces.
That means this Collector configuration is currently only handling traces. It is not handling metrics or logs yet.
Trace pipeline flow
Your trace data flows like this:
NestJS app
-> OTLP receiver
-> tail_sampling processor
-> batch processor
-> OTLP exporter
-> Jaeger
Or in config terms:
otlp -> tail_sampling -> batch -> otlp/jaeger
receivers: [otlp]
The trace pipeline receives spans from the OTLP receiver.
processors: [tail_sampling, batch]
The trace pipeline first applies tail sampling, then batching.
exporters: [otlp/jaeger]
The trace pipeline sends the final spans to Jaeger.
Practical notes for your current setup
1. Your current sampling keeps everything
Because you configured:
sampling_percentage: 100
You are keeping all traces.
That is OK for local development.
If you want to keep all error traces but only some successful traces, use something like:
sampling_percentage: 10
2. debug exporter is configured but unused
You define:
debug:
  verbosity: basic
But your pipeline only uses:
exporters: [otlp/jaeger]
So the debug exporter is inactive.
Use this during troubleshooting:
exporters: [otlp/jaeger, debug]
3. You only configured traces
Your pipeline is:
pipelines:
  traces:
So this config handles traces only.
If later you want metrics or logs, you need separate pipelines, for example:
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
4. The health check port does not need to be published to your host
For Docker Compose internal checks, this is enough:
http://otel-collector:13133/
You only need this in ports:
- "13133:13133"
if you want to call the health endpoint from your host machine or browser.
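In a Compose file, that mapping could look roughly like this. The image name is an assumption (the contrib distribution, which includes the health_check extension and the tail_sampling processor); use whatever image your setup already runs.

```yaml
# docker-compose.yml fragment
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib   # assumed image
    ports:
      # Optional: only needed to reach the health endpoint from the host.
      - "13133:13133"
```

With the port published, http://localhost:13133/ on the host reaches the same health endpoint that other containers reach at http://otel-collector:13133/.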
Official documentation links
- OpenTelemetry Collector overview: https://opentelemetry.io/docs/collector/
- Collector configuration: https://opentelemetry.io/docs/collector/configuration/
- Health check extension: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/healthcheckextension
- OTLP receiver: https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver
- Batch processor: https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor
- Tail sampling processor: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor
- Tail sampling example config: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/tailsamplingprocessor/testdata/tail_sampling_config.yaml
- OTLP exporter: https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlpexporter
- Debug exporter: https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/debugexporter
