The system-level symptoms are familiar: trace ingestion spikes that trigger throttling, backend query latencies rising under index pressure, dashboards that show stable metrics but miss the critical error traces that explain the outage, and divergent sampling behavior across teams because sampling lives in different places (SDKs, sidecars, collectors). Every one of those symptoms points to a lack of centralized sampling policy and observability over sampling decisions.
Contents
- [Why Sampling Is Non-Negotiable for Production Tracing]
- [Compare Sampling Strategies: Probabilistic, Rate-Limiting, and Tail-Based]
- [How to Implement Sampling in the OpenTelemetry Collector (concrete configs)]
- [How Adaptive Sampling and Dynamic Rules Keep Costs Predictable]
- [Actionable Checklist: Implement a Global Adaptive Sampling Pipeline]
Why Sampling Is Non-Negotiable for Production Tracing
Sampling is not a cost-cutting nicety; it’s an architectural control. Traces impose three distinct costs: application-side overhead (CPU/memory and network), collector-side state and CPU to reassemble traces, and backend costs for ingest, indexing, and long-term retention. When you instrument broadly and run without a plan, you pay all three costs for most traffic that’s routine and uninteresting. OpenTelemetry SDKs provide deterministic head samplers such as TraceIdRatioBasedSampler to control generation at the source, and the collector provides processors to control ingest and retention across tiers.
Two operational truths steer good design:
- Sampling at the source (head sampling) reduces application overhead and network volume, but it makes later, context-aware decisions impossible because child spans can be dropped at creation.
- Collector-side sampling (tail sampling) can make richer decisions because it observes whole traces, but it requires stateful processors and memory sizing trade-offs.
When total trace traffic grows beyond a few hundred to a few thousand traces per second for a single cluster, you need a systematic sampling approach (many vendors recommend evaluating sampling when you exceed ~1,000 traces/sec).
Compare Sampling Strategies: Probabilistic, Rate-Limiting, and Tail-Based
Choosing the right sampler is about matching decision time to decision quality and cost.
| Strategy | Decision point | Pros | Cons | Typical OpenTelemetry implementation |
|---|---|---|---|---|
| Probabilistic (head-based) | At span creation or collector stateless hash | Very low overhead, deterministic, easy to reason about | May drop interesting traces; incomplete traces if front-end and back-end use different probabilities | SDK TraceIdRatioBasedSampler or Collector probabilistic_sampler. |
| Rate‑limiting | Head or remote control plane, token/leaky-bucket | Guarantees steady ingest rate, protects backend budget | Can bias results toward recent bursts; needs careful per-service tuning | Jaeger remote/rate-limiting or collector tail_sampling rate-limiting policy. |
| Tail‑based | After trace completes (collector) | Keeps rare events (errors, slow traces); policy-rich (attributes, latency) | Requires stateful collectors, memory sizing, decision latency | Collector tail_sampling processor (policies: status_code, latency, probabilistic, rate_limiting, composite). |
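To make the rate-limiting row concrete, here is a minimal token-bucket sketch in TypeScript. This is an illustrative model only; the class name and API are invented, and real implementations (Jaeger's rate limiter, the collector's `rate_limiting` policy) differ in detail:

```typescript
// Sketch of a token-bucket rate-limiting sampler (illustrative, not the
// Jaeger or collector implementation). The bucket holds up to
// maxTracesPerSec tokens; tokens refill continuously at that rate, and
// each sampled trace spends one token.
class RateLimitingSampler {
  private tokens: number;
  private lastRefill: number;

  constructor(private maxTracesPerSec: number, nowMs = Date.now()) {
    this.tokens = maxTracesPerSec; // allow an initial burst up to the cap
    this.lastRefill = nowMs;
  }

  shouldSample(nowMs = Date.now()): boolean {
    // Refill proportionally to elapsed time, capped at bucket capacity.
    const elapsedSec = (nowMs - this.lastRefill) / 1000;
    this.lastRefill = nowMs;
    this.tokens = Math.min(
      this.maxTracesPerSec,
      this.tokens + elapsedSec * this.maxTracesPerSec
    );
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A bucket of capacity `maxTracesPerSec` permits short bursts up to the cap while holding the long-run average at the configured rate, which is the "steady ingest rate" guarantee from the table; the burst allowance is also the source of the recency bias listed under cons.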
Key facts you must account for:
- Head samplers like `TraceIdRatioBasedSampler` implement deterministic sampling via trace-ID hashing, so different hosts can make consistent decisions.
- The collector `probabilistic_sampler` performs consistent hashing too and exposes `hash_seed` to coordinate sampling across collector tiers.
- `tail_sampling` supports rich policy types (error, latency, string/numeric attributes, byte/span rate limits, composite allocation) and needs `decision_wait` and memory sizing. Policy and implementation details live in the collector contrib docs.
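The deterministic hashing behavior described above can be sketched in a few lines. This is a simplified illustration, not the exact algorithm from the OpenTelemetry specification, but it shows the key property: the decision is a pure function of the trace ID and the ratio, so every host agrees.

```typescript
// Simplified sketch of deterministic trace-ID ratio sampling (illustrative,
// not the exact spec algorithm). Interpret the low 32 bits of the trace ID
// as a uniform value in [0, 2^32) and sample if it falls under the ratio.
function traceIdRatioSample(traceIdHex: string, ratio: number): boolean {
  const low = parseInt(traceIdHex.slice(-8), 16); // low 8 hex chars = 32 bits
  return low < ratio * 0x100000000;
}
```

Because the result depends only on the trace ID, services running different ratios keep nested subsets: every trace a 10%-sampling service keeps, a 25%-sampling service keeps too. That is why mismatched ratios produce incomplete traces rather than disjoint ones.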
How to Implement Sampling in the OpenTelemetry Collector (concrete configs)
Practical pipeline patterns converge on two core ideas: generate metrics before sampling, and centralize complex decisions in a stateful pool of collectors. The following YAML is a compact, production-oriented example you can adapt.
```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  memory_limiter:
    check_interval: 5s
    limit_mib: 1024
    spike_limit_mib: 256
  # Head-like collector probabilistic sampler (stateless, quick)
  probabilistic_sampler:
    sampling_percentage: 10.0
    hash_seed: 42
  # Tail sampler: decision_wait / num_traces sizing must match your workload
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    expected_new_traces_per_sec: 500
    policies:
      - name: retain-errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-requests
        type: latency
        latency: { threshold_ms: 1000 }
      - name: sampling-fallback
        type: probabilistic
        probabilistic: { sampling_percentage: 1.0 }
  batch:

exporters:
  otlp/tempo:
    endpoint: "tempo:4317"
  otlp/metrics-backend:
    endpoint: "metrics-gateway:4317"  # placeholder for your span-metrics tier

service:
  pipelines:
    traces/metrics:
      receivers: [otlp]
      processors: [memory_limiter]  # do not batch before tail sampling/groupbytrace
      exporters: [otlp/metrics-backend]
    traces/sampled:
      receivers: [otlp]
      # Stateless consistent gate first, then trace-aware tail policies;
      # note traces dropped by the gate never reach the tail policies.
      processors: [memory_limiter, probabilistic_sampler, tail_sampling, batch]
      exporters: [otlp/tempo]
```
Implementation notes:
- The `tail_sampling` processor's `decision_wait` controls how long the collector waits for the rest of a trace before making a decision; a common default is 30s, but the value should match your system's maximum trace duration and SLOs for trace availability.
- Compute `num_traces` conservatively as `expected_new_traces_per_sec * decision_wait * safety_factor` so the collector can hold the working set of traces in memory; many distributions provide guidance and metrics to detect eviction.
- Never put a `batch` processor upstream of components that need full trace context (for example `groupbytrace`, `tail_sampling`), because batching can split spans across pushes and break reassembly.
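The `num_traces` rule of thumb can be written as a small helper; the function name and the default safety factor of 1.5 are assumptions for illustration:

```typescript
// Rough sizing for a stateful tail sampler: hold the expected working set
// of in-flight traces (new traces/sec * decision window) plus headroom.
function sizeTailSampler(
  expectedNewTracesPerSec: number,
  decisionWaitSec: number,
  safetyFactor = 1.5
): number {
  return Math.ceil(expectedNewTracesPerSec * decisionWaitSec * safetyFactor);
}
```

For the example config above (500 new traces/sec, `decision_wait: 10s`) this yields 7,500, well under the configured 50,000; oversizing is safe if memory allows, while undersizing causes evictions and broken decisions.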
Small SDK example for head sampling (Node.js):
```javascript
// Node.js example: sample ~1% at the SDK
import { NodeSDK } from '@opentelemetry/sdk-node';
import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  sampler: new TraceIdRatioBasedSampler(0.01),
});
sdk.start();
```
That head sampler reduces network and backend load but intentionally sacrifices the option to reconstitute traces later for tail decisions.
Important: Generate span-derived metrics (span metrics / exemplars) before applying tail-based sampling so metric aggregates remain accurate; sampling at the wrong place will skew latency and error-rate metrics.
How Adaptive Sampling and Dynamic Rules Keep Costs Predictable
Adaptive sampling is the control-plane pattern that converts throughput and value signals into sampling probabilities that meet a target budget. The pattern has three parts:
- Observability of incoming traffic (per-service, per-operation TPS, error rate, latency distribution).
- A controller or engine that computes per-key probabilities against a budget/target (for example, `target_samples_per_second` for each service).
- A distribution mechanism that pushes sampling probabilities to the decision point (SDK remote sampler, collector policies, or a dedicated sampler like Jaeger's remote sampling engine).
Jaeger’s adaptive/remote sampling model recalculates per-service/per-operation probabilities so the collected trace volume matches `target_samples_per_second`; new services are sampled at an `initial_sampling_probability` until enough data exists to stabilize the estimate. That engine requires a `sampling_store` to hold observed traffic and computed probabilities.
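The core of such an engine is a small feedback rule. A minimal sketch, assuming a simple proportional controller (Jaeger's actual implementation smooths estimates over time rather than reacting instantly):

```typescript
// Per-service/operation adaptive sampling: steer sampled throughput toward
// a target budget. New or quiet keys fall back to an initial probability.
function adaptiveProbability(
  observedTracesPerSec: number,
  targetSamplesPerSec: number,
  initialProbability = 0.01
): number {
  if (observedTracesPerSec <= 0) return initialProbability; // no data yet
  return Math.min(1, targetSamplesPerSec / observedTracesPerSec);
}
```

A service emitting 500 traces/sec against a 50/sec budget gets probability 0.1; a service under budget is sampled at 1.0 (keep everything).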
Practical patterns you’ll use:
- Keep an always-sample policy for critical flows (auth, billing) and for error traces (`status_code == ERROR`) via `tail_sampling`. This preserves fidelity for high-business-value areas.
- Use a composite policy to allocate a fixed portion of the sampling budget to different classes (errors, slow paths, high-cardinality features) and let a probabilistic fallback fill remaining capacity; `tail_sampling` supports `composite` and `rate_allocation`.
- Implement a feedback loop where backend ingestion metrics (sampled traces/s, dropped traces/s, tail-sampler evictions, collector memory pressure) feed the adaptive engine. Many distributions export collector self-metrics to help tune `num_traces` and observe when decisions are evicted.
Adaptive sampling examples in the wild include Jaeger’s remote/adaptive engine and Honeycomb’s Refinery (a trace-aware tail-sampling proxy). Those systems show the trade-offs between centralized control and the operational complexity of stateful components.
Actionable Checklist: Implement a Global Adaptive Sampling Pipeline
- Inventory and baseline.
  - Measure current trace TPS per service and 95th/99th-percentile trace duration over a 7–14 day window.
  - Record backend cost per million traces and the current retention policy to set a budget.
- Decide sampling layers.
  - Use SDK head sampling (`TraceIdRatioBasedSampler`) for coarse volume control where application-side resource savings matter.
  - Use collector probabilistic sampling (`probabilistic_sampler`) as a stateless, consistent second tier for large but predictable traffic.
  - Use collector tail sampling for business-critical flows and to retain error/latency traces.
- Define an initial policy bank (expressed as `tail_sampling` policies).
  - `always_sample` for critical services.
  - `status_code` policy to keep errors.
  - `latency` policy for slow requests above a `threshold_ms`.
  - `probabilistic` fallback for low-priority traffic.
  - Consider `rate_limiting` or `bytes_limiting` policies to cap the steady-state budget.
- Size stateful components.
  - Set `decision_wait` slightly above your max observed trace duration (e.g., max duration + 25% headroom).
  - Compute `num_traces >= expected_new_traces_per_sec * decision_wait * 1.5`. Monitor eviction metrics such as `otelcol_processor_groupbytrace_traces_evicted` and increase sizing if they rise above 0.
- Instrument sampling telemetry (metrics and attributes).
  - Export and alert on:
    - Incoming traces/sec (ingest TPS)
    - Sampled traces/sec (per service)
    - Tail-sampler cached-decision hit/miss and eviction counters
    - Collector memory and CPU utilization
    - Backend ingest error/latency and cost metrics
  - Tag sampled spans with a `sampler.*` attribute recording the policy, or a `SampleRate`, so the backend can compensate for the weighting when calculating aggregates. Honeycomb-style `SampleRate` attributes allow correct aggregation of counts.
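To sketch why a `SampleRate` attribute matters for aggregates: if each stored event carries the rate at which it was sampled, the backend can recover unbiased counts by weighting. A minimal illustration (the event shape here is invented):

```typescript
// Each stored event stands in for `sampleRate` original events, so
// weighted sums estimate the true totals despite sampling.
interface SampledEvent {
  isError: boolean;
  sampleRate: number; // e.g., 10 means 1-in-10 sampling
}

function estimateTotals(events: SampledEvent[]): { total: number; errors: number } {
  let total = 0;
  let errors = 0;
  for (const e of events) {
    total += e.sampleRate; // weight by inverse sampling probability
    if (e.isError) errors += e.sampleRate;
  }
  return { total, errors };
}
```

Without the weight, a mix of one 1-in-10 sampled success and one always-kept error would be reported as 2 events with a 50% error rate instead of 11 events with a ~9% error rate, badly skewing dashboards.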
- Roll out and validate.
  - Roll sample-rate changes out to a canary group (non-critical namespaces) and compare detection rates for known incidents.
  - Validate that SLO-related signals (error-rate spikes, p99 latency) are still detectable at the new sampling level.
  - Use periodic full-capture windows (for example, a 1–4 hour snapshot at 100% for critical services) to recalibrate baselines and verify adaptive-engine behavior.
- Automate policy delivery.
  - Choose a control plane: remote-sampling endpoints for SDKs, a policy datastore read by your collectors, or an adaptive engine (e.g., Jaeger remote sampling). Automate policy rollout and auditing.
- Keep cost and fidelity visible.
  - Maintain a dashboard that correlates sample rate, ingested spans, traced incidents resolved, and dollar cost. Treat that dashboard as the system’s SLA for observability spend.

Practical metric example: for a service generating ~500 traces/sec with a 2 s typical trace duration and a target backend budget of 50 sampled traces/sec, set `decision_wait: 3s`, compute `num_traces >= 500 * 3 * 1.5 ≈ 2250`, and configure a `probabilistic` fallback that produces approximately the remaining budget after the `always_sample` and `status_code` policies take their share. Monitor backend ingress and iterate.
Closing
A global sampling strategy is not a one-time config; it is an operational feedback loop that balances value (errors, high-cardinality flows, SLO-implicated traces) against cost (ingest, storage, query latency). Adopt layered sampling — conservative head-based controls, stateless collector-level probabilistic gates, and stateful tail-based policies for high-value retention — instrument the decision telemetry, and iterate on concrete budgets so the system keeps the traces that solve incidents while keeping the bill predictable.
Sources
- Tail Sampling with OpenTelemetry: Why it’s useful, how to do it - OpenTelemetry blog post describing tail sampling concepts, `decision_wait` semantics, and a sample `tail_sampling` configuration.
- Tracing SDK Sampling (OpenTelemetry Tracing SDK spec and language docs) - Specification and language-specific docs for head samplers such as `TraceIdRatioBasedSampler`.
- Tail sampling processor (OpenTelemetry Collector Contrib) - Processor reference listing supported `tail_sampling` policy types (status_code, latency, probabilistic, rate_limiting, composite, etc.) and configuration fields.
- Getting Started with Advanced Sampling (AWS Distro for OpenTelemetry) - Practical guidance on `groupbytrace`/`tail_sampling` pipeline patterns and sizing guidance (`num_traces`, `decision_wait`) plus monitoring recommendations.
- Sampling (Jaeger documentation) - Explanation of remote sampling, adaptive sampling, and configuration patterns for per-service and per-operation policies.
- Tail sampling (Grafana / Alloy documentation) - Best practice: generate span-derived metrics before sampling to avoid metric skew; also shows pipeline patterns for metrics plus sampling.
- Sampled Data in Honeycomb - Explanation of `SampleRate` attributes and how backends can adjust aggregates to compensate for sampling.
- Probabilistic sampler processor (Splunk / Collector distributions) - Practical `probabilistic_sampler` configuration options including `sampling_percentage`, `hash_seed`, and failure modes.