Modern distributed systems require structured observability without sacrificing developer velocity or introducing technical debt.
What We're Building
We are constructing a high-throughput Rust API service that automatically traces HTTP requests from the first incoming header to the final response. The goal is to demonstrate how to integrate OpenTelemetry (OTLP) into a production-grade Rust stack using minimal boilerplate while maintaining async context. We will avoid external managed SDKs in favor of the standard opentelemetry crates, ensuring full control over the telemetry pipeline. This approach applies to any backend service written in Rust, whether it runs on Kubernetes or local infrastructure.
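A typical dependency set for this stack looks roughly like the following. The version numbers are illustrative assumptions, not pinned recommendations; check crates.io for currently compatible pairings of the opentelemetry crates, which are released in lockstep:

```toml
[dependencies]
opentelemetry = "0.24"
opentelemetry_sdk = { version = "0.24", features = ["rt-tokio"] }
opentelemetry-otlp = { version = "0.17", features = ["grpc-tonic"] }
opentelemetry-http = "0.13"
tokio = { version = "1", features = ["full"] }
axum = "0.7"
```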
Step 1 — Initialize the Global Provider
Before recording any telemetry, you must configure the global TracerProvider and a propagator so the SDK can handle context propagation and resource attributes. Initializing this early ensures the OpenTelemetry SDK manages the lifecycle of the trace pipeline without manual resource cleanup.
use opentelemetry::global;
use opentelemetry_sdk::propagation::TraceContextPropagator;
use opentelemetry_sdk::trace::TracerProvider;
// Register W3C trace-context propagation, then build and install the provider.
// (Builder method names vary across opentelemetry_sdk releases; check your version.)
global::set_text_map_propagator(TraceContextPropagator::new());
let provider = TracerProvider::builder()
    .with_batch_exporter(exporter, opentelemetry_sdk::runtime::Tokio) // `exporter` from Step 2
    .build();
global::set_tracer_provider(provider);
This configuration step establishes the global telemetry state. By initializing the provider early in the application lifecycle, you guarantee that all subsequent code uses the same tracer implementation. This prevents race conditions where concurrent tasks grab a no-op or uninitialized provider, causing silent gaps in trace collection.
Step 2 — Configure the OTLP Pipeline
The OpenTelemetry Collector expects data in the OTLP protocol, carried over gRPC (port 4317 by default) or HTTP (port 4318). You define this endpoint in the exporter configuration so data reliably reaches your monitoring backend.
// opentelemetry-otlp gRPC (tonic) exporter; builder names vary by crate version.
let exporter = opentelemetry_otlp::new_exporter().tonic().with_endpoint("http://localhost:4317");
The OpenTelemetry Collector acts as the intermediary between your application and monitoring tools. Configuring the endpoint correctly prevents data loss during high load or network instability. The exporter handles batching logic internally, so you do not need to manage buffer sizes manually unless throughput optimization is required.
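To see why batching matters, here is a std-only toy sketch of the idea the SDK's batch processor implements internally. `SpanData` and the flush logic are hypothetical stand-ins, not OpenTelemetry types:

```rust
// Toy illustration of span batching: buffer finished spans and ship them
// in one "network call" per batch instead of one call per span.
struct SpanData {
    name: String,
}

struct BatchBuffer {
    buf: Vec<SpanData>,
    max_batch: usize,
    exported: usize, // how many spans have been shipped so far
}

impl BatchBuffer {
    fn new(max_batch: usize) -> Self {
        Self { buf: Vec::new(), max_batch, exported: 0 }
    }

    // Queue a span; flush automatically once the batch is full.
    fn push(&mut self, span: SpanData) {
        self.buf.push(span);
        if self.buf.len() >= self.max_batch {
            self.flush();
        }
    }

    // Export the whole buffer at once, then clear it.
    fn flush(&mut self) {
        self.exported += self.buf.len();
        self.buf.clear();
    }
}

fn main() {
    let mut batcher = BatchBuffer::new(3);
    for i in 0..7 {
        batcher.push(SpanData { name: format!("span-{i}") });
    }
    // Two full batches of 3 were flushed; one span is still buffered.
    assert_eq!(batcher.exported, 6);
    assert_eq!(batcher.buf.len(), 1);
    println!("exported={} buffered={}", batcher.exported, batcher.buf.len());
}
```

The real SDK adds a flush interval and a bounded queue on top of this, which is why a batch exporter needs an async runtime handle.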
Step 3 — Handle Context Propagation
When a request hits your service, the incoming traceparent header must be extracted and attached to the current async context. Without this, you cannot correlate requests across microservices, and retries or downstream calls appear as unrelated traces.
use opentelemetry::global;
use opentelemetry_http::HeaderExtractor;
// In Axum middleware or handlers: extract the traceparent header into a Context.
let parent_cx = global::get_text_map_propagator(|propagator| {
    propagator.extract(&HeaderExtractor(request.headers()))
});
Context propagation is critical for distributed systems: it ties each unit of work back to the request that triggered it. If you skip this step, every retry or callback within the service will spawn a new, uncorrelated trace tree.
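For intuition about what the propagator actually extracts, here is a std-only sketch that parses a W3C traceparent header (version-traceid-spanid-flags) by hand. In practice the SDK's propagator does this for you, including validation beyond what is shown here:

```rust
// Minimal parse of a W3C traceparent header: version-traceid-spanid-flags.
// Illustrative only; use the OpenTelemetry propagator in real code.
fn parse_traceparent(header: &str) -> Option<(String, String)> {
    let parts: Vec<&str> = header.split('-').collect();
    // Expect: 2-hex version, 32-hex trace id, 16-hex span id, 2-hex flags.
    if parts.len() != 4 || parts[1].len() != 32 || parts[2].len() != 16 {
        return None;
    }
    // An all-zero trace id is invalid per the spec.
    if parts[1].bytes().all(|b| b == b'0') {
        return None;
    }
    Some((parts[1].to_string(), parts[2].to_string()))
}

fn main() {
    let header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
    let (trace_id, parent_span_id) = parse_traceparent(header).unwrap();
    assert_eq!(trace_id, "4bf92f3577b34da6a3ce929d0e0e4736");
    assert_eq!(parent_span_id, "00f067aa0ba902b7");
}
```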
Step 4 — Instrument Handler Logic
You attach a span to the handler function so that all internal calls made within that scope are automatically included in the trace. This creates a clear boundary between business logic and infrastructure noise.
use opentelemetry::global;
use opentelemetry::trace::Tracer;

async fn handle_request(req: Request) {
    // `global::tracer` returns a tracer from the provider installed in Step 1.
    let tracer = global::tracer("my-service");
    let span = tracer.span_builder("process_request").start(&tracer);
    // ... logic; the span ends when it is dropped
}
Span lifecycle management ensures the span is ended when the async task completes; Rust's Drop implementation closes it automatically. Keeping the instrumentation code close to business logic reduces the risk of missing steps in complex flows.
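The Drop-based lifecycle can be sketched with a toy guard type (names here are hypothetical, not the SDK's): the "span" ends exactly when the guard leaves scope, including on early returns and panic unwinding:

```rust
// RAII sketch of span lifecycle: Drop fires when the guard goes out of scope.
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

struct SpanGuard {
    name: &'static str,
    ended: Arc<AtomicBool>,
}

impl Drop for SpanGuard {
    fn drop(&mut self) {
        // In the real SDK, this is where the end timestamp is recorded
        // and the finished span is handed to the processor/exporter.
        self.ended.store(true, Ordering::SeqCst);
        println!("span '{}' ended", self.name);
    }
}

fn main() {
    let ended = Arc::new(AtomicBool::new(false));
    {
        let _span = SpanGuard { name: "process_request", ended: ended.clone() };
        // ... handler logic runs while the span is open
        assert!(!ended.load(Ordering::SeqCst));
    } // _span dropped here: the span ends even if we returned early
    assert!(ended.load(Ordering::SeqCst));
}
```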
Step 5 — Record Errors and Metrics
Finally, you ensure that panics or HTTP errors are recorded on the active span with an error status. This allows your backend monitoring to alert on failure rates instantly.
span.set_status(opentelemetry::trace::Status::error("request failed"));
span.set_attribute(opentelemetry::KeyValue::new("error", true));
Error handling is a distinct telemetry concern: it lets monitoring distinguish, say, an expected 4xx client error from a genuine server failure. You should also attach status codes to spans so downstream consumers can understand request outcomes without inspecting raw logs.
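One common convention, following the OpenTelemetry HTTP semantics for server spans, is that only 5xx responses mark the span as an error, while 4xx is recorded as an attribute but leaves the status unset. A minimal sketch of that mapping:

```rust
// Map an HTTP status code to a span error flag for a *server* span:
// only 5xx indicates a server-side failure worth alerting on.
fn is_span_error(http_status: u16) -> bool {
    (500..600).contains(&http_status)
}

fn main() {
    assert!(!is_span_error(200)); // success
    assert!(!is_span_error(404)); // client error: recorded, but not a failure
    assert!(is_span_error(503));  // server error: set span status to Error
    println!("mapping ok");
}
```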
Takeaways
- Tracing Spans — Encapsulate logic boundaries.
- Context Propagation — Correlate distributed calls.
- OTLP Export — Standardize data shipping.
- SDK Lifecycle — Ensure resource management.
- Error Status — Track failure events.
What's Next
- Visualize traces in Tempo.
- Add metric aggregation.
- Configure batching.
Further Reading
- Designing Data-Intensive Applications (Kleppmann) — Explains the underlying systems architecture needed to support observability pipelines.
- A Philosophy of Software Design (Ousterhout) — Discusses why complexity grows without structured boundaries like traces.
- Learn Rust in a Month of Lunches (MacLeod) — Essential for understanding Rust async lifetimes used in OTel.
Architecture Patterns
This guide is part of the Architecture Patterns series, focusing on scalable backend services in Rust.