Modern distributed systems require structured observability without sacrificing developer velocity or introducing technical debt.
What We're Building
We are constructing a high-throughput Rust API service that automatically traces HTTP requests from the first incoming header to the final response. The goal is to demonstrate how to integrate OpenTelemetry (OTLP) into a production-grade Rust stack using minimal boilerplate while maintaining async context. We will avoid external managed SDKs in favor of the standard opentelemetry crates, ensuring full control over the telemetry pipeline. This approach applies to any backend service written in Rust, whether it runs on Kubernetes or local infrastructure.
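A typical dependency set for this stack looks roughly like the following. The version numbers are illustrative assumptions, not pinned recommendations; check crates.io for currently compatible pairings of the opentelemetry crates, which are released in lockstep:

```toml
[dependencies]
opentelemetry = "0.24"
opentelemetry_sdk = { version = "0.24", features = ["rt-tokio"] }
opentelemetry-otlp = { version = "0.17", features = ["grpc-tonic"] }
opentelemetry-http = "0.13"
tokio = { version = "1", features = ["full"] }
axum = "0.7"
```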
Step 1 — Initialize the Global Provider
Before recording any telemetry, you must configure the global TracerProvider and a propagator so the SDK can handle context propagation and resource attributes. Initializing this early ensures the OpenTelemetry SDK manages the lifecycle of the trace pipeline without manual resource cleanup.
use opentelemetry::global;
use opentelemetry_sdk::propagation::TraceContextPropagator;
use opentelemetry_sdk::trace::TracerProvider;
// Register W3C trace-context propagation, then build and install the provider.
// (Builder method names vary across opentelemetry_sdk releases; check your version.)
global::set_text_map_propagator(TraceContextPropagator::new());
let provider = TracerProvider::builder()
    .with_batch_exporter(exporter, opentelemetry_sdk::runtime::Tokio) // `exporter` from Step 2
    .build();
global::set_tracer_provider(provider);
This configuration step establishes the global telemetry state. By initializing the provider early in the application lifecycle, you guarantee that all subsequent code uses the same tracer implementation. This prevents race conditions where concurrent tasks grab a no-op or uninitialized provider, causing silent gaps in trace collection.
Step 2 — Configure the OTLP Pipeline
The OpenTelemetry Collector expects data in the OTLP protocol, carried over gRPC (port 4317 by default) or HTTP (port 4318). You define this endpoint in the exporter configuration so data reliably reaches your monitoring backend.
// opentelemetry-otlp gRPC (tonic) exporter; builder names vary by crate version.
let exporter = opentelemetry_otlp::new_exporter().tonic().with_endpoint("http://localhost:4317");
The OpenTelemetry Collector acts as the intermediary between your application and monitoring tools. Configuring the endpoint correctly prevents data loss during high load or network instability. The exporter handles batching logic internally, so you do not need to manage buffer sizes manually unless throughput optimization is required.
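To see why batching matters, here is a std-only toy sketch of the idea the SDK's batch processor implements internally. `SpanData` and the flush logic are hypothetical stand-ins, not OpenTelemetry types:

```rust
// Toy illustration of span batching: buffer finished spans and ship them
// in one "network call" per batch instead of one call per span.
struct SpanData {
    name: String,
}

struct BatchBuffer {
    buf: Vec<SpanData>,
    max_batch: usize,
    exported: usize, // how many spans have been shipped so far
}

impl BatchBuffer {
    fn new(max_batch: usize) -> Self {
        Self { buf: Vec::new(), max_batch, exported: 0 }
    }

    // Queue a span; flush automatically once the batch is full.
    fn push(&mut self, span: SpanData) {
        self.buf.push(span);
        if self.buf.len() >= self.max_batch {
            self.flush();
        }
    }

    // Export the whole buffer at once, then clear it.
    fn flush(&mut self) {
        self.exported += self.buf.len();
        self.buf.clear();
    }
}

fn main() {
    let mut batcher = BatchBuffer::new(3);
    for i in 0..7 {
        batcher.push(SpanData { name: format!("span-{i}") });
    }
    // Two full batches of 3 were flushed; one span is still buffered.
    assert_eq!(batcher.exported, 6);
    assert_eq!(batcher.buf.len(), 1);
    println!("exported={} buffered={}", batcher.exported, batcher.buf.len());
}
```

The real SDK adds a flush interval and a bounded queue on top of this, which is why a batch exporter needs an async runtime handle.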
Step 3 — Handle Context Propagation
When a request hits your service, the incoming traceparent header must be extracted and attached to the current async context. Without this, you cannot correlate requests across microservices, and retries or downstream calls appear as unrelated traces.
use opentelemetry::global;
use opentelemetry_http::HeaderExtractor;
// In Axum middleware or handlers: extract the traceparent header into a Context.
let parent_cx = global::get_text_map_propagator(|propagator| {
    propagator.extract(&HeaderExtractor(request.headers()))
});
Context propagation is critical for distributed systems: it ties each unit of work back to the request that triggered it. If you skip this step, every retry or callback within the service will spawn a new, uncorrelated trace tree.
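For intuition about what the propagator actually extracts, here is a std-only sketch that parses a W3C traceparent header (version-traceid-spanid-flags) by hand. In practice the SDK's propagator does this for you, including validation beyond what is shown here:

```rust
// Minimal parse of a W3C traceparent header: version-traceid-spanid-flags.
// Illustrative only; use the OpenTelemetry propagator in real code.
fn parse_traceparent(header: &str) -> Option<(String, String)> {
    let parts: Vec<&str> = header.split('-').collect();
    // Expect: 2-hex version, 32-hex trace id, 16-hex span id, 2-hex flags.
    if parts.len() != 4 || parts[1].len() != 32 || parts[2].len() != 16 {
        return None;
    }
    // An all-zero trace id is invalid per the spec.
    if parts[1].bytes().all(|b| b == b'0') {
        return None;
    }
    Some((parts[1].to_string(), parts[2].to_string()))
}

fn main() {
    let header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
    let (trace_id, parent_span_id) = parse_traceparent(header).unwrap();
    assert_eq!(trace_id, "4bf92f3577b34da6a3ce929d0e0e4736");
    assert_eq!(parent_span_id, "00f067aa0ba902b7");
}
```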
Step 4 — Instrument Handler Logic
You attach a span to the handler function so that all internal calls made within that scope are automatically included in the trace. This creates a clear boundary between business logic and infrastructure noise.
use opentelemetry::global;
use opentelemetry::trace::Tracer;

async fn handle_request(req: Request) {
    // `global::tracer` returns a tracer from the provider installed in Step 1.
    let tracer = global::tracer("my-service");
    let span = tracer.span_builder("process_request").start(&tracer);
    // ... logic; the span ends when it is dropped
}
Span lifecycle management ensures the span is ended when the async task completes; Rust's Drop implementation closes it automatically. Keeping the instrumentation code close to business logic reduces the risk of missing steps in complex flows.
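The Drop-based lifecycle can be sketched with a toy guard type (names here are hypothetical, not the SDK's): the "span" ends exactly when the guard leaves scope, including on early returns and panic unwinding:

```rust
// RAII sketch of span lifecycle: Drop fires when the guard goes out of scope.
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

struct SpanGuard {
    name: &'static str,
    ended: Arc<AtomicBool>,
}

impl Drop for SpanGuard {
    fn drop(&mut self) {
        // In the real SDK, this is where the end timestamp is recorded
        // and the finished span is handed to the processor/exporter.
        self.ended.store(true, Ordering::SeqCst);
        println!("span '{}' ended", self.name);
    }
}

fn main() {
    let ended = Arc::new(AtomicBool::new(false));
    {
        let _span = SpanGuard { name: "process_request", ended: ended.clone() };
        // ... handler logic runs while the span is open
        assert!(!ended.load(Ordering::SeqCst));
    } // _span dropped here: the span ends even if we returned early
    assert!(ended.load(Ordering::SeqCst));
}
```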
Step 5 — Record Errors and Metrics
Finally, you ensure that panics or HTTP errors are recorded on the active span with an error status. This allows your backend monitoring to alert on failure rates instantly.
span.set_status(opentelemetry::trace::Status::error("request failed"));
span.set_attribute(opentelemetry::KeyValue::new("error", true));
Error handling is a distinct telemetry concern: it lets monitoring distinguish, say, an expected 4xx client error from a genuine server failure. You should also attach status codes to spans so downstream consumers can understand request outcomes without inspecting raw logs.
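One common convention, following the OpenTelemetry HTTP semantics for server spans, is that only 5xx responses mark the span as an error, while 4xx is recorded as an attribute but leaves the status unset. A minimal sketch of that mapping:

```rust
// Map an HTTP status code to a span error flag for a *server* span:
// only 5xx indicates a server-side failure worth alerting on.
fn is_span_error(http_status: u16) -> bool {
    (500..600).contains(&http_status)
}

fn main() {
    assert!(!is_span_error(200)); // success
    assert!(!is_span_error(404)); // client error: recorded, but not a failure
    assert!(is_span_error(503));  // server error: set span status to Error
    println!("mapping ok");
}
```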
Takeaways
- Tracing Spans — Encapsulate logic boundaries.
- Context Propagation — Correlate distributed calls.
- OTLP Export — Standardize data shipping.
- SDK Lifecycle — Ensure resource management.
- Error Status — Track failure events.
What's Next
- Visualize traces in Tempo.
- Add metric aggregation.
- Configure batching.
Further Reading
- Designing Data-Intensive Applications (Kleppmann) — Explains the underlying systems architecture needed to support observability pipelines.
- A Philosophy of Software Design (Ousterhout) — Discusses why complexity grows without structured boundaries like traces.
- Learn Rust in a Month of Lunches (MacLeod) — Essential for understanding Rust async lifetimes used in OTel.
Architecture Patterns
This guide is part of the Architecture Patterns series, focusing on scalable backend services in Rust.