Borys Generalov

Posted on May 7 • Edited on May 28 • Originally published at blog.bgener.nl

OpenTelemetry custom spans in .NET: seeing what your code decided

#development #devops #opentelemetry #monitoring

How to instrument business decisions that auto-instrumentation cannot see. Events, span kinds, cross-process context propagation over RabbitMQ, and baggage, with a working .NET 10 demo.

What you get: when to use custom spans, what to put on them, and how to avoid the traps. How to name spans, what to tag, when to use events instead, context propagation between services over RabbitMQ, and baggage that travels without any method parameters.

Demo: github.com/bgener/otel-custom-spans, with two .NET 10 services, RabbitMQ, Aspire Dashboard and Jaeger. Run docker compose up --build and you have a working trace across two processes.

OpenTelemetry spans

A trace is the full path of a request through your system. Each step in that path is a span: a named, timed unit of work with attributes attached. OpenTelemetry's auto-instrumentation creates spans for the calls it understands: inbound HTTP requests, outbound database queries, gRPC calls, message queue operations. For each of those you see the operation name, duration, and whether it succeeded.

But auto-instrumentation stops at the call boundary. Your .NET service runs business logic and makes decisions. It returns a result without throwing anything. Something is off, but none of that is visible.

You probably recognize this from the era of heavy log.Debug and log.Verbose. One log statement after every if clause, every internal state worth knowing. The code got noisy and barely readable.

Custom spans are the same idea done properly. Structured attributes, timestamped, sitting in the trace right next to the auto-instrumented calls.

ActivitySource and your first span

In .NET, custom spans are built with ActivitySource. You create one per service or logical area of code, start activities from it, and attach attributes that describe what your code decided. The pattern works the same whether you are in an API handler, a domain service, or a background worker.

When you call StartActivity(), the new span automatically becomes the active span for the current async execution context. Nested calls can reach it via Activity.Current without you threading a variable through every method signature. Auto-instrumented spans work the same way: the inbound HTTP span created by ASP.NET Core middleware is also an Activity, and your custom spans nest inside it automatically.
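The ambient behaviour can be seen without the OpenTelemetry SDK at all. The sketch below is an illustration, not demo code: the source name ambient-demo, the helper MarkForecastSource, and the forecast.source tag are made up for the example, and a bare ActivityListener stands in for the SDK registration.

```csharp
using System.Diagnostics;

public static class AmbientSpanDemo
{
    // Hypothetical source name, used only for this sketch.
    private static readonly ActivitySource Source = new("ambient-demo");

    public static string? Run()
    {
        // A listener must sample the source, otherwise StartActivity returns null.
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == "ambient-demo",
            Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                ActivitySamplingResult.AllData
        };
        ActivitySource.AddActivityListener(listener);

        using var outer = Source.StartActivity("generate forecast");
        MarkForecastSource("cache"); // note: no Activity parameter is passed

        // The tag set inside the helper landed on the span started here.
        return outer?.GetTagItem("forecast.source") as string;
    }

    private static void MarkForecastSource(string origin)
    {
        // Activity.Current is the span the caller started.
        Activity.Current?.SetTag("forecast.source", origin);
    }
}
```

Calling AmbientSpanDemo.Run() returns "cache": the helper never received the activity, yet the tag ended up on the caller's span.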

The demo centralizes source creation in a static class:

internal static class Observability
{
    public const string ServiceName = "weather-api";
    public static readonly ActivitySource ActivitySource = new(ServiceName);
}

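One source per service. The source name becomes the instrumentation scope on every span it creates; reusing the service name means there is only one string to register and remember. Grouping by service in your APM backend comes from the service.name resource attribute, which the registration sets with AddService: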

builder.Services.AddOpenTelemetry()
    .ConfigureResource(r => r.AddService(Observability.ServiceName))
    .WithTracing(tracing => tracing
        .AddSource(Observability.ServiceName)
        .AddAspNetCoreInstrumentation()
        .AddOtlpExporter(e => { e.Endpoint = new Uri(otelOptions.AspireDashboardEndpoint); }));

Caution:
Skip AddSource and nothing listens to your source: StartActivity returns null, every activity?. call quietly does nothing, and no span is exported. No error. No warning. Just silence. I have stared at an empty trace viewer for an embarrassing amount of time before catching this.
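The silence is easy to reproduce in isolation. This is a minimal sketch under two assumptions: the source name listener-demo is invented for the example, and no other listener in the process is subscribed to it. A plain ActivityListener plays the role that AddSource plays in the SDK.

```csharp
using System.Diagnostics;

public static class ListenerDemo
{
    // Hypothetical source name, used only for this sketch.
    private static readonly ActivitySource Source = new("listener-demo");

    public static (bool Before, bool After) Run()
    {
        // Nothing is listening yet, so no Activity object is created.
        using var before = Source.StartActivity("generate forecast");
        bool createdBefore = before is not null;

        // Subscribing to the source is what AddSource arranges inside the SDK.
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == "listener-demo",
            Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                ActivitySamplingResult.AllData
        };
        ActivitySource.AddActivityListener(listener);

        using var after = Source.StartActivity("generate forecast");
        return (createdBefore, after is not null);
    }
}
```

ListenerDemo.Run() returns (false, true). This is also why every call in this article goes through activity?. rather than activity.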

Here is the weather forecast service from the demo:

internal sealed class WeatherService
{
    private static readonly ActivitySource _source = Observability.ActivitySource;

    public WeatherForecast[] GetForecast(int days = 5)
    {
        using var activity = _source.StartActivity("generate forecast", ActivityKind.Internal);

        activity?.SetTag("forecast.days", days);
        activity?.SetTag("client.id", Baggage.GetBaggage("client.id"));

        var forecast = Enumerable.Range(1, days)
            .Select(index => new WeatherForecast(
                DateOnly.FromDateTime(DateTime.Now.AddDays(index)),
                Random.Shared.Next(-20, 55),
                _summaries[Random.Shared.Next(_summaries.Length)]))
            .ToArray();

        activity?.SetTag("forecast.min_temp_c", forecast.Min(f => f.TemperatureC));
        activity?.SetTag("forecast.max_temp_c", forecast.Max(f => f.TemperatureC));

        return forecast;
    }
}

Three things are on this span: a name, a kind, and attributes.

The name is how this operation appears in the trace viewer. Keep it low-cardinality: name the class of operation, not the instance. OTel recommends {verb} {noun} with spaces: generate forecast, process payment, write record. This mirrors how HTTP (GET /forecast) and database (SELECT weather) name operations. generate forecast is a span name. generate forecast london is not: the city belongs in an attribute. See the OTel naming guide.

The attribute forecast.days travels with every trace for this operation. The client.id comes from baggage set at the API edge, covered in the cross-process section. Record inputs before the work, outcomes after it. That is the pattern.

Events

Attributes describe the final state of the span. But some operations have decision points inside them that attributes cannot capture.

The forecast can include sub-zero days. That is a fact worth recording at the moment it was detected, not as a summary after the fact. An attribute set after the loop collapses the timeline into a single value. An event preserves when inside the span the thing happened.

var coldDays = forecast.Count(f => f.TemperatureC < 0);
if (coldDays > 0)
{
    activity?.AddEvent(new ActivityEvent("forecast.cold_days_detected",
        tags: new ActivityTagsCollection
        {
            ["cold.day_count"] = coldDays,
            ["cold.threshold_c"] = 0
        }));
}

Events are timestamped. In the trace viewer they appear pinned at the exact millisecond they fired, inside the span timeline. If the span was slow, you can see whether the detection happened early or late and whether it correlates with the latency. An attribute cannot tell you that.

OTel event names follow the general naming conventions: lowercase, dot-separated namespaces, snake_case for multi-word parts. forecast.cold_days_detected is correct. forecastColdDaysDetected is not.

The rule: use an attribute for facts about the operation as a whole. Use an event for things that happened at a specific moment inside it.

Custom span roles and error signals

When you add a custom span, two things determine where it shows up and what it feeds: the kind and the status.

The kind tells your backend what role the span plays. There are five:

_source.StartActivity("handle request",        ActivityKind.Server);   // receiving work
_source.StartActivity("generate forecast",     ActivityKind.Internal); // internal logic
_source.StartActivity("call stored procedure", ActivityKind.Client);   // calling something
_source.StartActivity("publish event",         ActivityKind.Producer); // sending to a queue
_source.StartActivity("process event",         ActivityKind.Consumer); // reading from a queue

Pick the wrong one and your span shows up in the wrong dashboard. Backends that derive metrics from spans emit different time series depending on kind. Server spans become incoming-request RED metrics. Client spans become outbound-dependency metrics. Producer and Consumer feed messaging throughput. Internal appears only inside traces.

Mark a stored proc call as Internal and the dependency dashboard goes blank for that call. Mark a queue publish as Client and the messaging panel never sees it. Kind is not cosmetic.

Status is separate from exceptions. A business rejection is not an exception. Nothing throws. But it is still an Error outcome from the trace's perspective.

if (!CoverageMap.Includes(city))
{
    activity?.SetStatus(ActivityStatusCode.Error, "city outside coverage area");
}

Your error rate dashboard now reflects business failures, not just crashes. This is the gap between SRE error rates and product success rates that you usually learn to live with.

For real exceptions, the demo consumer records them with full structured context:

catch (Exception ex)
{
    activity?.AddException(ex);
    activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
    await _channel!.BasicNackAsync(result.DeliveryTag, multiple: false, requeue: false);
}

AddException records the exception as a span event with exception.type, exception.message, and exception.stacktrace as structured fields. Searchable, filterable, aggregatable in any OTel-compatible backend.

Span links

Most spans have one parent. Batch consumers do not. A consumer drains 50 messages from a queue in one poll. Each message came from a different upstream request, each with its own trace ID. A span can have only one parent, and with 50 different upstream traces there is no single right choice.

Span links solve this. A span can carry references to spans in other traces. The consumer span becomes the root of its own trace and holds pointers back to all 50 producer spans. From the consumer trace you click through to any of them.

var links = batch
    .Select(msg => new ActivityLink(msg.ExtractParentContext()))
    .ToArray();

using var activity = _source.StartActivity(
    "consume forecast batch",
    ActivityKind.Consumer,
    parentContext: default,
    links: links);

Without links, batch consumers either pick one arbitrary parent and silently lose the rest, or appear as orphan traces with no context.

Common cases: batch processors, dead-letter retries linking back to the original failure, fan-out workflows where one request kicks off many async jobs.
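How the parent context is pulled out of each message depends on your message type. As one possible shape, the sketch below assumes each message exposes its headers as a dictionary with a traceparent entry stored as UTF-8 bytes (the way the publisher later in this article writes them). The class name BatchLinks and the method FromHeaders are hypothetical.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

public static class BatchLinks
{
    // Builds one link per message that carries a parseable traceparent header.
    // Messages without a usable header are skipped instead of failing the batch.
    public static List<ActivityLink> FromHeaders(
        IEnumerable<IDictionary<string, object?>> batchHeaders)
    {
        var links = new List<ActivityLink>();
        foreach (var headers in batchHeaders)
        {
            if (headers.TryGetValue("traceparent", out var raw)
                && raw is byte[] bytes
                && ActivityContext.TryParse(
                    Encoding.UTF8.GetString(bytes), null, out var context))
            {
                links.Add(new ActivityLink(context));
            }
        }
        return links;
    }
}
```

The resulting list can be passed as the links argument of StartActivity. Skipping unparseable headers is a design choice for the sketch; a stricter consumer might count or log them.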

Cross-process tracing

When both ends of a call know about OTel, trace context flows automatically. HTTP, gRPC, async/await: the .NET runtime handles all of it. The hard case is any transport the SDK has no built-in instrumentation for. RabbitMQ is one of those.

Baggage

Before the producer and consumer, it helps to understand baggage. Baggage is a set of key-value pairs the propagator carries across every process boundary automatically. Set it once at the API edge and every downstream service can read it, and copy it onto its spans, without any method parameters.

The demo sets client.id from the X-Client-Id request header:

app.Use(async (context, next) =>
{
    var clientId = context.Request.Headers["X-Client-Id"].FirstOrDefault() ?? "anonymous";
    Baggage.SetBaggage("client.id", clientId);
    await next();
});

Three hops downstream, in the worker process, without passing anything as a parameter:

activity?.SetTag("client.id", Baggage.GetBaggage("client.id"));

Wire a BaggageActivityProcessor into your tracer (see the OTel contrib repo) and you never call SetTag for baggage values at all. The processor copies them onto every span automatically as it starts. Every dashboard you build gets client.id as a free filter.
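To make the mechanism less magical, here is a hand-rolled stand-in for such a processor. It is a sketch, not the contrib implementation: the class name BaggageToTagsProcessor is invented, it copies every baggage entry without filtering, and it does the copy when the span starts.

```csharp
using System.Diagnostics;
using OpenTelemetry;

// Illustrative only: copies all current baggage entries onto each span as tags.
public sealed class BaggageToTagsProcessor : BaseProcessor<Activity>
{
    public override void OnStart(Activity data)
    {
        // Baggage.GetBaggage() returns the entries of Baggage.Current.
        foreach (var entry in Baggage.GetBaggage())
        {
            data.SetTag(entry.Key, entry.Value);
        }
    }
}

// Registration, inside the WithTracing(...) callback:
//     tracing.AddProcessor(new BaggageToTagsProcessor());
```

Copying everything is the simplest choice and also the riskiest: anything an upstream caller puts in baggage becomes a span attribute. An allow-list of keys is the safer variant.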

Caution:
Do not put sensitive data in baggage. It travels in plain text in headers. If your OTel collector exports to a third-party SaaS, you have shipped that data across your trust boundary.

Producer to queue

The publisher creates a Producer span, tags it with messaging attributes, then injects W3C trace context and baggage into the message headers:

internal sealed class WeatherEventPublisher(IConnection connection)
{
    private static readonly ActivitySource _source = Observability.ActivitySource;

    public async Task PublishAsync(WeatherForecastServedEvent evt, CancellationToken ct)
    {
        using var activity = _source.StartActivity(
            "publish weatherforecast event",
            ActivityKind.Producer);

        activity?.SetTag("messaging.system", "rabbitmq");
        activity?.SetTag("messaging.destination.name", QueueNames.WeatherForecastServed);
        activity?.SetTag("messaging.operation.type", "publish");
        activity?.SetTag("client.id", evt.ClientId);

        await using var channel = await connection.CreateChannelAsync(cancellationToken: ct);

        var props = new BasicProperties
        {
            Headers = new Dictionary<string, object?>(),
            DeliveryMode = DeliveryModes.Persistent,
            MessageId = Guid.NewGuid().ToString()
        };

        // Inject W3C trace context + baggage into message headers.
        // The worker extracts these to continue this trace across processes.
        Propagators.DefaultTextMapPropagator.Inject(
            new PropagationContext(Activity.Current!.Context, Baggage.Current),
            props.Headers,
            static (carrier, key, value) => carrier![key] = Encoding.UTF8.GetBytes(value));

        var body = JsonSerializer.SerializeToUtf8Bytes(evt);
        await channel.BasicPublishAsync(
            exchange: string.Empty,
            routingKey: QueueNames.WeatherForecastServed,
            mandatory: false,
            basicProperties: props,
            body: body,
            cancellationToken: ct);

        activity?.SetTag("messaging.message.id", props.MessageId);
    }
}

Two things worth distinguishing. The SetTag calls attach attributes to this specific span: they describe this publish operation and appear in the trace detail view. The Propagators.Inject call encodes the W3C trace context and baggage into the message headers so the worker can restore the trace on the other side. Span attributes stay on the span. Baggage travels forward.

connection is IConnection from RabbitMQ.Client v7, which added async channel creation.

The headers on the wire after Inject runs:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
baggage:     client.id=my-client

The traceparent format is the W3C Trace Context spec: {version}-{trace-id}-{parent-id}-{flags}. The 32-character trace ID identifies the whole distributed trace. The 16-character parent ID is the span ID of the publisher span. 01 means sampled. This is the same format HTTP headers carry automatically; here you are writing it manually into AMQP message headers.
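If you want to check a header by hand, .NET can parse it for you. The sketch below feeds the example value above through ActivityContext.TryParse; the wrapper class TraceparentDemo is hypothetical and exists only to make the example self-contained.

```csharp
using System;
using System.Diagnostics;

public static class TraceparentDemo
{
    // Splits a traceparent value into the fields described above.
    public static (string TraceId, string ParentId, bool Sampled) Describe(string traceparent)
    {
        if (!ActivityContext.TryParse(traceparent, null, out var context))
        {
            throw new FormatException($"Not a valid traceparent: {traceparent}");
        }

        return (
            context.TraceId.ToHexString(),
            context.SpanId.ToHexString(),
            context.TraceFlags.HasFlag(ActivityTraceFlags.Recorded));
    }
}
```

For the header shown above, Describe returns ("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true).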

Consumer

The worker is a completely separate process with its own Program.cs and its own OTel registration. It extracts context from the message headers and passes it as the parent when starting the consumer span:

private async Task ProcessMessageAsync(BasicGetResult result, CancellationToken ct)
{
    var parentContext = Observability.Propagator.Extract(
        default,
        result.BasicProperties.Headers,
        static (headers, key) =>
        {
            if (headers is null || !headers.TryGetValue(key, out var raw) || raw is null)
                return [];
            return raw switch
            {
                byte[] bytes => [Encoding.UTF8.GetString(bytes)],
                string text  => [text],
                _            => [raw.ToString() ?? string.Empty]
            };
        });

    Baggage.Current = parentContext.Baggage;

    using var activity = Observability.ActivitySource.StartActivity(
        "process weatherforecast event",
        ActivityKind.Consumer,
        parentContext.ActivityContext);  // parents the consumer span under the producer span

    activity?.SetTag("messaging.system", "rabbitmq");
    activity?.SetTag("messaging.operation.type", "process");
    activity?.SetTag("messaging.message.id", result.BasicProperties.MessageId);
    activity?.SetTag("client.id", Baggage.GetBaggage("client.id"));

    var evt = JsonSerializer.Deserialize<WeatherForecastServedEvent>(result.Body.Span)!;
    await SimulateAnalyticsWriteAsync(evt, ct);
    await _channel!.BasicAckAsync(result.DeliveryTag, multiple: false);
}

Two lines connect the processes:

parentContext.ActivityContext            // restores trace ID + parent span ID from message headers
Baggage.Current = parentContext.Baggage // restores baggage that travelled with the message

The consumer creates a nested child span for the analytics write, a separate Client span showing the DB operation as a distinct step:

private static async Task SimulateAnalyticsWriteAsync(WeatherForecastServedEvent evt, CancellationToken ct)
{
    using var activity = Observability.ActivitySource.StartActivity(
        "write analytics record",
        ActivityKind.Client);

    activity?.SetTag("db.system", "postgresql");
    activity?.SetTag("db.operation", "INSERT");
    activity?.SetTag("db.sql.table", "weather_analytics");
    activity?.SetTag("client.id", evt.ClientId);

    await Task.Delay(Random.Shared.Next(10, 50), ct);
}

What you see in Aspire Dashboard or Jaeger after one request:

GET /weatherforecast                          (Server, ForecastApi)
  generate forecast                           (Internal, ForecastApi)
  publish weatherforecast event               (Producer, ForecastApi)
                    ----- RabbitMQ boundary -----
    process weatherforecast event             (Consumer, ForecastWorker)
      write analytics record                  (Client, ForecastWorker)

Two processes. One trace. The consumer span hangs off the producer span as if they were in the same call stack.

Without Inject and Extract: two completely disconnected traces. The forecast request appears to end at the publish. The worker appears to start a job for no apparent reason. The first time a customer reports a bug you discover the only thing connecting the API request to the analytics record is a wall-clock timestamp.
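The Inject and Extract pair can be exercised without a broker, which is handy for a unit test. This is a sketch under stated assumptions: a plain string dictionary stands in for the AMQP headers, the class name RoundTripDemo is invented, and the propagator is constructed explicitly so the example does not rely on the SDK having configured Propagators.DefaultTextMapPropagator.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Context.Propagation;

public static class RoundTripDemo
{
    // W3C trace context plus baggage, the same pair the article propagates.
    private static readonly TextMapPropagator Propagator = new CompositeTextMapPropagator(
        new TextMapPropagator[] { new TraceContextPropagator(), new BaggagePropagator() });

    public static PropagationContext RoundTrip(ActivityContext context, Baggage baggage)
    {
        // Stand-in for message headers.
        var headers = new Dictionary<string, string>();

        // Producer side: write traceparent and baggage into the carrier.
        Propagator.Inject(
            new PropagationContext(context, baggage),
            headers,
            static (carrier, key, value) => carrier[key] = value);

        // Consumer side: read them back out of the carrier.
        return Propagator.Extract(
            default,
            headers,
            static (carrier, key) => carrier.TryGetValue(key, out var value)
                ? new[] { value }
                : Array.Empty<string>());
    }
}
```

The returned PropagationContext carries the same trace ID and span ID that went in, plus the baggage entries, which is exactly what the consumer needs to parent its span and restore Baggage.Current.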

Spans inside the database

This section covers a scenario not in the demo: stored procedures and how to get visibility into what happens inside the database.

When .NET calls Postgres via Npgsql, the client-side instrumentation (Npgsql's own OpenTelemetry support, or the EF Core instrumentation) records a SQL client span. Postgres has no idea your trace exists. The proc executes in isolation. The bridge is SQLCommenter, a format that encodes trace context as a SQL comment the database can read.

Info:
Database support: SQLCommenter works with PostgreSQL and MySQL. SQL Server and Oracle have no equivalent. If your stack runs a database without SQLCommenter support, wrap the call in a custom span and capture output parameters as attributes instead.

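using var activity = _source.StartActivity("call stored procedure", ActivityKind.Client);

// StartActivity returns null when nothing samples the span, so only add the
// comment when there is a trace to join. The flags come from the activity
// instead of a hardcoded 01.
var comment = string.Empty;
if (activity is not null)
{
    var flags = activity.Recorded ? "01" : "00";
    comment =
        $"/*traceparent='00-{activity.TraceId.ToHexString()}-{activity.SpanId.ToHexString()}-{flags}'*/ ";
}

await using var cmd = _connection.CreateCommand();
cmd.CommandText = $"{comment}CALL get_forecast_for_city(@city, @tier)";
cmd.Parameters.AddWithValue("@city", city);
cmd.Parameters.AddWithValue("@tier", tier);
await cmd.ExecuteNonQueryAsync();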

EF Core 9 with Npgsql ships a built-in EnableSqlCommenter() option that does this for you. Use it if your driver supports it.

Beyond SQLCommenter, there is pg_tracing: a Postgres extension open-sourced by Datadog that reads the traceparent comment and emits OTLP spans for the engine work it can see: parse, plan, execute, nested function calls, trigger execution. Postgres 16+ only.

call stored procedure  (Client, your app)
  pg.parse
  pg.plan
  pg.execute
    pg.function: get_forecast_for_city
      pg.query: SELECT FROM sensor_readings
      pg.query: SELECT FROM city_forecast

Caveats: Postgres 16 or higher only. You build the extension yourself or use a Datadog prebuilt image. It does not let you write start_span() inside PL/pgSQL. Instrumentation happens at the engine level, not in user code. MySQL and SQL Server have nothing comparable.

For any team running Postgres, this is the most direct way to stop treating stored procedures as a black box.

The cardinality trap

Many APM backends convert span names into metric labels. Put a dynamic value in the span name and you get a unique time series for every unique value. At production scale that exhausts cardinality limits and your backend starts dropping data. It looks completely innocent when you write it.

// Do NOT do this
using var activity = _source.StartActivity($"generate forecast {city}");

SigNoz, Datadog APM, and Grafana Tempo all derive metrics from spans and emit time series keyed on the span name. A few hundred cities is fine. A few thousand customer IDs in the span name is not. You find out at 2am when dashboards stop refreshing.

// Correct
using var activity = _source.StartActivity("generate forecast");
activity?.SetTag("forecast.city", city);

Same signal. No explosion. The span name stays generate forecast. The city is a filterable attribute. If it changes per request: attribute. If it names the kind of operation: span name.

The real cost

Custom spans have a maintenance cost nobody mentions in the enthusiastic how-to posts.

Business logic changes. When it does, the instrumentation has to change with it. If the attribute you added six months ago no longer reflects what the code does, you now have misleading data in traces. Not missing data. Wrong data. I would rather have an empty trace than a trace that confidently tells me the wrong thing.

I would not use this in production unless your team treats span attributes with the same review discipline as API contracts. They are observable behavior. When they drift silently they break dashboards, misfire alerts, and send engineers in the wrong direction during the worst possible moment.
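One way to give attributes that contract-like treatment is to assert on them in tests. The helper below is a sketch of the idea, not part of the demo: SpanCapture, Collect, and the capture-demo source are hypothetical names, and the list is not thread-safe, so it suits single-threaded test code only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class SpanCapture
{
    // Runs the action and returns every activity from sourceName that stopped while it ran.
    public static List<Activity> Collect(string sourceName, Action action)
    {
        var stopped = new List<Activity>();
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == sourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                ActivitySamplingResult.AllData,
            ActivityStopped = stopped.Add
        };
        ActivitySource.AddActivityListener(listener);

        action();
        return stopped;
    }

    // Example: pin down the attribute name and value a dashboard depends on.
    public static int? Demo()
    {
        using var source = new ActivitySource("capture-demo");
        var spans = Collect("capture-demo", () =>
        {
            using var activity = source.StartActivity("generate forecast");
            activity?.SetTag("forecast.days", 3);
        });
        return (int?)spans[0].GetTagItem("forecast.days");
    }
}
```

In a real test the action would call your service method and the assertion would check the tags your dashboards and alerts rely on. When someone renames forecast.days, the test fails in the same PR instead of a dashboard going quiet a month later.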

There is also the coordination overhead. You need a naming convention the whole team follows. You need someone who knows what sources are registered, what each attribute means, and why it was added. A bit of documentation here, kept up to date, is not glamorous but it is real engineering infrastructure.

On a team of three with two services, nobody will notice if you skip that. On a platform with twenty services and four teams it becomes a governance problem before it becomes an engineering one.

The alternative is debugging production with a green trace and a prayer. I have done both. Pick your poison.

Final thoughts

Auto-instrumentation covers infrastructure. Custom spans cover intent: what the code chose to do, not just what it called.

Patterns to take from here:

One ActivitySource per service, centralized in a static Observability class. Register it by name in AddSource() before you have ten sources.
Span name for the operation kind, attributes for variable data. If it changes per request, it belongs in a tag.
Match ActivityKind to the role. Wrong kind, wrong dashboard.
Inject and extract everywhere your runtime cannot. Queues, SQL comments, file metadata, custom protocols.
Set baggage at the API edge once. It travels to every downstream service automatically, ready to be copied onto spans.
Review span attributes in the same PR as the business logic they describe. They are observable behavior. When they drift silently they break dashboards.

The portable rule: if it names the kind of thing that happened, it is the span name. If it describes the operation, it is an attribute. If it happened at a specific moment inside the operation, it is an event.

What is the difference between a span event and a span attribute? Attributes describe the span as a whole. Events are timestamped points inside the span with their own attributes. Use an attribute for forecast.days = 5. Use an event for forecast.cold_days_detected at 12ms with cold.day_count = 3. The event tells you when inside the span the thing happened.

What does ActivityKind actually change? It changes which dashboards your span shows up in. Backends that derive metrics from spans emit different time series for Server vs Client vs Producer vs Consumer. Internal appears only inside traces. Wrong kind, wrong dashboard.

How do I propagate trace context through RabbitMQ? Use Propagators.DefaultTextMapPropagator.Inject on the producer side, passing Activity.Current!.Context and Baggage.Current. On the consumer side use Extract with a getter that reads the message headers, then pass parentContext.ActivityContext to StartActivity and restore Baggage.Current = parentContext.Baggage.

What is baggage and when should I use it? Baggage is a set of key-value pairs the propagator carries across every process boundary automatically. Use it for cross-cutting context: tenant ID, client ID, correlation tokens. Do not put sensitive data in baggage: it travels in plain text in headers.

Does this work with .NET Aspire? Yes. Register each ActivitySource via AddSource() inside ConfigureOpenTelemetry() in ServiceDefaults. Aspire does not auto-discover custom sources.

What happens if I put high-cardinality values in the span name? Backends that derive metrics from spans turn the span name into a metric label. Each unique name becomes a unique time series. At production scale this exhausts cardinality limits and causes data drops.

Can I use Activity.Current instead of a local variable? Yes. Activity.Current returns the ambient span for the current async context. Useful in nested calls where you want to attach an attribute without threading the activity reference through every method signature.

Can I get spans from inside a stored procedure? Only on PostgreSQL 16+, using the pg_tracing extension. It reads the traceparent from a SQLCommenter comment and emits OTLP spans for parse, plan, execute, function calls, and triggers. MySQL and SQL Server have no equivalent.
