DEV Community


Following a Request Across Services with OpenTelemetry: How I Wired smart-novel and piper-tts-rest-api

When a request crosses a service boundary, the most useful question you can answer in production is: "What did this one call actually do?" Not "what did service A do" and then "what did service B do" in two separate windows. What you want instead is one trace, one timeline, everything in the same tab.

That is exactly what I set up in my side project: a single GraphQL mutation, GenerateChapterAudio, in my smart-novel backend (NestJS) calls my piper-tts-rest-api service (plain Node) over HTTP, and Jaeger renders both services as branches of the same trace.

Jaeger trace spanning smart-novel and piper-tts-rest-api, with the cross-service POST highlighted

The two yellow spans are children of the POST span in the parent service, smart-novel. That parent/child relationship across a network hop is the whole point of distributed tracing.

The Mental Model in One Sentence

OpenTelemetry propagates trace context through a tiny HTTP header called traceparent, standardised by the W3C Trace Context specification. If smart-novel adds it to the outgoing request and piper-tts-rest-api extracts it from the incoming request, both services emit spans that share the same trace_id, and piper-tts-rest-api's spans are nested under smart-novel's spans. Jaeger then knows how to draw the tree you saw.
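To make the header concrete, here is a dependency-free sketch of what a traceparent value contains and how a receiving service conceptually turns it into a child span context. It only handles version 00 of the W3C format, and startChildSpan is a hypothetical helper that mimics what the SDK does internally; in a real setup the OTel propagators and SDK do all of this for you.

```typescript
import { randomBytes } from "node:crypto";

// traceparent = version "-" trace-id "-" parent-id "-" trace-flags
// e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
export interface SpanContextLite {
  traceId: string; // 32 lowercase hex chars (16 bytes), shared by every span in the trace
  spanId: string; // 16 lowercase hex chars (8 bytes), unique per span
  sampled: boolean; // lowest bit of trace-flags
}

// Only version 00 is handled in this sketch.
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

export function parseTraceparent(header: string): SpanContextLite | null {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const [, traceId, spanId, flags] = match;
  // All-zero IDs are invalid per the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

export function formatTraceparent(ctx: SpanContextLite): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

// Hypothetical helper: what the callee conceptually does with the extracted context.
// Same trace ID, fresh span ID, and the caller's span ID recorded as the parent.
export function startChildSpan(parent: SpanContextLite) {
  return {
    traceId: parent.traceId,
    spanId: randomBytes(8).toString("hex"),
    parentSpanId: parent.spanId,
    sampled: parent.sampled,
  };
}
```

The shared traceId is what puts both services in one trace; the parentSpanId is what lets Jaeger hang piper-tts-rest-api's spans under smart-novel's POST span.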

Bootstrapping the OpenTelemetry SDK

OpenTelemetry's auto-instrumentation works by monkey-patching libraries. That means the SDK must start before any of those modules are loaded; otherwise your code ends up holding references to the original, unpatched functions and you get no spans at all.

For that we have two options:

  1. Import the instrumentation file first in the entrypoint. This is what I do in smart-novel: apps/backend/src/instrumentation.ts is the very first import in main.ts, the NestJS entrypoint. That file sets up:
    • OTLPTraceExporter, which exports trace data over HTTP.
    • getNodeAutoInstrumentations(...), configured to:
      • Disable noisy instrumentations: fs, dns, net, express, router.
      • Silence the optional winston-transport warning.
      • Filter out JWKS fetches to Zitadel at the source via ignoreOutgoingRequestHook.
  2. Preload the instrumentation file with Node's --import flag. This is what I do in piper-tts-rest-api: it has its own src/instrumentation.ts, and the start script in package.json passes it via --import.
    • HttpInstrumentation enables automatic tracing for Node's http module.
    • piper-tts-rest-api is ESM, and ESM is a different beast. An import './instrumentation.js' at the top of server.ts is not enough, because ESM resolves and links the entire import graph before any module body runs. By the time instrumentation.js's body runs, node:http has already been resolved and its exports captured.
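A minimal sketch of what an instrumentation.ts like smart-novel's could look like, assuming recent versions of @opentelemetry/sdk-node, @opentelemetry/auto-instrumentations-node (one that still ships the router instrumentation) and @opentelemetry/exporter-trace-otlp-http. The collector URL, the service name and the Zitadel match (a host containing "zitadel" plus the /oauth/v2/keys JWKS path) are assumptions, and the winston-transport tweak is left out because the post does not show how it is done.

```typescript
// Sketch of apps/backend/src/instrumentation.ts (not the actual file).
import type { RequestOptions } from "node:http";
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

// Assumption: how Zitadel JWKS requests are recognised. Adjust host/path to your deployment.
const isZitadelJwksFetch = (req: RequestOptions): boolean =>
  (req.hostname ?? req.host ?? "").includes("zitadel") &&
  (req.path ?? "").includes("/oauth/v2/keys");

const sdk = new NodeSDK({
  serviceName: "smart-novel",
  traceExporter: new OTLPTraceExporter({
    // Default local OTLP/HTTP traces endpoint; point it at your Collector.
    url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT ?? "http://localhost:4318/v1/traces",
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      // Noisy instrumentations switched off.
      "@opentelemetry/instrumentation-fs": { enabled: false },
      "@opentelemetry/instrumentation-dns": { enabled: false },
      "@opentelemetry/instrumentation-net": { enabled: false },
      "@opentelemetry/instrumentation-express": { enabled: false },
      "@opentelemetry/instrumentation-router": { enabled: false },
      "@opentelemetry/instrumentation-http": {
        // Returning true means no span is created for this outgoing request.
        ignoreOutgoingRequestHook: isZitadelJwksFetch,
      },
    }),
  ],
});

sdk.start();

// Flush buffered spans before the process exits.
process.on("SIGTERM", () => {
  sdk.shutdown().finally(() => process.exit(0));
});
```

In main.ts, import './instrumentation'; then goes above every other import, so the SDK is running before NestJS pulls in http and friends.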

The setup also configures a BatchSpanProcessor with an explicit queue size, batch size, flush interval, and export timeout, plus propagators, which read trace context from incoming request headers and write it to outgoing ones.

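A propagator is the component responsible for serializing and deserializing context data so it can be transmitted across network boundaries to remote services. The TextMapPropagator interface is defined in @opentelemetry/api, and the standard W3C implementations (W3CTraceContextPropagator, W3CBaggagePropagator, CompositePropagator) ship in @opentelemetry/core.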
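For piper-tts-rest-api, here is a sketch of an ESM src/instrumentation.ts with HttpInstrumentation, an explicitly configured BatchSpanProcessor and explicit propagators. It assumes a recent @opentelemetry/sdk-node that accepts a spanProcessors array, and a start script along the lines of node --import ./dist/instrumentation.js dist/server.js (both paths are assumptions). The numbers shown are the SDK's defaults, not the values used in the real service.

```typescript
// Sketch of src/instrumentation.ts for an ESM service, preloaded with `node --import`.
import { NodeSDK } from "@opentelemetry/sdk-node";
import { HttpInstrumentation } from "@opentelemetry/instrumentation-http";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { BatchSpanProcessor } from "@opentelemetry/sdk-trace-base";
import {
  CompositePropagator,
  W3CBaggagePropagator,
  W3CTraceContextPropagator,
} from "@opentelemetry/core";

const sdk = new NodeSDK({
  serviceName: "piper-tts-rest-api",
  instrumentations: [new HttpInstrumentation()],
  spanProcessors: [
    new BatchSpanProcessor(new OTLPTraceExporter(), {
      maxQueueSize: 2048, // spans buffered in memory; extra spans are dropped
      maxExportBatchSize: 512, // spans sent per export request
      scheduledDelayMillis: 5000, // flush interval
      exportTimeoutMillis: 30000, // abandon an export after this long
    }),
  ],
  // traceparent/tracestate plus baggage; must match what smart-novel injects.
  textMapPropagator: new CompositePropagator({
    propagators: [new W3CTraceContextPropagator(), new W3CBaggagePropagator()],
  }),
});

sdk.start();
```

NodeSDK already defaults to W3C Trace Context plus Baggage, so spelling out the CompositePropagator mainly makes the contract with smart-novel explicit.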

smart-novel <> piper-tts-rest-api

On the caller side, @opentelemetry/instrumentation-http in smart-novel injects the traceparent header into every outgoing request; on the callee side, @opentelemetry/instrumentation-http in piper-tts-rest-api extracts traceparent from the incoming request.
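With instrumentation-http loaded you never write this yourself, but a hand-rolled equivalent shows what it does on each side of the hop: propagation.inject writes the active context into outgoing headers, and propagation.extract reads it back so the callee's span becomes a child. This sketch uses only @opentelemetry/api and assumes an SDK is registered elsewhere (without one the API is a no-op); the host, port, /synthesize path and span names are hypothetical. The same manual injection is what you would reach for with a client that instrumentation-http does not patch, such as Node's built-in fetch, which is built on undici.

```typescript
import * as http from "node:http";
import { context, propagation, trace, SpanKind } from "@opentelemetry/api";

// Hypothetical tracer name; any string works.
const tracer = trace.getTracer("manual-propagation-sketch");

// Caller side (smart-novel's role): start a CLIENT span and inject its context into the headers.
export function callPiper(body: string): Promise<number> {
  return tracer.startActiveSpan("POST /synthesize", { kind: SpanKind.CLIENT }, (span) => {
    const headers: Record<string, string> = { "content-type": "application/json" };
    // Writes traceparent (plus tracestate/baggage when present) via the registered propagator.
    propagation.inject(context.active(), headers);
    return new Promise<number>((resolve, reject) => {
      const req = http.request(
        { host: "localhost", port: 5000, path: "/synthesize", method: "POST", headers },
        (res) => {
          res.resume(); // drain the body; only the status code matters here
          span.end();
          resolve(res.statusCode ?? 0);
        },
      );
      req.on("error", (err) => {
        span.recordException(err);
        span.end();
        reject(err);
      });
      req.end(body);
    });
  });
}

// Callee side (piper-tts-rest-api's role): extract the caller's context, start a child SERVER span.
export const server = http.createServer((req, res) => {
  const parentCtx = propagation.extract(context.active(), req.headers);
  tracer.startActiveSpan("POST /synthesize", { kind: SpanKind.SERVER }, parentCtx, (span) => {
    res.end("ok");
    span.end();
  });
});
```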

💡 Tip

I created a custom "tts.synthesize" span to keep the synthesis work separate from request handling; for example, outside of that span I validate the incoming request body and return a 400 error if it is invalid.

You can do the same in smart-novel, for example by adding attributes to the active span.
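A sketch of the tts.synthesize pattern using only @opentelemetry/api: validation runs before the span starts, so a malformed body short-circuits with a 400 and never produces a synthesis span. The handler shape, the parseBody rules, the synthesize callback and the attribute names are all hypothetical. On the smart-novel side, trace.getActiveSpan()?.setAttribute(...) adds attributes to whatever span is currently active.

```typescript
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("piper-tts-rest-api");

// Hypothetical request shape; the real service's fields may differ.
export interface SynthesizeBody {
  text: string;
  voice?: string;
}

// Hypothetical validator: null means "respond with 400".
function parseBody(raw: unknown): SynthesizeBody | null {
  if (typeof raw !== "object" || raw === null) return null;
  const { text, voice } = raw as { text?: unknown; voice?: unknown };
  if (typeof text !== "string" || text.trim() === "") return null;
  if (voice === undefined) return { text };
  if (typeof voice !== "string") return null;
  return { text, voice };
}

export async function handleSynthesize(
  raw: unknown,
  synthesize: (body: SynthesizeBody) => Promise<Buffer>, // hypothetical TTS call
): Promise<{ status: number; body: Buffer | string }> {
  // Validation happens outside the span: invalid requests never produce a tts.synthesize span.
  const body = parseBody(raw);
  if (!body) return { status: 400, body: "invalid request body" };

  return tracer.startActiveSpan("tts.synthesize", async (span) => {
    span.setAttribute("tts.text_length", body.text.length);
    if (body.voice) span.setAttribute("tts.voice", body.voice);
    try {
      const audio = await synthesize(body);
      span.setAttribute("tts.audio_bytes", audio.length);
      return { status: 200, body: audio };
    } catch (err) {
      // Mark the span as failed so error-based tail sampling keeps the trace.
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: String(err) });
      throw err;
    } finally {
      span.end();
    }
  });
}
```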

x-trace-id Header

I add the trace ID to every response, which is great for debugging. In smart-novel this is handled by trace-id.interceptor.ts, and in piper-tts-rest-api I add the header manually.
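A sketch of the manual variant for a plain Node service, assuming @opentelemetry/api and an SDK registered in instrumentation.ts. setTraceIdHeader is a hypothetical helper: it reads the trace ID from the active span, which inside an http request handler is the server span started by instrumentation-http, so it has to run before the response headers are sent.

```typescript
import { trace, isSpanContextValid, type Span } from "@opentelemetry/api";

// Anything with setHeader, e.g. node:http's ServerResponse.
export interface HeaderSink {
  setHeader(name: string, value: string): unknown;
}

// Hypothetical helper: copies the current trace ID into an x-trace-id response header.
// Call it before writeHead()/end(), while headers can still be set.
export function setTraceIdHeader(
  res: HeaderSink,
  span: Span | undefined = trace.getActiveSpan(),
): string | undefined {
  const spanContext = span?.spanContext();
  if (!spanContext || !isSpanContextValid(spanContext)) return undefined;
  res.setHeader("x-trace-id", spanContext.traceId);
  return spanContext.traceId;
}
```

Usage inside the server is a single call at the top of the handler, e.g. http.createServer((req, res) => { setTraceIdHeader(res); ... }), and the value can then be pasted straight into Jaeger's trace search.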

🛈 x-trace-id vs traceparent:

x-trace-id is intentionally separate from the W3C traceparent header. traceparent is reserved for service-to-service propagation and is set by the OTel propagators themselves, whilst x-trace-id is meant purely for debugging.

Two Small Ergonomic Touches I'm Proud of

  1. Renaming the GraphQL root span: instrumentation-http names every server span POST /graphql, which is useless when you have 30 mutations and queries. So I wrote a tiny Apollo plugin (graphql-span-rename.plugin.ts) that renames the active root span the moment Apollo finishes parsing and validating the operation. That is why the top of the screenshot says mutation GenerateChapterAudio instead of POST /graphql.
  2. Tail sampling at the OpenTelemetry Collector: the Collector sitting between the app and Jaeger uses the tail_sampling processor to:
    • Always keep error traces, regardless of what else is going on.
    • Drop known-noisy operations (health checks, introspection queries) using the graphql.operation.name attribute that the plugin above sets.
    • Probabilistically sample the rest, currently at 100% in the dev environment; in production we can tone it down.
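For the first touch, here is a sketch of such a rename plugin against Apollo Server 4's plugin interface (@apollo/server) and @opentelemetry/api. It assumes the HTTP server span is the active span when requestDidStart runs (plausible with the express instrumentation disabled, but worth verifying in your setup); rootSpanName and its "anonymous" fallback are illustrative choices, not necessarily what graphql-span-rename.plugin.ts does. In NestJS you would typically wrap this in a class decorated with @Plugin() from @nestjs/apollo.

```typescript
import type { ApolloServerPlugin } from "@apollo/server";
import { trace } from "@opentelemetry/api";

// Illustrative naming scheme: "<operation type> <operation name>", e.g. "mutation GenerateChapterAudio".
export function rootSpanName(operationType: string, operationName: string | null): string {
  return `${operationType} ${operationName ?? "anonymous"}`;
}

export const graphqlSpanRenamePlugin: ApolloServerPlugin = {
  async requestDidStart() {
    // Assumption: the HTTP server span from instrumentation-http is active at this point.
    const rootSpan = trace.getActiveSpan();
    return {
      // Runs once Apollo has parsed and validated the document and resolved the operation.
      async didResolveOperation({ operation, operationName }) {
        if (!rootSpan) return;
        rootSpan.updateName(rootSpanName(operation.operation, operationName));
        if (operationName) {
          // The attribute a Collector tail-sampling policy can match on.
          rootSpan.setAttribute("graphql.operation.name", operationName);
        }
      },
    };
  },
};
```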
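For the second touch, the real policies live in the Collector's YAML (the tail_sampling processor offers policy types such as status_code, string_attribute and probabilistic). As a reading aid, here is a toy TypeScript model of the decision order described above; it is not the Collector's implementation, whose policy-combination rules differ in detail, and the noisy operation names are hypothetical. What it does show is that the decision is made per trace, once all of its spans have arrived, which is what lets an error deep inside piper-tts-rest-api keep the whole trace.

```typescript
// Toy model of the policies described above; not the real tail_sampling processor.
export interface SpanSummary {
  status: "ok" | "error" | "unset";
  attributes: Record<string, string>;
}

export type Decision = "keep" | "drop";

// Hypothetical operation names; use whatever your health checks and introspection queries are called.
const NOISY_OPERATIONS = new Set(["IntrospectionQuery", "HealthCheck"]);

// Decide once per trace, after the spans from every service have been buffered.
export function decideTrace(
  spans: SpanSummary[],
  sampleRate: number, // 1.0 in dev, lower in production
  rand: () => number = Math.random,
): Decision {
  // 1. Always keep traces that contain an error anywhere.
  if (spans.some((s) => s.status === "error")) return "keep";
  // 2. Drop known-noisy GraphQL operations by operation name.
  const isNoisy = spans.some((s) => {
    const op = s.attributes["graphql.operation.name"];
    return op !== undefined && NOISY_OPERATIONS.has(op);
  });
  if (isNoisy) return "drop";
  // 3. Probabilistically sample everything else.
  return rand() < sampleRate ? "keep" : "drop";
}
```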

Visualization

smart-novel => piper-tts-rest-api

Putting Everything Together

If you're trying to replicate this in your own stack, here's the minimal checklist:

  1. Start the OTel SDK before anything else loads.
  2. Configure the same propagators on both sides so each service can decode what the other encodes.
  3. Do NOT auto-instrument everything (although I have done exactly that ATM with Prisma 😅).
  4. Echo the trace ID back to clients (x-trace-id response header). Future-you will be very, very grateful.
