When a request crosses a service boundary, the most useful question you can answer in production is: "What did this one call actually do?" Not "what did service A do" and then "what did service B do" in two separate windows. What you want to see is one trace, one timeline, and everything in the same tab.
That is exactly what I did in my side project: a single GraphQL mutation GenerateChapterAudio in my smart-novel backend (NestJS) calls out over HTTP to my piper-tts-rest-api service (plain Node), and Jaeger renders both services as branches of the same trace.
The two yellow spans are children of the smart-novel POST span in the parent service (smart-novel). That parent/child relationship across a network hop is the whole point of distributed tracing.
The Mental Model in One Sentence
OpenTelemetry propagates trace context in a tiny HTTP header called traceparent, standardised by the W3C Trace Context specification. If smart-novel adds it to the outgoing request and piper-tts-rest-api extracts it from the incoming request, both services emit spans that share the same trace_id, and piper-tts-rest-api's spans are listed under smart-novel's spans. Jaeger then knows how to draw the tree you saw.
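To make the header concrete, here is a minimal sketch of what a traceparent value contains. It is not the OTel propagator's code; the `TraceParent` type and `parseTraceparent` function are hypothetical names for illustration, and the sketch only accepts the exact four-field layout of version 00.

```typescript
// Layout of a W3C Trace Context `traceparent` header value:
//   <version>-<trace-id>-<parent-id>-<trace-flags>
//   2 hex     32 hex     16 hex      2 hex
interface TraceParent {
  version: string;
  traceId: string; // shared by every span in the trace
  parentSpanId: string; // the caller's span, which becomes the callee's parent
  sampled: boolean; // lowest bit of trace-flags
}

const TRACEPARENT_PATTERN =
  /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceParent | null {
  const match = TRACEPARENT_PATTERN.exec(header.trim());
  if (match === null) return null;

  const [, version, traceId, parentSpanId, flags] = match;

  // Version "ff" and all-zero IDs are invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) {
    return null;
  }

  return {
    version,
    traceId,
    parentSpanId,
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
  };
}

// Example value taken from the W3C specification.
const exampleContext = parseTraceparent(
  '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01',
);
```

Everything Jaeger needs to stitch the two services together is in those two IDs: the trace ID says which tree a span belongs to, and the parent ID says where in the tree it hangs.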
Bootstrapping the OpenTelemetry SDK
OpenTelemetry's auto-instrumentation works by monkey-patching libraries. That means the SDK must start before any of those modules are loaded; otherwise your code ends up holding a reference to the original, unpatched function and you get no spans at all.
For that we have two options:
- As I do in smart-novel, import `apps/backend/src/instrumentation.ts` as the first import in the entrypoint file, which in our NestJS app is `main.ts`.
  - `OTLPTraceExporter` exports OTel data over HTTP.
  - `getNodeAutoInstrumentations(...)` disables noisy instrumentations:
    - `fs`, `dns`, `net`, `express`, `router`.
    - Silences the optional `winston-transport` warning.
    - JWKS fetches to Zitadel are filtered out at the source via `ignoreOutgoingRequestHook`.
- Alternatively, as in piper-tts-rest-api, keep a `src/instrumentation.ts` and load it with `--import` in the start script in `package.json`.
  - `HttpInstrumentation` enables automatic tracing for Node's HTTP module.
  - piper-tts-rest-api is ESM, and ESM is a different beast. An `import './instrumentation.js'` at the top of `server.ts` is not enough, because ESM hoists every `import` in a module to the top of its evaluation phase. By the time `instrumentation.js`'s body runs, `node:http` has already been resolved and its exports captured.
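The ordering requirement is easier to see in a toy model. The sketch below does not use OpenTelemetry at all: `fakeHttp` and `instrument` are hypothetical stand-ins for a library module and an instrumentation, and the only point is what happens to a function reference captured before the patch is applied.

```typescript
// Stand-in for a library module such as node:http (hypothetical).
const fakeHttp = {
  request(url: string): string {
    return `response from ${url}`;
  },
};

const recordedSpans: string[] = [];

// What an instrumentation does in essence: swap the function on the
// module object for a wrapper that records a span and then delegates.
function instrument(mod: { request: (url: string) => string }): void {
  const original = mod.request;
  mod.request = (url: string): string => {
    recordedSpans.push(`GET ${url}`); // a real one would start and end a span
    return original(url);
  };
}

// Application code that captured the function BEFORE the patch.
const earlyRequest = fakeHttp.request;

instrument(fakeHttp);

// Application code that looks the function up AFTER the patch.
const lateRequest = fakeHttp.request;

earlyRequest('http://piper-tts/early'); // bypasses the wrapper: no span
lateRequest('http://piper-tts/late'); // goes through the wrapper: one span

// recordedSpans now holds a single entry: 'GET http://piper-tts/late'
```

Loading the instrumentation first is simply a way of guaranteeing that every lookup in the application happens after the patch.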
Spans are exported through a `BatchSpanProcessor` configured with an explicit queue size, batch size, flush interval, and export timeout. Propagators, in turn, read and write trace context from request headers.
A propagator is a component within the `@opentelemetry/core` package responsible for serializing and deserializing context data so it can be transmitted across network boundaries to remote services.
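Pulling the pieces of this section together, an `instrumentation.ts` could look roughly like the sketch below. This is not the file from either repository. The collector URL, the Zitadel hostname, and all numeric values are assumptions for illustration, and the `spanProcessors` option assumes a reasonably recent `@opentelemetry/sdk-node`.

```typescript
// instrumentation.ts (sketch): must be loaded before any other app module.
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

// Assumption: an OTel Collector listening on the default OTLP/HTTP port.
const exporter = new OTLPTraceExporter({
  url: 'http://localhost:4318/v1/traces',
});

const sdk = new NodeSDK({
  serviceName: 'smart-novel',
  spanProcessors: [
    new BatchSpanProcessor(exporter, {
      maxQueueSize: 2048, // spans buffered before new ones are dropped
      maxExportBatchSize: 512, // spans sent per export call
      scheduledDelayMillis: 5000, // flush interval
      exportTimeoutMillis: 30000, // give up on a single export after this
    }),
  ],
  instrumentations: [
    getNodeAutoInstrumentations({
      // Switch off the noisy instrumentations.
      '@opentelemetry/instrumentation-fs': { enabled: false },
      '@opentelemetry/instrumentation-dns': { enabled: false },
      '@opentelemetry/instrumentation-net': { enabled: false },
      '@opentelemetry/instrumentation-express': { enabled: false },
      '@opentelemetry/instrumentation-router': { enabled: false },
      '@opentelemetry/instrumentation-http': {
        // Hypothetical hostname: skip spans for JWKS fetches to the IdP.
        ignoreOutgoingRequestHook: (request) =>
          request.hostname === 'zitadel.example.com',
      },
    }),
  ],
});

sdk.start();

// Flush buffered spans before the process exits.
process.on('SIGTERM', () => {
  void sdk.shutdown();
});
```

With no `textMapPropagator` set, the SDK falls back to its default propagators, which include W3C Trace Context, so both services speak traceparent without extra configuration.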
smart-novel <> piper-tts-rest-api
@opentelemetry/instrumentation-http injects the traceparent header into every outgoing request on the caller side (smart-novel); the same instrumentation in piper-tts-rest-api then extracts traceparent from the incoming request.
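The hop can be simulated in a few lines without any OTel packages. This is a toy model of inject and extract, not what instrumentation-http actually runs; `SpanRecord`, `injectTraceparent`, and `startServerSpan` are hypothetical names, and the IDs come from `Math.random`, which is fine for a demo only.

```typescript
interface SpanRecord {
  service: string;
  traceId: string;
  spanId: string;
  parentSpanId?: string;
}

// Demo-only ID generator: random lowercase hex of the given length.
function randomHex(length: number): string {
  let out = '';
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// Caller side: write the current span's identity into the outgoing headers.
function injectTraceparent(
  span: SpanRecord,
  headers: Record<string, string>,
): void {
  headers['traceparent'] = `00-${span.traceId}-${span.spanId}-01`;
}

// Callee side: read the header and start a span that continues the trace.
function startServerSpan(
  service: string,
  headers: Record<string, string>,
): SpanRecord {
  const parts = (headers['traceparent'] ?? '').split('-');
  if (parts.length !== 4) {
    // No usable context: this request becomes the root of a new trace.
    return { service, traceId: randomHex(32), spanId: randomHex(16) };
  }
  return {
    service,
    traceId: parts[1], // same trace as the caller
    spanId: randomHex(16), // fresh ID for this span
    parentSpanId: parts[2], // the caller's span becomes the parent
  };
}

const callerSpan: SpanRecord = {
  service: 'smart-novel',
  traceId: randomHex(32),
  spanId: randomHex(16),
};
const outgoingHeaders: Record<string, string> = {};
injectTraceparent(callerSpan, outgoingHeaders);

const calleeSpan = startServerSpan('piper-tts-rest-api', outgoingHeaders);
// calleeSpan shares callerSpan's traceId and points at it as its parent.
```

If the header is missing, the callee still produces spans, but they land in a separate trace, which is exactly the "two windows" situation this setup avoids.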
💡 Tip
I created a "tts.synthesize" span to separate concerns, e.g. outside of the new span I can validate the incoming request body and return a 400 error if it is invalid.
You can do the same in smart-novel, e.g. adding attributes to the span.
x-trace-id Header
I add the trace ID to the response, which is great for debugging. In smart-novel this is handled by trace-id.interceptor.ts; in piper-tts-rest-api I add it manually.
`x-trace-id` vs `traceparent`:
`x-trace-id` is intentionally separate from the W3C `traceparent` header. `traceparent` is reserved for service-to-service propagation and is set by the OTel propagators themselves, whilst `x-trace-id` is meant to be used for debugging purposes.
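Adding the header by hand can be as small as the sketch below. It is not the code from either service: `HeaderWritable` and `attachTraceIdHeader` are hypothetical names, and the trace ID is passed in so that the example runs without OTel installed.

```typescript
// Minimal structural type; Node's http.ServerResponse satisfies it.
interface HeaderWritable {
  setHeader(name: string, value: string): unknown;
}

// An all-zero trace ID means "no valid trace".
const INVALID_TRACE_ID = '0'.repeat(32);

// Returns true when the header was written.
function attachTraceIdHeader(
  res: HeaderWritable,
  traceId: string | undefined,
): boolean {
  if (traceId === undefined || traceId === INVALID_TRACE_ID) return false;
  res.setHeader('x-trace-id', traceId);
  return true;
}

// Usage with a fake response object.
const sentHeaders: Record<string, string> = {};
const fakeResponse: HeaderWritable = {
  setHeader(name: string, value: string): void {
    sentHeaders[name] = value;
  },
};
const headerWritten = attachTraceIdHeader(
  fakeResponse,
  '4bf92f3577b34da6a3ce929d0e0e4736',
);
```

In a real handler the ID would typically come from `trace.getActiveSpan()?.spanContext().traceId` in `@opentelemetry/api`, and the call has to happen before the response headers are flushed.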
Two Small Ergonomic Touches I'm Proud of
- Renaming the GraphQL root span: `instrumentation-http` names every server span `POST /graphql`, which is useless when you have 30 mutations and queries. So I wrote a tiny Apollo plugin that renames the active root span the moment Apollo finishes parsing/validating the operation (`graphql-span-rename.plugin.ts`). That is why the top of the screenshot says `mutation GenerateChapterAudio` instead of `POST /graphql`.
- Tail-sampling at the OpenTelemetry Collector: the OpenTelemetry Collector between the app and Jaeger is configured to use `tail_sampling`, so that we:
  - Always keep error traces, regardless of what else is going on.
  - Drop known-noisy operations (health checks, introspection queries) using the `graphql.operation.name` attribute the plugin above sets.
  - Probabilistically sample the rest, currently at 100% for the dev environment; for production we can tone it down.
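A span-renaming plugin along the lines of the first item could be sketched as follows. This is not the contents of `graphql-span-rename.plugin.ts`: the names `renameSpanForOperation` and `createSpanRenamePlugin` are hypothetical, the span accessor is injected so the example has no dependencies, and the hook names assume the Apollo Server plugin API (`requestDidStart` returning a `didResolveOperation` listener).

```typescript
// The two span methods the plugin needs; OTel's Span satisfies this shape.
interface RenamableSpan {
  updateName(name: string): unknown;
  setAttribute(key: string, value: string): unknown;
}

// The subset of Apollo's request context this sketch reads.
interface ResolvedOperationContext {
  operationName?: string | null;
  operation?: { operation: string } | null; // 'query' | 'mutation' | ...
}

// Renames the span to e.g. "mutation GenerateChapterAudio" and records the
// operation name as an attribute for later filtering. Returns the new name.
function renameSpanForOperation(
  span: RenamableSpan | undefined,
  ctx: ResolvedOperationContext,
): string | null {
  const operationType = ctx.operation?.operation;
  if (span === undefined || !operationType) return null;

  const operationName = ctx.operationName ?? 'anonymous';
  const spanName = `${operationType} ${operationName}`;
  span.updateName(spanName);
  span.setAttribute('graphql.operation.name', operationName);
  return spanName;
}

// Plugin factory; getActiveSpan decides which span gets renamed.
function createSpanRenamePlugin(
  getActiveSpan: () => RenamableSpan | undefined,
) {
  return {
    async requestDidStart() {
      return {
        async didResolveOperation(
          ctx: ResolvedOperationContext,
        ): Promise<void> {
          renameSpanForOperation(getActiveSpan(), ctx);
        },
      };
    },
  };
}
```

Wiring it up would look something like `plugins: [createSpanRenamePlugin(() => trace.getActiveSpan())]`, with `trace` imported from `@opentelemetry/api`. Which span counts as "active" inside the hook depends on which instrumentations are enabled, so it is worth checking in Jaeger that the rename lands on the span you expect.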
Visualization
Putting Everything Together
If you're trying to replicate this in your own stack, here's the minimal checklist:
- Start the OTel SDK before anything else loads.
- Use and configure the same propagators on both sides so they can decode and encode the messages.
- Do NOT auto-instrument everything (although I have done exactly that ATM with Prisma).
- Echo the trace ID back to clients (`x-trace-id` response header). Future-you will be very, very grateful.

