OpenTelemetry is an observability framework for generating, collecting, and exporting traces, metrics, and logs. It consists of various components, such as protocols and SDKs for many programming languages.
This article explores how OpenTelemetry accomplishes distributed tracing and delves into the underlying mechanisms that make it work.
What is Distributed Tracing?
Distributed tracing is a mechanism for following a single request as it passes through multiple servers, commonly employed in microservices architectures.
OpenTelemetry's distributed tracing primarily consists of two elements: Trace and Span.
A Span contains execution details such as timestamps or an SQL query, while a Trace is a tree structure with Spans as its nodes.
In the visual representation below, the entire figure represents a Trace, and each bar represents a Span.
It's important to note that a Trace is not a tangible entity; programs generate only Spans.
The backend assembles a Trace from the Spans it receives.
Generating Traces from Spans
Distributed tracing can be achieved when the backend constructs a Trace based on Spans.
To create a Trace based on Spans, you need to consider three key elements within the Span:
- TraceId
- SpanId
- ParentSpanId
You can find them in the OpenTelemetry proto files, specifically in trace.proto, where they appear as the fields trace_id, span_id, and parent_span_id.
- TraceId: This is the unique identifier for the entire trace. If, for instance, the TraceId is 9023c11c... in hexadecimal, any span sharing this TraceId (9023c11c...) is part of the same trace.
- SpanId: Each Span has a unique identifier within the same Trace.
- ParentSpanId: This is the identifier of the parent span. When the SpanId of one Span matches the ParentSpanId of another, the first Span is the parent of the second.
By combining these elements, we can construct a Trace tree.
Let's illustrate this with an example:
In this example, observe that the ParentSpanId of each child Span matches the SpanId of its parent, while different TraceIds give rise to different trees.
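The same reconstruction can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular backend is implemented: the Span class, the build_traces and render helpers, and the shortened IDs are all made up for the example (real TraceIds are 16 bytes and SpanIds are 8 bytes), and a root span is marked here with a ParentSpanId of None.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical, stripped-down span holding only the three linking fields."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None marks a root span in this sketch
    name: str


def build_traces(spans):
    """Group spans by trace_id, then index each group by parent_span_id."""
    traces = defaultdict(lambda: defaultdict(list))
    for span in spans:
        traces[span.trace_id][span.parent_span_id].append(span)
    return traces


def render(children, parent_id=None, depth=0):
    """Return the subtree under parent_id as indented lines, depth-first."""
    lines = []
    for span in children.get(parent_id, []):
        lines.append("  " * depth + f"{span.name} ({span.span_id})")
        lines.extend(render(children, span.span_id, depth + 1))
    return lines


# Spans arrive as a flat list; the IDs are shortened, made-up values.
spans = [
    Span("trace-A", "a1", None, "GET /checkout"),
    Span("trace-A", "b2", "a1", "auth-service"),
    Span("trace-A", "c3", "a1", "order-service"),
    Span("trace-A", "d4", "c3", "SQL INSERT"),
    Span("trace-B", "e5", None, "GET /health"),
]

traces = build_traces(spans)
for trace_id, children in traces.items():
    print(f"Trace {trace_id}")
    print("\n".join(render(children, depth=1)))
```

Nothing in the input says which spans belong together; the two trees emerge purely from matching TraceId and ParentSpanId values, which is the point of the three fields.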
Propagation in Distributed Tracing
As the previous section showed, a Trace can be constructed from the elements in its Spans.
Now, the question arises: How does a program determine its own TraceId and ParentSpanId?
While sharing a TraceId within the same process can be achieved through memory, distributed tracing in a multi-machine environment necessitates a more sophisticated approach.
This is where Propagation becomes essential.
The idea is simple: "Pass your TraceId and SpanId to other machines somehow."
For instance, when making an HTTP request, you can include these identifiers in the headers.
For historical reasons, there are several formats for carrying this information in HTTP headers, but I'll focus on the one standardized by the W3C, which has gained adoption in the industry.
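Before getting to a concrete header format, the idea itself can be sketched without any network at all. In the snippet below, plain dictionaries stand in for HTTP headers, and the header names x-demo-trace-id and x-demo-span-id are hypothetical names invented for this illustration, not part of any standard. The start_span and outgoing_headers helpers are likewise assumptions of the sketch.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

# Hypothetical header names, used only for this illustration.
TRACE_HEADER = "x-demo-trace-id"
SPAN_HEADER = "x-demo-span-id"


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]


def start_span(name, incoming_headers):
    """Continue the caller's trace if its headers are present, else start a new one."""
    trace_id = incoming_headers.get(TRACE_HEADER) or secrets.token_hex(16)
    parent_span_id = incoming_headers.get(SPAN_HEADER)
    # Every span gets its own fresh 8-byte SpanId.
    return Span(name, trace_id, secrets.token_hex(8), parent_span_id)


def outgoing_headers(span):
    """Headers the current span attaches to the requests it sends downstream."""
    return {TRACE_HEADER: span.trace_id, SPAN_HEADER: span.span_id}


# Service A receives a request without trace headers, so it starts a new trace.
span_a = start_span("service-a", {})

# Service A calls service B and passes its identifiers along.
span_b = start_span("service-b", outgoing_headers(span_a))

print(span_a)
print(span_b)
```

Service B never talks to a central registry: it learns its TraceId and ParentSpanId only from what service A put in the headers.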
W3C Trace Context
The format of the W3C Trace Context is specified by the W3C, as indicated by its name.
When using W3C Trace Context, the trace information travels in the traceparent HTTP request header.
The header follows the format ${version}-${trace-id}-${parent-id}-${trace-flags}.
When using curl, it looks like the example below:
curl \
-H "traceparent:00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" \
localhost
This example demonstrates how the traceparent field is included in an HTTP request header, conveying essential trace information to the destination:
- Version: 00 (currently, there is no version other than 00)
- TraceId: 4bf92f3577b34da6a3ce929d0e0e4736
- ParentSpanId: 00f067aa0ba902b7
- Flags: 01 at the end; its lowest bit indicates whether the request is being sampled (01 means sampled).
This header enables the sharing of Trace information with another service.
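On the receiving side, a service has to split this header back into its fields, and when it calls the next service it sends a new traceparent with the same TraceId but its own SpanId in the parent-id position. The following is a minimal sketch of both steps, assuming only version 00 headers; parse_traceparent, child_traceparent, and the TraceParent type are names made up for this example, and in practice the OpenTelemetry SDKs handle this for you.

```python
import re
import secrets
from typing import NamedTuple

# version-traceid-parentid-flags, each part lowercase hex of fixed length.
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})"
    r"-(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: str

    @property
    def sampled(self):
        # The lowest bit of trace-flags is the sampled flag.
        return bool(int(self.flags, 16) & 0x01)


def parse_traceparent(value):
    """Return a TraceParent, or None if the header is malformed."""
    match = TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    tp = TraceParent(**match.groupdict())
    # Simplification: this sketch only accepts version 00.
    if tp.version != "00":
        return None
    # All-zero IDs are not valid identifiers.
    if set(tp.trace_id) == {"0"} or set(tp.parent_id) == {"0"}:
        return None
    return tp


def child_traceparent(parent, span_id=None):
    """Build the header for an outgoing call made by the current span."""
    # The current span's own ID goes in the parent-id slot.
    span_id = span_id or secrets.token_hex(8)
    return f"{parent.version}-{parent.trace_id}-{span_id}-{parent.flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
tp = parse_traceparent(incoming)
print(tp, tp.sampled)
print(child_traceparent(tp))
```

The TraceId and flags are copied through unchanged, so every service along the request path reports Spans under the same Trace and makes the same sampling decision.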
Conclusion
In conclusion, despite the formidable name of distributed tracing, it essentially boils down to passing data (TraceId, SpanId, ParentSpanId).
That's the way OpenTelemetry facilitates distributed tracing.