In my last blog post, I showed y’all how to instrument Python code with OpenTelemetry (OTel), à la auto-instrumentation. You may also recall from that post that I recommended using the Python auto-instrumentation binary even for non-auto-instrumented libraries, because it abstracts all that pesky OTel config stuff so nicely. When you use it, along with any applicable Python auto-instrumentation libraries (installed courtesy of opentelemetry-bootstrap), it takes care of context propagation across related services for you.
All in all, it makes life nice ‘n easy for us!
Well, today, my friends, we’re going to torture ourselves a weeeee bit, because we’re going to put that auto-instrumentation binary aside, and will instead dig into super-duper manual OpenTelemetry instrumentation for Python. Since we don’t have auto-instrumentation as our security blanket, we will have to learn how to do the following:
- Configure OpenTelemetry for Python to send instrumentation data to an Observability back-end that supports OTLP. Spoiler alert: we’ll be using Lightstep as our Observability back-end. ✅
- Propagate context across related services so that they show up as part of the same trace ✅
Note: I won’t go into how to create Spans with OTel for Python, since the official OTel docs do a mighty fine job of it.
Are you scared? Well don’t be, because I’ve figured it all out so that you don’t have to!
Are you readyyyyy? Let’s do this!!
Prerequisites
Before we start our tutorial, here are some things that you’ll need:
- A basic understanding of Python and Python virtual environments
- A basic understanding of OpenTelemetry. I suggest checking out the official OTel Docs for a refresher, if you need one.
If you’d like to run the full code examples in Part 2, you’ll also need:
- A Lightstep Observability account
- A Lightstep Access Token to tell Lightstep what project to send your traces to
- A basic understanding of how to use Lightstep Observability
- A working installation of Python
Part 1: What’s Happening?
We’ll be illustrating manual Python instrumentation with OpenTelemetry using a client app and a server app. The client will call a /ping endpoint hosted by the server.
The example in this tutorial can be found in the lightstep/opentelemetry-examples repo. We will be working with three main files:
- common.py - OTel configuration and connectivity (to connect to Lightstep)
- client.py - Connect to our server’s /ping endpoint
- server.py - Host the /ping endpoint
Before we run the example code, we must first understand what it’s doing.
1- OTel Libraries
In order to send OpenTelemetry data to an Observability back-end (e.g. Lightstep), you need to install the following OpenTelemetry packages, which are included in requirements.txt:
opentelemetry-api
opentelemetry-sdk
opentelemetry-exporter-otlp-proto-grpc
As you can see, we’re installing the OpenTelemetry API and SDK packages, along with opentelemetry-exporter-otlp-proto-grpc, which is used to send OTel data to your Observability back-end (e.g. Lightstep) via gRPC.
2- OTel Setup and Configuration (common.py)
In our example, OTel setup and configuration is done in common.py. We split things out into this separate file so that we don’t have to duplicate this code in client.py and server.py.
First, we must import the required packages (including os, which we use below to read environment variables):
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
Next, we must configure the Exporter. An Exporter is how we send OpenTelemetry data to an Observability back-end. As I mentioned earlier, Lightstep accepts data in the OTLP format, so we need to define an OTLP Exporter.
Note: Some vendors don’t accept data in OTLP format, which means that you will need to use a vendor-specific exporter to send data to them.
We configure our Exporter in Python like this:
def get_otlp_exporter():
    ls_access_token = os.environ.get("LS_ACCESS_TOKEN")
    return OTLPSpanExporter(
        endpoint="ingest.lightstep.com:443",
        headers=(("lightstep-access-token", ls_access_token),),
    )
Some noteworthy items:
- The endpoint is set to ingest.lightstep.com:443, which points to Lightstep’s public Microsatellite pool. If you are using an on-premise satellite pool, then check out these docs. (See the sketch just after this list for a variant that reads the endpoint from an environment variable.)
- You will need to set the LS_ACCESS_TOKEN environment variable to your own Lightstep Access Token.
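If you’d rather not hard-code the endpoint, here’s a minimal sketch of a variant of get_otlp_exporter that reads it from an environment variable and falls back to Lightstep’s public pool. The OTLP_ENDPOINT variable name is just an assumption for illustration; it’s not something the example repo defines:
import os

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter


def get_otlp_exporter():
    # Hypothetical OTLP_ENDPOINT variable; defaults to Lightstep's public Microsatellite pool
    endpoint = os.environ.get("OTLP_ENDPOINT", "ingest.lightstep.com:443")
    ls_access_token = os.environ.get("LS_ACCESS_TOKEN")
    return OTLPSpanExporter(
        endpoint=endpoint,
        headers=(("lightstep-access-token", ls_access_token),),
    )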
Finally, we configure the Tracer Provider. A TracerProvider serves as the entry point of the OpenTelemetry API. It provides access to Tracers. A Tracer is responsible for creating a Span to trace the given operation.
We configure our Tracer Provider in Python like this:
def get_tracer():
    span_exporter = get_otlp_exporter()
    provider = TracerProvider()
    if not os.environ.get("OTEL_RESOURCE_ATTRIBUTES"):
        # Service name is required for most backends
        resource = Resource(attributes={
            SERVICE_NAME: "test-py-manual-otlp"
        })
        provider = TracerProvider(resource=resource)
        print("Using default service name")
    processor = BatchSpanProcessor(span_exporter)
    provider.add_span_processor(processor)
    trace.set_tracer_provider(provider)
    return trace.get_tracer(__name__)
A few noteworthy items:
- We define a Resource to provide OpenTelemetry with information that identifies our service, including service name and service version. (You can see a full list of Resource attributes that you can set here.) As the name implies, service name is the name of the microservice that you are instrumenting, and service version is the version of that service. In this example, the service name and service version are passed in as key/value pairs via the environment variable OTEL_RESOURCE_ATTRIBUTES (we’ll see some example values in Part 2). If that environment variable is not present, we set a default service name, "test-py-manual-otlp".
- We are using the BatchSpanProcessor, which means that we are telling OTel to export the data in batches. For the purposes of this example, we’re not doing anything beyond a basic configuration (see the sketch below for what a more tuned setup might look like).
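As an aside, here’s a minimal sketch of what a slightly more tuned setup could look like, assuming you also want to set the service version in code and adjust how spans are batched. The values below are just illustrative defaults, not recommendations from the example repo:
from opentelemetry.sdk.resources import SERVICE_NAME, SERVICE_VERSION, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Identify the service explicitly in code (example values)
resource = Resource(attributes={
    SERVICE_NAME: "test-py-manual-otlp",
    SERVICE_VERSION: "0.1.0",
})
provider = TracerProvider(resource=resource)

# Tune how the BatchSpanProcessor buffers and flushes spans
processor = BatchSpanProcessor(
    get_otlp_exporter(),         # the exporter defined earlier in common.py
    max_queue_size=2048,         # spans held in memory before new ones are dropped
    schedule_delay_millis=5000,  # how often a batch is flushed
    max_export_batch_size=512,   # spans per export request
)
provider.add_span_processor(processor)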
3- Initialization (client.py and server.py)
We’re finally ready to send data to Lightstep! All we need to do is call common.py’s get_tracer function from client.py (Lines 17-20) and server.py (Lines 17 and 29), like this:
from common import get_tracer
...
tracer = get_tracer()
...
4- Instrumentation (client.py and server.py)
With initialization done, we need to instrument our code, which means that we’ll need to create Spans. I won’t go into the specifics of Span creation here, since the OTel docs do a pretty good job of it, and as I mentioned in the intro, it’s outside of the scope of this post.
I will, however, briefly mention that there are a couple of ways to instrument our code in Python, and you’ll see both ways of Span creation in the example code: using the with statement, and using function decorators.
You can see an example of creating a Span using the with statement in client.py, Lines 23-32. Below is the full function listing:
def send_requests(url):
    with tracer.start_as_current_span("client operation"):
        try:
            carrier = {}
            TraceContextTextMapPropagator().inject(carrier)
            header = {"traceparent": carrier["traceparent"]}
            res = requests.get(url, headers=header)
            print(f"Request to {url}, got {len(res.content)} bytes")
        except Exception as e:
            print(f"Request to {url} failed {e}")
            pass
The Span is initialized with the line with tracer.start_as_current_span("client operation"):, and everything below that line is within the scope of that Span.
You can see an example of creating a Span using a function decorator in server.py Line 78. Below is the full function listing:
@tracer.start_as_current_span("pymongo_integration")
@app.route("/pymongo/<length>")
def pymongo_integration(length):
    with tracer.start_as_current_span("server pymongo operation"):
        client = MongoClient("mongo", 27017, serverSelectionTimeoutMS=2000)
        db = client["opentelemetry-tests"]
        collection = db["tests"]
        collection.find_one()
        return _random_string(length)
A few noteworthy items:
- The line @tracer.start_as_current_span("pymongo_integration") starts the Span for the pymongo_integration function. Everything in that function is within the scope of that Span.
- You may have also noticed that we initialize another Span in there, with the line with tracer.start_as_current_span("server pymongo operation"): (server.py, Line 89). This means that we end up with nested Spans (a Span within a Span).
5- Context Propagation
As I mentioned in the intro, one of the advantages of using Python auto-instrumentation is that it takes care of context propagation across services for you. If you don’t use auto-instrumentation, however, you have to take care of context propagation yourself. Great. Just great.
But before we dig into how to do that, we need to first understand context propagation.
Definition time!
Context represents the information that correlates Spans across process boundaries.
Propagation is the means by which context is bundled and transferred in and across services, often via HTTP headers.
This means that when one service calls another, they will be linked together as part of the same Trace. If you go the pure manual instrumentation route (like we’re doing today), however, you have to make sure that your context is propagated across services that call each other; otherwise, you’ll end up with separate, unrelated (even though they should be related) Traces.
I have to admit that I was wracking my brains trying to figure out this context propagation stuff. After much time spent Googling and asking folks around here for clarification, I finally got it, so I’m going to share this piece with you here to hopefully spare you some stress.
Note: Although the OpenTelemetry documentation does provide some insight into how to do manual context propagation in Python, the documentation needs a little work. I’m actually part of the OpenTelemetry Comms SIG, so I am using this as motivation to improve the docs around this topic…stay tuned for updates to the OTel docs too! 😎
Okay, so how do we do this manual context propagation? First, let’s remind ourselves of what’s happening in our example app. We have a client service and a server service. The client service calls the /ping endpoint on the server service, which means that we expect them to be part of the same Trace. This in turn means that we have to ensure that they both have the same Trace ID in order to be seen by Lightstep (and other Observability back-ends) as being related.
At a high level, we accomplish this by:
- Getting the Trace ID of the client
- Injecting the Trace ID into the HTTP header before the client calls the server
- Extracting the client’s Trace ID from the HTTP header on the server side
Easy peasy! Now let’s look at the code we need to make this happen.
First, we need to start with something called a carrier. A carrier is just a key-value pair that carries the trace context, and it looks something like this:
{'traceparent': '00-a9c3b99a95cc045e573e163c3ac80a77-d99d251a8caecd06-01'}
Where traceparent is the key, and the value contains your Trace ID (along with the parent Span ID and some flags). Note that the above is just an example of what a traceparent might look like. Obviously, your own value will be different (and will be different each time you run the code).
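If you’re curious what those pieces are, here’s a tiny sketch (purely for illustration; it’s not part of the example repo) that splits a traceparent value into the four fields defined by the W3C Trace Context spec:
# Break a W3C traceparent header value into its four fields
traceparent = "00-a9c3b99a95cc045e573e163c3ac80a77-d99d251a8caecd06-01"
version, trace_id, parent_span_id, trace_flags = traceparent.split("-")

print(f"version:     {version}")         # spec version, currently 00
print(f"trace ID:    {trace_id}")        # shared by every Span in the Trace
print(f"parent span: {parent_span_id}")  # ID of the calling Span
print(f"trace flags: {trace_flags}")     # e.g. 01 means the trace is sampled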
Okay, great. Now how do we obtain said carrier?
First, we need to import a TraceContextTextMapPropagator in client.py:
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
Next, we must populate the carrier:
carrier = {}
TraceContextTextMapPropagator().inject(carrier)
If you were to inspect the value of carrier after this line, you would see that it looks something like this:
{'traceparent': '00-a9c3b99a95cc045e573e163c3ac80a77-d99d251a8caecd06-01'}
Look familiar? 🤯
Now that we have the carrier, we need to put it into our HTTP header before we make a call to the server.
header = {"traceparent": carrier["traceparent"]}
res = requests.get(url, headers=header)
And voilà! Your carrier is in the HTTP request!
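As a side note, you can get the same result a little more compactly by injecting straight into the headers dict you hand to requests, using the globally configured propagator from opentelemetry.propagate. A minimal sketch (equivalent in spirit, but not what the example repo does; url is assumed to be defined, as in send_requests):
import requests
from opentelemetry.propagate import inject

headers = {}
inject(headers)  # writes traceparent (and any other configured propagation headers) into the dict
res = requests.get(url, headers=headers)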
Now that we know what all of these snippets do, let’s put it all together. Here’s what our client code looks like:
def send_requests(url):
    with tracer.start_as_current_span("client operation"):
        try:
            carrier = {}
            TraceContextTextMapPropagator().inject(carrier)
            header = {"traceparent": carrier["traceparent"]}
            res = requests.get(url, headers=header)
            print(f"Request to {url}, got {len(res.content)} bytes")
        except Exception as e:
            print(f"Request to {url} failed {e}")
            pass
For the full code listing, check out client.py.
Okay…we’ve got things sorted out on the client side. Yay! Now let’s go to the server side and pluck our carrier from the HTTP request.
In server.py, we pull the value of traceparent from our header like this:
traceparent = get_header_from_flask_request(request, "traceparent")
Where we define get_header_from_flask_request as:
def get_header_from_flask_request(request, key):
    return request.headers.get_all(key)
Now we can build our carrier from this information (get_all returns a list of header values, which is why we grab the first element):
carrier = {"traceparent": traceparent[0]}
We use that to extract the context from this carrier:
ctx = TraceContextTextMapPropagator().extract(carrier)
Now we can create our Span with the context, ctx:
with tracer.start_as_current_span("/ping", context=ctx):
Here, we are passing ctx to a named parameter called context. This ensures that our "/ping" Span knows that it’s part of an existing Trace (the one originating from our client call).
It is worth noting that any child Spans of the "/ping" Span do not require us to pass in a context, since that’s passed in implicitly (see server.py, Line 81, for example).
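One thing worth flagging: if someone hits /ping directly (say, with curl), there will be no traceparent header, and traceparent[0] will raise an IndexError. Here’s a minimal sketch of a more defensive variant, under the assumption that you’d rather start a fresh Trace than fail the request; this is not what server.py actually does:
def extract_context_from_flask_request(request):
    # Return a Context built from the incoming traceparent header,
    # or None so the Span simply starts a new Trace.
    values = request.headers.get_all("traceparent")
    if not values:
        return None
    return TraceContextTextMapPropagator().extract({"traceparent": values[0]})


@app.route("/ping")
def ping():
    ctx = extract_context_from_flask_request(request)
    with tracer.start_as_current_span("/ping", context=ctx):
        ...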
Now that we know what all of these snippets do, let’s put it all together. Here’s what our server code looks like:
...
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
...
def get_header_from_flask_request(request, key):
    return request.headers.get_all(key)
...
@app.route("/ping")
def ping():
    traceparent = get_header_from_flask_request(request, "traceparent")
    carrier = {"traceparent": traceparent[0]}
    ctx = TraceContextTextMapPropagator().extract(carrier)
    with tracer.start_as_current_span("/ping", context=ctx):
        length = random.randint(1, 1024)
        redis_integration(length)
        pymongo_integration(length)
        sqlalchemy_integration(length)
        return _random_string(length)
...
For the full code listing, check out server.py.
Part 2: Try it!
Now that we know the theory behind all of this, let’s run our example!
1- Clone the repo
git clone https://github.com/lightstep/opentelemetry-examples.git
2- Setup
Let’s first start by setting up our Python virtual environment:
cd python/opentelemetry/manual_instrumentation
python3 -m venv .
source ./bin/activate
# Install requirements.txt
pip install -r requirements.txt
3- Run the Server app
We’re ready to run the server. Be sure to replace <LS_ACCESS_TOKEN> with your own Lightstep Access Token.
export LS_ACCESS_TOKEN="<LS_ACCESS_TOKEN>"
export OTEL_RESOURCE_ATTRIBUTES=service.name=py-opentelemetry-manual-otlp-server,service.version=10.10.9
python server.py
Remember how I told you that we’d see an example of values passed into OTEL_RESOURCE_ATTRIBUTES? Well, here it is! Here, we’re passing in the service name py-opentelemetry-manual-otlp-server and service version 10.10.9. The service name will show up in the Lightstep explorer.
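If you’re wondering how those values make it onto your spans: the OpenTelemetry SDK reads OTEL_RESOURCE_ATTRIBUTES when it builds a default Resource, so the TracerProvider created in get_tracer picks them up automatically. Here’s a quick sketch you can run to see the parsing in action (the values shown are the ones we just exported):
import os
from opentelemetry.sdk.resources import Resource

os.environ["OTEL_RESOURCE_ATTRIBUTES"] = (
    "service.name=py-opentelemetry-manual-otlp-server,service.version=10.10.9"
)
# Resource.create() merges in attributes parsed from OTEL_RESOURCE_ATTRIBUTES
print(Resource.create().attributes)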
Your output will look something like this:
4- Run the Client app
Open up a new terminal window, and run the client app. Be sure to replace <LS_ACCESS_TOKEN> with your own Lightstep Access Token.
PS: Make sure you’re in python/opentelemetry/manual_instrumentation, relative to the opentelemetry-examples repo root.
export LS_ACCESS_TOKEN="<LS_ACCESS_TOKEN>"
export OTEL_RESOURCE_ATTRIBUTES=service.name=py-opentelemetry-manual-otlp-client,service.version=10.10.10
python client.py test
Note how we’re passing in the service name py-opentelemetry-manual-otlp-client and service version 10.10.10. The service name will show up in the Lightstep explorer.
When you run the client app, it will continuously call the /ping endpoint. Let it run a few times (maybe 5-6 times-ish?), and then kill it (à la ctrl+c). Sample output:
If you peek over at the terminal running server.py, you will likely notice a super-ugly stack trace. DON’T PANIC! The /ping service makes calls to Redis and MongoDB, and since neither of those services is running, you end up getting some nasty error messages like this:
5- See it in Lightstep
If you go to your trace view in Lightstep by selecting the py-opentelemetry-manual-otlp-client service from the explorer (you could also see the same thing by going to the py-opentelemetry-manual-otlp-server service), you’ll see the end-to-end trace showing the client calling the server, and the other functions called within the server.
And remember that stack trace in Step 4? Well, it shows up as an error in your Trace. Which is cool, because it tells you that you have a problem, and pinpoints where it’s happening! How cool is that??
And remember how we never explicitly passed a context to the redis_integration and server redis operation Spans? You can see that server redis operation still rolls up to redis_integration, which rolls up to /ping, just like I said it would. Magic! 🪄
Final Thoughts
Today we learned how to manually configure OpenTelemetry for Python to connect to Lightstep (this also works for any Observability back-end that ingests the OTLP format). We also learned how to link related services together through manual context propagation.
Now, if you ever find yourself in a situation whereby you need to either connect to your Observability back-end without the use of the Python auto-instrumentation binary and/or need to manually propagate context across services, you will know how to do it!
Now, please enjoy this cuddly little pile of rats. From front to back: Phoebe, Bunny, and Mookie. They were nice enough to sit still for the camera while my husband held them.
Peace, love, and code. 🌈 🦄 💫
Got questions about OTel instrumentation with Python? Talk to me! Feel free to connect through e-mail, or hit me up on Twitter or LinkedIn. Hope to hear from y’all!