Szymon Stawski

Ways of effortless instrumentation.

Understanding the inner workings of our services and applications in production is a powerful thing, but it comes with one big upfront cost: code instrumentation. Not long ago, the default way to run applications was to buy racks and servers and set them up in a server room or a co-located data center. Today we also have a landscape of cloud IaaS and serverless offerings, essentially computing power on demand, where the upfront cost is significantly smaller. We need a similar transformation of cost and effort distribution in the observability world, especially now that the number of possible architectures keeps growing. Architectures like microservices give teams full ownership over a set of services, including tech stack choices, which can result in a vast range of different techniques needed to monitor those stacks.

This article is a formatted set of notes from my search for effortless yet reliable observability instrumentation techniques. Hopefully you will find them useful and insightful.

Table of contents

  1. Introduction
  2. .NET Startup Hooks
  3. Monkey-patching Python example
  4. New (old) eBPF
  5. Is autoinstrumentation enough?
  6. Autoinstrumentation at scale

Resources

  1. OpenTelemetry project
  2. Cilium Hubble
  3. Odigos
  4. autoinstrumentation-playground

Introduction

There are many techniques of instrumentation, each adjusted to a specific programming language or set of languages. I've tested a few implemented by the maintainers of the OpenTelemetry project. I highly recommend cloning the autoinstrumentation-playground repo and running the services with the sample instrumentation setups while going through this article; it helps to understand the nuances of the following implementations.

.NET Startup Hooks

This specific type of instrumentation is dedicated to the .NET CLR virtual machine.
The basic setup is fairly simple, as it should be, and thanks to the OpenTelemetry standard, even with only one (root) service instrumented we can get meaningful information about downstream services as well. That can serve as a proof of value in a project that needs to be instrumented.
curl -X PUT -H "Content-Type: application/json" -d '{"Name": "alpha","Email": "alpha@alpha.com"}' http://localhost:8088/set_profile


In order to enable instrumentation in a C#/.NET application we need to provide OpenTelemetry details (variables with the OTEL_ prefix) along with CLR and general .NET settings:

CORECLR_ENABLE_PROFILING="1"
CORECLR_PROFILER='{918728DD-259F-4A6A-AC2B-B85E1B658318}'
CORECLR_PROFILER_PATH="/otel-dotnet-auto/linux-x64/OpenTelemetry.AutoInstrumentation.Native.so"
DOTNET_ADDITIONAL_DEPS="/otel-dotnet-auto/AdditionalDeps"
DOTNET_SHARED_STORE="/otel-dotnet-auto/store"

#OpenTelemetry default startup hook and .NET-specific OpenTelemetry settings
DOTNET_STARTUP_HOOKS="/otel-dotnet-auto/net/OpenTelemetry.AutoInstrumentation.StartupHook.dll"
OTEL_DOTNET_AUTO_HOME="/otel-dotnet-auto"
OTEL_DOTNET_AUTO_TRACES_ADDITIONAL_SOURCES="dummy-gateway"

#OpenTelemetry exporter configuration
OTEL_SERVICE_NAME="dummy-gateway"
OTEL_EXPORTER_OTLP_ENDPOINT="http://autoinstrumentation-playground-otel-collector-1:4318"
OTEL_PROPAGATORS="tracecontext,baggage"

Startup hooks allow us to inject code before application startup, which is why we provide the path to a DLL that injects the OpenTelemetry .NET SDK into our application via the DOTNET_STARTUP_HOOKS="/otel-dotnet-auto/net/OpenTelemetry.AutoInstrumentation.StartupHook.dll" environment variable. In autoinstrumentation-playground this file is downloaded and set up using a script provided by the maintainers of the .NET autoinstrumentation components.

Dockerfile

##OTEL DOTNET AUTOINSTRUMENTATION
ARG OTEL_VERSION=0.7.0
ADD https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/releases/download/v${OTEL_VERSION}/otel-dotnet-auto-install.sh otel-dotnet-auto-install.sh
RUN apt-get update && apt-get install -y unzip && \
    OTEL_DOTNET_AUTO_HOME="/otel-dotnet-auto" sh otel-dotnet-auto-install.sh
##

Another interesting piece of configuration is the OTEL_DOTNET_AUTO_TRACES_ADDITIONAL_SOURCES: dummy-gateway environment variable: by providing the name of our trace source we can extend the auto-generated traces with custom ones according to our needs. A similar variable exists for metrics; more about it here.

Monkey-patching Python example

Monkey-patching is a technique where the instrumenting library intercepts the original call, produces the necessary telemetry data and then calls the underlying library. It is useful for interpreted languages where we cannot inject code as in the previous example.
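To see the idea with nothing but the standard library, here is a toy sketch (not what the OpenTelemetry instrumentation packages actually ship) that patches json.loads to record a span-like measurement before delegating to the original function:

```python
import json
import time

# Keep a reference to the original function before replacing it.
_original_loads = json.loads
recorded_spans = []  # toy stand-in for a span exporter

def _traced_loads(s, *args, **kwargs):
    # Intercept the call, measure it, then delegate to the original.
    start = time.monotonic()
    try:
        return _original_loads(s, *args, **kwargs)
    finally:
        recorded_spans.append(
            {"name": "json.loads", "duration_s": time.monotonic() - start}
        )

# The "monkey-patch": every existing caller of json.loads now goes
# through the wrapper without changing a single line of their code.
json.loads = _traced_loads

profile = json.loads('{"Name": "alpha", "Email": "alpha@alpha.com"}')
```

Real instrumentation libraries apply the same pattern to HTTP clients, database drivers and web frameworks, emitting proper OpenTelemetry spans instead of appending to a list.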
In order to instrument our Python app we need to use a wrapper script:
Dockerfile

# With opentelemetry instrumentation
CMD [ \
  "opentelemetry-instrument", \
  "--traces_exporter", "otlp", \
  "--metrics_exporter", "none", \
  "python", "-u", "main.py" ]
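Under the hood, the opentelemetry-instrument wrapper relies (roughly speaking, this is a simplification) on Python's startup machinery: the interpreter imports a module named sitecustomize at startup if one is importable, which lets a wrapper run instrumentation code before main.py ever executes. A stdlib-only demonstration of that mechanism:

```python
import os
import subprocess
import sys
import tempfile

# Python imports `sitecustomize` during interpreter startup if it is
# found on the module search path, before any application code runs.
# A wrapper script can exploit this to inject instrumentation into an
# unmodified application simply by manipulating PYTHONPATH.
with tempfile.TemporaryDirectory() as hook_dir:
    hook = os.path.join(hook_dir, "sitecustomize.py")
    with open(hook, "w") as f:
        f.write('print("instrumentation loaded before the app")\n')

    env = dict(os.environ, PYTHONPATH=hook_dir)
    out = subprocess.run(
        [sys.executable, "-c", 'print("application main()")'],
        env=env, capture_output=True, text=True,
    ).stdout
```

Running this prints the hook's message before the "application" produces any output, which is exactly the window an auto-instrumentation wrapper needs.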

...and set up basic environment variables for the OpenTelemetry trace exporter:

OTEL_SERVICE_NAME=dummy-profile-service
OTEL_EXPORTER_OTLP_ENDPOINT=http://autoinstrumentation-playground-otel-collector-1:4318
OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf

Let's call the /set_profile gateway endpoint once again:
curl -X PUT -H "Content-Type: application/json" -d '{"Name": "alpha","Email": "alpha@alpha.com"}' http://localhost:8088/set_profile


The span produced by the Python instrumentation library is now a child of the dummy-gateway spans thanks to the tracecontext standard. After a bit of playing around I even managed to add a custom attribute by getting the current tracer:

server.py

def __init__(self, db_conn) -> None:
    self.db_conn = db_conn
    self.tracer = trace.get_tracer(__name__)

# ...

trace.get_current_span().set_attribute("api-version", "v1")
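The cross-service parenting shown above rests on the W3C Trace Context traceparent HTTP header, which carries the trace id and the caller's span id between services. A minimal parser sketch (the header layout follows the W3C spec; the dict keys are my own naming):

```python
def parse_traceparent(header: str) -> dict:
    # traceparent format: version-traceid-spanid-flags, e.g.
    # 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
    version, trace_id, span_id, flags = header.split("-")
    if len(trace_id) != 32 or len(span_id) != 16:
        raise ValueError("malformed traceparent header")
    return {
        "version": version,
        "trace_id": trace_id,        # shared by every span in the trace
        "parent_span_id": span_id,   # the calling service's span
        "sampled": bool(int(flags, 16) & 0x01),
    }

ctx = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
```

Each instrumented service extracts this header from incoming requests and injects a fresh one (with its own span id) into outgoing ones, which is how the dummy-gateway and Python spans end up in the same trace.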

This doesn't seem to be possible with .NET startup hooks (at least I didn't manage to reproduce it), and the same goes for eBPF instrumentation. Any suggestions in the comments on how it could be approached for the .NET and eBPF instrumentation types are most welcome.

New (old) eBPF

A piece of Linux technology that seems to be experiencing a rebirth recently; even Microsoft is trying to incorporate this feature into its OSes.
eBPF is a Linux kernel feature that can do a lot more than pull insights about apps running on a host, but we will focus on just this use case here. Because eBPF programs run in the kernel rather than in userspace, all of the syscalls originating from our application process can be monitored. For those interested in eBPF technology I recommend the book "Learning eBPF" by Liz Rice.

In contrast to the previous two examples, this instrumentation type doesn't inject any code; we do need to run a sidecar though, see docker-compose.yaml.

  dummy-score-instrumentation:
    image: 322456/otel-go-autoinstrumentation:latest
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://autoinstrumentation-playground-otel-collector-1:4317
      OTEL_SERVICE_NAME: dummy-score-service
      OTEL_GO_AUTO_TARGET_EXE: /app/bin/scoreservice
      OTEL_PROPAGATORS: tracecontext,baggage
      CGO_ENABLED: 1
    privileged: true
    pid: host
    volumes:
      - shared-instrumentation-data:/shared-data
      - /proc:/host/proc

In addition, as you can see in the mentioned docker-compose file, we need to share the host's PID namespace with our container process and mount the host's /proc volume into both dummy-score-service and dummy-score-instrumentation.

Let's call the /submit_score gateway endpoint:
curl -X PUT -H "Content-Type: application/json" -d '{"UserProfile": "alpha","Score": 1}' http://localhost:8088/submit_score


Is autoinstrumentation enough?

While it all looks very promising, and considering how immature yet powerful the frameworks for these techniques already are, one can be very optimistic. However, autoinstrumentation certainly has limitations. The right question to ask is how we can combine autoinstrumentation with minimal manual instrumentation and correlate the two, achieving truly reliable and customizable instrumentation with a better distribution of cost and effort on the way to state-of-the-art observability.
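Combining the two approaches hinges on auto-generated and manual spans sharing one notion of the "current span". The following is a toy model of that idea (deliberately not the real OpenTelemetry API): a context-local variable holds the active span, so a manually started span automatically becomes a child of whatever an auto-instrumentation layer started around it.

```python
import contextvars

# Context-local slot for the active span. Auto-instrumentation pushes a
# span around library calls; manual instrumentation pushes its own
# spans, which naturally parent onto whatever is currently active.
_current = contextvars.ContextVar("current_span", default=None)
finished = []  # toy stand-in for a span exporter

class Span:
    def __init__(self, name):
        self.name = name
        parent = _current.get()
        self.parent = parent.name if parent else None

    def __enter__(self):
        self._token = _current.set(self)
        return self

    def __exit__(self, *exc):
        _current.reset(self._token)
        finished.append(self)

# An "auto" span wrapping an HTTP handler, with a "manual" span inside.
with Span("GET /set_profile"):       # created by auto-instrumentation
    with Span("validate-profile"):   # added manually by the developer
        pass
```

The real SDKs use the same contextvars mechanism under the hood, which is why a few lines of manual instrumentation can slot cleanly into an otherwise auto-instrumented trace.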

Autoinstrumentation at scale

The previous examples on the dummy application show that autoinstrumentation is approached differently in different tech stacks. My first thought was that it would be very hard and tedious to enforce it across multiple teams, or even a whole organization. While trying to find a solution to that problem I learnt about a few projects, and I've created an observability stack with three of them: Odigos, OpenTelemetry and Jaeger Tracing. Let's explore it a bit.
Odigos essentially orchestrates different types of instrumentation and leverages eBPF technology so you do not need to.

After deploying the dummy app and the observability stack to k8s (guide), we need to set up Odigos with OpenTelemetry, this time only for traces, though it can be done for metrics and logs as well. Autoinstrumentation-playground uses a Helm chart to deploy Odigos; however, the Odigos CLI seems to be the recommended approach for installing and managing it. More about Odigos and the OpenTelemetry project in the Resources section of this article.

After generating some traffic we can already see traces, with just a two-step configuration.


Contributing

The observability landscape evolves very quickly and a lot of people are working on these open-source projects. Because of that, all contributions that keep the autoinstrumentation-playground project up to date are most welcome.
