The OpenTelemetry (OTel) Operator is a Kubernetes Operator that manages OTel for you in your Kubernetes cluster to make life a little easier. It does the following:
- Manages deployment of the OpenTelemetry Collector, supported by the OpenTelemetryCollector custom resource (CR).
- Manages the configuration of a fleet of OpenTelemetry Collectors via OpAMP integration, supported by the OpAMPBridge custom resource.
- Provides integration with the Prometheus Operator's PodMonitor and ServiceMonitor CRs.
- Injects and configures auto-instrumentation into your pods, supported by the Instrumentation custom resource.
I first used the Operator last year when I used Kratix to demonstrate how a platform team can make the OTel Operator available to developers, so that they can self-provision a pre-configured OTel Collector in their Kubernetes clusters. More recently, I’ve had a chance to get to know the Operator a bit better, especially while putting together my O’Reilly video course on Observability with OpenTelemetry, and in the recent KubeCon EU talk that I did with Reese Lee.
I’ve learned some really cool things, and I thought it might be helpful to share some little OTel Operator goodies that I’ve picked up along the way, in the form of a Q&A.
Please note that this post assumes that you have some familiarity with OpenTelemetry, the OpenTelemetry Collector, the OpenTelemetry Operator (including the Target Allocator), and Kubernetes.
Q&A
Q1: Does the Operator support multiple Collector configuration sources?
Short answer: no.
Longer answer: The OTel Collector can be fed more than one Collector config YAML file. That way, you can keep your base configurations in, say, otelcol-config.yaml, and overrides or additional configurations of the base can go in, for example, otelcol-config-extras.yaml. See an example of this in the OTel Demo’s Docker Compose file.
Unfortunately, while the OTel Collector supports multiple Collector configuration files, the Collector managed by the OTel Operator does not.
To get around this, you could merge the multiple Collector configs through some external tool beforehand. For example, if you were deploying the Operator via Helm, you could technically pass it multiple Collector config files using multiple --values flags and let Helm do the merging for you.
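As a rough sketch of that workaround, assume a hypothetical chart that exposes the Collector configuration under a collectorConfig key. The key name and both file names are illustrative assumptions, not taken from any real chart. The two files are shown in one block, separated by ---, and would be passed to Helm with two --values flags, base file first:

```yaml
# values-base.yaml (hypothetical file name): the base Collector configuration
collectorConfig:
  receivers:
    otlp:
      protocols:
        grpc: {}
  exporters:
    debug: {}
  service:
    pipelines:
      traces:
        receivers: [otlp]
        exporters: [debug]
---
# values-extras.yaml (hypothetical file name): overrides and additions
collectorConfig:
  exporters:
    otlp/ls:
      endpoint: "ingest.lightstep.com:443"
  service:
    pipelines:
      traces:
        # Helm replaces lists instead of merging them,
        # so every exporter you want must be listed again here
        exporters: [debug, otlp/ls]
```

Helm merges maps key by key, with later files winning on conflicts, but it replaces lists wholesale. That's why the pipeline's exporters list is restated in full in the second file.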
For reference, check out this thread in the #otel-operator CNCF Slack channel.
Q2: How can I securely reference access tokens in the Collector’s configuration?
In order to send OpenTelemetry data to an Observability backend, you must define at least one exporter. Whether you use OTLP or some proprietary vendor format, most exporters typically require that you specify an endpoint and an access token when sending data to a vendor backend.
When using the OpenTelemetry Operator to manage the OTel Collector, the OTel Collector config YAML is defined in the OpenTelemetryCollector CR. This file should be version-controlled and therefore shouldn’t contain any sensitive data, including access tokens stored as plain text.
Fortunately, the OpenTelemetryCollector CR gives us a way to reference that value as a secret. Here’s how you do it:
1- Create a Kubernetes secret for your access token. Remember to base64-encode the secret.
2- Expose the secret as an environment variable by adding it to the OpenTelemetryCollector CR’s env section. For example:
env:
  - name: LS_TOKEN
    valueFrom:
      secretKeyRef:
        key: LS_TOKEN
        name: otel-collector-secret
3- Reference the environment variable in your exporter definition:
exporters:
  otlp/ls:
    endpoint: "ingest.lightstep.com:443"
    headers:
      "lightstep-access-token": "${LS_TOKEN}"
For more info, check out my full example here, along with full instructions here.
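For step 1 above, a minimal sketch of the Secret manifest might look like the following. The secret name and key match the env example above; the namespace is an assumption (a Secret referenced through secretKeyRef has to live in the same namespace as the Collector), and the token value is a placeholder that you'd fill in yourself.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: otel-collector-secret   # matches secretKeyRef.name in the env section
  namespace: mynamespace        # assumed: must be the Collector's namespace
type: Opaque
data:
  # Placeholder: the output of `echo -n '<your_token>' | base64`
  LS_TOKEN: <base64_encoded_token>
```

If you'd rather not encode the value by hand, Kubernetes also accepts a stringData field in place of data, which takes the plain-text value and encodes it for you. Either way, keep the filled-in manifest out of version control.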
Q3: Is the Operator version at parity with the Collector version?
For every Collector release, there is an Operator release which provides support for that Collector version. For example, at the time of this writing, the latest Operator version is 0.98.0. Thus, the default image of the Collector used by the Operator is version 0.98.0 of the core distribution (as opposed to the contrib distribution).
Q4: Can I override the base OTel Collector image?
Yes!
As we saw earlier, the core distribution is the default Collector distribution used by the OpenTelemetryCollector CR. The core distribution is a bare-bones distribution of the Collector for OTel developers to develop and test. It contains a base set of components, i.e. extensions, connectors, receivers, processors, and exporters.
If you want access to more components than the ones offered by core, you can use the Collector's Kubernetes Distribution instead. This distribution is made specifically to be used in a Kubernetes cluster to monitor Kubernetes and services running in Kubernetes. It contains a subset of components from OpenTelemetry Collector Core and OpenTelemetry Collector Contrib.
If you want to use specific Collector components, you can build your own distribution using the OpenTelemetry Collector Builder (OCB), and include only the components that you need.
Either way, the OpenTelemetryCollector CR allows you to override the default Collector image with one that better suits your needs by adding spec.image to your OpenTelemetryCollector YAML. You can also specify the number of Collector replicas you wish to spin up by adding spec.replicas (this is totally independent of whether or not you override the Collector image). Your code would look something like this:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
  namespace: mynamespace
spec:
  mode: statefulset
  image: <my_collector_image>
  replicas: <number_of_replicas>
  ...
Where:
- <my_collector_image> is the name of a valid Collector image from a container repository
- <number_of_replicas> is the number of pod instances for the underlying OpenTelemetry Collector
Keep in mind that if you're pulling a Collector image from a private container registry, you'll need to use imagePullSecrets. Since private container registries require authentication, this will enable you to authenticate against that private registry. For more info on how to use imagePullSecrets for your Collector image, check out the instructions here.
For more info, check out the OpenTelemetryCollector CR API docs.
Q5: Does the Target Allocator work for all deployment types?
No. The Target Allocator only works with the StatefulSet and DaemonSet (newly introduced) deployment modes. More info here.
Q6: If I’m using the Operator’s Target Allocator for Prometheus service discovery, do I need PodMonitor and ServiceMonitor CRs installed in my Kubernetes cluster?
Yes, you do. These CRs are bundled with the Prometheus Operator; however, they can be installed standalone, which means that you don’t need to install the Prometheus Operator just to use these two CRs with the Target Allocator.
The easiest way to install the PodMonitor and ServiceMonitor CRs is to grab a copy of the individual PodMonitor YAML and ServiceMonitor YAML custom resource definitions (CRDs), like this:
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.71.2/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.71.2/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml
Check out my example of the OpenTelemetry Operator’s Target Allocator with ServiceMonitor here.
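To make the moving parts a bit more concrete, here's a minimal sketch of a ServiceMonitor for a hypothetical service, plus the part of the OpenTelemetryCollector CR that tells the Target Allocator to watch for these CRs. The my-app names, labels, and port name are assumptions for illustration. The prometheusCR fields follow the v1alpha1 API as far as I know, but treat them as assumptions and double-check them against the API docs for your Operator version.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor          # hypothetical name
  namespace: mynamespace
  labels:
    app: my-app                 # hypothetical label, matched by the selector below
spec:
  selector:
    matchLabels:
      app: my-app               # selects the Service(s) to scrape
  endpoints:
    - port: metrics             # assumed name of the Service port exposing metrics
      interval: 30s
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
  namespace: mynamespace
spec:
  mode: statefulset
  targetAllocator:
    enabled: true
    prometheusCR:
      enabled: true             # discover PodMonitor and ServiceMonitor CRs
      serviceMonitorSelector:
        app: my-app             # only pick up ServiceMonitors carrying this label
  # rest of the spec (Collector config, etc.) omitted
```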
Q7: Do I need to create a service account to use the Target Allocator?
No, but you do need to do a bit of extra work. So, here’s the deal…although you need a service account to use the Target Allocator, you don’t have to create your own.
If you enable the Target Allocator and don’t create a service account, one is automagically created for you. This service account’s default name is a concatenation of the Collector name (metadata.name in the OpenTelemetryCollector CR) and -collector. For example, if your Collector is called mycollector, then your service account would be called mycollector-collector.
By default, this service account has no defined policy, so you’ll still need to create your own ClusterRole and ClusterRoleBinding, and associate the ClusterRole with the ServiceAccount via the ClusterRoleBinding.
See the Target Allocator readme for more on Target Allocator RBAC configuration.
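As a minimal sketch of what that extra work could look like: the resource names below are hypothetical, the namespace is an assumption, and the rules are an illustrative subset rather than the authoritative list, which you should take from the Target Allocator readme. The service account name follows the naming described above for a Collector called mycollector.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otelcol-targetallocator-role          # hypothetical name
rules:
  # Core resources read during service discovery (illustrative subset)
  - apiGroups: [""]
    resources: ["nodes", "services", "endpoints", "pods", "namespaces"]
    verbs: ["get", "list", "watch"]
  # PodMonitor and ServiceMonitor CRs (only relevant with Prometheus CR discovery)
  - apiGroups: ["monitoring.coreos.com"]
    resources: ["servicemonitors", "podmonitors"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otelcol-targetallocator-rolebinding   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otelcol-targetallocator-role          # must match the ClusterRole above
subjects:
  - kind: ServiceAccount
    name: mycollector-collector               # the auto-created service account
    namespace: mynamespace                    # assumed: the Collector's namespace
```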
Q8: Can I override the Target Allocator base image?
Just like you can override the Collector base image (as we saw in Q4), you can also override the Target Allocator’s base image.
Please keep in mind that it’s usually best to keep the Target Allocator and OTel Operator versions the same, to avoid any compatibility issues. If you do choose to override the Target Allocator’s base image, you can do so by adding spec.targetAllocator.image in the OpenTelemetryCollector CR. You can also specify the number of replicas by adding spec.targetAllocator.replicas (this is totally independent of whether or not you override the TA image). Your code would look something like this:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
  namespace: mynamespace
spec:
  mode: statefulset
  targetAllocator:
    image: <ta_image_name>
    replicas: <number_of_replicas>
  ...
Where:
- <ta_image_name> is a valid Target Allocator image from a container repository
- <number_of_replicas> is the number of pod instances for the underlying Target Allocator
Q9: If it’s not recommended that you override the Target Allocator base image, then why would you want to?
One use case might be if you need to host a mirror of the Target Allocator image in your own private container registry for security purposes.
If you do need to reference a Target Allocator image from a private registry, you’ll need to use imagePullSecrets. To use imagePullSecrets with the OTel Operator, check out the instructions here. Note that you don’t need to create a serviceAccount for the Target Allocator, since one is already created for you automagically if you don’t create one yourself (see Q7).
For more info, check out the Target Allocator API docs.
Q10: Is there a version lag between the OTel Operator auto-instrumentation and auto-instrumentation of supported languages?
If there is a lag, it's minimal, as maintainers try to keep these up to date for each release cycle. Keep in mind that some semantic conventions (semconvs) have breaking changes, and the team is trying to avoid breaking users' code. More info here.
Final Thoughts
Hopefully this has helped to demystify the OTel Operator a bit more. There’s definitely a lot going on, and the OTel Operator can certainly be a bit scary at first, but understanding some of the basics will get you well on your way to mastering this powerful tool.
If you have any questions about the OTel Operator, I highly recommend that you post questions on the #otel-operator channel on the CNCF Slack. Maintainers and contributors are super friendly, and have always been more than willing to answer my questions! You can also hit me up, and I'll try my best to answer your questions, or to direct you to folks who have the answers!
And now I'll leave you with a photo of my rat, Katie Jr.
Until next time, peace, love, and code. ☮️❤️👩💻