Amir Blum for Aspecto

Posted on Dec 19, 2022 • Originally published at aspecto.io

Aspecto OpenTelemetry Sampler for NodeJS

At Aspecto, we enable users to configure advanced sampling rules (both head and tail sampling) in an easy-to-use UI with rich features and centralized management. It's a control plane for your trace sampling.

The Aspecto Sampler for NodeJS is an implementation of the OpenTelemetry head sampling that allows you to remotely configure all head sampling needs from one UI with zero code changes.

Introduction

Until today, head sampling capabilities were part of our OpenTelemetry distribution, which included a full OpenTelemetry SDK, instrumentations, resource detectors, payload collection, and more. Due to ongoing demand, we are introducing AspectoSampler. Now you can craft your own custom vendor-agnostic OpenTelemetry NodeJS SDK setup and plug the Aspecto sampler with just a few lines of code.

Here is an example of setting up instrumentation for your TypeScript service from the official OpenTelemetry docs:

/*tracing.ts*/
import * as opentelemetry from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
const sdk = new opentelemetry.NodeSDK({
  traceExporter: new opentelemetry.tracing.ConsoleSpanExporter(),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start()

Your real-life setup will probably include more components, like resource detectors and an OTLP exporter to ship the telemetry data to an OpenTelemetry Collector or your chosen vendor (Aspecto, ahem ahem 👀).

This simple installation of OpenTelemetry will record all traces -- each operation going on inside your service. While collecting all spans is fine for POCs, production workloads usually create a firehose of costly and noisy data. Sampling is a powerful tool in your OpenTelemetry toolbox to reduce your costs and focus on telemetry that is useful for you. Here is the same code with an Aspecto sampler integrated (just two lines of code):

/*tracing.ts*/
import * as opentelemetry from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { AspectoSampler } from "@aspecto/opentelemetry-sampler";
const sdk = new opentelemetry.NodeSDK({
  traceExporter: new opentelemetry.tracing.ConsoleSpanExporter(),
  instrumentations: [getNodeAutoInstrumentations()],
  sampler: new AspectoSampler(),
});
sdk.start()

That is it! Just two lines of code that you will never need to touch again, and you unlocked the FULL power of sampling 🕺

Now let's see how it works.

Plain OpenTelemetry Sampling

Once you realize that you are collecting millions of boring, useless health-check traces, or that one very active endpoint in your system floods your trace viewer and blows up your monthly bill, you ask yourself -- how do I get rid of them!? The immediate answer is sampling. It is a widely used practice and the go-to solution for this problem.

OpenTelemetry provides out-of-the-box samplers such as TraceIdRatioBased. These are great if you want to blindly record a fixed percentage of your traces (for example, sample 1% of traces).

It is a simple strategy to implement, but although it reduces the overall trace volume, the selection is random: you keep an arbitrary subset and may drop critical traces, the ones you always want to collect.
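To make that trade-off concrete, here is a minimal, dependency-free sketch of how a ratio-based decision can be derived from a trace ID. It is a simplification for illustration and not the OpenTelemetry SDK's actual algorithm; `sampleByRatio` is a hypothetical helper, and the choice of the first 8 hex characters is an assumption.

```typescript
// Simplified illustration of ratio-based head sampling.
// Assumption: the first 8 hex characters of the trace ID are treated as a
// uniformly distributed 32-bit number. Real SDK samplers differ in detail.
function sampleByRatio(traceId: string, ratio: number): boolean {
  // Clamp the ratio into [0, 1] so bad input cannot over- or under-sample.
  const clamped = Math.min(1, Math.max(0, ratio));
  const upperBound = clamped * 2 ** 32;
  const value = parseInt(traceId.slice(0, 8), 16);
  // IDs that do not start with hex digits are dropped rather than guessed at.
  if (Number.isNaN(value)) return false;
  return value < upperBound;
}
```

Because the decision depends only on the trace ID, every service reaches the same answer for a given trace. Nothing in it knows whether that trace is a health check or a failed payment, which is exactly the limitation described above.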

After you realize you need more fine-grained control of your sampling, you can write your own custom sampler by implementing the Sampler interface. You might start with something like this:

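import { Sampler } from '@opentelemetry/sdk-trace-base';
import { SemanticAttributes } from '@opentelemetry/semantic-conventions';
import { Context, SpanKind, Attributes, SamplingDecision, SamplingResult } from '@opentelemetry/api';
class MySampler implements Sampler {
  shouldSample(context: Context, traceId: string, spanName: string, spanKind: SpanKind, attributes: Attributes): SamplingResult {
      // drop health-check traces entirely
      if (attributes[SemanticAttributes.HTTP_TARGET] === '/health-check') {
          return { decision: SamplingDecision.NOT_RECORD };
      }
      // fallback: record everything else
      return { decision: SamplingDecision.RECORD_AND_SAMPLED };
  }
  toString() { return 'MySampler'; }
}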

You then wrap it in a ParentBasedSampler, so the sampling decision made for the root span is applied to the entire trace:

const sampler = new ParentBasedSampler({ root: new MySampler() });
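The idea behind parent-based sampling can be shown without any SDK code. The sketch below is a hypothetical, dependency-free model of the default behavior (child spans follow their parent, root spans ask the custom logic); `decideParentBased` and `dropHealthChecks` are illustrative names, not OpenTelemetry APIs.

```typescript
// Hypothetical, dependency-free model of parent-based sampling.
type RootDecider = (spanName: string) => boolean;

// parentSampled is undefined for a root span (there is no parent context).
function decideParentBased(
  parentSampled: boolean | undefined,
  spanName: string,
  decideRoot: RootDecider,
): boolean {
  // A child span inherits its parent's decision, so a trace is kept or
  // dropped as a whole instead of ending up with missing spans.
  if (parentSampled !== undefined) return parentSampled;
  // Only root spans consult the custom logic.
  return decideRoot(spanName);
}

// Example root logic: drop health checks, keep everything else.
const dropHealthChecks: RootDecider = (name) => name !== "GET /health-check";
```

This is why the custom sampler only needs to reason about root spans: once the root is decided, downstream spans in the same trace simply follow along.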

Then you test this code, create a PR, get it approved, and deploy it to your dozens of microservices.

A few days later, your vendor's Billing dashboard already looks happier with fewer spans and less cost.

Then you browse your Trace Search screen and realize that 95% of your remaining traces come from one heavily used endpoint that gets executed millions of times daily. You still want observability into it, but ideally you would collect fewer spans than you do today. You wish you could sample just 1% of this endpoint.

You might go back to your custom sampler and add something like this:

import { Sampler, ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';
import { SemanticAttributes } from '@opentelemetry/semantic-conventions';
import { Context, SpanKind, Attributes, SamplingDecision, SamplingResult } from '@opentelemetry/api';
class MySampler implements Sampler {
  // keeps roughly 1% of the traces it is asked about
  private probabilitySampler = new TraceIdRatioBasedSampler(0.01);
  shouldSample(context: Context, traceId: string, spanName: string, spanKind: SpanKind, attributes: Attributes): SamplingResult {
      if (attributes[SemanticAttributes.HTTP_TARGET] === '/health-check') {
          return { decision: SamplingDecision.NOT_RECORD };
      }
      if (attributes[SemanticAttributes.HTTP_TARGET] === '/my/busy/endpoint') {
          return this.probabilitySampler.shouldSample(context, traceId);
      }
      // fallback
      return { decision: SamplingDecision.RECORD_AND_SAMPLED };
  }
  toString() { return 'MySampler'; }
}
const sampler = new ParentBasedSampler({ root: new MySampler() });

And again, you create a PR, review, merge, deploy, test, and watch your bill drop and your trace viewer become less busy.

At this point, you, your manager, and every other trace consumer in your company are happy, but that is not where the story ends.

Now, maintaining this sampler becomes one of your tasks. You might need to implement it in multiple languages, add dozens of complex statements, synchronize it across dozens of microservices and serverless functions, and debug it when it fails to do what you meant. Requests keep flowing: "I want to see more of this", "can you hide that?", and "why can't I find my trace??"

One day, an incident in production hits your /my/busy/endpoint. You say to yourself -- I wish I could see all traces for just a few hours until we understand what is going on. But modifying and deploying new versions is hardly your first priority while the incident keeps everyone stressed.

If you deploy changes in sampling, you need to remember to go back and revert these changes after the incident is resolved.

This problem does not scale well. Managing sampling for a real-life production environment in code can be a massive headache.

Introducing Aspecto Sampler 👏🎉

Aspecto Sampler

Aspecto has years of experience implementing OpenTelemetry in high-workload production environments. We have collected needs and use cases from our customers and iterated on them to crystallize the best managed sampling mechanism, so you can focus on your task and leave the hard work and technical implementation to us.

Aspecto Sampler abstracts away all the nitty-gritty details and implementation code and exposes a UI where you configure the logic in a simple interface. It is basically the sampler from above, but you do not need to write it yourself. Plus, it provides advanced features you can explore to customize your sampling to your specific needs.

Let's review how it works and its features.

Remote Configuration

All sampling configurations are centralized and managed on the Aspecto platform. You only need to install the sampler into your OpenTelemetry SDK once (or use our OpenTelemetry distribution) and never touch it again.

Up-to-date Configuration

Aspecto Sampler always fetches the latest configuration. Sampling configuration is always up-to-date and consistent across all your services. You do not need to deploy any code or worry that some services still run with an old configuration that is not getting updated.

Automatic Updates

Your changes are immediately and automatically pushed to all your samplers in seconds. With one click, all your services are updated live. No need to restart, deploy or monitor anything.
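To picture what "updated live" means on the service side, here is a hypothetical sketch of a sampler whose rules can be swapped at runtime without a restart. It does not reflect Aspecto's actual implementation or transport; `SwappableRuleSampler`, `PathRule`, and `updateRules` are names invented for this illustration.

```typescript
// Hypothetical sketch: a sampler whose rules can be replaced at runtime.
// This does not reflect Aspecto's actual implementation or wire protocol.
interface PathRule {
  pathPrefix: string; // rule applies to paths starting with this prefix
  keep: boolean; // true = record the trace, false = drop it
}

class SwappableRuleSampler {
  private rules: PathRule[] = [];

  // Called whenever a new configuration arrives from the control plane.
  updateRules(next: PathRule[]): void {
    // Copy so later mutations by the caller cannot change live behavior.
    this.rules = next.slice();
  }

  shouldKeep(path: string): boolean {
    const match = this.rules.find((rule) => path.startsWith(rule.pathPrefix));
    // With no matching rule, fall back to recording the trace.
    return match ? match.keep : true;
  }
}
```

The sampler instance is created once at startup; only the data it consults changes, which is why no code change or redeploy is involved.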

No Code

Adding, updating, or removing sampling rules does not require any code changes. It is all done in an easy-to-use UI and involves zero code. Any function in your organization can manage it, not only developers who master a specific programming language.

Instant

Any change takes effect immediately. You can experiment and fine-tune your configurations and get fast feedback.

UI

The entire sampling workflow is managed and configured in a dedicated UI in the Aspecto app, built to give you focused and easy yet powerful tools to achieve your sampling goals.

Multiple Languages

One unified interface to customize your sampling for multiple programming languages. No need to master or jump between languages to implement a sampling policy in your organization.

Timer Rules

If you are debugging something or working on a specific service or endpoint, it is convenient to "sample everything" while you need it, but you also do not want to forget about it and see your tracing costs inflate. You can add rules with a timer -- so you get instant visibility into the desired area and sleep well knowing that your traces bill will not blow up.
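A timer rule can be thought of as an ordinary rule with an expiry attached. The following is a minimal sketch under that assumption; `TimedRule`, `isRuleActive`, and `forcedByTimer` are hypothetical names, and the real product may model this differently.

```typescript
// Hypothetical sketch of a time-limited "sample everything" rule.
interface TimedRule {
  pathPrefix: string;
  expiresAtMs: number; // epoch milliseconds after which the rule is ignored
}

function isRuleActive(rule: TimedRule, nowMs: number): boolean {
  return nowMs < rule.expiresAtMs;
}

// Returns true when an unexpired rule forces the path to be recorded.
function forcedByTimer(path: string, rules: TimedRule[], nowMs: number): boolean {
  return rules.some(
    (rule) => isRuleActive(rule, nowMs) && path.startsWith(rule.pathPrefix),
  );
}

// Usage: keep every trace for /my/busy/endpoint for the next two hours.
const twoHoursMs = 2 * 60 * 60 * 1000;
const debugRule: TimedRule = {
  pathPrefix: "/my/busy/endpoint",
  expiresAtMs: Date.now() + twoHoursMs,
};
```

Once the expiry passes, the rule simply stops matching, so nobody has to remember to revert it after the incident.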

Turn Rules On and Off

With a single click, you can turn a rule on and off, dynamically adapting sampling as you go and integrating sampling tools into your everyday workflow to find the right balance between costs and telemetry verbosity.

Search Capabilities

An OpenTelemetry implementation can easily include many sampling rules. Search your rules with free text to quickly find what you need. You can sort them and check who added each rule and when.

Rich Sampling Language

Define your sampling strategy with rules, which are evaluated in order to derive a sampling decision for each new trace.

Conditions

Use span attributes such as HTTP path, HTTP method, messaging queue or topic name, or any custom span attribute to add conditions. Describe each one with a rich set of operators, such as HTTP path starts with "/user", or even write regular expressions for parameterized routes like /account/:id/users.
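As a rough mental model, a condition is an attribute key, an operator, and a value. The sketch below is hypothetical (the operator names and the `Condition` shape are assumptions, not Aspecto's schema) and shows how a parameterized route like /account/:id/users could be expressed as a regular expression.

```typescript
// Hypothetical condition model; operator names are illustrative only.
type Operator = "equals" | "startsWith" | "matchesRegex";

interface Condition {
  attribute: string; // span attribute key, e.g. "http.target"
  operator: Operator;
  value: string;
}

type SpanAttributes = Record<string, string | undefined>;

function matches(condition: Condition, attributes: SpanAttributes): boolean {
  const actual = attributes[condition.attribute];
  if (actual === undefined) return false; // a missing attribute never matches
  switch (condition.operator) {
    case "equals":
      return actual === condition.value;
    case "startsWith":
      return actual.startsWith(condition.value);
    case "matchesRegex":
      return new RegExp(condition.value).test(actual);
  }
}

// The parameterized route /account/:id/users written as a regex:
// ":id" becomes "one or more characters that are not a slash".
const accountUsers: Condition = {
  attribute: "http.target",
  operator: "matchesRegex",
  value: "^/account/[^/]+/users$",
};
```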

Services and Environments

Narrow a rule so it affects only specific services or environments. For example, you can easily add a rule that only affects the users-service in the production environment.
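Scoping can be sketched as a filter applied before a rule's conditions are even considered. The model below is hypothetical; `RuleScope`, `ServiceIdentity`, and the convention that an omitted field means "applies everywhere" are assumptions made for illustration.

```typescript
// Hypothetical scope model: an omitted field means "applies everywhere".
interface RuleScope {
  services?: string[];
  environments?: string[];
}

interface ServiceIdentity {
  serviceName: string;
  environment: string;
}

function inScope(scope: RuleScope, identity: ServiceIdentity): boolean {
  const serviceOk =
    scope.services === undefined ||
    scope.services.includes(identity.serviceName);
  const envOk =
    scope.environments === undefined ||
    scope.environments.includes(identity.environment);
  // Both dimensions must agree for the rule to apply to this process.
  return serviceOk && envOk;
}

// The example from the text: only users-service, only in production.
const productionUsersOnly: RuleScope = {
  services: ["users-service"],
  environments: ["production"],
};
```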

Sampling Rates and Priority

Arrange your rules by priority: place specific, important rules at the top and general fallback rules at the bottom. Each rule defines a sampling rate, which can be as low as 0% (record nothing).
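Putting order and rates together, evaluation can be pictured as "first matching rule wins, then roll against its rate". This is a simplified sketch under that assumption, with hypothetical names (`RateRule`, `decide`); the `roll` parameter is passed in explicitly so the behavior is easy to reason about.

```typescript
// Hypothetical rule list: evaluated top to bottom, first match wins.
interface RateRule {
  pathPrefix: string; // "" matches every path, which makes a catch-all rule
  ratePercent: number; // 0 = record nothing, 100 = record everything
}

// `roll` is a number in [0, 1), e.g. derived from the trace ID.
function decide(path: string, rules: RateRule[], roll: number): boolean {
  for (const rule of rules) {
    if (path.startsWith(rule.pathPrefix)) {
      // Only the first matching rule is consulted; later rules are ignored.
      return roll * 100 < rule.ratePercent;
    }
  }
  return true; // no rule matched: record by default
}

// Specific rules on top, general fallback at the bottom.
const rules: RateRule[] = [
  { pathPrefix: "/health-check", ratePercent: 0 },
  { pathPrefix: "/my/busy/endpoint", ratePercent: 1 },
  { pathPrefix: "", ratePercent: 100 },
];
```

If the catch-all rule were moved to the top, it would match every path first and the health-check rule would never run, which is why ordering matters.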

Installation Instructions

Visit our docs for complete instructions on installation and configuration.


Tail Sampling

The sampler above is a Head Sampler that applies sampling decisions in the SDK as spans are created.

A popular alternative is to use the Aspecto Tail Sampler -- an OpenTelemetry Collector distribution that aggregates spans into traces and applies a sampling decision on an entire trace. Each option has pros and cons, and we encourage you to research and choose the option that best fits your needs.

Not sure what to choose? Schedule a call with our OpenTelemetry experts to assist in the process.

Feedback

We are in a constant process of learning and improving and would love to hear from you. Do not hesitate to contact us (also via our website chat) to share feedback, ask questions, or get support for our managed sampling product.

Our OpenTelemetry experts are here for you.

Supported Languages

Our managed sampler has the following support:

NodeJS -- integrated with our OpenTelemetry distribution, or as a standalone sampler which you can integrate into your custom OpenTelemetry setup
Ruby -- integrated into our Ruby OpenTelemetry distribution

Support for more languages is coming soon.

Our Tail Sampler can be used with any OpenTelemetry SDK and components that produce trace data.
