
austin for Lightstep

Originally published at lightstep.com

Understanding Static Site Performance with OpenTelemetry

When you're building a static website, performance can be hard to measure. There are hundreds of variables and confounding conditions that can impact page load speed, everything from mundane considerations like the size of a PNG to the complex vagaries of Content Delivery Networks. How are you supposed to figure out if your users, the people that are trying to load your page, are having a good experience? This becomes trickier when you're using popular managed hosting services, like Netlify, Contentful, or GitHub Pages. You need some sort of telemetry data from the perspective of your end-users in order to accurately gauge how long things take to load.

Plotting the performance of your site over time is a useful way to see if things are getting better, or worse.

OpenTelemetry can help you solve this problem! OpenTelemetry is an open source project that promises to make high-quality telemetry a "built-in" feature of cloud-native software. To this end, the project maintains a variety of 'automatic instrumentation' plugins for popular frameworks, runtimes, and libraries that let you drop OpenTelemetry into your existing code without major changes in order to profile the performance of your software in production. In simpler terms, OpenTelemetry collects data on the performance of your software or web site from your end-user's perspective and sends that telemetry data to one of many open source or proprietary tools that allow you to persist and analyze it.

In this tutorial, we'll go through the entire process of setting up OpenTelemetry on a static site using Hugo and Webpack, then configuring a deployment of the OpenTelemetry collector in Kubernetes to receive data from our site.

Prerequisites

In order to add OpenTelemetry to a Hugo site, you'll need a Hugo site - tautological, I know, but them's the breaks. If you already have one, then you can use that -- if you're trying to follow along without an existing site, I'd check out the Victor Hugo boilerplate generator, as it'll get you started on the right foot with Hugo and Webpack. You'll also need Node.js and npm installed, since we'll be managing dependencies and running builds with npm.

We'll continue this tutorial assuming you're using Victor Hugo, with some notes from my experience instrumenting the opentelemetry.io site. I'll also assume that you're familiar with the basics of using Git and GitHub, and the basics of HTML, CSS, and JavaScript.

If you'd like to follow along with deploying an OpenTelemetry Collector, you'll need a Kubernetes cluster as well - we'll use Google Kubernetes Engine in this tutorial, but the steps should work on any Kubernetes cluster.

Getting Started

First, fork (or simply clone) the Victor Hugo boilerplate and check out your copy of the source code. We'll start by adding OpenTelemetry to our project -

$ npm install --save @opentelemetry/core @opentelemetry/tracing @opentelemetry/web @opentelemetry/plugin-document-load @opentelemetry/exporter-collector

This will install and save several OpenTelemetry components, including the core API and SDK components, automatic browser instrumentation and plugins, and an exporter to the OpenTelemetry Collector. Let's now add a new file at /src/tracing.js where our OpenTelemetry setup will live. You don't need to add anything to it yet, but before we forget, let's import it in the main JS file. Open /src/index.js and modify it like so:

// JS Goes here - ES6 supported
import "./tracing.js";
import "./css/main.css";

// Say hello
console.log("🦊 Hello! Edit me in src/index.js");

Now that we've got the skeleton of our project set up, it's time to add OpenTelemetry itself.

Adding OpenTelemetry-Web

In the previous step, we created a new file called tracing.js to hold our OpenTelemetry code. Switch to it in your editor, and you're ready to configure OpenTelemetry. First, we'll need to import a few of the packages we installed earlier. Add the following:

import { SimpleSpanProcessor, ConsoleSpanExporter } from '@opentelemetry/tracing';
import { WebTracerProvider } from '@opentelemetry/web';
import { DocumentLoad } from '@opentelemetry/plugin-document-load';

Let's briefly touch on what our imports are doing here. First, we're importing a span processor, which we'll register with our tracer provider. This component is responsible for handling span data generated by the tracer, customarily by exporting it. Our ConsoleSpanExporter will write span data to the browser console for now. Finally, the DocumentLoad plugin extends the capabilities of our tracer provider, allowing it to automatically instrument (read: generate spans for) our page load.

Spans are the building blocks of traces. They represent work being done by your process, or page.

Complete the setup of OpenTelemetry by adding the following code to this file --

const provider = new WebTracerProvider({
  plugins: [
    new DocumentLoad()
  ]
});

provider.addSpanProcessor(new SimpleSpanProcessor(new ConsoleSpanExporter()));
provider.register();

This code creates our provider and plugin, registers the span processor with the provider, and starts the provider. Amazingly enough, this is all you need to do for now! In your terminal, start your page preview with npm run preview and open http://localhost:3000 in a web browser. Open your JavaScript console and refresh the page; you should see output similar to the following.

[HMR] Waiting for update signal from WDS... log.js:24:12
🦊 Hello! Edit me in src/index.js index.js:9:9
Object { traceId: "16b18f5cef76bc6c4fd1578bd0df53d9", parentId: "741587dc317632f9", name: "documentFetch", id: "53ea6e17e3389a01", kind: 0, timestamp: 1592582737016000, duration: 57000, attributes: {…}, status: {…}, events: (8) […] }
ConsoleSpanExporter.js:64:21
Object { traceId: "16b18f5cef76bc6c4fd1578bd0df53d9", parentId: "741587dc317632f9", name: "http://localhost:3000/main.js", id: "ffd85307d05068f5", kind: 0, timestamp: 1592582737140000, duration: 17000, attributes: {…}, status: {…}, events: (8) […] }
ConsoleSpanExporter.js:64:21
Object { traceId: "16b18f5cef76bc6c4fd1578bd0df53d9", parentId: "741587dc317632f9", name: "http://localhost:3000/main.css", id: "278b38cfa637b67c", kind: 0, timestamp: 1592582737140000, duration: 19000, attributes: {…}, status: {…}, events: (8) […] }
ConsoleSpanExporter.js:64:21
Object { traceId: "16b18f5cef76bc6c4fd1578bd0df53d9", parentId: undefined, name: "documentLoad", id: "741587dc317632f9", kind: 0, timestamp: 1592582737016000, duration: 252000, attributes: {…}, status: {…}, events: (9) […] }
ConsoleSpanExporter.js:64:21

Briefly, let's take a look at one of the objects we see here --

{
  "traceId": "16b18f5cef76bc6c4fd1578bd0df53d9",
  "name": "documentLoad",
  "id": "741587dc317632f9",
  "kind": 0,
  "timestamp": 1592582737016000,
  "duration": 252000,
  "attributes": {
    "component": "document-load"
  },
  "status": {
    "code": 0
  },
  "events": [
    {
      "name": "fetchStart",
      "time": [
        1592582737,
        16000105
      ]
    },
    // more events...
  ]
}

This is a JSON representation of a span, which is what OpenTelemetry is creating for you using the DocumentLoad plugin. Spans include more information than you see here, but this covers the most important parts: a name, a trace identifier, a span identifier, a timestamp, and a duration. We can also see attributes and events -- these are, respectively, properties that help categorize the span, and events that occurred during the span's lifetime.

Let's add some more attributes to our spans in order to make them a bit more useful. Since our ultimate goal is to understand the performance of our page loads, there are two things I can immediately think of that would be useful -- the language of the user's browser, and the path our users are accessing. We can add both of these properties to our traces by creating some default attributes. In tracing.js, add a new object and modify your provider initialization as follows:

const locale = {
  "browser.language": navigator.language,
  "browser.path": location.pathname
}

const provider = new WebTracerProvider({
  plugins: [
    new DocumentLoad()
  ],
  defaultAttributes: locale
});

Our locale object reads a few values from the browser runtime (namely, the language the browser is set to and the current path) and assigns them to our provider as default attributes, which means they'll be applied to every span our tracer creates. If you refresh your page, you can prove this to yourself by looking at the attributes key in the console output. We'll use these later to get an idea of which pages people are looking at, and roughly where they're from in the world (or at least, we'll be able to use the browser language as a rough proxy for that).

Now that we've added OpenTelemetry, we need to actually get the data somewhere other than the browser console. There are a few wrinkles to attend to here as well. First, on modern browsers, OpenTelemetry uses the Beacon API to forward telemetry to a collector service in order to reduce latency for end-users. We also need somewhere to send that data. You can either export telemetry data directly to a backend service, or send it to a collector to be aggregated and forwarded.

You can discover the wide range of exporters available to OpenTelemetry at the OpenTelemetry Registry.

There are pros and cons to each of these methods, which we won't fully elaborate on due to space considerations, but for the purposes of this tutorial we'll be setting up an OpenTelemetry Collector to receive our telemetry data. This provides a useful separation of concerns between the generation of telemetry and the routing of that telemetry - for example, if we want to send our telemetry data elsewhere, we can do so by modifying our collector configuration, without having to redeploy our site.

Deploying the OpenTelemetry Collector

The collector itself is a fairly straightforward piece of software with a few moving parts to understand. Generally, though, it allows you to define one or more Receivers, which are endpoints that can receive telemetry data in a specific format. This telemetry data is then sent to another system for analysis and storage through an Exporter. Receivers and Exporters are part of one or more Pipelines, which also allow for the configuration of Processors that can modify the telemetry data in some way. Finally, the collector supports several Extensions which add new features and functionality.
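Before we get into specifics, here's the general shape of a collector configuration file -- just a sketch to show how receivers, processors, exporters, and extensions get wired together into pipelines; our actual configuration comes later in the tutorial:

receivers:        # how telemetry gets into the collector
  otlp:
processors:       # optional transformations, applied in order
  batch:
exporters:        # where telemetry goes next
  otlp:
extensions:       # extra functionality, e.g. a health check endpoint
  health_check:
service:
  extensions: [health_check]
  pipelines:
    traces:       # a pipeline ties receivers, processors, and exporters together
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]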

In our case, we don't need anything terribly complicated in terms of collector configuration. We're going to receive data in the OpenTelemetry Format (henceforth referred to as OTLP), and export it to Lightstep using OTLP as well. We'll add some processors to control the amount of memory our collector instances use, and to allow for batching and retrying of exports. One other consideration we must address, however, is TLS (Transport Layer Security). If we deploy our site using HTTPS (and you are using HTTPS in 2020, aren't you?) then our collector also needs to be served over HTTPS. Since we're using Kubernetes, we can take advantage of the Ingress resource to handle this for us -- we'll be using nginx-ingress along with cert-manager to automate the process of creating and provisioning SSL certificates.

Note that the OTLP receiver can handle TLS on its own, but for ease of certificate maintenance, we'll be using our Kubernetes Ingress to terminate TLS and send data to the collector instances in plaintext.

I'm going to split this next part into a few discrete steps because there's a lot going on. I'm going to assume that your cluster is basically pristine - here's what mine looked like when I started.

  • Master version: 1.16.8-gke.15.
  • 3 nodes, type n1-standard-1 with autoscaling.

Set your kubectl context to your Kubernetes cluster before continuing.

Preparing our Kubernetes Cluster

Without getting into a ton of extraneous details, we're going to be using nginx-ingress as our Ingress resource provider in lieu of the GKE Ingress. This is mostly because of how health checks work on GKE Ingress and how the OTLP receiver in the collector responds to them (in short, GKE expects the / route to return HTTP 200 OK on GET, even if your container readiness probe specifies something else entirely), so we'll start by installing nginx-ingress and cert-manager on our cluster.

Not using GKE? Find more installation instructions for NGINX Ingress here and for cert-manager here.

You'll need to first initialize your user as a cluster administrator by running the following command.

$ kubectl create clusterrolebinding cluster-admin-binding \
--clusterrole cluster-admin \
--user $(gcloud config get-value account)

After this, install nginx-ingress by running this command.

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/cloud/deploy.yaml

You should see a variety of resources being configured and created on your cluster. You can validate that the installation worked by executing $ kubectl get pods -n ingress-nginx; you should see something similar to the following:

NAME                                        READY   STATUS      RESTARTS   AGE
ingress-nginx-admission-create-9hv54        0/1     Completed   0          22h
ingress-nginx-admission-patch-ddjfp         0/1     Completed   0          22h
ingress-nginx-controller-579fddb54f-zjhq7   1/1     Running     0          22h

Now, let's install cert-manager. Run the following command.

$ kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.15.1/cert-manager.yaml

Again, you'll see a lot of output as resources are created on your cluster. Validate the installation by running $ kubectl get pods -n cert-manager; your result should look something like this:

NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-9b8969d86-zrgpg               1/1     Running   0          22h
cert-manager-cainjector-8545fdf87c-pfvxd   1/1     Running   0          22h
cert-manager-webhook-8c5db9fb6-4bdpq       1/1     Running   0          22h

We're now ready to configure our deployment of the OpenTelemetry Collector.

Creating a Collector Configuration

Our first order of business will be to configure the collector itself. We'll store our configuration as a Kubernetes ConfigMap which will be mounted into each pod, and the collector will read this file at startup to configure itself. This makes reconfiguring our collector as simple as updating the ConfigMap, then restarting the pods.
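In practice, a reconfiguration is just a couple of commands -- a minimal sketch, assuming the deployment ends up named web-collector-deployment, as it will be in the manifests we apply below:

$ kubectl apply -f configmap.yaml
$ kubectl rollout restart deployment/web-collector-deployment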

In our case, we expect fairly light load on the collectors, so we're not going to go overboard in resourcing them. Here's the ConfigMap I used; I'll explain some of the more esoteric parts below.

apiVersion: v1
kind: ConfigMap
metadata:
  name: web-collector-conf
  labels:
    app: opentelemetry-collector
    component: web-collector-conf
data:
  web-collector-config: |
    receivers:
      otlp:
        endpoint: "0.0.0.0:55680"
    processors:
      batch:
      memory_limiter:
        ballast_size_mib: 700
        limit_mib: 1500
        spike_limit_mib: 100
        check_interval: 5s
      queued_retry:
    extensions:
      health_check: {}
    exporters:
      otlp:
        endpoint: "ingest.lightstep.com:443"
        headers:
          "lightstep-access-token": <insert access token>
    service:
      extensions: [health_check]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch, queued_retry]
          exporters: [otlp]

The configuration file for the collector is also YAML, which makes it compose neatly with Kubernetes manifests. Two things are of particular note here: the memory_limiter processor and the otlp exporter. The memory limiter's documentation covers the details, but in short, these options help manage the memory usage of the collector process in order to prevent it from running out of memory. On the exporter, I've set the endpoint to forward traces to Lightstep, and I'm passing in an access token (you'd find this in your Lightstep project under 'Settings') as a header.

Not already a Lightstep user? Get a free developer account here and follow along!

If we wanted to add another exporter to this pipeline, it'd be very simple -- create a new exporter and add it to the array of exporters in the pipelines section of our configuration, as sketched below. We could also define a metrics pipeline and send that data to Prometheus or any other desired system. This is, really, one of the advantages of using the collector - you can manage where telemetry goes, completely independent of how it's generated.
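For example -- purely as a sketch, and assuming we wanted to tee traces to the collector's built-in logging exporter for debugging alongside Lightstep -- the relevant part of the configuration would change to something like this:

exporters:
  logging:
    loglevel: debug
  otlp:
    endpoint: "ingest.lightstep.com:443"
    headers:
      "lightstep-access-token": <insert access token>
service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, queued_retry]
      exporters: [otlp, logging]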

Set up the Kubernetes Deployment and Service

Now that our configuration is settled, it's time to deploy the collector to our Kubernetes cluster and expose it so that it's reachable by our Ingress. At this point, I'd suggest you reference this GitHub repository as a source for the Kubernetes YAML, as I'm going to just point out things you should be aware of -- we're not doing anything too different from a bog-standard deployment. First, let's check out deployment.yaml.

One very important thing to note is the command being passed to the container: the --mem-ballast-size-mib flag must match the ballast_size_mib value defined in the ConfigMap. Other than that, it's a fairly straightforward deployment. The livenessProbe and readinessProbe access port 13133 because that's the default port for the health_check extension we enabled in the collector configuration. Finally, take note of the image -- in this case, we're using a development build of the collector, but you may wish to use a stable release or the opentelemetry-collector-contrib container, which bundles exporters and plugins that aren't in the "main line" collector.

On to service.yaml. We're simply mapping port 55680 to port 80 on a ClusterIP service, which is how our Ingress will connect to it.
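For reference, here's an abbreviated sketch of the relevant parts of those two files. This isn't the repository's exact YAML -- the image is a placeholder and some fields are trimmed -- but it shows the pieces called out above: the ballast flag matching the ConfigMap, the health check probes on port 13133, and the ClusterIP service mapping port 80 to the OTLP receiver.

# deployment.yaml (abbreviated sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-collector-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: opentelemetry-collector
  template:
    metadata:
      labels:
        app: opentelemetry-collector
    spec:
      containers:
      - name: opentelemetry-collector
        image: <collector image>  # a stable release or the -contrib build
        args:
        - "--config=/conf/web-collector-config.yaml"
        - "--mem-ballast-size-mib=700"  # must match ballast_size_mib in the ConfigMap
        ports:
        - containerPort: 55680  # OTLP receiver
        livenessProbe:
          httpGet:
            path: /
            port: 13133  # health_check extension
        readinessProbe:
          httpGet:
            path: /
            port: 13133
        volumeMounts:
        - name: web-collector-config-vol
          mountPath: /conf
      volumes:
      - name: web-collector-config-vol
        configMap:
          name: web-collector-conf
          items:
          - key: web-collector-config
            path: web-collector-config.yaml
---
# service.yaml (abbreviated sketch)
apiVersion: v1
kind: Service
metadata:
  name: web-collector-svc
spec:
  type: ClusterIP
  selector:
    app: opentelemetry-collector
  ports:
  - port: 80          # what the Ingress talks to
    targetPort: 55680 # the collector's OTLP receiver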

At this point, you're ready to start deploying this to your cluster. Execute $ kubectl apply -f configmap.yaml -f deployment.yaml -f service.yaml, and the cluster will be on its way. After a few moments (slightly more if you've never pulled these containers before), running $ kubectl get pods should display something similar to this:

NAME                                        READY   STATUS    RESTARTS   AGE
web-collector-deployment-79cfc8797c-7vvln   1/1     Running   0          23h
web-collector-deployment-79cfc8797c-vzslm   1/1     Running   0          23h

Creating your Ingress and Certificates

You're halfway there, and if you were only going to be sending telemetry data to your collector from inside this cluster, you'd be ready to roll. However, we want to send data from outside the cluster, so we need to expose this service to the world.

Important Note! You'll need a domain name that you control in order to create a certificate using ACME, so be ready.

First, we're going to deploy our Ingress service in order to determine the external IP address we need to assign to our domain name. We'll actually be deploying this ingress in two steps, so rather than simply applying the existing ingress.yaml, take a look at this version:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: web-collector-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  tls:
  - hosts:
    - <YOUR FQDN>
    secretName: web-collector-tls
  rules:
  - host: <YOUR FQDN>
    http:
      paths:
      - path: /
        backend:
          serviceName: web-collector-svc
          servicePort: 80

For <YOUR FQDN>, you would want to use whatever domain name will point to your collector (in my case, I used 'otelwebtelemetry.com', but you could use a subdomain, such as 'collector.mysite.com'). Save this file and apply it using kubectl, and wait several minutes. Run $ kubectl get ingress and you should see something similar to the following:

NAME                    HOSTS                  ADDRESS           PORTS     AGE
web-collector-ingress   otelwebtelemetry.com   104.198.132.223   80, 443   22h

In your DNS management, set your host to the ADDRESS you see in your kubectl output. Note that DNS changes can take some time to propagate around the internet, so you may need to wait up to 30 minutes (or longer) - a good way to see if it's ready is to run $ dig @8.8.8.8 <your domain> and see if the answer section has correctly associated your domain name with the IP address of your ingress controller.
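Using my domain and the ingress address from the example above, a propagated record looks roughly like this (your TTL and values will differ):

$ dig @8.8.8.8 otelwebtelemetry.com

;; ANSWER SECTION:
otelwebtelemetry.com.   300     IN      A       104.198.132.223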

Meanwhile, you should verify that the Ingress controller is functioning properly. The easiest way to do so, for the collector, is to run curl against the OTLP receiver path.

$ curl -kivL -X POST -H 'Host: <YOUR FQDN>' 'http://<YOUR IP>/v1/trace'

This command gives verbose output, follows redirects, prints response headers, and ignores certificate errors as it makes a POST request to the OTLP endpoint. If you get a 200 OK response, everything is working, and we can set up certificate management through Let's Encrypt.

Refer to the le-staging-issuer.yaml and le-prod-issuer.yaml files in the repository. You should start with the staging one, as Let's Encrypt aggressively rate limits its production environment - once everything works, you'll switch to the production (prod) issuer.

apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: your@email.com
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - http01:
        ingress:
          class:  nginx

In both this and the production issuer, make sure to change the email field to one that you control. You can then apply this to your cluster with $ kubectl apply -f le-staging-issuer.yaml. Verify that the issuer was successfully created and registered by running $ kubectl describe clusterissuer letsencrypt-staging and checking that it reports a condition of type Ready.

In your ingress.yaml, add two new annotations:

annotations:
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: "letsencrypt-staging"
    acme.cert-manager.io/http01-ingress-class: "nginx"

Now, run $ kubectl apply -f ingress.yaml once again. After a few moments, run $ kubectl get certificate, and you should see a certificate that's set to the value of secretName in your ingress (in my case, it's web-collector-tls). Run $ kubectl describe certificate <name> and you should see 'Ready' under Type, as well as several events (one of which should say 'Certificate issued successfully').

The last step, then, is to switch from the staging Let's Encrypt issuer to the production one. In your ingress, change the "cert-manager.io/cluster-issuer" annotation value to "letsencrypt-prod" and change the secretName so that it won't conflict with the staging secret (you can just add -prod). Deploy the production issuer by running $ kubectl apply -f le-prod-issuer.yaml, then redeploy your ingress. You should now have an OpenTelemetry Collector deployed to the public internet! Verify this with $ curl -vL -X POST https://<your domain>/v1/trace; if you see a 200 response code with empty braces as the body, you're good to go!
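For reference, the changed portions of the production ingress look roughly like this (the -prod suffix on the secret name is just the convention suggested above):

metadata:
  annotations:
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    acme.cert-manager.io/http01-ingress-class: "nginx"
spec:
  tls:
  - hosts:
    - <YOUR FQDN>
    secretName: web-collector-tls-prod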

Configure OpenTelemetry Web with the Collector Exporter

That was a lot, I know - but we're back to something more straightforward now. Only one more step to go! Back in our tracing.js file, add a new import and configure the Collector Exporter. Here's what your file should look like when we're done:

import { SimpleSpanProcessor } from '@opentelemetry/tracing';
import { WebTracerProvider } from '@opentelemetry/web';
import { DocumentLoad } from '@opentelemetry/plugin-document-load';
import { CollectorExporter } from '@opentelemetry/exporter-collector';

const exporter = new CollectorExporter({
  serviceName: '<your website name>',
  url: 'https://<your domain name>/v1/trace'
})

const locale = {
  "browser.language": navigator.language,
  "browser.path": location.pathname
}

const provider = new WebTracerProvider({
  plugins: [
    new DocumentLoad()
  ],
  defaultAttributes: locale
});

provider.addSpanProcessor(new SimpleSpanProcessor(exporter));
provider.register();

Now, if you've configured everything correctly up to this point, you should be able to refresh your page a few times in order to generate data and then open up Lightstep. You should see some data in the Explorer, corresponding to page loads from your site!

Static Site Performance - Explorer

From here, you can simply deploy your site to the internet using Netlify, GitHub Pages, or your own hosting and start to see exactly how people are using your site in new and interesting ways. Want to know how page load speeds look for users browsing in Russian, grouped by the pages they're viewing? Lightstep makes that easy!

Understanding Static Site Performance - Explorer Detail

Summary

We've been through a lot in this tutorial, so I think it's best to do a quick recap of everything we learned today.

  • Integrating OpenTelemetry into your static site is as straightforward as adding some packages and configuring a tracing module. You don't have to change any of your existing code -- just make sure that you import the tracing module first!
  • Setting up an OpenTelemetry Collector is a great way to collect trace and metric data from your services, be they front-end or back-end, and can be done through Kubernetes.
  • Once you're using OpenTelemetry, Lightstep is a great way to get started analyzing your trace data, but by no means are you locked in. You can use OpenTelemetry to export data to a variety of open source and proprietary analysis tools!

Thanks for sticking with me through this - I know it's a lot to take in, but once you try it, you'll find something to love. I firmly believe that OpenTelemetry solves one of the biggest problems that plagues people who run and build software for a living: the eternal question, "What the $#@* is it doing?" As someone who's asked that question many times over the years, usually in the middle of the night, I'm really excited to see the progress it has made towards providing easier and better answers to that question.
