- Book: TypeScript in Production
- Also by me: The TypeScript Library — the 5-book collection
- My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
- Me: xgabriel.com | GitHub
You have seen the shape of this incident before. A 500 lands in production. The frontend says "checkout failed". The Hono service that owns /checkout called the pricing service, which called the inventory service. Inventory is the one that timed out, but its logs are on a different host with its own clock skew and no shared request id. The on-call engineer spends the next stretch of the morning tail-grepping log streams, trying to align them by guesswork.
The version with OpenTelemetry wired in is different. You open the trace by request id, see the full span tree across all four services, and the slow span, inventory.reserve, is outlined in red with the SKU and warehouse partition as attributes. The fix ships before the support inbox finishes loading.
The gap between those outcomes is small. The bootstrap file is about fifty lines of TypeScript; the Hono middleware adds another forty; the example handler is about thirty. A handful of npm packages plus a docker compose up for the backend, and you are done.
What you are wiring up
The Node SDK (@opentelemetry/sdk-node) initializes the global tracer provider, wires the resource (service name, version, environment), starts the OTLP exporter, and plugs in the auto-instrumentations. The auto-instrumentation bundle (@opentelemetry/auto-instrumentations-node) patches Node's http/https, pg, mysql2, ioredis, mongodb, fetch, and a couple of dozen other modules at require time, so every outgoing call gets a span without you writing code. A small Hono middleware turns each incoming request into a server span with the route, method, and status code, and threads the trace context for everything the handler does inside.
The exporter sends spans to an OTLP endpoint. In dev that endpoint is a local Jaeger container. In prod it is Grafana Tempo, Honeycomb, Datadog's OTLP gateway, or whatever your team runs. Switching between them is one env var.
Install and bootstrap
npm install hono \
  @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-proto \
  @opentelemetry/resources \
  @opentelemetry/semantic-conventions \
  @opentelemetry/api
@opentelemetry/api is the package your application code imports from. The other OTel packages are wired up at startup and you never reference them again from your handlers. That separation matters: if you swap the exporter or the SDK later, no business code changes.
tracing.ts gets loaded before any other module in your app. That is the only load-order rule that matters in OTel-Node land. Auto-instrumentations patch modules at require / import time, so they have to run before the modules they wrap.
// tracing.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-proto";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { resourceFromAttributes } from "@opentelemetry/resources";
import {
  ATTR_SERVICE_NAME,
  ATTR_SERVICE_VERSION,
} from "@opentelemetry/semantic-conventions";
// deployment.environment.name is still an unstable convention,
// so it lives in the /incubating sub-export.
import {
  ATTR_DEPLOYMENT_ENVIRONMENT_NAME,
} from "@opentelemetry/semantic-conventions/incubating";

const exporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT
    ?? "http://localhost:4318/v1/traces",
});

const sdk = new NodeSDK({
  resource: resourceFromAttributes({
    [ATTR_SERVICE_NAME]: process.env.OTEL_SERVICE_NAME ?? "checkout-api",
    [ATTR_SERVICE_VERSION]: process.env.SERVICE_VERSION ?? "0.0.0",
    [ATTR_DEPLOYMENT_ENVIRONMENT_NAME]: process.env.NODE_ENV ?? "development",
  }),
  traceExporter: exporter,
  instrumentations: [
    getNodeAutoInstrumentations({
      // The fs instrumentation fires on every read and is
      // pure noise for an HTTP service. Disable it.
      "@opentelemetry/instrumentation-fs": { enabled: false },
    }),
  ],
});

sdk.start();
And the shutdown handler that most blog posts skip:
process.on("SIGTERM", () => {
sdk.shutdown()
.catch((err) => console.error("OTel shutdown failed", err))
.finally(() => process.exit(0));
});
That is the whole bootstrap. A few notes on the choices.
The exporter uses OTLP HTTP/protobuf (exporter-trace-otlp-proto), not gRPC. HTTP/protobuf is the more portable choice for Node services: it works through proxies, works on serverless, and is debuggable with curl. Use the gRPC exporter only if your backend specifically needs it. (See the OTel JS exporters page for the full comparison.)
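That debuggability is easy to check: once an OTLP receiver is listening on 4318 (the Jaeger container later in this post, for example), an empty OTLP/JSON payload should come back with a 2xx. A sketch:

# Reachability check for the OTLP HTTP receiver; a 2xx response means it is listening.
curl -i -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[]}'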
resourceFromAttributes is the current Resources API. In the OTel JS 2.x line the Resource class itself was made package-private (renamed ResourceImpl internally), and resourceFromAttributes, defaultResource, and emptyResource are the public factories now. Three resource attributes carry their weight on every single span the SDK exports: service name, service version, and deployment environment. They are the labels you join on, group by, and filter with for the next year. Set them once, set them right, and you save yourself a fleet-wide redeploy later.
The instrumentation-fs toggle is the one auto-instrumentation everyone disables. It traces every fs.readFile, which on a hot path means hundreds of useless spans per request. Turn it off the day you wire OTel up.
The SIGTERM handler is the part most blog posts skip and prod regrets. The OTLP exporter buffers spans and flushes on a timer (the BatchSpanProcessor under the hood); without sdk.shutdown() on shutdown, the last few seconds of traces from a rolling deploy disappear. That is exactly the window you want traces from when something goes wrong.
Loading it before everything else
Two ways to make sure tracing.ts runs first.
The clean way, in your package.json:
{
  "scripts": {
    "start": "node --require ./dist/tracing.js dist/server.js",
    "dev": "tsx --import ./tracing.ts src/server.ts"
  }
}
--require (CJS) or --import (ESM) loads the file before the entrypoint. Auto-instrumentations register their hooks, and then your import "hono" picks up the instrumented http module.
The lazy way: import "./tracing" as the very first line of server.ts. It works for most cases, but it depends on import-order discipline that breaks the moment somebody adds a side-effecting import above it. Use the flag.
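For completeness, the fragile variant looks like this:

// server.ts, fragile variant: only works while this stays the very first
// import. Any side-effecting import added above it (a db client, a config
// loader that calls fetch) loads before the instrumentation patches run.
import "./tracing.js";
import { Hono } from "hono";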
The Hono middleware
Hono runs in Node, Bun, Deno, Cloudflare Workers, and a handful of edge runtimes. The auto-instrumentations only patch Node's http module, so on Node you already get a server span per request for free. On Bun or Workers you do not: those runtimes ship their own HTTP layer that the Node instrumentation cannot patch.
A small middleware fixes that gap and lets you attach Hono-specific attributes (the matched route pattern) that the raw http span does not know about.
// otel-middleware.ts
import { trace, SpanStatusCode, SpanKind } from "@opentelemetry/api";
import type { MiddlewareHandler } from "hono";
import { routePath } from "hono/route";

// The second arg is the *instrumentation* version of this
// middleware itself, not Hono's version.
const tracer = trace.getTracer("hono-otel-middleware", "1.0.0");

export const otelMiddleware = (): MiddlewareHandler =>
  async (c, next) => {
    // routePath() comes from Hono's Route Helper. c.req.routePath
    // was deprecated in Hono v4.8.0 in favour of this.
    const route = routePath(c) ?? c.req.path;
    const name = `${c.req.method} ${route}`;
    await tracer.startActiveSpan(
      name,
      { kind: SpanKind.SERVER },
      async (span) => {
        span.setAttribute("http.request.method", c.req.method);
        span.setAttribute("url.path", c.req.path);
        span.setAttribute("http.route", route);
        try {
          await next();
          const status = c.res.status;
          span.setAttribute("http.response.status_code", status);
          if (status >= 500) {
            span.setStatus({ code: SpanStatusCode.ERROR });
          }
        } catch (err) {
          span.setStatus({
            code: SpanStatusCode.ERROR,
            message: (err as Error).message,
          });
          span.recordException(err as Error);
          throw err;
        } finally {
          span.end();
        }
      },
    );
  };
startActiveSpan is doing the heavy lifting. It creates the span and sets it as the active span on the current async context, so any code that runs inside next() automatically nests under it. That includes your handlers, your Drizzle queries, and your fetch calls. That is how you get a tree instead of a flat list of unrelated spans.
routePath(c) returns the matched route pattern (/users/:id), not the resolved URL (/users/42). You want the pattern as the span name; the resolved path goes on url.path. Otherwise your trace explorer shows ten thousand unique span names instead of one named GET /users/:id with cardinality on an attribute. (On Hono 4.7 and earlier, fall back to c.req.routePath.)
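Concretely, inside the middleware for a request to /users/42 matched by a route registered as /users/:id:

routePath(c); // "/users/:id" -> span name "GET /users/:id"
c.req.path;   // "/users/42"  -> url.path attribute, one value per request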
The error path matters as much as the naming: recordException plus the ERROR status on 5xx are what make the trace queryable later. "Show me all spans where status >= 500 in the last hour" is the query you will actually run during an incident.
Wiring it up
Assume Drizzle wired to Postgres for illustration:
// server.ts
import { Hono } from "hono";
import { otelMiddleware } from "./otel-middleware.js";
import { trace } from "@opentelemetry/api";
import { db } from "./db.js";
import { items } from "./schema.js";
import { eq } from "drizzle-orm";

const app = new Hono();
app.use("*", otelMiddleware());

// Different tracer name from the middleware on purpose: this one
// is for business code, named after the domain. The middleware's
// tracer is named after what it instruments (the framework).
const tracer = trace.getTracer("checkout", "1.0.0");

app.get("/checkout/:cartId", async (c) => {
  const cartId = c.req.param("cartId");
  return tracer.startActiveSpan("checkout.price", async (span) => {
    span.setAttribute("checkout.cart.id", cartId);
    try {
      const rows = await db.select().from(items)
        .where(eq(items.cartId, cartId));
      span.setAttribute("checkout.cart.size", rows.length);
      const res = await fetch(
        `${process.env.PRICING_URL}/price`,
        { method: "POST", body: JSON.stringify(rows) },
      );
      const total = (await res.json()) as { total: number };
      span.setAttribute("checkout.total_cents", total.total);
      return c.json(total);
    } finally {
      span.end();
    }
  });
});

export default app;
Most of what gets traced here, you did not write code for. The app.use("*", otelMiddleware()) line creates the server span for every request, and that is the parent everything else nests under. The db.select() call goes through Drizzle, which uses pg under the hood, which the auto-instrumentation patched at startup, so you get a span named something like pg.query with the SQL on it, automatically. The fetch to the pricing service goes through Node's built-in fetch, which the auto-instrumentation also patched: you get a client span with the target URL, the method, and the response status, and the traceparent header is injected automatically into the outgoing request, so the pricing service can pick up the trace context on its end.
The tracer.startActiveSpan("checkout.price", ...) is the one custom span. It carries business attributes the auto-instrumentations cannot know about: the cart id, the cart size, the final total. Custom spans are where the trace stops being a generic HTTP/SQL waterfall and starts being a story your team can read.
The parent-child relationship is automatic. startActiveSpan reads the current async context (set by the middleware), nests the new span under the server span, and any auto-instrumented pg/fetch calls inside nest under that. No manual parentSpanId plumbing.
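Roughly what the tree looks like for one request (names follow the code above; the pg span name and the timings vary by instrumentation version and load):

GET /checkout/:cartId            18 ms   server span (otelMiddleware)
└── checkout.price               16 ms   custom span (handler)
    ├── pg.query SELECT ...       4 ms   auto-instrumented (pg via Drizzle)
    └── POST .../price            9 ms   client span (fetch auto-instrumentation)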
Context propagation across services
The W3C traceparent header is what makes a four-service trace possible. It looks like this:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
Version, trace id, parent span id, trace flags (the sampled bit). The auto-instrumentation injects it on every outgoing fetch/http/gRPC call, and extracts it on every incoming request. Your downstream service, if it has its own OTel SDK loaded, picks up the parent context and continues the trace.
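For the client libraries the auto-instrumentation does not cover (a hand-rolled socket client, a queue producer), the same header can be injected manually through the API package. A minimal sketch; sendToQueue is a hypothetical transport, not a real API:

import { context, propagation } from "@opentelemetry/api";

// Hypothetical transport that the auto-instrumentation does not patch.
async function sendToQueue(msg: {
  body: string;
  headers: Record<string, string>;
}): Promise<void> {
  // ...publish to your broker of choice
}

// Serialize the active trace context (traceparent, tracestate) into a
// plain header map and ship it with the message.
const headers: Record<string, string> = {};
propagation.inject(context.active(), headers);
await sendToQueue({ body: JSON.stringify({ event: "cart.merged" }), headers });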
That is the single biggest thing OTel buys you. Your team does not need a shared logging schema, a custom request id, a homemade correlation header, or a careful contract between services. Every service initializes the SDK, every service ships traces to the same OTLP endpoint, and the trace tree assembles itself across language and framework boundaries.
The one footgun: a service in the middle of the chain with no OTel installed breaks the trace. It drops the traceparent header, the service downstream of it starts a fresh trace id, and your dashboard shows two unconnected trees.
When you onboard OTel to a system, do it from the edge inward: frontend, gateway, then services in dependency order. The first deploy that does not break the chain is the day every trace becomes end-to-end.
Backends: Jaeger in dev, Tempo (or anything OTLP) in prod
Drop this into a docker-compose.yml next to your service:
services:
  jaeger:
    # Pinned to v1.62 because COLLECTOR_OTLP_ENABLED is the
    # right flag for Jaeger v1.x. Jaeger v2 uses a different
    # config model and OTLP is on by default there.
    image: jaegertracing/all-in-one:1.62
    container_name: jaeger
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - "16686:16686" # Jaeger UI
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
docker compose up -d, open http://localhost:16686, and the next request to your Hono service shows up in the search tab within a couple of seconds. COLLECTOR_OTLP_ENABLED=true is what flips on the OTLP receivers in v1.x; without it, port 4318 is closed and your exporter quietly fails into the void. (Verified against Jaeger's official getting-started guide.)
For the search query: pick the service name you set in the resource (checkout-api), pick a recent time range, and you get the full span tree per request, with the SQL, the outgoing fetch, and your custom checkout.price span all nested under the Hono server span. The slow span at the bottom is your suspect.
In prod the URL changes and almost nothing else does:
OTEL_EXPORTER_OTLP_ENDPOINT=https://tempo.example.com:4318/v1/traces
OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer ${TEMPO_TOKEN}"
Same OTLP HTTP/protobuf wire format, same exporter code, different URL and an auth header. Honeycomb, Datadog OTLP gateway, New Relic, and Lightstep all consume the same OTLP payload. The exporter is the part of OTel you almost never need to swap; the protocol is the actual standard.
In prod, almost always run an OTel Collector between your service and the backend. The collector batches, retries, drops on overload, and gives you one place to attach extra resource attributes (region, cluster, k8s pod name) without rebuilding your service image. The exporter in your service points at the collector's 4318; the collector points at Tempo. Worth it the day you switch backends or add a second one.
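A minimal collector config in that shape might look like the sketch below; the Tempo endpoint, the token variable, and the region value are placeholders, and the processor options are worth checking against the collector version you deploy:

# otel-collector.yaml (sketch)
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}
  resource:
    attributes:
      - key: cloud.region      # added to every span passing through
        value: eu-west-1
        action: upsert

exporters:
  otlphttp:
    endpoint: https://tempo.example.com:4318
    headers:
      authorization: "Bearer ${env:TEMPO_TOKEN}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [otlphttp]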
The three attributes you should always set
Most teams set service name and stop. Set three.
- service.name is the identifier the trace UI groups by, so pick something that survives renames (checkout-api, not api, not node-server, not the hostname).
- service.version is the build version (git SHA, semver tag, whatever your CI sets); the single most useful filter during an incident is "which version started failing?", and without service.version on every span you are guessing from deploy timestamps.
- deployment.environment.name is production or staging or dev, because eventually somebody points a staging service at the prod OTLP endpoint and you need to filter the noise.

The tracing.ts above sets all three. If you do nothing else from this post, set those three.
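A sketch of where service.version usually comes from, assuming the start script from earlier and a CI step that can run git; the exact wiring depends on your pipeline:

# Deploy-time wiring (sketch): stamp the build with the git SHA.
SERVICE_VERSION=$(git rev-parse --short HEAD) \
OTEL_SERVICE_NAME=checkout-api \
NODE_ENV=production \
node --require ./dist/tracing.js dist/server.js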
When to add metrics and logs
OpenTelemetry covers three signals: traces, metrics, logs. Traces are the right starting point. They answer "where did the time go?" and "where did the error happen?" in one query.
Add metrics when traces become too high-cardinality to query from the dashboard layer. Request rate, error rate, p95 latency by endpoint: those belong in metrics. Trace sampling gives statistical samples; metrics give exact counts. The Node SDK supports metrics through the same package: add a metricReader to the NodeSDK constructor.
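A sketch of that addition, assuming two extra packages not in the install list above (@opentelemetry/sdk-metrics and @opentelemetry/exporter-metrics-otlp-proto); everything else in tracing.ts stays the same:

// tracing.ts, extended (sketch)
import { PeriodicExportingMetricReader } from "@opentelemetry/sdk-metrics";
import { OTLPMetricExporter } from "@opentelemetry/exporter-metrics-otlp-proto";

const sdk = new NodeSDK({
  // ...same resource, traceExporter, and instrumentations as before
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: process.env.OTEL_EXPORTER_OTLP_METRICS_ENDPOINT
        ?? "http://localhost:4318/v1/metrics",
    }),
    exportIntervalMillis: 60_000, // push once a minute
  }),
});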
Add logs through OTel when you want them correlated to traces by trace id. The auto-instrumentation includes log correlation for pino and winston: every log line picks up trace_id and span_id, so pivoting from a span to its logs becomes one click. Order: traces, then metrics, then logs.
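A sketch of what that correlation looks like with pino, assuming the pino instrumentation in the auto-instrumentations bundle stays enabled (it is by default):

import pino from "pino";

const logger = pino();

// Inside an active span (the middleware, or any startActiveSpan callback),
// the pino instrumentation adds correlation fields to each record, roughly:
// {"level":30,"msg":"cart priced","trace_id":"4bf9...","span_id":"00f0...","trace_flags":"01"}
logger.info("cart priced");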
What comes after
Wire this up on a Tuesday afternoon. The next time a request goes wrong, the on-call engineer who would have spent the morning correlating logs spends two minutes reading a tree instead. That is when OTel earns the lines.
Two things compound from there. The first is sampling: head sampling at the SDK, tail sampling at the collector. You keep the slow and erroring traces and drop the boring 200-OK ones. The second is custom spans named for operations a product manager would recognise (checkout.price, cart.merge, inventory.reserve), not internal function names. Span links across async boundaries come next, so a queue job processed five minutes later still threads back to the request that produced it.
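The head-sampling half is a constructor option on the SDK. A minimal sketch; the 10% ratio is a placeholder, and ParentBasedSampler keeps any trace whose caller already decided to sample:

import { NodeSDK } from "@opentelemetry/sdk-node";
import {
  ParentBasedSampler,
  TraceIdRatioBasedSampler,
} from "@opentelemetry/sdk-trace-node";

const sdk = new NodeSDK({
  // ...resource, traceExporter, instrumentations as in tracing.ts
  sampler: new ParentBasedSampler({
    // Respect upstream sampling decisions; sample ~10% of new root traces.
    root: new TraceIdRatioBasedSampler(0.1),
  }),
});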
Each is another small file. The thing that turns OTel into something the team actually uses is the first incident resolved by reading a trace. Wire it up, ship it, wait for the first interesting failure.
If this was useful
OTel is one of the production-layer concerns that quietly decides whether your team can debug a service at 02:00 or has to wait for daylight. The same is true of tsconfig, build pipelines, monorepo layout, and library authoring. TypeScript in Production is the book for the production layer of TypeScript work.
The full TypeScript Library, five books:
- TypeScript Essentials — daily-driver TS across Node, Bun, Deno, and the browser: Amazon
- The TypeScript Type System — generics, mapped/conditional types, template literals, branded types: Amazon
- Kotlin and Java to TypeScript — bridge for JVM developers: Amazon
- PHP to TypeScript — bridge for modern PHP 8+ developers: Amazon
- TypeScript in Production — tsconfig, build, monorepos, library authoring: Amazon
Books 1 and 2 are the core path. Books 3 and 4 substitute for readers coming from JVM or PHP. Book 5 is for shipping TS at work.
All five books ship in ebook, paperback, and hardcover.
