Observability with Claude Code: OpenTelemetry Distributed Tracing for Node.js

"Why is this API slow?" and "Which service is throwing errors?" are questions you can't answer without observability. Claude Code can design the three pillars — logs, metrics, and traces — when you give it the right patterns.

CLAUDE.md for Observability Standards

## Observability Rules

### Three pillars (all required)
1. Logs: pino (structured JSON, PII masking) — see logging.md
2. Metrics: Prometheus format via prom-client
3. Traces: OpenTelemetry (distributed tracing)

### Tracing requirements
- All HTTP requests get a trace ID
- External API calls and DB operations create child spans
- Errors: record exception + set span status to ERROR
- Sampling: production 10%, development 100%

### Metrics requirements
- Request count: http_requests_total (method, route, status_code)
- Latency: http_request_duration_ms (histogram; p50/p95/p99 are computed from its buckets at query time)
- Error rate: http_errors_total (status_code, route)
- Business metrics: order_count, revenue_total, user_registrations_total

### Alert thresholds
- Error rate > 1% → warning
- p99 latency > 1s → warning
- p99 latency > 3s → critical
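
The alert thresholds above boil down to a small decision rule. In a real deployment they would most likely live in Prometheus alerting rules; the sketch below only illustrates the logic in TypeScript. `WindowStats` and `classifyAlert` are hypothetical names, and checking the critical threshold before the warning ones is an assumption.

```typescript
type Severity = 'ok' | 'warning' | 'critical';

// Hypothetical shape: aggregated stats for one evaluation window.
interface WindowStats {
  requests: number; // total requests in the window
  errors: number;   // requests counted as errors
  p99Ms: number;    // p99 latency in milliseconds
}

// Applies the thresholds from the rules above:
// error rate > 1% -> warning, p99 > 1s -> warning, p99 > 3s -> critical.
export function classifyAlert(stats: WindowStats): Severity {
  const errorRate = stats.requests === 0 ? 0 : stats.errors / stats.requests;
  if (stats.p99Ms > 3000) return 'critical';
  if (stats.p99Ms > 1000 || errorRate > 0.01) return 'warning';
  return 'ok';
}
```

Note that the rules define no critical level for error rate, so a high error rate alone never escalates past a warning in this sketch.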

Generating OpenTelemetry Setup

Set up OpenTelemetry for distributed tracing.

Requirements:
- Auto-instrumentation: HTTP, Express, Prisma, Redis
- Exporter: OTLP (Jaeger or Grafana Tempo)
- Resource attributes: service.name, service.version, deployment.environment
- Sampling: TraceIdRatioBased (production: 10%, development: 100%)
- TypeScript

Generate: src/instrumentation.ts (import before anything else in main.ts)

Generated:

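// src/instrumentation.ts: must be the FIRST import in main.ts,
// so the instrumentations can patch http, express, etc. before they are loaded
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';
import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  // Resource attributes identify this service on every exported span.
  // (Resource constructor as in the 1.x packages; 2.x uses resourceFromAttributes.)
  resource: new Resource({
    'service.name': process.env.SERVICE_NAME ?? 'my-service',
    'service.version': process.env.npm_package_version ?? '0.0.0',
    'deployment.environment': process.env.NODE_ENV ?? 'development',
  }),
  traceExporter: new OTLPTraceExporter({
    // The traces-specific variable holds the full URL, including /v1/traces
    url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT ?? 'http://localhost:4318/v1/traces',
  }),
  // Sample 10% of traces in production, 100% everywhere else
  sampler: new TraceIdRatioBasedSampler(
    process.env.NODE_ENV === 'production' ? 0.1 : 1.0
  ),
  instrumentations: [
    getNodeAutoInstrumentations({
      // fs instrumentation tends to produce many low-value spans
      '@opentelemetry/instrumentation-fs': { enabled: false },
    }),
  ],
});

sdk.start();

// Flush pending spans, then exit: a SIGTERM listener replaces Node's default exit
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});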
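
Ratio-based sampling keys its decision on the trace ID, so every service that sees the same trace reaches the same verdict. The following is a simplified illustration of that idea, not the SDK's actual implementation; `shouldSample` and `samplingRatio` are hypothetical names.

```typescript
// Simplified illustration of trace-ID ratio sampling (not the SDK's code).
// traceId is the 32-character hex trace ID carried by every span in a trace.
export function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Treat the first 8 hex characters (32 bits) as a stable pseudo-random value.
  const value = parseInt(traceId.slice(0, 8), 16);
  // Keep the trace when the value falls in the lowest `ratio` share of the range.
  return value < Math.floor(ratio * 0xffffffff);
}

// Mirrors the sampling requirement: 10% in production, 100% elsewhere.
export function samplingRatio(nodeEnv: string | undefined): number {
  return nodeEnv === 'production' ? 0.1 : 1.0;
}
```

Because the decision depends only on the trace ID and the ratio, a trace is either kept in full or dropped in full, as long as all services use the same ratio.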

Prometheus Metrics

Set up Prometheus metrics using prom-client.

Metrics:
- http_requests_total: Counter (method, route, status_code)
- http_request_duration_ms: Histogram (method, route)
- active_connections: Gauge
- db_query_duration_ms: Histogram (operation, table)

Endpoint: GET /metrics (Prometheus scrape format)

Generate:
- src/lib/metrics.ts
- src/middleware/metricsMiddleware.ts

Generated:

// src/lib/metrics.ts
import { Registry, Counter, Histogram, collectDefaultMetrics } from 'prom-client';

export const register = new Registry();
collectDefaultMetrics({ register }); // Node.js built-in metrics

export const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [register],
});

export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_ms',
  help: 'HTTP request duration in ms',
  labelNames: ['method', 'route'],
  buckets: [10, 25, 50, 100, 250, 500, 1000, 2500, 5000],
  registers: [register],
});

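// src/middleware/metricsMiddleware.ts
import type { RequestHandler } from 'express';
import { httpRequestsTotal, httpRequestDuration } from '../lib/metrics';

export function metricsMiddleware(): RequestHandler {
  return (req, res, next) => {
    const start = Date.now();

    // 'finish' fires once the full response has been handed off for sending
    res.on('finish', () => {
      const duration = Date.now() - start;
      // Use the matched route pattern (e.g. /users/:id). Falling back to the raw
      // path would create one label value per distinct URL on unmatched requests.
      const route = req.route?.path ?? 'unmatched';
      httpRequestsTotal.labels(req.method, route, String(res.statusCode)).inc();
      httpRequestDuration.labels(req.method, route).observe(duration);
    });

    next();
  };
}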
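
The prompt also asks for active_connections, db_query_duration_ms, and a GET /metrics endpoint, none of which appear in the generated code above. A minimal sketch of how they might look with prom-client follows. The local `register` stands in for the one exported from src/lib/metrics.ts, and `timeQuery`, `trackActive`, `metricsHandler`, and the bucket boundaries are hypothetical choices, not part of the original output.

```typescript
import type { IncomingMessage, ServerResponse } from 'node:http';
import { Registry, Gauge, Histogram } from 'prom-client';

// Assumption: in the real app, import `register` from src/lib/metrics.ts instead.
export const register = new Registry();

export const activeConnections = new Gauge({
  name: 'active_connections',
  help: 'Requests currently being handled',
  registers: [register],
});

export const dbQueryDuration = new Histogram({
  name: 'db_query_duration_ms',
  help: 'Database query duration in ms',
  labelNames: ['operation', 'table'],
  buckets: [1, 5, 10, 25, 50, 100, 250, 500, 1000], // illustrative boundaries
  registers: [register],
});

// Hypothetical helper: times any async DB call and records it in milliseconds,
// whether the call resolves or throws.
export async function timeQuery<T>(
  operation: string,
  table: string,
  run: () => Promise<T>,
): Promise<T> {
  const start = performance.now();
  try {
    return await run();
  } finally {
    dbQueryDuration.labels(operation, table).observe(performance.now() - start);
  }
}

// Middleware-shaped function: counts a request as active until its response closes.
export function trackActive(_req: IncomingMessage, res: ServerResponse, next: () => void): void {
  activeConnections.inc();
  res.once('close', () => activeConnections.dec());
  next();
}

// GET /metrics handler in Prometheus scrape format.
export async function metricsHandler(_req: IncomingMessage, res: ServerResponse): Promise<void> {
  res.setHeader('Content-Type', register.contentType);
  res.end(await register.metrics());
}
```

Since Express request and response objects extend the node:http types, these should plug in as `app.use(trackActive)` and `app.get('/metrics', metricsHandler)`. The timer is hand-rolled because the metric is named in milliseconds, while prom-client's built-in `startTimer` reports seconds.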

Custom Spans for Business Logic

Add custom trace spans to this order processing flow:
- Check inventory
- Process payment
- Create order

Per span: operation name, orderId, userId, amount
On error: record exception + set status to ERROR

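import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('order-service');

// inventoryService, paymentService, and OrderItem come from the application.
// total is the amount to charge, supplied by the caller.
export async function processOrder(userId: string, items: OrderItem[], total: number) {
  return tracer.startActiveSpan('order.process', async (span) => {
    span.setAttributes({ userId, itemCount: items.length, amount: total });

    try {
      await tracer.startActiveSpan('inventory.check', async (s) => {
        try {
          await inventoryService.check(items);
        } finally {
          s.end(); // end the child span even when the check throws
        }
      });

      const payment = await tracer.startActiveSpan('payment.process', async (s) => {
        try {
          const result = await paymentService.charge(userId, total);
          s.setAttributes({ paymentId: result.id, amount: total });
          return result;
        } finally {
          s.end();
        }
      });

      // Order creation (not shown) follows the same child-span pattern.
      span.setAttributes({ paymentId: payment.id });
    } catch (err) {
      // Errors from child spans propagate here and mark the parent as failed
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
      throw err;
    } finally {
      span.end(); // the parent span ends exactly once on every path
    }
  });
}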
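
The "record exception, set status to ERROR, always end the span" rule repeats for every span, so it can be factored into one helper. This is a minimal sketch using only `@opentelemetry/api`; `withSpan` is a hypothetical name, not something the library provides.

```typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';
import type { Span } from '@opentelemetry/api';

const tracer = trace.getTracer('order-service');

// Hypothetical helper: runs fn inside an active span, records failures on the
// span, and guarantees span.end() is called exactly once.
export async function withSpan<T>(
  name: string,
  fn: (span: Span) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn(span);
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
      throw err; // callers still see the original error
    } finally {
      span.end();
    }
  });
}
```

With it, a step in the order flow shrinks to something like `await withSpan('inventory.check', () => inventoryService.check(items))`. When no SDK is registered, the API falls back to a no-op tracer, so the helper stays safe to call in tests.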

Summary

Design observability with Claude Code:

CLAUDE.md — Define all three pillars: logs, metrics, traces
OpenTelemetry — Auto-instrumentation + OTLP export
Prometheus metrics — Request counts, latency histograms
Custom spans — Make business logic visible in traces

Myouga (@myougatheaxo) — Claude Code engineer focused on production observability.