"Why is this API slow?" and "Which service is throwing errors?" are questions you can't answer without observability. Claude Code can design the three pillars — logs, metrics, and traces — when you give it the right patterns.
CLAUDE.md for Observability Standards
## Observability Rules
### Three pillars (all required)
1. Logs: pino (structured JSON, PII masking) — see logging.md
2. Metrics: Prometheus format via prom-client
3. Traces: OpenTelemetry (distributed tracing)
### Tracing requirements
- All HTTP requests get a trace ID
- External API calls and DB operations create child spans
- Errors: record exception + set span status to ERROR
- Sampling: production 10%, development 100%
### Metrics requirements
- Request count: http_requests_total (method, route, status_code)
- Latency: http_request_duration_ms (histogram; p50/p95/p99 are computed from the buckets at query time)
- Error rate: http_errors_total (status_code, route)
- Business metrics: order_count, revenue_total, user_registrations_total
### Alert thresholds
- Error rate > 1% → warning
- p99 latency > 1s → warning
- p99 latency > 3s → critical
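The alert thresholds above can be read as a small decision function. The sketch below is only an illustration of that logic: `WindowStats`, `Severity`, and `classify` are hypothetical names, and it assumes the error rate is measured over some fixed window of requests. In a real setup these rules would more likely live in the alerting system than in application code.

```typescript
// Hypothetical types and names; only the thresholds come from the rules above.
type Severity = 'ok' | 'warning' | 'critical';

interface WindowStats {
  totalRequests: number; // requests observed in the evaluation window
  errorRequests: number; // requests that ended in an error
  p99LatencyMs: number;  // p99 latency for the same window, in milliseconds
}

function classify(stats: WindowStats): Severity {
  const errorRate =
    stats.totalRequests === 0 ? 0 : stats.errorRequests / stats.totalRequests;

  // p99 latency > 3s -> critical
  if (stats.p99LatencyMs > 3000) return 'critical';
  // p99 latency > 1s or error rate > 1% -> warning
  if (stats.p99LatencyMs > 1000 || errorRate > 0.01) return 'warning';
  return 'ok';
}
```

Checking the critical rule first means a window that breaches both latency thresholds is reported once, at the higher severity.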
Generating OpenTelemetry Setup
Set up OpenTelemetry for distributed tracing.
Requirements:
- Auto-instrumentation: HTTP, Express, Prisma, Redis
- Exporter: OTLP (Jaeger or Grafana Tempo)
- Resource attributes: service.name, service.version, deployment.environment
- Sampling: TraceIdRatioBased (production: 10%, development: 100%)
- TypeScript
Generate: src/instrumentation.ts (import before anything else in main.ts)
Generated:
// src/instrumentation.ts: must be the FIRST import in main.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';
import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  // `new Resource(...)` is the 1.x API; newer SDK releases may expect resourceFromAttributes() instead.
  resource: new Resource({
    'service.name': process.env.SERVICE_NAME ?? 'my-service',
    'service.version': process.env.npm_package_version ?? '0.0.0',
    'deployment.environment': process.env.NODE_ENV ?? 'development',
  }),
  traceExporter: new OTLPTraceExporter({
    // `url` is used as-is, so read the per-signal variable, which holds the full /v1/traces URL.
    url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT ?? 'http://localhost:4318/v1/traces',
  }),
  // Sample 10% of traces in production, 100% elsewhere.
  sampler: new TraceIdRatioBasedSampler(
    process.env.NODE_ENV === 'production' ? 0.1 : 1.0
  ),
  instrumentations: [
    getNodeAutoInstrumentations({
      // File-system spans are noisy; disable them.
      '@opentelemetry/instrumentation-fs': { enabled: false },
    }),
  ],
});

sdk.start();

// Flush pending spans before the process exits.
process.on('SIGTERM', () => {
  sdk.shutdown().catch((err) => console.error('OpenTelemetry shutdown failed', err));
});
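To make the sampling requirement more concrete, here is a simplified sketch of how a ratio-based sampler can decide from the trace ID alone. It is not the library's actual implementation: `shouldSample` is a hypothetical function, and it assumes a 32-character hex trace ID whose last 8 characters are treated as an unsigned 32-bit number. The point it illustrates is that the decision is deterministic per trace ID, so services using the same ratio and the same rule would agree on which traces to keep.

```typescript
// Simplified illustration of ratio-based sampling; the real sampler's internals may differ.
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;  // e.g. development: keep everything
  if (ratio <= 0) return false; // sampling disabled

  // Interpret the last 8 hex characters as a number in [0, 2^32).
  const value = parseInt(traceId.slice(-8), 16);

  // Keep the trace when the value falls in the lowest `ratio` share of the range.
  return value < Math.floor(ratio * 0x100000000);
}

// Usage: with a 10% ratio, a low-valued ID is kept and a high-valued one is dropped.
const kept = shouldSample('0000000000000000000000000000ffff', 0.1);
const dropped = shouldSample('000000000000000000000000ffffffff', 0.1);
```

Because no random state is involved, repeated calls with the same trace ID always give the same answer.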
Prometheus Metrics
Set up Prometheus metrics using prom-client.
Metrics:
- http_requests_total: Counter (method, route, status_code)
- http_request_duration_ms: Histogram (method, route)
- active_connections: Gauge
- db_query_duration_ms: Histogram (operation, table)
Endpoint: GET /metrics (Prometheus scrape format)
Generate:
- src/lib/metrics.ts
- src/middleware/metricsMiddleware.ts
Generated:
// src/lib/metrics.ts
import { Registry, Counter, Histogram, collectDefaultMetrics } from 'prom-client';

export const register = new Registry();
collectDefaultMetrics({ register }); // Node.js built-in metrics

export const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [register],
});

export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_ms',
  help: 'HTTP request duration in ms',
  labelNames: ['method', 'route'],
  buckets: [10, 25, 50, 100, 250, 500, 1000, 2500, 5000],
  registers: [register],
});
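
// src/middleware/metricsMiddleware.ts
import type { RequestHandler } from 'express';
import { httpRequestsTotal, httpRequestDuration } from '../lib/metrics';

export function metricsMiddleware(): RequestHandler {
  return (req, res, next) => {
    const start = Date.now();
    // 'finish' fires once the response has been handed off, so the status code is final.
    res.on('finish', () => {
      const duration = Date.now() - start;
      // Prefer the matched route pattern; the req.path fallback can inflate label cardinality.
      const route = req.route?.path ?? req.path;
      httpRequestsTotal.labels(req.method, route, String(res.statusCode)).inc();
      httpRequestDuration.labels(req.method, route).observe(duration);
    });
    next();
  };
}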
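A histogram like `http_request_duration_ms` stores cumulative bucket counts, not percentiles. Values such as p95 or p99 are estimated from those buckets at query time, for example with PromQL's `histogram_quantile`. The sketch below approximates that idea using linear interpolation inside the matching bucket. `Bucket` and `estimateQuantile` are hypothetical names, and this is an illustration of the technique, not Prometheus's exact code.

```typescript
// Hypothetical shape for one cumulative histogram bucket.
interface Bucket {
  le: number;    // upper bound in ms; Infinity stands for the +Inf bucket
  count: number; // cumulative number of observations <= le
}

function estimateQuantile(q: number, buckets: Bucket[]): number {
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  const total = sorted[sorted.length - 1]?.count ?? 0;
  if (total === 0) return NaN; // no observations, nothing to estimate

  const rank = q * total; // how many observations lie at or below the quantile
  let prevLe = 0;
  let prevCount = 0;

  for (const b of sorted) {
    if (b.count >= rank) {
      // The +Inf bucket has no upper bound to interpolate towards.
      if (b.le === Infinity) return prevLe;
      const inBucket = b.count - prevCount;
      if (inBucket === 0) return prevLe;
      // Assume observations are spread evenly inside the bucket.
      return prevLe + (b.le - prevLe) * ((rank - prevCount) / inBucket);
    }
    prevLe = b.le;
    prevCount = b.count;
  }
  return prevLe;
}

// Usage: 100 requests, 50 under 100 ms, 90 under 250 ms, all under 500 ms.
const sample: Bucket[] = [
  { le: 100, count: 50 },
  { le: 250, count: 90 },
  { le: 500, count: 100 },
  { le: Infinity, count: 100 },
];
const p99 = estimateQuantile(0.99, sample); // lands in the 250-500 ms bucket
```

The estimate is only as precise as the bucket boundaries, which is why the bucket list should bracket the latency thresholds you intend to alert on.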
Custom Spans for Business Logic
Add custom trace spans to this order processing flow:
- Check inventory
- Process payment
- Create order
Per span: operation name, orderId, userId, amount
On error: record exception + set status to ERROR
import { trace, SpanStatusCode } from '@opentelemetry/api';

// OrderItem, inventoryService, and paymentService are assumed to come from the application.
const tracer = trace.getTracer('order-service');

export async function processOrder(userId: string, items: OrderItem[]) {
  // Assumption: OrderItem exposes price and quantity; adapt this to your model.
  const total = items.reduce((sum, item) => sum + item.price * item.quantity, 0);

  return tracer.startActiveSpan('order.process', async (span) => {
    span.setAttributes({ userId, itemCount: items.length, amount: total });
    try {
      await tracer.startActiveSpan('inventory.check', async (s) => {
        try {
          await inventoryService.check(items);
        } finally {
          s.end(); // end the child span even when the check throws
        }
      });
      const payment = await tracer.startActiveSpan('payment.process', async (s) => {
        try {
          const result = await paymentService.charge(userId, total);
          s.setAttributes({ paymentId: result.id, amount: total });
          return result;
        } finally {
          s.end();
        }
      });
      span.setAttributes({ paymentId: payment.id });
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
      throw err;
    } finally {
      span.end(); // always close the parent span exactly once
    }
  });
}
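The record-exception, set-status, and end-span steps repeat for every span, so one option is to wrap them in a helper. The sketch below shows one possible shape. `withSpan`, `SpanLike`, and `TracerLike` are hypothetical names, and the two interfaces are minimal stand-ins declared locally so the example compiles without the OpenTelemetry packages. In a real project you would likely use the `Span` and `Tracer` types from `@opentelemetry/api` instead.

```typescript
// Minimal structural stand-ins for the OpenTelemetry Span and Tracer types (assumption).
interface SpanLike {
  recordException(err: Error): void;
  setStatus(status: { code: number; message?: string }): void;
  end(): void;
}

interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

const STATUS_ERROR = 2; // numeric value of SpanStatusCode.ERROR in @opentelemetry/api

// Runs `fn` inside a span, records failures, and always ends the span.
async function withSpan<T>(
  tracer: TracerLike,
  name: string,
  fn: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn(span);
    } catch (err) {
      const error = err instanceof Error ? err : new Error(String(err));
      span.recordException(error);
      span.setStatus({ code: STATUS_ERROR, message: error.message });
      throw err; // rethrow so callers still see the failure
    } finally {
      span.end();
    }
  });
}
```

With a helper like this, each step of the order flow could become a single call such as `await withSpan(tracer, 'inventory.check', () => inventoryService.check(items))`, and no code path can forget to end its span.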
Summary
Design observability with Claude Code:
- CLAUDE.md — Define all three pillars: logs, metrics, traces
- OpenTelemetry — Auto-instrumentation + OTLP export
- Prometheus metrics — Request counts, latency histograms
- Custom spans — Make business logic visible in traces
Code Review Pack (¥980) includes /code-review for observability gaps — missing spans on critical paths, metrics without labels, unsampled traces.
Myouga (@myougatheaxo) — Claude Code engineer focused on production observability.