Sentry captures errors. Datadog monitors infrastructure. Grafana visualizes metrics. Using all three is overkill for most projects. Here's what production observability actually looks like for a Next.js app at early-to-mid scale.
## The Three Pillars
- **Logs:** What happened and when. Structured JSON, queryable.
- **Metrics:** Aggregated numbers over time. Request rate, error rate, latency.
- **Traces:** Request paths through your system. Where did this request spend its time?
Most apps need logs and metrics. Traces become valuable once you have multiple services.
## Structured Logging with Pino

```ts
// lib/logger.ts
import pino from 'pino'

export const logger = pino({
  level: process.env.LOG_LEVEL ?? 'info',
  base: {
    service: 'api',
    version: process.env.npm_package_version,
    environment: process.env.NODE_ENV,
  },
  // Pretty-print locally; keep raw JSON in production
  ...(process.env.NODE_ENV === 'development' && {
    transport: { target: 'pino-pretty', options: { colorize: true } },
  }),
})

// Child logger with request context
export function createRequestLogger(requestId: string, userId?: string) {
  return logger.child({ requestId, userId })
}
```
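The point of `child()` is that bindings set once are merged into every line that logger emits. A minimal sketch of that merge behavior (hypothetical code, not pino's implementation):

```ts
// Sketch of child-logger semantics: bindings passed to child() are
// merged into every line the child emits. Not pino's internals.
type Bindings = Record<string, unknown>

function makeLogger(base: Bindings) {
  return {
    info(msg: string, extra: Bindings = {}) {
      // Later bindings win on key collisions
      const line = { level: 'info', ...base, ...extra, msg }
      console.log(JSON.stringify(line))
      return line
    },
    child(bindings: Bindings) {
      return makeLogger({ ...base, ...bindings })
    },
  }
}

const root = makeLogger({ service: 'api' })
const reqLogger = root.child({ requestId: 'req-1' })
const line = reqLogger.info('user fetched', { userId: 'u1' })
// line carries service, requestId, and userId without repeating them at each call site
```

Create the child once per request and pass it down; every handler that logs through it gets the request context for free.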
## Request ID Middleware

```ts
// middleware.ts
import { NextRequest, NextResponse } from 'next/server'
import { v4 as uuid } from 'uuid'

export function middleware(req: NextRequest) {
  const requestId = req.headers.get('x-request-id') ?? uuid()

  // Forward the ID to route handlers via the request headers,
  // and echo it back to the client on the response
  const requestHeaders = new Headers(req.headers)
  requestHeaders.set('x-request-id', requestId)

  const response = NextResponse.next({ request: { headers: requestHeaders } })
  response.headers.set('x-request-id', requestId)
  return response
}
```
Tag every log line with requestId to trace all events from a single request.
## Error Tracking with Sentry

```bash
npm install @sentry/nextjs
npx @sentry/wizard@latest -i nextjs
```

```ts
// sentry.client.config.ts
import * as Sentry from '@sentry/nextjs'

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
  beforeSend(event) {
    // Strip PII before sending
    if (event.user) delete event.user.email
    return event
  },
})
```
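Because `beforeSend` is just a pure function over the event, the scrubbing logic is easy to unit-test in isolation. A sketch with a simplified event shape and a hypothetical `scrubPII` helper (the real Sentry event type has many more fields):

```ts
// Simplified stand-in for the Sentry event type
type AppEvent = {
  user?: { id?: string; email?: string; ip_address?: string }
}

// Same deletion logic as the beforeSend hook above, extracted for
// testing, and extended to IP addresses if you treat those as PII
function scrubPII(event: AppEvent): AppEvent {
  if (event.user) {
    delete event.user.email
    delete event.user.ip_address
  }
  return event
}

const scrubbed = scrubPII({ user: { id: 'u1', email: 'a@b.com' } })
// scrubbed.user keeps the id; the email is gone
```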
## Custom Metrics
Track business metrics alongside technical ones:
```ts
// lib/metrics.ts
import { Counter, Histogram, register } from 'prom-client'

export const httpRequests = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status'],
})

export const requestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration',
  labelNames: ['method', 'route'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5],
})

export const aiTokensUsed = new Counter({
  name: 'ai_tokens_total',
  help: 'Total AI tokens consumed',
  labelNames: ['model', 'type'],
})
```
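Prometheus histogram buckets are cumulative: an observation increments every bucket whose upper bound (`le`) is at or above the value. A small sketch of that counting rule, using the same buckets as above (illustration only, not prom-client internals):

```ts
// Cumulative bucket counting, as Prometheus defines it: a 70ms request
// (0.07s) lands in every bucket with le >= 0.07.
const buckets = [0.01, 0.05, 0.1, 0.5, 1, 2, 5]

function observe(counts: number[], seconds: number): number[] {
  buckets.forEach((le, i) => {
    if (seconds <= le) counts[i] += 1
  })
  return counts
}

const counts = observe(new Array(buckets.length).fill(0), 0.07)
// counts is [0, 0, 1, 1, 1, 1, 1]: buckets 0.1s and up all counted it
```

This is why percentile queries over histograms are estimates: Prometheus interpolates within a bucket, so pick bucket boundaries close to your latency SLOs.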
Expose a `/metrics` endpoint for Prometheus to scrape:

```ts
// app/api/metrics/route.ts
import { register } from 'prom-client'
import '@/lib/metrics' // load the module so the counters register themselves

export async function GET() {
  return new Response(await register.metrics(), {
    headers: { 'Content-Type': register.contentType },
  })
}
```
## Health Check Endpoint

```ts
// app/api/health/route.ts
import { db } from '@/lib/db'       // Prisma client — adjust to your setup
import { redis } from '@/lib/redis' // Redis client — adjust to your setup

export async function GET() {
  const checks = await Promise.allSettled([
    db.$queryRaw`SELECT 1`, // database
    redis.ping(), // cache
  ])
  const [dbCheck, redisCheck] = checks
  const healthy = checks.every(c => c.status === 'fulfilled')

  return Response.json({
    status: healthy ? 'ok' : 'degraded',
    checks: {
      database: dbCheck.status === 'fulfilled' ? 'ok' : 'error',
      cache: redisCheck.status === 'fulfilled' ? 'ok' : 'error',
    },
    uptime: process.uptime(),
    version: process.env.npm_package_version,
  }, {
    status: healthy ? 200 : 503,
  })
}
```
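`Promise.allSettled` never rejects, so the handler always answers even when a dependency is down. The aggregation step can be sketched with stubbed checks in place of real database and cache calls (hypothetical `summarize` helper):

```ts
// Aggregate settled checks into the health payload's shape,
// using stubbed promises instead of real db/redis calls.
async function summarize(checks: Record<string, Promise<unknown>>) {
  const names = Object.keys(checks)
  const settled = await Promise.allSettled(Object.values(checks))
  return {
    status: settled.every(s => s.status === 'fulfilled') ? 'ok' : 'degraded',
    checks: Object.fromEntries(
      names.map((n, i) => [n, settled[i].status === 'fulfilled' ? 'ok' : 'error'])
    ),
  }
}

summarize({
  database: Promise.resolve(1),
  cache: Promise.reject(new Error('redis down')),
}).then(r => console.log(r))
// logs { status: 'degraded', checks: { database: 'ok', cache: 'error' } }
```

Point your load balancer or uptime monitor at this endpoint; the 503 on degradation is what lets it pull the instance out of rotation automatically.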
## Alerting Rules
Set up alerts for these thresholds:
- Error rate > 1% of requests → immediate alert
- p99 latency > 2s → warning alert
- Health check failing → immediate alert
- Disk > 80% full → warning alert
- AI token spend > $50/day → cost alert
The AI SaaS Starter at whoffagents.com ships with Pino structured logging, Sentry error tracking, and /api/health pre-configured. $99 one-time.