Logging is the backbone of production observability. Without it, debugging a live incident is like navigating in the dark: you know something is wrong but have no way to trace why or where. Yet logging done carelessly introduces its own risks: sensitive user data written to disk, credentials captured in request logs, and compliance violations hiding in plain text files.
The challenge every engineering team faces is not whether to log, but what to log, how to structure it, and who gets access to it.
This guide covers production logging best practices that give your team the observability they need, without turning your log aggregator into a liability.
## Why Logging Strategy Matters in Production
Poorly designed logging creates two equally dangerous failure modes:
Too little logging means you're flying blind during incidents. No request context, no error trail, no performance baseline. Mean time to resolution (MTTR) skyrockets because engineers spend hours reconstructing what happened.
Too much logging, or logging the wrong things, creates serious security and compliance risks:
- Passwords captured in login request bodies
- Credit card numbers logged from payment payloads
- Session tokens recorded in access logs
- Personally Identifiable Information (PII) written to third-party log aggregators
- HIPAA or GDPR violations from retaining sensitive health or user data
A mature logging strategy threads this needle deliberately, maximizing signal while minimizing exposure.
## Log Levels: Using Them Correctly
Log levels are the first layer of control. Using them correctly keeps your logs actionable and your signal-to-noise ratio high.
| Level | When to Use |
|---|---|
| ERROR | Unrecoverable failures that require immediate attention: exceptions, service crashes, failed critical operations |
| WARN | Recoverable issues that indicate something is wrong but hasn't broken yet: retry attempts, deprecated API usage, slow queries |
| INFO | Normal application lifecycle events: service startup, job completion, significant state transitions |
| DEBUG | Detailed diagnostic information useful during development and troubleshooting; never enabled in production by default |
| VERBOSE / TRACE | Highly granular execution flow; only for deep debugging in isolated environments |
A common mistake is logging everything at INFO or ERROR. This floods your aggregator with noise (making alerts meaningless) or misses important context entirely. Be deliberate: if a message doesn't require human attention, it doesn't belong at ERROR.
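Under the hood, most loggers (Pino included) implement levels as numeric thresholds, and a message is emitted only if its level meets the logger's configured minimum. A minimal sketch of that filtering rule, using Pino's level numbers:

```typescript
// Pino-style numeric levels: higher number = more severe.
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50 } as const;
type Level = keyof typeof LEVELS;

// A message is emitted only if its level meets the configured threshold.
function shouldLog(configured: Level, message: Level): boolean {
  return LEVELS[message] >= LEVELS[configured];
}

// With LOG_LEVEL=info in production, DEBUG noise is dropped automatically:
shouldLog('info', 'debug'); // false
shouldLog('info', 'error'); // true
```

This is why setting `LOG_LEVEL` per environment is enough to silence diagnostic chatter in production without touching application code.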
## Structured Logging: JSON Over Plain Text
Plain text logs are human-readable but machine-hostile. Parsing "[2025-04-06 10:23:11] ERROR: User not found for ID 42" requires regex and breaks the moment the format changes.
Structured logging emits logs as JSON objects, making them directly queryable in any log aggregator (Datadog, ELK, Loki, CloudWatch):
```json
{
  "level": "error",
  "message": "User not found",
  "context": "UsersService",
  "userId": "usr_8f3a2c",
  "requestId": "req_9d1b4e",
  "statusCode": 404,
  "timestamp": "2025-04-06T10:23:11.412Z",
  "environment": "production",
  "service": "users-service"
}
```
Every field is indexable, filterable, and alertable. You can instantly query "all 404 errors in the users-service in the last hour" without parsing a single string.
## Setting Up Structured Logging in NestJS with Pino
Pino is among the fastest structured loggers for Node.js, with native JSON output and minimal overhead:
```bash
npm install nestjs-pino pino-http pino-pretty
```
Configure it in your AppModule:
```typescript
// app.module.ts
import { Module } from '@nestjs/common';
import { LoggerModule } from 'nestjs-pino';

@Module({
  imports: [
    LoggerModule.forRoot({
      pinoHttp: {
        level: process.env.LOG_LEVEL ?? 'info',
        transport:
          process.env.NODE_ENV !== 'production'
            ? { target: 'pino-pretty' } // human-readable in development
            : undefined, // raw JSON in production
        redact: {
          paths: [
            'req.headers.authorization',
            'req.headers.cookie',
            'req.body.password',
            'req.body.creditCard',
          ],
          censor: '[REDACTED]',
        },
      },
    }),
  ],
})
export class AppModule {}
```
The redact configuration is critical: it automatically censors sensitive fields before they ever reach your log output. More on this in the next section.
## Redacting Sensitive Data
Sensitive data leaking into logs is one of the most common and costly compliance failures. It often happens accidentally: a developer logs req.body for debugging and forgets to remove it before merging.
### What to Always Redact
- Authentication: passwords, tokens, API keys, session IDs, JWTs
- Payment data: credit card numbers, CVVs, bank account numbers
- PII: full names combined with identifiers, email addresses in certain contexts, phone numbers, dates of birth
- Health data: any field that could be classified as PHI under HIPAA
- Infrastructure secrets: database connection strings, internal IPs, service credentials
### Redaction Strategies
Field-level redaction (as shown with Pino's redact config) is the most reliable approach: it operates at the logger level before output, regardless of what gets passed in.
For custom redaction logic, implement a sanitization utility:
```typescript
// src/common/utils/sanitize-log.util.ts
// Entries are lowercase because lookups use key.toLowerCase()
const SENSITIVE_KEYS = new Set([
  'password', 'token', 'secret', 'authorization',
  'creditcard', 'ssn', 'apikey', 'refreshtoken',
]);

export function sanitizeForLog(obj: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => [
      key,
      SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : value,
    ]),
  );
}
```
Use this utility whenever logging request payloads or user-supplied data:
```typescript
this.logger.log({
  message: 'User registration attempt',
  payload: sanitizeForLog(dto),
});
```
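Note that the utility above only inspects top-level keys. For nested payloads, a recursive variant is safer; here is a sketch (same key list, lowercased, walking objects and arrays):

```typescript
// Lowercased key list; lookups use key.toLowerCase()
const SENSITIVE_KEYS = new Set([
  'password', 'token', 'secret', 'authorization',
  'creditcard', 'ssn', 'apikey', 'refreshtoken',
]);

// Recursively walks objects and arrays, redacting any sensitive key at any depth.
export function deepSanitize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(deepSanitize);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([key, v]) => [
        key,
        SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : deepSanitize(v),
      ]),
    );
  }
  return value;
}
```

This catches cases like `{ user: { credentials: { password: "..." } } }` that a shallow scan would miss.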
### Never Log Full Request Bodies Indiscriminately
Logging req.body wholesale in a middleware is a common antipattern. Instead, log only specific, known-safe fields:
```typescript
// ❌ Dangerous — logs everything including passwords
this.logger.log({ body: request.body });

// ✅ Safe — log only what you explicitly need
this.logger.log({
  message: 'Login attempt',
  email: request.body.email, // safe — not a secret
  ip: request.ip,
});
```
## Request Correlation with Trace IDs
In a distributed system, a single user action triggers calls across multiple services. Without a shared identifier, correlating logs across services is nearly impossible.
Request correlation IDs (also called trace IDs) solve this by attaching a unique identifier to every request at the entry point (API Gateway or first service), then propagating it through every downstream call via headers.
```typescript
// src/common/middleware/correlation-id.middleware.ts
import { Injectable, NestMiddleware } from '@nestjs/common';
import { Request, Response, NextFunction } from 'express';
import { v4 as uuidv4 } from 'uuid';

@Injectable()
export class CorrelationIdMiddleware implements NestMiddleware {
  use(req: Request, res: Response, next: NextFunction) {
    const correlationId = (req.headers['x-correlation-id'] as string) ?? uuidv4();
    req.headers['x-correlation-id'] = correlationId;
    res.setHeader('x-correlation-id', correlationId);
    next();
  }
}
```
Include the correlationId in every log entry. When an incident occurs, you can filter your entire log aggregator by a single ID and see the complete request journey across every service, with timestamps, durations, and error context.
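With nestjs-pino, one way to stamp the ID onto every request-scoped entry is pino-http's customProps hook. A sketch, assuming a pino-http version that supports customProps:

```typescript
import type { IncomingMessage } from 'node:http';

// Pulls the ID set by the correlation middleware out of the request so the
// logger can attach it to every log line for that request.
export function correlationProps(req: Pick<IncomingMessage, 'headers'>) {
  return { correlationId: req.headers['x-correlation-id'] ?? 'unknown' };
}

// Wiring it up in LoggerModule.forRoot (sketch):
// pinoHttp: { customProps: correlationProps }
```

Services making downstream calls should also forward the `x-correlation-id` header on outgoing requests so the chain stays unbroken.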
## Log Retention and Access Control
Storing logs indefinitely is a compliance and cost problem. Define a retention policy that balances operational need with regulatory requirements:
| Log Type | Recommended Retention |
|---|---|
| Application errors | 90 days |
| Access / audit logs | 1-7 years (varies by regulation) |
| Debug logs | 7-14 days |
| Security events | 1-2 years |
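If logs are archived to object storage, retention like the table above can be enforced mechanically with lifecycle rules. A sketch for S3, assuming logs are partitioned by prefix (adjust prefixes and day counts to your own layout and regulations):

```json
{
  "Rules": [
    {
      "ID": "expire-debug-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "debug/" },
      "Expiration": { "Days": 14 }
    },
    {
      "ID": "expire-app-errors",
      "Status": "Enabled",
      "Filter": { "Prefix": "errors/" },
      "Expiration": { "Days": 90 }
    }
  ]
}
```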
Beyond retention, control who can access logs:
- Logs containing any PII should be access-controlled - not every engineer needs to read production user data.
- Use role-based access in your log aggregator (Datadog, Splunk, ELK) to restrict sensitive log streams.
- Enable audit logging on your log aggregator itself - knowing who queried which logs is part of your compliance story.
- Consider log encryption at rest for any aggregator storing sensitive application data.
## Alerting: Turning Logs Into Action
Logs without alerts are archives, not observability. Define alert rules on your log aggregator for conditions that require immediate human attention:
- Error rate spike: more than X ERROR-level logs per minute
- Authentication failures: repeated 401s from the same IP (brute force indicator)
- Downstream service failures: sustained connection errors to a dependency
- Zero logs: a sudden absence of logs from a service may indicate it has crashed
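These rules live in the aggregator, but the underlying threshold logic is simple to state. A sketch of the error-rate rule as a sliding-window counter (window size and threshold are illustrative assumptions):

```typescript
// Counts ERROR events in a sliding time window and flags when the
// count exceeds a threshold: the logic an aggregator alert rule encodes.
class ErrorRateAlert {
  private timestamps: number[] = [];

  constructor(
    private readonly windowMs = 60_000, // 1-minute window
    private readonly threshold = 50,    // "more than X per minute"
  ) {}

  // Record one ERROR event; returns true when the alert should fire.
  record(now: number = Date.now()): boolean {
    this.timestamps.push(now);
    // Drop events that have aged out of the window
    this.timestamps = this.timestamps.filter((t) => now - t <= this.windowMs);
    return this.timestamps.length > this.threshold;
  }
}
```

In practice you would express this as an aggregator query rather than application code; the sketch just makes the semantics of "more than X per minute" concrete.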
Keep alert thresholds tuned: too sensitive and engineers become desensitized to noise; too lenient and real incidents go undetected.
## Logging Antipatterns to Avoid
Logging in a tight loop: logging inside high-frequency loops or hot paths creates I/O pressure and can degrade application performance. Sample logs or aggregate counters instead.
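A minimal counter-based sampler (a sketch) that gates log statements so only every Nth occurrence in a hot path is emitted:

```typescript
// Returns a function that yields true once per `rate` calls.
// Use it to gate log statements inside hot loops instead of
// logging every single iteration.
function makeSampler(rate: number): () => boolean {
  let count = 0;
  return () => {
    count += 1;
    if (count >= rate) {
      count = 0;
      return true;
    }
    return false;
  };
}

const sampleEvery100 = makeSampler(100);
// Inside a hot loop (sketch):
// if (sampleEvery100()) logger.debug({ processed: i }, 'progress');
```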
Using console.log in production: console.log bypasses your structured logger, produces unstructured output, and cannot be controlled by log level configuration. Replace all instances with your logger before deploying.
Logging and rethrowing without context: catching an exception, logging it, and rethrowing it without adding context creates duplicate log entries with no additional signal. Add context or don't log: pick one.
Storing logs locally on application servers: local log files are lost when containers restart or servers are terminated. Always ship logs to an external aggregator in real time.
## Recommended Tooling
| Category | Tool |
|---|---|
| Logger (Node.js) | Pino, Winston |
| Log aggregation | Datadog, ELK Stack, Grafana Loki |
| Distributed tracing | OpenTelemetry, Jaeger, Tempo |
| Alerting | PagerDuty, Grafana Alerting, Datadog Monitors |
| Compliance scanning | Nightfall, Presidio (PII detection in logs) |
## Conclusion
Production logging is not a feature you bolt on after launch; it's a foundational engineering practice that determines how quickly your team can detect, diagnose, and resolve incidents. Done well, it's invisible to users and invaluable to engineers. Done poorly, it's either useless noise or a ticking compliance time bomb.
The formula is straightforward: use structured JSON logs, redact sensitive fields at the logger level, correlate requests with trace IDs, enforce retention policies, and alert on what matters. Everything else is tuning.
Observability and security are not in conflict: with the right logging architecture, they reinforce each other.
What log aggregator does your team use in production? Share your stack in the comments - we'd love to hear how different teams approach this.