Most logging is written for the person who wrote the code. The author knows the system, knows what the messages mean, and can fill in context from memory. Then at 3 AM, an alert fires, and the person debugging is someone else — or it's the author at month six, who has forgotten everything.
Logs that work at 3 AM are different from logs that look good in a unit test. The difference is structure, context, and the discipline to leave breadcrumbs that the future-you will actually be able to follow.
We've had two real incidents on HelperX where this mattered. Both were resolved in under an hour because the logs were good. Both could have been multi-hour debugging sessions if the logs had been the typical console.log("error") slop.
This is the logging setup that actually helped, and the incident that shaped it.
The incident that taught us
3:47 AM, a Saturday. Pager: "high error rate on slot worker pool." I opened the dashboard. Half the slot workers were failing on reply generation with a non-specific timeout. The errors had been happening for 12 minutes.
The instinct was to look at the most recent error logs. They said:
[2026-04-12 03:47:23] ERROR Reply generation failed for slot abc123
[2026-04-12 03:47:23] ERROR Reply generation failed for slot def456
[2026-04-12 03:47:24] ERROR Reply generation failed for slot ghi789
Nothing. Just "failed." No error message, no stack, no context. The errors were correlated in time but the messages were useless.
The actual root cause turned out to be a regional outage at our LLM provider that was returning 503s with empty bodies. Our timeout handler was swallowing the body, logging the generic message, and moving on. The fix took 4 minutes once we found the cause. Finding the cause took 38 minutes.
Most of those 38 minutes were spent scrolling through logs trying to find any hint of what was happening. After the incident, I rewrote the logging system in a weekend. The next incident — a proxy provider hiccup three months later — was resolved in 11 minutes flat.
The pillars: what to log
The setup has four design rules. Every log line satisfies all four.
1. Structure, not strings
Every log line is JSON, not text. This is non-negotiable.
// Bad
logger.error(`Reply generation failed for slot ${slotId}: ${err.message}`);
// Good
logger.error({
event: 'reply_generation_failed',
slot_id: slotId,
error: err.message,
error_code: err.code,
duration_ms: duration,
tweet_id: tweet.id,
attempt: attemptNumber,
}, 'Reply generation failed');
The text version is fine if you're tailing logs by eye. The structured version is queryable, filterable, and aggregatable. At 3 AM, you don't want to grep — you want to WHERE event = 'reply_generation_failed' AND error_code = 'ECONNRESET'.
We use Pino because it's fast and outputs JSON natively. Bunyan, Winston with JSON format, or any equivalent works fine. The library matters less than the discipline.
2. Every log line has an event name
The event field is the most important field in any log. It's what you'll group by, filter on, and alert against.
Event names follow a convention:
-
noun_verb_status—reply_generation_failed,proxy_request_succeeded,auth_token_refreshed - Lowercase, underscore-separated
- Always include the status (success/failed/skipped/started/completed)
This convention is what makes log analytics queryable. The first dashboard I built after the incident was a per-event-name frequency chart. Twelve minutes of looking at it tells me everything about system health.
3. Correlation IDs everywhere
Every log line carries a correlation_id that follows a logical operation across all the modules it touches.
async function processSlot(slot) {
const correlationId = crypto.randomUUID();
const log = baseLogger.child({ correlation_id: correlationId, slot_id: slot.id });
log.info({ event: 'slot_processing_started' });
const tweets = await fetchTweets(slot, log);
log.info({ event: 'tweets_fetched', count: tweets.length });
for (const tweet of tweets) {
const tweetLog = log.child({ tweet_id: tweet.id });
const reply = await generateReply(tweet, slot, tweetLog);
if (reply) {
await postReply(reply, slot, tweetLog);
}
}
log.info({ event: 'slot_processing_completed' });
}
When something fails, you grab the correlation_id from the error log and pull every log line with that ID. You get the entire trace of the operation in one query. No more scrolling through logs trying to figure out what request the error belonged to.
We use Pino's child mechanism for this — child loggers automatically include the parent's fields:
const log = baseLogger.child({ correlation_id, slot_id });
// All future log calls automatically include those fields
4. Latency on every operation
Every meaningful operation logs its duration. This is the second-most-important field after event:
const start = Date.now();
try {
const result = await externalApiCall();
log.info({ event: 'external_api_succeeded', duration_ms: Date.now() - start });
return result;
} catch (err) {
log.error({
event: 'external_api_failed',
duration_ms: Date.now() - start,
error: err.message,
error_code: err.code,
});
throw err;
}
At 3 AM, the question "is this fast or slow?" is what tells you whether the system is hung or just struggling. Without a duration field, you have to infer from timestamps — which works but is annoying. With it, your dashboard can show "p99 latency by event" and you immediately spot the slow operations.
The implementation
The full Pino setup, with our customizations:
// lib/logger.js
import pino from 'pino';
import { hostname } from 'os';
const isProduction = process.env.NODE_ENV === 'production';
const baseLogger = pino({
level: process.env.LOG_LEVEL || 'info',
base: {
service: 'helperx-worker',
hostname: hostname(),
pid: process.pid,
},
formatters: {
level: (label) => ({ level: label }), // string level instead of number
},
timestamp: pino.stdTimeFunctions.isoTime,
redact: {
paths: [
'req.headers.authorization',
'req.headers.cookie',
'auth_token',
'password',
'api_key',
'*.password',
'*.token',
'*.secret',
],
censor: '[REDACTED]',
},
});
// Per-slot logger factory
export function getSlotLogger(slotId) {
return baseLogger.child({ slot_id: slotId });
}
// Operation logger factory — includes correlation_id
export function getOperationLogger(parentLog, operation) {
return parentLog.child({
correlation_id: crypto.randomUUID(),
operation,
});
}
export default baseLogger;
The redaction is important — auth_token, password, and similar fields get scrubbed before they hit the log. Easy to forget, hard to recover from if they leak.
Wrapping operations
Most operations follow a standard pattern. We wrap it in a helper:
// lib/log-operation.js
export async function logOperation(log, eventBase, fn, context = {}) {
const start = Date.now();
log.info({ event: `${eventBase}_started`, ...context });
try {
const result = await fn();
log.info({
event: `${eventBase}_succeeded`,
duration_ms: Date.now() - start,
...context,
});
return result;
} catch (err) {
log.error({
event: `${eventBase}_failed`,
duration_ms: Date.now() - start,
error: err.message,
error_code: err.code,
stack: err.stack,
...context,
});
throw err;
}
}
// Usage
const reply = await logOperation(
log,
'reply_generation',
() => llmGenerate(tweet, persona),
{ tweet_id: tweet.id, persona_id: persona.id, model: 'haiku' },
);
The wrapper guarantees:
- Start and end log lines have matching
eventprefixes - Duration is captured automatically
- Errors include stack traces in dev, not in production (to keep logs slim)
- Context is propagated to both success and failure logs
90% of our operations go through this wrapper. The 10% that don't are ones with custom log shapes for very specific cases.
Sampling: log less, see more
Logging too much is its own problem. We log:
- Every error — 100%
- Every operation start/end at info level — 100%
- Every debug-level event — 1% in production, 100% in development
- Every routine event (heartbeats, polls) — 10% sampled
The sampling logic:
function sampledInfo(log, sampleRate, payload, msg) {
if (Math.random() < sampleRate) {
log.info({ ...payload, sampled: true, sample_rate: sampleRate }, msg);
}
}
// Usage in a tight loop
for (const item of largeBatch) {
sampledInfo(log, 0.01, { event: 'batch_item_processed', item_id: item.id });
}
The sampled: true flag tells log analytics that this log represents many more events than there are log lines. Aggregations can multiply counts by 1 / sample_rate for accurate volume tracking.
The math: at 200 slots × 1,000 operations/day, full info logging would be 200K lines/day = ~60MB/day. With sampling, it's ~6MB/day. Storage and query cost drops dramatically.
Log levels that mean something
Default Node.js logging has 5+ levels (trace, debug, info, warn, error, fatal). Most teams use them inconsistently. We use them strictly:
| Level | Use case | Frequency |
|---|---|---|
| error | Operation failed, action needed | rare |
| warn | Operation succeeded with caveat | uncommon |
| info | Normal operation milestones | common |
| debug | Detailed tracing for active debugging | high (sampled in prod) |
| trace | Reserved for very deep instrumentation | almost never |
The strict rule: if it's at error level, it must be alertable. If it's not actionable at 3 AM, it's not an error — it's an info with outcome: 'failed' or a warn.
This rule sounds pedantic but it changes alerting completely. The error log channel becomes the alert channel. If errors are appearing, something needs human attention. If something doesn't need human attention, it's not in the error log.
Log shipping and storage
We use a simple setup:
- Pino writes to stdout in production (no file rotation in the process)
- A sidecar process tails stdout and ships to our log aggregator
- The aggregator is a small self-hosted ELK stack (Elasticsearch + Kibana)
This is overkill for our scale but the queryability is worth it. At 6MB/day per slot worker, a single instance with 100 slots produces ~600MB/day = ~220GB/year. A 1TB Elasticsearch instance handles 4+ years of logs.
For a smaller setup, just shipping to a managed service like Logtail, Better Stack, or Axiom works fine. The cost is typically $10-50/month, which buys you indexing and search without operating the infrastructure.
The crucial thing is something queryable. Plain log files are useless when you have 100 worker processes writing to them in parallel. You need an index and a search UI, or you're back to grep at 3 AM.
The dashboards that matter
After incidents, I built three dashboards. These are the ones I still use.
1. Error rate by event_name
Top-level query: COUNT(*) WHERE level = 'error' GROUP BY event, 1m
Shows which event names are erroring, when, and at what rate. The first thing I look at when a pager fires.
2. p50/p95/p99 duration by event_name
Top-level query: PERCENTILES(duration_ms, [50, 95, 99]) WHERE event LIKE '%_succeeded' GROUP BY event, 5m
Shows which operations are getting slow. Latency regressions usually precede outright failures by hours or days.
3. Per-slot failure rate
Top-level query: COUNT(*) WHERE event = 'reply_generation_failed' GROUP BY slot_id, 5m
Highlights individual slots that are failing while the rest are healthy. Catches per-slot issues (bad credentials, proxy problem) before they look like global incidents.
These three dashboards cover 80% of the operational debugging we ever need to do.
Common mistakes worth calling out
Logging the entire object
log.info({ event: 'thing_happened', context: req }); // BAD
req contains headers, body, params, hops, and references to the response. Serializing it produces 10KB+ of useless detail. Pick the fields you actually need.
Logging in tight loops without sampling
for (let i = 0; i < 10000; i++) {
log.debug({ event: 'item_processed', index: i });
}
This generates 10K log lines per batch. Multiply by your batch frequency and you've blown out your log volume budget. Sample.
Logging only on success
const reply = await llmGenerate(tweet);
log.info({ event: 'reply_generated' });
If llmGenerate throws, you never see a log for that attempt. You also have no record of attempts that didn't succeed. Log start, log success, log failure.
Including PII without redaction
log.info({ event: 'user_registered', email: user.email });
The email is PII. Even if your log retention is short, log databases get backed up and shared. Use a hashed version, or redact.
Forgetting timezones
log.info(`Event at ${new Date()}`);
Local time is locale-dependent. Use ISO 8601 with timezone — Pino does this by default with pino.stdTimeFunctions.isoTime.
Key takeaways
- Structured JSON logs are non-negotiable. Strings are for development.
-
Every log line has an
eventname followingnoun_verb_statusconvention. -
Correlation IDs propagate through all related logs. Pino's
childlogger pattern does this elegantly. - Latency on every operation. Without it, you can't tell hung from slow.
- Wrap operations in a helper so the start/end/duration pattern is consistent and unmissable.
- Sample routine events — 1-10% in production keeps storage costs manageable.
-
errorlevel means alertable. If it's not action-required, it's not an error. - Build the three dashboards — error rate by event, p99 by event, per-tenant failure rate. They cover most incidents.
Good logging is a workout you do for your future self. The investment feels like overhead while you're building. It pays back massively the first time a pager fires.
HelperX ships with structured logging, per-slot correlation IDs, and reference dashboards out of the box. Self-hosted, free 30-day trial.
Top comments (0)