Bulletproof Error Handling in Node.js CLI Tools: A Complete Guide
When your CLI tool crashes at 2 AM inside a CI pipeline, nobody is there to read the stack trace. The build fails, the deploy stalls, and a developer wakes up to a cryptic "ENOENT: no such file or directory" with zero context about what went wrong or how to fix it.
CLI tools operate in a fundamentally different environment than web applications. There is no retry button, no friendly 500 page, no customer support chat widget. When something breaks, the user sees raw terminal output — and they need to understand it immediately. After building over 30 production npm CLI tools, I have learned that error handling is not an afterthought. It is the feature that separates a toy script from a tool people trust.
This guide covers everything you need to build CLI tools that fail gracefully, communicate clearly, and recover when possible.
Why CLI Error Handling Is Uniquely Challenging
Web applications have layers of infrastructure absorbing failures: load balancers, reverse proxies, framework-level exception handlers, and monitoring dashboards. CLI tools have none of that. They run in wildly unpredictable environments — different operating systems, shell configurations, permission models, and network conditions.
Here is what makes CLI error handling distinct:
- No visual UI. You cannot show a modal or a toast notification. You have stdout and stderr, and you need to use them correctly.
- Exit codes matter. Other tools, scripts, and CI systems depend on your exit code to decide what happens next.
- Piping and composition. Your tool's output might feed into another tool. Errors must not corrupt the data stream.
- Diverse environments. Your tool might run on macOS, Linux, Windows, inside Docker, inside a GitHub Action, or on a Raspberry Pi.
- Non-interactive contexts. You cannot prompt for input when running in CI. You need to fail with clear instructions instead.
Graceful Exit Codes: When to Use 0, 1, and 2
Exit codes are your CLI tool's primary communication channel with the outside world. Every parent process, shell script, and CI pipeline reads them.
The conventions are straightforward but often ignored:
// exit-codes.js
const EXIT = {
SUCCESS: 0, // Everything worked
GENERAL_ERROR: 1, // Something went wrong at runtime
MISUSE: 2, // Invalid arguments, bad usage
  CANNOT_EXECUTE: 126, // Command found but not executable
  NOT_FOUND: 127,      // Command not found
SIGINT: 130, // Terminated by Ctrl+C (128 + 2)
};
function exitWithCode(code, message) {
if (message) {
const stream = code === 0 ? process.stdout : process.stderr;
stream.write(message + '\n');
}
process.exit(code);
}
A common mistake is exiting with code 0 when a soft error occurs. If your tool was asked to process 10 files and 3 failed, that is not a success. Use a non-zero exit code and report what failed.
function processFiles(files) {
const results = { succeeded: 0, failed: 0, errors: [] };
for (const file of files) {
try {
transform(file);
results.succeeded++;
} catch (err) {
results.failed++;
results.errors.push({ file, message: err.message });
}
}
if (results.failed > 0) {
console.error(`Processed ${results.succeeded} files, ${results.failed} failed:`);
results.errors.forEach(e => console.error(` ${e.file}: ${e.message}`));
process.exit(1);
}
console.log(`All ${results.succeeded} files processed successfully.`);
process.exit(0);
}
Catching Uncaught Exceptions and Unhandled Rejections
Every CLI tool needs a global safety net. Without one, an unexpected error prints a raw stack trace and exits with code 1, and on older Node versions an unhandled promise rejection only prints a warning while the process keeps running.
// safety-net.js
process.on('uncaughtException', (err) => {
console.error(`Fatal error: ${err.message}`);
if (process.env.VERBOSE || process.argv.includes('--verbose')) {
console.error(err.stack);
}
console.error('This is a bug. Please report it at https://github.com/yourname/yourtool/issues');
process.exit(1);
});
process.on('unhandledRejection', (reason) => {
console.error(`Unhandled promise rejection: ${reason}`);
if (reason instanceof Error && (process.env.VERBOSE || process.argv.includes('--verbose'))) {
console.error(reason.stack);
}
process.exit(1);
});
Register these handlers as the very first thing in your entry point, before any other code runs. If you register them after importing modules that kick off async work, you might miss early failures.
A subtlety worth knowing: Node.js 15+ treats unhandled rejections as fatal by default (exiting with code 1). Earlier versions only printed a warning. Do not rely on the default behavior — handle it explicitly.
User-Friendly Error Messages vs Stack Traces
The biggest mistake I see in CLI tools is dumping raw stack traces on end users. A stack trace is useful for the developer of the tool, not the person using it.
Implement a --verbose flag that controls error output detail:
// error-formatter.js
class CLIError extends Error {
constructor(message, { code = 'ERR_UNKNOWN', exitCode = 1, hint } = {}) {
super(message);
this.name = 'CLIError';
this.code = code;
this.exitCode = exitCode;
this.hint = hint;
}
}
function formatError(err, verbose = false) {
if (err instanceof CLIError) {
let output = `Error: ${err.message}`;
if (err.hint) {
output += `\nHint: ${err.hint}`;
}
if (verbose) {
output += `\nCode: ${err.code}`;
output += `\nStack: ${err.stack}`;
}
return output;
}
// Unknown errors — likely a bug
if (verbose) {
return `Unexpected error: ${err.stack}`;
}
return `Unexpected error: ${err.message}\nRun with --verbose for details.`;
}
The pattern is simple: known errors get a clean message and an actionable hint. Unknown errors suggest running with --verbose and reporting a bug. Users should never have to decode a stack trace to figure out that they forgot to set an environment variable.
Here is what good error output looks like in practice:
$ mytool deploy --env production
Error: Missing required environment variable DEPLOY_TOKEN
Hint: Set it with: export DEPLOY_TOKEN=your-token-here
Or pass it inline: DEPLOY_TOKEN=xxx mytool deploy --env production
Compare that to the alternative: TypeError: Cannot read properties of undefined (reading 'trim') somewhere on line 847 of a minified file.
Network Error Recovery: Retries with Exponential Backoff
CLI tools that talk to APIs need retry logic. Networks are unreliable, APIs have rate limits, and DNS resolution sometimes fails on the first try for no apparent reason.
// retry.js
async function withRetry(fn, options = {}) {
const {
retries = 3,
baseDelay = 1000,
maxDelay = 30000,
shouldRetry = () => true,
onRetry = () => {},
} = options;
let lastError;
for (let attempt = 0; attempt <= retries; attempt++) {
try {
return await fn(attempt);
} catch (err) {
lastError = err;
if (attempt === retries || !shouldRetry(err, attempt)) {
throw err;
}
const delay = Math.min(baseDelay * Math.pow(2, attempt), maxDelay);
const jitter = delay * (0.5 + Math.random() * 0.5);
      await onRetry(err, attempt + 1, jitter); // awaited so async handlers can delay the retry
await new Promise(resolve => setTimeout(resolve, jitter));
}
}
throw lastError;
}
// Usage
const data = await withRetry(
() => fetch('https://api.example.com/data').then(r => {
if (!r.ok) throw new HttpError(r.status, r.statusText);
return r.json();
}),
{
retries: 3,
shouldRetry: (err) => {
if (err instanceof HttpError) {
return [408, 429, 500, 502, 503, 504].includes(err.status);
}
return err.code === 'ECONNRESET' || err.code === 'ETIMEDOUT';
},
onRetry: (err, attempt, delay) => {
console.error(`Request failed (${err.message}), retry ${attempt}/3 in ${Math.round(delay / 1000)}s...`);
},
}
);
Key decisions in this implementation:
- Jitter prevents thundering herd problems when multiple instances retry simultaneously.
- shouldRetry distinguishes transient failures (500, timeout) from permanent ones (401, 404). Never retry a 404.
- The onRetry callback lets the user know something is happening. Silent retries make people think the tool is hanging.
For rate-limited APIs, respect the Retry-After header. For an async onRetry handler like the one below to actually delay the next attempt, withRetry must await the onRetry callback; its regular backoff delay then applies on top of the server-requested wait:
shouldRetry: (err) => err.status === 429,
onRetry: async (err, attempt, delay) => {
const retryAfter = err.headers?.get('retry-after');
if (retryAfter) {
    const waitMs = parseInt(retryAfter, 10) * 1000;
console.error(`Rate limited. Waiting ${retryAfter}s as requested by server...`);
await new Promise(resolve => setTimeout(resolve, waitMs));
}
},
File System Errors: Permissions, Missing Files, Disk Full
File system errors are among the most common failures in CLI tools. Every one of them should produce a clear, actionable message.
// fs-errors.js
const path = require('path');
const { CLIError } = require('./errors');

function handleFsError(err, filePath) {
switch (err.code) {
case 'ENOENT':
throw new CLIError(`File not found: ${filePath}`, {
code: 'FILE_NOT_FOUND',
hint: `Check that the path exists: ls -la ${path.dirname(filePath)}`,
});
case 'EACCES':
case 'EPERM':
throw new CLIError(`Permission denied: ${filePath}`, {
code: 'PERMISSION_DENIED',
hint: `Check file permissions: ls -la ${filePath}\nOr run with elevated permissions if appropriate.`,
});
case 'ENOSPC':
throw new CLIError(`Disk full: cannot write to ${filePath}`, {
code: 'DISK_FULL',
hint: 'Free up disk space and try again.',
});
case 'EISDIR':
throw new CLIError(`Expected a file but found a directory: ${filePath}`, {
code: 'IS_DIRECTORY',
});
case 'EMFILE':
case 'ENFILE':
throw new CLIError('Too many open files', {
code: 'TOO_MANY_FILES',
hint: 'Increase the open file limit: ulimit -n 4096',
});
default:
throw new CLIError(`File system error on ${filePath}: ${err.message}`, {
code: 'FS_ERROR',
});
}
}
// Usage
const fs = require('fs/promises');

async function readConfig(configPath) {
try {
const raw = await fs.readFile(configPath, 'utf8');
return JSON.parse(raw);
} catch (err) {
if (err instanceof SyntaxError) {
throw new CLIError(`Invalid JSON in config file: ${configPath}`, {
code: 'INVALID_CONFIG',
hint: `Check syntax: cat ${configPath} | npx json-lint`,
});
}
handleFsError(err, configPath);
}
}
A production detail worth noting: when writing output files, always write to a temporary file first, then rename. This prevents leaving behind a corrupted half-written file if the process is killed mid-write.
async function safeWriteFile(filePath, content) {
const tmpPath = filePath + '.tmp.' + process.pid;
try {
await fs.writeFile(tmpPath, content);
await fs.rename(tmpPath, filePath);
} catch (err) {
// Clean up the temp file if it exists
try { await fs.unlink(tmpPath); } catch {}
handleFsError(err, filePath);
}
}
Signal Handling: SIGINT and SIGTERM Graceful Shutdown
When a user presses Ctrl+C, your tool should clean up and exit cleanly. This means removing temporary files, closing database connections, flushing logs, and restoring terminal state.
// shutdown.js
class GracefulShutdown {
constructor() {
this.handlers = [];
this.shuttingDown = false;
process.on('SIGINT', () => this.shutdown('SIGINT'));
process.on('SIGTERM', () => this.shutdown('SIGTERM'));
}
onShutdown(handler) {
this.handlers.push(handler);
}
async shutdown(signal) {
if (this.shuttingDown) {
console.error('\nForce quitting...');
process.exit(128 + (signal === 'SIGINT' ? 2 : 15));
}
this.shuttingDown = true;
console.error(`\nReceived ${signal}. Cleaning up...`);
const timeout = setTimeout(() => {
console.error('Cleanup timed out. Force quitting.');
process.exit(1);
}, 5000);
try {
await Promise.allSettled(
this.handlers.map(h => Promise.resolve(h()))
);
} catch (err) {
console.error(`Cleanup error: ${err.message}`);
}
clearTimeout(timeout);
process.exit(128 + (signal === 'SIGINT' ? 2 : 15));
}
}
// Usage
const shutdown = new GracefulShutdown();
const tmpFiles = [];
shutdown.onShutdown(async () => {
for (const f of tmpFiles) {
try { await fs.unlink(f); } catch {}
}
});
shutdown.onShutdown(async () => {
if (db) await db.close();
});
Two important details here. First, the double-Ctrl+C pattern: if the user presses Ctrl+C once, you clean up gracefully. If they press it again during cleanup, you force-quit immediately. Users expect this behavior and get frustrated when a tool ignores their interrupt. Second, the cleanup timeout: never let cleanup handlers hang forever. Five seconds is a reasonable default.
The exit code 128 + signal_number is a Unix convention. SIGINT is signal 2, so the exit code is 130. SIGTERM is signal 15, so the exit code is 143. CI systems and parent processes use these to distinguish "the tool was killed" from "the tool failed."
Validation Errors: Arguments and Configuration
Bad input should be caught as early as possible, with error messages that tell the user exactly what is wrong and how to fix it.
// validation.js
const { existsSync } = require('fs');

function validateArgs(args) {
const errors = [];
if (!args.input) {
errors.push('Missing required argument: --input <file>');
} else if (!existsSync(args.input)) {
errors.push(`Input file does not exist: ${args.input}`);
}
if (args.format && !['json', 'csv', 'yaml'].includes(args.format)) {
errors.push(
`Invalid format "${args.format}". Must be one of: json, csv, yaml`
);
}
if (args.concurrency !== undefined) {
    const n = parseInt(args.concurrency, 10);
if (isNaN(n) || n < 1 || n > 100) {
errors.push('--concurrency must be a number between 1 and 100');
}
}
if (errors.length > 0) {
console.error('Invalid arguments:\n');
errors.forEach(e => console.error(` - ${e}`));
console.error(`\nRun "${process.argv[1]} --help" for usage information.`);
process.exit(2); // Exit code 2 = misuse
}
}
Notice the exit code 2 for argument validation errors. This follows the convention established by Bash builtins and most Unix tools. It lets scripts distinguish "I ran the tool wrong" from "the tool ran but encountered an error."
For config file validation, provide line-number-level feedback when possible:
function validateConfig(raw, filePath) {
try {
return JSON.parse(raw);
} catch (err) {
const match = err.message.match(/position (\d+)/);
if (match) {
const pos = parseInt(match[1]);
const lines = raw.substring(0, pos).split('\n');
const line = lines.length;
const col = lines[lines.length - 1].length + 1;
throw new CLIError(
`Invalid JSON in ${filePath} at line ${line}, column ${col}`,
{ code: 'INVALID_CONFIG', exitCode: 2 }
);
}
throw new CLIError(`Invalid JSON in ${filePath}: ${err.message}`, {
code: 'INVALID_CONFIG',
exitCode: 2,
});
}
}
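The offset-to-line/column arithmetic is a pure function worth extracting so you can unit-test it on its own; a sketch:

```javascript
// Convert a 0-based character offset into 1-based line/column numbers.
function offsetToPosition(text, offset) {
  const before = text.substring(0, offset).split('\n');
  return {
    line: before.length,                       // lines seen so far, 1-based
    col: before[before.length - 1].length + 1, // chars after the last newline, 1-based
  };
}

const sample = '{\n  "name": oops\n}';
console.log(offsetToPosition(sample, sample.indexOf('oops'))); // { line: 2, col: 11 }
```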
Creating Custom Error Classes for CLI Tools
A well-structured error hierarchy makes it easy to handle different failure types consistently across your codebase.
// errors.js
class CLIError extends Error {
constructor(message, { code = 'ERR_CLI', exitCode = 1, hint } = {}) {
super(message);
this.name = 'CLIError';
this.code = code;
this.exitCode = exitCode;
this.hint = hint;
}
}
class NetworkError extends CLIError {
constructor(message, { url, status, retryable = false } = {}) {
super(message, {
code: 'ERR_NETWORK',
hint: retryable
? 'This may be a temporary issue. Try again in a few seconds.'
: undefined,
});
this.name = 'NetworkError';
this.url = url;
this.status = status;
this.retryable = retryable;
}
}
class ConfigError extends CLIError {
constructor(message, { filePath, line } = {}) {
super(message, {
code: 'ERR_CONFIG',
exitCode: 2,
hint: filePath ? `Check your config file: ${filePath}` : undefined,
});
this.name = 'ConfigError';
this.filePath = filePath;
this.line = line;
}
}
class ValidationError extends CLIError {
constructor(message, { field, expected } = {}) {
super(message, {
code: 'ERR_VALIDATION',
exitCode: 2,
});
this.name = 'ValidationError';
this.field = field;
this.expected = expected;
}
}
module.exports = { CLIError, NetworkError, ConfigError, ValidationError };
The benefit shows up in your main error handler:
try {
await main();
} catch (err) {
if (err instanceof CLIError) {
console.error(formatError(err, verbose));
process.exit(err.exitCode);
}
// Truly unexpected error
console.error(`Bug: ${err.message}`);
console.error('Please report this issue.');
if (verbose) console.error(err.stack);
process.exit(1);
}
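Because every subclass extends CLIError, that single instanceof branch covers the whole family. A runnable illustration with trimmed-down re-declarations of the classes (not the full versions above):

```javascript
// Minimal hierarchy sketch: subclass instances satisfy the base-class check.
class CLIError extends Error {
  constructor(message, { exitCode = 1 } = {}) {
    super(message);
    this.exitCode = exitCode;
  }
}
class ConfigError extends CLIError {
  constructor(message) {
    super(message, { exitCode: 2 });
  }
}

const err = new ConfigError('bad config');
console.log(err instanceof ConfigError); // true
console.log(err instanceof CLIError);    // true: one catch branch handles the whole family
console.log(err.exitCode);               // 2
```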
Logging Errors to Files for Debugging
When a user reports a bug, you need more context than "it didn't work." Write a debug log to disk that captures everything.
// logger.js
const { createWriteStream, mkdirSync } = require('fs');
const path = require('path');
const os = require('os');
function createDebugLogger(toolName) {
const logDir = path.join(os.tmpdir(), toolName);
mkdirSync(logDir, { recursive: true });
const logPath = path.join(logDir, `debug-${Date.now()}.log`);
const stream = createWriteStream(logPath, { flags: 'a' });
function log(level, message, meta = {}) {
const entry = {
timestamp: new Date().toISOString(),
level,
message,
...meta,
pid: process.pid,
nodeVersion: process.version,
platform: process.platform,
};
stream.write(JSON.stringify(entry) + '\n');
}
return {
info: (msg, meta) => log('info', msg, meta),
warn: (msg, meta) => log('warn', msg, meta),
error: (msg, meta) => log('error', msg, meta),
path: logPath,
close: () => stream.end(),
};
}
// In your error handler:
catch (err) {
logger.error('Command failed', {
error: err.message,
stack: err.stack,
args: process.argv.slice(2),
env: {
NODE_ENV: process.env.NODE_ENV,
PATH: process.env.PATH,
},
});
console.error(`Error: ${err.message}`);
console.error(`Debug log written to: ${logger.path}`);
}
Capture the Node.js version, platform, and arguments — these are almost always relevant to reproducing bugs. Be careful not to log secrets: filter out environment variables like API keys and tokens.
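Secret filtering can be a small pure function applied to the env object before it reaches the log (the name patterns below are a starting point, not an exhaustive list):

```javascript
// Redact environment variables whose names suggest they hold secrets.
const SECRET_NAME = /token|secret|key|password|credential/i;

function redactEnv(env) {
  const out = {};
  for (const [name, value] of Object.entries(env)) {
    out[name] = SECRET_NAME.test(name) ? '[redacted]' : value;
  }
  return out;
}

console.log(redactEnv({ NODE_ENV: 'ci', DEPLOY_TOKEN: 'abc123' }));
// { NODE_ENV: 'ci', DEPLOY_TOKEN: '[redacted]' }
```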
Showing Progress and Recovering from Interruptions
Long-running operations should show progress and support resumption after interruption.
// progress.js
const { readFileSync, writeFileSync, unlinkSync } = require('fs');

class ProgressTracker {
constructor(total, label = 'Processing') {
this.total = total;
this.current = 0;
this.label = label;
this.startTime = Date.now();
this.checkpointPath = null;
}
enableCheckpoints(filePath) {
this.checkpointPath = filePath;
try {
const saved = JSON.parse(readFileSync(filePath, 'utf8'));
this.current = saved.current;
this.completed = new Set(saved.completed);
console.error(`Resuming from checkpoint: ${this.current}/${this.total} completed`);
} catch {
this.completed = new Set();
}
}
update(itemId) {
this.current++;
this.completed?.add(itemId);
const percent = Math.round((this.current / this.total) * 100);
const elapsed = (Date.now() - this.startTime) / 1000;
const rate = this.current / elapsed;
const remaining = Math.round((this.total - this.current) / rate);
process.stderr.write(
`\r${this.label}: ${this.current}/${this.total} (${percent}%) ` +
`ETA: ${remaining}s `
);
if (this.checkpointPath && this.current % 10 === 0) {
writeFileSync(this.checkpointPath, JSON.stringify({
current: this.current,
completed: [...this.completed],
timestamp: new Date().toISOString(),
}));
}
}
shouldProcess(itemId) {
return !this.completed?.has(itemId);
}
finish() {
process.stderr.write('\n');
if (this.checkpointPath) {
try { unlinkSync(this.checkpointPath); } catch {}
}
}
}
// Usage
const progress = new ProgressTracker(files.length, 'Converting');
progress.enableCheckpoints('/tmp/mytool-checkpoint.json');
for (const file of files) {
if (!progress.shouldProcess(file)) continue;
await convertFile(file);
progress.update(file);
}
progress.finish();
The checkpoint pattern is valuable for any operation processing a list of items. If the user interrupts with Ctrl+C (and your graceful shutdown handler saves the checkpoint), they can re-run the command and it picks up where it left off.
Progress output goes to stderr, not stdout. This is critical for composability — if someone pipes your tool's output into another tool, progress messages must not contaminate the data stream.
Putting It All Together
Here is the skeleton of a well-structured CLI tool that incorporates all of these patterns:
#!/usr/bin/env node
const { CLIError, NetworkError, ConfigError } = require('./errors');
const { GracefulShutdown } = require('./shutdown');
const { createDebugLogger } = require('./logger');
const { withRetry } = require('./retry');
// Register global handlers FIRST
process.on('uncaughtException', handleFatal);
process.on('unhandledRejection', handleFatal);
const shutdown = new GracefulShutdown();
const logger = createDebugLogger('mytool');
const verbose = process.argv.includes('--verbose');
shutdown.onShutdown(() => logger.close());
async function main() {
const args = parseArgs(process.argv.slice(2));
validateArgs(args);
const config = await loadConfig(args.config);
const result = await withRetry(
() => fetchData(config.apiUrl),
{ retries: 3, onRetry: (err, n) => console.error(`Retry ${n}: ${err.message}`) }
);
await processResult(result, args.output);
console.log('Done.');
}
function handleFatal(err) {
logger.error('Fatal', { error: err.message, stack: err.stack });
console.error(formatError(err, verbose));
if (!verbose) {
console.error(`Debug log: ${logger.path}`);
}
process.exit(1);
}
main().catch((err) => {
if (err instanceof CLIError) {
logger.error('CLI Error', { code: err.code, message: err.message });
console.error(formatError(err, verbose));
process.exit(err.exitCode);
}
handleFatal(err);
});
The key principles, once more:
- Register global handlers before anything else.
- Use exit codes consistently. Zero means success. One means runtime error. Two means bad usage.
- Write user-facing messages to stderr. Reserve stdout for data output.
- Make errors actionable. Every error message should answer: what happened, why, and what to do about it.
- Support --verbose for debugging. Hide stack traces by default, show them on request.
- Retry transient failures. But only transient ones: do not retry authentication errors or 404s.
- Handle signals gracefully. Clean up temporary files, close connections, flush logs.
- Checkpoint long operations. Let users resume after interruption.
- Log everything to a debug file. When a bug report comes in, ask for the log.
These patterns have saved me countless hours of debugging and have prevented users from filing issues that amount to "I got an error, what do I do?" A CLI tool that handles errors well is a tool that people keep using.
Build your error handling foundation early, and every feature you add afterward will inherit that resilience. Your future self — and your users running builds at 2 AM — will thank you.