Running a Node.js application in production without a process manager is like running with one hand tied behind your back. The process crashes, and it stays crashed. The server reboots, and your app doesn't come back. You have 16 CPU cores and you're using one.
PM2 solves all of this. It's the de facto production process manager for Node.js — and when used correctly, it turns a single-threaded Node.js app into a horizontally-scaled, self-healing, observable production service.
This guide covers everything you need to run PM2 in production: cluster mode, ecosystem configuration, zero-downtime reloads, shared state handling, log management, and when to use PM2 versus systemd or Docker orchestration.
Why PM2 Over Raw Cluster
You can write your own cluster logic:
const cluster = require('cluster');
const os = require('os');
if (cluster.isPrimary) {
const cpus = os.cpus().length;
for (let i = 0; i < cpus; i++) {
cluster.fork();
}
cluster.on('exit', (worker, code) => {
if (code !== 0) cluster.fork(); // restart on crash
});
} else {
require('./server');
}
This works. But you now own:
- Restart logic (exponential backoff? max restarts?)
- Log aggregation across processes
- Zero-downtime reload orchestration
- Startup on system boot
- Memory limit enforcement
- Metrics collection
PM2 ships all of this battle-tested. The raw cluster module is the foundation — PM2 is the production layer on top of it.
Installation and Basic Usage
npm install pm2 -g
Start an app:
# Single process
pm2 start app.js --name my-api
# Cluster mode (auto-detect CPU count)
pm2 start app.js --name my-api -i max
# Cluster mode (specific count)
pm2 start app.js --name my-api -i 4
Key commands:
pm2 list # show all processes
pm2 status # same as list
pm2 logs # stream all logs
pm2 logs my-api # stream specific app logs
pm2 monit # terminal dashboard (CPU, memory, logs)
pm2 stop my-api # stop process
pm2 restart my-api # hard restart (brief downtime)
pm2 reload my-api # zero-downtime reload (cluster mode only)
pm2 delete my-api # remove from PM2 entirely
The Ecosystem File: Infrastructure as Code
Never run PM2 from the command line in production. Use ecosystem.config.js:
// ecosystem.config.js
module.exports = {
apps: [
{
name: 'api-server',
script: './dist/server.js',
instances: 'max', // use all CPU cores
exec_mode: 'cluster', // enable cluster mode
// Environment
env: {
NODE_ENV: 'development',
PORT: 3000,
},
env_production: {
NODE_ENV: 'production',
PORT: 3000,
},
// Restart behavior
max_memory_restart: '512M', // restart if process exceeds 512MB
min_uptime: '10s', // must stay up at least 10s to be "stable"
max_restarts: 10, // max restart attempts before marking as errored
restart_delay: 1000, // wait 1s between restarts
exp_backoff_restart_delay: 100, // exponential backoff on restart
// Logging
log_date_format: 'YYYY-MM-DD HH:mm:ss Z',
error_file: '/var/log/pm2/api-error.log',
out_file: '/var/log/pm2/api-out.log',
merge_logs: true, // merge cluster worker logs into one file
// Watch (dev only — never use in production)
watch: false,
// Advanced
kill_timeout: 5000, // ms to wait after SIGINT before SIGKILL
listen_timeout: 8000, // ms to wait for app to be ready after restart
wait_ready: true, // wait for process.send('ready') signal
}
]
};
Start with the ecosystem file:
pm2 start ecosystem.config.js --env production
pm2 restart ecosystem.config.js --env production
pm2 reload ecosystem.config.js --env production
Zero-Downtime Reloads (The Critical Pattern)
pm2 restart kills all processes simultaneously — brief downtime, not acceptable for production.
pm2 reload performs a rolling restart across cluster workers: one worker stops accepting new connections, waits for in-flight requests to complete, and restarts; then the next worker goes through the same cycle. Zero dropped requests.
But zero-downtime reload only works if your app cooperates:
// server.js
const http = require('http');
const server = http.createServer(app);
server.listen(process.env.PORT, () => {
// Signal PM2 that we're ready to receive traffic
if (process.send) {
process.send('ready');
}
console.log(`Worker ${process.pid} listening on port ${process.env.PORT}`);
});
// Graceful shutdown on SIGINT (PM2 reload signal)
process.on('SIGINT', () => {
console.log(`Worker ${process.pid} shutting down...`);
server.close(() => {
// Drain all existing connections
console.log(`Worker ${process.pid} closed. Exiting.`);
process.exit(0);
});
// Force exit after kill_timeout if server.close() hangs
setTimeout(() => {
console.error('Forced shutdown after timeout');
process.exit(1);
}, 4000); // less than kill_timeout in ecosystem.config.js
});
The wait_ready: true + process.send('ready') pattern is critical. Without it, PM2 considers the process ready immediately after fork — before your server is actually listening. With it, PM2 waits for the explicit ready signal before routing traffic to the new worker.
CPU Affinity and Instance Count
instances: 'max' spawns one worker per logical CPU. But more isn't always better:
const os = require('os');
const cpuCount = os.cpus().length;
// For I/O-bound apps (most Node.js web servers):
// instances = cpuCount works well
// The event loop is efficient; more processes = more connection parallelism
// For CPU-bound apps:
// instances = cpuCount - 1 (leave one core for the OS and PM2 itself)
// Overcrowding with CPU-heavy workers causes context-switching overhead
// For memory-constrained servers:
// instances = Math.floor(totalRamMB / appRamMB)
// A 2GB server with a 400MB app footprint = max 4 instances, not 8
In ecosystem.config.js:
{
instances: process.env.PM2_INSTANCES || 'max',
// or calculate dynamically:
// instances: require('os').cpus().length - 1
}
For containerized environments (Docker/Kubernetes), run PM2 with a single instance (instances: 1) and let the orchestrator handle horizontal scaling. Running cluster mode inside a container undermines the one-process-per-container model and hides per-worker resource usage from the orchestrator.
The Shared State Problem
Cluster mode means multiple processes. Processes don't share memory. If your app stores state in-process, cluster mode will break it:
// ❌ This breaks in cluster mode
const rateLimit = new Map(); // Each worker has its own Map — rate limits don't work
// ❌ Same problem: in-memory session storage
const sessions = {}; // Worker 1 handles login, Worker 2 doesn't have the session
Solution: Redis for Shared State
Move any shared state to Redis:
// ✅ Rate limiting with Redis (shared across all cluster workers)
const Redis = require('ioredis');
const { RateLimiterRedis } = require('rate-limiter-flexible');
const client = new Redis({ host: 'localhost', port: 6379 });
const rateLimiter = new RateLimiterRedis({
storeClient: client,
keyPrefix: 'rate_limit',
points: 100, // requests
duration: 60, // per 60 seconds
});
// In your middleware:
app.use(async (req, res, next) => {
try {
await rateLimiter.consume(req.ip);
next();
} catch (err) {
res.status(429).json({ error: 'Too many requests' });
}
});
// ✅ Session management with Redis (shared across all cluster workers)
const session = require('express-session');
const RedisStore = require('connect-redis').default;
app.use(session({
store: new RedisStore({ client }),
secret: process.env.SESSION_SECRET,
resave: false,
saveUninitialized: false,
cookie: { secure: true, maxAge: 86400000 },
}));
Application-Level State Audit Checklist
Before enabling cluster mode, audit every in-memory data structure:
| Pattern | Cluster-Safe? | Fix |
|---|---|---|
| `const cache = new Map()` | ❌ No | Redis or memcached |
| `let requestCount = 0` | ❌ No | Redis `INCR` |
| `const sessions = {}` | ❌ No | connect-redis |
| `const rateLimiter = new RateLimiterMemory()` | ❌ No | `RateLimiterRedis` |
| `const db = createConnection()` | ✅ Yes | Each worker gets its own pool |
| `const server = http.createServer()` | ✅ Yes | PM2 handles port sharing |
| `const config = require('./config')` | ✅ Yes | Read-only at startup |
| `const queue = new BullMQ.Queue()` | ✅ Yes | Redis-backed queue |
Log Management in Production
PM2's logging defaults are fine for development; production logging needs tuning:
# Install log rotation (critical — logs will fill your disk otherwise)
pm2 install pm2-logrotate
# Configure rotation
pm2 set pm2-logrotate:max_size 10M # rotate at 10MB
pm2 set pm2-logrotate:retain 7 # keep 7 rotated files
pm2 set pm2-logrotate:compress true # gzip rotated logs
pm2 set pm2-logrotate:rotateInterval '0 0 * * *' # daily at midnight
For structured logging, have your app write JSON to stdout — PM2 captures it:
// Use a structured logger like pino
const pino = require('pino');
const logger = pino({
level: process.env.LOG_LEVEL || 'info',
// In production, output raw JSON (no prettification overhead)
transport: process.env.NODE_ENV === 'development'
? { target: 'pino-pretty' }
: undefined,
});
// Tag each log line with the worker PID so requests can be traced to a worker
logger.info({
  pid: process.pid,
  route: req.path,       // req and elapsed come from your request handler
  duration_ms: elapsed
}, 'request completed');
PM2 merges all worker logs with merge_logs: true. Each line is still tagged with the process ID, so you can trace requests back to specific workers.
Startup on System Boot
PM2 processes don't survive reboots unless you configure the startup hook:
# Generate startup script for your OS (systemd, launchd, etc.)
pm2 startup
# Follow the printed command — it'll be something like:
sudo env PATH=$PATH:/usr/bin /usr/lib/node_modules/pm2/bin/pm2 startup systemd -u myuser --hp /home/myuser
# Save the current process list
pm2 save
After this, PM2 registers as a systemd service. On reboot, systemd starts PM2, which restores your saved process list.
Verify it works:
sudo systemctl status pm2-myuser # check PM2 service status
sudo reboot # test it
pm2 list # after reboot — all apps should be running
PM2 Monitoring and Metrics
Terminal Dashboard
pm2 monit
Real-time view: CPU%, memory, event loop lag, active handles, restarts, uptime per worker.
Web Dashboard (PM2 Plus)
PM2 Plus (paid, ~$9/month for 4 servers) provides a cloud dashboard, anomaly detection, and alerting. For most teams, the open-source terminal monitoring plus Prometheus export is sufficient.
Prometheus Export
pm2 install pm2-prometheus-exporter
Exposes a /metrics endpoint that Grafana can scrape. Key metrics:
pm2_process_cpu_seconds_total
pm2_process_memory_bytes
pm2_process_restart_count
pm2_process_status # 0=stopped, 1=online, 2=errored
pm2_process_uptime_seconds
Grafana dashboard JSON for PM2 is available at grafana.com (dashboard ID: 10474).
Process Manager Comparison
| | PM2 | systemd | Docker | Nodemon |
|---|---|---|---|---|
| Primary use | Production Node.js | System services | Containerized apps | Development only |
| Cluster mode | ✅ Built-in | ❌ Manual | ❌ (use Kubernetes) | ❌ |
| Zero-downtime reload | ✅ `pm2 reload` | ❌ | ✅ (orchestrator) | ❌ |
| Log management | ✅ Built-in | journald | Docker logging | ❌ |
| Memory limits | ✅ Auto-restart | cgroups | cgroups | ❌ |
| Hot env injection | ✅ ecosystem.config.js | systemd env file | Docker env vars | ❌ |
| Startup on boot | ✅ `pm2 startup` | ✅ Native | ✅ Compose restart | ❌ |
| Container-aware | ❌ | ❌ | ✅ | ❌ |
When to use PM2:
- Bare metal or VM deployments
- Non-containerized production servers
- Rapid deployment without orchestration overhead
- Small-to-medium teams that don't need Kubernetes
When to skip PM2:
- Docker containers (use a single process, let the orchestrator restart it)
- Kubernetes (let k8s handle restarts and scaling)
- Serverless (Lambda, Cloud Run — no persistent process)
PM2 in CI/CD
Deploy pattern for zero-downtime:
# deploy.sh
set -e
echo "Pulling latest..."
git pull origin main
echo "Installing dependencies..."
npm ci --production
echo "Building..."
npm run build
echo "Reloading PM2..."
pm2 reload ecosystem.config.js --env production --update-env
echo "Saving PM2 state..."
pm2 save
echo "Deployment complete"
The --update-env flag tells PM2 to reload environment variables from the ecosystem file during the rolling restart — so you can update env vars without a hard restart.
Production Checklist
Before going live with PM2 in cluster mode:
- [ ] `wait_ready: true` + `process.send('ready')` implemented
- [ ] `SIGINT` handler drains in-flight requests gracefully
- [ ] `kill_timeout` in ecosystem ≥ server.close() timeout
- [ ] All in-memory state audited and moved to Redis
- [ ] `pm2-logrotate` installed and configured
- [ ] `pm2 startup` + `pm2 save` executed
- [ ] `max_memory_restart` set (prevents silent OOM death)
- [ ] `max_restarts` + `restart_delay` configured to prevent restart loops
- [ ] `merge_logs: true` for aggregated log streams
- [ ] Monitoring: `pm2 monit` or Prometheus exporter
- [ ] Load tested with cluster enabled (verify session/state works)
PM2 is one of the highest-leverage tools in the Node.js production toolkit. A 30-minute investment in a proper ecosystem.config.js — ready signals, graceful shutdown, Redis for shared state, log rotation — pays for itself the first time you do a zero-downtime deploy at 2 PM on a Tuesday.
The cluster module gives you the mechanism. PM2 gives you production operations. Use both.
This is part of the Node.js Production Series — 37+ deep-dive articles on running Node.js at scale.
Subscribe to The AXIOM Experiment newsletter for weekly updates on autonomous AI, developer tools, and what's actually working in production.