Series: From "Just Put It on a Server" to Production DevOps
Reading time: 10 minutes
Level: Beginner-friendly
Quick Recap
In Part 1, we deployed our Sales Signal Processing Platform to a Linode server the manual way. It worked... until:
- We closed our SSH session (app died)
- The app crashed (stayed dead)
- The server rebooted (app didn't restart)
Today's mission: Keep the app alive without babysitting it.
The Problem: Processes Are Fragile
Let's simulate what happens in production.
SSH into your server and start the API:
cd /opt/sspp/services/api
npm start &
Now kill it on purpose:
# Find the process ID
ps aux | grep node
# Kill it
kill -9 <pid>
Test the API:
curl http://localhost:3000/api/v1/health
Dead. And it's not coming back.
In production, processes die for many reasons:
- Unhandled exceptions
- Memory leaks (OOM killer strikes)
- Dependency failures (database connection lost)
- Random cosmic rays (yes, really)
You need something that automatically restarts your app.
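Before reaching for a tool, it's worth seeing how crude the DIY fix is. A minimal supervisor is just a restart loop. The `run_supervised` helper below is our own illustration (not part of any tool), with a restart cap so the sketch stays finite:

```shell
#!/bin/sh
# Naive supervisor: rerun a command whenever it exits.
# A real process manager (like PM2) adds backoff, logging,
# and boot persistence on top of this basic loop.
run_supervised() {
  cmd=$1
  max_restarts=$2
  restarts=0
  while [ "$restarts" -lt "$max_restarts" ]; do
    $cmd && status=0 || status=$?   # run the app in the foreground
    echo "process exited with code $status, restarting..."
    restarts=$((restarts + 1))
  done
  echo "gave up after $restarts restarts"
}

# Usage (don't actually run your API this way):
# run_supervised "npm start" 5
```

Notice there's no backoff, no log capture, and nothing survives a reboot. That's the gap PM2 fills.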
Enter PM2: Process Manager 2
PM2 is a production-grade process manager for Node.js applications. Think of it as a babysitter that:
- Keeps your app running: restarts on crash
- Survives reboots: starts on system boot
- Manages logs: aggregates stdout/stderr
- Monitors resources: tracks CPU and memory usage
- Zero-downtime reloads: updates without dropping connections
Why PM2? It's battle-tested, actively maintained, and widely used in production.
Installation
SSH into your server:
# Install PM2 globally
npm install -g pm2
# Verify
pm2 --version
Simple. Now let's use it.
Running Your App with PM2
Basic Usage
Instead of npm start, use PM2:
cd /opt/sspp/services/api
# Start the app
pm2 start npm --name "sspp-api" -- start
# Check status
pm2 status
Output:
┌────┬───────────┬──────┬───┬────────┬─────┬─────────┐
│ id │ name      │ mode │ ↺ │ status │ cpu │ memory  │
├────┼───────────┼──────┼───┼────────┼─────┼─────────┤
│ 0  │ sspp-api  │ fork │ 0 │ online │ 0%  │ 45.2mb  │
└────┴───────────┴──────┴───┴────────┴─────┴─────────┘
Your app is now:
- Named (no more anonymous PIDs)
- Monitored (PM2 watches it)
- Managed (can be controlled by name)
Better: Use an Ecosystem File
Create a PM2 configuration file:
cd /opt/sspp
cat > ecosystem.config.js <<'EOF'
module.exports = {
  apps: [
    {
      name: 'sspp-api',
      cwd: '/opt/sspp/services/api',
      script: 'npm',
      args: 'start',
      instances: 1,
      autorestart: true,
      watch: false,
      max_memory_restart: '500M',
      env: {
        NODE_ENV: 'production',
        PORT: 3000,
        DB_HOST: 'localhost',
        DB_PORT: 5432,
        DB_NAME: 'sales_signals',
        DB_USER: 'sspp_user',
        DB_PASSWORD: 'sspp_password',
        REDIS_HOST: 'localhost',
        REDIS_PORT: 6379,
        ELASTICSEARCH_URL: 'http://localhost:9200',
      },
      error_file: '/var/log/sspp/api-error.log',
      out_file: '/var/log/sspp/api-out.log',
      time: true,
    },
    {
      name: 'sspp-worker',
      cwd: '/opt/sspp/services/worker',
      script: 'npm',
      args: 'start',
      instances: 2,
      autorestart: true,
      watch: false,
      max_memory_restart: '500M',
      env: {
        NODE_ENV: 'production',
        DB_HOST: 'localhost',
        DB_PORT: 5432,
        DB_NAME: 'sales_signals',
        DB_USER: 'sspp_user',
        DB_PASSWORD: 'sspp_password',
        REDIS_HOST: 'localhost',
        REDIS_PORT: 6379,
        ELASTICSEARCH_URL: 'http://localhost:9200',
        QUEUE_NAME: 'sales-events',
      },
      error_file: '/var/log/sspp/worker-error.log',
      out_file: '/var/log/sspp/worker-out.log',
      time: true,
    },
  ],
};
EOF
What this does:
- Defines both services (API + Worker) in one place
- Sets environment variables (no more .env files to manage)
- Configures resources (max memory before restart)
- Organizes logs (separate files for each service)
- Runs multiple workers (2 worker instances for parallel processing)
Create log directory:
mkdir -p /var/log/sspp
Start everything:
pm2 start ecosystem.config.js
# Check status
pm2 status
Output:
┌────┬─────────────┬──────┬───┬────────┬──────┬─────────┐
│ id │ name        │ mode │ ↺ │ status │ cpu  │ memory  │
├────┼─────────────┼──────┼───┼────────┼──────┼─────────┤
│ 0  │ sspp-api    │ fork │ 0 │ online │ 1.2% │ 48.3mb  │
│ 1  │ sspp-worker │ fork │ 0 │ online │ 0.8% │ 42.1mb  │
│ 2  │ sspp-worker │ fork │ 0 │ online │ 0.7% │ 41.8mb  │
└────┴─────────────┴──────┴───┴────────┴──────┴─────────┘
Now you have:
- 1 API instance
- 2 Worker instances (for parallel event processing)
- All managed by PM2
Testing Auto-Restart
Let's intentionally crash the API:
# Check the API is running
pm2 list
# Kill the underlying Node process with SIGKILL
# (pm2 pid prints the PID that PM2 manages)
kill -9 $(pm2 pid sspp-api)
Wait 1 second, then check:
pm2 status
The ↺ (restart count) increases! PM2 automatically restarted it.
Test the API:
curl http://localhost:3000/api/v1/health
Still alive. 🎉
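Manually waiting and re-running curl gets old. A small polling helper makes these kill-and-check experiments repeatable; `wait_for_url` is our own convenience function, not a PM2 feature:

```shell
#!/bin/sh
# Poll a URL until it answers, or give up after N one-second attempts.
# Useful right after killing a process to see how fast PM2 revives it.
wait_for_url() {
  url=$1
  tries=${2:-10}               # attempts before giving up (default 10)
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: treat HTTP errors as failures; -m 2: cap each attempt at 2s
    if curl -sf -m 2 "$url" > /dev/null 2>&1; then
      echo "up after $i retries"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "still down after $tries tries"
  return 1
}

# Usage:
# kill -9 $(pm2 pid sspp-api)
# wait_for_url http://localhost:3000/api/v1/health
```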
Startup Script: Survive Reboots
The app survives crashes now. But what about server reboots?
# Generate startup script
pm2 startup systemd
# Follow the command it prints (looks like):
# sudo env PATH=$PATH:/usr/bin pm2 startup systemd -u root --hp /root
Run that sudo command it generates.
Save the current PM2 process list:
pm2 save
This creates /root/.pm2/dump.pm2 with your process configuration.
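That dump file is now the single source of truth for your process list, so it's worth backing up before experimenting. A tiny helper sketch (the default path assumes PM2 runs as root; use `~/.pm2` for other users):

```shell
#!/bin/sh
# Keep a timestamped copy of PM2's saved process list.
backup_pm2_dump() {
  src=${1:-/root/.pm2/dump.pm2}
  [ -f "$src" ] || { echo "no dump at $src"; return 1; }
  cp "$src" "$src.$(date +%Y%m%d%H%M%S).bak"
  echo "backed up $src"
}

# If the in-memory process list ever gets wrecked:
# pm2 resurrect   # restores processes from dump.pm2
```

`pm2 resurrect` is the counterpart to `pm2 save`: it reloads whatever the dump file describes.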
Test it:
# Reboot the server
sudo reboot
Wait 30 seconds, SSH back in:
pm2 status
Your apps are running! Without you doing anything.
Managing Your Apps
View Logs
# All logs (combined)
pm2 logs
# Specific app
pm2 logs sspp-api
# Last 100 lines
pm2 logs sspp-api --lines 100
# Live tail
pm2 logs sspp-worker --lines 0
Monitor Resources
pm2 monit
This opens an interactive dashboard showing:
- CPU usage
- Memory usage
- Logs (live stream)
Press Ctrl+C to exit.
Restart/Reload
# Restart (kills and starts)
pm2 restart sspp-api
# Reload (zero-downtime, only works for cluster mode)
pm2 reload sspp-api
# Restart all
pm2 restart all
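"Zero-downtime" is a claim you can verify. Run a request loop in a second terminal while `pm2 reload` executes and count failures; `count_failures` is our own sketch (the endpoint and port are the ones assumed throughout this series):

```shell
#!/bin/sh
# Fire n requests at a URL and report how many failed.
# During a cluster-mode reload the failure count should stay at 0;
# during a plain restart you'll see a burst of failures.
count_failures() {
  url=$1
  n=$2
  failures=0
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -sf -m 2 "$url" > /dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures/$n requests failed"
}

# Usage (while `pm2 reload sspp-api` runs in another terminal):
# count_failures http://localhost:3000/api/v1/health 50
```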
Stop/Delete
# Stop (keeps in PM2 list)
pm2 stop sspp-api
# Delete (removes from PM2 list)
pm2 delete sspp-api
# Stop all
pm2 stop all
# Delete all
pm2 delete all
Cluster Mode (Bonus: Load Balancing)
PM2 can run multiple instances of your app and load-balance between them:
// In ecosystem.config.js
{
  name: 'sspp-api',
  script: './dist/main.js', // Direct script, not npm
  instances: 4,             // Or 'max' for CPU count
  exec_mode: 'cluster',     // Enable cluster mode
  // ... rest of config
}
Restart PM2:
pm2 delete all
pm2 start ecosystem.config.js
Now you have 4 API instances behind PM2's built-in load balancer.
Why this matters:
- Utilizes all CPU cores
- Automatic load distribution
- Zero-downtime reloads (one instance at a time)
What We Solved
With PM2, we fixed:
✅ Automatic restart on crash - App crashes are now recoverable
✅ Startup on boot - Server reboots don't kill your service
✅ Log management - Centralized, timestamped logs
✅ Resource monitoring - Know when memory leaks happen
✅ Process naming - No more searching for PIDs
✅ Multi-instance management - Run workers in parallel
What We Didn't Solve
PM2 is great, but it doesn't solve:
❌ "Works on my machine" - Still manual dependency installation
❌ Environment consistency - Different Node versions, OS differences
❌ Multi-server scaling - PM2 is single-server only
❌ Deployment strategy - Still manual git pull, restart
❌ Rollback capability - No version management
❌ Network complexity - How do API and Worker discover services?
❌ Resource isolation - Apps can steal CPU/memory from each other
PM2 is a massive improvement over raw processes. But we're still managing dependencies manually, and we can't easily scale to multiple servers.
Real-World PM2 Tips
1. Always Use Ecosystem Files
Don't run pm2 start with inline arguments. Use ecosystem.config.js:
# ❌ Don't do this
pm2 start npm --name api -- start
# ✅ Do this
pm2 start ecosystem.config.js
2. Set Memory Limits
Prevent runaway processes:
{
  max_memory_restart: '500M', // Restart if memory exceeds 500MB
}
3. Use Absolute Paths
Relative paths break when PM2 restarts:
{
  cwd: '/opt/sspp/services/api', // Absolute path
  script: 'npm',                 // Not '../../../node_modules/...'
}
4. Separate Logs
Don't dump everything to one file:
{
  error_file: '/var/log/sspp/api-error.log',
  out_file: '/var/log/sspp/api-out.log',
}
5. Use Log Rotation
Logs grow forever. Set up rotation:
pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 10M
pm2 set pm2-logrotate:retain 7
Production Checklist
Before going live with PM2:
- [ ] Ecosystem file configured
- [ ] Startup script installed (pm2 startup)
- [ ] Process list saved (pm2 save)
- [ ] Memory limits set
- [ ] Log rotation enabled
- [ ] Monitoring alerts configured (e.g., PM2 Plus)
What's Next?
PM2 solves the "keep it running" problem beautifully. But we're still stuck with:
- Manual dependency management (Node, PostgreSQL, Redis, Elasticsearch)
- "Works on my machine" syndrome (different environments)
- Single-server limitations (can't easily scale horizontally)
In Part 3, we'll tackle these by introducing Docker—containers that package your entire application environment.
What PM2 Does NOT Fix
PM2 solves process management, and its wins are real: auto-restarts on crash, surviving SSH disconnects, starting on server boot, basic logging and monitoring. But let's be equally honest about what's still broken:
- Environment consistency - Still manually installing Node, PostgreSQL, Redis
- Infrastructure drift - Every server is a unique snowflake
- Scaling - Can't easily add more servers
- Dependency conflicts - Node v16 on this server, v18 on that one
- Reproducibility - "Works on my machine" still exists (just less obviously)
- Onboarding - New devs still need 2+ hours of setup
- Rollbacks - No easy way to undo deployments
The hidden danger:
PM2 makes things feel professional, which can hide deeper problems.
Try It Yourself
Experience what breaks next:
- Set up PM2 for both API and Worker services
- Enable the startup script: pm2 startup && pm2 save
- Now try to set up a second server identically
- Notice how many steps you have to remember
- Notice how easy it is to have version mismatches
This pain is important. It's why Docker exists.
Next: Making Environments Consistent
In Part 3, we'll solve the "works on my machine" problem:
"How do I package my app so it runs the same everywhere?"
We'll use Docker to:
- Freeze dependencies in time
- Eliminate environment drift
- Make onboarding instant
- Enable reliable rollbacks
But spoiler: Docker solves packaging, not operations. We'll discover what breaks when you have multiple containers.
Previous: Part 1: The Default Way - Putting an App on a Server
Next: Part 3: Docker - Freezing the Application in Time
About the Author
Building this series to demonstrate real DevOps thinking for my Proton.ai application. If you're hiring for platform engineering roles, let's connect.
- GitHub: @daviesbrown
- LinkedIn: David Nwosu