DEV Community

Empellio.com
Empellio.com

Posted on • Originally published at oxmgr.empellio.com

Node.js Clustering — How to Use All CPU Cores in Production

Node.js runs on a single thread. That's not a bug — it's a deliberate design decision that makes async I/O simple. But it means that by default, a Node.js process uses exactly one CPU core, no matter how many your server has.

On a 4-core VPS, three quarters of your compute is sitting idle.

Clustering fixes this.

What Clustering Means

Clustering in Node.js means running multiple instances of your application — one per CPU core — all sharing the same port. Incoming connections are distributed across instances by the OS.

                      ┌──────────────────────────┐
                      │        Port 3000         │
                      └────────────┬─────────────┘
                                   │
              ┌────────────────────┼────────────────────┐
              │                    │                    │
         ┌────▼─────┐        ┌─────▼─────┐        ┌─────▼─────┐
         │ Worker 0 │        │ Worker 1  │        │ Worker 2  │
         │ (PID 101)│        │ (PID 102) │        │ (PID 103) │
         └──────────┘        └───────────┘        └───────────┘
              │                    │                    │
         CPU Core 0           CPU Core 1            CPU Core 2
Enter fullscreen mode Exit fullscreen mode

Each worker is an independent Node.js process with its own event loop, memory heap, and V8 instance. They share no state in-memory — only the listening port.

Option 1: The Cluster Module (Built-in)

Node.js ships with a cluster module that handles forking and connection distribution:

import cluster from 'node:cluster';
import { availableParallelism } from 'node:os';
import { createServer } from 'node:http';

if (cluster.isPrimary) {
  const numCPUs = availableParallelism();
  console.log(`Primary ${process.pid} running, forking ${numCPUs} workers`);

  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    console.log(`Worker ${worker.process.pid} died (${signal || code}), restarting...`);
    cluster.fork(); // auto-restart crashed workers
  });

} else {
  // Each worker runs this
  const server = createServer((req, res) => {
    res.writeHead(200);
    res.end(`Handled by worker ${process.pid}`);
  });

  server.listen(3000, () => {
    console.log(`Worker ${process.pid} started`);
  });
}
Enter fullscreen mode Exit fullscreen mode

With Express:

import cluster from 'node:cluster';
import { availableParallelism } from 'node:os';
import express from 'express';

if (cluster.isPrimary) {
  const numCPUs = availableParallelism();

  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker) => {
    console.log(`Worker ${worker.process.pid} died, restarting`);
    cluster.fork();
  });

} else {
  const app = express();

  app.get('/', (req, res) => {
    res.json({ pid: process.pid, message: 'Hello from worker' });
  });

  app.listen(3000);
}
Enter fullscreen mode Exit fullscreen mode

Pros of the cluster module:

  • No external dependencies
  • Full control over fork behavior
  • Can pass messages between primary and workers via IPC

Cons:

  • Boilerplate in every project
  • You manage restarts yourself
  • No visibility into worker health from outside the process

Option 2: Let the Process Manager Handle It

The cleaner approach: keep your app code simple and let the process manager handle clustering.

With Oxmgr, set instances in oxfile.toml:

[processes.api]
command = "node dist/server.js"
instances = 4          # explicit count
restart_on_exit = true

# Or use CPU count automatically:
# instances = "max"
Enter fullscreen mode Exit fullscreen mode

Your application code has no clustering logic — it's just a regular Express/Fastify/Hapi app listening on a port. Oxmgr forks it instances times and distributes connections.

With PM2:

pm2 start app.js -i max     # max = number of CPU cores
pm2 start app.js -i 4       # explicit
Enter fullscreen mode Exit fullscreen mode

Or in ecosystem.config.js:

module.exports = {
  apps: [{
    name: 'api',
    script: 'dist/server.js',
    instances: 'max',
    exec_mode: 'cluster'
  }]
}
Enter fullscreen mode Exit fullscreen mode

Why this approach is better:

  • Clean separation of concerns — app doesn't know about clustering
  • Process manager handles crashes across all instances
  • Zero-downtime rolling restarts work across all instances
  • Health checks monitor each instance independently

Option 3: Worker Threads

Don't confuse clustering with Worker Threads. They're different tools for different problems:

Cluster Worker Threads
Purpose Scale I/O-bound workloads across cores Run CPU-intensive tasks off the main thread
Memory Separate heap per process Shared memory available (SharedArrayBuffer)
Communication IPC (slower) postMessage or shared buffers (faster)
Failure isolation One crash = one process One crash = entire process
Use case Web servers, API handlers Image processing, crypto, data parsing

Use clusters for serving HTTP. Use Worker Threads for CPU-heavy operations within a single process.

import { Worker, isMainThread, parentPort } from 'node:worker_threads';

if (isMainThread) {
  // Main thread: handle HTTP, delegate heavy work
  const worker = new Worker('./heavy-computation.js');
  worker.postMessage({ data: bigArray });
  worker.on('message', (result) => {
    console.log('Computation result:', result);
  });
} else {
  // Worker thread: do the heavy lifting
  parentPort.on('message', ({ data }) => {
    const result = doExpensiveComputation(data);
    parentPort.postMessage(result);
  });
}
Enter fullscreen mode Exit fullscreen mode

How Many Instances Should You Run?

The common advice is "one per CPU core." This is right for CPU-bound workloads. For I/O-bound Node.js apps (which is most of them), the relationship is more nuanced.

CPU-bound apps (heavy computation, data processing):

  • One instance per physical core
  • More instances than cores = context switching overhead

I/O-bound apps (database queries, HTTP calls, file reads):

  • Start with one per core
  • Benchmark under load — you might get better results with fewer instances sharing more memory
  • The bottleneck is usually the database, not the CPU

Practical starting point:

[processes.api]
# 2-core VPS
instances = 2

# 4-core server
instances = 4

# Leave one core for the OS and other processes
# instances = 3  (on a 4-core machine under heavy load)
Enter fullscreen mode Exit fullscreen mode

Monitor CPU and memory under load. If workers are CPU-saturated (consistently >80%), adding instances helps. If they're mostly idle waiting for DB queries, more instances just means more memory usage.

Sticky Sessions and Shared State

Clustering breaks any in-memory session state. If user A hits Worker 0 on request 1 and Worker 1 on request 2, Worker 1 doesn't know about Worker 0's session data.

Solutions:

1. Use a shared session store (recommended):

import session from 'express-session';
import RedisStore from 'connect-redis';
import { createClient } from 'redis';

const redisClient = createClient({ url: process.env.REDIS_URL });

app.use(session({
  store: new RedisStore({ client: redisClient }),
  secret: process.env.SESSION_SECRET,
  resave: false,
  saveUninitialized: false
}));
Enter fullscreen mode Exit fullscreen mode

2. Use sticky sessions (not recommended for new projects):

Sticky sessions route each client to the same worker consistently via a cookie or IP hash. This defeats the purpose of load balancing and creates uneven distribution.

3. Design stateless services:

The best approach for clustered apps: keep no state in memory. Use Redis, a database, or JWT tokens instead. Your app becomes trivially scalable — from 1 to 100 instances with no code changes.

Graceful Reloads with Clustering

When you update your app, you want to restart workers without dropping connections. This is the rolling restart pattern:

Worker 0: SIGTERM → drain connections → exit
                         ↓
                    New Worker 0 starts → health check passes
Worker 1: SIGTERM → drain connections → exit
                         ↓
                    New Worker 1 starts → health check passes
Enter fullscreen mode Exit fullscreen mode

With Oxmgr:

oxmgr reload api
Enter fullscreen mode Exit fullscreen mode

This performs a rolling restart across all instances. If a new instance fails its health check, the reload stops and old instances keep running — automatic rollback.

Quick Performance Test

Before and after enabling clustering:

# Install autocannon
npm install -g autocannon

# Benchmark single process (instances = 1)
autocannon -c 100 -d 10 http://localhost:3000

# Enable clustering (instances = 4), restart, benchmark again
autocannon -c 100 -d 10 http://localhost:3000
Enter fullscreen mode Exit fullscreen mode

On a CPU-bound workload, you should see throughput multiply roughly linearly with core count. On an I/O-bound workload with a fast database, the gains are less dramatic but still meaningful.

Summary

  • Node.js is single-threaded by default — clustering uses all CPU cores
  • Simple apps: use instances in your process manager config
  • Complex apps: consider the built-in cluster module for fine-grained control
  • CPU-heavy tasks: use Worker Threads within a process, not more cluster instances
  • Always use shared state storage (Redis) — in-memory state doesn't survive across workers
  • Process managers handle the hard parts: restarts, health checks, rolling reloads

See the Oxmgr docs for cluster configuration, or the deployment guide for the full production setup walkthrough.

Top comments (0)