Your Node.js Server is Using Just One CPU. Here's How to Fix It.

#javascript #node #backend #webdev

CLUSTERING

You've built your Node application and deployed it on an 8 vCPU instance. Everything works fine, but without realizing it you aren't using the full potential of that machine. Node.js runs your JavaScript on a SINGLE THREAD, which means a single Node process can keep only one vCPU busy at a time. You are paying for 8 vCPUs, so the other 7 sit mostly idle.

The solution is CLUSTERING: running multiple instances of the application, where each instance is an independent process but all of them serve requests on the same port. The obvious question is how this works. Won't the instances step on each other? The short answer is no.

HOW IT WORKS

When clustering is done, we end up with multiple processes. There are two kinds:

Primary – There is only one primary. It is responsible for spinning up the worker processes, managing them, and respawning any worker that dies (respawning is not automatic; it happens because we add an exit handler for it). The primary is there only to manage: it doesn't run the application code (connecting to the DB, starting the server, etc.). Note – If the primary goes down, the workers go down with it and the entire cluster is gone.
Workers – These are the actual instances where the application runs — they serve the users.

Key Facts

Since there are 8 vCPUs in our case, there will be 9 total processes — 1 primary + 8 workers.
Each worker has its own memory – nothing is shared among workers.
Workers share a single port – connections are distributed across them.
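The primary is intentionally dumb – it never runs application code or connects to the DB.
Workers can't see their siblings – each worker behaves as if it were the only instance, and the only built-in channel it has is message passing with the primary.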

CODE SNIPPET

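import cluster from "node:cluster";
import os from "node:os";
import { createServer } from "node:http";
import app from "./src/app";
import { connectDB } from "./src/config/database";

const PORT = process.env.PORT || 3000;
// Cluster only in production; a single process is easier to debug in development.
const enableCluster = process.env.NODE_ENV === "production";

if (enableCluster && cluster.isPrimary) {
  // Primary: fork one worker per logical CPU, then only supervise.
  const numWorkers = os.cpus().length;
  for (let i = 0; i < numWorkers; i++) cluster.fork();

  // Replace any worker that exits so the pool stays at full size.
  cluster.on("exit", (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died (${signal || code}), respawning`);
    cluster.fork();
  });
} else {
  // Worker (or single-process mode): connect to the DB, then start listening.
  const httpServer = createServer(app);
  connectDB().then(() => httpServer.listen(PORT));
}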

EXPLANATION OF CODE
In production, clustering is often handled by a process manager such as PM2, but here we do it with Node's built-in modules. For that, we first import the cluster and os modules.
The enableCluster flag lets us switch clustering on or off per environment; when it is off, the process skips forking and runs the server directly. Then we check whether the current process is the primary. If it is, we fork one worker per available core. The number isn't hard-coded and you can change it, but going above the number of cores/vCPUs generally gains nothing, because the extra workers just compete for the same CPUs. If it isn't the primary (meaning we are already inside a worker), we run the actual backend code: connecting to the DB and starting the server. So now we have 8 worker instances up and running (plus the primary watching over them).
process.pid gives the operating-system process ID, which is unique to each worker.
Note – this ID, and everything else that happens inside an instance, stays inside that instance. Other instances can't access its memory or data.
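
The rule of thumb about not forking more workers than there are cores can be made explicit with a small helper. This is a minimal sketch and not part of the snippet above: the name resolveWorkerCount and the idea of accepting an optional override are assumptions made for illustration.

```javascript
// Hypothetical helper: decide how many workers to fork.
// `requested` is an optional override (for example a value read from an env var);
// `cpuCount` is what the machine reports, e.g. os.cpus().length.
function resolveWorkerCount(requested, cpuCount) {
  const cores = Math.max(1, Math.floor(cpuCount) || 1);
  const wanted = Number.parseInt(requested, 10);
  // No usable override: default to one worker per core.
  if (!Number.isInteger(wanted) || wanted < 1) return cores;
  // Never fork more workers than there are cores.
  return Math.min(wanted, cores);
}

console.log(resolveWorkerCount(undefined, 8)); // prints 8
console.log(resolveWorkerCount("4", 8)); // prints 4
console.log(resolveWorkerCount("16", 8)); // prints 8
```

In the primary branch this could replace the direct os.cpus().length call, for example resolveWorkerCount(process.env.WEB_CONCURRENCY, os.cpus().length), where WEB_CONCURRENCY is just an example variable name.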

PROS/CONS

Pros:

Uses all CPU cores
Crash isolation
Built into Node
Higher throughput for CPU-bound work
Dead workers can be respawned automatically (via the exit handler)

Cons:

Each worker has its own RAM (no shared state)
In-memory caches/sessions break silently
WebSockets/SSE need extra infrastructure
Harder to debug – 'which worker logged that?'
Primary crash = whole cluster dies
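
The "which worker logged that?" problem is usually softened by stamping every log line with the worker's identity. Here is a minimal sketch of that idea; createWorkerLogger is a made-up name, and the worker id and pid are passed in as plain values so the example runs on its own.

```javascript
// Hypothetical helper: build a logger that prefixes every line with the
// worker's id and process id. `write` defaults to console.log.
function createWorkerLogger(workerId, pid, write = console.log) {
  const prefix = `[worker ${workerId} pid ${pid}]`;
  return (message) => write(`${prefix} ${message}`);
}

// Capture output in an array here so the result is easy to inspect.
const lines = [];
const log = createWorkerLogger(3, 4242, (line) => lines.push(line));
log("GET /health 200");
console.log(lines[0]); // prints "[worker 3 pid 4242] GET /health 200"
```

Inside a real worker the values would come from cluster.worker.id and process.pid.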

Note – load balancing is round-robin by default on every platform except Windows; on Windows, the OS decides which worker accepts each connection.
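
Round-robin simply means handing each new connection to the next worker in turn. The sketch below models that idea in plain JavaScript. It is not Node's internal scheduler, and createRoundRobin and the worker labels are invented for illustration. In real code the policy is chosen through cluster.schedulingPolicy (cluster.SCHED_RR or cluster.SCHED_NONE) or the NODE_CLUSTER_SCHED_POLICY environment variable.

```javascript
// Toy model of round-robin distribution: each call returns the next worker,
// wrapping back to the first one after the last.
function createRoundRobin(workers) {
  if (workers.length === 0) throw new Error("need at least one worker");
  let next = 0;
  return function pick() {
    const worker = workers[next];
    next = (next + 1) % workers.length;
    return worker;
  };
}

const pick = createRoundRobin(["worker-1", "worker-2", "worker-3"]);
const assigned = [pick(), pick(), pick(), pick()];
console.log(assigned.join(", "));
// prints "worker-1, worker-2, worker-3, worker-1"
```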

BIG CAVEAT

This much is enough for simple clustering or for learning purposes, as long as the app is stateless (for example, REST APIs backed by a DB).

In this case, the DB is the source of truth. Workers don't need to know about each other. Any worker can serve any request.

STATEFUL CONNECTIONS (WEBSOCKET)
Prerequisite: basic knowledge of WebSockets.

Now things change. Once a connection is established and the HTTP request is upgraded to a WebSocket, the socket connection details (which user is on which socket) are stored in memory, inside that worker. So if User A connects through Worker 1 and User B connects through Worker 2, both are logged in and both users' data is stored in the DB. But the live sockets sit on different workers. Now when A sends a message to B, Worker 1 tries to push it to B's socket — but B's socket lives in Worker 2's memory, not Worker 1's. So the message gets saved to the DB, but real-time delivery to B fails.
Workers also have no direct channel to each other, so Worker 1 can't simply ask its siblings "do you have this user with you?"
The underlying TCP socket lives inside exactly one process.
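
The failure can be reproduced without any networking. The sketch below is a single-process simulation, not real cluster code: each FakeWorker object stands in for one worker process, and its sockets Map stands in for the live connections held in that process's memory. All names are invented for illustration.

```javascript
// Stand-in for one worker process and the sockets it holds in memory.
class FakeWorker {
  constructor(name) {
    this.name = name;
    this.sockets = new Map(); // userId -> messages delivered to that user
  }

  // A user opens a connection that lands on this worker.
  connect(userId) {
    this.sockets.set(userId, []);
  }

  // Real-time delivery works only if the recipient's socket is on this worker.
  sendTo(userId, text) {
    const inbox = this.sockets.get(userId);
    if (!inbox) return false; // socket lives in another worker's memory
    inbox.push(text);
    return true;
  }
}

const worker1 = new FakeWorker("worker-1");
const worker2 = new FakeWorker("worker-2");
worker1.connect("userA");
worker2.connect("userB");

// A's message is handled by worker 1, which has no socket for B.
const delivered = worker1.sendTo("userB", "hi B");
console.log(delivered); // prints false
```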

STICKY SESSIONS
Imagine a user lands on Worker A and creates a socket connection. Details of the session are stored in Worker A's memory. Because the cluster treats every incoming request independently, the user's next request may be routed to Worker B. Now the user tries to continue the conversation. Worker B checks whether this session exists, but it has no record of it (that detail lives in Worker A). So the interaction fails.

To make it easier to picture, here are two ways to think about it:

Analogy 1 (hotel front desk) — You check into Hotel A. The front desk writes your name against Room 204. Later, you walk into Hotel B and ask for your room key. Hotel B has no idea who you are, because your check-in details only exist at Hotel A's front desk.

Analogy 2 (locker at a station) — You drop your bag at locker #5 in Station A and get a ticket. Later, you go to Station B and try to use the same ticket. Station B has no locker matching that ticket, because the bag is sitting back in Station A.

To fix this, we need sticky sessions: a routing rule that pins all of one client's requests to the same worker, so the user always lands where their session lives.

One more thing worth knowing: by default, a Socket.IO client first connects over HTTP long-polling, which takes several HTTP requests, and only then upgrades to WebSocket. Without stickiness, those requests can scatter across different workers, and the connection never even establishes. So sticky sessions are needed not just after the user is connected, but during the initial connection itself.
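
One common way to get stickiness is to derive the worker from something stable about the client, such as its IP address. The sketch below shows only that routing decision. It is an illustration under assumptions (the name pickStickyWorker and the simple string hash are made up here), not how any particular library implements it, and cookie-based routing is an equally valid choice.

```javascript
// Hypothetical routing rule: hash a stable client key (for example the
// client's IP) and map it to a worker index. The same key always yields
// the same index as long as workerCount does not change.
function pickStickyWorker(clientKey, workerCount) {
  let hash = 0;
  for (let i = 0; i < clientKey.length; i++) {
    // Simple unsigned 32-bit rolling hash; fine for a demo, not for production.
    hash = (hash * 31 + clientKey.charCodeAt(i)) >>> 0;
  }
  return hash % workerCount;
}

const first = pickStickyWorker("203.0.113.7", 8);
const second = pickStickyWorker("203.0.113.7", 8);
console.log(first === second); // prints true
```

Note the trade-off: clients behind the same IP all land on the same worker, so the load may not be spread evenly.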

REDIS ADAPTER FOR SOCKET.IO
Even with stickiness, workers still can't reach each other's sockets. So User A on Worker 1 has no way to push a message to User B sitting on Worker 2. This is a major issue in applications using sockets or real-time communication. To solve this, we have adapters, and one of them is the Redis adapter for Socket.IO. It acts as a coordination layer built on Redis pub/sub. With this in place, when Worker 1 emits a message, the adapter publishes that emit to a shared bus (Redis). Every worker is subscribed to this bus, and the worker that actually owns B's socket picks it up and delivers the message locally. From the user's point of view, the application now behaves like one running on a single instance.
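
The flow can be modelled in a single process. In the sketch below, FakeBus is a stand-in for Redis pub/sub and BusWorker for a worker process. None of this is the real Socket.IO adapter API, and all names are hypothetical. It only shows the sequence: publish to the shared bus, every worker receives the event, and only the owner of the socket delivers it.

```javascript
// Stand-in for Redis pub/sub: every subscriber sees every published event.
class FakeBus {
  constructor() {
    this.subscribers = [];
  }
  subscribe(handler) {
    this.subscribers.push(handler);
  }
  publish(event) {
    for (const handler of this.subscribers) handler(event);
  }
}

// Stand-in for a worker process that is connected to the shared bus.
class BusWorker {
  constructor(name, bus) {
    this.name = name;
    this.bus = bus;
    this.sockets = new Map(); // userId -> messages delivered to that user
    bus.subscribe((event) => this.onBusEvent(event));
  }
  connect(userId) {
    this.sockets.set(userId, []);
  }
  // Instead of looking only in local memory, publish to the bus.
  sendTo(userId, text) {
    this.bus.publish({ to: userId, text });
  }
  // Every worker receives the event; only the socket's owner delivers it.
  onBusEvent({ to, text }) {
    const inbox = this.sockets.get(to);
    if (inbox) inbox.push(text);
  }
}

const bus = new FakeBus();
const w1 = new BusWorker("worker-1", bus);
const w2 = new BusWorker("worker-2", bus);
w1.connect("userA");
w2.connect("userB");

w1.sendTo("userB", "hi B");
console.log(w2.sockets.get("userB")); // prints [ 'hi B' ]
```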

STICKY + ADAPTER
The two solve different problems, and you actually need both together.

Sticky sessions make sure a user's requests always land on the same worker, so the connection (and the handshake) never breaks mid-way.
The Redis adapter makes sure that when a worker needs to push a message to a user sitting on a different worker, the message can still reach them through the shared pub/sub bus.

Sticky alone: your user stays connected, but messages between users on different workers still don't arrive. Adapter alone: workers can broadcast to each other, but the initial connection itself keeps breaking. Together: your clustered app behaves like a single instance from the user's perspective.

TL;DR
Node is single-threaded. Clustering spawns one worker per core. REST scales for free because the DB is shared. Sockets don't — connections live in one worker's RAM. Fix with sticky sessions (so handshakes complete) plus a pub/sub adapter (so workers can deliver each other's messages).

That sums up basic clustering in a Node.js application.
Thanks for reading.