Building a Production-Ready WebSocket Server with Node.js: Scaling to 100K Connections
WebSocket servers promise real-time, bidirectional communication. HTTP servers promise request-response simplicity. The gap between these two paradigms is where most production incidents happen. If you have ever watched a WebSocket deployment crumble under load while your HTTP services hummed along, you know exactly what I mean.
This guide walks through building a WebSocket server in Node.js that can handle 100,000 concurrent connections. Not in theory --- in practice, with code you can deploy today.
Why WebSocket Servers Are Different from HTTP
HTTP is stateless. A load balancer can route any request to any server, and the server does not need to remember who you are between requests. WebSocket connections are stateful. Once a client connects, it maintains a persistent TCP connection to a specific server process. That single difference changes everything about how you scale.
Here is what breaks when you treat WebSocket servers like HTTP servers:
Memory pressure. Each WebSocket connection holds state in memory. At 100K connections with modest per-connection overhead (say 10 KB for buffers and metadata), you are looking at roughly 1 GB of memory just for connection bookkeeping.
Load balancing. Round-robin load balancing works for HTTP. For WebSockets, you need sticky sessions or a way to route messages to the correct server holding a given connection.
Failure recovery. When an HTTP server crashes, the next request goes to another server. When a WebSocket server crashes, every connected client must reconnect, detect the failure, and re-subscribe to whatever data streams they were consuming.
Broadcast fan-out. Sending a message to all connected clients is trivial on a single server. Across a cluster, you need a message bus.
Understanding these differences is not academic. It dictates every architectural decision that follows.
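That memory claim is easy to sanity-check. A back-of-envelope calculation, taking the 10 KB per-connection figure from above as the assumption:

```javascript
// Back-of-envelope memory budget for connection bookkeeping
const connections = 100_000;
const perConnectionBytes = 10 * 1024; // ~10 KB buffers + metadata (assumed)
const totalBytes = connections * perConnectionBytes;
console.log((totalBytes / 1024 ** 3).toFixed(2) + ' GiB'); // ≈ 0.95 GiB
```

And that is just bookkeeping; actual heap usage per connection depends heavily on what state your application attaches to each socket.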
Building the Basic Server with the ws Library
The ws library is the fastest and most widely used WebSocket implementation for Node.js. Skip socket.io for this use case --- it adds abstraction overhead that matters at scale, and its automatic fallback to HTTP long-polling can mask real connection issues.
import { WebSocketServer } from 'ws';
import { createServer } from 'http';
const server = createServer((req, res) => {
// Health check endpoint for load balancers
if (req.url === '/health') {
res.writeHead(200);
res.end('ok');
return;
}
res.writeHead(404);
res.end();
});
const wss = new WebSocketServer({ server, perMessageDeflate: false });
// Connection tracking
const connections = new Map();
let connectionIdCounter = 0;
wss.on('connection', (ws, req) => {
const id = ++connectionIdCounter;
const ip = (req.headers['x-forwarded-for'] || '').split(',')[0].trim() || req.socket.remoteAddress; // first hop in the XFF chain
connections.set(id, {
ws,
ip,
connectedAt: Date.now(),
lastPong: Date.now(),
subscriptions: new Set(),
});
ws.on('message', (data) => {
handleMessage(id, data);
});
ws.on('close', () => {
connections.delete(id);
});
ws.on('error', (err) => {
console.error(`Connection ${id} error:`, err.message);
connections.delete(id);
});
});
function handleMessage(connectionId, raw) {
let msg;
try {
msg = JSON.parse(raw);
} catch {
return; // Drop malformed messages silently
}
const conn = connections.get(connectionId);
if (!conn) return;
switch (msg.type) {
case 'subscribe':
conn.subscriptions.add(msg.channel);
break;
case 'unsubscribe':
conn.subscriptions.delete(msg.channel);
break;
case 'ping':
conn.ws.send(JSON.stringify({ type: 'pong', ts: Date.now() }));
break;
}
}
server.listen(8080, () => {
console.log('WebSocket server running on :8080');
});
Note that perMessageDeflate is disabled. Per-message compression trades CPU for bandwidth. At 100K connections, the CPU cost is severe. Disable it unless your messages are large and your bandwidth is constrained.
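If you do fall into the large-message, constrained-bandwidth case, ws accepts an options object instead of a boolean here, which lets you restrict compression to messages above a size threshold. A sketch; the numbers are assumptions to tune for your workload:

```javascript
// Selective compression: options object passed to new WebSocketServer(...)
const wsOptions = {
  perMessageDeflate: {
    threshold: 1024,      // skip compression for messages under 1 KB
    concurrencyLimit: 10, // cap concurrent zlib operations
  },
};
```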
Connection Management: Heartbeats and Reconnection
Dead connections are the silent killer of WebSocket servers. A client loses network, the TCP connection enters a half-open state, and your server keeps allocating resources for a ghost. Without active detection, these pile up.
Server-Side Heartbeats
The WebSocket protocol includes native ping/pong frames. Use them.
const HEARTBEAT_INTERVAL = 30_000; // 30 seconds
const HEARTBEAT_TIMEOUT = 10_000; // 10 seconds to respond
function startHeartbeat() {
setInterval(() => {
const now = Date.now();
for (const [id, conn] of connections) {
// If we haven't received a pong since the last ping, terminate
if (now - conn.lastPong > HEARTBEAT_INTERVAL + HEARTBEAT_TIMEOUT) {
console.log(`Connection ${id} timed out, terminating`);
conn.ws.terminate(); // Hard close, no close frame
connections.delete(id);
continue;
}
// Send a protocol-level ping
if (conn.ws.readyState === 1) { // OPEN
conn.ws.ping();
}
}
}, HEARTBEAT_INTERVAL);
}
// Update lastPong when we receive a pong. Register this handler inside
// the existing connection listener, where `id` is already in scope:
wss.on('connection', (ws, req) => {
// ... existing setup from above ...
ws.on('pong', () => {
const conn = connections.get(id);
if (conn) conn.lastPong = Date.now();
});
});
startHeartbeat();
Client-Side Reconnection
The server cannot force a client to reconnect. Your client library must handle this. Here is a robust reconnection pattern:
class ResilientWebSocket {
constructor(url, options = {}) {
this.url = url;
this.maxRetries = options.maxRetries ?? Infinity;
this.baseDelay = options.baseDelay ?? 1000;
this.maxDelay = options.maxDelay ?? 30000;
this.retries = 0;
this.subscriptions = new Set();
this.connect();
}
connect() {
this.ws = new WebSocket(this.url);
this.ws.onopen = () => {
this.retries = 0;
// Re-subscribe to all channels after reconnect
for (const channel of this.subscriptions) {
this.ws.send(JSON.stringify({ type: 'subscribe', channel }));
}
};
this.ws.onclose = (event) => {
if (event.code === 1000) return; // Normal closure
this.scheduleReconnect();
};
this.ws.onerror = () => {
this.ws.close();
};
}
scheduleReconnect() {
if (this.retries >= this.maxRetries) return;
// Exponential backoff with jitter
const delay = Math.min(
this.baseDelay * Math.pow(2, this.retries) + Math.random() * 1000,
this.maxDelay
);
this.retries++;
setTimeout(() => this.connect(), delay);
}
subscribe(channel) {
this.subscriptions.add(channel);
if (this.ws.readyState === WebSocket.OPEN) {
this.ws.send(JSON.stringify({ type: 'subscribe', channel }));
}
}
}
Exponential backoff with jitter is non-negotiable. Without jitter, a server restart causes a thundering herd --- every client reconnects at the same intervals, creating synchronized spikes.
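A common alternative to the additive jitter in the class above is "full jitter", which randomizes over the entire backoff window rather than adding a fixed-range offset; under a mass reconnect it spreads clients even more evenly. A sketch using the same base and cap:

```javascript
// Full-jitter backoff: delay anywhere in [0, min(cap, base * 2^attempt))
function fullJitterDelay(attempt, base = 1000, cap = 30_000) {
  const ceiling = Math.min(cap, base * 2 ** attempt);
  return Math.random() * ceiling;
}
```

The trade-off: full jitter can reconnect some clients almost immediately, which is usually what you want after a brief blip, while still capping the worst-case wait.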
Scaling Strategies: Horizontal with Redis Pub/Sub
A single Node.js process can handle roughly 50,000 to 100,000 concurrent WebSocket connections depending on message throughput and payload size. To go beyond that, or to provide redundancy, you need horizontal scaling.
The Problem
Client A is connected to Server 1. Client B is connected to Server 2. Client A sends a message to a channel that Client B is subscribed to. Server 1 has no idea Client B exists.
The Solution: Redis Pub/Sub as a Message Bus
Redis pub/sub acts as the backbone connecting your server instances. Every server subscribes to the same Redis channels. When a message arrives on any server, it publishes to Redis, and Redis fans it out to all servers.
import { createClient } from 'redis';
const pub = createClient({ url: process.env.REDIS_URL });
const sub = pub.duplicate();
await pub.connect();
await sub.connect();
// When a server receives a message for a channel, publish to Redis
function broadcastToChannel(channel, message) {
pub.publish(`ws:${channel}`, JSON.stringify({
serverId: process.env.SERVER_ID,
message,
}));
}
// Subscribe to Redis and forward to local connections
await sub.pSubscribe('ws:*', (payload, redisChannel) => {
const channel = redisChannel.replace('ws:', '');
const { serverId, message } = JSON.parse(payload);
// Skip messages from this server (we already handled them locally)
if (serverId === process.env.SERVER_ID) return;
// Forward to all local connections subscribed to this channel
for (const [, conn] of connections) {
if (conn.subscriptions.has(channel) && conn.ws.readyState === 1) {
conn.ws.send(JSON.stringify({ type: 'message', channel, data: message }));
}
}
});
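One efficiency note on that local fan-out loop: scanning every connection for every broadcast is O(total connections). At 100K connections it pays to keep a reverse index from channel to subscriber ids, so fan-out cost tracks the subscriber count instead. A sketch, assuming the `connections` Map from earlier (re-declared here so the snippet stands alone):

```javascript
// Reverse index: channel name -> Set of subscribed connection ids
const connections = new Map(); // id -> { ws, subscriptions, ... } (from earlier)
const channelIndex = new Map();

function addSubscription(id, channel) {
  let members = channelIndex.get(channel);
  if (!members) channelIndex.set(channel, (members = new Set()));
  members.add(id);
}

function removeSubscription(id, channel) {
  const members = channelIndex.get(channel);
  if (!members) return;
  members.delete(id);
  if (members.size === 0) channelIndex.delete(channel); // don't leak empty sets
}

function fanOut(channel, payload) {
  const members = channelIndex.get(channel);
  if (!members) return;
  for (const id of members) {
    const conn = connections.get(id);
    if (conn && conn.ws.readyState === 1) conn.ws.send(payload);
  }
}
```

Call `addSubscription` and `removeSubscription` from the subscribe/unsubscribe cases in `handleMessage`, and clear a connection's channels from the index in its `close` handler.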
Sticky Sessions with Nginx
WebSocket connections start as an HTTP upgrade request. Your load balancer must route the upgrade to a server and then keep that TCP connection pinned to that server for the duration of the session.
upstream websocket_servers {
ip_hash; # Sticky sessions based on client IP
server ws1.internal:8080;
server ws2.internal:8080;
server ws3.internal:8080;
}
server {
listen 443 ssl;
location /ws {
proxy_pass http://websocket_servers;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# Timeout for idle connections (must exceed heartbeat interval)
proxy_read_timeout 60s;
proxy_send_timeout 60s;
}
}
Set proxy_read_timeout higher than your heartbeat interval. If your heartbeat fires every 30 seconds, a 60-second timeout gives you one missed heartbeat before Nginx closes the connection.
An important note on ip_hash: it works, but it is coarse. If a large portion of your traffic comes from behind a corporate NAT or a CDN, many users share the same IP, and one server gets disproportionate load. For finer-grained stickiness, consider hashing on a cookie or a query parameter that carries a session ID:
map $arg_session_id $ws_sticky {
default $arg_session_id;
"" $remote_addr;
}
upstream websocket_servers {
hash $ws_sticky consistent;
server ws1.internal:8080;
server ws2.internal:8080;
server ws3.internal:8080;
}
The consistent keyword enables consistent hashing, which minimizes connection redistribution when you add or remove a server from the pool. Without it, adding a fourth server would remap roughly 75% of connections. With consistent hashing, only about 25% get remapped.
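Those remap percentages are easy to demonstrate against modulo hashing, the naive scheme that consistent hashing replaces. A quick simulation over a uniform key space standing in for hashed session ids:

```javascript
// Growing a modulo-hashed pool from 3 to 4 servers: count remapped keys
const KEYS = 12_000; // multiple of 12 so the k%3 / k%4 pattern divides evenly
let remapped = 0;
for (let k = 0; k < KEYS; k++) {
  if (k % 3 !== k % 4) remapped++; // server assignment changed
}
console.log(`${(100 * remapped) / KEYS}% of keys remapped`); // 75% of keys remapped
```

Every remapped key is a client that gets bounced to a server with no memory of its connection state, which is why consistent hashing matters for stateful workloads.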
OS-Level Tuning
At 100K connections, you will hit operating system limits before you hit Node.js limits.
# /etc/sysctl.conf
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.ip_local_port_range = 1024 65535
net.core.netdev_max_backlog = 65535
fs.file-max = 1000000
# /etc/security/limits.conf
* soft nofile 1000000
* hard nofile 1000000
Also increase the file descriptor limit for your Node.js process via systemd or your container runtime.
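For a systemd-managed process, that per-process descriptor limit lives in the unit file rather than limits.conf. A sketch; the unit name here is a placeholder:

```ini
# /etc/systemd/system/ws-server.service (excerpt)
[Service]
LimitNOFILE=1000000
```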
Load Testing Your WebSocket Server
You cannot guess your way to 100K connections. You need to test it. The artillery tool supports WebSocket load testing and integrates well with CI pipelines.
# artillery-ws-test.yml
config:
target: "ws://localhost:8080"
phases:
- duration: 60
arrivalRate: 500 # 500 new connections per second
name: "Ramp up"
- duration: 300
arrivalRate: 1 # Effectively no new arrivals; think below holds connections open
name: "Sustained load"
ws:
rejectUnauthorized: false
scenarios:
- engine: ws
flow:
- send:
json:
type: "subscribe"
channel: "updates"
- think: 300 # Hold connection for 300 seconds
Run it:
npx artillery run artillery-ws-test.yml
For more control, write a custom load test with the ws library itself:
import WebSocket from 'ws';
const TARGET = 'ws://localhost:8080';
const TOTAL = 100_000;
const BATCH = 1000;
const DELAY_BETWEEN_BATCHES = 500; // ms
let connected = 0;
let failed = 0;
async function connectBatch(count) {
const promises = Array.from({ length: count }, () =>
new Promise((resolve) => {
const ws = new WebSocket(TARGET);
ws.on('open', () => { connected++; resolve(); });
ws.on('error', () => { failed++; resolve(); });
})
);
await Promise.all(promises);
}
async function run() {
for (let i = 0; i < TOTAL; i += BATCH) {
const batch = Math.min(BATCH, TOTAL - i);
await connectBatch(batch);
console.log(`Connected: ${connected}, Failed: ${failed}`);
await new Promise(r => setTimeout(r, DELAY_BETWEEN_BATCHES));
}
console.log(`Final: ${connected} connected, ${failed} failed`);
}
run();
Run this from a separate machine, not from localhost. Running 100K connections from the same machine as the server means you are competing for the same file descriptors, memory, and CPU. Use a dedicated load generation box --- or better, multiple boxes, since a single client machine will max out its ephemeral port range around 60K connections.
Key metrics to capture during load testing:
- Connection establishment time (p50, p95, p99)
- Memory usage per connection (measure RSS, not heap --- the OS overhead matters)
- Message latency under load (send timestamped messages, measure round-trip)
- Reconnection storm recovery time (kill a server, measure how long until all clients are back)
- GC pause duration under sustained connection count (use the `--trace-gc` flag)
A practical baseline to aim for: a single Node.js process on a 4-core, 8 GB machine should handle 50K idle connections or 20K connections with moderate message throughput (100 messages/second broadcast). If you are below these numbers, look at per-message processing cost, serialization overhead, or excessive logging first.
Monitoring and Observability
WebSocket servers are harder to monitor than HTTP servers. There are no access logs by default, no request/response pairs to trace, and connection problems are often silent.
Metrics to Export
Push these to Prometheus, Datadog, or whatever your stack uses:
import { collectDefaultMetrics, Gauge, Histogram, Counter, register } from 'prom-client';
collectDefaultMetrics();
const wsConnectionsGauge = new Gauge({
name: 'ws_connections_active',
help: 'Number of active WebSocket connections',
});
const wsMessagesCounter = new Counter({
name: 'ws_messages_total',
help: 'Total WebSocket messages processed',
labelNames: ['direction', 'type'],
});
const wsMessageLatency = new Histogram({
name: 'ws_message_duration_seconds',
help: 'Time to process a WebSocket message',
buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1],
});
// Update gauge periodically
setInterval(() => {
wsConnectionsGauge.set(connections.size);
}, 5000);
// Expose metrics over HTTP. Add this branch to the createServer request
// handler from earlier, next to the /health check; registering a second
// 'request' listener would collide with that handler's 404 fallback.
if (req.url === '/metrics') {
res.setHeader('Content-Type', register.contentType);
register.metrics().then((data) => res.end(data));
return;
}
Structured Logging
Log connection lifecycle events so you can debug connection issues in production:
function logEvent(event, connectionId, meta = {}) {
console.log(JSON.stringify({
ts: new Date().toISOString(),
event,
connectionId,
activeConnections: connections.size,
...meta,
}));
}
// Usage in connection handler
wss.on('connection', (ws, req) => {
logEvent('ws.connect', id, { ip, userAgent: req.headers['user-agent'] });
ws.on('close', (code, reason) => {
logEvent('ws.disconnect', id, { code, reason: reason.toString() });
});
ws.on('error', (err) => {
logEvent('ws.error', id, { error: err.message });
});
});
Alerting Rules
Set alerts for:
- Connection count dropping by more than 20% in 5 minutes (mass disconnect)
- Memory usage exceeding 80% of available RAM
- Message processing latency p99 above 100ms
- Redis pub/sub lag exceeding 1 second
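The first alert translates fairly directly into a Prometheus alerting rule, assuming the `ws_connections_active` gauge exported earlier; group and alert names here are placeholders:

```yaml
# Mass-disconnect alert: active connections down >20% vs. 5 minutes ago
groups:
  - name: websocket
    rules:
      - alert: WebSocketMassDisconnect
        expr: >
          (ws_connections_active offset 5m) - ws_connections_active
          > 0.2 * (ws_connections_active offset 5m)
        for: 1m
        labels:
          severity: page
```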
Production Checklist
Before you ship a WebSocket server to production, verify every item on this list.
Connection handling:
- [ ] Heartbeat ping/pong with configurable interval and timeout
- [ ] Dead connection cleanup runs on a timer
- [ ] Maximum connections per IP to prevent resource abuse
- [ ] Maximum message size enforced (`maxPayload` option in `ws`)
- [ ] Graceful shutdown drains connections before process exit
Scaling:
- [ ] Redis pub/sub (or equivalent) for cross-server messaging
- [ ] Sticky sessions configured on the load balancer
- [ ] OS file descriptor limits raised
- [ ] TCP backlog increased
- [ ] Node.js `--max-old-space-size` set appropriately
Security:
- [ ] Authentication on the upgrade request (verify JWT or session token in the `connection` event)
- [ ] Rate limiting on message frequency per connection
- [ ] Input validation on every incoming message
- [ ] TLS termination at the load balancer or on the server
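The rate-limiting item deserves a concrete shape. A minimal per-connection token bucket, a sketch with assumed limits; store one bucket per entry in the connections map and check it at the top of `handleMessage`:

```javascript
// Token bucket: refills RATE tokens/second up to BURST; each message costs 1
const RATE = 20;  // sustained messages per second (assumption)
const BURST = 40; // short-burst allowance (assumption)

function makeBucket(now = Date.now()) {
  return { tokens: BURST, last: now };
}

function allowMessage(bucket, now = Date.now()) {
  // Refill proportionally to elapsed time, capped at the bucket size
  const elapsed = (now - bucket.last) / 1000;
  bucket.tokens = Math.min(BURST, bucket.tokens + elapsed * RATE);
  bucket.last = now;
  if (bucket.tokens < 1) return false; // over the limit: drop or disconnect
  bucket.tokens -= 1;
  return true;
}
```

Whether a rejected message means dropping it or terminating the connection is a policy choice; terminating is safer against deliberate flooding.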
Observability:
- [ ] Active connection count exported as a metric
- [ ] Message throughput and latency tracked
- [ ] Connection lifecycle events logged with structured JSON
- [ ] Alerts configured for mass disconnects and memory pressure
Client resilience:
- [ ] Exponential backoff with jitter on reconnection
- [ ] Automatic re-subscription after reconnect
- [ ] Message queuing during disconnection (if required by your use case)
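For the message-queuing item, a small bounded outbound queue that buffers while disconnected and flushes on reconnect. A sketch: the cap of 1000 is an assumption, and dropping the oldest message is one policy among several:

```javascript
// Bounded outbound queue for use alongside a reconnecting client
class QueuedSender {
  constructor(maxQueued = 1000) {
    this.queue = [];
    this.maxQueued = maxQueued;
  }
  send(ws, payload) {
    if (ws && ws.readyState === 1 /* OPEN */) {
      ws.send(payload);
      return;
    }
    this.queue.push(payload);
    if (this.queue.length > this.maxQueued) this.queue.shift(); // drop oldest
  }
  flush(ws) { // call from the onopen handler, after re-subscribing
    while (this.queue.length > 0 && ws.readyState === 1) {
      ws.send(this.queue.shift());
    }
  }
}
```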
Graceful shutdown:
process.on('SIGTERM', () => {
console.log('SIGTERM received, draining connections...');
// Stop accepting new connections
server.close();
// Close all existing connections with a normal closure code
for (const [id, conn] of connections) {
conn.ws.close(1001, 'Server shutting down');
}
// Force exit after timeout
setTimeout(() => process.exit(0), 10_000);
});
Common Pitfalls
A few things that will bite you in production if you do not plan for them:
JSON.stringify in hot paths. Serializing a message once and sending the same buffer to 10,000 connections is dramatically faster than serializing it 10,000 times. Pre-serialize broadcast messages:
function broadcastToLocal(channel, data) {
const payload = JSON.stringify({ type: 'message', channel, data });
for (const [, conn] of connections) {
if (conn.subscriptions.has(channel) && conn.ws.readyState === 1) {
conn.ws.send(payload);
}
}
}
Backpressure. If a client's network is slow and your server keeps calling ws.send(), the messages queue in the kernel send buffer and eventually in Node.js memory. Check ws.bufferedAmount before sending, and drop messages or disconnect slow consumers:
if (conn.ws.bufferedAmount > 1024 * 1024) { // 1 MB backlog
logEvent('ws.slow_consumer', id, { buffered: conn.ws.bufferedAmount });
conn.ws.terminate();
connections.delete(id);
return;
}
Thundering herd on deploy. Rolling deploys kill old processes, forcing reconnections. Stagger your deploys and add random jitter (0 to 5 seconds) on the client before reconnecting to spread the load.
Wrapping Up
Scaling WebSocket servers to 100K connections is not about finding a magic configuration. It is about understanding the fundamental differences between persistent connections and request-response, then systematically addressing each challenge: memory management, cross-server messaging, dead connection detection, client resilience, and observability.
Start with a single server. Add heartbeats and connection tracking. Load test to find your ceiling. Then add Redis pub/sub and horizontal scaling when you hit it. Monitor everything, because WebSocket failures are silent by default.
The code in this article gives you a production-ready foundation. Adapt it to your use case, load test it in your environment, and deploy with confidence.