Most tutorials stop at "here's how to set up a WebSocket server." Then you deploy to production, traffic grows, and you realize a single Node.js process can't hold 50,000 concurrent connections without melting. This article walks through the real engineering decisions behind scaling WebSocket connections — from a single server prototype to a distributed architecture that handles hundreds of thousands of simultaneous users.
*This is part of the **Production Backend Patterns** series, where we tackle infrastructure problems you'll actually face in production.*
## Starting Point: A Single-Server WebSocket Setup
Let's begin with the baseline. A straightforward WebSocket server using the `ws` library in Node.js:
```typescript
import { WebSocketServer, WebSocket } from "ws";
import { createServer } from "http";
import { randomUUID } from "node:crypto";

interface Client {
  id: string;
  ws: WebSocket;
  rooms: Set<string>;
  lastPing: number;
}

const server = createServer();
const wss = new WebSocketServer({ server });
const clients = new Map<string, Client>();

wss.on("connection", (ws, req) => {
  const clientId = randomUUID();
  const client: Client = {
    id: clientId,
    ws,
    rooms: new Set(),
    lastPing: Date.now(),
  };
  clients.set(clientId, client);

  ws.on("message", (data) => {
    // Guard the parse — an uncaught error on malformed JSON
    // would crash the whole process
    let message;
    try {
      message = JSON.parse(data.toString());
    } catch {
      return;
    }
    handleMessage(client, message);
  });

  ws.on("close", () => {
    clients.delete(clientId);
  });

  ws.send(JSON.stringify({ type: "connected", clientId }));
});

function broadcast(room: string, payload: object, excludeId?: string) {
  for (const client of clients.values()) {
    if (client.rooms.has(room) && client.id !== excludeId) {
      if (client.ws.readyState === WebSocket.OPEN) {
        client.ws.send(JSON.stringify(payload));
      }
    }
  }
}

server.listen(8080);
```
This works perfectly for a few hundred connections. You can broadcast to rooms, track clients, and move on with your life. Then reality sets in.
## Where Single-Server Breaks Down
A single Node.js process hits several walls as connections grow:
**Memory.** Each WebSocket connection consumes memory for the socket buffer, your application-level client state, and any queued messages. A typical connection uses 20-50 KB. At 10,000 connections, that's 200-500 MB just for connection overhead. At 50,000, you're pushing 1-2.5 GB before your application logic even runs.

**CPU.** Broadcasting a message to 10,000 clients means 10,000 `JSON.stringify` + `send` calls. If you're broadcasting frequently (think real-time dashboards or multiplayer games), a single event loop gets saturated. You'll see message delivery latency spike from milliseconds to seconds.

**Vertical scaling limits.** Even if you throw more RAM at the machine, a single Node.js process can realistically handle 50,000-100,000 concurrent WebSocket connections depending on message throughput. Past that, you need multiple servers.

**No fault tolerance.** One server means one point of failure. When it restarts for a deploy or crashes, every connected client disconnects simultaneously.

**Sticky sessions.** Once you add a second server behind a load balancer, you discover the fundamental problem: Client A connected to Server 1 sends a message to a room, but Client B in the same room is connected to Server 2. The broadcast function above only knows about local clients. Server 2 never sees the message.
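Before reaching for more servers, there's one easy win on the CPU front: the naive broadcast above re-serializes the payload for every recipient. Serializing once per broadcast is a pure refactor that cuts that cost roughly in proportion to room size. A minimal sketch — `Sendable` is a stand-in interface for the real client objects, introduced here only to keep the example self-contained:

```typescript
// Sketch: serialize once per broadcast instead of once per recipient.
// Sendable is a minimal stand-in for the ws client objects used above.
interface Sendable {
  id: string;
  rooms: Set<string>;
  send(data: string): void;
}

function broadcastOnce(
  clients: Iterable<Sendable>,
  room: string,
  payload: object,
  excludeId?: string
): number {
  const data = JSON.stringify(payload); // one stringify, not N
  let delivered = 0;
  for (const client of clients) {
    if (client.rooms.has(room) && client.id !== excludeId) {
      client.send(data);
      delivered++;
    }
  }
  return delivered;
}
```

This doesn't change the scaling ceiling, but it buys headroom before the architectural changes below become necessary.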
## Redis Pub/Sub for Multi-Server Synchronization
The most common first step toward distributed WebSockets is Redis pub/sub. The idea is simple: when a server needs to broadcast to a room, it publishes the message to a Redis channel. All servers subscribe to the same channels and relay messages to their local clients.
```typescript
import { WebSocketServer, WebSocket } from "ws";
import { createClient } from "redis";

const INSTANCE_ID = crypto.randomUUID(); // global in Node 19+

// Redis needs separate connections for pub and sub
const pubClient = createClient({ url: process.env.REDIS_URL });
const subClient = pubClient.duplicate();
await pubClient.connect();
await subClient.connect();

// Client is the interface from the single-server setup
const clients = new Map<string, Client>();
const roomSubscriptions = new Set<string>();

async function joinRoom(client: Client, room: string) {
  client.rooms.add(room);
  if (!roomSubscriptions.has(room)) {
    roomSubscriptions.add(room);
    await subClient.subscribe(`room:${room}`, (message) => {
      const parsed = JSON.parse(message);
      // Ignore messages from this server instance — we already
      // delivered them locally
      if (parsed._origin === INSTANCE_ID) return;
      deliverToLocalClients(room, parsed.payload, parsed.excludeId);
    });
  }
}

function deliverToLocalClients(
  room: string,
  payload: object,
  excludeId?: string
) {
  for (const client of clients.values()) {
    if (client.rooms.has(room) && client.id !== excludeId) {
      if (client.ws.readyState === WebSocket.OPEN) {
        client.ws.send(JSON.stringify(payload));
      }
    }
  }
}

async function broadcastToRoom(
  room: string,
  payload: object,
  excludeId?: string
) {
  // Deliver locally first for lowest latency
  deliverToLocalClients(room, payload, excludeId);
  // Then publish to Redis for other servers
  await pubClient.publish(
    `room:${room}`,
    JSON.stringify({
      payload,
      excludeId,
      _origin: INSTANCE_ID,
    })
  );
}
```
This gets you multi-server broadcasting with minimal added complexity. But Redis pub/sub has its own limitations:
- **Fire and forget.** If a server is temporarily disconnected from Redis, it misses messages. There's no replay.
- **No backpressure.** Redis pub/sub doesn't care if subscribers are slow. Messages pile up in the subscriber's buffer and can cause memory spikes.
- **Channel overhead.** If you have 100,000 unique rooms, that's 100,000 Redis subscriptions per server. Redis handles this, but it's not free.
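Because pub/sub is fire-and-forget, a common mitigation — a sketch of the general technique, not part of the setup above — is to attach a per-room sequence number to each published message. Subscribers remember the last sequence they saw and treat any jump as a gap, triggering a resync from a durable store instead of silently missing messages:

```typescript
// Sketch: detect dropped pub/sub messages with per-room sequence numbers.
// The publisher increments a counter per room; each subscriber tracks the
// last sequence seen and reports how many messages a jump skipped.
class GapDetector {
  private lastSeq = new Map<string, number>();

  // Returns the number of messages missed before this one (0 = none)
  record(room: string, seq: number): number {
    const last = this.lastSeq.get(room);
    this.lastSeq.set(room, seq);
    if (last === undefined) return 0; // first message for this room
    return Math.max(0, seq - last - 1);
  }
}
```

On a detected gap, the subscriber can re-fetch recent history from whatever durable layer you have — which is exactly the gap that a proper message broker closes.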
## Upgrading to a Message Broker for Reliability
When you need message durability, ordering guarantees, or consumer groups, step up from Redis pub/sub to a proper message broker. Redis Streams, NATS, or Kafka all work. Here's a pattern using Redis Streams that adds durability without abandoning your Redis infrastructure:
```typescript
import { createClient } from "redis";

interface StreamMessage {
  room: string;
  payload: string;
  origin: string;
  excludeId?: string;
  timestamp: string;
}

class MessageBroker {
  private redis: ReturnType<typeof createClient>;
  private consumerId: string;
  private consumerGroup = "ws-servers";
  private streamKey = "ws:messages";
  private running = false;

  constructor(redis: ReturnType<typeof createClient>, instanceId: string) {
    this.redis = redis;
    this.consumerId = instanceId;
  }

  async initialize() {
    // Create consumer group if it doesn't exist.
    // MKSTREAM creates the stream automatically.
    try {
      await this.redis.xGroupCreate(
        this.streamKey,
        this.consumerGroup,
        "$",
        { MKSTREAM: true }
      );
    } catch (err: any) {
      // BUSYGROUP means the group already exists — safe to ignore
      if (!err.message?.includes("BUSYGROUP")) throw err;
    }
  }

  async publish(message: StreamMessage) {
    await this.redis.xAdd(this.streamKey, "*", {
      data: JSON.stringify(message),
    });
  }

  async startConsuming(handler: (msg: StreamMessage) => void) {
    this.running = true;
    while (this.running) {
      try {
        const results = await this.redis.xReadGroup(
          this.consumerGroup,
          this.consumerId,
          [{ key: this.streamKey, id: ">" }],
          { COUNT: 100, BLOCK: 5000 }
        );
        if (!results) continue;
        for (const stream of results) {
          for (const entry of stream.messages) {
            const message: StreamMessage = JSON.parse(
              entry.message.data as string
            );
            // Skip messages from this instance
            if (message.origin !== this.consumerId) {
              handler(message);
            }
            // Acknowledge processing
            await this.redis.xAck(
              this.streamKey,
              this.consumerGroup,
              entry.id
            );
          }
        }
      } catch (err) {
        console.error("Stream read error:", err);
        await new Promise((r) => setTimeout(r, 1000));
      }
    }
  }

  stop() {
    this.running = false;
  }
}
```
The key advantage: if a server crashes and restarts, it can claim pending messages from the stream that were never acknowledged. No lost messages during deploys.
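A sketch of that recovery step. To keep it self-contained, `RedisLike` below is a minimal interface mirroring the two node-redis v4 calls involved — `xAutoClaim`, which claims entries pending longer than an idle threshold, and `xAck` — so the logic can be exercised without a server; treat the exact option shapes as assumptions to verify against your client version:

```typescript
// Sketch: on startup, claim and replay messages a crashed instance
// never acknowledged. RedisLike mirrors the node-redis v4 calls used.
interface PendingEntry {
  id: string;
  message: { data: string };
}
interface RedisLike {
  xAutoClaim(
    key: string,
    group: string,
    consumer: string,
    minIdleMs: number,
    start: string
  ): Promise<{ nextId: string; messages: PendingEntry[] }>;
  xAck(key: string, group: string, id: string): Promise<number>;
}

async function recoverPending(
  redis: RedisLike,
  streamKey: string,
  group: string,
  consumer: string,
  handler: (msg: unknown) => void
): Promise<number> {
  let cursor = "0-0";
  let recovered = 0;
  for (;;) {
    // Claim entries pending for over 60s (their owner is presumed dead)
    const { nextId, messages } = await redis.xAutoClaim(
      streamKey, group, consumer, 60_000, cursor
    );
    for (const entry of messages) {
      handler(JSON.parse(entry.message.data));
      await redis.xAck(streamKey, group, entry.id);
      recovered++;
    }
    if (nextId === "0-0" || messages.length === 0) break; // full scan done
    cursor = nextId;
  }
  return recovered;
}
```

Running `recoverPending` once during server startup, before `startConsuming`, drains whatever a dead sibling left behind.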
## Connection State Management
With multiple servers, you need a shared answer to the question: "Which server is client X connected to?" This matters for targeted messages (sending to a specific user, not a room) and for connection metadata.
Store connection state in Redis with TTLs:
```typescript
import { createClient } from "redis";

class ConnectionRegistry {
  private redis: ReturnType<typeof createClient>;
  private instanceId: string;
  private ttlSeconds = 60;

  constructor(redis: ReturnType<typeof createClient>, instanceId: string) {
    this.redis = redis;
    this.instanceId = instanceId;
  }

  async register(clientId: string, metadata: Record<string, string>) {
    const key = `conn:${clientId}`;
    await this.redis
      .multi()
      .hSet(key, {
        serverId: this.instanceId,
        connectedAt: Date.now().toString(),
        ...metadata,
      })
      .expire(key, this.ttlSeconds)
      .sAdd(`server:${this.instanceId}:clients`, clientId)
      .exec();
  }

  async refreshTTL(clientId: string) {
    await this.redis.expire(`conn:${clientId}`, this.ttlSeconds);
  }

  async unregister(clientId: string) {
    await this.redis
      .multi()
      .del(`conn:${clientId}`)
      .sRem(`server:${this.instanceId}:clients`, clientId)
      .exec();
  }

  async findClient(clientId: string): Promise<string | null> {
    // hGet resolves to undefined for a missing key — normalize to null
    const serverId = await this.redis.hGet(`conn:${clientId}`, "serverId");
    return serverId ?? null;
  }

  async getServerClientCount(serverId: string): Promise<number> {
    return this.redis.sCard(`server:${serverId}:clients`);
  }

  async sendToClient(clientId: string, payload: object) {
    const serverId = await this.findClient(clientId);
    if (!serverId) return false;
    // Publish to a server-specific channel
    await this.redis.publish(
      `direct:${serverId}`,
      JSON.stringify({ targetClientId: clientId, payload })
    );
    return true;
  }
}
```
The TTL is critical. If a server crashes without running cleanup logic, stale connection records expire automatically. The heartbeat loop (next section) refreshes TTLs for active connections.
## Heartbeats and Reconnection
WebSocket connections die silently. The TCP socket can be half-open — the server thinks the client is connected, but the client's network dropped minutes ago. Without active heartbeats, you accumulate ghost connections that waste memory and pollute room membership counts.
### Server-Side Heartbeat
```typescript
import { WebSocket } from "ws";

// Client and ConnectionRegistry are defined in earlier sections
class HeartbeatManager {
  private interval: NodeJS.Timeout | null = null;
  private clients: Map<string, Client>;
  private registry: ConnectionRegistry;

  constructor(clients: Map<string, Client>, registry: ConnectionRegistry) {
    this.clients = clients;
    this.registry = registry;
  }

  start(intervalMs = 30_000) {
    this.interval = setInterval(() => this.check(intervalMs), intervalMs);
  }

  private async check(timeout: number) {
    const now = Date.now();
    for (const [id, client] of this.clients) {
      if (now - client.lastPing > timeout * 2) {
        // Client missed two heartbeat windows — terminate
        console.log(`Terminating stale connection: ${id}`);
        client.ws.terminate();
        this.clients.delete(id);
        await this.registry.unregister(id);
        continue;
      }
      // Send ping frame (ws library handles pong automatically)
      if (client.ws.readyState === WebSocket.OPEN) {
        client.ws.ping();
        await this.registry.refreshTTL(id);
      }
    }
  }

  stop() {
    if (this.interval) clearInterval(this.interval);
  }
}

// On connection setup:
ws.on("pong", () => {
  client.lastPing = Date.now();
});
```
### Client-Side Reconnection
The server side is only half the story. Clients must reconnect gracefully. Here's a robust reconnection strategy:
```typescript
// Client-side (browser) code — WebSocket here is the DOM API
class ReconnectingWebSocket {
  private url: string;
  private ws: WebSocket | null = null;
  private reconnectAttempts = 0;
  private maxReconnectAttempts = 20;
  private messageQueue: string[] = [];
  private clientId: string | null = null;

  constructor(url: string) {
    this.url = url;
    this.connect();
  }

  private connect() {
    // Include clientId for session resumption if we have one
    const connectUrl = this.clientId
      ? `${this.url}?resumeId=${this.clientId}`
      : this.url;
    this.ws = new WebSocket(connectUrl);

    this.ws.onopen = () => {
      this.reconnectAttempts = 0;
      this.flushQueue();
    };

    this.ws.onmessage = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type === "connected") {
        this.clientId = msg.clientId;
      }
      // ... handle other messages
    };

    this.ws.onclose = (event) => {
      // 4000-4999 are application-level codes meaning "don't reconnect"
      if (event.code >= 4000) return;
      this.scheduleReconnect();
    };
  }

  private scheduleReconnect() {
    if (this.reconnectAttempts >= this.maxReconnectAttempts) {
      console.error("Max reconnection attempts reached");
      return;
    }
    // Exponential backoff with jitter
    const baseDelay = Math.min(1000 * 2 ** this.reconnectAttempts, 30_000);
    const jitter = baseDelay * 0.3 * Math.random();
    const delay = baseDelay + jitter;
    this.reconnectAttempts++;
    setTimeout(() => this.connect(), delay);
  }

  send(data: string) {
    if (this.ws?.readyState === WebSocket.OPEN) {
      this.ws.send(data);
    } else {
      // Buffer messages while disconnected
      this.messageQueue.push(data);
    }
  }

  private flushQueue() {
    while (this.messageQueue.length > 0) {
      const msg = this.messageQueue.shift()!;
      this.ws?.send(msg);
    }
  }
}
```
Key details: exponential backoff with jitter prevents a thundering herd when a server restarts and 10,000 clients all try to reconnect at the same instant. The `resumeId` parameter lets the server restore the client's room memberships and replay missed messages.
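The backoff formula is worth sanity-checking in isolation. Here it is as a pure function (mirroring `scheduleReconnect` above); injecting the random source is an addition for testability, not part of the original class:

```typescript
// Sketch: the reconnect delay schedule as a pure function.
// Base delay doubles per attempt, capped at 30s, plus up to 30% jitter.
function reconnectDelay(attempt: number, rand: () => number = Math.random): number {
  const baseDelay = Math.min(1000 * 2 ** attempt, 30_000);
  const jitter = baseDelay * 0.3 * rand();
  return baseDelay + jitter;
}
```

Attempts 0 through 4 give base delays of 1s, 2s, 4s, 8s, and 16s; from attempt 5 on, the base is capped at 30s before jitter is added, so a long outage never produces multi-minute waits.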
## Load Balancing Strategies for WebSockets
HTTP load balancing and WebSocket load balancing are fundamentally different problems. HTTP requests are stateless and short-lived. WebSocket connections are stateful and long-lived. You can't just round-robin.
### Layer 4 (TCP) Load Balancing
The simplest approach. The load balancer (e.g., AWS NLB, HAProxy in TCP mode) distributes connections at the TCP level. The initial HTTP upgrade handshake and all subsequent WebSocket frames flow through the same backend.
```nginx
# Nginx stream (L4) configuration
stream {
    upstream websocket_backends {
        least_conn;  # Route to server with fewest connections
        server ws-server-1:8080;
        server ws-server-2:8080;
        server ws-server-3:8080;
    }

    server {
        listen 443;
        proxy_pass websocket_backends;
        proxy_timeout 1h;
        proxy_connect_timeout 10s;
    }
}
```
**Pros:** Simple, no sticky session configuration needed, works with any WebSocket implementation.

**Cons:** No HTTP-level routing, no path-based routing, TLS termination happens at the backend.
### Layer 7 (HTTP) Load Balancing
More flexible. The load balancer understands the HTTP upgrade request, so you can route based on path, headers, or cookies.
```nginx
# Nginx HTTP (L7) configuration
http {
    upstream websocket_backends {
        # ip_hash ensures the same client IP always hits the same backend.
        # This matters if your app uses HTTP polling fallback.
        ip_hash;
        server ws-server-1:8080;
        server ws-server-2:8080;
        server ws-server-3:8080;
    }

    server {
        listen 443 ssl;

        location /ws {
            proxy_pass http://websocket_backends;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            # Critical: increase timeouts for long-lived connections
            proxy_read_timeout 3600s;
            proxy_send_timeout 3600s;
        }
    }
}
```
### Consistent Hashing for Room Affinity
If most of your traffic is room-based broadcasting, you can route clients joining the same room to the same server. This reduces cross-server pub/sub traffic dramatically:
```typescript
import { createHash } from "crypto";

class ConsistentHashRouter {
  private ring: Map<number, string> = new Map();
  private sortedKeys: number[] = [];
  private virtualNodes = 150;

  addServer(serverId: string) {
    for (let i = 0; i < this.virtualNodes; i++) {
      const hash = this.hash(`${serverId}:${i}`);
      this.ring.set(hash, serverId);
      this.sortedKeys.push(hash);
    }
    this.sortedKeys.sort((a, b) => a - b);
  }

  removeServer(serverId: string) {
    for (let i = 0; i < this.virtualNodes; i++) {
      const hash = this.hash(`${serverId}:${i}`);
      this.ring.delete(hash);
    }
    this.sortedKeys = this.sortedKeys.filter((k) => this.ring.has(k));
  }

  getServer(roomId: string): string {
    if (this.sortedKeys.length === 0) throw new Error("No servers available");
    const hash = this.hash(roomId);
    // Find the first server clockwise from this hash
    const idx = this.sortedKeys.findIndex((k) => k >= hash);
    const key = this.sortedKeys[idx >= 0 ? idx : 0];
    return this.ring.get(key)!;
  }

  private hash(input: string): number {
    const h = createHash("md5").update(input).digest();
    return h.readUInt32BE(0);
  }
}
```
In practice, you'd use this at the load balancer level (e.g., Envoy with consistent hashing) or in a connection router that sits between the client and the WebSocket servers.
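A quick way to sanity-check the ring's main property: when a server leaves, only the rooms that lived on it should move. The self-contained sketch below re-declares a minimal ring (same md5 scheme as above, fewer frills — the `ws-1`/`room-N` names are illustrative) and compares mappings before and after dropping a node:

```typescript
import { createHash } from "crypto";

// Minimal hash ring, same md5 scheme as the router above, used here
// only to demonstrate remap stability when a node leaves.
class MiniRing {
  private ring = new Map<number, string>();
  private keys: number[] = [];

  constructor(servers: string[], virtualNodes = 150) {
    for (const s of servers) {
      for (let i = 0; i < virtualNodes; i++) {
        const h = this.hash(`${s}:${i}`);
        this.ring.set(h, s);
        this.keys.push(h);
      }
    }
    this.keys.sort((a, b) => a - b);
  }

  get(roomId: string): string {
    const h = this.hash(roomId);
    const idx = this.keys.findIndex((k) => k >= h);
    return this.ring.get(this.keys[idx >= 0 ? idx : 0])!;
  }

  private hash(input: string): number {
    return createHash("md5").update(input).digest().readUInt32BE(0);
  }
}

// Compare mappings before and after dropping a server
const before = new MiniRing(["ws-1", "ws-2", "ws-3"]);
const after = new MiniRing(["ws-1", "ws-2"]);
const rooms = Array.from({ length: 300 }, (_, i) => `room-${i}`);
const moved = rooms.filter((r) => before.get(r) !== after.get(r)).length;
// Only rooms that lived on ws-3 move — roughly a third, never all
console.log(`moved ${moved} of ${rooms.length} rooms`);
```

With naive modulo hashing (`hash % serverCount`), removing a server would remap nearly every room; the ring keeps the churn proportional to the departing node's share.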
## Monitoring What Matters
You can't scale what you can't measure. Here are the metrics that actually predict WebSocket infrastructure problems before they become outages:
```typescript
import express from "express";
import {
  collectDefaultMetrics,
  Counter,
  Gauge,
  Histogram,
  register,
} from "prom-client";

collectDefaultMetrics();

const wsConnections = new Gauge({
  name: "ws_connections_active",
  help: "Number of active WebSocket connections",
  labelNames: ["server_id"],
});

const wsMessageLatency = new Histogram({
  name: "ws_message_delivery_seconds",
  help: "Time from publish to delivery for broadcast messages",
  buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1],
  labelNames: ["type"],
});

const wsMessagesTotal = new Counter({
  name: "ws_messages_total",
  help: "Total WebSocket messages processed",
  labelNames: ["direction", "type"],
});

const wsConnectionDuration = new Histogram({
  name: "ws_connection_duration_seconds",
  help: "Duration of WebSocket connections",
  buckets: [1, 10, 60, 300, 900, 1800, 3600, 7200],
});

const wsReconnections = new Counter({
  name: "ws_reconnections_total",
  help: "Total reconnection attempts",
  labelNames: ["server_id"],
});

// Instrument connection lifecycle
wss.on("connection", (ws) => {
  const connectedAt = Date.now();
  wsConnections.inc({ server_id: INSTANCE_ID });

  ws.on("message", () => {
    wsMessagesTotal.inc({ direction: "inbound", type: "message" });
  });

  ws.on("close", () => {
    wsConnections.dec({ server_id: INSTANCE_ID });
    wsConnectionDuration.observe((Date.now() - connectedAt) / 1000);
  });
});

// Instrument broadcast delivery
async function instrumentedBroadcast(room: string, payload: object) {
  const start = performance.now();
  await broadcastToRoom(room, payload);
  wsMessageLatency.observe(
    { type: "broadcast" },
    (performance.now() - start) / 1000
  );
  wsMessagesTotal.inc({ direction: "outbound", type: "broadcast" });
}

// Expose for Prometheus scraping
const metricsApp = express();
metricsApp.get("/metrics", async (_, res) => {
  res.set("Content-Type", register.contentType);
  res.end(await register.metrics());
});
metricsApp.listen(9090);
```
The metrics that save you at 3 AM:
| Metric | Alert Threshold | Why It Matters |
|---|---|---|
| `ws_connections_active` | >80% of tested capacity | Capacity planning — you need time to scale out |
| `ws_message_delivery_seconds` p99 | >100ms | Users perceive latency above 100ms; investigate Redis or broker lag |
| `ws_connection_duration_seconds` median | Sudden drop | Mass disconnects signal a network or deploy issue |
| `ws_reconnections_total` rate | >10x normal | Flapping connections — check load balancer health checks |
| Node.js event loop lag | >100ms | Event loop saturation means you need more instances |
Set up Grafana dashboards with these metrics. The connection count and message latency graphs will be the first place you look during every incident.
## Putting It All Together
Here's the architecture at a glance:
```
Clients
  │
  ├── DNS (Route 53 / Cloudflare)
  │
  ├── Load Balancer (L4/L7, least-connections or consistent hash)
  │
  ├── WS Server 1 ──┐
  ├── WS Server 2 ──┼── Redis Streams (message broker)
  ├── WS Server N ──┘
  │                  Redis (connection registry + pub/sub)
  │
  └── Prometheus ── Grafana (monitoring)
```
Deploy checklist for production WebSocket infrastructure:
- **Start with Redis pub/sub.** It handles most use cases up to ~100K concurrent connections across a handful of servers.
- **Add a connection registry.** You need this the moment you want targeted messaging or accurate online presence.
- **Implement heartbeats on both sides.** Server-side ping/pong with 30-second intervals. Client-side reconnection with exponential backoff and jitter.
- **Use `least_conn` load balancing.** Round-robin creates uneven distribution because WebSocket connections are long-lived. Least-connections naturally balances.
- **Move to Redis Streams or NATS** when you need message durability or replay (chat history, missed notifications during reconnection).
- **Monitor connection counts, message latency, and event loop lag.** Alert early. Scale horizontally when you hit 70% of tested capacity on any server.
- **Test with realistic load.** Use tools like `artillery` or `k6` with the WebSocket protocol to simulate thousands of concurrent connections with realistic message patterns.
The jump from single-server to distributed WebSockets is one of the more complex infrastructure transitions in backend engineering. But the pattern is well-established: externalize state, broker messages, and instrument everything. Each piece solves a specific failure mode, and you can adopt them incrementally as your scale demands.