Real-time systems are unforgiving. In a standard web application, a 500-millisecond delay is an inconvenience. In a real-time multiplayer gaming platform, 500 milliseconds is the difference between a fair outcome and a furious user demanding a refund.
Between October 2019 and December 2021, I served as Senior Software Engineer at Lordwin Group, where I led a team of five backend developers building dice.ng — a real-time gaming platform where thousands of users placed wagers, watched outcomes resolve live, and expected the entire experience to feel instantaneous. The platform also encompassed an investment management system and hotel booking service, but the gaming backend was the most technically demanding piece of infrastructure I have ever designed.
This article is a technical deep dive into how I architected that real-time gaming backend from the ground up — the WebSocket infrastructure, the event-driven architecture, the scaling strategy, and the hard lessons learned from operating a system where latency directly impacts revenue.
The Problem: Sub-100ms Latency for Thousands of Simultaneous Players
The requirements were aggressive from day one. Lordwin Group needed a gaming platform that could:
- Support 3,000+ concurrent WebSocket connections during peak hours
- Deliver game events (dice rolls, bet confirmations, payout calculations) with sub-100ms latency
- Maintain absolute consistency in game state — in a wagering platform, a race condition is not a bug, it is a financial liability
- Handle burst traffic patterns — user activity would spike dramatically around evening hours and weekends, sometimes tripling within minutes
- Achieve 99.9% uptime — every minute of downtime was measurable lost revenue
The initial prototype used HTTP polling. Clients hit an API endpoint every two seconds to check for game state updates. At 500 concurrent users, this generated over 15,000 HTTP requests per minute, the database was drowning, and the "real-time" experience felt sluggish and broken. It was immediately clear that polling could not scale. We needed a fundamentally different architecture.
My Role: Technical Lead and System Architect
As the senior engineer leading a team of five, my responsibilities went beyond writing code. I made the core architecture decisions, designed the WebSocket infrastructure, established the event-driven messaging patterns, defined the horizontal scaling strategy, and mentored junior engineers on real-time system design.
I was not just a contributor on this project; I owned the technical direction. Every critical design decision described in this article was one I either made directly or guided the team toward after evaluating alternatives.
Technical Deep Dive: Building the Real-Time Infrastructure
1. WebSocket Server Architecture with Node.js
I chose Node.js as the runtime for the WebSocket server. Its event-driven, non-blocking I/O model is purpose-built for maintaining thousands of persistent connections with minimal resource overhead. PHP (our backend for the investment and hotel systems) was poorly suited for long-lived connections, so I designed the gaming layer as a separate Node.js service communicating with the main platform through Redis message channels.
The core WebSocket server was built on the ws library rather than Socket.IO. While Socket.IO offers convenience features like automatic reconnection and room management, it adds protocol overhead and abstractions that reduce control. For a latency-sensitive gaming platform, I needed raw WebSocket performance with custom protocol handling.
```javascript
const WebSocket = require('ws');
const Redis = require('ioredis');
const { v4: uuidv4 } = require('uuid');

const wss = new WebSocket.Server({
  port: 8080,
  maxPayload: 1024 * 16,
  perMessageDeflate: false,
  clientTracking: true,
});

const connections = new Map();

wss.on('connection', (ws, req) => {
  const connectionId = uuidv4();
  const clientIp = req.headers['x-forwarded-for'] || req.socket.remoteAddress;

  connections.set(connectionId, {
    ws,
    clientIp,
    userId: null,
    rooms: new Set(),
    connectedAt: Date.now(),
    lastHeartbeat: Date.now(),
  });

  ws.on('message', (raw) => {
    try {
      const message = JSON.parse(raw);
      routeMessage(connectionId, message);
    } catch (err) {
      ws.send(JSON.stringify({ type: 'error', code: 'INVALID_PAYLOAD' }));
    }
  });

  ws.on('close', () => {
    cleanupConnection(connectionId);
  });

  ws.on('pong', () => {
    const conn = connections.get(connectionId);
    if (conn) conn.lastHeartbeat = Date.now();
  });

  ws.send(JSON.stringify({
    type: 'connected',
    connectionId,
    serverTime: Date.now(),
  }));
});
```
I disabled perMessageDeflate deliberately. Compression adds CPU overhead per message, and since our payloads were small (typically under 500 bytes for game events), the bandwidth savings were negligible compared to the latency cost. This single configuration change reduced median message delivery time by 8ms across our benchmark tests.
2. Connection Management and Authentication
In a gaming platform handling real money, every WebSocket connection must be authenticated. I implemented a token-based authentication flow where clients first obtain a short-lived JWT via the REST API, then present it during WebSocket handshake.
```javascript
const jwt = require('jsonwebtoken');

// Maps an authenticated userId to its connectionId for direct messaging.
const userConnectionIndex = new Map();

function sendError(ws, code) {
  ws.send(JSON.stringify({ type: 'error', code }));
}

function routeMessage(connectionId, message) {
  const conn = connections.get(connectionId);
  if (!conn) return;

  switch (message.type) {
    case 'authenticate':
      handleAuthentication(connectionId, message.token);
      break;
    case 'join_game':
      if (!conn.userId) return sendError(conn.ws, 'NOT_AUTHENTICATED');
      handleJoinGame(connectionId, message.gameId);
      break;
    case 'place_bet':
      if (!conn.userId) return sendError(conn.ws, 'NOT_AUTHENTICATED');
      handlePlaceBet(connectionId, message);
      break;
    case 'ping':
      conn.ws.send(JSON.stringify({ type: 'pong', serverTime: Date.now() }));
      break;
    default:
      sendError(conn.ws, 'UNKNOWN_MESSAGE_TYPE');
  }
}

function handleAuthentication(connectionId, token) {
  const conn = connections.get(connectionId);
  if (!conn) return;

  try {
    const payload = jwt.verify(token, process.env.JWT_SECRET, {
      algorithms: ['HS256'],
      maxAge: '5m',
    });
    conn.userId = payload.userId;
    userConnectionIndex.set(payload.userId, connectionId);
    conn.ws.send(JSON.stringify({ type: 'authenticated', userId: payload.userId }));
  } catch (err) {
    conn.ws.send(JSON.stringify({ type: 'auth_failed', reason: 'INVALID_TOKEN' }));
    conn.ws.close(4001, 'Authentication failed');
  }
}
```
The JWT had a deliberately short expiry of five minutes. Since it was only used for the WebSocket handshake, there was no reason for it to live longer. Once authenticated, the connection was maintained through heartbeat pings rather than token refresh cycles.
3. Room-Based Game State Management
Each active game session functioned as a "room" — a logical grouping of WebSocket connections that should receive the same events. I implemented a lightweight room system without relying on external libraries:
```javascript
const rooms = new Map();

function handleJoinGame(connectionId, gameId) {
  const conn = connections.get(connectionId);
  if (!conn) return;

  if (!rooms.has(gameId)) {
    rooms.set(gameId, new Set());
  }
  rooms.get(gameId).add(connectionId);
  conn.rooms.add(gameId);

  conn.ws.send(JSON.stringify({
    type: 'game_joined',
    gameId,
    players: rooms.get(gameId).size,
  }));

  broadcastToRoom(gameId, {
    type: 'player_joined',
    userId: conn.userId,
    playerCount: rooms.get(gameId).size,
  }, connectionId);
}

function broadcastToRoom(gameId, payload, excludeConnectionId = null) {
  const room = rooms.get(gameId);
  if (!room) return;

  const message = JSON.stringify(payload);
  for (const connId of room) {
    if (connId === excludeConnectionId) continue;
    const conn = connections.get(connId);
    if (conn && conn.ws.readyState === WebSocket.OPEN) {
      conn.ws.send(message);
    }
  }
}
```
One critical optimisation: I serialised the message payload once with JSON.stringify outside the loop, then sent the same string buffer to every client. For a room of 500 players, this avoided 499 redundant serialisation operations per broadcast — a saving that compounds rapidly under load.
4. Event-Driven Game Engine with Redis Pub/Sub
The game logic itself — dice rolling, bet resolution, payout calculation — ran in a separate process from the WebSocket server. This was a deliberate architectural decision. The WebSocket server's only responsibility was managing connections and delivering messages. Game logic, financial calculations, and database writes happened in dedicated worker processes.
Redis Pub/Sub was the communication backbone:
```javascript
const publisher = new Redis(process.env.REDIS_URL);
const subscriber = new Redis(process.env.REDIS_URL);

subscriber.subscribe('game:events', 'game:results', 'system:announcements');

subscriber.on('message', (channel, message) => {
  const event = JSON.parse(message);
  switch (channel) {
    case 'game:events':
      broadcastToRoom(event.gameId, {
        type: 'game_event',
        event: event.eventType,
        data: event.payload,
        timestamp: event.timestamp,
      });
      break;
    case 'game:results':
      handleGameResult(event);
      break;
    case 'system:announcements':
      broadcastToAll({
        type: 'system_announcement',
        message: event.message,
      });
      break;
  }
});

function broadcastToAll(payload) {
  const message = JSON.stringify(payload);
  for (const conn of connections.values()) {
    if (conn.ws.readyState === WebSocket.OPEN) {
      conn.ws.send(message);
    }
  }
}

async function handlePlaceBet(connectionId, message) {
  const conn = connections.get(connectionId);
  if (!conn || !conn.userId) return;

  const betEvent = {
    gameId: message.gameId,
    userId: conn.userId,
    amount: message.amount,
    selection: message.selection,
    timestamp: Date.now(),
    idempotencyKey: message.idempotencyKey,
  };

  await publisher.publish('bets:incoming', JSON.stringify(betEvent));

  conn.ws.send(JSON.stringify({
    type: 'bet_acknowledged',
    idempotencyKey: message.idempotencyKey,
    status: 'processing',
  }));
}
```
This decoupling had three critical benefits:
- Fault isolation: If the game engine crashed, WebSocket connections remained alive. Users saw a brief pause rather than a disconnection.
- Independent scaling: I could run multiple game engine workers to process bets in parallel without changing the WebSocket layer.
- Auditability: Every game event flowed through Redis, creating a natural event stream that I logged for regulatory compliance and dispute resolution.
5. Horizontal Scaling Across Multiple WebSocket Servers
A single Node.js process can handle approximately 10,000 concurrent WebSocket connections before memory and CPU become constraints. But scaling WebSockets horizontally introduces a problem that REST APIs do not have: connection affinity. If Player A is connected to Server 1 and Player B to Server 2, a room broadcast must reach both servers.
I solved this with Redis as the cross-server message bus:
```javascript
const serverId = process.env.SERVER_ID || uuidv4();

subscriber.subscribe('ws:broadcast');

subscriber.on('message', (channel, raw) => {
  if (channel !== 'ws:broadcast') return;
  const message = JSON.parse(raw);
  if (message.originServer === serverId) return; // already delivered locally

  if (message.targetRoom) {
    broadcastToRoom(message.targetRoom, message.payload);
  } else if (message.targetUser) {
    sendToUser(message.targetUser, message.payload);
  }
});

function sendToUser(userId, payload) {
  const connectionId = userConnectionIndex.get(userId);
  const conn = connectionId ? connections.get(connectionId) : null;
  if (conn && conn.ws.readyState === WebSocket.OPEN) {
    conn.ws.send(JSON.stringify(payload));
  }
}

function clusterBroadcastToRoom(gameId, payload) {
  broadcastToRoom(gameId, payload);
  publisher.publish('ws:broadcast', JSON.stringify({
    originServer: serverId,
    targetRoom: gameId,
    payload,
  }));
}
```
Each WebSocket server instance subscribed to a shared Redis channel. When a game event needed to reach all players in a room, the originating server broadcast to its local connections and simultaneously published to Redis, where other servers picked up the message and relayed it to their local connections.
The originServer check prevented message loops — without it, a server would re-broadcast messages it had already delivered locally.
6. Heartbeat Monitoring and Connection Cleanup
Stale connections are a silent performance killer in WebSocket systems. Users close browser tabs, lose internet connectivity, or let their phones go to sleep. Without proactive cleanup, the server accumulates zombie connections that consume memory and distort room player counts.
I implemented a heartbeat interval that pinged every client every 30 seconds:
```javascript
const HEARTBEAT_INTERVAL = 30000;
const HEARTBEAT_TIMEOUT = 45000;

setInterval(() => {
  const now = Date.now();
  for (const [connectionId, conn] of connections) {
    if (now - conn.lastHeartbeat > HEARTBEAT_TIMEOUT) {
      conn.ws.terminate();
      cleanupConnection(connectionId);
      continue;
    }
    if (conn.ws.readyState === WebSocket.OPEN) {
      conn.ws.ping();
    }
  }
}, HEARTBEAT_INTERVAL);

function cleanupConnection(connectionId) {
  const conn = connections.get(connectionId);
  if (!conn) return;

  for (const gameId of conn.rooms) {
    const room = rooms.get(gameId);
    if (room) {
      room.delete(connectionId);
      if (room.size === 0) {
        rooms.delete(gameId);
      } else {
        broadcastToRoom(gameId, {
          type: 'player_left',
          userId: conn.userId,
          playerCount: room.size,
        });
      }
    }
  }

  if (conn.userId) {
    userConnectionIndex.delete(conn.userId);
  }
  connections.delete(connectionId);
}
```
This heartbeat mechanism maintained accurate player counts and ensured we never wasted resources on dead connections — critical when operating at 3,000+ concurrent users where even small inefficiencies compound.
Ensuring Game Integrity: The Financial Safety Layer
In a platform where real money is at stake, game outcome integrity is non-negotiable. I designed several safeguards:
Idempotent Bet Processing: Every bet carried a client-generated idempotency key. If a network hiccup caused a duplicate submission, the game engine would recognise the duplicate key and return the original response rather than processing the bet twice.
Atomic Balance Operations: All balance changes were serialised through Redis-based locking (with WATCH/MULTI/EXEC for compound in-memory updates) and database-level row locking for persistence, ensuring no user could ever bet more than their balance, even under concurrent requests.
```javascript
// `redis` is a shared ioredis client; acquireLock/releaseLock wrap a
// Redis-based distributed lock.
async function processBalanceDeduction(userId, amount, idempotencyKey) {
  const lockKey = `lock:balance:${userId}`;
  const lock = await acquireLock(lockKey, 5000);
  try {
    // Replay protection: duplicate keys get the original result back.
    const processed = await redis.get(`idempotency:${idempotencyKey}`);
    if (processed) return JSON.parse(processed);

    const raw = await redis.get(`balance:${userId}`);
    const balance = raw === null ? 0 : parseFloat(raw);
    if (!Number.isFinite(balance) || balance < amount) {
      return { success: false, reason: 'INSUFFICIENT_BALANCE' };
    }

    const newBalance = balance - amount;
    await redis.set(`balance:${userId}`, newBalance.toString());

    const result = { success: true, newBalance, deducted: amount };
    await redis.setex(`idempotency:${idempotencyKey}`, 3600, JSON.stringify(result));
    return result;
  } finally {
    await releaseLock(lock);
  }
}
```
Provably Fair Outcomes: Game results were generated using a combination of server seed and client seed, hashed together. Users could verify after each round that the outcome was not manipulated. This was both a regulatory requirement and a trust-building feature.
Monitoring and Observability in Production
Operating a real-time gaming system requires comprehensive observability. I built custom monitoring dashboards tracking:
- Active connections per server instance — to trigger auto-scaling
- Message throughput — messages sent per second across all rooms
- Event latency — time from game engine event emission to client delivery
- Redis pub/sub lag — early warning for message bus saturation
- Game round completion times — anomaly detection for stuck game sessions
I instrumented the WebSocket server with Prometheus metrics:
```javascript
const client = require('prom-client');

const activeConnections = new client.Gauge({
  name: 'ws_active_connections',
  help: 'Number of active WebSocket connections',
});

const messageLatency = new client.Histogram({
  name: 'ws_message_latency_ms',
  help: 'Message delivery latency in milliseconds',
  buckets: [5, 10, 25, 50, 100, 250, 500],
});

const messagesPerSecond = new client.Counter({
  name: 'ws_messages_total',
  help: 'Total WebSocket messages sent',
  labelNames: ['type'],
});
```
These metrics fed into Grafana dashboards that gave me real-time visibility into system health. On several occasions, a spike in message latency gave us a 10-minute early warning before a Redis memory issue would have caused visible user impact.
The Results: Production Metrics
After three months of iterative development and load testing, my team and I shipped the gaming backend into production. The measurable outcomes met or exceeded our original targets:
| Metric | Target | Achieved |
|---|---|---|
| Concurrent WebSocket connections | 3,000 | 3,200+ sustained peak |
| Event delivery latency (p50) | <100ms | 32ms |
| Event delivery latency (p99) | <200ms | 87ms |
| System uptime (monthly) | 99.9% | 99.9% |
| Bet processing throughput | 500/min | 850+/min |
| Connection recovery after deploy | <5s | ~2.5s average |
The sub-50ms median latency was particularly significant. Users experienced game events as truly instantaneous — dice rolls resolved, payouts appeared, and leaderboards updated in what felt like real time. This directly correlated with user engagement metrics: average session duration increased by 35% compared to the earlier HTTP polling prototype.
Lessons Learned
Building this system taught me principles that I have carried into every project since:
Separate connection management from business logic. The WebSocket server should be a dumb pipe. All intelligence belongs in backend workers communicating through message queues.
Design for graceful degradation. When the game engine was temporarily slow, the WebSocket layer continued serving heartbeats and acknowledged messages. Users experienced a delay, not a crash.
Serialise once, send many. In broadcast-heavy systems, the cost of JSON serialisation dwarfs the cost of network transmission. Serialise the payload once and reuse the buffer.
Monitor latency percentiles, not averages. A 30ms average means nothing if 5% of your users experience 500ms delays. The p99 is what your angriest users feel.
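The point is easy to demonstrate with a synthetic latency sample, using the nearest-rank percentile method:

```javascript
// Why averages mislead: a sample where the mean looks healthy while the
// tail is terrible. Percentile computed with the nearest-rank method.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latencies = [
  ...Array(95).fill(20),  // 95% of requests at 20ms
  ...Array(5).fill(500),  // 5% of requests at 500ms
];

const mean = latencies.reduce((a, b) => a + b, 0) / latencies.length;
console.log(mean);                      // 44 — looks acceptable
console.log(percentile(latencies, 50)); // 20
console.log(percentile(latencies, 99)); // 500 — what angry users feel
```

A 44ms average and a 500ms p99 describe the same system; only one of those numbers tells you that one user in twenty is having a terrible time.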
Test with realistic load patterns. Our load tests simulated bursty evening traffic with sudden spikes, not steady-state throughput. Production traffic is never smooth.
Final Thoughts
Architecting a real-time gaming backend was the most technically challenging and rewarding project of my career up to that point. It demanded a deep understanding of network protocols, distributed systems, concurrent programming, and financial transaction safety — all operating under latency constraints that left no room for architectural shortcuts.
The patterns I developed at Lordwin Group — event-driven architecture, Redis-backed horizontal scaling, idempotent financial operations — became foundational to how I approach every real-time system I have built since. At VacancySoft, I applied similar event-driven patterns to handle 50,000+ daily API requests. At 2am Tech, the idempotent transaction processing pattern directly informed the financial workflows I built for the Addio platform.
Real-time systems are hard. But the engineering discipline they demand makes you a better systems engineer in every other context. If you are building similar infrastructure, I hope this deep dive gives you a head start on the decisions and trade-offs that matter most.
Olamilekan Lamidi is a Senior Full-Stack Engineer with 9+ years of experience building scalable, high-performance web applications. He specialises in designing robust APIs, optimising systems for performance at scale, and leading engineering teams to deliver reliable production systems.
Tags: #WebSockets #NodeJS #RealTime #GameDev #SystemDesign #BackendEngineering #Redis #JavaScript #WebDevelopment #SoftwareArchitecture