DEV Community

Seung Hyun Park

Building Reliable WebSocket Connections for Real-Time Restaurant Operations

When you're building real-time systems for industries like food service, reliability isn't optional. A dropped WebSocket connection during a dinner rush means lost orders, confused staff, and unhappy customers. Here's what I've learned building WebSocket infrastructure for restaurant operations platforms.

The Problem Space

Restaurant operations generate a constant stream of events: new phone calls, reservation updates, order modifications, table status changes, and kitchen notifications. HTTP polling falls apart at this scale — you need persistent, bidirectional connections that can handle bursts of activity during peak hours.

The challenge is that restaurants operate in environments hostile to stable connections. Staff move between WiFi dead zones, POS terminals run on aging hardware, and the kitchen's microwave occasionally knocks out the 2.4GHz band entirely.

Connection Architecture

The foundation is a reconnection strategy that handles the full spectrum of failure modes:

class ReliableSocket {
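  constructor(url, options = {}) {
    this.url = url;
    this.initialDelay = options.initialDelay || 1000;
    this.reconnectDelay = this.initialDelay;
    this.maxDelay = options.maxDelay || 30000;
    this.heartbeatInterval = options.heartbeat || 15000;
    this.messageQueue = [];
    this.connect();
  }

  connect() {
    this.ws = new WebSocket(this.url);
    this.ws.onopen = () => {
      // Connected: reset backoff to the configured starting delay
      this.reconnectDelay = this.initialDelay;
      this.startHeartbeat();
      this.flushQueue();
    };
    this.ws.onclose = (event) => {
      this.stopHeartbeat();
      // Reconnect on anything but a normal closure (code 1000), so that
      // graceful server restarts and heartbeat-triggered closes also recover.
      if (event.code !== 1000) {
        this.scheduleReconnect();
      }
    };
  }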

  scheduleReconnect() {
    const jitter = Math.random() * 0.3 * this.reconnectDelay;
    setTimeout(() => this.connect(), this.reconnectDelay + jitter);
    this.reconnectDelay = Math.min(
      this.reconnectDelay * 2, 
      this.maxDelay
    );
  }

  send(data) {
    if (this.ws.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify(data));
    } else {
      this.messageQueue.push(data);
    }
  }

  flushQueue() {
    while (this.messageQueue.length > 0) {
      const msg = this.messageQueue.shift();
      this.ws.send(JSON.stringify(msg));
    }
  }
}

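The key decisions here: exponential backoff with jitter prevents thundering herd problems when your server restarts, and the message queue buffers outbound messages while the socket is down so they are sent once it reopens. The queue is in memory and unbounded, so it does not survive a page reload, and it only covers client-to-server messages. Events the server emitted while the client was disconnected have to be recovered some other way.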
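To make the backoff behavior concrete, here is a small sketch that computes the delays the class above would use for successive attempts. The function name `backoffSchedule` and the injectable `random` parameter are my own additions for illustration; the constants mirror the defaults in `ReliableSocket`.

```javascript
// Compute the delay before each reconnect attempt, mirroring scheduleReconnect().
// `random` is injectable so the schedule can be checked deterministically.
function backoffSchedule(
  attempts,
  { initialDelay = 1000, maxDelay = 30000, random = Math.random } = {}
) {
  const delays = [];
  let base = initialDelay;
  for (let i = 0; i < attempts; i++) {
    const jitter = random() * 0.3 * base; // up to 30% extra, as in the class
    delays.push(Math.round(base + jitter));
    base = Math.min(base * 2, maxDelay);
  }
  return delays;
}

// With jitter disabled, the base delay doubles until it reaches the cap:
// 1000, 2000, 4000, 8000, 16000, 30000, 30000 (milliseconds)
const noJitter = backoffSchedule(7, { random: () => 0 });
console.log(noJitter.join(', '));
```

With real jitter, each client lands somewhere between the base delay and 1.3 times the base, which is what spreads reconnects out after a server restart.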

Heartbeat Strategy

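Silent connection drops are the hardest to detect. TCP keepalive is too slow for real-time applications: with common OS defaults, the first probe is not sent until the connection has been idle for 2 hours, and browser JavaScript cannot tune it. You can't wait that long to discover a connection is dead when orders are flowing in.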

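startHeartbeat() {
  // A pong cancels the pending timeout (assumes the server replies with { type: 'pong' }).
  this.ws.addEventListener('message', (event) => {
    try {
      if (JSON.parse(event.data).type === 'pong') clearTimeout(this.pongTimeout);
    } catch { /* not JSON; ignore */ }
  });
  this.pingTimer = setInterval(() => {
    if (this.ws.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify({ type: 'ping' }));
      this.pongTimeout = setTimeout(() => {
        // No pong within 5 seconds: treat the connection as dead
        this.ws.close();
      }, 5000);
    }
  }, this.heartbeatInterval);
}

stopHeartbeat() {
  clearInterval(this.pingTimer);
  clearTimeout(this.pongTimeout);
}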

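A 15-second heartbeat interval works well for restaurant environments. It is aggressive enough to detect failures quickly, but not so chatty that it wastes bandwidth on cellular connections. Combined with the 5-second pong timeout above, a dead connection is noticed within roughly 20 seconds in the worst case.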
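An application-level ping only works if the server answers it. Here is a minimal sketch of the server half, assuming the reply is a `{ type: 'pong' }` message. The function name `handleClientMessage` and its callback parameters are hypothetical, chosen so the logic can run without a real socket.

```javascript
// Hypothetical server-side message handler: answers heartbeat pings and
// passes everything else on to the application.
function handleClientMessage(raw, send, onAppMessage) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return; // ignore frames that are not valid JSON
  }
  if (msg && msg.type === 'ping') {
    send(JSON.stringify({ type: 'pong' }));
    return;
  }
  onAppMessage(msg);
}

// Usage with stand-in callbacks; on a real server `send` would wrap ws.send
const sentFrames = [];
const appMessages = [];
const send = (frame) => sentFrames.push(frame);
const onApp = (msg) => appMessages.push(msg);

handleClientMessage('{"type":"ping"}', send, onApp);
handleClientMessage('{"type":"order_updated","id":7}', send, onApp);
handleClientMessage('not json', send, onApp);
```

After these three calls, `sentFrames` holds a single pong and `appMessages` holds only the order update. Keeping pings out of the application handler also keeps them out of your event metrics.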

Handling Phone Call Events

For AI phone systems that handle restaurant calls, WebSocket connections carry particularly critical data. When a customer calls to make a reservation, the event flow looks like this:

  1. Call arrives at the voice agent
  2. WebSocket pushes a call_started event to the dashboard
  3. As the AI handles the call, real-time transcription streams to the operator
  4. When a reservation is confirmed, a reservation_created event updates the calendar view
  5. Call summary and recording become available via call_completed

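Each of these events must arrive in order and without duplication. Implementing idempotency keys on the server side prevents duplicate processing when clients reconnect and replay queued messages. Idempotency keys do not solve ordering on their own, so ordering needs a separate mechanism such as a sequence number on each event.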
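Here is one way the receiving side could enforce both properties. This is a sketch under an assumption the flow above does not state: that every event carries a `seq` number, starting at 1 and increasing by one per event within a stream. The class name `OrderedEventBuffer` is mine.

```javascript
// Delivers events to `deliver` in strictly increasing `seq` order.
// Duplicates (seq already delivered or already buffered) are dropped.
class OrderedEventBuffer {
  constructor(deliver, firstSeq = 1) {
    this.deliver = deliver;
    this.nextSeq = firstSeq;
    this.pending = new Map(); // seq -> event that arrived early
  }

  receive(event) {
    if (event.seq < this.nextSeq || this.pending.has(event.seq)) {
      return; // duplicate
    }
    this.pending.set(event.seq, event);
    // Release every event that is now contiguous with what was delivered
    while (this.pending.has(this.nextSeq)) {
      const next = this.pending.get(this.nextSeq);
      this.pending.delete(this.nextSeq);
      this.nextSeq += 1;
      this.deliver(next);
    }
  }
}

const seen = [];
const buffer = new OrderedEventBuffer((e) => seen.push(e.type));
buffer.receive({ seq: 2, type: 'reservation_created' }); // early: held back
buffer.receive({ seq: 1, type: 'call_started' });        // releases 1, then 2
buffer.receive({ seq: 2, type: 'reservation_created' }); // replay: dropped
buffer.receive({ seq: 3, type: 'call_completed' });
// seen is now ['call_started', 'reservation_created', 'call_completed']
```

One caveat: a sequence number that never arrives stalls everything behind it. A production version would need a timeout or a resync request to recover from that.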

Server-Side Considerations

On the server, the WebSocket handler needs to manage connection state per restaurant location:

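const { WebSocketServer, WebSocket } = require('ws');

const wss = new WebSocketServer({ port: 8080 }); // port is illustrative
const locationSockets = new Map(); // locationId -> Set of open sockets

wss.on('connection', (ws, req) => {
  // authenticate() is app-specific. It is assumed here to return the
  // caller's location ID, or a falsy value if the request is not authorized.
  const locationId = authenticate(req);
  if (!locationId) {
    ws.close(1008, 'Unauthorized');
    return;
  }

  if (!locationSockets.has(locationId)) {
    locationSockets.set(locationId, new Set());
  }
  locationSockets.get(locationId).add(ws);

  ws.on('close', () => {
    const sockets = locationSockets.get(locationId);
    sockets?.delete(ws);
    // Drop empty sets so the map does not grow without bound
    if (sockets?.size === 0) locationSockets.delete(locationId);
  });
});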

function broadcastToLocation(locationId, event) {
  const sockets = locationSockets.get(locationId);
  if (sockets) {
    const payload = JSON.stringify(event);
    for (const ws of sockets) {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(payload);
      }
    }
  }
}

This location-scoped broadcasting ensures that events from one restaurant's virtual receptionist don't leak to another location's dashboard.

Scaling Beyond a Single Server

Once you're handling more than a few hundred concurrent locations, single-server WebSocket management hits its limits. Redis Pub/Sub provides a straightforward fan-out mechanism:

const Redis = require('ioredis');
const sub = new Redis();
const pub = new Redis();

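sub.subscribe('restaurant-events').catch((err) => {
  console.error('Redis subscribe failed:', err);
});
sub.on('message', (channel, message) => {
  // Every server receives every event and forwards it only to the
  // sockets it holds locally for that location.
  const event = JSON.parse(message);
  broadcastToLocation(event.locationId, event);
});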

// From your API or event processor
function publishEvent(event) {
  pub.publish('restaurant-events', JSON.stringify(event));
}

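This pattern lets you run multiple WebSocket servers behind a load balancer, with every server seeing every event. Keep in mind that Redis Pub/Sub is fire-and-forget: a server that is disconnected from Redis when an event is published never receives it. If you need stronger delivery guarantees, pair it with a durable log such as Redis Streams or with a replay mechanism on reconnect.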
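Since Pub/Sub does not store messages, a client that reconnects has to get missed events from somewhere else. Here is a minimal sketch of one option: a bounded per-location history that can answer "everything after sequence N". The class name `RecentEventLog`, the `seq` field, and the capacity are all assumptions for illustration. This version is single-process; with several servers, the sequence numbers would have to be assigned at the publisher or kept in a shared store.

```javascript
// Keeps the most recent events per location so a reconnecting client can
// ask for everything after the last sequence number it saw.
class RecentEventLog {
  constructor(capacity = 1000) {
    this.capacity = capacity;
    this.byLocation = new Map(); // locationId -> { nextSeq, events[] }
  }

  append(locationId, event) {
    let log = this.byLocation.get(locationId);
    if (!log) {
      log = { nextSeq: 1, events: [] };
      this.byLocation.set(locationId, log);
    }
    const stamped = { ...event, locationId, seq: log.nextSeq++ };
    log.events.push(stamped);
    if (log.events.length > this.capacity) log.events.shift(); // drop oldest
    return stamped;
  }

  // Events with seq greater than lastSeenSeq, oldest first
  since(locationId, lastSeenSeq) {
    const log = this.byLocation.get(locationId);
    return log ? log.events.filter((e) => e.seq > lastSeenSeq) : [];
  }
}

const eventLog = new RecentEventLog(3);
['call_started', 'reservation_created', 'call_completed', 'table_updated']
  .forEach((type) => eventLog.append('loc-1', { type }));

// A client that last saw seq 2 asks for what it missed
const missed = eventLog.since('loc-1', 2).map((e) => `${e.seq}:${e.type}`);
// missed is ['3:call_completed', '4:table_updated']
```

If the client's last seen sequence number is older than the oldest retained event, the gap cannot be filled from this log, and the safest response is a full state refresh.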

Monitoring and Observability

In production, track these metrics for your WebSocket infrastructure:

  • Connection count per location — Sudden drops indicate network issues
  • Reconnection frequency — High reconnect rates suggest infrastructure problems
  • Message latency — Time from event generation to client delivery
  • Queue depth — Messages waiting during disconnections
  • Heartbeat failures — Leading indicator of connection health
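As a starting point, several of these can be tracked with a few counters. The sketch below is hypothetical: the class name `SocketMetrics` and its methods are mine, and a real deployment would export the values to whatever metrics system is already in use. It also assumes the server attaches a creation timestamp to each event.

```javascript
// Hypothetical in-process counters for a few of the metrics listed above.
class SocketMetrics {
  constructor() {
    this.reconnects = 0;
    this.heartbeatFailures = 0;
    this.latenciesMs = [];
  }

  recordReconnect() { this.reconnects += 1; }
  recordHeartbeatFailure() { this.heartbeatFailures += 1; }

  // createdAt: timestamp (ms) attached when the event was generated
  recordDelivery(createdAt, receivedAt = Date.now()) {
    this.latenciesMs.push(receivedAt - createdAt);
  }

  // Nearest-rank percentile of recorded latencies, or null if none
  latencyPercentile(p) {
    if (this.latenciesMs.length === 0) return null;
    const sorted = [...this.latenciesMs].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
    return sorted[rank - 1];
  }
}

const metrics = new SocketMetrics();
[10, 20, 30, 40, 100].forEach((ms) => metrics.recordDelivery(0, ms));
metrics.recordReconnect();
const p50 = metrics.latencyPercentile(50); // 30
const p95 = metrics.latencyPercentile(95); // 100
```

Latency measured this way compares a server clock with a client clock, so treat the absolute numbers with caution and watch the trend instead.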

Lessons Learned

After running this architecture in production across multiple restaurant locations:

  1. Always implement message ordering. Out-of-order events cause UI inconsistencies that confuse staff during busy periods.
  2. Test on real restaurant hardware. That tablet mounted in the kitchen handles connections differently than your development MacBook.
  3. Plan for cellular fallback. When the restaurant's WiFi goes down, the system should gracefully degrade to cellular without losing critical events.
  4. Log connection state transitions. When debugging production issues at 7 PM on a Friday, you'll want to know exactly when connections dropped and recovered.
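For the last point, a small helper is enough to get timestamped transitions into your logs. This is a sketch with illustrative names (`ConnectionStateLog`, the state strings, the `reason` field); the calls would be wired into the socket's open, close, and reconnect paths.

```javascript
// Records connection state transitions with timestamps.
// `now` is injectable so the log can be tested with a fixed clock.
class ConnectionStateLog {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.state = 'disconnected';
    this.entries = [];
  }

  transition(next, reason = '') {
    if (next === this.state) return; // ignore no-op transitions
    const entry = { at: this.now(), from: this.state, to: next, reason };
    this.entries.push(entry);
    this.state = next;
    console.log(JSON.stringify(entry));
  }
}

const stateLog = new ConnectionStateLog(() => 1700000000000);
stateLog.transition('connecting');
stateLog.transition('open');
stateLog.transition('open'); // no-op, not logged
stateLog.transition('disconnected', 'pong timeout');
```

Emitting each entry as a single JSON line makes it easy to filter by location or time window when you are reconstructing what happened during a dinner rush.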

The investment in connection reliability pays for itself quickly. When every minute of downtime potentially means missed calls and lost revenue, the engineering effort to build robust WebSocket infrastructure is well justified.
