Chris

I tried to run Socket.IO on Cloud Run. It kinda worked… until it really didn’t. Here’s how I fixed it.

TL;DR: I started with Socket.IO inside a single Cloud Run service, naively assuming I could keep long-lived sessions alive on a stateless platform. When emits got flaky, I split into two containers (backend + realtime) as a quick fix, but it was still unreliable during deploys/auto-scale, and min-instances would have raised baseline cost. The final setup that works: CRUD on Cloud Run, sockets on one small GCE VM, with Google Pub/Sub as the glue. Cheap. Stable. And my first “distributed system” that actually behaves.


Why Cloud Run + sockets bit me

Cloud Run is awesome for stateless HTTP. WebSockets technically work, but:

  • Multiple revisions/instances: Socket.IO keeps room membership in memory, per process. When Cloud Run deploys or scales, sockets can connect to Revision A while your emit happens in Revision B → dropped emits.
  • Rollouts: During deploys, two revisions can be live. Your Pub/Sub subscriber may run in one, while connected sockets live in the other.
  • Scale-to-zero/cold starts: There are windows where the subscriber isn’t ready or clients haven’t rejoined yet.

I also saw cost drift when keeping a realtime-ish instance “warm”.


The original plan (and my wrong assumptions)

  • I put Socket.IO and CRUD API in one Cloud Run service.
  • Assumption #1 (naive): “I can just keep a long-lived WebSocket session on Cloud Run and it’ll be fine.” Reality: Cloud Run is stateless & ephemeral; revisions come/go. In-memory room state doesn’t persist across them.
  • Assumption #2 (naive): “If it’s flaky, adding capacity will improve reliability.” Reality: More processes without shared state amplify the problem.

Symptoms I saw: sometimes receiveMessage arrived, often it didn’t, especially around deploys.


The quick “second container” escape hatch that bit me

When things got flaky, I split into two containers: one for backend (CRUD) and one for realtime (Socket.IO). Great separation of concerns… but still unreliable when multiple copies existed (deploys/auto-scale):

  • Clients could connect to Realtime Container A, but the publish/emit path could run in Realtime Container B.
  • Socket.IO rooms are per-process memory → emits from B don’t reach sockets on A.
  • I considered min-instances=1 to keep it warm, but that raises baseline cost and still doesn’t fix cross-process room state.

Lesson: Two containers are fine structurally, but without a single always-on realtime process (or shared state like Redis), emits will still get “lost” across processes.
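
For completeness: the “shared state like Redis” route is the official Socket.IO Redis adapter. A minimal sketch of what that would look like (REDIS_URL is a placeholder; this is the path I didn't take):

// Any Socket.IO process: route room state through Redis instead of local memory
import { Server } from 'socket.io';
import { createClient } from 'redis';
import { createAdapter } from '@socket.io/redis-adapter';

const pubClient = createClient({ url: process.env.REDIS_URL });
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);

const io = new Server(8080, { transports: ['websocket'] });
io.adapter(createAdapter(pubClient, subClient)); // emits now fan out to sockets on other processes

On GCP that means paying for Memorystore (or running Redis yourself), which is part of why the single-VM route below won for me.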


The architecture that fixed it

I kept Cloud Run for stateless business logic and moved sockets to one always-on process on GCE. Pub/Sub bridges the two worlds.

[ Frontend ]  --Socket.IO-->  [ Realtime (GCE, 1 small VM) ]
                                   │ publishes chat.in        ▲ subscribes chat.out
                                   ▼                          │
                             Google Pub/Sub  (topics: chat.in, chat.out)
                                   │ delivers chat.in         ▲ publishes chat.out
                                   ▼                          │
                        [ Cloud Run Backend (CRUD/DB/notifications) ]
  • Backend (Cloud Run)
    • DB writes, unread counts, notifications, auth.
    • Publishes client-visible events to chat.out.
    • Subscribes to chat.in for commands (e.g., mark read).
  • Realtime (GCE, single small VM)
    • Socket.IO only (rooms, presence, emits).
    • Subscribes to chat.out and emits to sockets.
    • Publishes to chat.in for DB work the backend must do.

This removes the cross-instance room problem: all sockets live in one process.


Minimal wiring (what I actually changed)

1) Backend is publish-only (no direct emits)

Everything the client should see goes through Pub/Sub:

// Backend (Cloud Run) on message created / reaction toggled / unread updated
await pubSubService.publish(process.env.CHAT_OUT_TOPIC || 'chat.out', {
  type: 'receiveMessage', // or 'messageReadUpdate' | 'roomUpdated' | 'reactionRemoved'
  roomId,
  userId,      // optional for per-user broadcasts
  payload: {...} // what the client expects
});

2) Realtime (GCE) subscribes and emits

// Realtime (GCE) boot
await pubSubService.subscribe(process.env.CHAT_OUT_SUB || 'chat.out.realtime', (msg) => {
  const { type, roomId, userId, payload } = msg || {};
  if (type === 'receiveMessage' && roomId)       io.to(roomId).emit('receiveMessage', payload);
  else if (type === 'messageReadUpdate' && roomId) io.to(roomId).emit('messageReadUpdate', payload);
  else if (type === 'roomUpdated' && userId)     io.to(`user:${userId}`).emit('roomUpdated', payload);
});
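
Both snippets lean on a thin pubSubService wrapper that I haven't shown. Here's a minimal reconstruction on top of @google-cloud/pubsub (the method names match the calls above; the internals are a sketch, not my exact code):

// pubSubService: thin wrapper over @google-cloud/pubsub (reconstructed sketch)
import { PubSub } from '@google-cloud/pubsub';

const pubsub = new PubSub({ projectId: process.env.PUBSUB_PROJECT_ID });

export const pubSubService = {
  async publish(topicName, event) {
    // publishMessage({ json }) serializes the event object to a JSON buffer
    await pubsub.topic(topicName).publishMessage({ json: event });
  },

  async subscribe(subscriptionName, handler) {
    const subscription = pubsub.subscription(subscriptionName);
    subscription.on('message', async (message) => {
      try {
        await handler(JSON.parse(message.data.toString()));
        message.ack(); // ack only after the handler succeeds (delivery is at-least-once)
      } catch (err) {
        console.error('handler failed; nacking for redelivery', err);
        message.nack();
      }
    });
    subscription.on('error', (err) => console.error('subscription error', err));
  },
};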

3) Realtime → Backend for DB work

// Realtime publishes commands back
await pubSubService.publish(process.env.CHAT_IN_TOPIC || 'chat.in', {
  type: 'markMessageRead',
  roomId, readerId, senderId,
  timestamp: new Date().toISOString()
});
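
The backend side of chat.in isn't shown above, so here's a sketch of what that subscriber looks like in spirit (markMessagesRead is a hypothetical DB helper, and CHAT_IN_SUB is an assumed env var):

// Backend (Cloud Run) boot — consume commands from chat.in
await pubSubService.subscribe(process.env.CHAT_IN_SUB || 'chat.in.backend', async (msg) => {
  if (msg?.type !== 'markMessageRead') return;

  // Idempotent write: marking the same messages read twice is a no-op,
  // which is what makes at-least-once delivery safe here
  await markMessagesRead(msg.roomId, msg.readerId);

  // Fan the result back out so other connected clients update their UI
  await pubSubService.publish(process.env.CHAT_OUT_TOPIC || 'chat.out', {
    type: 'messageReadUpdate',
    roomId: msg.roomId,
    payload: { roomId: msg.roomId, readerId: msg.readerId },
  });
});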

4) Add a join ACK to kill the first-emit race

Server:

import { SubscribeMessage, MessageBody, ConnectedSocket } from '@nestjs/websockets';
import { Socket } from 'socket.io';

@SubscribeMessage('joinRoom')
handleJoinRoom(@MessageBody() data: { roomId: string; userId: string }, @ConnectedSocket() client: Socket) {
  client.join(data.roomId);
  client.data = { ...client.data, ...data };
  return { ok: true }; // the returned value is delivered to the client as the ACK
}

@SubscribeMessage('joinUser')
handleJoinUser(@MessageBody() data: { userId: string }, @ConnectedSocket() client: Socket) {
  client.join(`user:${data.userId}`);
  client.data.userId = data.userId;
  return { ok: true };
}

Client:

socket.emit('joinRoom', { roomId, userId }, (ack) => {
  if (!ack?.ok) console.warn('joinRoom not ACKed');
});
socket.emit('joinUser', { userId }, (ack) => {
  if (!ack?.ok) console.warn('joinUser not ACKed');
});
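
One detail the ACK alone doesn't cover (my addition, implied by the “clients haven't rejoined yet” failure mode earlier): room membership lives in the realtime process's memory, so clients should re-join on every connect, not just the first one:

// Re-join on every (re)connect — the server forgets rooms when it restarts
socket.on('connect', () => {
  socket.emit('joinRoom', { roomId, userId }, (ack) => {
    if (!ack?.ok) console.warn('joinRoom not ACKed after reconnect');
  });
});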

GCE setup (cheap + HTTPS)

  • VM: e2-micro (shared-core vCPU / 1 GB RAM) was enough for my load; e2-small is comfy.
  • Static IP + Caddy for HTTPS:

/opt/realtime/docker-compose.yml

version: "3.9"
services:
  realtime:
    image: gcr.io/<PROJECT>/realtime:latest
    environment:
      NODE_ENV: production
      PORT: "8080"
      PUBSUB_PROJECT_ID: <PROJECT>
      CHAT_IN_TOPIC: chat.in
      CHAT_OUT_TOPIC: chat.out
      CHAT_OUT_SUB: chat.out.realtime
    restart: unless-stopped

  caddy:
    image: caddy:2-alpine
    ports: ["80:80","443:443"]
    volumes:
      - /opt/realtime/Caddyfile:/etc/caddy/Caddyfile
    depends_on: [realtime]
    restart: unless-stopped

/opt/realtime/Caddyfile

realtime.yourdomain.com {
  reverse_proxy realtime:8080
}

Point realtime.yourdomain.com to the VM’s static IP (I used Onamae.com DNS). Caddy auto-issues Let’s Encrypt and proxies WebSockets out of the box.

Dev tip: you can Stop the VM when idle. You’ll still pay for the disk + static IP, but not CPU/RAM.


Capacity & tuning

For mostly-idle chat:

  • e2-micro: ~1–2k concurrent sockets safely (I’d cap around 2k).
  • e2-small: ~3–5k+ safely.

Hardening:

  • Increase FD limits (LimitNOFILE) if you push many connections.
  • Socket.IO: transports: ['websocket'], sensible pingInterval/Timeout.
  • Monitor event-loop lag and CPU; bump to e2-small if sustained >70% CPU or lag >100–200ms.
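
Concretely, the Socket.IO side of that list can look like this (the numbers are illustrative, not tuned values from my setup):

// Realtime (GCE): server options matching the hardening list
import { Server } from 'socket.io';

const io = new Server(8080, {
  transports: ['websocket'],  // no long-polling; clients come in over WSS via Caddy
  pingInterval: 25000,        // how often the server heartbeats each socket (ms)
  pingTimeout: 20000,         // drop a socket that stays silent this long after a ping (ms)
  cors: { origin: 'https://app.yourdomain.com' }, // lock CORS to your app origin
});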

What I learned (a.k.a. the useful mistakes)

  1. Cloud Run ≠ stateful socket host.
    You can run WebSockets there, but instance/revision churn breaks in-memory room state unless you add a shared adapter (Redis) and manage rollouts.

  2. “Prolonged sessions” don’t beat platform semantics.
    Keeping sockets open longer doesn’t stop Cloud Run from swapping revisions or scaling.

  3. Two containers didn’t fix cross-process state.
    Splitting backend/realtime clarified concerns but emits were still lost across processes without shared state.

  4. Min-instances helps warmness, not correctness.
    It raises baseline cost and doesn’t solve the multi-process room problem.

  5. One always-on realtime process is a huge simplifier.
    A tiny GCE VM is cheap and eliminates revision/scale races. CRUD remains serverless on Cloud Run.

  6. Pub/Sub is the right boundary.
    At-least-once + unordered delivery is fine with idempotent handlers (see the sketch after this list). It decouples backend logic from realtime emits.

  7. Join ACK matters.
    A tiny handshake prevents the “first emit before join” race.
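
“Idempotent handlers” in practice can be as small as deduping on a stable event id. A sketch (in-memory for brevity; a real deployment would use a Redis key or a DB unique constraint, and the events above would need an id field added):

// Skip redelivered Pub/Sub messages by event id (illustrative only)
const seenEventIds = new Set();

function handleOnce(eventId, handler) {
  if (seenEventIds.has(eventId)) return; // duplicate delivery: safe to skip
  seenEventIds.add(eventId);
  handler();
}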


“Do this and it will work” checklist

  • Backend (Cloud Run) publishes to chat.out; consumes chat.in
  • Realtime (GCE) consumes chat.out; publishes chat.in
  • No direct .emit() in the backend; publish only
  • joinRoom/joinUser return { ok: true }; client waits for ACK
  • Socket.IO: transports: ['websocket'], CORS locked to your app origin
  • HTTPS via Caddy on the VM (ports 80/443 open)
  • Single subscriber process (no duplicate subscribers)

Final Note

I went from “one Cloud Run with sockets” → “two containers (backend + realtime) on Cloud Run” → “Cloud Run + one small GCE + Pub/Sub.”
It’s cheaper, simpler, and finally RELIABLE!!! If your emits are flaky on Cloud Run or you’re scaling sockets, try this split. For me, it turned a frustrating rabbit hole into a boring (in the best way) distributed system.
