TL;DR: I started with Socket.IO inside a single Cloud Run service. I was naive about keeping “prolonged sessions” alive on a stateless platform. When emits were flaky, I split into two containers (backend + realtime) as a quick fix, but it was still unreliable during deploys/auto-scale, and min-instances would have raised baseline cost. The final setup that works: CRUD on Cloud Run, sockets on one small GCE VM, with Google Pub/Sub as the glue. Cheap. Stable. And my first “distributed system” that actually behaves.
Why Cloud Run + sockets bit me
Cloud Run is awesome for stateless HTTP. WebSockets technically work, but:
- Multiple revisions/instances: Socket.IO keeps room membership in memory per process. When Cloud Run deploys or scales, sockets can connect to Revision A while your emit happens in Revision B → dropped emits.
- Rollouts: During deploys, two revisions can be live. Your Pub/Sub subscriber may run in one, while connected sockets live in the other.
- Scale-to-zero/cold starts: There are windows where the subscriber isn’t ready or clients haven’t rejoined yet.
I also saw cost drift when keeping a realtime-ish instance “warm”.
The original plan (and my wrong assumptions)
- I put Socket.IO and CRUD API in one Cloud Run service.
- Assumption #1 (naive): “I can just keep a long-lived WebSocket session on Cloud Run and it’ll be fine.” Reality: Cloud Run is stateless & ephemeral; revisions come/go. In-memory room state doesn’t persist across them.
- Assumption #2 (naive): “If it’s flaky, adding capacity will improve reliability.” Reality: More processes without shared state amplify the problem.
Symptoms I saw: sometimes `receiveMessage` arrived, often it didn’t, especially around deploys.
The quick “second container” workaround that also bit me
When things got flaky, I split into two containers: one for backend (CRUD) and one for realtime (Socket.IO). Great separation of concerns… but still unreliable when multiple copies existed (deploys/auto-scale):
- Clients could connect to Realtime Container A, but the publish/emit path could run in Realtime Container B.
- Socket.IO rooms are per-process memory → emits from B don’t reach sockets on A.
- I considered min-instances=1 to keep it warm, but that raises baseline cost and still doesn’t fix cross-process room state.
Lesson: Two containers is fine structurally, but without a single always-on realtime process (or shared state like Redis), emits will still get “lost” across processes.
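For reference, the “shared state like Redis” path I skipped would look roughly like this: Socket.IO’s Redis adapter makes rooms and emits visible across processes, so multiple realtime copies can coexist. A minimal sketch, assuming a `REDIS_URL` you’d still have to provision and pay for:

```js
// Sketch: sharing Socket.IO rooms/emits across processes via Redis (the path I did NOT take)
import { createServer } from 'http';
import { Server } from 'socket.io';
import { createClient } from 'redis';
import { createAdapter } from '@socket.io/redis-adapter';

const httpServer = createServer();
const io = new Server(httpServer);

const pubClient = createClient({ url: process.env.REDIS_URL }); // assumed env var
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);

io.adapter(createAdapter(pubClient, subClient)); // emits from any process now reach all sockets

httpServer.listen(process.env.PORT || 8080);
```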
The architecture that fixed it
I kept Cloud Run for stateless business logic and moved sockets to one always-on process on GCE. Pub/Sub bridges the two worlds.
```
[ Frontend ] --Socket.IO--> [ Realtime (GCE, 1 small VM) ]
                                  ▲                    │
                                  │ sockets            │ subscribes
                                  │                    ▼
                     Google Pub/Sub (topics: chat.out, chat.in)
                                  ▲                    │ publishes
                                  │                    ▼
              [ Cloud Run Backend (CRUD/DB/notifications) ]
```
- **Backend (Cloud Run)**
  - DB writes, unread counts, notifications, auth.
  - Publishes client-visible events to `chat.out`.
  - Subscribes to `chat.in` for commands (e.g., mark read).
- **Realtime (GCE, single small VM)**
  - Socket.IO only (rooms, presence, emits).
  - Subscribes to `chat.out` and emits to sockets.
  - Publishes to `chat.in` for DB work the backend must do.
This removes the cross-instance room problem: all sockets live in one process.
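The Pub/Sub plumbing itself is just two topics and one pull subscription per consumer. A rough one-time provisioning sketch with the Node client; only `chat.out.realtime` appears in my config, so the backend-side subscription name `chat.in.backend` is a placeholder:

```js
// One-time Pub/Sub provisioning sketch using @google-cloud/pubsub
import { PubSub } from '@google-cloud/pubsub';

const pubsub = new PubSub({ projectId: process.env.PUBSUB_PROJECT_ID });

const [chatOut] = await pubsub.createTopic('chat.out');
const [chatIn] = await pubsub.createTopic('chat.in');

// Realtime VM pulls client-visible events from chat.out
await chatOut.createSubscription('chat.out.realtime');
// Backend pulls commands from chat.in (subscription name is illustrative)
await chatIn.createSubscription('chat.in.backend');
```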
Minimal wiring (what I actually changed)
1) Backend is publish-only (no direct emits)
Everything the client should see goes through Pub/Sub:
// Backend (Cloud Run) on message created / reaction toggled / unread updated
await pubSubService.publish(process.env.CHAT_OUT_TOPIC || 'chat.out', {
type: 'receiveMessage', // or 'messageReadUpdate' | 'roomUpdated' | 'reactionRemoved'
roomId,
userId, // optional for per-user broadcasts
payload: {...} // what the client expects
});
2) Realtime (GCE) subscribes and emits
// Realtime (GCE) boot
await pubSubService.subscribe(process.env.CHAT_OUT_SUB || 'chat.out.realtime', (msg) => {
const { type, roomId, userId, payload } = msg || {};
if (type === 'receiveMessage' && roomId) io.to(roomId).emit('receiveMessage', payload);
else if (type === 'messageReadUpdate' && roomId) io.to(roomId).emit('messageReadUpdate', payload);
else if (type === 'roomUpdated' && userId) io.to(`user:${userId}`).emit('roomUpdated', payload);
});
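In these snippets, `pubSubService` is a thin wrapper around the official `@google-cloud/pubsub` client. A minimal sketch of such a wrapper (simplified; real error handling and flow control may differ):

```js
// Minimal pubSubService sketch: JSON publish + streaming-pull subscribe with ack/nack
import { PubSub } from '@google-cloud/pubsub';

const pubsub = new PubSub({ projectId: process.env.PUBSUB_PROJECT_ID });

export const pubSubService = {
  async publish(topicName, data) {
    // publishMessage({ json }) serializes the payload for us
    return pubsub.topic(topicName).publishMessage({ json: data });
  },
  async subscribe(subscriptionName, handler) {
    const subscription = pubsub.subscription(subscriptionName);
    subscription.on('message', async (message) => {
      try {
        await handler(JSON.parse(message.data.toString()));
        message.ack();
      } catch (err) {
        console.error('handler failed, message will be redelivered', err);
        message.nack();
      }
    });
    subscription.on('error', (err) => console.error('subscription error', err));
  },
};
```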
3) Realtime → Backend for DB work
// Realtime publishes commands back
await pubSubService.publish(process.env.CHAT_IN_TOPIC || 'chat.in', {
type: 'markMessageRead',
roomId, readerId, senderId,
timestamp: new Date().toISOString()
});
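The receiving end on the backend isn’t shown above; a minimal sketch of how the Cloud Run service might consume `chat.in` (the subscription name `chat.in.backend` and the `messageService.markMessagesRead` helper are placeholders for whatever your DB layer exposes). Keeping this handler idempotent matters because Pub/Sub delivery is at-least-once:

```js
// Backend (Cloud Run) boot: consume commands coming back from the realtime VM
await pubSubService.subscribe(process.env.CHAT_IN_SUB || 'chat.in.backend', async (msg) => {
  const { type, roomId, readerId, senderId } = msg || {};
  if (type === 'markMessageRead' && roomId && readerId) {
    // Idempotent by design: marking an already-read message as read is a no-op,
    // so at-least-once redelivery from Pub/Sub is harmless here.
    await messageService.markMessagesRead({ roomId, readerId, senderId });
  }
});
```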
4) Add a join ACK to kill the first-emit race
Server:
// NestJS gateway handlers; @MessageBody/@ConnectedSocket come from '@nestjs/websockets'
@SubscribeMessage('joinRoom')
handleJoinRoom(@MessageBody() { roomId, userId }, @ConnectedSocket() client: Socket) {
  client.join(roomId);
  client.data = { roomId, userId };
  return { ok: true }; // returned value is delivered to the client's ACK callback
}

@SubscribeMessage('joinUser')
handleJoinUser(@MessageBody() { userId }, @ConnectedSocket() client: Socket) {
  client.join(`user:${userId}`);
  client.data.userId = userId;
  return { ok: true };
}
Client:
socket.emit('joinRoom', { roomId, userId }, (ack) => {
if (!ack?.ok) console.warn('joinRoom not ACKed');
});
socket.emit('joinUser', { userId }, (ack) => {
if (!ack?.ok) console.warn('joinUser not ACKed');
});
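One small extension of the above that’s worth making explicit: Socket.IO fires `connect` on every reconnect too, and the VM’s in-memory rooms are gone after a restart, so run the joins inside the `connect` handler rather than once at startup:

```js
// Rejoin rooms on every (re)connect so a VM restart or network blip doesn't leave the socket roomless
socket.on('connect', () => {
  socket.emit('joinRoom', { roomId, userId }, (ack) => {
    if (!ack?.ok) console.warn('joinRoom not ACKed');
  });
  socket.emit('joinUser', { userId }, (ack) => {
    if (!ack?.ok) console.warn('joinUser not ACKed');
  });
});
```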
GCE setup (cheap + HTTPS)
- VM: `e2-micro` (1 vCPU / 1 GB) was enough for my load; `e2-small` is comfy.
- Static IP + Caddy for HTTPS:

`/opt/realtime/docker-compose.yml`
version: "3.9"
services:
realtime:
image: gcr.io/<PROJECT>/realtime:latest
environment:
NODE_ENV: production
PORT: "8080"
PUBSUB_PROJECT_ID: <PROJECT>
CHAT_IN_TOPIC: chat.in
CHAT_OUT_TOPIC: chat.out
CHAT_OUT_SUB: chat.out.realtime
restart: unless-stopped
caddy:
image: caddy:2-alpine
ports: ["80:80","443:443"]
volumes:
- /opt/realtime/Caddyfile:/etc/caddy/Caddyfile
depends_on: [realtime]
restart: unless-stopped
/opt/realtime/Caddyfile
realtime.yourdomain.com {
reverse_proxy realtime:8080
}
Point `realtime.yourdomain.com` to the VM’s static IP (I used Onamae.com DNS). Caddy auto-issues Let’s Encrypt certificates and proxies WebSockets out of the box.
Dev tip: you can Stop the VM when idle. You’ll still pay for the disk + static IP, but not CPU/RAM.
Capacity & tuning
For mostly-idle chat:
- e2-micro: ~1–2k concurrent sockets safely (I’d cap around 2k).
- e2-small: ~3–5k+ safely.
Hardening:
- Increase FD limits (`LimitNOFILE`) if you push many connections.
- Socket.IO: `transports: ['websocket']`, sensible `pingInterval`/`pingTimeout` (see the sketch below).
- Monitor event-loop lag and CPU; bump to e2-small if sustained >70% CPU or lag >100–200 ms.
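For concreteness, the Socket.IO server options behind those bullets (plus the CORS lock-down from the checklist further down) look roughly like this; the numbers and the `APP_ORIGIN` env var are illustrative, not tuned values from my setup:

```js
// Sketch of a hardened Socket.IO server configuration
import { createServer } from 'http';
import { Server } from 'socket.io';

const httpServer = createServer();
const io = new Server(httpServer, {
  transports: ['websocket'],                // skip HTTP long-polling entirely
  pingInterval: 25000,                      // how often the server pings (ms)
  pingTimeout: 20000,                       // how long to wait for a pong before dropping the socket
  cors: { origin: process.env.APP_ORIGIN }, // lock CORS to your app origin
});

httpServer.listen(process.env.PORT || 8080);
```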
What I learned (a.k.a. the useful mistakes)
- **Cloud Run ≠ stateful socket host.** You can run WebSockets there, but instance/revision churn breaks in-memory room state unless you add a shared adapter (Redis) and manage rollouts.
- **“Prolonged sessions” don’t beat platform semantics.** Keeping sockets open longer doesn’t stop Cloud Run from swapping revisions or scaling.
- **Two containers didn’t fix cross-process state.** Splitting backend/realtime clarified concerns, but emits were still lost across processes without shared state.
- **Min-instances helps warmness, not correctness.** It raises baseline cost and doesn’t solve the multi-process room problem.
- **One always-on realtime process is a huge simplifier.** A tiny GCE VM is cheap and eliminates revision/scale races. CRUD remains serverless on Cloud Run.
- **Pub/Sub is the right boundary.** At-least-once + unordered is fine with idempotent handlers. It decouples backend logic from realtime emits.
- **Join ACK matters.** A tiny handshake prevents the “first emit before join” race.
“Do this and it will work” checklist
- Backend (Cloud Run) publishes to `chat.out`; consumes `chat.in`
- Realtime (GCE) consumes `chat.out`; publishes `chat.in`
- No direct `.emit()` in backend, publish only
- `joinRoom`/`joinUser` return `{ ok: true }`; client waits for ACK
- Socket.IO: `transports: ['websocket']`, CORS locked to your app origin
- HTTPS via Caddy on the VM (ports 80/443 open)
- Single subscriber process (no duplicate subscribers)
Final Note
I went from “one Cloud Run with sockets” → “two containers (backend + realtime) on Cloud Run” → “Cloud Run + one small GCE + Pub/Sub.”
It’s cheaper, simpler, and finally RELIABLE!!! If your emits are flaky on Cloud Run or you’re scaling sockets, try this split. For me, it turned a frustrating rabbit hole into a boring (in the best way) distributed system.