If you've ever deployed a Socket.IO app to Kubernetes and watched your real-time features silently break — no errors, no crashes, just messages vanishing — this is exactly what happened to me, and here's how I fixed it.
The Problem
We had a real-time feature built with WebSockets (Socket.IO). Everything worked perfectly in local and single-instance environments.
The moment we deployed to Kubernetes with multiple pods, things broke.
Symptoms
- Users connected to different pods couldn't receive each other's events
- Messages randomly "disappeared"
- Broadcasting only worked within the same pod
At first glance, everything looked fine — no errors, no crashes. But the system was fundamentally broken at the architectural level.
Root Cause: In-Memory Connections Don't Cross Pod Boundaries
Socket.IO, like any WebSocket server, keeps its connection state in process memory.
That means:
- Each pod has its own isolated set of connected clients
- There is no shared state between pods
So when User A connects to Pod 1 and User B connects to Pod 2, and Pod 1 emits an event — Pod 2 has no idea that event exists.
Each pod becomes a real-time island.
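To make the failure mode concrete, here is a minimal, dependency-free simulation of how the default in-memory adapter behaves. The `Pod` class and all names are illustrative only, not Socket.IO internals:

```javascript
// Each "pod" keeps its own in-memory client registry, just like the
// default Socket.IO adapter does.
class Pod {
  constructor() {
    this.clients = new Map(); // socketId -> inbox of received messages
  }
  connect(socketId) {
    this.clients.set(socketId, []);
  }
  // broadcast() only reaches clients connected to THIS pod
  broadcast(message) {
    for (const inbox of this.clients.values()) inbox.push(message);
  }
}

const pod1 = new Pod();
const pod2 = new Pod();

pod1.connect("userA"); // the load balancer sent User A to pod 1
pod2.connect("userB"); // ...and User B to pod 2

pod1.broadcast("hello"); // event emitted from pod 1

console.log(JSON.stringify(pod1.clients.get("userA"))); // ["hello"]
console.log(JSON.stringify(pod2.clients.get("userB"))); // [] (User B never sees it)
```

No error is thrown anywhere, which is exactly why the breakage is silent: delivery to User B simply never happens.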
The Architecture Problem
Kubernetes distributes traffic using a load balancer. Requests get routed to different pods randomly — and without a shared communication layer, those pods can never talk to each other.
Without a shared messaging layer, a real-time system breaks the moment you scale it horizontally.
The Solution: Redis Pub/Sub + Socket.IO Adapter
To fix this, we introduced Redis Pub/Sub using the official Socket.IO Redis adapter.
What Redis does here
- Acts as a message broker between all pods
- When Pod 1 emits an event → it's published to Redis
- Redis broadcasts it to all subscribed pods
- Every pod then emits it to its own connected clients
Result: All clients receive the event, regardless of which pod they're on.
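The same toy simulation with a broker in the middle shows why this works. The `Broker` class below stands in for Redis Pub/Sub; again, all names are illustrative, not the adapter's actual implementation:

```javascript
// A broker standing in for Redis Pub/Sub: fans each published message
// out to every subscriber.
class Broker {
  constructor() {
    this.handlers = [];
  }
  subscribe(handler) {
    this.handlers.push(handler);
  }
  publish(message) {
    for (const handler of this.handlers) handler(message);
  }
}

class Pod {
  constructor(broker) {
    this.clients = new Map(); // socketId -> inbox
    this.broker = broker;
    // Every pod subscribes; on receipt it delivers to its local clients only
    broker.subscribe((message) => this.deliverLocally(message));
  }
  connect(socketId) {
    this.clients.set(socketId, []);
  }
  deliverLocally(message) {
    for (const inbox of this.clients.values()) inbox.push(message);
  }
  // emit() now goes through the broker instead of delivering locally
  emit(message) {
    this.broker.publish(message);
  }
}

const redis = new Broker();
const pod1 = new Pod(redis);
const pod2 = new Pod(redis);

pod1.connect("userA");
pod2.connect("userB");

pod1.emit("hello"); // published once, delivered on every pod

console.log(JSON.stringify(pod2.clients.get("userB"))); // ["hello"]
```

The key design point: pods never talk to each other directly. Each one only talks to the broker, so adding a tenth pod requires no changes anywhere.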
Implementation
1. Install dependencies
npm install socket.io @socket.io/redis-adapter ioredis
2. Create Redis pub/sub clients
import Redis from "ioredis";
const pubClient = new Redis({ host: "redis-host", port: 6379 });
const subClient = pubClient.duplicate();
Two separate connections are required: one for publishing, one for subscribing. Once a Redis connection enters subscriber mode, it can only issue subscription-related commands, so it cannot also publish.
3. Attach the Redis adapter to Socket.IO
import { Server } from "socket.io";
import { createAdapter } from "@socket.io/redis-adapter";
const io = new Server(server, {
cors: { origin: "*" }
});
io.adapter(createAdapter(pubClient, subClient));
4. Emit events — nothing changes
io.emit("message", data);
No changes to your business logic. Redis handles the cross-pod sync entirely behind the scenes.
Result
After the fix:
- Events are synchronized across all pods
- Real-time communication works reliably in Kubernetes
- No more silent "missing messages"
- The fix required zero changes to business logic
Things You Should Know Before Using This
1. Redis is now a critical dependency
If Redis goes down → your real-time layer breaks. Plan for Redis high-availability (Redis Sentinel or Redis Cluster) in production.
2. Latency increases slightly
There's a small overhead introduced by the pub/sub round-trip. For most real-time apps, this is negligible — but worth knowing.
3. Sticky sessions are only needed for the polling fallback
With the Redis adapter, any pod can serve any user over WebSockets, so session affinity at the load balancer no longer does the cross-pod work. One caveat: if the HTTP long-polling transport stays enabled, Socket.IO still needs sticky sessions so that all polling requests of a session reach the same pod. Running WebSocket-only removes that requirement entirely.
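For reference, restricting the server to WebSocket-only transport is a one-line option. This is a configuration sketch (`server` is your HTTP server, as above); it sidesteps the sticky-session requirement that Socket.IO's HTTP long-polling fallback would otherwise impose, at the cost of losing that fallback for clients that cannot open a WebSocket:

```javascript
const io = new Server(server, {
  transports: ["websocket"], // disable the HTTP long-polling fallback
  cors: { origin: "*" },
});
```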
Alternative Approaches — And Why We Didn't Use Them
Sticky Sessions
- Routes each user to the same pod permanently
- Breaks horizontal scalability — defeats the purpose of multiple pods
- A temporary band-aid, not a real fix
Kafka / RabbitMQ
- Powerful, but significant operational overhead
- Overkill for a straightforward real-time sync requirement
- Redis Pub/Sub hits the sweet spot: simple, fast, battle-tested
Key Takeaway
If you're scaling a real-time system horizontally:
In-memory sockets won't scale across pods. You need a shared messaging layer.
Redis Pub/Sub is one of the simplest and most effective ways to bridge that gap.
Final Thought
This wasn't a bug — it was an architectural gap.
Once you understand that each pod is an isolated process with no awareness of other pods, you start designing distributed systems differently. You stop assuming "in-process = reliable" and start asking "where's the shared state?"
And that's when things actually scale.
