ZeeshanAli-0704

Posted on Jun 28 • Edited on Jul 18

Frontend System Design: WebSocket Architecture (Beginner Friendly, Full Theory)

A from-scratch guide to designing a real-time frontend with WebSocket, explained for someone reading about it for the first time. We use a Slack-like chat app as the running example and slowly build up from "what problem are we even solving" to "how do we scale this to millions of users".

1. The Problem WebSocket Solves
2. The Slack Example in Plain English
3. Functional Requirements: What the App Must Do
4. Non-Functional Requirements: How Well It Must Do It
5. High Level Architecture
6. Data Flow Walk Throughs
7. Data Model
- 7.1 Event Schema Versioning and Compatibility
8. State Management
9. API Design From the Frontend View
- 9.1 Delivery Guarantees and Failure Semantics
10. Caching, Rendering, and Performance
- 10.1 Connection Management Defaults
- 10.2 Client Backpressure and Memory Policy
11. Security and Reliability
- 11.1 Auth Lifecycle After Connect
12. Edge Cases
- 12.1 Protocol Fallback Strategy
13. WSS Token and Handshake Explained
14. WebSocket Internals and Scaling
15. Final Summary

1. The Problem WebSocket Solves

Before we talk about WebSocket, we need to understand why it exists at all. If you are new, start here — everything later builds on this.

1.1 How the Web Normally Works: Request and Response

The normal web works on a simple rule: the browser asks, the server answers, and then the conversation ends.

Browser:  "Hey server, give me the messages."   (request)
Server:   "Here they are."                        (response)
--- connection closes, conversation over ---

This is called the request-response model (HTTP). It is like sending a letter in the mail: you send one letter, you get one letter back, and if you want more you must send another letter.

This works great for loading a web page. But chat apps have a problem: new messages arrive at random times, and the server has no way to reach the browser after the response is sent. The mailbox is closed. The server cannot "push" a new message to you because there is no open line to talk through.

So how do we get live updates? There are four classic approaches. Let's build them up one at a time.

1.2 The Four Ways to Get Live Updates

1) Short Polling — "Are we there yet?"

The browser keeps asking again and again on a timer.

every 3 seconds:  Browser -> "Any new messages?" -> Server -> "No."
                  Browser -> "Any new messages?" -> Server -> "No."
                  Browser -> "Any new messages?" -> Server -> "Yes! Here."

Pro: simple, works everywhere.
Con: wasteful. Most requests return "nothing new", and a message can sit for up to 3 seconds before you see it.

2) Long Polling — "Tell me when you know."

The browser asks, but the server holds the request open and only answers when there is actually something new. Then the browser immediately asks again.

Pro: feels near real-time, and no special protocol is needed.
Con: every waiting user ties up a server connection, and the client keeps re-opening requests.

3) Server-Sent Events (SSE) — "Keep talking, I'm listening."

The server keeps one connection open and streams updates to the browser. But it is one-way: server can talk to browser, browser cannot talk back on that same channel.

Pro: great for feeds, notifications, progress bars.
Con: not ideal for chat, where the browser also needs to send messages, typing signals, etc.

4) WebSocket — "Let's keep a phone line open both ways."

The browser and server open one connection that stays open, and both sides can send messages at any time. This is exactly what a chat app wants.

Approach       Live?    Two-way?                     Cost             Best for
Short polling  Delayed  Yes (new request each time)  Wasteful         Simple status checks
Long polling   Almost   Yes (new request each time)  Medium           Near-real-time without sockets
SSE            Yes      No (server → client only)    Low              Feeds, notifications, progress
WebSocket      Yes      Yes (both directions)        Low per message  Chat, presence, collaboration

1.3 What a WebSocket Actually Is

In plain words:

A WebSocket is one live connection between the browser and the server.
Once it is open, it stays open.
Both sides can send data at any time, instantly, without starting a new request each time.
You do not need to refresh the page to see new data.

A good analogy: HTTP polling is like texting ("any news? … any news? … any news?"). A WebSocket is like a phone call that stays connected — either person can speak the moment they have something to say.

The system we design in this whole document is:

A Slack-like chat app with channels, direct messages, typing indicators, and presence (who is online).

Keep that picture in your head. Everything below serves that app.

2. The Slack Example in Plain English

Let's walk through what happens when you use Slack, step by step, with no jargon:

You open Slack in your browser.
The app logs you in and opens one live socket (the phone line).
The app subscribes to the channels you care about (tells the server "send me updates for these rooms").
A teammate sends a message — it appears instantly on your screen.
You send a message — it shows sending, and a moment later becomes sent once the server confirms it saved.
Your internet drops for a few seconds — the app quietly reconnects and catches up on anything you missed, so you never notice a gap.

That's the whole experience. The rest of this document explains how to build each of those six behaviors correctly and reliably.

3. Functional Requirements: What the App Must Do

"Functional requirements" just means: the list of things the app is supposed to do. Here are ours, each explained.

3.1 Connect Once

Goal: after login, open a single WebSocket connection and keep it healthy.

Why once? Because each open connection costs memory and resources on the server. One user should not open five sockets. We open one and reuse it for everything.

The UI should always show the user what state the connection is in, because a silent failure is confusing:

connecting — we are opening the line.
connected — live and working.
reconnecting — we lost it and are trying again.
offline — we gave up for now and will retry.
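The four states above can be modeled as a small state machine. This is a minimal sketch: the event names (`open`, `close`, `retry`, `give_up`) and the banner wording are assumptions made for illustration, not part of any real API.

```typescript
// Connection states from the list above; the event names are hypothetical.
type ConnectionState = "connecting" | "connected" | "reconnecting" | "offline";
type ConnectionEvent = "open" | "close" | "retry" | "give_up";

// Allowed transitions. Any pair not listed leaves the state unchanged.
const transitions: Record<ConnectionState, Partial<Record<ConnectionEvent, ConnectionState>>> = {
  connecting: { open: "connected", close: "reconnecting" },
  connected: { close: "reconnecting" },
  reconnecting: { open: "connected", give_up: "offline" },
  offline: { retry: "reconnecting" },
};

function nextConnectionState(state: ConnectionState, event: ConnectionEvent): ConnectionState {
  return transitions[state][event] ?? state;
}

// Text for a status banner; the wording is illustrative only.
function statusLabel(state: ConnectionState): string {
  switch (state) {
    case "connecting":
      return "Connecting...";
    case "connected":
      return "Connected";
    case "reconnecting":
      return "Reconnecting...";
    case "offline":
      return "Offline. Retrying soon.";
  }
}
```

Keeping the transitions in one table makes it harder for the UI to end up in an impossible state, such as showing "connected" while a retry is still pending.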

3.2 Subscribe to Needed Channels Only

Goal: only ask for updates you actually need right now.

A big workspace might have thousands of channels. You do not want updates from all of them — that would flood the browser. So the client subscribes only to the relevant ones.

Where does the "relevant" list come from?

A bootstrap API at startup (gives your workspace info).
Your membership data (channels and DMs you belong to).
The current screen (the channel or thread you have open right now).

Simple rule:

Subscribed channels = channels you are allowed to see ∩ channels you actually need right now.
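The rule above is a plain set intersection. A minimal sketch follows; the function names are made up for this example, and the second helper shows one possible way to turn a new target set into subscribe and unsubscribe calls.

```typescript
// Intersection of "allowed" and "needed", as in the rule above.
function computeSubscriptions(allowed: string[], needed: string[]): Set<string> {
  const allowedSet = new Set(allowed);
  return new Set(needed.filter((channel) => allowedSet.has(channel)));
}

// What to subscribe to and unsubscribe from when the target set changes,
// for example when the user navigates to another channel.
function diffSubscriptions(
  current: Set<string>,
  target: Set<string>
): { toSubscribe: string[]; toUnsubscribe: string[] } {
  return {
    toSubscribe: Array.from(target).filter((channel) => !current.has(channel)),
    toUnsubscribe: Array.from(current).filter((channel) => !target.has(channel)),
  };
}
```

The server still has to enforce permissions on every subscribe; this client-side filter only avoids asking for channels that would be rejected anyway.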

3.3 Receive and Process Events

Goal: when an event arrives, update the right part of the UI — correctly.

Examples of events and what they do:

message.created → add the message to the thread and update the sidebar preview.
presence.updated → turn a user's online dot green or grey.

Two subtle problems every real-time app must handle:

Duplicates: the same event may arrive twice (networks retry). We ignore repeats using a unique eventId.
Order: events can arrive out of order. We keep them in the right order using a sequence number.
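Both problems can be handled in one small per-channel structure. The sketch below is an assumption about how a client might do it: it ignores any `eventId` it has already seen and inserts new events at the position given by `sequence`. The class name `ChannelLog` is hypothetical.

```typescript
type ChannelEvent = { eventId: string; sequence: number; type: string };

class ChannelLog {
  // Note: this set grows forever in this sketch; a real client would prune old ids.
  private readonly seenIds = new Set<string>();
  private readonly events: ChannelEvent[] = [];

  // Returns true if the event was applied, false if it was a duplicate.
  add(event: ChannelEvent): boolean {
    if (this.seenIds.has(event.eventId)) return false;
    this.seenIds.add(event.eventId);
    // Insert before the first event with a higher sequence to keep the log sorted.
    const index = this.events.findIndex((e) => e.sequence > event.sequence);
    if (index === -1) this.events.push(event);
    else this.events.splice(index, 0, event);
    return true;
  }

  sequences(): number[] {
    return this.events.map((e) => e.sequence);
  }
}
```

For example, adding events with sequences 1, 3, 2 and then a repeat of 3 leaves the log ordered as 1, 2, 3, and the repeat is reported as a duplicate.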

3.4 Send User Actions

Goal: when the user does something (sends a message), make the UI feel instant.

We use a trick called optimistic UI:

User clicks send.
We immediately show the message bubble with a sending status — before the server even replies.
The server saves it and sends back an ack (acknowledgement).
We flip the bubble to sent.
If it fails, we show a retry button.

This makes the app feel fast even though the round-trip takes time.
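The optimistic flow above can be sketched as a pure reducer over the message list. The `clientId` field is an assumption: a temporary id generated in the browser so that the later ack can be matched to the bubble that is already on screen.

```typescript
type MessageStatus = "sending" | "sent" | "failed";

type ChatMessage = {
  clientId: string;  // hypothetical id created in the browser before the server replies
  serverId?: string; // filled in once the ack arrives
  text: string;
  status: MessageStatus;
};

type MessageAction =
  | { kind: "send"; clientId: string; text: string }
  | { kind: "ack"; clientId: string; serverId: string }
  | { kind: "fail"; clientId: string }
  | { kind: "retry"; clientId: string };

function messagesReducer(messages: ChatMessage[], action: MessageAction): ChatMessage[] {
  switch (action.kind) {
    case "send": // show the bubble immediately
      return [...messages, { clientId: action.clientId, text: action.text, status: "sending" }];
    case "ack": // server confirmed the save
      return messages.map((m): ChatMessage =>
        m.clientId === action.clientId ? { ...m, serverId: action.serverId, status: "sent" } : m
      );
    case "fail": // only a message that is still sending can fail
      return messages.map((m): ChatMessage =>
        m.clientId === action.clientId && m.status === "sending" ? { ...m, status: "failed" } : m
      );
    case "retry": // the user pressed the retry button
      return messages.map((m): ChatMessage =>
        m.clientId === action.clientId && m.status === "failed" ? { ...m, status: "sending" } : m
      );
  }
}
```

Because the reducer never mutates its input, it fits the state layer of most UI frameworks without changes.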

3.5 Recover Automatically

Goal: survive network drops without the user losing data or context.

The recovery loop:

Detect the disconnect (heartbeat stops responding).
Reconnect using backoff (wait a bit longer between each retry so we do not hammer the server).
Resubscribe to the channels we had.
Replay the events we missed while offline, starting from the last sequence we saw.

3.6 Multi Tab Behavior

Goal: if the user opens the app in several browser tabs, do not open several sockets.

The clean pattern is a leader tab:

One tab is elected the leader and owns the single WebSocket.
Other tabs are followers and receive the same updates through a BroadcastChannel or a SharedWorker.
If the leader tab closes, the remaining tabs elect a new leader, which opens the socket.

    Tab A (leader)            BroadcastChannel             Tab B (follower)
        |                           |                            |
        |==== owns WebSocket ======>|                            |
        |<=== receives live events =|                            |
        |--- publish updates ------>|--- deliver updates ------->|
        |                           |                            |
        |---- closes/crashes ----X  |                            |
        |                           |<-- election heartbeat -----|
        |                           |--- Tab B becomes leader -->|
        |                           |==== Tab B opens socket ===>|
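One simple way to run the election is for every tab to apply the same deterministic rule to the same data. The sketch below assumes each tab periodically announces a heartbeat (in a browser this could travel over BroadcastChannel) and that the alive tab with the smallest id wins. The timeout value and all names are hypothetical; the Web Locks API is another option browsers offer for this kind of coordination.

```typescript
type TabHeartbeat = { tabId: string; lastSeenMs: number };

// Tabs silent for longer than this are treated as closed (assumed value).
const TAB_TIMEOUT_MS = 5000;

// Every tab runs the same rule on the same heartbeats, so all tabs agree:
// the alive tab with the smallest id is the leader.
function electLeader(tabs: TabHeartbeat[], nowMs: number): string | null {
  const alive = tabs
    .filter((tab) => nowMs - tab.lastSeenMs <= TAB_TIMEOUT_MS)
    .map((tab) => tab.tabId)
    .sort();
  return alive[0] ?? null;
}

// A tab opens the socket only if it is the elected leader.
function shouldOwnSocket(myTabId: string, tabs: TabHeartbeat[], nowMs: number): boolean {
  return electLeader(tabs, nowMs) === myTabId;
}
```

When the leader stops sending heartbeats, it drops out of the alive list and the next tab in order takes over on its next check.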

4. Non-Functional Requirements: How Well It Must Do It

"Non-functional requirements" means: not what it does, but how well it does it — the quality bar.

Fast: chat should feel instant. Target sub-second delivery.
Reliable: if the network hiccups, reconnect and recover on its own without losing messages.
Scalable: only subscribe to needed data, so one user does not pull the whole workspace.
Secure: the socket must be authenticated, and the server must check that the user is allowed in each channel.
Usable: the user should always be able to tell whether they are connected.

5. High Level Architecture

Think of the frontend as four layers stacked on top of each other, each with one job. This separation keeps the code understandable.

UI  ->  State  ->  Realtime  ->  API

UI layer — what the user sees: the channel list, the message thread, the message box. It only knows how to draw things.
State layer — the app's memory: messages, unread counts, presence, connection status. The UI reads from here.
Realtime layer — the socket manager: opening the connection, subscribing, reconnecting, replaying missed events. It feeds the state layer.
API layer — plain HTTPS calls used to bootstrap, fetch a snapshot of a channel, and replay missed events.

Why layer it this way? Because the UI should not care how a message arrived (socket vs replay vs snapshot). It just reads the state layer. The realtime and API layers do the messy work and keep the state layer up to date.

6. Data Flow Walk Throughs

Now let's trace real scenarios end to end. Read the diagrams left to right, top to bottom — each arrow is one step.

6.1 Initial Load

What happens the moment you open the app:

You log in.
The app calls the bootstrap API to learn your workspace, channels, and where to connect.
The app opens the WebSocket and authenticates it.
The app subscribes to the channels you need.
The app asks the replay API for anything it missed since last time.

    Browser                 App API                  WSS Gateway            Replay API
        |                       |                           |                     |
        |--- login/session ---->|                           |                     |
        |--- GET /bootstrap --->|                           |                     |
        |<-- wsUrl/channels ----|                           |                     |
        |--- POST /token ------>|                           |                     |
        |<-- wsToken -----------|                           |                     |
        |========== open wss://...token ===================>|                     |
        |--------- subscribe(channel, fromSeq) ------------>|                     |
        |--- GET /replay?fromSeq=cursor ----------------------------------------->|
        |<-- missed events -------------------------------------------------------|

6.2 Sending a Message

Notice the optimistic bubble appears before the server replies — that is what makes it feel instant.

You type and click send.
The message bubble appears immediately in sending state.
The command travels over the socket.
The server saves it and sends back an ack.
The bubble becomes sent.

    User/UI                   Browser Socket              WSS Gateway            Storage
        |                             |                         |                     |
        |-- click send -------------->|                         |                     |
        |  add optimistic bubble      |                         |                     |
        |                             |--- message.send ------->|                     |
        |                             |                         |--- persist --------->|
        |                             |                         |<-- saved id --------|
        |                             |<-- message.ack ---------|                     |
        |<-- optimistic -> sent ------|                         |                     |

6.3 On Network Drop

The key idea: backoff (waiting longer each retry) plus replay (asking only for what you missed).

The socket disconnects.
Reconnect begins, waiting 1s, then 2s, then 4s between tries.
After reconnecting, resubscribe.
Replay the missed events and continue as if nothing happened.

    Browser Client              Network              WSS Gateway             Replay API
        |                           |                      |                      |
        |<====== live stream ======>|                      |                      |
        |-------- X disconnect -----|                      |                      |
        | wait 1s, 2s, 4s           |                      |                      |
        |================ reconnect wss ===================>|                      |
        |---------------- resubscribe(fromSeq) ----------->|                      |
        |---------------- GET /replay?fromSeq=last ------->|--------------------->|
        |<--------------- missed events ------------------------------------------|
        | UI catches up and continues live                 |                      |

7. Data Model

The data model is the shape of the information the app keeps track of. Four core ideas:

ConnectionState — is the socket connecting, connected, reconnecting, or offline?
ChannelSubscription — which channels are we currently listening to?
EventEnvelope — the standard "wrapper" every event comes in.
ReplayCursor — a bookmark (sequence) of the last event we successfully processed.

Every event the server sends is wrapped in the same envelope so the client can handle them uniformly:

type EventEnvelope<T> = {
  eventId: string;   // unique id, used to ignore duplicates
  sequence: number;  // ordering number within a channel
  channel: string;   // which channel this belongs to
  type: string;      // e.g. "message.created"
  payload: T;        // the actual event data
};

7.1 Event Schema Versioning and Compatibility

The problem: you will change the shape of events over time (add fields, rename things). But not everyone updates their app at the same moment. An old tab open for two days should not crash when it receives a new-shaped event.

The solution: version your events from day one and prefer additive changes.

Simple rules:

Put a schemaVersion in every envelope.
Prefer adding new optional fields over changing or removing existing ones.
Never remove or rename a required field without bumping the version.
Support at least the previous client version (N-1) during a rollout.

Example envelope with a version:

{
  "eventId": "evt_123",
  "sequence": 1235,
  "schemaVersion": 2,
  "channel": "channel:engineering",
  "type": "message.created",
  "payload": {
    "id": "m_1",
    "text": "hello",
    "mentions": []
  }
}

Safe deprecation flow (how to retire an old field without breaking anyone):

Introduce the new optional field alongside the old one.
Ship clients that can read both shapes.
Mark the old field deprecated in the API contract.
Remove the old field only after the upgrade window closes.
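On the client, these rules translate into a tolerant decoder: accept the versions you know, ignore fields you do not recognize, and never assume an optional field is present. The sketch below is one possible shape. The supported version list, the choice to treat a missing `schemaVersion` as version 1, and the function names are all assumptions.

```typescript
// Versions this client can read: current (N) and previous (N-1). Assumed values.
const SUPPORTED_SCHEMA_VERSIONS = [1, 2];

type VersionedEnvelope = {
  eventId: string;
  schemaVersion: number;
  channel: string;
  type: string;
  payload: Record<string, unknown>;
};

// Returns null instead of throwing, so an unknown event shape cannot crash an old tab.
function decodeEnvelope(json: string): VersionedEnvelope | null {
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const record = raw as Record<string, unknown>;
  const eventId = record["eventId"];
  const channel = record["channel"];
  const type = record["type"];
  const version = record["schemaVersion"];
  const payload = record["payload"];
  if (typeof eventId !== "string" || typeof channel !== "string" || typeof type !== "string") return null;
  // Assumption: events without a version predate versioning and count as version 1.
  const schemaVersion = typeof version === "number" ? version : 1;
  if (SUPPORTED_SCHEMA_VERSIONS.indexOf(schemaVersion) === -1) return null;
  const safePayload =
    typeof payload === "object" && payload !== null ? (payload as Record<string, unknown>) : {};
  return { eventId, schemaVersion, channel, type, payload: safePayload };
}

// "mentions" is treated as optional: older events simply yield an empty list.
function readMentions(envelope: VersionedEnvelope): string[] {
  const mentions = envelope.payload["mentions"];
  return Array.isArray(mentions) ? mentions.filter((m): m is string => typeof m === "string") : [];
}
```

An event with an unsupported version is dropped here; a real client might instead prompt the user to reload.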

8. State Management

Not all state is equal. Organizing it by scope keeps the app predictable.

Global state (affects the whole app):
- socket status
- subscribed channels
- replay cursor
Feature state (belongs to the chat feature):
- messages
- unread counts
- presence
Local state (only one component cares):
- the draft text in the message box
- whether a side panel is open

Rule of thumb: keep state at the narrowest scope that works. Draft text does not belong in global state; socket status does.

9. API Design From the Frontend View

Even a real-time app needs a few normal HTTPS APIs — the socket is not the only channel. Here are the four core ones:

GET /realtime/bootstrap — startup info: socket URL, allowed channels, cursor hints.
GET /snapshot/{channel} — the current state of a channel (used on a cache miss).
GET /replay/{channel}?fromSeq=... — only the events you missed after a gap.
POST /realtime/token — a short-lived token used to open the socket.

How many APIs, and when is each called?

Total core APIs: 4
Always called on startup: 2
- GET /realtime/bootstrap
- POST /realtime/token
Called only when needed: 2
- GET /snapshot/{channel}
- GET /replay/{channel}?fromSeq=...

Step by step:

GET /realtime/bootstrap — right after login/page load. Gives the socket URL, allowed channels, and cursor hints.
POST /realtime/token — just before opening wss://. Returns the short-lived socket token.
GET /snapshot/{channel} — when a channel's data is not already cached.
GET /replay/{channel}?fromSeq=... — after a reconnect or a detected gap in sequence. Returns only the missed events.
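The "detected gap" trigger for the replay call can be sketched as a comparison between the last sequence applied and the one that just arrived. This assumes sequences within a channel increase by exactly 1, which the source does not state explicitly. The function names are hypothetical.

```typescript
type SequenceCheck = "apply" | "duplicate" | "gap";

// Assumes per-channel sequences increase by exactly 1 with no holes.
function checkSequence(lastSeq: number, incomingSeq: number): SequenceCheck {
  if (incomingSeq <= lastSeq) return "duplicate"; // already processed
  if (incomingSeq === lastSeq + 1) return "apply"; // the next expected event
  return "gap"; // something was missed: call the replay API
}

// Builds the replay path from the API list above.
// Whether fromSeq is inclusive depends on the server; dedupe covers any overlap.
function replayPath(channel: string, fromSeq: number): string {
  return `/replay/${encodeURIComponent(channel)}?fromSeq=${fromSeq}`;
}
```

With a cursor of 1234, an incoming event 1235 is applied directly, while an incoming 1240 signals a gap and leads to a request for `replayPath("channel:engineering", 1234)`.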

Example subscribe message sent over the socket:

{
  "type": "subscribe",
  "channel": "channel:engineering",
  "fromSequence": 1234
}

9.1 Delivery Guarantees and Failure Semantics

This sounds academic but it decides whether users ever see duplicate or missing messages. Let's define the terms simply.

At-most-once: no duplicates, but a message can be lost on failure.
At-least-once: no loss, but a retry can create a duplicate.
Effectively-once (at the UI): the transport may duplicate, but the client dedupes using eventId + sequence, so the user sees each message exactly once.

For chat, the recommended model is:

Transport: at-least-once (duplicates possible, loss unlikely).
UI: effectively-once (dedupe hides the duplicates).
Ordering: guaranteed per channel, not globally across all channels.

Failure situations to write down in your API contract:

Publisher timeout: the client shows a retry.
Ack lost but the write actually succeeded: the retry may create a duplicate, so the retry should reuse the same client-generated message id and dedupe must catch it.
Replay overlap: replay may include events you already saw → dedupe handles it.

10. Caching, Rendering, and Performance

A real-time app can receive hundreds of events per second. If you re-render everything on every event, the UI will stutter. Performance rules:

Keep the current chat state in memory for instant access.
Store the replay cursor in IndexedDB / local storage so you can catch up after a full reload.
Render only the parts that changed, not the whole screen.
Use virtualization for long message lists (only render the rows on screen).
Batch frequent events so you re-render once per frame, not once per event.
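The last rule, batching, can be sketched as a small collector that schedules at most one flush at a time. The scheduler is injected so the class does not depend on the browser; in a page you would likely pass a function that calls `requestAnimationFrame`. The class and type names are assumptions for this example.

```typescript
// A function that runs `flush` later, for example on the next animation frame.
type Scheduler = (flush: () => void) => void;

class EventBatcher<T> {
  private pending: T[] = [];
  private scheduled = false;
  private readonly schedule: Scheduler;
  private readonly onFlush: (batch: T[]) => void;

  constructor(schedule: Scheduler, onFlush: (batch: T[]) => void) {
    this.schedule = schedule;
    this.onFlush = onFlush;
  }

  // Collect the event; schedule one flush no matter how many events arrive before it runs.
  push(event: T): void {
    this.pending.push(event);
    if (this.scheduled) return;
    this.scheduled = true;
    this.schedule(() => {
      const batch = this.pending;
      this.pending = [];
      this.scheduled = false;
      this.onFlush(batch); // one state update, hence one re-render
    });
  }
}
```

If 200 events arrive between two frames, `onFlush` runs once with all 200 instead of triggering 200 separate renders.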

10.1 Connection Management Defaults

Concrete starting numbers so behavior is predictable (tune later from real data):

Heartbeat interval: 25s
Heartbeat timeout: 10s
Reconnect backoff: 1s, 2s, 4s, 8s, 16s (cap at 30s)
Reconnect jitter: ±20% (randomize so all clients do not reconnect at the same instant)
Max reconnect attempts before showing an offline banner: 8
Max channels per socket (soft limit): 200
Max inbound frame size: 64 KB
Max outbound buffer per socket: 1 MB

Why jitter? If 10,000 users all disconnect at once (a server blip) and all reconnect at exactly 2s, they create a stampede. Random jitter spreads them out.
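The backoff and jitter defaults above can be sketched as a single pure function. The random source is a parameter so the result is testable. One choice made here, which the text leaves open, is that jitter is applied after the cap, so a capped delay lands between 24s and 36s rather than exactly at 30s.

```typescript
const BASE_DELAY_MS = 1000;  // first retry after about 1s
const MAX_DELAY_MS = 30000;  // cap from the defaults above
const JITTER_RATIO = 0.2;    // plus or minus 20%
const MAX_ATTEMPTS = 8;      // after this, show the offline banner

// attempt is 0-based: 0 -> ~1s, 1 -> ~2s, 2 -> ~4s, ...
function reconnectDelayMs(attempt: number, random: () => number = Math.random): number {
  const exponential = Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
  // random() is in [0, 1), so jitter is in [-0.2, +0.2).
  const jitter = (random() * 2 - 1) * JITTER_RATIO;
  // Jitter is applied after the cap so capped clients are still spread out.
  return Math.round(exponential * (1 + jitter));
}

function shouldShowOfflineBanner(attempt: number): boolean {
  return attempt >= MAX_ATTEMPTS;
}
```

With no jitter the sequence is 1s, 2s, 4s, 8s, 16s, then 30s for every later attempt.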

10.2 Client Backpressure and Memory Policy

Backpressure means: what to do when events arrive faster than the UI can handle. The client must never let its queue grow forever, or it will run out of memory.

Recommended policy:

Keep a bounded in-memory event queue (for example 5,000 events).
Prioritize the visible channel over background channels.
Coalesce high-frequency events (presence.updated, typing) — collapse many into one.
When the queue is nearly full:
- Drop non-critical transient events.
- Keep critical durable events (message.created, moderation actions).
When the queue overflows:
- Trigger a replay resync from the last stable sequence.
- Show a subtle "syncing" indicator.
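A minimal sketch of this policy follows. The `durable` flag, the `coalesceKey` field, and the 90% "nearly full" threshold are assumptions made for this example. Prioritizing the visible channel is left out to keep it short.

```typescript
type QueuedEvent = {
  type: string;
  durable: boolean;     // true for events that must not be lost, e.g. message.created
  coalesceKey?: string; // hypothetical: events sharing a key collapse into the newest one
};

type EnqueueResult = "queued" | "dropped" | "resync";

class BoundedEventQueue {
  private items: QueuedEvent[] = [];
  private readonly capacity: number;
  private readonly highWaterMark: number;
  needsResync = false;

  constructor(capacity: number) {
    this.capacity = capacity;
    this.highWaterMark = Math.floor(capacity * 0.9); // "nearly full" threshold, assumed
  }

  enqueue(event: QueuedEvent): EnqueueResult {
    // Coalesce: replace an earlier event with the same key in place, so the queue does not grow.
    if (event.coalesceKey !== undefined) {
      const index = this.items.findIndex((e) => e.coalesceKey === event.coalesceKey);
      if (index !== -1) {
        this.items[index] = event;
        return "queued";
      }
    }
    // Nearly full: drop transient events, keep durable ones.
    if (!event.durable && this.items.length >= this.highWaterMark) return "dropped";
    // Overflow: give up on queueing and ask for a replay from the last stable sequence.
    if (this.items.length >= this.capacity) {
      this.needsResync = true;
      return "resync";
    }
    this.items.push(event);
    return "queued";
  }

  size(): number {
    return this.items.length;
  }
}
```

When `needsResync` becomes true, the caller would clear the queue, show the "syncing" indicator, and call the replay API.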

11. Security and Reliability

Security basics:

Use wss:// only (the encrypted version of WebSocket, like HTTPS is to HTTP).
Validate the auth token when the socket connects.
Check channel permission on the server — never trust the client to only subscribe to allowed channels.

Reliability basics:

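Heartbeat ping/pong to detect dead connections. Browser JavaScript cannot send or observe protocol-level ping frames, so the client side of the heartbeat is typically a small application-level message.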
Reconnect with backoff.
Replay missed events.
Dedupe by eventId.

11.1 Auth Lifecycle After Connect

Important idea for beginners: authentication is not a one-time check at connect. Permissions can change while the socket is open, and the server must handle that.

Runtime cases to handle:

Session revoked (user logs out on another device):
- Server sends an auth.revoked control event.
- Client clears local auth and redirects to login.
Permission changed while connected (user removed from a channel):
- The next subscribe/publish for that channel must fail authorization.
- The server can also proactively push channel.access.revoked.
Token nearing expiry:
- Client fetches a fresh socket token before reconnecting.
- Never keep using a stale long-lived token.

Auth Service           Gateway                Client
    |                     |                     |
    |-- revoke user ----->|                     |
    |                     |-- auth.revoked ---->|
    |                     |<-- close/ack -------|
    |                     |                     |
    |                     |  client clears state |

12. Edge Cases

The situations that break naive implementations:

Tab sleeps (laptop closed): on resume, reconnect and replay.
Duplicate delivery: ignore by eventId.
Out-of-order delivery: fix using sequence.
Multiple tabs: use the leader-tab pattern (Section 3.6).

12.1 Protocol Fallback Strategy

The problem: some corporate networks and proxies block WebSocket. Your app must still work.

The fallback order — try the best first, degrade gracefully:

Try WebSocket (wss://).
If the upgrade repeatedly fails, fall back to SSE.
If SSE also fails, fall back to long-polling.

What you lose at each step:

WebSocket: full two-way real-time.
SSE: you still receive updates in real-time; your sends go over normal HTTP instead.
Long-poll: highest latency and cost — keep only the essential updates.

Client rule: remember the current transport in memory, and periodically retry WebSocket (say every 5–10 minutes) in case the network situation improves.
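The fallback order and the client rule can be sketched as two pure functions over a small state object. The failure threshold of 3 is an assumption (the text only says "repeatedly"), and the retry interval uses the lower end of the 5 to 10 minute range.

```typescript
type Transport = "websocket" | "sse" | "long-polling";

const FALLBACK_ORDER: Transport[] = ["websocket", "sse", "long-polling"];
const MAX_FAILURES_BEFORE_FALLBACK = 3;            // assumed threshold
const WEBSOCKET_RETRY_INTERVAL_MS = 5 * 60 * 1000; // lower end of the 5 to 10 minute range

type TransportState = {
  current: Transport;
  consecutiveFailures: number;
  lastWebSocketAttemptMs: number;
};

// Called after each failed connection attempt on the current transport.
function onConnectFailure(state: TransportState): TransportState {
  const failures = state.consecutiveFailures + 1;
  if (failures < MAX_FAILURES_BEFORE_FALLBACK) {
    return { ...state, consecutiveFailures: failures };
  }
  // Step down one level; long-polling is the last resort, so stay there.
  const next = FALLBACK_ORDER[FALLBACK_ORDER.indexOf(state.current) + 1] ?? state.current;
  return { ...state, current: next, consecutiveFailures: 0 };
}

// While degraded, periodically try WebSocket again in case the network improved.
function shouldRetryWebSocket(state: TransportState, nowMs: number): boolean {
  return (
    state.current !== "websocket" &&
    nowMs - state.lastWebSocketAttemptMs >= WEBSOCKET_RETRY_INTERVAL_MS
  );
}
```

The state lives in memory only, as the client rule suggests, so a page reload always starts again from WebSocket.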

13. WSS Token and Handshake Explained

This section answers a very common beginner question: how does the socket get authenticated, and what actually happens when it "connects"?

The secure pattern in four steps:

The client gets a short-lived socket token using a normal HTTPS API (you are already logged in).
The client opens a wss:// connection using that token.
The server validates the token and upgrades the connection from HTTP to WebSocket.
After connecting, the client sends subscribe frames.

Important safety notes:

Keep the socket token short-lived (for example 1–5 minutes).
Prefer a dedicated ephemeral socket token, not your long-lived access token.
Never hardcode a real production token in frontend code.

13.1 End to End Sequence

+---------+        +-------------------+        +-------------------+        +----------------+
| Browser |        | App API           |        | Token API         |        | WSS Gateway    |
+---------+        +-------------------+        +-------------------+        +----------------+
|                        |                           |                            |
| (A) user already logged in                         |                            |
|----------------------->|                           |                            |
|                        |                           |                            |
| (1) GET /realtime/bootstrap                        |                            |
|----------------------->|                           |                            |
|<-----------------------| 200 { wsUrl, allowedChannels, cursorHints }            |
|                        |                           |                            |
| (2) POST /realtime/token                           |                            |
|----------------------------------->|               |                            |
|<-----------------------------------| 200 { wsToken, exp }                        |
|                        |                           |                            |
| (3) GET /snapshot/{channel} (optional, cache miss)                             |
|----------------------->|                           |                            |
|<-----------------------| 200 { initial channel state }                          |
|                        |                           |                            |
| (4) Open WSS: wss://rt.example.com/ws?token=...                                |
|--------------------------------------------------------------------------->     |
|<---------------------------------------------------------------------------     |
|                         101 Switching Protocols (Upgrade success)               |
|                                                                                    
| (5) WS frame: subscribe(channel:engineering, fromSeq:1234)                     |
|--------------------------------------------------------------------------->     |
| (6) WS frame: connection.ack(conn_7f2a)                                         |
|<---------------------------------------------------------------------------     |
|                                                                                    
| (7) On reconnect/gap -> GET /replay/{channel}?fromSeq=1234                      |
|----------------------->|                                                          |
|<-----------------------| 200 { missedEvents:[...] }                              |

13.2 API Count and Call Pattern

Core APIs = 4

[Always]
1) GET  /realtime/bootstrap
2) POST /realtime/token

[Conditional]
3) GET  /snapshot/{channel}
4) GET  /replay/{channel}?fromSeq=...

13.3 Token Fetch and Connect Flow

Browser Client                               API/Gateway
--------------                               -----------
1) HTTPS login already done

2) POST /realtime/token ------------------->  validate session
   (cookie/JWT)

3) <-------------------------- 200 { wsToken: "eyJ...abc", exp: 60s }

4) new WebSocket(
     "wss://rt.example.com/ws?token=eyJ...abc"
   )

5) HTTP Upgrade request ------------------->  verify token + claims
                                              check exp, userId, tenantId

6) <-------------------------- 101 Switching Protocols (Upgrade success)

7) send subscribe frame ------------------->  start streaming events
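Steps 2 to 4 of this flow come down to two small decisions on the client: is the current token still usable, and what URL do we open. A minimal sketch, assuming the client converts the `exp` value into an absolute timestamp (`expiresAtMs`) and refreshes slightly early; the margin and the names are hypothetical.

```typescript
type SocketToken = { wsToken: string; expiresAtMs: number };

// Refresh a little before the real expiry to allow for network delay (assumed value).
const REFRESH_MARGIN_MS = 10000;

// True when there is no token yet or the current one is about to expire.
function needsFreshToken(token: SocketToken | null, nowMs: number): boolean {
  return token === null || token.expiresAtMs - nowMs <= REFRESH_MARGIN_MS;
}

// Builds the URL passed to new WebSocket(...). Only wss:// is accepted.
function buildSocketUrl(wsUrl: string, token: SocketToken): string {
  if (wsUrl.indexOf("wss://") !== 0) {
    throw new Error("Refusing to connect over a non-wss URL");
  }
  const separator = wsUrl.indexOf("?") === -1 ? "?" : "&";
  return `${wsUrl}${separator}token=${encodeURIComponent(token.wsToken)}`;
}
```

Before every connect or reconnect, the client would check `needsFreshToken` and call the token API first if it returns true.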

13.4 Handshake Headers

What is a handshake? Even a wss:// connection starts as a normal HTTP request that politely asks "can we upgrade this to a WebSocket?" The server agrees with a special 101 response. After that, it is a WebSocket.

The browser's upgrade request:

GET /ws?token=eyJ...abc HTTP/1.1
Host: rt.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
Sec-WebSocket-Version: 13
Origin: https://app.example.com

The server's acceptance:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: HSmrc0sMlYUkAGmm5OPpG2HaGWk=

That 101 Switching Protocols is the moment the connection stops being HTTP and becomes a live two-way WebSocket.
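The `Sec-WebSocket-Accept` value is not random: RFC 6455 defines it as the base64-encoded SHA-1 hash of the client's `Sec-WebSocket-Key` concatenated with a fixed GUID. The sketch below shows that computation using Node's `crypto` module. It is server-side code, shown only to demystify the header; the browser and the server library do this for you.

```typescript
import { createHash } from "node:crypto";

// Fixed GUID defined by RFC 6455; it is the same for every WebSocket handshake.
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Derives the Sec-WebSocket-Accept header value from the client's Sec-WebSocket-Key.
function computeAcceptValue(secWebSocketKey: string): string {
  return createHash("sha1")
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest("base64");
}
```

Applied to the key in the request above, `computeAcceptValue("x3JJHMbDL1EzLkh9GBhXDw==")` yields the `HSmrc0sMlYUkAGmm5OPpG2HaGWk=` shown in the response. The check proves the server understood the WebSocket handshake; it is not an authentication mechanism.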

13.5 First Messages After Handshake

The client's first frame — "please stream me this channel from where I left off":

{
  "type": "subscribe",
  "channel": "channel:engineering",
  "fromSequence": 1234
}

The server's acknowledgement — "you are connected, here is your connection id":

{
  "type": "connection.ack",
  "connectionId": "conn_7f2a",
  "userId": "u_42"
}

13.6 Failure Cases

token expired -> server rejects upgrade or closes with auth code -> client fetches new wsToken -> reconnect
invalid channel subscribe -> server sends error frame -> client does not retry that channel
network break -> reconnect with backoff -> resubscribe -> replay missed events

Case 1: token expired
Client ---- open wss(token_old) ----> Gateway
Client <--- auth close / reject ----- Gateway
Client ---- POST /realtime/token ---> API
Client <--- new wsToken ------------- API
Client ---- open wss(token_new) ----> Gateway

Case 2: invalid subscribe
Client ---- subscribe(private:admin) -> Gateway
Client <--- error.not_authorized ----- Gateway
Client ---- stop retry for this chan -> (client rule)

Case 3: network break
Client ----X socket lost X----------- Gateway
Client ---- reconnect + subscribe ---> Gateway
Client ---- GET /replay?fromSeq ----> API
Client <--- missed events ----------- API
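The three cases can be written down as a lookup from failure to recovery steps, which keeps the reconnect logic in one place. This is a sketch: the failure and step names are invented for this example, and how a real gateway signals each failure (close code, error frame) depends on the server.

```typescript
type SocketFailure =
  | { kind: "auth_expired" }
  | { kind: "subscribe_denied"; channel: string }
  | { kind: "network_lost" };

type RecoveryStep =
  | "fetch_token"
  | "reconnect"
  | "reconnect_with_backoff"
  | "resubscribe"
  | "replay"
  | "stop_retrying_channel";

// Maps each failure case above to the ordered steps the client should take.
function recoveryPlan(failure: SocketFailure): RecoveryStep[] {
  switch (failure.kind) {
    case "auth_expired": // Case 1: get a new token first, then reconnect
      return ["fetch_token", "reconnect", "resubscribe", "replay"];
    case "subscribe_denied": // Case 2: retrying would only fail again
      return ["stop_retrying_channel"];
    case "network_lost": // Case 3: the token may still be valid, so just back off and retry
      return ["reconnect_with_backoff", "resubscribe", "replay"];
  }
}
```

Treating a denied subscribe as final matters: retrying it in a loop would look like an attack to the gateway and would never succeed.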

13.7 Token Mechanism in Simple Terms

The five stages, in order:

Normal login — user signs in with password/SSO; server creates an authenticated session (cookie or access token).
Socket token exchange — client calls POST /realtime/token with that session; server returns a short-lived wsToken.
Handshake auth — client opens wss://.../ws?token=wsToken; the gateway validates it and replies 101 Switching Protocols. The connection is now tied to that user.
Authorization after connect — authentication says who you are; authorization says which channels you may access. The server checks permission on every subscribe.
Reconnect + refresh — if the socket drops and the token expired, fetch a new wsToken, reconnect, resubscribe, replay.

Security checklist:

Use only wss://.
Keep the socket token very short-lived (1–5 min).
Do not expose a long-lived auth token in the socket URL.
Do not log full tokens anywhere.

14. WebSocket Internals and Scaling

Now we go under the hood: how a WebSocket works internally, and how the system grows from one user to millions. This is where system-design interviews spend most of their time.

14.1 How WebSocket Works Internally

First, clear up a common confusion:

In your code you write new WebSocket("wss://...").
Under the hood, the browser sends an ordinary HTTPS request with an Upgrade: websocket header.
The server replies 101 Switching Protocols.
After that 101, the connection stops speaking HTTP and starts speaking WebSocket frames.

The lifecycle:

HTTP starts it — client sends a request with Upgrade: websocket.
Server upgrades it — replies 101; now it is a persistent, full-duplex connection.
Frames flow — data travels as small frames (not full HTTP responses), and either side can send at any time.
Keep-alive — periodic ping/pong frames check the connection is still alive; if a pong is missed, the client reconnects.
Lifecycle — connect → authenticate → subscribe → stream events → reconnect if broken.
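The keep-alive step can be sketched as a small monitor driven by a timer. It uses the defaults from Section 10.1 (ping every 25s, give up if no pong arrives within 10s). Because browser JavaScript cannot send protocol-level ping frames, the "ping" here is assumed to be an application-level message; the class name and return values are hypothetical.

```typescript
const HEARTBEAT_INTERVAL_MS = 25000; // send a ping this often
const HEARTBEAT_TIMEOUT_MS = 10000;  // how long to wait for the pong

type HeartbeatAction = "send_ping" | "wait" | "dead";

class HeartbeatMonitor {
  private lastPingSentMs: number | null = null;
  private awaitingPong = false;

  // Call on a regular timer tick (for example every second) with the current time.
  tick(nowMs: number): HeartbeatAction {
    if (this.awaitingPong && this.lastPingSentMs !== null) {
      // A ping is outstanding: either keep waiting or declare the connection dead.
      return nowMs - this.lastPingSentMs >= HEARTBEAT_TIMEOUT_MS ? "dead" : "wait";
    }
    if (this.lastPingSentMs === null || nowMs - this.lastPingSentMs >= HEARTBEAT_INTERVAL_MS) {
      this.lastPingSentMs = nowMs;
      this.awaitingPong = true;
      return "send_ping";
    }
    return "wait";
  }

  // Call when the server's pong message arrives.
  onPong(): void {
    this.awaitingPong = false;
  }
}
```

When `tick` returns "dead", the client closes the socket and enters the reconnect loop.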

14.2 Internal Components on the Backend

You do not build all of this on the frontend, but knowing it helps you reason about failures.

+----------------+      +----------------+      +-------------------+
|  Clients (Web) | ---> |  WSS Gateway   | ---> |  Event Router     |
+----------------+      +----------------+      +-------------------+
        |                         |
        v                         v
   +----------------+       +-------------------+
   | Presence Store |       | Message/Event Bus |
   +----------------+       +-------------------+
        |                         |
        +-----------+-------------+
                    |
                    v
          +---------------------+
          | Channel Subscribers |
          +---------------------+

What each piece does:

WSS Gateway — accepts socket connections and authenticates them.
Event Router — sends each event to the correct channel's subscribers.
Presence Store — tracks who is online.
Event Bus — distributes events across many backend servers.

14.3 Scaling Stages From One to Millions

This is the heart of a real-time system-design interview, so we will go very slowly and explain every term from scratch. By the end you should understand why each piece (sticky sessions, load balancer, event bus, fan-out, partitioning) exists and what problem it solves.

Why Scaling Sockets Is Hard

Start with the single most important idea:

A normal HTTP request is like a quick phone call — you call, get your answer, and hang up in a few milliseconds. A WebSocket is like leaving the phone line open all day.

That one difference changes everything:

With HTTP, a single server can handle huge traffic because each request finishes in milliseconds and its resources are released right away, so capacity is constantly being recycled.
With WebSocket, every connected user permanently occupies a slot on one server: some memory, one network socket (a "file descriptor"), and a little CPU for heartbeats. If 50,000 users are online, that is 50,000 open lines sitting there — even when nobody is typing.

And there is a second, bigger cost: fan-out. When one person posts in a channel with 10,000 members, the system must copy that one message out to 10,000 different connections. One input, ten thousand outputs. This multiplication is the real scaling monster, and most of this section is about taming it.

So we have two enemies:

Too many long-lived connections for one server to hold.
Fan-out — one message exploding into many deliveries.

We now grow the system in three stages, adding one tool at a time to fight these two enemies.

Stage A One Server

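Situation: 1 to ~100 users (the user counts in these stages are rough illustrations, not hard limits; a single well-tuned server can hold far more connections than this). Everything fits on a single server. We call it a gateway: the server that holds WebSocket connections.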

How it keeps track of who wants what: a simple lookup table in the server's memory called a subscription map:

channelId              ->  list of connections listening to it
---------------------------------------------------------------
"channel:engineering"  ->  [conn_1, conn_5, conn_9]
"channel:random"       ->  [conn_2, conn_5]

When a message arrives for channel:engineering, the server looks up that row and pushes the message to conn_1, conn_5, and conn_9. Done.
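
The subscription map above can be sketched as a Map from channelId to a Set of connections. This is a minimal in-memory sketch for a single gateway; the names SubscriptionMap and ConnectionId and the send callback are assumptions for illustration, not part of any specific library.

```typescript
// Minimal in-memory subscription map for a single gateway (Stage A).
// ConnectionId and the send callback are illustrative assumptions.
type ConnectionId = string;

class SubscriptionMap {
  private readonly byChannel = new Map<string, Set<ConnectionId>>();

  subscribe(channelId: string, conn: ConnectionId): void {
    let conns = this.byChannel.get(channelId);
    if (!conns) {
      conns = new Set<ConnectionId>();
      this.byChannel.set(channelId, conns);
    }
    conns.add(conn);
  }

  unsubscribe(channelId: string, conn: ConnectionId): void {
    const conns = this.byChannel.get(channelId);
    if (!conns) return;
    conns.delete(conn);
    // Remove empty rows so the map does not leak memory.
    if (conns.size === 0) this.byChannel.delete(channelId);
  }

  // Remove a closed connection from every channel it was listening to.
  disconnect(conn: ConnectionId): void {
    for (const channelId of Array.from(this.byChannel.keys())) {
      this.unsubscribe(channelId, conn);
    }
  }

  // Push one message to every local subscriber; returns how many were delivered.
  publish(
    channelId: string,
    payload: string,
    send: (conn: ConnectionId, payload: string) => void,
  ): number {
    const conns = this.byChannel.get(channelId);
    if (!conns) return 0;
    conns.forEach((conn) => send(conn, payload));
    return conns.size;
  }
}
```

Because everything lives in one process's memory, this is exactly the state that is lost when the single server restarts.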

✅ Simple. Easy to build and debug.
❌ Single point of failure — if this one server crashes or you deploy new code, every user is disconnected at once.
❌ Cannot grow — one machine has a ceiling on how many open sockets it can hold.

This is fine for a demo, but a real app cannot live on one server. So we add more.

Sticky Sessions Explained

The moment you run two or more gateways, a new problem appears. Picture it:

A user's WebSocket is a long-lived, open line to one specific server — say Gateway 1.
All of that user's state (which channels they subscribed to, their presence) is living in Gateway 1's memory.
Now their Wi-Fi blips and they reconnect. The load balancer might send them to Gateway 2 this time — which knows nothing about them. Gateway 2 has to rebuild everything from scratch.

Sticky sessions (also called session affinity) fix this: they tell the load balancer "once a user lands on Gateway 1, keep sending that same user back to Gateway 1."

A simple analogy:

Sticky sessions are like having your own regular barista. They already remember your order, so every visit is fast. Without stickiness, you get a random new barista each time and must re-explain your whole order.

How the load balancer decides who is "sticky" to which server:

Cookie affinity — the load balancer sets a cookie remembering your server.
Source-IP hash — route by the user's IP address (less reliable on mobile, where IPs change).
Consistent hashing on userId / tenantId — a math function that always maps the same user to the same server, and (importantly) reshuffles as few users as possible when a server is added or removed.
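
To make the consistent hashing option concrete, here is a minimal hash-ring sketch. It is an illustration under stated assumptions: the HashRing class, the FNV-1a hash, and the number of virtual points per server are choices made for this example, not what any particular load balancer uses. A production ring would also use binary search instead of a linear scan.

```typescript
// FNV-1a 32-bit hash, used here only as a simple stand-in hash function.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

interface RingPoint {
  hash: number;
  node: string;
}

// Hypothetical consistent-hash ring: each server owns many points on a circle.
class HashRing {
  private points: RingPoint[] = [];
  private readonly replicas: number;

  constructor(replicas: number) {
    this.replicas = replicas;
  }

  addNode(node: string): void {
    for (let i = 0; i < this.replicas; i++) {
      this.points.push({ hash: fnv1a(node + "#" + i), node });
    }
    this.points.sort((a, b) => a.hash - b.hash);
  }

  removeNode(node: string): void {
    this.points = this.points.filter((p) => p.node !== node);
  }

  // The first point at or after the key's hash (wrapping around) owns the key.
  nodeFor(key: string): string | undefined {
    const h = fnv1a(key);
    const owner = this.points.find((p) => p.hash >= h) ?? this.points[0];
    return owner ? owner.node : undefined;
  }
}
```

The useful property is that removing one server only moves the users that were on that server; everyone else keeps the same gateway.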

Note: sticky sessions are a helpful optimization, not a correctness requirement. Because we also store important state in a shared database (next section), a user can survive landing on a different server — it is just slower. Stickiness makes the common case fast.

What the Load Balancer Actually Does

Beginners often think the load balancer "delivers messages." It does not. Its job is much narrower:

Admission — accept the incoming connection, terminate TLS (the s in wss), and pick which gateway this new connection should attach to.
Placement on reconnect — honor stickiness so a returning user goes back to a sensible server.

That is it. The load balancer is a receptionist at the front door — it points you to a desk. It does not carry messages between desks. Moving messages between servers is a totally different job, handled by the event bus (coming up).

Stage B Many Servers and Shared State

Situation: hundreds to ~10,000 users. One server is not enough, so we run multiple gateways behind a load balancer.

Why we are forced to do this:

One server cannot hold every socket forever (memory/socket limits).
You need to deploy new code without disconnecting everyone (rolling deploys — restart servers one at a time).
If one server dies, users on the other servers should be unaffected (failover).

But splitting across servers creates a knowledge problem: user Alice is on Gateway 1, user Bob is on Gateway 2, and neither server knows about the other's users. If presence ("who is online") lives only in each server's private memory, Gateway 1 has no idea Bob is online.

The fix: move shared knowledge out of any single server's memory into a shared store that all servers can read — commonly Redis (a very fast in-memory database).

What moves into the shared store:

Presence — who is online, last-seen times.
Routing metadata — which server currently holds which connections (so other servers can find them).

            +----------------+
 Gateway 1  |                |  Gateway 2
  (Alice)   |     Redis      |   (Bob)
     \      | shared memory  |    /
      \---> | presence,      | <--/
            | routing table  |
            +----------------+

Now any server can answer "is Bob online, and where is he connected?" by asking Redis. But we still have not solved delivery across servers. That is the event bus.

Fan Out Explained With Numbers

Let's make fan-out concrete, because it is the reason the rest of the architecture exists.

Fan-out = taking one incoming message and copying it out to every subscriber.

Imagine a channel with 10,000 members, and it is a busy channel getting 5 messages per second:

5 messages/sec  ×  10,000 subscribers  =  50,000 outbound deliveries/sec

One small input (5/sec) becomes a huge output (50,000/sec). Now imagine that channel's members are spread across 20 different gateways. Each of those 20 servers must push the message to its own slice of those 10,000 users.

So the message must somehow travel from the one server that received it, out to the other 19 servers, so each can deliver to its local users. How does a message hop from server to server? Through the event bus.
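
The arithmetic above can be written as two tiny helpers. The function names are made up for this example, and the per-gateway figure assumes subscribers are spread evenly across gateways, which real traffic rarely does exactly.

```typescript
// Total outbound deliveries per second for one channel.
function outboundDeliveriesPerSec(messagesPerSec: number, subscribers: number): number {
  return messagesPerSec * subscribers;
}

// Share handled by each gateway, assuming an even spread of subscribers (an assumption).
function perGatewayDeliveriesPerSec(
  messagesPerSec: number,
  subscribers: number,
  gateways: number,
): number {
  return outboundDeliveriesPerSec(messagesPerSec, subscribers) / gateways;
}

// The example from the text: 5 msg/sec, 10,000 subscribers, 20 gateways.
const totalPerSec = outboundDeliveriesPerSec(5, 10000);        // 50000
const eachGatewayPerSec = perGatewayDeliveriesPerSec(5, 10000, 20); // 2500
```

So each of the 20 gateways pushes roughly 2,500 frames per second for this one channel, on top of whatever its other channels need.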

Analogy: fan-out is like a teacher making an announcement to a whole school. The teacher says it once (one input), but it must reach every classroom (many outputs). The PA system that carries it to every room is the event bus.

The Event Bus Explained

An event bus (also called pub/sub, short for publish/subscribe) is a central pipe that servers use to broadcast events to each other. A widely used choice is Kafka (Redis Pub/Sub and NATS are common alternatives), but the idea is general.

Here is the whole point in one line:

A gateway that receives a message does not try to contact the other 19 gateways directly. It just publishes the message once to the event bus, and the bus takes care of getting it to whoever needs it.

Why this indirection is worth it:

Decoupling — the receiving server publishes once and moves on immediately. It does not need to know how many other servers exist or where they are. Servers can be added or removed freely.
Durability — Kafka writes events to a log on disk. If a server was briefly down, it can come back and replay the events it missed by reading the log from where it left off.
Buffering (backpressure) — if consumers are momentarily slow, events wait safely in the log instead of being lost.
Parallelism — the log is split into partitions (next section) so many servers can process different slices at the same time.

Gateway 1 receives Alice's message
        |
        |  publish ONCE
        v
   +-------------------+        every gateway that has
   |    Event Bus      | -----> subscribers for this channel
   |   (Kafka log)     |        reads it and delivers locally
   +-------------------+
        |        |        |
        v        v        v
   Gateway 1  Gateway 2  Gateway 3 ...  (each pushes to its own sockets)

Crucial detail beginners miss: the event bus does not know about individual browser sockets. Kafka just moves events between servers. The mapping of "which socket is on which server" still lives in the routing table (Redis). The bus carries the message near the right servers; the servers do the final push to the actual sockets.

Partitioning Explained

If every server had to read every message in the whole app, we would be back to one machine's worth of work on each machine — no real scaling. Partitioning is how we split the firehose into manageable lanes.

A partition is one ordered lane inside the event bus. The bus splits its log into many partitions, and each event is placed into a lane based on a partition key.

partition key = channelId

channel:engineering  --> always lane 3
channel:random       --> always lane 7
channel:design       --> always lane 3

Two big wins from choosing the channelId as the key:

Load spreads out. Channels are distributed across the lanes (several channels may share one, as engineering and design do above), so 100 servers can each own a few lanes and work in parallel, and no single server drowns.
Order is preserved within a channel. Because all of channel:engineering's events go into the same lane, and a lane is strictly ordered, messages in that channel are processed in the exact order they arrived. (This is why, later, we can promise "per-channel ordering" but not "global ordering across all channels" — different channels live in different lanes that move independently.)

Analogy: partitioning is like a post office sorting mail by zip code. Each zip code (channel) always goes to the same sorting bin (partition), so many workers can process different bins at once, and letters within one zip code stay in order.

Picking a good partition key matters:

Too coarse (e.g. key = workspaceId) → one giant workspace overloads a single lane.
Too fine (e.g. key = messageId) → you lose per-channel ordering.
channelId is usually the sweet spot for chat.
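
The key-to-lane mapping can be sketched as "hash the key, then take it modulo the number of partitions". This is only an illustration: partitionFor is a hypothetical helper, and the FNV-1a hash is a stand-in, since real brokers apply their own hash function.

```typescript
// FNV-1a 32-bit hash, a simple stand-in for the broker's own partitioner hash.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Map a partition key (here: channelId) to a lane number in [0, numPartitions).
function partitionFor(key: string, numPartitions: number): number {
  if (!Number.isInteger(numPartitions) || numPartitions <= 0) {
    throw new RangeError("numPartitions must be a positive integer");
  }
  return fnv1a(key) % numPartitions;
}

// The same channelId always lands in the same lane, which preserves per-channel order.
const laneA = partitionFor("channel:engineering", 12);
const laneB = partitionFor("channel:engineering", 12);
// laneA === laneB
```

Note that changing the number of partitions changes the result of the modulo, so the lane of an existing channel can move. That is one reason partition counts are usually chosen with headroom up front.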

Stage C Putting It All Together

Situation: 10,000 to millions of users. Now we combine every tool above into one picture.

Users connect through a load balancer, which places them (stickily) onto a gateway in the nearest region (lower latency).
Each gateway keeps its local sockets and writes its routing info to the shared store (Redis).
When a message comes in, the gateway persists it and publishes once to the event bus (Kafka), keyed by channelId so it lands in the right partition.
A fan-out step reads that partition, looks up the routing table to see which gateways hold subscribers, and hands the message to those gateways.
Each of those gateways does the final push to its own local sockets.
Autoscaling watches connection count and outbound events/sec (plus CPU, memory, and how far behind the consumers are — "lag") and adds/removes gateways automatically.

Mental model to remember the whole journey:

Load balancer picks your desk. Redis remembers who sits where. Kafka carries messages between desks. Partitions keep each channel's mail in its own ordered lane. Fan-out copies the one message to everyone who wants it.

Hot Channels and Hot Connections

Even with all of the above, two things can still overwhelm you:

A hot connection — a single client sending or receiving way too much (a runaway bot, an abusive script, one power user).
A hot channel — one room with enormous fan-out (a company-wide all-hands channel where one message reaches everyone at once).

Protections:

Per-connection limits — rate-limit how fast one user can publish/subscribe (a "token bucket" gives each user a refilling allowance of actions).
Per-channel protection — cap burst fan-out per time window; coalesce high-frequency events (collapse many typing/presence updates into one).
Shard heavy channels — move a giant channel onto its own dedicated servers, so its storm cannot hurt normal channels.
Priority lanes — deliver critical events (real messages) first; delay or drop noisy ones (typing indicators) under stress.
Slow-consumer strategy — if one client cannot keep up, bound its buffer; when it overflows, drop low-priority events or force a replay resync instead of letting memory grow forever.

Migration Checklist

The practical order to actually evolve the system (do not build Stage C on day one):

Start with a single gateway and measure a baseline.
Add a second gateway plus a load balancer.
Introduce a shared presence/subscription store (Redis).
Add an event bus for inter-node fan-out (Kafka).
Partition channels by a stable key (channelId).
Add autoscaling rules from real traffic.
Add hot-channel isolation and rate limits.
Run chaos/failover tests (kill a node, drop a region, trigger a reconnect storm) and confirm recovery.

One-line summary of the whole journey:

1 node: easy and cheap.
10 nodes: shared state (Redis) + sticky routing.
100+ nodes: partitioning + event bus + regional strategy.

14.4 How Traffic Is Handled Safely

A real-time system does not usually die from average traffic. It dies from spikes: a viral moment, a reconnect storm after an outage, or one channel suddenly getting 100x its normal load. This section is about the safety valves that keep the system standing when traffic surges. Think of them as pressure-relief valves in plumbing: they trade a little quality (drop a typing indicator) to avoid a catastrophic failure (the whole pipe bursts).

There are five main techniques. We will explain each from scratch, then show how they work together.

Backpressure Explained

The problem: what happens when data arrives faster than it can be processed? Without a plan, queues grow forever until memory runs out and the process crashes.

Backpressure is the general idea of pushing back when a downstream part cannot keep up — slowing the producer, buffering with a limit, or dropping low-value data — instead of blindly accepting everything.

Analogy: backpressure is like a narrow funnel. If you pour water in faster than it drains, the funnel overflows. Backpressure is either pouring slower, using a bigger-but-still-limited funnel, or letting some water spill on purpose so the important flow continues.

Where it applies in our system:

On the client (rendering): the UI cannot repaint 500 times a second. So instead of re-rendering on every event, the client batches events and repaints once per animation frame (~60 times a second), and coalesces rapid updates (10 presence blinks for the same user collapse into one).
On the server (delivery): if a gateway is producing messages faster than a slow client's socket can drain, the server must not buffer infinitely — it bounds the buffer (see slow-consumer handling below).

Why it matters: backpressure is the difference between a system that degrades gracefully under load and one that falls over. The golden rule: bounded queues everywhere. No queue — client-side or server-side — is ever allowed to grow without a limit.
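
On the client side, batching plus coalescing plus a bounded queue can be sketched like this. CoalescingBatcher and UiUpdate are hypothetical names for this example; the caller is assumed to call flush() once per animation frame. Refusing updates when full is only appropriate for transient events such as presence or typing; for durable messages a real client would trigger a replay instead of dropping.

```typescript
// One pending UI change, keyed so that newer values replace older ones.
interface UiUpdate {
  key: string;   // e.g. "presence:alice"
  value: string; // e.g. "online"
}

// Hypothetical bounded, coalescing batcher for transient UI events.
class CoalescingBatcher {
  private readonly pending = new Map<string, UiUpdate>();
  private readonly maxPending: number;
  private dropped = 0;

  constructor(maxPending: number) {
    this.maxPending = maxPending;
  }

  // Later updates for the same key overwrite earlier ones (coalescing).
  push(update: UiUpdate): boolean {
    const isNewKey = !this.pending.has(update.key);
    if (isNewKey && this.pending.size >= this.maxPending) {
      this.dropped++; // bounded queue: refuse rather than grow without limit
      return false;
    }
    this.pending.set(update.key, update);
    return true;
  }

  // Call once per animation frame; returns the batch to render and empties the queue.
  flush(): UiUpdate[] {
    const batch = Array.from(this.pending.values());
    this.pending.clear();
    return batch;
  }

  droppedCount(): number {
    return this.dropped;
  }
}
```

Ten presence changes for the same user between two frames therefore cost one render, not ten.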

Rate Limiting Explained

The problem: one misbehaving client (a bug, a bot, an attacker) can flood the server with thousands of messages per second and starve everyone else.

Rate limiting caps how many actions a client is allowed in a time window. The most common mechanism is the token bucket:

Each user gets a bucket that holds, say, 20 tokens.
Every action (publish/subscribe) costs 1 token.
The bucket refills at, say, 10 tokens per second.

- Bucket has tokens?  -> allow the action, remove 1 token.
- Bucket empty?       -> reject or delay the action.

Analogy: a token bucket is like an arcade with a coin allowance. You get a handful of coins that slowly refill. You can spend a burst of coins fast (a quick flurry of messages), but once they run out you must wait for more. This allows normal bursts while blocking sustained abuse.
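
The token bucket described above can be sketched in a few lines. The TokenBucket class is a hypothetical illustration using the example numbers from the text (capacity 20, refill 10 per second); the caller passes in the current time so the behavior is easy to reason about.

```typescript
// Hypothetical token bucket: allows short bursts, blocks sustained floods.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start with a full bucket
    this.lastRefillMs = nowMs;
  }

  // Returns true if the action is allowed, false if the client must wait.
  tryConsume(nowMs: number, cost: number = 1): boolean {
    // Refill based on elapsed time, never exceeding capacity.
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);

    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```

With capacity 20 and a refill of 10 per second, a user can fire 20 actions instantly, then is limited to about 10 per second until the bucket fills up again.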

Apply limits at two levels:

Per-user — stops a single account from flooding.
Per-tenant / per-workspace — stops one organization's traffic from starving other tenants on shared infrastructure ("noisy neighbor" protection).

When a client is limited, respond with a clear signal (an error frame or a retry-after hint) so a well-behaved client backs off instead of hammering harder.

Fan Out Control Explained

The problem: recall from 14.3 that fan-out multiplies one message into many deliveries. A popular channel (say a 50,000-member announcements channel) can turn a single post into a burst of 50,000 deliveries all at once — a spike that can overwhelm gateways and the network.

Fan-out control smooths and spreads that burst so it does not hit all at once:

Queues + workers — instead of delivering to 50,000 sockets synchronously in one loop, push the delivery work into a queue and let a pool of workers drain it steadily. The burst becomes a fast-but-controlled stream rather than one giant spike.
Partitioning — (from 14.3) spread the channel's delivery work across many partitions/servers so no single server carries the whole 50,000-way fan-out.
Coalescing / batching — if several updates happen within a few milliseconds, combine them into one frame per subscriber instead of many.
Sharding a hot channel — for truly enormous channels, split the member list across multiple dedicated delivery groups so the fan-out is parallelized.

Analogy: fan-out control is like a stadium emptying through many exits with staggered timing instead of everyone stampeding one door at the same second.

Slow Consumer Handling Explained

The problem: not all clients drain data at the same speed. A user on weak mobile signal, a backgrounded tab, or an overloaded laptop reads its socket slowly. The server keeps producing events for them, and the unsent data piles up in a server-side buffer for that one connection. If unbounded, that buffer eats memory and can take the whole server down — one slow user hurting everyone.

Slow-consumer handling protects the server from its slowest clients:

Bound the per-socket buffer — each connection gets a maximum outbound buffer (e.g. 1 MB). It is never allowed to grow past that.
When the buffer fills, shed load intelligently:
- Drop low-value, transient events first (typing indicators, presence blinks) — nobody misses a stale "typing…".
- Keep critical, durable events (real messages, moderation actions).
If it still overflows, stop streaming and force a resync — disconnect or mark the client "behind," and let it recover via replay (next section) once it reconnects, rather than trying to push a huge backlog through a straw.
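
The shedding rules above can be sketched as a bounded per-connection buffer. This is an illustration only: OutboundBuffer, the two priority names, and the byte sizes are assumptions. The policy shown is one reasonable reading of the list: a new transient frame is dropped when the buffer is full, queued transient frames are evicted to make room for a critical frame, and if a critical frame still does not fit, the buffer is cleared and the connection is marked for resync.

```typescript
type FramePriority = "critical" | "transient";
type EnqueueResult = "queued" | "dropped" | "resync";

interface OutboundFrame {
  label: string;
  bytes: number;
  priority: FramePriority;
}

// Hypothetical bounded outbound buffer for one slow connection.
class OutboundBuffer {
  private readonly maxBytes: number;
  private frames: OutboundFrame[] = [];
  private usedBytes = 0;
  private resyncNeeded = false;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  enqueue(frame: OutboundFrame): EnqueueResult {
    if (this.resyncNeeded) return "resync";
    if (this.usedBytes + frame.bytes > this.maxBytes) {
      // Cheap events are simply shed when there is no room.
      if (frame.priority === "transient") return "dropped";
      // Make room for a critical frame by evicting queued transient frames.
      this.evictTransient(frame.bytes);
      if (this.usedBytes + frame.bytes > this.maxBytes) {
        // Still too full: stop streaming and let the client recover via replay.
        this.frames = [];
        this.usedBytes = 0;
        this.resyncNeeded = true;
        return "resync";
      }
    }
    this.frames.push(frame);
    this.usedBytes += frame.bytes;
    return "queued";
  }

  // Drop queued transient frames, oldest first, until the incoming frame fits.
  private evictTransient(incomingBytes: number): void {
    const kept: OutboundFrame[] = [];
    for (const frame of this.frames) {
      const stillOver = this.usedBytes + incomingBytes > this.maxBytes;
      if (stillOver && frame.priority === "transient") {
        this.usedBytes -= frame.bytes;
      } else {
        kept.push(frame);
      }
    }
    this.frames = kept;
  }

  queuedLabels(): string[] {
    return this.frames.map((f) => f.label);
  }

  needsResync(): boolean {
    return this.resyncNeeded;
  }
}
```

The important property is that memory per connection has a hard ceiling, whatever the client's network speed.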

Analogy: slow-consumer handling is like a conveyor belt with a full bin at the end. Rather than letting boxes pile up and jam the whole line, you either divert the low-priority boxes or stop the belt and restart cleanly later.

Key principle: one slow client must never be able to degrade the experience of fast clients. Isolation is the goal.

Replay and Recovery Explained

The problem: disconnects are normal (Wi-Fi drops, tab sleeps, deploys). When a client comes back, it may have missed events. We must fill that gap without re-sending everything (wasteful) and without leaving holes (broken UI).

Replay and recovery solve this using the sequence numbers we attach to every event per channel:

Each event in a channel carries an increasing sequence (1, 2, 3, ...).
The client remembers the last sequence it successfully processed (its cursor).
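On reconnect, the client sends its cursor as fromSeq and asks: "give me everything after fromSeq" (exclusive: the event at fromSeq itself is not re-sent), so it receives only the missed slice, nothing more.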
The server returns just those events; the client applies them and is caught up.

Client last saw sequence 1240, then disconnected.
While offline, the channel advanced to 1245.

Reconnect -> GET /replay?fromSeq=1240
Server    -> events 1241, 1242, 1243, 1244, 1245
Client    -> applies them, cursor now 1245, fully caught up.

Two safety nets make this robust:

Gap detection — if a live event arrives with a sequence that skips ahead (e.g. you have 1240 and suddenly get 1243), the client knows it missed 1241–1242 and triggers a replay to fill the hole.
Idempotent apply (dedupe by eventId) — replay may overlap with events the client already saw; deduping by eventId means applying the same event twice is harmless. The user still sees each message exactly once.
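
The cursor, gap detection, and dedupe rules can be combined into one small client-side helper. This is a sketch under assumptions: ChannelCursor, ChannelEvent, and the result names are hypothetical, and a real client would cap the size of the seen-ID set rather than let it grow forever.

```typescript
interface ChannelEvent {
  eventId: string;
  sequence: number;
}

type ApplyResult = "applied" | "duplicate" | "gap";

// Hypothetical per-channel cursor: tracks the last processed sequence.
class ChannelCursor {
  private lastSequence: number;
  // Assumption: unbounded here for brevity; a real client would bound this set.
  private readonly seenIds = new Set<string>();

  constructor(lastSequence: number) {
    this.lastSequence = lastSequence;
  }

  apply(ev: ChannelEvent): ApplyResult {
    // Already processed (by id or by sequence): applying again must be harmless.
    if (this.seenIds.has(ev.eventId) || ev.sequence <= this.lastSequence) {
      return "duplicate";
    }
    // Sequence skipped ahead: the caller should request a replay from cursor().
    if (ev.sequence > this.lastSequence + 1) {
      return "gap";
    }
    this.seenIds.add(ev.eventId);
    this.lastSequence = ev.sequence;
    return "applied";
  }

  // The value to send as fromSeq when asking for a replay.
  cursor(): number {
    return this.lastSequence;
  }
}
```

Using the numbers from the example: with the cursor at 1240, a live event 1243 returns "gap", the replay of 1241 to 1245 is applied in order, and a late copy of 1243 is then reported as "duplicate".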

Analogy: replay is like catching up on a group chat after your phone was off — you do not re-read the whole history, you just scroll from the last message you saw. The sequence number is your "last read" bookmark.

How They Work Together

These five are layers of the same defense, from the moment traffic enters to the moment a client recovers:

incoming action
   |
   v
[Rate limiting]      -> block abusive senders at the door
   |
   v
[Fan-out control]    -> spread one message's delivery over queues/workers/partitions
   |
   v
[Backpressure]       -> bounded queues everywhere; batch + coalesce
   |
   v
[Slow-consumer]      -> protect the server from clients that read slowly
   |
   v
[Replay + recovery]  -> whatever got dropped is recovered cleanly on reconnect

The unifying philosophy: under stress, degrade gracefully. Drop the cheap stuff (typing, presence), protect the valuable stuff (messages), never let any queue grow unbounded, and rely on replay to make any drops invisible to the user.

14.5 Capacity Metrics to Track

Concurrent socket connections
Messages in/out per second
P95/P99 delivery latency
Reconnect rate
Replay request rate
Dropped/failed message count

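Simple stress signal: if reconnect rate and replay rate spike together, clients are losing their sockets and then catching up on missed events, which usually points to gateway or network instability rather than ordinary message load.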

14.6 Deep Theory Kafka Routing Table and Consumers

This answers the exact question interviewers love: the sender is connected to Server A, but the receiver is connected to Server B — how does the message get across?

A) The cross-node flow, step by step:

Sender is connected to Gateway Node A.
Receiver is connected to Gateway Node B.
Sender sends a frame to Node A.
Node A validates auth + publish permission.
Node A persists the message and publishes one event to Kafka (key usually = channelId).
A fan-out/routing component reads that event and checks the routing table.
The routing table returns the target gateway nodes for this channel (Node B, maybe Node A too).
Delivery tasks go to those target nodes.
Node B pushes the frame to the receiver's socket.
The receiver's UI renders it.

So the event bus (Kafka) is the bridge across nodes.

B) Why Kafka? Kafka is for backend event distribution and durability, not for talking to sockets directly. It gives you:

A durable log (survives restarts).
Decoupling (Node A publishes fast; delivery happens asynchronously).
A buffer for backpressure (consumers can lag and catch up).
Partitioned scale (parallel processing).
Replay by offsets.

Kafka does not know about live sockets — the socket-to-node mapping comes from the routing table, not Kafka.

C) The routing table — the "phone book" of who is connected where:

channelId -> set(gatewayNodeIds)
gatewayNodeId -> set(connectionIds)
connectionId -> userId / session / subscriptions

Who updates it: each gateway updates its local memory on connect/subscribe/unsubscribe/disconnect, and writes to a shared store (Redis) for cross-node visibility.
Who reads it: the fan-out router when it consumes a new event (in small systems, Node A may read it directly).
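
The channel-to-nodes part of the routing table can be sketched as below. This is an in-memory stand-in for the shared store, not real Redis client code: the RoutingTable class, its method names, and the TTL value are assumptions. Each gateway is assumed to re-register on a heartbeat, so entries from a dead node expire on their own.

```typescript
// Hypothetical in-memory stand-in for the shared routing table:
// channelId -> (gatewayNodeId -> expiry time in ms).
class RoutingTable {
  private readonly expiryByChannel = new Map<string, Map<string, number>>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Called by a gateway on subscribe, and again on each heartbeat to stay listed.
  register(channelId: string, nodeId: string, nowMs: number): void {
    let nodes = this.expiryByChannel.get(channelId);
    if (!nodes) {
      nodes = new Map<string, number>();
      this.expiryByChannel.set(channelId, nodes);
    }
    nodes.set(nodeId, nowMs + this.ttlMs);
  }

  // Called when a gateway's last local subscriber for the channel leaves.
  unregister(channelId: string, nodeId: string): void {
    const nodes = this.expiryByChannel.get(channelId);
    if (nodes) nodes.delete(nodeId);
  }

  // Used by the fan-out router: which gateways should receive this channel's event?
  targetNodes(channelId: string, nowMs: number): string[] {
    const nodes = this.expiryByChannel.get(channelId);
    if (!nodes) return [];
    const live: string[] = [];
    Array.from(nodes.entries()).forEach(([nodeId, expiresAt]) => {
      if (expiresAt > nowMs) {
        live.push(nodeId);
      } else {
        nodes.delete(nodeId); // evict stale entries from crashed or silent nodes
      }
    });
    return live.sort();
  }
}
```

The TTL is what keeps the table honest: a node that crashes without unregistering simply stops heartbeating and drops out after ttlMs.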

D) Consumer configuration patterns:

Pattern 1 — every gateway consumes everything (simple, wasteful): each node reads all events and throws away those with no local subscribers. Easy to start, poor at scale.
Pattern 2 — targeted fan-out (recommended): publish once, a fan-out service resolves target nodes via the routing table, then writes to per-node delivery streams so each gateway consumes only its own stream.

Sender on Node A
    |
    | 1) ws frame: message.send
    v
+-----------+      2) publish(channelId key)      +----------------+
| Gateway A | -----------------------------------> | Kafka Topic(s) |
+-----------+                                      +----------------+
                                                      |
                                                      | 3) consume
                                                      v
                                               +---------------+
                                               | Fanout Router |
                                               +---------------+
                                                      |
                                   4) read routing table: channel -> nodes
                                                      |
                       +------------------------------+-----------------------------+
                       |                                                            |
                       v                                                            v
              +-------------------+                                        +-------------------+
              | Node B delivery Q |                                        | Node A delivery Q |
              +-------------------+                                        +-------------------+
                       |                                                            |
                       | 5) consume only own queue                                 | (optional)
                       v                                                            v
                  +-----------+                                                +-----------+
                  | Gateway B |                                                | Gateway A |
                  +-----------+                                                +-----------+
                       |                                                            |
                       | 6) push to local sockets                                  | local subscriber(s)
                       v                                                            v
                   Receiver(s)                                                  Local receiver(s)

E) Can Node A also be a target? Yes — if Node A also has local subscribers for that channel, it is a valid target too. The distinction that matters: "Node A can be a target" is fine; "every node consumes every event" is what gets expensive.

F) Where does the load balancer fit? It handles connection admission (TLS + routing the handshake to a gateway) and reconnect placement (sticky/cookie/hash). It does not do cross-node message fan-out — that is Kafka + fan-out service + routing table.

G) Failure and correctness notes:

If Node B dies, the routing table must evict its stale entries (via TTL/heartbeat).
If a consumer lags, Kafka offsets let it catch up.
If the receiver is offline, the message stays durable and is delivered later via replay.
Ordering is guaranteed per partition key, so your keying strategy matters.

14.7 Ordering Policy

Be explicit about what order is guaranteed:

Per-channel ordering: guaranteed when all of a channel's events share one partition key.
Cross-channel ordering: not guaranteed.
Cross-region ordering: eventual — may be delayed or reordered.
Client rendering rule: sort by sequence within a channel.
Gap rule: if nextSequence > lastSequence + 1, trigger a replay; if nextSequence <= lastSequence, treat the event as a duplicate and drop it.

Tie-break order when two events look equal: primary sequence, then server timestamp, then lexical eventId.
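
The tie-break order can be expressed as a comparator for Array.prototype.sort. The OrderedEvent shape and its field names are assumptions for this example; the eventId comparison is a plain string comparison, which is what "lexical" is taken to mean here.

```typescript
interface OrderedEvent {
  sequence: number;
  serverTimestamp: number; // assumed to be milliseconds since epoch
  eventId: string;
}

// Sort by sequence, then server timestamp, then eventId (plain string order).
function compareEvents(a: OrderedEvent, b: OrderedEvent): number {
  if (a.sequence !== b.sequence) return a.sequence - b.sequence;
  if (a.serverTimestamp !== b.serverTimestamp) return a.serverTimestamp - b.serverTimestamp;
  if (a.eventId < b.eventId) return -1;
  if (a.eventId > b.eventId) return 1;
  return 0;
}

// Usage: render a channel's events in a stable, deterministic order.
function sortForRender(events: OrderedEvent[]): OrderedEvent[] {
  return events.slice().sort(compareEvents);
}
```

Because the comparator never returns 0 for two events with different eventIds, every client that receives the same events renders them in the same order.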

14.8 Observability SLOs and Alerts

An SLO is a target you promise to hit. Suggested SLOs:

Real-time delivery latency P99 < 2s
Socket connect success rate > 99.9%
Reconnect recovery under 30s for 99% of sessions
Replay success rate > 99.5%

Suggested alerts:

Reconnect spike — reconnect rate > 3× baseline for 5 min.
Replay spike — replay requests > 2× baseline for 10 min.
Consumer lag — lag exceeds threshold for 5 min.
Error burst: the auth/subscribe error rate exceeds its normal range (for example, its usual P99 value).
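
Rules of the form "rate > N× baseline for M minutes" can be sketched as a check over recent per-minute samples. The sustainedSpike function and its parameters are hypothetical; real alerting systems express this in their own rule language, so treat this only as a statement of the logic.

```typescript
// Returns true when every one of the last `windowMinutes` samples
// is above `factor` times the baseline. Samples are per-minute, oldest first.
function sustainedSpike(
  perMinuteRates: number[],
  baseline: number,
  factor: number,
  windowMinutes: number,
): boolean {
  if (windowMinutes <= 0 || perMinuteRates.length < windowMinutes) return false;
  const recent = perMinuteRates.slice(-windowMinutes);
  return recent.every((rate) => rate > baseline * factor);
}

// Reconnect spike rule from the list: more than 3x baseline for 5 minutes.
const reconnectsPerMinute = [120, 350, 400, 310, 320, 390];
const reconnectAlert = sustainedSpike(reconnectsPerMinute, 100, 3, 5); // true
```

Requiring the whole window to be above the threshold is what stops a single noisy minute from paging someone.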

Minimal runbook when something fires:

Check regional gateway health.
Check broker lag and partition hot spots.
Check auth/token service latency.
If needed, turn on protective throttling and coalescing.

14.9 Testing Strategy and Validation Gates

Test before you ship:

Functional — connect, subscribe, publish, ack, replay, dedupe.
Resilience — node restart, broker restart, network partition, reconnect storm.
Scale — many concurrent sockets, fan-out stress, hot-channel stress.
Security — expired token, revoked token, unauthorized subscribe/publish.

Pass criteria examples:

No message loss in replay scenarios.
Duplicate rate within threshold and hidden by UI dedupe.
P99 latency under SLO at target load.
Recovery after a single-node failure meets the reconnect SLO.

15. Final Summary

If you can explain the design in these four lines, you understand it well enough for most interviews:

Open one live socket after login.
Subscribe only to the channels you need.
Update the UI instantly with the optimistic + ack flow.
Reconnect and replay missed events after a disconnect.

And the one deeper insight that separates a beginner from a strong answer:

The hard part of real-time is not opening a socket — it is fan-out at scale (delivering one message to many users across many servers) and recovering correctly after failures (dedupe by eventId, order by sequence, replay from a cursor).

More Details:

Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli

Git: https://github.com/ZeeshanAli-0704/front-end-system-design
