
Gabriel Anhaia

Designing WhatsApp's Typing Indicator: The Question That Tests Your Real-Time Skills


Picture walking into a system design loop at a major messaging company. The interviewer skips the usual "design a chat app" prompt and asks something narrower: design the three little dots. The typing indicator. The small grey ellipsis that appears under your contact's name and disappears half a second after they stop tapping.

It sounds like a junior question. It is not. The typing indicator is the cleanest separation of concerns in a chat product, and it tests four things at once: pubsub vs persistent connection, transport choice for server-to-client push, ephemeral vs durable storage, and backpressure when a million people start typing at the same time. If you can design the indicator, you can design the rest of WhatsApp. The answer that ships in production is not the answer that comes to mind first. (Note: what follows is an interview sketch — inferred from public real-time-systems patterns, not WhatsApp's published architecture.)

What the indicator actually is

A typing event is a tiny piece of state with three properties. It is per-pair (sender → recipient), it is short-lived (a couple of seconds), and it has no value if it arrives late. If the indicator shows up after the message it was predicting, it is worse than not showing up at all.

That last property is the design constraint everything else follows from. You will never persist this. You will never durably queue it. You will never retry it on failure. If a typing event arrives 500ms late, you drop it on the floor. If the recipient is offline, you drop it. If the network blips, you drop it. The user does not need to know that someone was typing five minutes ago — only that someone is typing right now.

That single decision separates ephemeral signals (typing, online/offline, "seen" indicators) from durable messages (the messages themselves). Two different storage stories, two different delivery guarantees, two different failure modes. Conflating them is the most common junior mistake on this question.

Why pubsub, and where the persistent connection lives

You have one client (the recipient's phone) waiting for events from one specific other client (the sender). Multiplied by every active conversation. The naive design is HTTP long-polling — the recipient opens a request, the server holds it until something happens, returns, the recipient opens another. It works. It also burns a TCP connection per held request, breaks every load balancer's idle timeout, and falls over above a few thousand concurrent users per box.

The right transport for typing indicators is a persistent WebSocket per device. One connection, full-duplex, both sides push. Server-Sent Events would also work, but only the server pushes — and the client needs to send "I am typing" up the same pipe, so a separate request is wasteful.

Inside the server, you cannot have every gateway box hold the connections of every recipient. You shard. Each recipient's WebSocket lands on one gateway. Now the question is: when sender S types into a chat with recipient R, how does S's gateway tell R's gateway?

The answer is a pubsub layer. The sender's gateway publishes a typing event to a channel keyed by the recipient. The recipient's gateway is subscribed to a channel keyed by every recipient connected to it. When the publish lands on the right channel, the recipient's gateway pushes the event down R's WebSocket. Redis Pub/Sub is a common pick for this hop; many real-time products run variants of this pattern.

WebSocket vs SSE vs long polling

Senior interviewers will press on transport. The answer is workload-specific.

Long polling is the floor. It works everywhere (every browser, every proxy, every corporate firewall) and is fine for low-frequency events. For typing indicators, you would burn three to five HTTP round trips per typing burst. Doable, ugly.

Server-Sent Events is one-way push from server to client over a single HTTP connection. Cheap, simple, browser-native via EventSource. You still need a separate channel (regular HTTP or WebSocket) for the client to send "typing started" up to the server. Workable for read-heavy event streams. Not great for the typing case where there is symmetric traffic.

WebSocket is full-duplex over one connection. The right pick for chat-shaped workloads. The cost is operational: WebSockets do not play well with stateless load balancers, you have to handle reconnect storms after a deploy, and idle timeouts on intermediaries (around 60s on AWS ALB by default; see AWS docs) require ping/pong heartbeats. Worth it for chat — would be overkill for a stock ticker.

The senior answer is: WebSocket per device for chat, with SSE fallback for restrictive networks, and long polling as the floor. Three transports, one logical channel.

Why ephemeral signals need a different store

Here is where the design splits. Messages must be durable. They go through the message service, into a write-ahead log, into a sharded message store (Cassandra, Bigtable, or a custom log), and ack the sender only when persisted. Multi-region replication, exactly-once semantics, the works.

Typing indicators must not be durable. They go through Redis Pub/Sub and nowhere else. No retry, no replay, no log. If you store them in Cassandra alongside messages, you are paying for replication, compaction, and disk writes for an event that will be useless in 800ms. At WhatsApp scale, that is millions of writes per second to a database that does not need to keep them.

The same logic applies to presence (online/offline) and read receipts. Presence is a TTL'd key in Redis with a heartbeat. Read receipts are arguably durable (you want them to survive a reconnect) but typing is not. Two stores, two contracts.

This is the cleanest test of whether a candidate understands the message vs signal distinction. Most teams in production run two pipelines: a durable one for messages, an ephemeral one for signals. Cross-talk between them only on observability, not on data.

Backpressure and presence storms

The interesting failure mode at scale is the storm. A celebrity sends a broadcast message to a group of a few hundred people (WhatsApp-sized groups have historically capped in the hundreds, though the exact ceiling has moved over time). All those phones see "X is typing" — and within a beat, most of them start replying. Now the celebrity's gateway is publishing one typing event per active replier through Redis Pub/Sub, fanning out to as many gateways, each pushing to one WebSocket. Pub/Sub fan-out is cheap; the WebSockets pushing the event downstream are not, and the celebrity's phone is now receiving a flood of typing events from every active chat.

Three defenses.

Debounce on the sender. The client only emits "typing" once every 3 seconds, and emits "stopped typing" 5 seconds after the last keystroke. The server discards typing events that arrive faster than once every 2 seconds per pair. This kills most of the fan-out before it reaches the network.
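The client-side half of that debounce can be written as pure logic; the real client would push the returned events up its WebSocket. The class name, the tick-driven "stopped" timer, and the exact thresholds here are illustrative assumptions:

```python
TYPING_EMIT_S = 3.0   # re-emit "typing" at most every 3s while keys are pressed
STOP_AFTER_S = 5.0    # emit "stopped" 5s after the last keystroke

class TypingDebouncer:
    """Decides which events each keystroke (or timer tick) should emit."""

    def __init__(self):
        self.last_emit = float("-inf")       # when we last sent "typing"
        self.last_keystroke = float("-inf")

    def on_keystroke(self, now: float) -> list[str]:
        self.last_keystroke = now
        if now - self.last_emit >= TYPING_EMIT_S:
            self.last_emit = now
            return ["typing"]
        return []

    def on_tick(self, now: float) -> list[str]:
        # Runs on a periodic timer; fires "stopped" once the keyboard goes quiet.
        if self.last_emit > float("-inf") and now - self.last_keystroke >= STOP_AFTER_S:
            self.last_emit = float("-inf")   # reset so the next keystroke re-emits
            return ["stopped"]
        return []
```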

Coalesce at the gateway. If the same recipient is connected to one gateway and several senders are typing in different chats with that recipient, the gateway can batch the WebSocket push. One frame, multiple events. The recipient's UI only updates at most every 100ms anyway.
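Coalescing can be a single function. This sketch (field and frame names are assumptions) keeps only the latest event per sender and packs the batch into one frame:

```python
import json

def coalesce(pending: list[dict]) -> str:
    """Batch several typing events bound for one recipient into a single
    WebSocket frame. Keeps only the newest event per sender, since an
    older "X is typing" is superseded by a newer one."""
    latest: dict[str, dict] = {}
    for event in pending:
        latest[event["from"]] = event    # later events overwrite earlier ones
    return json.dumps({"type": "typing_batch", "events": list(latest.values())})
```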

Drop, do not buffer. When the gateway's outbound WebSocket queue is over a threshold (say 100 pending frames), drop typing events first. Drop presence updates next. Drop messages last (and only after escalating). The transport must shed load gracefully when the consumer is slow.
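The drop policy is a few lines once you rank the frame types. A sketch, with the threshold and type names as assumptions (it drops a whole class at a time, which is cruder than production code would be, but shows the ordering):

```python
QUEUE_LIMIT = 100  # max pending outbound frames per connection (illustrative)

def shed(queue: list[dict], incoming: dict) -> list[dict]:
    """Admit `incoming` to an outbound frame queue, shedding the most
    expendable kinds first when over the limit: typing events go first,
    presence updates next, messages are always kept."""
    queue.append(incoming)
    for kind in ("typing", "presence"):
        if len(queue) <= QUEUE_LIMIT:
            break
        queue[:] = [frame for frame in queue if frame["type"] != kind]
    return queue
```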

A 60-line sketch

A minimal Python WebSocket gateway that publishes typing events through Redis Pub/Sub with debounce. Single-process, in-memory connection registry. Real production code adds reconnect handling, auth, and a sharded gateway pool.

import asyncio, json, time
import redis.asyncio as redis
import websockets

# user_id -> WebSocket connection
connections: dict[str, websockets.WebSocketServerProtocol] = {}
# (sender, recipient) -> last sent timestamp
last_sent: dict[tuple[str, str], float] = {}
DEBOUNCE_S = 2.0

async def gateway(ws):
    user_id = await ws.recv()           # auth handshake
    connections[user_id] = ws
    rds = redis.Redis()

    sub_task = asyncio.create_task(
        subscribe_for_user(user_id, ws, rds)
    )
    try:
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] != "typing":
                continue
            recipient = event["to"]
            key = (user_id, recipient)
            now = time.monotonic()
            # default to -inf so the first event for a pair is never suppressed
            if now - last_sent.get(key, float("-inf")) < DEBOUNCE_S:
                continue
            last_sent[key] = now
            await rds.publish(
                f"typing:{recipient}",
                json.dumps({"from": user_id, "ts": now}),
            )
    finally:
        sub_task.cancel()
        connections.pop(user_id, None)

The receive side is symmetric: subscribe to the recipient's channel, push frames down the WebSocket, and bail the moment the consumer is too slow to keep up.

async def subscribe_for_user(user_id, ws, rds):
    pubsub = rds.pubsub()
    await pubsub.subscribe(f"typing:{user_id}")
    async for msg in pubsub.listen():
        if msg["type"] != "message":
            continue
        try:
            await asyncio.wait_for(
                ws.send(msg["data"]), timeout=0.5
            )
        except Exception:                # asyncio.TimeoutError included
            return                       # drop slow consumer

async def main():
    async with websockets.serve(gateway, "0.0.0.0", 8765):
        await asyncio.Future()

asyncio.run(main())

Sixty-ish lines. The shape is the whole answer: one connection per user, Redis Pub/Sub as the cross-gateway bus, debounce on send, drop-on-slow on receive. No persistence anywhere in the typing path. If the recipient is offline, no subscriber on the channel — the publish is a no-op, the event vanishes. Exactly what you want.

What the question is really testing

Typing indicators are a forcing function. The candidate who passes shows three things: they pick WebSocket without prompting and can defend it against SSE and long polling, they separate ephemeral signals from durable messages and pick different stores for each, and they reach for Redis Pub/Sub (or an equivalent) for the cross-gateway hop without trying to make the durable message bus do double duty.

The candidate who fails usually does one of two things. They try to put typing events into the same Kafka topic as messages "for consistency." Or they try to scale the WebSocket layer without sharding the recipient registry, and end up describing every gateway holding every user's connection state. Both are failure modes you only learn about by having shipped real-time at scale, which is exactly why the senior interviewer is asking.

If this was useful

The full WhatsApp design (messages, presence, group fan-out, multi-region) is one of the 15 walkthroughs in System Design Pocket Guide: Interviews. And the cross-gateway pubsub patterns generalize well past chat: the Event-Driven Architecture Pocket Guide covers the durable side (outbox, saga, CQRS) and the traps that look like obvious wins until you ship them.

System Design Pocket Guide: Interviews

Event-Driven Architecture Pocket Guide
