Why Designing a Real-Time Chat Application Is Hard
Designing a real-time chat application is significantly more complex than building systems like a URL shortener or a notification service.
The main reasons are:
- Real-time bidirectional communication
- Handling millions of concurrent connections
- Ensuring low latency
- Managing message persistence and offline delivery
Unlike simple request-response systems, chat applications require persistent connections and instant delivery at scale.
Functional Requirements
- 1-to-1 messaging
- Group messaging
- Message persistence
- Offline message delivery (messages should be delivered when a user comes online)
Non-Functional Requirements
- Scalable to millions of users
- Low end-to-end message latency (< 500 ms)
- Fault tolerant
- Highly available
- Durable storage
Choosing the Correct Communication Protocol
Since our latency requirement is under 500 ms, traditional short polling and long polling are not ideal: both introduce extra delay and repeated connection overhead.
Server-Sent Events (SSE) are also not suitable because they support only one-way communication (server → client), whereas a chat system requires two-way communication.
Therefore, we use WebSockets, which provide:
- Persistent connections
- Bidirectional communication
- Low latency
- Reduced network overhead
Modern messaging platforms like WhatsApp use persistent connections to achieve real-time communication.
High-Level Architecture

Our system consists of the following components:
1. Client
The client maintains a WebSocket connection with the server to send and receive messages.
2. Load Balancer
The load balancer distributes incoming WebSocket connections across multiple chat servers to ensure scalability and high availability.
3. Chat Servers
Chat servers handle the core business logic:
- Manage WebSocket connections
- Validate messages
- Store messages in the database
- Deliver messages to recipients
4. Redis
Because a message's recipient may be connected to a different chat server than the sender, every server needs a way to find out where any given user is connected. We store these connection mappings in Redis.
Example:
userId → serverId / connectionId
This allows any server to determine whether a user is online and where to route the message.
Database
We use a scalable NoSQL database such as Amazon DynamoDB or any key-value store because:
- We require high write throughput
- We do not need strict ACID guarantees
- Horizontal scaling is easier
1-to-1 Message Flow
- The sender sends a message via WebSocket.
- The chat server validates and stores the message in the database (for persistence).
- The server checks Redis to determine whether the recipient is online.
- If the recipient is online: the message is delivered immediately via WebSocket.
- If the recipient is offline: the message remains stored in the database and is delivered when the user reconnects.
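The steps above can be sketched as follows, with dictionaries standing in for Redis and the database; class and field names are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class ChatServer:
    """Minimal sketch of the 1-to-1 flow; dicts stand in for Redis and the DB."""
    online: dict = field(default_factory=dict)    # user_id -> inbox (stand-in for a WebSocket)
    messages: list = field(default_factory=list)  # durable message store

    def send_direct(self, sender: str, recipient: str, text: str) -> str:
        # 1. Persist first, so the message survives a crash mid-delivery.
        record = {"from": sender, "to": recipient, "text": text, "delivered": False}
        self.messages.append(record)
        # 2. Check the connection registry (Redis in the real system).
        conn = self.online.get(recipient)
        if conn is not None:
            # 3. Recipient online: push over the "WebSocket" immediately.
            conn.append(record)
            record["delivered"] = True
            return "delivered"
        return "stored"  # delivered later, on reconnect

    def reconnect(self, user: str) -> list:
        """On reconnect, flush any undelivered messages to the user."""
        inbox = self.online.setdefault(user, [])
        for record in self.messages:
            if record["to"] == user and not record["delivered"]:
                inbox.append(record)
                record["delivered"] = True
        return inbox
```

Persisting before attempting delivery is the key ordering choice here: if the server crashes between steps, the message is still recoverable from storage.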
Group Chat Message Flow
- A user sends a message to a group.
- The message is stored in the database with the group ID.
- The server retrieves the list of group members.
- For each member: Check Redis for their connection.
- If online → deliver via WebSocket.
- If offline → deliver when they reconnect.
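A rough sketch of the routing step, assuming in-memory dicts in place of the group-membership table and the Redis registry (the group and server names are made up for illustration):

```python
# Stand-ins: group membership (DB) and the connection registry (Redis).
groups = {"g1": {"alice", "bob", "carol"}}           # group_id -> member set
online = {"alice": "server-1", "carol": "server-2"}  # user_id -> server_id

def route_group_message(group_id: str, sender: str):
    """Split the group's members into (deliver_now, deliver_later) lists."""
    deliver_now, deliver_later = [], []
    for member in sorted(groups[group_id] - {sender}):
        if member in online:
            # Online: forward to the server holding the member's socket.
            deliver_now.append((member, online[member]))
        else:
            # Offline: the stored copy is delivered on reconnect.
            deliver_later.append(member)
    return deliver_now, deliver_later
```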
Challenges
Designing the architecture is only the beginning. The real complexity lies in handling the following challenges at scale.
Scaling Millions of WebSocket Connections
Each active user maintains a persistent WebSocket connection with the server.
Problems:
- Each connection consumes memory.
- A single server can handle only a limited number of concurrent connections.
- Sudden traffic spikes (e.g., during peak hours) can overwhelm servers.
Solutions:
- Scale horizontally across multiple chat servers.
- Keep servers stateless.
- Store connection metadata in a centralized store like Redis.
- Use load balancers to distribute traffic evenly.
This ensures we can scale to millions of concurrent users.
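One common way to spread sticky WebSocket connections across the fleet (an assumption here; the article does not prescribe a specific algorithm) is rendezvous hashing, which keeps most user-to-server assignments stable when servers are added or removed:

```python
import hashlib

def pick_server(user_id: str, servers: list[str]) -> str:
    """Rendezvous (highest-random-weight) hashing: each user consistently
    maps to one server, and removing a server only remaps that server's
    own users rather than reshuffling everyone."""
    def score(server: str) -> int:
        digest = hashlib.sha256(f"{server}:{user_id}".encode()).hexdigest()
        return int(digest, 16)
    return max(servers, key=score)
```

This matters for WebSockets specifically because reassigning a user means dropping and re-establishing a persistent connection, which round-robin balancing alone would do constantly.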
The Fan-Out Problem in Group Chats
When a user sends a message in a group with 10,000 members, the system must deliver that message to all members.
This creates a massive delivery overhead.
Two common approaches:
Fan-out on Write
- When a message is sent, it is immediately distributed to all group members.
- Faster reads.
- Heavy write amplification.
Fan-out on Read
- Store one copy of the message.
- Deliver it only when users fetch or reconnect.
- Reduces write load but increases read complexity.
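The two strategies can be contrasted in a few lines, using per-user inboxes and a per-group log as stand-ins for real storage (names are illustrative):

```python
from collections import defaultdict

group_log = defaultdict(list)   # group_id -> [message, ...] (one shared copy)
user_inbox = defaultdict(list)  # user_id  -> [message, ...] (one copy per member)

def fanout_on_write(members: list[str], message: str) -> int:
    """Copy the message into every member's inbox at send time.
    Reads become a single inbox lookup; writes are O(group size)."""
    for member in members:
        user_inbox[member].append(message)
    return len(members)  # write amplification factor

def fanout_on_read(group_id: str, message: str) -> None:
    """Store one copy; members pull from the group log when they read."""
    group_log[group_id].append(message)

def read_group(group_id: str, since: int) -> list[str]:
    """Each reader tracks an offset and fetches only newer messages."""
    return group_log[group_id][since:]
```

The trade-off is visible directly: a 10,000-member group turns one send into 10,000 writes under fan-out on write, versus one write plus per-reader offset tracking under fan-out on read.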
Large-scale systems like Slack often use optimized hybrid approaches depending on group size.
Message Ordering
Messages must appear in the correct order for each conversation.
Problems:
- Messages may arrive out of order due to network delays.
- Multiple servers handling requests can cause race conditions.
Solution:
- Assign a sequence number per conversation.
- Store timestamps.
- Let clients reorder messages based on sequence IDs.
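A minimal sketch of both halves, assuming the server owning a conversation assigns the sequence numbers (the counter placement is an assumption; a distributed deployment would need a coordinated or partition-local counter):

```python
import itertools
from collections import defaultdict

# One monotonically increasing counter per conversation, held by the
# server that owns that conversation in this sketch.
_counters = defaultdict(itertools.count)

def assign_sequence(conversation_id: str) -> int:
    """Stamp each message with the next sequence number for its conversation."""
    return next(_counters[conversation_id])

def reorder(messages: list[dict]) -> list[dict]:
    """Client side: sort received messages by sequence number, so network
    reordering does not change the displayed conversation order."""
    return sorted(messages, key=lambda m: m["seq"])
```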
Maintaining ordering becomes especially challenging in distributed systems.
Handling Offline Users
Users may disconnect unexpectedly due to:
- Network issues
- App crashes
- Device shutdown
The system must:
- Store undelivered messages safely.
- Detect when the user reconnects.
- Deliver pending messages reliably.
This requires durable storage (e.g., NoSQL databases like Amazon DynamoDB).
Delivery Guarantees
Should messages be delivered:
- At most once?
- At least once?
- Exactly once?
Exactly-once delivery is extremely hard in distributed systems.
Most chat systems:
- Use at-least-once delivery.
- Assign unique message IDs.
- Let clients deduplicate messages if needed.
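Client-side deduplication is small enough to sketch directly: under at-least-once delivery, a retried send can arrive twice, and the ID set makes the second arrival a no-op (the structure names here are illustrative).

```python
import uuid

seen_ids: set[str] = set()  # client side: message IDs already applied
timeline: list[str] = []    # messages actually shown to the user

def receive(message_id: str, text: str) -> bool:
    """Apply a delivered message exactly once; return True if newly applied.
    Duplicates produced by at-least-once retries are silently dropped."""
    if message_id in seen_ids:
        return False  # duplicate from a retry: ignore
    seen_ids.add(message_id)
    timeline.append(text)
    return True
```

This combination, at-least-once transport plus idempotent receivers, gives an effectively-exactly-once user experience without the cost of true exactly-once delivery.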
Fault Tolerance
What happens if:
- A chat server crashes?
- Redis goes down?
- A database node fails?
Solutions:
- Replicated databases.
- Redis clustering.
- Health checks and auto-restarts.
- Multi-availability zone deployments.
Large messaging systems like WhatsApp are designed with redundancy at every layer to avoid message loss.
Data Storage & Hot Partitions
If many users are chatting in the same popular group, all writes may hit the same database partition.
This creates:
- Hot keys
- Increased latency
- Throttling
Solutions:
- Partition by conversation ID + time bucket.
- Use sharding strategies.
- Distribute load evenly across nodes.
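The "conversation ID + time bucket" idea amounts to a composite partition key; a sketch under the assumption of hourly buckets (the key format is illustrative, not a DynamoDB requirement):

```python
from datetime import datetime, timezone

def partition_key(conversation_id: str, sent_at: datetime,
                  bucket_hours: int = 1) -> str:
    """Combine the conversation ID with a time bucket so a single busy
    group's writes rotate across partitions over time instead of
    hammering one hot key forever."""
    bucket = sent_at.replace(minute=0, second=0, microsecond=0)
    bucket = bucket.replace(hour=(bucket.hour // bucket_hours) * bucket_hours)
    return f"{conversation_id}#{bucket.strftime('%Y-%m-%dT%H')}"
```

Reads for a conversation then query a small, known set of bucket keys, while writes within any single hour still go to one partition, so the bucket width is a tuning knob between write spread and read fan-in.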
Conclusion
Designing a real-time chat application goes far beyond simply sending messages between users. It requires solving complex distributed systems problems such as scaling millions of persistent connections, ensuring low latency, handling offline users, maintaining message ordering, and guaranteeing fault tolerance.
By using WebSockets for bidirectional communication, horizontally scalable chat servers, centralized connection mapping with Redis, and durable storage solutions like Amazon DynamoDB, we can build a system capable of supporting millions of users efficiently.
The real challenge is not just building the architecture — it’s understanding the trade-offs between scalability, consistency, and reliability.
A well-designed chat system is a practical example of how distributed systems principles are applied in real-world applications.