Why Designing a Real-Time Chat Application Is Hard
Designing a real-time chat application is significantly more complex than building systems like a URL shortener or a notification service.
The main reasons are:
- Real-time bidirectional communication
- Handling millions of concurrent connections
- Ensuring low latency
- Managing message persistence and offline delivery
Unlike simple request-response systems, chat applications require persistent connections and instant delivery at scale.
Functional Requirements
- 1-to-1 messaging
- Group messaging
- Message persistence
- Offline message delivery (messages should be delivered when a user comes online)
Non-Functional Requirements
- Scalable to millions of users
- Low end-to-end message latency (< 500 ms)
- Fault tolerant
- Highly available
- Durable storage
Choosing the Correct Communication Protocol
Since our latency requirement is under 500 ms, traditional short polling and long polling are not ideal: both introduce extra delay and repeated connection overhead.
Server-Sent Events (SSE) are also not suitable because they support only one-way communication (server → client), whereas a chat system requires two-way communication.
Therefore, we use WebSockets, which provide:
- Persistent connections
- Bidirectional communication
- Low latency
- Reduced network overhead
Modern messaging platforms like WhatsApp use persistent connections to achieve real-time communication.
High-Level Architecture

Our system consists of the following components:
1. Client
The client maintains a WebSocket connection with the server to send and receive messages.
2. Load Balancer
The load balancer distributes incoming WebSocket connections across multiple chat servers to ensure scalability and high availability.
3. Chat Servers
Chat servers handle the core business logic:
- Manage WebSocket connections
- Validate messages
- Store messages in the database
- Deliver messages to recipients
4. Redis
Because a message's recipient may be connected to a different chat server than the sender, every server needs a way to find out where any given user is connected. We store these connection mappings in Redis.
Example:
userId → serverId / connectionId
This allows any server to determine whether a user is online and where to route the message.
Database
We use a scalable NoSQL database such as Amazon DynamoDB or any key-value store because:
- We require high write throughput
- We do not need strict ACID guarantees
- Horizontal scaling is easier
1-to-1 Message Flow
- The sender sends a message via WebSocket.
- The chat server validates and stores the message in the database (for persistence).
- The server checks Redis to determine whether the recipient is online.
- If the recipient is online: the message is delivered immediately via WebSocket.
- If the recipient is offline: the message remains stored in the database and is delivered when the user reconnects.
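The steps above can be sketched as follows, with dictionaries standing in for Redis and the database; class and field names are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class ChatServer:
    """Minimal sketch of the 1-to-1 flow; dicts stand in for Redis and the DB."""
    online: dict = field(default_factory=dict)    # user_id -> inbox (stand-in for a WebSocket)
    messages: list = field(default_factory=list)  # durable message store

    def send_direct(self, sender: str, recipient: str, text: str) -> str:
        # 1. Persist first, so the message survives a crash mid-delivery.
        record = {"from": sender, "to": recipient, "text": text, "delivered": False}
        self.messages.append(record)
        # 2. Check the connection registry (Redis in the real system).
        conn = self.online.get(recipient)
        if conn is not None:
            # 3. Recipient online: push over the "WebSocket" immediately.
            conn.append(record)
            record["delivered"] = True
            return "delivered"
        return "stored"  # delivered later, on reconnect

    def reconnect(self, user: str) -> list:
        """On reconnect, flush any undelivered messages to the user."""
        inbox = self.online.setdefault(user, [])
        for record in self.messages:
            if record["to"] == user and not record["delivered"]:
                inbox.append(record)
                record["delivered"] = True
        return inbox
```

Persisting before attempting delivery is the key ordering choice here: if the server crashes between steps, the message is still recoverable from storage.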
Group Chat Message Flow
- A user sends a message to a group.
- The message is stored in the database with the group ID.
- The server retrieves the list of group members.
- For each member: Check Redis for their connection.
- If online → deliver via WebSocket.
- If offline → deliver when they reconnect.
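A rough sketch of the routing step, assuming in-memory dicts in place of the group-membership table and the Redis registry (the group and server names are made up for illustration):

```python
# Stand-ins: group membership (DB) and the connection registry (Redis).
groups = {"g1": {"alice", "bob", "carol"}}           # group_id -> member set
online = {"alice": "server-1", "carol": "server-2"}  # user_id -> server_id

def route_group_message(group_id: str, sender: str):
    """Split the group's members into (deliver_now, deliver_later) lists."""
    deliver_now, deliver_later = [], []
    for member in sorted(groups[group_id] - {sender}):
        if member in online:
            # Online: forward to the server holding the member's socket.
            deliver_now.append((member, online[member]))
        else:
            # Offline: the stored copy is delivered on reconnect.
            deliver_later.append(member)
    return deliver_now, deliver_later
```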
Challenges
Designing the architecture is only the beginning. The real complexity lies in handling the following challenges at scale.
Scaling Millions of WebSocket Connections
Each active user maintains a persistent WebSocket connection with the server.
Problems:
- Each connection consumes memory.
- A single server can handle only a limited number of concurrent connections.
- Sudden traffic spikes (e.g., during peak hours) can overwhelm servers.
Solutions:
- Scale horizontally across multiple chat servers.
- Keep servers stateless.
- Store connection metadata in a centralized store like Redis.
- Use load balancers to distribute traffic evenly.
This ensures we can scale to millions of concurrent users.
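One common way to spread sticky WebSocket connections across the fleet (an assumption here; the article does not prescribe a specific algorithm) is rendezvous hashing, which keeps most user-to-server assignments stable when servers are added or removed:

```python
import hashlib

def pick_server(user_id: str, servers: list[str]) -> str:
    """Rendezvous (highest-random-weight) hashing: each user consistently
    maps to one server, and removing a server only remaps that server's
    own users rather than reshuffling everyone."""
    def score(server: str) -> int:
        digest = hashlib.sha256(f"{server}:{user_id}".encode()).hexdigest()
        return int(digest, 16)
    return max(servers, key=score)
```

This matters for WebSockets specifically because reassigning a user means dropping and re-establishing a persistent connection, which round-robin balancing alone would do constantly.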
The Fan-Out Problem in Group Chats
When a user sends a message in a group with 10,000 members, the system must deliver that message to all members.
This creates a massive delivery overhead.
Two common approaches:
Fan-out on Write
- When a message is sent, it is immediately distributed to all group members.
- Faster reads.
- Heavy write amplification.
Fan-out on Read
- Store one copy of the message.
- Deliver it only when users fetch or reconnect.
- Reduces write load but increases read complexity.
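The two strategies can be contrasted in a few lines, using per-user inboxes and a per-group log as stand-ins for real storage (names are illustrative):

```python
from collections import defaultdict

group_log = defaultdict(list)   # group_id -> [message, ...] (one shared copy)
user_inbox = defaultdict(list)  # user_id  -> [message, ...] (one copy per member)

def fanout_on_write(members: list[str], message: str) -> int:
    """Copy the message into every member's inbox at send time.
    Reads become a single inbox lookup; writes are O(group size)."""
    for member in members:
        user_inbox[member].append(message)
    return len(members)  # write amplification factor

def fanout_on_read(group_id: str, message: str) -> None:
    """Store one copy; members pull from the group log when they read."""
    group_log[group_id].append(message)

def read_group(group_id: str, since: int) -> list[str]:
    """Each reader tracks an offset and fetches only newer messages."""
    return group_log[group_id][since:]
```

The trade-off is visible directly: a 10,000-member group turns one send into 10,000 writes under fan-out on write, versus one write plus per-reader offset tracking under fan-out on read.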
Large-scale systems like Slack often use optimized hybrid approaches depending on group size.
Message Ordering
Messages must appear in the correct order for each conversation.
Problems:
- Messages may arrive out of order due to network delays.
- Multiple servers handling requests can cause race conditions.
Solution:
- Assign a sequence number per conversation.
- Store timestamps.
- Let clients reorder messages based on sequence IDs.
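A minimal sketch of both halves, assuming the server owning a conversation assigns the sequence numbers (the counter placement is an assumption; a distributed deployment would need a coordinated or partition-local counter):

```python
import itertools
from collections import defaultdict

# One monotonically increasing counter per conversation, held by the
# server that owns that conversation in this sketch.
_counters = defaultdict(itertools.count)

def assign_sequence(conversation_id: str) -> int:
    """Stamp each message with the next sequence number for its conversation."""
    return next(_counters[conversation_id])

def reorder(messages: list[dict]) -> list[dict]:
    """Client side: sort received messages by sequence number, so network
    reordering does not change the displayed conversation order."""
    return sorted(messages, key=lambda m: m["seq"])
```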
Maintaining ordering becomes especially challenging in distributed systems.
Handling Offline Users
Users may disconnect unexpectedly due to:
- Network issues
- App crashes
- Device shutdown
The system must:
- Store undelivered messages safely.
- Detect when the user reconnects.
- Deliver pending messages reliably.
This requires durable storage (e.g., NoSQL databases like Amazon DynamoDB).
Delivery Guarantees
Should messages be delivered:
- At most once?
- At least once?
- Exactly once?
Exactly-once delivery is extremely hard in distributed systems.
Most chat systems:
- Use at-least-once delivery.
- Assign unique message IDs.
- Let clients deduplicate messages if needed.
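Client-side deduplication is small enough to sketch directly: under at-least-once delivery, a retried send can arrive twice, and the ID set makes the second arrival a no-op (the structure names here are illustrative).

```python
import uuid

seen_ids: set[str] = set()  # client side: message IDs already applied
timeline: list[str] = []    # messages actually shown to the user

def receive(message_id: str, text: str) -> bool:
    """Apply a delivered message exactly once; return True if newly applied.
    Duplicates produced by at-least-once retries are silently dropped."""
    if message_id in seen_ids:
        return False  # duplicate from a retry: ignore
    seen_ids.add(message_id)
    timeline.append(text)
    return True
```

This combination, at-least-once transport plus idempotent receivers, gives an effectively-exactly-once user experience without the cost of true exactly-once delivery.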
Fault Tolerance
What happens if:
- A chat server crashes?
- Redis goes down?
- A database node fails?
Solutions:
- Replicated databases.
- Redis clustering.
- Health checks and auto-restarts.
- Multi-availability zone deployments.
Large messaging systems like WhatsApp are designed with redundancy at every layer to avoid message loss.
Data Storage & Hot Partitions
If many users are chatting in the same popular group, all writes may hit the same database partition.
This creates:
- Hot keys
- Increased latency
- Throttling
Solutions:
- Partition by conversation ID + time bucket.
- Use sharding strategies.
- Distribute load evenly across nodes.
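The "conversation ID + time bucket" idea amounts to a composite partition key; a sketch under the assumption of hourly buckets (the key format is illustrative, not a DynamoDB requirement):

```python
from datetime import datetime, timezone

def partition_key(conversation_id: str, sent_at: datetime,
                  bucket_hours: int = 1) -> str:
    """Combine the conversation ID with a time bucket so a single busy
    group's writes rotate across partitions over time instead of
    hammering one hot key forever."""
    bucket = sent_at.replace(minute=0, second=0, microsecond=0)
    bucket = bucket.replace(hour=(bucket.hour // bucket_hours) * bucket_hours)
    return f"{conversation_id}#{bucket.strftime('%Y-%m-%dT%H')}"
```

Reads for a conversation then query a small, known set of bucket keys, while writes within any single hour still go to one partition, so the bucket width is a tuning knob between write spread and read fan-in.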
Conclusion
Designing a real-time chat application goes far beyond simply sending messages between users. It requires solving complex distributed systems problems such as scaling millions of persistent connections, ensuring low latency, handling offline users, maintaining message ordering, and guaranteeing fault tolerance.
By using WebSockets for bidirectional communication, horizontally scalable chat servers, centralized connection mapping with Redis, and durable storage solutions like Amazon DynamoDB, we can build a system capable of supporting millions of users efficiently.
The real challenge is not just building the architecture — it’s understanding the trade-offs between scalability, consistency, and reliability.
A well-designed chat system is a practical example of how distributed systems principles are applied in real-world applications.