Group Chat at Scale: Architecting Read Receipts for 10,000-Member Communities
Building a group chat system that scales to thousands of members is deceptively complex. You're not just storing messages; you're orchestrating real-time notifications, tracking read status across diverse network conditions, and managing file uploads, all while keeping latency under control. This is the kind of challenge that separates production systems from prototypes.
Architecture Overview
A robust group chat system needs to separate concerns across several key layers. At the foundation, you have a message store (typically a distributed database like Cassandra or DynamoDB) that handles write-heavy workloads with high throughput. Messages arrive through an API gateway that routes requests to backend services, while a message queue (Kafka, RabbitMQ) decouples ingestion from processing and buffers messages during traffic spikes, so a slow downstream writer doesn't cause messages to be dropped.
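To make the decoupling concrete, here is a minimal in-process sketch. It uses a bounded `asyncio.Queue` as a stand-in for Kafka or RabbitMQ (an assumption for illustration only; it provides none of their durability or distribution). The `ingest` and `persist` functions are hypothetical names: one accepts a burst of messages quickly, the other drains them at its own pace into a list standing in for the message store.

```python
import asyncio

async def ingest(queue: asyncio.Queue, messages: list[dict]) -> None:
    # Gateway side: hand messages off quickly instead of writing them directly.
    for msg in messages:
        await queue.put(msg)  # only waits when the buffer is full (back-pressure)
    await queue.put(None)  # sentinel: no more messages in this burst

async def persist(queue: asyncio.Queue, store: list[dict]) -> None:
    # Worker side: drain the queue at its own pace and write to the store.
    while True:
        msg = await queue.get()
        if msg is None:
            break
        await asyncio.sleep(0)  # stand-in for a slower database write
        store.append(msg)

async def main() -> list[dict]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)  # bounded buffer
    store: list[dict] = []
    burst = [{"group_id": "g1", "seq": i, "text": f"msg {i}"} for i in range(500)]
    await asyncio.gather(ingest(queue, burst), persist(queue, store))
    return store

if __name__ == "__main__":
    stored = asyncio.run(main())
    print(len(stored))
```

The burst of 500 messages is larger than the 100-slot buffer, so the producer is briefly paused rather than the consumer being overwhelmed; with a single consumer, messages are persisted in arrival order.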
For real-time delivery, WebSocket connections maintain persistent channels between clients and servers. A connection manager distributes these connections across multiple servers using consistent hashing, so a reconnecting user is mapped back to the same server and can pick up where they left off. This layer is crucial because, in a 10,000-member group, you can't afford to broadcast every message to every connection on every server; knowing which server holds each member's connection lets you route each message only where it needs to go.
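A minimal sketch of the consistent-hashing idea, assuming user IDs are the hash key and each WebSocket server appears on the ring as many virtual nodes. `HashRing`, `plan_fanout`, and the server names are hypothetical. `plan_fanout` shows why the mapping matters for fan-out: recipients are grouped by the server holding their connection, so each server gets one batched delivery.

```python
import bisect
import hashlib
from collections import defaultdict

def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is randomized per process).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, servers: list[str], vnodes: int = 100) -> None:
        # Place each server on the ring many times to even out the load.
        self._ring: list[tuple[int, str]] = sorted(
            (_hash(f"{server}#{i}"), server) for server in servers for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def server_for(self, user_id: str) -> str:
        # Walk clockwise to the first virtual node at or after the user's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_left(self._keys, _hash(user_id)) % len(self._ring)
        return self._ring[idx][1]

def plan_fanout(ring: HashRing, member_ids: list[str]) -> dict[str, list[str]]:
    # Group recipients by connection server: one call per server, not per member.
    batches: dict[str, list[str]] = defaultdict(list)
    for uid in member_ids:
        batches[ring.server_for(uid)].append(uid)
    return dict(batches)

if __name__ == "__main__":
    ring = HashRing(["ws-1", "ws-2", "ws-3"])
    batches = plan_fanout(ring, [f"user-{n}" for n in range(10_000)])
    print({server: len(users) for server, users in batches.items()})
```

The design payoff of consistent hashing is that adding a server only moves the users whose hashes now fall on the new server's virtual nodes; everyone else keeps their existing connection server.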
File sharing adds another dimension: you'll want to offload actual file storage to object storage (S3, GCS) while keeping metadata in your main database. This prevents your message store from bloating and lets you serve files with a CDN. For features like mentions and threads, you'd add indexing layers and potentially a search engine like Elasticsearch to make queries snappy even with millions of messages in the archive.
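One way to picture the split between metadata and file bytes is the record shape below. It is a sketch under assumptions: the field names, the `groups/<group_id>/<random>/<filename>` key layout, and the `CDN_BASE_URL` host are all hypothetical. The message row carries only a small `Attachment` reference; the bytes themselves would be uploaded to an S3- or GCS-style bucket under `object_key`.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

CDN_BASE_URL = "https://cdn.example.com"  # hypothetical CDN host

@dataclass(frozen=True)
class Attachment:
    # Metadata only; the file bytes live in object storage under object_key.
    object_key: str
    content_type: str
    size_bytes: int

@dataclass(frozen=True)
class Message:
    message_id: int
    group_id: str
    sender_id: str
    text: str
    attachment: Optional[Attachment] = None  # a few hundred bytes, not the file

def make_object_key(group_id: str, filename: str) -> str:
    # A random path segment keeps two uploads of "diagram.png" from colliding.
    return f"groups/{group_id}/{uuid.uuid4().hex}/{filename}"

def download_url(att: Attachment) -> str:
    # Clients fetch files through the CDN, not through the chat backend.
    return f"{CDN_BASE_URL}/{att.object_key}"

if __name__ == "__main__":
    key = make_object_key("g42", "diagram.png")
    msg = Message(1, "g42", "maria", "see attached", Attachment(key, "image/png", 48_213))
    print(download_url(msg.attachment))
```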
Media and Search Considerations
Threads deserve special attention in the architecture. Rather than flattening all replies into a single stream, thread replies can be stored separately with a parent message ID reference. This keeps your primary feed clean and lets users dive into conversations without drowning in context. Mentions require a tagging system that indexes user handles, enabling fast autocomplete and notification routing.
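The sketch below illustrates both ideas in memory: replies keyed by their parent message ID so they stay out of the main feed, and a handle index used for mention notifications plus a sorted-list prefix search for autocomplete. The class and function names are hypothetical, and the `@handle` format (letters, digits, underscore, matched case-insensitively) is an assumption; a real system would back these structures with the database and search layers described above.

```python
import bisect
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

MENTION_RE = re.compile(r"@([A-Za-z0-9_]+)")  # assumed handle format

@dataclass
class ChatMessage:
    message_id: int
    text: str
    parent_id: Optional[int] = None  # set only for thread replies

class GroupTimeline:
    def __init__(self) -> None:
        self.feed: list[ChatMessage] = []  # top-level messages only
        self.threads: dict[int, list[ChatMessage]] = defaultdict(list)  # parent -> replies
        self.mention_index: dict[str, list[int]] = defaultdict(list)  # handle -> message ids

    def post(self, msg: ChatMessage) -> None:
        if msg.parent_id is None:
            self.feed.append(msg)
        else:
            self.threads[msg.parent_id].append(msg)  # keeps the main feed clean
        # Lowercase, then de-duplicate, so "@John @john" notifies john once.
        for handle in dict.fromkeys(h.lower() for h in MENTION_RE.findall(msg.text)):
            self.mention_index[handle].append(msg.message_id)

def autocomplete(sorted_handles: list[str], prefix: str, limit: int = 5) -> list[str]:
    # Binary-search to the first handle >= prefix, then scan while it still matches.
    start = bisect.bisect_left(sorted_handles, prefix)
    matches = []
    for handle in sorted_handles[start:start + limit]:
        if not handle.startswith(prefix):
            break
        matches.append(handle)
    return matches
```

For example, posting a reply with `parent_id=1` adds it to `threads[1]` without touching `feed`, and `autocomplete(["arun", "arya", "john", "maria"], "ar")` returns `["arun", "arya"]`.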
Design Insight: Read Receipts at Scale
Here's where the 10,000-member challenge gets interesting. You cannot afford to store an individual read receipt record for every user on every message: a modest 100,000 messages across 10,000 users already means a billion rows (100,000 × 10,000). Instead, the clever approach is aggregation. Rather than recording whether John, Maria, and Arun have each read every individual message, you keep a single read position per user and aggregate those positions into thresholds: "the furthest message this group has collectively read is message #8,943."
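A minimal sketch of the per-user read position (a "high-water mark"), assuming each message in a group carries a monotonically increasing sequence number, as the "message #8,943" example suggests. `ReadPositions` is a hypothetical name. Because reading up to message N implies reading everything before it, one integer per member replaces one row per member per message; the `unread_count` arithmetic additionally assumes sequence numbers have no gaps.

```python
class ReadPositions:
    """One integer per member: the highest sequence number they have read."""

    def __init__(self) -> None:
        self._hwm: dict[str, int] = {}

    def mark_read(self, user_id: str, seq: int) -> None:
        # Positions only move forward, so a late or duplicate ack can't rewind them.
        if seq > self._hwm.get(user_id, 0):
            self._hwm[user_id] = seq

    def has_read(self, user_id: str, seq: int) -> bool:
        # Having read up to N implies having read every message before N.
        return seq <= self._hwm.get(user_id, 0)

    def unread_count(self, user_id: str, latest_seq: int) -> int:
        # Assumes gap-free sequence numbers starting at 1.
        return max(0, latest_seq - self._hwm.get(user_id, 0))

# Storage comparison for the numbers in the text.
MESSAGES, MEMBERS = 100_000, 10_000
per_message_rows = MESSAGES * MEMBERS  # one receipt per member per message
high_water_rows = MEMBERS              # one position per member
```

With these figures, the high-water-mark model stores 10,000 entries instead of 1,000,000,000, and an ack for an older message (say #8,000 after #8,943) is simply ignored.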
To implement this efficiently, clients send read receipt acknowledgments to a dedicated service that batches updates and writes them to a time-series database or cache layer (Redis works well here). Every few seconds, you compute the group's read progress by querying which messages have been read by at least a quorum of users, or you track the 95th percentile of read positions. When a user opens the chat, you serve them their personal read position from cache, instantly highlighting which messages are new. This approach reduces database writes by orders of magnitude while still giving users that "I know what's new" visual feedback.
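The sketch below shows the two halves of that pipeline under stated assumptions. `ReceiptBatcher` (a hypothetical name) coalesces acks in memory so that many acks from one user become a single pending write per flush window; in a real deployment `drain()` would be called every few seconds and its result written to a cache layer such as Redis, which is not shown here. The two progress functions compute the group threshold both ways the text mentions: a quorum threshold, and a nearest-rank percentile of read positions.

```python
import math

class ReceiptBatcher:
    """Coalesces read acks; each drain() yields one batched write."""

    def __init__(self) -> None:
        self._pending: dict[tuple[str, str], int] = {}  # (group, user) -> max seq

    def ack(self, group_id: str, user_id: str, seq: int) -> None:
        key = (group_id, user_id)
        # Keep only the furthest position seen in this window.
        if seq > self._pending.get(key, 0):
            self._pending[key] = seq

    def drain(self) -> dict[tuple[str, str], int]:
        # Hand back the window's updates and start the next window empty.
        batch, self._pending = self._pending, {}
        return batch

def quorum_position(positions: list[int], quorum: float) -> int:
    """Highest seq that at least `quorum` (0..1] of members have read."""
    if not positions:
        return 0
    needed = max(1, math.ceil(quorum * len(positions)))
    return sorted(positions, reverse=True)[needed - 1]

def percentile_position(positions: list[int], pct: float) -> int:
    """Nearest-rank percentile: pct% of members are at or below the result."""
    if not positions:
        return 0
    rank = max(1, math.ceil(pct / 100 * len(positions)))
    return sorted(positions)[rank - 1]
```

Three acks from Maria for messages 5, 9, and 7 in one window collapse into a single write of position 9, which is where the reduction in database writes comes from. Note that the two thresholds answer different questions: `quorum_position([10, 20, 30, 40], 0.5)` is 30 (half the members have read that far), while a high percentile tracks how far the most active readers have gotten.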
Watch the Full Design Process
Curious how these decisions come together visually? Watch the AI-powered architecture generation process unfold in real time. This is Day 33 of our 365-day system design challenge, and seeing the diagram build from a plain English description shows exactly how these components fit together.
Try It Yourself
Ready to design your own system? Head over to InfraSketch and describe your system in plain English. In seconds, you'll have a professional architecture diagram, complete with a design document. Whether you're tackling group chat, real-time notifications, or collaborative tools, let InfraSketch turn your ideas into visual architecture instantly.