Alex Aslam

How Discord Uses Event Sourcing for Message History

"Discord stores trillions of messages—and lets you scroll back years in milliseconds. Here’s how."

Discord’s real-time chat handles millions of messages per second while allowing users to:
⏪ Scroll back years in any channel
🔍 Search instantly across servers
🔄 Sync messages flawlessly across devices

The secret? Event sourcing with some brilliant optimizations.

Let’s break down their architecture—and what you can steal for your own apps.


1. The Core Architecture

Problem: Traditional CRUD Fails at Scale

  • Updates/deletes would break message history
  • Searching 1B+ messages with LIKE is a nightmare
  • Sharding alone doesn’t solve consistency

Discord’s Event-Sourced Approach

  1. Every message is an event:
   {
     "event_id": "msg_abc123",
     "channel_id": "xyz789",
     "user_id": "u_456",
     "content": "Hello world",
     "timestamp": "2023-05-10T14:30:00Z",
     "deleted": false
   }
  2. Events are immutable (edits/deletes append new events)
  3. Projections power real-time views:
    • Channel view: SELECT * FROM messages WHERE channel_id = ? ORDER BY timestamp
    • User search: Elasticsearch index over content
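
To make the three steps concrete, here is a minimal in-memory sketch of an append-only log plus a channel projection. It is not Discord's code: `MessageEvent`, `EventLog`, and `project_channels` are hypothetical names, the fields follow the JSON example above, and the `deleted` flag is left out because deletions are appended as separate events.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)  # frozen: an event cannot be modified after creation
class MessageEvent:
    """One immutable entry in the log (hypothetical type for illustration)."""
    event_id: str
    channel_id: str
    user_id: str
    content: str
    timestamp: str  # ISO 8601 UTC, so string order matches time order


class EventLog:
    """Append-only store: events can be added but never changed or removed."""

    def __init__(self) -> None:
        self._events: List[MessageEvent] = []

    def append(self, event: MessageEvent) -> None:
        self._events.append(event)

    def all_events(self) -> List[MessageEvent]:
        return list(self._events)  # return a copy so callers cannot mutate the log


def project_channels(log: EventLog) -> Dict[str, List[MessageEvent]]:
    """Build the channel view: messages grouped by channel, oldest first."""
    view: Dict[str, List[MessageEvent]] = {}
    for event in log.all_events():
        view.setdefault(event.channel_id, []).append(event)
    for messages in view.values():
        messages.sort(key=lambda e: e.timestamp)
    return view


log = EventLog()
# Events may arrive out of order; the projection sorts them by timestamp.
log.append(MessageEvent("msg_2", "xyz789", "u_456", "second", "2023-05-10T14:31:00Z"))
log.append(MessageEvent("msg_1", "xyz789", "u_456", "first", "2023-05-10T14:30:00Z"))
channel_view = project_channels(log)
```

The projection is disposable: you can throw it away and rebuild it from the log at any time, which is what makes adding a second view (like search) cheap later.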

2. Key Optimizations

Optimization 1: Hybrid Storage

  • Hot data (recent messages): Cassandra (low-latency reads)
  • Cold data (old messages): S3 + RocksDB (cost-effective)

Why it works:

  • 99% of queries target the last 30 days → optimize for that.
  • Older data uses compressed, columnar formats.
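
A rough sketch of the routing decision behind hybrid storage: pick a tier from the message's age. The 30-day cutoff comes from the figure above; `pick_tier` and the tier names are assumptions for illustration, and a real system would call actual storage clients instead of returning a label.

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=30)  # assumption: cutoff taken from the "last 30 days" figure


def pick_tier(message_time: datetime, now: datetime) -> str:
    """Return "hot" for recent messages and "cold" for everything older."""
    return "hot" if now - message_time <= HOT_WINDOW else "cold"


now = datetime(2023, 6, 1, tzinfo=timezone.utc)
recent = datetime(2023, 5, 10, 14, 30, tzinfo=timezone.utc)
old = datetime(2021, 5, 10, 14, 30, tzinfo=timezone.utc)

recent_tier = pick_tier(recent, now)  # about three weeks old
old_tier = pick_tier(old, now)        # about two years old
```

Passing `now` in explicitly keeps the function deterministic, which makes the cutoff easy to test.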

Optimization 2: Incremental Snapshots

Instead of replaying all events when a user loads a channel:

  1. Store daily snapshots of channel state.
  2. Only replay events since the last snapshot.

Result: Loading a channel with 10K messages takes 50ms instead of the 5s a full replay needs.
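
Here is one way the snapshot-plus-replay idea could look, assuming each event carries a monotonically increasing sequence number. `Snapshot`, `take_snapshot`, and `load_channel` are hypothetical names, and the "state" is just a list of message contents to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Event = Tuple[int, str]  # (sequence number, message content); hypothetical shape


@dataclass
class Snapshot:
    last_seq: int = 0  # highest sequence number already folded into the state
    messages: List[str] = field(default_factory=list)


def take_snapshot(previous: Snapshot, events: List[Event]) -> Snapshot:
    """Fold only the events newer than the previous snapshot into a new one."""
    messages = list(previous.messages)
    last_seq = previous.last_seq
    for seq, content in events:
        if seq > previous.last_seq:  # skip events the snapshot already covers
            messages.append(content)
            last_seq = max(last_seq, seq)
    return Snapshot(last_seq, messages)


def load_channel(snapshot: Snapshot, events: List[Event]) -> List[str]:
    """Current state = last snapshot + replay of the events after it."""
    return take_snapshot(snapshot, events).messages


events = [(1, "a"), (2, "b"), (3, "c")]
daily = take_snapshot(Snapshot(), events[:2])   # yesterday's snapshot
from_snapshot = load_channel(daily, events)     # replays only event 3
from_scratch = load_channel(Snapshot(), events) # replays everything
```

Both paths should produce the same state; the snapshot only changes how many events you have to touch to get there.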

Optimization 3: Event Chunking

  • Group events into 5-minute chunks (e.g., 2023-05-10T14:30:00Z to 2023-05-10T14:35:00Z)
  • Pros:
    • Faster replay (process chunks in parallel)
    • Efficient compression (chunks deduplicate metadata)
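
As a sketch of the chunking step, the snippet below rounds each timestamp down to the start of its 5-minute window and groups event IDs by that key. The function names are made up for this example, and it assumes timestamps in the same `...Z` UTC format as the event JSON.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List

CHUNK_SECONDS = 5 * 60  # chunk width from the article
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def chunk_start(timestamp: str) -> str:
    """Round an ISO 8601 UTC timestamp down to the start of its 5-minute chunk."""
    moment = datetime.strptime(timestamp, TIME_FORMAT).replace(tzinfo=timezone.utc)
    epoch = int(moment.timestamp())
    floored = epoch - epoch % CHUNK_SECONDS
    return datetime.fromtimestamp(floored, tz=timezone.utc).strftime(TIME_FORMAT)


def group_into_chunks(events: List[dict]) -> Dict[str, List[str]]:
    """Map each chunk's start time to the IDs of the events that fall inside it."""
    chunks: Dict[str, List[str]] = defaultdict(list)
    for event in events:
        chunks[chunk_start(event["timestamp"])].append(event["event_id"])
    return dict(chunks)


events = [
    {"event_id": "msg_1", "timestamp": "2023-05-10T14:30:00Z"},
    {"event_id": "msg_2", "timestamp": "2023-05-10T14:34:59Z"},
    {"event_id": "msg_3", "timestamp": "2023-05-10T14:35:00Z"},
]
chunks = group_into_chunks(events)
```

Each chunk is a half-open window: 14:34:59 still belongs to the 14:30 chunk, while 14:35:00 opens the next one. Since chunks don't overlap, they can be replayed or compressed independently.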

3. Handling Deletes and Edits

Problem: GDPR requires message deletion, but events are immutable.

Solution:

  1. Soft-delete events:
   {
     "event_id": "del_abc123",
     "target_message_id": "msg_xyz456",
     "timestamp": "2023-05-10T15:00:00Z"
   }
  2. Filter deleted messages in projections:
   SELECT * FROM messages
   WHERE channel_id = ?
     AND deleted = false
   ORDER BY timestamp
  3. Purge data legally:
    • Pseudonymization: Replace user_id/content with [REDACTED]
    • Cold storage pruning: Batch-delete chunks older than X years
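
A small sketch of how a projection might honor delete events, plus a pseudonymization helper. It assumes events are plain dicts shaped like the JSON above and that a delete event is any event with a `target_message_id`; `visible_messages` and `pseudonymize` are hypothetical helpers, not a real API.

```python
from typing import List

REDACTED = "[REDACTED]"


def visible_messages(events: List[dict]) -> List[dict]:
    """Replay the log and drop every message targeted by a delete event."""
    deleted_ids = {e["target_message_id"] for e in events if "target_message_id" in e}
    messages = [e for e in events if "target_message_id" not in e]
    return [m for m in messages if m["event_id"] not in deleted_ids]


def pseudonymize(event: dict) -> dict:
    """Return a copy with user_id and content replaced by a placeholder."""
    return {**event, "user_id": REDACTED, "content": REDACTED}


events = [
    {"event_id": "msg_xyz456", "channel_id": "xyz789", "user_id": "u_456",
     "content": "oops", "timestamp": "2023-05-10T14:29:00Z"},
    {"event_id": "msg_abc123", "channel_id": "xyz789", "user_id": "u_456",
     "content": "Hello world", "timestamp": "2023-05-10T14:30:00Z"},
    {"event_id": "del_abc123", "target_message_id": "msg_xyz456",
     "timestamp": "2023-05-10T15:00:00Z"},
]
visible = visible_messages(events)
scrubbed = pseudonymize(events[0])
```

Note that `pseudonymize` returns a copy. In a real store you would write the scrubbed version over the stored record, which is the one deliberate exception to "events never change".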

4. Lessons for Your Apps

  • Start with immutable events (even if you don’t "need" event sourcing yet)
  • Separate hot/cold data early (Discord migrated later, and it was painful)
  • Snapshot incrementally (daily snapshots beat real-time ones for most apps)
  • Design for deletion (GDPR compliance is hard to retrofit)


"But We’re Not Discord!"

You don’t need to be. Start small:

  1. Log critical changes as events (e.g., user bans, payments).
  2. Build one projection (e.g., "last 100 messages").
  3. Expand as scale demands.
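
For step 2, a "last 100 messages" projection can be as small as a bounded queue. This is a minimal sketch under the assumption that events are dicts with a `content` field; `RecentMessages` is a hypothetical class name.

```python
from collections import deque
from typing import List


class RecentMessages:
    """Projection that keeps only the newest `limit` message contents."""

    def __init__(self, limit: int = 100) -> None:
        self._recent = deque(maxlen=limit)  # oldest entries fall off automatically

    def apply(self, event: dict) -> None:
        self._recent.append(event["content"])

    def view(self) -> List[str]:
        return list(self._recent)


projection = RecentMessages(limit=100)
for i in range(150):
    projection.apply({"event_id": f"msg_{i}", "content": f"message {i}"})
latest = projection.view()
```

Feed it events as they are appended to the log, and rebuild it from the log if it is ever lost.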

Have you tried event sourcing for real-time apps? Share your wins (or fails) below.
