DEV Community

Sospeter Mong'are

How I’d Design a Collaborative Tool Like Google Docs (After Fumbling the Interview Years Ago)

A few years ago, a recruiter asked me to design a real-time collaborative tool like Google Docs during an interview. Back then, I fumbled: my mind went blank, and I gave a vague, unstructured answer.

Today? I wish that recruiter would ask me again. I’m very ready.

Let me walk you through how I’d design it now—with lessons learned from system design deep dives, real-world architectures, and a bit of revenge-driven motivation.

1. What Are We Building?

A real-time collaborative text editor where:

  • Multiple users edit the same document simultaneously.
  • Changes appear near-instantly for other users (target: under 200 ms end to end).
  • Conflicts (e.g., two people typing in the same spot) are resolved cleanly.
  • Version history is preserved.
  • Works offline (optional but impressive).

2. Breaking Down the Problem

Core Challenges:

  1. Real-Time Sync: How do we sync edits across users without lag?
  2. Conflict Resolution: What if two people edit the same word at the same time?
  3. Scalability: How do we support millions of concurrent documents?
  4. Persistence: Where and how do we store documents and their history?

3. High-Level Architecture

We’ll use a client-server model:

  • Clients (Web/Mobile): Render the document, send edits, and apply remote changes.
  • Servers: Process updates, resolve conflicts, and broadcast changes.
  • Database: Store documents and version history.

4. Deep Dive: Key Components

A. Real-Time Synchronization

Problem: When User A types "Hello" and User B deletes "World" at the same time, how do we merge changes?

Solutions:

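  1. Operational Transformation (OT) – Used by Google Docs.
    • Transforms each incoming operation against the concurrent ones already applied, so every replica ends in the same state.
    • Example: If User A inserts text after a range that User B deleted, OT shifts A’s insertion index left by the length of the deletion.
  2. Conflict-Free Replicated Data Types (CRDTs) – Used by Figma (in a CRDT-inspired, server-coordinated form) and, reportedly, Notion.
    • Data structures that guarantee all replicas eventually converge, without needing a central server to order edits.
    • Scale well because merges need no coordination, but are memory-intensive: each character carries a unique ID, and deletions usually linger as tombstones.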
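To make the OT idea concrete, here is a minimal sketch of transforming one insert against one concurrent delete on a plain string. The `Insert`, `Delete`, and `transform_*` names are my own illustrative choices, not Google Docs' actual implementation, and a real system also has to handle insert/insert, delete/delete, and server-side ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


@dataclass(frozen=True)
class Delete:
    pos: int
    length: int


def apply(doc: str, op) -> str:
    """Apply a single operation to a plain-string document."""
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + op.length:]


def transform_insert_against_delete(ins: Insert, dele: Delete) -> Insert:
    """Rewrite ins so it is correct on a document where dele already ran."""
    if ins.pos <= dele.pos:
        return ins  # insertion is before the deleted range: unaffected
    if ins.pos >= dele.pos + dele.length:
        return Insert(ins.pos - dele.length, ins.text)  # shift left
    return Insert(dele.pos, ins.text)  # inside the range: collapse to its start


def transform_delete_against_insert(dele: Delete, ins: Insert) -> list[Delete]:
    """Rewrite dele so it is correct on a document where ins already ran."""
    if ins.pos <= dele.pos:
        return [Delete(dele.pos + len(ins.text), dele.length)]  # shift right
    if ins.pos >= dele.pos + dele.length:
        return [dele]  # insertion is after the deleted range: unaffected
    # Insertion landed inside the range: delete around the inserted text.
    left = ins.pos - dele.pos
    return [
        Delete(dele.pos, left),
        Delete(dele.pos + len(ins.text), dele.length - left),
    ]


doc = "World: "
a_insert = Insert(7, "Hello")   # User A types "Hello" at the end
b_delete = Delete(0, 5)         # User B deletes "World" at the same time

# Replica A applied its own insert first, then B's transformed delete.
via_a = apply(doc, a_insert)
for d in transform_delete_against_insert(b_delete, a_insert):
    via_a = apply(via_a, d)

# Replica B applied its own delete first, then A's transformed insert.
via_b = apply(apply(doc, b_delete),
              transform_insert_against_delete(a_insert, b_delete))
```

Both replicas end with the same text even though they applied the edits in different orders, which is the property OT is after.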

Which to choose?

  • OT is complex but battle-tested (Google Docs, Etherpad).
  • CRDTs are decentralized but harder to debug.
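For comparison, here is a rough sketch of a tiny state-based sequence CRDT in the spirit of RGA/causal trees. Every character gets a unique `(clock, site)` ID and remembers the ID of the character it was inserted after; merging is just a set union. The `Replica` class and its method names are hypothetical, and production CRDT libraries are far more optimized than this.

```python
from dataclasses import dataclass, field

ROOT = (0, "")  # virtual element that marks the start of the document


@dataclass
class Replica:
    site: str
    clock: int = 0
    elems: dict = field(default_factory=dict)     # id -> (parent id, char)
    tombstones: set = field(default_factory=set)  # ids of deleted chars

    def insert_after(self, parent, ch):
        """Insert ch right after the element with id `parent`."""
        self.clock += 1
        eid = (self.clock, self.site)
        self.elems[eid] = (parent, ch)
        return eid

    def delete(self, eid):
        """Deletions are only marked, never physically removed."""
        self.tombstones.add(eid)

    def merge(self, other):
        """Union of state: commutative, associative, and idempotent."""
        self.elems.update(other.elems)
        self.tombstones |= other.tombstones
        self.clock = max(self.clock, other.clock)

    def text(self):
        """Pre-order walk; siblings are visited newest id first."""
        children = {}
        for eid, (parent, _) in self.elems.items():
            children.setdefault(parent, []).append(eid)
        out = []
        stack = sorted(children.get(ROOT, []))
        while stack:
            eid = stack.pop()  # largest remaining id
            if eid not in self.tombstones:
                out.append(self.elems[eid][1])
            stack.extend(sorted(children.get(eid, [])))
        return "".join(out)


a, b = Replica("A"), Replica("B")
h = a.insert_after(ROOT, "H")
i = a.insert_after(h, "i")
b.merge(a)                      # both replicas now show "Hi"

a.insert_after(i, "!")          # concurrent edits, no coordination
b.insert_after(i, "?")
b.delete(h)

a.merge(b)
b.merge(a)                      # merge order does not matter
```

After the two merges both replicas render identical text. The sketch also shows where the memory cost comes from: the deleted "H" stays in `elems` forever, and every character carries its own ID.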

B. Handling Real-Time Updates

  • WebSockets maintain persistent connections for low-latency updates.
  • Pub/Sub Model (Redis/Kafka) broadcasts changes to all clients in a doc’s channel.
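The fan-out pattern can be sketched in memory, with an `asyncio.Queue` standing in for each WebSocket connection and a dictionary standing in for Redis/Kafka channels. `DocHub` and the edit payload shape are assumptions for illustration only; a real deployment would swap in an actual broker and socket layer.

```python
import asyncio
from collections import defaultdict


class DocHub:
    """One channel per document; each subscriber is a queue."""

    def __init__(self):
        self._channels = defaultdict(set)  # doc_id -> set of asyncio.Queue

    def subscribe(self, doc_id):
        queue = asyncio.Queue()
        self._channels[doc_id].add(queue)
        return queue

    def unsubscribe(self, doc_id, queue):
        self._channels[doc_id].discard(queue)

    def publish(self, doc_id, edit, sender=None):
        """Broadcast an edit to everyone on the doc except the sender."""
        for queue in self._channels[doc_id]:
            if queue is not sender:
                queue.put_nowait(edit)


async def demo():
    hub = DocHub()
    alice = hub.subscribe("doc-1")
    bob = hub.subscribe("doc-1")
    carol = hub.subscribe("doc-2")   # different document, different channel

    hub.publish("doc-1", {"op": "insert", "pos": 0, "text": "Hi"}, sender=alice)

    received = await bob.get()
    return received, alice.empty(), carol.empty()


result = asyncio.run(demo())
```

Bob receives Alice's edit, while Alice (the sender) and Carol (on another document) receive nothing.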

C. Storage & Persistence

  • Master Document: Stored in a database (PostgreSQL for relational, MongoDB for JSON).
  • Version History:
    • Snapshotting (e.g., save full doc every 100 edits).
    • Diffs (delta storage) between snapshots to save space (similar in spirit to the delta compression in Git packfiles).
  • Offline Support: Queue changes locally, sync when reconnected.
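Snapshots and deltas work together: to restore a version, load the nearest earlier snapshot and replay the deltas after it. Below is a minimal in-memory sketch; the `History` class and the `(pos, delete_len, inserted)` delta format are illustrative assumptions, and a real system would persist both in the database.

```python
def apply_delta(text, delta):
    """A delta is (pos, delete_len, inserted_text)."""
    pos, delete_len, inserted = delta
    return text[:pos] + inserted + text[pos + delete_len:]


class History:
    def __init__(self, snapshot_every=100):
        self.snapshot_every = snapshot_every
        self.snapshots = {0: ""}   # version -> full document text
        self.deltas = []           # deltas[i] turns version i into i + 1
        self.head = ""

    def commit(self, delta):
        """Record one edit; take a full snapshot every N edits."""
        self.head = apply_delta(self.head, delta)
        self.deltas.append(delta)
        version = len(self.deltas)
        if version % self.snapshot_every == 0:
            self.snapshots[version] = self.head
        return version

    def restore(self, version):
        """Start from the nearest earlier snapshot, then replay deltas."""
        base = max(v for v in self.snapshots if v <= version)
        text = self.snapshots[base]
        for delta in self.deltas[base:version]:
            text = apply_delta(text, delta)
        return text


history = History(snapshot_every=3)   # small interval just for the demo
history.commit((0, 0, "Hello"))       # v1: "Hello"
history.commit((5, 0, " World"))      # v2: "Hello World"
history.commit((0, 5, "Howdy"))       # v3: "Howdy World" (snapshot taken)
history.commit((5, 6, ""))            # v4: "Howdy"
```

The snapshot interval is the trade-off knob: a smaller interval means faster restores but more storage.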

D. Scaling to Millions of Users

  • Sharding: Split documents by doc_id across servers.
  • Caching: Hot documents in Redis/Memcached.
  • Global Replication: Deploy in multiple regions (AWS/GCP) for low latency.
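Sharding by doc_id can be as simple as hashing the ID and taking it modulo the shard count. This is a sketch under that assumption; `shard_for` is a hypothetical helper. Note that plain modulo hashing moves most documents when the shard count changes, which is why consistent hashing is a common alternative.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard index in a stable, uniform way."""
    # Use a real hash rather than Python's built-in hash(), which is
    # randomized per process for strings and so is not stable across servers.
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Every server computes the same shard for the same document.
shard = shard_for("doc-42", num_shards=16)
```

Because all edits for one document land on one shard, the server for that shard can order them without cross-server coordination.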

5. Handling Edge Cases

  • Simultaneous Edits: Use OT/CRDTs to merge changes logically.
  • Network Failures: Retry failed updates with exponential backoff.
  • Data Corruption: Roll back using version history.
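Retrying with exponential backoff might look like the sketch below. `send_with_backoff` and its parameters are hypothetical names, and `send` stands for whatever function pushes an edit to the server. The random jitter is my addition; it is a common way to stop many reconnecting clients from retrying in lockstep.

```python
import random
import time


def send_with_backoff(send, payload, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call send(payload), retrying on ConnectionError with growing delays."""
    for attempt in range(max_attempts):
        try:
            return send(payload)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Delay ceiling doubles each attempt: 0.5s, 1s, 2s, ... capped.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # "full jitter"


# Usage with a fake sender that fails twice, then succeeds.
attempts = []


def flaky_send(payload):
    attempts.append(payload)
    if len(attempts) < 3:
        raise ConnectionError("network down")
    return "ack"


waits = []  # record delays instead of really sleeping in this demo
reply = send_with_backoff(flaky_send, {"op": "insert"}, sleep=waits.append)
```

If all attempts fail, the edit should stay in the local offline queue rather than be dropped.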

6. Advanced Considerations (Bonus Points)

  • Access Control: Role-based permissions (view/edit/share).
  • Rich Media Support: Images, tables, comments (store as separate entities).
  • Analytics & Suggestions: Track edits to power Grammarly-style writing suggestions.
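For the access-control point, one simple way to model view/edit/share permissions is a flag set per role. The role names and the `can` helper below are assumptions for illustration, not a prescribed scheme.

```python
from enum import Flag, auto


class Permission(Flag):
    VIEW = auto()
    EDIT = auto()
    SHARE = auto()


# Hypothetical roles; each one is a combination of permissions.
ROLES = {
    "viewer": Permission.VIEW,
    "editor": Permission.VIEW | Permission.EDIT,
    "owner": Permission.VIEW | Permission.EDIT | Permission.SHARE,
}


def can(role: str, needed: Permission) -> bool:
    """True if the role grants every permission in `needed`."""
    granted = ROLES.get(role, Permission(0))  # unknown roles get nothing
    return (granted & needed) == needed


editor_can_edit = can("editor", Permission.EDIT)
editor_can_share = can("editor", Permission.SHARE)
```

The server should run this check before applying or broadcasting any edit, since a client-side check alone can be bypassed.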

7. What I Learned Since That Interview

  1. Start simple, then refine. Don’t dive into OT vs. CRDTs immediately—first outline the big pieces.
  2. Trade-offs matter. Explain why you’d choose OT (mature) vs. CRDTs (scalable).
  3. Think about scale early. How will this work with 1M concurrent docs?

Final Answer

To build Google Docs today, I’d:

  1. Use WebSockets, fanned out through pub/sub, for real-time sync.
  2. Store docs in a sharded database with versioned snapshots.
  3. Scale with caching, load balancing, and global replication.
  4. Handle conflicts via Operational Transformation (or CRDTs for a decentralized approach).

Moral of the story? Failing an interview question just means you’ll come back stronger.

What Would You Add?

Would you choose OT or CRDTs? How would you handle offline mode differently?

I would love to get your view as well.

