Samuel Danquah

Posted on Jun 26, 2020

Collaborative Editing with Gridsome and FaunaDB: Introduction

#webdev #vue #yjs #fauna

Document Conventions

The following conventions are employed throughout this article.

JAMstack	JavaScript, APIs, and Markup technology stack
CMS	Content Management System
OT	Operational Transformation
CRDT	Conflict-Free Replicated Data Types
CmRDT	Commutative Replicated Data Types
P2P	Peer-to-Peer
ABAC	Attribute-Based Access Control

Background

The dawn of Web 2.0 sparked an explosive interest in browser-based document editing tools. Today, creating content requires collaboration to arrive at the final masterpiece, in addition to the many revisions throughout a piece of content’s life cycle. Google Docs, a free collaborative editing tool, rose meteorically in this space on the bedrock of real-time collaboration. In the modern era of the JAMstack, headless content management systems provide a cheap, scalable, and developer-friendly approach to content creation and management, in contrast to classical monoliths. However, collaborative editing is native neither to the JAMstack ecosystem of headless CMSs nor to its monolithic predecessors. As a result, a common workflow is to collaborate on content in Google Docs (or an alternative) and then transfer it to the CMS of choice, which slows down the whole content pipeline. How can dynamic JAMstack applications trapped in this scenario surmount this bottleneck? To answer this question, we survey the real-time collaborative editing landscape available to the JAMstack ecosystem, discuss the challenges, and finally build a collaborative text editor that uses WebRTC and persists data to FaunaDB.

But first, we will dive under the hood of a real-time collaborative editor to comprehend the moving parts and their integration.

Mechanics of a real-time collaborative editor

A version control system like Git is the polar opposite of a real-time collaborative editor: changes are merged asynchronously, and conflicts are resolved by hand. At its foundation, a real-time collaborative editor has a single goal: providing consensus across multiple users editing the same content in near real time. Furthermore, a distributed team of content editors may need to collaborate over slow and flaky internet connections, which makes seamless content resynchronization a minimum requirement. Fundamentally, a real-time collaborative editor mandates a delicate balancing act across three concerns: clients (content editors), concurrency (simultaneous editing), and communication (centralized or decentralized).

Clients

Clients represent end-users who need to collaborate on content with write permissions in real time. The number may vary depending on the size of the collaboration team wanting to edit the content at any instant (Figure 1). A growing number of clients increases the difficulty of scaling any collaborative editing solution. For example, Google Docs allows a maximum of 100 users to comment on and edit a document at the same time.

Concurrency

Concurrency relates to the conflicts that are bound to arise when clients simultaneously edit the same content; it is the primary challenge of real-time collaborative editing. Addressing it requires a blend of concurrency control and consistency maintenance across all simultaneous users accessing the content. Whether users are creating, updating, or deleting content concurrently, collaborative editing operations can be classified at a basic level as insert and delete actions. A real-time collaborative editor then has to satisfy two challenging properties: commutativity and idempotency.

Commutativity: Concurrent insert and delete operations converge to the same result despite the order of application.
Idempotency: Repeated delete operations must produce the same result.
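
Commutativity and idempotency can be made concrete with a minimal sketch in plain JavaScript. This is a toy model under stated assumptions, not how Yjs or any production CRDT is implemented: every character carries a unique id, inserts are kept in a map, and deletes are recorded as tombstones. The ids and the helper names createReplica, applyOp, and visibleText are made up for illustration.

```javascript
// Toy replicated text: NOT a production CRDT, just enough to show
// commutativity and idempotency. All names here are hypothetical.
function createReplica() {
  return { inserted: new Map(), deleted: new Set() }
}

function applyOp(replica, op) {
  if (op.kind === 'insert') {
    replica.inserted.set(op.id, op.char) // re-applying the same insert changes nothing
  } else if (op.kind === 'delete') {
    replica.deleted.add(op.id) // a tombstone; adding it twice changes nothing
  }
  return replica
}

function visibleText(replica) {
  return [...replica.inserted.keys()]
    .filter(id => !replica.deleted.has(id))
    .sort() // a deterministic order derived from the ids alone
    .map(id => replica.inserted.get(id))
    .join('')
}

const ops = [
  { kind: 'insert', id: '1@anna', char: 'H' },
  { kind: 'insert', id: '2@evan', char: 'i' },
  { kind: 'delete', id: '1@anna' }
]

// Anna's replica receives the operations in order.
const anna = ops.reduce(applyOp, createReplica())
// Evan's replica receives them reversed (the delete arrives before the
// matching insert), and the delete is then delivered a second time.
const evan = [...ops].reverse().concat(ops[2]).reduce(applyOp, createReplica())

console.log(visibleText(anna)) // "i"
console.log(visibleText(evan)) // "i"
```

Both replicas end up with the same text even though the operations arrived in a different order and one delete was applied twice.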

Real-time collaborative editing solutions fall into two broad strategies for tackling these challenges. The first and most popular approach is Operational Transformation (OT). This strategy gave birth to the first generation of real-time collaborative editors: Etherpad, Firepad, and Google Wave, among others. However, OT is more complicated to implement in a decentralized communication model than in a centralized one. The plethora of OT algorithms published in academic papers to address a fully distributed approach, each with its own trade-offs, highlights the complexities involved. Current real-time collaborative editors still use OT and mostly adopt a centralized communication model. The alternative to OT is Conflict-Free Replicated Data Types (CRDT). Researchers developed this strategy in a quest for something more robust and simpler than OT, and it has produced a new generation of real-time collaborative editors.

Communication

Communication protocols provide an avenue to relay content changes from the content creator to other concurrent content editors. The major obstacle to real-time communication is the physical distance between users, which no protocol can eliminate. The two prevalent models are centralized and decentralized architectures.

Centralized Architecture

This architecture (Figure 2) relies on a client-server model. It requires a central server to mediate content changes between multiple concurrent clients and achieve convergence across users. The central server handles concurrency control and consistency maintenance.

Paradoxically, it serves as both a single source of truth for conflict resolution and a single point of failure. If the server fails, content editors can no longer collaborate. Another limitation is the high latency introduced between nearby clients who are both far away from the central server (Figure 3). The central server has to receive content updates from one user (Anna) for conflict resolution before sending them to another user (Evan), even though the two are near (10ms) to each other. Furthermore, a centralized server presents scaling challenges as the number of distributed clients grows.
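
The latency penalty is simple arithmetic, sketched below. Only the 10 ms figure between Anna and Evan comes from the text; the 80 ms legs to the central server are assumed purely for illustration, and server processing time is ignored.

```javascript
// One-way latencies in milliseconds. Only the 10 ms Anna-Evan figure comes
// from the article; the 80 ms server legs are assumed for illustration.
const latency = { annaToEvan: 10, annaToServer: 80, serverToEvan: 80 }

// Centralized: every update travels client -> server -> client.
function relayedLatency(l) {
  return l.annaToServer + l.serverToEvan
}

// Decentralized: the update travels directly between the two peers.
function directLatency(l) {
  return l.annaToEvan
}

console.log(relayedLatency(latency)) // 160
console.log(directLatency(latency)) // 10
```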

Decentralized Architecture

This architecture is a peer-to-peer system. Each end-user (peer) acts as both client and server, communicating directly with other users without a central server. However, a signaling server is required to broker the initial peer-to-peer connections; it serves no further purpose. Afterward, clients exchange content updates directly. WebRTC is the technology that enables real-time communication among concurrent users and is available in modern browsers. This decentralized approach is in stark contrast to the centralized architecture (Figure 4) and does not suffer from the issues of a central server. However, there is a limit on the maximum number of peer-to-peer connections, and the client browser decides this limit. For instance, the Chromium browser sets a limit of 500 peer-to-peer connections in its source code.

Comparison of real-time collaboration solutions

This is not an exhaustive list and is meant to be a reference guide.

Feature	Yjs	CKEditor	Collab	ShareDB
Communication	P2P	Centralized	Centralized	Centralized
License	MIT	Proprietary	MIT	MIT
Network-agnostic	Yes	No	No	No
Offline editing	Yes	No	No	Yes
Algorithm	CRDT	OT	Reconciliation	OT
Supported editors	ProseMirror, Quill, Monaco, Ace, CodeMirror	CKEditor	ProseMirror	CodeMirror, Quill, Ace
Supports content types other than text	Yes	No	No	Yes
Scalable	Yes	Yes	No	Yes
Server data loss sync	Yes	No	No	No

Real-time collaborative text editor: Case Study

Let us look at a hypothetical company X with a real-world collaboration problem while building a real-time collaborative text editor solution in the process:

Background

Company X has a small distributed team of content editors, spread across the US and Europe, who have to collaborate on content simultaneously. For every published piece of content, up to 30 content editors have to work simultaneously. Company X seeks a seamless solution to the problem and, like most companies, is on a tight budget. How can company X address this issue? We are going to explore an example application to cover this use case.

The maximum number of clients working on a single piece of content at any instant is 30. Next, we have to decide on concurrency control and on a communication protocol that balances efficiency and cost. Do we employ an OT or a CRDT approach for concurrency and conflict resolution? An OT strategy simplifies concurrency to one conflict at a time, which a centralized server can resolve before propagating content changes to all clients. However, that server is also a single point of failure with little fault tolerance: if it goes offline at any point, the team cannot collaborate. OT also creates inefficiencies, because content editors living in nearby neighborhoods must always send content updates to a distant central server for conflict resolution before their neighbors can receive the changes. Lastly, running a central server for conflict resolution adds costs to an already tight budget, on top of the non-negotiable bills for persisting content updates to a database. Note that in other scenarios, where a large distributed team needs to collaborate simultaneously, server costs for communication are inevitable.

Can a CRDT alternative solve these challenges for company X? In contrast to OT, CRDT provides a decentralized, autonomous conflict resolution strategy that needs no central server for concurrency control, irrespective of the number of connected clients (content editors). This approach eliminates the single point of failure observed in the OT strategy and is more fault-tolerant.

Furthermore, CRDT guarantees automatic content synchronization across all connected clients. For clients with flaky internet connections, content updates are applied seamlessly upon reconnection and resolve to a unified state for the whole team of content editors. For the earlier situation of nearby content editors and a distant central server, CRDT offers direct peer-to-peer communication through protocols such as WebRTC, which is available in modern browsers. This P2P option can save the cost of running active communication servers. However, the peers still need a signaling server to establish the initial direct connection. This signaling server is analogous to a matchmaker for clients and takes no further part in the actual communication of content updates among connected clients. There are free-to-use public signaling servers as well as self-hosted private options. In summary, our solution will employ a CRDT strategy, and hence we use Yjs, a network-agnostic shared editing framework.

Lastly, we need to persist the content to a database. Since we are working with a distributed team of content editors, FaunaDB presents a viable solution (Figure 5) for data persistence with its global serverless database, which provides low latency data access and distributed ACID transactions without sacrificing data consistency.

Yjs has integrations for several text editors, as seen in the comparison chart for real-time collaborative editing solutions. For our scenario, we use ProseMirror, an open-source toolkit for building renderless and extensible rich text editors. Yjs also provides a communication connector for WebRTC, among others. We will use this connector (y-webrtc) in our application; it satisfies company X’s current threshold of at most 30 collaborators per document. Now we have the ingredients to make a real-time collaborative text editor with data persistence. But how secure is this system end to end?

Security

Figure 6 shows the communication model for our decentralized real-time collaborative editing solution. Communication channels exist between client and client (P2P), between client and FaunaDB, and between client and signaling server. In Figure 6, Evan creates the document for collaboration with a unique room attribute as well as a shared secret attribute and persists it to FaunaDB (through GraphQL or FQL).

Through FaunaDB’s attribute-based security model (ABAC), we can implement fine-grained access control for collaborators (content editors). Hence, Evan grants Anna collaboration rights to his document. Evan retrieves the document from FaunaDB in our real-time collaborative text editor. This communication channel between Evan (client) and FaunaDB (database) is secure over HTTPS, among other security mechanisms. Communication between client and database is an already solved problem, especially for JAMstack apps.

Therefore, we focus on the other two communication channels, according to Figure 6. Next, both Evan and Anna access our real-time collaborative editing (we-collab) app. They both communicate with the signaling server, which is responsible for handling unique and relevant signaling information such as the IP address and facilitates the necessary exchange of this information to connected clients. Signaling servers follow a publish-subscribe pattern. Yjs provides an option for a shared secret as well as a room for collaboration. FaunaDB secures this information with its strong ABAC model. Yjs uses the shared secret to encrypt the information exchange from the client to the signaling server.

Hence, Evan and Anna can exchange signaling information securely, even on a public signaling server without any trust. Yjs also provides an option to utilize multiple signaling servers in our we-collab app for scalability.
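
The publish-subscribe role of the signaling server can be sketched with a toy in-memory hub. This is an illustration under stated assumptions, not the y-webrtc signaling server: the class ToySignalingHub and its methods are hypothetical, messages are passed by direct function calls instead of over the network, and the payload is left unencrypted.

```javascript
// Toy in-memory signaling hub. Hypothetical names; a real signaling server
// relays messages over the network rather than through function calls.
class ToySignalingHub {
  constructor() {
    this.rooms = new Map() // room name -> Set of subscriber callbacks
  }

  subscribe(room, onMessage) {
    if (!this.rooms.has(room)) this.rooms.set(room, new Set())
    this.rooms.get(room).add(onMessage)
    // Return an unsubscribe function for when the client leaves.
    return () => this.rooms.get(room).delete(onMessage)
  }

  publish(room, sender, message) {
    const subscribers = this.rooms.get(room) || new Set()
    for (const onMessage of subscribers) {
      // Relay to everyone in the room except the sender.
      if (onMessage !== sender) onMessage(message)
    }
  }
}

const hub = new ToySignalingHub()
const annaInbox = []
const evanInbox = []
const annaClient = msg => annaInbox.push(msg)
const evanClient = msg => evanInbox.push(msg)

hub.subscribe('live-wall', annaClient)
hub.subscribe('live-wall', evanClient)
// Evan announces himself to the room; in the real system this payload
// would be encrypted with the shared secret before it leaves the browser.
hub.publish('live-wall', evanClient, { from: 'evan', type: 'announce' })

console.log(annaInbox.length) // 1
console.log(evanInbox.length) // 0
```

The hub only matches clients by room name and forwards opaque messages, which is why it does not need to be trusted with the content itself.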

Finally, the last communication channel is between clients, in our case Evan and Anna (Figure 6). Clients receive content updates through WebRTC, directly from one client browser to the other. How secure is WebRTC? This is a legitimate concern, and the answer is that encryption is mandatory in WebRTC: data channels run over DTLS, so content shared from peer to peer is always encrypted in transit. Therefore, content updates exchanged among peers (clients) are secure by default. The combination of Yjs, WebRTC, and FaunaDB with its strong ABAC model provides end-to-end security for this application.

We-Collab App

We will now proceed to build the we-collab app using Gridsome(Vue) for Markup, WebRTC for peer-to-peer communication, and FaunaDB for data persistence, through its GraphQL API. Let’s get started:

Backend

Log in to your FaunaDB account and create a new database named collab from the cloud console. Next, create a schema.gql file with the following code and upload it to the cloud console.

type Page {
  name: String! @unique
  room: String! @unique
  secret: String
  type: PageType!
  content: String!
}

enum PageType {
  PUBLIC
  PRIVATE
}

type Query {
  allPages: [Page!]
  findPageByName(name: String!): Page
}

Let’s add some data to the collab database. Go to the GraphQL Playground in the cloud console and run the following code:

mutation createPublicPage {
  createPage(
    data: {
      name: "Public Page"
      room: "live-wall"
      type: PUBLIC
      secret: "afreeworld"
      content: ""
    }
  ){
    _id
    name
    room
    type
    secret
    content
  }
}

If successful, your screen should look similar to the picture below (Figure 7):

Lastly, we create a Guest Role using ABAC from FaunaDB. Still logged in, click on the SECURITY tab to the left of your screen, highlighted below.

Next, click on MANAGE ROLES as shown below.

Select NEW ROLE to create our guest role.

On the new screen, change MyCustomRole text to Guest.

Add the Page collection with Read and Write privileges under Collections. Next, add the findPageByName index with Read permission under Indexes. Lastly, add collections with Read and Write privileges under Schemas. Your screen should look similar to the image below. Click Save.

We can now create a new key for the Guest Role by clicking on NEW KEY under the SECURITY tab as shown below:

Next, choose the collab (Current Database) under Database, and select Guest under Role. You can also provide an optional key name. Select Save when you are done.

You should see the screen below with the guest key to connect our frontend. Save this key as an environment variable. In our frontend, we use this key as the GRIDSOME_GUEST_KEY variable. Now we are going to connect our frontend to FaunaDB’s GraphQL API.

Frontend

Our frontend uses Vuetify for UI. Vuetify is a Vue UI library that comes out of the box with ready-to-use, handcrafted Material Design components. We set up our application to consume the FaunaDB GraphQL API. Since we are not utilizing GraphQL subscriptions, we use Axios, a lightweight HTTP client library. You can also use the native browser Fetch API or the graphql-request library, a minimal GraphQL client. In a mutation.js file, we create a mutation with the following code to update our database with the current document version.

export const updatePageMutation = (payload) => {
   return `
     mutation updatePublicPage{
       updatePage(
         id: "${payload._id}"
         data: {
           content:"${payload.content}"
           name: "${payload.name}"
           room: "${payload.room}"
           type: ${payload.type}
         }
       ) {
         _id
         name
         room
         secret
         type
         content
       }
     }
   `
}
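
One caveat with the template-string approach above is that a value containing a double quote or a line break would produce an invalid query. A hedged alternative is to send the values as GraphQL variables. The sketch below assumes the endpoint accepts standard GraphQL variables and that the generated input type for Page is named PageInput; the helper buildUpdatePageRequest is hypothetical and only builds the request body.

```javascript
// Sketch only: assumes the endpoint accepts standard GraphQL variables and
// that the generated input type for Page is called PageInput.
const UPDATE_PAGE = `
  mutation updatePublicPage($id: ID!, $data: PageInput!) {
    updatePage(id: $id, data: $data) {
      _id
      name
      room
      secret
      type
      content
    }
  }
`

// Hypothetical helper: returns the JSON body to POST to the GraphQL endpoint.
function buildUpdatePageRequest(payload) {
  const { _id, name, room, type, content } = payload
  return {
    query: UPDATE_PAGE,
    // Values travel as JSON, so quotes and newlines need no manual escaping.
    variables: { id: _id, data: { name, room, type, content } }
  }
}

const body = buildUpdatePageRequest({
  _id: '123',
  name: 'Public "Page"',
  room: 'live-wall',
  type: 'PUBLIC',
  content: ''
})
console.log(JSON.stringify(body.variables))
```

With this shape, the object returned by the helper would be passed as the data option of the axios call instead of an object holding only the query string.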

Next, we create a query in a query.js file with the following code to fetch a document (content) from our database with the updated document version, whenever a new client (content editor) opens our app for the first time.

export const getPageQuery = (payload) => {
   return `
     query getPage {
       getPage: findPageByName(
         name: "${payload}"
       ) {
         _id
         name
         room
         secret
         type
         content
       }
     }
   `
}

This content has two vital attributes (room and shared secret) for integration with Yjs. Then we create an index.js that houses HTTP calls to our database for queries and mutations through the GraphQL API with the following code:

import { guest, page, url } from "./config"
import { updatePageMutation } from "./mutation"
import { getPageQuery } from './query'
import axios from 'axios'

export const updatePage = (dataPayload) => {
   return axios({
       method: 'post',
       url,
       headers: {
           "authorization": `Bearer ${guest}`
       },
       data: {
           query: updatePageMutation(dataPayload)
       }
   })
}

export const getPage = async () => {
   return await axios({
       method: 'post',
       url,
       headers: {
           "authorization": `Bearer ${guest}`
       },
       data: {
           query: getPageQuery(page)
       }
   })
}
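
Both functions above resolve to an axios response, so the page itself is nested a few levels deep. The following sketch shows one way to unwrap it; the helper extractPage is hypothetical, and the mocked response only imitates the shape a GraphQL endpoint is expected to return.

```javascript
// Hypothetical helper: pulls the page out of an axios response for getPage.
function extractPage(response) {
  const body = response.data // axios puts the parsed HTTP body on `data`
  if (body.errors && body.errors.length > 0) {
    // GraphQL reports failures in an `errors` array in the response body.
    throw new Error(body.errors.map(e => e.message).join('; '))
  }
  return body.data.getPage // `getPage` is the alias used in getPageQuery
}

// Usage with a mocked response shaped like the query result:
const mockResponse = {
  data: {
    data: {
      getPage: {
        _id: '1',
        name: 'Public Page',
        room: 'live-wall',
        secret: 'afreeworld',
        type: 'PUBLIC',
        content: ''
      }
    }
  }
}
console.log(extractPage(mockResponse).room) // "live-wall"
```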

We now look at integrating Yjs as a real-time extension to the ProseMirror text editor in our app. Since we are using Gridsome, we will use Tiptap, a Vue-flavored ProseMirror text editor. We begin by adding Yjs and the y-webrtc connector to the realtime.js file.

import * as Y from 'yjs'
import { WebrtcProvider } from 'y-webrtc'

Next, we create an extension for our real-time collaboration functionality with the code below:

import * as Y from 'yjs'
import { WebrtcProvider } from 'y-webrtc'
import { Extension } from 'tiptap'


export default class Realtime extends Extension {
  get name() {
    return 'realtime'
  }
}

Our real-time extension accepts a document from our getPage query. In the init function, we create new instances of both Yjs and the y-webrtc connector and pass in the document fetched from the database. We store the content for collaboration as a string in FaunaDB, whereas Yjs works with the content as a Uint8Array. Therefore, we use the fromBase64 and toBase64 helpers to convert between a Uint8Array and a Base64 string. Every time the document changes, the update hook of Yjs fires, and all connected clients receive just the update and not the whole state of the document. At the same time, we persist the entire state of the document to FaunaDB. Any new client who joins to collaborate first fetches the current copy from the database and then receives incremental updates directly from connected clients as they arise. y-webrtc also provides an option, maxConns, to set the maximum number of connections allowed per client browser. It is important to add an element of randomness to this number to prevent connected clients from forming clusters, as shown in the code below:

import * as Y from 'yjs'
import { WebrtcProvider } from 'y-webrtc'
import { Extension } from 'tiptap'
import { updatePage } from '@/services/fauna/index.js'
import { fromBase64, toBase64 } from '@aws-sdk/util-base64-browser'

export default class Realtime extends Extension {
  get name() {
    return 'realtime'
  }
  get defaultOptions() {
    return {
        pageDoc: {},
        type: null,
        provider: null
    }
  }
  init() {
      const ydoc = new Y.Doc()

      // Restore the persisted state: decode the Base64 string from FaunaDB
      // into a Uint8Array and apply it to the fresh Yjs document.
      if (
          typeof this.options.pageDoc.content === 'string' &&
          this.options.pageDoc.content.length !== 0
      ) {
          this.options.pageDoc.content = fromBase64(this.options.pageDoc.content)
          Y.applyUpdate(ydoc, this.options.pageDoc.content)
      }

      // On every change, persist the entire document state to FaunaDB.
      ydoc.on('update', () => {
          this.options.pageDoc.content = toBase64(Y.encodeStateAsUpdate(ydoc))
          updatePage(this.options.pageDoc)
      })

      // Join the room over WebRTC; the secret encrypts the signaling traffic.
      this.options.provider = new WebrtcProvider(
          this.options.pageDoc.room, ydoc, {
              password: this.options.pageDoc.secret,
              // Randomized cap so that peers do not form isolated clusters.
              maxConns: 40 + Math.floor(Math.random() * 30),
          }
      )
      // Shared type that the ProseMirror binding will sync against.
      this.options.type = ydoc.getXmlFragment('prosemirror')
  }
}
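
Two details of the listing above can be tried in isolation: the Uint8Array to Base64 round trip and the randomized connection cap. The sketch below uses Node's Buffer as a stand-in for the toBase64 and fromBase64 helpers so that it runs without extra packages; that substitution and the helper names bytesToBase64, base64ToBytes, and randomMaxConns are assumptions made for illustration.

```javascript
// Stand-ins for toBase64/fromBase64, written with Node's Buffer so the
// sketch runs without extra packages (an assumption for illustration).
function bytesToBase64(bytes) {
  return Buffer.from(bytes).toString('base64')
}

function base64ToBytes(text) {
  return new Uint8Array(Buffer.from(text, 'base64'))
}

// Same expression as in the listing: an integer from 40 to 69 inclusive.
function randomMaxConns() {
  return 40 + Math.floor(Math.random() * 30)
}

const update = new Uint8Array([0, 1, 2, 255]) // placeholder bytes, not a real Yjs update
const stored = bytesToBase64(update) // what would be saved in the `content` field
const restored = base64ToBytes(stored) // what would be handed back to Y.applyUpdate

console.log(stored) // "AAEC/w=="
console.log(restored.length) // 4
console.log(randomMaxConns()) // some integer between 40 and 69
```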

Yjs also provides shared editing cursors for connected clients just like in Google Docs to create a better user experience for all connected clients with undo and redo capabilities. You can customize what happens for undo and redo in the shared editing space. Finally, we add the shared editing cursors to our app with the code below:

import { keymap } from 'prosemirror-keymap'
import { Extension } from 'tiptap'
import { redo, undo, yCursorPlugin, ySyncPlugin, yUndoPlugin } from 'y-prosemirror'
import * as Y from 'yjs'
import { WebrtcProvider } from 'y-webrtc'
import { updatePage } from '@/services/fauna/index.js'
import { fromBase64, toBase64 } from '@aws-sdk/util-base64-browser'

export default class Realtime extends Extension {
    get name() {
        return 'realtime'
    }

    get defaultOptions() {
        return {
            pageDoc: {},
            type: null,
            provider: null
        }
    }

    init() {
        const ydoc = new Y.Doc()

        // Restore the persisted state from its Base64 form, if there is any.
        if (
            typeof this.options.pageDoc.content === 'string' &&
            this.options.pageDoc.content.length !== 0
        ) {
            this.options.pageDoc.content = fromBase64(this.options.pageDoc.content)
            Y.applyUpdate(ydoc, this.options.pageDoc.content)
        }

        // On every change, persist the entire document state to FaunaDB.
        ydoc.on('update', () => {
            this.options.pageDoc.content = toBase64(Y.encodeStateAsUpdate(ydoc))
            updatePage(this.options.pageDoc)
        })

        this.options.provider = new WebrtcProvider(
            this.options.pageDoc.room, ydoc, {
                password: this.options.pageDoc.secret,
                maxConns: 40 + Math.floor(Math.random() * 30),
            }
        )
        this.options.type = ydoc.getXmlFragment('prosemirror')
    }

    get plugins() {
        return [
            // Keeps the editor content in sync with the shared Yjs type.
            ySyncPlugin(this.options.type),
            // Renders the cursors of other connected clients.
            yCursorPlugin(this.options.provider.awareness),
            // Undo/redo that only affects the local user's changes.
            yUndoPlugin(),
            keymap({
                'Mod-z': undo,
                'Mod-y': redo,
                'Mod-Shift-z': redo
            })
        ]
    }
}

The final code is available at this repo. Finally, we have built a real-time collaborative text editor for company X. You can play around with the demo (Figure 8) at https://we-collab.netlify.app.

Conclusion

In this article, we took a deep dive under the hood of a real-time collaborative text editor to understand its internal mechanisms and moving parts. We explored the real-time collaborative editing solution landscape to provide viable options to build a real-time collaborative text editor for various use cases. We saw firsthand how WebRTC can provide both a low-cost and secure option for real-time collaboration for small distributed teams.

Finally, the globally distributed architecture of FaunaDB with strong consistency and low latency provides a unique set of complements to any real-time collaborative editing solution. Let’s not forget FaunaDB’s powerful ABAC feature provided out of the box to implement robust security and expand the creative freedom to architect security solutions for various use cases without any external setup.
