DEV Community: Zak

Database generated events: LiveSync’s database connector vs CDC

Zak — Thu, 20 Jun 2024 08:34:06 +0000

Ably LiveSync is a product we launched last month to help developers deliver live updates in their applications by automatically keeping their database and frontend clients in sync.

LiveSync is made of two components, the Models SDK that runs on the client, and the database connector that listens to changes in your database and syncs those changes to your clients.

When we talk about ‘listening to your database’, we’re often referring to Change Data Capture (or CDC). In CDC a component listens to your database and distributes events representing the changes that have happened. Normally the events that are created by CDC represent the changes to each individual underlying row.

In this post, we’re going to discuss how the LiveSync database connector works, and how and why it is different from CDC.

Change data capture (or CDC) is a mechanism of listening to the changes in your database and sharing these (typically over a queue) with other consumers. It allows other participants in your system to react to changes that have happened in one part of the system. A classic example is in some ‘event driven’ architecture where an email service will send a confirmation email when some e-commerce ‘order’ is created. Typically the changes that are shared as events by CDC represent changes to each individual underlying database table. For example, a change of email address for a user would have the user record’s ID, and email address before and after the change.

LiveSync’s database connector is different from traditional CDC, and we made some intentional choices about how and why it should be different. Our database connector is based on the ‘outbox pattern’, where a specific table is used to share database changes with the connector. Rows written to the outbox table have a ‘data’ field where the content of that column will be the payload of the event that’s generated and shared over Ably channels by the LiveSync database connector. The rows in the outbox also let you choose which Ably channel a record should be sent to.

So given both CDC and the LiveSync database connector generate events to be shared with some consumers, what’s the difference between the two? Well it’s all about encapsulation and control; two things we wanted to make sure LiveSync baked in, which you might not get with traditional CDC.

In traditional CDC:

Joins are hard: Typically CDC events are generated per-table, this means that for any frontend data model that’s made of more than one database table, where you’d normally be writing a join in your database query, it’s really hard to work with in CDC. Events representing changes to two different tables have to be glued back together after they have been generated by a CDC system (perhaps using transaction ID metadata attached to the events).
Encapsulation is hard: Because traditional CDC operates on the database table level, it’s far too easy to accidentally share some internal data model details with the consumers of your CDC events. Lots of CDC systems do allow you to filter the fields included in the CDC events, but it’s still an easy mistake to make.
Access control is hard: It’s hard to encode into CDC which consumers should get access to which pieces of data. Typically all changes for a specific table go into a specific queue. Some CDC products do allow the partitioning of events into separate queues or topics based on a field of the changed row, but this isn’t always enough. Especially when you need data from a different table to enforce your access requirements (see: ‘joins are hard’). You end up having to create some filtering or mapping somewhere that can be queried server-side to work out if a client is allowed to see a row. (This is exactly what Supabase Realtime does, where the database has to be re-queried on each change event to enforce Row Level Security).

When we built the LiveSync database connector we wanted to make it easy to work with database generated events, and solve the things that made it hard to work with CDC. LiveSync is different firstly because it uses the ‘outbox pattern’. Using the outbox table, you get to control the content of the events that are shared by the database connector. You also get to control which Ably channel those events are published on. These two differences solve a bunch of the problems with traditional CDC.

You can use channels to help with access control: Ably’s capabilities mechanism allows you to authorise clients to only perform specific operations (publish, subscribe, etc) on specific channels. This allows you to publish only the information relevant to those clients on those channels. And because you get to control the channel a database event is written to at the time the event is created, you can dynamically control which clients get access to which data.
You can expose only the data the clients need: Because you get to specify the content of the event that’s written to an Ably channel, you can publish the exact content that the frontend needs. This solves the problems that CDC has with joining events from multiple tables, and solves the oversharing that can happen with CDC. So enforcing the right encapsulation and joins on the data is much easier.

Traditional CDC products can be faster to ‘plug and play’, as they will connect to your database without any changes to your application and start sharing database changes. But there’s a trade-off, enforcing the access control, encapsulation, and joins that your application needs is much harder.

In LiveSync we make these problems easy, by giving you an outbox table allowing you to choose which events are published, where those events are published, and the content those events carry. You can write events to the outbox transactionally along with the other changes in your database to ensure that the publish of messages to Ably and changes in your database either both happen, or neither happen. The outbox table along with the database connector guarantees that events written to the outbox table will appear in Ably, and will retain their order within each channel you’re writing to.

Get started with LiveSync now

Sign-up for a free account and dive into our docs to give LiveSync a try. If you are interested in being an alpha tester, or would like to provide feedback on the product, please get in touch - we’d love to collaborate!

Reliably syncing database and frontend state: A realtime competitor analysis

Zak — Thu, 16 May 2024 07:57:25 +0000

Ably’s LiveSync product is now available as a public alpha release. LiveSync allows you to automatically keep data in sync across servers and clients, in realtime.

This is a hard distributed systems problem. In this post, we’re going to have a look at four existing solutions to realtime data synchronization, the different technical designs, and the tradeoffs they make. Then, at the end of the post I’ll describe the set of design choices we’ve taken at Ably for our LiveSync product.

But first, what are others up to?

1. Supabase

Supabase has a feature called Postgres Changes, which is part of their realtime product offering. It allows you to listen to changes that happen in a single table, and send those changes to clients that are allowed to receive them.

As you INSERT, UPDATE, and DELETE records in your Postgres instance, Supabase receives these changes via the Postgres Streaming Replication Protocol, and sends the changes to the subscribed clients over websockets.

Supabase also bundles auth through Row Level Security (RLS). This is a feature where clients get individual tokens, and those tokens grant access to specific rows in a Postgres table. The Postgres Changes feature integrates with RLS.

Things to consider:

Supabase WebSockets do not have history or guaranteed delivery: So you’re not guaranteed to receive all the changes made to your database, and you can’t ‘rewind’ to get those changes if you do notice you’ve missed them. Even temporary disconnections from clients can result in you missing messages, and you have to do quite a lot of work to protect against this.
Mis-managed replication slots can tank your whole database: The architecture used by Supabase Postgres Changes involves two components: your Postgres server and a Realtime server. The Realtime Server consumes database changes from the Postgres replication slot. The replication slot stores all the changes that are happening in the database and once those changes have been processed by the consumer of the replication slot the data representing that change can be deleted from the replication slot by Postgres. This means that if the Realtime server can’t consume from the slot fast enough (faster than Postgres is writing to the slot), then the data stored in that replication slot will continue to grow. This replication slot shares disk space with your database. Eventually, the replication slot can consume all the disk space of your database. Once this happens, Postgres will reject new connections and new writes to the database.
RLS adds a lot of queries to your database when using Postgres changes for fan-out: The Supabase RLS feature delegates authentication and access control to the built-in Postgres Row Security Policies feature. In most request-response queries to the database, a single user is making some queries and Postgres is checking if that user’s auth token determines that they can access those specific rows. However, one of the common uses of the Postgres Changes feature is to fan-out database changes to multiple clients over WebSockets. To implement RLS for these fan-out changes, Supabase will re-query the row that has changed using each client’s auth token. That means if you have 1000 clients listening for changes in your Postgres table, then a single INSERT, UPDATE, or DELETE operation on that table will result in Supabase generating 1000 queries against the database (one for each client) to fetch the changed row and check whether access is allowed.
Postgres Changes only works for a single table at a time: That is, you can subscribe to multiple tables, but events in different tables are distinct from one-another. The data that your clients need is probably spread across a bunch of normalized database tables, and probably relies on a set of joins to create the entity that the client uses. However, Supabase will only send the changes that are happening to individual database tables. One option is to implement logic in the client to materialize the result of a query from the stream of changes to individual tables, which is exceedingly difficult to implement. A simpler solution, sometimes known as a poke/pull pattern, is to use Postgres Changes to fan-out a change notification to clients who then individually query for the new data from the database. This causes all your clients to request data at the same time, creating a spike request load.

2. Liveblocks

Liveblocks’ products are based on a series of CRDT building blocks (LiveObject, LiveMap, LiveList). These objects are backed by some opaque storage managed by Liveblocks. The changes are shared between the server and clients over WebSockets, which handle automatic reconnection and fetching older data. Thanks to the properties of the underlying CRDT, the order of message delivery doesn’t matter.

By building on CRDTs, Liveblocks ensures a high level of data integrity for multiplayer, collaborative applications where multiple users make changes to a shared object in realtime.

Things to consider:

Liveblocks stores and controls your data but getting a copy to your server is hard: There is no mechanism by which Liveblocks can push changes into your datastore. In order to reflect a copy of the Liveblocks data into your own server or database, you can pull the state of the object via the Storage API. This can be used in conjunction with a Storage Updated Event webhook to obtain the new state when changes are made, which is sent at most every 5 seconds. Therefore, due to the additional network round trips and throttled change notifications, there may be significant latency in reflecting changes from Liveblocks to your own servers.
Source-of-truth: In order to implement the CRDT building blocks, Liveblocks maintains control over the source-of-truth copy of your data. Many applications may expect the database to represent the source-of-truth in order to provide other application functionality. Therefore Liveblocks is best suited for the truly collaborative parts of your application state in which eventual consistency with your database is acceptable.
Limited cross-platform support: Liveblocks ships a set of client libraries that allow you to interact with the underlying CRDT storage primitives. The libraries are open-source, but don’t cover a broad set of languages and currently only target the JavaScript ecosystem.

3. ElectricSQL

ElectricSQL is bi-directional database sync between a Postgres server and SQLite running on the clients. Changes made in your Postgres database are fanned out to the clients, and changes made in each client’s SQLite are reflected back into the Postgres Server. ElectricSQL makes this work with a series of client libraries, and a backend component called the Electric Sync Service that runs against your Postgres Server. The Sync Service, similar to Supabase Postgres Changes, also makes use of logical replication to receive outbound changes from the database, but additionally the database consumes changes inbound from the Sync Service. The SQLite databases running on the clients are kept in sync with the Sync Service over WebSockets.

ElectricSQL offers a novel approach to data synchronization and provides some powerful tooling for building local-first applications.

Things to consider:

Tight coupling between the Sync Service and your database: ElectricSQL’s design means that the Sync Service is quite tightly coupled to your database. An implication of this, for example, is that migrations have to be applied to your database via the Migrations Proxy.
Limited query language expressivity: ElectricSQL queries are table-scoped, which limits the complexity of the queries you can write.
Sync lags for one-to-many relations: ElectricSQL queries allow you to request data from related tables, but only many-to-one relations will be synced at the same time as the parent row. Other relations can be ‘included’, but will be synced sometime after. ElectricSQL calls this ‘move-in lag’. For example in an app with projects and issues, if you sync ‘active’ projects and a project becomes active then the project row (and many-to-one relations like project author) will be synced at the same time, but the issues (one-to-many) won’t immediately appear. So your app would show a project with zero issues until the issues are synced by ElectricSQL.
Conflict resolution semantics: Local writes are subject to concurrent merge semantics of the underlying CRDTs that are used under-the-hood, meaning that updates might be overridden by the semantics of the conflict resolution logic, which may not preserve the original intent of the operation performed by the user.
There’s no permissions control: At the time of writing, ElectricSQL doesn’t have an access control or permissions system, although it is on the roadmap.

4. Replicache

Replicache is a client library that can be integrated with your own backend to synchronize database changes to your frontend clients. It operates on the principle of Server Reconciliation, an approach which delegates the resolution of conflicting concurrent changes to the database (or server) such that all clients reflect the authoritative state on the server, rather than relying on the properties of CRDTs as Liveblocks and ElectricSQL do.

To make Server Reconciliation work, the clients make requests to tell the server the change they want to make (and not the result of the change they have made, like you might in a REST API). The request to the server could be “increment by 1” rather than the result of the increment operation that a client might have applied to its own local state. The server then applies that change, and then sends a change notification to clients. The change notification (or “poke”) is delivered to the client over a WebSocket, and on receipt of the poke the client must then request any changes it hasn’t yet received from the server. To fulfill this request, the server must compute the set of changes on a per-client basis such that each client receives only the data that it is missing.

Replicache has a hosted solution called Reflect that takes away the significant complexity of building and managing the associated backend.

Things to consider:

Backend complexity: There is significant complexity in correctly implementing your backend in order to satisfy the requirements of the client library. Depending on the needs of your application, Replicache offers guidance for the different strategies you can implement to enable server-client synchronization.
Storing the complete change history: In order to send each client the specific set of changes that client hasn’t received, the backend needs to store the full change history that has happened. There’s no guarantee how up-to-date any specific client is, so the full change history is required so that lagging clients can be caught back up. For any reasonably fast-changing entity, the server will very quickly end up with a lot of changes it has to store. Additionally, some strategies require you to use soft deletes, meaning that entities cannot be fully removed from your database.
Storing per-client state: Depending on the backend strategy used, the backend may need to store a pointer for each client that represents where in the change history the client thinks it is up to. The pointer is used to catch-back-up clients that have forgotten where they got to. Replicache’s design means you have to store a lot of extra data in the database.
Thundering herd: On delivery of a poke to connected clients, all clients must request their missing changes from the backend. This creates a spike in load on your database and server which cannot be cached as each client requires a customized response based on where it is in the change history.

In each of these designs there are a series of technology and design trade-offs. Each one will be better or worse depending on the use case it’s applied to. With LiveSync, we’re trying to make the best choices for the most use cases, minimizing developer effort and offering a high degree of flexibility and data consistency.

Ably LiveSync

LiveSync allows you to reliably synchronize changes that occur in your database to frontend clients. LiveSync is a flexible solution designed to work with your database of choice and supports incremental adoption in existing applications.

LiveSync borrows concepts from Event Driven Architectures to offer a flexible and reliable approach to data synchronization. It is built on top of Ably Pub/Sub Channels, our highly available, low latency edge messaging product. This means that it automatically benefits from automatic connection recovery as well as features like rewind and history to make sure you don’t miss messages. You can also leverage Ably capabilities to control what data changes clients can access.

LiveSync provides a client library called the Models SDK which makes it easy to define data models in your frontend that are automatically kept up to date with changes made on your backend. These changes are automatically distributed to clients via the Database Connector, which is available as a hosted solution on the Ably platform or as a container image which you can operate yourself. The Models SDK helps merge those changes easily and provides hooks for optimistic updates allowing for a snappy user interface.

Synchronizing changes

LiveSync uses Ably Pub/Sub Channels to deliver the stream of changes from the database to connected clients. Each change is assigned a number that describes its position in the sequence of changes, called a sequenceId. Ably stores a history of the stream of changes on the channel so your backend doesn’t need to. The Models SDK tracks the sequence of changes the client has received and automatically rewinds on the channel to ensure that it doesn’t miss any changes.

Applications built on LiveSync make use of the transactional outbox pattern to reliably deliver an ordered sequence of changes from your database to clients. When you make a change in your database, you transactionally write an event to the outbox representing that change. You choose the content of the event, and the change will be reliably fanned out in-order to all clients and merged with your frontend application’s state via the Models SDK. Once the event is successfully published to a channel, the Database Connector will clean up the data it no longer needs from the outbox table, so your database doesn’t have to store change history or extraneous data.

The outbox table uses a monotonic serial number to automatically assign each outbox event with a sequenceId. Since the event published to the outbox is written transactionally with the changes to your own database tables, all the changes end up with an exact and total order defined by the sequenceId. The Database Connector and Models SDK ensure that these changes are processed by your frontend clients exactly once in the correct order.

You can integrate LiveSync with your existing backend API by extending your database writes to transactionally write change events to the outbox table. Since you determine what data is included on the change event and how it is interpreted by the client, you can make changes to models that span multiple tables and represent that data however you like on the client.

Optimistic updates

In LiveSync, your database is the source-of-truth: in effect, LiveSync is implementing a form of Server Reconciliation in which the server decides the resulting state of some updates and fans those changes out to clients.

There can be a short delay between a client making a change and the client receiving the associated change event from the backend. To make this easy to work with, the Models SDK supports optimistic updates, where the frontend client can make an update optimistically and the result of that update is immediately reflected on that client only. When the update is confirmed by the backend the optimistic update is replaced with the confirmed result delivered to the client. If the backend decides to reject the optimistic update, then the optimistic update is automatically rolled back in the client that requested it.

Bootstrap

The final piece of the puzzle is bootstrapping a brand new client from scratch. You probably already have a REST API that you use to fetch an entity from your backend. You can use your existing endpoint to fetch the data used to initialize the model in the frontend. This endpoint can be extended to return one additional piece of information: the sequenceId. The Models SDK automatically uses the sequenceId to rewind to the correct position in the change stream on the Ably channel, so that your frontend receives exactly the changes it’s missing and no more.

You don’t have to manage the sequenceId, because it’s automatically managed by the monotonic serial on the outbox table. You can trivially obtain it by transactionally reading the largest sequenceId currently in the outbox table and return this along with your model data.

Conclusion

LiveSync delivers changes in your database to clients in realtime over WebSockets, reliably and in-order, over Ably’s global edge messaging infrastructure. It can be integrated with your existing application and database. Ably channel history allows you to avoid storing change history or per-client state in your database. Your database remains the source-of-truth for your data.

LiveSync is now available in public alpha. If you would like to try it out, please check out the documentation. If you have any feedback, please get in touch.

You don't need CRDTs for collaborative experiences

Zak — Wed, 22 Nov 2023 14:13:42 +0000

This post was originally published by Senior Engineer Zak Knill on zknill.io.

You don’t need CRDTs

You don’t need CRDTs for collaborative experiences.

First lets get the ‘what-about-ery’ out the way…

Things you do need CRDTs for:

Offline first – this is wayy harder to get useful behaviour with out CRDTs. If you don’t use them, you’re pretty much destined to have LWW (which is actually a CRDT behaviour), and one user is likely to overwrite the changes of another. This isn’t a great experience for anyone involved.
Text editing – everyone’s gonna say “but hey, Google docs uses operational transform not CRDTs”.. OK yes, but you are not google. Martin Kleppmann has a great round-up of the various people who though they implemented OT correctly, but actually didn’t. The reason that you need CRDTs for text editing collaboration is that it’s a really extreme example of collaboration. The nature of text editing is that any tiny errors in the placement of characters by the convergence algorithm is going to create incorrect words, and incorrect words are incredibly obvious. Text editing has a high rate of edits (as you type), and the edits need to interleave perfectly or you get incorrect words, and errors in the interleaving are super obvious (incorrect words)!

Hold, on.. that all sounds great, but..

Maybe I want to use a CRDT even if I don’t strictly need to? OK, yes, maybe you do.

But CRDTs are not without their downsides:

Ever-growing state: for CRDTs to work well they need to keep a record of both what exists, and what has been deleted (so that the deletes aren’t accidentally added back in later). This means that CRDT state will continually expand. There’s a bunch of magic that CRDT library authors are doing with clever compression techniques to make this problem less-bad, but it’s basically in-escapable. The size of your CRDT state is not purely a function of the size of the state the CRDT represents, but also of the number of updates that state has gone through.
Complex implementations: CRDTs are easy to implement wrong, so probably don’t roll your own. Instead you’re going to end up using one of the main libraries that are growing in maturity and popularity. These libraries solve the hard internal-CRDT problems for you, like converging updates, and compressing state, etc. But in using one of these, you’re locked into that technology (generally Javascript).
Opaque state: Because the CRDT has to represent both the underlying state and the updates that led to that state, in order for it’s convergence algorithm to work properly, you’re generally left with an opaque blob of binary encoded data.

You can’t inspect your model represented by the CRDT without using the CRDT library to decode the blob, and you can’t just store the underlying model state because the CRDT needs its change history also. You’re left with an opaque blob of data in your database. You can’t join on it, you can’t search it, you can’t do much without building extra features around that state blob.

So maybe you are convinced that CRDTs are not the be-all-and-end-all of collaboration, and that you aren’t in one of the two categories where you probably should use a CRDT, and you’ve made it this far in the post.

What you need instead of CRDTs

Now, let me explain why you don’t need CRDTs, and what you do need instead:

Contextual information – the ability to show where a collaborator is in the collaborative environment; who is engaging with the page, which element they’ve selected, and where their cursor is.
Locking for safety – the ability to lock them smallest possible individual components or elements of the page to stop conflicting updates from happening.
Small-scale updates – the ability to update only the information that has changed, and no more.
Realtime fan-out of updates – the ability to share the update a user has made, in realtime, with all the other collaborators in that environment.

Remind me again why this is better? Well, you get realtime collaborative experiences in a data format/structure that you own and control, without suffering any of the downsides of CRDTs. TL;DR - because it works!

Examples of collaboration without CRDTs

I’ll run through a bunch of broad categories of applications, and describe how to make use of these features.

Form builders: Google Forms, Typeform, Attest, etc

Typically lots of smaller individual inputs. You’re unlikely to want two users collaborating on the same form-box at once.

How to make it collaborative:

Lock the individual inputs/elements that a user is engaging with.
Add Contextual information to show which user has locked which element.
Add Contextual information to allow users to indicate to each other with their cursors.
Have a mechanism to update only the form element that has changed, so that two users making updates at the same time do not overwrite each other
Have a mechanism to fan-out the update that each user has made to the other user’s in realtime, so that each can see the other’s changes as soon as the input is unlocked.

Task management: Jira, Shortcut, Trello, Linear

Typically lots of ‘cards’ or ’tasks’ with various properties: status, owner, title, description, project, epic, etc.

How to make it collaborative:

Lock the individual inputs/elements that a user is engaging with.
Add Contextual information to show which user has locked which element.
Have a mechanism to update only the form element that has changed, so that two users making updates at the same time do not overwrite each other
Have a mechanism to fan-out the update that each user has made to the other user’s in realtime, so that each can see the other’s changes as soon as the input is unlocked.

Spreadsheets: Google sheets, Airtable

Typically a ui of rows and columns. Sometimes the records can expand to show inner-details of that row, similar to the task management apps.

How to make it collaborative:

Lock the individual inputs/elements that a user is engaging with. For Google Sheets; the individual cells. For Airtable, the individual elements on the record.
Add Contextual information to show which user has locked which element.
Have a mechanism to update only the form element that has changed, so that two users making updates at the same time do not overwrite each other
Have a mechanism to fan-out the update that each user has made to the other user’s in realtime, so that each can see the other’s changes as soon as the input is unlocked.

Picture drawing, presentation and whiteboards: Miro, Figma, Google Slides

Typically free-form drawing combined with preset shapes, stickies, note and text boxes, and more.

How to make it collaborative:

Lock the individual elements that a user is engaging with, it doesn’t really matter if two users draw over the same part of the canvas. It only matters if two users try and change the same element at once.
Add Contextual information to show which user has locked which element.
Add Contextual information to allow users to indicate to each other with their cursors.
Have a mechanism to update only the form element that has changed, so that two users making updates at the same time do not overwrite each other
Have a mechanism to fan-out the update that each user has made to the other user’s in realtime, so that each can see the other’s changes as soon as the input is unlocked.

Design patterns

You can start to see a few things here:

The pattern for building collaborative experiences is almost exactly the same across all of these types of applications that don’t need CRDTs,
Many of the applications that you know from these broad categories operate in exactly this way. A few examples; cell locking in Google Sheets, post-it locking in Miro and Figma, element locking in Google Slides.