Mensah Alkebu-Lan for Universal Equations

Debezium: Open-Source Technology for Change Data Capture

Introducing Debezium

Debezium is an open-source tool for Change Data Capture (CDC). It is a distributed platform that turns your data stores into event streams, so you can focus only on the data that has changed instead of the full data set. For example, it enables you to keep disparate data systems in continuous sync and to respond quickly to new information.
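To get a feel for what these event streams look like, here's a trimmed-down sketch of the kind of change event a Debezium connector emits for a row update. The envelope fields (before, after, source, op, ts_ms) follow Debezium's documented event format, but the table and values here are made up:

```json
{
  "before": { "id": 1004, "email": "annek@noanswer.org" },
  "after":  { "id": 1004, "email": "anne.kretchmar@example.com" },
  "source": { "connector": "mysql", "db": "inventory", "table": "customers" },
  "op": "u",
  "ts_ms": 1594915200000
}
```

A consumer that only cares about updated email addresses can react to this single event instead of re-scanning the whole customers table.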

A traditional "Debezium-centric" architecture is built on top of Apache Kafka and Kafka Connect. It comprises source and sink data stores plus a selection of Kafka Connect-compatible connectors that communicate changes taking place in the source data store to the sink data store.
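As a rough sketch of how that wiring happens in practice, a source connector is registered by POSTing a JSON configuration to Kafka Connect's REST API (usually on port 8083). Assuming the MySQL connector, the configuration looks something like this; the hostnames, credentials, and names are placeholders:

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.include.list": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```

Once registered, the connector writes change events to Kafka topics named after the server and table (for example, dbserver1.inventory.customers), and sink connectors subscribe to those topics.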

If you're comfortable with Docker and Kubernetes, a basic Debezium setup should be only moderately difficult. The main ingredients are Apache Kafka and Kafka Connect. Fortunately, Debezium's public Docker repository has ZooKeeper, Kafka, and Kafka Connect images ready to go. Once you have Apache Kafka and Kafka Connect up and running, you have several Debezium deployment options to choose from. Let's go over a few.
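For instance, something along these lines will get a minimal single-node stack running from those images. The container names, version tag, and topic names are illustrative, not prescriptive:

```bash
# ZooKeeper, Kafka, and Kafka Connect from Debezium's Docker images
docker run -d --name zookeeper -p 2181:2181 debezium/zookeeper:1.2

docker run -d --name kafka -p 9092:9092 \
  --link zookeeper:zookeeper debezium/kafka:1.2

docker run -d --name connect -p 8083:8083 \
  -e GROUP_ID=1 \
  -e CONFIG_STORAGE_TOPIC=connect_configs \
  -e OFFSET_STORAGE_TOPIC=connect_offsets \
  -e STATUS_STORAGE_TOPIC=connect_statuses \
  --link zookeeper:zookeeper --link kafka:kafka \
  debezium/connect:1.2
```

With Kafka Connect listening on port 8083, connectors like the MySQL example above can be registered against http://localhost:8083/connectors.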

Some Kafka Connect Deployment Options

Since Kafka Connect is Debezium's foundation, we'll focus mainly on Kafka Connect deployment options for the time being.

Apache Camel and Debezium are a natural fit, since Camel is known for its ability to integrate data consumer and producer systems. We also can't lose sight of the fact that both technologies are strongly backed by Red Hat.
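To make that concrete, here's a minimal sketch of a Camel route built on the camel-debezium-mysql component. The endpoint options mirror the Debezium connector properties in camelCase, but treat the exact option names, hosts, and credentials as assumptions to check against your Camel version:

```java
import org.apache.camel.builder.RouteBuilder;

// Sketch only: consume Debezium change events from MySQL via Camel's
// debezium-mysql component and forward them to a Kafka topic.
// Hostnames, credentials, and file paths are placeholders.
public class DebeziumCamelRoute extends RouteBuilder {

    @Override
    public void configure() {
        from("debezium-mysql:customers"
                + "?databaseHostname=mysql"
                + "&databasePort=3306"
                + "&databaseUser=debezium"
                + "&databasePassword=dbz"
                + "&databaseServerId=85744"
                + "&databaseServerName=dbserver1"
                + "&databaseHistoryFileFilename=/tmp/dbhistory.dat"
                + "&offsetStorageFileName=/tmp/offsets.dat")
            // Each exchange carries one change event; hand it to any
            // Camel endpoint, in this case a Kafka topic.
            .to("kafka:dbserver1.inventory.customers?brokers=kafka:9092");
    }
}
```

The same pattern works in the other direction: Camel's large catalog of components makes it straightforward to push change events into systems that don't have a ready-made Kafka Connect sink.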

You can also use Debezium on Google Cloud with technologies like Pub/Sub and Dataflow. In this architecture, Pub/Sub plays a role similar to Apache Kafka's in a traditional setup, but you'll still need some knowledge of the Kafka Connect libraries.
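A minimal sketch of that idea, assuming Debezium's embedded engine API and the Google Cloud Pub/Sub client library (project, topic, and connector settings are placeholders), might look like the following. Notice that offset tracking still comes from the Kafka Connect libraries even though no Kafka broker is involved:

```java
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;
import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

import java.util.Properties;
import java.util.concurrent.Executors;

// Sketch: run the Debezium MySQL connector in-process and publish each
// JSON change event to a Pub/Sub topic. All names and credentials are placeholders.
public class MySqlToPubSub {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("name", "engine");
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        // Offset tracking reuses a Kafka Connect class, even without a Kafka broker.
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");
        props.setProperty("database.hostname", "mysql");
        props.setProperty("database.port", "3306");
        props.setProperty("database.user", "debezium");
        props.setProperty("database.password", "dbz");
        props.setProperty("database.server.id", "85744");
        props.setProperty("database.server.name", "dbserver1");
        props.setProperty("database.history", "io.debezium.relational.history.FileDatabaseHistory");
        props.setProperty("database.history.file.filename", "/tmp/dbhistory.dat");

        Publisher publisher = Publisher.newBuilder(
                TopicName.of("my-gcp-project", "dbserver1-changes")).build();

        // Every change event the connector captures is forwarded to Pub/Sub.
        DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
                .using(props)
                .notifying(record -> {
                    if (record.value() != null) {
                        publisher.publish(PubsubMessage.newBuilder()
                                .setData(ByteString.copyFromUtf8(record.value()))
                                .build());
                    }
                })
                .build();

        Executors.newSingleThreadExecutor().execute(engine);
    }
}
```

From there, a Dataflow pipeline can subscribe to the Pub/Sub topic and write the changes on to BigQuery or another downstream store.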

In a traditional "Debezium-centric" architecture, a relational database is used for the sink data store, but there are use cases, such as database caching, where an in-memory computing platform like GridGain can be used instead.
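For the relational-sink case, a sink connector is registered with Kafka Connect in the same way as the source connector above. As a sketch, using Confluent's JDBC sink connector and Debezium's ExtractNewRecordState transform as examples (with placeholder connection details), the configuration might look like this; GridGain's own sink connector slots into the same position for the caching use case:

```json
{
  "name": "jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "dbserver1.inventory.customers",
    "connection.url": "jdbc:postgresql://postgres:5432/inventory",
    "connection.user": "postgres",
    "connection.password": "postgres",
    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "auto.create": "true",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
  }
}
```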

Before I forget, I have to thank Gunnar Morling at Red Hat and his team for all the work they are putting into Debezium.

References:

  1. Extending Kafka connectivity with Apache Camel Kafka connectors. https://developers.redhat.com/blog/2020/05/19/extending-kafka-connectivity-with-apache-camel-kafka-connectors/. Last accessed: 7/16/2020.
  2. How do I move data from MySQL to BigQuery? https://cloud.google.com/blog/products/data-analytics/how-to-move-data-from-mysql-to-bigquery. Last accessed: 7/16/2020.
  3. Change Data Capture Between MySQL and GridGain With Debezium. https://www.gridgain.com/resources/blog/change-data-capture-between-mysql-and-gridgain-debezium. Last accessed: 7/21/2020.
  4. Change data capture in Postgres: How to use logical decoding and wal2json. https://developers.redhat.com/blog/2020/05/19/extending-kafka-connectivity-with-apache-camel-kafka-connectors/. Last accessed: 7/21/2020.
