adriens

🪄 Debezium: the magic behind data capture & async replication (for free)

🪝 Teaser

Have you ever found yourself in a situation where:

  • Team 🇦 pushes data into a given database (let's say MySQL...) with its very own custom software
  • Team 🇧 needs to get these data changes (INSERT, UPDATE, DELETE) as events so they can put them in, let's say... another database (MariaDB, PostgreSQL...)
  • The base software cannot be changed: you have to "deal with it"

E.g., team 🇧's motivation may be data science, real-time analytics, storing the data in a data lake...

👉 This blog post is dedicated to this case... and surprisingly: open source solutions do exist to achieve this magic!

🤔 About the "why"

The Debezium project's "why" is pretty straightforward:

"Turn your databases into change event streams"

... even for "legacy"-like systems.

👂 How it does NOT work (why it's awesome)

The key thing to remember here is that Debezium does NOT act as a proxy in front of the database, and that's the most elegant part.

Instead, Debezium literally listens to database changes, whatever you call them (WAL for Postgres, archive logs...), then sends these events in a common standard format as Kafka messages... waiting to be used later by one or many consumers.
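To make the "common standard format" a bit more concrete, here is a minimal Python sketch of how a consumer might read one change event. It assumes the JSON envelope described in the Debezium documentation (`before`, `after`, `source`, `op`, `ts_ms`, optionally wrapped in a `payload` field). The sample event is hand-written for illustration, and `describe_change` is a hypothetical helper, not part of Debezium.

```python
import json

# Debezium operation codes mapped to the SQL verbs used in this post.
OPERATIONS = {"c": "INSERT", "u": "UPDATE", "d": "DELETE", "r": "SNAPSHOT READ"}

# Hand-written sample of a change event value (illustrative, not captured output).
SAMPLE_EVENT = json.dumps({
    "payload": {
        "before": {"id": 1004, "email": "old@example.com"},
        "after": {"id": 1004, "email": "new@example.com"},
        "source": {"connector": "mysql", "db": "inventory", "table": "customers"},
        "op": "u",
        "ts_ms": 1700000000000,
    }
})


def describe_change(raw_value: str) -> dict:
    """Summarize one change event: which table changed, how, and the row states."""
    event = json.loads(raw_value)
    # With schemas enabled the envelope sits under "payload"; otherwise it is the root.
    payload = event.get("payload", event)
    source = payload["source"]
    return {
        "table": f'{source["db"]}.{source["table"]}',
        "operation": OPERATIONS[payload["op"]],
        "before": payload.get("before"),
        "after": payload.get("after"),
    }


if __name__ == "__main__":
    print(describe_change(SAMPLE_EVENT))
```

The nice part is that the same envelope shape applies whatever the source database is, so a consumer like this one does not need to know how the change was captured.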

🪄 How it works

The magic resides in the following workflow:

  1. Capture data changes at the database level (WAL for Postgres, archive logs, whatever you call them...)
  2. Send/stream events to Kafka
  3. Consume Kafka events so they can be pushed to any third-party data service
  3'. JDBC: for example, "consume events from multiple source topics, and then write those events to a relational database by using a JDBC driver."
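Step 3 can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Debezium JDBC connector is implemented: Kafka is replaced by a plain list of already-decoded events, the target database is an in-memory SQLite instance, and the `customers` table as well as the `apply_change` and `replay` helpers are hypothetical.

```python
import sqlite3

# Already-decoded change events, in the order they would be read from a topic.
EVENTS = [
    {"op": "c", "before": None, "after": {"id": 1, "email": "a@example.com"}},
    {"op": "c", "before": None, "after": {"id": 2, "email": "b@example.com"}},
    {"op": "u", "before": {"id": 1, "email": "a@example.com"},
     "after": {"id": 1, "email": "a2@example.com"}},
    {"op": "d", "before": {"id": 2, "email": "b@example.com"}, "after": None},
]


def apply_change(conn: sqlite3.Connection, payload: dict) -> None:
    """Apply one change event to the (hypothetical) customers table."""
    op = payload["op"]
    if op in ("c", "r", "u"):
        # Create, snapshot read and update all carry the new row state in "after".
        conn.execute(
            "INSERT OR REPLACE INTO customers (id, email) VALUES (:id, :email)",
            payload["after"],
        )
    elif op == "d":
        # Deletes only carry the old row state in "before".
        conn.execute("DELETE FROM customers WHERE id = ?", (payload["before"]["id"],))
    else:
        raise ValueError(f"unsupported operation: {op!r}")


def replay(events: list) -> list:
    """Replay events into a fresh in-memory database and return the final rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
    for payload in events:
        apply_change(conn, payload)
    conn.commit()
    rows = conn.execute("SELECT id, email FROM customers ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(replay(EVENTS))
```

Writing inserts and updates as an upsert keyed on the primary key keeps the replay idempotent, which matters because a consumer may see the same event more than once.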


🍿 Demo from scratch

The live demo was done from scratch, simply by following the default instructions for a MySQL instance.
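For readers who cannot watch the demo, the central step of such a setup is registering a MySQL connector through the Kafka Connect REST API. The Python sketch below only builds that request. Every value in it (host names, credentials, connector name, topic prefix) is an assumption modeled on the defaults of the Debezium MySQL tutorial, and the property names assume a Debezium 2.x release, so check them against the documentation of the version you run.

```python
import json
import urllib.request

# Assumption: the Kafka Connect REST API is reachable at this address.
CONNECT_URL = "http://localhost:8083/connectors"

# Assumption: values modeled on the Debezium MySQL tutorial defaults (2.x naming).
CONNECTOR = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "tasks.max": "1",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "topic.prefix": "dbserver1",
        "database.include.list": "inventory",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schemahistory.inventory",
    },
}


def build_request(url: str = CONNECT_URL, connector: dict = CONNECTOR) -> urllib.request.Request:
    """Build (but do not send) the POST request that registers the connector."""
    return urllib.request.Request(
        url,
        data=json.dumps(connector).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def register() -> int:
    """Send the request; requires a running Kafka Connect instance."""
    with urllib.request.urlopen(build_request()) as response:
        return response.status
```

Calling `register()` against a running Kafka Connect instance should create the connector, after which change events start flowing into topics named after the topic prefix, database and table.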

🔭 Going further

