Most MySQL CDC guides stop at "enable binlog and stream changes".
In practice, that’s not the hard part.
What actually matters shows up once you try to run it in a real system.
What is MySQL CDC?
MySQL Change Data Capture (CDC) is a way to track and stream changes from a database in real time.
Instead of scanning full tables, CDC reads only what changed: inserts, updates, and deletes.
In MySQL, this is typically done using the binary log (binlog), which records every data modification as a sequence of events.
These events can then be applied to another system, keeping it in sync with the source database.
How MySQL CDC Works
At a high level, MySQL CDC is simple:
- MySQL writes every change to the binlog
- A reader parses those events
- Changes are applied to a target system
The binlog is just a sequence of events describing row-level changes.
Everything else — ordering, retries, consistency — is where things get tricky.
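The three steps above can be sketched as a small apply loop. This is a simplified in-memory model (the event list and `apply_event` shape are illustrative, not a real binlog parser):

```python
# Minimal model of a CDC apply loop: ordered row events mutate a
# target keyed by primary key. Real binlog events carry more metadata
# (schema, table, file/position), but the shape is the same.
events = [
    ("insert", {"id": 1, "name": "alice"}),
    ("update", {"id": 1, "name": "alicia"}),
    ("insert", {"id": 2, "name": "bob"}),
    ("delete", {"id": 2}),
]

target = {}

def apply_event(target, kind, row):
    if kind in ("insert", "update"):
        target[row["id"]] = row
    elif kind == "delete":
        target.pop(row["id"], None)

# Order matters: events must be applied in binlog order,
# or an update can land before the insert it modifies.
for kind, row in events:
    apply_event(target, kind, row)

print(target)  # {1: {'id': 1, 'name': 'alicia'}}
```

Everything a real tool adds — checkpointing, retries, batching — wraps around this loop.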
Why Use CDC?
Common use cases:
- keeping a warehouse in sync
- zero-downtime migrations
- feeding analytics or search systems
CDC Implementation Methods
In practice, almost all production setups use binlog-based CDC.
Trigger-based and query-based (timestamp-polling) approaches still exist, but triggers add overhead to every write and polling can miss deletes, so they are rarely used in real systems at scale.
Comparison
| Method | Latency | Performance Impact | Complexity | Use Case |
|---|---|---|---|---|
| Trigger-based | Real-time | High | Low | Small-scale setups |
| Query-based | Minutes | Medium | Low | Simple polling-based sync |
| Binlog-based | Milliseconds | Minimal | Medium | Production systems |
Configuring MySQL for CDC
Minimum required settings (note: `SET GLOBAL` does not survive a restart, so set these in my.cnf for production; the binlog itself must also be enabled, which is the default in MySQL 8.0):
SET GLOBAL binlog_format = 'ROW';
SET GLOBAL binlog_row_image = 'FULL';
Create a user with replication privileges (the '%' host wildcard is convenient for testing; restrict it in production):
CREATE USER 'cdc_user'@'%' IDENTIFIED BY 'password';
GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'cdc_user'@'%';
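Before pointing a CDC reader at the server, it's worth verifying the settings took effect. A quick check (on MySQL 8.0 the binlog is on by default; on 5.7, log_bin and server_id must be set in my.cnf):

```sql
-- Confirm binary logging is enabled and row-based
SHOW VARIABLES LIKE 'log_bin';           -- should be ON
SHOW VARIABLES LIKE 'binlog_format';     -- should be ROW
SHOW VARIABLES LIKE 'binlog_row_image';  -- should be FULL
-- Current binlog file and position (a CDC reader starts from here)
SHOW MASTER STATUS;
```

On newer 8.x releases, SHOW MASTER STATUS is deprecated in favor of SHOW BINARY LOG STATUS, but both return the current file and position.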
Other MySQL CDC Tools
Most CDC setups fall into a few categories:
- Debezium — log-based CDC, but requires Kafka
- Airbyte — connector-heavy, mostly batch
- Fivetran — managed SaaS, usage-based pricing
- AWS DMS — migration-focused, AWS-centric
Each solves part of the problem, but often requires combining multiple tools.
How It Looks in Practice
In real setups, CDC is rarely configured by hand-editing JSON connector files.
The typical flow is:
- create a source connection
- create a target
- start a CDC stream
The system handles binlog parsing, ordering, and delivery.
Start and Monitor the Stream
After setup, starting CDC is just one action.
From there, the system continuously reads binlog events and applies them to the target.
What matters in practice:
- replication lag
- throughput
- failure handling
Most problems don’t come from setup — they show up while the stream is running.
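Replication lag is the most basic health signal: the gap between the wall clock and the timestamp of the last binlog event you applied. A minimal sketch (the timestamps here are fabricated for illustration):

```python
import time

def replication_lag_seconds(last_event_ts, now=None):
    """Lag = wall clock minus timestamp of the last applied binlog event."""
    if now is None:
        now = time.time()
    return max(0.0, now - last_event_ts)

# Simulated: the last applied event was stamped 2.5 seconds ago.
now = 1_700_000_000.0
lag = replication_lag_seconds(now - 2.5, now=now)
print(f"lag: {lag:.1f}s")  # lag: 2.5s
```

Alerting on this number (and on it growing over time, which signals the reader can't keep up with write throughput) catches most operational problems early.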
Common Challenges
Initial data load
CDC only captures changes going forward.
That means existing data has to be copied before CDC starts.
For large tables, the typical approach is:
- run a one-time bulk load first
- then switch to CDC for ongoing changes
Skipping this step often leads to lag, gaps, or inconsistent data between source and target.
Testing and rollout
Before running CDC on the full dataset, it’s common to:
- start with a few tables
- run the stream for a limited time
- verify consistency
This helps catch issues early without affecting production systems.
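Consistency verification can start very simply: compare row counts and a per-table checksum between source and target. A sketch using an order-insensitive hash (the table contents are illustrative):

```python
import hashlib

def table_checksum(rows):
    """Order-insensitive checksum: hash each row, XOR the digests together."""
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(sorted(row.items())).encode()).digest()
        acc ^= int.from_bytes(digest[:8], "big")
    return acc

source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
target = [{"id": 2, "v": "b"}, {"id": 1, "v": "a"}]  # same rows, different order

match = (len(source) == len(target)
         and table_checksum(source) == table_checksum(target))
print(match)  # True
```

This is the same idea behind tools like pt-table-checksum, scaled down to a few lines; production checks also handle in-flight changes, which this sketch ignores.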
Throughput and latency
Throughput depends heavily on network conditions.
In high-latency environments, batching becomes important to avoid excessive round trips.
Most systems expose this as a configurable parameter, but defaults are usually enough to get started.
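Batching amortizes the per-request round trip: instead of one write per event, flush when a batch fills up (real implementations also flush on a timeout so a quiet stream doesn't stall). A minimal size-based sketch, with an illustrative `flush` callback:

```python
def batch_events(events, batch_size, flush):
    """Group events into fixed-size batches; flush any remainder at the end."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            flush(batch)
            batch = []
    if batch:                      # partial final batch still gets delivered
        flush(batch)

sent = []
batch_events(list(range(10)), batch_size=4, flush=sent.append)
print([len(b) for b in sent])  # [4, 4, 2]
```

The trade-off is the usual one: bigger batches mean fewer round trips but higher per-event latency, which is why this is worth tuning only once the defaults prove too slow.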
FAQ
Can MySQL CDC capture schema changes?
Yes. DDL statements appear in the binlog as query events regardless of the row format, so binlog-based CDC can capture them; whether the reader applies them to the target depends on the tool.
What MySQL version is required?
MySQL 5.7+ works, but 8.0+ is recommended for production.
Does CDC impact performance?
Binlog-based CDC has minimal impact. Trigger-based approaches can slow down writes.
Does CDC work with cloud databases?
Yes. AWS RDS, Google Cloud SQL, and Azure Database for MySQL all support it, as long as binary logging is enabled (on RDS, for example, the binlog is only retained when automated backups are turned on).
How do you handle schema changes?
Schema changes are one of the trickier parts. Most setups require coordination and sometimes stream reconfiguration.
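In the binlog, DDL shows up as statement (query) events rather than row events, so at minimum a reader has to detect them and decide whether to pause, reconfigure, or propagate. A crude classifier sketch (real parsers inspect the binlog event type, not just the SQL text):

```python
DDL_PREFIXES = ("ALTER ", "CREATE ", "DROP ", "RENAME ", "TRUNCATE ")

def is_ddl(statement):
    """Rough check: does the statement start with a DDL keyword?"""
    return statement.lstrip().upper().startswith(DDL_PREFIXES)

print(is_ddl("ALTER TABLE users ADD COLUMN email VARCHAR(255)"))  # True
print(is_ddl("UPDATE users SET name = 'x' WHERE id = 1"))         # False
```

The hard part is not detection but what happens next: the target schema has to change before the first row event in the new shape arrives, which is why many setups coordinate DDL manually.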
Summary
MySQL CDC itself is straightforward:
read binlog → apply changes
The complexity comes from everything around it:
- initial data load
- consistency during replication
- monitoring and recovery
Different tools mostly differ in how much of that they handle for you.
That’s where most CDC implementations either stay simple or become a mess.
Try it yourself
The fastest way to understand CDC is to run it.
It runs as a desktop app (Windows, macOS, Linux) or via Docker, and syncs MySQL → PostgreSQL, S3, or files in real time, with no Kafka required.
Originally published at:
https://streams.dbconvert.com/blog/mysql-change-data-capture/