## Introduction
Modern applications often need data to move between systems in real time: analytics platforms, microservices, search indexes, or backup databases.
Traditional approaches like batch jobs or cron-based sync introduce delays, inconsistencies, and operational complexity.
This is where Change Data Capture (CDC) becomes powerful.
In this article, we'll build a simple yet complete real-time database replication pipeline using:
- MySQL
- Debezium
- Apache Kafka
- Kafka Connect (JDBC Sink)
- Docker Compose
By the end, you'll have a working system that automatically replicates inserts, updates, and deletes from one database to another.
## What is Change Data Capture (CDC)?
Change Data Capture is a technique used to capture database changes (INSERT, UPDATE, DELETE) and stream them to other systems in real time.
Instead of polling the database repeatedly, CDC reads the database transaction log (binlog in MySQL).
This makes CDC:

- Real-time
- Efficient
- Reliable
- Scalable
## Why Do We Need CDC?
Common use cases:
- Real-time analytics pipelines
- Microservices data synchronization
- Data warehousing
- Cache invalidation
- Event-driven architectures
- Search indexing (Elasticsearch)
- Zero-downtime migrations
Without CDC:

```
App → DB → Batch Job → Other Systems
```

With CDC:

```
App → DB → CDC Stream → Multiple Systems
```

Much faster and cleaner.
## Architecture Overview
We will build the following pipeline:
```
MySQL Source
  ↓ (binlog)
Debezium Connector
  ↓
Kafka Topic
  ↓
JDBC Sink Connector
  ↓
MySQL Target
```
## Step 1: Docker Compose Setup

We'll run everything in Docker so the setup is easy and reproducible.

Create a file called `docker-compose.yml`:
```yaml
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092
      # Advertise the in-network hostname so other containers (Connect) can reach the broker
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  mysql-source:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: root
      MYSQL_DATABASE: company
    command: >
      --server-id=1
      --log-bin=mysql-bin
      --binlog-format=ROW
    ports:
      - "3306:3306"

  mysql-target:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: root
    ports:
      - "3307:3306"

  connect:
    image: debezium/connect:2.5
    depends_on:
      - kafka
      - mysql-source
    ports:
      - "8083:8083"
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_status
```
Start everything:

```bash
docker compose up -d
```

Give the services ~30 seconds to fully start.
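Rather than sleeping for a fixed time, you can poll the Kafka Connect REST API until it responds. A small sketch (the port matches the Compose file above):

```shell
# Poll the Kafka Connect REST API until it is ready (up to ~60 seconds)
for i in $(seq 1 30); do
  if curl -fs http://localhost:8083/connectors >/dev/null; then
    echo "Kafka Connect is up"
    break
  fi
  echo "Waiting for Kafka Connect... (attempt $i)"
  sleep 2
done
```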
## Step 2: Create Databases and Tables

Because MySQL runs inside Docker, we execute SQL through `docker compose exec`. First the source:
```bash
docker compose exec -T mysql-source mysql -uroot -proot <<EOF
CREATE DATABASE IF NOT EXISTS company;
USE company;
CREATE TABLE IF NOT EXISTS employees (
  id INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(100),
  email VARCHAR(100),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
EOF
```
Then the target (note that `id` has no AUTO_INCREMENT here, since IDs are replicated from the source):

```bash
docker compose exec -T mysql-target mysql -uroot -proot <<EOF
CREATE DATABASE IF NOT EXISTS company;
USE company;
DROP TABLE IF EXISTS employees;
CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(100),
  created_at TIMESTAMP NULL
);
EOF
```
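Before wiring up any connectors, you can confirm both schemas are in place:

```shell
# Verify the employees table exists on both source and target
for svc in mysql-source mysql-target; do
  docker compose exec "$svc" mysql -uroot -proot -e "SHOW TABLES FROM company;"
done
```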
Insert some sample data:

```bash
docker compose exec mysql-source mysql -uroot -proot -e "
USE company;
INSERT INTO employees(name,email)
VALUES ('John Doe','john@example.com');
"
```
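Before moving on, it is worth confirming that the binlog flags from the Compose `command` took effect, since Debezium depends on them:

```shell
# log_bin should be ON and binlog_format should be ROW
docker compose exec mysql-source mysql -uroot -proot -e "
SHOW VARIABLES WHERE Variable_name IN ('log_bin','binlog_format');
"
```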
## Step 3: Configure the Debezium Source Connector

This connector reads the MySQL binlog and publishes change events to Kafka.
```bash
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mysql-connector",
    "config": {
      "connector.class": "io.debezium.connector.mysql.MySqlConnector",
      "database.hostname": "mysql-source",
      "database.port": "3306",
      "database.user": "root",
      "database.password": "root",
      "database.server.id": "184054",
      "topic.prefix": "dbserver1",
      "database.include.list": "company",
      "table.include.list": "company.employees",
      "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
      "schema.history.internal.kafka.topic": "schema-changes.company",
      "transforms": "unwrap",
      "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
    }
  }'
```
Debezium creates a Kafka topic following its `<topic.prefix>.<database>.<table>` convention:

```
dbserver1.company.employees
```
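You can confirm events are flowing by reading the topic directly with the console consumer bundled in the Kafka image:

```shell
# Read one change event from the CDC topic
docker compose exec kafka kafka-console-consumer \
  --bootstrap-server kafka:9092 \
  --topic dbserver1.company.employees \
  --from-beginning \
  --max-messages 1
```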
## Step 4: Configure the JDBC Sink Connector (Target MySQL)
The target database and table were already created in Step 2, so we can go straight to creating the sink connector:
```bash
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mysql-sink",
    "config": {
      "connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
      "topics": "dbserver1.company.employees",
      "connection.url": "jdbc:mysql://mysql-target:3306/company",
      "connection.username": "root",
      "connection.password": "root",
      "insert.mode": "upsert",
      "delete.enabled": "true",
      "primary.key.mode": "record_key",
      "primary.key.fields": "id",
      "auto.create": "true",
      "auto.evolve": "true",
      "table.name.format": "employees"
    }
  }'
```
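Both connectors should now report a RUNNING state. A quick sketch to check them in one go:

```shell
# Print the connector-level state for each registered connector
for c in mysql-connector mysql-sink; do
  echo -n "$c: "
  curl -s "http://localhost:8083/connectors/$c/status" \
    | grep -o '"state":"[A-Z]*"' | head -1
done
```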
## Important: Restart the Connector if Needed

Connectors sometimes fail on the first attempt due to schema timing or table issues. (If the JDBC sink class is not found at all, the Debezium JDBC sink plugin may need to be added to the Connect image separately.)

Restart the connector:

```bash
curl -X POST \
  "http://localhost:8083/connectors/mysql-sink/restart?includeTasks=true"
```

Check its status:

```bash
curl http://localhost:8083/connectors/mysql-sink/status
```

Transient failures like this are normal when working with Kafka Connect.
## Step 5: Test Real-Time Replication
**Insert**

```bash
docker compose exec mysql-source mysql -uroot -proot -e "
USE company;
INSERT INTO employees(name,email)
VALUES ('Alice','alice@test.com');
"
```
**Update**

```bash
docker compose exec mysql-source mysql -uroot -proot -e "
USE company;
UPDATE employees
SET name='Updated User'
WHERE id=1;
"
```
**Delete**

```bash
docker compose exec mysql-source mysql -uroot -proot -e "
USE company;
DELETE FROM employees
WHERE id=2;
"
```
## Verify Data in the Target

```bash
docker compose exec mysql-target mysql -uroot -proot -e "
USE company;
SELECT * FROM employees;
"
```
If data doesn't appear immediately:

- Wait a few seconds
- Restart the connector
- Try again
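A simple end-to-end check is to compare row counts on both sides; they should converge within a few seconds:

```shell
# Row counts on source and target should match once replication catches up
docker compose exec mysql-source mysql -uroot -proot -e \
  "SELECT COUNT(*) AS source_rows FROM company.employees;"
docker compose exec mysql-target mysql -uroot -proot -e \
  "SELECT COUNT(*) AS target_rows FROM company.employees;"
```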
## How It Works Internally

1. MySQL writes the change to its binlog
2. Debezium reads the binlog
3. Kafka stores the event
4. The sink connector consumes the event
5. JDBC writes the row to the target DB

All within seconds.
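Because the source connector applies the `ExtractNewRecordState` transform, each message carries just the new row state rather than the full Debezium change envelope. Ignoring the schema wrapper added by the default JSON converter, the value looks roughly like this (field values are illustrative, and the timestamp format depends on converter settings):

```json
{
  "id": 2,
  "name": "Alice",
  "email": "alice@test.com",
  "created_at": "2024-01-01T12:00:00Z"
}
```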
## Lessons Learned

During setup we ran into:

- Connector schema issues
- Table mapping problems
- Sink auto-creation failures
- Connector crashes

Key learnings:

- Always unwrap Debezium events (the `ExtractNewRecordState` transform)
- Pre-create tables in production rather than relying on auto-creation
- Restart failed connector tasks via the REST API
- Understand the topic naming convention
## Production Best Practices

- Use a Schema Registry for managed event schemas
- Enable monitoring of connector and task states
- Configure a dead letter queue (DLQ) for unprocessable messages
- Use dedicated migration tools for the initial bulk load
- Secure credentials (no root users, no plaintext passwords in configs)
## Conclusion

CDC enables real-time data movement with minimal overhead. Debezium + Kafka provide:

- Scalability
- Reliability
- Low latency
- A foundation for event-driven architecture

This pattern is widely used in modern distributed systems.
## Final Thoughts

If you are working in:

- Data Engineering
- DevOps
- Backend Systems
- Platform Engineering

learning CDC is extremely valuable.
If you enjoyed this article, feel free to connect and share your feedback!