POTHURAJU JAYAKRISHNA YADAV
🚀 Real-Time Data Replication Using MySQL, Debezium, Kafka, and Docker (CDC Guide)

📌 Introduction

Modern applications often need data to move between systems in real time: analytics platforms, microservices, search indexes, or backup databases.

Traditional approaches like batch jobs or cron-based sync introduce delays, inconsistencies, and operational complexity.

This is where Change Data Capture (CDC) becomes powerful.

In this article, we'll build a simple real-time database replication pipeline using:

  • MySQL
  • Debezium
  • Apache Kafka
  • Kafka Connect (JDBC Sink)
  • Docker Compose

By the end, you'll have a working system that automatically replicates inserts, updates, and deletes from one database to another.


🔥 What is Change Data Capture (CDC)?

Change Data Capture is a technique used to capture database changes (INSERT, UPDATE, DELETE) and stream them to other systems in real time.

Instead of polling the database repeatedly, CDC reads the database transaction log (binlog in MySQL).

This makes CDC:

✅ Real-time
✅ Efficient
✅ Reliable
✅ Scalable


❓ Why Do We Need CDC?

Common use cases:

  • Real-time analytics pipelines
  • Microservices data synchronization
  • Data warehousing
  • Cache invalidation
  • Event-driven architectures
  • Search indexing (Elasticsearch)
  • Zero-downtime migrations

Without CDC:

```
App → DB → Batch Job → Other Systems
```

With CDC:

```
App → DB → CDC Stream → Multiple Systems
```

Much faster and cleaner.


πŸ— Architecture Overview

We will build the following pipeline:

```
MySQL Source
     ↓ (binlog)
Debezium Connector
     ↓
Kafka Topic
     ↓
JDBC Sink Connector
     ↓
MySQL Target
```

🐳 Step 1 — Docker Compose Setup

We'll run everything with Docker so the setup is easy and reproducible.

Create a file called docker-compose.yml:

```yaml
services:

  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on:
      - zookeeper
    ports:
      - "29092:29092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,PLAINTEXT_HOST://0.0.0.0:29092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  mysql-source:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: root
      MYSQL_DATABASE: company
    command: >
      --server-id=1
      --log-bin=mysql-bin
      --binlog-format=ROW
    ports:
      - "3306:3306"

  mysql-target:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: root
    ports:
      - "3307:3306"

  connect:
    image: debezium/connect:2.5
    depends_on:
      - kafka
      - mysql-source
    ports:
      - "8083:8083"
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_status
```

Note the two Kafka listeners: the broker advertises kafka:9092 inside the Docker network (which is what Kafka Connect uses) and localhost:29092 to the host. Advertising only localhost:9092 would break clients running inside the network, including the Connect worker.

Start everything:

```bash
docker compose up -d
```

Give the services ~30 seconds to start; Kafka Connect in particular takes a moment before its REST API on port 8083 responds.


🗄 Step 2 — Create Database and Table (Source)

Because MySQL runs inside containers, we execute SQL through docker compose exec.

```bash
docker compose exec -T mysql-source mysql -uroot -proot <<EOF
CREATE DATABASE IF NOT EXISTS company;
USE company;

CREATE TABLE IF NOT EXISTS employees (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100),
    email VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
EOF

docker compose exec -T mysql-target mysql -uroot -proot <<EOF
CREATE DATABASE IF NOT EXISTS company;
USE company;

DROP TABLE IF EXISTS employees;

CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    created_at TIMESTAMP NULL
);
EOF
```

Insert some sample data:

```bash
docker compose exec mysql-source mysql -uroot -proot -e "
USE company;
INSERT INTO employees(name,email)
VALUES ('John Doe','john@example.com');
"
```

🔌 Step 3 — Configure Debezium Source Connector

This connector reads the MySQL binlog and publishes change events to Kafka.

```bash
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d '{
  "name": "mysql-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql-source",
    "database.port": "3306",
    "database.user": "root",
    "database.password": "root",
    "database.server.id": "184054",
    "topic.prefix": "dbserver1",
    "database.include.list": "company",
    "table.include.list": "company.employees",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.company",

    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false"
  }
}'
```

The drop.tombstones setting matters: by default, ExtractNewRecordState drops the null-valued tombstone records that follow deletes, and without them the sink has no way to propagate DELETEs to the target.
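To see what the unwrap transform does, here is a minimal Python sketch of the idea. The sample envelope below is illustrative and abridged, not a byte-exact Debezium payload:

```python
# Sketch of what ExtractNewRecordState ("unwrap") does to a Debezium
# change event. A raw envelope carries before/after row images plus
# source metadata; the sink only needs the flat new state.

def unwrap(envelope):
    """Keep only the row's new state.

    Returns the 'after' image for inserts/updates/snapshot reads, or
    None (a tombstone) for deletes so the sink can remove the row by key.
    """
    payload = envelope["payload"]
    if payload["op"] == "d":      # delete: there is no new state
        return None
    return payload["after"]       # "c" (insert), "u" (update), "r" (snapshot)

update_event = {
    "payload": {
        "op": "u",
        "before": {"id": 1, "name": "John Doe", "email": "john@example.com"},
        "after": {"id": 1, "name": "Updated User", "email": "john@example.com"},
        "source": {"db": "company", "table": "employees"},
    }
}

delete_event = {
    "payload": {
        "op": "d",
        "before": {"id": 2, "name": "Alice", "email": "alice@test.com"},
        "after": None,
        "source": {"db": "company", "table": "employees"},
    }
}

print(unwrap(update_event))   # flat row: {'id': 1, 'name': 'Updated User', ...}
print(unwrap(delete_event))   # None -> tombstone, which drives the DELETE
```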

Debezium creates one topic per table, named topic.prefix.database.table, so our events land in:

```
dbserver1.company.employees
```

🎯 Step 4 — Configure Sink Connector (Target MySQL)

Make sure the target database exists (Step 2 already created it, so this is safe to re-run):

```bash
docker compose exec mysql-target mysql -uroot -proot -e "
CREATE DATABASE IF NOT EXISTS company;
"
```

Then create the sink connector:

```bash
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d '{
  "name": "mysql-sink",
  "config": {
    "connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
    "topics": "dbserver1.company.employees",

    "connection.url": "jdbc:mysql://mysql-target:3306/company",
    "connection.username": "root",
    "connection.password": "root",

    "insert.mode": "upsert",
    "delete.enabled": "true",
    "primary.key.mode": "record_key",
    "primary.key.fields": "id",

    "schema.evolution": "basic",
    "table.name.format": "employees"
  }
}'
```

Note: the Debezium JDBC sink uses schema.evolution to create and evolve target tables; auto.create and auto.evolve belong to the Confluent JDBC sink and are ignored here.

⚠️ Important — Restart Connector if Needed

Sometimes connectors fail initially due to schema timing or table issues.

Restart connector:

```bash
curl -X POST \
"http://localhost:8083/connectors/mysql-sink/restart?includeTasks=true"
```

Check status:

```bash
curl http://localhost:8083/connectors/mysql-sink/status
```

Transient task failures like this are a normal part of working with Kafka Connect; once the schemas and tables exist, a restart usually resolves them.
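If you want to script the health check instead of eyeballing the JSON, the logic is a few lines. The sample payload below mirrors the shape of Kafka Connect's status response; the field values are illustrative:

```python
# Decide whether a connector needs a restart from the JSON returned by
# Kafka Connect's GET /connectors/<name>/status endpoint.

def failed_tasks(status):
    """Return the ids of tasks that Kafka Connect reports as FAILED."""
    return [task["id"] for task in status.get("tasks", [])
            if task["state"] == "FAILED"]

# Illustrative status payload: the connector is RUNNING but task 0 failed.
sample_status = {
    "name": "mysql-sink",
    "connector": {"state": "RUNNING", "worker_id": "connect:8083"},
    "tasks": [
        {"id": 0, "state": "FAILED", "worker_id": "connect:8083"},
        {"id": 1, "state": "RUNNING", "worker_id": "connect:8083"},
    ],
}

if failed_tasks(sample_status):
    # In a real script you would POST to the restart endpoint here.
    print("restart needed for tasks:", failed_tasks(sample_status))
```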


🧪 Step 5 — Test Real-Time Replication

Insert

```bash
docker compose exec mysql-source mysql -uroot -proot -e "
USE company;
INSERT INTO employees(name,email)
VALUES ('Alice','alice@test.com');
"
```

Update

```bash
docker compose exec mysql-source mysql -uroot -proot -e "
USE company;
UPDATE employees
SET name='Updated User'
WHERE id=1;
"
```

Delete

```bash
docker compose exec mysql-source mysql -uroot -proot -e "
USE company;
DELETE FROM employees
WHERE id=2;
"
```

📥 Verify Data in Target

```bash
docker compose exec mysql-target mysql -uroot -proot -e "
USE company;
SELECT * FROM employees;
"
```

If data doesn't appear immediately:

  1. Wait a few seconds
  2. Restart connector
  3. Try again

🧠 How It Works Internally

  1. MySQL writes change to binlog
  2. Debezium reads binlog
  3. Kafka stores event
  4. Sink connector consumes event
  5. JDBC writes to target DB

All within seconds.
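The sink side of these steps can be modeled in a few lines. This is a toy in-memory sketch of the upsert/tombstone semantics, not the actual JDBC sink code; the table and rows are illustrative:

```python
# Toy model of the sink: Kafka hands the connector (key, value) pairs;
# with insert.mode=upsert and delete.enabled=true, the sink upserts on
# the key and treats a null value (tombstone) as a delete.

def apply_event(table, key, value):
    """Apply one CDC event to an in-memory 'table' (dict keyed by id)."""
    if value is None:        # tombstone -> DELETE
        table.pop(key, None)
    else:                    # new row state -> INSERT ... ON DUPLICATE KEY UPDATE
        table[key] = value

target = {}
apply_event(target, 1, {"name": "John Doe"})
apply_event(target, 2, {"name": "Alice"})
apply_event(target, 1, {"name": "Updated User"})   # update arrives as an upsert
apply_event(target, 2, None)                        # delete arrives as a tombstone
print(target)   # {1: {'name': 'Updated User'}}
```

Replaying events in log order like this is also why the target converges to the source even if the sink restarts partway through.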


⚠️ Lessons Learned

During setup we faced:

  • Connector schema issues
  • Table mapping problems
  • Sink auto-creation failures
  • Connector crashes

Key learnings:

  • Unwrap Debezium events (ExtractNewRecordState) when using a flat JDBC-style sink
  • Pre-create target tables in production instead of relying on auto-creation
  • Restart failed connector tasks via the Kafka Connect REST API
  • Understand Debezium's topic naming (topic.prefix.database.table)

🚀 Production Best Practices

  • Use a Schema Registry (Avro or Protobuf) instead of plain JSON
  • Monitor connector status and consumer lag
  • Configure a dead-letter queue (DLQ) for unprocessable records
  • Use dedicated migration tools for the initial bulk load
  • Secure credentials (dedicated replication users, never root passwords)

✅ Conclusion

CDC enables real-time data movement with minimal overhead.

Debezium + Kafka provides:

  • Scalability
  • Reliability
  • Low latency
  • Event-driven architecture

This pattern is widely used in modern distributed systems.


🙌 Final Thoughts

If you are working in:

  • Data Engineering
  • DevOps
  • Backend Systems
  • Platform Engineering

Learning CDC is extremely valuable.


⭐ If you enjoyed this article, feel free to connect and share your feedback!
