POTHURAJU JAYAKRISHNA YADAV
🚀 Real-Time Data Replication Using MySQL, Debezium, Kafka, and Docker (CDC Guide)

📌 Introduction

Modern applications often need data to move between systems in real time: analytics platforms, microservices, search indexes, or backup databases.

Traditional approaches like batch jobs or cron-based sync introduce delays, inconsistencies, and operational complexity.

This is where Change Data Capture (CDC) becomes powerful.

In this article, we'll build a simple real-time database replication pipeline using:

  • MySQL
  • Debezium
  • Apache Kafka
  • Kafka Connect (JDBC Sink)
  • Docker Compose

By the end, you'll have a working system that automatically replicates inserts, updates, and deletes from one database to another.


🔥 What is Change Data Capture (CDC)?

Change Data Capture is a technique used to capture database changes (INSERT, UPDATE, DELETE) and stream them to other systems in real time.

Instead of polling the database repeatedly, CDC reads the database transaction log (binlog in MySQL).

This makes CDC:

✅ Real-time
✅ Efficient
✅ Reliable
✅ Scalable


❓ Why Do We Need CDC?

Common use cases:

  • Real-time analytics pipelines
  • Microservices data synchronization
  • Data warehousing
  • Cache invalidation
  • Event-driven architectures
  • Search indexing (Elasticsearch)
  • Zero-downtime migrations

Without CDC:

```
App → DB → Batch Job → Other Systems
```

With CDC:

```
App → DB → CDC Stream → Multiple Systems
```

Much faster and cleaner.


πŸ— Architecture Overview

We will build the following pipeline:

```
MySQL Source
     ↓ (binlog)
Debezium Connector
     ↓
Kafka Topic
     ↓
JDBC Sink Connector
     ↓
MySQL Target
```

🐳 Step 1 — Docker Compose Setup

We'll run everything with Docker so the setup is easy and reproducible.

Create a file called docker-compose.yml:

```yaml
services:

  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on:
      - zookeeper
    ports:
      - "29092:29092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,PLAINTEXT_HOST://0.0.0.0:29092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  mysql-source:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: root
      MYSQL_DATABASE: company
    command: >
      --server-id=1
      --log-bin=mysql-bin
      --binlog-format=ROW
    ports:
      - "3306:3306"

  mysql-target:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: root
    ports:
      - "3307:3306"

  connect:
    image: debezium/connect:2.5
    depends_on:
      - kafka
      - mysql-source
    ports:
      - "8083:8083"
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_status
```

Note the two Kafka listeners: the broker advertises kafka:9092 inside the Docker network (which is what Kafka Connect uses) and localhost:29092 to the host. Advertising only localhost:9092 would break clients running inside the network, including the Connect worker.

Start everything:

```bash
docker compose up -d
```

Give the services ~30 seconds to start; Kafka Connect in particular takes a moment before its REST API on port 8083 responds.


🗄 Step 2 — Create Database and Table (Source)

Because MySQL runs inside containers, we execute SQL through docker compose exec.

```bash
docker compose exec -T mysql-source mysql -uroot -proot <<EOF
CREATE DATABASE IF NOT EXISTS company;
USE company;

CREATE TABLE IF NOT EXISTS employees (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100),
    email VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
EOF

docker compose exec -T mysql-target mysql -uroot -proot <<EOF
CREATE DATABASE IF NOT EXISTS company;
USE company;

DROP TABLE IF EXISTS employees;

CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    created_at TIMESTAMP NULL
);
EOF
```

Insert some sample data:

```bash
docker compose exec mysql-source mysql -uroot -proot -e "
USE company;
INSERT INTO employees(name,email)
VALUES ('John Doe','john@example.com');
"
```

🔌 Step 3 — Configure Debezium Source Connector

This connector reads the MySQL binlog and publishes change events to Kafka.

```bash
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d '{
  "name": "mysql-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql-source",
    "database.port": "3306",
    "database.user": "root",
    "database.password": "root",
    "database.server.id": "184054",
    "topic.prefix": "dbserver1",
    "database.include.list": "company",
    "table.include.list": "company.employees",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.company",

    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false"
  }
}'
```

The drop.tombstones setting matters: by default, ExtractNewRecordState drops the null-valued tombstone records that follow deletes, and without them the sink has no way to propagate DELETEs to the target.
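To see what the unwrap transform does, here is a minimal Python sketch of the idea. The sample envelope below is illustrative and abridged, not a byte-exact Debezium payload:

```python
# Sketch of what ExtractNewRecordState ("unwrap") does to a Debezium
# change event. A raw envelope carries before/after row images plus
# source metadata; the sink only needs the flat new state.

def unwrap(envelope):
    """Keep only the row's new state.

    Returns the 'after' image for inserts/updates/snapshot reads, or
    None (a tombstone) for deletes so the sink can remove the row by key.
    """
    payload = envelope["payload"]
    if payload["op"] == "d":      # delete: there is no new state
        return None
    return payload["after"]       # "c" (insert), "u" (update), "r" (snapshot)

update_event = {
    "payload": {
        "op": "u",
        "before": {"id": 1, "name": "John Doe", "email": "john@example.com"},
        "after": {"id": 1, "name": "Updated User", "email": "john@example.com"},
        "source": {"db": "company", "table": "employees"},
    }
}

delete_event = {
    "payload": {
        "op": "d",
        "before": {"id": 2, "name": "Alice", "email": "alice@test.com"},
        "after": None,
        "source": {"db": "company", "table": "employees"},
    }
}

print(unwrap(update_event))   # flat row: {'id': 1, 'name': 'Updated User', ...}
print(unwrap(delete_event))   # None -> tombstone, which drives the DELETE
```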

Debezium creates one topic per table, named topic.prefix.database.table, so our events land in:

```
dbserver1.company.employees
```

🎯 Step 4 — Configure Sink Connector (Target MySQL)

Make sure the target database exists (Step 2 already created it, so this is safe to re-run):

```bash
docker compose exec mysql-target mysql -uroot -proot -e "
CREATE DATABASE IF NOT EXISTS company;
"
```

Then create the sink connector:

```bash
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d '{
  "name": "mysql-sink",
  "config": {
    "connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
    "topics": "dbserver1.company.employees",

    "connection.url": "jdbc:mysql://mysql-target:3306/company",
    "connection.username": "root",
    "connection.password": "root",

    "insert.mode": "upsert",
    "delete.enabled": "true",
    "primary.key.mode": "record_key",
    "primary.key.fields": "id",

    "schema.evolution": "basic",
    "table.name.format": "employees"
  }
}'
```

Note: the Debezium JDBC sink uses schema.evolution to create and evolve target tables; auto.create and auto.evolve belong to the Confluent JDBC sink and are ignored here.

⚠️ Important — Restart Connector if Needed

Sometimes connectors fail initially due to schema timing or table issues.

Restart connector:

```bash
curl -X POST \
"http://localhost:8083/connectors/mysql-sink/restart?includeTasks=true"
```

Check status:

```bash
curl http://localhost:8083/connectors/mysql-sink/status
```

Transient task failures like this are a normal part of working with Kafka Connect; once the schemas and tables exist, a restart usually resolves them.
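If you want to script the health check instead of eyeballing the JSON, the logic is a few lines. The sample payload below mirrors the shape of Kafka Connect's status response; the field values are illustrative:

```python
# Decide whether a connector needs a restart from the JSON returned by
# Kafka Connect's GET /connectors/<name>/status endpoint.

def failed_tasks(status):
    """Return the ids of tasks that Kafka Connect reports as FAILED."""
    return [task["id"] for task in status.get("tasks", [])
            if task["state"] == "FAILED"]

# Illustrative status payload: the connector is RUNNING but task 0 failed.
sample_status = {
    "name": "mysql-sink",
    "connector": {"state": "RUNNING", "worker_id": "connect:8083"},
    "tasks": [
        {"id": 0, "state": "FAILED", "worker_id": "connect:8083"},
        {"id": 1, "state": "RUNNING", "worker_id": "connect:8083"},
    ],
}

if failed_tasks(sample_status):
    # In a real script you would POST to the restart endpoint here.
    print("restart needed for tasks:", failed_tasks(sample_status))
```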


🧪 Step 5 — Test Real-Time Replication

Insert

```bash
docker compose exec mysql-source mysql -uroot -proot -e "
USE company;
INSERT INTO employees(name,email)
VALUES ('Alice','alice@test.com');
"
```

Update

```bash
docker compose exec mysql-source mysql -uroot -proot -e "
USE company;
UPDATE employees
SET name='Updated User'
WHERE id=1;
"
```

Delete

```bash
docker compose exec mysql-source mysql -uroot -proot -e "
USE company;
DELETE FROM employees
WHERE id=2;
"
```

📥 Verify Data in Target

```bash
docker compose exec mysql-target mysql -uroot -proot -e "
USE company;
SELECT * FROM employees;
"
```

If data doesn't appear immediately:

  1. Wait a few seconds
  2. Restart connector
  3. Try again

🧠 How It Works Internally

  1. MySQL writes change to binlog
  2. Debezium reads binlog
  3. Kafka stores event
  4. Sink connector consumes event
  5. JDBC writes to target DB

All within seconds.
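The sink side of these steps can be modeled in a few lines. This is a toy in-memory sketch of the upsert/tombstone semantics, not the actual JDBC sink code; the table and rows are illustrative:

```python
# Toy model of the sink: Kafka hands the connector (key, value) pairs;
# with insert.mode=upsert and delete.enabled=true, the sink upserts on
# the key and treats a null value (tombstone) as a delete.

def apply_event(table, key, value):
    """Apply one CDC event to an in-memory 'table' (dict keyed by id)."""
    if value is None:        # tombstone -> DELETE
        table.pop(key, None)
    else:                    # new row state -> INSERT ... ON DUPLICATE KEY UPDATE
        table[key] = value

target = {}
apply_event(target, 1, {"name": "John Doe"})
apply_event(target, 2, {"name": "Alice"})
apply_event(target, 1, {"name": "Updated User"})   # update arrives as an upsert
apply_event(target, 2, None)                        # delete arrives as a tombstone
print(target)   # {1: {'name': 'Updated User'}}
```

Replaying events in log order like this is also why the target converges to the source even if the sink restarts partway through.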


⚠️ Lessons Learned

During setup we faced:

  • Connector schema issues
  • Table mapping problems
  • Sink auto-creation failures
  • Connector crashes

Key learnings:

  • Unwrap Debezium events (ExtractNewRecordState) when using a flat JDBC-style sink
  • Pre-create target tables in production instead of relying on auto-creation
  • Restart failed connector tasks via the Kafka Connect REST API
  • Understand Debezium's topic naming (topic.prefix.database.table)

🚀 Production Best Practices

  • Use a Schema Registry (Avro or Protobuf) instead of plain JSON
  • Monitor connector status and consumer lag
  • Configure a dead-letter queue (DLQ) for unprocessable records
  • Use dedicated migration tools for the initial bulk load
  • Secure credentials (dedicated replication users, never root passwords)

✅ Conclusion

CDC enables real-time data movement with minimal overhead.

Debezium + Kafka provides:

  • Scalability
  • Reliability
  • Low latency
  • Event-driven architecture

This pattern is widely used in modern distributed systems.


🙌 Final Thoughts

If you are working in:

  • Data Engineering
  • DevOps
  • Backend Systems
  • Platform Engineering

Learning CDC is extremely valuable.


⭐ If you enjoyed this article, feel free to connect and share your feedback!
