DEV Community

丁久
丁久

Posted on • Originally published at dingjiu1989-hue.github.io

Change Data Capture (CDC): Debezium, Logical Replication, and Stream Processing

This article was originally published on AI Study Room. For the full version with working code examples and related articles, visit the original post.

Change Data Capture (CDC): Debezium, Logical Replication, and Stream Processing

Change Data Capture (CDC): Debezium, Logical Replication, and Stream Processing

Change Data Capture (CDC): Debezium, Logical Replication, and Stream Processing

Change Data Capture (CDC): Debezium, Logical Replication, and Stream Processing

Change Data Capture (CDC): Debezium, Logical Replication, and Stream Processing

Change Data Capture (CDC): Debezium, Logical Replication, and Stream Processing

Change Data Capture (CDC): Debezium, Logical Replication, and Stream Processing

Change Data Capture (CDC): Debezium, Logical Replication, and Stream Processing

Change Data Capture (CDC): Debezium, Logical Replication, and Stream Processing

Change Data Capture (CDC): Debezium, Logical Replication, and Stream Processing

Change Data Capture (CDC): Debezium, Logical Replication, and Stream Processing

Change Data Capture (CDC): Debezium, Logical Replication, and Stream Processing

Change Data Capture (CDC): Debezium, Logical Replication, and Stream Processing

Change Data Capture (CDC): Debezium, Logical Replication, and Stream Processing

Change Data Capture (CDC) is a pattern that captures row-level changes in a database and streams them to downstream consumers in real time. Unlike triggers or application-level dual-writes, CDC reads the database's transaction log, adding negligible overhead to production workloads.

Why CDC?

Traditional approaches to synchronizing databases have downsides:

  • Triggers : Add per-row overhead and operate within the transaction, slowing writes.

  • Dual-writes : Writing to two systems (database and cache, or database and search index) from the application is non-atomic; partial failures cause drift.

  • Batch ETL : Hourly or daily jobs introduce latency that many systems cannot tolerate.

CDC solves these problems by making the database transaction log the source of truth and streaming changes as they occur.

PostgreSQL Logical Replication

PostgreSQL's built-in logical replication is the foundation for CDC. Unlike physical replication (which copies entire WAL segments and requires identical servers), logical replication streams decoded changes as row-level operations.

Publisher Setup

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- wal_level must be logical

SHOW wal_level; -- must be 'logical'

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Create a publication

CREATE PUBLICATION cdc_pub FOR TABLE orders, order_items, payments;

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Optionally filter rows

CREATE PUBLICATION cdc_pub_usa FOR TABLE orders WHERE (country = 'US');

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Specify publish operations

CREATE PUBLICATION cdc_pub_inserts FOR TABLE orders

WITH (publish = 'insert');

Subscriber Setup

The subscriber can be another PostgreSQL database or any consumer that speaks the pgoutput protocol:

CREATE SUBSCRIPTION cdc_sub

CONNECTION 'host=primary-db port=5432 dbname=proddb'

PUBLICATION cdc_pub;

Each logical replication slot tracks the WAL position, ensuring no data loss even if the consumer is offline for extended periods.

Debezium

Debezium is an open-source CDC platform built on Apache Kafka. It uses PostgreSQL's logical decoding plugin (pgoutput or decoderbufs) to capture changes and publishes each row change as a Kafka message.

Debezium Connector Configuration

{

"name": "postgres-connector",

"config": {

"connector.class": "io.debezium.connector.postgresql.PostgresConnector",

"database.hostname": "primary-db",

"database.port": "5432",

"database.user": "debezium",

"database.password": "debezium",

"database.dbname": "proddb",

"database.server.name": "my-app",

"plugin.name": "pgoutput",

"slot.name": "debezium_slot",

"table.include.list": "public.orders,public.order_items",

"transforms": "unwrap",

"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",

"decimal.handling.mode": "string"

}

}

Each change event has a standard envelope:

{

"op": "c",

"before": null,

"after": {

"id": 1001,

"user_id":


Read the full article on AI Study Room for complete code examples, comparison tables, and related resources.

Found this useful? Check out more developer guides and tool comparisons on AI Study Room.

Top comments (0)