DEV Community

Mohamed Hussain S


How Debezium Uses PostgreSQL WAL for Change Data Capture

This article is part of a series on PostgreSQL WAL.

Part 1 — How PostgreSQL WAL Actually Works  
Part 2 — How Debezium Uses PostgreSQL WAL for CDC  
Part 3 — PostgreSQL Backups with pgBackRest and PITR

In the previous article, we explored how PostgreSQL uses Write-Ahead Logging (WAL) to ensure durability and recover from crashes.

If you haven't read it yet, you can check it out here:

How PostgreSQL WAL Actually Works (And Why Everything Depends on It)

But WAL is not just an internal recovery mechanism.

Because WAL records every change that happens inside the database, it also enables systems to stream those changes in real time.

This is the foundation of log-based Change Data Capture (CDC).

Instead of repeatedly polling tables, log-based CDC tools read changes directly from WAL and turn them into events.

One of the most widely used CDC platforms for PostgreSQL is Debezium.

In this article, we’ll explore how Debezium reads PostgreSQL WAL and converts database operations into a stream of change events.


What is Change Data Capture (CDC)

Applications often need to react to changes happening inside a database.

For example, imagine a simple update:

UPDATE users SET name='Alice Cooper' WHERE id = 1;

That change might need to reach other systems, for example by:

  • updating a search index
  • syncing data into a data warehouse
  • updating analytics pipelines
  • triggering downstream workflows

A traditional approach is to periodically query the database and look for differences:

poll database every few seconds
compare results
detect differences

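But this approach has several problems:

  • it is inefficient: every poll re-reads data, even when nothing has changed
  • it adds load to the database
  • changes are only noticed at the next poll
  • changes that happen between two polls (for example, a row inserted and then deleted) can be missed entirely
  • it becomes harder to scale as tables and poll frequency grow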
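
To make the polling approach concrete, here is a minimal Python sketch that diffs two snapshots of a table keyed by primary key. The snapshots are plain dictionaries standing in for query results, and diff_snapshots is a hypothetical helper rather than part of any library. The sketch also shows a blind spot: a row that exists only between two polls never appears in the diff.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Snapshot = Dict[int, Row]  # primary key -> row, as returned by one poll


def diff_snapshots(previous: Snapshot, current: Snapshot) -> List[Tuple[str, int]]:
    """Compare two polls of the same table and report what changed.

    Returns (operation, primary_key) pairs sorted by key, using CDC-style
    codes: 'c' = created, 'u' = updated, 'd' = deleted.
    """
    changes: List[Tuple[str, int]] = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("c", key))
        elif previous[key] != row:
            changes.append(("u", key))
    for key in previous:
        if key not in current:
            changes.append(("d", key))
    return sorted(changes, key=lambda change: change[1])


# Two polls a few seconds apart (hypothetical data).
# Between them: id 1 was renamed, id 2 was inserted and then deleted, id 3 was inserted.
poll_1: Snapshot = {1: {"id": 1, "name": "Alice"}}
poll_2: Snapshot = {1: {"id": 1, "name": "Alice Cooper"}, 3: {"id": 3, "name": "Carol"}}

print(diff_snapshots(poll_1, poll_2))  # [('u', 1), ('c', 3)]
```

Every poll re-reads the whole table, and row id 2 is invisible to the diff because it was created and removed between polls.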

Change Data Capture solves this by streaming database changes as they happen.


Why WAL Makes CDC Possible

PostgreSQL writes every change into WAL before modifying the actual table files.

Because of this, WAL contains an ordered record of every change made to the database.

Instead of scanning tables, CDC systems can simply read this log.

The idea looks like this:

                            Application Query
                                    ↓
                                PostgreSQL
                                    ↓
                            WAL record created
                                    ↓
                            CDC tool reads WAL
                                    ↓
                             Event generated

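This makes CDC efficient, because there are no repeated table scans, and reliable, because changes are not lost between polls.

The CDC system simply consumes changes in commit order.

Debezium works exactly this way.

Apart from an optional initial snapshot of existing data, it does not query tables to find changes.
Instead, it streams them from PostgreSQL WAL.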
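
For contrast with polling, the sketch below models WAL as an in-memory list of (LSN, operation, row) records. The record layout and the changes_since helper are simplifications for illustration, not PostgreSQL's real WAL format. A consumer that remembers its last position sees every change in order, including a row that existed only briefly.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogRecord:
    lsn: int                # position in the log; strictly increasing
    op: str                 # 'c' insert, 'u' update, 'd' delete
    row: Dict[str, object]  # new row image (just the key for deletes)


def changes_since(log: List[LogRecord], last_lsn: int) -> List[LogRecord]:
    """Return every record written after last_lsn, in log order."""
    return [record for record in log if record.lsn > last_lsn]


# Simplified, hypothetical log contents (not real WAL records).
wal = [
    LogRecord(100, "c", {"id": 1, "name": "Alice"}),
    LogRecord(200, "u", {"id": 1, "name": "Alice Cooper"}),
    LogRecord(300, "c", {"id": 2, "name": "Bob"}),
    LogRecord(400, "d", {"id": 2}),
    LogRecord(500, "c", {"id": 3, "name": "Carol"}),
]

for record in changes_since(wal, last_lsn=100):
    print(record.lsn, record.op, record.row)
```

Starting after position 100, the consumer receives the update to id 1, the insert and delete of id 2, and the insert of id 3, all without re-reading the table.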


Enabling Logical Decoding in PostgreSQL

For CDC systems to read WAL, PostgreSQL must provide enough information about row-level changes.

This is controlled by the wal_level setting in postgresql.conf (changing it requires a server restart):

wal_level = logical

PostgreSQL supports three WAL levels, and each one records everything the level below it does:

wal_level   Purpose
minimal     crash recovery only
replica     physical replication and WAL archiving (the default)
logical     logical decoding (CDC tools)

When wal_level is set to logical, PostgreSQL records additional metadata needed for decoding row-level changes.

This makes it possible for external systems to interpret WAL records and reconstruct database events.
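
Logical decoding needs more than wal_level. The server must also allow replication slots (max_replication_slots) and replication connections (max_wal_senders). The following minimal Python sketch checks a dictionary of settings, such as values copied from SHOW or the pg_settings view, before a CDC tool is pointed at the server. The helper name and the dictionary input are assumptions for illustration. Missing values fall back to the PostgreSQL 10+ defaults.

```python
from typing import Dict, List

# Defaults in PostgreSQL 10 and later, used when a setting is not supplied.
DEFAULTS = {"wal_level": "replica", "max_replication_slots": "10", "max_wal_senders": "10"}
WAL_LEVELS = ("minimal", "replica", "logical")  # each level includes the ones before it


def logical_decoding_problems(settings: Dict[str, str]) -> List[str]:
    """Return reasons the server is not ready for logical decoding (empty list = ready).

    `settings` maps parameter names to values as strings, e.g. copied from
    the output of SHOW or the pg_settings view (hypothetical input format).
    """
    merged = {**DEFAULTS, **settings}
    problems: List[str] = []
    level = merged["wal_level"]
    if level not in WAL_LEVELS:
        problems.append(f"unknown wal_level {level!r}")
    elif level != "logical":
        problems.append(f"wal_level is {level!r}; set it to 'logical' and restart")
    for name in ("max_replication_slots", "max_wal_senders"):
        if int(merged[name]) < 1:
            problems.append(f"{name} must be at least 1")
    return problems


print(logical_decoding_problems({"wal_level": "replica"}))
# ["wal_level is 'replica'; set it to 'logical' and restart"]
print(logical_decoding_problems({"wal_level": "logical"}))  # []
```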


Replication Slots

When Debezium connects to PostgreSQL, it creates a replication slot for itself if one with the configured name does not already exist.

A replication slot is managed by PostgreSQL and tracks the progress of a consumer reading WAL.

Its job is simple:

                            Debezium reads WAL
                                    ↓
                   Postgres tracks last consumed position
                                    ↓
                        WAL not deleted until consumed

This prevents PostgreSQL from removing WAL segments that Debezium still needs.

The position in WAL is identified using something called an LSN (Log Sequence Number).

An LSN represents a specific position inside the WAL stream.
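
PostgreSQL displays an LSN as two hexadecimal numbers separated by a slash, for example 0/16B3748. The two numbers are the high and low 32 bits of a 64-bit byte position in the WAL stream. The following minimal Python sketch converts between the two forms. The function names are our own, not a library API.

```python
def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's 'X/Y' LSN notation to a 64-bit integer position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(position: int) -> str:
    """Convert a 64-bit WAL position back to 'X/Y' notation (uppercase hex)."""
    return f"{position >> 32:X}/{position & 0xFFFFFFFF:X}"


print(parse_lsn("0/16B3748"))        # 23803720
print(format_lsn(parse_lsn("1/0")))  # 1/0

# LSNs must be compared as numbers, not as strings:
print("0/A" > "0/10")                        # True (string order is misleading)
print(parse_lsn("0/A") > parse_lsn("0/10"))  # False (10 < 16)
```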

Because the replication slot records the last LSN that Debezium has confirmed as processed, Debezium can resume streaming from the correct position even if it restarts.
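
The bookkeeping a replication slot performs can be modeled in a few lines. The toy Python model below has three rules. Each slot remembers the last position its consumer confirmed. The server may discard only WAL that every slot has confirmed. A restarted consumer resumes from its slot's position. The class and method names are hypothetical, and positions are plain integers rather than real LSNs. Real slots also track more than one position (pg_replication_slots shows restart_lsn and confirmed_flush_lsn), and this model collapses them into one.

```python
from typing import Dict, List, Tuple


class ToyWalServer:
    """Toy model of WAL retention with replication slots (not real PostgreSQL code)."""

    def __init__(self) -> None:
        self.wal: List[Tuple[int, str]] = []  # (position, change description)
        self.slots: Dict[str, int] = {}       # slot name -> last confirmed position
        self.next_position = 1

    def write(self, change: str) -> None:
        self.wal.append((self.next_position, change))
        self.next_position += 1

    def create_slot(self, name: str) -> None:
        # A new slot starts at the current end of WAL; existing slots are kept.
        self.slots.setdefault(name, self.next_position - 1)

    def read(self, name: str) -> List[Tuple[int, str]]:
        # Stream everything after the slot's confirmed position.
        return [entry for entry in self.wal if entry[0] > self.slots[name]]

    def confirm(self, name: str, position: int) -> None:
        # The consumer reports it has processed everything up to `position`.
        self.slots[name] = max(self.slots[name], position)

    def recycle(self) -> None:
        # WAL may only be removed once every slot has confirmed it.
        if self.slots:
            oldest_needed = min(self.slots.values())
            self.wal = [entry for entry in self.wal if entry[0] > oldest_needed]


server = ToyWalServer()
server.create_slot("debezium")
server.create_slot("analytics")  # a second, hypothetical consumer with its own slot
for change in ["insert id=1", "update id=1", "delete id=1"]:
    server.write(change)

batch = server.read("debezium")
server.confirm("debezium", batch[1][0])  # confirms two changes, then stops (e.g. restart)
server.recycle()

print(len(server.wal))          # 3: 'analytics' has not confirmed anything yet
print(server.read("debezium"))  # [(3, 'delete id=1')]: resumes after the confirmed position
```

The model also shows a common operational risk. A slot whose consumer stops confirming holds back recycling, so WAL keeps accumulating on disk until the slot advances or is dropped.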


Logical Decoding Plugins

WAL records are stored in an internal binary format tied to PostgreSQL's storage layout, so external systems cannot interpret them directly.

PostgreSQL solves this with logical decoding: the server decodes WAL into row-level changes, and an output plugin formats those changes for the consumer.

Some commonly used plugins include:

  • pgoutput
  • wal2json
  • decoderbufs

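pgoutput is built into PostgreSQL (version 10 and later) and is the plugin PostgreSQL's own logical replication uses, so it needs no extra installation. wal2json and decoderbufs must be installed on the server separately.

pgoutput emits row-level change messages (inserts, updates, deletes, and transaction boundaries) that CDC tools like Debezium can consume.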
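
To show what a pgoutput consumer receives, here is a simplified Python decoder for one message type: the Insert ('I') message from PostgreSQL's logical replication message format. The sketch makes three assumptions. The transaction is not streamed, so there is no transaction ID field. Column values arrive in text format only. The column names come from a relations map that a real consumer would build from earlier Relation ('R') messages. decode_insert, relations, and the OID 16384 are hypothetical. This is an illustration, not a complete pgoutput client.

```python
import struct
from typing import Dict, List, Optional


def decode_insert(message: bytes, relations: Dict[int, List[str]]) -> Dict[str, Optional[str]]:
    """Decode a pgoutput Insert message into {column_name: text_value}.

    Simplified: non-streamed transactions, text-format values only.
    `relations` maps relation OIDs to column names (from Relation messages).
    """
    if message[0:1] != b"I":
        raise ValueError("not an Insert message")
    relation_oid, tuple_marker = struct.unpack_from("!Ic", message, 1)  # Int32 OID, Byte1 'N'
    if tuple_marker != b"N":
        raise ValueError("expected new-tuple marker 'N'")
    (column_count,) = struct.unpack_from("!h", message, 6)  # Int16 column count
    offset = 8
    values: List[Optional[str]] = []
    for _ in range(column_count):
        kind = message[offset:offset + 1]
        offset += 1
        if kind == b"n":  # SQL NULL
            values.append(None)
        elif kind == b"t":  # text value: Int32 length, then the bytes
            (length,) = struct.unpack_from("!i", message, offset)
            offset += 4
            values.append(message[offset:offset + length].decode("utf-8"))
            offset += length
        else:  # 'u' (unchanged TOAST value) or 'b' (binary) not handled here
            raise ValueError(f"unsupported column kind {kind!r}")
    return dict(zip(relations[relation_oid], values))


def text_column(value: str) -> bytes:
    """Encode one text-format column for the sample message."""
    data = value.encode("utf-8")
    return b"t" + struct.pack("!i", len(data)) + data


# Sample message for a row (1, 'Alice') in a hypothetical relation with OID 16384.
sample = (b"I" + struct.pack("!Ic", 16384, b"N") + struct.pack("!h", 2)
          + text_column("1") + text_column("Alice"))
print(decode_insert(sample, {16384: ["id", "name"]}))  # {'id': '1', 'name': 'Alice'}
```

In text format every value arrives as a string, including the integer id. Mapping values back to proper types is part of what Debezium does using the column types from the Relation message.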


How Debezium Streams Database Changes

When everything is configured, the end-to-end flow looks like this:

                          Application writes data
                                    ↓
                      PostgreSQL writes change to WAL
                                    ↓
                    Logical decoding plugin interprets WAL
                                    ↓
              Debezium reads changes using the replication protocol
                                    ↓
                     Debezium publishes events to Kafka
                                    ↓
                    Downstream systems consume the events

This architecture allows applications to react to database changes in near real time.

A simplified architecture might look like this:

                                PostgreSQL
                                    │
                                    ▼
                                 Debezium
                                    │
                                    ▼ 
                                  Kafka
                                    │
                                    ▼
                                Consumers

Consumers could include:

  • microservices
  • analytics pipelines
  • search indexing systems
  • data warehouses
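
In a typical deployment, the Debezium box in the architecture above is a Kafka Connect source connector. You register it by sending its configuration to the Kafka Connect REST API (POST /connectors). The sketch below builds that request with Python's standard library but does not send it. The property names are standard Debezium PostgreSQL connector options: topic.prefix is the Debezium 2.x name, and 1.x used database.server.name. The host, credentials, database, slot name, table list, and the localhost:8083 Connect address are placeholders. Check the connector documentation for your Debezium version before relying on any of them.

```python
import json
import urllib.request
from typing import Dict


def debezium_connector_request(connect_url: str, name: str, config: Dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) a Kafka Connect request that registers a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{connect_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


config = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",                  # logical decoding plugin on the server
    "slot.name": "debezium_users",              # placeholder replication slot name
    "database.hostname": "db.example.internal", # placeholder
    "database.port": "5432",
    "database.user": "cdc_user",                # placeholder; needs the REPLICATION attribute
    "database.password": "change-me",           # placeholder
    "database.dbname": "app",                   # placeholder
    "topic.prefix": "app",                      # public.users events go to topic app.public.users
    "table.include.list": "public.users",
}

request = debezium_connector_request("http://localhost:8083", "users-cdc", config)
# urllib.request.urlopen(request) would submit it to a running Kafka Connect worker.
print(request.get_method(), request.full_url)  # POST http://localhost:8083/connectors
```

Building the request separately from sending it keeps the sketch runnable without a Kafka Connect worker.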

Example Change Event

When Debezium captures a change from WAL, it converts it into a structured event.

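A simplified example might look like this (in a real Debezium event, the table name sits inside a source metadata block, alongside details such as the LSN, rather than at the top level):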

{
  "op": "c",
  "table": "users",
  "before": null,
  "after": {
    "id": 1,
    "name": "Alice"
  }
}

The op field indicates the type of operation:

Operation  Meaning
c          create (insert)
u          update
d          delete
r          read (row captured during an initial snapshot)

These events can then be consumed by downstream systems to trigger further processing.
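
As an example of downstream consumption, the Python sketch below keeps an in-memory copy of the users table in sync by applying events shaped like the simplified one shown earlier. The event dictionaries and apply_event are illustrative assumptions. A real consumer would read Debezium events from Kafka and handle their full envelope. Debezium also emits r (read) events for rows captured during an initial snapshot, and the sketch treats them like creates. The sketch uses only the after image for creates and updates, and only the primary key from before for deletes. This is because how much of the old row PostgreSQL provides depends on the table's REPLICA IDENTITY setting.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def apply_event(table: Dict[int, Row], event: Dict[str, Any]) -> None:
    """Apply one simplified change event to an in-memory table keyed by id."""
    op = event["op"]
    before: Optional[Row] = event.get("before")
    after: Optional[Row] = event.get("after")
    if op in ("c", "r", "u"):  # create, snapshot read, update: store the new state
        table[after["id"]] = dict(after)
    elif op == "d":  # delete: 'after' is null, so use the key from 'before'
        table.pop(before["id"], None)
    else:
        raise ValueError(f"unexpected op {op!r}")


users: Dict[int, Row] = {}
events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "Alice"}},
    {"op": "u", "before": None, "after": {"id": 1, "name": "Alice Cooper"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "Bob"}},
    {"op": "d", "before": {"id": 2}, "after": None},
]
for event in events:
    apply_event(users, event)

print(users)  # {1: {'id': 1, 'name': 'Alice Cooper'}}
```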


Why WAL Is So Powerful

What makes this architecture powerful is that multiple systems can reuse the same WAL stream.

The same WAL records that power CDC can also be used for:

  • replication
  • backups
  • point-in-time recovery

In other words, WAL is the single, ordered record of changes that crash recovery, replication, backups, and CDC all build on.


Final Thoughts

Change Data Capture allows systems to react to database changes without constantly querying tables.

PostgreSQL makes this possible because every change is recorded in WAL.

Tools like Debezium simply read that stream of changes and convert them into events that other systems can consume.

This makes it possible to build event-driven architectures directly on top of a relational database.


Next Article in This Series

In the next article, we’ll explore how PostgreSQL WAL is used for database backups and point-in-time recovery using tools like pgBackRest.

We’ll walk through how WAL archiving enables restoring a PostgreSQL database to a specific moment in time.

