Finny Collins

Posted on Apr 1

PostgreSQL logical replication — 5 steps to set up real-time data sync across servers

#postgres #database

PostgreSQL has had logical replication built in since version 10, and it remains one of the most practical ways to keep data in sync between two or more servers. Unlike physical replication, which copies the entire database cluster byte-for-byte, logical replication works at the row level. You pick which tables to replicate, and PostgreSQL streams the changes in real time.

This article walks through the full setup in five steps. By the end you'll have a working publisher-subscriber pair and know how to monitor it.

What is logical replication and when do you need it

Logical replication uses a publish-subscribe model. The source database (publisher) defines a publication — a set of tables whose changes should be broadcast. The target database (subscriber) creates a subscription that connects to the publisher, pulls the initial data snapshot and then receives a continuous stream of INSERT, UPDATE and DELETE operations.

The key difference from physical (streaming) replication is granularity. Physical replication mirrors the entire cluster, operates at the WAL byte level and requires identical PostgreSQL major versions on both sides. Logical replication works per-table, decodes WAL into logical change events and allows different major versions or even different schemas on the subscriber.

This makes logical replication useful in a range of scenarios that physical replication simply cannot handle.

	Logical replication	Physical replication
Granularity	Per-table	Entire cluster
Cross-version support	Yes	No (same major version required)
Subscriber writes	Allowed	Read-only standby
Schema differences	Allowed (column subset)	Must be identical
DDL replication	No	Yes (WAL-level)
Typical use case	Selective sync, migrations, consolidation	High availability, failover

Use logical replication when you need to replicate a subset of tables, consolidate data from multiple sources into one reporting database, migrate between PostgreSQL major versions with minimal downtime or feed changes into a data warehouse alongside normal writes.

Prerequisites

Before you start, make sure you have the following in place:

Two PostgreSQL instances running version 10 or later (publisher and subscriber). They can be on different major versions
Network connectivity between the two servers on the PostgreSQL port (default 5432)
A superuser or a user with the REPLICATION role on the publisher
The tables you want to replicate must have a primary key or a REPLICA IDENTITY set. Without one, UPDATE and DELETE operations will fail on the subscriber

If both instances are on the same machine for testing, just use different ports. The rest of the setup is identical.

Step 1 — configure the publisher database

Logical replication requires the WAL level to be set to logical. By default PostgreSQL uses replica, so you need to change this in postgresql.conf on the publisher.

Open the config file and set:

wal_level = logical

You also need at least one replication slot available. Check that max_replication_slots is high enough. The default of 10 is fine for most setups, but if you already use slots for physical replication or other subscribers, increase it:

max_replication_slots = 10
max_wal_senders = 10

Next, update pg_hba.conf to allow the subscriber to connect with replication privileges. Add a line like this:

host    all    replication_user    192.168.1.0/24    md5

Replace the IP range with your subscriber's actual address. After editing both files, restart PostgreSQL:

sudo systemctl restart postgresql

You can verify the WAL level is active by running:

SHOW wal_level;

It should return logical. If it still shows replica, the restart didn't pick up the config change. Double-check the file path and try again.

Step 2 — create a publication

On the publisher database, create a publication for the tables you want to replicate. Connect to the database as a superuser or the replication user and run:

CREATE PUBLICATION my_publication FOR TABLE users, orders;

This publishes changes from the users and orders tables. If you want to publish all tables in the database, use:

CREATE PUBLICATION my_publication FOR ALL TABLES;

You can also control which operations are published. By default all of them (INSERT, UPDATE, DELETE, TRUNCATE) are included. To limit it:

CREATE PUBLICATION my_publication FOR TABLE users
    WITH (publish = 'insert,update');

To check what publications exist:

SELECT * FROM pg_publication;

And to see which tables are in a publication:

SELECT * FROM pg_publication_tables;

Publications are lightweight metadata objects. Creating one doesn't start any replication by itself. That happens when a subscriber connects.

Step 3 — prepare the subscriber database

The subscriber needs to have the same table structure as the publisher. Logical replication does not copy DDL, so you must create the tables manually before starting.

The easiest way is to dump the schema from the publisher and restore it on the subscriber:

pg_dump -h publisher_host -U postgres -s -t users -t orders mydb > schema.sql
psql -h subscriber_host -U postgres -d mydb -f schema.sql

The -s flag dumps schema only, no data. The initial data will come through the replication snapshot when the subscription is created.

Make sure the target tables are empty. If they already contain data, you'll get duplicate key errors during the initial sync. Either truncate them or create the subscription with copy_data = false (more on that in the next step).

The subscriber user needs permissions to write to these tables. If you're using the same superuser, that's already covered.

Step 4 — create a subscription

On the subscriber database, create a subscription that points to the publisher:

CREATE SUBSCRIPTION my_subscription
    CONNECTION 'host=publisher_host port=5432 dbname=mydb user=replication_user password=secret'
    PUBLICATION my_publication;

As soon as you run this, PostgreSQL will:

Connect to the publisher
Create a replication slot on the publisher
Copy the initial table data (snapshot)
Start streaming live changes

If the tables on the subscriber already have data and you only want to start streaming from now, skip the initial copy:

CREATE SUBSCRIPTION my_subscription
    CONNECTION 'host=publisher_host port=5432 dbname=mydb user=replication_user password=secret'
    PUBLICATION my_publication
    WITH (copy_data = false);

To check the subscription status:

SELECT * FROM pg_stat_subscription;

You should see a row with a non-null pid, which means the subscription worker is running and connected.

Step 5 — verify and monitor replication

After creating the subscription, verify that data is actually flowing. Insert a row on the publisher and check if it appears on the subscriber:

-- On publisher
INSERT INTO users (name, email) VALUES ('test_user', 'test@example.com');

-- On subscriber
SELECT * FROM users WHERE name = 'test_user';

If the row shows up, replication is working. For ongoing monitoring, these views are your main tools.

On the publisher, check replication slot status and lag:

SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;

SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;

On the subscriber, check subscription state and table sync progress:

SELECT subname, pid, received_lsn, latest_end_lsn, latest_end_time
FROM pg_stat_subscription;

SELECT srsubid, srrelid, srsublsn
FROM pg_subscription_rel;

If pg_stat_replication shows a growing gap between sent_lsn and replay_lsn, the subscriber is falling behind. This usually means the subscriber is under heavy load or the network is slow. Check the subscriber's PostgreSQL logs for errors.

Common issues and how to fix them

Most problems with logical replication happen during setup or when the publisher schema changes. Here are the ones you'll run into most often.

Problem	Cause	Solution
`ERROR: logical decoding requires wal_level >= logical`	`wal_level` not set or restart not done	Set `wal_level = logical` in `postgresql.conf` and restart PostgreSQL
Initial sync stuck or very slow	Large tables being copied	Monitor `pg_stat_subscription`. Consider increasing `max_sync_workers_per_subscription`
`ERROR: could not create replication slot`	All slots in use	Increase `max_replication_slots` and restart
`UPDATE` or `DELETE` fails on subscriber	Table has no primary key	Add a primary key or set `REPLICA IDENTITY FULL` on the publisher table
Subscriber stops receiving changes	Network issue or publisher restart	Check `pg_stat_subscription` for errors. The subscription auto-reconnects in most cases
Duplicate key errors during initial sync	Target table already has data	Truncate the table or use `copy_data = false` when creating subscription
Schema mismatch after `ALTER TABLE`	DDL changes are not replicated	Apply the same DDL on the subscriber manually before the change takes effect

When something breaks, the subscriber logs are the first place to look. PostgreSQL is usually specific about what went wrong.

Logical replication limitations

Logical replication covers a lot, but it has clear boundaries worth knowing before you commit to it:

DDL changes (CREATE TABLE, ALTER TABLE, DROP) are not replicated. You need to apply schema changes on both sides manually or use a migration tool
Sequences are not synced. If you fail over to the subscriber, sequence values will be out of date. You'll need to reset them manually
Large objects (the lo type) are not supported
TRUNCATE replication was added in PostgreSQL 11. Earlier versions don't replicate it
There's no built-in conflict resolution. If the same row is modified on both publisher and subscriber, you get an error and replication stops until you fix it

These limitations are manageable for most use cases. But if you need full cluster replication with automatic failover, physical replication or a tool like Patroni is a better fit.

Keeping replicated data safe

Replication is not a backup. If someone runs a bad DELETE on the publisher, that delete gets replicated too. You need actual backups alongside replication. Databasus is the most widely used tool for PostgreSQL backup and the industry standard for managing scheduled backups. It supports logical, physical and incremental backups with point-in-time recovery, handles multiple storage destinations and takes a few minutes to set up through its web UI.

Setting up logical replication takes about 15 minutes once you've done it a couple of times. The five steps above cover the core flow. From there, you can add more tables to the publication, create additional subscribers or combine logical replication with physical standby servers for both selective sync and high availability.

DEV Community