PostgreSQL Logical Replication: Setup, Monitoring & Troubleshooting
PostgreSQL logical replication decodes WAL changes into a logical format and streams them to subscribers. Unlike physical (streaming) replication that creates byte-for-byte copies of the entire cluster, logical replication lets you replicate a subset of tables, replicate between different PostgreSQL major versions, and feed changes into systems running a different schema.
Sounds great on the surface. The complexity hits after the initial setup.
Why Physical Replication Isn't Always Enough
Streaming replication sends raw WAL bytes to replicas. It's great for HA failover, but falls apart when you need selectivity. You can't replicate just three tables out of a hundred. You can't replicate from PostgreSQL 14 to PostgreSQL 17 during a version upgrade. You can't stream changes into a data warehouse with a different schema.
Logical replication solves all of these by decoding WAL into row-level change events (INSERT, UPDATE, DELETE) and applying them on the subscriber. You define a publication with a set of tables on the source, create a subscription on the target, and PostgreSQL handles the initial data copy and ongoing change streaming.
Setting It Up
On the publisher
First, set wal_level (this requires a restart):
ALTER SYSTEM SET wal_level = 'logical';
-- Restart PostgreSQL after this change
Then create a publication:
-- Publish specific tables
CREATE PUBLICATION orders_pub FOR TABLE orders, order_items, customers;
-- Or publish all tables in a schema (PG15+)
CREATE PUBLICATION analytics_pub FOR TABLES IN SCHEMA analytics;
On the subscriber
Create the subscription. This automatically creates a replication slot on the publisher and copies existing data before streaming changes:
CREATE SUBSCRIPTION orders_sub
CONNECTION 'host=publisher.example.com port=5432 dbname=myapp user=replicator password=secret'
PUBLICATION orders_pub;
Monitor the initial sync:
SELECT
srsubid,
srrelid::regclass AS table_name,
CASE srsubstate
WHEN 'i' THEN 'initializing'
WHEN 'd' THEN 'data_copying'
WHEN 's' THEN 'synchronized'
WHEN 'r' THEN 'ready'
END AS sync_state
FROM pg_subscription_rel;
The Three Things That Bite You in Production
1. DDL is not replicated
This is the big one. An ALTER TABLE ADD COLUMN on the publisher is completely invisible to the subscriber. The subscriber doesn't know the column exists, so when it receives rows with the new column, replication breaks.
The fix requires careful ordering:
-- Step 1: Apply on subscriber FIRST
ALTER TABLE orders ADD COLUMN priority_level TEXT DEFAULT 'normal';
-- Step 2: Apply on publisher
ALTER TABLE orders ADD COLUMN priority_level TEXT DEFAULT 'normal';
-- Step 3: Refresh the subscription
ALTER SUBSCRIPTION orders_sub REFRESH PUBLICATION;
The rule is: additive changes (ADD COLUMN, ADD TABLE) go on the subscriber first. Destructive changes (DROP COLUMN, DROP TABLE) go on the publisher first. Get it backwards and replication stops.
2. Replication slots retain WAL indefinitely
A replication slot tells PostgreSQL "do not delete any WAL after this point until I say so." If the subscriber goes offline -- a failed server, a dropped subscription that wasn't cleaned up, a CDC tool that crashed -- the slot keeps retaining WAL segments.
Hours or days later, pg_wal/ has grown from a few GB to hundreds of GB. The disk fills. PostgreSQL can't write new WAL. All transactions stall. The primary is effectively down.
Monitor slot lag constantly:
SELECT
slot_name,
active,
pg_size_pretty(
pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
) AS slot_lag_bytes,
wal_status
FROM pg_replication_slots
WHERE slot_type = 'logical';
And set a safety limit (PG13+):
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();
When a slot exceeds this limit, PostgreSQL invalidates it rather than filling the disk. The subscriber will need to be re-synced, but the publisher stays online.
3. Conflict resolution is minimal
If an INSERT on the subscriber violates a unique constraint because the row already exists, replication stops dead. You have to manually intervene:
-- Disable the subscription
ALTER SUBSCRIPTION orders_sub DISABLE;
-- Fix the conflict
DELETE FROM orders WHERE order_id = 12345;
-- Re-enable
ALTER SUBSCRIPTION orders_sub ENABLE;
On PG15+, you can skip conflicts automatically:
ALTER SUBSCRIPTION orders_sub SET (disable_on_error = false);
But that means silently dropping conflicting data, which may not be acceptable.
REPLICA IDENTITY: The Hidden Requirement
For UPDATE and DELETE to work, the subscriber needs to identify which row to modify. By default it uses the primary key. Tables without a primary key need explicit configuration:
-- Option 1: Send all columns (works but increases WAL volume)
ALTER TABLE events REPLICA IDENTITY FULL;
-- Option 2: Use a unique index
CREATE UNIQUE INDEX idx_events_event_id ON events (event_id);
ALTER TABLE events REPLICA IDENTITY USING INDEX idx_events_event_id;
Without a replica identity, UPDATE and DELETE operations on the publisher fail with an error.
Aurora PostgreSQL Setup
For AWS Aurora, logical replication requires a parameter group change and reboot:
-- In the RDS parameter group:
-- rds.logical_replication = 1
-- This automatically sets wal_level = logical
-- After reboot, grant the replication role:
GRANT rds_replication TO replicator_user;
-- Then create publications and subscriptions as normal
Monitoring Replication Health
Check subscription status and lag on the subscriber:
SELECT
subname AS subscription_name,
received_lsn,
latest_end_lsn,
latest_end_time,
pg_size_pretty(
pg_wal_lsn_diff(received_lsn, latest_end_lsn)
) AS apply_lag_bytes
FROM pg_stat_subscription
WHERE subname IS NOT NULL;
Check for errors that have stalled replication:
SELECT
subname,
worker_type,
relname,
last_error_message,
last_error_time
FROM pg_stat_subscription_stats
WHERE last_error_message IS NOT NULL;
Prevention Checklist
-
Always set
max_slot_wal_keep_size-- 5-20 GB depending on workload. This prevents a single orphaned slot from filling your disk. -
Document your schema change procedure -- subscriber-first for additions, publisher-first for removals. Include
ALTER SUBSCRIPTION ... REFRESH PUBLICATIONin the checklist. - Use UUID primary keys in bidirectional setups to avoid conflicts.
- Keep subscriber tables read-only in unidirectional setups (separate role without write privileges).
- Monitor both apply lag and slot lag continuously. They indicate different problems.
What Logical Replication Does NOT Replicate
- Sequences (sync periodically with
pg_dump --data-only -t '*_seq') - DDL changes (manual coordination required)
- Large objects
- Materialized view refreshes
- TRUNCATE (on PostgreSQL versions before 11)
Plan for these gaps in your architecture rather than discovering them in production.
Top comments (0)