DEV Community

Cover image for PostgreSQL Logical Replication: Setup, Monitoring & Troubleshooting
Philip McClarence
Philip McClarence

Posted on

PostgreSQL Logical Replication: Setup, Monitoring & Troubleshooting

PostgreSQL Logical Replication: Setup, Monitoring & Troubleshooting

PostgreSQL logical replication decodes WAL changes into a logical format and streams them to subscribers. Unlike physical (streaming) replication that creates byte-for-byte copies of the entire cluster, logical replication lets you replicate a subset of tables, replicate between different PostgreSQL major versions, and feed changes into systems running a different schema.

Sounds great on the surface. The complexity hits after the initial setup.

Why Physical Replication Isn't Always Enough

Streaming replication sends raw WAL bytes to replicas. It's great for HA failover, but falls apart when you need selectivity. You can't replicate just three tables out of a hundred. You can't replicate from PostgreSQL 14 to PostgreSQL 17 during a version upgrade. You can't stream changes into a data warehouse with a different schema.

Logical replication solves all of these by decoding WAL into row-level change events (INSERT, UPDATE, DELETE) and applying them on the subscriber. You define a publication with a set of tables on the source, create a subscription on the target, and PostgreSQL handles the initial data copy and ongoing change streaming.

Setting It Up

On the publisher

First, set wal_level (this requires a restart):

ALTER SYSTEM SET wal_level = 'logical';
-- Restart PostgreSQL after this change
Enter fullscreen mode Exit fullscreen mode

Then create a publication:

-- Publish specific tables
CREATE PUBLICATION orders_pub FOR TABLE orders, order_items, customers;

-- Or publish all tables in a schema (PG15+)
CREATE PUBLICATION analytics_pub FOR TABLES IN SCHEMA analytics;
Enter fullscreen mode Exit fullscreen mode

On the subscriber

Create the subscription. This automatically creates a replication slot on the publisher and copies existing data before streaming changes:

CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=publisher.example.com port=5432 dbname=myapp user=replicator password=secret'
    PUBLICATION orders_pub;
Enter fullscreen mode Exit fullscreen mode

Monitor the initial sync:

SELECT
    srsubid,
    srrelid::regclass AS table_name,
    CASE srsubstate
        WHEN 'i' THEN 'initializing'
        WHEN 'd' THEN 'data_copying'
        WHEN 's' THEN 'synchronized'
        WHEN 'r' THEN 'ready'
    END AS sync_state
FROM pg_subscription_rel;
Enter fullscreen mode Exit fullscreen mode

The Three Things That Bite You in Production

1. DDL is not replicated

This is the big one. An ALTER TABLE ADD COLUMN on the publisher is completely invisible to the subscriber. The subscriber doesn't know the column exists, so when it receives rows with the new column, replication breaks.

The fix requires careful ordering:

-- Step 1: Apply on subscriber FIRST
ALTER TABLE orders ADD COLUMN priority_level TEXT DEFAULT 'normal';

-- Step 2: Apply on publisher
ALTER TABLE orders ADD COLUMN priority_level TEXT DEFAULT 'normal';

-- Step 3: Refresh the subscription
ALTER SUBSCRIPTION orders_sub REFRESH PUBLICATION;
Enter fullscreen mode Exit fullscreen mode

The rule is: additive changes (ADD COLUMN, ADD TABLE) go on the subscriber first. Destructive changes (DROP COLUMN, DROP TABLE) go on the publisher first. Get it backwards and replication stops.

2. Replication slots retain WAL indefinitely

A replication slot tells PostgreSQL "do not delete any WAL after this point until I say so." If the subscriber goes offline -- a failed server, a dropped subscription that wasn't cleaned up, a CDC tool that crashed -- the slot keeps retaining WAL segments.

Hours or days later, pg_wal/ has grown from a few GB to hundreds of GB. The disk fills. PostgreSQL can't write new WAL. All transactions stall. The primary is effectively down.

Monitor slot lag constantly:

SELECT
    slot_name,
    active,
    pg_size_pretty(
        pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
    ) AS slot_lag_bytes,
    wal_status
FROM pg_replication_slots
WHERE slot_type = 'logical';
Enter fullscreen mode Exit fullscreen mode

And set a safety limit (PG13+):

ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();
Enter fullscreen mode Exit fullscreen mode

When a slot exceeds this limit, PostgreSQL invalidates it rather than filling the disk. The subscriber will need to be re-synced, but the publisher stays online.

3. Conflict resolution is minimal

If an INSERT on the subscriber violates a unique constraint because the row already exists, replication stops dead. You have to manually intervene:

-- Disable the subscription
ALTER SUBSCRIPTION orders_sub DISABLE;

-- Fix the conflict
DELETE FROM orders WHERE order_id = 12345;

-- Re-enable
ALTER SUBSCRIPTION orders_sub ENABLE;
Enter fullscreen mode Exit fullscreen mode

On PG15+, you can skip conflicts automatically:

ALTER SUBSCRIPTION orders_sub SET (disable_on_error = false);
Enter fullscreen mode Exit fullscreen mode

But that means silently dropping conflicting data, which may not be acceptable.

REPLICA IDENTITY: The Hidden Requirement

For UPDATE and DELETE to work, the subscriber needs to identify which row to modify. By default it uses the primary key. Tables without a primary key need explicit configuration:

-- Option 1: Send all columns (works but increases WAL volume)
ALTER TABLE events REPLICA IDENTITY FULL;

-- Option 2: Use a unique index
CREATE UNIQUE INDEX idx_events_event_id ON events (event_id);
ALTER TABLE events REPLICA IDENTITY USING INDEX idx_events_event_id;
Enter fullscreen mode Exit fullscreen mode

Without a replica identity, UPDATE and DELETE operations on the publisher fail with an error.

Aurora PostgreSQL Setup

For AWS Aurora, logical replication requires a parameter group change and reboot:

-- In the RDS parameter group:
-- rds.logical_replication = 1
-- This automatically sets wal_level = logical

-- After reboot, grant the replication role:
GRANT rds_replication TO replicator_user;

-- Then create publications and subscriptions as normal
Enter fullscreen mode Exit fullscreen mode

Monitoring Replication Health

Check subscription status and lag on the subscriber:

SELECT
    subname AS subscription_name,
    received_lsn,
    latest_end_lsn,
    latest_end_time,
    pg_size_pretty(
        pg_wal_lsn_diff(received_lsn, latest_end_lsn)
    ) AS apply_lag_bytes
FROM pg_stat_subscription
WHERE subname IS NOT NULL;
Enter fullscreen mode Exit fullscreen mode

Check for errors that have stalled replication:

SELECT
    subname,
    worker_type,
    relname,
    last_error_message,
    last_error_time
FROM pg_stat_subscription_stats
WHERE last_error_message IS NOT NULL;
Enter fullscreen mode Exit fullscreen mode

Prevention Checklist

  • Always set max_slot_wal_keep_size -- 5-20 GB depending on workload. This prevents a single orphaned slot from filling your disk.
  • Document your schema change procedure -- subscriber-first for additions, publisher-first for removals. Include ALTER SUBSCRIPTION ... REFRESH PUBLICATION in the checklist.
  • Use UUID primary keys in bidirectional setups to avoid conflicts.
  • Keep subscriber tables read-only in unidirectional setups (separate role without write privileges).
  • Monitor both apply lag and slot lag continuously. They indicate different problems.

What Logical Replication Does NOT Replicate

  • Sequences (sync periodically with pg_dump --data-only -t '*_seq')
  • DDL changes (manual coordination required)
  • Large objects
  • Materialized view refreshes
  • TRUNCATE (on PostgreSQL versions before 11)

Plan for these gaps in your architecture rather than discovering them in production.

Top comments (0)