DEV Community

PikachuEXE
PikachuEXE

Posted on • Updated on

PostgreSQL Logical Replication - For Upgrade

2022-05-25 Update

  • Add statement to drop replication slot to avoid making folder pg_wal becoming too large This happens when the old DB server is still being used after publication is removed

2021-01-07 Update

  • Add Prevent Replication Error
  • Update Setup New Server (Sequences) to append actual commands to run

2020-11-24 Update
Moved sequence fixing to the end (after subscription removed)

2020-11-19 Update
Wow I forgot to fix the sequences
Added back a section for it
I realize this after trying a few inserts on my staging environment

Main Article
Just notes for my own experience mostly
Upgrade from 12 to 13

If you don't know what's logical replication read some articles about it first
Official doc: https://www.postgresql.org/docs/12/logical-replication.html

I used to just upgrade PSQL like this:
pg_dump -> pg_restore to another instance -> Point processes (web, worker, etc.) to new DB
Data loss doesn't matter that much in my project but I rather have no data loss

Failed Attempts

If you are too busy or need the solution urgently you can skip this and come back later maybe.

Failed Attempts #1 - Replication after data restored

I used pg_restore to restore data
After subscription started the server will complaint about duplicate data
Although it's possible use copy_data = false but that means some data will be lost and make this replication meaningless

Failed Attempts #2 - Replication after table structure AND indexes restored

Used pg_restore to restore with --schema
After subscription started the server will take forever to copy data from table with many indexes (waited 2 days still not done)

Solution

General Steps

This probably lack details
See Actual Notes on Test Run for actual commands / queries

0.1. Setup new server

  • Setup new host
  • Deploy empty PG Server (without cron job that does backup)
  • Backup schema (only if no latest db structure backup available)
  • Restore schema & PK without indexes & constraints (section pre-data)

Config might have to be updated

0.2. Setup old server

Config might have to be updated

1. Add Publication

In existing DB server

CREATE PUBLICATION {publication_name}
FOR ALL TABLES
Enter fullscreen mode Exit fullscreen mode

2. Add Subscription

In new DB server

CREATE SUBSCRIPTION {subcription_name}
CONNECTION 'hostaddr={old_db_server_ip} port={old_db_server_port} user={old_db_server_username} password={old_db_server_password} dbname={old_db_name}'
PUBLICATION {publication_name}
Enter fullscreen mode Exit fullscreen mode

3. Wait until synced

https://severalnines.com/database-blog/how-upgrade-postgresql-11-postgresql-12-zero-downtime

4.1. Setup New Server (Indexes)

  • Restore indexes (section post-data)

4.2. Verify all indexes & constraints are restored

5. Update Users (web/worker process) to use new DB

Check things are working

6.1. Remove Subscription

In new DB server

DROP SUBSCRIPTION IF EXISTS {subcription_name}
Enter fullscreen mode Exit fullscreen mode

6.2. Remove Publication

In old DB server

DROP PUBLICATION IF EXISTS {publication_name}
-- Prevents pg_wal folder getting very large in case old DB is still being used
SELECT pg_drop_replication_slot('{publication_name}');
Enter fullscreen mode Exit fullscreen mode

Actual Notes on Test Run

Setup New Server (Data Structure)
You will see some docker commands since I use Docker container to run PSQL


sudo docker exec -it db.master.1 bash

curl -o backup.dump \
https://bucket.s3.ap-east-1.amazonaws.com/backup/postgresql/pg_dump/manual/2020/10/2020_xx_xx_xxxxxx.dump

PGPASSWORD=pa55w0rd pg_restore --verbose --clean --no-acl --no-owner --if-exists --section=pre-data --no-publications -h localhost -U useruser -d dbdbdbdb --jobs=$(nproc) ./backup.dump


PGPASSWORD=pa55w0rd psql -h localhost -U useruser -d dbdbdbdb

Enter fullscreen mode Exit fullscreen mode

Prevent Replication Error
Errors like
logical replication target relation xxx has neither REPLICA IDENTITY index nor PRIMARY KEY and published relation does not have REPLICA IDENTITY FULL
is caused by not having PK since those are in section post-data to be restored in Setup New Server (Indexes)
Thus we need to restore PK only
Which require custom SQL file created
References:

PGPASSWORD=pa55w0rd \
pg_dump \
-h localhost -U useruser -d dbdbdbdb \
--schema-only --section=post-data > dbdbdbdb_postdata.sql

grep -B 1 'PRIMARY KEY' dbdbdbdb_postdata.sql > dbdbdbdb_pkeys.sql

# Just text, so show content and copy/paste to new DB console
cat dbdbdbdb_pkeys.sql
Enter fullscreen mode Exit fullscreen mode

Add Publication/Subscription

CREATE PUBLICATION publication_test_2020_10_21_1146
FOR ALL TABLES;


CREATE SUBSCRIPTION subcription_test_2020_10_21_1146
CONNECTION 'hostaddr=10.170.0.6 port=5432 user=useruser password=pa55w0rd dbname=dbdbdbdb'
PUBLICATION publication_test_2020_10_21_1146;

Enter fullscreen mode Exit fullscreen mode

Monitor


-- Publisher
SELECT pid, usename, application_name, state
, pg_current_wal_lsn() AS current_lsn
, state
, sync_state
, sent_lsn
, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) AS sent_diff
, write_lsn
, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn)) AS write_diff
, replay_lsn
, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS replay_diff
--, write_lag
--, flush_lag
--, replay_lag
--, backend_start
--, reply_time
FROM pg_stat_replication
ORDER BY application_name, pid;



-- Subcriber
-- # Subscription Workers
SELECT pss.*, c.relname FROM pg_stat_subscription AS pss 
LEFT OUTER JOIN pg_class AS c
ON pss.relid = c.oid;

-- # Subscription Workers on Tables (not ready, srsubstate <> 'r')
SELECT psr.*, c.relname FROM pg_subscription_rel AS psr 
LEFT OUTER JOIN pg_class AS c
ON psr.srrelid = c.oid 
WHERE srsubstate <> 'r';

-- # Subscription Workers on Tables (not ready or synced)
SELECT psr.*, c.relname FROM pg_subscription_rel AS psr 
LEFT OUTER JOIN pg_class AS c
ON psr.srrelid = c.oid 
WHERE srsubstate NOT IN ('r', 's');





-- Monitor Create Index (in case "Broken pipe")
\x on

SELECT pid
--, backend_start
, query_start
, state_change
, wait_event_type
, wait_event
, state
, query
--, backend_type 
FROM pg_stat_activity
WHERE state = 'active';

Enter fullscreen mode Exit fullscreen mode

Monitor network speed / errors

sudo nethogs

sudo docker logs --tail=100 --follow db.master.1

Enter fullscreen mode Exit fullscreen mode

Setup New Server (Indexes)


sudo docker exec -it db.master.1 bash

PGPASSWORD=pa55w0rd pg_restore --verbose --clean --no-acl --no-owner --if-exists --section=post-data --no-publications -h localhost -U useruser -d dbdbdbdb --jobs=$(nproc) ./backup.dump

Enter fullscreen mode Exit fullscreen mode

Verify

-- Index Count
SELECT COUNT(*)
FROM pg_class
WHERE relkind = 'i'
  AND relname NOT LIKE 'pg_%'
  AND relname NOT LIKE 'sql_%';

-- Table Count
SELECT COUNT(*)
FROM pg_class
WHERE relkind = 'r'
  AND relname NOT LIKE 'pg_%'
  AND relname NOT LIKE 'sql_%';

-- Constraint Count
SELECT COUNT(*), contype
FROM pg_constraint
GROUP BY contype;

-- Sequence Count
SELECT COUNT(*), seqtypid
FROM pg_sequence
GROUP BY seqtypid;

Enter fullscreen mode Exit fullscreen mode

Remove Publication/Subscription


-- Safer to drop subscription first
DROP SUBSCRIPTION IF EXISTS subcription_test_2020_10_21_1146;

DROP PUBLICATION IF EXISTS publication_test_2020_10_21_1146;
-- Prevents pg_wal folder getting very large in case old DB is still being used
SELECT pg_drop_replication_slot('publication_test_2020_10_21_1146');
Enter fullscreen mode Exit fullscreen mode

Setup New Server (Sequences)
This will not be restored since we cannot restore data via pg_restore --section=data
This can be done after subscription removed (no more inserts without calling nextval on sequences) and before you start inserting records into any table with a sequence (In Ruby on Rails that's almost every table)

The method I tested:
https://wiki.postgresql.org/wiki/Fixing_Sequences

You might be interested in other answers:

touch reset.sql
tee -a reset.sql <<EOF
SELECT 'SELECT SETVAL(' ||
       quote_literal(quote_ident(PGT.schemaname) || '.' || quote_ident(S.relname)) ||
       ', COALESCE(MAX(' ||quote_ident(C.attname)|| '), 1) ) FROM ' ||
       quote_ident(PGT.schemaname)|| '.'||quote_ident(T.relname)|| ';'
FROM pg_class AS S,
     pg_depend AS D,
     pg_class AS T,
     pg_attribute AS C,
     pg_tables AS PGT
WHERE S.relkind = 'S'
    AND S.oid = D.objid
    AND D.refobjid = T.oid
    AND D.refobjid = C.attrelid
    AND D.refobjsubid = C.attnum
    AND T.relname = PGT.tablename
ORDER BY S.relname;
EOF
# Maybe check the content
# cat reset.sql
PGPASSWORD=pa55w0rd psql -h localhost -U useruser -d dbdbdbdb -Atq -f reset.sql -o temp.sql
# Maybe check the content
# cat temp.sql
PGPASSWORD=pa55w0rd psql -h localhost -U useruser -d dbdbdbdb -f temp.sql
rm temp.sql
Enter fullscreen mode Exit fullscreen mode

The End

If you feel like this looks like a mess
I am sorry but it is mainly for myself
Feel free to write another one that looks more organized ;)

Similar/Related Articles

Top comments (0)