BladePipe

Posted on Nov 30

How to Move Data from PostgreSQL to PostgreSQL in Minutes

#postgres #database #tutorial

PostgreSQL is a widely-used database system with over 35 years of development. It not only has the capabilities of a standard relational database, but also excels in executing complex SQL queries.

Users often utilize PostgreSQL for both online transaction processing (OLTP) and some data analysis tasks. This makes it important to move data from PostgreSQL to PostgreSQL.

This tutorial introduces how to sync data from PostgreSQL to PostgreSQL using BladePipe in minutes.

About BladePipe

BladePipe is a real-time end-to-end data replication tool, simplifying your data movement between diverse data sources, including databases, message queues, real-time data warehouses, etc.

By using the technique of Change Data Capture (CDC), BladePipe can track, capture and deliver data changes automatically and accurately with ultra-low latency, greatly improving the efficiency of data integration. It provides sound solutions for use cases requiring real-time data replication, fueling data-driven decision-making and business agility.

Highlights

PostgreSQL Logical Replication

BladePipe tracks the incremental data through the logical replication in PostgreSQL.

Publications are associated with all the tables involved in a DataJob. If the subscriptions of the DataJob are modified, the tables in the Publication are automatically changed accordingly.

DDL Sync with Triggers

DDL synchronization is essential for online database disaster recovery and other use cases, but PostgreSQL logical replication does not involve the replication of DDLs.

To address this issue, we adopt a widely-used solution, that is to use triggers to capture DDL operations and automatically write them into a regular table cc_pg_ddl_capture_tab. BladePipe can subscribe to this table to obtain DDL operations.

This mechanism is consistent with that of regular incremental DML sync, thus ensuring the correct order of DDLs and related DML events of the table.

Bidirectional Sync Loop Prevention

When using online databases, active geo-redundancy is usually one of the mandatory requirements. For data movement tools, it is crucial to prevent circular data replication during data synchronization.

To avoid loops in the bidirectional data synchronization between PostgreSQL databases, we mark the DML operations of the same transaction.

An additional decycle table is created. When BladePipe writes data to the target database, the DML events of the same transaction are recorded in the decycle table.

If BladePipe retrieves the event from the decycle table in the PostgreSQL instance, it ignores all operations of the current transaction, thus preventing circular data replication.

Procedure

Step 1: Modify PostgreSQL wal_level

Please refer to Permissions Required for PostgreSQL to create a user and grant the necessary permissions.
Set PostgreSQL's wal_level to logical.

For self-managed databases, you can modify the postgresql.conf file to set wal_level=logical and wal_log_hints=on.
Configure network permissions for the account.

For self-managed databases, you can modify the pg_hba.conf file and add the following configurations:
- host replication md5
- host md5
- host postgres md5
Restart PostgreSQL to apply the changes.

Step 2: Install BladePipe

Follow the instructions in Install Worker (Docker) or Install Worker (Binary) to download and install a BladePipe Worker.

Step 3: Add DataSources

Log in to the BladePipe Cloud.
Click DataSource > Add DataSource.
Select the source and target DataSource type, and fill out the setup form respectively.

Step 4: Create a DataJob

Click DataJob > Create DataJob.
Select the source and target DataSources, and click Test Connection to ensure the connection to the source and target DataSources are both successful.
Select Incremental for DataJob Type, together with the Full Data option.

If you select Sync DDL, BladePipe will automatically create the corresponding DDL capture triggers and events, which requires privileged permissions.
Select the tables to be replicated.
Select the columns to be replicated.

If you need to select specific columns for synchronization, you can create the corresponding tables in the Target in advance.
Confirm the DataJob creation.

Now the DataJob is created and started. BladePipe will automatically run the following DataTasks:

Schema Migration: The schemas of the source tables will be migrated to the target database.
Full Data Migration: All existing data from the source tables will be fully migrated to the target database.
Incremental Synchronization: Ongoing data changes will be continuously synchronized to the target database with ultra-low latency.

Conclusion

In this tutorial, a data pipeline from PostgreSQL to PostgreSQL is created in minutes with 3 steps using BladePipe. It is just a glimpse of BladePipe’s powerful capabilities of real-time end-to-end data replication. To discover more, welcome to visit https://www.bladepipe.com/

DEV Community

How to Move Data from PostgreSQL to PostgreSQL in Minutes

About BladePipe

Highlights

PostgreSQL Logical Replication

DDL Sync with Triggers

Bidirectional Sync Loop Prevention

Procedure

Step 1: Modify PostgreSQL wal_level

Step 2: Install BladePipe

Step 3: Add DataSources

Step 4: Create a DataJob

Conclusion

Top comments (0)

Read next

SplitView in iOS 18 - #30DaysOfSwift

Avoiding Connection Pinning in Lambda and RDS Proxy with NestJS and Proxy Splitting

How to use migrations with Golang

Exploring Aurora DSQL with TypeScript, Drizzle, Lambda, and AWS CDK