Efficiently Migrating Data Between PostgreSQL Databases: A Practical Guide

#postgres #database

Migrating or syncing data between PostgreSQL databases is a common task for developers. Whether consolidating user data, archiving old records, or migrating production data to a reporting database, having a reliable and simple method can save hours of headaches.

PostgreSQL’s dblink extension makes this process straightforward, allowing you to query one database from another as if the two were connected tables.

Before using dblink, you need to make sure the extension is created in your target database:

CREATE EXTENSION IF NOT EXISTS dblink;

This only needs to be done once per database. After that, you can safely use dblink to query other PostgreSQL databases.

Example Scenario

Imagine you have:

Source database: user_data_archive
Target database: user_data_main
Target table: users

The users table has unique constraints on:

username
email

The goal is to copy users from the archive to the main database without violating these constraints.

Using dblink to Move Data

Here’s a simple example:

INSERT INTO users (username, email, created_at, updated_at)
SELECT DISTINCT
    username,
    lower(email) AS email,
    NOW() AS created_at,
    NOW() AS updated_at
FROM dblink(
    'dbname=user_data_archive user=app_user password=securepass',
    'SELECT username, email FROM archived_users'
) AS t(username text, email text)
ON CONFLICT DO NOTHING;

How This Works:

dblink allows the target database (user_data_main) to query the source (user_data_archive) directly.

DISTINCT removes duplicate rows from the incoming data.

ON CONFLICT DO NOTHING ensures that unique constraints on the target table (like unique usernames or emails) do not cause the query to fail.

This keeps the insert process safe while letting you move large amounts of data efficiently.

Why Use This Approach?

Simple setup: No need to dump and restore SQL files manually.
Flexible: You can filter, transform, or normalize data as part of the select.
Reliable: Works even when the source and target databases live on the same server or different servers.

Practical Use Cases:

Migrating users, products, or content from a legacy database to a new system.
Populating a reporting or analytics database from production tables.
Syncing staging and production data safely during testing.

dblink is a powerful tool for querying one PostgreSQL database from another.

Combining DISTINCT and ON CONFLICT DO NOTHING allows you to safely insert data without worrying about violating unique constraints.

This approach keeps your workflow clean, efficient, and maintainable, especially for large tables or ongoing migrations.