Like many developers, I frequently need to work with production-like data locally. Maybe you're debugging a tricky issue that only happens with specific data patterns, or you're testing a complex migration that could affect millions of rows. Whatever the reason, having a local copy of your production database can be incredibly useful.
I've spent considerable time perfecting a reliable process for this at work, and I want to share my approach. This guide will walk you through creating an exact copy of a remote PostgreSQL database on your local machine.
Why Clone Databases Locally?
Before we dive in, let's talk about when you might want to do this. I typically clone databases when:
- I'm writing a complex migration and want to verify it won't blow up with real data
- I need to debug a production issue that I can't reproduce with test data
- I'm optimizing queries and need realistic data volumes for meaningful performance testing
- I want to experiment with schema changes without affecting anyone else
The Script
Here's the script I use. It's battle-tested and handles most edge cases I've encountered:
#!/bin/bash

# Remote database configuration
REMOTE_HOST="your-remote-host.database.azure.com"
REMOTE_PORT="5432"
REMOTE_USER="your_remote_user"
REMOTE_PASSWORD="your_remote_password"

# Local database configuration
LOCAL_HOST="localhost"
LOCAL_PORT="5432"
LOCAL_USER="your_local_user"
LOCAL_PASSWORD="your_local_password"

# Function to copy a database
copy_database() {
  local DB=$1
  echo "Processing database: $DB"

  # Create a timestamp for the backup file
  TIMESTAMP=$(date +%Y%m%d_%H%M%S)
  BACKUP_FILE="${DB}_backup_${TIMESTAMP}.dump"

  echo "Starting database copy process for $DB..."

  # Step 1: Create a dump of the remote database
  echo "Creating dump from remote database..."
  PGPASSWORD=$REMOTE_PASSWORD pg_dump \
    -h "$REMOTE_HOST" \
    -p "$REMOTE_PORT" \
    -U "$REMOTE_USER" \
    -d "$DB" \
    -F c \
    -f "$BACKUP_FILE"

  if [ $? -ne 0 ]; then
    echo "Error: Failed to create database dump for $DB"
    return 1
  fi

  # Step 2: Drop the local database if it exists
  echo "Dropping local database if exists..."
  PGPASSWORD=$LOCAL_PASSWORD dropdb \
    -h "$LOCAL_HOST" \
    -p "$LOCAL_PORT" \
    -U "$LOCAL_USER" \
    --if-exists \
    "$DB"

  # Step 3: Create new local database
  echo "Creating new local database..."
  PGPASSWORD=$LOCAL_PASSWORD createdb \
    -h "$LOCAL_HOST" \
    -p "$LOCAL_PORT" \
    -U "$LOCAL_USER" \
    "$DB"

  if [ $? -ne 0 ]; then
    echo "Error: Failed to create local database for $DB"
    return 1
  fi

  # Step 4: Restore the dump to the local database
  echo "Restoring database locally..."
  PGPASSWORD=$LOCAL_PASSWORD pg_restore \
    -h "$LOCAL_HOST" \
    -p "$LOCAL_PORT" \
    -U "$LOCAL_USER" \
    -d "$DB" \
    --no-owner \
    --no-privileges \
    "$BACKUP_FILE"

  if [ $? -ne 0 ]; then
    echo "Warning: Some errors occurred during restore of $DB (this might be normal)"
  fi

  # Step 5: Cleanup
  echo "Cleaning up temporary files..."
  rm -f "$BACKUP_FILE"

  echo "Database copy process completed for $DB!"
  echo "----------------------------------------"
}

# Copy each database
copy_database "database1"
copy_database "database2"

echo "All databases have been copied!"
How It Works
The script is pretty straightforward. Here's what's happening under the hood:
First, it takes a snapshot of your remote database using pg_dump. I use the custom format (-F c) because it's both faster and more flexible than plain SQL dumps.
Then it wipes your local database (if it exists) and creates a fresh one. This helps avoid any weird state issues that might come from partial updates.
Finally, it restores the dump to your local database using pg_restore. I've added the --no-owner and --no-privileges flags because you probably don't have the same users and permissions locally as you do in production.
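A nice side effect of the custom format is that you can peek inside a dump before restoring it: pg_restore --list prints the archive's table of contents. The file name below is just an example of what the script produces:
# List what's inside a custom-format archive without restoring anything
pg_restore --list database1_backup_20240101_120000.dump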
Things That Might Trip You Up
Credentials
Don't store credentials in the script file. I usually set them as environment variables:
export PG_REMOTE_PASSWORD="your_password"
export PG_LOCAL_PASSWORD="your_local_password"
Then modify the script to use them:
PGPASSWORD=$PG_REMOTE_PASSWORD pg_dump ...
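If you go this route, it's worth failing fast when the variables are missing. A minimal sketch, using the same variable names as the exports above:
# Abort early with a clear message if either password variable is unset or empty
: "${PG_REMOTE_PASSWORD:?PG_REMOTE_PASSWORD is not set}"
: "${PG_LOCAL_PASSWORD:?PG_LOCAL_PASSWORD is not set}"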
Large Databases
If you're working with a massive database, you might want to use compression:
pg_dump -Z 9 ... # Maximum compression
Or parallel restore for faster imports:
pg_restore -j 4 ... # Use 4 parallel jobs
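Both options play nicely with the custom format the script already uses (parallel restore requires a custom- or directory-format archive). Combined with the script's variables, it might look like this; the job count of 4 is an assumption you'd tune to your local CPU:
# Dump with maximum compression to shrink the transfer
PGPASSWORD=$REMOTE_PASSWORD pg_dump \
  -h "$REMOTE_HOST" -p "$REMOTE_PORT" -U "$REMOTE_USER" \
  -d "$DB" -F c -Z 9 -f "$BACKUP_FILE"

# Restore with 4 parallel jobs to speed up the import
PGPASSWORD=$LOCAL_PASSWORD pg_restore \
  -h "$LOCAL_HOST" -p "$LOCAL_PORT" -U "$LOCAL_USER" \
  -d "$DB" -j 4 --no-owner --no-privileges "$BACKUP_FILE"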
Network Issues
For unstable connections or very large dumps, I recommend running the script inside screen or tmux. There's nothing worse than losing a 2-hour transfer because your VPN hiccuped.
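If you haven't used tmux before, the basic workflow is short (the script name is the same hypothetical one as above):
# Start a named, detachable session and run the copy inside it
tmux new-session -s dbcopy
./copy_databases.sh

# Detach with Ctrl-b then d; the dump keeps running. Reattach later with:
tmux attach -t dbcopy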
Selective Copying
Sometimes you don't need the entire database. Here's how to grab specific parts:
# Just one schema
pg_dump -n specific_schema ...
# Specific tables
pg_dump -t schema.table_name ...
# Just the structure, no data
pg_dump --schema-only ...
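And if the database is mostly fine except for a couple of enormous tables you don't care about (audit logs, say), pg_dump's --exclude-table-data flag keeps the table definitions but skips their rows. The table name here is just a placeholder:
# Keep the table's structure but leave out its rows
pg_dump --exclude-table-data='public.audit_log' ...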
Example
Here's a specific example from my work. We had a complex migration that needed to modify historical transaction data. The staging environment didn't have enough real-world edge cases, so I needed a local copy of production to test thoroughly.
# Copy just the transactions schema
pg_dump -h prod-db.example.com -U prod_user -d prod_db -n transactions -F c -f transactions.dump
# Create a new local database
createdb migration_test
# Restore just that schema
pg_restore -d migration_test transactions.dump
This let me test the migration multiple times without affecting anyone else's work.
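Re-running from a clean slate is just a matter of recreating the test database and restoring the same dump again, without hitting production a second time. Something like:
# Reset the local test database between migration runs
dropdb --if-exists migration_test
createdb migration_test
pg_restore -d migration_test transactions.dump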
Wrapping Up
Having a reliable way to clone databases has saved me countless hours of debugging and prevented numerous production issues. The script above has served me well, but feel free to modify it for your needs.