Like many developers, I frequently need to work with production-like data locally. Maybe you're debugging a tricky issue that only happens with specific data patterns, or you're testing a complex migration that could affect millions of rows. Whatever the reason, having a local copy of your production database can be incredibly useful.
I've spent considerable time perfecting a reliable process for this at work, and I want to share my approach. This guide will walk you through creating an exact copy of a remote PostgreSQL database on your local machine.
Why Clone Databases Locally?
Before we dive in, let's talk about when you might want to do this. I typically clone databases when:
- I'm writing a complex migration and want to verify it won't blow up with real data
- I need to debug a production issue that I can't reproduce with test data
- I'm optimizing queries and need realistic data volumes for meaningful performance testing
- I want to experiment with schema changes without affecting anyone else
The Script
Here's the script I use. It's battle-tested and handles most edge cases I've encountered:
#!/bin/bash

# Remote database configuration
REMOTE_HOST="your-remote-host.database.azure.com"
REMOTE_PORT="5432"
REMOTE_USER="your_remote_user"
REMOTE_PASSWORD="your_remote_password"

# Local database configuration
LOCAL_HOST="localhost"
LOCAL_PORT="5432"
LOCAL_USER="your_local_user"
LOCAL_PASSWORD="your_local_password"

# Function to copy a database
copy_database() {
  local DB=$1
  echo "Processing database: $DB"

  # Create a timestamp for the backup file
  TIMESTAMP=$(date +%Y%m%d_%H%M%S)
  BACKUP_FILE="${DB}_backup_${TIMESTAMP}.dump"

  echo "Starting database copy process for $DB..."

  # Step 1: Create a dump of the remote database
  echo "Creating dump from remote database..."
  PGPASSWORD=$REMOTE_PASSWORD pg_dump \
    -h "$REMOTE_HOST" \
    -p "$REMOTE_PORT" \
    -U "$REMOTE_USER" \
    -d "$DB" \
    -F c \
    -f "$BACKUP_FILE"

  if [ $? -ne 0 ]; then
    echo "Error: Failed to create database dump for $DB"
    return 1
  fi

  # Step 2: Drop the local database if it exists
  echo "Dropping local database if exists..."
  PGPASSWORD=$LOCAL_PASSWORD dropdb \
    -h "$LOCAL_HOST" \
    -p "$LOCAL_PORT" \
    -U "$LOCAL_USER" \
    --if-exists \
    "$DB"

  # Step 3: Create new local database
  echo "Creating new local database..."
  PGPASSWORD=$LOCAL_PASSWORD createdb \
    -h "$LOCAL_HOST" \
    -p "$LOCAL_PORT" \
    -U "$LOCAL_USER" \
    "$DB"

  if [ $? -ne 0 ]; then
    echo "Error: Failed to create local database for $DB"
    return 1
  fi

  # Step 4: Restore the dump to the local database
  echo "Restoring database locally..."
  PGPASSWORD=$LOCAL_PASSWORD pg_restore \
    -h "$LOCAL_HOST" \
    -p "$LOCAL_PORT" \
    -U "$LOCAL_USER" \
    -d "$DB" \
    --no-owner \
    --no-privileges \
    "$BACKUP_FILE"

  if [ $? -ne 0 ]; then
    echo "Warning: Some errors occurred during restore of $DB (this might be normal)"
  fi

  # Step 5: Cleanup
  echo "Cleaning up temporary files..."
  rm -f "$BACKUP_FILE"

  echo "Database copy process completed for $DB!"
  echo "----------------------------------------"
}

# Copy each database
copy_database "database1"
copy_database "database2"

echo "All databases have been copied!"
How It Works
The script is pretty straightforward. Here's what's happening under the hood:
First, it takes a snapshot of your remote database using pg_dump. I use the custom format (-F c) because it's both faster and more flexible than plain SQL dumps.
Then it wipes your local database (if it exists) and creates a fresh one. This helps avoid any weird state issues that might come from partial updates.
Finally, it restores the dump to your local database using pg_restore. I've added the --no-owner and --no-privileges flags because you probably don't have the same users and permissions locally as you do in production.
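A nice side effect of the custom format is that you can peek inside a dump before restoring it: pg_restore --list prints the archive's table of contents. The file name below is just an example of what the script produces:
# List what's inside a custom-format archive without restoring anything
pg_restore --list database1_backup_20240101_120000.dump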
Things That Might Trip You Up
Credentials
Don't store credentials in the script file. I usually set them as environment variables:
export PG_REMOTE_PASSWORD="your_password"
export PG_LOCAL_PASSWORD="your_local_password"
Then modify the script to use them:
PGPASSWORD=$PG_REMOTE_PASSWORD pg_dump ...
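If you go this route, it's worth failing fast when the variables are missing. A minimal sketch, using the same variable names as the exports above:
# Abort early with a clear message if either password variable is unset or empty
: "${PG_REMOTE_PASSWORD:?PG_REMOTE_PASSWORD is not set}"
: "${PG_LOCAL_PASSWORD:?PG_LOCAL_PASSWORD is not set}"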
Large Databases
If you're working with a massive database, you might want to use compression:
pg_dump -Z 9 ... # Maximum compression
Or parallel restore for faster imports:
pg_restore -j 4 ... # Use 4 parallel jobs
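Both options play nicely with the custom format the script already uses (parallel restore requires a custom- or directory-format archive). Combined with the script's variables, it might look like this; the job count of 4 is an assumption you'd tune to your local CPU:
# Dump with maximum compression to shrink the transfer
PGPASSWORD=$REMOTE_PASSWORD pg_dump \
  -h "$REMOTE_HOST" -p "$REMOTE_PORT" -U "$REMOTE_USER" \
  -d "$DB" -F c -Z 9 -f "$BACKUP_FILE"

# Restore with 4 parallel jobs to speed up the import
PGPASSWORD=$LOCAL_PASSWORD pg_restore \
  -h "$LOCAL_HOST" -p "$LOCAL_PORT" -U "$LOCAL_USER" \
  -d "$DB" -j 4 --no-owner --no-privileges "$BACKUP_FILE"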
Network Issues
For unstable connections or very large dumps, I recommend running the script inside screen or tmux. There's nothing worse than losing a 2-hour transfer because your VPN hiccuped.
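If you haven't used tmux before, the basic workflow is short (the script name is the same hypothetical one as above):
# Start a named, detachable session and run the copy inside it
tmux new-session -s dbcopy
./copy_databases.sh

# Detach with Ctrl-b then d; the dump keeps running. Reattach later with:
tmux attach -t dbcopy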
Selective Copying
Sometimes you don't need the entire database. Here's how to grab specific parts:
# Just one schema
pg_dump -n specific_schema ...
# Specific tables
pg_dump -t schema.table_name ...
# Just the structure, no data
pg_dump --schema-only ...
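And if the database is mostly fine except for a couple of enormous tables you don't care about (audit logs, say), pg_dump's --exclude-table-data flag keeps the table definitions but skips their rows. The table name here is just a placeholder:
# Keep the table's structure but leave out its rows
pg_dump --exclude-table-data='public.audit_log' ...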
Example
Here's a specific example from my work. We had a complex migration that needed to modify historical transaction data. The staging environment didn't have enough real-world edge cases, so I needed a local copy of production to test thoroughly.
# Copy just the transactions schema
pg_dump -h prod-db.example.com -U prod_user -d prod_db -n transactions -F c -f transactions.dump
# Create a new local database
createdb migration_test
# Restore just that schema
pg_restore -d migration_test transactions.dump
This let me test the migration multiple times without affecting anyone else's work.
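Re-running from a clean slate is just a matter of recreating the test database and restoring the same dump again, without hitting production a second time. Something like:
# Reset the local test database between migration runs
dropdb --if-exists migration_test
createdb migration_test
pg_restore -d migration_test transactions.dump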
Wrapping Up
Having a reliable way to clone databases has saved me countless hours of debugging and prevented numerous production issues. The script above has served me well, but feel free to modify it for your needs.