
Dump and Sync PostgreSQL Schema with Python (No Data, No BS)

Hi there! I'm Maneshwar. Right now, I’m building LiveAPI, a first-of-its-kind tool that helps you automatically index API endpoints across all your repositories. LiveAPI makes it easier to discover, understand, and interact with APIs in large infrastructures.


When you're working with multiple PostgreSQL environments — dev, staging, prod — you often need to sync schema without touching the data.

This post shows how to use a pure Python script to dump an entire schema, including tables, views, triggers, sequences, constraints, and stored procedures.

No heavy dependencies (just psycopg2). No ORM. No pg_dump black box.

What This Script Does

  • Connects to any PostgreSQL database.
  • Extracts complete schema info (see the sketch after this list):

    • Tables, Columns, Indexes
    • Views, Sequences, Triggers
    • Functions & Procedures
    • Constraints (PK, FK, etc.)
  • Outputs schema as:

    • Pretty terminal dump
    • JSON file (schema_dump.json)
    • Raw SQL DDL file (schema_dump.sql)
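
To give a flavour of the extraction step, here is a minimal sketch of pulling column metadata from information_schema with psycopg2. It is not the full script: dump_columns is a hypothetical helper, and cur is assumed to be a cursor from the connection shown in the Setup section below.

# Hypothetical helper: collect column metadata for every table in a schema.
def dump_columns(cur, schema="public"):
    cur.execute(
        """
        SELECT table_name, column_name, data_type, is_nullable, column_default
        FROM information_schema.columns
        WHERE table_schema = %s
        ORDER BY table_name, ordinal_position
        """,
        (schema,),
    )
    tables = {}
    for table, column, dtype, nullable, default in cur.fetchall():
        tables.setdefault(table, []).append({
            "name": column,
            "type": dtype,
            "nullable": nullable == "YES",
            "default": default,
        })
    return tables

The same pattern works for views (information_schema.views), constraints (information_schema.table_constraints), and functions (pg_proc); only the query changes.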

Setup

Install the dependency:

pip install psycopg2

Set your connection config at the top:

engine = "postgresql"
name = "your_db"
user = "your_user"
password = "your_pass"
host = "your_host"
port = "5432"
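
These values feed a plain psycopg2 connection inside the script. A minimal sketch, assuming the variable names above:

import psycopg2

# Open a read-only connection using the config values defined above.
conn = psycopg2.connect(
    dbname=name,
    user=user,
    password=password,
    host=host,
    port=port,
)
conn.set_session(readonly=True)  # schema dumping never needs to write
cur = conn.cursor()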

How to Use

Just run the script:

python dump_schema.py

This will:

  • Print a human-readable breakdown of your entire schema
  • Write two files in the current directory (see the sketch below):

    • schema_dump.json
    • schema_dump.sql
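
Writing those two files is the easy part once the schema has been collected. A rough sketch, assuming a nested schema dict (like the one in the next section) and a ddl_statements list of CREATE/ALTER strings built earlier in the script:

import json

with open("schema_dump.json", "w") as f:
    json.dump(schema, f, indent=2, default=str)  # default=str handles timestamps

with open("schema_dump.sql", "w") as f:
    f.write("\n\n".join(ddl_statements) + "\n")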

JSON Output

You’ll get structured JSON that looks like this:

{
  "database_info": {
    "name": "liveapi",
    "host": "34.57.96.60",
    "port": "5432",
    "engine": "postgresql",
    "dump_date": "2025-07-18 18:00:00"
  },
  "schemas": {
    "public": {
      "tables": {
        "users": {
          "columns": [...],
          "primary_keys": [...],
          "foreign_keys": [...],
          "indexes": [...],
          "triggers": [...]
        }
      },
      "views": [...],
      "functions": [...],
      "triggers": [...],
      "sequences": [...],
      "constraints": [...]
    }
  }
}

Use it for:

  • Schema diffs between environments (see the sketch below)
  • Version control on schema metadata
  • Visual schema documentation
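
A crude diff of table names between two environments, for example, needs nothing more than the JSON files. A sketch, assuming both dumps follow the layout above (table_diff and the file names are placeholders):

import json

def table_diff(path_a, path_b, schema="public"):
    """Report tables that exist in one dump but not the other."""
    with open(path_a) as fa, open(path_b) as fb:
        a = json.load(fa)["schemas"][schema]["tables"]
        b = json.load(fb)["schemas"][schema]["tables"]
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
    }

print(table_diff("dev_schema_dump.json", "prod_schema_dump.json"))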

SQL Output

The schema_dump.sql file contains raw SQL:

CREATE TABLE public.users (
  id uuid NOT NULL DEFAULT gen_random_uuid(),
  email text NOT NULL,
  PRIMARY KEY (id)
);

ALTER TABLE public.users ADD CONSTRAINT fk_users_org FOREIGN KEY (org_id) REFERENCES public.org(id);

CREATE UNIQUE INDEX idx_users_email ON public.users(email);
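
Statements like these can be assembled from the same column metadata collected earlier. A simplified sketch that ignores quoting and constraint edge cases, reusing the column dicts from dump_columns above:

# Hypothetical helper: build a CREATE TABLE statement from column metadata.
def build_create_table(schema, table, columns):
    parts = []
    for col in columns:
        line = f'  {col["name"]} {col["type"]}'
        if not col["nullable"]:
            line += " NOT NULL"
        if col["default"] is not None:
            line += f' DEFAULT {col["default"]}'
        parts.append(line)
    return f"CREATE TABLE {schema}.{table} (\n" + ",\n".join(parts) + "\n);"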

You can pipe this into psql on another server:

psql -h target_host -U target_user -d target_db -f schema_dump.sql

This lets you sync schema from one DB to another in a controlled way — no data, just structure.

Why Use This Over pg_dump

Feature                          pg_dump --schema-only   This Script
JSON Output                      ❌                      ✅
Selective Extraction             ❌ (manual)             ✅
Programmatic Access              ❌                      ✅
Easy to Extend                   ❌                      ✅
Custom Logic                     ❌                      ✅
Triggers, Views, FKs, Indexes    ✅                      ✅

If you're building tooling, dashboards, or want to integrate schema dumping into your CI/CD, this Python approach is flexible and extendable.

Final Thoughts

You can use this Python script to:

  • Mirror your schema across environments
  • Export schema snapshots for backup or review
  • Debug schema issues with a visual/textual snapshot
  • Generate schema migration plans manually

Bonus: Want to Sync Schema Automatically?

Take the schema_dump.sql generated here and write an Ansible task or a Flyway/Liquibase wrapper to apply it to staging/prod in your pipeline.
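
If you would rather keep the whole pipeline in Python, a thin wrapper around psql works as well. A sketch, assuming psql is on the PATH; the host, user, and database names below are placeholders for your own:

import subprocess

def apply_schema(host, user, dbname, sql_file="schema_dump.sql"):
    """Apply the dumped DDL to a target database via psql."""
    subprocess.run(
        ["psql", "-h", host, "-U", user, "-d", dbname,
         "-v", "ON_ERROR_STOP=1", "-f", sql_file],
        check=True,  # fail the pipeline run if any statement errors out
    )

apply_schema("staging-db.example.internal", "deploy_user", "your_db")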


LiveAPI helps you get all your backend APIs documented in a few minutes.

With LiveAPI, you can generate interactive API docs that allow users to search and execute endpoints directly from the browser.

LiveAPI Demo

If you're tired of updating Swagger manually or syncing Postman collections, give it a shot.

Top comments (4)

ramgit1989

Insightful for syncing schemas.

Athreya aka Maneshwar

Thanks

zulqasar

Where is the script?

Athreya aka Maneshwar

I don't remember why, but I switched to a bash script. Here's the backup script:

#!/bin/bash

set -e

# === ENVIRONMENT VARIABLES (CONFIGURED VIA ANSIBLE) ===
export AWS_ACCESS_KEY_ID="{{ s3_access_key_id }}"
export AWS_SECRET_ACCESS_KEY="{{ s3_secret_access_key }}"
AWS_ENDPOINT="{{ s3_endpoint_url }}"
S3_BUCKET="{{ s3_bucket }}"
S3_PATH="{{ s3_path_pg }}"
S3_REGION="{{ s3_region }}"
DISCORD_WEBHOOK="{{ discord_webhook }}"

# === POSTGRESQL CONFIGURATION ===
PG_USER="{{ pg_user }}"
PG_HOST="{{ pg_host }}"
PG_PORT="{{ pg_port }}"
PG_PASSWORD="{{ pg_password }}"

export PGPASSWORD="$PG_PASSWORD"

# === BACKUP SETUP ===
BACKUP_DIR="{{ pg_backup_dir }}"

# === NEW: SAFETY CHECKS ===
# 1. Ensure the backup directory variable is not empty.
if [ -z "$BACKUP_DIR" ]; then
    echo "❌ CRITICAL ERROR: The BACKUP_DIR variable is empty. Aborting to prevent catastrophic deletion."
    exit 1
fi

# 2. Ensure the backup directory is not a dangerous root-level path.
if [ "$BACKUP_DIR" = "/" ] || [ $(expr length "$BACKUP_DIR") -lt 5 ]; then
    echo "❌ CRITICAL ERROR: BACKUP_DIR is set to '$BACKUP_DIR', which is a dangerous location. Aborting for safety."
    exit 1
fi

# 3. Ensure the backup directory exists before we try to use it.
if [ ! -d "$BACKUP_DIR" ]; then
    echo "🤔 Backup directory '$BACKUP_DIR' not found. Creating it."
    mkdir -p "$BACKUP_DIR"
fi
# === END SAFETY CHECKS ===

cd "$BACKUP_DIR"

# === Cleanup old files ===
# First, remove the old dump directory
rm -rf "$BACKUP_DIR/dump"

# Use find to safely delete only .tar.gz files in this directory
echo "🧹 Cleaning up old *.tar.gz backup files from $BACKUP_DIR..."
find "$BACKUP_DIR" -maxdepth 1 -type f -name "*.tar.gz" -delete

# Recreate the dump directory for the new backup
mkdir -p "$BACKUP_DIR/dump"

# === Set filename to the current day of the week ===
# e.g. running on a Thursday produces "thursday.tar.gz"
DAY_OF_WEEK=$(date +%A | tr '[:upper:]' '[:lower:]')
ARCHIVE="${DAY_OF_WEEK}.tar.gz"

# === Dump all databases individually ===
echo "📦 Dumping PostgreSQL databases..."
DBS=$(psql -h "$PG_HOST" -U "$PG_USER" -p "$PG_PORT" -d postgres -Atc \
"SELECT datname FROM pg_database WHERE datistemplate = false AND datname NOT IN ('postgres');" \
| tr -s '[:space:]' '\n' | grep -v -E '^(|[*])$')
echo "DBS list: $DBS"

if [ -z "$DBS" ]; then
  curl -sf -X POST "$DISCORD_WEBHOOK" -H 'Content-Type: application/json' \
      --data-raw '{"content": "❌ PostgreSQL backup failed — no databases found."}'
  exit 1
fi

for db in $DBS; do
    [ "$db" = "*" ] && continue
    echo "→ Dumping $db..."
    if ! pg_dump -h "$PG_HOST" -U "$PG_USER" -p "$PG_PORT" -Fc \
      -f "$BACKUP_DIR/dump/${db}.dump" "$db"; then
        curl -sf -X POST "$DISCORD_WEBHOOK" -H 'Content-Type: application/json' \
            --data-raw '{"content": "❌ Failed dumping database: '"$db"'"}'
        exit 1
    fi
done

# === Dump roles ===
echo "👥 Dumping PostgreSQL roles..."
if ! pg_dumpall -h "$PG_HOST" -U "$PG_USER" --roles-only > "$BACKUP_DIR/dump/roles.sql"; then
    curl -sf -X POST "$DISCORD_WEBHOOK" -H 'Content-Type: application/json' \
        --data-raw '{"content": "❌ Failed dumping PostgreSQL roles."}'
    exit 1
fi

# === Compress everything ===
echo "🗜️ Compressing backup..."
tar -zcvf "$BACKUP_DIR/$ARCHIVE" -C "$BACKUP_DIR/dump" .

# === Upload to S3-Compatible Storage ===
echo "☁️ Uploading to Object Storage..."
if ! aws s3 cp "$BACKUP_DIR/$ARCHIVE" "s3://${S3_BUCKET}/${S3_PATH}/${ARCHIVE}" \
  --endpoint-url "$AWS_ENDPOINT" \
  --region "$S3_REGION"; then
    curl -sf -X POST "$DISCORD_WEBHOOK" -H 'Content-Type: application/json' \
        --data-raw '{"content": "❌ Error uploading PostgreSQL backup to Object Storage."}'
    exit 1
fi

# === Check size ===
filesize=$(stat -c%s "$BACKUP_DIR/$ARCHIVE")
maxsize=20971520  # 20MB
filesize_mb=$(bc <<< "scale=2; $filesize / 1024 / 1024")

if [ "$filesize" -lt "$maxsize" ]; then
    curl -sf -X POST "$DISCORD_WEBHOOK" -H 'Content-Type: application/json' \
        --data-raw '{"content": "❌ PostgreSQL backup too small ('"$filesize_mb"' MB), probably failed."}'
    exit 1
fi

# === Success ===
# --- Generates the virtual-hosted-style S3 URL ---
# NOTE: This link will NOT work in a browser if the bucket is private.

# Get the endpoint hostname by removing "https://"
S3_HOSTNAME=$(echo "$AWS_ENDPOINT" | sed 's|https://||')

# Construct the S3 virtual-hosted URL
S3_URL="https://${S3_BUCKET}.${S3_HOSTNAME}/${S3_PATH}/${ARCHIVE}"

curl -sf -X POST "$DISCORD_WEBHOOK" -H 'Content-Type: application/json' \
    --data-raw '{"content": "✅ Backed up '"$filesize_mb"' MB PostgreSQL. S3 URL: '"$S3_URL"'"}'

And the matching restore script:

#!/bin/bash
set -euo pipefail
shopt -s nullglob
# --- Helper Functions for Logging ---
log() { echo -e "\033[1;32m[✓]\033[0m $1"; }
step() { echo -e "\n\033[1;34m[*] $1\033[0m"; }
warn() { echo -e "\033[1;33m[!] $1\033[0m"; }
error() { echo -e "\033[1;31m[✗] $1\033[0m"; }

# --- Configuration ---
# User who will own the restored databases
# This user MUST be a superuser or have CREATEDB privileges.
PG_OWNER="" 
# Password for the postgres user, needed for admin tasks like creating databases.
PG_ADMIN_PASS="" 
ROLE_FILE="roles.sql"
PG_CONF_FILE="/etc/postgresql/16/main/postgresql.conf"

# --- Script Body ---

# Check if the roles file exists before trying to process it
if [ ! -f "$ROLE_FILE" ]; then
    echo "Error: Roles file '$ROLE_FILE' not found. Exiting."
    exit 1
fi

echo "--> Cleaning the roles file ($ROLE_FILE) automatically..."
# Use sed to perform in-place edits (-i) on the roles file.
# -e allows for multiple expressions (multiple edits in one command).
sed -i \
    -e '/^\\restrict/d' \
    -e '/^\\unrestrict/d' \
    -e '/^CREATE ROLE postgres;/d' \
    -e '/^ALTER ROLE postgres /d' \
    -e '/^CREATE ROLE flyweightpostgres;/d' \
    -e '/^ALTER ROLE flyweightpostgres /d' \
    "$ROLE_FILE"
echo "Cleaning complete."
echo

echo "--> Restoring roles from the cleaned file..."
export PGPASSWORD="$PG_ADMIN_PASS"
# Now, use the automatically cleaned roles file
sudo -u postgres psql -f "$ROLE_FILE" || echo "Warnings occurred during role restore. Continuing..."
unset PGPASSWORD
echo "Roles restored."
echo

# Restore all .dump files in the current folder
for dumpfile in *.dump; do
    # This check is good practice, though nullglob handles it.
    [ -e "$dumpfile" ] || continue 

    # Extract database name from filename (e.g., "my_app.dump" -> "my_app")
    dbname="${dumpfile%.*}"

    echo "----------------------------------------"
    echo "Processing restore for database: $dbname"
    echo "----------------------------------------"

    step "Configuring pg_cron for '$dbname' and restarting PostgreSQL..."
    DB_NAME_SETTING="cron.database_name = '$dbname'"

    if sudo grep -q "^#*\\s*cron.database_name" "$PG_CONF_FILE"; then
        sudo sed -i "s/^#*\\s*cron.database_name\\s*=.*/$DB_NAME_SETTING/" "$PG_CONF_FILE"
        log "Updated cron.database_name for '$dbname'."
    else
        echo "$DB_NAME_SETTING" | sudo tee -a "$PG_CONF_FILE" > /dev/null
        log "Added cron.database_name for '$dbname'."
    fi

    step "Restarting PostgreSQL to load new configurations..."
    sudo systemctl restart postgresql
    log "PostgreSQL restarted."
    # 1. Drop the database if it exists, to ensure a clean restore
    echo "Dropping database '$dbname' if it exists..."
    export PGPASSWORD="$PG_ADMIN_PASS"
    sudo -u postgres dropdb --if-exists "$dbname"
    unset PGPASSWORD

    # 2. CRITICAL STEP: Create the database before restoring into it.
    echo "Creating new database '$dbname' owned by '$PG_OWNER'..."
    export PGPASSWORD="$PG_ADMIN_PASS"
    sudo -u postgres createdb -O "$PG_OWNER" "$dbname"
    unset PGPASSWORD

    # 3. Restore the contents into the newly created database.
    echo "Restoring contents from '$dumpfile' into '$dbname'..."
    export PGPASSWORD="$PG_ADMIN_PASS" # Use admin pass for restore
    sudo -u postgres pg_restore \
        -d "$dbname" \
        --no-owner \
        --clean \
        --if-exists \
        -v "$dumpfile" || echo "Warning: Non-fatal errors may have occurred while restoring '$dbname'."
    unset PGPASSWORD
    echo "Restore for '$dbname' complete."
    echo
done

echo "======================"
echo "All restores complete."
echo "======================"