Darian Vance

Solved: Automate PostgreSQL Database Backups to S3 with a Bash Script

🚀 Executive Summary

TL;DR: Manual PostgreSQL database backups are prone to errors and inconsistency, leaving critical data vulnerable. This guide provides a step-by-step Bash script solution to automate PostgreSQL backups directly to AWS S3, ensuring reliable, offsite data protection managed efficiently through cron.

🎯 Key Takeaways

  • The core backup script leverages pg_dump for database export, gzip for compression, and aws s3 cp for secure upload to a designated S3 bucket.
  • Proper IAM permissions are crucial, requiring at least s3:PutObject on the target S3 bucket; for production, using a .pgpass file is recommended over embedding PGPASSWORD directly in the script.
  • Cron is used to schedule automated execution, with output redirection (> /home/user/backup_logs/backup_pg_to_s3.log 2>&1) essential for logging script results and troubleshooting, alongside integrated local cleanup of old backup files.

Introduction

In the world of data, integrity and availability are paramount. For SysAdmins, Developers, and DevOps Engineers managing PostgreSQL databases, the thought of data loss is a persistent nightmare. Manual backup processes are not only tedious and error-prone but also inconsistent, leaving your critical data vulnerable. A robust disaster recovery strategy demands automated, offsite backups that are reliable and easily recoverable.

This tutorial from TechResolve provides a comprehensive, step-by-step guide to automating your PostgreSQL database backups directly to AWS S3 using a simple yet powerful Bash script. By the end of this article, you will have a resilient backup system in place, ensuring your data is safe, secure, and ready for restoration whenever needed, all managed efficiently through cron.

Prerequisites

Before you begin, ensure you have the following in place:

  • A running PostgreSQL database instance: You should have administrative access and the pg_dump utility available.
  • AWS Account and S3 Bucket: An active AWS account with an S3 bucket created for storing your backups.
  • AWS CLI Installed and Configured: The AWS Command Line Interface should be installed on your backup server and configured with credentials that have sufficient permissions (specifically s3:PutObject) for your target S3 bucket.
  • Basic understanding of Bash Scripting: Familiarity with writing and executing Bash scripts.
  • Basic understanding of Cron: Knowledge of how to schedule tasks using cron.
  • Compression Utility: gzip or bzip2 (usually pre-installed on Linux systems).
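
A quick way to confirm these prerequisites are in place on the backup server (command names assume a typical Linux install; adjust if your distribution packages them differently):

   pg_dump --version   # PostgreSQL client tools
   aws --version       # AWS CLI
   gzip --version      # compression utility
   crontab -l          # current user's cron jobs (may report "no crontab" on a fresh system)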

Step-by-Step Guide

Step 1: Configure AWS S3 Bucket and IAM Permissions

First, you need an S3 bucket to store your backups and an IAM entity (user or role) with permissions to write to it.

  1. Create an S3 Bucket: Log in to your AWS Console, navigate to S3, and create a new bucket. For example, my-pg-backups-[ID]. Choose a region close to your database server for optimal performance.
  2. Configure IAM Permissions:

Create an IAM User or Role specifically for this backup process. This is a best practice for security. Attach an IAM policy that grants s3:PutObject permission to your backup bucket. A minimal policy might look like this:

   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "s3:PutObject"
               ],
               "Resource": "arn:aws:s3:::my-pg-backups-[ID]/*"
           }
       ]
   }

Replace my-pg-backups-[ID] with your actual bucket name. Generate an Access Key ID and Secret Access Key for the IAM user (or configure an instance role).
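
If you prefer the command line to the console, a minimal sketch using the AWS CLI might look like the following; the user name pg-backup-user and the file name backup-policy.json are illustrative, and the policy document is simply the JSON above saved locally:

   # Create a dedicated IAM user for backups
   aws iam create-user --user-name pg-backup-user

   # Attach the minimal inline policy (saved locally as backup-policy.json)
   aws iam put-user-policy --user-name pg-backup-user \
       --policy-name pg-backup-s3-put \
       --policy-document file://backup-policy.json

   # Generate an access key pair to use with aws configure
   aws iam create-access-key --user-name pg-backup-user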

  3. Configure AWS CLI:

On your database server, configure the AWS CLI using the credentials for your newly created IAM user:

   aws configure
   AWS Access Key ID [****************]: YOUR_ACCESS_KEY_ID
   AWS Secret Access Key [****************]: YOUR_SECRET_ACCESS_KEY
   Default region name [None]: us-east-1
   Default output format [None]: json

Test the configuration by trying to list objects in your bucket (if your policy allows, otherwise just verify configuration files).
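
For example, the following commands verify the CLI setup without any S3 read permissions; the final aws s3 ls check is optional and only succeeds if your policy also grants s3:ListBucket, which the minimal policy above does not:

   aws configure list                    # shows the credentials and region the CLI will use
   aws sts get-caller-identity           # confirms the credentials authenticate successfully
   aws s3 ls s3://my-pg-backups-[ID]/    # optional; requires s3:ListBucket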

Step 2: Create the Backup Script

Now, let’s create the Bash script that will perform the backup, compress it, and upload it to S3. Create a file named backup-pg-to-s3 in a suitable location, like /home/user/scripts/.

#!/bin/bash

# --- Configuration ---
DB_USER="your_pg_user"            # PostgreSQL username
DB_NAME="your_database_name"      # PostgreSQL database name to backup
PGPASSWORD="your_pg_password"    # PostgreSQL user's password (for security, consider .pgpass file)
S3_BUCKET="my-pg-backups-[ID]"    # Your S3 bucket name
BACKUP_DIR="/home/user/pg_backups" # Local directory to store backups temporarily
RETENTION_DAYS=7                # Number of days to keep local backups

# --- Script Logic ---

# Create backup directory if it doesn't exist
mkdir -p "${BACKUP_DIR}"

# Generate timestamp for the backup filename
TIMESTAMP=$(date +%Y-%m-%d_%H-%M-%S)
BACKUP_FILE="${DB_NAME}_${TIMESTAMP}.sql"
COMPRESSED_FILE="${BACKUP_FILE}.gz"
FULL_BACKUP_PATH="${BACKUP_DIR}/${BACKUP_FILE}"
FULL_COMPRESSED_PATH="${BACKUP_DIR}/${COMPRESSED_FILE}"

echo "Starting PostgreSQL backup for database: ${DB_NAME}"
echo "Backup timestamp: ${TIMESTAMP}"

# Export PGPASSWORD so pg_dump doesn't prompt for it
export PGPASSWORD

# Perform the pg_dump
echo "Dumping database to ${FULL_BACKUP_PATH}"
pg_dump -U "${DB_USER}" -d "${DB_NAME}" > "${FULL_BACKUP_PATH}"
DUMP_STATUS=$?
if [ ${DUMP_STATUS} -ne 0 ]; then
    echo "ERROR: pg_dump failed with exit code ${DUMP_STATUS}"
    exit 1
fi
echo "Database dump complete."

# Compress the backup file
echo "Compressing backup file: ${FULL_BACKUP_PATH}"
gzip "${FULL_BACKUP_PATH}"
GZIP_STATUS=$?
if [ ${GZIP_STATUS} -ne 0 ]; then
    echo "ERROR: gzip compression failed with exit code ${GZIP_STATUS}"
    exit 1
fi
echo "Backup file compressed to ${FULL_COMPRESSED_PATH}"

# Upload the compressed backup to S3
echo "Uploading ${COMPRESSED_FILE} to s3://${S3_BUCKET}/"
aws s3 cp "${FULL_COMPRESSED_PATH}" "s3://${S3_BUCKET}/${COMPRESSED_FILE}"
S3_UPLOAD_STATUS=$?
if [ ${S3_UPLOAD_STATUS} -ne 0 ]; then
    echo "ERROR: S3 upload failed with exit code ${S3_UPLOAD_STATUS}"
    exit 1
fi
echo "Backup successfully uploaded to S3."

# Clean up local backup files older than RETENTION_DAYS
echo "Cleaning up local backups older than ${RETENTION_DAYS} days..."
find "${BACKUP_DIR}" -type f -name "*.gz" -mtime +"${RETENTION_DAYS}" -delete
echo "Local cleanup complete."

# Unset PGPASSWORD for security
unset PGPASSWORD

echo "PostgreSQL backup to S3 process finished."

Logic Explanation:

  • Configuration: Define crucial variables like database credentials, S3 bucket name, and local backup directory. We use PGPASSWORD for simplicity, but for production, consider using a .pgpass file for enhanced security (see the sketch after this list).
  • Directory Creation: Ensures the local backup directory exists.
  • Timestamping: Creates a unique timestamp to append to the backup filename, preventing overwrites and aiding in recovery.
  • pg_dump: Exports the specified PostgreSQL database into a SQL file. The export PGPASSWORD command ensures that pg_dump does not interactively ask for the password.
  • gzip: Compresses the SQL dump to save disk space and reduce upload time/costs for S3 storage.
  • aws s3 cp: This command securely uploads the compressed backup file to your designated S3 bucket.
  • Error Handling: Basic error checks are included after critical commands to ensure the script fails gracefully if a step doesn’t complete successfully.
  • Local Cleanup: Removes old local backup files to prevent the backup directory from consuming excessive disk space.
  • Security: The unset PGPASSWORD command is crucial to remove the password from the environment variables once the operation is complete.
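
For reference, a minimal .pgpass setup might look like this; the host, port, and credentials below are placeholders, and PostgreSQL ignores the file unless it is readable only by its owner:

   # Format: hostname:port:database:username:password
   echo "localhost:5432:your_database_name:your_pg_user:your_pg_password" >> ~/.pgpass
   chmod 600 ~/.pgpass

With that in place, you can remove the PGPASSWORD lines from the script entirely; pg_dump reads the matching entry automatically.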

Step 3: Make the Script Executable and Test

Once you’ve created the script, you need to make it executable and perform a manual test to ensure everything works as expected.

  1. Make Executable:

Change the permissions of the script to make it executable:

   chmod +x /home/user/scripts/backup-pg-to-s3
  2. Test Manually:

Run the script from your terminal:

   /home/user/scripts/backup-pg-to-s3

Monitor the output for any errors. After successful execution, check your local /home/user/pg_backups directory for the compressed backup file and your S3 bucket to confirm the file has been uploaded.
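
Assuming the paths and bucket name used in the script, you might verify both locations like this (the S3 listing requires s3:ListBucket in addition to the minimal policy above):

   ls -lh /home/user/pg_backups/         # the compressed .gz dump should appear here
   aws s3 ls s3://my-pg-backups-[ID]/    # confirms the upload (needs s3:ListBucket)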

Step 4: Schedule the Backup with Cron

Finally, we’ll use cron to schedule the script to run automatically at a regular interval. Cron is a time-based job scheduler in Unix-like operating systems.

  1. Open your cron editor:

Run the following command to open the crontab for the current user:

   crontab -e

This will open a text editor (usually Vi or Nano) where you can add your cron job.

  2. Add a Cron Entry:

Add the following line to the end of the file. This example schedules the backup to run every day at 2:00 AM.

   0 2 * * * /home/user/scripts/backup-pg-to-s3 > /home/user/backup_logs/backup_pg_to_s3.log 2>&1

Let’s break down the cron entry:

  • 0 2 * * *: Specifies the schedule (minute 0, hour 2, every day of the month, every month, every day of the week).
  • /home/user/scripts/backup-pg-to-s3: The absolute path to your executable backup script.
  • > /home/user/backup_logs/backup_pg_to_s3.log 2>&1: This part is crucial for logging. It redirects standard output and standard error to a log file. Use an absolute path and make sure the log directory exists and is writable: cron runs with a minimal environment, and a relative path like logs/ would resolve against the user’s home directory. This ensures you can review the script’s execution results and troubleshoot any issues without relying on the console.

Save and exit the cron editor. Cron will automatically pick up the new schedule.
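
To confirm the job is registered and actually firing, two quick checks can help; note that the cron log location varies by distribution (Debian/Ubuntu typically log to /var/log/syslog, RHEL-based systems to /var/log/cron):

   crontab -l                                # lists the installed entries for the current user
   grep CRON /var/log/syslog | tail -n 20    # recent cron activity (Debian/Ubuntu path)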

Common Pitfalls

  • IAM Permissions: One of the most common issues is insufficient IAM permissions. Ensure your IAM user/role has at least s3:PutObject for the target bucket. Always follow the principle of least privilege.
  • Database Connectivity: Double-check the DB_USER, DB_NAME, and PGPASSWORD in your script. Incorrect credentials or host settings can lead to pg_dump failures.
  • PATH Issues in Cron: Cron environments often have a minimal PATH. While we’ve called pg_dump and aws directly, if they are not found, you might need to specify their full paths (e.g., /usr/local/bin/pg_dump); see the snippet after this list.
  • Disk Space: Although the script includes local cleanup, if backups fail or the cleanup process encounters an issue, the /home/user/pg_backups directory can fill up quickly. Monitor your disk space.
  • Time Zones: Cron schedules operate based on the server’s local time zone. Be mindful of this when setting your backup times, especially for geographically distributed teams or servers.
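
A sketch of how to handle the PATH pitfall mentioned above: locate the binaries once, then either hard-code those paths in the script or declare PATH near the top of the crontab (the example paths will differ between systems, and crontab-level environment variables are supported by most but not all cron implementations, so verify on yours):

   command -v pg_dump   # prints the full path, e.g. /usr/bin/pg_dump
   command -v aws       # e.g. /usr/local/bin/aws

   # Alternatively, set PATH at the top of the crontab (crontab -e):
   # PATH=/usr/local/bin:/usr/bin:/bin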

Conclusion

Automating your PostgreSQL database backups to AWS S3 with a Bash script is a fundamental practice for any robust infrastructure. This tutorial has equipped you with a reliable, efficient, and cost-effective solution to protect your data against unforeseen events. By implementing this strategy, you’re not just backing up data; you’re investing in peace of mind, ensuring business continuity and compliance.

Remember that while setting up the backup is critical, regularly testing your recovery process is equally important. Practice restoring your database from an S3 backup to validate the integrity of your backups and your ability to recover swiftly. Consider further enhancements like client-side encryption, detailed logging, or more sophisticated S3 lifecycle policies for advanced retention management.
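
As a starting point for such a recovery drill, a minimal restore sketch could look like this; the object name is purely illustrative, and restoring into a freshly created database is assumed so the live one is never overwritten:

   # Download and decompress a backup (object name is an example)
   aws s3 cp s3://my-pg-backups-[ID]/your_database_name_2024-01-01_02-00-00.sql.gz .
   gunzip your_database_name_2024-01-01_02-00-00.sql.gz

   # Restore into a fresh database
   createdb -U your_pg_user your_database_name_restored
   psql -U your_pg_user -d your_database_name_restored -f your_database_name_2024-01-01_02-00-00.sql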


Darian Vance

👉 Read the original article on TechResolve.blog


☕ Support my work

If this article helped you, you can buy me a coffee:

👉 https://buymeacoffee.com/darianvance
