DEV Community

Postgres backup using Docker

Yash Thakkar ・2 min read

Postgres is one of the most popular open-source relational databases. The purpose of this post is to explain how you can back up your Postgres database, whether or not it runs in Docker, using a Docker image, and why I created this image.

Problem

Running Postgres inside Docker is very easy. Docker Hub has an official postgres image, and with a single command we can start using a Postgres database. The trouble starts when we want to run backups on a schedule with cron. There are many ways to do this; we could follow the official documentation to automate backups on Linux. However, the advantage of Docker is flexibility and platform independence, and the official documentation only shows how to run a cron job on Linux, which adds the OS-level dependency we wanted to avoid. When I searched for a backup image, I could not find any Docker image that supports Postgres 13 together with S3 backups; the existing images are not up to date.

Solution

To tackle the problem, I created my own Docker image. To start the backup, follow the instructions below.

Required Environment Variables

All the environment variables from https://www.postgresql.org/docs/current/libpq-envars.html are supported, because the image uses the native postgres-client binary for the backup.

  • PGHOST: behaves the same as the host connection parameter, e.g. postgresql
  • PGHOSTADDR: behaves the same as the hostaddr connection parameter. This can be set instead of or in addition to PGHOST to avoid DNS lookup overhead.
  • PGPORT: 5432
  • PGDATABASE: database
  • PGUSER: postgres
  • PGPASSWORD: password
  • S3_HOST: the S3-compatible endpoint, e.g. https://storage.googleapis.com, s3.eu-west-1.amazonaws.com, or nyc3.digitaloceanspaces.com
  • S3_BUCKET: BUCKET
  • S3_ACCESS_KEY: ACCESS_KEY
  • S3_SECRET_KEY: SECRET_KEY
  • CRON_SCHEDULE: a cron expression, e.g. "* * * * *" (every minute). Read more at https://crontab.guru/
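
Since a missing variable usually only shows up when the first scheduled backup fails, it can help to check the environment up front. The following is a minimal sketch of such a pre-flight check; `REQUIRED_VARS` and `missing_vars` are hypothetical names used for illustration, and the image itself may validate its configuration differently (or not at all).

```python
import os

# Assumption: the variables a backup run cannot do without.
# PGPORT and PGHOSTADDR are left out because libpq has defaults for them.
REQUIRED_VARS = (
    "PGHOST",
    "PGDATABASE",
    "PGUSER",
    "PGPASSWORD",
    "S3_HOST",
    "S3_BUCKET",
    "S3_ACCESS_KEY",
    "S3_SECRET_KEY",
)


def missing_vars(env):
    """Return the required variable names that are absent or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with a hand-written mapping; pass os.environ to check the real environment.
example_env = {"PGHOST": "postgresql", "PGDATABASE": "database", "PGUSER": "postgres"}
print("missing:", ", ".join(missing_vars(example_env)))
```

Passing `os.environ` instead of `example_env` checks the variables that the container actually received.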

Docker Run Command

# S3_HOST takes a single S3-compatible endpoint, for example
# https://storage.googleapis.com, s3.eu-west-1.amazonaws.com or nyc3.digitaloceanspaces.com
docker run -d \
      --name postgres-backup \
      -v "$(pwd)/backups:/backups" \
      -e PGHOST=postgresql \
      -e PGPORT=5432 \
      -e PGDATABASE=db_name \
      -e PGUSER=postgres \
      -e PGPASSWORD=password \
      -e S3_ACCESS_KEY=ACCESS_KEY \
      -e S3_SECRET_KEY=SECRET_KEY \
      -e S3_BUCKET=BUCKET \
      -e S3_HOST=https://storage.googleapis.com \
      -e CRON_SCHEDULE="@daily" \
      docker.pkg.github.com/thakkaryash94/docker-postgres-backup/docker-postgres-backup:latest

That's it, now the container should be up and running.

Discussion (2)

Pacharapol Withayasakpunt • Edited

I tweaked it a little with a Python script, and used supervisord instead of tini.

  • backup.sh
#!/bin/sh

echo "$(date): backup process started"
echo "$(date): pg_dump started for ${POSTGRES_DB}"

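# Exported so that del.py can read it from the environment.
export BACKUP_ROOT=/backups

# Timestamped target file; the variables are quoted to survive unusual names.
FILE="$BACKUP_ROOT/$POSTGRES_DB-$(date +%FT%H-%M-%S).sql.gz"
# pg_dump takes its connection settings from the PG* environment variables.
pg_dump | /bin/gzip > "$FILE"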

echo "$(date): pg_dump completed"

python3 del.py

echo "$(date): deleted similar files / cluttered old files"
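
One caveat with a plain `pg_dump | gzip` pipeline in sh is that the pipeline's exit status is that of gzip, so a failed dump can leave a truncated file behind without anyone noticing. As a rough sketch of the same step with the exit status checked, here is a Python variant; `backup_path` and `dump_database` are hypothetical helpers, not something the script above uses.

```python
import gzip
import shutil
import subprocess
from datetime import datetime
from pathlib import Path


def backup_path(root, database, now):
    """Build a timestamped target such as <root>/<database>-2021-01-02T03-04-05.sql.gz."""
    return Path(root) / f"{database}-{now.strftime('%Y-%m-%dT%H-%M-%S')}.sql.gz"


def dump_database(target):
    """Stream pg_dump output into a gzip file and fail loudly if pg_dump fails.

    pg_dump reads its connection settings from the PG* environment variables.
    """
    with subprocess.Popen(["pg_dump"], stdout=subprocess.PIPE) as proc:
        with gzip.open(target, "wb") as out:
            shutil.copyfileobj(proc.stdout, out)
    # Leaving the Popen context waits for pg_dump, so returncode is set here.
    if proc.returncode != 0:
        target.unlink()  # do not keep a partial dump around
        raise RuntimeError(f"pg_dump exited with status {proc.returncode}")
```

A run would then look like `dump_database(backup_path("/backups", os.environ["POSTGRES_DB"], datetime.now()))`, after importing `os`.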
  • del.py
import os
import filecmp
from datetime import datetime, timedelta

os.chdir(os.environ["BACKUP_ROOT"])

# Newest first: the file names sort by their embedded timestamp.
latest, *recent = sorted(os.listdir())[::-1]
latest_ctime = datetime.fromtimestamp(os.stat(latest).st_ctime)

# ctime of the most recent backup that was kept
previous_file_ctime = latest_ctime
for f in recent:
  ctime = datetime.fromtimestamp(os.stat(f).st_ctime)

  if latest_ctime - ctime < timedelta(days=7):
    if filecmp.cmp(latest, f):
      os.unlink(f)  # Remove files identical to the latest backup
      continue
  elif previous_file_ctime - ctime < timedelta(days=7):
    os.unlink(f)  # Remove less than 1 week apart
    continue
  elif latest_ctime - ctime > timedelta(days=180):
    os.unlink(f)  # Remove older than 180 days
    continue

  # Only kept files count as "previous"; otherwise a run of daily
  # backups older than a week would be deleted entirely.
  previous_file_ctime = ctime
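
The retention rules in del.py can also be written as a pure function over (name, timestamp) pairs, which makes them easy to try out without touching the filesystem. This is only a sketch under assumptions: `select_for_deletion` and its parameters are hypothetical names, it assumes the intent is to keep everything from the last week, thin older backups to roughly one per week, and drop anything older than 180 days, and it leaves out the `filecmp` duplicate check.

```python
from datetime import datetime, timedelta


def select_for_deletion(
    backups,
    keep_all_within=timedelta(days=7),
    min_spacing=timedelta(days=7),
    max_age=timedelta(days=180),
):
    """Return the names of backups to delete.

    backups is a list of (name, created) tuples in any order.
    The newest backup is always kept.
    """
    ordered = sorted(backups, key=lambda item: item[1], reverse=True)
    if not ordered:
        return []

    latest_time = ordered[0][1]
    last_kept = latest_time
    doomed = []
    for name, created in ordered[1:]:
        age = latest_time - created
        if age < keep_all_within:
            last_kept = created  # recent backups are all kept
        elif age > max_age or last_kept - created < min_spacing:
            doomed.append(name)  # too old, or too close to a kept backup
        else:
            last_kept = created  # far enough from the last kept backup
    return doomed


# Thirty daily backups, newest first.
base = datetime(2021, 1, 31)
daily = [(f"day{i}", base - timedelta(days=i)) for i in range(30)]
print(sorted(set(name for name, _ in daily) - set(select_for_deletion(daily))))
```

With thirty daily backups, this keeps the seven most recent ones plus one per week before that.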

I am also considering backing up to S3 from DigitalOcean, but not yet. I am waiting until it is launched to production...

Yash Thakkar Author

That's an interesting approach. I also wanted to add logic to remove backups that are more than a week old, but it was not important at the time, so I skipped it. My intention in using alpine was to reduce the image size. I was even considering setting up the cron job outside of the image, on the host.

I created this image to back up to GCS, and I believe it will work with any object storage system that supports the S3 protocol, since we perform only one operation: uploading the zip.
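
To illustrate why a single upload operation ports well across providers, here is a minimal sketch using boto3 with a custom endpoint. It is an assumption that boto3 is the client; the image itself may use a different tool. `normalize_endpoint` and `upload_backup` are hypothetical helpers, and the variable names are the ones from the post.

```python
import os


def normalize_endpoint(host):
    """Prefix a bare host name with https:// so it can serve as an endpoint URL."""
    if host.startswith(("http://", "https://")):
        return host
    return f"https://{host}"


def upload_backup(path, env=os.environ):
    """Upload one backup file to the bucket named in S3_BUCKET."""
    # Imported lazily so normalize_endpoint works without boto3 installed.
    import boto3

    client = boto3.client(
        "s3",
        endpoint_url=normalize_endpoint(env["S3_HOST"]),
        aws_access_key_id=env["S3_ACCESS_KEY"],
        aws_secret_access_key=env["S3_SECRET_KEY"],
    )
    # The object key is simply the file name; a real setup might add a prefix.
    client.upload_file(path, env["S3_BUCKET"], os.path.basename(path))
```

Only the endpoint and the credentials change between providers; the upload call stays the same.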

I have created one more image, which zips a folder and uploads it to DO Spaces using Go and MinIO. Feel free to check it out: github.com/thakkaryash94/docker-sp...