Self-host - Part 3 - MySQL and PostgreSQL Database Backup to Local Drive

This blog is the third in the three-part series (maybe more, we will see) on self-hosting. In the first part, we explained how to start and secure your self-hosted server. In the second part, we addressed zero-downtime deployment using Docker Swarm. This third part discusses backing up our PostgreSQL and MySQL databases without downtime.

This topic earned its place in the series because backups play a huge part in keeping an application reliable and resilient to system and database failures. Do note that you should go through part 1 and part 2 of this series first, as this post continues the code presented in those parts, and you might not understand everything without them.

First things first, why back up at all? Well, the straightforward answer is that we want to preserve database data even if the server fails. Namely, we don't want to force our end users to, for example, input their data again every time something goes wrong with the database (and something can always go wrong, from a bad script that accidentally deletes the database to a server failure without recovery). Therefore, to provide the best UX possible, we need to store the database data in multiple places, which reduces the chance of losing everything at once. We want to make our data resilient.

In this article, in the spirit of the previous articles in this series, we will tackle backing up MySQL and PostgreSQL databases running as Docker services, without any downtime for our production application: saving data from both databases to the local disk, and restoring that data as necessary.

Why local backup, and not S3 or any other cloud storage?

In essence, you can store backed-up data wherever you decide, as long as it is accessible at the moment of restoration. In the spirit of this series, the goal is to reduce costs as much as possible, and cloud storage, as cheap as it might be, still costs money, which, as you may remember from previous parts, we try to avoid. So we will explain how to store everything on your local disk (or an external drive, if you like). If you decide to use remote storage instead, the same principles apply; the only difference is that you would send the files to remote cloud storage instead of saving them locally.

Now that we have explained the reasoning behind backing up our databases, let's proceed with the implementation, namely, how we will achieve it.

Backup services in Swarm cluster

To achieve our goal, we first need to define the services that we will use in our docker-compose file. Let's say we have MySQL and PostgreSQL databases in our production application, something like this:

services:
  mysqldb:
    image: "mysql:8.0"
    restart: always
    env_file:
      - "path to env file"
    volumes:
      - dbdata:/var/lib/mysql
    deploy:
      mode: replicated
      replicas: 1
      update_config:
        order: start-first
        failure_action: rollback
        delay: 5s
    networks:
      - mysql-network

  pgdb:
    image: "postgres:16.3"
    restart: always
    env_file:
      - "path to env file"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - /etc/localtime:/etc/localtime
    ports:
      - 5432:5432
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
    deploy:
      mode: replicated
      replicas: 1
      update_config:
        order: start-first
        failure_action: rollback
        delay: 5s
    networks:
      - pg-network

As you can see from the code above, there are MySQL and PostgreSQL services that we can use to store whichever data we decide. Now, let's define database backup services:

MySQL:

  mysqldb-backup:
    image: fradelg/mysql-cron-backup
    volumes:
      - ./backup/mysqldb:/backup
    env_file:
      - "path to env file"
    environment:
      - MYSQL_HOST=mysqldb
      - MYSQL_USER=root
      - MYSQL_PASS=${MYSQL_ROOT_PASSWORD}
      - MYSQL_DATABASE="database name here"
      - MAX_BACKUPS=1
      - INIT_BACKUP=1
      - TIMEOUT=60s
      - CRON_TIME=0 01 * * *
      - GZIP_LEVEL=6
      - MYSQLDUMP_OPTS=--no-tablespaces
    restart: unless-stopped
    deploy:
      mode: global
      update_config:
        order: start-first
        failure_action: rollback
        delay: 5s
    networks:
      - mysql-network

As you can see above, we use fradelg/mysql-cron-backup, which can be found here. This service will connect to our mysqldb service (note that both are on the same Docker network) and perform a backup every day at 01:00 (as can be seen from CRON_TIME). You can take a look at the image's documentation for the details, but the main thing to note from the code above is that the dump is written to the container's /backup directory, which is bind-mounted to ./backup/mysqldb next to our docker-compose file. If you have read the previous article, you will know that this docker-compose file lives in /app on our remote server. We have also specified that we want only one backup at a time, as in this case we don't want to store multiple backup files on the remote server and use up too much of its storage. The deploy section refers to Swarm cluster behavior, so our cluster knows how to handle this service.

Conveniently, this image gzips everything into our specified backup folder, so after one backup the contents of the folder would look, for example, as follows:

  • latest.sql.gz
  • 202408120100.sql.gz

And we would always have latest.sql.gz linked to the latest backup for convenience.
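
Once the service is running, a quick way to confirm that a dump was actually produced is to check the service logs and peek into the latest file on the remote server. This is a minimal sketch; the stack name swarm_stack_name is the same placeholder used later in this article:

# On the remote server: tail the backup service logs
sudo docker service logs --tail 20 swarm_stack_name_mysqldb-backup

# Peek at the start of the latest dump without unpacking it
zcat /app/backup/mysqldb/latest.sql.gz | head -n 20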

PostgreSQL:

  pgdb-backup:
    image: prodrigestivill/postgres-backup-local
    volumes:
      - ./backup/pgdb:/backups
    env_file:
      - "path to postgres envs here"
    environment:
      - POSTGRES_HOST=pgdb
      - SCHEDULE=@daily
      - BACKUP_KEEP_DAYS=4
      - BACKUP_KEEP_WEEKS=0
      - BACKUP_KEEP_MONTHS=0
      - HEALTHCHECK_PORT=8080
    restart: unless-stopped
    deploy:
      mode: global
      update_config:
        order: start-first
        failure_action: rollback
        delay: 5s
    networks:
      - pg-network

In the case of PostgreSQL, we use the prodrigestivill/postgres-backup-local image, which can be found here. This service connects to our pgdb service and performs the backup @daily, namely, once a day at midnight (take a look at the cron schedule syntax). Please look into the documentation for the specific features of this image; for our use case, it is enough to keep each backup for a maximum of 4 days (BACKUP_KEEP_DAYS=4). As with MySQL, the deploy section refers to Swarm cluster behavior, so our cluster knows how to handle this service.

As with the mysqldb-backup service, this image gzips everything into our specified backup folder and sorts the dumps into folders such as daily, last, weekly, and monthly. Consult the documentation for the exact folder structure; for our use case, the important parts are the latest backup and the daily dumps.
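
On the remote server, the resulting layout can be sanity-checked with a simple listing. The folder names follow the image's documentation; exact file names depend on the image version and your database name:

# After the first daily run
ls /app/backup/pgdb
# expected, roughly: daily  last  monthly  weekly

# The most recent dump, used later for restoration
ls -l /app/backup/pgdb/last/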

Now that we have defined our services in the Swarm cluster, which automatically back up the MySQL and PostgreSQL databases every night and store the dumps in the /app/backup directory on our remote server, we need to tackle transferring those files and folders to our local storage, so we can keep them for a possible future restoration of data in our production environment. Before that, it is worth redeploying the stack so the new backup services are actually running, as sketched below.
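
A minimal redeploy-and-check, assuming the stack name swarm_stack_name used elsewhere in this series and the docker-compose.yaml file sitting in /app on the remote server:

# On the remote server, from the /app directory
sudo docker stack deploy -c docker-compose.yaml swarm_stack_name

# Confirm both backup services were created and are running
sudo docker stack services swarm_stack_name | grep backup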

Transferring the files and folders to our local machine

To transfer all folders and files to our local machine, we will use a simple rsync command that connects to our remote server and copies everything necessary to our backup machine:

execute_backup.sh

#!/bin/bash

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

# Path to the log file
LOG_FILE="$SCRIPT_DIR/cronjob.log"

# Check if the log file exists and is a regular file
if [ -f "$LOG_FILE" ]; then
    # Truncate the log file to remove all content
    > "$LOG_FILE"
    echo "Log file cleared successfully."
else
    echo "Log file not found or is not a regular file."
fi

echo "Current date and time: $(date)"

. "$SCRIPT_DIR/input-data/credentials.txt"

rsync -arvzP --rsh="ssh -p $remote_port" --delete "$remote_username@$remote_ip:/app/backup" "$SCRIPT_DIR/bkp-app"

As you can see above, the backup process is pretty straightforward. We first clear the log file for the cron job (which we will cover in the next section), so we can check whether the latest transfer was successful. After that, we source the credentials.txt file, which contains all the data necessary to connect to the remote server and has the following content:

remote_ip={ip goes here}
remote_port={port goes here}
remote_username={username goes here}
pass={password for remote machine goes here}
pg_username={postgres username}
pg_db={postgres database name}

Afterward, we use the rsync command to transfer everything from /app/backup on our remote server to our local bkp-app directory. Voilà! The files and folders previously created by the backup services on our remote server are now ready and waiting on our local machine.
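
Since credentials.txt holds the remote password in plain text, it is worth restricting its permissions on the local machine (a small precaution, assuming a Unix-like system):

# Only the owner can read or modify the credentials file
chmod 600 ./input-data/credentials.txt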

Note: For the flags used in the rsync command, please refer to the rsync documentation.

Note that, even though we now have all the files and folders required for restoration, we still need some additional logic, covered in the following section, to have it all automated.

Adding cron jobs

To automate transferring all backups to our local machine, we need a local cron job that runs once daily. For convenience, we can add it with the following script:

add_local_cronjob.sh

#!/bin/bash

CURRENT_DIR=$(pwd)
SCRIPT_NAME="execute_backup.sh"
SCRIPT_PATH="$CURRENT_DIR/$SCRIPT_NAME"

# Get the current system timezone offset in hours (e.g. "+02" or "-05")
OFFSET_SIGN=$(date +%z | cut -c 1)
# Force base 10 so offsets such as "+08"/"+09" are not parsed as octal
OFFSET_HOURS=$((10#$(date +%z | cut -c 2-3)))
if [ "$OFFSET_SIGN" = "-" ]; then
    OFFSET_HOURS=$((-OFFSET_HOURS))
fi

# Calculate the adjusted hour for the cron job based on the offset.
# The 7 below is the actual hour we want to run at; in this case,
# we want to run the backup transfer every day at 7 in the morning.
# The modulo keeps CRON_HOUR within the valid hour range (0-23).
CRON_HOUR=$(((7 + OFFSET_HOURS + 24) % 24))

CRON_TIME="0 $CRON_HOUR * * *"
CRON_JOB="$CRON_TIME $SCRIPT_PATH >> $CURRENT_DIR/cronjob.log 2>&1"

# Check if the cron job already exists
(crontab -l | grep -F "$CRON_JOB") &> /dev/null

if [ $? -eq 0 ]; then
    echo "Cron job already exists. No changes made."
else
    # Add the new cron job
    (crontab -l; echo "$CRON_JOB") | crontab -
    echo "Cron job added."
fi

The script presented above adds a cron job that fires at 7 in the morning every day and runs the execute_backup.sh shell script. Note that it also redirects all output to the cronjob.log file, so we can check whether the script executed correctly.

Note: This script should only be run once on the local machine.
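
To confirm that the job was registered, and to see the exact hour the script computed, you can list your crontab:

# The entry should point at execute_backup.sh and log to cronjob.log
crontab -l | grep execute_backup.sh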

Now, if you read through part 1 of this series, you know that we set up 2FA using Google Authenticator on our remote server. That means a cron job cannot connect to our server automatically, as there is no way for it to enter the code from the Google Authenticator application. To skip that authentication step during the transfer window, we can add another cron job, but this time on the remote server; the script that installs it is as follows:

initialize_remote_backup.sh

source "./input-data/credentials.txt"

rsync -arvzP --rsh="ssh -p $remote_port" "./input-data/credentials.txt" "$remote_username@$remote_ip:/app/" 

ssh $remote_username@$remote_ip -p $remote_port "bash -s" << 'ENDSSH'

source "/app/credentials.txt"

cron_command1="59 06 * * * sed -i 's/^auth[[:space:]]\+required[[:space:]]\+pam_google_authenticator\.so[[:space:]]\+debug[[:space:]]\+nullok/# &/' /etc/pam.d/sshd"
cron_command2="01 07 * * * sed -i 's/^#[[:space:]]\+auth[[:space:]]\+required[[:space:]]\+pam_google_authenticator.so[[:space:]]\+debug[[:space:]]\+nullok/auth required    pam_google_authenticator.so debug nullok/' /etc/pam.d/sshd"

echo -e "$pass\n$cron_command1\n$cron_command2" | sudo -S crontab -

rm "/app/credentials.txt"
ENDSSH

In the script above, when running initialize_remote_backup.sh, we install two cron jobs on the remote server: one that disables 2FA at 06:59 in the morning and one that re-enables it at 07:01. That leaves a window at 07:00 during which we can connect to the server and transfer the backup files to our local machine without entering a code from the Google Authenticator application.
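
If you want to double-check that the two toggle entries landed on the server, you can list the root crontab from an interactive SSH session (this one will still ask for the 2FA code):

# On the remote server
sudo crontab -l | grep pam_google_authenticator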

Now that we have an automated backup process using shell scripts and both local and remote cron jobs, we can proceed with instructions on restoring the backed-up data when necessary.

Database restoration

Two scripts are concerned with restoration. The first, which we will name local_restore.sh, first rsyncs all backup files and folders from our local machine to the remote server and then runs the restoration commands there. The second, which we will name remote_restore.sh, restores the databases from the files and folders already on the remote server; in other words, it does not transfer anything from our local machine before executing the restore commands. Let's see what these scripts look like, and it will make more sense:

local_restore.sh

source "./input-data/credentials.txt"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

rsync -arvzP --rsh="ssh -p $remote_port" "$SCRIPT_DIR/input-data/credentials.txt" "$SCRIPT_DIR/bkp-app/backup" "$remote_username@$remote_ip:/app/" 

# Execute script on the remote server
ssh $remote_username@$remote_ip -p $remote_port "bash -s" << 'ENDSSH'

source "/app/credentials.txt"

task_id=$(echo "$pass" | sudo -S -E docker stack ps swarm_stack_name --filter "desired-state=running" | grep mysqldb-backup | awk '{print $1}' | head -n 1)

mysqldb_container_id=$(echo "$pass" | sudo -S -E docker inspect --format="{{.Status.ContainerStatus.ContainerID}}" $task_id)

echo "$pass" | sudo -S chown -R root:user /app/backup/mysqldb

# Run restore script according to documentation
echo "$pass" | sudo -S docker container exec $mysqldb_container_id /restore.sh /backup/latest.sql.gz


task_id=$(echo "$pass" | sudo -S -E docker stack ps swarm_stack_name --filter "desired-state=running" | grep pgdb-backup | awk '{print $1}' | head -n 1)

db_container_id=$(echo "$pass" | sudo -S -E docker inspect --format="{{.Status.ContainerStatus.ContainerID}}" $task_id)

# Restore according to the documentation: pipe the latest dump into psql
# (--host must match the database service name, pgdb in our compose file)
echo "$pass" | sudo -S -E docker exec $db_container_id /bin/sh -c "zcat /backups/last/latest.sql.gz | psql --username $pg_username --dbname $pg_db --host pgdb > /dev/null 2>&1"

echo "All done!"

rm "/app/credentials.txt"
ENDSSH

In the script above, we first source all the variables from credentials.txt and then use the rsync command to transfer the backed-up files from the local machine's bkp-app/backup folder to /app/backup on the remote machine. After transferring the files, we open a bash session on the remote server and read the credentials.txt file there. For both mysqldb-backup and pgdb-backup, in order to execute commands inside their respective containers, we need to find their task IDs (task_id in the script above) and, from those, their container IDs (mysqldb_container_id and db_container_id in the script above) inside the Swarm cluster. Once we have the container IDs, we can execute commands inside the containers.
For mysqldb-backup, according to the restoration documentation, we need to run the restore.sh script inside the container, which handles everything for us.
For pgdb-backup, the documentation is a bit unclear about restoration, so the following command is used in the script above:

echo "$pass" | sudo -S -E docker exec $db_container_id /bin/sh -c "zcat /backups/last/latest.sql.gz | psql --username $pg_username --dbname $pg_db --host db > /dev/null 2>&1"

Note: $pass, $pg_db, and $pg_username are sourced from the credentials.txt file.

After the script finishes, both the MySQL and PostgreSQL databases should be restored to the versions backed up on our local machine.
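
As an optional sanity check, you can list the tables of the restored PostgreSQL database through the pgdb-backup container, which ships the psql client. This reuses $db_container_id, $pg_username, and $pg_db from the script above and assumes the same non-interactive authentication the restore command already relies on:

# Run on the remote server, in the same session as the restore
echo "$pass" | sudo -S docker exec $db_container_id \
  psql --host pgdb --username $pg_username --dbname $pg_db -c '\dt'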

Note: Part echo "$pass" | is applied to allow the use of the sudo command for docker (as sudo is required for docker on the remote server), where $pass is sourced from credentials.txt file that is included in the script.

In case we want to restore from the latest data already present on the remote server, we should change:

rsync -arvzP --rsh="ssh -p $remote_port" "$SCRIPT_DIR/input-data/credentials.txt" "$SCRIPT_DIR/bkp-app/backup" "$remote_username@$remote_ip:/app/" 

to

rsync -arvzP --rsh="ssh -p $remote_port" "$SCRIPT_DIR/input-data/credentials.txt" "$remote_username@$remote_ip:/app/" 

namely, remove the part where the backup folder is transferred to the remote server. In the code, this script is called remote_restore.sh.

We have successfully restored both databases and our end users will be very happy! Hooray!

Onboarding new local machine

Let's explain what the process for a new machine should look like before wrapping up.

Let's say you cloned the repository from GitHub with all the code explained above; your file structure should look like this:

  • docker-compose.yaml (with databases and services for backing up)
  • input-data
    • credentials.txt (you should create this file if it is non-existent)
  • add_local_cronjob.sh
  • execute_backup.sh
  • initialize_remote_backup.sh
  • local_restore.sh
  • remote_restore.sh

If it is a first-time setup, the new local machine should run these scripts:

  • add_local_cronjob.sh
  • initialize_remote_backup.sh

The add_local_cronjob.sh script schedules execute_backup.sh, which handles everything related to transferring backups, while initialize_remote_backup.sh instructs the remote server to turn off 2FA for a two-minute window so our local machine can connect and transfer the backed-up files. Onboarding a new machine is complete once these two scripts have been run, as shown below.
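
In practice, the first-time setup boils down to a few commands, assuming the repository layout above and a filled-in credentials.txt:

# One-time onboarding of a new local backup machine
chmod +x ./*.sh
./initialize_remote_backup.sh   # install the 2FA toggle cron jobs on the server
./add_local_cronjob.sh          # schedule the daily transfer on this machine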

The local_restore.sh script should be called when we want to restore the databases on the remote server from the data on our local machine. The remote_restore.sh script should be called when we want to restore the databases from the backed-up data already on the remote server.

Note that the new machine should be provided with proper data for credentials.txt before adding cron jobs and running restorations.

Wrapping up

We have explained how database backup and restoration can be achieved, with all data saved to our local disk (an internal or external hard drive). We have also covered restoring those databases after an outage on the remote server, or whatever other issue might hit the databases, so our end users never have to re-enter their data every time something goes wrong.

The benefit of this approach is, of course, that we don't pay a subscription for our local hard drive storage (we buy it once and that is it), and we have much more control when all of our database data is physically at our fingertips rather than on some cloud somewhere. It gives a greater sense of ownership over our data and, in the end, lets our startup back up its databases cost-efficiently.

Once our startup application generates meaningful revenue, we can start paying for storage at some cloud provider and transfer all the data there.
