Introduction: Docker Volume Backup Became Serious When My Redis Data Vanished
Last month, a Redis container running on my own VPS suddenly crashed and all of its data evaporated. The incident happened on April 28th, around 4:00 AM. It was only a simple test environment, but it still stung: that Redis instance held temporary cache data for one of my side products. At that moment, I realized I needed to take Docker Volume backup more seriously.
This is a situation that can happen to anyone running things on their own VPS. While Docker containers are ephemeral, the data they contain is expected to be persistent. This is where Docker Volumes come into play, and a robust Docker Volume backup strategy is essential to eliminate the risk of data loss. In this post, I will share what I learned and the practical solutions I implemented during this process.
Challenges of Docker Volume Backup and Why You Need to Be Careful
Docker Volumes ensure data persistence because they have a separate lifecycle from the container's file system. However, this makes backup processes slightly different from traditional file backups. Directly entering /var/lib/docker/volumes and copying files is not always a safe method.
Especially for services that continuously write data, such as databases, file consistency at the time of backup is critically important. In the past, I saw backups corrupted due to a WAL bloat issue in a PostgreSQL container. Or, with in-memory databases like Redis, I experienced data loss because I didn't configure the OOM eviction policy correctly. Therefore, it's crucial to consider the application's state when performing backups.
⚠️ Important Note: Consistency
Directly backing up a live database or application volume using file copying methods can lead to data consistency issues. Especially copies made while write operations are ongoing can result in corrupted backups. Always prefer the application's own backup mechanisms or instant snapshot methods.
Basic Docker Volume Backup Methods
I've tried a few different methods for Docker Volume backup. Each has its own advantages and disadvantages. I usually choose between these methods based on my needs.
1. Backup with docker run --volumes-from
This method works by attaching the volumes of a specific container to a temporary backup container. That way, I can easily archive all the data in the volume with the `tar` command and save it elsewhere. This is one of the methods I use most frequently and find most reliable.
How It Works
First, I stop the container the volume is attached to (important for consistency). Then, I run a temporary container that mounts that container's volumes onto itself. Inside this temporary container, I use the `tar` command to archive the data in the volume and write it to a host directory mounted into the container.
Steps and Code Example
Let's say I have a Docker Volume named my_postgres_data and I want to back it up.
1. Stop the relevant service: If it's a critical service like a database, stopping it is the healthiest approach for consistency.

   ```bash
   docker stop my_postgres_container
   ```

2. Run the backup container: This container will attach the `my_postgres_data` volume to itself and write the backup file to the `backup` directory.

   ```bash
   docker run --rm --volumes-from my_postgres_container -v $(pwd):/backup ubuntu tar cvf /backup/postgres_data_$(date +%Y%m%d%H%M%S).tar /var/lib/postgresql/data
   ```

   Here:
   - `--rm`: Delete the container after the operation is complete.
   - `--volumes-from my_postgres_container`: Attach all volumes from `my_postgres_container` to this new container.
   - `-v $(pwd):/backup`: Mount the current directory on my host machine (`$(pwd)`) to the `/backup` directory inside the container. This way, the backup file is written to the host.
   - `ubuntu`: Use a basic Ubuntu image.
   - `tar cvf /backup/postgres_data_$(date +%Y%m%d%H%M%S).tar /var/lib/postgresql/data`: Archive the data in the volume with `tar` and write it to the `/backup` directory. The `/var/lib/postgresql/data` part is the data directory path for PostgreSQL within the volume; this path varies by application.

3. Restart the service:

   ```bash
   docker start my_postgres_container
   ```
Restore Steps
Restoring from a backup works with a similar logic:
1. Delete the existing volume (be careful!):

   ```bash
   docker stop my_postgres_container
   docker rm my_postgres_container
   docker volume rm my_postgres_data
   ```

2. Create an empty volume:

   ```bash
   docker volume create my_postgres_data
   ```

3. Extract the backup file into the new volume:

   ```bash
   docker run --rm -v my_postgres_data:/var/lib/postgresql/data -v $(pwd):/backup ubuntu bash -c "cd /var/lib/postgresql/data && tar xvf /backup/postgres_data_XXXX.tar --strip-components 4"
   ```

   Here, replace `XXXX` with the name of your backup file. Because the archive created above stores the full `var/lib/postgresql/data` prefix (tar strips the leading `/`), the `--strip-components 4` parameter drops those four leading path components so the files are extracted directly into the target directory.
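The `--strip-components` behavior is easy to get wrong, so here is a tiny Docker-free sketch of what it does (all paths are throwaway temp dirs; the number passed must match how many leading path components the archive actually stores — this demo uses a one-level prefix for brevity):

```shell
# Minimal demo of tar's --strip-components using a temp dir (no Docker involved).
set -eu
WORK=$(mktemp -d)
mkdir -p "$WORK/data"
echo "row" > "$WORK/data/table.dat"
tar cf "$WORK/vol.tar" -C "$WORK" data     # archive stores data/table.dat

mkdir "$WORK/plain" "$WORK/stripped"
tar xf "$WORK/vol.tar" -C "$WORK/plain"                          # keeps the data/ prefix
tar xf "$WORK/vol.tar" -C "$WORK/stripped" --strip-components 1  # drops it

ls "$WORK/stripped"
```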
Advantages and Disadvantages
- Advantages: Simple, reliable, and easy to implement if you are familiar with `docker` commands. Offers the option to stop the container for data consistency.
- Disadvantages: May require brief downtime for critical services, and backup and restore times can be long for large volumes. When using this approach for a production ERP system, I preferred to run it during off-peak hours or maintenance windows.
2. Backup with docker cp
This method is used to directly copy files or directories from a running container to the host machine. It can be a quick solution for small and non-critical data.
How It Works
The docker cp command is the fastest way to copy files or directories between a running container and the host machine. However, it carries risks regarding data consistency.
Steps and Code Example
Let's say I want to back up the configuration files in the /app/config directory of a container named my_app_container.
```bash
docker cp my_app_container:/app/config ./backup_configs
```
This command will copy the /app/config directory from my_app_container to the backup_configs directory under the current directory (.) on my host machine.
Advantages and Disadvantages
- Advantages: Fast, no need to stop the container. I use this method when backing up log files or static configurations for my side product's backend.
- Disadvantages: For volumes with live writes, the risk of data consistency issues is very high. A file might be open or mid-write at the moment it is copied.
3. Backup by Mounting the Volume
This method involves finding the actual disk location of the Docker Volume on the host machine and then backing up that directory directly using traditional Linux tools like rsync or tar.
How It Works
First, I find the full path of the volume on the host machine using the docker inspect command. Then, I access that path and use my preferred backup tool.
Steps and Code Example
Let's say I have a volume named my_data_volume.
1. Find the volume's path on the host:

   ```bash
   docker volume inspect my_data_volume
   ```

   Look for the `Mountpoint` field in the output. For example: `/var/lib/docker/volumes/my_data_volume/_data`.

2. Perform the backup using this path:

   ```bash
   # Backup with rsync
   rsync -avz /var/lib/docker/volumes/my_data_volume/_data /path/to/backup/destination/

   # Backup with tar
   tar cvfz /path/to/backup/destination/my_data_volume_$(date +%Y%m%d%H%M%S).tar.gz /var/lib/docker/volumes/my_data_volume/_data
   ```
Advantages and Disadvantages
- Advantages: Direct file system access makes it easy to use standard Linux tools and can simplify backup scripts.
- Disadvantages: Again, there's a risk of data consistency issues, especially for live databases. The `Mountpoint` path may vary depending on the Docker version or the specific installation. I saw this method used on an internal platform at a bank, but I don't particularly prefer it.
Advanced Docker Volume Backup Strategies
For more critical data and complex scenarios, we need more than just file copying. Here, leveraging the features provided by the application itself or the infrastructure is much more sensible.
1. Database-Specific Backup
Databases are the most sensitive components in terms of data consistency. Therefore, it's best to use the database's own backup tools, such as pg_dump for PostgreSQL, BGSAVE for Redis, and mysqldump for MySQL.
How It Works
A consistent backup of the database is taken via the database client or utility program within the container. This backup is usually in an SQL file or a special format. This backup file is then written to a Docker Volume, and that volume can then be backed up using the methods described above.
Steps and Code Example (PostgreSQL)
1. Create a volume to write the backup file to, or use an existing one. Let's say I have a volume named `pg_backup_volume`.

2. Run `pg_dump` inside the PostgreSQL container and write the backup to `pg_backup_volume`:

   ```bash
   docker exec my_postgres_container sh -c 'pg_dump -U myuser mydb > /var/lib/postgresql/data/backups/mydb_$(date +%Y%m%d%H%M%S).sql'
   ```

   Note that the redirection has to happen inside the container via `sh -c`; a bare `docker exec ... pg_dump ... > /path` would make the host shell create the file on the host instead. The `/var/lib/postgresql/data/backups` directory should be mounted to `pg_backup_volume` or be a subdirectory within `my_postgres_container`'s own volume. I usually use a separate backup volume.

   ℹ️ PostgreSQL WAL Bloat and Redis OOM Eviction

   In PostgreSQL, sufficient disk space and appropriate `wal_level` settings are important to avoid WAL bloat while `pg_dump` is running. It happened to me: during a backup, the disk filled to 100% and the WAL files bloated. In Redis, the `BGSAVE` command is memory-intensive, so if the OOM eviction policy (e.g., `allkeys-lru` instead of `noeviction`) is not set correctly, service interruptions can occur due to insufficient memory during the backup. Therefore, it's necessary to define resource limits (`cgroup` limits) well.

3. Back up the volume containing the dump: Now I can safely back up the SQL file inside `/var/lib/postgresql/data/backups` using the `docker run --volumes-from` or `rsync` methods described above, because the SQL file is already a consistent snapshot.
2. Snapshot-Based Backup
If your VPS provider or infrastructure (e.g., LVM, ZFS) supports snapshot functionality, this is one of the fastest and most consistent backup methods.
How It Works
A snapshot creates a copy of the disk partition's state at a specific moment. Since this operation is instantaneous, it allows for taking a consistent backup even of applications like databases.
Steps (Dependent on VPS Provider)
- Take a snapshot from the VPS panel: Most VPS providers (DigitalOcean, Vultr, Linode, etc.) offer the ability to take a server snapshot with a single click. This backs up the entire disk content (including Docker Volumes).
- If you are using LVM or ZFS: If your server uses LVM (Logical Volume Manager) or a file system like ZFS, you can take instant volume snapshots with the `lvcreate --snapshot` or `zfs snapshot` commands.

  ```bash
  # LVM snapshot example
  lvcreate --size 1G --snapshot --name myvol_snap /dev/myvg/my_data_volume

  # Then mount the snapshot and perform the backup
  mount /dev/myvg/myvol_snap /mnt/snapshot
  tar cvfz /backup/myvol_snap.tar.gz /mnt/snapshot
  umount /mnt/snapshot
  lvremove -f /dev/myvg/myvol_snap
  ```
Advantages and Disadvantages
- Advantages: Very fast, provides instant data consistency, usually backs up the entire system.
- Disadvantages: Infrastructure dependency. Not every VPS provider or server may offer this feature. There might be additional costs. When using such snapshots in a customer project, I saw that the recovery time was incredibly fast.
3. Third-Party Tools (Restic/BorgBackup)
For more flexible, encrypted, and deduplication-enabled backups, I like to use tools like Restic or BorgBackup. These can be very efficient, especially for large and frequently changing volumes.
How It Works
These tools divide files into blocks and only back up changed blocks. They encrypt the data and securely send it to remote storage locations (S3, SFTP, Backblaze B2, etc.).
Steps and Code Example (Restic)
1. Install Restic: You can install it on your host machine or in a separate container. I usually prefer installing it on the host and accessing the volume's `Mountpoint`.

   ```bash
   # Installation on Linux
   wget https://github.com/restic/restic/releases/download/v0.16.2/restic_0.16.2_linux_amd64.bz2
   bzip2 -d restic_0.16.2_linux_amd64.bz2
   mv restic_0.16.2_linux_amd64 /usr/local/bin/restic
   chmod +x /usr/local/bin/restic
   ```

2. Initialize a backup repository: This repository can be on a local disk, an SFTP server, or S3-compatible storage.

   ```bash
   # Initialize a repository on local disk
   mkdir -p /mnt/backups/restic_repo
   restic init --repo /mnt/backups/restic_repo  # you will be prompted to set a password
   ```

3. Back up the Docker Volume: Use Restic to back up the volume via its `Mountpoint`.

   ```bash
   # Find the volume's mountpoint
   VOLUME_PATH=$(docker volume inspect my_data_volume --format '{{ .Mountpoint }}')

   # Backup with Restic
   restic backup $VOLUME_PATH --repo /mnt/backups/restic_repo --tag my_data_volume_backup
   ```

   This command encrypts and deduplicates the contents of `my_data_volume` and backs it up to the specified repository. When backing up the disks of self-hosted runners, where I had hit issues like my Astro build running out of memory, the deduplication capabilities of such tools were very useful.
Advantages and Disadvantages
- Advantages: Encryption, deduplication, incremental backups (only changed blocks are transferred), and support for various storage targets.
- Disadvantages: There is an initial setup and learning curve, and the backup repository needs to be managed.
Backup Automation and Monitoring
Manual backups can lead to disaster, especially if you forget or make a mistake. Therefore, automating and monitoring backup processes is vital.
Regular Backup with systemd timer or cron
I generally use systemd timer to run my backup scripts at regular intervals. cron is also a good alternative.
Code Example (systemd timer)
First, create a systemd service file (/etc/systemd/system/docker-volume-backup.service):
```ini
[Unit]
Description=Docker Volume Backup Service
Requires=docker.service
After=docker.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup_docker_volumes.sh
```
Then create a systemd timer file (/etc/systemd/system/docker-volume-backup.timer):
```ini
[Unit]
Description=Run Docker Volume Backup Daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```
Your backup_docker_volumes.sh script will contain the backup commands mentioned above:
```bash
#!/bin/bash
LOG_FILE="/var/log/docker-volume-backup.log"
DATE=$(date +%Y%m%d%H%M%S)
BACKUP_DIR="/path/to/your/backup/destination"

echo "--- Backup started at $DATE ---" >> $LOG_FILE

# Example PostgreSQL backup
docker stop my_postgres_container >> $LOG_FILE 2>&1
docker run --rm --volumes-from my_postgres_container -v $BACKUP_DIR:/backup ubuntu tar cvf /backup/postgres_data_$DATE.tar /var/lib/postgresql/data >> $LOG_FILE 2>&1
BACKUP_STATUS=$?   # capture the backup's exit code before restarting the service
docker start my_postgres_container >> $LOG_FILE 2>&1

# Example Restic backup
# VOLUME_PATH=$(docker volume inspect my_data_volume --format '{{ .Mountpoint }}')
# restic backup $VOLUME_PATH --repo /mnt/backups/restic_repo --tag my_data_volume_backup >> $LOG_FILE 2>&1

if [ $BACKUP_STATUS -eq 0 ]; then
  echo "Backup completed successfully at $DATE" >> $LOG_FILE
else
  echo "Backup failed at $DATE" >> $LOG_FILE
  # A notification mechanism can be added here on error (email, Telegram, etc.)
fi
echo "--- Backup finished at $DATE ---" >> $LOG_FILE
```
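For the notification idea mentioned in the script's comment, here is a hedged sketch of a small helper; the actual Telegram API call is left commented out, and `TOKEN` and `CHAT` are hypothetical placeholders you would supply yourself:

```shell
# Hypothetical notify helper; the real Telegram call stays commented out.
notify() {
  MSG="[docker-volume-backup] $1"
  echo "$MSG"    # also lands in the journal when run under systemd
  # curl -s "https://api.telegram.org/bot${TOKEN}/sendMessage" \
  #   -d chat_id="${CHAT}" -d text="$MSG"
}

OUT=$(notify "Backup failed on $(hostname)")
echo "$OUT"
```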
After creating these files, don't forget to reload systemd and enable the timer:
```bash
sudo systemctl daemon-reload
sudo systemctl enable docker-volume-backup.timer
sudo systemctl start docker-volume-backup.timer
```
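If you prefer `cron` over systemd timers, the equivalent daily schedule is a single crontab entry (03:30 is an arbitrary example time, not from my setup):

```bash
# crontab -e (as root): run the backup script daily at 03:30
30 3 * * * /usr/local/bin/backup_docker_volumes.sh
```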
Backup Verification and Monitoring
Backup isn't just about doing it; verifying that backups actually work is also important. This was one of the biggest lessons for me: "Don't relax just because you have a backup; don't sleep until you've tested the restore."
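As a tiny, Docker-free illustration of what such an automated restore test can look like (all paths are throwaway temp dirs standing in for a real backup layout):

```shell
# Sketch of a restore test: archive sample data, restore it elsewhere,
# and compare. Paths are temporary stand-ins, not a real volume.
set -eu
WORK=$(mktemp -d)
mkdir -p "$WORK/data"
printf 'k1\nk2\n' > "$WORK/data/dump.txt"        # stand-in for real volume data
tar cf "$WORK/backup.tar" -C "$WORK" data        # stand-in for the real backup step

tar tf "$WORK/backup.tar" > /dev/null            # 1) the archive must list cleanly

mkdir "$WORK/restore"                            # 2) extract into a scratch dir
tar xf "$WORK/backup.tar" -C "$WORK/restore"

ORIG=$(find "$WORK/data" -type f | wc -l)        # 3) compare file counts
REST=$(find "$WORK/restore/data" -type f | wc -l)
[ "$ORIG" -eq "$REST" ] && echo "restore test OK"
```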
- Restore Tests: At regular intervals (monthly or quarterly), I take a random backup and try to restore it in a separate environment. This is the only way to know whether my backup strategy truly works.
- Logging and Notifications: I write the output of my backup scripts to `/var/log`. I monitor these logs with `journald` and have a simple mechanism that notifies me via email or Telegram in case of an error.
- Disk Fullness: I regularly monitor how much space the backups occupy on disk, checking usage with `df -h` or `du -sh`. The incident where the disk filled to 100% on April 28th showed me that I needed to take backup size and old-backup cleanup (a retention policy) seriously.
- Impact of `cgroup` limits on Backup Performance: Large backup operations can heavily load server resources (CPU, memory). If soft limits like `cgroup memory.high` are set for the containers or the backup process, performance drops or OOM-killed situations can occur during the backup. With that in mind, I try to schedule backups during the server's quietest hours. Last month, a script I ran with `sleep 360` got OOM-killed, so I switched to a polling-wait approach; errors like that prove once again that scripts need to be tested.
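On the retention-policy point, here is a minimal "keep the newest N archives" sketch; the directory and the keep-count are made-up demo values, and the script creates dummy archives so it can run anywhere:

```shell
# Retention sketch: keep the KEEP newest *.tar files, delete the rest.
# BACKUP_DIR and KEEP are demo values, not from a real setup.
set -eu
BACKUP_DIR=$(mktemp -d)   # in practice: your real backup destination
KEEP=3

# Create five dummy backups whose mtimes increase with the index
for i in 1 2 3 4 5; do
  f="$BACKUP_DIR/postgres_data_2024010${i}.tar"
  : > "$f"
  touch -t "2401010${i}00" "$f"    # mtime: Jan 1 2024, 0i:00
done

# List newest-first, skip the first KEEP, delete the remainder
ls -1t "$BACKUP_DIR"/*.tar | tail -n +"$((KEEP + 1))" | xargs -r rm --

REMAINING=$(ls -1 "$BACKUP_DIR"/*.tar | wc -l)
echo "kept $REMAINING archives"
```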
Conclusion
Docker Volume backup is a topic I constantly grapple with as a system administrator and developer running things on my own VPS. There isn't a single "right" strategy; the best approach depends on the type of data you're backing up, its criticality level, and the resources you have. I've developed the methods and lessons described in this post by distilling my own experiences.
For now, I'm comfortable with these strategies and believe I've minimized the risk of data loss. Do you have similar experiences or different Docker Volume backup strategies? Perhaps in my next post, I'll explain how to automatically send these backups to an S3-compatible storage.