TL;DR: This is the first of a two-part series diving deep into the world of database backups and snapshots. We'll explore how to create snapshots and restore data from them, empowering you to manage your data with ease and efficiency.
Sometime during a conversation with my engineering manager, we talked about the challenges of testing new features, particularly those requiring time-consuming test data setup. He provided a valuable insight from his prior experience: utilizing database snapshots for scenarios where test data preparation is extensive. The key takeaway? For features demanding significant pre-testing setup, maintaining a pre-test data backup streamlines the testing process by eliminating the need for repeated data recreation.
Benefits of Database Snapshots
Database snapshots offer several advantages for testing and debugging workflows:
- Reduced Test Setup Time: By readily providing a pre-populated database environment, snapshots drastically reduce the time spent setting up test data for each test run. This translates to faster feedback cycles and improved developer productivity.
- Improved Test Consistency: Snapshots ensure consistent test environments, minimizing variability and aiding in identifying and isolating genuine issues.
- Simplified Debugging: Having a readily available pre-test database state facilitates debugging by allowing developers to quickly revert to a known baseline and pinpoint the source of errors.
Example Use Case
Consider a scenario where you're developing a new e-commerce feature requiring a complex product catalog with numerous items, categories, and relationships. Manually setting up this data for each test run would be incredibly time-consuming and prone to errors. Instead, by maintaining a snapshot of the fully populated catalog, you can instantly replicate the test environment for each run, significantly streamlining the testing process.
Maintaining Database Backups: Cloning vs. pg_dump
Cloning: A Preliminary Solution
My first solution to this dilemma is by keeping a clone of my database. PostgreSQL's cloning feature enables the creation of exact database snapshots using the following command:
CREATE DATABASE ${mold_database_name} WITH TEMPLATE ${name_of_clone_database};
In order to recreate the desired database, you have to drop and create the desired database with the cloned one as the template. While effective, this approach presented certain drawbacks.
Drawbacks of Cloning
- 
Confusion Between Backups and Originals: Cloned databases reside alongside their original counterparts within the PostgreSQL server, potentially leading to confusion and misidentification. You can create many copies of the database, but you have to get more creative with your naming to avoid confusion on which database is the one you want. This proved troublesome for me as i used this method more often. I always confuse the back up databases from the real ones, and as time passes, it began to ran out of ways to cleverly name my back up databases. 
- 
Docker Volume Issues: In Docker-based environments, accidental volume pruning can inadvertently delete cloned databases, resulting in data loss. One time, I ignorantly pruned my containers and volumes, and I lost my snapshots and clone copies of databases along with my testing progress. It's like executing a command hoping that it would clean out the junk volumes on your local machine but end up loosing hours of work progress instead 😑. 
Discovering these limitations prompted me to find another and much better solution.
A Superior Solution: pg_dump
To address these limitations, I explored the use of pg_dump for database backups. This method offers several advantages:
- Data Integrity Preservation: pg_dump creates independent backup files, ensuring the original database remains unaffected by testing activities.
- Reduced Risk of Data Loss: Backup files reside outside the PostgreSQL server, mitigating the risk of accidental deletion.
- Efficient Data Restoration: Restoring databases from pg_dump files is a straightforward process..
In order to use pg_dump, we need to install postgres client in our local machine: sudo apt install postgresql-client
Normally, you can do this procedure with pg_dump command alone: pg_dump "${database_name}" > "${dump_file}". But to make our lives "easier" we'll create a script for that. Yep i choose to spend 3 hours developing a script for a procedure that would normally take a few minutes, i am that kind of guy.
Bash script for taking database snapshot
For taking snapshot of database/s, i created a bash script to do just that. This would make more sense if used with the method that i talked on my other blog post about structuring your file directory by scope of testing or development.
I'll go over each and every section of the script and explain what it does. I left comments on how to use the script on the snippet to make more sense to the user.
#!/usr/bin/bash
# HOW TO USE?
# 1. just list the database names that you want to back up on the "target_database" array
# 2. then update the value of "parent_directory"
# ---
# list your databases that you want to have a backup here 
# Array of the names of database whose snapshot to be taken.
target_database=("reference_database_1" "reference_database_2" "reference_database_3")
# ---
# This loops through the database names values of `database_name` and creates a snapshot of each databases. It then prints a message on the terminal to indicate of the action was successful or was failed.
backup_database() {
  for database_name in "${target_database[@]}"; 
  do
    dump_file="${database_name}.sql"
    pg_dump "$database_name" > "$dump_file"
    if [ $? -eq 0 ]; then
     echo "Dump for ${database_name} completed successfully. File: ${dump_file}"
    else
      echo "Error: Dump for ${database_name} failed."
    fi
  done
}
# ---
# This refers to the value on where the `.sql` file containing the snapshot of the database will be exported and saved.
parent_directory="home/development-environments/service-1"
# ---
# This section of the scripts inspects if there is a directory named: "database_dump" on under the `parent_directory`. If there is none, then the script creates one. After then, it executes the `backup_database` function.
cd ~/
cd "$parent_directory"
if [ -d "database_dump" ]; then
    echo "The dump folder exists."
    cd database_dump
    backup_database
else
    echo "The dump folder does not exist. Creating dump folder."
    mkdir database_dump
    cd database_dump
    backup_database
fi
The script can be used on ubuntu-linux terminal just by simply executing the command: ./${script_file_name}.sh
Things to remember when using the script
- Update the value of
parent_directoryand,- Update the value of
target_database.
Conclusion
Maintaining an efficient local development environment is crucial for developers. This includes having a reliable method for testing and debugging features that require substantial setup data. Fortunately, database snapshots offer a powerful solution.
The best snapshotting method for your local environment depends on your specific needs and preferences. If you frequently experiment with large amounts of data or require a more comprehensive replica of your database, cloning might be preferable. However, for most scenarios, especially in Docker-based environments, data export offers a simpler, safer, and more efficient solution for managing test and debugging data.
 




 
    
Top comments (4)
Great article! I'm totally with you on the importance of database backups. As it happens, I'm working on a product that streamlines the process - DatabaseBackup.dev. Looking forward to hearing your thoughts!
Thanks for the praise man. I viewed your product and i think that its a great solution for automating database backups. I'll be sure to check out more of it. cheers.
Clear walkthrough! If you're building Laravel on Windows, I’ve been using ServBay, which sets up PHP, databases, and HTTPS in one click—also handles Laravel Artisan and migrations out of the box. Saves a lot of setup headaches.
Great to see someone getting valuable insight on the approach 😁.
Using applications that saves you time, instead of making these manual configs and scripts, totally adds great value to your workflow 👍.