DEV Community

ZINSOU Trinité

Setting up incremental backups with PostgreSQL - Introduction - Part 1

🧭 Introduction

PostgreSQL is a robust, reliable, and widely used database management system in the development world. It adheres to SQL standards, is open source, and most importantly, it keeps our data safe.

But even the best tool can't protect you from a bad command, a bug, or a server crash. That’s where our best ally comes in: the backup.

In this guide, we’ll explore how to configure incremental backups on PostgreSQL.

The goal? Whether you're a developer, administrator, student, or just curious, you'll be able to set up an automatic, efficient, and reassuring backup system.

Here’s what we’ll cover:

  • Understanding what a backup is (and what an incremental backup means 🧠)

  • Discovering PostgreSQL’s built-in tools 🔧

  • Step-by-step configuration of an incremental backup system 🛠️

  • Automating the process with scripts and cron

  • And even testing restoration (because an untested backup is just an illusion of safety 😉)

Ready? Let’s start with the basics. 👇

🧠 Reminder: What’s a backup (and what’s an incremental one)?

Before jumping into the configuration, let’s take a moment to lay the groundwork, because a good understanding is already half the job.

🔐 What is a backup?

It’s simply a copy of data (files, databases, configurations, etc.) stored separately, in a safe place, so we can recover it if something goes wrong: human error, failure, hacking, or just an “Oops, I deleted everything.”

🧱 The 3 main types of backups

There are several ways to back up data. Here are the three most common:

  • Full backup: you copy all files every time. It’s simple but can quickly become heavy on disk space.

  • Differential backup: you copy only the files that have changed since the last full backup. It's lighter than a full backup, but each differential grows larger until the next full backup is taken.

  • Incremental backup: you copy only the files that have changed since the last backup, whatever type it was (full or incremental). It’s the most efficient in terms of size and time, but a bit more technical to manage, because a restore needs the full backup plus every incremental taken after it.
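
To make the difference concrete, here is a small shell sketch. It is a toy, not a backup tool: the directory, the file names, and the two marker files (last_full and last_incremental) are made up for illustration, and the modification times are set by hand with touch -t so the result is predictable.

```shell
#!/bin/sh
# Toy demo: which files would each backup type copy?
# All paths and names below are hypothetical.
work=$(mktemp -d)
mkdir "$work/data"

# Three files, all last modified on Jan 1.
touch -t 202401010000 "$work/data/a.txt" "$work/data/b.txt" "$work/data/c.txt"
touch -t 202401020000 "$work/last_full"          # full backup taken on Jan 2
touch -t 202401030000 "$work/data/b.txt"         # b.txt changes on Jan 3
touch -t 202401040000 "$work/last_incremental"   # incremental backup on Jan 4
touch -t 202401050000 "$work/data/c.txt"         # c.txt changes on Jan 5

# Print the sorted base names of the files selected by the given find tests.
list() {
    find "$work/data" -type f "$@" -exec basename {} \; | sort | paste -sd ' ' -
}

full=$(list)                                          # every file
differential=$(list -newer "$work/last_full")         # changed since the full backup
incremental=$(list -newer "$work/last_incremental")   # changed since the last backup of any type

echo "full:         $full"
echo "differential: $differential"
echo "incremental:  $incremental"
```

On Jan 5, the full backup would copy all three files, the differential would copy b.txt and c.txt, and the incremental would copy only c.txt.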

📌 Why are incremental backups great?

Because they help you save time and space while keeping a history of changes. And the good news is: PostgreSQL supports this kind of backup thanks to a built-in mechanism called Write-Ahead Logging (WAL).

👉 That’s exactly what we’ll use in this guide.

🔍 How does PostgreSQL handle incremental backups?

Let’s start by reviewing the types of backups PostgreSQL offers so we can understand the available mechanisms before diving into incremental backups.

🧬 The different types of backups in PostgreSQL

PostgreSQL offers three backup options 👇:

1. Full backup

This consists of copying the entire database at a specific point in time. It’s a snapshot of the database and allows restoration to its exact state when the backup was taken. Tools like pg_basebackup or manual file copying can be used for this.

2. Logical backup (pg_dump)

This uses the pg_dump tool to export the database as an SQL script. It’s a form of full backup that lets you recover the database independently of PostgreSQL’s physical structure. It’s often used for migrations or targeted backups (e.g., a table or schema). However, it cannot serve as the starting point for incremental backups: WAL files can only be replayed on top of a physical copy of the database, not on a restored dump.

3. Incremental backup (WAL-based)

An incremental backup only saves the changes made since the last backup (whether full or incremental). In PostgreSQL, it relies on archiving the WAL (Write-Ahead Log) files generated by the server. These files record every change applied to the database. By replaying them, you can restore the database to a precise moment in time and achieve a very low RPO (Recovery Point Objective), often just a few minutes. The RPO is the maximum amount of data, measured in time, that you are willing to lose after an incident.

🧪 About incremental backups

PostgreSQL’s key feature for incremental backups is Write-Ahead Logging (WAL).

Each write operation on the database is logged in special log files. These files record changes to the database before they are actually applied, thus ensuring data integrity.

This mechanism, known as WAL, makes it possible to restore the state of the database at a specific point in time, a technique called Point-In-Time Recovery (PITR). This means that after an initial full backup, you can restore the database to any moment after that backup by replaying the WAL files generated since then, stopping at the moment you choose.
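
The idea of "base backup plus replayed changes" can be pictured with a toy shell model. It only mimics the principle: real WAL is a binary format that PostgreSQL itself replays, and the balance, the log entries, and the restore_to function below are all invented for illustration.

```shell
#!/bin/sh
# Toy model of PITR: a base value plus a time-ordered change log.
# Everything here is hypothetical; it is not how PostgreSQL stores WAL.
base_balance=100          # state captured by the "full backup" at 03:00

# Change log: time of the change, then the change applied.
wal_log='09:00 +50
12:30 -20
17:55 -130'

# Replay every change up to and including the target time, then print the result.
restore_to() {
    printf '%s\n' "$wal_log" |
        awk -v target="$1" -v state="$base_balance" \
            '$1 <= target { state += $2 } END { print state }'
}

restored=$(restore_to 17:50)   # stop just before the bad change at 17:55
echo "state at 17:50: $restored"
```

Restoring to 17:50 replays the first two changes and skips the third, which gives 130. Restoring the full backup alone would give 100 and lose both legitimate changes.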

That’s exactly what we’re going to leverage in our configuration with Barman.

⚙️ How WAL works

PostgreSQL always writes WAL, whether or not archiving is enabled. The WAL files live in the pg_wal directory inside the data directory (for example /var/lib/postgresql/{version}/main/pg_wal/ on Debian/Ubuntu). WAL is split into segment files of 16 MB by default: once a segment is full, PostgreSQL switches to a new one.

When WAL archiving is enabled, PostgreSQL runs a custom command (the archive_command setting) each time a WAL segment is completed, allowing centralized WAL archiving and simplified recovery.
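
As a rough illustration of what such a command can look like, here is a minimal shell sketch. PostgreSQL replaces %p with the path of the segment to archive and %f with its file name, and it treats the segment as archived only if the command exits with status zero. The script path, the archive directory, and the function name are assumptions made for this example, and this is not the setup used with Barman later on.

```shell
#!/bin/sh
# Hypothetical archive helper. It could be wired in postgresql.conf as:
#   archive_command = '/usr/local/bin/archive_wal.sh %p %f'
# (%p = path of the WAL segment, %f = its file name; the script path is an assumption)
ARCHIVE_DIR=${ARCHIVE_DIR:-/tmp/wal_archive}   # assumed destination, for illustration

archive_wal() {
    src=$1
    name=$2
    mkdir -p "$ARCHIVE_DIR" || return 1
    # Never overwrite a segment that is already archived.
    if [ -e "$ARCHIVE_DIR/$name" ]; then
        echo "refusing to overwrite $name" >&2
        return 1
    fi
    # Copy under a temporary name, then rename, so a partial copy is never
    # mistaken for a complete segment.
    cp "$src" "$ARCHIVE_DIR/$name.tmp" && mv "$ARCHIVE_DIR/$name.tmp" "$ARCHIVE_DIR/$name"
}
```

A real script would end with archive_wal "$1" "$2" so that its exit status is reported back to PostgreSQL, which retries the same segment later when the command fails.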

Barman leverages these WAL files to manage incremental backups and point-in-time restores.

🚀 What is Barman?

🍹 Introducing Barman

Barman (Backup and Recovery Manager) is a Python-based tool that simplifies backup and recovery operations for PostgreSQL.

It helps with:

  • Scheduling and automating backups

  • Managing full and incremental backups

  • Performing Point-In-Time Recovery (PITR) using WAL files

It’s especially useful for managing multiple PostgreSQL servers and centralizing backup/restoration operations.

We’ll be configuring and using this tool in this article.

🤔 Barman vs pg_dump

Before going further, let’s compare Barman with traditional PostgreSQL backup tools: pg_dump / pg_basebackup.

  • pg_dump / pg_basebackup: These tools perform full backups. pg_dump exports the database’s state as an SQL script, while pg_basebackup takes a physical copy of the data directory. Used on their own, without archived WAL, they are limited when it comes to disaster recovery, since they only restore the state at the point in time when the backup was made.

    Let’s say you run a nightly pg_dump at 3 AM, and an incident happens at 6 PM. You’d lose all changes made since 3 AM: 15 hours of data loss, which is unacceptable in most business scenarios.

  • Barman: Unlike pg_dump, Barman works at the physical level. It handles full and incremental backups, using WAL files to recover any change since the last backup. This allows low RPOs, minimizing data loss to a few minutes.

    Of course, it requires more disk space, but the tradeoff is well worth it for data security and recovery capabilities.
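
To put numbers on the 3 AM / 6 PM scenario, here is a small arithmetic sketch. The 5-minute archiving interval is an assumption chosen for illustration (it could be obtained, for instance, with archive_timeout = 300 in postgresql.conf); the actual figure depends on your configuration and workload.

```shell
#!/bin/sh
# Worst-case data loss window, in minutes, for the two approaches.
backup_hour=3       # nightly pg_dump at 3 AM (from the example above)
incident_hour=18    # incident at 6 PM
dump_loss_min=$(( (incident_hour - backup_hour) * 60 ))

# Assumption for illustration: WAL reaches the archive at least every 5 minutes.
archive_interval_min=5
wal_loss_min=$archive_interval_min

echo "pg_dump only:  up to $dump_loss_min minutes lost"
echo "WAL archiving: up to about $wal_loss_min minutes lost"
```

With these figures, the nightly dump exposes up to 900 minutes of changes, against roughly 5 minutes with WAL archiving.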

📦 Installing Barman

Before using Barman, make sure PostgreSQL is properly installed.

Ubuntu users can follow this tutorial, and CentOS/RHEL users can refer to this one.

Once PostgreSQL is ready, we’ll look into advanced configuration. But first, let’s install Barman:

a. On Ubuntu/Debian

On Debian-based systems like Ubuntu, install Barman from the official repositories:

  1. Install from the repository:

    $ sudo apt update
    $ sudo apt install barman
    
  2. Check the installation:

    Once installation is complete, check that Barman has been installed correctly by running the following command:

    $ barman --version
    

b. On CentOS/RHEL

On CentOS or RHEL, you must first add the official Barman repository before installing it.

  1. Add the official Barman repository:

    $ sudo yum install -y https://dl.enterprisedb.com/barman/barman-2.16.0-1.rhel7.x86_64.rpm
    
  2. Update system packages and install Barman:

     $ sudo yum update
     $ sudo yum install barman
    
  3. Check the Barman installation:

     $ barman --version
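
Whichever distribution you use, a script that depends on Barman can verify that the command is available before going any further. A minimal sketch, where require_cmd is a made-up helper name and not part of Barman:

```shell
#!/bin/sh
# require_cmd is a hypothetical helper name, not part of Barman.
# It succeeds if the given command can be found in PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found: $(command -v "$1")"
    else
        echo "$1 not found in PATH" >&2
        return 1
    fi
}

require_cmd barman || echo "Install Barman before continuing."
```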
    

🧾 Conclusion

In this article, we explored the fundamental concepts needed to implement incremental backups in PostgreSQL, with a focus on the critical role of transaction logs (WAL) and the PITR mechanism. We also saw how Barman, the backup and recovery management tool, fits seamlessly into this process.

The goal was to provide a clear understanding of what incremental backups involve in PostgreSQL before diving into the technical configuration itself. With this foundation, you now have a broader perspective on the challenges and tools required for effective backup management.

In Part 2, we’ll get into the heart of the matter with the complete configuration of Barman from installation to enabling incremental backups. We’ll also cover best practices to ensure the continuity and reliability of your backups.

Stay tuned for the next article to learn how to configure and automate your backups efficiently.
