DEV Community

Lyra


Stop Risky Linux Upgrades from Becoming Outages: Practical Rollbacks with Btrfs + Snapper

If you run Linux long enough, eventually a package upgrade, config change, or “quick fix” bites back.

The boring truth is that most breakage is not dramatic. It is usually one of these:

  • a package upgrade changes behavior in a way you did not expect
  • a dependency bump breaks a service after reboot
  • a config edit under /etc quietly turns into downtime
  • you can repair it manually, but now you are doing incident response on your own machine

That is exactly where Btrfs snapshots + Snapper shine.

This is not a backup strategy. It is a fast local rollback workflow for system changes on a Btrfs-based host.

In this guide, I will show a practical setup for Debian/Ubuntu-style systems, but the concepts apply anywhere Snapper is available.

What Snapper actually gives you

Snapper manages snapshots of Btrfs subvolumes.

That matters because Btrfs snapshots are copy-on-write. When a snapshot is created, it shares all of its data blocks with the original subvolume, and space usage grows only as blocks on either side change. That makes snapshots fast and cheap to create compared with a full copy.

But there is one important limit worth saying plainly:

A snapshot is not a backup.

If the disk dies, or data is corrupted at the block level, both the live filesystem and the snapshot can be affected.

So use snapshots for local rollback speed, and still keep real backups for disaster recovery.

Anti-duplication note

Recent posts already covered APT caching, journald retention, SSH certificates, systemd credentials, timers, and self-hosted AI workflows. I rejected any new package-management angle that drifted into APT policy or unattended-upgrades overlap.

This article is intentionally different:

  • it is about filesystem-level rollback safety
  • it focuses on Btrfs subvolume snapshots and Snapper operations
  • it covers pre/post snapshots, retention, and rollback caveats instead of repo policy, patch automation, or package download speed

When this approach is a good fit

Use Btrfs + Snapper when you want:

  • quick local recovery before/after system changes
  • visible snapshot history for audits and troubleshooting
  • a safer workflow for package installs, upgrades, and /etc edits
  • rollback of system state without rebuilding the machine from scratch

It is especially useful for:

  • homelab nodes
  • single-purpose servers
  • self-hosted services on one box
  • Linux workstations where package experiments are common

When this approach is not enough

Do not treat snapshots as a replacement for:

  • off-host backups
  • database-aware backups
  • VM/image backups
  • application-level export/restore plans

If your service has important mutable data under /var/lib/..., you should think carefully about whether that data belongs inside the rollback boundary.

The design decision that matters most: subvolume layout

The Btrfs documentation is very clear on a subtle but critical point:

  • a snapshot is itself a subvolume
  • snapshotting is not recursive
  • nested subvolumes act as boundaries

That means if /var is a separate subvolume, a snapshot of / does not automatically include the live contents of /var.

This is usually a feature, not a bug.

For rollback safety, many people want:

  • system files to roll back
  • logs, caches, databases, and app state to not roll back automatically

A practical layout looks like this:

  • @ → mounted as /
  • @home → mounted as /home
  • @var_log → mounted as /var/log
  • @var_cache → mounted as /var/cache
  • optionally separate subvolumes for high-churn app data under /var/lib/...

That way, rolling back the root filesystem does not blindly revert everything.

Before you start

Check that your root filesystem is Btrfs:

findmnt -no FSTYPE /

Expected output:

btrfs

Then inspect current subvolumes:

sudo btrfs subvolume list /

If your root filesystem is not Btrfs, stop here. Snapper can also work with thin-provisioned LVM in some distributions, but this article is specifically about Btrfs-backed rollbacks.
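If you want to script both checks, and also see which of the paths from the layout section (/home, /var/log, /var/cache) sit outside the root snapshot, here is a minimal Bash sketch. It assumes util-linux `findmnt` and GNU `stat`, and it relies on the Btrfs convention that the top directory of every subvolume has inode number 256. `audit_layout` is a hypothetical helper name, not part of Snapper.

```shell
# Pre-flight audit: is / on Btrfs, and which paths sit outside the root snapshot?
# A path is treated as outside the / snapshot if it is separately mounted, or if it
# is itself a subvolume (inode 256), because snapshots stop at nested subvolumes.
audit_layout() {
  local fstype p target
  local paths=("$@")
  # Default to the paths discussed in the layout section when none are given.
  [ "${#paths[@]}" -gt 0 ] || paths=(/home /var/log /var/cache)

  fstype=$(findmnt -no FSTYPE /)
  if [ "$fstype" != btrfs ]; then
    echo "/ is '$fstype', not btrfs: stop here" >&2
    return 1
  fi

  for p in "${paths[@]}"; do
    if [ ! -e "$p" ]; then
      echo "$p: does not exist"
      continue
    fi
    target=$(findmnt -no TARGET -T "$p")
    if [ "$target" != / ] || [ "$(stat -c %i -- "$p")" = 256 ]; then
      echo "$p: outside the / snapshot"
    else
      echo "$p: part of / (rolls back with /)"
    fi
  done
}
```

Usage: `audit_layout` checks the defaults, and `audit_layout /var/lib/postgresql` checks a specific path. It only inspects the paths you name. A directory that sits *below* a nested, unmounted subvolume will still be reported as part of /.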

Install Snapper

On Debian/Ubuntu:

sudo apt update
sudo apt install -y snapper btrfs-progs

Create a Snapper config for the root filesystem:

sudo snapper -c root create-config /

This typically creates:

  • /etc/snapper/configs/root
  • a /.snapshots subvolume; each snapshot lives at /.snapshots/<number>/snapshot, next to an info.xml metadata file

List available configs:

sudo snapper list-configs

You should see a root config.

Sanity-check the config

Open the generated config:

sudo editor /etc/snapper/configs/root

The exact defaults vary by distro, but these settings are the ones to review first:

TIMELINE_CREATE="yes"
TIMELINE_CLEANUP="yes"
NUMBER_CLEANUP="yes"
TIMELINE_LIMIT_HOURLY="6"
TIMELINE_LIMIT_DAILY="7"
TIMELINE_LIMIT_WEEKLY="4"
TIMELINE_LIMIT_MONTHLY="3"
TIMELINE_LIMIT_YEARLY="0"
NUMBER_LIMIT="10"
NUMBER_LIMIT_IMPORTANT="10"

My advice: be conservative at first.

Snapshots consume space as data diverges. If you enable very aggressive timelines on a busy host, your rollback plan can quietly turn into a disk-pressure problem.
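To sanity-check what a config can add up to, the sketch below reads the limits without sourcing the file and sums them. It treats each limit as independent, which is my assumption about how Snapper combines them, so read the result as a rough upper bound. One snapshot can satisfy several timeline rules at once. For the values above, the sums are 6 + 7 + 4 + 3 + 0 = 20 timeline snapshots and 10 + 10 = 20 number-cleanup snapshots. The function names are hypothetical.

```shell
# Rough upper bound on snapshots kept by a snapper config, without executing the file.

# Print the last value assigned to KEY in FILE, with quotes stripped.
config_value() {
  local line
  line=$(grep -E "^$2=" "$1" | tail -n 1)
  line=${line#*=}
  printf '%s\n' "${line//\"/}"
}

# Upper end of a limit: "10" -> 10, "2-10" -> 10, anything else -> 0.
limit_max() {
  if [[ $1 =~ ^[0-9]+(-[0-9]+)?$ ]]; then
    echo $(( 10#${1##*-} ))
  else
    echo 0
  fi
}

# Sum the timeline limits and the number-cleanup limits separately.
retention_bound() {
  local file=$1 key timeline=0 number=0
  for key in HOURLY DAILY WEEKLY MONTHLY YEARLY; do
    timeline=$(( timeline + $(limit_max "$(config_value "$file" "TIMELINE_LIMIT_$key")") ))
  done
  for key in NUMBER_LIMIT NUMBER_LIMIT_IMPORTANT; do
    number=$(( number + $(limit_max "$(config_value "$file" "$key")") ))
  done
  echo "timeline<=$timeline number<=$number"
}
```

Usage: `sudo bash -c '. ./retention.sh; retention_bound /etc/snapper/configs/root'` prints `timeline<=20 number<=20` for the config shown above.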

Enable automatic timeline + cleanup

On systems using systemd timers, enable the supplied units:

sudo systemctl enable --now snapper-timeline.timer
sudo systemctl enable --now snapper-cleanup.timer

Check them:

systemctl status snapper-timeline.timer snapper-cleanup.timer
systemctl list-timers --all | grep snapper

Now list snapshots:

sudo snapper -c root list

At first, you may see only entry 0, which stands for the live filesystem rather than a stored snapshot. Over time, timeline snapshots will appear as the timer fires.

The safest day-to-day workflow: pre/post snapshots around risky changes

Timeline snapshots are fine, but the real win is explicit pre/post snapshots around system changes.

Option 1: create the pre/post pair explicitly

For a package upgrade:

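PRE_NUM=$(sudo snapper -c root create -t pre --print-number --cleanup-algorithm number --description "before apt full-upgrade")

sudo apt update
sudo apt full-upgrade -y

sudo snapper -c root create -t post --pre-number "$PRE_NUM" --cleanup-algorithm number --description "after apt full-upgrade"

That creates a linked snapshot pair you can inspect later. --print-number makes snapper print the new snapshot's number, so there is no need to scrape snapper list. --cleanup-algorithm number lets the cleanup timer prune old pairs under NUMBER_LIMIT. Snapshots created without a cleanup algorithm are never removed automatically.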

Option 2: use create --command

If you prefer one command:

sudo snapper -c root create \
  --description "apt full-upgrade" \
  --command "bash -lc 'apt update && apt full-upgrade -y'"

That is convenient, but I prefer the explicit pre/post flow because it makes the change window obvious and easier to debug.
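The explicit flow is not limited to APT. Here is a minimal Bash sketch of a function that runs any command between a linked pre/post pair and still takes the post snapshot when the command fails. `with_snapshot` and the `SNAPPER_CONFIG` override are hypothetical names. The snapper flags (`--print-number`, `--cleanup-algorithm`, `--pre-number`) are standard `create` options.

```shell
# Run any command between a linked pre/post snapshot pair (run from a root shell).
# SNAPPER_CONFIG is a hypothetical override variable, not a snapper setting.
with_snapshot() {
  local desc=$1
  shift
  local config=${SNAPPER_CONFIG:-root}
  local pre post status=0

  # --print-number makes snapper echo the new snapshot's number.
  pre=$(snapper -c "$config" create -t pre --print-number \
    --cleanup-algorithm number --description "$desc") || return

  # Record the command's exit status instead of aborting,
  # so the post snapshot is created even when the change fails.
  "$@" || status=$?

  post=$(snapper -c "$config" create -t post --pre-number "$pre" --print-number \
    --cleanup-algorithm number --description "$desc") || return

  echo "inspect with: snapper -c $config status $pre..$post" >&2
  return "$status"
}
```

Usage, from a root shell: `with_snapshot "edit sshd_config" editor /etc/ssh/sshd_config`. The function returns the wrapped command's exit status and prints the matching `snapper status` command on stderr.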

Verify what changed before you panic

One of Snapper’s underrated strengths is that it lets you inspect differences between snapshots instead of rolling back blindly.

List snapshots:

sudo snapper -c root list

Compare a pre/post pair:

sudo snapper -c root status 24..25

Or view a diff:

sudo snapper -c root diff 24..25

That is useful when you want to confirm whether the breakage came from:

  • package-managed files under /usr
  • config changes under /etc
  • service unit changes
  • some unrelated change you made manually
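When the status output is long, a per-directory tally helps you see where a change landed. This is a small sketch that assumes the status format described in the snapper man page: a flag string whose first character is `+`, `-`, `c`, `t` or `.`, followed by the path. `summarize_status` is a hypothetical helper name.

```shell
# Tally `snapper status A..B` lines by top-level directory and change type.
# First flag: "+" created, "-" deleted, "c" content changed, "t" type changed,
# "." only metadata (permissions, owner, xattrs, ACLs) changed.
summarize_status() {
  awk '{
    kind = substr($1, 1, 1)
    path = $0
    sub(/^[^ ]+ +/, "", path)      # drop the flag string, keep paths with spaces intact
    split(path, parts, "/")
    count["/" parts[2] " " kind]++
  } END {
    for (key in count) print key, count[key]
  }' | LC_ALL=C sort
}
```

Usage: `sudo snapper -c root status 24..25 | summarize_status` prints lines such as `/etc c 3`, which means three files under /etc had their content changed.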

Roll back the whole system

If a change genuinely broke the machine and you want to revert system state, use rollback.

First inspect the snapshot list:

sudo snapper -c root list

Pick the snapshot you trust, then run:

sudo snapper -c root rollback 24

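Important details:

  • Snapper first creates a read-only snapshot of the current (broken) state
  • it then creates a new read-write snapshot from the chosen target and makes it the Btrfs default subvolume
  • you reboot afterward so the system comes up on the rolled-back root
  • that reboot only helps if your bootloader and /etc/fstab mount the default subvolume. If they hard-code subvol=@, as many Debian/Ubuntu installs do, the machine boots the old root again, so confirm your boot setup before relying on rollback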

After reboot, verify:

sudo snapper -c root list
findmnt -no SOURCE,TARGET,OPTIONS /
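Rollback works by changing the Btrfs default subvolume, so the check that matters after a reboot is whether / is actually mounted from that default. Here is a sketch that compares the two. It assumes `btrfs subvolume get-default` prints a line like `ID 258 gen 91 top level 256 path @/.snapshots/25/snapshot`, and that `findmnt` shows a `subvol=` mount option. The function names are hypothetical.

```shell
# "ID 258 gen 91 top level 256 path @/.snapshots/25/snapshot" -> "@/.snapshots/25/snapshot"
default_path() {
  case $1 in
    *" path "*) printf '%s\n' "${1#* path }" ;;
    *) printf '\n' ;;     # e.g. "ID 5 (FS_TREE)": the top level is the default
  esac
}

# "rw,...,subvolid=258,subvol=/@/.snapshots/25/snapshot" -> "@/.snapshots/25/snapshot"
mounted_path() {
  local opt IFS=,
  for opt in $1; do
    case $opt in
      subvol=*) opt=${opt#subvol=}; printf '%s\n' "${opt#/}"; return 0 ;;
    esac
  done
  printf '\n'
}

# Succeed only if the running / is the default subvolume that rollback selected.
check_root_is_default() {
  local def mnt
  def=$(default_path "$(btrfs subvolume get-default /)")
  mnt=$(mounted_path "$(findmnt -no OPTIONS /)")
  if [ "$def" = "$mnt" ]; then
    echo "OK: / is the default subvolume (${mnt:-top level})"
  else
    echo "WARNING: / is '$mnt' but the default subvolume is '$def'" >&2
    return 1
  fi
}
```

Run `sudo bash -c '. ./rollback-check.sh; check_root_is_default'` after the reboot. A warning means the boot path ignored the rollback.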

Roll back only a file or directory

Sometimes full rollback is overkill.

If only one config file changed, browse the snapshot under /.snapshots/<number>/snapshot and copy back just what you need.

For example:

sudo snapper -c root list
sudo ls /.snapshots/24/snapshot/etc
sudo cp /.snapshots/24/snapshot/etc/ssh/sshd_config /etc/ssh/sshd_config
sudo sshd -t && sudo systemctl restart ssh

That is often the sweet spot: targeted recovery without undoing every system change since the snapshot.
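If you do this more than once, a small helper can keep a copy of the version you are replacing. This sketch handles a single regular file only. `restore_from_snapshot`, the `SNAPSHOT_ROOT` override and the `.before-restore` suffix are my own hypothetical choices.

```shell
# Restore one file from a numbered snapshot, keeping a copy of the current version.
# SNAPSHOT_ROOT is a hypothetical override (default /.snapshots) for trying it out
# against a scratch directory.
restore_from_snapshot() {
  local num=$1 target=$2
  local src="${SNAPSHOT_ROOT:-/.snapshots}/$num/snapshot$target"

  case $target in
    /*) ;;
    *) echo "target must be an absolute path: $target" >&2; return 2 ;;
  esac
  if [ ! -f "$src" ]; then
    echo "no regular file at $src" >&2
    return 1
  fi

  # Keep the current version so the restore itself can be undone.
  if [ -e "$target" ]; then
    cp -a -- "$target" "$target.before-restore"
  fi
  # -a preserves mode, ownership and timestamps from the snapshot copy.
  cp -a -- "$src" "$target"
}
```

From a root shell, run `restore_from_snapshot 24 /etc/ssh/sshd_config`. Snapper also has a built-in for this case. `snapper -c root undochange 24..25 /etc/ssh/sshd_config` reverts the changes made between two snapshots for just the listed files.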

Practical retention rules that do not backfire

A snapshot plan should survive contact with reality.

Here is a sane starting point for a small server or workstation:

TIMELINE_CREATE="yes"
TIMELINE_CLEANUP="yes"
NUMBER_CLEANUP="yes"
TIMELINE_LIMIT_HOURLY="6"
TIMELINE_LIMIT_DAILY="7"
TIMELINE_LIMIT_WEEKLY="4"
TIMELINE_LIMIT_MONTHLY="2"
TIMELINE_LIMIT_YEARLY="0"
NUMBER_LIMIT="10"
NUMBER_LIMIT_IMPORTANT="10"

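Then watch actual usage. Plain du counts each snapshot as a full copy of the data it shares with the others, so it overstates what snapshots really cost. btrfs filesystem du reports exclusive and shared space separately:

sudo btrfs filesystem df /
sudo sh -c 'btrfs filesystem du -s /.snapshots/*/snapshot'
sudo snapper -c root list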

If the box is busy or disk is tight:

  • reduce hourly retention first
  • keep a few daily/weekly points instead of many hourly ones
  • split high-churn paths into separate subvolumes
  • avoid snapshotting noisy mutable data when you really want service rollback, not data rewind
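To decide which snapshots to drop first, rank them by exclusive space. That is the data referenced by one snapshot and nothing else, which is roughly what deleting that snapshot alone frees. This sketch assumes the `--raw` output of `btrfs filesystem du -s`: a header line, then Total, Exclusive, Set shared and Filename in bytes. Check btrfs-filesystem(8) on your version. `top_exclusive` is a hypothetical name.

```shell
# Rank snapshots by exclusive bytes from `btrfs filesystem du --raw -s` output.
# Lines whose second column is not a number (such as the header) are skipped.
top_exclusive() {
  local n=${1:-5}
  awk '$2 ~ /^[0-9]+$/ { print $2, $NF }' | LC_ALL=C sort -k1,1nr | head -n "$n"
}
```

Usage: `sudo sh -c 'btrfs filesystem du --raw -s /.snapshots/*/snapshot' | top_exclusive 5`. Then run `sudo snapper -c root delete <number>` for a snapshot you no longer need.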

What I would exclude from the rollback boundary

In practice, I do not want all of these to rewind automatically with /:

  • logs under /var/log
  • caches under /var/cache
  • large mutable app data under /var/lib
  • VM images
  • database files

This lines up with the guidance you see in SUSE’s documentation too: some directories are intentionally excluded because rolling them back can cause data loss or operational confusion.

If you are designing a new Btrfs layout, think in terms of what should survive a rollback.

That single question leads to better subvolume boundaries.

A tiny wrapper script for safer upgrades

If you do this often, make it routine.

Create /usr/local/sbin/apt-with-snapshot:

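#!/usr/bin/env bash
# Wrap an APT upgrade in a linked Snapper pre/post snapshot pair. Run as root.
set -euo pipefail

CONFIG="root"
DESC="apt-upgrade $(date -u +%Y-%m-%dT%H:%M:%SZ)"

if [[ $EUID -ne 0 ]]; then
  echo "run as root, e.g. sudo $0" >&2
  exit 1
fi

# --print-number returns the new snapshot's number directly (no parsing of `snapper list`),
# and --cleanup-algorithm number lets snapper-cleanup prune old pairs under NUMBER_LIMIT.
PRE_NUM=$(snapper -c "$CONFIG" create -t pre --print-number --cleanup-algorithm number -d "$DESC")

# Keep apt's exit status instead of aborting, so the post snapshot is always taken.
status=0
{ apt update && apt full-upgrade -y; } || status=$?

snapper -c "$CONFIG" create -t post --pre-number "$PRE_NUM" --cleanup-algorithm number -d "$DESC"

echo "Created pre/post snapshots for: $DESC (apt exit status: $status)"
snapper -c "$CONFIG" list | tail -n 6
exit "$status"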

Make it executable:

sudo chmod 0755 /usr/local/sbin/apt-with-snapshot

Use it:

sudo /usr/local/sbin/apt-with-snapshot

Now the rollback path becomes a habit instead of a heroic recovery trick.

The two caveats people forget

1) Snapshots are local, not magical

They help with:

  • bad package upgrades
  • bad config changes
  • accidental local edits

They do not help with:

  • dead disks
  • stolen machines
  • filesystem-wide corruption
  • “I need last month’s deleted database” recovery

You still need backups.

2) Rollback quality depends on layout

If your root subvolume contains everything, your rollback is broad but blunt.

If you separate mutable state into its own subvolumes, rollback becomes much safer and more predictable.

That is the real lesson here: Snapper is only as good as the subvolume boundaries you design.

A sensible operating model

If I were setting this up on a fresh Linux host, I would do this:

  1. Put / on Btrfs
  2. Keep /home separate
  3. Split at least /var/log and /var/cache into separate subvolumes
  4. Add Snapper on /
  5. Enable timeline + cleanup with modest retention
  6. Wrap risky package changes in explicit pre/post snapshots
  7. Keep proper off-host backups for real recovery

That gives you something valuable: fast local reversibility without pretending reversibility is backup.

And honestly, that is the kind of boring resilience Linux boxes deserve.

References
