DEV Community

Franck Pachot for YugabyteDB

Posted on

PITR snapshot: an easy flashback / backtrack for application releases

During an application release, you may run some DDL and DML to bring the database from one version to another. If anything fails, for whatever reason, you don't want to leave it in an intermediate state and restore the initial state.

  • This is an In-Place Point In Time Recovery (PITR) and can be achieved by restoring a backup and recovering the transactions until the initial point in time. But doing this in this way can be long, and the fallback procedure is part of the maintenance window, whatever the probability of the decision.
  • Another solution is Transactional DDL where you run everything in a transaction that can be rolled back. PostgreSQL is quite nice there, supporting transactional DDL and being fast to rollback. But having long-running transactions may not be the best idea and you don't really need this because the application is stopped.
  • The third solution is what Oracle calls Flashback Database, and Amazon Aurora calls Backtrack. It is similar to PITR but, rather than a restore + recover, it keeps a snapshot from the start of the maintenance window, easy to revert to.

The snapshot solution depends on the storage. YugabyteDB stores tables and indexes into LSM-Tree which are append-only. This is stored on the filesystem with SST Files, which are written once and never updated. With this technology, the snapshots are easy: nothing to write, just keep the files. It is a bit more complicated to synchronize metadata (the PostgreSQL catalog) that is stored on the yb-master. To automate this, we can create a snapshot schedule for the duration of the maintenance.

Example

I'm starting a RF=3 cluster on my laptop using my ybdemo/docker/yb-lab/ and I set two aliases, ysqlsh, for SQL commands, and yb-admin, for snapshot commands:

alias ysqlsh="\
docker exec -it yb-tserver-0 ysqlsh -h yb-tserver-0 \
" 

alias yb-admin="\
docker exec -it yb-master-0 /home/yugabyte/bin/yb-admin   \
 --master_addresses $(echo yb-master-{0..2}:7100|tr ' ' ,)\
"

sh gen-yb-docker-compose.sh rf3

Enter fullscreen mode Exit fullscreen mode

I create a database with a demo table:

ysqlsh
 create database myapp;
 \c myapp
 create table mytable (a uuid primary key, b bigint unique);
 create extension pgcrypto;
 insert into  mytable select gen_random_uuid(), generate_series(1,1000);

Enter fullscreen mode Exit fullscreen mode

Here are my tables and indexes: mytable with OID 16426 holds the table rows in its primary key structure, indexed on "a" and mytable_b_key is the unique index on "b"

Initial Tables

Take a snapshot

When the maintenance window starts, after stopping the application and before running the changes in the database, I take a snapshot by starting a snapshot schedule with yb-admin create_snapshot_schedule, a frequency and retention, and the name of the database prefixed with ysql.:

Franck@YB:/home/ybdemo/docker/yb-lab $ 

yb_sched=$(
 yb-admin create_snapshot_schedule 60 2880 ysql.myapp |
  tee /dev/stderr |
  jq -r .schedule_id ; 
 ) ; echo "${yb_sched}"

{
    "schedule_id": "320000d2-8424-4930-aeb2-5994d1d5a345"
}
320000d2-8424-4930-aeb2-5994d1d5a345

Enter fullscreen mode Exit fullscreen mode

This will take an hourly snapshot (every 60 minutes) that is kept for two days (2880 minutes). This allows to recover within 24h, without increasing the MVCC retention too much (one hour here). Here, the goal is to cover the maintenance window and be able to recover quickly by applying, at most, one hour of WAL. I need only one snapshot fort that, and I'll delete the snapshot schedule at the end.

The first snapshot is taken in the background. I want to be sure that the snapshot is complete:

Franck@YB:/home/ybdemo/docker/yb-lab $

 while yb-admin list_snapshots | grep CREATING 
 do sleep 1 ; done

Franck@YB:/home/ybdemo/docker/yb-lab $

 yb-admin list_snapshots

yb-admin list_snapshots
Snapshot UUID                           State           Creation Time
c9290df0-dc3c-4819-88d8-8f32372497d1    COMPLETE        2022-06-14 16:35:50.661273
No snapshot restorations

Enter fullscreen mode Exit fullscreen mode

This enables Point In Time Recovery (PITR) and allows to recover to any point in time after this first snapshot. As the goal is to revert back to the start of the maintenance window, I take note of this timestamp:

Franck@YB:/home/ybdemo/docker/yb-lab $ 

yb_pit0=$(
  ysqlsh -qAt -c "
  select to_char(now() at time zone 'utc','YYYY-MM-DD HH24:MI:SS')
  " myapp
 ) ; echo "${yb_pit0}"

2022-06-14 16:36:03
Enter fullscreen mode Exit fullscreen mode

I get it from the database. You can also get it from Linux date +%s but I find it convenient to log it at the start of the DDL + DML that will be run.

Some DDL and DML

I this example I'll do two things:

  • remove half of the rows, as if we purge old data
  • change the primary key from column a to column b which is typically something that requires DDL + DML, though an intermediate table
ysqlsh myapp
 begin transaction;
 delete from mytable where b<=500;
 commit;
 alter table mytable  rename to tmp_mytable;
 create table mytable (a uuid unique, b bigint primary key);
 insert into mytable (a,b) select a,b from tmp_mytable;
 drop table tmp_mytable;
 \q
Enter fullscreen mode Exit fullscreen mode

Now that PITR is enabled, I still see the dropped table and index, because metadata is required in case of restoring a previous state of data. They are still there, but marked as "Hidden":

Modified Tables

The new table, with "b" as the primary key, is the visible one:

Franck@YB:/home/ybdemo/docker/yb-lab $ 

ysqlsh -c '\d mytable' myapp;

              Table "public.mytable"
 Column |  Type  | Collation | Nullable | Default
--------+--------+-----------+----------+---------
 a      | uuid   |           |          |
 b      | bigint |           | not null |
Indexes:
    "mytable_pkey" PRIMARY KEY, lsm (b HASH)
    "mytable_a_key" UNIQUE CONSTRAINT, lsm (a HASH)

Franck@YB:/home/ybdemo/docker/yb-lab $ 

ysqlsh -c 'select count(*) from mytable' myapp

  count
---------
 500
(1 row)
Enter fullscreen mode Exit fullscreen mode

Flashback

Let's say that this change doesn't work as expected, the application test shows that something fails with this new schema. We want to revert it back quickly and open the application again.

Franck@YB:/home/ybdemo/docker/yb-lab $

  yb-admin restore_snapshot_schedule ${yb_sched} "${yb_pit0}"

{
    "snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
    "restoration_id": "727a168b-6666-42a1-b67f-4f5148180f88"
}
Enter fullscreen mode Exit fullscreen mode

The restore happens in the background. I want to be sure that the restore is complete:

Franck@YB:/home/ybdemo/docker/yb-lab $

 while yb-admin list_snapshots | grep RESTORING
 do sleep 1 ; done

 yb-admin list_snapshots

Snapshot UUID                           State           Creation Time
c9290df0-dc3c-4819-88d8-8f32372497d1    COMPLETE        2022-06-14 16:35:50.661273
6c223f6c-b033-41dc-a1d9-404607b79252    COMPLETE        2022-06-14 16:37:31.195093
Restoration UUID                        State
727a168b-6666-42a1-b67f-4f5148180f88    RESTORED
Enter fullscreen mode Exit fullscreen mode

Verification

Now, at this point in time, the new tables are hidden and the old ones are visible:

Restored Table

Both metadata and data have been restored as of the initial point in time:

Franck@YB:/home/ybdemo/docker/yb-lab $ 

ysqlsh -c '\d mytable' myapp;

              Table "public.mytable"
 Column |  Type  | Collation | Nullable | Default
--------+--------+-----------+----------+---------
 a      | uuid   |           | not null |
 b      | bigint |           |          |
Indexes:
    "mytable_pkey" PRIMARY KEY, lsm (a HASH)
    "mytable_b_key" UNIQUE CONSTRAINT, lsm (b HASH)

Franck@YB:/home/ybdemo/docker/yb-lab $ 

ysqlsh -c 'select count(*) from mytable' myapp

  count
---------
 1000
(1 row)
Enter fullscreen mode Exit fullscreen mode

All is back as of 16:36:03 - tables and metadata

Forward and backward

In the list_snapshot above we have seen two snapshots because one was taken before the restore. This means that, in case of doubt, we can roll forward and backward within this timeline:

Franck@YB:/home/ybdemo/docker/yb-lab $

   yb-admin restore_snapshot_schedule ${yb_sched} "2022-06-14 16:37:31"

{
    "snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
    "restoration_id": "ab81a8f8-eee1-410a-9e06-65edb6248160"
}

Franck@YB:/home/ybdemo/docker/yb-lab $

 ysqlsh -c 'select count(*) from mytable' myapp
 count
-------
   500
(1 row)


Franck@YB:/home/ybdemo/docker/yb-lab $

   yb-admin restore_snapshot_schedule ${yb_sched} "2022-06-14 16:37:00"

{
    "snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
    "restoration_id": "edc3981d-fc77-4564-b77a-05d2e05d5483"
}

Franck@YB:/home/ybdemo/docker/yb-lab $

 ysqlsh -c 'select count(*) from mytable' myapp  
                 count
-------
   500
(1 row)


Franck@YB:/home/ybdemo/docker/yb-lab $

   yb-admin restore_snapshot_schedule ${yb_sched} "2022-06-14 16:36:30"

{
    "snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
    "restoration_id": "3c51073f-2d43-418c-b468-29b3c2fee160"
}

Franck@YB:/home/ybdemo/docker/yb-lab $

 ysqlsh -c 'select count(*) from mytable' myapp      

             count
-------
  1000
(1 row)


Franck@YB:/home/ybdemo/docker/yb-lab $

   yb-admin restore_snapshot_schedule ${yb_sched} "2022-06-14 16:36:45"

{
    "snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
    "restoration_id": "d2e137f8-86fa-41bf-ace8-864bf1572fc3"
}

Franck@YB:/home/ybdemo/docker/yb-lab $

 ysqlsh -c 'select count(*) from mytable' myapp

                ERROR:  relation "mytable" does not exist
LINE 1: select count(*) from mytable
                             ^

Franck@YB:/home/ybdemo/docker/yb-lab $

   yb-admin restore_snapshot_schedule ${yb_sched} "${yb_pit0}"

{
    "snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
    "restoration_id": "4ef52933-a40d-4764-8dfd-f01f6298b966"
}

Franck@YB:/home/ybdemo/docker/yb-lab $

 ysqlsh -c 'select count(*) from mytable' myapp

 count
-------
  1000
(1 row)

Enter fullscreen mode Exit fullscreen mode

You see here that I was able to go back to the future, just before the restore, then back in time again, finding when the table had 1000 rows before my delete, and even when the table was renamed but not yet created, and finally back to the initial state as it was my goal.

Delete schedule

When all is validated, there's no need to keep the snapshots. Of course you can, if you want to be able to do point in time recovery, but for this example I need it only to cover the maintenance window.

Franck@YB:/home/ybdemo/docker/yb-lab $

  yb-admin delete_snapshot_schedule ${yb_sched} 

{
    "schedule_id": "320000d2-8424-4930-aeb2-5994d1d5a345"
}

Franck@YB:/home/ybdemo/docker/yb-lab $

 yb-admin list_snapshots

No snapshots
Restoration UUID                        State
727a168b-6666-42a1-b67f-4f5148180f88    RESTORED
ab81a8f8-eee1-410a-9e06-65edb6248160    RESTORED
4ef52933-a40d-4764-8dfd-f01f6298b966    RESTORED
edc3981d-fc77-4564-b77a-05d2e05d5483    RESTORED
3c51073f-2d43-418c-b468-29b3c2fee160    RESTORED
d2e137f8-86fa-41bf-ace8-864bf1572fc3    RESTORED

Enter fullscreen mode Exit fullscreen mode

I have no snapshot anymore. Just some traces of my restores.

And all is clean, no hidden table:

Table without snapshot

Summary

YugabyteDB doesn't support transactional DDL yet, but the need to protect a maintenance window full of DDL and DML is achieved with fast Point In Time Recovery. Doing the same with PostgreSQL is not easy and requires a copy-on-write filesystem which impacts the performance. YugabyteDB storage, in SST Files, allows fast snapshots, on each node. This, combined with YugabyteDB implementation of MVCC (Multi-Version Concurrency Control) with Hybrid Logical Clock timestamps, makes it possible to travel in time, to a state that is consistent across all nodes. The MVCC retention, usually set by timestamp_history_retention_interval_sec for transaction snapshots, is internally increased to cover down to the previous PITR snapshot.

Note: PITR is work in progress, there are currently (2.13) some limitation. They are documented and the roadmap is in PITR: Tracking issue #7120.

Discussion (0)