Matouš Borák for NejŘemeslníci

Posted on Feb 7, 2020

Archiving large MySQL tables (part II - initial migrations)

#mysql #database #rails

This post is a Part II in the series so if you haven't read the introduction in Part I, we'd like to suggest you do so now.

Before fully automating the large tables archival process, we will have to partition the tables manually first and export old data from them. Please bear in mind that this is the most risky and error-prone part of the whole process, so be sure to thoroughly test the procedure in a non-production environment first! Luckily, the gh-ost tool saves us a lot of trouble and we will also give you some pointers to check the expected outcomes throughout the process.

Preparatory thoughts and decisions

For each table, that you want to periodically archive, make a decision on the following options first:

How often you want the data to be split into partitions?

The options are yearly, monthly or weekly (for other options you will have to edit the automation scripts in various places). The best period will depend mostly on how fast the table grows in terms of disk size and number of records. For quickly growing tables, choose weekly or monthly splitting, for less heavy tables you'll be OK with yearly archival.
How much data do you want to retain in the live database?

This decision will be mostly driven by your business needs. How far into history do you generally need to access your data? It's always a matter of a compromise between immediate accessibility of the data (in live tables) vs. table size / disk space. We noticed that we barely ever need to look at data older than one year, so retaining the last 12 months of data, for example, was a safe bet for us.

The convention for partition naming throughout this guide will be:

yYYYY for yearly-split tables
mYYYYMM for monthly-split tables
and wYYYYWW for weekly-split tables.

BTW, the week numbers used for weekly-split tables are another beast. The week number in a given year depends heavily on how you define the first week of a year. This guide supposes all weeks are ISO weeks, where the week 01 in a year is the one where the first Thursday occurs. The weeks themselves start with Mondays. Sometimes, this convention gives you 53 weeks in a year.

In ruby, you'll want to use the %V and %G formatting chars in strftime (e.g. Date.parse("2005-01-01").strftime("%G-%V") => "2004-53"). In MySQL, their equivalents are %v and %x in the DATE_FORMAT function. If you need to work with another week definition, you'll have to edit most of the shell scripts mentioned further on.

Preparing tables for partitioning - 1st migration

Now that we know how we want to split our tables, we can finally get to some db work. However, due to the important caveat regarding unique keys in partitioned tables in MySQL, the process of tables partitioning will most likely have to be a two-step process.

MySQL requires the column, based on which the table partitions are split (e.g. created_at), to be included in all unique keys in the table. As the primary key is, by definition, also a unique key, this rule applies to the primary key, too. In our case we obey the Rails convention of having an auto-incremented integer column named id as the primary key in our tables, so we have to add the created_at column to this primary key somehow.

There are multiple options for doing this:

We might simply try to use the created_at column as the primary key. Unfortunately, this option most probably won't work for several reasons: it would require the timestamp column to be unique across all records which is usually not possible to guarantee, it would complicate JOINing our archived tables with other ones and we would lose quick access to the records using their simple identifier.
We will thus have to create a composite primary key, specifically (id, created_at). A primary key constructed this way will not be very useful in the common sense - it won't help us in any way with SELECT queries (actually, it will make our INSERTs and UPDATEs a tiny bit slower). But it is unique (because id itself is unique) and it will allow us to split tables using created_at.
In theory, we might compose the key in the opposite way (created_at, id), but this will have all the same downsides as the first option, so it's a no go, too.

In conclusion, we will restructure the primary key into a composite key (id, created_at) in each archived table.

Warning: composite primary keys will likely affect your app

Be careful that some frameworks may have problems with tables with a composite primary key, Rails included. Please test that everything works in a sandbox environment before going live with this change. In Rails, we had to reconfigure the primary key in all models that are based on archived tables so that Rails still think they have a simple id primary key:

class ModelWithArchivedTable < ApplicationRecord

  # Override composite primary key (used for partitioning) 
  # as Rails does not support them.
  self.primary_key = "id"

  ...

end

The 1st migration

As stated in Part I, we want all migrations to happen live, without database downtime. We will use the great online DDL tool, gh-ost, for this. Unfortunately, gh-ost does not support directly restructuring the primary key, so we will use a neat trick mentioned in their docs: in the 1st migration we will add a temporary unique key (ADD UNIQUE KEY temp_pk (id, created_at)) and in the 2nd migration we will replace the primary key with this temporary key (DROP PRIMARY KEY, DROP KEY temp_pk, ADD PRIMARY KEY (id, created_at))), besides the partitioning itself. This is why, in effect, we need two separate migrations to partition each of our tables.

Our migrations will heavily depend on using the gh-ost tool. We heartily recommend you read through their docs to understand what command line options are needed, what is the cut-over phase, what internal tables gh-ost uses and when it is OK to delete them, etc. In short, be sure to know what you'll be doing!

We wrote a small shell script, run_ghost.sh, that makes running the gh-ost tool easier for us. It defines the chunk size (how much data will gh-ost process in each iteration), the cut-over file (the cut-over phase will be postponed until you delete this file) and some other options needed for our particular MySQL setup. At the very least, you will want to edit the $database variable in the file. The script expects two arguments (the table name to operate on and the ALTER command) and passes all other arguments back to gh-ost.

The most basic variant of the first migration is just adding the temporary composite key that will later serve as the primary key:

./run_ghost.sh my_huge_table 'ADD UNIQUE KEY temp_pk (id, created_at)' --execute

Before trying this on a huge table, be sure to have at least twice as much free disk space as the size of the table. Gh-ost copies all data from the table into an internal table and even more space is consumed by binary logs.

During the 1st migration, you can freely add more ALTER commands if you need, such as to convert the table from MyISAM to the InnoDB format, update other columns or keys on the run, etc.

Once this migration is finished, including the cut-over, you should end up with the same table, with only the temporary key temp_pk added. Great, the table is now finally ready to be partitioned!

Partitioning the tables - 2nd migration

Partitioning means, above all, defining the particular partitions. We will use the RANGE partition type, so for each partition we will define its name and its upper limit (timestamp). A sample partitioning ALTER command for month-based partitions looks like this:

ALTER TABLE my_huge_table
DROP PRIMARY KEY,
DROP KEY temp_pk,
ADD PRIMARY KEY (id, created_at)
PARTITION BY RANGE(TO_DAYS(created_at)) (
PARTITION m201001 VALUES LESS THAN (TO_DAYS('2010-02-01')),
PARTITION m201002 VALUES LESS THAN (TO_DAYS('2010-03-01')),
...
PARTITION m202001 VALUES LESS THAN (TO_DAYS('2020-02-01')),
PARTITION m202002 VALUES LESS THAN (TO_DAYS('2020-03-01')),
PARTITION future VALUES LESS THAN (MAXVALUE)
);

Let's note a few things here:

The first two DROPs and the following ADD serve as the primary key replacement, as mentioned above.
The next line, PARTITION BY RANGE(...), tells MySQL that we want to split our table according to the created_at temporal column. As this column is a DATETIME column and MySQL only allows integer expressions in the PARTITION BY command, we have to convert the column to an integer using the TO_DAYS function.
Most of the remaining lines define the partitions themselves. Each partition requires a name (here we are using our convention for naming partitions described above) and the upper limit, typically the next day after the partition time period.
The last line defines a somewhat internal partition called future which just serves as a place to hold records with a future date. This partition should be always empty as we don't ever expect to create records with a future date. It's up to you if you really add this partition but the automation script (see Part III) counts on it and checks that it's actually empty.

Preparing partitioning commands

It would be cumbersome to manually write the commands for all partitions needed, especially if you need to cover a long time span. Well, what do we have computers for? It's actually very easy to generate the lines needed. This is an example for the Rails console, with month-based partitions:

>> from = Date.parse('2010-01-01')
=> Fri, 01 Jan 2010
>> to = Date.parse('2020-02-01')
=> Sat, 01 Feb 2020
>> (from..to).select { |date| date.day == 1 }.each do |date| 
  puts "PARTITION #{date.strftime('m%Y%m')} VALUES LESS THAN (TO_DAYS('#{(date + 1.month).strftime("%Y-%m-%d")}')),"
end

PARTITION m201001 VALUES LESS THAN (TO_DAYS('2010-02-01')),
PARTITION m201002 VALUES LESS THAN (TO_DAYS('2010-03-01')),
...
PARTITION m202001 VALUES LESS THAN (TO_DAYS('2020-02-01')),
PARTITION m202002 VALUES LESS THAN (TO_DAYS('2020-03-01')),

Just what we needed! Now we can finally call gh-ost to split our table into partitions:

run_ghost.sh my_huge_table "DROP PRIMARY KEY, DROP KEY temp_pk, ADD PRIMARY KEY (id, created_at) PARTITION BY RANGE(TO_DAYS(created_at)) (PARTITION m201001 VALUES LESS THAN (TO_DAYS('2010-02-01')), PARTITION m201002 VALUES LESS THAN (TO_DAYS('2010-03-01')), …, PARTITION m202001 VALUES LESS THAN (TO_DAYS('2020-02-01')), PARTITION m202002 VALUES LESS THAN (TO_DAYS('2020-03-01')), PARTITION future VALUES LESS THAN (MAXVALUE))" --execute

It will be a very long command and watch for the proper quotes around the ALTER argument. Once this migration finishes, you should have the table partitioned, with all data nicely preserved. You can easily check its new structure with SHOW CREATE TABLE my_huge_table.

Exporting old data from the table

Now that we have our table partitioned, we can finally free up some db disk space by exporting data from the old partitions into files. We will use the mysqldump tool to export the data.

We wrote a small shell script, export-partition.sh, which will export data from the given table and partition name into a gzipped file. You need to tweak at least the $database and $export_dir variables in it.

A sample call of this script can look as follows:

./export-partition.sh my_huge_table m201001

The script will go through the following steps:

It counts the rows in the given partition, to do some basic checks later.
It creates a temporary table named my_huge_table_m201001 with the same schema as the original table (CREATE TABLE … LIKE …). We don't need the partitions structure in the temp table though, that's why the partitions are removed right afterwards (ALTER TABLE … REMOVE PARTITIONING).
It calls the ALTER TABLE … EXCHANGE PARTITION … command to move data out of the original table's partition into the temp table.
It performs a sanity check by counting the number of rows in the temporary table. It must be the same as the original table's record count calculated in step 1.
When all seems OK, it calls the mysqldump tool to export the data from the temp table into a gzipped file. By convention, this file will be located under the $export_dir/my_huge_table directory and will be named my_huge_table_m201001.gz in this case.
When all data is exported, it DROPs the temporary table as well as the exported partition in the original table.
Note that we don't need to use gh-ost here as the potentially blocking ALTER commands run for only fractions of a second in general, and lock only old (and unused) partitions or the temporary table. The whole script (not counting in the data dump) usually takes just a second or two.

Running exports in a loop

You need to repeat the export command for each old partition in each table that you want to archive. This can be easily automated using a for loop in the shell. For example, this command will export all monthly partitions covering the year 2010 from my_huge_table:

for m in 01 02 03 04 05 06 07 08 09 10 11 12; do ./export-partition.sh my_huge_table "m2010$m"; done

Just be sure to stop exporting your data soon enough so that all partitions that are supposed to stay in your live database, are indeed available there…

In case something goes wrong, do not try to kill the process, instead use the ”panic button“ in the script, i.e. uncomment the exit line, save the file and the rest of the loop won't run.

All right! If you got here, you have the most difficult part of the whole process done! Congratulations! 🎉 By now, you should see some db disk space already freed.

How it went at NejŘemeslníci

The whole process can be nicely visualized via disk space monitoring. Below, you can see the monitoring chart captured during the manual partitioning and export stages of one of our largest tables:

The top purple line in the chart is the total db disk space. You can see that we had to substantially (though temporarily) enlarge the disk before starting the procedure. The bottom chart line is the actual db disk space occupied.

The red highlight lines mark the 1st and 2nd migrations during partitioning. During this process, we watched the remaining db disk space closely and, just to be sure we don't run out of space, we periodically deleted some of the MySQL binary logs (these are the little drops in occupied space, marked by the red arrows in the chart).

After running the migrations, we moved on to exporting the data (green highlight). You can see that once we finished the process, we ended up with much less disk space occupied then before, actually it jumped down from ~700GB to about 90GB! Nice!

Also note that the whole process took about three days to complete, ugh! 🤪 But everything went really well, without any downtime or data loss and with only a slightly raised load on our db server. Since this was only a one-time job, it paid really well!

Restoring data from the archive

If you occasionally need to work with some archived data, you can simply load the data dump for the given year / month / week back into the database. It will be loaded as a new table, e.g. my_huge_table_m201001. To load the data, simply call mysql:

gunzip /my_archive/my_huge_table/my_huge_table_m201001.gz | mysql

In Part III, we will automate future table archivals so that we don't have to go through any of this ever again! 😉

DEV Community