ScaleGrid for ScaleGrid

Posted on Jul 11, 2018 • Edited on Dec 26, 2019 • Originally published at scalegrid.io

Data Integrity and Performance Considerations in MySQL Semisynchronous Replication

#mysql #replication #semisynchronous #performance

MySQL semisynchronous replication provides improved data integrity because when a commit returns successfully, it’s known that the data exists in at least two places - the master and its slave. In this blog post, we review some of the MySQL configurations that influence the data integrity and performance aspects of semisynchronous replication. We’ll be using InnoDB storage engine and GTID-based replication in a 3-node replica set (master and 2 slaves), which will ensure there is redundancy in the slaves. This means that if there are issues with one slave, we can fall back on the other.

Read the original post: Data Integrity and Performance Considerations in MySQL Semisynchronous Replication

Configurations Applicable to Both Master and Slave Nodes

innodb_flust_log_at_trx_commit = 1
sync_binlog = 1

These configurations guarantee high durability and consistency settings for data. That is, each committed transaction is guaranteed to be present in binary logs and also the logs are flushed to the disk. Hence, in the case of a power failure or operating system crash, the data consistency of MySQL is always preserved.

Configurations on the Master Node.

rpl_semi_sync_master_wait_for_slave_count:

This option is used to configure the number of slaves that must send an acknowledgment before a semisynchronous master can commit the transaction. In the 3-node replica set, we recommend setting this to 1, so that we always have an assurance that the data is available in at least one slave while avoiding any performance impact involved in waiting for acknowledgment from both the slaves.

rpl_semi_sync_master_timeout:

This option is used to configure the amount of time that a semisynchronous master waits for slave acknowledgment before switching back to asynchronous mode. We recommend setting this to a large number so that there is no fallback to asynchronous mode which then defeats our data integrity objective. Since we’re operating with 2 slaves and rpl_semi_sync_master_wait_for_slave_count is set to 1, we can assume that at least one of the slaves does acknowledge within a reasonable amount of time, thereby minimizing the performance impact of this setting.

Configurations on the Slave Nodes

In the slaves, it’s always important to track two positions very accurately: the current executed position of SQL thread in relay log, and the current position of the IO thread which indicates how far the mater binary file is read and copied to slave. The consequences of not maintaining these positions are quite obvious. If there's a slave crash and restart, SQL thread can start processing transactions from a wrong offset or the IO thread can start pulling data from a wrong position in the master binary logs. Both these cases will lead to data corruption.

it is important to ensure crash-safety of slaves through the following configurations:

Setting relay_log_info_repository to TABLE will ensure the position of the SQL thread is updated together with each transaction commit on the slave. However, it’s difficult to maintain the exact position of IO thread and flush to the disk. This is because reading master binary log and writing to slave relay log is not based on transactions. The impact on performance is very high if IO thread position has to be updated and flushed to disk after each write to slave relay logs. A more elegant solution would be to set relay_log_recovery = ON, in which case, if there’s a MySQL restart, current relay logs will be assumed to be corrupted and will be freshly pulled from the master based on the SQL thread position.

Last but not least, it’s important to note that semisynchronous replication ensures that the data has just ‘reached’ one of the slaves before the master committing the transaction, and does not mean that the transactions are committed on the slave. Hence, it will be good to ensure that the SQL thread works with good performance. In the ideal case, the SQL thread moves hand in hand with the IO thread so we can have the benefit of slave not only receiving the transactions, but also committing them. It’s recommended to go with a multi-threaded slave configuration so that we can get increased slave SQL thread performance. The important configurations for multi-threaded slaves are:

slave_parallel_workers : Set this to > 1 to enable multiple slave SQL thread workers. Based on the number of threads writing on the master, we can decide an optimal number for this so that the slave does not lag.
slave-parallel-type = LOGICAL_CLOCK
slave-preserve-commit-order =1

The above configurations are going to promise parallelism in the slave, while at the same time, preserving the order of transactions as seen on the master.

In summary, by using the above configurations on our MySQL replica set, we’re able to maintain high data integrity along with an optimal performance.

As always, if you have any questions, leave us a comment, reach out to us at @scalegridio on Twitter, or send us an email at support@scalegrid.io.