<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Klim Markelov</title>
    <description>The latest articles on DEV Community by Klim Markelov (@tutelaris).</description>
    <link>https://dev.to/tutelaris</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F612110%2F79d66674-7088-4496-9f9b-fb52c30a3b03.jpeg</url>
      <title>DEV Community: Klim Markelov</title>
      <link>https://dev.to/tutelaris</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tutelaris"/>
    <language>en</language>
    <item>
      <title>What is ACID? Baby don't hurt me. No more.</title>
      <dc:creator>Klim Markelov</dc:creator>
      <pubDate>Sun, 25 Apr 2021 15:31:02 +0000</pubDate>
      <link>https://dev.to/tutelaris/what-is-acid-baby-don-t-hurt-me-no-more-2jlb</link>
      <guid>https://dev.to/tutelaris/what-is-acid-baby-don-t-hurt-me-no-more-2jlb</guid>
      <description>&lt;p&gt;Hello, ultra devs! 👋🏻⚡️&lt;br&gt;
A lot of developers currently use relational databases such as &lt;em&gt;MySQL&lt;/em&gt;, &lt;em&gt;PostgreSQL&lt;/em&gt;, and so on. And they are probably familiar with transactions. But what the &lt;strong&gt;"transaction"&lt;/strong&gt; is and what properties does it have?&lt;/p&gt;

&lt;p&gt;Today I would like to talk about transaction properties that were combined into the cool and scary word &lt;strong&gt;ACID&lt;/strong&gt; and reveal the truth about each letter in this word 👆🏻&lt;/p&gt;
&lt;h3&gt;
  
  
  What is ACID?
&lt;/h3&gt;

&lt;p&gt;ACID is an acronym of the following words:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;Atomicity&lt;/em&gt;;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Consistency&lt;/em&gt;;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Isolation&lt;/em&gt;;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Durability&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If we have a certain operation in a database that satisfies these properties, we can call this operation &lt;strong&gt;"transaction"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Ok, that's nice, but these words are just words without an explanation. Let's dig deeper into each of them! 🚜&lt;/p&gt;
&lt;h3&gt;
  
  
  Atomicity
&lt;/h3&gt;

&lt;p&gt;Atomicity is a nice property that guarantees, that &lt;em&gt;transaction&lt;/em&gt; is &lt;strong&gt;atomic&lt;/strong&gt; and cannot be broken into smaller parts. If something happens inside that &lt;em&gt;transaction&lt;/em&gt;, it will be reverted for all affected fields described inside this &lt;em&gt;transaction&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Imagine we have the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;START&lt;/span&gt; &lt;span class="n"&gt;TRANSACTION&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'title'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'body'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'draft'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'draft'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;posts_statistics&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;posts_amount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'draft'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This &lt;em&gt;transaction&lt;/em&gt; basically inserts the post and increments the &lt;code&gt;post_statistics.posts_amount&lt;/code&gt; field for draft posts.&lt;br&gt;
If an &lt;code&gt;UPDATE&lt;/code&gt; of the &lt;code&gt;posts_statistics&lt;/code&gt; fails, the whole &lt;em&gt;transaction&lt;/em&gt; will be rolled back and the post won't be inserted in the &lt;code&gt;posts&lt;/code&gt; table as well. So, these two operations (actually three, &lt;code&gt;SELECT&lt;/code&gt; is also an operation) are highly coupled and can be considered atomic.&lt;/p&gt;

&lt;p&gt;One more nice property of &lt;strong&gt;atomicity&lt;/strong&gt; is that all clients won't see the change in tables until the &lt;em&gt;transaction&lt;/em&gt; gets committed. So, if we execute the &lt;em&gt;transaction&lt;/em&gt; described above and during the transaction execution, we execute the following query &lt;code&gt;SELECT COUNT(id) FROM posts WHERE status = 'draft'&lt;/code&gt; from another client, we will see a different value than the &lt;code&gt;@counter&lt;/code&gt; variable has.&lt;/p&gt;
&lt;h3&gt;
  
  
  Consistency
&lt;/h3&gt;

&lt;p&gt;This is a very ambiguous and unclear letter. You might hear about it in &lt;strong&gt;CAP theorem&lt;/strong&gt; (leave a comment if you wanna read about &lt;strong&gt;CAP theorem&lt;/strong&gt; and types of consistencies there 😌) or in &lt;strong&gt;Consistent hashing&lt;/strong&gt;. This &lt;strong&gt;Consistency&lt;/strong&gt; is more about data consistency on the application level. Basically, it means, that before committing the transaction system needs to make sure to satisfy all invariants that were set up beforehand and were valid at the moment of starting the transaction.&lt;/p&gt;

&lt;p&gt;Imagine the situation when you buy something on dev.to shop. You reached the point when you need to pay for a cool T-shirt. You filled out all the credentials and clicked &lt;strong&gt;Pay&lt;/strong&gt;. In order to satisfy &lt;strong&gt;consistency&lt;/strong&gt; property, the sum value of your amount of money and dev.to's amount of money should be the same at the moment of the beginning of the transaction and the moment of committing it.&lt;/p&gt;

&lt;p&gt;This sum value can be considered an invariant for this &lt;em&gt;transaction&lt;/em&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Isolation
&lt;/h3&gt;

&lt;p&gt;This property is very useful for the database to prevent so-called &lt;strong&gt;race conditions&lt;/strong&gt;. You may wonder what is "race condition".&lt;/p&gt;

&lt;p&gt;Imagine, you have multiple clients trying to access the same data and change it at the same time. Let's say, they concurrently insert posts using &lt;em&gt;transaction&lt;/em&gt; that is described in the &lt;strong&gt;Atomicity&lt;/strong&gt; chapter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;START&lt;/span&gt; &lt;span class="n"&gt;TRANSACTION&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'title'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'body'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'draft'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'draft'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;posts_statistics&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;posts_amount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'draft'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's say initially we have 42 posts (just because 42). Since they do it at the same time, here is what might happen:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tZIQg5A9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/he0id64gssv60df4d7dj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tZIQg5A9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/he0id64gssv60df4d7dj.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Basically, they added their posts, but the total number of posts should be equal to 44, but they ended up with 43.&lt;/p&gt;

&lt;p&gt;So, in an ideal world &lt;strong&gt;isolation property&lt;/strong&gt; ensures that concurrent transactions look like they get executed one after another or &lt;em&gt;serially&lt;/em&gt;. Therefore it is also known as &lt;strong&gt;serializability&lt;/strong&gt;. In practice, the implementation of this property rarely serializable and some databases use so-called &lt;strong&gt;Weak isolation&lt;/strong&gt; (please, leave a comment if you wanna read about it).&lt;/p&gt;

&lt;h3&gt;
  
  
  Durability
&lt;/h3&gt;

&lt;p&gt;The last, but the very important property is &lt;strong&gt;Durability&lt;/strong&gt;. This property means that once a transaction got successfully committed, it will remain committed and won't be forgotten even if the data gets corrupted due to the power or the database crash.&lt;/p&gt;

&lt;p&gt;We can distinguish durability based on the database architecture:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Single-node database;&lt;/li&gt;
&lt;li&gt;Replication database (you can read more about replication &lt;a href="https://dev.to/tutelaris/introduction-to-mysql-replication-97c"&gt;here&lt;/a&gt;).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For &lt;em&gt;single-node database&lt;/em&gt;, &lt;strong&gt;durability&lt;/strong&gt; makes sure that the data won't be corrupted by writing on the disk and has a recovery mechanism in case of disk corruption.&lt;/p&gt;

&lt;p&gt;For &lt;em&gt;replication database&lt;/em&gt;, &lt;strong&gt;durability&lt;/strong&gt; makes sure that the data got written in a certain amount of replicas, so in case of a crash, it will be propagated to other replicas.&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;Today we've learned about &lt;em&gt;transaction&lt;/em&gt; properties that are aggregated into the &lt;strong&gt;ACID&lt;/strong&gt; word.&lt;/p&gt;

&lt;p&gt;ACID is an acronym of the following words:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Atomicity – ensures that all &lt;em&gt;transactions&lt;/em&gt; are &lt;strong&gt;atomic&lt;/strong&gt; and cannot be broken down into small parts;&lt;/li&gt;
&lt;li&gt;Consistency – ensures that the &lt;em&gt;transaction&lt;/em&gt; does not violate &lt;strong&gt;application invariants&lt;/strong&gt;;&lt;/li&gt;
&lt;li&gt;Isolation – ensures that &lt;strong&gt;race conditions&lt;/strong&gt; do not happen;&lt;/li&gt;
&lt;li&gt;Durability – ensures that once the &lt;em&gt;transaction&lt;/em&gt; is committed, it is committed forever.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it! Thank you for your attention! I hope you liked this post 😌&lt;/p&gt;

</description>
      <category>mysql</category>
      <category>database</category>
      <category>postgres</category>
      <category>transactions</category>
    </item>
    <item>
      <title>Introduction to MySQL replication</title>
      <dc:creator>Klim Markelov</dc:creator>
      <pubDate>Sun, 18 Apr 2021 19:12:19 +0000</pubDate>
      <link>https://dev.to/tutelaris/introduction-to-mysql-replication-97c</link>
      <guid>https://dev.to/tutelaris/introduction-to-mysql-replication-97c</guid>
      <description>&lt;p&gt;Hello, ultra devs! 👋🏻⚡️&lt;br&gt;
Today I would like to talk about Replication and how it works in MySQL.&lt;/p&gt;

&lt;p&gt;Let's start from the beginning. If you know what replication is, feel free to skip the next chapter.&lt;/p&gt;
&lt;h3&gt;
  
  
  What is replication?
&lt;/h3&gt;

&lt;p&gt;Basically, a replication mechanism means storing some copy of data on multiple machines. &lt;em&gt;"How is it different from backups?"&lt;/em&gt; you may wonder. Replication is a bit more than that. When a backup is just a snapshot of the data in a certain time, replication helps you to not only keep the copy of the data in real-time ensuring availability but also facilitate the overload on the database providing both reading and writing to the client and therefore increase throughput. Also, replication helps you to distribute your data storage across the globe decreasing the response time for clients from different parts of the world.&lt;/p&gt;

&lt;p&gt;In this article, we will be talking about &lt;strong&gt;leader-based&lt;/strong&gt; replication, and to continue our journey into this world, it's required to introduce several terms:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;Leader (master)&lt;/em&gt; – part of the replication system eligible to write and read from the database.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Follower (replica)&lt;/em&gt;  – part of the replication system eligible &lt;em&gt;only to read&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Basically, &lt;em&gt;leader&lt;/em&gt; is responsible for all inserts, updates, deletes, and once these changes go through the &lt;em&gt;leader&lt;/em&gt;, it transfers these changes to all its &lt;em&gt;followers&lt;/em&gt; that are responsible for reads and never writes.&lt;/p&gt;

&lt;p&gt;Here is a simple example of single-leader replication with two followers:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2pcwa3h4dw2fdqssu7jo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2pcwa3h4dw2fdqssu7jo.png" alt="Pic.1"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  How it works in MySQL?
&lt;/h3&gt;

&lt;p&gt;Ok, now we know what replication is, but how it actually works in MySQL? How data got transferred from leader to followers and how MySQL keeps the consistency between them?&lt;/p&gt;

&lt;p&gt;Imagine the situation that dev.to is powered by MySQL. You just wrote the article and clicked on the &lt;strong&gt;Publish&lt;/strong&gt; button. Here is what happens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Data comes to the leader and get saved in the database;&lt;/li&gt;
&lt;li&gt;The leader saves data changes in the special file called &lt;strong&gt;binary log&lt;/strong&gt;;&lt;/li&gt;
&lt;li&gt;Follower copies changes in &lt;strong&gt;binary log (binlog)&lt;/strong&gt; to its own file called &lt;strong&gt;relay log&lt;/strong&gt;;&lt;/li&gt;
&lt;li&gt;Follower replays these changes from &lt;strong&gt;relay log&lt;/strong&gt; to its own data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0rjg8cotgwpe2cpesq0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0rjg8cotgwpe2cpesq0.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, to synchronize &lt;strong&gt;relay log&lt;/strong&gt; with leader's &lt;strong&gt;binary log&lt;/strong&gt;, MySQL starts a worker thread that is called &lt;em&gt;I/O follower thread&lt;/em&gt;. It's basically an ordinary client connection to the leader that starts reading its &lt;strong&gt;binary log&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Digging a bit into details, we can ask a quite reasonable question: in which format do &lt;strong&gt;binary log&lt;/strong&gt; and &lt;strong&gt;relay log&lt;/strong&gt; store the data?&lt;/p&gt;
&lt;h3&gt;
  
  
  Replication types
&lt;/h3&gt;

&lt;p&gt;Currently, MySQL supports two types of replication: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Statement-based replication;&lt;/li&gt;
&lt;li&gt;Row-based replication.&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;
  
  
  Statement-based replication
&lt;/h4&gt;

&lt;p&gt;So, as it's clear from the name, &lt;strong&gt;statement-based replication&lt;/strong&gt; records the whole query that changed the state of the data in &lt;strong&gt;binlog&lt;/strong&gt;. So, when a follower decides to synchronize its data with the leader, it copies the query and replays it by executing this query and applying changes to its own data.&lt;/p&gt;

&lt;p&gt;This kind of replication is very easy to implement and has multiple advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It still works when the schema is different on the leader and the follower;&lt;/li&gt;
&lt;li&gt;It's easy to audit and debug;&lt;/li&gt;
&lt;li&gt;It requires not that much disk space.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, with &lt;em&gt;great advantages comes great disadvantages&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Non-deterministic functions&lt;/em&gt;. With non-deterministic functions, it can come up with different data on &lt;strong&gt;leader&lt;/strong&gt; and &lt;strong&gt;follower&lt;/strong&gt;. By non-deterministic functions I mean function like the following: &lt;code&gt;CURRENT_USER()&lt;/code&gt;, &lt;code&gt;RAND()&lt;/code&gt;, &lt;code&gt;IS_FREE_LOCK()&lt;/code&gt; and so on. Executing them first on &lt;strong&gt;leader&lt;/strong&gt; and then on &lt;strong&gt;follower&lt;/strong&gt; can lead to inconsistent data;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Performance penalty&lt;/em&gt;. Imagine if you execute the following query:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;post_statistics&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;statistics_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;posts_amount&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;without having an index on the &lt;code&gt;status&lt;/code&gt; field, and after pressing 'Enter' you just went for a tea (a hypothetical situation, I know, you probably drink coffee). The query got executed on &lt;strong&gt;leader&lt;/strong&gt;, consuming all available CPU, and then &lt;strong&gt;follower&lt;/strong&gt; picked up the baton, copied the query to its &lt;strong&gt;relay log&lt;/strong&gt;, and cheerfully ate all CPU as well;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Triggers and stored routies&lt;/em&gt;. Triggers and stored routines, as well as &lt;em&gt;Non-deterministic functions&lt;/em&gt;, can cause a lot of problems with different side-effects on &lt;strong&gt;leader&lt;/strong&gt; and &lt;strong&gt;follower&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, &lt;strong&gt;Statement-based replication&lt;/strong&gt; has its own advantages, but big disadvantages. Therefore not every database supports this type of replication, but in the case of MySQL, up until &lt;strong&gt;MySQL 5.0&lt;/strong&gt;, this type of replication was the only one supported.&lt;/p&gt;
&lt;h4&gt;
  
  
  Row-based replication
&lt;/h4&gt;

&lt;p&gt;Compare to &lt;strong&gt;Statement-based replication&lt;/strong&gt;, &lt;strong&gt;Row-based replication&lt;/strong&gt; stores the actual data changes in &lt;strong&gt;binary log&lt;/strong&gt;, but not the query. So, when a &lt;em&gt;follower&lt;/em&gt; replicates the data, it doesn't execute the query, but applies the changes to each record it was applied to on &lt;em&gt;leader&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Let's consider the advantages of this approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Less CPU intensive&lt;/em&gt;. If we execute the query, described in &lt;strong&gt;Statement-based replication&lt;/strong&gt; chapter, &lt;strong&gt;follower&lt;/strong&gt; does not replay this query, but copy the value and apply the change to its own data record. So, the query gets executed once and doesn't consume all available CPU;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Helps to find data inconsistency&lt;/em&gt;. Since &lt;strong&gt;Row-based replication&lt;/strong&gt; stores the changes only, when &lt;strong&gt;follower&lt;/strong&gt; replays these changes and tries to apply to the data that exists on &lt;strong&gt;leader&lt;/strong&gt; but doesn't exist on &lt;strong&gt;follower&lt;/strong&gt;, it throws the error. Meanwhile &lt;strong&gt;statement-based replication&lt;/strong&gt; proceeds with what it has and keeps the inconsistency hidden, complicating the ability to find the point of failure and fix it;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;No non-deterministic behavior&lt;/em&gt;. Compare to &lt;strong&gt;statement-based replication&lt;/strong&gt;, if you execute the query that has &lt;em&gt;Non-deterministic functions&lt;/em&gt;, it ends up with the same result for both &lt;strong&gt;leader&lt;/strong&gt; and &lt;strong&gt;follower&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Looks nice, that's what we were expecting from replication, right? But along with the advantages come disadvantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;High disk space consumption&lt;/em&gt;. Yeah, we just talked about &lt;em&gt;Less CPU consuming&lt;/em&gt; for this replication, but the disk space is also very important. Imagine if you have the following statement:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'draft'&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'published'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Considering the fact, that the &lt;code&gt;posts&lt;/code&gt; table has about &lt;em&gt;1.000.000&lt;/em&gt; of &lt;em&gt;Published&lt;/em&gt; posts, this query becomes quite expensive, since it requires storing &lt;em&gt;1.000.000&lt;/em&gt; of changes in &lt;strong&gt;binary/relay log&lt;/strong&gt; files;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Does not allow different schemas&lt;/em&gt;. Sometimes it might be useful when you have different schemas on &lt;strong&gt;leader&lt;/strong&gt; and &lt;strong&gt;follower&lt;/strong&gt; (I don't know about these cases, but they definitely exist). As it was described above, &lt;strong&gt;row-based replication&lt;/strong&gt; throws an error in case of data inconsistency, caused by different schemas;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Statement is not included in the &lt;strong&gt;binary log&lt;/strong&gt;&lt;/em&gt;. It can be not a problem at all until you try to debug or audit what's going on and what query caused damage to your database. &lt;strong&gt;Row-based replication&lt;/strong&gt; makes it hard to analyze.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now we are fluent in replication process language. Everything is clear. Hold on, on the picture of replication example, we can see one &lt;strong&gt;leader&lt;/strong&gt; and two &lt;strong&gt;followers&lt;/strong&gt;. Can we do things differently?🕵️‍♀️ &lt;/p&gt;
&lt;h3&gt;
  
  
  MySQL supported topologies
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Single-leader replication
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F481z07e9as3fxomjn2gu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F481z07e9as3fxomjn2gu.png" alt="Single-leader topology"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This type of replication is the most common one. It is useful when you have a lot of reads but not that many writes. You can distribute users reads among &lt;strong&gt;followers&lt;/strong&gt; load-balancing them and therefore providing better response time. With this replication topology, you can easily add one more &lt;strong&gt;follower&lt;/strong&gt; to it. Also, this topology prevents a lot of problems that multiple leaders topology have (will be described in &lt;em&gt;Leader-leader replication&lt;/em&gt;), since it has only one &lt;strong&gt;leader&lt;/strong&gt;.&lt;/p&gt;
&lt;h4&gt;
  
  
  Leader-leader replication
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuge6q3vos5g12bkaomel.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuge6q3vos5g12bkaomel.png" alt="Leader-leader replication"&gt;&lt;/a&gt;&lt;br&gt;
As it is visible from the picture, this topology involves two &lt;strong&gt;leaders&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This topology is useful when you have different data centers in different locations and you need to provide fast writes to both regions.&lt;/p&gt;

&lt;p&gt;But with this advantage comes a great cost. Suppose, we have a table &lt;code&gt;post_statistics&lt;/code&gt; and you just realized that the number of posts with &lt;strong&gt;Published&lt;/strong&gt; &lt;code&gt;status&lt;/code&gt; is triple more than it's actually written in table and you decide to fix the situation. So, you connect to MySQL and execute the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;posts_statistics&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;posts_amount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;posts_amount&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'published'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Meanwhile, somebody from a different part of the world just published his/her first post (just like me) and triggered the following query to be executed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;posts_statistics&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;posts_amount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;posts_amount&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Suppose, the original number of posts was 10k. Due to replication lag, databases ended up with two different numbers: 30.001 and 30.003. And no errors were thrown. &lt;/p&gt;

&lt;p&gt;This is a big disadvantage of this topology, and in practice, it brings more problems than advantages. But if you ended up with this topology, it's better to add few more replicas to it 😁&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F523qvnf7soz5qxkhxwcu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F523qvnf7soz5qxkhxwcu.png" alt="Leader-leader with replicas"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Active-passive leader-leader replication
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhos5gxyt01k0zihf5ahf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhos5gxyt01k0zihf5ahf.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In &lt;em&gt;Active-passive leader-leader replication&lt;/em&gt; topology one server takes the role of the &lt;strong&gt;leader&lt;/strong&gt; and another one takes the role of the &lt;strong&gt;follower&lt;/strong&gt;. But in comparison to the ordinary &lt;em&gt;leader-follower topology&lt;/em&gt;, it allows you to swap easily the &lt;strong&gt;leader&lt;/strong&gt; responsibility from one server to another. &lt;/p&gt;

&lt;p&gt;It's useful in many cases. For example, if you execute &lt;code&gt;ALTER TABLE&lt;/code&gt; that locks the whole table for reads and writes, you can stop the replication process, easily swap &lt;strong&gt;leader&lt;/strong&gt; responsibility, execute &lt;code&gt;ALTER TABLE&lt;/code&gt; on the passive server, then swap the responsibility back, restore replication process, and execute &lt;code&gt;ALTER TABLE&lt;/code&gt; on the remaining server. It can help you to keep your service alive while executing that expensive query.&lt;/p&gt;

&lt;h4&gt;
  
  
  Other topologies
&lt;/h4&gt;

&lt;p&gt;There are many other topologies that are supported by MySQL:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Replication Ring topology 
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk8x5ik6ozxn5i0d07h7o.png" alt="Replication Ring topology"&gt;
&lt;/li&gt;
&lt;li&gt;Tree of pyramid topology
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwwivdpuap96gfjxrbnp.png" alt="Tree of pyramid topology"&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And many others. You can choose the best topology that fits your purposes or create your own. This is the list 👆🏻 of the most common topologies used in MySQL.&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Replication&lt;/strong&gt; is a mechanism of having a consistent copy of the data storage. It provides:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Data distribution;&lt;/li&gt;
&lt;li&gt;Load balancing;&lt;/li&gt;
&lt;li&gt;Backups;&lt;/li&gt;
&lt;li&gt;High availability and failover.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Leader-based replication&lt;/em&gt; consists of &lt;strong&gt;leader&lt;/strong&gt; and &lt;strong&gt;follower&lt;/strong&gt;. Both of them have their own journal of changes: &lt;strong&gt;binary log&lt;/strong&gt; and &lt;strong&gt;relay log&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;There are two types of replication:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Statement-based replication&lt;/strong&gt;. It's represented in queries itself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Row-based replication&lt;/strong&gt;. It's represented in direct data changes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There are multiple topologies for replication:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Leader-follower topology;&lt;/li&gt;
&lt;li&gt;Leader-leader topology;&lt;/li&gt;
&lt;li&gt;Leader-leader active-passive topology;&lt;/li&gt;
&lt;li&gt;Ring topology;&lt;/li&gt;
&lt;li&gt;Tree of Pyramids topology.
And many specialized topologies together with custom ones.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it! Thank you for your attention! I hope you liked this post 😌&lt;/p&gt;

</description>
      <category>mysql</category>
      <category>replication</category>
      <category>distributedsystems</category>
    </item>
  </channel>
</rss>
