<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nuno Martins</title>
    <description>The latest articles on DEV Community by Nuno Martins (@nmartinsx).</description>
    <link>https://dev.to/nmartinsx</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F579161%2F2ce229ac-de02-43d8-825b-5e369252d1fd.jpeg</url>
      <title>DEV Community: Nuno Martins</title>
      <link>https://dev.to/nmartinsx</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nmartinsx"/>
    <language>en</language>
    <item>
      <title>Choosing the right MySQL key-pair indexing</title>
      <dc:creator>Nuno Martins</dc:creator>
      <pubDate>Tue, 16 Feb 2021 20:29:37 +0000</pubDate>
      <link>https://dev.to/nmartinsx/choosing-the-right-mysql-key-pair-indexing-291m</link>
      <guid>https://dev.to/nmartinsx/choosing-the-right-mysql-key-pair-indexing-291m</guid>
      <description>&lt;p&gt;Recently, I had the task of creating a MySQL table which had the main purpose of looking up records exclusively by a pair of attributes. A pretty simple table, one might say, but I was in doubt of the best way to introduce indexes in order to prioritise performance. I'm aware this could probably be faster running on a NoSQL engine but this is not what this article is about.&lt;/p&gt;

&lt;p&gt;I knew I would have to associate the two attributes but was in doubt of what was better: using SQL's &lt;strong&gt;key-pair indexes&lt;/strong&gt; or a &lt;strong&gt;hash algorithm (md5)&lt;/strong&gt;? Did MySQL use a similar hashing algorithm or maybe even a faster one than md5? I googled around but was actually surprised I didn't find an obvious answer (or maybe I just suck at googling).&lt;/p&gt;

&lt;p&gt;I needed a benchmark that would give me answers so I scripted one thus this article objective is too show some performance results and get some conclusions. For comparison purposes I've also wanted to check the performance of a table with indexes assigned individually to each attribute. In total there are 3 tables to benchmark:&lt;/p&gt;

&lt;p&gt;Table individual_indexes&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE `no_indexes` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `attribute_a` varchar(255) NOT NULL,
  `attribute_b` varchar(255) NOT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `attribute_a` (`attribute_a`),
  KEY `attribute_b` (`attribute_b`)
) ENGINE=InnoDB AUTO_INCREMENT=50001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Table keypair_index&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE `keypair_index` (
  `attribute_a` varchar(255) NOT NULL,
  `attribute_b` varchar(255) NOT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  KEY `keypair_index_index` (`attribute_a`,`attribute_b`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Table hash_index&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE `hash_index` (
  `hash` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `attribute_a` varchar(255) NOT NULL,
  `attribute_b` varchar(255) NOT NULL,
  `created_at` timestamp NULL DEFAULT NULL
  KEY `hash_index_hash_index` (`hash`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script for the benchmark was written in PHP 7.2 (running on Linux with MySQL 5.7.33). First test creates the same number of records on each table, populating the fields with random generated values, being that &lt;code&gt;attribute_a&lt;/code&gt; had four possible different values while &lt;code&gt;attribute_b&lt;/code&gt; had all distinct values. &lt;br&gt;
Please note that for the &lt;code&gt;hash_index&lt;/code&gt; table every record created needs to calculate the hash (with md5 algorithm) consisting of what I have decided to be a string concatenation of &lt;code&gt;attribute_a&lt;/code&gt; and &lt;code&gt;attribute_b&lt;/code&gt;. You can generate this hash in either PHP or MySQL.&lt;br&gt;
The tests were ran multiple times to make sure the results were consistent.&lt;/p&gt;

&lt;p&gt;Benchmark results of time to run multiple Insertions (time in miliseconds):&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zEJyYAHX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f3p3kxjinn1avdehsk8w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zEJyYAHX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f3p3kxjinn1avdehsk8w.png" alt="table1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On a very quick first look it would seem that &lt;code&gt;hash_index&lt;/code&gt; table has the worst performance when inserting records. But the fact is that it offers the best performance if you plan to have a lot more than 50k records on your table! This is because it has the lowest scaling factor (lower is better) meaning it scales better than the other tables and at one point, once it reaches a certain number of records, it will outperform the &lt;code&gt;individual_indexes&lt;/code&gt; table. The &lt;code&gt;keypair_index&lt;/code&gt; table started to look okay when inserting 10k records but it's evident it will become a lot slower than the other tables if more than 50k records exist. The &lt;strong&gt;winner&lt;/strong&gt; here, if you plan to have a lot of records and plan on scaling, is definitely the &lt;code&gt;hash_index&lt;/code&gt; table.&lt;/p&gt;

&lt;p&gt;Next, I wanted to test the read speed of searches, which was for me the most important performance measure. The script attempts to search in the same way as the records creation, with random values of &lt;code&gt;attribute_a&lt;/code&gt; and &lt;code&gt;attribute_b&lt;/code&gt;, with MySQL cache disabled (this is to mimic a production table that is constantly being updated). Please note that for the &lt;code&gt;hash_index&lt;/code&gt; table is also necessary to calculate the md5 hash for every read(SELECT) operation because this is what identifies a record on the table.&lt;/p&gt;

&lt;p&gt;Benchmark results of time to do 10k searches on each table (time in miliseconds):&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gZ3SY-CL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pgremeo6az8erttx2srp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gZ3SY-CL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pgremeo6az8erttx2srp.png" alt="table2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now it is obvious that on every level the &lt;code&gt;hash_index&lt;/code&gt; table offers the best performance and scaling. Again, the lowest scaling factor (lower is better) of &lt;strong&gt;1.05(!)&lt;/strong&gt; indicates how well this will perform with huge tables although it is worth noting that this factor will slightly increase with the amount of records due to the nature of B-tree structure. I don't think I even need to bring the other tables to this discussion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion:&lt;/strong&gt; A hash index will work much faster than a key-pair MySQL index if you want a table that performs a lot of lookups, specifically with two or more attributes combined, and scales much better.&lt;br&gt;
There may be value in using a key-pair index if you plan to do something like a log table that performs very low amounts of search operations and is planned to have a low amount of records, but on the long run I would always go with a hash index.&lt;/p&gt;

</description>
      <category>mysql</category>
      <category>sql</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
