<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Thomas Bracher</title>
    <description>The latest articles on DEV Community by Thomas Bracher (@sadraskol).</description>
    <link>https://dev.to/sadraskol</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1794%2Ffe4f390b-4198-41fe-ad42-5587b2beb21d.png</url>
      <title>DEV Community: Thomas Bracher</title>
      <link>https://dev.to/sadraskol</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sadraskol"/>
    <language>en</language>
    <item>
      <title>Could not create unique index: how to solve duplication errors</title>
      <dc:creator>Thomas Bracher</dc:creator>
      <pubDate>Fri, 23 Sep 2022 08:41:54 +0000</pubDate>
      <link>https://dev.to/doctolib/could-not-create-unique-index-how-to-solve-duplication-errors-44j9</link>
      <guid>https://dev.to/doctolib/could-not-create-unique-index-how-to-solve-duplication-errors-44j9</guid>
      <description>&lt;p&gt;Unique indexes enforce the uniqueness (as their name implies) of one or more columns' values. They can guarantee one-to-one relationships in your model. For instance, let's have a unique shopping cart per user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;=#&lt;/span&gt; &lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;unique&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;uidx_shopping_carts_on_user_id&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;shopping_carts&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;ERROR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;could&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;unique&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="nv"&gt;"uidx_shopping_carts_by_user_id"&lt;/span&gt;
&lt;span class="n"&gt;DETAIL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="k"&gt;Key&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="n"&gt;duplicated&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wow! What was that? It seems that the index can not be created: some existing rows already infringe the constraint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Oopsy
&lt;/h2&gt;

&lt;p&gt;Let's explore the data a bit. Can we spot culprits?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;=#&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;shopping_carts&lt;/span&gt; &lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

 &lt;span class="n"&gt;user_id&lt;/span&gt; 
&lt;span class="c1"&gt;---------&lt;/span&gt;
       &lt;span class="mi"&gt;1&lt;/span&gt;
       &lt;span class="mi"&gt;2&lt;/span&gt;
       &lt;span class="mi"&gt;2&lt;/span&gt;
       &lt;span class="mi"&gt;3&lt;/span&gt;
       &lt;span class="mi"&gt;4&lt;/span&gt;
       &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Oopsy, "2" appears twice. This can be quite an issue. It can go wrong for 3 reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The user can see the wrong cart, leading to items disappearing&lt;/li&gt;
&lt;li&gt;Your data team has to apply heuristics to reunify the duplicates&lt;/li&gt;
&lt;li&gt;Worst case: it leads to complex bugs because everyone suppose shopping carts are unique&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unfortunately, postgres cannot create a unique index as long as the duplicate items exist.&lt;/p&gt;

&lt;p&gt;First things first: how many duplicates do you have? Some self joining request does the trick:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;=#&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt;   &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;shopping_carts&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;select&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt;   &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt;   &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;shopping_carts&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="k"&gt;having&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt;   &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 

 &lt;span class="k"&gt;count&lt;/span&gt; 
&lt;span class="c1"&gt;-------&lt;/span&gt;
 &lt;span class="mi"&gt;76533&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are too many duplicates for a manual cleanup... Create a dashboard to track the duplicate injection rate. If your table already includes a &lt;code&gt;created_at&lt;/code&gt; datetime, you can use a variation of the previous query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;=#&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt;
&lt;span class="o"&gt;-#&lt;/span&gt;   &lt;span class="n"&gt;last_occurrence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="o"&gt;-#&lt;/span&gt;   &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-#&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;shopping_carts&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="o"&gt;-#&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;select&lt;/span&gt;
&lt;span class="o"&gt;-#&lt;/span&gt;   &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="o"&gt;-#&lt;/span&gt;   &lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;last_occurrence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="o"&gt;-#&lt;/span&gt;   &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-#&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;shopping_carts&lt;/span&gt;
&lt;span class="o"&gt;-#&lt;/span&gt; &lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="o"&gt;-#&lt;/span&gt; &lt;span class="k"&gt;having&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="o"&gt;-#&lt;/span&gt;   &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="o"&gt;-#&lt;/span&gt; &lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;last_occurrence&lt;/span&gt;
&lt;span class="o"&gt;-#&lt;/span&gt; &lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;last_occurrence&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

 &lt;span class="n"&gt;last_occurrence&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt; 
&lt;span class="c1"&gt;-----------------+-------&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="mi"&gt;193&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="mi"&gt;484&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have the duplicate rate per day. This is useful to check if our future fixes work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Remove the root cause
&lt;/h2&gt;

&lt;p&gt;Let's consider your endpoint implements the following pseudo-ruby code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;
  &lt;span class="no"&gt;ShoppingCarts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A quick fix is to check the existence of the row before creating it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;
  &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="no"&gt;ShoppingCarts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;user_id: &lt;/span&gt;&lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;exists?&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="no"&gt;ShoppingCarts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way no duplicate creation, right? You deploy and check the numbers the next day:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt; &lt;span class="n"&gt;last_occurrence&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt; 
&lt;span class="c1"&gt;-----------------+-------&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;8&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="mi"&gt;193&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="mi"&gt;484&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait what? Still more duplicates!!!!&lt;/p&gt;

&lt;h2&gt;
  
  
  Transactions
&lt;/h2&gt;

&lt;p&gt;This is a race condition happening. Two requests check existence and insert at the same time.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;request 1&lt;/th&gt;
&lt;th&gt;request 2&lt;/th&gt;
&lt;th&gt;select count(*) from shopping_carts where id = 2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;exists()... == false&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;exists()... == false&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;insert()...&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;insert()...&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Transactions are gonna help, but not directly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;
  &lt;span class="no"&gt;ActiveRecord&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Base&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transaction&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="no"&gt;ShoppingCarts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;user_id: &lt;/span&gt;&lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;exists?&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="no"&gt;ShoppingCarts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You deploy this code and watch its effects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt; &lt;span class="n"&gt;last_occurrence&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt; 
&lt;span class="c1"&gt;-----------------+-------&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;15&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;8&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="mi"&gt;193&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="mi"&gt;484&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem is not solved. Transactions are configured to &lt;code&gt;committed read&lt;/code&gt; isolation by default. If you replay the previous race condition under this isolation, you'll find no violation. Transactions do not avoid race conditions. Fortunately, there are tools to stop the bleeding: locks.&lt;/p&gt;

&lt;p&gt;Locks are usually managed by your database for operations like indexation, etc. Locks are a common way to mutually exclude processes from shared resources. You can also manually lock a row. Mutual exclusion is preventing 2 processes to access the same resource.&lt;/p&gt;

&lt;p&gt;In postgres you can lock rows or tables using the &lt;code&gt;for udpate&lt;/code&gt; keyword in a &lt;code&gt;select&lt;/code&gt; query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;
  &lt;span class="no"&gt;ActiveRecord&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Base&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transaction&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="no"&gt;ShoppingCarts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"FOR UPDATE"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;user_id: &lt;/span&gt;&lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;exists?&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="no"&gt;ShoppingCarts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You deploy and check the numbers the next day:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt; &lt;span class="n"&gt;last_occurrence&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt; 
&lt;span class="c1"&gt;-----------------+-------&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;11&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;15&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;8&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="mi"&gt;193&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="mi"&gt;484&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Oopsy, the lock does not work? Because the &lt;code&gt;select&lt;/code&gt; query returns no rows, no lock is upheld. We need to lock something &lt;strong&gt;that exists&lt;/strong&gt;. Great thing that we have a unique user!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;
  &lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_lock&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="no"&gt;ShoppingCarts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;user_id: &lt;/span&gt;&lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;exists?&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="no"&gt;ShoppingCarts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here the transaction is finally helpful. The lock is released when the transaction is committed. You can deploy and see the result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt; &lt;span class="n"&gt;last_occurrence&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt; 
&lt;span class="c1"&gt;-----------------+-------&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;11&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;15&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;8&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="mi"&gt;193&lt;/span&gt;
 &lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="mi"&gt;484&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No line for 2022-09-17: victory! Next, we remove duplicates and create the unique index.&lt;/p&gt;

&lt;h2&gt;
  
  
  Removing duplicates
&lt;/h2&gt;

&lt;p&gt;Removing duplicates can be a touchy matter. I find the following heuristic true most of the time: the latest modified row is the most updated. Some frameworks automatically generate &lt;code&gt;updated_at&lt;/code&gt; columns. This can be useful to apply the heuristic. We use &lt;code&gt;updated_at&lt;/code&gt; column to discriminate the old rows to delete:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;shopping_carts&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;last_updated_at&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;shopping_carts&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="k"&gt;having&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
  &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
  &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_updated_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way only the last updated rows remain. Let's check if there's no duplicate left:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;=#&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt;   &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;shopping_carts&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;select&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt;   &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt;   &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;shopping_carts&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="k"&gt;having&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="o"&gt;-=&lt;/span&gt;   &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 

 &lt;span class="k"&gt;count&lt;/span&gt; 
&lt;span class="c1"&gt;-------&lt;/span&gt;
     &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nice!&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating the unique index
&lt;/h2&gt;

&lt;p&gt;With no duplicate in the table left, we can create the index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;=#&lt;/span&gt; &lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;unique&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;uidx_shopping_carts_on_user_id&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;shopping_carts&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This time, there's no error. Every user has at most one shopping cart. When attempting to create duplicates, you encounter the following error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;=#&lt;/span&gt; &lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;shopping_carts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;ERROR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;duplicate&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="n"&gt;violates&lt;/span&gt; &lt;span class="k"&gt;unique&lt;/span&gt; &lt;span class="k"&gt;constraint&lt;/span&gt; &lt;span class="nv"&gt;"uidx_shopping_carts_by_user_id"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No process can violate the business rule, even under race conditions! Great value for the buck if you ask me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I hope you learned some sql knowledge. Here are the main takeaways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Learn some sql syntax to navigate easily in your data&lt;/li&gt;
&lt;li&gt;Columns &lt;code&gt;created_at&lt;/code&gt; and &lt;code&gt;updated_at&lt;/code&gt; managed by your ORM are great tools for maintenance&lt;/li&gt;
&lt;li&gt;Transactions do &lt;strong&gt;not&lt;/strong&gt; avoid race conditions&lt;/li&gt;
&lt;li&gt;Introduce unique constraints before you feed the data (this applies to other constraints)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I had to solve a similar problem at work and &lt;a href="https://refer.hellotrusty.io/YP1WLAS9Lm"&gt;we are hiring talented engineers to solve this kind of problems&lt;/a&gt;.&lt;br&gt;
Take care.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Photo by &lt;a href="https://unsplash.com/@stefanbc?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Stefan Cosma&lt;/a&gt; on &lt;a href="https://unsplash.com/s/photos/duplicate?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Unsplash&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
