<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Haokang Den</title>
    <description>The latest articles on DEV Community by Haokang Den (@hden).</description>
    <link>https://dev.to/hden</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F45081%2F3fd103fd-2015-46e8-9c99-49332a6cf8ed.jpeg</url>
      <title>DEV Community: Haokang Den</title>
      <link>https://dev.to/hden</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hden"/>
    <language>en</language>
    <item>
      <title>Thoughts about primary key</title>
      <dc:creator>Haokang Den</dc:creator>
      <pubDate>Sun, 19 Nov 2017 02:40:22 +0000</pubDate>
      <link>https://dev.to/hden/thoughts-about-primary-key-91b</link>
      <guid>https://dev.to/hden/thoughts-about-primary-key-91b</guid>
      <description>

&lt;p&gt;Many of us who manages databases needs to create identities for their table. As the business grows the table schema evolves with it, but the identities often tend to stays longer.&lt;/p&gt;

&lt;p&gt;As someone who interacts with data/database on a daily basis, I spend a lot of time thinking about how to make them better.&lt;/p&gt;

&lt;p&gt;Let's discuss one of the pitfalls by creating two tables for an growing hypothetical e-commerce startup: &lt;code&gt;products&lt;/code&gt; and &lt;code&gt;customers&lt;/code&gt;. It's often tempting to model the identities using the built-in mechanisms, for example:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- MySQL --&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;INT&lt;/span&gt; &lt;span class="n"&gt;AUTO_INCREMENT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- 🌶&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="n"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="n"&gt;INT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;INT&lt;/span&gt; &lt;span class="n"&gt;AUTO_INCREMENT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- 🌶&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="n"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;I'm skipping over a few details, but I hope you will get the gist. Each products and customer has an numerical ID and one or more associated columns. There are two problems in the above code, which I’ve marked with a hot pepper 🌶.&lt;/p&gt;

&lt;p&gt;Let's add more detail to our hypothetical startup. Let's say that we used to create new products and customers by an internal admin system, perhaps a simple web form.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Tuuap_dt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://gist.githubusercontent.com/hden/757ab9d5715857cbe1bdb4f46c4a9148/raw/32ec779ddd3e2ab53fe27f8d64bf4c248f25d543/architecture_01.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Tuuap_dt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://gist.githubusercontent.com/hden/757ab9d5715857cbe1bdb4f46c4a9148/raw/32ec779ddd3e2ab53fe27f8d64bf4c248f25d543/architecture_01.svg" alt="Original"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's very likely that the database ids are used for internal communications. After all, names just aren't precise enough. We might even exposed them in the API endpoints, for example &lt;code&gt;GET /products/5487&lt;/code&gt; 🌶.&lt;/p&gt;

&lt;p&gt;As the business thrives, our startup needs to hire more people and start buying advanced software to manage it's growing operation. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sales department might want to use Salesforce as it's primary CRM.&lt;/li&gt;
&lt;li&gt;Financial department(s) might want to use some kind of inventory management system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So our operation changed significantly, and we must integrate with external systems each with it's own database and IDs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NX34TyTI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://gist.githubusercontent.com/hden/757ab9d5715857cbe1bdb4f46c4a9148/raw/c352e6d5ab95d76b290f12562b3e5adfac3dfcea/architecture_02.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NX34TyTI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://gist.githubusercontent.com/hden/757ab9d5715857cbe1bdb4f46c4a9148/raw/c352e6d5ab95d76b290f12562b3e5adfac3dfcea/architecture_02.svg" alt="Enterpriseimplementations Architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's quite challenging to keep different databases in sync, especially when there is no common identity key.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using multiple keys simultaneously is quite confusing for humans.&lt;/li&gt;
&lt;li&gt;Using foreign system IDs as primary key is not a option since systems might be replaced once in a while.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If we got the chance to really think about the requirement of a common identity key, perhaps in a cozy &lt;a href="https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/HammockDrivenDev.md"&gt;hammock&lt;/a&gt;, what makes an ID good?&lt;/p&gt;

&lt;p&gt;Let's start with the &lt;code&gt;CREATE TABLE&lt;/code&gt; statement. Regardless of specific database implementations &lt;code&gt;AUTO_INCREMENT NOT NULL PRIMARY KEY&lt;/code&gt; usually indicates that the column is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unique&lt;/li&gt;
&lt;li&gt;Not null&lt;/li&gt;
&lt;li&gt;Indexed&lt;/li&gt;
&lt;li&gt;Sequential&lt;/li&gt;
&lt;li&gt;Depends on internal states of the database 🌶&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The last part is not very obvious, but it have caused scaling problems in &lt;a href="https://blog.twitter.com/engineering/en_us/a/2010/announcing-snowflake.html"&gt;Twitter&lt;/a&gt;. In our case, we might want to generate our own primary key using one of the alternatives algorithms.&lt;/p&gt;

&lt;p&gt;Several alternatives have been proposed, including &lt;a href="https://blog.codinghorror.com/primary-keys-ids-versus-guids/"&gt;GUID/UUID&lt;/a&gt; and their semi-sequential counterpart &lt;a href="http://docs.datomic.com/identity.html#sec-6"&gt;Squuids&lt;/a&gt;/&lt;a href="https://github.com/ericelliott/cuid"&gt;CUID&lt;/a&gt;. Our initial table design can be modified by removing the &lt;code&gt;AUTO_INCREMENT&lt;/code&gt; statement. Using on of the algorithms above, we can easily generate the primary key in the API server or one of the external system.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- MySQL --&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- or VARCHAR(36), depends on your implementation&lt;/span&gt;
  &lt;span class="n"&gt;urn&lt;/span&gt; &lt;span class="n"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="n"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="n"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;If exposing primary keys might raise some security concern. We can add a second unique resource name, perhaps in the &lt;a href="https://en.wikipedia.org/wiki/Uniform_Resource_Name#URIs.2C_URNs.2C_and_URLs"&gt;URN&lt;/a&gt; format just like Amazon did for &lt;a href="http://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html"&gt;AWS&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;References:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mike Bostock's &lt;a href="https://medium.com/@mbostock/what-makes-software-good-943557f8a488"&gt;What Makes Software Good?&lt;/a&gt; (for using 🌶 as a marker)&lt;/li&gt;
&lt;li&gt;Tom Harrison Jr's &lt;a href="https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439"&gt;UUID or GUID as Primary Keys? Be Careful!&lt;/a&gt; (for proposing a secondary key for public API)&lt;/li&gt;
&lt;/ul&gt;


</description>
      <category>database</category>
      <category>architecture</category>
      <category>technicaldebt</category>
    </item>
  </channel>
</rss>
