Changing the schema for large relational databases costs companies millions of dollars every year. Relational databases are highly intertwined, so one small change can have unexpected, cascading effects.
The foundation for a table in a relational database is the primary key. Because of its importance, we have to give special consideration to how we select our primary keys.
In this article I’ll discuss why primary keys are important, different types of keys you can use, and a primary key with values that are guaranteed to have never, ever been used in the entire universe (with a 1/10^37 chance of error).
Table of Contents:
- Understanding Keys in SQL
- Creating a Primary Key with PostgreSQL
- A Universally Unique ID? Intro to UUIDs
- Creating a UUID Primary Key Using
uuid-osp- PostgreSQL Example
- Alternative to UUID - Auto-Increment Primary Key
- UUID vs Auto-Increment Primary Key
Understanding Keys in SQL
Relational databases have been around for almost 50 years. In its simplest form, the relational data model is a collection of tables containing rows of data. These databases receive the “relational” qualifier because “related” tables are connected together by keys.
There are 2 types of keys in SQL: primary and foreign.
Primary keys uniquely identify rows in a table. By placing a
PRIMARY KEY constraint on a specific column in a table, the database engine will guarantee that no row can be entered with the same key as an existing row. In the next section I cover how to create a primary key in Postgres.
A foreign key is a value in a second table that references the primary key for the first table. This primary/foreign key relationship is the glue that ties relational tables together.
In the relationship diagram above,
id in the
Pets table is the primary key. The
Pet Owners table creates a foreign key,
pet_id, that is refers is the same value as
id in the
Pets table. Because these two values are the same, now the
Pet Owners can be connected with a
Natural vs. Surrogate Keys
Primary keys can have either natural or surrogate key values. Natural keys are associated with a business context. For example, if you used a dogs breed as the primary key for the
Now, that would be inadvisable since many pets can be the same breed.
A surrogate key is a generated unique identifier with no additional meaning. The two primary keys we discuss later in this post, UUIDs and auto-incremented, fall into this category. In our
Pets table, a surrogate key would be a better option because it's guaranteed to be unique.
Syntax for Creating a Primary Key with PostgreSQL
The primary key is defined when you create the table:
CREATE TABLE PINK_FLOYD ( id INTEGER PRIMARY KEY, album_name TEXT NOT NULL, release_date DATE NOT NULL… );
In the background, Postgres implements primary keys by combining the
Creating a composite primary key
A primary key can also be created by combining two (or more) columns. In this case, the primary key is declared at the bottom of the
CREATE TABLE command, rather than at the top.
CREATE TABLE PINK_FLOYD ( id INTEGER, album_name TEXT NOT NULL, release_date DATE NOT NULL, PRIMARY KEY (id, album_name,) );
This type of concatenation is useful for organizing your database when you can be sure that the combination of the natural keys will be unique.
If you were at a very small company, for example, then you could use a combination of your employee’s first and last names as a primary key. However, this may not work at a larger company which employs more than one John Smith.
Primary keys guarantee uniqueness for each row of data within the same table. But what if we want a primary key that is unique across tables and even _ between databases? _
A Universally Unique ID? Intro to UUIDs
Universally Unique Identifiers, or UUIDs, were first introduced in 1990 by Apollo Computers. They were later standardized by the Open Software Foundation (OSF) as an ID that guaranteed “uniqueness across space and time.”
UUIDs are also referred to as GUIDs (Globally Unique Identifiers), which comes from Microsoft’s name for them.
UUIDs, or GUIDs, are 128-bit values, represented as 32 base-16 digits. But there are different methods for generating these 32 digits.
UUID_v4 vs UUID_V1
In UUID version 1 (UUID_v1), the values are generated by the computer system using a combination of the current time and the computer's MAC address (shown in the example above).
Alternatively, UUID_v4 generates the 32 random digits using a random number generator. This is the most commonly used UUID.
There is also a UUIDV5, but it is not random, so you have to make sure it’s unique by watching the inputs.
Creating a UUID Primary Key Using uuid-osp - PostgreSQL Example
Installing a SQL client
The first step is downloading a SQL client to run your SQL commands. The Arctype SQL client allows anyone to easily connect to their database and run queries.
First check if you have the extension already installed by running
SELECT * FROM pg_extension.
uuid-ossp is not in the returned list, first you would run the
make install commands to build from the source distribution.
make runs a compiling program, and
make-install copies the compiled files and sends them to their locations for you.
Now we caninstall the
uuid-ossp module by running:
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
Adding UUID to a table - Syntax
Let’s say we wanted to create a table to store all of Pink Floyd’s albums. This is how we could add a UUID to that table using
CREATE TABLE PINK_FLOYD ( id uuid DEFAULT uuid_generate_v4 (), album_name VARCHAR NOT NULL, release_date DATE NOT NULL, PRIMARY KEY (id) );
We declare the primary key at the bottom, like when we were combining two tables to create one.
Alternative to UUID - Auto-Increment Primary Key
What if we don't need to be unique across all space and time? Another type of surrogate key is the auto-increment primary key.
An auto-increment primary key increases the key value by 1 for every new row that is added. MySQL uses an
AUTO_INCREMENT keyword, but in Postgres we create auto-incrementing keys by setting the data type as
CREATE TABLE PINK_FLOYD ( id SERIAL PRIMARY KEY, album_name TEXT NOT NULL, release_date DATE NOT NULL… PRIMARY KEY (id) );
Now, every time you
INSERT, Postgre will make a new, auto-incremented key for your insertion. For example, if you wanted to add data for a new album, you could run the following command without having to specify the id in
INSERT INTO PINK_FLOYD (album_name, release_date) VALUES (“The Wall”, 1979-11-30);
UUID vs Auto-Increment Primary Key
Auto-incremented primary keys increase read performance because they’re simpler than UUIDs and faster to scan, but they have some major downsides:
- Revealing potentially sensitive data
- Duplicate keys across databases
Auto-incremented keys reveal how many rows your table has, and you may not want that information to be public-- and sometimes you might want primary keys displayed in your URLs to keep them dependable.
Auto-incremented primary keys also have the potential to be duplicated across databases. This could cause a problem in distributed computing systems that require a key to be unique across each node.
But are UUIDs really unique?
You might say ok, but couldn't a UUID generate create the same primary key within a table? Since UUIDs are randomly generated, that’s technically possible.
Possible, but the probability is infinitesimally low. If you had 103 trillion version-4 UUIDs, the probability of finding a duplicate is one in a billion. At a rate of 1 UUID per second, it would take 3 million years to get to 103 trillion IDs.
UUIDs are sufficiently unique or almost every use case, but they can be a lot to store. If your table is too large to fit into the cache all at once, your UUID might be outside the cache if you’re selecting by UUID, slowing everything down instead of just moving to the next id.
So should I use a UUID?
If security is your top concern and you have a lot of storage, UUIDs are a good strategy. If you’ve got a large, but relatively unexposed database, you may want to opt for auto-incrementation.
In summary, primary keys uniquely identify rows in a table.
You can create a primary key in a couple of different ways, including by making a composite primary key.
If you want to use UUIDs, you’ll need to install
uuid-ossp in postgres and then include your UUID version in your
CREATE TABLE command.
There are pros and cons to UUIDs -- they’re very good for security, especially if your ids are in your URLS, but if your database is huge, they can slow it down in comparison to auto-incremented ids.
But can you put a price tag on being unique? Go forth and be the proud owners of a 128 digit number that no one else in the world has! (probably)
Selecting a primary key can be a big decision, but choosing a SQL client doesn't have to be. Arctype is the free, collaborative sql editor that makes it easier to work with your databases.
Top comments (1)
Great and comprehensive insights!!
I never really thought about UUIDs until I realized that incremental IDs can be potential security threats.. Not so long ago I was reading this piece on Parler hack (heard it first from @joshpuetz in the last episode of DevNews)
full piece on cybernews.com
Blew my mind and completely crushed my soul!