DEV Community

Cover image for UUID in PostgreSQL

UUID in PostgreSQL

Franck Pachot on January 27, 2024

This month's PGSQL Phriday #015 discusses using UUID in PostgreSQL. I have previously written on this topic (UUID or cached sequences?) , and I won...
Collapse
 
rponte profile image
Rafael Ponte

Thanks for sharing it, Franck. Excellent article 👏🏻

I'm not a database specialist like you, but as an application developer, I try to take the best out of the databases I work with as much as possible. I've been studying this theme since I wrote an article about it (in pt_BR 🇧🇷). So, I could sum up the "UUID as a PK" issue with RDBMS as follows:

  1. The more random the key is, the worse it is for the database performance, especially with MySQL and MSSQL Server that use clustered indexes by default (that's why UUIDv4 might be terrible);

  2. If your dataset fits into memory (buffer cache and OS cache), I would say it does not matter if you're using UUIDv4 or a sequential key;

  3. I think this issue we'll be in the past as soon as UUIDv7 becomes a standard;

Well, although I still favor and recommend using sequential keys (int/bigint) over UUID as PKs, I also understand it will not be a big issue for the majority of microservices out there since their data and workload were partitioned already 😊

Collapse
 
soanvig profile image
Mateusz Koteja

Even more interesting is using uuidv7 (draft RFC) generated in the application. The best thing is that it is incremental unlike uuidv4 you use typically. Uuidv7 is compatible with v4, that is postgres can store it as uuid type, and you can sort by it easily.

Collapse
 
brianl profile image
Brian Lim

One other consideration that goes all the way to the frontend is Relay and GraphQL. In particular, Relay expects a globally unique identifier graphql.org/learn/global-object-id... . Of course generally the object type (table) is automatically appended to the ID so there's no need to specifically choose UUID. But supposedly you can use both in the same table (have an autoincrementing sequence plus another column with UUID and a non-clustered index). That is probably the best of both worlds.

Collapse
 
rponte profile image
Rafael Ponte

@franckpachot, could you talk a little bit more about those 2 statements? Are they related? I got a bit confused by them.

It's also important to keep in mind that certain database limitations, such as PostgreSQL declarative partitioning or Citus sharding, may limit the possibilities of enforcing uniqueness through global indexes.

And this one:

While a UUID provides a high probability of uniqueness, a sequence strictly guaranties it, so if you cannot create all the unique constraints, a sequence may be the safer choice.

Collapse
 
krowin profile image
KRowin

As ignorant end user my take away is: that’s awesome, I can request from vendor to integrate it and make available into our postgresql based system as in our line of work we need UUID in virtually every business correspondence with government agencies.

Collapse
 
franckpachot profile image
Franck Pachot AWS Heroes

Yes, it makes sense to have a UUID when it is visible to the user and I suppose it must be random (not disclosing the insert time). Just know that B-Tree on them will be larger and see random writes so it is important that they fit in memory cache