So recently, I was on a live steam with @aydrian on Twitch where I demonstrated how to migrate a live serverless application (Next.js and Vercel) from a PostgreSQL database onto CockroachDB serverless with zero application downtime. If you're interested in that, please do check out the video.
CockroachDB@cockroachdbYesterday @itsaydrian hosted @HarshhhDev on his livestream. It's hard to describe how humbling it was to watch the way this 15 year old high school student owned the moment. He came prepared with educational slides and then did live demo.
Full show: youtu.be/U57GDhVEU5023:16 PM - 30 Mar 2022
In the stream, we talked a lot about different topics I had going on in the code, and one of them was about the difference between a CUID and a UUID. Now, at the time of the stream I was unsure of the difference between the two, and I just kind of used whatever.
After the stream, I too was curious what the difference is between the two. After doing some research, I'd like to share my findings.
We'll be looking at UUID and CUID, along with NanoID as it has started to gain traction lately.
What is UUID?
UUID, or universally unique identifier, is a 128 bit label used for information in operating systems. The probability that an ID will be replicated isn't zero, it's close enough that when we generate a UUID, the chances of the identifier being used ANYWHERE else is near zero.
An example would be something like that fits the format of xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx, so something like 123e4567-e89b-12d3-a456-426614174000.
The problem with the UUID
While researching, I came across this blog by PlanetScale, which is a MySQL-compatible serverless data platform.
The title of the post was "Why we chose NanoIDs for PlanetScale's API", and the post goes onto talk about how UUIDs are great and all as it's near impossible to generate a duplicate identifier, however they have a huge problem: they're too big, and take up too much space in the URL.
Don't we all love clean and short URLs? Nobody likes seeing a bunch of digits and weird symbols in your URL. Quoted from their blog,
They take up a lot of space in a URL:
Try double clicking on that ID to select and copy it. You can't. The browser interprets it as 5 different words.
It may seem minor, but to build a product that developers love to use, we need to care about details like these.
Which is true! If we look into the history of UUID, we realise that it wasn't ever really made to be used in web applications. If we have a website such as, say, HasteBin, which depends upon generating these short, three letter IDs to easily share with people, this causes a problem.
CUID aims to solve the exact problem we discussed above with UUIDs. Quoted from CUID's GitHub:
Modern web applications have different requirements than applications from just a few years ago. Our modern unique identifiers have a stricter list of requirements that cannot all be satisfied by any existing version of the GUID/UUID specifications
CUID aims to focus on horizontal scalability (as today's applications don't run on a single machine), performance, size, security, and portability. The GitHub repository offers a more specific breakdown of the problems that this aims to solve.
They're shorter, but what about collisions? I found this online benchmark ran on a GitHub gist, showing that the chances for collisions are still extremely slim. It found no collisions on 100 million iterations!
Since the talk I gave about was related to Prisma ORM, the Prisma schema (in relational databases) can generate both either a CUID or UUID, depending on the needs of your application.
...what about NanoID?
As a bonus, we'll being going over NanoID. It's said to be a tiny (only 130 bytes minified and gzipped), fast (x2 faster than UUID), safe, short, and portable.
NanoID's website has a cool visualiser tool in it, where we can calculate the chances of a collision. With an ID length of 15 characters (pretty short and sweet), and a generation of 1,000 IDs every second, it'll take a mind boggling 569 thousand years for it to have a 1% probability of a single collision. Now if we switch that to 1000 IDs being generated every single second, it'd still take around 158 for it to have a 1% probability of at least a single collision. That's longer than me or you will probably live.
To visualise that, let's see:
- Human lifespan: 79 years.
- Life on Earth will be impossible in ~1.1 billion years.
- Age of the Earth: ~4.543 billion years.
- Age of the Universe: ~13.799 billion years.
NanoID also lets you customise the alphabet it uses, so you can remove and/or add in custom alphabets.
Anyways, I think you get the point. Just because something's short doesn't mean it's insecure - but let's move onto another metric: performance.
On the NanoID's README, there's a section for benchmarks which they've run with other popular generators. As we can see here, compared to uuidv4, there's a huge upscale in performance.
With that being said, I think I've given a fair comparison of the different methods of ID generation, their pros, and their cons. For most cases, I believe the best option is NanoID due to the fact that it's insanely customisable whilst being performant. It's slowly but surely taking over uuidv4, if we look at this npm trends comparison between the three. In the context of Prisma, the built-in CUID is perhaps the best choice.
Top comments (4)
There's now Cuid2 which replaces cuid v1
Thanks for the update!
Nice to see alternatives to balance size and performance, but I still love sequences in many cases dev.to/yugabyte/uuid-or-cached-seq... 😉
Yes! Sequences are cool for sure. It just depends on how well a database is able to support them in most cases. In the case of YugabyteDB, it's able to support sequences quite well from what I can read in the post :)