DEV Community

Yuval

There are more than 2 UUID types - UUIDv4, 7, ULID, etc...

Tl;dr - UUIDv4, UUIDv7, ULID, Base64, Base58, Base85, HashIDs (hiding IDs on the frontend), libs compatibility between different SDKs.


So, you've all heard about UUIDv4. It's essentially 122 random bits (plus a few fixed version bits), represented nicely as hex digits with dashes.

Let's review some other UUIDs:

ULID / UUIDv7

UUIDv4 has a major issue: ordering by it is meaningless, and it hurts locality when you read records in order.

Let's say you have a database table, with id which is just a simple int, and AUTO_INCREMENT.

The first record will be id=1, the second record will be id=2, etc.
Now, when you do something like SELECT * FROM my_table ORDER BY id, the results will be sorted, but more importantly, the results will be stored close to each other.

E.g., if you iterate in chunks of 100, you will not jump all over the database; the records in each chunk should be pretty close to each other.

What happens with UUIDv4? There's no real point in sorting, because you just sort a bunch of random numbers.

In addition, you will "jump" all over the database when reading records. And if your DB is big enough, the pages you read will keep getting evicted from memory.

So what about ULID / UUIDv7?

ULID and UUIDv7 are two formats that prefix the ID with a timestamp, so IDs created later sort after IDs created earlier (at millisecond granularity).

This way, for example, if you have a table with ULID/UUIDv7 as an index, you can run SELECT * FROM my_table ORDER BY id and the order would make sense: it roughly follows creation time.
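To make the "time prefix" idea concrete, here is a minimal sketch that builds a UUIDv7-style value by hand: 48 bits of Unix time in milliseconds, then the version and variant bits, then random bits. The function name `uuid7_sketch` is made up for this post; a real project would more likely use a maintained library.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-style ID: 48-bit ms timestamp, then version/variant, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")        # 80 random bits
    value = ((unix_ms & 0xFFFFFFFFFFFF) << 80) | rand   # timestamp goes in the top 48 bits
    value = (value & ~(0xF << 76)) | (0x7 << 76)        # set the version nibble to 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)        # set the variant bits to binary 10
    return uuid.UUID(int=value)


# Two IDs created one millisecond apart (timestamps passed explicitly for the demo).
earlier = uuid7_sketch(unix_ms=1_700_000_000_000)
later = uuid7_sketch(unix_ms=1_700_000_000_001)

# The timestamp prefix dominates the comparison, both as UUIDs and as strings.
print(earlier < later, str(earlier) < str(later))
```

Two IDs created within the same millisecond are ordered only by their random bits, so the ordering is "by creation time, up to the millisecond" rather than strictly increasing.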

Problems with ULID/UUIDv7?

One thing is adoption; another, more serious problem is information leakage: given an ID, anyone can tell when it was created.
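The leakage is easy to demonstrate. Assuming the ID follows the UUIDv7 layout (the top 48 bits are Unix time in milliseconds), a sketch like this recovers the creation time; the helper name and the example ID are hypothetical:

```python
import datetime
import uuid


def uuid7_created_at(u: uuid.UUID) -> datetime.datetime:
    """Read the creation time back from the top 48 bits of a UUIDv7-style ID."""
    unix_ms = u.int >> 80
    return datetime.datetime.fromtimestamp(unix_ms / 1000, tz=datetime.timezone.utc)


# Hypothetical ID, as it might appear in a URL or an API response.
leaked = uuid.UUID("018f4d2e-7c00-7abc-8def-0123456789ab")
print(uuid7_created_at(leaked))
```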


"Nano IDs"

This is just a summary of this great article - The UX of UUIDs. Go read it now.

UUIDs are not easy to copy; the "-" characters in the UUID prevent you from selecting the whole string with a double-click.

We can see what Stripe is doing: the key is just a random string, without dashes; in addition, it is prefixed with a description of the key. For example:

STRIPE_LIVE_PUBLIC_KEY="pk_live_xUBcwUhe....."
STRIPE_LIVE_SECRET_KEY="sk_live_gpTjnUwB....."
STRIPE_TEST_PUBLIC_KEY="pk_test_CcfLsSzE....."
STRIPE_TEST_SECRET_KEY="sk_test_WFnNSjpB....."
DJSTRIPE_WEBHOOK_SECRET="whsec_LqqRWEKkd....."

We can copy the key(s) using double-click, and each key also packs more information per character. How come?

The answer is that a UUID (v4 for example) is written in hexadecimal (base 16); Stripe IDs, however, are written in a bigger base. The bigger the base, the shorter the string for the same amount of data.
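A quick way to see the effect of the base is to write the same 16 bytes in different ways. In this sketch, base64url simply stands in for "a bigger base"; it is not meant to be Stripe's actual scheme.

```python
import base64
import uuid

raw = uuid.uuid4().bytes  # 16 bytes = 128 bits

as_uuid = str(uuid.UUID(bytes=raw))                                   # hex with dashes
as_hex = raw.hex()                                                    # hex, no dashes
as_b64 = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")   # base64url, padding stripped

lengths = {"uuid": len(as_uuid), "hex": len(as_hex), "base64url": len(as_b64)}
print(lengths)  # {'uuid': 36, 'hex': 32, 'base64url': 22}
```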

Base64 vs Base58

We all know about base64 (see FAQ if not), but what is Base58?
Base58 is just like Base64, but with some confusing characters omitted: we remove I and l, and we remove O and 0, to avoid confusion. + and / are removed as well. That is 6 characters fewer, hence 58.
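Here is a minimal Base58 encoder sketch, assuming the Bitcoin-style alphabet (other Base58 variants order the characters differently). It treats the bytes as one big number and repeatedly divides by 58:

```python
# Bitcoin-style Base58 alphabet: no 0, O, I, l, and no + or /.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as Base58 by treating them as one big-endian integer."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = BASE58_ALPHABET[rem] + out
    # Leading zero bytes would vanish in the integer, so each becomes a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + out


print(base58_encode(bytes(range(16))))
```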

Base85???

Yes, another base is base85; let's say:

  1. You work for a software company which distributes signed .exe files to customers.

  2. You want the filename to contain the URL the executable should connect to on first run.

  3. You can't add this URL to the file content, since you would have to sign many different files (*).

  4. So the URL should be part of the filename.

  5. The filename should be as short as possible.

So, use base85 to encode the URL; the bigger the base, the shorter the string in the filename.
And this way you get a short filename.
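A small sketch of the size difference, using Python's built-in `base64.b85encode`. The URL is made up. One caveat: Python's base85 alphabet includes characters such as `<`, `>` and `?`, which Windows does not allow in filenames, so a real implementation would need a filename-safe alphabet.

```python
import base64

url = b"https://example.com/setup/abc123"  # hypothetical URL, 32 bytes

as_b64 = base64.urlsafe_b64encode(url).decode("ascii")
as_b85 = base64.b85encode(url).decode("ascii")

# The executable would reverse the encoding on first run.
restored = base64.b85decode(as_b85)

print(len(url), len(as_b64), len(as_b85))  # 32 44 40
```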


Hash IDs

Let's say you have a SaaS, and you give each new user an ID. And you have a view of the format https://my-saas.com/users/123 (where 123 is the user_id).

What happens is, people can estimate the number of users on your website by creating a new user and checking the id they got.

So, how can you hide the real user_id from the users themselves?

One option of course is to use a random id, but then we would get all the issues of UUID (UUID is just a special case of a random ID).

Another option is to encode the ID using some key; and this is exactly what Sqids (formerly Hashids) is doing!

Using a secret key (*), you can convert id->string and string->id, and this way you can have something like https://my-saas.com/users/nVB, and convert nVB to user_id 123 in your backend.

Sample code:

# Taken from: https://github.com/davidaurelio/hashids-python
from hashids import Hashids

# Encode the same id with one salt...
hashids = Hashids(salt='this is my salt 1')
hashid = hashids.encode(123) # 'nVB'

# and with different salt:
hashids = Hashids(salt='this is my salt 2')
hashid = hashids.encode(123) # 'ojK'

What can be the problem with HashIDs?

Well, first we should check which algorithm it uses, and make sure d(e(id)) == id for all ids; i.e. that we can trust the lib (algorithm) to convert back and forth without errors.

Another issue is that we might be bound to a specific implementation (and thus technology), unless we prove that the results do not change when we switch libs.

A third issue is the security review of the algorithm: it includes logic to avoid generating the most common English curse words by never placing certain letters next to each other, so this might sound like trouble, entropy-wise.
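The round-trip check can be sketched as a small helper that takes any encode/decode pair. The helper name is made up, and the hex codec below is only a stand-in so the snippet runs without extra dependencies; it is not Hashids.

```python
from typing import Callable, Iterable, List


def find_round_trip_failures(
    encode: Callable[[int], str],
    decode: Callable[[str], int],
    ids: Iterable[int],
) -> List[int]:
    """Return every id for which decode(encode(id)) != id."""
    return [i for i in ids if decode(encode(i)) != i]


# Stand-in codec (plain hex), used here only to exercise the helper.
def hex_encode(i: int) -> str:
    return format(i, "x")


def hex_decode(s: str) -> int:
    return int(s, 16)


failures = find_round_trip_failures(hex_encode, hex_decode, range(100_000))
print(failures)  # [] means every id survived the round trip
```

With hashids-python, you would presumably pass `hashids.encode` and something like `lambda s: hashids.decode(s)[0]`, since its `decode` returns a tuple of numbers.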

Checking Cross-Language Consistency

What happens if the frontend uses HashIDs in JavaScript but the backend uses Python, C++ or Rust, for example?


FAQ

Q: Do we really need 128 bits as an ID? Isn't it too much? What are the odds of collisions?
A: Some people claim this isn't really needed. Referring to the Birthday Attack probability table, we see that for 128 bits we need about 2.6×10^18 keys in order to get a collision with a 1% chance.
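You can reproduce the table entries with the usual birthday approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N is the number of possible IDs. This is a sketch of that approximation (the function name is made up), not an exact calculation:

```python
import math


def keys_for_collision(bits: int, p: float) -> float:
    """Approximate number of random IDs needed to reach collision probability p."""
    space = 2.0 ** bits  # number of possible IDs
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


print(f"{keys_for_collision(128, 0.01):.1e}")  # 128 fully random bits
print(f"{keys_for_collision(122, 0.01):.1e}")  # UUIDv4 has 122 random bits
```

For 128 bits and a 1% chance this gives roughly 2.6×10^18 IDs; with the 122 random bits of a UUIDv4 the number is 8 times smaller, which is still enormous.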

Q: What is base64 for?
A: Let's say you want to transfer binary information through a channel that only handles text, e.g. you want to serialize data and send it to another program. Serializing means converting information into a form (here, text) that can be sent from one program to another. Base64 does this by writing arbitrary bytes using 64 printable characters.
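A tiny illustration with Python's standard `base64` module: bytes that are not printable go in, plain ASCII text comes out, and the receiver turns it back into the same bytes.

```python
import base64

binary_payload = bytes([0x00, 0xFF, 0x10, 0x80])  # not printable as-is

# Sender side: bytes -> ASCII text that is safe to put in JSON, a URL, an email...
text = base64.b64encode(binary_payload).decode("ascii")

# Receiver side: text -> the original bytes.
restored = base64.b64decode(text)

print(text, restored == binary_payload)
```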

Q: Why the "(*)" in "you would have to sign many different files"?
A: There are some mechanisms to deal with it, e.g. signing the file except for a small part of its metadata.

Q: If we use a UUID with time as a prefix, what happens on daylight saving time?
A: Nothing; the time is a Unix timestamp (counted from the epoch in UTC), which keeps increasing regardless of daylight saving time.

Q: Why does the hash IDs lib call the secret key "salt"? It's a secret, not a salt..
A: The goal of hash IDs is to convert a number to a string and vice versa; i.e. to supply a two-way (reversible) mapping.

Thus, in order to change the result, we use a salt.
In the algorithmic layer, this should indeed be called "salt".

In the Product/Marketing layer, this should be called "secret".

Q: In "Checking cross-language", why do we need a Dockerfile for the test?
A: We don't really need one; we can run the check once for the implementations we need and that's it. The Dockerfile is for demonstration purposes only.

References

UUIDs and poor index locality

Benchmarking UUIDs and checking WAL

https://buildkite.com/blog/goodbye-integers-hello-uuids
