DEV Community

Discussion on: What would you use as a sortable, globally unique, ID?

Dustin King • Edited

There are probably better answers, but a concatenation of:

  • timestamp (yyyy-mm-ddThh:mm:ssZ)
  • sequence number (of item generated by same node at same timestamp, left padded with zeroes so the numbers are sortable asciibettically)
  • node ID (also zero-padded if numeric)

Maybe include milliseconds or more depending on how fast and frequently items are generated. Also maybe node ID before sequence number depending on whether you'd rather have ones with the same node ID together, or have the same sequence number together for a given timestamp.

Something like:

2019-09-08T22:17:41Z;00003;grayarea

Edit to add: Maybe use _ to separate the fields to make it more distinct:

2019-09-08T22:17:41Z_00003_grayarea
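A minimal Python sketch of the scheme described above, assuming a one-second timestamp, a five-digit sequence, and `_` as the separator. The class name `CompositeIdGenerator` and its parameters are made up for illustration, and the node ID is assumed to be unique already:

```python
import threading
from datetime import datetime, timezone


class CompositeIdGenerator:
    """Builds IDs shaped like 2019-09-08T22:17:41Z_00003_grayarea."""

    def __init__(self, node_id, seq_width=5):
        self.node_id = node_id      # assumed unique per node
        self.seq_width = seq_width  # zero-padding width of the sequence
        self._lock = threading.Lock()
        self._last_stamp = None
        self._seq = 0

    def next_id(self, now=None):
        # `now` can be passed in (as a timezone-aware datetime) for testing.
        now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
        stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
        with self._lock:
            # Restart the sequence whenever the one-second timestamp changes.
            if stamp == self._last_stamp:
                self._seq += 1
            else:
                self._last_stamp = stamp
                self._seq = 0
            seq = self._seq
        if seq >= 10 ** self.seq_width:
            raise OverflowError("sequence exhausted for this timestamp")
        return f"{stamp}_{seq:0{self.seq_width}d}_{self.node_id}"
```

Two calls within the same second give `..._00000_grayarea` and `..._00001_grayarea`, which sort correctly as plain strings. This sketch does not handle a clock that jumps backwards: the sequence would restart and could repeat an earlier ID.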

rhymes

Thanks Dustin, the idea of using a timestamp is pretty popular and I agree.

The problem with using a sequence generator, though, is predictability and also collisions: avoiding collisions would require coordination between nodes, which is not feasible. Predictability is also why you shouldn't use things like custom random functions for secrets :D

The node ID (whatever we want to use) is also another predictable part. We don't want your node to generate IDs posing as my node, for example.

It's a really tricky thing to come up with something that's random, secure and portable.

UUIDv1 had the timestamp and the node ID (the MAC address) but it wasn't secure because MAC addresses could be guessed and spoofed.

All the implementations I've seen have a completely random part attached to the encoded timestamp, which I guess is the best way to avoid those issues.

Some have suggested ULID directly or variations on the same tune (encoded timestamp plus completely random string) for those reasons.

Dustin King

I guess I was assuming the nodes were in a relatively trusted environment. Like, maybe distributed geographically but communicating on a secure network, where nodes are owned by the same entity. Also being able to rely on some authorities for unique node IDs (and time synchronization).

If spoofing is an issue, one could require that each ID is cryptographically signed using the node's private key. However, appending a signature might make the ID a lot longer, I'm not sure.
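To get a feel for the length question, here is a rough Python sketch. Note the swap: the standard library has no asymmetric signing, so this uses HMAC-SHA256 with a shared secret as a stand-in for a real private-key signature. The function names and the `_` separator are assumptions for illustration only:

```python
import hashlib
import hmac


def sign_id(raw_id: str, key: bytes) -> str:
    # Append a hex HMAC-SHA256 tag (64 characters) to the ID.
    tag = hmac.new(key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{raw_id}_{tag}"


def verify_id(signed_id: str, key: bytes) -> bool:
    # Split off the last field and recompute the tag over the rest.
    raw_id, _, tag = signed_id.rpartition("_")
    expected = hmac.new(key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    return hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8"))
```

With this stand-in the ID grows by 65 characters (separator plus 64 hex digits). A real signature would likely be longer still: an Ed25519 signature, for example, is 64 bytes, which is 128 hex characters or 86 in unpadded Base64. Unlike a private-key signature, an HMAC only proves the ID came from someone holding the shared key, so any node with the key could still pose as another.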

rhymes

> I guess I was assuming the nodes were in a relatively trusted environment.
> Also being able to rely on some authorities for unique node IDs (and time synchronization).

This requires synchronization though, which is another problem in itself. It means having to handle a sort of "god machine" that hands out node IDs every time a node comes online, which is still tricky for serverless environments, unless you consider every single process of the app a separate node. Keep in mind that the authority could be offline, unreachable or too slow, and that it's yet another machine to handle, monitor, keep updated and secure and so on...

Unless tracing back the originating node from the ID is paramount (which could have security issues in itself in case an ID leaks outside the trusted area), I believe letting go of the whole idea of embedding a node identifier in the final ID is a way to sidestep all of these things.

Dustin King

I think I would need to know what this is for to go further. The simplest thing satisfying your original criteria would probably be a timestamp down to the nanosecond + random per-item hex string to avoid collisions.
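As a minimal Python sketch of that idea (the function name, the `-` separator, and the 20-digit padding are my own assumptions, not a standard format):

```python
import secrets
import time


def timestamp_random_id(now_ns=None, random_bytes=8):
    # Nanoseconds since the Unix epoch; can be passed in for testing.
    if now_ns is None:
        now_ns = time.time_ns()
    # Zero-pad to a fixed width so string order matches numeric order,
    # then append a random hex string drawn from the OS CSPRNG.
    return f"{now_ns:020d}-{secrets.token_hex(random_bytes)}"
```

Keep in mind that the clock's real resolution may be coarser than a nanosecond on some systems, and that two IDs sharing the same timestamp sort in random order relative to each other.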

If node identity needs to be kept secret, then the source of random numbers could be a potential point of deanonymization, so a CSPRNG might need to be used.

Timestamps have some potential issues, but depending on the application, they may or may not matter.

rhymes

> The simplest thing satisfying your original criteria would probably be a timestamp down to the nanosecond + random per-item hex string to avoid collisions.

This is what we ended up choosing, though it's down to the millisecond. We're using ULID as the format. It's not a hash but a concatenation: an encoded timestamp followed by a random string.

> then the source of random numbers could be a potential point of deanonymization, so a CSPRNG might need to be used.

Yeah, exactly. It uses the secure generator for the random part.
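For anyone curious what that looks like, here is a rough ULID-style sketch in Python, based on the layout in the ULID spec: a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford Base32 characters. The function name `ulid_like` is made up, and this is not the library we use; it also skips the spec's monotonic handling for IDs generated within the same millisecond:

```python
import secrets
import time

# Crockford Base32 alphabet (no I, L, O or U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_like(now_ms=None):
    # Milliseconds since the Unix epoch; can be passed in for testing.
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    if not 0 <= now_ms < (1 << 48):
        raise ValueError("timestamp does not fit in 48 bits")
    # Timestamp in the high bits, 80 bits from the OS CSPRNG in the low bits.
    value = (now_ms << 80) | secrets.randbits(80)
    chars = []
    for _ in range(26):  # 26 characters of 5 bits cover the 128-bit value
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))
```

The first 10 characters carry the timestamp, so IDs from different milliseconds sort chronologically as plain strings, while the remaining 16 characters are random. In practice a maintained ULID library is the safer choice.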

Thanks for the exchange!

Dustin King

Likewise :)