DEV Community

Discussion on: Generating unique IDs in a Large scale Distributed environment

golfman484
Driving change • Edited

Great article! I used a similar time based + sequence ID generator to create "unique" class IDs in a visual class editor many years back.

Just one point I would like to make: if you ran this on many nodes and requested lots of IDs, it seems likely that duplicate IDs would be generated because of the way the node ID is derived from the MAC in the createNodeId() method.

While every MAC should be unique, taking only the 5 least significant bits of the MAC means that you're only using 5 bits of the last byte of the MAC. The probability that two MAC addresses will have the same last byte is 1 in 256 (assuming the last bytes are evenly spread). The chance that they will have the same 5 least significant bits is 1 in 32, because 5 bits can only hold 2^5 = 32 different values. If only winning the lottery had odds as good as that!
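To make the masking problem concrete, here is a minimal Python sketch. It is not the article's createNodeId(); `node_id_from_mac` is a hypothetical stand-in that assumes the node ID is simply the 5 least significant bits of the MAC's last byte, and the two MAC addresses are made up for illustration.

```python
NODE_ID_BITS = 5
MAX_NODE_ID = (1 << NODE_ID_BITS) - 1  # 31, i.e. binary 11111


def node_id_from_mac(mac: str) -> int:
    """Hypothetical stand-in: keep only the 5 low bits of the MAC's last byte."""
    last_byte = int(mac.split(":")[-1], 16)
    return last_byte & MAX_NODE_ID


# Two different (made-up) MACs whose last bytes differ only above bit 4.
id_a = node_id_from_mac("00:1a:2b:3c:4d:05")  # 0x05 = 0b00000101
id_b = node_id_from_mac("00:1a:2b:3c:4d:25")  # 0x25 = 0b00100101
```

Both calls return 5, so two machines with distinct MACs would end up sharing a node ID under this assumption.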

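Let's say you decided to have 32 nodes: with only 32 possible node IDs, the chance that at least two nodes end up with the same node ID is practically 100%. All 32 nodes would have to land on different values, which happens with probability 32!/32^32, or roughly 2 in 10 trillion. Even with just 7 nodes the chance of a clash is already a little over 50% - now that's the lottery I want to buy a ticket in! :)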
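The odds of a node ID clash can be checked with a standard birthday-problem calculation. This is a rough sketch that assumes each node's ID is drawn uniformly and independently from the 32 possible values (real MACs may not be that evenly spread); `collision_probability` is a name I made up for the example.

```python
def collision_probability(nodes: int, id_space: int = 32) -> float:
    """Chance that at least two of `nodes` nodes share an ID,
    assuming IDs are uniform and independent over `id_space` values."""
    if nodes > id_space:
        return 1.0  # pigeonhole: more nodes than IDs guarantees a clash
    p_all_unique = 1.0
    for already_taken in range(nodes):
        # The next node must avoid every ID already taken.
        p_all_unique *= (id_space - already_taken) / id_space
    return 1.0 - p_all_unique


for n in (2, 7, 16, 32):
    print(f"{n:2d} nodes: {collision_probability(n):.6f}")
```

With 2 nodes the result is exactly 1/32, it crosses 50% at 7 nodes, and by 32 nodes it is indistinguishable from 1.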

In this case it may be safer to manually (or otherwise) allocate a specific, unique node ID to each node via some configuration mechanism to ensure that no two nodes end up with the same node ID - and then the chance of two different nodes generating the same ID goes to zero in a million bazillion :)
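One way to do that kind of explicit allocation is to read the node ID from configuration and refuse to start if it is missing or out of range. The sketch below assumes an environment variable named `NODE_ID`; that name and the `load_node_id` helper are hypothetical and not part of the article's code.

```python
import os

MAX_NODE_ID = 31  # largest value that fits in a 5 bit node ID


def load_node_id(env=os.environ, var: str = "NODE_ID") -> int:
    """Read an explicitly assigned node ID (hypothetical NODE_ID variable)."""
    raw = env.get(var)
    if raw is None:
        raise ValueError(f"{var} is not set; assign each node a unique ID")
    node_id = int(raw)  # raises ValueError if the value is not a number
    if not 0 <= node_id <= MAX_NODE_ID:
        raise ValueError(f"{var} must be between 0 and {MAX_NODE_ID}, got {node_id}")
    return node_id


# Example with an explicit mapping instead of the real environment.
example_id = load_node_id({"NODE_ID": "7"})
```

Uniqueness then depends on whoever hands out the values (deployment scripts, an orchestrator, and so on) never giving the same number to two nodes.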

I picked 32 nodes in the example above because that is the maximum number of nodes you can have with a 5 bit node ID (2^5 = 32, i.e. IDs 0 to 31) - if each node is allocated its own unique dedicated ID then all 32 nodes will generate unique IDs, which may not happen if node IDs are auto allocated based on the MAC via createNodeId().