DEV Community

Cover image for Understanding the Birthday Paradox in Software Development
Zoe Clegg
Zoe Clegg

Posted on

Understanding the Birthday Paradox in Software Development

If your code generates IDs or keys using some kind of random generator then it’s important that it generates unique values and not ones already in use. Universally Unique Identifier (UUID) is a widely used identifier which is usually a 36 character string, but depending on your use case you might not need something this long - particularly if the ID needs to be user-facing. But how do you know how many characters is enough? This is when the Birthday Paradox comes into play.

What is the Birthday Paradox?

In a room of just 23 people, there is a 50% chance that at least two of them share a birthday. Given that there’s 365 possible dates for each person’s birthday, it doesn’t sound right that you’d only need 23 people for that statement to be true. That’s why it’s called the Birthday Paradox.

The maths behind it

Let’s say that p(n) is the probability that in a room with n people, nobody shares a birthday. This means 1-p(n) is the probability that at least two people share a birthday.

If there’s only one person in the room, they could have a birthday on any date, since there’s no one to share it with. So they have a 365/365 chance of not sharing a birthday.

When a second person enters the room, in order for them not to share a birthday with the first person, they have a 364/365 chance. A third person would have 363/365, and so on…

To determine the probability of all of the above statements to be true, we multiply the probabilities together. So the probability that none of them share a birthday is:

p(n)=(365/365)(364/365)(363/365)(365(n1)/365) p(n) = (365/365) * (364/365) * (363/365) * … * (365-(n-1)/365)

So when n = 23, this means p(23) = 0.4927

Therefore, 1 - p(23), the chance of at least two people sharing a birthday is 0.5073.

What does this mean for development?

If we replace birthdays with ID values, the chance of two randomly generated IDs being the same is higher than you might intuitively have expected. In software, a clash could lead to errors, security risks or even data being overridden. Therefore it’s important to ensure that the number of possible unique IDs is large enough that the chance of the same ID being used twice is extremely low.

The number of possible ID values you need is directly connected to the number of IDs you might expect to generate over the lifetime of your software.

Let’s say your IDs refer to users of the software. Are you likely to be dealing with hundreds, thousands, millions or even more users?

You can use this Birthday Paradox Calculator to work out how many possible ID values you should have to minimise clashes.

Ways to combat the birthday paradox

There are three common approaches to mitigating the birthday paradox:

  • Increase the length of the ID: more characters means more possible ID values
  • Allow more character types: using a combination of letters, numbers and symbols leads to more possibilities
  • Implement a retry: as a fallback, if a generated ID already existed, generate another ID until it is unique

Top comments (0)