If you're writing a lot of business applications, you may not have much need for randomization. After all, when a customer checks out in your shopping cart, you don't want to charge them a random price. Or add a random amount of sales tax. Or send them a random product.
But there are definitely some times when "random" is a critical feature. And... this is where things get tricky. Because many devs underestimate how difficult it can be to represent "randomness" in an application. They also underestimate the public's general ignorance about randomness and probabilities.
Random(ish)
Most languages make it pretty simple to create virtual "randomness". For example, in JavaScript we can do this:
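// Math.random() returns a float in [0, 1) - scale it by 6, floor it, and add 1 to get an integer from 1 to 6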
const dieRoll = Math.floor(Math.random() * 6) + 1;
This line of code rolls a virtual six-sided die. If you've done any reading about the inner plumbing of computer science, you may already know that this line of code doesn't provide true randomness. To put it another way, the "random" result of this line of code is actually a predictable outcome if we were to peer under the covers and track the seed that's being used to generate this so-called "random" number. This is often referred to as pseudo-randomness.
Another way to think of pseudo-randomness is that it's random to you. In theory, if you were tracking, in real-time, all the inputs that the algorithm is using to generate the "random" number, it wouldn't be random at all. You could predict, with 100% certainty, what every subsequent "random" number would be, every time we ran this line of code.
But you're probably not staring at the guts of your microprocessor. You probably have no idea what exact seed was used the last time this code was run. So, for all practical purposes, the number is random - to you. And for most applications that require "randomness", this lower-level pseudo-randomness is just fine.
This article is not actually a deep-dive into the surprisingly-difficult pursuit of true randomness. For the rest of this article, I'm only going to deal with pseudo-randomness. Because the deeper problem that affects many applications has nothing to do with the academic pursuit of true randomness. The deeper problem is that most people don't even recognize randomness when they see it. And when they misunderstand the nature of randomness, they tend to blame the application that's generating a supposedly-random sequence.
Random Occurrences vs. Random Sets
In my experience, most people have a very limited grasp of probabilities. (And as a poker player, I have a fair amount of experience with this.) They can usually give you a reasonable estimate on the probability that a single event might occur. But when you ask them how likely it is that a given set of events will occur over a specific period, the accuracy of their predictions quickly falls apart.
For example, if I ask people:
What are the odds of rolling a 1 on a single throw of a six-sided die?
Nearly everyone I know will (accurately) say that the chance is 1-in-6 (about 16.7%). But if I ask those same people:
What are the odds of rolling a 1 at least one time if I throw a six-sided die six times??
Too often, people consider this scenario and respond that the answer is: 100%. Their (flawed) reasoning goes like this:
If the odds of rolling a 1 are 1-in-6, then the odds of rolling at least one 1, over six throws, are 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 6/6 = 100%.
(If you're unsure of the answer yourself: the chance of rolling a 1 at least one time, over the course of six rolls of a six-sided die, is 1 - (5/6)^6, or roughly 66.5%.)
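If you'd rather trust a simulation than the algebra, here's a minimal sketch in JavaScript - reusing the die-roll line from earlier - that converges on that figure:

// Estimate the chance of rolling at least one 1 in six throws
function hasAtLeastOneOne() {
  for (let roll = 0; roll < 6; roll++) {
    const dieRoll = Math.floor(Math.random() * 6) + 1;
    if (dieRoll === 1) return true;
  }
  return false;
}

let hits = 0;
const trials = 1000000;
for (let t = 0; t < trials; t++) {
  if (hasAtLeastOneOne()) hits++;
}

console.log(hits / trials); // hovers around 0.665 - not 1.0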
In general, people also perform poorly when they're asked to assess the distribution of an entire random set. For example:
Let's imagine that we have a single, six-sided die. And we're gonna roll that die six times. But before we make those die rolls, we're gonna ask people to predict how many times each number will occur. Most people would write down a prediction that would look something like this:
Number of rolls that will result in `1`: 1
Number of rolls that will result in `2`: 1
Number of rolls that will result in `3`: 1
Number of rolls that will result in `4`: 1
Number of rolls that will result in `5`: 1
Number of rolls that will result in `6`: 1
--
Total rolls that will occur: 6
So here's the critical question:
What are the odds that the above prediction would be correct??
The answer would surprise a lot of people.
There is only about a 1.5% chance (6!/6^6 = 720/46,656 ≈ 0.0154) that each of the six numbers will occur exactly once over the course of six different rolls.
In other words, there's a 98.5% chance that those six rolls will not result in every number occurring once (and only once).
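A quick back-of-the-napkin check in code confirms the arithmetic:

// Chance that six rolls produce each face exactly once:
// 6! favorable orderings out of 6^6 equally likely outcomes
const favorable = 6 * 5 * 4 * 3 * 2 * 1; // 720
const possible = 6 ** 6;                 // 46,656
console.log(favorable / possible);       // ~0.0154, i.e. ~1.5%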
Phantom Patterns
Just as we can fail to understand the likelihood of random occurrences, we can also "perceive" non-random events that occur in the middle of otherwise-random noise. The human brain is, essentially, an analog pattern-matching machine. This trait evolved over millions of years - and we wouldn't be here today if it hadn't.
You can't wait to react until a lion is leaping at you. You must be able to discern the pattern of its face - even when it's mostly obscured through the bush.
You can't wait to pay the chieftain your respects until he's standing right in front of you. You must be able to discern the pattern of his appearance - even when he's some ways off down the street.
In other words, pattern-matching is generally a good thing. We want to identify patterns as early and as often as possible. But this ingrained ability can often work against us - because we sometimes perceive patterns where they don't exist. (BTW, the name for this is: pareidolia.) And when we become convinced that a pattern has emerged, we also become convinced that the so-called "random" generator has failed.
We assume that patterns don't exist in random noise. And therefore, if we perceive a pattern in the random noise, we jump to the conclusion that this "noise" is not actually random at all. To see how this plays out in real life, let's consider a scenario with some playing cards.
Imagine that I have a standard deck of 52 cards. We'll assume that it's a "fair" deck (no magician's props here) and that I've given it an extensive shuffling using thorough and "accepted" techniques. Once the deck has been thoroughly randomized, I pull the top card off the deck, and it's:
The ace of spades
Would that result surprise you? I hope not. Because, assuming that the deck is "fair" and my shuffling was thorough, the ace of spades has the same odds of ending up on top of the deck as any other card.
So now I put the ace of spades back into the deck. And I again conduct a thorough-and-extended shuffling of all 52 cards. Once I'm done, I pull the top card off the deck, and it's:
The ace of spades(!)
Would that result surprise you? Maybe. If nothing else, it certainly feels like an odd coincidence, no? But I imagine that even the most hardcore conspiracy theorist would admit that it's possible for the exact same card to be shuffled to the top of the deck twice in a row.
So now I put the ace of spades back into the deck. And I again conduct a thorough-and-extended shuffling of all 52 cards. Once I'm done, I pull the top card off the deck, and it's:
The ace of spades!!!!!
OK. I can almost hear you thinking right now. You're saying, "C'monnn... The ace of spades? Three times in a row?? This must be rigged!" But here's my question to you:
How many times must the ace of spades come off the top of the deck before we can prove that the deck and/or the shuffling technique and/or the person doing the shuffling - are rigged??
The answer is very simple. As long as we are assessing nothing but the observable results, it is impossible to ever conclude, definitively, that any part of the process is "rigged". This is because, with no deeper analysis of the processes that surround the ever-repeating ace of spades, it's impossible to definitively state that this is not, simply, an incredible sequence of events.
To be clear, I understand that, on a practical level, at a certain point the incredible nature of the sequence becomes soooo improbable, and soooo mind-blowing, as to throw the integrity of the whole exercise into question. (Pulling the ace of spades off the top of a fairly-shuffled deck three times in a row is a (1/52)^3 chance - about 1-in-140,608.) To put this another way, you can reach a point where "statistical improbability" becomes indistinguishable from "impossibility".
But I'm pointing out these phantom patterns because your users will be far quicker to claim "impossibility" than you will.
Who Cares??
This article will be a two-parter. If I try to cram this into a single blog post, no one will ever read it. Part two will explain, in some detail, why programmers can't ignore these issues.
It may feel like the "problems" I've outlined are just cognitive biases that have nothing to do with your code. But in part two, I'm gonna outline how these mental traps are not simply the users' problem. Even if your code is "perfect" and your randomization is mathematically flawless, that won't do you much good if the users don't trust your process.
Specifically, I'm going to outline some real-life use cases from Spotify where they've alienated some of their own subscribers because they failed to account for all the ways in which people can't comprehend randomness. I'm also going to illustrate how ignoring the issue can turn off your own customers - but trying too hard to "fix" it can also make the problem worse.
Top comments (7)
I love this topic. Randomness in programming has been fascinating to me for a long time. It challenges my belief that "computers do things exactly as they are told to, they can't be random".
I haven't taken a deeper look under the hood at what exactly happens, so I very much appreciate this post and am hoping for part 2 soon. The Wikipedia article and its sources about /dev/random in Linux have already fried my brain a few times, so I am excited to see where your post goes :D
I'll be getting much more practical (diving into some of the problems with "randomness" that I found in the Spotify app) and much less theoretical (like, the basis for computer "randomness" itself). But it is a fascinating subject and I may dive into it again in future articles.
Your belief that "computers do things exactly as they are told to, they can't be random" is pretty much... spot-on correct. For all their sophistication, there are some things that modern computers still just can't do. One of those things is: "Hey, computer. Give me a random number between 1 and 100." As basic as that sounds, there's no part of a microchip dedicated to generating random noise/numbers/whatever that returns independent values every single time it's invoked.
So how do computers create "randomness"? Well, the key lies in what they use as a seed. (I referenced this near the top of this article.) As long as the machine knows where to grab a seed that is constantly changing, it can then do all sorts of standard machinations to mutate that seed into something that looks like a random value. The simplest and most obvious seed that's available to a computer is the system time.
If we look at "regular" time, as humans typically use it, there doesn't seem to be much that's "random". We know that 4:19:59 will shortly be followed by 4:20:00 then 4:20:01 - and so on. But computers can measure time in tiny fragments - microseconds - and if we extend our time stamps to include them, our values start to look a whole lot more... "random". This is true even if we create a basic program that will grab, say, three timestamps (accurate to the point of microseconds). Even though it feels to us as though those timestamps have been created in such short succession that they essentially happened at the same time, all three of the timestamps will actually be quite unique.
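You can see this with a minimal sketch - this one assumes Node.js, where process.hrtime.bigint() exposes a nanosecond-resolution clock:

// Grab three high-resolution timestamps in rapid succession (Node.js)
const t1 = process.hrtime.bigint();
const t2 = process.hrtime.bigint();
const t3 = process.hrtime.bigint();

// To a human, these all happened "at the same time" -
// but the nanosecond counts are all distinct
console.log(t1, t2, t3);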
Once the computer has a unique(ish), random(ish) value, it can then perform any number of transformations on the value to make it look even more unique and even more random. This is like the process of creating a hash. You can start with any input - of any size - and get back a standard-sized string that looks to the typical user like it's random gobbledygook. Of course, it's not random gobbledygook, is it?
When you create a hash, if you start with the same value (the "seed"), then you'll always get the same hash - every single time. Similarly, in pseudo-randomness, if you start with the exact same "seed", you'll get the same "random" value - every time. But computers spoof this by using values that are constantly changing (like, finite fractions of the system time) to generate a series of unique seeds. And because most people sitting at a keyboard have no way of ever capturing the exact microsecond when their "random" number is generated, the resulting value is - to them, at least - random.
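To make that concrete, here's a minimal sketch of a seeded generator. (This particular mixing function is known as mulberry32 - it's just an illustration of the principle, not the algorithm Math.random() itself uses.)

// A tiny seeded PRNG (mulberry32). Same seed in -> same "random" sequence out.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6D2B79F5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const randA = mulberry32(12345);
const randB = mulberry32(12345);
console.log(randA(), randA()); // two "random"-looking values...
console.log(randB(), randB()); // ...reproduced exactly, because the seed matched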
For applications that cannot settle for pseudo-randomness (think: cryptography), there are many other approaches whereby the computer can grab some kind of random value to use as a seed. These can include measuring the varying resistance inside the microprocessor itself - or the heat that's generated by the processor. Basically, any measurement of changing conditions, when carried out to a sufficient number of decimal places, essentially becomes "random".
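In JavaScript, for what it's worth, you don't have to measure processor heat yourself - browsers (and modern Node.js) expose a cryptographically strong source that's seeded from exactly this kind of OS-level entropy:

// Cryptographically strong random values via the Web Crypto API
const buf = new Uint32Array(1);
crypto.getRandomValues(buf);
console.log(buf[0]); // an unsigned 32-bit integer drawn from the OS entropy pool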
FWIW, the problem of "true" randomness isn't confined to computers. We often use a coin flip as an example of a "random" event. But if you could analyze, in real-time, ALL of the forces that have suddenly been placed on that coin (e.g., rotational velocity, angular velocity, wind resistance, air viscosity, the hardness of the surface on which it will land, etc...), then you should be able to calculate, with absolute certainty, whether it will land on heads or tails. We only call the coin flip "random" because it's random to us. Because we don't have the ability to measure all those factors in real time. So the result is, effectively, "random". In other words, a coin flip is a real-life demonstration of pseudo-randomness.
I love this reply, thank you very much for taking the time to go to such length explaining this!
I'm already excited for what you've got in store for us :)
What really baked my noodle was when I realized that pseudorandom numbers were the only kind that existed in the universe.
Absolutely. Randomness (or the lack thereof) is a fascinating subject for probability nerds like myself.
I'm really looking forward to the next part now...
Hehehe - it won't be long...