Cadoret Théo for Onepoint x Stack Labs

Posted on Aug 24, 2020

Randomness, what did you expect ?

#linux #security #computerscience

Randomness is, at first glance, a hard to grasp philosophical concept that we are absolutely not going to tackle in this article. Definitely not.

Rather, we are going to use a pretty common and globally accepted definition, and what could be better for that than a community maintained definition such as Wikipedia's ?

“Randomness is the apparent lack of pattern or predictability in events”

Yes, we are going to talk about patterns and predictability.

But wait I'm getting ahead of myself, first, let's see why I am even writing about it.

Randomness is needed everywhere

Well, actually randomness is everywhere in software.

Video games make an extensive use of randomness to make each game different from the last. Sometimes to make enemies (or allies for what it's worth) unpredictable. Sometimes to generate a whole new world that the the player will explore.

_Randomly generated map in Don't Starve_

But it's not only for "adventure" games, it is also very useful in virtual card games which are often based on the unpredictability of the deck shuffling.

_Online poker game_

One can also think of Spotify who offers to shuffle your playlists (actually their algorithm is based on random but is rigged, because we humans don't like it when it's too random).

_Spotify's shuffling algorithm schema_

Light simulators also make extensive use of randomness to simulate light diffusion on matter in every direction of space. Instead of modelling every light beam we randomly select few of them to save the machine from a certain death.

Optimisation algorithms as well, such as Monte-Carlo, could use some randomly chosen initial values.

Finally security is also a great playground for random based algorithms, especially for generating encryption keys. Indeed they must be absolutely unpredictable to be of any use. If someone could predict which key you're going to use, then it's easy for them to decrypt your message.

There are many more fields of application for this peculiar concept, but I think I made my point.
We need randomness.

Ok that's cool, but what's the big deal? If I just go Math.random() I'm fine right ?

Both "yes" and "no". That's where it gets interesting.

Randomness is far from innate

Let's go back to fundamentals, what is a computer and what is its purpose ?

In essence, a computer has an initial defined state, and a set of predefined instructions which will, by their execution, lead it to another state.

There is no place for actual randomness. Everything is fully deterministic and that's pretty useful actually.

So how do we cope with our needs being completely out of reach of our means ?

Donald Knuth gives us a clue for what could be our first move:

“In a sense, there is no such thing as a random number. For example, is 2 a random number ? Rather, we speak of a sequence of independent random numbers with specified distribution.”

Which matches pretty well with our definition of a computer that executes a set of instructions that lead it to a sequence of consecutive states.

So our first approach will be based on generating a sequence of numbers that seem random.

Pseudo-Random Number Generators (PRNGs)

This is where pseudo-random number generators come into play. They are entirely defined by two parameters :

A seed
A function

The seed is the starting point of the algorithm, the first value of the sequence.
Then to generate the sequence, one just has to apply that function to the seed, recursively. Each obtained value is part of the sequence.
The challenge is to choose a couple (seed, function) that will produce random looking values.

Middle square algorithm

The middle square algorithm is very simple :

// Define an initial value
Seed = 5678

// Square it
5678^2 = 32239684

// Take the 4 middle digits (if it's not 8 digits long, add leading 0s)
32[2396]84 = 2396

// Start over with the new value

Here you can find an online generator if you want to try it yourself.

The middle square algorithm applied to the seed 5678 gives this sequence :

2396 7408 8784 1586 5153 5534 6251 0750 5625 6406 0368 1354 8333 4388 2545 4770 7529 6858 0321 1030 0609 3708 7492 1300 6900 6100 ...

Which indeed seems very random to a human.

But the next iterations of this sequence point out one of the main issues of PRNGs :

... 6100 2100 4100 8100 6100 2100 4100 8100 6100 ...

As soon as the PRNG falls on a value it has already visited, it gets trapped in a loop. The length of this loop is called the period and can be very short as in this example : 4

Any PRNG will eventually cycle as the range (number of possible values) is limited and defined by the function itself. For the middle square algorithm the range is [0-9999]=> 10 000 possible values. And the best possible period is proved to be 4096.

But PRNGs are not dead. Middle square algorithm is just one of their soldiers, and not the mightiest.

Actually their Achilleus, the gold standard when it comes to PRNGs, is the Mersenne Twister algorithm. We are not going to it discuss in details though as it is fairly complex (for the curious and rather mathematical ones).

But it's worth discussing the Linear Congruential generator which can be viewed as a base for the Mersenne Twister.

Linear Congruential generator

The function for this PRNG is based on 3 parameters :

A : the multiplier
C : the increment
M : the modulus

f(x) = ( x * A + C ) mod M

So the range is defined by M so that f(x) ∈ [0, M-1].
Choosing M to be a power of 2 is very efficient as the modulus operation can be simply truncating the binary representation.
By correctly tuning all 3 parameters it is possible to obtain a very large range and period. One of the best sets of parameters involve choosing M a power of 2 so that M-1=2^N-1 is prime.

And a prime that is one less than a power of two is called... a Mersenne Prime !

Okay so let's sum up what we've got.

PRNGs :

do look random : Given a not too absurd set of parameters and a correct algorithm, the output sequence seems random to us humans.
can provide a uniform distribution : Before the first loop, values appear only once in the sequence. From an external point of view, they are equiprobable.
are fast to compute : They are rather basic algorithms and easy to compute
are deterministic : Meaning sequences can be reproduced if you know the seed and parameters. Which is good when used in tests for instance.

And for most use cases it's enough. One can think of video games, light simulator, card games... card games ?
It depends, if it is to draw your Solitaire initial state, PRNGs are widely sufficient, but when it comes to online card games such as Poker, where people bet real money and would have solid reasons to try to take advantage of our Achilleus heel, we need a different approach.
As you can imagine, it is also the case for encryption keys, which job is to protect from bad intentioned intruders.

For these use cases, we will need more than random looking sequences, we will need true unpredictability.

True Random

If you aim for true randomness, you must seek first for true unpredictability, a property that is inherently out of reach of a software.
One must search for it on the other side of the keyboard, in the real world, which is an incredible source of disrupted events. We call it entropy.

Entropy sources are everywhere and can take various forms, from atmospheric noise to radioactive decay.

For instance, Cloudflare (who is responsible for the security of about 10% of the overall internet traffic) uses a video stream from a camera pointed 24/7 at a wall of lavalamps to encrypt data.

This is the wall I'm talking about :

Online card games are not outdone either as PokerStars chose another option. Their card drawer algorithm relies partly on a light beam shot at a semi-opaque mirror.

If the photon passes the mirror that's a 0, if it gets reflected that's a 1.

As a last example I'll mention the Linux kernel which makes use of both hardware metrics and user inputs to fill its entropy pool.
Hardware metrics can take the shape of an SSD access time, which is not deterministic. User inputs can be timestamps of a keyboard pressure, or a mouse movement etc. This website gives an example of what can be done with a cursor movement to generate random strings.

But unpredictability is not sufficient to provide randomness. We also need a uniform distribution of probability among the possible values. Meaning we need the absence of pattern.

In order to provide this, we'll need CSPRNGs.

Cryptographically Secure Pseudo Random Number Generators (CSPRNGs)

CSPRNGs are basically just badass PRNGs often based on hash functions. It's an algorithm that takes a seed as input and outputs a value. They have 2 very important properties that make them way more secure than classic PRNGs :

Withstand the 'Sate compromise extension'

In the event that part or all of its state has been revealed (or guessed correctly), it should be impossible to reconstruct the stream of random numbers prior to the revelation.
Satisfy the next bit test

We say that a sequence of bits passes the next bit test for at any position i in the sequence, if any attacker who knows the i first bits (but not the seed) cannot predict the (i+1)st with reasonable computational power.

(Reasonable computational power means with less than the overall computing power on the planet, pretty safe.)

A good CSPRNG is SHA256 for instance ! (Secure Hash Algorithm)

On linux you can access this truly random source by reading from /dev/urandom...
... or /dev/random?

/dev/random vs /dev/urandom

When Linux OS receives entropy from the outside world, it stores it in an entropy pool. Entropy pool from which both /dev/urandom and /dev/random draw. The only difference is that /dev/random will block when it estimates that the entropy level is too low, whereas /dev/urandom can draw from this pool almost endlessly providing perfect randomness ("u" stands for unblocking).

You are safe using /dev/urandom for cryptographic purposes.

The nitty-gritty details can get a bit too technical for this article, the curious ones though can choose the red pill and dive into the rabbit's hole

Conclusion

Random numbers don't exist, we speak of a random sequence of numbers. This is something that software can easily simulate and fool human brains with a rather basic algorithm. They look unpredictable and uniformly distributed. This is more than sufficient for the vast majority of use cases, but it is inherently flawed. These pseudo random number generators won't fool well designed attacking software. In use cases where attacks are awaited, such as encryption keys, these simple algorithms won't do, we'll need a more robust source of unpredictability which can only be found through interactions with the real world.

DEV Community