Pseudorandom number generation

#algorithms

Quite often programmers in their work meet with the need of working with random numbers. Most often random numbers are required in modeling, numerical analysis, and testing tasks, but there are many other very specific tasks like:

Random selection of the responder. In this case, one wheel is enough.
Form mini-groups to work in class or to work on a long-term project.
Choice of tasks for students, both textual and using images.
Formation of sentences. You can represent wheels with sentence members. Students must form sentences with correct word order, cases, etc.
Winged phrases. You can break such phrases, and aphorisms into parts. Start wheels. The students have to indicate the correct or incorrect order of these phrases.
Randomly choose questions from the didactic game.
And much more..

They are all easily solved at SpinnerWheel, a relatively new site that provides an easy way to create custom wheel generators to randomly select names, numbers, words, and even images. A feature of SpinnerWheel is the ability to further customize the names, words, numbers, and images that appear in the wheel. You can also customize the sound and visual effects of the spinners you create.

Of course, all modern programming languages have a random function or their analogs. These functions most often give really good pseudorandom numbers, but I've always wondered how these functions work.
In this topic, I will try to explain how the linear congruential method (which is most often used in the random function) works, and the method of getting random numbers using a polynomial counter (which is often used for hardware testing).

Introduction

It is worth saying right away that it makes sense to generate random numbers only with uniform distribution since all other distributions can be obtained from the uniform one using transformations known from probability theory.
For those who have forgotten or have not yet studied probability theory, let me remind you that in a uniformly distributed sequence of zeros and ones, zeros will occur on average (!) in 50% of cases.

However, it does not mean that there will be exactly 500 zeros in a sequence of 1000 digits. Moreover, a sequence of 1000 digits can have 999 zeros, and the probability that the thousandth element will be equal to zero remains equal to 0.5. This seems paradoxical at first sight, but it is important to understand that all sequences are equally probable. If we consider a sufficiently large set of such sequences, then on average(!) each of them will contain 500 zeros.
Having remembered the theory a little bit, we move on to history. In pre-computer times, random numbers were obtained by pulling colorful balls out of bags, drawing cards, and rolling dice. You couldn't do serious research that way, so in 1927 Tippett published the first table of random numbers. A little later, people tried to automate the process somehow. Machines generating random numbers began to appear. Now such devices are also used and are called sources (generators) of entropy. It should be noted that only such devices can give truly random numbers. But, unfortunately, entropy generators are quite expensive, and it is not possible to install them on every PC. This is why the need for algorithms to produce random numbers arose.

The first attempt to obtain an NSP

Some people think that getting random numbers is easy. In their opinion, it is enough to perform random complex mathematical operations on the original number. If we open the second volume of the famous Knuth, we find that in 1959 Knuth also tried to build a generator based on such an idea. His algorithm looked like this:

[Select the number of iterations.] Assign Y to the largest significant digit of X. (We will perform steps K2-K13 exactly Y+1 times, i.e., apply randomized transformations a random number of times.)
[Select random step] Assign the next highest significant digit X. Move to step K(3 + Z), i.e. to a randomly chosen step in the program.
[Provide > 5 x 109] If X < 5000000000, assign X the value X + 5000000000.
[Middle of the square.] Replace X with the middle of the square X
[Multiply.] Replace X with the number (1001001001 X) mod 1010.
[Pseudo-complete.] If X < 100000000, assign X the value X + 9814055677; otherwise assign X the value 1010- X.
[Rearrange the halves.] Swap the five lowest characters with the highest.
[Multiply.] Perform step K5.
[Decrease digits.] Decrease each non-zero digit of the decimal representation of the number X by one.
К10. [Modified by 99999.] If A' < 105, assign X the value - X 2 +99999; otherwise assign X the value X - 99999.
[Normalize.] (At this step A' cannot be zero.) If X <109, multiply X by 10.
[Modification of the method of mean squares.] Replace X with the mean 10 digits of the number X(X - 1).
[Repeat?] If Y > 0, decrease Y by 1 and return to step K2. If Y = 0, the algorithm is complete. The value of number X from the previous step will be the desired "random" value.

Despite its apparent complexity, this algorithm quickly converged to the number 6065038420, which after a small number of steps converted to itself. The moral of the story is that you can't use a random algorithm to get random numbers.

Linear congruential method

In most programming languages, this is the method used in the standard random number function. This method was first proposed by Lehmer in 1949. Four numbers are selected:

Module m (m>0);
Multiplier a (0<=a<m);
Increment c (0<=c<m);
Initial value X0 (0<= X0 < m)

The sequence is obtained using the following recurrence formula: Xn+1 =(a* Xn +c) mod m.
This method gives really good pseudorandom numbers, but if we take the numbers m,a,c arbitrarily, we are likely to be disappointed. With m=7, X0 =1, a=2, c=4 we get the following sequence: 1,6,2,1,6,2,1,...
This sequence does not exactly fit the definition of a random sequence. Nevertheless, this failure allowed us to draw two important conclusions:

The numbers m,a,c, X0 must not be random;
The linear congruential method gives us repetitive sequences.

Any function mapping a finite set X to X will give cyclically repeated values. So, our goal is to lengthen as much as possible the unique part of the sequence (by the way, the length of the unique part cannot be more than m).
Without going into details of the proofs, let us say that the period of the sequence will be equal to m only if the following three conditions are met:

The numbers c and m are mutually prime;
a-1 is a multiple of p for every prime p that is a divisor of m;
If m is a multiple of 4, then a-1 must also be a multiple of 4.

At the end of the story about the linear congruential method, we should say that the sequences obtained with its help, although they are sufficiently random, are not cryptographically stable. Because knowing 4 consecutive numbers, the cryptanalyst can make a system of equations, from which we can find a,c,m.

Obtaining pseudorandom numbers based on the polynomial counter (shift register)

The algorithm we are going to consider is based on the exclusive OR (sum modulo two) operation. Just in case, let me remind you what the truth table for this function looks like:

The circuits shown in the figure below are the simplest polynomial counters. The zero bit in such circuits is calculated based on the exclusive OR function, and all other bits are obtained by a simple shift. The bits from which the signal goes to the exclusive OR are called taps.

Let's look at how the values in these registers will change when the initial value is 001:

Both registers start with the same value, but then the values generated by the registers start to diverge quickly. But after 6 steps, both registers return to their initial state.
It is easy to show that both of these registers have generated the longest possible sequence, which contains all combinations except zero. That is, if the register bit is m, you can get a sequence of length 2m -1.
The polynomial counter of any dimension has many combinations of taps that will provide the sequence of maximum length, using wrong combinations will lead to the generation of short sequences. A separate and quite difficult task is to find these combinations of taps.
It is worth noting that these combinations are not always unique. For example, for a 10-bit counter, there are two: [6;9] and [2;9], for a six-bit counter there are twenty-eight such combinations.
To find these combinations, we need to represent the counter as a polynomial. The counters from the example will have the following form: x2 XOR 1 and x2 XOR x XOR 1.

From theory, we know that a necessary and sufficient condition for generating a complete sequence is the primitiveness of the characteristic polynomial. This means that:

A characteristic polynomial cannot be represented as a product of polynomials of a lower degree;
The characteristic polynomial is a divisor of the polynomial zδ XOR 1, at(δ=2m -1, and is not a divisor at any other values of δ<2m -1.

The advantages of the polynomial counter are simplicity, both software and hardware implementation, speed of operation, and cryptographic stability.