Introduction
Recently I read an article which gives a nice overview of generating non-repeating random numbers. After explaining the purpose of random number generators in general, the author presents three algorithms to generate non-repeating random numbers, namely:
- Repeatedly generate normal random numbers and compare each one with all the previously generated random numbers.
- Create a complete list of all numbers in the range of the desired random numbers, shuffle the list and take the numbers one by one from this list.
- Like 2. but without shuffling in advance, instead a partial shuffle (swap of two numbers) is done when a number is taken from the list.
The third algorithm is better than the second in the sense that the up-front cost of shuffling the complete list is avoided if only a few random numbers are needed from a large range. But both can generate new random numbers in constant time. Unfortunately, both algorithms need huge amounts of memory since all possible random numbers must be stored beforehand.
The first algorithm may need the same amount of memory since every generated number must be stored for the subsequent comparisons. On top of that, its time complexity is very bad because every number generated must be compared to all numbers previously generated (this can be mitigated by hash-based storage techniques, but that drives up the memory consumption even more). Furthermore, it may happen that the algorithm does not terminate because (at least in the real world) random numbers have no memory - there is no guarantee that at some point a random number will come along that has not appeared before.
All three algorithms and their disadvantages (except for the last one mentioned) are explained very well in the article referenced at the beginning.
At the end, the author asks "Can we do better?" and his answer ultimately seems to be "no". Now "better" is a relative term. Perhaps the author is right if only the time complexity is considered or if the non-repeating random numbers have to be as "perfect" as those from a good random number generator. But in any case, there are other algorithms, e.g. based on block ciphers (format preserving encryption is another keyword), that perform the desired task. But these algorithms are incredibly complex and therefore very hard to understand (apart from that, I'm pretty sure that they do not work as fast as the mentioned algorithms No. 2 and No. 3).
In the following, I would like to present a quite simple algorithm to generate non-repeating random numbers which are somewhat less perfect but certainly perfect enough for various purposes (at least for mine). The idea of the algorithm is simple, but the final implementation is a little bit tricky (an imaginary tree structure is recursively navigated), so we approach the final version in three steps.
Remark: I implemented the algorithm in Java, but I rewrote it in JavaScript for this article to make it more accessible.
Evolution of a non-repeating Random Number Generator
The algorithm presented here produces non-repeating integer random numbers in the range from 0 to max-1, where max can be an arbitrarily large value greater than zero.
The well known divide and conquer paradigm recommends breaking down large problems into smaller ones, so let's break down this (possibly gigantic) range into smaller portions, called pages from now on.
Every range [0...n] with n >= 0 can be divided into maxPage + 1 pages containing no more than maxPageSize elements each, whereby maxPage = n / maxPageSize (integer division) holds. Sounds complicated, so let's explain it visually with n = 9 and maxPageSize = 4 and therefore maxPage = 2:
0 1 2 3 4 5 6 7 8 9 ---> 0 1 2
3 4 5
6 7 8
9
Page 0 contains the elements [0, 3, 6, 9], page 1 contains [1, 4, 7] and page 2 contains [2, 5, 8].
Another example with n = 9 and maxPageSize = 2:
0 1 2 3 4
5 6 7 8 9
Here we have maxPage = 4 with page 0 containing [0, 5], page 1 containing [1, 6] and so on.
For the further explanation of the algorithm we will stick to these initial values, but of course it also works for arbitrary ones.
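To make the paging concrete, here is a minimal sketch of how such a page can be computed. The real calculation lives in the Slicer class used in the jsfiddles below; the standalone function getPage here only mimics its method of the same name and is my illustration, not the original code:

// Illustrative sketch of the paging idea (not the original Slicer).
// Page p of the range [0...n] contains p, p + (maxPage + 1), p + 2 * (maxPage + 1), ...
// as long as the value does not exceed n.
function getPage(n, maxPageSize, page) {
  const maxPage = Math.trunc(n / maxPageSize); // integer division
  const step = maxPage + 1;                    // number of pages = distance between values of one page
  const values = [];
  for (let v = page; v <= n; v += step) {
    values.push(v);
  }
  return values;
}

// Example: n = 9, maxPageSize = 4 --> maxPage = 2, i.e. three pages
console.info(getPage(9, 4, 0)); // [0, 3, 6, 9]
console.info(getPage(9, 4, 1)); // [1, 4, 7]
console.info(getPage(9, 4, 2)); // [2, 5, 8]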
First version of the algorithm
The first (not so good) version of our algorithm just shuffles the pages as needed and delivers their values one by one until the desired number of random numbers is reached.
Let's say we need 3 non-repeating random numbers. The algorithm shuffles the first page [0, 5] and delivers its values, then it shuffles the second page [1, 6] and delivers its first value. The second value of the second page is stored internally for subsequent calls for random numbers. This means only four different results are possible:
[0, 5, 1]
[0, 5, 6]
[5, 0, 1]
[5, 0, 6]
As I said, the first version of our algorithm is not so good.
Here you can find a jsfiddle of the first version of our algorithm (the program only writes logging messages, so open the browser console). The calculation of the pages is done by the (ES6) class Slicer. This class will not change until the final version of our algorithm, so you only have to read and understand it once (the same holds for the functions assert and shuffleInPlace). The interesting things happen in NonRepeatingRandom_v1 and its function next, which is called at the end of the jsfiddle:
const nrr = new NonRepeatingRandom_v1(10, 2);
// try it out!
// console.info(nrr.next(1)); // get only one random number
// console.info(nrr.next(5)); // get 5 random numbers
console.info(nrr.next(10)); // get all random numbers at once
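If you don't want to open the jsfiddle right away, the following rough sketch shows what version 1 boils down to. It reuses the getPage helper sketched above instead of the real Slicer, and shuffleInPlace is a standard Fisher-Yates shuffle playing the role of the function of the same name in the jsfiddle - the class body is a simplified assumption, not the original code:

// Standard Fisher-Yates shuffle, in place.
function shuffleInPlace(array) {
  for (let i = array.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [array[i], array[j]] = [array[j], array[i]];
  }
  return array;
}

// Simplified sketch of version 1: pages are shuffled and consumed strictly in order.
class NonRepeatingRandomV1Sketch {
  constructor(max, maxPageSize) {
    this.n = max - 1;              // the range is [0 ... max-1]
    this.maxPageSize = maxPageSize;
    this.page = 0;                 // index of the next page to shuffle
    this.buffer = [];              // leftover values of the current page
  }
  next(count) {
    const result = [];
    while (result.length < count) {
      if (this.buffer.length === 0) {
        this.buffer = shuffleInPlace(getPage(this.n, this.maxPageSize, this.page++));
      }
      result.push(this.buffer.shift());
    }
    return result; // note: no check against asking for more than max values
  }
}

const sketch = new NonRepeatingRandomV1Sketch(10, 2);
console.info(sketch.next(3)); // e.g. [5, 0, 6] - one of the four possible results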
Second version of the algorithm
The problem of the first version of our algorithm is that the calculated pages are processed one after the other. But wait!
0 1 2 3 4 <--- indices of the five pages
5 6 7 8 9
The page indices also form a range from 0 to some n, but a much smaller n than before (depending on maxPageSize). So why not apply the first algorithm to create a non-repeating sequence of page indices which define which pages are used to generate the desired random numbers? Let's do it:
0 1 2
3 4
(again with maxPageSize = 2, but this time with n = 4 and therefore maxPage = 2)
This improves the randomness of the generated sequences as they now start with [0, 5] or [5, 0] from page 0 or [3, 8] or [8, 3] from page 3 (the page indices originate from the first page of indices [0, 3]).
This is exactly what happens in the second version of our algorithm which can be found in this jsfiddle (again, don't forget to open the browser console).
Instead of using recursion (à la NonRepeatingRandom_v1 calls itself), two instances of Slicer are utilized in a class NonRepeatingRandom_v2 - this way it is good preparation for understanding the third and final version of our algorithm. But first play with the second one - the generated sequences of non-repeating random numbers already look better, but there is still room for improvement.
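Analogously, here is a rough sketch of what version 2 does with its one additional level of paging (again using the getPage and shuffleInPlace helpers sketched above instead of two Slicer instances - simplified, not the original code):

// Simplified sketch of version 2: the page indices themselves are paged and shuffled.
class NonRepeatingRandomV2Sketch {
  constructor(max, maxPageSize) {
    this.n = max - 1;                                 // the range is [0 ... max-1]
    this.maxPageSize = maxPageSize;
    this.maxPage = Math.trunc(this.n / maxPageSize);  // highest page index
    this.indexPage = 0;        // next page of the "index Slicer" to shuffle
    this.indexBuffer = [];     // shuffled page indices not yet consumed
    this.valueBuffer = [];     // shuffled values of the current page
  }
  next(count) {
    const result = [];
    while (result.length < count) {
      if (this.valueBuffer.length === 0) {
        if (this.indexBuffer.length === 0) {
          // page and shuffle the range of page indices [0 ... maxPage]
          this.indexBuffer = shuffleInPlace(
            getPage(this.maxPage, this.maxPageSize, this.indexPage++));
        }
        const pageIndex = this.indexBuffer.shift();
        this.valueBuffer = shuffleInPlace(getPage(this.n, this.maxPageSize, pageIndex));
      }
      result.push(this.valueBuffer.shift());
    }
    return result;
  }
}

console.info(new NonRepeatingRandomV2Sketch(10, 2).next(4)); // e.g. [8, 3, 0, 5]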
Final version of the algorithm
I think you've already guessed it: The indices of the pages of indices (of the very first pages) can again be sliced into pages, and so on - until there is nothing left to slice, i.e., a Slicer is reached which can produce only one page (maxPage = 0).
With regard to our example we have (still with maxPageSize = 2, but of course different values can also be used)
0 1
2
and
0
1
- which is also illustrated in the cover image at the top of this article.
For the human brain (at least for mine) it's not easy to grasp the benefits of this "ultimate paging" because for large max the recursion can go quite deep - it's a little bit like thinking in more than three dimensions.
For this reason, the final version of the algorithm is used in this jsfiddle to plot pixels inside a 500 x 500 canvas in random order. This is done by picking 500 * 500 = 250,000 non-repeating random numbers out of the range [0...249999], translating them into coordinates...
const x = Math.trunc(randomValue / 500);
const y = randomValue % 500;
...and drawing black squares of size 1 x 1 at these positions.
Technical remark: To make this process observable, the pixels are not plotted all at once. Instead only some of them are plotted in a method plotPixel() which indirectly calls itself via window.requestAnimationFrame(plotPixel) if the complete task is not yet finished. At 60 fps, requestAnimationFrame calls the function given as argument around 60 times per second - if plotPixel() plotted only one pixel, it would take more than one hour to plot all pixels. To shorten this time plotPixel() always plots 100 pixels at once - this way you can meditate in front of the screen for around 42 seconds watching the rise of the pixels...
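For illustration, the plotting loop could look roughly like this (names and details are assumed here, not copied from the jsfiddle; it relies on the final NonRepeatingRandom class and a 500 x 500 canvas element):

// Sketch of the plotting loop described above.
const canvas = document.querySelector('canvas');  // assumed 500 x 500 canvas
const ctx = canvas.getContext('2d');
const nrr = new NonRepeatingRandom(500 * 500);    // final version of the algorithm
let plotted = 0;

function plotPixel() {
  // plot 100 pixels per animation frame
  for (const randomValue of nrr.next(Math.min(100, 500 * 500 - plotted))) {
    const x = Math.trunc(randomValue / 500);
    const y = randomValue % 500;
    ctx.fillRect(x, y, 1, 1); // black 1 x 1 square (the default fillStyle is black)
    plotted++;
  }
  if (plotted < 500 * 500) {
    window.requestAnimationFrame(plotPixel); // schedule the next batch
  }
}

window.requestAnimationFrame(plotPixel);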
Again I encourage you to play around with the code!
The constructor of the class NonRepeatingRandom looks like this:
class NonRepeatingRandom {
  /**
   * Produces non-repeating pseudorandom numbers between 0 and max-1 (incl.).
   *
   * @param {number} max the upper bound (exclusive) of the generated numbers
   * @param {boolean} maximize to maximize the number of slicers by repeating the last maxPageSize
   * @param {...number} maxPageSizes the page sizes used for the successive slicers
   */
  constructor(max, maximize, ...maxPageSizes) {
You can
- change the size of the canvas
- try out different page sizes
- read the doc of the constructor of the class NonRepeatingRandom - a value of true for the parameter maximize means "re-use the last given maxPageSize if more than maxPageSizes.length Slicers can be constructed"
- prevent the "ultimate paging" by setting maximize to false when calling the constructor of NonRepeatingRandom
- use a "usual" sequence of random numbers to plot the pixels and compare the visual randomness
- and so on, and so on
Time and space complexity
These complexity values are hard to calculate because they depend on the maxPageSizes used when the algorithm is initialized.
Let's look at a border case: If the constructor is called with
new NonRepeatingRandom(max, false, max); // or `..., true, max`
then only one Slicer is initialized which produces only one page [0, max - 1]. In this case the algorithm equals algorithms No. 2 and No. 3 in the referenced article regarding time and space complexity.
If only the parameter max is provided when calling the constructor
new NonRepeatingRandom(max)
then maximize = true and maxPageSizes = [2] are set as default values - therefore
new NonRepeatingRandom(max, true, 2)
initializes the class in the same way. With these sensible defaults the time and space complexity can be calculated easily:
To compute all max random numbers (at once or step by step) each value in each page must be calculated at some point in time. As we have seen, the pages can be arranged in a tree structure such that the number of pages (roughly) halves from one stage to the next. Together with the geometric series n + n/2 + n/4 + ... <= 2n this shows that all random numbers can be computed in O(n), which means that on average one random number can be computed in O(1).
Analogously, the space complexity can be calculated as O(log(n)) since only one page (or rather its values) must be kept available in each stage of the aforementioned tree structure.
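If you want to convince yourself numerically, the following little helper (purely illustrative, not part of the implementation) counts the values handled on every level of the conceptual tree for maxPageSize = 2:

// Count the values handled on each level of the (conceptual) Slicer tree.
function levelSizes(max) {
  const sizes = [];
  let n = max - 1;               // the current range is [0 ... n]
  while (true) {
    sizes.push(n + 1);           // values on this level
    const maxPage = Math.trunc(n / 2);
    if (maxPage === 0) break;    // only one page left, the recursion stops here
    n = maxPage;                 // the next level pages the page indices [0 ... maxPage]
  }
  return sizes;
}

const sizes = levelSizes(250000);
console.info(sizes.length);                      // tree depth, roughly log2(max) levels
console.info(sizes.reduce((a, b) => a + b, 0));  // total work, below 2 * max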
Further optimizations
A disadvantage of the algorithm in its current form is that random numbers always come in groups (consisting of the values from their originating page) - w.r.t. our running example:
- 0 is followed by 5 or 5 is followed by 0
- 1 is followed by 6 or 6 is followed by 1
- ...
If this is a problem then it can be solved by always generating a minimum number of random numbers, which in turn are shuffled again and kept in a buffer. If random numbers are requested externally, they are fetched from the buffer. If the buffer falls below a certain fill level, the buffer is refilled with new random numbers and shuffled again.
If the buffer has a fixed size then it does not change the time and space complexity of the algorithm. On the other hand, if it has a fixed size then of course some max can be found such that the grouping effect takes place in pages higher up the tree. But the outcome of this effect should be very difficult to detect - the higher up, the more difficult. Therefore, the buffer should not be too small - or it should be sized flexibly depending on max.
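A rough sketch of such a buffering wrapper could look like this (illustrative only, not part of the published code; it assumes the final NonRepeatingRandom class with its next method and the shuffleInPlace helper):

// Pre-fetches values into a buffer that is shuffled again, which breaks up
// the per-page grouping of consecutive values.
class BufferedNonRepeatingRandom {
  constructor(max, bufferSize = 64) {
    this.source = new NonRepeatingRandom(max);
    this.remaining = max;          // values the source can still deliver
    this.bufferSize = bufferSize;
    this.buffer = [];
  }
  next(count) {
    const result = [];
    while (result.length < count) {
      // refill and re-shuffle when the buffer falls below half of its size
      if (this.buffer.length < this.bufferSize / 2 && this.remaining > 0) {
        const refill = Math.min(this.bufferSize - this.buffer.length, this.remaining);
        this.buffer.push(...this.source.next(refill));
        this.remaining -= refill;
        shuffleInPlace(this.buffer);
      }
      result.push(this.buffer.shift()); // note: no check against exhausting the range
    }
    return result;
  }
}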
Inject even more randomness
Right after I published the article, another optimization came to my mind. So far there is only one place in the algorithm where randomness occurs: The content of each page fetched from Slicer (by calling its method getPage) is shuffled in NonRepeatingRandom before its further processing.
Let me repeat some example from above:
0 1 2 3 4 5 6 7 8 9 ---> 0 1 2
3 4 5
6 7 8
9
(here we have n = 9, maxPageSize = 4 and maxPage = 2)
Page 0 equals [0, 3, 6, 9] and will be shuffled later on, so one could say that the shuffling happens vertically 'per page'.
It would be great to introduce some horizontal shuffling too, but this is not possible for obvious reasons (all shuffled values would have to be stored). However, what is possible with some integer arithmetic is rotating the rows randomly per Slicer, e.g.,
0 1 2 3 4 5 6 7 8 9 ---> 2 0 1 (rotated right)
3 4 5 (unrotated)
7 8 6 (rotated left)
9
which leads to page 0 equalling [2, 3, 7, 9] and so on.
I've not yet implemented it, but I'll update this article as soon as I've done it.
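To make the idea a little more concrete anyway, here is a rough sketch of how the random row rotation could be computed with integer arithmetic only (illustrative, not the announced implementation):

// Every row gets a random rotation offset (chosen once per "Slicer"), and a page
// picks the rotated value from each row - no values have to be stored.
function makeRotatedPager(n, maxPageSize) {
  const maxPage = Math.trunc(n / maxPageSize); // highest page index
  const rowWidth = maxPage + 1;                // values per full row
  const rows = Math.trunc(n / rowWidth) + 1;   // number of rows
  const rotations = [];
  for (let r = 0; r < rows; r++) {
    const len = Math.min(rowWidth, n - r * rowWidth + 1); // the last row may be shorter
    rotations.push(Math.floor(Math.random() * len));
  }
  return function getRotatedPage(page) {
    const values = [];
    for (let r = 0; r < rows; r++) {
      const len = Math.min(rowWidth, n - r * rowWidth + 1);
      if (page < len) { // a shorter last row contributes to fewer pages
        values.push(r * rowWidth + (page + rotations[r]) % len);
      }
    }
    return values;
  };
}

// Example: n = 9, maxPageSize = 4 --> rows of width 3; page 0 could now be [2, 3, 7, 9]
const getRotatedPage = makeRotatedPager(9, 4);
console.info(getRotatedPage(0), getRotatedPage(1), getRotatedPage(2));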
That's it
I hope you had some fun reading this article, comments are welcome!
Top comments (2)
@lukeshiru Two more comments:
- Math.floor(Math.random() * (iterableArray.length - 1)) is wrong, Math.floor(Math.random() * iterableArray.length) would be ok.
- Your method of shuffling looked a bit suspicious to me, so I wrote a little test:
If you run this a few times you will see that the combination 123 occurs most frequently by a large margin, followed by 321. The remaining combinations are underrepresented, with 213 still appearing the most, e.g.:
Something similar is discussed here: blog.codinghorror.com/the-danger-o... - that's a very good read!
Cheers!
@lukeshiru Thanks for your comment! Never used generators in JavaScript - very interesting! Regarding your "simpler approaches": The first one (more or less) equals No. 2 and the second (more or less) equals No. 3 in the article from Graham Cox referenced in my article. Both do very well if and only if the range [min, max] is not "too large". Citation from the Graham Cox article: So the memory consumption is enormous in this case. In contrast, the algorithm proposed by me only needs some 100 bytes in this case due to its logarithmic space complexity.