Burkhard Graves

A new(?) non-repeating Random Number Generator

Introduction

Recently I read an article that gives a nice overview of generating non-repeating random numbers. After explaining the purpose of random number generators in general, the author presents three algorithms to generate non-repeating random numbers, namely:

  1. Repeatedly generate ordinary (possibly repeating) random numbers and compare each one with all previously generated numbers, rejecting duplicates.
  2. Create a complete list of all numbers in the range of the desired random numbers, shuffle the list and take the numbers one by one from this list.
  3. Like 2., but without shuffling in advance; instead, a partial shuffle (a swap of two numbers) is performed each time a number is taken from the list.

The third algorithm is better than the second in the sense that the up-front cost of shuffling the complete list is avoided if only a few random numbers are needed from a larger range. Both can generate new random numbers in constant time. Unfortunately, both algorithms need huge amounts of memory, since all possible random numbers must be stored beforehand.
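
For illustration, algorithm No. 3 essentially boils down to a lazily evaluated Fisher-Yates shuffle. This is only my own sketch of the idea (not code from the referenced article), and it still has to materialize the full list up front:

// Sketch of algorithm No. 3: a lazy (partial) Fisher-Yates shuffle.
// The complete list of candidates must still be stored in memory.
function makeNonRepeating(max) {
  const list = [...Array(max).keys()]; // 0, 1, ..., max - 1
  let remaining = max;
  return () => {
    // pick a random element from the not-yet-used part of the list ...
    const i = Math.floor(Math.random() * remaining--);
    // ... and swap it to the end of that part (the partial shuffle)
    [list[i], list[remaining]] = [list[remaining], list[i]];
    return list[remaining]; // (no handling of exhaustion after max calls)
  };
}

const nextRandom = makeNonRepeating(10);
console.info(nextRandom(), nextRandom(), nextRandom());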

The first algorithm may need the same amount of memory, since every generated number must be stored for the subsequent comparisons. On top of that, its time complexity is poor, because every newly generated number must be compared with all previously generated ones (this can be mitigated by hash-based storage techniques, but that drives memory consumption up even further). Furthermore, the algorithm may never terminate, because (at least in the real world) random numbers have no memory - there is no guarantee that a number which has not appeared before will ever come along.

All three algorithms and their disadvantages (except for the last-mentioned one, the possible non-termination) are explained very well in the article referenced at the beginning.

At the end, the author asks "Can we do better?" and his answer ultimately seems to be "no". Now, "better" is a relative term. Perhaps the author is right if only the time complexity is considered, or if the non-repeating random numbers have to be as "perfect" as those from a good random number generator. But in any case, there are other algorithms, e.g. based on block ciphers (format-preserving encryption is another keyword), that perform the desired task. These algorithms, however, are incredibly complex and therefore very hard to understand (apart from that, I'm pretty sure they are not as fast as the aforementioned algorithms No. 2 and No. 3).

In the following, I would like to present a quite simple algorithm for generating non-repeating random numbers that are somewhat less perfect, but certainly perfect enough for various purposes (at least for mine). The idea of the algorithm is simple, but the final implementation is a little bit tricky (an imaginary tree structure is navigated recursively), so we will approach the final version in three steps.


Remark: I implemented the algorithm in Java, but rewrote it in JavaScript for this article to make it more accessible.

Evolution of a non-repeating Random Number Generator

The algorithm presented here produces non-repeating integer random numbers in the range from 0 to max-1, where max can be an arbitrarily large value greater than zero.

The well-known divide-and-conquer paradigm recommends breaking large problems down into smaller ones, so let's break this (possibly gigantic) range down into smaller portions, called pages from now on.

Every range [0...n] with n >= 0 can be divided into maxPage + 1 pages, each containing at most maxPageSize elements, where

maxPage = n / maxPageSize   (integer division)

holds. That sounds complicated, so let's explain it visually with n = 9 and maxPageSize = 4, and therefore maxPage = 2:

0 1 2 3 4 5 6 7 8 9  --->  0 1 2
                           3 4 5
                           6 7 8
                           9

Page 0 contains the elements [0, 3, 6, 9], page 1 contains [1, 4, 7] and page 2 contains [2, 5, 8].

Another example with n = 9 and maxPageSize = 2:

0 1 2 3 4
5 6 7 8 9

Here we have maxPage = 4 with page 0 containing [0, 5], page 1 containing [1, 6] and so on.

For the further explanation of the algorithm we will stick to these initial values, but of course it also works for arbitrary ones.
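
The Slicer class in the jsfiddles linked below takes care of this page computation. As a minimal sketch of the idea (illustrative only, not the actual Slicer code), a single page could be computed like this:

// Sketch: compute page `p` of the range [0...n] for a given maxPageSize.
function getPage(n, maxPageSize, p) {
  const numPages = Math.trunc(n / maxPageSize) + 1; // = maxPage + 1
  const page = [];
  for (let value = p; value <= n; value += numPages) {
    page.push(value);
  }
  return page;
}

console.info(getPage(9, 4, 0)); // [0, 3, 6, 9]
console.info(getPage(9, 2, 1)); // [1, 6]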

First version of the algorithm

The first (not so good) version of our algorithm simply shuffles the pages as needed and delivers their values one by one until the desired number of random numbers is reached.

Let's say we need 3 non-repeating random numbers. The algorithm shuffles the first page [0, 5] and delivers its values, then it shuffles the second page [1, 6] and delivers its first value. The second value of the second page is stored internally for subsequent requests for random numbers. This means only four different results are possible:

  1. [0, 5, 1]
  2. [0, 5, 6]
  3. [5, 0, 1]
  4. [5, 0, 6]

As I said, the first version of our algorithm is not so good.

Here you can find a jsfiddle of the first version of our algorithm (the program only writes logging messages, so open the browser console). The calculation of the pages is done by the (ES6) class Slicer. This class will remain unchanged up to the final version of our algorithm, so you only have to read and understand it once (the same holds for the functions assert and shuffleInPlace). The interesting things happen in NonRepeatingRandom_v1 and its function next, which is called at the end of the jsfiddle:

const nrr = new NonRepeatingRandom_v1(10, 2);

// try it out!
// console.info(nrr.next(1)); // get only one random number
// console.info(nrr.next(5)); // get 5 random numbers
console.info(nrr.next(10)); // get all random numbers at once
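
To give a rough idea of what happens inside (a simplified sketch with made-up names, not the code from the jsfiddle): the pages are fetched strictly in order, each one is shuffled when it is first needed, and its values are handed out one by one.

// Simplified sketch of the idea behind NonRepeatingRandom_v1.
// getPage is the page computation sketched above; shuffleInPlace is the
// shuffle helper mentioned above (see the jsfiddle for the real code).
class NonRepeatingRandomSketch {
  constructor(max, maxPageSize) {
    this.n = max - 1;
    this.maxPageSize = maxPageSize;
    this.currentPage = [];  // values of the page currently being consumed
    this.nextPageIndex = 0; // index of the next page to fetch
  }

  next(count) {
    const result = [];
    while (result.length < count) { // (no handling of an exhausted range)
      if (this.currentPage.length === 0) {
        this.currentPage = getPage(this.n, this.maxPageSize, this.nextPageIndex++);
        shuffleInPlace(this.currentPage);
      }
      result.push(this.currentPage.pop());
    }
    return result;
  }
}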

Second version of the algorithm

The problem of the first version of our algorithm is that the calculated pages are processed one after the other. But wait!

0 1 2 3 4 <--- indices of the five pages
5 6 7 8 9

The page indices also form a range from 0 to some n, but a much smaller n than before (depending on maxPageSize). So why not apply the first algorithm to create a non-repeating sequence of page indices that defines which pages are used to generate the desired random numbers? Let's do it:

0 1 2
3 4

(again with maxPageSize = 2, but this time with n = 4 and therefore maxPage = 2)

This improves the randomness of the generated sequences: they now start with [0, 5] or [5, 0] (from page 0) or with [3, 8] or [8, 3] (from page 3), because the first page of page indices is [0, 3].

This is exactly what happens in the second version of our algorithm, which can be found in this jsfiddle (again, don't forget to open the browser console).
Instead of using recursion (à la NonRepeatingRandom_v1 calling itself), two instances of Slicer are used in a class NonRepeatingRandom_v2 - this is good preparation for understanding the third and final version of our algorithm. But first play with the second one - the generated sequences of non-repeating random numbers already look better, but there is still room for improvement.
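
Conceptually, fetching the next value in version 2 could look like the following sketch (names and the assumed getPage signature are illustrative; the real NonRepeatingRandom_v2 is in the jsfiddle):

// Sketch of the two-level idea in NonRepeatingRandom_v2: an "index Slicer"
// decides in which order the pages of the "value Slicer" are consumed;
// both kinds of pages are shuffled before they are used.
function makeNonRepeatingV2(valueSlicer, indexSlicer) {
  let valuePage = [];   // current page of values
  let indexPage = [];   // current page of page indices
  let nextIndexPage = 0;

  return function nextValue() { // (exhaustion not handled in this sketch)
    if (valuePage.length === 0) {
      if (indexPage.length === 0) {
        indexPage = indexSlicer.getPage(nextIndexPage++); // signature assumed
        shuffleInPlace(indexPage);
      }
      valuePage = valueSlicer.getPage(indexPage.pop());
      shuffleInPlace(valuePage);
    }
    return valuePage.pop();
  };
}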

Final version of the algorithm

I think you've already guessed it: the indices of the pages of indices (of the very first pages) can again be sliced into pages, and so on - until there is nothing left to slice, i.e., until a Slicer is reached that can produce only one page (maxPage = 0).

With regard to our example we get (still with maxPageSize = 2, although different values could of course be used)

0 1
2

and

0
1

- which is also illustrated in the cover image at the top of this article.

For the human brain (at least for mine) it's not easy to grasp the benefits of this "ultimate paging", because for large max the recursion can go quite deep - it's a little bit like thinking in more than three dimensions.

For this reason, the final version of the algorithm is used in this jsfiddle to plot pixels inside a 500 x 500 canvas in random order. This is done by picking 500 * 500 = 250,000 non-repeating random numbers out of the range [0...249999], translating them into coordinates...

const x = Math.trunc(randomValue / 500);
const y = randomValue % 500;

...and drawing black squares of size 1 x 1 at these positions.

Technical remark: To make this process observable, the pixels are not plotted all at once. Instead, only some of them are plotted in a method plotPixel(), which indirectly calls itself with window.requestAnimationFrame(plotPixel) as long as the complete task is not yet finished. At 60 fps, requestAnimationFrame calls the given function around 60 times per second - if plotPixel() plotted only one pixel per call, it would take more than an hour to plot all pixels. To shorten this time, plotPixel() always plots 100 pixels at once - this way you can meditate in front of the screen for around 42 seconds, watching the pixels appear...
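
A simplified skeleton of this plotting loop (my own sketch; it assumes the final class offers the same next(count) method as the earlier versions and that a 500 x 500 canvas element exists on the page):

// Simplified sketch of the batched plotting loop described above.
const nrr = new NonRepeatingRandom(500 * 500); // 250,000 non-repeating values
const ctx = document.querySelector("canvas").getContext("2d");
let plotted = 0;

function plotPixel() {
  for (const randomValue of nrr.next(100)) {  // 100 pixels per animation frame
    const x = Math.trunc(randomValue / 500);
    const y = randomValue % 500;
    ctx.fillRect(x, y, 1, 1);                 // a black 1 x 1 square
  }
  plotted += 100;
  if (plotted < 500 * 500) {
    window.requestAnimationFrame(plotPixel);  // schedule the next batch
  }
}

window.requestAnimationFrame(plotPixel);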

Again I encourage you to play around with the code!

The constructor of the class NonRepeatingRandom looks like this:

class NonRepeatingRandom {
  /**
   * Produces non-repeating pseudorandom numbers between 0 and max-1 (incl.).
   *
   * @param {number} max
   * @param {boolean} maximize to maximize the number of slicers by repeating the last maxPageSize
   * @param  {...number} maxPageSizes
   */
  constructor(max, maximize, ...maxPageSizes) {
    // ... (see the jsfiddle for the full implementation)
  }
}
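
Internally, the constructor sets up the chain of Slicers described above. A hypothetical sketch of that setup (the Slicer signature and all names here are assumptions; the real code is in the jsfiddle):

// Hypothetical sketch: build a chain of Slicers, each slicing the page
// indices of the previous one, until a Slicer with maxPage = 0 is reached
// (or until the given maxPageSizes are used up when maximize is false).
function buildSlicers(max, maximize, maxPageSizes) {
  const slicers = [];
  let n = max - 1;
  for (let i = 0; ; i++) {
    const maxPageSize =
      i < maxPageSizes.length ? maxPageSizes[i] : maxPageSizes[maxPageSizes.length - 1];
    const slicer = new Slicer(n, maxPageSize); // signature assumed
    slicers.push(slicer);
    if (slicer.maxPage === 0 || (!maximize && i === maxPageSizes.length - 1)) {
      break;
    }
    n = slicer.maxPage; // the next level slices the page indices 0...maxPage
  }
  return slicers;
}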

You can

  • change the size of the canvas
  • try out different page sizes
    • read the doc of the constructor of the class NonRepeatingRandom
    • a value of true for the parameter maximize means "re-use the last given maxPageSize if more than maxPageSizes.length Slicers can be constructed"
  • prevent the "ultimate paging" by setting maximize to false when calling the constructor of NonRepeatingRandom
  • use a "usual" sequence of random numbers to plot the pixels and compare the visual randomness
  • and so on, and so on

Time and space complexity

These complexity values are hard to calculate because they depend on the maxPageSizes used when the algorithm is initialized.

Let's look at a border case: if the constructor is called with

new NonRepeatingRandom(max, false, max); // or `..., true, max`

then only one Slicer is initialized, which produces just a single page covering the whole range [0...max - 1]. In this case the algorithm equals algorithms No. 2 and No. 3 from the referenced article regarding time and space complexity.

If only the parameter max is provided when calling the constructor

new NonRepeatingRandom(max)

then maximize = true and maxPageSizes = [2] are set as default values - therefore

new NonRepeatingRandom(max, true, 2)

initializes the class in the same way. With these sensible defaults, the time and space complexity can be calculated easily:

To compute all max random numbers (at once or step by step), each value in each page must be calculated at some point in time. As we have seen, the pages can be arranged in a tree structure such that the number of values to be handled (roughly) halves from one stage to the next. Together with

\sum_{m=0}^{\infty} \frac{n}{2^m} = 2n

this shows that all random numbers can be computed in O(n), which means that on average one random number can be computed in O(1).

Analogously, the space complexity is O(log(n)), since only one page (or rather its values) must be kept available at each stage of the aforementioned tree structure.
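
To make this concrete, here is a rough back-of-the-envelope check for the pixel example above (my own illustration, assuming the default maxPageSize = 2):

// Rough illustration of the complexity claims (maxPageSize = 2).
const max = 250000; // the pixel example from above

// Depth of the chain of Slicers: the range is halved until maxPage = 0.
const levels = Math.ceil(Math.log2(max));
console.info(levels); // 18

// Work: max + max/2 + max/4 + ... < 2 * max values in total -> O(n),
// i.e. O(1) per random number on average.
// Memory: at most one active page (two values) per level -> O(log n).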

Further optimizations

A disadvantage of the algorithm in its current form is that random numbers always come in groups (consisting of the values from their originating page) - with regard to our running example:

  • 0 is followed by 5 or 5 is followed by 0
  • 1 is followed by 6 or 6 is followed by 1
  • ...

If this is a problem, it can be solved by always generating a minimum number of random numbers in advance, shuffling them again and keeping them in a buffer. When random numbers are requested externally, they are fetched from the buffer. Whenever the buffer falls below a certain fill level, it is refilled with new random numbers and shuffled again.

If the buffer has a fixed size, it does not change the time and space complexity of the algorithm. On the other hand, with a fixed size, some max can of course be found such that the grouping effect takes place in pages higher up the tree. But the outcome of this effect should be very difficult to detect - the higher up, the more difficult. Therefore, the buffer should not be too small - or it should be sized flexibly depending on max.
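
A hedged sketch of such a buffering wrapper (names are made up and not part of the jsfiddle code):

// Hypothetical buffering wrapper around NonRepeatingRandom (sketch only).
class BufferedNonRepeatingRandom {
  constructor(nrr, bufferSize = 64) {
    this.nrr = nrr;               // an instance of NonRepeatingRandom
    this.bufferSize = bufferSize;
    this.buffer = [];
  }

  next() {
    if (this.buffer.length < this.bufferSize / 2) {
      // refill and re-shuffle to break up the page groups
      // (exhaustion of the underlying generator is not handled here)
      this.buffer.push(...this.nrr.next(this.bufferSize - this.buffer.length));
      shuffleInPlace(this.buffer);
    }
    return this.buffer.pop();
  }
}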

Inject even more randomness

As soon as I had published the article, another optimization came to mind. So far there is only one place in the algorithm where randomness occurs: the content of each page fetched from Slicer (by calling its method getPage) is shuffled in NonRepeatingRandom before it is processed further.

Let me repeat an example from above:

0 1 2 3 4 5 6 7 8 9  --->  0 1 2
                           3 4 5
                           6 7 8
                           9

(here we have n = 9, maxPageSize = 4 and maxPage = 2)

Page 0 equals [0, 3, 6, 9] and will be shuffled later on, so one could say that the shuffling happens vertically 'per page'.

It would be great to introduce some horizontal shuffling too, but this is not possible for obvious reasons (all shuffled values would have to be stored). However, what is possible with a bit of integer arithmetic is rotating the rows randomly per Slicer, e.g.,

0 1 2 3 4 5 6 7 8 9  --->  2 0 1 (rotated right)
                           3 4 5 (unrotated)
                           7 8 6 (rotated left)
                           9

which leads to page 0 equalling [2, 3, 7, 9] and so on.
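
One possible way to compute a single element under such a per-row rotation (a sketch with made-up names; as said, this is not implemented yet):

// Sketch: element of page `p` taken from row `row`, when every row of the
// Slicer is rotated by a fixed, randomly chosen offset shift[row].
// numPages = maxPage + 1, rowLength = number of elements in that row.
function rotatedElement(p, row, numPages, rowLength, shift) {
  const column = (p + shift[row]) % rowLength; // rotate within the row
  return row * numPages + column;
}

// Example from above (n = 9, maxPageSize = 4, numPages = 3):
// shift = [2, 0, 1, 0] reproduces the rows [2 0 1], [3 4 5], [7 8 6], [9],
// so page 0 becomes [2, 3, 7, 9].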

I've not yet implemented it, but I'll update this article as soon as I've done it.

That's it

I hope you had some fun reading this article, comments are welcome!

Top comments (2)

Burkhard Graves

@lukeshiru Two more comments:

  1. Small fix in your second snippet (the one with the generator)

Math.floor(Math.random() * (iterableArray.length - 1)) is wrong,
Math.floor(Math.random() * iterableArray.length) would be ok.

  2. Very poor randomness in the first snippet

Your method of shuffling looked a bit suspicious to me, so I wrote a little test:

const randomNumbers = (min) => (max) => {
  const array = [...Array(max + 1 - min)].map((_, index) => index + min);

  return () => array.sort(() => Math.round(-Math.random())); // looks suspicious
};

// map with all possible results and a counter
const map = new Map();
map.set("123", 0);
map.set("132", 0);
map.set("213", 0);
map.set("231", 0);
map.set("312", 0);
map.set("321", 0);

for (let i = 600000; --i >= 0; ) {
  const a = randomNumbers(1)(3)();
  const key = a.join("");
  map.set(key, map.get(key) + 1); // increment the counter
}

// list number of occurrences of each possible result
// each result should appear around 100,000 times...
map.forEach((value, key) => console.info(key + " -> " + value));

If you run this a few times you will see that the combination 123 occurs most frequently by a large margin, followed by 321. The remaining combinations are underrepresented, with 213 still appearing the most, e.g.:

123 -> 224783
132 -> 37147
213 -> 74877
231 -> 37517
312 -> 37532
321 -> 188144

Something similar is discussed here: blog.codinghorror.com/the-danger-o... - that's a very good read!
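
For comparison, an unbiased in-place shuffle (Fisher-Yates) makes the counts in the test above come out roughly equal:

// Unbiased Fisher-Yates shuffle for comparison.
const fisherYatesShuffle = (array) => {
  for (let i = array.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // 0 <= j <= i
    [array[i], array[j]] = [array[j], array[i]];
  }
  return array;
};

// Plugged into the test above, each permutation shows up around 100,000 times.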

Cheers!

Burkhard Graves

@lukeshiru Thanks for your comment! I had never used generators in JavaScript - very interesting! Regarding your "simpler approaches": the first one (more or less) equals No. 2 and the second (more or less) equals No. 3 in the article by Graham Cox referenced in my article. Both do very well if and only if the range [min, max] is not "too large". Quoting from Graham Cox's article:

However, what if, instead, we want to be able to generate any 32-bit integer? This would mean that we need a single 32-bit counter and an array with one entry for every possible 32-bit integer. This means we need (2^32 + 1)*4 bytes of memory, which is ~16GB.

So the memory consumption is enormous in this case. In contrast, the algorithm proposed by me needs only some 100 bytes in this case, thanks to its logarithmic space complexity.