loading...

Generating Unique Random Data: A Practical Example Of Algorithm Analysis

zacharypatten profile image Zachary Patten ・10 min read

I ran across a practical example of algorithm analysis when I wrote a method that generates unique random values inside a provided range. Figured I would share it...

The code in this article was included in my https://github.com/ZacharyPatten/Towel project. Please check it out if you want to see more code like it. :)

Note: all code examples are in C#.

Introduction

When you want to generate unique random numbers, there are two approaches you can take: 1) you can track the remaining possible values of generation or 2) you can track the values that have already been randomly selected. For variables/parameters, let count be the number of values to generate, minValue be the inclusive lower bound of generation, and maxValue be the exclusive upper bound of generation.

Algorithm #1/A

After a bit of googling, option #1 seems to be more common. You often see people throw all the possible values into an array/list and remove a randomly generated index. Here is an implementation of that algorithm:

public static IEnumerable<int> NextUniqueA(this Random random, int count, int minValue, int maxValue)
{
    // Algorithm A: Θ(range + count)
    int pool = maxValue - minValue;
    int[] array = new int[pool];
    for (int i = 0, j = minValue; j < maxValue; i++, j++) // Θ(range)
        array[i] = j;
    for (int i = 0; i < count; i++) // Θ(count)
    {
        int rollIndex = random.Next(0, pool);
        int roll = array[rollIndex];
        array[rollIndex] = array[--pool];
        yield return roll;
    }
}

Algorithm #2/B

Now, here is what option #2 can look like. For this implementation, I use a linked list to store all the values that have already been generated in sorted order:

public static IEnumerable<int> NextUniqueB(this Random random, int count, int minValue, int maxValue)
{
    // Algorithm B: O(.5*count^2), Ω(count), ε(.5*count^2)
    Node head = null;
    for (int i = 0; i < count; i++) // Θ(count)
    {
        int roll = random.Next(minValue, maxValue - i);
        Node node = head;
        Node previous = null;
        while (!(node is null)) // O(count / 2), Ω(0), ε(count / 2)
        {
            if (node.Value > roll)
                break;
            roll++;
            previous = node;
            node = node.Next;
        }
        yield return roll;
        if (previous is null)
            head = new Node() { Value = roll, Next = head, };
        else
            previous.Next = new Node() { Value = roll, Next = previous.Next };
    }
}

internal class Node
{
    internal int Value;
    internal Node Next;
}

Analyzing Algorithm Complexities

These two implementations have different runtime algorithm complexities. The #1/A option has a runtime complexity of Θ(range + count) while the #2/B option has a runtime complexity of O(.5*count^2), Ω(count), ε(.5*count^2) where O is the worst case, Θ is every case, Ω is the best case, and ε is the expected average between the worst and the best cases. Long story short, when the count is relatively low you want to use algorithm #2/B. When the count is relatively high you want to use algorithm #1/A. So if you want to make a faster all-around general purpose method, you need to combine the two approaches. But how do you do that? You need to find the intersection of the two runtime complexities of each method.

If you treat both algorithm complexities as linear functions and do some simple math to find the intesection, you will find that they intersect at about √(maxValue - minValue) the square root of the range of random generation relative to count. Now that you know the intersection point, you can combine the two algorithms into a single function that is optimized for more use cases:

Combination of #1/A and #2/B

public static IEnumerable<int> NextUnique(this Random random, int count, int minValue, int maxValue)
{
    if (count > Math.Sqrt(maxValue - minValue))
    {
        // Algorithm A
    }
    else
    {
        // Algorithm B
    }
}

Benchmarks

Here are some benchmarks so you can compare theory with practical results. Note that for these benchmarks I used a random generation range of 0..10000. So minValue = 0 and maxValue = 10000. From the theory, I knew that the intersection of the runtime algorithms is roughly √(maxValue - minValue) = √(100000 - 0) = √(100000) = 100. So when count is less than ~100 the #2/B algorithm should be faster, but when count is greater than ~100 algorithm #1/A should be faster.

Benchmarks Source Code

Note: This code uses the BenchmarkDotNet nuget package. It also uses the Plots exporter to generate the graphs, which requires R be installed.

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Linq;
using System.Collections.Generic;
using BenchmarkDotNet.Exporters;
using BenchmarkDotNet.Configs;

namespace UniqueRandomGeneration
{
    class Program
    {
        static void Main(string[] args) => BenchmarkRunner.Run<Benchmarks>();
    }

    [RPlotExporter]
    [Config(typeof(Config))]
    public class Benchmarks
    {
        public class Config : ManualConfig
        {
            public Config() => Add(RPlotExporter.Default);
        }

        const int MinValue = 0;
        const int MaxValue = 10000;

        [Params(25, 50, 75, 100, 125, 150, 175, 200, 225, 250)]
        public int Count;

        readonly Random random = new Random();

        [Benchmark] public void RandomUniqueA() => random.NextUniqueA(Count, MinValue, MaxValue).ToArray();

        [Benchmark] public void RandomUniqueB() => random.NextUniqueB(Count, MinValue, MaxValue).ToArray();

        [Benchmark] public void RandomUnique() => random.NextUnique(Count, MinValue, MaxValue).ToArray();
    }

    public static class Extensions
    {
        internal class Node
        {
            internal int Value;
            internal Node Next;
        }

        public static IEnumerable<int> NextUnique(this Random random, int count, int minValue, int maxValue)
        {
            if (maxValue < minValue)
                throw new ArgumentOutOfRangeException(nameof(minValue) + " > " + nameof(minValue));
            if (count < 0)
                throw new ArgumentOutOfRangeException(nameof(count) + " < 0");
            if (maxValue - minValue < count)
                throw new ArgumentOutOfRangeException(nameof(count) + " is larger than " + nameof(maxValue) + " - " + nameof(minValue) + ".");
            if (count > Math.Sqrt(maxValue - minValue))
            {
                // Algorithm A: Θ(range + count)
                int pool = maxValue - minValue;
                int[] array = new int[pool];
                for (int i = 0, j = minValue; j < maxValue; i++, j++) // Θ(range)
                    array[i] = j;
                for (int i = 0; i < count; i++) // Θ(count)
                {
                    int rollIndex = random.Next(0, pool);
                    int roll = array[rollIndex];
                    array[rollIndex] = array[--pool];
                    yield return roll;
                }
            }
            else
            {
                // Algorithm B: O(.5*count^2), Ω(count), ε(.5*count^2)
                Node head = null;
                for (int i = 0; i < count; i++) // Θ(count)
                {
                    int roll = random.Next(minValue, maxValue - i);
                    Node node = head;
                    Node previous = null;
                    while (!(node is null)) // O(count / 2), Ω(0), ε(count / 2)
                    {
                        if (node.Value > roll)
                            break;
                        roll++;
                        previous = node;
                        node = node.Next;
                    }
                    yield return roll;
                    if (previous is null)
                        head = new Node() { Value = roll, Next = head, };
                    else
                        previous.Next = new Node() { Value = roll, Next = previous.Next };
                }
            }
        }

        public static IEnumerable<int> NextUniqueA(this Random random, int count, int minValue, int maxValue)
        {
            if (maxValue < minValue)
                throw new ArgumentOutOfRangeException(nameof(minValue) + " > " + nameof(minValue));
            if (count < 0)
                throw new ArgumentOutOfRangeException(nameof(count) + " < 0");
            if (maxValue - minValue < count)
                throw new ArgumentOutOfRangeException(nameof(count) + " is larger than " + nameof(maxValue) + " - " + nameof(minValue) + ".");
            // Algorithm A: Θ(range + count)
            int pool = maxValue - minValue;
            int[] array = new int[pool];
            for (int i = 0, j = minValue; j < maxValue; i++, j++) // Θ(range)
                array[i] = j;
            for (int i = 0; i < count; i++) // Θ(count)
            {
                int rollIndex = random.Next(0, pool);
                int roll = array[rollIndex];
                array[rollIndex] = array[--pool];
                yield return roll;
            }
        }

        public static IEnumerable<int> NextUniqueB(this Random random, int count, int minValue, int maxValue)
        {
            if (maxValue < minValue)
                throw new ArgumentOutOfRangeException(nameof(minValue) + " > " + nameof(minValue));
            if (count < 0)
                throw new ArgumentOutOfRangeException(nameof(count) + " < 0");
            if (maxValue - minValue < count)
                throw new ArgumentOutOfRangeException(nameof(count) + " is larger than " + nameof(maxValue) + " - " + nameof(minValue) + ".");
            // Algorithm B: O(.5*count^2), Ω(count), ε(.5*count^2)
            Node head = null;
            for (int i = 0; i < count; i++) // Θ(count)
            {
                int roll = random.Next(minValue, maxValue - i);
                Node node = head;
                Node previous = null;
                while (!(node is null)) // O(count / 2), Ω(0), ε(count / 2)
                {
                    if (node.Value > roll)
                        break;
                    roll++;
                    previous = node;
                    node = node.Next;
                }
                yield return roll;
                if (previous is null)
                    head = new Node() { Value = roll, Next = head, };
                else
                    previous.Next = new Node() { Value = roll, Next = previous.Next };
            }
        }
    }
}

Benchmarks Results Table


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-4790K CPU 4.00GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.100
  [Host]     : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT
  DefaultJob : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT


Method Count Mean Error StdDev
RandomUniqueA 25 7.406 us 0.1576 us 0.1876 us
RandomUniqueB 25 1.307 us 0.0256 us 0.0414 us
RandomUnique 25 1.300 us 0.0046 us 0.0043 us
RandomUniqueA 50 7.761 us 0.0263 us 0.0246 us
RandomUniqueB 50 2.893 us 0.0084 us 0.0079 us
RandomUnique 50 3.022 us 0.0755 us 0.0707 us
RandomUniqueA 75 8.387 us 0.1317 us 0.1232 us
RandomUniqueB 75 5.180 us 0.1070 us 0.1001 us
RandomUnique 75 5.298 us 0.0634 us 0.0593 us
RandomUniqueA 100 9.151 us 0.1800 us 0.2340 us
RandomUniqueB 100 8.013 us 0.0994 us 0.0930 us
RandomUnique 100 8.246 us 0.1205 us 0.1127 us
RandomUniqueA 125 9.335 us 0.1533 us 0.1434 us
RandomUniqueB 125 11.291 us 0.1093 us 0.1023 us
RandomUnique 125 9.394 us 0.1773 us 0.1658 us
RandomUniqueA 150 9.509 us 0.0323 us 0.0302 us
RandomUniqueB 150 14.486 us 0.0653 us 0.0610 us
RandomUnique 150 9.355 us 0.0428 us 0.0400 us
RandomUniqueA 175 9.972 us 0.0319 us 0.0298 us
RandomUniqueB 175 18.778 us 0.0792 us 0.0741 us
RandomUnique 175 9.789 us 0.0593 us 0.0554 us
RandomUniqueA 200 10.270 us 0.0297 us 0.0278 us
RandomUniqueB 200 23.384 us 0.0623 us 0.0520 us
RandomUnique 200 10.112 us 0.0499 us 0.0466 us
RandomUniqueA 225 10.724 us 0.1754 us 0.1465 us
RandomUniqueB 225 28.748 us 0.0872 us 0.0816 us
RandomUnique 225 10.612 us 0.0430 us 0.0381 us
RandomUniqueA 250 11.144 us 0.0518 us 0.0484 us
RandomUniqueB 250 34.320 us 0.1249 us 0.1107 us
RandomUnique 250 10.891 us 0.0826 us 0.0772 us

Benchmarks Results Bar Graph

Bar Graph Edited


Probability/Validity Testing

Here is some validity testing to hopefully make sure I am not full of shit. In this code, I generate 5 values in the 0..26 range. I do this 1000000 times, and I calculate the mean that each values was randomly selected. All the values should be selected an average of 5 / 26 = ‭0.1923...‬ times.

Validity Testing Source Code

using System;
using System.Linq;
using System.Collections.Generic;

namespace ValidityTesting
{
    class Program
    {
        static void Main()
        {
            Console.WriteLine(nameof(Extensions.NextUnique));
            PrintPriorities(CalculateProbabilities(Extensions.NextUnique));
            Console.WriteLine();

            Console.WriteLine(nameof(Extensions.NextUniqueA));
            PrintPriorities(CalculateProbabilities(Extensions.NextUniqueA));
            Console.WriteLine();

            Console.WriteLine(nameof(Extensions.NextUniqueB));
            PrintPriorities(CalculateProbabilities(Extensions.NextUniqueB));
            Console.WriteLine();
        }

        internal static void PrintPriorities(Dictionary<int, double> results)
        {
            foreach (var keyVal in results.OrderBy(x => x.Key))
                Console.WriteLine(keyVal.Key + ": " + keyVal.Value);
        }

        internal static Dictionary<int, double> CalculateProbabilities(Func<Random, int, int, int, IEnumerable<int>> algorithm)
        {
            Random random = new Random();
            const int size = 5;
            const int min = 0;
            const int max = 26;
            const int iterations = 1000000;
            var counts = new Dictionary<int, int>();
            for (int i = min; i < max; i++)
                counts.Add(i, 0);
            for (int i = 0; i < iterations; i++)
                foreach (int j in algorithm(random, size, min, max))
                    counts[j]++;
            var results = new Dictionary<int, double>();
            foreach (var keyVal in counts)
                results.Add(keyVal.Key, keyVal.Value / (double)iterations);
            return results;
        }
    }

    public static class Extensions
    {
        internal class Node
        {
            internal int Value;
            internal Node Next;
        }

        public static IEnumerable<int> NextUnique(this Random random, int count, int minValue, int maxValue)
        {
            if (maxValue < minValue)
                throw new ArgumentOutOfRangeException(nameof(minValue) + " > " + nameof(minValue));
            if (count < 0)
                throw new ArgumentOutOfRangeException(nameof(count) + " < 0");
            if (maxValue - minValue < count)
                throw new ArgumentOutOfRangeException(nameof(count) + " is larger than " + nameof(maxValue) + " - " + nameof(minValue) + ".");
            if (count > Math.Sqrt(maxValue - minValue))
            {
                // Algorithm A: Θ(range + count)
                int pool = maxValue - minValue;
                int[] array = new int[pool];
                for (int i = 0, j = minValue; j < maxValue; i++, j++) // Θ(range)
                    array[i] = j;
                for (int i = 0; i < count; i++) // Θ(count)
                {
                    int rollIndex = random.Next(0, pool);
                    int roll = array[rollIndex];
                    array[rollIndex] = array[--pool];
                    yield return roll;
                }
            }
            else
            {
                // Algorithm B: O(.5*count^2), Ω(count), ε(.5*count^2)
                Node head = null;
                for (int i = 0; i < count; i++) // Θ(count)
                {
                    int roll = random.Next(minValue, maxValue - i);
                    Node node = head;
                    Node previous = null;
                    while (!(node is null)) // O(count / 2), Ω(0), ε(count / 2)
                    {
                        if (node.Value > roll)
                            break;
                        roll++;
                        previous = node;
                        node = node.Next;
                    }
                    yield return roll;
                    if (previous is null)
                        head = new Node() { Value = roll, Next = head, };
                    else
                        previous.Next = new Node() { Value = roll, Next = previous.Next };
                }
            }
        }

        public static IEnumerable<int> NextUniqueA(this Random random, int count, int minValue, int maxValue)
        {
            if (maxValue < minValue)
                throw new ArgumentOutOfRangeException(nameof(minValue) + " > " + nameof(minValue));
            if (count < 0)
                throw new ArgumentOutOfRangeException(nameof(count) + " < 0");
            if (maxValue - minValue < count)
                throw new ArgumentOutOfRangeException(nameof(count) + " is larger than " + nameof(maxValue) + " - " + nameof(minValue) + ".");
            // Algorithm A: Θ(range + count)
            int pool = maxValue - minValue;
            int[] array = new int[pool];
            for (int i = 0, j = minValue; j < maxValue; i++, j++) // Θ(range)
                array[i] = j;
            for (int i = 0; i < count; i++) // Θ(count)
            {
                int rollIndex = random.Next(0, pool);
                int roll = array[rollIndex];
                array[rollIndex] = array[--pool];
                yield return roll;
            }
        }

        public static IEnumerable<int> NextUniqueB(this Random random, int count, int minValue, int maxValue)
        {
            if (maxValue < minValue)
                throw new ArgumentOutOfRangeException(nameof(minValue) + " > " + nameof(minValue));
            if (count < 0)
                throw new ArgumentOutOfRangeException(nameof(count) + " < 0");
            if (maxValue - minValue < count)
                throw new ArgumentOutOfRangeException(nameof(count) + " is larger than " + nameof(maxValue) + " - " + nameof(minValue) + ".");
            // Algorithm B: O(.5*count^2), Ω(count), ε(.5*count^2)
            Node head = null;
            for (int i = 0; i < count; i++) // Θ(count)
            {
                int roll = random.Next(minValue, maxValue - i);
                Node node = head;
                Node previous = null;
                while (!(node is null)) // O(count / 2), Ω(0), ε(count / 2)
                {
                    if (node.Value > roll)
                        break;
                    roll++;
                    previous = node;
                    node = node.Next;
                }
                yield return roll;
                if (previous is null)
                    head = new Node() { Value = roll, Next = head, };
                else
                    previous.Next = new Node() { Value = roll, Next = previous.Next };
            }
        }
    }
}

Validity Testing Console Output

NextUnique
0: 0.192031
1: 0.19208
2: 0.192785
3: 0.192287
4: 0.192593
5: 0.192686
6: 0.191937
7: 0.191998
8: 0.19244
9: 0.192112
10: 0.19212
11: 0.192707
12: 0.192877
13: 0.192086
14: 0.192277
15: 0.191907
16: 0.192366
17: 0.192386
18: 0.192423
19: 0.19229
20: 0.192017
21: 0.191901
22: 0.19285
23: 0.19282
24: 0.191902
25: 0.192122

NextUniqueA
0: 0.192705
1: 0.191956
2: 0.192313
3: 0.192613
4: 0.193418
5: 0.191957
6: 0.192116
7: 0.192217
8: 0.192773
9: 0.191862
10: 0.19233
11: 0.192145
12: 0.192044
13: 0.192749
14: 0.192097
15: 0.19224
16: 0.192526
17: 0.19225
18: 0.191653
19: 0.192546
20: 0.192156
21: 0.191686
22: 0.192908
23: 0.19174
24: 0.192605
25: 0.192395

NextUniqueB
0: 0.192408
1: 0.191776
2: 0.191809
3: 0.192623
4: 0.192399
5: 0.192451
6: 0.192135
7: 0.1931
8: 0.192594
9: 0.19232
10: 0.192281
11: 0.191723
12: 0.192358
13: 0.19216
14: 0.192697
15: 0.192975
16: 0.192357
17: 0.192507
18: 0.191768
19: 0.191874
20: 0.192803
21: 0.191934
22: 0.192075
23: 0.192704
24: 0.192155
25: 0.192014

Posted on by:

zacharypatten profile

Zachary Patten

@zacharypatten

Just another programmer...

Discussion

pic
Editor guide