Setting probability to choose something randomly

#sampling #random #programming #python

Playing at Tetris I always wonder how some shapes are most common that they appear than others. Basically, you can expect that you almost never see the line, probably you'll see twice as often the square, then the L shapes and the kind-of zigzag shapes are the most common shapes.

If you ever think about, how would you implement the shape choosing, just by using the pseudo-random to select the shape, the probability to get any shape will be the same. But we can think some other approaches to change the probability.

Let's say we have the following shapes with their chances to be picked:

class Shape:
  def __init__(self, name, chances):
    self.name = name
    self.chances = chances

shapes = [
    Shape("ZigZag", 8),
    Shape("LeftZigZag", 8),
    Shape("L", 4),
    Shape("LeftL", 4),
    Shape("Square", 2),
    Shape("Line", 1),
]

There are 6 shapes, and the total chances (sum of all chances) are: 27. So the chances of getting a ZigZag will be 8 out of 27, when the chances of getting a line would be 1 out of 27.

Approach 1 Creating a list with all the chances

The first approach that came to my mind is creating a list of size 27 and fill it with all the shapes, repeating the shape as many chances it has to appear. This way we have the same chance to pick any of the 27 indexes of the list, but if a shape has 8 slots, this shape will have 8 out of 27 chances to appear:

class RandomShapePicker:
  def __init__(self, shapes):
    self.chances = []
    for shape in shapes:
      for _ in range(shape.chances):
        self.chances.append(shape)

  def pick_shape(self):
    return random.choice(self.chances)

Analyzing the performance, to create the RandomShapePicker, memory will be in function of the total sum of chances (In this case 27) and the time complexity will be in function of the sum of chances as well: O(C), where C is the total sum of chances.

Picking a shape time complexity will be a constant as we get a random index and access directly to that index. We don't need extra space either which is constant as well: O(1).

This works very well for a small list which can potentially call pick_shape multiple times as we can expect constant time for these calls.

The problem with this approach is when the number of choices is large, and we manage large number of chances. Imagine that the sum of chances is 1000000 which means that our list should have 1M spots. Even if the different choices are low (Let's say 1000).

For this particular problem would be more efficient not having multiple spots for the same choice. But then, we have to think how we are going to manage that a Line shape has 1/27 chances to appear.

Approach 2 Creating a list with the cumulative sum of chances.

Rethinking our solution:

Picking a number between 0 and 26 (27 chances) the chances of getting:

a number between 0 and 7 is 8/27 (ZigZag have 8 out of 27 chances)
a number between 8 and 15 is 8/27 (LeftZigZag have 8 out of 27 chances)
a number between 16 and 19 is 4/27 (L have 4 out of 27 chances)
a number between 20 and 23 is 4/27 (LeftL have 4 out of 27 chances)
a number between 24 and 25 is 2/27 (Square have 2 out of 27 chances)
The number 26 is 1/27 (Line have 1 out of 27 chances)

There is a pattern with the number of chances and the cumulative sum. If we store this cumulative sum in a list, we will have:

self.chances = [8, 16, 20, 24, 26, 27]

We can use binary search to find the insertion point for that random number, and then select the shape based on the chances that we have to get that shape:

For example, if we get number 21, which is in the range of 20 and 23 (4 chances to get a LeftL) The insertion index for that 21 is
3 (The one with the value 24) which is the same index as the LeftL shape.

The implementation would be:

class RandomShapePicker:
  def __init__(self, shapes):
    self.shapes = shapes
    self.chances = []
    self.total_chances = 0
    for shape in shapes:
      self.total_chances += shape.chances
      self.chances.append(self.total_chances)

  def pick_shape(self):
    choice = random.random() * self.total_chances
    i = bisect_left(self.chances, choice)
    return self.shapes[i]

This implementation will trade off time complexity from pick_shape to improve the time and space complexity for the RandomShapePicker creation.

The constructor is now O(N) time and space complexity (N is the number of choices which is less than number of chances), and pick_shape time complexity is now O(logN) which is still a decent performance for multiple callings of pick_shape.

Conclusion

We need to think about trade-offs here. Probably is not a bad idea that we are calling a big number of times the pick_shape() or pick_option() and we will only create the object once. Having a constant time to do this can definitely give us the best time performance, but if there is a large number of chances, this can require a large amount of memory. But the second approach can help on that when our number of chances is too high to save some memory and still have a decent time performance.