Mohammad Tomaraei

Posted on Apr 5, 2021

Random Sort Algorithm Without Consecutive Repeats

#algorithms #python #flowchart #sorting

Background

I’ve been recently working on a project to create a content feed like Instagram and Facebook. There are various available content types including photos, videos, blog posts, and surveys.

The feed can be sorted in different ways such as newer content first, sort by popularity, and random order. For the default sort order, we have decided to show a random assortment of content, but we don’t want the same content type to appear twice in a row.

Naturally, I began searching the internet to see if someone else had already come up with this algorithm, but most of the algorithms I found were about shuffling and removing duplicates altogether. I did however find a Stack Overflow question about the same idea, but the top answer was not a perfect solution.

I then went to the whiteboard and took on the challenge of finding a working and relatively optimized algorithm.

Do it yourself

One of the best ways to create algorithms is by defining the problem, choosing a sample input, and solving it yourself while explaining the steps throughout the way.

Problem: Given a list of items of various types, produce a randomized list without two consecutive repeated types.
Sample input: [A,A,A,A,B,B,B,C,C,D]
Valid output: [C,A,D,B,A,B,A,C,A,B]

To solve this problem manually, we first need to shuffle the input list. Let’s suppose that [B,A,C,C,A,A,D,B,A,B] is sufficiently randomized.

Starting from the first item, B, we keep going to the right and check if the next item is the same as the last one. The first consecutive repeat has occurred at the location of item 4, C. We set this item aside and continue moving to the right. Another repeated item, A, is found next.

Note that you should never modify the original list while iterating over it. While it may not be obvious to humans, it does create a bunch of unwanted side-effects for the computer. For instance, the iterator may not be aware of changes made to the list and can behave differently. How about indices? Altering the list changes the index-based reference and causes confusion.

Once we have identified the repeated items, we look for places in between existing items where they can fit. The final sorted list is [B,A,C,A,C,A,D,B,A,B].

Making flowcharts

We can create flowcharts to visualize algorithms. Depending on the complexity of the algorithm, you may sometimes choose to skip this step and directly write code, but try making a flowchart first — it saves a lot of time and makes coding easier.

In the last step of our manual solution, we had to find new locations for the repeated items to avoid consecutive repeats. Although it seems trivial to our brains, there are actually multiple ways of doing this, but not all paths lead to an acceptable outcome. I’ve created a Whimsical board with 3 different algorithm flowcharts.

My winning algorithm starts by shuffling the input list. It then creates two empty lists: a list for sorted items and one for repeated items. Next, it loops over the randomized list. If the current item is different than the last item, it adds it to the list of sorted items. Otherwise, the item gets added to the list of repeated items.

Once the first iteration is over, it repeatedly loops over the sorted list and tries to find spots where it can place the repeated items. When either no repeated items remain or it cannot find somewhere to fit them, the iteration stops and it returns the sorted list merged with the remaining repeated items.

Turning it into code

My current favorite language for prototyping algorithms is Python, primarily because it doesn’t have curly brackets and has handy built-in functions like random.shuffle().

import random

def randomize_list(input_list):
    randomized_list = input_list.copy()
    random.shuffle(randomized_list)
    return randomized_list


def get_repeated_items(input_list, type_key):
    repeated_items = []
    pre_sorted_list = []
    last_item = {}
    for item in input_list:
        if last_item and item[type_key] == last_item[type_key]:
            repeated_items.append(item)
            continue
        pre_sorted_list.append(item)
        last_item = item
    return repeated_items, pre_sorted_list

def recycle_repeated_items(repeated_items, pre_sorted_list, type_key):
    sorted_list = []
    last_item = {}
    changed = False
    for item in pre_sorted_list:
        filtered_types = [item[type_key]] + ([last_item[type_key]] if last_item else [])
        eligible_repeated_item = next(filter(lambda t: t[type_key] not in filtered_types, repeated_items), None)
        if eligible_repeated_item:
            changed = True
            repeated_items.remove(eligible_repeated_item)
            sorted_list.append(eligible_repeated_item)
        sorted_list.append(item)
        last_item = item
    return repeated_items, sorted_list, changed

def randomized_non_repeating_sort(input_list, type_key):
    randomized_list = randomize_list(input_list)
    repeated_items, sorted_list = get_repeated_items(randomized_list, type_key)
    repeated_items, sorted_list, changed = recycle_repeated_items(repeated_items, sorted_list, type_key)
    while repeated_items and changed:
        repeated_items, sorted_list, changed = recycle_repeated_items(repeated_items, sorted_list, type_key)
    return sorted_list + repeated_items

I wrote the test below to make sure it works and measure how well it performs under heavy load:

original_list = [
    {'id': 10, 'type': 'A'},
    {'id': 11, 'type': 'A'},
    {'id': 12, 'type': 'A'},
    {'id': 13, 'type': 'A'},
    {'id': 14, 'type': 'B'},
    {'id': 15, 'type': 'B'},
    {'id': 16, 'type': 'B'},
    {'id': 17, 'type': 'C'},
    {'id': 18, 'type': 'C'},
    {'id': 19, 'type': 'D'}
]

print('Original list:')
print(original_list)

print('Randomized non-repeating list (1 of 1,000,000 tests):')

for _ in range(1,1000000):
    output_list = randomized_non_repeating_sort(original_list, 'type')
    repeated_items = get_repeated_items(output_list, 'type')[0]
    if repeated_items:
        raise Exception('CONSECUTIVE REPEATED ITEM FOUND!')
    if len(output_list) != len(original_list):
        raise Exception('LIST LENGTH MISMATCH!')

print(output_list)

"""
Output:
[{'id': 15, 'type': 'B'},
 {'id': 10, 'type': 'A'},
 {'id': 17, 'type': 'C'},
 {'id': 12, 'type': 'A'},
 {'id': 19, 'type': 'D'},
 {'id': 13, 'type': 'A'},
 {'id': 18, 'type': 'C'},
 {'id': 16, 'type': 'B'},
 {'id': 11, 'type': 'A'},
 {'id': 14, 'type': 'B'}]
"""

I ran the code with PyCharm’s profiler and found that it takes 0.018702 seconds to sort a list of 10 items.

It’s a great starting point, but there is definitely room for optimization!

This post was originally published on my blog where I write all about tech.

DEV Community

Random Sort Algorithm Without Consecutive Repeats

Background

Do it yourself

Making flowcharts

Turning it into code

Top comments (0)

Read next

Building a Full Stack Web Application using Flask (Python Web Framework) - Part Two

Convert Images Into Pencil Sketch Using Python.

Crafting an Image to PDF Converter App Using Python.

Calculator with GUI Using Python Tkinter.