DEV Community

Cover image for Sorting Lists in Python
Sebastian Witowski
Sebastian Witowski

Posted on • Originally published at switowski.com on

Sorting Lists in Python

There are at least two common ways to sort lists in Python:

  • With sorted function that returns a new list
  • With list.sort method that modifies list in place

Which one is faster? Let's find out!

sorted() vs list.sort()

I will start with a list of 1 000 000 randomly shuffled integers. Later on, I will also check if the order matters.

# sorting.py
from random import sample

# List of 1 000 000 integers randomly shuffled
MILLION_RANDOM_NUMBERS = sample(range(1_000_000), 1_000_000)

def test_sort():
    random_list = MILLION_RANDOM_NUMBERS[:]
    return random_list.sort()

def test_sorted():
    random_list = MILLION_RANDOM_NUMBERS[:]
    return sorted(random_list)
Enter fullscreen mode Exit fullscreen mode
$ python -m timeit -s "from sorting import test_sort" "test_sort()"
1 loop, best of 5: 352 msec per loop

$ python -m timeit -s "from sorting import test_sorted" "test_sorted()"
1 loop, best of 5: 385 msec per loop
Enter fullscreen mode Exit fullscreen mode

sorted is less than 10% slower (385/352β‰ˆ1.094). Since we only run one loop, the exact numbers are not very reliable. I have rerun the same tests a couple more times, and the results were slightly different each time. sort took around 345-355 msec and sorted took around 379-394 msec (but it was always slower than sort). This difference comes mostly from the fact that sorted creates a new list.

Why do I make a copy of the original list in each test? Well, in the original version of this article, I forgot to do this, and I ended up with completely wrong benchmarks. sort was running on a sorted list (because the first iteration of timeit sorted it), and sorted was running on a random list. You can see the whole explanation in the original post.

Initial order matters

What happens when our initial list is already sorted?

MILLION_NUMBERS = list(range(1_000_000))
Enter fullscreen mode Exit fullscreen mode
$ python -m timeit -s "from sorting import test_sort" "test_sort()"
20 loops, best of 5: 12.1 msec per loop

$ python -m timeit -s "from sorting import test_sorted" "test_sorted()"
20 loops, best of 5: 16.6 msec per loop
Enter fullscreen mode Exit fullscreen mode

Now, sorting takes much less time and the difference between sort and sorted grows to 37% (16.6/12.1β‰ˆ1.372). Why is sorted 37% slower this time? Well, creating a new list takes the same amount of time as before. And since the time spent on sorting has shrunk, the impact of creating that new list got bigger.

And if we try to sort a list of 1 000 000 numbers ordered in descending order:

DESCENDING_MILLION_NUMBERS = list(range(1_000_000, 0, -1))
Enter fullscreen mode Exit fullscreen mode
$ python -m timeit -s "from sorting import test_sort" "test_sort()"
20 loops, best of 5: 11.7 msec per loop

$ python -m timeit -s "from sorting import test_sorted" "test_sorted()"
20 loops, best of 5: 18.1 msec per loop
Enter fullscreen mode Exit fullscreen mode

The results are almost identical as before. The sorting algorithm is clever enough to optimize the sorting process for a descending list.

For our last test, let’s try to sort 1 000 000 numbers where 100 000 elements are shuffled, and the rest are ordered:

# 10% of numbers are random
MILLION_SLIGHTLY_RANDOM_NUMBERS = [*range(900_000), *sample(range(1_000_000), 100_000)]
Enter fullscreen mode Exit fullscreen mode
$ python -m timeit -s "from sorting import test_sort" "test_sort()"
5 loops, best of 5: 61.2 msec per loop

$ python -m timeit -s "from sorting import test_sorted" "test_sorted()"
5 loops, best of 5: 71 msec per loop
Enter fullscreen mode Exit fullscreen mode

Both functions get slower as the input list becomes more scrambled.

Using list.sort() is my preferred way of sorting lists - it saves some time (and memory) by not creating a new list. But that's a double-edged sword! Sometimes you might accidentally overwrite the initial list without realizing it (as I did with my initial benchmarks πŸ˜…). So, if you want to preserve the initial list's order, you have to use sorted instead. And sorted can be used with any iterable, while sort only works with lists. If you want to sort a set, then sorted is your only solution.

Conclusions

sort is slightly faster than sorted, because it doesn't create a new list. But you might still stick with sorted if:

  • You don't want to modify the original list. sort performs sorting in-place, so you can't use it here.
  • You need to sort something else than a list. sort is only defined on lists, so if you want to sort a set or any other collection of items, you have to use sorted instead.

If you want to learn more, the Sorting HOW TO guide from Python documentation contains a lot of useful information.

Discussion (0)