General Resources:
Stability in sorting
- A stable sorting algorithm will retain the original order of the items, after sorting by the given key(s).
- For example, sorting the following array by first letter only:
[fence, fan, apple, boat, fun]
, a stable sort would produce:[apple, boat, fence, fan, fun]
. An unstable algorithm might produce:[apple, boat, fun, fan, fence]
. - Notice how
fence, fan, fun
all start with the letterf
. The stable sort keeps the original order of the items after sorting the entire array. An unstable sort will not necessarily retain the original order. - This is a good article explaining stability in sorting.
In-place vs not-in-place
- Many sorting algorithms are in-place.
- In-place algorithms require very little extra memory to perform their transformation on an input.
- If additional data structures are required to perform a transformation, then the algorithm is considered not-in-place.
- Variables, such as pointers or indexes, are allowed in in-place algorithms. These use a very small amount of memory.
- In-place algorithms usually modify the input by overwriting/rearranging/replacing elements in the input.
- Not-in-place algorithms usually create additional data structures or copies of the input, in order to perform their work.
- Almost all of the sorts mentioned in this article are in-place.
- A good example of a not-in-place algorithm is mergesort. You will notice in my implementation that it requires a copy of the input, in addition to the input, to perform sorting. It also uses recursion, which increases memory requirements.
- Mergesort's extra memory overhead means that it requires
O(n)
additional space - meaning that the memory required is directly proportional to the size of the input. You will notice that every other sort listed has a space complexity less thanO(n)
- as they are all in-place. - If you want more information, Wikipedia has a good article on in-place algorithms
Bubblesort
Resources
Takeaways
- Bubblesort is very simple to implement, but generally quite slow.
- It achieves sorting a list by continually comparing neighbouring pairs of items and swapping them (moving the larger items right) until the entire list is sorted.
- It is stable.
- It is in-place.
- Space complexity is
O(1)
but time complexity in the worst and average cases areO(n^2)
(quadratic). - It is generally avoided due to its poor performance.
Selectionsort
Resources
Takeaways
- Selectionsort is also very simple to implement, but like bubblessort is slow.
- It sorts a list by iterating over it and for each iteration moving the smallest items towards the front of the list (so on first iteration, the smallest item found is moved to the very start of the list). This process is repeated until the list is sorted.
- Time complexity is
O(n^2)
and space isO(1)
. - It is unstable.
- It is in-place.
Insertionsort
Resources
Takeaways
- Insertionsort is relatively quick for very small lists/arrays. If a list is between 5-15 items, insertionsort can be the fastest sorting algorithm to choose. In fact, Sedgewick recommends, in the book Algorithms, cutting off and using insertionsort for tiny arrays in implementations of quick/mergesort.
- However, for large inputs, insertionsort is slow - in worst and average cases it has a time complexity of
O(n^2)
. - Space is
O(1)
. - It is stable.
- It is in-place.
- It sorts a list by maintaining "sorted" and "unsorted" subsections of it. It will insert items from the unsorted subsection into the sorted subsection.
- Before inserting an item from the unsorted subsection, insertion sort will compare the to be inserted item with the items in the sorted subsection.
- It does this by iterating over the sorted subsection and comparing each item with the item to be inserted. It maintains an index where new item should be inserted at.
- Once the item to be inserted is no longer smaller than the items it is being compared to, it is inserted at the calculated index and the process is repeated until the entire input is sorted.
- To start with, the sorted subsection is just one item (which means it is sorted by default) and will grow by one after each insert.
Shellsort
Resources
Takeaways
- It is based on insertionsort, and is marginally more complex to implement.
- It is unstable.
- It is in-place.
- It's time complexity is not consistent between implementations and depends on the interval/gap sequence used. It is generally accepted that it runs anywhere from
O(n)
toO(n^2)
- but a best case ofO(n log n)
(linearithmic) is possible. - Shellsort uses an interval/gap which determines which items are compared to each other. If the interval is three, then every item is compared to the item three items away (instead of it's neighbour).
- I used Knuth's gap sequence in my shellsort implementation. This will produce gaps of
1,3,7,21,48,112,etc.
. - During the sort, the interval is continually reduced after each complete iteration of the list. The interval will eventually be 1 - then neighbouring items will be compared. After this compare, the list/array will be sorted.
- Using an interval/gap sequence on top of insertionsort means that items more widely spread out in a list/array are compared (versus just neighbouring items like with insertsionsort). This can help speed up sorting when compared to insertionsort.
Quicksort
Resources
- Quicksort Overview.
- Quicksort Overview + Implementation.
- Detailed Quicksort Overview + Implementation.
- Quicksort Overview + History.
Takeaways
- One of the fastest sorts with an average running time of
O(n log n)
. - It does have a worst case time complexity of
O(n^2)
, however randomly shuffling the input or selecting a random pivot protects against this worst case. - Space is
O(log n)
(logarithmic). - It is a divide & conquer algorithm.
- It is unstable.
- It is in-place.
- Quicksort splits a given list in two around what is called a pivot item. This splitting is called partitioning. All items to the left of the pivot are smaller than the pivot, and all to the right are larger than the pivot. Once this is achieved the two sub-lists are recursively sorted.
- Quicksort has two variants: Lomuto & Hoare.
- Tony Hoare invented quicksort and his variant is slightly more complex to implement (uses recursion) compared to Lomuto's variant - but it is more efficient than the Lomuto variant (which has no recursion and is easier to implement). For my implementation of quicksort I implemented Hoare's.
- You will also note in my implementation of quicksort I randomly chose the pivot, this is to protect against the worst case quadratic time complexity. You can also choose to randomly shuffle the input before sorting, but as C# does not have a built in shuffle algorithm I went with a random pivot.
- There is an interesting problem sometimes asked in interviews called the Dutch National Flag problem. This problem can be solved using a variation of quicksort.
- The Dutch National Flag problem essentially asks us to sort a random input consisting of three colours (the Dutch flag has three colours). And sort the input such that each colour is grouped together. Example:
[red, white, red, white, blue, white, white]
when sorted might produce:[blue, white, white, white, white, red, red]
. There are several variations of this problem, but they can all be solved efficiently in linear time using a 3-way quicksort. - A 3-way quicksort will partition the input into 3 subsections: items less than the pivot, items equal to the pivot, and items greater than the pivot.
- For a more detailed overview of solving the Dutch National Flag problem with a 3-way quicksort check out the Entropy-optimal sorting section of the quicksort chapter in the Algorithms book.
Mergesort
Resources
- Mergesort Overview.
- Mergesort Overview + Implementation.
- Detailed Mergesort Overview + Implementation.
- Mergesort Overview + History.
Takeaways
- Another fast sort. Along with quicksort, mergesort is a sort every programmer should be familiar with due to it's relative speed
- In simple terms, mergesort splits an array in half and sorts each half independently - then merges the two sorted halves together.
- On paper it is faster than quicksort with an average and best case time complexity of
O(n log n)
and a space complexity ofO(n)
. - It is stable.
- It is a divide & conquer algorithm.
- It is not-in-place.
- Like Hoare's quicksort, mergesort is recursive. It also similarly splits an input list/array in two, then sorts each half. After sorting each half mergesort will merge them back together (hence the name).
- I found mergesort to be the most complex sorting algorithm to implement. The next most complex was quicksort.
- There are two common types of mergesort: Top-Down & Bottom-Up.
- Top-down mergesort works downwards by starting with the original input and splitting it in half, it then recursively splits each resulting half and will continue to do so until each subsection of the original input is a series of lists containing a single item each. It will then merge these back together, whilst unwinding the stack, to form a sorted array.
- Bottom-down mergesort works in the opposite direction (upwards from one item subsections of the list). It starts by doing merges of subsections of the input that are one item in size.
- My mergesort implementations are top-down.
Heapsort
Resources
Takeaways
- Heapsort is very fast on paper. It's time complexity is
O(n log n)
in the average and worst cases. It has a best case ofO(n)
. Space isO(1)
. - Although it is faster on paper, it is often slower than quick/mergesort in practice.
- It is unstable.
- It is in-place.
- Heapsort first transforms an input into a heap (usually a max-heap), and then will continue removing the max item from the heap (first/parent item in the heap), and swap it with the furthest right child (last item). After the swap it will decrement a counter (to ignore the max item that was moved towards the end), then percolate down the item that was placed at the first index. This process is repeated until the array is sorted.
- You can also use a min-heap for heapsort, which I did in my previous post on binary heaps - however using a max-heap requires less memory overhead/operations.
Bonus sorts
- External Mergesort - this is a variation of mergesort that does a K-way merge to handle scenarios when the data to be sorted is larger than the available memory. External Mergesort overview.
- Countingsort is a stable sort that works by counting the occurrences of each item and using those counts to compute the index at which the items should start getting inserted at. Here is a good video overview of countingsort.
-
Radixsort is a stable sort that uses countingsort under the hood but will sort input numbers one at a time. For strings it will sort one letter at a time. For example:
[123, 456, 111, 122, 333]
each number would get sorted by its last digit first, then the array would get sorted using the second digit, and so on until the input is sorted. Here is a good video overview of radixsort.
Below you will find implementations of all the sorting algorithms discussed:
As always, if you found any errors in this post please let me know!
Top comments (2)
Here are the current versions of my sorting algorithms (with all those topics included):
@zacharypatten
I think some of those suggestions could be worthwhile and your repo definitely looks interesting.
Having said that, the main aim of the post, and the series in general, is to learn the concepts and write code that can be easily understood - even by developers unfamiliar with C#.
At any rate, thanks for reading and providing feedback!