Making Sense of Merge Sort [Part 1]
Vaidehi Joshi γ»12 min read
If youβve been reading this series sequentially, then thereβs a good chance that over the course of the past few weeks, youβve thought more about sorting things than youβve ever thought about before! At least, that has certainly been the case for me. I havenβt started dreaming about sorting algorithms just yet, but Iβm expecting that it might happen soon.
Well, guess what? Weβre not quite halfway through this series yetΓ’β¬Ε βΓ’β¬Ε and weβre not quite halfway through sorting algorithms yet, either! However, we are at a turning point our algorithm adventure. So far, weβve talked about some of the most commonΓ’β¬Ε βΓ’β¬Ε and sometimes thought of as the more βsimpleβΓ’β¬Ε βΓ’β¬Ε sorting algorithms. Weβve covered selection sort, bubble sort, and insertion sort. If you take a closer look at these algorithms, you might notice a pattern: theyβre all pretty slow. In fact, all of the sorting algorithms that weβve explored thus far have had one thing in common: theyβre all pretty inefficient! Each of them, despite their little quirks and differences, have a quadratic running time; in other words, using Big O notation, we could say that their running time is O(nΓΒ²).
This commonality was completely intentionalβsurprise! The order in which weβre learning these topics is actually pretty important: weβre covering sorting algorithms based on their time complexity. So far, weβve covered algorithms with a quadratic runtime complexity, but from this point forwards, weβll be looking at algorithms that are significantly faster and more efficient. And weβll start off with one of the most fun (albeit a little more complex) sorting algorithms there is!
Donβt worry, weβre going to break it down, step by step. I suppose that step one is telling you what this algorithm is called! Itβs time for you to meet my new friend: merge sort.
Divide and conquer algorithms
You might have heard merge sort discussed or referenced in the context of a technical interview, or a computer science textbook. Most CS courses spend a decent amount of time covering this topic, and for some reason or another, this particular algorithm seems to pop up quite a lot.
Interestingly, however, merge sort was only invented (discovered?) in 1945, by a mathematician named John von Neuman. In the grand scheme of things, merge sort is still a fairly new algorithmic approach to sorting. But hang on a secondΓ’β¬Ε βΓ’β¬Ε we still donβt know what it is yet!
Alright, time for a definition. The merge sort algorithm is a sorting algorithm that sorts a collection by breaking it into half. It then sorts those two halves, and then merges them together, in order to form one, completely sorted collection.
And, in most implementations of merge sort, it does all of this using recursion. But hold that thought for nowΓ’β¬Ε βΓ’β¬Ε weβll come back to recursion in just a moment.
The basic idea behind merge sort is this: it tends to be a lot easier to sort two smaller, sorted lists rather than sorting a single large, unsorted one.
Weβve already seen how sorting through one large, unsorted list can end up being slow. If we think back to selection sort, we ran into the issue of having to iterate through the list multiple times, selecting the smallest element and marking it as βsortedβ. With bubble sort, we again had to pass through the list many, many times; each time, we compared just two elements at a time, and then swapped them. Bubble sort was more obviously slow since we had to make so many swaps! And then there was insertion sort, which was kind of acceptable if our list was already mostly sorted. But if it wasnβt, then we basically were forced to iterate through a list and, one item at a time, slowly pull out the smallest element and insert it into its correct spot in a sorted array.
This is where merge sort fundamentally differs from all of these other sorting algorithms. Letβs take a look at an example to help illustrate this.
In the image here, we have a single, unsorted list. Conceptually, merge sort asserts that instead of a one unsorted list, itβs a lot easier to sort and join together two sorted lists. The idea is that if we could somehow magically end up with two sorted halves, then we could very easily merge those two sorted sublists together. Ultimately, if we did our merging in a smart, efficient way, we could end up with one sorted list at the end of it all.
Hopefully, at this point, youβre wondering how on earth merge sort can just βmagicallyβ split and sort two halves of our list.
Hang on for a secondΓ’β¬Ε βΓ’β¬Ε weβre programmers! Intrinsically we know that, at the end of the day, there really is no magic at play here. Thereβs something else going on under the hood, and itβs probably an abstraction of something else.
In the case of merge sort, that abstraction is something called divide and conquer (sometimes referred to as d&c ). The divide and conquer technique is actually an algorithm design paradigm, which is really just a fancy way of saying that itβs a design pattern that lots of algorithms use! In fact, weβve already seen this exactly paradigm before, way back when we were first learning about the binary search algorithm.
So what does the divide and conquer paradigm entail, exactly? Well, for starters, an algorithm that uses the divide and conquer strategy is one that divides the problem into smaller subproblems. In other words, it breaks down the problem into simpler versions of itself.
The basic steps of a d&c algorithm can be boiled down to these three steps:
- Divide and break up the problem into the smallest possible βsubproblemβ, of the exact same type.
- Conquer and tackle the smallest subproblems first. Once youβve figured out a solution that works, use that exact same technique to solve the larger subproblemsΓ’β¬Ε βΓ’β¬Ε in other words, solve the subproblems recursively.
- Combine the answers and build up the smaller subproblems until you finally end up applying the same solution to the larger, more complicated problem that you started off with!
The crux of divide and conquer algorithms stems from the fact that itβs a lot easier to solve a complicated problem if you can figure out how to split it up into smaller pieces. By breaking down a problem into its individual parts, the problem becomes a whole lot easier to solve. And usually, once you figure out how to solve a βsubproblemβ, then you can apply that exact solution to the larger problem. This methodology is exactly what makes recursion the tool of choice for d&c algorithms, and itβs the very reason that merge sort is a great example of recursive solutions.
Spotting recursion in the (algorithm) wild
Understanding divide and conquer in theory is one thing, but its usefulness tends to be far more obvious if we can see it in action. Letβs take a look at how we can use recursion to divide and conquer the illusive merge sort algorithm!
In theory, we understand that merge sort splits up an unsorted list, sorts the two halves, and merges them together.
But how does this algorithm really work in practice?
Well, now that we know what exactly that βmagicβ is that goes on behind the scenesΓ’β¬Ε βΓ’β¬Ε that is to say, the abstraction thatβs responsible for sorting the two sublist halvesβwe can finally try and understand how it works. Itβs time for us to apply the divide and conquer paradigm to merge sort and figure out whatβs actually going on inside this algorithm!
Weβll start with a simple example, and we wonβt actually worry about sorting any values just yet. For now, letβs try to understand how the divide and conquer methodology comes into play. We know that first need to divide and break up the problem into the smallest possible βsubproblemβ, of the exact same type. The smallest possible βsubproblemβ in our situation is our base caseΓ’β¬Ε βΓ’β¬Ε the point at which weβve basically solved our problem. In terms of sorting items, the base case is a sorted list. So, we can divide our large problem (and our unsorted list) into itβs smallest possible pieces.
In the example above, we just continue to divide our problem into smaller subproblems. We start off with one, unsorted list. Then we break that up into two, unsorted lists. We can divide that even further, into four lists. And then we can break that down again, into eight lists. At this point, weβve got eight lists, and each list has only one item in it.
We canβt divide these any further, can we? So, weβve arrived at our smallest possible subproblem, and our base case: a sorted list, with one single item in it. You might remember from previous sorting algorithms that a list with only one item in it is, by definition, considered sorted. This is because thereβs literally nothing elseΓ’β¬Ε βΓ’β¬Ε no other itemΓ’β¬Ε βΓ’β¬Ε to compare it to!
Okay, now we have eight sorted lists, and each one has only one item in it.
According to divide and conquer, we know that the next step is to tackle our smallest subproblems first. Once we can determine a solution that works, weβll apply that same solution to solve the larger subproblemsΓ’β¬Ε βΓ’β¬Ε in other words, weβll use recursion.
So, letβs start by combining (merging) two items together at a time. Since we have eight sorted lists, letβs combine each list with its neighbor. When we do this, weβll put the items in the correct, sorted order.
Nice! That was pretty cool. We took our eight, sorted lists, and merged them together to create four sorted lists. Then we took those four lists, and merged them together to form two sorted lists.
Oh, waitΓ’β¬Ε βΓ’β¬Ε two sorted lists? Hang onβ¦that sounds familiar. I think we just figured out the βmagicβ behind merge sort! How rad is that?!
Now, since we have two sorted lists, we can finish up our merge sort by merging them together.
There are a few reasons that divide and conquer actually works so well in implementing merge sort. For starters, the fact that a list with a single item is, by definition, a βsortedβ list is what makes it easy for us to know when weβve hit our smallest possible βsubproblemβ. When this algorithm is implemented in code, this ends up being the base case for determining when the recursion should end. Weβve talked about recursion before in the context of binary trees; in that situation, the base case is a single leaf nodeΓ’β¬Ε βΓ’β¬Ε in other words, when you canβt break down a subtree into any smaller possible parts. The same idea applies here: when we canβt break down our list into any smaller possible parts, and when we only have one item that is sorted, weβve hit our recursive base case.
Another reason that divide and conquer works here is once we know how to merge two items together and have figured out the logic behind that, we basically can just keep reusing that same logic and continue applying it to every built-up sublist that we merge together.
This is exactly what makes recursion so powerful: once weβve figured out how to solve the problem of merging two lists together, it doesnβt matter if the list has one element, or a hundredΓ’β¬Ε βΓ’β¬Ε we can reuse the same logic in both scenarios.
Letβs take a look at one more exampleΓ’β¬Ε βΓ’β¬Ε this time, weβll use actual numbers that weβre trying to sort. In the drawing below, we have an unsorted list with eight numbers. We can start by dividing the unsorted collection into two sublists. Weβll just continuing dividing until we hit our base case.
Okay, now weβre down to the smallest possible subproblem: eight lists, and each has one, sorted item within it. Now, we just need to merge two lists together. Weβll merge each sorted list together with its neighbor. When we merge them, weβll check the first item in each list, and combine the two lists together so that every element is in the correct, sorted order.
Easy-peasy! We got this.
Great! Once we started merging two lists together, we didnβt have to think too much more when it came to merging those two sorted sublists, did we? We used the same technique to merge lists with four items as we did when we merged lists with only one item.
Okay, illustrations are greatΓ’β¬Ε βΓ’β¬Ε but what would this look like in code? How would we even be able to spot the recursive part of a merge sort algorithm if we saw one in the wild?
Recursion at work
We already know that a recursive algorithm is one that conceptually uses the same solution to solve a problem. When it comes to code, this can effectively be translated to a function that calls itself.
In the code belowΓ’β¬Ε βΓ’β¬Ε which is adapted from Rosetta Codeβs JavaScript implementation of merge sortΓ’β¬Ε βΓ’β¬Ε we can see that the mergeSort function actually calls itself.
function mergeSort(array) {
// Determine the size of the input array.
var arraySize = array.length;
// If the array being passed in has only one element
// within it, it is considered to be a sorted array.
if (arraySize === 1) {
return;
}
// If array contains more than one element,
// split it into two parts (left and right arrays).
var midpoint = Math.floor(arraySize / 2);
var leftArray = array.slice(0, midpoint);
var rightArray = array.slice(midpoint);
// Recursively call mergeSort() on
// leftArray and rightArray sublists.
mergeSort(leftArray);
mergeSort(rightArray);
// After the mergeSort functions above finish executing,
// merge the sorted leftArray and rightArray together.
merge(leftArray, rightArray, array);
// Return the fully sorted array.
return array;
}
function merge(leftArray, rightArray, array) {
var index = 0;
while (leftArray.length && rightArray.length) {
if (rightArray[0] < leftArray[0]) {
array[index++] = rightArray.shift();
} else {
array[index++] = leftArray.shift();
}
}
while (leftArray.length) {
array[index++] = leftArray.shift();
}
while (rightArray.length) {
array[index++] = rightArray.shift();
}
}
There are a few things going on here, and we wonβt dive into all of them today (donβt worry, weβll come back to it next week!). For now, letβs just look at what makes this algorithm recursive in nature. We can see that weβre taking in an input array, and splitting it as close as we can to the centerΓ’β¬Ε βΓ’β¬Ε in this case, we call it midpoint.
Then, we take those two halves (leftArray and rightArray), and we actually pass those in as the new input arrays to the internal calls to mergeSort. Guess what? This is recursion in action!
We can think of it recursion kind of like Russian babushka dollsΓ’β¬Ε βΓ’β¬Ε or like boxes with smaller boxes within them.
A mergeSort function ultimately has two functions inside of it:
- a merge function, which actually combines two lists together and sorts them in the correct order
- and a mergeSort function, which will continue to split the input array again and again, recursively, and will also call merge again and again, recursively.
Indeed, it is because merge sort is implemented recursively that makes it faster than the other algorithms weβve looked at thus far.
But why is this? And just how much faster is merge sort, exactly? Well, youβll get the answer to both of those questions next week! In part 2 of this series, weβll look at the runtime complexity of merge sort, how this recursion actually makes it more efficient, and how merge sort stacks up against other algorithms. Until thenΓ’β¬Ε βΓ’β¬Ε happy merging!
Resources
Merge sort comes up quite a lot in computer science curriculums, technical interviews, and even in textbooks. Thereβs a wide array of resources for this particular algorithm simply because itβs one of the more tricky sorting algorithms to understand. If some of this still doesnβt make sense to youΓ’β¬Ε βΓ’β¬Ε or if youβd like to see more examplesΓ’β¬Ε βΓ’β¬Ε try taking a look at these different resources below.
- Binary Search & Merge Sort, Department of Computer Science, Carnegie Mellon University
- Mergesort, Ian Foster
- Merge Sort, Harvard CS50
- Merge sort algorithm, mycodeschool
- External Sorting, GeeksForGeeks
This post was originally published on medium.com
Thank you so much for this series!
I also was tempted by this article to implement a mergesort myself (while heavily relying on yours of course). I think it's a bit shorter than yours but also, for me more importantly, it is not mutating the array(s) that are fed in which I didn't like. OTOH it maybe uses more space. So here it is:
Hi there, I think the merge function you implemented is wrong. When you started to merge the individual items of each partition you still have to compare the items of each partition before putting into the merged partiton. In your code you simply shifted each partition individually into the merged partiton. Should be something like this (from Wikipedia)
TopDownMerge(A[], iBegin, iMiddle, iEnd, B[])
{
i = iBegin, j = iMiddle;
}
I think you might have misread a section of the function -
Here, I'm both comparing the elements of each partition and also merging it in. This is still a merge sort function, it has just been rewritten into simplified bits of code, rather than implemented in one single
while
loop.Thanks. Missed the shift() function definition which is effectively using it like a stack so it means you're cutting it down instead of traversing the array.
No, not like a stack. A stack gets pushed and popped from the end. She's pulling from the front, but not even in the sense of a queue. It's just an array that happens to get consumed based on which one has the lowest first value.
She treats
merge
as a private API for this algorithm, giving it knowledge thatleftArray
andrightArray
belong to themergeSort
algorithm so it can do whatever it needs to with them. If I had to guess, I'd say it looks like a refactoring, pulled out into its own function after having to do the same operation with both segments of the original array.The most important thing to keep in mind is that she's not optimizing the code for execution time, space, or anything involving a computer. Instead, she's optimizing for readability for the audience she's targeting with this blog series β that is, developers who don't have a background in CS. The example you pasted above, while apparently correct and probably efficient, is a poor teaching tool for that audience. A single iterator for the main array and only working with the front of the two other arrays is less cognitive load than having three iterators moving through three arrays at three different paces.
As a developer with no CS background, I'm really enjoying these articles. Now I actually want to try implementing a merge sort myself. Thanks!
Heck yes!
Yes! I totally support this decision, Prash. And would love to see your implementation when you're done :)
OK, I finally got round to it. This is an implementation in my language of choice, F#. Note that I didn't read the JS implementation. This is purely based on your description of recursive splitting and merging, and so there is no mutation here. Just a lot of building up of arrays and lists, so it's probably not very efficient. It was fun to write, though!
Thanks for sharing Vaidehi, I'm anxious to the part 2!