Making Sense of Merge Sort [Part 1]
Vaidehi Joshi Jun 05, 2017
If you’ve been reading this series sequentially, then there’s a good chance that over the course of the past few weeks, you’ve thought more about sorting things than you’ve ever thought about before! At least, that has certainly been the case for me. I haven’t started dreaming about sorting algorithms just yet, but I’m expecting that it might happen soon.
Well, guess what? We’re not quite halfway through this series yet — and we’re not quite halfway through sorting algorithms yet, either! However, we are at a turning point our algorithm adventure. So far, we’ve talked about some of the most common — and sometimes thought of as the more “simple” — sorting algorithms. We’ve covered selection sort, bubble sort, and insertion sort. If you take a closer look at these algorithms, you might notice a pattern: they’re all pretty slow. In fact, all of the sorting algorithms that we’ve explored thus far have had one thing in common: they’re all pretty inefficient! Each of them, despite their little quirks and differences, have a quadratic running time; in other words, using Big O notation, we could say that their running time is O(n²).
This commonality was completely intentional—surprise! The order in which we’re learning these topics is actually pretty important: we’re covering sorting algorithms based on their time complexity. So far, we’ve covered algorithms with a quadratic runtime complexity, but from this point forwards, we’ll be looking at algorithms that are significantly faster and more efficient. And we’ll start off with one of the most fun (albeit a little more complex) sorting algorithms there is!
Don’t worry, we’re going to break it down, step by step. I suppose that step one is telling you what this algorithm is called! It’s time for you to meet my new friend: merge sort.
Divide and conquer algorithms
You might have heard merge sort discussed or referenced in the context of a technical interview, or a computer science textbook. Most CS courses spend a decent amount of time covering this topic, and for some reason or another, this particular algorithm seems to pop up quite a lot.
Interestingly, however, merge sort was only invented (discovered?) in 1945, by a mathematician named John von Neuman. In the grand scheme of things, merge sort is still a fairly new algorithmic approach to sorting. But hang on a second — we still don’t know what it is yet!
Alright, time for a definition. The merge sort algorithm is a sorting algorithm that sorts a collection by breaking it into half. It then sorts those two halves, and then merges them together, in order to form one, completely sorted collection.
And, in most implementations of merge sort, it does all of this using recursion. But hold that thought for now — we’ll come back to recursion in just a moment.
The basic idea behind merge sort is this: it tends to be a lot easier to sort two smaller, sorted lists rather than sorting a single large, unsorted one.
We’ve already seen how sorting through one large, unsorted list can end up being slow. If we think back to selection sort, we ran into the issue of having to iterate through the list multiple times, selecting the smallest element and marking it as “sorted”. With bubble sort, we again had to pass through the list many, many times; each time, we compared just two elements at a time, and then swapped them. Bubble sort was more obviously slow since we had to make so many swaps! And then there was insertion sort, which was kind of acceptable if our list was already mostly sorted. But if it wasn’t, then we basically were forced to iterate through a list and, one item at a time, slowly pull out the smallest element and insert it into its correct spot in a sorted array.
This is where merge sort fundamentally differs from all of these other sorting algorithms. Let’s take a look at an example to help illustrate this.
In the image here, we have a single, unsorted list. Conceptually, merge sort asserts that instead of a one unsorted list, it’s a lot easier to sort and join together two sorted lists. The idea is that if we could somehow magically end up with two sorted halves, then we could very easily merge those two sorted sublists together. Ultimately, if we did our merging in a smart, efficient way, we could end up with one sorted list at the end of it all.
Hopefully, at this point, you’re wondering how on earth merge sort can just “magically” split and sort two halves of our list.
Hang on for a second — we’re programmers! Intrinsically we know that, at the end of the day, there really is no magic at play here. There’s something else going on under the hood, and it’s probably an abstraction of something else.
In the case of merge sort, that abstraction is something called divide and conquer (sometimes referred to as d&c ). The divide and conquer technique is actually an algorithm design paradigm, which is really just a fancy way of saying that it’s a design pattern that lots of algorithms use! In fact, we’ve already seen this exactly paradigm before, way back when we were first learning about the binary search algorithm.
So what does the divide and conquer paradigm entail, exactly? Well, for starters, an algorithm that uses the divide and conquer strategy is one that divides the problem into smaller subproblems. In other words, it breaks down the problem into simpler versions of itself.
The basic steps of a d&c algorithm can be boiled down to these three steps:
- Divide and break up the problem into the smallest possible “subproblem”, of the exact same type.
- Conquer and tackle the smallest subproblems first. Once you’ve figured out a solution that works, use that exact same technique to solve the larger subproblems — in other words, solve the subproblems recursively.
- Combine the answers and build up the smaller subproblems until you finally end up applying the same solution to the larger, more complicated problem that you started off with!
The crux of divide and conquer algorithms stems from the fact that it’s a lot easier to solve a complicated problem if you can figure out how to split it up into smaller pieces. By breaking down a problem into its individual parts, the problem becomes a whole lot easier to solve. And usually, once you figure out how to solve a “subproblem”, then you can apply that exact solution to the larger problem. This methodology is exactly what makes recursion the tool of choice for d&c algorithms, and it’s the very reason that merge sort is a great example of recursive solutions.
Spotting recursion in the (algorithm) wild
Understanding divide and conquer in theory is one thing, but its usefulness tends to be far more obvious if we can see it in action. Let’s take a look at how we can use recursion to divide and conquer the illusive merge sort algorithm!
In theory, we understand that merge sort splits up an unsorted list, sorts the two halves, and merges them together.
But how does this algorithm really work in practice?
Well, now that we know what exactly that “magic” is that goes on behind the scenes — that is to say, the abstraction that’s responsible for sorting the two sublist halves—we can finally try and understand how it works. It’s time for us to apply the divide and conquer paradigm to merge sort and figure out what’s actually going on inside this algorithm!
We’ll start with a simple example, and we won’t actually worry about sorting any values just yet. For now, let’s try to understand how the divide and conquer methodology comes into play. We know that first need to divide and break up the problem into the smallest possible “subproblem”, of the exact same type. The smallest possible “subproblem” in our situation is our base case — the point at which we’ve basically solved our problem. In terms of sorting items, the base case is a sorted list. So, we can divide our large problem (and our unsorted list) into it’s smallest possible pieces.
In the example above, we just continue to divide our problem into smaller subproblems. We start off with one, unsorted list. Then we break that up into two, unsorted lists. We can divide that even further, into four lists. And then we can break that down again, into eight lists. At this point, we’ve got eight lists, and each list has only one item in it.
We can’t divide these any further, can we? So, we’ve arrived at our smallest possible subproblem, and our base case: a sorted list, with one single item in it. You might remember from previous sorting algorithms that a list with only one item in it is, by definition, considered sorted. This is because there’s literally nothing else — no other item — to compare it to!
Okay, now we have eight sorted lists, and each one has only one item in it.
According to divide and conquer, we know that the next step is to tackle our smallest subproblems first. Once we can determine a solution that works, we’ll apply that same solution to solve the larger subproblems — in other words, we’ll use recursion.
So, let’s start by combining (merging) two items together at a time. Since we have eight sorted lists, let’s combine each list with its neighbor. When we do this, we’ll put the items in the correct, sorted order.
Nice! That was pretty cool. We took our eight, sorted lists, and merged them together to create four sorted lists. Then we took those four lists, and merged them together to form two sorted lists.
Oh, wait — two sorted lists? Hang on…that sounds familiar. I think we just figured out the “magic” behind merge sort! How rad is that?!
Now, since we have two sorted lists, we can finish up our merge sort by merging them together.
There are a few reasons that divide and conquer actually works so well in implementing merge sort. For starters, the fact that a list with a single item is, by definition, a “sorted” list is what makes it easy for us to know when we’ve hit our smallest possible “subproblem”. When this algorithm is implemented in code, this ends up being the base case for determining when the recursion should end. We’ve talked about recursion before in the context of binary trees; in that situation, the base case is a single leaf node — in other words, when you can’t break down a subtree into any smaller possible parts. The same idea applies here: when we can’t break down our list into any smaller possible parts, and when we only have one item that is sorted, we’ve hit our recursive base case.
Another reason that divide and conquer works here is once we know how to merge two items together and have figured out the logic behind that, we basically can just keep reusing that same logic and continue applying it to every built-up sublist that we merge together.
This is exactly what makes recursion so powerful: once we’ve figured out how to solve the problem of merging two lists together, it doesn’t matter if the list has one element, or a hundred — we can reuse the same logic in both scenarios.
Let’s take a look at one more example — this time, we’ll use actual numbers that we’re trying to sort. In the drawing below, we have an unsorted list with eight numbers. We can start by dividing the unsorted collection into two sublists. We’ll just continuing dividing until we hit our base case.
Okay, now we’re down to the smallest possible subproblem: eight lists, and each has one, sorted item within it. Now, we just need to merge two lists together. We’ll merge each sorted list together with its neighbor. When we merge them, we’ll check the first item in each list, and combine the two lists together so that every element is in the correct, sorted order.
Easy-peasy! We got this.
Great! Once we started merging two lists together, we didn’t have to think too much more when it came to merging those two sorted sublists, did we? We used the same technique to merge lists with four items as we did when we merged lists with only one item.
Okay, illustrations are great — but what would this look like in code? How would we even be able to spot the recursive part of a merge sort algorithm if we saw one in the wild?
Recursion at work
We already know that a recursive algorithm is one that conceptually uses the same solution to solve a problem. When it comes to code, this can effectively be translated to a function that calls itself.
In the code below — which is adapted from Rosetta Code’s JavaScript implementation of merge sort — we can see that the mergeSort function actually calls itself.
function mergeSort(array) {
// Determine the size of the input array.
var arraySize = array.length;
// If the array being passed in has only one element
// within it, it is considered to be a sorted array.
if (arraySize === 1) {
return;
}
// If array contains more than one element,
// split it into two parts (left and right arrays).
var midpoint = Math.floor(arraySize / 2);
var leftArray = array.slice(0, midpoint);
var rightArray = array.slice(midpoint);
// Recursively call mergeSort() on
// leftArray and rightArray sublists.
mergeSort(leftArray);
mergeSort(rightArray);
// After the mergeSort functions above finish executing,
// merge the sorted leftArray and rightArray together.
merge(leftArray, rightArray, array);
// Return the fully sorted array.
return array;
}
function merge(leftArray, rightArray, array) {
var index = 0;
while (leftArray.length && rightArray.length) {
if (rightArray[0] < leftArray[0]) {
array[index++] = rightArray.shift();
} else {
array[index++] = leftArray.shift();
}
}
while (leftArray.length) {
array[index++] = leftArray.shift();
}
while (rightArray.length) {
array[index++] = rightArray.shift();
}
}
There are a few things going on here, and we won’t dive into all of them today (don’t worry, we’ll come back to it next week!). For now, let’s just look at what makes this algorithm recursive in nature. We can see that we’re taking in an input array, and splitting it as close as we can to the center — in this case, we call it midpoint.
Then, we take those two halves (leftArray and rightArray), and we actually pass those in as the new input arrays to the internal calls to mergeSort. Guess what? This is recursion in action!
We can think of it recursion kind of like Russian babushka dolls — or like boxes with smaller boxes within them.
A mergeSort function ultimately has two functions inside of it:
- a merge function, which actually combines two lists together and sorts them in the correct order
- and a mergeSort function, which will continue to split the input array again and again, recursively, and will also call merge again and again, recursively.
Indeed, it is because merge sort is implemented recursively that makes it faster than the other algorithms we’ve looked at thus far.
But why is this? And just how much faster is merge sort, exactly? Well, you’ll get the answer to both of those questions next week! In part 2 of this series, we’ll look at the runtime complexity of merge sort, how this recursion actually makes it more efficient, and how merge sort stacks up against other algorithms. Until then — happy merging!
Resources
Merge sort comes up quite a lot in computer science curriculums, technical interviews, and even in textbooks. There’s a wide array of resources for this particular algorithm simply because it’s one of the more tricky sorting algorithms to understand. If some of this still doesn’t make sense to you — or if you’d like to see more examples — try taking a look at these different resources below.
- Binary Search & Merge Sort, Department of Computer Science, Carnegie Mellon University
- Mergesort, Ian Foster
- Merge Sort, Harvard CS50
- Merge sort algorithm, mycodeschool
- External Sorting, GeeksForGeeks
This post was originally published on medium.com
As a developer with no CS background, I'm really enjoying these articles. Now I actually want to try implementing a merge sort myself. Thanks!
Heck yes!
Yes! I totally support this decision, Prash. And would love to see your implementation when you're done :)
OK, I finally got round to it. This is an implementation in my language of choice, F#. Note that I didn't read the JS implementation. This is purely based on your description of recursive splitting and merging, and so there is no mutation here. Just a lot of building up of arrays and lists, so it's probably not very efficient. It was fun to write, though!
Thanks for sharing Vaidehi, I'm anxious to the part 2!
Thank you so much for this series!
I also was tempted by this article to implement a mergesort myself (while heavily relying on yours of course). I think it's a bit shorter than yours but also, for me more importantly, it is not mutating the array(s) that are fed in which I didn't like. OTOH it maybe uses more space. So here it is:
Hi there, I think the merge function you implemented is wrong. When you started to merge the individual items of each partition you still have to compare the items of each partition before putting into the merged partiton. In your code you simply shifted each partition individually into the merged partiton. Should be something like this (from Wikipedia)
TopDownMerge(A[], iBegin, iMiddle, iEnd, B[])
{
i = iBegin, j = iMiddle;
}
I think you might have misread a section of the function -
Here, I'm both comparing the elements of each partition and also merging it in. This is still a merge sort function, it has just been rewritten into simplified bits of code, rather than implemented in one single
while
loop.Thanks. Missed the shift() function definition which is effectively using it like a stack so it means you're cutting it down instead of traversing the array.
No, not like a stack. A stack gets pushed and popped from the end. She's pulling from the front, but not even in the sense of a queue. It's just an array that happens to get consumed based on which one has the lowest first value.
She treats
merge
as a private API for this algorithm, giving it knowledge thatleftArray
andrightArray
belong to themergeSort
algorithm so it can do whatever it needs to with them. If I had to guess, I'd say it looks like a refactoring, pulled out into its own function after having to do the same operation with both segments of the original array.The most important thing to keep in mind is that she's not optimizing the code for execution time, space, or anything involving a computer. Instead, she's optimizing for readability for the audience she's targeting with this blog series — that is, developers who don't have a background in CS. The example you pasted above, while apparently correct and probably efficient, is a poor teaching tool for that audience. A single iterator for the main array and only working with the front of the two other arrays is less cognitive load than having three iterators moving through three arrays at three different paces.