The purpose of this series of blog posts is to demystify and conceptualize algorithms - more specifically the design and analysis hereof. These blog posts are targeted for software development students early in their studies, as well as students from similar lines of study.
However, I will also continuously relate the concepts to examples in laymans terms, to include people with no related background, but merely with a desire to learn more about algorithms. A certain level of mathematical understanding is however a prerequisite.
Some would argue that an understanding of algorithms is not needed to become a programmer. An understanding of algorithms will separate you from these programmers, by allowing you to quantify and compare different approaches to a problem. It will give you a broader vocabulary, or even language, with which you can rightfully reason for the superiority of a solution, or algorithm, for a given problem.
Most importantly, in terms of everyday programming, it will give you a greater understanding of data structures, which are an essential part of writing efficient code. Are you performing lookups in an array of reference type objects? Convert it to a data structure, which supports constant lookups, such as hash tables, instead of a linear scan. Are you using an array-based list for FIFO operations? Array-based lists mean linear time pop operations. Convert it to a FIFO-based data structure, such as a queue, which has constant time append and pop.
Examples like these show you why you need to familiarize yourself with data structures. Do so using what you learn from these blog posts, and you should see a dramatic improvement in the efficiency of your code.
This specific blog post will also give you the needed prerequisites for the coming blog posts, in which we will get into the actual design of algorithms. Let's get to it!
Complexity is the grounds upon which we discuss algorithms, namely in terms of space and time. Intuitively, we often determine the running time of an algorithm based on the input size. We will get back to this, as we familiarize ourselves with different running times. First off, we need to get acquainted with asymptotic notation. It gives us a way of expressing the growth rate of a function, or algorithm, in which we focus on the important terms of the function.
Take a simple quadratic equation, (an2 + bn + c), i.e.
15n2 + 10n + 5
in which n defines the size of the input. The running time of this function is bounded by the fastest growing term of this function, which is n2. As the input size n grows, where a > 0, the term n2 will eventually always exceed the size of the other terms and constants, i.e. bn + c in this case.
Using asymptotic notation, we express this by removing coefficients and inferior terms, which leaves us with an approximation of the growth rate of an algorithm. In asymptotic notation, we refer to O(⋅), 'big O'-notation, which expresses an upper bound on an algorithm, i.e. the worst case scenario, and Ω(⋅), 'big theta'-notation, which expresses a lower bound on an algorithm, i.e. the best case scenarios. In this series of blog posts, we will only focus on the upper bound.
Again, let's talk about prerequisites. Complexity refers to both running time and space consumption, however, we will focus on running times, as it is more approachable and intuitive than space complexity.
The examples of this post will be based on both a list of whole numbers, i.e. an integer array, with a thousand elements and a group of people, for different levels of complexity. Furthermore, for the sake of simplicity and understanding, we assume that one operation takes one second. For our group of people, a question to a member of the group could correspond to an operation. Finally, we will refer to the below graph from http://www.bigocheatsheet.com for a visual comparison of the running times.
This is referred to as constant time, as it requires a constant amount of operations. The bigocheatsheet graph refers to this as an 'excellent' running time. Furthermore, an important point here is that the algorithm is independent of the input size, which means the running time does not increase, as the input increases.
An example of a constant time algorithm is finding the value of an element of our integer array. Given an integer array with a thousand elements, accessing a single element at a certain index is constant, as it requires a single lookup, which corresponds to a single second.
This example corresponds to finding the age of a specific person from a group, regardless of the size, which only requires a single question to that very person. Simple, right? Moving on.
This is referred to as linear time, as it is linearly dependent on the size of the input. The bigocheatsheet graph refers to this as a 'fair' running time. As the size of your input increases, so does your number of operations, and therefore your running time. However, remember that due to our 'big O'-notation, the running time of a linear time algorithm increases at most at the same rate, as the number of elements of the input - for simplicity our example will use every element of the input.
An example of a linear time algorithm is finding the minimum value of an unsorted integer array. Given an integer array with a thousand elements, finding the minimum value requires iterating through the entire list, which requires n operations, which in our case corresponds to a thousand seconds. A dramatic increase in running time, compared to constant time, O(1), with no increase in the input size.
This example corresponds to finding the youngest person in a group of people, which requires asking every single person of said group, before you are able to conclude, which person is the youngest.
We now increase the theoretical complexity considerably. We assess two running times, which are often referred to in job interviews in combination - quadratic (polynomial) time and linearithmic time algorithms, respectively. Interviewers will often pose problems, which seem to have an obvious, intuitive solution in quadratic time, but are often solvable in at least linearithmic time - a dramatic improvement in running time.
With O(n2), we look at each element a constant amount of times, for each other element, at most, which is a considerable increase in operations, compared to linear time, where we only look at each element a constant amount of times. This is a dramatic increase in running time, which is also apparent in the bigocheatsheet graph, where we've moved from the yellow (fair), into the red (horrible). In a job interview, such a running time should be your cue to look for a better solution.
An example of a quadratic time algorithm is finding all pairs in our integer array. Given an integer array of a thousand elements, one would find all pairs by iterating over the entire collection of size n once for each of the n elements. This corresponds to n2 operations, which in the case of the integer array corresponds to a million seconds. A nested loop, in which both the inner and outer loop iterates the same list, is a classic example of a quadratic time algorithm.
This example translates directly to our group of people, in which we try to find all pairs of people. It should be noted that pairing up with oneself would not be a valid pair, but it is irrelevant for the matter of the example.
For O(n log n), first we note, that we typically refer to log2(n), as we discuss logarithms in algorithms. The graph places this between linear and quadratic time, at a 'bad' running time. If needed, please do refresh your memory on logarithms - the kryptonite of exponentials - before continuing with the next blogpost. The most common linearithmic time algorithms are definitely sorting algorithms. Replacing a quadratic time sorting algorithm with one of linearithmic time is an example of said improvement of running time.
An example of a linearithmic time algorithm is mergesort. In short, given our integer array with a thousand elements, one would compare pairwise consecutive integers
((e1, e2), (e3, e4), (e5, e6) ...)
and then merge exponentially larger sets of integers until finally combining the two halves of the array. Don't worry about the details of the algorithm - we'll get into the algorithm in the next blogpost.
This algorithm and example would also be applicable for sorting a group of people by age, but quite hard to explain in layman's terms. For a more hands-on example, however, I strongly recommend this clip, https://www.youtube.com/watch?v=XaqR3G_NVoo, in which Sapientia University visualizes the mergesort algorithm by means of Transylvanian-saxon folk dance. If nothing, just do it for the laugh of it.
The important thing to notice here is the dramatic improvement in running time. A quadratic time algorithm would sort an array in n2 operations, i.e. a million operations or seconds in our case; the linearithmic algorithm only requires n log n operations, i.e. ten thousand operations or seconds. This is a reduction from eleven days to two hours!
We refer to this as logarithmic time. Returning to the bigocheatsheet graph, we finish back where we started, an 'excellent' running time, even though the running time is dependent on the input size, n.
An example of a logarithmic time algorithm is binary search for ordered collections. Given our integer array, you search for an integer i of the array, by continuously looking at the element at the center of the array, and doing one of three things. If the value of the middle element m:
- Is the integer i, we have found the integer.
- Is greater than the integer i, we repeat the process for the lower half of the array.
- Is less than the integer i, we repeat the process for the upper half of the array.
It should be noted, that other strategies exist for picking the pivot other than choosing the middle element. This is irrelevant for the purpose of this example. It is important to notice, that as a property of logarithms, we need to do this no more than log2(n) times, as this would leave us with a single element. To figure out, why this is true, try to prove to yourself, that x in log2(n) = x, represents the number of times, we can half our input size, and that n in 2x = n, represents the size of an array you get by doubling x times.
This example corresponds to having a thousand people line up in order of their salary, and looking for the person closest to a ridiculously specific yearly salary, such as 359344,297 DKK.
By following the three steps above, you only need to ask a maximum of log2(n) + 1 people, in this case eleven, to come to a conclusion. Note, that the last question (i.e. the + 1) comes from asking the last person to compare him with his "neighbours".
Again, the important thing to notice is how logarithms scale. If we increase the number of people to a million people, the amount of needed questions only increases to twenty. This explains, why it is depicted on top of constant time algorithms in the bigocheatsheet graph, as it almost seems to be independent of the input size, n.
With the basics down, we will get into the design and analysis of algorithms in the next two blog posts. We will do so by becoming acquainted with three families of algorithms - greedy algorithms, dynamic programming algorithms, and divide and conquer algorithms. This will lead us to the final blog post, which will discuss the subject of my own master thesis, which is based on randomized algorithms - more specifically differential privacy.
You made it this far. By now, you should already feel more comfortable discussing algorithms and data structures in future projects. We've covered the basics, which means, you are more than qualified for reading the rest of the posts. I hope you do!