What is Big O notation?
It is the language we use to talk about how long an algorithm takes to run which allows us to compare the efficiency of different approaches to a problem.
When we talk about Big O we are talking about the concept of asymptotic analysis which focuses on how the runtime of an algorithm increases as the input grows and approaches infinity. What this means is that as datasets get larger, it is the growth function which will be the dominating factor of runtime.
What do you mean by a growth function?
This is how we express the speed of which how quickly the runtime grows. Since we aren't measuring the runtime directly we can't express it in something like seconds. Instead we use Big O notation and the size of the input, which is denoted by "n" and we can express the growth using that input with expressions such as O(n), O(n^2), O(lg(n)), etc.
Here are some algorithmic examples:
O(1) : "Constant time"
def print_first_item(items) puts items end
This method runs in constant because regardless of the size of the array of items, only one step is happening.
O(n) : "Linear time"
def print_all_items(items) items.each do |item| puts item end end
This method is considered O(n) where n is the number of items in the array. It is linear because we perform a step for each item in that array. If there are 10 items, we print 10 times, if there are 100, we print 100 times.
O(n^2) : "Quadratic Time"
def print_all_possible_ordered_pairs(items) items.each do |first_item| items.each do |second_item| puts first_item, second_item end end end
This method is is considered quadratic because if the array has 10 items, we print 1000 times. This because we have a loop nested within a loop. For n items in the array, the outer loop runs n times, and the inner loop runs nu times for each iteration of the outer loop. This brings us to printing n^2 times. Note: if you have a loop within a loop, it is often (but not always) an indication that you will have a O(n^2) runtime.
Why do care about the dataset as its size approaches infinity?
It is helpful to see this visually. Below we can see that for that with a smaller dataset, the O(n^2) function runs in less time than the linear or logarithmic function, so we might be tempted to use the n^2 method. However, most of the time as programmers we work with very large datasets, and sometimes we can't know how big they will become as more users are created or more information added. We can see that as the input gets increasingly larger, the O(n) will have a faster runtime than O(n^2) and O(lg(n)) will be even faster as the dataset gets even larger.
We also care about the input as it get arbitrarily large because when talking about our runtimes, we can drop constants, coefficients, and lower-order terms associated with the function. This is because all of these have relatively minor significance compared to the highest order component.
So if an algorithm was O(2n), we can ignore the 2 and just call it O(n). Similarly, a function that is O(n + n^2) we can refer to it as O(n^2) because the lower-order term becomes insignificant to the cost of the run time of n^2 as the input increases.
Worst Case Analysis
When programmers talk about Big O notation, it is typically implied that we are discussing the worst-case situation. There are a few reasons for this.
It is usually easier to calculate the worst-case runtime than the average runtime because there are few factors that need to be accounted for.
A worst case analysis is a guarantee that the runtime will never take any longer.
It is actually not that uncommon that the worst case scenario happens frequently when implemented.
Honesty is important! You may have written an algorithm that in the best case scenario has O(lg(n)) and if that is how you base your determination for your runtime, you may end up with a very unpleasant situation when all of a sudden the worst happens and your runtime is actually O(n^2) and suddenly an app becomes unusable.
In summary, Big O is a way that we can communicate with others the runtime of an algorithm. When we know the runtime, we can make improvements to our code to make it more efficient. If we have an algorithm that we know runs in O(n^2) we can then try to refactor it to make it more efficient with O(n * lg(n)) or a function that is even faster.