Intro and Brief Overview of Big-O
Intro
Sooner or later we must come to wrestle with the beast that is Big-O Time Complexity. Wikipedia has a good definition of Time Complexity:
'Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm'
Here we will be talking about linear Big-O (O(n)) as a path to understanding the more elusive nature of logarithmic Big-O (O(log n)).
A Brief Overview
Big-O Time Complexity evaluates the number of operations required for an algorithm to produce its result in the 'worst-case'. In order to see how Big-O works in practice, let's first analyze the Big-O for running a recursive indexOf function on a sorted array.
Linear: O(n)
Here is a fairly straightforward recursive implementation of an indexOf function using ES6 recursion patterns with default and rest parameters. The implementation is my own, but it is modeled on the syntactic patterns found in the implementations of various other functions as per this excellent article.
const indexOf = ([x, ...xs], target, index = 0) => x !== undefined
  ? x === target ? index : indexOf(xs, target, index + 1)
  : -1;
const newArray = [1, 2, 3, 4, 5];
console.log(indexOf(newArray, 5)); // returns 4, as 5 is found at index 4 of newArray
The time complexity of this operation is linear, O(n), because in the worst case we have to perform one operation for every element 'x' in newArray: comparing it to the target value, 5. No matter how big or small newArray gets, in order to find whether or not the target value exists inside it, in the worst case (our element exists at the final index or not at all) we have to perform exactly one operation for each element in the array. Thus the growth of the Big-O is directly proportional to the growth of the data set: one more element, one more operation.
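To make that "one more element, one more operation" idea concrete, here is a small counting sketch (my own illustration, not part of the original implementation; countingIndexOf and comparisons are hypothetical names) that tallies the comparisons made in the worst case, where the target is missing entirely:

// A counting variant of indexOf: same logic, but it tallies how many
// comparisons against the target it makes before giving up.
let comparisons = 0;
const countingIndexOf = ([x, ...xs], target, index = 0) => {
  if (x === undefined) return -1;   // ran out of elements: target not found
  comparisons++;                    // one comparison per element visited
  return x === target ? index : countingIndexOf(xs, target, index + 1);
};

countingIndexOf([1, 2, 3, 4, 5], 99); // worst case: 99 is not in the array
console.log(comparisons);             // 5 -- one operation per element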
Here is a graphic representation, a graph of f(n) plotted against n. Note the yellow linear curve.
The number on the n-axis (commonly the x-axis) represents the number of items in our JavaScript array newArray from above. The number on the f(n)-axis (commonly the y-axis) represents the number of operations required, in the worst case, for the algorithm to produce a result.
It helps me to imagine, when we run a function such as indexOf from above, that we start at the point on the yellow line where n = the number of items in the array and move left along that line as the function runs, until we hit n = 0 and no more items are left to be processed. A steeper curve (or line) means more operations (on the f(n) axis) are required to complete our algorithm. Keeping this picture in mind will help us envision logarithmic complexity.
Logarithmic: O(log n)
What is a logarithm? A logarithm has two components: a base 'b' and a given number 'x'. Taking the logarithm asks how many times the base b would have to be multiplied by itself to produce the given number x.
'For example, log2 64 = 6, as 2^6 = 64' (from Wikipedia on Logarithms).
It might help to rearrange this equation a bit for our purposes, because division is really what we are doing when we logarithmically traverse a data set. Say we are using a 'divide-and-conquer' algorithm, such as Binary Search (described below), to search our sorted array of size 'x'. We start with x = 64 and divide it by the base 2 until we are left with 1. How many operations does this take?
64 / 2 = 32 → 32 / 2 = 16 → 16 / 2 = 8 → 8 / 2 = 4 → 4 / 2 = 2 → 2 / 2 = 1
-> 6 operations
Basically we are asking, in the case of a Binary Search: "How many times will I have to divide my data set of size x by 2 to isolate one value, i.e. to find (or not find) the target?"
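As a quick sanity check, here is a small sketch (my own illustration; halvingsToOne is a hypothetical helper, not something from the article) that counts those halvings directly:

// How many times can we halve x before we are left with 1?
const halvingsToOne = (x) => {
  let count = 0;
  while (x > 1) {
    x = Math.floor(x / 2);
    count++;
  }
  return count;
};

console.log(halvingsToOne(64)); // 6, matching log2 64 = 6
console.log(Math.log2(64));     // 6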
What is Binary Search?
The Binary Search algorithm operates on a sorted array to find a target value in logarithmic time complexity. Given a sorted array, Binary Search finds the middle index of the array and compares the value found at that index to the target value. If the target value is less than the value found at the middle index, we know we only need to search the 'lower half' of numbers in the array. The target cannot be located in the 'upper half' because those numbers are all greater than the value at the middle index.
In the implementation and example below, on the first call min is set to the first index in the array and max to the last index. Binary Search 'discards' either the 'upper half' or the 'lower half' of numbers in the given array by resetting either the min value or the max value based on the middle value's comparison with the target value. On the next call it only searches between the newly updated min and max indexes. The discarded half is never searched, immediately eliminating half of the array's size and half of the potential operations.
Below we implement binarySearch and use it to find the index of 47 in the nums array
let nums = [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
const binarySearch = function(array, target, min = 0, max = array.length - 1) {
  // The search range is empty: the target is not in the array
  if (min > max) {
    return null;
  }
  let index = Math.floor((min + max) / 2);
  if (array[index] === target) {
    return index;
  } else if (target > array[index]) {
    // The target can only be in the upper half: search above the middle index
    min = index + 1;
    return binarySearch(array, target, min, max);
  } else {
    // The target can only be in the lower half: search below the middle index
    max = index - 1;
    return binarySearch(array, target, min, max);
  }
};
console.log(binarySearch(nums, 47)); // logs 14 because 47 is found at index 14
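To watch the halving happen step by step, here is a tracing variant (my own sketch; binarySearchTraced is a name introduced here, not part of the original implementation) that logs each index it probes:

// Same algorithm, but logging every index it examines so we can see it
// discard half of the remaining range on each call.
const binarySearchTraced = function(array, target, min = 0, max = array.length - 1) {
  if (min > max) return null;
  const index = Math.floor((min + max) / 2);
  console.log('probing index', index);
  if (array[index] === target) return index;
  return target > array[index]
    ? binarySearchTraced(array, target, index + 1, max)
    : binarySearchTraced(array, target, min, index - 1);
};

binarySearchTraced(nums, 47);
// probes indexes 12, 18, 15, 13, 14 -- five comparisons to locate 47
// in a 25-element array, in line with log2 25 ≈ 4.64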
Bringing it All Together
As the given input size increases, an algorithm which processes data in logarithmic time will arrive at its result using substantially fewer operations than one which operates in linear time. Let’s imagine a sorted JavaScript array of 128 items. Using indexOf, we have to perform potentially 128 operations in order to find the target. Using Binary Search we have to perform just seven operations:
128 / 2 = 64 → 64 / 2 = 32 → 32 / 2 = 16 → 16 / 2 = 8 → 8 / 2 = 4 → 4 / 2 = 2 → 2 / 2 = 1
If we double the data size we will have to perform potentially double the operations with indexOf, while we only have to perform one more with Binary Search.
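To see that doubling effect in numbers, here is a small illustrative sketch (my own, not from the article) comparing the worst-case operation counts as the array size doubles:

// Worst-case operation counts as the array size doubles: a linear scan
// touches every element, while Binary Search only pays one extra halving.
[128, 256, 512, 1024].forEach((n) => {
  const linearOps = n;                    // indexOf: one comparison per element
  const logOps = Math.ceil(Math.log2(n)); // Binary Search: halvings needed
  console.log(`n = ${n}: linear ${linearOps} ops, logarithmic ${logOps} ops`);
});
// n = 128: linear 128 ops, logarithmic 7 ops
// n = 256: linear 256 ops, logarithmic 8 ops
// n = 512: linear 512 ops, logarithmic 9 ops
// n = 1024: linear 1024 ops, logarithmic 10 ops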
Let's take another look at the graph from above:
We see that, in the long run, the yellow linear curve diverges more and more from the light-blue logarithmic curve.
Conclusion
Hopefully this article helps to elucidate just how efficient an algorithm operating in logarithmic time complexity can be when working with large data sets.