Algorithm design and optimization for beginners

#beginners #programming #algorithms #computerscience

Hi! This is my first post on DEV. I’m going to tell why you should optimize algorithms. As an example, I will show how to design algorithm for maximum pairwise product which is pretty basic. I’ll explain how to design algorithm for this problem and optimize until we get an efficient one. So, Let’s get started!

Understand Problem Statement

First of all, “What is maximum pairwise product” problem? Well, the problem is to find a highest product value in a sequence of non-negative integers.

Just for clarification,

Input: A sequence of non-negative integers.

Output: The maximum value that can be obtained by multiplying two different elements from the sequence.

Sample:

Design the algorithm

Ok! So, let’s start with the simplest algorithm first. It is to go through all possible pairs of the input elements A[1 . . . n] = [a 1 , . . . , a n ] and to find a pair of distinct elements with the largest product.

Here is the pseudocode of the above algorithm:

Implement the Algorithm

According to the pseudocode, we'll implement the algorithm in java as follows:

Test, Debug and Optimize

Test

Check this code on a small datasets such as [1, 3, 4, 6] to ensure that it produces correct results. Then try an input [100000, 100000]. According to the algorithm, the output should be 10000000000. But wait...what happened?? The program fails to give an output for 100 000 * 100 000 and it gives 1,410,065,408 instead of the correct result 10,000,000,000. You know why?

Debug

Well, this kind of error is probably caused by an Integer Overflow. Integer overflows cannot be detected until they have happened. That's why, we should always keep in mind about overflow errors in programming. So as you can see, this algorithm only works for small data and therefore, we need to assign another data type for large datasets. We are going to use long here.

Optimize

Everything works well now. But you think it'll work fast and correctly when there are billions of input? As we can see there's two nested for loops for finding maximum pairwise product, this still is an O(n²) algorithm. So it still has an ugly running time. For large datasets, it will take for a long time to complete. WE NEED A FASTER ALGORITHM!

Stop and Think for a while. What if we find only the two largest numbers in the array and multiply them instead of multiplying each and every single possible combination.

Since we need to find the largest and the second largest numbers of the array, we'll only need two scans of the sequence.

During the first scan, we find the largest number.
During the second scan, we find another largest number among the remaining ones and skip the number found at the previous scan.
Finally multiplying both numbers and return the result

This reduces the complexity of algorithm to O(n) and our algorithm is much optimized.

The pseudocode will be changed like this:

Here is the code for the optimized algorithm.

Stress Testing

After doing optimization,testing and debugging, it is also required to test beyond the limits of normal operation. Writing bug-free code is impossible. There is no such thing like this but you can somehow minimize the volume and the severity of the bugs present in your code. Always remember, your algorithm can sometimes fail for some test cases and you will need to know which are those cases. This is where stress testing technique becomes useful. So, what's stress testing and why do we need it? To explain, it is a technique for generating thousands of tests with the goal of finding a situation where the algorithm fails to give the correct output.

Basically, a stress test consists of:

At least two different implementations of an algorithm (For our scenario, one is the simple brute force algorithm and the other is the optimized algorithm)
A random test generator.
An infinite loop in which random test generator generate a new test case, pass into both implemented algorithms and compare the results. If the results are different, the program stops, else, the loop repeats.

For our case, we create a infinite while loop and then generate arrays with random size and assign each array with random numbers. After that, we pass the array to both implementations and then compare the results.

Here is the stress test for the algorithm:

After running the code, you will see a test case where getMaxPairwiseProduct and MaxPairwiseProductFast produce different results, so now we can check what went wrong. Can you find anything suspicious in the dataset of different results?

Well, the bitter truth is that generating tests automatically and running stress test is easy,
but debugging is hard, right?

Take a look at this stress test result(the result can be different on your computer because of a different random number generator):

As you can see, the simple brute force algorithm gave the correct result but the optimized algorithm didn't. Why? To make a long story short, the optimized one doesn't work when the maximum numbers have the same value. The algorithm always fails on any test case where the largest number is equal to the second largest number.

So, although, the first algorithm works correctly, it becomes really slow when dealing with very large datasets. The second algorithm, however, is fast but doesn't work for the above cases in stress test.

What should we do now??

Ok. We just need to go back to our optimized algorithm and check the process again. And .. here we go! The condition that checks the second largest number is the reason of the wrong result!

So, we are changing that code to make it correct.
Here is the final code:

Run the stress test again and wait for a minute until we get bored. You will see there's no different results anymore.

MaxPairwiseProductFast algorithm is finally correct and optimized!

As for the final conclusion, even for such a simple problem like this, it is easy to make mistakes during algorithm designs and optimization. But practice makes perfect 😉. There are many ways to find the solution to this problem so I hope you can find more efficient algorithm than the example one.

Thank you very much for giving your time to read my post and please forgive me if my writing is too boring or too complicated. I'll try my best to write more blog post in the future 😃