Kay Kleinvogel

Posted on Apr 4, 2020 • Originally published at kaykleinvogel.com

Introduction to function profiling (string reversal)

#javascript #algorithms #computerscience

This article originally appeared on my blog kaykleinvogel.com. If you liked this article make sure to check out my others.👍

When I was reading about different algorithms to reverse strings in JavaScript I was wondering which one was the fastest. I used the classical string.split("").reverse().join(""); before and I use it for its simple code and ease of reading. I disliked loop-based solutions for similar reasons. They always seemed to be much longer and hard to read in comparison. But since as a developer I always want to learn new things I put aside some time and dived into these algorithms to check out which was faster.

My Setup

For my programming language, I have chosen TypeScript. For unit tests, I am using mocha. For the data analysis, later on, I'll be using Jupyter Notebooks with NumPy and matplotlib.

The Contenders

My 2 contenders for this comparison are probably the most basic solutions for string reversal.

export const reverseInbuild = (inputString: String): String => {
    return inputString
        .split('')
        .reverse()
        .join('');
};

This solution is using basic string and array manipulation to reverse the string. In my opinion, this is probably the cleanest way (codewise) to perform a string reversal. In case you haven't seen this before it is quite easy.

inputString.split("") is splitting the string at every occurrence of the delimiter. Since in our case the delimiter is an empty string it splits the string on every character and returns it as an array.
array.reverse() does exactly what the name suggests. It reverses the order of the elements inside an array.
array.join("") is concatenating an array into a string. In the final string, the characters are then separated by the delimiter we set. Since we don't want the characters to be separated we just choose an empty string as the delimiter.

The other contender is a loop-based approach.

export const reverseLoop = (inputString: String): String => {
    var reversedString = '';
    for (var i = inputString.length - 1; i >= 0; i--) {
        reversedString += inputString[i];
    }
    return reversedString;
};

This function reverses the string by iterating through it from the last character to the first. Then it appends the current character to a new string. This leaves you with the reversed string as an output.

To make sure both algorithms are working properly you will need 2 small unit-tests to make sure that both functions are reversing a string.

describe('Inbuild Reverse', () => {
    it('should inverse abc to cba', () => {
        const result = reverseInbuild('abc');
        expect(result).to.eql('cba');
    });
});

describe('Loop Reverse', () => {
    it('should inverse abc to cba', () => {
        const result = reverseLoop('abc');
        expect(result).to.eql('cba');
    });
});

Even if these tests here are trivial and could probably be ommited, the practice becomes more important later when you're writing more functions that might not work the way they're supposed to.

basic profiling

To log the time the different functions need you will need a profiling function. All this function does is taking another function and check how long it takes for it to execute. Using Date.now() would be an option here, but I opted for process.hrtime() as it is more precise (nanoseconds). It is also not affected by clock drifting.

const profiling = (
    testedFunction: Function,
    inputString: String
): number => {
    const hStart = process.hrtime();
    testedFunction(inputString);
    const hEnd = process.hrtime(hStart);
    return hEnd[1];
};

The profiler takes a function as an argument as well as a string to be reversed. Before running the function however it stores the time in hStart. After the function finished process.hrtime(hStart) calculates the difference between both values. And thus gives me the time the function needed to execute. As hrtime returns a tuple in the format [seconds, nanoseconds]. We can just discard the first value and use the nanoseconds in the second value.

the string generator

To test both algorithms, a generator that could produce as many strings as you want is quite a useful tool. You could have write them by hand but if you want to get thousands of string with varying length this could become a tedious task. Also the requirements for the generator are quite easy.

The function should take an Integer n which acts as the length of the word.
Then it should return a random string of the length n which only contains characters.

To make it simpler it'll only use capitalized letters for this. To check if these conditions are met there are 2 unit tests needed.

describe('String Builder', () => {
    it('should only contain chars', () => {
        const regexCheckForNonChars = '/[^A-Z]*/';
        const length = 1000;
        const result = buildString(length);
        expect(result.match(regexCheckForNonChars)).to.eql(null); // checks if there are no matches for chars outside of A-Z (range of generator)
    });
    it('should be n chars long', () => {
        const n = Math.trunc(Math.random() * 100);
        const result = buildString(n);
        expect(result.length).to.eql(n);
    });
});

The first unit test checks if any characters within the string are not defined in the requirements. If this is true the test will fail as it breaks the requirement specified above. The second test calls the builder with a random length (1-100) and checks if the returned string hast the specified length. With these unit tests in place, we can now start making these tests green and build the generator.

To create a random string we will be using the Math.random() function. This will generate random numbers in a specific range that can then be transformed into characters.

const digit = Math.trunc(Math.random() * 25 + 65);
const char = String.fromCharCode(digit);

The digit represents a random Integer from the range between 65 and 90. If we look at the ASCII chart we can see that this equates to the characters A-Z. This Integer then has to be converted into a character. Strings in JS have a function for this so we can translate the Integer into the character it represents. This will leave us with a random character inside the predefined range of characters.

export const buildString = (length: number): String => {
    let string: String = '';
    for (let i = 0; i < length; i++) {
        const digit = Math.trunc(Math.random() * 25 + 65);
        string += String.fromCharCode(digit);
    }
    return string;
};

Now the only thing left to do is to wrap the code into a loop that iterates n time where n is the length of the word. This finalizes the function and after running all unit tests, the generator will correctly fulfill all of them.

Wordlist generator

To not only test a single string there has to be a function that generates an array of strings to be tested. This requires a generator that generates an array of n words with a specified length. The first step here is to write the unit tests again.

describe('Build String Array', () => {
    it('should return an array with n elements', () => {
        const n = Math.trunc(Math.random() * 100);
        const result = buildStringArray(n, 1);
        expect(result.length).to.eql(n);
    });
    it('should create strings with size n', () => {
        const length = Math.trunc(Math.random() * 10);
        const result = buildStringArray(1, length)[0]; // create a single string and return it
        expect(result.length).to.eql(length);
    });
});

To check if the function returns n words we can simply check the length of the array. I set the length of the words to be 1 since here it is not really important how long the words are. To check the length of the generated words you can just chose to generate a single word and see how long it is. Since every word is generated the same way they should all have the same length.

export const buildStringArray = (
    n: number,
    length: number
): Array<String> => {
    const testedStrings = [];
    for (let i = 0; i < n; i++) {
        const string = buildString(length);
        testedStrings.push(string);
    }
    return testedStrings;
};

This function itselve is very simple. All it does is calling the string generator n times and populate an array with the results.

The next thing to do is to adjust our profiler to the arrays as an input instead of a single string.

const profiling = (
    testedFunction: Function,
    inputStrings: Array<String>
): number => {
    const hStart = process.hrtime();
    inputStrings.forEach((string) => {
        testedFunction(string);
    });
    const hEnd = process.hrtime(hStart);
    return hEnd[1] / inputStrings.length;
};

Now the profiler starts the timer and then runs through all the strings. In the end, the execution time is divided by the numbers of strings that had to be reversed. This technique is something my physics teacher taught me back in high school. The idea behind this is that every single measurement might be inaccurate. By adding them all up and then dividing them by the number of measurements (effectively getting the average) you reduce the amount of impact that random factors have on the final measurement.

measuring the functions

Now that everything is set up properly we can put the functions into the profiler and see how their performance compares to each other. For this, I want to introduce the term chunk size. The chunk size refers to the number of strings that are generated by the wordlist. This is an important metric because with a bigger chunk size the influence of randomness is decreased. In the beginning, we will be working with a chunk size of 100.000 strings. After every chunk, the size of the strings will be increased by 1 so we can see how the functions perform with different sizes of string (up to 100 characters).

const chunkSize = 100000;
for (let i = 1; i <= 100; i++) {
    const testedString = buildStringArray(chunkSize, i);
    const time = profiling(reverseLoop, testedString);
    measuredTimes.push(time);
    console.log('n: ' + i + ' - ' + time + 'ns');
}

These times are then logged so they can be further analyzed later on. After this we'll run the other reverse function through the profiler to get the data. Then we can jump to jupyter notebook to do some data analysis.

Data Analysis

The actual analysis of the data we collected is probably the easiest part of the whole process. For this part, you'll need matplotlib and numpy. The data that was logged in the previous step was saved under the variables inbuild_100 and loop_100 where the 100 represents the number of steps (1 to 100).

To have an x-axis to plot the data against you will need an array containing the numbers that are needed.

x_100 = []
for i in range(1,101):
    x_100.append(i)
print(x_100)

plotting the graphs

Now that the data is prepared we can simply plot them. To compare them better I'd recommend putting them in the same chart. You can also fill the area between the graphs to better highlight the differences in their execution time.

plt.plot(x_100,inbuild_100, label="inbuild_100", color="black", linewidth = 2)
plt.title("Reverse functions")

plt.plot(x_100, loop_100, label="loop_100" ,color = "skyblue" , linewidth = 2)
plt.fill_between(x_100,loop_100,inbuild_100, color="red", alpha=0.25)

plt.xlabel("string size in chars")
plt.ylabel("execution time in ns")

plt.legend()
plt.show()

As you can see the loop takes quite a bit less time to execute than the in-build function. But what occures at a smaller scale might not happen on a bigger scale.

scaling up

The next logical step is to check if the behavior is consistent with bigger strings. So I've run the profiler with bigger strings. I had to decrease the chunk size since it would take forever otherwise. The new parameters are:

chunk-size: 10
string size: 1 - 10.000 characters

Due to the small chunk size, the random factor comes into play here and makes the data noisy. To counteract this I've averaged 100 numbers into a single one. Even though this reduces the resolution of the data it smooths everything out so we can see the trends more clearly.

inbuild_10000_smooth_100 = np.mean(np.array(inbuild_10000).reshape(-1, 100), axis=1)
loop_10000_smooth_100 = np.mean(np.array(loop_10000).reshape(-1, 100), axis=1)
x_10000_smooth_100 = np.mean(np.array(x_10000).reshape(-1, 100), axis=1)

After plotting this you can see that the trend is consistent for bigger strings.

Conclusion

We have discovered that using loops to reverse a string is faster than using string.split("").reverse().join("");. So if you need to reverse a lot of strings you should rather use loops. If you have any other algorithms for string reversal feel free to show them to me on twitter. Let's see how fast we can go with this.

DEV Community