Seung Woo (Paul) Ji

Exploring and Benchmarking Audio Volume Adjusting Algorithms Part 2

Introduction

In the last post, we explored several volume-adjusting algorithms and made predictions about how well each would perform. Now, we are going to measure the performance of each algorithm and see whether the results match our expectations.

The Audio Sample Size

Before we start testing, we will set the sample count to a large number so that we get meaningful results. For this, we will use 1,600,000,000 samples for each program. If we run the time command on the dummy program, we get the following result:

real 1m27.058s
user 1m22.503s
sys 0m4.496s

The dummy program takes about a minute and a half in total. However, we have to keep in mind that this time does not account only for the volume scale function; other work is included as well (e.g. generating the random samples, calculating the results, and so on).
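
For reference, the sample count is the SAMPLES value that the scaling loop (shown in the next section) iterates over. A minimal sketch of how it might be defined follows; the actual lab code may declare it in a shared header under a different name.

// Hypothetical definition of the sample count used by each program
#define SAMPLES 1600000000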

Evaluating Algorithm Performance

How do we measure the performance of only the volume scale function (scale_sample)?

// ---- This is the part we're interested in!
// ---- Scale the samples from in[], placing results in out[]
        for (x = 0; x < SAMPLES; x++) {
                out[x]=scale_sample(in[x], VOLUME);
        }

We can easily implement this by using the C time library (time.h). With this library, we can isolate the function and measure its elapsed time as follows:

// ---- Include the C time library
#include <time.h>

        clock_t         t;

// ---- Record the start time
        t = clock();

// ---- Scale sample code goes here

// ---- Calculate the elapsed time
        t = clock() - t;

// ---- Print the elapsed time in seconds
        printf("Time elapsed: %f\n", ((double)t)/CLOCKS_PER_SEC);


In this way, we can estimate the elapsed time of just the scale function, in seconds.
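
Putting the pieces together, here is a minimal, self-contained sketch of the timing approach. The sample count is reduced so the example runs quickly, and scale_sample() here is a plain floating-point multiply used as a stand-in for whichever scaling algorithm is being measured; it is not the exact lab code.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define SAMPLES 1000000          // reduced from 1,600,000,000 for this example
#define VOLUME  0.75f

// Stand-in scaling function (vol0-style floating-point multiply)
static int16_t scale_sample(int16_t sample, float volume) {
        return (int16_t)(sample * volume);
}

int main(void) {
        int16_t *in  = malloc(SAMPLES * sizeof(int16_t));
        int16_t *out = malloc(SAMPLES * sizeof(int16_t));
        if (in == NULL || out == NULL)
                return 1;

        // ---- Generate random samples (not included in the timed region)
        for (int x = 0; x < SAMPLES; x++)
                in[x] = (int16_t)((rand() % 65536) - 32768);

        // ---- Record the start time
        clock_t t = clock();

        // ---- Scale the samples from in[], placing results in out[]
        for (int x = 0; x < SAMPLES; x++)
                out[x] = scale_sample(in[x], VOLUME);

        // ---- Calculate the elapsed time and print it in seconds
        t = clock() - t;
        printf("Time elapsed: %f\n", ((double)t) / CLOCKS_PER_SEC);

        // Print one result so the compiler cannot optimize the loop away
        printf("Sample result: %d\n", out[0]);

        free(in);
        free(out);
        return 0;
}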

Benchmark Test Results

For benchmarking, a total of 20 runs were performed for each algorithm. Every algorithm processed 1,600,000,000 samples and was assessed on both AArch64 and x86_64 systems. During the tests, background activity on the systems was kept to a minimum.

The following tables show the results. Both tables show very small standard deviations (SD), meaning the measurements are tightly clustered around the mean.

AArch64

Algorithm         vol0             vol1             vol2             vol4             vol5
Time (seconds)    5.290686         4.571809         11.204779        2.862223         2.897304
                  5.271289         4.616451         11.236343        2.869659         2.860497
                  5.3009           4.618019         11.207497        2.839968         2.88575
                  5.257061         4.57951          11.229004        2.794136         2.837761
                  5.29981          4.584778         11.237608        2.879343         2.857112
                  5.252714         4.590422         11.220075        2.785239         2.859161
                  5.300421         4.590156         11.215143        2.870726         2.919503
                  5.286753         4.589992         11.224697        2.794225         2.895057
                  5.317688         4.61077          11.268087        2.907598         2.91678
                  5.272125         4.63759          11.235228        2.799026         2.881828
                  5.308232         4.58515          11.229461        2.882254         2.910783
                  5.286579         4.599118         11.253098        2.85217          2.903325
                  5.282362         4.597291         11.190576        2.875931         2.920964
                  5.276742         4.611212         11.239454        2.849582         2.853147
                  5.293711         4.591562         11.253258        2.870164         2.918136
                  5.293716         4.621955         11.228463        2.858067         2.850342
                  5.318874         4.591154         11.225114        2.864949         2.912111
                  5.306651         4.590993         11.252793        2.841034         2.847878
                  5.30221          4.641963         11.220678        2.877916         2.842209
                  5.299778         4.593774         11.206139        2.868532         2.856316
Total             105.818302       92.013669        224.577495       57.042742        57.625964
Average           5.2909151        4.60068345       11.22887475      2.8521371        2.8812982
SD                0.01805085609    0.01880182964    0.01914206262    0.0338674976     0.02977236719

In the previous post, we assumed the algorithms that use SIMD instructions would be faster than the others. Indeed, we can observe that the vol4 and vol5 algorithms outperform the rest. The performance difference between them is very small (~0.0291 seconds), indicating that the inline-assembly and compiler-intrinsic versions are almost equally fast.
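
For illustration, here is a rough sketch (assumed names, not the exact lab code) of how a compiler-intrinsics version could scale eight 16-bit samples at a time with NEON on AArch64, using a Q15 fixed-point volume factor:

// NEON-intrinsic scaling sketch (not the exact lab code).
#include <arm_neon.h>
#include <stdint.h>

// Scales 'count' samples; count is assumed to be a multiple of 8.
// vol_q15 is the volume factor in signed Q15 fixed point, e.g.
// (int16_t)(0.75 * 32767). vqrdmulhq_s16 performs a saturating rounding
// doubling multiply returning the high halves, which effectively computes
// sample * (vol_q15 / 32768) in each of the eight lanes.
void scale_neon(const int16_t *in, int16_t *out, int count, int16_t vol_q15) {
        int16x8_t vvol = vdupq_n_s16(vol_q15);
        for (int x = 0; x < count; x += 8) {
                int16x8_t v = vld1q_s16(&in[x]);            // load 8 samples
                vst1q_s16(&out[x], vqrdmulhq_s16(v, vvol)); // scale and store
        }
}

An inline-assembly version would issue essentially the same instruction (SQRDMULH) by hand, which is consistent with the two timings being nearly identical.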

We can also see that vol1 runs faster than vol0. This matches our expectation, as vol1 uses fixed-point arithmetic with bit-shift operations.
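
As a reminder of the idea, here is a simplified sketch (not the exact vol1 code): convert the floating-point volume factor to an integer once, then scale each sample with an integer multiply followed by a right shift.

// Simplified fixed-point scaling (Q8 format assumed here; vol1's actual
// scaling factor and shift amount may differ).
#include <stdint.h>

static inline int16_t scale_sample_fixed(int16_t sample, int32_t vol_fixed) {
        // vol_fixed = (int32_t)(volume * 256.0f), computed once outside the loop
        return (int16_t)(((int32_t)sample * vol_fixed) >> 8);
}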

Interestingly, the vol2 algorithm turns out to be significantly slower than the others. Initially, we assumed that it might outperform vol0 and vol1, which multiply each sample by the scaling factor, because it pre-calculates all possible results and stores them in a lookup table. This result suggests either that the CPU's arithmetic logic unit (ALU) handles the multiplication very quickly, or that the memory accesses needed to read the pre-calculated values from the table are comparatively slow.
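
For reference, a simplified sketch of the lookup-table idea (assumed names, not the exact vol2 code):

// Simplified lookup-table scaling.
#include <stdint.h>

static int16_t lookup[65536];           // one entry per possible 16-bit sample

// Build the table once, before the scaling loop.
static void build_table(float volume) {
        for (int32_t v = -32768; v <= 32767; v++)
                lookup[(uint16_t)v] = (int16_t)(v * volume);
}

// Scaling a sample is then just an array index.
static inline int16_t scale_sample_lut(int16_t sample) {
        return lookup[(uint16_t)sample];
}

Note that a full 16-bit table like this holds 65,536 two-byte entries (128 KB), which is larger than a typical L1 data cache; since the input samples are random, many lookups will likely miss in cache, which would be consistent with the memory-access explanation above.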

x86_64

Algorithm         vol0             vol1             vol2
Time (seconds)    2.821902         2.784482         3.531761
                  2.903628         2.786877         3.569542
                  2.895999         2.78038          3.551214
                  2.877543         2.785402         3.559591
                  2.886563         2.785422         3.537273
                  2.891856         2.783449         3.545279
                  2.80208          2.786667         3.58345
                  2.855822         2.782619         3.590136
                  2.804731         2.781633         3.572802
                  2.782909         2.801589         3.587121
                  2.783267         2.783468         3.630578
                  2.785422         2.800091         3.562486
                  2.81526          2.77875          3.591089
                  2.873962         2.778289         3.529016
                  2.791908         2.789269         3.579964
                  2.785272         2.792904         3.55086
                  2.804883         2.778821         3.587747
                  2.78638          2.785906         3.545412
                  2.788079         2.795611         3.574527
                  2.810512         2.794108         3.54657
Total             56.547978        55.735737        71.326418
Average           2.8273989        2.78678685       3.5663209
SD                0.04456744515    0.006838116502   0.02516021857

The x86_64 system shows a similar pattern to the AArch64 system: the vol1 algorithm is the fastest and vol2 is the slowest. Note that vol4 and vol5 are missing here because those programs use SIMD code written specifically for the AArch64 architecture, so they cannot run on the x86_64 system.

Conclusion

In this post, we measured the performance of each algorithm to test the assumptions we made in the previous post. As expected, the algorithms that use SIMD instructions run faster than the others, since they can process multiple samples at a time.
