Introduction
This post is my investigation into how different algorithms that produce the same effect can differ in performance. I looked at several implementations of the same task (scaling the volume of a set of samples), compiled them, and checked what the differences are.
Background of the Project
Six programs are already provided, each with a different approach to the problem:
- vol0.c is the basic or naive algorithm
- vol1.c does the math using fixed-point calculations
- vol2.c pre-calculates all 65536 different results, then looks up the answer for each input value
- vol3.c is a dummy program: it doesn't scale the volume at all
- vol4.c uses Single Instruction, Multiple Data (SIMD) instructions accessed through inline assembly
- vol5.c uses SIMD instructions accessed through compiler intrinsics
More details about the provided programs can be found here: Project 1
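To make the difference between the first two approaches concrete, here is a minimal sketch of naive (floating-point) scaling next to fixed-point scaling. The function names, the volume_percent parameter, and the choice of 256 as the fixed-point scale are my own assumptions for illustration; they are not taken from vol0.c or vol1.c.

```c
#include <assert.h>
#include <stdint.h>

/* Naive approach: convert the sample to float, multiply, convert back. */
int16_t scale_naive(int16_t sample, int volume_percent)
{
    assert(volume_percent >= 0 && volume_percent <= 100);
    return (int16_t)((float)sample * (float)volume_percent / 100.0f);
}

/* Fixed-point approach: the volume becomes an integer factor scaled by 256,
 * so each sample needs only integer arithmetic (no float conversion). */
int16_t scale_fixed(int16_t sample, int volume_percent)
{
    assert(volume_percent >= 0 && volume_percent <= 100);
    int32_t factor = (volume_percent * 256) / 100;   /* e.g. 50% becomes 128 */
    /* Dividing by 256 undoes the scaling; compilers usually turn this into shifts. */
    return (int16_t)(((int32_t)sample * factor) / 256);
}
```

Both functions give the same answer when the volume converts exactly (50% is one such case); for other percentages the fixed-point factor is rounded down, so the results can differ slightly.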
My Prediction
My prediction for the relative performance of the scaling algorithms is that vol0 will be the fastest and vol3 will be the slowest. To be honest, this is only a guess; I think they may turn out to perform about the same.
How I Get Started
First, I copy the archive into my directory with this command:
cp /public/spo600-volume-examples.tgz .
Then I extract the archive I just copied with this command:
tar xvf spo600-volume-examples.tgz
After that, I move to the directory that contains the Makefile and use the make command to build the programs. Then I simply use ./vol0 to run vol0, and the same applies to the other vol programs.
My first build and test of each program
I ran the tests on the AArch64 architecture first, and then on the x86_64 architecture. The results were basically the same when I built and tested the programs on the two architectures. The screenshot below is from the AArch64 run.
As we can see, the output is not the same for every program: the result differs depending on which program I run. I also used the time command to see how long each program takes:
- real is the elapsed (wall-clock) time from when the command starts until it finishes
- user is the CPU time the program spends executing its own code in user mode
- sys is the CPU time the kernel spends working on behalf of the program, for example in system calls
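The user and sys figures can also be read from inside a program. Here is a minimal sketch using the POSIX getrusage call; the helper names are my own and are not part of the vol programs, and it assumes a POSIX system such as Linux.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: convert a struct timeval to seconds. */
static double timeval_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* CPU time this process has spent in user mode (the "user" line of time). */
double user_cpu_seconds(void)
{
    struct rusage usage;
    int rc = getrusage(RUSAGE_SELF, &usage);
    assert(rc == 0);
    (void)rc;   /* silence the unused warning when asserts are disabled */
    return timeval_seconds(usage.ru_utime);
}

/* CPU time the kernel has spent on behalf of this process (the "sys" line). */
double sys_cpu_seconds(void)
{
    struct rusage usage;
    int rc = getrusage(RUSAGE_SELF, &usage);
    assert(rc == 0);
    (void)rc;
    return timeval_seconds(usage.ru_stime);
}
```

Calling both functions before and after the scaling loop and subtracting would give the CPU time of just that part, which the time command cannot separate out.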
Relative memory usage of each program
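I used the free -m command to check the relative memory usage of each program on my machine. Note that free -m reports memory for the whole system in mebibytes, so the numbers are only a rough indication of what a single program uses.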
Memory usage on AArch64 architecture
Memory usage on x86-64 architecture
Questions marked with Q:
Q: Why is this needed?
for (x = 0; x < SAMPLES; x++) {
ttl=(ttl+out[x])%1000;
}
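We need this loop so that the values in out[] are actually used: it goes through all SAMPLES entries and folds each result into the running total ttl. Without it, an optimizing compiler could notice that the scaled values are never read and remove the scaling work entirely. The total also gives us a single number to print as the output of the program.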
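The summing loop can be sketched as a standalone function to show that the total depends on every element of the output array. The function name checksum and its parameters are my own choice for illustration; the loop body mirrors the one quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold every output sample into a running total, kept small with % 1000. */
int checksum(const int16_t *out, size_t count)
{
    assert(out != NULL || count == 0);
    int ttl = 0;
    for (size_t x = 0; x < count; x++) {
        ttl = (ttl + out[x]) % 1000;
    }
    return ttl;
}
```

Because the return value is built from all of out[], the compiler has to keep whatever code produced those values, as long as the total itself is used somewhere (for example, printed).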
Q: Why is this needed?
printf("Result: %d\n", ttl);
return 0;
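We need this printf because otherwise nothing would print the result of the program (vol1 in this case). Printing ttl gives the program a visible output, lets us compare the results of the different vol programs, and makes sure the total is actually used.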
Q: What is the purpose of the cast to uint16_t in the next line?
precalc[(uint16_t) x] = (int16_t) ((float) x * VOLUME / 100.0);
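The sample value x is a signed 16-bit quantity, so it can be negative, and a negative number cannot be used as an array index. Casting to uint16_t turns it into an unsigned 16-bit integer in the range 0 to 65535, which gives each of the 65536 possible sample values its own valid slot in precalc[].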
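Here is a minimal sketch of the lookup-table idea that shows where negative samples end up in the table. The function names and the volume_percent parameter are assumptions of mine; only the precalc assignment follows the line quoted above.

```c
#include <assert.h>
#include <stdint.h>

static int16_t precalc[65536];   /* one slot per possible 16-bit sample */

/* Fill the table once with the scaled value of every possible sample. */
void build_table(int volume_percent)
{
    assert(volume_percent >= 0 && volume_percent <= 100);
    for (int x = -32768; x <= 32767; x++) {
        precalc[(uint16_t)x] = (int16_t)((float)x * (float)volume_percent / 100.0f);
    }
}

/* The index a sample maps to: negative samples wrap into 32768..65535. */
unsigned table_index(int16_t sample)
{
    return (uint16_t)sample;
}

/* Scaling a sample is now just an array read. */
int16_t lookup(int16_t sample)
{
    return precalc[(uint16_t)sample];
}
```

For example, a sample of -1 is stored at index 65535 and a sample of -32768 at index 32768, while non-negative samples keep their own value as the index.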
Q: What's the point of this dummy program? How does it help with benchmarking?
The dummy program does NOT scale the volume, but it does everything else the other programs do. Its run time therefore measures the overhead of the rest of the processing, and comparing it with the other timings shows how much time is spent on the scaling itself.
Q: should we use 32767 or 32768 in next line? why?
vol_int = (int16_t)(VOLUME/100.0 * 32767.0);
We should use 32767 because that is the largest value an int16_t (a 16-bit signed integer) can hold; its range runs from -32768 to 32767. With 32768, a VOLUME of 100 would produce 32768, which does not fit in an int16_t.
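The range problem can be checked with a small sketch. The function names and the volume_percent parameter are my own, standing in for the VOLUME constant; the arithmetic follows the line quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* The volume factor computed as in the quoted line, using 32767. */
int16_t volume_factor(int volume_percent)
{
    assert(volume_percent >= 0 && volume_percent <= 100);
    return (int16_t)(volume_percent / 100.0 * 32767.0);
}

/* Returns 1 if volume_percent/100 * scale fits in an int16_t, else 0. */
int factor_fits(int volume_percent, double scale)
{
    double value = volume_percent / 100.0 * scale;
    return value >= INT16_MIN && value <= INT16_MAX;
}
```

At full volume, factor_fits(100, 32767.0) reports that the value fits, while factor_fits(100, 32768.0) reports that it does not.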
Q: what is the purpose of these next two lines?
in_cursor = in;
out_cursor = out;
limit = in + SAMPLES;
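The purpose of these lines is to set up pointers for walking through the data: in_cursor points to the start of the in array and out_cursor points to the start of the out array. The third line sets limit to the address just past the last input sample, which marks where processing should stop.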
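Here is a minimal scalar sketch of how such cursors are typically used. The function name and the integer scaling inside the loop are my own simplification; the SIMD program would process several samples per step rather than one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk the input with a cursor until it reaches limit, writing each scaled
 * sample through the output cursor. */
void scale_with_cursors(const int16_t *in, int16_t *out,
                        size_t samples, int volume_percent)
{
    assert(volume_percent >= 0 && volume_percent <= 100);

    const int16_t *in_cursor = in;        /* next sample to read */
    int16_t *out_cursor = out;            /* next slot to write */
    const int16_t *limit = in + samples;  /* one past the last input sample */

    while (in_cursor < limit) {
        *out_cursor = (int16_t)((int32_t)*in_cursor * volume_percent / 100);
        in_cursor++;
        out_cursor++;
    }
}
```

The loop needs no index variable: comparing in_cursor with limit is enough to know when all samples have been handled.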
Q: what does it mean to "duplicate" values in the next line?
__asm__ ("dup v1.8h,%w0"::"r"(vol_int)); // duplicate vol_int into v1.8h
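To duplicate a value here means to copy one scalar into every element (lane) of a vector register. The value to duplicate is %w0, the 32-bit view of the register holding operand 0, which is vol_int. The dup instruction copies it into v1.8h, that is, into all eight 16-bit lanes of vector register v1.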
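Since the dup instruction only exists on AArch64, here is a portable C sketch that imitates what it does. The vec8h type and the dup_8h function are made up for illustration; they model the eight 16-bit lanes of v1 as a plain array and are not real SIMD code.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8   /* v1.8h means eight 16-bit ("h", halfword) lanes */

/* Stand-in for a 128-bit vector register viewed as eight int16_t lanes. */
typedef struct {
    int16_t lane[LANES];
} vec8h;

/* Imitates "dup v1.8h, w0": copy one scalar into every lane. */
vec8h dup_8h(int16_t value)
{
    vec8h v;
    for (int i = 0; i < LANES; i++) {
        v.lane[i] = value;
    }
    assert(v.lane[0] == v.lane[LANES - 1]);   /* sanity check: lanes match */
    return v;
}
```

With the volume factor present in every lane, a single vector multiply can scale eight samples at once, which is the reason for duplicating it.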