SPO600 - Project Stage 1/3

Hello!

Hello everyone, my name is Brian, and in this post I'll describe and explain the process I used to complete the first stage of my project in the SPO600 course.

What does the first stage consist of?

The first stage of this project is essentially testing several programs and identifying the similarities and differences between them, if any. Six programs were provided:

vol0.c is the basic, or naive, algorithm. It multiplies each sound sample by the volume scaling factor, casting from signed 16-bit integer to floating point and back again. Casting between integer and floating point can be an expensive operation.
vol1.c does the math using fixed-point calculations. This avoids the overhead of casting between integer and floating point and back again.
vol2.c pre-calculates all 65536 possible results and then looks up the answer for each input value.
vol3.c is a dummy program: it doesn't scale the volume at all. It can be used to measure the overhead of the rest of the processing (everything besides scaling the volume) done by the other programs.
vol4.c uses Single Instruction, Multiple Data (SIMD) instructions accessed through inline assembly (assembly language code inserted into a C program). This program is specific to the AArch64 architecture and will not build on x86_64.
vol5.c uses SIMD instructions accessed through compiler intrinsics. This program is also specific to AArch64.

Predictions

I personally think that these programs will all run quite similarly: not exactly the same, but without any drastic differences that stand out.

How to obtain the files and begin the testing

First, I had to get the files from a remote server and unpack them. I connected to the host israel.cdot.systems with an SSH client (OpenSSH), copied the archive from the /public directory into my home directory with cp /public/spo600-volume-examples.tgz ., unpacked it with tar xvf spo600-volume-examples.tgz, changed into the extracted directory, and finally built the programs with make.

Running the programs

I ran the programs on both the AArch64 and x86_64 systems (vol4 and vol5 only build on AArch64), and the results were very similar, so there is nothing worth noting separately. Here is the approach I took on the AArch64 system.

Now that the programs are built, we can run each one with a command like ./vol0, ./vol1, and so on.

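The results from each program were interesting. vol0, vol2, vol4 and vol5 all printed the same result, while vol1 and vol3 each printed a very different one. The vol3 difference is expected, since the dummy program doesn't scale the samples at all; vol1's is most likely caused by rounding in its fixed-point arithmetic, which can come out one off from the floating-point versions for some samples. Since the printed result is only a checksum, though, these differences don't say much about how the programs perform.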

Let's take a deeper look by measuring how long each program takes to run with the time command.

real is the wall-clock time that elapses from when the command starts until it finishes.
user is the CPU time the process spent running its own code in user mode.
sys is the CPU time the kernel spent working on behalf of the process (for example, handling system calls such as memory allocation and I/O).

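This gives us more detail, but surprisingly, every program reported exactly the same times. With only 16 samples, each run finishes almost instantly, so the timing is dominated by process startup rather than by the volume scaling. We can increase the sample size in the code to give the programs enough work for their differences to show up.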

How do we do that exactly? I opened the code by connecting to the remote server with an SSH extension in VS Code.

I changed the number of samples (SAMPLES, default 16) to 16000000 for each program. Let's compile them, run them again, and see what changed.

Now the numbers have changed drastically, and there are very noticeable differences between the programs. vol4 seems to be the fastest, and vol2 the slowest. My prediction held up with the small sample size we started with, but not once the programs had enough work for their real differences to show.

Finally, after running all these commands, this is how much memory was used on the AArch64 system.

Questions hidden within code

Q: Why is this needed? for (x = 0; x < SAMPLES; x++) { ttl=(ttl+out[x])%1000; }

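This loop adds every output sample into a running total, ttl, kept under 1000 in magnitude by the modulo. Using the output this way keeps the compiler from optimizing away the scaling work as dead code, and the final total acts as a checksum that can be compared across programs to check that they produced the same output.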

Q: Why is this needed? printf("Result: %d\n", ttl); return 0; }

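Printing ttl makes the result observable, which forces the compiler to actually perform the calculation, and lets us compare results between programs. return 0 then tells the shell that the program finished successfully.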

Q: What is the purpose of the cast to uint16_t in the next line? precalc[(uint16_t) x] = (int16_t) ((float) x * VOLUME / 100.0); }

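Here x steps through every possible signed 16-bit sample value, from -32768 to 32767, but array indices must be non-negative. Casting x to uint16_t maps the negative values onto 32768 to 65535 (for example, -1 becomes 65535), so each possible sample value gets its own slot in the 65536-entry table. The lookup must cast each sample the same way to find its precomputed result.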

Q: should we use 32767 or 32768 in next line? why? vol_int = (int16_t)(VOLUME/100.0 * 32767.0);

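We should use 32767, because vol_int is an int16_t, and 32767 is the largest value a signed 16-bit integer can hold. With VOLUME set to 100, 1.0 * 32768.0 would not fit in an int16_t (converting an out-of-range floating-point value like that is undefined behavior in C), while 1.0 * 32767.0 still does.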

Q: what is the purpose of these next two lines? in_cursor = in; out_cursor = out;

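These two lines make working copies of the in and out pointers. The cursors can then be advanced through the buffers as the samples are processed, while in and out keep pointing at the start of their buffers for later use, such as the checksum loop.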

Q: What's the point of this dummy program? how does it help with benchmarking?

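The dummy program does all of the other programs' processing except the volume scaling. Its run time is therefore a baseline for that overhead, and subtracting it from another program's time gives a rough estimate of how long the volume scaling itself takes.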

Conclusion

Overall, this has given me a lot of insight into what we're working on. It was surprisingly easy compared to the previous labs (apart from some googling), which was a relief. Thank you all for reading!