Thanh Van

Posted on Dec 5, 2021

SPO600 Algorithm Selection Lab - Project Stage 1

#algorithms #opensource

Introduction

This post covers my investigation into how different algorithms that produce the same effect can differ in performance. I looked at several implementations, compiled them, and checked what the differences are.

Background of the Project

Six programs are already provided, each with a different approach to the problem:

vol0.c is the basic or naive algorithm
vol1.c does the math using fixed-point calculations
vol2.c pre-calculates all 65536 different results, then looks up the answer for each input value
vol3.c is a dummy program - it doesn't scale the volume at all
vol4.c uses Single Instruction, Multiple Data (SIMD) instructions accessed through inline assembly
vol5.c uses SIMD instructions accessed through Compiler Intrinsics
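To make the difference between the first two approaches concrete, here is a minimal sketch of naive floating-point scaling next to fixed-point scaling. It is not the code from vol0.c or vol1.c: the function names, the volume_percent parameter, and the choice of 256 as the fixed-point scale are assumptions made for illustration.

```c
#include <stdint.h>

/* Naive approach: convert the sample to floating point, multiply by the
 * volume factor, and convert back to a 16-bit integer. */
int16_t scale_naive(int16_t sample, int volume_percent) {
    return (int16_t)((float)sample * volume_percent / 100.0);
}

/* Fixed-point approach: express the volume factor as an integer scaled
 * by 256 (an assumed scale), multiply using integer math only, then
 * divide the scale back out. */
int16_t scale_fixed(int16_t sample, int volume_percent) {
    const int32_t factor = (int32_t)(volume_percent / 100.0 * 256.0);
    return (int16_t)((sample * factor) / 256);
}
```

With volume_percent set to 50, both functions turn a sample of 1000 into 500, but the fixed-point version does no floating-point work per sample.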

More details about the provided programs can be found here: Project 1

My Prediction

My prediction for the relative performance of the scaling algorithms is that vol0 will be the fastest and vol3 will be the slowest. To be honest, this is only a guess, and I think they may turn out to perform about the same.

How I Get Started

First, I copy the archive into my directory with this command:

cp /public/spo600-volume-examples.tgz .

Then I extract the archive I just copied with this command:

tar xvf spo600-volume-examples.tgz

After that, I move into the directory that contains the Makefile and run the make command to build the programs. Then I simply use ./vol0 to run vol0, and the same applies to the other vol programs.

My first build and test of each program

I ran the tests on the AArch64 architecture first, and then on the x86_64 architecture. The results were basically the same when I built and tested the programs on the two architectures. The screenshot below is from the AArch64 run.
As we can see, the output is not the same: each program I ran printed a different result. I also used the time command to see how long each program took:

real is the total elapsed (wall-clock) time from when the command started until it finished.
user is the CPU time the program spent running its own code in user mode.
sys is the CPU time the kernel spent working on behalf of the program, for example while servicing system calls.
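The user and sys figures can also be read from inside a program. The sketch below assumes a POSIX system and uses getrusage; the helper names timeval_seconds and cpu_times are made up for this example and are not part of the lab code.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval (seconds + microseconds) to seconds. */
double timeval_seconds(struct timeval tv) {
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Store the user-mode and kernel-mode CPU time consumed so far by the
 * calling process. Returns 0 on success, -1 if getrusage fails. */
int cpu_times(double *user, double *sys) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        return -1;
    }
    *user = timeval_seconds(ru.ru_utime);
    *sys = timeval_seconds(ru.ru_stime);
    return 0;
}
```

Calling cpu_times before and after the scaling loop and subtracting the two readings would give the CPU time spent in the loop alone, without the program start-up cost that the time command includes.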

Relative memory usage of each program

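I used the free -m command to check the relative memory usage of the programs on my machine. Note that free -m reports memory for the whole system in mebibytes rather than for a single process, so it only gives a rough picture.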

Memory usage on AArch64 architecture

Memory usage on x86-64 architecture

Questions marked with `Q:`

Q: Why is this needed?
        for (x = 0; x < SAMPLES; x++) {
                ttl=(ttl+out[x])%1000;
        }

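This loop goes through all SAMPLES entries of out[] and accumulates them into ttl, using % 1000 to keep the running total small, so that the program has a result to print. Because that printed result depends on every scaled sample, an optimizing compiler also cannot discard the scaling work as unused.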
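The accumulation can be isolated in a small function to see what it computes. This is a sketch only: the name checksum and the explicit length parameter n are assumptions, while the loop body mirrors the one quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold every output sample into a small running total. Each element of
 * out[] influences the return value, so none of them can be skipped. */
int checksum(const int16_t *out, size_t n) {
    int ttl = 0;
    for (size_t x = 0; x < n; x++) {
        ttl = (ttl + out[x]) % 1000;
    }
    return ttl;
}
```

Since the samples are signed, the total can be negative: in C the % operator keeps the sign of the left operand.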

Q: Why is this needed?
        printf("Result: %d\n", ttl);
        return 0;

We need this printf because otherwise nothing would print the result of vol1. It prints ttl, the total computed from the output samples, so the program produces visible output that can be compared with the other programs.

Q: What is the purpose of the cast to uint16_t in the next line?
     precalc[(uint16_t) x] = (int16_t) ((float) x * VOLUME / 100.0);

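We cast to uint16_t because it explicitly specifies the number of bits and is guaranteed to be an unsigned 16-bit integer. The samples are signed, so x can be negative, and a negative value cannot be used as an array index. The cast maps negative values into the range 32768 to 65535, so every possible 16-bit sample gets its own slot in the 65536-entry precalc table.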
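A minimal sketch of the lookup-table idea follows. The VOLUME value of 50, the loop bounds, and the function names build_table and scale_lookup are assumptions for illustration; only the assignment inside the loop is taken from the line quoted above.

```c
#include <stdint.h>

#define VOLUME 50 /* percent; assumed value for this sketch */

static int16_t precalc[65536];

/* Fill the table once with the scaled value of every possible sample. */
void build_table(void) {
    for (int x = -32768; x <= 32767; x++) {
        /* (uint16_t) x turns negative x into an index in 32768..65535 */
        precalc[(uint16_t) x] = (int16_t) ((float) x * VOLUME / 100.0);
    }
}

/* Scale a sample with a single array lookup instead of arithmetic. */
int16_t scale_lookup(int16_t sample) {
    return precalc[(uint16_t) sample];
}
```

The same cast has to be applied when reading the table, so that a negative sample finds the slot that was filled for it.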

Q: What's the point of this dummy program? how does it help with benchmarking?

The dummy program does NOT scale the volume. It can be used to determine some of the overhead of the rest of the processing done by the other programs.

Q: should we use 32767 or 32768 in next line? why?
     vol_int = (int16_t)(VOLUME/100.0 * 32767.0);

We should use 32767 because vol_int is an int16_t, and 32767 is the largest value a 16-bit signed integer can hold. With 32768, a VOLUME of 100 would produce 32768, which does not fit. The samples themselves range from the minimum value of an int16_t (a 16-bit signed integer) to its maximum value.
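The limit is easy to check in code. In this sketch the helper names volume_factor and fits_int16 are hypothetical; the expression in volume_factor follows the line quoted above, with the volume passed in as a parameter.

```c
#include <stdint.h>

/* Turn a volume percentage into a 16-bit scale factor, as in the
 * quoted line. 100 percent maps to 32767, the largest int16_t value. */
int16_t volume_factor(double volume_percent) {
    return (int16_t)(volume_percent / 100.0 * 32767.0);
}

/* Report whether a value can be stored in an int16_t without overflow. */
int fits_int16(double value) {
    return value >= INT16_MIN && value <= INT16_MAX;
}
```

Here fits_int16(32767.0) is true and fits_int16(32768.0) is false. Converting an out-of-range floating-point value to int16_t is undefined behavior in C, which is one more reason to stay within the limit.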

Q: what is the purpose of these next two lines?
        in_cursor = in;
        out_cursor = out;
        limit = in + SAMPLES;

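The purpose of these lines is to set up the pointers used by the loop: in_cursor points to the start of the in array, out_cursor points to the start of the out array, and limit points just past the last of the SAMPLES elements of in, marking where processing should stop.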
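The cursor pattern can be sketched in plain C. This is a simplified illustration, not the SIMD loop from the lab: the function name halve_samples and the divide-by-two operation are assumptions, and it handles one sample per step.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk two arrays with moving pointers until the input cursor reaches
 * limit, which points one element past the end of the input. */
void halve_samples(const int16_t *in, int16_t *out, size_t samples) {
    const int16_t *in_cursor = in;
    int16_t *out_cursor = out;
    const int16_t *limit = in + samples;

    while (in_cursor < limit) {
        *out_cursor++ = (int16_t)(*in_cursor++ / 2);
    }
}
```

Comparing in_cursor against limit replaces the usual index counter, which suits code that advances the pointers by several elements at a time.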

Q: what does it mean to "duplicate" values in the next line?
__asm__ ("dup v1.8h,%w0"::"r"(vol_int)); // duplicate vol_int into v1.8h

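To duplicate here means to copy one value into every lane of a vector register, which acts like a small array of equal-sized elements. The value to duplicate is %w0, which refers to operand 0 (vol_int) held in a 32-bit general-purpose register. The dup instruction copies its low 16 bits into each of the eight 16-bit lanes of v1 (v1.8h).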
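The effect of dup can be imitated in portable C. In this sketch the vec8h struct and the dup_8h function are hypothetical stand-ins for the v1.8h register and the instruction; on AArch64 the NEON intrinsic vdupq_n_s16 from arm_neon.h does the equivalent in hardware.

```c
#include <stdint.h>

/* Stand-in for a 128-bit vector register viewed as eight 16-bit lanes. */
typedef struct {
    int16_t lane[8];
} vec8h;

/* Copy one scalar value into all eight lanes, as dup v1.8h does. */
vec8h dup_8h(int16_t value) {
    vec8h v;
    for (int i = 0; i < 8; i++) {
        v.lane[i] = value;
    }
    return v;
}
```

With the volume factor present in every lane, a single SIMD multiply can scale eight samples at once.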
