DEV Community

Gustavo Tavares

SPO600 Project – Step 2 – SVE2 Implementation

Hey There

It's time for Step 2 of our SPO600 Project.
Before we start, let's do a quick review of what we need to do in this project:

Step 1: Research a library-level package to be a candidate for an SVE2 implementation.

Step 2: Implement SVE2 in the chosen package.

Step 3: Upstream your changes or prepare them for future implementation.


Continuing with FFmpeg

After choosing this package, I had to follow some steps to make sure it could receive SVE2 through auto-vectorization.

  • Check whether there was a previous implementation of SVE (there was, as noted in Step 1).

  • Check whether the compiler could apply auto-vectorization to this package.

  • Find the correct Makefile to change so that auto-vectorization is applied to all files in the package.

My Approach

After taking a look at the .S and .c files with NEON optimizations in them:

[Screenshot: .S and .c files]

I realized that this package could receive auto-vectorization from the compiler, so I decided to give it a try.

Then I started looking for a Makefile, but to my surprise there were quite a few:

[Screenshot: Makefiles in the source tree]

Where should I start searching? Many of these files have configuration that I cannot even understand properly.
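A minimal sketch of how the search could be narrowed down, assuming the build files are named Makefile, makefile, GNUmakefile, or end in .mak. The function name `find_makefiles` is made up for illustration and is not part of FFmpeg or its build system.

```python
import os

# File names GNU make looks for by default, plus the ".mak" suffix.
MAKEFILE_NAMES = ("Makefile", "makefile", "GNUmakefile")


def find_makefiles(root):
    """Return the sorted relative paths of Makefile-like files under `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in MAKEFILE_NAMES or name.endswith(".mak"):
                full_path = os.path.join(dirpath, name)
                found.append(os.path.relpath(full_path, root))
    return sorted(found)
```

Calling `find_makefiles(".")` from the top of a source tree lists every candidate in one place, which makes it easier to see that the top-level ./Makefile is the natural starting point.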

So I decided to start from the beginning: ./Makefile
It looked like this:

[Screenshot: ./Makefile]

It kept going and going, but there was no sign of the compiler or its optimizations.

But there was something there that caught my attention:

[Screenshot: include line]

At the very top of the file there was an include that could help me, so I went there to see if I could find the gcc options.
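Following includes by hand works, but it can also be scripted. The sketch below pulls the file names out of GNU make include directives (`include`, `-include`, and `sinclude`). The sample text is a made-up excerpt, not FFmpeg's real Makefile, and `find_includes` is a hypothetical helper name.

```python
import re

# Matches "include", "-include", and "sinclude" directives,
# ignoring an optional trailing comment.
INCLUDE_RE = re.compile(r"^\s*(?:-include|sinclude|include)\s+(.+?)\s*(?:#.*)?$")


def find_includes(makefile_text):
    """Return the file names pulled in by include directives, in order."""
    included = []
    for line in makefile_text.splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            # One directive may name several files separated by spaces.
            included.extend(match.group(1).split())
    return included


# Hypothetical Makefile excerpt for illustration only.
sample = """\
include config.mak
-include local.mak

all: program
"""
print(find_includes(sample))  # ['config.mak', 'local.mak']
```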

The config.mak file is generated by the ./configure script, and it enables the NEON optimization and others:

[Screenshot: configure script]

But when I checked config.mak for the gcc optimizations, I found that vectorization was disabled:

[Screenshot: vectorization disabled]

So I decided to change it to enable vectorization:

[Screenshot: vectorization enabled]
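The exact flags are only visible in the screenshots, so the following is a sketch under assumptions: it uses GCC's `-fno-tree-vectorize` and `-ftree-vectorize` flag names and assumes SVE2 is targeted with `-march=armv8-a+sve2`. The CFLAGS value and the helper name `enable_vectorization` are hypothetical.

```python
def enable_vectorization(cflags, march="armv8-a+sve2"):
    """Return a CFLAGS string with auto-vectorization enabled for `march`."""
    flags = []
    for flag in cflags.split():
        if flag == "-fno-tree-vectorize":
            flag = "-ftree-vectorize"  # flip the flag that disables vectorization
        elif flag.startswith("-march="):
            continue  # drop any existing -march; it is re-added below
        flags.append(flag)
    if "-ftree-vectorize" not in flags:
        flags.append("-ftree-vectorize")
    flags.append("-march=" + march)
    return " ".join(flags)


# Hypothetical CFLAGS value; a real config.mak contains many more flags.
before = "-O3 -fno-math-errno -fno-tree-vectorize"
print(enable_vectorization(before))
# -O3 -fno-math-errno -ftree-vectorize -march=armv8-a+sve2
```

Keep in mind that config.mak is generated, so a manual edit like this is lost the next time ./configure runs.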

Then I ran the make command.

Once the build finished, it was time to try it:

[Screenshot: core dumped]

To my surprise, the first run gave me a core dumped error, which this time was a very welcome error message.

It meant that the program had been built in a way that the current system could not run.
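One way to confirm this kind of mismatch, assuming a Linux AArch64 host, is to look at the Features line of /proc/cpuinfo, where the kernel lists `sve2` when the CPU supports it. The sample text below is a made-up excerpt, and the helper names are hypothetical.

```python
def cpu_features(cpuinfo_text):
    """Collect the flags listed on every 'Features' line of /proc/cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


def supports_sve2(cpuinfo_text):
    """Return True when 'sve2' appears among the reported CPU features."""
    return "sve2" in cpu_features(cpuinfo_text)


# Hypothetical excerpt from a machine without SVE2.
sample = "processor\t: 0\nFeatures\t: fp asimd evtstrm crc32 atomics\n"
print(supports_sve2(sample))  # False
```

On a real system the text would come from `open("/proc/cpuinfo").read()`. A result of False would be consistent with a binary containing SVE2 instructions crashing on that machine.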

So I tried running it under the qemu-aarch64 emulator, and to my surprise the program worked fine!

I tested it with a sample file a few times to see if it worked, and here is the result:

[Screenshot: testing]

It converted my sample.avi to output.avi at a frame rate of 24, as I requested.

It was time to check whether there were indeed SVE2 optimizations inside the binary file.

So I used objdump -d and found that there really was SVE2 in there.

Here are some examples:

[Screenshots: three whilelo examples]

We can see z (vector) and p (predicate) registers, which belong to the SVE/SVE2 register set, being used together with the whilelo instruction.
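Scrolling through a full disassembly by eye is slow, so here is a rough sketch of how the same check could be scripted. It assumes tab-separated objdump output (for example from `objdump -d --no-show-raw-insn`) and counts mnemonics whose operands mention a z or p register. The disassembly lines are invented for illustration and do not come from the FFmpeg binary.

```python
import re
from collections import Counter

# SVE vector registers are named z0-z31 and predicate registers p0-p15.
SVE_REG_RE = re.compile(r"\b[zp]\d{1,2}\b")


def sve_mnemonics(disassembly):
    """Count mnemonics whose operands use a z or p register."""
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.strip().split("\t")
        if len(fields) < 3:
            continue  # labels, blank lines, instructions without operands
        mnemonic, operands = fields[-2].strip(), fields[-1]
        if SVE_REG_RE.search(operands):
            counts[mnemonic] += 1
    return counts


# Hypothetical disassembly of a vectorized loop, for illustration only.
sample = (
    "  400a10:\twhilelo\tp0.s, wzr, w2\n"
    "  400a14:\tld1w\t{z0.s}, p0/z, [x1, x3, lsl #2]\n"
    "  400a18:\tadd\tz0.s, z0.s, #1\n"
    "  400a1c:\tst1w\t{z0.s}, p0, [x0, x3, lsl #2]\n"
    "  400a20:\tincw\tx3\n"
    "  400a24:\twhilelo\tp0.s, w3, w2\n"
    "  400a28:\tb.ne\t400a14\n"
    "  400a2c:\tret\n"
)
print(sve_mnemonics(sample)["whilelo"])  # 2
```

This is only a heuristic: an instruction such as `incw x3` uses no z or p register and is therefore not counted, even though it is part of the same loop.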


To sum up

Step 2 was an adventure. At first, I thought that going for auto-vectorization would be an easy task. I thought a Makefile would be waiting for me and I would just change the compiler arguments, but in the end I had dozens of Makefiles, each with different configurations, and it required a lot of reading and research to make it work.

I had to learn that a configure script had to be run to make the configuration appear, and that the file I was looking for was not even a Makefile; it was a .mak file.

I intend to write a little more about Makefiles, as they seem to me a powerful tool, and way more complicated and deeper than I had imagined.

Thank you for reading!
