Hey There
It's time for Step 2 of our SPO600 Project.
Before we start, let's do a quick review of what we need to do in this project:
Step 1: Research a library-level package to be a candidate for SVE2 implementation.
Step 2: Implement SVE2 in the chosen package.
Step 3: Upstream your changes or prepare them for future implementation.
Continuing with FFmpeg
After choosing this package, I had to go through a few checks to make sure it could receive SVE2 through auto-vectorization:
Check whether there was a previous implementation of SVE (there was, as mentioned in Step 1).
Check whether the compiler could apply auto-vectorization to this package.
Find the correct Makefile to change so that auto-vectorization is applied to all files in the package.
My Approach
After taking a look at the .S and .c files with NEON optimizations in them, I realized that this package could receive auto-vectorization from the compiler, so I decided to give it a try.
Then I started looking for a Makefile, but to my surprise there were quite a few.
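A search along these lines shows how many build files a source tree contains. It is only a sketch: the helper name `list_build_files` and the `SRC_DIR` variable are made up for illustration, and it assumes a standard `find`.

```shell
# Hypothetical helper: list every Makefile and *.mak file below a directory.
list_build_files() {
    find "$1" -type f \( -name 'Makefile' -o -name '*.mak' \) | sort
}

# Assumption: run from the root of the source tree (or set SRC_DIR to it).
list_build_files "${SRC_DIR:-.}"
```

Including `*.mak` in the search is deliberate, since not every file that drives the build is literally called Makefile.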
Where should I start searching? Many of these files have configuration that I cannot even understand properly.
So I decided to start from the beginning: ./Makefile
It looked like this:
And it kept going and going, with no sign of the compiler or its optimization flags.
But there was something there that caught my attention:
At the very top of the file there was an include that could help me, so I went there to see if I could find the gcc instructions.
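To spot includes like that one without scrolling through the whole file, a small grep can help. This is a sketch, and `show_includes` is a hypothetical helper name, not something that ships with the package.

```shell
# Hypothetical helper: print the include directives of a Makefile with line numbers.
show_includes() {
    # Matches both "include file" and the optional form "-include file".
    grep -nE '^-?include[[:space:]]' "$1" || true
}

# Assumption: run from the root of the source tree.
if [ -f Makefile ]; then
    show_includes Makefile
fi
```

Each line of output points at another file that takes part in the build and may be worth reading next.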
The config.mak file is generated by the ./configure script, and it enables the NEON optimizations, among others.
But when I checked the config.mak file looking for the gcc optimizations, I found that they were disabling vectorization.
So I decided to change it to enable vectorization.
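The kind of edit involved can be sketched as follows. Everything specific here is an assumption for illustration: that the disabling flag is GCC's `-fno-tree-vectorize`, that the file lives at `ffbuild/config.mak`, and that `-ftree-vectorize -march=armv8-a+sve2` is a suitable replacement. The helper name is made up, and the exact flags in a given build may differ.

```shell
# Hypothetical helper: swap the flag that disables GCC's vectorizer for flags
# that enable it and target SVE2. Flag names are assumptions, not taken from the build.
enable_sve2_vectorization() {
    # -i.bak edits the file in place and keeps the original as <file>.bak.
    sed -i.bak 's/-fno-tree-vectorize/-ftree-vectorize -march=armv8-a+sve2/g' "$1"
}

# Assumption: ./configure has already generated ffbuild/config.mak.
if [ -f ffbuild/config.mak ]; then
    grep -n -e '-fno-tree-vectorize' ffbuild/config.mak || true  # lines about to change
    enable_sve2_vectorization ffbuild/config.mak
fi
```

Because the file is generated, running ./configure again would overwrite a manual edit like this one, which is why the sketch keeps a .bak copy for comparison.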
Then I ran the make command.
Once it was built, it was time to try it.
To my surprise, the first run ended with a core dumped error, which this time was a very welcome error message.
It meant that the program had been built in a way that the current system could not run.
So I tried to run it using the qemu-aarch64 emulator and, to my surprise, the program worked fine!
I tested it with a sample file a few times to see if it worked, and here is the result:
It converted my sample.avi to output.avi at 24 frames per second, as I requested.
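The emulated run can be sketched like this. The wrapper name `run_emulated` and the `QEMU` override are hypothetical, and `-cpu max` is an assumption: it asks QEMU for the most capable CPU model it supports, which only helps if the installed QEMU version knows about SVE2.

```shell
# Hypothetical wrapper: run an AArch64 binary under QEMU user-mode emulation.
# QEMU can be overridden, e.g. to point at a specific qemu-aarch64 build.
run_emulated() {
    "${QEMU:-qemu-aarch64}" -cpu max "$@"
}

# Usage, with the file names from this post (-r sets the output frame rate):
#   run_emulated ./ffmpeg -i sample.avi -r 24 output.avi
```

Keeping the emulator behind a small wrapper makes it easy to run the same command natively later on hardware that supports the instructions.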
It was time to check whether there were indeed SVE2 optimizations inside the binary file.
So I used objdump -d and found that there really was SVE2 in there.
In the disassembly, we can see z and p registers being used together with the whilelo instruction.
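A quick way to measure how often those markers show up is to count them in the disassembly. This is a rough sketch: `count_sve_lines` is a hypothetical helper, and the pattern only looks for `whilelo` and for z register operands such as `z0.s`, so it is a heuristic rather than a full check.

```shell
# Hypothetical helper: count disassembly lines (read from stdin) that contain
# whilelo or a z register operand like z0.s.
count_sve_lines() {
    # grep -c exits non-zero when the count is 0, so "|| true" keeps that case quiet.
    grep -cE 'whilelo|[[:space:],{]z[0-9]+\.[bhsdq]' || true
}

# Usage, assuming an objdump that understands AArch64:
#   objdump -d ./ffmpeg | count_sve_lines
```

Comparing the count before and after changing the compiler flags gives a simple indication of whether the change had any effect.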
To sum up
Step 2 was an adventure. At first, I thought that going for auto-vectorization would be an easy task. I thought a Makefile would be waiting for me and I would just have to change the compiler arguments, but in the end I had dozens of Makefiles, each with a different configuration, and it took a lot of reading and research to make it work.
I had to learn that a configure script has to run to generate the configuration, and that the file I was looking for was not even a Makefile; it was a .mak file.
I intend to write a little more about Makefiles, as they seem to me a powerful tool, and far more complicated and deeper than I had imagined.
Thank you for reading!