Introduction
Hi, this is Tecca. This post covers stage 2 of the SPO600 project; for background, please check part 1. In this post, I will add SVE2 support to the libjpeg-turbo project through compiler auto-vectorization.
Stage 2: Implementing auto-vectorization
Last time, I set up our environment and ran one of the executables (djpeg) with the -fast option on a Unix system.
This time, I will start by adding the new compiler options to the entire program.
The default compile options were set to -O3 -DNDEBUG, as you can see in the image above.
What I need to do is modify them to enable SVE2. After a bit of research online and through the project directories, I found that the C compiler flags and CMAKE_ASM_FLAGS both reside in the CMakeCache.txt file, which is generated by the first run of
cmake -G"Unix Makefiles"
//Flags used by the C compiler during all build types.
CMAKE_C_FLAGS:STRING=
...
//Flags used by the C compiler during RELEASE builds.
CMAKE_C_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
...
//Flags used by the ASM compiler during all build types.
CMAKE_ASM_FLAGS:STRING=
...
//Flags used by the ASM compiler during RELEASE builds.
CMAKE_ASM_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
...
We can either modify CMAKE_C_FLAGS_RELEASE:STRING and CMAKE_ASM_FLAGS_RELEASE:STRING in the CMakeCache.txt file manually, or export CFLAGS as an environment variable before running cmake for the first time.
Modifying CMAKE_C_FLAGS would take effect on all build types, but since we are only working with the release build at this time, it is better to modify only the flags used by the C compiler during RELEASE builds. Note that CMake uses an exported CFLAGS to initialize CMAKE_C_FLAGS on the first configure, so that route applies the flags to every build type.
Example command for exporting CFLAGS as an environment variable:
export CFLAGS="-g -fopt-info-vec-all -march=armv8-a+sve2"
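For the manual route, the cache entries can also be rewritten with sed instead of a text editor. The following is only a sketch: set_cache_string is a hypothetical helper (it is not part of CMake or libjpeg-turbo), and keeping -O3 -DNDEBUG while appending the flags from the export example is my assumption about the intended final value.

```shell
# Hypothetical helper: rewrite one STRING entry of a CMakeCache.txt in place.
#   $1 = path to CMakeCache.txt
#   $2 = variable name, e.g. CMAKE_C_FLAGS_RELEASE
#   $3 = new value (must not contain '|', '&' or backslashes,
#        because it is pasted into a sed replacement)
set_cache_string() {
    cache_file=$1
    var_name=$2
    new_value=$3
    # Only the line that starts with "<name>:STRING=" is replaced, so
    # CMAKE_C_FLAGS is left alone when CMAKE_C_FLAGS_RELEASE is edited.
    sed "s|^${var_name}:STRING=.*|${var_name}:STRING=${new_value}|" \
        "$cache_file" > "$cache_file.tmp" &&
        mv "$cache_file.tmp" "$cache_file"
}

# Example (run in the build directory, then run cmake again):
#   FLAGS="-O3 -DNDEBUG -g -fopt-info-vec-all -march=armv8-a+sve2"
#   set_cache_string CMakeCache.txt CMAKE_C_FLAGS_RELEASE "$FLAGS"
#   set_cache_string CMakeCache.txt CMAKE_ASM_FLAGS_RELEASE "$FLAGS"
```

Writing to a temporary file and moving it into place avoids relying on sed -i, whose syntax differs between GNU and BSD sed.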
After changing the compiler options to the ones we need, we run cmake again.
We can see that both sets of compiler flags have changed to the ones we want for SVE2.
Now we run make again, just as we did in part 1.
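The same check can be done from the terminal by grepping the cache. This is a small sketch; show_compiler_flags is a hypothetical helper name, and it prints both the all-build-type and the RELEASE entries because the flags land in different variables depending on which route was used.

```shell
# Hypothetical helper: print the C and ASM flag entries from a CMake cache.
#   $1 = path to the cache file (defaults to ./CMakeCache.txt)
show_compiler_flags() {
    grep -E '^CMAKE_(C|ASM)_FLAGS(_RELEASE)?:STRING=' "${1:-CMakeCache.txt}"
}

# Example (run in the build directory):
#   show_compiler_flags
```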
make -j$((`nproc`+1))
The build was successful.
Now we need to see whether whilelo instructions (an SVE predicate instruction that the compiler emits for vectorized loops) show up in the build output.
The find and grep commands located 1972 whilelo instructions across the build output, which indicates that the project was built with the new flags. Now let's see if djpeg still works as it did in part 1.
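The exact find and grep command is not reproduced in the text, so the following is only one way such a count could be done, not necessarily the command used here. count_mnemonic is a hypothetical helper, and the example assumes an objdump from binutils that can disassemble AArch64 object files.

```shell
# Hypothetical helper: count the lines of disassembly read from stdin
# that contain the given mnemonic as a whole word.
#   $1 = mnemonic, e.g. whilelo
count_mnemonic() {
    grep -c -w -- "$1"
}

# Example (run in the build directory): disassemble every object file
# and count the lines that contain a whilelo instruction.
#   find . -type f -name '*.o' -exec objdump -d {} + | count_mnemonic whilelo
```

Matching whole words keeps longer mnemonics or symbol names that merely contain the string from inflating the count.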
Note: We can't run it the same way we did in part 1, since the hardware we are working on does not support SVE2 instructions. We now have to use qemu-aarch64, which emulates them for us.
testimgint.jpg was successfully decompressed, producing a new decompressed.pgm just as in part 1.
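A quick sanity check on the result is to look at the first two bytes of the output file. This sketch assumes djpeg wrote its default binary PNM output, whose magic number is P5 for grayscale (PGM) and P6 for color (PPM); is_binary_pnm is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the file starts with a binary PNM
# magic number (P5 = PGM, P6 = PPM), fail otherwise.
#   $1 = path to the decompressed image
is_binary_pnm() {
    magic=$(head -c 2 "$1")
    [ "$magic" = "P5" ] || [ "$magic" = "P6" ]
}

# Example:
#   is_binary_pnm decompressed.pgm && echo "looks like a PNM image"
```

This only confirms the header, not the pixel data; comparing the file against the output saved from part 1 with cmp would be a stricter check.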
Conclusion
In this post, we've successfully added SVE2 (auto-vectorization) support to the project and run it with qemu-aarch64 without breaking anything.