gus

Adding SVE2 Support to an Open Source Library - Part I


SVE (the Scalable Vector Extension) is a SIMD instruction set developed by Arm as an extension to AArch64 that allows for variable vector length implementations. SVE2 is a superset of SVE that also covers the kind of functionality offered by Neon, Arm's earlier fixed-width SIMD extension. Among the many benefits of SVE and SVE2 is that the same binary can run on different AArch64 hardware with differing vector lengths. SVE2 is especially suited to processing large datasets, and for this reason I'll be implementing support for it in an open source library to improve performance.
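To make the variable vector length idea concrete, here is a minimal sketch of a vector-length-agnostic loop written with the SVE intrinsics from arm_sve.h. The function name add_arrays is my own and purely illustrative. The SVE path is only compiled when the compiler defines __ARM_FEATURE_SVE; otherwise a plain scalar loop is used, so the file builds on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical example: dst[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
#if defined(__ARM_FEATURE_SVE)
    /* svcntw() is the number of 32-bit lanes per vector on the CPU the
       code is running on, so the stride is not fixed at compile time. */
    for (size_t i = 0; i < n; i += svcntw()) {
        /* The predicate is true only for lanes whose index is below n,
           so the final partial vector needs no separate clean-up loop. */
        svbool_t pg = svwhilelt_b32((uint64_t)i, (uint64_t)n);
        svfloat32_t va = svld1_f32(pg, &a[i]);
        svfloat32_t vb = svld1_f32(pg, &b[i]);
        svst1_f32(pg, &dst[i], svadd_f32_x(pg, va, vb));
    }
#else
    /* Scalar fallback for targets without SVE. */
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
#endif
}
```

Nothing in the SVE branch names a vector width, which is what lets one binary run on hardware with 128-bit vectors as well as on hardware with wider ones.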

My first task is to find an open source library to implement SVE2 support for, ideally one that's used for processing large amounts of data like a crypto or multimedia library. As I'm interested in audio and audio programming, I'll start looking there and hopefully find a good candidate. Criteria for my search are as follows:

  • Open source
  • Library-level package; application-level SVE2 optimization is less useful
  • Ideally already has a Neon implementation, to glean ideas for how I'll approach the SVE2 implementation

I started by thinking of the open source audio applications I know, and the first that came to mind was Audacity. As my professor recommended, I used dnf list to look up the package on the AArch64 server and confirmed that one was available.

I then used dnf deplist to see what dependencies it had to try and narrow down which would be a good target for optimization. There were several libraries which could be good candidates:

  • Advanced Linux Sound Architecture Library (ALSA)
  • Free Lossless Audio Codec (FLAC)
  • Libogg

From there I checked the FLAC library to get access to the source code and find out more about how an SVE2 optimization could work out. The git URL on their website was down so I left it for now to check out the other libraries and circle back to it if they don't pan out.

I found the page with the relevant info to clone the ALSA library and did so.

git clone git://git.alsa-project.org/alsa-lib.git alsa-lib

Unfortunately, after many searches for anything related to SVE, Neon, or AArch64-specific implementations, I wasn't able to find anything. Again, I'm going to keep going and circle back to this if I hit a wall.

Last on my list is Libogg. I found out it's located here and is maintained by the same organization that maintains FLAC. Thankfully this git link wasn't broken. Unfortunately I once again came up empty when looking for references to Neon or SIMD, so I expanded my search to the other projects from Xiph, the maintainer of the aforementioned FLAC and Ogg libraries. In doing so I found a great candidate: a library called Opus, with specific references to AArch64 and Neon.

Opus

In opus/cmake/OpusFunctions.cmake I was able to find a check to establish whether the CPU and the compiler support Neon.

This indicates that this package takes advantage of SIMD, Neon being one implementation.
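The CMake check itself isn't reproduced here, but the same two questions (what the compiler may target, and what the CPU supports) can also be asked from C. The sketch below is my own illustration, not Opus code: compiled_simd_level, cpu_has_sve2, and describe_simd are hypothetical names. It relies on the ACLE feature macros (__ARM_NEON, __ARM_FEATURE_SVE, __ARM_FEATURE_SVE2) and, on AArch64 Linux only, on getauxval to read the kernel's hardware capability bits. Everywhere else the runtime check simply reports 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#ifndef HWCAP2_SVE2
#define HWCAP2_SVE2 (1UL << 1)  /* SVE2 bit in the Linux arm64 AT_HWCAP2 word */
#endif
#endif

/* Compile time: the widest Arm SIMD extension the compiler was told to target. */
const char *compiled_simd_level(void)
{
#if defined(__ARM_FEATURE_SVE2)
    return "sve2";
#elif defined(__ARM_FEATURE_SVE)
    return "sve";
#elif defined(__ARM_NEON)
    return "neon";
#else
    return "scalar";
#endif
}

/* Run time: 1 if the kernel reports SVE2 on this CPU, otherwise 0. */
int cpu_has_sve2(void)
{
#if defined(__aarch64__) && defined(__linux__)
    return (getauxval(AT_HWCAP2) & HWCAP2_SVE2) != 0;
#else
    return 0;
#endif
}

/* Writes a one-line summary into buf and returns snprintf's result. */
int describe_simd(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "compiled for: %s, cpu has sve2: %d",
                    compiled_simd_level(), cpu_has_sve2());
}
```

Keeping the two checks separate matters because a library may be built with SVE2 code paths enabled and still have to fall back at run time on a CPU that lacks them.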

After configuring the library I was able to find a Makefile and see what compilation options it was using. In this case it had the following:

CFLAGS = -g -O2 -fvisibility=hidden -D_FORTIFY_SOURCE=2 -W -Wall -Wextra -Wcast-align -Wnested-externs -Wshadow -Wstrict-prototypes

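Moving this up a level to -O3 would get autovectorization to kick in more aggressively, although the compiler will only emit SVE2 instructions if it is also told to target them (for example with -march=armv8-a+sve2). Furthermore, I found that this package takes advantage of intrinsics, for example in the opus/celt/arm/pitch_neon_intr.c source file: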

for (i = 0; i < N - 7; i += 8) {
    x_s16x8  = vld1q_s16(&x[i]);
    y_s16x8  = vld1q_s16(&y[i]);
    xy_s32x4 = vmlal_s16(xy_s32x4, vget_low_s16 (x_s16x8), vget_low_s16 (y_s16x8));
    xy_s32x4 = vmlal_s16(xy_s32x4, vget_high_s16(x_s16x8), vget_high_s16(y_s16x8));
}

This would be a good place to start: by creating an SVE2 equivalent of pitch_neon_intr.c and/or celt_neon_intr.c that uses the SVE2 versions of the intrinsics therein, I can get the ball rolling on optimizing this package for SVE2. I sent an email to the Opus developer mailing list expressing my intention to do so, and now all that's left is to do it! More on that soon.
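As a first rough idea of what that could look like, here is a sketch of the Neon multiply-accumulate loop above rewritten with SVE2 intrinsics. It is not Opus code: inner_prod_s16 and its signature are hypothetical, and I'm assuming the Neon excerpt accumulates a 16-bit inner product into 32-bit lanes. The SVE2 path is only compiled when __ARM_FEATURE_SVE2 is defined (for example with -march=armv8-a+sve2 on GCC or Clang); otherwise a scalar loop is used. Like the Neon version, it assumes the sum fits in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE2)
#include <arm_sve.h>
#endif

/* Hypothetical example: returns the sum of x[i] * y[i] for i in [0, n). */
int32_t inner_prod_s16(const int16_t *x, const int16_t *y, size_t n)
{
    assert(x != NULL && y != NULL);
#if defined(__ARM_FEATURE_SVE2)
    svint32_t acc = svdup_n_s32(0);
    /* svcnth() is the number of 16-bit lanes per vector on this CPU. */
    for (size_t i = 0; i < n; i += svcnth()) {
        svbool_t pg = svwhilelt_b16((uint64_t)i, (uint64_t)n);
        /* Inactive lanes load as zero, so the tail adds nothing. */
        svint16_t vx = svld1_s16(pg, &x[i]);
        svint16_t vy = svld1_s16(pg, &y[i]);
        /* Widening multiply-accumulate, the counterpart of vmlal_s16:
           "b" takes even-numbered 16-bit lanes, "t" the odd-numbered ones. */
        acc = svmlalb_s32(acc, vx, vy);
        acc = svmlalt_s32(acc, vx, vy);
    }
    /* Horizontal add of all 32-bit lanes. */
    return (int32_t)svaddv_s32(svptrue_b32(), acc);
#else
    /* Scalar fallback for targets without SVE2. */
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += (int32_t)x[i] * (int32_t)y[i];
    }
    return sum;
#endif
}
```

Unlike the Neon loop, which steps by a fixed 8 elements and needs separate handling for leftovers, the predicate from svwhilelt_b16 covers the tail, so the same loop handles any n.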
