
Qzhang125


Week 14 Project Stage 3: SVE2

Hello my friend, welcome to the last project stage of SPO600. In this stage, we will discuss how to extend software that uses the Neon SIMD (single instruction, multiple data) extension so that it also supports the Scalable Vector Extension version 2 (SVE2), using the open-source software that I chose in project stage 2. Before I start stage 3, let's do a short review of stage 2. In stage 2, I chose FFmpeg, a cross-platform open-source tool to record, stream, and convert video and audio. It uses a lot of SIMD code to accelerate data processing, but the Neon extension only has a fixed 128-bit vector length. To remove this limit, Arm designed SVE.

What is SVE

We all know that a 128-bit vector instruction can only operate on 128 bits of data at a time. To improve on that, SVE does not fix the vector length: each hardware implementation chooses a vector length from 128 bits to 2048 bits, in multiples of 128 bits. Beyond this, the SVE design lets developers write and build software once and then run it on different AArch64 hardware regardless of that hardware's vector length. SVE also includes:

  1. Per-lane predication
  2. Gather-load and scatter-store
  3. Speculative vectorization

These features help to vectorize and optimize loops for large datasets.
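To make per-lane predication and the "write once, run on any vector length" idea more concrete, here is a minimal sketch in plain C. It is only a model, not real SVE code: the function name `vla_add_model` and the `lanes` parameter are made up for illustration, and `lanes` stands in for the hardware vector length that real SVE code would query at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Upper bound on lanes: a 2048-bit vector holds 128 elements of 16 bits. */
#define MAX_LANES 128

/*
 * Portable model of a vector-length-agnostic loop with per-lane predication.
 * `lanes` is a hypothetical stand-in for the hardware vector length.
 * Computes dst[i] = a[i] + b[i] for i in [0, n).
 */
void vla_add_model(int16_t *dst, const int16_t *a, const int16_t *b,
                   size_t n, size_t lanes)
{
    assert(dst && a && b);
    assert(lanes >= 8 && lanes <= MAX_LANES);

    for (size_t base = 0; base < n; base += lanes) {
        /* Build the predicate: lane k is active only while base + k < n. */
        unsigned char pred[MAX_LANES];
        for (size_t k = 0; k < lanes; k++)
            pred[k] = (unsigned char)(base + k < n);

        /* One modeled "vector instruction": inactive lanes touch no memory,
         * so no separate scalar tail loop is needed. */
        for (size_t k = 0; k < lanes; k++)
            if (pred[k])
                dst[base + k] = (int16_t)(a[base + k] + b[base + k]);
    }
}
```

Calling the function with `lanes` set to 8 (a 128-bit vector of 16-bit elements) or 128 (a 2048-bit vector) gives the same result for any `n`. Only the number of loop iterations changes, which is the point of the vector-length-agnostic style.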

SVE2

The main difference between SVE and SVE2 is the functional coverage of the instruction set. SVE was designed to make the architecture better suited to High-Performance Computing (HPC) and Machine Learning (ML). SVE2 extends the instruction set to a wider range of data-processing domains and accelerates common algorithms used in the areas below:

  1. Computer vision
  2. Multimedia
  3. Long-Term Evolution (LTE) baseband processing
  4. Genomics
  5. In-memory database
  6. Web serving
  7. General-purpose software

SVE2 usage

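    /* Neon vector types: the x8 types hold eight 16-bit lanes (128 bits),
       the x4 types hold four 16-bit lanes (64 bits). The width is fixed. */
    int16x8_t q0s16, q2s16, q3s16, q8s16, q10s16, q11s16, q13s16;
    int16x8_t q14s16, q15s16, qzs16;
    int16x4_t d0s16, d2s16, d3s16, dzs16;
    uint16x8_t q1u16, q9u16;
    uint16x4_t d1u16;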


Now let’s talk about how to extend the open-source software to support SVE2. In project stage 2 we discussed the mpegvideo.c file, which works with MPEG to compress and decompress moving pictures and uses Neon to speed this up. SVE2 could improve this procedure. The key property is that SVE2 code is Vector Length Agnostic (VLA): the code does not hard-code a vector width, so the same loop processes more picture elements per iteration on hardware with longer vectors, without being rewritten or rebuilt. This also helps the compiler with vectorization. For FFmpeg to support and take advantage of SVE2, the FFmpeg developers could consider adding inline assembly to the package that uses the 32 scalable vector registers SVE2 provides, written in the SVE2 assembly syntax to invoke the new instructions.
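As a rough sketch of what such a code path could look like, the example below uses the C intrinsics from `arm_sve.h` instead of inline assembly, which is my own substitution to keep the example short. The function `add_residual_s16` is a made-up helper for illustration and is not taken from FFmpeg. The instructions used here belong to base SVE, which SVE2 includes. When the compiler is not targeting SVE (the `__ARM_FEATURE_SVE` macro is undefined), the code falls back to a plain scalar loop so it still builds on other machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/*
 * Hypothetical helper (not an FFmpeg function): adds a residual to a block
 * of 16-bit coefficients in place, block[i] += residual[i] for i in [0, n).
 */
void add_residual_s16(int16_t *block, const int16_t *residual, size_t n)
{
    assert(block && residual);
#if defined(__ARM_FEATURE_SVE)
    /* svcnth() is the number of 16-bit lanes in one vector on this CPU,
     * so the loop never hard-codes a vector width. */
    for (uint64_t i = 0; i < n; i += svcnth()) {
        /* Predicate with lanes active only while i + lane < n. */
        svbool_t pg = svwhilelt_b16_u64(i, (uint64_t)n);
        svint16_t x = svld1_s16(pg, block + i);
        svint16_t r = svld1_s16(pg, residual + i);
        svst1_s16(pg, block + i, svadd_s16_x(pg, x, r));
    }
#else
    /* Scalar fallback for builds without SVE support. */
    for (size_t i = 0; i < n; i++)
        block[i] = (int16_t)(block[i] + residual[i]);
#endif
}
```

Compared with the Neon declarations above, there is no `x8` or `x4` in the type names: `svint16_t` has whatever length the hardware provides, and the predicate from `svwhilelt_b16_u64` handles the last partial vector.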

Conclusion

In this project, we discussed SIMD and the Arm vector extensions Neon, SVE, and SVE2. To extend FFmpeg to support SVE2, developers would start from the existing Neon code, add inline assembly that uses SVE2 instructions, and replace the fixed 128-bit vector length with vector-length-agnostic code. Since this is the last blog for this course, I would say this is one of the hardest courses that I have ever taken so far. I learned a lot about how programs work beneath high-level programming languages and how to benchmark an application. Lastly, I got a picture of how the compiler optimizes programs using SIMD and many other techniques. It was a fun and valuable experience for me, and it helped me build a picture of the relationship between the machine, the compiler, and my program.
