DEV Community

Seung Woo (Paul) Ji


Exploring Scalable Vector Extension 2

Introduction

Scalable Vector Extension (SVE) is a SIMD extension to the Armv8-A architecture. It provides a new set of vector instructions that enable loop vectorization for High Performance Computing (HPC).

Why SVE?

A key feature of SVE is that, unlike the Neon architecture extension, it does not fix the vector length at 128 bits. This enables vector-length agnostic (VLA) programming: the vector length is set by the hardware implementation, which can choose the length best suited to its target workloads, and the program adapts to it at run time. As a result, developers can write and build a program once and run it on hardware with different SVE vector lengths (better portability!).
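To make the VLA idea concrete, here is a minimal sketch in plain C that emulates a vector-length agnostic loop. It is not real SVE code: the function name `vla_add_f32` and the `vl_bits` parameter are hypothetical, and `vl_bits` only stands in for the hardware vector length. Real SVE code would typically use the ACLE intrinsics from `<arm_sve.h>` (for example `svcntw()` to query the number of 32-bit lanes and `svwhilelt_b32()` to build the loop predicate) instead of taking the length as an argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: emulates a vector-length agnostic add in scalar C.
 * vl_bits stands in for the hardware vector length, which real SVE code
 * would query at run time rather than receive as an argument. */
void vla_add_f32(const float *a, const float *b, float *out,
                 size_t n, size_t vl_bits)
{
    /* Assumed rule: vector lengths are multiples of 128 bits, up to 2048. */
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);

    size_t lanes = vl_bits / 32;  /* 32-bit floats per emulated vector */

    for (size_t i = 0; i < n; i += lanes) {
        for (size_t lane = 0; lane < lanes; ++lane) {
            /* Emulated predicate: a lane is active only while i + lane < n,
             * so the final partial vector needs no separate scalar tail loop. */
            int active = (i + lane) < n;
            if (active) {
                out[i + lane] = a[i + lane] + b[i + lane];
            }
        }
    }
}
```

Calling `vla_add_f32` on the same arrays with `vl_bits` set to 128 and then to 512 should produce identical output. That is the point of VLA: the source does not change when the vector length does.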

SVE2

SVE2 is a superset of SVE and Neon. Its instructions extend the data-processing domains beyond HPC to include:

  • Computer vision
  • Multimedia
  • Long-Term Evolution (LTE) baseband processing
  • Genomics
  • In-memory database
  • Web serving
  • General-purpose software

SVE2 Registers

Like SVE, SVE2 is built on two sets of scalable registers:

  1. Scalable vector registers


There are 32 scalable vector registers, z0-z31. Each implementation chooses a register size that is a multiple of 128 bits, up to a maximum of 2048 bits. A register can hold 64-, 32-, 16-, or 8-bit elements. The low 128 bits of each z register are shared with the corresponding Neon register (v0-v31).
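The sizing rule above is simple arithmetic, sketched here in plain C. The helper names `sve_vl_is_valid` and `sve_lanes` are hypothetical and exist only for illustration. They encode the rule as stated in this post (a multiple of 128 bits, at most 2048) and do not query real hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if vl_bits satisfies the rule described in the post:
 * a multiple of 128 bits, no larger than 2048 bits. */
int sve_vl_is_valid(size_t vl_bits)
{
    return vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0;
}

/* Number of elements of width elem_bits (8, 16, 32, or 64)
 * that fit in one z register of vl_bits. */
size_t sve_lanes(size_t vl_bits, size_t elem_bits)
{
    assert(sve_vl_is_valid(vl_bits));
    assert(elem_bits == 8 || elem_bits == 16 ||
           elem_bits == 32 || elem_bits == 64);
    return vl_bits / elem_bits;
}
```

For example, a 128-bit register holds four 32-bit elements, the same as a Neon register, while a 512-bit register holds sixteen.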

  2. Scalable predicate registers

There are 16 scalable predicate registers, p0-p15, which SVE introduced and Neon does not have. Each predicate register holds one bit for every byte of a z register, so it is 1/8 the length of a z register. Registers p0-p7 serve as governing predicates for load, store, and arithmetic instructions; p8-p15 are additional predicates used for loop management.
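The one-bit-per-byte layout can be sketched in plain C as follows. Both helper names are hypothetical. The second function goes slightly beyond what this post states: it assumes, as I understand the architecture, that when an element is wider than one byte, the lowest of its predicate bits is the one that marks the lane active or inactive.

```c
#include <assert.h>
#include <stddef.h>

/* Bits in one predicate register: one per byte of the z register. */
size_t sve_pred_bits(size_t vl_bits)
{
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    return vl_bits / 8;
}

/* Index of the predicate bit assumed to govern a given lane.
 * Each element owns elem_bits / 8 consecutive predicate bits;
 * this returns the lowest-numbered one. */
size_t sve_pred_bit_for_lane(size_t lane, size_t elem_bits)
{
    assert(elem_bits == 8 || elem_bits == 16 ||
           elem_bits == 32 || elem_bits == 64);
    return lane * (elem_bits / 8);
}
```

With a 256-bit vector length, a predicate register is 32 bits wide. For 32-bit elements there are eight lanes, governed under this assumption by predicate bits 0, 4, 8, and so on up to 28.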

Conclusion

SVE lets developers vectorize programs more efficiently because they do not have to target a specific vector size. It also improves portability, since the same program adapts to whatever vector size the hardware provides. In the next post, we will discuss how to apply SVE2 to the volume algorithm we explored previously.

Resources

  1. What is the Scalable Vector Extension?
  2. Introducing SVE2
  3. Introduction to Arm SVE
