
Qzhang125


Week 11 Reflection SIMD

Hello my friend, welcome back to my week 11 blog for SPO600 (Software Portability and Optimization). In this post, I'm going to talk about SIMD.

Introduction

SIMD (Single Instruction, Multiple Data) is also called vectorization. The idea is that a single instruction performs one operation, but applies it to multiple pieces of data in parallel. Many modern processors have this ability. The name vectorization comes from working with vectors: arrays of values, typically one-dimensional, where the same operation is applied to different elements of the array simultaneously. For example,
[Diagram: a SIMD unit applying one operation to vector A and vector B to produce vector C]
In the diagram above, the SIMD unit applies the same operation to multiple values from vector A and vector B and stores the results in vector C in parallel, which means it executes four operations simultaneously.
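
To make the idea concrete, here is a minimal C sketch, assuming a compiler that supports the GCC/Clang vector extensions. The type name v4si and the function names are only illustrative choices, not something from the lab. The scalar function adds one pair of values per loop iteration, while the vector function adds all four lanes of A and B with a single + operation.

```c
#include <stdint.h>

/* Four 32-bit integers packed into one 16-byte vector.
   vector_size is a GCC/Clang extension, not standard C. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Scalar version: one addition per loop iteration. */
void add_scalar(const int32_t *a, const int32_t *b, int32_t *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Vector version: a single + adds all four lanes.
   Whether this becomes one SIMD instruction depends on the target. */
v4si add_vector(v4si a, v4si b) {
    return a + b;
}
```

With A = {1, 2, 3, 4} and B = {10, 20, 30, 40}, add_vector returns {11, 22, 33, 44}. The individual lanes can be read back with normal array indexing, for example c[0].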

MMX instruction set

MMX defines eight processor registers, named MM0 to MM7, and the operations that work on them. Each register is 64 bits wide and can hold either one 64-bit integer or several smaller integers, so one instruction can be applied to two 32-bit integers, four 16-bit integers, or eight 8-bit integers at once.
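
The lane layout can be sketched in C without writing any MMX assembly, again assuming the GCC/Clang vector extensions. The type names u8x8 and u16x4 are made up for this example, and the compiler is free to use something other than the MM registers to implement them. The point is only that the same 64 bits can be treated as eight 8-bit lanes or four 16-bit lanes, and that each lane is added on its own.

```c
#include <stdint.h>

/* The same 64 bits viewed as eight 8-bit lanes or four 16-bit lanes.
   vector_size is a GCC/Clang extension; the type names are illustrative. */
typedef uint8_t  u8x8  __attribute__((vector_size(8)));
typedef uint16_t u16x4 __attribute__((vector_size(8)));

/* Adds eight pairs of bytes. Each lane wraps around at 256 by itself
   and never carries into the lane next to it. */
u8x8 add_bytes(u8x8 a, u8x8 b) {
    return a + b;
}

/* Adds four pairs of 16-bit values. */
u16x4 add_words(u16x4 a, u16x4 b) {
    return a + b;
}
```

For example, adding the byte lanes 250 and 10 gives 4 (260 wraps around at 256), and the neighbouring lane is not affected. The same two values in a 16-bit lane give 260.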

How to check SIMD instruction sets

Before using a SIMD instruction set, we need support from both the processor and the compiler. I used my personal laptop as an example.
Check the SIMD instruction sets that are supported by the CPU:

cat /proc/cpuinfo

Then find the flags line, which lists the instruction sets that the CPU supports.

flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse 
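
If a program needs to look for a flag in that line, a plain substring search is not enough, because "sse" is also part of "sse2" and "ssse3". Below is a small C sketch of a whole-word check. The function name has_flag is hypothetical, and it assumes the caller has already read the flags line from /proc/cpuinfo into a string.

```c
#include <string.h>

/* Returns 1 if `name` appears as a whole word in `flags`, otherwise 0.
   `flags` is assumed to be one line of space-separated names, like the
   flags line in /proc/cpuinfo. */
int has_flag(const char *flags, const char *name) {
    size_t len = strlen(name);
    if (len == 0)
        return 0;

    const char *p = flags;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the line start or after whitespace... */
        int starts_word = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* ...and must be followed by whitespace or the end of the line. */
        char end = p[len];
        int ends_word = (end == '\0' || end == ' ' || end == '\t' || end == '\n');
        if (starts_word && ends_word)
            return 1;
        p++; /* keep searching after this partial match */
    }
    return 0;
}
```

With the flags line shown above, has_flag(line, "mmx") and has_flag(line, "sse") both return 1, while has_flag(line, "avx") returns 0.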

Check the SIMD instruction sets that are supported by GCC:

gcc -march=native -c -Q --help=target

The output shows which instruction sets are enabled or disabled.
For example:

The following options are target specific:
  -m128bit-long-double                  [enabled]
  -m16                                  [disabled]
  -m32                                  [disabled]
  -m3dnow                               [disabled]
  -m3dnowa                              [disabled]
  -m64                                  [enabled]
  -m80387                               [enabled]
  -m8bit-idiv                           [disabled]
  -m96bit-long-double                   [disabled]
  -mabi=                                sysv
  -mabm                                 [disabled]
  -maccumulate-outgoing-args            [disabled]
  -maddress-mode=                       long
  -madx                                 [disabled]
  -maes                                 [disabled]
  -malign-data=                         compat
  -malign-double                        [disabled]
  -malign-functions=                    0
  -malign-jumps=                        0
  -malign-loops=                        0
  -malign-stringops                     [enabled]
  -mamx-bf16                            [disabled]
  -mamx-int8                            [disabled]
  -mamx-tile                            [disabled]
  -mandroid                             [disabled]
  -march=                               nehalem
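
GCC also tells the program itself which of these instruction sets are enabled, through predefined macros such as __MMX__ and __SSE2__. Here is a minimal sketch that reports them. The function name report_simd_macros is made up for this example, and the list of macros is only a small selection, not everything GCC defines.

```c
#include <stdio.h>

/* Prints one line for each SIMD macro the compiler has defined for the
   current target options, and returns how many were found.
   Only a few common macros are checked here. */
int report_simd_macros(void) {
    int count = 0;
#ifdef __MMX__
    puts("MMX enabled");
    count++;
#endif
#ifdef __SSE__
    puts("SSE enabled");
    count++;
#endif
#ifdef __SSE2__
    puts("SSE2 enabled");
    count++;
#endif
#ifdef __AVX__
    puts("AVX enabled");
    count++;
#endif
#ifdef __AVX2__
    puts("AVX2 enabled");
    count++;
#endif
#ifdef __ARM_NEON
    puts("NEON enabled"); /* SIMD on ARM, included for comparison */
    count++;
#endif
    return count;
}
```

The result depends on the compiler options, so building the same file with and without -march=native may print a different list.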

Conclusion

SIMD is a very significant ability for a modern processor to have. It helps process multiple pieces of data simultaneously, which can save a lot of time. In Lab 6 (aka project 1) we explored it further and tested how much time SIMD saved for the program. It is fun to learn and actually test how it works.
