Hello!
We are getting close to the end here, and we are finally starting our final project (we will be working in the open!).
Step 1
For the first step of the project, we were supposed to choose some packages that would benefit from SVE2 instructions.
The ideal package is one that processes massive amounts of data. This way, SVE2 can be used to its full capability to improve performance.
After some research, I found two candidates that could benefit from SVE2:
Gstreamer1 and FFmpeg.
Gstreamer1
According to them:
GStreamer is a streaming media framework, based on graphs of filters which operate on media data.
Applications using this library can do anything from real-time sound processing to playing videos, and just about anything else media-related.
Its plugin-based architecture means that new data types or processing capabilities can be added simply by installing new plugins.
Gstreamer1 uses inline assembly code (NEON, in this case) for many functions, as we can see here:
static inline void
inner_product_gint16_full_1_neon (gint16 * o, const gint16 * a,
    const gint16 * b, gint len, const gint16 * icoeff, gint bstride)
{
  uint32_t remainder = len % 16;
  len = len - remainder;

  asm volatile (" vmov.s32 q0, #0\n"
                " cmp %[len], #0\n"
                " beq 2f\n"
                " vmov.s32 q1, #0\n"
                "1:"
                " vld1.16 {d16, d17, d18, d19}, [%[b]]!\n"
                " vld1.16 {d20, d21, d22, d23}, [%[a]]!\n"
                " subs %[len], %[len], #16\n"
                " vmlal.s16 q0, d16, d20\n"
                " vmlal.s16 q1, d17, d21\n"
                " vmlal.s16 q0, d18, d22\n"
                " vmlal.s16 q1, d19, d23\n"
                " bne 1b\n"
                " vadd.s32 q0, q0, q1\n"
                "2:"
                " cmp %[remainder], #0\n"
                " beq 4f\n"
                "3:"
                " vld1.16 {d16}, [%[b]]!\n"
                " vld1.16 {d20}, [%[a]]!\n"
                " subs %[remainder], %[remainder], #4\n"
                " vmlal.s16 q0, d16, d20\n"
                " bgt 3b\n"
                "4:"
                " vadd.s32 d0, d0, d1\n"
                " vpadd.s32 d0, d0, d0\n"
                " vqrshrn.s32 d0, q0, #15\n"
                " vst1.16 d0[0], [%[o]]\n"
                : [a] "+r" (a), [b] "+r" (b),
                  [len] "+r" (len), [remainder] "+r" (remainder)
                : [o] "r" (o)
                : "cc", "q0", "q1",
                  "d16", "d17", "d18", "d19",
                  "d20", "d21", "d22", "d23");
}
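To make the assembly above easier to follow, here is a rough sketch of the same computation in portable C: a dot product of two 16-bit arrays, followed by the rounding shift by 15 and the saturation that `vqrshrn.s32 ..., #15` performs at the end. The function name `inner_product_int16_scalar` and the use of `<stdint.h>` types instead of GLib's `gint16` are my own assumptions, not code from the package. It is also not bit-exact with the NEON version in every case: it uses a 64-bit accumulator and handles exactly `len` elements, while the NEON loop accumulates in 32-bit lanes and consumes the tail in groups of four.

```c
#include <stdint.h>

/* Hypothetical portable counterpart of the NEON routine (illustration only).
 * Multiply-accumulate two int16 arrays, then round, shift right by 15
 * and saturate the result to the int16 range. */
static inline void
inner_product_int16_scalar (int16_t * o, const int16_t * a,
    const int16_t * b, int len)
{
  int64_t sum = 0;              /* assumption: wide accumulator to avoid overflow */
  int i;

  for (i = 0; i < len; i++)
    sum += (int32_t) a[i] * b[i];

  /* Add half of 2^15 to round, then drop the 15 fraction bits.
   * Relies on arithmetic right shift for negative values, which
   * GCC and Clang provide. */
  sum = (sum + (1 << 14)) >> 15;

  /* Saturate to the int16 range */
  if (sum > INT16_MAX)
    sum = INT16_MAX;
  if (sum < INT16_MIN)
    sum = INT16_MIN;

  *o = (int16_t) sum;
}
```

A plain loop like this is the kind of code a compiler may be able to vectorize on its own, whereas hand-written NEON assembly stays NEON no matter which compiler flags are used.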
FFmpeg
According to them:
FFmpeg is a complete and free Internet live audio and video broadcasting solution for Linux/Unix. It also includes a digital VCR. It can encode in real time in many formats including MPEG1 audio and video, MPEG4, h263, ac3, asf, avi, real, mjpeg, and flash.
FFmpeg, the second option, has many files that use NEON, and it is compiled with gcc.
Planning my approach
Auto-vectorization:
My plan for implementing SVE2 in this project is to use auto-vectorization. This means I will change the Makefile and include options that make the compiler apply the optimizations for me.
Here is an example of a Makefile from gstreamer1:
As we can see, they are using -O0, which means no optimizations at all.
So I will include the options -O3 -march=armv8-a+sve2 and then test the build to see whether there are any improvements.
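To give an idea of what the compiler can do with these options, here is a small sketch of a loop that is a typical candidate for auto-vectorization. The function `scale_samples` and the file name `scale.c` are hypothetical and not taken from either package. The build line in the comment assumes GCC 10 or newer on (or targeting) AArch64; `-fopt-info-vec` is a GCC option that reports which loops were vectorized.

```c
#include <stddef.h>

/* Hypothetical example, not code from gstreamer1 or FFmpeg.
 * Assumed build line:
 *   gcc -O3 -march=armv8-a+sve2 -fopt-info-vec -c scale.c
 */
void
scale_samples (float * restrict dst, const float * restrict src,
    float gain, size_t n)
{
  size_t i;

  /* Every iteration is independent, and `restrict` promises that dst and
   * src do not overlap, so the compiler is free to process several
   * elements per instruction. */
  for (i = 0; i < n; i++)
    dst[i] = src[i] * gain;
}
```

Whether the compiler actually emits SVE2 instructions for a given loop depends on the compiler version and on the loop itself, so the report from `-fopt-info-vec` (or a look at the disassembly) is worth checking instead of assuming it worked.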
About Makefiles
Makefiles can be complicated. I thought this would be an easy approach at first, but there are so many Makefiles in a project, and they are linked to each other.
Take a look at this example from FFmpeg:
MAIN_MAKEFILE=1
include ffbuild/config.mak
vpath %.c $(SRC_PATH)
vpath %.cpp $(SRC_PATH)
vpath %.h $(SRC_PATH)
vpath %.inc $(SRC_PATH)
vpath %.m $(SRC_PATH)
vpath %.S $(SRC_PATH)
vpath %.asm $(SRC_PATH)
vpath %.rc $(SRC_PATH)
vpath %.v $(SRC_PATH)
vpath %.texi $(SRC_PATH)
vpath %.cu $(SRC_PATH)
vpath %.ptx $(SRC_PATH)
vpath %.metal $(SRC_PATH)
vpath %/fate_config.sh.template $(SRC_PATH)
TESTTOOLS = audiogen videogen rotozoom tiny_psnr tiny_ssim base64 audiomatch
HOSTPROGS := $(TESTTOOLS:%=tests/%) doc/print_options
# $(FFLIBS-yes) needs to be in linking order
FFLIBS-$(CONFIG_AVDEVICE) += avdevice
FFLIBS-$(CONFIG_AVFILTER) += avfilter
FFLIBS-$(CONFIG_AVFORMAT) += avformat
FFLIBS-$(CONFIG_AVCODEC) += avcodec
FFLIBS-$(CONFIG_POSTPROC) += postproc
FFLIBS-$(CONFIG_SWRESAMPLE) += swresample
FFLIBS-$(CONFIG_SWSCALE) += swscale
FFLIBS := avutil
DATA_FILES := $(wildcard $(SRC_PATH)/presets/*.ffpreset) $(SRC_PATH)/doc/ffprobe.xsd
SKIPHEADERS = compat/w32pthreads.h
# first so "all" becomes default target
all: all-yes
include $(SRC_PATH)/tools/Makefile
include $(SRC_PATH)/ffbuild/common.mak
FF_EXTRALIBS := $(FFEXTRALIBS)
FF_DEP_LIBS := $(DEP_LIBS)
FF_STATIC_DEP_LIBS := $(STATIC_DEP_LIBS)
$(TOOLS): %$(EXESUF): %.o
	$(LD) $(LDFLAGS) $(LDEXEFLAGS) $(LD_O) $^ $(EXTRALIBS-$(*F)) $(EXTRALIBS) $(ELIBS)
target_dec_%_fuzzer$(EXESUF): target_dec_%_fuzzer.o $(FF_DEP_LIBS)
	$(LD) $(LDFLAGS) $(LDEXEFLAGS) $(LD_O) $^ $(ELIBS) $(FF_EXTRALIBS) $(LIBFUZZER_PATH)
tools/target_bsf_%_fuzzer$(EXESUF): tools/target_bsf_%_fuzzer.o $(FF_DEP_LIBS)
	$(LD) $(LDFLAGS) $(LDEXEFLAGS) $(LD_O) $^ $(ELIBS) $(FF_EXTRALIBS) $(LIBFUZZER_PATH)
target_dem_%_fuzzer$(EXESUF): target_dem_%_fuzzer.o $(FF_DEP_LIBS)
	$(LD) $(LDFLAGS) $(LDEXEFLAGS) $(LD_O) $^ $(ELIBS) $(FF_EXTRALIBS) $(LIBFUZZER_PATH)
tools/target_dem_fuzzer$(EXESUF): tools/target_dem_fuzzer.o $(FF_DEP_LIBS)
	$(LD) $(LDFLAGS) $(LDEXEFLAGS) $(LD_O) $^ $(ELIBS) $(FF_EXTRALIBS) $(LIBFUZZER_PATH)
tools/target_io_dem_fuzzer$(EXESUF): tools/target_io_dem_fuzzer.o $(FF_DEP_LIBS)
	$(LD) $(LDFLAGS) $(LDEXEFLAGS) $(LD_O) $^ $(ELIBS) $(FF_EXTRALIBS) $(LIBFUZZER_PATH)
…
(it keeps going and going…)
Looks like some kind of Martian language to me.
Finally
After some research, I decided to go with ffmpeg.
gstreamer1 doesn't use make and Makefiles to compile; it uses meson and ninja, which makes life difficult for me, as I have no knowledge at all of those technologies.
To change how ffmpeg is built, I will have to change the 'config.mak' file inside the ffbuild directory, since the main Makefile includes it (the `include ffbuild/config.mak` line in the example above) and takes its configuration from there.
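One way to check that the new options really reach the compiler is a tiny test function like the sketch below. It relies on `__ARM_FEATURE_SVE2`, a predefined macro that AArch64 compilers set when SVE2 is enabled (for example through `-march=armv8-a+sve2`). The function name `built_with_sve2` is hypothetical and not part of ffmpeg.

```c
/* Hypothetical helper, not part of ffmpeg.
 * Returns 1 when this file was compiled with SVE2 enabled, 0 otherwise. */
int
built_with_sve2 (void)
{
#if defined(__ARM_FEATURE_SVE2)
  return 1;
#else
  return 0;
#endif
}
```

Compiling this file with the same flags as the project and printing the result from a small `main` would show whether the flags were picked up. Note that this only says what the compiler was allowed to generate, not whether the machine running the program supports SVE2.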
That's it for now!
Thank you for reading!