Ian Jacobs

Posted on Nov 29, 2023

Automatic Function Redirecting: Navigating Dynamic Code Optimization in GCC

#spo600 #architecture #opensource #programming

Intro

In this blog post, I explain what indirect functions (ifunc) are and how they can be used to optimize code for the AArch64 platform. This is a project for my software portability and optimization class, in which I present a design for an automatic ifunc option when compiling with GCC.

PART 1: ifunc and its Intermediate Representation

To test ifunc usage in a program I used an "autoifunc" script that creates a function redirect scheme for the ARM64 Scalable Vector Extension (SVE) and regular ARM Advanced SIMD (ASIMD) instructions before the code is compiled by GCC.

This script checks whether a C source file is vectorizable using either SVE or SVE2. It then writes out a separate copy of each function for SVE and for ASIMD and creates a resolver function that chooses between them at run time.

To test this script I used a program that adjusts the RGB values of a provided image. The function that does the adjusting lives in its own C file, referred to here as function.c:

/*

adjust_channels :: adjust red/green/blue colour channels in an image

The function returns an adjusted image in the original location.

Copyright (C)2022 Seneca College of Applied Arts and Technology
Written by Chris Tyler
Distributed under the terms of the GNU GPL v2

*/

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

//----Naive implementation in C

#include <sys/param.h>

void adjust_channels(unsigned char *image, int x_size, int y_size,
    float red_factor, float green_factor, float blue_factor) {

    for (int i = 0; i < x_size * y_size * 3; i += 3) {
        image[i]   = MIN((float)image[i]   * red_factor,   255);
        image[i+1] = MIN((float)image[i+1] * blue_factor,  255);
        image[i+2] = MIN((float)image[i+2] * green_factor, 255);
    }
}

Running the file through the script produces verbose output describing its function-redirecting process, which matches what was described earlier.

scripts/autoifunc function.c

* Auto-ifunc Tool v 0.001
> Input file: 'function.c'
> Input file does appear to contain C source.
> Vectorization using SVE/SVE2 is being applied.
> SVE2 optimizations are basically the same as the SVE optimizations, skipping.
> Compiling input file to assembly to get function names.
> Function #1: adjust_channels
> Writing output file.
> Output file function_ifunc.c has been created. Use this in place of function.c in your buld.

Here is the function.c (now function_ifunc.c) file after being run through the autoifunc script:

#include <sys/auxv.h>
#include <stdio.h>

void adjust_channels(unsigned char *image,int x_size,int y_size,
float red_factor,float green_factor,float blue_factor)
__attribute__(( ifunc("adjust_channels__resolver") ));

#pragma GCC target "arch=armv8-a+sve"
/*

adjust_channels__sve :: adjust red/green/blue colour 
channels in an image

The function returns an adjusted image in the original location.

Copyright (C)2022 Seneca College of Applied Arts and Technology
Written by Chris Tyler
Distributed under the terms of the GNU GPL v2

*/

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

// -------------------------------------Naive implementation in C

#include <sys/param.h>

void adjust_channels__sve(unsigned char *image, int x_size, int y_size,
    float red_factor, float green_factor, float blue_factor) {

    for (int i = 0; i < x_size * y_size * 3; i += 3) {
            image[i]   = MIN((float)image[i]   * red_factor,   255);
            image[i+1] = MIN((float)image[i+1] * blue_factor,  255);
            image[i+2] = MIN((float)image[i+2] * green_factor, 255);
    }
}


#pragma GCC target "arch=armv8-a"
/*

adjust_channels__asimd :: adjust red/green/blue colour 
channels in an image

The function returns an adjusted image in the original location.

Copyright (C)2022 Seneca College of Applied Arts and Technology
Written by Chris Tyler
Distributed under the terms of the GNU GPL v2

*/

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

// ----------------------------------Naive implementation in C

#include <sys/param.h>

void adjust_channels__asimd(unsigned char *image, int x_size, int y_size,
    float red_factor, float green_factor, float blue_factor) {

    for (int i = 0; i < x_size * y_size * 3; i += 3) {
            image[i]   = MIN((float)image[i]   * red_factor,   255);
            image[i+1] = MIN((float)image[i+1] * blue_factor,  255);
            image[i+2] = MIN((float)image[i+2] * green_factor, 255);
    }
}

static void (*adjust_channels__resolver(void)) {
        long hwcaps  = getauxval(AT_HWCAP);
        long hwcaps2 = getauxval(AT_HWCAP2);


 if (hwcaps & HWCAP_SVE) {
                return adjust_channels__sve;
        } else {
                return adjust_channels__asimd;
        }
};

As can be seen, the function has been broken down into three components. The first is a redirect declaration of the original function, carrying an ifunc attribute that names the resolver function:

void adjust_channels(unsigned char *image,int x_size,int y_size,
 float red_factor,float green_factor,float blue_factor)
 __attribute__(( ifunc("adjust_channels__resolver") ));

Next are the two rewritten copies of the function, one per variant. Each name has been given a suffix that indicates which instruction set that copy is built for, which also gives the resolver a distinct symbol to return for each one.

void adjust_channels__sve(unsigned char *image, int x_size, int y_size,
    float red_factor, float green_factor, float blue_factor) {

and

void adjust_channels__asimd(unsigned char *image, int x_size, int y_size,
    float red_factor, float green_factor, float blue_factor) {

Finally, the resolver function at the bottom reads the hardware capability bits reported through getauxval(AT_HWCAP) and returns the SVE version if the HWCAP_SVE bit is set, or the ASIMD version otherwise (hwcaps2 is fetched but not used in this example).

static void (*adjust_channels__resolver(void)) {
        long hwcaps  = getauxval(AT_HWCAP);
        long hwcaps2 = getauxval(AT_HWCAP2);


 if (hwcaps & HWCAP_SVE) {
                return adjust_channels__sve;
        } else {
                return adjust_channels__asimd;
        }
};

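This process is essentially what function redirection with ifunc does: the resolver runs once, when the dynamic linker binds the symbol, and every later call to adjust_channels goes straight to the chosen implementation.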
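As a rough, portable illustration of the same idea, the sketch below does by hand what the ifunc mechanism does through the dynamic linker: a resolver picks one implementation, and the public function forwards to it. It uses an ordinary function pointer rather than the ifunc attribute. The names sum_bytes, FAKE_CAP_WIDE, and query_caps are hypothetical, and the capability word is an assumption standing in for getauxval(AT_HWCAP); no real hardware detection happens here.

```c
#include <stddef.h>

/* Hypothetical capability bit; stands in for a real flag such as HWCAP_SVE. */
#define FAKE_CAP_WIDE (1UL << 0)

typedef unsigned long (*sum_fn)(const unsigned char *, size_t);

/* Variant 1: plain byte-at-a-time loop. */
static unsigned long sum_bytes__plain(const unsigned char *p, size_t n) {
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Variant 2: same result, processed four bytes per iteration. */
static unsigned long sum_bytes__wide(const unsigned char *p, size_t n) {
    unsigned long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += p[i] + p[i + 1] + p[i + 2] + p[i + 3];
    for (; i < n; i++)
        total += p[i];
    return total;
}

/* Assumption: a real resolver would call getauxval(AT_HWCAP) here. */
static unsigned long query_caps(void) {
    return 0UL;
}

/* Resolver: maps a capability word to one implementation. */
static sum_fn sum_bytes__resolver(unsigned long caps) {
    return (caps & FAKE_CAP_WIDE) ? sum_bytes__wide : sum_bytes__plain;
}

/* Public entry point: resolves once, then always calls the chosen variant. */
unsigned long sum_bytes(const unsigned char *p, size_t n) {
    static sum_fn chosen = NULL;
    if (chosen == NULL)
        chosen = sum_bytes__resolver(query_caps());
    return chosen(p, n);
}
```

With a real ifunc the "resolve once" step is done by the dynamic linker instead of by a check inside sum_bytes, so callers pay no extra branch per call.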

To analyze the compilation process I then compiled the new function_ifunc.c and had GCC dump GIMPLE, its intermediate representation, which shows the compiler's interpretation of the code in a readable format before optimizations are applied.

To do so I used the compiler option -fdump-tree-gimple with GCC to get the intermediate code.

In the GIMPLE output, the differences between the two function variants are minimal and come down to styling and names.

A snippet of the resolver function

__attribute__((target ("arch=armv8-a+sve", "arch=armv8-a")))
void * adjust_channels__resolver ()
{
  void * D.5656;
  long int hwcaps;
  long int hwcaps2;

  # DEBUG BEGIN_STMT
  _1 = getauxval (16);
  hwcaps = (long int) _1;
  # DEBUG BEGIN_STMT
  _2 = getauxval (26);
  hwcaps2 = (long int) _2;
  # DEBUG BEGIN_STMT
  _3 = hwcaps & 4194304;
  if (_3 != 0) goto <D.5654>; else goto <D.5655>;
  <D.5654>:
  ...

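As can be seen, the resolver survives into the intermediate representation largely unchanged, except that the preprocessor macros have become plain numbers: AT_HWCAP is 16, AT_HWCAP2 is 26, and HWCAP_SVE is 4194304 (bit 22). The branch on _3 is the same run-time choice between the two implementations that was described in the previous explanation. Nothing is compiled at run time; both versions already exist in the binary.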
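The numbers in the dump can be tied back to the header macros with a small check. This is only a sketch, assuming a Linux system with glibc, where sys/auxv.h provides AT_HWCAP and AT_HWCAP2; the DUMP_ names and the helper function are made up for this illustration. HWCAP_SVE is only defined when building for AArch64, so that comparison is guarded.

```c
#include <sys/auxv.h>   /* AT_HWCAP, AT_HWCAP2 (Linux/glibc) */

/* Literal values copied from the GIMPLE dump of the resolver. */
enum {
    DUMP_AT_HWCAP  = 16,
    DUMP_AT_HWCAP2 = 26
};
#define DUMP_SVE_MASK 4194304UL

/* Returns 1 when the dump's literals agree with the header macros. */
int dump_constants_match(void) {
    int ok = (DUMP_AT_HWCAP == AT_HWCAP) && (DUMP_AT_HWCAP2 == AT_HWCAP2);

    /* 4194304 is a single bit: bit 22. */
    ok = ok && (DUMP_SVE_MASK == (1UL << 22));
#ifdef HWCAP_SVE
    /* Only compiled when the target headers define the SVE bit. */
    ok = ok && (DUMP_SVE_MASK == HWCAP_SVE);
#endif
    return ok;
}
```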

Snippet from __sve function

__attribute__((target ("arch=armv8-a+sve")))
void adjust_channels__sve (unsigned char * image, int x_size, int y_size, float red_factor, float green_factor, float blue_factor)
{
  unsigned char iftmp.0;
  unsigned char iftmp.1;
  unsigned char iftmp.2;

  # DEBUG BEGIN_STMT
  {
    int i;

    # DEBUG BEGIN_STMT
    i = 0;
    goto <D.5634>;
    <D.5633>:
    # DEBUG BEGIN_STMT
    _1 = (sizetype) i;
    _2 = image + _1;
    _3 = *_2;
    _4 = (float) _3;
    _5 = red_factor * _4;
    if (_5 < 2.55e+2) goto <D.5659>; else goto <D.5660>;
    <D.5659>:
    _6 = (sizetype) i;
    _7 = image + _6;
    _8 = *_7;
    _9 = (float) _8;
    _10 = red_factor * _9;
    iftmp.0 = (unsigned char) _10;
    goto <D.5661>;
    <D.5660>:
    iftmp.0 = 255;
    <D.5661>:
    _11 = (sizetype) i;
    _12 = image + _11;
    *_12 = iftmp.0;
    # DEBUG BEGIN_STMT
    _13 = (sizetype) i;
    _14 = _13 + 1;
    _15 = image + _14;
    _16 = *_15;
    _17 = (float) _16;
    _18 = blue_factor * _17;
    if (_18 < 2.55e+2) goto <D.5663>; else goto <D.5664>;
    <D.5663>:
    _19 = (sizetype) i;
    _20 = _19 + 1;
    _21 = image + _20;
    _22 = *_21;
    _23 = (float) _22;
    _24 = blue_factor * _23;
    iftmp.1 = (unsigned char) _24;
    goto <D.5665>;
    <D.5664>:
    iftmp.1 = 255;
    <D.5665>:
    _25 = (sizetype) i;
    _26 = _25 + 1;
    _27 = image + _26;
    *_27 = iftmp.1;
    ...

Above is the __sve function. Its declaration carries __attribute__((target ("arch=armv8-a+sve"))), which is how the #pragma GCC target "arch=armv8-a+sve" from the source file appears in GIMPLE. At this stage the code is meant to be a readable, lowered form of the source before optimizations are made.

Snippet from __asimd function:

__attribute__((target ("arch=armv8-a+sve")))
void adjust_channels__sve (unsigned char * image, int x_size, int y_size, float red_factor, float green_factor, float blue_factor)
{
  unsigned char iftmp.0;
  unsigned char iftmp.1;
  unsigned char iftmp.2;

  # DEBUG BEGIN_STMT
  {
    int i;

    # DEBUG BEGIN_STMT
    i = 0;
    goto <D.5634>;
    <D.5633>:
    # DEBUG BEGIN_STMT
    _1 = (sizetype) i;
    _2 = image + _1;
    _3 = *_2;
    _4 = (float) _3;
    _5 = red_factor * _4;
    if (_5 < 2.55e+2) goto <D.5659>; else goto <D.5660>;
    <D.5659>:
    _6 = (sizetype) i;
    _7 = image + _6;
    _8 = *_7;
    _9 = (float) _8;
    _10 = red_factor * _9;
    iftmp.0 = (unsigned char) _10;
    goto <D.5661>;
    <D.5660>:
    iftmp.0 = 255;
    <D.5661>:
    _11 = (sizetype) i;
    _12 = image + _11;
    *_12 = iftmp.0;
    # DEBUG BEGIN_STMT
    _13 = (sizetype) i;
    _14 = _13 + 1;
    _15 = image + _14;
    _16 = *_15;
    _17 = (float) _16;
    _18 = blue_factor * _17;
    if (_18 < 2.55e+2) goto <D.5663>; else goto <D.5664>;
    <D.5663>:
    _19 = (sizetype) i;
    _20 = _19 + 1;
    _21 = image + _20;
    _22 = *_21;
    _23 = (float) _22;
    _24 = blue_factor * _23;
    iftmp.1 = (unsigned char) _24;
    goto <D.5665>;
    <D.5664>:
    iftmp.1 = 255;
    <D.5665>:
    _25 = (sizetype) i;
    _26 = _25 + 1;
    _27 = image + _26;
    *_27 = iftmp.1;
    ...

Just like the previous function, it includes a target attribute and looks very similar to the other GIMPLE example, apart from the function name and the locations it refers to.

The GIMPLE code at this point does not appear to have gone through vectorization or any other optimization. It still contains the loop (in the form of goto statements) and scalar, one-element-at-a-time loads, multiplications, and stores. These are exactly the operations that auto-vectorization would try to replace at higher optimization levels, so this GIMPLE dump shows the code before vectorization is applied.
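To make the scalar shape of the dump easier to see, here is a hand-written C sketch that follows the GIMPLE statement by statement. The helper names scale_and_clamp and adjust_channels_scalar are hypothetical; the logic is the same as the original loop, assuming MIN expands to the usual conditional expression, and assuming non-negative factors.

```c
/* One channel update written the way the GIMPLE dump spells it out:
   _5 = factor * (float) px; if (_5 < 255) convert it, else use 255. */
unsigned char scale_and_clamp(unsigned char px, float factor) {
    unsigned char result;
    if (factor * (float)px < 255.0f) {
        /* MIN() expands to a ?: that evaluates its argument twice,
           which is why the dump reloads and remultiplies here. */
        result = (unsigned char)(factor * (float)px);
    } else {
        result = 255;
    }
    return result;
}

/* The whole loop, still one pixel component at a time (no vectors).
   The factor-to-offset pairing is kept exactly as in function.c. */
void adjust_channels_scalar(unsigned char *image, int x_size, int y_size,
                            float red_factor, float green_factor,
                            float blue_factor) {
    for (int i = 0; i < x_size * y_size * 3; i += 3) {
        image[i]     = scale_and_clamp(image[i],     red_factor);
        image[i + 1] = scale_and_clamp(image[i + 1], blue_factor);
        image[i + 2] = scale_and_clamp(image[i + 2], green_factor);
    }
}
```

A vectorized version would instead load several bytes at once, widen and multiply them as a group, and clamp them with a single vector minimum, which is nowhere to be seen in the dump.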

If we were to get the GIMPLE representation of function.c, it would contain just one copy of the function. The ifunc version instead contains two target-specific copies plus a resolver, so the multi-version structure is already visible at this early stage, before any optimization pass has run.

So with that, the process of converting function.c into function_ifunc.c is the following:

1. Choose the architecture variants and extensions to generate optimized code for.
2. Decide which parts of the program are vectorizable and determine which functions should be remade into ifunc variants.
3. Create a named version of each of the chosen functions for each architecture variant's optimizations.
4. Create a resolver function that selects the correct version at run time.
5. Keep the original function's prototype as the "driver" declaration that the resolver binds the selected version to (the original function as a prototype in function_ifunc.c).

These steps are how indirect functions are created and are important to note for the next part of this blog where I talk about a theoretical design for an ifunc feature in GCC.

PART 2: Designing ifunc for GCC

For the next part of this blog, I want to discuss how an auto-ifunc feature could be added to GCC to be usable by any user for compilation.

Command Line Argument

Like most features in GCC, this one would be enabled with a command-line argument. For example, the option could take the name -fauto-ifunc. The leading -f is the prefix GCC uses for flags that control machine-independent compiler behaviour (more info here).

Using this flag would then enable the automatic ifunc capability during compilation and looks like a good name for the feature to be called from GCC.

Specifying Architecture

To facilitate this, the target architectures could be specified in the form -march=[base]+[ext1]+[ext2]..., where the leading -m is the prefix GCC uses for machine-dependent options (again, more info about -m in GCC can be found here).

For example, the argument -march=armv8-a+sve would be used to specify the AArch64 armv8-a architecture variant with the SVE extension.

The base architecture would be mandatory, while the extensions would be optional additions.
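A minimal sketch of how such a value could be split into its mandatory base and optional extensions is shown below. The struct march_spec, the parse_march function, and the size limits are all hypothetical and are not part of GCC; GCC has its own option parsing, which this does not attempt to reproduce.

```c
#include <string.h>

#define MAX_EXT  8
#define NAME_LEN 32

/* Hypothetical container for one parsed -march value. */
struct march_spec {
    char base[NAME_LEN];          /* e.g. "armv8-a" */
    char ext[MAX_EXT][NAME_LEN];  /* e.g. "sve", "sve2" */
    int  ext_count;
};

/* Splits "base+ext1+ext2" on '+'. Returns 0 on success, or -1 if the base
   is missing, any name is empty or too long, or there are too many
   extensions. */
int parse_march(const char *value, struct march_spec *out) {
    const char *start = value;
    int field = 0;

    out->base[0] = '\0';
    out->ext_count = 0;

    for (;;) {
        size_t len = strcspn(start, "+");
        if (len == 0 || len >= NAME_LEN)
            return -1;
        if (field == 0) {
            /* First field is the mandatory base architecture. */
            memcpy(out->base, start, len);
            out->base[len] = '\0';
        } else {
            if (out->ext_count == MAX_EXT)
                return -1;
            memcpy(out->ext[out->ext_count], start, len);
            out->ext[out->ext_count][len] = '\0';
            out->ext_count++;
        }
        field++;
        if (start[len] == '\0')
            return 0;
        start += len + 1;  /* skip the '+' */
    }
}
```

Rejecting an empty base (for example "+sve") is how the sketch enforces that the base architecture is mandatory while extensions stay optional.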

With this implementation, a number of constraints would need to be enforced on the architecture variants for the resulting program to be runnable.

These constraints would include:

1. Compatibility:

Make sure that the architecture variants are compatible with each other so that GCC doesn't combine incompatible features or instructions into a single executable.
Handle mutually exclusive variants (32-bit vs 64-bit).
If specific instruction set extensions are requested (e.g., SIMD), ensure that the selected extensions are compatible with each other.
Make sure that the compiler version supports the specified architecture variant.

2. Consistency:

Ensure that the selected variants share common features so that there is no unpredictable behavior.
Check that the libraries used in the program are compatible with the specified architecture variants.
If the compilation is for a different architecture than the host, ensure that cross-compilation is set up correctly, and the target architecture is supported.

These are some of the main constraints that would need to be applied for the architecture variants. There are more that would need to be handled including things like runtime detection, but with proper testing and documentation in the source code, I think that this is the best solution if the constraints can all be accounted for.
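One of these checks can be sketched very simply. The rule assumed here, that every variant must share the same base architecture as the first one, is only an illustration of the compatibility constraint and not a complete definition of it. The function name first_incompatible_variant is hypothetical.

```c
#include <string.h>

/* Length of the base-architecture part: everything before the first '+'. */
static size_t base_length(const char *march) {
    return strcspn(march, "+");
}

/* Assumed rule: all variants must share the base architecture of the
   first one. Returns the index of the first offending variant, or -1
   if all of them agree (or the list is empty). */
int first_incompatible_variant(const char *const variants[], int count) {
    if (count <= 0)
        return -1;

    size_t len0 = base_length(variants[0]);
    for (int i = 1; i < count; i++) {
        size_t len = base_length(variants[i]);
        if (len != len0 || strncmp(variants[0], variants[i], len0) != 0)
            return i;
    }
    return -1;
}
```

For example, a list containing "armv8-a+sve" and "armv7-a" would be rejected at index 1, which is the 32-bit versus 64-bit case mentioned above.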

Specifying Functions

The autoifunc script demonstrated at the beginning of the blog works on a per-function basis. With this in mind, a GCC implementation would need a way to name the functions to which automatic ifunc capability should be applied.

I believe that a user should have a choice as to whether they should specify specific functions or have the auto-ifunc automatically detect functions.

User-specified Functions:

This could come in the form of an additional argument, -fauto-ifunc-functions=[function1],[function2],....

For example from part 1: -fauto-ifunc-functions=adjust_channels
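A small sketch of how the value of this proposed option might be consulted follows. Both the option and the helper function_is_listed are hypothetical; the code only shows whole-name matching against a comma-separated list.

```c
#include <string.h>

/* Returns 1 if `name` appears as a whole entry in the comma-separated
   `list`, and 0 otherwise. Partial matches do not count. */
int function_is_listed(const char *list, const char *name) {
    size_t name_len = strlen(name);
    const char *entry = list;

    if (name_len == 0)
        return 0;

    while (*entry != '\0') {
        size_t len = strcspn(entry, ",");
        if (len == name_len && strncmp(entry, name, len) == 0)
            return 1;
        entry += len;
        if (*entry == ',')
            entry++;  /* step over the separator */
    }
    return 0;
}
```

Matching whole entries matters here: a function called adjust_channels_v2 should not be selected just because adjust_channels was requested.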

In the source code, these specific functions to be targeted will be added as #pragma statements.

This flag would be optional and if not provided the -fauto-ifunc will need to auto-select functions based on specified selection criteria provided by the user.

Automatic Function Selection & Specifying Selection Criteria

This option would allow a user to specify which criteria should be used when automatically selecting functions. For example, the criteria terms could be "vectorizable", "complexity", or "execution-frequency", and the feature would check each function against the chosen criterion.

To choose a criterion you could specify another optional argument, -fauto-ifunc-criteria=[criteria].

An example from part 1 would be -fauto-ifunc-criteria=vectorizable.

This argument would be optional and have a default value that would be applied if not used. This default value should be a balance between the main criteria; targeting functions that benefit from multiple implementations based on the different architectures provided (either by default or through the -march=[arch] mode flag).
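The optional-with-default behaviour could look roughly like the sketch below. The enum, its member names, and parse_criteria are hypothetical, and CRITERIA_BALANCED is only a placeholder for whatever balanced default the feature would end up using.

```c
#include <string.h>

/* Hypothetical selection criteria for the proposed option. */
enum ifunc_criteria {
    CRITERIA_BALANCED,            /* assumed default when the option is absent */
    CRITERIA_VECTORIZABLE,
    CRITERIA_COMPLEXITY,
    CRITERIA_EXECUTION_FREQUENCY,
    CRITERIA_INVALID              /* unrecognized value */
};

/* Maps the option's value to a criterion. A NULL value means the user
   did not pass the option, so the default applies. */
enum ifunc_criteria parse_criteria(const char *value) {
    if (value == NULL)
        return CRITERIA_BALANCED;
    if (strcmp(value, "vectorizable") == 0)
        return CRITERIA_VECTORIZABLE;
    if (strcmp(value, "complexity") == 0)
        return CRITERIA_COMPLEXITY;
    if (strcmp(value, "execution-frequency") == 0)
        return CRITERIA_EXECUTION_FREQUENCY;
    return CRITERIA_INVALID;
}
```

Keeping an explicit invalid value lets the caller report a misspelled criterion instead of silently falling back to the default.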

Allowing for both user-specified and automatically selected functions would allow for more flexibility and accessibility to both advanced and novice users which in my opinion is the most effective when adding this option to GCC.

Diagnostics

When running the -fauto-ifunc option in GCC the user should be fully informed of the selected architecture and its variants, the name of each function that will be processed, and any potential issues that arise in attempting to run this command.

There should also be options to display more verbose diagnostic messages (something like -fauto-ifunc-verbose) and to suppress them completely (something like -fauto-ifunc-quiet).
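As a rough sketch of the three verbosity levels, the function below formats one report line per processed function. The enum, the function name format_ifunc_report, and the message wording are all made up for illustration and are not GCC's diagnostic API.

```c
#include <stdio.h>

/* Hypothetical verbosity levels for the proposed diagnostics. */
enum diag_level { DIAG_QUIET, DIAG_NORMAL, DIAG_VERBOSE };

/* Writes a one-line report for one processed function into buf.
   Returns the number of characters the full report needs, or 0 when
   diagnostics are suppressed. */
int format_ifunc_report(enum diag_level level, const char *march,
                        const char *function, char *buf, size_t size) {
    if (size > 0)
        buf[0] = '\0';

    switch (level) {
    case DIAG_QUIET:
        return 0;
    case DIAG_VERBOSE:
        /* Verbose: also name the resolver that would be generated. */
        return snprintf(buf, size,
                        "auto-ifunc: %s -> %s__resolver (variants for %s)",
                        function, function, march);
    case DIAG_NORMAL:
    default:
        return snprintf(buf, size, "auto-ifunc: %s (%s)", function, march);
    }
}
```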

Conclusion

I believe that the process I have presented here, if implemented as described, would be a good way to add automatic indirect function creation to GCC, as it covers both the constraints that must be enforced and the user-level controls that let the option fit in with the rest of GCC.

There exist many more considerations that need to be reviewed including its many interactions with existing GCC options and its effect on the source code when used but I believe the current theoretical implementation works well for this feature.
