We've already made a article about flags that works for both GCC and Clang, however, those tips have general instructions for compilation.
In this article, we will specify more the objective at "compile time" that directly influences the performance of the binary, making the speed at "runtime" better!
01. The basics
The -fsanitize=address
flag and all the others in sanitize(libasan
), from Google, which was natively implemented by the GNU Project are used to check for memory leaks, memory violations and other related failures.
However, it should only be used during development, when you are going to make it available for production, that is, the release version. The ideal is to create the Makefile, or CMake or any other compilation tool without this flag. In fact, it is a good idea to remove any other debug flag, including: -g
, -Wall
, -Werror
, -pedantic
, -Wpedantic
,...
Because they, especially -fsanitize=address
, make the binary execution very slow. You can replace it with the optimizer, for example, -O1
, -O2
or -O3
:
-O1
(Basic optimization) - Enables optimizations that improve performance without significantly increasing compilation time. Examples: dead code elimination, constant propagation, limited inlining.-O2
(Moderate optimization) - Includes all optimizations from-O1
and adds more aggressive ones that still maintain code reliability. Examples: loop unrolling, elimination of common subexpressions, better instruction scheduling.-O3
(Aggressive optimization) - Includes all optimizations from-O2
and adds new, more aggressive ones, such as increased inlining and loop vectorization. May increase code size and, in some cases, reduce performance due to over-optimization.
And there is also -Ofast
, although it is the most aggressive of all and almost equivalent to -O3
, it can completely optimize the code, making it even faster, since it still includes the -ffast-math
flag, this can be good, but the ideal is to do tests, since some precision calculations, mainly for the double
and float
types, can have unexpected results, since it can reduce the number of significant digits, in addition to being able to break with the C and C++ standards.
However, in most cases, it is recommended for release, for example:
g++ -Ofast main.cpp
If you want a less conflicting fusion, use it together with
-ffp-contract=fast
: Allows fusion of floating point operations, such as FMA (Fused Multiply-Add).
In short:
Use -Ofast
if exact numerical precision is not critical and you want to extract maximum performance.
02. Architecture-specific tuning
The -march=native
flag allows the compiler to generate code optimized for your CPU architecture:
g++ -Ofast -march=native main.cpp
Using it in combination with
-Ofast
can be a great idea for performance.
This allows the compiler to use advanced instructions of your processor, such as SSE, AVX, etc.
If you need to distribute the binary to other machines, choose a specific value instead of native, such as -march=haswell
, -march=znver3
, etc.
In short:
The -march=native
flag, if you want, allows the compiler to generate code optimized for your CPU architecture.
03. Parallelism with OpenMP
If the code is parallelizable, add support for OpenMP to take advantage of multiple CPU cores:
g++ -Ofast -march=native -fopenmp main.cpp
Also in combination with the flags mentioned above.
This allows loops and other parts of the code to run in parallel.
OpenMP (Open Multi-Processing) is an application programming interface (API) for shared-memory multiprocessing on multiple platforms. It allows adding concurrency to programs written in C, C++ and Fortran based on the fork-join execution model.
04. Improve CPU cache usage
The -funroll-loops
and -fprefetch-loop-arrays
flags help improve loop execution:
g++ -Ofast -march=native -funroll-loops -fprefetch-loop-arrays main.cpp
Also in combination with the flags mentioned above.
If we used them in the video about Ranking of Programming Languages, C++ and C would leave those behind them even further behind! đ
Remember, an even better utility than these flags is ccache
, which we published in the article: Use Ccache and compile much faster, however, its focus is to reduce "compilation time" and not only binary performance.
05. Optimized linking
The -flto
(Link-Time Optimization) flag is used to allow the optimizer to see the code as a whole:
g++ -Ofast -march=native -flto main.cpp
It is good to use it in conjunction with the first two flags mentioned.
06. Avoid exceptions and RTTI if they are not necessary
Use the -fno-rtti
flag if the code does not use exceptions or RTTI (Runtime Type Information), disable them to gain performance:
g++ -Ofast -march=native -fno-exceptions -fno-rtti main.cpp
RTTI (Run-time Type Information) is a technique that stores information about the data type of an object during the execution of a program. RTTI is available in some programming languages, such as Delphi and C++.
07. Use execution profiles with Profile-Guided Optimization (PGO)
If you can run the program before final compilation (AND DO IT!!!), use PGO with the -fprofile-generate
flag to optimize based on real execution data:
Don't confuse it with the so-called: borrow checker!
Compile with instrumentation:
g++ -Ofast -march=native -fprofile-generate main.cpp
It is good to use it together with the first two flags mentioned.
Run the program normally to generate profile data, and then recompile using the generated profiles:
g++ -Ofast -march=native -fprofile-use main.cpp
Profile-guided optimization (PGO), also known as profile-directed feedback (PDF) or feedback-directed optimization (FDO), is a compiler optimization technique that uses prior analysis of software artifacts or behaviors ("profiling") to improve the expected runtime performance of the program.
08. Improve the "tuning"
In addition to -march=native
, you can use -mtune
to tune the code generation for better performance without losing compatibility:
g++ -Ofast -march=native -mtune=native main.cpp
It is good to use it in conjunction with the first two flas mentioned.
If you need to run on multiple architectures, use something more generic, like -mtune=generic
.
If you want to use all the flags we mentioned together, feel free:
g++ -Ofast -march=native -flto -funroll-loops -fprefetch-loop-arrays \
-fno-exceptions -fno-rtti -fopenmp main.cpp
For more information, see the links below:
Top comments (0)