DEV Community

Discussion on: Perf - Perfect Profiling of C/C++ on Linux

Jules Pénuchot

Intel IACA is the tool I have used the most to develop high-performance, SIMD-optimized kernels. It statically estimates the throughput of a portion of code and shows the critical parts of the resulting assembly (saturated execution ports, register usage graph, etc.).

It's a great way to compare two versions of the same code, especially when you want to fine-tune your program by pushing your compiler to emit better-optimized instructions.

software.intel.com/en-us/articles/...

But let's be honest: in the end, only benchmark numbers matter, which is why I also use Google Benchmark.

github.com/google/benchmark
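To illustrate what "benchmark numbers" means here, this is a minimal hand-rolled timing sketch using only `std::chrono`. It is not Google Benchmark and does not use its API; the kernel `sum_of_squares` and the helper `mean_ns_per_call` are hypothetical names made up for this example. A real benchmarking library also handles warm-up, repetitions and statistics, which this sketch leaves out.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical kernel under test (illustration only).
float sum_of_squares(const std::vector<float>& v) {
    float acc = 0.0f;
    for (float x : v) {
        acc += x * x;
    }
    return acc;
}

// Calls the kernel `iterations` times and returns the mean wall-clock
// time per call in nanoseconds. `iterations` is assumed to be non-zero.
double mean_ns_per_call(const std::vector<float>& v, std::size_t iterations) {
    // Writing to a volatile variable keeps the compiler from discarding
    // the result entirely. It is only a crude safeguard: an optimizer may
    // still hoist the computation out of the loop.
    volatile float sink = 0.0f;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        sink = sum_of_squares(v);
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Calling `mean_ns_per_call(data, 100000)` on two versions of a kernel gives a rough first comparison, but the numbers are noisy; a dedicated benchmarking library is the better choice for anything you intend to rely on.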

Evan Wilde Author

I was not aware of these tools, and they look pretty good. I'll have to look into them.

Jules Pénuchot

If you don't want to rely on Intel, you can just surround the part of your program you want to inspect with a few asm("nop"); statements in your source and look for a run of nop instructions in the disassembled program. However, you won't get any information about the throughput of your code that way.
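As a rough sketch of the nop-marker trick, assuming GCC or Clang inline-assembly syntax: the function `saxpy_marked` is a hypothetical kernel invented for this example, and using three nops per marker is an arbitrary choice that just makes the markers easier to spot. The compiler is still free to move some code across the markers, so treat them as approximate boundaries.

```cpp
#include <cstddef>

// Hypothetical kernel (illustration only): y[i] = a * x[i] + y[i].
// Runs of nop instructions bracket the loop so that it is easy to
// locate in the disassembly.
void saxpy_marked(float a, const float* x, float* y, std::size_t n) {
    // Start marker: three consecutive nops.
    // `volatile` tells the compiler not to delete the statement.
    asm volatile("nop\n\tnop\n\tnop");

    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }

    // End marker: three consecutive nops.
    asm volatile("nop\n\tnop\n\tnop");
}
```

After compiling with something like `g++ -O2 -c kernel.cpp`, running `objdump -d kernel.o` and searching for the consecutive nop lines should show where the loop starts and ends (the file name is a placeholder).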

I made a small project to compare assembly programs quickly. The shell and CMake scripts build all the programs in every "src__*" folder they find, disassemble them, and then run the IACA inspection and dependency graph generation, all in one command. It's not a big thing, but it helps when you want to inspect more than one program with IACA without modifying your program.