gprof and gcov are classical profilers that are still in use. Since the dawn of time, they were used by hackers to gain insight into their programs at the source code level.
In the examples below, I’ll use my block_hasher program to illustrate the usage of profilers. because it’s a small and simple utility.
gprof
gprof (GNU Profiler) – simple and easy profiler that can show how much time your program spends in routines in percents and seconds. gprof uses source code instrumentation by inserting special mcount
function call to gather metrics of your program.
Building with gprof instrumentation
To gather profile you need to compile your program with -pg
gcc option and then launch under gprof. For better results and statistical errors elimination, it’s recommended to launch profiling subject several times.
To build with gprof instrumentation invoke gcc like this:
$ gcc <your options> -pg -g prog.c -o prog
Here is the actual compile instructions for the block_hasher
:
$ gcc -lrt -pthread -lcrypto -pg -g block_hasher.c -o block_hasher
As a result, you’ll get instrumented program. To check if it’s really instrumented just grep mcount
symbol.
$ nm block_hasher | grep mcount
U mcount@@GLIBC_2.2.5
Profiling block_hasher under gprof
As I said earlier to collect useful statistic we should run the program several times and accumulate metrics. To do that I’ve written simple bash script:
#!/bin/bash
if [[ $# -lt 2 ]]; then
echo "$0 <number of runs> <program with options...>"
exit 1
fi
RUNS=$1
shift 1
COMMAND="$@"
# Profile name is a program name (first element in args)
PROFILE_NAME="$(echo "${COMMAND}" | cut -f1 -d' ')"
for i in $(seq 1 ${RUNS}); do
# Run profiled program
eval "${COMMAND}"
# Accumulate gprof statistic
if [[ -e gmon.sum ]]; then
gprof -s ${PROFILE_NAME} gmon.out gmon.sum
else
mv gmon.out gmon.sum
fi
done
# Make final profile
gprof ${PROFILE_NAME} gmon.sum > gmon.profile
So, each launch will create gmon.out that gprof will combine in gmon.sum. Finally, gmon.sum will be feed to gprof to get flat text profile and call graph.
Let’s do this for our program:
$ ./gprofiler.sh 10 ./block_hasher -d /dev/sdd -b 1048576 -t 10 -n 1000
After finish, this script will create gmon.profile - a text profile, that we can analyze.
Analyzing
The flat profile has info about program routines and time spent in it.
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
100.24 0.01 0.01 thread_func
0.00 0.01 0.00 50 0.00 0.00 time_diff
0.00 0.01 0.00 5 0.00 0.00 bdev_close
0.00 0.01 0.00 5 0.00 0.00 bdev_open
gprof metrics are clear from the name. As we can see almost all of it’s time our little program spent in thread function, BUT look at the actual seconds – only 0.01 seconds from whole program execution. It means that it’s not the thread function who is slowing down but something underlying. In the case ofblock_hasher
, it’s a pread
syscall that does the I/O for the block device.
The call graph is really not interesting here, so I won’t show you it, sorry.
gcov
gcov (short for GNU Coverage) – tool to collect function call statistics line-by-line. Usually it’s used in pair with gprof to understand what exact line in slacking function is holds your program down.
Building with gcov instrumentation
Just as gprof you need to recompile your program with gcov flags
# gcc -fprofile-arcs -ftest-coverage -lcrypto -pthread -lrt -Wall -Wextra block_hasher.c -o block_hasher
There are 2 gcov flags: -fprofile-arcs
и -ftest-coverage
. First will instrument your program to profile so called arcs – paths in program’s control flow. The second option will make gcc to collect additional notes for arcs profiling and gcov itself.
You can simply put --coverage
option which implies both of -fprofile-arcs
and -ftest-coverage
along with linker -lgcov
flag. See GCC debugging options for more info.
Profiling block_hasher under gcov
Now, after instrumenting we just launch the program to get 2 files –block_hasher.gcda and block_hasher.gcno. Please, don’t look inside it – we will transform it to text profile. To do this we execute gcov passing it source code filename. It’s important to remember that you must have<filename>.gcda
and <filename>.gcno
files.
$ gcov block_hasher.c
File 'block_hasher.c'
Lines executed:77.69% of 121
block_hasher.c:creating 'block_hasher.c.gcov'
Finally, we’ll get block_hasher.c.gcov.
Analyzing
.gcov
file is result of that whole gcov work. Let’s look at it. For each of your source files gcov will create annotated source codes with runtime coverage. Here is excerpt from thread_func
:
10: 159: gap = num_threads * block_size; // Multiply here to avoid integer overflow
-: 160:
-: 161: // Initialize EVP and start reading
10: 162: md = EVP_sha1();
10: 163: mdctx = EVP_MD_CTX_create();
10: 164: EVP_DigestInit_ex( mdctx, md, NULL );
-: 165:
10: 166: get_clock( &start );
10010: 167: for( i = 0; i < nblocks; i++)
-: 168: {
10000: 169: offset = j->off + gap * i;
-: 170:
-: 171: // Read at offset without changing file pointer
10000: 172: err = pread( bdev->fd, buf, block_size, offset );
9999: 173: if( err == -1 )
-: 174: {
#####: 175: fprintf(stderr, "T%02d Failed to read at %llu\n", j->num, (unsigned long long)offset);
#####: 176: perror("pread");
#####: 177: pthread_exit(NULL);
-: 178: }
-: 179:
9999: 180: bytes += err; // On success pread returns bytes read
-: 181:
-: 182: // Update digest
9999: 183: EVP_DigestUpdate( mdctx, buf, block_size );
-: 184: }
10: 185: get_clock( &end );
10: 186: sec_diff = time_diff( start, end );
-: 187:
10: 188: EVP_DigestFinal_ex( mdctx, j->digest, &j->digest_len );
10: 189: EVP_MD_CTX_destroy(mdctx);
The left outmost column is how many times this line of code was invoked. As expected, the for loop was executed 10000 times – 10 threads each reading 1000 blocks. Nothing new.
Though gcov was not so much useful for me, I’d like to say that it has really cool feature – branch probabilities. If you’ll launch gcov with -b
option
[root@simplex block_hasher]# gcov -b block_hasher.c
File 'block_hasher.c'
Lines executed:77.69% of 121
Branches executed:100.00% of 66
Taken at least once:60.61% of 66
Calls executed:51.47% of 68
block_hasher.c:creating 'block_hasher.c.gcov'
You’ll get nice branch annotation with probabilities. For example, here istime_diff
function
113 function time_diff called 10 returned 100% blocks executed 100%
114 10: 106:double time_diff(struct timespec start, struct timespec end)
115 -: 107:{
116 -: 108: struct timespec diff;
117 -: 109: double sec;
118 -: 110:
119 10: 111: if ( (end.tv_nsec - start.tv_nsec) < 0 )
120 branch 0 taken 60% (fallthrough)
121 branch 1 taken 40%
122 -: 112: {
123 6: 113: diff.tv_sec = end.tv_sec - start.tv_sec - 1;
124 6: 114: diff.tv_nsec = 1000000000 + end.tv_nsec - start.tv_nsec;
125 -: 115: }
126 -: 116: else
127 -: 117: {
128 4: 118: diff.tv_sec = end.tv_sec - start.tv_sec;
129 4: 119: diff.tv_nsec = end.tv_nsec - start.tv_nsec;
130 -: 120: }
131 -: 121:
132 10: 122: sec = (double)diff.tv_nsec / 1000000000 + diff.tv_sec;
133 -: 123:
134 10: 124: return sec;
135 -: 125:}
In 60% of if
calls we’ve fallen in the branch to calculate time diff with borrow, that actually means we were executing for more than 1 second.
Conclusion
gprof and gcov are really entertaining tools despite a lot of people think about them as obsolete. On the one hand, these utilities are simple, they implement and automate an obvious method of source code instrumentation to measure functions hit count.
But on the other hand, such simple metrics won’t help with problems outside of your application like kernel or library, although there are ways to use it for an operating system, e.g. for Linux kernel. Anyway, gprof and_gcov_ are useless in the case when your application spends most of its time in waiting for some syscall (pread
in my case).
Top comments (0)