Originally published on my blog in 2014
I still remember an interview I had around February 2001, in which an embedded firmware engineer talked about how his team wrote code:
We write stuff in Assembler, because we're too lazy to write stuff in C.
Wait... what? I thought the whole purpose of C was to be a portable assembly language, so you could control the bare metal without writing assembly by hand. Still, I got an inkling that if you were that good, assembly could be seductive in letting you do whatever you want.
This came to mind again when a former colleague of mine posed a similar question on Facebook the other night:
Pop quiz: When you run this, what prints out?
#include <stdio.h>
int main (int argc, char** argv) {
    int i = 5;
    int j = 10;
    while (--j) { printf("%d %d\n", i, j); } while (--i);
}
Basically, the above is a quiz to determine if you understand loops, expressions versus statements, and the pre-decrement operator (--). Pre-decrement decrements the variable first, so the value of the expression is the current value minus one, and the variable holds that decremented value afterwards. Post-decrement has the same effect on the variable (decrementing it), but the value of the expression is the PREVIOUS value.
As is my wont, I got the above wrong, but that's not the point. :-D
To check my answer, I pasted it into a quick C program using vim.
Compiling that program and using the Mac's otool to dump the assembly gives you this:
Unoptimized version
(__TEXT,__text) section
_main:
0000000100000ef0 pushq %rbp
0000000100000ef1 movq %rsp, %rbp
0000000100000ef4 subq $0x20, %rsp
0000000100000ef8 movl $0x0, -0x4(%rbp)
0000000100000eff movl %edi, -0x8(%rbp)
0000000100000f02 movq %rsi, -0x10(%rbp)
0000000100000f06 movl $0x5, -0x14(%rbp)
0000000100000f0d movl $0xa, -0x18(%rbp)
0000000100000f14 movl -0x18(%rbp), %eax
0000000100000f17 addl $0xffffffff, %eax ## imm = 0xFFFFFFFF
0000000100000f1c movl %eax, -0x18(%rbp)
0000000100000f1f cmpl $0x0, %eax
0000000100000f24 je 0x100000f46
0000000100000f2a leaq 0x61(%rip), %rdi
0000000100000f31 movl -0x14(%rbp), %esi
0000000100000f34 movl -0x18(%rbp), %edx
0000000100000f37 movb $0x0, %al
0000000100000f39 callq 0x100000f70
0000000100000f3e movl %eax, -0x1c(%rbp)
0000000100000f41 jmp 0x100000f14
0000000100000f46 jmp 0x100000f4b
0000000100000f4b movl -0x14(%rbp), %eax
0000000100000f4e addl $0xffffffff, %eax ## imm = 0xFFFFFFFF
0000000100000f53 movl %eax, -0x14(%rbp)
0000000100000f56 cmpl $0x0, %eax
0000000100000f5b je 0x100000f66
0000000100000f61 jmp 0x100000f4b
0000000100000f66 movl -0x4(%rbp), %eax
0000000100000f69 addq $0x20, %rsp
0000000100000f6d popq %rbp
0000000100000f6e retq
Some things to note in the above:
- The compiler has done a faithful job of translating the program as-is to assembly:
  - We store the variables i (5) and j (10) on the stack at addresses 0x100000f06 and 0x100000f0d
  - We have the first loop from 0x100000f14 to 0x100000f41
  - The second loop (despite being a no-op) still exists, from 0x100000f4b to 0x100000f61
Compiler-optimized version
Things get slightly more interesting when you pass the -O (optimize) flag:
a.out:
(__TEXT,__text) section
_main:
0000000100000ea0 pushq %rbp
0000000100000ea1 movq %rsp, %rbp
0000000100000ea4 pushq %rbx
0000000100000ea5 pushq %rax
0000000100000ea6 leaq 0xdd(%rip), %rbx
0000000100000ead movl $0x5, %esi
0000000100000eb2 movl $0x9, %edx
0000000100000eb7 xorl %eax, %eax
0000000100000eb9 movq %rbx, %rdi
0000000100000ebc callq 0x100000f6a
0000000100000ec1 movl $0x5, %esi
0000000100000ec6 movl $0x8, %edx
0000000100000ecb xorl %eax, %eax
0000000100000ecd movq %rbx, %rdi
0000000100000ed0 callq 0x100000f6a
0000000100000ed5 movl $0x5, %esi
0000000100000eda movl $0x7, %edx
0000000100000edf xorl %eax, %eax
0000000100000ee1 movq %rbx, %rdi
0000000100000ee4 callq 0x100000f6a
0000000100000ee9 movl $0x5, %esi
0000000100000eee movl $0x6, %edx
0000000100000ef3 xorl %eax, %eax
0000000100000ef5 movq %rbx, %rdi
0000000100000ef8 callq 0x100000f6a
0000000100000efd movl $0x5, %esi
0000000100000f02 movl $0x5, %edx
0000000100000f07 xorl %eax, %eax
0000000100000f09 movq %rbx, %rdi
0000000100000f0c callq 0x100000f6a
0000000100000f11 movl $0x5, %esi
0000000100000f16 movl $0x4, %edx
0000000100000f1b xorl %eax, %eax
0000000100000f1d movq %rbx, %rdi
0000000100000f20 callq 0x100000f6a
0000000100000f25 movl $0x5, %esi
0000000100000f2a movl $0x3, %edx
0000000100000f2f xorl %eax, %eax
0000000100000f31 movq %rbx, %rdi
0000000100000f34 callq 0x100000f6a
0000000100000f39 movl $0x5, %esi
0000000100000f3e movl $0x2, %edx
0000000100000f43 xorl %eax, %eax
0000000100000f45 movq %rbx, %rdi
0000000100000f48 callq 0x100000f6a
0000000100000f4d movl $0x5, %esi
0000000100000f52 movl $0x1, %edx
0000000100000f57 xorl %eax, %eax
0000000100000f59 movq %rbx, %rdi
0000000100000f5c callq 0x100000f6a
0000000100000f61 xorl %eax, %eax
0000000100000f63 addq $0x8, %rsp
0000000100000f67 popq %rbx
0000000100000f68 popq %rbp
0000000100000f69 retq
Some things to note:
- This looks nothing like the C code. There are no loops (or indeed, conditional branch instructions) at all.
- The compiler determined the second loop to be a no-op, and compiled it away completely.
- Our stack variables are gone. The values of i and j show up only as constants loaded into the x64 registers %esi and %edx.
- The compiler has analyzed the loop and unrolled it into nine discrete callq instructions, one for each printf call.
Lastly: The answer to the quiz is in the assembly if you look hard enough:
5 9
5 8
5 7
5 6
5 5
5 4
5 3
5 2
5 1
Pretty cool... I never get to look at assembly in my day job, so it was neat to get this close to the CPU.
Top comments (2)
AMD has a great guide on writing optimized C/C++ code.
Personally I think of C as kinda pseudocode for assembly. I can write stuff out without having to think as much, and it allows me to mess around without having to write as much code.
The compiler does something similar with a loop that computes sum(1,...n), replacing it with the closed form n + n(n-1)/2. Pretty intelligent compiler devs.