A Little Fun with Assembly

#assembly #compiler

Originally published on my blog in 2014

I still remember an interview I had around February 2001, in which an embedded firmware engineer talked about how his team wrote code:

"We write stuff in Assembler, because we're too lazy to write stuff in C."

Wait... what? I thought the whole purpose of C was to be a portable assembly language, so you could still control the bare metal. Still, I got an inkling that if you were that good, assembly could be seductive in letting you do whatever you want.

This came to mind again when a former colleague of mine posed a similar question on Facebook the other night:

Pop quiz: When you run this, what prints out?

	#include <stdio.h>

	int main (int argc, char** argv) {

	int i = 5;
	int j = 10;
	while (--j) { printf("%d %d\n", i, j); } while (--i);

	}

Basically, the above is a quiz to determine if you understand loops, expressions versus statements, and the pre-decrement operator (--). Pre-decrement subtracts one from the variable first, and the value of the expression is that new, decremented value. Post-decrement has the same side effect (decrementing the variable), but the value of the expression is the PREVIOUS value.

As is my wont, I got the above wrong, but that's not the point. :-D

To check my answer, I pasted it into a quick C program using vim. Compiling that program and using the Mac's otool to dump the assembly gives you this:

Unoptimized version

	(__TEXT,__text) section
	_main:
	0000000100000ef0 pushq %rbp
	0000000100000ef1 movq %rsp, %rbp
	0000000100000ef4 subq $0x20, %rsp
	0000000100000ef8 movl $0x0, -0x4(%rbp)
	0000000100000eff movl %edi, -0x8(%rbp)
	0000000100000f02 movq %rsi, -0x10(%rbp)
	0000000100000f06 movl $0x5, -0x14(%rbp)
	0000000100000f0d movl $0xa, -0x18(%rbp)
	0000000100000f14 movl -0x18(%rbp), %eax
	0000000100000f17 addl $0xffffffff, %eax ## imm = 0xFFFFFFFF
	0000000100000f1c movl %eax, -0x18(%rbp)
	0000000100000f1f cmpl $0x0, %eax
	0000000100000f24 je 0x100000f46
	0000000100000f2a leaq 0x61(%rip), %rdi
	0000000100000f31 movl -0x14(%rbp), %esi
	0000000100000f34 movl -0x18(%rbp), %edx
	0000000100000f37 movb $0x0, %al
	0000000100000f39 callq 0x100000f70
	0000000100000f3e movl %eax, -0x1c(%rbp)
	0000000100000f41 jmp 0x100000f14
	0000000100000f46 jmp 0x100000f4b
	0000000100000f4b movl -0x14(%rbp), %eax
	0000000100000f4e addl $0xffffffff, %eax ## imm = 0xFFFFFFFF
	0000000100000f53 movl %eax, -0x14(%rbp)
	0000000100000f56 cmpl $0x0, %eax
	0000000100000f5b je 0x100000f66
	0000000100000f61 jmp 0x100000f4b
	0000000100000f66 movl -0x4(%rbp), %eax
	0000000100000f69 addq $0x20, %rsp
	0000000100000f6d popq %rbp
	0000000100000f6e retq

Some things to note in the above:

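The compiler has faithfully translated the program, as-is, to assembly:
We initialize the variables in lines 9 and 10 of the listing (addresses 0x100000f06 and 0x100000f0d), storing 5 and 10 into stack slots
We have the first loop in lines 11-22 (0x100000f14 through the jmp at 0x100000f41)
The second loop (despite being a no-op) still exists, in lines 24-29 (0x100000f4b through the jmp at 0x100000f61)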

Compiler-optimized version

Things get slightly more interesting when you pass the -O (optimize) flag:

	a.out:
	(__TEXT,__text) section
	_main:
	0000000100000ea0 pushq %rbp
	0000000100000ea1 movq %rsp, %rbp
	0000000100000ea4 pushq %rbx
	0000000100000ea5 pushq %rax
	0000000100000ea6 leaq 0xdd(%rip), %rbx
	0000000100000ead movl $0x5, %esi
	0000000100000eb2 movl $0x9, %edx
	0000000100000eb7 xorl %eax, %eax
	0000000100000eb9 movq %rbx, %rdi
	0000000100000ebc callq 0x100000f6a
	0000000100000ec1 movl $0x5, %esi
	0000000100000ec6 movl $0x8, %edx
	0000000100000ecb xorl %eax, %eax
	0000000100000ecd movq %rbx, %rdi
	0000000100000ed0 callq 0x100000f6a
	0000000100000ed5 movl $0x5, %esi
	0000000100000eda movl $0x7, %edx
	0000000100000edf xorl %eax, %eax
	0000000100000ee1 movq %rbx, %rdi
	0000000100000ee4 callq 0x100000f6a
	0000000100000ee9 movl $0x5, %esi
	0000000100000eee movl $0x6, %edx
	0000000100000ef3 xorl %eax, %eax
	0000000100000ef5 movq %rbx, %rdi
	0000000100000ef8 callq 0x100000f6a
	0000000100000efd movl $0x5, %esi
	0000000100000f02 movl $0x5, %edx
	0000000100000f07 xorl %eax, %eax
	0000000100000f09 movq %rbx, %rdi
	0000000100000f0c callq 0x100000f6a
	0000000100000f11 movl $0x5, %esi
	0000000100000f16 movl $0x4, %edx
	0000000100000f1b xorl %eax, %eax
	0000000100000f1d movq %rbx, %rdi
	0000000100000f20 callq 0x100000f6a
	0000000100000f25 movl $0x5, %esi
	0000000100000f2a movl $0x3, %edx
	0000000100000f2f xorl %eax, %eax
	0000000100000f31 movq %rbx, %rdi
	0000000100000f34 callq 0x100000f6a
	0000000100000f39 movl $0x5, %esi
	0000000100000f3e movl $0x2, %edx
	0000000100000f43 xorl %eax, %eax
	0000000100000f45 movq %rbx, %rdi
	0000000100000f48 callq 0x100000f6a
	0000000100000f4d movl $0x5, %esi
	0000000100000f52 movl $0x1, %edx
	0000000100000f57 xorl %eax, %eax
	0000000100000f59 movq %rbx, %rdi
	0000000100000f5c callq 0x100000f6a
	0000000100000f61 xorl %eax, %eax
	0000000100000f63 addq $0x8, %rsp
	0000000100000f67 popq %rbx
	0000000100000f68 popq %rbp
	0000000100000f69 retq

Some things to note:

This looks nothing like the C code. There are no loops (or indeed, jump instructions) at all; the only control transfers are the calls and the final return.
The compiler determined the second loop to be a no-op, and compiled it away completely.
Our stack variables are gone. The values of i and j never touch memory; they are loaded into x86-64 registers as immediate constants.
The compiler has analyzed the first loop and unrolled it into nine discrete callq instructions, one for each printf call.

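Lastly: the answer to the quiz is in the assembly if you look hard enough. Before each callq, %esi holds i (always 5) and %edx holds j (9 down to 1), so the program prints nine lines, from "5 9" down to "5 1".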

Pretty cool... I never get to look at assembly in my day job, so getting this close to the CPU was neat.

Top comments (2)

connell-paxton • Jul 14 '20

AMD has a great guide on writing optimized C/C++ code.
Personally I think of C as kinda pseudocode for assembly: I can
write stuff out without having to think as much, and it allows me
to mess around without having to write as much code.