Harold Combs


A Little Fun with Assembly

Originally published on my blog in 2014

I still remember an interview I had around February 2001, in which an embedded firmware engineer talked about how his team wrote code:

We write stuff in Assembler, because we're too lazy to write stuff in C.

Wait... what? I thought the whole purpose of C was to be portable assembly, so you could control the bare metal directly. Still, I got an inkling that if you were that good, assembly could be seductive in letting you do whatever you want.

This came to mind again when a former colleague of mine posed a similar question on Facebook the other night:

Pop quiz: When you run this, what prints out?

#include <stdio.h>
int main (int argc, char** argv) {
    int i = 5;
    int j = 10;
    while (--j) { printf("%d %d\n", i, j); } while (--i);
}

Basically, the above is a quiz to determine whether you understand loops, expressions versus statements, and the pre-decrement operator (--). Pre-decrement means the value of the expression is the current value minus one, and the variable is left holding that decremented value. Post-decrement has the same effect on the variable (decrementing it), but the value of the expression is the PREVIOUS value. The catch is that there is no do keyword, so the trailing while (--i); is a second, separate loop with an empty body.

As is my wont, I got the above wrong, but that's not the point. :-D

To check my answer, I pasted it into a quick C program using vim.
Compiling that program and using the Mac's otool to dump the assembly gives you this:

Unoptimized version

(__TEXT,__text) section
_main:
0000000100000ef0 pushq %rbp
0000000100000ef1 movq %rsp, %rbp
0000000100000ef4 subq $0x20, %rsp
0000000100000ef8 movl $0x0, -0x4(%rbp)
0000000100000eff movl %edi, -0x8(%rbp)
0000000100000f02 movq %rsi, -0x10(%rbp)
0000000100000f06 movl $0x5, -0x14(%rbp)
0000000100000f0d movl $0xa, -0x18(%rbp)
0000000100000f14 movl -0x18(%rbp), %eax
0000000100000f17 addl $0xffffffff, %eax ## imm = 0xFFFFFFFF
0000000100000f1c movl %eax, -0x18(%rbp)
0000000100000f1f cmpl $0x0, %eax
0000000100000f24 je 0x100000f46
0000000100000f2a leaq 0x61(%rip), %rdi
0000000100000f31 movl -0x14(%rbp), %esi
0000000100000f34 movl -0x18(%rbp), %edx
0000000100000f37 movb $0x0, %al
0000000100000f39 callq 0x100000f70
0000000100000f3e movl %eax, -0x1c(%rbp)
0000000100000f41 jmp 0x100000f14
0000000100000f46 jmp 0x100000f4b
0000000100000f4b movl -0x14(%rbp), %eax
0000000100000f4e addl $0xffffffff, %eax ## imm = 0xFFFFFFFF
0000000100000f53 movl %eax, -0x14(%rbp)
0000000100000f56 cmpl $0x0, %eax
0000000100000f5b je 0x100000f66
0000000100000f61 jmp 0x100000f4b
0000000100000f66 movl -0x4(%rbp), %eax
0000000100000f69 addq $0x20, %rsp
0000000100000f6d popq %rbp
0000000100000f6e retq

Some things to note in the above:

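  • The compiler has faithfully translated the program, exactly as written, into assembly:
      • The variables are initialized in lines 9 and 10 of the listing (addresses 0x100000f06 and 0x100000f0d), which store 5 and 10 on the stack
      • The first loop is in lines 11-22 (0x100000f14 through 0x100000f41)
      • The second loop, despite being a no-op, still exists in lines 24-29 (0x100000f4b through 0x100000f61)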

Compiler-optimized version

Things get slightly more interesting when you pass the -O (optimize) flag to the compiler:

a.out:
(__TEXT,__text) section
_main:
0000000100000ea0 pushq %rbp
0000000100000ea1 movq %rsp, %rbp
0000000100000ea4 pushq %rbx
0000000100000ea5 pushq %rax
0000000100000ea6 leaq 0xdd(%rip), %rbx
0000000100000ead movl $0x5, %esi
0000000100000eb2 movl $0x9, %edx
0000000100000eb7 xorl %eax, %eax
0000000100000eb9 movq %rbx, %rdi
0000000100000ebc callq 0x100000f6a
0000000100000ec1 movl $0x5, %esi
0000000100000ec6 movl $0x8, %edx
0000000100000ecb xorl %eax, %eax
0000000100000ecd movq %rbx, %rdi
0000000100000ed0 callq 0x100000f6a
0000000100000ed5 movl $0x5, %esi
0000000100000eda movl $0x7, %edx
0000000100000edf xorl %eax, %eax
0000000100000ee1 movq %rbx, %rdi
0000000100000ee4 callq 0x100000f6a
0000000100000ee9 movl $0x5, %esi
0000000100000eee movl $0x6, %edx
0000000100000ef3 xorl %eax, %eax
0000000100000ef5 movq %rbx, %rdi
0000000100000ef8 callq 0x100000f6a
0000000100000efd movl $0x5, %esi
0000000100000f02 movl $0x5, %edx
0000000100000f07 xorl %eax, %eax
0000000100000f09 movq %rbx, %rdi
0000000100000f0c callq 0x100000f6a
0000000100000f11 movl $0x5, %esi
0000000100000f16 movl $0x4, %edx
0000000100000f1b xorl %eax, %eax
0000000100000f1d movq %rbx, %rdi
0000000100000f20 callq 0x100000f6a
0000000100000f25 movl $0x5, %esi
0000000100000f2a movl $0x3, %edx
0000000100000f2f xorl %eax, %eax
0000000100000f31 movq %rbx, %rdi
0000000100000f34 callq 0x100000f6a
0000000100000f39 movl $0x5, %esi
0000000100000f3e movl $0x2, %edx
0000000100000f43 xorl %eax, %eax
0000000100000f45 movq %rbx, %rdi
0000000100000f48 callq 0x100000f6a
0000000100000f4d movl $0x5, %esi
0000000100000f52 movl $0x1, %edx
0000000100000f57 xorl %eax, %eax
0000000100000f59 movq %rbx, %rdi
0000000100000f5c callq 0x100000f6a
0000000100000f61 xorl %eax, %eax
0000000100000f63 addq $0x8, %rsp
0000000100000f67 popq %rbx
0000000100000f68 popq %rbp
0000000100000f69 retq

Some things to note:

  • This looks nothing like the C code. There are no loops (indeed, no jump or conditional branch instructions) at all.
  • The compiler determined the second loop to be a no-op, and compiled it away completely.
  • Our stack variables are gone. The values of i and j now appear only as immediate constants loaded into x64 CPU registers (%esi and %edx).
  • The compiler has analyzed the first loop and unrolled it into nine separate callq instructions, one for each call to printf.

Lastly: The answer to the quiz is in the assembly if you look hard enough. Read off the constants loaded into %esi and %edx before each callq:

5 9
5 8
5 7
5 6
5 5
5 4
5 3
5 2
5 1

Pretty cool. I never get to look at assembly in my day job, so getting this close to the CPU was neat.

Top comments (2)

connell-paxton

AMD has a great guide on writing optimized C/C++ code.
Personally, I think of C as kind of pseudocode for assembly: I can
write stuff out without having to think as much, and it allows me
to mess around without having to write as much code.

wpdevvy

The compiler also does something similar with a loop that sums 1 through n, replacing it with the closed form n + n(n-1)/2. Pretty intelligent compiler devs.