Omar Emara

Posted on Jan 5, 2021

Guide to writing, compiling, and running MIPS binaries on Linux

#mips #qemu #musl #unix

In this article, I shall explain the process of writing, compiling, and running MIPS binaries on Linux. I assume the reader is somewhat familiar with MIPS. I also assume the reader is using a host system that is not MIPS, which constitutes part of the challenge.

Introduction

The main challenge in this article is to write, compile, and run binaries for the MIPS Instruction Set Architecture (ISA) on a host machine that is not MIPS, like an x86 AMD or Intel system. If one does have a MIPS system, then compiling and running MIPS binaries is straightforward. But most of us don't have access to a MIPS system. In particular, we can divide the challenge into two sub-challenges:

How can one run a MIPS binary on a system that is not MIPS?
How can one compile a MIPS source into an executable MIPS binary?

Running Binaries

Running binaries is the easier challenge. Qemu is an open source generic emulator for many ISAs and processors, including MIPS. Qemu provides two methods of emulation:

Full system emulation.
User-space emulation.

A full system emulator boots a full MIPS Linux kernel where you can do whatever you like, including installing a native compiler and running binaries just like you would run binaries on your system. This is overkill for our use case, so we shall not look into this solution. The other method is to run a user-space emulator that just runs a single MIPS binary. This is perfect for our use case, so we shall utilize it.

Compiling Binaries

Compiling a binary for a different platform is called Cross Compiling, and it is a bit involved. In order to cross compile, you need a Cross Compiler. Actually, you need a whole Cross Compiler Toolchain, including the compiler, assembler, linker, runtime, and so on. For GCC, you need a separate toolchain for each platform you want to target, which usually means compiling the toolchain yourself and configuring the target in the process. Luckily, we have another option, Clang. Clang is a cross compiler by default with support for many targets out of the box, so it is very likely that the Clang version you have already supports MIPS as a target. So we shall utilize Clang for our purpose, for now.

Armed with an emulator and a cross compiler, let us write a hello world program and test it using this toolchain.

Minimal Executable

Platform

First, we have to make some choices with regards to the platform we will be targeting. We know we are targeting MIPS, but there are so many variations. The significance of those choices will become apparent later. Our choices are as follows:

MIPS 64 bit. There are 32 and 64 bit MIPS architectures. We choose the modern 64 bit architecture.
N64 ABI. The Application Binary Interface (ABI) is a description of the rules and conventions we have to follow while writing applications, including the function calling conventions, the data types used, and more. There are three available ABIs for MIPS: O32, N32, and N64. We choose the modern N64 ABI.
Little Endian. We choose little endian because that's what we are used to.

Exit Program

The simplest program we can write is a program that just exits. So let us start with that. The assembly is as follows:

.text
.global __start
__start:
  # Exit.
  move $a0, $zero
  dli $v0, 5058
  syscall

We start by declaring the __start symbol as global, that is, accessible and visible to the linker. Why __start? Because that's the default entry symbol the linker looks for in MIPS ELF executables, so that's where our program will start executing. Then we call the exit syscall. Since we are using the N64 ABI, we have to use the calling conventions specified by that ABI, which state:

Up to eight integer registers ($4 .. $11) may be used to pass integer arguments.
Up to eight floating point registers ($f12 .. $f19) may be used to pass floating point arguments.

Notice that the first argument is either $4 or $f12, the second is either $5 or $f13, the third is $6 or $f14, and so on, depending on their type. The exit syscall takes only a single integer argument, the status. So we pass it through the first integer argument register $a0, which is $4. Next, to identify the syscall we are calling, the $v0 register is loaded with the syscall identifier. To find the appropriate identifier, we have to look at the Linux kernel headers. In particular, we look at the base identifier for the N64 ABI in unistd.h, which is 5000, and add to it the syscall number in syscall_n64.tbl, which is 58, so the final identifier for the exit syscall is 5058.
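The identifier arithmetic is simple enough to sketch in C. The names below (N64_SYSCALL_BASE, the two index constants, and n64_syscall_number) are illustrative and are not kernel identifiers; the values 5000, 58, and 1 are the ones used in this article (the write syscall appears in the next section).

```c
#include <assert.h>

/* Base identifier for N64 syscalls, as read from unistd.h. */
enum { N64_SYSCALL_BASE = 5000 };

/* Table indices from syscall_n64.tbl for the two syscalls used in this article. */
enum { N64_INDEX_WRITE = 1, N64_INDEX_EXIT = 58 };

/* Hypothetical helper: the value to load into $v0 before the syscall instruction. */
static int n64_syscall_number(int table_index)
{
    assert(table_index >= 0);
    return N64_SYSCALL_BASE + table_index;
}
```

Calling n64_syscall_number(N64_INDEX_EXIT) gives the 5058 that the assembly loads with dli.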

To compile this, we use the following Clang invocation:

clang --target=mips64el-linux-gnu -nostdlib -static -fuse-ld=lld -o helloWorld helloWorld.s

We choose the target to be the MIPS 64 little endian with the default ABI. Next, since we don't need to link the standard library, we exclude it. Then, we statically link the binary using the lld linker. It is important to choose the LLVM linker because the default GNU linker is not a cross linker as discussed before.

To run the output binary, we use Qemu as follows:

qemu-mips64el helloWorld

You can check the return status using the following bash command:

echo $?

Which will be zero in our case. Try changing the status passed to the syscall and see how this value changes to match.
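When experimenting with the status, keep in mind that the shell only reports the low 8 bits of the value passed to exit. A minimal C sketch of that mapping, with a hypothetical function name:

```c
#include <assert.h>

/* Hypothetical helper: the value `echo $?` reports for a given exit status.
 * Only the low 8 bits of the status reach the parent process. */
static int shell_visible_status(int status)
{
    return status & 0xFF;
}
```

So a status of 42 is reported as 42, while 256 wraps around to 0.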

Hello World Program

Now that we have successfully written, compiled, and run our first program, let us try to print Hello World! To print something, we should use the write syscall. The assembly code is as follows:

.data
hello_string: .ascii "Hello World!\n"
hello_string_length: .quad . - hello_string

.text
.global __start
__start:
  # Print Hello World.
  dli $a0, 1
  dla $a1, hello_string
  ld $a2, hello_string_length
  dli $v0, 5001
  syscall

  # Exit.
  move $a0, $zero
  dli $v0, 5058
  syscall

The write syscall has the following interface:

ssize_t write(int fildes, const void *buf, size_t nbyte);

The first argument is the file descriptor of the file to write to. We need to print to the standard output, which has a file descriptor of 1, so we load 1 into the $a0 register.

The second argument is a pointer to the string, which is the address of the hello_string label in our case, so we load the address into the $a1 register.

Finally, the last argument is the number of bytes to write. We compute the length of the string and store it at the hello_string_length label using the expression:

. - hello_string

The dot special symbol in the GNU assembler resolves to the current address, that is, the assembler's location counter. At this point, the current address is the one right after the last byte of the string, so the current address minus the address of the hello_string label gives the length of the string, and the double word at hello_string_length correctly stores that length. So we load the value of hello_string_length into the $a2 register.

Calling the write syscall is very similar to what we did before. Check the syscall table linked above to confirm the syscall identifier.
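For comparison, here is a rough C counterpart of the same call, meant to be compiled for any POSIX system. It is a sketch and not what the assembler produces; the function name print_hello is made up for illustration. The sizeof expression plays the role of ". - hello_string".

```c
#include <assert.h>
#include <unistd.h>

/* Same bytes as the .ascii directive emits (write never sends the implicit NUL). */
static const char hello_string[] = "Hello World!\n";

/* Counterpart of ". - hello_string": the array size minus the implicit NUL. */
static const size_t hello_string_length = sizeof hello_string - 1;

/* Hypothetical helper: write the string to the given file descriptor.
 * Returns the number of bytes written, or -1 on error, like write itself. */
static ssize_t print_hello(int fildes)
{
    assert(fildes >= 0);
    return write(fildes, hello_string, hello_string_length);
}
```

Calling print_hello(1) corresponds to the three argument loads followed by the syscall instruction in the assembly.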

If you try to run this binary, however, you will get a segmentation fault. The next section investigates this issue and solves it.

Global Offset Table

The reason we got a segmentation fault is because we missed something important. Let us look at the disassembly of the executable section of the binary to get some insights:

┌─[omar]──[/tmp]
└─╼ llvm-objdump -d helloWorld

helloWorld:     file format elf64-mips
Disassembly of section .text:
0000000000020210 <__start>:
   20210: 01 00 04 24   addiu   $4, $zero, 1
   20214: 20 80 85 df   ld      $5, -32736($gp)
   20218: 28 80 86 df   ld      $6, -32728($gp)
   2021c: 00 00 c6 dc   ld      $6, 0($6)
   20220: 89 13 02 24   addiu   $2, $zero, 5001
   20224: 0c 00 00 00   syscall
   20228: 25 20 00 00   move    $4, $zero
   2022c: c2 13 02 24   addiu   $2, $zero, 5058
   20230: 0c 00 00 00   syscall

The disassembly is close enough to our source. Some of the pseudo instructions were assembled into actual instructions, but that's not important. The two instructions that load the string address and the string length seem to use the $gp register as the base address for the load operation, with some offsets.

The $gp register stores the address of the Global Offset Table (GOT). Indeed, all N64 MIPS binaries are compiled as Position Independent Code (PIC), which means that access to global data like the string and the string length is done through an extra level of indirection. This level of indirection is the GOT. We will not get into the details of how PIC or the GOT works, but essentially, the GOT is a table in the data section of the binary that stores the effective addresses of global data, and any access to global data has to be done through this table. The offsets seen above are the offsets into this table as filled in by the linker.

The important thing to note is, the instructions use the $gp register to find the global data. But in our program, $gp is not initialized anywhere! And the ABI doesn't guarantee it to be initialized. So that's probably why we are getting a segmentation fault.

Programmers need to compute the value of the $gp register as needed. There is an assembler directive for computing the value of the $gp register called .cpsetup. The .cpsetup directive has the following interface:

.cpsetup r, {offset | s}, label

Here, register r holds the run-time address of some label, and label is that same label. The second argument of the directive is either an offset or a register s. The N64 ABI specifies that the $gp register is callee-saved, which means that any function that changes the $gp register needs to restore it to its original value before returning. If an offset is passed, the original value of the $gp register is stored in the stack frame at the specified offset. If the s register is passed, the original value of the $gp register is stored in that register. The value of the $gp register can be restored before returning using the .cpreturn directive, passing the same offset or the s register.

But before we try to understand how this directive works, we need to understand a certain point described in the N64 ABI. At the start of a routine, the $t9 register is guaranteed to hold the run-time address of the routine currently executing. This is guaranteed by the caller, that is, the caller sets the $t9 register before calling the function.

Essentially, the distance between the address of the GOT and the address of any label in the executable is known at link-time. If we know the run-time address of that label, we can easily compute the run-time address of the GOT because the distance is constant. And that's exactly what the directive does to compute the value of the $gp register.
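The computation can be modelled in a few lines of C. This is a conceptual sketch of the arithmetic only, not the instruction sequence the assembler emits; the function name and the parameter names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of what .cpsetup computes.
 *   label_runtime: address of the label while the program runs (register r).
 *   label_link:    address the linker assigned to the same label.
 *   gp_link:       value the linker expects $gp to have.
 * Returns the value to place in $gp at run time. */
static uint64_t compute_gp(uint64_t label_runtime, uint64_t label_link, uint64_t gp_link)
{
    /* The distance is fixed at link time. Unsigned arithmetic wraps around,
     * so this also works when the table sits below the label. */
    uint64_t distance = gp_link - label_link;
    return label_runtime + distance;
}
```

If the program is loaded exactly where the linker placed it, label_runtime equals label_link and the result is simply gp_link; if it is loaded somewhere else, the result shifts by the same amount as the label did.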

Combining the aforementioned facts, it should be possible for us to compute the value of the $gp register by passing the $t9 register as the input r register of the directive and the label of the routine as the input label of the directive. The problem is, __start is the entry function and $t9 is not set by the caller, because there isn't one, and the ABI makes no guarantees that $t9 will be initialized. The challenge now is to compute the value of the $t9 register.

Let us peek into the source code of GLibc and see how they do it. If we look at the entry function definition of GLibc in sysdeps/mips/start.S, we see that there are two setup macro calls for the $gp register when PIC is defined, SETUP_GP and SETUP_GPX64. Their definition is available in sysdeps/mips/sys/asm.h. The first macro SETUP_GP resolves to nothing on the N64 ABI. The second macro SETUP_GPX64 is the one that actually calls .cpsetup. We see that it uses a neat trick to call the directive.

The SETUP_GPX64 macro executes a "fake" unconditional branch and link instruction to a local forward label in order to update the $ra register, which after the instruction stores the run-time address of the local label. Then it runs .cpsetup using that address and the same local label. Indeed, we need the run-time address of any label, even a local one, so this works just as if we had the $t9 register. Notice, however, that we don't need the macro's first and last instructions, which save and restore $ra, as the original $ra register is undefined anyway at the beginning of __start. Moreover, we also don't need to store the original $gp because it is also undefined, so we just pass $zero as the s register of the directive, which will work as a /dev/null in this case. We can then write our own macro to initialize the $gp register as follows:

.macro initialize_gp
  bal 1f
  1:
  .cpsetup $ra, $zero, 1b
.endm

.data
hello_string: .ascii "Hello World!\n"
hello_string_length: .quad . - hello_string

.text
.global __start
__start:
  initialize_gp

  # Print Hello World.
  dli $a0, 1
  dla $a1, hello_string
  ld $a2, hello_string_length
  dli $v0, 5001
  syscall

  # Exit.
  move $a0, $zero
  dli $v0, 5058
  syscall

And that's it, if you compile and run this it will print Hello World!

Doing More

Alright, how do we print or read an integer or a double? Are there syscalls for that? Certainly not. Such functionality is too high level for a syscall. We could write our own parsers using the read and write syscalls, but as I have come to realize, parsing and printing IEEE doubles is not an easy task. Looks like we will be needing Libc routines like printf and scanf after all. And that's where our next challenge begins, which we shall tackle in the next section.

Using Libc

The Libc library on our system is compiled for our host system, so we can't use it with MIPS programs. We will need to get a version compiled for MIPS. Moreover, it would be best if we could statically link Libc into our program to simplify things. Otherwise, we would also need a runtime for MIPS and a dynamic linker for MIPS, and we would need to set up Qemu to deal with those libraries and runtime, which is not something we want to get into. GLibc is not ideal for static linking. But there is Musl Libc.

musl is an implementation of the C standard library built on top of the Linux system call API, including interfaces defined in the base language standard, POSIX, and widely agreed-upon extensions. musl is lightweight, fast, simple, free, and strives to be correct in the sense of standards-conformance and safety.

And most importantly, it is ideal for static linking! The good news is that there are full pre-compiled cross compiler toolchains based on musl available for both GCC and Clang. The Clang solution, called ELLCC, doesn't seem to be maintained any more. But the GCC solution, called musl.cc, is maintained and seems perfect for our use.

Musl.cc

There are many toolchains available on musl.cc, but we need the cross compiler for the 64 bit little endian MIPS platform, downloadable from this link:

https://musl.cc/mips64el-linux-musl-cross.tgz

So, download and extract this archive and let's get to work.

Hello World Again

The assembly for the hello world program using the Libc functions is much simpler:

.data
hello_string: .asciz "Hello World!"

.text
.global main
main:
  # Print Hello World.
  dla $a0, hello_string
  jal puts

  # Exit.
  move $a0, $zero
  jal exit

First, notice that we are using main as our entry label now since Libc will do the initialization for us. We also don't need to worry about the $gp register as it was initialized by Libc as well.

Secondly, while we said before that the address of any function we call needs to be stored in the $t9 register before we make the function call, as mandated by the ABI for PIC code, we don't have to do that manually because the assembler takes care of it. So we simply call the standard puts function, passing the string address as the first argument. And we also call the standard exit function, passing the status as the first argument. This program can be compiled using the following GCC invocation:

./mips64el-linux-musl-cross/bin/mips64el-linux-musl-gcc -static -o helloWorld helloWorld.s

And the binary is executed just like before. That's it, we have a hello world program using Libc!

Advanced Libc Usage

Now let's look at some more example usages of Libc that are a bit more advanced. But first, let's define some useful macros that will come in handy later.

Utilities

We will define six macros as follows:

.equ DWORD_SIZE, 8

# Allocate a number of double words on the stack.
.macro stack_allocate n
  daddiu $sp, $sp, -(DWORD_SIZE * \n)
.endm

# Free a number of double words from the stack.
.macro stack_free n
  daddiu $sp, $sp, (DWORD_SIZE * \n)
.endm

# Store the GPR r in the stack at index n.
.macro stack_store_gpr r, n
  sd \r, (DWORD_SIZE * \n)($sp)
.endm

# Load the stack value at index n into GPR r.
.macro stack_load_gpr r, n
  ld \r, (DWORD_SIZE * \n)($sp)
.endm

# Load the stack value at index n into FPR r.
.macro stack_load_fpr r, n
  ldc1 \r, (DWORD_SIZE * \n)($sp)
.endm

# Load the address of the stack element at index n into GPR r.
.macro stack_load_address r, n
  daddiu \r, $sp, (DWORD_SIZE * \n)
.endm

It should be clear what each of those macros does.

stack_allocate(n) allocates a new stack frame with n double words.
stack_free(n) frees the stack frame containing n double words.
stack_store_gpr(GPR, index) stores the input GPR in the stack at the input index.
stack_load_gpr(GPR, index) loads the stack value at the input index into the input GPR.
stack_load_fpr(FPR, index) loads the stack value at the input index into the input FPR.
stack_load_address(GPR, index) loads the address of the stack value at the input index into the input GPR.
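The offset arithmetic behind these macros can be modelled in C as a sanity check. This is a sketch with made-up helper names. The N64_STACK_ALIGNMENT constant is my own addition: the N64 ABI expects $sp to stay 16-byte aligned, which the macros do not enforce, so an even n is the safer choice when a routine calls into Libc.

```c
#include <assert.h>

enum { DWORD_SIZE = 8, N64_STACK_ALIGNMENT = 16 };

/* Bytes that stack_allocate n subtracts from $sp (and stack_free n adds back). */
static long frame_bytes(long n)
{
    assert(n >= 0);
    return DWORD_SIZE * n;
}

/* Byte offset from $sp that the store and load macros use for a given index. */
static long slot_offset(long index)
{
    assert(index >= 0);
    return DWORD_SIZE * index;
}

/* Nonzero if a frame of n double words leaves $sp 16-byte aligned,
 * assuming $sp was aligned on entry. */
static int keeps_stack_aligned(long n)
{
    return frame_bytes(n) % N64_STACK_ALIGNMENT == 0;
}
```

For example, a frame of two double words occupies 16 bytes, and index 1 refers to the double word at 8($sp).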

Scanf

In this section, we shall define two routines that read a long and a double from the user respectively using scanf.

Read Long

The code for the Read Long routine is as follows:

# Read a long and store it in $v0.

.data
read_long_format: .asciz "%ld"

.text
readLong:
  stack_allocate 2
  stack_store_gpr $ra, 0

  dla $a0, read_long_format
  stack_load_address $a1, 1
  jal scanf
  stack_load_gpr $v0, 1

  stack_load_gpr $ra, 0
  stack_free 2
  jr $ra

First notice that the N64 ABI calling conventions specify:

Function results are returned in $2 (and $3 if needed), or $f0 (and $f2 if needed), as appropriate for the type.

Since we will be returning an integer type, we should return in $2, which is $v0. The data section contains the format string for scanf, which specifies a long. The routine starts by allocating two stack elements in a new stack frame. We store the $ra register as the first element at index 0 in order to return from the function correctly after we are done.

The scanf takes as a first argument the format string, so we load the string address in $a0. The second argument of the scanf is a pointer to a double word where the long will be stored. By pointer we mean the address of a double word. We already allocated space in the stack for this value, so we just have to store its address in $a1. Then we call scanf and load the stack value at index 1 into the return register $v0, which now stores the long read by scanf. Finally, we load the $ra register from the stack, free the stack frame, and return.

Read Double

Reading a double is very similar to reading a long. We just load the stack value into the FPR $f0 instead. The code for the Read Double routine is as follows:

# Read a double and store it in $f0.

.data
read_double_format: .asciz "%lf"

.text
readDouble:
  stack_allocate 2
  stack_store_gpr $ra, 0

  dla $a0, read_double_format
  stack_load_address $a1, 1
  jal scanf
  stack_load_fpr $f0, 1

  stack_load_gpr $ra, 0
  stack_free 2
  jr $ra

Checking the return value of scanf is left to the reader as an exercise.
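As a hint for that exercise, here is one possible shape of the check, sketched in C. It uses fscanf on an explicit stream instead of scanf so that it can be tried without typing into standard input; the function name and its success flag are my own choices. In the assembly routines, the same count is available in $v0 right after jal scanf.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one long from the stream.
 * Returns 1 and stores the value on success. Returns 0 on a matching
 * failure or at end of input, leaving *value untouched. */
static int read_long(FILE *stream, long *value)
{
    assert(stream != NULL && value != NULL);
    /* fscanf returns the number of converted items, or EOF if input ends first. */
    return fscanf(stream, "%ld", value) == 1;
}
```

A caller would test the flag before trusting the value, for instance by printing an error and exiting with a non-zero status when it is 0.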

Printf

In this section, we shall define two routines that print a long and a double respectively using printf.

Print Long

Printing a long is very straightforward. The assembly is as follows:

# Print the long stored in $a0.

.data
print_long_format: .asciz "%ld"

.text
printLong:
  # Two slots keep $sp 16-byte aligned, as the N64 ABI expects; slot 1 is padding.
  stack_allocate 2
  stack_store_gpr $ra, 0

  move $a1, $a0
  dla $a0, print_long_format
  jal printf

  stack_load_gpr $ra, 0
  stack_free 2
  jr $ra

The format string address is passed in $a0 and the long is passed in $a1.

Print Double

Printing a double is a bit less straightforward. The assembly is as follows:

# Print the double stored in $f12.

.data
print_double_format: .asciz "%lf"

.text
printDouble:
  # Two slots keep $sp 16-byte aligned, as the N64 ABI expects; slot 1 is padding.
  stack_allocate 2
  stack_store_gpr $ra, 0

  dmfc1 $a1, $f12
  dla $a0, print_double_format
  jal printf

  stack_load_gpr $ra, 0
  stack_free 2
  jr $ra

The important thing to note here is the following instruction:

  dmfc1 $a1, $f12

The input double in $f12, which is an FPR, is moved into the GPR $a1 and passed to printf. That's because of the following point in the N64 ABI calling convention:

Whenever possible, floating point arguments are passed in floating point registers regardless of whether they are preceded by integer parameters.

Variable argument routines require an exception to the previous rule. Any floating point parameters in the variable part of the argument list (leading or otherwise) are passed in integer registers.

printf is a variable argument routine, which is why the double has to be passed in a GPR.
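Note that dmfc1 copies the 64 raw bits of the double into the GPR; it does not convert the value to an integer. A C sketch of the same bit-for-bit copy, assuming the compiling machine uses the IEEE 754 binary64 format for double (the function name is made up):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the 64-bit pattern that dmfc1 would place in a GPR
 * for the given double. The bits are copied, not converted. */
static uint64_t double_bits(double value)
{
    uint64_t bits;
    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

For instance, 1.0 becomes the pattern 0x3FF0000000000000 rather than the integer 1, and printf reinterprets those bits as a double when it processes the format specifier.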

References

The references for MIPS are rather scarce. The best resources I found are available in the Silicon Graphics Archives. I suggest you read the following: