Vincent Corbee

Posted on Aug 21

HTTP Server in arm64 assembly Apple silicon M1

#programming #assembly #softwaredevelopment #arm

How do you create a simple http server? If you are a Node programmer you may simply respond with: Simple, just use the http module, create a server, listen on a port, and voila, you have an http server. Ok, but can you do it in arm64 assembly? You may respond with: Yeah… no… why would I even want to know? And you have a valid point! But isn’t it cool to know how you would actually instruct the cpu and os to do those things for you? If you are like me, you would think it’s very cool! If not, you probably belong to the majority of programmers. So for the former, let’s build an http server in arm64 assembly.

Naming

Let’s start with preventing some confusion around names. You may have heard of ARM64 or AArch64 or even ARMv8. So what is the difference? Let’s clarify.

ARMv8 This is the architecture of the cpu. ARMv8 is used to describe the overall architecture.

AArch64 This is the name used to describe the 64-bit execution state of the ARMv8 architecture. ARMv8 also has a 32-bit execution state named AArch32, but we won’t go into that.

ARM64 This is also used as an alias for AArch64.

In the rest of this article we will stick with the name arm64 when we talk about any of the above.

Gotchas

First caveat though, as the title says, we are going to be creating our server on an Apple silicon machine. You can build along on other arm64 machines or emulators but you have to adjust the system call numbers accordingly.

Another thing to keep in mind is that we will only cover what we need in order to get the http server working, so this is not an in-depth tutorial about arm64 assembly. Besides, I am not really qualified either way. Check out this excellent tutorial for a detailed look into arm64 or any of the resources in the reference section for that matter.

Something that was pretty frustrating: although the BSD part of the macOS kernel is derived from FreeBSD, not all system calls, which we will get to later, are available. For example recv, which we would want to use to read incoming messages, is not available as a system call on macOS. The same applies to timers, e.g. nanosleep, and other thread related calls. As far as I know these live in the Mach part of the kernel, but I haven’t figured out how to use them, so we have to make do without them. You can find the signature of a mach system call here.

Finally, roughly speaking, in an assembly file we work with two sections. One section for our instructions, which is called .text and is read-only. And a second section, called .data, which is used for reading and writing data. On macOS, reaching data in .data takes extra work (page-relative addressing), so in this article we don’t use .data. This means that we can only define read-only data in our .text section. Also we have to use a different instruction to load data.

Tools and what not

In order to compile our programs, we should have the latest Xcode. To build our program we are going to use as for our assembler and ld for our linker. These are not actually from GNU; underneath they use Clang.

The next tool that is pretty useful is a debugger. It lets us step through our program and look inside registers and memory addresses to see what data is stored. This is pretty handy since we don’t have for example a console.log to debug our program. We will be using lldb which should be installed on your mac.

Editor wise, I’m using VSCode but you can use any editor you like.

What will be of help when writing assembly code for macOS is to open /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr in your editor if you can. This contains the header files of the macOS SDK and has lots of useful information for us.

Another very useful source is the source browser. Here you can find the source for xnu, the kernel for macOS.

Call to Arm(s)64

I’m going to assume you know something about assembly language in general. If not, the reference section has some good sources of information to get you going. Arm is a RISC architecture, meaning Reduced Instruction Set Computer, as opposed to for example x86, which belongs to the CISC family, meaning Complex Instruction Set Computer. So for one, it tries to use as few instructions as possible. This even means that not all instructions you use are actual instructions; some are aliases for other instructions.

Further, arm64 has a so-called load-store architecture, which means that only load and store instructions can access memory directly. So a constraint is that you can only manipulate data via registers. Another thing is that every instruction is 4 bytes wide. This means that your instructions need to be four byte aligned or your program won’t work. Ok, enough for now, let’s look at some code!

Hello world(ish)

So when creating a program, the mandatory thing we need to do is create a Hello world program. A common thing to do is “print” Hello world! But in arm64 that is some next level shit! What we are going to do is something much more exciting… Are you ready? We are going to run a program and… exit with a status code 🤯. I know, right? So let’s get typing.

Let’s start by creating a file called hello.s. We start off with the following lines:

.global main
.align 4
.text

First thing to note here is that things starting with a dot are not instructions, but directives for the assembler. .global tells the assembler which symbol is visible to the linker. In our case this is main. The next one is .align. Arm64 instructions need to be 4 byte aligned. On this assembler the argument of .align is a power of two, so .align 4 aligns to 16 bytes, which more than satisfies that requirement. The last one is .text, which selects the read-only section where our instructions will reside.

main:

First we jot down our main label to match our .global directive. You might ask, what is a label? A label is nothing more than an alias for an address. So in this case our .global directive points to the address that this label represents.

Instructions

You might ask, what the heck is an instruction? Well, an instruction is a way of “instructing” the cpu what to do. In arm64 an instruction is 32 bits long and is divided into fields of bits that mean certain things. Luckily we don’t have to set those bits ourselves, but let the assembler do it for us. So for us, an instruction can be broken down into an opcode and operands. The opcode is the command we want to use. We don’t use the code directly, we use a mnemonic. The opcode commands the cpu to “do” something. What that something acts on is determined by the operands. I like to think in terms of operator and operands, or function and function arguments. What the operands are differs per instruction, but the types of operands are: registers, immediate values, labels and memory addresses.

Now let’s write some instructions.

main:
  mov x0, #2          ; move immediate value 2 into register x0

Our first instruction is: mov x0, #2. So what does that instruction do? As you might expect, it is a move instruction. It moves the value of the second operand into the register given as the first operand. So you can think of the instruction as mov destination, source. In our case we move the value 2 into register x0. The # sign indicates that this is an immediate value, more specifically the decimal value 2. We could also have used hexadecimal or binary notation. In that case we have to prefix the number with 0x or 0b respectively. So we could have used 0x2. Great, our x0 register now contains the value 2.

A few things to note here. First, because an instruction is always 32 bits wide, there is a limit as to how big an immediate value can be. For the mov instruction that is 16 bits. Second, an instruction can have multiple signatures, just like a function can have multiple signatures. This also applies to the mov instruction.

Registers

A register can hold a 64 bit value and there are 31 general purpose registers, x0 to x30. We can also use any of those registers in a 32 bit mode as w0 to w30. When we write to a w register, the high 32 bits of the underlying x register are zeroed out. Some of these registers have a special meaning, these are:

x0 Is used as the return value from a subroutine (function).
x1 Is used if more than 64 bits need to be returned.
x8 Is used to hold the system call number on Linux.
x16 Is used for the system call number on macOS because you know, Apple.

Note that on macOS register x18 is reserved and should not be used.

x29 / fp Is used for the frame pointer. Points to the bottom of the stack frame.
x30 / lr Is used to hold the return address, the location of the instruction to continue with after a subroutine returns.

Next to the general purpose registers we have some special registers:

xzr | wzr This is the zero register. It always reads as zero and can be used to set a register to zero.
sp The stack pointer, points to the current address in the stack.
pc The program counter, contains the location of the current instruction. You cannot access it explicitly.
spsr Saved program status register, which you also cannot access explicitly.

There are also SIMD and Floating-Point Registers. But we won’t cover them here, since we won’t be using them.

Let’s get back to our code.

main:
  mov x0, #2          ; move immediate value 2 into register x0
  mov x16, #1         ; move the syscall number for exit into register x16

Our next instruction is similar: mov x16, #1. Here we move the value 1 into register x16.

main:
  mov x0, #2          ; move immediate value 2 into register x0
  mov x16, #1         ; move the syscall number for exit into register x16
  svc 0x80            ; make supervisor call

The last one is: svc 0x80. What does this do? Well, we need a way to signal to the OS that we want it to do something for us. So what this instruction does is make a supervisor call. It lets the program communicate with the OS. We use the value 0x80 here because that is what macOS uses by convention. In macOS this value is actually ignored. We can change it to another value and it still works fine.

System calls

Great, we let the OS know that we want to do something, but how does it know “what” to do? That is where system calls come into play. You can find the list of system calls for macOS here. So in register x16 we used the value 1. What does that do? This is actually the system call number for the exit system call. Ok, what about the status code? Well, in x0 we put the value 2, which will be used as the status code. You can think of a system call as calling a function. For example, to exit a program in Node you would use process.exit(2) to exit the program with a status code of 2. If we look at the system call for exit we see the following C function signature: void exit(int status). We can see that the return type is void and it takes a status as an argument. So in our case we call the function as exit(2).
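For comparison, here is a small C sketch of the same idea: a process exits with a status code and whoever started it can read that code back, which is what the shell does for us later with echo $?. The helper name exit_status_of_child is made up for this illustration, and it uses the libc wrappers fork, _exit and waitpid rather than a raw svc.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`, then return the status the parent
 * observes. Returns -1 if fork or waitpid fails, or if the child did not
 * exit normally. (Hypothetical helper, not part of the article's code.) */
int exit_status_of_child(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);    /* child: ends up in the exit system call */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling exit_status_of_child(2) plays the role of running our program and then asking the shell for the status code.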

The ABI for system calls on macOS is

ARM system call interface:

swi 0x80
args: r0-r6
return code: r0
on error, carry bit is set in the psr, otherwise carry bit is cleared.

Ok, let’s build and run this program. We are going to use as to build and ld to link the program.

as -o hello.o hello.s

as will output an object file, more precisely a Mach-O object file for macOS. The argument -o defines the name of the output file. The last argument defines the input file. So when we run this command, we will get hello.o as output. We now have to use the linker to turn it into an executable file.

ld -o hello hello.o -lSystem -syslibroot `xcrun -sdk macosx --show-sdk-path` -e main -arch arm64

As mentioned above, the linker will output an executable, more precisely a Mach-O executable for macOS. The first argument -o hello defines the output file name. The second argument defines the input, that would be our newly created object file. The next argument -lSystem is needed for our program to work. This tells the linker to link with libSystem.dylib. To tell the linker where it can find this file, we use -syslibroot `xcrun -sdk macosx --show-sdk-path`. Remember that we used main as our global label? We have to tell the linker that this is our entry point with -e main. Lastly we tell the linker which architecture to use with -arch arm64.

Now when we run

./hello

Our program exits immediately. Now, to see if our program exited with our status code, type

echo $?

This will write the status code of the last run program to stdout. Now if our program worked, we should have gotten 2. Did you get it? If so, congratulations, we have our hello world! This gives us our output, but we can’t really see what is going on. To look inside, we are going to run lldb with our hello program as the target.

In your terminal type.

lldb ./hello

You should see something like the following.

(lldb) target create "./hello"
Current executable set to '../hello' (arm64).

First we are going to define a breakpoint where the program should stop when it runs. We want to break at our main label. To do that type the following.

(lldb) b main

Then we run the program with the following.

(lldb) r

Now the program should stop at the specified label. Now we can step through our program and inspect registers. First we step over our first mov instruction with the following.

(lldb) si

Remember that x0 is our status code. Now to read the contents type the following.

(lldb) re read x0

Now we should see the value that is inside, which is 2.

(lldb) re read x0
      x0 = 0x0000000000000002

To exit lldb type exit and then y.

Let’s build a server

Alright, now let’s build a server! We will write our program in server.s. Next to our program we will write some macros which will reside in macros.s. We will also create a makefile to make it easier to compile our program. Let’s create the following file structure.

/src
- server.s
- macros.s
/obj
/bin
makefile

Our makefile will contain the following.

BIN = server

ASSEMBLER = as
LINKER = ld
ENTRYPOINT = main
ARCH = arm64
LIB_SEARCH_PATH = System -syslibroot `xcrun -sdk macosx --show-sdk-path`

ODIR = obj
BDIR = bin
SDIR = src

BINPATH = $(BDIR)/$(BIN)

SRCS = $(wildcard $(SDIR)/*.s)
OBJS = $(patsubst $(SDIR)/%.s, $(ODIR)/%.o, $(SRCS))

$(shell mkdir -p $(ODIR) $(BDIR))

$(ODIR)/%.o: $(SDIR)/%.s
	$(ASSEMBLER) -o $@ $< -g

$(BINPATH): $(OBJS)
	$(LINKER) -o $(BINPATH) $(OBJS) -l$(LIB_SEARCH_PATH) -e $(ENTRYPOINT) -arch $(ARCH)

.PHONY: clean

clean:
	rm -rf *~ $(ODIR) $(BDIR)

What this basically does is it first takes all our source files and passes them to the assembler. The assembler then outputs object files in the obj folder. It then takes these object files and passes them to the linker. The linker then creates an executable and places it in our bin folder. Lastly we have a separate target clean which removes our obj and bin directories.

Now in server.s.

.global main
.align 4
.text

.include "./src/macros.s"

.equiv AF_INET, 0x2
.equiv SOCK_STREAM, 0x1
.equiv IPPROTO_IP, 0x0

.equiv STDOUT, 1

.equiv REQUEST_BUFFER_SIZE, 4096
.equiv ADDRESS_SIZE, 0x10
.equiv ADDRESS_LEN_SIZE, 0x10
/* server address + client address + client address length + request buffer */
.equiv STACK_SIZE_MAIN, ADDRESS_SIZE + ADDRESS_SIZE + ADDRESS_LEN_SIZE + REQUEST_BUFFER_SIZE

.equiv VAR_server_address, 0x10
.equiv VAR_client_address, 0x20
.equiv VAR_client_address_len, 0x30
.equiv VAR_request_buffer, 0x40

We start with the .global, .align, and .text directives. Then we have a new directive, the .include directive. With this directive we tell the assembler to include the file at the specified path. In our case that is our macros file. Next up we have another new directive, .equiv. With this directive we can define constant values that we want to refer to by name throughout our application. You also have .equ, which differs from .equiv in that it can be overridden. And since we don’t want that, we use .equiv. We have defined a bunch of constants. We will cover them once we use them. Some of them are probably already self-explanatory.

After that we define our main label.

...

main:
  mov x12, STACK_SIZE_MAIN
  stack.frame.create x12

  stack.frame.destroy x12

  mov w0, wzr
  sys.exit

Now is a good time to talk about functions in assembly.

Functions / subroutines

In assembly we don’t have functions like in more high level languages. We also don’t call them functions but subroutines. But we do have similar concepts like caller, callee, stack frame, return address, return value and function arguments. Our main label can be seen as a function. We can define our main function signature as int main(void). It takes zero arguments and returns an integer. From here on we will just stick to the term function.

Stack

Just like any function we need a way to store data locally within the function. And how do we do this you might ask? What we can do is store this data on the stack. The stack is a region of memory that we have access to when the program runs and is of fixed size. We can add or remove things from the stack. The way data is added, is by pushing it to the stack. And to retrieve / remove, we pop data from the stack. When we push data to the stack it grows down to lower memory addresses.

So how do we work with the stack? Recall that we have a register called sp. This is called the stack pointer. Its value is the memory address of the topmost item on the stack. We mentioned push and pop operations. Arm64 does not have push and pop instructions like arm32 does. The reason is that the stack pointer needs to be 16 byte aligned and a register is 8 bytes. So how do we push and pop items onto / off the stack? There are two instructions that we can use to place registers onto the stack: str (store register) and stp (store pair). And we have two corresponding instructions to retrieve items from the stack: ldr (load register) and ldp (load pair).

Recall that we said earlier that arm64 has a load-store architecture. This means in order to work with data in memory, we need to load from and store into registers. To do this we can use the instructions above in combination with addressing modes.

Load / Store addressing modes

Arm64 has the following addressing modes in their simplest form. Let’s look at these modes with the str and ldr instruction and use the registers x0 and sp.

Simple register [base]
This stores the value of x0 at the address held in sp, or loads x0 from that address.

str x0, [sp]
ldr x0, [sp]

Offset [base, #imm] [base, Xm]
This stores x0 at, or loads x0 from, the address in sp plus an offset. The offset can be an immediate value or a register.

str x0, [sp, #0x10]
ldr x0, [sp, #0x10]

Pre-indexed [base, #imm]!
This works the same as with an offset, except that after it calculates the address, it also writes that address back into sp.

// sp now contains the old address + 0x10
str x0, [sp, #0x10]!
ldr x0, [sp, #0x10]!

Post-indexed [base], #imm
This is just like the first mode, but afterwards it updates sp to contain the new address.

// sp now contains the old address + 0x10 after data is stored / retrieved
str x0, [sp], #0x10
ldr x0, [sp], #0x10

PC-relative load label
We don’t use this for the stack. It calculates the address of the referenced label as an offset from the program counter and stores that in the destination register. As an example

adr x0, message

This stores the address of message, calculated relative to pc, into register x0.

Let’s get back to the stack.

With str we can push a register onto the stack as follows.

str x0, [sp, #-16]!

We use pre-index addressing mode: sp is first decremented by 16 and x0 is then stored at the new address in sp. But with this instruction we waste 8 bytes, because sp needs to stay 16 byte aligned while x0 is only 8 bytes.

To pop a value from the stack into a register we use

ldr x0, [sp], #16

We use post-index addressing mode to load the value at the stack pointer and add 16 afterwards to the sp so that the stack pointer is restored to before pushing.

With stp we push two registers at once onto the stack like so

stp x0, x1, [sp, #-16]!

And popping them

ldp x0, x1, [sp], #16
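If the difference between pre-index and post-index is still a bit fuzzy, it may help to replay the push and pop of a pair in C. This is only a sketch on a simulated stack; the names sim_stack, sim_init, sim_push_pair and sim_pop_pair are made up for the illustration, and there is no overflow checking.

```c
#include <stdint.h>

/* A tiny simulated stack of sixteen 64-bit slots. It grows down, so an
 * empty stack has sp just past the highest slot. (Hypothetical names.) */
struct sim_stack {
    uint64_t mem[16];
    uint64_t *sp;
};

void sim_init(struct sim_stack *s)
{
    s->sp = s->mem + 16;
}

/* stp x0, x1, [sp, #-16]!  pre-index: sp moves down first, then we store */
void sim_push_pair(struct sim_stack *s, uint64_t x0, uint64_t x1)
{
    s->sp -= 2;          /* two 8 byte slots = 16 bytes */
    s->sp[0] = x0;
    s->sp[1] = x1;
}

/* ldp x0, x1, [sp], #16    post-index: we load first, then sp moves up */
void sim_pop_pair(struct sim_stack *s, uint64_t *x0, uint64_t *x1)
{
    *x0 = s->sp[0];
    *x1 = s->sp[1];
    s->sp += 2;
}
```

After one push followed by one pop, sp is back where it started and both values come back in the order they were stored.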

Stack frame

In a function we usually don’t push and pop; most of the time we want to access variables in random order. Luckily we are not limited to push and pop when operating on the stack. What we want is to allocate some space on the stack that we can use within our function. To tie a piece of the stack to our function we can create what is called a stack frame. Because each frame stores the frame pointer and return address of its caller, a chain of function calls can be traced back. In other words, you get a stack trace. And how do we make one? The way we do that is by decrementing the stack pointer by the amount of space we need. But remember, when we do this the sp needs to stay 16 byte aligned.

Ok, say we have three variables in our function, a, b and c, which are 8 bytes each, which gives us a total of 24 bytes. We need to round this total up so it meets the alignment criteria, which yields 32 bytes. We can now create our stack frame as follows

stp lr, fp, [sp, #-16]!
sub sp, sp, #32
mov fp, sp

We first store lr and fp onto the stack and use pre-indexing to subtract 16 bytes from sp. Then we subtract 32 bytes from sp and finally we move the value of sp into fp so our frame pointer and stack pointer align. This is also called the function prologue.
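The rounding from 24 to 32 bytes is the usual "round up to a multiple of 16" trick. A minimal C sketch, with round_up_16 being a name chosen just for this example:

```c
#include <stddef.h>

/* Round `size` up to the next multiple of 16, the alignment sp has to keep.
 * Adding 15 and clearing the low four bits does the rounding. */
size_t round_up_16(size_t size)
{
    return (size + 15u) & ~(size_t)15u;
}
```

So round_up_16(24) gives the 32 we used above, and a size that is already a multiple of 16 stays as it is.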

The way we store a variable is by using an offset from the fp. Because the stack grows down and fp points at the lowest address of our frame, we use a positive offset. The 32 bytes we reserved run from fp up to fp + 31; the saved lr and fp sit right above them at fp + 32. So we could store a, b and c as follows, assuming the values are in registers x0, x1 and x2.

stp x0, x1, [fp]
str x2, [fp, #16]

This way x0 is at the lowest address, followed by x1 and x2, and the last 8 bytes are padding. If we need them we simply load the values from the stack.

ldp x0, x1, [fp]

Now at the end of the function we need to clean up our stack. The way we do this is to increment the stack pointer and restore the registers we saved.

add sp, sp, #32
ldp lr, fp, [sp], #16

We first move the stack pointer up by 32 bytes. Next we restore lr and fp and move the stack pointer up by 16 bytes using post-indexing. This is also called the function epilogue.

Function arguments

So how do you pass arguments to a function? The way that is usually done in arm64 is that registers x0 to x7 are used for function arguments. If you want to pass more arguments to a function, they are stored on the stack. Registers x0 to x17 are called scratch registers, in that these registers may be changed by the function. So if you want to preserve them, you have to store them before you call the function. For registers x19 to x30 it is the responsibility of the callee to ensure that their contents are preserved after the function finishes.

Return value

Functions can also return values. The way values are typically returned from a function is by storing them in x0, and for values larger than 64 bits x1 is also used.

Calling a function

So how do we call a function? Suppose we have the following code.

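main:
  stp fp, lr, [sp, #-16]!   ; save fp and lr, bl is about to overwrite lr
  mov x0, #1
  mov x1, #2

  bl add_numbers

  ldp fp, lr, [sp], #16     ; restore fp and lr so ret goes back to our caller
  ret

add_numbers:
  ...

  ret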

We set up two arguments for our function in x0 and x1. We then have the instruction bl add_numbers. The bl instruction is called branch with link. It stores the memory address of the instruction that follows the bl in the link register (lr). It then branches to the memory location of our function add_numbers. When the ret instruction is executed in the called function, it branches back to the address in the lr register, which in our case is the instruction right after bl add_numbers in our main function. Because bl overwrites lr, a function that calls another function has to save its own lr first and restore it before its own ret. Saving fp and lr in a stack frame, as we did in the prologue, takes care of that.

In the main function of our server we create that stack frame by using the macro stack.frame.create, which we will write shortly.

At the end of our main function we also need to deallocate or destroy our stack frame. We do this with our macro stack.frame.destroy with the same size. At the end we exit the program with status code zero using our macro sys.exit.

So we need to write a macro. But first, what is a macro? If you have programmed in for example C, this will sound familiar. A macro is a way to generate code. Our assembler will replace every invocation of our macro with the code defined in that macro. This is helpful when our code is repetitive or verbose and we don’t want to write it over and over again, and we don’t want the overhead of a function call.
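For readers who know C, the comparison could look like the sketch below. MAX_OF, max_of and the two wrapper functions are names invented for this example. The macro is pasted into the caller as text, while the function is something the caller branches to (unless the compiler decides to inline it).

```c
/* Expanded in place by the preprocessor: no call, no return. */
#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))

/* The same logic as a function: the caller branches here and back. */
int max_of(int a, int b)
{
    return a > b ? a : b;
}

/* Both compute the same value; only the first has the code pasted in. */
int larger_via_macro(int x, int y)    { return MAX_OF(x, y); }
int larger_via_function(int x, int y) { return max_of(x, y); }
```

Our assembler macros work the same way as MAX_OF: every use is replaced by the instructions in the macro body.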

In macros.s add the following.

.ifndef __MACROS

.align 4

__MACROS:

.equiv SWI_SYSCALL, 0x80

.equiv SYS_exit, 1

.equiv STACK_entry_size, 0x10

/* Syscalls */

.macro sys.call
  svc SWI_SYSCALL
.endm

.macro sys.exit
  mov w16, SYS_exit
  sys.call
.endm

/* Stack */

.macro stack.frame.create $size
  sub sp, sp, STACK_entry_size
  sub sp, sp, \$size
  stp fp, lr, [sp]
  mov fp, sp
.endm

.macro stack.frame.destroy $size
  ldp fp, lr, [sp]
  add sp, sp, STACK_entry_size
  add sp, sp, \$size
.endm

.endif

We start with .ifndef __MACROS and end with .endif. We do this so that when this file is included a second time, it won’t load the contents if __MACROS is already defined. That is because an include just loads the contents of the source into the destination file. .ifndef only accepts a symbol, so we have added __MACROS as a blank label which does not do anything.

We will include a macro for every system call we use. Why do we do this? Well, for every system call we have some boilerplate. So instead of writing these instructions, we simply use our macro. For example, for the exit system call, instead of

mov x0, #0
mov x16, #1
svc 0x80

We will use

mov x0, #0
sys.exit

Not only do we write fewer instructions, we also state more clearly which system call we use.

Server socket

In order to create a server, we need a socket that a client can connect to. We need a server socket. So let’s create one. Let’s add the following in server.s.

...

...

/* Error messages */

error_message_socket: .string "Could not create socket\n"
error_message_socket_len = . - error_message_socket
.align 2

main:
  ...
  mov w0, AF_INET                                       ; domain = AF_INET
  mov w1, SOCK_STREAM                                   ; type = SOCK_STREAM
  mov w2, IPPROTO_IP                                    ; protocol = IPPROTO_IP
  sys.socket

  b.cs error_socket                                     ; if carry flag is set, jump to error_socket

  mov w19, w0                                           ; store server_socket fd in w19

  mov w0, w19                                           ; sockfd = server_socket;
  sys.close

  ...

/* Error handling */

error_socket:
  mov w9, w0                                            ; store error code in w9

  mov w0, STDOUT
  adr x1, error_message_socket
  ldr x2, =error_message_socket_len
  sys.write

  mov w0, w9
  sys.exit

In our main function we create our server socket with the socket system call. We can find the definition here. It is defined as follows.

int socket(int domain, int type, int protocol);

The function takes a domain, type and protocol and it returns a file descriptor if successful.

mov w0, AF_INET                                       ; domain = AF_INET
mov w1, SOCK_STREAM                                   ; type = SOCK_STREAM
mov w2, IPPROTO_IP                                    ; protocol = IPPROTO_IP
sys.socket

We set up our arguments and make the system call with sys.socket. If all goes well, it returns a file descriptor. But what if it fails? The C function returns -1 in that case and sets errno to the error code. But in our case, making the system call directly, the error code is returned in x0. So how do we know if an error occurred? Well, if an error occurred, the carry flag (can be found here) in the status register will be set. So we use the instruction

b.cs error_socket

What this does is that it jumps to the address of error_socket when the carry flag is set.

An overview of the error codes on macOS can be found here.
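To see the difference with the C side of things, here is a small sketch of how the same failure looks through the libc wrapper: the return value is -1 and the error code has to be picked up from errno, whereas our assembly gets the code directly in x0 with the carry flag set. The function name socket_or_negative_errno is made up for this example.

```c
#include <errno.h>
#include <sys/socket.h>

/* Returns the new file descriptor on success, or the negated errno value
 * on failure. (Hypothetical helper for illustration.) */
int socket_or_negative_errno(int domain, int type, int protocol)
{
    int fd = socket(domain, type, protocol);
    if (fd < 0)
        return -errno;   /* the error code our assembly would find in x0 */
    return fd;
}
```

Passing a domain that does not exist, for example -1, is an easy way to trigger the error path.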

In error_socket we use the write system call to write an error message to stdout.

ssize_t write(int fildes, const void *buf, size_t nbyte);

It takes a file descriptor, in our case stdout, an input buffer, in our case the address of error_message_socket, and the number of bytes to write, in our case the length of our message.

mov w0, STDOUT
adr x1, error_message_socket
ldr x2, =error_message_socket_len
sys.write

Our buffer is loaded into x1 with the adr instruction. What this does is load the address of error_message_socket.

error_message_socket: .string "Could not create socket\n"

At this label we use the .string directive to define our error message. The .string directive creates a null terminated ascii string.

ldr x2, =error_message_socket_len

In x2 we use ldr to load the value of error_message_socket_len. The = lets us load the constant value of the symbol error_message_socket_len itself, rather than reading from a memory address. Here we have defined

error_message_socket_len = . - error_message_socket

What this does is it takes the address of this location and subtracts the location of error_message_socket. What remains is the difference i.e. the length of our string.
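One detail worth knowing: since .string adds a terminating zero byte, the length we calculate this way includes that byte. The C sketch below shows the same thing with sizeof and strlen; the two function names are invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Same text as error_message_socket in the article. */
static const char error_message_socket[] = "Could not create socket\n";

/* sizeof counts every byte of the array, including the terminating NUL.
 * This matches `. - error_message_socket` placed right after .string. */
size_t message_size_with_nul(void)
{
    return sizeof error_message_socket;
}

/* strlen stops at the NUL, so it counts only the characters we see. */
size_t message_length_without_nul(void)
{
    return strlen(error_message_socket);
}
```

The message has 24 characters, so the first function returns 25 and the second returns 24. In practice this means our write call also sends the zero byte, which a terminal simply does not show.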

After writing the error, we exit the program setting the status code with our error code. For this we use our macro sys.exit.

In our main code, when we are successful, we go ahead and close the server socket with sys.close. The definition of close is

int close(int fd)

The rest of our main function is going to be written before the close call.

In macros.s we add the following.

...

...
.equiv SYS_write, 4
.equiv SYS_close, 6
.equiv SYS_socket, 97

...

.macro sys.socket
  mov w16, SYS_socket
  sys.call
.endm

.macro sys.close
  mov w16, SYS_close
  sys.call
.endm

.macro sys.write
  mov w16, SYS_write
  sys.call
.endm

...

.endif

Bind

Next we need to bind our socket to an address. What we use for this is, not surprisingly, the bind system call.

...

/* Error messages */

...
error_message_bind: .string "Could not bind to port\n"
error_message_bind_len = . - error_message_bind
.align 2

main:
  ...

  mov  x3, #0x0200                                      ; sin_len = 0, sin_family = PF_INET
  movk x3, #0xD204, lsl #0x10                           ; sin_port = 1234
  movk x3, 0x0000, lsl #0x20                            ; sin_addr = INADDR_ANY
  movk x3, 0x0000, lsl #0x30                            ; ...

  stack.store x3, VAR_server_address                    ; store server_address to the stack

  mov w0, w19                                           ; sockfd = server_socket;
  stack.loadadr x1, VAR_server_address                  ; *sockaddr = server_address;
  mov w2, ADDRESS_LEN_SIZE                              ; socklen_t = sizeof(server_address) = 16 bytes
  sys.bind

  b.cs error_bind

  ...

/* Error handling */

...

error_bind:
  mov w19, w0
  mov w0, STDOUT
  adr x1, error_message_bind
  ldr x2, =error_message_bind_len
  sys.write

  mov w0, w19
  sys.exit

The function definition for bind is

int bind(int socket, const struct sockaddr *address, socklen_t address_len)

It takes a socket, in our case our server socket, a pointer to an address, which we will need to create, and the length of the address as input. It returns 0 if it is successful. So we need to create a sockaddr struct. But what is it and how do we do that? For internet addresses the struct to use is sockaddr_in, which we can find in netinet/in.h.

struct sockaddr_in {
 __uint8_t       sin_len;
 sa_family_t     sin_family;
 in_port_t       sin_port;
 struct  in_addr sin_addr;
 char            sin_zero[8];
};

sin_len (uint8) This is not used, so we can just use 0.

sin_family (uint8) The address family, which for us is AF_INET, 0x02.

sin_port (uint16) The port number we want to use, which is 1234.

sin_addr (uint32) This is the address we want to bind to. Because we don’t care, we use the value of INADDR_ANY.

sin_zero (char[8]) This is just padding of 8 bytes.

If we add all of these bytes together, we get a total of 16 bytes. So we also know the address_len argument.
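If you want to double check the sizes and offsets, you can mirror the struct in C with fixed-width types. Note that bsd_sockaddr_in is a stand-in written for this sketch, not the real header definition; it only follows the field widths listed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the macOS struct sockaddr_in shown above, using
 * fixed-width types so the layout can be checked on any platform. */
struct bsd_sockaddr_in {
    uint8_t  sin_len;      /* offset 0 */
    uint8_t  sin_family;   /* offset 1 */
    uint16_t sin_port;     /* offset 2 */
    uint32_t sin_addr;     /* offset 4 */
    char     sin_zero[8];  /* offset 8 */
};

/* 1 + 1 + 2 + 4 + 8 = 16 bytes, with no padding between the fields. */
_Static_assert(sizeof(struct bsd_sockaddr_in) == 16,
               "sockaddr_in should be 16 bytes");
```

The offsets in the comments are the byte positions we have to respect when we build the value in a register.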

One thing to note is that the port and address need to be stored in network byte order, which is big endian. This means the most significant byte is stored at the smallest memory address. With little endian it is the other way around. Arm supports both, and on macOS little endian is used. Therefore we need to reverse the order of the bytes.
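For the port this comes down to swapping two bytes. A minimal C sketch of that swap, where swap_bytes_16 is a name chosen for the example (on a little endian machine this is the work htons does):

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. On a little endian machine this
 * turns a port number into network byte order. */
uint16_t swap_bytes_16(uint16_t value)
{
    return (uint16_t)((value << 8) | (value >> 8));
}
```

Port 1234 is 0x04D2 in hexadecimal, so swap_bytes_16(1234) gives 0xD204, which is the constant we use in the code.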

mov  x3, #0x0200                                      ; sin_len = 0, sin_family = PF_INET
movk x3, #0xD204, lsl #0x10                           ; sin_port = 1234
movk x3, 0x0000, lsl #0x20                            ; sin_addr = INADDR_ANY
movk x3, 0x0000, lsl #0x30                            ; ...

The way we do that is as follows. We first move sin_len, which is 0x00, and sin_family, which is 0x02, into the register in reverse order, giving 0x0200. We then use the instruction movk. This instruction writes a 16-bit value into one slice of the register and preserves what is already in the rest of it. So we move our port number in reverse order, shifted left by 16 bits. Port 1234 is 0x04D2, so the reversed value is 0xD204 and the register now holds 0xD2040200. Then we do the same for our sin_addr, which is just zeros. So we end up with 0x00000000D2040200. Because we must supply a pointer to the value, we store this value onto the stack with

stack.store x3, VAR_server_address
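If the mov and movk sequence feels like magic, it may help to model it in C. The sketch below is an approximation with hypothetical helper names (movk, build_server_address, byte_in_memory). It builds the same 64-bit value slice by slice and then shows which byte lands at which offset once the value is stored in little-endian memory.

```c
#include <assert.h>
#include <stdint.h>

/* Models movk: replace one 16-bit slice of reg, keep the other 48 bits. */
uint64_t movk(uint64_t reg, uint16_t imm, unsigned shift) {
    assert(shift == 0 || shift == 16 || shift == 32 || shift == 48);
    uint64_t mask = (uint64_t)0xFFFF << shift;
    return (reg & ~mask) | ((uint64_t)imm << shift);
}

/* Mirrors the four instructions that fill x3. */
uint64_t build_server_address(void) {
    uint64_t x3 = 0x0200;          /* mov  x3, #0x0200          */
    x3 = movk(x3, 0xD204, 16);     /* movk x3, #0xD204, lsl #16 */
    x3 = movk(x3, 0x0000, 32);     /* movk x3, #0x0000, lsl #32 */
    x3 = movk(x3, 0x0000, 48);     /* movk x3, #0x0000, lsl #48 */
    return x3;
}

/* Byte number i of value as it lies in little-endian memory. */
uint8_t byte_in_memory(uint64_t value, unsigned i) {
    assert(i < 8);
    return (uint8_t)(value >> (8 * i));
}
```

Reading the bytes back in memory order gives 0x00 (sin_len), 0x02 (sin_family), then 0x04 and 0xD2 (the port in network byte order), followed by four zero bytes for sin_addr.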

We can then make our call

mov w0, w19                                           ; sockfd = server_socket;
stack.loadadr x1, VAR_server_address                  ; *sockaddr = server_address;
mov w2, ADDRESS_LEN_SIZE                              ; socklen_t = sizeof(server_address) = 16 bytes
sys.bind

As you can see, we load the address of server_address with stack.loadadr.

Now if all goes well, the carry flag should not be set. If it is, we use the same error mechanism we used previously.

In macros.s add the following

...

...
.equiv SYS_bind, 104

...
.macro sys.bind
  mov w16, SYS_bind
  sys.call
.endm

/* Stack */

...

.macro stack.store $reg, $offset
  stp \$reg, xzr, [fp, \$offset]
.endm

.macro stack.loadadr $reg, $offset
  add \$reg, fp, \$offset
.endm

.endif

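We have added a macro for storing a value onto the stack and one for computing the address of a value on the stack, which is an offset from the current frame pointer. Note that stack.store uses stp with xzr as the second register, so it writes 16 bytes: our value followed by 8 zero bytes. For server_address those zero bytes are exactly the sin_zero padding.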
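To make the two stack macros a bit more tangible, here is a C model of them. It is a sketch under assumptions: the frame is a plain byte array, and stack_store, stack_loadadr and FRAME_BYTES are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { VAR_SERVER_ADDRESS = 0x10, FRAME_BYTES = 0x40 };

/* Models stack.store: stp reg, xzr, [fp, offset]
   writes reg followed by eight zero bytes. */
void stack_store(uint8_t *fp, uint64_t reg, size_t offset) {
    uint64_t zero = 0;
    assert(offset % 8 == 0);   /* stp encodes its offset as a multiple of 8 */
    memcpy(fp + offset, &reg, 8);
    memcpy(fp + offset + 8, &zero, 8);
}

/* Models stack.loadadr: add reg, fp, offset
   nothing is loaded from memory, we only compute an address. */
uint8_t *stack_loadadr(uint8_t *fp, size_t offset) {
    return fp + offset;
}
```

The pointer returned by stack_loadadr plays the role of x1 in the bind call: it points at 16 bytes that hold our packed address followed by the zero padding.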

Listen

Next, we need to listen on the socket for incoming connections. For that we use listen.

int listen(int fd, int backlog)

It takes a file descriptor, our server socket, and a backlog, which is the maximum number of pending connections. We set it to 1.

...

/* Error messages */

...
error_message_listen: .string "Error while listening\n"
error_message_listen_len = . - error_message_listen
.align 2

/* Messages */

message_listen: .string "Listening for connections\n\n"
message_listen_len = . - message_listen
.align 2

main:
  ...

  mov w0, w19                                           ; sockfd = server_socket;
  mov w1, #1                                            ; backlog = 1
  sys.listen

  b.cs error_listen

  mov w0, STDOUT
  adr x1, message_listen
  ldr x2, =message_listen_len
  sys.write

  ...

/* Error handling */

...

error_listen:
  mov w19, w0
  mov w0, STDOUT
  adr x1, error_message_listen
  ldr x2, =error_message_listen_len
  sys.write

  mov w0, w19
  sys.exit

We use sys.listen to make the system call. If something goes wrong, we use the same routine we should be familiar with. If we don’t have an error, we print a message to stdout that we are listening for connections. 🎉

In macros.s we add the following.

...

...
.equiv SYS_listen, 106

...

/* Syscalls */

...

.macro sys.listen
  mov w16, SYS_listen
  sys.call
.endm

...
.endif

Accept

When we get an incoming connection, we also need to accept that connection. The way we do that is, surprise surprise, accept.

int accept(int socket, struct sockaddr *restrict address, socklen_t *restrict address_len);

The accept function takes in a socket, in our case our server socket, a pointer to a buffer where the accept call stores the client address and a pointer to a buffer where the length of the client address is stored.

...

/* Error messages */

...
error_message_accept: .string "Error while accepting\n"
error_message_accept_len = . - error_message_accept
.align 2

...

main:
  ...

  /* Our main loop where we listen and respond to incoming connections */
  loop:
    stack.loadadr x7, VAR_client_address                ; load *client_address

    mov w0, w19                                         ; sockfd = server_socket;
    mov x1, x7                                          ; *address = client_address;
    stack.loadadr x2, VAR_client_address_len            ; *address_len;
    sys.accept

    b.cs error_accept

    mov w10, w0                                         ; store client_socket fd in w10

    b loop
  ...

/* Error handling */

...

error_accept:
  mov w19, w0
  mov w0, STDOUT
  adr x1, error_message_accept
  ldr x2, =error_message_accept_len
  sys.write

  mov w0, w19
  sys.exit

First we define a label loop. This loop keeps running until the program stops. We first load our client address pointer into x7. We then move the socket descriptor of our server socket into w0, the first argument of accept. We then move our client address pointer into x1, the second argument. Finally we load our third argument into x2, a pointer to our buffer for the client address length. Then we are ready to call accept with sys.accept. We then check to see if we have any errors.

The accept call waits until there is a pending connection. If it returns without an error, we store our client socket descriptor in w10. When we reach the end of the loop, we branch back up.

In macros.s add the following.
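For comparison, this is roughly what the listen and accept steps look like in C. Treat it as a sketch and not as a translation of our program. The helper names are hypothetical, and it binds to the loopback address on a port chosen by the kernel instead of INADDR_ANY and 1234, so that it can be tried without touching a real port. It also sets the address length to the buffer size before calling accept, because POSIX treats address_len as a value-result argument: the size of the buffer goes in and the length of the stored address comes out.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a TCP socket bound to 127.0.0.1 on a kernel-chosen port and
   starts listening with a backlog of 1. Returns the fd, or -1 on error. */
int listen_on_loopback(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);            /* 0 = let the kernel pick a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Blocks until a client connects. Returns the client fd (or -1) and
   stores the address length reported by the kernel in *len_out. */
int accept_one(int server_fd, socklen_t *len_out) {
    assert(len_out != NULL);
    struct sockaddr_in client;
    socklen_t len = sizeof client;       /* in: buffer size, out: bytes used */
    int fd = accept(server_fd, (struct sockaddr *)&client, &len);
    *len_out = len;
    return fd;
}
```

The descriptor returned by accept_one corresponds to the value we keep in w10.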

...

...
.equiv SYS_accept, 30
...


/* Syscalls */

...

.macro sys.accept
  mov w16, SYS_accept
  sys.call
.endm

...

Read

Now when we have accepted an incoming connection, we need to read the data that it sends, i.e. the request. We will be using read.

ssize_t read(int fildes, void *buf, size_t nbyte);

The arguments we need are: a file descriptor, a pointer to a buffer where the data is going to be stored, and the size of the buffer.

...

/* Error messages */

...
error_message_read: .string "Error while reading\n"
error_message_read_len = . - error_message_read
.align 2

...

main:
  ...

  /* Our main loop where we listen and respond to incoming connections */
  loop:
    ...

    mov w0, w10
    stack.loadadr x1, VAR_request_buffer                ; *buffer = buffer;
    mov x2, REQUEST_BUFFER_SIZE                         ; length = REQUEST_BUFFER_SIZE;
    sys.read

    b.cs error_read

    stack.loadadr x1, VAR_request_buffer
    mov x2, x0
    mov x0, STDOUT
    sys.write

    ...

    b loop

  ...

/* Error handling */

...

error_read:
  mov w19, w0
  mov w0, STDOUT
  adr x1, error_message_read
  ldr x2, =error_message_read_len
  sys.write

  mov w0, w19
  sys.exit

We first move our client socket descriptor from w10 to w0, the first argument. For the second argument, the pointer to the buffer, we compute the address of VAR_request_buffer on the stack and store it in x1. We then move the buffer size into x2, which will be our final argument. Then we call read with sys.read. The first thing we do again is check if there are errors and handle them. If we don’t have an error, x0 holds the number of bytes that were read, and we write that many bytes of the request to STDOUT so we can see what was sent.

In macros.s add the following.
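The same read-then-echo step can be sketched in C. The function name read_and_echo is hypothetical, and the output descriptor is a parameter here so the function is easy to try out, whereas our assembly always writes to STDOUT.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum { REQUEST_BUFFER_SIZE = 4096 };

/* Reads at most REQUEST_BUFFER_SIZE bytes from fd into buffer and writes
   the bytes that were actually read to out_fd.
   Returns the number of bytes read, or -1 on error. */
ssize_t read_and_echo(int fd, char *buffer, int out_fd) {
    assert(buffer != NULL);
    ssize_t n = read(fd, buffer, REQUEST_BUFFER_SIZE);
    if (n < 0) return -1;
    /* Like mov x2, x0 in the assembly: the length of the write is the
       return value of read, not the size of the buffer. */
    if (n > 0 && write(out_fd, buffer, (size_t)n) < 0) return -1;
    return n;
}
```

A call such as read_and_echo(client_fd, buffer, STDOUT_FILENO) would mirror what our loop does, with client_fd standing in for w10.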

...

...
.equiv SYS_read, 3
...

/* Syscalls */

...

.macro sys.read
  mov w16, SYS_read
  sys.call
.endm

.macro sys.close
  mov w16, SYS_close
  sys.call
.endm

...

Send

Now that we have processed the request, we want to respond to it. Because it would be rude not to, right? We could use write for that, but we are not going to, because we are rebels and because I could not get what I wanted to use, which is send, to work. So we are going to use sendto.

ssize_t sendto(
  int socket, 
  const void *buffer, 
  size_t length,
  int flags, 
  const struct sockaddr *dest_addr, 
  socklen_t dest_len)

With this system call we can send a message to another socket. It takes as arguments: a socket descriptor, a pointer to a buffer containing the data to be sent, the length of that buffer, additional flags (which we won’t be using), a destination address and the length of that address. Because we have a connected socket, which already knows the address of its peer, we don’t have to specify the destination address or its length.

...

/* Error messages */

...
error_message_send: .string "Error while sending data\n"
error_message_send_len = . - error_message_send
.align 2

/* Messages */

...
response: .string "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 15\r\n\r\nHello, world!\n"
response_len = . - response
.align 2

main:
  ...

  /* Our main loop where we listen and respond to incoming connections */
  loop:
    ...

    mov w0, w10                                         ; sockfd = client_socket;
    adr x1, response                                    ; *buffer = response;
    ldr x2, =response_len                               ; length = response_len;
    mov w3, wzr                                         ; flags = 0;
    mov w4, wzr                                         ; *dest_addr = 0;
    mov w5, wzr                                         ; dest_len = 0;
    sys.sendto

    b.cs error_send

    b loop

  ...

  ...

/* Error handling */ 

...

error_send:
  mov w19, w0
  mov w0, STDOUT
  adr x1, error_message_send
  ldr x2, =error_message_send_len
  sys.write

  mov w0, w19
  sys.exit

We start by moving our client socket descriptor from w10 into w0, our first argument. Then we load the address of our response message into x1, our second argument, a pointer to the buffer. Next we load the value of response_len into x2, our third argument, the length of the buffer. Since we won’t be using the last three arguments, we simply set them to zero by moving wzr into w3, w4 and w5. With all our arguments set we can make the system call with sys.sendto.
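In C the same call is a one-liner. The sketch below uses a hypothetical wrapper named send_response. Passing a null destination and a length of 0 is what POSIX describes as the equivalent of send, which is why sendto can stand in for it on a connected socket.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sends len bytes on an already connected socket.
   flags = 0, dest_addr = NULL and dest_len = 0, just like w3, w4 and w5
   in the assembly. Returns the number of bytes sent, or -1 on error. */
ssize_t send_response(int client_fd, const void *buffer, size_t len) {
    assert(buffer != NULL);
    return sendto(client_fd, buffer, len, 0, NULL, 0);
}
```

On an unconnected socket the same call would fail, because then the kernel has no peer address to send to.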

Of course the last thing we need to do is our error checking and handling.

In macros.s add the following.

...

....
.equiv SYS_sendto, 133

...

/* Syscalls */

...
.macro sys.sendto
  mov w16, SYS_sendto
  sys.call
.endm

...

That was everything we needed for our http server! Our final program should look like the following.

.global main
.align 4
.text

.include "./src/macros.s"

.equiv AF_INET, 0x2
.equiv SOCK_STREAM, 0x1
.equiv IPPROTO_IP, 0x0

.equiv STDOUT, 1

.equiv REQUEST_BUFFER_SIZE, 4096
.equiv ADDRESS_SIZE, 0x10
.equiv ADDRESS_LEN_SIZE, 0x10

/* server_address + client_address + client_address_len + request_buffer */
.equiv STACK_SIZE_MAIN, REQUEST_BUFFER_SIZE + 2 * ADDRESS_SIZE + ADDRESS_LEN_SIZE

.equiv VAR_server_address, 0x10
.equiv VAR_client_address, 0x20
.equiv VAR_client_address_len, 0x30
.equiv VAR_request_buffer, 0x40

/* Error messages */
error_message_socket: .string "Could not create socket\n"
error_message_socket_len = . - error_message_socket
.align 2
error_message_bind: .string "Could not bind to port\n"
error_message_bind_len = . - error_message_bind
.align 2
error_message_listen: .string "Error while listening\n"
error_message_listen_len = . - error_message_listen
.align 2
error_message_accept: .string "Error while accepting\n"
error_message_accept_len = . - error_message_accept
.align 2
error_message_read: .string "Error while reading\n"
error_message_read_len = . - error_message_read
.align 2
error_message_send: .string "Error while sending data\n"
error_message_send_len = . - error_message_send
.align 2

/* Messages */
message_listen: .string "Listening for connections\n\n"
message_listen_len = . - message_listen
.align 2
response: .string "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 15\r\n\r\nHello, world!\n"
response_len = . - response
.align 2

main:
  mov x12, STACK_SIZE_MAIN
  stack.frame.create x12

  mov w0, AF_INET                                       ; domain = AF_INET
  mov w1, SOCK_STREAM                                   ; type = SOCK_STREAM
  mov w2, IPPROTO_IP                                    ; protocol = IPPROTO_IP
  sys.socket

  b.cs error_socket                                     ; if carry flag is set, jump to error_socket

  mov w19, w0                                           ; store server_socket fd in w19

  mov  x3, #0x0200                                      ; sin_len = 0, sin_family = AF_INET
  movk x3, #0xD204, lsl #0x10                           ; sin_port = 1234
  movk x3, #0x0000, lsl #0x20                           ; sin_addr = INADDR_ANY
  movk x3, #0x0000, lsl #0x30                           ; ...

  stack.store x3, VAR_server_address                    ; store server_address to the stack

  mov w0, w19                                           ; sockfd = server_socket;
  stack.loadadr x1, VAR_server_address                  ; *sockaddr = server_address;
  mov w2, ADDRESS_LEN_SIZE                              ; socklen_t = sizeof(server_address) = 16 bytes
  sys.bind

  b.cs error_bind                                       ; if carry flag is set, jump to error_bind

  mov w0, w19                                           ; sockfd = server_socket;
  mov w1, #1                                            ; backlog = 1
  sys.listen

  b.cs error_listen

  mov w0, STDOUT
  adr x1, message_listen
  ldr x2, =message_listen_len
  sys.write

  /* Our main loop where we listen and respond to incoming connections */
  loop:
    stack.loadadr x7, VAR_client_address                ; load *client_address

    mov w0, w19                                         ; sockfd = server_socket;
    mov x1, x7                                          ; *address = client_address;
    stack.loadadr x2, VAR_client_address_len            ; *address_len;
    sys.accept

    b.cs error_accept

    mov w10, w0                                         ; store client_socket fd in w10

    mov w0, w10
    stack.loadadr x1, VAR_request_buffer                ; *buffer = buffer;
    mov x2, REQUEST_BUFFER_SIZE                         ; length = REQUEST_BUFFER_SIZE;
    sys.read

    b.cs error_read

    stack.loadadr x1, VAR_request_buffer
    mov x2, x0
    mov x0, STDOUT
    sys.write

    mov w0, w10                                         ; sockfd = client_socket;
    adr x1, response                                    ; *buffer = response;
    ldr x2, =response_len                               ; length = response_len;
    mov w3, #0                                          ; flags = 0;
    mov w4, wzr                                         ; *dest_addr = 0;
    mov w5, wzr                                         ; dest_len = 0;
    sys.sendto

    b.cs error_send

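    mov w0, w10                                         ; sockfd = client_socket;
    sys.close                                           ; close the client connection before accepting the next one

    b loop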

  mov w0, w19                                           ; sockfd = server_socket;
  sys.close

  stack.frame.destroy x12

  mov w0, wzr
  sys.exit

  error_socket:
    mov w19, w0
    mov w0, STDOUT
    adr x1, error_message_socket
    ldr x2, =error_message_socket_len
    sys.write

    mov w0, w19
    sys.exit

  error_bind:
    mov w19, w0
    mov w0, STDOUT
    adr x1, error_message_bind
    ldr x2, =error_message_bind_len
    sys.write

    mov w0, w19
    sys.exit

  error_listen:
    mov w19, w0
    mov w0, STDOUT
    adr x1, error_message_listen
    ldr x2, =error_message_listen_len
    sys.write

    mov w0, w19
    sys.exit

  error_accept:
    mov w19, w0
    mov w0, STDOUT
    adr x1, error_message_accept
    ldr x2, =error_message_accept_len
    sys.write

    mov w0, w19
    sys.exit

  error_read:
    mov w19, w0
    mov w0, STDOUT
    adr x1, error_message_read
    ldr x2, =error_message_read_len
    sys.write

    mov w0, w19
    sys.exit

  error_send:
    mov w19, w0
    mov w0, STDOUT
    adr x1, error_message_send
    ldr x2, =error_message_send_len
    sys.write

    mov w0, w19
    sys.exit

And macros.s

.ifndef __MACROS

.align 4

__MACROS:

.equiv SWI_SYSCALL, 0x80

.equiv SYS_exit, 1
.equiv SYS_read, 3
.equiv SYS_write, 4
.equiv SYS_close, 6
.equiv SYS_accept, 30
.equiv SYS_socket, 97
.equiv SYS_bind, 104
.equiv SYS_listen, 106
.equiv SYS_sendto, 133

.equiv STACK_entry_size, 0x10

/* Syscalls */

.macro sys.call
  svc SWI_SYSCALL
.endm

.macro sys.exit
  mov w16, SYS_exit
  sys.call
.endm

.macro sys.socket
  mov w16, SYS_socket
  sys.call
.endm

.macro sys.close
  mov w16, SYS_close
  sys.call
.endm

.macro sys.write
  mov w16, SYS_write
  sys.call
.endm

.macro sys.bind
  mov w16, SYS_bind
  sys.call
.endm

.macro sys.listen
  mov w16, SYS_listen
  sys.call
.endm

.macro sys.accept
  mov w16, SYS_accept
  sys.call
.endm

.macro sys.read
  mov w16, SYS_read
  sys.call
.endm

.macro sys.sendto
  mov w16, SYS_sendto
  sys.call
.endm

/* Stack */

.macro stack.frame.create $size
  sub sp, sp, STACK_entry_size
  sub sp, sp, \$size
  stp fp, lr, [sp]
  mov fp, sp
.endm

.macro stack.frame.destroy $size
  ldp fp, lr, [sp]
  add sp, sp, STACK_entry_size
  add sp, sp, \$size
.endm

.macro stack.store $reg, $offset
  stp \$reg, xzr, [fp, \$offset]
.endm

.macro stack.loadadr $reg, $offset
  add \$reg, fp, \$offset
.endm

.endif

Now we can compile and run our program with.

make
./bin/server

We should see the following in our terminal.

Listening for connections

Now if we go to our browser and type in localhost:1234, we should get Hello, world! printed to the screen. And in our terminal we should be able to see the request as follows.

GET / HTTP/1.1
Host: localhost:1234
Connection: keep-alive
Cache-Control: max-age=0
sec-ch-ua: "Not(A:Brand";v="24", "Chromium";v="122"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
DNT: 1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9

And that’s it, we have created our http server in arm64 assembly for macOS! Isn’t that something.
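One detail that is easy to miss is why the response says Content-Length: 15 while Hello, world! plus a newline is only 14 characters. The C sketch below is a hedged way to check this, assuming that .string appends a terminating zero byte the way .asciz does, so that response_len counts one extra byte. The function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Same bytes as the response in our program. sizeof counts the trailing
   zero byte, like `. - response` after a `.string` directive (assumption). */
static const char response[] =
    "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n"
    "Content-Length: 15\r\n\r\nHello, world!\n";

/* Number of bytes that follow the blank line, given the total number of
   bytes that are sent. */
size_t body_bytes(const char *msg, size_t total) {
    const char *blank = strstr(msg, "\r\n\r\n");
    assert(blank != NULL);
    return total - (size_t)(blank + 4 - msg);
}

/* The value announced in the Content-Length header. */
long declared_length(const char *msg) {
    const char *header = strstr(msg, "Content-Length:");
    assert(header != NULL);
    return strtol(header + strlen("Content-Length:"), NULL, 10);
}
```

With the zero byte included the body is 15 bytes, which matches the header. If the response were declared with .ascii instead, the body would be 14 bytes and the header would have to change with it.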

Conclusion

And that wraps up how to build an “HTTP Server in arm64 assembly Apple silicon M1”. I hope you learned something and I wish you good luck on the rest of your arm64 assembly journey.

References

Smith 2020
S. Smith, Programming with 64-Bit ARM Assembly Language: Single Board
Computer Development for Raspberry Pi and Mobile Devices, New York: Apress 2020

‘Arm instruction set reference’, https://developer.arm.com/documentation/100076/0100/A64-Instruction-Set-Reference/A64-General-Instructions

‘Introduction to ARM64v8’, https://book.hacktricks.xyz/macos-hardening/macos-security-and-privilege-escalation/macos-apps-inspecting-debugging-and-fuzzing/arm64-basic-assembly

‘Arm64 Tutorial’, https://mariokartwii.com/armv8/

‘MacOS system calls’, https://opensource.apple.com/source/xnu/xnu-1504.3.12/bsd/kern/syscalls.master

‘FreeBSD man pages’, https://man.freebsd.org/cgi/man.cgi?query=&apropos=0&sektion=2&manpath=macOS+14.3.1&arch=default&format=html

‘Linux system call table’, https://www.chromium.org/chromium-os/developer-library/reference/linux-constants/syscalls/

‘An introduction to arm64 assembly — macOS adaptations’, https://github.com/below/HelloSilicon?tab=readme-ov-file

‘Using the Stack in AArch64: Implementing Push and Pop’, https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/using-the-stack-in-aarch64-implementing-push-and-pop

‘Using the Stack in AArch32 and AArch64’, https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/using-the-stack-in-aarch32-and-aarch64

‘iOS ARM64 Syscalls’, https://stackoverflow.com/questions/56985859/ios-arm64-syscalls

‘xnu’, https://github.com/apple-oss-distributions/xnu

‘Writing ARM64 code for Apple platforms’, https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms