DEV Community

Weslley Neves

Your FIRST STEPS on the ASSEMBLY Programming Language!

Introduction

I will not focus on history. I will assume you are a beginner in Assembly, and I will explain it so that anyone interested can read assembly code and really understand it. The assembler used in this article is NASM (Netwide Assembler), and we will be writing x86_64 Linux assembly.


Why Assembly?

Assembly is sometimes nicknamed "The Father of Programming Languages" because it sits just one step above the raw machine code executed by a computer's hardware. Each instruction maps closely to something the CPU actually does, which makes Assembly a bridge between high-level languages and the hardware and gives you precise control over system resources and performance.

So Assembly is worth learning not only for those coming from C/C++, but for anyone who wants to understand what happens in their code and their computer at the bit level.

Today's code

Here is the code we will look at today:

section .data
  say db "Say something!", 0xA, 0
  say_len equ $ - say

  input_char db "> ", 0
  input_char_len equ $ - input_char

  said db "You said: ", 0
  said_len equ $ - said

section .bss
  input resb 128

section .text
global _start

_start:
  mov rax, 1
  mov rdi, 1
  mov rsi, say
  mov rdx, say_len
  syscall

  mov rax, 1
  mov rdi, 1
  mov rsi, input_char
  mov rdx, input_char_len
  syscall

  mov rax, 0
  mov rdi, 0
  mov rsi, input
  mov rdx, 128
  syscall
  mov r12, rax

  mov rax, 1
  mov rdi, 1
  mov rsi, said
  mov rdx, said_len
  syscall

  mov rax, 1
  mov rdi, 1
  mov rsi, input
  mov rdx, r12
  syscall

  mov rax, 60
  mov rdi, 0
  syscall

By the end of this article, you will be able to read it again and understand what it does and how it works. But first, we need to cover a few fundamentals.


Assembly's code structure

Assembly code is divided into segments, or sections (I prefer to say sections), and each section has its own responsibility. These are the main sections:

  • .data: Initialized, writable data. A label here names the address where that data starts.
  • .rodata: Initialized, read-only data (constants).
  • .bss: Uninitialized data. Space is only reserved here, and on Linux it is zero-filled when the program starts.
  • .text: The actual executable code.
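If you come from C, a rough mental map may help. The sketch below shows which section a typical Linux/ELF toolchain such as GCC usually places each kind of C object in. This placement is a common convention assumed for illustration, not something the C standard guarantees, and all names are hypothetical.

```c
#include <stddef.h>

/* Where a typical Linux/ELF toolchain (e.g. GCC) usually puts each object.
 * This is a common convention, not something the C standard guarantees. */
static int data_counter = 5;                 /* initialized, writable  -> .data   */
static const char rodata_greeting[] = "Hi";  /* initialized, read-only -> .rodata */
static char bss_buffer[128];                 /* no initializer, zeroed -> .bss    */

/* The compiled machine code of this function -> .text */
static size_t bss_buffer_size(void) {
    return sizeof bss_buffer;
}
```

Like space reserved in .bss, `bss_buffer` starts out filled with zeros even though it has no initializer.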

Defined and Reserved bytes

This is how you create the Assembly equivalent of variables and constants.

Defined bytes

Defined bytes are created when you assign a value to an identifier. You need three elements:

  1. The name of the identifier (label).
  2. The size in bytes.
  3. The value.

For example, in C, you might write char hello[] = "Hello, World!";. In Assembly, a rough equivalent is:

hello db "Hello, World!"

The db directive stands for "define byte": each value listed after it is stored as a single byte, so a string is stored one byte per character. Unlike a C string literal, db does not add a null terminator for you; if you need one, add , 0 explicitly.

You can also define data in larger units when you need to. These are the available directives:

  • db: Defines in chunks of 1 byte (8 bits).
  • dw: Define word. Defines in chunks of 2 bytes (16 bits).
  • dd: Define double-word. Defines in chunks of 4 bytes (32 bits).
  • dq: Define quad-word. Defines in chunks of 8 bytes (64 bits).
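For readers coming from C, the data directives correspond roughly to arrays of fixed-width integers. The sketch below uses hypothetical names to show how many bytes each directive produces for three values. It also checks one detail the list does not mention: x86_64 is little-endian, so the value defined by dw 0x1234 is stored in memory as the byte 0x34 followed by 0x12 (the check assumes a little-endian host).

```c
#include <stdint.h>
#include <string.h>

/* Rough C counterparts of the NASM data directives (an analogy, not what NASM emits):
 *   db 1, 2, 3  -> 3 values * 1 byte  =  3 bytes
 *   dw 1, 2, 3  -> 3 values * 2 bytes =  6 bytes
 *   dd 1, 2, 3  -> 3 values * 4 bytes = 12 bytes
 *   dq 1, 2, 3  -> 3 values * 8 bytes = 24 bytes */
static const uint8_t  db_values[] = {1, 2, 3};
static const uint16_t dw_values[] = {1, 2, 3};
static const uint32_t dd_values[] = {1, 2, 3};
static const uint64_t dq_values[] = {1, 2, 3};

/* Returns the first byte stored in memory for the 16-bit value 0x1234.
 * On a little-endian CPU such as x86_64 this is the low byte, 0x34. */
static uint8_t first_byte_of_word(void) {
    uint16_t w = 0x1234;
    uint8_t first;
    memcpy(&first, &w, 1);
    return first;
}
```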

Reserved bytes

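Reserved bytes set aside space without giving it an initial value. They are typically used in .bss for buffers you will fill at runtime, and the label is the address where the reserved space starts. These are the directives: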

  • resb: Reserves in chunks of 1 byte (8 bits).
  • resw: Reserve word. Reserves in chunks of 2 bytes (16 bits).
  • resd: Reserve double-word. Reserves in chunks of 4 bytes (32 bits).
  • resq: Reserve quad-word. Reserves in chunks of 8 bytes (64 bits).
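In C terms, a reserve directive is roughly a static array without an initializer: it also ends up in .bss and starts out zero-filled. The sketch below is an analogy with hypothetical names (only input matches the program above) showing how the count passed to each directive is multiplied by its unit size.

```c
#include <stdint.h>

/* Rough C counterparts of NASM's reserve directives (an analogy for illustration).
 * Static arrays without an initializer start out zero-filled, like .bss space. */
static uint8_t  input[128];     /* input resb 128 -> 128 * 1 = 128 bytes */
static uint16_t res_words[4];   /* resw 4         ->   4 * 2 =   8 bytes */
static uint32_t res_dwords[4];  /* resd 4         ->   4 * 4 =  16 bytes */
static uint64_t res_qwords[4];  /* resq 4         ->   4 * 8 =  32 bytes */
```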

Registers

Registers are small, very fast storage locations inside the CPU, each holding one value at a time. Since we are coding in x86_64, these are the general-purpose registers of this architecture:

General Purpose Registers

  • RAX: Accumulator (arithmetic operations; also holds function return values and syscall numbers).
  • RBX: Base (general-purpose, preserved across function calls).
  • RCX: Counter (used in loops and some instructions like rep).
  • RDX: Data (arithmetic operations and I/O).
  • RSI: Source Index (string operations and general-purpose).
  • RDI: Destination Index (string operations and general-purpose).
  • RBP: Base Pointer (used to access local variables, preserved across function calls).
  • RSP: Stack Pointer (points to the top of the stack, used in function calls and flow control).
  • R8 to R15: Additional general-purpose registers.

Deeper look into registers


Taking RAX as an example, each 64-bit general-purpose register can also be accessed in smaller parts:

  • RAX: The full 64-bit register. The R prefix came with x86_64 and is commonly read as "register", as in R8 to R15. As previously said, it is used in arithmetic operations and function calls.
  • EAX: Extended AX. The lower 32 bits of RAX.
  • AX: The lower 16 bits of RAX.
  • AL: The lower 8 bits of AX (its least significant byte).
  • AH: The upper 8 bits of AX (its most significant byte).
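To see how these names overlap, the sketch below models RAX as a plain 64-bit integer in C and extracts each part with casts and shifts. It is an arithmetic model with hypothetical helper names, not real register access. It also models one x86_64 rule worth knowing: writing an 8-bit or 16-bit part leaves the rest of RAX unchanged, while writing EAX clears the upper 32 bits of RAX.

```c
#include <stdint.h>

/* Arithmetic model of the parts of RAX (not real register access). */
static uint32_t eax_of(uint64_t rax) { return (uint32_t)rax; }        /* bits 0..31 */
static uint16_t ax_of(uint64_t rax)  { return (uint16_t)rax; }        /* bits 0..15 */
static uint8_t  al_of(uint64_t rax)  { return (uint8_t)rax; }         /* bits 0..7  */
static uint8_t  ah_of(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* bits 8..15 */

/* Models `mov al, v`: only bits 0..7 change, the rest of RAX is kept. */
static uint64_t write_al(uint64_t rax, uint8_t v) {
    return (rax & ~(uint64_t)0xFF) | v;
}

/* Models `mov eax, v`: in 64-bit mode a 32-bit write zero-extends,
 * so bits 32..63 of RAX are cleared. */
static uint64_t write_eax(uint64_t rax, uint32_t v) {
    (void)rax;  /* the old value is fully replaced */
    return (uint64_t)v;
}
```

For RAX = 0x1122334455667788, this gives EAX = 0x55667788, AX = 0x7788, AL = 0x88 and AH = 0x77.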



System Calls

System calls, or syscalls, are the interface between a user program and the operating system kernel. They allow programs to request services from the kernel, such as reading from or writing to files, allocating memory, or terminating a process. In x86_64 Assembly, the syscall instruction is used to invoke these services.


How System Calls Work

When a program makes a syscall:

  1. The program sets specific values in registers to indicate the syscall number and its parameters.
  2. The syscall instruction is executed.
  3. The operating system processes the request and returns a result, typically in a register.

You can consult a Linux x86_64 syscall table for the full list of syscall numbers and their arguments.

Registers Used in Syscalls

  • RAX: Contains the syscall number (identifies the service to invoke).
  • RDI: The first argument for the syscall.
  • RSI: The second argument for the syscall.
  • RDX: The third argument for the syscall.
  • R10: The fourth argument for the syscall.
  • R8: The fifth argument for the syscall.
  • R9: The sixth argument for the syscall.
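  • The return value of the syscall is stored in RAX. On error, Linux returns a negative value (the negated errno code).
  • The syscall instruction itself overwrites RCX and R11, so don't keep values you still need in those registers across a syscall.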
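If you come from C, glibc's syscall() wrapper lets you issue the same raw syscalls and see the register convention at work. The sketch below assumes Linux with glibc; the helper names raw_write and raw_puts are hypothetical. The register mapping in the comments is the x86_64 one; on other architectures syscall() follows that architecture's convention instead.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issues the raw write syscall through glibc's syscall() wrapper.
 * On x86_64 Linux the wrapper loads RAX = SYS_write (1), RDI = fd,
 * RSI = buf, RDX = len, then executes `syscall`; the result comes back in RAX.
 * On failure, the wrapper returns -1 and sets errno. */
static long raw_write(int fd, const char *buf, size_t len) {
    return syscall(SYS_write, fd, buf, len);
}

/* Writes a NUL-terminated C string to stdout (fd 1), without the '\0'. */
static long raw_puts(const char *s) {
    return raw_write(1, s, strlen(s));
}
```

For example, raw_puts("Hello from syscall()\n") returns 21, the number of bytes written. Writing to an invalid file descriptor such as -1 returns -1.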

Example: Writing to Standard Output

Below is a simple example where the program writes "Hello, World!" to the terminal using the write syscall:

section .data
    hello db "Hello, World!", 0xA, 0 ; The message to write, followed by a newline and a null terminator

section .text
global _start

_start:
    ; Syscall: write
    mov rax, 1          ; Syscall number for write (1)
    mov rdi, 1          ; File descriptor for standard output (1)
    mov rsi, hello      ; Address of the message
    mov rdx, 15         ; Length: 13 characters + newline + null terminator
    syscall             ; Make the syscall

    ; Syscall: exit
    mov rax, 60         ; Syscall number for exit (60)
    mov rdi, 0          ; Exit status (0)
    syscall             ; Make the syscall


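The mov instruction copies a value into a register or memory location. The source can be a constant (such as 60), another register, or memory. Note that mov rsi, hello loads the address of hello, not its contents; to load the first byte stored there, you would write mov al, [hello].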
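In C terms, the difference between mov rsi, hello and mov al, [hello] is the difference between a pointer and the value it points to. The helper names below are hypothetical and only mirror the two forms of mov.

```c
#include <stdint.h>

static const char hello[] = "Hello, World!\n";

/* Like `mov rsi, hello`: take the address where the data starts. */
static const char *address_of_hello(void) {
    return hello;
}

/* Like `mov al, [hello]`: read the byte stored at that address. */
static uint8_t first_byte_of_hello(void) {
    return (uint8_t)hello[0];
}
```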


Labels

Labels in Assembly are identifiers followed by a : (colon). They act as markers in the code, serving as targets for jumps and loops or as names for data. Think of them as "raw functions" or "bookmarks" within your program.

Types of Labels

There are two main types of labels in Assembly:

  1. Local Labels: Written with a leading . (dot). In NASM, a local label belongs to the most recent non-local label above it, so the same name (such as .loop) can be reused under different labels.
  2. Global Labels: Regular labels without a leading dot. They are visible throughout the file, and marking one with the global directive also makes it visible to the linker and to other files.

Declaring Labels

A label is simply an identifier followed by a colon (:). For example:

start:         ; A regular (non-local) label; visible to other files only if declared with global
.loop:         ; A local label, scoped to start (its full name is start.loop)

Using labels for control flow

Labels are often used with control flow instructions like jmp (unconditional jump) or je (jump if equal). Here's an example of how labels are used in a loop:

section .text
global _start

_start:
    mov ecx, 5          ; Set the counter (ecx) to 5

.loop:                  ; Start of the loop
    dec ecx             ; Decrement the counter
    jnz .loop           ; Jump back to .loop if ecx != 0

    ; Exit the program
    mov rax, 60         ; Syscall for exit
    mov rdi, 0          ; Exit code 0
    syscall

In this example:

  • .loop is a local label.
  • The program jumps back to .loop while the counter (ecx) is not zero: dec sets the zero flag when ecx reaches 0, and jnz then falls through, so the loop body runs 5 times.
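The dec/jnz pair behaves like a C do-while loop: the body runs once before the first check. The sketch below (the function name is hypothetical) counts how many times the loop body runs for a given starting value of ECX.

```c
/* C counterpart of:
 *       mov ecx, 5
 *   .loop:
 *       dec ecx
 *       jnz .loop
 * `dec` updates the zero flag and `jnz` jumps back while the result is non-zero. */
static int count_loop_iterations(unsigned int ecx) {
    int iterations = 0;
    do {
        ecx--;           /* dec ecx    */
        iterations++;
    } while (ecx != 0);  /* jnz .loop  */
    return iterations;
}
```

count_loop_iterations(5) returns 5. One consequence of this design: if ECX starts at 0, dec wraps it around to 0xFFFFFFFF, and the loop runs 2^32 times instead of zero times.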

Using Global Labels

Global labels can be accessed across files when linking multiple Assembly files. To declare a global label, use the global directive:

File 1: function.asm

section .text
global my_function

my_function:
    ; Code for the function
    ret ; returns to the caller 

File 2: call_function.asm

extern my_function     ; Declare the external function

section .text
global _start

_start:
    call my_function   ; Call the external function

    ; Exit the program
    mov rax, 60
    mov rdi, 0
    syscall


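The global _start directive in the main file exports the _start label so the linker (ld) can use it as the program's entry point. To build the two files, assemble each one with nasm -f elf64 and pass both object files to ld, for example ld call_function.o function.o -o call_function.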


Offset calculation

  1. say db "Say something!", 0xA, 0:
    • 0xA: Adds a newline character (\n).
    • 0: Null terminator (\0) to mark the string's end.

This creates the string "Say something!\n\0" in memory.

  2. say_len equ $ - say:
    • equ: Defines a constant value. It is computed at assembly time and takes up no memory.
    • $: Represents the current assembly position, which here is the address right after the string ends.
    • $ - say: Calculates the length of the string in bytes by subtracting the starting address (say) from the current address ($).

This computes the total string length, including the characters, newline (0xA), and null terminator (0): 14 + 1 + 1 = 16 bytes for "Say something!".
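The same "end address minus start address" idea can be written in C with pointer arithmetic. In the sketch below (say_len is a hypothetical function mirroring the NASM constant), the C string literal supplies the trailing '\0' automatically, so the array holds the same 16 bytes as the NASM definition.

```c
#include <stddef.h>

/* NASM: say db "Say something!", 0xA, 0
 * C:    the literal ends with an implicit '\0', so the array holds 16 bytes. */
static const char say[] = "Say something!\n";

/* NASM: say_len equ $ - say
 * `$` is the position just past the data, so the length is end - start.
 * In C, a pointer one past the end of an array is valid for this arithmetic. */
static ptrdiff_t say_len(void) {
    const char *start = say;
    const char *end = say + sizeof say;  /* like `$` right after the data */
    return end - start;
}
```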


And we're done.

Now read the code from the beginning again, this time with comments:

prompter.asm:

section .data
  ; Define initialized data
  say db "Say something!", 0xA, 0          ; The string "Say something!" followed by a newline (0xA) and null terminator (0)
  say_len equ $ - say                      ; Calculate the length of the string (current address minus 'say' label)

  input_char db "> ", 0                    ; The prompt string "> " followed by a null terminator
  input_char_len equ $ - input_char        ; Calculate the length of the prompt string

  said db "You said: ", 0                  ; The string "You said: " followed by a null terminator
  said_len equ $ - said                    ; Calculate the length of the "You said: " string

section .bss
  ; Define uninitialized data
  input resb 128                           ; Reserve 128 bytes of space for storing user input

section .text
global _start                              ; Define the program's entry point

_start:
  ; Display the message "Say something!"
  mov rax, 1                               ; Syscall number for 'write'
  mov rdi, 1                               ; File descriptor: 1 (stdout)
  mov rsi, say                             ; Address of the string "Say something!"
  mov rdx, say_len                         ; Length of the string
  syscall                                  ; Make the system call

  ; Display the prompt "> "
  mov rax, 1                               ; Syscall number for 'write'
  mov rdi, 1                               ; File descriptor: 1 (stdout)
  mov rsi, input_char                      ; Address of the prompt string "> "
  mov rdx, input_char_len                  ; Length of the prompt string
  syscall                                  ; Make the system call

  ; Read user input (up to 128 bytes)
  mov rax, 0                               ; Syscall number for 'read'
  mov rdi, 0                               ; File descriptor: 0 (stdin)
  mov rsi, input                           ; Address to store user input
  mov rdx, 128                             ; Max number of bytes to read
  syscall                                  ; Make the system call
  mov r12, rax                             ; Save the number of bytes actually read (RAX is overwritten by the next syscall)

  ; Display the string "You said: "
  mov rax, 1                               ; Syscall number for 'write'
  mov rdi, 1                               ; File descriptor: 1 (stdout)
  mov rsi, said                            ; Address of the string "You said: "
  mov rdx, said_len                        ; Length of the "You said: " string
  syscall                                  ; Make the system call

  ; Display the user input
  mov rax, 1                               ; Syscall number for 'write'
  mov rdi, 1                               ; File descriptor: 1 (stdout)
  mov rsi, input                           ; Address of the user input
  mov rdx, r12                             ; Number of bytes actually read
  syscall                                  ; Make the system call

  ; Exit the program
  mov rax, 60                              ; Syscall number for 'exit'
  mov rdi, 0                               ; Exit code: 0
  syscall                                  ; Make the system call

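For readers coming from C, here is a rough C port of prompter.asm for comparison. It calls the same Linux syscalls through glibc's syscall() wrapper and, like the assembly, echoes back only the bytes that read returned. Two assumptions are made: the function name prompter is hypothetical, and the file descriptors are parameters (instead of hard-coded 0 and 1) so the sketch can be exercised without a terminal. It also uses sizeof x - 1 so the trailing '\0' is not written.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* C port of prompter.asm using raw syscalls (a sketch).
 * Returns the number of bytes read, or -1 if the read failed. */
static long prompter(int in_fd, int out_fd) {
    static const char say[] = "Say something!\n";
    static const char prompt[] = "> ";
    static const char said[] = "You said: ";
    static char input[128];                                    /* input resb 128 */

    syscall(SYS_write, out_fd, say, sizeof say - 1);           /* skip the '\0' */
    syscall(SYS_write, out_fd, prompt, sizeof prompt - 1);

    long n = syscall(SYS_read, in_fd, input, sizeof input);    /* RAX = bytes read */
    if (n < 0)
        return -1;

    syscall(SYS_write, out_fd, said, sizeof said - 1);
    syscall(SYS_write, out_fd, input, (size_t)n);              /* echo only what was read */
    return n;
}
```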

Run the code

First we need to assemble it with nasm:

nasm -f elf64 prompter.asm -o prompter.o

Then link the object file with ld:

ld prompter.o -o prompter

And run the code:

./prompter

Final considerations

You have now seen what Assembly code looks like and how to write, assemble, and run a small program. But that's only the tip of the iceberg. Thanks for reading, and see you in the next article!
