Making Pong in x86 Assembly

#assembly #low #learning #gamedev

Introduction 📘

Many developers have never touched assembly language, let alone written a whole program in it. This is understandable: reasons to write assembly are few and far between. However, the knowledge you gain from doing so can build a deeper understanding of how computers function, helping you write better, more efficient code even in other programming languages.

To start off, let's take a look at what "x86 Assembly" actually means. Assembly code is device specific, meaning code that runs on one machine might not work on another. This is because CPUs can have different architectures and therefore expect different instructions. Each architecture has a certain set of instructions that the CPU is designed to use. Here we are using the x86 architecture, which is what the majority of laptops and desktops use. ARM and RISC-V are the other two major ones. Whilst your code might not run on a different architecture, much of the knowledge you gain from learning one variant of assembly transfers to the others.

Boot loaders 🧰

A common use of assembly is writing bootloader games. These were popular 40 or so years ago, when memory was expensive, so compacting a game into the 512-byte boot sector was a popular and challenging thing to do. The boot sector is the first sector loaded into memory when the computer boots, and it is where execution of our code starts.

I decided to set myself the challenge of writing Pong in assembly, starting from the bootloader. I gave myself one allowance: the ability to use additional sectors beyond the strict 512-byte boot sector. This is critical if I want to implement a font, which takes a lot of space to store. Here are some of the most interesting parts of this project.

One thing that you might not have expected is that computers boot in 16-bit mode, even if they have a 64-bit processor. This may seem strange, and it is, but the reason is backwards compatibility: the behaviour dates from when computers were still making the transition from 16 bit to 32 bit, and older software had to keep working. This isn't the only weird thing about booting. The boot sector is loaded at the hex address 0x7c00. One would naturally expect it to be address 0, but no. The reason is that the BIOS (the code that loads the bootloader and does other stuff) keeps its own data in the lower address space, and overwriting that would be bad. So before any code is written it is important to have these two lines

[bits 16];tells the assembler this code is for 16 bit mode
[org 0x7c00];tells the assembler where the code is located in RAM

16-bit instructions are encoded differently to 32-bit instructions, so we need to tell the assembler to generate 16-bit machine code, not 32-bit. [org 0x7c00] is important so that the assembler can correctly calculate the addresses of our labels and data. This matters because in assembly, instead of using if, while and for, you use labels. These are just markers that don't exist in the final machine code, but the assembler knows their byte position and you can tell the CPU to jump to them, e.g.

someLabel:
jmp someLabel

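creating an infinite loop. This links back to [org 0x7c00]: if we don't tell the assembler the starting address of our code, it assumes address 0 and every absolute address it calculates is off by 0x7c00. A short jump like the one above would still happen to work, because it is encoded as a distance from the current instruction, but anything that uses a label as an absolute address, such as mov byte[BOOT_DISK], dl or a pointer to a string, will touch completely the wrong place in memory, which is no bueno.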
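
To make the effect of the origin concrete, here is a tiny Python model of what the assembler does when it resolves a label. It is only a sketch: the label offsets are made up for illustration, and resolve is a hypothetical helper, not anything NASM provides.

```python
# Toy model of label resolution. The offsets are hypothetical; a real
# assembler works them out from the size of each instruction.
ORIGIN = 0x7C00  # where the BIOS loads the boot sector

# Hypothetical label table: name -> byte offset from the start of the binary.
label_offsets = {"BOOT_DISK": 0x02, "main": 0x03}


def resolve(label: str, origin: int) -> int:
    """Absolute address emitted for a label, given the assumed origin."""
    return origin + label_offsets[label]


if __name__ == "__main__":
    for name in label_offsets:
        wrong = resolve(name, 0)       # what we get without [org 0x7c00]
        right = resolve(name, ORIGIN)  # what we get with it
        print(f"{name}: without org {wrong:#06x}, with org {right:#06x}")
```

Every address resolved with origin 0 is exactly 0x7c00 too low, which is why data accesses end up in the BIOS's low memory instead of in our boot sector.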

Right, now that the assembler knows what it's doing, let's take a look at the next vital part of our program.

[bits 16]
[org 0x7c00]
jmp main; Jump over variables to not execute as instructions
BOOT_DISK: db 0; Stores boot device ID
main:
  ;Stack grows downwards from the bootloader
  mov bp, 0x7c00
  mov sp, bp
  ;Save the boot device for loading the later segments
  mov byte[BOOT_DISK], dl

One of the first things we need to do is set up the stack. If you have programmed in C or C++ you may already be familiar with the stack, but if not: a stack is a data structure where you can only add or remove an item at the top, although here you can still read from anywhere on it. This is where most of a program's short-lived data, such as local variables and return addresses, is stored. There is also something called the heap, which can be used for, say, lists that change size at run time, but we don't need to worry about that for this project. There is a problem to solve, however. After a function has been called and has pushed variables onto the stack, how do we know how to restore the stack when it returns, i.e. remove all the data that was pushed whilst the function was running? To solve this we break the stack into sections called stack frames. A stack frame is created every time a function is called: the function pushes the address of the previous stack frame (the old value of bp) onto the stack and then sets bp = sp. When the function returns, it sets sp = bp, which discards everything pushed since, then pops the saved value back into bp, returning the stack to its state before the function call.

All we need to implement a stack is a value to store the address of the top of the stack and a value to store the address of the current stack frame. Luckily for us, the x86 architecture comes with two registers designed for just that: sp and bp, standing for stack pointer and base pointer. The stack pointer points to the top of the stack (the most recently pushed item) and the base pointer to the base of the current stack frame. You may think that to set up the stack you should do something like

mov bp, 0
mov sp, 0

Setting their addresses to the beginning of memory seems sensible, right? Wrong! The slightly unintuitive thing about stacks is that they grow downwards: each subsequent item added is at a lower memory address. That is why the following code works.

mov bp, 0x7c00; base of the stack, it grows downwards from here
mov sp, bp; stack is empty and there is no previous frame, so sp = bp

Whilst the sectors containing our program are loaded at higher addresses, the stack will use the lower addresses below 0x7c00. Understanding memory, and therefore stacks, is a really important part of writing good, efficient programs in any language.
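
If the downward growth and the bp/sp bookkeeping still feel abstract, the following Python sketch models a 16-bit stack. It is a toy under stated assumptions: Stack16, enter_frame and leave_frame are names I made up, and the frame handling mirrors the usual push bp / mov bp, sp and mov sp, bp / pop bp pattern rather than anything specific to this project.

```python
class Stack16:
    """Toy model of a 16-bit x86 stack inside a 64 KiB memory."""

    def __init__(self, base: int = 0x7C00):
        self.mem = bytearray(0x10000)
        self.bp = base  # base of the current stack frame
        self.sp = base  # top of the stack (last pushed item)

    def push(self, value: int) -> None:
        # The stack grows downwards: sp decreases by 2, then the value is stored.
        self.sp = (self.sp - 2) & 0xFFFF
        self.mem[self.sp:self.sp + 2] = (value & 0xFFFF).to_bytes(2, "little")

    def pop(self) -> int:
        # The value is read first, then sp moves back up by 2.
        value = int.from_bytes(self.mem[self.sp:self.sp + 2], "little")
        self.sp = (self.sp + 2) & 0xFFFF
        return value

    def enter_frame(self) -> None:
        # Equivalent of: push bp ; mov bp, sp
        self.push(self.bp)
        self.bp = self.sp

    def leave_frame(self) -> None:
        # Equivalent of: mov sp, bp ; pop bp
        self.sp = self.bp
        self.bp = self.pop()


if __name__ == "__main__":
    stack = Stack16()
    stack.push(0x1234)
    stack.enter_frame()
    stack.push(1)  # "local variables" of the called function
    stack.push(2)
    stack.leave_frame()  # locals are discarded, bp is restored
    print(hex(stack.sp), hex(stack.bp), hex(stack.pop()))
```

Note how leave_frame does not need to know how many values the function pushed; resetting sp to bp throws them all away in one step.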

The BIOS will only load the boot sector, so it is our job to do the rest. We need to load however many further sectors (512 bytes each) the program occupies. This is the reason for the line

mov byte[BOOT_DISK], dl

Upon boot, the BIOS places the 8-bit ID of the storage device it booted from in the lower half of the 16-bit dx register, hence dl (data low). We need to keep a copy of this value so we know which device to read the remaining sectors from. One of the great things about being in 16-bit mode is that we have access to BIOS routines. This is not the case in 32-bit or 64-bit mode, because the processor manages memory differently there, which makes the BIOS code incompatible (16-bit mode uses segmentation, 64-bit mode uses paging, and 32-bit mode uses a combination of paging and segmentation). To call these routines, all you have to do is use the int instruction, short for interrupt, with a number that selects the routine. So to read the next sectors from disk, all we have to do is

readDisk:
mov ah, 0x02; Specifies BIOS function: "Read sectors from disk"
mov bx, PROGRAM_SPACE ; ES:BX = destination address in memory to load the data
mov al, [BOOT_DISK_READ_SIZE] ; AL = number of sectors to read (each sector is 512 bytes)
mov dl, [BOOT_DISK] ; DL = BIOS drive number (e.g., 0x00 for floppy, 0x80 for HDD)
mov ch, 0x00 ; CH = cylinder number (part of the CHS address), here it's 0
mov dh, 0x00 ; DH = head number, starting at head 0
mov cl, 0x02 ;CL = sector number (sectors start at 1; this means sector 2)

int 0x13 ; BIOS disk services interrupt, executes the read based on register setup
jc .errorCode ;carry flag is set if error occurs
ret ; returns from function back to caller
.errorCode:
    mov bx, diskReadErrMsg; bx is pointer to string
    call print; takes bx as argument
    jmp $; $ means current address therefore infinite loop

There is one small part of this code that you might not understand just yet, and that is ES:BX. This simply means BX offset by ES. More precisely, it translates to the address BX + ES * 16. Because this program is small we don't need to write into the higher address spaces, so ES and the other data segment registers are set to zero at the start of the bootloader.
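
Both addressing schemes used by readDisk can be sketched in a few lines of Python. This is an illustration only: physical_address and lba_to_chs are hypothetical helper names, and the disk geometry (18 sectors per track, 2 heads) is an assumption that matches a 1.44 MB floppy, not something the BIOS call itself fixes.

```python
SECTORS_PER_TRACK = 18  # assumption: 1.44 MB floppy geometry
HEADS = 2               # assumption: two heads


def physical_address(segment: int, offset: int) -> int:
    """Real-mode address for segment:offset, i.e. segment * 16 + offset."""
    return (segment << 4) + offset


def lba_to_chs(lba: int) -> tuple:
    """Convert a zero-based sector index to (cylinder, head, sector).

    Sector numbers start at 1, cylinders and heads at 0.
    """
    cylinder = lba // (HEADS * SECTORS_PER_TRACK)
    head = (lba // SECTORS_PER_TRACK) % HEADS
    sector = lba % SECTORS_PER_TRACK + 1
    return cylinder, head, sector


if __name__ == "__main__":
    # With ES = 0 the destination is simply BX.
    print(hex(physical_address(0x0000, 0x7E00)))
    # The boot sector is index 0, so the next sector on disk is index 1.
    print(lba_to_chs(1))
```

The second sector on the disk comes out as cylinder 0, head 0, sector 2, which is exactly what readDisk loads into ch, dh and cl.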

This wraps up all the functionality that we need from the bootloader. All we need to do now is call the readDisk function and then jump to the loaded sectors.

call readDisk
jmp 0x7e00; = 0x7c00 + 512

You may be wondering why the code jumps to the new sectors rather than just running from the boot sector straight into them. That would save space, right? Well, this is partially true, but there is a special pair of bytes called the boot signature, 0xaa55, located at the end of the boot sector, and it is mandatory. As our code grows we don't want it to overwrite these two bytes, so it is simpler to jump over them and execute in the new sectors.
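
The layout of the boot sector can be checked with a short Python sketch. It assumes the usual convention that the signature occupies the last two bytes (offsets 510 and 511) with the word 0xaa55 stored little-endian; pad_boot_sector and has_boot_signature are hypothetical helpers, not part of my build.

```python
BOOT_SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # the word 0xAA55 stored little-endian


def pad_boot_sector(code: bytes) -> bytes:
    """Pad machine code with zeros and append the boot signature."""
    max_code = BOOT_SECTOR_SIZE - len(BOOT_SIGNATURE)
    if len(code) > max_code:
        raise ValueError(f"code is {len(code)} bytes, limit is {max_code}")
    return code + bytes(max_code - len(code)) + BOOT_SIGNATURE


def has_boot_signature(sector: bytes) -> bool:
    """True if the first 512 bytes end with the boot signature."""
    return len(sector) >= BOOT_SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE


if __name__ == "__main__":
    sector = pad_boot_sector(b"\xeb\xfe")  # jmp $ as a stand-in program
    print(len(sector), has_boot_signature(sector))
    # The next sector is loaded straight after the boot sector in memory.
    print(hex(0x7C00 + BOOT_SECTOR_SIZE))
```

A check like has_boot_signature is handy to run on the assembled image before writing it to a disk, since a missing signature usually means the BIOS refuses to boot it.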

Graphics 🎨

I think the most interesting part of this project is the graphics. I mean, what is a game without graphics, right? Luckily for us, we can get graphics working within a few lines of code just by using built-in interrupts.

mov ax, 0x13 ; VGA mode 0x13
int 0x10 ; Sets the graphics mode based on ax

VGA mode 13h gives us a colour palette of 256 colours and a resolution of 320 x 200. A list of what int 0x10 can do can be found here (https://en.wikipedia.org/wiki/INT_10H). This is great, but now we need to be able to draw to the screen. There are two ways to do this: using interrupts, or writing directly to memory. Using interrupts is simple, but costs performance. The second option, writing directly to the memory that the graphics card reads from, is much faster. This special place is at address 0xA0000 (not the same for all video modes). You may think, hang on, this is bigger than a 16-bit register can address, do we need to use segment registers? You would be right, that is a valid way, but the strange thing is that in 16-bit mode we are still free to use 32-bit registers (not the case for 64-bit registers). Be aware that real hardware may enforce a 64 KiB limit on offsets in 16-bit mode, in which case the segment register route (ES = 0xA000) is the safer one. To write to the first pixel on the screen (top left) all we need to do is

mov eax, 0xA0000; eax is 32 bit version of ax
mov byte[eax], 200

200 is an index into the VGA 13h colour palette, selecting the colour stored in entry 200. This is very simple and very fast. Building upon this to draw text is easy. All we need is a way to draw pictures, which can be done with two loops like so.

drawImg_f: ; eax = ptr edi = xpos esi = ypos
    pushad; pushes all registers to the stack
    push eax; stored because mul uses eax
        mov eax, SX; SX=screen size x = 320
        mul esi; = SX * ypos (mul also overwrites edx)
        add edi, eax; xpos + SX * ypos
        add edi, BM; xpos + SX * ypos + buf offset
    pop eax; restore to value
    ;esi ecx x,y respectively
    xor ecx, ecx; quicker than mov ecx, 0
        ; I chose to store the sizex and sizey before the start
        ; of the image, so ebx starts at 8 to skip both 4 byte values
    mov ebx, 8
    .loopY:
        mov edx, dword[eax+4]; get size y
        cmp ecx, edx
        jge .endLoopY

        xor esi, esi
        .loopX:
            mov edx, dword[eax]; get size x
            cmp esi, edx
            jge .endLoopX

            push eax        
                mov al, byte[eax+ebx]
                mov byte[edi], al
            pop eax

            inc esi 
            inc edi
            inc ebx
            jmp .loopX
        .endLoopX:

        add edi, SX
        sub edi, dword[eax]
        inc ecx
        jmp .loopY
    .endLoopY:  
    popad; pops all registers from the stack
    ret
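
If the register juggling above is hard to follow, here is the same copy loop modelled in Python. It is a sketch, not a translation of every instruction: the framebuffer is a plain bytearray standing in for video memory, draw_image is a hypothetical name, and I assume the image starts with two 4-byte little-endian sizes followed by one byte per pixel, as drawImg_f expects.

```python
import struct

SCREEN_W, SCREEN_H = 320, 200  # mode 13h resolution


def draw_image(framebuffer: bytearray, image: bytes, xpos: int, ypos: int) -> None:
    """Copy an image into the framebuffer, row by row."""
    size_x, size_y = struct.unpack_from("<II", image, 0)
    dst = xpos + SCREEN_W * ypos  # plays the role of edi (minus the buffer base)
    src = 8                       # plays the role of ebx: skip the 8-byte header
    for _ in range(size_y):
        for _ in range(size_x):
            framebuffer[dst] = image[src]
            dst += 1
            src += 1
        # Move to the start of the next screen row, like
        # add edi, SX followed by sub edi, dword[eax].
        dst += SCREEN_W - size_x


if __name__ == "__main__":
    screen = bytearray(SCREEN_W * SCREEN_H)
    sprite = struct.pack("<II", 2, 2) + bytes([1, 2, 3, 4])
    draw_image(screen, sprite, 10, 5)
    print(screen[5 * SCREEN_W + 10], screen[6 * SCREEN_W + 11])
```

The only subtle step is the row advance: after copying size_x pixels the destination has already moved size_x bytes along, so only the remainder of the screen width is added.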

Now all we need to do is add the images to our code. We want a way of getting the location of each numerical character so we can render it. To do this we can use labels and have the NASM pre-processor set them for us.

%assign i 0 ; what number the font represents
%rep 10 ; set all 0-9 characters
Font%[i]: ; defines the label we can use later
        ;Gets the bin file from folder Font
    %defstr myString Font\\%[i].bin
        ;Short for include binary
    incbin myString
        ; increments the number counter
    %assign i i+1
%endrep

Now Font0 represents the address of the 0 character image, and so on. This is all well and good, but image editors don't just output the plain image data. There is almost always some sort of compression going on. To get the plain data we can use a Python script to help us.

import sys
from PIL import Image
import numpy as np
#Takes in image names as arguments when program is launched
for path in sys.argv[1:]:
    pic = Image.open(path)#Understands png format
    #One byte per pixel, assumes a single-channel image
    #(palette or greyscale)
    pix = np.array(pic.getdata(), np.ubyte)
    #Remove the .png from the end of the file name
    name = path[:-4]
    with open(name+".bin","wb") as f:
        #Writes image dimensions, each 4 bytes little-endian, because
        #drawImg_f reads them as dwords (np.uint can be 8 bytes wide)
        f.write(np.array([pic.size[0],pic.size[1]],"<u4").tobytes())
        #Writes the image data, .flatten() means make it a
        #one-dimensional list
        f.write(pix.flatten().tobytes())
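
A quick way to sanity check a generated file is to read it back. The sketch below assumes the layout described above (two 4-byte little-endian sizes, then one byte per pixel); parse_font_bin is a hypothetical helper and is not part of the repo.

```python
import struct


def parse_font_bin(data: bytes):
    """Split a .bin image into (width, height, rows of pixel bytes)."""
    width, height = struct.unpack_from("<II", data, 0)
    pixels = data[8:8 + width * height]
    if len(pixels) != width * height:
        raise ValueError("file is shorter than its header claims")
    rows = [pixels[r * width:(r + 1) * width] for r in range(height)]
    return width, height, rows


if __name__ == "__main__":
    # A made-up 3 x 2 image instead of a real font file.
    sample = struct.pack("<II", 3, 2) + bytes(range(6))
    width, height, rows = parse_font_bin(sample)
    print(width, height, [list(row) for row in rows])
```

If the header had been written with 8-byte values, the width would still parse correctly but the height would read as 0, which is an easy symptom to spot.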

All together it yields:
(https://github.com/asdf-a11/Pong_Boot_Sector/blob/main/README_Images/Animation.gif). I don't believe one can upload a gif, so here it is hosted in the repo.

This has been a quick overview of getting started with x86 assembly. I hope you found it engaging. If you have any questions, leave them in the comment section. If you would like to learn more, here is the link to my GitHub repo (https://github.com/asdf-a11/Pong_Boot_Sector/).

Common Bugs 🐛

Lastly, for those who would like to get started with assembly bootloader shenanigans, I would like to help you avoid some of the pitfalls I fell into whilst I was learning. The first is to include some special data at the start of your bootloader.

    jmp main ; don't want to execute this table by accident
    TIMES 3-($-$$) DB 0x90 ; pad the jump to 3 bytes with nop (0x90)
    OEMname: db    "mkfs.fat"
    bytesPerSector:    dw    512
    sectPerCluster:    db    1
    reservedSectors:   dw    1
    numFAT:            db    2
    numRootDirEntries: dw    224
    numSectors:        dw    2880
    mediaType:         db    0xf0
    numFATsectors:     dw    9
    sectorsPerTrack:   dw    18
    numHeads:          dw    2
    numHiddenSectors:  dd    0
    numSectorsHuge:    dd    0
    driveNum:          db    0
    reserved:          db    0
    signature:         db    0x29
    volumeID:          dd    0x2d7e5a1a
    volumeLabel:       db    "NO NAME    "
    fileSysType:       db    "FAT12   "

You may notice that your code runs in an emulator but not on real hardware. If you don't have this at the very start of your bootloader, that might be the problem. This table has the layout of the BIOS Parameter Block that FAT file systems keep at the start of the boot sector; I don't know exactly what each BIOS does with it, but make sure to include it. Another thing to be aware of is making sure the processor is in a known state. When booting, registers and settings could hold anything, so make sure to set their values. For example, at the start of my bootloader I do

;Don't need to use segment registers so set them to 0
mov ax, 0x00
mov es, ax
mov ss, ax
mov ds, ax
;clear the direction flag, it is used by certain string related
;instructions
cld