$ CBA: an ARM7TDMI / GBA emulator, built from scratch
Author: Aditya R
Contents
- Introduction
- From the ground up
- What's an emulator, actually?
- Meet the ARM7TDMI
- The GBA's memory map
- ARM's party trick: conditional execution
- Genesis
- All about CBA
- File structure
- Workflow of CBA
- Instructions supported
- Main features
- How you can run CBA locally
- Where things stand
- Experience
- Fun stuff
- References
Introduction
What made me pick this project?
A few months after wrapping up psh (a POSIX-ish shell I built with a team over a summer), I wanted to go one level lower. A shell talks to the kernel. I wanted to know what's underneath the kernel: the chip itself, decoding raw bytes into "move this register into that one" and nothing else. An emulator felt like the natural next rabbit hole.
I picked the ARM7TDMI specifically because it's small enough for one person to reasonably understand end-to-end (unlike, say, a modern x86 core), and because it's the CPU inside the Game Boy Advance, which means every decision I make can be checked against real hardware, real games, and an extremely well-documented memory map. CBA is the result: a from-scratch attempt at a hardware-accurate GBA emulator in C++, with no borrowed CPU cores or existing emulator code, just the ARM architecture reference manual, GBATEK, and a lot of trial and error.
Unlike psh, this one's solo: no teammates, no mentors, just me and a very large PDF.
From the ground up
Some background before diving in.
What's an emulator, actually?
An emulator is software that pretends to be a different piece of hardware, closely enough that programs written for the real thing run on it unmodified. A GBA emulator needs to reproduce the GBA's CPU, its memory layout, its graphics and sound hardware, and its timers, closely enough that a real cartridge dump can't tell the difference. CBA is currently focused on the hardest and most foundational part of that: the CPU.
Meet the ARM7TDMI
The ARM7TDMI is a 32-bit RISC processor from the mid-'90s that ended up everywhere: the original GBA, the Nintendo DS (as a co-processor), the classic iPod, and a long list of embedded devices. The name is basically a spec sheet:
- T: Thumb, a compressed 16-bit instruction encoding that trades some flexibility for smaller code size
- D: Debug hardware support
- M: an enhanced hardware Multiplier
- I: EmbeddedICE, an on-chip In-circuit emulator interface
For an emulator author, the part that matters most is that it's really two instruction sets sharing one CPU: 32-bit ARM instructions and 16-bit Thumb instructions, with the ability to switch between them at runtime.
The GBA's memory map
Real hardware doesn't just have "RAM"; it has several distinct memory regions, each with its own size, speed, and purpose, all mapped into one 32-bit address space:
Address range              Region                     Size / notes
0x00000000 – 0x00003FFF    BIOS                       16 KB
0x02000000 – 0x0203FFFF    WRAM                       256 KB (on-board work RAM)
0x03000000 – 0x03007FFF    IWRAM                      32 KB (on-chip, fast work RAM)
0x04000000 – 0x040003FE    I/O registers
0x05000000 – 0x050003FF    Palette RAM                1 KB
0x06000000 – 0x06017FFF    VRAM                       96 KB
0x07000000 – 0x070003FF    OAM                        1 KB (sprite attributes)
0x08000000 – 0x09FFFFFF    Game Pak ROM / FlashROM
0x0E000000 – 0x0E00FFFF    Game Pak SRAM
Every load/store instruction has to figure out which of these regions an address falls into and route it accordingly, which is most of what CBA's Memory class does.
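As a rough illustration of that routing step, here is a minimal sketch. The names (`Region`, `Mapping`, `route`, `romOffset`) are made up for this example and are not CBA's actual Memory API; the address ranges are the ones from the memory map above.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative region tags; not CBA's actual identifiers.
enum class Region { Bios, Wram, Iwram, Io, Palette, Vram, Oam, Rom, Sram, Unmapped };

struct Mapping {
    Region region;
    std::uint32_t offset;  // byte offset from the start of the region
};

struct Range {
    std::uint32_t first;
    std::uint32_t last;  // inclusive
    Region region;
};

// Address ranges copied from the memory map above.
constexpr Range kRanges[] = {
    {0x00000000u, 0x00003FFFu, Region::Bios},
    {0x02000000u, 0x0203FFFFu, Region::Wram},
    {0x03000000u, 0x03007FFFu, Region::Iwram},
    {0x04000000u, 0x040003FEu, Region::Io},
    {0x05000000u, 0x050003FFu, Region::Palette},
    {0x06000000u, 0x06017FFFu, Region::Vram},
    {0x07000000u, 0x070003FFu, Region::Oam},
    {0x08000000u, 0x09FFFFFFu, Region::Rom},
    {0x0E000000u, 0x0E00FFFFu, Region::Sram},
};

// Find the region an address falls into; anything outside the table is Unmapped.
inline Mapping route(std::uint32_t addr) {
    for (const Range& r : kRanges) {
        if (addr >= r.first && addr <= r.last) {
            return {r.region, addr - r.first};
        }
    }
    return {Region::Unmapped, 0u};
}

// Helper for instruction fetch: the address must lie in Game Pak ROM.
inline std::uint32_t romOffset(std::uint32_t addr) {
    const Mapping m = route(addr);
    assert(m.region == Region::Rom && "address is not inside Game Pak ROM");
    return m.offset;
}
```

With this, `route(0x03007FFF)` lands in IWRAM at offset `0x7FFF`, and an address like `0x01000000` comes back as `Unmapped`. Real hardware also mirrors several of these regions, which this sketch leaves out.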
ARM's party trick: conditional execution
Most instruction sets only let branches be conditional. ARM lets almost every instruction carry a 4-bit condition code, so "add these two registers, but only if the last comparison was equal" is a single instruction, no branch required. It's a neat trick for avoiding pipeline-stalling branches on simple if-statements. It also means the first thing CBA's decoder has to do, before it even figures out what instruction it's looking at, is check whether to run it at all.
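A minimal sketch of that check, assuming the condition lives in bits 31-28 of the instruction and the N, Z, C, V flags in bits 31-28 of the CPSR (both per the ARM Architecture Reference Manual). The function names are illustrative, not CBA's.

```cpp
#include <cassert>
#include <cstdint>

// CPSR flag positions.
constexpr std::uint32_t kN = 1u << 31;  // negative
constexpr std::uint32_t kZ = 1u << 30;  // zero
constexpr std::uint32_t kC = 1u << 29;  // carry / not-borrow
constexpr std::uint32_t kV = 1u << 28;  // signed overflow

// Evaluate a 4-bit ARM condition code against the current flags.
inline bool conditionHolds(std::uint32_t cond, std::uint32_t cpsr) {
    assert(cond <= 0xFu && "the condition field is only 4 bits wide");
    const bool n = (cpsr & kN) != 0;
    const bool z = (cpsr & kZ) != 0;
    const bool c = (cpsr & kC) != 0;
    const bool v = (cpsr & kV) != 0;
    switch (cond) {
        case 0x0: return z;              // EQ
        case 0x1: return !z;             // NE
        case 0x2: return c;              // CS / HS
        case 0x3: return !c;             // CC / LO
        case 0x4: return n;              // MI
        case 0x5: return !n;             // PL
        case 0x6: return v;              // VS
        case 0x7: return !v;             // VC
        case 0x8: return c && !z;        // HI
        case 0x9: return !c || z;        // LS
        case 0xA: return n == v;         // GE
        case 0xB: return n != v;         // LT
        case 0xC: return !z && n == v;   // GT
        case 0xD: return z || n != v;    // LE
        case 0xE: return true;           // AL
        default:  return false;          // 0xF (NV) is reserved; this sketch never runs it
    }
}

// The condition is the top nibble of every ARM instruction.
inline bool shouldExecute(std::uint32_t instruction, std::uint32_t cpsr) {
    return conditionHolds(instruction >> 28, cpsr);
}
```

For example, `0x00810002` is `ADDEQ r0, r1, r2`: `shouldExecute` returns true only when the Z flag is set, whereas the unconditional `0xE0810002` (`ADD r0, r1, r2`) always passes.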
Genesis
CBA started about as small as a project can start: a Registers struct (16 general-purpose registers, plus the current and saved status registers, CPSR/SPSR) and a Memory class that could barely do more than load a raw binary into a buffer. Then came a fetch → decode → execute loop that, for a good while, decoded nothing at all; it just walked the ROM printing hex and hoping.
Everything since has been filling in that decode step, one instruction category at a time, and checking the results against what the reference manual says should happen.
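For a sense of scale, a register file like the one described above can be sketched in a few lines. The layout and helper names below are assumptions for illustration and not CBA's actual struct; the flag and T-bit positions come from the ARM Architecture Reference Manual.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t kFlagN = 1u << 31;
constexpr std::uint32_t kFlagZ = 1u << 30;
constexpr std::uint32_t kFlagC = 1u << 29;
constexpr std::uint32_t kFlagV = 1u << 28;
constexpr std::uint32_t kThumbBit = 1u << 5;  // CPSR.T: set while executing Thumb code

struct Registers {
    std::array<std::uint32_t, 16> r{};  // r0-r15; r13 = SP, r14 = LR, r15 = PC
    std::uint32_t cpsr = 0;             // current program status register
    std::uint32_t spsr = 0;             // saved program status register

    // Bounds-checked register access (checked in debug builds only).
    std::uint32_t& reg(std::size_t index) {
        assert(index < r.size());
        return r[index];
    }
    std::uint32_t& pc() { return r[15]; }

    bool thumb() const { return (cpsr & kThumbBit) != 0; }
    bool flag(std::uint32_t mask) const { return (cpsr & mask) != 0; }
    void setFlag(std::uint32_t mask, bool on) {
        cpsr = on ? (cpsr | mask) : (cpsr & ~mask);
    }
};
```

The real chip banks some registers and keeps a separate SPSR per exception mode; a single flat struct like this is only the simplest possible starting point.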
All about CBA
File structure
CBA
├── CMakeLists.txt
├── LICENSE
├── README.md
├── link.ld
├── makeGBA.sh
├── makeGBA.ps1
├── include
│   ├── arm.hpp
│   ├── cpu.hpp
│   └── memory.hpp
└── src
    ├── arm.cpp
    ├── cpu.cpp
    ├── emulator.cpp
    ├── kernel.s
    ├── memory.cpp
    └── thumb.cpp
Workflow of CBA
On boot, CBA loads a raw .gba binary straight into ROM, peeks at the very first instruction to guess whether it's ARM or Thumb code, and sets the CPU's starting mode accordingly. From there it's a straightforward loop:
- Fetch: read 4 bytes (ARM mode) or 2 bytes (Thumb mode) at the program counter
- Decode: for ARM, walk a table of {mask, pattern, handler} entries and find the first one whose pattern matches the masked instruction bits; for Thumb, switch on the top opcode bits
- Execute: call the matching handler, which reads and writes registers and memory as needed
Using a mask-and-pattern dispatch table instead of one giant switch mirrors how real ARM decoders are usually built: ARM's 32-bit encoding is dense enough that a table lookup stays a lot more maintainable than a wall of nested conditionals.
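To make the decode step concrete, here is a small sketch of a mask-and-pattern table. The `Cpu` stand-in, the handler names, and the choice of entries are all assumptions for the example (CBA's real table is larger and its handlers do actual work); the mask/pattern values follow the ARMv4T instruction encodings.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in CPU state for the demo: handlers just record their mnemonic.
struct Cpu {
    std::string trace;
};

using Handler = void (*)(Cpu&, std::uint32_t);

struct DecodeEntry {
    std::uint32_t mask;     // which bits identify the instruction class
    std::uint32_t pattern;  // the value those bits must have
    Handler handler;
};

inline void execBx(Cpu& cpu, std::uint32_t) { cpu.trace += "BX "; }
inline void execSwp(Cpu& cpu, std::uint32_t) { cpu.trace += "SWP "; }
inline void execMul(Cpu& cpu, std::uint32_t) { cpu.trace += "MUL "; }
inline void execBranch(Cpu& cpu, std::uint32_t) { cpu.trace += "B "; }
inline void execSwi(Cpu& cpu, std::uint32_t) { cpu.trace += "SWI "; }
inline void execLoadStore(Cpu& cpu, std::uint32_t) { cpu.trace += "LDR/STR "; }
inline void execDataProc(Cpu& cpu, std::uint32_t) { cpu.trace += "DP "; }

// Order matters: narrow encodings (BX, SWP, MUL) sit inside the data-processing
// space, so they must be tried before the catch-all data-processing entry.
const DecodeEntry kArmTable[] = {
    {0x0FFFFFF0u, 0x012FFF10u, execBx},
    {0x0FB00FF0u, 0x01000090u, execSwp},
    {0x0FC000F0u, 0x00000090u, execMul},
    {0x0E000000u, 0x0A000000u, execBranch},
    {0x0F000000u, 0x0F000000u, execSwi},
    {0x0C000000u, 0x04000000u, execLoadStore},
    {0x0C000000u, 0x00000000u, execDataProc},
};

// Return the first handler whose pattern matches the masked instruction bits.
inline Handler decodeArm(std::uint32_t instruction) {
    for (const DecodeEntry& e : kArmTable) {
        assert((e.pattern & ~e.mask) == 0u);  // pattern bits must lie inside the mask
        if ((instruction & e.mask) == e.pattern) {
            return e.handler;
        }
    }
    return nullptr;  // nothing in the table claims this encoding
}

// One decode + execute step; returns false for an undecoded instruction.
inline bool step(Cpu& cpu, std::uint32_t instruction) {
    const Handler handler = decodeArm(instruction);
    if (handler == nullptr) {
        return false;
    }
    handler(cpu, instruction);
    return true;
}
```

Feeding it `0xE12FFF11` (`BX r1`) hits the first entry, while `0xE0810002` (`ADD r0, r1, r2`) falls all the way through to the data-processing entry. A block transfer such as `0xE8BD8000` matches nothing in this reduced table and returns `nullptr`.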
Instructions supported
ARM-mode decoding currently covers:
- BX: branch and exchange, the actual ARM ⟷ Thumb mode switch
- MRS / MSR: read and write the status registers, both immediate and register forms
- SWP: atomic register/memory swap
- MUL / MLA: multiply and multiply-accumulate
- Halfword / signed data transfer
- Branch / Branch-with-Link
- Load/Store: word and byte, base register + immediate offset
- SWI: software interrupt (today it logs the interrupt and halts; this is the eventual home for BIOS syscalls)
- Data processing: the ALU ops (MOV, ADD, CMP, and friends), including flag updates
Multiply-long and block data transfer (LDM/STM) are stubbed but not wired up yet, and Thumb mode currently understands a handful of opcodes (immediate MOV, register ADD, conditional branch, literal LDR) rather than the full set.
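Two of the entries above are small enough to sketch in full. The following is an illustration of how a B/BL target and a BX mode switch can be computed under the ARMv4T rules (24-bit signed word offset relative to PC+8; bit 0 of the BX operand selects Thumb). The function names are invented for the example and are not CBA's handlers.

```cpp
#include <cassert>
#include <cstdint>

// B / BL: bits 23-0 hold a signed word offset, relative to the address of the
// branch plus 8 (the ARM-state PC runs two instructions ahead).
inline std::uint32_t branchTarget(std::uint32_t instruction, std::uint32_t address) {
    assert((instruction & 0x0E000000u) == 0x0A000000u && "not a B/BL encoding");
    std::uint32_t offset = (instruction & 0x00FFFFFFu) << 2;  // words -> bytes, now 26 bits
    if (offset & 0x02000000u) {
        offset |= 0xFC000000u;  // sign-extend from bit 25
    }
    return address + 8u + offset;  // unsigned wraparound handles negative offsets
}

struct BxResult {
    std::uint32_t pc;
    bool thumb;
};

// BX Rm: bit 0 of the register selects the new state and is not part of the address.
inline BxResult branchExchange(std::uint32_t rm) {
    return {rm & ~1u, (rm & 1u) != 0};
}
```

`0xEAFFFFFE` is the classic "branch to self" loop: its offset sign-extends to -8, which cancels the +8 pipeline adjustment. Skipping the sign-extension turns that into a jump roughly 64 MB forward instead.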
Main features
- Full condition-code evaluation: all 16 ARM conditions, on almost every instruction, not just branches
- A mask-and-pattern dispatch table for ARM decoding, instead of a wall of ifs
- Automatic ARM/Thumb mode detection on boot, by inspecting the first instruction in ROM
- A real, GBA-accurate memory map (BIOS / WRAM / IWRAM / I/O / Palette / VRAM / OAM / ROM / SRAM), not one flat byte array
- Hand-written ARM assembly test kernels, assembled and linked with a custom linker script, to check instruction behaviour against the reference manual
- In progress: full Thumb mode, block data transfer, multiply-long, and a real BIOS/syscall layer
- Up next: an SDL front-end for real-time framebuffer output, and tightening the fetch-decode-execute loop toward cycle-accurate timing
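The write-up doesn't spell out how the boot-time ARM/Thumb guess works, so the following is only one plausible heuristic, not CBA's actual logic: treat the first 32-bit word as ARM code if its condition field is AL (`0xE`), which is what a typical unconditional branch at the start of a ROM looks like, and fall back to Thumb otherwise. `Mode`, `readLe32`, and `guessStartMode` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Mode { Arm, Thumb };

// Read a little-endian 32-bit word from the ROM image.
inline std::uint32_t readLe32(const std::vector<std::uint8_t>& rom, std::size_t at) {
    assert(at + 4 <= rom.size());
    return static_cast<std::uint32_t>(rom[at]) |
           (static_cast<std::uint32_t>(rom[at + 1]) << 8) |
           (static_cast<std::uint32_t>(rom[at + 2]) << 16) |
           (static_cast<std::uint32_t>(rom[at + 3]) << 24);
}

// Assumed heuristic: a first word with condition AL (0xE) looks like ARM code.
inline Mode guessStartMode(const std::vector<std::uint8_t>& rom) {
    if (rom.size() < 4) {
        return Mode::Arm;  // too short to inspect; the hardware itself resets into ARM state
    }
    return (readLe32(rom, 0) >> 28) == 0xEu ? Mode::Arm : Mode::Thumb;
}
```

A ROM starting with the bytes `2E 00 00 EA` (the ARM branch `0xEA00002E`) is classified as ARM, while one starting with the Thumb pair `MOV r0, #1; MOV r1, #2` (`01 20 02 21`) is classified as Thumb. The guess can be fooled: a Thumb program whose second halfword happens to start with `0xE` would be misread as ARM.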
How you can run CBA locally
git clone https://github.com/adityatr64/CBA
cd CBA
cmake -B build
cmake --build build
To assemble a test ROM you'll need the arm-none-eabi GCC toolchain:
mkdir -p bin
./makeGBA.sh # Linux / macOS
./makeGBA.ps1 # Windows (PowerShell)
This assembles src/kernel.s against link.ld, drops a bin/kernel.gba, and hex-dumps the first few lines so you can eyeball the header. Then run the emulator, which, courtesy of CMakeLists.txt, is quietly named PlusBoy behind CBA's back:
./PlusBoy
Where things stand
No formal benchmarks yet; that's further down the roadmap, once there's a display to measure frame timing against. For now, correctness gets checked the old-fashioned way: a small hand-assembled ARM program runs through MOV, CMP, ADDS, MRS, and MSR, and after every single instruction the CPU dumps registers r0–r6 and all four condition flags to the terminal, so I can check them by hand against what the manual says should happen. It's not elegant. It's a lot of scrolling. It works.
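For reference when checking those dumps, this is roughly what the manual prescribes for the flags after ADDS and CMP. It's a standalone sketch with illustrative names (`AluResult`, `AluOp`, `alu`), not CBA's ALU code.

```cpp
#include <cassert>
#include <cstdint>

struct AluResult {
    std::uint32_t value;
    bool n, z, c, v;
};

enum class AluOp { Add, Sub };  // ADDS uses Add; CMP is Sub with the value discarded

inline AluResult alu(AluOp op, std::uint32_t a, std::uint32_t b) {
    AluResult r{};
    switch (op) {
        case AluOp::Add: {
            const std::uint64_t wide = static_cast<std::uint64_t>(a) + b;
            r.value = static_cast<std::uint32_t>(wide);
            r.c = (wide >> 32) != 0;                        // carry out of bit 31
            r.v = ((~(a ^ b) & (a ^ r.value)) >> 31) != 0;  // same-sign inputs, other-sign result
            break;
        }
        case AluOp::Sub: {
            r.value = a - b;
            r.c = a >= b;                                   // ARM convention: C = NOT borrow
            r.v = (((a ^ b) & (a ^ r.value)) >> 31) != 0;   // different-sign inputs, sign flipped
            break;
        }
        default:
            assert(false && "unhandled ALU operation");
    }
    r.n = (r.value >> 31) != 0;  // N mirrors bit 31 of the result
    r.z = r.value == 0;
    return r;
}
```

One easy thing to get backwards: comparing a zeroed register with zero (`CMP r0, #0`) sets Z and also sets C, because ARM's carry after a subtraction means "no borrow", the opposite of the x86 convention.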
Experience
Doing this solo after a team project like psh is a genuinely different kind of hard: there's no one to rubber-duck a bug with, and no code review to catch a forgotten sign-extension before it costs you an evening. Most of the bugs here aren't the segfault-and-gdb kind; they're the quieter, more annoying kind, where the code compiles fine, runs fine, and just quietly computes the wrong number: a flag off by one bit, an offset that needed sign-extending and didn't, a condition checked backwards. The only real fix is going back to the datasheet, instruction by instruction, and being willing to be wrong a lot before being right.
It's also given me a much deeper respect for how much a "simple" CPU actually does per instruction. Rereading the same page of the manual for the tenth time and finally noticing the one detail you'd missed is a very specific kind of satisfying.
Fun stuff
- The compiled binary is secretly named PlusBoy. CMakeLists.txt's project name has never matched the repo name, and at this point it's staying that way on purpose.
- The first program CBA ever ran "successfully" was 15 lines of hand-written ARM assembly that zeroes a register, compares it to zero, flips a couple of CPSR flags around, and then loops forever doing absolutely nothing. Watching those flags flip correctly in the terminal was, unreasonably, one of the best feelings of the project so far.
- Every single cycle currently prints a full register-and-flag dump to the terminal, so running anything longer than a few dozen instructions turns your terminal into a wall of hex. Future-me's problem.
References
- GBATEK: Martin Korth's GBA hardware reference; CBA's memory map is taken straight from here
- ARM7TDMI Technical Reference Manual
- ARM Architecture Reference Manual (ARMv4T)
- Stack Overflow
- the gbadev community/wiki
Built solo, with reference material pulled from the sources above. Source: github.com/adityatr64/CBA