DEV Community

Aakash Apoorv

🚀 Building Toy ARM64 Emulator

Hey everyone! 👋

🤔 Ever wondered what it's like to get really close to the chip level?
Dive into the world of ARM64 by building your own emulator!

Whether you're into C++, Python, or JavaScript, I've got you covered with this super easy-to-follow post 🕹️.

🔧 What You'll Learn

  • Get up close and personal with ARM64 architecture.
  • Gain hands-on experience with low-level programming and emulation.
  • Build an emulator in your favorite language: C++, Python, or JavaScript.

💡 Why Build an Emulator?

  • Learn by Doing.
  • Understand the ARM64 architecture.

๐Ÿ‘จโ€๐Ÿ’ป Choose Your Language:

  • C++: Perfect for those who love performance and speed.
  • Python: Great if you prefer simplicity and readability.
  • JavaScript: Awesome for web-based emulation and flexibility.

Features

  • Emulates 31 general-purpose registers (x0 to x30).
  • Supports basic ARM64 instructions: ldr, str, add, mul, mov, and b, plus a placeholder for svc.
  • Handles memory operations.
  • Can print the current state of registers and memory.

Methods

  • constructor(): Initializes the emulator with zeroed registers and empty memory, and sets the program counter (pc) to 0.
  • loadProgram(program): Loads a program into the emulator. The program should be a string of ARM64 assembly instructions, one per line.
  • run(): Runs the loaded program from the current pc until it moves past the last instruction.
  • printMemory(): Prints the current state of the memory.
  • printRegisters(): Prints the current state of the registers.
  • initializeMemory(memoryInit): Initializes the emulator's memory with the given key-value pairs.

Supported Instructions

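  • ldr: Loads a value into a register, either a literal (ldr x0, =5) or the memory cell addressed by another register (ldr x1, [x0]).
  • str: Stores a value from a register into the memory cell addressed by another register (str x5, [x0]).
  • add: Adds two register values and stores the result in a destination register.
  • mul: Multiplies two register values and stores the result in a destination register.
  • mov: Moves an immediate value into a register (mov x0, #1).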
  • svc: (Not implemented) Placeholder for handling system calls.
  • b: Branches to a labeled instruction.

ARM64 Overview

  • ARM64 (AArch64) is a 64-bit architecture used in modern processors.
  • Provides 31 general-purpose registers (x0-x30), each 64 bits wide.
  • Designed for high performance and energy efficiency.
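Since the registers are 64 bits wide and Python integers never overflow, a Python emulator would have to wrap results itself to behave like real hardware. Here is a minimal sketch of that idea. The names MASK64, wrap64, and to_signed64 are my own for illustration and are not part of the emulator code in this post, which leaves values unwrapped.

```python
MASK64 = (1 << 64) - 1  # a value with all 64 bits set


def wrap64(value):
    """Keep only the low 64 bits, the way a 64-bit register would."""
    return value & MASK64


def to_signed64(value):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


print(wrap64((1 << 64) + 5))    # 5
print(to_signed64(wrap64(-1)))  # -1
```

Wrapping after every add and mul would be enough for the instructions covered here.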

Purpose of the Emulator

  • Simulate ARM64 instruction execution.

Initializing Emulator

  • Constructor initializes registers (x0-x30) to 0.
  • Memory starts empty and the program counter (pc) starts at 0.
  • The instruction list and label table start empty; loadProgram() fills them in later.

Initializing Code

cpp

#include <iostream>
#include <unordered_map>
#include <vector>
#include <string>
#include <sstream>

class ARM64Emulator {
private:
    std::unordered_map<std::string, int> registers;
    std::unordered_map<int, int> memory;
    std::vector<std::string> instructions;
    std::unordered_map<std::string, int> labels;
    int pc;

public:
    ARM64Emulator() : pc(0) {
        for (int i = 0; i < 31; i++) {
            registers["x" + std::to_string(i)] = 0;
        }
    }

};

python

class ARM64Simulator:
    def __init__(self):
        self.registers = {f'x{i}': 0 for i in range(31)}
        self.memory = {}
        self.pc = 0
        self.instructions = []
        self.labels = {}

javascript

class ARM64Emulator {
    constructor() {
        this.registers = {};
        for (let i = 0; i < 31; i++) {
            this.registers[`x${i}`] = 0;
        }
        this.memory = {};
        this.pc = 0;
        this.instructions = [];
        this.labels = {};
    }

}

Loading the Program

  • loadProgram(program): Loads the program into the emulator.
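  • Splits the program into lines, trims whitespace, and filters out empty lines.
  • Calls parseLabels(), which maps each label name to the index of its line so that b can jump to it. Any line containing a colon is treated as a label.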

Loading Code

cpp

    void loadProgram(const std::string& program) {
        std::istringstream stream(program);
        std::string line;
        while (std::getline(stream, line)) {
            std::string trimmed = trim(line);
            if (!trimmed.empty()) {
                instructions.push_back(trimmed);
            }
        }
        parseLabels();
    }

    void parseLabels() {
        for (size_t i = 0; i < instructions.size(); i++) {
            const std::string& line = instructions[i];
            size_t colonPos = line.find(':');
            if (colonPos != std::string::npos) {
                std::string label = trim(line.substr(0, colonPos));
                labels[label] = i;
            }
        }
    }

    std::string trim(const std::string& str) {
        size_t first = str.find_first_not_of(" \t");
        size_t last = str.find_last_not_of(" \t");
        return (first == std::string::npos || last == std::string::npos) ? "" : str.substr(first, (last - first + 1));
    }

python

    def load_program(self, program):
        self.instructions = [line.strip() for line in program.split('\n') if line.strip()]
        self.parse_labels()

    def parse_labels(self):
        for i, line in enumerate(self.instructions):
            if ':' in line:
                label = line.split(':')[0].strip()
                self.labels[label] = i

javascript

    loadProgram(program) {
        this.instructions = program.split('\n').map(line => line.trim()).filter(line => line);
        this.parseLabels();
    }

    parseLabels() {
        this.instructions.forEach((line, i) => {
            if (line.includes(':')) {
                const label = line.split(':')[0].trim();
                this.labels[label] = i;
            }
        });
    }
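To see what the label table ends up looking like, here is a small standalone sketch in Python. It is a variation on the code above rather than a copy: parse_labels is written as a free function, and it only accepts lines that end with a colon (my assumption, chosen to match the rule run() uses to skip label lines).

```python
def parse_labels(instructions):
    """Map each label name to the index of the line that defines it."""
    labels = {}
    for index, line in enumerate(instructions):
        if line.endswith(':'):
            labels[line[:-1].strip()] = index
    return labels


program = """
start:
    mov x0, #1
    b end
    mov x0, #2
end:
"""

# Same cleanup as load_program: trim each line and drop empty ones.
instructions = [line.strip() for line in program.split('\n') if line.strip()]
labels = parse_labels(instructions)
print(instructions)
print(labels)  # {'start': 0, 'end': 4}
```

Labels share the same index space as instructions, which is why a branch can simply assign the label's index to pc.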

Running the Program

  • run(): Executes the loaded instructions one by one.
  • Skips label lines and calls executeInstruction(line) for each instruction.

Running Code

cpp

    void run() {
        while (pc < static_cast<int>(instructions.size())) {
            const std::string& line = instructions[pc];
            if (line.back() != ':') {
                executeInstruction(line);
            }
            pc++;
        }
    }

python

    def run(self):
        while self.pc < len(self.instructions):
            line = self.instructions[self.pc]
            if not line.endswith(':'):
                self.execute_instruction(line)
            self.pc += 1

javascript

    run() {
        while (this.pc < this.instructions.length) {
            const line = this.instructions[this.pc];
            if (!line.endsWith(':')) {
                this.executeInstruction(line);
            }
            this.pc++;
        }
    }

Executing Instructions

  • executeInstruction(line): Parses and executes a single instruction.
  • Supports ldr, str, add, mul, mov, svc, and b instructions.

Executing Code

cpp

    void executeInstruction(const std::string& line) {
        std::istringstream iss(line);
        std::vector<std::string> parts;
        std::string part;
        while (iss >> part) {
            parts.push_back(part);
        }

        const std::string& cmd = parts[0];
        // Handle 'ldr', 'str', 'add', 'mul', 'mov', 'svc', 'b'

    }

python

    def execute_instruction(self, line):
        parts = line.split()
        cmd = parts[0]

        # Handle 'ldr', 'str', 'add', 'mul', 'mov', 'svc', 'b'

javascript

    executeInstruction(line) {
        const parts = line.split(/\s+/);
        const cmd = parts[0];

        switch (cmd) {
            // Handle 'ldr', 'str', 'add', 'mul', 'mov', 'svc', 'b'
        }
    }
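Splitting on whitespace works as long as every comma is followed by a space, so something like add x3,x1, x2 would trip it up. One possible alternative, sketched below, is to split off the mnemonic first and then split the operands on commas. The tokenize helper is hypothetical and not part of the emulator in this post, and it assumes empty lines were already filtered out.

```python
def tokenize(line):
    """Split an instruction into its mnemonic and a list of operands."""
    pieces = line.strip().split(None, 1)  # mnemonic, then everything else
    mnemonic = pieces[0]
    rest = pieces[1] if len(pieces) > 1 else ''
    operands = [op.strip() for op in rest.split(',') if op.strip()]
    return mnemonic, operands


print(tokenize('add x3,x1, x2'))  # ('add', ['x3', 'x1', 'x2'])
print(tokenize('ldr x1, [x0]'))   # ('ldr', ['x1', '[x0]'])
print(tokenize('b loop'))         # ('b', ['loop'])
```

Splitting on commas is fine for the operand forms used in this post, but it would break offset addressing such as [x0, #8], which this emulator does not support anyway.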

LDR and STR Instructions

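  • ldr: Loads a value into a register. The =value form loads the operand itself (the C++ version expects a number, while the Python and JavaScript versions keep the symbol name as a string). The [xN] form loads the memory cell whose address is held in xN.
  • str: Stores a register's value into the memory cell whose address is held in the bracketed register.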

LDR Code

cpp

        if (cmd == "ldr") {
            std::string reg = parts[1].substr(0, parts[1].length() - 1); // remove trailing comma
            std::string value = parts[2];
            if (value[0] == '=') {
                // Literal form (ldr x0, =5): the operand must be numeric here
                int addr = std::stoi(value.substr(1));
                registers[reg] = addr;
            } else {
                // Register-indirect form (ldr x1, [x0]): strip the brackets
                int addr = registers[value.substr(1, value.length() - 2)];
                registers[reg] = memory[addr];
            }
        }

python

        if cmd == 'ldr':
            reg, value = parts[1].strip(','), parts[2]
            if value.startswith('='):
                # Literal form (ldr x0, =num1): keep the symbol name as the address
                addr = value[1:]
                self.registers[reg] = addr
            else:
                # Register-indirect form (ldr x1, [x0]); unset cells read as 0
                addr = self.registers[value.strip('[]')]
                self.registers[reg] = self.memory.get(addr, 0)

javascript

            case 'ldr': {
                const reg = parts[1].replace(',', '');
                const value = parts[2];
                if (value.startsWith('=')) {
                    // Literal form (ldr x0, =num1): keep the symbol name as the address
                    const addr = value.substring(1);
                    this.registers[reg] = addr;
                } else {
                    // Register-indirect form (ldr x1, [x0]); unset cells read as 0
                    const addr = this.registers[value.replace('[', '').replace(']', '')];
                    this.registers[reg] = this.memory[addr] || 0;
                }
                break;
            }

STR Code

cpp

        else if (cmd == "str") {
            std::string value = parts[1].substr(0, parts[1].length() - 1);
            std::string reg = parts[2].substr(1, parts[2].length() - 2);
            int addr = registers[reg];
            memory[addr] = registers[value];
        }

python

        elif cmd == 'str':
            value, reg = parts[1].strip(','), parts[2]
            addr = self.registers[reg.strip('[]')]
            self.memory[addr] = self.registers[value]

javascript

            case 'str': {
                const value = parts[1].replace(',', '');
                const reg = parts[2];
                const addr = this.registers[reg.replace('[', '').replace(']', '')];
                this.memory[addr] = this.registers[value];
                break;
            }
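Here is a small round trip through ldr and str using numeric addresses, closer to how the C++ version behaves. This is a sketch under my own assumptions: the free functions ldr and str_ and the use of int(text, 0) to accept both decimal and hex literals are illustrative choices, not the code from this post.

```python
registers = {f'x{i}': 0 for i in range(31)}
memory = {16: 42}  # address -> value


def ldr(dest, operand):
    if operand.startswith('='):
        # Literal form: put the number itself into the register.
        registers[dest] = int(operand[1:], 0)
    else:
        # Register-indirect form: read the cell the register points at.
        address = registers[operand.strip('[]')]
        registers[dest] = memory.get(address, 0)


def str_(src, operand):
    # Write the source register into the cell the bracketed register points at.
    address = registers[operand.strip('[]')]
    memory[address] = registers[src]


ldr('x0', '=16')    # x0 = 16
ldr('x1', '[x0]')   # x1 = memory[16] = 42
ldr('x0', '=0x20')  # x0 = 32
str_('x1', '[x0]')  # memory[32] = 42
print(registers['x1'], memory)  # 42 {16: 42, 32: 42}
```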

ADD and MUL Instructions

  • add: Adds values from two registers and stores the result in a destination register.
  • mul: Multiplies values from two registers and stores the result in a destination register.

ADD and MUL Code

cpp

        else if (cmd == "add") {
            std::string dest = parts[1].substr(0, parts[1].length() - 1);
            std::string src1 = parts[2].substr(0, parts[2].length() - 1);
            std::string src2 = parts[3];
            registers[dest] = registers[src1] + registers[src2];
        } else if (cmd == "mul") {
            std::string dest = parts[1].substr(0, parts[1].length() - 1);
            std::string src1 = parts[2].substr(0, parts[2].length() - 1);
            std::string src2 = parts[3];
            registers[dest] = registers[src1] * registers[src2];
        }

python

        elif cmd == 'add':
            dest, src1, src2 = parts[1].strip(','), parts[2].strip(','), parts[3]
            self.registers[dest] = self.registers[src1] + self.registers[src2]
        elif cmd == 'mul':
            dest, src1, src2 = parts[1].strip(','), parts[2].strip(','), parts[3]
            self.registers[dest] = self.registers[src1] * self.registers[src2]

javascript

            case 'add': {
                const dest = parts[1].replace(',', '');
                const src1 = parts[2].replace(',', '');
                const src2 = parts[3];
                this.registers[dest] = this.registers[src1] + this.registers[src2];
                break;
            }
            case 'mul': {
                const dest = parts[1].replace(',', '');
                const src1 = parts[2].replace(',', '');
                const src2 = parts[3];
                this.registers[dest] = this.registers[src1] * this.registers[src2];
                break;
            }
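The add and mul branches differ only in the operator, so one possible refactor is a small lookup table. The sketch below is an alternative layout, not the code used in this post; ARITHMETIC and execute_arithmetic are hypothetical names.

```python
import operator

# Mnemonic -> the Python function that implements it.
ARITHMETIC = {'add': operator.add, 'mul': operator.mul}


def execute_arithmetic(registers, line):
    """Run a three-register instruction such as 'add x3, x1, x2'."""
    parts = line.split()
    cmd = parts[0]
    dest, src1, src2 = parts[1].strip(','), parts[2].strip(','), parts[3]
    registers[dest] = ARITHMETIC[cmd](registers[src1], registers[src2])


registers = {'x1': 5, 'x2': 7, 'x3': 0, 'x4': 3, 'x5': 0}
execute_arithmetic(registers, 'add x3, x1, x2')
execute_arithmetic(registers, 'mul x5, x3, x4')
print(registers['x3'], registers['x5'])  # 12 36
```

Adding another three-register instruction, say sub, would then be a one-line change to the table.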

MOV and B Instructions

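  • mov: Moves an immediate value, written with a # prefix (mov x0, #1), into a register.
  • b: Branches to a labeled instruction by setting pc to the label's index minus one, because run() increments pc after every instruction.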

MOV and B Code

cpp

        else if (cmd == "mov") {
            std::string reg = parts[1].substr(0, parts[1].length() - 1);
            int value = std::stoi(parts[2].substr(1)); // skip the leading '#'
            registers[reg] = value;
        } else if (cmd == "svc") {
            // Not implemented: placeholder for system calls
        } else if (cmd == "b") {
            std::string label = parts[1];
            pc = labels[label] - 1; // run() increments pc after this returns
        } else {
            std::cout << "Unknown instruction: " << cmd << std::endl;
        }

python

        elif cmd == 'mov':
            reg, value = parts[1].strip(','), int(parts[2].strip('#'))
            self.registers[reg] = value
        elif cmd == 'svc':
            pass  # We will handle syscall separately
        elif cmd == 'b':
            label = parts[1]
            self.pc = self.labels[label] - 1
        else:
            print(f"Unknown instruction: {cmd}")

javascript

            case 'mov': {
                const reg = parts[1].replace(',', '');
                const value = parseInt(parts[2].replace('#', ''));
                this.registers[reg] = value;
                break;
            }
            case 'svc': {
                // Handle syscall separately
                break;
            }
            case 'b': {
                const label = parts[1];
                this.pc = this.labels[label] - 1;
                break;
            }
            default: {
                console.log(`Unknown instruction: ${cmd}`);
                break;
            }
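The "minus one" in the branch can look odd at first. The stripped-down loop below (a sketch that only understands b and records everything else in a trace) shows how it cooperates with the pc increment in the run loop. The run function and the trace list are illustrative names.

```python
def run(instructions):
    """Execute only 'b' and return the list of lines that were reached."""
    labels = {line[:-1]: i for i, line in enumerate(instructions) if line.endswith(':')}
    pc, trace = 0, []
    while pc < len(instructions):
        line = instructions[pc]
        if not line.endswith(':'):
            trace.append(line)
            parts = line.split()
            if parts[0] == 'b':
                # Land one short of the label; the increment below finishes the jump.
                pc = labels[parts[1]] - 1
        pc += 1
    return trace


trace = run(['mov x0, #1', 'b done', 'mov x0, #2', 'done:', 'mov x1, #3'])
print(trace)  # ['mov x0, #1', 'b done', 'mov x1, #3']
```

Note that mov x0, #2 never runs. Since b is unconditional, a backward branch would loop forever in this emulator.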

Memory and Register Handling

  • initializeMemory(memoryInit): Initializes memory with given values.
  • printMemory(): Prints the current state of memory.
  • printRegisters(): Prints the current state of registers.

Memory and Register Code

cpp

    void printMemory() {
        std::cout << "Memory:" << std::endl;
        for (const auto& [k, v] : memory) {
            std::cout << k << ": " << v << std::endl;
        }
    }

    void printRegisters() {
        std::cout << "Registers:" << std::endl;
        for (const auto& [k, v] : registers) {
            std::cout << k << ": " << v << std::endl;
        }
    }

    void initializeMemory(const std::unordered_map<std::string, int>& memoryInit) {
        for (const auto& [key, value] : memoryInit) {
            memory[std::stoi(key)] = value;
        }
    }

python

    def print_memory(self):
        print("Memory:")
        for k, v in self.memory.items():
            print(f"{k}: {v}")

    def print_registers(self):
        print("Registers:")
        for k, v in self.registers.items():
            print(f"{k}: {v}")

    def initialize_memory(self, memory_init):
        for var, value in memory_init.items():
            self.memory[var] = value

javascript

    printMemory() {
        console.log("Memory:");
        for (const [k, v] of Object.entries(this.memory)) {
            console.log(`${k}: ${v}`);
        }
    }

    printRegisters() {
        console.log("Registers:");
        for (const [k, v] of Object.entries(this.registers)) {
            console.log(`${k}: ${v}`);
        }
    }

    initializeMemory(memoryInit) {
        this.memory = { ...memoryInit };
    }
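Printing all 31 registers gets noisy, and in the C++ version the order is not guaranteed because std::unordered_map has no defined iteration order. A possible tweak, sketched here in Python, is to sort by register number and hide the zeros. format_registers and its skip_zero flag are hypothetical and not part of the emulator above.

```python
def format_registers(registers, skip_zero=True):
    """Return one 'xN: value' line per register, ordered by register number."""
    lines = []
    for name in sorted(registers, key=lambda r: int(r[1:])):
        value = registers[name]
        if skip_zero and value == 0:
            continue
        lines.append(f'{name:>3}: {value}')
    return '\n'.join(lines)


registers = {f'x{i}': 0 for i in range(31)}
registers['x10'] = 36
registers['x2'] = 7
print(format_registers(registers))
```

Sorting on the numeric part matters: a plain string sort would put x10 before x2.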

Putting It All Together

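  • Define the program to be executed. The sample program adds two values from memory, multiplies the sum by a third, and stores the result: (5 + 7) * 3 = 36. The C++ driver uses numeric addresses, while the Python and JavaScript drivers use symbol names.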
  • Initialize memory with values.
  • Create emulator instance, load program, run, and print results.

Driver Code

cpp

int main() {
    std::string program = 
        "ldr x0, =5\n"
        "ldr x1, [x0]\n"
        "ldr x0, =7\n"
        "ldr x2, [x0]\n"
        "add x3, x1, x2\n"
        "ldr x0, =3\n"
        "ldr x4, [x0]\n"
        "mul x5, x3, x4\n"
        "ldr x0, =0\n"
        "str x5, [x0]\n";

    std::unordered_map<std::string, int> memoryInit = {
        {"5", 5},
        {"7", 7},
        {"3", 3},
        {"0", 0}
    };

    ARM64Emulator emulator;
    emulator.initializeMemory(memoryInit);
    emulator.loadProgram(program);
    emulator.run();
    emulator.printRegisters();
    emulator.printMemory();

    return 0;
}

python

program = """
ldr x0, =num1
ldr x1, [x0]
ldr x0, =num2
ldr x2, [x0]
add x3, x1, x2
ldr x0, =multiplier
ldr x4, [x0]
mul x5, x3, x4
ldr x0, =result
str x5, [x0]
"""

memory_init = {
    'num1': 5,
    'num2': 7,
    'multiplier': 3,
    'result': 0
}

simulator = ARM64Simulator()
simulator.initialize_memory(memory_init)
simulator.load_program(program)
simulator.run()
simulator.print_registers()
simulator.print_memory()

javascript

const program = `
ldr x0, =num1
ldr x1, [x0]
ldr x0, =num2
ldr x2, [x0]
add x3, x1, x2
ldr x0, =multiplier
ldr x4, [x0]
mul x5, x3, x4
ldr x0, =result
str x5, [x0]
`;

const memoryInit = {
    'num1': 5,
    'num2': 7,
    'multiplier': 3,
    'result': 0
};

const emulator = new ARM64Emulator();
emulator.initializeMemory(memoryInit);
emulator.loadProgram(program);
emulator.run();
emulator.printRegisters();
emulator.printMemory();
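If you want to check the expected result in one go, here is a compact, self-contained Python version that stitches the fragments from this post into a single class. Treat it as a condensed sketch rather than the repository code: the class name MiniARM64 and the single execute method are my own packaging.

```python
class MiniARM64:
    def __init__(self, memory_init=None):
        self.registers = {f'x{i}': 0 for i in range(31)}
        self.memory = dict(memory_init or {})
        self.pc = 0
        self.instructions = []
        self.labels = {}

    def load_program(self, program):
        self.instructions = [l.strip() for l in program.split('\n') if l.strip()]
        self.labels = {l[:-1]: i for i, l in enumerate(self.instructions) if l.endswith(':')}

    def run(self):
        while self.pc < len(self.instructions):
            line = self.instructions[self.pc]
            if not line.endswith(':'):
                self.execute(line)
            self.pc += 1

    def execute(self, line):
        parts = line.split()
        cmd, args = parts[0], [p.strip(',') for p in parts[1:]]
        regs = self.registers
        if cmd == 'ldr':
            if args[1].startswith('='):
                regs[args[0]] = args[1][1:]  # keep the symbol name as the address
            else:
                regs[args[0]] = self.memory.get(regs[args[1].strip('[]')], 0)
        elif cmd == 'str':
            self.memory[regs[args[1].strip('[]')]] = regs[args[0]]
        elif cmd == 'add':
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif cmd == 'mul':
            regs[args[0]] = regs[args[1]] * regs[args[2]]
        elif cmd == 'mov':
            regs[args[0]] = int(args[1].strip('#'))
        elif cmd == 'b':
            self.pc = self.labels[args[0]] - 1
        elif cmd == 'svc':
            pass  # placeholder, as in the post
        else:
            print(f'Unknown instruction: {cmd}')


program = """
ldr x0, =num1
ldr x1, [x0]
ldr x0, =num2
ldr x2, [x0]
add x3, x1, x2
ldr x0, =multiplier
ldr x4, [x0]
mul x5, x3, x4
ldr x0, =result
str x5, [x0]
"""

sim = MiniARM64({'num1': 5, 'num2': 7, 'multiplier': 3, 'result': 0})
sim.load_program(program)
sim.run()
print(sim.memory['result'])  # 36
```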

Future Work

The journey doesn't end here! Building a simple emulator is just the beginning. You can take it further with the following tasks:

  • Implement additional ARM64 instructions to enhance your emulator's capabilities.
  • Explore conditional instructions, floating-point operations, and vector processing.
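As a taste of what conditional instructions could look like, here is a rough sketch that adds cmp and b.ne on top of the same pc and label scheme. Everything in it is an assumption for illustration: real ARM64 cmp sets four condition flags (N, Z, C, V), while this toy version tracks only a zero flag, and run_with_conditions is a made-up name.

```python
def run_with_conditions(instructions):
    """Tiny interpreter for mov, add, cmp, and b.ne; returns the registers."""
    registers = {f'x{i}': 0 for i in range(31)}
    labels = {l[:-1]: i for i, l in enumerate(instructions) if l.endswith(':')}
    zero_flag = False  # set by cmp when both operands are equal
    pc = 0
    while pc < len(instructions):
        parts = instructions[pc].replace(',', ' ').split()
        cmd = parts[0]
        if cmd == 'mov':
            registers[parts[1]] = int(parts[2].lstrip('#'))
        elif cmd == 'add':  # register + register only
            registers[parts[1]] = registers[parts[2]] + registers[parts[3]]
        elif cmd == 'cmp':
            zero_flag = registers[parts[1]] == registers[parts[2]]
        elif cmd == 'b.ne' and not zero_flag:
            pc = labels[parts[1]] - 1  # same trick as the unconditional b
        pc += 1  # label lines match no branch above and are simply skipped
    return registers


result = run_with_conditions([
    'mov x0, #0',
    'mov x1, #1',
    'mov x2, #3',
    'loop:',
    'add x0, x0, x1',
    'cmp x0, x2',
    'b.ne loop',
])
print(result['x0'])  # 3
```

With a conditional branch in place, backward jumps finally become useful because the loop has a way to stop.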

GitHub

https://github.com/ToyMath/ToyARM64Emulator


Top comments (1)

Anita Olsen

Python is my choice all the way! 😊
