Hey everyone!
Ever wondered what it's like to get really close to the chip level?
Dive into the world of ARM64 by building your own emulator!
Whether you're into C++, Python, or JavaScript, I've got you covered with this super easy-to-follow post.
What You'll Learn
- Get up close and personal with ARM64 architecture.
- Gain hands-on experience with low-level programming and emulation.
- Build an emulator in your favorite language: C++, Python, or JavaScript.
Why Build an Emulator?
- Learn by Doing.
- Understand the ARM64 architecture.
Choose Your Language:
- C++: Perfect for those who love performance and speed.
- Python: Great if you prefer simplicity and readability.
- JavaScript: Awesome for web-based emulation and flexibility.
Features
- Emulates 31 general-purpose registers (x0 to x30).
- Supports basic ARM64 instructions: ldr, str, add, mul, mov, svc, and b.
- Handles memory operations.
- Can print the current state of registers and memory.
Methods
- constructor(): Initializes the emulator with all registers set to 0 and empty memory, and sets the program counter (pc) to 0.
- loadProgram(program): Loads a program into the emulator. The program should be a string of ARM64 assembly instructions.
- run(): Runs the loaded program.
- printMemory(): Prints the current state of the memory.
- printRegisters(): Prints the current state of the registers.
- initializeMemory(memoryInit): Initializes the emulator's memory with the given key-value pairs.
Supported Instructions
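- ldr: Loads a value into a register, either the literal after = (ldr x0, =5) or the memory word addressed by a register (ldr x1, [x0]).
- str: Stores a register's value into memory at the address held in another register (str x5, [x0]).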
- add: Adds two register values and stores the result in a destination register.
- mul: Multiplies two register values and stores the result in a destination register.
- mov: Moves an immediate value into a register.
- svc: (Not implemented) Placeholder for handling system calls.
- b: Branches to a labeled instruction.
ARM64 Overview
- ARM64 (AArch64) is a 64-bit architecture used in modern processors.
- Provides 31 general-purpose registers (x0-x30), each 64 bits wide.
- Designed for high performance and energy efficiency.
Purpose of the Emulator
- Simulate ARM64 instruction execution.
Initializing Emulator
- Constructor initializes registers (x0-x30) to 0.
- Memory starts empty and the program counter (pc) starts at 0.
- The instruction list and label table start empty and are filled in later by loadProgram().
Initializing Code
cpp
#include <iostream>
#include <unordered_map>
#include <vector>
#include <string>
#include <sstream>
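class ARM64Emulator {
private:
    // Values are stored as int to keep the example simple; real ARM64 registers are 64 bits wide.
    std::unordered_map<std::string, int> registers; // register name -> value
    std::unordered_map<int, int> memory;            // address -> value
    std::vector<std::string> instructions;          // one entry per non-empty source line
    std::unordered_map<std::string, int> labels;    // label name -> index into instructions
    int pc;                                         // index of the next line to execute
public:
    ARM64Emulator() : pc(0) {
        // Create x0..x30 and set each one to 0.
        for (int i = 0; i < 31; i++) {
            registers["x" + std::to_string(i)] = 0;
        }
    }
    // The member functions shown in the following sections go here, inside the class.
};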
python
class ARM64Simulator:
    def __init__(self):
        self.registers = {f'x{i}': 0 for i in range(31)}
        self.memory = {}
        self.pc = 0
        self.instructions = []
        self.labels = {}
javascript
class ARM64Emulator {
    constructor() {
        this.registers = {};
        for (let i = 0; i < 31; i++) {
            this.registers[`x${i}`] = 0;
        }
        this.memory = {};
        this.pc = 0;
        this.instructions = [];
        this.labels = {};
    }
}
Loading the Program
- loadProgram(program): Loads the program into the emulator.
- Splits the program into instructions and filters out empty lines.
- Calls parseLabels() to identify labels in the program.
Loading Code
cpp
    void loadProgram(const std::string& program) {
        std::istringstream stream(program);
        std::string line;
        while (std::getline(stream, line)) {
            std::string trimmed = trim(line);
            if (!trimmed.empty()) {
                instructions.push_back(trimmed);
            }
        }
        parseLabels();
    }
    void parseLabels() {
        for (size_t i = 0; i < instructions.size(); i++) {
            const std::string& line = instructions[i];
            size_t colonPos = line.find(':');
            if (colonPos != std::string::npos) {
                // Everything before the colon is the label name; map it to its line index.
                std::string label = trim(line.substr(0, colonPos));
                labels[label] = static_cast<int>(i);
            }
        }
    }
    // Strips leading and trailing spaces, tabs, and carriage returns.
    std::string trim(const std::string& str) {
        size_t first = str.find_first_not_of(" \t\r");
        if (first == std::string::npos) {
            return ""; // the line is blank
        }
        size_t last = str.find_last_not_of(" \t\r");
        return str.substr(first, last - first + 1);
    }
python
    def load_program(self, program):
        self.instructions = [line.strip() for line in program.split('\n') if line.strip()]
        self.parse_labels()
    def parse_labels(self):
        for i, line in enumerate(self.instructions):
            if ':' in line:
                label = line.split(':')[0].strip()
                self.labels[label] = i
javascript
    loadProgram(program) {
        this.instructions = program.split('\n').map(line => line.trim()).filter(line => line);
        this.parseLabels();
    }
    parseLabels() {
        this.instructions.forEach((line, i) => {
            if (line.includes(':')) {
                const label = line.split(':')[0].trim();
                this.labels[label] = i;
            }
        });
    }
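To make the label table concrete, here is a small standalone Python sketch that mirrors the loading logic above on a four-line program. The helper name build_label_table and the sample program are made up for illustration; they are not part of the emulator classes.

```python
def build_label_table(program):
    """Split a program into non-empty lines and map each label to its line index."""
    instructions = [line.strip() for line in program.split("\n") if line.strip()]
    labels = {}
    for index, line in enumerate(instructions):
        if ":" in line:
            # The text before the first colon is the label name.
            labels[line.split(":")[0].strip()] = index
    return instructions, labels


sample = """
mov x0, #1
loop:
add x0, x0, x0
b loop
"""
instructions, labels = build_label_table(sample)
print(labels)  # {'loop': 1}
```

Note that the label line itself stays in the instruction list, which is why run() has to skip lines ending in a colon.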
Running the Program
- run(): Executes the loaded instructions one by one.
- Skips label lines and calls executeInstruction(line) for each instruction.
Running Code
cpp
    void run() {
        // Cast pc so the comparison with size() is between two unsigned values.
        while (static_cast<size_t>(pc) < instructions.size()) {
            const std::string& line = instructions[pc];
            if (line.back() != ':') {
                executeInstruction(line);
            }
            pc++;
        }
    }
python
    def run(self):
        while self.pc < len(self.instructions):
            line = self.instructions[self.pc]
            if not line.endswith(':'):
                self.execute_instruction(line)
            self.pc += 1
javascript
    run() {
        while (this.pc < this.instructions.length) {
            const line = this.instructions[this.pc];
            if (!line.endsWith(':')) {
                this.executeInstruction(line);
            }
            this.pc++;
        }
    }
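Because b is an unconditional branch, any program that branches backwards will loop forever in the run() shown above. One way you might guard against that is a step limit. The sketch below is only an illustration under that assumption: run_with_limit, StepLimitExceeded, and max_steps are hypothetical names, and the function only follows b and label lines, ignoring every other instruction.

```python
class StepLimitExceeded(RuntimeError):
    """Raised when a program runs longer than the allowed number of steps."""


def run_with_limit(instructions, labels, max_steps=1000):
    """Follow 'b' branches through the program, stopping after max_steps lines."""
    pc = 0
    steps = 0
    while pc < len(instructions):
        if steps >= max_steps:
            raise StepLimitExceeded(f"stopped after {max_steps} steps")
        line = instructions[pc]
        if not line.endswith(":"):
            parts = line.split()
            if parts[0] == "b":
                # Same trick as the emulator: land one before the label,
                # because pc is incremented below.
                pc = labels[parts[1]] - 1
        steps += 1
        pc += 1
    return steps


# A forward branch terminates normally.
finite_steps = run_with_limit(["b end", "mov x0, #1", "end:"], {"end": 2})

# A backward branch would never end without the limit.
try:
    run_with_limit(["loop:", "b loop"], {"loop": 0}, max_steps=10)
    hit_limit = False
except StepLimitExceeded:
    hit_limit = True
```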
Executing Instructions
- executeInstruction(line): Parses and executes a single instruction.
- Supports ldr, str, add, mul, mov, svc, and b instructions.
Executing Code
cpp
    void executeInstruction(const std::string& line) {
        std::istringstream iss(line);
        std::vector<std::string> parts;
        std::string part;
        while (iss >> part) {
            parts.push_back(part);
        }
        if (parts.empty()) {
            return; // nothing to execute on a blank line
        }
        const std::string& cmd = parts[0];
        // Handle 'ldr', 'str', 'add', 'mul', 'mov', 'svc', 'b'
    }
python
    def execute_instruction(self, line):
        parts = line.split()
        cmd = parts[0]
        # Handle 'ldr', 'str', 'add', 'mul', 'mov', 'svc', 'b'
javascript
    executeInstruction(line) {
        const parts = line.split(/\s+/);
        const cmd = parts[0];
        switch (cmd) {
            // Handle 'ldr', 'str', 'add', 'mul', 'mov', 'svc', 'b'
        }
    }
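All three versions split the line on whitespace and then strip commas operand by operand, so a line like add x3,x1,x2 (no spaces after the commas) would not parse. If you want something a bit more forgiving, a tokenizer along these lines could work. This is a sketch, not what the code above does: tokenize is a hypothetical helper, and it assumes operands never contain commas themselves, which holds for the instructions supported here.

```python
def tokenize(line):
    """Return (mnemonic, operands) for one non-empty instruction line."""
    pieces = line.strip().split(None, 1)  # split off the mnemonic on any whitespace
    mnemonic = pieces[0]
    rest = pieces[1] if len(pieces) > 1 else ""
    # Operands are comma-separated; spaces around them are optional.
    operands = [operand.strip() for operand in rest.split(",") if operand.strip()]
    return mnemonic, operands


print(tokenize("add x3,x1 , x2"))  # ('add', ['x3', 'x1', 'x2'])
print(tokenize("ldr x1, [x0]"))    # ('ldr', ['x1', '[x0]'])
print(tokenize("b loop"))          # ('b', ['loop'])
```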
LDR and STR Instructions
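- ldr: Has two forms. ldr xN, =value puts the value after = into the register; ldr xN, [xM] loads the memory word whose address is held in xM.
- str: str xN, [xM] stores the value of xN into memory at the address held in xM.
- Note that the C++ version expects a number after = (it calls std::stoi), while the Python and JavaScript versions keep the text after = as-is and use it as the memory key.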
LDR Code
cpp
        if (cmd == "ldr") {
            std::string reg = parts[1].substr(0, parts[1].length() - 1); // remove trailing comma
            std::string value = parts[2];
            if (value[0] == '=') {
                int addr = std::stoi(value.substr(1));
                registers[reg] = addr;
            } else {
                int addr = registers[value.substr(1, value.length() - 2)];
                registers[reg] = memory[addr];
            }
        }
python
        if cmd == 'ldr':
            reg, value = parts[1].strip(','), parts[2]
            if value.startswith('='):
                # Keep the text after '=' (e.g. 'num1'); it doubles as the memory key.
                addr = value[1:]
                self.registers[reg] = addr
            else:
                addr = self.registers[value.strip('[]')]
                self.registers[reg] = self.memory.get(addr, 0)
javascript
            case 'ldr': {
                const reg = parts[1].replace(',', '');
                const value = parts[2];
                if (value.startsWith('=')) {
                    const addr = value.substring(1);
                    this.registers[reg] = addr;
                } else {
                    const addr = this.registers[value.replace('[', '').replace(']', '')];
                    this.registers[reg] = this.memory[addr] || 0;
                }
                break;
            }
STR Code
cpp
          else if (cmd == "str") {
            std::string value = parts[1].substr(0, parts[1].length() - 1);
            std::string reg = parts[2].substr(1, parts[2].length() - 2);
            int addr = registers[reg];
            memory[addr] = registers[value];
        } 
python
        elif cmd == 'str':
            value, reg = parts[1].strip(','), parts[2]
            addr = self.registers[reg.strip('[]')]
            self.memory[addr] = self.registers[value]
javascript
            case 'str': {
                const value = parts[1].replace(',', '');
                const reg = parts[2];
                const addr = this.registers[reg.replace('[', '').replace(']', '')];
                this.memory[addr] = this.registers[value];
                break;
            }
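Here is how ldr and str fit together, pulled out of the class into a tiny standalone Python sketch that follows the Python version's behavior (symbol names as memory keys). The free functions ldr and str_ and the sample memory contents are illustrative only.

```python
registers = {f"x{i}": 0 for i in range(31)}
memory = {"num1": 5, "result": 0}


def ldr(reg, operand):
    if operand.startswith("="):
        # Literal form: the register receives the symbol, used later as a memory key.
        registers[reg] = operand[1:]
    else:
        # Indirect form: read memory at the key held in the bracketed register.
        registers[reg] = memory.get(registers[operand.strip("[]")], 0)


def str_(reg, operand):
    # Write the register's value to memory at the key held in the bracketed register.
    memory[registers[operand.strip("[]")]] = registers[reg]


ldr("x0", "=num1")    # x0 now holds the key 'num1'
ldr("x1", "[x0]")     # x1 = memory['num1'] = 5
ldr("x0", "=result")  # x0 now holds the key 'result'
str_("x1", "[x0]")    # memory['result'] = 5
print(registers["x1"], memory["result"])  # 5 5
```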
ADD and MUL Instructions
- add: Adds values from two registers and stores the result in a destination register.
- mul: Multiplies values from two registers and stores the result in a destination register.
ADD and MUL Code
cpp
          else if (cmd == "add") {
            std::string dest = parts[1].substr(0, parts[1].length() - 1);
            std::string src1 = parts[2].substr(0, parts[2].length() - 1);
            std::string src2 = parts[3];
            registers[dest] = registers[src1] + registers[src2];
        } else if (cmd == "mul") {
            std::string dest = parts[1].substr(0, parts[1].length() - 1);
            std::string src1 = parts[2].substr(0, parts[2].length() - 1);
            std::string src2 = parts[3];
            registers[dest] = registers[src1] * registers[src2];
        }
python
        elif cmd == 'add':
            dest, src1, src2 = parts[1].strip(','), parts[2].strip(','), parts[3]
            self.registers[dest] = self.registers[src1] + self.registers[src2]
        elif cmd == 'mul':
            dest, src1, src2 = parts[1].strip(','), parts[2].strip(','), parts[3]
            self.registers[dest] = self.registers[src1] * self.registers[src2]
javascript
            case 'add': {
                const dest = parts[1].replace(',', '');
                const src1 = parts[2].replace(',', '');
                const src2 = parts[3];
                this.registers[dest] = this.registers[src1] + this.registers[src2];
                break;
            }
            case 'mul': {
                const dest = parts[1].replace(',', '');
                const src1 = parts[2].replace(',', '');
                const src2 = parts[3];
                this.registers[dest] = this.registers[src1] * this.registers[src2];
                break;
            }
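One thing the code above glosses over: real x registers are 64 bits wide, so add and mul keep only the low 64 bits of the result. Python integers never overflow, so if you want that behavior you have to mask explicitly. A minimal sketch, where MASK64, add64, and mul64 are names I made up for illustration:

```python
MASK64 = (1 << 64) - 1  # 0xFFFFFFFFFFFFFFFF


def add64(a, b):
    """Add two values and keep only the low 64 bits, like add on x registers."""
    return (a + b) & MASK64


def mul64(a, b):
    """Multiply two values and keep only the low 64 bits, like mul on x registers."""
    return (a * b) & MASK64


print(add64(5, 7))        # 12
print(add64(MASK64, 1))   # 0 (wraps around)
print(mul64(1 << 63, 2))  # 0 (the high bit falls off)
```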
MOV and B Instructions
- mov: Moves an immediate value into a register.
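- b: Branches to a labeled instruction. The handler sets pc to the label's index minus one because run() increments pc after every instruction.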
MOV and B Code
cpp
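          else if (cmd == "mov") {
            std::string reg = parts[1].substr(0, parts[1].length() - 1);
            int value = std::stoi(parts[2].substr(1)); // skip the leading '#'
            registers[reg] = value;
        } else if (cmd == "svc") {
            // System calls are not implemented yet; ignore the instruction.
        } else if (cmd == "b") {
            std::string label = parts[1];
            auto target = labels.find(label);
            if (target == labels.end()) {
                std::cout << "Unknown label: " << label << std::endl;
            } else {
                pc = target->second - 1; // run() increments pc after each instruction
            }
        } else {
            std::cout << "Unknown instruction: " << cmd << std::endl;
        }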
python
        elif cmd == 'mov':
            reg, value = parts[1].strip(','), int(parts[2].strip('#'))
            self.registers[reg] = value
        elif cmd == 'svc':
            pass  # We will handle syscall separately
        elif cmd == 'b':
            label = parts[1]
            self.pc = self.labels[label] - 1
        else:
            print(f"Unknown instruction: {cmd}")
javascript
            case 'mov': {
                const reg = parts[1].replace(',', '');
                const value = parseInt(parts[2].replace('#', ''), 10);
                this.registers[reg] = value;
                break;
            }
            case 'svc': {
                // Handle syscall separately
                break;
            }
            case 'b': {
                const label = parts[1];
                this.pc = this.labels[label] - 1;
                break;
            }
            default: {
                console.log(`Unknown instruction: ${cmd}`);
                break;
            }
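The svc case is left empty above. If you want to experiment with it, one possible direction is to follow the Linux AArch64 convention, where x8 holds the system call number and x0 the first argument. The sketch below handles only exit (number 93 on Linux AArch64) and is purely an assumption about how you might extend the emulator: handle_svc and ProgramExit are hypothetical names.

```python
class ProgramExit(Exception):
    """Signals that the emulated program asked to exit."""

    def __init__(self, code):
        super().__init__(f"program exited with code {code}")
        self.code = code


SYS_EXIT = 93  # exit on Linux AArch64


def handle_svc(registers):
    """Dispatch on the syscall number in x8; only exit is supported in this sketch."""
    number = registers["x8"]
    if number == SYS_EXIT:
        raise ProgramExit(registers["x0"])
    raise NotImplementedError(f"unsupported syscall number: {number}")


registers = {f"x{i}": 0 for i in range(31)}
registers["x8"] = SYS_EXIT
registers["x0"] = 3
try:
    handle_svc(registers)
except ProgramExit as exit_request:
    exit_code = exit_request.code
print(exit_code)  # 3
```

In the emulator, run() would catch ProgramExit and stop the loop.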
Memory and Register Handling
- initializeMemory(memoryInit): Initializes memory with given values.
- printMemory(): Prints the current state of memory.
- printRegisters(): Prints the current state of registers.
Memory and Register Code
cpp
    // Note: structured bindings need C++17, and unordered_map iterates in no particular order.
    void printMemory() {
        std::cout << "Memory:" << std::endl;
        for (const auto& [k, v] : memory) {
            std::cout << k << ": " << v << std::endl;
        }
    }
    void printRegisters() {
        std::cout << "Registers:" << std::endl;
        for (const auto& [k, v] : registers) {
            std::cout << k << ": " << v << std::endl;
        }
    }
    void initializeMemory(const std::unordered_map<std::string, int>& memoryInit) {
        for (const auto& [key, value] : memoryInit) {
            memory[std::stoi(key)] = value;
        }
    }
python
    def print_memory(self):
        print("Memory:")
        for k, v in self.memory.items():
            print(f"{k}: {v}")
    def print_registers(self):
        print("Registers:")
        for k, v in self.registers.items():
            print(f"{k}: {v}")
    def initialize_memory(self, memory_init):
        for var, value in memory_init.items():
            self.memory[var] = value
javascript
    printMemory() {
        console.log("Memory:");
        for (const [k, v] of Object.entries(this.memory)) {
            console.log(`${k}: ${v}`);
        }
    }
    printRegisters() {
        console.log("Registers:");
        for (const [k, v] of Object.entries(this.registers)) {
            console.log(`${k}: ${v}`);
        }
    }
    initializeMemory(memoryInit) {
        this.memory = { ...memoryInit };
    }
Putting It All Together
- Define the program to be executed.
- Initialize memory with values.
- Create emulator instance, load program, run, and print results.
Driver Code
cpp
int main() {
    std::string program = 
        "ldr x0, =5\n"
        "ldr x1, [x0]\n"
        "ldr x0, =7\n"
        "ldr x2, [x0]\n"
        "add x3, x1, x2\n"
        "ldr x0, =3\n"
        "ldr x4, [x0]\n"
        "mul x5, x3, x4\n"
        "ldr x0, =0\n"
        "str x5, [x0]\n";
    std::unordered_map<std::string, int> memoryInit = {
        {"5", 5},
        {"7", 7},
        {"3", 3},
        {"0", 0}
    };
    ARM64Emulator emulator;
    emulator.initializeMemory(memoryInit);
    emulator.loadProgram(program);
    emulator.run();
    emulator.printRegisters();
    emulator.printMemory();
    return 0;
}
python
program = """
ldr x0, =num1
ldr x1, [x0]
ldr x0, =num2
ldr x2, [x0]
add x3, x1, x2
ldr x0, =multiplier
ldr x4, [x0]
mul x5, x3, x4
ldr x0, =result
str x5, [x0]
"""
memory_init = {
    'num1': 5,
    'num2': 7,
    'multiplier': 3,
    'result': 0
}
simulator = ARM64Simulator()
simulator.initialize_memory(memory_init)
simulator.load_program(program)
simulator.run()
simulator.print_registers()
simulator.print_memory()
javascript
const program = `
ldr x0, =num1
ldr x1, [x0]
ldr x0, =num2
ldr x2, [x0]
add x3, x1, x2
ldr x0, =multiplier
ldr x4, [x0]
mul x5, x3, x4
ldr x0, =result
str x5, [x0]
`;
const memoryInit = {
    'num1': 5,
    'num2': 7,
    'multiplier': 3,
    'result': 0
};
const emulator = new ARM64Emulator();
emulator.initializeMemory(memoryInit);
emulator.loadProgram(program);
emulator.run();
emulator.printRegisters();
emulator.printMemory();
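If you want to check your results, the program computes (5 + 7) * 3 and stores it in result. The standalone Python sketch below re-implements just the four instructions the driver uses, following the Python version's symbol-as-key behavior, so you can verify the expected values without the full class. The function name run_program is illustrative.

```python
def run_program(program, memory):
    """Run a straight-line program that uses only ldr, str, add, and mul."""
    registers = {f"x{i}": 0 for i in range(31)}
    for raw_line in program.split("\n"):
        line = raw_line.strip()
        if not line:
            continue
        parts = line.split()
        cmd = parts[0]
        if cmd == "ldr":
            reg, value = parts[1].strip(","), parts[2]
            if value.startswith("="):
                registers[reg] = value[1:]  # keep the symbol as the memory key
            else:
                registers[reg] = memory.get(registers[value.strip("[]")], 0)
        elif cmd == "str":
            src, target = parts[1].strip(","), parts[2].strip("[]")
            memory[registers[target]] = registers[src]
        elif cmd in ("add", "mul"):
            dest, a, b = parts[1].strip(","), parts[2].strip(","), parts[3]
            if cmd == "add":
                registers[dest] = registers[a] + registers[b]
            else:
                registers[dest] = registers[a] * registers[b]
        else:
            raise ValueError(f"unsupported instruction: {cmd}")
    return registers, memory


program = """
ldr x0, =num1
ldr x1, [x0]
ldr x0, =num2
ldr x2, [x0]
add x3, x1, x2
ldr x0, =multiplier
ldr x4, [x0]
mul x5, x3, x4
ldr x0, =result
str x5, [x0]
"""
registers, memory = run_program(
    program, {"num1": 5, "num2": 7, "multiplier": 3, "result": 0}
)
print(registers["x3"], registers["x5"], memory["result"])  # 12 36 36
```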
Future Work
The journey doesn't end here! Building a simple emulator is just the beginning. You can take it further with the following tasks:
- Implement additional ARM64 instructions to enhance your emulator's capabilities.
- Explore conditional instructions, floating-point operations, and vector processing.
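As a starting point for conditional instructions, here is a rough Python sketch of cmp plus b.ne driving a counting loop. It is a simplification under stated assumptions: only the zero flag is modeled (real cmp sets all four NZCV flags), cmp takes two registers only, and the function name run is just for this example.

```python
def run(program):
    """Tiny interpreter for mov, add, cmp, and b.ne with a single zero flag."""
    instructions = [line.strip() for line in program.split("\n") if line.strip()]
    labels = {line[:-1]: i for i, line in enumerate(instructions) if line.endswith(":")}
    registers = {f"x{i}": 0 for i in range(31)}
    zero_flag = False
    pc = 0
    while pc < len(instructions):
        line = instructions[pc]
        if not line.endswith(":"):
            parts = line.replace(",", " ").split()
            cmd = parts[0]
            if cmd == "mov":
                registers[parts[1]] = int(parts[2].lstrip("#"))
            elif cmd == "add":
                registers[parts[1]] = registers[parts[2]] + registers[parts[3]]
            elif cmd == "cmp":
                # Set the zero flag when the two registers are equal.
                zero_flag = registers[parts[1]] == registers[parts[2]]
            elif cmd == "b.ne":
                # Branch only when the last comparison found the values different.
                if not zero_flag:
                    pc = labels[parts[1]] - 1
            else:
                raise ValueError(f"unsupported instruction: {cmd}")
        pc += 1
    return registers


counting_loop = """
mov x0, #0
mov x1, #1
mov x2, #3
loop:
add x0, x0, x1
cmp x0, x2
b.ne loop
"""
result = run(counting_loop)
print(result["x0"])  # 3
```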
 
 
              
 
    