chrischtel

Posted on Oct 8

Building an x86_64 Instruction Encoder from Scratch

#assembly #machinecode #x8664 #rust

I’ve always been fascinated by how compilers actually turn code into bytes.

Over the last few weeks I’ve been building Rask, a small x86_64 instruction encoder in Rust that generates machine code programmatically.

It’s meant for anyone who wants to understand instruction encoding or experiment with JIT compilation — without diving into LLVM’s massive complexity.

What it does

Rask takes simple instruction descriptions and turns them into raw x86_64 machine code bytes:

use rask_x86_64::{encoder::Encoder, registers::Reg64::*, operand::Operand};

let mut encoder = Encoder::new();
encoder.mov(Operand::Reg(RAX), Operand::Imm(1337));
encoder.add(RAX, RBX);
encoder.ret();

let bytes = encoder.bytes();
// [0x48, 0xb8, 0x39, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x48, 0x01, 0xd8, 0xc3]

That's valid machine code that loads 1337 into RAX, adds RBX to it, and returns.

Why build this?

LLVM is incredibly powerful but has a steep learning curve and massive compile times. Cranelift is better but still complex for simple use cases. I wanted something you could actually read and understand. I didn’t plan for it to be a full backend — just a weekend experiment that slowly grew legs.

The goal isn't to compete with these tools on features or performance. It's to provide a clean foundation for:

Learning x86_64 instruction encoding
Prototyping JIT compilers
Building custom assemblers or analysis tools
Understanding how CPUs actually execute code

Technical details

The interesting parts were getting the encoding details right:

REX prefixes - x86_64 uses REX prefixes to access extended registers (R8-R15). The encoder automatically generates the correct prefix based on which registers you use:

encoder.mov(Operand::Reg(RAX), Operand::Imm(42));  // REX: 0x48
encoder.mov(Operand::Reg(R10), Operand::Imm(42));  // REX: 0x49

Memory addressing - Supporting different displacement sizes was tricky. The encoder chooses the most compact encoding:

let mem = MemOperand { base: RBX, disp: 8 };     // 8-bit displacement
encoder.mov(Operand::Reg(RAX), Operand::Mem(mem));

let mem = MemOperand { base: RBX, disp: 1000 };  // 32-bit displacement  
encoder.mov(Operand::Reg(RAX), Operand::Mem(mem));

ModR/M encoding - This byte specifies registers and addressing modes. Getting the bit layout right for all combinations took careful reading of Intel's documentation.

Current status

Rask is very early stage. It supports:

MOV instructions (register-to-register, immediate-to-register, memory operations)
Basic arithmetic (ADD, SUB)
Function returns (RET)

Missing: jumps, calls, most instructions, optimizations, multiple architectures.

Every implemented instruction is tested against known-good byte sequences. The focus is on correctness over completeness.

Design decisions

Type safety - The operand system catches encoding errors at compile time:

enum Operand {
    Reg(Reg64),
    Mem(MemOperand), 
    Imm(i64),
}

Documentation - Each instruction includes Intel SDM references and encoding details. You can understand what's happening without external docs.

Modularity - Split into rask-common (shared types) and rask-x86_64 (architecture-specific). Future architectures will be separate crates.

No magic - Everything is explicit. No macros, no hidden code generation.

Examples

The repo includes working examples:

Basic encoding - demonstrates core instructions
REX prefixes - shows extended register handling
Arithmetic operations - memory addressing and ModR/M encoding

You can see the exact byte sequences generated and verify them against disassemblers.

Use cases

While Rask is experimental, the techniques are useful for:

Dynamic code generation - Generate optimized code at runtime based on input data

Domain-specific languages - Compile custom languages directly to machine code

Performance analysis - Generate test sequences to understand CPU behavior

Education - Learn low-level programming concepts with immediate feedback

What's next

Next up: expanding instruction support — jumps, calls, and more arithmetic.
I’m also thinking about experimenting with other architectures like ARM64 once the core is solid.

Longer term, I'm interested in higher-level code generation patterns and potentially other architectures (ARM64).

Feedback welcome, especially from people working on JIT compilers or low-level tooling.

TL;DR: Built a small x86_64 instruction encoder in Rust for learning and experimentation. Generates correct machine code, focuses on clarity over features. Early stage but working for basic instructions.

DEV Community