I’ve always been fascinated by how compilers actually turn code into bytes.
Over the last few weeks I’ve been building Rask, a small x86_64 instruction encoder in Rust that generates machine code programmatically.
It’s meant for anyone who wants to understand instruction encoding or experiment with JIT compilation — without diving into LLVM’s massive complexity.
What it does
Rask takes simple instruction descriptions and turns them into raw x86_64 machine code bytes:
use rask_x86_64::{encoder::Encoder, registers::Reg64::*, operand::Operand};
let mut encoder = Encoder::new();
encoder.mov(Operand::Reg(RAX), Operand::Imm(1337));
encoder.add(RAX, RBX);
encoder.ret();
let bytes = encoder.bytes();
// [0x48, 0xb8, 0x39, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x48, 0x01, 0xd8, 0xc3]
That's valid machine code that loads 1337 into RAX, adds RBX to it, and returns.
Why build this?
LLVM is incredibly powerful but has a steep learning curve and massive compile times. Cranelift is better but still complex for simple use cases. I wanted something you could actually read and understand. I didn’t plan for it to be a full backend — just a weekend experiment that slowly grew legs.
The goal isn't to compete with these tools on features or performance. It's to provide a clean foundation for:
- Learning x86_64 instruction encoding
- Prototyping JIT compilers
- Building custom assemblers or analysis tools
- Understanding how CPUs actually execute code
Technical details
The interesting parts were getting the encoding details right:
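REX prefixes - In 64-bit mode, a REX prefix selects 64-bit operand size (the W bit) and supplies the extra register bit needed to reach the extended registers R8-R15 (the R, X, and B bits). The encoder automatically generates the correct prefix based on which registers you use: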
encoder.mov(Operand::Reg(RAX), Operand::Imm(42)); // REX: 0x48
encoder.mov(Operand::Reg(R10), Operand::Imm(42)); // REX: 0x49
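To make those two prefix values concrete, here is a minimal standalone sketch of how a REX byte can be assembled. It is not Rask's actual code: the function names `rex` and `mov_r64_imm64` are hypothetical, and registers are passed as raw hardware numbers (RAX = 0, R10 = 10) to keep the example self-contained. It assumes the 10-byte `mov r64, imm64` form shown in the first example.

```rust
/// Build a REX prefix byte with the layout 0100WRXB.
/// - `w`: select 64-bit operand size
/// - `r`: extends the ModR/M `reg` field
/// - `x`: extends the SIB `index` field
/// - `b`: extends the ModR/M `rm` field, the SIB `base`, or the opcode register field
fn rex(w: bool, r: bool, x: bool, b: bool) -> u8 {
    0x40 | ((w as u8) << 3) | ((r as u8) << 2) | ((x as u8) << 1) | (b as u8)
}

/// Encode `mov r64, imm64` (REX.W + B8+rd), with the register given as its
/// hardware number 0..=15. Hypothetical helper, for illustration only.
fn mov_r64_imm64(reg: u8, imm: i64) -> Vec<u8> {
    assert!(reg < 16, "register number out of range");
    // The low three bits of the register go into the opcode byte;
    // the fourth bit goes into REX.B.
    let mut out = vec![rex(true, false, false, reg >= 8), 0xB8 | (reg & 7)];
    out.extend_from_slice(&imm.to_le_bytes()); // immediates are little-endian
    out
}
```

With these definitions, `mov_r64_imm64(0, 42)` starts with 0x48 and `mov_r64_imm64(10, 42)` starts with 0x49, 0xBA: the only differences are the REX.B bit and the low three bits of the register number in the opcode.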
Memory addressing - Supporting different displacement sizes was tricky. The encoder chooses the most compact encoding:
let mem = MemOperand { base: RBX, disp: 8 }; // 8-bit displacement
encoder.mov(Operand::Reg(RAX), Operand::Mem(mem));
let mem = MemOperand { base: RBX, disp: 1000 }; // 32-bit displacement
encoder.mov(Operand::Reg(RAX), Operand::Mem(mem));
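The size selection can be sketched as follows. This is an illustration under stated assumptions rather than Rask's implementation: `mov_r64_mem` is a hypothetical helper, the displacement is assumed to be an `i32`, registers are raw hardware numbers, and bases that need a SIB byte (RSP, R12) are simply rejected.

```rust
/// Encode `mov dst, [base + disp]` for 64-bit registers given as hardware
/// numbers 0..=15. Simplification: bases whose low three bits are 100
/// (RSP, R12) require a SIB byte and are not handled here.
fn mov_r64_mem(dst: u8, base: u8, disp: i32) -> Vec<u8> {
    assert!(dst < 16 && base < 16, "register number out of range");
    assert!((base & 7) != 4, "RSP/R12 as base requires a SIB byte");

    // REX.W selects 64-bit operands; REX.R extends `reg`, REX.B extends `rm`.
    let rex = 0x48 | (((dst >= 8) as u8) << 2) | ((base >= 8) as u8);

    // mod = 00: no displacement (with RBP/R13 this bit pattern means
    //           something else, so those bases fall through to disp8)
    // mod = 01: 8-bit signed displacement
    // mod = 10: 32-bit signed displacement
    let (mode, disp_bytes): (u8, Vec<u8>) = if disp == 0 && (base & 7) != 5 {
        (0b00, Vec::new())
    } else if (-128..=127).contains(&disp) {
        (0b01, vec![disp as i8 as u8])
    } else {
        (0b10, disp.to_le_bytes().to_vec())
    };

    let modrm = (mode << 6) | ((dst & 7) << 3) | (base & 7);
    let mut out = vec![rex, 0x8B, modrm]; // 8B /r = MOV r64, r/m64
    out.extend(disp_bytes);
    out
}
```

For the two cases above, `mov_r64_mem(0, 3, 8)` yields 48 8b 43 08 (four bytes), while `mov_r64_mem(0, 3, 1000)` yields 48 8b 83 e8 03 00 00 (seven bytes). Picking the 8-bit form whenever the displacement fits saves three bytes per access.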
ModR/M encoding - This byte specifies registers and addressing modes. Getting the bit layout right for all combinations took careful reading of Intel's documentation.
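As a rough illustration of that bit layout (2 bits of mod, 3 bits of reg, 3 bits of r/m), the sketch below packs a ModR/M byte and uses it to encode a register-to-register ADD. The helper names are hypothetical and registers are raw hardware numbers; Rask's own API may look different.

```rust
/// Pack a ModR/M byte: mod (bits 7-6) | reg (bits 5-3) | rm (bits 2-0).
fn modrm(mode: u8, reg: u8, rm: u8) -> u8 {
    assert!(mode < 4, "mod is a 2-bit field");
    (mode << 6) | ((reg & 7) << 3) | (rm & 7)
}

/// Encode `add dst, src` for 64-bit registers (REX.W + 01 /r: ADD r/m64, r64).
/// In this opcode form the source goes in `reg` and the destination in `rm`.
fn add_r64_r64(dst: u8, src: u8) -> Vec<u8> {
    assert!(dst < 16 && src < 16, "register number out of range");
    // REX.R carries the fourth bit of `reg`, REX.B the fourth bit of `rm`.
    let rex = 0x48 | (((src >= 8) as u8) << 2) | ((dst >= 8) as u8);
    // mod = 11 means both operands are registers (no memory access).
    vec![rex, 0x01, modrm(0b11, src, dst)]
}
```

`add_r64_r64(0, 3)` (add rax, rbx) gives 48 01 d8, matching the bytes in the first example. With extended registers, `add_r64_r64(8, 9)` (add r8, r9) gives 4d 01 c8: the ModR/M byte only holds the low three bits of each register, and the REX prefix carries the rest.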
Current status
Rask is very early stage. It supports:
- MOV instructions (register-to-register, immediate-to-register, memory operations)
- Basic arithmetic (ADD, SUB)
- Function returns (RET)
Missing: jumps, calls, most other instructions, optimizations, and support for other architectures.
Every implemented instruction is tested against known-good byte sequences. The focus is on correctness over completeness.
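A table-driven check of this kind might look roughly like the sketch below. It does not reproduce the repo's test suite: `sub_r64_r64` is a hypothetical stand-in encoder, and `hex` and `failures` are illustrative helper names.

```rust
/// Format bytes the way most disassemblers print them, e.g. "48 01 d8".
fn hex(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Hypothetical stand-in encoder: `sub dst, src` for 64-bit registers
/// (REX.W + 29 /r: SUB r/m64, r64), registers as hardware numbers 0..=15.
fn sub_r64_r64(dst: u8, src: u8) -> Vec<u8> {
    let rex = 0x48 | (((src >= 8) as u8) << 2) | ((dst >= 8) as u8);
    vec![rex, 0x29, 0xC0 | ((src & 7) << 3) | (dst & 7)]
}

/// Compare each (name, got, want) case and describe every mismatch.
/// An empty result means all encodings matched their known-good bytes.
fn failures(cases: &[(&str, Vec<u8>, Vec<u8>)]) -> Vec<String> {
    cases
        .iter()
        .filter(|(_, got, want)| got != want)
        .map(|(name, got, want)| {
            format!("{}: got [{}], want [{}]", name, hex(got), hex(want))
        })
        .collect()
}
```

The expected bytes would come from a trusted assembler or disassembler. Printing mismatches in hex keeps the output directly comparable with what those tools display.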
Design decisions
Type safety - The operand system catches encoding errors at compile time:
enum Operand {
Reg(Reg64),
Mem(MemOperand),
Imm(i64),
}
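To show what the type system can and cannot rule out, here is a self-contained sketch built around the same enum shape. Everything beyond the `Operand` definition above is an assumption: the `Reg64` variant list, the `disp: i32` field type, and the `MovForm` and `classify_mov` names are hypothetical.

```rust
/// Registers in hardware-number order, so `as u8` yields the encoding.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Reg64 {
    RAX = 0, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
    R8, R9, R10, R11, R12, R13, R14, R15,
}

impl Reg64 {
    /// Hardware register number, 0..=15.
    fn code(self) -> u8 {
        self as u8
    }
}

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct MemOperand {
    base: Reg64,
    disp: i32, // assumed type
}

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Operand {
    Reg(Reg64),
    Mem(MemOperand),
    Imm(i64),
}

#[derive(Debug, PartialEq)]
enum MovForm {
    RegReg,
    RegImm,
    RegMem,
    MemReg,
}

/// The match must be exhaustive, so the compiler forces every operand
/// pairing to be considered when a new variant is added.
fn classify_mov(dst: Operand, src: Operand) -> Result<MovForm, &'static str> {
    match (dst, src) {
        (Operand::Reg(_), Operand::Reg(_)) => Ok(MovForm::RegReg),
        (Operand::Reg(_), Operand::Imm(_)) => Ok(MovForm::RegImm),
        (Operand::Reg(_), Operand::Mem(_)) => Ok(MovForm::RegMem),
        (Operand::Mem(_), Operand::Reg(_)) => Ok(MovForm::MemReg),
        (Operand::Mem(_), Operand::Mem(_)) => Err("x86_64 has no memory-to-memory MOV"),
        (Operand::Mem(_), Operand::Imm(_)) => Err("not handled in this sketch"),
        (Operand::Imm(_), _) => Err("an immediate cannot be a destination"),
    }
}
```

In this sketch, invalid register numbers and untyped operands are unrepresentable, and the exhaustive match is checked by the compiler. Invalid pairings such as memory-to-memory still surface as a runtime `Err`; ruling those out at compile time would need separate types for source and destination operands.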
Documentation - Each instruction includes Intel SDM references and encoding details. You can understand what's happening without external docs.
Modularity - Split into rask-common (shared types) and rask-x86_64 (architecture-specific). Future architectures will be separate crates.
No magic - Everything is explicit. No macros, no hidden code generation.
Examples
The repo includes working examples:
- Basic encoding - demonstrates core instructions
- REX prefixes - shows extended register handling
- Arithmetic operations - memory addressing and ModR/M encoding
You can see the exact byte sequences generated and verify them against disassemblers.
Use cases
While Rask is experimental, the techniques are useful for:
Dynamic code generation - Generate optimized code at runtime based on input data
Domain-specific languages - Compile custom languages directly to machine code
Performance analysis - Generate test sequences to understand CPU behavior
Education - Learn low-level programming concepts with immediate feedback
What's next
Next up: expanding instruction support — jumps, calls, and more arithmetic.
Once the core is solid, I'd like to experiment with other architectures such as ARM64.
Longer term, I'm interested in higher-level code generation patterns.
Feedback welcome, especially from people working on JIT compilers or low-level tooling.
TL;DR: Built a small x86_64 instruction encoder in Rust for learning and experimentation. Generates correct machine code, focuses on clarity over features. Early stage but working for basic instructions.