<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Damilare Akinlaja</title>
    <description>The latest articles on DEV Community by Damilare Akinlaja (@darmie).</description>
    <link>https://dev.to/darmie</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F203338%2F8104601b-3b3e-4b60-8d82-de3c3a0f428e.png</url>
      <title>DEV Community: Damilare Akinlaja</title>
      <link>https://dev.to/darmie</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/darmie"/>
    <language>en</language>
    <item>
      <title>Building a JIT Compiler from Scratch: Part 1 — Why Build a JIT Compiler?</title>
      <dc:creator>Damilare Akinlaja</dc:creator>
      <pubDate>Thu, 19 Feb 2026 17:28:35 +0000</pubDate>
      <link>https://dev.to/darmie/building-a-jit-compiler-from-scratch-part-1-why-build-a-jit-compiler-590o</link>
      <guid>https://dev.to/darmie/building-a-jit-compiler-from-scratch-part-1-why-build-a-jit-compiler-590o</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;After reading &lt;a href="https://craftinginterpreters.com/" rel="noopener noreferrer"&gt;&lt;em&gt;Crafting Interpreters&lt;/em&gt;&lt;/a&gt;, I thought building a bytecode VM would be enough. I built &lt;a href="https://github.com/darmie/cabasa" rel="noopener noreferrer"&gt;Cabasa&lt;/a&gt;, a WebAssembly runtime. I’m now building &lt;a href="https://github.com/darmie/rayzor" rel="noopener noreferrer"&gt;Rayzor&lt;/a&gt;, a &lt;a href="https://haxe.org" rel="noopener noreferrer"&gt;Haxe&lt;/a&gt; compiler in Rust. Each project taught me the same lesson: interpretation has a ceiling.&lt;/p&gt;

&lt;p&gt;I could wire up Cranelift or LLVM and call it done. But I wanted to understand JIT compilation from first principles — what happens between IR (Intermediate Representation) and machine code, and why it makes things fast. This series is that journey.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In &lt;a href="https://medium.com/p/3a5c71fbe88d" rel="noopener noreferrer"&gt;Part 0&lt;/a&gt;, we explored how computers run our code, and touched on the history of compilers and runtimes. Now we will actually write a JIT compiler.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Interpreter Performance Ceiling
&lt;/h2&gt;

&lt;p&gt;In a bytecode interpreter, instructions are represented by numeric opcodes packed into a byte array. A dispatch loop reads each opcode in turn and executes the corresponding operation. The code below demonstrates a typical stack-based interpreter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// A simple bytecode interpreter loop
#[repr(u8)]
enum Opcode {
    Add,
    Sub,
    Mul,
    Load,
    Return,
    // ... dozens more: Store, Jump, etc.
}
fn interpret(bytecode: &amp;amp;[u8], constants: &amp;amp;[i64]) -&amp;gt; i64 {
    let mut ip: usize = 0;                    // Instruction pointer
    let mut stack: Vec&amp;lt;i64&amp;gt; = Vec::new();

    loop {
        // ┌─────────────────────────────────────────────────────┐
        // │ FETCH: Memory read, bounds check                    │
        // │ Cost: ~3-5 cycles                                   │
        // └─────────────────────────────────────────────────────┘
        let opcode = bytecode[ip];
        ip += 1;
        // ┌─────────────────────────────────────────────────────┐
        // │ DECODE + DISPATCH: Branch on opcode                 │
        // │ Cost: ~10-20 cycles (branch misprediction penalty)  │
        // │                                                     │
        // │ The CPU tries to predict which case we'll take.     │
        // │ With 50+ opcodes, it guesses wrong ~95% of the time.│
        // │ Each misprediction: flush pipeline, start over.     │
        // └─────────────────────────────────────────────────────┘
        match opcode {
            0 =&amp;gt; {  // Add
                // ┌─────────────────────────────────────────────┐
                // │ EXECUTE: The actual work                    │
                // │ Cost: 1 cycle                               │
                // └─────────────────────────────────────────────┘
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);  // &amp;lt;-- This is ALL we wanted to do
            }
            1 =&amp;gt; {  // Sub
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a - b);
            }
            2 =&amp;gt; {  // Mul
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a * b);
            }
            3 =&amp;gt; {  // Load constant
                let idx = bytecode[ip] as usize;
                ip += 1;
                stack.push(constants[idx]);
            }
            4 =&amp;gt; {  // Return
                return stack.pop().unwrap();
            }
            _ =&amp;gt; panic!("Unknown opcode"),
        }
        // Then we loop back and do it ALL again for the next instruction
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why is this slow? Because every instruction pays for fetch and dispatch on top of the actual work, and the dispatch branch is where CPU branch misprediction penalties bite. Let’s look at how a modern CPU executes instructions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The CPU Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your computer’s CPU does not execute instructions one at a time. While one instruction is executing, the next is being decoded, and the one after that is being fetched, all simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbm0igvreiuk1jbhgw45u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbm0igvreiuk1jbhgw45u.png" alt="CPU Pipeline" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Modern CPUs have 15–20+ pipeline stages, which means many instructions are “in flight” at once. This is great for throughput, but there is a catch!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Branching Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The CPU maintains a branch predictor, a hardware mechanism in modern processors that guesses the outcome of conditional branches like &lt;code&gt;if&lt;/code&gt;-statements and loops. When its guesses are right, the pipeline stays full and performance soars. When a guess is wrong, the CPU must flush the pipeline and start over. That flush is the tax we pay.&lt;/p&gt;

&lt;p&gt;Let’s take a look at a simple &lt;code&gt;for&lt;/code&gt; loop that counts from &lt;code&gt;0&lt;/code&gt; to &lt;code&gt;n&lt;/code&gt;. The CPU can predict this pattern.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Loop: branch taken 999 times, not taken once
for i in 0..1000 {   
    sum += i;        // Predictor learns: "always taken"
}                    // 99.9% accuracy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But in our interpreter, the branch target depends on whatever opcode comes next in the bytecode, which the CPU cannot predict. The pipeline gets flushed constantly, making the interpreter inefficient and driving up latency.&lt;/p&gt;
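&lt;p&gt;To feel this effect yourself, here is a small experiment you can run. This is my own illustrative sketch, not code from the series: the same &lt;code&gt;match&lt;/code&gt;-based dispatch loop is fed a perfectly predictable opcode stream and then a pseudo-random one. Absolute timings vary by machine, but the random stream typically runs noticeably slower, purely because of branch misprediction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use std::time::Instant;

// The same dispatch shape as the interpreter above, stripped to four opcodes
fn dispatch(ops: &amp;amp;[u8]) -&amp;gt; i64 {
    let mut acc: i64 = 0;
    for &amp;amp;op in ops {
        match op {
            0 =&amp;gt; acc += 1,
            1 =&amp;gt; acc -= 1,
            2 =&amp;gt; acc ^= 0x5555,
            _ =&amp;gt; acc = acc.wrapping_mul(3),
        }
    }
    acc
}

fn main() {
    let n = 10_000_000;
    // Predictable: the branch predictor learns "always opcode 0"
    let uniform = vec![0u8; n];
    // Unpredictable: a cheap LCG scatters opcodes 0..4
    let mut seed: u64 = 0x2545_F491;
    let random: Vec&amp;lt;u8&amp;gt; = (0..n).map(|_| {
        seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1);
        ((seed &amp;gt;&amp;gt; 33) as u8) &amp;amp; 3
    }).collect();

    let t = Instant::now();
    let a = dispatch(&amp;amp;uniform);
    println!("uniform: {:?} (acc = {})", t.elapsed(), a);

    let t = Instant::now();
    let b = dispatch(&amp;amp;random);
    println!("random:  {:?} (acc = {})", t.elapsed(), b);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;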

&lt;blockquote&gt;
&lt;p&gt;I experienced this first hand: a &lt;strong&gt;Mandelbrot&lt;/strong&gt; benchmark took about &lt;strong&gt;45s&lt;/strong&gt; to complete with an interpreter, while the JIT version executed it in under &lt;strong&gt;300ms&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So what exactly does a JIT compiler do differently? Let’s see what native code gets you.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INTERPRETER:                          JIT COMPILED:
─────────────────────────────────     ─────────────────────────────────
fetch opcode          ← branch        ; No dispatch loop
match opcode          ← branch        ; No opcode decoding  
  Add:                                ; Just the actual operations:
    pop, pop, add, push               
fetch opcode          ← branch        add x0, x1, x2
match opcode          ← branch        mul x3, x0, x4
  Mul:                                sub x5, x3, x6
    pop, pop, mul, push               
fetch opcode          ← branch        ; Linear code = perfect prediction
...                                   ; Pipeline stays full
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Numbers Don’t Lie&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s a real comparison from Rayzor, the Haxe compiler I’m building. Same source code, different execution strategies:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyff2fuv3iroq8fspdkh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyff2fuv3iroq8fspdkh.png" alt="Rayor vs Haxe Interpreter Benchmark Results" width="800" height="173"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The interpreter spends most of its time deciding what to do. The JIT just does it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://rayzor-blade.com/benchmarks/" rel="noopener noreferrer"&gt;Full benchmark methodology →&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we JIT compile, we eliminate the dispatch entirely: &lt;strong&gt;The branches that remain in JIT code (loops, conditionals) are &lt;em&gt;your program’s&lt;/em&gt; branches — which often &lt;em&gt;are&lt;/em&gt; predictable.&lt;/strong&gt; The artificial dispatch branches are gone.&lt;/p&gt;

&lt;h2&gt;
  
  
  When JIT Makes Sense
&lt;/h2&gt;

&lt;p&gt;Not every project needs a JIT compiler. Building one is genuinely complex, especially if you intend it to be useful in real-world applications. But in the right scenarios, the payoff is great!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long-Running Programs&lt;/strong&gt; &lt;br&gt;
If your program runs for seconds, minutes, or hours, JIT compilation cost becomes negligible. The time spent compiling hot functions is amortized across millions of executions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;em&gt;Examples of Long-Running Programs:&lt;/em&gt;&lt;/strong&gt; Web servers and application backends, database query engines, game engines with scripting, IDEs and developer tools.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;em&gt;The common pattern&lt;/em&gt;:&lt;/strong&gt; Startup can be slow; steady-state performance matters.&lt;/li&gt;
&lt;/ul&gt;
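
&lt;p&gt;The amortization argument is easy to make concrete with a back-of-envelope calculation. The numbers below are assumptions I picked for illustration, not measurements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fn main() {
    // Assumed costs, purely illustrative
    let compile_cost_ns = 5_000_000.0_f64; // say 5 ms to JIT-compile a hot function
    let interp_ns_per_call = 800.0_f64;    // interpreted cost per call
    let jit_ns_per_call = 50.0_f64;        // compiled cost per call

    let break_even = compile_cost_ns / (interp_ns_per_call - jit_ns_per_call);
    println!("break-even after ~{:.0} calls", break_even); // ~6667 calls
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these made-up numbers the compile cost is repaid after a few thousand calls; a web server handling millions of requests crosses that line almost immediately.&lt;/p&gt;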

&lt;p&gt;&lt;strong&gt;Dynamic Languages&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In dynamic languages, you don’t know the type of most values until runtime. For example, consider the snippet below, written in an untyped dynamic language:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function process(item) {
    return item.value * 2;
}
// Same call site receives different types over time
eventQueue.forEach(item =&amp;gt; process(item));
// Iteration 1-1000:   item is {value: Number}
// Iteration 1001:     item is {value: String}  ← type changed!
// AOT must handle ALL cases (slow)
// JIT observes: "99.9% are Numbers" → specializes → guards → fast path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No static analysis can predict this. The types depend on what data flows through the program — often from user input, network responses, or database queries.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;AOT dilemma&lt;/strong&gt;: It generates code that handles &lt;em&gt;every&lt;/em&gt; possible type. Every operation becomes a type check followed by a dispatch.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;JIT advantage&lt;/strong&gt;: An opportunity to observe what &lt;em&gt;actually&lt;/em&gt; happens at runtime, then specialize. The trace below sketches how real-world JIT runtimes execute such code:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;JIT observes: "First 1000 calls, item.value is always Number"
  → Compiles fast path: direct numeric multiplication
  → Inserts guard: "if not Number, bail out"
Iteration 1001: item.value is String
  → Guard fails
  → Deoptimize, fall back to the interpreter (this is why a JIT is usually paired with one)
  → Recompile with polymorphic handling
Iteration 1002+: mostly Numbers again
  → Fast path still works 99.9% of the time
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is why V8, LuaJIT, and PyPy can approach native performance despite running dynamic languages. They don’t solve the “what type is this?” problem statically — they &lt;em&gt;observe&lt;/em&gt; and &lt;em&gt;adapt&lt;/em&gt;.&lt;/p&gt;
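
&lt;p&gt;Here is a toy Rust sketch of that observe-specialize-guard-deoptimize cycle. The names (&lt;code&gt;CallSite&lt;/code&gt;, &lt;code&gt;process&lt;/code&gt;) are mine, and a real JIT would emit machine code rather than flip a boolean, but the control flow has the same shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#[derive(Clone, Debug, PartialEq)]
enum Value {
    Num(f64),
    Str(String),
}

// One call site with a speculative fast path, in the spirit of an inline cache
struct CallSite {
    specialized: bool, // the "compiled" fast path assumes item is a Num
    deopts: u32,
}

impl CallSite {
    fn process(&amp;amp;mut self, item: &amp;amp;Value) -&amp;gt; Value {
        if self.specialized {
            // GUARD: bail out if the speculation no longer holds
            if let Value::Num(n) = item {
                return Value::Num(n * 2.0); // fast path: direct multiply
            }
            self.specialized = false; // DEOPT: abandon the fast path
            self.deopts += 1;
        }
        // Generic (interpreter-like) path handles every type
        match item {
            Value::Num(n) =&amp;gt; Value::Num(n * 2.0),
            Value::Str(s) =&amp;gt; Value::Str(s.repeat(2)),
        }
    }
}

fn main() {
    let mut site = CallSite { specialized: true, deopts: 0 };
    for i in 0..1000 {
        site.process(&amp;amp;Value::Num(i as f64)); // speculation holds
    }
    site.process(&amp;amp;Value::Str("a".into())); // iteration 1001: guard fails
    println!("deopts: {}", site.deopts); // prints "deopts: 1"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;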

&lt;p&gt;Statically typed languages benefit greatly from JIT compilation too. Because the “what type is this?” problem is solved before a hot function is ever compiled, the JIT can emit specialized code immediately, without guards. We still get the runtime performance boost, but the cost is paid up front, in the compile-time analysis phase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domain-Specific Languages&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;DSLs can benefit from JIT compilation as well, especially DSLs with expensive runtime behavior that simple interpreters cannot optimise, or DSLs embedded in data-intensive applications.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Query engines:&lt;/strong&gt; SQL JIT compilation (PostgreSQL, DuckDB)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Shader compilers:&lt;/strong&gt; GPU pipeline optimization&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Rule engines:&lt;/strong&gt; Business logic that changes at runtime&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Template engines&lt;/strong&gt;: Server-side rendering&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Expression evaluators and formula engines:&lt;/strong&gt; Spreadsheets, data filtering, financial models, animation curves&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Hot Code Paths&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s commonly observed that programs spend the majority of execution time (often cited as 90%) in a small fraction of the code.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Studies have shown that a program typically spends 90% of its execution time in only 10% of its code.” — &lt;em&gt;Jon Bentley, “Writing Efficient Programs”&lt;/em&gt; (1982)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frevtypsmv97znifokf67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frevtypsmv97znifokf67.png" alt="captionless image" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A tiered JIT interprets the cold 90% of your code (which takes 10% of time) and compiles the hot 10% of your code (which takes 90% of time). Best of both worlds.&lt;/p&gt;
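
&lt;p&gt;The tier-up decision itself can be sketched in a few lines. This is a deliberately minimal model with hypothetical names; real tiered runtimes also profile loop back-edges and collect type feedback, not just call counts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const HOT_THRESHOLD: u32 = 1_000;

struct Function {
    name: &amp;amp;'static str,
    call_count: u32,
    compiled: bool, // has this function been tiered up to machine code?
}

impl Function {
    fn call(&amp;amp;mut self) {
        self.call_count += 1;
        if self.compiled {
            // ... run the JIT-compiled machine code ...
        } else {
            // ... run one pass in the interpreter ...
            if self.call_count &amp;gt;= HOT_THRESHOLD {
                self.compiled = true; // stand-in for invoking the JIT backend
            }
        }
    }
}

fn main() {
    let mut f = Function { name: "hot_loop", call_count: 0, compiled: false };
    for _ in 0..1_500 {
        f.call();
    }
    println!("{} compiled: {}", f.name, f.compiled); // prints "hot_loop compiled: true"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;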

&lt;h2&gt;
  
  
  When JIT Is Overkill
&lt;/h2&gt;

&lt;p&gt;A good rule of thumb: always start with an optimised interpreter, then evaluate whether you need a JIT.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Short-lived scripts&lt;/strong&gt;: Compilation cost &amp;gt; execution time&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Memory-constrained&lt;/strong&gt;: JIT infrastructure is heavy&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Predictable latency required&lt;/strong&gt;: Compilation pauses are unpredictable&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Simple glue code:&lt;/strong&gt; Interpretation is fast enough&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  So Let’s Build One
&lt;/h2&gt;

&lt;p&gt;We’ve covered a lot of ground: dispatch overhead, CPU pipelines, branch prediction, and when JIT makes sense. Now let’s get practical.&lt;/p&gt;

&lt;p&gt;In this series, we’re building a JIT compiler from scratch. Not a toy that emits hardcoded bytes, but a real pipeline:&lt;/p&gt;

&lt;p&gt;Source → IR → SSA → Optimize → ARM64 → Execute&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here’s the roadmap:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Part 2:&lt;/strong&gt; A minimal Intermediate Representation (IR)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Part 3:&lt;/strong&gt; Control Flow Graphs — blocks, branches, loops&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Part 4:&lt;/strong&gt; SSA transformation — making data flow explicit&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Part 5:&lt;/strong&gt; Dominance analysis — where to place φ-nodes&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Part 6:&lt;/strong&gt; Optimization passes — constant folding, dead code elimination&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Part 7:&lt;/strong&gt; ARM64 code generation — turning IR into machine code&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Part 8:&lt;/strong&gt; Register allocation — fitting values into hardware registers&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Part 9:&lt;/strong&gt; Executable memory — the Apple Silicon W^X challenge&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Part 10:&lt;/strong&gt; Testing across architectures with QEMU&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why ARM64?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Apple Silicon is everywhere now — MacBooks, iPads, even servers. ARM64 has a clean, fixed-width instruction set that’s easier to emit than x86’s variable-length encoding. And if you’re on Intel, we’ll use QEMU to test cross-platform.&lt;/p&gt;

&lt;p&gt;But first, let’s write some machine code.&lt;/p&gt;

&lt;h2&gt;
  
  
  First Taste: Hello, Machine Code
&lt;/h2&gt;

&lt;p&gt;Whew! Enough theory. Let’s generate and execute machine code.&lt;/p&gt;

&lt;p&gt;If you don’t have a Rust development environment on your computer, set one up so that you can run the code examples.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then create a new project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cargo new jit-hello
cd jit-hello
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add &lt;code&gt;dynasmrt&lt;/code&gt; to your &lt;code&gt;Cargo.toml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[dependencies]
dynasmrt = "5.0.0"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, the simplest possible JIT — a function that returns 42:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use dynasmrt::{dynasm, AssemblyOffset};
use dynasmrt::aarch64::Assembler;
fn main() {
    // Create an assembler for ARM64
    let mut asm = Assembler::new().unwrap();
    // Write actual assembly instructions
    dynasm!(asm
        ; mov x0, #42    // Return value goes in x0 (ARM64 calling convention)
        ; ret            // Return to caller
    );
    // Finalize: makes memory executable
    let code = asm.finalize().unwrap();
    // Cast to function pointer and call
    let func: fn() -&amp;gt; i64 = unsafe { 
        std::mem::transmute(code.ptr(AssemblyOffset(0))) 
    };
    println!("JIT returned: {}", func());
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ cargo run
JIT returned: 42
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;That’s it.&lt;/strong&gt;  We wrote ARM64 assembly, it got encoded to machine code, and we executed it as a native function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Just Happened?&lt;/strong&gt;&lt;br&gt;
Let’s break it down:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;code&gt;Assembler::new()&lt;/code&gt; — Allocates a buffer for machine code&lt;/li&gt;
&lt;li&gt; &lt;code&gt;dynasm!(...)&lt;/code&gt; — Encodes our assembly into bytes (&lt;code&gt;mov x0, #42&lt;/code&gt; → &lt;code&gt;0xD2800540&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt; &lt;code&gt;finalize()&lt;/code&gt; — Marks the buffer’s memory as executable; this is where the OS gets involved&lt;/li&gt;
&lt;li&gt; &lt;code&gt;transmute&lt;/code&gt; — Casts the raw pointer to a callable function pointer&lt;/li&gt;
&lt;/ol&gt;
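
&lt;p&gt;You can verify the encoding from step 2 by hand. ARM64 instructions are fixed 32-bit words; for &lt;code&gt;MOVZ&lt;/code&gt; the 16-bit immediate sits in bits 5–20 and the destination register in bits 0–4:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fn main() {
    let base: u32 = 0xD280_0000; // MOVZ Xd, #imm16 (no shift)
    let imm16: u32 = 42;
    let rd: u32 = 0; // x0

    let insn = base | (imm16 &amp;lt;&amp;lt; 5) | rd;
    assert_eq!(insn, 0xD280_0540);
    println!("mov x0, #42  encodes to  0x{:08X}", insn); // prints 0xD2800540
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;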

&lt;blockquote&gt;
&lt;p&gt;The &lt;code&gt;unsafe&lt;/code&gt; block is unavoidable: we're telling Rust "trust me, these bytes are a valid function." The standard library marks &lt;code&gt;transmute&lt;/code&gt; as unsafe because both the argument and the result must be valid, otherwise the call is undefined behavior. This is the JIT contract: we take responsibility for generating correct code.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why This Doesn’t Scale&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is a bad developer experience; hand-writing assembly all the time is a daunting task.&lt;/p&gt;

&lt;p&gt;We just hand-wrote two instructions. Real programs have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Thousands of operations&lt;/li&gt;
&lt;li&gt;  Branches and loops&lt;/li&gt;
&lt;li&gt;  Function calls&lt;/li&gt;
&lt;li&gt;  Values competing for limited registers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Writing &lt;code&gt;dynasm!&lt;/code&gt; blocks by hand would be unmaintainable. We need structure between "source code" and "machine code"—a representation we can analyze, optimize, and systematically lower.&lt;/p&gt;

&lt;p&gt;That’s what the rest of this series builds: an Intermediate Representation.&lt;/p&gt;

&lt;p&gt;In Part 2, we’ll design a minimal Intermediate Representation — the data structure that represents code in a form we can work with. We’ll define values, types, and operations, and build a function that does more than return a constant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Teaser&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Coming in Part 2:
let mut builder = FunctionBuilder::new("add");
let a = builder.param(Type::I64);
let b = builder.param(Type::I64);
let sum = builder.add(a, b);
builder.ret(sum);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;LuaJIT&lt;/strong&gt; — Study Mike Pall’s approach to tracing JIT&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://github.com/bytecodealliance/wasmtime/tree/main/cranelift" rel="noopener noreferrer"&gt;&lt;strong&gt;Cranelift&lt;/strong&gt;&lt;/a&gt; — Production-quality code generator (we’ll reference this)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;“A Brief History of Just-In-Time”&lt;/strong&gt; — Aycock (linked in &lt;a href="https://medium.com/p/3a5c71fbe88d" rel="noopener noreferrer"&gt;Part 0&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Series Navigation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;← &lt;a href="https://medium.com/codejitsu/building-a-jit-compiler-from-scratch-part-0-how-computers-run-your-code-3a5c71fbe88d" rel="noopener noreferrer"&gt;Part 0: How Computers Run Your Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 1: Why Build a JIT Compiler?&lt;/strong&gt; &lt;em&gt;(You are here)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;→ Part 2: Designing a Minimal IR &lt;em&gt;(Coming soon)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hi there! My name is Damilare Akinlaja&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I am currently building&lt;/em&gt; &lt;a href="https://github.com/darmie/zyntax" rel="noopener noreferrer"&gt;&lt;em&gt;Zyntax&lt;/em&gt;&lt;/a&gt; &lt;em&gt;and&lt;/em&gt; &lt;a href="https://github.com/darmie/rayzor/" rel="noopener noreferrer"&gt;&lt;em&gt;Rayzor&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>jit</category>
      <category>programming</category>
      <category>compilers</category>
      <category>llvm</category>
    </item>
    <item>
      <title>Building a JIT Compiler from Scratch: Part 0 - How Computers Run Your Code</title>
      <dc:creator>Damilare Akinlaja</dc:creator>
      <pubDate>Sun, 07 Dec 2025 05:52:47 +0000</pubDate>
      <link>https://dev.to/darmie/building-a-jit-compiler-from-scratch-part-0-how-computers-run-your-code-2bp1</link>
      <guid>https://dev.to/darmie/building-a-jit-compiler-from-scratch-part-0-how-computers-run-your-code-2bp1</guid>
      <description>&lt;p&gt;This is the first part of the series on &lt;strong&gt;&lt;em&gt;Building a JIT Compiler from Scratch&lt;/em&gt;&lt;/strong&gt;. I have been exploring the world of compilers and interpreters for many years, from theory to many failed attempts. As a hobbyist trying to cure my curiosity. I did not have a Computer Science background even though programming was a mandatory course as a student of Applied Physics. Every advance topic I have learned about compilers were based on personal reading, learning and attempts. &lt;/p&gt;

&lt;p&gt;Writing a language parser is the easy part; it gets a bit dramatic once you start tree-walking your AST to interpret it. But I wanted to go beyond parsers and basic interpreters. My journey led me to the world of virtual machines, where I built Cabasa and Wasp, two experimental WebAssembly runtimes. I am now developing &lt;a href="https://github.com/darmie/zyntax" rel="noopener noreferrer"&gt;Zyntax&lt;/a&gt; - a multi-paradigm, multi-tier JIT compilation and runtime framework.&lt;/p&gt;

&lt;p&gt;For the curious cats, I am writing this blog series to demystify compiler and virtual machine development. I might oversimplify some definitions or use diagrams to explain concepts, but the major fun part is in the hands-on coding of our own JIT compiler in Rust. &lt;/p&gt;

&lt;p&gt;But first, let's touch on a brief overview of compilers and runtimes.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Tour of Interpreters, Compilers and Everything In Between
&lt;/h2&gt;

&lt;p&gt;Every time you run a program, something bridges the gap between your code and the transistors that execute it. That bridge is the programming language runtime, a program designed to interpret or translate your code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code Is Just Text
&lt;/h2&gt;

&lt;p&gt;When you write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is meaningless to the CPU. The processor understands only binary encoded instructions specific to its architecture. Someone or something must translate this code. &lt;/p&gt;

&lt;p&gt;Since the invention of high-level programming languages, two main translation philosophies have emerged:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Translate everything upfront (Compilation)&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Convert entire program to machine code before running&lt;/li&gt;
&lt;li&gt;Pay translation cost once&lt;/li&gt;
&lt;li&gt;Run at full hardware speed&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Translate on demand (Interpretation)&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Read and execute code line by line&lt;/li&gt;
&lt;li&gt;Quick startup, no upfront wait to compile&lt;/li&gt;
&lt;li&gt;Pay the translation cost repeatedly&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This split, established in the 1950s, still defines how we think about language implementation today. &lt;/p&gt;

&lt;h2&gt;
  
  
  A Brief History of Language Execution
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Foundations - 1950s
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1952 - A-0 System
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.computinghistory.org.uk/det/5487/Grace-Hopper-completes-the-A-0-Compiler/" rel="noopener noreferrer"&gt;&lt;em&gt;"All I had to do was to write down a set of call numbers, let the computer find them on the tape, bring them over and do the additions. This was the first compiler."&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Grace Brewster Murray Hopper, a computer pioneer and United States Navy rear admiral, created a system that translated symbolic mathematical code into machine language. Hopper had been collecting subroutines and transcribed them onto tape; each routine had a call number, which the computer would find and then execute. &lt;/p&gt;

&lt;h4&gt;
  
  
  1958 - FORTRAN (FORmula TRANslation)
&lt;/h4&gt;

&lt;p&gt;This is the first widely used high-level programming language with an optimizing compiler. Though first developed in 1954 by John Backus at IBM to simplify mathematical programming, subsequent versions released in 1958 introduced features that made code reusable.  &lt;/p&gt;

&lt;h4&gt;
  
  
  1958 - LISP
&lt;/h4&gt;

&lt;p&gt;Lisp, created by John McCarthy at MIT, is the first interpreted language and the second high-level language, designed for symbolic computation. Lisp introduced the REPL (Read-Eval-Print Loop), a system that interactively interprets Lisp code.  &lt;/p&gt;

&lt;h3&gt;
  
  
  The Divergence  - 1960s-70s
&lt;/h3&gt;

&lt;p&gt;Between the 60s and 70s, more high-level programming languages emerged for systems programming, numerical computation, and rapid prototyping.&lt;/p&gt;

&lt;h4&gt;
  
  
  Compiled Languages: COBOL, FORTRAN, C
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Systems programming and numerical computation&lt;/li&gt;
&lt;li&gt;Maximum performance, minimal runtime overhead&lt;/li&gt;
&lt;li&gt;Slow edit-compile-run cycle&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Interpreted Languages: BASIC, early Smalltalk
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Education, rapid prototyping&lt;/li&gt;
&lt;li&gt;Immediate feedback, slow execution&lt;/li&gt;
&lt;li&gt;Often 10-100x slower than compiled code&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  An Optimized Interpretation: The Bytecode Virtual Machine - 1980s
&lt;/h3&gt;

&lt;p&gt;Bytecode intermediate representations were introduced in this period, with Smalltalk-80 as the first consistently object-oriented language (after Simula, which introduced OO concepts). Unlike the source text a classical interpreter walks, bytecode borrows the structure of machine code - compact opcodes and operands - but it is interpreted by a virtual processor rather than executed directly by the CPU. This makes it more efficient to interpret while remaining portable and higher-level than machine code. &lt;/p&gt;

&lt;p&gt;Bytecode designs vary widely in abstraction level: some are quite close to machine code in structure, while others are essentially serialized abstract syntax trees. &lt;/p&gt;

&lt;p&gt;Another benefit of a bytecode virtual machine (VM) is that programs can execute anywhere the VM runs. This let languages like UCSD Pascal (with its P-code) be written once and run anywhere, using bytecode as a distribution format. &lt;/p&gt;

&lt;h3&gt;
  
  
  Just-In-Time Execution (JIT) - 1990s
&lt;/h3&gt;

&lt;p&gt;In 1987, David Ungar and Randall Smith pioneered Just-In-Time compilation techniques while developing the &lt;strong&gt;Self&lt;/strong&gt; programming language. Self was an object-oriented language based on prototypes instead of classes. This design choice posed a significant challenge to the efficiency of the runtime implementation: every operation was a method invocation, even something as simple as a variable access, which greatly increased the cost of execution. &lt;/p&gt;

&lt;p&gt;To work around this bottleneck, the Self team experimented with several approaches that led to key innovations in Just-In-Time execution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adaptive optimization&lt;/li&gt;
&lt;li&gt;Inline caching and polymorphic inline caches&lt;/li&gt;
&lt;li&gt;Type feedback and on-the-fly recompilation&lt;/li&gt;
&lt;li&gt;Performance roughly half the speed of optimized C, proving that a dynamic language could also be fast&lt;/li&gt;
&lt;/ul&gt;
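&lt;p&gt;To make inline caching concrete, here is a Python sketch of a monomorphic inline cache. The &lt;code&gt;CallSite&lt;/code&gt; class and names are invented for illustration, and this greatly simplifies what Self actually did:&lt;/p&gt;

```python
# Monomorphic inline cache sketch (the CallSite class is invented):
# each call site remembers the last receiver type and the method it
# resolved to, so a repeat call with the same type skips the lookup.
class CallSite:
    def __init__(self, name):
        self.name = name
        self.cached_type = None
        self.cached_method = None

    def invoke(self, receiver, *args):
        t = type(receiver)
        if t is not self.cached_type:               # miss: do the slow lookup
            self.cached_type = t
            self.cached_method = getattr(t, self.name)
        return self.cached_method(receiver, *args)  # hit: direct call

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def norm2(self):
        return self.x * self.x + self.y * self.y

site = CallSite("norm2")
print(site.invoke(Point(3, 4)))  # 25 (first call fills the cache)
print(site.invoke(Point(1, 2)))  # 5  (same type: cache hit)
```

&lt;p&gt;A polymorphic inline cache extends this by remembering a small table of (type, method) pairs instead of just the last one.&lt;/p&gt;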

&lt;p&gt;The Self team's innovations eventually laid the foundation for modern JIT compilers in language implementations like the Java HotSpot Virtual Machine, and more recent ones like V8 (JavaScript), LuaJIT (Lua), and PyPy (Python). &lt;/p&gt;

&lt;h4&gt;
  
  
  JIT Changed Everything
&lt;/h4&gt;

&lt;p&gt;Modern language implementations can now achieve best-in-class runtime performance without Ahead-Of-Time compilation to machine code. JIT opened the door to more runtime innovation. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Java (1995) / HotSpot (1999)&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Bytecode + JIT hybrid architecture became mainstream&lt;/li&gt;
&lt;li&gt;Hot-path/cold-path compilation philosophy for runtime optimization&lt;/li&gt;
&lt;li&gt;JIT became industry standard&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Javascript / V8&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Early JavaScript implementations started as slow interpreters&lt;/li&gt;
&lt;li&gt;V8's JIT made JavaScript among the fastest dynamic language implementations&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modern VMs combine everything&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Interpreter for cold startup (start fast, optimize later)&lt;/li&gt;
&lt;li&gt;Baseline JIT for quick compilation and hot paths (move hot code to JIT for better performance)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Execution Models Compared
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pure Interpretation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa9j7st1aygc5yydm69tu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa9j7st1aygc5yydm69tu.png" alt="An illustration of a programming language interpreter" width="800" height="979"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  How it works:
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;A parser reads the source text and translates it into an Abstract Syntax Tree (AST)&lt;/li&gt;
&lt;li&gt;The AST encodes the program's structure, which the interpreter uses to decide what to execute&lt;/li&gt;
&lt;li&gt;In a loop, the interpreter walks the AST and evaluates the next statement or operation in the tree&lt;/li&gt;
&lt;li&gt;Repeat for the next statement&lt;/li&gt;
&lt;/ol&gt;
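&lt;p&gt;The loop above can be sketched in a few lines of Python. This is a hypothetical sketch, not any particular implementation; the tuple-based node shapes are made up for illustration:&lt;/p&gt;

```python
# Minimal tree-walking interpreter sketch (node shapes are invented):
# ("num", value), ("add", left, right), ("mul", left, right)
def evaluate(node):
    kind = node[0]
    if kind == "num":          # leaf: literal value
        return node[1]
    if kind == "add":          # recurse into both operands, then combine
        return evaluate(node[1]) + evaluate(node[2])
    if kind == "mul":
        return evaluate(node[1]) * evaluate(node[2])
    raise ValueError("unknown node kind: " + kind)

# The AST a parser might produce for (3 + 4) * 2
ast = ("mul", ("add", ("num", 3), ("num", 4)), ("num", 2))
print(evaluate(ast))  # 14
```

&lt;p&gt;Notice that every evaluation re-walks the tree, which is exactly the per-execution cost discussed below.&lt;/p&gt;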

&lt;p&gt;Examples of this approach can be found in shell interpreters and in early implementations of BASIC, Python, and Ruby.&lt;/p&gt;

&lt;h4&gt;
  
  
  Characteristics:
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Assessment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Startup time&lt;/td&gt;
&lt;td&gt;Instant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution speed&lt;/td&gt;
&lt;td&gt;Slow (10-100x slower than native)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory usage&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debugging&lt;/td&gt;
&lt;td&gt;Excellent (source available)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Portability&lt;/td&gt;
&lt;td&gt;High (interpreter handles platform)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Why it's slow:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Parse/decode cost paid on every execution&lt;/li&gt;
&lt;li&gt;Dispatch overhead&lt;/li&gt;
&lt;li&gt;Little opportunity for optimization&lt;/li&gt;
&lt;li&gt;No direct use of CPU's native instruction pipeline&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Bytecode Interpretation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kq0cbh3fv2fyynhocsu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kq0cbh3fv2fyynhocsu.png" alt="An illustration of a bytecode interpreter implementation" width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
Bytecode gets its name from its structure: a sequence of bytes where each byte (or small group of bytes) encodes an instruction or its operands. Think of it as machine code for a virtual computer: it doesn't execute on the native CPU hardware. This buys you portability (the same bytecode runs anywhere the VM runs) and efficiency (no re-parsing of source code on every execution).&lt;/p&gt;

&lt;h4&gt;
  
  
  How it works:
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Compile source to bytecode (once)&lt;/li&gt;
&lt;li&gt;Bytecode is a compact sequence of simple operations&lt;/li&gt;
&lt;li&gt;Interpreter loops: fetch opcode → dispatch to handler → execute&lt;/li&gt;
&lt;/ol&gt;
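&lt;p&gt;A minimal Python sketch of that fetch → dispatch → execute loop; the opcode numbering is invented for illustration:&lt;/p&gt;

```python
# Minimal stack-based bytecode VM sketch (opcode numbering is invented).
PUSH, ADD, MUL, HALT = 0, 1, 2, 3

def run(code):
    stack, pc = [], 0
    while True:
        op = code[pc]                    # 1. fetch opcode
        pc += 1
        if op == PUSH:                   # 2. dispatch to handler, 3. execute
            stack.append(code[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()

# (3 + 4) * 2, compiled once to bytecode and then interpreted
program = [PUSH, 3, PUSH, 4, ADD, PUSH, 2, MUL, HALT]
print(run(program))  # 14
```

&lt;p&gt;The if/elif chain is the dispatch overhead mentioned later; production VMs replace it with jump tables or computed gotos.&lt;/p&gt;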

&lt;p&gt;Popular examples of bytecode-based language implementations include Ruby MRI, Lua, early versions of Java, and Wren.&lt;/p&gt;

&lt;h4&gt;
  
  
  Types of Bytecode Interpreters
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Stack-based vs Register-based:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Stack-based (JVM, CPython, Wasm)     Register-based (Lua, Dalvik, LuaJIT)
─────────────────────────────────    ─────────────────────────────────────
push 3                               load r0, 3
push 4                               load r1, 4  
add          ; implicit operands     add r2, r0, r1   ; explicit operands
push 2                               load r3, 2
mul                                  mul r4, r2, r3

Smaller bytecode                     Fewer instructions
Simpler compiler                     Easier to optimize
More instructions executed           Maps better to real CPUs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Fixed vs Variable Width:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fixed width (simpler, faster decode):
┌────────┬────────┬────────┬────────┐
│ opcode │ operand│ operand│ operand│   Always 4 bytes
└────────┴────────┴────────┴────────┘

Variable width (compact, complex decode):
┌────────┐  or  ┌────────┬────────┐  or  ┌────────┬────────┬────────┐
│ opcode │      │ opcode │ operand│      │ opcode │  wide operand   │
└────────┘      └────────┴────────┘      └────────┴────────┴────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
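&lt;p&gt;Decoding is where that tradeoff shows up. Here is a Python sketch of a variable-width decoder, with an encoding invented for illustration (opcodes 0 and 1 take no operand, opcode 2 a one-byte operand, opcode 3 a two-byte little-endian operand):&lt;/p&gt;

```python
# Variable-width bytecode decoder sketch (the encoding is invented).
def decode(code):
    it = iter(code)
    out = []
    for op in it:
        if op in (0, 1):                 # single-byte instruction
            out.append((op,))
        elif op == 2:                    # opcode + 1-byte operand
            out.append((op, next(it)))
        elif op == 3:                    # opcode + 2-byte "wide" operand
            lo, hi = next(it), next(it)
            out.append((op, lo + hi * 256))
    return out

print(decode(bytes([2, 7, 0, 3, 0x34, 0x12, 1])))  # [(2, 7), (0,), (3, 4660), (1,)]
```

&lt;p&gt;A fixed-width decoder would simply step four bytes at a time; the variable-width form is more compact but needs this per-opcode size logic.&lt;/p&gt;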



&lt;h4&gt;
  
  
  Characteristics:
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Assessment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Startup time&lt;/td&gt;
&lt;td&gt;Fast (compile once)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution speed&lt;/td&gt;
&lt;td&gt;Moderate (3-10x slower than native)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory usage&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debugging&lt;/td&gt;
&lt;td&gt;Good (with source maps)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Portability&lt;/td&gt;
&lt;td&gt;Excellent (bytecode is platform-independent)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Why it's faster than pure source interpretation:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Parsing done once, not per execution&lt;/li&gt;
&lt;li&gt;Bytecode is compact (cache-friendly)&lt;/li&gt;
&lt;li&gt;Simpler dispatch (opcode vs. AST node type)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Why it's still slow:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Dispatch overhead on every instruction&lt;/li&gt;
&lt;li&gt;No native code generation&lt;/li&gt;
&lt;li&gt;Can't use CPU's branch prediction effectively&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Study Recommendation For Building Interpreters
&lt;/h4&gt;

&lt;p&gt;If you are interested in building an interpreter, pure or bytecode-based, I highly recommend Robert Nystrom's free book &lt;a href="https://craftinginterpreters.com" rel="noopener noreferrer"&gt;Crafting Interpreters&lt;/a&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A Tree-Walk Interpreter: &lt;a href="https://craftinginterpreters.com/a-tree-walk-interpreter.html" rel="noopener noreferrer"&gt;https://craftinginterpreters.com/a-tree-walk-interpreter.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;A Bytecode Virtual Machine: &lt;a href="https://craftinginterpreters.com/contents.html#a-bytecode-virtual-machine" rel="noopener noreferrer"&gt;https://craftinginterpreters.com/contents.html#a-bytecode-virtual-machine&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Ahead-Of-Time (AOT) Compilation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94i7rbdqmnalyb3v3q12.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94i7rbdqmnalyb3v3q12.png" alt="An illustration of an Ahead-Of-Time compiler architecture" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AOT compilation transforms the source code into machine code once, before the program runs. Implementations have to balance aggressive optimization against compile time: optimization takes time, and you have to wait for the code to compile before you can run it. Most systems programming language implementations prefer AOT compilation for maximum performance, and some toolchains give you the option to switch between JIT and AOT. &lt;/p&gt;

&lt;p&gt;Common examples of AOT-compiled languages are Rust, Haskell, C, C++, and Go. &lt;/p&gt;

&lt;h4&gt;
  
  
  Characteristics:
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Assessment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Startup time&lt;/td&gt;
&lt;td&gt;Instant (already compiled)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution speed&lt;/td&gt;
&lt;td&gt;Fast (native)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compile time&lt;/td&gt;
&lt;td&gt;Slow (seconds to minutes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary size&lt;/td&gt;
&lt;td&gt;Depends on optimization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Portability&lt;/td&gt;
&lt;td&gt;Low (recompile per platform)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Why it's fast:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;No runtime interpretation overhead&lt;/li&gt;
&lt;li&gt;Heavy optimization possible (compiler has time)&lt;/li&gt;
&lt;li&gt;Direct use of CPU features &lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Limitations:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Must know everything at compile time&lt;/li&gt;
&lt;li&gt;Can't optimize based on runtime behavior&lt;/li&gt;
&lt;li&gt;Long edit-compile-run cycle&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Just-In-Time Compilation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fomrchvs2yybztjcfmu73.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fomrchvs2yybztjcfmu73.png" alt="An illustration of a Just-In-Time compiler architecture" width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
JIT compilation trades some of AOT's peak performance for portability and adaptivity. If you are developing a programming language for long-running, fault-tolerant servers, JIT is a strong choice. Most modern JIT infrastructure starts on the &lt;code&gt;Cold Path&lt;/code&gt;: when code runs for the first time, it is interpreted for quick startup. A profiler then identifies the &lt;code&gt;Hot Paths&lt;/code&gt; - functions or operations that run frequently enough to be worth optimizing. Hot-path code is recompiled into machine code and executed on demand while the interpreter keeps running. Unlike AOT, the JIT architecture is collaborative: the interpreter and the JIT-compiled code work together in the same process. &lt;/p&gt;

&lt;p&gt;Modern compiler frameworks such as LLVM and Cranelift support JIT execution out of the box, letting you generate and run machine code in-process without a separate AOT step. &lt;/p&gt;

&lt;h4&gt;
  
  
  How it works:
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Start executing (interpret or baseline compile)&lt;/li&gt;
&lt;li&gt;Profile: track which code runs frequently, what types appear&lt;/li&gt;
&lt;li&gt;Hot code triggers JIT compilation - &lt;strong&gt;inside the running process&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Generated native code stored in executable memory  - &lt;strong&gt;in the same address space&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Continue profiling, recompile if behavior changes&lt;/li&gt;
&lt;/ol&gt;
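&lt;p&gt;The tier-up in steps 2 and 3 can be simulated in a few lines of Python. This is only a conceptual sketch: the threshold, the &lt;code&gt;Function&lt;/code&gt; class, and the use of Python's built-in &lt;code&gt;compile()&lt;/code&gt; as a stand-in for real machine-code generation are all invented for illustration:&lt;/p&gt;

```python
# Tier-up by hot-spot counting (conceptual sketch; threshold, class names,
# and compile() as a stand-in "JIT" are all invented for illustration).
HOT_THRESHOLD = 3

def slow_interpret(source, env):
    # stand-in for a bytecode/AST interpreter (the cold path)
    return eval(source, {}, env)

class Function:
    def __init__(self, name, source):
        self.name, self.source = name, source
        self.calls = 0
        self.compiled = None                      # filled in once the code is hot

    def call(self, env):
        self.calls += 1                           # step 2: profile
        if self.compiled is None and self.calls >= HOT_THRESHOLD:
            # step 3: hot code triggers compilation inside the running process
            self.compiled = compile(self.source, self.name, "eval")
        if self.compiled is not None:
            return eval(self.compiled, {}, env)   # fast path: compiled code
        return slow_interpret(self.source, env)   # cold path: interpreter

f = Function("square", "x * x")
results = [f.call({"x": i}) for i in range(5)]
print(results)                  # [0, 1, 4, 9, 16]
print(f.compiled is not None)   # True: tiered up after the third call
```

&lt;p&gt;A real JIT would also implement step 5, discarding the compiled code and re-profiling when its assumptions stop holding.&lt;/p&gt;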

&lt;h4&gt;
  
  
  Characteristics:
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Assessment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Startup time&lt;/td&gt;
&lt;td&gt;Fast (interpret first)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Peak execution speed&lt;/td&gt;
&lt;td&gt;Near-native&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Warm-up time&lt;/td&gt;
&lt;td&gt;Moderate (JIT needs profile)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory usage&lt;/td&gt;
&lt;td&gt;High (code + compiler in memory)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Portability&lt;/td&gt;
&lt;td&gt;Good (bytecode portable, JIT per-platform)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Why it can match AOT in some scenarios:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Optimizes based on &lt;em&gt;actual&lt;/em&gt; runtime behavior&lt;/li&gt;
&lt;li&gt;Can specialize for observed types&lt;/li&gt;
&lt;li&gt;Can inline across module boundaries&lt;/li&gt;
&lt;li&gt;Can deoptimize and reoptimize as behavior changes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Distinctions Between AOT and JIT Compilation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjilzch75rloovafaeh7w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjilzch75rloovafaeh7w.png" alt="Illustration showcasing the key distinctions between AOT and JIT compilation" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  The fundamental tradeoff:
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;AOT&lt;/th&gt;
&lt;th&gt;JIT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Compile once, run forever&lt;/td&gt;
&lt;td&gt;Compile repeatedly, run smarter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optimizations are guesses&lt;/td&gt;
&lt;td&gt;Optimizations are informed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No runtime compilation cost&lt;/td&gt;
&lt;td&gt;Pays compilation cost during execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Predictable memory usage&lt;/td&gt;
&lt;td&gt;Memory includes compiler infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Must handle all possible cases&lt;/td&gt;
&lt;td&gt;Can specialize for observed cases&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Programming language compilers and runtimes have evolved over many decades alongside advances in computing. In smaller projects the differences between runtimes are not usually obvious, but in performance- and memory-critical environments we begin to see where each implementation shines and why. In upcoming posts in this series, we will discuss the anatomy of modern high-performance runtimes and the different optimization strategies, and then we will build a JIT compiler from scratch!&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Books:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="//craftinginterpreters.com"&gt;&lt;em&gt;Crafting Interpreters&lt;/em&gt; by Robert Nystrom (free online)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.amazon.es/Engineering-Compiler-Keith-Cooper/dp/012088478X" rel="noopener noreferrer"&gt;&lt;em&gt;Engineering a Compiler&lt;/em&gt; by Cooper &amp;amp; Torczon&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.sciencedirect.com/book/monograph/9781558609105/virtual-machines" rel="noopener noreferrer"&gt;&lt;em&gt;Virtual Machines&lt;/em&gt; by Smith &amp;amp; Nair&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Papers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.lua.org/doc/jucs05.pdf" rel="noopener noreferrer"&gt;"The Implementation of Lua 5.0" - Ierusalimschy et al.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cs.tufts.edu/comp/150IPL/papers/aycock03jit.pdf" rel="noopener noreferrer"&gt;"A Brief History of Just-In-Time" - Aycock&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.researchgate.net/publication/220751959_Trace-based_Just-in-Time_Type_Specialization_for_Dynamic_Languages" rel="noopener noreferrer"&gt;"Trace-based Just-in-Time Type Specialization for Dynamic Languages" - Gal et al.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Implementations to study:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LuaJIT (brilliantly simple tracing JIT)&lt;/li&gt;
&lt;li&gt;V8 (production JavaScript, open source)&lt;/li&gt;
&lt;li&gt;GraalVM (polyglot, written in Java)&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Hi there! My name is Damilare Akinlaja&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;I am currently building &lt;a href="https://github.com/darmie/zyntax" rel="noopener noreferrer"&gt;Zyntax&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>jit</category>
      <category>llvm</category>
      <category>systemsprogramming</category>
      <category>compilers</category>
    </item>
  </channel>
</rss>
