itplamen

Posted on Jun 5 • Edited on Jun 10

JIT Compiler Lowering Explained: What Happens at Runtime (Part 2)

#programming #compiling #systems #architecture

This series is based on official sources, compiler code exploration, runtime experimentation with decompiler tools, and real-world experience.

From IL to Machine Code
What is a Just-In-Time Compiler
A Brief History of Just-In-Time
JIT Compiler Architecture
Conclusion: Why JIT Lowering Matters
References and Further Reading

From IL to Machine Code

In C# Compiler Lowering Explained: What Happens at Build Time (Part 1), we explored how high-level constructs are lowered into lower-level equivalents and compiled into Intermediate Language (IL). However, IL is still not what the CPU executes.

Before execution, IL must undergo additional transformations, this time at runtime. One such transformation is the JIT lowering phase.

In Part 2, we will explore how runtime lowering differs fundamentally from its compile-time counterpart. But before diving in, it is important to understand what a JIT compiler is and how its internal architecture works.

What is a Just-In-Time Compiler

Just-In-Time (JIT) compilation is a runtime technique that translates platform-independent bytecode (such as .NET IL or Java bytecode) into native machine code just before execution.

Unlike ahead-of-time (AOT) compilation, JIT has access to runtime information, allowing it to generate code that is tailored to the current execution environment — including the specific CPU architecture and runtime behavior.

As John Aycock from the University of Calgary notes in his work A Brief History of Just-In-Time:

Software systems have been using “just-in-time” compilation (JIT) techniques since the 1960s. Broadly, JIT compilation includes any translation performed dynamically, after a program has started execution.

A Brief History of Just-In-Time

1952 - A-0 Compiler

Compilers have evolved significantly over time. One of the earliest examples is the A-0 compiler developed by Grace Hopper in 1952, often considered the first compiler. As she described:

All I had to do was to write down a set of call numbers, let the computer find them on the tape, bring them over and do the additions. This was the first compiler.

1960 - LISP

Later, in 1960, John McCarthy introduced LISP in his paper Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I. While not describing JIT compilation directly, it introduced concepts around dynamic evaluation and runtime execution that influenced later runtime system design.

1970 - FORTRAN

In the 1970s, Hansen’s work on FORTRAN introduced the idea of identifying frequently executed code paths (hot spots) and optimizing them dynamically at runtime. Although primitive compared to modern JIT compilers, these early systems established a fundamental principle that still holds today: runtime information can enable more effective optimizations than static analysis alone.

1980 - Smalltalk

The Smalltalk systems of the 1980s introduced one of the first practical uses of JIT compilation. Deutsch and Schiffman developed a system that translated virtual machine code into native machine code at runtime, compiling methods lazily. This approach significantly improved performance and introduced key JIT concepts still used today, such as lazy compilation and code caching.

1990 - Java

In the late 1990s, JIT compilation was largely driven by the rise of Java. Early Java Virtual Machines (JVMs) relied heavily on interpretation, which resulted in poor performance. JIT compilation emerged as the key solution, translating bytecode into native machine code at runtime. However, it quickly became clear that simple translation was not enough — effective runtime optimization was essential for achieving good performance. This led to significant advancements in JIT design, including improved intermediate representations and optimization strategies.

JIT Compiler Architecture

A general structure of a JIT compiler is shown below.

At a high level, a JIT compiler takes platform-independent bytecode, produced by a front-end compiler (such as .NET Roslyn or javac), and translates it into native machine code at runtime.

Bytecode

Before this translation happens, the input bytecode must be parsed into a structured representation that the JIT compiler can reason about. Bytecode is easier to analyze than source code and is composed of low-level instructions that typically include opcodes and operands.

Opcodes (Operation Codes) — virtual instructions that specify operations, such as adding data, moving information, or jumping to a new memory location. For example: add (addition), ldloc (load local variable), stloc (store local variable).
Operands — the values or registers that an opcode operates on, such as constants (10), variables (x, y, z), and memory addresses (0xA1).

When bytecode is inspected using disassembler tools, instructions are shown using mnemonics, which are human-readable representations of opcodes. For example, mul (multiplication) in IL, or MUL and MOV (move data) when native x86/x64 code is disassembled.

IL Translation

Also called the Importer, this phase is responsible for translating bytecode into an intermediate representation (IR) that is easier to analyze and optimize. During this phase, the JIT converts bytecode instructions into internal data structures such as:

IR Trees (GenTrees in RyuJIT) — tree-based representations of expressions and operations used for analysis and optimization.
Control Flow Graph (CFG) — a representation of all possible execution paths through a program. Instead of viewing code as a linear sequence of instructions, the JIT organizes it into a graph structure that captures branching and control flow. A CFG consists of:
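- Basic Blocks — straight-line sequences of instructions with a single entry point (the first instruction) and a single exit point (the last instruction). Control flow such as if, goto, or return can appear only at the end of a block, never in the middle.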
- Control Edges — directed connections between basic blocks that represent possible execution paths.
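To make basic blocks and control edges concrete, here is a minimal Python sketch of the classic "leaders" approach for splitting a linear instruction list into basic blocks and connecting them. It assumes a simplified quadruple format (OP, arg1, arg2, result) like the IR listings later in this article, plus a hypothetical LABEL pseudo-instruction that marks jump targets; it is an illustration, not RyuJIT's importer.

```python
# Hypothetical quadruple IR: (OP, arg1, arg2, result). JMP keeps its target in
# arg1, JMP_FALSE keeps its condition in arg1 and its target in arg2, and the
# LABEL pseudo-instruction marks a jump target.
BRANCHES = {"JMP", "JMP_FALSE"}
ENDS_BLOCK = BRANCHES | {"RET"}


def jump_target(ins):
    return ins[1] if ins[0] == "JMP" else ins[2]


def build_cfg(code):
    # 1. Leaders: the first instruction, every label, and every instruction
    #    that follows a branch or a return.
    leaders = {0}
    for i, ins in enumerate(code):
        if ins[0] == "LABEL":
            leaders.add(i)
        if ins[0] in ENDS_BLOCK and i + 1 < len(code):
            leaders.add(i + 1)
    starts = sorted(leaders)

    # 2. A basic block runs from one leader up to (not including) the next.
    blocks = [code[s:e] for s, e in zip(starts, starts[1:] + [len(code)])]

    # 3. Control edges: jump targets, plus a fall-through edge when the block
    #    does not end in an unconditional jump or a return.
    block_of_label = {b[0][1]: i for i, b in enumerate(blocks) if b[0][0] == "LABEL"}
    edges = set()
    for i, block in enumerate(blocks):
        last = block[-1]
        if last[0] in BRANCHES:
            edges.add((i, block_of_label[jump_target(last)]))
        if last[0] not in ("JMP", "RET") and i + 1 < len(blocks):
            edges.add((i, i + 1))
    return blocks, edges


code = [
    ("MOV", 10, "-", "x"),
    ("JMP_FALSE", "condition", "L1", "-"),
    ("MOV", 20, "-", "x"),
    ("LABEL", "L1", "-", "-"),
    ("RET", "x", "-", "-"),
]
blocks, edges = build_cfg(code)
print(len(blocks), sorted(edges))  # 3 [(0, 1), (0, 2), (1, 2)]
```

Block 0 either falls through into block 1 or jumps straight to block 2, which is the shape an if statement without an else branch produces.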

Static Single Assignment (SSA) Construction

SSA is an intermediate representation used by most modern compilers, including RyuJIT, OpenJ9 JIT, GCC, LLVM, V8, HotSpot and the Go compiler. Introduced in the late 1980s, SSA is a fundamental technique in compiler design in which each variable is assigned exactly once, and every use of a variable refers to a single definition. This makes many optimizations simpler and more efficient.

Variable renaming (single assignment)

Consider the following code:

```
x = 10;
x = x - 5;
x = x * 3;
x = x + 9;
y = x;
```

The variable x is assigned multiple times. When the compiler reads y = x, it must trace back through previous assignments to determine the correct value.

In SSA form, each assignment creates a new version of the variable:

```
x1 = 10;
x2 = x1 - 5;
x3 = x2 * 3;
x4 = x3 + 9;
y = x4;
```

Now each variable is defined exactly once, so when the compiler reads y = x4, there is no ambiguity: it can directly reference the definition.
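The renaming step can be sketched for straight-line code in a few lines of Python. This is a simplified illustration that assumes a hypothetical representation in which each right-hand side is a list of tokens, and it does not handle branches. Unlike the listing above, it versions every assigned variable, so the final assignment becomes y1 = x4.

```python
from collections import defaultdict


def rename_to_ssa(statements):
    """Rename straight-line assignments into SSA form.

    statements: list of (target, tokens), where tokens is the right-hand side
    split into variable names, constants, and operators.
    """
    version = defaultdict(int)  # variable -> number of definitions seen so far
    current = {}                # variable -> SSA name of its latest definition
    renamed = []
    for target, tokens in statements:
        # Rewrite uses first, so "x = x - 5" reads the previous version of x.
        rhs = [current.get(tok, tok) for tok in tokens]
        version[target] += 1
        current[target] = f"{target}{version[target]}"
        renamed.append((current[target], rhs))
    return renamed


program = [
    ("x", ["10"]),
    ("x", ["x", "-", "5"]),
    ("x", ["x", "*", "3"]),
    ("x", ["x", "+", "9"]),
    ("y", ["x"]),
]
for name, rhs in rename_to_ssa(program):
    print(name, "=", " ".join(rhs))
# prints, one per line: x1 = 10, x2 = x1 - 5, x3 = x2 * 3, x4 = x3 + 9, y1 = x4
```

Rewriting the uses before creating the new name is the key design choice: it is what makes x2 read x1 rather than itself.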

Insert φ (phi) nodes at merge points

SSA introduces additional complexity when dealing with branching code. Consider an if/else statement where each branch assigns a different value to the same variable:

```
int x;
if (condition) {
    x = 10;
} else {
    x = 20;
}

y = x;
```

Here, x may have different values depending on the execution path. When execution reaches y = x, the compiler must account for both possibilities.

In SSA form, this is handled using φ (phi) nodes, which merge values from different control flow paths:

```
int x1, x2;
if (condition) {
    x1 = 10;
} else {
    x2 = 20;
}

int x3 = φ(x1, x2);
y = x3;
```

The φ (phi) function selects a value based on the control flow path taken at runtime:

If the "if" branch is taken, "x3" gets the value of "x1"
If the "else" branch is taken, "x3" gets the value of "x2"

φ nodes are a conceptual construct used by the compiler to represent value merging. They have no direct machine-instruction equivalent, so they are eliminated before code generation.
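A simplified sketch of what happens at a single merge point: given the SSA name each variable holds at the end of every predecessor block, a φ node is needed only for variables whose incoming names differ. Production compilers decide where to place φ nodes more systematically, typically using dominance frontiers; the function, its inputs, and the "undef" placeholder here are hypothetical.

```python
def merge_at_join(incoming_names, counters):
    """Create φ nodes where control flow merges.

    incoming_names: one dict per predecessor block, mapping each variable to
    the SSA name it holds at the end of that block.
    counters: variable -> highest version number used so far (updated in place).
    Returns (names valid after the merge, list of φ definitions).
    """
    merged, phis = {}, []
    for var in sorted(set().union(*incoming_names)):
        # "undef" is a placeholder for a path on which var was never assigned.
        incoming = [names.get(var, "undef") for names in incoming_names]
        if len(set(incoming)) == 1:
            merged[var] = incoming[0]  # same definition on every path: no φ needed
        else:
            counters[var] = counters.get(var, 0) + 1
            new_name = f"{var}{counters[var]}"
            phis.append(f"{new_name} = φ({', '.join(incoming)})")
            merged[var] = new_name
    return merged, phis


# The if/else example: the "if" branch ends with x1, the "else" branch with x2.
merged, phis = merge_at_join([{"x": "x1"}, {"x": "x2"}], {"x": 2})
print(phis)    # ['x3 = φ(x1, x2)']
print(merged)  # {'x': 'x3'}
```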

Data Flow Analysis

Once SSA is constructed, the JIT runs its data-flow analysis phase, which examines how data moves through the program using the Control Flow Graph (CFG). The results of this analysis are later used by various optimization passes, such as removing redundant computations and reusing previous calculations. Data-flow analysis consists of several analyses:

Reaching Definitions Analysis — determines which assignments (definitions) of a variable can reach a given point in the program without being overwritten. It answers the question: "Where did this variable’s value come from?" The algorithm for computing reaching definitions is outlined below:
- Generate blocks — assign a unique label to each definition and group the code into basic blocks (B).
- Identify GEN(B) and KILL(B) sets — for each basic block (B), compute the GEN(B) set (definitions generated within (B) that reach the end of (B) without being overwritten inside the block) and the KILL(B) set (definitions from outside the block that are invalidated, that is, overwritten, by the code inside (B)).
- Apply data-flow equations — using the computed GEN(B) and KILL(B) sets, determine the IN(B) and OUT(B) sets for each basic block, representing the definitions that reach the entry and exit of (B), respectively. JIT compilers use the following equations to calculate what information exits a block based on what entered it:
  $$\mathrm{IN}(B) = \bigcup_{P \in \mathrm{pred}(B)} \mathrm{OUT}(P)$$
  $$\mathrm{OUT}(B) = \mathrm{GEN}(B) \cup \left(\mathrm{IN}(B) \setminus \mathrm{KILL}(B)\right)$$
  The first equation computes the input state of a block by merging the OUT(P) sets of its predecessor blocks P using the union operator (⋃). This represents all definitions that may reach block (B) from any incoming control-flow path. The second equation computes the output state by taking the incoming definitions IN(B), removing those that are overwritten (KILL(B)), and adding the newly generated definitions (GEN(B)).
- Annotate the Control Flow Graph (CFG) — once the data-flow equations reach a fixed point, the computed GEN(B), KILL(B), IN(B), and OUT(B) sets are associated with each basic block in the CFG. These annotations describe how definitions propagate through the program and are used by subsequent optimization passes.
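The steps above can be sketched as a round-robin iterative solver in Python. The block names, definition labels, and set-based representation are hypothetical (real JITs typically use bit vectors rather than Python sets), but the loop applies the IN/OUT equations described in the third step until nothing changes, which is the fixed point mentioned in the last step.

```python
def reaching_definitions(blocks, preds, gen, kill):
    """Iteratively solve the reaching-definitions equations until a fixed point.

    blocks: block names; preds: block -> list of predecessor blocks;
    gen / kill: block -> set of definition labels.
    """
    IN = {b: set() for b in blocks}
    OUT = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # IN(B) = union of OUT(P) over all predecessors P
            IN[b] = set().union(*(OUT[p] for p in preds[b]))
            # OUT(B) = GEN(B) ∪ (IN(B) − KILL(B))
            new_out = gen[b] | (IN[b] - kill[b])
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT


# The if/else example: d1 is "x = 10", d2 is "x = 20", d3 is "y = x".
blocks = ["B0", "B1", "B2", "B3"]
preds = {"B0": [], "B1": ["B0"], "B2": ["B0"], "B3": ["B1", "B2"]}
gen = {"B0": set(), "B1": {"d1"}, "B2": {"d2"}, "B3": {"d3"}}
kill = {"B0": set(), "B1": {"d2"}, "B2": {"d1"}, "B3": set()}
IN, OUT = reaching_definitions(blocks, preds, gen, kill)
print(sorted(IN["B3"]))  # ['d1', 'd2']
```

Both definitions of x reach y = x, which is exactly the situation the φ node in the SSA section resolves.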

Live Variable Analysis — unlike Reaching Definitions, which focuses on assignments, Live Variable Analysis focuses on variable usage. It determines which variables may be used in the future at a given point in the program. This is a backward data-flow analysis, meaning it works from the end of the program toward the beginning. A variable is considered live at a program point if its current value may be used along some future execution path. If a variable is never used again, it is considered dead at that point. There are two kinds of variable liveness:
- Syntactic Liveness — a variable is considered live if there exists a path in the control-flow graph where it is used later in the program.
- Semantic Liveness — a variable is truly live if its value actually affects the program’s behavior.
x is both syntactically and semantically LIVE
```
int x = y * z;
return x;
```
x is DEAD
```
int x = 10;
return 0;
```
x is LIVE (conditionally)
```
int x = y * z;

if (flag)
    return x;

return 0;
```
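Liveness uses the same fixed-point idea, but flows backward: a block's live-out set is the union of its successors' live-in sets, and live-in = USE ∪ (live-out − DEF), where USE holds the variables read before being written in the block. These standard equations are not spelled out in the text above, and the block layout below is a hypothetical encoding of the "conditionally live" example. Because it only asks whether a use exists on some path, the sketch computes syntactic liveness.

```python
def live_variables(blocks, succs, use, defs):
    """Backward liveness analysis solved to a fixed point.

    use[B]:  variables read in B before any write to them in B
    defs[B]: variables written in B
    """
    IN = {b: set() for b in blocks}
    OUT = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in reversed(blocks):  # backward problem: visit later blocks first
            # OUT(B) = union of IN(S) over all successors S
            OUT[b] = set().union(*(IN[s] for s in succs[b]))
            # IN(B) = USE(B) ∪ (OUT(B) − DEF(B))
            new_in = use[b] | (OUT[b] - defs[b])
            if new_in != IN[b]:
                IN[b] = new_in
                changed = True
    return IN, OUT


# The "conditionally live" example:
#   B1: x = y * z; if (flag) goto B2 else goto B3
#   B2: return x
#   B3: return 0
blocks = ["B1", "B2", "B3"]
succs = {"B1": ["B2", "B3"], "B2": [], "B3": []}
use = {"B1": {"y", "z", "flag"}, "B2": {"x"}, "B3": set()}
defs = {"B1": {"x"}, "B2": set(), "B3": set()}
IN, OUT = live_variables(blocks, succs, use, defs)
print(sorted(OUT["B1"]))  # ['x']: x is live after its definition
print(sorted(IN["B3"]))   # []: x is dead on the "return 0" path
```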

Available Expressions Analysis (AVAIL) — a forward data-flow analysis that determines which expressions have already been computed and remain valid at different points in the program. In other words, an expression is available if the compiler can safely reuse its previously computed value instead of recalculating it. It is closely related to Common Subexpression Elimination (CSE), which RyuJIT performs as one of its optimization passes.

Available Expressions

```
int x = y * z;
return y * z;
```

The expression y * z is computed in the first statement, and because neither y nor z is modified afterwards, it remains valid at the next program point. The available-expressions set at that point is therefore avail(n) = { y * z }, which allows the JIT compiler to reuse the previously computed result (already held in x) instead of recalculating the expression.

Unavailable Expressions

```
int a = 2;
int b = 3;

int c = a + b;
int x = c * 10;

a = 5;

int d = a + b;
```

After a + b is computed, the expression is available: avail(n) = { a + b }. However, the assignment a = 5; changes the value of a, which invalidates every previously computed expression that uses a. As a result, when the compiler reaches d = a + b;, the expression a + b is no longer available and must be recomputed.
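Within a single basic block, the analysis reduces to a forward walk that adds each computed expression and drops every expression that reads the variable just assigned. A minimal sketch, assuming a hypothetical tuple encoding of expressions. Across blocks, a real implementation intersects the sets arriving from all predecessors instead of taking their union, because an expression is available only if it was computed on every incoming path.

```python
def available_before(statements):
    """Forward available-expressions analysis over straight-line code.

    statements: list of (target, expr), where expr is an (op, left, right)
    tuple, or None for a plain constant assignment.
    Returns, for each statement, the set of expressions available just before it.
    """
    avail = set()
    before = []
    for target, expr in statements:
        before.append(frozenset(avail))
        if expr is not None:
            avail.add(expr)  # GEN: the expression has just been computed
        # KILL: assigning target invalidates every expression that reads it
        avail = {e for e in avail if target not in e[1:]}
    return before


program = [
    ("a", None),              # a = 2
    ("b", None),              # b = 3
    ("c", ("+", "a", "b")),   # c = a + b
    ("x", ("*", "c", "10")),  # x = c * 10
    ("a", None),              # a = 5
    ("d", ("+", "a", "b")),   # d = a + b
]
before = available_before(program)
print(("+", "a", "b") in before[3])  # True: a + b is available at x = c * 10
print(("+", "a", "b") in before[5])  # False: a = 5 killed it, so d = a + b recomputes
```

Adding the expression before applying the kill step also handles self-referencing updates correctly: for x = x + 1, the freshly computed x + 1 is immediately invalidated.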

Optimizations

JIT compilers apply a wide range of optimizations to transform the intermediate representation into more efficient code. Examples include:

Local Optimization — focuses on improving a small region of code, typically within a single basic block. Common techniques include:

Removing Redundant Instructions — redundant computations are identified and removed by reusing previously calculated results.
Constant Folding — expressions whose operands are all constants are evaluated at compile time (for a JIT, during JIT compilation) and replaced with their computed values. For example, instead of emitting code that calculates 2 * 3 + 4, the compiler can simply assign x = 10;
```
// High-level code
int x = 2 * 3 + 4;

// Before optimization IR
(MUL, 2, 3, t1)
(ADD, t1, 4, t2)
(MOV, t2, -, x)

// After optimization IR
(MOV, 10, -, x)
```
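A minimal folding pass over quadruples like those above might look like this. It also propagates folded temporaries into later instructions, so t1 and t2 disappear; that relies on an assumption of this sketch, namely that names such as t1 and t2 are single-use compiler temporaries, something a real JIT would establish through its own analysis.

```python
import operator

# Hypothetical quadruple IR: (OP, arg1, arg2, result); "-" marks an unused slot.
FOLDABLE = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def is_temp(name):
    # Assumption: compiler temporaries are named t1, t2, ... and used only once.
    return name[0] == "t" and name[1:].isdigit()


def fold_constants(quads):
    known = {}  # name -> constant value discovered so far
    folded = []

    def value(arg):
        return known.get(arg, arg) if isinstance(arg, str) else arg

    for op, a, b, dst in quads:
        a, b = value(a), value(b)
        if op in FOLDABLE and isinstance(a, int) and isinstance(b, int):
            known[dst] = FOLDABLE[op](a, b)  # evaluate now instead of at runtime
        elif op == "MOV" and isinstance(a, int):
            known[dst] = a
        else:
            known.pop(dst, None)  # dst is no longer a known constant
            folded.append((op, a, b, dst))
            continue
        if not is_temp(dst):
            folded.append(("MOV", known[dst], "-", dst))
    return folded


before = [("MUL", 2, 3, "t1"), ("ADD", "t1", 4, "t2"), ("MOV", "t2", "-", "x")]
print(fold_constants(before))  # [('MOV', 10, '-', 'x')]
```

Python integers never overflow, whereas folding 32-bit C# arithmetic has to reproduce wrap-around, so a real implementation also needs to respect the operand type.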

Strength Reduction — expensive operations are replaced with equivalent but less expensive ones.

```
// High-level code -> Before optimization
int y = pow(x, 2);
int z = x * 32;

// Intermediate Representation (IR) -> Before optimization
(CALL, pow, x, 2, y)
(MUL, x, 32, z)

// High-level code -> After optimization
int y = x * x;
int z = x << 5;

// Intermediate Representation (IR) -> After optimization
(MUL, x, x, y)
(SHL, x, 5, z)   // shifting (SHL) is faster than multiplication
```
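The two rewrite rules in this example can be sketched as a single function over quadruples. A minimal sketch only; POW is a hypothetical opcode standing in for the pow(x, 2) call.

```python
def reduce_strength(quad):
    """Replace an expensive operation with a cheaper equivalent when possible.

    quad: (OP, arg1, arg2, result). "POW" is a hypothetical opcode.
    """
    op, a, b, dst = quad
    if op == "POW" and b == 2:
        return ("MUL", a, a, dst)                   # x^2      ->  x * x
    if op == "MUL" and isinstance(b, int) and b > 0 and (b & (b - 1)) == 0:
        return ("SHL", a, b.bit_length() - 1, dst)  # x * 2^k  ->  x << k
    return quad


print(reduce_strength(("POW", "x", 2, "y")))  # ('MUL', 'x', 'x', 'y')
print(reduce_strength(("MUL", "x", 32, "z")))  # ('SHL', 'x', 5, 'z')
print(reduce_strength(("MUL", "x", 3, "z")))   # unchanged: 3 is not a power of two
```

Multiplication by a power of two maps directly to a left shift, including for negative two's-complement values. Division is different: an arithmetic right shift rounds toward negative infinity, while C# integer division rounds toward zero, so compilers add a small correction for signed operands.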

Global Optimization — operates across multiple basic blocks or the entire function (method). These optimizations are more computationally expensive, but they can find improvements that are not visible within a single basic block. Common techniques include:

Dead Code Elimination (DCE) — removes unreachable or unused code.
Example: Unused Variable

```
// High-level code
int a = 5;
int b = 10;
int c = a + b; // Never used
return a;

// Before optimization IR
(MOV, 5, -, a)
(MOV, 10, -, b)
(ADD, a, b, c)
(RET, a, -, -)

// After optimization IR
(MOV, 5, -, a)
(RET, a, -, -)
```
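The unused-variable case can be sketched as a single backward sweep that reuses the liveness idea: walk the block from the bottom, keep an instruction only if something later reads its result (or it has a side effect), and record what it reads. A simplified sketch for one basic block; the SIDE_EFFECTS set is an assumption.

```python
# Assumption: these opcodes have effects beyond their result and are always kept.
SIDE_EFFECTS = {"RET", "STORE", "CALL"}


def eliminate_dead_code(quads, live_out=()):
    """Backward pass over one basic block that drops instructions whose result
    is never read. live_out: variables still needed after the block
    (in a real compiler this comes from live variable analysis)."""
    live = set(live_out)
    kept = []
    for op, a, b, dst in reversed(quads):
        if op in SIDE_EFFECTS or dst in live:
            kept.append((op, a, b, dst))
            live.discard(dst)  # this definition satisfies later uses of dst
            live.update(x for x in (a, b) if isinstance(x, str) and x != "-")
        # otherwise the result is dead and the instruction is dropped
    return kept[::-1]


before = [("MOV", 5, "-", "a"), ("MOV", 10, "-", "b"),
          ("ADD", "a", "b", "c"), ("RET", "a", "-", "-")]
print(eliminate_dead_code(before))  # [('MOV', 5, '-', 'a'), ('RET', 'a', '-', '-')]
```

Removing the ADD is what makes b dead too: once c is gone, nothing reads b, so its MOV is dropped in the same sweep.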

Example: Unreached Code

```
// High-level code
int x = 10;
if (false) {
    x = 20; // Never executed
}
return x;

// Before optimization IR
(MOV, 10, -, x)
(JMP_FALSE, false, L1, -)
(MOV, 20, -, x)
L1:
(RET, x, -, -)

// After optimization IR
(MOV, 10, -, x)
(RET, x, -, -)
```
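Unreachable code is usually found on the control flow graph rather than instruction by instruction: a branch on a constant keeps only the edge that can actually be taken, and any block that a depth-first search from the entry never visits is deleted. A minimal sketch, assuming the example above has been split into hypothetical blocks B0, B1, and L1 using the quadruple layout of the IR listing; folding away the now-redundant JMP_FALSE itself is left to a later cleanup.

```python
def reachable_blocks(order, blocks):
    """Return the blocks reachable from the entry block, in layout order.

    order: block names in layout order (order[0] is the entry).
    blocks: name -> list of quadruples; JMP keeps its target in arg1 and
    JMP_FALSE keeps its condition in arg1 and its target in arg2.
    """
    def successors(name):
        i = order.index(name)
        fallthrough = [order[i + 1]] if i + 1 < len(order) else []
        last = blocks[name][-1]
        if last[0] == "RET":
            return []
        if last[0] == "JMP":
            return [last[1]]
        if last[0] == "JMP_FALSE":
            cond, target = last[1], last[2]
            if cond is False:
                return [target]     # constant false: the jump is always taken
            if cond is True:
                return fallthrough  # constant true: the jump is never taken
            return fallthrough + [target]
        return fallthrough

    seen, stack = set(), [order[0]]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(successors(name))
    return [name for name in order if name in seen]


order = ["B0", "B1", "L1"]
blocks = {
    "B0": [("MOV", 10, "-", "x"), ("JMP_FALSE", False, "L1", "-")],
    "B1": [("MOV", 20, "-", "x")],
    "L1": [("RET", "x", "-", "-")],
}
print(reachable_blocks(order, blocks))  # ['B0', 'L1']: B1 (x = 20) is removed
```

With a non-constant condition such as flag, all three blocks stay reachable.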

Control Flow Optimization — focuses on improving how execution flows through a function. Instead of optimizing individual expressions, it operates on the program’s control structure. The goal is to reduce unnecessary branching, simplify execution paths, and improve overall execution efficiency.
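One simple control-flow optimization is jump-to-jump elimination: when a jump lands on a block that contains nothing but another unconditional jump, the first jump can be retargeted straight to the final destination. The sketch below is a hypothetical illustration using the quadruple layout of the IR listings above (JMP_FALSE keeps its target in the third field). It does not delete the bypassed block; an unreachable-code pass would handle that afterwards.

```python
def bypass_jump_chains(blocks):
    """Retarget jumps that land on a block containing only another JMP.

    blocks: name -> list of quadruples (JMP target in arg1, JMP_FALSE target in arg2).
    """
    def final_target(name, seen=()):
        code = blocks[name]
        # Follow the chain, stopping if a cycle of jump-only blocks is detected.
        if len(code) == 1 and code[0][0] == "JMP" and name not in seen:
            return final_target(code[0][1], seen + (name,))
        return name

    result = {}
    for name, code in blocks.items():
        new_code = []
        for op, a, b, dst in code:
            if op == "JMP":
                a = final_target(a)
            elif op == "JMP_FALSE":
                b = final_target(b)
            new_code.append((op, a, b, dst))
        result[name] = new_code
    return result


blocks = {
    "B0": [("MOV", 1, "-", "x"), ("JMP", "B1", "-", "-")],
    "B1": [("JMP", "B2", "-", "-")],  # a block that only forwards to B2
    "B2": [("RET", "x", "-", "-")],
}
print(bypass_jump_chains(blocks)["B0"][-1])  # ('JMP', 'B2', '-', '-')
```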

Lowering

Unlike front-end compilers (e.g. .NET Roslyn), where lowering is a hidden transformation phase between Semantic Analysis and Intermediate Code Generation, in JIT compilers such as RyuJIT (CLR), lowering is a major architectural phase. At this stage, the compiler transforms the optimized high-level intermediate representation (IR) into a machine-oriented, low-level IR (LIR) that is much closer to actual CPU instructions.

Example:

```
// High-level code
int y = (x * 3) + 10;

// Intermediate Representation (IR)
(ADD, (MUL, x, 3), 10, y)

// Lowering to x86 / x64
(LEA, x + x * 2 + 10, y)

// Lowering to ARM64
(MUL, x, 3, t1)
(ADD, t1, 10, y)
```

Lowering is architecture-aware. Although both versions compute the same result, the generated IR differs because lowering adapts the program to the target architecture.

On x86/x64, the architecture provides the LEA (Load Effective Address) instruction, which can compute expressions of the form base + index * scale + offset, where scale is 1, 2, 4, or 8. This allows the JIT compiler to rewrite x * 3 as x + x * 2 and fold the entire expression, including the + 10, into a single instruction, which can reduce instruction count and improve efficiency.

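On the other hand, ARM64 follows a RISC (Reduced Instruction Set Computer) load/store design. It has no LEA-style instruction that combines a base, a scaled index, and an offset in a single arithmetic step, so the computation is expressed as a sequence of simple, explicit instructions. An ARM64 backend can still fold a shift into an ADD (computing x * 3 as x + (x << 1)), but adding 10 then requires a separate instruction.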
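The architecture-specific decision can be sketched as a tiny lowering rule. This hypothetical function handles only dst = x * c + k with a constant c, uses made-up target names, and mirrors the IR listings above rather than RyuJIT's actual lowering code.

```python
def lower_mul_add(x, c, k, dst, target):
    """Lower dst = x * c + k for a hypothetical target name ("x64" or "arm64")."""
    if target == "x64":
        # LEA computes base + index * scale + offset with scale in {1, 2, 4, 8}.
        if c in (1, 2, 4, 8):
            return [("LEA", f"{x} * {c} + {k}", dst)]
        if c - 1 in (1, 2, 4, 8):
            # x * c == x + x * (c - 1): use x as both base and index.
            return [("LEA", f"{x} + {x} * {c - 1} + {k}", dst)]
    # Fallback, and the ARM64 shape from the example: explicit multiply, then add.
    return [("MUL", x, c, "t1"), ("ADD", "t1", k, dst)]


print(lower_mul_add("x", 3, 10, "y", "x64"))    # [('LEA', 'x + x * 2 + 10', 'y')]
print(lower_mul_add("x", 3, 10, "y", "arm64"))  # [('MUL', 'x', 3, 't1'), ('ADD', 't1', 10, 'y')]
```

The same high-level IR node produces different instruction sequences purely because of the target argument, which is the sense in which lowering is architecture-aware.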

Code Generation

This is the final phase of the JIT compilation pipeline, where the compiler transforms the intermediate representation into machine instructions that the CPU can execute. At this stage the IR has already been lowered into a machine-friendly form (LIR). The JIT compiler then performs several critical tasks:

Instruction Selection — the process of mapping low-level intermediate representation (LIR) operations to actual machine instructions supported by the target CPU.
Register Allocation — the process of assigning program values to a limited set of physical CPU registers. It determines where each value lives at runtime: either in a register or, if necessary, in memory. Modern CPUs operate fastest on registers, but the number of available registers is limited:

x64 has 16 general-purpose registers
ARM64 has 31 general-purpose registers (X0–X30)

Programs often have far more "live" values than available registers, so the compiler must decide which values remain in registers and which are temporarily stored in memory (a process known as register spilling).
```
// High-level code
int x = 10;
int y = x + 1;
int z = x + 2;

// Before optimization IR -> "x" is loaded twice from memory
(STORE, 10, -, x)
(LOAD, x, -, t1)
(ADD, t1, 1, y)
(LOAD, x, -, t2)
(ADD, t2, 2, z)

// After optimization IR -> "R1" holds the value 10 and is reused
(MOV, 10, -, R1)
(ADD, R1, 1, y)
(ADD, R1, 2, z)
```
Spilling (When Registers Are Not Enough) — if there are more "live" values than available registers, some values must be stored in memory temporarily:
```
(MOV, 10, -, R1)
(STORE, R1, -, [mem])   // spill to memory
...
(LOAD, [mem], -, R2)    // reload later
```
This introduces overhead, which is why efficient register allocation is critical for performance.
Instruction Encoding — the final task, in which the selected machine instructions are translated into the binary form (opcode bytes and operand encodings) that the CPU executes.
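RyuJIT's register allocator is based on linear scan (LSRA). The sketch below shows the core of the classic linear-scan algorithm in Python: values are visited in order of where their live ranges start, a register is freed when a range ends, and when no register is free, the value whose range ends last is spilled. The live ranges and register names are hypothetical inputs; a production allocator also deals with calling conventions, register classes, and splitting ranges, none of which is modeled here.

```python
def linear_scan(intervals, registers):
    """Assign registers to live ranges.

    intervals: value -> (start, end) positions of its live range.
    registers: names of the available physical registers.
    Returns value -> register name, or "spill" if the value lives in memory.
    """
    assignment = {}
    active = []               # (end, value) pairs currently holding a register
    free = list(registers)
    for value, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire ranges that ended before this one starts; their registers become free.
        for item in sorted(active):
            if item[0] >= start:
                break
            active.remove(item)
            free.append(assignment[item[1]])
        if free:
            assignment[value] = free.pop(0)
            active.append((end, value))
            continue
        # No free register: spill whichever range ends last.
        last_end, last_value = max(active)
        if last_end > end:
            assignment[value] = assignment[last_value]  # take over its register
            assignment[last_value] = "spill"
            active.remove((last_end, last_value))
            active.append((end, value))
        else:
            assignment[value] = "spill"
    return assignment


print(linear_scan({"a": (0, 4), "b": (1, 2), "c": (3, 5)}, ["R1", "R2"]))
# {'a': 'R1', 'b': 'R2', 'c': 'R2'}: c reuses R2 after b's range ends
print(linear_scan({"x": (0, 10), "y": (1, 3)}, ["R1"]))
# {'x': 'spill', 'y': 'R1'}: only one register, so the longer-lived x is spilled
```

Spilling the range that ends last is a heuristic: it frees a register for the longest stretch of code, at the cost of the STORE/LOAD traffic shown in the spilling example.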

Conclusion: Why JIT Lowering Matters

If Part 1 of the series, C# Compiler Lowering Explained: What Happens at Build Time, showed how compilers reshape our code at build time, Part 2 reveals what happens when that same process continues at runtime. This is where lowering becomes interesting.

Understanding JIT lowering changes how we think about runtime execution. Code is no longer static once compiled to IL — it is continuously analyzed, transformed, and optimized as it moves closer to the hardware. At the center of this process is the JIT lowering phase — the moment where the compiler stops reasoning in terms of expressions and semantics alone, and starts reasoning in terms of the machine itself.

Just-In-Time lowering is where abstraction meets reality, and where abstract program structure is reshaped by the constraints of a specific architecture.

DEV Community