Having understood what compiler optimisations are and how they work in theory, we now actually wire them into Kaleidoscope, and then take things one step further by adding a JIT compiler so our REPL can evaluate expressions on the spot.
What I built: Commit 91267d2
What I understood:
1) Adding Optimisation Passes:
So far, our codegen was correct but not efficient. The IR we produced was merely a pretty-print of the AST. LLVM provides a FunctionPassManager to change that. A pass is simply one "go" over the IR that looks for a specific pattern and rewrites it. The FunctionPassManager contains a sequence of passes and runs them over each function in that order, passing the output of one as input to the next. The key change is in InitialiseModuleAndManagers(). After creating the module and the IR builder, we now also have a whole suite of analysis and pass managers:
TheFPM = std::make_unique<FunctionPassManager>();
TheLAM = std::make_unique<LoopAnalysisManager>();
TheFAM = std::make_unique<FunctionAnalysisManager>();
TheCGAM = std::make_unique<CGSCCAnalysisManager>();
TheMAM = std::make_unique<ModuleAnalysisManager>();
- The four AnalysisManagers each correspond to a level of LLVM's IR hierarchy: loops, functions, call-graph SCCs, and whole modules.
- They're required so transform passes can look up analysis results when they need them.
- Side note: A call graph has functions for nodes and calls for edges, so an edge from A to B means A calls B.
- SCCs (Strongly Connected Components) in a call graph are groups of functions in which every function can reach/call every other. (Mutual recursion of sorts)
Then we create four transform passes:
TheFPM->addPass(InstCombinePass()); // peephole optimisations
TheFPM->addPass(ReassociatePass()); // reorder expressions
TheFPM->addPass(GVNPass()); // eliminate redundant computations
TheFPM->addPass(SimplifyCFGPass()); // clean up unreachable blocks
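One detail worth noting alongside the snippet above: the tutorial's InitialiseModuleAndManagers also wires the four analysis managers together through LLVM's PassBuilder, so that transform passes can actually find the analyses they query. A sketch of that wiring (a fragment, using the same variable names as above):

```cpp
// Register the analyses with each manager and cross-wire the proxies
// between IR levels; without this, a pass asking for an analysis result
// would find an empty manager.
PassBuilder PB;
PB.registerModuleAnalyses(*TheMAM);
PB.registerCGSCCAnalyses(*TheCGAM);
PB.registerFunctionAnalyses(*TheFAM);
PB.registerLoopAnalyses(*TheLAM);
PB.crossRegisterProxies(*TheLAM, *TheFAM, *TheCGAM, *TheMAM);
```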
A) InstCombinePass:
- Handles "peephole" optimisations (small/local rewrites).
- 1+2 becoming 3.0 before the program ever runs is constant folding at play, and this pass is what catches it in the generated IR.
B) ReassociatePass:
- Reorders expressions to enable more opportunities for other passes.
- (x+1)+2 becomes x+3. Small change, but it means GVN can now recognise that (x+1)+2 and 1+(x+2) are the same thing.
C) GVNPass (Global Value Numbering):
- Assigns a symbolic identity to each new/unique computation and replaces duplicates.
- Ex: if we write (1+2+x)*(x+(1+2)), both sides of the multiplication are x+3.
- Without GVN, the IR computes x+3 twice. With it, the result is computed once and reused.
So before GVN:
%addtmp = fadd double 3.000000e+00, %x
%addtmp1 = fadd double %x, 3.000000e+00
%multmp = fmul double %addtmp, %addtmp1
After GVN:
%addtmp = fadd double %x, 3.000000e+00
%multmp = fmul double %addtmp, %addtmp
D) SimplifyCFGPass:
- Cleans up the control flow graph. If a branch can never be taken, or a block has no predecessors, this pass removes it.
- It's more like keeping the IR tidy after the other passes have finished their work.
Finally, we run the pass manager after every function is constructed, just before returning it:
TheFPM->run(*TheFunction, *TheFAM);
The FunctionPassManager updates the function in-place. The IR going in is the naive transcription; the IR coming out is the cleaned-up, optimised version.
2) JIT Compilation:
Now that we have nice IR coming out of the optimiser, we want to execute it — not just pretty-print it. That's where the JIT comes in.
JIT (Just-In-Time) compilation means converting LLVM IR to native machine code at runtime, in memory, right as the user types. The result is a pointer to executable code we can call directly, as if it were a C function.
Setting it up requires initialising the native target first:
InitializeNativeTarget();
InitializeNativeTargetAsmPrinter();
InitializeNativeTargetAsmParser();
TheJIT = ExitOnErr(KaleidoscopeJIT::Create());
// Side note: LLVM is cross-platform by design. These initialisation calls are what tell
// the JIT to look at the hardware the user is on and prepare to speak its specific language.
// Ex: Even if we're on Apple Silicon (arm64), LLVM could generate x86 code if we told it to.
And setting the module's data layout to match the JIT's:
TheModule->setDataLayout(TheJIT->getDataLayout());
This is important as it ensures that the memory layout of structs, function arguments, and return values matches what the host machine expects.
When the user types a top-level expression like 4+5;, we wrap it in an anonymous function (__anon_expr), add the module to the JIT, look up the symbol, and call it:
auto ExprSymbol = ExitOnErr(TheJIT->lookup("__anon_expr"));
double (*FP)() = ExprSymbol.toPtr<double (*)()>();
fprintf(stderr, "Evaluated to %f\n", FP());
- The JIT compiles the LLVM IR to machine code, returns its address, we cast it to a function pointer, and call it like any other native function.
- There's no difference at the hardware level between JIT-compiled code and statically linked machine code (i.e., they live in the same address space).
After calling it, we clean it up:
ExitOnErr(RT->remove());
- The ResourceTracker (RT) is responsible for the JIT'd memory allocated to that anonymous expression. Removing it frees that memory.
3) The module lifetime problem:
The issue in the REPL:
ready> def testfunc(x y) x + y*2;
ready> testfunc(4, 10);
Evaluated to 24.000000
ready> testfunc(5, 10);
LLVM ERROR: Program used external function 'testfunc' which could not be resolved!
- testfunc was defined in the same module as the anonymous expression for testfunc(4, 10). When we removed that module from the JIT to free the memory for the anonymous expression, we inadvertently deleted testfunc along with it.
The solution: Every function definition gets its own module. The JIT can resolve calls across module boundaries, so testfunc lives in its own module indefinitely. Each new anonymous expression gets a fresh module, which we remove after execution. Function definitions stay.
But this creates a new problem: when the codegen for a new anonymous expression tries to emit a call to testfunc, that function doesn't exist in the current module. The IR emitter needs at least a declaration of testfunc to generate a valid call instruction. The solution is getFunction(), a helper that first checks the current module for a declaration and, if it doesn't find one, regenerates it from FunctionProtos (a map of the most recent prototype for every function we've seen):
Function *getFunction(std::string Name) {
  if (auto *F = TheModule->getFunction(Name))
    return F;
  auto FI = FunctionProtos.find(Name);
  if (FI != FunctionProtos.end())
    return FI->second->codegen();
  return nullptr;
}
- So the function "body" lives in its own module in the JIT.
- The function "declaration" gets re-emitted into each new module that needs to call it.
- The JIT links them at call time.
What's next: Control flow (if/then/else and loops).
Musings:
Learning the piano taught me something about passes. When you're learning a new piece, you don't play the whole thing perfectly on the first try. You play it slowly, fix the wrong notes, correct the timing, then fix the phrasing, then the dynamics. Each run is a pass, and each one builds on the last until what comes out sounds nothing like the stumbling first attempt, but means exactly the same thing. The optimiser does the same thing. The IR that enters is technically correct and the IR that exits is still correct; only better.

