Lahari Tenneti

Posted on May 18

LLVM #5 — Mutable Variables

#compilers #llvm #computerscience #learning

So far, Kaleidoscope has been a functional language with immutable variables and no reassignment. But to write anything resembling real code (loops that accumulate or programs with state), we need mutation. We add it now.

We introduce two features: the ability to mutate variables with =, and the ability to define new local variables with var/in.

What I built: Commit 7911f2b

What I understood:

Context

The Problem:

Kaleidoscope, in its functional paradigm, only ever had immutable variables.
But what if we want to write things like this iterative Fibonacci:

def fibi(x)
  var a = 1, b = 1, c in
  (for i = 3, i < x in
     c = a + b :
     a = b :
     b = c) :
  b;

The issue is that LLVM IR requires SSA form. In SSA, a variable is assigned exactly once. But mutation means assigning the same variable multiple times.
If we want to maintain SSA while making our language more imperative, there's immediate confusion over where to put the PHI nodes.
Because unlike if/else where the merge point is obvious from the AST, here the assignments could be scattered anywhere.

The Solution:

a) The "Easy Version:"

The trick LLVM recommends is that we write a deliberately simple (and temporarily inefficient) IR, and let LLVM clean it up. Here's how it works:

1) At the start of the function, we create a "box" on the stack for each mutable variable:

%x_addr = alloca i32

This gives x a memory address. The compiler doesn't need to track how many times x has been changed and just always knows the address.

2) Every time the user changes the value, we update the box:

store i32 10, i32* %x_addr

3) Every time we need to use the variable, we peek inside the box:

%x_val = load i32, i32* %x_addr

This is easy for the front-end to generate and requires no PHI node reasoning.
The cost is that we're constantly reading and writing to memory, which is slow.
But that's where mem2reg comes in.

b) mem2reg: The lifting pass:

After we generate the "easy version", we run PromotePass() (which is mem2reg). It performs a "lifting" operation:

1) It looks at each alloca and checks if it's only used for simple loads and stores.

2) If yes, it deletes the alloca, load, and store instructions entirely.

3) It replaces them with high-speed CPU registers (and for this, it tracks the lifetime of the values like a timeline), assigning a new SSA register every time a new value is written.

When paths merge (like after an if/else), mem2reg mathematically figures out where the PHI nodes need to go, using a graph theory concept called dominance frontier, so the logic remains correct but the speed is improved.
The dominance frontier works like this: if we have a block that forks into two paths and merges later (Block A → Block C, Block B → Block C), then C is in A's dominance frontier if A can influence what happens right before C, but A doesn't have "total" control over C (because B is an alternate path that skips A). At every such frontier, mem2reg knows a PHI node is needed.
The final result is that we don't have to look at a box in memory anymore and only need pure, high-speed CPU registers. And we never had to figure out PHI placement ourselves.

Rules for mem2reg to work:

alloca instructions must be in the entry block of the function (so the memory address is only created once per function call).
The alloca variable can only be used for direct load/store operations and can't pass its address into another function.
Works on single numbers, booleans, and pointers. Won't promote complex data structures like arrays or custom structs (that needs a different pass, sroa).

Making it happen:

1) NamedValues changes type:

static std::map<std::string, AllocaInst*> NamedValues;

Previously it held Value*. Now it holds AllocaInst*. This one line physically transitions the compiler from tracking values to tracking memory locations.

2) Adding an alloca at the beginning of a block:

static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction, StringRef VarName) {
  IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
                   TheFunction->getEntryBlock().begin());
  return TmpB.CreateAlloca(Type::getDoubleTy(*TheContext), nullptr, VarName);
}

This creates a temporary IRBuilder pointing at the very first instruction of the entry block, then creates an alloca there.
Why the entry block specifically? Because mem2reg only promotes allocas it can find in the entry block guaranteeing that the box is created exactly once per function call.

3) Loading/Reading Variables:

AllocaInst *A = NamedValues[Name];
return Builder->CreateLoad(A->getAllocatedType(), A, Name.c_str());

Every variable reference is now a load from a memory address rather than a direct SSA value.

4) Registering variables in NamedValues:

For each argument, we now make an alloca, store the initial value into it, and register the alloca in NamedValues. This is what allows function arguments to be mutable.

5) Getting rid of the Phi node in the codegen for For:

The for loop no longer needs a PHI node for its induction variable.
Instead, we create an alloca for the loop variable in the entry block.
Store the start value into it.
At the end of each iteration, load the current value, add the step, and store the result back.
With this, the PHI node is gone and mem2reg handles the SSA construction.

6) Adding mem2reg:

TheFPM->addPass(PromotePass());

This is the mem2reg pass. It runs first, before InstCombine, GVN, etc. and converts all our alloca/load/store patterns back into clean SSA registers.

7) The Assignment Operator:

= is parsed as a binary operator with precedence 2 (lower than everything else), but its codegen is a special case as it doesn't follow the normal "emit LHS, emit RHS, do computation" model. Instead:

if (Op == '=') {
  VariableExprAST *LHSE = static_cast<VariableExprAST*>(LHS.get());
  // ...
  Value *Val = RHS->codegen();
  Builder->CreateStore(Val, Variable);
  return Val;
}

It's important to note that the LHS must be a variable and not an expression. (x + 1) = 5 is illegal; only x = 5 is valid.
And assignment returns the assigned value, which allows chaining like x = (y = z).

8)var/in (user-defined local variables):

var/in declares one or more variables, optionally initialises them (defaulting to 0.0), and makes them available for the duration of the body expression. It has the aame procedure as always — lexer, AST, parser, codegen.
VarExprAST::codegen() loops over all the declared variables, emits each initialiser before adding the variable to scope (so var a = 1 in var a = a in ... correctly refers to the outer a), creates an alloca, stores the initial value, and saves the old binding in OldBindings.
After the body runs, it restores all the old bindings.

What's next: Compiling to object code.

Musings:

There's something nice about the alloca + mem2reg pattern. We deliberately write something worse (slower, more verbose, and naively use memory where registers would do) and then trust a pass to fix it. The front-end stays simple while the complexity lives in the optimiser, where it's been tested and tuned.

I suppose not every problem needs to be solved at the level it's encountered. Sometimes the right move is to do the honest and simpler version of something and let a more capable system handle the hard part. The trick is knowing which problems are ours to solve and which ones we can hand off.

DEV Community

LLVM #5 — Mutable Variables

Top comments (0)