After all we've done (building a lexer, parser, code-generator, optimiser, and the JIT), we give Kaleidoscope decision-making abilities.
What I built: Commit 5ba5803
What I understood:
If/Then/Else:
1) Lexer:
- We add three new tokens, namely
tok_if,tok_then, andtok_else, and their corresponding checks ingettok()(throughif (IdentifierStr == ...))
2) AST:
- The
IfExprASTholds three child expressions —Cond,Then, andElse. - It's worth noting that in Kaleidoscope, everything is an expression. There are no statements. This means that
if/then/elsedoesn't result in an action, and instead returns a value. - This is to keep the language consistent, as the codegen never has to resolve whether something is an expression or a statement, as only the former is allowed.
3) Parser:
- The
ParseIfExpr()consumesif, parses the condition, expectsthen, parses the then-expression, expectselse, parses the else-expression, and returns anIfExprAST. This is simple recursive descent at play.
4) Codegen:
- When we generate code for an if-then-else condition, we can't emit instructions linearly anymore. The two branches (then, else) are mutually exclusive, and only one runs.
- This is where we need proper control-flow: a conditional branch, two separate blocks of code, and a merge point.
- That looks like:
entry:
%ifcond = fcmp one double %x, 0.0
br i1 %ifcond, label %then, label %else
then:
%calltmp = call double @foo()
br label %ifcont
else:
%calltmp1 = call double @bar()
br label %ifcont
ifcont:
%iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
ret double %iftmp
What I didn't understand (in if/else):
a) Why basic blocks?
- Code is merely a list of instructions. Why not just emit them line-by-line and tell the CPU to 'jump' when it hits an
if? - This is because it would be nightmarish for the optimiser to understand the flow in such a scenario. LLVM hence forces us to use 'basic blocks,' which are chunks of code guaranteed to execute from beginning to end, without any jumping in or out.
- Through blocks, the compiler has a somewhat high-level map (Ex: It knows exactly what happens in a
ThenBB, anElseBB, etc.), which matters for optimisation. If the optimiser knows a block runs as a unit, it can reason about the whole block at once.
b) Why the Phi node?
- Why couldn't I just assign a value to a variable in both blocks, depending on the condition, and then have the
ifcontread the value? For example:
x = 20
ans = ""
if (x % 2 == 0)
ans = "even"
else
ans = "odd"
- This is because LLVM uses SSA. The same variable cannot be changed once defined, thereby making the
elsebranch impossible unless... we use a Phi node. - This Phi node sits at the junction where both
ifandelsebranches meet. It does no "calculation" and only looks back at the path the CPU took. At runtime, when the CPU jumps fromthenorelsetoifcont, it already carries information about which block it just came from. - Phi reads that "came from" information and resolves to the corresponding value.
-
Ex: my value is
calltmpif we came fromthen, orcalltmp1if we came fromelse.
c) Why do we insert the Phi node manually for if/else, instead of using alloca + mem2reg to handle user variables?
- This is because we know exactly where the merge happens.
- When codegen is processing an
IfExprAST, it knows the shape of the problem before it even starts. There are two branches. They will meet at exactly one point. That meeting point needs exactly one Phi node with exactly two inputs. It's the same every single time, no matter what. So we just write it directly. - Contrarily,
alloca+mem2regis more helpful when code looks like this:
var x = 1;
x = x + 1;
- Here,
xgets assigned in multiple places. And in a more complex program, those assignments could be scattered across loops, nested ifs, all over the place. The compiler can't just look at the AST node for x and know where to put the Phi. It would have to trace every possible path through the entire program to figure out where values ofxmerge. - So instead of doing that hard work ourselves, we use
alloca(we givexa slot in memory, let every branch just write to that slot, and then hand it off tomem2reg). -
mem2regis a pass that already knows how to trace control flow, find all the merge points, and insert the right Phi nodes automatically.
d) Why do we re-fetch the ThenBB block?
- Why is there a need for another
ThenBB = Builder -> GetInsertBlock()if we already had it? - This is to account for recursion. If the code inside our
thenblock is a simplex + y, the pointer is the same. But if it contains anotherif/else, the nestedifwill create its own blocks and move the builder's insertion point. - In this case, the builder sits at the end of the nested merge block. Hence, we re-fetch it because the Phi node needs to know the final block that ran, and not necessarily the one we started out with.
The for loop:
for i = 1, i < n, 1.0 in
putchard(42);
- There are four parts to this: start value, end condition, step value (defaults to 1.0), and the body. We follow the same 'lexer, parser, AST, codegen' template as
if/else. - The IR generated by the codegen looks like:
entry:
br label %loop
loop:
%i = phi double [ 1.0, %entry ], [ %nextvar, %loop ]
%calltmp = call double @putchard(double 42.0)
%nextvar = fadd double %i, 1.0
%loopcond = fcmp one double %booltmp, 0.0
br i1 %loopcond, label %loop, label %afterloop
afterloop:
ret double 0.0
`
What I didn't understand (in for loops):
a) Why is there a preheader in the for-loop, before it even starts?
- Again, for the Phi node. It needs to know which block a value came from. When we enter a loop for the first time, we aren't wntering from within the loop itself. As trivial as it sounds, we enter from outside the loop.
- The Preheader thus gives the Phi node a clear starting point for the first iteration, and without which the Phi node wouldn't know what our counter's initial value should be.
b) Why return a 0.0?
- As of now, our for loops return a value of
0.0. - This is because we don't have mutable memory yet. The loop variable disappears once the loop ends, causing the body's value to not be accumulated anywhere.
- This is just a temporary placeholder of sorts.
c) What if the variable we use for the loop already exists? What happens to its value after the loop ends?
- This is better understood with an example:
var i = 99; for i = 1, i < 10, 1.0 in putchard(i); - The outer
i's value is 99. And the loop has anivalue of its own. - The problem is that both live in the same symbol table,
NamedValues. Whenever the loop writesNamedValues["i"] = Variable, the previous value gets overwritten. Thus, when the loop ends, the pre-loop value is either gone, or replaced by the loop's last value. - The answer to this is Variable shadowing. Pretty similar to what we did with Lox's environments. We peek at
NamedValuesto see that pre-loop value, and save it.cpp Value *OldVal = NamedValues[VarName]; NamedValues[VarName] = Variable; // loop's i takes over
Then we restore:
cpp
if (OldVal)
NamedValues[VarName] = OldVal; // 99 comes back
else
NamedValues.erase(VarName); // nothing was there before, so clean up
- The inner variable doesn't destroy the outer one and only temporarily steps in front of it. Once the loop exits, the outer scope is exactly as it was. Same principle as Lox's chained environments, just done manually here since we're managing the symbol table ourselves.
What's next: Extending Kaleidoscope with user-defined operators. More control.
Musings:
Hehe, hello again. Twenty thousand different things came up. I hope to be more consistent with this though. I mean, it is sort of good that I keep coming back. Not even as an obligation because I actually enjoy it very much. But I could do better.
Anyhoo, I've been walking quite a bit lately (which I absolutely love). It is among the few things I do to relax. I think excellently when I'm walking. Any problem I feel I'm unable to find a solution to seems to solve itself merely 15-20 minutes into a walk. I feel more optimistic and in charge of life. I also learn to, briefly though it may be, set aside all that goes on in my little world, and just feel like I'm part of something bigger. Almost akin to that mix of awe and ease one feels knowing they're in the audience witnessing something grand. Our world too, can be beautiful and inspire hope if we figure out how to look at it. A walk out there helps.



Top comments (0)