Kaleidoscope's grammar can now be extended by the user. They can define their own binary and unary operators with custom symbols and precedence, without rewriting the parser or adding new cases to the codegen.
What I built: Commit 89fa3f8
What I understood
The idea is simple. We want users to be able to write something like:
and
Through which we can do:
1) Lexer:
- We add two new tokens, tok_binary and tok_unary, along with their checks in gettok().
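To make the lexer change concrete, here is a minimal sketch. It assumes a simplified, string-backed lexer (mine reads from stdin), so the Lexer struct and its members are hypothetical, and the token values are only illustrative. The part that matters is the pair of keyword checks for "binary" and "unary".

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Token codes are negative so they never collide with raw ASCII characters,
// which the lexer returns as themselves. The exact values are illustrative.
enum Token {
  tok_eof = -1,
  tok_def = -2,
  tok_identifier = -4,
  tok_binary = -11,
  tok_unary = -12,
};

// Hypothetical string-backed lexer, just enough to show the new keywords.
struct Lexer {
  std::string Src;
  std::size_t Pos = 0;
  std::string IdentifierStr; // filled in when an identifier-like word is read

  explicit Lexer(std::string S) : Src(std::move(S)) {}

  int gettok() {
    // Skip whitespace.
    while (Pos < Src.size() &&
           std::isspace(static_cast<unsigned char>(Src[Pos])))
      ++Pos;
    if (Pos >= Src.size())
      return tok_eof;

    unsigned char C = static_cast<unsigned char>(Src[Pos]);
    if (std::isalpha(C)) {
      IdentifierStr.clear();
      while (Pos < Src.size() &&
             std::isalnum(static_cast<unsigned char>(Src[Pos])))
        IdentifierStr += Src[Pos++];
      if (IdentifierStr == "def")
        return tok_def;
      if (IdentifierStr == "binary")
        return tok_binary; // new keyword
      if (IdentifierStr == "unary")
        return tok_unary; // new keyword
      return tok_identifier;
    }

    // Everything else (operator characters, parentheses, and in this sketch
    // even digits) is returned as its ASCII value.
    ++Pos;
    return C;
  }
};
```

Feeding it "def binary| 5" yields tok_def, tok_binary, then the raw characters '|' and '5', which is what the parser needs in order to pick out the operator character and its precedence.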
2) AST:
- We create a UnaryExprAST, which is pretty similar to BinaryExprAST, but with one child instead of two.
- We also extend PrototypeAST to have two new fields: IsOperator (bool) and Precedence (unsigned).
- A prototype now knows whether it's defining an operator, and if yes, at what precedence.
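Here is a rough sketch of what the extended prototype could look like, with the LLVM bits stripped out so it compiles on its own. IsOperator, Precedence, isBinaryOp(), getOperatorName() and getBinaryPrecedence() come from the description in this post; isUnaryOp() and getName() are helpers I am assuming for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class PrototypeAST {
  std::string Name;              // e.g. "foo", "binary|" or "unary!"
  std::vector<std::string> Args; // argument names
  bool IsOperator;               // true for user-defined operators
  unsigned Precedence;           // only meaningful for binary operators

public:
  PrototypeAST(std::string Name, std::vector<std::string> Args,
               bool IsOperator = false, unsigned Prec = 0)
      : Name(std::move(Name)), Args(std::move(Args)), IsOperator(IsOperator),
        Precedence(Prec) {}

  const std::string &getName() const { return Name; }

  // The operand count tells the two operator kinds apart.
  bool isUnaryOp() const { return IsOperator && Args.size() == 1; }
  bool isBinaryOp() const { return IsOperator && Args.size() == 2; }

  // The operator character is the last character of the mangled name.
  char getOperatorName() const {
    assert(isUnaryOp() || isBinaryOp());
    return Name[Name.size() - 1];
  }

  unsigned getBinaryPrecedence() const { return Precedence; }
};
```

A prototype built as PrototypeAST("binary|", {"LHS", "RHS"}, true, 5) reports itself as a binary operator named '|' with precedence 5, while an ordinary function prototype leaves IsOperator at its default of false.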
3) Parser:
- ParsePrototype() uses a switch-case on CurTok.
- If it sees tok_identifier, it's a regular function, like before.
- But if it sees tok_binary, it reads the operator character, optionally reads a precedence number (in case of binary), and builds the name "binary" + char (so binary|, binary>, etc.).
- If it sees tok_unary, it does the same but without precedence.
- ParseUnary() is also new. It sits between ParseExpression() and ParsePrimary() in the call chain.
- If the current token looks like a unary operator (an ASCII character that isn't '(' or ','), it consumes it and recursively calls ParseUnary() on the rest. This is for handling chaining (like !!x). Otherwise, it falls through to ParsePrimary().
- ParseExpression() and ParseBinOpRHS() are also updated to call ParseUnary() instead of ParsePrimary().
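The call chain above can be sketched with a toy parser. This is not my actual parser: MiniParser is a hypothetical stand-in that works on single-character tokens and returns a parenthesised string instead of AST nodes, so the parse shape is easy to see. Because identifiers are plain letters here, the "is this a unary operator?" check uses isalnum rather than the ASCII test on CurTok.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

struct MiniParser {
  std::string Src;
  std::size_t Pos = 0;
  // Precedence table; user-defined binary operators get added to it.
  std::map<char, int> BinopPrecedence{{'<', 10}, {'+', 20}, {'*', 40}};

  explicit MiniParser(std::string S) : Src(std::move(S)) {}

  // Current token (spaces skipped), or '\0' at end of input.
  char cur() {
    while (Pos < Src.size() && Src[Pos] == ' ')
      ++Pos;
    return Pos < Src.size() ? Src[Pos] : '\0';
  }

  int tokPrecedence() {
    auto It = BinopPrecedence.find(cur());
    return It == BinopPrecedence.end() ? -1 : It->second;
  }

  // primary ::= letter | digit | '(' expression ')'
  std::string parsePrimary() {
    char C = cur();
    if (C == '\0')
      return "<eof>";
    ++Pos;
    if (C == '(') {
      std::string E = parseExpression();
      if (cur() == ')')
        ++Pos;
      return E;
    }
    return std::string(1, C);
  }

  // unary ::= primary | op unary
  std::string parseUnary() {
    char C = cur();
    // Operands, '(' and ',' are not unary operators: fall through to primary.
    if (C == '\0' || std::isalnum(static_cast<unsigned char>(C)) || C == '(' ||
        C == ',')
      return parsePrimary();
    ++Pos; // consume the operator, then recurse so that !!x chains
    return std::string("(unary") + C + " " + parseUnary() + ")";
  }

  std::string parseBinOpRHS(int ExprPrec, std::string LHS) {
    while (true) {
      int TokPrec = tokPrecedence();
      if (TokPrec < ExprPrec)
        return LHS;
      char Op = cur();
      ++Pos;
      std::string RHS = parseUnary(); // previously parsePrimary()
      if (TokPrec < tokPrecedence())
        RHS = parseBinOpRHS(TokPrec + 1, std::move(RHS));
      LHS = std::string("(binary") + Op + " " + LHS + " " + RHS + ")";
    }
  }

  std::string parseExpression() {
    std::string LHS = parseUnary(); // previously parsePrimary()
    return parseBinOpRHS(0, std::move(LHS));
  }
};
```

With this sketch, MiniParser("!!x + y").parseExpression() gives "(binary+ (unary! (unary! x)) y)": the unary operators bind to x before the binary operator ever sees its left-hand side.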
4) Codegen:
This is where things change a bit.
- For binary operators, BinaryExprAST::codegen() already had a switch-case on Op. We just add a default case that does a symbol table lookup for "binary" + Op and emits a call to it:
Function *F = getFunction(std::string("binary") + Op);
assert(F && "binary operator not found!");
Value *Ops[2] = { L, R };
return Builder->CreateCall(F, Ops, "binop");
User-defined operators are mostly similar to functions (only with new names). The codegen doesn't bother distinguishing between an ordinary function and a user-defined operator; it merely finds the function by name and calls it.
Likewise, for unary operators, UnaryExprAST::codegen() looks up "unary" + Opcode and calls it.
Like I mentioned earlier, before building the function body, if the prototype is a binary operator, we register its precedence in BinopPrecedence. This change is made in FunctionAST::codegen():
if (P.isBinaryOp())
BinopPrecedence[P.getOperatorName()] = P.getBinaryPrecedence();
- The grammar is dynamically extensible at JIT runtime: define a new operator and it is immediately available with the right precedence.
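To see the two codegen ideas working together without pulling in LLVM, here is a small stand-alone sketch. Everything in it is an assumption made for illustration: FunctionTable is a plain map standing in for the module's symbol table, defineBinaryOp() and evalBinary() are hypothetical names, and values are computed directly instead of emitting IR. The shape is the same though: register the precedence at definition time, and fall back to a "binary" + Op lookup for any operator that isn't built in.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

using BinaryFn = std::function<double(double, double)>;

// Stand-in for the LLVM module's symbol table (assumption).
std::map<std::string, BinaryFn> FunctionTable;
// The parser's precedence table, seeded with the built-in operators.
std::map<char, int> BinopPrecedence = {
    {'<', 10}, {'+', 20}, {'-', 20}, {'*', 40}};

// Analogue of the FunctionAST::codegen() step: record the precedence, then
// make the body reachable under the mangled name "binary<op>".
void defineBinaryOp(char Op, int Precedence, BinaryFn Body) {
  BinopPrecedence[Op] = Precedence;
  FunctionTable[std::string("binary") + Op] = std::move(Body);
}

// Analogue of BinaryExprAST::codegen(): built-ins are handled inline,
// anything else is looked up by name and called like a normal function.
double evalBinary(char Op, double L, double R) {
  switch (Op) {
  case '+':
    return L + R;
  case '-':
    return L - R;
  case '*':
    return L * R;
  case '<':
    return L < R ? 1.0 : 0.0;
  default:
    break;
  }
  auto It = FunctionTable.find(std::string("binary") + Op);
  if (It == FunctionTable.end())
    throw std::runtime_error("binary operator not found!");
  return It->second(L, R);
}
```

After a call like defineBinaryOp('|', 5, ...), the parser would find '|' in BinopPrecedence and evalBinary('|', a, b) would dispatch to the user's function, while an operator nobody defined still fails loudly.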
What I didn't understand:
a) How does naming operators binary| or unary! work?
- I had this question because we construct a string like binary| and use it as a function name.
- The thing is, LLVM's symbol table allows names with symbols, so binary| is a perfectly valid function name in LLVM IR.
- When the user writes x | y, codegen looks up binary| in the module, finds the user-defined function, and emits a call.
- It is an ordinary function dispatch dressed up to look like operator syntax.
b) Why do user-defined operators not need new AST nodes?
- This is because the existing BinaryExprAST and UnaryExprAST already represent “an operator applied to operands”, and thus don't care whether the operator is built-in or user-defined.
- The only thing that changes is what codegen() does with an unrecognised Op. Instead of erroring, it looks the operator up as a function.
- The AST stays blissfully unaware of the distinction.
What's next: Mutable variables and SSA construction (the last big piece).
Musings:
I have the insanest Tiny Chef obsession. He's the most adorable, sassy, and tiny little bundle of joy I've seen in a long, long time now. He's so refreshingly and unabashedly authentic. Bad singing (but still does it anyway), pop-astrology, yoga, wardrobe dilemmas, and above all — unapologetic optimism. I never imagined I'd find myself rooting for, or seeking life-lessons from a barely legible green little ball of felt. But hey, here we are. When the going gets tough, all we've got to do is put our hand on our heart and say, "You know what? I'm blenough, and it's all going to be blokay."





