DEV Community

Lahari Tenneti

Posted on

LLVM #4 — User Defined Operators

Kaleidoscope's grammar can now be extended by the user. They can define their own binary and unary operators with custom symbols and precedence, without rewriting the parser or adding new cases to the codegen.

What I built: Commit 89fa3f8


What I understood

The idea is simple. We should let users write something like:

and

Through which we can do:

1) Lexer:

  • We add two new tokens: tok_binary and tok_unary, along with their checks in gettok().
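Here is a minimal sketch of what that keyword check could look like. It assumes the lexer has already scanned an identifier into a string; `classifyIdentifier` is a hypothetical helper (in the real lexer the check sits inside gettok()), and the numeric token values are assumptions for illustration.

```cpp
#include <cassert>
#include <string>

// Token codes are negative so they never collide with raw ASCII characters.
// The exact values are assumptions; only the names come from the text above.
enum Token {
  tok_eof = -1,
  tok_def = -2,
  tok_extern = -3,
  tok_identifier = -4,
  tok_number = -5,
  tok_binary = -11, // new: the "binary" keyword
  tok_unary = -12   // new: the "unary" keyword
};

// Hypothetical helper: map an already-scanned identifier to its token code.
int classifyIdentifier(const std::string &IdentifierStr) {
  assert(!IdentifierStr.empty() && "identifier text must not be empty");
  if (IdentifierStr == "def")
    return tok_def;
  if (IdentifierStr == "extern")
    return tok_extern;
  if (IdentifierStr == "binary")
    return tok_binary;
  if (IdentifierStr == "unary")
    return tok_unary;
  return tok_identifier; // anything else is an ordinary name
}
```

With this in place, "binary" and "unary" stop being ordinary identifiers, which is what lets the parser branch on them later.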

2) AST:

  • We create a UnaryExprAST, which is pretty similar to BinaryExprAST, but with one child instead of two.
  • We also extend PrototypeAST to have two new fields: IsOperator (bool) and Precedence (unsigned).
  • A prototype now knows whether it's defining an operator, and if yes, at what precedence.
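As a rough sketch, the extended prototype might look like the class below. The accessors isBinaryOp(), getOperatorName() and getBinaryPrecedence() are the ones used later in the codegen; isUnaryOp() and the constructor defaults are my assumptions, and all LLVM-specific members are left out so the snippet stands alone.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Sketch of a prototype that can also describe an operator definition.
class PrototypeAST {
  std::string Name;              // "foo", "binary|", "unary!", ...
  std::vector<std::string> Args; // parameter names
  bool IsOperator;
  unsigned Precedence; // only meaningful for binary operators

public:
  PrototypeAST(std::string Name, std::vector<std::string> Args,
               bool IsOperator = false, unsigned Prec = 0)
      : Name(std::move(Name)), Args(std::move(Args)), IsOperator(IsOperator),
        Precedence(Prec) {}

  const std::string &getName() const { return Name; }

  // Unary vs. binary is decided by the number of operands.
  bool isUnaryOp() const { return IsOperator && Args.size() == 1; }
  bool isBinaryOp() const { return IsOperator && Args.size() == 2; }

  // The operator symbol is the last character of the generated name.
  char getOperatorName() const {
    assert(isUnaryOp() || isBinaryOp());
    return Name[Name.size() - 1];
  }

  unsigned getBinaryPrecedence() const { return Precedence; }
};
```

A prototype built as `PrototypeAST("binary|", {"LHS", "RHS"}, true, 5)` would then report itself as a binary operator named `|` with precedence 5, while a plain `PrototypeAST("foo", {"a", "b"})` stays an ordinary function.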

3) Parser:

  • ParsePrototype() uses a switch-case on CurTok.
  • If it sees tok_identifier, it's a regular function, like before.
  • But if it sees tok_binary, it reads the operator character, optionally reads a precedence number, and builds the function name "binary" + char (so binary|, binary>, etc.).
  • If it sees tok_unary, it does the same without the precedence step, building "unary" + char (so unary!, etc.).
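The name-building step can be sketched in isolation. Everything here is hypothetical scaffolding: `OperatorHeader` and `buildOperatorHeader` are names I made up, the operator kind is passed in as a string instead of a token, and the fallback precedence of 30 is an assumption about what happens when the user omits the number.

```cpp
#include <cassert>
#include <string>

// Hypothetical result of parsing the head of an operator definition.
struct OperatorHeader {
  std::string FnName;  // e.g. "binary|" or "unary!"
  unsigned Precedence; // 0 for unary operators
};

// Kind is "binary" or "unary", Op is the operator character, and PrecText
// holds the optional precedence digits ("" when the user wrote none).
OperatorHeader buildOperatorHeader(const std::string &Kind, char Op,
                                   const std::string &PrecText = "",
                                   unsigned DefaultPrec = 30) {
  assert((Kind == "binary" || Kind == "unary") && "not an operator definition");
  OperatorHeader H;
  H.FnName = Kind + Op; // "binary" + '|' gives "binary|"
  H.Precedence = 0;
  if (Kind == "binary")
    H.Precedence = PrecText.empty()
                       ? DefaultPrec
                       : static_cast<unsigned>(std::stoul(PrecText));
  return H;
}
```

For example, `buildOperatorHeader("binary", '|', "5")` yields the name `binary|` with precedence 5, and `buildOperatorHeader("unary", '!')` yields `unary!` with no precedence.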

  • ParseUnary() is also new. It sits between ParseExpression() and ParsePrimary() in the call chain.

  • If the current token looks like a unary operator (an ASCII character that isn't ( or ,), it consumes it and recursively calls ParseUnary() on the rest. This is for handling chaining (like !!x). Otherwise, it falls through to ParsePrimary().

  • ParseExpression() and ParseBinOpRHS() are also updated to call ParseUnary() instead of ParsePrimary().
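The recursion in ParseUnary() is easier to see in a toy version. The sketch below is not the real parser: `ToyParser` is a hypothetical stand-in that reads characters straight from a string and returns a textual tree instead of AST nodes. Because letters are plain ASCII here (in the real lexer identifiers arrive as a negative token code), it needs an extra alphanumeric check that the real ParseUnary() does not.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Toy parser state: the input text and a cursor into it.
struct ToyParser {
  std::string Src;
  std::size_t Pos;

  explicit ToyParser(std::string S) : Src(std::move(S)), Pos(0) {}

  // Current character, or -1 at end of input.
  int cur() const {
    return Pos < Src.size() ? static_cast<unsigned char>(Src[Pos]) : -1;
  }

  // Stand-in for ParsePrimary(): accepts a single-letter variable.
  std::string parsePrimary() {
    assert(cur() >= 0 && std::isalpha(cur()) && "expected an identifier");
    return std::string(1, Src[Pos++]);
  }

  // If the current character could be a unary operator, consume it and parse
  // the operand with another parseUnary() call, so "!!x" nests naturally.
  std::string parseUnary() {
    int C = cur();
    if (C < 0 || C > 127 || std::isalnum(C) || C == '(' || C == ',')
      return parsePrimary();
    ++Pos; // eat the operator character
    std::string Operand = parseUnary();
    return std::string("unary") + static_cast<char>(C) + "(" + Operand + ")";
  }
};
```

Parsing `!!x` with this toy produces `unary!(unary!(x))`: each `!` wraps whatever the recursive call returns, and a bare `y` falls straight through to parsePrimary().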

4) Codegen:

This is where things change a bit.

  • For binary operators, BinaryExprAST::codegen() already had a switch-case on Op. We just add a default case that does a symbol table lookup for "binary" + Op and emits a call to it:
Function *F = getFunction(std::string("binary") + Op);
assert(F && "binary operator not found!");
Value *Ops[2] = { L, R };
return Builder->CreateCall(F, Ops, "binop");
  • User-defined operators are mostly similar to functions (only with new names). The codegen doesn't bother distinguishing between a regular function and a user-defined operator; it merely finds the function and calls it.

  • Likewise, for unary operators, UnaryExprAST::codegen() looks up "unary" + Opcode and calls it.

  • One more change goes into FunctionAST::codegen(): before building the function body, if the prototype is a binary operator, we register its precedence in BinopPrecedence:

if (P.isBinaryOp())
  BinopPrecedence[P.getOperatorName()] = P.getBinaryPrecedence();
  • The grammar is dynamically extensible at JIT runtime: define a new operator and it is immediately available with the right precedence.
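To make the "immediately available" part concrete, here is a small sketch of the precedence table on its own, with no LLVM involved. `registerBinaryOperator` is a hypothetical stand-in for the two-line registration step shown above, `getTokPrecedence` is my guess at how the parser would query the table, and the four built-in entries are assumed defaults.

```cpp
#include <cassert>
#include <map>

// Precedence table shared by the parser and codegen.
// The initial entries are assumed defaults for the built-in operators.
static std::map<char, int> BinopPrecedence = {
    {'<', 10}, {'+', 20}, {'-', 20}, {'*', 40}};

// Returns the precedence of Op, or -1 if Op is not a known binary operator.
int getTokPrecedence(int Op) {
  if (Op < 0 || Op > 127)
    return -1;
  auto It = BinopPrecedence.find(static_cast<char>(Op));
  if (It == BinopPrecedence.end() || It->second <= 0)
    return -1;
  return It->second;
}

// Hypothetical stand-in for the registration done in FunctionAST::codegen().
void registerBinaryOperator(char Op, unsigned Precedence) {
  assert(Precedence > 0 && "precedence must be positive");
  BinopPrecedence[Op] = static_cast<int>(Precedence);
}
```

Before registration, `getTokPrecedence('|')` reports -1, so the parser would not treat `|` as a binary operator at all. After `registerBinaryOperator('|', 5)`, the same query returns 5 and the next expression parsed can use it.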

What I didn't understand:

a) How does naming operators binary| or unary! work?

  • I had this question because we construct a string like binary| and use it as a function name.
  • The thing is, LLVM's symbol table accepts names containing arbitrary characters, so binary| is a perfectly valid function name in LLVM IR. In the textual IR it just shows up quoted, as @"binary|".
  • When the user writes x | y, codegen looks up binary| in the module, finds the user-defined function, and emits a call.
  • It is an ordinary function dispatch dressed up to look like operator syntax.
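The same dispatch pattern can be shown without LLVM. In this sketch I swap IR emission for direct evaluation and swap the module lookup for a plain std::map, so `FunctionTable` and `evalBinary` are hypothetical names, not part of the actual codegen. The shape is what matters: built-ins are handled in the switch, and anything else becomes a lookup of "binary" + Op.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Stand-in for the module's function table: name -> callable.
// Assumption: the real compiler does this lookup with getFunction().
using BinaryFn = std::function<double(double, double)>;
static std::map<std::string, BinaryFn> FunctionTable;

double evalBinary(char Op, double L, double R) {
  switch (Op) {
  case '+':
    return L + R;
  case '-':
    return L - R;
  case '*':
    return L * R;
  case '<':
    return L < R ? 1.0 : 0.0;
  default:
    break; // not a built-in, fall through to the lookup below
  }
  // Treat the operator as a call to the function named "binary" + Op.
  auto It = FunctionTable.find(std::string("binary") + Op);
  assert(It != FunctionTable.end() && "binary operator not found!");
  return It->second(L, R);
}
```

Once something is stored under the key `binary|`, for instance a logical-or lambda, `evalBinary('|', 0.0, 2.0)` ends up calling it exactly as it would call any other named function.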

b) Why do user-defined operators not need their own AST nodes?

  • This is because BinaryExprAST and the new UnaryExprAST simply represent “an operator applied to operands”, and thus don't care whether the operator is built-in or user-defined.
  • The only thing that changes is what codegen() does with an unrecognised Op. Instead of erroring, it looks the operator up as a function.
  • The AST stays blissfully unaware of the distinction.


What's next: Mutable variables and SSA construction (the last big piece).


Musings:

I have the insanest Tiny Chef obsession. He's the most adorable, sassy, and tiny little bundle of joy I've seen in a long, long time now. He's so refreshingly and unabashedly authentic. Bad singing (but still does it anyway), pop-astrology, yoga, wardrobe dilemmas, and above all — unapologetic optimism. I never imagined I'd find myself rooting for, or seeking life-lessons from a barely legible green little ball of felt. But hey, here we are. When the going gets tough, all we've got to do is put our hand on our heart and say, "You know what? I'm blenough, and it's all going to be blokay."
