DEV Community


Editing the Java compiler

Compilers have fascinated me throughout my career. I suppose it is the ability of a compiler to take a high-level piece of code and quickly translate it into something that executes. I once tried writing a compiler from scratch (for my own programming language); it didn't go well, but I learned a lot from it. Fast forward to today: I finally edited the "openjdk17" / "corretto-17" Java compiler to add my own grammar and functionality.

The "javac" command that you generally use to compile high-level Java code to a .class file (bytecode) is, perhaps surprisingly, itself written in Java. This is known as a self-hosting compiler (also called a bootstrapping or "self-compiling" compiler): a compiler for a language, written in that same language.

javac has many stages at a detailed level, but the main steps of compilation still adhere to standard compiler design concepts. They are:

  1. Lexical Analysis (Tokenizing)
  2. Syntax Analysis (Parsing) (Generates a parse tree)
  3. Semantic Analysis (Type checking, Dead code analysis, etc)
  4. Intermediate Code Generation (optional)
  5. Target Code Generation (Bytecode)
  6. Code Optimization

After the parse tree is generated, every subsequent stage is implemented as a visitor over that tree, following the visitor design pattern. There is an Enter visitor (which collects symbols such as classes and methods), an Attr visitor (which does type checking), a Flow visitor (which checks for dead code), and so on.
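
To make the visitor idea concrete, here is a minimal sketch of two independent passes walking the same tree. It is not javac's code: every name in it (Node, Visitor, SymbolCollector, DeadCodeReporter, and so on) is hypothetical, and the "dead code" check is just a flag on the node rather than a real analysis.

```java
import java.util.ArrayList;
import java.util.List;

// Toy syntax tree. All names are hypothetical; they only imitate the idea
// of tree nodes that accept visitors.
interface Node {
    void accept(Visitor v);
}

interface Visitor {
    void visitClass(ClassNode c);
    void visitMethod(MethodNode m);
}

final class ClassNode implements Node {
    final String name;
    final List<MethodNode> methods = new ArrayList<>();

    ClassNode(String name) { this.name = name; }

    public void accept(Visitor v) { v.visitClass(this); }
}

final class MethodNode implements Node {
    final String name;
    final boolean hasUnreachableCode; // stand-in for a real flow analysis

    MethodNode(String name, boolean hasUnreachableCode) {
        this.name = name;
        this.hasUnreachableCode = hasUnreachableCode;
    }

    public void accept(Visitor v) { v.visitMethod(this); }
}

// "Enter"-like pass: collects the symbols declared in the tree.
final class SymbolCollector implements Visitor {
    final List<String> symbols = new ArrayList<>();

    public void visitClass(ClassNode c) {
        symbols.add("class " + c.name);
        for (MethodNode m : c.methods) m.accept(this);
    }

    public void visitMethod(MethodNode m) { symbols.add("method " + m.name); }
}

// "Flow"-like pass: reports methods flagged as containing dead code.
final class DeadCodeReporter implements Visitor {
    final List<String> warnings = new ArrayList<>();

    public void visitClass(ClassNode c) {
        for (MethodNode m : c.methods) m.accept(this);
    }

    public void visitMethod(MethodNode m) {
        if (m.hasUnreachableCode) warnings.add("dead code in " + m.name);
    }
}

class VisitorDemo {
    static ClassNode sampleTree() {
        ClassNode c = new ClassNode("Test");
        c.methods.add(new MethodNode("main", false));
        c.methods.add(new MethodNode("helper", true));
        return c;
    }

    public static void main(String[] args) {
        ClassNode tree = sampleTree();
        SymbolCollector enter = new SymbolCollector();
        DeadCodeReporter flow = new DeadCodeReporter();
        tree.accept(enter); // first pass over the tree
        tree.accept(flow);  // second pass over the same tree
        System.out.println(enter.symbols);
        System.out.println(flow.warnings);
    }
}
```

The point of the pattern is that a new stage can be added as a new Visitor implementation without touching the node classes.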

To get started on editing javac, you first need to set up the OpenJDK codebase locally. Unfortunately, I couldn't find a good set of tools for compiler development, but I found VS Code to be the easiest in this case. IntelliJ also works, but I prefer VS Code here. This page describes how to set up the codebase locally.

The feature I am trying to add here is JavaScript's optional chaining operator, "?." (I call it option chaining below). After navigating through the code, it appears that the grammar has to be written alongside the ternary operator's grammar code. This piece of code is present in the file. Here's where all parsing to a JCTree happens.

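    /** Expression1Rest = ["?" Expression ":" Expression1]
     */
    JCExpression term1Rest(JCExpression t) {
        if (token.kind == QUES) {
            int pos = token.pos;
            nextToken();
            // option chaining: rewrite "t?.ident" as "(null == t) ? null : t.ident"
            if (token.kind == DOT) {
                accept(DOT);
                var ident = ident();
                JCExpression returnable = F.at(pos).Conditional(
                    F.at(pos).Parens(F.at(pos).Binary(JCTree.Tag.EQ,
                        F.at(pos).Literal(TypeTag.BOT, null), t)),
                    F.at(pos).Literal(TypeTag.BOT, null),
                    F.at(pos).Select(t, ident));
                // keep going so that chains like a?.b?.c are handled
                return term1Rest(returnable);
            }
            // ternary
            JCExpression t1 = term();
            accept(COLON);
            JCExpression t2 = term1();
            return F.at(pos).Conditional(t, t1, t2);
        } else {
            return t;
        }
    }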

Here, the grammar is accepted by calling accept(DOT). DOT is a token defined in
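
The control flow of that method can be tried out in isolation with a toy recursive-descent parser. This is only a sketch under heavy assumptions: ToyParser and all of its methods are hypothetical, it understands nothing but identifiers, ".", "?" and ":", and it builds a string instead of a JCTree. It does follow the same shape, though: after consuming "?", peek for "." to decide between option chaining and the ternary.

```java
import java.util.ArrayList;
import java.util.List;

// Toy parser; every name here is hypothetical and only imitates the shape
// of term1Rest. The "tree" it produces is a parenthesized string.
class ToyParser {
    private final List<String> tokens = new ArrayList<>();
    private int index = 0;

    ToyParser(String source) {
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < source.length()
                        && Character.isJavaIdentifierPart(source.charAt(i))) {
                    i++;
                }
                tokens.add(source.substring(start, i));
            } else if (c == '?' || c == '.' || c == ':') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
    }

    private String peek() {
        return index < tokens.size() ? tokens.get(index) : "<EOF>";
    }

    // Consume the current token if it matches, otherwise fail.
    private void accept(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalStateException(
                    "expected " + expected + " but found " + peek());
        }
        index++;
    }

    private String ident() {
        String t = peek();
        if (!Character.isJavaIdentifierStart(t.charAt(0))) {
            throw new IllegalStateException("expected identifier but found " + t);
        }
        index++;
        return t;
    }

    // primary = IDENT { "." IDENT }
    private String primary() {
        StringBuilder sb = new StringBuilder(ident());
        while (peek().equals(".")) {
            accept(".");
            sb.append('.').append(ident());
        }
        return sb.toString();
    }

    // term1 = primary term1Rest
    private String term1() {
        return term1Rest(primary());
    }

    private String term1Rest(String t) {
        if (!peek().equals("?")) {
            return t;
        }
        accept("?");
        if (peek().equals(".")) {
            // option chaining: t?.name becomes ((t == null) ? null : t.name)
            accept(".");
            String name = ident();
            String rewritten = "((" + t + " == null) ? null : " + t + "." + name + ")";
            return term1Rest(rewritten); // allows a?.b?.c
        }
        // ternary
        String t1 = term1();
        accept(":");
        String t2 = term1();
        return "(" + t + " ? " + t1 + " : " + t2 + ")";
    }

    static String parse(String source) {
        ToyParser p = new ToyParser(source);
        String result = p.term1();
        if (p.index < p.tokens.size()) {
            throw new IllegalStateException("unexpected token: " + p.peek());
        }
        return result;
    }
}
```

For example, ToyParser.parse("car?.w1") yields "((car == null) ? null : car.w1)", while ToyParser.parse("a ? b : c") still yields "(a ? b : c)". In this toy version the receiver text is duplicated in the rewritten expression, so a receiver with side effects would be evaluated twice.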

This functionality works by replacing what would be a JCFieldAccess Select(...) subtree with a custom tree equivalent to this code:

(t == null) ? null : t.ident;

This is how option chaining works.
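
Written out by hand in stock Java, a two-step chain expands roughly as follows. This is an illustrative sketch, not compiler output: Customer, Address and the helper methods are hypothetical names, and the expansion is split into two methods for readability.

```java
// Plain-Java equivalent of what "customer?.address?.city" would expand to.
// Customer and Address are hypothetical classes used only for illustration.
class DesugarDemo {
    static class Address {
        final String city;
        Address(String city) { this.city = city; }
    }

    static class Customer {
        final Address address;
        Customer(Address address) { this.address = address; }
    }

    // customer?.address  becomes  (customer == null) ? null : customer.address
    static Address addressOf(Customer customer) {
        return (customer == null) ? null : customer.address;
    }

    // customer?.address?.city  applies the same rewrite to the previous result
    static String cityOf(Customer customer) {
        Address address = addressOf(customer);
        return (address == null) ? null : address.city;
    }

    public static void main(String[] args) {
        System.out.println(cityOf(new Customer(new Address("Oslo"))));
        System.out.println(cityOf(new Customer(null)));
        System.out.println(cityOf(null));
    }
}
```

A null anywhere along the chain short-circuits to null instead of throwing a NullPointerException.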

Here is an example of the feature in use:

public class Test {
        public static void main(String[] args) {
                Car car = new Car();
                Integer wheel1 = car?.w1?.x;
                Integer wheel2 = car?.w2?.x;
                System.out.println("value of wheel1 is: " + wheel1);
                System.out.println("value of wheel2 is: " + wheel2);
        }

        public static class Car {
                Wheel w1 = null;
                Wheel w2 = new Wheel();
                public Car() {
                }
        }

        public static class Wheel {
                int x = 2;
                public Wheel() {
                }
        }
}


This still isn't perfect and has its own problems, but it was nonetheless an interesting journey into how the compiler is written, and I learned a lot from it.


My commit:
