Upgrading Tokens (Pogo Pt: 5)

#python #go #compiler #transpiler

Intro

In this series I am creating a transpiler from Python to Golang called Pogo. In the last post we created a great bulk of the lexer, and we're now starting to get ready for the parser.

This Post

The parser turns our slice of tokens into an abstract syntax tree. Each node of the tree will be one token, but our tokens don't have the capability to be part of a tree, because they don't have any concept of children. In this case, we will make a new type of upgraded token, and we will call it a Structure, as it will define the structure of our program.

The Structure Struct

type Structure struct {
    children []Structure
    code     int
    text     string
}

This may seem very similar to the Token struct, if you remember it from one of the previous posts. Another note to remember is that the code property of the Structure will not be the same as a Token's code. This is because there will be more types of Structures. For example, there will not only be a Structure for the keyword if, but also a Structure for an entire if statement. Furthermore, there will be another Structure for an if-else block, which will hold the structure of an if, as many else ifs as are laid out, and the final else. In this way we are trying to separate and group sections of our code into nice chunks that are easy and simple to use.
For debugging, I have created a method that can turn a Structure into a string in a nice readable way. This method is recursive, calling all of the Structures children.

func (st Structure) stringify() string {
    text := ""
    for i := 0; i < len(st.children); i++ {
        text += "\n" + st.children[i].stringify()
    }
    return st.text + strings.ReplaceAll(text, "\n", "\n\t")
}

In this method we add a new line for every child, and using strings.ReplaceAll we also indent every child. This means that when we have a child of a child, they will be shown with 2 indents in the terminal.
Now we are going to make the same sort of map we made for the tokens.

var structureCode map[string]int = map[string]int{
    // Not implemented
    "ILLEGAL": -1,

    // Statements
    "ST_IMPORT":       0,
    "IF_ELSE_BLOCK":   1,
    "ST_IF":           2,
    "ST_ELIF":         3,
    "ST_ELSE":         4,
    "ST_FOR":          5,
    "ST_DECLARATION":  6,
    "ST_MANIPULATION": 7,

    // Other
    "BLOCK":        32,
    "CALL":         33,
    "EXPRESSION":   34,
    "COMPARISON":   35,
    "PROGRAM":      36,
    "ASTERISK":     37,
    "IDENTIFIER":   38,
    "NEWLINE":      39,
    "INDENT":       40,
    "L_PAREN":      41,
    "R_PAREN":      42,
    "L_BLOCK":      43,
    "R_BLOCK":      44,
    "L_SQUIRLY":    45,
    "R_SQUIRLY":    46,
    "SEP":          47,
    "COLON":        48,
    "ANTI_COLON":   49,
    "ASSIGN":       50,
    "UNDETERMINED": 51,
    "COMMENT_ONE":  52,

    // Keywords
    "K_IMPORT": 64,
    "K_FROM":   65,
    "K_FOR":    66,
    "K_IN":     67,
    "K_IF":     68,
    "K_ELIF":   69,
    "K_ELSE":   70,

    // In-built functions
    "IB_PRINT": 96,
    "IB_RANGE": 97,

    // Bool operands
    "BO_NOT": 128,
    "BO_AND": 129,
    "BO_OR":  130,

    // Math operands
    "MO_PLUS":   160,
    "MO_SUB":    161,
    "MO_MUL":    162,
    "MO_DIV":    163,
    "MO_MODULO": 164,

    // Literal
    "L_BOOL":   192,
    "L_INT":    193,
    "L_STRING": 194,

    // Comparison operands
    "CO_EQUALS":     224,
    "CO_NOT_EQUALS": 225,
    "CO_GT":         226,
    "CO_GT_EQUALS":  227,
    "CO_LT":         228,
    "CO_LT_EQUALS":  229,
}

This may look very familiar, but with the slight difference of the statement section.

Why?

Tokens and Structures are very similar, so why didn't I just use one?
Firstly, it separates the 2 processes of the lexer and the parser very well
Secondly, I didn't want to have to give each Token children. This is inefficient both space-wise and takes up more space when initializing these Tokenss in the compiler's source code.

Now that we have created Structures, we can start work on our parser. The parser will turn our linear form of Tokens into an abstract syntax tree. This is the last step we will do before we start emitting code, which creates the final Go code as output, completing the project in the most minimal way.

DEV Community

Upgrading Tokens (Pogo Pt: 5)

Intro

This Post

The Structure Struct

Why?

Next

Top comments (0)