Chig Beef

Compiler Cleanup (Pogo Pt:15)

Intro

In this series I am creating a transpiler from Python to Golang called Pogo. In the last post we did some good work on function calls, and made all the type logic surrounding them more sound. Sadly, this will be the last post on Pogo, unless I decide to pick this project back up, which is very likely because I enjoyed it a lot. There's a reason this is the last post: it's the end of the month, which means a new project will start (I'll talk more about it at the end of this post)!

Comments

Now let's start fixing some loose ends in our compiler. Our first issue is comments. I haven't touched comments since they were first implemented, and because of this they are very error-prone. Whenever we write a comment and transpile, the comment still starts with '#', which isn't how comments start in Go. To fix this, we go to where comments are created, which is the lexer. This actually turned out to be a single-line change.

note := string(append([]byte{'/', '/'}, l.source[start+1:l.curPos+1]...))

All we are doing here is getting rid of the '#' by starting the slice at start+1, then adding "//" to the front using append.
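The same idea can be tried outside the lexer. This is a minimal sketch, assuming the comment text is available as a byte slice; toGoComment is a hypothetical helper and not part of Pogo.

```go
package main

import "fmt"

// toGoComment is a hypothetical stand-alone helper (not Pogo's lexer code).
// It turns a Python comment such as "# hi" into a Go comment such as "// hi".
func toGoComment(src []byte) string {
	// Leave the input alone if it is not a '#' comment.
	if len(src) == 0 || src[0] != '#' {
		return string(src)
	}
	// src[1:] skips the '#'; append places "//" in front of the rest.
	return string(append([]byte{'/', '/'}, src[1:]...))
}

func main() {
	fmt.Println(toGoComment([]byte("# adds two numbers")))
}
```

Appending to a fresh []byte{'/', '/'} literal means the original source bytes are never modified.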

Multi-Line Comments

Another feature I forgot about was multi-line comments. This shouldn't be too hard to implement, so let's start working on it. First we need to make it a valid token in the lexer.

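else if l.curChar == '\'' {
    // A multi-line comment opens with three single quotes
    if string(l.source[l.curPos:l.curPos+3]) == "'''" {
        start := l.curPos
        l.nextChar()
        // Advance until the closing ''' begins at curPos
        for string(l.source[l.curPos:l.curPos+3]) != "'''" {
            l.nextChar()
        }
        // Step onto the last of the three closing quotes
        l.nextChar()
        l.nextChar()
        // Drop both ''' markers and wrap the body in Go's /* */
        note := "/*" + string(l.source[start+3:l.curPos-2]) + "*/"
        token = Token{tokenCode["COMMENT_MULTI"], note, l.line}
    }
}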

Here we just keep moving forward until we find the end of the comment. Once we have it, we replace the opening "'''" with "/*" and the closing "'''" with "*/".
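One thing the loop above does not handle is a comment that is never closed, where the slice would eventually run past the end of the source. The following is a rough sketch of the same conversion with that case reported as an error. It works on a plain string rather than the lexer state, and convertDocComment is a hypothetical name, not a Pogo function.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// convertDocComment is a hypothetical stand-alone version of the lexer branch.
// src must begin at the opening '''. It returns the Go block comment and the
// number of bytes consumed, or an error if the comment is never closed.
func convertDocComment(src string) (string, int, error) {
	const quotes = "'''"
	if !strings.HasPrefix(src, quotes) {
		return "", 0, errors.New("not a multi-line comment")
	}
	// Search for the closing ''' after the opening one.
	end := strings.Index(src[len(quotes):], quotes)
	if end == -1 {
		return "", 0, errors.New("unterminated multi-line comment")
	}
	body := src[len(quotes) : len(quotes)+end]
	// Consumed bytes: opening quotes + body + closing quotes.
	return "/*" + body + "*/", end + 2*len(quotes), nil
}

func main() {
	note, n, err := convertDocComment("'''first\nsecond''' x = 1")
	fmt.Printf("%q %d %v\n", note, n, err)
}
```

Note that a Python comment whose body contains "*/" would still end the Go comment early; this sketch does not try to escape that.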

Parsing Our Comments

Now that our comments are valid tokens, they need to be parsed correctly.

else if p.curToken.code == tokenCode["COMMENT_MULTI"] {
    s = createStructure("COMMENT_MULTI", p.curToken.text, p.curToken.line)
}

Pretty simple. We also need to add this to nextTokenNoNotes, but that's just as simple, so I'll skip over it. Remember, there is a GitHub (linked up top), so if you want to look at all the source code in your own time then you can (not that this isn't your own time, but you get what I mean).

Single Item Comparisons

Currently, we have expressions that can be a single item, but now we need to add the same functionality to comparisons.

if err != nil {
    p.rollBack()
    p.funcLine = p.funcLine[:len(p.funcLine)-1]
    return s, nil // Could be a single literal, so we don't error
}

This is all we need to add to fix this up. Just before this code we check for an operator. If that check fails, we don't bubble the error up; it simply means we have a single item, so we return that. Of course, we also have to remove this function from funcLine, so that we keep those nice error messages.

Fixing Numbers

Because of the way we lex numbers, numbers such as 5.3.4 are valid, when they definitely shouldn't be. All we need to do to fix this is count the number of dots.

has_dot := false
for i := 0; i < len(num); i++ {
    if num[i] == '.' {
        if has_dot {
            log.Fatal("[Lex (lex)] Numbers can only have one dot on line " + strconv.Itoa(l.line))
        }
        has_dot = true
    }
}

Since we only need to know whether we have already seen a dot, we can just use a bool instead of a counter. As soon as we find a second dot, we exit with an error.

Underscores In Numbers

In Go, putting an underscore next to a dot is illegal, whether the underscore is on the right or left side. Python's rule (PEP 515) works out the same for decimal literals: an underscore may only sit between two digits, so it cannot touch the dot on either side. Pogo therefore rejects both cases. To do this, we first have to find the index of the dot, so that we can check around it. We can find this index in the loop we just made. Now we can check both sides for underscores.

if num[dot_index-1] == '_' || num[dot_index+1] == '_' {
    log.Fatal("[Lex (lex)] Cannot place underscores next to dots in numbers on " + strconv.Itoa(l.line))
}

We can check both sides in one if statement, which keeps the check pretty simple. Naturally, we only allow numbers that start with a digit and end with a digit, so we won't go out of range by using dot_index-1 or dot_index+1.
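The two number checks can be sketched together as one function. This is only an illustration: checkNumber is a hypothetical helper that returns an error instead of calling log.Fatal, and it assumes, as stated above, that the number starts and ends with a digit.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkNumber is a hypothetical helper (not Pogo's lexer code).
// It assumes num starts and ends with a digit.
func checkNumber(num string) error {
	dotIndex := strings.IndexByte(num, '.')
	if dotIndex == -1 {
		return nil // no dot, so there is nothing to check around
	}
	// A second dot anywhere after the first one is an error.
	if strings.IndexByte(num[dotIndex+1:], '.') != -1 {
		return errors.New("numbers can only have one dot")
	}
	// Safe indexing: the dot is never the first or last character.
	if num[dotIndex-1] == '_' || num[dotIndex+1] == '_' {
		return errors.New("cannot place underscores next to dots in numbers")
	}
	return nil
}

func main() {
	for _, n := range []string{"5.3", "5.3.4", "1_000.5", "1_.5", "1._5", "42"} {
		fmt.Println(n, checkNumber(n))
	}
}
```

The early return for numbers without a dot matters here, because indexing around a missing dot would panic.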

Type Checking Declarations

Currently, this is valid code for Pogo.

from GoType import *

def someString() -> string:
    return "Howdy"

x: int = someString()

We obviously don't want this, so now we need to fix it in the semantic analyzer. I won't include the code here, because it got a bit large, but now the above code gives us an error.
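The real analyzer code is in the repository. As a rough idea of what such a check involves, here is a much smaller sketch. It assumes the analyzer keeps a map from function names to return types; checkDeclaration and funcTypes are hypothetical names, not taken from Pogo.

```go
package main

import "fmt"

// checkDeclaration is a hypothetical sketch, not Pogo's analyzer.
// funcTypes maps each known function name to its declared return type.
func checkDeclaration(varName, varType, callee string, funcTypes map[string]string) error {
	retType, ok := funcTypes[callee]
	if !ok {
		return fmt.Errorf("call to unknown function %q", callee)
	}
	// The declared variable type must match what the function returns.
	if retType != varType {
		return fmt.Errorf("cannot assign %s() of type %s to %s of type %s",
			callee, retType, varName, varType)
	}
	return nil
}

func main() {
	funcTypes := map[string]string{"someString": "string"}
	// Mirrors `x: int = someString()` from the example above.
	fmt.Println(checkDeclaration("x", "int", "someString", funcTypes))
}
```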

Better Expressions and Comparisons

Something I want to add is longer expressions and comparisons, for example:

x: int = 1 + 2 + 3 + 4

To implement this, all we need is a loop: after the first item, we keep reading an operator followed by another item for as long as we find an operator, so an expression can have as many items as we want.

func (p *Parser) expression() (Structure, error) {
    p.funcLine = append(p.funcLine, "expression")
    s := createStructure("EXPRESSION", "EXPRESSION", p.curToken.line)

    p.setMarker()
    temp, err := p.call()
    if err != nil {
        p.gotoMarker()
        temp, err = p.checkTokenChoices([]string{
            "L_BOOL",
            "L_INT",
            "L_STRING",
            "IDENTIFIER",
        })
        if err != nil {
            return s, err
        }
    }
    s.children = append(s.children, temp)
    p.nextToken()

    temp, err = p.checkTokenChoices([]string{
        "MO_PLUS",
        "MO_SUB",
        "MO_MUL",
        "MO_DIV",
        "MO_MODULO",
    })

    for err == nil {
        s.children = append(s.children, temp)
        p.nextToken()

        p.setMarker()
        temp, err = p.call()
        if err != nil {
            p.gotoMarker()
            temp, err = p.checkTokenChoices([]string{
                "L_BOOL",
                "L_INT",
                "L_STRING",
                "IDENTIFIER",
            })
            if err != nil {
                return s, err
            }
        }
        s.children = append(s.children, temp)
        p.nextToken()

        temp, err = p.checkTokenChoices([]string{
            "MO_PLUS",
            "MO_SUB",
            "MO_MUL",
            "MO_DIV",
            "MO_MODULO",
        })
    }
    p.rollBack()
    p.funcLine = p.funcLine[:len(p.funcLine)-1]
    return s, nil // Could be a single literal, or as many items as we need
}

I know that's a lot of code, but that's the entire logic for expressions! Have a look at this valid code.

from GoType import *

if 5 + 2 == 5 + 2:
    print("Hello")
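The shape of that expression loop, one item followed by any number of operator and item pairs, can be shown on a toy token list. This is a simplified sketch: the token type and the "ITEM" and "OP" kinds are stand-ins invented for this example, not Pogo's real token codes, and the repeated item parsing is pulled into one small helper.

```go
package main

import (
	"errors"
	"fmt"
)

// token is a simplified stand-in for Pogo's tokens.
type token struct {
	kind string // "ITEM" or "OP" in this sketch
	text string
}

// parseExpression collects item (operator item)* starting at pos.
// It returns the collected tokens and the index of the first unused token.
func parseExpression(toks []token, pos int) ([]token, int, error) {
	var children []token
	// item consumes exactly one operand or reports an error.
	item := func() error {
		if pos >= len(toks) || toks[pos].kind != "ITEM" {
			return errors.New("expected an item")
		}
		children = append(children, toks[pos])
		pos++
		return nil
	}
	if err := item(); err != nil {
		return nil, pos, err
	}
	// Keep going for as long as the next token is an operator.
	for pos < len(toks) && toks[pos].kind == "OP" {
		children = append(children, toks[pos])
		pos++
		if err := item(); err != nil {
			return nil, pos, err
		}
	}
	return children, pos, nil
}

func main() {
	toks := []token{{"ITEM", "1"}, {"OP", "+"}, {"ITEM", "2"}, {"OP", "+"}, {"ITEM", "3"}}
	children, next, err := parseExpression(toks, 0)
	fmt.Println(len(children), next, err)
}
```

A single item with no operator simply skips the loop, which is the same single-literal case the comparison fix earlier relies on.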

Next

That's all for Pogo for a while, which means we're on to our next project. This project will be a Wolfenstein 3D-like game (raycaster) from scratch, most likely in Go. I'm planning a space theme for the game so that we aren't directly copying, because that wouldn't really be fun. This is definitely going to be a fun project, and hopefully I don't get absolutely destroyed by the trigonometry.
