Tristan Elliott

Posted on Dec 1, 2023

Over engineering 101 | building my own programming language to deal with live stream chat UI | Part 1. Basic scanning

#android #kotlin #computerscience #tristan

Introduction
Building a basic Scanner to identify spaces
Lexical analysis
Token type
Token Data class
Code for the basic Scanner
Moving through the scanner
Adding Token to token list
Scanning for tokens
Starting and stopping
Testing
Full code
Next post
Resources

My app on the Google play store

The app

Introduction

So basically I have my app (not to brag, but I have two active users now), which is a mobile moderation app for Twitch. I want to add extra functionality to the chat section, which is a bit of a problem because we have to quite literally rebuild the Twitch chat functionality from scratch. Initially this might not seem like a challenge (how foolish I was). But if you check out the Twitch chat feature, you can see how fancy it gets with all the commands and pop-ups, and you quickly realize that it's basically its own domain-specific language (DSL). After trying to recreate its functionality with an unholy amount of regex and if statements, I have decided to just say F*** it, let's build our own language!!! With the aid of the book Crafting Interpreters by Robert Nystrom, we shall attempt to build just that.

Building a basic Scanner to identify spaces

If you want the actual details of what we are doing, check out the scanner chapter of Crafting Interpreters. Long story short, we want our scanner to take a String, identify all the spaces, create a Token for each one, and add that Token to a list (this list will get passed to a parser in later blog posts).
For the finished Scanner, we want it to identify things like @testUsername and /modCommand and have the UI act accordingly, but for right now let's just get it to identify empty spaces and create Tokens for them.

Lexical analysis

So the first step is to do a little bit of lexical analysis, which the book describes like this: "scan through the list of characters and group them together into the smallest sequences that still represent something. Each of these blobs of characters is called a lexeme."
We can then take those lexemes (which are just blank spaces for us) and combine them with extra data to create our Tokens, which are what we need to pass into a parser. A Token is going to consist of 3 things:

1) Token type (an enum we create)
2) Literal value (the empty space character)
3) Location information (the index where it is found)

Token type

As stated previously, it is just an enum:

enum class TokenType {

    // just empty space characters
    EmptySpace
}

Token Data class

Since the Token class is only meant to hold data about the character we have identified, a data class is a great fit:

data class Token(
    val type: TokenType,
    val lexeme: String,
    val startIndex: Int
)
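As a rough illustration of why a data class fits here, the sketch below repeats the two declarations above and builds a token by hand. The sample values (a space at index 2) are made up for the example. The point is only that a data class gives us value equality, a readable toString() and copy() for free.

```kotlin
enum class TokenType {
    // just empty space characters
    EmptySpace
}

data class Token(
    val type: TokenType,
    val lexeme: String,
    val startIndex: Int
)

fun main() {
    // Hypothetical token: the space found at index 2 of "It do"
    val first = Token(TokenType.EmptySpace, " ", 2)
    val second = Token(TokenType.EmptySpace, " ", 2)

    // Data classes compare by value, which makes tokens easy to assert on in tests
    println(first == second) // true

    // toString() is generated from the constructor properties
    println(first)

    // copy() creates a modified token without touching the original
    println(first.copy(startIndex = 5).startIndex) // 5
}
```

Value equality is the part that pays off later: a test can compare a whole expected Token against the scanned one instead of checking each field.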

Code for the basic Scanner

To start, our scanner is going to contain four variables:

class Scanner {
    private var source: String = ""
    private val tokens = mutableListOf<Token>()
    private var start = 0
    private var current = 0
}

source: the string to scan; tokens: the list of tokens found; start: the index where the lexeme being scanned starts; current: the index the scanner is currently at

Moving through the scanner

To allow our Scanner to move through the string, we are going to create a simple function called advance():

private fun advance(): Char {
    return this.source[current++]
}

This function returns the Char our scanner is currently on and then increases the current variable by one, which moves our scanner along to the next character.
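The post-increment in source[current++] is easy to misread, so here is a small standalone sketch of the same idea outside the Scanner. The Cursor class and its names are made up for this illustration only; they are not part of the scanner.

```kotlin
// Hypothetical helper used only for this illustration
class Cursor(private val source: String) {
    var current = 0
        private set

    // Same body as the scanner's advance(): read the character first, then increment
    fun advance(): Char = source[current++]
}

fun main() {
    val cursor = Cursor("a b")
    println(cursor.advance() == 'a') // true, and current is now 1
    println(cursor.advance() == ' ') // true, and current is now 2
    println(cursor.current)          // 2
}
```

So after every call, current points at the character that has not been read yet.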

Adding Token to token list

Now we need to create a function that will allow us to add a token to the token list:

private fun addToken(type: TokenType) {
    // the lexeme runs from start (inclusive) to current (exclusive)
    val text = source.substring(start, current)
    // start is the index where the lexeme was found
    val token = Token(type, text, start)
    tokens.add(token)
}

Basically, we give it a TokenType, it cuts the lexeme (text) out of the source string, creates the token with the index where the lexeme starts, and adds it to the list.
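To see which text ends up in the token, it helps to remember that the end index is exclusive. Here is a minimal sketch with made-up values, assuming the scanner has just consumed the space at index 2 of "It do". The lexemeBetween function is a hypothetical helper for this example, not part of the Scanner.

```kotlin
// Returns the lexeme between start (inclusive) and current (exclusive)
fun lexemeBetween(source: String, start: Int, current: Int): String =
    source.substring(start, current)

fun main() {
    val source = "It do"
    val start = 2    // index of the space
    val current = 3  // one past the space, because advance() already ran
    println("[" + lexemeBetween(source, start, current) + "]") // prints "[ ]"
}
```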

Scanning for tokens:

Now we want to do some actual scanning and identify some tokens, which is done with:

private fun scanToken() {
    val char = advance()
    when (char) {
        // any other character is ignored for now
        ' ' -> addToken(TokenType.EmptySpace)
    }
}

So val char = advance() gets the current character. Then it's just a simple when(){} statement that adds a token whenever an empty space character is found.

Starting and stopping

Now we need to be able to start scanning and to tell the scanner when to stop. We will do this with a while loop:

private fun scanTokens() {
    while (!isAtEnd()) {
        // the next lexeme starts where the last one ended
        start = current
        scanToken()
    }
}

private fun isAtEnd(): Boolean {
    return this.current >= source.length
}

This might seem a little strange, since nothing in the loop itself changes current. But remember that scanToken() calls advance(), which increases current by one, so isAtEnd() eventually returns true and the loop stops.
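If the loop is still hard to picture, here is a standalone sketch that re-creates it as a single function and records the index of every space it sees. The spaceIndices function is made up for this example and is not part of the Scanner; it only mirrors the start/current bookkeeping.

```kotlin
// Standalone re-creation of the scanning loop that records where each space was found
fun spaceIndices(source: String): List<Int> {
    val found = mutableListOf<Int>()
    var current = 0
    while (current < source.length) {   // same check as !isAtEnd()
        val start = current             // a new lexeme begins here
        val char = source[current++]    // same as advance()
        if (char == ' ') found.add(start)
    }
    return found
}

fun main() {
    println(spaceIndices("It do ")) // [2, 5]
}
```

Every pass through the loop consumes exactly one character, so the loop runs once per character and then stops.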

Testing

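To prove to the doubters (myself) that we have successfully scanned the tokens, we can run these tests:

import org.junit.Assert
import org.junit.Test

class ScannerTest {
    //UNDER TEST
    private val underTest = Scanner()

    @Test
    fun scanning_string_with_two_spaces_creates_two_tokens() {
        /* Given */
        val sourceStringWithTwoSpaces = "It do "

        /* When */
        underTest.setSource(sourceStringWithTwoSpaces)
        val actualAmountOfTokens = underTest.getTokenList().size

        /* Then */
        Assert.assertEquals(2, actualAmountOfTokens)
    }

    @Test
    fun scanning_string_with_seven_spaces_creates_seven_tokens() {
        /* Given */
        val sourceStringWithSevenSpaces = "It do be like that sometimes another one"

        /* When */
        underTest.setSource(sourceStringWithSevenSpaces)

        /* Then */
        Assert.assertEquals(7, underTest.getTokenList().size)
    }
}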

Full code:

GitHub

enum class TokenType {
    //a @username word
    MENTION,

    // everything that is NOT a @username word
    WORD,

    // just empty space characters
    EmptySpace
}

class Scanner {
    private var source: String = ""
    private val tokens = mutableListOf<Token>()
    private var start = 0
    private var current = 0

    fun setSource(source: String) {
        this.source = source
        // reset state so a reused Scanner does not keep tokens from an earlier source
        tokens.clear()
        start = 0
        current = 0
        scanTokens()
    }

    private fun scanTokens() {
        while (!isAtEnd()) {
            // the next lexeme starts where the last one ended
            start = current
            scanToken()
        }
    }

    private fun isAtEnd(): Boolean {
        return this.current >= source.length
    }

    // returns the current character, then moves the scanner forward by one
    private fun advance(): Char {
        return this.source[current++]
    }

    private fun scanToken() {
        val char = advance()
        when (char) {
            // any other character is ignored for now
            ' ' -> addToken(TokenType.EmptySpace)
        }
    }

    private fun addToken(type: TokenType) {
        // the lexeme runs from start (inclusive) to current (exclusive)
        val text = source.substring(start, current)
        val token = Token(type, text, start)
        tokens.add(token)
    }

    fun getTokenList(): List<Token> {
        return this.tokens
    }
}

data class Token(
    val type: TokenType,
    val lexeme: String,
    val startIndex: Int
)

Next post

In the next post we will identify lexemes like @someUsername and /someCommand, which is what we really want.

Resources

Crafting Interpreters scanner chapter by Robert Nystrom

Conclusion

Thank you for taking the time out of your day to read this blog post of mine. If you have any questions or concerns, please comment below or reach out to me on Twitter.

DEV Community
