Tristan Elliott

Over engineering 101 | building my own programming language to deal with live stream chat UI | Part 1. Basic scanning

Table of contents

  1. Introduction
  2. Building a basic Scanner to identify spaces
  3. Lexical analysis
  4. Token type
  5. Token Data class
  6. Code for the basic Scanner
  7. Starting and stopping
  8. Testing
  9. Full code
  10. Next post
  11. Resources

My app on the Google play store

Introduction

  • I have an app (not to brag, but it has two active users now), a mobile moderation app for Twitch, and I want to add extra functionality to its chat section. That is a bit of a problem, because it means quite literally rebuilding the Twitch chat functionality from scratch. Initially this might not seem like a challenge (how foolish I was). But if you look at the Twitch chat feature, with all its commands and pop-ups, you quickly realize that it is basically its own DSL (domain-specific language). After trying to recreate that functionality with an unholy amount of regex and if statements, I have decided to just say F*** it, let's build our own language! With the aid of the book Crafting Interpreters by Robert Nystrom, we shall attempt to build just that.

Building a basic Scanner to identify spaces

  • If you want the actual details of what we are doing, check out the scanning chapter of Crafting Interpreters. Long story short: we want our scanner to take a String, identify all the spaces, create a Token for each one, and add those Tokens to a list (this list will get passed to a parser in later blog posts).
  • The finished Scanner should be able to identify things like @testUsername and /modCommand and have the UI act accordingly, but for right now let's just get it to identify empty spaces and create Tokens for them.

Lexical analysis

  • The first step is a little bit of lexical analysis, which the book describes like this: scan through the list of characters and group them together into the smallest sequences that still represent something. Each of these blobs of characters is called a lexeme.
  • We can then take those lexemes (which are just blank spaces for us) and combine them with extra data to create our Tokens, which are what we pass into a parser. A Token is going to consist of 3 things:

1) Token type (an enum we create)
2) Literal value (the empty space character)
3) Location information (the index where it is found)
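
  • To make the word "lexeme" a bit more concrete, here is a rough sketch that groups a string into runs of non-space characters and single spaces. The function groupIntoLexemes is a hypothetical helper I made up for illustration; it is not the Scanner we build in this post, which only cares about the spaces for now.

```kotlin
// Hypothetical helper, NOT part of the Scanner built in this post.
// It only illustrates what "grouping characters into lexemes" means.
fun groupIntoLexemes(source: String): List<String> {
    val lexemes = mutableListOf<String>()
    var start = 0
    while (start < source.length) {
        var end = start + 1
        if (source[start] != ' ') {
            // extend the word until the next space or the end of the string
            while (end < source.length && source[end] != ' ') end++
        }
        // a space is always a lexeme of length 1; a word is the whole run
        lexemes.add(source.substring(start, end))
        start = end
    }
    return lexemes
}
```

  • Calling groupIntoLexemes("It do ") gives back the four lexemes "It", " ", "do" and " ". Our scanner will only turn the two space lexemes into Tokens.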

Token type

  • As stated previously, it is just an enum:
enum class TokenType {

    // just empty space characters
    EmptySpace
}


Token Data class

  • Since the Token class is really just meant to hold data about the character we have identified, it really is a great choice to use a data class:
data class Token(
    val type: TokenType,
    val lexeme: String,
    val startIndex:Int
    )
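
  • A quick sketch of why a data class is handy here: Kotlin generates toString(), equals() and copy() for us. The function dataClassDemo is a hypothetical name used only for this demonstration, and the enum is cut down to the single entry we have so far.

```kotlin
enum class TokenType { EmptySpace }

data class Token(
    val type: TokenType,
    val lexeme: String,
    val startIndex: Int
)

// Hypothetical demo function, only here to exercise the generated members.
fun dataClassDemo(): List<String> {
    val first = Token(TokenType.EmptySpace, " ", 2)
    val same = Token(TokenType.EmptySpace, " ", 2)
    val moved = first.copy(startIndex = 5)   // copy() changes only the named property
    return listOf(
        first.toString(),                     // generated toString()
        (first == same).toString(),           // generated equals() compares property values
        moved.startIndex.toString()
    )
}
```

  • The generated equals() is what makes Tokens easy to compare in tests later: two Tokens with the same type, lexeme and startIndex are considered equal.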


Code for the basic Scanner

  • To start, our scanner is going to contain 4 variables:
class Scanner{
    private  var source:String = ""
    private  val tokens = mutableListOf<Token>()
    private var start = 0
    private var current = 0
}
  • source: the string to scan; tokens: the list of tokens found so far; start: the index where the current lexeme begins; current: the index the scanner is currently at.

Moving through the scanner

  • To allow our Scanner to move through the string, we are going to create a simple function called advance():
private fun advance():Char{
        return this.source[current++]
    }
  • This function returns the Char our scanner is currently on and then increases current by one, which moves the scanner along.
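  • The post-increment in source[current++] is doing two jobs at once, so here is a tiny sketch that isolates it. The Cursor class is a hypothetical stand-in for the Scanner that exposes current so we can look at it.

```kotlin
// Hypothetical stand-in for the Scanner, reduced to the advance() logic.
class Cursor(private val source: String) {
    var current = 0
        private set

    // current++ evaluates to the OLD index, so we read the character
    // at the current position first and only then move one step forward
    fun advance(): Char = source[current++]
}
```

  • With Cursor("ab"), the first advance() returns 'a' and leaves current at 1; the second returns 'b' and leaves current at 2. A third call would throw, which is why the scanning loop needs an end-of-input check.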

Adding Token to token list

  • Now we need to create a function that will allow us to add a token to the token list:
 private fun addToken(type:TokenType){
        val text = source.subSequence(start,current).toString()
        // store where the lexeme begins; current already points one past it
        val token = Token(type,text,start)
        tokens.add(token)
    }

  • Basically, we give it a TokenType, it cuts the lexeme text (the space character) out of source using start and current, creates the Token, and adds it to the list.
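  • Here is a small sketch of how start and current delimit a lexeme. The function lexemeAt is a hypothetical helper, and it assumes that startIndex is meant to hold the index where the lexeme begins.

```kotlin
// Hypothetical helper mirroring what addToken() computes for one lexeme.
// Assumption: the token's location is the index where the lexeme begins.
fun lexemeAt(source: String, start: Int, current: Int): Pair<Int, String> {
    // the end index is exclusive, so [start, current) is exactly the consumed text
    val text = source.substring(start, current)
    return start to text
}
```

  • For "It do ", after the scanner has consumed the first space we have start = 2 and current = 3, so lexemeAt("It do ", 2, 3) returns the pair (2, " ").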

Scanning for tokens:

  • Now we want to do some actual scanning and identify some tokens, which is done with:
 private fun scanToken(){
        val char = advance()
        when(char){
            ' ' ->{addToken(TokenType.EmptySpace)}
        }
    }

  • So val char = advance() gets the current character. Then it's just a simple when(){} statement that adds a token whenever an empty space character is found; every other character is skipped for now.

Starting and stopping

  • Now we need to be able to start scanning and to tell the scanner when to stop. We will do this with a while loop:
private fun scanTokens(){
        while(!isAtEnd()){
            start = current
            //start scanning tokens here
            scanToken()
        }
    }
    private fun isAtEnd():Boolean{
        return this.current >= source.length
    }

  • This might seem a little strange, since nothing in the loop changes current directly. But remember that current is increased by 1 every time advance() is called inside scanToken(), so isAtEnd() eventually returns true and the loop stops.
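  • To see the loop terminate, here is a rough trace of start and current on each pass. The function traceScan is a hypothetical helper that copies only the index bookkeeping from scanTokens() and advance(), with no tokens involved.

```kotlin
// Hypothetical helper: records (start, current) after each pass of the loop.
fun traceScan(source: String): List<Pair<Int, Int>> {
    val steps = mutableListOf<Pair<Int, Int>>()
    var current = 0
    while (current < source.length) {   // same condition as !isAtEnd()
        val start = current             // a new lexeme begins here
        current++                       // what advance() does to the index
        steps.add(start to current)
    }
    return steps
}
```

  • For the three-character string "a b", the trace is (0, 1), (1, 2), (2, 3). After the third pass current equals source.length, so the loop ends. An empty string never enters the loop at all.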

Testing

  • To prove to the doubters (myself) that we have successfully scanned the tokens, we can run this test:
   //UNDER TEST
    private val underTest = Scanner()
    @Test
    fun testing_clear_chat_parsing_clear_chat_command() {
        /* Given */
        val sourceStringWithSevenSpaces = "It do be like that sometimes another one"
        val sourceStringWithTwoSpaces = "It do "

        /* When */
        underTest.setSource(sourceStringWithTwoSpaces)
        val actualAmountOfTokens = underTest.getTokenList().size



        /* Then */
        Assert.assertEquals(2,actualAmountOfTokens)
    }
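
  • The seven-space string above never actually gets checked, so here is a sketch that checks both strings. One thing to watch out for: setSource() never clears tokens or resets current, so reusing one Scanner for two strings would mix the results. The helper countSpaceTokens is a hypothetical name, it uses a fresh Scanner per string, and the Scanner below is a condensed copy of the one in this post (assuming the token stores start as its index) so that the snippet compiles on its own.

```kotlin
enum class TokenType { EmptySpace }

data class Token(val type: TokenType, val lexeme: String, val startIndex: Int)

// Condensed copy of the Scanner from this post, so the sketch compiles on its own.
class Scanner {
    private var source: String = ""
    private val tokens = mutableListOf<Token>()
    private var start = 0
    private var current = 0

    fun setSource(source: String) {
        this.source = source
        scanTokens()
    }

    fun getTokenList(): List<Token> = tokens

    private fun scanTokens() {
        while (current < source.length) {
            start = current
            // read the current character, then step forward (same as advance())
            when (source[current++]) {
                // assumption: the token's index is where the lexeme begins
                ' ' -> tokens.add(Token(TokenType.EmptySpace, source.substring(start, current), start))
            }
        }
    }
}

// Hypothetical helper: scans one string with a fresh Scanner and counts its tokens.
fun countSpaceTokens(source: String): Int {
    val scanner = Scanner()
    scanner.setSource(source)
    return scanner.getTokenList().size
}
```

  • With this, countSpaceTokens("It do ") is 2 and countSpaceTokens("It do be like that sometimes another one") is 7, which matches the names of the two test strings.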


Full code:

enum class TokenType {
    //a @username word
    MENTION,

    // everything that is NOT a @username word
    WORD,

    // just empty space characters
    EmptySpace
}

class Scanner{
    private  var source:String = ""
    private  val tokens = mutableListOf<Token>()
    private var start = 0
    private var current = 0

    fun setSource(source:String){
        this.source = source
        scanTokens()
    }

    private fun scanTokens(){
        while(!isAtEnd()){
            start = current
            //start scanning tokens here
            scanToken()
        }
    }
    private fun isAtEnd():Boolean{
        return this.current >= source.length
    }
    private fun advance():Char{

        return this.source[current++]
    }
    private fun scanToken(){
        val char = advance()
        when(char){
            ' ' ->{addToken(TokenType.EmptySpace)}
        }
    }
    private fun addToken(type:TokenType){
        val text = source.subSequence(start,current).toString()
        // store where the lexeme begins; current already points one past it
        val token = Token(type,text,start)
        tokens.add(token)

    }
    fun getTokenList():List<Token>{
        return this.tokens
    }
}

data class Token(
    val type: TokenType,
    val lexeme: String,
    val startIndex:Int
    )


Next post

  • In the next post we will identify lexemes such as @someUsername and /someCommand, which is really what we want.

Resources

Conclusion

  • Thank you for taking the time out of your day to read this blog post of mine. If you have any questions or concerns please comment below or reach out to me on Twitter.
