To begin
We are going to build an interpreter for our language, but before writing the interpreter itself we need to handle tokens, which will be very important for our interpreter or compiler.
Every language has a defined syntax, so you must design one for your language. Ours will be a syntax I created called regius, which is a mixture of Python and C++:
Hello world: print "Hello world";
Variables: int age = 10;
Printing variables: print age;
Making the Types
Let's write the main function and create the token types for our language:
# language types
STRING = "STRING"
INT = "INT"
IDENTIFY = "IDENTIFY"
NULL = "NULL"

def main():
    Interpreter = interpreter()
    while True:
        line = input(">> ")     # get input
        Interpreter.main(line)  # call the interpreter

main()
Remember that the token and interpreter classes must sit between the type constants and the main function, for example:
# language types
STRING = "STRING"
INT = "INT"
IDENTIFY = "IDENTIFY"
NULL = "NULL"

class token:
    ...

class interpreter:
    ...

def main():
    Interpreter = interpreter()
    while True:
        line = input(">> ")     # get input
        Interpreter.main(line)  # call the interpreter

main()
So now let's create the token class, taking the token's type and value as constructor arguments:
class token:
    def __init__(self, type, value):
        self.type = type
        self.value = value
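As a quick sanity check, a token can be constructed and inspected like this (the class and the INT constant are repeated here so the snippet runs on its own):

```python
INT = "INT"  # one of the type constants defined earlier

class token:
    def __init__(self, type, value):
        self.type = type
        self.value = value

# Build a token for the number 42 and look at its fields
t = token(INT, "42")
print(t.type, t.value)  # prints: INT 42
```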
And we create a class for our interpreter, whose main method takes the line as an argument (note the self parameter):
class interpreter:
    def __init__(self):
        pass

    def main(self, line):
        pass
Now let's implement the scanning process, which is explained in the comments:
class interpreter:
    def __init__(self):
        self.vars = {}  # a dict ("hash table") to store variables

    def main(self, line):
        # Important state variables
        self.i = 0                           # current index into the line
        self.c = ''                          # current character of the line
        self.line = line                     # save the string in an instance variable
        self.actual_token = token(NULL, '')  # the token currently being built
        self.tokens = []                     # list of finished tokens
        self.in_string = False               # bool flag: are we inside a string?
        while True:
            self.c = self.line[self.i]  # get the current character
            print(self.c)
            if self.c == ';':  # end of line
                break
            self.i += 1  # go to the next index
This way, our interpreter can read a line and print each of its characters to the console until it reaches the ; and stops the lexical analysis.
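The same scanning idea can be tried in isolation; scan_chars below is a hypothetical helper (not part of the interpreter class) that collects every character up to the ;:

```python
def scan_chars(line):
    """Collect each character of `line` until the terminating ';'."""
    i = 0
    chars = []
    while True:
        c = line[i]       # get the current character
        if c == ';':      # stop at the line terminator
            break
        chars.append(c)
        i += 1            # go to the next index
    return chars

print(scan_chars('print age;'))  # every char before the ';'
```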
Now let's generate the tokens of the lexical analysis, log them to the console, and remove the per-character logging:
class interpreter:
    def __init__(self):
        self.vars = {}  # a dict ("hash table") to store variables

    def main(self, line):
        self.i = 0                           # current index
        self.c = ''                          # current character
        self.line = line                     # save the line
        self.actual_token = token(NULL, '')  # the token currently being built
        self.tokens = []                     # list of finished tokens
        self.in_string = False               # bool flag: are we inside a string?
        while True:
            self.c = self.line[self.i]  # get the current character
            if self.c == ';':  # end of line
                break
            elif self.c.isdigit() and self.actual_token.type in (INT, NULL):
                # A digit while building an INT token, or at the start of a new token
                self.actual_token.type = INT  # the token type is INT
                self.actual_token.value = self.actual_token.value + self.c
                if not self.line[self.i + 1].isdigit():
                    # Next char is not a digit: save the token and start a new one
                    self.tokens.append(self.actual_token)
                    self.actual_token = token(NULL, '')
            elif self.c == '"':  # the current char is a quote
                if self.in_string:
                    # Closing quote: leave the string, save the token, start a new one
                    self.in_string = False
                    self.tokens.append(self.actual_token)
                    self.actual_token = token(NULL, '')
                else:
                    # Opening quote: the token being built is a string
                    self.actual_token.type = STRING
                    self.in_string = True
            elif self.c.isalnum():  # alphanumeric character
                if self.in_string:
                    # Inside a string: just append the character
                    self.actual_token.value = self.actual_token.value + self.c
                else:
                    # Outside a string: we are building an identifier
                    self.actual_token.type = IDENTIFY
                    self.actual_token.value = self.actual_token.value + self.c
                    if self.line[self.i + 1] in (' ', ';'):
                        # Next char ends the identifier: save the token
                        self.tokens.append(self.actual_token)
                        self.actual_token = token(NULL, '')
            elif self.c == ' ' and self.in_string:
                # Spaces inside a string are part of its value
                self.actual_token.value = self.actual_token.value + self.c
            self.i += 1  # next index
        # Log the tokens
        for __token__ in self.tokens:
            print(__token__.type, __token__.value)
Now, if we run it and type print "hello world" 123; the expected output is:
IDENTIFY print
STRING hello world
INT 123
We use the IDENTIFY type for functions. That finishes our tokenizer, and we are ready to create the interpreter itself, but that's for the next post.
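To double-check the expected output, here is a self-contained sketch that condenses the type constants, the token class, and the lexing loop above into one runnable script; the tokenize function is my own standalone wrapper, not part of the tutorial's interpreter class:

```python
# Condensed, self-contained version of the tokenizer above.
STRING = "STRING"
INT = "INT"
IDENTIFY = "IDENTIFY"
NULL = "NULL"

class token:
    def __init__(self, type, value):
        self.type = type
        self.value = value

def tokenize(line):
    """Hypothetical wrapper: run the tutorial's lexing loop and return the tokens."""
    i = 0
    actual = token(NULL, '')   # the token currently being built
    tokens = []                # list of finished tokens
    in_string = False          # are we inside a string literal?
    while True:
        c = line[i]
        if c == ';':                       # end of line
            break
        elif c.isdigit() and actual.type in (INT, NULL):
            actual.type = INT
            actual.value += c
            if not line[i + 1].isdigit():  # the number ends here
                tokens.append(actual)
                actual = token(NULL, '')
        elif c == '"':
            if in_string:                  # closing quote: finish the string
                in_string = False
                tokens.append(actual)
                actual = token(NULL, '')
            else:                          # opening quote: start a string
                actual.type = STRING
                in_string = True
        elif c.isalnum():
            if in_string:
                actual.value += c          # characters inside the string
            else:                          # building an identifier
                actual.type = IDENTIFY
                actual.value += c
                if line[i + 1] in (' ', ';'):
                    tokens.append(actual)
                    actual = token(NULL, '')
        elif c == ' ' and in_string:       # spaces belong to the string value
            actual.value += c
        i += 1
    return tokens

for t in tokenize('print "hello world" 123;'):
    print(t.type, t.value)
```

Running it prints the IDENTIFY, STRING, and INT tokens shown above.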