Table Driven Scanners

Introduction

A scanner, or tokenizer, is the part of a compiler that takes a stream of characters as input and produces a stream of tokens (words) as output.

We can count three types of scanners: table-driven scanners, direct-coded scanners, and hand-coded scanners. Each one of them has advantages and disadvantages, and based on these we can determine when to use each of the listed approaches. I also need to mention that some compiler designers and developers combine two or all three of them into a hybrid approach.

Note that hand-coded scanners are the most used in commercial versions of the famous compilers. This is due to many reasons; one of them is that, unlike the other two kinds, table-driven and direct-coded scanners, a hand-coded scanner is not auto-generated: it is up to the developer to take care of the implementation and to improve the compiler's performance.

Table Driven Scanners

In this post, I want to explain and show what I understand about table-driven scanners. As mentioned in the introduction, this kind of scanner is auto-generated; Lex and Flex are examples of such generators.

First, we must have a set of regular expressions as input for the scanner generator, which produces the tables. We will also need the help of what we can call a skeleton scanner, which controls the process of recognizing our language and returns a set of tokens.
Examples of the produced tables are the classifier table, the transition table, and the token type table.

classifier table: takes a character as input and identifies which of the groups that make up our DFA (or our regular expressions) this character belongs to. For example, a character can be a digit, a special character like ,;=+, a lowercase letter, or an uppercase letter. The figure below shows an example of this table.

[Figure: an example classifier table]
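To make this concrete, here is a minimal sketch of a classifier table in Python, assuming a toy language that only recognizes register names such as r0 or r17 (the running example used in Engineering a Compiler). The category names are my own for this sketch; a real generator like Flex would emit a compact array indexed by character code:

```python
# Classifier table: one entry per ASCII character code, mapping each
# character to its category. Category names are illustrative.
CHAR_CAT = ["OTHER"] * 128          # default group for everything else
CHAR_CAT[ord("r")] = "REGISTER"     # the letter that starts a register name
for d in "0123456789":
    CHAR_CAT[ord(d)] = "DIGIT"      # digits form the register number
```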

transition table: takes the current state and the character's group and returns the next state, as you can see in the table below.

[Figure: an example transition table]
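Continuing the same toy example, a transition table can be sketched as a mapping from a (state, group) pair to the next state. Here state 0 is the start state, state 1 means we have seen the leading r, state 2 means we have seen r followed by at least one digit, and -1 stands for the error state; all of these numbers are assumptions for the sketch:

```python
# Transition table: DELTA[state][category] -> next state.
# -1 is the error state (no valid transition).
DELTA = {
    0: {"REGISTER": 1,  "DIGIT": -1, "OTHER": -1},  # start state
    1: {"REGISTER": -1, "DIGIT": 2,  "OTHER": -1},  # saw 'r'
    2: {"REGISTER": -1, "DIGIT": 2,  "OTHER": -1},  # saw 'r' + digits
}
```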

token type table: takes a state as input and returns the type of token recognized in that state; when the state does not correspond to a token, this table returns invalid or something similar. The figure below shows an example of a token type table.

[Figure: an example token type table]
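For the same toy DFA, the token type table is tiny: only state 2 is accepting, so it is the only state mapped to a real token type. The name REGISTER_NAME is made up for this sketch:

```python
# Token type table: state -> token type, or "INVALID" when the state
# is not an accepting state.
TOKEN_TYPE = {0: "INVALID", 1: "INVALID", 2: "REGISTER_NAME"}
```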

Now I want to go back to the skeleton scanner. It is the program, or the portion of code, that controls this process. You can find an algorithmic implementation with a deep and clear explanation in Engineering a Compiler by Keith Cooper and Linda Torczon, on page 86. According to that book, the skeleton scanner is usually divided into three sections: initializations, a scanning loop that models the DFA's behavior, and a final section that interprets the state the loop stopped in and reports a token or an error.
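Below is a rough sketch of such a skeleton scanner, wired to the three tables sketched above. It is not the book's exact algorithm, just my approximation of its structure: initializations, a scanning loop driven by the tables, and a final section that interprets the last accepting state:

```python
def next_token(text, pos):
    # Initializations.
    state = 0
    start = pos
    last_accept = None   # token type at the last accepting state seen
    last_end = pos       # input position just after that accept
    # Scanning loop that models the DFA: classify, transition, record.
    while pos < len(text):
        code = ord(text[pos])
        cat = CHAR_CAT[code] if code < 128 else "OTHER"
        nxt = DELTA[state][cat]
        if nxt == -1:                    # no valid transition: stop
            break
        state = nxt
        pos += 1
        if TOKEN_TYPE.get(state, "INVALID") != "INVALID":
            last_accept, last_end = TOKEN_TYPE[state], pos
    # Final section: report the longest accepted lexeme, or an error.
    if last_accept is not None:
        return last_accept, text[start:last_end], last_end
    return "INVALID", text[start:start + 1], start + 1

# For example, next_token("r17+r2", 0) returns
# ("REGISTER_NAME", "r17", 3).
```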

In the end, I want to mention that the table-driven implementation and the other two implementations, direct-coded and hand-coded, have different runtime costs, but they all have the same asymptotic complexity per character.
