The Parsing Process
Let's try to put a mental model of how the entire command line string needs to be parsed and what are the various modules we might require along the way.
- Scanner -> Parser -> Executer -> Command
- Token_list -> Abstract Syntax Tree -> Convert to Execution Unit -> Execute-Execute
The Tentative Process:
- Get the whole string until new line using read_line library.
- Search the string from index 0 until '\0' for ('')
- Divide string in tokens based on whitespace
- it can be any amount of whitespaces, abcd efgh or abcd efgh; They should both make two tokens.
- whitespace cannot be included in token characters except when there are double or single quotes. ** "hello world" is one token hello world ** "" hello world "" is two tokens hello and world; and the empty string before and after hello world will not be tokenized. ** "'hello world'" -- A single quote inside a double quote will be included in the token, so the result will be one token called 'hello world'; All the above rules apply to single quotes as well.
- After dividing the tokens, if there is a token with a $ sign, whatever comes after the $ sign is compared with the list of all the env variables available and replaced with the value of the said variable. ** Important sub-point - In case of a string in single quote, even if there is a $ sign, it is not replaced with the value of the said env variable.
- Environment variables are received using envp parameter from the main function and stored in a key value structure t_env and put in a linked list.
Converting stuff into an Abstract Syntax Tree.
- A scanner will scan through the string and make a list of tokens based on the rules, we listed above.
- A parser takes that list of tokens verifies the syntax (represents an error if there is a syntax error), and converts it into a tree structure.
Lets try to build the schematics top down:
----- Command line -----
Job ; command line ; job
(Identify command line (unparsed line) as either a formed job or a combination of jobs and unparsed command_lines)
----- Job ------
Command | job ; command
(Identify a job as a command or another job in a list)
---------- Command ----
simple_command redirection list simple_command
(Identify a command as either a simple command or a redirection list )
---------- Redirection List ----
redirection redirection_list
----------- Redirection ------
filename < filename >> filename
---------- Simple_Command -------
pathname token_list
----------- Token_List --------
token token_list
For example:
cat > a > b < c should end up as:
command
/ \
simple cmd redirection
/ \
cat >
/ \
aaaa >
/ \
bbbb <
/
cccc
(hold)
/ \
cat >
/ \
aaaa >
/ \
bbbb <
/
cccc
(hold)
/ \
cat aaaa (type : NODE_REDIRECT_IN)
\
bbbb (type: NODE_REDIRECT_IN)
\
cccc (type: NODE_REDIRECT_OUT)
The tree could finally evaluate to Abstract Tree
aaaa (type : NODE_REDIRECT_IN)
/ \
cat bbbb (type: NODE_REDIRECT_IN)
\
cccc (type: NODE_REDIRECT_OUT)
Let's breakdown the logic of the above command tree:
We identify command as one of two structures:
- Simple_command and redirection_list structures
Simple_command structure.
In case of simple command
First we check if it is simple_command
** We check the token_list with pathnames
** If it just one token we check once but it is a list we check each one.
- In case of redirection list
** We check if it just one redirection or a list.
** We attach one case ( <, >, <<, >>) and attach the filename information, create a node and add it to the list.
Top comments (0)