In my (very limited) experience writing parsers/compilers, one of the more understandable forms of writing a compiler is the Parser Combinator. Reading most literature on these parsers is typically pretty heavy on the mathy- or haskelly-syntax, but its core concept is really simple.
Essentially, you can understand a parser simply as a function. Specifically one that takes an input String and returns a node, which is the part of the program that function could understand, and the remaining String to be parsed by a different function. I find Kotlin to be an exceptionally readable language for this so here's a way to represent that function as an interface:
interfaceParser{funparse(input:String):Pair<Node?,String>// if parsing succeeds, the Node will be non-null}// Node classes are simple, and either contain text or other nodes (but not both!)classNode(valkind:String)classLeafNode(kind:String,valcontent:String):Node(kind)classInnerNode(kind:String,valchildren:List<Node>):Node(kind)
From this interface, we can write the first actual parsers, which just recognize a single character of a specific type:
classAnyCharParser:Parser{overridefunparse(input:String):Pair<Node?,String>{valnextChar=input.first()// get the first charactervalremainingString=input.substring(1)// get the rest of the StringreturnLeafNode("anyChar",nextChar)toremainingString// return a node with the data it found, and the remaining String to be parsed}}classDigitParser:Parser{overridefunparse(input:String):Pair<Node?,String>{valnextChar=input.first()// get the first characterif(nextChar.isDigit()){// parsing only succeeds if the next character is a digitvalremainingString=input.substring(1)returnLeafNode("digit",nextChar)toremainingString}else{// otherwise we return a null node to signify parsing failurereturnnulltoinput}}}
At this point, we can successfully parse a language with only 1 character. Success is determined by some Parser returning a single root Node with an empty String as the remaining String.
valparser=DigitParser()valinput="1"val(rootNode,remainingString)=parser.parse(input)// rootNode should be non-null, and remainingString is empty. Parsing was successful!
You can imagine more kinds of "Leaf Parsers" that recognize different types of characters (whitespace, letters) or even recognize tokens rather than individual characters (if, for, class, etc). But we usually parse code which contains more than 1 token[citation needed]. So let's add a higher-order parser to allow parsing longer String inputs:
Notice how we can now recognize different types of String simply by passing a different Parser to the ManyParser parser. You could then extend this idea with other types of parsers, such as sequences of specific parser rules (if token, followed by an boolean expression, followed by block of code), a choice of many different possible parsers (an assignment could parse a literal String OR a literal number), etc. The result of all of these operations is a tree of nodes nested inside each other, which are annotated with information describing the context in which it was parsed and the actual content that was found. We now have something we can traverse in code and actually do something with.
For further study, you might check out my library Kudzu, which is a more complete implementation of this exact idea and will help reinforce this idea and show how it can be used to slowly build up complex grammars from simple building blocks. For example, this test case shows how a JSON parser might be built and parsed data transformed to actual usable data.
For further actions, you may consider blocking this person and/or reporting abuse
We're a place where coders share, stay up-to-date and grow their careers.
In my (very limited) experience writing parsers/compilers, one of the more understandable forms of writing a compiler is the Parser Combinator. Reading most literature on these parsers is typically pretty heavy on the mathy- or haskelly-syntax, but its core concept is really simple.
Essentially, you can understand a parser simply as a function. Specifically one that takes an input String and returns a node, which is the part of the program that function could understand, and the remaining String to be parsed by a different function. I find Kotlin to be an exceptionally readable language for this so here's a way to represent that function as an interface:
From this interface, we can write the first actual parsers, which just recognize a single character of a specific type:
At this point, we can successfully parse a language with only 1 character. Success is determined by some Parser returning a single root Node with an empty String as the remaining String.
You can imagine more kinds of "Leaf Parsers" that recognize different types of characters (whitespace, letters) or even recognize tokens rather than individual characters (
if
,for
,class
, etc). But we usually parse code which contains more than 1 token[citation needed]. So let's add a higher-order parser to allow parsing longer String inputs:Notice how this
ManyParser
delegates parsing to another child parser. We can then combine it with one of our Leaf parsers to recognize longer inputs:Notice how we can now recognize different types of String simply by passing a different Parser to the
ManyParser
parser. You could then extend this idea with other types of parsers, such as sequences of specific parser rules (if
token, followed by an boolean expression, followed by block of code), a choice of many different possible parsers (an assignment could parse a literal String OR a literal number), etc. The result of all of these operations is a tree of nodes nested inside each other, which are annotated with information describing the context in which it was parsed and the actual content that was found. We now have something we can traverse in code and actually do something with.For further study, you might check out my library Kudzu, which is a more complete implementation of this exact idea and will help reinforce this idea and show how it can be used to slowly build up complex grammars from simple building blocks. For example, this test case shows how a JSON parser might be built and parsed data transformed to actual usable data.