Imagine you're trying to find your way through a large forest. You don't know how big the forest is, you don't have a map, and while you don't speak the same language as the locals, you did gather enough information to identify some of the landmarks you expect to find there.
You've been given rough drawings of the different kinds of trees, rocks, bushes, and other things that live in the forest, and you've even managed to get some notes on the more complex signs that the locals use for navigation. For example, you know that three black rocks with a white rock next to them point toward the river, while three black rocks with a stick in the ground nearby point toward another pile of rocks. You've been studying these people for a while and have a pretty good understanding of how they build their navigational signs, and you've learned to pay close attention to the natural signs around you, like animal tracks and the direction moss grows on the trees.
So, armed with your notebook, you set out on your journey, writing down all the signs you come across so that you can make a full map, which will make it much easier to get back to the village or to help your fellow researchers cross the forest themselves.
In this story, you, as the explorer, are the ANTLR parser. You don't have the full map of the forest yet, but you do know all the important bits in it. In ANTLR terms, this knowledge is the grammar: a set of rules that help the parser build the "map" (the parse tree). The grammar consists of instructions for building the lexer and the parser.
The lexer is the knowledge you have about single, easily identified structures within the program. In the story, these are the types of trees, the types of rocks, and so on. In a program, a lexer might define what counts as a valid string literal, a valid integer, or a keyword of the language.
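To make the lexer idea concrete, here is a minimal hand-written sketch in Python. It is not what ANTLR generates; the token names (KEYWORD, IDENT, and so on), the tiny keyword list, and the tokenize function are all assumptions made up for illustration.

```python
import re

# Token kinds for a tiny, made-up language. Order matters: the first
# alternative that matches wins, so keywords are listed before identifiers.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:int|String|return)\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("INT",     r"\d+"),
    ("STRING",  r'"[^"\n]*"'),
    ("ASSIGN",  r"="),
    ("SEMI",    r";"),
    ("SKIP",    r"\s+"),
    ("ERROR",   r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, text) pairs, dropping whitespace."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens


print(tokenize("int i = 4;"))
# [('KEYWORD', 'int'), ('IDENT', 'i'), ('ASSIGN', '='), ('INT', '4'), ('SEMI', ';')]
```

Each pair is one "landmark" the explorer can recognize on its own, before asking what a group of them means.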
The parser takes the tokens identified by the lexer and assigns meaning to certain groupings of those tokens. So you might have a single rock which means nothing, but a special formation of rocks does have meaning. Likewise, a string literal might be assigned to a variable, or it might be passed as an argument to a method: the parser decides what should happen with the single string literal given its context in relation to other tokens.
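The point about context can be sketched in a few lines of Python. This is only an illustration under assumed names: the token kinds (IDENT, ASSIGN, LPAREN, ...) and the describe function are hypothetical, and a real parser handles far more shapes than these two.

```python
def describe(tokens):
    """Say what a short token sequence means, judging by its shape alone."""
    kinds = [kind for kind, _ in tokens]
    if kinds == ["IDENT", "ASSIGN", "STRING", "SEMI"]:
        return f"assign {tokens[2][1]} to variable {tokens[0][1]}"
    if kinds == ["IDENT", "LPAREN", "STRING", "RPAREN", "SEMI"]:
        return f"call {tokens[0][1]} with argument {tokens[2][1]}"
    raise SyntaxError(f"no rule matches {kinds}")


# The same STRING token appears in both sequences; only its neighbors differ.
greeting = ("STRING", '"hello"')
print(describe([("IDENT", "name"), ("ASSIGN", "="), greeting, ("SEMI", ";")]))
print(describe([("IDENT", "print"), ("LPAREN", "("), greeting, ("RPAREN", ")"), ("SEMI", ";")]))
```

The string literal is the single rock; the surrounding tokens are the formation that gives it meaning.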
Furthermore, you'll notice that the explorer in the story just makes the map but doesn't really do anything with it. The map exists simply to make it easier for the other researchers to find their way through the forest. In the same way, ANTLR doesn't create a fully functional compiler for you; it just provides a structure for navigating your way through a program's textual representation, so that you can focus less on the specifics of language construction and more on what the language is intended to do.
tl;dr: ANTLR is a tool for generating parsers for any formally specified language.
Explanation:
What is a grammar?
Programming languages have a grammar that describes the terms of the language. For example, Java has expressions, variables, classes, methods, operators, and lambdas. A grammar for Java tells us how these terms combine and what is expressible using them. E.g. int = i 3; is not expressible in Java's grammar. (By contrast, int i = 3.4; is grammatically valid; it is rejected later, by the type checker, not by the grammar.)
What is a parser?
A parser is a tool for converting text, i.e. code, into a form that allows a compiler to understand what the terms mean. E.g. int i = 4; will be converted by a parser into a form like Assignment(Variable(name: i, type: int), Value(4)). Notice that this looks like a tree. It is called an Abstract Syntax Tree (AST), an intermediate form that is easier for the compiler or interpreter to make use of.
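As a rough illustration of that conversion, the Python sketch below turns an already-tokenized int i = 4; into the tree shape described above. The class names mirror the Assignment/Variable/Value form from the text; the token kinds and the parse_declaration function are assumptions for this example, and it only understands this one statement shape.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str
    type: str


@dataclass
class Value:
    literal: int


@dataclass
class Assignment:
    target: Variable
    value: Value


def parse_declaration(tokens):
    """Parse tokens shaped like: KEYWORD IDENT ASSIGN INT SEMI."""
    expected = ["KEYWORD", "IDENT", "ASSIGN", "INT", "SEMI"]
    kinds = [kind for kind, _ in tokens]
    if kinds != expected:
        raise SyntaxError(f"expected {expected}, got {kinds}")
    type_name, name, _, literal, _ = (text for _, text in tokens)
    return Assignment(Variable(name=name, type=type_name), Value(int(literal)))


# Tokens for the source text: int i = 4;
tokens = [("KEYWORD", "int"), ("IDENT", "i"), ("ASSIGN", "="), ("INT", "4"), ("SEMI", ";")]
print(parse_declaration(tokens))
# Assignment(target=Variable(name='i', type='int'), value=Value(literal=4))
```

The nesting of Variable and Value inside Assignment is what makes the result a tree rather than a flat list of tokens.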
Why do we need grammars?
A grammar tells the parser how to convert the text into a form that the computer can understand (the AST). You can write a parser in many ways; many compilers use hand-written parsers for performance.
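One way to picture a grammar driving a parser is to write the grammar as plain data and let a generic function follow it. The Python sketch below does that for a toy grammar. The rule names, token kinds, and the parse function are invented for illustration, and the simple try-each-alternative strategy is a stand-in, not the algorithm ANTLR uses.

```python
# A toy grammar as data: each rule maps to a list of alternatives, and each
# alternative is a sequence of rule names (lowercase) or token kinds (uppercase).
GRAMMAR = {
    "statement":   [["declaration"], ["call"]],
    "declaration": [["KEYWORD", "IDENT", "ASSIGN", "value", "SEMI"]],
    "call":        [["IDENT", "LPAREN", "value", "RPAREN", "SEMI"]],
    "value":       [["INT"], ["STRING"]],
}


def parse(rule, tokens, pos=0):
    """Try each alternative of `rule` starting at tokens[pos].

    Returns (tree, next_pos) on success, or None if no alternative matches.
    """
    for alternative in GRAMMAR[rule]:
        children, cursor = [], pos
        for symbol in alternative:
            if symbol in GRAMMAR:  # a nested rule: recurse
                result = parse(symbol, tokens, cursor)
                if result is None:
                    break
                subtree, cursor = result
                children.append(subtree)
            elif cursor < len(tokens) and tokens[cursor][0] == symbol:
                children.append(tokens[cursor])  # a token of the expected kind
                cursor += 1
            else:
                break
        else:  # every symbol in this alternative matched
            return (rule, children), cursor
    return None


tokens = [("KEYWORD", "int"), ("IDENT", "i"), ("ASSIGN", "="), ("INT", "4"), ("SEMI", ";")]
tree, end = parse("statement", tokens)
print(tree)
```

Changing the language here means editing GRAMMAR, not the parse function, which is the same division of labor a grammar file gives you.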
What does ANTLR do and how does it help?
ANTLR is a tool for easily generating parsers for your own custom languages. All you need for generating a parser using ANTLR is a grammar file. ANTLR converts the grammar file into generated classes (Java by default) that do the parsing. ANTLR can also generate visitor (and listener) classes that follow the Visitor pattern, which means you can then add custom behavior for each rule in your grammar.
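The Visitor pattern itself can be sketched without ANTLR. In the Python example below, the node classes (Number, BinaryOp) and the visit_<ClassName> dispatch convention are assumptions chosen for illustration; they are not ANTLR's generated API, only the same idea in miniature: the tree stays the same, and each visitor adds its own behavior per node type.

```python
import operator
from dataclasses import dataclass


@dataclass
class Number:
    value: int


@dataclass
class BinaryOp:
    op: str
    left: object
    right: object


class Visitor:
    """Dispatch to a visit_<NodeClass> method, one per node type."""

    def visit(self, node):
        method = getattr(self, f"visit_{type(node).__name__}")
        return method(node)


class Evaluator(Visitor):
    """Custom behavior #1: compute the value of the expression."""

    OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

    def visit_Number(self, node):
        return node.value

    def visit_BinaryOp(self, node):
        return self.OPS[node.op](self.visit(node.left), self.visit(node.right))


class Printer(Visitor):
    """Custom behavior #2: render the same tree as text."""

    def visit_Number(self, node):
        return str(node.value)

    def visit_BinaryOp(self, node):
        return f"({self.visit(node.left)} {node.op} {self.visit(node.right)})"


# Tree for: 1 + 2 * 3
tree = BinaryOp("+", Number(1), BinaryOp("*", Number(2), Number(3)))
print(Printer().visit(tree))    # (1 + (2 * 3))
print(Evaluator().visit(tree))  # 7
```

Adding a third behavior, say a type checker, would mean writing another Visitor subclass and leaving the tree classes untouched.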
Here is an example ANTLR grammar for a scientific calculator: calculator.g4
Conclusion
In general, if you are creating a domain-specific language, it is nice to formally specify the grammar. And once you have the grammar, ANTLR makes it very easy to generate a custom parser without writing the parsing code by hand.
There is a lot more to discuss about the parsing algorithm that ANTLR uses, LL(*), and about when to use or not use ANTLR.
Thanks Casey.
Thanks Raunak.