Alright, as far as I know, there are two general types of parsers, top down, and bottom up. Bottom up is more performant iirc, but is harder to get used to, while top down is less performant (O(n3) if you know what I mean, and iirc), but I hear that they're much simpler to work with.
Ohh before I elaborate further, I'll need to know what sorts of systems and tools you're used to working with. What language, OS, compiler?
In grad school I programmed in C/C++ for 5 years, now I'm mostly Java and R, but I'm teaching myself Haskell, as well. I work on Windows, Ubuntu, and macOS. If you're asking which compiler I use to compile my own code, just gcc or whatever's available. I haven't yet looked into anything for my language project.
I found this article while researching this and it mentions yacc, which I have heard of. I wasn't aware that bison is its successor. I don't know how yacc works, though, I just recognise the name.
Before you move forward with your project I strongly suggest you study Julia (as a replacement for R) and Elixir (as a replacement for Java). They're more modern languages so they've worked through a lot of issues in their recent, early pre 1.0 phases and have thought, hard and heatedly, about a lot of things you're looking at. Both are functional, and both are dynamic (I think strict adherence to staticity is a bit of a religion: Julia kills the idea that static = performance, because of its amazing compiler strategy, and Elixir + dialyzer kills the idea that static = correctness - I basically have no runtime errors with static typechecking engine, and the rock solid dynamic stability of OTP makes it way worth it - I'm a a relatively inexperienced dev, they just promoted me from junior directly to senior =) with a lot of freedom and in elixir I wrote a testing harness in three days for a senior coworker's go program (6 months in the making) that triggered both linux and go program panics/errors, while my harness (living on the same node and dispatching and monitoring tens of thousands of concurrent processes) was fine.
Moreover, both PLs focus on developer productivity and joy. I don't know if that's what you're optimizing for, but I suggest learning from some of the best!
These would be converted into a Finite State Automata that can parse a string in LL(1) or O(n). It's basically a program that takes one character after another and changes state depending on the character. Each state can accept different characters leading to different states. This way, a state can encompass multiple non-terminals.
Let's say you define a language as having members "hero" and "hello".
MyLang <- hello | hero
The way to parse a given string without backtracking or lookahead is to construct this finite state machine. Opening state accepts an h, leading to state 1. State 1 accepts e, leading to state 2. State 2 accepts l, leading to state 3, or r leading to state 4. State 3 accepts l, leading to state 5, which accepts o, leading to a state which it is valid to end on. State 4 also accepts an o and is valid to end on. In this case you could reuse state 4 for 5 if desired.
I don't know what any of this means. (Sorry, I'm new to language design.) Can you elaborate?
Alright, as far as I know, there are two general types of parsers, top down, and bottom up. Bottom up is more performant iirc, but is harder to get used to, while top down is less performant (O(n3) if you know what I mean, and iirc), but I hear that they're much simpler to work with.
Ohh before I elaborate further, I'll need to know what sorts of systems and tools you're used to working with. What language, OS, compiler?
In grad school I programmed in C/C++ for 5 years, now I'm mostly Java and R, but I'm teaching myself Haskell, as well. I work on Windows, Ubuntu, and macOS. If you're asking which compiler I use to compile my own code, just
gcc
or whatever's available. I haven't yet looked into anything for my language project.I found this article while researching this and it mentions
yacc
, which I have heard of. I wasn't aware thatbison
is its successor. I don't know howyacc
works, though, I just recognise the name.Before you move forward with your project I strongly suggest you study Julia (as a replacement for R) and Elixir (as a replacement for Java). They're more modern languages so they've worked through a lot of issues in their recent, early pre 1.0 phases and have thought, hard and heatedly, about a lot of things you're looking at. Both are functional, and both are dynamic (I think strict adherence to staticity is a bit of a religion: Julia kills the idea that static = performance, because of its amazing compiler strategy, and Elixir + dialyzer kills the idea that static = correctness - I basically have no runtime errors with static typechecking engine, and the rock solid dynamic stability of OTP makes it way worth it - I'm a a relatively inexperienced dev, they just promoted me from junior directly to senior =) with a lot of freedom and in elixir I wrote a testing harness in three days for a senior coworker's go program (6 months in the making) that triggered both linux and go program panics/errors, while my harness (living on the same node and dispatching and monitoring tens of thousands of concurrent processes) was fine.
Moreover, both PLs focus on developer productivity and joy. I don't know if that's what you're optimizing for, but I suggest learning from some of the best!
Thanks, Isaac! I'll have to explore a few more languages before I make a decision one way or the other. I'll definitely check out Julia and Elixir.
A PEG is a parsing expression grammar. It is a way of specifying all valid strings in a language. It looks kinda like this
These would be converted into a Finite State Automata that can parse a string in LL(1) or O(n). It's basically a program that takes one character after another and changes state depending on the character. Each state can accept different characters leading to different states. This way, a state can encompass multiple non-terminals.
Let's say you define a language as having members "hero" and "hello".
The way to parse a given string without backtracking or lookahead is to construct this finite state machine. Opening state accepts an
h
, leading to state 1. State 1 acceptse
, leading to state 2. State 2 acceptsl
, leading to state 3, orr
leading to state 4. State 3 acceptsl
, leading to state 5, which acceptso
, leading to a state which it is valid to end on. State 4 also accepts ano
and is valid to end on. In this case you could reuse state 4 for 5 if desired.Bison and Yaac are tools that allow you to write a grammar and it will spit out a finite state machine like I described.