I've tried to roll my own parsers for DSLs back in the distant past and usually made something monstrous, so good on you for doing it properly!
One clarification: most parsers do not use regexes for everything. A regex-powered parser wouldn't even be able to handle HTML since that's not a regular language. PEG lets you define what certain tokens look like with regexes, but that governs lexing (breaking up a text into actionable tokens) rather than parsing, which structures tokens into a syntax tree or other usable form. It's spelling vs grammar: you can assemble valid tokens into meaningless instructions, like if you try to use infix arithmetic on an RPN calculator.
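To make the spelling-vs-grammar split concrete, here is a minimal sketch in Python. The token pattern and the `lex` and `eval_rpn` names are assumptions made up for this illustration, not anything from the article. The regex only decides what counts as a valid token; the RPN evaluator is what decides whether the token order makes sense.

```python
import re

# Hypothetical token pattern: optional whitespace, then a number or an operator.
TOKEN_RE = re.compile(r"\s*(?:(?P<num>\d+(?:\.\d+)?)|(?P<op>[-+*/]))")

def lex(text):
    """Lexing ("spelling"): split raw text into (kind, value) tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unrecognized input at position {pos}")
        if m.group("num") is not None:
            tokens.append(("num", float(m.group("num"))))
        else:
            tokens.append(("op", m.group("op")))
        pos = m.end()
    return tokens

def eval_rpn(tokens):
    """Structure check ("grammar"): operators must follow their two operands."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = []
    for kind, value in tokens:
        if kind == "num":
            stack.append(value)
        else:
            if len(stack) < 2:
                raise SyntaxError(f"operator {value!r} needs two operands first")
            b = stack.pop()
            a = stack.pop()
            stack.append(ops[value](a, b))
    if len(stack) != 1:
        raise SyntaxError("expression left extra operands on the stack")
    return stack[0]
```

With this sketch, `lex("3 + 4")` succeeds because every token is spelled correctly, but `eval_rpn(lex("3 + 4"))` raises `SyntaxError` because the order is infix. `eval_rpn(lex("3 4 +"))` evaluates fine.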
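In practice, the relatively simple LL parsers work top-down and are often written as recursive descent, where the call stack mirrors the program structure. The more powerful but more complex LR parsers work bottom-up, driving a stack with a table-based state machine.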
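For the LL side, a hand-written recursive-descent parser is probably the easiest thing to picture. This is a rough sketch over a hypothetical arithmetic grammar invented for illustration; the `Parser` class and its method names are assumptions, not from any library. Each grammar rule becomes a method, and Python's call stack keeps track of the nesting.

```python
import re

def tokenize(text):
    # Simplified lexer for the sketch: silently skips anything it doesn't recognize.
    return re.findall(r"\d+|[-+*/()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # One token of lookahead, or None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        tok = self.take()
        if tok == "(":
            node = self.expr()  # recursion handles arbitrary nesting
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree
```

Here `parse("1 + 2 * (3 - 4)")` returns the nested tuple `('+', 1, ('*', 2, ('-', 3, 4)))`. An LR parser would build the same tree bottom-up from generated tables, which is why those usually come out of a tool rather than being written by hand.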
Thank you. Good point on the regexes and parsers; I'll update the article to fix that.
Great clarification on the difference between lexing and parsing. PEGs just mash the two concepts together into a single file, which is fine for simple grammars but can quickly become problematic for more complex ones.
I've written my own parsers and found it quite tedious. Are there any tools you'd recommend for writing parsers? I've looked at YACC and ANTLR but didn't get very far; I might revisit them in the future.
I've only looked into those two. The last time I tried to do any sort of language tinkering like this was years ago. I got halfway through building a grammar, realized I'd just invented a crappier LISP, and promptly gave up.