Writing a DSL parser using PegJS

Barry O Sullivan on February 12, 2018

In the previous article I wrote about Domain Specific Languages (DSLs) and how useful they are, but I didn't get into the details of parsing them...
 

I've tried to roll my own parsers for DSLs back in the distant past and usually made something monstrous, so good on you for doing it properly!

One clarification: most parsers do not use regexes for everything. A regex-powered parser wouldn't even be able to handle HTML since that's not a regular language. PEG lets you define what certain tokens look like with regexes, but that governs lexing (breaking up a text into actionable tokens) rather than parsing, which structures tokens into a syntax tree or other usable form. It's spelling vs grammar: you can assemble valid tokens into meaningless instructions, like if you try to use infix arithmetic on an RPN calculator.
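To make the spelling-vs-grammar point concrete, here is a minimal Ruby sketch. The names `lex` and `eval_rpn` are made up for illustration and are not taken from PegJS or the article. The lexer uses regexes to decide what a valid token looks like, and a tiny RPN evaluator stands in for the parser by checking that the tokens arrive in an order that makes sense.

```ruby
require 'strscan'

# Lexing: regexes decide what a single valid token looks like.
# (Hypothetical helper, not part of any library.)
def lex(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next
    elsif (number = scanner.scan(/\d+/))
      tokens << [:number, number.to_i]
    elsif (op = scanner.scan(%r{[-+*/]}))
      tokens << [:operator, op]
    else
      raise ArgumentError, "unexpected character at position #{scanner.pos}"
    end
  end
  tokens
end

# Parsing (folded into evaluation to keep the sketch short):
# every token may be valid on its own, but the order still has to make sense.
def eval_rpn(tokens)
  stack = []
  tokens.each do |kind, value|
    if kind == :number
      stack.push(value)
    else
      raise ArgumentError, "operator #{value} needs two operands" if stack.size < 2
      right = stack.pop
      left = stack.pop
      stack.push(left.public_send(value, right))
    end
  end
  raise ArgumentError, 'expression left extra operands' unless stack.size == 1
  stack.first
end
```

With these definitions, `eval_rpn(lex('3 4 +'))` returns 7, while `eval_rpn(lex('3 + 4'))` raises an `ArgumentError` even though `lex('3 + 4')` succeeds: the spelling is fine, the grammar is not.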

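In practice, the relatively simple LL parsers work top-down and are often written as recursive descent, where the call stack mirrors the program structure, while the more powerful but more complex LR parsers work bottom-up, driving a state machine over an explicit stack.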

 

Thank you. Good point on the regexes and parsers, I'll update the article to fix that.

Great clarification on the difference between lexing and parsing. PEGs just mash the two concepts together into a single grammar file, which is fine for simple grammars but can quickly become problematic for more complex ones.

I've written my own parsers and found it quite tedious. Are there any tools you'd recommend for writing parsers? I've looked at YACC and ANTLR but didn't get very far; I might revisit them in future.

 

I've only looked into those two. The last time I tried to do any sort of language tinkering like this was years ago. I got halfway through building a grammar, realized I'd just invented a crappier LISP, and promptly gave up.

 

PegJS and other parser generators are great when you want to parse something that is not amenable to regular expressions, but please don't use them to make more external DSLs. For everything you could use an external DSL for, you could just as well have used a small library or framework in an actual programming language. External DSLs don't come with syntax highlighting, linting, or error checking, and they basically throw away the years of effort that actual programming languages have invested in their tooling. We don't need more external DSLs. If you must make a DSL for some reason, then use a language that is suited to internal DSLs, like Ruby.

The example in the post

User.ScheduleAppointment has { 
  a UserId userId 
  an AppointmentDatetime appointmentDatetime
  a Location location from {
    a LocationName locationName from location
    a Latitude latitude
    a Longitude longitude
  }
}

could just as easily have been written as a small internal DSL in Ruby, or any other language for that matter, without losing the benefits of syntax highlighting and modern IDE support:

describe command: 'User.ScheduleAppointment' do
  field('userId', 'UserId')
  field('appointmentDatetime', 'AppointmentDatetime')
  field('location', 'Location', [
    field('locationName', 'LocationName', ['location']),
    field('latitude', 'Latitude'),
    field('longitude', 'Longitude')
  ])
end
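For anyone wondering what would sit behind that snippet, here is one possible sketch of the supporting code. `Field`, `CommandBuilder`, and the return shape of `describe` are all assumptions made for illustration; nothing here comes from the original post or from an existing library.

```ruby
# Hypothetical plumbing for the internal DSL above.
Field = Struct.new(:name, :type, :children)

class CommandBuilder
  attr_reader :fields

  def initialize
    @fields = []
  end

  # Ruby evaluates arguments before the call, so nested field(...) calls run
  # first and register themselves at the top level. When they show up as
  # children of another field, move them under that parent.
  def field(name, type, children = [])
    nested = children.select { |child| child.is_a?(Field) }
    @fields.reject! { |existing| nested.any? { |child| child.equal?(existing) } }
    node = Field.new(name, type, children)
    @fields << node
    node
  end
end

# Runs the block against a fresh builder and returns a plain hash.
def describe(command:, &block)
  builder = CommandBuilder.new
  builder.instance_eval(&block)
  { command: command, fields: builder.fields }
end

schema = describe command: 'User.ScheduleAppointment' do
  field('userId', 'UserId')
  field('appointmentDatetime', 'AppointmentDatetime')
  field('location', 'Location', [
    field('locationName', 'LocationName', ['location']),
    field('latitude', 'Latitude'),
    field('longitude', 'Longitude')
  ])
end
```

The result is ordinary Ruby data (`schema[:fields]` is an array of `Field` structs), so it can be inspected with `p`, stepped through in a debugger, and checked by the usual linters.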

Moreover, internal DSLs are much easier to debug when issues inevitably arise. External DSLs often don't have any kind of debugging capabilities, so users are forced to jump through all sorts of hoops to debug an issue.
