DEV Community


Discussion on: What is a project, you are really proud of?

mayank joshi Author

So, from what I read on GitHub, all I understood is that it parses PDF and plain-text files and generates a machine-parsable XML file.

Parsinator allows you to extract relevant information from any text-based file.

By this line, do you mean that it only parses the main content, the actual information, and ignores lots of unnecessary details like dates and page numbers?
If so, there must be lots and lots of edge cases you have taken care of.

Tell me if my understanding is wrong.

I need to dig deeper now, as it sounds interesting.

Cesar Aguirre

Yes, you're right. The idea was to extract the relevant information from a PDF or plain-text file using a set of composable rules, or "parsers". It was heavily inspired by parser combinators from Haskell. The main use case was: given a PDF of a multi-page invoice, create an XML file to feed a REST API.
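
To make "composable rules" a bit more concrete, here is a minimal sketch in Python. It is not Parsinator's actual API: the names `from_regex`, `all_of`, and `to_xml`, the sample invoice lines, and the XML element names are all made up for illustration. It only shows the general shape of the idea, where small parsers each pick out one field, a combinator merges them, and lines that no rule matches (like a page number) are simply ignored.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional

# Hypothetical model: a "parser" is any function that takes the lines of a
# page and returns a dict of extracted fields, or None when nothing matches.
Parser = Callable[[List[str]], Optional[Dict[str, str]]]


def from_regex(key: str, pattern: str) -> Parser:
    """Build a parser that captures group 1 on the first matching line."""
    compiled = re.compile(pattern)

    def parse(lines: List[str]) -> Optional[Dict[str, str]]:
        for line in lines:
            match = compiled.search(line)
            if match:
                return {key: match.group(1)}
        return None

    return parse


def all_of(*parsers: Parser) -> Parser:
    """Combine parsers: run each one and merge whatever fields they find."""

    def parse(lines: List[str]) -> Optional[Dict[str, str]]:
        result: Dict[str, str] = {}
        for parser in parsers:
            found = parser(lines)
            if found:
                result.update(found)
        return result or None

    return parse


def to_xml(root_name: str, fields: Dict[str, str]) -> str:
    """Serialize the extracted fields as a flat XML element."""
    root = ET.Element(root_name)
    for key, value in fields.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


# Assumed invoice layout, invented for this example.
invoice_parser = all_of(
    from_regex("number", r"Invoice No\.\s*(\S+)"),
    from_regex("total", r"Total:\s*([\d.]+)"),
)

page = ["ACME Corp", "Invoice No. A-42", "Page 1 of 3", "Total: 19.99"]
fields = invoice_parser(page)
xml_output = to_xml("invoice", fields)
print(xml_output)
```

The "Page 1 of 3" line never reaches the output because no rule asks for it, which is how this style of parsing skips the unnecessary details: you describe what you want, not what to throw away.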
