Discussion on: Designing the ultimate (INCLUSIVE) writing tool. [Part 1 - a WYSIWYG in 745 Bytes! 😱]

How about using pegjs to build a parser to do the syntax highlighting? Sure it's a bit more work, but I'm pretty sure there's a starter grammar out there that could help... like this one

GrahamTheDev • Jul 25 '21

Thanks for the suggestion. I will keep pegjs bookmarked, as it could be useful if this evolves, but at the moment I want to understand every nut and bolt of what I am building.

I presume pegjs is purely a generator and I don't need it as a dependency once I have created a parser with it?

To be fair the point of the syntax highlighter was more to experiment with the headache that is live colouring on a WYSIWYG editor.

It is actually really complex to deal with if you don't cheat as I did, by stacking two perfectly aligned divs and hiding the content in the contenteditable div that you actually write in.

I am experimenting because, while I am still prototyping, an idea for one part can often be applied to another part.

For example, one thing I didn't mention in the article is trying to guess whether someone has added a non-English phrase to a sentence, like "c'est la vie", so that we can prompt them to add a <span lang="fr"> around the phrase and screen readers can announce it properly.

The RegEx parsing method used in the syntax highlighter would work nicely in that scenario, as I could just build a pipe-separated list of common English words and do some basic occurrence counting to highlight phrases that might be in another language (or at least that is my first thought of how to do it...that could also change!).
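As a rough sketch of that idea (not the code from the article): the word list, the function names `englishRatio` and `looksForeign`, and the 0.2 threshold are all assumptions made up for illustration, and a real list of common English words would need to be far longer.

```javascript
// Hypothetical list of very common English words; a real one would be much longer.
const COMMON_ENGLISH = [
  "the", "a", "an", "is", "it", "to", "of", "and",
  "that", "in", "we", "so", "can", "as", "say", "they",
];

// Build the pipe-separated pattern once; \b keeps matches on whole words.
const ENGLISH_RE = new RegExp(`\\b(?:${COMMON_ENGLISH.join("|")})\\b`, "gi");

// Rough share of the words in a phrase that are common English words.
function englishRatio(phrase) {
  const words = phrase.match(/[\p{L}']+/gu) || [];
  if (words.length === 0) return 1; // nothing to judge, treat as English
  const hits = phrase.match(ENGLISH_RE) || [];
  return hits.length / words.length;
}

// Flag a phrase as possibly non-English when few of its words are common English.
function looksForeign(phrase, threshold = 0.2) {
  return englishRatio(phrase) < threshold;
}

looksForeign("c'est la vie"); // true, none of the three words are in the list
looksForeign("as they say");  // false, every word is in the list
```

Short phrases made of uncommon English words would also be flagged, so this is only good enough to prompt the writer, not to decide the language for them.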

Also, just having an array of RegExes and corresponding <span> outputs seems like a really simple way to cover basic word lookup for swear words, racist language etc., where the check isn't sensitive to the context of the document.
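A minimal sketch of what such an array could look like, assuming each rule pairs a RegExp with a CSS class for its <span>. The `RULES` entries, the class names and the `highlight` function are placeholders invented for this example, not the article's code.

```javascript
// Escape text before wrapping it so user input cannot inject markup.
const escapeHtml = (s) =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

// Hypothetical rules: the word lists are stand-ins, not a real vocabulary.
const RULES = [
  { re: /\b(?:darn|heck)\b/gi, className: "swear" },
  { re: /\b(?:obviously|simply)\b/gi, className: "condescending" },
];

// Wrap every match of every rule in a <span> carrying that rule's class.
// Rules run in order, so a later rule could match markup added by an
// earlier one; acceptable for a prototype, but worth knowing about.
function highlight(text) {
  let html = escapeHtml(text);
  for (const { re, className } of RULES) {
    html = html.replace(re, (m) => `<span class="${className}">${m}</span>`);
  }
  return html;
}

highlight("Well heck, simply <b>");
// 'Well <span class="swear">heck</span>, <span class="condescending">simply</span> &lt;b&gt;'
```

Adding a new category is then just one more entry in the array.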

One thing I learned in all this is that a prebuilt, pipe-separated RegEx is apparently really efficient when used with .match on a string to find multiple needles in a haystack.
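For illustration, here is one way the "prebuilt piped RegEx plus .match" approach could look, with the occurrence counting on top. `buildMatcher` and `countNeedles` are hypothetical helper names, and nothing here measures the efficiency claim; whether it beats a simple loop will depend on the engine and the size of the list.

```javascript
// Escape characters that have a special meaning inside a RegExp.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Build the alternation once and reuse it for every haystack.
// Longer needles go first so a phrase wins over a shorter word it starts with.
function buildMatcher(needles) {
  const sorted = [...needles].sort((a, b) => b.length - a.length);
  return new RegExp(`\\b(?:${sorted.map(escapeRegExp).join("|")})\\b`, "gi");
}

// Count how often each needle occurs, ignoring case.
function countNeedles(haystack, matcher) {
  const counts = new Map();
  for (const hit of haystack.match(matcher) || []) {
    const key = hit.toLowerCase();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const matcher = buildMatcher(["foo", "bar"]);
countNeedles("Foo bar foo baz", matcher); // Map with foo => 2, bar => 1
```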

All very much a work in progress and suggestions like the above are always welcome!

Mike Talbot ⭐ • Jul 25 '21

I'm interested to see how you proceed with this for sure :) PegJS produces a parser that is a standalone JS file - the rules of Peg are very similar to regexes (though no backtracking etc) so that's what brought it to mind.

I also wonder about training a model to recognise subsections of the document etc., though you are right, categorical searches for racist or swear words would also make sense.

GrahamTheDev • Jul 26 '21

Yeah, the model training bit is way beyond my current abilities (which is why I have shelved grammar suggestions for now, as trying to do that with simple algorithms seems very difficult!).

This project does provide some really interesting challenges and I don't think I can ever make a perfect solution. But who knows, maybe I can produce something useful enough that it can be turned into a paid service and I can pay someone smarter than me to handle the scary stuff! 😋🤣

Thanks once again for the suggestion. I will have a play with PegJS at some point in the future as it is interesting from a learning perspective, especially that parser example you linked to. I am still scratching my head on some of that!