James Carlson

Posted on Aug 28, 2018

Why Elm is the right tool to build MiniLatex

#elm #latex #minilatex #parsing

LaTeX is the de facto markup language for writing technical articles with lots of mathematics in them: research articles, class notes, etc. In this post, I will describe how I used Elm to build a live-rendering web app for LaTeX. You can try out the demo at MiniLatex Live, which is pictured below.

About LaTeX

A few words about LaTeX. The infrastructure for rendering it in print via PDF is excellent and well-established. For the web, there is only a partial solution. A LaTeX document is made of two kinds of text: math mode and text mode. The formulas are in math mode, and the rest (section headings, tables of contents, cross references, theorem environments, macros, plain old prose, etc.) is in text mode. MathJax or KaTeX can render the math mode material, but to my knowledge there has been to date no way of live rendering all of LaTeX. That is what MiniLatex, a 3K LOC Elm library, does. In brief, it uses a custom parser based on the elm/parser library to transform LaTeX source text into an abstract syntax tree (AST). Once one has an AST in hand, writing a renderer for the text mode material is fairly straightforward, with the math mode material delegated to MathJax.

The Beginning

I began working on the MiniLatex project just a little over a year ago, shortly after the Elm Europe conference in June 2017. That was when I realized that elm/parser may have the mojo to parse LaTeX. Happily, the parser lived in a framework for building Elm apps, so the combination was perfectly suited to my needs. Even now, a year later, the parser is only 300 lines of code in a 3K LOC library. But writing it was a real challenge for me: (a) I am a mathematician, not a computer scientist, and I knew only a little about parsing; (b) normally one begins with a grammar, then writes the parser, but there is no written grammar for LaTeX; (c) therefore, one has to reverse-engineer both a grammar and a parser; (d) it turns out that any grammar for any adequate subset of LaTeX must be context-sensitive; (e) therefore the standard parser-generator tools cannot be used.
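To make point (d) concrete, here is a hypothetical illustration, not taken from the MiniLatex source. An environment opened with \begin{theorem} must be closed by \end{theorem}, so what counts as a valid closing tag depends on text read earlier. A parser-combinator library can cope with this by parsing the name first and then building the rest of the parser from it. The sketch below assumes elm/parser's andThen for that step. The names Env, envName, envBody, and environment are invented for the example, and the body is restricted to text without backslashes to keep things short.

```elm
module EnvSketch exposing (Env, environment)

import Parser exposing ((|.), (|=), Parser)


-- Hypothetical result type: the environment name and its raw body.
type alias Env =
    { name : String, body : String }


-- Parse "\begin{name}" and return the name.
envName : Parser String
envName =
    Parser.succeed identity
        |. Parser.symbol "\\begin{"
        |= Parser.getChompedString (Parser.chompWhile Char.isAlpha)
        |. Parser.symbol "}"


-- Build the rest of the parser from the name just read:
-- only "\end{name}" with the same name is accepted.
envBody : String -> Parser Env
envBody name =
    let
        endTag =
            "\\end{" ++ name ++ "}"
    in
    Parser.succeed (Env name)
        -- Simplifying assumption: the body contains no backslashes.
        |= Parser.getChompedString (Parser.chompWhile (\c -> c /= '\\'))
        |. Parser.symbol endTag


environment : Parser Env
environment =
    envName |> Parser.andThen envBody
```

With this sketch, running environment on \begin{theorem}abc\end{theorem} succeeds, while \begin{theorem}abc\end{lemma} is rejected because the closing name does not match.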

Obstacles

Despite these obstacles, of which I was (fortunately) for the most part only dimly aware, I began work on the parser. The elm/parser library is a parser-combinator library. It gives tools for building up a parser in a sensible way from little pieces. This is ideal for experimentation, which is what I had to do in the absence of a grammar. The first step was to define an Elm type for the AST. Here it is:

type LatexExpression
    = LXString String
    | Comment String
    | Item LatexExpression
    | InlineMath String
    | DisplayMath String
    | SMacro String (List LatexExpression) (List LatexExpression) LatexExpression 
    | Macro String (List LatexExpression) (List LatexExpression) 
    | Environment String (List LatexExpression) LatexExpression
    | LatexList (List LatexExpression)
    | LXError (List DeadEnd)

Just 11 lines of code govern a 300-line parser, which is the heart of a 3000-line library. I did not write all 11 lines in one go. I wrote a few lines, tried to build a parser that would handle them, and then iterated.
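A first iteration of that kind might look like the following sketch, which handles only two of the constructors. It is an illustration under simplifying assumptions, not the MiniLatex parser: the cut-down type Expr and the names inlineMath, text, expr, exprs, step, and parse are hypothetical, and only $...$ math and plain text are recognized. The calls used (succeed, symbol, chompIf, chompWhile, getChompedString, oneOf, loop, end, run) come from elm/parser.

```elm
module ParserSketch exposing (Expr(..), parse)

import Parser exposing ((|.), (|=), Parser, Step(..))


-- Cut-down version of the AST: plain text and inline math only.
type Expr
    = LXString String
    | InlineMath String


-- "$ ... $": keep the text between the dollar signs.
inlineMath : Parser Expr
inlineMath =
    Parser.succeed InlineMath
        |. Parser.symbol "$"
        |= Parser.getChompedString (Parser.chompWhile (\c -> c /= '$'))
        |. Parser.symbol "$"


-- One or more characters up to the next dollar sign.
text : Parser Expr
text =
    Parser.succeed LXString
        |= Parser.getChompedString
            (Parser.succeed ()
                |. Parser.chompIf (\c -> c /= '$')
                |. Parser.chompWhile (\c -> c /= '$')
            )


-- Little pieces combined: try math first, then text.
expr : Parser Expr
expr =
    Parser.oneOf [ inlineMath, text ]


-- Repeat expr until the end of the input.
exprs : Parser (List Expr)
exprs =
    Parser.loop [] step


step : List Expr -> Parser (Step (List Expr) (List Expr))
step acc =
    Parser.oneOf
        [ Parser.succeed (Done (List.reverse acc))
            |. Parser.end
        , Parser.succeed (\e -> Loop (e :: acc))
            |= expr
        ]


parse : String -> Result (List Parser.DeadEnd) (List Expr)
parse source =
    Parser.run exprs source
```

Under these assumptions, parse "a $x$ b" gives Ok [ LXString "a ", InlineMath "x", LXString " b" ]. Adding a constructor to the type then means adding one more small parser to the list passed to oneOf.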

Slow going at first

The going was quite slow at first. Evan Czaplicki had brought out elm/parser, a parser combinator library. Although I had read a little about parser combinators, I had never used them. Thankfully, I had much generous assistance from the community on the Elm Slack, and Ilias van Peer, who hangs out there frequently, helped me over many, many rough spots. (Thank you, Ilias!) By and by I got the hang of it.

Over the next few months, things came together, and by the time of Elm Conf in September of 2017, I had a decent working prototype. But there was a problem. The parser was too slow! It could easily take 15 seconds to parse a few pages of text. I was chatting back and forth about this with Ilias, who discovered the bottleneck: characters were boxed into a JS object (as I understand it). The problem was likely to be fixed in the 0.19 version of the compiler, so in the meantime, following another of Ilias' suggestions, I implemented a diffing strategy that made incremental parse-render operations (after the initial, full-document parse) very fast.
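The details of the diffing strategy are not given here, so the following is only one plausible sketch of the idea. It assumes the document is split into paragraphs on blank lines, that each paragraph can be rendered independently, and that an edit usually touches a small region in the middle of the text. Paragraphs in the common prefix and suffix of the old and new versions keep their cached output, and only the middle is rendered again. The names Paragraph, commonPrefixLength, and rerender are hypothetical, and the render argument stands in for the full parse-and-render pipeline.

```elm
module DiffSketch exposing (Paragraph, rerender)

-- A paragraph of source text together with its cached rendering.


type alias Paragraph =
    { source : String, rendered : String }


-- Number of leading elements on which the two lists agree.
commonPrefixLength : List String -> List String -> Int
commonPrefixLength xs ys =
    case ( xs, ys ) of
        ( x :: xrest, y :: yrest ) ->
            if x == y then
                1 + commonPrefixLength xrest yrest

            else
                0

        _ ->
            0


-- Reuse cached paragraphs from the unchanged prefix and suffix;
-- call render only on the paragraphs in between.
rerender : (String -> String) -> List Paragraph -> String -> List Paragraph
rerender render old newSource =
    let
        newParagraphs =
            String.split "\n\n" newSource

        oldSources =
            List.map .source old

        p =
            commonPrefixLength oldSources newParagraphs

        -- Measured on what is left after the prefix, so that
        -- prefix and suffix can never overlap.
        s =
            commonPrefixLength
                (List.reverse (List.drop p oldSources))
                (List.reverse (List.drop p newParagraphs))

        middle =
            newParagraphs
                |> List.drop p
                |> List.take (List.length newParagraphs - p - s)
                |> List.map (\src -> { source = src, rendered = render src })
    in
    List.take p old ++ middle ++ List.drop (List.length old - s) old
```

When one paragraph of a long document changes, render is called once rather than once per paragraph, which is where the speedup after the initial full parse would come from in a scheme like this.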

0.19!!!

All of these efforts came together when the alpha of 0.19 came out. The elm/parser library had been rewritten and as a result the MiniLatex parser (which also had to be rewritten) was incredibly fast. That speedup, combined with the diffing strategy, made live-rendering of LaTeX documents a reality. I hadn't thought that this would be possible, but yes, it was!

Comments

Elm was my first experience with a typed functional language. It took me a while to understand the type system well, and also to appreciate its power. I now realize that this is one of Elm's great strengths. Of course, we as developers experience it in our daily work when we embark on an extreme refactoring operation and come out of the battle an hour or two later with everything working. (Thank you, compiler, for your helpful messages!) But in this project, the type system played a central role. The real beginning of MiniLatex was the moment when I wrote down a first version of the LatexExpression type. The code for the parser grew out of this type definition, and it has played a guiding role throughout.
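The same kind of type can guide a renderer as well: a case expression with one branch per constructor, which the compiler checks for exhaustiveness. Here is a minimal sketch, assuming a cut-down version of the type (a two-argument Macro, no environments) and string output rather than Html. The names Expr, render, and renderAll are hypothetical, and HTML escaping is omitted. Math is wrapped in \( \) and \[ \] delimiters, which MathJax recognizes by default, so a MathJax pass over the page can typeset it.

```elm
module RenderSketch exposing (Expr(..), render)

-- Cut-down AST for illustration; not the full LatexExpression type.


type Expr
    = LXString String
    | InlineMath String
    | DisplayMath String
    | Macro String (List Expr)
    | LatexList (List Expr)


-- One branch per constructor. Removing a branch makes the
-- compiler report the missing case.
render : Expr -> String
render expr =
    case expr of
        LXString s ->
            s

        -- Math is not rendered here; it is only delimited for MathJax.
        InlineMath s ->
            "\\(" ++ s ++ "\\)"

        DisplayMath s ->
            "\\[" ++ s ++ "\\]"

        Macro "emph" args ->
            "<em>" ++ renderAll args ++ "</em>"

        -- Fallback for macros without a dedicated branch.
        Macro name args ->
            "<span class=\"" ++ name ++ "\">" ++ renderAll args ++ "</span>"

        LatexList exprs ->
            renderAll exprs


renderAll : List Expr -> String
renderAll exprs =
    String.concat (List.map render exprs)
```

Adding a constructor to the type without adding a branch to render is a compile error, which is one way a type definition keeps a parser and a renderer in step during a large refactoring.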

The success of the MiniLatex project was made possible by the generous help of the community on the Elm Slack, and by Ilias in particular. I would also like to thank Evan Czaplicki and Luke Westby, who helped me at some critical points in the transition from 0.18 to 0.19. It has been a joy to work with Elm, and I would like to give a big shout-out of appreciation to Evan and the core team. Bravo! What a fantastic thing you have made!!

Caveat

MiniLatex is two things: a proper subset of LaTeX and a parser-renderer for that subset. The subset, which is of course subject to expansion as time and energy permit, is nonetheless adequate to write some pretty heavy lecture notes. MiniLatex documents can also be exported and run through standard tools like pdflatex.

Note

MiniLatex is embedded in an app at knode.io which provides a live editing environment, a searchable repository where the documents you write are stored, an image uploader, and a searchable image catalogue.

Top comments (5)

DrBearhands • Aug 28 '18

That's really cool man.

I'm curious though, what's the reason for sticking with LaTeX syntax? I've never been a fan personally, many things seemed poorly thought out.

James Carlson • Aug 28 '18

Thanks so much!

LaTeX, for better or for worse, is what mathematicians, physicists, etc. use to write research articles. That is the community I am addressing. The app at knode.io will also handle Asciidoc and Markdown, by the way. But that is for a different audience.

DrBearhands • Aug 29 '18

That's a good reason :-)

I'd personally like to see more LaTeX-inspired programming languages. It's no fun reading Chebyshev's inequality by writing out every function name... even worse when it's a pass-by-reference language so every function needs to have an output argument to avoid memory allocation.

Maybe this will help in that respect.

James Carlson • Aug 29 '18

I may look into this down the road, now that I know how to make a parser. But I'll be occupied getting this off the ground for a good while. Thanks for the feedback!

One thing one has to take into account, however, is human behavior. Essentially all mathematicians use & know LaTeX. It is like a second language to them. Getting busy people to adopt something different is very difficult. They would have to see a huge benefit, and even then, I think it would be a lift. Example: QWERTY vs Dvorak. We'll see!!

James Carlson • Aug 29 '18

By the way, I read your monads post. Very nice & very clear.