DEV Community

Matouš Borák


Building a syntax highlighting extension for VS Code

I spent a few days of my spare time building a VS Code extension that would bring better syntax highlighting for the Slim template language to the editor. I quite enjoyed most of the process so I’d like to share what I learned.

Why?

First of all, I like Slim. I like the beauty and cleanliness of Slim templates; to me they are way more readable than regular ERB templates, and I think they fit the Ruby/Rails ecosystem very well. Slim is a close cousin of Haml, without the ugly percent characters, haha. I've used Slim exclusively in my projects since about 2016.

But why bother building a new syntax highlighting extension for VS Code, especially when a fairly popular one already exists? To answer that, I need to add a bit more personal context (short, I promise)… As a long-time and mostly happy user of the RubyMine IDE from JetBrains, I recently noticed that I had somehow gotten used to a subdued, passive role: when I encountered a bug or a missing feature in the editor, I could of course file an issue in the tracker but – that was about it. As RubyMine is closed-source and the plugin ecosystem was hard for me to grasp, I never found the courage to dive deep into building a custom plugin (actually, I did try hard once – and failed – many years ago, and I can see that things might be a bit easier now with Kotlin, the Gradle toolkit and, finally, the recent LSP support). So, I waited patiently for the developers to notice the issue and fix it. And sometimes they did! But more often they didn't, which is quite understandable for a niche bug or a niche language like Slim.

Now, fast forward to last year's Rails World conference, which I was lucky enough to attend. What a breath of fresh air! Among the many, many inspiring people, talks and presentations, I noticed one thing: most people use VS Code, some use Vim but – more importantly – a lot of people tweak their editor / IDE almost as routinely as they tweak the code they work on professionally! And I thought: I want that too, how come I've lost this mindset? I’ve taken for granted that I can tweak every imaginable aspect of my Linux OS as well as the GNOME environment, so why not my IDE, the program I literally spend most hours of the day in? That was the final nudge for me to try to switch to something – anything, really – that I could feasibly tweak, and that’s how I ended up in VS Code. I’m not saying this will be my final IDE destination (looking at you, Zed, Fleet or perhaps even Vim), but I know I want to stay close to an editor with an active developer community around it.

OK, but why another Slim syntax extension?

After the switch, an opportunity for a small initial tweak came almost immediately: I didn't like the syntax highlighting of the Slim templates in our project. It almost resembled the old days when RubyMine highlighted slashes in some Tailwind classes as ugly errors (to be fair, they fixed that a long time ago):

An old version of RubyMine marking a Tailwind class with a slash as an error

The only Slim extension available for VS Code at that time had somewhat similar problems: it couldn’t recognize slashes in class names, it did not understand multi-line comments, attribute values and expressions, and it lacked support for some embedded language blocks such as Markdown or SASS.

Slim extension having problems with a multi-line attribute

I said to myself: it couldn’t be that hard to fix, could it? Well… When I looked at the repository, I soon found out that it wouldn’t be that easy: the grammar was written in a barely comprehensible XML format (PList) and looked like a bare copy from somewhere else, there were no comments or helpful tips anywhere and, above all, the whole thing was basically unmaintained: the last changes were several years old, no issues were resolved, no PRs merged. I’m not judging here (of course I have made several “zombie projects” myself, too), I just hope it’s clear now why I had to go down the rabbit hole and rebuild the extension from the ground up.

Down the rabbit hole

The official Syntax Highlighting Guide was very nice and helpful. It was not hard to create a blank grammar extension. What turned out to be much harder was the grammar syntax itself, partly because there was no formal specification available. The official guide showed a basic grammar example but mostly linked to a few external articles, such as the TextMate documentation on language grammars, from which the VS Code grammars originate.

The most helpful resource for me, in the end, was this post by Matt Neuburg. It is a long one, and I loved how honest it was about the time investment needed to dive deep into writing language grammars:

Excerpt from Matt Neuburg’s article about grammars

Strangely enough, it got me hooked: it looked like a fun thing to try! 🙂

Regular expressions

I quickly learned that I needed to invest in one more thing before trying to actually edit the files: regular expressions. All grammar matching is based on them. I refreshed the basics and thoroughly re-read the chapters on lookahead and lookbehind, which turned out to be tremendously useful.
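To make the two constructs concrete, here is a minimal Ruby sketch with hypothetical patterns (not taken from any published Slim grammar). Slim treats a line starting with / as a code comment, /! as an HTML comment and /[...] as an IE conditional comment; a negative lookahead tells the first from the other two without consuming any characters, and a lookbehind grabs class names that follow a dot without including the dot itself.

```ruby
# Hypothetical patterns for illustration only.

# Negative lookahead: a "/" at the start of a line (after optional indentation)
# is a code comment only if it is NOT followed by "!" or "[".
CODE_COMMENT = %r{\A\s*/(?![!\[])}

# Lookbehind: a class name must be preceded by a dot, but the dot itself is not
# part of the match (handy when the dot gets its own punctuation scope).
CLASS_NAME = /(?<=\.)[\w-]+/

[
  "/ a code comment",
  "/! an HTML comment",
  "/[if IE] a conditional comment",
  "  / an indented code comment",
].each do |line|
  puts "#{line.inspect}: code comment? #{CODE_COMMENT.match?(line)}"
end

p "div.card.w-full Hello".scan(CLASS_NAME) # => ["card", "w-full"]
```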

As a Ruby developer, I was happy to find that VS Code / TextMate grammar files use Oniguruma, the regular expression engine that Ruby’s own engine (Onigmo) is forked from. Thus, I could be fairly sure that when trying my regular expressions in my favorite online regex tool, rubular.com, there would be hardly any inconsistencies due to the engine's inner workings.

Unit-testing the beast

Oh, one more thing – and actually the most important one – unit tests! I would never have been able to finish a working extension without them. I don’t even understand how people made language grammars without unit-testing them; it just seems like too complex a task to me. I am no TDD-ist, but it was the unit tests that gave me the necessary confidence and guided me throughout the process.

I spent some time looking for ways to unit-test the grammar. The official VS Code docs didn’t say a word about it, and their Testing Extensions chapter dealt with integration tests for the extension itself rather than with the grammar. I needed something at a lower level.

Luckily, there is a project that fits my needs perfectly: vscode-tmgrammar-test. It’s a command-line tool that builds on VS Code's regex engine and grammar parser and lets you run unit tests directly against a given grammar file.

The format of the test files themselves is inspired by a relatively recent initiative of the Sublime Text team, who introduced a new grammar file format called Sublime Syntax and – more importantly – a way to unit-test grammars. It uses lovely, human-friendly magic comments that specify which scopes the grammar file should produce at a given position on a given line.

For example, the following Slim line is tested using assertions in the comment lines below it: the <- “operator” targets the first character of the tested line, and each run of ^s targets the “underlined” characters directly above it:

/ unit_tests.slim (i.e. this is a Slim file with comments)

' Verbatim <b>#{text}</b> without processing.
/ <- punctuation.section.verbatim.slim
/ ^^^^^^^^^^^^ text.html.embedded.slim
/          ^^^ meta.tag.inline.b.start.html
/              ...

This is cool. Even cooler is the fact that the Sublime Text team also provide their own official and full-featured grammar for Slim templates as well as unit tests! So, in the end, I just grabbed the test file and used it as a basis for all my development of the VS Code extension grammar file. ❤️
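The assertion format is simple enough to model in a few lines. Below is a hypothetical Ruby sketch (not the parser vscode-tmgrammar-test actually uses) that turns the three assertions from the example above into column ranges, assuming the one-character Slim comment token / sits at column 0 of every assertion line; it then prints the slice of the tested line each assertion points at.

```ruby
# Hypothetical sketch, not vscode-tmgrammar-test's own parser.
# Assumes the comment token "/" starts at column 0 of every assertion line.
Assertion = Struct.new(:first_col, :last_col, :scopes)

COMMENT_TOKEN = "/"
TOKEN = Regexp.escape(COMMENT_TOKEN)
ARROW = /\A#{TOKEN}\s*<-\s+(.+)\z/      # "<-" points at the comment's own column
CARETS = /\A#{TOKEN}( *)(\^+)\s+(.+)\z/ # "^"s point at the columns right above

def parse_assertions(assertion_lines)
  assertion_lines.map do |line|
    if (m = ARROW.match(line))
      Assertion.new(0, 0, m[1].split)
    elsif (m = CARETS.match(line))
      first = COMMENT_TOKEN.length + m[1].length
      Assertion.new(first, first + m[2].length - 1, m[3].split)
    end
  end.compact # lines that are neither form (e.g. "/ ...") are ignored
end

tested = %q(' Verbatim <b>#{text}</b> without processing.)
assertions = [
  "/ <- punctuation.section.verbatim.slim",
  "/ ^^^^^^^^^^^^ text.html.embedded.slim",
  "/          ^^^ meta.tag.inline.b.start.html",
]

parse_assertions(assertions).each do |a|
  puts "#{tested[a.first_col..a.last_col].inspect} => #{a.scopes.join(' ')}"
end
```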

TL;DR recap

Chances are that your head is already exploding, so let’s recap the dependencies and inspiration sources for building a VS Code syntax grammar file:

  • VS Code syntax highlighting works with TextMate editor grammar files,
  • they are written in an old but well-thought-out specification based on some clever regexp matching,
  • for VS Code they can be written in PList or JSON (YAML is popular too, but it has to be converted to JSON first),
  • the Sublime Text team introduced a newer format for the same task (which is not directly usable in VS Code) and another one for testing the grammars,
  • they also created grammar files for various languages, which are usually well maintained and include test files,
  • the vscode-tmgrammar-test tool combines the Sublime Text test files with VS Code / TextMate grammar files and thus allows unit testing of VS Code syntax grammars.

Building the grammar

With all the tools and some basic theoretical knowledge of the problem at hand, I began building the grammar file. Of course, I didn’t start from scratch: I used the official TextMate grammar file (the one in YAML format) from the Slim language repository.

I opened up the grammar file and the unit test file in my editor and started working iteratively:

  1. I converted my YAML grammar file to the JSON format using the js-yaml tool (see my wiki for more instructions).

  2. I ran the tests with the output redirected to a results.txt log file. This flooded the log with hundreds of assertion errors. I opened up the log file and scrolled to the first error, for example one similar to this:

    ✖ tests/unit_tests.slim failed
      at [tests/unit_tests.slim:5:1:2]:
      5: /Comment
         ^
      missing required scopes: comment.block.code.slim punctuation.definition.comment.slim
      actual: source.slim comment.block.slim comment.line.slash.slim punctuation.definition.comment.slim
    
  3. Next, I tried to understand what the error was about: was this a real error or a missing feature in the grammar file? Or was this just a mismatch between the scopes the original grammar file produced and the scopes the much younger Sublime Text unit test file expected? What scopes does VS Code need anyway? While it was hard to assess these things in the beginning, I slowly got the hang of it… (I will share some hints below.)

    BTW, I found the GitHub Copilot: Explain This feature in VS Code quite useful. It provided me with explanations that helped me initially, even though I still had to do the hard part – deciding what the error really was about – myself.
    Copilot explaining a grammar test for me

  4. Having an idea about the cause of the error, I edited the grammar file and/or the test file and jumped back to #1. And that was all there was to it! 😄
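Step 1 does not strictly need Node.js. As a swapped-in alternative to the js-yaml tool, Ruby's standard yaml and json libraries can do the same conversion; the grammar fragment and the file paths below are hypothetical.

```ruby
require "json"
require "yaml"

# Convert a TextMate grammar written in YAML into the JSON that VS Code loads.
# (A stand-in for js-yaml that uses only Ruby's standard library.)
def yaml_grammar_to_json(yaml_text)
  JSON.pretty_generate(YAML.safe_load(yaml_text))
end

# Hypothetical grammar fragment; single-quoted YAML keeps regex backslashes as-is.
yaml = <<~'YAML'
  name: Slim
  scopeName: source.slim
  patterns:
    - include: '#comment'
  repository:
    comment:
      name: comment.block.slim
      begin: '^(\s*)/(?![!\[])'
      end: '^(?!\1\s)'
YAML

puts yaml_grammar_to_json(yaml)

# For real files (hypothetical paths):
# File.write("syntaxes/slim.tmLanguage.json",
#            yaml_grammar_to_json(File.read("syntaxes/slim.tmLanguage.yaml")))
```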

It may sound scary, but it was actually quite fun! Although untangling some of the problems was definitely frustrating, the very fast feedback loop of this TDD-ish workflow was very satisfying and kept me reassured and oriented along the way. The most pleasant moments were when I fixed a single thing in the grammar file and then watched tens or even hundreds of errors magically disappear!
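To shorten that loop even further, the first failure can be pulled out of results.txt automatically. The Ruby sketch below assumes the log format shown in step 2; first_failure and Failure#absent are hypothetical helpers, and “absent” is only a triage heuristic (required scopes that appear nowhere in the actual scope stack), not something the test tool itself reports.

```ruby
# Hypothetical triage helper for a results.txt log in the format shown in step 2.
Failure = Struct.new(:location, :required, :actual) do
  # Required scopes that appear nowhere in the actual scope stack.
  def absent
    required - actual
  end
end

# Return the first failure found in the log, or nil if there is none.
def first_failure(log)
  location = required = nil
  log.each_line do |raw|
    line = raw.strip
    if (m = line.match(/\Aat \[(.+)\]:\z/))
      location = m[1]
    elsif (m = line.match(/\Amissing required scopes: (.+)\z/))
      required = m[1].split
    elsif required && (m = line.match(/\Aactual: (.+)\z/))
      return Failure.new(location, required, m[1].split)
    end
  end
  nil
end

log = <<~LOG
  ✖ tests/unit_tests.slim failed
    at [tests/unit_tests.slim:5:1:2]:
    5: /Comment
       ^
    missing required scopes: comment.block.code.slim punctuation.definition.comment.slim
    actual: source.slim comment.block.slim comment.line.slash.slim punctuation.definition.comment.slim
LOG

failure = first_failure(log)
puts failure.location
puts "absent: #{failure.absent.join(' ')}"
```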

Selected grammar file quirks

I will not try to explain how the grammar file works; it would be pointless to repeat stuff from other great materials such as Matt’s post. Instead, I want to briefly mention a few of the peculiarities that one is likely to hit when playing with VS Code grammar files:

  • The order of patterns in the grammar file is important: when several patterns can match at the same position, the one listed first wins.

  • The regexps can, and often should, use groups (either numbered or named). The groups can then be targeted by the capture keys (captures, beginCaptures, endCaptures), which map them to the actual VS Code scopes that the given language pattern should produce.

  • Although the regexps in a grammar can match the newline character \n, they will never match across multiple lines of text, even if they are written as if they should. In other words, grammar regexps always match within a single line of text only.

  • The end key of a pattern, which specifies the regexp that the matched language pattern ends with, often behaves in a surprising way that one has to keep in mind: if you match some nested patterns inside this one, its end regexp becomes a “floating” one. It will match the (final) part of the language pattern only after all nested patterns have finished their matching.

    So, for example, if the end regexp matches a new line but patterns nested inside the main one also match some new lines, the main pattern's end will match a new line only after all the nested patterns are satisfied. In other words, the end regexp effectively marks the first place where the pattern can end, unless nested patterns prolong it.

  • The end regexp can also backreference groups from the begin regexp. This is very helpful in Slim templates because it allows the end regexp to detect the first line that is no longer nested under the begin line (e.g. by backreferencing the begin line's indentation).

  • The grammar syntax also allows defining a repository of named patterns. The repository very conveniently cleans up the grammar file and allows patterns to be reused.

  • When working with embedded languages, the grammar file of the embedded language is needed to better understand how embedding works and to run tests that target embedded language scopes (see the -g option of the vscode-tmgrammar-test tool). It should not be packaged into the extension, though; it is a dev-only dependency.

  • VS Code has a convention for scope naming. Sometimes this convention differs from the conventions in other editors, notably those used in the new grammar files in Sublime Text, which I used as a basis for the Slim extension. Thus, I had to rename some of the scopes to match the VS Code ones.
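The pattern-order quirk is easiest to see in a tiny model. The Ruby sketch below is a simplification, not vscode-textmate's actual code: among sibling patterns, the match that starts earliest in the line wins, and the list order only decides ties at the same column. The scope names and regexps are hypothetical.

```ruby
# Simplified model of choosing the next rule among sibling patterns:
# the match that starts earliest wins; a tie goes to the pattern listed first.
def pick_rule(patterns, line, pos = 0)
  best = nil
  patterns.each do |scope, regexp|
    m = regexp.match(line, pos)
    next unless m

    # Strict "<" keeps the earlier pattern when two matches start at the same column.
    best = [scope, m.begin(0), m[0]] if best.nil? || m.begin(0) < best[1]
  end
  best
end

keyword_first = [
  ["keyword.control.slim", /\b(?:if|unless)\b/],
  ["variable.other.slim", /\b\w+\b/],
]
variable_first = keyword_first.reverse

line = "- if user.admin?"
p pick_rule(keyword_first, line)  # keyword rule wins the tie at column 2
p pick_rule(variable_first, line) # same column, but now the variable rule wins
```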
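Capture groups can be modelled the same way. In a JSON grammar, the captures key maps group numbers (as strings such as "1") to objects holding a scope name; the Ruby sketch below simplifies that to a hash of integer group numbers to scope names. The rule itself, including its scope names, is hypothetical.

```ruby
# Hypothetical rule shaped like a grammar entry with "match" and "captures".
TAG_RULE = {
  match: /\A\s*([a-z][\w-]*)((?:\.[\w-]+)*)/,
  captures: {
    1 => "entity.name.tag.slim",
    2 => "entity.other.attribute-name.class.slim",
  },
}.freeze

# Return [text, scope, column] for every capture group that matched something.
def captured_tokens(rule, line)
  m = rule[:match].match(line)
  return [] unless m

  rule[:captures].map do |group, scope|
    text = m[group]
    [text, scope, m.begin(group)] unless text.nil? || text.empty?
  end.compact
end

p captured_tokens(TAG_RULE, "  div.card.shadow Hello")
```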
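The single-line limitation can be reproduced with plain Ruby, assuming that, like vscode-textmate, the tokenizer hands each line to the regex engine separately with its trailing newline attached. A regexp written to span two lines matches the whole text but never an individual line, while matching the \n itself still works.

```ruby
# Assumption: each line reaches the regex engine on its own, trailing "\n" included.
document = "/ first line\n  second line\n"

SPANNING = /first line\n\s*second/ # written as if it could cross lines
LINE_END = /first line\n/          # matching the "\n" itself is fine

whole_document = SPANNING.match?(document)
per_line = document.each_line.any? { |line| SPANNING.match?(line) }
newline_ok = document.each_line.any? { |line| LINE_END.match?(line) }

puts "spanning regex on the whole text: #{whole_document}" # true
puts "spanning regex line by line:      #{per_line}"       # false
puts "newline regex line by line:       #{newline_ok}"     # true
```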
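The floating end is clearest with the classic string example. In the simplified Ruby model below (not vscode-textmate's code), a rule begins at a double quote, ends at the next double quote, and contains one nested pattern for backslash escapes; because the nested match starts earlier, it swallows the escaped quote, and the end pattern can only match after it. On a tie at the same column the end pattern wins here, which to my understanding is TextMate's default unless applyEndPatternLast is set.

```ruby
# Simplified model of a begin/end rule with one nested pattern (a string with
# backslash escapes). Assumes the nested pattern never matches an empty string.
STRING_END = /"/
ESCAPE = /\\./ # nested pattern: a backslash plus the escaped character

# Return the column where the end pattern finally matches, scanning from pos.
def end_column(line, pos)
  loop do
    end_match = STRING_END.match(line, pos)
    return nil unless end_match # the rule would continue on the next line

    nested = ESCAPE.match(line, pos)
    # A nested match that starts earlier consumes its text first, so the end
    # pattern can only match after it ("floating" end). On a tie, end wins.
    return end_match.begin(0) unless nested && nested.begin(0) < end_match.begin(0)

    pos = nested.end(0)
  end
end

line = '"a\"b" rest' # the begin pattern (") matched at column 0
p line.index('"', 1)  # naive end: column 3, the escaped quote
p end_column(line, 1) # floating end: column 5, the real closing quote
```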
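Backreferencing the begin regexp is what makes indentation-based blocks possible. The Ruby sketch below imitates it by inserting the (escaped) indentation captured by a hypothetical comment-begin regexp into the end regexp, so the block ends on the first line that is not indented deeper than the line that opened it. It is a simplification: blank lines, for example, would end the block here.

```ruby
# Hypothetical rule: a Slim code comment whose block lasts while the following
# lines are indented deeper than the line that opened it.
COMMENT_BEGIN = %r{\A(\s*)/(?![!\[])} # group 1 captures the indentation

# Like a \1 in an end regexp, the captured text is inserted (escaped) into
# the end pattern: the block ends on the first line NOT indented deeper.
def comment_end_for(indent)
  /\A(?!#{Regexp.escape(indent)}\s)/
end

# Return the range of line indices covered by the first comment block, or nil.
def first_comment_block(lines)
  start = lines.index { |l| COMMENT_BEGIN.match?(l) }
  return nil unless start

  end_re = comment_end_for(COMMENT_BEGIN.match(lines[start])[1])
  stop = start + 1
  stop += 1 while stop < lines.size && !end_re.match?(lines[stop])
  start...stop
end

template = [
  "div",
  "  / This comment",
  "    continues here",
  "      and here",
  "  p But this is a paragraph",
]
p first_comment_block(template) # => 1...4
```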
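Because references into the repository are plain strings, a typo in an include can easily go unnoticed. A small hypothetical check like the one below walks a grammar (as a Ruby hash, e.g. loaded with JSON.parse) and lists local #name includes that point to no key in the top-level repository; it ignores nested repositories and external includes such as $self or source.ruby.

```ruby
# Hypothetical grammar fragment with a repository entry reused in two places.
GRAMMAR = {
  "scopeName" => "source.slim",
  "patterns" => [
    { "include" => "#comment" }, # deliberately missing from the repository
    { "include" => "#attribute-value" },
    { "include" => "#interpolation" },
  ],
  "repository" => {
    "interpolation" => {
      "name" => "meta.embedded.line.ruby",
      "begin" => '#\{', "end" => '\}',
    },
    "attribute-value" => {
      "name" => "string.quoted.double.slim",
      "begin" => '"', "end" => '"',
      "patterns" => [{ "include" => "#interpolation" }], # reused here
    },
  },
}.freeze

# Collect every "include" value anywhere in the grammar.
def includes_in(node, found = [])
  case node
  when Hash
    found << node["include"] if node["include"].is_a?(String)
    node.each_value { |value| includes_in(value, found) }
  when Array
    node.each { |value| includes_in(value, found) }
  end
  found
end

# Local "#name" includes that point to no key in the top-level repository.
def dangling_includes(grammar)
  keys = grammar.fetch("repository", {}).keys
  includes_in(grammar)
    .select { |ref| ref.start_with?("#") }
    .map { |ref| ref.delete_prefix("#") }
    .reject { |name| keys.include?(name) }
    .uniq
end

p dangling_includes(GRAMMAR) # => ["comment"]
```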

Debugging and testing the extension

I used a few ways to try out and test the syntax highlighting extension I was working on. The first and most important one was the unit tests. Once I got used to the error messages the testing tool reported, I was able to quickly pinpoint the problematic location in the grammar file and start playing with the matching rules there. To verify that they worked, I ran the tests again and watched the changes in the output log.

In more complicated cases, I used the Scope inspector. I created a launch configuration to run the extension in the VS Code Extension Development Host. Then I could run the extension any time, open up a Slim template and start the Scope inspector via the “Developer: Inspect Editor Tokens and Scopes” quick action (for which I soon learned the shortcut key). This way I was able to inspect the behavior of the extension in a live document.

By the way, I extracted the official Literate tests from the Slim repository into a Slim template that is always ready for some manual testing.

From time to time, I also packaged the extension into a .vsix file and installed it locally into my VS Code (using the “Extensions: Install from VSIX…” quick action), so that I could watch it highlight some real code during my regular working days.

Final notes

So here we are: the Slim highlighting extension is ready for you to try. I mostly enjoyed the process of building it, although at times it was frustrating and felt arcane. Thanks to the fast feedback loop, very clear problem boundaries and a good sense of progress, it was fun most of the time!

I now realize that I built most of the extension quickly, four months ago, and that it was only after I had finished covering the grammar file with unit tests that I somehow lost interest. I had the extension installed locally and it was working well, so I didn’t feel the push to finish the last 5% and actually publish the thing. Until now, luckily. 😅

So, enjoy the extension if you work in Slim, and I encourage you to try to build your own language grammar if you like working with a vintage, sometimes weird but apparently well-thought-out piece of technology.

Want more stuff like this? Follow me here or on Twitter. Cheers!
