Advice on Sanemark

Ryan Westlund (yujiri8)

While learning Crystal, I ended up forking a Crystal Commonmark processor to implement Sanemark, a new variant of Markdown I specified.

The goal was to remove contravariant features and confusing edge cases from Markdown, to the point of having a syntax that, when specified as thoroughly as Commonmark, is at least 2-3x shorter and requires much less work to implement properly.

So that's the background. Now the questions:

Indentation

Markdown and Commonmark have different ways of doing continuation paragraphs of list items. According to John Gruber's spec, subsequent content in a list item must be indented by "either 4 spaces or a tab". Commonmark has a much more convoluted rule about matching the number of spaces from the containing node, so most of the time the continuing paragraph should be indented to the same column as the first one (but there are confusing edge cases).
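To make the difference concrete, here is a rough Python sketch of the two continuation checks. It is a simplification, not either spec's full algorithm: the function names are made up for illustration, the list marker is assumed to start at column 0, and tabs, lazy continuation lines, and list items that begin with a blank line are all ignored on the Commonmark side.

```python
import re

# A simple list marker ("-", "*", "+", or up to 9 digits followed by "." or ")")
# plus the spaces after it. Assumption: the marker starts at column 0.
_MARKER = re.compile(r"^(?:[-*+]|\d{1,9}[.)])( +)")


def gruber_continues(line: str) -> bool:
    """Gruber's rule: continuing content is indented by 4 spaces or a tab."""
    return line.startswith("    ") or line.startswith("\t")


def commonmark_content_column(first_line: str) -> int:
    """Column where the item's content starts: marker width plus following spaces."""
    m = _MARKER.match(first_line)
    if m is None:
        raise ValueError("not a list item")
    spaces = len(m.group(1))
    marker_width = m.end() - spaces
    if spaces > 4:
        # With 5+ spaces the content is treated as indented code,
        # and only one space after the marker counts.
        spaces = 1
    return marker_width + spaces


def commonmark_continues(first_line: str, line: str) -> bool:
    """Simplified Commonmark rule: indent at least to the content column."""
    indent = len(line) - len(line.lstrip(" "))
    return indent >= commonmark_content_column(first_line)
```

With `1. > text` as the first line, the content column is 3, so a continuation indented by three spaces passes the Commonmark-style check but fails Gruber's. With `10. item` the column becomes 4, which is where the "number of spaces depends on the marker" behavior comes from.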

I like John Gruber's rule, but there's one problem with it: non-text content, particularly fenced code blocks and blockquotes. If continuing content has to be indented by 4 spaces, a blockquote would have to look like this:

1. > text
    > text

Nasty. I understand the motivation for the Commonmark rule. It's even worse with fenced code.

I suppose the ideal solution is to make blockquotes and fenced code exceptions: all of their lines are indented equally, even if that means subsequent lines don't meet the normal criteria for continuing a list item. But that seems inelegant, and it's a challenge to implement because the parser has to override its normal procedure of "check if the outer block is closed; if not, check if the inner block is closed".

Emphasis spans

It's occasionally desirable (at least in theory) to have italics and bold overlap, like <em>start <strong>middle</em> end</strong> - even though that's not valid HTML, of course. Commonmark doesn't allow this, as *start **middle* end** comes out as <em>start <em><em>middle</em> end</em></em>. Sanemark allows it (renders it as <em>start <strong>middle</strong></em><strong> end</strong>). But implementing a spoiler extension to the Sanemark processor made me realize: why doesn't this also work with links? I had a real-life situation where I wanted link and italic to overlap, but they can't because the Commonmark implementation I forked - and I thought this was for good reason at the time - treats links as a separate context for emphasis processing.

It's because links are a node. To allow italic and bold to overlap, I made it so there aren't italic and bold nodes that contain text, there are "open italic", "close italic", "open bold" and "close bold" that all can't contain text. But that makes me wonder if I'm making a mistake by reducing the amount of abstraction here. It now only parses emphasis delimiters as tokens instead of nodes, which feels like a step further down into implementation details.
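For readers who want to see the idea, here is a minimal Python sketch of rendering a flat stream of open/close tokens into well-formed HTML. This is not the actual Sanemark code (which is in Crystal); the `(kind, value)` token shape and the `render` function are assumptions made for illustration. When a closer arrives for a tag that isn't the innermost open one, the sketch closes everything above it, closes the tag, and reopens the rest.

```python
from typing import List, Tuple

# Hypothetical token shape: (kind, value), where kind is "text", "open",
# or "close", and value is either the text or a tag name like "em".
Token = Tuple[str, str]


def render(tokens: List[Token]) -> str:
    out: List[str] = []
    stack: List[str] = []  # currently open tags, outermost first
    for kind, value in tokens:
        if kind == "text":
            out.append(value)
        elif kind == "open":
            stack.append(value)
            out.append(f"<{value}>")
        elif kind == "close":
            if value not in stack:
                # Stray closer: dropped in this sketch. A real processor
                # would more likely emit the delimiter as literal text.
                continue
            # Index of the innermost open occurrence of this tag.
            idx = len(stack) - 1 - stack[::-1].index(value)
            reopened = stack[idx + 1:]
            # Close the tag and everything opened after it, innermost first.
            for tag in reversed(stack[idx:]):
                out.append(f"</{tag}>")
            del stack[idx:]
            # Reopen the tags that were only closed to keep nesting valid.
            for tag in reopened:
                stack.append(tag)
                out.append(f"<{tag}>")
    # Assumption: tags still open at the end are closed automatically.
    for tag in reversed(stack):
        out.append(f"</{tag}>")
    return "".join(out)


overlap = [
    ("open", "em"), ("text", "start "),
    ("open", "strong"), ("text", "middle"),
    ("close", "em"), ("text", " end"),
    ("close", "strong"),
]
print(render(overlap))
# <em>start <strong>middle</strong></em><strong> end</strong>
```

Nothing in the loop is specific to emphasis, so links or spoilers could in principle be handled the same way if they were also emitted as open/close tokens, though a link's opener would need to carry its destination.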

I don't know, I'll probably end up doing the same thing for links and spoilers, but I would like to hear some opinion on the issue.
