DEV Community

SEN LLC


Writing a Markdown Parser From Scratch With GitHub, Qiita, and Zenn Theme Modes


The parser is line-based for blocks (headers, code fences, lists, blockquotes) and regex-based for inline (bold, italic, links, code spans). Each block element gets recognized on its first line, then the parser advances until the block ends. No AST, no dependencies, about 200 lines. And switching the preview between GitHub / Qiita / Zenn styles is a matter of injecting different scoped CSS strings.

Every tech blog platform renders Markdown slightly differently. GitHub has its own look, Qiita uses a different heading style, Zenn has its own color palette. When you're writing an article, it helps to see it in the same style as the platform you'll publish to.

🔗 Live demo: https://sen.ltd/portfolio/markdown-live/
📦 GitHub: https://github.com/sen-ltd/markdown-live


Features:

  • Markdown parser from scratch (CommonMark subset)
  • 3 preview themes: GitHub, Qiita, Zenn
  • Split-pane layout with resizable divider
  • Auto-save to localStorage
  • Download HTML / copy HTML
  • Word / character / line count
  • Scroll sync between panes
  • Japanese / English UI
  • Zero dependencies, 61 tests

Line-based block parsing

Block-level elements (headers, lists, code fences) are all recognizable from a line's leading characters:

while (i < lines.length) {
  const line = lines[i];

  if (/^#{1,6} /.test(line)) {
    // Header
    const level = line.match(/^#+/)[0].length;
    blocks.push(`<h${level}>${parseInline(line.slice(level + 1))}</h${level}>`);
    i++;
  } else if (line.startsWith('```')) {
    // Fenced code block: consume until the closing fence (or end of input)
    const lang = line.slice(3).trim();
    const codeLines = [];
    i++;
    while (i < lines.length && !lines[i].startsWith('```')) {
      codeLines.push(lines[i]);
      i++;
    }
    i++; // skip the closing fence
    blocks.push(`<pre><code class="language-${escapeHTML(lang)}">${escapeHTML(codeLines.join('\n'))}</code></pre>`);
  } else if (/^[-*] /.test(line)) {
    // Unordered list - consume contiguous items
    ...
  }
  // ... etc.
}

The parser advances the index i as it consumes lines, so each block handler decides how much to consume. Lists, code blocks, and blockquotes may span many lines; headers are always one.
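A self-contained sketch of the same loop makes the control flow easier to follow. It is an illustration under stated assumptions, not the project's source: the escapeHTML helper, the FENCE constant, the parseBlocks and startsBlock names, and the list and paragraph branches are all hypothetical. The inline formatter is passed in and defaults to plain escaping, so the sketch runs on its own.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML.
function escapeHTML(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Three backticks, built at runtime so this listing never contains a literal fence.
const FENCE = '`'.repeat(3);

// True if a line opens a header, a list item, or a code fence.
const startsBlock = (line) => /^(#{1,6} |[-*] )/.test(line) || line.startsWith(FENCE);

// Minimal line-based block parser (sketch). `inline` formats text inside blocks.
function parseBlocks(markdown, inline = escapeHTML) {
  const lines = markdown.split('\n');
  const blocks = [];
  let i = 0;
  while (i < lines.length) {
    const line = lines[i];
    if (/^#{1,6} /.test(line)) {
      // Header: always exactly one line
      const level = line.match(/^#+/)[0].length;
      blocks.push(`<h${level}>${inline(line.slice(level + 1))}</h${level}>`);
      i++;
    } else if (line.startsWith(FENCE)) {
      // Fenced code: consume until the closing fence or end of input
      const lang = line.slice(3).trim();
      const code = [];
      i++;
      while (i < lines.length && !lines[i].startsWith(FENCE)) code.push(lines[i++]);
      i++; // skip the closing fence
      const cls = lang ? ` class="language-${escapeHTML(lang)}"` : '';
      blocks.push(`<pre><code${cls}>${escapeHTML(code.join('\n'))}</code></pre>`);
    } else if (/^[-*] /.test(line)) {
      // Unordered list: consume contiguous "- " or "* " lines
      const items = [];
      while (i < lines.length && /^[-*] /.test(lines[i])) {
        items.push(`<li>${inline(lines[i++].slice(2))}</li>`);
      }
      blocks.push(`<ul>${items.join('')}</ul>`);
    } else if (line.trim() === '') {
      i++; // blank lines only separate blocks
    } else {
      // Paragraph: consume until a blank line or the start of another block
      const para = [];
      while (i < lines.length && lines[i].trim() !== '' && !startsBlock(lines[i])) {
        para.push(lines[i++]);
      }
      blocks.push(`<p>${inline(para.join(' '))}</p>`);
    }
  }
  return blocks.join('\n');
}

console.log(parseBlocks('# Title\n\n- one\n- two\n\nSome text'));
// <h1>Title</h1>
// <ul><li>one</li><li>two</li></ul>
// <p>Some text</p>
```

Each branch either consumes a fixed number of lines or loops until its own terminator, and the paragraph fallback always takes at least one line, so the outer while loop always makes progress. Passing an inline formatter such as parseInline as the second argument adds inline formatting to headers, list items, and paragraphs.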

Inline parsing with sequential regex replacements

For inline formatting, a sequence of regex replacements works — as long as the order is right:

export function parseInline(text) {
  let result = escapeHTML(text);
  // Code spans first: swap each one for a placeholder so later patterns can't format its contents
  const codeSpans = [];
  result = result.replace(/`([^`]+)`/g, (_, code) => {
    codeSpans.push(`<code>${code}</code>`);
    return `\u0000${codeSpans.length - 1}\u0000`;
  });
  // Images before links (so the ! doesn't get lost)
  result = result.replace(/!\[([^\]]*)\]\(([^)]+)\)/g, '<img alt="$1" src="$2">');
  // Links
  result = result.replace(/\[([^\]]+)\]\(([^)]+)\)/g, '<a href="$2">$1</a>');
  // Bold (** before *)
  result = result.replace(/\*\*([^*]+)\*\*/g, '<strong>$1</strong>');
  // Italic
  result = result.replace(/\*([^*]+)\*/g, '<em>$1</em>');
  // Put the protected code spans back
  return result.replace(/\u0000(\d+)\u0000/g, (_, n) => codeSpans[Number(n)]);
}

Order matters:

  1. Code spans first: anything inside backticks is protected from further formatting
  2. Images before links: otherwise ![...](...) would be parsed as a literal ! followed by a link [...](...)
  3. Bold (**) before italic (*): otherwise the italic pattern would match the inner *text* of **text** and leave stray asterisks around it
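Running the same patterns in both orders shows why the sequence matters. This small demo reuses the image, link, bold, and italic regexes from parseInline; the rule constants and applyInOrder are hypothetical names introduced only for the comparison.

```javascript
// The regexes from parseInline, packaged as [pattern, replacement] rules.
const IMAGE = [/!\[([^\]]*)\]\(([^)]+)\)/g, '<img alt="$1" src="$2">'];
const LINK = [/\[([^\]]+)\]\(([^)]+)\)/g, '<a href="$2">$1</a>'];
const BOLD = [/\*\*([^*]+)\*\*/g, '<strong>$1</strong>'];
const ITALIC = [/\*([^*]+)\*/g, '<em>$1</em>'];

// Apply the rules left to right; replace() with a /g regex always starts from index 0.
function applyInOrder(text, rules) {
  return rules.reduce((acc, [pattern, replacement]) => acc.replace(pattern, replacement), text);
}

console.log(applyInOrder('**bold**', [BOLD, ITALIC]));      // <strong>bold</strong>
console.log(applyInOrder('**bold**', [ITALIC, BOLD]));      // *<em>bold</em>*
console.log(applyInOrder('![logo](a.png)', [IMAGE, LINK])); // <img alt="logo" src="a.png">
console.log(applyInOrder('![logo](a.png)', [LINK, IMAGE])); // !<a href="a.png">logo</a>
```

In the wrong order, italic grabs the inner *bold* and leaves the outer asterisks behind, and the link rule consumes [logo](a.png) before the image rule ever sees the leading !.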

A full CommonMark parser with an AST handles the edge cases this shortcut gets wrong (nested emphasis, spaces around delimiters, etc.), but the shortcut covers the everyday Markdown of a typical blog draft.
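Running only the two emphasis rules from parseInline shows where the shortcut stops working. The emphasis wrapper is a hypothetical name for this sketch. Flat emphasis comes out right; nested emphasis does not, because [^*]+ cannot step over the inner asterisks.

```javascript
// Only the two emphasis rules from parseInline, in the article's order (bold, then italic).
function emphasis(text) {
  return text
    .replace(/\*\*([^*]+)\*\*/g, '<strong>$1</strong>')
    .replace(/\*([^*]+)\*/g, '<em>$1</em>');
}

// Flat emphasis works:
console.log(emphasis('**bold** and *italic*'));
// <strong>bold</strong> and <em>italic</em>

// Nested emphasis does not: bold never matches,
// and italic pairs up the wrong delimiters.
console.log(emphasis('**bold *italic* bold**'));
// *<em>bold </em>italic<em> bold</em>*
```

A CommonMark-compliant parser renders the second input as <strong>bold <em>italic</em> bold</strong>.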

Theme injection

Each theme is a CSS string scoped to the preview pane:

export const THEMES = {
  github: `
    .preview { font-family: -apple-system, "Helvetica Neue", sans-serif; }
    .preview h1 { border-bottom: 1px solid #eaecef; padding-bottom: .3em; }
    .preview code { background: #f6f8fa; padding: .2em .4em; border-radius: 3px; }
    /* ... */
  `,
  qiita: `
    .preview { font-family: "Hiragino Kaku Gothic ProN", sans-serif; }
    .preview h1 { color: #55c500; }
    /* ... */
  `,
  zenn: `
    .preview { font-family: "Hiragino Sans", sans-serif; }
    .preview h1 { color: #3ea8ff; }
    /* ... */
  `,
};

function applyTheme(name) {
  document.getElementById('theme-style').textContent = THEMES[name];
}

Switching themes is a one-line text replacement on an inline <style> element: the browser restyles the preview immediately, with no network request and no flash of unstyled content while an external stylesheet loads.
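Since switching is just an assignment to textContent, the function is easy to exercise outside the browser if the <style> element is passed in rather than looked up. This is a sketch under that assumption. The THEMES table is trimmed to one rule per theme, and the fallback to github for unknown names (for example, a stale saved preference) is an added guard that the original applyTheme does not have. The hasOwnProperty check keeps inherited names like toString from slipping through.

```javascript
// Trimmed theme table: one rule per theme, for illustration only.
const THEMES = {
  github: '.preview h1 { border-bottom: 1px solid #eaecef; padding-bottom: .3em; }',
  qiita: '.preview h1 { color: #55c500; }',
  zenn: '.preview h1 { color: #3ea8ff; }',
};

// Variant of applyTheme that receives the <style> element instead of looking it up,
// and falls back to 'github' for names that aren't own keys of THEMES.
function applyTheme(styleEl, name) {
  const key = Object.prototype.hasOwnProperty.call(THEMES, name) ? name : 'github';
  styleEl.textContent = THEMES[key];
  return key; // the theme actually applied
}

// In the browser: applyTheme(document.getElementById('theme-style'), 'zenn');
// Anything with a textContent property works, which makes the switch easy to test:
const fakeStyle = { textContent: '' };
console.log(applyTheme(fakeStyle, 'qiita'), fakeStyle.textContent);
// qiita .preview h1 { color: #55c500; }
```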

Scroll sync

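Syncing both panes naively creates a feedback loop: assigning preview.scrollTop fires a scroll event on the preview, whose handler assigns editor.scrollTop, which fires a scroll event on the editor, and so on. A syncing flag plus requestAnimationFrame keeps the cycle in check: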

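let syncing = false;

// Map source's scroll position onto target, relative to each pane's scrollable range
function mirrorScroll(source, target) {
  const sourceMax = source.scrollHeight - source.clientHeight;
  const targetMax = target.scrollHeight - target.clientHeight;
  target.scrollTop = sourceMax > 0 ? (source.scrollTop / sourceMax) * targetMax : 0;
}

function onScroll(source, target) {
  if (syncing) return; // a sync is already pending for this frame
  syncing = true;
  requestAnimationFrame(() => {
    mirrorScroll(source, target);
    syncing = false;
  });
}

editor.addEventListener('scroll', () => onScroll(editor, preview));
preview.addEventListener('scroll', () => onScroll(preview, editor));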

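The syncing flag coalesces bursts of scroll events into at most one sync per animation frame. It does not fully suppress the echo, though: the scroll event caused by assigning scrollTop inside the rAF callback is normally dispatched on the next frame, after syncing has been reset. The loop still settles in practice because the mirrored handler computes (within rounding) the position the source pane is already at, and an assignment that doesn't move a pane fires no scroll event.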
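One way to avoid depending on when the browser dispatches scroll events is to remember which pane was scrolled programmatically and ignore exactly one scroll event from it. This is a different technique from the rAF flag above, sketched under assumptions: mirroredScrollTop and createScrollSync are hypothetical names. Because they only touch scrollTop, scrollHeight, and clientHeight, plain objects can stand in for the panes in a quick test.

```javascript
// Proportional mapping: the same fraction of each pane's scrollable range
// (scrollHeight - clientHeight), so both panes reach top and bottom together.
function mirroredScrollTop(source, target) {
  const sourceMax = source.scrollHeight - source.clientHeight;
  const targetMax = target.scrollHeight - target.clientHeight;
  if (sourceMax <= 0 || targetMax <= 0) return 0; // nothing to scroll
  return (source.scrollTop / sourceMax) * targetMax;
}

// Returns a handler to call from both panes' scroll listeners.
function createScrollSync(paneA, paneB) {
  let echoFrom = null; // pane whose next scroll event we caused ourselves
  return function onScroll(source) {
    if (echoFrom === source) {
      echoFrom = null; // swallow the echo of our own assignment
      return;
    }
    const target = source === paneA ? paneB : paneA;
    const next = mirroredScrollTop(source, target);
    // Already aligned: assigning would not move the pane, so don't arm the echo guard
    if (Math.abs(target.scrollTop - next) < 1) return;
    echoFrom = target;
    target.scrollTop = next;
  };
}

// Browser wiring (assumes the editor and preview elements from the snippet above):
// const sync = createScrollSync(editor, preview);
// editor.addEventListener('scroll', () => sync(editor));
// preview.addEventListener('scroll', () => sync(preview));
```

The alignment check matters: an assignment that doesn't move the target fires no scroll event, so arming the guard in that case would swallow the user's next real scroll instead.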

Series

This is entry #55 in my 100+ public portfolio series.
