DEV Community

Dev Nestio


HTML to Markdown Converter: Using DOMParser for Reliable Conversion

Converting HTML to Markdown is surprisingly tricky: nested lists, table alignment, and fenced code blocks with language hints all need special handling. I built a browser tool that handles them using the native DOMParser API.

Try it

HTML to Markdown Converter — DevNestio

What it converts

  • Headings h1-h6 (ATX # or Setext === mode)
  • Bold, italic, strikethrough, underline, mark, sup, sub
  • Links with optional title attributes
  • Images with alt and title
  • Ordered and unordered lists (nested, arbitrary depth)
  • GFM tables with auto-aligned column widths
  • Fenced code blocks with language class detection
  • Blockquotes
  • Horizontal rules
  • Strips script/style tags automatically
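For the code-block bullet, here is a minimal sketch of what language detection could look like. It assumes the language is carried in a `language-xxx` or `lang-xxx` class on the `<code>` element (the `language-` prefix is the convention the HTML spec suggests for marking up code). `detectLanguage` and `fenceCode` are hypothetical names, not the tool's API. The sketch also lengthens the fence when the code itself contains backticks, so a run of backticks inside the code cannot close the block early.

```javascript
// Hypothetical helpers, not the tool's actual code.
// Assumes the language is given as a "language-xxx" or "lang-xxx" class.
function detectLanguage(className = '') {
  for (const cls of className.split(/\s+/)) {
    const m = /^(?:language|lang)-(.+)$/.exec(cls);
    if (m) return m[1];
  }
  return ''; // no hint found: emit a bare fence
}

function fenceCode(code, className) {
  // The fence must be longer than any backtick run inside the code,
  // otherwise that run could be read as the closing fence.
  const runs = code.match(/`+/g) || [];
  const longestRun = Math.max(0, ...runs.map(r => r.length));
  const fence = '`'.repeat(Math.max(3, longestRun + 1));
  const body = code.endsWith('\n') ? code : code + '\n';
  return fence + detectLanguage(className) + '\n' + body + fence;
}
```

For example, `fenceCode('const a = 1;', 'hljs language-js')` yields a three-backtick fence tagged `js`, while code that already contains three backticks gets a four-backtick fence.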

Why DOMParser?

Instead of hand-rolling an HTML parser, I delegate to the browser:

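// Parse with the browser's own HTML parser, then walk the resulting tree.
// processChildren (not shown) recursively converts each child node.
// Scripts in a DOMParser-created document are not executed.
function htmlToMarkdown(html, opts = {}) {
  const doc = new DOMParser().parseFromString(
    '<div id="root">' + html + '</div>', 'text/html'
  );
  // Pass the caller's options through as the initial context.
  return processChildren(doc.getElementById('root'), { ...opts }).trim();
}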

This handles malformed HTML, arbitrary nesting, and character entities for free.
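The post's snippets call `processChildren` and `processNode` without showing them. Below is a minimal, assumed sketch of such a recursive dispatcher. It reuses those two names, but the bodies are a simplified stand-in, not the tool's code. It only reads standard Node properties (`nodeType`, `tagName`, `childNodes`, `textContent`, `getAttribute`), so it accepts nodes from DOMParser or jsdom alike.

```javascript
// Assumed, simplified stand-in for the tool's walker (not its real code).
const TEXT_NODE = 3;
const ELEMENT_NODE = 1;

// Inline tags that just wrap their content in a Markdown delimiter.
const WRAPPERS = new Map([
  ['strong', '**'], ['b', '**'], ['em', '*'], ['i', '*'],
  ['del', '~~'], ['s', '~~'], ['code', '`'],
]);

function processChildren(node, ctx) {
  return Array.from(node.childNodes, child => processNode(child, ctx)).join('');
}

function processNode(node, ctx) {
  if (node.nodeType === TEXT_NODE) return node.textContent; // entities already decoded
  if (node.nodeType !== ELEMENT_NODE) return ''; // comments and other node types
  const tag = node.tagName.toLowerCase();
  if (tag === 'script' || tag === 'style') return ''; // dropped with their contents
  const inner = processChildren(node, ctx);
  if (WRAPPERS.has(tag)) return WRAPPERS.get(tag) + inner + WRAPPERS.get(tag);
  if (tag === 'a') {
    const title = node.getAttribute('title');
    const href = node.getAttribute('href') || '';
    return `[${inner}](${href}${title ? ` "${title}"` : ''})`;
  }
  if (tag === 'p') return '\n\n' + inner.trim() + '\n\n'; // blank lines around blocks
  return inner; // unknown tags: keep only their content
}
```

Dispatching on the lowercased tag name keeps each Markdown rule in one place. Note that this sketch does not escape Markdown metacharacters such as `*` or `_` in text nodes, which a real converter would need to do.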

Nested list handling

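// Convert a <ul>/<ol> element to Markdown; nested lists recurse with depth + 1.
// processNode (not shown) converts inline content such as text, links and emphasis.
function processList(node, ctx, ordered, depth = 0) {
  const indent = '  '.repeat(depth);
  const lines = [];
  // Only <li> children are list items.
  const items = Array.from(node.children).filter(el => el.tagName.toLowerCase() === 'li');
  items.forEach((li, i) => {
    const prefix = ordered ? `${i + 1}.` : '-';
    const nested = [], inline = [];
    Array.from(li.childNodes).forEach(child => {
      // Text nodes have no tagName, hence the optional chaining.
      const t = child.tagName?.toLowerCase();
      if (t === 'ul' || t === 'ol')
        nested.push(processList(child, ctx, t === 'ol', depth + 1));
      else
        inline.push(processNode(child, ctx));
    });
    lines.push(indent + prefix + ' ' + inline.join('').trim());
    nested.forEach(n => lines.push(n));
  });
  return lines.join('\n');
}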
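The snippet above indents each nesting level by a fixed two spaces. That works under `- ` bullets, but CommonMark keeps a nested block inside a list item only if it is indented to the item's content column. That column is three for `1. ` (four for `10. `), so a two-space sub-list under an ordered item is parsed as a new top-level list rather than a nested one. A minimal alternative sketch, assuming the list has already been extracted into plain objects, instead indents by the parent's marker width. `renderList` and its input shape are hypothetical.

```javascript
// Sketch over plain objects rather than DOM nodes (an assumed shape):
//   list = { ordered: boolean, items: [{ text, children?: list }] }
// Nested lines are indented by the width of the parent's marker,
// e.g. 2 columns under "- " and 3 under "1. ".
function renderList({ ordered, items }, start = 1) {
  const lines = [];
  items.forEach((item, i) => {
    const marker = ordered ? `${start + i}. ` : '- ';
    lines.push(marker + item.text);
    if (item.children) {
      // Shift every line of the sub-list to the parent item's content column.
      const pad = ' '.repeat(marker.length);
      for (const line of renderList(item.children).split('\n')) lines.push(pad + line);
    }
  });
  return lines.join('\n');
}
```

Because each level only pads by its own parent's marker, arbitrary depth falls out of the recursion without tracking a `depth` counter.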

GFM table output

Each column is padded to the width of its longest cell, so the raw Markdown lines up:

| Name  | Role  | Score |
| ----- | ----- | ----- |
| Alice | Admin | 95    |
| Bob   | User  | 82    |

Pipe characters (|) inside cells are escaped as \| so they don't split the cell.
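A minimal sketch of the padding-plus-escaping step, assuming the cells have already been converted to strings. `renderTable` is a hypothetical name, and the minimum width of three dashes is a readability choice rather than a rule. Width is measured with `String.length` (UTF-16 code units), so wide characters such as CJK or emoji will still look misaligned in a monospace editor.

```javascript
// Hypothetical sketch: rows of cells (first row = header) to a GFM table.
function renderTable(rows) {
  // Escape "|" so it is not read as a cell boundary.
  const cells = rows.map(row => row.map(c => String(c).replace(/\|/g, '\\|')));
  const cols = Math.max(...cells.map(r => r.length));
  // Widest cell per column, never narrower than 3 (a style choice).
  const widths = Array.from({ length: cols }, (_, j) =>
    Math.max(3, ...cells.map(r => (r[j] ?? '').length)));
  const line = r => '| ' + widths.map((w, j) => (r[j] ?? '').padEnd(w)).join(' | ') + ' |';
  const delimiter = '| ' + widths.map(w => '-'.repeat(w)).join(' | ') + ' |';
  const [header, ...body] = cells;
  return [line(header), delimiter, ...body.map(line)].join('\n');
}
```

Feeding it `[['Name', 'Role', 'Score'], ['Alice', 'Admin', 95], ['Bob', 'User', 82]]` reproduces the table shown above line for line.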

Options

  • Setext headings: Title\n===== instead of # Title
  • Fenced code blocks: ``` fences vs 4-space indentation
  • GFM tables: on/off
  • Preserve BR: emit <br> as a backslash followed by a newline (a CommonMark hard line break)
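Two of these toggles are easy to sketch. The functions below are hypothetical, not the tool's code. Setext syntax only defines two heading levels (`===` underlines for h1, `---` for h2), so this sketch assumes h3-h6 fall back to ATX even in Setext mode. The indented-code variant prefixes every line with four spaces and, unlike a fence, has no way to carry a language hint.

```javascript
// Hypothetical option handlers, not the tool's actual implementation.
function renderHeading(level, text, { setext = false } = {}) {
  // Setext only covers h1 (===) and h2 (---); deeper levels use ATX.
  if (setext && level <= 2) {
    // Underline length matching the text is a style choice; any length works.
    return text + '\n' + (level === 1 ? '=' : '-').repeat(Math.max(3, text.length));
  }
  return '#'.repeat(level) + ' ' + text;
}

// Indented code block: four leading spaces per line, no language hint.
function renderIndentedCode(code) {
  return code.replace(/\n$/, '').split('\n').map(l => '    ' + l).join('\n');
}
```

One CommonMark detail matters when emitting indented code: an indented code block cannot interrupt a paragraph, so the converter has to leave a blank line before it.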

88 tests with jsdom

DOMParser is a browser API that Node.js does not provide, so the tests install jsdom's implementation as globals:

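```js
// jsdom supplies a standards-based DOM for Node.js; expose the pieces
// the converter expects to find as browser globals.
const { JSDOM } = require('jsdom');
const dom = new JSDOM('<!DOCTYPE html>');
global.DOMParser = dom.window.DOMParser;
global.Node = dom.window.Node;
```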


Part of DevNestio — 115 free browser-only developer tools.
