Converting HTML to Markdown for Documentation and CMS Migration

#webdev #markdown #programming #productivity

Markdown is the de facto standard format for developer documentation. GitHub READMEs, dev blogs, documentation sites, wikis, and note-taking apps all support it. But content often starts as HTML: web pages you want to reference, email content you want to archive, CMS exports, or web scraping output.

Converting HTML to Markdown manually is tedious and error-prone. Understanding the mapping between HTML elements and Markdown syntax helps you do it efficiently and catch conversion errors.

## The element mapping

Most HTML elements have direct Markdown equivalents:

```
<h1>Title</h1>                  →  # Title
<h2>Subtitle</h2>               →  ## Subtitle
<p>Paragraph text</p>           →  Paragraph text
<strong>bold</strong>           →  **bold**
<em>italic</em>                 →  *italic*
<a href="url">text</a>          →  [text](url)
<img src="url" alt="t">         →  ![t](url)
<code>inline</code>             →  `inline`
<ul><li>item</li></ul>          →  - item
<ol><li>item</li></ol>          →  1. item
<blockquote>text</blockquote>   →  > text
<hr>                            →  ---
```
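The mapping above is really a set of per-element rules applied while walking the parsed HTML tree. Here is a toy sketch of that idea, assuming the HTML has already been parsed into plain `{ tag, attrs, children }` objects (a hypothetical shape; real libraries work on DOM nodes). It skips Markdown escaping, nested blocks, and whitespace handling, all of which a real converter has to deal with.

```javascript
// Toy converter: one rule per element, applied while walking a parsed tree.
// Assumption: nodes are strings or hypothetical { tag, attrs, children } objects.
// No escaping of Markdown characters, no nested blocks, no whitespace handling.
function toMarkdown(node) {
  if (typeof node === "string") return node;
  const kids = node.children || [];
  const inner = kids.map(toMarkdown).join("");
  const attrs = node.attrs || {};
  switch (node.tag) {
    case "h1": return `# ${inner}\n\n`;
    case "h2": return `## ${inner}\n\n`;
    case "p": return `${inner}\n\n`;
    case "strong": return `**${inner}**`;
    case "em": return `*${inner}*`;
    case "code": return "`" + inner + "`";
    case "a": return `[${inner}](${attrs.href})`;
    case "img": return `![${attrs.alt ?? ""}](${attrs.src})`;
    case "ul": return kids.map((li) => `- ${toMarkdown(li)}\n`).join("") + "\n";
    case "ol": return kids.map((li, i) => `${i + 1}. ${toMarkdown(li)}\n`).join("") + "\n";
    case "blockquote": return `> ${inner.trim()}\n\n`; // single-paragraph quotes only
    case "hr": return "---\n\n";
    default: return inner; // li, span, div, ...: keep the text, drop the tag
  }
}

// Convert a list of top-level nodes and trim the trailing blank lines.
function convert(nodes) {
  return nodes.map(toMarkdown).join("").trim();
}
```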

Code blocks with language hints:

```html
<pre><code class="language-javascript">
const x = 1;
</code></pre>
```

becomes:

````markdown
```javascript
const x = 1;
```
````
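The language hint comes from the `language-*` class on the `<code>` element. A minimal sketch of that step, assuming a parser has already handed over the class string and the code text (`fencedBlock` is a hypothetical helper name). It also picks a fence longer than any backtick run in the code, which is the CommonMark way to stop code that itself contains triple backticks from closing the block early.

```javascript
// Sketch: build a fenced code block from <pre><code class="language-xxx">.
// Assumes a parser already gave us the class attribute and the raw code text.
function fencedBlock(className, code) {
  // "hljs language-javascript" -> "javascript"; no language-* class -> no hint
  const match = /(?:^|\s)language-(\S+)/.exec(className || "");
  const lang = match ? match[1] : "";

  // Drop the single newline the HTML often has after <code> and before </code>.
  const body = code.replace(/^\n/, "").replace(/\n$/, "");

  // Use a fence longer than any backtick run inside the code (minimum 3),
  // so the code can never close it early.
  const runs = body.match(/`+/g) || [];
  const fenceLength = Math.max(3, ...runs.map((r) => r.length + 1));
  const fence = "`".repeat(fenceLength);

  return `${fence}${lang}\n${body}\n${fence}`;
}
```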

## What does not convert cleanly

**Tables.** HTML tables convert to pipe tables, which are a GitHub Flavored Markdown extension rather than part of core Markdown, and only simple tables work. Merged cells (`colspan`, `rowspan`), nested tables, and complex layouts have no pipe-table equivalent. Conversion tools typically flatten merged cells or drop them.
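A pipe table only works when every cell occupies exactly one grid position. This sketch checks for that first and returns `null` otherwise, leaving the caller to choose a fallback (flatten, keep the raw HTML, or flag it for manual review). It assumes cells were already extracted into hypothetical `{ text, colspan, rowspan }` objects, with the first row as the header and every row the same width.

```javascript
// Sketch: emit a GFM pipe table, or null when merged cells rule it out.
// Assumptions: cells are hypothetical { text, colspan?, rowspan? } objects,
// the first row is the header, and every row has the same number of cells.
function toPipeTable(rows) {
  if (rows.length === 0) return null;
  const merged = rows.flat().some((c) => (c.colspan ?? 1) > 1 || (c.rowspan ?? 1) > 1);
  if (merged) return null; // caller picks a fallback: flatten, raw HTML, or manual review

  // A literal "|" would end the cell, and a newline would end the row.
  const cell = (c) => c.text.replace(/\|/g, "\\|").replace(/\s*\n\s*/g, " ");
  const line = (row) => `| ${row.map(cell).join(" | ")} |`;
  const [header, ...body] = rows;
  const divider = `| ${header.map(() => "---").join(" | ")} |`;
  return [line(header), divider, ...body.map(line)].join("\n");
}
```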

**Inline styles.** `<span style="color: red;">text</span>` has no Markdown equivalent. The styling is lost. If the styling carries meaning (not just decoration), the conversion loses information.

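**Forms and interactive elements.** `<input>`, `<select>`, and `<button>` have no Markdown representation. Converters typically drop them, or keep only their text (a button label, for example) with no sign that a control was ever there.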

**Complex nesting.** A blockquote containing a list containing a code block is valid in both HTML and Markdown, but the Markdown syntax (nested `>` with indented `-` with indented backticks) is finicky and many converters get the indentation wrong.
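Nesting gets easier to reason about when each container transforms its child's already-converted Markdown: a list item indents continuation lines to the item's content column, and a blockquote prefixes every line with `>`. A minimal sketch with hypothetical helper names, building the blockquote, list, code block case from the inside out:

```javascript
// Sketch: containers transform their children's finished Markdown.
// Hypothetical helpers; a real converter applies them recursively.

// Blockquote: prefix every line with "> ", and blank lines with a bare ">".
function blockquote(md) {
  return md.split("\n").map((l) => (l ? `> ${l}` : ">")).join("\n");
}

// List item: indent continuation lines to the item's content column
// (marker width + 1 space) so they stay inside the item.
function listItem(marker, md) {
  const pad = " ".repeat(marker.length + 1);
  const [first, ...rest] = md.split("\n");
  return [`${marker} ${first}`, ...rest.map((l) => (l ? pad + l : ""))].join("\n");
}

// blockquote > list > code block, built from the inside out
const code = ["```js", "const x = 1;", "```"].join("\n");
const item = listItem("-", `item\n\n${code}`);
const nested = blockquote(item);
console.log(nested);
```

`nested` starts with `> - item` and a bare `>`, followed by the fence lines prefixed with `>   ` (the quote marker, its space, and two spaces for the list item's content column). Drop those two spaces and the code block falls out of the list item, which is the indentation bug described above.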

**Semantic HTML.** `<aside>`, `<figure>`, `<figcaption>`, `<details>`, and `<summary>` have no standard Markdown equivalents. Some renderers accept a subset as raw HTML (GitHub renders `<details>` and `<summary>`), but conversion tools typically strip these elements down to plain text.

## Common conversion scenarios

**Blog migration.** Moving from WordPress (HTML) to a static site generator (Markdown). The conversion needs to handle posts with images, code blocks, embeds, and internal links. This is the most complex scenario because WordPress HTML includes shortcodes, plugin-generated markup, and auto-formatting quirks.
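In a raw WordPress export, shortcodes are plain text inside the HTML, so an HTML-to-Markdown converter never sees them as elements. One approach is to expand them into ordinary HTML before conversion. A minimal sketch, assuming non-nested shortcodes in the enclosing `[name ...]...[/name]` form; the handler table is hypothetical and only covers `[embed]` and `[caption]`. Matching only known names keeps ordinary bracketed text like `[1]` intact.

```javascript
// Sketch: expand WordPress-style shortcodes into plain HTML *before*
// running the HTML-to-Markdown converter.
// Assumptions: shortcodes are not nested; the handler table is hypothetical.
const handlers = {
  // [embed]https://example.com/video[/embed] -> a paragraph with a link
  embed: (attrs, inner) => `<p><a href="${inner.trim()}">${inner.trim()}</a></p>`,
  // [caption ...]<img ...> Caption text[/caption] -> keep the inner HTML
  caption: (attrs, inner) => inner,
};

function expandShortcodes(html, known = handlers) {
  const names = Object.keys(known).join("|");
  // [name attrs]inner[/name] for known names only, so "[1]" or "[sic]" survive
  const re = new RegExp(`\\[(${names})\\b([^\\]]*)\\]([\\s\\S]*?)\\[/\\1\\]`, "g");
  return html.replace(re, (_, name, attrs, inner) => known[name](attrs.trim(), inner));
}
```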

**Documentation conversion.** Moving from a wiki (HTML-based) to a docs-as-code approach (Markdown in Git). The key challenge is preserving information hierarchy, internal links, and any diagrams or embedded media.
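Internal links usually need rewriting after conversion, because wiki URLs rarely match file paths in the repo. A sketch assuming a hypothetical scheme where pages live at `/wiki/Page_Name` and become `page-name.md` files in the same directory; adjust the pattern to your wiki's URL layout. It only touches inline links whose target starts with `/wiki/`, so external links pass through unchanged.

```javascript
// Sketch: rewrite internal wiki links in converted Markdown to repo-relative files.
// Hypothetical URL scheme: /wiki/Page_Name[#anchor] -> page-name.md[#anchor].
function rewriteWikiLinks(markdown) {
  const wikiLink = /\]\(\/wiki\/([^)#\s]+)(#[^)\s]*)?\)/g;
  return markdown.replace(wikiLink, (_, page, anchor = "") => {
    // "Getting_Started" or "Release%20Notes" -> "getting-started", "release-notes"
    const file = decodeURIComponent(page).toLowerCase().replace(/[_\s]+/g, "-");
    return `](${file}.md${anchor})`;
  });
}
```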

**Content archival.** Saving web pages as Markdown for personal knowledge management in tools like Obsidian, Notion, or Logseq. The conversion needs to extract the main content and ignore navigation, headers, footers, and ads.

**API documentation.** Converting API reference HTML (from tools like Swagger/OpenAPI HTML renderers) to Markdown for inclusion in README files or documentation sites.

## Programmatic conversion

For bulk conversion, Turndown is a widely used JavaScript library:

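```javascript
const TurndownService = require('turndown');

// Turndown defaults to setext headings (Hello / =====), indented code blocks,
// "*" bullets and "_" emphasis; these options match the mapping above.
const turndown = new TurndownService({
  headingStyle: 'atx',
  codeBlockStyle: 'fenced',
  bulletListMarker: '-',
  emDelimiter: '*',
});

const markdown = turndown.turndown('<h1>Hello</h1><p>World</p>');
// markdown === '# Hello\n\nWorld'
```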

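Turndown supports custom rules through `addRule(key, { filter, replacement })`, which is essential for CMS migration where custom shortcodes or plugin markup need special handling. GFM tables are not in Turndown's core; they come from the separate `turndown-plugin-gfm` package.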

I built an HTML-to-Markdown converter at [zovo.one/free-tools/html-to-markdown](https://zovo.one/free-tools/html-to-markdown/) that handles the conversion in your browser. Paste HTML, get Markdown. It handles headings, lists, links, images, code blocks, tables, and bold/italic formatting. Useful for quick conversions without setting up a Node.js environment or installing a library.

---

I'm Michael Lip. I build free developer tools at zovo.one. 500+ tools, all private, all free.

DEV Community