We needed a very fast way to convert HTML or XML to JSON: a tiny, dependency-free library that is efficient, flexible enough to handle complex structures, and quick to customize for applications such as:
HTML editors
XML data processing
content sanitizers
template transformers
or anywhere you need to treat markup as structured data
For the sake of flexibility, we ended up writing our own: a single-pass, string-based parser that walks the markup once and builds a general-purpose JSON tree that works across those use cases. Because real-world HTML is full of errors and oddities, we also included a JSON-to-HTML converter. That lets us write unit tests and validate each conversion by round-tripping HTML → JSON → HTML.
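To make the single-pass idea and the round-trip check concrete, here is a minimal sketch. It is not the library's source code: the node shape (`type`, `props`, `children`, and `#text` nodes), the `parse` and `render` names, and the short void-tag list are all assumptions made for illustration. It also skips comments, doctypes, unquoted attribute values, and attribute escaping.

```javascript
// Illustrative sketch only: node shape and function names are assumptions.
//   element: { type: 'div', props: { class: 'card' }, children: [...] }
//   text:    { type: '#text', text: 'Hello' }
const VOID_TAGS = new Set(['br', 'hr', 'img', 'input', 'link', 'meta']);

function parse(markup) {
  const root = { type: '#root', props: {}, children: [] };
  const stack = [root]; // open elements; the last entry is the current parent
  let i = 0;
  while (i < markup.length) {
    const parent = stack[stack.length - 1];
    const close = markup[i] === '<' ? markup.indexOf('>', i) : -1;
    if (close === -1) {
      // Text run: everything up to the next '<' (or the end of the input).
      const next = markup.indexOf('<', i + 1);
      const end = next === -1 ? markup.length : next;
      parent.children.push({ type: '#text', text: markup.slice(i, end) });
      i = end;
      continue;
    }
    const tag = markup.slice(i + 1, close).trim();
    i = close + 1;
    if (tag.startsWith('/')) {
      // Closing tag: pop the current element. Tag names are not validated.
      if (stack.length > 1) stack.pop();
      continue;
    }
    const selfClosing = tag.endsWith('/');
    const body = selfClosing ? tag.slice(0, -1) : tag;
    const name = (body.match(/^\S+/) || [''])[0];
    if (name === '') continue; // "<>" carries nothing to record
    const node = { type: name, props: {}, children: [] };
    // Attributes: name="value" pairs; bare names are stored as ''.
    const attrPattern = /([^\s=]+)(?:="([^"]*)")?/g;
    for (const [, key, value = ''] of body.slice(name.length).matchAll(attrPattern)) {
      node.props[key] = value;
    }
    parent.children.push(node);
    if (!selfClosing && !VOID_TAGS.has(name)) stack.push(node);
  }
  return root;
}

function render(node) {
  if (node.type === '#text') return node.text;
  const inner = node.children.map(render).join('');
  if (node.type === '#root') return inner;
  const attrs = Object.entries(node.props)
    .map(([key, value]) => ` ${key}="${value}"`)
    .join('');
  const open = `<${node.type}${attrs}>`;
  return VOID_TAGS.has(node.type) ? open : `${open}${inner}</${node.type}>`;
}

// Round trip: HTML -> JSON tree -> HTML should give back the input.
const source = '<div class="card"><h1>Title</h1><p>Hello <b>world</b><br>bye</p></div>';
console.log(render(parse(source)) === source); // true
```

The comparison on the last line is the kind of assertion a round-trip unit test would make: if the parser drops or mangles anything, the rendered string no longer matches the input.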
A simple JSON tree is easier to manipulate than raw HTML strings, which is where this library helps. It parses markup into a minimal structural representation, and the same module can render that structure back to HTML.
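As a rough illustration of why a tree is easier to work with than a string, the sketch below strips `script` elements and inline `on*` handlers from a tree and then renders the result. The node shape and the `sanitize` and `toHtml` helpers are hypothetical, written for this example rather than taken from the library, and the filter is far from a complete sanitizer.

```javascript
// Hypothetical node shape, assumed for illustration only.
const tree = {
  type: 'div',
  props: { class: 'post', onclick: 'steal()' },
  children: [
    { type: 'p', props: {}, children: [{ type: '#text', text: 'Hello' }] },
    { type: 'script', props: {}, children: [{ type: '#text', text: 'alert(1)' }] },
  ],
};

// Returns a new tree without <script> elements or on* attributes.
function sanitize(node) {
  if (node.type === '#text') return node;
  const props = Object.fromEntries(
    Object.entries(node.props).filter(([name]) => !name.toLowerCase().startsWith('on'))
  );
  const children = node.children
    .filter((child) => child.type !== 'script')
    .map(sanitize);
  return { ...node, props, children };
}

// Minimal renderer for the shape above (no escaping, no void elements).
function toHtml(node) {
  if (node.type === '#text') return node.text;
  const attrs = Object.entries(node.props)
    .map(([name, value]) => ` ${name}="${value}"`)
    .join('');
  const inner = node.children.map(toHtml).join('');
  return `<${node.type}${attrs}>${inner}</${node.type}>`;
}

console.log(toHtml(sanitize(tree))); // <div class="post"><p>Hello</p></div>
```

Doing the same with regular expressions over the raw HTML string would be much more fragile; on a tree it is a filter and a map.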
JavaScript HTML to JSON converter:
Zero dependencies
Bidirectional
Understands both HTML and XML, including namespaces
Pretty output: optional pretty-printing with indentation
The library is MIT-licensed. If you'd like to try it out, you can find more information on our GitHub page below.
Install & get started
npm install @lemonadejs/html-to-json
GitHub repository
https://github.com/lemonadejs/html-to-json
What are the limitations, and what is expected for the next version?
HTML entities need better handling;
Whitespace is preserved exactly, creating a lot of extra entries in the JSON;
Markup errors are not validated, so no detailed errors are available;
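Until whitespace handling improves, one possible workaround is to prune whitespace-only text nodes after parsing. The sketch below assumes a hypothetical node shape (`type`, `props`, `children`, with `#text` nodes carrying a `text` field) and a made-up `pruneWhitespace` helper; neither comes from the library. It leaves `pre` and `textarea` subtrees alone, since whitespace is meaningful there.

```javascript
// Hypothetical node shape and helper name, assumed for illustration only.
const KEEP_WHITESPACE = new Set(['pre', 'textarea']);

// Returns a new tree without text nodes that contain only whitespace.
function pruneWhitespace(node) {
  if (node.type === '#text' || KEEP_WHITESPACE.has(node.type)) return node;
  const children = node.children
    .filter((child) => !(child.type === '#text' && child.text.trim() === ''))
    .map(pruneWhitespace);
  return { ...node, children };
}

// An indented <ul> typically yields whitespace text nodes between the <li> items.
const list = {
  type: 'ul',
  props: {},
  children: [
    { type: '#text', text: '\n  ' },
    { type: 'li', props: {}, children: [{ type: '#text', text: 'One' }] },
    { type: '#text', text: '\n  ' },
    { type: 'li', props: {}, children: [{ type: '#text', text: 'Two' }] },
    { type: '#text', text: '\n' },
  ],
};

console.log(list.children.length);                  // 5
console.log(pruneWhitespace(list).children.length); // 2
```

Note that this is lossy: a single space between two inline elements is visible in the browser, so pruning breaks an exact round trip and is best kept as an opt-in step.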
Other useful tools
You can find more tools from the same authors: