Doing software development since I can remember. Big fan of the JavaScript and Rust ecosystems. Long-time Java and JavaScript developer. Music producer wanna-be. ;)
Location
Warsaw, Poland
Work
Software Architect | Senior Full-stack Java/JS Developer at ISOLUTION
We can parse a string into a DOM `Document` with the `DOMParser` class. From there, we can use a function to traverse the DOM and remove comment nodes and whitespace-only text nodes (every node has a `nodeType` that tells us what kind of node it is). This is going to be a bit lengthy:
Let's parse a sample document:
```javascript
const dom = new DOMParser().parseFromString(`
<!doctype html>
<html>
<head>
<title>Test</title>
</head>
<body>
<strong>Simple text<\/strong>
<!-- comment -->
<script>
document.write('<em>This is not</em> <em>a part of the document</em>');
console.log('This is not as well');
<\/script>
</body>
</html>`, "text/html");
```
Here we have a simple HTML document with new lines, tabs/spaces, a comment and a script block. I had to escape the closing script tag, because otherwise Firefox and VSCode complained about an unterminated string.
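The backslash doesn't change the string value at all: inside a JavaScript string or template literal, `\/` is simply `/`. What it changes is the source text, which no longer contains the literal `</script>` sequence. A likely explanation for the "unterminated string" error is that an HTML parser (e.g. when the code sits in an inline `<script>` block) treats that sequence as the end of the script and cuts the template literal short. A quick check that the escaped and unescaped forms are identical:

```javascript
// "\/" is a redundant escape inside a string or template literal: it evaluates
// to "/", so the escaped literal holds exactly the same characters.
const escaped = `<\/script>`;
// Built by concatenation so this source never contains the closing-tag sequence.
const plain = "<" + "/script>";

console.log(escaped === plain); // true
console.log(escaped.length); // 9
```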
Let's write a simple minify function (recursive - I'm lazy ;) ):
```javascript
function minify(parent) {
  // we have to make a copy of the child list for traversal, because childNodes
  // is live and we cannot iterate through what we're modifying at the same time
  const nodes = Array.from(parent.childNodes);
  for (const node of nodes) {
    if (node.nodeType === Node.COMMENT_NODE) {
      // remove comment nodes
      parent.removeChild(node);
    } else if (node.nodeType === Node.TEXT_NODE) {
      // test for a pure whitespace node (no characters other than whitespace)
      if (!/\S/.test(node.nodeValue)) {
        // remove pure whitespace node
        parent.removeChild(node);
      }
    } else {
      // process element (and other) nodes recursively
      minify(node);
    }
  }
}
```
It's simple, and unlike a regex-based approach it won't turn into a mess once you start handling corner cases (like keeping a regex from touching what's inside a script tag). It also gives you more flexibility and control, as code generally does compared to regex.
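To make the regex comparison concrete, here is a deliberately naive regex-based minifier (a hypothetical `naiveMinify`, written only for contrast) applied to similar markup. Because a regex sees characters rather than nodes, it also collapses the space between `</em>` and `<em>` inside the JavaScript string literal, changing what the script would write. The DOM-based `minify` leaves the script's text node alone, since it only removes text nodes that are whitespace-only.

```javascript
// Hypothetical, deliberately naive regex "minifier" (for contrast only):
// strips <!-- ... --> comments and any whitespace between a ">" and the next "<".
function naiveMinify(html) {
  return html
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/>\s+</g, "><");
}

const html = [
  "<body>",
  "<strong>Simple text</strong>",
  "<!-- comment -->",
  "<script>",
  "document.write('<em>This is not</em> <em>a part of the document</em>');",
  "<" + "/script>", // split so this source never contains a closing script tag
  "</body>",
].join("\n");

const result = naiveMinify(html);
console.log(result);
// The space inside the JavaScript string literal is gone as well:
console.log(result.includes("This is not</em><em>a part")); // true
```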
Finally, let's use it:
```javascript
console.log(`<!doctype ${dom.doctype.name}>\n${dom.childNodes[1].outerHTML}`); // original HTML

minify(dom);

console.log(`<!doctype ${dom.doctype.name}>${dom.childNodes[1].outerHTML}`); // minified HTML
```
Yes, I know doctypes are a bit more complex when you take pre-HTML5 document types into account, but for the sake of simplicity let's assume we're only dealing with the simple HTML5 doctype.
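If you do need older doctypes, a `DocumentType` node exposes `name`, `publicId` and `systemId`, which is enough to rebuild the declaration. Below is a minimal sketch with a hypothetical `serializeDoctype` helper; it accepts any object with those three properties, so it also runs outside a browser. In the example above you would call it as `serializeDoctype(dom.doctype)` in place of the hard-coded `<!doctype ${dom.doctype.name}>`.

```javascript
// Hypothetical helper: rebuilds a doctype declaration from a DocumentType-like
// object ({ name, publicId, systemId }), including the public/system
// identifiers used by pre-HTML5 doctypes. Empty identifiers are omitted.
function serializeDoctype(doctype) {
  let out = `<!doctype ${doctype.name}`;
  if (doctype.publicId) {
    out += ` PUBLIC "${doctype.publicId}"`;
  } else if (doctype.systemId) {
    out += " SYSTEM";
  }
  if (doctype.systemId) {
    out += ` "${doctype.systemId}"`;
  }
  return out + ">";
}

// HTML5 doctype: both identifiers are empty strings
console.log(serializeDoctype({ name: "html", publicId: "", systemId: "" }));

// HTML 4.01 Strict doctype
console.log(serializeDoctype({
  name: "html",
  publicId: "-//W3C//DTD HTML 4.01//EN",
  systemId: "http://www.w3.org/TR/html4/strict.dtd",
}));
```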
The first log prints the HTML serialized from the unminified DOM Document, with its original formatting. The second log prints it after minification (removal of unnecessary nodes). Here are both outputs for comparison:
First log, before `minify`:

```html
<!doctype html>
<html><head>
<title>Test</title>
</head>
<body>
<strong>Simple text</strong>
<!-- comment -->
<script>
document.write('<em>This is not</em> <em>a part of the document</em>');
console.log('This is not as well');
</script>
</body></html>
```
Second log, after `minify`:

```html
<!doctype html><html><head><title>Test</title></head><body><strong>Simple text</strong><script>
document.write('<em>This is not</em> <em>a part of the document</em>');
console.log('This is not as well');
</script></body></html>
```
While the HTML document has been minified, the JavaScript code remains unchanged. In our `minify` function we could add another condition for detecting `script` tags and minifying them differently (e.g. check that `node.nodeType === Node.ELEMENT_NODE` and `node.nodeName === 'SCRIPT'`).
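A sketch of that extra condition is below. The `SCRIPT` branch calls a hypothetical `minifyScript` helper that only trims each line and drops blank lines; it does not join lines (which could break automatic semicolon insertion) and it would also alter multi-line template literals, so in practice you'd hand the code to a real JavaScript minifier instead. The node type constants are inlined with the same values as `Node.ELEMENT_NODE`, `Node.TEXT_NODE` and `Node.COMMENT_NODE`, so the sketch doesn't depend on browser globals.

```javascript
// Same numeric values as Node.ELEMENT_NODE, Node.TEXT_NODE and Node.COMMENT_NODE
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

// Hypothetical, very conservative script "minifier": trims every line and
// drops empty ones. A placeholder for a proper JavaScript minifier.
function minifyScript(code) {
  return code
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "")
    .join("\n");
}

function minify(parent) {
  // copy the live child list, since we remove nodes while iterating
  for (const node of Array.from(parent.childNodes)) {
    if (node.nodeType === COMMENT_NODE) {
      parent.removeChild(node);
    } else if (node.nodeType === TEXT_NODE) {
      // remove whitespace-only text nodes
      if (!/\S/.test(node.nodeValue)) {
        parent.removeChild(node);
      }
    } else if (node.nodeType === ELEMENT_NODE && node.nodeName === "SCRIPT") {
      // scripts get their own treatment instead of the generic recursion
      node.textContent = minifyScript(node.textContent);
    } else {
      minify(node);
    }
  }
}
```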
It's just a simple example of how you could use the DOM to minify your HTML. The same approach could also be applied to XML documents (`DOMParser` also accepts `"application/xml"`), among other use cases.
I do like your answer, and I think it has a great specific use case, but I am confused by the rigidity of this approach. For example, what if someone minifies a chunk of HTML that has no doctype, head or body? First, how would your code handle both full HTML files and HTML chunks? From my testing of your code, you can do one or the other but not both. Is there something I am missing? I do like your answer, but I'm not sure it has the flexibility to minify any HTML you throw at it.
But of course. :)