- If using XHTML, close them, period.
- Even if using HTML5, we should close them.
The long story...
XHTML and the origin of HTML5
If we had a time machine and went back to the year 2000, we would find the very first release of XHTML, which was basically a reformulation of the three HTML 4 document types (Strict, Transitional and Frameset) supported by web browsers back then.
XHTML was created as an XML-based version of HTML 4.x, so that HTML documents would follow XML's strict tree structure. This made it possible to handle and work with HTML documents using common XML tools, since they were well-formed XML documents.
However, for several years, until HTML5 became an official W3C Recommendation in 2014, there was some uncertainty about which document type to use and, therefore, about the correct way to write HTML documents.
We are now in 2020, and things are clearer and more mature in terms of standards and browser support. However, we shouldn't underestimate how much XHTML is still used in enterprise systems and processes, because sometimes, the larger the project or scope, the more likely we are to work with XHTML rather than HTML.
For instance, tasks such as parsing, cleaning or crawling HTML documents become much easier when the documents are well-formed XML trees.
The following examples of such tasks come from my own experience rather than from hard data:
- Working with Java libraries that convert HTML documents to PDF, such as iText, becomes less error-prone, because the most commonly used XML parsers work best with well-formed XML trees.
- Even if we use something like jsoup (the Java HTML parser par excellence) to auto-close our tags, that step adds processing time to parsing and traversing tasks.
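To make the difference concrete, here is a minimal sketch using Python's `xml.etree.ElementTree`, a strict XML parser from the standard library (the tools mentioned above are Java libraries, but the behavior is the same idea). It accepts a closed `<img ... />` and raises `ParseError` on the unclosed HTML5 form. The helper name `is_well_formed_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Raised, for example, when </p> arrives while <img> is still open.
        return False


unclosed = '<p><img src="/path/to/image.png" alt="Image"></p>'
closed = '<p><img src="/path/to/image.png" alt="Image" /></p>'

print(is_well_formed_xml(unclosed))  # False
print(is_well_formed_xml(closed))    # True
```

An HTML parser would accept both strings; only the closed form can be handed straight to XML tooling without a repair step.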
But what if my authors or tools don't "close" all void elements?
First of all, what is a "void" element?
In the HTML spec, a void element is an element that only has a start tag; end tags must not be specified for void elements.
Void elements are:
area, base, br, col, embed, hr, img, input, link, meta, param, source, track, wbr
Please check this page for more information: https://html.spec.whatwg.org/#void-elements
And per the W3C HTML spec:
Then, if the element is one of the void elements, or if the element is a foreign element, then there may be a single U+002F SOLIDUS character (/). This character has no effect on void elements, but on foreign elements it marks the start tag as self-closing.
Having said that, even though the trailing slash on a void element is optional in HTML5 (both forms are valid; see https://html.spec.whatwg.org/#start-tags), we should close our void elements: it does no harm, and it gives us a more XML-like tree structure, which is faster and more "secure" to parse with XML tooling.
So, instead of having:
<img src="/path/to/image.png" alt="Image">
We should have the following tag:
<img src="/path/to/image.png" alt="Image" />
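If you need to find void elements that authors or tools left unclosed, a minimal audit sketch with Python's standard `html.parser` module could look like this. It relies on `HTMLParser` calling `handle_startendtag()` for XHTML-style `<tag />` tags and `handle_starttag()` for plain `<tag>` tags. The names `UnclosedVoidFinder` and `find_unclosed_void_elements` are hypothetical, and the element set is the list quoted above.

```python
from html.parser import HTMLParser

# Void elements as listed above.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


class UnclosedVoidFinder(HTMLParser):
    """Collect void elements written as <tag> instead of <tag />."""

    def __init__(self) -> None:
        super().__init__()
        self.unclosed: list[tuple[str, int]] = []

    def handle_starttag(self, tag, attrs):
        # Plain start tags such as <img ...>; tag names arrive lowercased.
        if tag in VOID_ELEMENTS:
            line, _col = self.getpos()
            self.unclosed.append((tag, line))

    def handle_startendtag(self, tag, attrs):
        # XHTML-style empty tags such as <img ... />. Overridden so it does
        # not fall through to handle_starttag, as the default does.
        pass


def find_unclosed_void_elements(markup: str) -> list[tuple[str, int]]:
    """Return (tag, line number) pairs for every unclosed void element."""
    finder = UnclosedVoidFinder()
    finder.feed(markup)
    finder.close()
    return finder.unclosed


sample = '<p>Hello<br>world</p>\n<img src="/a.png" alt="A" />\n<input type="text" name="search">'
print(find_unclosed_void_elements(sample))  # [('br', 1), ('input', 3)]
```

Reporting line numbers rather than fixing the markup keeps the check cheap enough to run in a build or review step.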
What about my current website, which has an XHTML Strict doctype?
Imagine you have this doctype:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
That means you MUST close void elements, at least until you decide to switch to HTML5. This may raise a new question: why am I using XHTML in strict mode at all? The answer depends on your particular use case; however, in my personal experience, HTML5 is the right doctype for most scenarios.
In a nutshell, my personal recommendation is to switch to HTML5 and always close your void elements. For instance, this kind of tag:
<input type="text" name="search">
Should be transformed to:
<input type="text" name="search" />
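Doing this conversion properly across existing markup is the job of jsoup or a similar HTML parser. Purely as an illustration, here is a naive regex-based sketch in Python; `VOID_TAG_RE` and `close_void_elements` are hypothetical names. It is not an HTML parser: it will mangle a `>` inside an attribute value or an unquoted attribute value ending in `/`, and it does not skip comments or `<script>` contents, so only use something like it on controlled markup.

```python
import re

# Void elements from the list above; \b stops "col" from matching "<colgroup>".
VOID_TAG_RE = re.compile(
    r"<(area|base|br|col|embed|hr|img|input|link|meta|param|source|track|wbr)\b"
    r"([^>]*?)\s*/?>",
    re.IGNORECASE,
)


def close_void_elements(markup: str) -> str:
    """Rewrite <tag ...> and <tag .../> as <tag ... /> for void elements."""
    return VOID_TAG_RE.sub(r"<\1\2 />", markup)


print(close_void_elements('<input type="text" name="search">'))
# <input type="text" name="search" />
```

Already-closed tags come out unchanged, so running it twice is harmless.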
Any thoughts? I would be glad to hear your opinion.
Comments
Switch to HTML5. Period.
Hi @Mr.13, thanks for reading and commenting!
HTML5 would allow us to have tags such as <img> without the need to close them. The issue I have personally faced is with parsers that are not related to web browsers: tools such as … XML trees really fast; however, they start to throw exceptions with trees that are not well-formed.
We could fix the HTML to convert it into a well-formed tree with tools such as jsoup; however, that would increase the processing time. This is not necessarily an issue when processing small sites, but if we need to process more than half a million pages in a short time, every millisecond of processing counts.
Having said that, I agree with you: in most scenarios, just switching to HTML5 will be fine.
I was about to point you to hixie.ch/advocacy/xhtml as a rebuttal to your proposal to "close" void tags, but it looks like it actually (kind of) agrees with you on that point.
Just make sure the people you work with don't mistake HTML5 for XML and then produce <script src="…" /> or CDATA sections, and at the same time also stick to your rule of being XML-compatible (which also means always quoting attributes, never omitting optional tags, etc.).
At work, I ask coworkers not to use />: we're doing HTML, not XML, and the clearer you make that, the better IMO.
I agree with you on this, but I'm still confused: why would you parse that many pages?
Even for search engine indexing, we only need metadata.
Try a C/C++/Rust-based parser.