Switch to HTML5. Period.
Hi @Mr.13, thanks for reading and commenting!
Switching to HTML5 would allow us to use tags such as <img> without closing them. The issue I have personally faced is with parsers that are not related to web browsers: tools such as iText parse XML trees really fast, but they start throwing exceptions when a tree is not well-formed.
We could fix the HTML and convert it into a well-formed tree with tools such as jsoup, but that would increase the processing time. This is not necessarily an issue when processing small sites, but if we need to process more than half a million pages in a short time, each millisecond of processing counts.
Having said that, I agree with you: in most scenarios just switching to HTML5 will be fine.
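A minimal sketch of the failure mode described above. It uses Python's standard-library XML parser as a stand-in for a strict XML consumer; this is an assumption for illustration only, and iText's own API and error messages are not shown. The file name logo.png and the helper parses_as_xml are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup: str) -> bool:
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # A strict XML parser rejects the whole document on the first error.
        return False


# HTML5 style: <img> is a void element and is left unclosed.
html5_style = '<p><img src="logo.png" alt="Logo"></p>'
# XML-compatible style: the same element, explicitly self-closed.
xml_style = '<p><img src="logo.png" alt="Logo" /></p>'

print(parses_as_xml(html5_style))  # False: </p> does not match the open <img>
print(parses_as_xml(xml_style))    # True
```

Both snippets are valid HTML5, but only the second one survives a strict XML parser without a clean-up pass first.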
Regards.
I agree with you on this, but I'm still confused: why would you parse that many pages?
Even for search engine indexing, we only need the metadata.
Suggestions:
Try a C/C++/Rust-based parser.
I was about to point you to hixie.ch/advocacy/xhtml as a rebuttal to your proposal to "close" void tags, but it looks like it actually (kind of) agrees with you on that point.
Just make sure the people you work with don't mistake HTML5 for XML and then produce <script src="…" /> or CDATA sections, while also sticking to your rule of being XML-compatible (which also means always quoting attributes, never omitting optional tags, etc.).
At work, I ask coworkers not to use />: we're doing HTML, not XML, and the clearer you make that, the better IMO.
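As a rough illustration of what the XML-compatibility rules mentioned above imply, the sketch below runs a few fragments through Python's standard-library XML parser. The fragments, the file name app.js, and the helper xml_compatible are all made up for the example. Note the last case: a self-closed <script> is well-formed XML, so a check like this will not flag it, even though an HTML parser ignores the trailing slash and treats the element as still open.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments, each exercising one of the rules.
SNIPPETS = {
    "quoted attribute": '<p class="intro">Hi</p>',
    "unquoted attribute": '<p class=intro>Hi</p>',
    "explicit end tags": '<ul><li>One</li><li>Two</li></ul>',
    "omitted </li>": '<ul><li>One<li>Two</ul>',
    "self-closed script": '<script src="app.js" />',
}


def xml_compatible(markup: str) -> bool:
    """Return True if the fragment is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


results = {label: xml_compatible(markup) for label, markup in SNIPPETS.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

A well-formedness check therefore covers only half of the advice: it catches unquoted attributes and omitted end tags, but not XML-only habits that HTML interprets differently.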