Christian Bewernitz

Posted on Aug 29, 2024

Release 0.9.0 of `@xmldom/xmldom`

#javascript #xml #opensource #release

Context

xmldom is a JavaScript ponyfill that provides the following APIs, which are present in modern browsers, to other runtimes:

- convert an XML string into a DOM tree
  `new DOMParser().parseFromString(xml, mimeType) => Document`
- create, access and modify a DOM tree
  `new DOMImplementation().createDocument(...) => Document`
- serialize a DOM tree back into an XML string
  `new XMLSerializer().serializeToString(node) => string`

Source: xmldom readme

History

Since I started contributing to the forked xmldom library in June 2020, there have been 40 releases.

It is a very interesting and challenging project and will most likely stay that way for quite a while.

According to GitHub, more than 50 people have contributed to it since it was forked.

Thank you again to all contributors.

And that doesn't count all the people who made the move from the original unscoped `xmldom` package to the scoped `@xmldom/xmldom` package (version 0.7.0) to get all security fixes.
The most recent version released under the `lts` tag is 0.7.13.

The last version with breaking changes was 0.8.0, which was released on Dec 22, 2021, almost 3 years ago.
The most recent version released under the `latest` tag is 0.8.10.

0.9.0 (2024-08-29)

But what I want to talk about today is everything that has been released under the `next` tag since October 2022.

I'm really excited about those changes because they provide a clear foundation for potential future changes.

TL;DR: closer alignment with the specs, and any remaining differences are made as explicit as possible.

1. Enforcing `mimeType` to give back control

One aspect that makes the implementation complex is that there are different rules for parsing XML and HTML.
xmldom (to some degree) "supported" both flavors from the beginning. It was not even required to pass a mimeType at all: which rules to apply was decided based on the default namespace of the XML string/node currently being parsed.

This ends with 0.9.0: from now on, the `mimeType` in `DOMParser.parseFromString(xml, mimeType)` is mandatory and is the only thing that is ever checked to decide whether to apply XML or HTML rules. Basta.

That information is preserved in the resulting Document (new `type` property), so the proper rules are applied again when it is serialized.

This was a massive (and potentially breaking) change, but I'm really excited it is ready, since it made tons of related bug fixes possible or much simpler to implement, and it also reduces the complexity of both the API and the implementation.

Additionally, it now only accepts the MIME types defined in the spec and throws a TypeError in any other case.
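To illustrate the new rule, here is a minimal sketch of how a parser can choose between XML and HTML rules based on the MIME type alone. This is not xmldom's actual code. It assumes the five MIME types that the web platform allows for `DOMParser.parseFromString()`: `text/html`, `text/xml`, `application/xml`, `application/xhtml+xml` and `image/svg+xml`. The helper name `parsingRulesFor` is hypothetical.

```javascript
// Hypothetical helper: illustrates mimeType-only dispatch,
// it is not xmldom's implementation.
const SUPPORTED_MIME_TYPES = [
  'text/html',
  'text/xml',
  'application/xml',
  'application/xhtml+xml',
  'image/svg+xml',
];

/**
 * Decides which parsing rules apply, looking only at the mimeType
 * (never at namespaces or the content of the document).
 * @param {string} mimeType
 * @returns {'html' | 'xml'}
 */
function parsingRulesFor(mimeType) {
  if (!SUPPORTED_MIME_TYPES.includes(mimeType)) {
    // Any other value is rejected, mirroring the TypeError described above.
    throw new TypeError(`Unsupported mimeType: ${mimeType}`);
  }
  return mimeType === 'text/html' ? 'html' : 'xml';
}

console.log(parsingRulesFor('text/html')); // html
console.log(parsingRulesFor('image/svg+xml')); // xml
```

Because the result depends only on the argument, the same value can be stored on the Document and reused when serializing. This is the idea behind the new `type` property.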

Strictness and Error handling

An aspect of the native browser API's error handling that personally confuses me is that it always returns a Document, and if something went wrong, a `parsererror` node will be the first child of the body.

Error handling never worked this way in xmldom, and the existing error handling was very complex, confusing and badly documented. So 0.9.0 simplifies it and now behaves (much) more consistently towards any error that happens during parsing:
It throws a ParseError 🎉, e.g. in the following cases:

- Some non-well-formed XML strings previously produced a Document without a documentElement, which would most likely lead to TypeErrors later in the code; these now throw.
- Several non-well-formed XML strings are now properly reported as a fatalError, which always prevents any further processing.
- Several issues that were previously not reported at all, or only reported as a warning, are now also reported as a fatalError.

There are still cases that are reported as a warning (especially when parsing HTML) or as an error, which do not stop the data from being processed. However, the new error handling makes it easy to decide how strict the code that uses xmldom needs to be.

The (non-spec-compliant) option that can be passed to the DOMParser constructor is called `onError`.
It takes a function with the following signature:

```typescript
function onError(level: ErrorLevel, message: string, context: DOMHandler): void;
```

`ErrorLevel` is either `warning`, `error` or `fatalError`.
xmldom already provides an implementation for the two most common use cases:
- `onErrorStopParsing` to throw a ParseError also for all `error` level issues
- `onWarningStopParsing` to throw a ParseError also for all `warning` (and `error`) level issues

It is recommended to apply one of them to stop processing XML at the first sign of anything unexpected:

```javascript
const { DOMParser, onErrorStopParsing, onWarningStopParsing } = require('@xmldom/xmldom');

// prevent parsing of XML that has errors
new DOMParser({ onError: onErrorStopParsing }).parseFromString(xml, mimeType);

// prevent parsing of XML that has warnings (or errors)
new DOMParser({ onError: onWarningStopParsing }).parseFromString(xml, mimeType);
```
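Because `onError` is just a function with the signature shown above, a project can also define its own policy. The following is a minimal sketch. `createCollectingOnError` is a hypothetical name and not part of xmldom. The handler records every message. Warnings are only collected, for example for logging. Any other level is recorded and then throws. This assumes that an exception thrown from the handler stops processing, in the same way the built-in helpers throw to stop parsing.

```javascript
/**
 * Hypothetical factory for an onError handler with the signature
 * onError(level, message, context).
 * 'warning' is only recorded; any other level ('error', 'fatalError')
 * is recorded and then stops processing by throwing.
 */
function createCollectingOnError() {
  const messages = [];

  function onError(level, message, context) {
    // `context` (the DOMHandler) is not needed for this policy.
    messages.push({ level, message });
    if (level !== 'warning') {
      throw new Error(`XML ${level}: ${message}`);
    }
  }

  return { onError, messages };
}

// Intended usage with xmldom (not executed here):
// const handler = createCollectingOnError();
// new DOMParser({ onError: handler.onError }).parseFromString(xml, mimeType);
// handler.messages.forEach((m) => console.warn(m.level, m.message));

const handler = createCollectingOnError();
handler.onError('warning', 'example warning');
console.log(handler.messages.length); // 1
```

Compared to `onErrorStopParsing`, this keeps the same strictness for errors but still gives access to the warnings that did not stop parsing.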

`compareDocumentPosition`, extended HTML entities, `null` instead of `undefined`, ...

Another fork of the original xmldom repository made its way back into our repo by extending the HTML entities to the complete set (also available in 0.8.x) and porting over the implementation of the `compareDocumentPosition` API. Thank you, and welcome @zorkow!
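`compareDocumentPosition` returns a bitmask rather than a single value, so callers usually test individual bits. The constants below are the `Node.DOCUMENT_POSITION_*` values defined by the DOM standard. The helper `describePosition` is a hypothetical name that only decodes such a mask, and it does not depend on xmldom.

```javascript
// Bit values defined by the DOM standard (Node.DOCUMENT_POSITION_*).
const DOCUMENT_POSITION = {
  DISCONNECTED: 0x01,
  PRECEDING: 0x02,
  FOLLOWING: 0x04,
  CONTAINS: 0x08,
  CONTAINED_BY: 0x10,
  IMPLEMENTATION_SPECIFIC: 0x20,
};

/**
 * Decodes the result of a.compareDocumentPosition(b) into flag names.
 * @param {number} mask
 * @returns {string[]}
 */
function describePosition(mask) {
  return Object.keys(DOCUMENT_POSITION).filter(
    (name) => (mask & DOCUMENT_POSITION[name]) !== 0
  );
}

// When b is a descendant of a, a.compareDocumentPosition(b) is
// CONTAINED_BY | FOLLOWING, i.e. 0x10 | 0x04 === 20.
console.log(describePosition(20)); // [ 'FOLLOWING', 'CONTAINED_BY' ]
```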

Along the way, several places where xmldom returned `undefined` instead of `null` have been fixed to adhere to the spec.

I also discovered that the former author seems to have preferred iterating from the end of a list in so many places that attributes were processed in reverse order in several places; this is now fixed.

The implementation of the `removeChild` API changed quite a bit to comply with the spec, and it now throws a DOMException when it should.
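The DOM standard's removal algorithm starts with a precondition. The node passed to `removeChild` must actually be a child of the node it is called on. Otherwise a `NotFoundError` DOMException is thrown. The toy tree below sketches only that check. `TreeNode` is hypothetical. A plain `Error` whose `name` is `NotFoundError` stands in for a `DOMException`, so the example does not rely on any DOM implementation.

```javascript
// Hypothetical minimal tree node, only to illustrate the spec's precondition.
class TreeNode {
  constructor(name) {
    this.name = name;
    this.parentNode = null;
    this.childNodes = [];
  }

  appendChild(child) {
    // Like in the DOM, a node has at most one parent: detach it first.
    if (child.parentNode) child.parentNode.removeChild(child);
    this.childNodes.push(child);
    child.parentNode = this;
    return child;
  }

  removeChild(child) {
    // Spec: if child's parent is not this node, throw a "NotFoundError".
    if (child.parentNode !== this) {
      const error = new Error('The node to be removed is not a child of this node.');
      error.name = 'NotFoundError'; // stands in for a DOMException name
      throw error;
    }
    this.childNodes.splice(this.childNodes.indexOf(child), 1);
    child.parentNode = null;
    return child;
  }
}

const root = new TreeNode('root');
const child = root.appendChild(new TreeNode('child'));
root.removeChild(child); // fine: child's parent is root
try {
  root.removeChild(child); // child no longer has root as its parent
} catch (error) {
  console.log(error.name); // NotFoundError
}
```

Failing loudly here, instead of silently doing nothing, surfaces bugs where code removes a node from the wrong parent.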

Three related bugs were fixed in a way that clearly states the future direction of xmldom:
support for lax HTML parsing rules will only be provided if strict XML parsing doesn't suffer from it.
The former (broken) "support" for automatically self-closing tags in HTML is gone.

doctype internalSubset

More recently, @shunkica invested a huge amount of time and effort to fix tons of issues in the former handling of the internalSubset part of the !DOCTYPE.

It is now preserved in the `internalSubset` property of a Document's `doctype`, and many invalid doctype declarations are now correctly detected and reported as a fatalError.

Also thanks to @kboshold for the latest bug fix in this area.

Along the way, we created a new module containing regular expressions for the relevant grammar. The correctness checks are based on those expressions, and they are properly covered by tests.
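xmldom's grammar module is not reproduced here. As a much simplified sketch of the idea, a single regular expression can already separate the name of a `<!DOCTYPE ...>` declaration from its internal subset, which is the part between `[` and `]`. In xmldom 0.9 the preserved value itself is exposed as `doctype.internalSubset`, as described above. `DOCTYPE_PATTERN` and `readDoctype` are hypothetical names. The pattern deliberately ignores cases that the real grammar must handle, such as a `]` inside quoted literals or comments.

```javascript
// Hypothetical, simplified pattern: DOCTYPE name, then anything up to an
// optional [ ... ] internal subset, then the closing '>'.
// It does NOT handle ']' inside quoted strings or comments.
const DOCTYPE_PATTERN = /<!DOCTYPE\s+([^\s[>]+)[^[>]*(?:\[([\s\S]*?)\])?\s*>/;

/**
 * Extracts the doctype name and the raw internal subset ('' if absent).
 * @param {string} source
 * @returns {{ name: string, internalSubset: string } | null}
 */
function readDoctype(source) {
  const match = DOCTYPE_PATTERN.exec(source);
  if (!match) return null;
  return { name: match[1], internalSubset: match[2] || '' };
}

console.log(readDoctype('<!DOCTYPE note [<!ENTITY writer "Donald">]><note/>'));
// { name: 'note', internalSubset: '<!ENTITY writer "Donald">' }
```

A regex this small is only enough to locate the subset. Checking that the declarations inside it are well-formed needs the dedicated grammar rules mentioned above.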

It is not the goal of xmldom to become a validating parser, but this is a great step towards supporting documents that come with more complex DTDs.

And there is even more

Until now, development was done using Node v10, since this is also the lowest version xmldom currently supports. As part of the work on the upcoming version, I decided to switch to v18 for development, since more and more devDependencies also require it as a minimum. This will be the new minimum runtime version for the time being, starting with this release.

I started a public poll/discussion to ask people which versions of Node or other runtimes they need support for.
The next breaking release will most likely drop support for some older Node versions, if there is no feedback indicating something different.

Along the way, plenty of APIs have received JSDoc comments with proper types.

Thank you for taking the time to read through all of this.

These are quite a few changes, and I'm very excited to be able to ship them.

I hope you are as excited as I am :)

If you need more details, you can go through the very detailed changelog, or head over to the repository to join or start a discussion or file an issue.

DEV Community
