Parsing XML

HTML is the most common markup language for web development. HTML and XML are siblings rather than parent and child: both descend from SGML, and XHTML is the variant of HTML written to XML's rules. What is cool is that web browsers, alongside their ability to render HTML, also come with XML parsers, so XML parsing capabilities are available under the hood.

Why Think About XML At All

HTML is the ubiquitous markup language of web developers. The audience of this blog, software engineers, likely only needs HTML. Yet my media company deals with many authors of the non-technical variety, and I have got to say... authors think about their content wayyy differently than HTML gives them credit for.

The beauty of XML is its generic structure, which allows for custom parsing and handling. HTML shows how far tag-based markup can go, but when you need your own element definitions, XML handles the job better.

XML is a data-carrying language. HTML is a closely related markup language that comes with standardized rendering for a graphical user interface. To see what I mean by this, open an XML file in a browser: https://alexason.com/uploads/library.xml

As you will see, modern browsers render the file complete with element tags. Also take note that the browser recognizes the document type and applies special formatting, typically an indented, collapsible tree. In this way, XML is more like JSON: a format for carrying data rather than presenting it.


Parsing XML

Although browsers give custom XML elements no built-in visual rendering the way they do HTML elements, you can parse XML in JavaScript with the browser's DOMParser API.

See a gist of this in action:
const xmlString = `
  <story>
    <styles>
      <titleStyle>
        <color>#4A90E2</color>
      </titleStyle>
      <paragraphStyle>
        <color>#333333</color>
      </paragraphStyle>
    </styles>
    <title>Elena and the Embrace of Holiness</title>
    <paragraph>In the heart of the village, where the sun kissed the earth...</paragraph>
    <!-- More paragraphs here -->
  </story>`;

const parser = new DOMParser();
const xmlDocument = parser.parseFromString(xmlString, "text/xml");
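// DOMParser does not throw on malformed XML; it returns a document containing a <parsererror> element
const parserError = xmlDocument.getElementsByTagName("parsererror");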
if (parserError.length > 0) {
  // Handle error
  console.error("Error parsing XML:", parserError[0].textContent);
} else {
  // Successfully parsed: xmlDocument is a Document that can be queried like the page DOM
  console.log("Parsed XML Document:", xmlDocument);
  const title = xmlDocument.getElementsByTagName("title")[0].textContent;
  const titleColor = xmlDocument.getElementsByTagName("color")[0].textContent;
}
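
Once the document is parsed, it can be handy to flatten it into a plain JavaScript object instead of calling getElementsByTagName for each field. The block below is only a minimal sketch of that idea: `elementToObject` is a hypothetical helper, not part of DOMParser or any library, and it assumes that attributes can be ignored and that an element holds either text or child elements, never both.

```javascript
// Hypothetical helper (an assumption, not a browser API): convert an Element
// into a plain object keyed by tag name. Attributes are ignored.
function elementToObject(element) {
  const children = Array.from(element.children);

  // A leaf element carries only text, so return the trimmed text.
  if (children.length === 0) {
    return element.textContent.trim();
  }

  const result = {};
  for (const child of children) {
    const value = elementToObject(child);
    if (Object.prototype.hasOwnProperty.call(result, child.tagName)) {
      // Repeated tags (several <paragraph> elements, say) collect into an array.
      result[child.tagName] = [].concat(result[child.tagName], value);
    } else {
      result[child.tagName] = value;
    }
  }
  return result;
}

// Browser-only usage: DOMParser is not defined in plain Node.js, hence the guard.
if (typeof DOMParser !== "undefined") {
  const sample =
    "<story><title>Elena</title><paragraph>One</paragraph><paragraph>Two</paragraph></story>";
  const doc = new DOMParser().parseFromString(sample, "text/xml");
  console.log(elementToObject(doc.documentElement));
}
```

The trade-off is that a plain object is easier to pass around and serialize, but it loses attributes, comments, and the relative order of differently named siblings, so it fits data-like XML better than document-like XML.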


Real Use Case

The example shown demonstrates what is possible with XML, yet the use case of rendering and styling content is better handled by HTML. While the format resembles HTML, using XML as a stand-in for HTML is not the best use of XML.

An HTML developer I know, Israel, writes XML like this. He uses the data format to recreate HTML, then uses JavaScript to turn it into HTML. While this is possible given the flexibility of XML, if the only use case is the browser, I'll tell you what I tell Israel: "Just write HTML!"

Join Israel and the HTML Devs at Salvation.

Where to use XML

XML is a great format for intermediate representation. As mentioned, the immediate use case at my company is translating many different authors' (book authors, manuscript writers) representations of their work into a standardized format. The task is to turn Word documents, PDFs, plaintext, and spoken words into one common data format.

XML can do that, and it is used in exactly this way in software such as Calibre and Manuskript.
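
To make the intermediate-representation idea concrete, here is a rough sketch of the plaintext case only. It assumes the first block of text is the title and every later blank-line-separated block is a paragraph, and it reuses the story, title, and paragraph element names from the parsing example earlier. The function names `escapeXml` and `plainTextToStoryXml` are made up for illustration and do not come from any tool mentioned here.

```javascript
// Escape the five characters that have special meaning in XML.
// The ampersand must be replaced first so later entities are not double-escaped.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical converter (an assumption for illustration): the first non-empty
// block becomes <title>, every following block becomes a <paragraph>.
function plainTextToStoryXml(text) {
  const blocks = text
    .split(/\n\s*\n/) // blocks are separated by one or more blank lines
    .map((block) => block.trim().replace(/\s+/g, " ")) // unwrap hard line breaks
    .filter((block) => block.length > 0);

  const [title = "", ...paragraphs] = blocks;

  return [
    "<story>",
    `  <title>${escapeXml(title)}</title>`,
    ...paragraphs.map((p) => `  <paragraph>${escapeXml(p)}</paragraph>`),
    "</story>",
  ].join("\n");
}

const manuscript = "Elena & the Village\n\nThe sun kissed\nthe earth.";
console.log(plainTextToStoryXml(manuscript));
```

Word documents, PDFs, and speech would each need their own extraction step before reaching a function like this; the point of the intermediate format is that everything downstream only has to understand one shape.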


This has been a look at XML. It is a widely recognized format, compatible with many readers and conversion tools. Given its ease of parsing, its status as a W3C Recommendation, and its ubiquity, XML is a safe choice for long-term data storage.

If you're interested in tools for data science and storage, be sure to Follow this Dev.to. Add a reaction 💖 for more content like this.
