
In defense of XML

By Nicolas Frankel, originally published at blog.frankel.ch

When I started my career, XML was ubiquitous. The meta-information in a Java JAR file, the manifest, follows its own ad hoc, non-XML format. But Java EE designers built that platform from the ground up on XML: the meta-information of all artifacts is in XML format, e.g. web.xml, ejb-jar.xml, and application.xml.

Java EE is one example I experienced personally. But XML was everywhere in the enterprise world at the time. Its prevalence manifested itself in two areas: configuration and data transfer.

Since then, saying that XML has been losing popularity would be a euphemism. Other formats, such as JSON and YAML, have replaced it in the hearts of developers. In this post, I'd like to:

  • Explore some of the reasons why the mighty XML has fallen
  • Raise some downsides of the popular alternatives
  • And describe how XML already solved those problems

The downfall of XML

I think several reasons led to the downfall of XML. It's not a single one, but their combination that led to the current state.

Associated with "Enterprise"

I'm afraid the worst flaw of XML is its close association with the enterprise world. As everybody knows, "Enterprise" is bad by definition: bloated, heavy, not nimble, etc. And yes, that's sarcasm, in case you were wondering.

In general, perception trumps truth, and developers are no different in that regard. In the end, that's how Hype-Driven Developers, and most developers in general, perceive XML nowadays.

Lack of integration with front-end

One of the main usages of XML was in the realm of SOAP web services. Let's be frank about it: consuming those web services from JavaScript and/or the browser is not exactly easy.

It's no wonder that JavaScript Object Notation, aka JSON, became a de facto standard, and REST rose along with it. As its name implies, JSON is native to JavaScript, while XML is not.

Steep first steps

JSON is quite easy to start with, YAML even more so. Even with bare XML, one has to deal with namespaces, which are not beginner-friendly. Namespaces allow one document to use elements from different vocabularies; on the flip side, they make designing simple documents more complicated.

XML has a lot of powerful features, but all this power can be confusing to beginners. I willingly admit that these features make easy things more complex than they should be.

Performance

I've stumbled upon the performance "argument" a couple of times. It is usually "proven" with a sample describing the same data in XML, JSON, and YAML. The writer then shows that, because of its opening and closing tags, XML is much noisier than the other two.

IMHO, this argument is shallow: all three formats are text-based, so you can, and should, compress them. Parsing might be a bit slower, but that depends a lot on the exact parser and the associated technology stack. In the end, the overhead of transmitting and parsing XML, if any, is negligible compared to the total time of the whole use case.

People who favor YAML over JSON use the same reasoning: fewer characters.
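The compression argument can be checked with a minimal Java sketch that uses only the JDK's `java.util.zip.GZIPOutputStream`. The `CompressionSketch` class and its sample document are made up for illustration. The exact ratio depends on the real payload, but repeated opening and closing tags are precisely the kind of redundancy gzip removes well.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: measures how much gzip shrinks a tag-heavy XML payload.
final class CompressionSketch {

    // Returns the size in bytes of the gzip-compressed UTF-8 encoding of the text.
    static int gzipSize(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Unlikely with an in-memory stream, but the API declares it.
            throw new UncheckedIOException(e);
        }
        // The try-with-resources block has closed the gzip stream, so the trailer is written.
        return buffer.size();
    }

    // Builds a sample document in which opening and closing tags repeat a lot.
    static String sampleXml(int items) {
        StringBuilder xml = new StringBuilder("<items>");
        for (int i = 0; i < items; i++) {
            xml.append("<item><name>item").append(i).append("</name><price>10</price></item>");
        }
        return xml.append("</items>").toString();
    }

    public static void main(String[] args) {
        String xml = sampleXml(1_000);
        int raw = xml.getBytes(StandardCharsets.UTF_8).length;
        System.out.printf("raw: %d bytes, gzipped: %d bytes%n", raw, gzipSize(xml));
    }
}
```

Running the same measurement on the JSON or YAML equivalent of a real payload is a fairer comparison than counting raw characters.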

Abuse

The above reasons are more or less tied to XML itself. Yet, I'm more than willing to admit that architects have also been abusing XML. I've personally seen SOAP web services with payloads on the order of several megabytes. As you might imagine, the performance of such designs was not stellar at the time.

Failings of alternatives

JSON, YAML, et al. all have their own failings. Here's a sample of them:

  • JSON has no comments. The most common workaround is a "_comment" property:

    {
      "foo": {
        "_comment" : "My important comment",
        "bar": true
      }
    }
    
  • YAML has no governing body. Individuals manage the specification.

  • YAML has 22 ways to write booleans - no less!

    Anyone who uses YAML long enough will eventually get burned when attempting to abbreviate Norway.

        -- Nobody wants to write YAML
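XML needs no `_comment` workaround: comments are part of the specification, and a standard DOM parser exposes them as regular nodes. The minimal sketch below uses the JDK's built-in JAXP API. The `XmlComments` class and the sample document, an XML counterpart of the JSON snippet above, are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: reads the comments that sit directly under the root element.
final class XmlComments {

    static List<String> rootComments(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Comments are kept by default; this only makes the intent explicit.
            factory.setIgnoringComments(false);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            List<String> comments = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.COMMENT_NODE) {
                    comments.add(child.getNodeValue().trim());
                }
            }
            return comments;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        // XML counterpart of the JSON "_comment" example
        String xml = "<foo><!-- My important comment --><bar>true</bar></foo>";
        System.out.println(rootComments(xml)); // prints [My important comment]
    }
}
```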

To cope with the above, other formats have popped up:

  • TOML draws its inspiration from the INI file format (https://en.wikipedia.org/wiki/INI_file). It allows nested hierarchies of properties.
  • Lightbend pushes the HOCON format:

    This is an informal spec, but hopefully it's clear.

        -- HOCON README

    This one statement doesn't fill me with confidence.

The original sin: the lack of grammar

Whatever the format, and regardless of its specific downsides, one of the most important issues is for clients to decide whether the data they read is correct.

When using JSON and YAML, the different clients need to provide ad hoc validation. Issues arise when the provider changes the data format:

  1. How to make clients aware that the format changed?
  2. What information to communicate to the client about the format change?
  3. How to keep validation synchronized across clients?

XML has solved this issue from the beginning by providing a grammar. A grammar plays the same role for an XML document as constraints and types do for a SQL database. The most important difference is that you can externalize the grammar.

Several XML grammar implementations are available: Document Type Definition (DTD), XML Schema, RELAX NG, etc. The most widespread one is XML Schema. Since an XML Schema is itself written in XML, a web server can host it, and you can then reference it via a publicly accessible URL.

This approach solves the above issues: when a client receives an XML document, it looks up the XML Schema URL, fetches the schema, and checks that the data conforms to it.

Changing the data format is as simple as versioning the XML Schema file, and publishing it under a new URL.
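The client-side check might look like the following minimal Java sketch, based on the JDK's `javax.xml.validation` package. The `person` grammar, the `SchemaCheck` class and the sample documents are hypothetical. To keep the sketch self-contained, the schema is an inline string. A real client would rather load it from its published URL, e.g. with `SchemaFactory.newSchema(URL)`, typically found through the document's `xsi:schemaLocation` or `xsi:noNamespaceSchemaLocation` attribute.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical client-side check against an XML Schema grammar.
final class SchemaCheck {

    // Hypothetical grammar: a <person> holds a <name>, then a non-negative integer <age>.
    static final String PERSON_XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"person\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"name\" type=\"xs:string\"/>"
        + "<xs:element name=\"age\" type=\"xs:nonNegativeInteger\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compiles the grammar once; a real client would rather use factory.newSchema(URL).
    static Schema loadSchema(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Invalid schema", e);
        }
    }

    // Returns true if the document conforms to the grammar, false otherwise.
    static boolean isValid(Schema schema, String xml) {
        try {
            // Without a custom ErrorHandler, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = loadSchema(PERSON_XSD);
        System.out.println(isValid(schema, "<person><name>Ada</name><age>36</age></person>"));         // true
        System.out.println(isValid(schema, "<person><name>Ada</name><age>thirty-six</age></person>")); // false
    }
}
```

Every client reuses the same published grammar instead of hand-writing its own validation. Versioning the schema URL then tells clients which rules apply.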

Other benefits of XML

In this section, I'd like to list a couple of benefits of using XML.

Public open stewardship

XML is not under the stewardship of a single person or company, but of an NGO, namely the W3C. W3C specifications follow a publicly documented process and a defined lifecycle.

Battle-proven

XML is not hype, but it benefits from plenty of available documentation, blog posts, and FAQs.

Composable

While XML doesn't strictly enforce namespaces, using them is considered good practice. This way, similarly named entities defined in different namespaces can coexist in the same document without confusion about their semantics.
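Here is a minimal Java sketch in which two hypothetical vocabularies, identified by the made-up URIs `urn:example:book` and `urn:example:job`, both define a `title` element. A namespace-aware JAXP parser can select each `title` by namespace URI and local name, so the two never get mixed up. The `NamespaceSketch` class and its data are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class NamespaceSketch {

    // Hypothetical document mixing two vocabularies that both define <title>.
    static final String DOC =
          "<catalog xmlns:book=\"urn:example:book\" xmlns:job=\"urn:example:job\">"
        + "<book:title>Dune</book:title>"
        + "<job:title>Technical Writer</job:title>"
        + "</catalog>";

    // Returns the text of the <title> elements belonging to the given namespace URI only.
    static List<String> titles(String xml, String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // JAXP parsers are not namespace-aware by default.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagNameNS(namespaceUri, "title");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles(DOC, "urn:example:book")); // prints [Dune]
        System.out.println(titles(DOC, "urn:example:job"));  // prints [Technical Writer]
    }
}
```

The `book:` and `job:` prefixes are only local aliases. What identifies an element is its namespace URI together with its local name.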

Different flavors

XML parsing comes in two flavors:

  1. Tree-based parsing, i.e. the Document Object Model (DOM). It loads the whole document into memory.
  2. Event-based parsing, i.e. the Simple API for XML (SAX). It makes it possible to parse large documents.

Note that SAX is not a W3C specification.
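Both flavors ship with the JDK. The minimal sketch below counts the elements with a given name twice, once with DOM and once with SAX. The `ParsingFlavors` class and the sample document are hypothetical. The DOM version builds the whole tree before querying it. The SAX version only keeps a counter while the parser pushes events, which is why this approach scales to documents that don't fit in memory.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class ParsingFlavors {

    static final String SAMPLE = "<orders><order id=\"1\"/><order id=\"2\"/><note/></orders>";

    // DOM: the whole document becomes an in-memory tree, which is then queried.
    static int countWithDom(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot parse document", e);
        }
    }

    // SAX: the parser calls the handler as it reads; only a counter is kept.
    static int countWithSax(String xml, String tag) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CountingHandler handler = new CountingHandler(tag);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.count;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot parse document", e);
        }
    }

    private static final class CountingHandler extends DefaultHandler {
        private final String tag;
        private int count;

        CountingHandler(String tag) {
            this.tag = tag;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // Called once per opening tag; with the default (non-namespace-aware) parser, qName holds the name.
            if (qName.equals(tag)) {
                count++;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(countWithDom(SAMPLE, "order")); // prints 2
        System.out.println(countWithSax(SAMPLE, "order")); // prints 2
    }
}
```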

Implementations in different languages

Every language commonly used in the industry offers at least one XML parsing implementation. It is either baked into the language's standard library or available as a third-party library. Here are a couple of them:

Language   Implementation      Notes
Java       Standard library
Ruby       Nokogiri            Wrapper around libxml2
Python     Standard library
Go         Standard library
Go         Gokogiri            Wrapper around libxml2
C#         Standard library
C          libxml2
C          libexpat
C++        pugixml
C++        Xerces
Erlang     Standard library
Erlang     Fast XML
NodeJS     libxmljs            Wrapper around libxml2
NodeJS     node-expat          Wrapper around libexpat

Document transformation

XSLT is a W3C specification. It allows transforming an XML document into another document declaratively. The target document can be XML itself, or not, e.g. HTML or plain text.
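Here is a minimal sketch that runs an XSLT 1.0 stylesheet with the JDK's built-in `javax.xml.transform` API. The stylesheet, the input document and the `XsltSketch` class are hypothetical. The `<xsl:output method="text"/>` instruction makes the target a plain-text document rather than XML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltSketch {

    // Hypothetical XSLT 1.0 stylesheet: lists the people's names as comma-separated plain text.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/people\">"
        + "<xsl:for-each select=\"person\">"
        + "<xsl:value-of select=\"@name\"/>"
        + "<xsl:if test=\"position() != last()\">, </xsl:if>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the document and returns the result as a string.
    static String transform(String stylesheet, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String people = "<people><person name=\"Ada\"/><person name=\"Grace\"/></people>";
        System.out.println(transform(STYLESHEET, people)); // prints Ada, Grace
    }
}
```

The stylesheet states what the output should look like rather than how to walk the tree. That is the declarative part.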

Document querying

XPath is another W3C specification. It defines how to query XML documents, much like CSS selectors do for HTML.
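A minimal sketch with the JDK's `javax.xml.xpath` API, evaluating a couple of XPath 1.0 expressions against a hypothetical document. The `XPathSketch` class and its data are made up for illustration. The predicate in `/library/book[@year > 2000]/title` plays roughly the role of an attribute filter in a CSS selector, but XPath can also compute values, e.g. with `count(...)`.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class XPathSketch {

    // Hypothetical document to query.
    static final String DOC =
          "<library>"
        + "<book year=\"1999\"><title>Old</title></book>"
        + "<book year=\"2020\"><title>New</title></book>"
        + "</library>";

    // Evaluates an XPath 1.0 expression and returns its result converted to a string.
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(DOC, "/library/book[@year > 2000]/title")); // prints New
        System.out.println(evaluate(DOC, "count(/library/book)"));             // prints 2
    }
}
```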

Conclusion

XML has a lot of advantages compared to alternative technologies. In addition to what I described above, it benefits from a rich ecosystem.

It's not considered hype by many young (and not so young) developers. I believe it would be beneficial if our industry valued battle-proven technologies more than shiny new ones.

Originally published at A Java Geek on September 27th, 2020

Discussion

Pavel Morava

Seriously, I don't care about YAML, but you lost me as soon as I saw complaints about significant whitespace. When I started with Python many years ago, people thought the battle had already been won, but many still complain about significant whitespace, even though dealing with brackets is sheer masochism.

XML is over-engineered because it does not provide a simple solution for simple problems. People tend to pick the simplest way, and XML offers none.

Nicolas Frankel (author)

People tend to pick up the simplest way

Agree. And afterwards, they need to build solutions on shallow foundations because of that.

Pavel Morava

Nope.
They just fail to keep things simple.
This is another natural tendency, and it matters more than tools whose complexity borders on unusability.
In most cases, complexity is not linear, which means that as you level up in complexity, you can hit the edge of what is humanly possible.
XML gives a false sense of security, leading into an unmanageable abyss.

Avoid complex and complicated things, and you can keep simple tools.

Nicolas Frankel (author)

Avoid complex and complicated things

It seems we had different stakeholders. And believe me, I pushed back as much as I could.

Pavel Morava

Use a different vocabulary. Have you ever seen Yes, Minister?

You want to increase complexity. It is a bold decision.

The "bold decision" is an unpopular one, politicians losing votes because of it :)

youtube.com/watch?v=jNKjShmHw7s

Alexander Girke

I totally agree with you that it would be wrong to condemn XML. As you mentioned, it might be hard to start with, but after that tough beginning, everything you wrote in its favor applies.

However, I've worked with large XML files, and the tooling (at least in Java) is anything but fun. Stream parsing using SAX might be okay, but as I also wanted to transform large files with XSLT (which was a real pain to write, by the way) in a streaming manner, it took a lot of custom code to get that done.
Unfortunately, the alternatives are not any better. For JSON, there is at least JSON Schema. But I don't think there is something like XSLT for JSON, is there?

Pacharapol Withayasakpunt

I wonder whether there should be a non-native data structure just to hold YAML comments. (XML already deserializes comments, BTW.)