DEV Community

Rion Williams

Hello XML, My Old Friend; I've Come To Encode You Again.

This was originally posted on my blog.

Depending on how long you've been a developer, XML might be a term that sounds totally foreign. For others, it might bring back long repressed memories of some dark days.

In my encounters with developers from a broad range of backgrounds, I've seen the whole spectrum, from die-hard JSON lovers to folks who believe XML is the one true encoding. Either way, if you have written any web services or Web APIs over the past decade, you've likely worked with at least one of them.

This post is about the middle-ground on that spectrum: converting from XML to JSON and the many potential shortcomings of going down that road.

Why might you ever want to do this?

Ever created a Word document? How about Excel? Consumed an RSS feed? Designed an SVG? Configured a .NET application? Worked with just about any of the data industry standards (e.g. HL7)?

If you answered yes to any of these, then XML played a role in some way, shape, or form. It's everywhere and pretty damn good at what it does. Heck! The .NET team tried to migrate away from XML (with project.json) during the infancy of .NET Core, and ended up crawling back and scrapping the JSON-based alternative.

Now why does that matter? Well, let's say that you are building a brand-new application using Web API. You're probably going to go JSON all the way, which could be a problem if you need to consume some old XML-encoded files. Thankfully, there's a trick for that in JSON.NET, which I'll safely assume is already in your application:

using System.Xml;
using Newtonsoft.Json;

public string ConvertXmlToJson(string xml)
{
    // Load your existing XML string into an XML document
    // (LoadXml throws an XmlException if the XML is not well-formed)
    var document = new XmlDocument();
    document.LoadXml(xml);

    // Return a JSON-encoded version of the document
    return JsonConvert.SerializeXmlNode(document);
}

Likewise, if you needed to convert an existing JSON string into XML, well JSON.NET has your back there once again:

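using System.Xml.Linq;
using Newtonsoft.Json;

public string ConvertJsonToXml(string json)
{
    // Deserialize the JSON into an XML document wrapped in a <Root> element,
    // then return it as an XML string
    XDocument document = JsonConvert.DeserializeXNode(json, "Root");
    return document.ToString();
}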

That's it.

Now it's worth noting that if you are planning on returning this directly to the client to be consumed as JSON, you'll want to ensure that the response's content type is set to "application/json".
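As a rough sketch, assuming classic ASP.NET Web API (where an action can return an HttpResponseMessage), the already-serialized string can be wrapped in a StringContent that carries the media type. The class and method names here (JsonResponses, FromJsonString) are hypothetical. In ASP.NET Core, returning Content(json, "application/json") from a controller does the same job.

```csharp
using System.Net;
using System.Net.Http;
using System.Text;

public static class JsonResponses
{
    // Hypothetical helper: wraps a JSON string (e.g. the output of ConvertXmlToJson)
    // in a 200 OK response whose Content-Type is "application/json; charset=utf-8".
    public static HttpResponseMessage FromJsonString(string json)
    {
        return new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
    }
}
```

Because the string is already JSON, wrapping it this way avoids having the framework serialize it a second time (which would turn it into a quoted JSON string).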

Potential Bumps in The Road

With both of these formats being so generic, you might imagine that some crazy stuff can happen during the conversion process, and you'd be right.

This Stack Overflow response from montewhizdoh does a pretty comprehensive job of explaining the major things to check for to ensure that your ride isn't too bumpy.

XML to JSON Conversion Issues

  • All data is stringified - All values within the XML elements are converted to their string equivalents (e.g. "false" instead of false, "0" instead of 0, etc.). Since JavaScript treats these very differently, it's worth looking out for.
  • Child elements can become a nested object {} OR a nested array [] depending on whether there is only one or more than one XML child element with the same name - A nested object and a nested array are consumed very differently in JavaScript, so this is something to consider. JSON.NET recognizes a json:Array='true' attribute (with the json prefix bound to the http://james.newtonking.com/projects/json namespace), which can be used to work around this issue in some cases.
  • XML must be well-formed - While the XML doesn't have to conform perfectly to standards, it must include a single root element, and element names cannot begin with numbers.
  • Empty elements are not converted - Any blank elements within the XML document will not be converted to JSON and are simply ignored.
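To see the first three quirks in action, here is a small sketch assuming the Newtonsoft.Json package. XmlToJsonQuirks, ToCompactJson, and RunDemo are hypothetical names; ToCompactJson makes the same SerializeXmlNode call as ConvertXmlToJson, just with compact formatting so the results are easy to compare.

```csharp
using System;
using System.Xml;
using Newtonsoft.Json;

public static class XmlToJsonQuirks
{
    // Same conversion as ConvertXmlToJson, but on one line. Formatting is fully
    // qualified because System.Xml also defines a Formatting enum.
    public static string ToCompactJson(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml); // throws XmlException if the XML is not well-formed
        return JsonConvert.SerializeXmlNode(document, Newtonsoft.Json.Formatting.None);
    }

    public static void RunDemo()
    {
        // 1. Everything is stringified: false and 0 come back as JSON strings.
        Console.WriteLine(ToCompactJson("<user><active>false</active><logins>0</logins></user>"));

        // 2. One <item> child becomes a nested object...
        Console.WriteLine(ToCompactJson("<order><item><sku>A1</sku></item></order>"));
        // ...while two <item> children become a nested array.
        Console.WriteLine(ToCompactJson(
            "<order><item><sku>A1</sku></item><item><sku>B2</sku></item></order>"));

        // json:Array='true' forces an array even when there is only one child.
        Console.WriteLine(ToCompactJson(
            "<order xmlns:json='http://james.newtonking.com/projects/json'>" +
            "<item json:Array='true'><sku>A1</sku></item></order>"));

        // 3. Malformed XML (two root elements, or a name starting with a digit)
        //    is rejected by the XML parser before JSON.NET ever sees it.
        foreach (var bad in new[] { "<a/><b/>", "<root><1st/></root>" })
        {
            try { ToCompactJson(bad); }
            catch (XmlException ex) { Console.WriteLine($"Rejected: {ex.Message}"); }
        }
    }
}
```

Calling XmlToJsonQuirks.RunDemo() prints each result; the first line, for instance, is {"user":{"active":"false","logins":"0"}}, with both values as strings.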

JSON to XML Conversion Issues

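  • A top-level element is required - Since XML expects a single root element, the JSON must have one top-level property that can become the root of your XML result, or you must supply a root element name (as the "Root" argument does in ConvertJsonToXml above); otherwise the conversion fails.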
  • Property names cannot begin with numbers - Your JSON property names cannot start with a number, since they become XML element names. XML's naming rules are technically stricter than just this one, but you can sometimes get away with breaking some of them.
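Here is a sketch of both issues, again assuming the Newtonsoft.Json package; JsonToXmlQuirks, ToXml, and EncodeNames are hypothetical names. The name-escaping step is an assumed workaround, not something JSON.NET does in this code path: it runs each property name through XmlConvert.EncodeName from System.Xml before converting, and XmlConvert.DecodeName can reverse it on the way back.

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class JsonToXmlQuirks
{
    // Without a root name, JSON with several top-level properties cannot become a
    // single-rooted XML document; passing a root name wraps everything in it.
    public static string ToXml(string json, string rootName = "Root")
    {
        XDocument document = JsonConvert.DeserializeXNode(json, rootName);
        return document.ToString(SaveOptions.DisableFormatting);
    }

    // Hypothetical pre-processing step: escape property names that are not valid
    // XML names (e.g. "1st" becomes "_x0031_st"). Keys starting with '@' or '#'
    // are left alone because JSON.NET maps them to attributes and text nodes.
    public static JToken EncodeNames(JToken token)
    {
        switch (token)
        {
            case JObject obj:
                var encoded = new JObject();
                foreach (JProperty property in obj.Properties())
                {
                    string name = property.Name;
                    bool special = name.Length > 0 && (name[0] == '@' || name[0] == '#');
                    encoded.Add(special ? name : XmlConvert.EncodeName(name),
                                EncodeNames(property.Value));
                }
                return encoded;
            case JArray array:
                // Recurse into every element of the array.
                return new JArray(array.Select(EncodeNames));
            default:
                // Plain values (strings, numbers, etc.) are copied unchanged.
                return token.DeepClone();
        }
    }
}
```

For example, ToXml(EncodeNames(JToken.Parse("{\"1st\":\"a\"}")).ToString()) produces <Root><_x0031_st>a</_x0031_st></Root>, whereas converting the raw JSON would trip over the leading digit.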

Your mileage may vary, and every scenario might require a few data massage sessions to get everything converted as expected. Hopefully, this gives you a few things to keep at the top of your mind should you have to work with any legacy XML files in your brand-new Web API application.

Top comments (7)

Martin Häusler

Nice article. Just as an addendum, JSON does have shortcomings in comparison to XML:

1) JSON has no standardized way of expressing type information for each node (XML has named tags for that). Some conventions do exist (e.g. a specifically named "type" property in a node that tells the reader what this object actually is) but that's a convention, not a standard.

2) JSON has no notion of identifiers, at all. XML on the other hand has standard ID- and IDRef-Types. A good validator can even tell you that an IDRef points to no other known element within the document at parse time.

3) This is a little bit related to 2). While XML can represent arbitrary object graphs, JSON natively only supports trees. Without relying on conventions again, it is not possible to convert a general object graph to JSON because it cannot deal with backreferences.

4) Vanilla JSON (i.e. not JSON5) has no comments. No big deal for data exchange, but being unable to put a comment to a dependency in a package.json is more than annoying.

Especially the first three points can be absolute dealbreakers. If you take the XStream library in Java, you pass it an arbitrary (!) object and get an XML. You pass it the XML, and get back the object. You don't specify which class you want to serialize or deserialize, because the XML data is complete. I have yet to encounter a JSON (de-)serializer which can do that. There are reasons why XML was so popular. JSON is much more lightweight, but it can get messy / ambiguous easily. It takes a lot more thought and effort to produce a good, usable and unambiguous JSON format compared to XML. Enjoy responsibly.

Erebos Manannán

Quick couple of comments:

1) JSON has no standardized way of expressing type information for each node (XML has named tags for that). Some conventions do exist (e.g. a specifically named "type" property in a node that tells the reader what this object actually is) but that's a convention, not a standard.

json-ld.org/spec/latest/json-ld/ is a standard, though I'm not 100% sure what you mean by that point, so I'm not 100% sure this fits your request, and it might easily exceed a lot of what you want from it too.

4) Vanilla JSON (i.e. not JSON5) has no comments. No big deal for data exchange, but being unable to put a comment to a dependency in a package.json is more than annoying.

This just means that the person who decided to use .json format for their configuration was not particularly clever even though they probably thought they were. JSON is primarily a data transfer format, and should not be used for everything, incl. arbitrary configuration (AWS should die in a fire with their massive JSON configuration mess). XML would be (and is) also a mess for storing configuration for things.

For example YAML is MUCH better for that kind of use.

Martin Häusler

By "standard" I simply mean what the language supports on its own - not how you use it. In that regard, JSON has objects, arrays and properties. Nothing more, nothing less. XML has the concept of an identifier built natively into the language. Also, it is not uncommon to see an element reference another one by means of an XPath expression within the XML document itself. Yes, JSON can emulate all of that, no doubt about it, but I rarely ever encounter such cases in practice. I assume that people simply try to avoid these scenarios when working with JSON, or switch to a different format when they are inevitable. Even as a data interchange format, JSON does have its limitations. As a Java programmer, I would much rather work with XML, but in the web world, I guess that JavaScript devs prefer JSON.

I agree that using .json for the package configuration was not a very smart decision ;-) That's exactly what I meant with "enjoy responsibly".

One advantage that JSON definitely has over XML is that it is much easier to write manually than XML, and also more lightweight to read. I never worked with YAML so far, but it seems like a middle ground between XML and JSON.

Bill Miller

One of the most important things: Sequencing! In XML (and JSON too) the order in which items are inserted into the resulting string can be extremely important. The biggest problems are Name/Value pair objects (i.e. a:{"first":"1", "second":"2"...}) where most of the serialization libraries will dump them in some partially predictable sequence, but if that's not what the "other end" is expecting or is validating against (DTDs, anyone?) it may fail for stupid reasons.

Also beware of numeric precision. A string representation of a float value may not have been serialized "properly" depending upon your needs. Especially if the value was supposed to be encoded using hex, octal or BCD instead of "simple" base 10.

Awesome

Wonderful article, and I'd like to share a tool that helps XML developers: codebeautify.org/xmlviewer

Ben Halpern

Super well written.

Joe Zack

Agreed, definitely worth a follow!