Converting XML to JSON using Recursion

#javascript #xml #json

The other day, I was working on an app that needed to fetch data from a third-party REST API, and what happened next is the stuff of a JavaScript developer's worst nightmares.

The server sent back its response in... gasp... XML, instead of JSON like any sane REST API would.

So, I came up with a way to easily convert XML into a JavaScript object.

Keep in mind that this code relies on Web APIs (DOMParser in particular), so it will not run in server-side JavaScript such as Node.js unless a DOM implementation is supplied. It works well in front-end applications, for example ones built with React or Angular.

The data I was trying to read looked something like this:

<book>
    <title>Some title</title>
    <description>some description </description>
    <author>
        <id>1</id>
        <name>some author name</name>
    </author>
    <review>nice book</review>
    <review>this book sucks</review>
    <review>amazing work</review>
</book>

I want the output to look something like this:

{
  "book": {
    "title": "Some title",
    "description": "some description",
    "author": { "id": "1", "name": "some author name" },
    "review": ["nice book", "this book sucks", "amazing work"]
  }
}

Since XML has a lot of nested tags, this problem is a perfect practical application of recursion.

Before we begin to code, we need to understand something called the DOMParser Web API.

According to the MDN documentation,

The DOMParser interface provides the ability to parse XML or HTML source code from a string into a DOM TREE.

In simple words, it converts an XML string into a DOM tree. Here’s how it works.

Let’s say we have some XML stored in a string, strxml. We can parse it into a DOM tree like this:

let strxml = `<book><title>Some title</title>
<description>some description </description>
<author>
    <id>1</id>
    <name>some author name</name>
</author>
<review>nice book</review>
<review>this book sucks</review>
<review>amazing work</review></book>
`;

const parser = new DOMParser();  // initialize the DOM parser
const srcDOM = parser.parseFromString(strxml, "application/xml");  // convert the XML string into a DOM tree

// Now we can call DOM methods like getElementsByTagName, querySelector, etc. on srcDOM.
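One thing worth knowing before going further: with "application/xml", DOMParser does not throw on malformed input. It returns a document that contains a parsererror element instead. Here is a minimal sketch of a guard around that behaviour. The name parseXmlOrThrow is made up for this example, and the parser is passed in as a parameter only so the helper is easy to try out.

```javascript
// Hypothetical helper (not part of the original code): parse an XML string
// and throw if the parser reports an error instead of failing silently.
function parseXmlOrThrow(xmlString, parser = new DOMParser()) {
  const doc = parser.parseFromString(xmlString, "application/xml");

  // For "application/xml", malformed input produces a document that
  // contains a <parsererror> element rather than raising an exception.
  const errorNode = doc.querySelector("parsererror");
  if (errorNode) {
    throw new Error("Invalid XML: " + errorNode.textContent);
  }
  return doc;
}
```

In the browser, parseXmlOrThrow(strxml) could stand in for the two parser lines above.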

Now that we have the basics down, let’s start writing the pseudocode.

Initialize variable jsonResult as an empty object.
If srcDOM has no child nodes:
    return innerHTML of the DOM. // This is our base case.

For each childNode in child nodes:
    Check if childNode has siblings of the same name.
    If it has no siblings of the same name:
        set the childNode name as a key whose value is the JSON of the child node. (We're calling the function recursively.)
    If it has siblings of the same name:
        set the childNode name as a key whose value is an array, and push the JSON of every child with that name into this array.
return jsonResult

Here’s the JavaScript code:

/**
 * This function converts a DOM tree into a JavaScript object.
 * @param {Document|Element} srcDOM DOM tree to be converted.
 * @returns {Object|string} nested object, or a string for a leaf element.
 */
function xml2json(srcDOM) {
  let children = [...srcDOM.children];

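  // base case for recursion: a leaf element returns its content.
  // trim() drops stray whitespace, such as the trailing space in <description>.
  if (!children.length) {
    return srcDOM.innerHTML.trim();
  }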

  // initializing object to be returned. 
  let jsonResult = {};

  for (let child of children) {

    // checking if child has siblings of the same name.
    let childIsArray = children.filter(eachChild => eachChild.nodeName === child.nodeName).length > 1;

    // if child is part of an array, collect the values in an array; otherwise store the value directly.
    if (childIsArray) {
      if (jsonResult[child.nodeName] === undefined) {
        jsonResult[child.nodeName] = [xml2json(child)];
      } else {
        jsonResult[child.nodeName].push(xml2json(child));
      }
    } else {
      jsonResult[child.nodeName] = xml2json(child);
    }
  }

  return jsonResult;
}

// testing the function
let xmlstr = `<book><title>Some title</title>
<description>some description </description>
<author>
    <id>1</id>
    <name>some author name</name>
</author>
<review>nice book</review>
<review>this book sucks</review>
<review>amazing work</review></book>
`;

// converting to DOM Tree
const parser = new DOMParser();
const srcDOM = parser.parseFromString(xmlstr, "application/xml");

// Converting DOM Tree To JSON. 
console.log(xml2json(srcDOM));

/** The output will be
{
  "book": {
    "title": "Some title",
    "description": "some description",
    "author": { "id": "1", "name": "some author name" },
    "review": ["nice book", "this book sucks", "amazing work"]
  }
}
*/

This is the basic algorithm for converting an XML string into a JavaScript object. Since it uses recursion, it can descend to any depth of the DOM tree and convert every element.

This works for most simple cases. Note that it ignores attributes, and it drops any text that sits next to child elements, so you may need to modify the algorithm to fit your own requirements.
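One small follow-up: xml2json returns a plain JavaScript object, not a JSON string. If you need actual JSON text, for example to store it or send it somewhere, JSON.stringify does the job. In the sketch below, the result variable is a hand-written stand-in for the object the function returns.

```javascript
// `result` stands in for the object returned by xml2json(srcDOM).
const result = {
  book: {
    title: "Some title",
    author: { id: "1", name: "some author name" },
    review: ["nice book", "this book sucks", "amazing work"],
  },
};

// Serialize to a JSON string, indented with two spaces for readability.
const jsonString = JSON.stringify(result, null, 2);
console.log(jsonString);

// Every leaf value is a string, so numbers need an explicit conversion.
const authorId = Number(result.book.author.id);
```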

Top comments (7)

Stuart • Apr 1 '19 • Edited

The script above doesn't take attributes into consideration. The following does:

function xml2json(srcDOM) {

  let children = [...srcDOM.children];

  // base case for recursion. 
  if (!children.length) {

    if (srcDOM.hasAttributes()) {      
      var attrs = srcDOM.attributes;
      var output = {};
      for(var i = attrs.length - 1; i >= 0; i--) {
        output[attrs[i].name] = attrs[i].value;
      }

      output.value = srcDOM.innerHTML;
      return output;

    } else {
      return srcDOM.innerHTML
    }  
  }

  // initializing object to be returned. 
  let jsonResult = {};

  for (let child of children) {

    // checking if child has siblings of the same name.
    let childIsArray = children.filter(eachChild => eachChild.nodeName === child.nodeName).length > 1;

    // if child is array, save the values as array, else as strings. 
    if (childIsArray) {
      if (jsonResult[child.nodeName] === undefined) {
        jsonResult[child.nodeName] = [xml2json(child)];
      } else {
        jsonResult[child.nodeName].push(xml2json(child));
      }
    } else {
      jsonResult[child.nodeName] = xml2json(child);
    }
  }

  return jsonResult;
}
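The version above reads attributes only in the base case, so attributes on elements that have children are still skipped. A possible variation that collects them at every level is sketched below. The function name xml2jsonWithAttrs and the "@attributes" key are illustrative choices, not something the comment or the article defines. Nesting the attributes under their own key also keeps them from colliding with child element names.

```javascript
// Sketch only: `xml2jsonWithAttrs` and the "@attributes" key are
// illustrative choices, not something the original code defines.
function xml2jsonWithAttrs(node) {
  const children = Array.from(node.children);

  // Copy attributes, if any. A Document node has no `attributes` property.
  const attrs = {};
  for (const attr of Array.from(node.attributes || [])) {
    attrs[attr.name] = attr.value;
  }
  const hasAttrs = Object.keys(attrs).length > 0;

  // Base case: a leaf is a plain string unless it carries attributes.
  if (!children.length) {
    const text = (node.textContent || "").trim();
    return hasAttrs ? { "@attributes": attrs, value: text } : text;
  }

  const result = hasAttrs ? { "@attributes": attrs } : {};
  for (const child of children) {
    const value = xml2jsonWithAttrs(child);
    const isRepeated =
      children.filter((c) => c.nodeName === child.nodeName).length > 1;

    if (!isRepeated) {
      result[child.nodeName] = value;
    } else if (result[child.nodeName] === undefined) {
      result[child.nodeName] = [value];
    } else {
      result[child.nodeName].push(value);
    }
  }
  return result;
}
```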

narjune1 • Mar 20 '20

When I ran this code on the example above with the books xml, it returned an error: srcDOM.children is not iterable
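The comment gives no details about the environment, so the following is only a guess at the cause: if the DOM implementation in use (an older browser, or a DOM library outside the browser) does not provide children on the node, the spread fails with exactly this kind of message. A fallback that filters childNodes down to element nodes might look like this. The helper name elementChildren is hypothetical.

```javascript
// Hypothetical helper: return element children even where `children`
// is missing, by filtering `childNodes` down to element nodes.
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE

function elementChildren(node) {
  if (node.children) {
    return Array.from(node.children);
  }
  return Array.from(node.childNodes || []).filter(
    (child) => child.nodeType === ELEMENT_NODE
  );
}
```

With this in place, the first line of xml2json would become let children = elementChildren(srcDOM);.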

Dylan Archer • Feb 21 '19 • Edited

For semantic brevity, would you not want to pull the filter func outside the for..of loop? Also utilizing const seems more ideal.

function xml2json(srcDOM) {

  const children = [...srcDOM.children];
  if (!children.length) return srcDOM.innerHTML

  const jsonResult = Object.create(null),
    childIsArray = (x, y) => x.filter(z => z.nodeName === y.nodeName).length > 1;

  for (const child of children) {
    if (!childIsArray(children, child)) jsonResult[child.nodeName] = xml2json(child);
    else {
      if (jsonResult[child.nodeName] !== undefined) jsonResult[child.nodeName].push(xml2json(child));
      else jsonResult[child.nodeName] = [xml2json(child)];
    }
  }

  return jsonResult;
}

I am researching how others approached this scenario due to a similar surprise on my current project. Nice approach!
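Moving the filter into a helper makes the loop easier to read, but the sibling list is still scanned once for every child. If that ever matters, one option is to count the tag names once per parent and look the count up inside the loop. This is a rough sketch under that assumption; xml2jsonCounted is just an illustrative name.

```javascript
// Sketch: `xml2jsonCounted` is an illustrative name. It counts sibling
// names once per parent instead of filtering the list for every child.
function xml2jsonCounted(node) {
  const children = Array.from(node.children);
  if (!children.length) return node.innerHTML;

  // One pass to count how often each tag name occurs among the siblings.
  const counts = new Map();
  for (const child of children) {
    counts.set(child.nodeName, (counts.get(child.nodeName) || 0) + 1);
  }

  const result = {};
  for (const child of children) {
    const name = child.nodeName;
    const value = xml2jsonCounted(child);

    if (counts.get(name) === 1) {
      result[name] = value;
    } else {
      // Repeated name: create the array on first sight, then append.
      if (!Array.isArray(result[name])) result[name] = [];
      result[name].push(value);
    }
  }
  return result;
}
```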

David Burg • Aug 28 '19

This will not generate a consistent schema for arrays, as it cannot distinguish a single-element array from any other complex type. Since the XML document doesn't declare arrays, you need to fetch the XML schema from the service and look up each complex element's maxOccurs setting. If it is greater than 1, you need to use an array in JSON so your payload schema is consistent. Without that, it will work most of the time but not reliably. I like my services reliable.
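A full schema lookup is beyond a short snippet, but the idea can be approximated by telling the converter which element names must always be arrays. In the sketch below, xml2jsonWithArrays and the arrayNames parameter are made up for illustration. The set is a stand-in for whatever a maxOccurs lookup against the real schema would produce.

```javascript
// Sketch: `xml2jsonWithArrays` and `arrayNames` are illustrative. The set
// stands in for what a schema lookup (maxOccurs > 1) would provide.
function xml2jsonWithArrays(node, arrayNames = new Set()) {
  const children = Array.from(node.children);
  if (!children.length) return node.innerHTML;

  const result = {};
  for (const child of children) {
    const name = child.nodeName;
    const value = xml2jsonWithArrays(child, arrayNames);
    const repeated = children.filter((c) => c.nodeName === name).length > 1;

    if (arrayNames.has(name) || repeated) {
      // Always an array here, even when only one element is present.
      if (!Array.isArray(result[name])) result[name] = [];
      result[name].push(value);
    } else {
      result[name] = value;
    }
  }
  return result;
}
```

Calling xml2jsonWithArrays(srcDOM, new Set(["review"])) would then keep review as an array for a book with a single review. An element that is absent altogether still produces no key, so the consumer has to allow for that case.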