Moksh Gupta

Posted on Jun 11

XML to JSON Conversion - Avoiding the Pitfalls That Actually Bite You

#webdev #javascript #api #devex

Converting XML to JSON looks deceptively simple - until your pipeline silently drops attribute data, crashes on single-item responses, or passes "true" as a string into a boolean check. The structural gap between XML and JSON is wider than most developers expect. This guide covers the real conversion challenges that matter in production code.

Why XML and JSON Don't Map Cleanly

XML carries concepts that JSON simply doesn't support natively - attributes, mixed-content nodes, namespaces, and implicit ordering. A direct parse without any configuration will technically produce output, but that output will be inconsistent and fragile. Understanding where the model breaks down is the first step toward writing a conversion that actually holds up.

XML Concept	JSON Equivalent	The Problem
Attributes (id="123")	Properties	No native attribute concept
Single vs. multiple child elements	Value vs. Array	A single item becomes a bare value (string or object); multiple items become an array
All text is a string	Typed values	"true" and "42" need real type conversion
Namespaces (soap:Body)	No equivalent	Naming collisions without careful handling

Handling XML Attributes Without Losing Data

XML attributes - things like id, role, or currency - encode real business data, yet some parsers, fast-xml-parser among them, drop them unless configured otherwise. A common convention is to prefix attribute keys with @ so they land as regular JSON properties alongside text content.

In fast-xml-parser for Node.js, set ignoreAttributes: false and attributeNamePrefix: "@". The default behavior silently discards attribute data, which is a common source of hard-to-debug data loss in API migrations.

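import { XMLParser } from 'fast-xml-parser';

const parser = new XMLParser({
  ignoreAttributes: false,    // keep attributes instead of dropping them
  attributeNamePrefix: "@",   // id="123" becomes the key "@id"
  textNodeName: "#text",      // key that holds element text alongside attributes
  parseAttributeValue: true   // convert attribute values such as "123" to typed values
});

// xmlString holds the raw XML document as a string
const json = parser.parse(xmlString);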
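To make the convention concrete, here is a rough sketch of the shape this configuration is meant to produce and how downstream code might read it. The sample object is hand-written for illustration rather than captured parser output, and the helper name readNode is hypothetical.

```javascript
// Hand-written illustration of the "@" / "#text" convention for
// <price currency="USD">19.99</price>; not captured parser output.
const converted = {
  price: { "#text": 19.99, "@currency": "USD" }
};

// Hypothetical helper: split a converted node into its text value and attributes.
function readNode(node, attrPrefix = "@", textKey = "#text") {
  // A node without attributes is usually just a bare value.
  if (node === null || typeof node !== "object") {
    return { value: node, attributes: {} };
  }
  const attributes = {};
  for (const [key, val] of Object.entries(node)) {
    if (key.startsWith(attrPrefix)) {
      attributes[key.slice(attrPrefix.length)] = val;
    }
  }
  return { value: node[textKey], attributes };
}

const price = readNode(converted.price);
console.log(price.value, price.attributes.currency); // 19.99 USD
```

Funnelling reads through one helper like this keeps the prefix choice in a single place, so changing "@" later does not ripple through the codebase.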

The Array Problem - The Bug That Catches Everyone

XML doesn't distinguish between a single child element and a collection of child elements. This means a one-item list parses to a single value (a string or an object), while a list with two or more items parses to an array - so your array.map() call works fine in testing and fails in production when a single-item response arrives.

The fix is to declare known collection tags explicitly using the isArray callback in fast-xml-parser, or to write a post-processing normalization step that enforces array types on known repeating tags like product, item, user, and order. Pick one approach and apply it consistently across your pipeline.

const parser = new XMLParser({
  ignoreAttributes: false,
  attributeNamePrefix: "@",
  isArray: (tagName) => {
    const alwaysArrayTags = ['product', 'item', 'user', 'order', 'role'];
    return alwaysArrayTags.includes(tagName);
  }
});
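The post-processing alternative could look roughly like this. It is a minimal sketch in plain JavaScript: the function name ensureArrays and the tag list are hypothetical, and it assumes the parsed document is made of ordinary nested objects and arrays.

```javascript
// Hypothetical list of tags that should always be arrays after conversion.
const ARRAY_TAGS = new Set(["product", "item", "user", "order", "role"]);

// Recursively walk a parsed object and wrap known collection keys in arrays.
function ensureArrays(node, arrayTags = ARRAY_TAGS) {
  if (Array.isArray(node)) {
    return node.map((child) => ensureArrays(child, arrayTags));
  }
  if (node === null || typeof node !== "object") {
    return node; // primitives pass through unchanged
  }
  const result = {};
  for (const [key, value] of Object.entries(node)) {
    const normalized = ensureArrays(value, arrayTags);
    result[key] =
      arrayTags.has(key) && !Array.isArray(normalized) ? [normalized] : normalized;
  }
  return result;
}

// A single-item response and a multi-item response end up with the same shape.
const single = ensureArrays({ products: { product: { name: "Lamp" } } });
const multiple = ensureArrays({
  products: { product: [{ name: "Lamp" }, { name: "Desk" }] }
});
console.log(single.products.product.length);   // 1
console.log(multiple.products.product.length); // 2
```

The trade-off compared with isArray is that this runs as a second pass over the whole tree, but it also works on data that came from a parser you do not control.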

Type Inference - Don't Let "42" Stay a String

XML stores everything as text. Without explicit type parsing, boolean flags come through as "true" strings, numeric IDs come through as "123" strings, and your downstream code has to compensate - or silently misbehaves. In fast-xml-parser, the parseTagValue: true option handles the common cases automatically.

For custom logic, a simple helper that checks for "true", "false", numeric strings, and empty values covers most real-world needs. Pair it with trimValues: true to strip whitespace from element text.

const parser = new XMLParser({
  ignoreAttributes: false,
  attributeNamePrefix: "@",
  parseAttributeValue: true,
  parseTagValue: true,
  trimValues: true
});
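For the custom-logic route, a helper along these lines is one way to do it. Treat it as a sketch under stated assumptions: the name coerceValue is hypothetical, empty text is mapped to null by choice, and only plain decimal numbers are converted so that values like "007" survive as strings.

```javascript
// Hypothetical helper: convert XML text into a boolean, number, null, or string.
function coerceValue(raw) {
  if (typeof raw !== "string") return raw; // already typed, leave it alone
  const text = raw.trim();
  if (text === "") return null;            // assumption: empty elements become null
  if (text === "true") return true;
  if (text === "false") return false;
  // Only plain decimals are converted; leading zeros, hex, and exponent
  // forms stay strings so identifiers like "007" are not corrupted.
  if (/^-?(0|[1-9]\d*)(\.\d+)?$/.test(text)) return Number(text);
  return text;
}

console.log(coerceValue("true"));  // true
console.log(coerceValue(" 42 "));  // 42
console.log(coerceValue("007"));   // 007 (still a string)
console.log(coerceValue(""));      // null
```

Very long digit strings can exceed what a JavaScript number represents exactly, so IDs of that kind are safer left as strings by excluding their tags from coercion.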

Real-World Case - Stripping a SOAP Envelope

A common XML-to-JSON task is migrating from a SOAP-based service to a REST JSON API. The SOAP response wraps the actual payload in soap:Envelope and soap:Body containers that you need to navigate past before extracting your data.

The pattern is straightforward: parse the full XML with attribute support enabled, drill into the envelope wrapper using optional chaining, and remap the inner payload to a clean flat JSON structure. Here, declare collection tags like Role as arrays during the parse step rather than patching them afterward, so the mapping code can rely on .map() from the start.

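import { XMLParser } from 'fast-xml-parser';

function soapToRest(soapXml) {
  const parser = new XMLParser({
    ignoreAttributes: false,
    attributeNamePrefix: "@",
    parseAttributeValue: true,
    parseTagValue: true,
    isArray: (name) => name === 'Role' // Role is always an array, even with one item
  });
  const parsed = parser.parse(soapXml);
  // Walk past the SOAP wrappers; optional chaining avoids a crash if one is missing
  const rawUser = parsed?.['soap:Envelope']?.['soap:Body']?.User;
  if (!rawUser) throw new Error('Could not locate User in SOAP body');
  // Roles may be absent or empty, so fall back to an empty list
  const roles = rawUser.Roles?.Role ?? [];
  return {
    user: {
      id: rawUser['@id'],
      name: rawUser.Name,
      email: rawUser.Email,
      // String() guards against role values that were parsed as non-strings
      roles: roles.map(r => String(r).toLowerCase())
    }
  };
}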
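One caveat with the lookup above: it hard-codes the soap: prefix, but a prefix is only an alias chosen by the sender, so another service may send SOAP-ENV:Envelope or s:Envelope instead. A prefix-agnostic lookup could be sketched like this; the helper findByLocalName is hypothetical and the sample object is hand-written, not real parser output.

```javascript
// Hypothetical helper: find a child by local name, ignoring any namespace prefix.
function findByLocalName(node, localName) {
  if (node === null || typeof node !== "object") return undefined;
  for (const [key, value] of Object.entries(node)) {
    // Skip attribute keys such as "@xmlns:SOAP-ENV".
    if (key.startsWith("@")) continue;
    const name = key.includes(":") ? key.slice(key.indexOf(":") + 1) : key;
    if (name === localName) return value;
  }
  return undefined;
}

// Hand-written sample shaped like a parsed SOAP response with a different prefix.
const parsed = {
  "SOAP-ENV:Envelope": {
    "@xmlns:SOAP-ENV": "http://schemas.xmlsoap.org/soap/envelope/",
    "SOAP-ENV:Body": { User: { "@id": 7, Name: "Ada" } }
  }
};

const envelope = findByLocalName(parsed, "Envelope");
const body = findByLocalName(envelope, "Body");
console.log(body.User.Name); // Ada
```

Matching on local name alone ignores the namespace URI, so it is only safe when local names are unambiguous in your payloads. fast-xml-parser also offers a removeNSPrefix option that strips prefixes at parse time, which may be simpler if you never need them.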

Real-World Case - Migrating XML Config to JSON Config

When migrating apps from XML-based configuration (Spring, Maven, legacy enterprise apps) to JSON config, type inference becomes critical. A port value of 5432 stored as XML text must become a JSON number, not the string "5432", or a config loader that checks types may reject it at startup.

Enable parseTagValue: true during parsing and validate the output schema before replacing your config files in production. Type mismatches in config migrations are easy to introduce and annoying to debug.
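A lightweight version of that validation step might look like the following. It is a sketch only: the expectedTypes map, the key names, and the validateConfig function are hypothetical, and a real project would likely reach for a proper JSON Schema validator instead.

```javascript
// Hypothetical expected types for a converted database config.
const expectedTypes = { host: "string", port: "number", ssl: "boolean" };

// Return a list of human-readable problems; an empty list means the config passed.
function validateConfig(config, schema) {
  const problems = [];
  for (const [key, type] of Object.entries(schema)) {
    if (!(key in config)) {
      problems.push(`missing key "${key}"`);
    } else if (typeof config[key] !== type) {
      problems.push(`"${key}" should be ${type}, got ${typeof config[key]}`);
    }
  }
  return problems;
}

const good = { host: "db.internal", port: 5432, ssl: true };
const bad = { host: "db.internal", port: "5432", ssl: "true" };
console.log(validateConfig(good, expectedTypes)); // []
console.log(validateConfig(bad, expectedTypes));  // two problems: port and ssl
```

Running a check like this in CI before the new config file ships catches the string-versus-number mistakes while they are still cheap to fix.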

Libraries and Tools

For Node.js, fast-xml-parser gives you fine-grained control over attribute handling, array normalization, and type inference. For Python, xmltodict paired with json.dumps() handles most straightforward cases.

For quick one-off conversions without writing code, the DevToolLab XML to JSON converter processes files locally in the browser - no data is sent to a server.

Top comments (3)

Luis Cruz • Jun 11

This is an excellent deep dive into the real-world pitfalls of XML-to-JSON conversion. I really appreciate how you highlight attribute preservation, single-item arrays, and type inference, which are often overlooked in production pipelines. The examples for handling SOAP envelopes and config migrations are extremely practical and show exactly how fragile naive parsing can be.
I’d love to collaborate and explore more robust XML/JSON pipelines together—especially for modular APIs or enterprise data migrations where schema consistency and type safety are critical. We could experiment with automated validation, array normalization, and type enforcement strategies to make XML-to-JSON conversions safer and faster.
Would you be open to connecting and prototyping some of these approaches together?

Moksh Gupta • Jun 12

Thank you for the thoughtful feedback! I'm glad you found the article useful. I agree that schema consistency, validation, and type safety are critical for reliable XML-to-JSON conversions. I'd be happy to connect and discuss ideas around building more robust conversion pipelines.

Luis Cruz • Jun 12

You're welcome! You can find my site on my profile.