Converting XML to JSON looks deceptively simple - until your pipeline silently drops attribute data, crashes on single-item responses, or passes "true" as a string into a boolean check. The structural gap between XML and JSON is wider than most developers expect. This guide covers the real conversion challenges that matter in production code.
Why XML and JSON Don't Map Cleanly
XML carries concepts that JSON simply doesn't support natively - attributes, mixed-content nodes, namespaces, and implicit ordering. A direct parse without any configuration will technically produce output, but that output will be inconsistent and fragile. Understanding where the model breaks down is the first step toward writing a conversion that actually holds up.
| XML Concept | JSON Equivalent | The Problem |
|---|---|---|
| Attributes (id="123") | Properties | No native attribute concept |
| Single vs. multiple child elements | Value vs. Array | A single item becomes a plain value (string or object), multiple items become an array |
| All text is a string | Typed values | "true" and "42" need real type conversion |
| Namespaces (soap:Body) | No equivalent | Naming collisions without careful handling |
Handling XML Attributes Without Losing Data
XML attributes - things like id, role, or currency - encode real business data, yet some parsers drop them unless configured otherwise, and fast-xml-parser is one of them. A common convention is to prefix attribute keys with @ so they land as regular JSON properties alongside text content.
In fast-xml-parser for Node.js, set ignoreAttributes: false and attributeNamePrefix: "@". The default behavior silently discards attribute data, which is a common source of hard-to-debug data loss in API migrations.
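import { XMLParser } from 'fast-xml-parser';

const parser = new XMLParser({
  ignoreAttributes: false,      // keep attributes instead of discarding them
  attributeNamePrefix: "@",     // id="123" becomes the key "@id"
  textNodeName: "#text",        // element text sits next to its attributes
  parseAttributeValue: true     // "123" in an attribute becomes the number 123
});
const json = parser.parse(xmlString);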
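Once attributes are preserved, downstream code has to tell them apart from child elements. The sketch below is one way to do that, assuming the "@" prefix and "#text" text-node name from the configuration above. `splitNode` is a hypothetical helper, not part of fast-xml-parser, and the sample node is an assumed shape of the parser output rather than a captured result.

```javascript
// Hypothetical helper: separates attribute keys from child-element keys
// in one parsed node, based on the configured prefix and text-node name.
function splitNode(node, prefix = '@', textKey = '#text') {
  const attributes = {};
  const children = {};
  let text;
  for (const [key, value] of Object.entries(node)) {
    if (key === textKey) {
      text = value;                                  // the element's own text
    } else if (key.startsWith(prefix)) {
      attributes[key.slice(prefix.length)] = value;  // "@id" -> "id"
    } else {
      children[key] = value;                         // nested element
    }
  }
  return { attributes, children, text };
}

// Assumed parser output for: <price currency="USD" id="123">19.99</price>
const priceNode = { '@currency': 'USD', '@id': 123, '#text': 19.99 };
const { attributes, text } = splitNode(priceNode);
console.log(attributes.currency, attributes.id, text);
```

Keeping the prefix as a parameter means the helper still works if the pipeline later switches to a different attribute prefix.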
The Array Problem - The Bug That Catches Everyone
XML doesn't distinguish between a single child element and a collection of child elements. A parser that only sees the data therefore emits a one-item list as a single value (a string or an object) and a two-item list as an array. Your array.map() call works fine in testing, then fails in production when a single-item edge case arrives.
The fix is to declare known collection tags explicitly using the isArray callback in fast-xml-parser, or to write a post-processing normalization step that enforces array types inside known collections such as products, items, users, and orders. Note that the tag to declare is the repeated child element (product), not its plural wrapper (products). Pick one approach and apply it consistently across your pipeline.
const parser = new XMLParser({
  ignoreAttributes: false,
  attributeNamePrefix: "@",
  isArray: (tagName) => {
    const alwaysArrayTags = ['product', 'item', 'user', 'order', 'role'];
    return alwaysArrayTags.includes(tagName);
  }
});
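For pipelines that prefer the post-processing route, a normalization pass might look like the sketch below. `ensureArray` and `normalizeArrays` are hypothetical helper names, and the set of tag names is an assumption you would replace with your own collection tags.

```javascript
// Hypothetical helper: wraps a value in an array unless it already is one.
// Missing values become an empty array so callers can always use .map().
function ensureArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Walks the parsed tree and forces every key listed in `arrayKeys`
// to hold an array, whatever the parser produced for it.
function normalizeArrays(node, arrayKeys) {
  if (Array.isArray(node)) return node.map((n) => normalizeArrays(n, arrayKeys));
  if (node === null || typeof node !== 'object') return node;
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    const normalized = normalizeArrays(value, arrayKeys);
    out[key] = arrayKeys.has(key) ? ensureArray(normalized) : normalized;
  }
  return out;
}

// A single <product> parsed as an object instead of an array
const single = { products: { product: { name: 'Lamp' } } };
const fixed = normalizeArrays(single, new Set(['product', 'item']));
console.log(fixed.products.product.map((p) => p.name)); // [ 'Lamp' ]
```

The pass returns a new tree rather than mutating the parser output, which keeps the raw parse available for debugging.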
Type Inference - Don't Let "42" Stay a String
XML stores everything as text. Without explicit type parsing, boolean flags come through as "true" strings, numeric IDs come through as "123" strings, and your downstream code has to compensate - or silently misbehaves. fast-xml-parser exposes a parseTagValue option that handles the common cases automatically, with parseAttributeValue as its counterpart for attribute values.
For custom logic, a simple helper that checks for "true", "false", numeric strings, and empty values covers most real-world needs. Pair it with trimValues: true to strip whitespace from element text.
const parser = new XMLParser({
  ignoreAttributes: false,
  attributeNamePrefix: "@",
  parseAttributeValue: true,
  parseTagValue: true,
  trimValues: true
});
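The custom helper described above could be sketched roughly as follows. `coerceValue` is a hypothetical name, and two of its choices are assumptions rather than rules: empty text maps to null, and strings with leading zeros (zip codes, account numbers) are deliberately left as strings.

```javascript
// Hypothetical helper: converts XML text into a boolean, number, or null
// where the conversion is unambiguous, and leaves everything else alone.
function coerceValue(raw) {
  if (typeof raw !== 'string') return raw;  // already typed, nothing to do
  const value = raw.trim();
  if (value === '') return null;            // assumption: empty text means "no value"
  if (value === 'true') return true;
  if (value === 'false') return false;
  // Plain decimal numbers only; "00123" keeps its leading zeros as a string
  if (/^-?(0|[1-9]\d*)(\.\d+)?$/.test(value)) return Number(value);
  return value;
}

console.log(['true', ' 42 ', '00123', ''].map((v) => coerceValue(v)));
// [ true, 42, '00123', null ]
```

Very long numeric identifiers can exceed JavaScript's safe integer range, so IDs of that kind are usually better kept as strings before they reach a helper like this.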
Real-World Case - Stripping a SOAP Envelope
A common XML-to-JSON task is migrating from a SOAP-based service to a REST JSON API. The SOAP response wraps the actual payload in soap:Envelope and soap:Body containers that you need to navigate past before extracting your data.
The pattern is straightforward: parse the full XML with attribute support enabled, drill into the envelope wrapper using optional chaining, and remap the inner payload to a clean flat JSON structure. In this example, Role is declared as an array during the parse step, so the mapping code never has to check whether it received one role or several.
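import { XMLParser } from 'fast-xml-parser';

function soapToRest(soapXml) {
  const parser = new XMLParser({
    ignoreAttributes: false,
    attributeNamePrefix: "@",
    parseAttributeValue: true,
    parseTagValue: true,
    // Role is a collection: force an array even when only one is present
    isArray: (name) => name === 'Role'
  });
  const parsed = parser.parse(soapXml);
  // Walk past the SOAP wrappers; optional chaining tolerates a missing level
  const rawUser = parsed?.['soap:Envelope']?.['soap:Body']?.User;
  if (!rawUser) throw new Error('Could not locate User in SOAP body');
  // Roles may be absent, and parseTagValue can turn a numeric role into a number
  const roles = rawUser.Roles?.Role ?? [];
  return {
    user: {
      id: rawUser['@id'],
      name: rawUser.Name,
      email: rawUser.Email,
      roles: roles.map(r => String(r).toLowerCase())
    }
  };
}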
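The lookup above hard-codes the soap: prefix, but the prefix is chosen by the sender and other services may use soapenv: or s: for the same namespace. One way to tolerate that is to match on the local name only, as in the sketch below. `childByLocalName` and `soapBody` are hypothetical helpers, and matching by local name is a shortcut, not real namespace resolution, so it assumes no two sibling elements share a local name.

```javascript
// Hypothetical helper: finds a child element by local name,
// ignoring whatever namespace prefix the sender chose.
function childByLocalName(node, localName) {
  if (node === null || typeof node !== 'object') return undefined;
  for (const [key, value] of Object.entries(node)) {
    if (key.startsWith('@')) continue;  // skip attributes such as @xmlns:soap
    const local = key.includes(':') ? key.slice(key.indexOf(':') + 1) : key;
    if (local === localName) return value;
  }
  return undefined;
}

// Returns the contents of Envelope > Body, or undefined if either is missing.
function soapBody(parsed) {
  return childByLocalName(childByLocalName(parsed, 'Envelope'), 'Body');
}

// Assumed parser output for two services using different prefixes
const longPrefix = {
  'soap:Envelope': {
    '@xmlns:soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    'soap:Body': { User: { Name: 'Ada' } }
  }
};
const shortPrefix = { 's:Envelope': { 's:Body': { User: { Name: 'Ada' } } } };
console.log(soapBody(longPrefix).User.Name, soapBody(shortPrefix).User.Name);
```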
Real-World Case - Migrating XML Config to JSON Config
When migrating apps from XML-based configuration (Spring, Maven, legacy enterprise apps) to JSON config, type inference becomes critical. A port value of 5432 stored as XML text must become a JSON number, not the string "5432", or a strictly typed config loader may reject it at startup.
Enable parseTagValue: true during parsing and validate the output schema before replacing your config files in production. Type mismatches in config migrations are easy to introduce and annoying to debug.
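A lightweight type check before swapping config files might look like the sketch below. The `expectedTypes` map, its dotted paths, and the `validateConfig` helper are all hypothetical; a real migration would more likely use a full schema validator, but the idea is the same.

```javascript
// Hypothetical schema: dotted config paths mapped to the expected typeof result.
const expectedTypes = {
  'database.host': 'string',
  'database.port': 'number',
  'database.ssl': 'boolean',
};

// Reads a nested value by dotted path, returning undefined if any level is missing.
function getPath(obj, path) {
  return path.split('.').reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

// Returns one message per path whose type differs from the schema.
function validateConfig(config, schema) {
  const errors = [];
  for (const [path, expected] of Object.entries(schema)) {
    const actual = typeof getPath(config, path);
    if (actual !== expected) errors.push(`${path}: expected ${expected}, got ${actual}`);
  }
  return errors;
}

// A migrated config where the port was left as a string
const migrated = { database: { host: 'db.internal', port: '5432', ssl: true } };
console.log(validateConfig(migrated, expectedTypes));
// [ 'database.port: expected number, got string' ]
```

Returning a list of errors instead of throwing on the first one lets a migration script report every mismatch in a single run.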
Libraries and Tools
For Node.js, fast-xml-parser gives you fine-grained control over attribute handling, array normalization, and type inference. For Python, xmltodict paired with json.dumps() handles most straightforward cases.
For quick one-off conversions without writing code, the DevToolLab XML to JSON converter processes files locally in the browser - no data is sent to a server.