xml-trueformat is a TypeScript library for parsing and manipulating XML documents while retaining their exact original formatting. It preserves whitespace, line breaks, comment placement, and attribute order, so parsing and re-serializing an unchanged document reproduces it exactly, and edits produce only minimal diffs.
Why Formatting Preservation Matters
Many XML tools strip “insignificant” whitespace or reformat tags. For files where human-friendly layout and indentation matter—like configuration files, manifests, or annotated XML templates—this can be disruptive. xml-trueformat preserves every nuance of an XML file, so you can add elements or attributes programmatically without rewriting everything else.
When to (not) Use
xml-trueformat targets a specific set of requirements. The following lists the situations where it fits well and those where another parser is the better choice.
Use xml-trueformat if:
- You require exact round-trip output, preserving whitespace, comments, and attribute ordering.
- You handle configuration or manifest files under version control where small diffs are critical.
- You need to insert or remove nodes while leaving everything else unaltered.
Not ideal if:
- You only need to parse XML data (no need for original layout) or convert to JSON.
- You want fast, large-scale data extraction from huge XML files (keeping every piece of formatting in the tree adds overhead).
- You prefer a simpler JSON-like object structure, e.g., from xml2js, and don’t care about formatting.
Quick Comparison with Other XML Parsers
- xml2js: Converts XML to a JavaScript object, loses formatting and comments. Simpler for data but no layout fidelity.
- fast-xml-parser: High performance, optional “preserveOrder” for node sequence, but still not guaranteed exact whitespace or comments in original positions.
- DOM-based parsers (xmldom, etc.): May preserve some whitespace, but typically not attribute quotes/order or small spacing nuances. Re-serialization often changes formatting.
- sax-js: An event-based streaming parser that’s great for processing large XML on-the-fly. It doesn’t build a modifiable DOM nor preserve formatting. Ideal for fast reads, not for round-trip exactness.
xml-trueformat is purpose-built for retaining everything. In exchange, it’s less streamlined for pure data transformations and may be slower for large files compared to specialized parsers.
How It Works
Under the hood, xml-trueformat uses its own AST (abstract syntax tree) rather than a standard DOM. Every piece of whitespace (indentation, newlines, spacing around attributes) is modeled as text nodes, plus specialized classes for comments, CDATA, etc. When you add or remove something, it automatically:
- Matches Indentation of sibling nodes if possible.
- Preserves Quoting Style from existing attributes (single ' or double " quotes).
- Keeps Comments and Processing Instructions precisely where they were.
- Distinguishes between self-closing and non-self-closing elements (e.g. <tag/> vs. <tag></tag>).
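To make the "whitespace as nodes" idea concrete, here is a toy sketch. It is not xml-trueformat's code, and all names in it (`tokenize`, `serialize`, `detectQuote`) are hypothetical. It splits a document into raw tokens and keeps whitespace runs as ordinary text tokens, so serializing is plain concatenation and an unchanged document round-trips byte for byte. It also shows one simple way to reuse the quote style of an existing attribute. Validation, entities and nesting are deliberately ignored.

```typescript
// Toy lossless tokenizer (hypothetical, not xml-trueformat's implementation).
// Every input character lands in exactly one token, so joining the tokens
// reproduces the input exactly. No validation, no entity handling.
type TokenKind = "text" | "comment" | "cdata" | "pi" | "tag";
interface Token {
  kind: TokenKind;
  raw: string; // exact source text, including quotes and spacing
}

// Constructs with fixed opening/closing delimiters.
const SPECIAL: ReadonlyArray<[string, string, TokenKind]> = [
  ["<!--", "-->", "comment"],
  ["<![CDATA[", "]]>", "cdata"],
  ["<?", "?>", "pi"],
];

// Find the end of a tag, skipping '>' characters inside quoted attribute values.
function findTagEnd(xml: string, start: number): number {
  let quote: string | null = null;
  for (let j = start + 1; j < xml.length; j++) {
    const c = xml[j];
    if (quote !== null) {
      if (c === quote) quote = null;
    } else if (c === '"' || c === "'") {
      quote = c;
    } else if (c === ">") {
      return j + 1;
    }
  }
  throw new Error(`Unterminated tag at offset ${start}`);
}

function tokenize(xml: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < xml.length) {
    let kind: TokenKind;
    let stop: number;
    if (xml[i] !== "<") {
      // Text, including pure indentation/newline runs, extends to the next '<'.
      const next = xml.indexOf("<", i);
      kind = "text";
      stop = next === -1 ? xml.length : next;
    } else {
      const match = SPECIAL.find(([open]) => xml.startsWith(open, i));
      if (match) {
        const end = xml.indexOf(match[1], i + match[0].length);
        if (end === -1) throw new Error(`Unterminated ${match[2]} at offset ${i}`);
        kind = match[2];
        stop = end + match[1].length;
      } else {
        kind = "tag";
        stop = findTagEnd(xml, i);
      }
    }
    tokens.push({ kind, raw: xml.slice(i, stop) });
    i = stop;
  }
  return tokens;
}

// Serializing is concatenation: nothing is normalized or re-indented.
function serialize(tokens: Token[]): string {
  return tokens.map((t) => t.raw).join("");
}

// Reuse the quote character of the first attribute in a start tag,
// defaulting to double quotes when the tag has no attributes.
function detectQuote(startTag: string): '"' | "'" {
  const m = /=\s*(["'])/.exec(startTag);
  return m !== null && m[1] === "'" ? "'" : '"';
}

const input =
  `<?xml version="1.0"?>\n<config>\n  <!-- keep me -->\n` +
  `  <item id='a'  enabled="true"/>\n  <![CDATA[ <raw> ]]>\n</config>\n`;
console.log(serialize(tokenize(input)) === input); // true
console.log(detectQuote(`<item id='a'  enabled="true"/>`)); // prints '
```

A real editor would build a nested tree on top of such tokens, but the principle is the same: formatting survives because it is stored as data rather than regenerated.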
Smart Formatting on Modifications
The real magic of xml-trueformat is that when you insert new elements or attributes, it doesn't just plop them in arbitrarily – it matches the existing formatting style. Two helper methods illustrate this well: adding a new element and adding a new attribute (we'll only show the first here).
Inserting Elements without Breaking Indentation
Let’s say you have an XML list of entries:
<users>
    <user name="Alice"/>
    <user name="Bob"/>
</users>
If you want to add a new <user> element programmatically, you’d want it indented with the same 4 spaces as the others and placed on its own line. With xml-trueformat, you can do something like:
// usersElement is the existing <users> XmlElement (e.g. doc.getRootElement())
const newUser = new XmlElement('user', [new XmlAttribute('name', 'Charlie')]);
usersElement.addElement(newUser);
If there are sibling elements, xml-trueformat checks their whitespace (line breaks and indentation) and applies the same to the new <user>, giving a well-formatted result:
<users>
    <user name="Alice"/>
    <user name="Bob"/>
    <user name="Charlie"/>
</users>
Alternatively, you can control formatting manually with XmlElement.addChild, which performs no "smart" formatting and leaves the layout entirely up to you.
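The sibling-matching behavior can be approximated in a few lines. The sketch below uses a hypothetical flat model of a parent's children (raw text or element markup), not xml-trueformat's classes. `appendMatchingIndent` copies the whitespace that precedes the last sibling element, roughly what `addElement` is described as doing above, while `appendRaw` simply appends, in the spirit of `addChild`. Both function names are illustrative only.

```typescript
// Hypothetical mini-model of a parent's children: raw text (including
// whitespace) or element markup. Not xml-trueformat's actual classes.
type Child = { type: "text"; value: string } | { type: "element"; markup: string };

// No formatting: the new element goes right after the last child.
function appendRaw(children: Child[], markup: string): Child[] {
  return [...children, { type: "element", markup }];
}

// Sibling-aware: reuse the whitespace before the last element and insert the
// new element ahead of the trailing whitespace (the closing tag's indentation).
function appendMatchingIndent(children: Child[], markup: string): Child[] {
  let lastElem = -1;
  for (let k = children.length - 1; k >= 0; k--) {
    if (children[k].type === "element") {
      lastElem = k;
      break;
    }
  }
  if (lastElem === -1) return appendRaw(children, markup); // no sibling to copy from
  const before = children[lastElem - 1];
  // Copy only the trailing whitespace of the preceding text node, never its content.
  const separator =
    before !== undefined && before.type === "text" ? (/\s*$/.exec(before.value)?.[0] ?? "") : "";
  return [
    ...children.slice(0, lastElem + 1),
    { type: "text", value: separator },
    { type: "element", markup },
    ...children.slice(lastElem + 1),
  ];
}

function render(tag: string, children: Child[]): string {
  const inner = children.map((c) => (c.type === "text" ? c.value : c.markup)).join("");
  return `<${tag}>${inner}</${tag}>`;
}

const users: Child[] = [
  { type: "text", value: "\n    " },
  { type: "element", markup: `<user name="Alice"/>` },
  { type: "text", value: "\n    " },
  { type: "element", markup: `<user name="Bob"/>` },
  { type: "text", value: "\n" },
];
console.log(render("users", appendMatchingIndent(users, `<user name="Charlie"/>`)));
console.log(render("users", appendRaw(users, `<user name="Charlie"/>`)));
```

With the sibling-aware version, Charlie lands on its own line with four spaces of indentation and `</users>` keeps its own line. With the raw version, the new element ends up at column 0, directly followed by `</users>`, which is why manual control means supplying the whitespace yourself.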
Example Workflow
import { XmlParser, XmlElement, XmlAttribute } from 'xml-trueformat';
import * as fs from 'fs';
const xmlData = fs.readFileSync('example.xml','utf8');
const doc = XmlParser.parse(xmlData);
// Modify attributes
doc.getRootElement().setAttributeValue('version', '2.0');
// Add new element
const newElem = new XmlElement('feature', [new XmlAttribute('enabled','true')]);
doc.getRootElement().addElement(newElem);
// Serialize
fs.writeFileSync('output.xml', doc.toString());
The output remains faithful to the original indentation and spacing, with only the changes you requested.
Other Perks and Features
Comments and CDATA preserved: Comments, processing instructions, and CDATA sections are not lost or reformatted. They are part of the object model (e.g., there are XmlComment and XmlCData node types) and round-trip through parse and serialize intact, in their original positions.
No heavy dependencies: xml-trueformat is implemented in plain TypeScript with no external libraries required. This keeps it lightweight and lets it run in both Node.js and browsers. You can use it in a backend script or on a frontend page – anywhere you need to manipulate XML reliably.
Minimal footprint of change: When you do make modifications, xml-trueformat keeps the scope of changes minimal. If you diff the before vs after XML files, you’ll typically see only the lines related to your actual change. This makes code reviews and merges smoother.
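You can check the minimal-footprint claim on your own files with any line diff, such as `git diff --stat`. For a self-contained check, here is a textbook longest-common-subsequence line diff; `lineDiffStats` is a hypothetical helper, unrelated to xml-trueformat's API. It contrasts the one-line change from the `<users>` example with the same edit followed by re-indenting to two spaces, as a reformatting serializer might do.

```typescript
// Count added/removed lines between two texts via a longest-common-subsequence
// table. Hypothetical helper, not part of xml-trueformat.
function lineDiffStats(before: string, after: string): { added: number; removed: number } {
  const a = before.split("\n");
  const b = after.split("\n");
  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const common = lcs[0][0]; // lines kept unchanged
  return { added: b.length - common, removed: a.length - common };
}

const before = `<users>\n    <user name="Alice"/>\n    <user name="Bob"/>\n</users>`;
// Format-preserving edit: only the new line differs.
const minimal = `<users>\n    <user name="Alice"/>\n    <user name="Bob"/>\n    <user name="Charlie"/>\n</users>`;
// Same edit, but the whole file was re-indented to two spaces.
const reformatted = `<users>\n  <user name="Alice"/>\n  <user name="Bob"/>\n  <user name="Charlie"/>\n</users>`;
console.log(lineDiffStats(before, minimal)); // { added: 1, removed: 0 }
console.log(lineDiffStats(before, reformatted)); // { added: 3, removed: 2 }
```

The quadratic table is fine for configuration-sized files; dedicated diff tools use faster algorithms such as Myers' diff for large inputs.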
Conclusion
xml-trueformat is an excellent choice for high-fidelity XML editing, especially where minor layout changes would be problematic. It’s not the fastest or simplest for data extraction, but if you want minimal diffs and lossless round trips, it’s hard to beat.
Questions or Feedback?
Let me know what features or improvements you’d like. Feel free to open an issue or PR on GitHub — feedback is always welcome!