TL;DR
I built stax-xml, a StAX-style (pull-based streaming) XML parser for JavaScript and, as far as I could find, the first of its kind. It parses XML files of any size without hitting V8's maximum string length, the limit that crashes parsers which need the whole document as a single string. If you work with large XML files, give it a try!
The Problem I Hit
I'm a Java developer. At work, we process large XML files from government agencies and enterprise systems daily. In Java, we use StAX (Streaming API for XML) - a pull-based pattern where you iterate over XML events instead of loading the entire document into memory.
Last month, I needed to do the same thing in a Node.js project. I searched for a StAX implementation in JavaScript. There wasn't one.
What I found instead:
- SAX-based parsers (like xml2js which uses sax-js internally): Event-based with callbacks (push-based, not pull-based like StAX)
- XML-to-object mappers (like txml, fast-xml-parser): Parse full XML string into JavaScript objects
The problem with SAX parsers: They use callbacks (push-based), which makes it awkward to pick out the text of a specific element. You have to maintain state across multiple callback invocations, and the code gets complicated quickly.
The problem with object mappers: They must load the entire XML file into memory as a string first, then build the full object tree. This works well for files under ~100MB but fails on very large ones.
When I tried parsing a 900MB government census file, I hit V8's hard limit:
RangeError: Invalid string length
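V8 caps the length of a single string. In current 64-bit builds the cap is roughly 2^29 characters (about 536 million, or up to ~1GB of memory for two-byte strings); Node.js reports the exact value as buffer.constants.MAX_STRING_LENGTH. Every parser I tried crashed at the same point, because they all need the full XML document as one string first.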
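To make the limit concrete, here is a rough pre-flight check. It is only a sketch: fitsInSingleString is a hypothetical helper, and the default cap of 2^29 - 24 is my assumption for current 64-bit V8 builds (in Node.js you would read the real value from buffer.constants.MAX_STRING_LENGTH instead).

```typescript
// Assumption: current 64-bit V8 builds cap strings at about 2^29 - 24 UTF-16 code units.
// In Node.js, prefer the real value from buffer.constants.MAX_STRING_LENGTH.
const ASSUMED_MAX_STRING_LENGTH = 2 ** 29 - 24;

const MB = 1024 * 1024;

// Hypothetical helper: could a file of `byteLength` bytes be held as one string?
// For mostly-ASCII XML, one byte decodes to one string character, so the byte
// count is a reasonable estimate of the resulting string length.
function fitsInSingleString(
  byteLength: number,
  maxStringLength: number = ASSUMED_MAX_STRING_LENGTH,
): boolean {
  return byteLength <= maxStringLength;
}

console.log(fitsInSingleString(97 * MB));  // true
console.log(fitsInSingleString(900 * MB)); // false
```

A check like this only tells you when a string-based parser cannot work; it does not say anything about the extra memory the object tree needs on top of the string.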
Why stax-xml?
I needed a streaming parser that could handle files of any size. In Java, StAX has been the standard since 2004. The pattern is simple: you pull events from the parser (start element, text, end element) and process them one at a time. The parser never loads the full document into memory.
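Here is a minimal sketch of what "pulling" events looks like. The XmlEvent type and the events generator are hypothetical stand-ins I made up for illustration; they are not the stax-xml API.

```typescript
// Hypothetical event shape for illustration only; not the stax-xml API.
type XmlEvent =
  | { type: 'start'; name: string }
  | { type: 'text'; value: string }
  | { type: 'end'; name: string };

// Stand-in for a parser: yields the events for
// <book><title>Dune</title><price>9.99</price></book>
function* events(): Generator<XmlEvent> {
  yield { type: 'start', name: 'book' };
  yield { type: 'start', name: 'title' };
  yield { type: 'text', value: 'Dune' };
  yield { type: 'end', name: 'title' };
  yield { type: 'start', name: 'price' };
  yield { type: 'text', value: '9.99' };
  yield { type: 'end', name: 'price' };
  yield { type: 'end', name: 'book' };
}

// Pull style: the consumer drives the loop and asks for the next event on demand,
// so "the text right after <title>" needs no state shared between callbacks.
function readTitle(source: Iterator<XmlEvent>): string | undefined {
  for (let r = source.next(); !r.done; r = source.next()) {
    const ev = r.value;
    if (ev.type === 'start' && ev.name === 'title') {
      const next = source.next(); // pull the following event directly
      return !next.done && next.value.type === 'text' ? next.value.value : undefined;
    }
  }
  return undefined;
}

console.log(readTitle(events())); // "Dune"
```

With a push-based parser the same task needs a flag such as "am I inside title?" that one callback sets and another reads.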
JavaScript's async model is actually perfect for this pattern. With ReadableStream and for await...of, I could build a fully async StAX implementation that feels natural in modern JavaScript.
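To show why these two pieces fit together, here is a small sketch of consuming a byte stream chunk by chunk with an async generator. decodeChunks and countCharacters are hypothetical helpers I wrote for this post, not stax-xml internals; they use only ReadableStream and TextDecoder.

```typescript
// Hypothetical helper, not part of stax-xml: decode a byte stream chunk by chunk.
async function* decodeChunks(stream: ReadableStream<Uint8Array>): AsyncGenerator<string> {
  const decoder = new TextDecoder();
  const reader = stream.getReader();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters that straddle two chunks intact
      yield decoder.decode(value, { stream: true });
    }
    const tail = decoder.decode(); // flush anything the decoder still holds
    if (tail) yield tail;
  } finally {
    reader.releaseLock();
  }
}

// Only the current chunk is referenced at any time, never the whole document.
async function countCharacters(stream: ReadableStream<Uint8Array>): Promise<number> {
  let total = 0;
  for await (const text of decodeChunks(stream)) {
    total += text.length;
  }
  return total;
}
```

A real parser would tokenize each chunk instead of counting characters, but the memory profile follows the same idea: work is done per chunk, and earlier chunks can be garbage collected.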
So I built stax-xml:
- Pull-based streaming: Process XML as events, never load the full document
- No size limits: Uses constant ~10MB memory regardless of file size (tested with 2GB+ files)
- Type-safe: Type guards (isStartElement, isCharacters) for clean TypeScript
- Converter API: Zod-style declarative schemas with XPath selectors
- Cross-platform: Works in Node.js, Bun, Deno, browsers, and edge runtimes (uses only Web Standard APIs)
Quick Example
Event-based API (low-level streaming):
import { StaxXmlParser, isStartElement, isCharacters } from 'stax-xml';

// `stream` is assumed to be a ReadableStream carrying the XML bytes,
// e.g. the body of a fetch response.
const parser = new StaxXmlParser(stream);

for await (const event of parser) {
  if (isStartElement(event)) {
    console.log(`Element: ${event.name}`);
  }
  if (isCharacters(event)) {
    console.log(`Text: ${event.value}`);
  }
}
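The example above takes a stream as given. Here is one way to get one, sketched with Web Standard APIs only. streamFromString is a hypothetical helper name, and the fetch lines are shown as comments because the URL is a placeholder.

```typescript
// Hypothetical helper: wrap an in-memory XML string as a byte stream.
function streamFromString(xml: string): ReadableStream<Uint8Array> {
  const bytes = new TextEncoder().encode(xml);
  return new ReadableStream<Uint8Array>({
    start(controller) {
      controller.enqueue(bytes); // a single chunk is enough for small inputs
      controller.close();
    },
  });
}

const stream = streamFromString('<book><title>Dune</title></book>');

// For a remote file, fetch already exposes the body as a stream:
//   const response = await fetch(url);  // `url` is a placeholder
//   const body = response.body;         // ReadableStream<Uint8Array> | null
//
// For a local file in Node.js, Readable.toWeb(fs.createReadStream(path))
// converts a Node stream into a web ReadableStream.
```

For large files you want the fetch or file variant, since wrapping a string only helps when the document already fits in memory.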
Converter API (Zod-style schemas):
import { x } from 'stax-xml/converter';

const bookSchema = x.object({
  title: x.string().xpath('/book/title'),
  author: x.string().xpath('/book/author'),
  price: x.number().xpath('/book/price')
});

const book = await bookSchema.parse(xml);
// Full TypeScript type inference!
When to Use It
For small files (< 100MB):
- Use StaxXmlParserSync - synchronous, faster for in-memory strings
For medium files (100MB - 900MB):
- Use StaxXmlParserSync for speed, StaxXmlParser for memory efficiency
For large files (900MB+):
- Only StaxXmlParser (async) - handles unlimited file sizes via streaming
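If you want this guidance in code, it could look roughly like the sketch below. chooseParser is a hypothetical helper, not something stax-xml exports, and the thresholds are just the rough boundaries from the list above.

```typescript
type ParserChoice = 'StaxXmlParserSync' | 'StaxXmlParser';

const MB = 1024 * 1024;

// Hypothetical helper that encodes the size guidance above.
// `memoryConstrained` covers the medium range, where either parser can work.
function chooseParser(fileSizeBytes: number, memoryConstrained = false): ParserChoice {
  if (fileSizeBytes >= 900 * MB) return 'StaxXmlParser'; // streaming is the only option
  if (fileSizeBytes >= 100 * MB) {
    return memoryConstrained ? 'StaxXmlParser' : 'StaxXmlParserSync';
  }
  return 'StaxXmlParserSync'; // small files: the sync parser is faster
}

console.log(chooseParser(50 * MB));        // StaxXmlParserSync
console.log(chooseParser(500 * MB, true)); // StaxXmlParser
console.log(chooseParser(2048 * MB));      // StaxXmlParser
```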
Benchmark (97MB file):
- txml: 1.02s, 897.50 MB memory
- StaxXmlParserSync: 1.05s, 13.88 MB memory
- StaxXmlParser: 3.61s, 3.13 MB memory
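stax-xml trades some speed for memory efficiency and unlimited scalability: in this benchmark the async parser was about 3.5x slower than txml but used well under 1% of its memory. That makes it a good fit for large files or memory-constrained environments (edge functions, containers).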
What's Next?
I'm actively working on stax-xml and would love feedback. Some areas I'm focusing on:
- Performance optimizations for medium-sized files
- Better error messages and debugging tools
- More comprehensive XPath support (currently only a subset of the XPath spec)
- Additional converter API features
Try It Out
npm install stax-xml
- GitHub: https://github.com/Clickin/stax-xml
- Documentation: https://clickin.github.io/stax-xml
- NPM: https://www.npmjs.com/package/stax-xml
If you're working with large XML files and hitting memory limits, give stax-xml a try. I'd love to hear your feedback, bug reports, or feature requests!
Built with Web Standard APIs. Works everywhere. No size limits.