
Announcing stax-xml: StAX style XML Parser for JavaScript

TL;DR

I built stax-xml, the first StAX-style (streaming) XML parser for JavaScript. It can parse XML files of any size without running into V8's ~1GB string limit that crashes DOM-based parsers. If you work with large XML files, give it a try!


The Problem I Hit

I'm a Java developer. At work, we process large XML files from government agencies and enterprise systems daily. In Java, we use StAX (Streaming API for XML) - a pull-based pattern where you iterate over XML events instead of loading the entire document into memory.

Last month, I needed to do the same thing in a Node.js project. I searched for a StAX implementation in JavaScript. There wasn't one.

What I found instead:

  • SAX-based parsers (like xml2js which uses sax-js internally): Event-based with callbacks (push-based, not pull-based like StAX)
  • XML-to-object mappers (like txml, fast-xml-parser): Parse full XML string into JavaScript objects

The problem with SAX parsers: they're push-based. The parser drives and calls your callbacks, so extracting a specific element's text means maintaining state across multiple callback invocations, which leads to complex code.
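For example, grabbing the text of a title element with sax (the parser underneath xml2js) means threading flags across three separate callbacks. A minimal sketch:

import sax from 'sax';

const parser = sax.parser(true); // strict mode

// State has to live outside the callbacks
let inTitle = false;
let title = '';

parser.onopentag = (node) => {
  if (node.name === 'title') {
    inTitle = true;
    title = '';
  }
};
parser.ontext = (text) => {
  if (inTitle) title += text;
};
parser.onclosetag = (name) => {
  if (name === 'title') {
    inTitle = false;
    console.log(`Title: ${title}`);
  }
};

parser.write('<book><title>Dune</title></book>').close();

One element, three callbacks, two pieces of shared state. Multiply that by every field you care about.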

The problem with object mappers: they have to load the entire XML file into memory as a string first, then build the full object tree. This works great for files under ~100MB, but crashes on large files.

When I tried parsing a 900MB government census file, I hit V8's hard limit:

RangeError: Invalid string length

V8 caps string length at about 2^29 characters (roughly 536 million), and my 900MB file blew straight past that. No matter which parser I used, they all crashed at the same point, because they all rely on loading the full XML into one string first.
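You don't even need a huge file to see it. On 64-bit V8 the exact cap is 2^29 - 24 characters, so two lines reproduce the crash:

// Throws "RangeError: Invalid string length" on 64-bit V8
const s = 'x'.repeat(2 ** 29);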


Why stax-xml?

I needed a streaming parser that could handle files of any size. In Java, StAX has been the standard since 2004. The pattern is simple: you pull events from the parser (start element, text, end element) and process them one at a time. The parser never loads the full document into memory.

JavaScript's async model is actually perfect for this pattern. With ReadableStream and for await...of, I could build a fully async StAX implementation that feels natural in modern JavaScript.
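For example, in runtimes where ReadableStream is async-iterable (Node.js 18+, Deno, Bun, and recent browsers), consuming a response body chunk by chunk is just a loop (placeholder URL):

const res = await fetch('https://example.com/big.xml'); // placeholder URL
for await (const chunk of res.body) {
  // chunk is a Uint8Array; a pull parser can consume these
  // incrementally without ever building one giant string
}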

So I built stax-xml:

  • Pull-based streaming: Process XML as events, never load the full document
  • No size limits: Uses constant ~10MB memory regardless of file size (tested with 2GB+ files)
  • Type-safe: Type guards (isStartElement, isCharacters) for clean TypeScript
  • Converter API: Zod-style declarative schemas with XPath selectors
  • Cross-platform: Works in Node.js, Bun, Deno, browsers, and edge runtimes (uses only Web Standard APIs)

Quick Example

Event-based API (low-level streaming):

import { StaxXmlParser, isStartElement, isCharacters } from 'stax-xml';

// `stream` is the XML source as a Web ReadableStream
// (for example, a fetch() response body or a file stream)
const parser = new StaxXmlParser(stream);

for await (const event of parser) {
  if (isStartElement(event)) {
    console.log(`Element: ${event.name}`);
  }

  if (isCharacters(event)) {
    console.log(`Text: ${event.value}`);
  }
}
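To feed the parser a huge file in Node.js, you can wrap a classic file stream as a Web ReadableStream with Readable.toWeb (Node 17+). A sketch, with placeholder path and element names:

import { createReadStream } from 'node:fs';
import { Readable } from 'node:stream';
import { StaxXmlParser, isStartElement } from 'stax-xml';

// Convert a Node.js file stream into a Web ReadableStream
const stream = Readable.toWeb(createReadStream('./census.xml')); // placeholder path

const parser = new StaxXmlParser(stream);
for await (const event of parser) {
  if (isStartElement(event) && event.name === 'record') { // 'record' is a placeholder
    // handle one record at a time; memory stays flat
  }
}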

Converter API (Zod-style schemas):

import { x } from 'stax-xml/converter';

const bookSchema = x.object({
  title: x.string().xpath('/book/title'),
  author: x.string().xpath('/book/author'),
  price: x.number().xpath('/book/price')
});

// A sample document matching the schema's XPaths:
const xml = '<book><title>Dune</title><author>Frank Herbert</author><price>9.99</price></book>';

const book = await bookSchema.parse(xml);
// Full TypeScript type inference: { title: string; author: string; price: number }

When to Use It

For small files (< 100MB):

  • Use StaxXmlParserSync - synchronous, faster for in-memory strings (see the sketch after this list)

For medium files (100MB - 900MB):

  • Use StaxXmlParserSync for speed, StaxXmlParser for memory efficiency

For large files (900MB+):

  • Only StaxXmlParser (async) - handles unlimited file sizes via streaming
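In sketch form, the sync path looks the same minus the await (assuming StaxXmlParserSync takes the XML as an in-memory string and yields the same events through a plain iterator):

import { StaxXmlParserSync, isStartElement, isCharacters } from 'stax-xml';

// Sketch: sync parsing over an in-memory string (no streaming)
const parser = new StaxXmlParserSync(xmlString);

for (const event of parser) {
  if (isStartElement(event)) console.log(`Element: ${event.name}`);
  if (isCharacters(event)) console.log(`Text: ${event.value}`);
}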

Benchmark (97MB file):

  • txml: 1.02s, 897.50 MB memory
  • StaxXmlParserSync: 1.05s, 13.88 MB memory
  • StaxXmlParser: 3.61s, 3.13 MB memory

stax-xml trades some speed for memory efficiency and unlimited scalability. Perfect for large files or memory-constrained environments (edge functions, containers).


What's Next?

I'm actively working on stax-xml and would love feedback. Some areas I'm focusing on:

  • Performance optimizations for medium-sized files
  • Better error messages and debugging tools
  • More comprehensive XPath support (currently a subset of the XPath spec)
  • Additional converter API features

Try It Out

npm install stax-xml

If you're working with large XML files and hitting memory limits, give stax-xml a try. I'd love to hear your feedback, bug reports, or feature requests!


Built with Web Standard APIs. Works everywhere. No size limits. 🚀
