## The problem
I work with hotel reservation systems that dump SOAP/OTA XML responses — sometimes 1-2 GB per file. Every XML viewer I tried either crashed, froze the tab, or ran out of memory. Notepad++ tops out around 200MB. Browser-based XML viewers load everything into a DOM tree that eats 3-10x the file size in RAM. A 500MB file? That's 4GB of RAM just to render it.
## The solution
I built XML Stream Parser — a Chrome extension that handles XML files up to 2GB without freezing your browser.
## How it works (the interesting part)
The core idea is embarrassingly simple: don't build a DOM tree.
- `File.slice(offset, offset + 16MB)` reads a chunk
- `TextDecoder` with `decode(chunk, { stream: true })` decodes UTF-8 correctly across chunk boundaries (this is the part everyone gets wrong: a multibyte character can land exactly on the boundary)
- A custom SAX parser processes the chunk, firing `onOpenTag`, `onCloseTag`, `onText` events
- All of this runs in a Web Worker so the main thread stays free
- The worker sends progress updates via `postMessage`; the main thread renders a progress bar
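To see why the streaming decode matters, here's a minimal demo of a multibyte character split across two chunks. Without `{ stream: true }`, each `decode()` call would turn the dangling byte into a replacement character:

```javascript
// 'é' is 0xC3 0xA9 in UTF-8; here it straddles a chunk boundary.
const decoder = new TextDecoder('utf-8');

const chunk1 = new Uint8Array([0x61, 0xC3]); // 'a' + first byte of 'é'
const chunk2 = new Uint8Array([0xA9, 0x62]); // second byte of 'é' + 'b'

let text = decoder.decode(chunk1, { stream: true }); // 'a' (0xC3 is buffered)
text += decoder.decode(chunk2, { stream: true });    // 'éb'
text += decoder.decode();                            // final flush

console.log(text); // 'aéb'
```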
Memory usage is ~20MB regardless of file size. A 2GB file uses the same RAM as a 2KB file.
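The read loop is why memory stays flat: only one chunk is alive at a time. A hedged sketch of the pattern (assumed structure, not the extension's actual source; `onText` stands in for whatever feeds the parser):

```javascript
const CHUNK_SIZE = 16 * 1024 * 1024; // 16 MB slices

// Slice the File/Blob, decode each chunk with a streaming decoder,
// and hand the text downstream. Only one chunk is in memory at once.
async function streamFile(file, onText) {
  const decoder = new TextDecoder('utf-8');
  for (let offset = 0; offset < file.size; offset += CHUNK_SIZE) {
    const buf = await file.slice(offset, offset + CHUNK_SIZE).arrayBuffer();
    onText(decoder.decode(buf, { stream: true }));
  }
  onText(decoder.decode()); // flush any buffered trailing bytes
}
```

In the extension this loop would live inside the Web Worker, with `onText` feeding the SAX parser's `write()`.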
## What you can do with it
- Stats: total elements, unique tags, attributes, max depth — computed in a single pass
- Search: filter by tag name, attribute name, attribute value, or text content. Results stream in in real time during parsing
- Element explorer: all tags listed by nesting depth. Click any tag to see its actual XML code with syntax highlighting. Navigate through up to 50 samples with ◀ ▶
- XML anatomy hint: the extension picks a representative element from your file and shows an interactive breakdown — what's a tag, what's an attribute, what's a value. Useful for non-dev users who receive XML exports
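The single-pass stats fall out naturally from SAX events. A sketch (the handler names match the events listed earlier; the stats object shape is illustrative, not the extension's exact output):

```javascript
// Compute element count, unique tags, attribute count, and max
// nesting depth in one pass, without retaining any tree.
function createStatsCollector() {
  const stats = { elements: 0, uniqueTags: new Set(), attributes: 0, maxDepth: 0 };
  let depth = 0;
  return {
    stats,
    onOpenTag(name, attrs) {
      stats.elements += 1;
      stats.uniqueTags.add(name);
      stats.attributes += Object.keys(attrs).length;
      depth += 1;
      if (depth > stats.maxDepth) stats.maxDepth = depth;
    },
    onCloseTag() {
      depth -= 1;
    },
  };
}
```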
## The SAX parser gotcha
I wrote a minimal SAX parser from scratch (~200 lines) instead of using sax-js because I needed it to:
- Handle `parser.write(chunk)` for incremental feeding
- Not allocate a tree
- Correctly handle CDATA, comments, PIs, and entity decoding across chunk boundaries
The trickiest part was self-closing tags like `<Foo bar="1"/>`: the `/` can end up in the next chunk if it lands on the boundary. The solution: the parser buffers incomplete tags until the closing `>` arrives.
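A toy sketch of the buffering trick (not the extension's real parser, and it deliberately ignores text nodes, CDATA, and `>` inside attribute values): anything after the last complete `>` is carried over and prepended to the next chunk.

```javascript
// Tags split across chunks survive because the incomplete tail is
// buffered: '<Foo bar="1"/' + '>' still yields one complete tag.
function createTagBuffer(onTag) {
  let carry = '';
  return {
    write(chunk) {
      let data = carry + chunk;
      let start;
      while ((start = data.indexOf('<')) !== -1) {
        const end = data.indexOf('>', start);
        if (end === -1) break; // tag incomplete: keep it for the next chunk
        onTag(data.slice(start, end + 1));
        data = data.slice(end + 1);
      }
      carry = data;
    },
  };
}
```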
## Numbers from a real test
| File | Size | Elements | Parse time | RAM |
|---|---|---|---|---|
| Hotel reservations | 1.8 GB | 2.4M | 3.4s | ~20MB |
| Product catalog | 890 MB | 1.1M | 1.7s | ~18MB |
| API log dump | 450 MB | 6.2M | 2.1s | ~16MB |
Stack: Vanilla JS, Web Workers, zero dependencies. The entire extension is 45KB.
Chrome Web Store link | Free, no tracking, all processing is local.
Would love feedback — especially if you have edge-case XML files that break things.