DEV Community

Yana Postnova

I built a Chrome extension that stream-parses 2GB XML files using only 20MB of RAM. Here's the architecture.

The problem

I work with hotel reservation systems that dump SOAP/OTA XML responses — sometimes 1-2 GB per file. Every XML viewer I tried either crashed, froze the tab, or ran out of memory. Notepad++ tops out around 200MB. Browser-based XML viewers load everything into a DOM tree that eats 3-10x the file size in RAM. A 500MB file? That's 4GB of RAM just to render it.

The solution

I built XML Stream Parser — a Chrome extension that handles XML files up to 2GB without freezing your browser.

How it works (the interesting part)

The core idea is embarrassingly simple: don't build a DOM tree.

  1. file.slice(offset, offset + CHUNK_SIZE) reads a 16 MB chunk without pulling the rest of the file into memory
  2. A single TextDecoder instance, called with decode(bytes, { stream: true }), decodes UTF-8 correctly across chunk boundaries (this is the part everyone gets wrong: a multibyte character can land exactly on the boundary)
  3. A custom SAX parser processes the chunk, firing onOpenTag, onCloseTag, onText events
  4. All of this runs in a Web Worker so the main thread stays free
  5. Worker sends progress updates via postMessage, main thread renders a progress bar
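Steps 1 and 2 can be sketched like this. makeChunkDecoder is an illustrative helper, not the extension's actual code; the key detail is reusing one TextDecoder across all chunks, so a multibyte sequence split at a boundary is buffered by the decoder instead of being emitted as U+FFFD replacement characters:

```javascript
// Sketch of the chunked decode step. CHUNK_SIZE and makeChunkDecoder
// are illustrative names, not the extension's real identifiers.
const CHUNK_SIZE = 16 * 1024 * 1024;

function makeChunkDecoder() {
  // One decoder instance must survive across chunks: with
  // { stream: true } it holds back a partial multibyte sequence
  // at the end of a chunk and completes it on the next call.
  const decoder = new TextDecoder("utf-8");
  return (bytes, done = false) => decoder.decode(bytes, { stream: !done });
}

// Demonstrate the boundary case: "é" is 0xC3 0xA9 in UTF-8,
// split here across two "chunks".
const decode = makeChunkDecoder();
const part1 = new Uint8Array([0x61, 0xc3]); // "a" + first byte of "é"
const part2 = new Uint8Array([0xa9, 0x62]); // second byte of "é" + "b"
const text = decode(part1) + decode(part2, true);
// text === "aéb" — decoding each chunk with a fresh TextDecoder
// would instead produce "a\uFFFD" + "\uFFFDb"
```

In the real flow, `part1`/`part2` would be the `Uint8Array`s read from successive `file.slice(...)` calls inside the Worker.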

Memory usage is ~20MB regardless of file size. A 2GB file uses the same RAM as a 2KB file.

What you can do with it:

  • Stats: total elements, unique tags, attributes, max depth — computed in a single pass
  • Search: filter by tag name, attribute name, attribute value, or text content. Results stream in while the file is still parsing
  • Element explorer: all tags listed by nesting depth. Click any tag to see its actual XML code with syntax highlighting. Navigate through up to 50 samples with ◀ ▶
  • XML anatomy hint: the extension picks a representative element from your file and shows an interactive breakdown — what's a tag, what's an attribute, what's a value. Useful for non-dev users who receive XML exports
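The single-pass stats fall out of the SAX events almost for free. This is a hypothetical sketch (the collector shape and names are mine, not the extension's code), wired to the onOpenTag/onCloseTag events mentioned above:

```javascript
// Hypothetical single-pass stats collector driven by SAX events.
function makeStatsCollector() {
  const stats = { totalElements: 0, uniqueTags: new Set(), attributes: 0, maxDepth: 0 };
  let depth = 0;
  return {
    onOpenTag(name, attrs = {}) {
      stats.totalElements++;
      stats.uniqueTags.add(name);
      stats.attributes += Object.keys(attrs).length;
      depth++;
      if (depth > stats.maxDepth) stats.maxDepth = depth;
    },
    onCloseTag() { depth--; },
    // Snapshot with the Set collapsed to a count.
    result: () => ({ ...stats, uniqueTags: stats.uniqueTags.size }),
  };
}

// usage: feed events the way a parser would
const collector = makeStatsCollector();
collector.onOpenTag("Reservations");
collector.onOpenTag("Hotel", { id: "42" });
collector.onCloseTag();
collector.onCloseTag();
// collector.result() → { totalElements: 2, uniqueTags: 2, attributes: 1, maxDepth: 2 }
```

Nothing is retained per element, so memory stays flat no matter how many elements the file contains.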

The SAX parser gotcha

I wrote a minimal SAX parser from scratch (~200 lines) instead of using sax-js because I needed it to:

  • Handle parser.write(chunk) for incremental feeding
  • Not allocate a tree
  • Correctly handle CDATA, comments, PIs, and entity decoding across chunk boundaries

The trickiest part was self-closing tags like <Foo bar="1"/> — the / can end up in the next chunk if it lands on the boundary. The solution: the parser buffers incomplete tags until the closing > arrives.
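Here is a toy version of that buffering idea. MiniSax and its handler names are illustrative, and this sketch only handles plain tags and text (not the CDATA, comments, PIs, or entities the real parser covers): anything after an unmatched < is carried over to the next write() call, so a tag split across chunks, including the / of a self-closing tag, is only parsed once it is complete.

```javascript
// Toy incremental parser showing the carry-over buffer. Not the
// extension's real parser: no CDATA, comments, PIs, or entities.
class MiniSax {
  constructor(handlers) {
    this.handlers = handlers;
    this.carry = ""; // incomplete tail from the previous chunk
  }
  write(chunk) {
    let data = this.carry + chunk;
    let pos = 0;
    for (;;) {
      const lt = data.indexOf("<", pos);
      if (lt === -1) { this.emitText(data.slice(pos)); this.carry = ""; return; }
      this.emitText(data.slice(pos, lt));
      const gt = data.indexOf(">", lt);
      // Tag is incomplete in this chunk: buffer it and wait for more data.
      if (gt === -1) { this.carry = data.slice(lt); return; }
      this.emitTag(data.slice(lt + 1, gt));
      pos = gt + 1;
    }
  }
  emitText(text) { if (text.trim()) this.handlers.onText?.(text); }
  emitTag(body) {
    if (body.startsWith("/")) { this.handlers.onCloseTag?.(body.slice(1).trim()); return; }
    const selfClosing = body.endsWith("/");
    if (selfClosing) body = body.slice(0, -1);
    const name = body.split(/[\s/]/)[0];
    this.handlers.onOpenTag?.(name);
    if (selfClosing) this.handlers.onCloseTag?.(name);
  }
}

// The nasty case from above: the "/>" lands in the second chunk.
const events = [];
const p = new MiniSax({
  onOpenTag: n => events.push("open:" + n),
  onCloseTag: n => events.push("close:" + n),
});
p.write('<Root><Foo bar="1"');
p.write('/></Root>');
// events → ["open:Root", "open:Foo", "close:Foo", "close:Root"]
```

The first write() stops at the unfinished `<Foo bar="1"` and stashes it in `carry`; the second write() prepends it, sees the complete self-closing tag, and fires the open/close pair correctly.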

Numbers from a real test:

File                  Size     Elements   Parse time   RAM
Hotel reservations    1.8 GB   2.4M       3.4s         ~20MB
Product catalog       890 MB   1.1M       1.7s         ~18MB
API log dump          450 MB   6.2M       2.1s         ~16MB

Stack: Vanilla JS, Web Workers, zero dependencies. The entire extension is 45KB.

Chrome Web Store link | Free, no tracking, all processing is local.

Would love feedback — especially if you have edge-case XML files that break things.
