
Valentina Sofia Leyes

Posted on • Originally published at valen-leyes.github.io

How I Bypassed Browser RAM Limits to Format 1.5GB+ XML Files Using Streams & IndexedDB

Most standard online XML beautifiers fail or freeze your browser tab as soon as a file exceeds 10MB or 20MB.

This happens because traditional web tools load the entire file into memory at once, either as a single JavaScript string or as DOM nodes. The result? The tab runs out of memory and freezes or crashes.

To solve this, I built a 100% frontend, serverless solution. It handles massive database exports and raw logs directly in the sandboxed browser tab without crashing.


⚡ The Architecture Pipeline

Instead of choking the system memory, the tool processes data sequentially:

[Massive XML File] ➡️ [File Stream Reader] ➡️ [Web Worker Parsing & Indentation] ➡️ [IndexedDB Async Queue Cache] ➡️ [Service Worker Interception Download]

1. Sequential Stream Chunking

  • What it does: Reads the raw input file asynchronously block-by-block using the native browser stream API (file.stream().getReader()).
  • The trick: It tracks processed bytes dynamically to update the UI progress bar without blocking the interface.
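
A minimal sketch of this reading step, assuming an environment where Blob.prototype.stream() is available (modern browsers, or Node 18+). The helper name readInChunks and its two callbacks are illustrative and not taken from the project:

```javascript
// Hypothetical helper: reads a File/Blob chunk by chunk and reports progress.
async function readInChunks(file, onChunk, onProgress) {
  const reader = file.stream().getReader();
  let bytesProcessed = 0;

  while (true) {
    const { done, value } = await reader.read(); // value is a Uint8Array
    if (done) break;

    bytesProcessed += value.byteLength;
    onProgress(Math.round((bytesProcessed / file.size) * 100));
    await onChunk(value); // hand the raw bytes to the next pipeline stage
  }
  return bytesProcessed;
}
```

Only one chunk is held at a time, so memory use depends on the chunk size the browser picks rather than on the size of the file.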

2. Offloading to Background Threads (Web Workers)

  • What it does: Heavy text transformation, token matching via RegExp, and indentation logic run entirely in a background worker thread (xmlWorker.js).
  • The trick: This isolates the heavy lifting and keeps the main user interface responsive at 60 FPS.
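
The page-side wiring for such a worker could look roughly like this. The post does not show the message protocol, so the message shapes (type, percent, totalBytes) and the ui object are assumptions made for illustration:

```javascript
// Hypothetical wiring between the page and the worker.
// The message shapes below are assumptions, not the project's actual protocol.
function startFormatting(worker, file, ui) {
  worker.onmessage = (event) => {
    const msg = event.data;
    switch (msg.type) {
      case 'progress': ui.setProgress(msg.percent); break;
      case 'done': ui.finish(msg.totalBytes); break;
      case 'error': ui.fail(msg.message); break;
    }
  };
  // File objects can be structured-cloned, so the worker receives a handle
  // to the file and streams it itself; the bytes are not copied up front.
  worker.postMessage({ type: 'start', file, indentSpaces: 2 });
}

// In a browser:
// startFormatting(new Worker('xmlWorker.js'), fileInput.files[0], ui);
```

The main thread only ever handles small progress messages, which is what keeps the interface responsive while the worker does the parsing.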

3. Paced Memory Flushing to Disk (IndexedDB)

  • What it does: Rather than accumulating strings in RAM, once the formatted text buffer passes a ~40MB threshold (measured by string length, so roughly 40 million characters), it is encoded into binary (Uint8Array) and pushed to an output queue.
  • The trick: A background synchronization loop breaks these down into optimized 4MB blocks (CHUNK_TARGET_SIZE) and saves them directly into IndexedDB.
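
A sketch of the block-splitting and storage step, under a few assumptions: the database is already open, the object store was created without a keyPath (so keys are passed explicitly), and the names splitIntoBlocks, saveBlocks, and storeName are hypothetical. Only CHUNK_TARGET_SIZE comes from the post:

```javascript
const CHUNK_TARGET_SIZE = 4 * 1024 * 1024; // 4MB, as named in the post

// Splits one large encoded buffer into blocks of at most `blockSize` bytes.
function splitIntoBlocks(bytes, blockSize = CHUNK_TARGET_SIZE) {
  const blocks = [];
  for (let offset = 0; offset < bytes.byteLength; offset += blockSize) {
    // slice() copies, so each stored block owns only its own bytes;
    // subarray() would keep a view onto the whole ~40MB buffer.
    blocks.push(bytes.slice(offset, offset + blockSize));
  }
  return blocks;
}

// Browser-only: writes blocks to an already-open IDBDatabase.
// Resolves with the next free key once the transaction has committed.
function saveBlocks(db, storeName, blocks, firstKey) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction(storeName, 'readwrite');
    const store = tx.objectStore(storeName);
    blocks.forEach((block, i) => store.put(block, firstKey + i));
    tx.oncomplete = () => resolve(firstKey + blocks.length);
    tx.onerror = () => reject(tx.error);
    tx.onabort = () => reject(tx.error);
  });
}
```

Sequential numeric keys make it easy to read the blocks back in order later, one at a time.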

4. Paced Delivery via Interception (Service Worker)

  • What it does: When you click download, a registered Service Worker (sw.js) intercepts the request to /download-stream-xml.
  • The trick: It streams the final payload to the browser's native download UI by reading data blocks directly from IndexedDB on-demand. Memory usage stays flat.
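
The on-demand delivery can be sketched with a pull-based ReadableStream. Here readBlock and readBlockFromIndexedDB are hypothetical async functions, and the response headers and filename are assumptions; the post only names sw.js and the /download-stream-xml route:

```javascript
// Builds a ReadableStream that pulls one block at a time.
// `readBlock(index)` is a hypothetical async function that resolves to a
// Uint8Array, or to undefined once there are no more blocks.
function createBlockStream(readBlock) {
  let index = 0;
  return new ReadableStream({
    async pull(controller) {
      const block = await readBlock(index++);
      if (block === undefined) {
        controller.close();
      } else {
        controller.enqueue(block);
      }
    },
  });
}

// Service Worker side (this branch only runs inside a Service Worker).
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  self.addEventListener('fetch', (event) => {
    const url = new URL(event.request.url);
    if (url.pathname !== '/download-stream-xml') return;
    // readBlockFromIndexedDB is assumed to be defined elsewhere in sw.js.
    const stream = createBlockStream(readBlockFromIndexedDB);
    event.respondWith(new Response(stream, {
      headers: {
        'Content-Type': 'application/xml',
        'Content-Disposition': 'attachment; filename="formatted.xml"',
      },
    }));
  });
}
```

Because pull() is only called when the consumer wants more data, a new block is read from storage only as fast as the download can accept it.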

💻 Code Breakdown: The High-Performance Core Loop

Here is how the main loop in xmlProcessor.js decodes each binary chunk, cuts it at the last complete tag, indents the tokens, and periodically flushes the formatted text into the binary output queue so it never piles up in RAM:

// Inside processXMLStream() - Running on the Web Worker thread
while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    bytesProcessed += value.length;
    onProgress(((bytesProcessed / file.size) * 100).toFixed(0));

    let text = decoder.decode(value, { stream: true });
    let chunkText = remainder + text;

    // Cut the chunk at the last '>' so no tag is split across two chunks;
    // anything after it is carried over in `remainder` for the next read
    let lastCloseTag = chunkText.lastIndexOf('>');
    if (lastCloseTag !== -1) {
        remainder = chunkText.substring(lastCloseTag + 1);
        chunkText = chunkText.substring(0, lastCloseTag + 1);
    } else {
        // Fallback checks for open tags or structural limits...
        remainder = chunkText;
        continue;
    }

    // Tokenize and apply precise nested indentation levels
    let tokens = chunkText.match(tokenRegex);
    if (tokens) {
        let chunkResult = '';
        for (let i = 0; i < tokens.length; i++) {
            let token = tokens[i].trim();
            if (!token) continue;

            if (token.startsWith('</')) {
                indentLevel = Math.max(0, indentLevel - 1);
                chunkResult += ' '.repeat(indentLevel * indentSpaces) + token + lineEnding;
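            } else if (token.startsWith('<') && !token.endsWith('/>') && !token.startsWith('<?') && !token.startsWith('<!')) {
                // Opening tag only: comments, DOCTYPE and CDATA ('<!') have no closing tag, so they must not indent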
                chunkResult += ' '.repeat(indentLevel * indentSpaces) + token + lineEnding;
                indentLevel++;
            } else {
                chunkResult += ' '.repeat(indentLevel * indentSpaces) + token + lineEnding;
            }
        }
        textBuffer += chunkResult;
    }

    // Once the string buffer passes ~40M characters, encode it to bytes and move it to the output queue
    if (textBuffer.length > 40 * 1024 * 1024) {
        const encoded = encoder.encode(textBuffer);
        totalFormattedBytes += encoded.length; 
        outputQueue.push(encoded);
        textBuffer = ''; 

        // Yield to the event loop so the worker can handle pending messages and queued tasks
        await new Promise(resolve => setTimeout(resolve, 0));
    }
}
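
To see the indentation rule in isolation, here is a small self-contained sketch that formats two consecutive chunks while carrying the indent level between them. The post does not show the real tokenRegex, so the pattern below is an assumed, simplified one (it would mis-tokenize a '>' inside comments, CDATA, or attribute values), and indentChunk is a hypothetical name:

```javascript
// Assumed token pattern: a tag, or a run of text between tags.
// The post does not show the real tokenRegex.
const tokenRegex = /<[^>]+>|[^<]+/g;

// Formats one chunk that ends on a tag boundary. `state.indentLevel`
// persists across calls so nesting survives chunk boundaries.
function indentChunk(chunkText, state, indentSpaces = 2, lineEnding = '\n') {
  let out = '';
  for (const raw of chunkText.match(tokenRegex) || []) {
    const token = raw.trim();
    if (!token) continue;

    // Closing tags step back before they are printed.
    if (token.startsWith('</')) {
      state.indentLevel = Math.max(0, state.indentLevel - 1);
    }
    out += ' '.repeat(state.indentLevel * indentSpaces) + token + lineEnding;

    // Only real opening tags increase the level for what follows.
    const opensElement = token.startsWith('<') && !token.startsWith('</') &&
      !token.startsWith('<?') && !token.startsWith('<!') && !token.endsWith('/>');
    if (opensElement) state.indentLevel++;
  }
  return out;
}

const state = { indentLevel: 0 };
const formatted = indentChunk('<a><b>hi</b>', state) + indentChunk('<c/></a>', state);
console.log(formatted);
```

Keeping the level in a shared state object is what lets the second chunk start at the right depth even though it never saw the opening tag.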

🔒 Enterprise-Grade Privacy by Default

Because the entire pipeline runs locally in client-side JavaScript, confidential production database exports and sensitive enterprise configurations never leave your machine or touch a remote server.


🛠️ Try it out & Open Source

The project is completely free, client-side, and licensed under MIT.


I would love to hear your thoughts on this stream-parsing approach. How do you usually handle massive data configurations directly in the browser? Let's discuss in the comments!
