DEV Community

Bruno Hanss

Converting Large JSON, NDJSON, CSV and XML Files without Blowing Up Memory

Most of us have written something like this at some point:

```js
const data = JSON.parse(hugeString);
```

It works.

Until it doesn't.

At some point the file grows.
50MB. 200MB. 1GB. 5GB.

And suddenly:

  • The tab freezes (in the browser)
  • Memory spikes
  • The process crashes
  • Or worse: everything technically "works" but becomes unusable

This isn't a JavaScript problem.

It's a buffering problem.


The Real Issue: Buffering vs Streaming

Most parsing libraries operate in buffer mode:

  • Read the entire file into memory
  • Parse it completely
  • Return the result

That means memory usage scales with file size.

Streaming flips the model:

  • Read chunks
  • Process incrementally
  • Emit records progressively
  • Keep memory nearly constant

That architectural difference matters far more than micro-optimizations.


Why I Built a Streaming Converter

I've been working on a project called convert-buddy-js, a Rust-based
streaming conversion engine compiled to WebAssembly and exposed as a
JavaScript library.

It supports:

  • XML
  • CSV
  • JSON
  • NDJSON

The core goal was simple:

Keep memory usage flat, even as file size grows.

Not "be the fastest library ever."
Just predictable. Stable. Bounded.


What Does "Low Memory" Actually Mean?

Here's an example from benchmarks converting XML → JSON.

| Scenario  | Tool            | File Size | Memory Usage |
|-----------|-----------------|-----------|--------------|
| xml-large | convert-buddy   | 38.41 MB  | ~0 MB change |
| xml-large | fast-xml-parser | 38.41 MB  | 377 MB       |

The difference is architectural.

The streaming engine processes elements incrementally instead of
constructing large intermediate structures.
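To illustrate what "incrementally" means (this toy is mine, not the library's parser): you can count elements across chunk boundaries without ever building a tree, as long as you carry a small tail between chunks in case a tag is split:

```javascript
// Toy illustration: count occurrences of an opening tag across chunks
// without building a DOM tree. Not the library's parser.
function makeElementCounter(tag) {
  const open = "<" + tag + ">";
  let carry = ""; // tail of the previous chunk, in case a tag was split
  let count = 0;
  return {
    push(chunk) {
      const text = carry + chunk;
      let i = 0;
      while ((i = text.indexOf(open, i)) !== -1) {
        count += 1;
        i += open.length;
      }
      // Keep a tail strictly shorter than the tag, so a split tag can still
      // complete on the next push without ever being counted twice.
      carry = text.slice(Math.max(0, text.length - (open.length - 1)));
    },
    total() {
      return count;
    },
  };
}

// Feed chunks that deliberately split a tag in half:
const counter = makeElementCounter("record");
["<reco", "rd>a</record><rec", "ord>b</record>"].forEach((c) => counter.push(c));
console.log(counter.total()); // 2
```

Per-chunk state here is a few bytes, no matter how large the document is. That is the essence of the constant-memory behavior in the table above.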


CSV → JSON Benchmarks

I benchmarked against:

  • PapaParse
  • csv-parse
  • fast-csv

Here's a representative neutral case (1.26 MB CSV):

| Tool          | Throughput |
|---------------|------------|
| convert-buddy | 75.96 MB/s |
| csv-parse     | 22.13 MB/s |
| PapaParse     | 19.57 MB/s |
| fast-csv      | 15.65 MB/s |

In favorable large cases (13.52 MB CSV):

| Tool          | Throughput |
|---------------|------------|
| convert-buddy | 91.88 MB/s |
| csv-parse     | 30.68 MB/s |
| PapaParse     | 24.69 MB/s |
| fast-csv      | 19.68 MB/s |

In most CSV scenarios tested, the streaming approach resulted in roughly
3x to 4x throughput improvements, with dramatically lower memory overhead.
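For context, a MB/s figure like the ones above is simply bytes processed divided by wall-clock time; a minimal sketch of that arithmetic (the helper name and timing numbers are mine):

```javascript
// How a throughput figure like "91.88 MB/s" is typically derived:
// bytes processed divided by elapsed wall-clock seconds.
function throughputMbPerSec(bytes, elapsedMs) {
  return bytes / (1024 * 1024) / (elapsedMs / 1000);
}

// A 13.52 MB file converted in 147 ms reports roughly 92 MB/s.
const mbps = throughputMbPerSec(13.52 * 1024 * 1024, 147);
console.log(mbps.toFixed(2)); // 91.97
```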


Where Streaming Isn't Always Faster

For tiny NDJSON files, native JSON parsing can be faster.

| Scenario    | Tool          | Throughput |
|-------------|---------------|------------|
| NDJSON tiny | Native JSON   | 27.10 MB/s |
| NDJSON tiny | convert-buddy | 10.81 MB/s |

That's expected.

When files are extremely small, the overhead of streaming infrastructure
can outweigh the benefits.
Native JSON.parse is heavily optimized in engines and extremely
efficient for small payloads.

The goal here isn't to replace native JSON for everything.

It's to handle realistic and large workloads predictably.
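For those tiny payloads, the simplest possible approach is entirely reasonable; a sketch (the helper name is mine):

```javascript
// For tiny NDJSON payloads, plain split + JSON.parse is hard to beat:
// the whole string already fits comfortably in memory.
function parseTinyNdjson(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

const records = parseTinyNdjson('{"a":1}\n{"a":2}\n');
console.log(records.length); // 2
```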


NDJSON → JSON Performance

For medium nested NDJSON datasets:

| Tool          | Throughput  |
|---------------|-------------|
| convert-buddy | 221.79 MB/s |
| Native JSON   | 136.84 MB/s |

That's where streaming and incremental transformation shine, especially
when the workload involves structured transformation rather than just
parsing.


What the Library Looks Like

Install:

```sh
npm install convert-buddy-js
```

Then:

```js
import { ConvertBuddy } from "convert-buddy-js";

const csv = 'name,age,city\nAlice,30,NYC\nBob,25,LA\nCarol,35,SF';

// Configure only what you need. Here we output NDJSON.
const buddy = new ConvertBuddy({ outputFormat: 'ndjson' });

// Stream conversion: records are emitted in batches.
const controller = buddy.stream(csv, {
  recordBatchSize: 2,

  // onRecords can be async: await inside it if you need to (I/O, UI updates, writes...)
  onRecords: async (ctrl, records, stats, total) => {
    console.log('Batch received:', records);

    // Simulate slow async work (writing, rendering, uploading, etc.)
    await new Promise(r => setTimeout(r, 50));

    // Report progress (ctrl.* is the most reliable live state)
    console.log(
      `Progress: ${ctrl.recordCount} records, ${stats.throughputMbPerSec.toFixed(2)} MB/s`
    );
  },

  onDone: (final) => console.log('Done:', final),

  // Enable profiling stats (throughput, latency, memory estimates, etc.)
  profile: true
});

// Optional: await final stats / completion
const final = await controller.done;
console.log('Final stats:', final);
```

It works in:

  • Node
  • Browser
  • Web Workers

Because the core engine is written in Rust and compiled to WebAssembly.


Why Rust + WebAssembly?

Not because it's trendy.

Because:

  • Predictable memory behavior
  • Strong streaming primitives
  • Deterministic performance
  • Easier control over allocations

WebAssembly allows that engine to run safely in the browser without
server uploads.


When This Tool Makes Sense

You probably don't need it if:

  • Files are always < 1MB
  • You're already happy with JSON.parse
  • You don't care about memory spikes

It makes sense if:

  • You process large CSV exports
  • You handle XML feeds
  • You work with NDJSON streams
  • You need conversion in the browser without uploads
  • You want predictable memory footprint

What I Learned Building It

  • Streaming is not just about speed: it's about stability.
  • Benchmarks should include the cases where you lose.
  • Native JSON.parse is hard to beat for tiny payloads.
  • Memory predictability matters more than peak throughput.

Closing Thoughts

There are many good parsing libraries in the JavaScript ecosystem.

PapaParse is mature.
csv-parse is robust.
Native JSON.parse is extremely optimized.

convert-buddy-js is simply an option focused on:

  • Streaming
  • Low memory usage
  • Format transformation
  • Large file handling

If that matches your constraints, it may be useful.

If not, the ecosystem already has excellent tools.

If you're curious, the full benchmarks and scenarios are available in
the repository:

  • convert-buddy-js on npm
  • brunohanss/convert-buddy on GitHub

And if you have workloads where streaming would make a difference, I’d be interested in feedback.
You can get more information or try the interactive browser playground here: https://convert-buddy.app/
