Most of us have written something like this at some point:
```javascript
const data = JSON.parse(hugeString);
```
It works.
Until it doesn't.
At some point the file grows.
50 MB. 200 MB. 1 GB. 5 GB.
And suddenly:
- The tab freezes (in the browser)
- Memory spikes
- The process crashes
- Or worse: everything technically "works" but becomes unusable
This isn't a JavaScript problem.
It's a buffering problem.
## The Real Issue: Buffering vs Streaming
Most parsing libraries operate in buffer mode:
- Read the entire file into memory
- Parse it completely
- Return the result
That means memory usage scales with file size.
Streaming flips the model:
- Read chunks
- Process incrementally
- Emit records progressively
- Keep memory nearly constant
That architectural difference matters far more than micro-optimizations.
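The contrast can be sketched in a few lines of plain JavaScript. This is a simplified illustration using NDJSON for brevity, not the library's internals: the buffered version holds the whole payload before any record is available, while the streaming version only ever holds one chunk plus one partial line.

```javascript
// Buffered: the entire payload sits in memory before any record is usable.
const buffered = (ndjson) => ndjson.trim().split("\n").map(JSON.parse);

// Streaming sketch: consume chunks, emit records as soon as a full line arrives.
// Memory held at any moment is one chunk plus one partial line.
function* streamRecords(chunks) {
  let partial = "";
  for (const chunk of chunks) {
    const lines = (partial + chunk).split("\n");
    partial = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) {
      if (line.trim()) yield JSON.parse(line);
    }
  }
  if (partial.trim()) yield JSON.parse(partial);
}

// A record split across chunk boundaries is still parsed correctly.
const chunks = ['{"id":1}\n{"i', 'd":2}\n{"id":3}'];
const records = [...streamRecords(chunks)];
console.log(records.length); // 3
```

Notice that the chunk boundary falls in the middle of a record and the streaming version still handles it; that boundary bookkeeping is the essence of any streaming parser.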
## Why I Built a Streaming Converter
I've been working on a project called convert-buddy-js, a Rust-based
streaming conversion engine compiled to WebAssembly and exposed as a
JavaScript library.
It supports:
- XML
- CSV
- JSON
- NDJSON
The core goal was simple:
Keep memory usage flat, even as file size grows.
Not "be the fastest library ever."
Just predictable. Stable. Bounded.
## What Does "Low Memory" Actually Mean?
Here's an example from benchmarks converting XML → JSON.
| Scenario | Tool | File Size | Memory Usage |
|---|---|---|---|
| xml-large | convert-buddy | 38.41 MB | ~0 MB change |
| xml-large | fast-xml-parser | 38.41 MB | 377 MB |
The difference is architectural.
The streaming engine processes elements incrementally instead of
constructing large intermediate structures.
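A toy version of that idea, using hypothetical `<row>` elements rather than the actual Rust parser: each element is emitted as soon as its closing tag arrives, and only the current unfinished element is kept in the buffer.

```javascript
// Toy incremental XML scanner (illustration only, not the real parser):
// emit each <row>…</row> as it completes instead of building a full DOM.
function streamXmlRows(chunks, onRow) {
  let buf = "";
  for (const chunk of chunks) {
    buf += chunk;
    let end;
    while ((end = buf.indexOf("</row>")) !== -1) {
      const start = buf.indexOf("<row>");
      onRow(buf.slice(start + 5, end)); // hand off the element's content
      buf = buf.slice(end + 6); // drop the consumed element from memory
    }
  }
}

const rows = [];
// The second <row> is split across chunks; only "<ro" is ever buffered.
streamXmlRows(["<rows><row>a</row><ro", "w>b</row></rows>"], (r) => rows.push(r));
console.log(rows); // ["a", "b"]
```

A real parser must handle attributes, nesting, and entities, but the memory story is the same: the buffer size tracks the largest single element, not the file.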
## CSV → JSON Benchmarks
I benchmarked against:
- PapaParse
- csv-parse
- fast-csv
Here's a representative neutral case (1.26 MB CSV):
| Tool | Throughput |
|---|---|
| convert-buddy | 75.96 MB/s |
| csv-parse | 22.13 MB/s |
| PapaParse | 19.57 MB/s |
| fast-csv | 15.65 MB/s |
In favorable large cases (13.52 MB CSV):
| Tool | Throughput |
|---|---|
| convert-buddy | 91.88 MB/s |
| csv-parse | 30.68 MB/s |
| PapaParse | 24.69 MB/s |
| fast-csv | 19.68 MB/s |
In most CSV scenarios tested, the streaming approach delivered roughly 3x to 4x higher throughput, with dramatically lower memory overhead.
## Where Streaming Isn't Always Faster
For tiny NDJSON files, native JSON parsing can be faster.
| Scenario | Tool | Throughput |
|---|---|---|
| NDJSON tiny | Native JSON | 27.10 MB/s |
| NDJSON tiny | convert-buddy | 10.81 MB/s |
That's expected.
When files are extremely small, the overhead of streaming infrastructure can outweigh its benefits.
Native JSON.parse is heavily optimized in engines and extremely
efficient for small payloads.
The goal here isn't to replace native JSON for everything.
It's to handle realistic and large workloads predictably.
## NDJSON → JSON Performance
For medium nested NDJSON datasets:
| Tool | Throughput |
|---|---|
| convert-buddy | 221.79 MB/s |
| Native JSON | 136.84 MB/s |
That's where streaming and incremental transformation shine, especially when the workload involves structured transformation rather than just parsing.
## What the Library Looks Like
Install:
```bash
npm install convert-buddy-js
```
Then:

```javascript
import { ConvertBuddy } from "convert-buddy-js";

const csv = 'name,age,city\nAlice,30,NYC\nBob,25,LA\nCarol,35,SF';

// Configure only what you need. Here we output NDJSON.
const buddy = new ConvertBuddy({ outputFormat: 'ndjson' });

// Stream conversion: records are emitted in batches.
const controller = buddy.stream(csv, {
  recordBatchSize: 2,
  // onRecords can be async: await inside it if you need to (I/O, UI updates, writes...)
  onRecords: async (ctrl, records, stats, total) => {
    console.log('Batch received:', records);

    // Simulate slow async work (writing, rendering, uploading, etc.)
    await new Promise(r => setTimeout(r, 50));

    // Report progress (ctrl.* is the most reliable live state)
    console.log(
      `Progress: ${ctrl.recordCount} records, ${stats.throughputMbPerSec.toFixed(2)} MB/s`
    );
  },
  onDone: (final) => console.log('Done:', final),
  // Enable profiling stats (throughput, latency, memory estimates, etc.)
  profile: true
});

// Optional: await final stats / completion
const final = await controller.done;
console.log('Final stats:', final);
```
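The async `onRecords` callback above implies backpressure: the engine can wait for your handler before emitting the next batch, so a slow consumer throttles the pipeline instead of letting batches pile up in memory. A minimal standalone sketch of that pattern (generic code, not the library's implementation):

```javascript
// Generic batched emission with backpressure (illustration, not library code):
// the producer awaits the consumer before producing more.
async function emitInBatches(records, batchSize, onBatch) {
  for (let i = 0; i < records.length; i += batchSize) {
    await onBatch(records.slice(i, i + batchSize)); // pause until consumed
  }
}

const seen = [];
emitInBatches([1, 2, 3, 4, 5], 2, async (batch) => {
  seen.push(batch);
  await new Promise((r) => setTimeout(r, 10)); // simulate slow async work
}).then(() => console.log(seen)); // [[1, 2], [3, 4], [5]]
```

The same shape works whether the batches go to a file write, a network upload, or a UI render.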
It works in:
- Node
- Browser
- Web Workers
Because the core engine is written in Rust and compiled to WebAssembly.
## Why Rust + WebAssembly?
Not because it's trendy.
Because:
- Predictable memory behavior
- Strong streaming primitives
- Deterministic performance
- Easier control over allocations
WebAssembly allows that engine to run safely in the browser without
server uploads.
## When This Tool Makes Sense
You probably don't need it if:
- Files are always < 1MB
- You're already happy with JSON.parse
- You don't care about memory spikes
It makes sense if:
- You process large CSV exports
- You handle XML feeds
- You work with NDJSON streams
- You need conversion in the browser without uploads
- You want predictable memory footprint
## What I Learned Building It
- Streaming is not just about speed; it's about stability.
- Benchmarks should include losses.
- Native JSON.parse is hard to beat for tiny payloads.
- Memory predictability matters more than peak throughput.
## Closing Thoughts
There are many good parsing libraries in the JavaScript ecosystem.
PapaParse is mature.
csv-parse is robust.
Native JSON.parse is extremely optimized.
convert-buddy-js is simply an option focused on:
- Streaming
- Low memory usage
- Format transformation
- Large file handling
If that matches your constraints, it may be useful.
If not, the ecosystem already has excellent tools.
If you're curious, the full benchmarks and scenarios are available in
the repository.
- convert-buddy-js on npm
- brunohanss/convert-buddy on GitHub
And if you have workloads where streaming would make a difference, I’d be interested in feedback.
You can get more information or try the interactive browser playground here: https://convert-buddy.app/