DEV Community

rouguemuse

How I streamed Gigabytes of Data directly to Disk using the browser's File System API

Generating massive datasets for database benchmarking used to require downloading C source code, installing a compiler, and running terminal commands. I wanted to build a web app that could generate standard 100-byte Sort Benchmark records instantly—with zero installation.

But I immediately hit a wall: how do you generate 10 gigabytes of data in a web browser without exhausting the user's RAM?

If you try to hold 10 GB of strings in a JavaScript array, the browser tab will run out of memory and crash long before generation finishes.

Here is how I solved it using the modern File System Access API.

The Solution: Direct-to-Disk Streaming

Instead of storing the generated text in memory, I built an engine that first asks the user for a save location using window.showSaveFilePicker().

Once the user picks a location, the returned file handle's createWritable() method provides a FileSystemWritableFileStream, and my engine goes to work. It generates the data in small 1-megabyte chunks, writes each chunk to the stream, and then drops its reference to the chunk so the memory can be reclaimed.
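Here is a minimal sketch of that loop in TypeScript. It is not the project's actual code: the names ChunkSink, streamToSink, makePlaceholderChunk, and saveGeneratedFile are hypothetical, the chunk contents are placeholder bytes, and the 1,000,000-byte chunk size is just one reasonable reading of "1 megabyte".

```typescript
// Hypothetical sketch, not the author's engine. Names and sizes are assumptions.

// The only two methods the loop needs. A FileSystemWritableFileStream has both.
interface ChunkSink {
  write(data: Uint8Array): Promise<void>;
  close(): Promise<void>;
}

const CHUNK_BYTES = 1000000; // about 1 MB, and a whole number of 100-byte records

// Placeholder generator: a real engine would fill the chunk with records.
function makePlaceholderChunk(size: number): Uint8Array {
  return new Uint8Array(size).fill(0x2e); // '.' bytes
}

// Write totalBytes to the sink, keeping only one chunk alive at a time.
async function streamToSink(
  sink: ChunkSink,
  totalBytes: number,
  makeChunk: (size: number) => Uint8Array = makePlaceholderChunk
): Promise<number> {
  let written = 0;
  while (written < totalBytes) {
    const size = Math.min(CHUNK_BYTES, totalBytes - written);
    // Waiting for each write to settle before building the next chunk
    // keeps chunks from piling up in memory.
    await sink.write(makeChunk(size));
    written += size;
  }
  // The data only shows up in the chosen file once the stream is closed.
  await sink.close();
  return written;
}

// Browser entry point. Call it from a user gesture such as a button click.
async function saveGeneratedFile(totalBytes: number): Promise<number> {
  // The picker is not available in every browser, so check before calling it.
  if (!("showSaveFilePicker" in globalThis)) {
    throw new Error("File System Access API is not available in this browser");
  }
  // Cast because showSaveFilePicker may be missing from the TypeScript DOM typings.
  const handle = await (globalThis as any).showSaveFilePicker({
    suggestedName: "records.dat", // hypothetical default file name
  });
  const writable: ChunkSink = await handle.createWritable();
  return streamToSink(writable, totalBytes);
}
```

Under these assumptions, calling saveGeneratedFile(10000000000) from a click handler would issue 10,000 writes of 1 MB each while holding only one chunk in memory at a time.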

The result? You can generate as much data as your disk can hold, and the generator's memory footprint stays at a few megabytes no matter how large the output file is.

Building the GenSort Engine

Using this architecture, I ported the logic of the well-known gensort benchmark tool to JavaScript. Now, anyone who needs to stress-test a database, build an ETL pipeline, or run a TeraSort benchmark can generate highly specific dummy data directly from a website.
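To give a feel for what a 100-byte record generator can look like, here is a rough TypeScript sketch. Treat all of it as an assumption: the layout is loosely modeled on my understanding of gensort's ASCII mode (10-byte key, record number, filler, CRLF), the key comes from Math.random() rather than gensort's own generator, and the filler is a fixed character. The output is therefore not byte-compatible with real gensort files, and buildRecord and buildChunk are hypothetical names.

```typescript
// Hypothetical record layout for illustration; NOT gensort's actual algorithm.
const RECORD_SIZE = 100; // bytes per record, as in the Sort Benchmark

// Build one record: key, two spaces, record number, two spaces, filler, CRLF.
function buildRecord(index: number): Uint8Array {
  const rec = new Uint8Array(RECORD_SIZE).fill(0x20); // start with all spaces

  // Bytes 0-9: key of random printable, non-space ASCII ('!' through '~').
  for (let i = 0; i < 10; i++) {
    rec[i] = 33 + Math.floor(Math.random() * 94);
  }

  // Bytes 12-43: record number as 32 zero-padded hex digits.
  const id = index.toString(16).toUpperCase().padStart(32, "0");
  for (let i = 0; i < 32; i++) {
    rec[12 + i] = id.charCodeAt(i);
  }

  // Bytes 46-97: 52 bytes of fixed filler ('F'), an assumption of this sketch.
  rec.fill(0x46, 46, 98);

  // Bytes 98-99: CRLF line ending.
  rec[98] = 0x0d;
  rec[99] = 0x0a;
  return rec;
}

// Pack `count` consecutive records into one buffer, ready for a single write.
function buildChunk(firstIndex: number, count: number): Uint8Array {
  const chunk = new Uint8Array(count * RECORD_SIZE);
  for (let r = 0; r < count; r++) {
    chunk.set(buildRecord(firstIndex + r), r * RECORD_SIZE);
  }
  return chunk;
}
```

With this layout, buildChunk(0, 10000) yields exactly 1,000,000 bytes, so a 1 MB chunk always ends on a record boundary.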

You don't need a backend server. You don't need AWS hosting. 100% of the computation and file writing happens locally on the user's machine.

Try it yourself

You can test the generation engine here: https://gen-sort-web-demo.vercel.app/

If you want to look at how I structured the streaming API, I open-sourced the UI and the demo generator here: https://github.com/rouguemuse/GenSort-web-demo
And if you want to go beyond the demo limits and use the full, unrestricted source code for your own large-scale benchmarking, you can grab the Pro Source Code here: https://contextmuse.lemonsqueezy.com/checkout/buy/93872964-d70e-4051-accb-3335206379d0
