I took on the 1 Billion Row Challenge (1BRC), but with Node.js: the "event loop" runtime that is rarely the first choice for crunching raw numbers.
TL;DR

Stage | Time
--- | ---
Baseline | 12:06
Final | 0:35
Speedup | ~20x
I squeezed sizeable performance gains out of Node with buffer math, manual parsing, worker threads, and even byte-level micro-ops.
The Challenge
You're given a file with 1 billion lines, each shaped like:
`StationName;Temperature\n`
For example: `Hamburg;12.3`
You need to compute per station:
- Minimum temperature
- Maximum temperature
- Average temperature
And you need to do it fast.
System Configuration & Setup

Here are the system details I used for benchmarking and optimization:
- Machine: Windows 11 (x64)
- CPU: Intel i5 (12th Gen), 2.5 GHz, 10 cores / 12 logical threads
- RAM: 8 GB
- Disk: 256GB NVMe SSD
- Node.js version: v22.x (LTS)
Just Reading the File (No Work)
To measure the disk I/O floor, I timed a pure file read β no parsing or processing.
⏱️ 13 seconds

This became my physical lower limit, and my target. Here's the minimal benchmark:
```js
const fs = require("fs");

const filePath = process.argv[2];
const bufferSize = 64 * 1024; // 64 KB
const buffer = Buffer.alloc(bufferSize);

const fd = fs.openSync(filePath, "r");
let totalBytesRead = 0;
let bytesRead = 0;

console.time("diskRead");
do {
  bytesRead = fs.readSync(fd, buffer, 0, bufferSize, null);
  totalBytesRead += bytesRead;
} while (bytesRead > 0);
console.timeEnd("diskRead");

fs.closeSync(fd);
console.log(`Read ${totalBytesRead} bytes`);
console.log(`Read ${(totalBytesRead / (1024 * 1024)).toFixed(2)} MB`);
```
Starting Baseline

The first working version in the repo:

- Single thread
- Used `readline` with `for await...of`
- Parsed strings with `split(';')`, used `parseFloat` and `.toFixed()`
- Used a `Map` for aggregation

⏱️ 12:06
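For reference, the baseline looked roughly like this (a reconstruction of the approach, not the exact repo code):

```js
const fs = require("fs");
const readline = require("readline");

const run = async (filePath) => {
  const stats = new Map(); // station -> { min, max, sum, count }

  const rl = readline.createInterface({
    input: fs.createReadStream(filePath),
    crlfDelay: Infinity,
  });

  for await (const line of rl) {
    const [station, tempStr] = line.split(";");
    const temp = parseFloat(tempStr);

    const entry = stats.get(station);
    if (entry === undefined) {
      stats.set(station, { min: temp, max: temp, sum: temp, count: 1 });
    } else {
      entry.min = Math.min(entry.min, temp);
      entry.max = Math.max(entry.max, temp);
      entry.sum += temp;
      entry.count++;
    }
  }

  for (const [station, { min, max, sum, count }] of stats) {
    console.log(`${station}=${min.toFixed(1)}/${(sum / count).toFixed(1)}/${max.toFixed(1)}`);
  }
};

run(process.argv[2]);
```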
Let's optimize.
Parallelism: low-hanging fruit

1 billion CPU-bound calculations on a multi-core machine? Worker threads were the first upgrade.

The trick was to divide the file into equal slices while keeping every line intact. This was done with the method below.
```js
// Read a small window at the tentative slice boundary and find the next newline,
// so every worker's slice starts and ends on a complete line
const calculateOffset = async (start, end, fileHandle) => {
  const { buffer } = await fileHandle.read({
    buffer: Buffer.alloc(MAX_LINE_LENGTH),
    length: MAX_LINE_LENGTH,
    position: end,
  });
  const diff = buffer.indexOf(10); // 10 = '\n'
  return { start, end, diff };
};
```
Then each worker got its own slice to process (sketched below).
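Here is roughly how that fan-out can be wired up (a sketch with my own naming: it reuses `calculateOffset` from above, and `worker.js` stands in for a hypothetical worker script that aggregates its byte range and posts a partial result back):

```js
const { Worker } = require("worker_threads");
const fsPromises = require("fs/promises");

const WORKER_COUNT = 12;

const main = async (filePath) => {
  const fileHandle = await fsPromises.open(filePath, "r");
  const { size } = await fileHandle.stat();
  const sliceSize = Math.floor(size / WORKER_COUNT);

  // Build line-aligned [start, end) ranges using calculateOffset() from above
  const ranges = [];
  let start = 0;
  for (let i = 0; i < WORKER_COUNT; i++) {
    if (i === WORKER_COUNT - 1) {
      ranges.push({ start, end: size });
      break;
    }
    const { diff } = await calculateOffset(start, (i + 1) * sliceSize, fileHandle);
    const end = (i + 1) * sliceSize + diff + 1; // keep the newline inside this slice
    ranges.push({ start, end });
    start = end;
  }
  await fileHandle.close();

  // Each worker aggregates its own byte range and posts back a partial result
  const partials = await Promise.all(
    ranges.map(
      (range) =>
        new Promise((resolve, reject) => {
          const worker = new Worker("./worker.js", { workerData: { filePath, ...range } });
          worker.once("message", resolve);
          worker.once("error", reject);
        })
    )
  );
  // ...merge the partial { min, max, sum, count } maps and print the result
};

main(process.argv[2]);
```

Merging the partials is then just a cheap min/max/sum/count fold on the main thread.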
Workers | Time
--- | ---
4 workers | 2:35
12 workers (my logical core count) | 1:41
14 or 18 workers | performance degraded
Learning: since the data comes from local disk (there is no slow I/O to hide behind), adding threads beyond the logical core count only adds context-switching overhead. And worker threads make a HUGE difference.
Loop optimizations
Once threading was established, I started chipping away at the main loop:
- Switched from the readline async iterator to the `'line'` event - ⏱️ 1:19

  ```js
  // before
  for await (const line of lineStream) { /* ... */ }

  // after
  lineStream.on("line", (line) => { /* ... */ });
  ```

- Switched to manual byte stream parsing with `readStream.on('data')` - ⏱️ 1:02

  Here I started processing raw byte chunks in a `processChunk` method, while still leaning on built-ins like `toString` and `parseFloat`.

- Improvements that gave minor gains - ⏱️ 1:00
  - Replaced repetitive `chunk[i]` lookups with `const c = chunk[i]`
  - Removed `.toFixed()`
At this point, I started searching for which operations were eating the most time and resources. I searched the web and tried to make sense of flame graphs (which I finally got working).

The consistent answer: string processing and float arithmetic.
- Replaced `parseFloat` with integer math - ⏱️ 0:35

I replaced the float parsing with byte-based parsing that keeps every temperature as an integer scaled to tenths of a degree:
```js
const parseBufferToDigit = (byte) => byte - 0x30; // ASCII digit -> integer

const parseNumber = (length) => {
  // `number` is a scratch Buffer holding the raw temperature bytes of the current line
  if (number[0] === 0x2d) {
    // negative: "-12.3" (length 5) or "-1.2" (length 4)
    return length === 5
      ? -(parseBufferToDigit(number[1]) * 100 +
          parseBufferToDigit(number[2]) * 10 +
          parseBufferToDigit(number[4]))
      : -(parseBufferToDigit(number[1]) * 10 +
          parseBufferToDigit(number[3]));
  } else {
    // positive: "1.2" (length 3) or "12.3" (length 4)
    return length === 3
      ? parseBufferToDigit(number[0]) * 10 +
          parseBufferToDigit(number[2])
      : parseBufferToDigit(number[0]) * 100 +
          parseBufferToDigit(number[1]) * 10 +
          parseBufferToDigit(number[3]);
  }
};
```
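Putting the pieces together, the per-chunk hot path ends up looking something like this (an illustrative sketch, not the exact repo code: it assumes the `parseNumber` function above, and the leftover handling and names are my own simplification):

```js
const fs = require("fs");

const SEMICOLON = 0x3b;
const NEWLINE = 0x0a;

const number = Buffer.alloc(8); // scratch buffer that parseNumber() above reads from
const stats = new Map(); // station -> { min, max, sum, count }, all in tenths of a degree
let leftover = Buffer.alloc(0); // partial line carried over from the previous chunk

const processChunk = (data) => {
  const chunk = leftover.length > 0 ? Buffer.concat([leftover, data]) : data;

  let lineStart = 0;
  let semiIndex = -1;
  for (let i = 0; i < chunk.length; i++) {
    const c = chunk[i];
    if (c === SEMICOLON) {
      semiIndex = i;
    } else if (c === NEWLINE) {
      const station = chunk.toString("utf8", lineStart, semiIndex);
      chunk.copy(number, 0, semiIndex + 1, i); // fill the scratch buffer for parseNumber
      const temp = parseNumber(i - semiIndex - 1);

      const entry = stats.get(station);
      if (entry === undefined) {
        stats.set(station, { min: temp, max: temp, sum: temp, count: 1 });
      } else {
        entry.min = Math.min(entry.min, temp);
        entry.max = Math.max(entry.max, temp);
        entry.sum += temp;
        entry.count++;
      }
      lineStart = i + 1;
    }
  }
  leftover = chunk.subarray(lineStart); // assumes the file ends with a newline
};

fs.createReadStream(process.argv[2]).on("data", processChunk);
```

Each worker keeps everything in integer tenths and only converts back to degrees (and divides by `count`) when it reports its partial min/mean/max.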
❌ What Didn't Work

I tried some other things that failed (but are worth noting):

- ❌ Hashed station names: collisions or performance, pick one. A fast function that concatenated/hashed the ASCII codes was much faster than `.toString()` (0:19), but produced a large number of collisions, and a wider hash absolutely hammered performance. No straightforward win here (rough sketch after this list).
- ❌ `Float32Array` / `Int32Array` for aggregation instead of objects: interestingly, this degraded performance
- ❌ Using more than 12 threads (my logical core count): the overhead cancelled out the benefit
- ❌ Replacing `Math.min`/`Math.max` with a ternary comparison: lost ~2s
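For the curious, the hash experiment was roughly this shape (illustrative only, not the code from the repo): fold the station-name bytes into a single 32-bit integer and use that as the Map key instead of a string.

```js
// Fold station-name bytes into one 32-bit integer key; fast, but two different
// names can land on the same key and silently merge their stats
const hashStation = (chunk, start, end) => {
  let h = 0;
  for (let i = start; i < end; i++) {
    h = (h * 31 + chunk[i]) | 0; // "| 0" keeps the value in 32-bit integer range
  }
  return h;
};
```

The trade-off is exactly what the timings showed: the narrow key is quick but collides, and making the key wide enough to avoid collisions erased the speed win.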
Windows + WSL Gotchas

Some (not so) fun roadblocks along the way:

- The data generator script failed with Java 24; it specifically needed Java 21
- The Maven build failed: I used the `-Dlicense.skip` flag to bypass the license plugin
- I switched to WSL for the scripts and `time`, but the cross-filesystem path performance was horrible: the sync file read took 22s from CMD vs 4 minutes from WSL. My 12-worker benchmark ran a full minute slower until I figured this out, so I switched to PowerShell and used `Measure-Command {}` instead
- The `clinic` profiler DID NOT work in PowerShell, so I switched to CMD. This one took a looooong time. I ended up relying on a single `console.time`, which worked consistently
Acknowledgments

This project draws inspiration from Edgar-P-Yan's excellent 1BRC repo, and some parsing techniques were adapted from his implementation.

This was an independent learning project, and I did not submit it to the official 1BRC leaderboard.
GitHub

Code & scripts:

- GitHub Repo
- 1BRC Node Repo
Final Thoughts

This wasn't just about speeding up Node.js (ok, maybe it was); it was also about discovering what makes it tick.

- CPU-bound work? Use `worker_threads`
- Strings are expensive: avoid them until the last possible moment
- Floats are expensive: keep them out of the hot path and convert later
- Every byte and cycle matters (literally)

I walked away with crashes, a laptop trying to take off, and deep satisfaction.
Yet I feel there is still more to be done here. I will keep trying to push these numbers, so feel free to share insights, ideas, or observations.
Thanks for reading!