πŸ”₯ 1BRC in Node.js: From 12 Minutes to 35 Seconds

I took on the 1 Billion Row Challenge (1BRC), but with Node.js: the "event loop" language that's hardly anyone's first choice for crunching raw numbers.

🧠 TL;DR

| Stage | Time |
| --- | --- |
| 🐒 Baseline | 12:06 |
| πŸ”₯ Final | 0:35 |
| πŸ“ˆ Speedup | 1957% |

I squeezed sizeable performance gains out of Node with buffer math, manual parsing, worker threads, and even byte-level micro-ops.


🎯 The Challenge

You're given a file with 1 billion lines, each like:

```
StationName;Temperature\n
```
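
For example, lines look like this (station names taken from the official sample data):

```
Hamburg;12.0
Bulawayo;8.9
Palembang;38.8
Hamburg;34.2
```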

You need to compute per station:

  • Minimum temperature
  • Maximum temperature
  • Average temperature

And you need to do it fast.


βš™οΈ System Configuration & Setup

Here are the details of the system used for benchmarking and optimization:

  • Machine: Windows 11 (x64)
  • CPU: Intel i5 (12th Gen), 2.5 GHz, 10 cores, 12 logical threads
  • RAM: 8 GB
  • Disk: 256GB NVMe SSD
  • Node.js version: v22.x (LTS)

β›½ Just Reading the File (No Work)

To measure the disk I/O floor, I timed a pure file read β€” no parsing or processing.

⏱️ 13 seconds

This was my physical lower limit, and therefore my target. Here's the minimal benchmark:

```js
const fs = require("fs");

const filePath = process.argv[2];
const bufferSize = 64 * 1024; // 64 KB read buffer
const buffer = Buffer.alloc(bufferSize);

const fd = fs.openSync(filePath, "r");

let totalBytesRead = 0;
let bytesRead = 0;

console.time("diskRead");

// Sequential reads: position null lets the kernel advance the file offset
do {
  bytesRead = fs.readSync(fd, buffer, 0, bufferSize, null);
  totalBytesRead += bytesRead;
} while (bytesRead > 0);

console.timeEnd("diskRead");

fs.closeSync(fd);

console.log(`Read ${totalBytesRead} bytes`);
console.log(`Total: ${(totalBytesRead / (1024 * 1024)).toFixed(2)} MB`);
```
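
Assuming the script is saved as read-bench.js (a name I've picked for illustration), it runs as `node read-bench.js measurements.txt`.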

πŸ§ͺ Starting Baseline

The first working version, as provided in the repo:

  • Single thread
  • Used readline with for...await
  • Parsed strings with split(';'), used parseFloat & .toFixed()
  • Used Map for aggregation
⏱️ 12:06
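
For reference, here's a minimal sketch of what that baseline looks like (my reconstruction, not the exact repo code):

```js
const fs = require("fs");
const readline = require("readline");

const main = async () => {
  const rl = readline.createInterface({
    input: fs.createReadStream(process.argv[2]),
    crlfDelay: Infinity,
  });

  const stations = new Map();

  // One string allocation + one float parse per line: the slow path
  for await (const line of rl) {
    const [name, tempStr] = line.split(";");
    const temp = parseFloat(tempStr);
    const entry = stations.get(name);
    if (!entry) {
      stations.set(name, { min: temp, max: temp, sum: temp, count: 1 });
    } else {
      entry.min = Math.min(entry.min, temp);
      entry.max = Math.max(entry.max, temp);
      entry.sum += temp;
      entry.count++;
    }
  }

  // Emit min/avg/max per station
  for (const [name, e] of stations) {
    console.log(
      `${name}=${e.min.toFixed(1)}/${(e.sum / e.count).toFixed(1)}/${e.max.toFixed(1)}`
    );
  }
};

main();
```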

Let’s optimize.


🧡 Parallelism β€” Low-hanging fruit

A billion CPU-bound calculations on a multi-core machine? Worker threads were the first upgrade.

The trick here was to divide the file into equal chunks while keeping lines intact. The method below handles this by probing forward from a tentative boundary to the next newline.

```js
// MAX_LINE_LENGTH: a constant (defined elsewhere) bounding a line's byte length
const calculateOffset = async (start, end, fileHandle) => {
  // Read a small window starting at the tentative boundary...
  const { buffer } = await fileHandle.read({
    buffer: Buffer.alloc(MAX_LINE_LENGTH),
    length: MAX_LINE_LENGTH,
    position: end,
  });
  // ...and find how far past `end` the next newline (byte 10) sits
  const diff = buffer.indexOf(10);
  return { start, end, diff };
};
```

Each worker then got its own slice to process. πŸš€

| Workers | Time |
| --- | --- |
| 4 workers | 2:35 |
| 12 workers (my logical core count) | 1:41 |
| 14 or 18 workers | perf degradation |

🧠 Learning: worker threads make a HUGE difference. But since the data comes from fast local disk, the workload is CPU-bound, and pushing past the logical core count just adds context-switching overhead.
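
For completeness, here's a minimal sketch of the fan-out, assuming a worker.js script (my naming) that aggregates its [start, end) slice and posts back a partial result:

```js
const { Worker } = require("worker_threads");
const fsp = require("fs/promises");

const run = async (filePath, workerCount) => {
  const fileHandle = await fsp.open(filePath, "r");
  const { size } = await fileHandle.stat();
  const chunkSize = Math.floor(size / workerCount);

  // Snap each tentative boundary forward to the next newline
  // so no worker starts or ends mid-line
  const ranges = [];
  let start = 0;
  for (let i = 0; i < workerCount; i++) {
    let end = i === workerCount - 1 ? size : (i + 1) * chunkSize;
    if (end < size) {
      const { diff } = await calculateOffset(start, end, fileHandle);
      end += diff + 1; // move past the newline byte
    }
    ranges.push({ start, end });
    start = end;
  }
  await fileHandle.close();

  // Fan out: each worker aggregates its own slice independently
  const partials = await Promise.all(
    ranges.map(
      (range) =>
        new Promise((resolve, reject) => {
          const worker = new Worker("./worker.js", {
            workerData: { filePath, ...range },
          });
          worker.on("message", resolve);
          worker.on("error", reject);
        })
    )
  );

  // partials holds one per-station map per worker, ready to merge
  return partials;
};
```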


⚑ Loop optimizations

Once threading was established, I started chipping away at the main loop:

  • Switched from the readline async iterator to the 'line' event - ⏱️ 1:19

```js
// before
for await (const line of lineStream) { /* ... */ }
// after
lineStream.on("line", (line) => { /* ... */ });
```
  • Switched to manual byte-stream parsing with readStream.on('data') - ⏱️ 1:02

Here I started processing raw byte chunks in a processChunk method, while still leaning on costly built-ins like toString and parseFloat (see the full chunk-scan sketch after the parsing code below).

  • Improvements that gave minor gains - ⏱️1:00
    • Replaced repetitive chunk[i] reference with const c = chunk[i]
    • Removed .toFixed()

At this point, I went looking for the processing that took the most time and resources. I searched the web and tried to make sense of flame graphs (which, at this point, I had finally got working).

The consistent answer: string processing and float arithmetic.

  • Replaced parseFloat with integer math - ⏱️ 0:35 βœ…

I replaced it with byte-based parsing that yields an integer scaled to tenths of a degree:

```js
// `number` is a shared scratch buffer holding just the temperature bytes
const parseBufferToDigit = (byte) => byte - 0x30; // ASCII digit -> value

// Returns the temperature in tenths of a degree (e.g. "-12.3" -> -123)
const parseNumber = (length) => {
  if (number[0] === 0x2d) { // leading '-': negative
    return length === 5
      ? -(parseBufferToDigit(number[1]) * 100 + // "-dd.d"
          parseBufferToDigit(number[2]) * 10 +
          parseBufferToDigit(number[4]))
      : -(parseBufferToDigit(number[1]) * 10 +  // "-d.d"
          parseBufferToDigit(number[3]));
  } else {
    return length === 3
      ? (parseBufferToDigit(number[0]) * 10 +   // "d.d"
         parseBufferToDigit(number[2]))
      : (parseBufferToDigit(number[0]) * 100 +  // "dd.d"
         parseBufferToDigit(number[1]) * 10 +
         parseBufferToDigit(number[3]));
  }
};
```
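
To show where parseNumber plugs in, here's a condensed sketch of the byte-level chunk scan (simplified relative to the repo's processChunk; a real version must also carry a trailing partial line over to the next chunk):

```js
const SEMICOLON = 0x3b;
const NEWLINE = 0x0a;

const number = Buffer.alloc(8); // shared scratch buffer for temperature bytes

const processChunk = (chunk, stations) => {
  let lineStart = 0;
  let semiIndex = -1;

  for (let i = 0; i < chunk.length; i++) {
    const c = chunk[i];
    if (c === SEMICOLON) {
      semiIndex = i;
    } else if (c === NEWLINE) {
      // Station name bytes live in [lineStart, semiIndex)
      const name = chunk.toString("utf8", lineStart, semiIndex);

      // Copy the temperature bytes into the scratch buffer, parse as tenths
      const length = i - semiIndex - 1;
      chunk.copy(number, 0, semiIndex + 1, i);
      const temp = parseNumber(length);

      const entry = stations.get(name);
      if (!entry) {
        stations.set(name, { min: temp, max: temp, sum: temp, count: 1 });
      } else {
        entry.min = Math.min(entry.min, temp);
        entry.max = Math.max(entry.max, temp);
        entry.sum += temp;
        entry.count++;
      }
      lineStart = i + 1;
    }
  }
  // NOTE: bytes after the last newline belong to the next chunk's first line
};
```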

❌ What Didn't Work

I tried some other things that failed (but are worth noting):

  • β›” Hashed station names: collisions vs. performance. A fast function that accumulated the ASCII codes was much faster than .toString() (0:19), but produced a large number of collisions, and a wider hash absolutely hammered performance. No straightforward solution found (see the sketch after this list).
  • β›” Float32Array / Int32Array for aggregation instead of objects: interestingly, this degraded performance
  • β›” Using more than 12 threads (my logical core count): overhead cancelled the benefit
  • β›” Replacing Math.min/max with a ternary comparison: lost ~2s
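
To illustrate the hashing dead end, this is the style of cheap ASCII-accumulating hash I'm talking about (illustrative, not my exact function):

```js
// Summing the name's byte values avoids any string allocation,
// which is why it's so fast, but it collides trivially:
// e.g. "abc" and "cba" map to the same key.
const hashName = (chunk, start, end) => {
  let hash = 0;
  for (let i = start; i < end; i++) {
    hash += chunk[i];
  }
  return hash;
};
```

Mixing in positional weights reduces collisions, but as noted above, the wider the hash got, the worse the lookup performance became.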

πŸ’» Windows + WSL Gotchas

(not so) Fun roadblocks along the way:

  • The generator script failed on Java 24; it specifically needed Java 21
  • The Maven build failed: the -Dlicense.skip flag bypassed the license plugin
  • Switched to WSL for the scripts and time: accessing the Windows filesystem from WSL was horribly slow. The sync file read took 22s in CMD vs. 4 minutes in WSL 😡

The 12-thread benchmark ran a full minute slower until I figured this out. I then switched to PowerShell and timed runs with Measure-Command {}.

  • The clinic profiler DID NOT work in PowerShell, so I switched to CMD. This one took a looooong time to figure out. I ended up relying on a single console.time, which worked consistently.

πŸ“’ Acknowledgments

πŸ” This project draws inspiration from Edgar-P-Yan’s excellent 1BRC repo.
Some parsing techniques were adapted from his implementation.
This was an independent learning project, and I did not submit this to the official 1brc leaderboard.


πŸ”— GitHub

πŸ“¦ Code & scripts:

πŸ‘‰ GitHub Repo
🏎️ 1BRC Node Repo


🧡 Final Thoughts

This wasn't just about speeding up Node.js (ok, maybe it was); it was also about discovering what makes it tick.

  • CPU-bound work? Use worker_threads
  • Strings are expensive β€” avoid until the last moment
  • Floats are expensive β€” avoid in hot path, convert later
  • Every byte and cycle matters (literally)

I walked away with crashes, a laptop trying to take off 🚁, and a deep sense of satisfaction.

Yet I feel there is still more to be done here. I'll keep trying to push these numbers, so feel free to share insights, ideas, or observations.

Thanks for reading!
