I took on the 1 Billion Row Challenge (1BRC), but with Node.js: the "event loop" runtime that is rarely the first choice for crunching raw numbers.
TL;DR

Stage | Time
--- | ---
Baseline | 12:06
Final | 0:35
Speedup | ~20x
I squeezed sizeable performance gains out of Node with buffer math, manual parsing, worker threads, and even byte-level micro-ops.
The Challenge
You're given a file with 1 billion lines, each shaped like:
`StationName;Temperature\n`
For example: `Hamburg;12.3`
You need to compute per station:
- Minimum temperature
- Maximum temperature
- Average temperature
And you need to do it fast.
System Configuration & Setup

Here are the system details I used for benchmarking and optimization:
- Machine: Windows 11 (x64)
- CPU: Intel i5 (12th Gen), 2.5 GHz, 10 cores / 12 logical threads
- RAM: 8 GB
- Disk: 256GB NVMe SSD
- Node.js version: v22.x (LTS)
Just Reading the File (No Work)
To measure the disk I/O floor, I timed a pure file read β no parsing or processing.
⏱️ 13 seconds

This became my physical lower limit, and my target. Here's the minimal benchmark:
```js
const fs = require("fs");

const filePath = process.argv[2];
const bufferSize = 64 * 1024; // 64 KB
const buffer = Buffer.alloc(bufferSize);

const fd = fs.openSync(filePath, "r");
let totalBytesRead = 0;
let bytesRead = 0;

console.time("diskRead");
do {
  bytesRead = fs.readSync(fd, buffer, 0, bufferSize, null);
  totalBytesRead += bytesRead;
} while (bytesRead > 0);
console.timeEnd("diskRead");

fs.closeSync(fd);
console.log(`Read ${totalBytesRead} bytes`);
console.log(`Read ${(totalBytesRead / (1024 * 1024)).toFixed(2)} MB`);
```
Starting Baseline

The first working version in the repo:

- Single thread
- Used `readline` with `for await...of`
- Parsed strings with `split(';')`, used `parseFloat` and `.toFixed()`
- Used a `Map` for aggregation

⏱️ 12:06
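For reference, the baseline looked roughly like this (a reconstruction of the approach, not the exact repo code):

```js
const fs = require("fs");
const readline = require("readline");

const run = async (filePath) => {
  const stats = new Map(); // station -> { min, max, sum, count }

  const rl = readline.createInterface({
    input: fs.createReadStream(filePath),
    crlfDelay: Infinity,
  });

  for await (const line of rl) {
    const [station, tempStr] = line.split(";");
    const temp = parseFloat(tempStr);

    const entry = stats.get(station);
    if (entry === undefined) {
      stats.set(station, { min: temp, max: temp, sum: temp, count: 1 });
    } else {
      entry.min = Math.min(entry.min, temp);
      entry.max = Math.max(entry.max, temp);
      entry.sum += temp;
      entry.count++;
    }
  }

  for (const [station, { min, max, sum, count }] of stats) {
    console.log(`${station}=${min.toFixed(1)}/${(sum / count).toFixed(1)}/${max.toFixed(1)}`);
  }
};

run(process.argv[2]);
```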
Let's optimize.
Parallelism: low-hanging fruit

1 billion CPU-bound calculations on a multi-core machine? Worker threads were the first upgrade.

The trick was to divide the file into equal slices while keeping every line intact. This was done with the method below.
```js
// Read a small window at the tentative slice boundary and find the next newline,
// so every worker's slice starts and ends on a complete line
const calculateOffset = async (start, end, fileHandle) => {
  const { buffer } = await fileHandle.read({
    buffer: Buffer.alloc(MAX_LINE_LENGTH),
    length: MAX_LINE_LENGTH,
    position: end,
  });
  const diff = buffer.indexOf(10); // 10 = '\n'
  return { start, end, diff };
};
```
Then each worker got its own slice to process (sketched below).
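Here is roughly how that fan-out can be wired up (a sketch with my own naming: it reuses `calculateOffset` from above, and `worker.js` stands in for a hypothetical worker script that aggregates its byte range and posts a partial result back):

```js
const { Worker } = require("worker_threads");
const fsPromises = require("fs/promises");

const WORKER_COUNT = 12;

const main = async (filePath) => {
  const fileHandle = await fsPromises.open(filePath, "r");
  const { size } = await fileHandle.stat();
  const sliceSize = Math.floor(size / WORKER_COUNT);

  // Build line-aligned [start, end) ranges using calculateOffset() from above
  const ranges = [];
  let start = 0;
  for (let i = 0; i < WORKER_COUNT; i++) {
    if (i === WORKER_COUNT - 1) {
      ranges.push({ start, end: size });
      break;
    }
    const { diff } = await calculateOffset(start, (i + 1) * sliceSize, fileHandle);
    const end = (i + 1) * sliceSize + diff + 1; // keep the newline inside this slice
    ranges.push({ start, end });
    start = end;
  }
  await fileHandle.close();

  // Each worker aggregates its own byte range and posts back a partial result
  const partials = await Promise.all(
    ranges.map(
      (range) =>
        new Promise((resolve, reject) => {
          const worker = new Worker("./worker.js", { workerData: { filePath, ...range } });
          worker.once("message", resolve);
          worker.once("error", reject);
        })
    )
  );
  // ...merge the partial { min, max, sum, count } maps and print the result
};

main(process.argv[2]);
```

Merging the partials is then just a cheap min/max/sum/count fold on the main thread.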
Workers | Time
--- | ---
4 workers | 2:35
12 workers (my logical core count) | 1:41
14 or 18 workers | performance degraded
Learning: since the data comes from local disk (there is no slow I/O to hide behind), adding threads beyond the logical core count only adds context-switching overhead. And worker threads make a HUGE difference.
Loop optimizations
Once threading was established, I started chipping away at the main loop:
- Switched from the readline async iterator to the `'line'` event - ⏱️ 1:19

  ```js
  // before
  for await (const line of lineStream) { /* ... */ }

  // after
  lineStream.on("line", (line) => { /* ... */ });
  ```

- Switched to manual byte stream parsing with `readStream.on('data')` - ⏱️ 1:02

  Here I started processing raw byte chunks in a `processChunk` method, while still leaning on built-ins like `toString` and `parseFloat`.

- Improvements that gave minor gains - ⏱️ 1:00
  - Replaced repetitive `chunk[i]` lookups with `const c = chunk[i]`
  - Removed `.toFixed()`
At this point, I started searching for which operations were eating the most time and resources. I searched the web and tried to make sense of flame graphs (which I finally got working).

The consistent answer: string processing and float arithmetic.
- Replaced `parseFloat` with integer math - ⏱️ 0:35

I replaced the float parsing with byte-based parsing that keeps every temperature as an integer scaled to tenths of a degree:
```js
const parseBufferToDigit = (byte) => byte - 0x30; // ASCII digit -> integer

const parseNumber = (length) => {
  // `number` is a scratch Buffer holding the raw temperature bytes of the current line
  if (number[0] === 0x2d) {
    // negative: "-12.3" (length 5) or "-1.2" (length 4)
    return length === 5
      ? -(parseBufferToDigit(number[1]) * 100 +
          parseBufferToDigit(number[2]) * 10 +
          parseBufferToDigit(number[4]))
      : -(parseBufferToDigit(number[1]) * 10 +
          parseBufferToDigit(number[3]));
  } else {
    // positive: "1.2" (length 3) or "12.3" (length 4)
    return length === 3
      ? parseBufferToDigit(number[0]) * 10 +
          parseBufferToDigit(number[2])
      : parseBufferToDigit(number[0]) * 100 +
          parseBufferToDigit(number[1]) * 10 +
          parseBufferToDigit(number[3]);
  }
};
```
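Putting the pieces together, the per-chunk hot path ends up looking something like this (an illustrative sketch, not the exact repo code: it assumes the `parseNumber` function above, and the leftover handling and names are my own simplification):

```js
const fs = require("fs");

const SEMICOLON = 0x3b;
const NEWLINE = 0x0a;

const number = Buffer.alloc(8); // scratch buffer that parseNumber() above reads from
const stats = new Map(); // station -> { min, max, sum, count }, all in tenths of a degree
let leftover = Buffer.alloc(0); // partial line carried over from the previous chunk

const processChunk = (data) => {
  const chunk = leftover.length > 0 ? Buffer.concat([leftover, data]) : data;

  let lineStart = 0;
  let semiIndex = -1;
  for (let i = 0; i < chunk.length; i++) {
    const c = chunk[i];
    if (c === SEMICOLON) {
      semiIndex = i;
    } else if (c === NEWLINE) {
      const station = chunk.toString("utf8", lineStart, semiIndex);
      chunk.copy(number, 0, semiIndex + 1, i); // fill the scratch buffer for parseNumber
      const temp = parseNumber(i - semiIndex - 1);

      const entry = stats.get(station);
      if (entry === undefined) {
        stats.set(station, { min: temp, max: temp, sum: temp, count: 1 });
      } else {
        entry.min = Math.min(entry.min, temp);
        entry.max = Math.max(entry.max, temp);
        entry.sum += temp;
        entry.count++;
      }
      lineStart = i + 1;
    }
  }
  leftover = chunk.subarray(lineStart); // assumes the file ends with a newline
};

fs.createReadStream(process.argv[2]).on("data", processChunk);
```

Each worker keeps everything in integer tenths and only converts back to degrees (and divides by `count`) when it reports its partial min/mean/max.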
❌ What Didn't Work

I tried some other things that failed (but are worth noting):

- ❌ Hashed station names: collisions or performance, pick one. A fast function that concatenated/hashed the ASCII codes was much faster than `.toString()` (0:19), but produced a large number of collisions, and a wider hash absolutely hammered performance. No straightforward win here (rough sketch after this list).
- ❌ `Float32Array` / `Int32Array` for aggregation instead of objects: interestingly, this degraded performance
- ❌ Using more than 12 threads (my logical core count): the overhead cancelled out the benefit
- ❌ Replacing `Math.min`/`Math.max` with a ternary comparison: lost ~2s
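For the curious, the hash experiment was roughly this shape (illustrative only, not the code from the repo): fold the station-name bytes into a single 32-bit integer and use that as the Map key instead of a string.

```js
// Fold station-name bytes into one 32-bit integer key; fast, but two different
// names can land on the same key and silently merge their stats
const hashStation = (chunk, start, end) => {
  let h = 0;
  for (let i = start; i < end; i++) {
    h = (h * 31 + chunk[i]) | 0; // "| 0" keeps the value in 32-bit integer range
  }
  return h;
};
```

The trade-off is exactly what the timings showed: the narrow key is quick but collides, and making the key wide enough to avoid collisions erased the speed win.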
Windows + WSL Gotchas

Some (not so) fun roadblocks along the way:

- The data generator script failed with Java 24; it specifically needed Java 21
- The Maven build failed: I used the `-Dlicense.skip` flag to bypass the license plugin
- I switched to WSL for the scripts and `time`, but the cross-filesystem path performance was horrible: the sync file read took 22s from CMD vs 4 minutes from WSL. My 12-worker benchmark ran a full minute slower until I figured this out, so I switched to PowerShell and used `Measure-Command {}` instead
- The `clinic` profiler DID NOT work in PowerShell, so I switched to CMD. This one took a looooong time. I ended up relying on a single `console.time`, which worked consistently
Acknowledgments

This project draws inspiration from Edgar-P-Yan's excellent 1BRC repo, and some parsing techniques were adapted from his implementation.

This was an independent learning project, and I did not submit it to the official 1BRC leaderboard.
GitHub

Code & scripts:

- GitHub Repo
- 1BRC Node Repo
Final Thoughts

This wasn't just about speeding up Node.js (ok, maybe it was); it was also about discovering what makes it tick.

- CPU-bound work? Use `worker_threads`
- Strings are expensive: avoid them until the last possible moment
- Floats are expensive: keep them out of the hot path and convert later
- Every byte and cycle matters (literally)

I walked away with crashes, a laptop trying to take off, and deep satisfaction.
Yet I feel there is still more to be done here. I will keep trying to push these numbers, so feel free to share insights, ideas, or observations.
Thanks for reading!