Boris Barac

Posted on Jun 12

When JavaScript Isn't Fast Enough

#javascript #rust #api #benchmark

500 requests per second. One minute. An Express endpoint computing BTC technical indicators over roughly 366,000 data points.

Node.js folded. Average latency: 3,318 ms. p95: 8.4 seconds. 15.5% of requests failed. The median response crawled in at 1.6 seconds. The JavaScript computation pipeline couldn't keep up.

Same server. Same load. Same data. One change: the /price-rust endpoint, backed by a Rust addon compiled to a .node binary via napi-rs. Average latency: 660 ms. p95: 2 seconds. Zero errors. Peak RSS: 283 MB, less memory than the JS path.

Same Express process, same port, same payload. The only difference was who did the math.

What the API benchmark actually does

The benchmark is a BTC price analysis endpoint. The dataset is synthetic: roughly 366,000 generated data points. On that dataset the server computes SMA (25/50/100/200-day windows), RSI (14-period), MACD (12/26/9), Bollinger Bands (20-period), and a composite trading signal built from golden/death crosses, RSI divergence, MACD crossovers, and Bollinger squeeze detection.

Branching, windowing, stateful accumulation, multi-indicator correlation. The kind of workload that makes a runtime earn its keep — or not.
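To make one of these indicators concrete, here is a minimal sketch of a 14-period RSI. It assumes Wilder's smoothing and a flat array of closing prices; the function name `rsi` is made up for illustration, and the benchmark's own implementation may well differ.

```javascript
// Hypothetical helper, not taken from the benchmark repo.
// Computes RSI with Wilder's smoothing over an array of closing prices.
function rsi(closes, period = 14) {
  if (closes.length <= period) return [];
  let avgGain = 0;
  let avgLoss = 0;

  // Seed both averages with the simple mean of the first `period` changes.
  for (let i = 1; i <= period; i++) {
    const change = closes[i] - closes[i - 1];
    if (change > 0) avgGain += change;
    else avgLoss -= change;
  }
  avgGain /= period;
  avgLoss /= period;

  // By convention here, zero average loss maps to RSI 100.
  const toRsi = (gain, loss) =>
    loss === 0 ? 100 : 100 - 100 / (1 + gain / loss);

  const out = [toRsi(avgGain, avgLoss)];
  for (let i = period + 1; i < closes.length; i++) {
    const change = closes[i] - closes[i - 1];
    const gain = change > 0 ? change : 0;
    const loss = change < 0 ? -change : 0;
    // Stateful accumulation: each value depends on the previous averages.
    avgGain = (avgGain * (period - 1) + gain) / period;
    avgLoss = (avgLoss * (period - 1) + loss) / period;
    out.push(toRsi(avgGain, avgLoss));
  }
  return out;
}
```

The loop carries state from one iteration to the next, which is exactly the kind of branching, sequential arithmetic the benchmark stresses.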

The full picture

The HTTP benchmark shows what happens under real load. The function benchmark isolates raw computation from HTTP overhead, request queuing, and GC pressure.

HTTP — k6, 500 req/s, 1 minute

Runtime	Endpoint	Avg latency	p95	Throughput	Errors	Peak RSS
Node.js	`/price` (JS)	3,318 ms	8,437 ms	62.4 rps	15.5%	292 MB
Node.js	`/price-rust` (N-API)	660 ms	2,061 ms	35.6 rps	0%	283 MB
Bun	`/price` (JS)	1,845 ms	5,146 ms	33.4 rps	0%	470 MB
Bun	`/price-rust` (N-API)	642 ms	2,412 ms	32.1 rps	0%	259 MB

Node.js on JS is the only combination that failed requests. Every other configuration held — including Bun running the exact same JavaScript. Bun's JS path is 1.8x faster than Node's and dropped zero requests.

The Rust paths are nearly identical across runtimes: 660 ms on Node, 642 ms on Bun. The runtime barely matters when Rust is doing the work.

Function — mitata, 366K data points

Variant	Runtime	ops/sec	Avg (ms)	Speedup vs JS
JS	Node.js	20	49.11	1x
Native (N-API)	Node.js	63	15.95	3.1x
JS	Bun	28	36.10	1x
Native (N-API)	Bun	69	14.51	2.5x
JS	Browser (Chromium)	26	38.04	1x
Native (WASM)	Browser (Chromium)	56	17.71	2.2x

Three runtimes, three JS baselines, all within a narrow band: 20–28 ops/sec. The Rust speedup is consistent — ~3x on Node, ~2.5x on Bun, ~2.2x on WASM.
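The columns in that table are tied together by simple arithmetic, which the sketch below spells out. It only re-derives ops/sec and speedup from the average latencies listed above; the `summarize` helper is a name invented for this example. Ratios computed from the rounded millisecond figures can differ from the table's speedup column in the last digit.

```javascript
// Average latencies in milliseconds, copied from the function benchmark table.
const avgMs = {
  node: { js: 49.11, native: 15.95 },
  bun: { js: 36.10, native: 14.51 },
  chromium: { js: 38.04, native: 17.71 }, // "native" here means WASM
};

// Hypothetical helper: speedup = JS time / native time, ops/sec = 1000 / avg ms.
function summarize(row) {
  return {
    speedup: row.js / row.native,
    jsOpsPerSec: 1000 / row.js,
    nativeOpsPerSec: 1000 / row.native,
  };
}

for (const [runtime, row] of Object.entries(avgMs)) {
  const s = summarize(row);
  console.log(
    runtime,
    s.speedup.toFixed(2) + "x",
    Math.round(s.jsOpsPerSec) + " -> " + Math.round(s.nativeOpsPerSec) + " ops/sec"
  );
}
```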

WASM in the browser hits 56 ops/sec — closer to native N-API (63–69) than to any JS baseline. That's worth sitting with for a moment.

Sync vs async N-API made no measurable difference (63 vs 63 on Node, 69 vs 68 on Bun). The async variant offloads to a Tokio thread pool via spawn_blocking, but the bottleneck is the math, not the calling convention.

How a Rust function becomes a Node endpoint

The binding layer is thinner than you'd expect. napi-rs is the bridge: annotate a Rust function with #[napi], compile to a .node binary, and Node can require() it like any other module.

Here's the JS moving average function — the one that runs in every request to /price:

export function calculateMovingAverages(prices, smaWindows, cutoffYears = 9) {
  const cutoffIndex = Math.max(0, prices.length - cutoffYears * 365);
  const movingAverages = [];
  const runningSums = new Array(smaWindows.length).fill(0);

  for (let i = 0; i < prices.length; i++) {
    const price = prices[i][1];
    for (let wi = 0; wi < smaWindows.length; wi++) {
      const w = smaWindows[wi];
      runningSums[wi] += price;
      if (i >= w) runningSums[wi] -= prices[i - w][1];
    }
    if (i < cutoffIndex) continue;
    // ... build entry with date, price, SMA values
  }
  return movingAverages;
}
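The rolling-sum trick in that loop is easier to see in isolation. Here is a stripped-down sketch for a single window over a flat array of numbers. The name `simpleMovingAverage` and the choice to emit `null` before the window fills are assumptions made for this example, not what the benchmark does with its elided entry-building step.

```javascript
// Illustrative only: a single-window rolling-sum SMA over a flat array.
// `simpleMovingAverage` is a hypothetical name, not the benchmark's function.
function simpleMovingAverage(values, window) {
  const out = [];
  let runningSum = 0;
  for (let i = 0; i < values.length; i++) {
    runningSum += values[i];
    // Once the window is full, drop the value that just slid out of it.
    if (i >= window) runningSum -= values[i - window];
    // Emit null until there are enough points to fill one window.
    out.push(i >= window - 1 ? runningSum / window : null);
  }
  return out;
}

console.log(simpleMovingAverage([1, 2, 3, 4, 5], 3));
```

Each point costs one addition and one subtraction per window, regardless of window size, which keeps the whole pass linear in the number of data points.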

And here's the Rust equivalent, exposed to Node:

#[napi]
pub fn calculate_moving_averages(
    prices: Float64Array,
    sma_windows: Vec<u32>,
    cutoff_years: Option<u32>,
) -> Buffer {
    let cutoff_years = cutoff_years.unwrap_or(9);
    let dates = crate::utils::precompute_dates(&prices);
    let result = calc_ma(&prices, &sma_windows, cutoff_years, &dates);
    Buffer::from(serde_json::to_vec(&result).unwrap())
}

That #[napi] macro generates the C FFI glue Node's N-API runtime expects. It handles type conversion between V8 values and Rust types, and registers the function as a module export.

Float64Array comes in as a borrowed &[f64] slice with no copy. Buffer goes out as raw bytes, skipping V8's UTF-16 string conversion entirely.

On the JS side, loading the addon is one line:

const native = require("./napibench-native.node");

Then native.calculateMovingAverages(prices, [25, 50, 100, 200]) calls directly into compiled Rust. No HTTP, no IPC, no child process. Same call stack, same thread.
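For a sense of what the JS side of that call might look like, here is a rough sketch of packing input into a Float64Array and decoding the returned Buffer. Two things are assumptions: the interleaved `[t0, p0, t1, p1, ...]` layout (the addon's real input layout is not shown in this post) and the `decode` helper body. A stand-in function replaces the native call so the snippet runs without the .node binary.

```javascript
// Assumption: [timestamp, price] pairs are flattened into one Float64Array
// as [t0, p0, t1, p1, ...]. The addon's real input layout may differ.
function toFloat64Array(pairs) {
  const flat = new Float64Array(pairs.length * 2);
  for (let i = 0; i < pairs.length; i++) {
    flat[2 * i] = pairs[i][0];
    flat[2 * i + 1] = pairs[i][1];
  }
  return flat;
}

// Hypothetical decode helper: parses UTF-8 JSON bytes carried in a Buffer.
function decode(buffer) {
  return JSON.parse(buffer.toString("utf8"));
}

// Stand-in for the native call; it only reports how many pairs it received.
function fakeNativeCall(flat) {
  return Buffer.from(JSON.stringify({ points: flat.length / 2 }), "utf8");
}

const flat = toFloat64Array([
  [1700000000000, 42000.5],
  [1700086400000, 42100.25],
]);
const result = decode(fakeNativeCall(flat));
console.log(result);
```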

Need async? The async variant parks the work on a Tokio thread pool and returns a Promise. Same function, different concurrency model.

The server wires it into Express the same way the JS endpoint is wired. Same req → compute → res flow. The handler just calls a different function.
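One way to picture that wiring is a small handler factory where the compute function is the only thing that varies. This is a sketch, not the repo's code: `makePriceHandler` and its parameters are hypothetical, and only the Express-style `res.status(...).send(...)` calls are real API.

```javascript
// Hypothetical handler factory: both endpoints share the same flow and
// differ only in the compute function passed in. Not the repo's actual code.
function makePriceHandler(compute, prices) {
  return function handler(req, res) {
    try {
      // compute is either the JS pipeline or the native addon call.
      const body = compute(prices);
      res.status(200).send(body);
    } catch (err) {
      res.status(500).send(String(err));
    }
  };
}
```

With an Express app, registration could then look like `app.get("/price", makePriceHandler(computeJs, prices))` and `app.get("/price-rust", makePriceHandler(computeNative, prices))`, where both compute functions are placeholders.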

Same Rust, different planet

The Cargo.toml has a feature gate:

[features]
default = ["napi", "napi-derive", "rayon", "napi-build"]
wasm = ["wasm-bindgen", "js-sys"]

Build with the default features, you get napi_impl.rs — N-API bindings, rayon parallelism, tokio async. Build with --no-default-features --features wasm, you get wasm.rs — the same indicator logic compiled to WebAssembly via wasm-bindgen.

The WASM build strips away some optimizations. No rayon, no async thread pool, no Buffer returns — the WASM function returns a JSON string. It's the simpler, less-tuned version of the Rust code.

And it still hits 56 ops/sec in headless Chromium. That's 2.2x faster than Chromium's own JS engine running the same algorithms, and it's closer to native N-API performance (63–69 ops/sec) than to any JS baseline (20–28 ops/sec).

The bulk of the speedup comes from Rust's compiler — LLVM optimizing tight loops, stack-allocated structs, no GC pauses, no hidden type checks — not from rayon or async thread pools. Those help, but the floor is already high.

The browser benchmark ran in headless Chromium via Playwright: the WASM module and JS indicator code, same 366K-data-point pipeline, same 5-second window. The WASM binary is produced by wasm-pack build --target web.

Because napi-rs makes Rust functions callable like JS functions, the existing JS test suite doubles as a parity test for Rust. The test imports the JS indicator functions alongside the native addon, runs the same input through both, and asserts the outputs match field by field:

const jsMa = calculateMovingAverages(pricesJs, [25, 50, 100, 200], 1);
const rustMa = decode(native.calculateMovingAverages(prices, [25, 50, 100, 200], 1));
expect(rustMa).toHaveLength(jsMa.length);
for (let i = 0; i < jsMa.length; i++) {
  expect(rustMa[i].price).toBe(jsMa[i].price);
}

The JS implementation becomes the test oracle for the Rust implementation. The test runner (vitest) doesn't know or care that one side is native.
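The same idea extends to every field, not just `price`. Below is a sketch of a generic comparison helper. The name `findMismatch` and the optional `epsilon` tolerance are assumptions for this example; the test shown above uses exact equality.

```javascript
// Hypothetical helper: compares two arrays of records field by field and
// returns the first mismatch, or null when everything matches.
// epsilon = 0 means exact equality for numbers, mirroring the toBe check.
function findMismatch(expected, actual, fields, epsilon = 0) {
  if (expected.length !== actual.length) return { index: -1, field: "length" };
  for (let i = 0; i < expected.length; i++) {
    for (const field of fields) {
      const a = expected[i][field];
      const b = actual[i][field];
      const equal =
        typeof a === "number" && typeof b === "number"
          ? Math.abs(a - b) <= epsilon // NaN never compares equal here
          : a === b;
      if (!equal) return { index: i, field };
    }
  }
  return null;
}
```

Returning the index and field name of the first divergence makes a failing parity run much quicker to debug than a bare boolean.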

When to reach for Rust

Three runtimes. Two languages. One workload. Here's what the numbers actually say.

Reach for Rust when you're CPU-bound. This workload is arithmetic-heavy, loop-heavy, and largely sequential: the kind of code V8 and JavaScriptCore optimize well, but still not as well as LLVM-compiled Rust.

The 2.2–3x speedup appeared across every runtime, calling convention, and optimization level. If your server spends most of its time computing — parsing, transforming, aggregating, encrypting — a native addon pays for itself.

But N-API has a floor. The FFI boundary between V8 and native code adds fixed overhead per call. On small datasets — a few hundred data points — that overhead erases the compute savings. Node/Bun JS outperforms Node/Bun + Rust N-API at that scale. The native path only wins once the dataset is large enough to amortize the crossing cost.
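A toy cost model makes the amortization argument concrete. Treat the JS cost as proportional to the number of points, and the native cost as a fixed crossing overhead plus a smaller per-point cost. The constants below are made up purely for illustration; none of them were measured in this benchmark.

```javascript
// Toy cost model with invented constants, in arbitrary time units.
//   jsCost(n)     = jsPerPoint * n
//   nativeCost(n) = fixedOverhead + nativePerPoint * n
// The native path wins once n exceeds fixedOverhead / (jsPerPoint - nativePerPoint).
function breakEvenPoints({ fixedOverhead, jsPerPoint, nativePerPoint }) {
  if (jsPerPoint <= nativePerPoint) return Infinity; // native never catches up
  return fixedOverhead / (jsPerPoint - nativePerPoint);
}

const breakEven = breakEvenPoints({
  fixedOverhead: 300, // hypothetical cost of one JS-to-native round trip
  jsPerPoint: 3,      // hypothetical JS cost per data point
  nativePerPoint: 1,  // hypothetical native cost per data point
});
console.log(breakEven);
```

Below the break-even point the fixed overhead dominates and plain JS is cheaper; above it, the per-point savings take over.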

Don't reach for Rust for I/O. The 3x raw compute speedup became a 5x latency reduction under HTTP load because the JS endpoint was past its breaking point. But the Rust endpoints on Node and Bun had nearly identical latencies (660 ms vs 642 ms).

The runtime's HTTP stack, event loop, and memory management dominate when computation is fast. Rust made the compute a non-factor; then the runtime didn't matter.

Bun is a cheaper upgrade than Rust. If you're on Node.js and your JS code is too slow, switching to Bun gives you roughly 40% more raw throughput (20 to 28 ops/sec in the function benchmark) with zero code changes. Same JavaScript, same endpoints, different binary.

In this benchmark, Bun's JS path avoided errors entirely where Node.js failed 15.5% of requests. That might be enough. Rust is the bigger hammer, but Bun is the one you don't have to think about.

WASM is viable. Not a curiosity — genuinely competitive. 56 ops/sec in a browser tab, 2.2x faster than the browser's own JS, built from the same Rust codebase with a feature flag.

Unlike N-API on Node or Bun, WASM's speedup holds even at small workloads. The calling convention is lighter: there is no FFI boundary to cross, because the module runs inside the same engine. Where N-API needs a large dataset to amortize its overhead, WASM delivers from the start.

If you're running heavy computation in the browser — image processing, data analysis, simulations — WASM is the answer, and you don't need a separate codebase to get there.

The binding layer is not the bottleneck at this scale. With a dataset this large, napi-rs adds almost nothing measurable to the call: sync and async variants performed identically, and the cost of crossing from V8 to Rust and back is too small to show up in the numbers. It only becomes visible on the small inputs described earlier.

What matters is what happens on the other side of that boundary.

#rust #nodejs #bun #napi #webassembly #performance #benchmark

github.com/borisBarac/NapiBench