ahmet gedik

Posted on May 25

Bun.js vs Node.js Runtime Benchmark for Video Metadata APIs Under Real Traffic

#bunjs #node #performance #backend

Last month I migrated the metadata API layer for TrendVidStream from a Node.js 20 worker to Bun 1.1, ran them side-by-side for three weeks across eight regions, and recorded every single request. The results were not what the marketing blog posts told me to expect. Bun was faster in some scenarios, dramatically slower in others, and produced one production incident that cost me a Saturday morning. This is the unfiltered version of what happened, with the actual benchmark numbers and the code I used to get them.

The context matters. TrendVidStream is a multi-region streaming discovery site running across eight regions (US, GB, JP, KR, TW, SG, VN, TH). The main backend is PHP 8.4 with SQLite FTS5, deployed via FTP automation through a cron pipeline that fetches metadata from upstream APIs every few hours. But the metadata enrichment layer (the thing that normalizes provider responses, generates thumbnail URLs, computes regional availability, and writes back into the SQLite databases) was a Node.js 20 worker running on a small VPS. It is a perfect benchmark candidate: predictable shape, real production traffic, no UI variance, and easy to swap runtime without touching application logic.

The Workload That Actually Matters

Before I show numbers, you need to know what work the runtime is actually doing. Synthetic benchmarks (ping a hello-world endpoint, measure RPS) are useless because they measure the HTTP parser, not what your app does. Here is what a single metadata enrichment request actually performs:

1. Receive a POST with a batch of 50 to 200 video records as JSON (typical payload 80–400 KB).
2. For each record: validate schema, normalize provider-specific fields, compute a stable hash for deduplication, build a regional availability matrix.
3. Make 1 to 4 outbound HTTP calls per batch to enrich with poster art and runtime metadata (with caching and circuit breakers).
4. Stream the enriched records back as newline-delimited JSON so the PHP cron can pipe them straight into sqlite3 import.
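The last step above, streaming newline-delimited JSON, can be sketched with the standard ReadableStream and Response globals that both Bun and Node 18+ provide. This is a minimal illustration, not the production handler: `ndjsonResponse` and `toRecord` are hypothetical names, and `toRecord` is assumed to return `null` for records that should be skipped.

```javascript
// Sketch: stream records as NDJSON instead of building one large string.
// `ndjsonResponse` and `toRecord` are hypothetical names, not from the original service.
function ndjsonResponse(records, toRecord) {
  const encoder = new TextEncoder();
  const iterator = records[Symbol.iterator]();
  const stream = new ReadableStream({
    // pull() runs whenever the consumer wants more data, so at most one
    // line is encoded per pull and backpressure is respected.
    pull(controller) {
      for (;;) {
        const next = iterator.next();
        if (next.done) {
          controller.close();
          return;
        }
        const out = toRecord(next.value);
        if (out !== null) {
          controller.enqueue(encoder.encode(JSON.stringify(out) + "\n"));
          return;
        }
        // Record was rejected; keep looping to the next one.
      }
    },
  });
  return new Response(stream, {
    headers: { "content-type": "application/x-ndjson" },
  });
}
```

Whether streaming beats joining the lines into one string depends on batch size. For a few hundred records the difference is probably small, so it is worth measuring before adopting.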

The runtime spends roughly 35% of its time in JSON parse/stringify, 25% in string normalization, 20% waiting on outbound HTTP, 15% in hashing, and 5% in everything else. If your workload looks nothing like this, my numbers will not transfer to you. That is the whole point of benchmarking your own code.
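A breakdown like the one above can be approximated by wrapping each phase in a timer and reporting its share of the total. The sketch below is one way to do that, under the assumption that phases run synchronously. `createPhaseTimer` is a hypothetical helper, and time spent awaiting outbound HTTP would need to be measured separately around the `await`.

```javascript
// Sketch: accumulate wall-clock time per named phase and report each phase's share.
// `createPhaseTimer` is a hypothetical helper, not part of the original worker.
function createPhaseTimer(now = () => performance.now()) {
  const totals = new Map();
  return {
    // Run fn synchronously and charge its duration to `phase`.
    time(phase, fn) {
      const t0 = now();
      try {
        return fn();
      } finally {
        totals.set(phase, (totals.get(phase) || 0) + (now() - t0));
      }
    },
    // Return { phase: percentOfTotal } for everything measured so far.
    shares() {
      let sum = 0;
      for (const ms of totals.values()) sum += ms;
      const out = {};
      for (const [phase, ms] of totals) out[phase] = sum > 0 ? (ms / sum) * 100 : 0;
      return out;
    },
  };
}

// Usage (inside a handler, with `body` being the raw request text):
//   const timer = createPhaseTimer();
//   const batch = timer.time("json", () => JSON.parse(body));
//   console.log(timer.shares());
```

The injectable `now` parameter exists only so the timer can be exercised with a fake clock in tests.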

The Benchmark Harness

I did not use autocannon or wrk for the real measurements because they generate uniform load that does not exist in production. Instead I replayed 72 hours of actual access logs through both runtimes, captured per-request latency, and compared. Here is the replay client, written in Go because I needed something that would not become the bottleneck:

package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
    "sync"
    "time"
)

type LogLine struct {
    Timestamp int64           `json:"ts"`
    Payload   json.RawMessage `json:"payload"`
}

type Result struct {
    Status   int
    Duration time.Duration
    Bytes    int
}

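// replay re-sends each logged payload to target at its original offset from the
// first log line, with at most `concurrency` requests in flight.
func replay(target string, concurrency int, lines []LogLine) []Result {
    if len(lines) == 0 {
        return nil
    }
    results := make([]Result, len(lines))
    sem := make(chan struct{}, concurrency)
    // A client timeout keeps one hung request from blocking wg.Wait forever.
    // 30s is an arbitrary choice here.
    client := &http.Client{Timeout: 30 * time.Second}
    var wg sync.WaitGroup
    start := time.Now()
    baseTs := lines[0].Timestamp // ts is treated as whole seconds

    for i, line := range lines {
        // Sleep until this request's original arrival offset.
        offset := time.Duration(line.Timestamp-baseTs) * time.Second
        delay := offset - time.Since(start)
        if delay > 0 {
            time.Sleep(delay)
        }

        wg.Add(1)
        sem <- struct{}{} // blocks while `concurrency` requests are in flight
        go func(idx int, payload []byte) {
            defer wg.Done()
            defer func() { <-sem }()

            t0 := time.Now()
            resp, err := client.Post(target, "application/json", bytes.NewReader(payload))
            if err != nil {
                // Status 0 marks a transport-level failure (refused, reset, timeout).
                results[idx] = Result{Status: 0, Duration: time.Since(t0)}
                return
            }
            defer resp.Body.Close()
            // Drain the body so the duration covers the full response.
            n, _ := bufio.NewReader(resp.Body).WriteTo(bytes.NewBuffer(nil))
            results[idx] = Result{Status: resp.StatusCode, Duration: time.Since(t0), Bytes: int(n)}
        }(i, line.Payload)
    }
    wg.Wait()
    return results
}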

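func main() {
    if len(os.Args) < 2 {
        fmt.Fprintln(os.Stderr, "usage: replay <target-url>")
        os.Exit(1)
    }
    target := os.Args[1]

    f, err := os.Open("access.jsonl")
    if err != nil {
        fmt.Fprintln(os.Stderr, "open access.jsonl:", err)
        os.Exit(1)
    }
    defer f.Close()

    var lines []LogLine
    scanner := bufio.NewScanner(f)
    // Allow log lines up to 8 MB; the default 64 KB limit is too small for 400 KB payloads.
    scanner.Buffer(make([]byte, 0, 1024*1024), 8*1024*1024)
    for scanner.Scan() {
        var l LogLine
        // Skip lines that are not valid JSON rather than aborting the replay.
        if err := json.Unmarshal(scanner.Bytes(), &l); err == nil {
            lines = append(lines, l)
        }
    }
    if err := scanner.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "read access.jsonl:", err)
        os.Exit(1)
    }
    if len(lines) == 0 {
        fmt.Fprintln(os.Stderr, "no replayable lines in access.jsonl")
        os.Exit(1)
    }

    // One tab-separated line per request: status, latency in microseconds, response bytes.
    results := replay(target, 64, lines)
    for _, r := range results {
        fmt.Printf("%d\t%d\t%d\n", r.Status, r.Duration.Microseconds(), r.Bytes)
    }
}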

This preserves the actual arrival pattern from production. A real production trace has bursts, idle gaps, and pathological payloads that uniform load generators never produce. The first time I ran it against Bun I caught a behavior that autocannon had never surfaced: a 400 KB payload during a regional fetch burst was triggering a 600ms pause that did not exist on Node. More on that later.
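The replay client prints one tab-separated line per request (status, latency in microseconds, response bytes). A minimal sketch of turning that output into a latency distribution follows. It assumes the nearest-rank percentile definition and excludes failed requests from the latency figures. Both are choices made for illustration, and `percentile` and `summarize` are hypothetical names.

```javascript
// Sketch: summarize replay output lines of the form "status\tmicros\tbytes".
// Uses the nearest-rank percentile definition; other definitions interpolate.
function percentile(sortedAsc, p) {
  if (sortedAsc.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sortedAsc.length);
  return sortedAsc[Math.min(sortedAsc.length, Math.max(1, rank)) - 1];
}

function summarize(tsv) {
  const durationsMs = [];
  let errors = 0;
  let total = 0;
  for (const line of tsv.split("\n")) {
    if (line.trim() === "") continue;
    const [status, micros] = line.split("\t").map(Number);
    total++;
    // Status 0 is what the replay client records for transport failures.
    if (status === 0 || status >= 500) errors++;
    else durationsMs.push(micros / 1000);
  }
  durationsMs.sort((a, b) => a - b);
  return {
    total,
    errorRate: total > 0 ? errors / total : 0,
    p50: percentile(durationsMs, 50),
    p95: percentile(durationsMs, 95),
    p99: percentile(durationsMs, 99),
    p999: percentile(durationsMs, 99.9),
  };
}
```

Running the same summary over the output from each runtime gives directly comparable p50, p95, p99, and p99.9 figures from identical input.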

The Server Code, Identical on Both Runtimes

I wrote the server once and ran it on both runtimes without modification. This is the critical methodological detail: if you write different code for each runtime, you are benchmarking your code, not the runtime.

import { createHash } from "node:crypto";

const PROVIDERS = new Map([
  ["yt", { base: "https://i.ytimg.com/vi" }],
  ["vm", { base: "https://i.vimeocdn.com/video" }],
  ["dm", { base: "https://s2.dmcdn.net/v" }],
]);

const REGIONS = ["US", "GB", "JP", "KR", "TW", "SG", "VN", "TH"];

function normalize(record) {
  const id = String(record.id || "").trim().toLowerCase();
  const provider = PROVIDERS.get(record.provider);
  if (!provider || !id) return null;

  const title = (record.title || "").normalize("NFKC").replace(/\s+/g, " ").trim();
  const hash = createHash("sha1").update(`${record.provider}:${id}`).digest("hex");

  const availability = {};
  for (const r of REGIONS) {
    availability[r] = Array.isArray(record.blocked) ? !record.blocked.includes(r) : true;
  }

  return {
    hash,
    provider: record.provider,
    id,
    title,
    poster: `${provider.base}/${id}/hqdefault.jpg`,
    duration: Number(record.duration) || 0,
    availability,
    fetched_at: Date.now(),
  };
}

async function handler(req) {
  if (req.method !== "POST") return new Response("method not allowed", { status: 405 });
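  let batch;
  try {
    batch = await req.json();
  } catch {
    // Malformed JSON should be a client error, not an unhandled exception.
    return new Response("invalid JSON", { status: 400 });
  }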
  if (!Array.isArray(batch)) return new Response("expected array", { status: 400 });

  const out = [];
  for (const record of batch) {
    const n = normalize(record);
    if (n) out.push(JSON.stringify(n));
  }
  return new Response(out.join("\n") + "\n", {
    headers: { "content-type": "application/x-ndjson" },
  });
}

if (typeof Bun !== "undefined") {
  Bun.serve({ port: 3000, fetch: handler });
} else {
  const { serve } = await import("@hono/node-server");
  serve({ fetch: handler, port: 3000 });
}

The only runtime-specific code is the last block: Bun has a built-in server, Node needs an adapter. I used @hono/node-server because it gives Node a Fetch-compatible interface so the rest of the handler stays identical. I estimated the adapter adds maybe 2–3% overhead on Node, so as a control I also tested the Node side against raw node:http. The results were within noise.
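For the raw node:http control, the Fetch-style handler still needs a thin bridge. The sketch below shows one way such a bridge could look. It is an assumption for illustration, not the code used in the benchmark and not how @hono/node-server works internally. `toNodeListener` is a hypothetical name, and the whole request body is buffered, which is acceptable for payloads of a few hundred KB.

```javascript
// Sketch: wrap a Fetch-style handler (Request -> Response) as a node:http listener.
// `toNodeListener` is a hypothetical name; requires Node 18+ for the Request/Response globals.
function toNodeListener(handler) {
  return async function listener(req, res) {
    try {
      // Buffer the whole request body before building the Request.
      const chunks = [];
      for await (const chunk of req) chunks.push(chunk);
      const hasBody = req.method !== "GET" && req.method !== "HEAD";

      const request = new Request(`http://${req.headers.host || "localhost"}${req.url}`, {
        method: req.method,
        headers: req.headers,
        body: hasBody ? Buffer.concat(chunks) : undefined,
      });

      const response = await handler(request);
      // Note: repeated headers such as set-cookie are collapsed by this conversion.
      res.writeHead(response.status, Object.fromEntries(response.headers));
      res.end(Buffer.from(await response.arrayBuffer()));
    } catch (err) {
      if (!res.headersSent) res.writeHead(500, { "content-type": "text/plain" });
      res.end("internal error");
    }
  };
}

// Usage (not executed here):
//   import http from "node:http";
//   http.createServer(toNodeListener(handler)).listen(3000);
```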

The Numbers

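Running the 72-hour replay (roughly 480,000 requests, batch sizes from 1 to 200, payloads from 800 bytes to 410 KB) on a 4-vCPU, 8 GB VPS, here is what I measured. Latency and memory numbers are steady-state: I discarded cold start and warmup before measuring. Cold start is reported as its own row, and the error rate comes from the three-week production run rather than the replay.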

Metric	Node.js 20.11	Bun 1.1.4	Delta
p50 latency	18.4 ms	11.2 ms	-39%
p95 latency	47.1 ms	38.6 ms	-18%
p99 latency	112 ms	187 ms	+67%
p99.9 latency	268 ms	642 ms	+140%
RSS (steady)	184 MB	121 MB	-34%
Cold start to first response	412 ms	38 ms	-91%
Errors (3-week prod)	0.003%	0.041%	+1267%

The headline that you see on Twitter is the p50 number. Bun is 39% faster at the median. That is real and reproducible. The story nobody tells is the p99 and the error rate. Bun's GC behavior under sustained heap pressure produces tail latency that is genuinely worse than Node's V8. For a metadata enrichment service that runs as part of a cron pipeline, that tradeoff was acceptable. For a user-facing endpoint where p99 matters more than p50, it would not have been.
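The Delta column is the relative change of the Bun figure against the Node baseline. A small sketch of that arithmetic, using three rows copied from the table (`deltaPct` is a hypothetical helper name):

```javascript
// Sketch: relative change of the Bun measurement against the Node baseline, in percent.
function deltaPct(nodeValue, bunValue) {
  return ((bunValue - nodeValue) / nodeValue) * 100;
}

// Rows copied from the table: [metric, Node.js value, Bun value].
const rows = [
  ["p50 latency (ms)", 18.4, 11.2],
  ["p99 latency (ms)", 112, 187],
  ["RSS (MB)", 184, 121],
];
for (const [metric, node, bun] of rows) {
  const d = Math.round(deltaPct(node, bun));
  console.log(`${metric}: ${d > 0 ? "+" : ""}${d}%`);
}
// prints:
//   p50 latency (ms): -39%
//   p99 latency (ms): +67%
//   RSS (MB): -34%
```

A negative delta is an improvement for latency and memory, which is why the p99 row reads as a regression even though the median improved.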

Where Bun Won Hard

The cold start delta is the most important number on that table for me, and it is the one that decided the migration. My cron architecture spins up workers on demand from PHP via proc_open. The old Node setup paid 400ms per invocation just to get to the first line of my code. Bun pays 40ms. When you are running this 8 times an hour across multi-region cron jobs, that adds up to real wall-clock improvement on the deploy pipeline.

JSON parsing was measurably faster on Bun for the larger batches. Bun's JSON.parse comes from JavaScriptCore rather than V8, and for a 200-record batch with a 400 KB payload, Node averaged 8.2ms for parse while Bun averaged 2.1ms. SHA1 hashing was a wash: Node hashes through OpenSSL and Bun through BoringSSL (an OpenSSL fork), and the difference was inside the noise floor.
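A parse comparison like this is easy to reproduce in isolation. The sketch below times JSON.parse on a synthetic 200-record batch of a few hundred KB. The record shape is invented for illustration and is not the production schema, so absolute numbers will differ from the ones above.

```javascript
// Sketch: time JSON.parse on a synthetic batch.
// The record shape is invented for illustration; it is not the production schema.
function makeBatch(count) {
  const batch = [];
  for (let i = 0; i < count; i++) {
    batch.push({
      id: `vid${i}`,
      provider: "yt",
      title: `Sample title ${i} `.repeat(20),
      duration: 60 + i,
      blocked: i % 3 === 0 ? ["JP", "KR"] : [],
      description: "x".repeat(1500),
    });
  }
  return batch;
}

// Median is used instead of mean so a single GC pause does not skew the result.
function medianParseMs(json, iterations = 50) {
  const samples = [];
  for (let i = 0; i < iterations; i++) {
    const t0 = performance.now();
    JSON.parse(json);
    samples.push(performance.now() - t0);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

const json = JSON.stringify(makeBatch(200));
console.log(
  `payload ${(json.length / 1024).toFixed(0)} KB, ` +
    `median parse ${medianParseMs(json).toFixed(2)} ms`
);
```

Saving this as a file and running it once with `node` and once with `bun` gives a rough side-by-side figure. It measures only the parser, so it says nothing about tail latency under sustained load.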

Memory was the pleasant surprise. Bun's RSS stabilized 34% lower than Node's for the same workload. On a small VPS that matters because it means you can run more workers per box. I now run 6 Bun workers on the same VPS that previously ran 4 Node workers.

Where Bun Lost, And Why It Cost Me A Saturday

Three weeks into the migration I got paged at 06:40 on a Saturday because the cron pipeline had stopped writing to the SQLite database for the TW region. The Bun worker was running, accepting connections, and returning 200 OK responses. But responses were taking 4–8 seconds for a workload that should take 40ms.

The culprit was a file descriptor leak in Bun's fetch implementation when used with keep-alive against a server that closes connections aggressively. Our upstream provider for poster art enrichment cycles connections every 100 requests. Node's undici handles this cleanly. Bun 1.1.4 leaked file descriptors until the process hit the per-process FD limit and outbound calls started failing with EMFILE. My retry wrapper silently caught those errors, so the process kept appearing healthy to the monitoring layer.
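The general lesson is that a retry wrapper should not treat resource exhaustion as a transient failure. The sketch below shows one way to fail fast on such errors. The original wrapper is not shown in this post, so `withRetry`, `FATAL_CODES`, and `onFatal` are hypothetical, and it is an assumption that the runtime surfaces the code on `err.code` or `err.cause.code`.

```javascript
// Sketch: a retry wrapper that refuses to hide resource-exhaustion errors.
// `withRetry`, FATAL_CODES and `onFatal` are hypothetical names.
const FATAL_CODES = new Set(["EMFILE", "ENFILE"]);

function errorCode(err) {
  // Fetch implementations often wrap the system error in `cause`.
  return (err && (err.code || (err.cause && err.cause.code))) || null;
}

async function withRetry(fn, { retries = 3, onFatal = () => {} } = {}) {
  let lastErr;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastErr = err;
      if (FATAL_CODES.has(errorCode(err))) {
        // Retrying cannot free descriptors; report and fail fast instead.
        onFatal(err);
        throw err;
      }
    }
  }
  throw lastErr;
}
```

Wiring `onFatal` to a flag that makes the health check return a non-200 status would let the monitoring layer see the problem, instead of a process that looks healthy while every outbound call fails.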

The fix was three lines: explicit connection pool configuration via Bun.fetch.preconnect and dropping keep-alive for that specific upstream. But finding it took six hours because Bun's debugging tooling is not at parity with Node's. node --inspect and the Chrome DevTools heap snapshot would have caught this in 15 minutes. Bun's equivalent is improving but it is not there yet. This is the part of the runtime that does not show up in benchmarks but absolutely shows up in production.

The Integration With The PHP Pipeline

For reference, here is how the Bun worker plugs into the existing PHP 8.4 cron job. The cron is the same as before — it just got faster. This is the actual code from the pipeline, redacted slightly:

<?php
declare(strict_types=1);

final class MetadataEnricher
{
    private const WORKER_URL = 'http://127.0.0.1:3000/';
    private const BATCH_SIZE = 150;
    private const TIMEOUT_SEC = 30;

    public function __construct(private \PDO $db) {}

    public function enrichPendingForRegion(string $region): int
    {
        $stmt = $this->db->prepare(
            "SELECT id, provider, raw_json FROM video_stage 
             WHERE region = :region AND enriched_at IS NULL 
             LIMIT :limit"
        );
        $stmt->bindValue(':region', $region);
        $stmt->bindValue(':limit', self::BATCH_SIZE, \PDO::PARAM_INT);
        $stmt->execute();

        $rows = $stmt->fetchAll(\PDO::FETCH_ASSOC);
        if (empty($rows)) return 0;

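        $payload = array_map(
            // The (array) cast keeps a null from json_decode (invalid raw_json) from raising a TypeError on +.
            fn($r) => (array) json_decode($r['raw_json'], true) + ['provider' => $r['provider']],
            $rows
        );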

        $ch = curl_init(self::WORKER_URL);
        curl_setopt_array($ch, [
            CURLOPT_POST => true,
            CURLOPT_POSTFIELDS => json_encode($payload, JSON_UNESCAPED_UNICODE),
            CURLOPT_HTTPHEADER => ['Content-Type: application/json'],
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT => self::TIMEOUT_SEC,
        ]);

        $response = curl_exec($ch);
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        if ($code !== 200 || $response === false) {
            error_log("[enricher] region={$region} failed http={$code}");
            return 0;
        }

        return $this->importNdjson($response, $region);
    }

    private function importNdjson(string $ndjson, string $region): int
    {
        $update = $this->db->prepare(
            "UPDATE video_stage SET enriched_json = :json, enriched_at = :ts 
             WHERE provider = :provider AND id = :id AND region = :region"
        );

        $count = 0;
        $this->db->beginTransaction();
        try {
            foreach (explode("\n", trim($ndjson)) as $line) {
                if ($line === '') continue;
                $rec = json_decode($line, true);
                // Skip lines that are not objects carrying the keys the UPDATE needs.
                if (!is_array($rec) || !isset($rec['provider'], $rec['id'])) continue;

                $update->execute([
                    ':json' => $line,
                    ':ts' => time(),
                    ':provider' => $rec['provider'],
                    ':id' => $rec['id'],
                    ':region' => $region,
                ]);
                $count++;
            }
            $this->db->commit();
        } catch (\Throwable $e) {
            // Roll back so a failed batch is never left half-applied.
            $this->db->rollBack();
            throw $e;
        }
        return $count;
    }
}

This runs against an SQLite FTS5 database that gets synced through the FTP deploy pipeline to each regional edge. The Bun worker is a process-local detail: none of the rest of the stack knows or cares which runtime it is. That is the right architecture for runtime experiments. If Bun explodes, swapping back to Node is a one-line change in the systemd unit file.

What I Would Tell You To Do

If you are running a service that looks like this — batch processing, JSON-heavy, behind a cron or queue, not directly user-facing — Bun is worth the migration today. The p50 wins, the memory wins, and the cold start wins are real and they translate to lower infrastructure cost.

If you are running a user-facing API where p99 latency is your contract with the user, stay on Node for now. Bun's tail latency under GC pressure is a real production concern that the synthetic benchmarks do not surface. Run your own production trace against both before committing.

If you depend heavily on fetch with persistent connections against finicky upstreams, audit Bun's behavior carefully. The leak I hit was fixed in a later release but the broader pattern — Bun's fetch semantics differ from undici in subtle ways — is going to keep producing surprises for a while.

Do not trust the marketing benchmarks, including this one. Capture 24–72 hours of your real production traffic, write a replay client that preserves the arrival pattern, run it against both runtimes on the actual hardware you deploy to, and look at the full latency distribution and not just the median. The tail is where the truth lives.

Conclusion

Bun is not a drop-in faster Node. It is a different runtime with a different cost profile that happens to share an API surface. For my metadata enrichment workload at TrendVidStream, the migration was worth it: cheaper infrastructure, faster cold starts, lower memory. For the Saturday-morning incident, the lesson was that runtime maturity is a real thing and you pay for being early. Three weeks in I am keeping Bun in production but I am also keeping the Node systemd unit one symlink away.
