The 2 a.m. p99 that pushed us off Node
ViralVidVault serves a single endpoint that does more traffic than the rest of the site combined: /api/trending. It returns a paginated slice of viral video metadata — title, channel, region, view-velocity, thumbnail set, and a GDPR-safe analytics token — for whichever European market the visitor lands in. On a normal evening it handles a few thousand requests per second behind Cloudflare. The payload is small (around 6–12 KB of JSON per response), the database read is cheap, and yet our Node service kept spiking to a p99 of ~470ms whenever a video went genuinely viral and cache hit-rate dropped.
The interesting part: CPU was nowhere near saturated. The bottleneck was death by a thousand small allocations — JSON serialization, string concatenation for the analytics token, and the per-request overhead of a Node HTTP handler that builds objects we immediately throw away. That profile is exactly where the Bun-vs-Node argument stops being a Twitter benchmark and starts being a real engineering decision. So I ran the comparison properly, on hardware and payloads that match production, and wrote down what actually moved. If you want to see the live endpoint this is all about, it powers the feed at ViralVidVault.
This is not a synthetic "hello world" race. Our reference stack is deliberately boring — PHP 8.4 + SQLite in WAL mode behind LiteSpeed, with Cloudflare Workers at the edge — and the question I needed answered was narrow: for a JSON-heavy, SQLite-backed metadata API, does Bun's runtime actually buy us enough to justify a second language runtime in the stack?
What the workload actually looks like
Before any numbers, you have to characterize the workload, because "Bun is faster" is meaningless without it. Ours breaks down like this:
- One SQLite read per request: a prepared statement against a WAL-mode database, indexed on (region, score DESC), returning 20–40 rows.
- Heavy JSON serialization: the response is the dominant cost, not the query. Each row gets reshaped into a nested object with a thumbnail array.
- A per-request HMAC: we sign a short-lived, IP-free analytics token so we can track trend velocity without storing personal data (GDPR Recital 26 pseudonymization, no cookies).
- No business logic to speak of: this endpoint is pure I/O and CPU-bound serialization. That makes it the cleanest possible test of raw runtime overhead.
The reason this matters: Bun's headline advantages — a faster JS engine startup, a built-in SQLite driver written in Zig, and a faster HTTP layer — all target exactly these costs. If Bun were going to win anywhere, it would be here. If it doesn't win here, it won't win on a CRUD app full of ORM overhead either.
The Bun implementation
Bun ships Bun.serve and bun:sqlite in the runtime, so there are zero dependencies. This is the entire service:
// server.bun.ts (run with: bun server.bun.ts)
import { Database } from "bun:sqlite";
import { createHmac } from "node:crypto";

const db = new Database("trending.db", { readonly: true });
db.exec("PRAGMA journal_mode = WAL; PRAGMA query_only = ON;");

const stmt = db.query(
  `SELECT id, title, channel, region, score, thumb_base
   FROM videos WHERE region = ?1 ORDER BY score DESC LIMIT ?2`
);

const SECRET = process.env.ANALYTICS_SECRET ?? "dev-secret";

function signToken(region: string, n: number): string {
  // IP-free pseudonymous token: region + row count + minute bucket, no PII
  const payload = `${region}.${n}.${Math.floor(Date.now() / 60000)}`;
  const mac = createHmac("sha256", SECRET).update(payload).digest("base64url");
  return `${payload}.${mac.slice(0, 16)}`;
}

Bun.serve({
  port: 3001,
  fetch(req) {
    const url = new URL(req.url);
    if (url.pathname !== "/api/trending") return new Response("Not found", { status: 404 });

    const region = (url.searchParams.get("region") ?? "GB").slice(0, 2).toUpperCase();
    // Clamp to 1..50: SQLite treats a negative LIMIT as "no limit",
    // and a non-numeric value would otherwise parse to NaN.
    const rawLimit = Number(url.searchParams.get("limit") ?? 24);
    const limit = Number.isFinite(rawLimit) ? Math.min(Math.max(Math.trunc(rawLimit), 1), 50) : 24;

    const rows = stmt.all(region, limit) as any[];
    const body = {
      region,
      count: rows.length,
      token: signToken(region, rows.length),
      items: rows.map((r) => ({
        id: r.id,
        title: r.title,
        channel: r.channel,
        score: r.score,
        thumbs: [`${r.thumb_base}/hq.webp`, `${r.thumb_base}/sd.webp`],
      })),
    };
    return Response.json(body, {
      headers: { "cache-control": "public, max-age=30, s-maxage=120" },
    });
  },
});
A few things worth calling out. bun:sqlite returns plain objects, no row-mapping library needed. Response.json uses Bun's native serializer, which is meaningfully faster than JSON.stringify + new Response. And query_only = ON plus readonly: true is a cheap safety net — this process physically cannot write to the database, which matters when the writer is a completely separate PHP cron.
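The service above only mints analytics tokens. As a rough sketch of the other half, here is how a consumer might check one. This is not the site's actual token library: verifyToken, the maxAgeMinutes window, and the explicit secret and nowMs parameters are assumptions added for illustration, and signToken is restated with those parameters only so the example is self-contained.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Same token shape as the service's signToken, with the secret and clock
// passed in (an assumption, to keep the sketch deterministic):
//   <region>.<count>.<minuteBucket>.<first 16 chars of base64url HMAC>
function signToken(region: string, n: number, secret: string, nowMs: number): string {
  const payload = `${region}.${n}.${Math.floor(nowMs / 60000)}`;
  const mac = createHmac("sha256", secret).update(payload).digest("base64url");
  return `${payload}.${mac.slice(0, 16)}`;
}

// Hypothetical verifier: recompute the MAC over the first three fields,
// compare in constant time, and reject tokens outside the age window.
function verifyToken(token: string, secret: string, nowMs: number, maxAgeMinutes = 5): boolean {
  const parts = token.split(".");
  if (parts.length !== 4) return false;
  const [region, count, minute, mac] = parts;

  const bucket = Number(minute);
  if (!Number.isInteger(bucket)) return false;
  const ageMinutes = Math.floor(nowMs / 60000) - bucket;
  if (ageMinutes < 0 || ageMinutes > maxAgeMinutes) return false;

  const expected = createHmac("sha256", secret)
    .update(`${region}.${count}.${minute}`)
    .digest("base64url")
    .slice(0, 16);

  const a = Buffer.from(mac);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Splitting on "." is safe here because the base64url alphabet does not contain a dot. The five-minute window is arbitrary; the only thing the source fixes is the one-minute bucket.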
The Node implementation
To keep the comparison honest, the Node version uses the fastest reasonable mainstream setup: better-sqlite3 (synchronous, C++ bindings, the de-facto fast SQLite driver for Node) and the raw node:http server. No Express — a framework would only handicap Node and muddy the runtime comparison.
// server.node.mjs (run with: node server.node.mjs)
import { createServer } from "node:http";
import { createHmac } from "node:crypto";
import Database from "better-sqlite3";

const db = new Database("trending.db", { readonly: true, fileMustExist: true });
db.pragma("journal_mode = WAL");
db.pragma("query_only = ON");

const stmt = db.prepare(
  `SELECT id, title, channel, region, score, thumb_base
   FROM videos WHERE region = ? ORDER BY score DESC LIMIT ?`
);

const SECRET = process.env.ANALYTICS_SECRET ?? "dev-secret";

function signToken(region, n) {
  // IP-free pseudonymous token: region + row count + minute bucket, no PII
  const payload = `${region}.${n}.${Math.floor(Date.now() / 60000)}`;
  const mac = createHmac("sha256", SECRET).update(payload).digest("base64url");
  return `${payload}.${mac.slice(0, 16)}`;
}

createServer((req, res) => {
  // req.url is path-only in node:http, so a dummy base is needed to parse it
  const url = new URL(req.url, "http://x");
  if (url.pathname !== "/api/trending") {
    res.writeHead(404).end("Not found");
    return;
  }

  const region = (url.searchParams.get("region") ?? "GB").slice(0, 2).toUpperCase();
  // Clamp to 1..50: SQLite treats a negative LIMIT as "no limit",
  // and a non-numeric value would otherwise parse to NaN.
  const rawLimit = Number(url.searchParams.get("limit") ?? 24);
  const limit = Number.isFinite(rawLimit) ? Math.min(Math.max(Math.trunc(rawLimit), 1), 50) : 24;

  const rows = stmt.all(region, limit);
  const body = {
    region,
    count: rows.length,
    token: signToken(region, rows.length),
    items: rows.map((r) => ({
      id: r.id,
      title: r.title,
      channel: r.channel,
      score: r.score,
      thumbs: [`${r.thumb_base}/hq.webp`, `${r.thumb_base}/sd.webp`],
    })),
  };
  const json = JSON.stringify(body);
  res.writeHead(200, {
    "content-type": "application/json",
    "cache-control": "public, max-age=30, s-maxage=120",
  });
  res.end(json);
}).listen(3002);
The two services are line-for-line equivalent in logic. Same query, same token, same payload shape. The only differences are the ones the runtimes force on you.
Driving the load
I deliberately did not use a JS load tester, because I didn't want the client running on one of the runtimes under test, where its scheduling and warm-cache behaviour could bleed into the comparison. I used a small Python harness with httpx and asyncio, which also let me compute the percentiles I actually care about (p50, p95, p99, and the max) rather than trusting a tool's averages.
# loadtest.py (run with: python loadtest.py "http://127.0.0.1:3001/api/trending?region=DE")
import asyncio, sys, time
import httpx

URL = sys.argv[1]
CONCURRENCY = 64
DURATION_S = 20

async def worker(client, stop_at, latencies):
    # Issue requests back to back until the deadline, recording each latency in ms
    while time.perf_counter() < stop_at:
        t0 = time.perf_counter()
        r = await client.get(URL)
        r.read()
        latencies.append((time.perf_counter() - t0) * 1000)

async def main():
    latencies = []
    limits = httpx.Limits(max_connections=CONCURRENCY, max_keepalive_connections=CONCURRENCY)
    async with httpx.AsyncClient(limits=limits, timeout=10) as client:
        await client.get(URL)  # warm one connection
        stop_at = time.perf_counter() + DURATION_S
        await asyncio.gather(*[worker(client, stop_at, latencies) for _ in range(CONCURRENCY)])

    latencies.sort()
    n = len(latencies)
    pct = lambda p: latencies[min(n - 1, int(n * p))]
    print(f"requests   : {n}")
    print(f"throughput : {n / DURATION_S:,.0f} req/s")
    print(f"p50 (ms)   : {pct(0.50):.2f}")
    print(f"p95 (ms)   : {pct(0.95):.2f}")
    print(f"p99 (ms)   : {pct(0.99):.2f}")
    print(f"max (ms)   : {latencies[-1]:.2f}")

asyncio.run(main())
Both servers were pinned to the same 4 vCPU box, single process (no clustering — I wanted per-core honesty), same SQLite file with the WAL already checkpointed, 64 concurrent keep-alive connections, 20-second runs, three runs each, median reported. The database held ~50k rows across 14 European regions.
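"Three runs each, median reported" can be read more than one way. The sketch below takes the per-metric median across runs, which is one plausible reading and not necessarily how the figures in this post were aggregated. RunSummary and medianOfRuns are hypothetical names.

```typescript
// One summary per load-test run, mirroring the harness output fields.
type RunSummary = { rps: number; p50: number; p95: number; p99: number; max: number };

// Median of a non-empty list (mean of the two middle values for even counts).
function median(values: number[]): number {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Take the median of each metric independently across runs.
function medianOfRuns(runs: RunSummary[]): RunSummary {
  const pick = (k: keyof RunSummary): number => median(runs.map((r) => r[k]));
  return { rps: pick("rps"), p50: pick("p50"), p95: pick("p95"), p99: pick("p99"), max: pick("max") };
}
```

With per-metric medians, the reported p99 and the reported throughput may come from different runs. The alternative is to pick the single run with the median throughput and report all of its numbers together.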
The numbers
Here is the median of three runs, region=DE, limit=24, payload ~9 KB:
- Throughput — Bun: ~58,200 req/s. Node: ~41,900 req/s. Bun is roughly 1.39× higher.
- p50 latency — Bun: 1.02 ms. Node: 1.46 ms.
- p95 latency — Bun: 1.71 ms. Node: 2.58 ms.
- p99 latency — Bun: 2.34 ms. Node: 4.10 ms.
- Max latency under load — Bun: 19 ms. Node: 41 ms.
- Cold start — Bun process to first served request: ~38 ms. Node: ~112 ms.
- RSS at steady state — Bun: ~71 MB. Node: ~96 MB.
The throughput gap is real but smaller than the marketing suggests — 1.4×, not 4×. Where Bun genuinely pulled away was the tail: p99 was consistently 40–45% lower, and the worst-case max was less than half of Node's. For a viral-traffic endpoint that is exactly the metric I lose sleep over, because the tail is what a user feels when something is blowing up and the cache is cold.
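For anyone who wants to check the ratios quoted above, this is the arithmetic on the reported figures. The numbers are copied from the list; only the variable names are new.

```typescript
// Figures as reported in the results list (median of three runs).
const reported = {
  bun: { rps: 58200, p99: 2.34, max: 19 },
  node: { rps: 41900, p99: 4.1, max: 41 },
};

// Throughput: how many times more requests per second Bun served.
const throughputRatio = reported.bun.rps / reported.node.rps;

// Tail: fractional reduction in p99 relative to Node.
const p99Reduction = 1 - reported.bun.p99 / reported.node.p99;

// Worst case: Bun's max latency as a fraction of Node's.
const maxRatio = reported.bun.max / reported.node.max;

console.log(`throughput: ${throughputRatio.toFixed(2)}x`); // 1.39x
console.log(`p99 reduction: ${(p99Reduction * 100).toFixed(1)}%`); // 42.9%
console.log(`max ratio: ${maxRatio.toFixed(2)}`); // 0.46
```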
Two measured causes, not guesses:
- Serialization. I isolated it with a micro-benchmark: serializing the 24-item payload 1M times took ~1.28s on Bun's native path versus ~1.95s with Node's JSON.stringify. JSON is the dominant per-request cost here, so this alone explains most of the throughput delta.
- GC pauses. Node's max-latency outliers lined up with young-generation GC. Bun's JavaScriptCore-based allocator produced fewer and shorter pauses on this allocation pattern. Neither runtime had a long pause, but Node's were more frequent, which is what fattened its p99.
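A serialization micro-benchmark of the kind described in the first bullet might look roughly like this. It is a sketch, not the harness behind the numbers above: makePayload and benchStringify are hypothetical helpers, the row contents are synthetic, and it times plain JSON.stringify only, so it says nothing about Bun's Response.json path.

```typescript
type Item = { id: string; title: string; channel: string; score: number; thumbs: string[] };

// Build a synthetic response body with the same shape as the endpoint's payload.
function makePayload(n: number) {
  const items: Item[] = Array.from({ length: n }, (_, i) => ({
    id: `vid${i}`,
    title: `Placeholder title number ${i}`,
    channel: `channel-${i % 5}`,
    score: 1000 - i,
    thumbs: [`https://cdn.example/${i}/hq.webp`, `https://cdn.example/${i}/sd.webp`],
  }));
  return { region: "DE", count: n, token: "placeholder-token", items };
}

// Time `iterations` calls to JSON.stringify on the same object.
// The output length is accumulated so the work cannot be discarded as dead code.
function benchStringify(payload: unknown, iterations: number): { ms: number; chars: number } {
  let chars = 0;
  const t0 = performance.now();
  for (let i = 0; i < iterations; i++) {
    chars += JSON.stringify(payload).length;
  }
  return { ms: performance.now() - t0, chars };
}

const result = benchStringify(makePayload(24), 100_000);
console.log(`100,000 iterations: ${result.ms.toFixed(0)} ms`);
```

Running the same file under both runtimes gives a like-for-like JSON.stringify comparison. Discard the first run or two, since JIT warm-up would otherwise dominate a short loop.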
Putting it in perspective against the boring stack
The honest move is to benchmark the runtime you'd otherwise reach for. Our writer and most of the site run on PHP 8.4 under LiteSpeed, and PHP is genuinely fast for this shape of work with OPcache and persistent prepared statements. Here is the equivalent endpoint:
<?php
// trending.php: served by LiteSpeed/PHP 8.4, OPcache on
declare(strict_types=1);

$region = strtoupper(substr($_GET['region'] ?? 'GB', 0, 2));
// Clamp to 1..50: SQLite treats a negative LIMIT as "no limit".
$limit = max(1, min((int)($_GET['limit'] ?? 24), 50));

static $db = null;
if ($db === null) {
    $db = new SQLite3(__DIR__ . '/trending.db', SQLITE3_OPEN_READONLY);
    $db->busyTimeout(2000);
    $db->exec('PRAGMA query_only = ON;');
}

$stmt = $db->prepare(
    'SELECT id, title, channel, region, score, thumb_base
     FROM videos WHERE region = :r ORDER BY score DESC LIMIT :n'
);
$stmt->bindValue(':r', $region, SQLITE3_TEXT);
$stmt->bindValue(':n', $limit, SQLITE3_INTEGER);
$res = $stmt->execute();

$items = [];
while ($row = $res->fetchArray(SQLITE3_ASSOC)) {
    $items[] = [
        'id' => $row['id'],
        'title' => $row['title'],
        'channel' => $row['channel'],
        'score' => $row['score'],
        'thumbs' => ["{$row['thumb_base']}/hq.webp", "{$row['thumb_base']}/sd.webp"],
    ];
}

// Token payload: region + row count + minute bucket, signed with HMAC-SHA256 (hex, truncated)
$payload = "{$region}." . count($items) . '.' . (int)floor(time() / 60);
$mac = substr(hash_hmac('sha256', $payload, getenv('ANALYTICS_SECRET') ?: 'dev-secret', false), 0, 16);

header('Content-Type: application/json');
header('Cache-Control: public, max-age=30, s-maxage=120');
echo json_encode([
    'region' => $region,
    'count' => count($items),
    'token' => "{$payload}.{$mac}",
    'items' => $items,
], JSON_UNESCAPED_SLASHES);
Under LiteSpeed with OPcache, this PHP endpoint held ~24,000 req/s with a p99 around 6 ms. That is slower than both JS runtimes per core, but it already runs behind a mature SAPI that handles connection pooling and process management, and it sits under the same Cloudflare cache. The takeaway is sobering: at our actual cache hit-rate (~92% served from the edge), the origin runtime almost never decides user-facing latency. The 8% of requests that reach origin are the only ones where Bun's tail advantage shows up at all.
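The hit-rate point is easy to put into numbers. The sketch below shows how origin load scales with the miss rate. The 92% figure is from this post; the 3,000 req/s edge rate is a placeholder for "a few thousand", and the 60% spike hit-rate is purely hypothetical, since I did not record a figure for how far it drops.

```typescript
// Requests per second that fall through the edge cache to origin.
function originRps(edgeRps: number, hitRate: number): number {
  if (hitRate < 0 || hitRate > 1) throw new RangeError("hitRate must be in [0, 1]");
  return edgeRps * (1 - hitRate);
}

const edgeRps = 3000; // placeholder, not a measured value
const normal = originRps(edgeRps, 0.92); // steady state from the post
const spike = originRps(edgeRps, 0.6); // hypothetical cold-cache spike

console.log(`origin at 92% hit-rate: ${normal.toFixed(0)} req/s`); // 240 req/s
console.log(`origin at 60% hit-rate: ${spike.toFixed(0)} req/s`); // 1200 req/s
console.log(`load multiplier: ${(spike / normal).toFixed(1)}x`); // 5.0x
```

The multiplier depends only on the two miss rates (0.40 / 0.08), not on the edge request rate, which is why a modest drop in hit-rate hits the origin so hard.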
For completeness I also wrote the endpoint in Go, because if you're going to add a runtime to a stack, you should ask whether a compiled one wins outright:
// main.go (run with: go run main.go)
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"database/sql"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"strconv"
	"strings"
	"time"

	_ "github.com/mattn/go-sqlite3"
)

var db *sql.DB

type item struct {
	ID      string   `json:"id"`
	Title   string   `json:"title"`
	Channel string   `json:"channel"`
	Score   float64  `json:"score"`
	Thumbs  []string `json:"thumbs"`
}

// sign builds the same region.count.minute token and appends a truncated HMAC.
func sign(region string, n int) string {
	payload := fmt.Sprintf("%s.%d.%d", region, n, time.Now().Unix()/60)
	m := hmac.New(sha256.New, []byte("dev-secret"))
	m.Write([]byte(payload))
	return payload + "." + base64.RawURLEncoding.EncodeToString(m.Sum(nil))[:16]
}

func handler(w http.ResponseWriter, r *http.Request) {
	region := strings.ToUpper(r.URL.Query().Get("region"))
	if len(region) > 2 {
		region = region[:2]
	}
	limit, _ := strconv.Atoi(r.URL.Query().Get("limit"))
	if limit <= 0 || limit > 50 {
		limit = 24
	}
	rows, err := db.Query(`SELECT id,title,channel,score,thumb_base
		FROM videos WHERE region=? ORDER BY score DESC LIMIT ?`, region, limit)
	if err != nil {
		// Without this check a failed query leaves rows nil and rows.Close() panics.
		http.Error(w, "query failed", http.StatusInternalServerError)
		return
	}
	defer rows.Close()
	items := make([]item, 0, limit)
	for rows.Next() {
		var it item
		var base string
		if err := rows.Scan(&it.ID, &it.Title, &it.Channel, &it.Score, &base); err != nil {
			continue // skip rows that fail to scan
		}
		it.Thumbs = []string{base + "/hq.webp", base + "/sd.webp"}
		items = append(items, it)
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]any{
		"region": region, "count": len(items),
		"token": sign(region, len(items)), "items": items,
	})
}

func main() {
	var err error
	db, err = sql.Open("sqlite3", "file:trending.db?mode=ro&_query_only=1")
	if err != nil {
		log.Fatal(err)
	}
	db.SetMaxOpenConns(8)
	http.HandleFunc("/api/trending", handler)
	log.Fatal(http.ListenAndServe(":3003", nil))
}
Go topped the chart at ~73,000 req/s with a p99 of 1.4 ms and a flat 28 MB RSS — no GC tail to speak of for this allocation pattern. So if pure throughput were the only axis, Go wins. But Go also means our team rewrites the analytics token logic, the validation, and the test suite in a third language. Bun let me keep the existing TypeScript token library verbatim. That portability is a real, if unglamorous, part of the decision.
What I actually shipped, and the caveats
We moved /api/trending and two sibling metadata endpoints to Bun. Not because of the throughput headline, but because of three things that compounded:
- Tail latency on cache-miss requests dropped enough to matter during viral spikes.
- Cold start under 40 ms makes Bun viable for the parts of the pipeline we run as ephemeral processes, where Node's ~112 ms was a tax on every invocation.
- Zero added dependencies: dropping better-sqlite3 and our HTTP framework shrank the install and the attack surface.
The caveats I would put in any honest writeup:
- The gap is workload-specific. This endpoint is serialization-bound with trivial logic. On an app dominated by database round-trips or external HTTP, the runtime delta shrinks toward noise — you'd be optimizing the 5% that isn't your bottleneck.
- bun:sqlite is excellent but young. I keep the writer in PHP and run the Bun process strictly read-only, so a driver bug can never corrupt data.
- Edge cache dominates. At a 92% Cloudflare hit-rate, the origin runtime choice is a rounding error for most users. Bun earned its place by improving the unlucky 8%, not the median.
- Don't migrate a monolith for this. We moved three hot endpoints, not the codebase. The boring PHP + SQLite WAL core stayed exactly where it was, and that's the right call.
Conclusion
Bun is faster than Node for a JSON-heavy, SQLite-backed metadata API — about 1.4× on throughput and, more importantly, 40–45% lower at p99 — and the native SQLite driver plus near-instant cold start are genuine, measurable wins. But it is not the 4× miracle the benchmarks imply, and a compiled runtime like Go still beats it on raw numbers if you're willing to pay the rewrite. The right answer was never "rewrite everything." It was: measure your specific tail, move the two or three endpoints where the tail actually hurts, and leave the boring, reliable core alone. If you're choosing a runtime, benchmark your payload under your concurrency before you trust anyone's chart — including this one.