Aditya Srivastava

Posted on May 24

Node.js vs Bun vs Go - A Multi-Layer HTTP Benchmark

🎯 Premise

I came across a video discussing Bun's new native image support claiming "blazing fast" performance. As a software developer with 4+ years building Node services, I wanted to quantify Bun's actual performance characteristics compared to Node.js in realistic scenarios.

I included Go as a baseline representing compiled, systems-level performance to contextualize the JavaScript runtime results.

⚡ What this benchmark tests: Raw HTTP throughput serving static JSON responses

❌ What this benchmark does NOT test: Database I/O, JSON parsing, real application logic, memory pressure, error handling, or long-tail latency under sustained load

🔬 Methodology: From Localhost to Cloud

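Localhost benchmarks are often misleading: traffic never leaves the machine, so they measure loopback and memory-copy speed rather than real-world network and system constraints. To see how each runtime behaves as the bottleneck moves, I tested in three environments across four phases (the cloud environment was run once with 1 core and once with 4 cores):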

📊 Test Phases

Phase	Environment	What It Reveals
1️⃣ Localhost	Loopback interface	Pure event loop overhead
2️⃣ LAN over Tailscale	WiFi network	Network I/O constraints
3️⃣ Cloud Datacenter	DigitalOcean droplets	Removes local hardware limits

⚙️ Test Configuration

Workload: GET /json → {"message": "Hello from [runtime]"}

Hardware:

💻 Local: Windows dev machine, Tenda U10 USB WiFi adapter (WiFi 3)
☁️ Cloud: DigitalOcean shared droplets (burstable vCPUs), 10 Gbps datacenter network

Load generator: wrk -t2 -c200 -d30s

⚠️ Important: Each test was run once. For production decisions, you'd want 5+ runs with statistical analysis (median, stddev, confidence intervals).
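
As a rough sketch of what that analysis could look like, the helper below summarizes RPS figures from repeated runs. The function name summarizeRuns and the sample values are hypothetical, not measurements from this benchmark, and the 95% interval uses a normal approximation, which is only a loose guide for as few as 5 runs.

```javascript
// Summarize throughput (RPS) from repeated benchmark runs.
// Hypothetical helper; not part of the original benchmark scripts.
function summarizeRuns(rpsValues) {
  if (rpsValues.length < 2) {
    throw new Error("need at least 2 runs to estimate spread");
  }
  const sorted = [...rpsValues].sort((a, b) => a - b);
  const n = sorted.length;
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const mean = sorted.reduce((sum, v) => sum + v, 0) / n;
  // Sample standard deviation (divides by n - 1).
  const variance = sorted.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
  const stddev = Math.sqrt(variance);
  // Normal-approximation 95% confidence interval for the mean.
  const halfWidth = (1.96 * stddev) / Math.sqrt(n);
  return { n, median, mean, stddev, ci95: [mean - halfWidth, mean + halfWidth] };
}

// Made-up sample values, for illustration only.
console.log(summarizeRuns([25100, 25444, 24890, 25610, 25300]));
```

If the intervals of two runtimes overlap heavily, a single-run difference between them should not be read as a real gap.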

🏁 Phase 1: Localhost Baseline

Testing over the loopback interface to measure pure event loop and syscall overhead without network constraints.

📈 Results: Local Memory Performance

Configuration	Node.js	Bun	Go
1 CPU Core	~14,000 RPS	~28,000 RPS	~29,000 RPS
4 CPU Cores (Single Process)	~16,000 RPS	~30,000 RPS	~115,000 RPS 🚀
4 CPU Cores (Multi-Process)	~110,000 RPS	~170,000 RPS 🏆	N/A

💡 Analysis

🧵 Single-threaded bottleneck: Node and Bun's JavaScript execution is single-threaded. Without process clustering, they max out one CPU core even with --cpus="4", leaving 3 cores idle. Go's M:N scheduler automatically utilizes all available cores in a single process.

📊 Multi-process scaling: Once clustered (Node's cluster module, Bun with reusePort: true spawned 4 times), both JavaScript runtimes showed strong scaling. Bun's lighter-weight event loop (Bun is written largely in Zig) delivered ~55% higher throughput than Node's libuv-based one (~170k vs ~110k RPS). V8 is Node's JavaScript engine, not its event loop.

✅ What this tells us: For CPU-bound workloads on the loopback interface, Bun's event loop has measurably lower per-request overhead than Node. Go's native multi-core scheduling eliminates the need for manual clustering.
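
To put "strong scaling" in numbers, the sketch below divides each runtime's best 4-core result by its 1-core result, using the approximate values from the table above. The helper name scalingFactor is hypothetical.

```javascript
// Speedup of the best 4-core result over the 1-core result.
// RPS values are the approximate figures from the localhost table.
function scalingFactor(multiCoreRps, singleCoreRps) {
  return multiCoreRps / singleCoreRps;
}

const localhost = {
  "Node.js": { oneCore: 14000, fourCore: 110000 }, // clustered
  "Bun": { oneCore: 28000, fourCore: 170000 }, // multi-process
  "Go": { oneCore: 29000, fourCore: 115000 }, // single process
};

for (const [runtime, r] of Object.entries(localhost)) {
  const factor = scalingFactor(r.fourCore, r.oneCore);
  console.log(`${runtime}: ${factor.toFixed(1)}x on 4 cores`);
}
```

With the table's rounded figures this gives roughly 7.9x for Node.js, 6.1x for Bun, and 4.0x for Go. A factor above 4 on 4 cores is more than extra cores alone can explain, so the 1-core JavaScript runs may have been held back by something other than the runtime. This data cannot say what.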

🌐 Phase 2: Network-Constrained Reality (Tailscale over WiFi)

Requests now traverse a physical network: MacBook → USB WiFi adapter → Tailscale WireGuard VPN → Windows dev machine.

📊 Results: Network-Bound I/O (4 Cores, Clustered)

Metric	Node.js	Bun	Go
Throughput	7,954 RPS	12,519 RPS	12,873 RPS 🏆
Avg Latency	26.79 ms	16.49 ms	15.69 ms ⚡
Max Latency	864.53 ms ⚠️	163.24 ms	152.21 ms ✅
Bandwidth	1.62 MB/s	1.76 MB/s	1.67 MB/s

💡 Analysis

🔌 Network becomes the bottleneck: All three runtimes collapsed from 110k-170k RPS (4 cores on localhost) to 8-13k RPS. The WiFi 3 USB adapter (theoretical max ~54 Mbps, real-world much lower) became the limiting factor.

⚠️ Node's outlier spike: The 864 ms max latency could come from either of the following:

A garbage collection pause under network pressure
IPC coordination delays between the cluster primary and its workers when packets arrive in bursts

I did not investigate which one it was. That would need GC tracing (for example --trace-gc) or tuning flags, plus p99 analysis rather than a single max value.

📌 What this tells us: This phase primarily measured my WiFi adapter's limitations, not runtime performance. However, it does show that once network I/O becomes the constraint, runtime choice matters less than network hardware quality.
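
A related sanity check: wrk keeps a fixed number of connections open and sends the next request on a connection only after the previous response arrives, so throughput and average latency are tied together by Little's Law (throughput ≈ connections / average latency). The sketch below applies that to the Go column of the table above. The helper name is hypothetical and the relation is only approximate for real runs.

```javascript
// Little's Law for a closed-loop load generator:
// throughput (req/s) ≈ open connections / average latency (s).
function expectedRps(connections, avgLatencyMs) {
  return connections / (avgLatencyMs / 1000);
}

// Go over Tailscale/WiFi: wrk -c200, 15.69 ms average latency.
const predicted = expectedRps(200, 15.69);
console.log(`predicted ≈ ${Math.round(predicted)} RPS, measured 12,873 RPS`);
```

The prediction lands within about 1% of the measured figure. At a fixed -c200, the throughput and average-latency rows are therefore largely two views of the same measurement rather than independent findings.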

☁️ Phase 3: Cloud Infrastructure (1 Core)

Moved to DigitalOcean droplets to remove local hardware constraints. Target and load generator in same datacenter.

🐳 Docker Configuration

docker run --rm --cpus="1" -m="512m" -p 3000:3000 [image]

⚠️ Note: --cpus="1" uses CFS CPU quotas, not core pinning. The container can still migrate between physical cores, introducing cache invalidation. Should have used --cpuset-cpus="0" for true single-core isolation.

📊 Results: Cloud Single-Core

Metric	Node.js	Go	Bun
Throughput	11,705 RPS	13,935 RPS	25,444 RPS 🚀
Avg Latency	29.13 ms	22.83 ms	7.89 ms ⚡
Max Latency	2,000.00 ms ⚠️	135.97 ms	93.27 ms ✅
Failed Requests	60 (timeout) 🔴	0 ✅	0 ✅

💡 Analysis

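🔴 Node's timeout failures: The 2,000 ms max latency matches wrk's default 2-second timeout, so the 60 failed requests are ones that got no response within that window. GC pauses are a plausible cause, but a single run cannot confirm it. This test should have been re-run with Node tuning flags (--max-old-space-size, --optimize-for-size) to determine if this is fundamental or tunable.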

🏆 Bun's single-core dominance: Nearly double Go's throughput on a single core is impressive, but remember this is for a trivial 40-byte JSON response. Real applications doing actual work may show different patterns.

🚀 Phase 4: Cloud Multi-Core (4 Cores)

Full resource allocation with clustering enabled for JavaScript runtimes.

🐳 Configuration

docker run --rm --cpus="4" -m="512m" -p 3000:3000 [image]

⚔️ The Load Generator

# Attacker VM - wrk in Alpine container
docker run --rm alpine sh -c "apk add --no-cache wrk && \
  wrk -t2 -c200 -d30s http://159.65.6.89:3000/json"

📊 Results: Cloud 4-Core Maximum Throughput

Metric	Node.js	Go	Bun
Throughput	31,025 RPS	37,617 RPS	53,446 RPS 🏆
Total Requests (30s)	933,074	1,130,171	1,605,818 🎯
Avg Latency	8.62 ms	5.79 ms	4.04 ms ⚡
Max Latency	641.25 ms ⚠️	218.91 ms	76.54 ms ✅
CPU Usage	400%+	340%	383%

💡 CPU Efficiency Deep Dive

The raw CPU % numbers are misleading. What matters is CPU cost relative to throughput, that is, CPU % divided by requests per second:

📊 Efficiency Ranking:
┌──────────────────────────────────┐
│ Bun:  0.0072% CPU per RPS  🥇   │
│ Go:   0.0090% CPU per RPS  🥈   │
│ Node: 0.0129% CPU per RPS  🥉   │
└──────────────────────────────────┘

Calculation:

Node: 400% / 31k RPS = 0.0129% per RPS (a lower bound, since Node was at 400%+)
Go: 340% / 37.6k RPS = 0.0090% per RPS
Bun: 383% / 53.4k RPS = 0.0072% per RPS
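
The same ratio can be expressed as core time per request, which is easier to reason about than a percentage. The sketch below assumes the CPU figures are Docker-style percentages where 100% is one fully busy core. The helper name is hypothetical.

```javascript
// Core time per request, in microseconds.
// cpuPercent is Docker-style: 100 means one core fully busy.
function coreMicrosPerRequest(cpuPercent, rps) {
  const coreSecondsPerSecond = cpuPercent / 100;
  return (coreSecondsPerSecond / rps) * 1e6;
}

const results = [
  { runtime: "Node.js", cpuPercent: 400, rps: 31025 }, // reported as 400%+
  { runtime: "Go", cpuPercent: 340, rps: 37617 },
  { runtime: "Bun", cpuPercent: 383, rps: 53446 },
];

for (const r of results) {
  const micros = coreMicrosPerRequest(r.cpuPercent, r.rps);
  console.log(`${r.runtime}: ${micros.toFixed(0)} µs of core time per request`);
}
```

This works out to roughly 129 µs for Node.js, 90 µs for Go, and 72 µs for Bun, the same ranking as above.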

💡 Key Insight: Bun is genuinely the most efficient, but Node's higher absolute CPU usage just means all workers are busy—which is what you want under load.

🤔 Go's "idle" CPU: The 340% (leaving 60% unused) might indicate:

GOMAXPROCS not set correctly
Network socket polling leaving CPU headroom
Or simply more efficient syscall handling

⚖️ Code Architecture Comparison

🚨 Critical Difference: The Go Code is Heavily Optimized

The Node and Bun code use default patterns, but the Go implementation uses production micro-optimizations:

Go optimizations applied:

var jsonResponse = []byte(`{"message":"Hello from Go!"}`)  // Pre-rendered
w.Write(jsonResponse)  // Direct byte write, no JSON encoding

Equivalent fair comparison would be:

// Fair comparison - same as Node/Bun pattern
json.NewEncoder(w).Encode(map[string]string{"message": "Hello from Go!"})

⚠️ Impact: Switching to the encoder version makes Go an estimated ~15-20% slower but tests equivalent functionality. As run, the benchmark favors Go's implementation.
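
The other way to level the field is to give the JavaScript servers the same optimization. A minimal sketch of what pre-rendering could look like in Node or Bun follows. The names are hypothetical and this variant was not benchmarked.

```javascript
// Stock pattern: serialize the object on every request.
function encodedBody() {
  return Buffer.from(JSON.stringify({ message: "Hello from Node!" }));
}

// Pre-rendered pattern: serialize once at startup, reuse the same bytes.
const PRE_RENDERED = Buffer.from(JSON.stringify({ message: "Hello from Node!" }));
function preRenderedBody() {
  return PRE_RENDERED;
}

// Both produce identical bytes; only the per-request work differs.
console.log(preRenderedBody().equals(encodedBody()));
```

In the Node handler, res.end(preRenderedBody()) would take the place of res.end(JSON.stringify(...)).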

🔍 Bun "Clustering" Isn't Actually Clustering

The Bun code uses reusePort: true in a single process. This enables kernel-level socket load balancing but doesn't spawn multiple processes like Node's cluster module.

For true architectural equivalence:

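// This is what would match Node's architecture:
// a launcher that starts 4 copies of server.js. Each copy calls Bun.serve
// with reusePort: true so they can share port 3000 (Linux only).
import { spawn } from "bun";
for (let i = 0; i < 4; i++) {
  spawn(["bun", "server.js"], { stdout: "inherit", stderr: "inherit" });
}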

💡 Note: The current test compares single-process Bun vs multi-process Node, which actually makes Bun's performance even more impressive but should be disclosed.

🎯 What This Benchmark Actually Tells Us

✅ Valid Conclusions

1. ⚡ Bun's event loop has lower per-request overhead than Node 
   for simple HTTP responses

2. 📈 Bun scales efficiently to multiple cores via kernel-level 
   socket distribution  

3. 🎯 Go provides predictable performance with excellent CPU 
   efficiency

4. 🌐 Network hardware matters more than runtime choice once 
   you hit I/O limits

5. 🔄 Clustered Node showed the largest latency spikes under high 
   load; IPC overhead is a suspected but unverified cause

❌ Invalid Conclusions

~~"Bun is production-ready"~~

Speed ≠ ecosystem maturity, debugger support, APM tooling

~~"Node is slow"~~

This tests static JSON echo; database-heavy apps show different patterns

~~"Go is always better"~~

Developer productivity, ecosystem, and deployment complexity matter

~~"These numbers apply to my app"~~

Real apps do parsing, validation, DB queries, business logic

🔍 Limitations & What's Missing

❌ Not Tested

Realistic payloads: 10KB+ JSON parsing and validation
Database I/O: Connection pooling, query performance
Memory pressure: Behavior at 80%+ RAM utilization
Sustained load: 24-hour endurance, memory leaks
Error handling: Behavior under packet loss, slow clients
Cold starts: Container spin-up time (critical for serverless)
Long-tail latency: p95, p99, p99.9 percentiles over hours
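
For reference, tail percentiles can be computed from raw per-request latencies with a few lines. The sketch below uses the nearest-rank method on made-up sample values. wrk itself can print a latency distribution via its --latency flag.

```javascript
// Nearest-rank percentile over recorded latencies (milliseconds).
function percentile(latenciesMs, p) {
  if (latenciesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  // 1-based rank; multiply before dividing to limit floating-point drift.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Made-up samples for illustration: 98 fast requests and 2 slow ones.
const samples = [...Array(98).fill(5), 120, 864];
for (const p of [50, 95, 99, 99.9]) {
  console.log(`p${p}: ${percentile(samples, p)} ms`);
}
```

In this made-up data the max is 864 ms while p95 is still 5 ms, which is why a single max value says little about what most requests experienced.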

🔧 Methodological Improvements Needed

📊 Statistical rigor

5+ runs per config with statistical significance testing

🎯 Proper CPU pinning

Use --cpuset-cpus instead of --cpus

⚙️ GC tuning for Node

Test with optimized V8 flags

⚖️ Fair code comparison

Either optimize all three or use stock patterns for all

🔄 Proper clustering for Bun

Multi-process architecture to match Node

🎯 Production Recommendations

Choose based on your actual constraints:

🐰 Use Bun if:

✅ You have existing Node.js code and want drop-in performance gains
✅ Your workload is I/O-heavy API routing/proxying
✅ You're comfortable with a newer ecosystem (risk tolerance)
⚠️  You can handle potential edge cases in package compatibility

🔷 Use Go if:

✅ You need predictable resource consumption for Kubernetes limits
✅ Your team values type safety and compile-time checks
✅ You're building infrastructure/platform services
✅ You want multi-core scaling from a single process, with no manual clustering
✅ Long-term stability and tooling maturity matter

🟢 Use Node.js if:

✅ You have existing Node infrastructure and expertise
✅ Your bottleneck is database/external services (not event loop)
✅ Ecosystem maturity and package availability are critical
✅ You need battle-tested observability/APM tooling

💎 The Real Takeaway

For a 40-byte JSON echo server, Bun is measurably faster than Node.js.

But real applications aren't JSON echo servers. Your actual bottlenecks are probably:

🗄️  Database query time
🌐  External API latency  
🧮  Business logic complexity
📡  Network infrastructure

Profile your real workload before choosing a runtime based on microbenchmarks.

That said, Bun's performance characteristics are impressive and worth evaluating for I/O-heavy services where event loop overhead matters.

📝 Full Code Listings

Node.js (Clustered)

const cluster = require('cluster');
const http = require('http');
const numCPUs = 4;

if (cluster.isPrimary) { // Node 16+; cluster.isMaster is the deprecated alias
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    console.log(`Worker ${worker.process.pid} died`);
    cluster.fork();
  });
} else {
  http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/json') {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ message: "Hello from Clustered Node!" }));
    } else {
      res.writeHead(404);
      res.end();
    }
  }).listen(3000);
}

Bun (Single Process with reusePort)

Bun.serve({
  port: 3000,
  reusePort: true, // Kernel-level socket load balancing
  fetch(request) {
    const url = new URL(request.url);
    if (request.method === 'GET' && url.pathname === '/json') {
      return new Response(
        JSON.stringify({ message: "Hello from Bun!" }), 
        { headers: { 'Content-Type': 'application/json' } }
      );
    }
    return new Response("Not Found", { status: 404 });
  },
});

Go (Optimized - Not Fair Comparison)

⚠️ This version pre-renders the response and skips JSON encoding

package main

import (
    "fmt"
    "net/http"
)

// Pre-rendered response eliminates JSON encoding overhead
var jsonResponse = []byte(`{"message":"Hello from Go!"}`)

func jsonHandler(w http.ResponseWriter, r *http.Request) {
    if r.Method != http.MethodGet {
        w.WriteHeader(http.StatusMethodNotAllowed)
        return
    }
    w.Header().Set("Content-Type", "application/json")
    w.Write(jsonResponse) // Note: Ignoring error - not production-safe
}

func main() {
    server := &http.Server{
        Addr:    ":3000",
        Handler: http.HandlerFunc(jsonHandler),
    }
    fmt.Println("Go server running on port 3000")
    server.ListenAndServe()
}

Go (Fair Comparison - Uses JSON Encoding)

✅ This version matches Node/Bun's approach

package main

import (
    "encoding/json"
    "net/http"
)

type Response struct {
    Message string `json:"message"`
}

func jsonHandler(w http.ResponseWriter, r *http.Request) {
    if r.Method != http.MethodGet {
        w.WriteHeader(http.StatusMethodNotAllowed)
        return
    }
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(Response{Message: "Hello from Go!"})
}

func main() {
    http.HandleFunc("/json", jsonHandler)
    http.ListenAndServe(":3000", nil)
}

🙏 Acknowledgments

Thanks to the readers who will inevitably point out additional issues I missed. Benchmarking is hard, and there's always room for improvement.

If you want to reproduce these tests or improve the methodology, feel free to reach out!

Found this useful? Drop a ❤️ and let me know what you'd like to see benchmarked next!