ZyVOP

Posted on May 27 • Originally published at zyvop.com

Node.js Performance Profiling: Finding the Bottleneck Before Your Users Do

#nodejsperformanceprofiling #clinicjsnodejs #flamegraphnodejs #0xprofiling

There is a specific kind of production bug that is worse than a crash: a performance regression. A crash is visible. It pages someone, generates an error in Sentry, and gets fixed. A performance regression sits in the background — requests take 400ms instead of 80ms, the event loop lags under load, a specific endpoint times out occasionally. Users leave. Support tickets accumulate. The team assumes it is infrastructure.

It is almost never infrastructure. It is almost always code.

This guide covers the full profiling workflow: identifying the problem from metrics, profiling the CPU and event loop, reading flame graphs, and fixing the specific patterns responsible for most Node.js performance regressions.

Know What You Are Looking For First

Before profiling, you need to know which metric is degraded. Guessing which code to optimize without measurement is how you spend a day optimizing a function that runs 10 times per day.

The signals that indicate specific problems:

High CPU, slow requests → CPU-bound work blocking the event loop. Look for synchronous operations, expensive computations, or tight loops.

Normal CPU, high latency → I/O-bound or event loop lag. Look for unoptimized database queries, N+1 patterns, missing indexes, or slow external API calls.

Memory climbing → Leak. See the memory leaks guide.

High CPU under moderate load → Garbage collector thrashing. Many short-lived allocations, or large objects being created repeatedly.

Specific endpoint slow, others fine → Query or logic problem scoped to that code path. Profile that endpoint specifically.

The Tools

clinic.js — the most useful suite for Node.js profiling. Three tools in one:

clinic doctor — identifies the category of problem (CPU, I/O, memory, event loop)
clinic flame — CPU flame graph, shows where time is spent
clinic bubbleprof — async profiling, shows where the event loop waits

0x — single-command flame graphs. Faster to use than clinic when you already know it is a CPU issue.

node --prof + node --prof-process — V8's built-in profiler, no dependencies required, produces similar data to flame graphs.

npm install -g clinic 0x autocannon

Step 1 — clinic doctor (Diagnose First)

clinic doctor runs your app under load and produces a report that tells you which category of problem you have before you spend time on the wrong kind of profiling.

# Start your app under clinic doctor
clinic doctor -- node src/server.js

# In another terminal, apply load
autocannon -c 100 -d 30 http://localhost:3000/api/orders

After the load stops, clinic doctor opens a report in your browser showing:

Event loop delay — if high, you have synchronous blocking or very heavy async operations
CPU usage — if consistently at 100%, you have CPU-bound work
Memory — if climbing, you have a leak
Handles/requests — if handles grow without requests growing proportionally, something is not being cleaned up

The report recommends which clinic tool to use next. Follow it.

Step 2 — clinic flame (CPU Profiling)

When doctor indicates a CPU problem, clinic flame produces a flame graph — a visualization where the width of each bar represents how much CPU time that function consumed.

clinic flame -- node src/server.js

# Apply focused load on the slow endpoint
# (URL is quoted so the shell does not treat "?" as a glob character)
autocannon -c 50 -d 20 "http://localhost:3000/api/search?q=laptop"

Reading a flame graph:

The bottom of the graph is the call stack entry point
Each bar above represents a function called by the one below it
Width = time spent in that function (wider = more time)
The top of each stack is where execution was when the sample was taken
Look for wide bars near the top — these are the expensive functions
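The reading rules above can be made concrete with a toy model of what a sampling profiler records. This is a minimal sketch, not how clinic flame or 0x work internally: `StackSample`, `summarizeSamples`, and the sample data are hypothetical names invented for illustration. It assumes each sample is a list of function names ordered from entry point to the frame that was executing.

```typescript
// One sampled call stack, ordered from entry point (bottom) to running frame (top).
type StackSample = string[];

interface FrameStats {
  total: number; // samples where the function appears anywhere in the stack (bar width)
  self: number;  // samples where the function was the top frame (actually executing)
}

function summarizeSamples(samples: StackSample[]): Map<string, FrameStats> {
  const stats = new Map<string, FrameStats>();
  const statsFor = (fn: string): FrameStats => {
    let entry = stats.get(fn);
    if (!entry) {
      entry = { total: 0, self: 0 };
      stats.set(fn, entry);
    }
    return entry;
  };

  for (const stack of samples) {
    // Count each function once per sample, even if it appears twice (recursion).
    new Set(stack).forEach((fn) => {
      statsFor(fn).total += 1;
    });
    // The last element is the frame that was executing when the sample was taken.
    if (stack.length > 0) {
      statsFor(stack[stack.length - 1]).self += 1;
    }
  }
  return stats;
}

// Hypothetical samples: four stacks captured while serving one search request.
const samples: StackSample[] = [
  ['handleRequest', 'search', 'scoreResults'],
  ['handleRequest', 'search', 'scoreResults'],
  ['handleRequest', 'search', 'scoreResults'],
  ['handleRequest', 'serialize'],
];

const summary = summarizeSamples(samples);
```

In this made-up data `handleRequest` has the widest bar (4 of 4 samples) but zero self time, because it only calls other functions. `scoreResults` is the wide bar near the top: it was executing in 3 of 4 samples, so it is the function to investigate.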

Common patterns in Node.js flame graphs:

Wide bar in JSON.parse or JSON.stringify — you are serializing large objects frequently. Consider streaming responses or reducing payload size.

Wide bar in a regex function — a regex is more expensive than expected, often because of catastrophic backtracking. Test your regexes with regexploit or a similar tool.

Wide bar in bcrypt or crypto — expected for hashing, but if it is in the hot path (every request, not just login), something is wrong.

Wide bar in your own business logic — investigate that function. Is it doing a computation that could be cached? Is it called more often than expected?
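The regex case is the least intuitive of these patterns, so here is a minimal sketch of catastrophic backtracking. The two patterns and the `timeMatch` helper are illustrative, not taken from any real codebase. Both patterns accept the same strings, but the nested quantifier makes a failed match exponentially more expensive in a backtracking engine such as V8's default one.

```typescript
import { performance } from 'perf_hooks';

// Nested quantifiers: (a+)+ can split a run of "a" characters in exponentially
// many ways, and a backtracking engine tries them all when the match fails.
const backtrackingPattern = /^(a+)+$/;

// Accepts the same strings without a nested quantifier: linear work on failure.
const linearPattern = /^a+$/;

function timeMatch(pattern: RegExp, input: string): { matched: boolean; ms: number } {
  const start = performance.now();
  const matched = pattern.test(input);
  return { matched, ms: performance.now() - start };
}

// A run of "a" followed by a character that forces the overall match to fail.
// Kept short on purpose: each extra "a" roughly doubles the slow pattern's work.
const input = 'a'.repeat(20) + '!';

const slow = timeMatch(backtrackingPattern, input);
const fast = timeMatch(linearPattern, input);
console.log(`nested: ${slow.ms.toFixed(2)}ms, linear: ${fast.ms.toFixed(2)}ms`);
```

Exact timings depend on the machine and Node.js version, so none are quoted here. The point is the growth rate: lengthening the input by a few characters leaves the linear pattern unchanged and multiplies the nested one.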

# Profile a specific script rather than a server
0x --open -- node src/scripts/generate-report.js

Step 3 — clinic bubbleprof (Async/I/O Profiling)

When doctor indicates I/O or event loop problems — not CPU — use bubbleprof. It shows where your code is waiting, not where it is running.

clinic bubbleprof -- node src/server.js
autocannon -c 50 -d 20 http://localhost:3000/api/orders

Bubbleprof shows a graph of async operations — database queries, HTTP calls, file I/O — and how long each one takes. Wide nodes are long waits.

What to look for:

Sequential awaits that could be parallel:

// SLOW — these run one after another
const user    = await getUser(userId);
const orders  = await getOrders(userId);
const profile = await getProfile(userId);
// Total time: getUser + getOrders + getProfile

// FAST — these run in parallel
const [user, orders, profile] = await Promise.all([
  getUser(userId),
  getOrders(userId),
  getProfile(userId),
]);
// Total time: max(getUser, getOrders, getProfile)

Missing connection pool configuration: If database operations show up as long waits, check your pool size. The default pg pool is 10 connections. Under 100 concurrent requests, requests queue waiting for a connection.

const pool = new Pool({
  connectionString: env.DATABASE_URL,
  max:             20,              // Increase pool size
  idleTimeoutMillis: 30_000,
  connectionTimeoutMillis: 5_000,  // Fail fast if no connection available
});
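A back-of-the-envelope estimate shows why pool size matters at this concurrency. This is a rough sketch under simplifying assumptions: all requests arrive at once, each holds exactly one connection for a fixed query time, and the 50 ms query time is a made-up figure. `estimateWorstQueueWaitMs` is a hypothetical helper, not part of pg.

```typescript
function estimateWorstQueueWaitMs(
  concurrentRequests: number,
  poolSize: number,
  queryTimeMs: number,
): number {
  // Requests are served in "waves" of poolSize connections.
  // The last wave has to wait for every earlier wave to finish.
  const waves = Math.ceil(concurrentRequests / poolSize);
  return Math.max(0, waves - 1) * queryTimeMs;
}

// 100 concurrent requests, 50 ms per query (assumed):
const withDefaultPool = estimateWorstQueueWaitMs(100, 10, 50); // 10 waves -> 450 ms of waiting
const withLargerPool = estimateWorstQueueWaitMs(100, 20, 50);  // 5 waves  -> 200 ms of waiting
```

The estimate ignores the cost on the database side. Every open connection consumes server resources, so measure before raising max much further.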

Step 4 — Event Loop Monitoring in Production

Flame graphs are taken in controlled environments. Production can behave differently. Add event loop lag monitoring to your metrics so you see regressions as they happen:

// src/lib/metrics.ts
import { monitorEventLoopDelay } from 'perf_hooks';
import client from 'prom-client';

// Sample the event loop delay every 20 ms
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

// Gauge for Prometheus
const eventLoopLag = new client.Gauge({
  name: 'nodejs_event_loop_lag_p99_ms',
  help: 'Node.js event loop lag 99th percentile in milliseconds',
});

// Publish the p99 every 10 seconds, then start a fresh window
setInterval(() => {
  // histogram values are in nanoseconds
  const p99Ms = histogram.percentile(99) / 1_000_000;
  eventLoopLag.set(p99Ms);
  histogram.reset();
}, 10_000);

Event loop lag above 100ms is a warning. Above 500ms, users are noticeably affected. Above 1000ms, requests are timing out.
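Those thresholds can be encoded once so that alerts and dashboards agree on them. A minimal sketch: the `LagSeverity` labels and the `classifyEventLoopLag` name are hypothetical, and the cut-offs are simply the ones stated above, which you may want to tune for your own service.

```typescript
type LagSeverity = 'ok' | 'warning' | 'degraded' | 'critical';

function classifyEventLoopLag(p99Ms: number): LagSeverity {
  if (p99Ms > 1000) return 'critical'; // requests are timing out
  if (p99Ms > 500) return 'degraded';  // users are noticeably affected
  if (p99Ms > 100) return 'warning';   // worth investigating
  return 'ok';
}
```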

Step 5 — The Common Fixes

Synchronous Operations in the Hot Path

// BLOCKS the event loop — no other requests can be handled during this
const data = fs.readFileSync('/large/file.json');
const parsed = JSON.parse(data);

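// Non-blocking read: the event loop stays free while the file loads.
// JSON.parse itself is still synchronous, so very large payloads block here too.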
const data = await fs.promises.readFile('/large/file.json', 'utf-8');
const parsed = JSON.parse(data);

Never use *Sync functions (readFileSync, execSync, writeFileSync) in request handlers. They block the entire Node.js event loop for their duration.

Expensive Computations

Move CPU-intensive work off the main thread:

import { Worker, isMainThread, parentPort, workerData } from 'worker_threads';

// In your route handler
function runInWorker(scriptPath: string, data: unknown): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(scriptPath, { workerData: data });
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}

// For truly CPU-intensive work (report generation, image processing):
router.post('/reports/generate', authenticate, async (req, res) => {
  const report = await runInWorker('./workers/reportGenerator.js', {
    tenantId: req.tenant.id,
    params:   req.body,
  });
  res.json(report);
});
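The route above shows only the main-thread side. For the worker side, here is a minimal single-file sketch, assuming a made-up CPU-bound task (summing squares) in place of real report generation. The worker body is passed as a string with `eval: true` only to keep the example in one file. In a project it would live in its own file, such as the `./workers/reportGenerator.js` path used above. `sumSquaresInWorker` and the message shape are hypothetical.

```typescript
import { Worker } from 'worker_threads';

// Worker body. It reads its input from workerData, does the heavy work,
// and posts the result back to the main thread exactly once.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let total = 0;
  for (const n of workerData.values) total += n * n;
  parentPort.postMessage({ total });
`;

function sumSquaresInWorker(values: number[]): Promise<{ total: number }> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { values } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      // A no-op if the promise already resolved from the message above.
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}

sumSquaresInWorker([1, 2, 3]).then((result) => {
  console.log(result.total);
});
```

Starting a worker has a cost of its own, so spawning one per request only pays off when the task is much heavier than the startup. For frequent tasks, a pool of long-lived workers is the usual next step.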

Reducing Allocations in Hot Paths

Object and array allocations in tight loops create GC pressure. Reuse objects where possible:

// Creates a new object on every request — GC pressure at scale
app.use((req, res, next) => {
  req.context = {
    requestId: randomUUID(),
    startTime: Date.now(),
    user:      null,
  };
  next();
});

// Acceptable — the allocation is necessary. But avoid allocating
// inside loops or functions called thousands of times per second.
// Profile first. Optimize only what the flame graph shows is hot.
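For contrast, this is the kind of allocation that is worth removing once a profile shows it is hot. A minimal sketch with a hypothetical `OrderLine` shape: both functions return the same total, but the chained version builds two throwaway arrays on every call.

```typescript
interface OrderLine {
  price: number;
  quantity: number;
  cancelled: boolean;
}

// Allocates two intermediate arrays (one from filter, one from map) per call.
function totalChained(lines: OrderLine[]): number {
  return lines
    .filter((line) => !line.cancelled)
    .map((line) => line.price * line.quantity)
    .reduce((sum, value) => sum + value, 0);
}

// Same result in a single pass with no intermediate arrays.
function totalSinglePass(lines: OrderLine[]): number {
  let sum = 0;
  for (const line of lines) {
    if (!line.cancelled) sum += line.price * line.quantity;
  }
  return sum;
}

const lines: OrderLine[] = [
  { price: 10, quantity: 2, cancelled: false },
  { price: 5, quantity: 1, cancelled: true },
  { price: 3, quantity: 3, cancelled: false },
];
```

The chained version is easier to read, and for code that runs a few times per request the difference will not show up in a flame graph. Switch to the single-pass form only where the profile points at it.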

Caching Repeated Computations

// Recomputed on every request — if this is slow, cache it
async function getMenuItems(tenantId: string) {
  const result = await db.query('SELECT * FROM menu_items WHERE tenant_id = $1', [tenantId]);
  return result.rows;
}

// Cache with LRU — computed once, served from memory
import { LRUCache } from 'lru-cache';

const menuCache = new LRUCache({
  max: 100,
  ttl: 5 * 60 * 1000,  // 5 minutes
});

async function getMenuItems(tenantId: string) {
  const cached = menuCache.get(tenantId);
  if (cached) return cached;

  const result = await db.query(
    'SELECT * FROM menu_items WHERE tenant_id = $1',
    [tenantId]
  );
  menuCache.set(tenantId, result.rows);
  return result.rows;
}

The Profiling Workflow in One Sequence

1. Detect: Grafana shows event loop lag or p99 latency spike
                ↓
2. Reproduce: Identify which endpoint or operation is slow
                ↓
3. Diagnose: clinic doctor → which category (CPU, I/O, memory, event loop)
                ↓
4. Profile:
   CPU issue  → clinic flame or 0x
   I/O issue  → clinic bubbleprof
   Memory     → heap snapshots (see memory leaks guide)
                ↓
5. Identify: Find the wide bar in the flame graph or the long wait in bubbleprof
                ↓
6. Fix: Apply the appropriate pattern (parallel awaits, caching, worker thread, remove sync op)
                ↓
7. Verify: Run autocannon before and after. Compare p99 latency.
                ↓
8. Monitor: Confirm event loop lag drops in Grafana after deploy
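For the verify step, autocannon reports latency percentiles in its own output. If you collect latency samples yourself, for example from access logs, the before and after comparison can be sketched as below. `percentile` uses the nearest-rank method, and `compareP99` is a hypothetical helper name.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile requires at least one sample');
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point error in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

function compareP99(beforeMs: number[], afterMs: number[]) {
  const before = percentile(beforeMs, 99);
  const after = percentile(afterMs, 99);
  return { before, after, improved: after < before };
}
```

Compare runs taken with the same autocannon settings (connections, duration, endpoint). Otherwise the difference reflects the load profile as much as the fix.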

Profiling is not something you do once. Set up the metrics, watch them in production, and run a profiling session when they degrade. The regression that would have taken days to diagnose by reading code takes 30 minutes when you can see exactly where time is being spent.
