p50, p95, p99: What Latency Percentiles Actually Mean for Your Node.js App

#javascript #node #testing #webdev

Your monitoring dashboard shows average response time: 45ms. Looks great.

Your users are complaining the app is slow.

Both things are true. Here's why.

Averages hide the worst experiences

Imagine 100 API requests. 99 of them take 10ms. One takes 5000ms.

Average: (99 × 10 + 5000) / 100 = 59.9ms

Your dashboard shows ~60ms average. Looks fine. But one in every 100 requests took 5 seconds. If each of your 10,000 daily active users makes just one request, that's 100 people per day getting a 5-second response.

Averages lie because they let the fast majority drown out the slow minority. Percentiles don't.
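
To make the arithmetic concrete, here is a minimal sketch that runs the 100-request example through a nearest-rank percentile function. The helper name `percentile` and the choice of the nearest-rank definition are assumptions for illustration; monitoring tools may interpolate differently.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// The sample from the text: 99 requests at 10 ms, one at 5000 ms.
const samples = [...Array(99).fill(10), 5000];

const mean = samples.reduce((sum, ms) => sum + ms, 0) / samples.length;
const summary = {
  mean,
  p50: percentile(samples, 50),
  p99: percentile(samples, 99),
  max: percentile(samples, 100),
};

console.log(summary);
```

One wrinkle worth noticing: with exactly one slow request in a hundred, a nearest-rank p99 still lands on 10 ms, and the 5-second outlier only appears at the maximum. A second slow request in the window (or a higher percentile such as p99.9) is what makes it visible, which is a good reason to track the max alongside p99.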

What percentiles actually mean

p50 (median): 50% of requests finish faster than this. This is your "typical" user experience. If your p50 is 20ms, a typical request feels fast.

p95: 95% of requests finish faster than this. 1 in 20 requests is slower. On a busy API handling 1000 requests per minute, that's 50 slow requests every minute.

p99: 99% of requests finish faster than this. 1 in 100 requests is slower. This is where tail latency lives — GC pauses, cold caches, network spikes.

p99.9: 1 in 1000 requests is slower. For high-traffic systems, this still affects real users. For most apps, p99 is enough to care about.

The relationship between these numbers tells you a lot:

p50 ≈ p99: Your latency is very consistent. Simple workload, warm cache, no contention.
p99 is 10x p50: You have a significant tail. Something slow is happening for 1% of requests — worth investigating.
p99 is 100x p50: Something is seriously wrong. GC pauses, lock contention, or a broken dependency.
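
The rules of thumb above can be sketched as a small helper. The function name `describeTail`, the verdict labels, and the exact cut-offs between the three cases (for example, treating anything under 2x as "consistent") are assumptions for illustration, not established thresholds.

```javascript
// Classify the tail by the p99/p50 ratio. Boundary values are illustrative assumptions.
function describeTail(p50, p99) {
  if (!(p50 > 0) || !(p99 >= p50)) {
    throw new RangeError('expected 0 < p50 <= p99');
  }
  const ratio = p99 / p50;
  if (ratio >= 100) return { ratio, verdict: 'seriously wrong' };
  if (ratio >= 10) return { ratio, verdict: 'significant tail' };
  if (ratio < 2) return { ratio, verdict: 'consistent' };
  return { ratio, verdict: 'moderate tail' };
}

console.log(describeTail(20, 25));
console.log(describeTail(20, 200));
console.log(describeTail(20, 2000));
```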

Why Node.js has a particular p99 problem

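Node.js runs your JavaScript on a single thread. Every request handler and callback shares one event loop. This means:

One slow synchronous operation blocks everything.

An awaited database query that takes 2 seconds does not block the loop by itself; other requests keep being served while it waits on I/O. But if a handler spends 2 seconds in synchronous work (parsing a huge JSON payload, a synchronous crypto or fs call, a tight CPU loop), no other request can make progress during that time. In a multi-threaded server, one slow handler only ties up its own thread. In Node, it stalls everyone.

This makes p99 disproportionately important for Node.js applications. Your p99 latency doesn't just affect 1% of users: a handler that blocks the loop causes latency spikes for every request queued behind it, and slow queries hold pool connections that other requests are waiting for.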

The good news: Node's async model means you can handle thousands of concurrent requests efficiently when everything is fast. The bad news: the tail hurts more.
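
A small experiment shows the effect. This sketch schedules a 0 ms timer and then keeps the thread busy with synchronous work; the timer cannot fire until that work ends. The helper names `blockFor` and `measureTimerDelay` are made up for illustration, and the busy-wait stands in for any CPU-heavy handler.

```javascript
// Block the event loop with synchronous work for roughly `ms` milliseconds.
function blockFor(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // busy-wait: nothing else can run on this thread
  }
}

// Resolve with how late a 0 ms timer fired when synchronous work
// started right after the timer was scheduled.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = performance.now();
    setTimeout(() => resolve(performance.now() - scheduled), 0);
    blockFor(blockMs); // stands in for a CPU-heavy request handler
  });
}

measureTimerDelay(50).then((delay) => {
  console.log(`0 ms timer fired after ${delay.toFixed(1)} ms`);
});
```

In a real server, every request that arrives during those 50 ms waits the same way the timer does, which is how one slow handler ends up in everyone's tail.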

How to measure your actual percentiles

If you're running Express, add a percentile tracker:

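// assumes `app` is your existing Express application
const WINDOW = 1000; // keep only the most recent 1000 samples
const latencies = [];

app.use((req, res, next) => {
  const start = process.hrtime.bigint(); // monotonic, sub-millisecond resolution
  res.on('finish', () => {
    // elapsed nanoseconds converted to milliseconds
    latencies.push(Number(process.hrtime.bigint() - start) / 1e6);
    if (latencies.length > WINDOW) latencies.shift();
  });
  next();
});

// nearest-rank percentile: smallest sample with at least p% of samples at or below it
function percentile(arr, p) {
  const sorted = [...arr].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
}

// log every minute
setInterval(() => {
  if (latencies.length === 0) return;
  console.log({
    p50:  percentile(latencies, 50),
    p95:  percentile(latencies, 95),
    p99:  percentile(latencies, 99),
    count: latencies.length
  });
}, 60_000).unref(); // don't keep the process alive just for logging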

Run this in production for a day. The gap between your p50 and p99 will probably surprise you.

The real-world numbers

Here's roughly what production p50/p95/p99 latency can look like for common dependencies. Treat these as ballpark ranges and measure your own:

Dependency	p50	p95	p99
Postgres (local)	2–5ms	20–50ms	100–300ms
Postgres (remote)	5–15ms	50–150ms	200–800ms
Redis	0.5–2ms	3–8ms	10–30ms
MongoDB	5–15ms	40–80ms	150–400ms
S3 GetObject	20–50ms	100–200ms	400–800ms
Stripe API	100–300ms	500–1000ms	1500–3000ms
OpenAI API	500–1500ms	2000–5000ms	5000–12000ms

Notice the ratios. Taking the midpoint of each range, Postgres p99 is roughly 50-60x the p50. Redis is more consistent (around 15x). External APIs like Stripe have the smallest ratio (about 10x) but the worst absolute tail: a p99 measured in seconds, and there's nothing you can do about it except handle it gracefully.
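
To check ratios like these yourself, here is a short sketch that computes p99/p50 from the table. Using the midpoint of each range is an assumption made for illustration; the low and high ends of the ranges give quite different ratios.

```javascript
// [p50 low, p50 high, p99 low, p99 high] in ms, copied from the table above.
const ranges = {
  'Postgres (local)': [2, 5, 100, 300],
  'Postgres (remote)': [5, 15, 200, 800],
  'Redis': [0.5, 2, 10, 30],
  'MongoDB': [5, 15, 150, 400],
  'S3 GetObject': [20, 50, 400, 800],
  'Stripe API': [100, 300, 1500, 3000],
  'OpenAI API': [500, 1500, 5000, 12000],
};

const midpoint = (lo, hi) => (lo + hi) / 2;

// p99/p50 ratio at the midpoint of each range, rounded to one decimal place.
function tailRatios(table) {
  return Object.entries(table).map(([name, [p50Lo, p50Hi, p99Lo, p99Hi]]) => {
    const p50 = midpoint(p50Lo, p50Hi);
    const p99 = midpoint(p99Lo, p99Hi);
    return { name, p50, p99, ratio: Math.round((p99 / p50) * 10) / 10 };
  });
}

console.table(tailRatios(ranges));
```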

What to do with this information

Set timeouts based on p99, not p50.

If your DB p99 is 300ms, your timeout should be at least 400-500ms. Setting it at 100ms means you're timing out more than 1% of requests unnecessarily, and those timeouts trigger retries, which add load, which makes the DB slower, which triggers more timeouts.
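
One way to wire this up is sketched below, under a few assumptions: the 1.5x headroom factor, the helper names `timeoutFromP99` and `withTimeout`, and the `fakeQuery` stand-in are all hypothetical, not part of any library.

```javascript
// Pick a timeout with headroom above the measured p99. The 1.5x factor is an assumption.
function timeoutFromP99(p99Ms, headroom = 1.5) {
  return Math.ceil(p99Ms * headroom);
}

// Wrap an async function so each call rejects if it takes longer than `ms`.
function withTimeout(fn, ms) {
  return async (...args) => {
    let timer;
    const deadline = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    });
    try {
      return await Promise.race([fn(...args), deadline]);
    } finally {
      clearTimeout(timer); // don't leave the timer pending after the call settles
    }
  };
}

// Hypothetical stand-in for a database call that takes `ms` milliseconds.
const fakeQuery = (ms) => new Promise((resolve) => setTimeout(() => resolve('row'), ms));

// A 300 ms p99 gives a 450 ms budget with the default headroom.
const guardedQuery = withTimeout(fakeQuery, timeoutFromP99(300));
guardedQuery(20).then((row) => console.log('fast call:', row));
```

Note that racing against a timer only stops the caller from waiting; it does not cancel the underlying work. If your database client supports cancellation or a statement timeout, setting that as well keeps abandoned queries from piling up.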

Size your connection pools for the p99 case.

If a DB query takes 300ms at p99 and you're handling 100 requests per second, you need about 30 connections in the pool to ride out a slow stretch (Little's law: 100 requests/s × 0.3s = 30 queries in flight). Most developers size pools for the p50 case and wonder why they get connection exhaustion under load.
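
The sizing arithmetic is an application of Little's law (items in flight = arrival rate × time in system), which means the 100 has to be read as a rate: 100 requests per second. A minimal sketch, assuming each request runs one query and holds a connection for the query's full duration; the function name `connectionsNeeded` is made up for illustration.

```javascript
// Little's law: average items in the system = arrival rate × time each item spends there.
function connectionsNeeded(requestsPerSecond, queryLatencyMs) {
  return Math.ceil((requestsPerSecond * queryLatencyMs) / 1000);
}

console.log(connectionsNeeded(100, 5));   // pool sized for a 5 ms p50
console.log(connectionsNeeded(100, 300)); // pool sized for a 300 ms p99
```

The gap between the two results is the point: a pool sized for the median has no room left when a burst of queries runs at tail latency.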

Build retry backoff around p99, not p50.

Your first retry should wait at least as long as your p99. If you retry after 50ms but your p99 is 200ms, your retry often arrives while the original is still processing — doubling load.
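
A backoff schedule anchored on p99 might look like the sketch below. The doubling factor, the 10-second cap, the 50% jitter, and the names `retryDelay` and `retry` are assumptions chosen for illustration. Jitter is added on top of the base delay so the first retry never fires earlier than p99.

```javascript
// Delay before retry number `attempt` (0 = first retry).
// Starts at p99, doubles each attempt, is capped, then gets up to 50% random jitter added.
function retryDelay(attempt, p99Ms, { capMs = 10_000, random = Math.random } = {}) {
  const base = Math.min(capMs, p99Ms * 2 ** attempt);
  return base + random() * base * 0.5;
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Retry an async function, waiting retryDelay() between failed attempts.
async function retry(fn, { retries = 3, p99Ms = 200, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries: surface the last error
      await sleep(retryDelay(attempt, p99Ms));
    }
  }
}

// With a 200 ms p99, base delays (before jitter) are 200, 400, 800 ms.
console.log([0, 1, 2].map((n) => retryDelay(n, 200, { random: () => 0 })));
```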

Test with realistic latency.

This is the one most people skip. If you're testing retry logic, timeout handling, or circuit breakers against a flat setTimeout(fn, 200), you're not testing the variance — and the variance is what breaks things.

import { withLatency } from 'slowdep';

// simulate realistic postgres latency in your tests
const db = withLatency(mockDBCall, 'postgres'); // p50:5ms p99:200ms

// now your tests see the actual distribution
// some calls are 3ms, some are 150ms, some are 600ms
// just like production
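
If you'd rather see the idea without a dependency, the sketch below draws delays from a log-normal distribution fitted to a target p50 and p99. This is not how slowdep is implemented; the log-normal shape, the names `latencySampler` and `withSampledLatency`, and the seeded `mulberry32` generator are all assumptions chosen for illustration.

```javascript
// Small seeded PRNG (mulberry32) so test runs are reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const Z99 = 2.326; // approximate 99th-percentile z-score of the standard normal

// Build a sampler whose median is p50Ms and whose 99th percentile is p99Ms (log-normal fit).
function latencySampler(p50Ms, p99Ms, random = Math.random) {
  const mu = Math.log(p50Ms);
  const sigma = Math.log(p99Ms / p50Ms) / Z99;
  return () => {
    // Box-Muller transform: two uniform samples -> one standard normal sample
    const u1 = 1 - random(); // in (0, 1], avoids log(0)
    const u2 = random();
    const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
    return Math.exp(mu + sigma * z);
  };
}

// Wrap an async function so each call first waits for a sampled delay.
function withSampledLatency(fn, sampleMs) {
  return async (...args) => {
    await new Promise((resolve) => setTimeout(resolve, sampleMs()));
    return fn(...args);
  };
}

// Example: delays with a 5 ms median and a 200 ms p99, reproducible via the seed.
const sampleMs = latencySampler(5, 200, mulberry32(42));
console.log(Array.from({ length: 5 }, () => sampleMs().toFixed(1)));
```

Passing a seeded generator keeps test runs reproducible, so a failure caused by one unlucky 600 ms sample can be replayed instead of disappearing on the next run.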

The mental model

Think of latency like a commute. Your average commute might be 25 minutes. But:

p50: 22 minutes (typical day)
p95: 40 minutes (traffic, occasional delays)
p99: 75 minutes (accident, bad weather, everything went wrong)

If you tell someone "my commute is 25 minutes" and they schedule a meeting 30 minutes after you leave, you'll be late more often than you expect. Even a 40-minute budget fails one day in twenty. That's p95.

Your SLA, your retry timeouts, your user experience — they're all governed by your tail, not your average.

Build for the tail.

If you want to test your Node.js app against realistic latency distributions, I built slowdep — a zero-dependency package that wraps any async function with production-accurate p50/p99 sampling.

github.com/arnnnavvvvv/slowdep