Oleh Koren
Why Percentiles Matter More Than Average Response Time in Performance Testing

When analyzing load test results, many teams highlight a single number:

Average response time = 1.2 seconds

And that number is often presented as the main indicator of system performance.

The problem?
Average response time can lie.

If you rely only on the mean, you might completely miss serious performance issues affecting real users.

Let’s break this down.

Why Average Response Time Is Misleading

The average (mean) is calculated as:
Sum of all response times / Total number of requests

Simple.

But averages are highly sensitive to outliers.

Imagine this response time distribution (in milliseconds):

100, 110, 120, 130, 140, 150, 5000

Now calculate:

  • Average ≈ 821 ms
  • Median (P50) = 130 ms

That’s a huge difference.
One slow request (5000 ms) drastically shifts the average, even though most users experienced fast responses.

Now imagine the opposite situation.

100, 110, 120, 130, 4420, 4620, 4920

  • Average ≈ 2060 ms
  • But 3 out of 7 users waited 4+ seconds.

Does the average really represent user experience?

Not even close.
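Both examples are easy to verify with the Python standard library (the lists below are the same made-up samples from above):

```python
import statistics

# First distribution: mostly fast, one extreme outlier
fast_with_outlier = [100, 110, 120, 130, 140, 150, 5000]
print(statistics.mean(fast_with_outlier))    # ≈ 821 ms, dragged up by the outlier
print(statistics.median(fast_with_outlier))  # 130 ms, what the typical user saw

# Second distribution: almost half of the users are slow
half_slow = [100, 110, 120, 130, 4420, 4620, 4920]
print(statistics.mean(half_slow))            # ≈ 2060 ms, hides the 4+ second waits
```

Same `mean()` call, two very different user experiences.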

What Percentiles Actually Show

Percentiles answer a much more meaningful question:

"How fast were responses for most users?"

Definition

The P95 response time means:

95% of all requests were completed in this time or faster.
Only 5% were slower.

Similarly:

  • P90 → 90% of requests are faster than this value
  • P99 → 99% of requests are faster than this value

How Percentiles Are Calculated

  1. Sort all response times from smallest to largest.
  2. Determine the percentile position.

Basic intuition formula:

Position = (Percentile / 100) × N

Where:

  • Percentile = for example 95 (for P95)
  • Dividing by 100 converts the percentage to a fraction (95% → 0.95)
  • N = total number of requests

Example:
If you have 1000 requests:
P95 position = 0.95 × 1000 = 950
The value around the 950th position in the sorted list represents your P95.
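The two steps above can be sketched as a nearest-rank percentile function (the name `percentile` is mine, not taken from any particular tool):

```python
import math

def percentile(response_times, p):
    """Nearest-rank percentile: sort, then take the value at position ceil(p/100 * N)."""
    data = sorted(response_times)               # step 1: sort ascending
    rank = math.ceil(p / 100 * len(data))       # step 2: e.g. P95 of 1000 requests -> 950
    return data[rank - 1]                       # -1 because Python lists are 0-indexed

times = [100, 110, 120, 130, 140, 150, 5000]
print(percentile(times, 50))  # 130 -- the median
print(percentile(times, 95))  # 5000 -- with only 7 samples, P95 lands on the worst request
```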

In practice, different tools may use slightly different formulas and interpolation methods, but the core idea remains the same:

Percentiles describe distribution, not averages.

No magic. Just distribution awareness.

Real-World Example from Load Testing

Let’s say during a load test you get:

  • Average response time = 3981 ms
  • Median (P50) = 3451 ms
  • P90 = 7325 ms
  • P95 = 9212 ms
  • P99 = 12760 ms
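A summary like the one above can be produced from raw timings with `statistics.quantiles` (Python 3.8+); the helper name `latency_report` is my own, and the exact percentile values will vary slightly with the interpolation method your tool uses:

```python
import statistics

def latency_report(times_ms):
    """Summarize a list of response times with mean, median, and tail percentiles."""
    qs = statistics.quantiles(times_ms, n=100)  # returns the 99 cut points P1..P99
    return {
        "avg": statistics.mean(times_ms),
        "p50": statistics.median(times_ms),
        "p90": qs[89],
        "p95": qs[94],
        "p99": qs[98],
    }
```

Comparing `p50` against `p95` and `p99` in a report like this immediately exposes the slow tail that the average smooths over.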


If you only report the average:

“The system responds in about 4 seconds.”

Sounds acceptable.

But reality:

  • 10% of users wait more than 7.3 seconds
  • 5% wait more than 9.2 seconds
  • 1% wait more than 12.7 seconds

Now ask yourself:

Would 1% of users waiting 12+ seconds be acceptable in your production system?

For e-commerce during checkout?
For login?
For payment processing?

Probably not.

Why Percentiles Represent User Experience Better

Users don’t experience averages. They experience their own request.

If your P95 is high, that means a noticeable portion of users is suffering.

In modern systems, latency spikes are normal, especially in:

  • High-concurrency APIs
  • Distributed microservices
  • Cloud-native environments
  • Systems with auto-scaling

Percentiles help you detect:

  • Queue buildup
  • Thread pool saturation
  • Garbage collection pauses
  • Network bottlenecks
  • Lock contention
  • Cold starts

Average hides all of that.

Final Thought

Next time someone reports only the average response time, ask:

What does P95 look like?
What about P99?

Because performance is about distribution —
and users feel the slowest moments.

If you want to better understand performance testing and go beyond just running tools, I cover this topic in more depth in my course:

👉 Performance Testing Fundamentals: From Basics to Hands-On (Udemy)
