Oleh Koren
Why Percentiles Matter More Than Average Response Time in Performance Testing

When analyzing load test results, many teams highlight a single number:

Average response time = 1.2 seconds

And that number is often presented as the main indicator of system performance.

The problem?
Average response time can lie.

If you rely only on the mean, you might completely miss serious performance issues affecting real users.

Let’s break this down.

Why Average Response Time Is Misleading

The average (mean) is calculated as:
Sum of all response times / Total number of requests

Simple.

But averages are highly sensitive to outliers.

Imagine this response time distribution (in milliseconds):

100, 110, 120, 130, 140, 150, 5000

Now calculate:

  • Average ≈ 821 ms
  • Median (P50) = 130 ms

That’s a huge difference.
One slow request (5000 ms) drastically shifts the average, even though most users experienced fast responses.

Now imagine the opposite situation.

100, 110, 120, 130, 4420, 4620, 4920

  • Average ≈ 2060 ms
  • But 3 out of 7 users waited 4+ seconds.

Does the average really represent user experience?

Not even close.
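Both examples are easy to verify with the Python standard library (the lists below are the same made-up samples from above):

```python
import statistics

# First distribution: mostly fast, one extreme outlier
fast_with_outlier = [100, 110, 120, 130, 140, 150, 5000]
print(statistics.mean(fast_with_outlier))    # ≈ 821 ms, dragged up by the outlier
print(statistics.median(fast_with_outlier))  # 130 ms, what the typical user saw

# Second distribution: almost half of the users are slow
half_slow = [100, 110, 120, 130, 4420, 4620, 4920]
print(statistics.mean(half_slow))            # ≈ 2060 ms, hides the 4+ second waits
```

Same `mean()` call, two very different user experiences.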

What Percentiles Actually Show

Percentiles answer a much more meaningful question:

"How fast were responses for most users?"

Definition

The P95 response time means:

95% of all requests were completed in this time or faster.
Only 5% were slower.

Similarly:

  • P90 → 90% of requests are faster than this value
  • P99 → 99% of requests are faster than this value

How Percentiles Are Calculated

  1. Sort all response times from smallest to largest.
  2. Determine the percentile position.

Basic intuition formula:

Position = (Percentile / 100) × N

Where:

  • Percentile = for example 95 (for P95)
  • Dividing by 100 converts the percentage to a fraction (95% → 0.95)
  • N = total number of requests

Example:
If you have 1000 requests:
P95 position = 0.95 × 1000 = 950
The value around the 950th position in the sorted list represents your P95.
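The two steps above can be sketched as a nearest-rank percentile function (the name `percentile` is mine, not taken from any particular tool):

```python
import math

def percentile(response_times, p):
    """Nearest-rank percentile: sort, then take the value at position ceil(p/100 * N)."""
    data = sorted(response_times)               # step 1: sort ascending
    rank = math.ceil(p / 100 * len(data))       # step 2: e.g. P95 of 1000 requests -> 950
    return data[rank - 1]                       # -1 because Python lists are 0-indexed

times = [100, 110, 120, 130, 140, 150, 5000]
print(percentile(times, 50))  # 130 -- the median
print(percentile(times, 95))  # 5000 -- with only 7 samples, P95 lands on the worst request
```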

In practice, different tools may use slightly different formulas and interpolation methods, but the core idea remains the same:

Percentiles describe distribution, not averages.

No magic. Just distribution awareness.

Real-World Example from Load Testing

Let’s say during a load test you get:

  • Average response time = 3981 ms
  • Median (P50) = 3451 ms
  • P90 = 7325 ms
  • P95 = 9212 ms
  • P99 = 12760 ms
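A summary like the one above can be produced from raw timings with `statistics.quantiles` (Python 3.8+); the helper name `latency_report` is my own, and the exact percentile values will vary slightly with the interpolation method your tool uses:

```python
import statistics

def latency_report(times_ms):
    """Summarize a list of response times with mean, median, and tail percentiles."""
    qs = statistics.quantiles(times_ms, n=100)  # returns the 99 cut points P1..P99
    return {
        "avg": statistics.mean(times_ms),
        "p50": statistics.median(times_ms),
        "p90": qs[89],
        "p95": qs[94],
        "p99": qs[98],
    }
```

Comparing `p50` against `p95` and `p99` in a report like this immediately exposes the slow tail that the average smooths over.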


If you only report the average:

“The system responds in about 4 seconds.”

Sounds acceptable.

But reality:

  • 10% of users wait more than 7.3 seconds
  • 5% wait more than 9.2 seconds
  • 1% wait more than 12.7 seconds

Now ask yourself:

Would 1% of users waiting 12+ seconds be acceptable in your production system?

For e-commerce during checkout?
For login?
For payment processing?

Probably not.

Why Percentiles Represent User Experience Better

Users don’t experience averages. They experience their own request.

If your P95 is high, that means a noticeable portion of users is suffering.

In modern systems, latency spikes are normal, especially in:

  • High-concurrency APIs
  • Distributed microservices
  • Cloud-native environments
  • Systems with auto-scaling

Percentiles help you detect:

  • Queue buildup
  • Thread pool saturation
  • Garbage collection pauses
  • Network bottlenecks
  • Lock contention
  • Cold starts

Average hides all of that.

Final Thought

Next time someone reports only the average response time, ask:

What does P95 look like?
What about P99?

Because performance is about distribution —
and users feel the slowest moments.

If you want to better understand performance testing and go beyond just running tools, I cover this topic in more depth in my course:

👉 Performance Testing Fundamentals: From Basics to Hands-On (Udemy)
