DEV Community

Roberto B.

Understanding Correlation in PHP: Pearson vs Spearman vs Kendall Tau

Correlation helps you understand whether two variables move together and how strongly they are related. In this article, you'll learn how Pearson, Spearman, and Kendall tau correlation work, when to use each method, and how to calculate them in PHP with practical examples.

Understanding Correlation

Correlation measures whether two variables tend to change together.

You use correlation when you have two variables describing the same observations. For example:

  • hours studied and exam scores for the same students
  • age and finish time for the same runners
  • product price and units sold for the same products
  • elevation gain and pace for the same running splits

The two datasets must have the same length, and the values at the same position must belong to the same observation.

$hoursStudied = [1, 2, 3, 4, 5];
$testScores = [55, 62, 70, 78, 85];

Here, the first student studied 1 hour and scored 55, the second studied 2 hours and scored 62, and so on. Correlation summarizes the strength and direction of the relationship between paired observations.
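
To make the pairing explicit, here is a minimal sketch that zips two datasets by position and rejects mismatched lengths. The pairObservations() helper is a hypothetical name used for illustration; it is not part of the hi-folks/statistics package.

```php
<?php

declare(strict_types=1);

/**
 * Pair up two datasets by position, refusing mismatched lengths.
 * Hypothetical helper for illustration, not part of any package.
 *
 * @param array<int|float> $x
 * @param array<int|float> $y
 * @return array<int, array{0: int|float, 1: int|float}>
 */
function pairObservations(array $x, array $y): array
{
    if (count($x) !== count($y)) {
        throw new InvalidArgumentException('Both datasets must have the same length.');
    }

    // array_map() with a null callback zips the arrays into [x, y] pairs.
    return array_map(null, array_values($x), array_values($y));
}

$hoursStudied = [1, 2, 3, 4, 5];
$testScores = [55, 62, 70, 78, 85];

foreach (pairObservations($hoursStudied, $testScores) as [$hours, $score]) {
    echo "{$hours} h studied, score {$score}", PHP_EOL;
}
```

If one array is reordered or has a missing value, the pairs no longer describe the same students, and any correlation computed from them is meaningless.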

A positive correlation means that larger values in one dataset usually go with larger values in the other. A negative correlation means larger values in one dataset usually go with smaller values in the other. A correlation close to zero means the method does not detect a strong relationship.

Correlation does not prove causation. If study hours and test scores are correlated, that alone does not prove that studying more caused the higher scores. Other factors, such as prior knowledge or class attendance, may influence both. Still, correlation is useful for discovering relationships, validating assumptions, comparing signals, and choosing the right model.

The examples in this article use the PHP package hi-folks/statistics (https://github.com/Hi-Folks/statistics), which provides statistical functions for PHP developers, including descriptive statistics, probability distributions, regression, and correlation analysis.

This package supports three correlation methods:

  • Pearson
  • Spearman
  • Kendall tau

Pearson Correlation

Pearson correlation measures how strongly two variables follow a straight-line relationship.

Use it when both variables are numeric, and you expect a linear relationship.

Result range:

  • +1: perfect positive linear relationship. As one value increases, the other increases in a straight-line pattern.
  • 0: no linear association detected. The variables may still have a nonlinear relationship.
  • -1: perfect negative linear relationship. As one value increases, the other decreases in a straight-line pattern.

The coefficient always falls between -1 and +1. It describes how tightly the points follow a straight line, not how steep that line is: two datasets can both have a correlation of +1 even if one rises a hundred times faster than the other.

Pearson correlation is sensitive to outliers. A single extreme value can change the coefficient substantially, and in a small dataset it can even flip the sign.
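
To see both points in numbers, here is a minimal from-scratch sketch of the Pearson formula (the sum of the products of deviations divided by the square root of the product of the summed squared deviations). The pearson() function is a hypothetical helper written for illustration; it is not how the package implements Stat::correlation().

```php
<?php

declare(strict_types=1);

/**
 * Pearson correlation coefficient, written out step by step.
 * Hypothetical helper for illustration only.
 */
function pearson(array $x, array $y): float
{
    $x = array_values($x);
    $y = array_values($y);
    $n = count($x);
    if ($n !== count($y) || $n < 2) {
        throw new InvalidArgumentException('Need two datasets of equal length with at least 2 values.');
    }

    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    $sumProducts = 0.0; // sum of (x - meanX) * (y - meanY)
    $sumSquaresX = 0.0; // sum of (x - meanX)^2
    $sumSquaresY = 0.0; // sum of (y - meanY)^2
    foreach ($x as $i => $xi) {
        $dx = $xi - $meanX;
        $dy = $y[$i] - $meanY;
        $sumProducts += $dx * $dy;
        $sumSquaresX += $dx * $dx;
        $sumSquaresY += $dy * $dy;
    }

    if ($sumSquaresX === 0.0 || $sumSquaresY === 0.0) {
        throw new InvalidArgumentException('Correlation is undefined when a dataset is constant.');
    }

    return $sumProducts / sqrt($sumSquaresX * $sumSquaresY);
}

$hours = [1, 2, 3, 4, 5];

// Same straight-line pattern, very different slopes: both give 1.0.
$gentle = pearson($hours, [2, 4, 6, 8, 10]);
$steep = pearson($hours, [100, 200, 300, 400, 500]);

// The article's scores, then the same scores with one extreme last value.
$clean = pearson($hours, [55, 62, 70, 78, 85]);       // roughly 0.9997
$withOutlier = pearson($hours, [55, 62, 70, 78, 10]); // roughly -0.44

printf("gentle=%.4f steep=%.4f clean=%.4f outlier=%.4f\n", $gentle, $steep, $clean, $withOutlier);
```

Changing a single score out of five turns a near-perfect positive correlation into a negative one, which is why it is worth plotting or inspecting the data before trusting a Pearson coefficient.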

Pearson is useful for:

  • age vs marathon finish time
  • temperature vs energy usage
  • advertising spend vs revenue
  • elevation gain vs running pace, if the effect is roughly linear

use HiFolks\Statistics\Stat;

$hoursStudied = [1, 2, 3, 4, 5];
$testScores = [55, 62, 70, 78, 85];

$correlation = Stat::correlation($hoursStudied, $testScores);

// Close to +1: more study hours strongly align with higher scores.

Use Pearson when you care about whether the relationship is linear.

Spearman Correlation

Spearman's correlation measures whether one variable generally increases or decreases as the other changes, using ranked values rather than raw numbers. In practice, it is the Pearson correlation computed on the ranks of the values.

Use it when values move together consistently, but not necessarily in a straight line.

Result range:

  • +1: perfect positive monotonic relationship. As one variable increases, the other always increases in rank order.
  • 0: no clear monotonic relationship detected.
  • -1: perfect negative monotonic relationship. As one variable increases, the other always decreases in rank order.

Spearman is useful for:

  • ranked or ordinal data
  • satisfaction scores
  • performance benchmarks
  • nonlinear growth, like experience vs salary
  • cases where outliers could distort Pearson correlation

use HiFolks\Statistics\Stat;

$experienceYears = [1, 2, 3, 4, 5];
$salary = [30_000, 40_000, 55_000, 80_000, 120_000];

$correlation = Stat::correlation($experienceYears, $salary, 'ranked');

// +1: salary consistently increases with experience rank.

Use Spearman when you care about whether one variable consistently increases (or decreases) as the other increases, regardless of the shape of the curve.
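
To show what "using ranks" means, here is a minimal sketch that ranks each dataset (tied values share the average of their positions) and then applies the Pearson formula to the ranks. The averageRanks(), pearson(), and spearman() functions are hypothetical helpers for illustration; the package's internal implementation may differ.

```php
<?php

declare(strict_types=1);

/**
 * Assign 1-based ranks; tied values share the average of their positions.
 * Hypothetical helper for illustration only.
 *
 * @return array<int, float>
 */
function averageRanks(array $values): array
{
    $values = array_values($values);
    $n = count($values);
    $order = array_keys($values);
    // Sort the indices by the values they point to.
    usort($order, fn (int $a, int $b): int => $values[$a] <=> $values[$b]);

    $ranks = array_fill(0, $n, 0.0);
    $i = 0;
    while ($i < $n) {
        // Find the run of equal values starting at sorted position $i.
        $j = $i;
        while ($j + 1 < $n && $values[$order[$j + 1]] == $values[$order[$i]]) {
            $j++;
        }
        // Sorted positions $i..$j are 0-based; ranks are 1-based.
        $rank = ($i + $j) / 2.0 + 1.0;
        for ($k = $i; $k <= $j; $k++) {
            $ranks[$order[$k]] = $rank;
        }
        $i = $j + 1;
    }

    return $ranks;
}

/** Plain Pearson formula, included so this block runs on its own. */
function pearson(array $x, array $y): float
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $sumProducts = $sumSquaresX = $sumSquaresY = 0.0;
    foreach ($x as $i => $xi) {
        $sumProducts += ($xi - $meanX) * ($y[$i] - $meanY);
        $sumSquaresX += ($xi - $meanX) ** 2;
        $sumSquaresY += ($y[$i] - $meanY) ** 2;
    }

    return $sumProducts / sqrt($sumSquaresX * $sumSquaresY);
}

/** Spearman: Pearson applied to the ranks instead of the raw values. */
function spearman(array $x, array $y): float
{
    return pearson(averageRanks($x), averageRanks($y));
}

$experienceYears = [1, 2, 3, 4, 5];
$salary = [30_000, 40_000, 55_000, 80_000, 120_000];

$linear = pearson($experienceYears, $salary);  // below 1: the growth is curved
$ranked = spearman($experienceYears, $salary); // 1.0: the order never breaks

printf("pearson=%.4f spearman=%.4f\n", $linear, $ranked);
```

Because only the order matters, replacing the top salary with 1,000,000 would leave the Spearman value unchanged, while the Pearson value would move.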

Kendall Tau Correlation

Kendall's tau measures the consistency of two rankings by comparing pairs of observations.

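Use it when the data is ordinal, contains many tied ranks, or when you want a measure of rank agreement that is easy to explain.
It is often considered more robust than Spearman on small samples, and its value has a direct reading: with no ties, tau is the share of pairs ranked in the same order minus the share ranked in the opposite order.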

Result range:

  • +1: perfect agreement. Every pair of observations has the same order in both datasets.
  • 0: no clear pairwise rank agreement detected.
  • -1: perfect disagreement. Every pair of observations has the opposite order in the two datasets.

Kendall tau is useful for:

  • ordinal data
  • survey ratings
  • judge rankings
  • race placements
  • preference lists
  • small datasets with repeated values

use HiFolks\Statistics\Stat;

$judgeA = [1, 2, 3, 4, 5];
$judgeB = [1, 3, 2, 4, 5];

$correlation = Stat::kendallTau($judgeA, $judgeB, 4);

// High positive value: the judges mostly agree.
// Counting by hand, 9 of the 10 pairs are in the same order and 1 is swapped, so tau = (9 - 1) / 10 = 0.8.

You can also use Kendall tau through correlation():

$correlation = Stat::correlation($judgeA, $judgeB, 'kendall');

Use Kendall tau when you care about whether two rankings agree pair by pair.
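
To make "pair by pair" concrete, here is a minimal sketch that counts concordant and discordant pairs and computes the simplest variant, tau-a. The kendallPairs() function is a hypothetical helper for illustration. It simply skips tied pairs, whereas the tau-b variant adds a tie correction; which variant the package uses is not covered here, so results may differ when the data contains ties.

```php
<?php

declare(strict_types=1);

/**
 * Count concordant and discordant pairs and return Kendall's tau-a.
 * Hypothetical helper for illustration; tied pairs count as neither.
 *
 * @return array{concordant: int, discordant: int, tau: float}
 */
function kendallPairs(array $x, array $y): array
{
    $x = array_values($x);
    $y = array_values($y);
    $n = count($x);
    if ($n !== count($y) || $n < 2) {
        throw new InvalidArgumentException('Need two datasets of equal length with at least 2 values.');
    }

    $concordant = 0;
    $discordant = 0;
    for ($i = 0; $i < $n - 1; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            // Positive product: both datasets order this pair the same way.
            // Negative product: they order it in opposite ways.
            $direction = ($x[$i] <=> $x[$j]) * ($y[$i] <=> $y[$j]);
            if ($direction > 0) {
                $concordant++;
            } elseif ($direction < 0) {
                $discordant++;
            }
        }
    }

    $totalPairs = $n * ($n - 1) / 2;

    return [
        'concordant' => $concordant,
        'discordant' => $discordant,
        'tau' => (float) (($concordant - $discordant) / $totalPairs),
    ];
}

$judgeA = [1, 2, 3, 4, 5];
$judgeB = [1, 3, 2, 4, 5];

$result = kendallPairs($judgeA, $judgeB);

// 9 concordant pairs and 1 discordant pair (the swapped 2nd and 3rd places),
// so tau = (9 - 1) / 10 = 0.8
printf(
    "concordant=%d discordant=%d tau=%.1f\n",
    $result['concordant'],
    $result['discordant'],
    $result['tau']
);
```

The nested loop compares every pair once, so the work grows with the square of the dataset size. That is fine for judge rankings or survey data, but worth keeping in mind for large arrays.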

Choosing the Right Correlation Method

  • Use Pearson when the relationship is numeric and roughly linear.
  • Use Spearman when values move consistently but not necessarily linearly.
  • Use Kendall tau for rankings, ordinal data, small datasets, or datasets with many ties.

Using the hi-folks/statistics package, you can calculate these correlation coefficients directly in PHP and integrate statistical analysis into your applications, reports, or experiments.

You can find the package, documentation, and source code on GitHub:

https://github.com/Hi-Folks/statistics

If you find the package useful, consider giving the repository a ⭐ on GitHub.
