DEV Community

ram vnet

Statistics - Measures of Position In Data Science

1. What Are Measures of Position?

Measures of position describe where a particular data value stands relative to the rest of the dataset.
They help answer questions like:

Is this value high, low, or typical?

What proportion of data lies below (or above) a given value?

How extreme is a data point?

Unlike measures of central tendency (mean, median) or dispersion (variance, standard deviation), measures of position focus on relative standing.

2. Why Measures of Position Matter in Data Science

In data science, measures of position are crucial for:

Outlier detection (e.g., IQR method)

Feature scaling and normalization

Risk assessment (finance, insurance)

Model evaluation (percentile-based metrics)

Fair comparisons across populations

Example:

A test score of 85 means very different things depending on whether it is in the 60th percentile or the 95th percentile.

3. Types of Measures of Position
Main categories:

Percentiles

Quartiles

Deciles

Z-scores (Standard Scores)

Ranks

Each provides a different lens on relative position.

4. Percentiles

Definition

The p-th percentile is the value below which p% of the data falls.

Example:

90th percentile = value below which 90% of observations lie.

Properties

Percentiles range from 0 to 100

Not evenly spaced in value; the spacing depends on the data distribution

How to Compute Percentiles

Given ordered data of size n:

$$\text{Position of } P_p = \frac{p}{100}\,(n + 1)$$

If the position is not an integer, interpolate.

Example:

Data:
10, 20, 30, 40, 50

Find the 60th percentile.

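$$\text{Position of } P_{60} = \frac{60}{100}\,(5 + 1) = 3.6$$

The position falls between the 3rd and 4th values (30 and 40), so interpolate:

$$30 + 0.6\,(40 - 30) = 36$$

So, P₆₀ = 36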

Interpretation

60% of the data is ≤ 36

40% is ≥ 36
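
The (n + 1) position rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a library implementation: the function name `percentile_n_plus_1` is hypothetical, and clamping positions that fall outside the data to the smallest or largest value is an assumption the text does not specify. Other tools use different interpolation rules by default, so their results may differ slightly.

```python
def percentile_n_plus_1(values, p):
    """Return the p-th percentile using the position rule p/100 * (n + 1)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")

    data = sorted(values)
    n = len(data)
    position = p * (n + 1) / 100  # 1-based position in the ordered data

    # Assumption: positions outside 1..n are clamped to the extremes.
    if position <= 1:
        return float(data[0])
    if position >= n:
        return float(data[-1])

    lower = int(position)        # 1-based index of the value just below
    fraction = position - lower  # how far to move toward the next value
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


scores = [10, 20, 30, 40, 50]
# Approximately 36, matching the worked example (up to floating-point rounding).
print(percentile_n_plus_1(scores, 60))
```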

5. Quartiles

Quartiles divide data into four equal parts.

| Quartile | Meaning |
|---|---|
| Q₁ | 25th percentile |
| Q₂ | 50th percentile (Median) |
| Q₃ | 75th percentile |

Interquartile Range (IQR)

$$\text{IQR} = Q_3 - Q_1$$

Why is it important?

Measures spread of the middle 50%

Robust to outliers

Used heavily in box plots and anomaly detection

Outlier Detection (IQR Rule)

$$\text{Lower bound} = Q_1 - 1.5 \times \text{IQR}$$

$$\text{Upper bound} = Q_3 + 1.5 \times \text{IQR}$$

Values outside these bounds are considered outliers.
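
As a rough sketch, the IQR rule can be applied with Python's standard `statistics` module. The helper name `iqr_outliers` and the sample data are hypothetical. The `method="exclusive"` option is chosen here because it places quartiles at p/100 × (n + 1), the same position rule used for percentiles earlier; a different quartile convention would shift the bounds slightly.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return (lower_bound, upper_bound, outliers) under the 1.5 * IQR rule."""
    # quantiles() sorts internally; n=4 yields the three cut points Q1, Q2, Q3.
    q1, _median, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical sample with one unusually large value.
sample = [10, 12, 13, 14, 15, 16, 18, 95]
lower, upper, outliers = iqr_outliers(sample)
print(lower, upper, outliers)  # 4.375 25.375 [95]
```

Here Q₁ = 12.25 and Q₃ = 17.5, so the IQR is 5.25 and only the value 95 falls outside the bounds.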

6. Deciles

Deciles split data into 10 equal parts.

| Decile | Percentile |
|---|---|
| D₁ | 10th |
| D₅ | 50th (Median) |
| D₉ | 90th |

Usage

Income distribution analysis

Population studies

Risk stratification

Example:

Top 10% income earners = above the 9th decile
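
To make the decile idea concrete, here is a small sketch using `statistics.quantiles` with `n=10`. The income figures are invented purely for illustration, and `method="exclusive"` is again an assumption chosen to match the (n + 1) position rule used earlier.

```python
from statistics import quantiles

# Hypothetical annual incomes (in thousands), used only for illustration.
incomes = list(range(20, 210, 10))  # 20, 30, ..., 200

# n=10 returns the nine cut points D1..D9.
deciles = quantiles(incomes, n=10, method="exclusive")
d5 = deciles[4]  # 5th decile, i.e. the median
d9 = deciles[8]  # 9th decile

# Values above the 9th decile form the top income group.
top_earners = [income for income in incomes if income > d9]
print(d5, d9, top_earners)  # 110.0 190.0 [200]
```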

7. Z-Scores (Standard Scores)

Definition

A Z-score measures how many standard deviations a value is from the mean.

$$Z = \frac{x - \mu}{\sigma}$$

Where:

x = observation
μ = mean
σ = standard deviation

Interpretation

| Z-score | Meaning |
|---|---|
| 0 | Exactly at the mean |
| +1 | 1 SD above the mean |
| -2 | 2 SDs below the mean |

Why Z-Scores Are Powerful

Standardize different scales

Enable comparison across datasets

Fundamental in machine learning pre-processing

Basis of normal distribution probabilities

Example

Mean = 70
SD = 10
Score = 85

$$Z = \frac{85 - 70}{10} = 1.5$$

Interpretation:

The score is 1.5 standard deviations above the mean
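
The same calculation as a short Python sketch. The helper `z_score` is a hypothetical name, and the second exam (mean 500, SD 100) is an invented example added only to show how Z-scores put values from different scales on a common footing.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd


# Worked example from the text.
exam_a = z_score(85, mean=70, sd=10)
print(exam_a)  # 1.5

# Hypothetical exam on a different scale, for comparison.
exam_b = z_score(680, mean=500, sd=100)
print(exam_b)  # 1.8

# The raw scores are not comparable, but the Z-scores are:
# the second result sits further above its own mean.
print(exam_b > exam_a)  # True
```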

8. Relationship Between Z-Scores and Percentiles

In a normal distribution:

| Z | Percentile |
|---|---|
| 0 | 50% |
| 1 | ~84% |
| 2 | ~97.7% |
| -1 | ~16% |

This connection is vital in:

Hypothesis testing

Probability estimation

Statistical modelling
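
The mapping from Z-scores to percentiles is the cumulative distribution function of the standard normal distribution. A minimal sketch with the standard library's `statistics.NormalDist` follows; the helper name `z_to_percentile` is hypothetical, and the conversion is only meaningful when the data are approximately normal.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def z_to_percentile(z):
    """Percentage of a standard normal distribution lying below z."""
    return standard_normal.cdf(z) * 100


for z in (0, 1, 2, -1):
    print(f"z = {z:>2}: {z_to_percentile(z):.1f}%")
# z =  0: 50.0%
# z =  1: 84.1%
# z =  2: 97.7%
# z = -1: 15.9%
```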

9. Ranks

Definition

Rank assigns an ordinal position to each observation.

Example:

Highest score → Rank 1

Next → Rank 2

Types of Ranking

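Dense ranking (1, 2, 2, 3): tied values share a rank and no rank is skipped

Competition ranking (1, 2, 2, 4): tied values share a rank and the following rank is skipped

Fractional ranking (1, 2.5, 2.5, 4): tied values receive the average of the positions they occupy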
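
The three tie-handling schemes can be compared side by side with a small pure-Python sketch. The function name `rank_scores` and the sample scores are hypothetical; the highest score receives rank 1, as in the example above.

```python
def rank_scores(scores):
    """Rank scores in descending order under three tie-handling schemes."""
    ordered = sorted(scores, reverse=True)
    distinct = sorted(set(scores), reverse=True)

    # Dense: consecutive ranks over the distinct values.
    dense = {s: i + 1 for i, s in enumerate(distinct)}

    competition = {}
    fractional = {}
    for s in distinct:
        first = ordered.index(s) + 1  # first position held by this score
        count = ordered.count(s)      # number of tied observations
        competition[s] = first        # ties share the first position
        fractional[s] = first + (count - 1) / 2  # mean of tied positions

    return {
        "dense": [dense[s] for s in scores],
        "competition": [competition[s] for s in scores],
        "fractional": [fractional[s] for s in scores],
    }


ranks = rank_scores([90, 85, 85, 70])
print(ranks["dense"])        # [1, 2, 2, 3]
print(ranks["competition"])  # [1, 2, 2, 4]
print(ranks["fractional"])   # [1.0, 2.5, 2.5, 4.0]
```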

Limitations

Ignores magnitude differences

Not suitable for distance-based models

10. Measures of Position vs Measures of Central Tendency

| Aspect | Central Tendency | Position |
|---|---|---|
| Focus | Typical value | Relative standing |
| Examples | Mean, Median | Percentiles, Z-scores |
| Outliers | Sensitive (mean) | Often robust |
| Use in ML | Baseline | Feature scaling, anomaly detection |

11. Real-World Data Science Applications

  1. Machine Learning

Feature normalization using Z-scores

Quantile transformation

  2. Finance

Value-at-Risk (VaR) → percentile-based

Risk classification using deciles

  3. Healthcare

Growth percentiles (BMI-for-age)

Lab result interpretation

  4. Education

Standardized test scores

Admission cut-offs
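
As one illustration of the machine learning use case, feature normalization with Z-scores can be sketched as follows. The function name `standardize` and the feature values are hypothetical, and using the population standard deviation (`pstdev`) is an assumption; some tools use the sample standard deviation instead.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 using Z-scores."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column with mean 5 and population SD 2.
feature = [2, 4, 4, 4, 5, 5, 7, 9]
print(standardize(feature))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```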

12. Summary Table

| Measure | Purpose | Robust to Outliers |
|---|---|---|
| Percentile | Relative position | Yes |
| Quartile | Spread & position | Yes |
| Decile | Distribution segmentation | Yes |
| Z-score | Standardized distance | No |
| Rank | Order comparison | Yes |

13. Key Takeaways

Measures of position explain where a value lies, not just what it is.

Percentiles and quartiles are distribution-free.

Z-scores can be computed for any distribution and allow comparisons across scales, but converting them to percentiles assumes normality.

In data science, they are foundational for scaling, anomaly detection, and interpretation.
