Variance is a measure of how much spread (or dispersed or scattered) a set of data is from its mean (average). It quantifies how much the individual data points differ from the central point of the data. A higher variance indicates that the data points are more dispersed, while a lower variance suggests the data points are clustered closer to the mean.
Std-dev σ is the square root of variance.
- For population variance, divide by n
- For sample variance, divide by n-1
- For large n, the result is almost same
How to decide which variance to use: population or sample?
If you are analyzing all the data from the group you're interested in, then use population.
If you are analyzing a subset (a sample) of the full group and trying to generalize, then use sample.
Understanding variance through an example:
Let us say we have 10 data points that represents the whole population. Calculate population variance of following
Sample 1: All are same number (high precision)..
(e.g. time intervals (in seconds) between each operation in an assembly line)
Time_interval = 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
Here mean = (5 + 5 + …+ 5)/ 10 = 5.
Variance is ([5−5]^2/10 + [5−5]^2/10 + …+ [5−5]^2/10 ) = 0
Std_dev , σ = √𝟎 = 0
Sample 2: Few spread around mean ( e.g. Package Weights in a Production Line)
wt = 4, 3, 5, 6, 7, 3, 5, 6, 4, 5
Here mean = (4 + 3 + …+ 5)/ 10 = 4.8
Variance is ([4−4.8]^2/10 + [3−4.8]^2/10 + …+ [5−4.8]^2/10 ) = 1.56
Std. dev σ = √1.56 = 1.3
Sample 3: Large spread around mean (e.g. Monthly Income in a Big City)
income = 1, 1, 0, 0, 5, 9, 9, 10, 10, 5
Here mean = (1 + 1 + …+ 5)/ 10 = 5 .
Variance is = 16.8
Std. dev σ = √16.8 = 4.1
A standard deviation (or σ) is a measure of how dispersed the data is in relation to the mean. Low, or small, standard deviation indicates data are clustered tightly around the mean, and high, or large, standard deviation indicates data are more spread out. A standard deviation close to zero indicates that data points are very close to the mean, whereas a larger standard deviation indicates data points are spread further away from the mean.
In the image, the curve on top is more spread out and therefore has a higher standard deviation, while the curve below is more clustered around the mean and therefore has a lower standard deviation
Why the 2 measure: variance and std_dev?
Short Answer: Variance is for math, standard deviation is for humans!
The units matter
- Variance is the average squared deviation from the mean
- So if your data is in, say, cm, then variance is in square cm (cm²).
- Squared units aren’t directly interpretable: nobody says “the spread of your height is 100 cm²”.
- By taking the square root, you bring the units back to the original scale
- If the mean height is 170 cm and the standard deviation is 10 cm, you can immediately say: “Most data (actually around 68%) is within 10 cm above or below the mean.”
- That’s intuitive. But “the variance is 100 cm²” is not easily interpretable by our brain — squared differences are abstract
Shamlodhiya Ashwani
Top comments (0)