DEV Community

yvonne gatwiri

SKEWNESS AND KURTOSIS

Understanding Skewness and Kurtosis in Data Distribution

In statistics, skewness and kurtosis are two measures that describe the shape of a probability distribution. Skewness indicates whether the data is symmetric or leans to one side, and kurtosis indicates how heavy the tails are.

1. Skewness

Skewness refers to the asymmetry of the distribution of a dataset. A perfectly symmetric distribution has a skewness of zero. If the distribution is skewed to the left (negative skew), the tail is longer on the left side. If it is skewed to the right (positive skew), the tail is longer on the right side.

Formula for Skewness:

$$
\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3
$$

Where:

  • $ n $ is the sample size
  • $ x_i $ is the $ i $-th data point
  • $ \bar{x} $ is the sample mean
  • $ s $ is the sample standard deviation

Interpretation:

  • Skewness ≈ 0: Approximately symmetric distribution
  • Skewness > 0: Right-skewed (positive skew)
  • Skewness < 0: Left-skewed (negative skew)
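The skewness formula above can be sketched in plain Python as follows. This is a minimal illustration using only the standard library; the function name `sample_skewness` and the three small datasets are assumptions made for the example, not part of the original article.

```python
import math


def sample_skewness(data):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 data points")
    mean = sum(data) / n
    # Sample standard deviation (divides by n - 1)
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    cubed_sum = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed_sum


# Hypothetical datasets chosen to show each sign of skewness
right_skewed = [1, 2, 3, 4, 10]      # one large value stretches the right tail
symmetric = [1, 2, 3, 4, 5]
left_skewed = [-10, -4, -3, -2, -1]  # mirror image of right_skewed

print(round(sample_skewness(right_skewed), 3))  # 1.697
print(sample_skewness(symmetric))               # very close to 0
print(round(sample_skewness(left_skewed), 3))   # -1.697
```

With only five points these estimates are noisy, so the sign is more trustworthy than the exact magnitude.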

2. Kurtosis

Kurtosis measures the "tailedness" of a distribution: how much of its probability mass sits in the tails compared with a normal distribution. A normal distribution has a kurtosis of 3. Distributions with higher kurtosis have heavier tails and produce more extreme outliers, while those with lower kurtosis have lighter tails.

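Formula for Kurtosis:

The bias-corrected sample estimator below returns excess kurtosis (kurtosis minus 3), so normally distributed data scores close to 0 rather than 3:

$$
\text{Excess Kurtosis} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}
$$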

Where:

  • $ n $ is the sample size
  • $ x_i $ is the $ i $-th data point
  • $ \bar{x} $ is the sample mean
  • $ s $ is the sample standard deviation

Interpretation:

  • Kurtosis = 3 (excess kurtosis = 0): Tails similar to a normal distribution (mesokurtic)
  • Kurtosis > 3 (excess kurtosis > 0): Heavy-tailed distribution (leptokurtic)
  • Kurtosis < 3 (excess kurtosis < 0): Light-tailed distribution (platykurtic)
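As a rough sketch, the sample kurtosis estimator can be written in plain Python like this. The function name `sample_excess_kurtosis` and the datasets are assumptions for illustration. Note two details: the subtracted term uses $(n-1)^2$, which is the standard form of this bias-corrected estimator, and the result is excess kurtosis, so 3 is added back to compare against the thresholds above.

```python
import math


def sample_excess_kurtosis(data):
    """Bias-corrected sample excess kurtosis (about 0 for normal data)."""
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 data points")
    mean = sum(data) / n
    # Sample standard deviation (divides by n - 1)
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    if s == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    fourth_sum = sum(((x - mean) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth_sum - correction


# Hypothetical datasets: evenly spread values vs. one extreme value
flat = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

print(round(sample_excess_kurtosis(flat), 3))          # -1.2  (light tails)
print(round(sample_excess_kurtosis(with_outlier), 3))  # 3.152 (heavy tail)

# Add 3 to express the result on the "normal = 3" scale
print(round(sample_excess_kurtosis(flat) + 3, 3))      # 1.8
```

Because the fourth power amplifies extreme values, a single outlier in a small sample can dominate the estimate, so larger samples give more stable results.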

Summary

| Measure | Description | Formula |
| --- | --- | --- |
| Skewness | Asymmetry of the distribution | $ \frac{n}{(n-1)(n-2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3 $ |
| Kurtosis (excess) | Tailedness of the distribution | $ \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} $ |

Understanding Skewness and Kurtosis is essential for analyzing data and making informed decisions in fields such as finance, economics, and social sciences.

