Nicholus Gathirwa

Understanding Skewness and Kurtosis in Statistics

When analyzing data distributions, two crucial statistical measures help us understand the shape and characteristics of our data: skewness and kurtosis. These measures go beyond simple central tendency (mean, median, mode) and dispersion (variance, standard deviation) to provide deeper insights into how our data is distributed.

What is Skewness?

Skewness measures the asymmetry of a probability distribution around its mean. It tells us whether the data is symmetrically distributed or if it has a longer tail on one side.

Types of Skewness

Image showing the 3 types of skewness

1. Symmetric Distribution (Zero Skewness)

  • Skewness ≈ 0
  • The distribution is balanced around the mean
  • Mean = Median = Mode (for a unimodal distribution)
  • Both tails are equal in length

2. Right Skewed (Positive Skewness)

  • Skewness > 0
  • The right tail is longer than the left tail
  • Typically Mean > Median > Mode (a common pattern, not a guarantee)
  • Most data points are concentrated on the left side
  • Also called "positively skewed"

3. Left Skewed (Negative Skewness)

  • Skewness < 0
  • The left tail is longer than the right tail
  • Typically Mode > Median > Mean (a common pattern, not a guarantee)
  • Most data points are concentrated on the right side
  • Also called "negatively skewed"
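To see the mean/median ordering in practice, here is a minimal sketch that draws a right-skewed sample and mirrors it to get a left-skewed one. The exponential distribution, the seed, and the variable names are assumptions chosen for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Right-skewed sample: exponential data has a long right tail.
right_skewed = rng.exponential(scale=2.0, size=10_000)
# Mirroring the sample flips the long tail to the left.
left_skewed = -right_skewed

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(f"{name}: skewness={stats.skew(data):.2f}, "
          f"mean={np.mean(data):.2f}, median={np.median(data):.2f}")
```

For the right-skewed sample the mean sits above the median because the long right tail pulls it up; the mirrored sample shows the opposite ordering.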

Calculating Skewness

The formula for sample skewness is:

Skewness = (n / ((n-1)(n-2))) × Σ((xi - x̄) / s)³

Where:

  • n = sample size
  • xi = each data point
  • x̄ = sample mean
  • s = sample standard deviation
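The formula can be written out directly in NumPy. The sketch below is one possible implementation, with a hypothetical helper name (`sample_skewness`) and made-up data; it is compared against `scipy.stats.skew` with `bias=False`, which applies the same sample correction.

```python
import numpy as np
from scipy import stats

def sample_skewness(x):
    """Adjusted sample skewness, following the formula above."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 data points")
    mean = x.mean()
    s = x.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)
    return n / ((n - 1) * (n - 2)) * np.sum(((x - mean) / s) ** 3)

# Illustrative data with two large values stretching the right tail.
data = [2, 3, 3, 4, 4, 4, 5, 5, 9, 15]
print(f"manual: {sample_skewness(data):.4f}")
print(f"scipy:  {stats.skew(data, bias=False):.4f}")
```

Note that `stats.skew` uses `bias=True` by default, so without `bias=False` its result differs slightly from this formula on small samples.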

Interpreting Skewness Values

  • -0.5 to 0.5: Approximately symmetric
  • -1 to -0.5 or 0.5 to 1: Moderately skewed
  • < -1 or > 1: Highly skewed
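These rule-of-thumb ranges are easy to turn into a small helper. The function name and the handling of the exact boundary values (0.5 and 1 fall into the lower band) are assumptions made for this sketch.

```python
def describe_skewness(skew):
    """Map a skewness value to the rule-of-thumb labels listed above."""
    magnitude = abs(skew)
    if magnitude <= 0.5:
        return "approximately symmetric"
    direction = "right" if skew > 0 else "left"
    if magnitude <= 1:
        return f"moderately {direction}-skewed"
    return f"highly {direction}-skewed"

for value in (0.1, -0.7, 2.3):
    print(value, "->", describe_skewness(value))
```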

What is Kurtosis?

Kurtosis measures the "tailedness" of a distribution: how heavy or light the tails are compared to a normal distribution. It is often described as a measure of how peaked or flat the distribution is around the mean, but its value is driven mainly by the tails.

Types of Kurtosis

Image showing the 3 types of kurtosis

1. Mesokurtic (Normal Kurtosis)

  • Kurtosis = 3 (or excess kurtosis = 0)
  • Similar to a normal distribution
  • Moderate tail thickness and peak height

2. Leptokurtic (High Kurtosis)

  • Kurtosis > 3 (or excess kurtosis > 0)
  • Heavy tails and sharp peak
  • More data concentrated around the mean
  • Higher probability of extreme values

3. Platykurtic (Low Kurtosis)

  • Kurtosis < 3 (or excess kurtosis < 0)
  • Light tails and flat peak
  • Data is more spread out
  • Lower probability of extreme values
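The three types can be illustrated with simulated data. This sketch assumes a uniform sample as the platykurtic example and a Laplace sample as the leptokurtic one; those choices, the seed, and the sample size are illustrative, not the only options.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100_000

samples = {
    "uniform (platykurtic)": rng.uniform(-1, 1, n),
    "normal (mesokurtic)": rng.normal(0, 1, n),
    "Laplace (leptokurtic)": rng.laplace(0, 1, n),
}

for name, data in samples.items():
    # stats.kurtosis returns excess kurtosis by default (fisher=True).
    excess = stats.kurtosis(data)
    print(f"{name}: excess kurtosis={excess:.2f}, kurtosis={excess + 3:.2f}")
```

In theory the excess kurtosis is -1.2 for the uniform distribution, 0 for the normal, and 3 for the Laplace, so the printed values should land close to those.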

Calculating Kurtosis

The formula for sample kurtosis is:

Kurtosis = (n(n+1) / ((n-1)(n-2)(n-3))) × Σ((xi - x̄) / s)⁴ - (3(n-1)² / ((n-2)(n-3)))

This formula gives excess kurtosis (kurtosis - 3).
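As with skewness, the formula translates directly into NumPy. The helper name (`sample_excess_kurtosis`) and the data are hypothetical; the result is checked against `scipy.stats.kurtosis` with `bias=False`, which applies the same correction and returns excess kurtosis by default.

```python
import numpy as np
from scipy import stats

def sample_excess_kurtosis(x):
    """Bias-corrected sample excess kurtosis, following the formula above."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 4:
        raise ValueError("need at least 4 data points")
    z = (x - x.mean()) / x.std(ddof=1)  # standardize with the sample std
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * np.sum(z ** 4) - correction

# Illustrative data with two large values in the right tail.
data = [2, 3, 3, 4, 4, 4, 5, 5, 9, 15]
print(f"manual: {sample_excess_kurtosis(data):.4f}")
print(f"scipy:  {stats.kurtosis(data, bias=False):.4f}")
```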

Interpreting Kurtosis Values

  • Excess kurtosis ≈ 0: Tails similar to a normal distribution
  • Excess kurtosis > 0: Heavier tails than normal
  • Excess kurtosis < 0: Lighter tails than normal

Why Skewness and Kurtosis Matter

1. Data Quality Assessment

Understanding these measures helps identify outliers and data quality issues.

2. Statistical Test Selection

Many statistical tests assume normality. Skewness and kurtosis help determine if transformations are needed.
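One way to make this check concrete is to test whether the observed skewness and kurtosis are compatible with a normal distribution. The sketch below uses SciPy's `skewtest`, `kurtosistest`, and `normaltest` (the D'Agostino-Pearson test, which combines the two). The samples, the seed, and the 0.05 threshold are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
normal_sample = rng.normal(loc=0, scale=1, size=500)
skewed_sample = rng.exponential(scale=2, size=500)

for name, data in [("normal", normal_sample), ("exponential", skewed_sample)]:
    skew_p = stats.skewtest(data).pvalue        # H0: skewness matches a normal
    kurt_p = stats.kurtosistest(data).pvalue    # H0: kurtosis matches a normal
    omnibus_p = stats.normaltest(data).pvalue   # combines both tests
    verdict = "consider a transformation" if omnibus_p < 0.05 else "no evidence against normality"
    print(f"{name}: skew p={skew_p:.3g}, kurtosis p={kurt_p:.3g}, "
          f"omnibus p={omnibus_p:.3g} -> {verdict}")
```

A small p-value is evidence against normality. A large one does not prove the data is normal, so it is worth pairing these tests with a plot.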

3. Risk Assessment

In finance, high kurtosis indicates higher probability of extreme events (fat tails).

4. Model Selection

Different distributions may be more appropriate based on skewness and kurtosis values.

Real-World Examples

Income Distribution (Right Skewed)

  • Most people earn moderate incomes
  • Few people earn extremely high incomes
  • Results in a long right tail

Test Scores (Left Skewed)

  • Most students score well on an easy test
  • Few students score poorly
  • Results in a long left tail

Stock Returns (High Kurtosis)

  • Most days show small price changes
  • Occasional days show extreme changes (crashes or rallies)
  • Results in heavy tails

Computing Skewness and Kurtosis

Python Example

import numpy as np
from scipy import stats

# Generate sample data (fixed seed for reproducibility)
np.random.seed(42)
normal_data = np.random.normal(0, 1, 1000)      # symmetric: mean 0, std 1
skewed_data = np.random.exponential(2, 1000)    # right-skewed: scale 2

# Calculate skewness and excess kurtosis.
# bias=False applies the sample corrections used in the formulas above;
# stats.kurtosis returns excess kurtosis (normal = 0) by default.
print("Normal Data:")
print(f"Skewness: {stats.skew(normal_data, bias=False):.3f}")
print(f"Excess kurtosis: {stats.kurtosis(normal_data, bias=False):.3f}")

print("\nSkewed Data:")
print(f"Skewness: {stats.skew(skewed_data, bias=False):.3f}")
print(f"Excess kurtosis: {stats.kurtosis(skewed_data, bias=False):.3f}")

Common Misconceptions

  1. Skewness doesn't always indicate outliers - it measures asymmetry, not necessarily extreme values
  2. High kurtosis doesn't mean bimodality - it refers to tail behavior, not multiple peaks
  3. Zero skewness doesn't guarantee normality - a distribution can be symmetric but not normal
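Two of these points can be checked with simulated data. In this sketch, a uniform sample stands in for "symmetric but not normal" and an equal mix of two normal distributions stands in for "bimodal". Both choices, along with the seed and sample size, are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 50_000

# Symmetric but clearly not normal: flat between -1 and 1.
uniform_sample = rng.uniform(-1, 1, n)

# Bimodal: equal mix of two normal distributions centered at -2 and +2.
bimodal_sample = np.concatenate([
    rng.normal(-2, 1, n // 2),
    rng.normal(2, 1, n // 2),
])

print(f"uniform: skewness={stats.skew(uniform_sample):.3f}, "
      f"excess kurtosis={stats.kurtosis(uniform_sample):.3f}")
print(f"bimodal: skewness={stats.skew(bimodal_sample):.3f}, "
      f"excess kurtosis={stats.kurtosis(bimodal_sample):.3f}")
```

The uniform sample has skewness near zero but negative excess kurtosis, so it is symmetric without being normal. The bimodal sample also has negative excess kurtosis: two peaks do not produce high kurtosis.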

How to Address Different Types of Skewness (Overview)

For Right Skewed Data:

You can use:

  • Log transformation: log(x), for x > 0
  • Square root transformation: √x, for x ≥ 0
  • Box-Cox transformation: (x^λ - 1) / λ for λ ≠ 0 and log(x) for λ = 0, with x > 0
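Here is a minimal sketch of the three transformations applied to the same right-skewed sample. The lognormal data, the seed, and the variable names are assumptions; `scipy.stats.boxcox` picks λ by maximum likelihood when none is supplied and requires strictly positive input.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Strictly positive, right-skewed data (lognormal is an illustrative choice).
x = rng.lognormal(mean=0.0, sigma=1.0, size=5_000)

log_x = np.log(x)
sqrt_x = np.sqrt(x)
boxcox_x, fitted_lambda = stats.boxcox(x)  # returns transformed data and fitted lambda

print(f"original:    skewness={stats.skew(x):.2f}")
print(f"log:         skewness={stats.skew(log_x):.2f}")
print(f"square root: skewness={stats.skew(sqrt_x):.2f}")
print(f"Box-Cox:     skewness={stats.skew(boxcox_x):.2f} (lambda={fitted_lambda:.2f})")
```

The square root is the gentler correction and the log the stronger one. Which one fits depends on the data, so it helps to compare skewness before and after.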

For Left Skewed Data:

  • Reflection then transformation: Apply right-skew transformations to (max(x) + 1 - x)
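The reflection step can be sketched as follows. The simulated "test scores" are hypothetical data built to be left-skewed, and the log is just one of the right-skew transformations that could follow the reflection.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Hypothetical left-skewed scores: most near 100, a long tail of low scores.
scores = np.clip(100 - rng.exponential(scale=10, size=5_000), 0, 100)

# Reflect so the long tail points right; the smallest reflected value is 1.
reflected = scores.max() + 1 - scores
# Apply a right-skew transformation to the reflected data.
log_reflected = np.log(reflected)

print(f"original:        skewness={stats.skew(scores):.2f}")
print(f"reflected:       skewness={stats.skew(reflected):.2f}")
print(f"reflected + log: skewness={stats.skew(log_reflected):.2f}")
```

Keep in mind that reflection reverses the order of the values: high original scores become low transformed values. The log can also overcorrect, so check the resulting skewness rather than assuming it is near zero.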

Key Takeaways

Skewness and kurtosis provide valuable insights that complement basic descriptive statistics and help in:

  • Choosing appropriate statistical methods
  • Identifying data anomalies
  • Understanding risk profiles
  • Making informed decisions about data transformations

When combined with visual tools like histograms and Q-Q plots, these measures form a comprehensive toolkit for distribution analysis.
