DEV Community

Sajjad Rahman


Skewness in Data: Concepts and Math Examples

When analyzing data, summary statistics like the mean and median help us understand central tendency, but they don't tell us about the shape of the distribution. Sometimes data leans more to one side, with a longer tail in one direction. This property is called skewness.

In this post, we'll cover:

  • What skewness is
  • Types of skewness
  • A worked-out math example
  • Real dataset results
  • Why skewness matters in data analysis and ML

🔹 What is Skewness?

Skewness measures how asymmetric a data distribution is.

  • Skewness ≈ 0 → Symmetric (e.g., a normal distribution)
  • Skewness > 0 → Right (Positive) Skew
  • Skewness < 0 → Left (Negative) Skew

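Formula (one common definition, the sample moment coefficient of skewness):

$$
g_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{3/2}}
$$

Where:

  • $n$ = number of observations
  • $x_i$ = the $i$-th observation
  • $\bar{x}$ = sample mean

Libraries such as pandas report a bias-adjusted version, $G_1 = g_1 \cdot \sqrt{n(n-1)}/(n-2)$.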
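The moment coefficient of skewness is the mean cubed deviation divided by the mean squared deviation raised to the 3/2 power. It translates directly into code. Below is a minimal Python sketch that assumes a plain list of numbers. `moment_skewness` and `adjusted_skewness` are hypothetical helper names, not a library API. The second function applies the bias correction $\sqrt{n(n-1)}/(n-2)$. That corrected estimator is what pandas' `Series.skew()` and `scipy.stats.skew(..., bias=False)` report.

```python
import math


def moment_skewness(xs):
    """Sample moment coefficient of skewness: g1 = m3 / m2**1.5."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # mean squared deviation
    m3 = sum((x - mean) ** 3 for x in xs) / n  # mean cubed deviation
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


def adjusted_skewness(xs):
    """Bias-adjusted G1 = g1 * sqrt(n(n-1)) / (n-2), the variant pandas reports."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    return moment_skewness(xs) * math.sqrt(n * (n - 1)) / (n - 2)


# Usage: the exam-score dataset from the worked example in this post
X = [40, 45, 50, 55, 60, 65, 70, 75, 80, 100]
print(round(moment_skewness(X), 3), round(adjusted_skewness(X), 3))  # 0.557 0.661
```

The sign follows the rule above. A long right tail makes the cubed deviations sum to a positive value, and a long left tail makes them sum to a negative one.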

🔹 Types of Skewness

  1. Positive Skew (Right Skew)
  • The tail is longer on the right.
  • The mean is typically greater than the median.
  • Example: Salaries, house prices.
  2. Negative Skew (Left Skew)
  • The tail is longer on the left.
  • The mean is typically less than the median.
  • Example: Retirement age, exam scores on an easy test (a few very low values).
  3. No Skew (Symmetric)
  • Balanced distribution.
  • Mean ≈ Median ≈ Mode.
  • Example: Human height (approximately).

🔹 Step-by-Step Math Example

Consider this dataset of exam scores:

    `X = [40, 45, 50, 55, 60, 65, 70, 75, 80, 100]`

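  • Mean = 640 / 10 = 64
  • Median = (60 + 65) / 2 = 62.5 (the average of the 5th and 6th sorted values)
  • Mode: none (every value appears exactly once)

The mean (64) sits above the median (62.5), which is the typical signature of right skew.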

Result: positive skew. Most scores are spread evenly between 40 and 80, but a single high score (100) stretches the tail to the right.
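Comparing the mean and median shows the direction of the skew. The moment coefficient puts a number on it. Below, that coefficient is worked by hand for the same dataset. This is a sketch using the standard definition $g_1 = m_3 / m_2^{3/2}$, where $m_2$ and $m_3$ are the mean squared and mean cubed deviations from the mean. The post's original formula may have used a slightly different variant.

```latex
\begin{aligned}
\bar{x} &= \tfrac{640}{10} = 64 \\
x_i - \bar{x} &= -24,\,-19,\,-14,\,-9,\,-4,\,1,\,6,\,11,\,16,\,36 \\
m_2 &= \tfrac{1}{n}\textstyle\sum_i (x_i - \bar{x})^2 = \tfrac{2940}{10} = 294 \\
m_3 &= \tfrac{1}{n}\textstyle\sum_i (x_i - \bar{x})^3 = \tfrac{28080}{10} = 2808 \\
g_1 &= \frac{m_3}{m_2^{3/2}} = \frac{2808}{294^{3/2}} \approx \frac{2808}{5041.05} \approx 0.557 \\
G_1 &= g_1 \cdot \frac{\sqrt{n(n-1)}}{n-2} = 0.557 \cdot \frac{\sqrt{90}}{8} \approx 0.661
\end{aligned}
```

Both $g_1 \approx 0.557$ and the bias-adjusted $G_1 \approx 0.661$ lie between 0.5 and 1. The rule of thumb later in this post classes that range as moderate (right) skew. The single cubed deviation of the score 100 ($36^3 = 46656$) outweighs all the negative cubes combined ($-24220$).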


🔹 Real Dataset Example

We computed skewness for three numerical features:

no_of_employees    12.26
yr_of_estab        -2.03
prevailing_wage     0.76
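The post doesn't show the code behind these numbers. One common way to get such a table, assuming pandas, is `DataFrame.skew()`, which returns the bias-adjusted sample skewness for each column. The DataFrame below is made-up toy data that only borrows the column names. Its values and the resulting numbers are not from the real dataset.

```python
import pandas as pd

# Hypothetical toy data: column names mirror the post, values are invented
df = pd.DataFrame({
    "no_of_employees": [12, 25, 40, 55, 80, 120, 300, 5000],      # one huge company
    "yr_of_estab": [1800, 1950, 1985, 1995, 2000, 2005, 2010, 2015],  # one very old company
    "prevailing_wage": [30000, 42000, 50000, 55000, 61000, 70000, 85000, 150000],
})

# DataFrame.skew() computes the bias-adjusted sample skewness of each column
skewness = df.skew()
print(skewness.round(2))
```

With real data you would typically select the numeric feature columns first, for example `df[["no_of_employees", "yr_of_estab", "prevailing_wage"]].skew()`.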

📌 Interpretation:

  • no_of_employees → Highly right-skewed (a few very large companies).
  • yr_of_estab → Highly left-skewed (a few very old companies).
  • prevailing_wage → Moderately right-skewed (most salaries are average, a few are very high).

🔹 Why is Skewness Important?

  1. Outlier Detection
  • Extreme skew often points to outliers or a long tail worth inspecting.
  2. Feature Engineering
  • Some methods, such as linear regression (which assumes normally distributed residuals), tend to work better when variables are not heavily skewed.
  • Highly skewed data may benefit from transformations (log, Box-Cox, Yeo-Johnson).
  3. Business Insights
  • Identifies rare but impactful cases (e.g., very high salaries, very large companies).
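Here is what the transformations listed above do to a skewed feature. This is a minimal sketch assuming NumPy and SciPy. It uses synthetic lognormal data as a stand-in for a right-skewed feature such as company size, simulated rather than taken from the post. The sketch applies three transformations:

  • `np.log1p`, which computes log(1 + x) and tolerates zeros.
  • `scipy.stats.boxcox`, which requires strictly positive input.
  • `scipy.stats.yeojohnson`, which also accepts zero and negative values.

When no λ is passed, both SciPy functions estimate it by maximum likelihood.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
# Synthetic right-skewed feature: lognormal draws (hypothetical stand-in for company size)
x = rng.lognormal(mean=3.0, sigma=1.0, size=1_000)

x_log = np.log1p(x)                 # log(1 + x): simple, tolerates zeros
x_bc, lam_bc = stats.boxcox(x)      # Box-Cox: strictly positive data only, lambda fit by MLE
x_yj, lam_yj = stats.yeojohnson(x)  # Yeo-Johnson: also handles zero/negative values

for label, values in [("original", x), ("log1p", x_log), ("Box-Cox", x_bc), ("Yeo-Johnson", x_yj)]:
    # stats.skew with its default bias=True computes the moment coefficient g1
    print(f"{label:<12} skewness = {stats.skew(values):6.2f}")
```

On data like this, the transformed versions usually land much closer to zero than the original. The exact numbers depend on the random draw. For a real feature, check the skewness after transforming rather than assuming it improved.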

🔹 Rule of Thumb

  • Between -0.5 and +0.5 → Approximately symmetric
  • Between 0.5 and 1 in absolute value → Moderate skew
  • Greater than 1 in absolute value → Highly skewed
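This rule of thumb is easy to apply automatically during EDA. The sketch below uses a hypothetical helper, `describe_skew`, that labels a value using the thresholds 0.5 and 1 on its absolute value. The rule leaves open which category the exact boundary values belong to. Here they fall into the milder category, which is an arbitrary choice.

```python
def describe_skew(value: float) -> str:
    """Label a skewness value using thresholds 0.5 and 1 on |value|."""
    magnitude = abs(value)
    if magnitude <= 0.5:
        return "approximately symmetric"
    direction = "right" if value > 0 else "left"
    if magnitude <= 1:
        return f"moderately {direction}-skewed"
    return f"highly {direction}-skewed"


# Values taken from the real-dataset table in this post
for name, value in [("no_of_employees", 12.26), ("yr_of_estab", -2.03), ("prevailing_wage", 0.76)]:
    print(f"{name:<16} {value:>6.2f}  {describe_skew(value)}")
```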

🔹 Conclusion

Skewness is more than a single number: it's a window into your data's hidden structure. By analyzing skewness during Exploratory Data Analysis (EDA), you can:

  • Detect and handle outliers
  • Apply transformations to reduce skew (and often stabilize variance)
  • Potentially improve machine learning model performance

So next time you see a skewness value in your dataset, take a closer look — it might reveal something important about your data’s story.
