When analyzing data, summary statistics like mean and median often help us understand central tendency. But they don’t tell us about the shape of the distribution. Sometimes data leans more to one side — this property is called skewness.
In this post, we will cover:
- What skewness is
- Types of skewness
- A worked-out math example
- Real dataset results
- Why skewness matters in data analysis and ML
🔹 What is Skewness?
Skewness measures how asymmetric a data distribution is.
- Skewness ≈ 0 → Symmetric (e.g., a normal distribution)
- Skewness > 0 → Right (Positive) Skew
- Skewness < 0 → Left (Negative) Skew
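Formula (moment coefficient of skewness):

$$\text{Skewness} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\sigma}\right)^3$$

Where:
- $n$ = number of observations
- $x_i$ = the $i$-th observation
- $\bar{x}$ = the mean of the observations
- $\sigma$ = the standard deviation (computed with divisor $n$)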
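The moment coefficient is simply the average of the cubed z-scores, so it translates almost line for line into code. Here is a minimal sketch in plain Python, assuming the population convention (divisor $n$); the function name `skewness` is our own, not a library API.

```python
from statistics import fmean


def skewness(values: list[float]) -> float:
    """Moment coefficient of skewness: mean of cubed z-scores, sigma uses divisor n."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5  # equals (1/n) * sum(((x - mean) / sigma) ** 3)


X = [40, 45, 50, 55, 60, 65, 70, 75, 80, 100]
print(round(skewness(X), 3))  # 0.557
```

A perfectly symmetric sample such as `[1, 2, 3]` returns `0.0`, because the negative and positive cubed deviations cancel out.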
🔹 Types of Skewness
- Positive Skew (Right Skew)
  - Longer tail on the right.
  - Mean is usually greater than the median.
  - Examples: salaries, house prices.
- Negative Skew (Left Skew)
  - Longer tail on the left.
  - Mean is usually less than the median.
  - Examples: retirement age, scores on an easy exam (a few very low values).
- No Skew (Symmetric)
  - Balanced distribution.
  - Mean ≈ median ≈ mode.
  - Example: human height (approximately).
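The mean/median relationship is easy to check on toy data. The two samples below are made up purely for illustration: a single extreme value drags the mean toward the tail while the median barely moves. Keep in mind this is a tendency, not a guarantee for every distribution.

```python
from statistics import mean, median

# Hypothetical toy samples (invented for illustration, not real data)
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # one large value -> long right tail
left_skewed = [-20, 4, 5, 5, 6, 6, 6, 7]   # one small value -> long left tail

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name, "mean =", mean(data), "median =", median(data))
```

For the right-skewed sample the mean (4.75) sits above the median (3.0); for the left-skewed sample the mean (2.375) sits below the median (5.5).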
🔹 Step-by-Step Math Example
Consider the following exam scores for ten students:
`X = [40, 45, 50, 55, 60, 65, 70, 75, 80, 100]`
✅ Result: Positive skew. The scores are spread fairly evenly between 40 and 80, but the single high score (100) stretches the tail to the right and pulls the mean (64) above the median (62.5).
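Here are the intermediate steps for this dataset, using the moment coefficient (average cubed deviation divided by $\sigma^3$, with $\sigma$ computed using divisor $n = 10$). The deviations from the mean are $x_i - \bar{x} = -24, -19, -14, -9, -4, 1, 6, 11, 16, 36$.

$$
\begin{aligned}
\bar{x} &= \frac{640}{10} = 64 \\
\sigma^2 &= \frac{1}{10}\sum_{i=1}^{10}(x_i - \bar{x})^2 = \frac{2940}{10} = 294, \qquad \sigma \approx 17.15 \\
\frac{1}{10}\sum_{i=1}^{10}(x_i - \bar{x})^3 &= \frac{28080}{10} = 2808 \\
\text{Skewness} &= \frac{2808}{294^{3/2}} \approx \frac{2808}{5041.05} \approx 0.557
\end{aligned}
$$

The cube of the largest deviation ($36^3 = 46656$) outweighs all the negative cubes combined ($-24220$), which is exactly why the result comes out positive.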
🔹 Real Dataset Example
We computed skewness for three numerical features:

| Feature | Skewness |
|---|---|
| no_of_employees | 12.26 |
| yr_of_estab | -2.03 |
| prevailing_wage | 0.76 |

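The post does not say which tool produced these numbers. A common choice is pandas, whose `DataFrame.skew()` returns the bias-adjusted Fisher-Pearson coefficient rather than the plain moment coefficient, so small samples give a somewhat larger magnitude. A minimal sketch, reusing the ten exam scores in a hypothetical `score` column:

```python
import pandas as pd

# The ten exam scores from the worked example; "score" is a hypothetical column name
df = pd.DataFrame({"score": [40, 45, 50, 55, 60, 65, 70, 75, 80, 100]})

# pandas applies a small-sample correction: sqrt(n(n-1)) / (n-2) * moment coefficient
print(df.skew(numeric_only=True))  # score is about 0.661 (vs. 0.557 without correction)

# On a real dataset the same call would be applied to the numeric columns, e.g.:
# df[["no_of_employees", "yr_of_estab", "prevailing_wage"]].skew()
```

For large datasets the correction factor approaches 1, so the two conventions give nearly identical values. `scipy.stats.skew` returns the uncorrected value by default and the corrected one with `bias=False`.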
📌 Interpretation:
- no_of_employees → Highly right-skewed (a few very large companies).
- yr_of_estab → Highly left-skewed (a few very old companies).
- prevailing_wage → Moderately right-skewed (most wages are moderate, with a few very high ones).
🔹 Why is Skewness Important?
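- Outlier Detection
  - Strong skew often signals outliers or a long tail worth inspecting.
- Feature Engineering
  - Some models and statistical methods assume approximately normal data or errors (e.g., linear regression assumes normally distributed residuals); tree-based models are largely insensitive to skew.
  - Highly skewed features may benefit from transformations (log, Box-Cox, Yeo-Johnson).
- Business Insights
  - Highlights rare but impactful cases (e.g., very high salaries, very large companies).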
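To see how a transformation reduces skew, here is a minimal sketch that applies a log transform to a right-skewed series. The values (Fibonacci numbers) are arbitrary and chosen only because they grow quickly; `np.log1p` computes log(1 + x), which stays defined at zero. For Box-Cox (positive data only) and Yeo-Johnson (any sign), `scipy.stats.boxcox`, `scipy.stats.yeojohnson`, or scikit-learn's `PowerTransformer` are common options.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed values (Fibonacci numbers, for illustration only)
values = pd.Series([1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233], dtype=float)

# log1p compresses large values far more than small ones, shortening the right tail
log_values = np.log1p(values)

print("skew before:", round(values.skew(), 2))
print("skew after: ", round(log_values.skew(), 2))
```

The skewness drops from well above 1 to close to 0, because the log turns the multiplicative growth of the series into roughly even steps.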
🔹 Rule of Thumb
- Between -0.5 and +0.5 → Approximately symmetric
- Between 0.5 and 1 in absolute value (either sign) → Moderately skewed
- Greater than 1 in absolute value → Highly skewed
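The rule of thumb can be wrapped in a small helper to label many features at once. This is a sketch; the function name `describe_skew` is hypothetical, and treating exactly 0.5 as symmetric and exactly 1 as moderate is our own boundary choice.

```python
def describe_skew(skew: float) -> str:
    """Label a skewness value with the rule-of-thumb thresholds (boundaries assumed)."""
    magnitude = abs(skew)
    if magnitude <= 0.5:
        return "approximately symmetric"
    direction = "right" if skew > 0 else "left"
    strength = "moderately" if magnitude <= 1 else "highly"
    return f"{strength} {direction}-skewed"


# Skewness values from the real-dataset table
for feature, value in {"no_of_employees": 12.26, "yr_of_estab": -2.03, "prevailing_wage": 0.76}.items():
    print(feature, "->", describe_skew(value))
```

Applied to the three features, this reproduces the interpretation given earlier: two highly skewed features (one in each direction) and one moderately right-skewed feature.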
🔹 Conclusion
Skewness is not just a number — it’s a window into your data’s hidden structure. By analyzing skewness during Exploratory Data Analysis (EDA), you can:
- Detect and handle outliers
- Apply transformations that reduce skew and stabilize variance
- Potentially improve machine learning model performance
So next time you see a skewness value in your dataset, take a closer look — it might reveal something important about your data’s story.