Welcome back to the Statistics Challenge for Data Scientists!
Today, we’re learning something that makes our data fair — Normalization.
What is Normalization?
Imagine you and your friend are running a race.
- You run 100 meters
- Your friend runs 1 kilometer (1000 meters)
Can we directly compare who runs faster?
Not really — because the units and scales are different.
That’s exactly what happens with data — some numbers are small (like age), and some are huge (like salary).
Normalization means scaling data so that all values fit into a similar range and can be compared fairly.
Why Do We Need Normalization?
Think of a teacher giving marks to students:
- Math score: 100 marks
- Science score: 50 marks
If we add them directly, Math will dominate because its maximum is higher.
To treat both subjects fairly, we scale the marks — that’s normalization.
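Here is a tiny Python sketch of that idea (the individual scores are made up for illustration):

```python
math_score, math_max = 80, 100
science_score, science_max = 45, 50

# Divide each score by its subject's maximum so both land in the 0-1 range
math_scaled = math_score / math_max           # 0.8
science_scaled = science_score / science_max  # 0.9

# Now adding them treats both subjects fairly
print(math_scaled + science_scaled)  # 1.7 out of a possible 2.0
```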
In data science, normalization helps machine learning models:
- Work faster
- Learn better
- Give fair importance to each feature
Two Popular Normalization Methods
Let’s understand the two most common types — Min-Max Normalization and Z-Score Normalization.
1. Min-Max Normalization (Feature Scaling)
It squeezes all data values between 0 and 1.
Formula:
X' = (X - Xmin) / (Xmax - Xmin)
Example:
Let’s say we have ages: 10, 20, 30, 40, 50.
- Minimum = 10
- Maximum = 50
For age = 30:
X' = (30 - 10) / (50 - 10) = 20 / 40 = 0.5
So, the normalized value is 0.5.
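A minimal Python sketch of the same calculation (the list and variable names are just illustrative):

```python
ages = [10, 20, 30, 40, 50]
x_min, x_max = min(ages), max(ages)

# Apply X' = (X - Xmin) / (Xmax - Xmin) to every value
normalized = [(x - x_min) / (x_max - x_min) for x in ages]

print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```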
When to Use:
- When your data has a fixed range (like 0 to 100 marks).
- Best for algorithms that depend on distance (like KNN and K-Means), and for neural networks, which train more smoothly when inputs sit in a small, fixed range.
2. Z-Score Normalization (Standardization)
This method centers the data around mean = 0 and standard deviation = 1.
It shows how far each value is from the average.
Formula:
Z = (X - μ) / σ
Where:
- μ = Mean of the data
- σ = Standard deviation
Example:
Let’s say heights (in cm): 150, 160, 170, 180, 190
- Mean (μ) = 170
- Standard deviation (σ) = 14.14
For height = 150:
Z = (150 - 170) / 14.14 = -1.41
So, 150 cm is 1.41 standard deviations below the mean.
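The same steps in Python. Note that σ here is the population standard deviation (dividing by n, which is what gives 14.14 for this data):

```python
heights = [150, 160, 170, 180, 190]
n = len(heights)

mean = sum(heights) / n

# Population standard deviation: mean squared distance from the mean, square-rooted
std = (sum((x - mean) ** 2 for x in heights) / n) ** 0.5

# Z-score: how many standard deviations each height is from the mean
z_scores = [(x - mean) / std for x in heights]

print(round(std, 2))                    # 14.14
print([round(z, 2) for z in z_scores])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```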
When to Use:
- When data doesn’t have a fixed range.
- Works well with algorithms that expect centered, similarly scaled features (like Linear Regression, Logistic Regression, PCA).
Min-Max vs Z-Score — Quick Comparison
| Feature | Min-Max Normalization | Z-Score Normalization |
|---|---|---|
| Range | 0 to 1 | Can be negative or positive |
| Depends on | Min & Max values | Mean & Standard Deviation |
| Sensitive to outliers | Yes | Less sensitive |
| Best for | Bounded data (e.g. exam scores) | Unbounded data (e.g. height, salary) |
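In practice you rarely hand-code these formulas; scikit-learn ships both scalers. A minimal sketch, assuming scikit-learn and NumPy are installed:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# scikit-learn expects a 2D array: one row per sample, one column per feature
ages = np.array([10, 20, 30, 40, 50], dtype=float).reshape(-1, 1)

min_max = MinMaxScaler().fit_transform(ages)    # squeezes values into [0, 1]
z_score = StandardScaler().fit_transform(ages)  # mean 0, standard deviation 1

print(min_max.ravel())               # [0.   0.25 0.5  0.75 1.  ]
print(np.round(z_score.ravel(), 2))  # [-1.41 -0.71  0.    0.71  1.41]
```

One tip for real projects: fit the scaler on your training data only, then reuse it to transform the test data, so no information from the test set leaks into training.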
Summary
- Normalization makes data fair by bringing all features to a similar scale.
- Use Min-Max when data has clear limits (like percentages).
- Use Z-Score when data spreads freely and you care about distance from average.
Quick Recap Example
Using the ages 10, 20, 30, 40, 50 from earlier (min = 10, max = 50, μ = 30, σ ≈ 14.14), here are three of those values after each method:
| Original Value | Min-Max (0-1) | Z-Score |
|---|---|---|
| 10 | 0.0 | -1.41 |
| 30 | 0.5 | 0.0 |
| 50 | 1.0 | +1.41 |
In short:
Normalization is like giving everyone the same playing field so that your machine learning model doesn’t play favorites!
I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!
Connect on LinkedIn: https://www.linkedin.com/in/chanchalsingh22/
Connect on YouTube: https://www.youtube.com/@Brains_Behind_Bots

