Welcome back to the Statistics Challenge for Data Scientists!
Today, we’re learning something that makes our data fair — Normalization.
What is Normalization?
Imagine you and your friend are running a race.
- You run 100 meters
- Your friend runs 1 kilometer (1000 meters)
Can we directly compare who runs faster?
Not really — because the units and scales are different.
That’s exactly what happens with data — some numbers are small (like age), and some are huge (like salary).
Normalization means scaling data so that all values fit into a similar range and can be compared fairly.
Why Do We Need Normalization?
Think of a teacher giving marks to students:
- Math score: 100 marks
- Science score: 50 marks
If we add them directly, Math will dominate because its maximum is higher.
To treat both subjects fairly, we scale the marks — that’s normalization.
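Here is a tiny Python sketch of that idea (the individual scores are made up for illustration):

```python
math_score, math_max = 80, 100
science_score, science_max = 45, 50

# Divide each score by its subject's maximum so both land in the 0-1 range
math_scaled = math_score / math_max           # 0.8
science_scaled = science_score / science_max  # 0.9

# Now adding them treats both subjects fairly
print(math_scaled + science_scaled)  # 1.7 out of a possible 2.0
```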
In data science, normalization helps machine learning models:
- Work faster
- Learn better
- Give fair importance to each feature
Two Popular Normalization Methods
Let’s understand the two most common types — Min-Max Normalization and Z-Score Normalization.
1. Min-Max Normalization (Feature Scaling)
It squeezes all data values between 0 and 1.
Formula:
X' = (X - Xmin) / (Xmax - Xmin)
Example:
Let’s say we have ages: 10, 20, 30, 40, 50.
- Minimum = 10
- Maximum = 50
For age = 30:
X' = (30 - 10) / (50 - 10) = 20 / 40 = 0.5
So, the normalized value is 0.5.
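A minimal Python sketch of the same calculation (the list and variable names are just illustrative):

```python
ages = [10, 20, 30, 40, 50]
x_min, x_max = min(ages), max(ages)

# Apply X' = (X - Xmin) / (Xmax - Xmin) to every value
normalized = [(x - x_min) / (x_max - x_min) for x in ages]

print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```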
When to Use:
- When your data has a fixed range (like 0 to 100 marks).
- Best for algorithms that depend on distance (like KNN and K-Means), and for neural networks, which train more smoothly when inputs sit in a small, fixed range.
2. Z-Score Normalization (Standardization)
This method centers the data around mean = 0 and standard deviation = 1.
It shows how far each value is from the average.
Formula:
Z = (X - μ) / σ
Where:
- μ = Mean of the data
- σ = Standard deviation
Example:
Let’s say heights (in cm): 150, 160, 170, 180, 190
- Mean (μ) = 170
- Standard deviation (σ) = 14.14
For height = 150:
Z = (150 - 170) / 14.14 = -1.41
So, 150 cm is 1.41 standard deviations below the mean.
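The same steps in Python. Note that σ here is the population standard deviation (dividing by n, which is what gives 14.14 for this data):

```python
heights = [150, 160, 170, 180, 190]
n = len(heights)

mean = sum(heights) / n

# Population standard deviation: mean squared distance from the mean, square-rooted
std = (sum((x - mean) ** 2 for x in heights) / n) ** 0.5

# Z-score: how many standard deviations each height is from the mean
z_scores = [(x - mean) / std for x in heights]

print(round(std, 2))                    # 14.14
print([round(z, 2) for z in z_scores])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```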
When to Use:
- When data doesn’t have a fixed range.
- Works well with algorithms that expect centered, similarly scaled features (like Linear Regression, Logistic Regression, PCA).
Min-Max vs Z-Score — Quick Comparison
| Feature | Min-Max Normalization | Z-Score Normalization |
|---|---|---|
| Range | 0 to 1 | Can be negative or positive |
| Depends on | Min & Max values | Mean & Standard Deviation |
| Sensitive to outliers | Yes | Less sensitive |
| Best for | Bounded data (e.g. exam scores) | Unbounded data (e.g. height, salary) |
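In practice you rarely hand-code these formulas; scikit-learn ships both scalers. A minimal sketch, assuming scikit-learn and NumPy are installed:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# scikit-learn expects a 2D array: one row per sample, one column per feature
ages = np.array([10, 20, 30, 40, 50], dtype=float).reshape(-1, 1)

min_max = MinMaxScaler().fit_transform(ages)    # squeezes values into [0, 1]
z_score = StandardScaler().fit_transform(ages)  # mean 0, standard deviation 1

print(min_max.ravel())               # [0.   0.25 0.5  0.75 1.  ]
print(np.round(z_score.ravel(), 2))  # [-1.41 -0.71  0.    0.71  1.41]
```

One tip for real projects: fit the scaler on your training data only, then reuse it to transform the test data, so no information from the test set leaks into training.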
Summary
- Normalization makes data fair by bringing all features to a similar scale.
- Use Min-Max when data has clear limits (like percentages).
- Use Z-Score when data spreads freely and you care about distance from average.
Quick Recap Example
Using the ages 10, 20, 30, 40, 50 from earlier (min = 10, max = 50, μ = 30, σ ≈ 14.14), here are three of those values after each method:
| Original Value | Min-Max (0-1) | Z-Score |
|---|---|---|
| 10 | 0.0 | -1.41 |
| 30 | 0.5 | 0.0 |
| 50 | 1.0 | +1.41 |
In short:
Normalization is like giving everyone the same playing field so that your machine learning model doesn’t play favorites!
I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!
Connect on LinkedIn: https://www.linkedin.com/in/chanchalsingh22/
Connect on YouTube: https://www.youtube.com/@Brains_Behind_Bots

