Chanchal Singh

Statistics Day 4: Z-Score vs Min-Max Normalization — Making Data Fair for ML Models

Welcome back to the Statistics Challenge for Data Scientists!

Today, we’re learning something that makes our data fair — Normalization.


What is Normalization?

Imagine you and your friend are running a race.

  • You run 100 meters
  • Your friend runs 1 kilometer (1000 meters)

Can we directly compare who runs faster?
Not really — because the units and scales are different.

That’s exactly what happens with data — some numbers are small (like age), and some are huge (like salary).

Normalization means scaling data so that all values fit into a similar range and can be compared fairly.


Why Do We Need Normalization?

Think of a teacher giving marks to students:

  • Math score: 100 marks
  • Science score: 50 marks

If we add them directly, Math will dominate because its maximum is higher.

To treat both subjects fairly, we scale the marks — that’s normalization.

In data science, normalization helps machine learning models:

  • Work faster
  • Learn better
  • Give fair importance to each feature

Two Popular Normalization Methods

Let’s understand the two most common types — Min-Max Normalization and Z-Score Normalization.


1. Min-Max Normalization (Feature Scaling)

It squeezes all data values between 0 and 1.

Formula:

X' = (X - Xmin) / (Xmax - Xmin)

Example:
Let’s say we have ages: 10, 20, 30, 40, 50.

  • Minimum = 10
  • Maximum = 50

For age = 30

X' = (30 - 10) / (50 - 10) = 20 / 40 = 0.5

So, the normalized value is 0.5.
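
If you want to double-check this by hand, here is a minimal NumPy sketch of the same calculation, using the ages from the example above:

```python
import numpy as np

# Ages from the example above
ages = np.array([10, 20, 30, 40, 50], dtype=float)

# Min-max normalization: (X - Xmin) / (Xmax - Xmin)
scaled = (ages - ages.min()) / (ages.max() - ages.min())

print(scaled)  # [0.   0.25 0.5  0.75 1.  ]  -> age 30 maps to 0.5
```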


When to Use:

  • When your data has a fixed range (like 0 to 100 marks).
  • Best for distance-based algorithms (like KNN and K-Means) and for neural networks, which train better when all inputs share a similar scale.

2. Z-Score Normalization (Standardization)

This method centers the data around mean = 0 and standard deviation = 1.
It shows how far each value is from the average.

Formula:

Z = (X - μ) / σ

Where:

  • μ = Mean of the data
  • σ = Standard deviation

Example:
Let’s say heights (in cm): 150, 160, 170, 180, 190

  • Mean (μ) = 170
  • Standard deviation (σ) ≈ 14.14

For height = 150

Z = (150 - 170) / 14.14 = -1.41

So, 150 cm is 1.41 standard deviations below the mean.
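
A minimal NumPy sketch to verify these numbers (note that np.std uses the population formula by default, which is exactly what gives σ ≈ 14.14 here):

```python
import numpy as np

# Heights (in cm) from the example above
heights = np.array([150, 160, 170, 180, 190], dtype=float)

mu = heights.mean()    # 170.0
sigma = heights.std()  # population standard deviation, sqrt(200) ≈ 14.14

z = (heights - mu) / sigma
print(z.round(2))  # [-1.41 -0.71  0.    0.71  1.41]
```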


When to Use:

  • When data doesn’t have a fixed range.
  • Works well with algorithms that assume roughly normally distributed or centered features (like Linear Regression, Logistic Regression, PCA).

Min-Max vs Z-Score — Quick Comparison

| Feature | Min-Max Normalization | Z-Score Normalization |
| --- | --- | --- |
| Range | 0 to 1 | Can be negative or positive |
| Depends on | Min & max values | Mean & standard deviation |
| Sensitive to outliers | Yes | Less sensitive |
| Best for | Bounded data (e.g. exam scores) | Unbounded data (e.g. height, salary) |
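
In practice you rarely apply these formulas by hand; scikit-learn ships both as MinMaxScaler and StandardScaler. Here is a small sketch on a made-up age/salary table (the values are purely illustrative):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales: age (years) and salary.
# The numbers below are made up purely for illustration.
X = np.array([
    [25, 30_000],
    [35, 50_000],
    [45, 70_000],
    [55, 90_000],
], dtype=float)

print(MinMaxScaler().fit_transform(X))    # each column squeezed into [0, 1]
print(StandardScaler().fit_transform(X))  # each column: mean 0, std 1
```

Whichever scaler you choose, fit it on the training data only and reuse that fitted scaler to transform the test data, so no information leaks between the two sets.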

Summary

  • Normalization makes data fair by bringing all features to a similar scale.
  • Use Min-Max when data has clear limits (like percentages).
  • Use Z-Score when data spreads freely and you care about distance from average.

Quick Recap Example

Using the ages 10, 20, 30, 40, 50 from the Min-Max example:

| Original Value | Min-Max (0-1) | Z-Score |
| --- | --- | --- |
| 10 | 0.0 | -1.41 |
| 30 | 0.5 | 0.0 |
| 50 | 1.0 | +1.41 |
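
As a sanity check, a few lines of NumPy reproduce both columns of this table at once:

```python
import numpy as np

ages = np.array([10, 20, 30, 40, 50], dtype=float)

min_max = (ages - ages.min()) / (ages.max() - ages.min())
z_score = (ages - ages.mean()) / ages.std()

for x, m, z in zip(ages, min_max, z_score):
    print(f"{x:>4.0f}  min-max: {m:.2f}  z-score: {z:+.2f}")
```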

In short:
Normalization is like giving everyone a level playing field, so your machine learning model doesn't play favorites!


I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!

Connect on LinkedIn: https://www.linkedin.com/in/chanchalsingh22/
Connect on YouTube: https://www.youtube.com/@Brains_Behind_Bots
