Ever tried comparing a basketball to a watermelon? That's how your machine learning model feels without feature scaling!
If you feed your AI model age in years and net worth in millions, one feature overpowers the other. The result? Terrible predictions.
No balance. No fairness.
Fig. ML model without feature scaling
Imagine a dataset with two features:
Age: 20 to 70
Income: 15,000 to 2,500,000
Without scaling, income will dominate distance-based decisions just because of its larger numbers. In an insurance premium calculation, for example, income may become the deciding factor instead of age.
Fig. Wrong prediction because of income's high value
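To make this concrete, here is a minimal Python sketch (the ages and incomes are made up for illustration) showing how a plain Euclidean distance is hijacked by the feature with the larger numbers:

```python
import math

# Hypothetical customers: (age in years, income)
a = (25, 40_000)
b = (65, 42_000)   # very different age, similar income
c = (26, 500_000)  # almost the same age, very different income

def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

print(euclidean(a, b))  # ~2000   -> looks "close" despite a 40-year age gap
print(euclidean(a, c))  # ~460000 -> looks "far" despite a 1-year age gap
```

Age barely moves the needle; income decides everything.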
Feature Scaling:
According to Wikipedia:
Feature scaling is a method used to normalize the range of independent variables or features of data.
Feature scaling is a data preprocessing technique that transforms the values of numeric features into a similar scale or range. This is typically done so that no single attribute dominates the others due to its high numeric value.
Fig. Scaling the previous figure by dividing age by 100 and income by 100,000
It ensures all features fall in a similar numeric range so they can be compared on equal footing. A feature is a characteristic or attribute of your data.
Why is Feature Scaling Important?
Many ML models - especially those based on distance calculations or gradient descent - are sensitive to the scale of data. Without scaling:
Attributes with larger values can unfairly influence the model.
Models like K-Nearest Neighbors (KNN) or Support Vector Machines (SVM) can produce poor predictions, as the sketch below shows.
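Here is a hedged sketch (assuming scikit-learn is installed; the tiny dataset and labels are invented) of the same 1-nearest-neighbor model with and without scaling, disagreeing on an older customer with a modest income:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy training data: [age, income]; labels 0 = low premium, 1 = high premium
X = np.array([[25, 40_000], [30, 45_000], [60, 42_000], [65, 50_000]])
y = np.array([0, 0, 1, 1])

raw_knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
scaled_knn = make_pipeline(StandardScaler(),
                           KNeighborsClassifier(n_neighbors=1)).fit(X, y)

query = np.array([[62, 40_500]])  # an older customer with a modest income
print(raw_knn.predict(query))     # [0] -- income dominates, matches a young customer
print(scaled_knn.predict(query))  # [1] -- after scaling, age counts too
```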
Feature Scaling Methods:
We can rescale all features to the same range - like 0 to 1 or -1 to 1 - so every input gets equal importance.
Two common methods:
i) Standardization -
Centers the data around zero; scaled values can be negative or positive. The formula is:
standardized value = (original value - feature mean) / feature standard deviation
Fig. Applying standardization to the data
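A minimal NumPy sketch (the ages are assumed, not taken from the figures) that applies the formula directly:

```python
import numpy as np

ages = np.array([20, 35, 50, 70], dtype=float)  # illustrative ages

# Subtract the mean, divide by the standard deviation
standardized = (ages - ages.mean()) / ages.std()

print(standardized)         # values now center around zero
print(standardized.mean())  # ~0.0
print(standardized.std())   # ~1.0
```

scikit-learn's StandardScaler does the same arithmetic column by column.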
ii) Normalization -
Compresses values into a fixed range, typically 0 to 1. It is often called Min-Max Scaling. The formula is:
normalized value = (original value - feature minimum) / (feature maximum - feature minimum)
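And a matching sketch for Min-Max Scaling (the income values are invented, chosen to match the range from the earlier example):

```python
import numpy as np

incomes = np.array([15_000, 60_000, 250_000, 2_500_000], dtype=float)

# Shift by the minimum, divide by the range
normalized = (incomes - incomes.min()) / (incomes.max() - incomes.min())

print(normalized)  # ~[0, 0.018, 0.095, 1] -- everything now lies in [0, 1]
```

scikit-learn's MinMaxScaler applies the same formula per feature.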