Feature scaling is a data preprocessing technique that ensures all the features (independent variables) in a dataset have a similar scale. This is important because many machine learning algorithms make better predictions when the features are on a similar scale.

Example: suppose we have a dataset with two features: height (in centimeters) and weight (in kilograms). If we don’t scale these features, the algorithm might give more importance to the feature with larger values (height) just because of its scale, not because it’s more important.

**Why use Feature Scaling?**

Feature scaling is important in machine learning for several key reasons:

Improves Algorithm Performance: Many machine learning algorithms, like those using gradient descent, work better and converge faster when the features are on a similar scale.

Fair Contribution: Without scaling, features with larger ranges can dominate the learning process. Scaling ensures that all features contribute equally to the model.

Accurate Distance Calculations: Algorithms that rely on distance calculations, such as k-nearest neighbors (KNN) and support vector machines (SVM), need scaled features to compute accurate distances.

Consistent Units: When features are measured in different units (e.g., height in centimeters and weight in kilograms), scaling helps standardize them, making the model’s calculations more consistent.

Handling Outliers: Some scaling methods, like robust scaling, help reduce the impact of outliers by focusing on the median and interquartile range.

*Algorithms That Need Feature Scaling:*

**1. Gradient Descent-Based Algorithms:** These include linear regression, logistic regression, and neural networks. Scaling helps these algorithms converge faster.

**2. Distance-Based Algorithms:** Algorithms like k-nearest neighbors (KNN) and support vector machines (SVM) rely on distance calculations, which are affected by the scale of the features.

**3. Principal Component Analysis (PCA):** PCA is sensitive to the variances of the features, so scaling ensures that each feature contributes equally.

**4. Clustering Algorithms:** Algorithms like k-means clustering use distance measures, so scaling is important to ensure fair clustering.
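To see why distance-based algorithms need scaling, here is a minimal sketch comparing Euclidean distances before and after scaling. The two people, their measurements, and the min-max ranges are made-up values for illustration:

```python
import math

# Two people: (height in cm, weight in kg) -- hypothetical values.
a = (180, 80)
b = (150, 81)

# Unscaled distance: the 30 cm height gap dominates the 1 kg weight gap
# purely because height has the larger numeric range.
unscaled = math.dist(a, b)

# Min-max scaling, assuming height spans 150-200 cm and weight 50-100 kg.
def scale(point):
    height, weight = point
    return ((height - 150) / 50, (weight - 50) / 50)

scaled = math.dist(scale(a), scale(b))
print(round(unscaled, 2), round(scaled, 2))
```

After scaling, both features span the same [0, 1] range, so neither one dominates the distance just because of its units.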

*Algorithms That Do Not Need Feature Scaling:*

Tree-Based Algorithms: Decision trees, random forests, and gradient boosting algorithms (like XGBoost) are generally not affected by the scale of the features. These algorithms split the data based on feature values, so scaling is not necessary.

Naive Bayes: This algorithm is based on probability and is not influenced by the scale of the features.

*Types of feature scaling:*

**1. Standardization (also called Z-score Normalization):** Standardization is a technique used to transform data so that it has a mean of 0 and a standard deviation of 1. This process is particularly useful when the data follows a Gaussian (normal) distribution.

First, calculate the mean and standard deviation of the feature we want to standardize.

Then subtract the mean from each value and divide the result by the standard deviation.

**Formula:** Z = (X - mu) / sigma

**Where:**

X: the original feature value.

mu: the mean (average) of the feature.

sigma: the standard deviation of the feature.

Z: the standardized value.

Example with Age and Salary Features

Age=[25,35,45,55,65]

Salary = [50000,60000,70000,80000,90000]

**Calculate the Mean(mu):**

For Age: mean = (25 + 35 + 45 + 55 + 65) / 5 = 45

For Salary: mean = (50000 + 60000 + 70000 + 80000 + 90000) / 5 = 70000

**Calculate the Standard Deviation(sigma):**

For Age: sigma = sqrt(((25-45)^2 + (35-45)^2 + (45-45)^2 + (55-45)^2 + (65-45)^2) / 5) = sqrt(200) ≈ 14.14

For Salary: sigma = sqrt(((50000-70000)^2 + (60000-70000)^2 + (70000-70000)^2 + (80000-70000)^2 + (90000-70000)^2) / 5) = sqrt(200000000) ≈ 14142.14

**Apply the Formula:**

For Age:

Z = (25 - 45) / 14.14 = -1.41

Z = (35 - 45) / 14.14 = -0.71

Z = (45 - 45) / 14.14 = 0

Z = (55 - 45) / 14.14 = 0.71

Z = (65 - 45) / 14.14 = 1.41

For Salary:

Z = (50000 - 70000) / 14142.14 = -1.41

Z = (60000 - 70000) / 14142.14 = -0.71

Z = (70000 - 70000) / 14142.14 = 0

Z = (80000 - 70000) / 14142.14 = 0.71

Z = (90000 - 70000) / 14142.14 = 1.41

**Standardized Data:** The standardized values for Age and Salary are:

**Age (Standardized)** = [-1.41, -0.71, 0, 0.71, 1.41]

**Salary (Standardized)** = [-1.41, -0.71, 0, 0.71, 1.41]
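The hand calculation above can be checked with a short NumPy sketch. Note it uses the population standard deviation (NumPy's default, `ddof=0`), which is what the manual computation used:

```python
import numpy as np

age = np.array([25, 35, 45, 55, 65], dtype=float)
salary = np.array([50000, 60000, 70000, 80000, 90000], dtype=float)

def standardize(x):
    # Z = (X - mu) / sigma, with the population standard deviation (ddof=0).
    return (x - x.mean()) / x.std()

print(np.round(standardize(age), 2))
print(np.round(standardize(salary), 2))
```

Both features map to the same standardized values because each is an evenly spaced series, which is exactly the point: after standardization, the two columns are directly comparable despite their very different original units.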

**2. Normalization (Min-Max Scaling)**

Normalization is a technique used to rescale feature values so they all fit within a specific range, usually 0 to 1.

*Why Use Normalization?*

Equal Contribution: Ensures each feature contributes equally in the analysis, preventing larger scale features from dominating the results.

Improves Performance: Helps algorithms like KNN, SVM, and Neural Networks perform better and more efficiently by removing bias due to varying feature scales.

Smooth Convergence: Makes training algorithms converge faster and more reliably.

*How It Works:*

Identify Min and Max Values: Find the minimum and maximum values for each feature.

Apply the Formula: Rescale each feature using the formula:

Normalized Value = (Original Value - Min) / (Max - Min)

**Example:**

Imagine we have heights (cm) and weights (kg) of people:

Height: Min = 150, Max = 200

Weight: Min = 50, Max = 100

If a person is 180 cm tall and weighs 75 kg:

Normalized Height: (180 - 150) / (200 - 150) = 0.6

Normalized Weight: (75 - 50) / (100 - 50) = 0.5

This rescaling makes it easier for algorithms to process the data without any one feature overshadowing the others.
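The min-max formula above is simple enough to write as a small helper; here it is applied to the same height/weight example, with the min and max values taken from the text:

```python
def min_max(value, vmin, vmax):
    # Rescale a single value into [0, 1]: (value - min) / (max - min).
    return (value - vmin) / (vmax - vmin)

# Height range 150-200 cm, weight range 50-100 kg (from the example above).
print(min_max(180, 150, 200))  # 0.6
print(min_max(75, 50, 100))    # 0.5
```

In practice the min and max would be computed from the training data (e.g. with scikit-learn's `MinMaxScaler`), so that unseen values outside the training range can fall outside [0, 1].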

Source code example: https://www.kaggle.com/code/sagarborse/notebookb6572850b7
