Importance of Feature Scaling
Machine learning algorithms such as linear regression and neural networks work better or converge faster when the features are on a similar scale, and feature scaling makes the scales of the features similar.
For example, with features like age and income, your model may effectively prioritize income over age simply because of the large difference in the scale of their values.
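To make this concrete, here is a minimal sketch with made-up numbers: two people who differ greatly in age but only slightly in income, where the raw Euclidean distance between them is dominated almost entirely by the income axis.

import numpy as np

# Hypothetical values for illustration: [age, income]
a = np.array([25., 50_000.])
b = np.array([55., 51_000.])

# The 30-year age gap barely registers next to the 1,000-unit income gap
print(np.linalg.norm(a - b))   # ~1000.45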
Standardization (Z-score normalization)
Standardization rescales the features of a dataset so that they have a mean of 0 and a standard deviation (SD) of 1. This feature scaling technique is achieved by subtracting the mean of a feature from each of its values and then dividing by its standard deviation.
The formula for standardization is:

X_new = (X - μ) / σ

where μ is the mean of the feature and σ is its standard deviation.
Standardization is less affected by outliers than normalization, so it is often used when the maximum and minimum values are not fixed or when outliers exist.
from sklearn import preprocessing
import numpy as np

X_train = np.array([[ 1., -1.,  2.],
                    [ 2.,  0.,  0.],
                    [ 0.,  1., -1.]])

# Fit the scaler on the training data, then transform it
scaler = preprocessing.StandardScaler().fit(X_train)
X_scaled = scaler.transform(X_train)
print(X_scaled)
# [[ 0.         -1.22474487  1.33630621]
#  [ 1.22474487  0.         -0.26726124]
#  [-1.22474487  1.22474487 -1.06904497]]
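Because the scaler is fitted as a separate step, the statistics learned from the training data can be reused on new data. A minimal sketch, where X_test is a hypothetical held-out sample with the same three columns:

# Per-column statistics learned during fit()
print(scaler.mean_)    # [1.         0.         0.33333333]
print(scaler.scale_)   # [0.81649658 0.81649658 1.24721913]

# New data is scaled with the *training* statistics, not its own
X_test = np.array([[-1., 1., 0.]])
print(scaler.transform(X_test))   # [[-2.44948974  1.22474487 -0.26726124]]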
Normalization (Min-Max scaling)
Normalization scales the features of a dataset to a specific range, typically between 0 and 1. This is achieved by subtracting the minimum value of a feature from each of its values and then dividing by the range (the maximum minus the minimum).
The formula for normalization is:

X_new = (X - X_min) / (X_max - X_min)
X_train = np.array([[ 1., -1.,  2.],
                    [ 2.,  0.,  0.],
                    [ 0.,  1., -1.]])

# Fit and transform in one step; each column is mapped onto [0, 1]
min_max_scaler = preprocessing.MinMaxScaler()
X_train_minmax = min_max_scaler.fit_transform(X_train)
print(X_train_minmax)
# [[0.5        0.         1.        ]
#  [1.         0.5        0.33333333]
#  [0.         1.         0.        ]]
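The earlier claim about outliers can be checked directly. In this sketch, the value 100.0 is an artificial outlier added for illustration; min-max scaling squeezes the ordinary values into a small sliver of [0, 1], while standardization keeps them more spread out (though the outlier still shifts the mean):

outlier_data = np.array([[1.], [2.], [3.], [100.]])   # 100.0 is an artificial outlier

print(preprocessing.MinMaxScaler().fit_transform(outlier_data).ravel())
# [0.         0.01010101 0.02020202 1.        ]

print(preprocessing.StandardScaler().fit_transform(outlier_data).ravel())
# approximately [-0.6008 -0.5773 -0.5537  1.7318]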
Implementations from Scratch
First, we will import the necessary libraries, load the dataset, and use two features from the Iris dataset (petal length and petal width) for the demonstration.
pip install numpy==1.23.5 pandas==1.5.3 scikit-learn==1.2.2 matplotlib==3.7.4
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)
X = data.iloc[:, 2:]   # petal length (cm) and petal width (cm)
Standardization rescales the data so that the mean is zero and the variance is one. The following code demonstrates how to standardize the dataset.
def standardize(X):
    return (X - np.mean(X, axis=0)) / np.std(X, axis=0)
X_std = standardize(X)
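As a quick sanity check, the standardized columns should now have a mean of (approximately) zero and a standard deviation of one:

# Verify the transform; tiny deviations from 0 are floating-point error
print(X_std.to_numpy().mean(axis=0))   # approximately [0. 0.]
print(X_std.to_numpy().std(axis=0))    # [1. 1.]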
Normalization is a 0-1 scaling method: the minimum value of each feature becomes 0 and the maximum becomes 1. The following code shows how to normalize the dataset.
def normalize(X):
    return (X - np.min(X, axis=0)) / (np.max(X, axis=0) - np.min(X, axis=0))
X_norm = normalize(X)
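Similarly, the normalized columns should now span exactly [0, 1]:

# Verify the transform: per-column minima and maxima
print(X_norm.to_numpy().min(axis=0))   # [0. 0.]
print(X_norm.to_numpy().max(axis=0))   # [1. 1.]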
The preprocessing results can be visualized with the following plotting code. The first plot shows the original dataset, the second shows the standardized dataset, and the third shows the normalized dataset.
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(16, 12))

# Original data
ax = fig.add_subplot(2, 2, 1)
ax.scatter(X.iloc[:, 0], X.iloc[:, 1])
ax.set_title("Before Standardization")
ax.set_xlabel("petal length (cm)")
ax.set_ylabel("petal width (cm)")

# Standardized data
ax = fig.add_subplot(2, 2, 3)
ax.scatter(X_std.iloc[:, 0], X_std.iloc[:, 1])
ax.set_title("After Standardization")
ax.set_xlabel("petal length (cm)")
ax.set_ylabel("petal width (cm)")

# Normalized data
ax = fig.add_subplot(2, 2, 4)
ax.scatter(X_norm.iloc[:, 0], X_norm.iloc[:, 1])
ax.set_title("After Normalization")
ax.set_xlabel("petal length (cm)")
ax.set_ylabel("petal width (cm)")

plt.show()