DEV Community

Balbir Yadav
Balbir Yadav

Posted on

1

Data Normalization using Python Tutorial

In this tutorial blog, I'm going to cover how to normalize data in python easily. I'll also show actual implementation by coding for the same. The jupyter Notebook for this tutorial can be viewed on GitHub

What is Normalization?

Normalization of data is basically scaling of data. Which is commonly scaled between 0 and 1.

Why do we need to Normalize data?

It is observed that machine learning algorithms tend to perform better and converge faster when the data are of a smaller scale.
So, it is always recommended and it is common practice to normalize your data.
Once you have normalized your data then you can build a model.

Normalization decreases the sensitivity to the scale of the features resulting in better results after training.

Formula for normalization is given below

Xnorm = (X-Xmin)/(Xmax-Xmin)

Just subtract the data point with min of all the data point and then divide the result with the difference with the maximum and minimum of all data points.

Normalization of data using Sklearn`

from sklearn import preprocessing
import numpy as np

x=np.linspace(10, 100, 10)
x

Ourput: array([ 10., 20., 30., 40., 50., 60., 70., 80., 90., 100.])

Note: linspace(a,b,size) is used to array which starts with a and ends on b with total number of elements equal to size.

Now, Let's Normalize the array x that we have created just above.

normalized_x=preprocessing.normalize([x])
normalized_x

Output: array([[0.05096472, 0.10192944, 0.15289416, 0.20385888, 0.2548236 ,
0.30578831, 0.35675303, 0.40771775, 0.45868247, 0.50964719]])

Boom!! Array has been normalized

You can see that every element in the normalized array has been rescaled between zero to one.

Another way to Normalize Data in python by using MinMaxScaler()

MinMaxScaler() is also provided by sklearn. It is also good to know here what MinMaxScaler gives us an option to rescale data in the choice of our range. That is suppose you want to rescale your data points between 0 to 4(say) then you can easily do it by using MinMaxScaler.

Let's implement it
from sklearn import preprocessing
import pandas as pd

data = pd.read_csv("restaurent.csv")
data

Image description

Image description

Image description

Boom!! Normalization done using MinMaxScaler()

Note: The beauty of the approach is that we can normalize all the columns in one go.

Let's Normalise the data in range of our choice

This will be done with the help of MinMaxScaler only but this time we will change the feature_range to, say, (0,3)

Image description

data2 = pd.read_csv("restaurent.csv")
data


Image description

normalizer=preprocessing.MinMaxScaler(feature_range=(0,3))
col_name=data.columns
d=normalizer.fit_transform(data)
normalized_data=pd.DataFrame(d,columns=col_name)
normalized_data

Image description

Boom!!

we can observe that it is rescaled to range of 0 to 3 for all the columns in just one go.

Conclusion

As we have seen in the above tutorial that we can normalize data in the range of our choice with the help of MinMaxScaler() provided under sklearn. But it is always recommended to normalize data between zero to one.

So, That's all from this tutorial. See you next tutorial blog. Thank you!

AWS Security LIVE!

Join us for AWS Security LIVE!

Discover the future of cloud security. Tune in live for trends, tips, and solutions from AWS and AWS Partners.

Learn More

Top comments (0)

Billboard image

Try REST API Generation for MS SQL Server.

DevOps for Private APIs. With DreamFactory API Generation, you get:

  • Auto-generated live APIs mapped from database schema
  • Interactive Swagger API documentation
  • Scripting engine to customize your API
  • Built-in role-based access control

Learn more

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay