One of the most important transformations you need to apply to your data is feature scaling. With few exceptions, Machine Learning algorithms don’t perform well when the input numerical attributes have very different scales. Consider the case for the housing data: the total number of rooms ranges from about 6 to 39,320, while the median incomes only range from 0 to 25. Note that scaling the target values is generally not required.
There are two common ways to get all attributes to have the same scale: min-max scaling and standardization.
In min-max scaling, we subtract the minimum value and divide by the difference between the maximum and the minimum. Scikit-Learn provides a transformer called MinMaxScaler for this. It has a feature_range hyperparameter that lets you change the range if you don't want 0–1 for some reason.
Min-max normalization is one of the most popular ways to normalize data. For every feature, the minimum value of that feature gets transformed into a 0, the maximum value gets transformed into a 1, and every other value gets transformed into a value between 0 and 1.
X_scaled = (X - X_min) / (X_max - X_min)

where:
X_scaled is the new, scaled value of feature X,
X is the old value of feature X,
X_min is the minimum value of feature X, and
X_max is the maximum value of feature X.
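As a minimal sketch, the formula can be applied by hand with NumPy and checked against scikit-learn's MinMaxScaler (the feature values below are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# A single feature with illustrative (hypothetical) values
X = np.array([[20.0], [35.0], [50.0], [65.0], [80.0]])

# Apply the formula by hand: X_scaled = (X - X_min) / (X_max - X_min)
X_scaled_manual = (X - X.min()) / (X.max() - X.min())

# The same result via scikit-learn's MinMaxScaler
scaler = MinMaxScaler()  # default feature_range=(0, 1)
X_scaled_sklearn = scaler.fit_transform(X)

print(np.allclose(X_scaled_manual, X_scaled_sklearn))  # True
```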
Let us consider an example to make the concept clear. Suppose we have a dataset containing a few features, as shown in the figure below.
As we can see, the Age and Estimated Salary features are on totally different scales; feeding this kind of data to a model will result in poor performance, and the model will fail in the real world. That's why feature scaling is a must, and here we are talking about min-max scaling.
After applying scikit-learn's scaler, let's see the difference between the Age and Estimated Salary features before and after scaling; a sketch of this comparison is shown below.
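This is a minimal sketch of that comparison, assuming a small made-up Age / Estimated Salary table (the actual dataset from the figure is not reproduced here):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data standing in for the dataset shown in the figure
df = pd.DataFrame({
    "Age": [22, 25, 47, 52, 46],
    "EstimatedSalary": [15000, 29000, 48000, 113000, 90000],
})

scaler = MinMaxScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print(df)         # before scaling: wildly different ranges
print(df_scaled)  # after scaling: both features in [0, 1]
```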
As you can see, this technique makes the data easy to interpret: there are no large numbers, only values on a common scale that require no further transformation and can be fed into the model immediately.
Min-max normalization has one fairly significant downside: it does not handle outliers very well. For example, if you have 99 values between 0 and 40, and one value is 100, then the 99 values will all be transformed into a value between 0 and 0.4.
That data is just as squished as before!
Take a look at the image below to see an example of this. After normalizing (second diagram), the squishing problem on the y-axis is fixed, but the x-axis is still problematic: the point in orange is an outlier, which the min-max normalizer doesn't handle.
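Here is a quick sketch of the outlier effect, using made-up numbers matching the 99-values-plus-one-outlier example above:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)

# 99 values between 0 and 40, plus a single outlier at 100
X = np.append(rng.uniform(0, 40, size=99), 100.0).reshape(-1, 1)

X_scaled = MinMaxScaler().fit_transform(X)

# The 99 "normal" values get squished into roughly [0, 0.4]
print(X_scaled[:-1].max())  # just under ~0.4
print(X_scaled[-1])         # the outlier maps to 1.0
```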
You can normalize your dataset using the scikit-learn MinMaxScaler object.
Good practice when using MinMaxScaler and other scaling techniques is as follows; a code sketch follows the steps below.
Fit the scaler using the available training data. For normalization, this means the training data will be used to estimate the minimum and maximum observable values. This is done by calling the fit() function.
Apply the scaler to the training data. This means you can use the normalized data to train your model. This is done by calling the transform() function.
Apply the scaler to data going forward. This means you can prepare new data in the future on which you want to make predictions, using the same minimum and maximum learned from the training data.
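A minimal sketch of that workflow, with hypothetical train and test arrays:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training and future ("test") data
X_train = np.array([[10.0], [20.0], [30.0], [40.0]])
X_test = np.array([[15.0], [35.0], [45.0]])  # 45 falls outside the training range

scaler = MinMaxScaler()

# 1. Fit the scaler on the training data only
scaler.fit(X_train)

# 2. Transform the training data with the learned min/max
X_train_scaled = scaler.transform(X_train)

# 3. Transform new data with the SAME fitted scaler
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.ravel())  # approximately [0.  0.33  0.67  1.]
print(X_test_scaled.ravel())   # values outside [0, 1] are possible, e.g. 45 -> ~1.17
```

Note that fitting on the training data only, then reusing the fitted scaler, prevents information from the test set leaking into training.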
We have covered min-max normalization in more detail on our website; please use the link below to go there.
https://ml-concepts.com/2021/10/08/min-max-normalization/
Also see our related articles on overfitting and underfitting in machine learning, Z-score normalization, and embedded methods with lasso regression.
Summary
In this article, I tried to explain min-max normalization in simple terms. If you have any questions related to the post, put them in the comment section and I will do my best to answer them.