Polynomial Regression is a powerful technique for modeling the relationship between a dependent variable and one or more independent variables. It extends Simple and Multiple Linear Regression by allowing for more flexible curve fitting.
In this article we will cover some of the theory behind it and how to implement it using Python.
1. Understanding Polynomial Regression
Introduction to Polynomial Regression :
The goal of polynomial regression is to fit a polynomial equation to a given set of data points.
Below is the polynomial equation :
y = b0 + b1 * x + b2 * x^2 + ... + bn * x^n
where,
- y is the dependent variable
- x is the independent variable
- b0 is the intercept, and b1, b2, ..., bn are the coefficients of the polynomial terms
- n denotes the degree of the polynomial
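To make the equation concrete, here is a small sketch that evaluates a degree-2 polynomial directly with NumPy (the coefficient values are made up just for this example):

```python
import numpy as np

# Hypothetical coefficients for y = b0 + b1*x + b2*x^2
b0, b1, b2 = 1.0, 2.0, 0.5

x = np.array([0.0, 1.0, 2.0])
y = b0 + b1 * x + b2 * x**2

print(y.tolist())  # [1.0, 3.5, 7.0]
```

Each term of the equation maps directly to one line of the expression, which is exactly what the regression will later learn coefficients for.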
Advantages and Limitations :
Polynomial Regression offers several advantages over simple Linear Regression -
- Flexibility : by using polynomial terms, we can capture non-linear relationships between variables that cannot be represented by a straight line.
- Higher-order trends : Polynomial Regression can capture higher-order trends in the data, which can lead to more accurate predictions.
- Interpretability : the polynomial terms provide insights into the relationships between the variables.
But Polynomial Regression also has some limitations -
- Overfitting : high-degree polynomials can easily overfit the training data, which results in poor generalization to unseen data.
- Computational complexity : as the degree of the polynomial increases, so does the complexity of the regression model, making it more computationally expensive.
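The overfitting risk is easy to demonstrate with a small sketch (the data below is synthetic, generated only for this example): a degree-9 polynomial fit to ten points achieves a near-perfect training R^2 even though the underlying trend is linear, because the extra flexibility is spent fitting the noise.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic data: ten points from a linear trend plus noise
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * x.ravel() + rng.normal(scale=3.0, size=10)

results = {}
for degree in (1, 9):
    x_poly = PolynomialFeatures(degree=degree).fit_transform(x)
    model = LinearRegression().fit(x_poly, y)
    results[degree] = r2_score(y, model.predict(x_poly))
    print(f"degree {degree}: training R^2 = {results[degree]:.3f}")
```

A high training R^2 alone says nothing about generalization; evaluating on held-out data would expose the degree-9 model's poor predictions between and beyond the training points.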
2. Implementing Polynomial Regression using Python :
We will use the sklearn library (scikit-learn) in Python to implement Polynomial Regression. sklearn provides a simple and efficient way to build machine learning models.
Below are the steps to implement the Polynomial Regression -
step 1 : Data Preparation
In this step, we will load the data and preprocess it for training our model. This step also ensures that the data is in the required format.
## Importing Libraries
import numpy as np
import pandas as pd
## Loading the dataset
sal_data = pd.read_csv("Position_Salaries.csv")
## Creating dependent and independent variables
x = sal_data.iloc[:, 1:-1].values
y = sal_data.iloc[:, -1].values
step 2 : Feature Engineering
In this step we will generate the polynomial terms by transforming our independent variable into polynomial features.
Below is the code for generating polynomial terms -
from sklearn.preprocessing import PolynomialFeatures
pf = PolynomialFeatures(degree=3)
x_poly = pf.fit_transform(x)
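As a quick sanity check of what fit_transform produces, here is a standalone sketch: with degree=3 and a single feature value of 2, PolynomialFeatures outputs the bias column followed by each power of x up to x^3.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

pf = PolynomialFeatures(degree=3)
expanded = pf.fit_transform(np.array([[2.0]]))

# Columns are [1, x, x^2, x^3]
print(expanded.tolist())  # [[1.0, 2.0, 4.0, 8.0]]
```

These generated columns are what allow an ordinary linear model to fit a curved relationship in the next step.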
step 3 : Model Training
In this step we will train our model using the polynomial terms generated above.
Below is the code for training -
from sklearn.linear_model import LinearRegression
lr2 = LinearRegression()
lr2.fit(x_poly,y)
step 4 : Evaluating the model
We will use the r2_score function from the sklearn library to evaluate the model.
from sklearn.metrics import r2_score
y_pred = lr2.predict(x_poly)
r2 = r2_score(y, y_pred)
print("r_squared : ",r2)
step 5 : Visualizing results from Polynomial Regression
This step visualizes the original values as a scatter plot and the predicted values as a line plot on the same figure.
import matplotlib.pyplot as plt
plt.scatter(x, y, color='red')
plt.plot(x, lr2.predict(x_poly), color='blue')
plt.xlabel('Level')
plt.ylabel('Salary')
plt.title('Polynomial Regression')
plt.show()
I learned and implemented all of these steps myself, so if you find any mistake please comment below.
Conclusion :
Remember to choose the degree of the polynomial carefully to avoid overfitting, and consider the computational complexity of higher-degree polynomials.