Nitin Kendre

Polynomial Regression with Python: A Flexible Approach for Non-Linear Curve Fitting

Polynomial Regression is a powerful technique for modeling the relationship between a dependent variable and one or more independent variables. It extends Simple and Multiple Linear Regression by allowing for more flexible curve fitting.

In this article we will look at some of the theory behind it and how to implement it using Python.

1. Understanding Polynomial Regression

Introduction to Polynomial Regression:

The goal of polynomial regression is to fit a polynomial equation to a given set of data points.

Below is the polynomial equation:


y = b0 + b1 * x + b2 * x^2 + ... + bn * x^n


where,

  • y is the dependent variable.
  • x is the independent variable.
  • b0 is the intercept, and b1, b2, ..., bn are the coefficients of the polynomial terms.
  • n denotes the degree of the polynomial.
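
To make the notation concrete, here is a minimal sketch (with made-up coefficients, not fitted values) that evaluates a degree-3 polynomial of this form at a single point:


## A quick worked example: y = 1.0 + 0.5*x - 0.2*x^2 + 0.05*x^3 evaluated at x = 2
## (the coefficients here are purely illustrative)

import numpy as np

b = np.array([1.0, 0.5, -0.2, 0.05])                 # [b0, b1, b2, b3]
x = 2.0
powers = np.array([x ** i for i in range(len(b))])   # [1, x, x^2, x^3]
y = np.dot(b, powers)
print(y)  # 1.0 + 1.0 - 0.8 + 0.4 = 1.6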

Advantages and Limitations:

Polynomial Regression offers several advantages over Simple Linear Regression -

  • Flexibility: By using polynomial terms, we can capture non-linear relationships between variables that cannot be represented by a straight line.

  • Higher-Order Trends: Polynomial Regression can capture higher-order trends in the data, which allows for more accurate predictions.

  • Interpretability: The polynomial terms provide insights into the relationships between variables.

But Polynomial Regression also has some limitations -

  • Overfitting: High-degree polynomials can easily overfit the training data, which results in poor generalization to unseen data (see the sketch after this list).

  • Computational Complexity: The complexity of the regression model grows with the degree of the polynomial, which makes it more computationally expensive.
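
To illustrate the overfitting risk, below is a minimal, self-contained sketch on synthetic data. The data-generating function, noise level, and degree values are arbitrary choices for demonstration only (not part of the salary example used later); it compares R² on a held-out split for several degrees.


## Sketch: compare validation R^2 for several polynomial degrees on synthetic data
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 5, 60)).reshape(-1, 1)
y = 2 + 3 * x.ravel() - 0.8 * x.ravel() ** 2 + rng.normal(0, 2, 60)  # noisy quadratic

x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.3, random_state=0)

for degree in (1, 2, 5, 10):
    pf = PolynomialFeatures(degree=degree)
    model = LinearRegression().fit(pf.fit_transform(x_train), y_train)
    r2_val = r2_score(y_val, model.predict(pf.transform(x_val)))
    print(degree, round(r2_val, 3))


Typically the validation score improves up to a degree that matches the underlying trend and then degrades as the model starts fitting noise.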

2. Implementing Polynomial Regression using Python

We will use the sklearn library in Python to implement Polynomial Regression. sklearn provides a simple and efficient way to build machine learning models.

Below are the steps to implement Polynomial Regression -

Step 1: Data Preparation

In this step, we will load the data and preprocess it for training our model. This step also ensures that the data is in the required format.


## Importing Libraries

import numpy as np
import pandas as pd

## Loading the dataset

sal_data = pd.read_csv("Position_Salaries.csv")

## Creating the independent variable (features) and dependent variable (target)

x = sal_data.iloc[:, 1:-1].values   # every column except the first and the last
y = sal_data.iloc[:, -1].values     # the last column (the target)
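
Optionally, you can sanity-check what was loaded before transforming it; the exact columns depend on your copy of Position_Salaries.csv:


## Optional sanity check: inspect the raw data and the shapes of x and y
print(sal_data.head())
print(x.shape, y.shape)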



Step 2: Feature Engineering

In this step we will generate the polynomial terms by transforming the independent variable into polynomial features (x, x^2, x^3, ...).

Below is the code for generating the polynomial terms -


from sklearn.preprocessing import PolynomialFeatures

## Generate polynomial terms up to degree 3 (adds a bias column of ones, x, x^2, x^3)
pf = PolynomialFeatures(degree=3)
x_poly = pf.fit_transform(x)
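
If you want to see exactly which columns fit_transform produced, recent versions of scikit-learn expose get_feature_names_out (this assumes scikit-learn >= 1.0):


## Optional: inspect the generated feature columns (assumes scikit-learn >= 1.0)
print(pf.get_feature_names_out())  # e.g. ['1' 'x0' 'x0^2' 'x0^3']
print(x_poly[:2])                  # first two transformed rows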


Step 3: Model Training

In this step we will train a linear regression model on the polynomial terms generated above.

Below is the code for training -


from sklearn.linear_model import LinearRegression
lr2 = LinearRegression()
lr2.fit(x_poly,y)
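
Once the model is fitted, a new input has to go through the same PolynomialFeatures transformation before prediction. For example, to predict the salary for a hypothetical level of 6.5 (an arbitrary value chosen only for illustration):


## Predicting for a new value; the input must be transformed with the same pf object
new_level = [[6.5]]                                  # hypothetical level, for illustration only
predicted_salary = lr2.predict(pf.transform(new_level))
print(predicted_salary)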


Step 4: Evaluating the Model

We will use the r2_score function from the sklearn library to evaluate the model. Note that here the score is computed on the same data the model was trained on, so it mainly tells us how well the curve fits the training points.


from sklearn.metrics import r2_score

## Predictions on the training data and the corresponding R^2 score
y_pred = lr2.predict(x_poly)

r2 = r2_score(y, y_pred)

print("r_squared:", r2)


Step 5: Visualizing the Results of Polynomial Regression

This step visualizes the original values as a scatter plot and the predicted values as a line plot.

import matplotlib.pyplot as plt

plt.scatter(x, y, color='red')
plt.plot(x, lr2.predict(x_poly), color='blue')
plt.xlabel('Level')
plt.ylabel('Salary')
plt.title('Polynomial Regression')
plt.show()
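
Because this kind of dataset often has only a handful of distinct levels, the fitted curve can look jagged. One common refinement (a sketch, assuming x is a single numeric column) is to predict on a denser grid of values:


## Optional: plot a smoother curve by predicting on a dense grid of x values
x_grid = np.arange(x.min(), x.max() + 0.1, 0.1).reshape(-1, 1)
plt.scatter(x, y, color='red')
plt.plot(x_grid, lr2.predict(pf.transform(x_grid)), color='blue')
plt.xlabel('Level')
plt.ylabel('Salary')
plt.title('Polynomial Regression (smooth curve)')
plt.show()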


I learned and implemented all of these steps myself using Python, so if you find any mistake, please comment below.

Conclusion:

Remember to choose the degree of the polynomial carefully to avoid overfitting, and keep in mind the computational cost of higher-degree polynomials.
