DEV Community

Mr Codeslinger

Linear Regression with Scikit-learn (Part 2)

This is the second part of the series, and here we'll be talking about Multiple Linear Regression.

Questions on all your minds:

What is Multiple Linear Regression?

  • It is a statistical technique that uses several explanatory variables to predict the outcome of a response variable.

  • Multiple Linear Regression is used to estimate the relationship between two or more independent variables and one dependent variable.
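
In equation form, a multiple linear regression model with n explanatory variables is usually written like this (the symbols are standard textbook notation, not something taken from the dataset):

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon
```

Here y is the response variable, x_1 to x_n are the explanatory variables, \beta_0 is the intercept, \beta_1 to \beta_n are the coefficients the model learns, and \varepsilon is the error the model cannot explain.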

With Multiple Linear Regression (MLR), you can predict things like the price of a car or a house from several of its measurable properties.

Before we start coding, you'll need to download the dataset we're gonna use. Click here to download it. Open the file named 50_Startups.csv. You should see something like this:

[Image: preview of 50_Startups.csv]

GOAL OF THE DAY

We're going to make a regression model that can predict the profit of startups.

We're gonna start coding now.

Importing the libraries

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

Load and view dataset

df = pd.read_csv('50_Startups.csv')
df.head()

OUTPUT

[Image: output of df.head()]

Feature Extraction

What is Feature Extraction?

  • Feature extraction is a process of dimensionality reduction by which an initial set of raw data is reduced to more manageable groups for processing.

  • What we do in this post is the simpler version, usually called feature selection: picking the useful columns from a dataset and leaving out the rest.

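# Features: the three numeric spend columns
data = df[['R&D Spend', 'Administration', 'Marketing Spend']]
# Convert to a NumPy array with one row per startup and three columns
data = data.values.reshape(-1,3)
# Target: the Profit column, kept 2-D so predictions come back as [[value]]
labels = df[['Profit']]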

As you can see, I did not select the State column to be part of the data. It holds text rather than numbers, so LinearRegression cannot use it without encoding it first, and columns that add little information tend to add noise rather than improve the model.
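
If you do want to try the State column later, it first has to be turned into numbers. Here is a minimal sketch using pandas' get_dummies. The rows in `sample` are made up for illustration and only mimic the shape of 50_Startups.csv.

```python
import pandas as pd

# Hypothetical rows shaped like 50_Startups.csv; the numbers are made up.
sample = pd.DataFrame({
    'R&D Spend': [100000.0, 80000.0, 60000.0],
    'State': ['New York', 'California', 'Florida'],
})

# get_dummies replaces the text column with one 0/1 column per state.
encoded = pd.get_dummies(sample, columns=['State'], dtype=int)
print(encoded.columns.tolist())
```

With a linear model, people often also pass drop_first=True so that one of the state columns is dropped and the remaining ones are not redundant.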

Making a Regression Model

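model = LinearRegression()
# Fit the model on the three spend columns and the profit
model.fit(data,labels)
# score() returns R² (the coefficient of determination), here on the training data
print(model.score(data,labels))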

OUTPUT

0.9507459940683246

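TAKE NOTE: The score printed here is R² (the coefficient of determination), not a classification accuracy. The closer it is to 1.0, the more of the variation in profit the model explains. Because it is measured on the same data the model was trained on, it tends to be optimistic about how well the model will predict new startups.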
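
A common way to get a more honest score is to hold back some rows and score the model on those. The sketch below shows the idea with scikit-learn's train_test_split. It uses synthetic numbers as a stand-in for the startup data, and the names `split_model`, `train_r2` and `test_r2` are only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the three spend columns and profit (not the real dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 3))
y = 5.0 + X @ np.array([0.8, 0.1, 0.3]) + rng.normal(0, 2.0, size=50)

# Hold back 20% of the rows so the score is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

split_model = LinearRegression().fit(X_train, y_train)
train_r2 = split_model.score(X_train, y_train)  # R² on rows the model has seen
test_r2 = split_model.score(X_test, y_test)     # R² on rows it has not seen
print(train_r2, test_r2)
```

To try this on the real data, you would pass `data` and `labels` in place of `X` and `y`.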

Making predictions with your model

# Each input row is [R&D Spend, Administration, Marketing Spend]
print(model.predict([[165349.20, 136897.80, 471784.10]]))
print(model.predict([[144372.41, 118671.85, 383199.62]]))

OUTPUT

[[192521.25289008]]
[[173696.70002553]]

That's how simple it is. You've just predicted the profit of a startup from three of its expenses.
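
Under the hood, a prediction is just the intercept plus each input multiplied by its learned coefficient. Here is a minimal sketch on made-up numbers (not the startup data) that reproduces predict() by hand using the fitted model's coef_ and intercept_ attributes. The names `demo`, `new_row` and `by_hand` are only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny made-up dataset with an exact relationship: y = 1 + 2*x1 + 3*x2 + 4*x3
X = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [2, 1, 0]], dtype=float)
y = 1 + X @ np.array([2.0, 3.0, 4.0])

demo = LinearRegression().fit(X, y)

# Prediction by hand: intercept + sum of (coefficient * input)
new_row = np.array([[3.0, 2.0, 1.0]])
by_hand = demo.intercept_ + new_row @ demo.coef_
print(demo.predict(new_row), by_hand)
```

On the model from this post, model.coef_ and model.intercept_ would show how much each spend column contributes to the predicted profit.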

You can visit Kaggle to find more datasets that you can perform Linear Regression on.

Check out my Twitter or Instagram.

Feel free to ask questions in the comments.

GOOD LUCK 👍
