Mr Codeslinger


# Linear Regression with Scikit-learn (Part 2)

This is the second part of the series, and here we'll be talking about Multiple Linear Regression.

What is Multiple Linear Regression?

• It is a statistical technique that uses several explanatory variables to predict the outcome of a response variable.

• Multiple Linear Regression is used to estimate the relationship between two or more independent variables and one dependent variable.
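To make "estimating the relationship" concrete, here is a minimal sketch on synthetic data (not the startup dataset). The intercept `3.0` and the coefficients `2.0`, `-1.0`, `0.5` are made-up values chosen for illustration, and `demo` is just a name for this example. Because the target is built from the features with no noise, the fitted model should recover those numbers almost exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: 30 samples, 3 explanatory variables (made up for illustration)
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(30, 3))

# Hypothetical "true" relationship: y = 3 + 2*x1 - 1*x2 + 0.5*x3 (no noise)
true_intercept = 3.0
true_coefs = np.array([2.0, -1.0, 0.5])
y = true_intercept + X @ true_coefs

# Fit a multiple linear regression and inspect what it learned
demo = LinearRegression().fit(X, y)
print(demo.intercept_)  # close to 3.0
print(demo.coef_)       # close to [2.0, -1.0, 0.5]
```

On real data there is noise, so the estimates are only approximations of the underlying relationship.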

With Multiple Linear Regression (MLR), you can predict the price of a car, a house, and more.

Before we start coding, you'll need to download the dataset we're going to use. Click here to download it, then open the file named `50_Startups.csv`. You should see something like this:

Image

### GOAL OF THE DAY

We're going to build a regression model that predicts the profit of startups.

We're gonna start coding now.

### Importing the libraries

```python
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
```

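### Loading the dataset

```python
df = pd.read_csv('50_Startups.csv')  # load the CSV into a DataFrame
print(df.head())  # preview the first five rows
```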

OUTPUT

Image

### Feature Extraction

What is Feature Extraction?

• Feature extraction is a process of dimensionality reduction by which an initial set of raw data is reduced to more manageable groups for processing.

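• In simpler terms, here it means keeping the useful columns of a dataset and dropping the rest. Strictly speaking, picking a subset of existing columns like this is usually called feature selection.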
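One rough way to judge which columns are "useful" is to check how strongly each one correlates with the value you want to predict. The sketch below uses a made-up table (the column names `useful_feature`, `noise_feature`, and `target` are hypothetical, not from the startup dataset), so treat it as an illustration of the idea rather than a rule.

```python
import numpy as np
import pandas as pd

# Made-up data: one column drives the target, the other is pure noise
rng = np.random.default_rng(0)
n = 200
useful = rng.normal(size=n)
noise = rng.normal(size=n)
toy = pd.DataFrame({
    "useful_feature": useful,
    "noise_feature": noise,
    "target": 2.0 * useful + rng.normal(scale=0.1, size=n),
})

# Correlation of every feature with the target (the target itself is dropped)
corr_with_target = toy.corr()["target"].drop("target")
print(corr_with_target.abs().sort_values(ascending=False))

# Keep only the column that is clearly related to the target
selected = toy[["useful_feature"]]
```

Correlation only captures linear, one-column-at-a-time relationships, so it is a starting point, not a final verdict on a feature.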

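```python
# Features: the three numeric expense columns
data = df[['R&D Spend', 'Administration', 'Marketing Spend']]
data = data.values.reshape(-1, 3)  # NumPy array of shape (n_samples, 3)
# Target: the value we want to predict
labels = df[['Profit']]
```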

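As you can see, I did not include the `State` column in the data. It holds text rather than numbers, so it would have to be encoded before `LinearRegression` could use it, and it is not really necessary for this example. Columns that tell the model little about the target mostly add noise, which can make its predictions on new data worse.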
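If you did want to use a text column like `State`, one common option is one-hot encoding with `pd.get_dummies`. Here is a minimal sketch; the rows and the state labels `A`, `B`, `C` are made up to show the mechanics and are not values from `50_Startups.csv`.

```python
import pandas as pd

# Made-up rows, only to show the mechanics of one-hot encoding
toy = pd.DataFrame({
    "R&D Spend": [100.0, 200.0, 300.0, 400.0],
    "State": ["A", "B", "C", "A"],
})

# Turn the text column into numeric indicator columns.
# drop_first=True drops one category ("A") to avoid a redundant column.
encoded = pd.get_dummies(toy, columns=["State"], drop_first=True)
print(encoded.columns.tolist())  # ['R&D Spend', 'State_B', 'State_C']
```

The resulting frame is fully numeric, so it could be passed to `model.fit` like the other features.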

### Making a Regression Model

```python
model = LinearRegression()
model.fit(data, labels)  # learn the intercept and one coefficient per feature
print(model.score(data, labels))  # R² on the training data
```

OUTPUT

`0.9507459940683246`

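TAKE NOTE: For a regression model, `score` returns the coefficient of determination (R²), not a classification accuracy. The closer it is to `1.0`, the better the model fits the data. Keep in mind that this score was computed on the same data the model was trained on, so it tends to be optimistic about how well the model will do on new startups.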
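A more honest check is to hold back part of the data and score the model on rows it has never seen. The sketch below does that with `train_test_split` on synthetic data (the coefficients, noise level, and the name `demo_model` are assumptions made for this example), but the same pattern should work with `data` and `labels` from above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real dataset: 50 rows, 3 features, some noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 3))
y = 5.0 + X @ np.array([2.0, 0.5, 1.0]) + rng.normal(0, 5.0, size=50)

# Hold back 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

demo_model = LinearRegression().fit(X_train, y_train)
train_r2 = demo_model.score(X_train, y_train)  # R² on data the model has seen
test_r2 = demo_model.score(X_test, y_test)     # R² on unseen data
print(train_r2, test_r2)
```

If the test score is much lower than the training score, the model is likely overfitting.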

### Making predictions with your model

```python
print(model.predict([[165349.20, 136897.80, 471784.10]]))
print(model.predict([[144372.41, 118671.85, 383199.62]]))
```

OUTPUT

```
[[192521.25289008]]
[[173696.70002553]]
```

That's how simple it is. You've just predicted the profit of a startup from three of its expenses: R&D, administration, and marketing.
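Under the hood, `predict` just adds the intercept to a weighted sum of the features. The sketch below checks that by hand on synthetic data; the numbers and the names `demo` and `new_rows` are made up for illustration. It also shows that you can pass several rows to `predict` in one call.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data (made up): 40 rows, 3 features
rng = np.random.default_rng(1)
X = rng.uniform(0, 100, size=(40, 3))
y = 10.0 + X @ np.array([1.5, 0.2, 0.7]) + rng.normal(0, 2.0, size=40)
demo = LinearRegression().fit(X, y)

# Two new rows predicted in a single call
new_rows = np.array([[50.0, 20.0, 30.0], [80.0, 10.0, 60.0]])
by_model = demo.predict(new_rows)

# The same predictions computed by hand: intercept + features . coefficients
by_hand = demo.intercept_ + new_rows @ demo.coef_

print(np.allclose(by_model, by_hand))  # True
```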

You can visit Kaggle to find more datasets that you can perform Linear Regression on.
