First off, let's start with the questions on your mind:
What is Scikit-learn?
Scikit-learn is a Python library for machine learning. It features many algorithms, such as support vector machines, random forests, and k-nearest neighbors, alongside the linear regression model you are going to learn here.
What is Linear Regression?
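Linear Regression is a statistical method that models the relationship between variables by fitting a straight line to the data. Once the line is fitted, you can use it to predict a value you haven't observed yet, such as the salary for a given number of years of experience.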
There are two types of Linear Regression:
- Simple Linear Regression
- Multiple Linear Regression
Simple Linear Regression predicts the value of a variable from one other variable. Multiple Linear Regression is an extension of it, used when we want to predict the value of a variable based on the value of two or more other variables.
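To make the idea concrete, here is a minimal sketch of Simple Linear Regression done by hand, with no libraries. It fits a line of the form y = intercept + slope * x using the standard least-squares formulas. The numbers are made up for illustration and are not from the salary dataset, and the helper names (fit_line, predict) are hypothetical.

```python
# Made-up toy data that lies exactly on the line y = 1 + 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    slope = numerator / denominator
    # The fitted line always passes through the point (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict(intercept, slope, x):
    """Predict y for a new x using the fitted line."""
    return intercept + slope * x


intercept, slope = fit_line(xs, ys)
print(intercept, slope)                # 1.0 2.0
print(predict(intercept, slope, 5.0))  # 11.0
```

Scikit-learn does this fitting for you, and it handles the multiple-variable case with the same interface.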
That's enough information for now. We're gonna start coding.
This first article covers Simple Linear Regression; the second part covers Multiple Linear Regression.
We have to install the following libraries using pip:
pip install pandas
pip install numpy
pip install scikit-learn
Click here to download the dataset we're gonna use. Then extract the Salary_Data.csv file inside it.
You should see a .csv file like this:
YearsExperience Salary
0 1.1 39343.0
1 1.3 46205.0
2 1.5 37731.0
3 2.0 43525.0
4 2.2 39891.0
The data explanation:
As you can see, there is a column called YearsExperience. This is the feature. In ML, a feature is an individual measurable property or characteristic of a phenomenon being observed.
Also, there is a column called Salary. This is the label. In ML, a label is the thing we're predicting. It's the y variable in Simple Linear Regression.
Open your code editor and make a new Python file called linear_regression.py, or open a Jupyter Notebook.
Importing the needed libraries
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
We use the as keyword to give the imported module an alias to make our code shorter.
Load and view dataset
df = pd.read_csv('Salary_Data.csv')
print(df.head())
OUTPUT:
YearsExperience Salary
0 1.1 39343.0
1 1.3 46205.0
2 1.5 37731.0
3 2.0 43525.0
4 2.2 39891.0
Feature Extraction
x = df['YearsExperience']
x = x.values.reshape(-1, 1)
y = df['Salary']
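Why the reshape(-1, 1)? Scikit-learn expects the features as a 2D array with one row per sample and one column per feature, but a single pandas column gives you a 1D array. Here is a small sketch of what the reshape does, using the five YearsExperience values shown above in place of the full column:

```python
import numpy as np

# The five YearsExperience values from the preview above.
years = np.array([1.1, 1.3, 1.5, 2.0, 2.2])
print(years.shape)  # (5,) -> one-dimensional: just 5 values

# -1 tells NumPy to work out the number of rows itself;
# 1 asks for exactly one column.
x = years.reshape(-1, 1)
print(x.shape)      # (5, 1) -> 5 samples, 1 feature
```

The label y can stay one-dimensional, which is why only x gets reshaped.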
Making a regression model
model = LinearRegression()
model.fit(x,y)
print(model.score(x,y))
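The last line, print(model.score(x,y)), checks how well your model fits the data. For LinearRegression, the .score() method returns the coefficient of determination (R²), not a classification-style accuracy. Note that here it is computed on the same data the model was trained on.
Below is the output of the print() statement above.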
OUTPUT
0.9569566641435086
The closer it is to 1, the better the model fits the data.
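If you're curious where that number comes from: for LinearRegression, .score() returns R², which compares the model's errors against the errors you'd get by always guessing the average. Here is a rough sketch that computes it by hand and checks it against .score(). It uses only the five sample rows shown earlier, not the full dataset, so expect a different value from the 0.9569... above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Only the five preview rows from earlier, not the full Salary_Data.csv.
x = np.array([1.1, 1.3, 1.5, 2.0, 2.2]).reshape(-1, 1)
y = np.array([39343.0, 46205.0, 37731.0, 43525.0, 39891.0])

model = LinearRegression().fit(x, y)
predicted = model.predict(x)

# R^2 = 1 - (sum of squared model errors / sum of squared errors around the mean)
ss_res = np.sum((y - predicted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual)
print(model.score(x, y))  # same value, up to floating-point rounding
```

Keep in mind that a score measured on the training data tends to be optimistic about how the model will do on new data.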
Making predictions with your model
print(model.predict([[3]]))
print(model.predict([[4]]))
print(model.predict([[5]]))
OUTPUT
[54142.08716303]
[63592.04948449]
[73042.01180594]
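Notice that each extra year in the output above adds about 9,450 to the predicted salary. That's the slope of the fitted line. A fitted LinearRegression stores the slope in coef_ and the intercept in intercept_, so a prediction is just intercept + slope * years. The sketch below shows this with the five sample rows from earlier (not the full dataset, so the numbers it prints will differ from the output above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Only the five preview rows from earlier, not the full Salary_Data.csv.
x = np.array([1.1, 1.3, 1.5, 2.0, 2.2]).reshape(-1, 1)
y = np.array([39343.0, 46205.0, 37731.0, 43525.0, 39891.0])

model = LinearRegression().fit(x, y)

slope = model.coef_[0]       # one coefficient, because there is one feature
intercept = model.intercept_

# Rebuild the prediction for 3 years of experience by hand.
manual = intercept + slope * 3
from_model = model.predict([[3]])[0]

print(manual)
print(from_model)  # matches the manual value up to floating-point rounding
```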
That's how simple it is. You've just predicted a person's salary from their years of experience.
You can visit Kaggle to find more datasets that you can perform Linear Regression on.
Feel free to ask questions.
GOOD LUCK 👍