DEV Community

Nikhil Dhawan

Machine Learning - Regression- Simple Linear Regression

Hi all, in this post we will see how to train a simple linear regression model. Once the data has been preprocessed, we can directly train our model with the help of LinearRegression from sklearn.linear_model. First we train the model, then we predict values for the test set, which helps us see how good the model is. To train, we use the fit method on LinearRegression, and we use the predict method on the same object for the test data.
Before we move to the code, here is some background on simple linear regression (SLR). Our data looks like this:

Image1

Here we have the salary for different years of experience, and we have to build a model that can predict salary from the years of experience we pass in. SLR is usually written as:

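yh = b0 + b1 * X1

where yh is the predicted value of the dependent variable (salary), b0 is the y-intercept (a constant), b1 is the slope coefficient, and X1 is the independent variable (years of experience).

The model finds the best-fit line using ordinary least squares: it chooses b0 and b1 so that the sum of (yi - yhi)^2 over all data points is minimized, where yi is the actual value and yhi is the predicted value for the i-th point.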
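
To make the least-squares idea concrete, here is a minimal sketch in plain Python that computes b0 and b1 with the standard closed-form formulas for a single feature. The `years` and `salary` values are made up for illustration and are not the dataset from this post, and the helper names `fit_simple_ols` and `sum_squared_residuals` are hypothetical.

```python
# Hypothetical toy data (NOT the post's dataset): years of experience and salary.
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [39000.0, 46000.0, 50000.0, 56000.0, 59000.0]


def fit_simple_ols(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_mean - b1 * x_mean
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of (yi - yhi)^2 for the line yh = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = fit_simple_ols(years, salary)
print("intercept:", b0, "slope:", b1)
print("sum of squared residuals:", sum_squared_residuals(years, salary, b0, b1))
```

Nudging b0 or b1 away from the fitted values and calling `sum_squared_residuals` again should give a larger number, which is what "minimized" means here.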

Let's move to the code.

Training the model

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
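
Once the model is fitted, the b0 and b1 from the formula can be read back from the regressor. The sketch below assumes a small made-up `X_train` / `y_train` (not the post's data) so that it runs on its own; `intercept_` and `coef_` are the attributes scikit-learn sets on a fitted LinearRegression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-ins for X_train / y_train (NOT the post's dataset).
# scikit-learn expects X as a 2-D array of shape (n_samples, n_features).
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([39000.0, 46000.0, 50000.0, 56000.0, 59000.0])

regressor = LinearRegression()
regressor.fit(X_train, y_train)

b0 = regressor.intercept_  # y-intercept of the fitted line
b1 = regressor.coef_[0]    # slope for the single feature (years of experience)
print(f"salary ~ {b0:.2f} + {b1:.2f} * years")
```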

Predicting the test results

y_pred = regressor.predict(X_test)
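
To put a number on "how good our model is", one option is to compare the predictions with the actual test values using metrics from sklearn.metrics. This is only a sketch: the arrays below are made up for illustration, and choosing mean absolute error and R^2 is my assumption, not something the original walkthrough prescribes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical train/test data (NOT the post's dataset).
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([39000.0, 46000.0, 50000.0, 56000.0, 59000.0])
X_test = np.array([[2.5], [4.5], [6.0]])
y_test = np.array([48000.0, 58000.0, 64000.0])

regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

# Average absolute gap between actual and predicted salary.
mae = mean_absolute_error(y_test, y_pred)
# Share of the variance in y_test explained by the model (1.0 is a perfect fit).
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.2f}, R^2: {r2:.4f}")
```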

Let's now visualize the training set. For that we use pyplot from matplotlib.

import matplotlib.pyplot as plt

# Red dots: actual training data; blue line: the fitted regression line
plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Image2

Plot for Test set

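# Red dots: actual test data
plt.scatter(X_test, y_test, color = 'red')
# The line is still drawn from X_train on purpose: it is the same fitted line
plt.plot(X_train, regressor.predict(X_train), color = 'blue')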
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Image3

If you look closely, we draw the same regression line in both plots. That is because the fitted model has a single line (one intercept and one slope) and uses it for any input, so plotting regressor.predict(X_test) against X_test would trace the same line, only over the range covered by X_test. The red dots in the second graph are the actual values from the test data, and the points on the line are the predicted values. Most of the dots lie close to the line, but some do not. On real data a model will rarely fit every point exactly.
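
If you want to convince yourself that both sets of predictions sit on one and the same line, a quick check like the one below can help. It is a sketch on made-up data (not the post's dataset): it compares the model's predictions with b0 + b1 * x computed by hand for both the training and the test inputs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical train/test inputs (NOT the post's dataset).
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([39000.0, 46000.0, 50000.0, 56000.0, 59000.0])
X_test = np.array([[2.5], [4.5], [6.0]])

regressor = LinearRegression()
regressor.fit(X_train, y_train)

b0 = regressor.intercept_
b1 = regressor.coef_[0]

# Predictions for either set should equal b0 + b1 * x, i.e. lie on the same line.
on_line_train = np.allclose(regressor.predict(X_train), b0 + b1 * X_train.ravel())
on_line_test = np.allclose(regressor.predict(X_test), b0 + b1 * X_test.ravel())
print("train predictions on the line:", on_line_train)
print("test predictions on the line:", on_line_test)
```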

Thanks for reading. I hope it was useful. See you next time!
