Harsimranjit Singh

Ordinary Least Squares in Simple Linear Regression - Unveiling the Math Behind the Line

Today we will take a detailed look at how simple linear regression actually works.

Explaining Ordinary Least Squares (OLS)

Recall that in Simple Linear Regression, our objective is to find the best-fitting line that minimizes the overall error between the data points and the line itself. The OLS method is a common approach used to achieve this. Let's break down the steps involved:

1. Initial State

Imagine we have a scatter plot with four data points. We will draw an initial regression line to get things started. Our goal is to minimize the sum of the distances between these points and the line.
Let the distances from points 1 to 4 be d1, d2, d3 and d4.

So we need to minimize the sum d1 + d2 + d3 + d4

[Figure: scatter plot with the initial regression line and the distances d1 to d4]
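To make this concrete, here is a minimal sketch in Python (the four points and the initial line are made-up values, chosen only for illustration):

import numpy as np

# Made-up example data: four points (x, y)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 2.5, 4.5, 5.0])

# An arbitrary initial line y = 1.0*x + 1.0 to start things off
m_init, b_init = 1.0, 1.0
y_hat = m_init * x + b_init

# Vertical distances d1..d4 between each point and the line
d = y - y_hat
print(d)        # [ 0.  -0.5  0.5  0. ]
print(d.sum())  # 0.0, positives and negatives cancel out

Notice that the plain sum is 0 even though the line clearly misses two of the points, which is exactly the problem the next step addresses.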

2. Square of Distances

To avoid positive and negative distances cancelling each other out, we square the distances from each data point to the line. This ensures that all distances contribute to the error calculation, regardless of direction.

So the sum becomes d1^2 + d2^2 + d3^2 + d4^2
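With the same made-up numbers as before, squaring removes the cancellation:

import numpy as np

# Same made-up points and initial line as in the earlier sketch
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 2.5, 4.5, 5.0])
d = y - (1.0 * x + 1.0)   # distances to the initial line y = x + 1

print(d.sum())         # 0.0, the raw sum hides the error
print((d ** 2).sum())  # 0.5, squared distances always add up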

3. Error Function

[Figure: squared distances from each point to the line]

Mathematically, the error function (E) is expressed as the sum of the squared differences
between the actual y values (yi) and the predicted values (yi_hat) estimated by the regression line:

E = Σ (yi - yi_hat)^2

4. Dissecting the Error

The predicted value (yi_hat) for each data point can be calculated using the equation of a line (y = mx + b), where m is the slope and b is the y-intercept. With this, the error function can be written as

E = Σ (yi - (m*xi + b))^2
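In code, E is simply a function of the two unknowns m and b. A minimal sketch, using the same made-up data as above:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 2.5, 4.5, 5.0])

def error(m, b):
    # Sum of squared differences between the actual y values and the line m*x + b
    return np.sum((y - (m * x + b)) ** 2)

print(error(1.0, 1.0))    # error of the initial guess
print(error(1.1, 0.75))   # a different candidate line gives a different error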

5. Finding the Perfect Fit

To minimize the error function and achieve the best fit, we need to find the values of m and b that make E as small as possible. We can achieve this by taking partial derivatives of E with respect to m and b, and then setting those derivatives to zero.

∂E/∂m = 0    ∂E/∂b = 0

Solving the above equations gives us the formulas for the optimal slope (m) and y-intercept (b) of the best-fitting line:

m = Σ (xi - x_mean)(yi - y_mean) / Σ (xi - x_mean)^2
b = y_mean - m * x_mean
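These formulas can be evaluated directly. A small sketch (again with the made-up points from earlier) that computes m and b and checks that they give a lower error than the initial guess:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 2.5, 4.5, 5.0])

# Closed-form OLS solution from setting the partial derivatives to zero
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()

def error(m_, b_):
    return np.sum((y - (m_ * x + b_)) ** 2)

print(m, b)                          # 1.1 0.75
print(error(m, b), error(1.0, 1.0))  # 0.45 is lower than the 0.5 of the initial guess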

Building Our Own Linear Regression Model in Python: Hands-on Implementation

Now, let's put the above theory into practice.

class OwnLR:
    """Simple linear regression fitted with the OLS closed-form formulas."""

    def __init__(self):
        self.m = None  # slope
        self.b = None  # y-intercept

    def fit(self, X_train, y_train):
        # X_train and y_train are expected to be 1-D NumPy arrays
        x_mean = X_train.mean()
        y_mean = y_train.mean()

        # m = sum((xi - x_mean) * (yi - y_mean)) / sum((xi - x_mean)^2)
        num = 0
        den = 0
        for i in range(X_train.shape[0]):
            num += (X_train[i] - x_mean) * (y_train[i] - y_mean)
            den += (X_train[i] - x_mean) ** 2

        self.m = num / den
        self.b = y_mean - self.m * x_mean  # b = y_mean - m * x_mean

    def predict(self, X_test):
        return self.m * X_test + self.b

This class defines two methods:

  • fit(X_train, y_train): This method takes the training data and calculates the optimal values of m and b using the OLS formulas.

  • predict(X_test): This method takes a new data point as input and returns the predicted y value.
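A quick way to sanity-check the class is to fit it on a small made-up dataset and, if scikit-learn is available, compare the result with its LinearRegression:

import numpy as np
from sklearn.linear_model import LinearRegression

X_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([2.0, 2.5, 4.5, 5.0])

own = OwnLR()
own.fit(X_train, y_train)
print(own.m, own.b)      # slope and intercept found by our class
print(own.predict(5.0))  # prediction for a new x value

# scikit-learn expects a 2-D feature matrix, hence the reshape
sk = LinearRegression().fit(X_train.reshape(-1, 1), y_train)
print(sk.coef_[0], sk.intercept_)  # should match our m and b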

Conclusion

Today, we have explored the Ordinary Least Squares method in Simple Linear Regression. We have learned how OLS helps us find the best-fit line by minimizing the sum of squared errors. We've also implemented our own Simple Linear Regression class in Python.

Stay tuned for new topics
