ML Learning #1 : Linear Regression

What is Linear Regression

Linear Regression is a fundamental statistical machine learning algorithm that models the linear relationship between a dependent variable ( \mathbf{y} ) and one or more independent variables ( \mathbf{x} ). The goal is to fit a straight line (or a hyperplane when there are multiple features) that minimizes the overall prediction error on the training data.

Note 1.1: Linearly related features exhibit a correlation where a change in one variable results in a proportional change in the other (e.g., as X increases, Y also tends to increase, or vice versa). Linear Regression works best when this relationship is approximately linear.

Note 1.2: Regression is the process of predicting a continuous or real value (e.g., 265.34, 10.231).

How does it work?

Linear Regression models the relationship by defining a linear function, often called the hypothesis \hat{y} , which calculates the predicted value.
For Multiple Linear Regression (more than one feature), this line is represented as:

\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n

\hat{y} is the predicted value (the model’s output).
\theta_0 is the y-intercept (the bias term).
\theta_i are the coefficients or weights for each feature x_i .

The model’s task is to find the optimal set of weights ( \boldsymbol{\theta} ) that best fits the data.
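
To make the hypothesis concrete, here is a minimal sketch of how a prediction would be computed with NumPy. The weights, bias, and feature values below are made-up illustrative numbers, not taken from any particular dataset or library.

import numpy as np

theta = np.array([0.5, -1.2, 2.0])   # example weights theta_1 ... theta_3 (made up)
theta_0 = 0.8                        # example intercept / bias term (made up)
x = np.array([1.0, 3.0, 2.5])        # feature values x_1 ... x_3 for one sample

# y_hat = theta_0 + theta_1*x_1 + theta_2*x_2 + theta_3*x_3
y_hat = theta_0 + np.dot(theta, x)
print(y_hat)  # 2.7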

How to Measure Model Performance?

The performance of a regression model is measured using a cost function (or loss function), which quantifies the “error” or “cost” of the model’s predictions. The cost function most commonly used for Linear Regression is the Mean Squared Error (MSE).

Mean Squared Error (MSE) is calculated by averaging the squared differences between the predicted values and the actual values:

\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y_i} - y_i)^2

Where \hat{y_i} is the predicted value, y_i is the actual value, and n is the number of training samples.

MSE is popular because the squaring operation penalizes larger errors more heavily; the trade-off is that this also makes the model sensitive to outliers.
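
As a small illustration, MSE takes only a couple of lines with NumPy; the targets and predictions below are made-up numbers.

import numpy as np

y_true = np.array([3.0, 5.0, 7.5])    # actual values (made up)
y_pred = np.array([2.5, 5.5, 7.0])    # model predictions (made up)

# Mean of the squared differences between predictions and actual values
mse = np.mean((y_pred - y_true) ** 2)
print(mse)  # 0.25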

How does the Model Learn?

The model learns by iteratively adjusting its weights ( \boldsymbol{\theta} ) to minimize the cost function using the Gradient Descent optimization algorithm.

Gradient Descent works by calculating the gradient (the slope) of the cost function with respect to each weight. This gradient indicates the direction of the steepest increase in error. The weights are then updated by moving in the opposite direction of the gradient. The weight update rules for Linear Regression using MSE are:

For each feature weight ( \theta_i , where i = 1, 2, \dots, n ):

\theta_i = \theta_i - \alpha \cdot \frac{2}{n} \sum_{j=1}^{n} (\hat{y_j} - y_j) x_{j,i}

For the intercept ( \theta_0 ):

\theta_0 = \theta_0 - \alpha \cdot \frac{2}{n} \sum_{j=1}^{n} (\hat{y_j} - y_j)

\alpha (alpha) is the learning rate, a hyperparameter that controls the step size during each iteration.
j = 1 \text{ to } n indexes the training examples, and x_{j,i} is the value of feature i for example j.

This process is repeated over many iterations, called epochs, allowing the model to gradually converge on the optimal weights.
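
Before the full from-scratch class below, here is a minimal, vectorized sketch of a single gradient descent step, assuming X is an (n_samples, n_features) NumPy array and theta, bias, and alpha are placeholder names for the current weights, intercept, and learning rate. It performs the same update as the loop-based implementation that follows, just written with matrix operations.

import numpy as np

def gradient_descent_step(X, y, theta, bias, alpha):
    n = len(y)
    y_hat = X @ theta + bias              # predictions for every sample
    error = y_hat - y                     # (y_hat_j - y_j) for every sample
    grad_theta = (2 / n) * (X.T @ error)  # gradient of MSE w.r.t. each weight
    grad_bias = (2 / n) * np.sum(error)   # gradient of MSE w.r.t. the intercept
    return theta - alpha * grad_theta, bias - alpha * grad_bias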

The code to implement Linear Regression from scratch is provided below.

import numpy as np

class LinearRegression:
    def __init__(self, learning_rate=0.001):
        self.weights = []          # one weight per feature (theta_1 ... theta_n)
        self.bias = 0.0            # intercept term (theta_0)
        self.learning_rate = learning_rate

    def fit(self, x, y, epochs):
        data_size = len(x)
        number_of_features = len(x[0])
        x = np.array(x)
        y = np.array(y)
        self.weights = np.zeros(number_of_features)

        for epoch in range(epochs):
            # Accumulate the gradient of MSE for each weight and the bias
            derivatives = [0.0] * number_of_features
            bias_derivative = 0.0

            for pos in range(data_size):
                # Hypothesis: y_hat = theta_0 + sum_i(theta_i * x_i)
                prediction = sum([self.weights[i] * x[pos][i] for i in range(number_of_features)]) + self.bias

                for i in range(number_of_features):
                    derivatives[i] += (2 / data_size) * (prediction - y[pos]) * x[pos][i]

                bias_derivative += (2 / data_size) * (prediction - y[pos])

            # Gradient descent update: move each parameter against its gradient
            for i in range(number_of_features):
                self.weights[i] -= self.learning_rate * derivatives[i]

            self.bias -= self.learning_rate * bias_derivative

            # Safety check: stop if the weights diverge (e.g., learning rate too high)
            if any([np.isnan(w) or np.isinf(w) or abs(w) > 1e10 for w in self.weights]):
                return

    def predict(self, x):
        # Predict a single sample given as a list/array of feature values
        return sum([self.weights[i] * x[i] for i in range(len(self.weights))]) + self.bias
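
As a quick sanity check, the class can be exercised on a tiny made-up dataset that roughly follows y = 2*x1 + 3*x2 + 1; the learning rate and epoch count below are arbitrary choices, and the prediction should only be expected to land near, not exactly on, the true value.

# Toy data generated by hand from y = 2*x1 + 3*x2 + 1
x_train = [[1, 2], [2, 1], [3, 3], [4, 2], [5, 5]]
y_train = [9, 8, 16, 15, 26]

model = LinearRegression(learning_rate=0.01)
model.fit(x_train, y_train, epochs=5000)
print(model.predict([6, 4]))  # expected to be close to 2*6 + 3*4 + 1 = 25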
