ML Learning #1 : Linear Regression

What is Linear Regression

Linear Regression is a fundamental statistical machine learning algorithm that models the linear relationship between a dependent variable ( \mathbf{y} ) and one or more independent variables ( \mathbf{x} ). The goal is to fit a straight line (or a hyperplane when there are multiple features) that minimizes the overall prediction error on the training data.

Note 1.1: Linearly related features exhibit a correlation where a change in one variable results in a proportional change in the other (e.g., as X increases, Y also tends to increase, or vice versa). Linear Regression works best when this relationship is approximately linear.

Note 1.2: Regression is the process of predicting a continuous or real value (e.g., 265.34, 10.231).

How does it work?

Linear Regression models the relationship by defining a linear function, often called the hypothesis \hat{y} , which calculates the predicted value.
For Multiple Linear Regression (more than one feature), this line is represented as:

\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n

\hat{y} is the predicted value (the model’s output).
\theta_0 is the y-intercept (the bias term).
\theta_i are the coefficients or weights for each feature x_i .

The model’s task is to find the optimal set of weights ( \boldsymbol{\theta} ) that best fits the data.
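
To make the hypothesis concrete, here is a minimal sketch of how a prediction would be computed with NumPy. The weights, bias, and feature values below are made-up illustrative numbers, not taken from any particular dataset or library.

import numpy as np

theta = np.array([0.5, -1.2, 2.0])   # example weights theta_1 ... theta_3 (made up)
theta_0 = 0.8                        # example intercept / bias term (made up)
x = np.array([1.0, 3.0, 2.5])        # feature values x_1 ... x_3 for one sample

# y_hat = theta_0 + theta_1*x_1 + theta_2*x_2 + theta_3*x_3
y_hat = theta_0 + np.dot(theta, x)
print(y_hat)  # 2.7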

How to Measure Model Performance?

The performance of a regression model is measured using a cost function (or loss function), which quantifies the “error” or “cost” of the model’s predictions. The cost function most commonly used for Linear Regression is the Mean Squared Error (MSE).

Mean Squared Error (MSE) is calculated by averaging the squared differences between the predicted values and the actual values:

\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y_i} - y_i)^2

Where \hat{y_i} is the predicted value, y_i is the actual value, and n is the number of training samples.

MSE is popular because the squaring operation penalizes larger errors more heavily; the trade-off is that this also makes the model sensitive to outliers.
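
As a small illustration, MSE takes only a couple of lines with NumPy; the targets and predictions below are made-up numbers.

import numpy as np

y_true = np.array([3.0, 5.0, 7.5])    # actual values (made up)
y_pred = np.array([2.5, 5.5, 7.0])    # model predictions (made up)

# Mean of the squared differences between predictions and actual values
mse = np.mean((y_pred - y_true) ** 2)
print(mse)  # 0.25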

How does the Model Learn?

The model learns by iteratively adjusting its weights ( \boldsymbol{\theta} ) to minimize the cost function using the Gradient Descent optimization algorithm.

Gradient Descent works by calculating the gradient (the slope) of the cost function with respect to each weight. This gradient indicates the direction of the steepest increase in error. The weights are then updated by moving in the opposite direction of the gradient. The weight update rules for Linear Regression using MSE are:

For each feature weight ( \theta_i , where i = 1, 2, \dots, n ):

\theta_i = \theta_i - \alpha \cdot \frac{2}{n} \sum_{j=1}^{n} (\hat{y_j} - y_j) x_{j,i}

For the intercept ( \theta_0 ):

\theta_0 = \theta_0 - \alpha \cdot \frac{2}{n} \sum_{j=1}^{n} (\hat{y_j} - y_j)

\alpha (alpha) is the learning rate, a hyperparameter that controls the step size during each iteration.
j = 1 \text{ to } n indexes the training examples, and x_{j,i} is the value of feature i for example j.

This process is repeated over many iterations, called epochs, allowing the model to gradually converge on the optimal weights.
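
Before the full from-scratch class below, here is a minimal, vectorized sketch of a single gradient descent step, assuming X is an (n_samples, n_features) NumPy array and theta, bias, and alpha are placeholder names for the current weights, intercept, and learning rate. It performs the same update as the loop-based implementation that follows, just written with matrix operations.

import numpy as np

def gradient_descent_step(X, y, theta, bias, alpha):
    n = len(y)
    y_hat = X @ theta + bias              # predictions for every sample
    error = y_hat - y                     # (y_hat_j - y_j) for every sample
    grad_theta = (2 / n) * (X.T @ error)  # gradient of MSE w.r.t. each weight
    grad_bias = (2 / n) * np.sum(error)   # gradient of MSE w.r.t. the intercept
    return theta - alpha * grad_theta, bias - alpha * grad_bias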

The code to implement Linear Regression from scratch is provided below.

import numpy as np

class LinearRegression:
    def __init__(self, learning_rate=0.001):
        self.weights = []          # one weight per feature (theta_1 ... theta_n)
        self.bias = 0.0            # intercept term (theta_0)
        self.learning_rate = learning_rate

    def fit(self, x, y, epochs):
        data_size = len(x)
        number_of_features = len(x[0])
        x = np.array(x)
        y = np.array(y)
        self.weights = np.zeros(number_of_features)

        for epoch in range(epochs):
            # Accumulate the gradient of MSE for each weight and the bias
            derivatives = [0.0] * number_of_features
            bias_derivative = 0.0

            for pos in range(data_size):
                # Hypothesis: y_hat = theta_0 + sum_i(theta_i * x_i)
                prediction = sum([self.weights[i] * x[pos][i] for i in range(number_of_features)]) + self.bias

                for i in range(number_of_features):
                    derivatives[i] += (2 / data_size) * (prediction - y[pos]) * x[pos][i]

                bias_derivative += (2 / data_size) * (prediction - y[pos])

            # Gradient descent update: move each parameter against its gradient
            for i in range(number_of_features):
                self.weights[i] -= self.learning_rate * derivatives[i]

            self.bias -= self.learning_rate * bias_derivative

            # Safety check: stop if the weights diverge (e.g., learning rate too high)
            if any([np.isnan(w) or np.isinf(w) or abs(w) > 1e10 for w in self.weights]):
                return

    def predict(self, x):
        # Predict a single sample given as a list/array of feature values
        return sum([self.weights[i] * x[i] for i in range(len(self.weights))]) + self.bias
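
As a quick sanity check, the class can be exercised on a tiny made-up dataset that roughly follows y = 2*x1 + 3*x2 + 1; the learning rate and epoch count below are arbitrary choices, and the prediction should only be expected to land near, not exactly on, the true value.

# Toy data generated by hand from y = 2*x1 + 3*x2 + 1
x_train = [[1, 2], [2, 1], [3, 3], [4, 2], [5, 5]]
y_train = [9, 8, 16, 15, 26]

model = LinearRegression(learning_rate=0.01)
model.fit(x_train, y_train, epochs=5000)
print(model.predict([6, 4]))  # expected to be close to 2*6 + 3*4 + 1 = 25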
