
ThatMLGuy

ML Learning #2: Logistic Regression

What is Logistic Regression?

Logistic Regression is a regression model used for classification tasks. "How?" you may ask: logistic regression is built on the logistic (sigmoid) function, which takes any value from -\infty to +\infty and maps it to a value between 0 and 1. So if you have two classes, a positive class and a negative class, your logistic regression model predicts the probability that your data belongs to the positive class (i.e., 1).

How does it work?

1.1 Sigmoid Function

As mentioned, the model is based on the sigmoid function. This function can be represented by the following formula:

g(X) = \frac{1}{1 + e^{-\theta^T X}}

Where X \in (-\infty, +\infty) and g(X) \in (0, 1).

After the model outputs a probability, we need to decide whether the data belongs to the positive class. To do that, we compare the output g(X) with a threshold value, as explained in the note below.

Note:

In general, the threshold is set to 0.5, meaning that if the model predicts a probability greater than or equal to 0.5, the output is classified as 1 (positive class), otherwise 0 (negative class).

However, the optimal threshold depends on the problem you’re trying to solve. For instance, in a spam detection system, it’s often better to let a few spam emails slip through than to incorrectly mark an important email as spam. In such cases, you might set a higher threshold (e.g., 0.8) to be more confident before labeling an email as spam.

This trade-off between precision and recall is a key aspect of tuning classification models.
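
To make the sigmoid and the thresholding step concrete, here is a minimal sketch in NumPy (the input values and the 0.8 "spam" threshold are made up for illustration):

import numpy as np

def sigmoid(z):
    # Maps any real-valued input to the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

# z = theta^T X for three hypothetical samples
z = np.array([-2.0, 0.3, 4.0])
probs = sigmoid(z)                           # ~[0.12, 0.57, 0.98]

default_labels = (probs >= 0.5).astype(int)  # default threshold -> [0, 1, 1]
strict_labels = (probs >= 0.8).astype(int)   # stricter spam-style threshold -> [0, 0, 1]
print(probs, default_labels, strict_labels)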

1.2 Loss Function

As mentioned in the first article, every model uses a loss function to measure how well it is performing, penalizing incorrect predictions more heavily. For classification problems, the two most common loss functions are binary cross-entropy (two-class classification) and categorical cross-entropy (multi-class classification). We'll use binary cross-entropy, since our model solves a two-class problem. Binary cross-entropy is given by the formula

\text{BinaryCrossEntropy} = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \Big]

If you look at the formula, you will notice that the binary cross-entropy (BCE for short) is high for confident but incorrect predictions and low for correct ones.
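
As a quick sanity check of that behaviour, here is a minimal sketch of binary cross-entropy in NumPy (the labels and predicted probabilities are made up for illustration):

import numpy as np

def binary_cross_entropy(y_true, y_pred):
    # Clip predictions away from 0 and 1 so np.log stays finite
    y_pred = np.clip(y_pred, 1e-12, 1 - 1e-12)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1])
good_preds = np.array([0.9, 0.1, 0.8])           # mostly correct, confident predictions
bad_preds = np.array([0.2, 0.9, 0.3])            # mostly wrong, confident predictions
print(binary_cross_entropy(y_true, good_preds))  # small loss (~0.14)
print(binary_cross_entropy(y_true, bad_preds))   # much larger loss (~1.71)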

1.3 Gradient Descent

To minimize the loss, the weights are updated iteratively using gradient descent. For binary cross-entropy combined with the sigmoid, the update rule for a single sample simplifies to:

\theta = \theta - \text{LearningRate} \cdot \big(g(X_i) - y_i\big) X_i

Where y_i is the actual class label, g(X_i) is the sigmoid output (the predicted probability) for sample X_i, and \theta is the vector of weights for the respective features.

This process repeats for each epoch, and the aim is to adjust the weights \theta so that the binary cross-entropy is minimized.
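
To see the update rule in action, here is a minimal sketch of a single stochastic gradient descent step (the sample values, label, and learning rate are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

learning_rate = 0.001
theta = np.zeros(3)                  # one weight per feature
x_i = np.array([1.0, 2.0, -1.0])     # a single hypothetical sample
y_i = 1                              # its true class label

y_hat = sigmoid(np.dot(theta, x_i))  # predicted probability (0.5 with zero weights)
gradient = (y_hat - y_i) * x_i       # gradient of BCE w.r.t. theta for this sample
theta -= learning_rate * gradient    # one gradient descent step
print(theta)                         # [ 0.0005  0.001  -0.0005]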

Implementation

import numpy as np

class LogisticRegression:
    def __init__(self):
        self.theta = None   # model weights, one per feature
        self.LR = 0.001     # learning rate

    def sigmoid(self, z_input):
        # Clip the input to avoid overflow in np.exp for extreme values
        z_input = np.clip(z_input, -300, 300)
        return 1 / (1 + np.exp(-z_input))

    def fit(self, x, y, epochs):
        x = np.array(x, dtype=np.float64)
        y = np.array(y, dtype=np.float64)
        self.theta = np.zeros(x.shape[1])

        for epoch in range(epochs):
            # Stochastic gradient descent: update the weights one sample at a time
            for i in range(len(x)):
                y_hat = self.sigmoid(np.dot(self.theta, x[i]))
                gradient = (y_hat - y[i]) * x[i]
                self.theta -= self.LR * gradient

    def predict(self, x):
        x = np.array(x, dtype=np.float64)
        probs = self.sigmoid(np.dot(x, self.theta))
        # Apply the 0.5 threshold; works for a single sample or a batch
        return (probs >= 0.5).astype(int)
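
A minimal usage sketch of the class above (the toy dataset and epoch count are made up for illustration; the first column acts as a bias/intercept feature of 1s):

import numpy as np

# Toy two-class dataset: a bias column of 1s plus a single feature
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.5]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression()
model.fit(X, y, epochs=10000)
print(model.theta)       # learned weights
print(model.predict(X))  # predicted labels, e.g. [0 0 1 1]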

