
Viswa M
🚀 A Gentle Walk‑Through of Logistic Regression in Python

Meta description

Learn logistic regression in Python from scratch using NumPy. Step‑by‑step guide to build, train, and predict without heavy libraries.

Tags

logisticregression, python, numpy, machinelearning, dataanalysis, classification, gradientdescent, crossentropy, sigmoid, tutorial


Introduction

When you think of classification, imagine questions like “Is this email spam?” or “Will this customer churn?” The answer is a binary label (1 = \text{yes}, 0 = \text{no}). Logistic regression turns a linear model into a probability estimate, allowing us to quantify confidence in the decision. Because it relies on nothing more than a sigmoid function and gradient descent, the whole algorithm fits in a few lines of NumPy without losing the intuition.


Overview

  • Data: features \mathbf{X}, binary labels \mathbf{y}
  • Parameters: a scalar weight m and bias b for one feature; a vector \mathbf{W} and bias b for many
  • Training: 1,000 epochs of gradient descent
  • Prediction: sigmoid applied to the linear combination of inputs

The same equations work whether we have a single feature or several; the only difference is that the weight becomes a vector.


Imports and Data

import numpy as np
from tqdm import tqdm  # progress bar

# One‑dimensional toy data
X  = np.array([1, 2, 3, 4, 5, 6])
y  = np.array([0, 0, 0, 1, 1, 1])

# Two‑dimensional toy data
X2 = np.array([[25, 30],
               [35, 60],
               [45, 80]])
y2 = np.array([0, 1, 1])

These tiny arrays let us step through the whole learning process without any external data files.
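Before training, it never hurts to confirm that the shapes line up — one label per sample, and in the 2‑D case one row per sample and one column per feature:

```python
import numpy as np

X  = np.array([1, 2, 3, 4, 5, 6])
y  = np.array([0, 0, 0, 1, 1, 1])
X2 = np.array([[25, 30],
               [35, 60],
               [45, 80]])
y2 = np.array([0, 1, 1])

# One label per sample
assert X.shape == y.shape == (6,)
# Three samples, two features each
assert X2.shape == (3, 2) and y2.shape == (3,)
```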


Initialisation

# 1‑D parameters
m, b = 0.0, 0.0

# 2‑D parameters
W, bias = np.zeros(X2.shape[1]), 0.0

# Common hyper‑parameters
lr     = 0.01   # learning rate
epochs = 1000   # full passes over the dataset
  • Learning rate (lr) controls the step size in gradient descent. Too high, and we overshoot; too low, and training stalls.
  • Epochs is the number of times we loop over the entire dataset.

Sigmoid (Logistic) Function

The sigmoid squashes any real number into the interval (0,\,1):

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

When z is very negative, the output is close to 0; when z is very positive, it approaches 1.
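A few spot checks make this concrete (values rounded):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(-10))  # ≈ 4.5e-05 — almost certainly class 0
print(sigmoid(0))    # 0.5       — completely undecided
print(sigmoid(10))   # ≈ 0.99995 — almost certainly class 1
```

Note the symmetry \sigma(-z) = 1 - \sigma(z): the function treats the two classes even‑handedly around z = 0.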


1‑D Logistic Regression

def logisticRegression(X, y, m, b, lr, epochs):
    n = len(X)  # number of samples

    for _ in tqdm(range(epochs), leave=False):
        # Forward pass
        z     = m * X + b
        y_hat = sigmoid(z)

        # Gradients of cross‑entropy loss
        dm = (1 / n) * np.sum((y_hat - y) * X)
        db = (1 / n) * np.sum(y_hat - y)

        # Gradient descent updates
        m -= lr * dm
        b -= lr * db

    return m, b

What the loop does

  1. Compute the linear score z.
  2. Convert z into a probability \hat{y} with the sigmoid.
  3. Calculate how much each parameter should change (dm, db).
  4. Move the parameters a little toward the minimum.

After the loop, m and b hold the trained model.
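The loop above never reports the loss it is minimising. If you want to watch convergence, a small cross‑entropy helper (`cross_entropy` is my own addition, not part of the original code) can be evaluated once per epoch; an undecided model that predicts 0.5 everywhere scores \log 2 \approx 0.693, and the value should fall as training proceeds:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cross_entropy(y, y_hat, eps=1e-12):
    # Clip so log never sees an exact 0 or 1
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y     = np.array([0, 0, 0, 1, 1, 1])
y_hat = np.full(6, 0.5)
print(cross_entropy(y, y_hat))  # ≈ 0.6931
```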


Prediction (1‑D)

m, b = logisticRegression(X, y, m, b, lr, epochs)

new_x = 9
prob  = sigmoid(m * new_x + b)
print("Probability that x = 9 is class 1:", prob)

The output is a confidence score between 0 and 1, indicating how likely it is that the point belongs to the positive class.


Multi‑Feature Logistic Regression

The only change is that we replace the scalar weight with a vector and use matrix operations:

def logisticRegressionMultipleFeatures(X, y, W, b, lr, epochs):
    n = len(X)

    for _ in tqdm(range(epochs), leave=False):
        # Forward pass
        z     = np.dot(X, W) + b
        y_hat = sigmoid(z)

        # Gradients
        dw = (1 / n) * np.dot(X.T, (y_hat - y))
        db = (1 / n) * np.sum(y_hat - y)

        # Updates
        W -= lr * dw
        b -= lr * db

    return W, b

The gradients dw and db are derived exactly as in the 1‑D case, just expressed in vector form.
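A quick way to convince yourself the vectorised gradient is right is a finite‑difference check against the loss itself. This is a sketch using the same toy data; the `loss` helper is my own, but the `dw` line is identical to the one in the training loop:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(X, y, W, b, eps=1e-12):
    p = np.clip(sigmoid(X @ W + b), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

X = np.array([[25., 30.], [35., 60.], [45., 80.]])
y = np.array([0., 1., 1.])
W, b = np.array([0.01, -0.02]), 0.1

# Analytic gradient, exactly as in the training loop
y_hat = sigmoid(X @ W + b)
dw = X.T @ (y_hat - y) / len(X)

# Numerical gradient for the first weight via central differences
h = 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[0] += h
Wm[0] -= h
numerical = (loss(X, y, Wp, b) - loss(X, y, Wm, b)) / (2 * h)

print(abs(numerical - dw[0]))  # tiny — the two gradients agree
```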


Prediction (Multi‑D)

W, bias = logisticRegressionMultipleFeatures(X2, y2, W, bias, lr, epochs)

sample = np.array([40, 70])
prob   = sigmoid(np.dot(sample, W) + bias)
print("Probability that sample [40, 70] is class 1:", prob)

Again, the result is a probability that can be thresholded (e.g., 0.5) to obtain a hard class label.
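To turn probabilities into hard labels, a tiny helper (my own naming, not from the original code) applies that threshold:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(X, W, b, threshold=0.5):
    # 1 where the predicted probability clears the threshold, else 0
    probs = sigmoid(np.dot(X, W) + b)
    return (probs >= threshold).astype(int)

# Example with hand-picked parameters: decision boundary at x1 = 3
W_demo, b_demo = np.array([1.0, 0.0]), -3.0
X_demo = np.array([[2.0, 0.0],
                   [5.0, 0.0]])
print(predict(X_demo, W_demo, b_demo))  # [0 1]
```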


Key Concepts & Math

  • Linear model: z = \mathbf{w}^\top \mathbf{x} + b
  • Sigmoid: \sigma(z) = \frac{1}{1 + e^{-z}}
  • Cross‑entropy loss: $$ L = -\frac{1}{n} \sum_i \Big[ y_i \log \sigma(z_i) + (1 - y_i) \log\big(1 - \sigma(z_i)\big) \Big] $$
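
Differentiating this loss with respect to the parameters yields exactly the update terms computed in the code (`dm`/`dw` and `db`):

$$ \frac{\partial L}{\partial \mathbf{w}} = \frac{1}{n} \sum_i \big(\sigma(z_i) - y_i\big)\,\mathbf{x}_i, \qquad \frac{\partial L}{\partial b} = \frac{1}{n} \sum_i \big(\sigma(z_i) - y_i\big) $$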
