
Viswa M
🚀 A Gentle Walk‑Through of Logistic Regression in Python

Meta description

Learn logistic regression in Python from scratch using NumPy. Step‑by‑step guide to build, train, and predict without heavy libraries.

Tags

logisticregression, python, numpy, machinelearning, dataanalysis, classification, gradientdescent, crossentropy, sigmoid, tutorial


Introduction

When you think of classification, imagine questions like “Is this email spam?” or “Will this customer churn?” The answer is a binary label (1 = \text{yes}, 0 = \text{no}). Logistic regression turns a linear model into a probability estimate, allowing us to quantify confidence in the decision. Because it relies on nothing more than a sigmoid function and gradient descent, the whole algorithm fits in a few lines of NumPy without losing the intuition.


Overview

  • Data: features \mathbf{X}, binary labels \mathbf{y}
  • Parameters: a scalar weight m and bias b for one feature; a vector \mathbf{W} and bias b for many
  • Training: 1,000 epochs of gradient descent
  • Prediction: sigmoid applied to the linear combination of inputs

The same equations work whether we have a single feature or several; the only difference is that the weight becomes a vector.


Imports and Data

import numpy as np
from tqdm import tqdm  # progress bar

# One‑dimensional toy data
X  = np.array([1, 2, 3, 4, 5, 6])
y  = np.array([0, 0, 0, 1, 1, 1])

# Two‑dimensional toy data
X2 = np.array([[25, 30],
               [35, 60],
               [45, 80]])
y2 = np.array([0, 1, 1])

These tiny arrays let us step through the whole learning process without any external data files.
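Before training, it never hurts to confirm that the shapes line up — one label per sample, and in the 2‑D case one row per sample and one column per feature:

```python
import numpy as np

X  = np.array([1, 2, 3, 4, 5, 6])
y  = np.array([0, 0, 0, 1, 1, 1])
X2 = np.array([[25, 30],
               [35, 60],
               [45, 80]])
y2 = np.array([0, 1, 1])

# One label per sample
assert X.shape == y.shape == (6,)
# Three samples, two features each
assert X2.shape == (3, 2) and y2.shape == (3,)
```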


Initialisation

# 1‑D parameters
m, b = 0.0, 0.0

# 2‑D parameters
W, bias = np.zeros(X2.shape[1]), 0.0

# Common hyper‑parameters
lr     = 0.01   # learning rate
epochs = 1000   # full passes over the dataset
  • Learning rate (lr) controls the step size in gradient descent. Too high, and we overshoot; too low, and training stalls.
  • Epochs is the number of times we loop over the entire dataset.

Sigmoid (Logistic) Function

The sigmoid squashes any real number into the interval (0,\,1):

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

When z is very negative, the output is close to 0; when z is very positive, it approaches 1.
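A few spot checks make this concrete (values rounded):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(-10))  # ≈ 4.5e-05 — almost certainly class 0
print(sigmoid(0))    # 0.5       — completely undecided
print(sigmoid(10))   # ≈ 0.99995 — almost certainly class 1
```

Note the symmetry \sigma(-z) = 1 - \sigma(z): the function treats the two classes even‑handedly around z = 0.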


1‑D Logistic Regression

def logisticRegression(X, y, m, b, lr, epochs):
    n = len(X)  # number of samples

    for _ in tqdm(range(epochs), leave=False):
        # Forward pass
        z     = m * X + b
        y_hat = sigmoid(z)

        # Gradients of cross‑entropy loss
        dm = (1 / n) * np.sum((y_hat - y) * X)
        db = (1 / n) * np.sum(y_hat - y)

        # Gradient descent updates
        m -= lr * dm
        b -= lr * db

    return m, b

What the loop does

  1. Compute the linear score z.
  2. Convert z into a probability \hat{y} with the sigmoid.
  3. Calculate how much each parameter should change (dm, db).
  4. Move the parameters a little toward the minimum.

After the loop, m and b hold the trained model.
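The loop above never reports the loss it is minimising. If you want to watch convergence, a small cross‑entropy helper (`cross_entropy` is my own addition, not part of the original code) can be evaluated once per epoch; an undecided model that predicts 0.5 everywhere scores \log 2 \approx 0.693, and the value should fall as training proceeds:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cross_entropy(y, y_hat, eps=1e-12):
    # Clip so log never sees an exact 0 or 1
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y     = np.array([0, 0, 0, 1, 1, 1])
y_hat = np.full(6, 0.5)
print(cross_entropy(y, y_hat))  # ≈ 0.6931
```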


Prediction (1‑D)

m, b = logisticRegression(X, y, m, b, lr, epochs)

new_x = 9
prob  = sigmoid(m * new_x + b)
print("Probability that x = 9 is class 1:", prob)

The output is a confidence score between 0 and 1, indicating how likely it is that the point belongs to the positive class.


Multi‑Feature Logistic Regression

The only change is that we replace the scalar weight with a vector and use matrix operations:

def logisticRegressionMultipleFeatures(X, y, W, b, lr, epochs):
    n = len(X)

    for _ in tqdm(range(epochs), leave=False):
        # Forward pass
        z     = np.dot(X, W) + b
        y_hat = sigmoid(z)

        # Gradients
        dw = (1 / n) * np.dot(X.T, (y_hat - y))
        db = (1 / n) * np.sum(y_hat - y)

        # Updates
        W -= lr * dw
        b -= lr * db

    return W, b

The gradients dw and db are derived exactly as in the 1‑D case, just expressed in vector form.
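A quick way to convince yourself the vectorised gradient is right is a finite‑difference check against the loss itself. This is a sketch using the same toy data; the `loss` helper is my own, but the `dw` line is identical to the one in the training loop:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(X, y, W, b, eps=1e-12):
    p = np.clip(sigmoid(X @ W + b), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

X = np.array([[25., 30.], [35., 60.], [45., 80.]])
y = np.array([0., 1., 1.])
W, b = np.array([0.01, -0.02]), 0.1

# Analytic gradient, exactly as in the training loop
y_hat = sigmoid(X @ W + b)
dw = X.T @ (y_hat - y) / len(X)

# Numerical gradient for the first weight via central differences
h = 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[0] += h
Wm[0] -= h
numerical = (loss(X, y, Wp, b) - loss(X, y, Wm, b)) / (2 * h)

print(abs(numerical - dw[0]))  # tiny — the two gradients agree
```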


Prediction (Multi‑D)

W, bias = logisticRegressionMultipleFeatures(X2, y2, W, bias, lr, epochs)

sample = np.array([40, 70])
prob   = sigmoid(np.dot(sample, W) + bias)
print("Probability that sample [40, 70] is class 1:", prob)

Again, the result is a probability that can be thresholded (e.g., 0.5) to obtain a hard class label.
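To turn probabilities into hard labels, a tiny helper (my own naming, not from the original code) applies that threshold:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(X, W, b, threshold=0.5):
    # 1 where the predicted probability clears the threshold, else 0
    probs = sigmoid(np.dot(X, W) + b)
    return (probs >= threshold).astype(int)

# Example with hand-picked parameters: decision boundary at x1 = 3
W_demo, b_demo = np.array([1.0, 0.0]), -3.0
X_demo = np.array([[2.0, 0.0],
                   [5.0, 0.0]])
print(predict(X_demo, W_demo, b_demo))  # [0 1]
```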


Key Concepts & Math

  • Linear model: z = \mathbf{w}^\top \mathbf{x} + b
  • Sigmoid: \sigma(z) = \frac{1}{1 + e^{-z}}
  • Cross‑entropy loss: $$ L = -\frac{1}{n} \sum_i \Big[ y_i \log \sigma(z_i) + (1 - y_i) \log\big(1 - \sigma(z_i)\big) \Big] $$
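
Differentiating this loss with respect to the parameters yields exactly the update terms computed in the code (`dm`/`dw` and `db`):

$$ \frac{\partial L}{\partial \mathbf{w}} = \frac{1}{n} \sum_i \big(\sigma(z_i) - y_i\big)\,\mathbf{x}_i, \qquad \frac{\partial L}{\partial b} = \frac{1}{n} \sum_i \big(\sigma(z_i) - y_i\big) $$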
