DEV Community

Viswa M
Building a Simple Logistic Regression from Scratch (Python Edition)

Meta description: Learn to build a simple logistic regression model in plain NumPy with gradient descent, no machine-learning libraries needed. Step-by-step guide, code snippets, predictions.

Tags: logisticregression, python, gradientdescent, machinelearning, purepython, classification, tutorial, datamanipulation

Slug: build-logistic-regression-from-scratch-in-python

Overview

In this post we’ll hand‑craft a logistic‑regression classifier in vanilla NumPy, without any machine‑learning framework.

We’ll:

  • Train a one‑feature model.
  • Scale the same idea to two features.
  • See how gradient descent iteratively lowers the cross‑entropy loss.
  • Finally, predict the probability that a new sample belongs to the positive class.

Everything is fully transparent, so you can trace every math step and every line of code.

1. What the Code Does – Overview

  1. Create toy data for a binary classification problem.
  2. Define a one‑feature logistic‑regression function that trains by gradient descent.
  3. Predict the probability for a new single‑feature sample.
  4. Define a multi‑feature version of the same algorithm.
  5. Predict the probability for a new two‑feature sample.

All of this is implemented in plain NumPy, so you can see exactly what happens during training.
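Before diving in, the sigmoid, the only non-linear ingredient in the whole algorithm, can be sanity-checked in a couple of lines:

```python
import numpy as np

def sigmoid(z):
    # maps any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))                              # 0.5 exactly
print(sigmoid(10) > 0.99, sigmoid(-10) < 0.01) # True True
```

Large positive inputs saturate toward 1 and large negative inputs toward 0, which is exactly what lets us read the output as a probability.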

2. Step‑by‑Step Walk‑Through

2.1 Imports & Data Setup

import numpy as np
from tqdm import tqdm

# 1‑D toy data
X = np.array([1, 2, 3, 4, 5, 6])          # feature values
y = np.array([0, 0, 0, 1, 1, 1])          # binary labels
  • numpy handles vectorised math.
  • tqdm shows a progress bar during the training loop.

2.2 Hyperparameters & Initial Parameters

m = 0          # weight (slope)
b = 0          # bias (intercept)
lr = 0.01      # learning rate
epochs = 1000  # number of gradient steps
  • Parameters start at zero.
  • The learning rate determines the step size.
  • More epochs mean more passes over the data.

2.3 One‑Feature Logistic Regression – Core Function

def logisticRegression(X, y, m, b, lr, epochs):
    n = len(X)

    for _ in tqdm(range(epochs)):
        # Linear part
        z = m * X + b

        # Sigmoid activation
        y_hat = 1 / (1 + np.exp(-z))

        # Gradients
        dm = (1 / n) * np.sum((y_hat - y) * X)
        db = (1 / n) * np.sum(y_hat - y)

        # Gradient descent update
        m -= lr * dm
        b -= lr * db

    return m, b
Step by step:

  • Step 1, z = m * X + b: linear combination of the feature and the bias.
  • Step 2, σ(z) = 1/(1 + e^(-z)): squashes any real number into the interval (0, 1).
  • Step 3, dm and db: partial derivatives of the cross-entropy loss with respect to m and b.
  • Step 4, update rules: move the parameters toward the minimum of the loss.
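The gradient formulas in step 3 come from differentiating the binary cross-entropy loss. If you want to convince yourself they are correct, a quick finite-difference check (using arbitrary example values for m and b, chosen here just for illustration) shows the analytic gradients match a numerical approximation:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6])
y = np.array([0, 0, 0, 1, 1, 1])

def loss(m, b):
    # binary cross-entropy over the toy dataset
    y_hat = 1 / (1 + np.exp(-(m * X + b)))
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

m, b, eps = 0.5, -1.0, 1e-6

# analytic gradients (the same formulas used in the training loop)
y_hat = 1 / (1 + np.exp(-(m * X + b)))
dm = np.mean((y_hat - y) * X)
db = np.mean(y_hat - y)

# central finite-difference approximations
dm_num = (loss(m + eps, b) - loss(m - eps, b)) / (2 * eps)
db_num = (loss(m, b + eps) - loss(m, b - eps)) / (2 * eps)

print(abs(dm - dm_num) < 1e-5, abs(db - db_num) < 1e-5)  # True True
```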

2.4 Training & Prediction for 1‑D Data

m, b = logisticRegression(X, y, m, b, lr, epochs)

# Predict probability for a new input
inp = 9
z = m * inp + b
prob = 1 / (1 + np.exp(-z))
print("Probability:", prob)

After training on the six points, the model estimates how likely it is that x = 9 belongs to the positive class.
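To turn that probability into a hard class label, you typically threshold at 0.5 (the threshold is a common convention, not something the model fixes). A self-contained sketch that repeats the training loop inline:

```python
import numpy as np

# toy data from the post
X = np.array([1, 2, 3, 4, 5, 6])
y = np.array([0, 0, 0, 1, 1, 1])

m, b, lr, epochs = 0.0, 0.0, 0.01, 1000
for _ in range(epochs):
    y_hat = 1 / (1 + np.exp(-(m * X + b)))
    m -= lr * np.mean((y_hat - y) * X)
    b -= lr * np.mean(y_hat - y)

def predict(x, threshold=0.5):
    # probability plus a hard 0/1 label
    prob = 1 / (1 + np.exp(-(m * x + b)))
    return prob, int(prob >= threshold)

prob, label = predict(9)
print(prob, label)  # probability above 0.5, so label is 1
```

Since x = 9 lies well past the positive training examples, the probability comes out above 0.5 and the hard label is 1.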

2.5 Multi‑Feature Logistic Regression – Scaling Up

# 2‑D toy data
X2 = np.array([
    [25, 30],
    [35, 60],
    [45, 80]
])
y2 = np.array([0, 1, 1])

weights = np.zeros(X2.shape[1])  # one weight per feature
bias = 0

2.6 Core Function for Multiple Features

def logisticRegressionMultipleFeatures(X, y, W, b, lr, epochs):
    n = len(X)

    for _ in tqdm(range(epochs)):
        # Linear part
        z = np.dot(X, W) + b

        # Sigmoid
        y_hat = 1 / (1 + np.exp(-z))

        # Gradients
        dw = (1 / n) * np.dot(X.T, (y_hat - y))
        db = (1 / n) * np.sum(y_hat - y)

        # Gradient descent update
        W -= lr * dw
        b -= lr * db

    return W, b
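With the multi-feature function in place, training and prediction for a new two-feature sample mirror the 1-D case. The sketch below normalizes the features first, an extra step not in the original snippets, so that gradient descent behaves well at this learning rate given the raw feature scales; the new sample [40, 70] is illustrative:

```python
import numpy as np

# 2-D toy data from the post
X2 = np.array([[25.0, 30.0],
               [35.0, 60.0],
               [45.0, 80.0]])
y2 = np.array([0, 1, 1])

# normalize each feature to zero mean, unit variance
mu, sigma = X2.mean(axis=0), X2.std(axis=0)
Xn = (X2 - mu) / sigma

W = np.zeros(Xn.shape[1])
b = 0.0
lr, epochs = 0.01, 1000

for _ in range(epochs):
    y_hat = 1 / (1 + np.exp(-(Xn @ W + b)))
    W -= lr * (Xn.T @ (y_hat - y2)) / len(Xn)
    b -= lr * np.mean(y_hat - y2)

# predict for a new two-feature sample (normalized the same way)
new = (np.array([40.0, 70.0]) - mu) / sigma
prob = 1 / (1 + np.exp(-(new @ W + b)))
print("Probability:", prob)
```

Because [40, 70] sits between the two positive training points, the model assigns it a probability above 0.5.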
