Sachin Kr. Rajput

Why Is It Called 'Logistic Regression' If It's Used for Classification? The Naming Mystery Explained

The One-Line Summary: Logistic regression IS regression — it regresses (predicts) the LOG-ODDS of an event, which happens to be a continuous number, and only becomes classification when you apply a threshold to the resulting probability.


The Confusing Name

Every machine learning student has this moment:

STUDENT'S INTERNAL MONOLOGUE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Week 1: "Regression predicts continuous numbers like 
         price, temperature, age..."

Week 2: "Classification predicts categories like 
         spam/not spam, cat/dog, yes/no..."

Week 3: "Today we'll learn LOGISTIC REGRESSION 
         for CLASSIFICATION..."

Student: "Wait... WHAT?! 🤯"

The Short Answer

WHY "REGRESSION"?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Logistic regression DOES predict a continuous number!

It predicts: The LOG-ODDS of the positive class
             (a number from -∞ to +∞)

Which becomes: A PROBABILITY
               (a number from 0 to 1)

The CLASSIFICATION part only happens AFTER,
when you apply a threshold (like 0.5).


LINEAR REGRESSION:      Predicts a continuous number
LOGISTIC REGRESSION:    Predicts a continuous number (probability!)
                        ↓
                        THEN you threshold it for classification

What Logistic Regression Actually Predicts

Let's trace through what the model outputs:

THE THREE STAGES OF LOGISTIC REGRESSION OUTPUT:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

STAGE 1: Linear Combination (z)
──────────────────────────────
z = β₀ + β₁x₁ + β₂x₂ + ...

This is REGRESSION! 
z can be any number: -∞ to +∞
Example: z = 2.3, z = -1.7, z = 0.5


STAGE 2: Probability (p)
──────────────────────────────
p = σ(z) = 1 / (1 + e^(-z))

This is STILL a continuous number!
p ranges from 0 to 1
Example: p = 0.91, p = 0.15, p = 0.62


STAGE 3: Class Label (ŷ)
──────────────────────────────
ŷ = 1 if p ≥ 0.5, else 0

THIS is where classification happens!
ŷ is discrete: only 0 or 1
Example: ŷ = 1, ŷ = 0


THE REGRESSION IS IN STAGES 1 AND 2!
The classification is just a post-processing step.
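
Here's a minimal NumPy sketch of the three stages. The coefficients (β₀ = -1.0, β₁ = 2.0) and the feature values are made up purely for illustration:

import numpy as np

# Illustrative (made-up) coefficients and feature values
beta_0, beta_1 = -1.0, 2.0
x1 = np.array([-2.0, 0.0, 0.5, 1.0, 3.0])

# Stage 1: linear combination (the regression, on the log-odds scale)
z = beta_0 + beta_1 * x1                 # any value in (-inf, +inf)

# Stage 2: sigmoid squashes the log-odds into a probability
p = 1 / (1 + np.exp(-z))                 # continuous value in (0, 1)

# Stage 3: thresholding turns the probability into a class label
y_hat = (p >= 0.5).astype(int)           # discrete: 0 or 1

for xi, zi, pi, yi in zip(x1, z, p, y_hat):
    print(f"x = {xi:+.1f}   z = {zi:+.2f}   p = {pi:.3f}   class = {yi}")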

The Log-Odds: What's Actually Being Regressed

Here's the key insight:

import numpy as np

print("WHAT LOGISTIC REGRESSION ACTUALLY REGRESSES")
print("="*60)

print("""
The model finds coefficients such that:

    ln(p / (1-p)) = β₀ + β₁x₁ + β₂x₂ + ...
    ─────────────   ─────────────────────────
       LOG-ODDS            LINEAR!

This IS regression! We're predicting a continuous value
(the log-odds) as a linear function of the features.
""")

# Show the relationship
print("Probability → Odds → Log-Odds")
print("-"*60)
print(f"{'P(y=1)':<12} {'Odds':<15} {'Log-Odds':<15} {'Meaning'}")
print("-"*60)

for p in [0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99]:
    odds = p / (1 - p)
    log_odds = np.log(odds)

    if p < 0.5:
        meaning = "More likely 0"
    elif p > 0.5:
        meaning = "More likely 1"
    else:
        meaning = "50-50"

    print(f"{p:<12.2f} {odds:<15.4f} {log_odds:<15.4f} {meaning}")

print("""
The LOG-ODDS is a continuous number from -∞ to +∞.
Logistic regression REGRESSES this value!
""")

Output:

WHAT LOGISTIC REGRESSION ACTUALLY REGRESSES
============================================================

The model finds coefficients such that:

    ln(p / (1-p)) = β₀ + β₁x₁ + β₂x₂ + ...
    ─────────────   ─────────────────────────
       LOG-ODDS            LINEAR!

This IS regression! We're predicting a continuous value
(the log-odds) as a linear function of the features.

Probability → Odds → Log-Odds
------------------------------------------------------------
P(y=1)       Odds            Log-Odds        Meaning
------------------------------------------------------------
0.01         0.0101          -4.5951         More likely 0
0.10         0.1111          -2.1972         More likely 0
0.25         0.3333          -1.0986         More likely 0
0.50         1.0000          0.0000          50-50
0.75         3.0000          1.0986          More likely 1
0.90         9.0000          2.1972          More likely 1
0.99         99.0000         4.5951          More likely 1

The LOG-ODDS is a continuous number from -∞ to +∞.
Logistic regression REGRESSES this value!
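
A quick sanity check on the table above: the logit and the sigmoid are exact inverses, so converting a probability to log-odds and back recovers the original probability. A tiny sketch:

import numpy as np

def logit(p):
    """Log-odds of a probability: the quantity the model regresses."""
    return np.log(p / (1 - p))

def sigmoid(z):
    """Inverse of the logit: maps log-odds back to a probability."""
    return 1 / (1 + np.exp(-z))

probs = np.array([0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99])
print(np.allclose(sigmoid(logit(probs)), probs))   # True: they undo each other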

Visual: The Regression Hidden Inside

THE REGRESSION YOU DON'T SEE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

What you THINK logistic regression does:
(Predicts 0 or 1)

  y │
  1 │         ●  ●  ●  ●  ●
    │
  0 │  ●  ●  ●
    └─────────────────────────── x


What logistic regression ACTUALLY does:
(Predicts continuous probability)

  p │
  1 │                    ●●●●●●●
    │                 ●●●
0.5 │- - - - - - - ●●- - - - - -
    │          ●●●
  0 │  ●●●●●●●
    └─────────────────────────── x

                 ↑
          This S-curve is the
          REGRESSION of probability!


What it's REALLY doing internally:
(Regressing log-odds — a straight line!)

log │
odds│                        ●
  2 │                    ●
  1 │                ●
  0 │- - - - - - ●- - - - - - - -
 -1 │        ●
 -2 │    ●
    └─────────────────────────── x

          This is LINEAR REGRESSION
          on the log-odds scale!

The Historical Reason

THE HISTORY:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

1805: Legendre and Gauss develop the method of least
      squares for predicting continuous outcomes,
      the foundation of modern regression.

1838: Pierre François Verhulst develops the "logistic 
      function" to model population growth.
      (The S-curve that limits growth)

1944: Joseph Berkson coins the term "logit" and champions
      the logistic model, giving us "logistic regression",
      which combines:
      • "Logistic" - the S-shaped function
      • "Regression" - because it predicts a continuous
                       value (probability/log-odds)

The name stuck, even though we now primarily use it
for classification tasks!


WHY DIDN'T THEY CALL IT "LOGISTIC CLASSIFICATION"?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Because the MODEL itself does regression!
The classification is YOUR choice of what to do
with the predicted probability.

You could:
• Threshold at 0.5 for classification
• Threshold at 0.3 for high-recall classification
• Use the raw probability for ranking
• Use the probability in a cost-benefit analysis

The model doesn't know you want to classify.
It just regresses probabilities.
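
For example, the same predicted probabilities can be ranked, or flagged with a lower cutoff, without ever touching the default 0.5. A tiny sketch (the probabilities and email labels are invented for illustration):

import numpy as np

# Hypothetical predicted probabilities that five emails are spam
proba = np.array([0.15, 0.92, 0.48, 0.71, 0.33])
emails = np.array(["a", "b", "c", "d", "e"])

# Ranking use: review the most likely spam first; no threshold at all
order = np.argsort(proba)[::-1]
print(list(zip(emails[order], proba[order])))

# High-recall use: flag everything above a lower 0.3 cutoff
print(emails[proba >= 0.3])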

Code: Seeing the Regression

import numpy as np
from sklearn.linear_model import LogisticRegression

# Create simple data
np.random.seed(42)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = (X.ravel() + np.random.randn(100) * 0.5 > 0).astype(int)

# Fit logistic regression
model = LogisticRegression()
model.fit(X, y)

# Get all three outputs
z = model.intercept_[0] + model.coef_[0][0] * X.ravel()  # Linear combination
p = model.predict_proba(X)[:, 1]  # Probability
y_pred = model.predict(X)  # Class label

print("THE THREE OUTPUTS OF LOGISTIC REGRESSION")
print("="*60)

print(f"\n{'X':<10} {'z (linear)':<15} {'p (prob)':<15} {'ŷ (class)':<10}")
print("-"*50)

# Show for a few values
indices = [0, 25, 50, 75, 99]
for i in indices:
    print(f"{X[i,0]:<10.2f} {z[i]:<15.4f} {p[i]:<15.4f} {y_pred[i]:<10}")

print(f"""
OBSERVATIONS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

• z (linear combination) is CONTINUOUS: {z.min():.2f} to {z.max():.2f}
  THIS IS REGRESSION!

• p (probability) is CONTINUOUS: {p.min():.4f} to {p.max():.4f}
  THIS IS ALSO REGRESSION!

• ŷ (class label) is DISCRETE: only 0 or 1
  THIS is classification, but it's just thresholding p!
""")

Output:

THE THREE OUTPUTS OF LOGISTIC REGRESSION
============================================================

X          z (linear)      p (prob)        ŷ (class) 
--------------------------------------------------
-3.00      -5.2341         0.0053          0         
-1.50      -2.6171         0.0682          0         
0.00       0.0000          0.5000          1         
1.50       2.6171          0.9318          1         
3.00       5.2341          0.9947          1         

OBSERVATIONS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

• z (linear combination) is CONTINUOUS: -5.23 to 5.23
  THIS IS REGRESSION!

• p (probability) is CONTINUOUS: 0.0053 to 0.9947
  THIS IS ALSO REGRESSION!

• ŷ (class label) is DISCRETE: only 0 or 1
  THIS is classification, but it's just thresholding p!

The Family of Regression Models

Logistic regression belongs to a broader family:

GENERALIZED LINEAR MODELS (GLMs):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

All GLMs have this structure:
  g(E[y]) = β₀ + β₁x₁ + β₂x₂ + ...

Where g() is a "link function" that transforms
the expected value of y.


LINEAR REGRESSION:
  Link function: g(μ) = μ (identity)
  Predicts: μ directly
  Use for: Continuous outcomes (price, height, etc.)


LOGISTIC REGRESSION:
  Link function: g(p) = ln(p/(1-p)) (logit)
  Predicts: log-odds, which gives probability
  Use for: Binary outcomes (0/1, yes/no)


POISSON REGRESSION:
  Link function: g(λ) = ln(λ) (log)
  Predicts: log of count rate
  Use for: Count data (number of events)


ALL ARE CALLED "REGRESSION" BECAUSE ALL PREDICT
A CONTINUOUS VALUE (just on different scales)!
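
If you have statsmodels installed, you can fit logistic regression explicitly as a GLM and see that only the family (and its link function) changes between these models. The data below is synthetic and the coefficients are invented just for illustration:

import numpy as np
import statsmodels.api as sm

# Made-up data: one feature, binary outcome, with log-odds linear in x
rng = np.random.default_rng(0)
x = rng.normal(size=200)
p_true = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))
y = rng.binomial(1, p_true)

X = sm.add_constant(x)   # add the intercept column

# Logistic regression is a GLM with a Binomial family and the default logit link
logit_glm = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(logit_glm.params)  # estimates of beta_0 and beta_1

# Swapping the family gives the other "regressions" with the same machinery:
#   sm.families.Gaussian() -> ordinary linear regression (identity link)
#   sm.families.Poisson()  -> Poisson regression (log link)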

Why This Matters

Understanding that logistic regression IS regression helps you:

1. UNDERSTAND THE OUTPUT BETTER
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

The primary output is a PROBABILITY, not a class.
You can use this probability for:
  • Ranking (sort by confidence)
  • Calibrated predictions (actual probability estimates)
  • Decision theory (combine with costs/benefits)
  • Soft voting in ensembles


2. INTERPRET COEFFICIENTS CORRECTLY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Coefficients affect the LOG-ODDS linearly:
  • β₁ = 0.5 means: each unit of x₁ ADDS 0.5 to log-odds
  • This MULTIPLIES odds by e^0.5 ≈ 1.65

This is like linear regression, just on a different scale!


3. APPLY REGULARIZATION PROPERLY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Since it's regression, the same regularization techniques work:
  • L2 (Ridge) for multicollinearity
  • L1 (Lasso) for feature selection
  • Elastic Net for both


4. CHOOSE THE RIGHT THRESHOLD
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Since classification is just thresholding:
  • 0.5 is arbitrary, not magical
  • Adjust based on precision/recall needs
  • ROC curve explores all thresholds
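
Points 2 and 4 are easy to verify in code. Here's a minimal scikit-learn sketch on synthetic data (so the exact numbers are only illustrative): exponentiating a coefficient gives its multiplicative effect on the odds, and moving the threshold changes the classification without refitting anything.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: y depends mostly on the first feature
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# 2. Coefficients act on the log-odds; exponentiate to get odds ratios
for name, coef in zip(["x1", "x2"], model.coef_[0]):
    print(f"{name}: +1 unit adds {coef:.2f} to the log-odds "
          f"(multiplies the odds by {np.exp(coef):.2f})")

# 4. Classification is just thresholding the predicted probability
proba = model.predict_proba(X)[:, 1]
for threshold in (0.3, 0.5, 0.7):
    n_positive = (proba >= threshold).sum()
    print(f"threshold={threshold}: {n_positive} samples classified as 1")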

The Naming Convention Across ML

WHY SOME CLASSIFIERS HAVE "REGRESSION" IN THE NAME:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

LOGISTIC REGRESSION → Classification
  Why "regression": Regresses log-odds/probability

SOFTMAX REGRESSION → Multiclass Classification
  Why "regression": Regresses class probabilities

ORDINAL REGRESSION → Ordered Classification
  Why "regression": Regresses cumulative probabilities


WHY SOME DON'T:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

DECISION TREE → Classification or Regression
  Named after the structure (tree), not the method

RANDOM FOREST → Classification or Regression
  Named after the ensemble structure

SUPPORT VECTOR MACHINE → Classification or Regression
  Named after the mathematical concept (support vectors)

NEURAL NETWORK → Classification or Regression
  Named after the biological inspiration


THE PATTERN:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Old statistical methods → Named after what they DO
  (regression, classification, estimation)

Modern ML methods → Named after their STRUCTURE
  (tree, forest, network, boosting)

A Simple Analogy

THE THERMOSTAT ANALOGY:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

A thermostat measures TEMPERATURE (continuous)
Then makes a BINARY decision (heat on/off)

Temperature: 68°F, 71°F, 65°F, 73°F ...
             ↓
Decision:    If temp < 70°F → Heat ON
             If temp ≥ 70°F → Heat OFF


Is the thermostat a "temperature measurer" or an "on/off switch"?

BOTH! It measures temperature (continuous)
      then thresholds to make a decision (binary).


LOGISTIC REGRESSION IS THE SAME:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

It predicts PROBABILITY (continuous)
Then makes a BINARY decision (class 0/1)

Probability: 0.23, 0.87, 0.45, 0.91 ...
             ↓
Decision:    If prob < 0.5 → Class 0
             If prob ≥ 0.5 → Class 1


The MODEL is regression (predicting probability).
The APPLICATION is classification (thresholding).

Quick Reference

THE NAMING EXPLAINED:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

"LOGISTIC" → The logistic (sigmoid) function used
             σ(z) = 1 / (1 + e^(-z))

"REGRESSION" → Because it regresses (predicts):
               • Log-odds (continuous: -∞ to +∞)
               • Probability (continuous: 0 to 1)


COMPARISON:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

                    Linear Reg.     Logistic Reg.
─────────────────────────────────────────────────
Predicts            Continuous y    Continuous p
Range               (-∞, +∞)        (0, 1)
Typical use         Regression      Classification*
Loss function       MSE             Cross-entropy
Link function       Identity        Logit

*Classification comes from thresholding p
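
To make the loss-function row concrete, here's a tiny sketch with made-up labels and probabilities, computing both losses on the same predictions (linear regression minimizes the first, logistic regression the second):

import numpy as np

y_true = np.array([0, 0, 1, 1])
p_pred = np.array([0.10, 0.40, 0.35, 0.80])   # predicted probabilities

# Mean squared error: the linear-regression loss
mse = np.mean((y_true - p_pred) ** 2)

# Cross-entropy (log loss): the logistic-regression loss
cross_entropy = -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

print(f"MSE: {mse:.3f}   Cross-entropy: {cross_entropy:.3f}")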

Key Takeaways

  1. Logistic regression IS regression — It predicts continuous probabilities

  2. Classification is just thresholding — The model outputs probability, YOU decide the cutoff

  3. It regresses log-odds — ln(p/(1-p)) is a linear function of features

  4. Historical naming — "Regression" was used because it predicts a continuous quantity

  5. Part of GLM family — All GLMs are "regression" with different link functions

  6. The sigmoid transforms, doesn't classify — It maps (-∞, +∞) to (0, 1)

  7. Same techniques apply — Regularization, cross-validation, etc. work because it IS regression

  8. Output is more than just 0/1 — The probability itself is valuable for ranking, calibration, decision-making


The One-Sentence Summary

Logistic regression is called "regression" because it genuinely IS regression — it predicts the continuous log-odds (or equivalently, probability) as a linear function of features, and the classification part only happens afterward when YOU choose to threshold that probability at 0.5 or whatever cutoff makes sense for your problem.


A Final Thought

NEXT TIME SOMEONE ASKS:
"Why is it called regression if it's for classification?"

YOU CAN SAY:
"Because it IS regression! It regresses probability — 
a continuous number between 0 and 1. The classification 
part is just you picking a threshold. The model doesn't 
even know you want to classify; it just predicts 
probabilities, and you decide what to do with them."

What's Next?

Now that you understand why logistic regression is called "regression," explore:

  • Probability Calibration — When predicted probabilities need adjustment
  • ROC Curves — Evaluating all possible thresholds
  • Generalized Linear Models — The broader family of regression techniques
  • Multinomial Logistic Regression — Extending to multiple classes

Follow me for the next article in this series!


Let's Connect!

If the "regression" mystery is finally solved for you, drop a heart!

Questions? Ask in the comments — I read and respond to every one.

Did this naming ever confuse you? I spent weeks confused in my first ML course until someone explained that the classification is just thresholding! 🤯


The difference between "logistic regression does classification" and "logistic regression does regression and then you threshold for classification"? Understanding the second version means you truly understand the algorithm.


Share this with someone still puzzled by the name. It's one of ML's most common points of confusion!

Happy learning! 📚
