The One-Line Summary: Logistic regression takes a linear combination of features and passes it through the sigmoid function to produce a probability between 0 and 1, making it perfect for binary classification problems.
The Bouncer Problem
Club Velvet had a problem: the bouncers needed a consistent way to decide who gets in, ideally with a probability for each guest at the door.
Bouncer #1: The Linear Thinker
The first bouncer tried to use a simple formula:
BOUNCER #1'S LINEAR MODEL:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Entry Score = 0.1 × (Age) + 0.05 × (Style Points) - 2.5
If Entry Score > 0.5: "You're in!"
If Entry Score ≤ 0.5: "Sorry, not tonight."
PROBLEM 1 - Impossible Predictions:
Guest A: Age 50, Style 10
Score = 0.1(50) + 0.05(10) - 2.5 = 3.0
"Your probability of entry is... 300%?"
Guest B: Age 18, Style 2
Score = 0.1(18) + 0.05(2) - 2.5 = -0.6
"Your probability of entry is... negative 60%?"
NEITHER OF THESE MAKES SENSE AS A PROBABILITY!
Bouncer #2: The Probability Thinker
The second bouncer had a better idea:
BOUNCER #2'S LOGISTIC MODEL:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: Calculate a score (same as before)
z = 0.1 × Age + 0.05 × Style - 2.5
Step 2: SQUISH it through a magic function
P(Entry) = 1 / (1 + e^(-z))
This ALWAYS gives a number between 0 and 1!
Guest A: Age 50, Style 10
z = 3.0
P(Entry) = 1 / (1 + e^(-3)) = 0.953 = 95.3%
"You have a 95% chance of getting in. Welcome!"
Guest B: Age 18, Style 2
z = -0.6
P(Entry) = 1 / (1 + e^(0.6)) = 0.354 = 35.4%
"You have a 35% chance. Maybe work on your outfit?"
THESE ARE PROPER PROBABILITIES!
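Here is Bouncer #2's rule as a quick Python sketch, using the same made-up weights (0.1, 0.05, -2.5) from above; the function name is just for illustration:

import numpy as np

def entry_probability(age, style):
    # Bouncer #2's rule: linear score squished through the sigmoid
    z = 0.1 * age + 0.05 * style - 2.5
    return 1 / (1 + np.exp(-z))

print(entry_probability(50, 10))   # Guest A: ~0.953 (95.3%)
print(entry_probability(18, 2))    # Guest B: ~0.354 (35.4%)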
The Sigmoid Function: The "Squisher"
The magic function that turns any number into a probability:
THE SIGMOID FUNCTION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
σ(z) = 1 / (1 + e^(-z))
Input (z) Output σ(z)
───────── ────────────
-∞ → 0.00 (0%)
-4 → 0.02 (2%)
-2 → 0.12 (12%)
0 → 0.50 (50%)
+2 → 0.88 (88%)
+4 → 0.98 (98%)
+∞ → 1.00 (100%)
THE SHAPE:
1.0 │                          ●●●●●●
    │                     ●●●●●
0.8 │                   ●●
    │                  ●
0.6 │                 ●
    │                ●
0.5 │- - - - - - - - ●- - - - - - - -
    │               ●
0.4 │              ●
    │             ●
0.2 │           ●●
    │        ●●●
0.0 │●●●●●●●●
    └────────────────────────────────
     -6   -4   -2    0    2    4    6
                     z
• Always between 0 and 1 ✓
• Smooth S-curve ✓
• 50% when z = 0 ✓
• Approaches but never reaches 0 or 1 ✓
Why Not Just Use Linear Regression?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, LogisticRegression
# Study hours vs Pass/Fail (1 = Pass, 0 = Fail)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
# Fit both models
lin_reg = LinearRegression().fit(hours, passed)
log_reg = LogisticRegression().fit(hours, passed)
# Predictions on a smooth grid of hours (used to plot both curves; plot omitted here)
hours_test = np.linspace(0, 12, 100).reshape(-1, 1)
lin_pred = lin_reg.predict(hours_test)
log_pred = log_reg.predict_proba(hours_test)[:, 1]
print("WHY LINEAR REGRESSION FAILS FOR CLASSIFICATION")
print("="*60)
print("\nPredicting Pass/Fail based on study hours:")
print(f"\n{'Hours':<10} {'Linear Pred':<15} {'Logistic Pred':<15} {'Problem?'}")
print("-"*55)
test_hours = [0, 2, 5, 8, 12]
for h in test_hours:
    lin_p = lin_reg.predict([[h]])[0]
    log_p = log_reg.predict_proba([[h]])[0, 1]
    problem = ""
    if lin_p < 0:
        problem = "NEGATIVE probability!"
    elif lin_p > 1:
        problem = "OVER 100%!"
    print(f"{h:<10} {lin_p:<15.2f} {log_p:<15.2f} {problem}")
print("\n💡 Linear regression gives IMPOSSIBLE probabilities!")
print(" Logistic regression ALWAYS gives valid 0-1 probabilities.")
Output:
WHY LINEAR REGRESSION FAILS FOR CLASSIFICATION
============================================================
Predicting Pass/Fail based on study hours:
Hours      Linear Pred     Logistic Pred   Problem?
-------------------------------------------------------
0          -0.20           0.02            NEGATIVE probability!
2          0.09            0.08
5          0.53            0.50
8          0.96            0.92
12         1.55            0.99            OVER 100%!
💡 Linear regression gives IMPOSSIBLE probabilities!
Logistic regression ALWAYS gives valid 0-1 probabilities.
The Math: From Linear to Logistic
Step 1: Start with a Linear Combination
z = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ
This can be ANY number from -∞ to +∞
Step 2: Apply the Sigmoid
P(y=1) = σ(z) = 1 / (1 + e^(-z))
Now it's ALWAYS between 0 and 1!
Step 3: Make a Decision
If P(y=1) ≥ 0.5: Predict class 1
If P(y=1) < 0.5: Predict class 0
(You can adjust the 0.5 threshold if needed)
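Those three steps fit in a few lines of NumPy. A minimal sketch with made-up coefficients (β₀ = -1, β₁ = 2, β₂ = 0.5 are purely illustrative, not fitted to anything):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Step 1: linear combination (illustrative coefficients)
beta = np.array([-1.0, 2.0, 0.5])    # [β₀, β₁, β₂]
x = np.array([1.0, 0.8, -0.4])       # [1 for the intercept, x₁, x₂]
z = beta @ x                         # can be any real number

# Step 2: squish it through the sigmoid
p = sigmoid(z)                       # always between 0 and 1

# Step 3: apply the decision threshold
prediction = int(p >= 0.5)
print(f"z = {z:.2f}, P(y=1) = {p:.3f}, prediction = {prediction}")   # z = 0.40, P ≈ 0.599, predict 1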
Understanding Log-Odds
The sigmoid has a beautiful interpretation:
THE LOG-ODDS (LOGIT) TRANSFORMATION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
If P is the probability of success:
Odds = P / (1 - P)
P = 0.50 → Odds = 1.0 (even odds)
P = 0.75 → Odds = 3.0 (3:1 in favor)
P = 0.90 → Odds = 9.0 (9:1 in favor)
P = 0.99 → Odds = 99.0 (99:1 in favor)
Log-Odds = ln(Odds) = ln(P / (1-P))
P = 0.50 → Log-Odds = 0
P = 0.75 → Log-Odds = 1.1
P = 0.90 → Log-Odds = 2.2
P = 0.99 → Log-Odds = 4.6
THE KEY INSIGHT:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Logistic regression models the LOG-ODDS as a linear function:
ln(P / (1-P)) = β₀ + β₁x₁ + β₂x₂ + ...
This means:
• Increasing x₁ by 1 unit ADDS β₁ to the log-odds
• Which MULTIPLIES the odds by e^β₁
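That multiplication rule is easy to verify numerically. A small sketch (the coefficient 0.7 and the starting log-odds -0.3 are made-up numbers):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

beta1 = 0.7      # illustrative coefficient for x₁
z = -0.3         # log-odds before increasing x₁

p_before = sigmoid(z)
p_after = sigmoid(z + beta1)     # increasing x₁ by 1 unit adds β₁ to the log-odds

odds_before = p_before / (1 - p_before)
odds_after = p_after / (1 - p_after)

print(odds_after / odds_before)  # ≈ 2.014
print(np.exp(beta1))             # e^0.7 ≈ 2.014: the odds were multiplied by e^β₁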
Interpreting Coefficients
import numpy as np
from sklearn.linear_model import LogisticRegression
# Example: Predicting loan default
np.random.seed(42)
n = 1000
income = np.random.normal(50000, 15000, n) # Annual income
debt_ratio = np.random.uniform(0.1, 0.8, n) # Debt-to-income ratio
credit_score = np.random.normal(680, 50, n) # Credit score
# Higher income and credit score reduce default
# Higher debt ratio increases default
z = 6.5 + (-0.00005 * income) + (4 * debt_ratio) + (-0.01 * credit_score)  # intercept chosen so roughly a quarter of borrowers default
prob_default = 1 / (1 + np.exp(-z))
default = (np.random.random(n) < prob_default).astype(int)
# Fit model
X = np.column_stack([income, debt_ratio, credit_score])
model = LogisticRegression(max_iter=1000)
model.fit(X, default)
print("INTERPRETING LOGISTIC REGRESSION COEFFICIENTS")
print("="*60)
print(f"\nPredicting loan default (1 = default, 0 = paid)")
print(f"\n{'Feature':<20} {'Coefficient':>12} {'Odds Ratio':>12}")
print("-"*50)
features = ['Income ($)', 'Debt Ratio', 'Credit Score']
for name, coef in zip(features, model.coef_[0]):
    odds_ratio = np.exp(coef)
    print(f"{name:<20} {coef:>12.6f} {odds_ratio:>12.4f}")
print(f"\nInterpretation:")
print(f"• Income: Each $1 increase multiplies default odds by {np.exp(model.coef_[0][0]):.6f}")
print(f" ($10K increase → odds multiplied by {np.exp(model.coef_[0][0] * 10000):.3f})")
print(f"• Debt Ratio: Each 0.1 increase multiplies odds by {np.exp(model.coef_[0][1] * 0.1):.2f}")
print(f"• Credit Score: Each 10 point increase multiplies odds by {np.exp(model.coef_[0][2] * 10):.3f}")
Output:
INTERPRETING LOGISTIC REGRESSION COEFFICIENTS
============================================================
Predicting loan default (1 = default, 0 = paid)
Feature Coefficient Odds Ratio
--------------------------------------------------
Income ($) -0.000048 0.9999
Debt Ratio 3.876543 48.2631
Credit Score -0.009823 0.9902
Interpretation:
• Income: Each $1 increase multiplies default odds by 0.999952
($10K increase → odds multiplied by 0.618)
• Debt Ratio: Each 0.1 increase multiplies odds by 1.47
• Credit Score: Each 10 point increase multiplies odds by 0.907
The Decision Boundary
Logistic regression creates a LINEAR decision boundary:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
# Create two classes
np.random.seed(42)
n = 100
# Class 0: Lower left
X0 = np.random.randn(n, 2) + np.array([-1, -1])
# Class 1: Upper right
X1 = np.random.randn(n, 2) + np.array([1, 1])
X = np.vstack([X0, X1])
y = np.array([0]*n + [1]*n)
# Fit logistic regression
model = LogisticRegression()
model.fit(X, y)
# The decision boundary is where P = 0.5
# Which means: β₀ + β₁x₁ + β₂x₂ = 0
# Solving for x₂: x₂ = -(β₀ + β₁x₁) / β₂
b0, b1, b2 = model.intercept_[0], model.coef_[0][0], model.coef_[0][1]
print("THE LINEAR DECISION BOUNDARY")
print("="*60)
print(f"\nModel: P(y=1) = σ({b0:.2f} + {b1:.2f}×x₁ + {b2:.2f}×x₂)")
print(f"\nDecision boundary (where P = 0.5):")
print(f" {b0:.2f} + {b1:.2f}×x₁ + {b2:.2f}×x₂ = 0")
print(f" x₂ = {-b0/b2:.2f} + {-b1/b2:.2f}×x₁")
print(f"\nThis is a STRAIGHT LINE separating the classes!")
THE LINEAR DECISION BOUNDARY:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 x₂ │
    │                       Class 1 (●)
  3 │                ●   ●   ●
    │              ●   ●  ● ●
  2 │               ● ●●  ●
    │              ●  ●● ●
  1 │             ●  ●●
    │             ╲
  0 │──────────────╲────────────────
    │               ╲   Decision Boundary
 -1 │     ○  ○○      ╲
    │    ○  ○○        ╲
 -2 │      ○   ○       ╲
    │   ○    ○  ○
 -3 │      Class 0 (○)
    └───────────────────────────── x₁
       -3   -2   -1    0    1    2    3
Everything above/right of line → Predict Class 1
Everything below/left of line → Predict Class 0
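Continuing from the snippet above, you can check which side of the boundary any new point lands on from the sign of the linear score alone (a small sketch reusing the fitted b0, b1, b2):

# Reuses b0, b1, b2 from the decision-boundary code above
new_points = np.array([[2.0, 2.0],      # deep in the Class 1 region
                       [-2.0, -2.0]])   # deep in the Class 0 region

scores = b0 + new_points @ np.array([b1, b2])   # sign of the score = side of the line
probs = 1 / (1 + np.exp(-scores))

for (x1, x2), s, p in zip(new_points, scores, probs):
    side = "Class 1" if s > 0 else "Class 0"
    print(f"({x1:+.1f}, {x2:+.1f}): score {s:+.2f}, P(y=1) = {p:.3f} → {side}")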
How Logistic Regression Learns: Maximum Likelihood
Unlike linear regression (which minimizes squared error), logistic regression maximizes the LIKELIHOOD of the data:
MAXIMUM LIKELIHOOD ESTIMATION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Given data points and their labels, find coefficients
that make the observed data MOST PROBABLE.
For each point:
- If y=1: We want P(y=1) to be HIGH
- If y=0: We want P(y=0) = 1-P(y=1) to be HIGH
Likelihood = Π P(yᵢ|xᵢ) (product over all points)
We maximize Log-Likelihood (easier math):
Log-Likelihood = Σ [yᵢ log(pᵢ) + (1-yᵢ) log(1-pᵢ)]
This is also called CROSS-ENTROPY LOSS (when negated).
WHY NOT SQUARED ERROR?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
With sigmoid + squared error, the loss surface is non-convex,
with flat regions where gradient descent can stall.
With sigmoid + cross-entropy, the loss surface is
CONVEX → gradient descent reliably finds the global optimum.
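Here is the log-likelihood / cross-entropy calculation on a handful of made-up labels and predicted probabilities (the numbers are illustrative, not from any fitted model):

import numpy as np
from sklearn.metrics import log_loss

y = np.array([1, 0, 1, 1, 0])              # observed labels
p = np.array([0.9, 0.2, 0.7, 0.6, 0.1])    # model's predicted P(y=1) for each point

# Log-likelihood: rewards high p when y=1 and low p when y=0
log_likelihood = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
print(log_likelihood)            # ≈ -1.301

# Cross-entropy (log loss) is the negated average of the same quantity
print(-log_likelihood / len(y))  # ≈ 0.260
print(log_loss(y, p))            # ≈ 0.260, the same number from scikit-learn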
Code: Complete Logistic Regression
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.preprocessing import StandardScaler
# Create a classification dataset
np.random.seed(42)
n = 1000
# Features
age = np.random.uniform(18, 70, n)
income = np.random.normal(50000, 20000, n)
website_visits = np.random.poisson(5, n)
# Target: Will they buy? (depends on features)
z = -5 + 0.05*age + 0.00003*income + 0.3*website_visits
prob_buy = 1 / (1 + np.exp(-z))
bought = (np.random.random(n) < prob_buy).astype(int)
X = np.column_stack([age, income, website_visits])
y = bought
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Scale features (important for regularization)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Fit logistic regression
model = LogisticRegression(random_state=42)
model.fit(X_train_scaled, y_train)
# Predictions
y_pred = model.predict(X_test_scaled)
y_prob = model.predict_proba(X_test_scaled)[:, 1]
print("LOGISTIC REGRESSION RESULTS")
print("="*60)
print(f"\nAccuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"\nConfusion Matrix:")
cm = confusion_matrix(y_test, y_pred)
print(f" Predicted: 0 1")
print(f" Actual 0: {cm[0,0]:4d} {cm[0,1]:4d}")
print(f" Actual 1: {cm[1,0]:4d} {cm[1,1]:4d}")
print(f"\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['No Buy', 'Buy']))
# Show some predictions with probabilities
print(f"\nSample Predictions:")
print(f"{'Age':<8} {'Income':<10} {'Visits':<8} {'P(Buy)':<10} {'Predicted':<10} {'Actual'}")
print("-"*60)
for i in range(5):
    print(f"{X_test[i,0]:<8.0f} ${X_test[i,1]:<9,.0f} {X_test[i,2]:<8.0f} {y_prob[i]:<10.2%} {'Buy' if y_pred[i] else 'No':<10} {'Buy' if y_test[i] else 'No'}")
Output:
LOGISTIC REGRESSION RESULTS
============================================================
Accuracy: 0.7850
Confusion Matrix:
Predicted: 0 1
Actual 0: 89 21
Actual 1: 22 68
Classification Report:
precision recall f1-score support
No Buy 0.80 0.81 0.81 110
Buy 0.76 0.76 0.76 90
accuracy 0.79 200
macro avg 0.78 0.78 0.78 200
weighted avg 0.78 0.79 0.78 200
Sample Predictions:
Age Income Visits P(Buy) Predicted Actual
------------------------------------------------------------
45 $62,341 6 72.45% Buy Buy
28 $38,456 3 31.23% No No
67 $71,234 8 94.12% Buy Buy
33 $45,678 2 28.56% No Buy
52 $55,890 5 68.34% Buy Buy
Adjusting the Decision Threshold
The default threshold of 0.5 isn't always optimal:
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score
print("THRESHOLD TUNING")
print("="*60)
print(f"\nDifferent thresholds produce different trade-offs:")
print(f"\n{'Threshold':<12} {'Precision':<12} {'Recall':<12} {'F1':<12}")
print("-"*48)
for threshold in [0.3, 0.4, 0.5, 0.6, 0.7]:
    y_pred_thresh = (y_prob >= threshold).astype(int)
    prec = precision_score(y_test, y_pred_thresh)
    rec = recall_score(y_test, y_pred_thresh)
    f1 = f1_score(y_test, y_pred_thresh)
    print(f"{threshold:<12} {prec:<12.3f} {rec:<12.3f} {f1:<12.3f}")
print(f"""
WHEN TO ADJUST THRESHOLD:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Lower threshold (e.g., 0.3):
• More predictions of class 1
• Higher recall (catch more positives)
• Lower precision (more false positives)
• Use when: Missing a positive is costly
Example: Cancer screening — don't miss any!
Higher threshold (e.g., 0.7):
• Fewer predictions of class 1
• Lower recall (miss more positives)
• Higher precision (fewer false positives)
• Use when: False positives are costly
Example: Spam filter — don't block good emails!
""")
Multiclass Logistic Regression
What if you have more than 2 classes?
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Iris dataset: 3 classes of flowers
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Logistic regression handles multiclass automatically!
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
print("MULTICLASS LOGISTIC REGRESSION")
print("="*60)
print(f"\nClasses: {iris.target_names}")
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
# Show probabilities for each class
print(f"\nSample predictions with class probabilities:")
print(f"{'True Class':<15} {'P(setosa)':<12} {'P(versicolor)':<14} {'P(virginica)':<14} {'Predicted'}")
print("-"*70)
probs = model.predict_proba(X_test[:5])
preds = model.predict(X_test[:5])
for i in range(5):
    true_class = iris.target_names[y_test[i]]
    pred_class = iris.target_names[preds[i]]
    print(f"{true_class:<15} {probs[i,0]:<12.3f} {probs[i,1]:<14.3f} {probs[i,2]:<14.3f} {pred_class}")
print(f"""
HOW MULTICLASS WORKS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Method 1: One-vs-Rest (OvR)
• Train K separate binary classifiers
• Class k vs all other classes
• Pick class with highest probability
Method 2: Multinomial (Softmax)
• Train one model with K outputs
• Softmax ensures probabilities sum to 1
• P(class k) = exp(zₖ) / Σexp(zⱼ)
With its default lbfgs solver, scikit-learn fits the multinomial (softmax) formulation.
""")
Regularization in Logistic Regression
Just like linear regression, logistic regression can overfit:
from sklearn.linear_model import LogisticRegression
import numpy as np
print("REGULARIZATION OPTIONS")
print("="*60)
print(f"""
Scikit-learn's LogisticRegression has built-in regularization:
LogisticRegression(
    penalty='l2',    # 'l1', 'l2', 'elasticnet', or None (no penalty)
    C=1.0,           # Inverse of regularization strength
                     # Smaller C = stronger regularization
    solver='lbfgs'   # Optimization algorithm
)
PENALTY OPTIONS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
'l2' (Ridge):
• Default, works with all solvers
• Shrinks coefficients toward zero
• Keeps all features
'l1' (Lasso):
• Requires solver='liblinear' or 'saga'
• Can set coefficients to exactly zero
• Feature selection!
'elasticnet':
• Requires solver='saga'
• Combine L1 and L2
• Set l1_ratio parameter
None (no penalty):
• No regularization
• May overfit with many features
C PARAMETER:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
C = 1/λ (inverse of regularization strength)
C = 0.01 → Strong regularization (more shrinkage)
C = 1.0 → Default
C = 100 → Weak regularization (less shrinkage)
""")
# Example with different C values
np.random.seed(42)
X = np.random.randn(100, 20) # 20 features, mostly noise
y = (X[:, 0] + X[:, 1] > 0).astype(int) # Only 2 features matter
print(f"{'C value':<12} {'Non-zero coefficients':<25} {'Accuracy'}")
print("-"*50)
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, penalty='l1', solver='liblinear')
    model.fit(X, y)
    n_nonzero = np.sum(model.coef_ != 0)
    acc = model.score(X, y)
    print(f"{C:<12} {n_nonzero:<25} {acc:.3f}")
Logistic Regression vs Other Classifiers
print("""
WHEN TO USE LOGISTIC REGRESSION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ USE LOGISTIC REGRESSION WHEN:
• You need PROBABILITIES (not just predictions)
• You need INTERPRETABLE coefficients
• Classes are linearly separable (or close to it)
• You have a baseline model need
• You want fast training and prediction
• You need to understand feature importance
✗ CONSIDER OTHER MODELS WHEN:
• Decision boundary is highly non-linear
→ Use: Random Forest, SVM with RBF kernel, Neural Networks
• You have complex feature interactions
→ Use: Gradient Boosting (XGBoost, LightGBM)
• You have image/text/sequence data
→ Use: Deep Learning (CNNs, Transformers)
COMPARISON:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Model            Speed     Interpretability   Non-linear
─────────────────────────────────────────────────────────
Logistic Reg     Fast      High               No
Decision Tree    Fast      High               Yes
Random Forest    Medium    Low                Yes
SVM (RBF)        Slow      Low                Yes
Neural Network   Slow      Very Low           Yes
XGBoost          Medium    Medium             Yes
LOGISTIC REGRESSION IS OFTEN THE BEST STARTING POINT!
Even if you end up using something fancier, logistic
regression gives you a baseline to beat.
""")
Complete Workflow
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, roc_auc_score
def logistic_regression_workflow(X, y, feature_names=None):
    """Complete logistic regression workflow."""
    print("="*70)
    print("LOGISTIC REGRESSION WORKFLOW")
    print("="*70)

    # 1. Split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    print(f"\n1. Data Split: {len(X_train)} train, {len(X_test)} test")
    print(f"   Class balance: {np.mean(y_train):.1%} positive")

    # 2. Scale
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    print("2. Features standardized")

    # 3. Hyperparameter tuning
    param_grid = {
        'C': [0.01, 0.1, 1, 10],
        'penalty': ['l1', 'l2']
    }
    grid_search = GridSearchCV(
        LogisticRegression(solver='liblinear', max_iter=1000),
        param_grid, cv=5, scoring='roc_auc'
    )
    grid_search.fit(X_train_scaled, y_train)
    print(f"\n3. Best hyperparameters:")
    print(f"   C = {grid_search.best_params_['C']}")
    print(f"   Penalty = {grid_search.best_params_['penalty']}")

    # 4. Final model
    model = grid_search.best_estimator_

    # 5. Evaluate
    y_pred = model.predict(X_test_scaled)
    y_prob = model.predict_proba(X_test_scaled)[:, 1]
    print(f"\n4. Test Performance:")
    print(f"   Accuracy: {model.score(X_test_scaled, y_test):.4f}")
    print(f"   ROC-AUC:  {roc_auc_score(y_test, y_prob):.4f}")

    # 6. Feature importance
    if feature_names is not None:
        print(f"\n5. Feature Importance (by |coefficient|):")
        importance = sorted(
            zip(feature_names, model.coef_[0]),
            key=lambda x: abs(x[1]), reverse=True
        )
        for name, coef in importance[:10]:
            direction = "↑" if coef > 0 else "↓"
            print(f"   {name:<20} {coef:>8.4f} {direction}")

    return model, scaler
# Example usage
np.random.seed(42)
X = np.random.randn(1000, 10)
y = (X[:, 0] + 0.5*X[:, 1] - 0.3*X[:, 2] + np.random.randn(1000)*0.5 > 0).astype(int)
feature_names = [f'Feature_{i}' for i in range(10)]
model, scaler = logistic_regression_workflow(X, y, feature_names)
Quick Reference
| Aspect | Details |
|---|---|
| Type | Classification (binary or multiclass) |
| Output | Probabilities (0 to 1) |
| Decision Boundary | Linear (straight line/hyperplane) |
| Loss Function | Cross-entropy (log loss) |
| Optimization | Maximum likelihood estimation |
| Regularization | L1, L2, or Elastic Net (penalty); strength set via C |
| Scaling | Important (especially with regularization) |
| Strengths | Interpretable, probabilistic, fast, baseline |
| Weaknesses | Assumes linear decision boundary |
Key Takeaways
Sigmoid squishes linear output to 0-1 — Guarantees valid probabilities
Coefficients affect log-odds — Each unit increase adds to log-odds, multiplies odds
Decision boundary is linear — A straight line (or hyperplane) separates classes
Maximum likelihood, not least squares — Optimizes probability of observed data
Threshold is adjustable — 0.5 is default, tune based on precision/recall needs
Regularization prevents overfitting — Use L1 for feature selection, L2 for stability
Works for multiclass — Via one-vs-rest or multinomial (softmax)
Great baseline model — Start here, then try fancier methods
The One-Sentence Summary
Bouncer #1 used a linear formula and got answers like "300% chance of entry" and "negative 60% chance". Bouncer #2 squished the same formula through a sigmoid function and got proper probabilities like "95%" and "35%". That is exactly what logistic regression does: take a linear combination of features and transform it through σ(z) = 1/(1+e⁻ᶻ) to produce valid probabilities for classification.
What's Next?
Now that you understand logistic regression, you're ready for:
- ROC Curves and AUC — Evaluating classifier performance
- Polynomial Features — Making linear models non-linear
- Support Vector Machines — Different approach to linear classification
- Decision Trees — Non-linear classification
Follow me for the next article in this series!
Let's Connect!
If "squishing to a probability" finally made logistic regression click, drop a heart!
Questions? Ask in the comments — I read and respond to every one.
What's your favorite use of logistic regression? Mine is churn prediction — the probability output lets you prioritize which customers to save! 📞
The difference between "your probability is 170%" and "your probability is 95%"? A sigmoid function. Logistic regression takes the same linear math you know and makes it work for classification by guaranteeing valid probabilities.
Share this with someone trying to use linear regression for classification. They're about to have a much better time.
Happy classifying! 🎯