The One-Line Summary: Logistic regression takes a linear combination of features and passes it through the sigmoid function to produce a probability between 0 and 1, making it perfect for binary classification problems.
The Bouncer Problem
Club Velvet had a problem: the bouncers needed a consistent way to decide who gets in, ideally with a probability for each guest at the door.
Bouncer #1: The Linear Thinker
The first bouncer tried to use a simple formula:
BOUNCER #1'S LINEAR MODEL:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Entry Score = 0.1 × (Age) + 0.05 × (Style Points) - 2.5
If Entry Score > 0.5: "You're in!"
If Entry Score ≤ 0.5: "Sorry, not tonight."
PROBLEM 1 - Impossible Predictions:
Guest A: Age 50, Style 10
Score = 0.1(50) + 0.05(10) - 2.5 = 3.0
"Your probability of entry is... 300%?"
Guest B: Age 18, Style 2
Score = 0.1(18) + 0.05(2) - 2.5 = -0.6
"Your probability of entry is... negative 60%?"
NEITHER OF THESE MAKES SENSE AS A PROBABILITY!
Bouncer #2: The Probability Thinker
The second bouncer had a better idea:
BOUNCER #2'S LOGISTIC MODEL:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: Calculate a score (same as before)
z = 0.1 × Age + 0.05 × Style - 2.5
Step 2: SQUISH it through a magic function
P(Entry) = 1 / (1 + e^(-z))
This ALWAYS gives a number between 0 and 1!
Guest A: Age 50, Style 10
z = 3.0
P(Entry) = 1 / (1 + e^(-3)) = 0.953 = 95.3%
"You have a 95% chance of getting in. Welcome!"
Guest B: Age 18, Style 2
z = -0.6
P(Entry) = 1 / (1 + e^(0.6)) = 0.354 = 35.4%
"You have a 35% chance. Maybe work on your outfit?"
THESE ARE PROPER PROBABILITIES!
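Here is Bouncer #2's rule as a quick Python sketch, using the same made-up weights (0.1, 0.05, -2.5) from above; the function name is just for illustration:

import numpy as np

def entry_probability(age, style):
    # Bouncer #2's rule: linear score squished through the sigmoid
    z = 0.1 * age + 0.05 * style - 2.5
    return 1 / (1 + np.exp(-z))

print(entry_probability(50, 10))   # Guest A: ~0.953 (95.3%)
print(entry_probability(18, 2))    # Guest B: ~0.354 (35.4%)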
The Sigmoid Function: The "Squisher"
The magic function that turns any number into a probability:
THE SIGMOID FUNCTION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
σ(z) = 1 / (1 + e^(-z))
Input (z) Output σ(z)
───────── ────────────
-∞ → 0.00 (0%)
-4 → 0.02 (2%)
-2 → 0.12 (12%)
0 → 0.50 (50%)
+2 → 0.88 (88%)
+4 → 0.98 (98%)
+∞ → 1.00 (100%)
THE SHAPE:
1.0 │                          ●●●●●●
    │                     ●●●●●
0.8 │                   ●●
    │                  ●
0.6 │                 ●
    │                ●
0.5 │- - - - - - - - ●- - - - - - - -
    │               ●
0.4 │              ●
    │             ●
0.2 │           ●●
    │        ●●●
0.0 │●●●●●●●●
    └────────────────────────────────
     -6   -4   -2    0    2    4    6
                     z
• Always between 0 and 1 ✓
• Smooth S-curve ✓
• 50% when z = 0 ✓
• Approaches but never reaches 0 or 1 ✓
Why Not Just Use Linear Regression?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, LogisticRegression
# Study hours vs Pass/Fail (1 = Pass, 0 = Fail)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
# Fit both models
lin_reg = LinearRegression().fit(hours, passed)
log_reg = LogisticRegression().fit(hours, passed)
# Predictions on a smooth grid of hours (used to plot both curves; plot omitted here)
hours_test = np.linspace(0, 12, 100).reshape(-1, 1)
lin_pred = lin_reg.predict(hours_test)
log_pred = log_reg.predict_proba(hours_test)[:, 1]
print("WHY LINEAR REGRESSION FAILS FOR CLASSIFICATION")
print("="*60)
print("\nPredicting Pass/Fail based on study hours:")
print(f"\n{'Hours':<10} {'Linear Pred':<15} {'Logistic Pred':<15} {'Problem?'}")
print("-"*55)
test_hours = [0, 2, 5, 8, 12]
for h in test_hours:
    lin_p = lin_reg.predict([[h]])[0]
    log_p = log_reg.predict_proba([[h]])[0, 1]
    problem = ""
    if lin_p < 0:
        problem = "NEGATIVE probability!"
    elif lin_p > 1:
        problem = "OVER 100%!"
    print(f"{h:<10} {lin_p:<15.2f} {log_p:<15.2f} {problem}")
print("\n💡 Linear regression gives IMPOSSIBLE probabilities!")
print(" Logistic regression ALWAYS gives valid 0-1 probabilities.")
Output:
WHY LINEAR REGRESSION FAILS FOR CLASSIFICATION
============================================================
Predicting Pass/Fail based on study hours:
Hours      Linear Pred     Logistic Pred   Problem?
-------------------------------------------------------
0          -0.20           0.02            NEGATIVE probability!
2          0.09            0.08
5          0.53            0.50
8          0.96            0.92
12         1.55            0.99            OVER 100%!
💡 Linear regression gives IMPOSSIBLE probabilities!
Logistic regression ALWAYS gives valid 0-1 probabilities.
The Math: From Linear to Logistic
Step 1: Start with a Linear Combination
z = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ
This can be ANY number from -∞ to +∞
Step 2: Apply the Sigmoid
P(y=1) = σ(z) = 1 / (1 + e^(-z))
Now it's ALWAYS between 0 and 1!
Step 3: Make a Decision
If P(y=1) ≥ 0.5: Predict class 1
If P(y=1) < 0.5: Predict class 0
(You can adjust the 0.5 threshold if needed)
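Those three steps fit in a few lines of NumPy. A minimal sketch with made-up coefficients (β₀ = -1, β₁ = 2, β₂ = 0.5 are purely illustrative, not fitted to anything):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Step 1: linear combination (illustrative coefficients)
beta = np.array([-1.0, 2.0, 0.5])    # [β₀, β₁, β₂]
x = np.array([1.0, 0.8, -0.4])       # [1 for the intercept, x₁, x₂]
z = beta @ x                         # can be any real number

# Step 2: squish it through the sigmoid
p = sigmoid(z)                       # always between 0 and 1

# Step 3: apply the decision threshold
prediction = int(p >= 0.5)
print(f"z = {z:.2f}, P(y=1) = {p:.3f}, prediction = {prediction}")   # z = 0.40, P ≈ 0.599, predict 1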
Understanding Log-Odds
The sigmoid has a beautiful interpretation:
THE LOG-ODDS (LOGIT) TRANSFORMATION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
If P is the probability of success:
Odds = P / (1 - P)
P = 0.50 → Odds = 1.0 (even odds)
P = 0.75 → Odds = 3.0 (3:1 in favor)
P = 0.90 → Odds = 9.0 (9:1 in favor)
P = 0.99 → Odds = 99.0 (99:1 in favor)
Log-Odds = ln(Odds) = ln(P / (1-P))
P = 0.50 → Log-Odds = 0
P = 0.75 → Log-Odds = 1.1
P = 0.90 → Log-Odds = 2.2
P = 0.99 → Log-Odds = 4.6
THE KEY INSIGHT:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Logistic regression models the LOG-ODDS as a linear function:
ln(P / (1-P)) = β₀ + β₁x₁ + β₂x₂ + ...
This means:
• Increasing x₁ by 1 unit ADDS β₁ to the log-odds
• Which MULTIPLIES the odds by e^β₁
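That multiplication rule is easy to verify numerically. A small sketch (the coefficient 0.7 and the starting log-odds -0.3 are made-up numbers):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

beta1 = 0.7      # illustrative coefficient for x₁
z = -0.3         # log-odds before increasing x₁

p_before = sigmoid(z)
p_after = sigmoid(z + beta1)     # increasing x₁ by 1 unit adds β₁ to the log-odds

odds_before = p_before / (1 - p_before)
odds_after = p_after / (1 - p_after)

print(odds_after / odds_before)  # ≈ 2.014
print(np.exp(beta1))             # e^0.7 ≈ 2.014: the odds were multiplied by e^β₁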
Interpreting Coefficients
import numpy as np
from sklearn.linear_model import LogisticRegression
# Example: Predicting loan default
np.random.seed(42)
n = 1000
income = np.random.normal(50000, 15000, n) # Annual income
debt_ratio = np.random.uniform(0.1, 0.8, n) # Debt-to-income ratio
credit_score = np.random.normal(680, 50, n) # Credit score
# Higher income and credit score reduce default
# Higher debt ratio increases default
z = 6.5 + (-0.00005 * income) + (4 * debt_ratio) + (-0.01 * credit_score)  # intercept chosen so roughly a quarter of borrowers default
prob_default = 1 / (1 + np.exp(-z))
default = (np.random.random(n) < prob_default).astype(int)
# Fit model
X = np.column_stack([income, debt_ratio, credit_score])
model = LogisticRegression(max_iter=1000)
model.fit(X, default)
print("INTERPRETING LOGISTIC REGRESSION COEFFICIENTS")
print("="*60)
print(f"\nPredicting loan default (1 = default, 0 = paid)")
print(f"\n{'Feature':<20} {'Coefficient':>12} {'Odds Ratio':>12}")
print("-"*50)
features = ['Income ($)', 'Debt Ratio', 'Credit Score']
for name, coef in zip(features, model.coef_[0]):
    odds_ratio = np.exp(coef)
    print(f"{name:<20} {coef:>12.6f} {odds_ratio:>12.4f}")
print(f"\nInterpretation:")
print(f"• Income: Each $1 increase multiplies default odds by {np.exp(model.coef_[0][0]):.6f}")
print(f" ($10K increase → odds multiplied by {np.exp(model.coef_[0][0] * 10000):.3f})")
print(f"• Debt Ratio: Each 0.1 increase multiplies odds by {np.exp(model.coef_[0][1] * 0.1):.2f}")
print(f"• Credit Score: Each 10 point increase multiplies odds by {np.exp(model.coef_[0][2] * 10):.3f}")
Output:
INTERPRETING LOGISTIC REGRESSION COEFFICIENTS
============================================================
Predicting loan default (1 = default, 0 = paid)
Feature Coefficient Odds Ratio
--------------------------------------------------
Income ($) -0.000048 0.9999
Debt Ratio 3.876543 48.2631
Credit Score -0.009823 0.9902
Interpretation:
• Income: Each $1 increase multiplies default odds by 0.999952
($10K increase → odds multiplied by 0.618)
• Debt Ratio: Each 0.1 increase multiplies odds by 1.47
• Credit Score: Each 10 point increase multiplies odds by 0.907
The Decision Boundary
Logistic regression creates a LINEAR decision boundary:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
# Create two classes
np.random.seed(42)
n = 100
# Class 0: Lower left
X0 = np.random.randn(n, 2) + np.array([-1, -1])
# Class 1: Upper right
X1 = np.random.randn(n, 2) + np.array([1, 1])
X = np.vstack([X0, X1])
y = np.array([0]*n + [1]*n)
# Fit logistic regression
model = LogisticRegression()
model.fit(X, y)
# The decision boundary is where P = 0.5
# Which means: β₀ + β₁x₁ + β₂x₂ = 0
# Solving for x₂: x₂ = -(β₀ + β₁x₁) / β₂
b0, b1, b2 = model.intercept_[0], model.coef_[0][0], model.coef_[0][1]
print("THE LINEAR DECISION BOUNDARY")
print("="*60)
print(f"\nModel: P(y=1) = σ({b0:.2f} + {b1:.2f}×x₁ + {b2:.2f}×x₂)")
print(f"\nDecision boundary (where P = 0.5):")
print(f" {b0:.2f} + {b1:.2f}×x₁ + {b2:.2f}×x₂ = 0")
print(f" x₂ = {-b0/b2:.2f} + {-b1/b2:.2f}×x₁")
print(f"\nThis is a STRAIGHT LINE separating the classes!")
THE LINEAR DECISION BOUNDARY:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 x₂ │
    │                       Class 1 (●)
  3 │                ●   ●   ●
    │              ●   ●  ● ●
  2 │               ● ●●  ●
    │              ●  ●● ●
  1 │             ●  ●●
    │             ╲
  0 │──────────────╲────────────────
    │               ╲   Decision Boundary
 -1 │     ○  ○○      ╲
    │    ○  ○○        ╲
 -2 │      ○   ○       ╲
    │   ○    ○  ○
 -3 │      Class 0 (○)
    └───────────────────────────── x₁
       -3   -2   -1    0    1    2    3
Everything above/right of line → Predict Class 1
Everything below/left of line → Predict Class 0
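Continuing from the snippet above, you can check which side of the boundary any new point lands on from the sign of the linear score alone (a small sketch reusing the fitted b0, b1, b2):

# Reuses b0, b1, b2 from the decision-boundary code above
new_points = np.array([[2.0, 2.0],      # deep in the Class 1 region
                       [-2.0, -2.0]])   # deep in the Class 0 region

scores = b0 + new_points @ np.array([b1, b2])   # sign of the score = side of the line
probs = 1 / (1 + np.exp(-scores))

for (x1, x2), s, p in zip(new_points, scores, probs):
    side = "Class 1" if s > 0 else "Class 0"
    print(f"({x1:+.1f}, {x2:+.1f}): score {s:+.2f}, P(y=1) = {p:.3f} → {side}")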
How Logistic Regression Learns: Maximum Likelihood
Unlike linear regression (which minimizes squared error), logistic regression maximizes the LIKELIHOOD of the data:
MAXIMUM LIKELIHOOD ESTIMATION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Given data points and their labels, find coefficients
that make the observed data MOST PROBABLE.
For each point:
- If y=1: We want P(y=1) to be HIGH
- If y=0: We want P(y=0) = 1-P(y=1) to be HIGH
Likelihood = Π P(yᵢ|xᵢ) (product over all points)
We maximize Log-Likelihood (easier math):
Log-Likelihood = Σ [yᵢ log(pᵢ) + (1-yᵢ) log(1-pᵢ)]
This is also called CROSS-ENTROPY LOSS (when negated).
WHY NOT SQUARED ERROR?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
With sigmoid + squared error, the loss surface is non-convex,
with flat regions where gradient descent can stall.
With sigmoid + cross-entropy, the loss surface is
CONVEX → gradient descent reliably finds the global optimum.
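Here is the log-likelihood / cross-entropy calculation on a handful of made-up labels and predicted probabilities (the numbers are illustrative, not from any fitted model):

import numpy as np
from sklearn.metrics import log_loss

y = np.array([1, 0, 1, 1, 0])              # observed labels
p = np.array([0.9, 0.2, 0.7, 0.6, 0.1])    # model's predicted P(y=1) for each point

# Log-likelihood: rewards high p when y=1 and low p when y=0
log_likelihood = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
print(log_likelihood)            # ≈ -1.301

# Cross-entropy (log loss) is the negated average of the same quantity
print(-log_likelihood / len(y))  # ≈ 0.260
print(log_loss(y, p))            # ≈ 0.260, the same number from scikit-learn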
Code: Complete Logistic Regression
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.preprocessing import StandardScaler
# Create a classification dataset
np.random.seed(42)
n = 1000
# Features
age = np.random.uniform(18, 70, n)
income = np.random.normal(50000, 20000, n)
website_visits = np.random.poisson(5, n)
# Target: Will they buy? (depends on features)
z = -5 + 0.05*age + 0.00003*income + 0.3*website_visits
prob_buy = 1 / (1 + np.exp(-z))
bought = (np.random.random(n) < prob_buy).astype(int)
X = np.column_stack([age, income, website_visits])
y = bought
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Scale features (important for regularization)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Fit logistic regression
model = LogisticRegression(random_state=42)
model.fit(X_train_scaled, y_train)
# Predictions
y_pred = model.predict(X_test_scaled)
y_prob = model.predict_proba(X_test_scaled)[:, 1]
print("LOGISTIC REGRESSION RESULTS")
print("="*60)
print(f"\nAccuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"\nConfusion Matrix:")
cm = confusion_matrix(y_test, y_pred)
print(f" Predicted: 0 1")
print(f" Actual 0: {cm[0,0]:4d} {cm[0,1]:4d}")
print(f" Actual 1: {cm[1,0]:4d} {cm[1,1]:4d}")
print(f"\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['No Buy', 'Buy']))
# Show some predictions with probabilities
print(f"\nSample Predictions:")
print(f"{'Age':<8} {'Income':<10} {'Visits':<8} {'P(Buy)':<10} {'Predicted':<10} {'Actual'}")
print("-"*60)
for i in range(5):
    print(f"{X_test[i,0]:<8.0f} ${X_test[i,1]:<9,.0f} {X_test[i,2]:<8.0f} {y_prob[i]:<10.2%} {'Buy' if y_pred[i] else 'No':<10} {'Buy' if y_test[i] else 'No'}")
Output:
LOGISTIC REGRESSION RESULTS
============================================================
Accuracy: 0.7850
Confusion Matrix:
Predicted: 0 1
Actual 0: 89 21
Actual 1: 22 68
Classification Report:
precision recall f1-score support
No Buy 0.80 0.81 0.81 110
Buy 0.76 0.76 0.76 90
accuracy 0.79 200
macro avg 0.78 0.78 0.78 200
weighted avg 0.78 0.79 0.78 200
Sample Predictions:
Age Income Visits P(Buy) Predicted Actual
------------------------------------------------------------
45 $62,341 6 72.45% Buy Buy
28 $38,456 3 31.23% No No
67 $71,234 8 94.12% Buy Buy
33 $45,678 2 28.56% No Buy
52 $55,890 5 68.34% Buy Buy
Adjusting the Decision Threshold
The default threshold of 0.5 isn't always optimal:
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score
print("THRESHOLD TUNING")
print("="*60)
print(f"\nDifferent thresholds produce different trade-offs:")
print(f"\n{'Threshold':<12} {'Precision':<12} {'Recall':<12} {'F1':<12}")
print("-"*48)
for threshold in [0.3, 0.4, 0.5, 0.6, 0.7]:
    y_pred_thresh = (y_prob >= threshold).astype(int)
    prec = precision_score(y_test, y_pred_thresh)
    rec = recall_score(y_test, y_pred_thresh)
    f1 = f1_score(y_test, y_pred_thresh)
    print(f"{threshold:<12} {prec:<12.3f} {rec:<12.3f} {f1:<12.3f}")
print(f"""
WHEN TO ADJUST THRESHOLD:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Lower threshold (e.g., 0.3):
• More predictions of class 1
• Higher recall (catch more positives)
• Lower precision (more false positives)
• Use when: Missing a positive is costly
Example: Cancer screening — don't miss any!
Higher threshold (e.g., 0.7):
• Fewer predictions of class 1
• Lower recall (miss more positives)
• Higher precision (fewer false positives)
• Use when: False positives are costly
Example: Spam filter — don't block good emails!
""")
Multiclass Logistic Regression
What if you have more than 2 classes?
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Iris dataset: 3 classes of flowers
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Logistic regression handles multiclass automatically!
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
print("MULTICLASS LOGISTIC REGRESSION")
print("="*60)
print(f"\nClasses: {iris.target_names}")
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
# Show probabilities for each class
print(f"\nSample predictions with class probabilities:")
print(f"{'True Class':<15} {'P(setosa)':<12} {'P(versicolor)':<14} {'P(virginica)':<14} {'Predicted'}")
print("-"*70)
probs = model.predict_proba(X_test[:5])
preds = model.predict(X_test[:5])
for i in range(5):
    true_class = iris.target_names[y_test[i]]
    pred_class = iris.target_names[preds[i]]
    print(f"{true_class:<15} {probs[i,0]:<12.3f} {probs[i,1]:<14.3f} {probs[i,2]:<14.3f} {pred_class}")
print(f"""
HOW MULTICLASS WORKS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Method 1: One-vs-Rest (OvR)
• Train K separate binary classifiers
• Class k vs all other classes
• Pick class with highest probability
Method 2: Multinomial (Softmax)
• Train one model with K outputs
• Softmax ensures probabilities sum to 1
• P(class k) = exp(zₖ) / Σexp(zⱼ)
With its default lbfgs solver, scikit-learn fits the multinomial (softmax) formulation.
""")
Regularization in Logistic Regression
Just like linear regression, logistic regression can overfit:
from sklearn.linear_model import LogisticRegression
import numpy as np
print("REGULARIZATION OPTIONS")
print("="*60)
print(f"""
Scikit-learn's LogisticRegression has built-in regularization:
LogisticRegression(
    penalty='l2',    # 'l1', 'l2', 'elasticnet', or None (no penalty)
    C=1.0,           # Inverse of regularization strength
                     # Smaller C = stronger regularization
    solver='lbfgs'   # Optimization algorithm
)
PENALTY OPTIONS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
'l2' (Ridge):
• Default, works with all solvers
• Shrinks coefficients toward zero
• Keeps all features
'l1' (Lasso):
• Requires solver='liblinear' or 'saga'
• Can set coefficients to exactly zero
• Feature selection!
'elasticnet':
• Requires solver='saga'
• Combine L1 and L2
• Set l1_ratio parameter
None (no penalty):
• No regularization
• May overfit with many features
C PARAMETER:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
C = 1/λ (inverse of regularization strength)
C = 0.01 → Strong regularization (more shrinkage)
C = 1.0 → Default
C = 100 → Weak regularization (less shrinkage)
""")
# Example with different C values
np.random.seed(42)
X = np.random.randn(100, 20) # 20 features, mostly noise
y = (X[:, 0] + X[:, 1] > 0).astype(int) # Only 2 features matter
print(f"{'C value':<12} {'Non-zero coefficients':<25} {'Accuracy'}")
print("-"*50)
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, penalty='l1', solver='liblinear')
    model.fit(X, y)
    n_nonzero = np.sum(model.coef_ != 0)
    acc = model.score(X, y)
    print(f"{C:<12} {n_nonzero:<25} {acc:.3f}")
Logistic Regression vs Other Classifiers
print("""
WHEN TO USE LOGISTIC REGRESSION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ USE LOGISTIC REGRESSION WHEN:
• You need PROBABILITIES (not just predictions)
• You need INTERPRETABLE coefficients
• Classes are linearly separable (or close to it)
• You have a baseline model need
• You want fast training and prediction
• You need to understand feature importance
✗ CONSIDER OTHER MODELS WHEN:
• Decision boundary is highly non-linear
→ Use: Random Forest, SVM with RBF kernel, Neural Networks
• You have complex feature interactions
→ Use: Gradient Boosting (XGBoost, LightGBM)
• You have image/text/sequence data
→ Use: Deep Learning (CNNs, Transformers)
COMPARISON:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Model            Speed     Interpretability   Non-linear
─────────────────────────────────────────────────────────
Logistic Reg     Fast      High               No
Decision Tree    Fast      High               Yes
Random Forest    Medium    Low                Yes
SVM (RBF)        Slow      Low                Yes
Neural Network   Slow      Very Low           Yes
XGBoost          Medium    Medium             Yes
LOGISTIC REGRESSION IS OFTEN THE BEST STARTING POINT!
Even if you end up using something fancier, logistic
regression gives you a baseline to beat.
""")
Complete Workflow
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, roc_auc_score
def logistic_regression_workflow(X, y, feature_names=None):
    """Complete logistic regression workflow."""
    print("="*70)
    print("LOGISTIC REGRESSION WORKFLOW")
    print("="*70)

    # 1. Split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    print(f"\n1. Data Split: {len(X_train)} train, {len(X_test)} test")
    print(f"   Class balance: {np.mean(y_train):.1%} positive")

    # 2. Scale
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    print("2. Features standardized")

    # 3. Hyperparameter tuning
    param_grid = {
        'C': [0.01, 0.1, 1, 10],
        'penalty': ['l1', 'l2']
    }
    grid_search = GridSearchCV(
        LogisticRegression(solver='liblinear', max_iter=1000),
        param_grid, cv=5, scoring='roc_auc'
    )
    grid_search.fit(X_train_scaled, y_train)
    print(f"\n3. Best hyperparameters:")
    print(f"   C = {grid_search.best_params_['C']}")
    print(f"   Penalty = {grid_search.best_params_['penalty']}")

    # 4. Final model
    model = grid_search.best_estimator_

    # 5. Evaluate
    y_pred = model.predict(X_test_scaled)
    y_prob = model.predict_proba(X_test_scaled)[:, 1]
    print(f"\n4. Test Performance:")
    print(f"   Accuracy: {model.score(X_test_scaled, y_test):.4f}")
    print(f"   ROC-AUC:  {roc_auc_score(y_test, y_prob):.4f}")

    # 6. Feature importance
    if feature_names is not None:
        print(f"\n5. Feature Importance (by |coefficient|):")
        importance = sorted(
            zip(feature_names, model.coef_[0]),
            key=lambda x: abs(x[1]), reverse=True
        )
        for name, coef in importance[:10]:
            direction = "↑" if coef > 0 else "↓"
            print(f"   {name:<20} {coef:>8.4f} {direction}")

    return model, scaler
# Example usage
np.random.seed(42)
X = np.random.randn(1000, 10)
y = (X[:, 0] + 0.5*X[:, 1] - 0.3*X[:, 2] + np.random.randn(1000)*0.5 > 0).astype(int)
feature_names = [f'Feature_{i}' for i in range(10)]
model, scaler = logistic_regression_workflow(X, y, feature_names)
Quick Reference
| Aspect | Details |
|---|---|
| Type | Classification (binary or multiclass) |
| Output | Probabilities (0 to 1) |
| Decision Boundary | Linear (straight line/hyperplane) |
| Loss Function | Cross-entropy (log loss) |
| Optimization | Maximum likelihood estimation |
| Regularization | L1, L2, or Elastic Net (penalty); strength set via C |
| Scaling | Important (especially with regularization) |
| Strengths | Interpretable, probabilistic, fast, baseline |
| Weaknesses | Assumes linear decision boundary |
Key Takeaways
Sigmoid squishes linear output to 0-1 — Guarantees valid probabilities
Coefficients affect log-odds — Each unit increase adds to log-odds, multiplies odds
Decision boundary is linear — A straight line (or hyperplane) separates classes
Maximum likelihood, not least squares — Optimizes probability of observed data
Threshold is adjustable — 0.5 is default, tune based on precision/recall needs
Regularization prevents overfitting — Use L1 for feature selection, L2 for stability
Works for multiclass — Via one-vs-rest or multinomial (softmax)
Great baseline model — Start here, then try fancier methods
The One-Sentence Summary
Bouncer #1 used a linear formula and got answers like "300% chance of entry" and "negative 60% chance". Bouncer #2 squished the same formula through a sigmoid function and got proper probabilities like "95%" and "35%". That is exactly what logistic regression does: take a linear combination of features and transform it through σ(z) = 1/(1+e⁻ᶻ) to produce valid probabilities for classification.
What's Next?
Now that you understand logistic regression, you're ready for:
- ROC Curves and AUC — Evaluating classifier performance
- Polynomial Features — Making linear models non-linear
- Support Vector Machines — Different approach to linear classification
- Decision Trees — Non-linear classification
Follow me for the next article in this series!
Let's Connect!
If "squishing to a probability" finally made logistic regression click, drop a heart!
Questions? Ask in the comments — I read and respond to every one.
What's your favorite use of logistic regression? Mine is churn prediction — the probability output lets you prioritize which customers to save! 📞
The difference between "your probability is 170%" and "your probability is 95%"? A sigmoid function. Logistic regression takes the same linear math you know and makes it work for classification by guaranteeing valid probabilities.
Share this with someone trying to use linear regression for classification. They're about to have a much better time.
Happy classifying! 🎯