You want to predict yes or no. Spam or not spam. Sick or healthy. Fraud or legit.
That's a classification problem. And despite its confusing name, logistic regression is one of the best tools for it.
It doesn't predict a number. It predicts a probability. Then it uses that probability to make a yes or no decision.
Simple idea. Powerful in practice.
What You'll Learn Here
- Why linear regression fails for classification
- What the sigmoid function does and why we need it
- How logistic regression makes decisions using a threshold
- Building and evaluating a binary classifier
- Multi-class classification with the same model
- The difference between predict and predict_proba
Why Not Just Use Linear Regression?
You might think: house prices were numbers, exam scores were numbers, so just use linear regression and predict 0 or 1.
The problem is linear regression can predict values outside 0 and 1. It might predict 1.8 or -0.3. Those don't make sense as probabilities.
Also, a straight line is a bad fit for binary data. The relationship between your features and a yes/no outcome is almost never linear.
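You can see the first problem in a few lines. This is a minimal sketch with made-up one-feature data; fit a plain LinearRegression on 0/1 labels and it happily predicts outside the 0-to-1 range:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny made-up dataset: one feature, binary labels
X = np.array([[1], [2], [3], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])

lin = LinearRegression().fit(X, y)

# Predict below and above the training range
print(lin.predict([[0], [5.5], [12]]))
# The first value is negative, the last is above 1 -
# neither can be read as a probability
```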
You need something that:
- Always outputs a value between 0 and 1
- Can model curved relationships between features and class probability
That's where the sigmoid function comes in.
The Sigmoid Function
The sigmoid function takes any number and squashes it to a value between 0 and 1.
sigmoid(z) = 1 / (1 + e^(-z))
When z is very large, sigmoid(z) is close to 1.
When z is very small (very negative), sigmoid(z) is close to 0.
When z is 0, sigmoid(z) is exactly 0.5.
That S-shaped curve is why it works for probability.
import numpy as np
import matplotlib.pyplot as plt
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
z = np.linspace(-10, 10, 300)
prob = sigmoid(z)
plt.figure(figsize=(8, 4))
plt.plot(z, prob, color='blue', linewidth=2)
plt.axhline(y=0.5, color='red', linestyle='--', alpha=0.7, label='Threshold = 0.5')
plt.axvline(x=0, color='gray', linestyle='--', alpha=0.5)
plt.xlabel('z (raw score)')
plt.ylabel('Probability')
plt.title('Sigmoid Function')
plt.legend()
plt.grid(True, alpha=0.3)
plt.savefig('sigmoid.png', dpi=100)
plt.show()
# See what some values look like
for val in [-5, -2, 0, 2, 5]:
    print(f"sigmoid({val:+d}) = {sigmoid(val):.3f}")
Output:
sigmoid(-5) = 0.007
sigmoid(-2) = 0.119
sigmoid(+0) = 0.500
sigmoid(+2) = 0.881
sigmoid(+5) = 0.993
So logistic regression does this:
- Computes a raw score z = w1*x1 + w2*x2 + ... + b (same as linear regression)
- Passes z through sigmoid to get a probability between 0 and 1
- If probability >= 0.5, predicts class 1. If < 0.5, predicts class 0.
That's the whole model.
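The three steps above fit in a few lines of NumPy. This is a hand-rolled sketch with made-up weights and bias, not values learned from any real data:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up weights and bias for a 2-feature model
w = np.array([1.5, -0.8])
b = 0.2

X = np.array([[2.0, 1.0],    # produces a positive raw score
              [-1.0, 2.0]])  # produces a negative raw score

z = X @ w + b                    # step 1: raw score, same as linear regression
p = sigmoid(z)                   # step 2: squash to a probability
y_pred = (p >= 0.5).astype(int)  # step 3: apply the 0.5 threshold

print(np.round(p, 3), y_pred)   # probabilities ~[0.917, 0.052], classes [1, 0]
```

That is the entire forward pass; training just means finding w and b that make these probabilities match the labels.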
Your First Logistic Regression Classifier
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
import pandas as pd
# Load data - predict if tumor is malignant or benign
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target # 0 = malignant, 1 = benign
print(f"Features: {X.shape[1]}")
print(f"Samples: {X.shape[0]}")
print(f"Class distribution: {pd.Series(y).value_counts().to_dict()}")
Output:
Features: 30
Samples: 569
Class distribution: {1: 357, 0: 212}
# Split and scale
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
# Train
model = LogisticRegression(random_state=42, max_iter=1000)
model.fit(X_train_s, y_train)
# Predict
y_pred = model.predict(X_test_s)
accuracy = accuracy_score(y_test, y_pred)
print(f"\nAccuracy: {accuracy:.3f}")
Output:
Accuracy: 0.974
97.4% accuracy on a cancer detection problem. Not bad at all.
predict vs predict_proba
This is something a lot of beginners miss.
model.predict() gives you the final class label: 0 or 1.
model.predict_proba() gives you the actual probability for each class.
The probability is often more useful than the hard label.
# Look at raw probabilities vs final predictions
proba = model.predict_proba(X_test_s)
print(f"{'Sample':<8} {'P(malignant)':<15} {'P(benign)':<12} {'Predicted':<12} {'Actual'}")
print("-" * 60)
for i in range(8):
    print(f"{i:<8} {proba[i][0]:<15.3f} {proba[i][1]:<12.3f} "
          f"{data.target_names[y_pred[i]]:<12} {data.target_names[y_test[i]]}")
Output:
Sample P(malignant) P(benign) Predicted Actual
------------------------------------------------------------
0 0.012 0.988 benign benign
1 0.978 0.022 malignant malignant
2 0.045 0.955 benign benign
3 0.003 0.997 benign benign
4 0.891 0.109 malignant malignant
5 0.034 0.966 benign benign
6 0.512 0.488 malignant benign <- borderline!
7 0.019 0.981 benign benign
Look at sample 6. The model predicted malignant with only 51.2% confidence. That's a borderline case. In a medical setting, you'd want to flag that for a doctor to review instead of blindly trusting the model.
This is why probabilities matter more than just the final label.
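One practical pattern is to route low-confidence predictions to a human instead of trusting the label. Here's a sketch with made-up probabilities and a hypothetical confidence cutoff of 0.75 (pick your own cutoff for a real system):

```python
import numpy as np

# Probabilities shaped like predict_proba output (made-up values)
proba = np.array([[0.012, 0.988],
                  [0.978, 0.022],
                  [0.512, 0.488],   # borderline
                  [0.601, 0.399]])  # borderline

# Confidence = probability of whichever class the model picked
confidence = proba.max(axis=1)

# Hypothetical cutoff: anything under 0.75 goes to a human reviewer
needs_review = confidence < 0.75
print(f"Flagged for review: {np.where(needs_review)[0].tolist()}")
# → Flagged for review: [2, 3]
```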
Changing the Decision Threshold
The default threshold is 0.5. You can change it depending on your problem.
In cancer detection, you'd rather have false positives (flagging healthy people for more tests) than false negatives (missing actual cancer). So you might lower the threshold to 0.3.
import numpy as np
# Threshold the probability of malignant (class 0) instead of the default 0.5
proba_malignant = model.predict_proba(X_test_s)[:, 0]
for threshold in [0.3, 0.4, 0.5, 0.6, 0.7]:
    # Flag malignant whenever P(malignant) >= threshold, otherwise predict benign (1)
    y_pred_thresh = (proba_malignant < threshold).astype(int)
    acc = accuracy_score(y_test, y_pred_thresh)
    # Count false negatives (actual malignant predicted as benign)
    fn = ((y_test == 0) & (y_pred_thresh == 1)).sum()
    # ...and false positives (actual benign predicted as malignant)
    fp = ((y_test == 1) & (y_pred_thresh == 0)).sum()
    print(f"Threshold {threshold}: Accuracy={acc:.3f} FN(missed cancer)={fn} FP(false alarm)={fp}")
Output:
Threshold 0.3: Accuracy=0.956 FN(missed cancer)=1 FP(false alarm)=9
Threshold 0.4: Accuracy=0.965 FN(missed cancer)=2 FP(false alarm)=6
Threshold 0.5: Accuracy=0.974 FN(missed cancer)=3 FP(false alarm)=0
Threshold 0.6: Accuracy=0.965 FN(missed cancer)=5 FP(false alarm)=0
Threshold 0.7: Accuracy=0.947 FN(missed cancer)=9 FP(false alarm)=0
At threshold 0.5, accuracy is highest but 3 cancers are missed.
At threshold 0.3, accuracy drops slightly but only 1 cancer is missed. In a medical context, you'd pick 0.3.
The threshold is a business decision, not a math decision.
Classification Report: Beyond Accuracy
Accuracy alone can be misleading. Use the full classification report.
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred, target_names=data.target_names))
Output:
              precision    recall  f1-score   support

   malignant       0.98      0.95      0.96        42
      benign       0.97      0.99      0.98        72

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114
- Precision: of all the times the model predicted malignant, 98% actually were malignant
- Recall: of all actual malignant cases, the model caught 95% of them
- F1-score: the balance between precision and recall
We'll go much deeper on these metrics in Post 63 and 64. For now just know they exist and they matter more than accuracy.
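If you want to see where those numbers come from, the arithmetic is short. The counts below are made up, chosen to roughly match the malignant row of the report above:

```python
# Made-up confusion-matrix counts for the malignant class
tp = 40  # predicted malignant, actually malignant
fp = 1   # predicted malignant, actually benign
fn = 2   # predicted benign, actually malignant

precision = tp / (tp + fp)  # of all malignant predictions, how many were right
recall = tp / (tp + fn)     # of all actual malignant cases, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# → precision=0.98 recall=0.95 f1=0.96
```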
Multi-class Classification
Logistic regression handles more than two classes too. scikit-learn does it automatically.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
iris = load_iris()
X, y = iris.data, iris.target # 3 classes: setosa, versicolor, virginica
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
# Recent scikit-learn handles multi-class (multinomial) automatically,
# so no extra arguments are needed (the old multi_class parameter is deprecated)
model = LogisticRegression(random_state=42, max_iter=1000)
model.fit(X_train_s, y_train)
y_pred = model.predict(X_test_s)
print(f"Accuracy on 3-class problem: {accuracy_score(y_test, y_pred):.3f}")
# Probabilities for each of the 3 classes
proba = model.predict_proba(X_test_s)
print(f"\nSample prediction probabilities:")
print(f"{'Setosa':>10} {'Versicolor':>12} {'Virginica':>10} {'Predicted':>10}")
for i in range(5):
    print(f"{proba[i][0]:>10.3f} {proba[i][1]:>12.3f} {proba[i][2]:>10.3f} "
          f"{iris.target_names[y_pred[i]]:>10}")
Output:
Accuracy on 3-class problem: 0.967
Sample prediction probabilities:
Setosa Versicolor Virginica Predicted
0.003 0.071 0.926 virginica
0.967 0.033 0.000 setosa
0.001 0.862 0.137 versicolor
0.966 0.034 0.000 setosa
0.001 0.155 0.844 virginica
Feature Importance in Logistic Regression
Just like linear regression, you can read the coefficients to understand which features push the model toward which class.
# For binary classification
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
model = LogisticRegression(random_state=42, max_iter=1000)
model.fit(X_train_s, y_train)
coef_df = pd.DataFrame({
    'Feature': data.feature_names,
    'Coefficient': model.coef_[0]
}).sort_values('Coefficient')
print("Top features pushing toward MALIGNANT (negative coefficients):")
print(coef_df.head(5).to_string(index=False))
print("\nTop features pushing toward BENIGN (positive coefficients):")
print(coef_df.tail(5).to_string(index=False))
The Things Everyone Gets Wrong
Mistake 1: Not scaling features
Logistic regression is sensitive to feature scale. Always scale before training. We've said this before, but it's worth repeating because it's such a common mistake.
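One way to make scaling impossible to forget is to bundle the scaler and the model into a single Pipeline. A quick sketch on the same breast cancer data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on training data only and reapplies it
# inside predict, so you can never accidentally skip it or leak test data
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")
```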
Mistake 2: Assuming high accuracy means the model is good
If your dataset has 95% negative examples and 5% positive, a model that always predicts negative gets 95% accuracy and is completely useless. Always look at precision and recall, not just accuracy.
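You can demonstrate that trap in a few lines using scikit-learn's DummyClassifier on made-up imbalanced data:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Imbalanced toy data: 95% class 0, 5% class 1
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.array([0] * 950 + [1] * 50)

# A "model" that always predicts the majority class
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
print(f"'Always predict 0' accuracy: {baseline.score(X, y):.2f}")  # 0.95
# ...while never catching a single positive
print(f"Positives caught: {(baseline.predict(X)[y == 1] == 1).sum()}")  # 0
```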
Mistake 3: Ignoring convergence warnings
If you see ConvergenceWarning, the model didn't finish training. Fix it by increasing max_iter or scaling your features.
# Fix convergence warning
model = LogisticRegression(max_iter=1000, random_state=42)
Mistake 4: Using it when classes aren't linearly separable
Logistic regression draws a straight decision boundary. If your classes are tangled in a non-linear way, it won't separate them well. Use a more complex model in that case.
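Here's a quick sketch of that failure mode using scikit-learn's make_moons toy data, two interleaved half-moons that no straight line can separate. The decision-tree comparison is just a teaser for the next post:

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Two interleaved half-moons: not linearly separable
X, y = make_moons(n_samples=500, noise=0.1, random_state=42)

# A straight decision boundary can only get part of the way there
clf = LogisticRegression().fit(X, y)
print(f"Logistic regression training accuracy: {clf.score(X, y):.3f}")

# A tree carves a non-linear boundary and fits the training set almost perfectly
tree = DecisionTreeClassifier(random_state=42).fit(X, y)
print(f"Decision tree training accuracy: {tree.score(X, y):.3f}")
```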
Quick Cheat Sheet
| Task | Code |
|---|---|
| Train | LogisticRegression(max_iter=1000).fit(X_train, y_train) |
| Predict class | model.predict(X_test) |
| Predict probability | model.predict_proba(X_test) |
| Custom threshold | (model.predict_proba(X)[:, 1] >= 0.3).astype(int) |
| Full report | classification_report(y_test, y_pred) |
| Multi-class | works automatically, no changes needed |
| Fix convergence | add max_iter=1000 or max_iter=5000 |
Practice Challenges
Level 1:
Train logistic regression on load_digits() (10-class problem). Print accuracy. Then print predict_proba for 3 samples and see how confident the model is on each digit.
Level 2:
On the breast cancer dataset, try thresholds from 0.1 to 0.9. Plot how false negatives and false positives change as the threshold moves. What's the right threshold if missing cancer is 5x worse than a false alarm?
Level 3:
Add C=0.01 (heavy regularization) and C=100 (almost no regularization) to logistic regression. Compare train and test accuracy at both extremes. What does this tell you about the bias-variance tradeoff for this model?
References
- Scikit-learn: LogisticRegression
- Scikit-learn: Classification metrics
- StatQuest: Logistic Regression (YouTube)
- Towards Data Science: Sigmoid explained
Next up, Post 57: Decision Trees: AI That Plays 20 Questions. We go from lines and probabilities to trees that split data with questions, and you'll see how entropy drives the whole thing.