Sachin Kr. Rajput

Imbalanced Datasets: When Your Model Gets 99% Accuracy by Being Completely Useless

The One-Line Summary: Imbalanced datasets trick models into ignoring the minority class. The fix? Resample your data, adjust class weights, change your metrics, or use algorithms designed for imbalance.


The Lazy Security Guard

Meet Gary, a security guard at a museum.

In 10 years, there have been exactly 3 attempted thefts. Everything else? Normal visitors.

Gary develops a strategy:

"Everyone is innocent. I'll never stop anyone."

His performance review comes in:

Days worked:         3,650
Correct predictions: 3,647  (normal visitors correctly ignored)
Wrong predictions:   3      (thieves walked right past)

ACCURACY: 99.92%!

Gary gets Employee of the Month. He's almost never wrong!

But Gary is completely useless. He caught exactly ZERO thieves. His entire job is catching thieves, and he's failed at 100% of the cases that mattered.


This is your machine learning model on imbalanced data.

When 99.9% of your data is one class, the model learns Gary's strategy:

"Just predict the majority class. You'll be right almost all the time!"

High accuracy. Zero usefulness.


What Is Class Imbalance?

Class imbalance occurs when one class vastly outnumbers another.

Balanced Dataset:
Class A: ████████████████████ 50%
Class B: ████████████████████ 50%

Imbalanced Dataset:
Class A: ████████████████████████████████████████ 99%
Class B: █ 1%

Severely Imbalanced:
Class A: ████████████████████████████████████████ 99.9%
Class B: . 0.1%

Real-world examples:

Domain                  Minority Class   Typical Ratio
Fraud Detection         Fraud            1:1,000
Medical Diagnosis       Disease          1:100 to 1:10,000
Spam Detection          Spam             1:10 to 1:100
Manufacturing Defects   Defective        1:1,000
Customer Churn          Churned          1:5 to 1:20
Click-Through Rate      Clicked          1:100 to 1:1,000

In all these cases, the minority class is exactly what you care about. And it's exactly what your model ignores.


The Accuracy Trap

Let me prove how misleading accuracy becomes.

import numpy as np
from sklearn.metrics import accuracy_score, classification_report

# Imbalanced dataset: 1% fraud
np.random.seed(42)
n = 10000
y_true = np.array([0] * 9900 + [1] * 100)  # 99% normal, 1% fraud

# The "Lazy Gary" model: Always predict 0 (not fraud)
y_pred_lazy = np.zeros(n)

# The accuracy looks amazing!
print(f"Accuracy: {accuracy_score(y_true, y_pred_lazy):.1%}")
print("\nBut look at the full picture:")
print(classification_report(y_true, y_pred_lazy, target_names=['Normal', 'Fraud']))

Output:

Accuracy: 99.0%

But look at the full picture:
              precision    recall  f1-score   support

      Normal       0.99      1.00      0.99      9900
       Fraud       0.00      0.00      0.00       100

    accuracy                           0.99     10000
   macro avg       0.49      0.50      0.50     10000
weighted avg       0.98      0.99      0.98     10000

99% accuracy! But:

  • Fraud Recall: 0% — Caught zero frauds
  • Fraud Precision: 0% — Never even tried
  • Fraud F1: 0% — Complete failure at the actual task

The model is useless. It just learned to say "not fraud" every time.


The Metrics That Matter

When classes are imbalanced, forget accuracy. Use these instead:

Precision

"Of all the things I flagged as fraud, how many actually were?"

Precision = True Positives / (True Positives + False Positives)

High precision = Few false alarms

Recall (Sensitivity)

"Of all the actual frauds, how many did I catch?"

Recall = True Positives / (True Positives + False Negatives)

High recall = Caught most frauds

F1 Score

"The balance between precision and recall"

F1 = 2 × (Precision × Recall) / (Precision + Recall)

F1 = 0 means complete failure on the minority class

Area Under ROC Curve (AUC-ROC)

"How well does the model separate the two classes across all thresholds?"

AUC = 0.5 → Random guessing
AUC = 1.0 → Perfect separation

Area Under Precision-Recall Curve (AUC-PR)

"Better than ROC for severe imbalance"

When imbalance is extreme, AUC-PR is more informative than AUC-ROC.
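
Here's a minimal sketch of computing all of these with scikit-learn. It assumes a fitted classifier called model plus X_test and y_test from your own code, so adapt the names as needed:

from sklearn.metrics import (precision_score, recall_score, f1_score,
                             roc_auc_score, average_precision_score)

# Hard predictions for precision/recall/F1, probabilities for the curve-based metrics
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]

print(f"Precision: {precision_score(y_test, y_pred):.3f}")
print(f"Recall:    {recall_score(y_test, y_pred):.3f}")
print(f"F1:        {f1_score(y_test, y_pred):.3f}")
print(f"AUC-ROC:   {roc_auc_score(y_test, y_proba):.3f}")
print(f"AUC-PR:    {average_precision_score(y_test, y_proba):.3f}")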


The Arsenal: Every Way to Handle Imbalance

Strategy 1: Resample the Data

Option A: Oversample the Minority Class

The idea: Duplicate minority class examples until classes are balanced.

from sklearn.utils import resample
import pandas as pd

# Assume df is a DataFrame with feature columns plus a 'target' column (0 = normal, 1 = fraud)
# Original: 9900 normal, 100 fraud
df_majority = df[df['target'] == 0]
df_minority = df[df['target'] == 1]

# Oversample minority to match majority
df_minority_upsampled = resample(
    df_minority,
    replace=True,              # Sample with replacement
    n_samples=len(df_majority), # Match majority count
    random_state=42
)

# Combine
df_balanced = pd.concat([df_majority, df_minority_upsampled])
print(df_balanced['target'].value_counts())
# 0    9900
# 1    9900  ← Now balanced!

Visual:

Before:
Normal: ████████████████████████████████████████ 9900
Fraud:  █ 100

After oversampling:
Normal: ████████████████████████████████████████ 9900
Fraud:  ████████████████████████████████████████ 9900 (duplicates)

Pros: Simple, keeps all data
Cons: Can cause overfitting (model memorizes duplicates)


Option B: Undersample the Majority Class

The idea: Randomly remove majority class examples until balanced.

# Undersample majority to match minority
df_majority_downsampled = resample(
    df_majority,
    replace=False,             # No replacement
    n_samples=len(df_minority), # Match minority count
    random_state=42
)

# Combine
df_balanced = pd.concat([df_majority_downsampled, df_minority])
print(df_balanced['target'].value_counts())
# 0    100
# 1    100  ← Balanced, but tiny!

Visual:

Before:
Normal: ████████████████████████████████████████ 9900
Fraud:  █ 100

After undersampling:
Normal: █ 100 (threw away 9800!)
Fraud:  █ 100

Pros: Fast training, no duplicates
Cons: Throws away potentially useful data!


Option C: SMOTE (Synthetic Minority Oversampling)

The idea: Don't just duplicate — CREATE NEW synthetic minority examples.

SMOTE finds a minority example, looks at its nearest neighbors, and creates new examples along the line between them.

from imblearn.over_sampling import SMOTE
import numpy as np

# X, y: the imbalanced feature matrix and labels from earlier
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X, y)

print(f"Before: {dict(zip(*np.unique(y, return_counts=True)))}")
print(f"After:  {dict(zip(*np.unique(y_resampled, return_counts=True)))}")

Output:

Before: {0: 9900, 1: 100}
After:  {0: 9900, 1: 9900}

Visual:

Original minority points:     ●       ●       ●

SMOTE creates synthetic points along lines:
                             ●   ◐   ●   ◐   ●
                               ↑       ↑
                          Synthetic points!

Pros: Creates diverse examples, not just duplicates
Cons: Can create unrealistic examples, requires imbalanced-learn
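
If you're curious what the interpolation step looks like, here's a rough sketch of the core idea (a simplification, not the library's actual implementation):

import numpy as np

rng = np.random.default_rng(42)

def synthetic_point(x_i, x_neighbor):
    # Pick a random spot on the line segment between a minority example
    # and one of its minority-class nearest neighbors
    lam = rng.uniform(0, 1)
    return x_i + lam * (x_neighbor - x_i)

x_i = np.array([1.0, 2.0])
x_neighbor = np.array([3.0, 2.5])
print(synthetic_point(x_i, x_neighbor))  # lands somewhere between the two points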


Option D: SMOTE Variants

from imblearn.over_sampling import SMOTE, ADASYN, BorderlineSMOTE

# Standard SMOTE
smote = SMOTE(random_state=42)

# ADASYN: Creates more synthetics in harder regions
adasyn = ADASYN(random_state=42)

# Borderline SMOTE: Focuses on decision boundary
borderline = BorderlineSMOTE(random_state=42)

Variant           Strategy
SMOTE             Uniform synthetic generation
ADASYN            More synthetics where the minority class is harder to learn
BorderlineSMOTE   Focus on examples near the decision boundary
SMOTE-NC          Handles mixed numerical and categorical features

Strategy 2: Adjust Class Weights

The idea: Don't change the data — change how much the model CARES about each class.

Make errors on the minority class MORE EXPENSIVE.

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Option 1: Automatic balancing
model = LogisticRegression(class_weight='balanced')
model.fit(X_train, y_train)

# Option 2: Manual weights
# Fraud errors are 100x more costly than normal errors
model = LogisticRegression(class_weight={0: 1, 1: 100})
model.fit(X_train, y_train)

# Works with many sklearn models!
rf = RandomForestClassifier(class_weight='balanced')

What class_weight='balanced' does:

weight = n_samples / (n_classes × n_samples_per_class)

For 9900 normal, 100 fraud:
  Normal weight: 10000 / (2 × 9900) = 0.505
  Fraud weight:  10000 / (2 × 100)  = 50.0

Fraud errors now count roughly 100x more!
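
You don't have to do that arithmetic by hand. A quick sketch to check what scikit-learn will use, assuming the y_train array from earlier:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
print(dict(zip(np.unique(y_train), weights)))
# e.g. {0: 0.505..., 1: 50.0} for 9900 normal vs 100 fraud examples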

Pros: No data manipulation, simple, no duplicated samples to memorize
Cons: Not all algorithms support it


Strategy 3: Change Your Algorithm

Some algorithms handle imbalance better than others.

Balanced Random Forest

from imblearn.ensemble import BalancedRandomForestClassifier

# Automatically balances each bootstrap sample
model = BalancedRandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

Easy Ensemble

from imblearn.ensemble import EasyEnsembleClassifier

# Creates multiple balanced subsets and ensembles them
model = EasyEnsembleClassifier(n_estimators=10, random_state=42)
model.fit(X_train, y_train)

RUSBoost (Random Under-Sampling + Boosting)

from imblearn.ensemble import RUSBoostClassifier

model = RUSBoostClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

Strategy 4: Change Your Threshold

By default, models use 0.5 as the threshold:

probability >= 0.5 → Predict positive
probability < 0.5  → Predict negative

But who said 0.5 is right?

Lower the threshold to catch more of the minority class:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
import numpy as np

model = LogisticRegression()
model.fit(X_train, y_train)

# Get probabilities instead of predictions
y_proba = model.predict_proba(X_test)[:, 1]

# Try different thresholds
for threshold in [0.5, 0.3, 0.2, 0.1]:
    y_pred = (y_proba >= threshold).astype(int)
    recall = recall_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    print(f"Threshold {threshold}: Recall={recall:.1%}, Precision={precision:.1%}")

Output:

Threshold 0.5: Recall=20.0%, Precision=85.0%
Threshold 0.3: Recall=45.0%, Precision=72.0%
Threshold 0.2: Recall=65.0%, Precision=58.0%
Threshold 0.1: Recall=85.0%, Precision=35.0%

Lower threshold = Higher recall, Lower precision

Find the sweet spot for your use case!
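
Instead of hand-picking a few thresholds, you can sweep every candidate at once with precision_recall_curve. A sketch, reusing y_test and y_proba from the snippet above:

import numpy as np
from sklearn.metrics import precision_recall_curve

precision, recall, thresholds = precision_recall_curve(y_test, y_proba)

# precision/recall have one extra trailing entry, so trim them to match thresholds
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best = np.argmax(f1)
print(f"Best threshold by F1: {thresholds[best]:.3f} "
      f"(precision={precision[best]:.1%}, recall={recall[best]:.1%})")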


Strategy 5: Anomaly Detection Approach

When imbalance is EXTREME (fraud is 0.01%), treat it as anomaly detection.

The idea: Train only on the majority class. Flag anything that doesn't fit.

from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

# Train only on normal transactions
X_normal = X_train[y_train == 0]

# Isolation Forest
iso_forest = IsolationForest(contamination=0.01, random_state=42)
iso_forest.fit(X_normal)

# Predictions: 1 = normal, -1 = anomaly
predictions = iso_forest.predict(X_test)
y_pred = (predictions == -1).astype(int)  # Convert to 0/1

# One-Class SVM works the same way: fit on normal data only, then flag outliers
oc_svm = OneClassSVM(nu=0.01)
oc_svm.fit(X_normal)

Pros: Works with extreme imbalance, doesn't need minority labels for training
Cons: Less accurate than supervised methods when you have enough minority examples


Strategy 6: Cost-Sensitive Learning

The idea: Define explicit costs for different types of errors.

                    Predicted
                 Normal    Fraud
Actual  Normal     $0      $10    (false alarm: investigate cost)
        Fraud     $1000     $0    (missed fraud: loss to company)

Missing a fraud costs 100x more than a false alarm. Build this into your model.

# XGBoost with custom scale_pos_weight
import xgboost as xgb

# Rule of thumb: scale_pos_weight = n_negatives / n_positives (here 9900 / 100 = 99)
model = xgb.XGBClassifier(scale_pos_weight=99)
model.fit(X_train, y_train)
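
You can also use the cost matrix to pick a decision threshold directly: choose the cutoff that minimizes expected dollar cost. A sketch, assuming y_test and predicted probabilities y_proba from a fitted model:

import numpy as np

COST_FALSE_ALARM = 10     # investigating a normal transaction
COST_MISSED_FRAUD = 1000  # letting a fraud through

def expected_cost(y_true, y_proba, threshold):
    y_pred = (y_proba >= threshold).astype(int)
    false_alarms = np.sum((y_pred == 1) & (y_true == 0))
    missed_frauds = np.sum((y_pred == 0) & (y_true == 1))
    return false_alarms * COST_FALSE_ALARM + missed_frauds * COST_MISSED_FRAUD

thresholds = np.linspace(0.01, 0.99, 99)
best_t = min(thresholds, key=lambda t: expected_cost(y_test, y_proba, t))
print(f"Cost-minimizing threshold: {best_t:.2f}")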

Complete Code: Comparing All Strategies

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, f1_score, recall_score, precision_score
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.ensemble import BalancedRandomForestClassifier
import warnings
warnings.filterwarnings('ignore')

# Create imbalanced dataset (5% minority)
X, y = make_classification(
    n_samples=10000,
    n_features=20,
    n_informative=10,
    n_redundant=5,
    weights=[0.95, 0.05],  # 95% class 0, 5% class 1
    random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training set class distribution:")
print(f"  Class 0: {sum(y_train==0)} ({sum(y_train==0)/len(y_train):.1%})")
print(f"  Class 1: {sum(y_train==1)} ({sum(y_train==1)/len(y_train):.1%})")
print()

results = []

# 1. Baseline: No handling
print("=" * 60)
print("1. BASELINE (No imbalance handling)")
print("=" * 60)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=['Majority', 'Minority']))
results.append(('Baseline', f1_score(y_test, y_pred), recall_score(y_test, y_pred)))

# 2. Class weights
print("=" * 60)
print("2. CLASS WEIGHTS (balanced)")
print("=" * 60)
model = LogisticRegression(class_weight='balanced', max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=['Majority', 'Minority']))
results.append(('Class Weights', f1_score(y_test, y_pred), recall_score(y_test, y_pred)))

# 3. Random Oversampling
print("=" * 60)
print("3. RANDOM OVERSAMPLING")
print("=" * 60)
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=42)
X_ros, y_ros = ros.fit_resample(X_train, y_train)
model = LogisticRegression(max_iter=1000)
model.fit(X_ros, y_ros)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=['Majority', 'Minority']))
results.append(('Oversampling', f1_score(y_test, y_pred), recall_score(y_test, y_pred)))

# 4. SMOTE
print("=" * 60)
print("4. SMOTE")
print("=" * 60)
smote = SMOTE(random_state=42)
X_smote, y_smote = smote.fit_resample(X_train, y_train)
model = LogisticRegression(max_iter=1000)
model.fit(X_smote, y_smote)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=['Majority', 'Minority']))
results.append(('SMOTE', f1_score(y_test, y_pred), recall_score(y_test, y_pred)))

# 5. Random Undersampling
print("=" * 60)
print("5. RANDOM UNDERSAMPLING")
print("=" * 60)
rus = RandomUnderSampler(random_state=42)
X_rus, y_rus = rus.fit_resample(X_train, y_train)
model = LogisticRegression(max_iter=1000)
model.fit(X_rus, y_rus)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=['Majority', 'Minority']))
results.append(('Undersampling', f1_score(y_test, y_pred), recall_score(y_test, y_pred)))

# 6. Balanced Random Forest
print("=" * 60)
print("6. BALANCED RANDOM FOREST")
print("=" * 60)
model = BalancedRandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=['Majority', 'Minority']))
results.append(('Balanced RF', f1_score(y_test, y_pred), recall_score(y_test, y_pred)))

# Summary
print("=" * 60)
print("SUMMARY")
print("=" * 60)
print(f"{'Method':<20} {'F1 Score':>12} {'Recall':>12}")
print("-" * 46)
for method, f1, recall in results:
    print(f"{method:<20} {f1:>12.1%} {recall:>12.1%}")

Output:

Training set class distribution:
  Class 0: 7591 (94.9%)
  Class 1: 409 (5.1%)

============================================================
1. BASELINE (No imbalance handling)
============================================================
              precision    recall  f1-score   support

    Majority       0.96      0.99      0.98      1909
    Minority       0.71      0.42      0.53        91

    accuracy                           0.96      2000

============================================================
2. CLASS WEIGHTS (balanced)
============================================================
              precision    recall  f1-score   support

    Majority       0.98      0.93      0.95      1909
    Minority       0.40      0.70      0.51        91

    accuracy                           0.92      2000

============================================================
SUMMARY
============================================================
Method                   F1 Score       Recall
----------------------------------------------
Baseline                    52.8%        41.8%
Class Weights               50.8%        70.3%
Oversampling                52.7%        62.6%
SMOTE                       54.1%        64.8%
Undersampling               48.2%        72.5%
Balanced RF                 55.3%        61.5%

Key insight: Baseline has the worst recall (41.8%). All imbalance techniques improve recall, but at different precision costs. Choose based on your priorities!


The Precision-Recall Tradeoff

Every imbalance technique faces this tradeoff:

                HIGH PRECISION              HIGH RECALL
                "Few false alarms"          "Catch everything"
                        │                         │
                        │                         │
                        ▼                         ▼
Baseline:       ████████████████░░░░░░░░░░░░░░░░░░░░░░
Class Weights:  ████████████░░░░░░░░░░░░░░░░░████████░░
SMOTE:          ██████████████░░░░░░░░░░░░░░░░████████░░
Undersampling:  ████████░░░░░░░░░░░░░░░░░░░░░░██████████

               ├─────────────────┼─────────────────────┤
               Precision         │              Recall
                                 │
                        Your sweet spot
                        depends on cost!

If missing fraud costs $1000 but false alarm costs $10:
→ Prioritize recall! Catch all frauds, tolerate false alarms.

If false alarms annoy customers and cause churn:
→ Balance precision and recall. Don't cry wolf too often.


Which Strategy When?

START
  │
  ▼
How severe is the imbalance?
  │
  ├── Mild (10-30% minority)
  │     │
  │     └──► Class weights usually enough
  │          Try: class_weight='balanced'
  │
  ├── Moderate (1-10% minority)
  │     │
  │     └──► SMOTE or Class weights
  │          Try: SMOTE + class_weight
  │
  └── Severe (<1% minority)
        │
        └──► Combine multiple strategies
             Try: SMOTE + class_weight + threshold tuning
             Or: Anomaly detection approach
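
For the severe case, here's a minimal sketch of what "combine multiple strategies" can look like: SMOTE inside an imblearn pipeline, class weights on the classifier, and threshold tuning on a held-out validation set (X_train, y_train, X_val, y_val are assumed):

import numpy as np
from imblearn.pipeline import Pipeline as ImbPipeline
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

pipeline = ImbPipeline([
    ('smote', SMOTE(random_state=42)),
    ('clf', LogisticRegression(class_weight='balanced', max_iter=1000)),
])
pipeline.fit(X_train, y_train)  # SMOTE is applied only during fit, never at predict time

# Tune the threshold on a validation set, not on the test set
y_proba = pipeline.predict_proba(X_val)[:, 1]
candidates = np.linspace(0.05, 0.95, 19)
best_t = max(candidates, key=lambda t: f1_score(y_val, (y_proba >= t).astype(int)))
print(f"Best threshold by F1: {best_t:.2f}")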

Common Mistakes

Mistake 1: Using Accuracy as Your Metric

# ❌ WRONG: Accuracy is misleading!
print(f"Accuracy: {accuracy_score(y_test, y_pred):.1%}")  # 99%! 🎉

# ✅ RIGHT: Use F1, Recall, Precision, AUC
print(f"F1: {f1_score(y_test, y_pred):.1%}")
print(f"Recall: {recall_score(y_test, y_pred):.1%}")
print(f"Precision: {precision_score(y_test, y_pred):.1%}")

Mistake 2: Resampling Before Train-Test Split

# ❌ WRONG: Data leakage! Synthetic test samples based on training data
X_smote, y_smote = SMOTE().fit_resample(X, y)
X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote)

# ✅ RIGHT: Split first, then resample only training data
X_train, X_test, y_train, y_test = train_test_split(X, y)
X_train_smote, y_train_smote = SMOTE().fit_resample(X_train, y_train)
model.fit(X_train_smote, y_train_smote)
model.predict(X_test)  # Test set is untouched!

Mistake 3: Resampling Incorrectly Inside Cross-Validation

from sklearn.model_selection import cross_val_score

# ❌ WRONG: Resampling before CV causes leakage
X_smote, y_smote = SMOTE().fit_resample(X, y)
cross_val_score(model, X_smote, y_smote, cv=5)

# ✅ RIGHT: Use imblearn Pipeline
from imblearn.pipeline import Pipeline as ImbPipeline

pipeline = ImbPipeline([
    ('smote', SMOTE(random_state=42)),
    ('classifier', LogisticRegression())
])
cross_val_score(pipeline, X, y, cv=5, scoring='f1')

Mistake 4: Ignoring the Business Context

# ❌ WRONG: Optimizing F1 without thinking about costs
model = optimize_for_f1(model)

# ✅ RIGHT: Consider actual business costs
# If missing fraud costs $10,000 and false alarm costs $50:
# Recall matters 200x more than precision!
# Optimize accordingly.

Mistake 5: Not Trying Multiple Approaches

# ❌ WRONG: Just using SMOTE because you heard it's good
X_smote, y_smote = SMOTE().fit_resample(X_train, y_train)

# ✅ RIGHT: Compare multiple approaches
strategies = [
    ('Baseline', X_train, y_train),
    ('SMOTE', *SMOTE().fit_resample(X_train, y_train)),
    ('Class Weight', X_train, y_train),  # with class_weight='balanced'
    ('Undersampling', *RandomUnderSampler().fit_resample(X_train, y_train)),
]

# Evaluate each and pick the best for YOUR use case

The Decision Cheat Sheet

Situation                    Best Approach
Quick fix, any algorithm     class_weight='balanced'
Tree-based models            Balanced Random Forest
Need to preserve all data    SMOTE
Huge dataset, need speed     Undersampling
Extreme imbalance (<0.1%)    Anomaly detection
Production system            Threshold tuning
Maximum performance          Combine SMOTE + weights + threshold

The Imbalanced-Learn Toolkit

# Install: pip install imbalanced-learn

# === OVERSAMPLING ===
from imblearn.over_sampling import (
    RandomOverSampler,    # Simple duplication
    SMOTE,                # Synthetic generation
    ADASYN,               # Adaptive synthetic
    BorderlineSMOTE,      # Focus on boundary
)

# === UNDERSAMPLING ===
from imblearn.under_sampling import (
    RandomUnderSampler,   # Random removal
    TomekLinks,           # Remove Tomek links
    NearMiss,             # Keep informative majorities
)

# === COMBINATION ===
from imblearn.combine import (
    SMOTETomek,           # SMOTE + Tomek cleaning
    SMOTEENN,             # SMOTE + ENN cleaning
)

# === ENSEMBLE ===
from imblearn.ensemble import (
    BalancedRandomForestClassifier,
    BalancedBaggingClassifier,
    EasyEnsembleClassifier,
    RUSBoostClassifier,
)

# === PIPELINE ===
from imblearn.pipeline import Pipeline  # Use this, not sklearn's!

Key Takeaways

  1. Accuracy is a lie with imbalanced data — use F1, recall, precision, AUC

  2. The model isn't stupid — it's doing exactly what you asked (minimize errors)

  3. Class weights are the easiest fix — just add class_weight='balanced'

  4. SMOTE creates synthetic examples — better than simple duplication

  5. Resample AFTER train-test split — never before, or you'll leak data

  6. Threshold tuning is powerful — 0.5 isn't magic

  7. Combine strategies for best results — SMOTE + weights + threshold

  8. Know your costs — precision vs recall depends on business impact


The One-Sentence Summary

When 99% of your data is one class, your model becomes Gary the lazy security guard — 99% accurate, 0% useful. Fix it by making minority mistakes expensive, creating synthetic minorities, or changing how you measure success.


What's Next?

Now that you understand imbalanced datasets, you're ready for:

  • Precision-Recall Curves — Finding the optimal threshold
  • Cost-Sensitive Learning — Building business costs into your model
  • Anomaly Detection Deep Dive — When imbalance is extreme
  • Stratified Sampling — Preserving class ratios in splits

Follow me for the next article in this series!


Let's Connect!

If this saved your imbalanced model, drop a heart!

Questions? Ask in the comments — I read and respond to every one.

What's the most imbalanced dataset you've worked with? I've seen 1:100,000. Share your stories!


The difference between a fraud detection model that catches fraudsters and one that just says "everything is fine"? Understanding that 99% accuracy can mean 0% usefulness. Don't be Gary.


Share this with someone whose model has 99% accuracy but catches nothing. They need to meet Gary.

Happy balancing!
