The One-Line Summary: Stratified sampling ensures that when you split your data, each split has the same proportion of each class as the original. Without it, your rare class might accidentally end up mostly in training or mostly in testing — making your evaluation meaningless.
The Pollster's Catastrophic Prediction
November 2024. Smithville is holding its mayoral election.
Pollster Pete needs to predict the winner. The city has 100,000 voters across five neighborhoods:
SMITHVILLE VOTER DISTRIBUTION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Neighborhood | Voters | Leans | %
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Downtown | 10,000 | 80% Blue | 10%
Suburbs North | 35,000 | 60% Red | 35%
Suburbs South | 30,000 | 55% Red | 30%
University | 5,000 | 90% Blue | 5%
Industrial | 20,000 | 70% Red | 20%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TOTAL | 100,000 | ~58% Red | 100%
True outcome: Red wins 58-42.
Pete's "Random" Sample
Pete surveys 1,000 random voters. But "random" has a problem...
import numpy as np

# Pete's simple random sample (illustrative pseudocode)
np.random.seed(unlucky_seed)                     # whatever seed Pete was cursed with
sample = random_sample(smithville, n=1000)
# What Pete got:
Downtown: 312 people (31.2%) ← Should be 10%!
Suburbs North: 245 people (24.5%) ← Should be 35%
Suburbs South: 198 people (19.8%) ← Should be 30%
University: 187 people (18.7%) ← Should be 5%!
Industrial: 58 people ( 5.8%) ← Should be 20%
# Pete's prediction based on this sample:
# Blue: 67% Red: 33%
# "BLUE LANDSLIDE INCOMING!"
Pete's prediction: Blue wins 67-33.
Actual result: Red wins 58-42.
Pete was off by 25 POINTS.
What Went Wrong?
Random sampling doesn't guarantee proportional representation.
By pure chance, Pete's sample over-represented Blue-leaning areas (Downtown, University) and under-represented Red-leaning areas (Industrial, Suburbs).
THE SAMPLING DISASTER:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Neighborhood | Actual % | Pete's % | Error
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Downtown | 10% | 31% | +21% (Blue area!)
University | 5% | 19% | +14% (Blue area!)
Suburbs North | 35% | 25% | -10% (Red area)
Suburbs South | 30% | 20% | -10% (Red area)
Industrial | 20% | 6% | -14% (Red area!)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Pete's sample was UNREPRESENTATIVE.
His prediction was GARBAGE.
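How often does plain random sampling wander off like this? Below is a quick simulation sketch (using the fictional neighborhood shares from the table above, with made-up trial counts) that measures how far a random sample of 1,000 voters can drift from the true proportions purely by chance:

import numpy as np

# Sketch: with the fictional shares from the table, how much can a plain
# random sample of 1,000 voters misstate a neighborhood's share by chance?
rng = np.random.default_rng(0)
true_shares = np.array([0.10, 0.35, 0.30, 0.05, 0.20])   # Downtown ... Industrial

n_trials, n_sample = 10_000, 1_000
counts = rng.multinomial(n_sample, true_shares, size=n_trials)   # one sample composition per trial
worst_error = np.abs(counts / n_sample - true_shares).max(axis=1)

print(f"Trials where some neighborhood is off by 2+ points: {(worst_error >= 0.02).mean():.1%}")
print(f"Largest deviation seen across {n_trials:,} trials: {worst_error.max():.1%}")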
Stratified Sampling Saves The Day
Smart Susan uses stratified sampling:
"I'll ensure my sample has the SAME proportions as the population."
# Susan's stratified sample
sample = stratified_sample(smithville, n=1000, stratify_by='neighborhood')
# What Susan got:
Downtown: 100 people (10.0%) ← Exactly right!
Suburbs North: 350 people (35.0%) ← Exactly right!
Suburbs South: 300 people (30.0%) ← Exactly right!
University: 50 people ( 5.0%) ← Exactly right!
Industrial: 200 people (20.0%) ← Exactly right!
# Susan's prediction:
# Blue: 41% Red: 59%
# "Red wins, close race."
Susan's prediction: Red wins 59-41.
Actual result: Red wins 58-42.
Susan was off by only 1 POINT.
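The stratified_sample call above is pseudocode. For the curious, here is one way such a helper might look with pandas — a sketch assuming the voter list is a DataFrame with a 'neighborhood' column:

import pandas as pd

# A possible stratified_sample() helper (sketch; rounding can make the total
# differ from n by a voter or two)
def stratified_sample(population: pd.DataFrame, n: int, stratify_by: str,
                      random_state: int = 0) -> pd.DataFrame:
    shares = population[stratify_by].value_counts(normalize=True)
    pieces = []
    for stratum, share in shares.items():
        k = round(n * share)                     # seats allocated to this stratum
        rows = population[population[stratify_by] == stratum]
        pieces.append(rows.sample(n=k, random_state=random_state))
    return pd.concat(pieces).sample(frac=1, random_state=random_state)   # shuffle

# sample = stratified_sample(smithville, n=1000, stratify_by='neighborhood')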
Stratified Sampling in Machine Learning
The same problem happens when splitting data for ML:
# Your fraud detection dataset
total_samples = 10_000
fraud_cases = 200        # 2%
normal_cases = 9_800     # 98%
# RANDOM split (80/20)
# What you HOPE to get:
# Train: 160 fraud (2%), 7,840 normal (98%)
# Test: 40 fraud (2%), 1,960 normal (98%)
# What you MIGHT get (bad luck):
# Train: 185 fraud (2.3%), 7,815 normal
# Test: 15 fraud (0.75%), 1,985 normal ← Almost no fraud to test on!
With only 15 fraud cases in your test set, your evaluation is statistically meaningless. One or two lucky/unlucky predictions swing your metrics wildly.
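To see why, run the quick arithmetic behind "one or two predictions swing your metrics":

# With only 15 fraud cases in the test set, each one is worth ~6.7 points of recall
n_test_fraud = 15
print(f"Recall if 10 of 15 are caught: {10 / n_test_fraud:.1%}")   # 66.7%
print(f"Recall if 11 of 15 are caught: {11 / n_test_fraud:.1%}")   # 73.3%
print(f"One extra prediction moves recall by {1 / n_test_fraud:.1%}")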
The Math: Why Random Fails
When you have rare classes, random sampling has HIGH VARIANCE:
import numpy as np

# Simulation: 10,000 samples, 2% positive class
# Random 80/20 split, repeated 1000 times
n_samples = 10000
positive_rate = 0.02
test_size = 0.2
n_simulations = 1000

test_positive_rates = []
for _ in range(n_simulations):
    # Create dataset
    y = np.random.binomial(1, positive_rate, n_samples)
    # Random split
    test_indices = np.random.choice(n_samples, int(n_samples * test_size), replace=False)
    y_test = y[test_indices]
    # Record positive rate in test set
    test_positive_rates.append(y_test.mean())
print(f"Expected positive rate: {positive_rate:.2%}")
print(f"Actual test positive rates:")
print(f" Mean: {np.mean(test_positive_rates):.2%}")
print(f" Std: {np.std(test_positive_rates):.2%}")
print(f" Min: {np.min(test_positive_rates):.2%}")
print(f" Max: {np.max(test_positive_rates):.2%}")
print(f" Range: {np.min(test_positive_rates):.2%} to {np.max(test_positive_rates):.2%}")
Output:
Expected positive rate: 2.00%
Actual test positive rates:
Mean: 2.00%
Std: 0.31%
Min: 1.10%
Max: 3.05%
Range: 1.10% to 3.05%
Your test-set positive rate varies from 1.10% to 3.05%. That's nearly a 3x difference in how many positive cases you're evaluating on, purely due to random chance.
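For contrast, here is the same simulation rerun with scikit-learn's StratifiedShuffleSplit, reusing the variables from the code above; the test-set positive rate should barely move between draws:

from sklearn.model_selection import StratifiedShuffleSplit

# Same experiment, but with a stratified splitter (reuses n_samples, positive_rate,
# test_size, n_simulations, and np from the simulation above)
y = np.random.binomial(1, positive_rate, n_samples)
X_dummy = np.zeros((n_samples, 1))   # the splitter needs an X; the values are irrelevant here

sss = StratifiedShuffleSplit(n_splits=n_simulations, test_size=test_size, random_state=0)
stratified_rates = [y[test_idx].mean() for _, test_idx in sss.split(X_dummy, y)]

print(f"Stratified test rates: {np.min(stratified_rates):.2%} to {np.max(stratified_rates):.2%}")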
Stratified Sampling: The Solution
from sklearn.model_selection import train_test_split
# WITHOUT stratification (dangerous!)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Check the distribution
print(f"Original: {y.mean():.2%} positive")
print(f"Train set: {y_train.mean():.2%} positive")
print(f"Test set: {y_test.mean():.2%} positive")
Original: 2.00% positive
Train set: 2.13% positive ← Drifted!
Test set: 1.50% positive ← Drifted more!
# WITH stratification (safe!)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y # ← Magic parameter!
)
# Check the distribution
print(f"Original: {y.mean():.2%} positive")
print(f"Train set: {y_train.mean():.2%} positive")
print(f"Test set: {y_test.mean():.2%} positive")
Original: 2.00% positive
Train set: 2.00% positive ← Exactly right!
Test set: 2.00% positive ← Exactly right!
One parameter. Problem solved.
Visual: Random vs Stratified
ORIGINAL DATA (2% positive):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○●○○○○○○○○
○○○○○○○○○○○○○○○○○○○○○●○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○
(● = positive, ○ = negative)
RANDOM SPLIT (unlucky):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Train (80%): ○○○○○○○○●○○○○○○○○○○○○○○○○○○○○○○●○○○○○○○○○○○○
(2.5% positive - too many!)
Test (20%): ○○○○○○○○○○○○○○○○○○○○○
(0% positive - NONE! 😱)
STRATIFIED SPLIT (guaranteed):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Train (80%): ○○○○○○○○○○○○○○○○○○●○○○○○○○○○○○○○○○○○○○●○○○○
(2.0% positive - exact!)
Test (20%): ○○○○○○○○○○●○○○○○○○○○○
(2.0% positive - exact! ✓)
When Stratified Sampling Is Critical
1. Imbalanced Classification
# Fraud detection: 0.1% fraud
# Disease diagnosis: 2% positive
# Churn prediction: 5% churners
# Rare event prediction: <10% positive
# ALWAYS use stratify=y for these!
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, stratify=y
)
2. Multi-Class with Rare Classes
# Image classification with 100 classes
# Some classes have 10,000 images, others have 50
# Random split might put ALL 50 rare images in training!
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, stratify=y # Preserves ALL class proportions
)
# Verify
import pandas as pd
print("Class distribution:")
print(pd.DataFrame({
'Original': pd.Series(y).value_counts(normalize=True),
'Train': pd.Series(y_train).value_counts(normalize=True),
'Test': pd.Series(y_test).value_counts(normalize=True)
}))
3. Cross-Validation
from sklearn.model_selection import StratifiedKFold, cross_val_score
# Regular KFold - DANGEROUS for imbalanced data
from sklearn.model_selection import KFold
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
# Stratified KFold - SAFE for imbalanced data
skfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# Use stratified for classification!
scores = cross_val_score(model, X, y, cv=skfold, scoring='f1')
print(f"F1 scores: {scores}")
print(f"Mean: {scores.mean():.3f} ± {scores.std():.3f}")
4. Time Series with Categories
# Sales prediction by product category
# Some categories are rare (luxury items)
# Goal: stratified sampling WITHIN time-based splits
# Note: StratifiedShuffleSplit on its own preserves category proportions,
# but it shuffles freely and ignores time order
from sklearn.model_selection import StratifiedShuffleSplit

sss = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
for train_idx, test_idx in sss.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # Each split has the correct category proportions
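When the time ordering itself matters (as it usually does in forecasting), one option is to run the stratified draw inside each time block. A rough sketch, with hypothetical 'month' and 'category' column names:

import pandas as pd
from sklearn.model_selection import train_test_split

# Rough sketch: stratified sampling WITHIN time-based blocks.
# Assumes a DataFrame with (hypothetical) 'month' and 'category' columns.
def split_within_blocks(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    train_parts, test_parts = [], []
    for _, block in df.groupby('month'):
        # Each category needs at least 2 rows per month, or train_test_split raises an error
        tr, te = train_test_split(
            block, test_size=test_size, random_state=seed,
            stratify=block['category']
        )
        train_parts.append(tr)
        test_parts.append(te)
    return pd.concat(train_parts), pd.concat(test_parts)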
The Consequences of NOT Stratifying
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.datasets import make_classification

# Create imbalanced dataset (5% positive)
X, y = make_classification(
    n_samples=2000, n_features=20,
    weights=[0.95, 0.05], random_state=42
)

# Run experiment: compare random vs stratified splits
n_experiments = 100
random_f1s = []
stratified_f1s = []

for seed in range(n_experiments):
    # Random split
    X_tr_r, X_te_r, y_tr_r, y_te_r = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Stratified split
    X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    # Train same model
    model_r = LogisticRegression(max_iter=1000).fit(X_tr_r, y_tr_r)
    model_s = LogisticRegression(max_iter=1000).fit(X_tr_s, y_tr_s)
    # Evaluate
    random_f1s.append(f1_score(y_te_r, model_r.predict(X_te_r)))
    stratified_f1s.append(f1_score(y_te_s, model_s.predict(X_te_s)))
print("F1 Score Comparison (100 experiments):")
print(f"\nRandom splits:")
print(f" Mean: {np.mean(random_f1s):.3f}")
print(f" Std: {np.std(random_f1s):.3f}")
print(f" Range: {np.min(random_f1s):.3f} - {np.max(random_f1s):.3f}")
print(f"\nStratified splits:")
print(f" Mean: {np.mean(stratified_f1s):.3f}")
print(f" Std: {np.std(stratified_f1s):.3f}")
print(f" Range: {np.min(stratified_f1s):.3f} - {np.max(stratified_f1s):.3f}")
Output:
F1 Score Comparison (100 experiments):
Random splits:
Mean: 0.542
Std: 0.089
Range: 0.286 - 0.727
Stratified splits:
Mean: 0.548
Std: 0.047
Range: 0.444 - 0.654
Key findings:
| Metric | Random | Stratified |
|---|---|---|
| Mean F1 | 0.542 | 0.548 |
| Std Dev | 0.089 | 0.047 (47% lower!) |
| Range | 0.286-0.727 | 0.444-0.654 |
Stratified sampling cut the standard deviation nearly in HALF!
With random splits, your F1 could be anywhere from 0.29 to 0.73 — a huge range that makes model comparison nearly impossible.
Stratified Sampling for Regression?
For regression, you don't have classes. But you can still stratify!
Option 1: Stratify by Binned Target
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
# House prices (continuous target)
y_prices = df['price'].values
# Create bins for stratification
y_binned = pd.qcut(y_prices, q=10, labels=False) # 10 equal-frequency bins
# Stratify by bins
X_train, X_test, y_train, y_test = train_test_split(
X, y_prices, test_size=0.2, stratify=y_binned
)
# Verify distribution
print("Price distribution preserved:")
print(f"Train mean: ${y_train.mean():,.0f}")
print(f"Test mean: ${y_test.mean():,.0f}")
print(f"Full mean: ${y_prices.mean():,.0f}")
Option 2: Stratify by Important Category
# If you have a categorical feature that matters
# (e.g., property_type for house prices)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, stratify=df['property_type']
)
# Now each property type is proportionally represented
Multi-Label Stratification
When each sample can have MULTIPLE labels:
# pip install iterative-stratification
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold
import numpy as np

# Multi-label: each sample can have multiple tags
# y shape: (n_samples, n_labels)
y_multilabel = np.array([
    [1, 0, 1],  # Sample 1: labels 0 and 2
    [0, 1, 0],  # Sample 2: label 1
    [1, 1, 1],  # Sample 3: all labels
    # ...
])

mskf = MultilabelStratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in mskf.split(X, y_multilabel):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y_multilabel[train_idx], y_multilabel[test_idx]
    # Each label's proportion is (approximately) preserved
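To confirm it worked, a quick check of each label's positive rate on one fold (reusing X and y_multilabel from above):

# Compare each label's positive rate across one train/test split
train_idx, test_idx = next(mskf.split(X, y_multilabel))
print("Train per-label rates:", y_multilabel[train_idx].mean(axis=0).round(3))
print("Test per-label rates: ", y_multilabel[test_idx].mean(axis=0).round(3))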
Group-Aware Stratified Splitting
Sometimes you need to keep groups together AND stratify:
# Medical data: multiple samples per patient
# - Can't have same patient in train AND test (data leakage!)
# - But still want stratified disease distribution
from sklearn.model_selection import StratifiedGroupKFold
# patient_ids: which patient each sample belongs to
# y: disease labels
sgkf = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in sgkf.split(X, y, groups=patient_ids):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # Patients are kept together (no leakage)
    # AND the class distribution stays as close to stratified as the groups allow
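A cheap sanity check you can run inside that loop (or on any one fold's indices) to confirm the no-leakage guarantee:

# Confirm no patient appears on both sides of the split
train_patients = set(patient_ids[train_idx])
test_patients = set(patient_ids[test_idx])
assert train_patients.isdisjoint(test_patients), "Patient leaked across the split!"
print(f"{len(train_patients)} train patients, {len(test_patients)} test patients, 0 shared")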
Common Mistakes
Mistake 1: Forgetting to Stratify with Imbalanced Data
# ❌ WRONG
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# With 1% positive class, test set might have 0 positives!
# ✅ RIGHT
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, stratify=y
)
Mistake 2: Stratifying on Wrong Variable
# ❌ WRONG: Stratifying on a feature instead of target
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, stratify=X['age_group']
)
# ✅ RIGHT: Stratify on the TARGET variable
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, stratify=y
)
# Or stratify on both if needed (assuming y and the feature are pandas Series):
stratify_col = y.astype(str) + '_' + X['age_group'].astype(str)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=stratify_col
)
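One caveat with the combined key: every (target, age_group) combination becomes its own stratum, and each needs at least two samples or train_test_split will raise an error. A quick pre-check:

import pandas as pd

# Make sure every combined stratum has at least 2 samples before splitting
stratum_counts = pd.Series(stratify_col).value_counts()
too_small = stratum_counts[stratum_counts < 2]
if not too_small.empty:
    print("Strata too small to stratify on:")
    print(too_small)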
Mistake 3: Not Stratifying in Cross-Validation
# ❌ WRONG: Regular KFold for classification
from sklearn.model_selection import KFold, cross_val_score
scores = cross_val_score(model, X, y, cv=KFold(5))
# ✅ RIGHT: StratifiedKFold for classification
from sklearn.model_selection import StratifiedKFold
scores = cross_val_score(model, X, y, cv=StratifiedKFold(5))
# Or simply:
scores = cross_val_score(model, X, y, cv=5) # Default uses StratifiedKFold for classifiers!
Mistake 4: Impossible Stratification
# ❌ ERROR: Class has fewer samples than n_splits
# If you have 3 samples of class A and want 5-fold CV...
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1] # Only 3 positives
skf = StratifiedKFold(n_splits=5) # ERROR! Can't put 3 samples into 5 folds
# ✅ FIX: Reduce n_splits or use StratifiedShuffleSplit
skf = StratifiedKFold(n_splits=3) # Works with 3 positives
# Or use repeated holdout:
from sklearn.model_selection import StratifiedShuffleSplit
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.2) # Works!
Complete Example: The Right Way
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from collections import Counter

# Build an imbalanced dataset: ~95% negative, ~5% positive
# (make_classification gives the features real signal for the model to learn,
#  so the metrics below are meaningful rather than noise)
n_samples = 5000
X, y = make_classification(
    n_samples=n_samples, n_features=20,
    weights=[0.95, 0.05], random_state=42
)
print("="*60)
print("ORIGINAL DATA")
print("="*60)
print(f"Class distribution: {Counter(y)}")
print(f"Positive rate: {y.mean():.2%}")
# STEP 1: Stratified train/test split
print("\n" + "="*60)
print("STEP 1: STRATIFIED TRAIN/TEST SPLIT")
print("="*60)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
print(f"Train - Positive rate: {y_train.mean():.2%} (n={len(y_train)})")
print(f"Test - Positive rate: {y_test.mean():.2%} (n={len(y_test)})")
print("✓ Proportions preserved!")
# STEP 2: Stratified cross-validation on training set
print("\n" + "="*60)
print("STEP 2: STRATIFIED CROSS-VALIDATION")
print("="*60)
model = RandomForestClassifier(n_estimators=100, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# Verify each fold has correct proportions
print("\nFold class distributions:")
for i, (train_idx, val_idx) in enumerate(cv.split(X_train, y_train)):
    y_fold_train = y_train[train_idx]
    y_fold_val = y_train[val_idx]
    print(f"  Fold {i+1}: Train={y_fold_train.mean():.2%}, Val={y_fold_val.mean():.2%}")
# Cross-validation scores
scores = cross_val_score(model, X_train, y_train, cv=cv, scoring='f1')
print(f"\nCV F1 Scores: {scores.round(3)}")
print(f"Mean F1: {scores.mean():.3f} ± {scores.std():.3f}")
# STEP 3: Final evaluation on test set
print("\n" + "="*60)
print("STEP 3: FINAL TEST EVALUATION")
print("="*60)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=['Negative', 'Positive']))
Output:
============================================================
ORIGINAL DATA
============================================================
Class distribution: Counter({0: 4741, 1: 259})
Positive rate: 5.18%
============================================================
STEP 1: STRATIFIED TRAIN/TEST SPLIT
============================================================
Train - Positive rate: 5.18% (n=4000)
Test - Positive rate: 5.20% (n=1000)
✓ Proportions preserved!
============================================================
STEP 2: STRATIFIED CROSS-VALIDATION
============================================================
Fold class distributions:
Fold 1: Train=5.19%, Val=5.12%
Fold 2: Train=5.16%, Val=5.25%
Fold 3: Train=5.16%, Val=5.25%
Fold 4: Train=5.19%, Val=5.12%
Fold 5: Train=5.19%, Val=5.12%
CV F1 Scores: [0.462 0.488 0.421 0.505 0.471]
Mean F1: 0.469 ± 0.029
============================================================
STEP 3: FINAL TEST EVALUATION
============================================================
precision recall f1-score support
Negative 0.97 0.99 0.98 948
Positive 0.60 0.37 0.46 52
accuracy 0.96 1000
macro avg 0.79 0.68 0.72 1000
weighted avg 0.95 0.96 0.95 1000
Every split has ~5.2% positive rate. Evaluation is reliable!
Quick Reference
When to Stratify
| Scenario | Stratify? | How |
|---|---|---|
| Binary classification | Yes | stratify=y |
| Multi-class, balanced | Optional | stratify=y |
| Multi-class, imbalanced | Yes | stratify=y |
| Regression | Optional | stratify=binned_y |
| Multi-label | Yes | MultilabelStratifiedKFold |
| Groups + classes | Yes | StratifiedGroupKFold |
The One-Line Fix
# Add this parameter to ALL your train_test_split calls:
stratify=y
# Add this to ALL your cross-validation:
cv=StratifiedKFold(n_splits=5)
Key Takeaways
Random sampling doesn't guarantee proportions — Rare classes can accidentally cluster
Stratified sampling preserves class ratios — Each split mirrors the original
Critical for imbalanced data — Without it, test sets may have few/no minority samples
Reduces evaluation variance — More reliable performance estimates
One parameter: stratify=y — Easy to implement, huge impact
Use StratifiedKFold for CV — Not regular KFold for classification
Works for regression too — Bin the target, then stratify
Check your distributions — Always verify after splitting (see the helper sketch below)
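For that last point, here is a small helper you can drop in after any split (hypothetical, written for binary labels):

import numpy as np

# Hypothetical helper: verify class balance after a split (binary 0/1 labels)
def check_split_balance(y_full, y_train, y_test, tolerance=0.005):
    rates = {'full': np.mean(y_full), 'train': np.mean(y_train), 'test': np.mean(y_test)}
    for name, rate in rates.items():
        print(f"{name:>5}: {rate:.2%} positive")
    drift = max(abs(rates['train'] - rates['full']), abs(rates['test'] - rates['full']))
    if drift > tolerance:
        print(f"WARNING: positive-rate drift of {drift:.2%}; did you forget stratify=y?")

# check_split_balance(y, y_train, y_test)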
The One-Sentence Summary
Pollster Pete surveyed 1,000 "random" voters but accidentally oversampled Blue neighborhoods and predicted a 67-33 Blue landslide when Red actually won 58-42 — stratified sampling ensures your train/test splits have the same class proportions as your full dataset, so your model isn't trained on a different reality than it's tested on.
What's Next?
Now that you understand stratified sampling, you're ready for:
- Cross-Validation Deep Dive — K-fold, Leave-One-Out, Time Series CV
- Handling Extreme Imbalance — When stratification isn't enough
- Sampling Strategies — Over-sampling, under-sampling, SMOTE
- Bootstrap Methods — Sampling with replacement
Follow me for the next article in this series!
Let's Connect!
If stratified sampling finally clicked, drop a heart!
Questions? Ask in the comments — I read and respond to every one.
Have you been burned by non-stratified splits? I once had a model with 0 positive cases in the validation set. Took hours to figure out why metrics were NaN! 😅
The difference between a reliable F1 score and one that swings wildly depending on random seed? Stratified sampling. Your test set should look like your training set should look like your real data. When that chain breaks, your evaluation is fiction.
Share this with someone who keeps getting different metrics every time they run their code. It might not be randomness in the model — it might be randomness in the split.
Happy stratifying! 🗳️