Last month I was debugging a training run that produced suspiciously bad results. The loop ran fine. No errors. No crashes. Just a model that learned nothing useful.
After three days of debugging I found it: the validation set had samples from the training set. Label leakage. The model had been cheating the entire time and I had no idea.
That was the moment I decided to build preflight.
## What is preflight?
preflight is a CLI tool you run before your training loop starts. It catches the silent failures that waste GPU time — the bugs that don't crash Python but quietly ruin your model.
```shell
pip install preflight-ml
preflight run --dataloader my_dataloader.py
```
Output:
```
preflight — pre-training check report

╭────────────────────────┬──────────┬────────┬──────────────────────────────────────────────────╮
│ Check                  │ Severity │ Status │ Message                                          │
├────────────────────────┼──────────┼────────┼──────────────────────────────────────────────────┤
│ nan_inf_detection      │ FATAL    │ PASS   │ No NaN or Inf values found in 10 sampled batches │
│ normalisation_sanity   │ WARN     │ PASS   │ Normalisation looks reasonable (mean=0.001)      │
│ channel_ordering       │ WARN     │ PASS   │ Channel ordering looks correct (NCHW)            │
│ label_leakage          │ FATAL    │ FAIL   │ Found 12/50 val samples (24%) in train set       │
│ split_sizes            │ INFO     │ PASS   │ train=800 samples, val=200 samples               │
│ vram_estimation        │ WARN     │ PASS   │ Estimated peak VRAM: 2.1 GB / 8.0 GB (26%)       │
│ class_imbalance        │ WARN     │ PASS   │ Class distribution looks balanced                │
│ shape_mismatch         │ FATAL    │ PASS   │ Model accepted input shape (3, 224, 224)         │
│ gradient_check         │ FATAL    │ PASS   │ All gradients look healthy                       │
╰────────────────────────┴──────────┴────────┴──────────────────────────────────────────────────╯

1 fatal   0 warnings   8 passed

Pre-flight failed. Fix fatal issues before training.
```
It exits with code 1 on any FATAL failure — which means it blocks CI automatically.
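The severity-to-exit-code mapping can be sketched in a few lines. This is an illustrative model, not preflight's actual internals; the `CheckResult` structure and `exit_code` function here are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    severity: str  # "FATAL", "WARN", or "INFO"
    passed: bool


def exit_code(results: list[CheckResult]) -> int:
    # Only a failed FATAL check blocks the run (and therefore CI);
    # WARN and INFO findings never affect the exit status.
    fatal_failures = [r for r in results if r.severity == "FATAL" and not r.passed]
    return 1 if fatal_failures else 0


results = [
    CheckResult("nan_inf_detection", "FATAL", True),
    CheckResult("label_leakage", "FATAL", False),
    CheckResult("class_imbalance", "WARN", True),
]
print(exit_code(results))  # 1 (label_leakage failed a FATAL check)
```

Because CI runners treat any non-zero exit status as a failed step, nothing extra is needed to make the check gate a pipeline.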
## The 10 checks

preflight runs 10 checks grouped into three severity tiers:
**FATAL** — these stop the run:

- `nan_inf_detection` — NaN or Inf values anywhere in sampled batches
- `label_leakage` — samples appearing in both train and val sets
- `shape_mismatch` — dataset output shape incompatible with model input
- `gradient_check` — zero gradients, dead layers, exploding gradients before training
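To make the first of these concrete, here is a rough pure-Python sketch of the idea behind `nan_inf_detection`. The real tool inspects actual tensors from your dataloader; this version just walks nested lists of floats, and the function names are made up for illustration:

```python
import math


def has_nan_or_inf(batch) -> bool:
    """Return True if any value in a (possibly nested) batch is NaN or Inf."""
    if isinstance(batch, (int, float)):
        return not math.isfinite(batch)
    return any(has_nan_or_inf(item) for item in batch)


def check_batches(batches, n_sample_batches=10) -> bool:
    """Pass if none of the first n_sample_batches batches contain NaN/Inf."""
    return not any(has_nan_or_inf(b) for b in batches[:n_sample_batches])


clean = [[0.5, 1.2], [3.0, -4.1]]
dirty = [[0.5, float("nan")], [3.0, -4.1]]
print(check_batches([clean]))  # True
print(check_batches([dirty]))  # False
```

Sampling a handful of batches rather than the full dataset keeps the check fast while still catching systematic corruption.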
**WARN** — these flag issues worth fixing:

- `normalisation_sanity` — data that looks unnormalised (raw pixel values etc.)
- `channel_ordering` — NHWC tensors when PyTorch expects NCHW
- `vram_estimation` — estimated peak VRAM exceeds 90% of GPU memory
- `class_imbalance` — severe class imbalance beyond a configurable threshold
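As one example of a WARN-tier check, class imbalance reduces to counting labels and comparing the rarest class's share against a threshold. A minimal sketch, assuming a "minimum share of the rarest class" semantics for the threshold (the function name and exact rule are illustrative, not preflight's actual logic):

```python
from collections import Counter


def class_imbalance_ok(labels, threshold=0.05) -> bool:
    """Pass if the rarest class holds at least `threshold` of all samples."""
    counts = Counter(labels)
    rarest_share = min(counts.values()) / len(labels)
    return rarest_share >= threshold


balanced = [0] * 100 + [1] * 90 + [2] * 110
skewed = [0] * 297 + [1] * 3  # class 1 is only 1% of the data
print(class_imbalance_ok(balanced))  # True
print(class_imbalance_ok(skewed))    # False
```

This only ever warns, because imbalance is sometimes intentional (anomaly detection, rare-event datasets), so a human should decide.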
**INFO** — these are logged for awareness:

- `split_sizes` — empty or degenerate train/val splits
- `duplicate_samples` — identical samples within a split
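The leakage and duplicate checks both reduce to the same trick: fingerprint every sample and compare sets of fingerprints. A hypothetical sketch (hashing `repr()` works for plain Python data; real tensors would need to be hashed from their raw bytes instead):

```python
import hashlib


def fingerprint(sample) -> str:
    """Hash a sample so identical samples produce identical digests."""
    return hashlib.sha256(repr(sample).encode()).hexdigest()


def find_leakage(train_samples, val_samples):
    """Return the val samples whose fingerprints also appear in the train set."""
    train_hashes = {fingerprint(s) for s in train_samples}
    return [s for s in val_samples if fingerprint(s) in train_hashes]


train = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
val = [[3.0, 4.0], [7.0, 8.0]]
print(find_leakage(train, val))  # [[3.0, 4.0]]
```

Set membership makes this O(train + val) rather than a pairwise comparison, which matters once splits reach realistic sizes.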
## Why not just use pytest?
pytest tests code logic. preflight tests data state.
These are different failure modes at different levels of the stack. A pytest suite can pass completely while your dataset has NaNs, your labels are leaking, and your tensors are in the wrong channel order. preflight fills the gap between "my code runs" and "my training will actually work."
## Why not Deepchecks or Great Expectations?

Both are excellent tools. But they're platforms — heavy, general-purpose, and slow to set up. preflight is a tool: one pip install, one command, about 30 seconds. No config required to get started.
The goal is to make running preflight feel as natural as running pytest before a commit.
## How to use it
Basic usage — just a dataloader:
```python
# my_dataloader.py
import torch
from torch.utils.data import DataLoader, TensorDataset

x = torch.randn(200, 3, 224, 224)
y = torch.randint(0, 10, (200,))
dataloader = DataLoader(TensorDataset(x, y), batch_size=32)
```
```shell
preflight run --dataloader my_dataloader.py
```
Full usage — with model, loss, and val set:
```shell
preflight run \
  --dataloader my_dataloader.py \
  --model my_model.py \
  --loss my_loss.py \
  --val-dataloader my_val_dataloader.py
```
In CI — add to your GitHub Actions workflow:
```yaml
- name: Install preflight
  run: pip install preflight-ml
- name: Run pre-flight checks
  run: preflight run --dataloader scripts/dataloader.py --format json
```
With config — add `.preflight.toml` to your repo:

```toml
[thresholds]
imbalance_threshold = 0.05
nan_sample_batches = 20

[checks]
vram_estimation = false
```
## What preflight does NOT do
This is important. preflight is a minimum safety bar, not a guarantee.
- It does not replace unit tests
- It does not guarantee a correct model
- It does not run your full training loop
- It does not catch every possible failure
Think of it like a pre-flight checklist before a flight. The pilot still needs to fly the plane.
## What's next
The roadmap for upcoming releases:
- `--fix` flag — auto-patch common issues like channel ordering and normalisation
- Dataset snapshot + drift detection (`preflight diff baseline.json new_data.pt`)
- Full dry-run mode — one batch through model + loss + backward
- Jupyter magic command (`%load_ext preflight`)
- `preflight-monai` plugin for medical imaging specific checks
- `preflight-sktime` plugin for time series checks
## Links

- GitHub: github.com/Rusheel86/preflight
- PyPI: pypi.org/project/preflight-ml
- Install: `pip install preflight-ml`
If you've ever lost hours to a silent training failure, give it a try. And if you want to contribute — especially new checks — PRs are very welcome. Every check needs a passing test, a failing test, and a fix hint. Check out CONTRIBUTING.md.