<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rusheel</title>
    <description>The latest articles on DEV Community by Rusheel (@rusheel86).</description>
    <link>https://dev.to/rusheel86</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3825363%2F0a58cb7e-b7c5-4295-9c2a-bbb83746ecd4.png</url>
      <title>DEV Community: Rusheel</title>
      <link>https://dev.to/rusheel86</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rusheel86"/>
    <language>en</language>
    <item>
      <title>I built a pre-flight check tool for PyTorch, because silent failures are the worst kind</title>
      <dc:creator>Rusheel</dc:creator>
      <pubDate>Sun, 15 Mar 2026 13:28:44 +0000</pubDate>
      <link>https://dev.to/rusheel86/i-built-a-pre-flight-check-tool-for-pytorch-because-silent-failures-are-the-worst-kind-515n</link>
      <guid>https://dev.to/rusheel86/i-built-a-pre-flight-check-tool-for-pytorch-because-silent-failures-are-the-worst-kind-515n</guid>
      <description>&lt;p&gt;Last month I was debugging a training run that produced suspiciously bad results. The loop ran fine. No errors. No crashes. Just a model that learned nothing useful.&lt;/p&gt;

&lt;p&gt;After three days of debugging I found it: the validation set had samples from the training set. Label leakage. The model had been cheating the entire time and I had no idea.&lt;/p&gt;

&lt;p&gt;That was the moment I decided to build &lt;strong&gt;preflight&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;What is preflight?&lt;/h2&gt;

&lt;p&gt;preflight is a CLI tool you run &lt;em&gt;before&lt;/em&gt; your training loop starts. It catches the silent failures that waste GPU time — the bugs that don't crash Python but quietly ruin your model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;preflight-ml
preflight run &lt;span class="nt"&gt;--dataloader&lt;/span&gt; my_dataloader.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;preflight — pre-training check report
╭────────────────────────┬──────────┬────────┬──────────────────────────────────────────────────╮
│ Check                  │ Severity │ Status │ Message                                          │
├────────────────────────┼──────────┼────────┼──────────────────────────────────────────────────┤
│ nan_inf_detection      │ FATAL    │ PASS   │ No NaN or Inf values found in 10 sampled batches │
│ normalisation_sanity   │ WARN     │ PASS   │ Normalisation looks reasonable (mean=0.001)      │
│ channel_ordering       │ WARN     │ PASS   │ Channel ordering looks correct (NCHW)            │
│ label_leakage          │ FATAL    │ FAIL   │ Found 12/50 val samples (24%) in train set       │
│ split_sizes            │ INFO     │ PASS   │ train=800 samples, val=200 samples               │
│ vram_estimation        │ WARN     │ PASS   │ Estimated peak VRAM: 2.1 GB / 8.0 GB (26%)       │
│ class_imbalance        │ WARN     │ PASS   │ Class distribution looks balanced                │
│ shape_mismatch         │ FATAL    │ PASS   │ Model accepted input shape (3, 224, 224)         │
│ gradient_check         │ FATAL    │ PASS   │ All gradients look healthy                       │
╰────────────────────────┴──────────┴────────┴──────────────────────────────────────────────────╯

  1 fatal  0 warnings  8 passed

Pre-flight failed. Fix fatal issues before training.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It exits with code 1 on any FATAL failure — which means it blocks CI automatically.&lt;/p&gt;




&lt;h2&gt;The 10 checks&lt;/h2&gt;

&lt;p&gt;preflight runs 10 checks grouped into three severity tiers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FATAL&lt;/strong&gt; — these stop the run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;nan_inf_detection&lt;/code&gt; — NaN or Inf values anywhere in sampled batches&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;label_leakage&lt;/code&gt; — samples appearing in both train and val sets&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;shape_mismatch&lt;/code&gt; — dataset output shape incompatible with model input&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gradient_check&lt;/code&gt; — zero gradients, dead layers, exploding gradients before training&lt;/li&gt;
&lt;/ul&gt;
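
&lt;p&gt;To make the FATAL tier concrete, here's roughly what a NaN/Inf scan over sampled batches boils down to. This is my own simplified sketch of the idea, not preflight's actual implementation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch
from torch.utils.data import DataLoader, TensorDataset

def scan_for_nonfinite(dataloader, num_batches=10):
    """Return True if any NaN or Inf appears in the first num_batches batches."""
    for i, (inputs, _labels) in enumerate(dataloader):
        if i == num_batches:
            break
        if not torch.isfinite(inputs).all():
            return True
    return False

# Tiny demo: a clean dataset passes, a single poisoned value fails.
x = torch.randn(64, 3, 8, 8)
y = torch.randint(0, 10, (64,))
clean = DataLoader(TensorDataset(x, y), batch_size=16)

x_bad = x.clone()
x_bad[5, 0, 0, 0] = float("nan")
dirty = DataLoader(TensorDataset(x_bad, y), batch_size=16)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A production version would presumably also scan labels and report which batch tripped it, but the core is one &lt;code&gt;torch.isfinite&lt;/code&gt; call per sampled batch.&lt;/p&gt;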

&lt;p&gt;&lt;strong&gt;WARN&lt;/strong&gt; — these flag issues worth fixing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;normalisation_sanity&lt;/code&gt; — data that looks unnormalised (raw pixel values etc.)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;channel_ordering&lt;/code&gt; — NHWC tensors when PyTorch expects NCHW&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vram_estimation&lt;/code&gt; — estimated peak VRAM exceeds 90% of GPU memory&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;class_imbalance&lt;/code&gt; — severe class imbalance beyond a configurable threshold&lt;/li&gt;
&lt;/ul&gt;
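
&lt;p&gt;The WARN heuristics are fuzzier by nature. As an illustration (again a hedged sketch, not the shipped code), an unnormalised-data heuristic can get surprisingly far just by looking at value magnitudes: a float image tensor that still contains values in the 0-255 range was probably never normalised:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch

def looks_unnormalised(batch, max_abs=50.0):
    """Heuristic: float tensors whose values reach raw-pixel magnitudes
    (e.g. 0-255) were probably never mean/std normalised."""
    return bool(batch.float().abs().max() &gt; max_abs)

raw = torch.randint(0, 256, (16, 3, 8, 8)).float()  # raw pixel values
normed = (raw / 255.0 - 0.5) / 0.5                  # rescaled to [-1, 1]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A cutoff like &lt;code&gt;max_abs=50.0&lt;/code&gt; is a judgment call, which is exactly why this tier warns instead of failing the run.&lt;/p&gt;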

&lt;p&gt;&lt;strong&gt;INFO&lt;/strong&gt; — these are logged for awareness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;split_sizes&lt;/code&gt; — empty or degenerate train/val splits&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;duplicate_samples&lt;/code&gt; — identical samples within a split&lt;/li&gt;
&lt;/ul&gt;
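
&lt;p&gt;Duplicate detection, and for that matter the FATAL &lt;code&gt;label_leakage&lt;/code&gt; check, reduces to the same trick: fingerprint each sample and compare sets. A simplified sketch (exact byte-level matches only, my own naming, not preflight's internals):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import torch

def fingerprints(samples):
    """Hash each sample's raw bytes; exact duplicates share a digest."""
    return [hashlib.sha1(t.numpy().tobytes()).hexdigest() for t in samples]

train = [torch.randn(3, 8, 8) for _ in range(10)]
val = [torch.randn(3, 8, 8) for _ in range(4)]
val[0] = train[2].clone()  # simulate one leaked sample

leaked = set(fingerprints(train)).intersection(fingerprints(val))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Byte-level hashing misses near-duplicates (a resized or re-encoded copy of the same image hashes differently), which is one more reason a tool like this can only ever be a minimum bar.&lt;/p&gt;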




&lt;h2&gt;Why not just use pytest?&lt;/h2&gt;

&lt;p&gt;pytest tests code logic. preflight tests data state.&lt;/p&gt;

&lt;p&gt;These are different failure modes at different levels of the stack. A pytest suite can pass completely while your dataset has NaNs, your labels are leaking, and your tensors are in the wrong channel order. preflight fills the gap between "my code runs" and "my training will actually work."&lt;/p&gt;




&lt;h2&gt;Why not Deepchecks or Great Expectations?&lt;/h2&gt;

&lt;p&gt;Both are excellent tools. But they're platforms: heavy, general-purpose, and they take time to set up. preflight is a tool. One pip install, one command, 30 seconds. No config required to get started.&lt;/p&gt;

&lt;p&gt;The goal is to make running preflight feel as natural as running pytest before a commit.&lt;/p&gt;




&lt;h2&gt;How to use it&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Basic usage&lt;/strong&gt; — just a dataloader:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# my_dataloader.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torch.utils.data&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TensorDataset&lt;/span&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;224&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;224&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;
&lt;span class="n"&gt;dataloader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DataLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;TensorDataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;preflight run &lt;span class="nt"&gt;--dataloader&lt;/span&gt; my_dataloader.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Full usage&lt;/strong&gt; — with model, loss, and val set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;preflight run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dataloader&lt;/span&gt; my_dataloader.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model&lt;/span&gt; my_model.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--loss&lt;/span&gt; my_loss.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--val-dataloader&lt;/span&gt; my_val_dataloader.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
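
&lt;p&gt;The post's examples don't show &lt;code&gt;my_model.py&lt;/code&gt; or &lt;code&gt;my_loss.py&lt;/code&gt;. By analogy with the dataloader file above I'd expect preflight to pick up module-level variables, but that's my assumption; check the README for the actual contract. A minimal pair might look like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# my_model.py -- hypothetical companion to the dataloader example.
# Assumes preflight discovers a module-level `model`, the same way the
# dataloader example exposes a module-level `dataloader`.
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

# my_loss.py would follow the same pattern:
loss = nn.CrossEntropyLoss()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;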



&lt;p&gt;&lt;strong&gt;In CI&lt;/strong&gt; — add to your GitHub Actions workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install preflight&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pip install preflight-ml&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run pre-flight checks&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;preflight run --dataloader scripts/dataloader.py --format json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With config&lt;/strong&gt; — add &lt;code&gt;.preflight.toml&lt;/code&gt; to your repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[thresholds]&lt;/span&gt;
&lt;span class="py"&gt;imbalance_threshold&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;
&lt;span class="py"&gt;nan_sample_batches&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;

&lt;span class="nn"&gt;[checks]&lt;/span&gt;
&lt;span class="py"&gt;vram_estimation&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;What preflight does NOT do&lt;/h2&gt;

&lt;p&gt;This is important. preflight is a minimum safety bar, not a guarantee.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It does not replace unit tests&lt;/li&gt;
&lt;li&gt;It does not guarantee a correct model&lt;/li&gt;
&lt;li&gt;It does not run your full training loop&lt;/li&gt;
&lt;li&gt;It does not catch every possible failure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it like an aircraft pre-flight checklist: it catches the known problems on the ground, but the pilot still has to fly the plane.&lt;/p&gt;




&lt;h2&gt;What's next&lt;/h2&gt;

&lt;p&gt;The roadmap for upcoming releases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--fix&lt;/code&gt; flag — auto-patch common issues like channel ordering and normalisation&lt;/li&gt;
&lt;li&gt;Dataset snapshot + drift detection (&lt;code&gt;preflight diff baseline.json new_data.pt&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Full dry-run mode — one batch through model + loss + backward&lt;/li&gt;
&lt;li&gt;Jupyter magic command (&lt;code&gt;%load_ext preflight&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;preflight-monai&lt;/code&gt; plugin for medical imaging specific checks&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;preflight-sktime&lt;/code&gt; plugin for time series checks&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/Rusheel86/preflight" rel="noopener noreferrer"&gt;github.com/Rusheel86/preflight&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;PyPI: &lt;a href="https://pypi.org/project/preflight-ml/" rel="noopener noreferrer"&gt;pypi.org/project/preflight-ml&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Install: &lt;code&gt;pip install preflight-ml&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've ever lost hours to a silent training failure, give it a try. And if you want to contribute — especially new checks — PRs are very welcome. Every check needs a passing test, a failing test, and a fix hint. Check out &lt;a href="https://github.com/Rusheel86/preflight/blob/main/CONTRIBUTING.md" rel="noopener noreferrer"&gt;CONTRIBUTING.md&lt;/a&gt;.&lt;/p&gt;




</description>
      <category>pytorch</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
