<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vimal Nakrani</title>
    <description>The latest articles on DEV Community by Vimal Nakrani (@vimal_nakrani).</description>
    <link>https://dev.to/vimal_nakrani</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3983002%2F6923f662-042c-41fe-aac4-7fda94f8eb46.png</url>
      <title>DEV Community: Vimal Nakrani</title>
      <link>https://dev.to/vimal_nakrani</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vimal_nakrani"/>
    <language>en</language>
    <item>
      <title>I Tested Quantized Unlimited-OCR on Mac. 4-Bit Was Not the Sweet Spot.</title>
      <dc:creator>Vimal Nakrani</dc:creator>
      <pubDate>Sat, 04 Jul 2026 14:24:14 +0000</pubDate>
      <link>https://dev.to/vimal_nakrani/i-tested-quantized-unlimited-ocr-on-mac-4-bit-was-not-the-sweet-spot-f2k</link>
      <guid>https://dev.to/vimal_nakrani/i-tested-quantized-unlimited-ocr-on-mac-4-bit-was-not-the-sweet-spot-f2k</guid>
      <description>&lt;p&gt;A quantization "quality ladder" where the full-precision model performs worse than its own 4-bit version is not really measuring quality.&lt;/p&gt;

&lt;p&gt;It is measuring noise.&lt;/p&gt;

&lt;p&gt;I kept running into this while looking at quantized versions of &lt;strong&gt;Unlimited-OCR&lt;/strong&gt;, Baidu's new 3B OCR model released under the MIT license. Most quantized repos had one of two problems: either they shipped no quality numbers at all, or the numbers were difficult to trust.&lt;/p&gt;

&lt;p&gt;Some baselines were losing to their own quants. Some character error rates (CER) were above 100%, which is only possible when the output is longer than the reference text. In many cases, runaway repetition loops were silently inflating the averages.&lt;/p&gt;

&lt;p&gt;So I built the ladder I wanted to read.&lt;/p&gt;

&lt;p&gt;I tested the two runtimes people are most likely to use locally on a Mac:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MLX&lt;/strong&gt; for Apple Silicon&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GGUF&lt;/strong&gt; through llama.cpp&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every number below comes from a reproducible evaluation run, and the full harness is open source.&lt;/p&gt;

&lt;p&gt;The results were useful, but two findings genuinely surprised me.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I Did This
&lt;/h2&gt;

&lt;p&gt;Quantization benchmarks are easy to make look better than they are.&lt;/p&gt;

&lt;p&gt;A model can pass a quick smoke test on one clean page and still fall apart on dense text, invoices, tables, or small fonts. For OCR, that matters. The difference between "mostly works" and "silently corrupts numbers" is not small.&lt;/p&gt;

&lt;p&gt;I wanted to answer a more practical question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If I run this model locally, how much quality do I actually lose at each quantization level?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not in theory. Not by vibes. On a controlled OCR task with known ground truth.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Methodology Most People Skip
&lt;/h2&gt;

&lt;p&gt;The hard part was not only quantizing the model.&lt;/p&gt;

&lt;p&gt;The hard part was making the evaluation readable.&lt;/p&gt;

&lt;p&gt;Three choices made the results much cleaner.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. Use Exact Ground Truth
&lt;/h3&gt;

&lt;p&gt;Instead of scoring against a real-world form dataset with sparse annotations, I generated a synthetic OCR corpus.&lt;/p&gt;

&lt;p&gt;The corpus contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;24 rendered pages&lt;/li&gt;
&lt;li&gt;Three difficulty tiers:

&lt;ul&gt;
&lt;li&gt;clean prose&lt;/li&gt;
&lt;li&gt;dense small-font pages&lt;/li&gt;
&lt;li&gt;digit-heavy invoices&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;deterministic seed&lt;/li&gt;
&lt;li&gt;ground truth known character-for-character&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because sparse annotations can make OCR models look worse than they are.&lt;/p&gt;

&lt;p&gt;For example, if the dataset only labels a few fields but the model transcribes the entire page, the character error rate can be inflated for the wrong reason.&lt;/p&gt;

&lt;p&gt;For this test, I wanted exact text-to-text comparison.&lt;/p&gt;
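
&lt;p&gt;As a rough sketch of what exact text-to-text scoring involves, CER can be computed as the Levenshtein edit distance divided by the reference length. This is an illustration only, not the harness's scorer; the function names &lt;code&gt;edit_distance&lt;/code&gt; and &lt;code&gt;cer&lt;/code&gt; are assumptions. The last call shows how a runaway output pushes CER past 100%.&lt;/p&gt;

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # drop r from the reference
                            curr[j - 1] + 1,       # extra character h in the output
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate as a fraction of the reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


print(cer("Total: 1,024.50", "Total: 1,024.50"))   # identical text
print(cer("Total: 1,024.50", "Total: 1,029.50"))   # one wrong digit
print(cer("abc", "abc" * 6))                       # runaway repetition
```

&lt;p&gt;Because the denominator is the reference length, every extra character in a looping output counts as an error, so a single runaway page can dominate an average.&lt;/p&gt;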




&lt;h3&gt;
  
  
  2. Compare Each Quant Against Its Own Runtime Baseline
&lt;/h3&gt;

&lt;p&gt;Every quantized model is measured against the full-precision BF16 conversion in the same runtime.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GGUF quants are compared against GGUF-BF16 under llama.cpp&lt;/li&gt;
&lt;li&gt;MLX quants are compared against MLX-BF16 under mlx-vlm&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No cross-runtime baselines.&lt;/p&gt;

&lt;p&gt;This is important because different runtimes can fail differently, even with the same model and prompt.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Surface Loops Instead of Hiding Them
&lt;/h3&gt;

&lt;p&gt;All decoding runs use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;temperature 0&lt;/li&gt;
&lt;li&gt;repetition suppression turned off&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That second part is intentional.&lt;/p&gt;

&lt;p&gt;If quantization makes the model unstable, I want that instability to show up clearly as a loop page. I do not want it quietly buried inside an average.&lt;/p&gt;

&lt;p&gt;Loop pages are flagged using the output/reference length ratio. I also report loop counts separately so it is easier to tell the difference between normal OCR degradation and total decoding collapse.&lt;/p&gt;
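
&lt;p&gt;A minimal sketch of ratio-based loop flagging might look like the following. The threshold of 2.0 and the names &lt;code&gt;is_loop_page&lt;/code&gt; and &lt;code&gt;summarize&lt;/code&gt; are assumptions made for illustration; the harness may use a different cutoff.&lt;/p&gt;

```python
def is_loop_page(reference, output, max_ratio=2.0):
    """Flag a page when the output is far longer than the reference.

    max_ratio=2.0 is an illustrative threshold, not the harness's value.
    """
    if not reference:
        raise ValueError("reference text must be non-empty")
    return len(output) / len(reference) > max_ratio


def summarize(pages, max_ratio=2.0):
    """Count loop pages separately. `pages` is a list of (reference, output) pairs."""
    flags = [is_loop_page(ref, out, max_ratio) for ref, out in pages]
    return {"pages": len(flags), "loops": sum(flags)}


pages = [
    ("Invoice 0042", "Invoice 0042"),     # clean page
    ("Invoice 0042", "Invoice 0043"),     # ordinary OCR error, same length
    ("Invoice 0042", "Invoice " * 40),    # runaway repetition
]
print(summarize(pages))
```

&lt;p&gt;Only the third page is flagged: an ordinary misread keeps the output about as long as the reference, while a loop inflates it many times over.&lt;/p&gt;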




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  GGUF Ladder
&lt;/h3&gt;

&lt;p&gt;Measured against GGUF-BF16 using llama.cpp on an M3 Max.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variant&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Overall CER&lt;/th&gt;
&lt;th&gt;Δ vs BF16&lt;/th&gt;
&lt;th&gt;Loops&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;BF16&lt;/td&gt;
&lt;td&gt;5.47 GiB&lt;/td&gt;
&lt;td&gt;0.78%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;0/24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q8_0&lt;/td&gt;
&lt;td&gt;2.91 GiB&lt;/td&gt;
&lt;td&gt;0.78%&lt;/td&gt;
&lt;td&gt;+0.00 pp&lt;/td&gt;
&lt;td&gt;0/24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q6_K&lt;/td&gt;
&lt;td&gt;2.43 GiB&lt;/td&gt;
&lt;td&gt;0.78%&lt;/td&gt;
&lt;td&gt;+0.00 pp&lt;/td&gt;
&lt;td&gt;0/24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q5_K_M&lt;/td&gt;
&lt;td&gt;2.07 GiB&lt;/td&gt;
&lt;td&gt;0.74%&lt;/td&gt;
&lt;td&gt;-0.04 pp (within noise)&lt;/td&gt;
&lt;td&gt;0/24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q4_K_M&lt;/td&gt;
&lt;td&gt;1.82 GiB&lt;/td&gt;
&lt;td&gt;15.64%&lt;/td&gt;
&lt;td&gt;+14.86 pp&lt;/td&gt;
&lt;td&gt;1/24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q4_0&lt;/td&gt;
&lt;td&gt;1.59 GiB&lt;/td&gt;
&lt;td&gt;44.02%&lt;/td&gt;
&lt;td&gt;+43.24 pp&lt;/td&gt;
&lt;td&gt;2/24&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F61wyon6xe1wdp8ysf49m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F61wyon6xe1wdp8ysf49m.png" alt="Bar chart of character error rate by GGUF quantization level. BF16, Q8_0, Q6_K and Q5_K_M all under 0.8 percent; Q4_K_M jumps to 15.64 percent with 1 of 24 pages looping; Q4_0 reaches 44.02 percent with 2 of 24 pages looping." width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  MLX Ladder
&lt;/h3&gt;

&lt;p&gt;Measured against MLX-BF16 using mlx-vlm.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variant&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Overall CER&lt;/th&gt;
&lt;th&gt;Δ vs BF16&lt;/th&gt;
&lt;th&gt;Loops&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;BF16&lt;/td&gt;
&lt;td&gt;6.67 GB&lt;/td&gt;
&lt;td&gt;1.62%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;0/24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8-bit&lt;/td&gt;
&lt;td&gt;3.92 GB&lt;/td&gt;
&lt;td&gt;1.62%&lt;/td&gt;
&lt;td&gt;+0.00 pp&lt;/td&gt;
&lt;td&gt;0/24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6-bit&lt;/td&gt;
&lt;td&gt;3.19 GB&lt;/td&gt;
&lt;td&gt;1.62%&lt;/td&gt;
&lt;td&gt;+0.00 pp&lt;/td&gt;
&lt;td&gt;0/24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4-bit uniform&lt;/td&gt;
&lt;td&gt;2.45 GB&lt;/td&gt;
&lt;td&gt;123.61%&lt;/td&gt;
&lt;td&gt;+121.99 pp&lt;/td&gt;
&lt;td&gt;7/24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4-bit mixed&lt;/td&gt;
&lt;td&gt;2.70 GB&lt;/td&gt;
&lt;td&gt;20.86%&lt;/td&gt;
&lt;td&gt;+19.24 pp&lt;/td&gt;
&lt;td&gt;1/24&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fmsoy30nmglak9uhmmbv0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fmsoy30nmglak9uhmmbv0.png" alt="Bar chart of character error rate by MLX quantization level. BF16, 8-bit and 6-bit all at 1.62 percent; mixed-precision 4-bit jumps to 20.86 percent with 1 of 24 pages looping; uniform 4-bit reaches 123.61 percent with 7 of 24 pages looping." width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;
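
&lt;p&gt;The Δ columns can be rechecked with a few lines of arithmetic. The sketch below copies the overall CER values from the two tables and subtracts each runtime's own BF16 baseline; the dictionary layout and the function name &lt;code&gt;deltas_vs_baseline&lt;/code&gt; are assumptions for illustration.&lt;/p&gt;

```python
# Overall CER in percent, copied from the GGUF and MLX tables.
ladders = {
    "GGUF": {"BF16": 0.78, "Q8_0": 0.78, "Q6_K": 0.78, "Q5_K_M": 0.74,
             "Q4_K_M": 15.64, "Q4_0": 44.02},
    "MLX": {"BF16": 1.62, "8-bit": 1.62, "6-bit": 1.62,
            "4-bit uniform": 123.61, "4-bit mixed": 20.86},
}


def deltas_vs_baseline(ladder, baseline="BF16"):
    """Percentage-point change of each variant against its own runtime baseline."""
    base = ladder[baseline]
    return {name: round(value - base, 2)
            for name, value in ladder.items() if name != baseline}


for runtime, ladder in ladders.items():
    for name, delta in deltas_vs_baseline(ladder).items():
        print(f"{runtime:4} {name:14} {delta:+.2f} pp")
```

&lt;p&gt;Each ladder is only ever compared with the baseline inside the same dictionary, which mirrors the no-cross-runtime rule from the methodology.&lt;/p&gt;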

&lt;p&gt;Two things stand out.&lt;/p&gt;




&lt;h2&gt;
  
  
  Finding 1: 4-Bit Was Not the Sweet Spot
&lt;/h2&gt;

&lt;p&gt;A lot of people treat 4-bit quantization as the default sweet spot.&lt;/p&gt;

&lt;p&gt;That rule often comes from perplexity studies on large dense chat models.&lt;/p&gt;

&lt;p&gt;Unlimited-OCR is not that.&lt;/p&gt;

&lt;p&gt;It is a 3B Mixture-of-Experts OCR model with roughly 570M active parameters. It is also doing a precision-heavy task where small errors matter. That is exactly the kind of setup where the usual 4-bit heuristic can break.&lt;/p&gt;

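&lt;p&gt;In this test, there was no measurable loss down to 6-bit in either runtime.&lt;/p&gt;

&lt;p&gt;Q8_0 and Q6_K reproduced the BF16 recognized text identically on all 24 pages, and Q5_K_M stayed within noise (0.74% vs 0.78% CER).&lt;/p&gt;

&lt;p&gt;Then came the cliff: at 4-bit, overall CER jumped to between 15.64% and 123.61%, depending on the runtime and the quantization scheme.&lt;/p&gt;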

&lt;p&gt;The damage was concentrated on dense, small-font pages. Clean pages still stayed under 2% CER even at 4-bit, which explains why a casual one-page test can be misleading.&lt;/p&gt;

&lt;p&gt;An easy page will not catch this.&lt;/p&gt;

&lt;p&gt;You need a hard tier. You also need a real baseline.&lt;/p&gt;




&lt;h2&gt;
  
  
  Finding 2: The Kind of 4-Bit Quantization Matters
&lt;/h2&gt;

&lt;p&gt;Uniform 4-bit was bad.&lt;/p&gt;

&lt;p&gt;Mixed-precision 4-bit was much better.&lt;/p&gt;

&lt;p&gt;In GGUF, mixed precision (Q4_K_M) beat uniform 4-bit (Q4_0) by about &lt;strong&gt;2.8×&lt;/strong&gt; in overall CER (15.64% vs 44.02%).&lt;/p&gt;

&lt;p&gt;In MLX, it beat uniform 4-bit by about &lt;strong&gt;5.9×&lt;/strong&gt; (20.86% vs 123.61%).&lt;/p&gt;

&lt;p&gt;But it did not fully fix the problem. It softened the cliff. It did not remove it.&lt;/p&gt;
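
&lt;p&gt;For reference, those factors are simply the ratio of the two overall CER values in each runtime. A small sketch, with the values taken from the result tables and the function name &lt;code&gt;improvement_factor&lt;/code&gt; assumed for illustration:&lt;/p&gt;

```python
# Overall CER in percent, taken from the result tables.
uniform_vs_mixed = {
    "GGUF": {"uniform": 44.02, "mixed": 15.64},    # Q4_0 vs Q4_K_M
    "MLX": {"uniform": 123.61, "mixed": 20.86},    # 4-bit uniform vs 4-bit mixed
}


def improvement_factor(uniform_cer, mixed_cer):
    """How many times lower the mixed-precision CER is than the uniform one."""
    return uniform_cer / mixed_cer


for runtime, row in uniform_vs_mixed.items():
    factor = improvement_factor(row["uniform"], row["mixed"])
    print(f"{runtime}: {factor:.1f}x lower CER with mixed precision")
```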

&lt;p&gt;I also ran an MLX ablation to understand where the recovery was coming from. I left only the vision-to-language projector unquantized (in floating point) and kept everything else identical to uniform 4-bit.&lt;/p&gt;

&lt;p&gt;It still looped on 7 out of 24 pages.&lt;/p&gt;

&lt;p&gt;That suggests the recovery is happening on the decoder side: attention projections, embeddings, and the LM head.&lt;/p&gt;

&lt;p&gt;It is not mainly coming from the visual path.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Finding I Did Not Expect: Repetition Suppression Is a Tradeoff
&lt;/h2&gt;

&lt;p&gt;The official Unlimited-OCR pipeline uses repetition suppression with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;no_repeat_ngram_size=35
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;llama.cpp has a related mechanism through the DRY sampler.&lt;/p&gt;

&lt;p&gt;Since only the 4-bit quants were looping, I re-ran the affected GGUF model with upstream-style DRY settings.&lt;/p&gt;

&lt;p&gt;At first, it looked like a fix.&lt;/p&gt;

&lt;p&gt;Q4_K_M overall CER improved:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;15.64% → 10.47%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The worst loop pages were rescued.&lt;/p&gt;

&lt;p&gt;But the tradeoff was not clean.&lt;/p&gt;

&lt;p&gt;Pages that were previously fine got worse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.88% → 10.86%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The biggest damage showed up on legitimately repetitive documents, such as template invoices.&lt;/p&gt;

&lt;p&gt;Even worse, DRY introduced a brand-new catastrophic loop on a page that was fine without it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0% → 218% CER
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So repetition suppression is not a free fix.&lt;/p&gt;

&lt;p&gt;It trades one failure mode for another.&lt;/p&gt;

&lt;p&gt;If your documents contain honest repetition — tables, invoices, forms, templates — a default-on DRY sampler may quietly reduce accuracy.&lt;/p&gt;

&lt;p&gt;That is why the primary ladder above is scored with repetition suppression off.&lt;/p&gt;
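
&lt;p&gt;To make the failure mode concrete, here is a minimal sketch of hard n-gram blocking, the idea behind a &lt;code&gt;no_repeat_ngram_size&lt;/code&gt; setting. It is a simplified illustration on word tokens with a small n, not the official pipeline's code, and llama.cpp's DRY sampler differs in detail. The function name &lt;code&gt;banned_next_tokens&lt;/code&gt; and the sample invoice text are assumptions.&lt;/p&gt;

```python
def banned_next_tokens(tokens, n):
    """Return tokens that would complete an n-gram already present in `tokens`."""
    assert n >= 2, "this sketch assumes n of at least 2"
    prefix = tuple(tokens[-(n - 1):])          # the last n-1 generated tokens
    banned = set()
    for i in range(len(tokens) - n + 1):       # every earlier complete n-gram
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])      # its final token is now forbidden
    return banned


# Two invoice tables that honestly share the same header and first row.
generated = "Item Qty Price Widget 2 9.99 Item Qty Price".split()
print(banned_next_tokens(generated, 4))
```

&lt;p&gt;After the second header, the correct next word is "Widget" again, but the rule forbids it because "Item Qty Price Widget" has already appeared. The decoder has to pick something else, which is how suppression can damage pages that repeat for legitimate reasons.&lt;/p&gt;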




&lt;h2&gt;
  
  
  Bonus: A First Look at R-SWA
&lt;/h2&gt;

&lt;p&gt;Unlimited-OCR's headline mechanism is &lt;strong&gt;R-SWA&lt;/strong&gt;, or Reference Sliding Window Attention.&lt;/p&gt;

&lt;p&gt;The goal is to keep KV cache memory constant during long-horizon parsing.&lt;/p&gt;

&lt;p&gt;No MLX port implements R-SWA yet, but llama.cpp has an open PR, &lt;strong&gt;&lt;a href="https://github.com/ggml-org/llama.cpp/pull/24975" rel="noopener noreferrer"&gt;#24975&lt;/a&gt;&lt;/strong&gt;, that does. This is not my work. Full credit goes to the PR author.&lt;/p&gt;

&lt;p&gt;The same GGUF file loads with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;n_swa = 128
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;on the PR branch, compared with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;n_swa = 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;on mainline.&lt;/p&gt;

&lt;p&gt;I tested:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;same weights&lt;/li&gt;
&lt;li&gt;same 24 pages&lt;/li&gt;
&lt;li&gt;both attention regimes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The recognized text was identical on 21 of 24 pages.&lt;/p&gt;

&lt;p&gt;On the three pages that differed, R-SWA produced a small net improvement. It did not introduce any loops.&lt;/p&gt;

&lt;p&gt;Important caveat: this only tests single-page fidelity parity.&lt;/p&gt;

&lt;p&gt;The real reason R-SWA exists is multi-page, constant-memory parsing. That is not tested here.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Runtime Details Worth Knowing
&lt;/h2&gt;

&lt;p&gt;A couple of smaller runtime findings are also worth mentioning.&lt;/p&gt;

&lt;p&gt;First, the same prompt can fail differently depending on runtime.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Free OCR.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this prompt, the model emits an immediate EOS in llama.cpp but runs away into a repetition loop in mlx-vlm 0.6.3.&lt;/p&gt;

&lt;p&gt;Second, mlx-vlm 0.6.3 loads this tokenizer through a slow path that skips byte-level BPE decoding.&lt;/p&gt;

&lt;p&gt;That means raw output contains markers like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ġ
Ċ
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You need to decode those yourself.&lt;/p&gt;

&lt;p&gt;llama.cpp detokenizes correctly.&lt;/p&gt;
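
&lt;p&gt;If you need to decode those markers yourself, one possible approach is to invert the GPT-2 style byte-to-character table and reassemble the UTF-8 bytes. This sketch assumes the tokenizer uses that standard byte-level mapping and that the whole string is still in byte-level form; the function names are assumptions.&lt;/p&gt;

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte to a printable character."""
    printable = (list(range(ord("!"), ord("~") + 1))
                 + list(range(ord("¡"), ord("¬") + 1))
                 + list(range(ord("®"), ord("ÿ") + 1)))
    table = {b: chr(b) for b in printable}
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)   # non-printable bytes move above U+00FF
            shift += 1
    return table


CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_byte_level(text):
    """Turn raw byte-level BPE text (markers such as Ġ and Ċ) back into UTF-8."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in text)
    return raw.decode("utf-8", errors="replace")


print(repr(decode_byte_level("InvoiceĠ0042ĊTotal:Ġ19.99")))
```

&lt;p&gt;In this table Ġ stands for the space byte and Ċ for the newline byte. A character outside the table raises &lt;code&gt;KeyError&lt;/code&gt;, which is a useful signal that the text was already decoded.&lt;/p&gt;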




&lt;h2&gt;
  
  
  Reproduce It
&lt;/h2&gt;

&lt;p&gt;Everything is public:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/collections/vimalnakrani/unlimited-ocr-mlx-quants-with-measured-eval-ladder-6a45e49413447076db7a6bf0" rel="noopener noreferrer"&gt;Six model repos and a collection on Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/vimalnakrani08/unlimited-ocr-eval-harness" rel="noopener noreferrer"&gt;Evaluation harness on GitHub&lt;/a&gt; — corpus generator, CER/WER scorer, loop diagnostics (the scoring core is backend-agnostic; the runner is MLX-specific)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The exact command behind the GGUF numbers is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;llama-mtmd-cli &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; unlimited-ocr-Q5_K_M.gguf &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--mmproj&lt;/span&gt; mmproj-unlimited-ocr-F16.gguf &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image&lt;/span&gt; page.png &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"document parsing."&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--chat-template&lt;/span&gt; deepseek-ocr &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--temp&lt;/span&gt; 0 &lt;span class="nt"&gt;--repeat-penalty&lt;/span&gt; 1.0 &lt;span class="nt"&gt;--flash-attn&lt;/span&gt; off &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; 2600 &lt;span class="nt"&gt;-c&lt;/span&gt; 16384
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
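
&lt;p&gt;To run that command over a whole folder of pages, a small wrapper can build the argument list once per image. This is a sketch, not part of the harness: the function names and any directory you pass to &lt;code&gt;commands_for_pages&lt;/code&gt; are assumptions, while the flags are copied from the command above.&lt;/p&gt;

```python
import shlex
from pathlib import Path


def build_command(model, mmproj, image):
    """Argument list mirroring the llama-mtmd-cli invocation shown above."""
    return [
        "llama-mtmd-cli",
        "-m", str(model),
        "--mmproj", str(mmproj),
        "--image", str(image),
        "-p", "document parsing.",
        "--chat-template", "deepseek-ocr",
        "--temp", "0", "--repeat-penalty", "1.0", "--flash-attn", "off",
        "-n", "2600", "-c", "16384",
    ]


def commands_for_pages(page_dir, model, mmproj):
    """One command per PNG page, in sorted order so runs are repeatable."""
    return [build_command(model, mmproj, page)
            for page in sorted(Path(page_dir).glob("*.png"))]


cmd = build_command("unlimited-ocr-Q5_K_M.gguf",
                    "mmproj-unlimited-ocr-F16.gguf", "page.png")
print(shlex.join(cmd))   # pass `cmd` to subprocess.run(...) to execute it
```

&lt;p&gt;Keeping the arguments as a list avoids shell quoting problems with the prompt, which contains a space.&lt;/p&gt;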






&lt;h2&gt;
  
  
  Caveats
&lt;/h2&gt;

&lt;p&gt;These numbers are specific to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;this model&lt;/li&gt;
&lt;li&gt;this corpus&lt;/li&gt;
&lt;li&gt;synthetic clean renders&lt;/li&gt;
&lt;li&gt;English documents&lt;/li&gt;
&lt;li&gt;single-page evaluation&lt;/li&gt;
&lt;li&gt;these runtime builds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your documents look different, run the harness on them.&lt;/p&gt;

&lt;p&gt;That is what it is for.&lt;/p&gt;

&lt;p&gt;For quantization, the useful question is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Does it work?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The useful question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How much quality do I lose on my data, and can someone else verify it?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the difference between a number and evidence.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>ocr</category>
      <category>quantization</category>
    </item>
    <item>
      <title>Why did my DataFrame lose rows? Debugging silent pandas pipeline failures</title>
      <dc:creator>Vimal Nakrani</dc:creator>
      <pubDate>Sat, 13 Jun 2026 17:49:27 +0000</pubDate>
      <link>https://dev.to/vimal_nakrani/why-did-my-dataframe-lose-rows-debugging-silent-pandas-pipeline-failures-4i0</link>
      <guid>https://dev.to/vimal_nakrani/why-did-my-dataframe-lose-rows-debugging-silent-pandas-pipeline-failures-4i0</guid>
      <description>&lt;p&gt;If you've written more than a handful of pandas pipelines, you know this feeling: the row count at the end is wrong, the numbers are slightly off, and somewhere across fifteen transformation steps, &lt;em&gt;something&lt;/em&gt; changed your data without telling you. No exception. No warning. Just a quietly wrong answer.&lt;/p&gt;

&lt;p&gt;These are the worst bugs in data work, because they don't crash — they ship. A dashboard shows a number that's 3% low. A model trains on rows that shouldn't exist. A report goes to a client missing a region. And by the time anyone notices, the pipeline has run a hundred times.&lt;/p&gt;

&lt;p&gt;This post is about why these failures happen, the usual (painful) way people debug them, and a small open-source tool I built called &lt;a href="https://pypi.org/project/dframe-trace/" rel="noopener noreferrer"&gt;&lt;code&gt;dframe-trace&lt;/code&gt;&lt;/a&gt; that automates the tedious part.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three silent killers
&lt;/h2&gt;

&lt;p&gt;Almost every silent pipeline bug falls into one of three buckets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rows disappear.&lt;/strong&gt; A &lt;code&gt;merge&lt;/code&gt; with &lt;code&gt;how="inner"&lt;/code&gt; quietly drops every row without a match. A filter is slightly too aggressive. A &lt;code&gt;dropna&lt;/code&gt; removes more than you intended. The pipeline still runs; it just runs on less data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nulls appear.&lt;/strong&gt; A left join against an incomplete lookup table introduces blank values in the new columns. A &lt;code&gt;reindex&lt;/code&gt; or &lt;code&gt;pivot&lt;/code&gt; creates gaps. Downstream, those nulls become zeros, or get dropped, or silently skew an average.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dtypes drift.&lt;/strong&gt; A column of integers becomes floats after a merge with missing values. A date column comes back as a string. An &lt;code&gt;astype&lt;/code&gt; does something subtly different from what you expected. Nothing breaks immediately — but a join key that flipped from &lt;code&gt;int64&lt;/code&gt; to &lt;code&gt;float64&lt;/code&gt; will silently fail to match later.&lt;/p&gt;

&lt;p&gt;The common thread: none of these raise an error. Your code is "correct" in the sense that it executes. It's just wrong.&lt;/p&gt;
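
&lt;p&gt;All three failures can be reproduced in a few lines of plain pandas. The sketch below uses made-up frames (&lt;code&gt;orders&lt;/code&gt;, &lt;code&gt;meta&lt;/code&gt;) purely for illustration; &lt;code&gt;indicator=True&lt;/code&gt; is a standard &lt;code&gt;merge&lt;/code&gt; option that labels where each row came from.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical data: three orders, but the lookup table only knows two ids.
orders = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})
meta = pd.DataFrame({"id": [1, 2], "region": ["EU", "US"], "quota": [100, 200]})

inner = orders.merge(meta, on="id", how="inner")
left = orders.merge(meta, on="id", how="left", indicator=True)

print(len(orders), len(inner))                    # rows before and after the inner join
print(left["region"].isna().sum())                # nulls introduced by the left join
print(meta["quota"].dtype, left["quota"].dtype)   # integer column before and after
print(left["_merge"].value_counts())              # which rows had no match
```

&lt;p&gt;None of these calls raises or warns. The unmatched order disappears in the inner join, gains a null &lt;code&gt;region&lt;/code&gt; in the left join, and forces the integer &lt;code&gt;quota&lt;/code&gt; column to a float dtype so it can hold the missing value.&lt;/p&gt;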

&lt;h2&gt;
  
  
  The usual way to debug this
&lt;/h2&gt;

&lt;p&gt;When the final number looks off, most of us reach for the same tool — print statements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                                   &lt;span class="c1"&gt;# (10000, 8)
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;how&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;left&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;isna&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;        &lt;span class="c1"&gt;# (10000, 9), 240 nulls?!
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                                   &lt;span class="c1"&gt;# (8800, 9)
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                                   &lt;span class="c1"&gt;# (8560, 9)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works. It's also miserable. You're editing working code to add instrumentation, re-running the whole pipeline, eyeballing a wall of numbers, then deleting it all once you've found the culprit — until next time, when you add it all back. You're manually reconstructing information the pipeline already had and threw away.&lt;/p&gt;

&lt;p&gt;What you actually want is to run your code once, normally, and then &lt;em&gt;ask questions about what happened&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  A different approach: trace first, ask later
&lt;/h2&gt;

&lt;p&gt;That's the idea behind &lt;code&gt;dframe-trace&lt;/code&gt;. Instead of declaring rules up front or instrumenting by hand, you turn on recording, run your normal code, and interrogate the trace afterward.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;dframe-trace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It has no required dependencies — you bring your own pandas and/or polars.&lt;/p&gt;

&lt;p&gt;The lowest-friction way to use it patches the DataFrame methods that most often cause silent bugs, so you don't have to touch your functions at all:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dframe_trace&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;autopatch&lt;/span&gt;

&lt;span class="n"&gt;autopatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;install&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;   &lt;span class="c1"&gt;# one line at the top of your script
&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;how&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;left&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# recorded automatically
&lt;/span&gt;    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float64&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;           &lt;span class="c1"&gt;# recorded automatically
&lt;/span&gt;    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;           &lt;span class="c1"&gt;# recorded automatically
&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where_null_introduced&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;   &lt;span class="c1"&gt;# -&amp;gt; "merge"
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;report&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;report()&lt;/code&gt; gives you a step-by-step diff of what each operation did. Here is the output for a small three-step pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dframe-trace report
============================================================
[0] load  (0.5 ms)
    start: 4 rows, 2 cols
[1] merge_meta  (1.4 ms)
    +cols: ['region']
    nulls region: 0 -&amp;gt; 1  [WARN]
[2] filter  (0.4 ms)
    rows: -1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of bisecting by hand, you get a direct answer: the &lt;code&gt;merge&lt;/code&gt; introduced the nulls in &lt;code&gt;region&lt;/code&gt;, and a later step dropped a row. The questions you can ask map onto the silent killers described earlier. Two of them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where_null_introduced&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# which step first added nulls to this column
&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where_rows_lost&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                 &lt;span class="c1"&gt;# [(step_name, negative_delta), ...]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
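
&lt;p&gt;To make the idea behind these two queries concrete, here is a minimal sketch that answers them over hand-written step records mirroring the report above. The record layout and both helper names (&lt;code&gt;first_step_with_nulls&lt;/code&gt;, &lt;code&gt;steps_that_lost_rows&lt;/code&gt;) are assumptions made for illustration, not &lt;code&gt;dframe-trace&lt;/code&gt;'s internals.&lt;/p&gt;

```python
# Hypothetical step records that mirror the report above.
# This is NOT dframe-trace's internal format, only an illustration.
steps = [
    {"name": "load", "rows": 4, "nulls": {}},
    {"name": "merge_meta", "rows": 4, "nulls": {"region": 1}},
    {"name": "filter", "rows": 3, "nulls": {"region": 1}},
]


def first_step_with_nulls(records, column):
    """Return the first step whose null count for `column` went up."""
    previous = 0
    for record in records:
        current = record["nulls"].get(column, 0)
        if current > previous:
            return record["name"]
        previous = current
    return None


def steps_that_lost_rows(records):
    """Return (step_name, negative_delta) pairs for steps that shrank the frame."""
    losses = []
    for before, after in zip(records, records[1:]):
        if before["rows"] > after["rows"]:
            losses.append((after["name"], after["rows"] - before["rows"]))
    return losses


print(first_step_with_nulls(steps, "region"))
print(steps_that_lost_rows(steps))
```

&lt;p&gt;Both questions reduce to comparing consecutive structural records, which is why no rules have to be declared up front.&lt;/p&gt;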



&lt;p&gt;If you'd rather not patch anything globally, there's a decorator form — wrap the functions you care about with &lt;code&gt;@traced("name")&lt;/code&gt; and run them inside the &lt;code&gt;trace()&lt;/code&gt; block. Same recording, more explicit control.&lt;/p&gt;

&lt;h2&gt;
  
  
  How this differs from Great Expectations and Pandera
&lt;/h2&gt;

&lt;p&gt;The Python data-validation space is crowded and mature, so it's worth being precise about where this fits.&lt;/p&gt;

&lt;p&gt;Tools like &lt;strong&gt;Great Expectations, Pandera, and Hamilton&lt;/strong&gt; check your data against rules &lt;em&gt;you write in advance&lt;/em&gt;: "this column must never be null," "row count must stay above 1,000." They're excellent and they're the right choice when you already know what correct looks like and want to enforce it in production.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;dframe-trace&lt;/code&gt; is the opposite philosophy: &lt;strong&gt;zero rules.&lt;/strong&gt; You declare nothing. It records what every step did and lets you ask, after the fact, where something changed. It's closer to a profiler for data shape than to a schema checker.&lt;/p&gt;

&lt;p&gt;So the rule of thumb is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;Pandera / Great Expectations&lt;/strong&gt; when you know your expectations and want to enforce them.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;&lt;code&gt;dframe-trace&lt;/code&gt;&lt;/strong&gt; when something is &lt;em&gt;already&lt;/em&gt; wrong and you need to find which step did it — or when you want a cheap, always-on record of how data flows through a script.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They're complementary; nothing stops you from using both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Catching regressions in CI
&lt;/h2&gt;

&lt;p&gt;Once you've found a bug, you usually want to make sure it stays fixed. A trace can become a build-failing assertion:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dframe_trace&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;guards&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;run_pipeline&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;guards&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assert_no_new_nulls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;guards&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assert_no_row_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;allow&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;    &lt;span class="c1"&gt;# allow expected drops
&lt;/span&gt;&lt;span class="n"&gt;guards&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assert_no_silent_casts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;allow&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;astype&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each guard raises an error that carries a structured list of violations, for example "merge introduced 2 null(s) in 'region'", so a failing build tells you exactly what regressed and where.&lt;/p&gt;
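
&lt;p&gt;As a rough picture of how a guard with an allow-list can behave, here is a small standalone sketch. The &lt;code&gt;GuardViolation&lt;/code&gt; class, the &lt;code&gt;check_no_row_loss&lt;/code&gt; function, and the record layout are all hypothetical; the real guards may be structured differently.&lt;/p&gt;

```python
class GuardViolation(AssertionError):
    """Hypothetical error type that carries the violations as a list."""

    def __init__(self, violations):
        super().__init__("; ".join(violations))
        self.violations = violations


def check_no_row_loss(records, allow=frozenset()):
    """Raise if any step outside `allow` reduced the row count."""
    violations = []
    for before, after in zip(records, records[1:]):
        lost = before["rows"] - after["rows"]
        if lost > 0 and after["name"] not in allow:
            violations.append(f"{after['name']} dropped {lost} row(s)")
    if violations:
        raise GuardViolation(violations)


# Hypothetical records: the filter step is expected to drop rows.
records = [
    {"name": "load", "rows": 4},
    {"name": "merge", "rows": 4},
    {"name": "filter", "rows": 3},
]

check_no_row_loss(records, allow={"filter"})  # passes: the drop is allowed

try:
    check_no_row_loss(records)  # no allow-list, so the same drop is a violation
except GuardViolation as err:
    print(err.violations)
```

&lt;p&gt;Subclassing &lt;code&gt;AssertionError&lt;/code&gt; in this sketch means a test runner reports the failure like any other failed assertion, while the list stays available for tooling.&lt;/p&gt;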

&lt;h2&gt;
  
  
  Is it expensive to leave on?
&lt;/h2&gt;

&lt;p&gt;No, and that's deliberate. A snapshot is &lt;strong&gt;structural only&lt;/strong&gt;: row count, column names, dtypes, per-column null counts, and estimated memory. No row values are ever copied or stored. Outside an active &lt;code&gt;trace()&lt;/code&gt; block, &lt;code&gt;autopatch&lt;/code&gt; adds a single &lt;code&gt;is None&lt;/code&gt; check per call. That's cheap enough to leave installed in development without thinking about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest limitations
&lt;/h2&gt;

&lt;p&gt;A debugging tool you can't trust is worse than none, so here's what it doesn't do yet:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Boolean-mask filtering&lt;/strong&gt; (&lt;code&gt;df[df.x &amp;gt; 0]&lt;/code&gt;) isn't auto-traced: it goes through &lt;code&gt;__getitem__&lt;/code&gt;, which is too broad to patch safely. The row loss still shows up, but in the delta of the &lt;em&gt;next&lt;/em&gt; recorded step; for precise attribution, move the filter into a function decorated with &lt;code&gt;@traced&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;groupby&lt;/code&gt;&lt;/strong&gt; terminal methods aren't traced yet (it's on the roadmap).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;polars support is newer&lt;/strong&gt; than the pandas path, which is more thoroughly tested.&lt;/li&gt;
&lt;/ul&gt;
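
&lt;p&gt;The boolean-mask caveat is easy to reproduce by hand. In this sketch the &lt;code&gt;log&lt;/code&gt; list stands in for recorded steps (an assumption for illustration, with made-up data); the mask filter sits between two recorded points, so its loss is charged to the step that follows it.&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({"x": [-1, 2, 3, None]})
log = [("load", len(df))]  # recorded step

df = df[df["x"].gt(0)]  # mask filter: goes through __getitem__, not recorded

df = df.reset_index(drop=True)
log.append(("reset_index", len(df)))  # next recorded step

# Row deltas between consecutive recorded steps.
deltas = [(name, rows - prev) for (_, prev), (name, rows) in zip(log, log[1:])]
print(deltas)
```

&lt;p&gt;Here &lt;code&gt;reset_index&lt;/code&gt; appears to lose two rows even though it drops nothing itself, which is the misattribution the decorator form avoids.&lt;/p&gt;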

&lt;p&gt;It's a young project and a debugging aid, not a correctness guarantee.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;If you've ever lost an afternoon to a pipeline that returned the wrong number for no obvious reason, this is built for exactly that afternoon.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;dframe-trace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PyPI:&lt;/strong&gt; &lt;a href="https://pypi.org/project/dframe-trace/" rel="noopener noreferrer"&gt;https://pypi.org/project/dframe-trace/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href="https://github.com/vimalnakrani08/dframe-trace" rel="noopener noreferrer"&gt;https://github.com/vimalnakrani08/dframe-trace&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Issues and pull requests are welcome — there are tagged good-first-issues on the roadmap (&lt;code&gt;groupby&lt;/code&gt; tracing, Mermaid lineage export, more guards) if you want to contribute. And if you try it on a real pipeline, I'd genuinely like to hear what it caught — or missed.&lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>pandas</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
