<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Abinesh N</title>
    <description>The latest articles on DEV Community by Abinesh N (@abinesh_n23022006).</description>
    <link>https://dev.to/abinesh_n23022006</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1972330%2F175ae4c6-7f18-455f-9f83-aa6420b544f8.jpg</url>
      <title>DEV Community: Abinesh N</title>
      <link>https://dev.to/abinesh_n23022006</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abinesh_n23022006"/>
    <language>en</language>
    <item>
      <title>Stop adding print statements to debug your data pipeline — use watcher instead</title>
      <dc:creator>Abinesh N</dc:creator>
      <pubDate>Wed, 20 May 2026 16:19:50 +0000</pubDate>
      <link>https://dev.to/abinesh_n23022006/stop-adding-print-statements-to-debug-your-data-pipeline-use-watcher-instead-43oa</link>
      <guid>https://dev.to/abinesh_n23022006/stop-adding-print-statements-to-debug-your-data-pipeline-use-watcher-instead-43oa</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0617byd0mhhdojxpr0b7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0617byd0mhhdojxpr0b7.png" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  I built a Python decorator that watches your DataFrame pipelines automatically
&lt;/h1&gt;

&lt;p&gt;You know this moment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input rows  : 1,000,000
Output rows :   263,979
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Somewhere in your pipeline, 736k rows disappeared.&lt;/p&gt;

&lt;p&gt;Which step caused it?&lt;/p&gt;

&lt;p&gt;A bad merge?&lt;br&gt;
A silent &lt;code&gt;dropna()&lt;/code&gt;?&lt;br&gt;
A duplicate join key?&lt;br&gt;
A dtype issue?&lt;br&gt;
A filter you forgot existed?&lt;/p&gt;

&lt;p&gt;So you start adding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isnull&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…everywhere.&lt;/p&gt;

&lt;p&gt;Then rerun the entire pipeline again.&lt;/p&gt;

&lt;p&gt;That frustration is why I built &lt;strong&gt;dfwatcher&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is dfwatcher?
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;dfwatcher&lt;/code&gt; is a lightweight decorator for pandas pipelines that automatically tracks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;row count changes&lt;/li&gt;
&lt;li&gt;null deltas&lt;/li&gt;
&lt;li&gt;schema drift&lt;/li&gt;
&lt;li&gt;dtype changes&lt;/li&gt;
&lt;li&gt;join explosions&lt;/li&gt;
&lt;li&gt;memory usage&lt;/li&gt;
&lt;li&gt;pipeline summaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;with &lt;strong&gt;zero config&lt;/strong&gt;.&lt;/p&gt;

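&lt;p&gt;Just decorate your functions. The package is published on PyPI as &lt;code&gt;dfwatcher&lt;/code&gt;, while the examples here import it as &lt;code&gt;watcher&lt;/code&gt;.&lt;br&gt;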
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;watcher&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;watch&lt;/span&gt;

&lt;span class="nd"&gt;@watch&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it.&lt;/p&gt;
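&lt;p&gt;To make the idea concrete, here is a minimal sketch of how a decorator can report row counts around a step. It is not dfwatcher's actual implementation: &lt;code&gt;watch_rows&lt;/code&gt; is a hypothetical name, and it assumes the DataFrame is the first positional argument.&lt;/p&gt;

```python
import functools

import pandas as pd


def watch_rows(func):
    """Hypothetical stand-in for @watch: report row counts before and after a step."""

    @functools.wraps(func)
    def wrapper(df, *args, **kwargs):
        rows_in = len(df)
        result = func(df, *args, **kwargs)
        rows_out = len(result)
        delta = rows_out - rows_in
        # Thousands separators and an explicit sign make large changes easy to scan.
        print(f"{func.__name__}()  {rows_in:,} → {rows_out:,}  ({delta:+,} rows)")
        return result

    return wrapper


@watch_rows
def clean(df):
    return df.dropna()


# Small illustrative frame: two of the four rows contain a null.
raw = pd.DataFrame({"customer_id": [1, 2, 3, 4], "status": ["a", None, "c", None]})
cleaned = clean(raw)
```

&lt;p&gt;Because the wrapper uses &lt;code&gt;functools.wraps&lt;/code&gt;, the decorated function keeps its original name, which is what lets the report label each step.&lt;/p&gt;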




&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@watch&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;merge_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;how&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;left&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;merge_orders()  964,203 → 1,069,104  ▲ +104,901 rows (+10.9%) ⚠

  columns added : +tier

  💥 join explosion · duplication ratio 10.9%

  key column     top value    repeat count
  customer_id    9182               184
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of just telling you rows increased…&lt;/p&gt;

&lt;p&gt;…it tells you &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;
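&lt;p&gt;For comparison, this is roughly how the same diagnosis looks by hand in plain pandas. The tables below are made up for illustration, and the sketch does not show how dfwatcher computes its report. It counts repeated keys on the right-hand side and also shows the built-in &lt;code&gt;validate&lt;/code&gt; argument of &lt;code&gt;DataFrame.merge&lt;/code&gt;.&lt;/p&gt;

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3]})
# Hypothetical orders table in which customer 2 appears three times.
orders = pd.DataFrame(
    {"customer_id": [1, 2, 2, 2, 3], "tier": ["gold", "silver", "silver", "gold", "bronze"]}
)

merged = customers.merge(orders, on="customer_id", how="left")
extra_rows = len(merged) - len(customers)

# Count how often each key occurs on the right side.
# Any count other than 1 multiplies the matching left rows.
key_counts = orders["customer_id"].value_counts()
repeated = key_counts[key_counts != 1]

print(f"rows added by merge: {extra_rows}")
print(repeated)

# pandas can also refuse such a merge outright: validate="many_to_one"
# raises MergeError when the right-hand keys are not unique.
try:
    customers.merge(orders, on="customer_id", how="left", validate="many_to_one")
except pd.errors.MergeError as exc:
    print(f"merge rejected: {exc}")
```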




&lt;h2&gt;
  
  
  Why I built it
&lt;/h2&gt;

&lt;p&gt;Most pipeline bugs are not syntax bugs.&lt;/p&gt;

&lt;p&gt;They’re &lt;em&gt;data drift bugs&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The code runs successfully.&lt;br&gt;
The tests pass.&lt;br&gt;
The pipeline completes.&lt;/p&gt;

&lt;p&gt;But the data quietly changes shape somewhere in the middle.&lt;/p&gt;

&lt;p&gt;Those are the hardest bugs to debug because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;they’re silent&lt;/li&gt;
&lt;li&gt;they propagate downstream&lt;/li&gt;
&lt;li&gt;they’re usually discovered hours later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted something that behaves like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“git diff for DataFrames”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;but automatically during execution.&lt;/p&gt;




&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Row tracking
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;clean()  1,000,000 → 964,203  ▼ -35,797 rows
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Null tracking
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nulls -35,797  status  (35,797 → 0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
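&lt;p&gt;A null delta like the one above can be approximated with two &lt;code&gt;isna().sum()&lt;/code&gt; calls. The helper below is a hypothetical sketch on toy data, not the library's own code.&lt;/p&gt;

```python
import pandas as pd


def null_delta(before, after):
    """Per-column change in null count (negative means nulls were removed)."""
    nulls_before = before.isna().sum()
    nulls_after = after.isna().sum()
    # Align on the union of columns so added or dropped columns do not raise.
    delta = nulls_after.sub(nulls_before, fill_value=0).astype(int)
    # Keep only the columns whose null count actually changed.
    return delta[delta != 0]


before = pd.DataFrame({"customer_id": [1, 2, 3], "status": ["open", None, None]})
after = before.dropna()

print(null_delta(before, after))
```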



&lt;h3&gt;
  
  
  Schema drift detection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;columns added : +revenue_band
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dtype change detection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dtype change : customer_id  int64 → object
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
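&lt;p&gt;Schema drift and dtype changes both come down to comparing column metadata before and after a step. Here is a small sketch under that assumption. &lt;code&gt;schema_diff&lt;/code&gt; is a hypothetical helper, and the frames are illustrative.&lt;/p&gt;

```python
import pandas as pd


def schema_diff(before, after):
    """Compare column names and dtypes of two frames."""
    cols_before, cols_after = set(before.columns), set(after.columns)
    added = sorted(cols_after - cols_before)
    removed = sorted(cols_before - cols_after)
    # Only columns present on both sides can have a dtype change.
    changed = {
        col: (str(before[col].dtype), str(after[col].dtype))
        for col in sorted(cols_before.intersection(cols_after))
        if before[col].dtype != after[col].dtype
    }
    return added, removed, changed


before = pd.DataFrame({"customer_id": [1, 2], "revenue": [10.0, 250.0]}).astype(
    {"customer_id": "int64"}
)
after = before.assign(
    customer_id=before["customer_id"].astype(object),  # simulate an accidental cast
    revenue_band=["low", "high"],
)

added, removed, changed = schema_diff(before, after)
print("columns added :", added)
print("dtype change  :", changed)
```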



&lt;h3&gt;
  
  
  Join explosion detection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;💥 join explosion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Threshold guards
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@watch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;warn_on_loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;raise_on_loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.20&lt;/span&gt;
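&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;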
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Turn silent data corruption into CI failures.&lt;/p&gt;
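&lt;p&gt;One way to picture a threshold guard is a decorator factory that measures the fraction of rows lost and then warns or raises. The sketch below assumes both thresholds are fractions of the input row count, which is my reading of the parameter names rather than documented behaviour. &lt;code&gt;guard_loss&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
import functools
import warnings

import pandas as pd


def guard_loss(warn_on_loss=None, raise_on_loss=None):
    """Hypothetical guard: thresholds are fractions of input rows lost by the step."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(df, *args, **kwargs):
            result = func(df, *args, **kwargs)
            rows_in = len(df)
            # Fraction of rows lost; an empty input or a growing frame counts as zero loss.
            loss = max(rows_in - len(result), 0) / rows_in if rows_in else 0.0
            if raise_on_loss is not None and loss > raise_on_loss:
                raise ValueError(f"{func.__name__}() lost {loss:.1%} of rows")
            if warn_on_loss is not None and loss > warn_on_loss:
                warnings.warn(f"{func.__name__}() lost {loss:.1%} of rows")
            return result

        return wrapper

    return decorator


@guard_loss(warn_on_loss=0.05, raise_on_loss=0.20)
def clean(df):
    return df.dropna()


# One of four rows is null, so the step loses 25% of its input and the guard raises.
frame = pd.DataFrame({"status": ["a", "b", "c", None]})
try:
    clean(frame)
except ValueError as exc:
    print(exc)
```

&lt;p&gt;Raising an exception is what makes this useful in CI: an uncaught error fails the job instead of letting the shrunken frame flow downstream.&lt;/p&gt;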




&lt;h2&gt;
  
  
  Session summaries
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nightly ETL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;merge_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At the end you get a full pipeline summary automatically.&lt;/p&gt;
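&lt;p&gt;A session of this kind can be sketched as a context manager that collects one record per step and prints them on exit. Everything here is illustrative: &lt;code&gt;sketch_session&lt;/code&gt;, &lt;code&gt;record_rows&lt;/code&gt;, and &lt;code&gt;dedupe&lt;/code&gt; are hypothetical names, and the real summary likely contains more than row counts.&lt;/p&gt;

```python
import contextlib
import functools

import pandas as pd

_records = []  # (step name, rows in, rows out) for the active session


def record_rows(func):
    """Hypothetical decorator: append one record per call to the session log."""

    @functools.wraps(func)
    def wrapper(df, *args, **kwargs):
        result = func(df, *args, **kwargs)
        _records.append((func.__name__, len(df), len(result)))
        return result

    return wrapper


@contextlib.contextmanager
def sketch_session(name):
    """Hypothetical session: collect step records, then print one summary on exit."""
    _records.clear()
    try:
        yield _records
    finally:
        print(f"session: {name}")
        for step, rows_in, rows_out in _records:
            print(f"  {step}()  {rows_in:,} → {rows_out:,}  ({rows_out - rows_in:+,})")


@record_rows
def clean(df):
    return df.dropna()


@record_rows
def dedupe(df):
    return df.drop_duplicates()


df = pd.DataFrame({"customer_id": [1, 1, 2, None]})
with sketch_session("nightly ETL") as steps:
    df = clean(df)
    df = dedupe(df)
```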




&lt;h2&gt;
  
  
  What surprised me while building it
&lt;/h2&gt;

&lt;p&gt;The hardest part wasn’t row tracking.&lt;/p&gt;

&lt;p&gt;It was making the output useful &lt;em&gt;without becoming noisy&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;A debugging tool that prints too much becomes another thing developers ignore.&lt;/p&gt;

&lt;p&gt;So I focused heavily on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;readable terminal formatting&lt;/li&gt;
&lt;li&gt;meaningful warnings&lt;/li&gt;
&lt;li&gt;showing only the most important changes&lt;/li&gt;
&lt;li&gt;zero-config defaults&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;install → decorate → immediately useful&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Roadmap
&lt;/h2&gt;

&lt;p&gt;Available today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pandas support&lt;/li&gt;
&lt;li&gt;memory tracking&lt;/li&gt;
&lt;li&gt;custom handlers&lt;/li&gt;
&lt;li&gt;CI-friendly summaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Planned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Polars backend&lt;/li&gt;
&lt;li&gt;DuckDB backend&lt;/li&gt;
&lt;li&gt;HTML / notebook renderer&lt;/li&gt;
&lt;li&gt;structured JSON logging&lt;/li&gt;
&lt;li&gt;global config system&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;dfwatcher
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub:&lt;br&gt;
&lt;a href="https://github.com/Abineshabee/watcher" rel="noopener noreferrer"&gt;https://github.com/Abineshabee/watcher&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;PyPI:&lt;br&gt;
&lt;a href="https://pypi.org/project/dfwatcher/" rel="noopener noreferrer"&gt;https://pypi.org/project/dfwatcher/&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;I’d genuinely love feedback from data engineers, ML engineers, analytics engineers, and pandas users.&lt;/p&gt;

&lt;p&gt;Especially:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;features you wish pipeline tools had&lt;/li&gt;
&lt;li&gt;debugging pain points&lt;/li&gt;
&lt;li&gt;weird merge bugs you’ve experienced&lt;/li&gt;
&lt;li&gt;ideas for Polars / DuckDB support&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
  </channel>
</rss>
