<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Atharva Shah</title>
    <description>The latest articles on DEV Community by Atharva Shah (@atharva55).</description>
    <link>https://dev.to/atharva55</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3835941%2Ffedfa293-c785-4504-8337-4d61631210e1.png</url>
      <title>DEV Community: Atharva Shah</title>
      <link>https://dev.to/atharva55</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/atharva55"/>
    <language>en</language>
    <item>
      <title>Building pandasclean — a pandas data cleaning library from scratch to PyPI</title>
      <dc:creator>Atharva Shah</dc:creator>
      <pubDate>Fri, 20 Mar 2026 18:25:31 +0000</pubDate>
      <link>https://dev.to/atharva55/building-pandasclean-a-pandas-data-cleaning-library-from-scratch-to-pypi-54e</link>
      <guid>https://dev.to/atharva55/building-pandasclean-a-pandas-data-cleaning-library-from-scratch-to-pypi-54e</guid>
      <description>&lt;h2&gt;
  
  
  How I Built and Published My First Python Library as a Semester 4 Student
&lt;/h2&gt;

&lt;p&gt;Every data project I start looks the same. Load the data, spend 30 minutes hunting for outliers, write the same NaN handling code I wrote last week, watch my notebook eat RAM. Then repeat it all for the next project.&lt;/p&gt;

&lt;p&gt;I got tired of it. So I built a library.&lt;/p&gt;

&lt;p&gt;This is the story of how I went from a frustrated CS student to publishing &lt;strong&gt;pandasclean&lt;/strong&gt; on PyPI — and what I learned along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Idea
&lt;/h2&gt;

&lt;p&gt;It started simple. I just wanted a function that could detect outliers and let me choose what to do with them. But once I had that, I thought — why not add NaN handling? And memory reduction? And a single function that runs everything?&lt;/p&gt;

&lt;p&gt;Three weeks later I had a published library.&lt;/p&gt;




&lt;h2&gt;
  
  
  What pandasclean Does
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;pandasclean&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It has four core functions:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;code&gt;find_outliers()&lt;/code&gt; — IQR based outlier detection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pandasclean&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;find_outliers&lt;/span&gt;

&lt;span class="c1"&gt;# Just show me the bounds
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bounds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;find_outliers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;report&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Drop outlier rows
&lt;/span&gt;&lt;span class="n"&gt;df_clean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bounds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;find_outliers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;drop&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Cap values instead of dropping (Winsorization)
&lt;/span&gt;&lt;span class="n"&gt;df_clean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bounds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;find_outliers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cap&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The IQR method computes bounds as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;lower = Q1 - (multiplier × IQR)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;upper = Q3 + (multiplier × IQR)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use &lt;code&gt;multiplier=1.5&lt;/code&gt; for mild outliers and &lt;code&gt;multiplier=3.0&lt;/code&gt; for extreme ones only.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;code&gt;handle_nan()&lt;/code&gt; — Missing value handling
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pandasclean&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;handle_nan&lt;/span&gt;

&lt;span class="c1"&gt;# Fill with mean, median, or custom values
&lt;/span&gt;&lt;span class="n"&gt;df_clean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;handle_nan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mean&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df_clean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;handle_nan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;custom&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fill_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Or drop rows/columns entirely
&lt;/span&gt;&lt;span class="n"&gt;df_clean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;handle_nan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;drop&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rows&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. &lt;code&gt;reduce_memory()&lt;/code&gt; — Dtype downcasting
&lt;/h3&gt;

&lt;p&gt;This one surprised me the most when I saw the results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pandasclean&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;reduce_memory&lt;/span&gt;

&lt;span class="n"&gt;before&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;memory_usage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deep&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df_optimized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;reduce_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;after&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_optimized&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;memory_usage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deep&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Before: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;before&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; MB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;After:  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;after&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; MB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before: 1527.07 MB
After:  371.93 MB
Reduction: 75.6%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On a 15 million row dataset. That number still makes me happy.&lt;/p&gt;

&lt;p&gt;What's happening under the hood:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;int64&lt;/code&gt; → smallest safe integer type (&lt;code&gt;int8&lt;/code&gt;, &lt;code&gt;int16&lt;/code&gt;, or &lt;code&gt;int32&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;float64&lt;/code&gt; → &lt;code&gt;float32&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Low cardinality string columns → &lt;code&gt;category&lt;/code&gt; dtype&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. &lt;code&gt;auto_clean()&lt;/code&gt; — One function to rule them all
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pandasclean&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;auto_clean&lt;/span&gt;

&lt;span class="n"&gt;df_clean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;auto_clean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Runs NaN handling, memory reduction and outlier detection in the correct order with sensible defaults.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Interesting Technical Bits
&lt;/h2&gt;

&lt;p&gt;Building this taught me things I never would have learned from a tutorial.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The IQR = 0 edge case&lt;/strong&gt; — what happens when a column has constant values? &lt;code&gt;Q1 == Q3&lt;/code&gt;, so &lt;code&gt;IQR = 0&lt;/code&gt;, and the bounds collapse. I had to add a guard for this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;pandas StringDtype compatibility&lt;/strong&gt; — newer versions of pandas use &lt;code&gt;pd.StringDtype()&lt;/code&gt; instead of plain &lt;code&gt;object&lt;/code&gt; for string columns. My &lt;code&gt;is_object_dtype()&lt;/code&gt; check was returning &lt;code&gt;False&lt;/code&gt; for string columns and silently skipping them. Fixed by also checking &lt;code&gt;is_string_dtype()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nullable integers&lt;/strong&gt; — pandas has two integer systems. &lt;code&gt;int64&lt;/code&gt; (numpy) can't hold NaN. &lt;code&gt;Int64&lt;/code&gt; (pandas nullable, capital I) can. Trying to downcast &lt;code&gt;Int64&lt;/code&gt; with NaN to numpy &lt;code&gt;int8&lt;/code&gt; raises &lt;code&gt;cannot convert NA to integer&lt;/code&gt;. The fix was to detect NaN integers and downcast to nullable &lt;code&gt;Int8/Int16/Int32&lt;/code&gt; instead.&lt;/p&gt;




&lt;h2&gt;
  
  
  Publishing to PyPI
&lt;/h2&gt;

&lt;p&gt;This was simpler than I expected. You need three things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. &lt;code&gt;pyproject.toml&lt;/code&gt;:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[build-system]&lt;/span&gt;
&lt;span class="py"&gt;requires&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="py"&gt;["setuptools&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;61.0&lt;/span&gt;&lt;span class="s"&gt;"]&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="py"&gt;build-backend&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"setuptools.build_meta"&lt;/span&gt;

&lt;span class="nn"&gt;[project]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"pandasclean"&lt;/span&gt;
&lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1.2"&lt;/span&gt;
&lt;span class="py"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"A lightweight library for cleaning and optimizing pandas DataFrames"&lt;/span&gt;
&lt;span class="py"&gt;dependencies&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="py"&gt;["pandas&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.3&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;", "&lt;/span&gt;&lt;span class="py"&gt;numpy&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.21&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;"]&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Build:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;build twine
python &lt;span class="nt"&gt;-m&lt;/span&gt; build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Upload:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;twine upload dist/&lt;span class="k"&gt;*&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Your library is live.&lt;/p&gt;




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;96 downloads on day one&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;75.6% memory reduction&lt;/strong&gt; on a 15M row dataset&lt;/li&gt;
&lt;li&gt;Works across Python 3.8 to 3.12&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;More than I expected. Here's the short version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write tests before you think you need them — they saved me multiple times&lt;/li&gt;
&lt;li&gt;Edge cases hide everywhere — IQR = 0, nullable integers, pandas version differences&lt;/li&gt;
&lt;li&gt;Shipping something imperfect is better than perfecting something unshipped&lt;/li&gt;
&lt;li&gt;A published library on your resume hits differently than a GitHub repo&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Z-score outlier detection (v0.2.0)&lt;/li&gt;
&lt;li&gt;Column name standardisation&lt;/li&gt;
&lt;li&gt;Duplicate detection&lt;/li&gt;
&lt;li&gt;sklearn pipeline integration&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;PyPI: &lt;a href="https://pypi.org/project/pandasclean" rel="noopener noreferrer"&gt;https://pypi.org/project/pandasclean&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/atharva557/Pandasclean" rel="noopener noreferrer"&gt;https://github.com/atharva557/Pandasclean&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you try it and something breaks — please open an issue. Feedback from real users is worth more than anything else at this stage.&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>opensource</category>
      <category>python</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
