<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chinthaparthi Sridhar</title>
    <description>The latest articles on DEV Community by Chinthaparthi Sridhar (@c_sridhar_22c16471fa64be8).</description>
    <link>https://dev.to/c_sridhar_22c16471fa64be8</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3836429%2F824d1d82-fb03-47ff-92da-50ac06388c5b.jpg</url>
      <title>DEV Community: Chinthaparthi Sridhar</title>
      <link>https://dev.to/c_sridhar_22c16471fa64be8</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/c_sridhar_22c16471fa64be8"/>
    <language>en</language>
    <item>
      <title>I Built a Python Package That Automates EDA in One Line</title>
      <dc:creator>Chinthaparthi Sridhar</dc:creator>
      <pubDate>Sat, 21 Mar 2026 04:05:09 +0000</pubDate>
      <link>https://dev.to/c_sridhar_22c16471fa64be8/i-built-a-python-package-that-automates-eda-in-one-line-2e38</link>
      <guid>https://dev.to/c_sridhar_22c16471fa64be8/i-built-a-python-package-that-automates-eda-in-one-line-2e38</guid>
      <description>&lt;p&gt;After writing the same pandas code for every new dataset, I decided to automate it and published it on PyPI.&lt;/p&gt;

&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;Every data scientist knows this pain. You get a new dataset and start typing:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isnull&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;duplicated&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# ... 50 more lines
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same code. Every. Single. Time.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;smarteda&lt;/strong&gt; — a Python package that runs your entire EDA automatically.&lt;/p&gt;

&lt;h2&gt;Installation&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;smarteda
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Quick Start&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;smarteda&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_data.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Run everything at once
&lt;/span&gt;&lt;span class="n"&gt;smarteda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;analyze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Or pick what you need
&lt;/span&gt;&lt;span class="n"&gt;smarteda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basic_eda&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;# head, tail, info, describe, shape
&lt;/span&gt;&lt;span class="n"&gt;smarteda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;overview&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# shape, memory, dtypes, constant columns
&lt;/span&gt;&lt;span class="n"&gt;smarteda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;missing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;         &lt;span class="c1"&gt;# missing values + fill suggestions
&lt;/span&gt;&lt;span class="n"&gt;smarteda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;outliers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# IQR + Z-score + Isolation Forest
&lt;/span&gt;&lt;span class="n"&gt;smarteda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;correlations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# multicollinearity warnings + heatmap
&lt;/span&gt;&lt;span class="n"&gt;smarteda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggestions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;# smart recommendations + ML score
&lt;/span&gt;
&lt;span class="c1"&gt;# Clean your data
&lt;/span&gt;&lt;span class="n"&gt;clean_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;smarteda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# returns new cleaned df
&lt;/span&gt;&lt;span class="n"&gt;smarteda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# modifies df directly
&lt;/span&gt;
&lt;span class="c1"&gt;# Generate full HTML report
&lt;/span&gt;&lt;span class="n"&gt;smarteda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;The Output&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=== Smart Suggestions ===

⚠️  Column 'salary' has 16.7% missing → fill missing values
⚠️  Column 'salary' is highly skewed (skew=1.53) → apply log transformation
⚠️  '10_yop' and '12_yop' are 100% correlated → drop one to avoid multicollinearity
⚠️  Column 'Name' has high cardinality (20000 unique) → use target encoding
✅  No duplicates found

💡 ML Readiness Score: 74 / 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;All 15 Functions&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;basic_eda(df)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Head, tail, sample, shape, size, info, describe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;overview(df)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Shape, memory, dtypes, constant columns, wrong type detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;missing(df)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Missing counts, percentages, heatmap, fill strategy suggestions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;duplicates(df)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Count and display duplicate rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;duplicates(df, drop=True)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Drop duplicates and return clean DataFrame&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;outliers(df)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;IQR + Z-score + Isolation Forest outlier detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;distributions(df)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Skewness, kurtosis, transformation suggestions, KDE plots&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;correlations(df)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pearson/Spearman/Kendall, multicollinearity warnings, heatmap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;categorical(df)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Value counts, high cardinality, encoding suggestions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;timeseries(df)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Auto datetime detection, trends, seasonality, gap detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;suggestions(df)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Smart recommendations + ML Readiness Score /100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;clean(df)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Auto clean — drop dupes, fill nulls, fix types, cap outliers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;visualize(df)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Auto charts for every column&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;analyze(df)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;🚀 Runs ALL functions in one call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;report(df)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;📄 Generates full standalone HTML report&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;What I Learned Building This&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Publishing to PyPI is easier than you think&lt;/strong&gt;&lt;br&gt;
Install the tools with &lt;code&gt;pip install build twine&lt;/code&gt;, build with &lt;code&gt;python -m build&lt;/code&gt;, then upload with &lt;code&gt;twine upload dist/*&lt;/code&gt;. Done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Jupyter has quirks&lt;/strong&gt;&lt;br&gt;
When a function prints its results and also returns a dictionary, calling it as the last expression in a cell makes Jupyter auto-display the return value too, so the output appears twice. Fixed it with a custom &lt;code&gt;SilentDict&lt;/code&gt; class that overrides &lt;code&gt;__repr__&lt;/code&gt; to return an empty string.&lt;/p&gt;
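&lt;p&gt;The &lt;code&gt;SilentDict&lt;/code&gt; idea can be sketched in a few lines. This is a minimal illustration, not the class as it ships in smarteda: the &lt;code&gt;summarize&lt;/code&gt; helper and its keys are hypothetical and exist only to show a function that prints its results and still returns them.&lt;/p&gt;

```python
class SilentDict(dict):
    """A dict whose repr is empty, so echoing it in a notebook shows no text."""

    def __repr__(self):
        return ""


def summarize(values):
    # Hypothetical helper, not part of smarteda: print a summary once,
    # then return the same data for callers that want to reuse it.
    result = SilentDict(count=len(values), total=sum(values))
    print(f"count={result['count']} total={result['total']}")
    return result


stats = summarize([1, 2, 3])
```

&lt;p&gt;Because &lt;code&gt;SilentDict&lt;/code&gt; subclasses &lt;code&gt;dict&lt;/code&gt;, lookups such as &lt;code&gt;stats["total"]&lt;/code&gt; keep working. The trade-off is that &lt;code&gt;print(stats)&lt;/code&gt; also shows nothing, so converting with &lt;code&gt;dict(stats)&lt;/code&gt; is needed to inspect the contents.&lt;/p&gt;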

&lt;p&gt;&lt;strong&gt;3. Deprecation warnings matter&lt;/strong&gt;&lt;br&gt;
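Pandas is actively deprecating the &lt;code&gt;infer_datetime_format&lt;/code&gt; argument and &lt;code&gt;select_dtypes(include='object')&lt;/code&gt;. Wrapping these calls in &lt;code&gt;warnings.catch_warnings()&lt;/code&gt; keeps the package's output clean, though it only hides the warnings; the calls still need updating before the old behavior is removed.&lt;/p&gt;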
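&lt;p&gt;A scoped suppression might look like the sketch below. It uses only the standard library: &lt;code&gt;legacy_parse&lt;/code&gt; is a hypothetical stand-in for whichever pandas call emits the warning, and &lt;code&gt;FutureWarning&lt;/code&gt; is an assumed category that should be matched to what the library actually raises.&lt;/p&gt;

```python
import warnings


def legacy_parse(text):
    # Hypothetical stand-in for a library call that emits a deprecation warning.
    warnings.warn("legacy_parse is deprecated", FutureWarning, stacklevel=2)
    return text.strip()


def quiet_parse(text):
    # Ignore FutureWarning only inside this block; the previous
    # warning filters are restored when the block exits.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        return legacy_parse(text)


# Record warnings to confirm the suppression is scoped to quiet_parse.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = quiet_parse("  2026-03-21  ")
    quiet_count = len(caught)
    legacy_parse("x")
    loud_count = len(caught)
```

&lt;p&gt;Filtering by category inside a small wrapper keeps unrelated warnings visible to users, which is usually safer than silencing everything at import time.&lt;/p&gt;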

&lt;p&gt;&lt;strong&gt;4. The gap between "it works" and "ready to publish" is large&lt;/strong&gt;&lt;br&gt;
The code worked on day one. But edge cases, warnings, Jupyter quirks, docstrings, README, PyPI config — that's where the real work was.&lt;/p&gt;

&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;📦 PyPI: &lt;a href="https://pypi.org/project/smarteda/" rel="noopener noreferrer"&gt;https://pypi.org/project/smarteda/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 GitHub: &lt;a href="https://github.com/sridharchinthaparthi/smarteda" rel="noopener noreferrer"&gt;https://github.com/sridharchinthaparthi/smarteda&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this helped you or gave you an idea — drop a comment. Always happy to talk Python and data science! 🚀&lt;/p&gt;

</description>
      <category>python</category>
      <category>pandas</category>
      <category>opensource</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
