<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ahmed Jaber Choudhury</title>
    <description>The latest articles on DEV Community by Ahmed Jaber Choudhury (@jaber17).</description>
    <link>https://dev.to/jaber17</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3789425%2F8aaaa83f-792f-412d-957c-e621f828a561.jpeg</url>
      <title>DEV Community: Ahmed Jaber Choudhury</title>
      <link>https://dev.to/jaber17</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jaber17"/>
    <language>en</language>
    <item>
      <title>How I Built a CSV Data Cleaner in 4 Days (Python Beginner Working Project)</title>
      <dc:creator>Ahmed Jaber Choudhury</dc:creator>
      <pubDate>Tue, 24 Feb 2026 12:25:01 +0000</pubDate>
      <link>https://dev.to/jaber17/how-i-built-a-csv-data-cleaner-in-4-days-python-beginner-working-project-2bck</link>
      <guid>https://dev.to/jaber17/how-i-built-a-csv-data-cleaner-in-4-days-python-beginner-working-project-2bck</guid>
      <description>&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;After 2+ years in QA (Meta, Microsoft) and RPA consulting, I decided to transition to automation engineering. This is my first Python project: built in 4 days and documented from start to finish.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;Build a production-ready CSV cleaner that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Never loses data (even invalid entries)&lt;/li&gt;
&lt;li&gt;Provides detailed error reports&lt;/li&gt;
&lt;li&gt;Handles real-world messy data&lt;/li&gt;
&lt;li&gt;Uses quality-first principles&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;A Python script that:&lt;br&gt;
✅ Cleans 1000+ contacts in seconds&lt;br&gt;
✅ Validates emails, phones, names, ages&lt;br&gt;
✅ Separates valid from invalid data&lt;br&gt;
✅ Generates detailed error reports&lt;/p&gt;

&lt;h2&gt;
  
  
  The Journey (Day by Day)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Day 1-2: Python Fundamentals
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Variables, strings, functions&lt;/li&gt;
&lt;li&gt;Dictionaries and lists&lt;/li&gt;
&lt;li&gt;CSV file handling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardest part:&lt;/strong&gt; Understanding loops and data flow&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Day 3: Building the Core
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Wrote 8 cleaning &amp;amp; validation functions&lt;/li&gt;
&lt;li&gt;Implemented error handling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Breakthrough moment:&lt;/strong&gt; Realizing each function should return errors as a list&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Day 4: Integration &amp;amp; Testing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Combined all functions&lt;/li&gt;
&lt;li&gt;Added file writing&lt;/li&gt;
&lt;li&gt;Tested with messy data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key learning:&lt;/strong&gt; Separation of concerns (cleaning vs validation)&lt;/li&gt;
&lt;/ul&gt;
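
&lt;p&gt;To make the cleaning vs validation split concrete, here is a minimal sketch for a phone field. The names &lt;code&gt;clean_phone&lt;/code&gt; and &lt;code&gt;validate_phone&lt;/code&gt; and the 10-digit rule are assumptions for illustration, not necessarily what the repository uses.&lt;/p&gt;

```python
def clean_phone(raw):
    """Normalize only: keep the digits, never judge validity."""
    return "".join(ch for ch in str(raw or "") if ch.isdigit())


def validate_phone(phone, expected_digits=10):
    """Judge only: return a list of problems, never modify the value."""
    errors = []
    if not phone:
        errors.append("Phone is empty")
    elif len(phone) != expected_digits:
        # expected_digits=10 is an assumed rule for this sketch
        errors.append(f"Phone has {len(phone)} digits, expected {expected_digits}")
    return errors


cleaned = clean_phone(" (555) 123-4567 ")
print(cleaned, validate_phone(cleaned))
```

&lt;p&gt;Because cleaning never rejects anything and validation never changes anything, each half can be tested on its own.&lt;/p&gt;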

&lt;h2&gt;
  
  
  Key Code Sections
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Validation Pattern
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check email structure&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing @&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# More checks...
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Returns a list, so one field can report several errors at once&lt;/li&gt;
&lt;li&gt;Produces clear error messages&lt;/li&gt;
&lt;li&gt;Is easy to extend with new checks&lt;/li&gt;
&lt;/ul&gt;
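
&lt;p&gt;As a rough illustration of how the pattern grows, the sketch below adds a few extra checks. These specific rules and messages are my assumptions for the example; they are not the checks elided in the snippet above, and they are far from full email validation.&lt;/p&gt;

```python
def validate_email(email):
    """Check basic email structure and report every problem found."""
    if not email:
        return ["Email is empty"]

    errors = []
    if " " in email:
        errors.append("Contains a space")

    if email.count("@") != 1:
        errors.append("Must contain exactly one @")
    else:
        # Only safe to split once we know there is exactly one @
        local, domain = email.split("@")
        if not local:
            errors.append("Nothing before @")
        if "." not in domain:
            errors.append("Domain has no dot")

    return errors


print(validate_email("bob example"))
print(validate_email("a@b.com"))
```

&lt;p&gt;The first call reports two problems in one pass, which is the main benefit of returning a list instead of stopping at the first failure.&lt;/p&gt;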

&lt;h3&gt;
  
  
  The Main Loop
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;all_errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Clean
&lt;/span&gt;    &lt;span class="n"&gt;cleaned_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;clean_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Validate
&lt;/span&gt;    &lt;span class="n"&gt;all_errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;validate_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cleaned_name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Decide
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;all_errors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;error_contacts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;clean_contacts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
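
&lt;p&gt;Here is a self-contained sketch of the same loop that can be run as-is. The helper bodies, the &lt;code&gt;split_contacts&lt;/code&gt; name, and the &lt;code&gt;Row&lt;/code&gt; and &lt;code&gt;Errors&lt;/code&gt; keys are assumptions for illustration; the real script may store error rows differently.&lt;/p&gt;

```python
import csv
import io


def clean_name(raw):
    """Hypothetical cleaner: collapse whitespace and title-case."""
    return " ".join(str(raw or "").split()).title()


def validate_name(name):
    """Hypothetical validator: returns a list of error messages."""
    return [] if name else ["Name is empty"]


def split_contacts(csv_file):
    clean_contacts, error_contacts = [], []
    reader = csv.DictReader(csv_file)
    # start=2: row 1 of the file is the header, so data begins on row 2
    for row_num, row in enumerate(reader, start=2):
        cleaned_name = clean_name(row.get("Name", ""))
        all_errors = validate_name(cleaned_name)
        if all_errors:
            # Keep the original row, plus where and why it failed
            error_contacts.append(
                {**row, "Row": row_num, "Errors": "; ".join(all_errors)}
            )
        else:
            clean_contacts.append({**row, "Name": cleaned_name})
    return clean_contacts, error_contacts


sample = io.StringIO("Name,Email\n  alice   smith ,a@example.com\n,b@example.com\n")
clean, errors = split_contacts(sample)
print(clean)
print(errors)
```

&lt;p&gt;Invalid rows are copied into the error list untouched, so nothing from the input is dropped.&lt;/p&gt;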



&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Technical Skills:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python fundamentals&lt;/li&gt;
&lt;li&gt;CSV processing&lt;/li&gt;
&lt;li&gt;Error handling patterns&lt;/li&gt;
&lt;li&gt;Function design for reusability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Meta-Skills:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to learn efficiently (fundamentals before frameworks)&lt;/li&gt;
&lt;li&gt;How to debug systematically&lt;/li&gt;
&lt;li&gt;How to write readable code&lt;/li&gt;
&lt;li&gt;How to document your work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;QA Mindset Applied to Code:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test edge cases (empty strings, None values)&lt;/li&gt;
&lt;li&gt;Detailed error reporting&lt;/li&gt;
&lt;li&gt;Data integrity (never lose information)&lt;/li&gt;
&lt;li&gt;Clear documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Mistakes I Made
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Initially tried to do everything in one function&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Solution: Split into cleaning and validation&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Forgot error handling on type conversions&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Solution: wrap every type conversion in a try/except block and record the failure as an error&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Wanted to make it "perfect" before shipping&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Solution: Ship working version, iterate later&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
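
&lt;p&gt;For the second mistake, a guarded conversion might look like the sketch below. The function name &lt;code&gt;validate_age&lt;/code&gt;, the 0 to 120 range, and the message wording are assumptions for illustration.&lt;/p&gt;

```python
def validate_age(raw_age, min_age=0, max_age=120):
    """Convert the age safely and return a list of error messages."""
    try:
        age = int(str(raw_age).strip())
    except ValueError:
        # Covers "", "abc", "12.5" and None (which becomes the text "None")
        return [f"Age is not a whole number: {raw_age!r}"]

    errors = []
    # The 0-120 range is an assumed rule for this sketch
    if age not in range(min_age, max_age + 1):
        errors.append(f"Age {age} is outside {min_age}-{max_age}")
    return errors


print(validate_age(" 30 "))
print(validate_age("abc"))
print(validate_age("150"))
```

&lt;p&gt;A bad value becomes an entry in the error report instead of an exception that stops the whole run.&lt;/p&gt;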

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Project Stats:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~200 lines of code&lt;/li&gt;
&lt;li&gt;8 functions&lt;/li&gt;
&lt;li&gt;4 days start to finish&lt;/li&gt;
&lt;li&gt;100% written by myself (with learning resources)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-World Performance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1,000 rows: &amp;lt; 1 second&lt;/li&gt;
&lt;li&gt;10,000 rows: ~3 seconds&lt;/li&gt;
&lt;li&gt;Handles the edge cases I tested (empty strings, None values) without crashing&lt;/li&gt;
&lt;/ul&gt;
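
&lt;p&gt;If you want to take similar measurements yourself, a sketch along these lines works. It generates synthetic rows in memory and times only a plain read pass, so it will not reproduce the numbers above; &lt;code&gt;make_rows&lt;/code&gt; and &lt;code&gt;time_pass&lt;/code&gt; are hypothetical helpers, and the cleaning and validation work would go inside the loop.&lt;/p&gt;

```python
import csv
import io
import time


def make_rows(n):
    """Build an in-memory CSV with a header and n synthetic contacts."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Name", "Email"])
    for i in range(n):
        writer.writerow([f"person {i}", f"person{i}@example.com"])
    buffer.seek(0)
    return buffer


def time_pass(n):
    """Return (rows read, seconds taken) for one pass over n rows."""
    data = make_rows(n)
    started = time.perf_counter()
    count = sum(1 for _ in csv.DictReader(data))
    return count, time.perf_counter() - started


for size in (1_000, 10_000):
    rows, seconds = time_pass(size)
    print(f"{rows} rows: {seconds:.3f} s")
```

&lt;p&gt;Timings depend on the machine and on how much work each row needs, so treat any single run as a rough guide.&lt;/p&gt;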

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Short term:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build n8n workflow automation&lt;/li&gt;
&lt;li&gt;Learn Pandas (see how professionals do this)&lt;/li&gt;
&lt;li&gt;Add more validation features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Medium term:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;4-6 portfolio projects&lt;/li&gt;
&lt;li&gt;First freelance automation work&lt;/li&gt;
&lt;li&gt;Technical blog (weekly updates)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Long term:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full-time automation engineer role&lt;/li&gt;
&lt;li&gt;Specialize in workflow automation&lt;/li&gt;
&lt;li&gt;Help others transition to tech&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resources That Helped
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python documentation&lt;/li&gt;
&lt;li&gt;Stack Overflow for specific syntax&lt;/li&gt;
&lt;li&gt;ChatGPT for explaining concepts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key insight:&lt;/strong&gt; Learn fundamentals BEFORE frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Takeaways for Aspiring Developers
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start ugly, refine later&lt;/strong&gt; - Working code beats perfect code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build in public&lt;/strong&gt; - Accountability and feedback accelerate growth&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QA/testing experience is valuable&lt;/strong&gt; - Quality mindset transfers to code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4 days is enough&lt;/strong&gt; - You don't need months to build something real&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;p&gt;Full project on GitHub: &lt;a href="https://github.com/jaber17/csv-contact-cleaner/tree/main" rel="noopener noreferrer"&gt;https://github.com/jaber17/csv-contact-cleaner/tree/main&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feel free to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use it for your projects&lt;/li&gt;
&lt;li&gt;Suggest improvements&lt;/li&gt;
&lt;li&gt;Ask questions in comments&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>automation</category>
      <category>beginners</category>
      <category>career</category>
    </item>
  </channel>
</rss>
