DEV Community

Ahmed Jaber Choudhury

How I Built a CSV Data Cleaner in 4 Days (Python Beginner Working Project)

Background

After 2+ years in QA (Meta, Microsoft) and RPA consulting, I decided to transition to automation engineering. This is my first Python project, built in 4 days, documented completely.

The Challenge

Build a production-ready CSV cleaner that:

  • Never loses data (even invalid entries)
  • Provides detailed error reports
  • Handles real-world messy data
  • Applies quality-first principles from my QA background

What I Built

A Python script that:
✅ Cleans 1000+ contacts in seconds
✅ Validates emails, phones, names, ages
✅ Separates valid from invalid data
✅ Generates detailed error reports

The Journey (Day by Day)

Day 1-2: Python Fundamentals

  • Variables, strings, functions
  • Dictionaries and lists
  • CSV file handling
  • Hardest part: Understanding loops and data flow

Day 3: Building the Core

  • Wrote 8 cleaning & validation functions
  • Implemented error handling
  • Breakthrough moment: Realizing each validation function should return its errors as a list

Day 4: Integration & Testing

  • Combined all functions
  • Added file writing
  • Tested with messy data
  • Key learning: Separation of concerns (cleaning vs validation)
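To make the cleaning vs validation split concrete, here is a minimal sketch. The function names clean_name and validate_name match the ones used in the main loop of this post, but the bodies and the specific rules (title case, no digits) are assumptions for illustration, not the exact code in the repository.

```python
def clean_name(raw):
    """Cleaning: normalize the value without judging it."""
    if raw is None:
        return ""
    # Collapse runs of whitespace and apply title case (assumed rules)
    return " ".join(str(raw).split()).title()


def validate_name(name):
    """Validation: inspect the already-cleaned value and report problems."""
    errors = []
    if not name:
        errors.append("Name is empty")
    elif any(ch.isdigit() for ch in name):
        errors.append("Name contains digits")
    return errors


cleaned = clean_name("  aLiCe   smith ")
print(cleaned, validate_name(cleaned))
```

The cleaner never rejects anything and the validator never modifies anything, so each one can be tested and changed on its own.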

Key Code Sections

The Validation Pattern

def validate_email(email):
    """Check email structure"""
    errors = []

    if "@" not in email:
        errors.append("Missing @")

    # More checks...

    return errors

This pattern:

  • Returns a list (can collect multiple errors)
  • Clear error messages
  • Easy to extend
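As a rough illustration of how the pattern extends, the sketch below adds a few more checks to the same function. Only the "Missing @" check comes from the snippet above; the other checks and their messages are assumptions picked for the example, and they are far from complete email validation.

```python
def validate_email(email):
    """Return a list of problems found in an email string (empty list = valid)."""
    errors = []

    if not email:
        # Nothing else is worth checking on an empty value
        return ["Email is empty"]

    if " " in email:
        errors.append("Contains spaces")

    if "@" not in email:
        errors.append("Missing @")
    elif email.count("@") > 1:
        errors.append("More than one @")
    else:
        # Exactly one @, so the split yields two parts
        local, domain = email.split("@")
        if not local:
            errors.append("Missing text before @")
        if "." not in domain:
            errors.append("Domain has no dot")

    return errors


print(validate_email("alice example.com"))
```

Because every check appends to the same list, one bad value can report several problems at once instead of stopping at the first.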

The Main Loop

for row_num, row in enumerate(reader, start=2):
    all_errors = []

    # Clean
    cleaned_name = clean_name(row.get("Name", ""))

    # Validate
    all_errors.extend(validate_name(cleaned_name))

    # Decide
    if all_errors:
        error_contacts.append(...)
    else:
        clean_contacts.append(...)
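The loop above is a fragment, so here is one way it could look as a small runnable sketch. The helpers are simplified stand-ins, and the process and write_rows functions, the "Row" and "Errors" column names, and the sample data are all assumptions for illustration rather than the project's actual code.

```python
import csv
import io


def clean_name(raw):
    # Stand-in cleaner: trim, collapse whitespace, title case
    return " ".join((raw or "").split()).title()


def validate_name(name):
    # Stand-in validator: only checks for an empty name
    return [] if name else ["Name is empty"]


def process(reader):
    clean_contacts, error_contacts = [], []
    # start=2 because row 1 of the file is the header
    for row_num, row in enumerate(reader, start=2):
        all_errors = []
        cleaned_name = clean_name(row.get("Name", ""))
        all_errors.extend(validate_name(cleaned_name))
        if all_errors:
            # Keep the original row untouched so no data is lost
            error_contacts.append(
                {**row, "Row": row_num, "Errors": "; ".join(all_errors)}
            )
        else:
            clean_contacts.append({**row, "Name": cleaned_name})
    return clean_contacts, error_contacts


def write_rows(out, rows):
    # Write a list of dicts as CSV to any file-like object
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


sample = io.StringIO("Name,Email\n  alice smith ,a@example.com\n,b@example.com\n")
clean, errors = process(csv.DictReader(sample))

report = io.StringIO()
write_rows(report, errors)
print(report.getvalue())
```

With a real file, io.StringIO would be replaced by open(path, newline="") for both reading and writing, which is what the csv module documentation recommends.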

What I Learned

Technical Skills:

  • Python fundamentals
  • CSV processing
  • Error handling patterns
  • Function design for reusability

Meta-Skills:

  • How to learn efficiently (fundamentals before frameworks)
  • How to debug systematically
  • How to write readable code
  • How to document your work

QA Mindset Applied to Code:

  • Test edge cases (empty strings, None values)
  • Detailed error reporting
  • Data integrity (never lose information)
  • Clear documentation
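Edge-case testing can be as simple as a table of inputs and expected outputs. The sketch below uses a hypothetical clean_phone function (the name and its digits-only rule are assumptions, not taken from the project) to show the idea for None, empty, and messy values.

```python
def clean_phone(raw):
    """Hypothetical cleaner: keep digits only; None and blanks become ''."""
    if raw is None:
        return ""
    return "".join(ch for ch in str(raw) if ch.isdigit())


# (input, expected output) pairs covering the awkward cases first
edge_cases = [
    (None, ""),
    ("", ""),
    ("   ", ""),
    ("(555) 123-4567", "5551234567"),
    (5551234567, "5551234567"),
]


def run_edge_cases():
    """Return a list of (input, expected, actual) for every failing case."""
    failures = []
    for raw, expected in edge_cases:
        actual = clean_phone(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures


print(run_edge_cases() or "all edge cases passed")
```

Collecting failures in a list, instead of stopping at the first one, mirrors the error-reporting style used in the validators.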

Mistakes I Made

  1. Initially tried to do everything in one function

    • Solution: Split the work into separate cleaning and validation functions
  2. Forgot error handling on type conversions

    • Solution: Wrap every type conversion in a try/except block
  3. Wanted to make it "perfect" before shipping

    • Solution: Ship a working version, iterate later
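For the type-conversion mistake, a guarded conversion might look like the sketch below. The validate_age name, the tuple return shape, and the 0 to 120 range are assumptions made for this example.

```python
def validate_age(raw):
    """Return (age_or_None, errors); bad input produces an error, not a crash."""
    errors = []
    try:
        # str() first, so None and numbers go through the same path
        age = int(str(raw).strip())
    except ValueError:
        return None, [f"Age is not a whole number: {raw!r}"]

    # Assumed plausibility range for a contact's age
    if not 0 <= age <= 120:
        errors.append(f"Age out of range: {age}")
    return age, errors


print(validate_age("abc"))
```

Catching only ValueError, rather than using a bare except, keeps unrelated bugs visible instead of silently turning them into validation errors.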

The Results

Project Stats:

  • ~200 lines of code
  • 8 functions
  • 4 days start to finish
  • 100% written by me (with help from learning resources)

Real-World Performance:

  • 1,000 rows: < 1 second
  • 10,000 rows: ~3 seconds
  • Handles the edge cases I tested (such as empty strings and None values) without crashing
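If you want to check numbers like these on your own machine, a rough timing sketch follows. It uses made-up in-memory data and a trivial stand-in for the cleaning step, so it measures only CSV parsing plus light string work, not the actual project; make_rows and time_pass are hypothetical helper names.

```python
import csv
import io
import time


def make_rows(n):
    """Build an in-memory CSV with n synthetic contacts (made-up data)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Name", "Email"])
    for i in range(n):
        writer.writerow([f"  user {i} ", f"user{i}@example.com"])
    buf.seek(0)
    return buf


def time_pass(n):
    """Return (rows processed, elapsed seconds) for one pass over n rows."""
    data = make_rows(n)  # built before the timer starts
    start = time.perf_counter()
    processed = 0
    for row in csv.DictReader(data):
        name = " ".join(row["Name"].split()).title()
        if name and "@" in row["Email"]:
            processed += 1
    return processed, time.perf_counter() - start


processed, seconds = time_pass(10_000)
print(f"{processed} rows in {seconds:.3f}s")
```

Results will vary with hardware and with how much work each validator does, so treat any single run as a ballpark figure.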

What's Next

Short term:

  • Build n8n workflow automation
  • Learn Pandas (see how professionals do this)
  • Add more validation features

Medium term:

  • 4-6 portfolio projects
  • First freelance automation work
  • Technical blog (weekly updates)

Long term:

  • Full-time automation engineer role
  • Specialize in workflow automation
  • Help others transition to tech

Resources That Helped

  • Python documentation
  • Stack Overflow for specific syntax
  • ChatGPT for explaining concepts
  • Key insight: Learn fundamentals BEFORE frameworks

Takeaways for Aspiring Developers

  1. Start ugly, refine later - Working code beats perfect code
  2. Build in public - Accountability and feedback accelerate growth
  3. QA/testing experience is valuable - Quality mindset transfers to code
  4. 4 days is enough - You don't need months to build something real

The Code

Full project on GitHub: https://github.com/jaber17/csv-contact-cleaner/tree/main

Feel free to:

  • Use it for your projects
  • Suggest improvements
  • Ask questions in comments
