Background
After 2+ years in QA (Meta, Microsoft) and RPA consulting, I decided to transition to automation engineering. This is my first Python project, built in 4 days, documented completely.
The Challenge
Build a production-ready CSV cleaner that:
- Never loses data (even invalid entries)
- Provides detailed error reports
- Handles real-world messy data
- Uses quality-first principles
What I Built
A Python script that:
✅ Cleans 1000+ contacts in seconds
✅ Validates emails, phones, names, ages
✅ Separates valid from invalid data
✅ Generates detailed error reports
The Journey (Day by Day)
Day 1-2: Python Fundamentals
- Variables, strings, functions
- Dictionaries and lists
- CSV file handling
- Hardest part: Understanding loops and data flow
Day 3: Building the Core
- Wrote 8 cleaning & validation functions
- Implemented error handling
- Breakthrough moment: Realizing each function should return errors as a list
Day 4: Integration & Testing
- Combined all functions
- Added file writing
- Tested with messy data
- Key learning: Separation of concerns (cleaning vs validation)
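To make the cleaning vs validation split concrete, here is a minimal sketch. The names clean_phone and validate_phone and the 10-digit rule are assumptions for illustration, not necessarily what the repo does. The cleaner only normalizes; the validator only reports problems.

```python
def clean_phone(raw):
    """Cleaning: normalize the value without judging it."""
    if raw is None:
        return ""
    # Keep digits only, so "(555) 123-4567" becomes "5551234567"
    return "".join(ch for ch in str(raw) if ch.isdigit())


def validate_phone(phone):
    """Validation: inspect the cleaned value and report problems."""
    errors = []
    if not phone:
        errors.append("Phone is empty")
    elif len(phone) != 10:
        # Assumption: 10-digit numbers; adjust for your own data
        errors.append(f"Phone has {len(phone)} digits, expected 10")
    return errors


cleaned = clean_phone(" (555) 123-4567 ")
print(cleaned, validate_phone(cleaned))
```

Because neither function knows about the other, each can be tested and changed on its own.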
Key Code Sections
The Validation Pattern
def validate_email(email):
    """Check email structure"""
    errors = []
    if "@" not in email:
        errors.append("Missing @")
    # More checks...
    return errors
This pattern:
- Returns a list, so one value can collect multiple errors
- Produces clear error messages
- Is easy to extend with new checks
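Here is one way the pattern might look with a few more checks filled in. The specific rules and messages below are my illustrative assumptions, not the exact checks in the repo; the point is that every check appends to the same list.

```python
def validate_email(email):
    """Check email structure and return every problem found."""
    errors = []
    if not email:
        # Nothing else to check on an empty or None value
        return ["Email is empty"]
    if " " in email:
        errors.append("Contains spaces")
    if email.count("@") != 1:
        errors.append("Must contain exactly one @")
    else:
        local, domain = email.split("@")
        if not local:
            errors.append("Missing text before @")
        if "." not in domain:
            errors.append("Domain is missing a dot")
    return errors


print(validate_email("user@example.com"))
print(validate_email("bad email"))
```

An empty list means the value passed, so the caller can simply extend its own error list with the result.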
The Main Loop
for row_num, row in enumerate(reader, start=2):
    all_errors = []
    # Clean
    cleaned_name = clean_name(row.get("Name", ""))
    # Validate
    all_errors.extend(validate_name(cleaned_name))
    # Decide
    if all_errors:
        error_contacts.append(...)
    else:
        clean_contacts.append(...)
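The loop above can be sketched end to end as follows. This is a runnable approximation under assumptions: clean_name and validate_name are simplified stand-ins, the process wrapper and the "Row"/"Errors" keys are hypothetical, and the input is an in-memory CSV rather than a real file.

```python
import csv
import io


def clean_name(raw):
    # Hypothetical cleaner: collapse whitespace and title-case
    return " ".join((raw or "").split()).title()


def validate_name(name):
    # Hypothetical validator: only checks for an empty value
    return [] if name else ["Name is empty"]


def process(reader):
    clean_contacts, error_contacts = [], []
    # start=2 because row 1 of the file is the header
    for row_num, row in enumerate(reader, start=2):
        all_errors = []
        cleaned_name = clean_name(row.get("Name", ""))
        all_errors.extend(validate_name(cleaned_name))
        if all_errors:
            # Keep the original row so no data is lost
            error_contacts.append(
                {**row, "Row": row_num, "Errors": "; ".join(all_errors)}
            )
        else:
            clean_contacts.append({**row, "Name": cleaned_name})
    return clean_contacts, error_contacts


sample = io.StringIO(
    "Name,Email\n  ada   lovelace ,ada@example.com\n,nobody@example.com\n"
)
clean, errors = process(csv.DictReader(sample))
print(clean)
print(errors)
```

Invalid rows keep their original values plus the row number and error text, which is what makes the "never lose data" goal checkable.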
What I Learned
Technical Skills:
- Python fundamentals
- CSV processing
- Error handling patterns
- Function design for reusability
Meta-Skills:
- How to learn efficiently (fundamentals before frameworks)
- How to debug systematically
- How to write readable code
- How to document your work
QA Mindset Applied to Code:
- Test edge cases (empty strings, None values)
- Detailed error reporting
- Data integrity (never lose information)
- Clear documentation
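Edge-case testing like this can be written as a small table of inputs and expected outputs. The clean_text helper below is a hypothetical example, not a function from the repo.

```python
def clean_text(value):
    """Hypothetical cleaner: always returns a str, never raises."""
    if value is None:
        return ""
    return str(value).strip()


# Table of (input, expected) pairs covering the awkward cases
cases = [
    (None, ""),
    ("", ""),
    ("   ", ""),
    ("  Ada ", "Ada"),
    (42, "42"),
]

for raw, expected in cases:
    assert clean_text(raw) == expected, f"{raw!r} -> {clean_text(raw)!r}"
```

Adding a new edge case is then a one-line change to the table.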
Mistakes I Made
- Initially tried to do everything in one function
  - Solution: Split into cleaning and validation
- Forgot error handling on type conversions
  - Solution: try/except blocks everywhere
- Wanted to make it "perfect" before shipping
  - Solution: Ship working version, iterate later
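For the type conversion mistake, a guarded conversion might look like the sketch below. The function name validate_age and the 0-120 range are assumptions for illustration. Catching only ValueError keeps unrelated bugs from being silently swallowed.

```python
def validate_age(raw):
    """Return a list of errors for an age value read from a CSV cell."""
    errors = []
    try:
        age = int(str(raw).strip())
    except ValueError:
        # int() raises ValueError for "", "abc", "12.5", "None", ...
        return [f"Age is not a whole number: {raw!r}"]
    if not 0 <= age <= 120:
        # Assumption: 0-120 is the accepted range
        errors.append(f"Age out of range: {age}")
    return errors


print(validate_age("30"))
print(validate_age("abc"))
```

A bad value becomes an entry in the error report instead of a crash halfway through the file.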
The Results
Project Stats:
- ~200 lines of code
- 8 functions
- 4 days start to finish
- 100% written by myself (with learning resources)
Real-World Performance:
- 1,000 rows: < 1 second
- 10,000 rows: ~3 seconds
- Handles the edge cases I tested (empty strings, None values) gracefully
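Timings like these can be reproduced with a small harness along the following lines. This is a sketch under assumptions: the data is synthetic, make_csv and time_pass are hypothetical helpers, and the per-row work is a stand-in for the real cleaning and validation, so the numbers it prints will differ from the ones above.

```python
import csv
import io
import time


def make_csv(n_rows):
    """Build an in-memory CSV with n_rows synthetic contacts."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Name", "Email"])
    for i in range(n_rows):
        writer.writerow([f"person {i}", f"person{i}@example.com"])
    buf.seek(0)
    return buf


def time_pass(n_rows):
    """Time one read-and-touch pass over n_rows rows."""
    source = make_csv(n_rows)
    start = time.perf_counter()
    # Stand-in for the real cleaning and validation work
    count = sum(1 for row in csv.DictReader(source) if row["Name"].strip())
    return count, time.perf_counter() - start


for n in (1_000, 10_000):
    count, seconds = time_pass(n)
    print(f"{n} rows: {count} processed in {seconds:.3f}s")
```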
What's Next
Short term:
- Build n8n workflow automation
- Learn Pandas (see how professionals do this)
- Add more validation features
Medium term:
- 4-6 portfolio projects
- First freelance automation work
- Technical blog (weekly updates)
Long term:
- Full-time automation engineer role
- Specialize in workflow automation
- Help others transition to tech
Resources That Helped
- Python documentation
- Stack Overflow for specific syntax
- ChatGPT for explaining concepts
- Key insight: Learn fundamentals BEFORE frameworks
Takeaways for Aspiring Developers
- Start ugly, refine later - Working code beats perfect code
- Build in public - Accountability and feedback accelerate growth
- QA/testing experience is valuable - Quality mindset transfers to code
- 4 days is enough - You don't need months to build something real
The Code
Full project on GitHub: https://github.com/jaber17/csv-contact-cleaner/tree/main
Feel free to:
- Use it for your projects
- Suggest improvements
- Ask questions in comments