Hey all,
I'm fairly new to data work and just finished a small project to get hands-on experience with data cleaning and feature engineering. It’s based on a simulated café sales dataset from Kaggle.
This is my first real attempt at tackling messy data, and I’d love to hear from anyone - especially those of you working with data professionally or regularly - about how I did and how I can improve.
About the Project:
- Dataset: Artificially generated café sales data (10,000 rows)
- Tools used: Python (Pandas, NumPy), Jupyter Notebook
- Goal: Learn and demonstrate data cleaning techniques
What I worked on:
- Handling missing values
- Fixing inconsistent text formatting
- Correcting data types
- Replacing unclear placeholders like "error" or "unknown"
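
For anyone who wants a concrete picture before opening the notebook, here is a minimal sketch of how these four steps can look in pandas. The column names, sample values, and the `clean` helper are made up for illustration and are not taken from the actual dataset or my notebook. Filling missing prices with the median is just one possible choice, not necessarily the right one for this data.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset's columns and values may differ.
raw = pd.DataFrame({
    "item": [" coffee", "Tea ", "ERROR", "cake", None],
    "quantity": ["2", "unknown", "1", "3", "4"],
    "price": ["2.0", "1.5", "error", "3.0", "UNKNOWN"],
})

# Placeholder strings treated as missing (compared after lowercasing).
PLACEHOLDERS = ["error", "unknown", ""]


def clean(df):
    """Return a cleaned copy of df (illustrative helper, assumed schema)."""
    out = df.copy()
    for col in ["item", "quantity", "price"]:
        # Fix inconsistent text formatting: trim whitespace, lowercase.
        text = out[col].str.strip().str.lower()
        # Replace unclear placeholders with real missing values.
        out[col] = text.mask(text.isin(PLACEHOLDERS))
    # Present item names in a consistent title case.
    out["item"] = out["item"].str.title()
    # Correct data types: numeric columns arrive as strings.
    out["quantity"] = pd.to_numeric(out["quantity"], errors="coerce")
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    # Handle missing values: impute price with the column median (one option).
    out["price"] = out["price"].fillna(out["price"].median())
    return out


cleaned = clean(raw)
print(cleaned)
print(cleaned.isna().sum())  # remaining missing values per column
```

Converting placeholders to missing values first keeps the later steps simple: `pd.to_numeric` and `fillna` then only have to deal with one kind of "missing".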
GitHub:
Check it out here
I'd be super grateful for your feedback on:
- How clean and readable my code is
- Whether my cleaning approach makes sense
- Ideas on what I could have done better or differently
Thank you so much in advance! I truly appreciate every single comment or suggestion you might have. If you have any tips on how I can continue learning or what to explore next, I'd love to hear them!
Thank you.