Feedback needed: Mini Data Cleaning & Feature Engineering Project (Café Sales)

#machinelearning #python #programming #datascience

Hey all,

I'm fairly new to data work and just finished a small project to get hands-on experience with data cleaning and feature engineering. It’s based on a simulated café sales dataset from Kaggle.

This is my first real attempt at tackling messy data, and I’d love to hear from anyone - especially those of you working with data professionally or regularly - about how I did and how I can improve.

About the Project:

Dataset: Artificially generated café sales data (10,000 rows)
Tools used: Python (Pandas, NumPy), Jupyter Notebook
Goal: Learn and demonstrate data cleaning techniques

What I worked on:

Handling missing values
Fixing inconsistent text formatting
Correcting data types
Replacing unclear placeholders like "error" or "unknown"
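
As a rough illustration of the four steps above, here is a minimal pandas sketch. The column names (`Item`, `Quantity`, `Price Per Unit`), the sample rows, the `clean` helper, and the choice to fill missing prices with the median are all assumptions made for this example; it is not the code from the notebook.

```python
import pandas as pd

# Hypothetical sample rows; every column is read as text, as is common with messy CSVs.
raw = pd.DataFrame({
    "Item": [" coffee", "Tea ", "ERROR", "cake", None],
    "Quantity": ["2", "unknown", "1", "3", "4"],
    "Price Per Unit": ["2.0", "1.5", "error", "3.0", "2.0"],
})

# Placeholder strings treated as missing (compared case-insensitively).
PLACEHOLDERS = {"error", "unknown", ""}


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Fix inconsistent text formatting and replace placeholders with missing values.
    for col in out.columns:
        text = out[col].str.strip()
        out[col] = text.mask(text.str.lower().isin(PLACEHOLDERS))

    # Use one consistent capitalisation for item names.
    out["Item"] = out["Item"].str.title()

    # Correct data types; anything unparseable becomes missing.
    out["Quantity"] = pd.to_numeric(out["Quantity"], errors="coerce").astype("Int64")
    out["Price Per Unit"] = pd.to_numeric(out["Price Per Unit"], errors="coerce")

    # Handle missing values: fill prices with the median (one possible choice),
    # and leave missing quantities and items as missing.
    out["Price Per Unit"] = out["Price Per Unit"].fillna(out["Price Per Unit"].median())
    return out


cleaned = clean(raw)
print(cleaned)
print(cleaned.dtypes)
```

Whether to fill, drop, or keep missing values depends on what the data is used for afterwards, so the median fill here is only one option.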

GitHub:
Check it out here

I'd be super grateful for your feedback on:
How clean and readable my code is
Whether my cleaning approach makes sense
Ideas on what I could have done better or differently

Thank you so much in advance! I appreciate every comment and suggestion. If you have any tips on how I can keep learning or what to explore next, I'd love to hear them.

DEV Community
