DEV Community

Naveen Gokul


🧹 Data Cleaning Challenge with Pandas (Google Colab)


Data cleaning is one of the most crucial steps in any data science or analytics project. In this challenge, I worked on a real-world dataset from Kaggle with over 100,000 rows, performing various Pandas operations to clean, preprocess, and prepare it for further analysis.


📂 Dataset Details

For this challenge, I selected the E-commerce Sales Dataset from Kaggle containing around 120,000 rows and 12 columns.

It includes data such as:

  • 🧾 Order ID
  • 👤 Customer Name
  • 🛒 Product & Quantity
  • 💰 Sales & Discount
  • 🌍 Region
  • 📅 Order Date

Before Cleaning:

  • Rows → 120,000
  • Columns → 12
  • File format → .csv
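The "before cleaning" figures can be confirmed with a quick look at the DataFrame's shape and missing values. The snippet below is a minimal sketch that uses a tiny stand-in DataFrame, since the real file is not reproduced here; the column names and values are assumptions for illustration, not the actual Kaggle schema.

```python
import pandas as pd

# Tiny stand-in for the real dataset (hypothetical columns and values)
df = pd.DataFrame({
    "Order ID": [1001, 1002, 1003],
    "Customer Name": ["Asha", "Ravi", None],
    "Quantity": [2, 1, 5],
})

# df.shape is a (rows, columns) tuple
n_rows, n_cols = df.shape

# Count missing values in each column
missing_per_column = df.isna().sum()

print(f"Rows: {n_rows}, Columns: {n_cols}")
print(missing_per_column)
```

Run against the real dataset, the same two calls would be expected to report roughly 120,000 rows and 12 columns.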

⚙️ Tools & Environment

  • Python 3
  • Google Colab
  • Libraries: Pandas, NumPy, Matplotlib

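```python
# Upload the CSV from the local machine into the Colab session
from google.colab import files
uploaded = files.upload()

# Load the uploaded file into a DataFrame
# (the name passed here must match the name of the uploaded file)
import pandas as pd
df = pd.read_csv('ecommerce_sales.csv')
```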
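In Colab, `files.upload()` returns a dictionary that maps each uploaded file name to its raw bytes, so the CSV can also be read straight from that dictionary instead of hardcoding the file name. Here is a minimal sketch of that idea; the helper `load_first_csv` is a hypothetical name, and the upload is simulated with a small in-memory dictionary so the example also runs outside Colab.

```python
import io
import pandas as pd

def load_first_csv(uploaded):
    """Read the first uploaded file as CSV.

    `uploaded` is a dict of file name -> raw bytes, which is the
    shape of the value returned by files.upload() in Colab.
    """
    name, content = next(iter(uploaded.items()))
    return name, pd.read_csv(io.BytesIO(content))

# Simulated upload (hypothetical content) for running outside Colab
fake_upload = {"ecommerce_sales.csv": b"Order ID,Quantity\n1001,2\n1002,1\n"}

name, df = load_first_csv(fake_upload)
print(name, df.shape)
```

In a Colab notebook, passing the real `uploaded` dictionary in place of `fake_upload` should avoid errors caused by the uploaded file being renamed (for example when the same file is uploaded twice).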
