
Likith N


AutoCleanML – Intelligent ML Data Preprocessing Automation (pip install autocleanml)

If you’ve ever built a machine learning project, you already know the truth:
As the old rule of thumb goes, 80% of ML work is data cleaning.
And 80% of that cleaning is… repetitive.

Handling missing values, encoding categoricals, scaling features, fixing data types: with every new dataset, it's the same boilerplate in a different notebook.
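To make that boilerplate concrete, here is a minimal sketch of the four steps done by hand with plain pandas. The dataset, the column names, and the `manual_clean` helper are made up for illustration; this is not how AutoCleanML works internally, just the kind of code that tends to get rewritten for each project.

```python
import pandas as pd

# Tiny made-up dataset standing in for a real CSV (hypothetical columns).
raw = pd.DataFrame({
    "age": ["25", "32", None, "41"],          # numbers stored as strings
    "city": ["Pune", None, "Delhi", "Pune"],  # categorical with a gap
    "income": [30000.0, 52000.0, 47000.0, None],
})

def manual_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the usual hand-written cleaning steps."""
    out = df.copy()

    # Fix data types: coerce string-encoded numbers to floats.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")

    # Handle missing values: median for numeric, mode for categorical.
    for col in ["age", "income"]:
        out[col] = out[col].fillna(out[col].median())
    out["city"] = out["city"].fillna(out["city"].mode()[0])

    # Scale features: standardize to zero mean and unit variance.
    for col in ["age", "income"]:
        out[col] = (out[col] - out[col].mean()) / out[col].std(ddof=0)

    # Encode categoricals: one-hot encode the city column.
    return pd.get_dummies(out, columns=["city"], dtype=int)

cleaned = manual_clean(raw)
print(cleaned)
```

In a real project the medians, modes, and scaling statistics would normally be computed on the training split only and then applied to the test split, which is one more detail that is easy to get wrong when the code is copied from notebook to notebook.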

After repeating this cycle one too many times, I decided to automate it.
That’s how AutoCleanML was born.

The Problem I Faced

As a student working on multiple ML projects and datasets, I noticed:

  • Writing the same preprocessing code again and again
  • Inconsistent cleaning logic across projects
  • Hard-to-maintain notebooks
  • Beginners getting stuck before even training a model

I wanted something that:

  • Works out of the box
  • Follows best practices
  • Is modular, reusable, and simple

✨ Introducing AutoCleanML

AutoCleanML is a Python library that helps you automatically clean and preprocess datasets for machine learning with minimal code.

It’s built for:

  • Students
  • ML beginners
  • Data science interns
  • Anyone tired of rewriting preprocessing logic

Using AutoCleanML
With AutoCleanML, you can go from a raw dataset to train-test splits in just a few lines.

import pandas as pd
from autocleanml import AutoCleanML

# Load dataset
df = pd.read_csv("data.csv")

# Initialize cleaner
cleaner = AutoCleanML(target_column="target")

# Clean data and split automatically
X_train, X_test, y_train, y_test, report = cleaner.fit_transform(df)

# Check preprocessing summary
print(report)
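Whatever tool produces the splits, a quick sanity check before training is cheap. The sketch below assumes the four outputs are pandas objects, which this post does not confirm for AutoCleanML; `check_splits` is a hypothetical helper, and the small frames are stand-ins rather than real library output.

```python
import pandas as pd

def check_splits(X_train, X_test, y_train, y_test):
    """Hypothetical helper: return a list of problems (empty if none)."""
    problems = []
    if len(X_train) != len(y_train):
        problems.append("X_train and y_train differ in length")
    if len(X_test) != len(y_test):
        problems.append("X_test and y_test differ in length")
    if list(X_train.columns) != list(X_test.columns):
        problems.append("train and test columns differ")
    if X_train.isna().any().any() or X_test.isna().any().any():
        problems.append("missing values remain in the features")
    return problems

# Stand-in splits (made-up data, not produced by AutoCleanML).
X_train = pd.DataFrame({"a": [0.1, 0.5, 0.9], "b": [1, 0, 1]})
X_test = pd.DataFrame({"a": [0.3], "b": [0]})
y_train = pd.Series([0, 1, 1])
y_test = pd.Series([0])

print(check_splits(X_train, X_test, y_train, y_test))
```

An empty list means the basic invariants hold; anything else is worth investigating before fitting a model.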

If data cleaning feels repetitive or slows down your ML projects, give AutoCleanML a try and see how much time it saves you.

🔗 GitHub: https://github.com/likith-n/AutoCleanML
📦 PyPI: https://pypi.org/project/AutoCleanML/

I’d genuinely love feedback from the community — whether it’s ideas, issues, or improvements.
If you find it useful, consider ⭐ starring the repo or sharing the post so others can benefit too.

Open source grows through people, not just code ❤️
Happy cleaning & happy modeling!
