AutoCleanML – Intelligent ML Data Preprocessing Automation (pip install autocleanml)

#opensource #python #automation #machinelearning

If you’ve ever built a machine learning project, you already know the truth:
80% of ML work is data cleaning.
And 80% of that cleaning is… repetitive.

Handling missing values, encoding categoricals, scaling features, fixing data types — every new dataset, same boilerplate, different notebook.
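
To make that concrete, here is a rough sketch of what the manual version tends to look like in plain pandas. The toy data, the column names, and the `manual_clean` helper are all made up for illustration, and this is only my usual hand-written routine, not a description of what AutoCleanML does internally.

```python
import pandas as pd

# Toy dataset (hypothetical columns) with the usual problems
raw = pd.DataFrame({
    "age": ["25", "32", None, "41"],           # numbers stored as strings, one missing
    "city": ["Pune", None, "Delhi", "Pune"],   # categorical with a missing value
    "income": [30000.0, 52000.0, 47000.0, None],
})

NUMERIC_COLS = ["age", "income"]
CATEGORICAL_COLS = ["city"]


def manual_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hand-written cleaning: fix types, impute, encode, scale."""
    out = df.copy()

    # 1. Fix data types: turn string numbers into real numbers
    out["age"] = pd.to_numeric(out["age"], errors="coerce")

    # 2. Handle missing values: median for numeric, most frequent for categorical
    for col in NUMERIC_COLS:
        out[col] = out[col].fillna(out[col].median())
    for col in CATEGORICAL_COLS:
        out[col] = out[col].fillna(out[col].mode()[0])

    # 3. Encode categoricals as one-hot columns
    out = pd.get_dummies(out, columns=CATEGORICAL_COLS, dtype=int)

    # 4. Scale numeric features to zero mean and unit variance
    for col in NUMERIC_COLS:
        out[col] = (out[col] - out[col].mean()) / out[col].std(ddof=0)

    return out


cleaned = manual_clean(raw)
print(cleaned)
```

Four steps, three columns, and it is already a screenful. On a real project you would also want to compute the medians and scaling statistics on the training split only, which adds even more code to copy between notebooks.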

After repeating this cycle one too many times, I decided to automate it.
That’s how AutoCleanML was born.

The Problem I Faced

As a student working on multiple ML projects and datasets, I noticed:

Writing the same preprocessing code again and again
Inconsistent cleaning logic across projects
Hard-to-maintain notebooks
Beginners getting stuck before even training a model

I wanted something that:

Works out of the box
Follows best practices
Is modular, reusable, and simple

✨ Introducing AutoCleanML

AutoCleanML is a Python library that helps you automatically clean and preprocess datasets for machine learning with minimal code.

It’s built for:

Students
ML beginners
Data science interns
Anyone tired of rewriting preprocessing logic

Using AutoCleanML
With AutoCleanML, you can go from a raw dataset to train-test splits in just a few lines.

import pandas as pd
from autocleanml import AutoCleanML

# Load dataset
df = pd.read_csv("data.csv")

# Initialize cleaner
cleaner = AutoCleanML(target_column="target")

# Clean data and split automatically
X_train, X_test, y_train, y_test, report = cleaner.fit_transform(df)

# Check preprocessing summary
print(report)

If data cleaning feels repetitive or slows down your ML projects, give AutoCleanML a try and see how much time it saves you.

🔗 GitHub: https://github.com/likith-n/AutoCleanML
📦 PyPI: https://pypi.org/project/AutoCleanML/

I’d genuinely love feedback from the community — whether it’s ideas, issues, or improvements.
If you find it useful, consider ⭐ starring the repo or sharing the post so others can benefit too.

Open source grows through people, not just code ❤️
Happy cleaning & happy modeling!
