Jamie

Day 5 + 6: Pre-processing data

I decided to group days 5 and 6 together (today and yesterday) since they were both on the same topic. I'd picked up a little bit about pre-processing data during my PhD, but I didn't really know the actual terms for it or how to do it in Python.
Clefairy trying to use a computer

The Code

This is very helpful: converting categorical columns into binary (0/1) indicator columns so that the data works better with machine learning models:
df_origin = pd.get_dummies(df)
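To make that one-liner a bit more concrete, here is a minimal sketch of what it does. The DataFrame and its column names ("colour", "height") are made up for illustration and are not from the course.

```python
import pandas as pd

# Hypothetical toy data; the column names are invented for this example.
df = pd.DataFrame({
    "colour": ["red", "blue", "red"],
    "height": [1.2, 3.4, 5.6],
})

# Each category in "colour" becomes its own indicator column
# (colour_blue, colour_red); the numeric "height" column is left as it is.
df_origin = pd.get_dummies(df)
print(df_origin.columns.tolist())
```

Depending on the pandas version, the new indicator columns may hold booleans or 0/1 integers, but either way they can be fed to a model as numbers.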
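Replace all zero values in a column with NaN, then drop every row that contains a NaN:
import numpy as np
df['column'] = df['column'].replace(0, np.nan)
df = df.dropna()  # dropna() returns a new DataFrame, so the result has to be assigned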
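Here is a small runnable sketch of the same idea, assuming a made-up DataFrame where a zero in the hypothetical "insulin" column really means "missing" rather than a true measurement.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; both column names are invented for this example.
df = pd.DataFrame({
    "insulin": [0, 94, 168, 0],
    "age": [50, 31, 32, 21],
})

# Mark the zeros in one column as missing values.
df["insulin"] = df["insulin"].replace(0, np.nan)

# Drop every row that now contains a NaN. dropna() returns a new
# DataFrame rather than changing df, so keep the result.
df_clean = df.dropna()
print(len(df), "rows before,", len(df_clean), "rows after")
```

This only makes sense when zero is standing in for a missing value. If zero is a valid reading in your data, replacing it would throw away real rows.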

Scale the data so that every feature has a mean of zero and a standard deviation of one, which stops features with large values from dominating the others:
from sklearn.preprocessing import scale
X_scaled = scale(X)
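As a quick check of what scaling does, here is a minimal sketch using a made-up feature array (the numbers are invented for illustration). It assumes scikit-learn and NumPy are installed.

```python
import numpy as np
from sklearn.preprocessing import scale

# Hypothetical features on very different scales (values are made up).
X = np.array([
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 300.0],
])

# scale() standardizes each column: subtract the column mean,
# then divide by the column standard deviation.
X_scaled = scale(X)

print(X_scaled.mean(axis=0))  # each column's mean is (approximately) zero
print(X_scaled.std(axis=0))   # each column's standard deviation is one
```

After scaling, both columns end up with identical values even though the originals differed by a factor of 100, which is the point: no feature outweighs another just because of its units.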

That's the end of the DataCamp course on scikit-learn (well, this introduction anyway).
Kid walking down the hall to applause but also looking scared, because that's how I feel

Thoughts on the course

Would I recommend the DataCamp course I did? That's actually a tough question. If you already pay for DataCamp or can get a free trial, I'd recommend doing this course. But I don't recommend paying for DataCamp specifically for this course. It's a good course for learning the terminology you need to do some of the basics of machine learning in Python (see previous days for specifics on what the whole course contained). I just feel like you could probably find free resources that teach the same things.
Anyway, now that I've finished the course I'm going to try and build my first machine learning model on Kaggle tomorrow.
Me freaking out and being all scared because that's how my brain feels right now