Day 5 + 6: Pre-processing data

#machinelearning #python #100daysofcode #datascience

I decided to group days 5 and 6 together (today and yesterday) since they were both on the same topic. I'd picked up a little bit about pre-processing data through my PhD, but I didn't really know the actual terms or how to do these things in Python.

The Code

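The snippets in this section assume the usual imports:
import numpy as np
import pandas as pd
This one is very helpful: convert categorical columns into binary (dummy) columns so that scikit-learn models can work with them:
df_origin = pd.get_dummies(df)
Replace all zero values in a column with NaN, then drop every row that contains a NaN (dropna returns a new DataFrame, so assign the result):
df['column'] = df['column'].replace(0, np.nan)
df = df.dropna()
Scale the data so that every feature ends up on a comparable scale. scale standardizes each feature to zero mean and unit variance rather than squeezing it into a fixed range:
from sklearn.preprocessing import scale
X_scaled = scale(X)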
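
To see how these steps fit together, here's a minimal sketch on a tiny made-up DataFrame. The column names (region, glucose, bmi) and the values are invented purely for illustration and are not from the course, and treating 0 as "missing" is an assumption about this pretend dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import scale

# Made-up example data (hypothetical columns, not from the course)
df = pd.DataFrame({
    "region": ["north", "south", "north", "east"],
    "glucose": [85, 0, 120, 99],   # assumption: 0 stands in for a missing reading
    "bmi": [22.5, 30.1, 27.3, 24.8],
})

# 1. Turn the categorical column into binary indicator columns
df_dummies = pd.get_dummies(df)

# 2. Mark zeros as missing, then drop any row with a missing value
df_dummies["glucose"] = df_dummies["glucose"].replace(0, np.nan)
df_clean = df_dummies.dropna()

# 3. Standardize the numeric features (zero mean, unit variance per column)
X = df_clean[["glucose", "bmi"]].to_numpy(dtype=float)
X_scaled = scale(X)

print(df_clean)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

The row with the zero glucose reading is dropped before scaling, so the mean and standard deviation used for standardizing are computed only from the rows that remain.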

That's the end of the DataCamp course on scikit-learn (well, this introduction anyway).

Thoughts on the course

Would I recommend the DataCamp course I did? That's actually a tough question. If you already pay for DataCamp or can get a free trial, I'd recommend doing this course. But I don't recommend paying for DataCamp specifically for this course. It's a good course for learning the terminology you need to do some of the basics of machine learning in Python (see previous days for specifics on what the whole course contained). I just feel like you could probably find free resources that teach the same things.
Anyway, now that I've finished the course I'm going to try and build my first machine learning model on Kaggle tomorrow.
