I decided to group days 5 and 6 together (yesterday and today) since they covered the same topic: pre-processing data. I'd picked up a bit of pre-processing during my PhD, but I didn't know the proper terms for doing it in Python.
Being able to convert categorical variables into binary (dummy) columns is very helpful, since most machine learning models expect numeric input:
import pandas as pd
df_origin = pd.get_dummies(df)  # one indicator column per category
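Here's a small sketch of what get_dummies actually produces. The DataFrame and its column names ("origin", "mpg") are made up for illustration. Numeric columns are left alone, and each categorical column is replaced by one indicator column per category. The drop_first option is shown as an alternative that keeps one fewer column per category.

```python
import pandas as pd

# Hypothetical toy data: 'origin' is categorical, 'mpg' is numeric.
df = pd.DataFrame({
    "origin": ["US", "Europe", "Asia", "US"],
    "mpg": [18.0, 26.0, 31.0, 20.0],
})

# Numeric columns are kept as-is; 'origin' becomes origin_Asia,
# origin_Europe and origin_US (categories in sorted order).
df_origin = pd.get_dummies(df)
print(df_origin.columns.tolist())

# drop_first=True drops the first category's column, because that
# category is implied whenever all the other indicators are 0.
df_origin_reduced = pd.get_dummies(df, drop_first=True)
print(df_origin_reduced.columns.tolist())
```

Depending on your pandas version the indicator columns may be stored as True/False rather than 0/1. Passing dtype=int to get_dummies gives plain integers if you prefer those.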
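Replace all zero values in a column with NaN (using numpy's np.nan), then drop every row that contains a NaN. Here column is a placeholder for the real column name:
df['column'] = df['column'].replace(0, np.nan)
df = df.dropna()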
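A slightly fuller, self-contained sketch of the replace-then-drop step, assuming a made-up dataset where a 0 in the hypothetical "insulin" column really means "not measured". It assigns the result back instead of using inplace=True on an attribute like df.column, since newer pandas versions may not propagate that change to the original DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 0 in 'insulin' stands for a missing measurement.
df = pd.DataFrame({
    "insulin": [0, 94, 168, 0, 88],
    "glucose": [148, 89, 137, 116, 78],
})

# Turn the fake zeros into real missing values and assign back.
df["insulin"] = df["insulin"].replace(0, np.nan)
print(df.isna().sum())  # number of missing values per column

# dropna() removes every row that has at least one NaN.
df_clean = df.dropna()
print(df_clean.shape)
```

If only some columns matter, df.dropna(subset=["insulin"]) restricts the check to those columns instead of the whole row.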
Standardize the data so each feature is centred on zero with a standard deviation of one:
from sklearn.preprocessing import scale
X_scaled = scale(X)
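To see what scale() does, here's a small sketch on a made-up feature matrix. scale() standardizes each column (subtract the mean, divide by the standard deviation); it does not force values into a fixed range. If a fixed range is what you want, MinMaxScaler is the scikit-learn tool for that, shown here for comparison.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, scale

# Hypothetical features on very different scales.
X = np.array([
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 300.0],
])

# Standardization: each column ends up with mean 0 and std 1.
X_scaled = scale(X)
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))

# Min-max scaling: each column is squashed into [0, 1] by default.
X_minmax = MinMaxScaler().fit_transform(X)
print(X_minmax)
```

When you have a train/test split, StandardScaler (fitted on the training data only) is usually preferred over scale(), so the test set is transformed with the training set's mean and standard deviation.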
Would I recommend the DataCamp course I did? That's actually a tough question. If you already pay for DataCamp or can get a free trial, I'd recommend doing this course. But I don't recommend paying for DataCamp specifically for this course. It's a good way to learn the terminology you need to do the basics of machine learning in Python (see previous days for specifics on what the whole course contained). I just suspect you could pick up the same knowledge from free resources.
Anyway, now that I've finished the course I'm going to try and build my first machine learning model on Kaggle tomorrow.