Today's the day to build my first model!
I decided to go with a K-Nearest Neighbours (KNN) model since I've never built one of those before and I already have example code showing how to do it. I've attempted linear regression and logistic regression before, but those didn't go well.
Next, I hunted on Kaggle for some data that I could use. I wanted to find a dataset that someone else had already used with a KNN, so that I'd know it was possible. After a quick Google search I found this dataset on Diabetes.
I tried to follow the steps I'd been taught on the DataCamp course and examine the data first. However, I hit a snag. I've not worked with data like this before, so I don't know if it's okay for features such as skin thickness to be 0. So I found someone else's notebook on this dataset to see what they did (it's okay because I'm still learning, right?). They weren't sure either, so they filled the zero values with the mean values, and I did the same (a rough sketch of that step is below). Now my brain is really sweating.
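If you want to try this yourself, here's a minimal sketch of the zero-filling step in pandas. The file name `diabetes.csv` and the list of columns are assumptions based on the usual version of this Kaggle dataset, so adjust them to whatever you downloaded.

```python
# Minimal sketch: treat zeros in certain columns as missing and fill with the column mean.
# Assumes the Kaggle diabetes CSV is saved as "diabetes.csv" with these column names.
import numpy as np
import pandas as pd

df = pd.read_csv("diabetes.csv")

# Columns where a value of 0 is physically implausible and likely means "missing"
cols_with_hidden_missing = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]

for col in cols_with_hidden_missing:
    # Replace zeros with NaN, then fill the NaNs with the mean of the remaining values
    df[col] = df[col].replace(0, np.nan)
    df[col] = df[col].fillna(df[col].mean())

print(df.describe())
```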
Okay, so the next step was to build the actual model. I did this by only looking at my notes and I actually managed to get it to work first time! I did have some problems trying to get GridSearchCV working (if you're unfamiliar, it's a way to find the best values for your hyperparameters). I think I'll come back to that on Monday and see if I can get it working. Instead, I got a simple for loop working to test different numbers of neighbours. Then I finished up by making a ROC curve and getting the area under the curve (AUC) value. It's not perfect, but I'm pretty proud of getting this far and mostly understanding everything I've done. There's a rough sketch of that part below too.
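Since I don't have my exact notebook code to hand in this post, here's a minimal sketch of roughly what that step looks like in scikit-learn: a plain for loop over k in place of GridSearchCV, followed by a ROC curve and AUC. It assumes the preprocessed `df` from the sketch above, with an `Outcome` column as the target; the k range and split settings are just placeholder choices.

```python
# Minimal sketch: loop over k to pick the best number of neighbours, then plot a ROC curve.
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_curve, roc_auc_score

X = df.drop(columns="Outcome")
y = df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Simple for loop over k instead of GridSearchCV
best_k, best_score = None, 0.0
for k in range(1, 26):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    score = knn.score(X_test, y_test)  # accuracy on the test set
    if score > best_score:
        best_k, best_score = k, score

print(f"Best k: {best_k} (accuracy {best_score:.3f})")

# Refit with the best k and plot a ROC curve with the AUC value
knn = KNeighborsClassifier(n_neighbors=best_k)
knn.fit(X_train, y_train)
y_proba = knn.predict_proba(X_test)[:, 1]  # probability of the positive class

fpr, tpr, _ = roc_curve(y_test, y_proba)
auc = roc_auc_score(y_test, y_proba)

plt.plot(fpr, tpr, label=f"KNN (AUC = {auc:.3f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="Chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```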
I don't really have code to show today since it was just putting everything together from the last week. However, if you'd like to see what I've done you can look for yourself. Anyway, have a good weekend. I'll be back on Monday!