Happy Monday! Today is all about ROCs
Okay not that kind of rock. Instead, as I found out, it's a way to evaluate how well logistic regression works (and any other classifier that outputs probabilities, it turns out).
So first you start by making a confusion matrix. This takes all of the true positives, false positives, true negatives and false negatives and gives you their counts. Sounds confusing I know so I guess they got the right name for the matrix.
The next step is to take these four new things and calculate other metrics such as precision and recall. Precision tells you how many of your predicted positives were actually positive, while recall tells you how many of the real positives you managed to catch (in other words, how many false negatives you avoided). These two also combine into a value called the F1 score, which is something I've come across before but not really understood so I'll try and brush up on that in the future.
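Here's a quick sketch of those metrics in action, using some made-up labels and predictions just for illustration (sklearn has a built-in function for each one):

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Made-up true labels and predictions, purely for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# sklearn lays the matrix out as [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))   # [[3 1]
                                          #  [1 3]]
print(precision_score(y_true, y_pred))    # 0.75 -> 3 TP out of 4 predicted positives
print(recall_score(y_true, y_pred))       # 0.75 -> 3 TP out of 4 actual positives
print(f1_score(y_true, y_pred))           # 0.75 -> harmonic mean of the two
```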
Anyway, from these we can produce a receiver operating characteristic (ROC) curve which allows us to see these things more quickly in a graph. Here's an example of a ROC curve I found on Google:
The curve is made by plotting the true positive rate against the false positive rate at different classification thresholds. The diagonal line is how good your model would be if it just guessed between the two classes at random, it would be right about half the time. The orange curve shows how good your model is. Basically, the closer it is to the top left the better your model is. With a graph like this, you can plot a ROC curve for multiple models and compare how they do. So if you're say comparing KNN and logistic regression you can visually see how well they perform in a specific task and choose the one that's right for you. Neat huh?
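If you'd rather have a single number than eyeball two curves, the area under the ROC curve (AUC) does that: 1.0 is perfect and 0.5 is the random diagonal. Here's a sketch of comparing KNN and logistic regression this way, using a toy dataset from sklearn just so it runs on its own (the dataset and model settings are my own stand-ins, not from a real task):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score

# Toy binary classification data, purely for illustration
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

logreg = LogisticRegression().fit(X_train, y_train)
knn = KNeighborsClassifier().fit(X_train, y_train)

# Higher AUC = better ranking of positives over negatives
for name, model in [('Logistic Regression', logreg), ('KNN', knn)]:
    probs = model.predict_proba(X_test)[:, 1]
    print(name, 'AUC:', roc_auc_score(y_test, probs))
```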
I'm trying a new thing today where I post some of the code I learnt so you can try and learn these new things with me. What do you think? Would you like to see more of this or would you prefer I keep it more of a diary?
The Code
How to make a confusion matrix:
from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_test, y_pred))
How to make and plot a ROC curve:
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

y_pred_prob = logreg.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr, label='Logistic Regression')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Logistic Regression ROC Curve')
plt.legend()
plt.show()