This is a continuation of my last post (https://dev.to/mayankp/churn-prediction-c75). In this post, I'll cover how well a random forest model performed.
Random Forest
I tested out different values of the criterion parameter (which controls how split quality is measured, e.g. Gini impurity vs. entropy) to find the better model.
This model ended up being my best one, with an ROC AUC of 91.5%, which is pretty good.
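The criterion comparison can be sketched like this with scikit-learn. The synthetic data here is a stand-in for the actual telecom churn dataset, so the exact scores won't match the 91.5% from the post:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in for the churn features/labels; roughly 15% positive class
# to mimic the imbalance discussed below.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.85, 0.15], random_state=42)

# Compare the two split-quality criteria with cross-validated ROC AUC.
for criterion in ["gini", "entropy"]:
    model = RandomForestClassifier(criterion=criterion, random_state=42)
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{criterion}: mean ROC AUC = {scores.mean():.3f}")
```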
The random forest model also found international plan, customer service calls, and total day minutes to be the most important features. This makes some sense, because the people who are charged the most usually talk a lot more or have to make expensive calls (like international calls). Also, going back to my last blog, people generally like the status quo until they become irritated enough to change it, and high charges fall into that category.
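Feature importances come straight off a fitted forest via feature_importances_. A minimal sketch (the column names below are placeholders echoing the telecom features mentioned above, and the data is synthetic):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder names mirroring the churn dataset's columns.
feature_names = ["international_plan", "customer_service_calls",
                 "total_day_minutes", "total_eve_minutes", "account_length"]
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

forest = RandomForestClassifier(random_state=0).fit(X, y)

# Importances are normalized to sum to 1; higher means the feature
# contributed more to the forest's splits.
importances = pd.Series(forest.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))
```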
Imbalances
While dealing with this data set, I realized that there was an imbalance between churns and non-churns: only about 15-20% of the records were churns, while the rest were not. This can cause problems because the model doesn't see enough churn examples to learn from. So in order to balance the data out, I employed SMOTE (Synthetic Minority Over-sampling Technique). SMOTE synthesizes new minority-class samples based on the existing data points. By doing this, you give the model you're training more information to learn from, which is always good.
Sure enough, this increased my ROC AUC to 96.1%.