I decided to look at a churn data set found on Kaggle (https://www.kaggle.com/becksddf/churn-in-telecoms-dataset). Churn is when a customer decides to leave their telecom provider. The point of this exercise was to identify the factors that led customers to switch and to build a model that predicts which customers will churn.
EDA
After doing some preliminary exploratory data analysis, I found two features that deserved more attention. In the graphs below, customers who switched are labeled 1 and are shown in the second graph of each picture.
In this graph, you can see that customers with an international plan were more likely to switch. Maybe they weren't happy with the service or with high prices.
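The comparison in this graph boils down to a churn rate per group. Here is a minimal stdlib-only sketch of that calculation. The rows are made-up toy data, not taken from the Kaggle file, and the column names "international plan" and "churn" are assumptions about the CSV headers.

```python
from collections import defaultdict

# Toy rows for illustration only; they are NOT from the Kaggle data set.
# Column names are assumed to match the CSV headers.
rows = [
    {"international plan": "yes", "churn": True},
    {"international plan": "yes", "churn": False},
    {"international plan": "no", "churn": False},
    {"international plan": "no", "churn": False},
    {"international plan": "no", "churn": True},
    {"international plan": "no", "churn": False},
]


def churn_rate_by(rows, column):
    """Return {value of `column`: fraction of those rows where churn is True}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        churned[key] += int(row["churn"])
    return {key: churned[key] / totals[key] for key in totals}


print(churn_rate_by(rows, "international plan"))
```

With a DataFrame, the same idea is usually a one-line group-by on the plan column followed by the mean of the churn column.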
In this graph, you can see how customers who switched called customer service a lot more than customers who didn't switch. Maybe they were so unhappy with the service that they wanted to switch. This makes sense because people generally stick to the status quo until they absolutely have to change.
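A quick numeric check of this graph is the average number of customer service calls for churners versus non-churners. The sketch below uses hypothetical toy rows, and the column name "customer service calls" is assumed, not confirmed from the file.

```python
from statistics import mean

# Hypothetical toy rows; churn True corresponds to the "1" label in the graphs.
rows = [
    {"customer service calls": 1, "churn": False},
    {"customer service calls": 2, "churn": False},
    {"customer service calls": 0, "churn": False},
    {"customer service calls": 5, "churn": True},
    {"customer service calls": 4, "churn": True},
]


def mean_calls_by_churn(rows):
    """Average customer service calls, split by churn label."""
    groups = {False: [], True: []}
    for row in rows:
        groups[row["churn"]].append(row["customer service calls"])
    # Skip empty groups so mean() never sees an empty list.
    return {label: mean(calls) for label, calls in groups.items() if calls}


print(mean_calls_by_churn(rows))
```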
Modeling
Now time for some modeling.
No. Not that kind of modeling!
Logistic Regression
For my first attempt, I fit a logistic regression model to the data. It flagged these columns as the most important.
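One common way to rank columns with logistic regression is to standardize the features and compare the absolute size of their coefficients. The sketch below is a from-scratch, stdlib-only version of that idea using batch gradient descent; in practice a library such as scikit-learn's `LogisticRegression` is the usual choice. The toy rows, the three feature names, the learning rate, and the epoch count are all hypothetical assumptions, not the setup behind the results above.

```python
import math

# Hypothetical toy rows (NOT from the Kaggle file):
# (customer service calls, international plan 1=yes, account length, churned 1=yes)
FEATURES = ["customer service calls", "international plan", "account length"]
data = [
    (1, 0, 100, 0), (0, 0, 120, 0), (2, 0, 90, 0),
    (1, 1, 110, 0), (2, 0, 130, 0), (0, 0, 95, 0),
    (5, 0, 105, 1), (4, 1, 115, 1), (6, 0, 100, 1), (4, 0, 125, 1),
]


def standardize(column):
    """Rescale to zero mean and unit variance so coefficients are comparable."""
    mu = sum(column) / len(column)
    sd = math.sqrt(sum((x - mu) ** 2 for x in column) / len(column)) or 1.0
    return [(x - mu) / sd for x in column]


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


columns = [standardize([row[j] for row in data]) for j in range(len(FEATURES))]
X = [list(values) for values in zip(*columns)]
y = [row[-1] for row in data]
w, b = fit_logistic(X, y)

# Rank features by the magnitude of their standardized coefficient.
ranking = sorted(FEATURES, key=lambda name: abs(w[FEATURES.index(name)]), reverse=True)
print(ranking)
```

Standardizing first matters here: without it, a column measured in large units (like account length) gets a tiny coefficient simply because of its scale, which makes raw coefficient sizes hard to compare.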
Interestingly enough, this model found the number of customer service calls to be the most important predictor of churn, just as I suspected during the EDA. However, the model was only 81% accurate.
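Accuracy needs context on churn data, because most customers don't churn: a "model" that always predicts "no churn" already scores the share of non-churners. This sketch compares a prediction's accuracy with that majority-class baseline. The labels are hypothetical, not the real class balance of the Kaggle data set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy of always predicting the most common label."""
    majority = max(set(y_true), key=y_true.count)
    return accuracy(y_true, [majority] * len(y_true))


# Hypothetical labels: 17 of 20 customers stay (0) and 3 churn (1).
y_true = [0] * 17 + [1] * 3
print(majority_baseline(y_true))
```

If a model's accuracy sits near this baseline, per-class metrics such as precision and recall on the churn class say more about whether it actually catches churners.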
For my next blog post, I will try a random forest model and see how well it fares.