CptWycliffe

Leveraging ML: Customer Churn Analysis

Analyzing the Telco Dataset

In a rapidly evolving business environment, companies are constantly seeking ways to optimize their operations, enhance customer experience and improve customer retention. For any business that wants to increase its profit and revenue margins, customer retention is a key area on which to focus resources. Because customers are the main source of revenue, it is imperative to minimize customer attrition.

One powerful approach is leveraging machine learning (ML) techniques to analyze large datasets and gain valuable insights into customer attrition factors and thereby formulate necessary intervention strategies.

In this article, we present a comprehensive analysis of the Telco dataset using different ML models.

Business Understanding
The business problem is that the customer churn at Telco is impacting the company's revenue and profitability.

Telco is a telecommunications company that provides phone and internet services to its customers. They have observed that their customer churn rate has been increasing over the past year, and they want to understand the reasons behind it and predict which customers are most likely to churn in the future.

In this project, we will analyse customer data from Telco in order to determine the key indicators of churn and, from these, formulate retention strategies that can be implemented to address the problem.

Scope: The project will focus on analyzing the customer data, including customer type, services used and billing information. The analysis covers only a portion of the customer base, and churn is defined as a customer cancelling their subscription.

Data Understanding
The data provided for this project is in CSV format. The following list describes the columns present in the data.

Gender -- Whether the customer is a male or a female
SeniorCitizen -- Whether a customer is a senior citizen or not
Partner -- Whether the customer has a partner or not (Yes, No)
Dependents -- Whether the customer has dependents or not (Yes, No)
Tenure -- Number of months the customer has stayed with the company
PhoneService -- Whether the customer has a phone service or not (Yes, No)
MultipleLines -- Whether the customer has multiple lines or not
InternetService -- Customer's internet service provider (DSL, Fiber Optic, No)
OnlineSecurity -- Whether the customer has online security or not (Yes, No, No internet service)
OnlineBackup -- Whether the customer has online backup or not (Yes, No, No internet service)
DeviceProtection -- Whether the customer has device protection or not (Yes, No, No internet service)
TechSupport -- Whether the customer has tech support or not (Yes, No, No internet service)
StreamingTV -- Whether the customer has streaming TV or not (Yes, No, No internet service)
StreamingMovies -- Whether the customer has streaming movies or not (Yes, No, No internet service)
Contract -- The contract term of the customer (Month-to-Month, One year, Two year)
PaperlessBilling -- Whether the customer has paperless billing or not (Yes, No)
PaymentMethod -- The customer's payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
MonthlyCharges -- The amount charged to the customer monthly
TotalCharges -- The total amount charged to the customer
Churn -- Whether the customer churned or not (Yes or No)

We will therefore analyze the relationship between customer churn and each of the independent variables (gender, senior citizen status, partner, dependents, tenure, phone service, multiple lines, internet service, online security, online backup, device protection, tech support, streaming TV, streaming movies, contract, paperless billing, payment method, monthly charges, total charges).

Hypothesis
H0: Null hypothesis - There is no significant relationship between customer churn and any of the independent variables.

H1: Alternative hypothesis - There is a significant relationship between customer churn and at least two of the independent variables.
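
One common way to check for an association between a categorical feature and churn is a chi-square test of independence, whose null hypothesis is that the two variables are unrelated. The sketch below is only an illustration: the article does not say which statistical test was used, the DataFrame is toy data rather than the Telco dataset, and the 0.05 significance level is an assumption.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data for illustration only; these are NOT rows from the Telco dataset.
df = pd.DataFrame({
    "Contract": ["Month-to-month"] * 6 + ["Two year"] * 6,
    "Churn": ["Yes", "Yes", "Yes", "Yes", "No", "No",
              "No", "No", "No", "No", "No", "Yes"],
})

# Observed counts: one row per contract type, one column per churn value.
observed = pd.crosstab(df["Contract"], df["Churn"])

# Chi-square test of independence.
# For a 2x2 table SciPy applies Yates' continuity correction by default.
chi2, p_value, dof, expected = chi2_contingency(observed)

alpha = 0.05  # assumed significance level
reject_independence = bool(p_value < alpha)

print(observed)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
print("Reject independence:", reject_independence)
```

With the real data, the same call could be repeated for each categorical column to see which ones show a statistically significant association with Churn.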

Research Questions
To understand the trends in the Telco customer data, we will investigate the following questions:
1. Is there a correlation between contract length and customer churn?
2. Do customers who have online security and backup services have lower churn rates?
3. Does the payment method have an impact on customer churn?
4. Is there a difference in churn rates between male and female customers?
5. Are customers with dependents less likely to churn compared to those without dependents?
6. Is there a correlation between total charges and customer churn?
7. Are customers who get the TechSupport service less likely to churn?
8. Is there a relationship between having device protection and churn?
These questions will help us gain insights into the dataset, understand the distributions of the variables, identify correlations, and discover any patterns or anomalies that may affect customer churn.

Findings
Plotting the categorical features

[Figure: bar plots of the categorical features]

Plotting the KDE for the numerical features

[Figure: KDE plots of the numerical features]
From the EDA, we make the following findings:
(i) From the Bar Plot of Contract, we can see that most of the customers who churned had a Month-to-Month contract. We can therefore conclude that there is a negative correlation between contract length and customer churn, i.e. the shorter the contract, the more likely the customer is to churn.

(ii) From the Bar Plot of Online Security and the Bar Plot of Online Backup, it is evident that churn rates are lower for customers with the OnlineSecurity and OnlineBackup services; most of the customers who churned did not have these services.
(iii) While the probability that a customer with neither OnlineSecurity nor OnlineBackup will churn is 0.48, the probability of a customer with both services churning is just 0.1.
(iv) From the Bar Plot of PaymentMethod, we can observe that the payment method does have an impact on customer attrition. Specifically, customers who pay by electronic check have a significantly higher churn rate than customers using other payment methods.
(v) The gender of the customer does not affect the probability of customer attrition.
(vi) Of the customers who churned, about 66% had no dependents while about 33% had dependents. This suggests that customers with dependents are less likely to churn.
(vii) Customers who did not have TechSupport had the highest attrition.
(viii) A large proportion of the customers who churned did not have device protection.
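
The probabilities quoted in finding (iii) are conditional churn rates: the share of customers who churned within each combination of services. A minimal sketch of that calculation with pandas is shown below. It uses toy data (not the Telco dataset), so the resulting numbers are illustrative only and do not reproduce the 0.48 and 0.1 figures.

```python
import pandas as pd

# Toy data for illustration only; these are NOT rows from the Telco dataset.
df = pd.DataFrame({
    "OnlineSecurity": ["No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"],
    "OnlineBackup":   ["No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"],
    "Churn":          ["Yes", "Yes", "No", "No", "No", "No", "No", "Yes"],
})

# Turn the Yes/No target into a boolean so that its mean is the churn rate.
df["churned"] = df["Churn"].eq("Yes")

# Churn rate within each (OnlineSecurity, OnlineBackup) combination.
churn_rate = df.groupby(["OnlineSecurity", "OnlineBackup"])["churned"].mean()

print(churn_rate)
```

Grouping by the feature and averaging the target gives the churn rate within each group, which is a different quantity from the share of each group among all churners (the kind of figure reported in finding (vi)).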

Data Pre-processing
Since our target variable is “Churn”, we will use the pandas value_counts() method to determine the number of samples in each target class.

From the class counts, we see that the data is imbalanced.
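
As a minimal sketch of this check, the snippet below counts the classes of a toy Churn column (the values are made up for illustration and are not from the Telco dataset). Passing normalize=True returns proportions instead of raw counts, which makes the imbalance easier to read.

```python
import pandas as pd

# Toy target column for illustration only; NOT the Telco data.
churn = pd.Series(["No", "No", "No", "Yes", "No", "Yes", "No", "No"], name="Churn")

# Absolute number of samples per class.
counts = churn.value_counts()

# Share of each class (sums to 1.0).
shares = churn.value_counts(normalize=True)

print(counts)
print(shares)
```
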
Step 1. Balance the data using the oversampling technique.

Step 2. Split the dataset into train and test sets.

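Step 3. Since each of our categorical columns has only a few unique values (at most four, as in PaymentMethod), we will use the OrdinalEncoder for categorical feature encoding. This technique assigns each unique value in a column an integer code (for example 1, 2 or 3; scikit-learn's OrdinalEncoder numbers the codes from 0).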

Step 4. Scale the numerical columns using StandardScaler.
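
The four steps above can be sketched roughly as follows. This is an illustration under several assumptions, not the article's actual code: the DataFrame is toy data, random oversampling is done with sklearn.utils.resample (the article does not name the oversampling tool), the encoder is scikit-learn's OrdinalEncoder with an explicitly assumed category order, and the 80/20 split and random_state values are arbitrary.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.utils import resample

# Toy data for illustration only; these are NOT rows from the Telco dataset.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"] * 5,
    "tenure": list(range(1, 21)),
    "MonthlyCharges": [20.0 + 3 * i for i in range(20)],
    "Churn": ["Yes"] * 5 + ["No"] * 15,
})

# Step 1: random oversampling of the minority class up to the majority size.
majority = df[df["Churn"] == "No"]
minority = df[df["Churn"] == "Yes"]
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_up])

# Step 2: train/test split, stratified on the target.
X = balanced.drop(columns="Churn")
y = balanced["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Steps 3 and 4: ordinal-encode the categorical column, scale the numeric ones.
# The category order is an assumption; listing it also avoids errors if a
# category happens to be missing from the training split.
preprocess = ColumnTransformer([
    ("cat", OrdinalEncoder(
        categories=[["Month-to-month", "One year", "Two year"]]), ["Contract"]),
    ("num", StandardScaler(), ["tenure", "MonthlyCharges"]),
])

# Fit on the training split only, then apply the same transform to the test split.
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)

print(balanced["Churn"].value_counts())
print(X_train_prepared.shape, X_test_prepared.shape)
```

One design point worth weighing: when oversampling happens before the split, as in the step order above, copies of the same customer can land in both the train and test sets, which tends to inflate test scores. A common alternative is to split first and oversample only the training split.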
Model Building and Evaluation
We will then train and evaluate different models, which include:
1) Logistic Regression
2) Decision Tree Classifier
3) Random Forest Classifier
4) CatBoost Classifier
5) Gradient Boosting Classifier
6) Naive Bayes
7) K-Nearest Neighbours (KNN)
8) Multi-Layer Perceptron
9) Stochastic Gradient Descent (SGD)
10) AdaBoost
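
A loop over several classifiers is a convenient way to compare the models listed above. The sketch below is a hedged illustration rather than the article's code: it uses a synthetic dataset from make_classification instead of the Telco data, default hyperparameters, GaussianNB as an assumed Naive Bayes variant, and it leaves out CatBoost because that model lives in a separate package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the prepared features.
X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Candidate models with (mostly) default settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "MLP": MLPClassifier(max_iter=500, random_state=42),
    "SGD": SGDClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

# Print the models from best to worst F1 score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: F1 = {score:.2f}")
```

Keeping the models in a dictionary means the same fit and scoring code runs for every candidate, so the comparison stays consistent when a model is added or removed.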

Results and Analysis
Of the models trained and evaluated, the best two are SGD and AdaBoost, with the following metrics:
AdaBoost
Accuracy : 0.80
F1_Score : 0.78
Precision : 0.78
Recall : 0.80

SGD
Accuracy : 0.96
F1_Score : 0.78
Precision : 0.78
Recall : 0.80
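
For reference, these four metrics can be computed with scikit-learn as sketched below. The labels are a made-up toy example (1 = churned, 0 = stayed), not the article's predictions, and the sketch scores only the positive class; the article does not state which averaging scheme was used for its figures.

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

# Toy labels for illustration only: 1 = churned, 0 = stayed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Here: 3 true positives, 1 false negative, 1 false positive, 5 true negatives.
accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / all samples
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(f"Accuracy : {accuracy:.2f}")
print(f"F1_Score : {f1:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall   : {recall:.2f}")
```

All four functions return values between 0 and 1. For churn prediction, recall on the churn class is often the metric to watch, since a missed churner is a customer the retention team never contacts.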

Recommendation
a) The finding that customers who pay by electronic check churn more can inform strategies to reduce churn, such as promoting alternative payment methods or providing incentives for customers to switch from electronic check to more stable payment methods.
b) It is imperative for the company to initiate measures that encourage more customers to use the TechSupport service in order to reduce the probability of customer turnover.
c) It is necessary for the company to formulate incentive strategies to encourage customers to use the device protection service.
d) The company can continuously use ML algorithms to find the probability of a customer dropping the service, and thereby employ the necessary measures to ensure customer retention.
