Churn Prediction - Telco Company

Ana Carolina Branco Neumann

📌 Dataset Source:

Telco Customer Churn - Kaggle


📂 Github Repository:

Telco Customer Churn - Github


About the Project

This project leverages Machine Learning to predict customer churn for a telecommunications company. The main goal is to identify patterns indicating the likelihood of cancellation, enabling the company to implement proactive retention strategies before customers discontinue their services.

The primary focus is on the Recall metric, which is crucial for capturing most churners, even at the cost of some false positives, as preventive retention actions are more advantageous for the business.
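As a minimal illustration of why Recall is the metric to watch here, the sketch below uses made-up labels (not the project's data) to show how recall counts only the actual churners the model catches, ignoring false positives:

```python
from sklearn.metrics import recall_score

# Illustrative labels only: 1 = churned, 0 = stayed
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]  # 2 of 4 churners caught, plus 1 false positive

# Recall = TP / (TP + FN): the share of actual churners the model flags
print(recall_score(y_true, y_pred))  # 0.5
```

A false positive here only costs an unnecessary retention offer, while a false negative is a lost customer, which is why recall is prioritized over precision.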


Exploratory Data Analysis

During the EDA, patterns in the dataset were explored to understand the factors associated with churn. Key findings include:

  • Monthly vs. Long-Term Contracts:

    Customers with monthly contracts showed a higher likelihood of churn, suggesting that long-term contracts may encourage loyalty.

  • Additional Services:

    Customers subscribing to additional services, such as online security or technical support, tended to churn less.

  • Tenure and Monthly Charges:

    • Customers with longer tenure (time as a customer) exhibited greater loyalty.
    • Higher MonthlyCharges correlated positively with churn.
  • Removal of TotalCharges:


    The TotalCharges column was removed due to high collinearity with tenure, which could affect the model's stability.
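The collinearity check behind that decision can be sketched as below. The data here is synthetic (TotalCharges is roughly tenure × MonthlyCharges, as in the real dataset), so the exact numbers differ from the project's:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Telco columns
rng = np.random.default_rng(0)
tenure = rng.integers(1, 72, size=500)
monthly = rng.uniform(20, 120, size=500)
df = pd.DataFrame({
    "tenure": tenure,
    "MonthlyCharges": monthly,
    # TotalCharges accumulates over tenure, hence the collinearity
    "TotalCharges": tenure * monthly + rng.normal(0, 50, size=500),
})

corr = df.corr()
print(corr.loc["TotalCharges", "tenure"])  # strongly positive

# Drop the collinear column, as done in the project
df = df.drop(columns=["TotalCharges"])
```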


Technical Choices

Why the SVM Algorithm?

The Support Vector Machine (SVM) was chosen for several reasons:

  1. Efficiency with Smaller Datasets:

    With approximately 7,000 rows, SVM effectively captures complex patterns without overfitting.

  2. Flexible Kernel Options:

    By combining linear and rbf kernels, SVM identifies both linear and non-linear relationships through GridSearchCV.

  3. Binary Classification:

    SVM is well-suited for binary problems like this one, where the goal is to predict churn (Yes or No).

Preprocessing Steps:

  1. Scaling (MinMaxScaler):

    Models like SVM are sensitive to differences in scale. Scaling was applied to normalize numeric variables between 0 and 1.

  2. Encoding (OneHotEncoder):

    Categorical variables were transformed into dummy variables. This ensures that categories are properly represented in a format the model can understand.
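Both preprocessing steps can be combined in a single ColumnTransformer; this is a sketch with a hypothetical subset of the Telco columns, not the project's exact preprocessing script:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical sample of Telco columns for illustration
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "Two year"],
})

preprocess = ColumnTransformer([
    # Scale numeric columns into [0, 1] for the scale-sensitive SVM
    ("num", MinMaxScaler(), ["tenure", "MonthlyCharges"]),
    # Expand categories into dummy columns
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
])

X = preprocess.fit_transform(df)
print(X.shape)  # (4, 5): 2 scaled numerics + 3 contract dummies
```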

Data Splitting and Validation:

  • The dataset was split into 70% training and 30% testing sets.
  • Validation was conducted using 5-fold cross-validation to ensure robust results.
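The split and validation scheme looks roughly like this; synthetic features stand in for the preprocessed Telco data, and the scoring choice reflects the project's focus on recall:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed Telco features
X, y = make_classification(n_samples=700, n_features=10, random_state=42)

# 70/30 split, stratified so the churn ratio is preserved in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# 5-fold cross-validation on the training portion
scores = cross_val_score(SVC(), X_train, y_train, cv=5, scoring="recall")
print(scores.mean())
```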

Machine Learning Pipeline

The implementation followed these steps:

  1. Dataset Splitting:

    The dependent variable (Churn) and independent variables were separated, ensuring proper data splitting for training and testing.

  2. Hyperparameter Tuning for SVM:

    Optimization was performed using GridSearchCV, adjusting:

    • C: Regularization parameter, controlling the trade-off between margin and error.
    • Kernel: Evaluation of linear and rbf kernels.
  3. Model Evaluation Metrics:

    The model was assessed using:

    • Accuracy: Percentage of correct predictions overall.
    • Recall: Share of actual churners correctly identified.
    • Precision: Share of predicted churners who actually churned.
    • F1 Score: Harmonic mean of precision and recall.
    • ROC AUC: The model's ability to distinguish between classes.
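The tuning and evaluation steps above can be sketched end-to-end as follows. Synthetic data replaces the real features, and the grid values for C are illustrative rather than the project's exact search space:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed Telco features
X, y = make_classification(n_samples=700, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# Grid over C and kernel, scored on recall to match the project's focus
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
grid = GridSearchCV(SVC(probability=True), param_grid, cv=5, scoring="recall")
grid.fit(X_train, y_train)

y_pred = grid.predict(X_test)
y_prob = grid.predict_proba(X_test)[:, 1]  # probabilities for ROC AUC

print("Best params:", grid.best_params_)
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("F1:       ", f1_score(y_test, y_pred))
print("ROC AUC:  ", roc_auc_score(y_test, y_prob))
```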

Results

| Metric    | Value  |
|-----------|--------|
| Accuracy  | 80.81% |
| Recall    | 56.09% |
| Precision | 74.35% |
| F1 Score  | 63.95% |
| ROC AUC   | 85.42% |

Analysis of Results:

While overall accuracy is high (80.81%), the primary focus was on Recall, which reached 56.09%: just over half of the customers who actually churned were correctly flagged by the model, enabling proactive interventions before cancellation.


Future Improvements

  1. Integrating External Data:

    • Enrich the dataset with customer satisfaction feedback, such as NPS or survey responses.
    • Include economic or regional indicators to identify specific patterns.
  2. Experimenting with Models:

    • Test models like XGBoost or LightGBM, which handle complex interactions well.
    • Perform feature importance analysis to optimize variable selection.
  3. Automation:

    • Develop a real-time pipeline to update the model with periodically refreshed data.
    • Integrate the model into the CRM system for automated retention actions.
  4. Customer Segmentation:

    • Focus retention efforts on high-value or high-risk customer segments.

Project Files

  • EDA.ipynb: Exploratory Data Analysis and main insights.
  • pre_processing.py: Data preprocessing and transformation script.
  • ML_application.py: Machine learning training, validation, and result exportation.

Contact Information:

For further inquiries or collaboration opportunities, feel free to reach out via LinkedIn.
