Introduction
Customer churn is when customers stop doing business with a company. Predicting churn helps businesses retain valuable customers and protect revenue.
In this tutorial, I’ll show you how to use TensorFlow, pandas, and scikit-learn to build a neural network that predicts churn based on a real dataset.
You can find a working, ready-to-use example in my GitHub repo.
No heavy theory — just step-by-step coding, explanations, and visuals.
We have a .csv file that holds the customer data, and we'll use it as the dataset to train our model. It's also available in the GitHub repo.
Step 1: Setting Up the Environment
We need these libraries:
pip install pandas numpy scikit-learn tensorflow matplotlib
- pandas → for data manipulation
- numpy → for numeric computations
- scikit-learn → preprocessing, scaling, train/test splitting
- tensorflow → building neural networks
- matplotlib → plotting results
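After installing, a quick sanity check confirms everything imports correctly (the printed versions will vary on your machine):

import pandas as pd
import numpy as np
import sklearn
import tensorflow as tf

print("pandas:", pd.__version__)
print("numpy:", np.__version__)
print("scikit-learn:", sklearn.__version__)
print("tensorflow:", tf.__version__)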
Step 2: Load and Inspect the Dataset
Load the dataset with pandas:
import pandas as pd
df = pd.read_csv("customer_churn.csv")
df.head()
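Beyond head(), it's worth checking the column dtypes and the class balance of the target; a minimal sketch, assuming the target column is named Churn:

# Column dtypes and non-null counts
df.info()

# How many customers churned vs. stayed
print(df['Churn'].value_counts())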
Tip: ⚠️ Always check your column names. Spaces or extra characters can break code later:
df.columns = df.columns.str.strip().str.replace(" ", "_")
Step 3: Clean the Data
Convert numeric columns with potential issues:
df['Total_Charges'] = pd.to_numeric(df['Total_Charges'], errors='coerce')
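Optionally, check how many values failed to parse before dropping them:

# Rows where Total_Charges couldn't be parsed are now NaN
print(df['Total_Charges'].isna().sum(), "unparseable Total_Charges values")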
Drop missing rows and irrelevant columns:
df = df.dropna()
df.drop('Customer_ID', axis=1, inplace=True)
Step 4: Encode Categorical Variables
Neural networks cannot process text. Convert categories to numbers:
from sklearn.preprocessing import LabelEncoder
df['Churn'] = df['Churn'].map({'Yes': 1, 'No': 0})
cat_cols = df.select_dtypes(include='object').columns
le = LabelEncoder()
for col in cat_cols:
    df[col] = le.fit_transform(df[col])
Example: Male → 1, Female → 0. Similarly for other categories.
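One caveat: the loop above re-fits a single LabelEncoder, so its learned mapping is overwritten on each column. If you need the mappings later (for example, to encode a new customer consistently in Step 8), keep one encoder per column. A sketch of that variant:

# Variant: one encoder per column, so every mapping can be recovered later
encoders = {col: LabelEncoder() for col in cat_cols}
for col in cat_cols:
    df[col] = encoders[col].fit_transform(df[col])

# Inspect the mapping for one column ('Gender' assumed to exist in the dataset)
le_gender = encoders['Gender']
print(dict(zip(le_gender.classes_, range(len(le_gender.classes_)))))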
Step 5: Split Features and Target
First, we separate the input features (X) from the target (y), then scale them. Why scale?
Before scaling, each feature (column) has its own mean and standard deviation. Neural networks learn better when features are roughly in the same range.
- Mean (μ): the average value of the feature
- Standard deviation (σ): measures how spread out the values are
The formula for the mean (μ) of a feature with N values is:

μ = (1/N) · Σᵢ xᵢ

StandardScaler subtracts the mean and divides by the standard deviation: z = (x − μ) / σ. The formula for the standard deviation (σ) is:

σ = √( (1/N) · Σᵢ (xᵢ − μ)² )
After scaling, each feature has mean ~0 and std ~1.
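To see the formulas in action, here's a tiny worked example on made-up numbers (not the churn data):

import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])
mu = x.mean()               # (10 + 20 + 30 + 40) / 4 = 25.0
sigma = x.std()             # population std, the same convention StandardScaler uses
z = (x - mu) / sigma        # the standardized values
print(z.mean(), z.std())    # ~0.0 and ~1.0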
X = df.drop('Churn', axis=1)
y = df['Churn']
Scale features (important for neural networks):
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
Split into train/test sets:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
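Churn datasets are often imbalanced (many more 'No' than 'Yes'). If that's the case for yours, an optional stratify argument keeps the churn ratio identical in both splits:

# Optional: preserve the class ratio in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42, stratify=y
)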
Step 6: Build and Train the Neural Network
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential([
    Dense(32, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')
])
Why these layers and activations?
- Dense(32) and Dense(16) → number of neurons in each hidden layer. Experiment to see what works best.
- ReLU activation → introduces non-linearity, helps the network learn complex patterns.
- Sigmoid in output → outputs a probability between 0 and 1, perfect for binary classification.
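You can inspect the architecture before compiling:

model.summary()  # prints each layer's output shape and parameter count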
Optimizer: Adam
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
Why Adam?
- Adaptive optimizer: adjusts learning rate automatically
- Combines advantages of Momentum and RMSProp
- Works well out-of-the-box for most problems
- Loss function: binary_crossentropy → suitable for predicting 0/1 outcomes.
- Metric: accuracy → how often the model predicts correctly.
Training
history = model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=20,
    batch_size=32
)
Epochs = 20 → model sees the dataset 20 times.
Batch size = 32 → updates weights every 32 samples.
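If the validation metrics stop improving before epoch 20, an optional EarlyStopping callback halts training and restores the best weights; a sketch of that variant:

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor='val_loss',         # watch the validation loss
    patience=3,                 # stop after 3 epochs without improvement
    restore_best_weights=True   # roll back to the best epoch's weights
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=20,
    batch_size=32,
    callbacks=[early_stop]
)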
Step 7: Evaluate and Visualize
import matplotlib.pyplot as plt
plt.plot(history.history['accuracy'], label='Train')
plt.plot(history.history['val_accuracy'], label='Validation')
plt.title('Accuracy over Epochs')
plt.legend()
plt.show()
Example output: train vs. validation accuracy curves → compare them to check for overfitting or underfitting.
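For a single summary number on held-out data, evaluate the model on the test set:

# Returns the loss and the metrics passed to compile()
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {accuracy:.2%}")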
Step 8: Predict Churn for a New Customer
import numpy as np
import pandas as pd
new_customer = pd.DataFrame([{
    'Gender': 0, 'Senior_Citizen': 0, 'Partner': 1, 'Dependents': 0,
    'tenure': 12, 'Phone_Service': 1, 'Multiple_Lines': 0, 'Internet_Service': 0,
    'Online_Security': 2, 'Online_Backup': 0, 'Device_Protection': 1,
    'Tech_Support': 0, 'Streaming_TV': 0, 'Streaming_Movies': 1, 'Contract': 0,
    'Paperless_Billing': 1, 'Payment_Method': 2, 'Monthly_Charges': 50.0, 'Total_Charges': 500.0
}])
new_customer_scaled = scaler.transform(new_customer)
churn_prob = model.predict(new_customer_scaled)[0][0]
churn_label = int(churn_prob > 0.5)
print(f"Churn Probability: {churn_prob:.2f}")
print(f"Churn Prediction: {churn_label} ({'Yes' if churn_label==1 else 'No'})")
Conclusion
You now have a complete pipeline to:
- Clean and preprocess data
- Train a neural network in TensorFlow
- Evaluate model performance
- Predict churn for new customers
This workflow is reusable for other tabular datasets and binary classification problems.