Introduction
Customer churn is when customers stop doing business with a company. Predicting churn helps businesses retain valuable customers and protect revenue.
In this tutorial, I’ll show you how to use TensorFlow, pandas, and scikit-learn to build a neural network that predicts churn based on a real dataset.
You can find a working, ready-to-use example in my GitHub repo.
No heavy theory — just step-by-step coding, explanations, and visuals.
We have a .csv file that holds the customer data, and we'll use it as the dataset to train our model. It's also available in the GitHub repo.
Step 1: Setting Up the Environment
We need these libraries:
pip install pandas numpy scikit-learn tensorflow matplotlib
- pandas → for data manipulation
- numpy → for numeric computations
- scikit-learn → preprocessing, scaling, train/test splitting
- tensorflow → building neural networks
- matplotlib → plotting results
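After installing, a quick sanity check confirms everything imports correctly (the printed versions will vary on your machine):

import pandas as pd
import numpy as np
import sklearn
import tensorflow as tf

print("pandas:", pd.__version__)
print("numpy:", np.__version__)
print("scikit-learn:", sklearn.__version__)
print("tensorflow:", tf.__version__)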
Step 2: Load and Inspect the Dataset
Load the dataset with pandas:
import pandas as pd
df = pd.read_csv("customer_churn.csv")
df.head()
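Beyond head(), it's worth checking the column dtypes and the class balance of the target; a minimal sketch, assuming the target column is named Churn:

# Column dtypes and non-null counts
df.info()

# How many customers churned vs. stayed
print(df['Churn'].value_counts())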
Tip: ⚠️ Always check your column names. Spaces or extra characters can break code later:
df.columns = df.columns.str.strip().str.replace(" ", "_")
Step 3: Clean the Data
Convert numeric columns with potential issues:
df['Total_Charges'] = pd.to_numeric(df['Total_Charges'], errors='coerce')
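Optionally, check how many values failed to parse before dropping them:

# Rows where Total_Charges couldn't be parsed are now NaN
print(df['Total_Charges'].isna().sum(), "unparseable Total_Charges values")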
Drop missing rows and irrelevant columns:
df = df.dropna()
df.drop('Customer_ID', axis=1, inplace=True)
Step 4: Encode Categorical Variables
Neural networks cannot process text. Convert categories to numbers:
from sklearn.preprocessing import LabelEncoder
df['Churn'] = df['Churn'].map({'Yes': 1, 'No': 0})
cat_cols = df.select_dtypes(include='object').columns
le = LabelEncoder()
for col in cat_cols:
    df[col] = le.fit_transform(df[col])
Example: Male → 1, Female → 0. Similarly for other categories.
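One caveat: the loop above re-fits a single LabelEncoder, so its learned mapping is overwritten on each column. If you need the mappings later (for example, to encode a new customer consistently in Step 8), keep one encoder per column. A sketch of that variant:

# Variant: one encoder per column, so every mapping can be recovered later
encoders = {col: LabelEncoder() for col in cat_cols}
for col in cat_cols:
    df[col] = encoders[col].fit_transform(df[col])

# Inspect the mapping for one column ('Gender' assumed to exist in the dataset)
le_gender = encoders['Gender']
print(dict(zip(le_gender.classes_, range(len(le_gender.classes_)))))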
Step 5: Split Features and Target
First, we separate the input features (X) from the target (y), then scale them. Why scale?
Before scaling, each feature (column) has its own mean and standard deviation. Neural networks learn better when features are roughly in the same range.
- Mean (μ): the average value of the feature
- Standard deviation (σ): measures how spread out the values are
The formula for the mean (μ) of a feature with N values is:

μ = (1/N) · Σᵢ xᵢ

StandardScaler subtracts the mean and divides by the standard deviation: z = (x − μ) / σ. The formula for the standard deviation (σ) is:

σ = √( (1/N) · Σᵢ (xᵢ − μ)² )
After scaling, each feature has mean ~0 and std ~1.
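To see the formulas in action, here's a tiny worked example on made-up numbers (not the churn data):

import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])
mu = x.mean()               # (10 + 20 + 30 + 40) / 4 = 25.0
sigma = x.std()             # population std, the same convention StandardScaler uses
z = (x - mu) / sigma        # the standardized values
print(z.mean(), z.std())    # ~0.0 and ~1.0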
X = df.drop('Churn', axis=1)
y = df['Churn']
Scale features (important for neural networks):
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
Split into train/test sets:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
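Churn datasets are often imbalanced (many more 'No' than 'Yes'). If that's the case for yours, an optional stratify argument keeps the churn ratio identical in both splits:

# Optional: preserve the class ratio in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42, stratify=y
)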
Step 6: Build and Train the Neural Network
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential([
    Dense(32, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')
])
Why these layers and activations?
- Dense(32) and Dense(16) → number of neurons in each hidden layer. Experiment to see what works best.
- ReLU activation → introduces non-linearity, helps the network learn complex patterns.
- Sigmoid in output → outputs a probability between 0 and 1, perfect for binary classification.
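You can inspect the architecture before compiling:

model.summary()  # prints each layer's output shape and parameter count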
Optimizer: Adam
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
Why Adam?
- Adaptive optimizer: adjusts learning rate automatically
- Combines advantages of Momentum and RMSProp
- Works well out-of-the-box for most problems
- Loss function: binary_crossentropy → suitable for predicting 0/1 outcomes.
- Metric: accuracy → how often the model predicts correctly.
Training
history = model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=20,
    batch_size=32
)
Epochs = 20 → model sees the dataset 20 times.
Batch size = 32 → updates weights every 32 samples.
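If the validation metrics stop improving before epoch 20, an optional EarlyStopping callback halts training and restores the best weights; a sketch of that variant:

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor='val_loss',         # watch the validation loss
    patience=3,                 # stop after 3 epochs without improvement
    restore_best_weights=True   # roll back to the best epoch's weights
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=20,
    batch_size=32,
    callbacks=[early_stop]
)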
Step 7: Evaluate and Visualize
import matplotlib.pyplot as plt
plt.plot(history.history['accuracy'], label='Train')
plt.plot(history.history['val_accuracy'], label='Validation')
plt.title('Accuracy over Epochs')
plt.legend()
plt.show()
Example output: train vs. validation accuracy curves → compare them to check for overfitting or underfitting.
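For a single summary number on held-out data, evaluate the model on the test set:

# Returns the loss and the metrics passed to compile()
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {accuracy:.2%}")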
Step 8: Predict Churn for a New Customer
import numpy as np
import pandas as pd
new_customer = pd.DataFrame([{
    'Gender': 0, 'Senior_Citizen': 0, 'Partner': 1, 'Dependents': 0,
    'tenure': 12, 'Phone_Service': 1, 'Multiple_Lines': 0, 'Internet_Service': 0,
    'Online_Security': 2, 'Online_Backup': 0, 'Device_Protection': 1,
    'Tech_Support': 0, 'Streaming_TV': 0, 'Streaming_Movies': 1, 'Contract': 0,
    'Paperless_Billing': 1, 'Payment_Method': 2, 'Monthly_Charges': 50.0, 'Total_Charges': 500.0
}])
new_customer_scaled = scaler.transform(new_customer)
churn_prob = model.predict(new_customer_scaled)[0][0]
churn_label = int(churn_prob > 0.5)
print(f"Churn Probability: {churn_prob:.2f}")
print(f"Churn Prediction: {churn_label} ({'Yes' if churn_label==1 else 'No'})")
Conclusion
You now have a complete pipeline to:
- Clean and preprocess data
- Train a neural network in TensorFlow
- Evaluate model performance
- Predict churn for new customers
This workflow is reusable for other tabular datasets and binary classification problems.