Introduction
In this project, we explore customer segmentation using the well-known Mall Customers dataset. We apply KMeans clustering to group similar customers and Random Forest regression to predict spending scores. Finally, we deploy the models with Streamlit.
Dataset Overview
We use the Mall_Customers.csv dataset from Kaggle. It contains information about 200 customers:
| Column | Description |
|---|---|
| CustomerID | Unique identifier |
| Genre | Customer's gender (Male / Female) |
| Age | Customer's age |
| Annual Income (k$) | Annual income in thousands of dollars |
| Spending Score (1-100) | Score assigned by the mall |
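Later steps select columns by their exact names, so it can help to confirm the CSV header before going further. The sketch below is an illustration only: the helper `missing_columns` and the inline sample are hypothetical, and the expected names (including the ASCII hyphen in `Spending Score (1-100)`) are an assumption about the file's header.

```python
import csv
import io

# Assumed header names; later steps index the DataFrame by these exact strings.
EXPECTED_COLUMNS = [
    "CustomerID",
    "Genre",
    "Age",
    "Annual Income (k$)",
    "Spending Score (1-100)",
]


def missing_columns(csv_text, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]
    return [name for name in expected if name not in header]


# Hypothetical two-line sample standing in for Mall_Customers.csv.
sample = "CustomerID,Genre,Age,Annual Income (k$),Spending Score (1-100)\n1,Male,19,15,39\n"
print(missing_columns(sample))  # prints []
```

If the list is not empty, renaming the affected columns right after loading is usually simpler than adjusting every later step.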
Step 1: Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
import joblib
Step 2: Load and Preprocess Data
# Load dataset
df = pd.read_csv('Mall_Customers.csv')
# Clean columns
df.columns = df.columns.str.strip()
# Encode Gender
le = LabelEncoder()
df['Genre'] = le.fit_transform(df['Genre']) # Female=0, Male=1
# Drop CustomerID
X = df.drop(['CustomerID'], axis=1)
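The `# Female=0, Male=1` comment follows from how `LabelEncoder` works: it sorts the distinct labels and numbers them in that order. The plain-Python sketch below mimics that rule to make the mapping explicit; `label_mapping` is a hypothetical helper, not part of scikit-learn.

```python
def label_mapping(values):
    """Map each distinct label to an integer code, in sorted order."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}


genres = ["Male", "Female", "Female", "Male"]
mapping = label_mapping(genres)
encoded = [mapping[g] for g in genres]
print(mapping)  # {'Female': 0, 'Male': 1}
print(encoded)  # [1, 0, 0, 1]
```

Hard-coding the same mapping in the Streamlit app, as Step 7 does, stays valid only as long as the training labels are exactly `Female` and `Male`.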
Step 3: Feature Scaling
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']])
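# Save scaler (create the output directory first so joblib.dump does not fail)
import os
os.makedirs('./Models', exist_ok=True)
joblib.dump(scaler, './Models/scaler.pkl')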
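`StandardScaler` rescales each column to zero mean and unit variance using z = (x - mean) / std, with the population standard deviation. Here is a NumPy sketch of the same arithmetic, on made-up values rather than the real dataset:

```python
import numpy as np

# Made-up rows of (age, income, score); not taken from Mall_Customers.csv.
X = np.array([
    [20.0, 15.0, 40.0],
    [30.0, 60.0, 80.0],
    [40.0, 75.0, 10.0],
    [50.0, 90.0, 50.0],
])

mean = X.mean(axis=0)
std = X.std(axis=0)          # population standard deviation (ddof=0)
X_scaled = (X - mean) / std  # column-wise z-scores

# Each column should now have mean close to 0 and standard deviation close to 1.
print(np.allclose(X_scaled.mean(axis=0), 0.0))
print(np.allclose(X_scaled.std(axis=0), 1.0))
```

This is why the fitted scaler is saved: the app has to reuse the training mean and standard deviation, since refitting on a single user input would be meaningless.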
Step 4: KMeans Clustering
# Find optimal number of clusters using Elbow Method
inertia = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(X_scaled)
    inertia.append(kmeans.inertia_)
plt.plot(range(1, 11), inertia, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()
# Train KMeans
kmeans = KMeans(n_clusters=5, random_state=42)
df['Cluster'] = kmeans.fit_predict(X_scaled)
# Save the model
joblib.dump(kmeans, './Models/classifier.pkl')
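The inertia plotted in the elbow curve is the sum of squared distances from each point to its nearest cluster centre. It tends to shrink as k grows, so the useful signal is where the curve flattens. The NumPy sketch below computes that quantity by hand for a toy example; the `inertia_of` helper and the data are illustrative assumptions.

```python
import numpy as np


def inertia_of(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    # Pairwise squared distances, shape (n_points, n_centers).
    diffs = points[:, None, :] - centers[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    return float(sq_dists.min(axis=1).sum())


# Toy 2-D data with two obvious groups.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

one_center = points.mean(axis=0, keepdims=True)    # k = 1
two_centers = np.array([[0.0, 1.0], [10.0, 1.0]])  # k = 2

print(inertia_of(points, one_center))   # 104.0
print(inertia_of(points, two_centers))  # 4.0
```

The large drop from k = 1 to k = 2 in this toy case is the kind of bend the elbow plot is meant to reveal.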
Step 5: Visualize Clusters
plt.figure(figsize=(8, 6))
plt.scatter(df['Annual Income (k$)'], df['Spending Score (1-100)'], c=df['Cluster'], cmap='viridis')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.title('Customer Segments')
plt.show()
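A scatter plot shows where the segments sit, and a per-cluster average of the original features helps put names on them. Here is a small NumPy sketch with invented values and labels (not output from the model above):

```python
import numpy as np

# Invented (income, score) rows and cluster labels, for illustration only.
features = np.array([
    [15.0, 80.0],
    [20.0, 76.0],
    [85.0, 15.0],
    [90.0, 11.0],
])
labels = np.array([0, 0, 1, 1])

# Mean income and spending score per cluster.
profiles = {
    int(c): features[labels == c].mean(axis=0)
    for c in np.unique(labels)
}
for cluster_id, (income, score) in profiles.items():
    print(f"cluster {cluster_id}: income={income:.1f}, score={score:.1f}")
```

With the tutorial's DataFrame, `df.groupby('Cluster').mean()` should give the same kind of summary.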
Step 6: Predicting Spending Score (Regression)
# Features and target
X_rf = df[['Genre', 'Age', 'Annual Income (k$)']]
y_rf = df['Spending Score (1-100)']
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X_rf, y_rf, test_size=0.2, random_state=42)
# Train Random Forest
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
# Evaluate
preds = rf.predict(X_test)
print(f'R² Score: {r2_score(y_test, preds):.2f}')
# Save model
joblib.dump(rf, './Models/Spending_Score.pkl')
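The reported score is the coefficient of determination, R² = 1 - SS_res / SS_tot. A value of 1.0 means perfect predictions, 0.0 means no better than always predicting the mean of the test targets, and the value can go negative. Here is a plain-Python sketch of the formula on invented numbers:

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [10.0, 20.0, 30.0, 40.0]  # invented targets

print(r2(y_true, y_true))                    # 1.0
print(r2(y_true, [25.0, 25.0, 25.0, 25.0]))  # 0.0
print(r2(y_true, [40.0, 30.0, 20.0, 10.0]))  # -3.0
```

Checking this number before deployment matters, because the app clusters on the predicted score rather than an observed one.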
Step 7: Streamlit App for Deployment
Create a new file named app.py:
import streamlit as st
import joblib
import numpy as np
# Load models
scaler = joblib.load('./Models/scaler.pkl')
kmeans = joblib.load('./Models/classifier.pkl')
rf = joblib.load('./Models/Spending_Score.pkl')
st.title('Mall Customer Segmentation')
# Inputs
gender = st.selectbox('Gender', ['Female', 'Male'])
age = st.number_input('Age', 18, 70)
income = st.number_input('Annual Income (k$)', 10, 150)
# Encode gender
gender_encoded = 0 if gender == 'Female' else 1
# Predict Spending Score
predicted_score = rf.predict([[gender_encoded, age, income]])[0]
# Predict Cluster
scaled_features = scaler.transform([[age, income, predicted_score]])
cluster = kmeans.predict(scaled_features)[0]
st.subheader(f'Predicted Spending Score: {predicted_score:.2f}')
st.subheader(f'Predicted Cluster: {cluster}')
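The two-stage prediction in the app (regress the spending score, then scale and cluster) can be pulled into a plain function so it is testable without Streamlit. The sketch below uses hypothetical stub classes in place of the saved models; only the `predict` and `transform` method names mirror scikit-learn, and `predict_segment` itself is an illustrative name.

```python
def predict_segment(gender, age, income, rf, scaler, kmeans):
    """Return (predicted_score, cluster) using the app's two-stage logic."""
    gender_encoded = 0 if gender == "Female" else 1  # same encoding as training
    score = rf.predict([[gender_encoded, age, income]])[0]
    scaled = scaler.transform([[age, income, score]])
    cluster = kmeans.predict(scaled)[0]
    return score, cluster


# Hypothetical stand-ins for the joblib-loaded models, for illustration only.
class StubRegressor:
    def predict(self, rows):
        return [50.0 for _ in rows]  # constant score


class StubScaler:
    def transform(self, rows):
        return [[value / 100.0 for value in row] for row in rows]


class StubClusterer:
    def predict(self, rows):
        return [0 if row[1] < 0.5 else 1 for row in rows]  # split on scaled income


score, cluster = predict_segment(
    "Female", 30, 80, StubRegressor(), StubScaler(), StubClusterer()
)
print(score, cluster)  # 50.0 1
```

In the real app, the three stub objects would be replaced by the models loaded with `joblib.load`.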
Step 8: Run the Streamlit App
streamlit run app.py
Then open the provided localhost link to access your interactive dashboard.
Conclusion
You've successfully built and deployed a full Machine Learning + Streamlit project!
What You Learned
- How to preprocess and scale data
- How to perform KMeans clustering for segmentation
- How to train a Random Forest for regression
- How to deploy with Streamlit for interactivity
Author: Naman Srivastava
Date: October 2025
GitHub: Mall Customer Segmentation Project