DEV Community

Naman Srivastava

Mall Customer Segmentation using ML: A Step-by-Step Tutorial

🧠 Introduction

In this project, we explore customer segmentation using the well-known Mall Customers dataset. We'll apply KMeans clustering to group similar customers and Random Forest regression to predict spending scores. Finally, we'll deploy both models with Streamlit.


📊 Dataset Overview

We use the Mall_Customers.csv dataset from Kaggle. It contains information about 200 customers:

| Column | Description |
| --- | --- |
| CustomerID | Unique identifier |
| Genre | Male / Female |
| Age | Customer's age |
| Annual Income (k$) | Annual income in thousands of dollars |
| Spending Score (1-100) | Score assigned by the mall |

⚙️ Step 1: Import Required Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
import joblib

🧹 Step 2: Load and Preprocess Data

# Load dataset
df = pd.read_csv('Mall_Customers.csv')

# Clean columns
df.columns = df.columns.str.strip()

# Encode Gender
le = LabelEncoder()
df['Genre'] = le.fit_transform(df['Genre'])  # Female=0, Male=1

# Drop CustomerID
X = df.drop(['CustomerID'], axis=1)
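
The Streamlit app later hardcodes the gender encoding, so it is worth confirming what LabelEncoder actually assigns. The snippet below is a minimal sketch on a made-up list of labels (not rows from the dataset); `le_demo` and `mapping` are illustrative names. LabelEncoder sorts the class labels, so 'Female' comes before 'Male'.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up labels standing in for the 'Genre' column
le_demo = LabelEncoder()
codes = le_demo.fit_transform(['Male', 'Female', 'Female', 'Male'])

# classes_ holds the sorted labels; a label's index is its integer code
mapping = {label: code for code, label in enumerate(le_demo.classes_)}
print(mapping)  # {'Female': 0, 'Male': 1}
```

Running the same check on the fitted `le` from Step 2 would confirm that the comment `Female=0, Male=1` holds for the real data.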

🧩 Step 3: Feature Scaling

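import os

# Column names use a plain hyphen, as in the Kaggle CSV
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']])

# Save scaler (create the Models folder first if it does not exist)
os.makedirs('./Models', exist_ok=True)
joblib.dump(scaler, './Models/scaler.pkl')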
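
To see what StandardScaler does to each column, here is a minimal sketch in plain Python. It subtracts the column mean and divides by the population standard deviation, which is the same formula StandardScaler applies per feature. The `standardize` helper and the income values are made up for illustration.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population std (ddof=0), as StandardScaler uses
    if sigma == 0:
        # Constant column: return zeros instead of dividing by zero
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

# Made-up incomes in k$, purely illustrative (not rows from the dataset)
incomes = [15, 40, 60, 78, 137]
z_incomes = standardize(incomes)
print([round(z, 2) for z in z_incomes])
```

After scaling, every feature has mean 0 and standard deviation 1, so age, income and spending score contribute on the same scale to the distances KMeans computes.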

🌀 Step 4: KMeans Clustering

# Find optimal number of clusters using the Elbow Method
inertia = []
for k in range(1, 11):
    # n_init is set explicitly because its default changed in scikit-learn 1.4
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(X_scaled)
    inertia.append(kmeans.inertia_)

plt.plot(range(1, 11), inertia, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()

# Train the final KMeans model with 5 clusters
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
df['Cluster'] = kmeans.fit_predict(X_scaled)

# Save the model
joblib.dump(kmeans, './Models/classifier.pkl')
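
The elbow in an inertia plot can be hard to read by eye. As a complementary check (this is not part of the original walkthrough), the silhouette score can be computed for each k, where higher is better. The sketch below runs on synthetic blobs so it is self-contained; `X_demo` is a stand-in for `X_scaled`, and the blob parameters are arbitrary assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for X_scaled so the snippet runs on its own
X_demo, _ = make_blobs(n_samples=200, centers=5, cluster_std=0.6, random_state=42)

scores = {}
for k in range(2, 11):  # the silhouette score needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels)

best_k = max(scores, key=scores.get)
print(f'Best k by silhouette: {best_k}')
```

To apply this to the tutorial data, replace `X_demo` with `X_scaled` and compare the best-scoring k with the elbow plot.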

📈 Step 5: Visualize Clusters

plt.figure(figsize=(8, 6))
# Color each customer by its cluster label
plt.scatter(df['Annual Income (k$)'], df['Spending Score (1-100)'], c=df['Cluster'], cmap='viridis')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.title('Customer Segments')
plt.show()
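
A scatter plot shows where the clusters sit, but a per-cluster summary makes the segments easier to describe. The following is a minimal sketch using a tiny made-up frame named `demo` in place of `df`; the values and the output column names (`customers`, `mean_age`, and so on) are illustrative only.

```python
import pandas as pd

# Tiny made-up frame standing in for df after Step 4 (values are illustrative)
demo = pd.DataFrame({
    'Age': [23, 31, 45, 52, 29, 60],
    'Annual Income (k$)': [20, 85, 40, 90, 22, 38],
    'Spending Score (1-100)': [80, 90, 45, 15, 75, 50],
    'Cluster': [0, 1, 2, 3, 0, 2],
})

# One row per cluster: size plus the mean of each feature
profile = (
    demo.groupby('Cluster')
        .agg(customers=('Age', 'size'),
             mean_age=('Age', 'mean'),
             mean_income=('Annual Income (k$)', 'mean'),
             mean_score=('Spending Score (1-100)', 'mean'))
        .round(1)
)
print(profile)
```

Running the same `groupby` on the real `df` gives a table from which each segment can be labeled, for example by its average income and spending score.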

🧮 Step 6: Predicting Spending Score (Regression)

# Features and target
X_rf = df[['Genre', 'Age', 'Annual Income (k$)']]
y_rf = df['Spending Score (1-100)']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X_rf, y_rf, test_size=0.2, random_state=42)

# Train Random Forest
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Evaluate on the held-out 20%
preds = rf.predict(X_test)
print(f'R² Score: {r2_score(y_test, preds):.2f}')

# Save model
joblib.dump(rf, './Models/Spending_Score.pkl')
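
To interpret the number that `r2_score` prints, it helps to see the formula: R² = 1 - SS_res / SS_tot, where SS_res is the model's squared error and SS_tot is the squared error of always predicting the mean. The sketch below implements that definition by hand on made-up values; the `r2` helper and all numbers are illustrative, not results from the dataset.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot  # assumes y_true is not constant

# Made-up spending scores and two sets of predictions
y_true = [10, 40, 60, 90]
close_preds = [12, 38, 63, 88]   # near the true values
mean_preds = [50, 50, 50, 50]    # always predict the mean of y_true

print(f'Close predictions: {r2(y_true, close_preds):.3f}')
print(f'Mean baseline:     {r2(y_true, mean_preds):.3f}')
```

A score near 1 means the model explains most of the variance, a score of 0 means it does no better than predicting the mean, and a negative score means it does worse.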

🌐 Step 7: Streamlit App for Deployment

Create a new file named app.py:

import streamlit as st
import joblib
import pandas as pd

# Load the artifacts saved in Steps 3, 4 and 6
scaler = joblib.load('./Models/scaler.pkl')
kmeans = joblib.load('./Models/classifier.pkl')
rf = joblib.load('./Models/Spending_Score.pkl')

st.title('🛍️ Mall Customer Segmentation')

# Inputs
gender = st.selectbox('Gender', ['Female', 'Male'])
age = st.number_input('Age', 18, 70)
income = st.number_input('Annual Income (k$)', 10, 150)

# Encode gender the same way as in Step 2 (Female=0, Male=1)
gender_encoded = 0 if gender == 'Female' else 1

# Predict Spending Score; reusing the training column names avoids
# scikit-learn's "X does not have valid feature names" warning
reg_input = pd.DataFrame([[gender_encoded, age, income]], columns=rf.feature_names_in_)
predicted_score = rf.predict(reg_input)[0]

# Predict Cluster from age, income and the predicted (not observed) score
cluster_input = pd.DataFrame([[age, income, predicted_score]], columns=scaler.feature_names_in_)
scaled_features = scaler.transform(cluster_input)
cluster = kmeans.predict(scaled_features)[0]

st.subheader(f'💰 Predicted Spending Score: {predicted_score:.2f}')
st.subheader(f'📊 Predicted Cluster: {cluster}')
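
The prediction logic in the app can also be pulled into a plain function, which makes it testable without starting Streamlit. This is a sketch under stated assumptions: `predict_customer`, `REG_COLS` and `CLUSTER_COLS` are hypothetical names, and the models are fitted on random synthetic data so that the snippet runs on its own rather than loading the saved `.pkl` files.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

REG_COLS = ['Genre', 'Age', 'Annual Income (k$)']
CLUSTER_COLS = ['Age', 'Annual Income (k$)', 'Spending Score (1-100)']

def predict_customer(gender, age, income, rf, scaler, kmeans):
    """Hypothetical helper: return (predicted score, cluster) for one customer."""
    gender_encoded = 0 if gender == 'Female' else 1
    reg_row = pd.DataFrame([[gender_encoded, age, income]], columns=REG_COLS)
    score = float(rf.predict(reg_row)[0])
    # The cluster is assigned from the predicted score, as in the app
    cluster_row = pd.DataFrame([[age, income, score]], columns=CLUSTER_COLS)
    cluster = int(kmeans.predict(scaler.transform(cluster_row))[0])
    return score, cluster

# Random synthetic customers, used only to get fitted models for the demo
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    'Genre': rng.integers(0, 2, 60),
    'Age': rng.integers(18, 71, 60),
    'Annual Income (k$)': rng.integers(15, 138, 60),
    'Spending Score (1-100)': rng.integers(1, 101, 60),
})
rf = RandomForestRegressor(n_estimators=20, random_state=42)
rf.fit(demo[REG_COLS], demo['Spending Score (1-100)'])
scaler = StandardScaler().fit(demo[CLUSTER_COLS])
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
kmeans.fit(scaler.transform(demo[CLUSTER_COLS]))

score, cluster = predict_customer('Female', 30, 60, rf, scaler, kmeans)
print(f'Predicted score: {score:.2f}, cluster: {cluster}')
```

In `app.py`, the same function could be called with the models loaded by `joblib.load`, leaving the Streamlit code responsible only for inputs and display.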

🪄 Step 8: Run the Streamlit App

streamlit run app.py

Then open the provided localhost link to access your interactive dashboard.


🎯 Conclusion

You've successfully built and deployed a full Machine Learning + Streamlit project!

What You Learned

  • How to preprocess and scale data
  • How to perform KMeans clustering for segmentation
  • How to train a Random Forest for regression
  • How to deploy with Streamlit for interactivity

💻 Author: Naman Srivastava
📅 Date: October 2025
🔗 GitHub: Mall Customer Segmentation Project
