Introduction
Businesses want to know which customers are most valuable to them. Some customers spend more money, stay longer, and interact more with the business than others. If a company can predict this early, it can make better decisions about marketing, customer support, and retention.
This is where Customer Lifetime Value (CLV) becomes important. In this project, a machine learning model was built to predict CLV using customer data. The trained model was then deployed using FastAPI, allowing predictions to be made through a simple API.
Project Objectives
- Explain what Customer Lifetime Value (CLV) is
- Build a regression model to predict CLV
- Compare models and choose the best one
- Save the trained model for future use
- Deploy the model using FastAPI
- Predict CLV through an API
- Test the API to confirm it works
Understanding Customer Lifetime Value (CLV)
Customer Lifetime Value is the total amount of money a business expects to earn from a customer over their entire relationship with the company.
For example, a customer who spends a small amount every month but stays for many years can have a high CLV, while a customer who spends a lot once and never returns may have a low CLV.
Predicting CLV helps businesses to:
- Identify high-value customers
- Spend marketing money wisely
- Improve customer retention
- Plan better customer engagement strategies
Because CLV is a number, predicting it is a regression problem in machine learning.
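The contrast between a steady long-term customer and a one-off big spender can be made concrete with a little arithmetic. The sketch below uses a deliberately naive formula (monthly spend times months active) and made-up figures. It is only an illustration of the idea, not the way the CLV column in the dataset was computed.

```python
def simple_clv(monthly_spend: float, tenure_months: int) -> float:
    """Naive CLV: total spend over the relationship.

    Assumption for illustration only: ignores margin, discounting and churn.
    """
    return monthly_spend * tenure_months


# Hypothetical customers, not taken from the dataset.
steady = simple_clv(monthly_spend=30.0, tenure_months=60)   # small spend, five years
one_off = simple_clv(monthly_spend=500.0, tenure_months=1)  # one large purchase

print(f"steady customer:  {steady:.2f}")
print(f"one-off customer: {one_off:.2f}")
```

Under these assumed numbers the steady customer is worth more than three times the one-off buyer, even though each individual purchase is much smaller.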
Dataset and Business Problem
The dataset used in this project is called customer_lifetime.csv. Each row represents one customer, and each column describes something about that customer.
Important columns include:
- Customer_Age – Age of the customer
- Annual_Income – Yearly income
- Tenure_Months – How long the customer has been active
- Monthly_Spend – Average monthly spending
- Visits_Per_Month – Number of visits per month
- Avg_Basket_Value – Average value per purchase
- Support_Tickets – Number of support issues raised
- CLV – Customer Lifetime Value (target variable)
The main goal is to predict CLV for new customers before spending money on marketing or retention.
Data Preparation
The first step was to load and explore the dataset to understand the data types and structure. The CLV column was identified as the target, while the remaining columns were used as input features.
The data was then split into:
- Training data – used to teach the model
- Testing data – used to check how well the model performs
Splitting the data is important because the model never sees the testing data during training, so its test score gives an honest estimate of how it will perform on new, unseen customers.
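A minimal sketch of this step, assuming pandas and scikit-learn were used. The 80/20 split ratio, the fixed random seed, and the helper name `split_customers` are assumptions for illustration; the article does not state the values actually used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

TARGET = "CLV"  # target column named in the dataset description


def split_customers(df: pd.DataFrame, test_size: float = 0.2, random_state: int = 42):
    """Separate features from the target, then split into train and test sets."""
    X = df.drop(columns=[TARGET])  # every column except CLV is an input feature
    y = df[TARGET]
    return train_test_split(X, y, test_size=test_size, random_state=random_state)


# Usage with the project's file (path assumed to be relative to the script):
# df = pd.read_csv("customer_lifetime.csv")
# print(df.dtypes)  # quick look at the column types
# X_train, X_test, y_train, y_test = split_customers(df)
```

Fixing `random_state` makes the split reproducible, which helps when comparing models on exactly the same test customers.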
Building the Regression Models
Since CLV is a continuous number, regression models were used.
Two models were built:
- Linear Regression
This model was used as a baseline. It is simple, fast, and easy to understand, but it may not capture complex customer behavior.
- Random Forest Regressor
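This model averages the predictions of many decision trees. It can capture non-linear relationships and interactions between features that a linear model misses, and it is often more accurate on tabular data like this, at the cost of slower training and less interpretability.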
Regression is suitable for this problem because the goal is to predict a numeric value, not categories.
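The two models described above could be set up roughly as follows, assuming scikit-learn. The hyperparameters (`n_estimators=100`, a fixed seed) and the helper names are assumptions, and the synthetic data at the bottom is only a stand-in so the snippet runs without the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


def build_models(random_state: int = 42) -> dict:
    """Return the baseline and the tree ensemble, both unfitted."""
    return {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(
            n_estimators=100, random_state=random_state
        ),
    }


def fit_all(models: dict, X_train, y_train) -> dict:
    """Fit every model on the same training data."""
    for model in models.values():
        model.fit(X_train, y_train)
    return models


# Synthetic stand-in for the training split; replace with the real X_train / y_train.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 2.0 * X_demo[:, 0] + X_demo[:, 1] ** 2  # includes a non-linear term
models = fit_all(build_models(), X_demo, y_demo)
```

Keeping both models in one dictionary makes it easy to loop over them later when computing evaluation metrics.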
Model Evaluation and Selection
Both models were evaluated using regression metrics such as:
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- R-squared (R²)
The Random Forest model performed better than Linear Regression on these metrics, most likely because it captures non-linear patterns in the data that a linear model cannot. For this reason, it was selected as the final model.
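To make the three metrics concrete, here is a small pure-Python sketch of how each one is computed. In practice scikit-learn's `mean_absolute_error`, `mean_squared_error` and `r2_score` give the same quantities; the function name and the sample numbers below are illustrative assumptions, not results from this project.

```python
def regression_metrics(y_true, y_pred) -> dict:
    """Compute MAE, MSE and R-squared for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n  # average size of the error
    mse = sum(e * e for e in errors) / n   # penalises large errors more heavily

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation
    r2 = 1 - ss_res / ss_tot  # undefined if every true value is identical

    return {"mae": mae, "mse": mse, "r2": r2}


# Hypothetical values, purely to exercise the function.
scores = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
print(scores)
```

Lower MAE and MSE are better, while R² closer to 1 means the model explains more of the variation in CLV.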
Saving the Model
The final trained model was saved using Joblib, along with the list of features used during training.
Saving the model is important because:
- The model does not need to be retrained every time
- Predictions are consistent, because every request is served by the same fitted model
- The API starts quickly and serves predictions without running any training step
The saved files are stored in a saved_model folder.
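A rough sketch of the save and load steps with Joblib. The file names `clv_model.joblib` and `features.joblib` and the helper functions are assumptions; the article only states that the model and the feature list are stored in the saved_model folder. The round trip at the bottom uses a stand-in object and a temporary folder so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import joblib

MODEL_FILE = "clv_model.joblib"    # assumed file name
FEATURES_FILE = "features.joblib"  # assumed file name


def save_artifacts(model, feature_names, folder: str = "saved_model") -> Path:
    """Persist the fitted model and the ordered list of training features."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, out_dir / MODEL_FILE)
    joblib.dump(list(feature_names), out_dir / FEATURES_FILE)
    return out_dir


def load_artifacts(folder: str = "saved_model"):
    """Load the model and feature list written by save_artifacts."""
    in_dir = Path(folder)
    return joblib.load(in_dir / MODEL_FILE), joblib.load(in_dir / FEATURES_FILE)


# Round trip with a stand-in object; a fitted estimator is saved the same way.
with tempfile.TemporaryDirectory() as tmp:
    save_artifacts({"stand_in": "model"}, ["Monthly_Spend", "Tenure_Months"], folder=tmp)
    restored_model, restored_features = load_artifacts(folder=tmp)
```

Storing the feature list next to the model lets the API rebuild inputs in the same column order that was used during training.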
Deploying the Model Using FastAPI
FastAPI was used to deploy the model as a web API. FastAPI is a Python framework that makes it easy to create APIs and validates incoming data automatically against the types declared for each request.
The model is loaded when the API starts. The model is not retrained inside the API, which follows best practices for production systems.
How the API Works
The API has two main endpoints:
1. Health Check Endpoint
GET /
This endpoint confirms that the API is running correctly.
2. CLV Prediction Endpoint
POST /predict-clv
This endpoint:
- Accepts customer data in JSON format
- Checks if the input data is valid
- Sends the data to the trained model
- Returns the predicted CLV value
Testing the API
The API was tested using:
- FastAPI Swagger UI
- Postman
Successful testing showed that the API correctly accepts input data and returns CLV predictions.
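The same check can also be scripted, which is handy for repeating the test after each change. The sketch below uses only the Python standard library. The base URL assumes the API is running locally on uvicorn's default port, and the sample customer values are made up.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed: local server on uvicorn's default port


def build_predict_request(customer: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Create the POST /predict-clv request with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/predict-clv",
        data=json.dumps(customer).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_prediction(customer: dict, base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Send the request to a running API and decode the JSON response."""
    request = build_predict_request(customer, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical customer, using the column names from the dataset.
sample_customer = {
    "Customer_Age": 35,
    "Annual_Income": 60000,
    "Tenure_Months": 24,
    "Monthly_Spend": 120.0,
    "Visits_Per_Month": 4,
    "Avg_Basket_Value": 30.0,
    "Support_Tickets": 1,
}

# With the server running:
# print(request_prediction(sample_customer))
```

The exact shape of the response depends on how the prediction endpoint was written, so the script simply returns whatever JSON the API sends back.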
Conclusion and Future Improvements
This project shows a complete machine learning process, from understanding customer data to deploying a working prediction API. Predicting CLV helps businesses make smarter and more data-driven decisions.
In a real business setting, the model could be improved by:
- Adding more customer behavior data
- Monitoring predictions over time
- Retraining the model with new data
Overall, this project demonstrates how machine learning models can be moved from development into real-world applications using FastAPI.

