teresa kungu

Posted on Feb 9

Deploying a Customer Lifetime Value (CLV) Prediction Model Using FastAPI

#datascience #machinelearning #regression #beginners

Introduction

Businesses want to know which customers are most valuable to them. Some customers spend more money, stay longer, and interact more with the business than others. If a company can predict this early, it can make better decisions about marketing, customer support, and retention.

This is where Customer Lifetime Value (CLV) becomes important. In this project, a machine learning model was built to predict CLV using customer data. The trained model was then deployed using FastAPI, allowing predictions to be made through a simple API.

Project Objectives

Explain what Customer Lifetime Value (CLV) is
Build a regression model to predict CLV
Compare models and choose the best one
Save the trained model for future use
Deploy the model using FastAPI
Predict CLV through an API
Test the API to confirm it works

Understanding Customer Lifetime Value (CLV)

Customer Lifetime Value is the total amount of money a business expects to earn from a customer during their entire relationship with the company.
For example, if a customer spends a small amount every month but stays for many years, their CLV can be high. On the other hand, a customer who spends a lot once but never comes back may have a low CLV.
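The comparison above can be made concrete with a deliberately simplified calculation. The sketch below assumes CLV is just monthly spend multiplied by the number of months the customer stays, ignoring costs and discounting. The function name and the figures are made up for illustration.

```python
def simple_clv(monthly_spend: float, tenure_months: int) -> float:
    """Rough CLV: total spend over the whole relationship.

    Simplifying assumption: no costs, churn probability, or discounting.
    """
    return monthly_spend * tenure_months


# Small monthly spend, but the customer stays for five years.
loyal_customer = simple_clv(monthly_spend=20.0, tenure_months=60)

# One large purchase, then the customer never returns.
one_time_buyer = simple_clv(monthly_spend=500.0, tenure_months=1)

print(loyal_customer, one_time_buyer)  # 1200.0 500.0
```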

Predicting CLV helps businesses to:

Identify high-value customers
Spend marketing money wisely
Improve customer retention
Plan better customer engagement strategies

Because CLV is a number, predicting it is a regression problem in machine learning.

Dataset and Business Problem

The dataset used in this project is called customer_lifetime.csv. Each row represents one customer, and each column describes something about that customer.

Important columns include:

Customer_Age – Age of the customer
Annual_Income – Yearly income
Tenure_Months – How long the customer has been active
Monthly_Spend – Average monthly spending
Visits_Per_Month – Number of visits per month
Avg_Basket_Value – Average value per purchase
Support_Tickets – Number of support issues raised
CLV – Customer Lifetime Value (target variable)

The main goal is to predict CLV for new customers before spending money on marketing or retention.

Data Preparation

The first step was to load and explore the dataset to understand the data types and structure. The CLV column was identified as the target, while the remaining columns were used as input features.

The data was then split into:

Training data – used to teach the model
Testing data – used to check how well the model performs

Splitting the data is important because evaluating on held-out data gives an estimate of how the model will perform on new, unseen customers.
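A minimal sketch of this step with pandas and scikit-learn is shown here. Because customer_lifetime.csv is not included, the code generates random stand-in data with the same column names. The 80/20 split and the random seed are assumptions, not values taken from the project.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in data so the sketch runs on its own. The real project would use
# pd.read_csv("customer_lifetime.csv") instead.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "Customer_Age": rng.integers(18, 70, n),
    "Annual_Income": rng.normal(50_000, 15_000, n).round(2),
    "Tenure_Months": rng.integers(1, 120, n),
    "Monthly_Spend": rng.uniform(10, 500, n).round(2),
    "Visits_Per_Month": rng.integers(1, 20, n),
    "Avg_Basket_Value": rng.uniform(5, 200, n).round(2),
    "Support_Tickets": rng.integers(0, 10, n),
})
# Made-up target: spend over tenure plus noise (illustration only).
df["CLV"] = df["Monthly_Spend"] * df["Tenure_Months"] + rng.normal(0, 500, n)

# CLV is the target; every other column is an input feature.
X = df.drop(columns=["CLV"])
y = df["CLV"]

# Hold out 20% of the customers for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```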

Building the Regression Models

Since CLV is a continuous number, regression models were used.

Two models were built:

Linear Regression

This model was used as a baseline. It is simple, fast, and easy to understand, but it may not capture complex customer behavior.

Random Forest Regressor

This model averages the predictions of many decision trees. It can capture non-linear relationships and interactions between features that Linear Regression misses, and it is often more accurate on tabular data like this.

Regression is suitable for this problem because the goal is to predict a numeric value, not categories.
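Fitting the two models could look roughly like the sketch below. It uses scikit-learn's make_regression to produce stand-in data with seven features. The hyperparameters (100 trees, fixed random seed) are assumptions, since the article does not state the ones actually used.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stand-in data with 7 numeric features, mirroring the 7 customer columns.
X, y = make_regression(n_samples=300, n_features=7, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    # Baseline: fits one straight-line relationship per feature.
    "linear_regression": LinearRegression(),
    # Ensemble of decision trees; hyperparameters here are assumed.
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # Show the first three test predictions from each model.
    print(name, model.predict(X_test[:3]))
```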

Model Evaluation and Selection

Both models were evaluated using regression metrics such as:

Mean Absolute Error (MAE)
Mean Squared Error (MSE)
R-squared (R²)

The Random Forest model performed better than Linear Regression, most likely because it captured more complex patterns in the data. For this reason, it was selected as the final model.
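The three metrics can be computed with scikit-learn as in this sketch. The helper name regression_report is hypothetical, and the data is synthetic, so the printed numbers say nothing about the results obtained in the project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def regression_report(y_true, y_pred):
    """Return MAE, MSE and R-squared for one set of predictions."""
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mean_squared_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
    }


# Synthetic stand-in data (not the customer dataset).
X, y = make_regression(n_samples=300, n_features=7, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each model on the training data and score it on the held-out data.
for name, model in candidates.items():
    model.fit(X_train, y_train)
    print(name, regression_report(y_test, model.predict(X_test)))
```

Lower MAE and MSE mean smaller errors, while an R² closer to 1 means the model explains more of the variation in CLV.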

Saving the Model

The final trained model was saved using Joblib, along with the list of features used during training.

Saving the model is important because:

The model does not need to be retrained every time
Predictions are consistent
The API starts quickly because it only loads the model instead of training it

The saved files are stored in a saved_model folder.
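Saving and reloading with Joblib might look like the sketch below. The file names clv_model.joblib and features.joblib and both helper functions are hypothetical, and the tiny model is a stand-in for the trained one. The demo writes to a temporary directory so it leaves no files behind, whereas the project stores its files in saved_model.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.ensemble import RandomForestRegressor

FEATURES = [
    "Customer_Age", "Annual_Income", "Tenure_Months", "Monthly_Spend",
    "Visits_Per_Month", "Avg_Basket_Value", "Support_Tickets",
]


def save_artifacts(model, features, folder="saved_model"):
    """Save the model and its feature list (file names are hypothetical)."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, folder / "clv_model.joblib")
    joblib.dump(list(features), folder / "features.joblib")
    return folder


def load_artifacts(folder="saved_model"):
    """Load the model and feature list written by save_artifacts."""
    folder = Path(folder)
    return (
        joblib.load(folder / "clv_model.joblib"),
        joblib.load(folder / "features.joblib"),
    )


# Tiny stand-in model trained on three made-up rows.
model = RandomForestRegressor(n_estimators=10, random_state=42)
model.fit([[0] * 7, [1] * 7, [2] * 7], [0.0, 100.0, 200.0])

with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(model, FEATURES, tmp)
    loaded_model, loaded_features = load_artifacts(tmp)

print(loaded_features)
```

Storing the feature list next to the model lets the API build each input row in the same column order that was used during training.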

Deploying the Model Using FastAPI

FastAPI was used to deploy the model as a web API. FastAPI is a Python web framework that makes it easy to create APIs and automatically validates input data against Pydantic models declared with type hints.

The model is loaded once when the API starts and is never retrained inside the API. Keeping training and serving separate is common practice for production systems.

How the API Works

The API has two main endpoints:

1. Health Check Endpoint

GET /
This endpoint confirms that the API is running correctly.

2. CLV Prediction Endpoint

POST /predict-clv

This endpoint:

Accepts customer data in JSON format
Checks if the input data is valid
Sends the data to the trained model
Returns the predicted CLV value

Testing the API

The API was tested using:

FastAPI Swagger UI
Postman

Successful testing showed that the API correctly accepts input data and returns CLV predictions.
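For manual testing, a request body for POST /predict-clv can be generated as in this sketch and pasted into Swagger UI or Postman. The field names follow the dataset columns, and the values are made up.

```python
import json

# Example customer; every value here is invented for illustration.
payload = {
    "Customer_Age": 35,
    "Annual_Income": 60000.0,
    "Tenure_Months": 24,
    "Monthly_Spend": 150.0,
    "Visits_Per_Month": 5,
    "Avg_Basket_Value": 30.0,
    "Support_Tickets": 2,
}

# JSON text to use as the request body.
body = json.dumps(payload, indent=2)
print(body)
```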

Conclusion and Future Improvements

This project shows a complete machine learning process, from understanding customer data to deploying a working prediction API. Predicting CLV helps businesses make smarter and more data-driven decisions.

In a real business setting, the model could be improved by: