Customer Lifetime Value (CLV) is one of the most practically useful metrics a data-driven business can track. At its core, CLV estimates the total revenue a business can expect from a single customer over the entire duration of their relationship. Rather than treating every customer the same, CLV helps businesses identify which customers are worth investing in, which are at risk of churning, and how to allocate marketing and retention budgets more intelligently.
For ride-hailing platforms, e-commerce companies, and subscription services alike, predicting CLV accurately can be the difference between sustainable growth and expensive mistakes.
This article walks through how to build a CLV prediction model and deploy it as a live API using FastAPI.
The Data
The dataset used for this project contains customer records with seven input features and one target variable:
- Customer_Age — the age of the customer, which can influence spending patterns and platform engagement
- Annual_Income — the customer's yearly income, used as a proxy for overall purchasing power
- Tenure_Months — how long the customer has been active, measured in months
- Monthly_Spend — the average amount the customer spends per month on the platform
- Visits_Per_Month — how frequently the customer engages with the platform each month
- Avg_Basket_Value — the average value of each transaction or order placed
- Support_Tickets — the number of support or complaint tickets raised, which can signal dissatisfaction and churn risk
The target variable is CLV — a score representing the lifetime value of each customer.
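In code, separating those seven features from the CLV target is straightforward. The sketch below uses the column names described above; the CSV filename is hypothetical, and a one-row stand-in DataFrame keeps the example self-contained:

```python
import pandas as pd

# Column names as described above.
FEATURES = ["Customer_Age", "Annual_Income", "Tenure_Months", "Monthly_Spend",
            "Visits_Per_Month", "Avg_Basket_Value", "Support_Tickets"]
TARGET = "CLV"

# In the real project this would be something like:
#   df = pd.read_csv("clv_customers.csv")  # hypothetical filename
# A one-row stand-in keeps this sketch runnable on its own:
df = pd.DataFrame([{
    "Customer_Age": 34, "Annual_Income": 85_000.0, "Tenure_Months": 26,
    "Monthly_Spend": 1_450.0, "Visits_Per_Month": 12, "Avg_Basket_Value": 120.5,
    "Support_Tickets": 1, "CLV": 49_322.59,
}])

X = df[FEATURES]  # model inputs
y = df[TARGET]    # prediction target
```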
Model Selection: Linear Regression vs. Random Forest
Two models were trained and evaluated: Linear Regression and a Random Forest Regressor.
Linear Regression assumes a straight-line relationship between the input features and the CLV target. It is interpretable, fast to train, and performs reliably when the relationships in the data are consistent and proportional.
Random Forest is an ensemble method that builds many decision trees during training and averages their outputs, making it well-suited for capturing complex, non-linear patterns in data.
The models were evaluated using three common metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R²). These metrics measure how closely the predicted values match the actual values; RMSE is simply the square root of MSE, expressed in the same units as CLV. These were the results:
- Linear Regression — MSE: 272,939,026 | RMSE: 16,520.87 | R²: 0.9398
- Random Forest — MSE: 432,071,884 | RMSE: 20,786.34 | R²: 0.9047
After evaluation, Linear Regression outperformed Random Forest: it achieved lower error (both MSE and RMSE) and a higher R² score, meaning its predictions tracked the actual values more closely. This suggests the relationships in the data are largely linear and proportional, which is exactly the setting Linear Regression is built for. Because of this performance advantage, Linear Regression was selected as the final model and saved for deployment along with its feature list.
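The general shape of this comparison can be sketched as follows. Since the original dataset isn't reproduced here, the sketch generates a synthetic stand-in with the same seven feature columns and a mostly linear target; the metric values it prints will differ from the results above, but the pipeline is the same:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset: same seven feature columns,
# with a mostly linear target so the pipeline can be exercised end to end.
rng = np.random.default_rng(42)
n = 1_000
X = pd.DataFrame({
    "Customer_Age": rng.integers(18, 70, n),
    "Annual_Income": rng.uniform(20_000, 150_000, n),
    "Tenure_Months": rng.integers(1, 120, n),
    "Monthly_Spend": rng.uniform(50, 3_000, n),
    "Visits_Per_Month": rng.integers(1, 30, n),
    "Avg_Basket_Value": rng.uniform(10, 500, n),
    "Support_Tickets": rng.integers(0, 10, n),
})
y = (30 * X["Monthly_Spend"] + 200 * X["Tenure_Months"]
     + 0.1 * X["Annual_Income"] + rng.normal(0, 5_000, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

results = {}
for name, model in [("Linear Regression", LinearRegression()),
                    ("Random Forest", RandomForestRegressor(random_state=42))]:
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    mse = mean_squared_error(y_test, preds)
    results[name] = {"MSE": mse, "RMSE": float(np.sqrt(mse)),
                     "R2": r2_score(y_test, preds)}
    print(name, results[name])

# The winning model and its feature list would then be persisted for serving,
# e.g. joblib.dump(model, "clv_model.pkl") plus list(X.columns).
```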
Deploying the Model with FastAPI
After training the model, the next step was to make it accessible for real-world use. This was done using FastAPI.
The model was loaded into a Python script that defines an API. FastAPI uses Pydantic to validate input data. A class was created using BaseModel to define the expected input features such as customer age, income, tenure, and spending patterns. This ensures that any request sent to the API contains the correct data types and required fields.
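A minimal sketch of that schema, assuming one field per dataset column (the class name here is hypothetical, not taken from the project):

```python
from pydantic import BaseModel

class CustomerFeatures(BaseModel):
    # Field names mirror the dataset columns described earlier;
    # Pydantic rejects requests with missing fields or wrong types.
    Customer_Age: int
    Annual_Income: float
    Tenure_Months: int
    Monthly_Spend: float
    Visits_Per_Month: float
    Avg_Basket_Value: float
    Support_Tickets: int
```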
The API includes a /predict endpoint that accepts a POST request containing customer information in JSON format. When the request is received, the model processes the data and returns a predicted CLV value as a JSON response.
Running and Testing the API
The API was run locally using Uvicorn, an ASGI server used to run FastAPI applications. The server can be started using the command:
```shell
uvicorn main:app --reload
```
FastAPI automatically generates interactive API documentation (Swagger UI), which with Uvicorn's default settings can be accessed at http://127.0.0.1:8000/docs.
This interface allows users to test the API directly from a web browser by entering sample customer data.
For programmatic testing, a request using Python's requests library looks like this:
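This sketch assumes the server is running locally on Uvicorn's default port and that the payload keys match the Pydantic schema exactly (the sample values are illustrative):

```python
import requests

# Sample customer; field names must match the Pydantic model exactly.
payload = {
    "Customer_Age": 34,
    "Annual_Income": 85000.0,
    "Tenure_Months": 26,
    "Monthly_Spend": 1450.0,
    "Visits_Per_Month": 12,
    "Avg_Basket_Value": 120.5,
    "Support_Tickets": 1,
}

try:
    response = requests.post("http://127.0.0.1:8000/predict",
                             json=payload, timeout=5)
    print(response.json())
except requests.exceptions.ConnectionError:
    print("Start the API first with: uvicorn main:app --reload")
```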
A successful response returns something like:
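The exact key depends on how the endpoint builds its response; assuming a `predicted_clv` field, the JSON would resemble:

```json
{
  "predicted_clv": 49322.59
}
```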
This output tells the business that, based on this customer's behaviour, they are predicted to generate $49,322.59 in lifetime value — a number that can directly inform decisions around discounts, loyalty rewards, or re-engagement campaigns.
Conclusion
Building a CLV prediction model is only half the work — getting it into a form that others can actually use is what makes it valuable. FastAPI makes that second half much more manageable than it might seem. With a trained Linear Regression model, a few lines of code, and Pydantic handling the input validation, the API was up and running without much friction.



