Customer Lifetime Value (CLV) is one of the most practically useful metrics a data-driven business can track. At its core, CLV estimates the total revenue a business can expect from a single customer over the entire duration of their relationship. Rather than treating every customer the same, CLV helps businesses identify which customers are worth investing in, which are at risk of churning, and how to allocate marketing and retention budgets more intelligently.
For ride-hailing platforms, e-commerce companies, and subscription services alike, predicting CLV accurately can be the difference between sustainable growth and expensive mistakes.
This article walks through how to build a CLV prediction model and deploy it as a live API using FastAPI.
The Data
The dataset used for this project contains customer records with seven input features and one target variable:
- Customer_Age — the age of the customer, which can influence spending patterns and platform engagement
- Annual_Income — the customer's yearly income, used as a proxy for overall purchasing power
- Tenure_Months — how long the customer has been active, measured in months
- Monthly_Spend — the average amount the customer spends per month on the platform
- Visits_Per_Month — how frequently the customer engages with the platform each month
- Avg_Basket_Value — the average value of each transaction or order placed
- Support_Tickets — the number of support or complaint tickets raised, which can signal dissatisfaction and churn risk
The target variable is CLV — a score representing the lifetime value of each customer.
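In code, separating those seven features from the CLV target is straightforward. The sketch below uses the column names described above; the CSV filename is hypothetical, and a one-row stand-in DataFrame keeps the example self-contained:

```python
import pandas as pd

# Column names as described above.
FEATURES = ["Customer_Age", "Annual_Income", "Tenure_Months", "Monthly_Spend",
            "Visits_Per_Month", "Avg_Basket_Value", "Support_Tickets"]
TARGET = "CLV"

# In the real project this would be something like:
#   df = pd.read_csv("clv_customers.csv")  # hypothetical filename
# A one-row stand-in keeps this sketch runnable on its own:
df = pd.DataFrame([{
    "Customer_Age": 34, "Annual_Income": 85_000.0, "Tenure_Months": 26,
    "Monthly_Spend": 1_450.0, "Visits_Per_Month": 12, "Avg_Basket_Value": 120.5,
    "Support_Tickets": 1, "CLV": 49_322.59,
}])

X = df[FEATURES]  # model inputs
y = df[TARGET]    # prediction target
```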
Model Selection: Linear Regression vs. Random Forest
Two models were trained and evaluated: Linear Regression and a Random Forest Regressor.
Linear Regression assumes a straight-line relationship between the input features and the CLV target. It is interpretable, fast to train, and performs reliably when the relationships in the data are consistent and proportional.
Random Forest is an ensemble method that builds many decision trees during training and averages their outputs, making it well-suited for capturing complex, non-linear patterns in data.
The models were evaluated using three common metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R²). These metrics measure how closely the predicted values match the actual values; RMSE is simply the square root of MSE, expressed in the same units as CLV. These were the results:
- Linear Regression — MSE: 272,939,026 | RMSE: 16,520.87 | R²: 0.9398
- Random Forest — MSE: 432,071,884 | RMSE: 20,786.34 | R²: 0.9047
After evaluation, Linear Regression outperformed Random Forest: it achieved lower error (both MSE and RMSE) and a higher R² score, meaning its predictions tracked the actual values more closely. This suggests the relationships in the data are largely linear and proportional, which is exactly the setting Linear Regression is built for. Because of this performance advantage, Linear Regression was selected as the final model and saved for deployment along with its feature list.
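The general shape of this comparison can be sketched as follows. Since the original dataset isn't reproduced here, the sketch generates a synthetic stand-in with the same seven feature columns and a mostly linear target; the metric values it prints will differ from the results above, but the pipeline is the same:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset: same seven feature columns,
# with a mostly linear target so the pipeline can be exercised end to end.
rng = np.random.default_rng(42)
n = 1_000
X = pd.DataFrame({
    "Customer_Age": rng.integers(18, 70, n),
    "Annual_Income": rng.uniform(20_000, 150_000, n),
    "Tenure_Months": rng.integers(1, 120, n),
    "Monthly_Spend": rng.uniform(50, 3_000, n),
    "Visits_Per_Month": rng.integers(1, 30, n),
    "Avg_Basket_Value": rng.uniform(10, 500, n),
    "Support_Tickets": rng.integers(0, 10, n),
})
y = (30 * X["Monthly_Spend"] + 200 * X["Tenure_Months"]
     + 0.1 * X["Annual_Income"] + rng.normal(0, 5_000, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

results = {}
for name, model in [("Linear Regression", LinearRegression()),
                    ("Random Forest", RandomForestRegressor(random_state=42))]:
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    mse = mean_squared_error(y_test, preds)
    results[name] = {"MSE": mse, "RMSE": float(np.sqrt(mse)),
                     "R2": r2_score(y_test, preds)}
    print(name, results[name])

# The winning model and its feature list would then be persisted for serving,
# e.g. joblib.dump(model, "clv_model.pkl") plus list(X.columns).
```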
Deploying the Model with FastAPI
After training the model, the next step was to make it accessible for real-world use. This was done using FastAPI.
The model was loaded into a Python script that defines an API. FastAPI uses Pydantic to validate input data. A class was created using BaseModel to define the expected input features such as customer age, income, tenure, and spending patterns. This ensures that any request sent to the API contains the correct data types and required fields.
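A minimal sketch of that schema, assuming one field per dataset column (the class name here is hypothetical, not taken from the project):

```python
from pydantic import BaseModel

class CustomerFeatures(BaseModel):
    # Field names mirror the dataset columns described earlier;
    # Pydantic rejects requests with missing fields or wrong types.
    Customer_Age: int
    Annual_Income: float
    Tenure_Months: int
    Monthly_Spend: float
    Visits_Per_Month: float
    Avg_Basket_Value: float
    Support_Tickets: int
```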
The API includes a /predict endpoint that accepts a POST request containing customer information in JSON format. When the request is received, the model processes the data and returns a predicted CLV value as a JSON response.
Running and Testing the API
The API was run locally using Uvicorn, an ASGI server used to run FastAPI applications. The server can be started using the command:
```shell
uvicorn main:app --reload
```
FastAPI automatically generates interactive API documentation (Swagger UI), which with Uvicorn's default settings can be accessed at http://127.0.0.1:8000/docs.
This interface allows users to test the API directly from a web browser by entering sample customer data.
For programmatic testing, a request using Python's requests library looks like this:
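This sketch assumes the server is running locally on Uvicorn's default port and that the payload keys match the Pydantic schema exactly (the sample values are illustrative):

```python
import requests

# Sample customer; field names must match the Pydantic model exactly.
payload = {
    "Customer_Age": 34,
    "Annual_Income": 85000.0,
    "Tenure_Months": 26,
    "Monthly_Spend": 1450.0,
    "Visits_Per_Month": 12,
    "Avg_Basket_Value": 120.5,
    "Support_Tickets": 1,
}

try:
    response = requests.post("http://127.0.0.1:8000/predict",
                             json=payload, timeout=5)
    print(response.json())
except requests.exceptions.ConnectionError:
    print("Start the API first with: uvicorn main:app --reload")
```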
A successful response returns something like:
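The exact key depends on how the endpoint builds its response; assuming a `predicted_clv` field, the JSON would resemble:

```json
{
  "predicted_clv": 49322.59
}
```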
This output tells the business that, based on this customer's behaviour, they are predicted to generate $49,322.59 in lifetime value — a number that can directly inform decisions around discounts, loyalty rewards, or re-engagement campaigns.
Conclusion
Building a CLV prediction model is only half the work — getting it into a form that others can actually use is what makes it valuable. FastAPI makes that second half much more manageable than it might seem. With a trained Linear Regression model, a few lines of code, and Pydantic handling the input validation, the API was up and running without much friction.



