Introduction
RFM analysis is a powerful technique used by businesses to segment and understand their customers. It leverages three key metrics, Recency, Frequency and Monetary value, which say a lot about customer behavior.
Steps
Step 1: Calculate the RFM values
The RFM values are the recency, frequency and monetary values.
Recency is calculated by subtracting a customer's last transaction date from the current date.
Frequency is the number of times a unique customer has made a purchase. It is calculated by grouping the data by CustomerID and counting the OrderIDs for each unique customer.
Monetary value is the total amount a customer has brought to the business. It is calculated by grouping the data by CustomerID and summing the price of the products for each order placed.
You can then join the three RFM values on CustomerID.
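As a rough sketch of the three calculations above, here is one way to do it with pandas. The tiny orders table, its column names (CustomerID, OrderID, OrderDate, Price) and the fixed reference date are assumptions made up for illustration, so adapt them to your own data.

```python
import pandas as pd

# Hypothetical transaction data; column names are assumptions for illustration.
orders = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "OrderID": [101, 102, 103, 104, 105, 106],
    "OrderDate": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2024-02-10",
        "2024-01-20", "2024-02-15", "2024-03-10",
    ]),
    "Price": [20.0, 35.0, 50.0, 10.0, 15.0, 25.0],
})

# A fixed "current date" keeps the example reproducible.
today = pd.Timestamp("2024-03-31")

# Recency: days between the reference date and each customer's last order.
last_purchase = orders.groupby("CustomerID")["OrderDate"].max()
recency = (today - last_purchase).dt.days.rename("Recency")

# Frequency: number of orders per customer.
frequency = orders.groupby("CustomerID")["OrderID"].count().rename("Frequency")

# Monetary: total amount spent per customer.
monetary = orders.groupby("CustomerID")["Price"].sum().rename("Monetary")

# All three Series are indexed by CustomerID, so concat lines them up on it.
rfm = pd.concat([recency, frequency, monetary], axis=1).reset_index()
print(rfm)
```

In real data you would usually set the reference date to the day of the analysis, or to the day after the last order in the data set.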
Step 2: Calculate RFM scores
After getting the RFM values it's time to calculate their RFM_Scores. You will need predefined bins and the *pd.cut() function*.
Bins are the intervals into which continuous numerical data is divided. Each bin has a label, so every RFM value is converted to a discrete value. Example: every frequency value can be classified into one group such as "High-freq, Mid-freq, Low-freq". This gives us the RFM_Scores, each of which is a label from "[High-freq, Mid-freq, Low-freq]".
Next, create a new column in the data set for each of the RFM scores. Make sure to keep track of the dtype, since pd.cut() returns a categorical column when labels are given.
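Here is a minimal sketch of the binning step with pd.cut(). The bin edges, the sample values and the new column names are all assumptions picked for illustration; sensible edges depend on your own data.

```python
import pandas as pd

# Hypothetical RFM values for four customers.
rfm = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Recency": [5, 40, 100, 200],
    "Frequency": [12, 6, 3, 1],
    "Monetary": [900.0, 400.0, 150.0, 30.0],
})

# Assumed bin edges: (0, 2], (2, 8], (8, inf)
freq_bins = [0, 2, 8, float("inf")]
freq_labels = ["Low-freq", "Mid-freq", "High-freq"]
rfm["Frequency_Score"] = pd.cut(rfm["Frequency"], bins=freq_bins, labels=freq_labels)

# Numeric labels also work. For recency, fewer days is better, so the
# first (smallest) bin gets the highest score.
rfm["Recency_Score"] = pd.cut(
    rfm["Recency"],
    bins=[0, 30, 90, float("inf")],
    labels=[3, 2, 1],
    include_lowest=True,  # keep a recency of 0 days inside the first bin
).astype(int)

print(rfm[["CustomerID", "Frequency_Score", "Recency_Score"]])
print(rfm.dtypes)  # Frequency_Score is categorical, Recency_Score is an integer
```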
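NOTE
-1 RFM_Scores is not the same as RFM_Score. RFM_Scores are the three individual scores (one each for recency, frequency and monetary value), while RFM_Score is the single combined score calculated in Step 3.
-2 To calculate the RFM_Score, the RFM_Scores should be numerical. If you used descriptive strings in your bins you can encode them as numbers, taking into consideration the weight of each RFM_Score.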
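If the scores were stored as descriptive strings, one possible way to encode them is a plain dictionary mapping. The 1/2/3 weights and the column names below are assumptions for illustration, not a fixed rule.

```python
import pandas as pd

# Hypothetical scores stored as descriptive labels.
scores = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Frequency_Score": ["High-freq", "Low-freq", "Mid-freq"],
})

# Assumed weights: a higher number means a better score.
freq_weights = {"Low-freq": 1, "Mid-freq": 2, "High-freq": 3}

# map() swaps each label for its weight; astype(int) makes the dtype explicit.
scores["Frequency_Score_Num"] = scores["Frequency_Score"].map(freq_weights).astype(int)
print(scores)
```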
Step 3: RFM_Score segmentation
First we need to calculate the RFM_Score, which is obtained by adding all the RFM_Scores for every unique customer.
Once you have the RFM_Score it's time to segment it. A common method I came across is to use "[High-value, Mid-value, Low-value]" as the list of labels.
We need pd.qcut(data, q, labels=[]), which splits the data into q quantile-based bins that each hold roughly the same number of customers. data is the column containing the RFM_Score, q is the number of bins and labels is a list with one label per bin, ordered from the lowest bin to the highest.
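A small sketch of this step, assuming the three scores are already numeric. The sample scores and the column names are made up for illustration.

```python
import pandas as pd

# Hypothetical numeric scores for six customers.
rfm = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5, 6],
    "Recency_Score": [3, 3, 2, 2, 1, 1],
    "Frequency_Score": [3, 2, 3, 1, 2, 1],
    "Monetary_Score": [3, 3, 1, 2, 1, 1],
})

# RFM_Score: sum of the three individual scores per customer.
score_cols = ["Recency_Score", "Frequency_Score", "Monetary_Score"]
rfm["RFM_Score"] = rfm[score_cols].sum(axis=1)

# qcut assigns labels from the lowest bin to the highest,
# so the labels are listed low to high.
rfm["Value_Segment"] = pd.qcut(
    rfm["RFM_Score"],
    q=3,
    labels=["Low-value", "Mid-value", "High-value"],
)
print(rfm[["CustomerID", "RFM_Score", "Value_Segment"]])
```

Note that when many customers share the same RFM_Score, pd.qcut() can raise an error about non-unique bin edges, in which case a smaller q or fixed bins with pd.cut() may be easier to work with.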
Step 4: RFM Customer segmentation
This step involves classifying the customers into different groups based on their RFM_Score. It helps us understand the customers by taking into consideration their recency, frequency and monetary value. Here is an example of groups: "[high-value customers, potential opportunities, at-risk customers]"
Example
Customers with:
RFM_Score >= 9: high-value customers
RFM_Score >= 6 and RFM_Score < 9: potential opportunities
RFM_Score < 6: at-risk customers
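The thresholds in this example can be applied with a small helper function. The function name segment_customer, the Customer_Segment column and the sample scores are assumptions for illustration.

```python
import pandas as pd

# Hypothetical RFM_Score values for four customers.
rfm = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "RFM_Score": [9, 7, 6, 4],
})

def segment_customer(score):
    """Map an RFM_Score to a segment using the example thresholds."""
    if score >= 9:
        return "high-value customers"
    if score >= 6:
        return "potential opportunities"
    return "at-risk customers"

rfm["Customer_Segment"] = rfm["RFM_Score"].apply(segment_customer)
print(rfm)
```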
Sample
You can find the notebook with all the code here.
Conclusion
RFM is used to segment customers based on their buying behavior. It stands for Recency, Frequency and Monetary value.
After getting the customer segments, you can visualize the data for analysis.
Here is an article about different visualization techniques.