RFM stands for Recency, Frequency, Monetary analysis. It is a technique widely used by data scientists in the e-commerce industry to rank customers by how recently they bought from a store, how frequently they make purchases, and how much they spend in total.
RFM can be used to detect customer behavior and usage patterns, such as how many customers are frequent buyers or high spenders. This helps a company craft custom marketing strategies for each target group, with the aim of accelerating sales and getting the most out of each group. The analysis is therefore an important input to a company's marketing strategy.
In this article we will dive into a sample case scenario to explore some uses of RFM and how to leverage it in your industry. This is a case study for learning purposes, not a full-fledged RFM model.
In an attempt to solve this I came up with this model:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
# Read the CSV file with the raw data
myData = pd.read_csv("rfm_data.csv")
# Work on a copy of the original data, keeping one row (the last) per CustomerID
data = myData.copy(deep=True).drop_duplicates(subset=["CustomerID"], keep="last")
# Count the number of transactions per CustomerID.
# rename_axis names the ID column explicitly, so the result has the columns
# "CustomerID" and "Frequency" regardless of the pandas version.
frequencySeries = myData["CustomerID"].value_counts().rename_axis("CustomerID").reset_index(name="Frequency")
print(frequencySeries.loc[frequencySeries["CustomerID"] == 8317])
# Append the Frequency column to the DataFrame
data = data.merge(frequencySeries, on="CustomerID", how="left")
#Total Money spent per user
moneySpent = myData.groupby("CustomerID")['TransactionAmount'].sum().reset_index(name="Total Spent")
data= data.merge(moneySpent,on="CustomerID",how="left")
# Score labels: a smaller recency value (a more recent purchase) earns a higher score
recencyScore = [5, 4, 3, 2, 1]
frequencyScore = [1, 2, 3, 4, 5]
monetaryScore = [1, 2, 3, 4, 5]
# Grade users by splitting each metric into 5 equal-width bins.
# Note: data["recency"] (presumably days since the last purchase) must already exist;
# the steps above do not create it.
data["recencyScore"] = pd.cut(data["recency"], bins=5, labels=recencyScore).astype(int)
data["FrequencyScore"] = pd.cut(data["Frequency"], bins=5, labels=frequencyScore).astype(int)
data["Monetary Score"] = pd.cut(data["Total Spent"], bins=5, labels=monetaryScore).astype(int)
data["totalScore"] = data["FrequencyScore"] + data["Monetary Score"] + data["recencyScore"]
# Rank users based on their total RFM score
myLabels = ["Beginner", "Intermediate", "PRO"]
data["Rank"] = pd.cut(data["totalScore"], bins=3, labels=myLabels)
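The scoring step reads a `recency` column that the listing never builds. Here is a minimal sketch of one way it could be derived. It assumes the raw data has a purchase-date column; the column name `PurchaseDate` and the helper `add_recency` are hypothetical and do not come from the original code. Recency is measured here as the number of days between each customer's last purchase and the latest date in the data:

```python
import pandas as pd

def add_recency(transactions, customers, date_col="PurchaseDate"):
    """Return a copy of `customers` with a `recency` column (days since last purchase).

    `date_col` is an assumed column name; adjust it to match your data.
    """
    dates = pd.to_datetime(transactions[date_col])
    # Use the most recent purchase in the data set as the reference date
    snapshot = dates.max()
    # Latest purchase date per customer
    last_purchase = dates.groupby(transactions["CustomerID"]).max()
    recency = (snapshot - last_purchase).dt.days.reset_index(name="recency")
    return customers.merge(recency, on="CustomerID", how="left")

# Toy data, for illustration only
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "PurchaseDate": ["2023-04-01", "2023-04-20", "2023-04-10", "2023-04-30"],
})
toy_customers = toy.drop_duplicates(subset=["CustomerID"], keep="last")
toy_with_recency = add_recency(toy, toy_customers)
print(toy_with_recency[["CustomerID", "recency"]])
```

With the frames used in this article, the call would look like `data = add_recency(myData, data)` and would need to run before the scores are computed.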
I then used matplotlib to visualize statistics about the users in each rank:
Beginner - Users with a low RFM score
Intermediate - Users with an average RFM score
PRO - Users with a high RFM score
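To make the rank boundaries concrete, here is a small sketch of how `pd.cut` with `bins=3` splits total scores into the three ranks. The scores are made up for illustration, and the names `toy_scores` and `toy_ranks` are not part of the original code:

```python
import pandas as pd

# Made-up total scores covering the possible range 3..15
toy_scores = pd.Series([3, 5, 7, 9, 11, 13, 15], name="totalScore")

# bins=3 splits the observed range (min to max) into three equal-width intervals
toy_ranks = pd.cut(toy_scores, bins=3, labels=["Beginner", "Intermediate", "PRO"])

print(pd.concat([toy_scores, toy_ranks.rename("Rank")], axis=1))
```

Because the bin edges come from the observed minimum and maximum, the same score can land in a different rank on a different data set. If equally sized groups are preferred over equal-width intervals, `pd.qcut` is the quantile-based counterpart.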
1. Show the number of users per rank
# Count the users in each rank, in the same order as myLabels
rankCounts = [(data["Rank"] == label).sum() for label in myLabels]
plt.pie(rankCounts, labels=myLabels, autopct='%1.1f%%')
plt.axis('equal')
plt.legend(labels=myLabels)
plt.show()
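2. Show user distribution across major cities
# locationLabels was not defined in the original listing; taking the unique
# values of the "Location" column is an assumption
locationLabels = sorted(data["Location"].dropna().unique())
def getType(myType, myCity):
    # Count the users with the given rank in the given city
    return len(data.loc[(data["Rank"] == myType) & (data["Location"] == myCity)])
# Map each rank to its per-city user counts, in locationLabels order
myCityStats = {}
for i in myLabels:
    myCityStats[i] = [getType(i, j) for j in locationLabels]
print(myCityStats)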
x=np.arange(len(locationLabels))
width = 0.3
multiplier = 0
fig,ax = plt.subplots(layout='constrained')
for att, val in myCityStats.items():
    offset = width * multiplier
    bars = ax.bar(x + offset, val, width, label=att)
    ax.bar_label(bars, padding=3)
    multiplier += 1
# Add axis labels, a title, and custom x-axis tick labels
ax.set_ylabel('Number of users')
ax.set_title('Number of users in a city per RFM Rank')
ax.set_xticks(x + width, locationLabels)
ax.legend(loc='upper left', ncols=3)
ax.set_ylim(0, 250)
plt.show()
Grouped bar chart of the rank distribution across major cities
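The per-city counts built with the loop above can also be produced in one step with `pd.crosstab`. This is an alternative sketch, not the approach used in this article; the toy frame and its city names are made up and stand in for `data`:

```python
import pandas as pd

# Toy stand-in for the article's `data` frame, for illustration only
toy = pd.DataFrame({
    "Location": ["Tokyo", "Tokyo", "Paris", "Paris", "Paris", "London"],
    "Rank": ["Beginner", "PRO", "Beginner", "Beginner", "Intermediate", "PRO"],
})

# Rows are cities, columns are ranks, cells are user counts
city_rank_counts = pd.crosstab(toy["Location"], toy["Rank"])
print(city_rank_counts)

# The table can be drawn as a grouped bar chart directly:
# city_rank_counts.plot(kind="bar")
```

Combinations that never occur, such as an Intermediate user in London in this toy data, show up as 0 rather than being dropped.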
These are just a few of the possible visualizations. Many more can be built, depending on the questions your organization wants to answer.