DEV Community

Isaac

DIVE INTO RFM DATA ANALYSIS

RFM stands for Recency, Frequency, Monetary. RFM analysis is a technique widely used by data scientists in the e-commerce industry to rank customers based on how recently they bought goods from a store, how frequently they make purchases, and how much they spend in total.

RFM can be used to detect customers' usage behavior and patterns, such as the number of customers who are frequent buyers or high spenders. This helps a company come up with custom marketing strategies for each target group, with the aim of accelerating sales and getting the most out of each group. The analysis is therefore an important input when designing a company's marketing strategy.
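
To make the three metrics concrete, here is a minimal sketch that computes them for a tiny, made-up transaction table. The `PurchaseDate` column name, the sample values, and the choice of reference date are assumptions for illustration only; they are not taken from the dataset used later in this article.

```python
import pandas as pd

# Hypothetical transactions; all values are made up for illustration
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "PurchaseDate": pd.to_datetime([
        "2023-01-05", "2023-03-01", "2023-02-10",
        "2023-01-20", "2023-02-15", "2023-03-10",
    ]),
    "TransactionAmount": [50.0, 20.0, 200.0, 10.0, 15.0, 30.0],
})

# Reference date for recency (assumed here: the day after the latest purchase)
reference_date = tx["PurchaseDate"].max() + pd.Timedelta(days=1)

# One row per customer: last purchase date, number of purchases, total spent
rfm = tx.groupby("CustomerID").agg(
    last_purchase=("PurchaseDate", "max"),
    frequency=("PurchaseDate", "count"),
    monetary=("TransactionAmount", "sum"),
)

# Recency in days: smaller means the customer bought more recently
rfm["recency"] = (reference_date - rfm["last_purchase"]).dt.days
rfm = rfm.drop(columns="last_purchase")
print(rfm)
```

In this toy table, customer 3 is the most recent and most frequent buyer, while customer 2 spends the most in a single purchase, which is exactly the kind of difference RFM is meant to surface.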

In this article we will dive into a sample case scenario to explore some uses of RFM and how to leverage it in your industry. This is a case study for learning purposes and not a full-fledged RFM model.
To work through the scenario, I came up with this model:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

# Read the CSV file with the raw transaction data
myData = pd.read_csv("rfm_data.csv")

# Work on a deep copy and keep one row per customer (the last one in the file)
data = myData.copy(deep=True).drop_duplicates(subset=["CustomerID"], keep="last")


# Count the number of transactions per CustomerID
# (groupby().size() yields the same column names on every pandas version)
frequencySeries = myData.groupby("CustomerID").size().reset_index(name="Frequency")
print(frequencySeries.loc[frequencySeries["CustomerID"] == 8317])

# Append the Frequency column to the DataFrame
data = data.merge(frequencySeries, on="CustomerID", how="left")


#Total Money spent per user
moneySpent = myData.groupby("CustomerID")['TransactionAmount'].sum().reset_index(name="Total Spent")
data= data.merge(moneySpent,on="CustomerID",how="left")




# Score labels: the lowest recency values (most recent purchases) earn the highest score
recencyScore = [5, 4, 3, 2, 1]
frequencyScore = [1, 2, 3, 4, 5]
monetaryScore = [1, 2, 3, 4, 5]

# Grade users by splitting each metric into 5 equal-width bins.
# NOTE: this assumes data already has a numeric "recency" column
# (for example, days since the last purchase); it is not created above.
data["recencyScore"] = pd.cut(data["recency"], bins=5, labels=recencyScore).astype(int)

data["FrequencyScore"] = pd.cut(data["Frequency"], bins=5, labels=frequencyScore).astype(int)

data["Monetary Score"] = pd.cut(data["Total Spent"], bins=5, labels=monetaryScore).astype(int)

data["totalScore"] = data["FrequencyScore"] + data["Monetary Score"] + data["recencyScore"]

# Rank users based on their total RFM score
myLabels = ["Beginner", "Intermediate", "PRO"]
data["Rank"] = pd.cut(data["totalScore"], bins=3, labels=myLabels)
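
The scoring step relies on `pd.cut` splitting a column into equal-width bins and attaching the labels in bin order. The sketch below uses made-up recency values (assumed to be days since the last purchase) to show why the recency labels are listed in reverse: the smallest values fall into the first bin and so receive a 5.

```python
import pandas as pd

# Made-up recency values, assumed to be days since the last purchase
recency = pd.Series([1, 10, 25, 40, 50])

# Five equal-width bins over the observed range; labels are applied in bin
# order, so the lowest recency values receive the highest score
scores = pd.cut(recency, bins=5, labels=[5, 4, 3, 2, 1]).astype(int)
print(scores.tolist())  # [5, 5, 3, 2, 1]
```

Because the bins are equal-width, heavily skewed data can leave some scores almost unused. `pd.qcut`, which splits by quantiles instead, is a common alternative worth trying in that case.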

I then used matplotlib to visualize some statistics about the users, grouped into three ranks:
Beginner - Users with a low RFM score
Intermediate - Users with an average RFM score
PRO - Users with a high RFM score

1. Show the number of users per rank

# Count the users in each rank, in the same order as myLabels
rankCounts = [len(data.loc[data["Rank"] == label]) for label in myLabels]

plt.pie(rankCounts, labels=myLabels, autopct='%1.1f%%')
plt.axis('equal')
plt.legend(labels=myLabels)
plt.show()

Pie chart showing the distribution of users across ranks

2. Show user distribution across major cities

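# City names, taken from the Location column
locationLabels = sorted(data["Location"].dropna().unique())

def getType(myType, myCity):
    # Number of users with the given rank in the given city
    return len(data.loc[(data["Rank"] == myType) & (data["Location"] == myCity)])

# Map each rank to its list of per-city counts
myCityStats = {}
for i in myLabels:
    myCityStats[i] = [getType(i, j) for j in locationLabels]

print(myCityStats)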

x=np.arange(len(locationLabels))
width = 0.3
multiplier = 0

fig,ax = plt.subplots(layout='constrained')

for att,val in myCityStats.items():
    offset = width*multiplier
    bars = ax.bar(x+offset,val,width,label=att)
    ax.bar_label(bars,padding=3)
    multiplier+=1


# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Number of users')
ax.set_title('Number of users in a city per RFM Rank')
ax.set_xticks(x + width, locationLabels)
ax.legend(loc='upper left', ncols=3)
ax.set_ylim(0, 250)

plt.show()

Group bar charts for Rank distribution in Major Cities
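
The per-city counts can also be obtained in one step with a cross-tabulation. This is an alternative sketch, not the code used for the chart above; the small DataFrame is made up, and only the `Rank` and `Location` column names mirror the listing.

```python
import pandas as pd

# Made-up ranked users; only the column names mirror the article's DataFrame
ranked = pd.DataFrame({
    "Rank": ["Beginner", "PRO", "Beginner", "Intermediate", "PRO", "Beginner"],
    "Location": ["Tokyo", "Tokyo", "London", "London", "London", "Tokyo"],
})

# Rows are cities, columns are ranks, cells are user counts
counts = pd.crosstab(ranked["Location"], ranked["Rank"])
print(counts)

# counts.plot(kind="bar") would draw a grouped bar chart from this table
```

The resulting table has one row per city and one column per rank, so it can replace both the helper function and the dictionary when only the counts are needed.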

These are just a few visualizations; more can be done depending on the questions your organization wants to answer.
