DEV Community: M.ngugi

Correlation of Claim Amount and Flag for Fraud

M.ngugi — Sat, 16 Dec 2023 06:22:54 +0000

Question:

What does the correlation between claim amount and the fraud flag, at 0.86 and 1.0 respectively, mean? (See Figure 1.0 below)

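A coefficient of 1.0 represents a perfect positive linear relationship. If, as is usual, the figure is a correlation matrix, the 1.0 sits on the diagonal, where the fraud flag is compared with itself, so it carries no new information. The informative value is the 0.86 between claim amount and the fraud flag: a strong positive linear relationship, meaning that as the claim amount increases, the likelihood of fraud also increases.

However, correlation does not establish causation, so more analysis using other data science methods is needed to find out what actually causes fraud.

In conclusion, the larger the claim amount, the more likely the claim is to be associated with fraud.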
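Figure 1.0 is not reproduced here, so the sketch below uses made-up data to show where such numbers come from. The `claim_amount` and `fraud_flag` values and the `pearson` helper are illustrative assumptions, not the original dataset or code. Applying Pearson's formula to a 0/1 flag gives what is known as the point-biserial correlation, and every column correlates perfectly (1.0) with itself on the diagonal.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical claims: payout amount and a 0/1 fraud flag (1 = flagged as fraud).
claim_amount = [200, 450, 300, 5000, 7000, 250, 6200, 400]
fraud_flag = [0, 0, 0, 1, 1, 0, 1, 0]

columns = {"claim_amount": claim_amount, "fraud_flag": fraud_flag}
names = list(columns)

# Full correlation matrix; the diagonal compares each column with itself.
matrix = [[pearson(columns[a], columns[b]) for b in names] for a in names]

for name, row in zip(names, matrix):
    print(f"{name:>12}: " + "  ".join(f"{value:5.2f}" for value in row))
```

On this toy data the diagonal prints as 1.00 and the off-diagonal entries as roughly 0.98; with pandas, `DataFrame.corr()` produces the same kind of matrix.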

RFM Analysis System Project

M.ngugi — Mon, 09 Oct 2023 10:32:52 +0000

Problem Statement

RFM analysis is a powerful technique used by companies to better understand customer behavior and optimize engagement strategies. It revolves around three key dimensions: recency, frequency, and monetary value. These dimensions capture essential aspects of customer transactions, providing valuable information for segmentation and personalized marketing campaigns.

The dataset is provided by an e-commerce platform and contains customer transaction data, including customer ID, purchase date, transaction amount, product information, order ID, and location. The platform aims to leverage RFM (recency, frequency, monetary value) analysis to segment customers and optimize customer engagement strategies.

Your task is to perform RFM analysis and develop customer segments based on their RFM scores. The analysis should provide insights into customer behaviour and identify high-value customers, at-risk customers, and potential opportunities for personalized marketing campaigns.

Understanding the RFM Analysis System

According to G. Wright (n.d.):

“RFM analysis is a marketing technique used to quantitatively rank and group customers based on the recency, frequency and monetary total of their recent transactions to identify the best customers and perform targeted marketing campaigns.”

The RFM system assigns each customer numerical scores based on these factors to provide an objective analysis. RFM analysis is based on the marketing adage that "80% of your business comes from 20% of your customers."

RFM Analysis

“RFM analysis scores customers on each of the three main factors. Generally, a score from 1 to 5 is given, with 5 being the highest.”

Definitions:

• **Recency** is how recently a customer made a purchase, that is, the time elapsed since their last purchase. It can be measured in hours, days, weeks, or years.

• **Frequency** is how often a customer purchases an item within a given period.

• **Monetary** value is how much a customer spends in a given period of time.

• **Segmentation** in RFM analysis is the process of identifying clusters of customers with similar RFM attributes.
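As a minimal sketch of how these definitions turn into scores, the Python below aggregates raw transactions into recency (days since the last purchase), frequency (number of purchases) and monetary value (total spend), then ranks each metric into scores from 1 to 5. The transaction layout `(customer_id, purchase_date, amount)`, the snapshot date, the data and the helper names are all hypothetical. Rank-based quintiles are one common scoring convention (pandas users often reach for `qcut`), and this version simply breaks ties by input order.

```python
from datetime import date


def rfm_table(transactions, today):
    """Aggregate (customer_id, purchase_date, amount) rows into R (days), F, M per customer."""
    stats = {}
    for customer_id, purchase_date, amount in transactions:
        last, count, total = stats.get(customer_id, (None, 0, 0.0))
        if last is None or purchase_date > last:
            last = purchase_date
        stats[customer_id] = (last, count + 1, total + amount)
    return {
        cid: ((today - last).days, count, total)
        for cid, (last, count, total) in stats.items()
    }


def score(values, higher_is_better=True, bins=5):
    """Rank-based score from 1 (worst) to `bins` (best); ties are broken by input order."""
    order = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(order)
    return {cid: 1 + (rank * bins) // n for rank, cid in enumerate(order)}


# Hypothetical transactions for five customers; the snapshot date is also an assumption.
transactions = [
    ("C1", date(2023, 9, 30), 500.0),
    ("C1", date(2023, 9, 20), 450.0),
    ("C1", date(2023, 9, 10), 600.0),
    ("C1", date(2023, 8, 30), 1500.0),
    ("C2", date(2023, 9, 29), 40.0),
    ("C3", date(2023, 3, 1), 900.0),
    ("C3", date(2023, 2, 1), 1200.0),
    ("C3", date(2023, 1, 15), 800.0),
    ("C4", date(2023, 8, 15), 150.0),
    ("C4", date(2023, 7, 1), 100.0),
    ("C5", date(2023, 6, 1), 60.0),
]
today = date(2023, 10, 1)

table = rfm_table(transactions, today)
r = score({c: v[0] for c, v in table.items()}, higher_is_better=False)  # fewer days = better
f = score({c: v[1] for c, v in table.items()})
m = score({c: v[2] for c, v in table.items()})
rfm_scores = {c: (r[c], f[c], m[c]) for c in table}

for cid, (days, freq, spend) in table.items():
    print(f"{cid}: R={days}d F={freq} M={spend:.2f} -> scores {rfm_scores[cid]}")
```

Recency is the only metric where a smaller raw value earns a higher score, which is why it is ranked with `higher_is_better=False`.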

Customer Types

1. Whales: These are customers who score highly on all three attributes, with a score of (5, 5, 5).
2. New Customers: Customers with a high recency score but a low frequency score and, typically, a low monetary value (5, 1, X), where X stands for any score.
3. Lapsed Customers: Customers with a low recency score but a high monetary value (1, X, 5). They were once valuable customers but have since stopped purchasing.
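The three customer types can be read as rules over an (R, F, M) score triple, with X meaning "any score". The hypothetical `segment` function below is a simplified sketch that applies exact matches only; real projects usually widen these rules to score ranges (for example, treating 4 and 5 as "high").

```python
def segment(r, f, m):
    """Map an (R, F, M) score triple (each 1 to 5) to one of the customer types above.

    Exact-match rules are a hypothetical, simplified reading of (5,5,5), (5,1,X), (1,X,5).
    """
    if (r, f, m) == (5, 5, 5):
        return "Whale"
    if r == 5 and f == 1:
        return "New Customer"  # monetary score is ignored (X)
    if r == 1 and m == 5:
        return "Lapsed Customer"  # frequency score is ignored (X)
    return "Other"


# Hypothetical score triples.
for scores in [(5, 5, 5), (5, 1, 2), (1, 3, 5), (3, 3, 3)]:
    print(scores, "->", segment(*scores))
```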

References and Resources:
G. Wright (n.d.). RFM analysis (recency, frequency, monetary). TechTarget, Data Management. https://www.techtarget.com/searchdatamanagement/definition/RFM-analysis
Reference link: https://statso.io/rfm-analysis-case-study/

Week 1 Project: Churn Prediction for Sprint

M.ngugi — Thu, 28 Sep 2023 18:21:22 +0000

2023-09-28

Churn Prediction for Sprint Telecom
Week 1 Write-up: Lux Academy Data Science Bootcamp
Project Name: Churn Prediction for Sprint Telecom
Author: Peter Mwangi Ngugi

Problem Definition:

Problem Description:
Sprint Telecom, one of the biggest telecom companies in the USA, is keen to figure out how many customers might decide to leave in the coming months.

Luckily, they have plenty of historical data about when customers have left, as well as information about who these customers are, what they have bought, and other related details.

Objectives:

So, if you were in charge of predicting customer churn, how would you use machine learning to make a good guess about which customers might leave? What steps would you take to create a machine learning model that can predict whether someone is going to leave?

Solution

1. Review Existing Customer Data:

The first step is to assemble all the existing data on customers who have churned from Sprint and on current customers, then categorize the data into two groups: customers who have left and current customers. Start by analyzing customer usage patterns, then review customer communication (for example, complaints and the feedback and support given back to customers), and finally review customer payment plans. In a nutshell, these three areas are the major reasons why a customer would leave Sprint Telecom.
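A minimal sketch of step 1, assuming hypothetical customer records with a `churned` flag plus one field for each area mentioned above (usage, complaints, payment plan). The field names and values are illustrative, not Sprint's actual data.

```python
from collections import Counter
from statistics import mean

# Hypothetical customer records; the field names are illustrative, not Sprint's real schema.
customers = [
    {"id": 1, "churned": True, "monthly_minutes": 120, "complaints": 4, "plan": "prepaid"},
    {"id": 2, "churned": False, "monthly_minutes": 640, "complaints": 0, "plan": "contract"},
    {"id": 3, "churned": True, "monthly_minutes": 90, "complaints": 3, "plan": "prepaid"},
    {"id": 4, "churned": False, "monthly_minutes": 510, "complaints": 1, "plan": "contract"},
    {"id": 5, "churned": False, "monthly_minutes": 430, "complaints": 0, "plan": "prepaid"},
    {"id": 6, "churned": True, "monthly_minutes": 200, "complaints": 5, "plan": "contract"},
]


def summarize(group):
    """Usage, communication and payment-plan summary for one group of customers."""
    return {
        "customers": len(group),
        "avg_monthly_minutes": mean(c["monthly_minutes"] for c in group),
        "avg_complaints": mean(c["complaints"] for c in group),
        "plans": Counter(c["plan"] for c in group),
    }


# Split into customers who have left and current customers, then compare the groups.
groups = {
    "churned": [c for c in customers if c["churned"]],
    "current": [c for c in customers if not c["churned"]],
}
summary = {label: summarize(group) for label, group in groups.items()}

for label, stats in summary.items():
    print(label, stats)
```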

2. Start Processing the Data: Once step 1 is thoroughly done, the next step is to process the data by cleaning up ambiguities and inconsistencies so that the analysis works with accurate data. Processing also involves data encoding, that is, ensuring all the required data is in numerical format so that the model and its metric functions can process it.
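The cleaning and encoding described in step 2 might look like this in plain Python. The raw rows, the rule of dropping incomplete rows, and the `clean` and `one_hot` helpers are assumptions for illustration; in practice `pandas.get_dummies` is a common way to do the one-hot encoding.

```python
# Hypothetical raw rows showing typical inconsistencies:
# mixed-case labels, stray whitespace, numbers stored as text, and a missing value.
raw_rows = [
    {"plan": "Prepaid ", "monthly_charge": "45.50", "churned": "yes"},
    {"plan": "contract", "monthly_charge": "80", "churned": "No"},
    {"plan": "PREPAID", "monthly_charge": None, "churned": "no"},
    {"plan": "Contract", "monthly_charge": "60.0", "churned": "YES"},
]


def clean(rows):
    """Drop incomplete rows and normalize labels and numeric types."""
    cleaned = []
    for row in rows:
        if any(value is None or str(value).strip() == "" for value in row.values()):
            continue  # dropping is one option; imputing a value is another
        cleaned.append({
            "plan": row["plan"].strip().lower(),
            "monthly_charge": float(row["monthly_charge"]),
            "churned": 1 if row["churned"].strip().lower() == "yes" else 0,
        })
    return cleaned


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


rows = one_hot(clean(raw_rows), "plan")
for row in rows:
    print(row)
```

After this step every value is numeric, which is what most learning algorithms expect as input.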

3. Identify Important Features That Affect Customers: At this step, it is important to identify which features influence a customer's decision to leave Sprint Telecom's services, for example by identifying relationships, correlations, and patterns between customer attributes and churn.
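One simple way to carry out step 3 is to rank candidate features by the absolute value of their correlation with the churn label. The features and values below are hypothetical. Correlation only captures linear, one-feature-at-a-time relationships, so this is a first screen rather than a full feature-importance analysis; tree-based models, for instance, report their own importances.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical features for eight customers, with churn encoded as 1 (left) / 0 (stayed).
churn = [1, 0, 1, 0, 0, 1, 0, 1]
features = {
    "complaints": [4, 0, 3, 1, 0, 5, 1, 2],
    "tenure_months": [3, 40, 6, 28, 35, 2, 22, 8],
    "monthly_charge": [70, 65, 55, 60, 72, 68, 50, 58],
}

# Rank features by the strength (absolute value) of their linear correlation with churn.
ranking = sorted(
    ((name, pearson(values, churn)) for name, values in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name:>15}: r = {r:+.2f}")
```

The sign matters for interpretation: in this toy data a negative coefficient for `tenure_months` would mean longer-standing customers are less likely to leave.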

4. Identify and Select a Model:
At this stage, it is important to identify which algorithm is suitable for prediction on the datasets obtained in steps 1 and 2 above. Commonly used machine learning algorithms include logistic regression, decision trees, random forests, gradient boosting, and neural networks.

5. Train and Evaluate the Model:
Train the model on the prepared dataset using the chosen algorithm. Evaluate the model's performance on held-out test data, using cross-validation to check the accuracy of the results produced.
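To make steps 4 and 5 concrete, here is a dependency-free sketch that trains logistic regression (one of the algorithms listed in step 4) by batch gradient descent and estimates its accuracy with k-fold cross-validation. The data, learning rate, epoch count, fold count and helper names are assumptions. In a real project, scikit-learn's `LogisticRegression` and `cross_val_score` would normally replace this hand-written code, and because churners are often a minority, metrics such as precision and recall deserve as much attention as accuracy.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic-regression weights (bias stored last) with batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = sigmoid(z) - yi  # gradient of log-loss with respect to z
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, xi):
    """Return 1 (predicted to churn) if the modelled probability is at least 0.5."""
    z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
    return 1 if sigmoid(z) >= 0.5 else 0


def k_fold_accuracy(X, y, k=4, seed=0):
    """Mean accuracy over k folds: train on k-1 folds, test on the held-out fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        w = train_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(w, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical features per customer: [complaints, tenure_years]; label 1 = churned.
X = [[4, 0.25], [3, 0.5], [5, 0.2], [2, 0.7], [4, 0.4], [3, 0.3],
     [0, 3.3], [1, 2.3], [0, 2.9], [1, 1.8], [0, 4.0], [1, 2.6]]
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

print("4-fold cross-validated accuracy:", round(k_fold_accuracy(X, y), 3))
```

Each fold's model never sees its own test customers during training, which is what makes the cross-validated figure a fairer estimate than accuracy on the training data.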

6. Interpretation, Visualization and Representation of Results:

This step ensures that the results obtained are readable, repeatable, and interpretable, and that they are visually presented to the consumer, in this case Sprint Telecom.
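For step 6, a confusion matrix with precision and recall is a common way to present a churn model's results to a non-technical audience. The sketch below assumes hypothetical test labels and predictions and prints a plain-text table; a plotting library such as matplotlib could render the same numbers as a chart.

```python
def confusion_summary(y_true, y_pred):
    """Confusion-matrix counts plus precision, recall and accuracy for a 0/1 churn label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical test-set labels (1 = churned) and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]
s = confusion_summary(y_true, y_pred)

# A plain-text report that a stakeholder can read without running any code.
print("                 predicted churn   predicted stay")
print(f"actual churn     {s['tp']:>15}   {s['fn']:>14}")
print(f"actual stay      {s['fp']:>15}   {s['tn']:>14}")
print(f"precision={s['precision']:.2f} recall={s['recall']:.2f} accuracy={s['accuracy']:.2f}")
```

Recall answers "what share of customers who actually left did the model catch?", which is often the question a retention team cares about most.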

7. Project Execution and Deployment:
The model is now ready for deployment. It should be run on real-time customer data, retrained continuously as new data arrives so that it keeps improving, and its results should be tracked over time.

8. Maintenance and Feedback: Continuous re-evaluation of the process against customer feedback is important for customer retention and for identifying why customers want to leave and how to address it.

Question 2:
Let’s say you’re a Product Data Scientist at Instagram. How would you measure the success of the Instagram TV product?