<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: CptWycliffe</title>
    <description>The latest articles on DEV Community by CptWycliffe (@cptwycliffe).</description>
    <link>https://dev.to/cptwycliffe</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1061567%2F1be22c87-a0c3-47d4-ad1f-91b10639f920.png</url>
      <title>DEV Community: CptWycliffe</title>
      <link>https://dev.to/cptwycliffe</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cptwycliffe"/>
    <language>en</language>
    <item>
      <title>Churn Prediction: Using Gradio</title>
      <dc:creator>CptWycliffe</dc:creator>
      <pubDate>Wed, 07 Feb 2024 16:25:31 +0000</pubDate>
      <link>https://dev.to/cptwycliffe/churn-prediction-using-gradio-4an9</link>
      <guid>https://dev.to/cptwycliffe/churn-prediction-using-gradio-4an9</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2Ff606bf96-ba1f-474d-9524-a2f6bd33fe6a" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2Ff606bf96-ba1f-474d-9524-a2f6bd33fe6a" alt="head image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  📢 ChurnPredict Pro: Customer Churn Prediction App with Gradio💥🔍
&lt;/h1&gt;

&lt;p&gt;In today's competitive business landscape, understanding and predicting customer behavior is paramount. One crucial aspect of this is forecasting customer churn, which can help businesses make data-driven decisions to enhance customer retention and satisfaction. ChurnPredict Pro is a cutting-edge web application that leverages the power of machine learning, specifically a Random Forest Classifier model, to provide real-time customer churn predictions. &lt;/p&gt;

&lt;p&gt;This article takes you on a journey through the development process of ChurnPredict Pro, highlighting the technology stack, features, and the exciting journey of integrating machine learning models for accurate churn predictions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;ChurnPredict Pro is an innovative web application designed to predict customer churn.  The application allows users to input customer data effortlessly and receive instant churn predictions, thereby enabling them to take proactive measures to retain valuable customers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Importance of Customer Churn Prediction
&lt;/h2&gt;

&lt;p&gt;Customer churn, the loss of customers, can have a significant impact on a company's bottom line. By predicting churn, businesses can take preventive actions, such as targeted marketing and improved customer service, to reduce customer attrition. This predictive power can lead to higher customer satisfaction, improved profitability, and a more sustainable business model.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technology Stack
&lt;/h2&gt;

&lt;p&gt;ChurnPredict Pro relies on a robust technology stack to deliver real-time customer churn predictions. Here are the key technologies used in the development of the application:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Gradio: A Python library for building interactive interfaces, which is the foundation of the user interface.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pandas: A widely used library for data manipulation and analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scikit-Learn: A machine learning library that simplifies the implementation of various machine learning algorithms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Joblib: Used for serialization and deserialization of machine learning models.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Development Process
&lt;/h2&gt;

&lt;h3&gt;
  1. Data Collection, Preprocessing and Model Training
&lt;/h3&gt;

&lt;p&gt;The development of ChurnPredict Pro began with data. Historical customer information from the previous project, "📢 Unlocking Insights: Decoding Telecommunication Customer Churn Through Machine Learning!💥🔍", was used as the basis for training a Random Forest Classifier model. This data included various customer attributes, such as gender, senior citizen status, contract details, and payment method, which are used to make predictions about churn. The Random Forest Classifier was chosen because it emerged as the best-performing model for churn prediction.&lt;/p&gt;

&lt;h3&gt;
  2. Preprocessor and Model Exports
&lt;/h3&gt;

&lt;p&gt;The preprocessing steps and the Random Forest Classifier model were exported from the notebook using Joblib. This ensured that the preprocessor and model were readily available for further preprocessing tasks and forecasting within the app.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2F83a0f8e1-d063-41e3-a592-39d77edeab45" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2F83a0f8e1-d063-41e3-a592-39d77edeab45" alt="prep 1"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2F70acd62b-d75f-469c-9923-131abc0c915b" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2F70acd62b-d75f-469c-9923-131abc0c915b" alt="prep 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  3. Building the User Interface
&lt;/h3&gt;

&lt;p&gt;The user interface serves as the front end of ChurnPredict Pro. Built with Gradio, it offers an interactive and intuitive experience. The design allows users to effortlessly input customer data, and with a single click, receive real-time churn predictions.&lt;/p&gt;

&lt;p&gt;Using Gradio Blocks, the app's structure is organized into two main elements:&lt;/p&gt;

&lt;p&gt;The main function, which preprocesses the input data and returns the churn prediction together with the customer information in a DataFrame.&lt;/p&gt;

&lt;p&gt;The output, which consists of two components that display the prediction and the customer information.&lt;/p&gt;

&lt;p&gt;The UI is composed of Rows and Columns for layout, and of Gradio components that receive the inputs from the user.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2Ff8f75691-3f04-423a-a447-7e6239daea63" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2Ff8f75691-3f04-423a-a447-7e6239daea63" alt="app header"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2Ff976061a-5f9f-4361-b2f1-ca0d3ab452d3" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2Ff976061a-5f9f-4361-b2f1-ca0d3ab452d3" alt="more cus info"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2Fa0d7ca84-2741-4cd8-9a43-ff605cfe41e7" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2Fa0d7ca84-2741-4cd8-9a43-ff605cfe41e7" alt="submit and pred"&gt;&lt;/a&gt;&lt;/p&gt;
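
&lt;p&gt;As a rough illustration of this layout, the sketch below arranges a few inputs with Gradio Blocks, Rows and Columns and wires a submit button to a prediction function. The component choices, the labels and the predict_stub function are assumptions made for illustration only; the app's real inputs and logic are the ones shown in the screenshots above.&lt;/p&gt;

```python
def predict_stub(gender, senior_citizen, contract, monthly_charges):
    """Hypothetical stand-in for the app's prediction function."""
    # Echo the inputs back as a one-row table (list of lists).
    customer_row = [[gender, senior_citizen, contract, monthly_charges]]
    # A fixed label keeps the sketch independent of any trained model.
    return "Prediction unavailable in this sketch", customer_row


def build_demo():
    """Assemble the Blocks layout; Gradio is imported only when this runs."""
    import gradio as gr

    with gr.Blocks() as demo:
        gr.Markdown("# ChurnPredict Pro (layout sketch)")
        with gr.Row():
            with gr.Column():
                gender = gr.Radio(choices=["Male", "Female"], label="Gender")
                senior = gr.Radio(choices=["Yes", "No"], label="SeniorCitizen")
            with gr.Column():
                contract = gr.Dropdown(
                    choices=["Month-to-month", "One year", "Two year"],
                    label="Contract",
                )
                charges = gr.Number(label="MonthlyCharges")
        submit = gr.Button("Submit")
        with gr.Row():
            prediction = gr.Label(label="Prediction")
            customer_info = gr.Dataframe(
                headers=["Gender", "SeniorCitizen", "Contract", "MonthlyCharges"],
                label="Customer Information",
            )
        # The button passes the input values to the function and routes
        # its two return values to the two output components.
        submit.click(
            fn=predict_stub,
            inputs=[gender, senior, contract, charges],
            outputs=[prediction, customer_info],
        )
    return demo
```

&lt;p&gt;Calling build_demo().launch() starts a local server for the interface.&lt;/p&gt;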

&lt;h3&gt;
  4. Building the Logic
&lt;/h3&gt;

&lt;p&gt;Upon submission of customer data, the submit button calls the churn_predict function and passes the customer data as input to the function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2Fe30c8755-c7a9-46b6-9334-8a75d3cfaf01" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2Fe30c8755-c7a9-46b6-9334-8a75d3cfaf01" alt="submit code"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2F6708068a-c7b9-436a-9e0c-a8683d669f61" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2F6708068a-c7b9-436a-9e0c-a8683d669f61" alt="func"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The churn_predict function then processes the data as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The preprocessor and model are loaded using joblib.load from the model files. These components are essential for making churn predictions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Customer data (received through *args) is converted into a DataFrame.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The categorical feature "SeniorCitizen" data is converted from "Yes"/"No" to "1"/"0" for machine learning compatibility.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The preprocessor is applied to transform the user's input.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Predictions are made using the Random Forest Classifier model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The DataFrame of the customer data and the predicted churn status is returned from the function.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
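
&lt;p&gt;The steps listed above can be sketched as follows. This is a minimal illustration, not the app's actual code: the column names, the StubPreprocessor and StubModel classes (which stand in for the objects the app loads with joblib.load) and the toy prediction rule are all assumptions.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical subset of the input columns, in the order the UI passes them.
COLUMNS = ["gender", "SeniorCitizen", "Contract", "MonthlyCharges"]


class StubPreprocessor:
    """Stand-in for the fitted preprocessor the app loads with joblib.load."""

    def transform(self, df):
        # A real preprocessor would encode and scale; this one only
        # keeps two numeric columns so the stub model can use them.
        return df[["SeniorCitizen", "MonthlyCharges"]].to_numpy(dtype=float)


class StubModel:
    """Stand-in for the trained Random Forest Classifier."""

    def predict(self, features):
        # Toy rule for illustration only: flag high monthly charges.
        return ["Yes" if row[1] >= 80 else "No" for row in features]


def churn_predict(*args, preprocessor=None, model=None):
    """Turn the UI values into a DataFrame, preprocess them and predict."""
    preprocessor = preprocessor or StubPreprocessor()
    model = model or StubModel()

    # One row of customer data, built from the positional inputs.
    customer = pd.DataFrame([args], columns=COLUMNS)

    # Convert the "Yes"/"No" answer into 1/0 before preprocessing.
    model_input = customer.copy()
    model_input["SeniorCitizen"] = model_input["SeniorCitizen"].map({"Yes": 1, "No": 0})

    features = preprocessor.transform(model_input)
    prediction = model.predict(features)[0]

    # Return the prediction and the customer data as entered.
    return prediction, customer


label, info = churn_predict("Female", "Yes", "Month-to-month", 95.5)
print(label)
print(info)
```

&lt;p&gt;Returning two values matches the two output components: the first feeds the Label and the second feeds the Dataframe.&lt;/p&gt;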

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2Fc1e6bc74-d700-433f-9a48-b3d0d067ba47" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2Fc1e6bc74-d700-433f-9a48-b3d0d067ba47" alt="code on output"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The output is displayed in a user-friendly format: the Gradio Dataframe component displays the customer information as received from the user, and the Gradio Label component displays the prediction. This makes it easy for businesses to understand the likelihood of churn and make informed decisions to retain customers.&lt;/p&gt;

&lt;h3&gt;
  5. Displaying Results
&lt;/h3&gt;

&lt;p&gt;A simple click on the "Submit" button delivers a real-time churn prediction.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2F037aae69-f99b-4a36-97f7-d99b9c535626" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fsnyamson%2FP4-ChurnPredict-Pro%2Fassets%2F58486437%2F037aae69-f99b-4a36-97f7-d99b9c535626" alt="pred"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ChurnPredict Pro simplifies churn prediction. With real-time predictions, businesses can take immediate actions to enhance customer retention. The application provides a clear prediction of whether a customer is likely to churn or stay, helping businesses plan their customer management strategies effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;ChurnPredict Pro is more than a churn prediction tool; it's a solution that empowers businesses to optimize customer management. The development process involved data collection, model training, and the creation of an interactive user interface. ChurnPredict Pro exemplifies the potential of machine learning in real-time decision-making. Businesses can now anticipate customer churn and take the necessary steps to enhance customer satisfaction and profitability.&lt;/p&gt;

&lt;p&gt;The journey of developing ChurnPredict Pro is a testament to the power of combining machine learning and user-friendly applications. With the ability to predict customer churn, businesses can stay ahead of the competition and deliver the best possible service to their customers.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Leveraging on ML : Customer Churn Analysis:</title>
      <dc:creator>CptWycliffe</dc:creator>
      <pubDate>Tue, 13 Jun 2023 20:38:31 +0000</pubDate>
      <link>https://dev.to/cptwycliffe/leveraging-on-ml-customer-churn-analysis-59kb</link>
      <guid>https://dev.to/cptwycliffe/leveraging-on-ml-customer-churn-analysis-59kb</guid>
      <description>&lt;h2&gt;
  
  
  Analyzing the Telco Dataset
&lt;/h2&gt;

&lt;p&gt;In a rapidly evolving business environment, companies are constantly seeking ways to optimize their operations and to enhance customer experience and retention. In a bid to increase profit and revenue margins, customer retention is a key factor on which any business or industry player must focus its resources. With customers as the key source of revenue, it is imperative to minimize customer attrition. &lt;/p&gt;

&lt;p&gt;One powerful approach is leveraging machine learning (ML) techniques to analyze large datasets and gain valuable insights into customer attrition factors and thereby formulate necessary intervention strategies.&lt;/p&gt;

&lt;p&gt;In this article, we present a comprehensive analysis of the Telco dataset using different ML models.&lt;/p&gt;

&lt;p&gt;Business Understanding &lt;br&gt;
The business problem is that the customer churn at Telco is impacting the company's revenue and profitability.&lt;/p&gt;

&lt;p&gt;Telco is a telecommunications company that provides phone and internet services to its customers. They have observed that their customer churn rate has been increasing over the past year, and they want to understand the reasons behind it and predict which customers are most likely to churn in the future. &lt;/p&gt;

&lt;p&gt;In this project, we will analyse customer data from Telco in order to determine the key indicators of churn and formulate retention strategies that can be implemented to avert the problem. &lt;/p&gt;

&lt;p&gt;Scope: The project will focus on analyzing the customer data sources, including customer type, services used and billing information. The analysis will cover only a portion of the customers, and churn is defined as a customer cancelling their subscription.&lt;/p&gt;

&lt;p&gt;Data Understanding &lt;br&gt;
The data provided for this project is in CSV format. The following describes the columns present in the data.&lt;/p&gt;

&lt;p&gt;Gender -- Whether the customer is a male or a female&lt;br&gt;
SeniorCitizen -- Whether a customer is a senior citizen or not&lt;br&gt;
Partner -- Whether the customer has a partner or not (Yes, No)&lt;br&gt;
Dependents -- Whether the customer has dependents or not (Yes, No)&lt;br&gt;
Tenure -- Number of months the customer has stayed with the company&lt;br&gt;
PhoneService -- Whether the customer has a phone service or not (Yes, No)&lt;br&gt;
MultipleLines -- Whether the customer has multiple lines or not&lt;br&gt;
InternetService -- Customer's internet service provider (DSL, Fiber Optic, No)&lt;br&gt;
OnlineSecurity -- Whether the customer has online security or not (Yes, No, No internet service)&lt;br&gt;
OnlineBackup -- Whether the customer has online backup or not (Yes, No, No internet service)&lt;br&gt;
DeviceProtection -- Whether the customer has device protection or not (Yes, No, No internet service)&lt;br&gt;
TechSupport -- Whether the customer has tech support or not (Yes, No, No internet service)&lt;br&gt;
StreamingTV -- Whether the customer has streaming TV or not (Yes, No, No internet service)&lt;br&gt;
StreamingMovies -- Whether the customer has streaming movies or not (Yes, No, No internet service)&lt;br&gt;
Contract -- The contract term of the customer (Month-to-Month, One year, Two year)&lt;br&gt;
PaperlessBilling -- Whether the customer has paperless billing or not (Yes, No)&lt;br&gt;
PaymentMethod -- The customer's payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))&lt;br&gt;
MonthlyCharges -- The amount charged to the customer monthly&lt;br&gt;
TotalCharges -- The total amount charged to the customer&lt;br&gt;
Churn -- Whether the customer churned or not (Yes or No)&lt;/p&gt;

&lt;p&gt;We’ll therefore analyze the relationships between customer churn and each of the independent variables (gender, senior citizen status, partner, dependents, tenure, phone service, multiple lines, internet service, online security, online backup, device protection, tech support, streaming TV, streaming movies, contract, paperless billing, payment method, monthly charges, total charges).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hypothesis&lt;/strong&gt; &lt;br&gt;
H0: Null hypothesis - There is no significant relationship between customer churn and any of the independent variables. &lt;/p&gt;

&lt;p&gt;H1: Alternative hypothesis - There is a significant relationship between customer churn and at least two of the independent variables. &lt;/p&gt;
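
&lt;p&gt;A common way to test a hypothesis of this kind for a single categorical variable is the chi-square test of independence, which compares the observed counts with the counts expected if the variable and churn were unrelated. The sketch below computes the statistic by hand on a small made-up table; the data and the choice of test are assumptions for illustration and are not taken from the original analysis.&lt;/p&gt;

```python
import pandas as pd

# Made-up sample: 10 month-to-month and 10 two-year customers.
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 10 + ["Two year"] * 10,
    "Churn": ["Yes"] * 6 + ["No"] * 4 + ["Yes"] * 2 + ["No"] * 8,
})

# Observed counts for every Contract/Churn combination.
observed = pd.crosstab(toy["Contract"], toy["Churn"])

# Expected counts if Contract and Churn were independent:
# (row total * column total) / grand total.
row_totals = observed.sum(axis=1).to_numpy()
col_totals = observed.sum(axis=0).to_numpy()
grand_total = observed.to_numpy().sum()
expected = row_totals[:, None] * col_totals[None, :] / grand_total

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = ((observed.to_numpy() - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(observed)
print(round(chi_square, 3), dof)
```

&lt;p&gt;The statistic is then compared with the critical value for the chosen significance level (3.841 for one degree of freedom at the 5% level); a larger statistic is evidence against independence.&lt;/p&gt;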

&lt;p&gt;Research Questions &lt;br&gt;
To understand the trends in the Telco customer data, we will investigate the following questions: &lt;br&gt;
1. Is there a correlation between contract length and customer churn?&lt;br&gt;
2. Do customers who have online security and backup services have lower churn rates?&lt;br&gt;
3. Does the payment method have an impact on customer churn?&lt;br&gt;
4. Is there a difference in churn rates between male and female customers?&lt;br&gt;
5. Are customers with dependents less likely to churn compared to those without dependents?&lt;br&gt;
6. Is there a correlation between the total charges and customer churn?&lt;br&gt;
7. Are the customers who get the TechSupport service less likely to churn?&lt;br&gt;
8. Is there a relationship between device protection and customer churn?&lt;br&gt;
These questions will help us gain insight into the dataset, understand the distributions of variables, identify correlations, and discover any patterns or anomalies that may impact customer churn.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Findings&lt;/strong&gt; &lt;br&gt;
&lt;strong&gt;&lt;em&gt;Plotting the categorical feature&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bYA_J-sx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mncgu5pe14mf8f8xrqck.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bYA_J-sx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mncgu5pe14mf8f8xrqck.png" alt="Image description" width="800" height="629"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Plotting the KDE for the numerical feature&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EXYJ-7Kf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fckr88w2exk0lazmrl0q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EXYJ-7Kf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fckr88w2exk0lazmrl0q.png" alt="Image description" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--alAQPQ0U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fbtp62lz9wycvqdoinqz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--alAQPQ0U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fbtp62lz9wycvqdoinqz.png" alt="Image description" width="800" height="367"&gt;&lt;/a&gt;&lt;br&gt;
From the exploratory data analysis (EDA), we make the following findings: &lt;br&gt;
(i) The bar plot of Contract shows that more of the customers who churned had the Month-to-Month subscription. We can therefore conclude that there is a negative correlation between contract length and customer churn, i.e. the shorter the contract length, the more likely the customer is to churn.&lt;/p&gt;

&lt;p&gt;(ii) The bar plots of OnlineSecurity and OnlineBackup show lower churn rates for customers with these services; more of the customers who churned did not have them. &lt;br&gt;
(iii) While there is a 0.48 probability that a customer without both OnlineSecurity and OnlineBackup will churn, the probability of a customer with both services churning is just 0.1.&lt;br&gt;
(iv) The bar plot of PaymentMethod shows that the payment method does have an impact on customer attrition. In particular, customers using electronic check as their payment method have a significantly higher churn rate than customers using other payment methods.&lt;br&gt;
(v) The gender of the customer does not affect the probability of customer attrition. &lt;br&gt;
(vi) Of the customers who churned, about 66 percent had no dependents while about 33 percent had dependents. We can therefore conclude that customers with dependents are less likely to churn.&lt;br&gt;
(vii) The customers who did not have TechSupport had the highest attrition.&lt;br&gt;
(viii) A large proportion of the customers who churned did not have device protection.&lt;/p&gt;
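
&lt;p&gt;Findings such as (iii) are conditional churn rates: the share of churned customers within each group. A minimal sketch of that calculation with pandas is shown below; the six-row sample is made up, so its rates do not match the figures reported above.&lt;/p&gt;

```python
import pandas as pd

# Made-up sample with two of the service columns and the target.
sample = pd.DataFrame({
    "OnlineSecurity": ["No", "No", "No", "Yes", "Yes", "Yes"],
    "OnlineBackup":   ["No", "No", "No", "Yes", "Yes", "Yes"],
    "Churn":          ["Yes", "Yes", "No", "No", "No", "Yes"],
})

# Churn rate per group = mean of a 1/0 churn flag within the group.
sample["churned"] = (sample["Churn"] == "Yes").astype(int)
churn_rate = sample.groupby(["OnlineSecurity", "OnlineBackup"])["churned"].mean()

print(churn_rate)
```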

&lt;p&gt;Data Pre-processing &lt;br&gt;
Since our target variable is “Churn”, we will use the pandas method value_counts() to determine the number of rows in each target class. &lt;/p&gt;

&lt;p&gt;From the class counts, we see that the data is not balanced. &lt;br&gt;
Step 1. &lt;br&gt;
We will balance the data using the oversampling technique.&lt;/p&gt;
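
&lt;p&gt;The class check and the balancing step can be sketched with pandas alone. The sketch uses simple random oversampling (re-sampling minority rows with replacement) on made-up data; the oversampling method used in the original notebook is not shown in this article, so treat the technique and the values as assumptions.&lt;/p&gt;

```python
import pandas as pd

# Made-up, imbalanced target: 6 "No" rows and 2 "Yes" rows.
data = pd.DataFrame({
    "tenure": [1, 5, 12, 24, 36, 48, 2, 3],
    "Churn": ["No"] * 6 + ["Yes"] * 2,
})

# Count the rows in each target class.
class_counts = data["Churn"].value_counts()
print(class_counts)

# Random oversampling: draw minority rows with replacement until
# both classes have as many rows as the majority class.
majority_size = int(class_counts.max())
minority_label = class_counts.idxmin()
minority_rows = data[data["Churn"] == minority_label]
extra_rows = minority_rows.sample(
    n=majority_size - len(minority_rows), replace=True, random_state=42
)
balanced = pd.concat([data, extra_rows], ignore_index=True)

print(balanced["Churn"].value_counts())
```

&lt;p&gt;Because oversampling duplicates rows, copies of the same customer can land in both the train and the test set when the split comes afterwards; a common precaution is to oversample only the training portion.&lt;/p&gt;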

&lt;p&gt;Step 2. We then split the dataset into train and test sets.&lt;/p&gt;

&lt;p&gt;Step 3. Since our categorical columns each have a maximum of three unique values, we will use the OrdinalEncoder for categorical feature encoding. This technique assigns each unique value in a column an integer code starting from 0 (for three values: 0, 1 and 2). &lt;/p&gt;
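
&lt;p&gt;A short sketch of this encoding step, assuming scikit-learn is installed and using two made-up categorical columns. OrdinalEncoder numbers the categories of each column from 0, in sorted order.&lt;/p&gt;

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Made-up categorical columns for illustration.
categorical = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "One year"],
    "Partner": ["Yes", "No", "No", "Yes"],
})

encoder = OrdinalEncoder()
# fit_transform learns the categories of each column and returns
# an array holding one numeric code per cell.
encoded = encoder.fit_transform(categorical)

print(encoder.categories_)
print(encoded)
```

&lt;p&gt;In a train/test workflow, the encoder is usually fitted on the training set and then applied to the test set with transform, so that both sets share the same codes.&lt;/p&gt;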

&lt;p&gt;Step 4. Scale the numerical columns using StandardScaler.&lt;/p&gt;

&lt;p&gt;Model Building and Evaluation &lt;br&gt;
We will then train and evaluate different models, which include: &lt;br&gt;
1) Logistic Regression &lt;br&gt;
2) Decision Tree Classifier&lt;br&gt;
3) Random Forest Classifier&lt;br&gt;
4) CatBoost Classifier &lt;br&gt;
5) Gradient Boosting Classifier&lt;br&gt;
6) Naive Bayes&lt;br&gt;
7) K-Nearest Neighbours (KNN)&lt;br&gt;
8) Multi-Layer Perceptron &lt;br&gt;
9) Stochastic Gradient Descent (SGD)&lt;br&gt;
10) AdaBoost&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Results and Analysis&lt;/strong&gt; &lt;br&gt;
From the models trained and evaluated, the best two models are SGD and AdaBoost, with the following metrics:&lt;br&gt;
AdaBoost &lt;br&gt;
Accuracy : 0.80&lt;br&gt;
F1_Score : 0.78&lt;br&gt;
Precision : 0.78&lt;br&gt;
Recall : 0.80&lt;/p&gt;

&lt;p&gt;SGD &lt;br&gt;
Accuracy : 0.96&lt;br&gt;
F1_Score : 0.78&lt;br&gt;
Precision : 0.78&lt;br&gt;
Recall : 0.80&lt;/p&gt;
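
&lt;p&gt;For reference, these four metrics are all derived from the confusion-matrix counts of a binary classifier. The sketch below shows the arithmetic on made-up counts; the numbers are illustrative assumptions and are unrelated to the results reported in this article.&lt;/p&gt;

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted churners who churned
    recall = tp / (tp + fn)      # share of actual churners who were flagged
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts: 40 true positives, 10 false positives,
# 10 false negatives and 40 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```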

&lt;p&gt;&lt;strong&gt;Recommendation&lt;/strong&gt; &lt;br&gt;
a) The higher churn rate among customers who pay by electronic check can inform strategies to reduce churn, such as promoting alternative payment methods or providing incentives for customers to switch from electronic check to more stable payment methods.&lt;br&gt;
b) It is imperative for the company to encourage more customers to use TechSupport services in order to reduce the probability of customer turnover.&lt;br&gt;
c) It is necessary for the company to formulate incentive strategies to encourage customers to use the device protection service. &lt;br&gt;
d) The company can continuously use ML algorithms to find the probability of a customer dropping the service, and thereby employ the necessary measures to ensure customer retention. &lt;/p&gt;

</description>
    </item>
    <item>
      <title>Exploring the Indian Startup Funding Ecosystem for the years 2018-to-2021</title>
      <dc:creator>CptWycliffe</dc:creator>
      <pubDate>Sun, 09 Apr 2023 19:00:47 +0000</pubDate>
      <link>https://dev.to/cptwycliffe/exploring-the-indian-startup-funding-ecosystem-for-the-years-2018-to-2021-50og</link>
      <guid>https://dev.to/cptwycliffe/exploring-the-indian-startup-funding-ecosystem-for-the-years-2018-to-2021-50og</guid>
      <description>&lt;h1&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;It is reported that one in every five startup businesses fails within the first year, and the situation in India is no different. A recent study of the Indian startup economy indicated that whereas India has 105 unicorn startups, surprisingly only 23 of them are profitable. &lt;br&gt;
This shocking trend is compounded by the fact that only 10% of Indian startups live to see their 5th anniversary, thereby sinking billions in funding from venture capitalists and investment firms around the world.&lt;/p&gt;

&lt;p&gt;It's therefore imperative that venture capitalists make a number of considerations before committing resources to any startup venture. &lt;/p&gt;

&lt;p&gt;This project aims to explore and gain insight into the Indian startup funding ecosystem through an in-depth data analysis. We will try to understand the venture capital funding trends observed in the years 2018 to 2021. Some of the factors we will consider are: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Location/headquarters of the Business startup&lt;/li&gt;
&lt;li&gt;The Sector/Industry of the business venture&lt;/li&gt;
&lt;li&gt;The Startup year&lt;/li&gt;
&lt;li&gt;The stage the business is at when seeking funding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The null hypothesis in this project suggests that Indian start-ups in the technology industry are likely to receive funding. The alternative hypothesis argues that no factor will affect the probability or amount of funding received by an Indian start-up.&lt;/p&gt;

&lt;p&gt;To understand these factors, we will investigate the following questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Which five startup sectors are investors' favorites?&lt;/li&gt;
&lt;li&gt;Can the success of obtaining finance from investors be impacted by location?&lt;/li&gt;
&lt;li&gt;Which funding stages receive more investment from investors?&lt;/li&gt;
&lt;li&gt;Which sectors have the maximum amount of funding?&lt;/li&gt;
&lt;li&gt;What is the total amount of funding each year?&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;&lt;em&gt;Data Handling&lt;/em&gt;&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Using Python scripts, we will explore four datasets with information about Indian startup funding for the years 2018, 2019, 2020 and 2021. &lt;br&gt;
We will go through the following processes in a bid to test our hypothesis: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load our data&lt;/li&gt;
&lt;li&gt;Process each dataset&lt;/li&gt;
&lt;li&gt;Merge the datasets &lt;/li&gt;
&lt;li&gt;Evaluate the data using univariate and multivariate analysis &lt;/li&gt;
&lt;li&gt;Visualize our findings &lt;/li&gt;
&lt;li&gt;Draw a conclusion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;First, we will import the libraries that will enable us to analyze the data. These libraries include: &lt;br&gt;
pandas and NumPy: for data manipulation&lt;br&gt;
seaborn and matplotlib: for visualization&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Data handling
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Other packages
import os  # operating-system interfaces
import re  # regular expressions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Data Loading
&lt;/h2&gt;

&lt;p&gt;Using pandas.read_csv, we load all four .csv datasets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# For CSV, use pandas.read_csv to import the data files
data_2018 = pd.read_csv('startup_funding2018.csv')
data_2019 = pd.read_csv('startup_funding2019.csv')
data_2020 = pd.read_csv('startup_funding2020.csv')
data_2021 = pd.read_csv('startup_funding2021.csv')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Data Processing
&lt;/h1&gt;

&lt;p&gt;In this section, we will explore each dataset separately. We will undertake three main tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand the data contained in them&lt;/li&gt;
&lt;li&gt;Identify the issues with the data&lt;/li&gt;
&lt;li&gt;Determine how to handle each of the issues identified&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To do this we will use the following code:&lt;br&gt;
data.columns - to see what columns are contained in the dataset &lt;br&gt;
data.head() - to preview the data contained in the dataset&lt;br&gt;
data.info() - to get a summary of data in each column&lt;br&gt;
pandas.isnull() - to check for the null values in each column &lt;/p&gt;
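
&lt;p&gt;The sketch below applies these inspection calls to a tiny made-up DataFrame, so that the output of each call is easy to follow. The column names and values are assumptions and not the real dataset.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for one of the funding datasets.
data = pd.DataFrame({
    "Company Name": ["Alpha", "Beta", "Gamma"],
    "Industry": ["FinTech, Payments", "EdTech", None],
    "Amount": ["250000", "-", np.nan],
})

print(data.columns.tolist())   # column names
print(data.head())             # first rows of the dataset
data.info()                    # dtypes and non-null counts per column

# Count the null values in each column.
null_counts = data.isnull().sum()
print(null_counts)
```

&lt;p&gt;Note that a placeholder such as "-" is an ordinary string, so it is not counted as null until it has been replaced with NaN.&lt;/p&gt;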
&lt;h2&gt;
  
  
  Processing 2018 Dataset
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_a0deX5m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p0q6kd84xmiimrikqke4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_a0deX5m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p0q6kd84xmiimrikqke4.jpg" alt="Image description" width="800" height="749"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;From the above, we notice the following:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
- The Amount column contains different currencies as well as cells with "-".&lt;br&gt;
- The Industry column describes each company using more than one phrase.&lt;br&gt;
- The Location column describes each company using more than one physical address.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To clean these:&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;We assume that Amounts without a currency sign are in dollars, convert the values in INR to USD, and replace "-" with NaN.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Split the Location value and keep the first part as the physical address/headquarters.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Pick only one phrase to describe the industry.&lt;/em&gt;&lt;/p&gt;
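&lt;p&gt;The cleaning rules above could be sketched roughly as follows. This is an illustration under stated assumptions, not the exact code used in the project: the helper name amount_to_usd, the rupee sign prefix, the sample rows and the INR_PER_USD rate are all hypothetical.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical exchange rate (rupees per US dollar); substitute the rate you choose.
INR_PER_USD = 80.0


def amount_to_usd(raw):
    """Convert one raw amount cell to a float in USD, or NaN if it is not numeric."""
    if pd.isna(raw):
        return np.nan
    text = str(raw).strip().replace(",", "")  # drop thousands separators
    rate = 1.0  # amounts without a currency sign are assumed to be USD
    if text.startswith("₹"):
        text, rate = text[1:], INR_PER_USD
    elif text.startswith("$"):
        text = text[1:]
    try:
        return float(text) / rate
    except ValueError:
        return np.nan  # covers "-" and any other non-numeric cell


# Made-up rows that mimic the issues described above.
sample = pd.DataFrame({
    "Amount": ["₹40,000,000", "$1,500,000", "250000", "-"],
    "Industry": ["FinTech, Payments", "EdTech, E-Learning", "AgriTech", "Health Care"],
    "Location": ["Bangalore, Karnataka, India", "Mumbai, Maharashtra, India",
                 "Pune, Maharashtra, India", "Delhi, Delhi, India"],
})

sample["Amount"] = sample["Amount"].apply(amount_to_usd)
# Keep only the first comma-separated phrase as the main value.
sample["Industry"] = sample["Industry"].str.split(",").str[0].str.strip()
sample["Location"] = sample["Location"].str.split(",").str[0].str.strip()
print(sample)
```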
&lt;h2&gt;
  
  
  Processing 2019 - 2021 Datasets
&lt;/h2&gt;

&lt;p&gt;By repeating the process with the other datasets, we make the following observations.&lt;/p&gt;
&lt;h3&gt;
  
  
  Observations from previewing the datasets
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The 2018 DataFrame&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The columns in 2018 are different from those of 2019 - 2021, meaning they have to be renamed for concatenation.&lt;/li&gt;
&lt;li&gt;The amounts in the 2018 DataFrame are a mix of Indian Rupees (INR) and US Dollars (USD), meaning they have to be converted into same currency.&lt;/li&gt;
&lt;li&gt;The industry and location columns contain multiple values per cell. We need to decide whether to select the first value before the separator (,) as the main value.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The 2019 DataFrame&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The datatype of the "Founded" column is set to float64. It should be set to a string for uniformity.&lt;/li&gt;
&lt;li&gt;The headquarter column contains multiple values per cell. We need to decide whether to select the first value before the separator (,) as the main value.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The 2020 DataFrame&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There is an extra column called "Unnamed:9", giving it a total of 10 columns. It should be dropped to ensure complete alignment with the other DataFrames for ease of concatenation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The 2021 DataFrame&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The datatype of the "Founded" column is set to float64. It should be set to a string for uniformity.&lt;/li&gt;
&lt;li&gt;Some cells have null values; the "Amount" column has 3 null cells.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;General Observations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The currency signs and comma separators have to be removed from the amount column of each DataFrame to allow numerical manipulation and analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The 2022 average INR/USD rate will be used to convert the Indian Rupee values to US Dollars in the 2018 DataFrame.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;First values of industry and location in the 2018 data will be selected as the primary sector and headquarters respectively.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Amounts without currency symbols are assumed to be in USD ($)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Financial analysis will be narrowed to transactions whose amounts are available in the loaded data&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
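&lt;p&gt;A rough sketch of how the yearly DataFrames might be aligned before merging is shown below. The column names, the "Unnamed: 9" spelling and the toy rows are assumptions for illustration; adjust them to match the actual files.&lt;/p&gt;

```python
import pandas as pd

# Toy frames standing in for two of the yearly files (names and values are made up).
df_2018 = pd.DataFrame({
    "Company Name": ["Alpha"],
    "Industry": ["FinTech"],
    "Location": ["Mumbai"],
    "Amount": [500000.0],
})
df_2020 = pd.DataFrame({
    "Company/Brand": ["Beta"],
    "Founded": [2015.0],
    "Sector": ["EdTech"],
    "HeadQuarter": ["Bangalore"],
    "Amount($)": [1200000.0],
    "Unnamed: 9": [None],
})

# Rename the 2018 columns so they match the later years.
df_2018 = df_2018.rename(columns={
    "Company Name": "Company/Brand",
    "Industry": "Sector",
    "Location": "HeadQuarter",
    "Amount": "Amount($)",
})

# Drop the stray extra column; errors="ignore" keeps this safe if it is absent.
df_2020 = df_2020.drop(columns=["Unnamed: 9"], errors="ignore")

# Store the founding year as a string: 2015.0 becomes "2015", missing values stay missing.
df_2020["Founded"] = df_2020["Founded"].astype("Int64").astype("string")

print(df_2018.columns.tolist())
print(df_2020.dtypes)
```

&lt;p&gt;Going through the nullable Int64 type first avoids ending up with strings such as "2015.0".&lt;/p&gt;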
&lt;h2&gt;
  
  
  Merging the Datasets
&lt;/h2&gt;

&lt;p&gt;Once the data is cleaned, we combine the four DataFrames into one using pandas.concat(), as demonstrated below. We can then view a summary of the combined dataset with data.info().&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Joining all the four files using concatenate
data = pd.concat([data_2018,data_2019,data_2020,data_2021])

#Preview a summary of the data in the new combined file 
data.info()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
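&lt;p&gt;One optional variation, sketched here with made-up frames, is to tag each DataFrame with its year before concatenating and to pass ignore_index=True so the combined rows get a fresh index. The "Year" column and the toy values are assumptions added for illustration; they are not part of the original files.&lt;/p&gt;

```python
import pandas as pd

# Toy yearly frames keyed by funding year (values are made up).
frames = {
    2018: pd.DataFrame({"Company/Brand": ["Alpha"], "Amount($)": [500000.0]}),
    2019: pd.DataFrame({"Company/Brand": ["Beta", "Gamma"], "Amount($)": [1200000.0, None]}),
}

# Add a Year column to each frame so the source year survives the merge.
tagged = [df.assign(Year=year) for year, df in frames.items()]

# ignore_index=True replaces the repeated per-file indices with 0..n-1.
combined = pd.concat(tagged, ignore_index=True)
combined.info()
```

&lt;p&gt;Keeping the year as a column makes per-year aggregation straightforward later on.&lt;/p&gt;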



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ojCJ9Twk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/or4txy8rpzkyb6jidss3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ojCJ9Twk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/or4txy8rpzkyb6jidss3.jpg" alt="Image description" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can view more &lt;a href="https://azubiafrica-my.sharepoint.com/:u:/g/personal/wycliffe_omondi_azubiafrica_org/EcYf9kq8OylHuBxXU2tTjLcBqADv6qLZjASA3NV0H9drOQ?e=vk0xZH"&gt;Visualization of the findings here.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By conducting multivariate analysis on the combined dataset, we are able to answer the research questions and put our findings on the visualization as shown. &lt;/p&gt;

&lt;p&gt;Which five startup sectors are investors' favorites?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cTUK0NeZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v6vpxtd7czaqe79aei0k.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cTUK0NeZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v6vpxtd7czaqe79aei0k.jpg" alt="Image description" width="800" height="539"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Can the success of obtaining finance from investors be impacted by location?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rt0Q1teJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/drtsyhw6kua37g7kaau9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rt0Q1teJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/drtsyhw6kua37g7kaau9.jpg" alt="Image description" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Which startup stages receive the most investment from investors?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HpqdNmRe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4tr4tk3oxodj9ielyq7c.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HpqdNmRe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4tr4tk3oxodj9ielyq7c.jpg" alt="Image description" width="800" height="637"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Which sectors have the maximum amount of funding?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CV2hyA5F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ovjma7t95j3iqw7xiedp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CV2hyA5F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ovjma7t95j3iqw7xiedp.jpg" alt="Image description" width="800" height="704"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What is the total amount of funds each year?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0TBdxpNM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s475sdv4u54srknrw59w.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0TBdxpNM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s475sdv4u54srknrw59w.jpg" alt="Image description" width="800" height="608"&gt;&lt;/a&gt;&lt;/p&gt;
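&lt;p&gt;The questions above mostly reduce to group-and-aggregate operations on the combined DataFrame. The snippet below is a minimal sketch on made-up data; the "Sector", "Year" and "Amount($)" column names are assumptions, and the numbers do not come from the real dataset.&lt;/p&gt;

```python
import pandas as pd

# Made-up combined data; real values would come from the merged DataFrame.
combined = pd.DataFrame({
    "Sector": ["FinTech", "EdTech", "FinTech", "AgriTech", "EdTech", "FinTech"],
    "Year": [2018, 2018, 2019, 2020, 2021, 2021],
    "Amount($)": [1.0e6, 2.0e6, None, 0.5e6, 3.0e6, 1.5e6],
})

# Narrow the analysis to transactions whose amounts are available.
funded = combined.dropna(subset=["Amount($)"])

# Number of funded deals per sector (top five).
deals_per_sector = funded["Sector"].value_counts().head(5)

# Total funding per sector, largest first.
funding_per_sector = (
    funded.groupby("Sector")["Amount($)"].sum().sort_values(ascending=False)
)

# Total funding per year.
funding_per_year = funded.groupby("Year")["Amount($)"].sum()

print(deals_per_sector)
print(funding_per_sector)
print(funding_per_year)
```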

&lt;p&gt;Exploring Indian startup funding provides insight into a vibrant and growing venture capital ecosystem. From this analysis, we gain a better understanding of the characteristics associated with Indian startups.&lt;br&gt;
We draw the following observations:&lt;br&gt;
&lt;em&gt;The sectors most favored by investors are FinTech and EdTech.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;The highest amounts received by startups are also in the FinTech and EdTech sectors.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;The startup with the highest amount of funding is also in the EdTech industry.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We can therefore affirm the null hypothesis; &lt;strong&gt;&lt;em&gt;Indian Startups in the technology industry are likely to receive funding.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're interested in exploring this project further, please check out my &lt;a href="https://github.com/CptWycliffe/IndianStart_up_ecosystem"&gt;GitHub&lt;/a&gt; repository for more information, suggestions and input.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
