<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mohammad Sakib Mahmood</title>
    <description>The latest articles on DEV Community by Mohammad Sakib Mahmood (@sakibb019).</description>
    <link>https://dev.to/sakibb019</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F626354%2Ff4e3cfda-1e94-4993-865b-325ec096958b.jpeg</url>
      <title>DEV Community: Mohammad Sakib Mahmood</title>
      <link>https://dev.to/sakibb019</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sakibb019"/>
    <language>en</language>
    <item>
      <title>Employee turnover analysis built using Python’s Scikit-Learn library.</title>
      <dc:creator>Mohammad Sakib Mahmood</dc:creator>
      <pubDate>Mon, 14 Jun 2021 15:19:40 +0000</pubDate>
      <link>https://dev.to/sakibb019/employee-turnover-analysis-built-using-python-s-scikit-learn-library-2nmn</link>
      <guid>https://dev.to/sakibb019/employee-turnover-analysis-built-using-python-s-scikit-learn-library-2nmn</guid>
      <description>&lt;p&gt;In this article features, I showed the implementation of an employee turnover analysis that is built using Python’s Scikit-Learn library. In this article, I will use Logistic Regression and Random Forest Machine Learning algorithms. &lt;/p&gt;

&lt;p&gt;At the end of this article, you would be able to choose the best algorithm for your future projects like Employee Turnover Prediction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Definition Employee Turnover?
&lt;/h3&gt;

&lt;p&gt;Employee Turnover or Employee Turnover ratio is the measurement of the total number of employees who leave an organization in a particular year. Employee Turnover Prediction understands that to predict whether an employee is going to leave the organization in the coming period.&lt;/p&gt;

&lt;p&gt;A Company uses this predictive analysis to measure how many employees they will need if the potential employees will leave their organization. A company also uses this predictive analysis to make the workstation better for the employees by understanding the major reasons for the high turnover ratio.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Preprocessing system
&lt;/h3&gt;

&lt;p&gt;Now let’s move into the data to learn further with this project on Employee Turnover Prediction. you can download the dataset, that I have used in this article from here. &lt;a href="https://github.com/sakibb019/Employee-turnover-analysis-built-using-Python-s-Scikit-Learn-library"&gt;Link&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
hr = pd.read_csv('HR_comma_sep.csv')
col_names = hr.columns.tolist()
print("Column names:")
print(col_names)
print("\nSample data:")
hr.head()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---i9YuPjj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7nf46cimnmoq9jetsk0l.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---i9YuPjj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7nf46cimnmoq9jetsk0l.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Rename column name from “sales” to “department”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hr=hr.rename(columns = {'sales':'department'})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our data is pretty clean, with no missing values, so let’s move further and see how many employees work in the organization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hr.shape
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The “left” column is the outcome variable recording one and 0. 1 for employees who left the company and 0 for those who didn't. The department column of the dataset has many categories, and we need to reduce the categories for better modeling. Let’s see all the categories of the department column:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hr['department'].unique()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Y0OyTIY---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wxtnfa6w2do78e7m118m.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y0OyTIY---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wxtnfa6w2do78e7m118m.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
Let’s add all the “technical”, “support” and “IT” columns into one column to make our analysis easier.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
hr['department']=np.where(hr['department'] =='support', 'technical', hr['department'])
hr['department']=np.where(hr['department'] =='IT', 'technical', hr['department'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Creating Variables for Categorical Variables
&lt;/h3&gt;

&lt;p&gt;As there are two categorical variables (department, salary) in the dataset and they need to be converted to dummy variables before they can be used for modeling.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat_vars=['department','salary']
for var in cat_vars:
    cat_list='var'+'_'+var
    cat_list = pd.get_dummies(hr[var], prefix=var)
    hr1=hr.join(cat_list)
    hr=hr1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the actual variables need to be removed after the dummy variable has been created. Column names after creating dummy variables for categorical variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hr.drop(hr.columns[[8, 9]], axis=1, inplace=True)
hr.columns.values
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YpWKS0hJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zmp1seog8emtjajtxhdx.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YpWKS0hJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zmp1seog8emtjajtxhdx.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
The outcome variable is “left”, and all the other variables are predictors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hr_vars=hr.columns.values.tolist()
y=['left']
X=[i for i in hr_vars if i not in y]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Feature Selection for Employee Turnover Prediction
&lt;/h3&gt;

&lt;p&gt;Let’s use the feature selection method to decide which variables are the best option that can predict employee turnover with great accuracy. There are a total of 18 columns in X, and now let’s see how we can select about 10 from them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
rfe = RFE(model, 10)
rfe = rfe.fit(hr[X], hr[y])
print(rfe.support_)
print(rfe.ranking_)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zaY117_k--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/k059xwa20x1xmcbb20pe.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zaY117_k--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/k059xwa20x1xmcbb20pe.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
You can see that or feature selection chose the 10 variables for us, which are marked True in the support_ array and marked with a choice “1” in the ranking array. Now let's have a look at these columns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cols=['satisfaction_level', 'last_evaluation', 'time_spend_company', 'Work_accident', 'promotion_last_5years', 
      'department_RandD', 'department_hr', 'department_management', 'salary_high', 'salary_low'] 
X=hr[cols]
y=hr['left']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Logistic Regression Model to Predict Employee Turnover
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bn3YO7NS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/b1p7tve6svblnsg6g1k5.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bn3YO7NS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/b1p7tve6svblnsg6g1k5.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
Let’s check the accuracy of our logistic regression model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.metrics import accuracy_score
print('Logistic regression accuracy: {:.3f}'.format(accuracy_score(y_test, logreg.predict(X_test))))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pXooW0nP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jhaut21x62sk2ejst5fk.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pXooW0nP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jhaut21x62sk2ejst5fk.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Random Forest Classification Model
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oUpHKvie--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mghd6vtl1ijdtp3zz56a.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oUpHKvie--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mghd6vtl1ijdtp3zz56a.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
Now let’s check the accuracy of our Random Forest Classification Model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print('Random Forest Accuracy: {:.3f}'.format(accuracy_score(y_test, rf.predict(X_test))))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sCvGh9t_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/757503ksfvrevcc376eo.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sCvGh9t_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/757503ksfvrevcc376eo.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Confusion Matrix for our Machine Learning Models
&lt;/h3&gt;

&lt;p&gt;Now I will construct a confusion matrix to visualize predictions made by our classifier and evaluate the accuracy of our machine learning classification.&lt;/p&gt;
&lt;h4&gt;
  
  
  Random Forest
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.metrics import classification_report
print(classification_report(y_test, rf.predict(X_test)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--v5_MV_DE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4lxtbtvkz9w3ndvtzp3j.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v5_MV_DE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4lxtbtvkz9w3ndvtzp3j.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib
import matplotlib.pyplot as plt
y_pred = rf.predict(X_test)
from sklearn.metrics import confusion_matrix
import seaborn as sns
forest_cm = metrics.confusion_matrix(y_pred, y_test, [1,0])
sns.heatmap(forest_cm, annot=True, fmt='.2f',xticklabels = ["Left", "Stayed"] , yticklabels = ["Left", "Stayed"] )
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.title('Random Forest')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--e5u5uS5R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cidzid1acbtw36rykfvi.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--e5u5uS5R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cidzid1acbtw36rykfvi.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Logistic Regression
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(classification_report(y_test, logreg.predict(X_test)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jvZ9LQdf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xn3dp8kz0tloklvdbv2u.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jvZ9LQdf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xn3dp8kz0tloklvdbv2u.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib
import matplotlib.pyplot as plt
logreg_y_pred = logreg.predict(X_test)
logreg_cm = metrics.confusion_matrix(logreg_y_pred, y_test, [1,0])
sns.heatmap(logreg_cm, annot=True, fmt='.2f',xticklabels = ["Left", "Stayed"] , yticklabels = ["Left", "Stayed"] )
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.title('Logistic Regression')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6TcTNV2Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z3hc7d605mkj5kazrcx8.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6TcTNV2Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z3hc7d605mkj5kazrcx8.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Employee Turnover Prediction Curve
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
logit_roc_auc = roc_auc_score(y_test, logreg.predict(X_test))
fpr, tpr, thresholds = roc_curve(y_test, logreg.predict_proba(X_test)[:,1])
rf_roc_auc = roc_auc_score(y_test, rf.predict(X_test))
rf_fpr, rf_tpr, rf_thresholds = roc_curve(y_test, rf.predict_proba(X_test)[:,1])
plt.figure()
plt.plot(fpr, tpr, label='Logistic Regression (area = %0.2f)' % logit_roc_auc)
plt.plot(rf_fpr, rf_tpr, label='Random Forest (area = %0.2f)' % rf_roc_auc)
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.legend(loc="lower right")
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FIH35zuR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/590gej49u2snsvtfcsrg.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FIH35zuR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/590gej49u2snsvtfcsrg.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The receiver operating characteristic (ROC)&lt;/em&gt; curve is a standard tool used with binary classifiers. The red dotted line represents the ROC curve of a purely random classifier; a good classifier stays as far away from that line as possible (toward the top-left corner).&lt;br&gt;
So, as we can see that the Random Forest Model has proven to be more useful in the prediction of employee turnover, now let’s have a look at the feature importance of our random forest classification model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;feature_labels = np.array(['satisfaction_level', 'last_evaluation', 'time_spend_company', 'Work_accident', 'promotion_last_5years', 
      'department_RandD', 'department_hr', 'department_management', 'salary_high', 'salary_low'])
importance = rf.feature_importances_
feature_indexes_by_importance = importance.argsort()
for index in feature_indexes_by_importance:
    print('{}-{:.2f}%'.format(feature_labels[index], (importance[index] *100.0)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Rlmtkkmb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v0lhswr8a7nicdjapwcs.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Rlmtkkmb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v0lhswr8a7nicdjapwcs.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
According to our Random Forest classification model, the above aspects show the most important features which will influence whether an employee will leave the company, in ascending order.&lt;/p&gt;

</description>
      <category>python</category>
      <category>scikitlearn</category>
      <category>machinelearning</category>
      <category>jupyter</category>
    </item>
    <item>
      <title>Heart Disease Analysis by using Machine Learning.</title>
      <dc:creator>Mohammad Sakib Mahmood</dc:creator>
      <pubDate>Mon, 07 Jun 2021 15:57:11 +0000</pubDate>
      <link>https://dev.to/sakibb019/heart-disease-analysis-by-using-machine-learning-3mf4</link>
      <guid>https://dev.to/sakibb019/heart-disease-analysis-by-using-machine-learning-3mf4</guid>
      <description>&lt;p&gt;Heart diseases refers to a group of conditions that affects your heart. Diseases under the heart disease umbrella include blood vessel diseases, such as coronary artery disease, Myocardial infraction, heart failure, heart rhythm problems (arrhythmia) and heart defects you’re born with (congenital heart defects), among others. Risk factor causing heart diseases following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Overweight &lt;/li&gt;
&lt;li&gt;High Blood pressure&lt;/li&gt;
&lt;li&gt;High-cholesterol level&lt;/li&gt;
&lt;li&gt;Diabetes Mellitus&lt;/li&gt;
&lt;li&gt;Being inactive&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;GitHub Code &lt;a href="https://https://github.com/sakibb019/Heart-diseases-using-ML" rel="noopener noreferrer"&gt;Link&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rcParams
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
from sklearn.neighbors import KNeighborsClassifier
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df = pd.read_csv('dataset.csv')
print(df.head())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fytjgo8vg64q46wb7zakr.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fytjgo8vg64q46wb7zakr.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(df.info())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqa65n7xb7tynmy81iju.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqa65n7xb7tynmy81iju.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(df.describe())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdj2uhhnbbfwkzwuu1an9.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdj2uhhnbbfwkzwuu1an9.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Feature Selection
&lt;/h3&gt;

&lt;p&gt;To get correlation of each feature in the data set&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import seaborn as sns
corrmat = df.corr()
top_corr_features = corrmat.index
plt.figure(figsize=(16,16))
#plot heat map
g=sns.heatmap(df[top_corr_features].corr(),annot=True,cmap="RdYlGn")
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhz18u08z9w2b4ky2aso.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhz18u08z9w2b4ky2aso.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
It’s always a good practice to work with a data set where the target classes are of approximately equal size. Thus, let’s check for the same :&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sns.set_style('whitegrid')
sns.countplot(x='target',data=df,palette='RdBu_r')
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc94nghcufjut2s4owlvr.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc94nghcufjut2s4owlvr.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Data Processing
&lt;/h3&gt;

&lt;p&gt;After exploring the data set, I observed that I need to convert some categorical variables into dummy variables and scale all the values before training the Machine Learning models.&lt;/p&gt;

&lt;p&gt;First, I’ll use the &lt;code&gt;get_dummies&lt;/code&gt; method to create dummy columns for categorical variables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dataset = pd.get_dummies(df, columns = ['sex', 'cp', 
                                        'fbs','restecg', 
                                        'exang', 'slope', 
                                        'ca', 'thal'])
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
standardScaler = StandardScaler()
columns_to_scale = ['age', 'trestbps', 'chol', 'thalach', 'oldpeak']
dataset[columns_to_scale] = standardScaler.fit_transform(dataset[columns_to_scale])
dataset.head()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlvjmfpahbx1wwoalfi9.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlvjmfpahbx1wwoalfi9.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;y = dataset['target']
X = dataset.drop(['target'], axis = 1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.model_selection import cross_val_score
knn_scores = []
for k in range(1,21):
    knn_classifier = KNeighborsClassifier(n_neighbors = k)
    score=cross_val_score(knn_classifier,X,y,cv=10)
    knn_scores.append(score.mean())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.plot([k for k in range(1, 21)], knn_scores, color = 'red')
for i in range(1,21):
    plt.text(i, knn_scores[i-1], (i, knn_scores[i-1]))
plt.xticks([i for i in range(1, 21)])
plt.xlabel('Number of Neighbors (K)')
plt.ylabel('Scores')
plt.title('K Neighbors Classifier scores for different K values')
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6x6gp5wtv8kxlbamehuu.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6x6gp5wtv8kxlbamehuu.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;knn_classifier = KNeighborsClassifier(n_neighbors = 12)
score=cross_val_score(knn_classifier,X,y,cv=10)
score.mean()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftxls3cpyh77l0fl7qqn7.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftxls3cpyh77l0fl7qqn7.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Random Forest Classifier
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.ensemble import RandomForestClassifier
randomforest_classifier= RandomForestClassifier(n_estimators=10)
score=cross_val_score(randomforest_classifier,X,y,cv=10)
score.mean()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t3vmy63chhl3120ab60.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t3vmy63chhl3120ab60.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>jupyter</category>
      <category>python</category>
      <category>tensorflow</category>
    </item>
  </channel>
</rss>
