<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mark A. Moussa</title>
    <description>The latest articles on DEV Community by Mark A. Moussa (@markamoussa).</description>
    <link>https://dev.to/markamoussa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F671058%2Fe51dd058-c4c5-4dc5-8fe2-d0d08a369d3e.jpeg</url>
      <title>DEV Community: Mark A. Moussa</title>
      <link>https://dev.to/markamoussa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/markamoussa"/>
    <language>en</language>
    <item>
      <title>Disease Prediction Based On Medical Side Symptoms</title>
      <dc:creator>Mark A. Moussa</dc:creator>
      <pubDate>Thu, 29 Jul 2021 23:28:44 +0000</pubDate>
      <link>https://dev.to/markamoussa/disease-prediction-based-on-medical-side-symptoms-55fk</link>
      <guid>https://dev.to/markamoussa/disease-prediction-based-on-medical-side-symptoms-55fk</guid>
      <description>&lt;p&gt;In this article, we will discuss one of &lt;strong&gt;DOCTOR-Y's Machine Learning Models&lt;/strong&gt;. This model predicts a patient's current medical conditions based on the symptoms associated with the previous diagnoses in the patient's medical history.&lt;/p&gt;

&lt;p&gt;We used a dataset containing the diseases and their symptoms in a checker format and classified it using &lt;strong&gt;&lt;em&gt;5 different machine learning classifiers&lt;/em&gt;&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;If you don't know what &lt;strong&gt;DOCTOR-Y&lt;/strong&gt; is, check this &lt;a href="https://www.linkedin.com/posts/omarreda291_healthcare-health-ehr-activity-6826664589346779136-fnMs"&gt;post&lt;/a&gt;.&lt;/p&gt;



&lt;h2&gt;
  
  
  Idea
&lt;/h2&gt;

&lt;p&gt;Physicians spend a lot of time reviewing a patient's previous e-prescriptions on DOCTOR-Y to learn about their past medical conditions and previous diseases.&lt;/p&gt;

&lt;p&gt;That's why DOCTOR-Y provides a summarized chart showing the estimated likelihood, as a percentage, that the patient suffers from each of a group of diseases, based on the symptoms associated with the previous diagnoses. The model is trained on a dataset to classify these symptoms.&lt;/p&gt;

&lt;p&gt;The model takes the symptoms from previous prescriptions as input and outputs the disease predicted from these symptoms.&lt;/p&gt;

&lt;p&gt;The snippet below shows how the model works.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;symptoms_disease&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; &lt;span class="n"&gt;continous_sneezing&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shivering&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chills&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Allergy'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;h2&gt;
  
  
  Dataset
&lt;/h2&gt;

&lt;p&gt;For this model, we used the &lt;a href="https://www.kaggle.com/itachi9604/disease-symptom-description-dataset"&gt;Disease Symptom Prediction Dataset&lt;/a&gt;. The dataset is balanced, but its feature vectors (samples) had a redundancy problem: many of them were duplicates.&lt;/p&gt;

&lt;p&gt;We kept only the unique feature vectors (unique samples) to feed to the machine learning algorithms, then restructured the data into a Boolean form to simplify training and get better results. The outcome is a &lt;strong&gt;refactored dataset&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The data is in a checker format with &lt;strong&gt;&lt;em&gt;133 columns&lt;/em&gt;&lt;/strong&gt;: the last column holds the disease, and the other 132 hold the symptoms. We have a total of &lt;strong&gt;&lt;em&gt;309 entries&lt;/em&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;em&gt;41 unique diseases&lt;/em&gt;&lt;/strong&gt;, averaging about &lt;strong&gt;&lt;em&gt;8 entries per disease&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;
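
&lt;p&gt;The deduplication and Boolean encoding described above might look roughly like the sketch below. It is only an illustration: the helper name &lt;code&gt;to_boolean_rows&lt;/code&gt;, the five-symptom vocabulary, and the sample records are assumptions made for the example, not the project's actual code or data.&lt;/p&gt;

```python
def to_boolean_rows(records, symptoms):
    """Encode (symptom_list, disease) records as 0/1 rows with the disease last.

    Hypothetical helper: duplicates are dropped so each unique sample appears once.
    """
    rows = []
    seen = set()
    for present, disease in records:
        present_set = set(present)
        # One 0/1 flag per known symptom, followed by the disease label.
        row = tuple(1 if s in present_set else 0 for s in symptoms) + (disease,)
        if row not in seen:  # skip redundant (duplicate) samples
            seen.add(row)
            rows.append(row)
    return rows


# Tiny illustrative vocabulary; the real dataset has 132 symptom columns.
symptoms = ["itching", "skin_rash", "continuous_sneezing", "shivering", "chills"]
records = [
    (["continuous_sneezing", "shivering", "chills"], "Allergy"),
    (["itching", "skin_rash"], "Fungal infection"),
    (["continuous_sneezing", "shivering", "chills"], "Allergy"),  # duplicate
]

rows = to_boolean_rows(records, symptoms)
for row in rows:
    print(row)
```

&lt;p&gt;With the three records above, only two rows remain, because the repeated Allergy sample is dropped.&lt;/p&gt;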

&lt;p&gt;&lt;strong&gt;&lt;em&gt;The table below is a sample of the symptoms, and you can find the full list &lt;a href="https://docs.google.com/spreadsheets/d/1jrjL_MjRCPuwX_qLLXfIs2I9TgbG6BwmG0QcNR4WCoo/edit?usp=sharing"&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symptoms&lt;/th&gt;
&lt;th&gt;Symptoms&lt;/th&gt;
&lt;th&gt;Symptoms&lt;/th&gt;
&lt;th&gt;Symptoms&lt;/th&gt;
&lt;th&gt;Symptoms&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;itching&lt;/td&gt;
&lt;td&gt;skin rash&lt;/td&gt;
&lt;td&gt;nodal skin eruptions&lt;/td&gt;
&lt;td&gt;continuous sneezing&lt;/td&gt;
&lt;td&gt;shivering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;visual disturbances&lt;/td&gt;
&lt;td&gt;receiving blood transfusion&lt;/td&gt;
&lt;td&gt;receiving unsterile injections&lt;/td&gt;
&lt;td&gt;coma&lt;/td&gt;
&lt;td&gt;stomach bleeding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;irregular sugar level&lt;/td&gt;
&lt;td&gt;cough&lt;/td&gt;
&lt;td&gt;high fever&lt;/td&gt;
&lt;td&gt;sunken eyes&lt;/td&gt;
&lt;td&gt;breathlessness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;swelling of stomach&lt;/td&gt;
&lt;td&gt;swelled lymph nodes&lt;/td&gt;
&lt;td&gt;malaise&lt;/td&gt;
&lt;td&gt;blurred and distorted vision&lt;/td&gt;
&lt;td&gt;phlegm&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;The table below shows the diseases in the full dataset.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Prognosis&lt;/th&gt;
&lt;th&gt;Prognosis&lt;/th&gt;
&lt;th&gt;Prognosis&lt;/th&gt;
&lt;th&gt;Prognosis&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fungal infection&lt;/td&gt;
&lt;td&gt;Migraine&lt;/td&gt;
&lt;td&gt;hepatitis A&lt;/td&gt;
&lt;td&gt;Heart attack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Allergy&lt;/td&gt;
&lt;td&gt;Cervical spondylosis&lt;/td&gt;
&lt;td&gt;Hepatitis B&lt;/td&gt;
&lt;td&gt;Varicose veins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GERD&lt;/td&gt;
&lt;td&gt;Paralysis(brain hemorrhage)&lt;/td&gt;
&lt;td&gt;Hepatitis C&lt;/td&gt;
&lt;td&gt;Hypothyroidism&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chronic cholestasis&lt;/td&gt;
&lt;td&gt;Jaundice&lt;/td&gt;
&lt;td&gt;Hepatitis D&lt;/td&gt;
&lt;td&gt;Hyperthyroidism&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Drug Reaction&lt;/td&gt;
&lt;td&gt;Malaria&lt;/td&gt;
&lt;td&gt;Hepatitis E&lt;/td&gt;
&lt;td&gt;Hypoglycemia&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Peptic ulcer diseae&lt;/td&gt;
&lt;td&gt;Chicken pox&lt;/td&gt;
&lt;td&gt;Alcoholic hepatitis&lt;/td&gt;
&lt;td&gt;Osteoarthristis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AIDS&lt;/td&gt;
&lt;td&gt;Dengue&lt;/td&gt;
&lt;td&gt;Tuberculosis&lt;/td&gt;
&lt;td&gt;Arthritis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Diabetes&lt;/td&gt;
&lt;td&gt;Typhoid&lt;/td&gt;
&lt;td&gt;Common Cold&lt;/td&gt;
&lt;td&gt;(vertigo) Paroymsal Positional Vertigo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gastroenteritis&lt;/td&gt;
&lt;td&gt;Psoriasis&lt;/td&gt;
&lt;td&gt;Pneumonia&lt;/td&gt;
&lt;td&gt;Acne&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bronchial Asthma&lt;/td&gt;
&lt;td&gt;Impetigo&lt;/td&gt;
&lt;td&gt;Dimorphic hemmorhoids(piles)&lt;/td&gt;
&lt;td&gt;Urinary tract infection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hypertension&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Data Preparation
&lt;/h3&gt;

&lt;p&gt;For the decision tree algorithm, we used PCA to normalize our data and reduce the number of features from &lt;strong&gt;&lt;em&gt;132 to 70&lt;/em&gt;&lt;/strong&gt;, then transformed both the training and testing data with the projection produced by the PCA.&lt;/p&gt;
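
&lt;p&gt;As a minimal sketch of this step, assuming scikit-learn (the article does not say which library was used), the reduction from 132 to 70 features could be written as below. The random matrices are synthetic stand-ins for the Boolean symptom data, so the predictions themselves are meaningless; only the shapes and the fit/transform order matter here.&lt;/p&gt;

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 0/1 symptom flags and 41 disease labels.
X_train = rng.integers(0, 2, size=(200, 132)).astype(float)
y_train = rng.integers(0, 41, size=200)
X_test = rng.integers(0, 2, size=(100, 132)).astype(float)

# Fit the projection on the training data only, then reuse it for the test data.
pca = PCA(n_components=70)
X_train_reduced = pca.fit_transform(X_train)
X_test_reduced = pca.transform(X_test)

# Train the decision tree on the 70 reduced features.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train_reduced, y_train)
predictions = tree.predict(X_test_reduced)

print(X_train_reduced.shape, X_test_reduced.shape)
```

&lt;p&gt;Fitting the PCA on the training split alone keeps information from the test split out of the projection.&lt;/p&gt;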



&lt;h3&gt;
  
  
  Model Definition
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The Model is trained on the discussed dataset. &lt;/li&gt;
&lt;li&gt;The Model Input: the symptoms. &lt;/li&gt;
&lt;li&gt;The Model Output: the possible diseases the patient may suffer from.&lt;/li&gt;
&lt;/ul&gt;



&lt;h3&gt;
  
  
  Model Training
&lt;/h3&gt;

&lt;p&gt;We used five classification algorithms to process the data.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Decision Tree.&lt;/li&gt;
&lt;li&gt;Random Forest.&lt;/li&gt;
&lt;li&gt;Naïve Bayes.&lt;/li&gt;
&lt;li&gt;K-Nearest Neighbor (KNN).&lt;/li&gt;
&lt;li&gt;Artificial Neural Network (ANN). As the figure below illustrates, the model has one input layer with &lt;strong&gt;&lt;em&gt;132 neurons&lt;/em&gt;&lt;/strong&gt; since we have &lt;strong&gt;&lt;em&gt;132 symptoms&lt;/em&gt;&lt;/strong&gt;, one hidden layer, and one output layer with &lt;strong&gt;&lt;em&gt;41 neurons&lt;/em&gt;&lt;/strong&gt; since we have &lt;strong&gt;&lt;em&gt;41 labels&lt;/em&gt;&lt;/strong&gt; as outputs. It was trained with a &lt;strong&gt;&lt;em&gt;batch size of 16&lt;/em&gt;&lt;/strong&gt; for &lt;strong&gt;&lt;em&gt;20 epochs&lt;/em&gt;&lt;/strong&gt;.
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y_mXdnGr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.discordapp.com/attachments/680848585838821411/867763286158475276/unknown.png" alt="ANN format"&gt;
&lt;/li&gt;
&lt;/ol&gt;
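
&lt;p&gt;To give a feel for the size of such a network, the sketch below counts the trainable parameters of a fully connected 132-to-hidden-to-41 network. The hidden width of 305 is an assumption taken from the neuron count listed in the Discussion section, and the helper &lt;code&gt;dense_parameter_count&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected network (hypothetical helper)."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix between two consecutive layers
        total += n_out         # one bias per neuron in the receiving layer
    return total


# 132 symptom inputs, an assumed hidden layer of 305 units, 41 disease outputs.
layer_sizes = [132, 305, 41]
parameter_count = dense_parameter_count(layer_sizes)
print(parameter_count)  # 53111
```

&lt;p&gt;Under that assumption the network has 53,111 parameters, which is large compared with the 309 entries in the dataset.&lt;/p&gt;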



&lt;h2&gt;
  
  
  Evaluation &amp;amp; Results
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The dataset was split &lt;strong&gt;&lt;em&gt;66/33&lt;/em&gt;&lt;/strong&gt; (training/testing) for all the classifiers.&lt;/li&gt;
&lt;li&gt;The table below shows the accuracy of each classification technique used for predicting diseases based on symptoms.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Decision Tree&lt;/td&gt;
&lt;td&gt;90%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Random Forest&lt;/td&gt;
&lt;td&gt;97%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KNN&lt;/td&gt;
&lt;td&gt;98%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Naïve Bayes&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ANN&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
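
&lt;p&gt;The 66/33 split and the accuracy figures above can be illustrated with a small standard-library sketch. The helper names &lt;code&gt;split_66_33&lt;/code&gt; and &lt;code&gt;accuracy&lt;/code&gt;, the seed, and the rounding rule are assumptions for the example; the exact split procedure used in the project is not described.&lt;/p&gt;

```python
import random


def split_66_33(samples, seed=0):
    """Shuffle a copy of the samples and cut it at 66% (hypothetical helper)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * 0.66)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# 309 placeholder sample ids, matching the number of entries in the dataset.
samples = list(range(309))
train, test = split_66_33(samples)
print(len(train), len(test))  # 204 105

# Two of three toy predictions are correct.
acc = accuracy(["Allergy", "GERD", "Acne"], ["Allergy", "GERD", "AIDS"])
print(round(acc, 3))  # 0.667
```

&lt;p&gt;With a test set of roughly 100 samples, each misclassified sample moves the accuracy by about one percentage point.&lt;/p&gt;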



&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;

&lt;p&gt;The models performed well and reached very high accuracy. The best results were provided by the ANN, while Naïve Bayes, KNN, and Random Forest provided comparable results.&lt;/p&gt;

&lt;p&gt;However, when working with real, unseen data, the Random Forest showed the best results out of all the classifiers.&lt;/p&gt;

&lt;p&gt;The table below shows the details of each model. &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Review&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Decision Tree&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;Working on the original data resulted in low accuracy.&lt;/li&gt;
&lt;li&gt;Feature reduction, used to normalize the data and reduce its dimensions, led to better results.&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Random Forest&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;&lt;li&gt;This model showed promising results, and we observed that when increasing the number of estimators, the results improved significantly.&lt;/li&gt;&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KNN&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;&lt;li&gt;We observed that the results improved when K was reduced. After trial-and-error experimentation, we chose &lt;strong&gt;&lt;em&gt;7&lt;/em&gt;&lt;/strong&gt; as the value of K; more experimentation may lead to better accuracy.&lt;/li&gt;&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Naïve Bayes&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;&lt;li&gt;This model showed good performance on the original data achieving very high accuracy.&lt;/li&gt;&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ANN&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;Unfortunately, papers did not provide guidelines on configuring the network of this model, so we used trial and error and settled on the following hyperparameters.&lt;/li&gt; &lt;ul&gt;
&lt;li&gt;Number of layers: &lt;strong&gt;&lt;em&gt;3&lt;/em&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Number of neurons unit: &lt;strong&gt;&lt;em&gt;305&lt;/em&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Epoch Number: &lt;strong&gt;&lt;em&gt;20&lt;/em&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Batch size: &lt;strong&gt;&lt;em&gt;16&lt;/em&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt; &lt;li&gt;The neural network was not difficult to train: after &lt;strong&gt;&lt;em&gt;20 epochs&lt;/em&gt;&lt;/strong&gt;, both the training accuracy and the test accuracy were good.&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
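
&lt;p&gt;As a rough illustration of the KNN row above, the sketch below classifies a Boolean symptom vector by majority vote among its 7 nearest neighbours. The Hamming distance, the helper names, and the toy five-symptom data are assumptions for the example; the article does not state which distance metric or implementation was used.&lt;/p&gt;

```python
from collections import Counter


def hamming(a, b):
    """Number of positions where two 0/1 vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def knn_predict(train_rows, train_labels, query, k=7):
    """Majority vote among the k training rows closest to the query (hypothetical helper)."""
    order = sorted(range(len(train_rows)), key=lambda i: hamming(train_rows[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy columns: continuous_sneezing, shivering, chills, itching, skin_rash.
train_rows = [
    (1, 1, 1, 0, 0), (1, 1, 0, 0, 0), (1, 0, 1, 0, 0), (0, 1, 1, 0, 0),
    (0, 0, 0, 1, 1), (0, 0, 0, 1, 0), (0, 0, 0, 0, 1), (0, 0, 1, 1, 1),
]
train_labels = ["Allergy"] * 4 + ["Fungal infection"] * 4

prediction = knn_predict(train_rows, train_labels, (1, 1, 1, 0, 0), k=7)
print(prediction)  # Allergy
```

&lt;p&gt;Here 4 of the 7 nearest neighbours are labelled Allergy, so the vote is narrow. With only about 8 samples per disease, a large K pulls in neighbours from other classes, which is consistent with smaller values of K working better.&lt;/p&gt;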




&lt;h2&gt;
  
  
  Integration With DOCTOR-Y
&lt;/h2&gt;

&lt;p&gt;We combined the results of the Diseases Symptoms Prediction Model with those of the &lt;a href="https://dev.to/ahmedsamy/disease-prediction-based-on-medical-diagnosis-547o"&gt;Diseases Diagnoses Prediction Model&lt;/a&gt; to calculate the percentage likelihood of suffering from each of a group of diseases, based on the previous diagnoses and their associated symptoms.&lt;/p&gt;

&lt;p&gt;The final diseases and their percentages are sent to the system server, which forwards them to the client side to be displayed on a chart, as shown in the figure below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--abuyANv0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wchf7prlbgzw8xcnsmil.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--abuyANv0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wchf7prlbgzw8xcnsmil.png" alt="Summarized Charts"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
