<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ahmedsamy1234</title>
    <description>The latest articles on DEV Community by ahmedsamy1234 (@ahmedsamy).</description>
    <link>https://dev.to/ahmedsamy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F671060%2F810c2efc-a81f-4afb-91ac-0c1964346d46.png</url>
      <title>DEV Community: ahmedsamy1234</title>
      <link>https://dev.to/ahmedsamy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ahmedsamy"/>
    <language>en</language>
    <item>
      <title>Disease Prediction Based On Medical Diagnosis</title>
      <dc:creator>ahmedsamy1234</dc:creator>
      <pubDate>Thu, 29 Jul 2021 23:28:46 +0000</pubDate>
      <link>https://dev.to/ahmedsamy/disease-prediction-based-on-medical-diagnosis-547o</link>
      <guid>https://dev.to/ahmedsamy/disease-prediction-based-on-medical-diagnosis-547o</guid>
<description>&lt;p&gt;In this article, we will discuss one of &lt;strong&gt;DOCTOR-Y's Machine Learning Models&lt;/strong&gt;. This model predicts a patient's current medical conditions based on the previous diagnoses in their medical history.&lt;/p&gt;

&lt;p&gt;We used a dataset containing diseases and their diagnoses and classified it using &lt;strong&gt;&lt;em&gt;3 different machine learning classifiers&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you don't know what &lt;strong&gt;DOCTOR-Y&lt;/strong&gt; is, check this &lt;a href="https://www.linkedin.com/posts/omarreda291_healthcare-health-ehr-activity-6826664589346779136-fnMs"&gt;post&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Idea
&lt;/h2&gt;

&lt;p&gt;Physicians spend a lot of time reviewing a patient's previous e-prescriptions on DOCTOR-Y to learn about their past medical conditions and previous diseases.&lt;/p&gt;

&lt;p&gt;That's why DOCTOR-Y provides a summarized chart showing, as percentages, how likely the patient is to suffer from each of a group of diseases based on previous diagnoses. The model is trained on a dataset to classify these diseases. It takes the diagnoses from previous prescriptions as input and outputs the disease predicted from these diagnoses.&lt;/p&gt;

&lt;p&gt;The snippet below shows how the model works.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;NLP&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; &lt;span class="s"&gt;"The patient has high blood pressure"&lt;/span&gt;
&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Hypertension'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Dataset
&lt;/h2&gt;

&lt;p&gt;In this model, most of the data were collected from the &lt;a href="https://www.kaggle.com/itachi9604/disease-symptom-description-dataset"&gt;Disease Symptom Prediction Dataset from Kaggle&lt;/a&gt;.&lt;br&gt;
Our &lt;strong&gt;&lt;em&gt;dataset&lt;/em&gt;&lt;/strong&gt; is used for the disease diagnosis model based on previous diagnoses, and it has two columns: the disease name and the diagnoses for that disease. We have &lt;strong&gt;&lt;em&gt;773 rows&lt;/em&gt;&lt;/strong&gt; with &lt;strong&gt;&lt;em&gt;41 unique diseases&lt;/em&gt;&lt;/strong&gt;, leaving us with approximately &lt;strong&gt;&lt;em&gt;19 entries&lt;/em&gt;&lt;/strong&gt; for each disease.&lt;/p&gt;

&lt;p&gt;The dataset is balanced. However, we faced a problem with building it from scratch: the resulting data may cause diseases to be misclassified when their diagnoses differ, which affects the model’s accuracy.&lt;/p&gt;

&lt;p&gt;The majority of the data was collected by hand from multiple healthcare sites; we looked carefully for definitions and diagnoses of the required diseases and ensured that no entries were duplicated.&lt;/p&gt;
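
&lt;p&gt;The balance and duplicate checks described above can be sketched as follows. This is a minimal illustration, not the project's own code: the four rows are made up, and the helper names &lt;code&gt;class_counts&lt;/code&gt; and &lt;code&gt;duplicate_rows&lt;/code&gt; are hypothetical. Only the figures 773 and 41 come from the article.&lt;/p&gt;

```python
from collections import Counter

# Made-up rows in the two-column layout described above: (disease, diagnosis).
rows = [
    ("Hypertension", "persistently high blood pressure"),
    ("Hypertension", "elevated systolic and diastolic readings"),
    ("Migraine", "recurring one-sided headache"),
    ("Migraine", "headache with sensitivity to light"),
]


def class_counts(rows):
    """Number of entries per disease label."""
    return Counter(disease for disease, _ in rows)


def duplicate_rows(rows):
    """Rows that occur more than once, ignoring case and outer whitespace."""
    seen = Counter((d.strip().lower(), t.strip().lower()) for d, t in rows)
    return [row for row, n in seen.items() if n != 1]


counts = class_counts(rows)
print(counts)
# Strict check: every disease has exactly the same number of entries.
print("exactly balanced:", len(set(counts.values())) == 1)
print("duplicates:", duplicate_rows(rows))

# Average entries per disease for the sizes reported in the article.
print(round(773 / 41, 2))
```

&lt;p&gt;For the reported sizes, the last line prints 18.85, which matches the "approximately 19 entries" figure.&lt;/p&gt;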

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Prognosis&lt;/th&gt;
&lt;th&gt;Prognosis&lt;/th&gt;
&lt;th&gt;Prognosis&lt;/th&gt;
&lt;th&gt;Prognosis&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fungal infection&lt;/td&gt;
&lt;td&gt;Migraine&lt;/td&gt;
&lt;td&gt;hepatitis A&lt;/td&gt;
&lt;td&gt;Heart attack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Allergy&lt;/td&gt;
&lt;td&gt;Cervical spondylosis&lt;/td&gt;
&lt;td&gt;Hepatitis B&lt;/td&gt;
&lt;td&gt;Varicose veins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GERD&lt;/td&gt;
&lt;td&gt;Paralysis(brain hemorrhage)&lt;/td&gt;
&lt;td&gt;Hepatitis C&lt;/td&gt;
&lt;td&gt;Hypothyroidism&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chronic cholestasis&lt;/td&gt;
&lt;td&gt;Jaundice&lt;/td&gt;
&lt;td&gt;Hepatitis D&lt;/td&gt;
&lt;td&gt;Hyperthyroidism&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Drug Reaction&lt;/td&gt;
&lt;td&gt;Malaria&lt;/td&gt;
&lt;td&gt;Hepatitis E&lt;/td&gt;
&lt;td&gt;Hypoglycemia&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Peptic ulcer diseae&lt;/td&gt;
&lt;td&gt;Chicken pox&lt;/td&gt;
&lt;td&gt;Alcoholic hepatitis&lt;/td&gt;
&lt;td&gt;Osteoarthristis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AIDS&lt;/td&gt;
&lt;td&gt;Dengue&lt;/td&gt;
&lt;td&gt;Tuberculosis&lt;/td&gt;
&lt;td&gt;Arthritis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Diabetes&lt;/td&gt;
&lt;td&gt;Typhoid&lt;/td&gt;
&lt;td&gt;Common Cold&lt;/td&gt;
&lt;td&gt;(vertigo) Paroymsal Positional Vertigo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gastroenteritis&lt;/td&gt;
&lt;td&gt;Psoriasis&lt;/td&gt;
&lt;td&gt;Pneumonia&lt;/td&gt;
&lt;td&gt;Acne&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bronchial Asthma&lt;/td&gt;
&lt;td&gt;Impetigo&lt;/td&gt;
&lt;td&gt;Dimorphic hemmorhoids(piles)&lt;/td&gt;
&lt;td&gt;Urinary tract infection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hypertension&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Data Preparation
&lt;/h3&gt;

&lt;p&gt;We prepared the data to be cleaner to obtain better results, and we implemented the following preprocessors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stop Words Removal&lt;/strong&gt; removes stop words such as “the” and “them”. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lowercasing&lt;/strong&gt; converts all words in the subject and body to lowercase. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Punctuation Removal&lt;/strong&gt; replaces punctuation characters such as '[/(){}[]|@,;]' with spaces.&lt;/li&gt;
&lt;/ul&gt;
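
&lt;p&gt;The three preprocessors above can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's code: the stop-word list is a tiny hypothetical one (the article does not say which list was used), the function name &lt;code&gt;preprocess&lt;/code&gt; is made up, and lowercasing is applied first so that stop-word matching is case-insensitive.&lt;/p&gt;

```python
import re

# Hypothetical, deliberately tiny stop-word list; the article does not say
# which list was used.
STOP_WORDS = {"the", "them", "a", "an", "has", "is", "of", "and"}

# The punctuation class quoted in the article: [/(){}[]|@,;]
PUNCTUATION = re.compile(r"[/(){}\[\]|@,;]")


def preprocess(text):
    """Lowercase, replace punctuation with spaces, and drop stop words."""
    text = text.lower()
    text = PUNCTUATION.sub(" ", text)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(preprocess("The patient has high blood pressure"))
# prints: patient high blood pressure
```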




&lt;h3&gt;
  
  
  Model Definition
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The Model is trained on the discussed dataset.&lt;/li&gt;
&lt;li&gt;The Model input: the diagnosis.&lt;/li&gt;
&lt;li&gt;The Model output: the possible diseases the patient may suffer from.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Model Training
&lt;/h3&gt;

&lt;p&gt;We used three classification algorithms to process this data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  SVM (Support Vector Machine)&lt;/li&gt;
&lt;li&gt;  NLP LSTM (Long Short-Term Memory for Natural Language Processing).
    Our model has one input layer, one embedding layer, one LSTM layer with &lt;strong&gt;&lt;em&gt;100 neurons&lt;/em&gt;&lt;/strong&gt;, and one output layer with &lt;strong&gt;&lt;em&gt;41 neurons&lt;/em&gt;&lt;/strong&gt;, since we have &lt;strong&gt;&lt;em&gt;41 labels&lt;/em&gt;&lt;/strong&gt; in the output. We trained it with a &lt;strong&gt;&lt;em&gt;batch size of 64 for 80 epochs&lt;/em&gt;&lt;/strong&gt;.
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4aYtV4nU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/todet0ikhautpz2uzs68.png" alt="NLP model structure"&gt;
&lt;/li&gt;
&lt;li&gt;  Multinomial Naïve Bayes &lt;/li&gt;
&lt;/ul&gt;
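
&lt;p&gt;To make the Multinomial Naïve Bayes option more concrete, here is a dependency-free sketch of how such a classifier maps diagnosis text to a disease. It is not the model trained for DOCTOR-Y (a library implementation would be the usual choice); the class, the four training sentences, and the smoothing value are all assumptions made for illustration.&lt;/p&gt;

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Minimal multinomial Naive Bayes over whitespace tokens."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed value)

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label, then word, then count
        self.label_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training carry no information and are skipped.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training examples, not rows from the real dataset.
texts = [
    "high blood pressure",
    "elevated blood pressure readings",
    "severe one-sided headache",
    "headache with nausea",
]
labels = ["Hypertension", "Hypertension", "Migraine", "Migraine"]

model = MultinomialNB().fit(texts, labels)
print(model.predict("The patient has high blood pressure"))
# prints: Hypertension
```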




&lt;h2&gt;
  
  
  Evaluation &amp;amp; Results
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The dataset was split &lt;strong&gt;&lt;em&gt;90/10&lt;/em&gt;&lt;/strong&gt; for the NLP (LSTM) model and &lt;strong&gt;&lt;em&gt;80/20&lt;/em&gt;&lt;/strong&gt; for the SVM and Naïve Bayes models.&lt;/li&gt;
&lt;li&gt;The table below shows the accuracy of each classification technique used for predicting diseases based on diagnoses:&lt;/li&gt;
&lt;/ul&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SVM&lt;/td&gt;
&lt;td&gt;81%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NLP (LSTM)&lt;/td&gt;
&lt;td&gt;69.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Naïve Bayes&lt;/td&gt;
&lt;td&gt;74%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
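
&lt;p&gt;For a sense of scale, the 90/10 and 80/20 splits mentioned above can be reproduced on the 773-row dataset size with a small sketch. The helper &lt;code&gt;split_indices&lt;/code&gt;, the fixed seed, and the rounding-down convention are assumptions; the article does not describe how the split was implemented.&lt;/p&gt;

```python
import random


def split_indices(n_rows, train_fraction, seed=0):
    """Shuffle row indices and cut them into training and held-out parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = int(n_rows * train_fraction)  # rounds down (assumed convention)
    return indices[:cut], indices[cut:]


N_ROWS = 773  # dataset size reported in the article

for name, fraction in [("LSTM 90/10", 0.9), ("SVM / Naive Bayes 80/20", 0.8)]:
    train, held_out = split_indices(N_ROWS, fraction)
    print(name, len(train), len(held_out))
# prints: LSTM 90/10 695 78
# prints: SVM / Naive Bayes 80/20 618 155
```

&lt;p&gt;With 41 diseases, a held-out set of about 78 rows contains roughly two examples per disease, so the reported accuracies should be read as rough estimates.&lt;/p&gt;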

&lt;p&gt;The NLP model reached nearly &lt;strong&gt;&lt;em&gt;90% accuracy&lt;/em&gt;&lt;/strong&gt; in training and &lt;strong&gt;&lt;em&gt;69.2% accuracy&lt;/em&gt;&lt;/strong&gt; in the validation phase. A gap of this size suggests that the model overfits the training data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BPw2ZCfD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l0e0akkjuwayibnr33vj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BPw2ZCfD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l0e0akkjuwayibnr33vj.png" alt="Accuracy of NLP Model "&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--X7_BLkEC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/eyqbtnpyz64r9bo33co4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--X7_BLkEC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/eyqbtnpyz64r9bo33co4.png" alt="loss curve of NLP model"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;

&lt;p&gt;The worst-performing model was the LSTM model, while the SVM performed best, followed by the Naïve Bayes model.&lt;/p&gt;

&lt;h3&gt;
  
  
  NLP Model
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Unfortunately, the literature did not provide guidelines on configuring the network for this model, so we had to use trial and error to choose the hyperparameters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The LSTM model achieved &lt;strong&gt;&lt;em&gt;69% accuracy&lt;/em&gt;&lt;/strong&gt;, which is worse than both the SVM and Naïve Bayes models. However, the LSTM model reads the data sequentially and has a memory that helps it keep words and use them in the prediction process, so we consider it more reliable than both.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Integration With DOCTOR-Y
&lt;/h2&gt;

&lt;p&gt;We combined the results of the Diseases Diagnoses Prediction Model with those of the &lt;a href="https://dev.to/markamoussa/disease-prediction-based-on-medical-side-symptoms-55fk"&gt;Diseases Symptoms Prediction Model&lt;/a&gt; to calculate the percentage likelihood of suffering from each of a group of diseases, based on previous diagnoses and the associated symptoms.&lt;/p&gt;

&lt;p&gt;The final diseases and their percentages are sent to the system server, which sends them to the client-side to be represented on a chart as shown in the figure below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2yNPoSyr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/udlm5xyiwrpd4yu0100n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2yNPoSyr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/udlm5xyiwrpd4yu0100n.png" alt="Chart"&gt;&lt;/a&gt;&lt;/p&gt;
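
&lt;p&gt;The article does not state how the two models' results are merged into percentages. One simple possibility, shown here purely as an assumption, is to average each disease's score across the two models and rescale the averages so they sum to 100. The function name &lt;code&gt;combine_scores&lt;/code&gt; and the example scores are hypothetical.&lt;/p&gt;

```python
def combine_scores(diagnosis_scores, symptom_scores):
    """Average per-disease scores from two models and rescale to percentages."""
    diseases = set(diagnosis_scores) | set(symptom_scores)
    # A disease missing from one model contributes a score of 0 for that model.
    averaged = {
        d: (diagnosis_scores.get(d, 0.0) + symptom_scores.get(d, 0.0)) / 2
        for d in diseases
    }
    total = sum(averaged.values())
    if total == 0:
        return {d: 0.0 for d in diseases}
    return {d: round(100 * s / total, 1) for d, s in averaged.items()}


# Made-up model outputs, not results from DOCTOR-Y.
diagnosis_scores = {"Hypertension": 0.6, "Diabetes": 0.4}
symptom_scores = {"Hypertension": 0.5, "Migraine": 0.5}

percentages = combine_scores(diagnosis_scores, symptom_scores)
for disease in sorted(percentages):
    print(disease, percentages[disease])
```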

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>nlp</category>
      <category>deeplearnin</category>
    </item>
  </channel>
</rss>
