DEV Community

Suleyman Sade

Originally published at suleymansade.blogspot.com

How I Built a Diabetes Risk App with Python & ML

DiaGuide: Diabetes Risk Prediction App

👋 Introduction

Hi there!

Last week, I built my first fully functional website, which uses machine learning to predict diabetes risk from historical health data. This was my first time publishing a real, working website, and I am honestly proud of the result. I built all of it solo, during a 48-hour hackathon.

I used Streamlit for the UI and scikit-learn to train the prediction model. Here is how:

💡 The Idea

When the project topic was first released, I was slightly surprised to see healthcare 💓, since most of the hackathons I had joined previously allowed more open-ended, general tech solutions. But then I started thinking 🤔, and the topic pushed me to research more deeply.

Since I am good at data analysis and building ML models, I decided to focus on those areas. I spent all of Friday and Saturday morning brainstorming and came up with a couple of ideas 🧠:

  • 💊 Medication Tracker App

    • An app to keep track of daily medication while letting users note how effective each medication is.
    • ✅ Easy to implement
    • ❌ The idea wasn't original enough
    • ❌ I wasn't sure how to store the data on a web server
    • ❌ I didn't have enough time to learn and build a mobile app
  • ❓ Symptom-Based Doctor Recommendation

    • A questionnaire to determine whether the person needs to see a doctor.
    • ✅ Useful in real life
    • ❌ Too broad
    • ❌ Hard to implement with all the parameters and questions
    • ❌ Hard to find a reliable dataset to use

For the reasons above, I decided to pivot. But I still liked the idea of using a questionnaire to assess something important.

So I shifted gears to something more specific: a single disease or mental health condition. I was very indecisive at first, but then I found a very useful diabetes dataset whose parameters made sense to me. That's when I committed to building DiaGuide.

โš™๏ธ Tools & Stack

๐Ÿ€„ Language(s): Python

๐Ÿ’ป Frontend

  • ๐Ÿ”จ Tools & Libraries: Streamlit
  • โ“ Why I chose it: I needed a simple library that I could use to create layout and I needed it fast ๐Ÿƒโ€โ™‚๏ธ๐Ÿ’จ. As someone who doesn't have much experience in frontend design, Streamlit helped me a lot with being beginner-friendly and having a good documentation ๐Ÿ“– and tutorials. It was the perfect tool to create data-heavy app that requires minimal UI design.
  • ๐Ÿ› ๏ธ How I used it: It was basically the cornerstone of my UI. Everythingโ€”from layout to interactivityโ€”was built using ๐Ÿ Python. No HTML, no CSS, just clean Python code.

🤖 Machine Learning

  • 🔨 Tools & Libraries: scikit-learn
  • 🔎 Model Type(s): Logistic Regression, Random Forest, Gradient Boosting
  • 🧹 Data Cleaning Libraries: pandas, NumPy
  • 💪 Performance/Evaluation: I trained 3 different models and evaluated each one by the area under its ROC curve (ROC AUC) on the test set. ROC AUC measures how well a model ranks positive cases above negative ones: 1.0 means perfect separation, and 0.5 is no better than random guessing:
    • Logistic Regression: 0.81, very fast ⚡
    • Gradient Boosting: 0.82, okay speed ⌛
    • Random Forest: 0.77, significantly slower 🐢, which also slowed down server responses
    • 🏆 Result: Logistic Regression. Its score was only 0.01 below Gradient Boosting, but it was significantly faster.

📊 Dataset

  • 🌐 Source: Kaggle / UCI Machine Learning Repository
  • 🔗 Link: https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset/data
  • 🔢 Features Used: BMI, Age, Blood Pressure, etc.
  • 🤔 Why this dataset: ✅ clean, 🏷️ labeled, 🧠 interpretable
  • ๐Ÿ“ Additional Notes: I didn't use two of the columns in the databaseโ€”education level and incomeโ€”because I thought they were more personal. Also, I evaluated with and without them and they only increased the accuracy by 1%, which is not significant.

๐ŸŒ Hosting & Deployment

  • ๐Ÿš€ Where I deployed: Streamlit Cloud
  • ๐Ÿง‘โ€๐Ÿ’ป GitHub Repository: SuleymanSade/DiaGuide
  • ๐ŸŒ Streamlit App: DiaGuide App
  • ๐Ÿ› ๏ธ How I did it: I put all the libraries I used into requirements.txt โžก๏ธ Configured Git LFS to fit my models (since some of them were bigger than 100MB) โžก๏ธ Created a GitHub repo & pushed the code โžก๏ธ Set up Streamlit Cloud and it was ready to go
  • โ“ Why I used Streamlit Cloud: It was free-to-use and really simple to set up. It also connected to the repository, so if I make a change in the future, the website is going to update too.
