Building a CKD Risk Predictor with Streamlit and Gradient Boosting

As a student exploring machine learning, I decided to build a project that predicts the risk of Chronic Kidney Disease (CKD) based on patient data. I wanted to create something simple but practical, so I combined a Gradient Boosting model with a Streamlit app to make it interactive and easy to test.

This post is a walkthrough of how I built the project, the tools I used, and how you can run it on your machine.

What the App Does

The app takes a few basic clinical inputs, such as blood pressure, serum creatinine, and hemoglobin, and predicts whether the patient might be at risk for CKD. Under the hood, a Gradient Boosting Classifier makes the prediction, and Streamlit provides a simple web interface.

I’m still learning, so it’s not a production-level tool, but it works well for learning and for experimenting with how ML models can be used in real-world applications.

Tech Stack

I kept things simple:

Python
scikit-learn for machine learning
pandas and NumPy for data handling
Streamlit for the web app
joblib for saving and loading models

How It Works

Model: I trained a Gradient Boosting Classifier on a dataset of clinical records
Data: The dataset (kidney_disease.csv) includes patient info like blood pressure, creatinine levels, and other factors
Preprocessing: I handled missing data, scaled the features, and encoded categorical variables
App: The Streamlit app takes input from the user, runs it through the model, and shows a prediction
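
Here’s a minimal sketch of what the preprocessing and training steps above can look like in scikit-learn. It is an illustration under a few assumptions, not the code from the notebook: it uses a small synthetic table instead of kidney_disease.csv, the column names and the toy labeling rule are made up, and it bundles preprocessing and the model into a single Pipeline, whereas the repository stores the scaler as a separate file.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for kidney_disease.csv; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "blood_pressure": rng.normal(80, 10, n),
    "serum_creatinine": rng.lognormal(0.3, 0.6, n),
    "hemoglobin": rng.normal(13, 2, n),
    "hypertension": rng.choice(["yes", "no"], n),
})
# Toy label rule, only so the example has something to learn.
df["ckd"] = ((df["serum_creatinine"] > 1.5) | (df["hemoglobin"] < 11.5)).astype(int)
# Blank out some values to mimic missing data.
df.loc[rng.choice(n, 20, replace=False), "hemoglobin"] = np.nan

numeric = ["blood_pressure", "serum_creatinine", "hemoglobin"]
categorical = ["hypertension"]

# Numeric columns: fill gaps with the median, then standardize.
# Categorical columns: one-hot encode.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("gbc", GradientBoostingClassifier(random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[numeric + categorical], df["ckd"],
    test_size=0.25, random_state=0, stratify=df["ckd"],
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Keeping the imputer and scaler inside the pipeline means they are fitted on the training split only, which avoids leaking information from the held-out rows into preprocessing.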

Running It Locally

If you want to try it yourself, here’s how you can run it locally:

Clone the repository:

   git clone https://github.com/Hassan123j/Chronic-Kidney-Disease-CKD-Predictor-App.git
   cd Chronic-Kidney-Disease-CKD-Predictor-App

Set up a virtual environment:

   python -m venv venv
   source venv/bin/activate      # On Windows: venv\Scripts\activate

Install the dependencies:

   pip install -r requirements.txt

Run the app:

   streamlit run app.py

If the model files (models/model_gbc.pkl and models/scaler.pkl) are missing, you can regenerate them by running the Jupyter notebook CKD Model.ipynb.
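
A quick way to check for the model files before launching the app is a few lines of Python. This is a small helper sketch, not something that ships with the repository; the function name is made up, and it assumes the two files sit in a models/ folder at the repository root.

```python
from pathlib import Path

# Expected locations relative to the repository root (an assumption; adjust if needed).
REQUIRED_ARTIFACTS = [Path("models/model_gbc.pkl"), Path("models/scaler.pkl")]


def missing_artifacts(base_dir="."):
    """Return the required model files that are not present under base_dir."""
    base = Path(base_dir)
    return [p for p in REQUIRED_ARTIFACTS if not (base / p).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        names = ", ".join(str(p) for p in missing)
        print(f"Missing: {names}. Run 'CKD Model.ipynb' to regenerate them.")
    else:
        print("All model files found.")
```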

Project Structure

Here’s how the project is organized:

.
├── app.py               # Streamlit app
├── CKD Model.ipynb      # Model training notebook
├── kidney_disease.csv   # Dataset
├── requirements.txt     # Python dependencies
└── models/
    ├── model_gbc.pkl    # Trained model
    └── scaler.pkl       # Feature scaler
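
To show how these pieces fit together, here’s a rough sketch of how an app like app.py could load the two files in models/ and turn user input into a prediction. It is not the actual app.py. The function names are hypothetical, and the sketch assumes the model was trained on exactly three numeric features in this order, which the real model may not be. It also assumes that the second column of predict_proba is the CKD class.

```python
import joblib
import numpy as np


def load_artifacts(model_path="models/model_gbc.pkl", scaler_path="models/scaler.pkl"):
    """Load the trained classifier and the fitted scaler from disk."""
    return joblib.load(model_path), joblib.load(scaler_path)


def predict_ckd_probability(model, scaler, features):
    """Scale one row of features and return the probability of the positive class."""
    row = np.asarray(features, dtype=float).reshape(1, -1)
    scaled = scaler.transform(row)
    # Column 1 of predict_proba corresponds to model.classes_[1],
    # assumed here to be the CKD class.
    return float(model.predict_proba(scaled)[0, 1])


def render_app():
    """Streamlit UI; the three inputs and their default values are placeholders."""
    import streamlit as st  # imported here so the helpers above work without Streamlit

    st.title("CKD Risk Predictor")
    bp = st.number_input("Blood pressure", min_value=0.0, value=80.0)
    sc = st.number_input("Serum creatinine", min_value=0.0, value=1.0)
    hemo = st.number_input("Hemoglobin", min_value=0.0, value=13.0)

    if st.button("Predict"):
        model, scaler = load_artifacts()
        prob = predict_ckd_probability(model, scaler, [bp, sc, hemo])
        st.write(f"Estimated CKD probability: {prob:.1%}")
```

In a real app file, calling render_app() at module level would let streamlit run app.py draw the page. Keeping the prediction logic in plain functions makes it easy to test without starting Streamlit.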

Next Steps

As a learner, I’m planning to improve this project in a few ways:

Add explainability to the model with SHAP or LIME
Deploy it online, maybe using Streamlit Cloud
Improve the UI to make it more user-friendly
Explore other ways to handle missing data and scale the features to improve the model’s accuracy

GitHub Repo

You can find the full code and instructions here:
GitHub - Chronic Kidney Disease Predictor: https://github.com/Hassan123j/Chronic-Kidney-Disease-CKD-Predictor-App

Final Thoughts

This was a fun project that helped me practice both machine learning and web development. I learned a lot about how to handle datasets, train models, and build a simple interactive app. It’s definitely not perfect, but it was a good starting point, and I plan to continue improving it as I learn more.

If you have any feedback or suggestions, feel free to reach out. Also, if you’re working on similar projects, I’d love to hear about them!