Hey, have you ever wondered how machine learning can be used to predict diseases? In this beginner-friendly guide, I'll show you step by step how to build a real ML model that predicts diabetes from medical data, and then deploy it as a web app using Streamlit!
No prior experience with ML? No worries. I'll break down everything into simple terms. By the end, you'll not only understand the key concepts but will have built and deployed your own working project.
🔗 GitHub Repo (Full Code):
👉 diabetes-prediction
🧠 What You'll Learn:
✅ How to use a real-world dataset for training a model
✅ What a classifier is and why we use it
✅ What it means to "train" a model
✅ How to save the model and use it to make predictions
✅ How to build an interactive web app (with no HTML or JS!)
✅ How to deploy your project online
📊 The Dataset - Explained Simply
We're using the PIMA Indians Diabetes Dataset from Kaggle. It contains medical records of women aged 21+, with features like:
Pregnancies: Number of times pregnant
Glucose: Blood sugar levels
Blood Pressure: Diastolic blood pressure
Skin Thickness: Triceps skinfold thickness (a body fat measure)
Insulin: Insulin level in blood
BMI: Body Mass Index
Diabetes Pedigree Function: Family history of diabetes
Age
The last column is Outcome:
0 = not diabetic
1 = diabetic
👉 Our goal: train a machine to predict this Outcome using the other inputs.
📥 Get it here: PIMA Dataset on Kaggle
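Before training, it helps to take a quick look at the data. The sketch below is only an illustration, not code from the repo: it reads a few made-up rows (not real patient records) from an in-memory string so it runs without the CSV, and it assumes the column names used in the commonly distributed Kaggle file (for example BloodPressure with no space). With the real file you would call pd.read_csv('diabetes.csv') instead.

```python
import io

import pandas as pd

# Made-up rows for illustration only. The column names are assumed to match
# the Kaggle CSV, so check them against your own copy of the file.
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,150,70,30,100,33.5,0.60,45,1
1,90,65,25,80,24.0,0.20,23,0
0,110,72,20,90,27.1,0.35,31,0
5,170,80,35,150,36.2,0.80,52,1
3,100,68,22,85,25.4,0.25,29,0
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Number of rows and columns: 8 feature columns plus Outcome.
n_rows, n_cols = df.shape

# How many rows fall into each class (0 = not diabetic, 1 = diabetic).
outcome_counts = df["Outcome"].value_counts().to_dict()

print("Shape:", (n_rows, n_cols))
print("Outcome counts:", outcome_counts)
```

On the real dataset, the same two checks tell you how many records you have and how balanced the two classes are.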
🛠️ Step 1: Project Setup
Make a folder like this:
diabetes-prediction/
│
├── diabetes.csv # Your dataset
├── train_model.py # Trains your model
├── trained_model.sav # Output file (saved model)
└── diabetes_app.py # Streamlit web app
✅ Use a Virtual Environment
This keeps your project clean and avoids dependency issues.
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
✅ Install Required Libraries
pip install pandas numpy scikit-learn streamlit
Or if you have a requirements.txt:
pip install -r requirements.txt
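If you don't have a requirements.txt yet, a minimal one for this project could simply list the same four packages as the pip install command above. Leaving the versions unpinned is an assumption made here for simplicity; pin them if you need reproducible installs.

```text
pandas
numpy
scikit-learn
streamlit
```

Keeping this file in the repo is also useful later: hosted platforms such as Streamlit Cloud typically read it to install your app's dependencies.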
🤖 Step 2: Train the Machine Learning Model
Let's break this part down 👇
💡 What is a Machine Learning Model?
A model is a mathematical formula that learns patterns from data and then uses those patterns to make predictions. Just like how we learn from examples, the model "learns" from the dataset.
We'll use a model called a Support Vector Machine (SVM). Don't worry about the name - just think of it as a smart classifier that tries to separate diabetic vs. non-diabetic patients.
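To make the idea of a classifier concrete, here is a tiny sketch on made-up 2D points rather than the diabetes data. The variable names (X_toy, y_toy, toy_model) are hypothetical and only for illustration.

```python
from sklearn.svm import SVC

# Made-up 2-feature points: class 0 sits near the origin, class 1 far from it.
X_toy = [[1, 1], [2, 1], [1, 2], [8, 8], [9, 8], [8, 9]]
y_toy = [0, 0, 0, 1, 1, 1]

# A linear SVM looks for a straight boundary that separates the two groups.
toy_model = SVC(kernel="linear")
toy_model.fit(X_toy, y_toy)

# Two points the model has never seen: one near each group.
toy_predictions = toy_model.predict([[0, 0], [10, 10]]).tolist()
print(toy_predictions)
```

The diabetes model works the same way, just with eight input features per patient instead of two.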
🧪 Inside train_model.py
import pandas as pd
# Load the data
df = pd.read_csv('diabetes.csv')
# Split into input (X) and output (Y)
X = df.drop('Outcome', axis=1)
Y = df['Outcome']
We use 80% for training, 20% for testing - so we can see how well our model performs on unseen data.
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, stratify=Y, test_size=0.2)
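The stratify argument in the split above keeps the share of diabetic and non-diabetic cases the same in both parts. A small sketch with made-up labels shows the effect; the 70/30 class mix and random_state=0 are assumptions chosen only to make the demo repeatable.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Made-up labels: 70 "not diabetic" (0) and 30 "diabetic" (1).
labels = [0] * 70 + [1] * 30
features = [[i] for i in range(len(labels))]  # placeholder inputs

_, _, y_train, y_test = train_test_split(
    features, labels, stratify=labels, test_size=0.2, random_state=0
)

# Count how many samples of each class ended up in each part.
train_counts = Counter(y_train)
test_counts = Counter(y_test)
print("Train:", dict(train_counts))
print("Test:", dict(test_counts))
```

Both parts keep the original 70/30 mix, so the test score is not skewed by an unlucky split with too few diabetic cases.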
Train the model using an SVM with a linear kernel (quick to train and a reasonable baseline for a small tabular dataset like this one):
from sklearn.svm import SVC
model = SVC(kernel='linear')
model.fit(X_train, Y_train)
Check the accuracy:
from sklearn.metrics import accuracy_score
print("Training Accuracy:", accuracy_score(Y_train, model.predict(X_train)))
print("Test Accuracy:", accuracy_score(Y_test, model.predict(X_test)))
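Accuracy is simply the fraction of predictions that match the true labels. The hand-rolled helper below is a sketch to show what accuracy_score computes in this simple case; the function name and the labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Made-up labels: the model gets 3 out of 4 right.
true_labels = [1, 0, 1, 1]
predicted_labels = [1, 0, 0, 1]
score = accuracy(true_labels, predicted_labels)
print(score)  # 0.75
```

If the training accuracy is much higher than the test accuracy, the model has probably memorized the training data rather than learned a general pattern.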
Save the model:
import pickle
with open('trained_model.sav', 'wb') as f:
    pickle.dump(model, f)
Run the script:
python train_model.py
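A quick way to convince yourself that saving worked is to load the file back and compare predictions. The sketch below does this with a tiny made-up dataset and a temporary file (toy_model.sav is a hypothetical name), so it does not touch your real trained_model.sav.

```python
import os
import pickle
import tempfile

from sklearn.svm import SVC

# Tiny made-up training set, standing in for the real data.
X_toy = [[1, 1], [2, 1], [8, 8], [9, 8]]
y_toy = [0, 0, 1, 1]
model = SVC(kernel="linear").fit(X_toy, y_toy)

# Save to a temporary folder, then load the model back from disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.sav")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should give the same answers as the original.
same_predictions = bool((restored.predict(X_toy) == model.predict(X_toy)).all())
print("Predictions match:", same_predictions)
```

One caution: only unpickle files you created or trust, because loading a pickle can run arbitrary code.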
🌐 Step 3: Build the Streamlit Web App
💡 What is Streamlit?
Streamlit is like magic 🪄 - it turns Python scripts into beautiful web apps without writing any frontend code.
Inside diabetes_app.py
1. Load the saved model
import pickle
with open('trained_model.sav', 'rb') as f:
    model = pickle.load(f)
2. Create a function to make predictions
import numpy as np
def predict(input_data):
    # scikit-learn expects a 2D array: one row per sample
    input_np = np.asarray(input_data).reshape(1, -1)
    result = model.predict(input_np)
    return 'Diabetic' if result[0] == 1 else 'Not Diabetic'
3. Design the app UI using Streamlit
import streamlit as st
st.title("🩺 Diabetes Prediction Web App")
# The second argument is the minimum allowed value for each field
preg = st.number_input("Pregnancies", 0)
glu = st.number_input("Glucose", 0)
bp = st.number_input("Blood Pressure", 0)
skin = st.number_input("Skin Thickness", 0)
insulin = st.number_input("Insulin", 0)
bmi = st.number_input("BMI", 0.0)
dpf = st.number_input("Diabetes Pedigree Function", 0.0)
age = st.number_input("Age", 0)
# Inputs must be passed in the same column order used for training
if st.button("Predict"):
    result = predict([preg, glu, bp, skin, insulin, bmi, dpf, age])
    st.success(f"The person is {result}")
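The reshape(1, -1) call in the predict function is there because scikit-learn models expect a 2D array with one row per sample and one column per feature. Here is a standalone sketch with eight made-up values (not a real patient) showing the change in shape:

```python
import numpy as np

# Eight made-up values in the order the model was trained on:
# pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, DPF, age.
one_patient = [2, 120, 70, 25, 80, 28.5, 0.45, 33]

flat = np.asarray(one_patient)  # shape (8,): a flat list of 8 numbers
row = flat.reshape(1, -1)       # shape (1, 8): 1 sample with 8 features

print(flat.shape)
print(row.shape)
```

The -1 tells NumPy to work out the number of columns on its own, so the same line would still work if the feature count changed.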
▶️ Step 4: Run the App
Run this in your terminal:
streamlit run diabetes_app.py
🎉 You'll get an interactive web page where you can input values and see results instantly.
🧠 Common Questions (for Beginners)
❓ What's the difference between training and testing?
Training = Teaching the model
Testing = Checking if it learned well
❓ What does "saving the model" mean?
It's like saving a recipe so we don't have to re-cook from scratch each time.
❓ Why use Streamlit?
It's beginner-friendly and avoids the complexity of HTML/JS/CSS. Perfect for data projects!
☁️ Optional: Deploy on Streamlit Cloud
Want to share your app with others?
1. Push your code to GitHub
2. Go to streamlit.io/cloud
3. Click New App and select diabetes_app.py
Done! 🌍 Your app is now live
💬 Final Thoughts
Building your first ML project doesn't have to be hard. This project covers the full cycle:
✅ Load real data
✅ Train a model
✅ Save and reuse it
✅ Build a web app
✅ Share it with the world
Give it a try, fork the repo, tweak the code - and most importantly, have fun learning! 😄